Last year, Google unveiled a pretty efficient and powerful app called Recorder. It uses on-device machine learning to transcribe hour-long recordings into readable, editable text. It could also suggest tags for titles and recognize audio events such as music and applause. Even so, it wasn't easy to find a specific word or sentence, especially in a long transcript.
Google has now introduced "Smart Scrolling," which gives Recorder the ability to mark important sections in any transcript. The feature recognizes the most representative keywords and attaches each one to the section it best describes. Users can then scroll through a long transcript using these chapter-like headings, either tapping a keyword to jump straight to its section or scrolling through the headings themselves.
How Does Google Recorder’s Smart Scrolling Work?
The best thing about Smart Scrolling is that it relies on on-device machine learning. As mentioned, it finds representative keywords and assigns them to sections that are textually distinct. To do this, the feature uses a distilled bidirectional transformer (BERT) model and a modified extractive term frequency-inverse document frequency (TF-IDF) model.
Don't get spooked by these technical terms. Smart Scrolling runs both models in parallel to extract keywords and sections, then merges their outputs using aggregation heuristics such as a weighted average. Combining the two models also lets each one offset the other's weaknesses, which produces better results than either model alone.
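To make "weighted average" concrete, here is a minimal sketch that merges per-keyword scores from two models. The score values, the min-max normalization step, and the hypothetical `weight` parameter (set to 0.5 here) are assumptions for illustration; the article does not say how Recorder normalizes or weights its two models.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1] so the two models are comparable.

    If every score is identical, each keyword is mapped to 1.0.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 1.0 for k, v in scores.items()}


def combine_scores(tfidf_scores: dict[str, float],
                   bert_scores: dict[str, float],
                   weight: float = 0.5) -> dict[str, float]:
    """Weighted average of two score maps (hypothetical aggregation heuristic).

    `weight` is the share given to the TF-IDF model; a keyword that one
    model did not propose contributes 0 from that model.
    """
    a, b = normalize(tfidf_scores), normalize(bert_scores)
    keys = a.keys() | b.keys()
    return {k: weight * a.get(k, 0.0) + (1 - weight) * b.get(k, 0.0) for k in keys}


# Made-up scores: TF-IDF strongly prefers the rarer word "quarterly".
tfidf = {"quarterly": 3.2, "budget": 1.1, "meeting": 0.4}
bert = {"budget": 0.9, "meeting": 0.7, "hiring": 0.2}
combined = combine_scores(tfidf, bert)
for word, score in sorted(combined.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {score:.2f}")
```

In this toy run, "budget" ends up on top because both models rate it reasonably well, even though TF-IDF alone would have picked "quarterly".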
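Both models were trained on publicly available conversational datasets, including interviews, lectures, and similar recordings. Because this data resembles what users actually record, it shares the same word frequency distribution (as described by Zipf's law), which is crucial because keyword scores such as TF-IDF depend directly on how often words occur.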
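Zipf's law says that a word's frequency is roughly inversely proportional to its frequency rank, f(r) ≈ f(1) / r^s with s close to 1, so the second most common word appears about half as often as the first. The snippet below tallies rank and frequency for a made-up transcript and compares it with the Zipf prediction; a text this short follows the law only loosely.

```python
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(text.lower().split())
    ranked = counts.most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_expected(top_count: int, rank: int, s: float = 1.0) -> float:
    """Count predicted by Zipf's law at `rank`: f(r) = f(1) / r**s."""
    return top_count / rank ** s


transcript = "the model scores the keywords and the model ranks the sections of the talk"
table = rank_frequency(transcript)
top = table[0][2]
for rank, word, count in table[:4]:
    print(f"{rank:>2} {word:<10} observed={count} zipf_expected={zipf_expected(top, rank):.1f}")
```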
For keyword extraction, the TF-IDF model detects representative keywords and scores each one by how representative it is within its textual context. On its own, TF-IDF tends to favor uncommon words, while BERT shows high variance in the keywords it proposes. Used together, the two models complement each other, and the combination pays off.
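For readers new to TF-IDF, the sketch below shows the plain textbook version, not Recorder's modified extractive model, whose details the article does not give. Treating each transcript section as a separate "document" is an assumption made for this example. A word scores highly when it is frequent in one section but rare across the others, and a word that appears in every section (like "the") scores zero.

```python
import math
from collections import Counter


def tfidf_keywords(sections: list[str], top_k: int = 3) -> list[list[tuple[str, float]]]:
    """Score the words of each section with standard TF-IDF.

    tf  = count of the word in the section / number of words in the section
    idf = log(number of sections / number of sections containing the word)
    """
    tokenized = [s.lower().split() for s in sections]
    n_docs = len(tokenized)
    # Document frequency: in how many sections each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
        best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        results.append(best)
    return results


sections = [
    "the budget meeting covered the budget for next quarter",
    "the lecture explained the transformer model",
    "questions about the budget and the lecture",
]
for i, keywords in enumerate(tfidf_keywords(sections, top_k=2)):
    print(i, [(w, round(s, 3)) for w, s in keywords])
```

In this toy example, one-off words such as "meeting" outscore "budget" in the first section, even though "budget" appears there twice, because "budget" also shows up in another section. That is the bias toward uncommon words that a contextual model like BERT can help balance.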
Next, both models follow a similar approach to find the most important sections. Grading the sections and selecting the best ones involves several moving parts.
Smart Scrolling faces some challenges too, one of which is deciding whether a given section or keyword is important at all. To address this, the researchers trained the models to identify important sections as well as the important keywords within those sections.
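The article does not explain how sections are graded or how the "important or not" decision is made, so the following is purely a hypothetical sketch of one common pattern: give each section a score, discard sections below a cutoff, keep the top few, and show them in transcript order as chapter-like headings. The `Section` type, its fields, and the `max_headings` and `min_score` parameters are invented for illustration; in Recorder, the trained models make the importance decision rather than a fixed cutoff.

```python
from dataclasses import dataclass


@dataclass
class Section:
    start_sec: float  # hypothetical: where the section begins in the audio
    keyword: str      # best keyword found for this section
    score: float      # combined importance score for the section


def select_headings(sections: list[Section],
                    max_headings: int = 3,
                    min_score: float = 0.0) -> list[Section]:
    """Keep the highest-scoring sections above `min_score`, in transcript order."""
    important = [s for s in sections if s.score >= min_score]
    best = sorted(important, key=lambda s: s.score, reverse=True)[:max_headings]
    return sorted(best, key=lambda s: s.start_sec)


sections = [
    Section(0.0, "introductions", 0.20),
    Section(95.0, "budget", 0.85),
    Section(210.0, "hiring", 0.60),
    Section(340.0, "weather", 0.10),
    Section(420.0, "roadmap", 0.75),
]
for s in select_headings(sections):
    # Tapping a heading would seek playback to s.start_sec.
    print(f"{s.start_sec:>6.1f}s  {s.keyword}")
```

Sorting the survivors back by `start_sec` matters for navigation: headings read like a table of contents rather than a ranked list.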
Runtime Execution
The Smart Scrolling feature built into the Recorder app has shown promising results. During a recording, the system processes each section as soon as the microphone captures it, feeding it to the models as input. It stores intermediate results in memory, and once the recording is over, it consolidates all the results and shows them to the user.
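The runtime flow described above (analyze each section as it arrives, keep intermediate results in memory, consolidate when recording stops) can be sketched as a small incremental pipeline. The class and method names are hypothetical, and the per-section "analysis" is a simple word-count stand-in for the real on-device models. One plausible reason to defer consolidation, assumed here, is that some signals (such as which words appear in every section) are only known once all sections are in.

```python
from collections import Counter


class IncrementalSummarizer:
    """Hypothetical streaming pipeline: analyze each section on arrival, merge at the end."""

    def __init__(self) -> None:
        self._partials: list[Counter] = []  # intermediate results kept in memory

    def add_section(self, text: str) -> None:
        # Stand-in for per-section model inference: count words longer than 3 characters.
        words = [w for w in text.lower().split() if len(w) > 3]
        self._partials.append(Counter(words))

    def finish(self, top_k: int = 2) -> list[list[str]]:
        # Consolidate once recording ends: words found in every section
        # (when there is more than one) are dropped as uninformative.
        if len(self._partials) > 1:
            shared = set.intersection(*(set(p) for p in self._partials))
        else:
            shared = set()
        return [
            [w for w, _ in p.most_common() if w not in shared][:top_k]
            for p in self._partials
        ]


summarizer = IncrementalSummarizer()
for chunk in [
    "meeting about budget budget forecasts",
    "meeting about hiring hiring plans",
    "meeting about roadmap milestones",
]:
    summarizer.add_section(chunk)  # called as each section is transcribed
print(summarizer.finish(top_k=1))  # [['budget'], ['hiring'], ['roadmap']]
```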
The researchers have already made significant progress with the Recorder app. With Smart Scrolling, users gain an additional way to navigate, jumping to the important parts of any transcript without reading it from start to finish.