Personalized Speech Recognition on Mobile Devices

03/10/2016
by Ian McGraw, et al.

We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
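The abstract mentions a quantized acoustic model without giving the scheme. As a rough illustration of the general idea only, the sketch below applies a generic uniform linear 8-bit quantization to a small list of weights. The function names `quantize` and `dequantize`, the sample weights, and the choice of a min/max range are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple


def quantize(weights: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 2**bits - 1] with a uniform linear scheme.

    Hypothetical helper for illustration; not the paper's exact method.
    """
    levels = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant vector, which would give a zero range.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [c * scale + w_min for c in codes]


# Made-up weights, standing in for one row of a model's weight matrix.
weights = [-0.42, -0.1, 0.0, 0.07, 0.33, 0.58]
codes, scale, w_min = quantize(weights)
restored = dequantize(codes, scale, w_min)

# Rounding to the nearest level bounds the error by about half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Storing each weight as one byte instead of a 32-bit float cuts that part of the memory footprint by roughly a factor of four, at the cost of a reconstruction error of at most about `scale / 2` per weight.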

research
03/21/2017

Deep LSTM for Large Vocabulary Continuous Speech Recognition

Recurrent neural networks (RNNs), especially long short-term memory (LST...
research
02/25/2020

Small-Footprint Open-Vocabulary Keyword Spotting with Quantized LSTM Networks

We explore a keyword-based spoken language understanding system, in whic...
research
02/08/2023

Short-Term Memory Convolutions

The real-time processing of time series signals is a critical issue for ...
research
07/31/2020

Future Vector Enhanced LSTM Language Model for LVCSR

Language models (LM) play an important role in large vocabulary continuo...
research
09/26/2019

Optimizing Speech Recognition For The Edge

While most deployed speech recognition systems today still run on server...
research
03/29/2021

Shrinking Bigfoot: Reducing wav2vec 2.0 footprint

Wav2vec 2.0 is a state-of-the-art speech recognition model which maps sp...
research
10/31/2018

Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition

Low power digital signal processors (DSPs) typically have a very limited...
