Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer

06/02/2020
by Yuan Shangguan, et al.

The demand for fast and accurate incremental speech recognition increases as the applications of automatic speech recognition (ASR) proliferate. Incremental speech recognizers output chunks of partially recognized words while the user is still talking. Partial results can be revised before the ASR finalizes its hypothesis, causing instability issues. We analyze the quality and stability of on-device streaming end-to-end (E2E) ASR models. We first introduce a novel set of metrics that quantify the instability at word and segment levels. We study the impact of several model training techniques that improve E2E model quality but degrade model stability. We categorize the causes of instability and explore various solutions to mitigate them in a streaming E2E ASR system.

Index Terms: ASR, stability, end-to-end, text normalization, on-device, RNN-T
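To make the notion of a revised partial result concrete, here is a minimal sketch that counts how many already-displayed words are changed or dropped across a sequence of partial hypotheses. It is an illustration only and is not the set of metrics the paper introduces: the function names, the whitespace tokenization, and the example transcript are all assumptions made for this sketch.

```python
from typing import List


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading words that two word lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def count_revised_words(partials: List[str]) -> int:
    """Count displayed words that a later partial changed or removed.

    Hypothetical illustration: a word counts as revised when it falls
    outside the common prefix of two consecutive partial hypotheses.
    """
    revised = 0
    prev: List[str] = []
    for text in partials:
        cur = text.split()  # assumed whitespace tokenization
        keep = common_prefix_len(prev, cur)
        revised += len(prev) - keep  # words shown before, now overwritten
        prev = cur
    return revised


# Made-up sequence of partial hypotheses for one utterance.
partials = ["i", "i scream", "ice cream", "ice cream is", "ice cream is good"]
print(count_revised_words(partials))
```

In this made-up sequence only the step from "i scream" to "ice cream" overwrites words that were already shown; every other partial simply extends the previous one.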


