Noisy Training Improves E2E ASR for the Edge

by Dilin Wang, et al.

Automatic speech recognition (ASR) has become increasingly ubiquitous on modern edge devices. Past work developed streaming End-to-End (E2E) all-neural speech recognizers that can run compactly on edge devices. However, E2E ASR models are prone to overfitting and have difficulty generalizing to unseen test data. Various techniques have been proposed to regularize the training of ASR models, including layer normalization, dropout, spectrum data augmentation, and speed distortions in the inputs. In this work, we present a simple yet effective noisy training strategy to further improve E2E ASR model training. By introducing random noise into the parameter space during training, our method produces smoother models at convergence that generalize better. We apply noisy training to improve both dense and sparse state-of-the-art Emformer models and observe consistent WER reduction. Specifically, when training Emformers with 90% sparsity, we achieve relative WER improvements on the LibriSpeech test-other and test-clean data sets, respectively.
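The core idea — perturbing the parameters with random noise before each gradient evaluation, then applying the update to the clean parameters — can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation: the Gaussian noise model, the noise scale `sigma`, and the linear toy task are all assumptions made for clarity.

```python
import random

def noisy_sgd(xs, ys, steps=2000, lr=0.1, sigma=0.01, seed=0):
    """Fit y ~ w*x + b by gradient descent, injecting Gaussian noise
    into the parameter space before each gradient evaluation.
    A sketch of noisy training; sigma and the noise model are
    illustrative assumptions, not the paper's settings."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Evaluate the gradient at a randomly perturbed point in
        # parameter space, then update the clean parameters. Averaged
        # over training, this favors flatter (smoother) minima.
        wn = w + rng.gauss(0.0, sigma)
        bn = b + rng.gauss(0.0, sigma)
        gw = sum((wn * x + bn - y) * x for x, y in zip(xs, ys)) / n
        gb = sum((wn * x + bn - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy data drawn from y = 3x + 1; the fitted (w, b) should land nearby.
xs = [i / 10 for i in range(-10, 11)]
ys = [3 * x + 1 for x in xs]
w, b = noisy_sgd(xs, ys)
```

Because each update uses a gradient taken at a noisy neighbor of the current parameters, sharp minima (where nearby points have very different gradients) are effectively penalized, which is the intuition behind the smoother, better-generalizing models reported in the abstract.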


