Learning a Dual-Mode Speech Recognition Model via Self-Pruning

07/25/2022
by   Chunxi Liu, et al.
0

There is growing interest in unifying the streaming and full-context automatic speech recognition (ASR) networks into a single end-to-end ASR model to simplify the model training and deployment for both use cases. While in real-world ASR applications, the streaming ASR models typically operate under more storage and computational constraints - e.g., on embedded devices - than any server-side full-context models. Motivated by the recent progress in Omni-sparsity supernet training, where multiple subnetworks are jointly optimized in one single model, this work aims to jointly learn a compact sparse on-device streaming ASR model, and a large dense server non-streaming model, in a single supernet. Next, we present that, performing supernet training on both wav2vec 2.0 self-supervised learning and supervised ASR fine-tuning can not only substantially improve the large non-streaming model as shown in prior works, and also be able to improve the compact sparse streaming model.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/02/2020

Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer

The demand for fast and accurate incremental speech recognition increase...
research
10/15/2021

Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet

From wearables to powerful smart devices, modern automatic speech recogn...
research
01/11/2023

Dual Learning for Large Vocabulary On-Device ASR

Dual learning is a paradigm for semi-supervised machine learning that se...
research
07/09/2021

Noisy Training Improves E2E ASR for the Edge

Automatic speech recognition (ASR) has become increasingly ubiquitous on...
research
11/07/2022

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Automatic Speech Recognition (ASR) systems typically yield output in lex...
research
10/27/2020

Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model

Recently, online end-to-end ASR has gained increasing attention. However...
research
03/31/2022

CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

History and future contextual information are known to be important for ...

Please sign up or login with your details

Forgot password? Click here to reset