TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

09/05/2023
by   Yuan Shangguan, et al.

Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning a model's hyperparameters or by exploring variations in its architecture, but re-training and re-validating the model after each such change is resource-intensive. This paper presents TODM (Train Once Deploy Many), a new approach that efficiently trains many sizes of hardware-friendly on-device ASR models in a number of GPU-hours comparable to that of a single training job. TODM builds on prior work on Supernets, in which Recurrent Neural Network Transducer (RNN-T) models share weights within a single Supernet. It reduces the layer sizes and widths of the Supernet to obtain subnetworks, yielding smaller models suitable for all hardware types. We introduce a novel combination of three techniques to improve the outcomes of the TODM Supernet: adaptive dropouts, in-place Alpha-divergence knowledge distillation, and the ScaledAdam optimizer. We validate our approach by comparing Supernet-trained and individually tuned Multi-Head State Space Model (MH-SSM) RNN-T models on LibriSpeech. Results demonstrate that our TODM Supernet matches or surpasses the performance of manually tuned models, improving word error rate (WER) by up to 3% relative, while keeping the cost of training many models at a small constant.
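The weight-sharing idea described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes, purely for illustration, that a subnetwork is obtained by keeping the first `num_layers` layers and the leading `width` rows and columns of each shared weight matrix. The names `SharedLinear`, `ToySupernet`, and `param_count` are hypothetical, and the RNN-T, MH-SSM, adaptive dropout, knowledge distillation, and ScaledAdam components are all omitted.

```python
import random


class SharedLinear:
    """Square linear layer; narrower sub-layers reuse a slice of one weight matrix.

    Assumption for illustration: a sub-layer of width w uses the top-left
    w x w block of the full matrix.
    """

    def __init__(self, max_width, seed=0):
        rng = random.Random(seed)
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(max_width)]
            for _ in range(max_width)
        ]

    def forward(self, x, out_width):
        # Only the first len(x) columns and first out_width rows are read.
        return [
            sum(self.weight[o][i] * x[i] for i in range(len(x)))
            for o in range(out_width)
        ]


class ToySupernet:
    """Stack of shared layers from which smaller subnetworks are sliced."""

    def __init__(self, max_layers, max_width, seed=0):
        self.max_layers = max_layers
        self.max_width = max_width
        self.layers = [SharedLinear(max_width, seed + k) for k in range(max_layers)]

    def forward(self, x, num_layers, width):
        # Run a subnetwork: fewer layers and narrower layers, same weights.
        h = x[:width]
        for layer in self.layers[:num_layers]:
            h = [max(0.0, v) for v in layer.forward(h, width)]  # ReLU
        return h

    @staticmethod
    def param_count(num_layers, width):
        # Number of weights a (num_layers, width) subnetwork actually uses.
        return num_layers * width * width


net = ToySupernet(max_layers=4, max_width=8)
x = [1.0] * 8
full_out = net.forward(x, num_layers=4, width=8)   # the full Supernet
small_out = net.forward(x, num_layers=2, width=4)  # a smaller subnetwork
print(len(full_out), net.param_count(4, 8))
print(len(small_out), net.param_count(2, 4))
```

Because every subnetwork reads from the same `weight` matrices, a training step that updates the shared weights updates all candidate sizes at once, which is the sense in which the cost of training many models stays close to that of a single job.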

