Two-Pass End-to-End ASR Model Compression

01/08/2022
by Nauman Dawalatabad, et al.

Speech recognition on smart devices is challenging owing to their small memory footprint; hence, compact ASR models are desirable. With the use of popular transducer-based models, it has become practical to deploy streaming speech recognition models on small devices [1]. Recently, the two-pass model [2], which combines RNN-T and LAS modules, has shown exceptional performance for streaming on-device speech recognition. In this work, we propose a simple and effective approach to reduce the size of the two-pass model for memory-constrained devices. We employ the popular knowledge distillation approach in three stages using the teacher-student training technique. In the first stage, we use a trained RNN-T model as the teacher and perform knowledge distillation to train the student RNN-T model. In the second stage, we use the shared encoder and train a LAS rescorer for the student model using the trained RNN-T+LAS teacher model. Finally, we perform deep finetuning of the student model with the shared RNN-T encoder, RNN-T decoder, and LAS rescorer. Our experimental results on the standard LibriSpeech dataset show that our system can achieve a high compression rate of 55% without significant degradation in the WER compared to the two-pass teacher model.
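
As a rough illustration of the teacher-student training used in the first stage, the PyTorch-style sketch below distills a smaller student RNN-T from a frozen teacher by matching their softened output distributions. The model interfaces (`teacher_rnnt`, `student_rnnt`), the `rnnt_loss_fn` callable, the temperature, and the weight `alpha` are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student output distributions,
    the standard teacher-student knowledge distillation objective."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean KL, scaled by T^2 as is conventional in distillation
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

def stage1_step(teacher_rnnt, student_rnnt, features, targets, rnnt_loss_fn, alpha=0.5):
    """Stage-1 sketch: distill a compact student RNN-T from a frozen teacher RNN-T."""
    with torch.no_grad():
        teacher_logits = teacher_rnnt(features, targets)  # frozen teacher joint-network outputs
    student_logits = student_rnnt(features, targets)      # student joint-network outputs
    kd_loss = distillation_loss(student_logits, teacher_logits)
    asr_loss = rnnt_loss_fn(student_logits, targets)      # regular transducer loss on ground truth
    return alpha * kd_loss + (1.0 - alpha) * asr_loss     # weighted combined objective
```

Stages 2 and 3 would analogously reuse the shared student encoder while training the LAS rescorer against the teacher and then jointly fine-tuning the full student model.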


