Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition

07/15/2022
by Xun Gong, et al.

Modern non-autoregressive (NAR) speech recognition systems aim to accelerate inference; however, compared with autoregressive (AR) models they suffer from performance degradation as well as large model sizes. We propose a novel knowledge transfer and distillation architecture that leverages knowledge from AR models to improve NAR performance while reducing the model's size. Frame- and sequence-level objectives are designed for transfer learning. To further boost NAR performance, a beam search method on Mask-CTC is developed to enlarge the search space during inference. Experiments show that the proposed NAR beam search yields a relative CER reduction of over 5% at an acceptable real-time-factor (RTF) increment. With knowledge transfer, a NAR student of the same size as the AR teacher obtains relative CER reductions of 8%/16% on the AISHELL-1 dev/test sets and relative WER reductions of over 25% on the LibriSpeech test-clean/test-other sets. Moreover, the roughly 9x smaller NAR models achieve about 25% relative CER/WER reductions on both the AISHELL-1 and LibriSpeech benchmarks with the proposed knowledge transfer and distillation.
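The abstract names two transfer objectives and a Mask-CTC beam search without reproducing their formulas, so the sketches below illustrate plausible formulations rather than the paper's exact method. First, a minimal PyTorch sketch assuming frame-level transfer is a per-frame KL divergence against the AR teacher's posteriors and sequence-level transfer is cross-entropy against a teacher beam-search hypothesis; the function names, tensor shapes, and temperature are assumptions.

```python
import torch.nn.functional as F

def frame_level_kd(student_logits, teacher_logits, temperature=1.0):
    # Frame-level transfer (assumed form): match the NAR student's
    # per-frame posterior to the AR teacher's via KL divergence.
    # Shapes: (B, T, V) logits for both models.
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(s_logp, t_logp, reduction="batchmean", log_target=True)
    return kl * temperature ** 2  # standard temperature scaling

def sequence_level_kd(student_logits, teacher_hyp, pad_id=-100):
    # Sequence-level transfer (assumed form): treat the AR teacher's
    # beam-search output `teacher_hyp` (B, T) as a pseudo-label and
    # train the student with cross-entropy against it.
    return F.cross_entropy(
        student_logits.transpose(1, 2),  # (B, V, T) as cross_entropy expects
        teacher_hyp,
        ignore_index=pad_id,
    )
```

Second, a hypothetical sketch of beam search over Mask-CTC mask filling. Greedy Mask-CTC fills each masked slot with the argmax of a masked language model (MLM) decoder; this sketch instead keeps the top-`beam_size` hypotheses while filling one slot at a time, which is one way to enlarge the search space. The `mlm_decoder` interface and the easy-first fill order are assumptions.

```python
import heapq

def mask_ctc_beam_fill(mlm_decoder, enc_out, tokens, mask_id, beam_size=5):
    # `tokens` is a 1-D LongTensor containing `mask_id` at low-confidence
    # CTC positions; `mlm_decoder(enc_out, hyp)` is assumed to return
    # per-position logits of shape (T, V).
    beams = [(0.0, tokens)]
    while any((hyp == mask_id).any() for _, hyp in beams):
        candidates = []
        for score, hyp in beams:
            masked = (hyp == mask_id).nonzero(as_tuple=True)[0]
            if len(masked) == 0:  # already fully filled
                candidates.append((score, hyp))
                continue
            logp = mlm_decoder(enc_out, hyp).log_softmax(-1)  # (T, V)
            conf, _ = logp[masked].max(-1)       # best fill per masked slot
            pos = masked[conf.argmax()]          # easy-first: surest slot
            topv, topi = logp[pos].topk(beam_size)  # branch on top-k fills
            for v, i in zip(topv.tolist(), topi.tolist()):
                new = hyp.clone()
                new[pos] = i
                candidates.append((score + v, new))
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return beams[0][1]  # highest-scoring fully filled hypothesis
```

With `beam_size=1` this degenerates to greedy easy-first filling, so the extra cost over greedy decoding grows roughly linearly with the beam size, consistent with the abstract's reported modest RTF increment.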

Related research:

- 04/07/2020: Improving Fluency of Non-Autoregressive Machine Translation
- 05/11/2020: Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition
- 07/20/2021: Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models
- 12/16/2021: Can Multilinguality benefit Non-autoregressive Machine Translation?
- 10/24/2020: Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment
- 04/06/2021: Non-autoregressive Mandarin-English Code-switching Speech Recognition with Pinyin Mask-CTC and Word Embedding Regularization
- 04/14/2022: Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via Intent Conditioning
