Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation

06/27/2023
by Haitao Tang, et al.

The transducer is one of the mainstream frameworks for streaming speech recognition. A performance gap exists between streaming and non-streaming transducer models because the streaming model has access to only limited context. An effective way to reduce this gap is to make their hidden representations and output distributions consistent, which can be achieved by hierarchical knowledge distillation. However, it is difficult to enforce consistency of both simultaneously, because learning the output distribution depends on the hidden representation. In this paper, we propose an adaptive two-stage knowledge distillation method consisting of hidden-layer learning followed by output-layer learning. In the first stage, the streaming model learns hidden representations from the full-context teacher by applying a mean squared error loss. In the second stage, we design a power-transformation-based adaptive smoothness method to learn a stable output distribution. On the LibriSpeech corpus, the method achieves a 19% relative reduction in word error rate and a faster first-token response compared with the original streaming model.
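
To make the two stages concrete, below is a minimal PyTorch-style sketch of the two distillation losses, assuming the non-streaming teacher and streaming student expose frame-aligned encoder hidden states and joint-network logits. The function names, the fixed power exponent, and the KL formulation are illustrative assumptions; the paper's adaptive smoothness schedule is not reproduced here.

```python
# Minimal sketch of the two-stage distillation idea (illustrative, not the paper's exact code).
import torch
import torch.nn.functional as F

def hidden_distill_loss(student_hidden, teacher_hidden):
    """Stage 1: match streaming hidden states to the full-context teacher's states with MSE."""
    return F.mse_loss(student_hidden, teacher_hidden)

def power_smoothed_probs(logits, power):
    """Smooth a distribution by raising probabilities to a power < 1 and renormalizing."""
    probs = torch.softmax(logits, dim=-1)
    smoothed = probs.pow(power)
    return smoothed / smoothed.sum(dim=-1, keepdim=True)

def output_distill_loss(student_logits, teacher_logits, power=0.5):
    """Stage 2: KL divergence from the power-smoothed teacher outputs to the student outputs.

    The fixed `power` here stands in for the adaptive smoothness proposed in the paper.
    """
    teacher_probs = power_smoothed_probs(teacher_logits, power)
    student_log_probs = torch.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Example usage with random tensors standing in for encoder / joint-network outputs.
if __name__ == "__main__":
    B, T, H, V = 4, 50, 256, 1000   # batch, frames, hidden dim, vocabulary size
    student_h = torch.randn(B, T, H)
    teacher_h = torch.randn(B, T, H)
    student_logits = torch.randn(B, T, V)
    teacher_logits = torch.randn(B, T, V)

    loss_stage1 = hidden_distill_loss(student_h, teacher_h)
    loss_stage2 = output_distill_loss(student_logits, teacher_logits, power=0.5)
    print(loss_stage1.item(), loss_stage2.item())
```

In this reading, stage 1 trains the streaming encoder toward the teacher's full-context representations, and stage 2 then distills the output layer against a smoothed teacher distribution, which keeps the target distribution stable while the student's outputs are still shifting.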


