DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model

03/16/2023
by Yanzhe Fu, et al.

Wav2vec 2.0 (W2V2) has shown impressive performance in automatic speech recognition (ASR). However, its large model size and non-streaming architecture make it hard to use in low-resource or streaming scenarios. In this work, we propose a two-stage knowledge distillation method to address these two problems: the first stage makes the large, non-streaming teacher model smaller, and the second stage makes it streaming. Specifically, we adopt the MSE loss for the distillation of hidden layers and a modified LF-MMI loss for the distillation of the prediction layer. Experiments are conducted on Gigaspeech, Librispeech, and an in-house dataset. The results show that the final distilled student model (DistillW2V2) is 8x faster and 12x smaller than the original teacher model. For the 480ms latency setup, DistillW2V2's relative word error rate (WER) degradation varies from 9% to 23.4% across test sets, which reveals a promising way to extend W2V2's application scope.
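As a concrete illustration of the hidden-layer part of this recipe, below is a minimal PyTorch sketch of an MSE distillation term that pulls selected student hidden states toward the frozen teacher's. The function names, layer matching, and per-layer linear projections are illustrative assumptions rather than the authors' implementation, and the modified LF-MMI loss on the prediction layer is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

def hidden_layer_distill_loss(student_hiddens, teacher_hiddens, projections):
    """MSE between matched student and teacher hidden layers.

    student_hiddens: list of (B, T, D_s) tensors from the student encoder
    teacher_hiddens: list of (B, T, D_t) tensors from the frozen teacher
    projections:     nn.ModuleList of Linear(D_s, D_t), one per matched layer
    """
    loss = 0.0
    for s, t, proj in zip(student_hiddens, teacher_hiddens, projections):
        # Project the (smaller) student dimension up to the teacher dimension,
        # and stop gradients from flowing into the teacher.
        loss = loss + F.mse_loss(proj(s), t.detach())
    return loss / len(projections)

# Toy usage with random tensors (batch=2, 50 frames, 4 matched layers).
d_s, d_t, n_layers = 384, 768, 4
projections = nn.ModuleList([nn.Linear(d_s, d_t) for _ in range(n_layers)])
student_hiddens = [torch.randn(2, 50, d_s, requires_grad=True) for _ in range(n_layers)]
teacher_hiddens = [torch.randn(2, 50, d_t) for _ in range(n_layers)]

loss = hidden_layer_distill_loss(student_hiddens, teacher_hiddens, projections)
loss.backward()

In a full two-stage setup along the lines of the abstract, this term would be combined with the prediction-layer LF-MMI distillation loss, and in the second stage the student would additionally be constrained to a streaming architecture (e.g. limited right context) to meet the 480ms latency setup.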

Related research:

- Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer (08/31/2023). Streaming automatic speech recognition (ASR) models are restricted from ...
- Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition (10/01/2022). The smaller memory bandwidth in smart devices prompts development of sma...
- Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation (06/27/2023). Transducer is one of the mainstream frameworks for streaming speech reco...
- Efficient Knowledge Distillation for RNN-Transducer Models (11/11/2020). Knowledge Distillation is an effective method of transferring knowledge ...
- Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR (05/18/2023). Due to the rapid development of computing hardware resources and the dra...
- DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer (06/13/2023). Conformer-based end-to-end models have become ubiquitous these days and ...
- Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals (08/16/2023). Millimeter wave (mmWave) based speech recognition provides more possibil...
