
Distilling Knowledge Using Parallel Data for Far-field Speech Recognition

by Jiangyan Yi, et al.

To improve far-field speech recognition performance, this paper proposes distilling knowledge from a close-talking model (the teacher) to a far-field model (the student) using parallel data. The student model is trained to imitate the output distributions of the teacher model, a constraint realized by minimizing the Kullback-Leibler (KL) divergence between the output distributions of the student and teacher models. Experimental results on the AMI corpus show that the best student model achieves up to a 4.7% relative word error rate (WER) reduction over conventionally-trained baseline models.
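The distillation constraint above can be sketched in a few lines: the teacher's softmax output over acoustic classes serves as a soft target, and the student's loss is the KL divergence between the two distributions. The sketch below is a minimal, framework-free illustration; the logits, class counts, and function names are hypothetical, not the paper's actual implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution (temperature optionally softens it)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_teacher, q_student, eps=1e-12):
    """KL(teacher || student): the distillation loss the student minimizes."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(p_teacher, q_student))

# Hypothetical frame-level logits over three acoustic classes, computed on
# parallel data: the same utterance seen by both microphones.
teacher_logits = [2.0, 1.0, 0.1]   # close-talking (teacher) model
student_logits = [1.5, 1.2, 0.3]   # far-field (student) model

p = softmax(teacher_logits)
q = softmax(student_logits)
loss = kl_divergence(p, q)  # > 0 until the student matches the teacher
```

In training, this per-frame loss would be averaged over the far-field utterances and back-propagated through the student only; the teacher's parameters stay fixed. The loss reaches zero exactly when the student reproduces the teacher's distribution.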



