Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition

05/19/2020
by Yan Gao, et al.

Knowledge distillation is widely used to compress deep learning models while preserving their performance across a wide range of applications. In Automatic Speech Recognition (ASR), distillation from ensembles of acoustic models has recently shown promising results in improving recognition performance. In this paper, we extend multi-teacher distillation methods to joint CTC-attention end-to-end ASR systems. We also introduce two novel distillation strategies. The core intuition behind both is to integrate the error rate metric into the teacher selection rather than focusing solely on the observed losses. This way, we directly distill and optimize the student toward the metric that matters for speech recognition. We evaluated these strategies under a selection of training procedures on the TIMIT phoneme recognition task and observed promising error rates compared to a common baseline. Indeed, the best phoneme error rate obtained with these end-to-end ASR systems is 16.4%.
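To make the idea of error-rate-aware teacher selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes each teacher produces a hypothesis and a posterior distribution for an utterance, scores teachers by a Levenshtein-based error rate against the reference, and turns those scores into weights for a per-token distillation loss. The function names, the softmax weighting, the `temperature` parameter, and the "best teacher only" variant are all illustrative assumptions; the abstract does not specify how the two proposed strategies are defined.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def soft_teacher_weights(error_rates, temperature=1.0):
    """Assumed strategy: softmax over negative error rates,
    so a lower error rate yields a larger weight."""
    scores = [-e / temperature for e in error_rates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def best_teacher_weights(error_rates):
    """Assumed strategy: keep only the teacher with the lowest error rate."""
    best = min(range(len(error_rates)), key=error_rates.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(error_rates))]


def distillation_loss(student_probs, teacher_probs_list, weights):
    """Weighted cross-entropy between each teacher's posteriors and the
    student's posteriors for one token (student_probs must be > 0)."""
    loss = 0.0
    for w, teacher_probs in zip(weights, teacher_probs_list):
        ce = -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
        loss += w * ce
    return loss


# Toy example with two hypothetical teachers on one utterance.
reference = ["sil", "k", "ae", "t", "sil"]
hypotheses = [
    ["sil", "k", "ae", "t", "sil"],  # teacher A: exact match
    ["sil", "k", "ah", "t", "sil"],  # teacher B: one substitution
]
rates = [error_rate(reference, h) for h in hypotheses]
soft = soft_teacher_weights(rates, temperature=0.1)
hard = best_teacher_weights(rates)
```

In this toy setup the teacher with the exact hypothesis receives the larger soft weight and the entire hard weight. A real system would apply such weights to the CTC and attention branches of the student's loss, which this sketch leaves out.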

