Distilling the Knowledge of BERT for CTC-based ASR

09/05/2022
by   Hayato Futami, et al.

Connectionist temporal classification (CTC)-based models are attractive for automatic speech recognition (ASR) because of their fast inference. Language model (LM) integration approaches such as shallow fusion and rescoring can improve the recognition accuracy of CTC-based ASR by exploiting the knowledge in text corpora, but they significantly slow down CTC inference. In this study, we propose to distill the knowledge of BERT into CTC-based ASR, extending our previous study on attention-based ASR. The CTC-based model learns the knowledge of BERT during training and does not use BERT during testing, which preserves the fast inference of CTC. Unlike attention-based models, CTC-based models make frame-level predictions, which must be aligned with the token-level predictions of BERT for distillation. We propose to obtain these alignments by calculating the most plausible CTC paths. Experimental evaluations on the Corpus of Spontaneous Japanese (CSJ) and TED-LIUM2 show that our method improves the performance of CTC-based ASR without any cost to inference speed.
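The core idea above, aligning frame-level CTC predictions with token-level BERT targets, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses greedy best-path collapse as a stand-in for the most plausible CTC path, and the function names (`best_path_alignment`, `distill_loss`) and the averaging of log-probabilities over each token's frames are hypothetical simplifications.

```python
import numpy as np

def best_path_alignment(log_probs, blank=0):
    """Greedy best-path decoding: take the argmax label per frame,
    collapse repeats and blanks (standard CTC collapse), and record
    which frames map to each surviving token. This is a simplified
    stand-in for the most plausible CTC path used for alignment.

    log_probs: (T, V) array of frame-level log-probabilities.
    Returns (tokens, frames): token ids and, per token, its frame indices.
    """
    path = log_probs.argmax(axis=1)
    tokens, frames = [], []
    prev = blank
    for t, s in enumerate(path):
        if s != blank and s != prev:
            tokens.append(int(s))      # new token starts here
            frames.append([t])
        elif s != blank and s == prev:
            frames[-1].append(t)       # repeated label: same token
        prev = s
    return tokens, frames

def distill_loss(log_probs, frames, bert_soft_targets):
    """Cross-entropy between the CTC model's frame-level predictions
    (log-probs averaged over each token's aligned frames; a hypothetical
    pooling choice) and BERT's token-level soft targets."""
    loss = 0.0
    for f, q in zip(frames, bert_soft_targets):
        p = np.exp(log_probs[f].mean(axis=0))       # pooled prediction
        loss -= (q * np.log(p + 1e-9)).sum()        # soft cross-entropy
    return loss / len(frames)
```

During training, this distillation term would be added to the usual CTC loss, so that at test time only the fast CTC model is needed and BERT is discarded.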

