Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation

04/17/2019
by Gakuto Kurata et al.

Conventional automatic speech recognition (ASR) systems trained from frame-level alignments can easily leverage posterior fusion to improve ASR accuracy and build a better single model with knowledge distillation. End-to-end ASR systems trained with the Connectionist Temporal Classification (CTC) loss do not require frame-level alignments and hence simplify model training. However, the sparse and arbitrary posterior spike timings of CTC models pose a new set of challenges for posterior fusion across multiple models and for knowledge distillation between CTC models. We propose a method to train a CTC model so that its spike timings are guided to align with those of a pre-trained guiding CTC model. As a result, all models that share the same guiding model have aligned spike timings. We show the advantage of our method in various scenarios, including posterior fusion of CTC models and knowledge distillation between CTC models with different architectures. With the 300-hour Switchboard training data, the single word-level CTC model distilled from multiple models improved the word error rates on the Switchboard/CallHome test sets, reaching 13.7% on Switchboard, without using any data augmentation, language model, or complex decoder.
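To make the idea concrete, below is a minimal sketch (not the authors' implementation) of how a guiding term might be combined with the CTC loss in PyTorch. The frame-level KL term, the `guide_weight` interpolation weight, and the blank index are illustrative assumptions; the paper's exact guiding formulation may differ.

```python
# Minimal sketch of guided CTC training, assuming a PyTorch setup.
# Not the authors' code: `guide_weight` and the KL form are assumptions.

import torch
import torch.nn.functional as F

def guided_ctc_loss(student_log_probs, guide_log_probs, targets,
                    input_lengths, target_lengths, guide_weight=0.5):
    """student_log_probs: (T, N, C) log-softmax output of the model in training.
    guide_log_probs:     (T, N, C) log-softmax output of the frozen guiding model.
    """
    # Standard CTC loss against the reference label sequences.
    ctc = F.ctc_loss(student_log_probs, targets, input_lengths,
                     target_lengths, blank=0)
    # Frame-level KL term that pulls the student's per-frame posteriors toward
    # the guiding model's, so posterior spikes occur at the same frames.
    kl = F.kl_div(student_log_probs, guide_log_probs,
                  log_target=True, reduction="batchmean")
    return ctc + guide_weight * kl

def fuse_posteriors(log_probs_list):
    # Once all models share the guiding model's spike timings, posterior
    # fusion reduces to frame-wise averaging of their posteriors.
    probs = torch.stack([lp.exp() for lp in log_probs_list]).mean(dim=0)
    return probs.clamp_min(1e-10).log()
```

With aligned spikes, corresponding frames across models carry comparable posteriors, which is what makes both the frame-wise fusion above and frame-wise knowledge distillation well-defined.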


