Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions

04/06/2021
by   Jumon Nozaki, et al.

This paper proposes a method to relax the conditional independence assumption of connectionist temporal classification (CTC)-based automatic speech recognition (ASR) models. We train a CTC-based ASR model with auxiliary CTC losses in intermediate layers in addition to the original CTC loss in the last layer. During both training and inference, each prediction generated in an intermediate layer is added to the input of the next layer, so that the prediction of the last layer is conditioned on those intermediate predictions. Our method is easy to implement and retains the merits of CTC-based ASR: a simple model architecture and fast decoding speed. We conduct experiments on three different ASR corpora. Our proposed method improves a standard CTC model significantly (e.g., more than 20% relative word error rate reduction on the WSJ corpus) with little computational overhead. Moreover, for the TEDLIUM2 corpus and the AISHELL-1 corpus, it achieves performance comparable to a strong autoregressive model with beam search, while decoding at least 30 times faster.
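The conditioning step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name ToyEncoder, the tanh layers, the random weights, the shared output projection to_vocab, and the back-projection to_hidden are all assumptions made for illustration. It only shows the data flow, where an intermediate layer's per-frame posterior is mapped back to the hidden size and added to the input of the next layer, followed by standard greedy CTC decoding (collapse repeats, drop blanks). The auxiliary CTC losses used in training are not shown.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyEncoder:
    """Hypothetical encoder stack with intermediate-prediction conditioning."""

    def __init__(self, num_layers, hidden, vocab, cond_layers, seed=0):
        rng = random.Random(seed)
        self.layers = [rand_matrix(hidden, hidden, rng) for _ in range(num_layers)]
        # Assumption: one output projection shared by all CTC heads.
        self.to_vocab = rand_matrix(vocab, hidden, rng)
        # Assumption: a linear map from the posterior back to the hidden size.
        self.to_hidden = rand_matrix(hidden, vocab, rng)
        self.cond_layers = set(cond_layers)

    def forward(self, frames):
        x = frames
        intermediate = []
        for idx, W in enumerate(self.layers, start=1):
            x = [[math.tanh(v) for v in matvec(W, f)] for f in x]
            if idx in self.cond_layers and idx < len(self.layers):
                # Intermediate prediction: per-frame posterior over the vocabulary.
                post = [softmax(matvec(self.to_vocab, f)) for f in x]
                intermediate.append(post)
                # Add the projected prediction to the next layer's input.
                x = [
                    [h + c for h, c in zip(f, matvec(self.to_hidden, p))]
                    for f, p in zip(x, post)
                ]
        final = [softmax(matvec(self.to_vocab, f)) for f in x]
        return final, intermediate


def ctc_greedy_decode(posteriors, blank=0):
    """Greedy CTC decoding: frame-wise argmax, collapse repeats, drop blanks."""
    out, prev = [], None
    for p in posteriors:
        k = max(range(len(p)), key=p.__getitem__)
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out
```

In training, the intermediate posteriors returned by forward would each receive an auxiliary CTC loss alongside the loss on the final output. At inference only the final posterior is decoded, so the model still produces all frames in a single forward pass.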


research
02/05/2021

Intermediate Loss Regularization for CTC-based Speech Recognition

We present a simple and efficient auxiliary loss function for automatic ...
research
05/25/2022

Improving CTC-based ASR Models with Gated Interlayer Collaboration

For Automatic Speech Recognition (ASR), the CTC-based methods have becom...
research
12/27/2022

Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation

Automatic Speech Recognition (ASR) systems frequently use a search-based...
research
08/16/2022

Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition

Optimization of modern ASR architectures is among the highest priority t...
research
11/02/2022

InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

This paper presents InterMPL, a semi-supervised learning method of end-t...
research
04/01/2022

InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

This paper proposes InterAug: a novel training method for CTC-based ASR ...
research
10/12/2022

A context-aware knowledge transferring strategy for CTC-based ASR

Non-autoregressive automatic speech recognition (ASR) modeling has recei...
