Improving CTC-based ASR Models with Gated Interlayer Collaboration

05/25/2022
by Yuting Yang, et al.

For Automatic Speech Recognition (ASR), CTC-based methods have become a dominant paradigm due to their simple architecture and efficient non-autoregressive inference. However, without external language models, these methods usually lack the capacity to model conditional dependencies and textual interaction. In this work, we present a Gated Interlayer Collaboration (GIC) mechanism that introduces contextual information into the model and relaxes the conditional independence assumption of CTC-based models. Specifically, we train the model with intermediate CTC losses calculated from the interlayer outputs, whose probability distributions naturally serve as soft label sequences. The GIC block consists of an embedding layer that obtains the textual embedding of the soft label at each position, and a gate unit that fuses the textual embedding with the acoustic features. Experiments on the AISHELL-1 and AIDATATANG benchmarks show that the proposed method outperforms recently published CTC-based ASR models. In particular, our method achieves a CER of 4.0 on the dev/test sets using CTC greedy search decoding without external language models.
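To make the described mechanism concrete, below is a minimal PyTorch sketch of what a GIC block could look like, based only on the abstract: an intermediate CTC head produces a soft label distribution per frame, an embedding layer turns that soft label into a textual embedding, and a gate unit fuses it with the acoustic features. The class name `GICBlock`, the expected-embedding computation, and the sigmoid-gated convex combination are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn


class GICBlock(nn.Module):
    """Sketch of a Gated Interlayer Collaboration block (assumed form)."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.ctc_head = nn.Linear(d_model, vocab_size)   # intermediate CTC projection
        self.embed = nn.Embedding(vocab_size, d_model)   # textual embedding table
        self.gate = nn.Linear(2 * d_model, d_model)      # gate unit (assumed parameterization)

    def forward(self, h: torch.Tensor):
        # h: (batch, time, d_model) acoustic features from an intermediate encoder layer
        logits = self.ctc_head(h)                         # (B, T, V); also feeds the intermediate CTC loss
        probs = logits.softmax(dim=-1)                    # soft label sequence at each position
        text = probs @ self.embed.weight                  # (B, T, d_model) expected textual embedding
        g = torch.sigmoid(self.gate(torch.cat([h, text], dim=-1)))
        fused = g * h + (1.0 - g) * text                  # gated fusion of acoustic and textual features
        return fused, logits


# Usage sketch: the returned logits would be used for an auxiliary CTC loss,
# while the fused features are passed to the next encoder layer.
block = GICBlock(d_model=256, vocab_size=4000)
x = torch.randn(8, 100, 256)
fused, inter_logits = block(x)
```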

