Decoupled Structure for Improved Adaptability of End-to-End Models

08/25/2023
by   Keqi Deng, et al.
0

Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great success by jointly learning acoustic and linguistic information, it still suffers from the effect of domain shifts, thus limiting potential applications. The E2E ASR model implicitly learns an internal language model (LM) which characterises the training distribution of the source domain, and the E2E trainable nature makes the internal LM difficult to adapt to the target domain with text-only data To solve this problem, this paper proposes decoupled structures for attention-based encoder-decoder (Decoupled-AED) and neural transducer (Decoupled-Transducer) models, which can achieve flexible domain adaptation in both offline and online scenarios while maintaining robust intra-domain performance. To this end, the acoustic and linguistic parts of the E2E model decoder (or prediction network) are decoupled, making the linguistic component (i.e. internal LM) replaceable. When encountering a domain shift, the internal LM can be directly replaced during inference by a target-domain LM, without re-training or using domain-specific paired speech-text data. Experiments for E2E ASR models trained on the LibriSpeech-100h corpus showed that the proposed decoupled structure gave 15.1 rate reductions on the TED-LIUM 2 and AESRC2020 corpora while still maintaining performance on intra-domain data.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/16/2023

Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax

End-to-end (E2E) automatic speech recognition (ASR) implicitly learns th...
research
02/02/2021

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

The efficacy of external language model (LM) integration with existing e...
research
05/05/2023

Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation

End-to-end ASR models trained on large amount of data tend to be implici...
research
02/18/2022

Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models

In this paper, we investigate domain adaptation for low-resource Automat...
research
09/18/2023

Improved Factorized Neural Transducer Model For text-only Domain Adaptation

End-to-end models, such as the neural Transducer, have been successful i...
research
01/24/2020

Data Techniques For Online End-to-end Speech Recognition

Practitioners often need to build ASR systems for new use cases in a sho...
research
10/31/2022

Modular Hybrid Autoregressive Transducer

Text-only adaptation of a transducer model remains challenging for end-t...

Please sign up or login with your details

Forgot password? Click here to reset