Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax

02/16/2023
by   Keqi Deng, et al.

End-to-end (E2E) automatic speech recognition (ASR) implicitly learns the token sequence distribution of its paired audio-transcript training data. However, it still suffers from domain shift between training and testing, and domain adaptation remains challenging. To alleviate this problem, this paper designs a replaceable internal language model (RILM) method, which makes it feasible to directly replace the internal language model (LM) of an E2E ASR model with a target-domain LM at the decoding stage when a domain shift is encountered. Furthermore, this paper proposes a residual softmax (R-softmax), designed for CTC-based E2E ASR models, that adapts to the target domain during inference without re-training. For E2E ASR models trained on the LibriSpeech corpus, experiments showed that the proposed methods gave a 2.6 reduction on the Switchboard data and a 1.0 corpus while maintaining intra-domain ASR results.

research
08/25/2023

Decoupled Structure for Improved Adaptability of End-to-End Models

Although end-to-end (E2E) trainable automatic speech recognition (ASR) h...
research
06/15/2022

Residual Language Model for End-to-end Speech Recognition

End-to-end automatic speech recognition suffers from adaptation to unkno...
research
10/07/2021

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

As end-to-end automatic speech recognition (ASR) models reach promising ...
research
02/26/2020

A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition

This article describes a density ratio approach to integrating external ...
research
12/06/2019

Audio-attention discriminative language model for ASR rescoring

End-to-end approaches for automatic speech recognition (ASR) benefit fro...
research
01/24/2020

Data Techniques For Online End-to-end Speech Recognition

Practitioners often need to build ASR systems for new use cases in a sho...
research
11/09/2022

Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition

Noisy Student Training (NST) has recently demonstrated extremely strong ...
