Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

11/03/2020
by   Zhong Meng, et al.
0

The external language models (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR) which has no clear division between acoustic and language models. In this work, we propose an internal LM estimation (ILME) method to facilitate a more effective integration of the external LM with all pre-existing E2E models with no additional model training, including the most popular recurrent neural network transducer (RNN-T) and attention-based encoder-decoder (AED) models. Trained with audio-transcript pairs, an E2E model implicitly learns an internal LM that characterizes the training data in the source domain. With ILME, the internal LM scores of an E2E model are estimated and subtracted from the log-linear interpolation between the scores of the E2E model and the external LM. The internal LM scores are approximated as the output of an E2E model when eliminating its acoustic components. ILME can alleviate the domain mismatch between training and testing, or improve the multi-domain E2E ASR. Experimented with 30K-hour trained RNN-T and AED models, ILME achieves up to 15.5 error rate reductions from Shallow Fusion on out-of-domain LibriSpeech and in-domain Microsoft production test sets, respectively.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/02/2021

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

The efficacy of external language model (LM) integration with existing e...
research
06/04/2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

Integrating external language models (LMs) into end-to-end (E2E) models ...
research
05/05/2023

Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation

End-to-end ASR models trained on large amount of data tend to be implici...
research
02/26/2020

A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition

This article describes a density ratio approach to integrating external ...
research
11/25/2019

Independent language modeling architecture for end-to-end ASR

The attention-based end-to-end (E2E) automatic speech recognition (ASR) ...
research
01/26/2022

Internal language model estimation through explicit context vector learning for attention-based encoder-decoder ASR

An end-to-end (E2E) speech recognition model implicitly learns a biased ...
research
02/21/2022

Adaptive Discounting of Implicit Language Models in RNN-Transducers

RNN-Transducer (RNN-T) models have become synonymous with streaming end-...

Please sign up or login with your details

Forgot password? Click here to reset