Internal language model estimation through explicit context vector learning for attention-based encoder-decoder ASR

01/26/2022
by Yufei Liu et al.

An end-to-end (E2E) speech recognition model implicitly learns a biased internal language model (ILM) during training. To fuse an external LM during inference, the scores produced by this biased ILM need to be estimated and subtracted. In this paper we propose two novel approaches to estimate the biased ILM of Listen-Attend-Spell (LAS) models. The simpler method replaces the context vector of the LAS decoder at every time step with a single learnable vector. The more advanced method uses a simple feed-forward network to map query vectors directly to context vectors, making the generation of context vectors independent of the LAS encoder. Both the learnable vector and the mapping network are trained on the transcriptions of the training data to minimize perplexity, while all other parameters of the LAS model are kept fixed. Experiments show that the ILMs estimated by the proposed methods achieve the lowest perplexity. In addition, they significantly outperform the shallow fusion method and two previously proposed internal language model estimation (ILME) approaches on multiple datasets.
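For concreteness, the sketch below shows how the two context-replacement variants described in the abstract could be wired into a LAS-style decoder step. It is a minimal PyTorch sketch based only on the description above; the class and parameter names (ILMDecoderSketch, learned_context, query_to_context, context_dim) are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch, not the authors' code: module and parameter names
# (ILMDecoderSketch, learned_context, query_to_context, context_dim, ...)
# are assumptions chosen for illustration.
import torch
import torch.nn as nn


class ILMDecoderSketch(nn.Module):
    """LAS-style decoder with the encoder-derived context replaced for ILM estimation.

    mode="learned_vector": one trainable vector stands in for the context at every step.
    mode="query_mapping":  a small feed-forward net maps the attention query (here, the
                           decoder hidden state) to a pseudo context vector, so the
                           context no longer depends on the LAS encoder.
    """

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, context_dim=512,
                 mode="learned_vector"):
        super().__init__()
        self.mode = mode
        self.context_dim = context_dim
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # The decoder RNN consumes the previous token embedding and the previous context.
        self.rnn = nn.LSTMCell(embed_dim + context_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim + context_dim, vocab_size)

        if mode == "learned_vector":
            # Variant 1: a single learnable vector shared across all time steps.
            self.learned_context = nn.Parameter(torch.zeros(context_dim))
        else:
            # Variant 2: feed-forward mapping from the query to a context vector.
            self.query_to_context = nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim),
                nn.Tanh(),
                nn.Linear(hidden_dim, context_dim),
            )

    def forward(self, tokens):
        """tokens: (batch, seq_len) previous-token ids; returns per-step vocabulary logits."""
        batch, seq_len = tokens.shape
        h = tokens.new_zeros(batch, self.rnn.hidden_size, dtype=torch.float)
        c = torch.zeros_like(h)
        context = tokens.new_zeros(batch, self.context_dim, dtype=torch.float)
        logits = []
        for t in range(seq_len):
            emb = self.embedding(tokens[:, t])
            h, c = self.rnn(torch.cat([emb, context], dim=-1), (h, c))
            # Replace the attention-derived context vector.
            if self.mode == "learned_vector":
                context = self.learned_context.expand(batch, -1)
            else:
                context = self.query_to_context(h)
            logits.append(self.output(torch.cat([h, context], dim=-1)))
        return torch.stack(logits, dim=1)
```

In ILM training as described in the abstract, only the learnable vector (or the mapping network) would be optimized on the training transcriptions to minimize perplexity, with every other LAS parameter frozen (requires_grad=False). At inference, a common ILME-style fusion then subtracts the weighted ILM log-probabilities from the E2E score and adds the weighted external LM score.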


