
Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models

by Mohammad Zeineldeen, et al.

Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. Integration with an external LM trained on much larger amounts of unpaired text usually leads to better performance. A Bayesian interpretation, as in the hybrid autoregressive transducer (HAT), suggests dividing by the prior of the discriminative acoustic model, which corresponds to this implicit LM, similar to the hybrid hidden Markov model approach. The implicit LM cannot be computed efficiently in general, and it is still unclear which methods estimate it best. In this work, we compare different approaches from the literature and propose several novel methods to estimate the ILM directly from the AED model. Our proposed methods outperform all previous approaches. We also investigate other methods to suppress the ILM, mainly by decreasing the capacity of the AED model, limiting the label context, and training the AED model together with a pre-existing LM.
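The Bayesian interpretation described above amounts to a log-linear score combination during decoding: the external LM score is added and the estimated ILM score is subtracted. A minimal sketch, assuming per-token log-probabilities are already available and using hypothetical scale hyperparameters `lm_scale` and `ilm_scale` (the paper's actual values and estimation method may differ):

```python
def ilm_corrected_score(log_p_aed: float,
                        log_p_ext_lm: float,
                        log_p_ilm: float,
                        lm_scale: float = 0.6,
                        ilm_scale: float = 0.4) -> float:
    """HAT-style log-linear combination for one decoding step.

    log_p_aed    : log-probability of the label under the AED model
    log_p_ext_lm : log-probability under the external LM (added)
    log_p_ilm    : estimated internal LM log-probability (subtracted,
                   acting as the prior of the discriminative model)
    """
    return log_p_aed + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm

# Example: the ILM subtraction raises the score of labels that the
# AED model prefers only because of its internal prior.
score = ilm_corrected_score(-2.0, -3.0, -4.0)  # -2.0 - 1.8 + 1.6 = -2.2
```

Setting `ilm_scale = 0` recovers plain shallow fusion; the comparison of ILM estimation methods in the paper concerns how `log_p_ilm` itself is obtained from the AED model.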




Internal language model estimation through explicit context vector learning for attention-based encoder-decoder ASR

An end-to-end (E2E) speech recognition model implicitly learns a biased ...

Early Stage LM Integration Using Local and Global Log-Linear Combination

Sequence-to-sequence models with an implicit alignment mechanism (e.g. a...

A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

Attention-based recurrent neural encoder-decoder models present an elega...

Improving Factored Hybrid HMM Acoustic Modeling without State Tying

In this work, we show that a factored hybrid hidden Markov model (FH-HMM...

Modular Hybrid Autoregressive Transducer

Text-only adaptation of a transducer model remains challenging for end-t...

Relaxed Attention for Transformer Models

The powerful modeling capabilities of all-attention-based transformer ar...

On Language Model Integration for RNN Transducer based Speech Recognition

The mismatch between an external language model (LM) and the implicitly ...