Transformer Language Models with LSTM-based Cross-utterance Information Representation

02/12/2021
by G. Sun, et al.

The effective incorporation of cross-utterance information has the potential to improve language models (LMs) for automatic speech recognition (ASR). To extract more powerful and robust cross-utterance representations for the Transformer LM (TLM), this paper proposes the R-TLM, which uses hidden states in a long short-term memory (LSTM) LM. To encode the cross-utterance information, the R-TLM incorporates an LSTM module together with a segment-wise recurrence in some of the Transformer blocks. In addition to the LSTM module output, a shortcut connection using a fusion layer that bypasses the LSTM module is also investigated. The proposed system was evaluated on the AMI meeting corpus and on the Eval2000 and RT03 telephone conversation evaluation sets. The best R-TLM achieved absolute word error rate (WER) reductions of 0.9% over the TLM baseline and 0.5% over the cross-utterance TLM baseline on the AMI evaluation set, and it also improved on both baselines on Eval2000 and RT03. Improvements on Eval2000 and RT03 were further supported by significance tests. R-TLMs were found to have better LM scores on words where recognition errors are more likely to occur. The R-TLM WER can be further reduced by interpolation with an LSTM-LM.
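The abstract does not say how the R-TLM and the LSTM-LM are interpolated. As a minimal sketch, assuming plain linear interpolation of word-level probabilities, the combination could look like the following. The function names, the weight of 0.5 and the example log-probabilities are all hypothetical and are not taken from the paper.

```python
import math


def interpolate_log_probs(logp_a, logp_b, weight_a=0.5):
    """Linearly interpolate two LMs' per-word probabilities.

    logp_a, logp_b: natural-log probabilities that each LM assigns to the
    same word sequence (one value per word).
    weight_a: interpolation weight for the first LM (assumed value; in
    practice it would be tuned on held-out data).
    Returns the per-word log probabilities of the interpolated LM.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(logp_a) != len(logp_b):
        raise ValueError("both LMs must score the same number of words")
    combined = []
    for la, lb in zip(logp_a, logp_b):
        # Interpolate in the probability domain, then return to log domain.
        p = weight_a * math.exp(la) + (1.0 - weight_a) * math.exp(lb)
        combined.append(math.log(p))
    return combined


def sentence_log_prob(word_log_probs):
    """Sum per-word log probabilities to get a hypothesis-level LM score."""
    return sum(word_log_probs)


# Hypothetical per-word scores for one three-word hypothesis.
rtlm_scores = [math.log(0.20), math.log(0.05), math.log(0.50)]
lstm_scores = [math.log(0.40), math.log(0.10), math.log(0.30)]
mixed = interpolate_log_probs(rtlm_scores, lstm_scores, weight_a=0.5)
print(sentence_log_prob(mixed))
```

With equal weights, each interpolated word probability is the mean of the two models' probabilities, so the combined score lies between the two models' scores for every word.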

