Cross-Utterance Language Models with Acoustic Error Sampling

08/19/2020
by G. Sun, et al.

The effective exploitation of richer contextual information in language models (LMs) is a long-standing research problem for automatic speech recognition (ASR). This paper proposes a cross-utterance LM (CULM), which augments the input to a standard long short-term memory (LSTM) LM with a context vector derived from past and future utterances using an extraction network. The extraction network uses another LSTM to encode the surrounding utterances into vectors, which are then integrated into a context vector using either a projection of the LSTM final hidden states or a multi-head self-attentive layer. In addition, an acoustic error sampling technique is proposed to reduce the mismatch between training and test time. It does so by incorporating possible ASR errors into the model training procedure, and can therefore improve the word error rate (WER). Experiments on both the AMI and Switchboard datasets show that CULMs outperform the LSTM LM baseline in WER. In particular, the CULM with a self-attentive layer-based extraction network and acoustic error sampling achieves 0.6 reduction on the Switchboard part and 0.9 of the Eval2000 test set over the respective baselines.
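The idea of exposing the LM to ASR-like errors in its context during training can be illustrated with a small sketch. This is not the authors' procedure: the abstract does not specify how errors are sampled, so the confusion table `CONFUSIONS`, the function `sample_errors`, and the `error_rate` parameter below are all hypothetical, and the probabilities are made up for illustration.

```python
import random

# Hypothetical confusion table: word -> list of (confusable word, probability).
# In a real system such statistics would come from the ASR output; these are made up.
CONFUSIONS = {
    "two": [("to", 0.6), ("too", 0.4)],
    "their": [("there", 1.0)],
}


def sample_errors(words, confusions, error_rate, rng):
    """Return a copy of `words` in which each word that has an entry in
    `confusions` is replaced, with probability `error_rate`, by a
    confusable word drawn according to the listed probabilities."""
    noisy = []
    for word in words:
        alternatives = confusions.get(word)
        if alternatives and rng.random() < error_rate:
            candidates = [alt for alt, _ in alternatives]
            weights = [p for _, p in alternatives]
            noisy.append(rng.choices(candidates, weights=weights, k=1)[0])
        else:
            noisy.append(word)
    return noisy


# Corrupt a reference context utterance before it is fed to the context encoder.
context = "their two dogs ran".split()
noisy_context = sample_errors(context, CONFUSIONS, error_rate=0.5, rng=random.Random(0))
print(noisy_context)
```

Under these assumptions, training on `noisy_context` rather than the clean reference would make the surrounding utterances seen in training look more like the errorful hypotheses available at test time.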

research
02/12/2021

Transformer Language Models with LSTM-based Cross-utterance Information Representation

The effective incorporation of cross-utterance information has the poten...
research
10/27/2022

Contextual-Utterance Training for Automatic Speech Recognition

Recent studies of streaming automatic speech recognition (ASR) recurrent...
research
01/01/2020

Attentive batch normalization for lstm-based acoustic modeling of speech recognition

Batch normalization (BN) is an effective method to accelerate model trai...
research
03/01/2022

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

This study addresses robust automatic speech recognition (ASR) by introd...
research
07/31/2020

Future Vector Enhanced LSTM Language Model for LVCSR

Language models (LM) play an important role in large vocabulary continuo...
research
03/17/2020

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

While the community keeps promoting end-to-end models over conventional ...
research
06/29/2023

Leveraging Cross-Utterance Context For ASR Decoding

While external language models (LMs) are often incorporated into the dec...
