Leveraging Cross-Utterance Context For ASR Decoding

06/29/2023
by Robert Flynn, et al.

While external language models (LMs) are often incorporated into the decoding stage of automated speech recognition systems, these models usually operate with limited context. Cross-utterance information has been shown to be beneficial during second-pass rescoring; however, this approach limits the hypothesis space to what the first-pass LM produces from local information alone. In this work, we investigate the incorporation of long-context transformer LMs for cross-utterance decoding of acoustic models via beam search, and compare against results from n-best rescoring. Results demonstrate that beam search allows for an improved use of cross-utterance context. When evaluating on the long-format dataset AMI, results show 0.7% and 0.3% absolute reductions on the dev and test sets, respectively, compared to the single-utterance setting, with improvements when including up to 500 tokens of prior context. Evaluations are also provided for Tedlium-1, with smaller improvements of around 0.1% absolute.
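The contrast between cross-utterance beam search and n-best rescoring can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy count-based LM (`toy_lm_logprob`) in place of a long-context transformer LM, a simplified decoder that emits one token per step, and hypothetical names and values (`ContextBuffer`, `VOCAB_SIZE`, `lm_weight`, `beam_size`). Only the 500-token context limit is taken from the abstract.

```python
import math
from collections import deque

VOCAB_SIZE = 1000         # hypothetical vocabulary size used for smoothing
MAX_CONTEXT_TOKENS = 500  # the abstract reports gains with up to 500 prior tokens


class ContextBuffer:
    """Rolling window over tokens from previously decoded utterances."""

    def __init__(self, max_tokens=MAX_CONTEXT_TOKENS):
        # A bounded deque silently drops the oldest tokens once full.
        self._tokens = deque(maxlen=max_tokens)

    def extend(self, tokens):
        self._tokens.extend(tokens)

    def as_list(self):
        return list(self._tokens)


def toy_lm_logprob(history, token):
    """Stand-in LM (assumption): add-one smoothed counts over the history."""
    count = sum(1 for t in history if t == token)
    return math.log((count + 1) / (len(history) + VOCAB_SIZE))


def beam_search(step_logprobs, context, beam_size=2, lm_weight=0.5):
    """Toy beam search: the LM sees prior-utterance context at every step.

    step_logprobs is a list of dicts mapping token -> acoustic log-probability.
    Returns the best (tokens, score) pair.
    """
    beams = [((), 0.0)]
    for dist in step_logprobs:
        candidates = []
        for prefix, score in beams:
            history = list(context) + list(prefix)
            for token, am_lp in dist.items():
                lm_lp = toy_lm_logprob(history, token)
                candidates.append(
                    (prefix + (token,), score + am_lp + lm_weight * lm_lp)
                )
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]


def rescore_nbest(nbest, context, lm_weight=0.5):
    """Second-pass rescoring: can only reorder hypotheses already in nbest."""

    def total(item):
        tokens, am_score = item
        history = list(context)
        lm_score = 0.0
        for token in tokens:
            lm_score += toy_lm_logprob(history, token)
            history.append(token)
        return am_score + lm_weight * lm_score

    return max(nbest, key=total)


# Acoustically, the misrecognition "meating" is slightly preferred.
steps = [
    {"the": math.log(0.9), "a": math.log(0.1)},
    {"meating": math.log(0.55), "meeting": math.log(0.45)},
]
buffer = ContextBuffer()
buffer.extend(["the", "meeting", "starts", "the", "meeting"])

no_context_best, _ = beam_search(steps, [])
with_context_best, _ = beam_search(steps, buffer.as_list())
```

In this toy setup, decoding without context follows the acoustic scores and selects "meating", while decoding with the buffered context selects "meeting". If a first pass kept only the single best hypothesis, `rescore_nbest` could not recover "meeting" no matter how much context it had, which is the hypothesis-space limitation the abstract attributes to second-pass rescoring.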


