Context-aware RNNLM Rescoring for Conversational Speech Recognition

11/18/2020
by   Kun Wei, et al.
0

Conversational speech recognition is regarded as a challenging task due to its free-style speaking and long-term contextual dependencies. Prior work has explored the modeling of long-range context through RNNLM rescoring with improved performance. To further take advantage of the persisted nature during a conversation, such as topics or speaker turn, we extend the rescoring procedure to a new context-aware manner. For RNNLM training, we capture the contextual dependencies by concatenating adjacent sentences with various tag words, such as speaker or intention information. For lattice rescoring, the lattice of adjacent sentences are also connected with the first-pass decoded result by tag words. Besides, we also adopt a selective concatenation strategy based on tf-idf, making the best use of contextual similarity to improve transcription performance. Results on four different conversation test sets show that our approach yields up to 13.1 reduction compared with 1st-pass decoding and common lattice-rescoring, respectively.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/21/2020

LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition

LSTM language models (LSTM-LMs) have been proven to be powerful and yiel...
research
05/21/2019

Acoustic-to-Word Models with Conversational Context Information

Conversational context information, higher-level knowledge that spans ac...
research
05/21/2023

CASA-ASR: Context-Aware Speaker-Attributed ASR

Recently, speaker-attributed automatic speech recognition (SA-ASR) has a...
research
11/05/2021

Conversational speech recognition leveraging effective fusion methods for cross-utterance language modeling

Conversational speech normally is embodied with loose syntactic structur...
research
03/18/2021

Contextual Biasing of Language Models for Speech Recognition in Goal-Oriented Conversational Agents

Goal-oriented conversational interfaces are designed to accomplish speci...
research
09/03/2021

Detecting Speaker Personas from Conversational Texts

Personas are useful for dialogue response prediction. However, the perso...
research
03/13/2023

Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model

Despite the success of deep neural network (DNN) on sequential data (i.e...

Please sign up or login with your details

Forgot password? Click here to reset