Joint Contextual Modeling for ASR Correction and Language Understanding

by Yue Weng, et al.

The quality of automatic speech recognition (ASR) is critical to dialogue systems, as ASR errors propagate to and directly impact downstream tasks such as language understanding (LU). In this paper, we propose multi-task neural approaches that perform contextual language correction on ASR outputs jointly with LU, improving the performance of both tasks simultaneously. To measure the effectiveness of this approach, we use a public benchmark, the 2nd Dialogue State Tracking Challenge (DSTC2) corpus. As a baseline approach, we trained task-specific Statistical Language Models (SLM) and fine-tuned a state-of-the-art Generalized Pre-training (GPT) language model to re-rank the n-best ASR hypotheses, followed by a model to identify the dialogue act and slots. (i) We further trained ranker models using GPT and hierarchical CNN-RNN models with discriminatory losses to select the best output from the n-best hypotheses, and extended these rankers to first select the best ASR output and then identify the dialogue act and slots in an end-to-end fashion. (ii) We also propose a novel joint ASR error correction and LU model, a word confusion pointer network (WCN-Ptr) with multi-head self-attention on top, which consumes the word confusions populated from the n-best lists. We show that the error rates of an off-the-shelf ASR system and the downstream LU system can be reduced significantly, by ~14%, with joint models trained using small amounts of in-domain data.
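The baseline described above re-ranks the n-best ASR hypotheses with a task-specific statistical language model. A minimal sketch of that idea, assuming a toy add-alpha smoothed bigram SLM and hypothetical `asr_score`/`words` fields on each hypothesis (the paper's actual models and features are not shown here):

```python
import math
from collections import Counter

def train_bigram_lm(corpus, alpha=0.1):
    """Train an add-alpha smoothed bigram LM from tokenized in-domain sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)

    def logprob(sent):
        # Sum of smoothed bigram log-probabilities over the sentence.
        toks = ["<s>"] + sent + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab))
            for a, b in zip(toks, toks[1:])
        )

    return logprob

def rerank(nbest, logprob, lm_weight=1.0):
    """Pick the hypothesis maximizing ASR score plus weighted LM log-probability."""
    return max(nbest, key=lambda h: h["asr_score"] + lm_weight * logprob(h["words"]))

# Tiny illustrative in-domain corpus (restaurant-search style, as in DSTC2).
corpus = [
    "i want a cheap restaurant".split(),
    "i want a cheap restaurant in the north".split(),
    "what is the phone number".split(),
]
lm = train_bigram_lm(corpus)

# Hypothetical 2-best list: the acoustically top hypothesis contains an ASR error.
nbest = [
    {"words": "i want a cheap rest front".split(), "asr_score": -1.0},
    {"words": "i want a cheap restaurant".split(), "asr_score": -1.2},
]
best = rerank(nbest, lm)
print(" ".join(best["words"]))  # the in-domain LM promotes the second hypothesis
```

The joint models in the paper go further by conditioning on dialogue context and sharing representations with the LU task, rather than scoring each hypothesis in isolation as this sketch does.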




N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space

Error correction models form an important part of Automatic Speech Recog...

Remember the context! ASR slot error correction through memorization

Accurate recognition of slot values such as domain specific words or nam...

Clinical Dialogue Transcription Error Correction using Seq2Seq Models

Good communication is critical to good healthcare. Clinical dialogue is ...

Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding

Spoken Language Understanding (SLU) converts hypotheses from automatic s...

ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling

Automatic Speech Recognition (ASR) robustness toward slot entities are c...

Cross-sentence Neural Language Models for Conversational Speech Recognition

An important research direction in automatic speech recognition (ASR) ha...

Device Directedness with Contextual Cues for Spoken Dialog Systems

In this work, we define barge-in verification as a supervised learning t...
