N-Best ASR Transformer: Enhancing SLU Performance using Multiple ASR Hypotheses

06/11/2021
by Karthik Ganesan, et al.

Spoken Language Understanding (SLU) systems parse speech into semantic structures such as dialog acts and slots. This involves using an Automatic Speech Recognizer (ASR) to transcribe speech into multiple text alternatives (hypotheses). Transcription errors, which are common in ASR output, negatively impact downstream SLU performance. Approaches to mitigating such errors use richer information from the ASR, either in the form of N-best hypotheses or word lattices. We hypothesize that transformer models learn better with a simpler utterance representation: the concatenation of the N-best ASR alternatives, with each alternative separated by the special delimiter [SEP]. We test this hypothesis by feeding concatenated N-best ASR alternatives to transformer encoder models, namely BERT and XLM-RoBERTa, and achieve performance equivalent to the prior state-of-the-art model on the DSTC2 dataset. We also show that our approach significantly outperforms the prior state of the art in the low-data regime. Additionally, this methodology is accessible to users of third-party ASR APIs that do not provide word-lattice information.
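To make the proposed input format concrete, below is a minimal sketch (not the authors' released code) of concatenating N-best ASR alternatives with the [SEP] delimiter and encoding the result with a BERT model from the Hugging Face transformers library. The example hypotheses, the bert-base-uncased checkpoint, and the sequence-length setting are illustrative assumptions; in the paper the resulting encoding feeds a dialog-act and slot prediction head trained on DSTC2.

```python
# Minimal sketch of the [SEP]-concatenated N-best input representation.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Hypothetical 3-best ASR output for a single utterance.
n_best = [
    "i want a cheap restaurant in the north",
    "i want a cheap restaurant in the north part",
    "i want to eat a restaurant in the north",
]

# Concatenate the alternatives, separating them with the [SEP] delimiter.
utterance = " [SEP] ".join(n_best)

# Tokenize; the tokenizer also adds the leading [CLS] and trailing [SEP].
inputs = tokenizer(utterance, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

# The [CLS] representation can then feed a downstream SLU classification head.
cls_embedding = outputs.last_hidden_state[:, 0]
print(cls_embedding.shape)  # torch.Size([1, 768])
```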


Related research

05/24/2020
Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding
Spoken Language Understanding (SLU) converts hypotheses from automatic s...

11/19/2021
Lattention: Lattice-attention in ASR rescoring
Lattices form a compact representation of multiple hypotheses generated ...

11/16/2021
Attention-based Multi-hypothesis Fusion for Speech Summarization
Speech summarization, which generates a text summary from speech, can be...

02/03/2020
Modeling ASR Ambiguity for Dialogue State Tracking Using Word Confusion Networks
Spoken dialogue systems typically use a list of top-N ASR hypotheses for...

04/13/2020
Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?
Automatic Speech Recognition (ASR) systems introduce word errors, which ...

12/07/2020
Using multiple ASR hypotheses to boost i18n NLU performance
Current voice assistants typically use the best hypothesis yielded by th...

01/15/2023
Improving Noise Robustness for Spoken Content Retrieval using Semi-supervised ASR and N-best Transcripts for BERT-based Ranking Models
BERT-based re-ranking and dense retrieval (DR) systems have been shown t...
