Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis

03/22/2022
by Zexun Wang, et al.

Building Spoken Language Understanding (SLU) systems that are robust to Automatic Speech Recognition (ASR) errors is essential for voice-enabled virtual assistants. Because most ASR errors are caused by phonetic confusion between similar-sounding expressions, the phoneme sequence of the speech can complement the ASR hypothesis and make SLU more robust. This paper proposes a novel model with Cross Attention for SLU (denoted CASLU). The cross attention block is designed to capture fine-grained interactions between phoneme and word embeddings, so that the joint representation captures the phonetic and semantic features of the input simultaneously and mitigates ASR errors in downstream natural language understanding (NLU) tasks. Extensive experiments on three datasets show the effectiveness and competitiveness of the approach. We also validate the universality of CASLU and demonstrate that it is complementary to other robust SLU techniques.
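
The abstract does not spell out the internals of the cross attention block, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes plain scaled dot-product attention with no learned projections, attention in both directions (words over phonemes and phonemes over words), mean pooling, and concatenation into a joint vector. The function names (`cross_attend`, `joint_representation`) and the toy embeddings are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query vector becomes a
    softmax-weighted sum of the vectors in keys_values.
    Assumption: no learned projections; keys and values are the same vectors."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in keys_values])
        context = [
            sum(w * kv[i] for w, kv in zip(weights, keys_values))
            for i in range(dim)
        ]
        attended.append(context)
    return attended


def mean_pool(sequence):
    """Average a sequence of vectors into a single vector."""
    n = len(sequence)
    return [sum(v[i] for v in sequence) / n for i in range(len(sequence[0]))]


def joint_representation(word_emb, phoneme_emb):
    """Hypothetical fusion: words attend over phonemes, phonemes attend
    over words, then both pooled views are concatenated."""
    word_view = cross_attend(word_emb, phoneme_emb)
    phoneme_view = cross_attend(phoneme_emb, word_emb)
    return mean_pool(word_view) + mean_pool(phoneme_view)


# Toy 2-dimensional embeddings (made up for illustration only).
word_emb = [[1.0, 0.0], [0.0, 1.0]]
phoneme_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
joint = joint_representation(word_emb, phoneme_emb)
print(len(joint))
```

In a sketch like this, the joint vector would then feed a downstream NLU classifier; a trained model would normally add learned query, key, and value projections and multiple heads.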


research
05/02/2022

Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding

Spoken language understanding (SLU) is an essential task for machines to...
research
02/12/2021

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

Spoken language understanding (SLU) systems extract transcriptions, as w...
research
08/30/2021

ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding

Language understanding in speech-based systems have attracted much atten...
research
10/31/2022

Design Considerations For Hypothesis Rejection Modules In Spoken Language Understanding Systems

Spoken Language Understanding (SLU) systems typically consist of a set o...
research
11/03/2020

Warped Language Models for Noise Robust Language Understanding

Masked Language Models (MLM) are self-supervised neural networks trained...
research
04/11/2022

Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data

A Virtual Patient (VP) is a powerful tool for training medical students ...
research
05/26/2017

ASR error management for improving spoken language understanding

This paper addresses the problem of automatic speech recognition (ASR) e...
