From Audio to Semantics: Approaches to end-to-end spoken language understanding

09/24/2018
by Parisa Haghani, et al.

Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments. These modules are typically optimized independently. In this paper, we formulate audio to semantic understanding as a sequence-to-sequence problem [1]. We propose and compare various encoder-decoder based approaches that optimize both modules jointly, in an end-to-end manner. Evaluations on a real-world task show that 1) having an intermediate text representation is crucial for the quality of the predicted semantics, especially the intent arguments and 2) jointly optimizing the full system improves overall accuracy of prediction. Compared to independently trained models, our best jointly trained model achieves similar domain and intent prediction F1 scores, but improves argument word error rate by 18
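To make the sequence-to-sequence framing concrete, here is a minimal sketch of how a set of domain, intent, and arguments could be flattened into a single target token sequence, together with a plain word error rate computed over argument words. Everything in it is an assumption for illustration: the `Semantics` container, the tag format produced by `serialize`, and the example labels are hypothetical and not taken from the paper, and `word_error_rate` is the standard word-level edit distance divided by reference length, which may differ from the exact argument metric the authors report.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Semantics:
    """Hypothetical container for the three outputs named in the abstract."""
    domain: str
    intent: str
    arguments: Dict[str, str] = field(default_factory=dict)


def serialize(sem: Semantics) -> List[str]:
    """Flatten semantics into one target token sequence (assumed tag format)."""
    tokens = [f"<DOMAIN:{sem.domain}>", f"<INTENT:{sem.intent}>"]
    for name, value in sem.arguments.items():
        tokens.append(f"<{name}>")      # opening tag for this argument
        tokens.extend(value.split())    # argument words, one token each
        tokens.append(f"</{name}>")     # closing tag for this argument
    return tokens


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word dropped
                            curr[j - 1] + 1,      # extra hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    sem = Semantics("MEDIA", "PLAY_SONG", {"song_name": "yellow submarine"})
    print(serialize(sem))
    print(word_error_rate("yellow submarine".split(), "yellow submarines".split()))
```

With a target sequence of this kind, one decoder can emit domain, intent, and arguments in a single pass, while argument quality can still be scored word by word against the reference.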
