End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting

07/17/2022
by Thierry Desot, et al.

Spoken Language Understanding (SLU) is a core task in most human-machine interaction systems. With the emergence of smart homes, smartphones and smart speakers, SLU has become a key technology for the industry. In a classical SLU approach, an Automatic Speech Recognition (ASR) module transcribes the speech signal into a textual representation, from which a Natural Language Understanding (NLU) module extracts semantic information. Recently, End-to-End SLU (E2E SLU) based on Deep Neural Networks has gained momentum because it benefits from the joint optimization of the ASR and NLU parts, which limits the error cascading inherent in the pipeline architecture. However, little is known about the actual linguistic properties used by E2E models to predict concepts and intents from speech input. In this paper, we present a study identifying the signal features and other linguistic properties used by an E2E model to perform the SLU task. The study is carried out in the application domain of a smart home that has to handle non-English (here, French) voice commands. The results show that good E2E SLU performance does not always require perfect ASR capability. Furthermore, the results show that the E2E model handles background noise and syntactic variation better than the pipeline model. Finally, a finer-grained analysis suggests that the E2E model uses the pitch information of the input signal to identify voice command concepts. The results and methodology outlined in this paper provide a springboard for further analyses of E2E models in speech processing.
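
The error cascading of the pipeline architecture mentioned above can be illustrated with a minimal sketch. Everything in it is hypothetical and is not the authors' system: the audio is stood in for by a string label, `perfect_asr` and `noisy_asr` are lookup tables, `toy_nlu` is a keyword rule set, and the French command "allume la lumière" (turn on the light) is an invented example. The only point is that the NLU stage sees nothing but the transcript, so an ASR error on one word removes the intent.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for illustration only.
Audio = str                  # a label standing in for a speech signal
Transcript = str
Semantics = Dict[str, str]   # e.g. {"intent": ..., "device": ...}


def make_pipeline(
    asr: Callable[[Audio], Transcript],
    nlu: Callable[[Transcript], Semantics],
) -> Callable[[Audio], Semantics]:
    """Compose ASR and NLU; the NLU stage only ever sees the transcript."""
    def slu(audio: Audio) -> Semantics:
        return nlu(asr(audio))
    return slu


def toy_nlu(transcript: Transcript) -> Semantics:
    """Toy keyword rules mapping words to an intent and a concept."""
    words = transcript.lower().split()
    result: Semantics = {}
    if "allume" in words:
        result["intent"] = "turn_on"
    if "lumière" in words:
        result["device"] = "light"
    return result


def perfect_asr(audio: Audio) -> Transcript:
    """Assumed error-free transcription of the single toy utterance."""
    return {"utt1": "allume la lumière"}[audio]


def noisy_asr(audio: Audio) -> Transcript:
    """Assumed transcription with one misrecognized word ("allume" -> "a lu")."""
    return {"utt1": "a lu la lumière"}[audio]


clean = make_pipeline(perfect_asr, toy_nlu)("utt1")
degraded = make_pipeline(noisy_asr, toy_nlu)("utt1")
print(clean)
print(degraded)
```

An E2E model replaces the composed `asr` and `nlu` functions with a single mapping from audio to semantics trained jointly, so there is no intermediate transcript whose errors the second stage is forced to inherit.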


