End-to-End Spoken Language Understanding for Generalized Voice Assistants

06/16/2021
by   Michael Saxon, et al.
14

End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model. Previous work in this area has focused on targeted tasks in fixed domains, where the output semantic structure is assumed a priori and the input speech is of limited complexity. In this work we present our approach to developing an E2E model for generalized SLU in commercial voice assistants (VAs). We propose a fully differentiable, transformer-based, hierarchical system that can be pretrained at both the ASR and NLU levels. This is then fine-tuned on both transcription and semantic classification losses to handle a diverse set of intent and argument combinations. This leads to an SLU system that achieves significant improvements over baselines on a complex internal generalized VA dataset with a 43 the popular Fluent Speech Commands dataset. We further evaluate our model on a hard test set, exclusively containing slot arguments unseen in training, and demonstrate a nearly 20 truly demanding VA scenarios.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/11/2021

Speech-language Pre-training for End-to-end Spoken Language Understanding

End-to-end (E2E) spoken language understanding (SLU) can infer semantics...
research
02/12/2021

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

Spoken language understanding (SLU) systems extract transcriptions, as w...
research
08/14/2020

Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces

We consider the problem of spoken language understanding (SLU) of extrac...
research
06/10/2019

Automated Curriculum Learning for Turn-level Spoken Language Understanding with Weak Supervision

We propose a learning approach for turn-level spoken language understand...
research
08/06/2020

Semantic Complexity in End-to-End Spoken Language Understanding

End-to-end spoken language understanding (SLU) models are a class of mod...
research
08/08/2020

Deep F-measure Maximization for End-to-End Speech Understanding

Spoken language understanding (SLU) datasets, like many other machine le...
research
11/11/2020

Towards Semi-Supervised Semantics Understanding from Speech

Much recent work on Spoken Language Understanding (SLU) falls short in a...

Please sign up or login with your details

Forgot password? Click here to reset