Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces

by   Milind Rao, et al.

We consider the problem of spoken language understanding (SLU) of extracting natural language intents and associated slot arguments or named entities from speech that is primarily directed at voice assistants. Such a system subsumes both automatic speech recognition (ASR) as well as natural language understanding (NLU). An end-to-end joint SLU model can be built to a required specification opening up the opportunity to deploy on hardware constrained scenarios like devices enabling voice assistants to work offline, in a privacy preserving manner, whilst also reducing server costs. We first present models that extract utterance intent directly from speech without intermediate text output. We then present a compositional model, which generates the transcript using the Listen Attend Spell ASR system and then extracts interpretation using a neural NLU model. Finally, we contrast these methods to a jointly trained end-to-end joint SLU model, consisting of ASR and NLU subsystems which are connected by a neural network based interface instead of text, that produces transcripts as well as NLU interpretation. We show that the jointly trained model shows improvements to ASR incorporating semantic information from NLU and also improves NLU by exposing it to ASR confusion encoded in the hidden layer.


page 1

page 2

page 3

page 4


Deliberation Model for On-Device Spoken Language Understanding

We propose a novel deliberation-based approach to end-to-end (E2E) spoke...

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

Spoken language understanding (SLU) systems extract transcriptions, as w...

Exploring Transfer Learning For End-to-End Spoken Language Understanding

Voice Assistants such as Alexa, Siri, and Google Assistant typically use...

End-to-End Spoken Language Understanding for Generalized Voice Assistants

End-to-end (E2E) spoken language understanding (SLU) systems predict utt...

End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting

Spoken Language Understanding (SLU) is a core task in most human-machine...

From Audio to Semantics: Approaches to end-to-end spoken language understanding

Conventional spoken language understanding systems consist of two main c...

Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target

Spoken Language Understanding (SLU) is a task that aims to extract seman...

Please sign up or login with your details

Forgot password? Click here to reset