Deliberation Model for On-Device Spoken Language Understanding

04/04/2022
by   Duc Le, et al.

We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both ASR's text and audio embeddings. By formulating E2E SLU as a generalized decoder, our system is able to support complex compositional semantic structures. Furthermore, the sharing of parameters between ASR and NLU makes the system especially suitable for resource-constrained (on-device) environments. Our proposed approach consistently outperforms strong pipeline NLU baselines by 0.82 [...] version of the TOPv2 dataset. We demonstrate that the fusion of text and audio features, coupled with the system's ability to rewrite the first-pass hypothesis, makes our approach more robust to ASR errors. Finally, we show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training, but more work is required to make text-to-speech (TTS) a viable solution for scaling up E2E SLU.
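The fusion of text and audio features described above can be illustrated with a small sketch. The abstract only states that the second pass conditions on both the first-pass text and the audio embeddings; the use of scaled dot-product attention, the concatenation of the two context vectors, and every name below (`attend`, `fuse_text_and_audio`, `DIM`, the random stand-in embeddings) are assumptions made for illustration, not the authors' architecture.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query vector over a list of vectors.

    Returns the attention-weighted context vector and the weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * m for q, m in zip(query, vec)) / scale for vec in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    context = [sum(w * vec[i] for w, vec in zip(weights, memory)) for i in range(dim)]
    return context, weights


def fuse_text_and_audio(decoder_state, text_emb, audio_emb):
    """Hypothetical fusion step: attend separately to the first-pass text
    embeddings and the audio embeddings, then concatenate both contexts."""
    text_ctx, text_w = attend(decoder_state, text_emb)
    audio_ctx, audio_w = attend(decoder_state, audio_emb)
    return text_ctx + audio_ctx, text_w, audio_w


# Random stand-ins for real model outputs (assumed shapes, not from the paper).
random.seed(0)
DIM = 8


def rand_vecs(n):
    return [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n)]


text_emb = rand_vecs(5)          # 5 tokens of the first-pass ASR hypothesis
audio_emb = rand_vecs(20)        # 20 audio encoder frames
decoder_state = rand_vecs(1)[0]  # one second-pass decoding step

fused, text_w, audio_w = fuse_text_and_audio(decoder_state, text_emb, audio_emb)
print(len(fused))  # 16: text context (8) + audio context (8)
```

In this sketch the second-pass decoder would consume `fused` at each step, so a misrecognized word in the first-pass text can in principle be offset by evidence from the audio frames.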


