Two-Pass Low Latency End-to-End Spoken Language Understanding

07/14/2022
by Siddhant Arora, et al.

End-to-end (E2E) models are becoming increasingly popular for spoken language understanding (SLU) systems and are beginning to achieve performance competitive with pipeline-based approaches. However, recent work has shown that these models struggle to generalize to new phrasings of the same intent, indicating that the models do not understand the semantic content of the given utterance. In this work, we incorporated language models pre-trained on unlabeled text data into E2E-SLU frameworks to build strong semantic representations. Incorporating both semantic and acoustic information can increase inference time, leading to high latency when deployed in applications such as voice assistants. We therefore developed a 2-pass SLU system that makes a low-latency prediction in the first pass, using acoustic information from a few seconds of the audio, and a higher-quality prediction in the second pass by combining semantic and acoustic representations. We take inspiration from prior work on 2-pass end-to-end speech recognition systems that attend to both the audio and the first-pass hypothesis using a deliberation network. The proposed 2-pass SLU system outperforms the acoustic-based SLU model on the Fluent Speech Commands Challenge Set and the SLURP dataset and reduces latency, thus improving user experience. Our code and models are publicly available as part of the ESPnet-SLU toolkit.
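
The two-pass idea described above can be illustrated with a small toy sketch. This is not the authors' implementation or the ESPnet-SLU API: the class name `TwoPassSLU`, its methods, the random weights, and the stand-in "hypothesis embeddings" (which would come from a pre-trained language model in a real system) are all hypothetical, chosen only to show an early acoustic-only prediction followed by a second pass that attends to both the audio and a first-pass hypothesis.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    return [dot(row, v) for row in W]

def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def attend(query, keys):
    """Scaled dot-product attention: weighted average of `keys` scored by `query`."""
    weights = softmax([dot(query, k) / math.sqrt(len(query)) for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(query))]

class TwoPassSLU:
    """Toy model with random (untrained) weights; every name here is hypothetical."""

    def __init__(self, dim, n_intents, seed=0):
        rng = random.Random(seed)
        rand = lambda r, c: [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)]
        self.first_head = rand(n_intents, dim)       # acoustic-only classifier
        self.second_head = rand(n_intents, 2 * dim)  # classifier over fused features
        self.query = [rng.uniform(-1, 1) for _ in range(dim)]  # deliberation query

    def first_pass(self, frames, n_prefix):
        # Low-latency pass: look only at the first `n_prefix` acoustic frames.
        pooled = mean_pool(frames[:n_prefix])
        return softmax(matvec(self.first_head, pooled))

    def second_pass(self, frames, hyp_embeddings):
        # Deliberation-style pass: attend to the full audio and to the
        # first-pass hypothesis, then classify the concatenated contexts.
        acoustic_ctx = attend(self.query, frames)
        semantic_ctx = attend(self.query, hyp_embeddings)
        return softmax(matvec(self.second_head, acoustic_ctx + semantic_ctx))

# Synthetic inputs: 10 acoustic frames and 3 hypothesis-token embeddings, dim 4.
rng = random.Random(1)
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
hyp = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]

model = TwoPassSLU(dim=4, n_intents=3)
early = model.first_pass(frames, n_prefix=3)  # available after a short prefix
final = model.second_pass(frames, hyp)        # refined once the hypothesis exists
print("first pass :", early)
print("second pass:", final)
```

In this sketch the first pass can return as soon as a short audio prefix has arrived, while the second pass waits for the full audio and the hypothesis. That mirrors the latency and quality trade-off the abstract describes, under the simplifying assumptions stated above.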

