A Streaming End-to-End Framework For Spoken Language Understanding

05/20/2021
by Nihal Potdar, et al.

End-to-end spoken language understanding (SLU) has recently attracted increasing interest. Compared to the conventional tandem approach, which combines speech recognition and language understanding as separate modules, the end-to-end approach extracts users' intentions directly from the speech signal, enabling joint optimization and low latency. Such an approach, however, is typically designed to process one intention at a time, so users must take multiple turns to fulfill their requirements when interacting with a dialogue system. In this paper, we propose a streaming end-to-end framework that can process multiple intentions in an online and incremental way. The backbone of our framework is a unidirectional RNN trained with the connectionist temporal classification (CTC) criterion. With this design, an intention can be identified as soon as sufficient evidence has accumulated, and multiple intentions can be identified sequentially. We evaluate our solution on the Fluent Speech Commands (FSC) dataset, where the intent detection accuracy is about 97% … performance of the state-of-the-art non-streaming models, but is achieved in an online and incremental way. We also apply our model to a keyword spotting task using the Google Speech Commands dataset, and the results are also highly promising.
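The key mechanism described above, identifying an intention as soon as enough evidence has accumulated and then continuing to listen for the next one, maps naturally onto greedy CTC decoding applied frame by frame. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the unidirectional RNN already produces a per-frame probability vector over intent labels plus a CTC blank at index 0, and the intent names and posterior values are hypothetical. A label is emitted the first time a frame's most probable symbol is a non-blank label that differs from the previous frame's, which is the standard CTC rule of merging repeats and dropping blanks.

```python
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

BLANK = 0  # assumption: index 0 of every frame is the CTC blank symbol


class StreamingCTCDecoder:
    """Greedy CTC decoding applied one frame at a time.

    A label is emitted as soon as a frame's most probable symbol is a
    non-blank label that differs from the previous frame's most probable
    symbol (the standard CTC rule: merge repeats, then drop blanks).
    """

    def __init__(self, labels: Sequence[str], blank: int = BLANK) -> None:
        self.labels: List[str] = list(labels)
        self.blank = blank
        self.prev = blank  # most probable symbol of the previous frame

    def step(self, frame_probs: Sequence[float]) -> Optional[str]:
        # Most probable symbol in the current frame (first one wins on ties).
        best = max(range(len(frame_probs)), key=lambda k: frame_probs[k])
        emitted = None
        if best != self.blank and best != self.prev:
            emitted = self.labels[best]
        self.prev = best
        return emitted


def decode_stream(
    frames: Iterable[Sequence[float]], labels: Sequence[str]
) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, label) pairs as soon as each label is detected."""
    decoder = StreamingCTCDecoder(labels)
    for t, frame in enumerate(frames):
        label = decoder.step(frame)
        if label is not None:
            yield t, label


if __name__ == "__main__":
    # Hypothetical intent labels; index 0 is the blank.
    labels = ["<blank>", "activate_lights", "increase_volume"]
    # Hypothetical per-frame posteriors standing in for unidirectional RNN outputs.
    frames = [
        [0.9, 0.05, 0.05],  # blank
        [0.2, 0.7, 0.1],    # first intent appears: emitted here
        [0.3, 0.6, 0.1],    # repeat of the same symbol: merged, not emitted
        [0.8, 0.1, 0.1],    # blank
        [0.1, 0.1, 0.8],    # second intent: emitted here
        [0.9, 0.05, 0.05],  # blank
    ]
    for t, label in decode_stream(frames, labels):
        print(t, label)
```

Run as a script, this prints `1 activate_lights` followed by `4 increase_volume`: each intent is reported at the frame where it first dominates, without waiting for the end of the utterance. Because every decision depends only on the current and previous frames, the decoder is compatible with a unidirectional (causal) encoder; a bidirectional encoder would need the full input before scoring any frame, which would rule out this kind of incremental emission.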
