Sequential End-to-End Intent and Slot Label Classification and Localization

06/08/2021
by   Yiran Cao, et al.

Human-computer interaction (HCI) is significantly impacted by delayed responses from a spoken dialogue system. Hence, end-to-end (e2e) spoken language understanding (SLU) solutions have recently been proposed to decrease latency. Such approaches extract semantic information directly from the speech signal, thus bypassing the need for a transcript from an automatic speech recognition (ASR) system. In this paper, we propose a compact e2e SLU architecture for streaming scenarios, where chunks of the speech signal are processed continuously to predict intent and slot values. Our model is based on a 3D convolutional neural network (3D-CNN) and a unidirectional long short-term memory (LSTM) network. We compare the performance of two alignment-free losses: the connectionist temporal classification (CTC) method and its adapted version, connectionist temporal localization (CTL). The latter performs not only the classification but also the localization of sequential audio events. The proposed solution is evaluated on the Fluent Speech Commands dataset, and the results show our model's ability to process incoming speech signals, reaching accuracy as high as 98.97% on single-label classification and as high as 95.69% on two-label prediction.
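To make the streaming use of CTC outputs concrete, the following is a minimal sketch of best-path (greedy) CTC decoding applied chunk by chunk. It is not the authors' implementation: the class name `StreamingCTCDecoder`, the choice of index 0 for the blank symbol, and the toy label set are all assumptions made for illustration. The only part taken as given is the standard CTC collapsing rule: take the most likely label per frame, merge consecutive repeats, and drop blanks.

```python
class StreamingCTCDecoder:
    """Greedy CTC decoder that keeps state across chunks (illustrative sketch)."""

    def __init__(self, blank=0):
        # Assumption: index 0 is the CTC blank symbol.
        self.blank = blank
        self.prev = blank   # best label of the last frame seen so far
        self.labels = []    # all labels emitted since the start of the stream

    def feed(self, chunk_scores):
        """Consume one chunk of per-frame label scores.

        chunk_scores is a list of frames, each frame a list of scores
        with one entry per label. Returns the labels newly emitted by
        this chunk.
        """
        new_labels = []
        for scores in chunk_scores:
            # Most likely label for this frame.
            best = max(range(len(scores)), key=scores.__getitem__)
            # Emit only when the label is not blank and differs from the
            # previous frame, so a repeat spanning a chunk boundary is
            # still merged.
            if best != self.blank and best != self.prev:
                new_labels.append(best)
            self.prev = best
        self.labels.extend(new_labels)
        return new_labels


# Hypothetical label set: 0 = blank, 1 = an intent label, 2 = a slot label.
decoder = StreamingCTCDecoder()
first = decoder.feed([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]])
second = decoder.feed([[0.1, 0.6, 0.3], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])
print(first, second, decoder.labels)
```

Because the decoder remembers the last frame's label, label 1 is emitted once even though it continues into the second chunk. Each `feed` call needs only the current chunk, which is what allows a prediction to be reported before the utterance ends.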


