Speech Model Pre-training for End-to-End Spoken Language Understanding

04/07/2019
by Loren Lugosch, et al.

Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model. Achieving high accuracy with these end-to-end models without a large amount of training data is difficult. We propose a method to reduce the data requirements of end-to-end SLU in which the model is first pre-trained to predict words and phonemes, thus learning good features for SLU. We introduce a new SLU dataset, Fluent Speech Commands, and show that our method improves performance both when the full dataset is used for training and when only a small subset is used. We also describe preliminary experiments to gauge the model's ability to generalize to new phrases not heard during training.


research
08/05/2020

Improving End-to-End Speech-to-Intent Classification with Reptile

End-to-end spoken language understanding (SLU) systems have many advanta...
research
02/26/2022

Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems

The lack of speech data annotated with labels required for spoken langua...
research
02/14/2020

A Data Efficient End-To-End Spoken Language Understanding Architecture

End-to-end architectures have been recently proposed for spoken language...
research
05/20/2021

A Streaming End-to-End Framework For Spoken Language Understanding

End-to-end spoken language understanding (SLU) has recently attracted in...
research
04/08/2022

A Study of Different Ways to Use The Conformer Model For Spoken Language Understanding

SLU combines ASR and NLU capabilities to accomplish speech-to-intent und...
research
08/06/2020

Semantic Complexity in End-to-End Spoken Language Understanding

End-to-end spoken language understanding (SLU) models are a class of mod...
research
04/07/2022

Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model

In spoken language understanding (SLU), what the user says is converted ...
