Speech-language Pre-training for End-to-end Spoken Language Understanding

02/11/2021
by Yao Qian, et al.

End-to-end (E2E) spoken language understanding (SLU) can infer semantics directly from the speech signal without cascading an automatic speech recognizer (ASR) with a natural language understanding (NLU) module. However, paired utterance recordings and corresponding semantics may not always be available or sufficient to train an E2E SLU model in a real production environment. In this paper, we propose to unify a well-optimized E2E ASR encoder (speech) and a pre-trained language model encoder (language) into a transformer decoder. The unified speech-language pre-trained model (SLP) is continually enhanced on limited labeled data from a target domain using a conditional masked language model (MLM) objective, and can thus effectively generate a sequence of intent, slot type, and slot value for a given input utterance at inference time. Experimental results on two public corpora show that our approach to E2E SLU is superior to the conventional cascaded method. It also outperforms the present state-of-the-art approaches to E2E SLU with much less paired data.
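To make the output format and the training objective more concrete, the following is a minimal sketch of how an intent and its slots might be flattened into one target sequence and then partially masked for a conditional MLM-style objective. The abstract does not specify the serialization order, the tokenization, or the masking scheme, so the function names `serialize_frame` and `mask_targets`, the `[MASK]` symbol, and the example frame are all assumptions made for illustration. In the actual model the masked positions would be predicted conditioned on the speech encoder output; that part is not shown here.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def serialize_frame(intent, slots):
    """Flatten an intent and (slot_type, slot_value) pairs into one token list.

    The ordering intent -> slot type -> slot value follows the abstract's
    description; whitespace tokenization is an assumption.
    """
    tokens = [intent]
    for slot_type, slot_value in slots:
        tokens.append(slot_type)
        tokens.extend(slot_value.split())
    return tokens


def mask_targets(tokens, num_masked, rng):
    """Replace num_masked randomly chosen positions with MASK.

    Returns the corrupted sequence and a dict mapping each masked position
    to its original token. A conditional MLM loss would be computed only
    over those positions.
    """
    positions = sorted(rng.sample(range(len(tokens)), num_masked))
    corrupted = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]
        corrupted[pos] = MASK
    return corrupted, labels


# Hypothetical example frame; not drawn from either corpus in the paper.
target = serialize_frame("set_alarm", [("time", "seven am"), ("day", "tomorrow")])
corrupted, labels = mask_targets(target, num_masked=2, rng=random.Random(0))
print(target)
print(corrupted)
print(labels)
```

Passing a seeded `random.Random` instance keeps the masking reproducible, which is convenient when debugging a training pipeline.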

research
04/21/2021

Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning

In the traditional cascading architecture for spoken language understand...
research
05/29/2023

Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target

Spoken Language Understanding (SLU) is a task that aims to extract seman...
research
06/16/2021

End-to-End Spoken Language Understanding for Generalized Voice Assistants

End-to-end (E2E) spoken language understanding (SLU) systems predict utt...
research
04/07/2021

Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

A major focus of recent research in spoken language understanding (SLU) ...
research
10/26/2020

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining

Much recent work on Spoken Language Understanding (SLU) is limited in at...
research
09/24/2018

From Audio to Semantics: Approaches to end-to-end spoken language understanding

Conventional spoken language understanding systems consist of two main c...
research
05/20/2023

Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding

The pre-trained speech encoder wav2vec 2.0 performs very well on various...
