Semi-Supervised Speech-Language Joint Pre-Training for Spoken Language Understanding

10/05/2020
by Yu-An Chung, et al.

Spoken language understanding (SLU) requires a model to analyze input acoustic signals to understand their linguistic content and make predictions. To boost model performance, various pre-training methods have been proposed that exploit large-scale unlabeled text and speech data. However, the inherent disparities between the two modalities call for analyzing them jointly. In this paper, we propose AlignNet, a novel semi-supervised learning method that jointly pre-trains the speech and language modules. Beyond self-supervised masked language modeling within each individual module, AlignNet aligns representations from paired speech and transcripts in a shared latent semantic space. As a result, during fine-tuning the speech module alone can produce representations carrying both acoustic information and contextual semantic knowledge. Experimental results verify the effectiveness of our approach on various SLU tasks. For example, AlignNet improves the previous state-of-the-art accuracy on the Spoken SQuAD dataset by 6.2%.
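
The abstract describes the pre-training objective only at a high level: masked prediction within each modality plus a cross-modal alignment term over paired speech and transcripts. Below is a minimal sketch of what such a combined loss could look like. The encoder sizes, the L1 masked-frame reconstruction for speech, the cross-entropy masked-token loss for text, and the mean-pooled MSE alignment term are all illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of an AlignNet-style joint pre-training objective.
# Everything concrete here (architectures, loss choices) is assumed for
# illustration; only the overall structure (two MLM-style losses plus a
# cross-modal alignment loss) comes from the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpeechEncoder(nn.Module):
    """Toy speech module: a transformer over acoustic frames with a
    reconstruction head for masked-frame prediction."""
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.recon_head = nn.Linear(hidden, feat_dim)

    def forward(self, frames):                     # frames: (B, T, feat_dim)
        hidden = self.encoder(self.proj(frames))   # (B, T, hidden)
        return hidden, self.recon_head(hidden)


class TextEncoder(nn.Module):
    """Toy language module: a transformer over tokens with an MLM head."""
    def __init__(self, vocab=30000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mlm_head = nn.Linear(hidden, vocab)

    def forward(self, tokens):                     # tokens: (B, S)
        hidden = self.encoder(self.embed(tokens))  # (B, S, hidden)
        return hidden, self.mlm_head(hidden)


def alignnet_loss(speech_hidden, speech_recon, frame_targets, frame_mask,
                  text_hidden, text_logits, token_targets, align_weight=1.0):
    # Masked-frame reconstruction on the speech side (L1 over masked frames).
    speech_mlm = F.l1_loss(speech_recon[frame_mask], frame_targets[frame_mask])
    # Masked-token prediction on the text side; -100 marks positions to skip,
    # following PyTorch's ignore_index convention.
    text_mlm = F.cross_entropy(text_logits.transpose(1, 2), token_targets,
                               ignore_index=-100)
    # Alignment: pull mean-pooled utterance and transcript embeddings
    # together in the shared latent semantic space.
    align = F.mse_loss(speech_hidden.mean(dim=1), text_hidden.mean(dim=1))
    return speech_mlm + text_mlm + align_weight * align


# Example usage with random tensors (batch of 2 paired utterances).
# For brevity the encoder inputs are left unmasked; real MLM training
# would corrupt the masked positions before encoding.
speech, text = SpeechEncoder(), TextEncoder()
frames = torch.randn(2, 120, 80)                   # e.g. log-Mel frames
frame_mask = torch.rand(2, 120) < 0.15             # mask ~15% of frames
tokens = torch.randint(0, 30000, (2, 24))
token_targets = tokens.clone()
token_targets[torch.rand(2, 24) >= 0.15] = -100    # score only masked tokens
s_hid, s_rec = speech(frames)
t_hid, t_log = text(tokens)
loss = alignnet_loss(s_hid, s_rec, frames, frame_mask,
                     t_hid, t_log, token_targets)
```

Because only the speech module is kept at fine-tuning time, the alignment term is what transfers the text encoder's contextual semantic knowledge into the speech representations, which is the behavior the abstract attributes to AlignNet.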



Related research

04/21/2021
Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning
In the traditional cascading architecture for spoken language understand...

06/14/2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Self-supervised learning (SSL) for speech representation has been succes...

11/24/2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Self-supervised speech pre-training empowers the model with the contextu...

02/13/2020
Pre-Training for Query Rewriting in A Spoken Language Understanding System
Query rewriting (QR) is an increasingly important technique to reduce cu...

02/15/2021
MAPGN: MAsked Pointer-Generator Network for sequence-to-sequence pre-training
This paper presents a self-supervised learning method for pointer-genera...

05/20/2023
Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding
The pre-trained speech encoder wav2vec 2.0 performs very well on various...

09/14/2021
Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding
Task-adaptive pre-training (TAPT) and Self-training (ST) have emerged as...
