Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

11/06/2022
by Jiatong Shi, et al.

Spoken language understanding (SLU) aims to extract high-level semantics from spoken utterances. Previous work has investigated speech self-supervised models and textual pre-trained models, both of which have yielded reasonable improvements on various SLU tasks. However, because speech signals and text tokens are mismatched modalities, previous methods usually require complex framework designs. This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models, resulting in an unsupervised speech-to-semantic pre-trained model for various SLU tasks. Specifically, we propose using unsupervised automatic speech recognition (ASR) as a connector that bridges the different modalities used in speech and textual pre-trained models. Our experiments show that unsupervised ASR by itself can improve the representations from speech self-supervised models. More importantly, it proves to be an efficient connector between speech and textual pre-trained models, improving performance on five different SLU tasks. Notably, on spoken question answering, we reach a state-of-the-art result on the challenging NMSQA benchmark.
