Introducing Semantics into Speech Encoders

11/15/2022
by Derek Xu et al.

Recent studies find that existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined systems that feed supervised automatic speech recognition (ASR) output into a large language model (LLM) achieve state-of-the-art results on semantic spoken language tasks by exploiting the LLM's rich semantic representations. These systems come at the cost of labeled audio transcriptions, which are expensive and time-consuming to obtain. We propose a task-agnostic, unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. By introducing semantics, we improve existing speech encoders' spoken language understanding performance by over 10% on intent classification, with modest gains in named entity recognition and slot filling, and improve spoken question answering FF1 score by over 2%. Our unsupervised approach achieves performance similar to supervised methods trained on over 100 hours of labeled audio transcripts, demonstrating the feasibility of unsupervised semantic augmentation of existing speech encoders.

Related research

05/04/2023
End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders
It is challenging to extract semantic meanings directly from audio signa...

11/10/2022
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Collecting sufficient labeled data for spoken language understanding (SL...

11/06/2022
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR
Spoken language understanding (SLU) is a task aiming to extract high-lev...

10/26/2020
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
Much recent work on Spoken Language Understanding (SLU) is limited in at...

05/25/2020
An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering
In a spoken multiple-choice question answering (SMCQA) task, given a pas...

05/29/2023
Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target
Spoken Language Understanding (SLU) is a task that aims to extract seman...

03/15/2023
Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences
Past work on unsupervised parsing is constrained to written form. In thi...
