End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders

05/04/2023
by Jixuan Wang, et al.

Extracting semantic meaning directly from audio signals is challenging in spoken language understanding (SLU) because no textual information is available. Popular end-to-end (E2E) SLU models use sequence-to-sequence automatic speech recognition (ASR) models to extract textual embeddings as input for inferring semantics, but these models require computationally expensive auto-regressive decoding. In this work, we leverage self-supervised acoustic encoders fine-tuned with Connectionist Temporal Classification (CTC) to extract textual embeddings and use joint CTC and SLU losses for utterance-level SLU tasks. Experiments show that our model achieves 4 state-of-the-art (SOTA) dialogue act classification model on the DSTC2 dataset and 1.3
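
To make the idea of a joint CTC and SLU objective concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It computes the CTC negative log-likelihood with the standard forward algorithm, a cross-entropy loss for an utterance-level label, and combines them. The weighted-sum form, the `ctc_weight` parameter, and all function names are assumptions made for illustration; the abstract states only that the two losses are used jointly.

```python
import math

def log_sum_exp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss for one utterance via the forward algorithm.

    log_probs: list of T frames, each a list of per-symbol log-probabilities.
    target: list of label ids without blanks.
    """
    # Interleave blanks: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for sym in target:
        ext += [sym, blank]
    n_states = len(ext)

    # Initialise: a path may start in the first blank or the first label.
    alpha = [-math.inf] * n_states
    alpha[0] = log_probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [-math.inf] * n_states
        for s in range(n_states):
            candidates = [alpha[s]]                 # stay in the same state
            if s >= 1:
                candidates.append(alpha[s - 1])     # advance by one state
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = log_sum_exp(*candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end in the last label or the trailing blank.
    total = log_sum_exp(alpha[-1], alpha[-2]) if n_states > 1 else alpha[-1]
    return -total

def cross_entropy(logits, label):
    """Cross-entropy of unnormalised logits against one class index."""
    return log_sum_exp(*logits) - logits[label]

def joint_loss(frame_log_probs, transcript, utt_logits, utt_label, ctc_weight=0.5):
    """Hypothetical weighted sum of the CTC loss and the utterance-level SLU loss."""
    ctc = ctc_neg_log_likelihood(frame_log_probs, transcript)
    slu = cross_entropy(utt_logits, utt_label)
    return ctc_weight * ctc + (1.0 - ctc_weight) * slu
```

Because CTC sums over all frame-level alignments in a single pass, it needs no auto-regressive decoder. In practice a framework implementation such as a batched, differentiable CTC loss would replace the loops above.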

research
11/15/2022

Introducing Semantics into Speech Encoders

Recent studies find existing self-supervised speech encoders contain pri...
research
02/12/2021

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

Spoken language understanding (SLU) systems extract transcriptions, as w...
research
11/06/2022

Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

Spoken language understanding (SLU) is a task aiming to extract high-lev...
research
05/20/2023

Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding

The pre-trained speech encoder wav2vec 2.0 performs very well on various...
research
05/11/2022

A neural prosody encoder for end-to-end dialogue act classification

Dialogue act classification (DAC) is a critical task for spoken language...
research
11/11/2020

Towards Semi-Supervised Semantics Understanding from Speech

Much recent work on Spoken Language Understanding (SLU) falls short in a...
research
12/13/2021

Detecting Emotion Carriers by Combining Acoustic and Lexical Representations

Personal narratives (PN) - spoken or written - are recollections of fact...
