Do We Still Need Automatic Speech Recognition for Spoken Language Understanding?

11/29/2021
by Lasse Borgholt, et al.

Spoken language understanding (SLU) tasks are usually solved by first transcribing an utterance with automatic speech recognition (ASR) and then feeding the output to a text-based model. Recent advances in self-supervised representation learning for speech data have focused on improving the ASR component. We investigate whether representation learning for speech has matured enough to replace ASR in SLU. We compare learned speech features from wav2vec 2.0, state-of-the-art ASR transcripts, and the ground-truth text as input for a novel speech-based named entity recognition task, a cardiac arrest detection task on real-world emergency calls, and two existing SLU benchmarks. We show that learned speech features are superior to ASR transcripts on the three classification tasks. For machine translation, however, ASR transcripts are still the better choice. Finally, we identify the intrinsic robustness of wav2vec 2.0 representations to out-of-vocabulary words as the key to better performance.
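As a rough illustration of the feature-based pipeline described above, the sketch below extracts frame-level wav2vec 2.0 representations using the Hugging Face `transformers` library; a downstream classifier would consume these continuous vectors instead of an ASR transcript. To keep the example self-contained and offline, the model is randomly initialised here; in practice one would load pretrained weights, e.g. `Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")`. This is a minimal sketch, not the paper's actual experimental setup.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# One second of fake 16 kHz audio standing in for a real utterance.
waveform = torch.randn(1, 16000)

# Randomly initialised wav2vec 2.0 encoder (no download needed for this sketch).
model = Wav2Vec2Model(Wav2Vec2Config())
model.eval()

with torch.no_grad():
    # Continuous speech features: one hidden vector per ~20 ms frame,
    # in contrast to the discrete token sequence an ASR system would emit.
    features = model(waveform).last_hidden_state

print(features.shape)  # (batch, frames, hidden_size)
```

A text-based SLU model in the cascade pipeline would instead operate on the ASR transcript, so recognition errors propagate; the learned features above bypass that intermediate discrete bottleneck.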


