Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

12/16/2022
by Esaú Villatoro-Tello, et al.

In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems on the SLU intent detection task: 1) text-based, 2) lattice-based, and 3) a novel multimodal approach. Our work provides a comprehensive analysis of the performance that different state-of-the-art SLU systems can achieve under different circumstances, e.g., automatically vs. manually generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) outputs allows SLU systems to improve in comparison to the 1-best setup (4% [...]). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtain performance similar to the oracle setup, and a relative improvement of 18% [...] architectures represent a good alternative to overcome the limitations of working purely with automatically generated textual data.

research
11/02/2020

Adapting Pretrained Transformer to Lattices for Spoken Language Understanding

Lattices are compact representations that encode multiple hypotheses, su...
research
02/03/2021

Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords

Word vector representations enable machines to encode human language for...
research
04/07/2019

Spoken Language Intent Detection using Confusion2Vec

Decoding speaker's intent is a crucial part of spoken language understan...
research
07/01/2021

Word-Free Spoken Language Understanding for Mandarin-Chinese

Spoken dialogue systems such as Siri and Alexa provide great convenience...
research
07/06/2020

Learning Spoken Language Representations with Neural Lattice Language Modeling

Pre-trained language models have achieved huge improvement on many NLP t...
research
06/29/2022

Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models

In Spoken Language Understanding (SLU) the task is to extract important ...
research
02/01/2018

Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription

State-of-the-art English automatic speech recognition systems typically ...
