The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems

07/28/2023
by Andreas Liesenfeld, et al.

Speech recognition systems are a key intermediary in voice-driven human-computer interaction. Although speech recognition works well for pristine monologic audio, real-life use cases in open-ended interactive settings still present many challenges. We argue that timing is mission-critical for dialogue systems, and evaluate five major commercial ASR systems for their conversational and multilingual support. We find that word error rates for natural conversational data in six languages remain abysmal, and that overlap remains a key challenge (study 1). This especially affects the recognition of conversational words (study 2), which in turn has dire consequences for downstream intent recognition (study 3). Our findings help to evaluate the current state of conversational ASR, contribute towards multidimensional error analysis and evaluation, and identify the phenomena that need the most attention on the way to building robust interactive speech technologies.
