Speech Aware Dialog System Technology Challenge (DSTC11)

12/16/2022
by Hagen Soltau, et al.

Most research on task-oriented dialog modeling is based on written text input. However, users often interact with practical dialog systems using speech. Typically, systems convert speech into text using an Automatic Speech Recognition (ASR) system, introducing errors; furthermore, these systems do not address the differences between written and spoken language. Research on this topic has been stymied by the lack of a public corpus. Motivated by these considerations, our goal in hosting the speech-aware dialog state tracking challenge was to create a public corpus and task that can be used to investigate the performance gap between the written and spoken forms of input, to develop models that could alleviate this gap, and to establish whether Text-to-Speech (TTS) systems are a reasonable surrogate for the more labor-intensive human data collection. We created three spoken versions of the popular written-domain MultiWOZ task: (a) TTS-Verbatim, in which written user inputs were converted into speech waveforms using a TTS system; (b) Human-Verbatim, in which humans spoke the user inputs verbatim; and (c) Human-Paraphrased, in which humans paraphrased the user inputs. Additionally, we provided different forms of ASR output to encourage wider participation from teams that may not have access to state-of-the-art ASR systems, including ASR transcripts, word time stamps, and latent representations of the audio (audio encoder outputs). In this paper, we describe the corpus, report results from participating teams, provide preliminary analyses of those results, and summarize the current state of the art in this domain.
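Dialog state tracking on MultiWOZ is conventionally scored with Joint Goal Accuracy (JGA): the fraction of turns whose full predicted slot-value state exactly matches the reference. A gap in JGA between the written and spoken variants of the same dialogs is one way to quantify the effect of ASR errors. Below is a minimal, illustrative sketch of this metric; the slot names and the function name `joint_goal_accuracy` are our own examples, not identifiers from the challenge corpus or its evaluation scripts.

```python
def joint_goal_accuracy(predictions, references):
    """Fraction of turns whose entire predicted slot-value state
    exactly matches the reference state (case-insensitive)."""
    def norm(state):
        return {k.lower(): v.lower() for k, v in state.items()}
    matches = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return matches / len(references)

# Hypothetical states for two turns (slot names are illustrative only).
refs = [
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "train-day": "friday"},
]
# Second turn carries a plausible ASR-induced error: "friday" -> "monday".
preds = [
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "train-day": "monday"},
]
print(joint_goal_accuracy(preds, refs))  # 0.5
```

Because JGA requires the whole state to match, a single misrecognized value (here, the train day) zeroes out the turn, which is why even modest ASR error rates can produce a sizable written-vs-spoken gap.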

