OLISIA: a Cascade System for Spoken Dialogue State Tracking

04/20/2023
by Léo Jacqmin, et al.

Though Dialogue State Tracking (DST) is a core component of spoken dialogue systems, recent work on this task mostly deals with chat corpora, disregarding the discrepancies between spoken and written language. In this paper, we propose OLISIA, a cascade system that integrates an Automatic Speech Recognition (ASR) model and a DST model. We introduce several adaptations in the ASR and DST modules to improve their integration and the system's robustness to spoken conversations. With these adaptations, our system ranked first in DSTC11 Track 3, a benchmark for evaluating spoken DST. We conduct an in-depth analysis of the results and find that normalizing the ASR outputs, adapting the DST inputs through data augmentation, and increasing the size of the pre-trained models all play an important role in reducing the performance gap between written and spoken conversations.
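
The cascade described above can be pictured as: an ASR model transcribes each spoken user turn, the transcript is normalized, and a DST model reads the accumulated transcript history. The following is a minimal Python sketch under loudly stated assumptions. `normalize_asr`, `cascade_dst`, the `NUMBER_WORDS` table, and the toy `fake_asr` and `keyword_dst` stand-ins are all hypothetical illustrations. They are not OLISIA's models, its actual normalization rules, or its data augmentation.

```python
import re
from typing import Callable, Dict, List

# HYPOTHETICAL spoken-to-written mapping; OLISIA's real normalization rules are not given here.
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}


def normalize_asr(text: str) -> str:
    """Lowercase, drop punctuation except colons (so times like 7:30 survive),
    and map whole-word number words to digits. Illustrative only."""
    text = text.lower()
    text = re.sub(r"[^\w\s:]", " ", text)
    tokens = [NUMBER_WORDS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def cascade_dst(
    audio_turns: List[bytes],
    asr: Callable[[bytes], str],
    dst: Callable[[List[str]], Dict[str, str]],
) -> Dict[str, str]:
    """Cascade: transcribe each turn, normalize it, then run DST on the whole history."""
    history = [normalize_asr(asr(turn)) for turn in audio_turns]
    return dst(history)


# --- Toy stand-ins (HYPOTHETICAL, for demonstration only) ---
def fake_asr(audio: bytes) -> str:
    """Pretend ASR: the 'audio' is just UTF-8 text."""
    return audio.decode("utf-8")


def keyword_dst(history: List[str]) -> Dict[str, str]:
    """Keyword rules standing in for a neural DST model; slot names are illustrative."""
    state: Dict[str, str] = {}
    for utt in history:
        match = re.search(r"for (\d+) people", utt)
        if match:
            state["restaurant-book people"] = match.group(1)
        if "cheap" in utt:
            state["restaurant-pricerange"] = "cheap"
    return state


state = cascade_dst(
    [b"I want a CHEAP restaurant.", b"Book it for Four people, please!"],
    fake_asr,
    keyword_dst,
)
print(state)
```

In this toy run, normalization turns "Four people," into "4 people", which is what lets the digit-based rule fill the booking slot. That mirrors, in a very simplified way, why aligning ASR output with the written form a DST model expects can matter.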

12/23/2021

TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue Modeling on Spoken Conversations

Task-oriented dialogue systems have been plagued by the difficulties of ...
08/29/2023

Adapting Text-based Dialogue State Tracker for Spoken Dialogues

Although there have been remarkable advances in dialogue systems through...
03/08/2022

Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations

Building robust and general dialogue models for spoken conversations is ...
07/24/2023

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Punctuation restoration is an important task in automatic speech recogni...
02/03/2020

Modeling ASR Ambiguity for Dialogue State Tracking Using Word Confusion Networks

Spoken dialogue systems typically use a list of top-N ASR hypotheses for...
07/13/2023

Adapting an ASR Foundation Model for Spoken Language Assessment

A crucial part of an accurate and reliable spoken language assessment sy...
07/20/2021

Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation

Transcripts generated by automatic speech recognition (ASR) systems for ...