Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations

by Ruijie Yan, et al.

Building robust and general dialogue models for spoken conversations is challenging because spoken and written data follow different distributions. This paper presents our approach to building generalized models for the Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations Challenge of DSTC-10. To mitigate the discrepancies between spoken and written text, we mainly apply extensive data augmentation to written data, including artificial error injection and round-trip text-speech transformation. To train robust models for spoken conversations, we improve pre-trained language models and apply ensemble algorithms for each sub-task. Specifically, for the detection task, we fine-tune pre-trained language models including ELECTRA and run an error-fixing ensemble algorithm. For the selection task, we adopt a two-stage framework that consists of entity tracking and knowledge ranking, and propose a multi-task learning method that learns multi-level semantic information through domain classification and entity selection. For the generation task, we adopt a cross-validation data process to improve pre-trained generative language models, followed by a consensus decoding algorithm that can incorporate arbitrary features, such as a relative metric, and tune the associated feature weights directly. Our approach ranks third on the objective evaluation and second on the final official human evaluation.
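
The consensus decoding step mentioned for the generation task can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `overlap_f1` and `consensus_decode`, the use of token-overlap F1 as the "relative metric", and the two-feature weighting are all assumptions made for illustration. The idea shown is only the general one: each candidate response is scored by a weighted sum of its own model score and its average agreement with the other candidates, and the highest-scoring candidate is returned.

```python
from collections import Counter


def overlap_f1(hyp: str, ref: str) -> float:
    """Token-overlap F1 between two strings (a stand-in relative metric; assumption)."""
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    # Multiset intersection counts tokens shared by both strings.
    common = sum((Counter(h) & Counter(r)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(h), common / len(r)
    return 2 * precision * recall / (precision + recall)


def consensus_decode(candidates, model_scores, weights=(1.0, 1.0)):
    """Pick the candidate with the best weighted sum of two features.

    Feature 1: the candidate's own model score (e.g. a log-probability).
    Feature 2: its average overlap_f1 with every other candidate.
    The weights are hypothetical tunable parameters.
    """
    w_model, w_consensus = weights
    best, best_score = None, float("-inf")
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        consensus = (
            sum(overlap_f1(cand, o) for o in others) / len(others) if others else 0.0
        )
        score = w_model * model_scores[i] + w_consensus * consensus
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = cand, score
    return best


# Hypothetical candidates from several generators, with made-up scores.
candidates = [
    "the hotel has free wifi",
    "the hotel offers free wifi",
    "no parking available",
]
model_scores = [-1.0, -1.0, -0.9]
print(consensus_decode(candidates, model_scores))
```

With equal weights, the two mutually similar candidates gain from the consensus feature and the first of them is selected; setting the consensus weight to zero reduces the procedure to picking the candidate with the highest model score. In a fuller setup the weights would be tuned on held-out data against the target evaluation metric.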



TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue Modeling on Spoken Conversations

Task-oriented dialogue systems have been plagued by the difficulties of ...

OLISIA: a Cascade System for Spoken Dialogue State Tracking

Though Dialogue State Tracking (DST) is a core component of spoken dialo...

Adapting Document-Grounded Dialog Systems to Spoken Conversations using Data Augmentation and a Noisy Channel Model

This paper summarizes our submission to Task 2 of the second track of th...

External Knowledge Selection with Weighted Negative Sampling in Knowledge-grounded Task-oriented Dialogue Systems

Constructing a robust dialogue system on spoken conversations bring more...

Learning Spoken Language Representations with Neural Lattice Language Modeling

Pre-trained language models have achieved huge improvement on many NLP t...

E2E Spoken Entity Extraction for Virtual Agents

This paper reimagines some aspects of speech processing using speech enc...

Data Augmentation with Paraphrase Generation and Entity Extraction for Multimodal Dialogue System

Contextually aware intelligent agents are often required to understand t...
