RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems

12/29/2020
by   Baolin Peng, et al.
For task-oriented dialog systems to be maximally useful, they must be able to process conversations in a way that is (1) generalizable with a small number of training examples for new task domains, and (2) robust to user input in various styles, modalities, or domains. In pursuit of these goals, we introduce the RADDLE benchmark, a collection of corpora and tools for evaluating the performance of models across a diverse set of domains. By including tasks with limited training data, RADDLE is designed to favor and encourage models with strong generalization ability. RADDLE also includes a diagnostic checklist that facilitates detailed robustness analysis in aspects such as language variations, speech errors, unseen entities, and out-of-domain utterances. We evaluate recent state-of-the-art systems based on pre-training and fine-tuning, and find that grounded pre-training on heterogeneous dialog corpora performs better than training a separate model per domain. Overall, existing models are less than satisfactory in robustness evaluation, which suggests opportunities for future improvement.

research
06/22/2022

GODEL: Large-Scale Pre-Training for Goal-Directed Dialog

We introduce GODEL (Grounded Open Dialogue Language Model), a large pre-...
research
04/24/2020

A Tailored Pre-Training Model for Task-Oriented Dialog Generation

The recent success of large pre-trained language models such as BERT and...
research
03/28/2023

Zero-Shot Generalizable End-to-End Task-Oriented Dialog System using Context Summarization and Domain Schema

Task-oriented dialog systems empower users to accomplish their goals by ...
research
08/11/2023

Large-Scale Learning on Overlapped Speech Detection: New Benchmark and New General System

Overlapped Speech Detection (OSD) is an important part of speech applica...
research
09/23/2020

Hierarchical Pre-training for Sequence Labelling in Spoken Dialog

Sequence labelling tasks like Dialog Act and Emotion/Sentiment identific...
research
12/15/2021

CheckDST: Measuring Real-World Generalization of Dialogue State Tracking Performance

Recent neural models that extend the pretrain-then-finetune paradigm con...
research
12/20/2022

Enhancing Task Bot Engagement with Synthesized Open-Domain Dialog

Many efforts have been made to construct dialog systems for different ty...
