MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling

09/29/2018
by Paweł Budzianowski, et al.

Although machine learning has come to dominate dialogue research, real breakthroughs have been held back by the scale of available data. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully labelled collection of human-human written conversations spanning multiple domains and topics. At 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora. Apart from the open-sourced dataset, labelled with dialogue belief states and dialogue actions, the contribution of this work is two-fold. Firstly, a detailed description of the data collection procedure is provided, along with a summary of the data structure and analysis; the proposed data-collection pipeline is based entirely on crowd-sourcing, without the need to hire professional annotators. Secondly, a set of benchmark results for belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies.

research
10/17/2020

RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling

In order to alleviate the shortage of multi-domain data and to capture d...
research
07/17/2018

Large-Scale Multi-Domain Belief Tracking with Knowledge Sharing

Robust dialogue belief tracking is a key component in maintaining good q...
research
07/10/2020

MultiWOZ 2.2 : A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines

MultiWOZ is a well-known task-oriented dialogue dataset containing over ...
research
03/03/2022

Dialogue Summaries as Dialogue States (DS2), Template-Guided Summarization for Few-shot Dialogue State Tracking

Annotating task-oriented dialogues is notorious for the expensive and di...
research
04/19/2022

A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets

In recent years, interest has arisen in using machine learning to improv...
research
04/22/2022

SalesBot: Transitioning from Chit-Chat to Task-Oriented Dialogues

Dialogue systems are usually categorized into two types, open-domain and...
research
05/20/2020

ScriptWriter: Narrative-Guided Script Generation

It is appealing to have a system that generates a story or scripts autom...
