Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

07/26/2023
by Songbo Hu, et al.

Creating high-quality annotated data for task-oriented dialog (ToD) is notoriously difficult, and the challenges are amplified when the goal is to create equitable, culturally adapted, and large-scale ToD datasets for multiple languages. As a result, current multilingual ToD datasets remain scarce and suffer from limitations such as non-native, translation-based dialogs with translation artefacts, small scale, and a lack of cultural adaptation, among others. In this work, we first take stock of the current landscape of multilingual ToD datasets, offering a systematic overview of their properties and limitations. Aiming to address all of the identified limitations, we then introduce Multi3WOZ, a novel multilingual, multi-domain, multi-parallel ToD dataset. It is large-scale and offers culturally adapted dialogs in four languages to enable the training and evaluation of multilingual and cross-lingual ToD systems. We describe the complex bottom-up data collection process that yielded the final dataset and provide the first sets of baseline scores across several ToD-related tasks for future reference; these baselines also highlight the dataset's challenging nature.
