Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog

05/20/2022
by   Chia-Chien Hung, et al.

Research on (multi-domain) task-oriented dialog (TOD) has predominantly focused on the English language, primarily due to the shortage of robust TOD datasets in other languages, preventing the systematic investigation of cross-lingual transfer for this crucial NLP application area. In this work, we introduce Multi2WOZ, a new multilingual multi-domain TOD dataset, derived from the well-established English dataset MultiWOZ, that spans four typologically diverse languages: Chinese, German, Arabic, and Russian. In contrast to concurrent efforts, Multi2WOZ contains gold-standard dialogs in target languages that are directly comparable with development and test portions of the English dataset, enabling reliable and comparative estimates of cross-lingual transfer performance for TOD. We then introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks. Using such conversational PrLMs specialized for concrete target languages, we systematically benchmark a number of zero-shot and few-shot cross-lingual transfer approaches on two standard TOD tasks: Dialog State Tracking and Response Retrieval. Our experiments show that, in most setups, the best performance entails the combination of (i) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task. Most importantly, we show that our conversational specialization in the target language allows for an exceptionally sample-efficient few-shot transfer for downstream TOD tasks.

research
04/05/2022

Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval

State-of-the-art neural (re)rankers are notoriously data hungry which - ...
research
04/17/2021

Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems

Despite the fact that natural language conversations with machines repre...
research
04/11/2022

Zero-shot Cross-lingual Conversational Semantic Role Labeling

While conversational semantic role labeling (CSRL) has shown its usefuln...
research
04/03/2023

Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning

Cross-lingual transfer of language models trained on high-resource langu...
research
09/18/2021

DuRecDial 2.0: A Bilingual Parallel Corpus for Conversational Recommendation

In this paper, we provide a bilingual parallel human-to-human recommenda...
research
07/26/2023

Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

Creating high-quality annotated data for task-oriented dialog (ToD) is k...
research
08/27/2021

Code-switched inspired losses for generic spoken dialog representations

Spoken dialog systems need to be able to handle both multiple languages ...
