mDIA: A Benchmark for Multilingual Dialogue Generation in 46 Languages

08/27/2022
by Qingyu Zhang, et al.

Owing to the lack of corpora for low-resource languages, current work on dialogue generation has focused mainly on English. In this paper, we present mDIA, the first large-scale multilingual benchmark for dialogue generation spanning low- to high-resource languages. It covers real-life conversations in 46 languages from 19 language families. We report baseline results obtained by fine-tuning two pre-trained models: mT5, a multilingual model not specialized for dialogue, and DialoGPT, an English-centric chatbot specialized for dialogue. The results show that mT5-based models score higher on sacreBLEU and BERTScore but lower on diversity. Although the few-shot and zero-shot results are promising, generation quality in languages other than English still lags well behind generation quality in English. We hope that the release of mDIA will encourage more work on multilingual dialogue generation and help promote language diversity.
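The abstract reports that mT5-based models score lower on diversity but does not name the diversity metric. As a hedged illustration only, the sketch below computes distinct-n, the ratio of unique n-grams to all n-grams across a set of generated responses, which is a common way to measure response diversity in dialogue generation. Whether mDIA uses this exact metric, and the whitespace tokenization, are assumptions; the function name `distinct_n` is hypothetical.

```python
from typing import List


def distinct_n(responses: List[str], n: int) -> float:
    """Hypothetical helper: corpus-level distinct-n.

    Counts every n-gram across all responses, then returns
    (number of unique n-grams) / (total number of n-grams).
    Returns 0.0 when no response is long enough to form an n-gram.
    """
    total = 0
    unique = set()
    for response in responses:
        # Assumption: whitespace tokenization, which does not suit
        # languages written without spaces (e.g. Chinese, Japanese, Thai).
        tokens = response.split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    generated = ["i am fine", "i am fine thanks"]
    # 7 unigrams in total, 4 of them unique.
    print(distinct_n(generated, 1))
    # 5 bigrams in total, 3 of them unique.
    print(distinct_n(generated, 2))
```

Repeated, generic replies (such as "i am fine" across many turns) push distinct-n toward zero, which is why this kind of score can move in the opposite direction from overlap metrics such as sacreBLEU. A real multilingual evaluation would need a language-aware tokenizer rather than `str.split`.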
