Ìtàkúròso: Exploiting Cross-Lingual Transferability for Natural Language Generation of Dialogues in Low-Resource, African Languages

04/17/2022
by Tosin Adewumi, et al.

We investigate the possibility of cross-lingual transfer from a state-of-the-art (SoTA) deep monolingual model (DialoGPT) to 6 African languages and compare it with 2 baselines (BlenderBot 90M, another SoTA model, and a simple Seq2Seq). The languages are Swahili, Wolof, Hausa, Nigerian Pidgin English, Kinyarwanda and Yorùbá. Generation of dialogues is known to be a challenging task for many reasons. It becomes more challenging for African languages, which are low-resource in terms of data. Therefore, we translate a small portion of the English multi-domain MultiWOZ dataset for each target language. Besides intrinsic evaluation (i.e. perplexity), we conduct human evaluation of single-turn conversations by using majority votes and measure inter-annotator agreement (IAA). The results show that the hypothesis that deep monolingual models learn some abstractions that generalise across languages holds. We observe human-like conversations in 5 out of the 6 languages. The hypothesis, however, applies to different degrees in different languages, which is expected. The language with the most transferable properties is Nigerian Pidgin English, with a human-likeness score of 78.1 unanimous. The main contributions of this paper include the representation (through the provision of high-quality dialogue data) of under-represented African languages and the demonstration of the cross-lingual transferability hypothesis for dialogue systems. We also provide the datasets and host the model checkpoints/demos on the HuggingFace hub for public access.
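
As a rough illustration of the two evaluation styles named in the abstract, the sketch below computes perplexity from per-token log-probabilities and aggregates annotator votes by majority. It is not the authors' evaluation code. The function names, the label strings (`"human-like"`, `"uncertain"`, `"non-human-like"`), and the choice of three annotators per turn in the usage example are assumptions made for illustration only.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative natural-log likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def majority_vote(labels):
    """Return (label, is_unanimous); (None, False) when no strict majority exists."""
    label, count = Counter(labels).most_common(1)[0]
    if count * 2 <= len(labels):
        return None, False
    return label, count == len(labels)


def human_likeness(votes_per_turn, target="human-like"):
    """Percentage of turns whose majority label is `target`, and the
    percentage of all turns where that vote was unanimous."""
    results = [majority_vote(votes) for votes in votes_per_turn]
    total = len(results)
    majority = sum(1 for label, _ in results if label == target)
    unanimous = sum(1 for label, full in results if label == target and full)
    return 100 * majority / total, 100 * unanimous / total


# Hypothetical votes from three annotators on four single-turn conversations.
votes = [
    ["human-like", "human-like", "human-like"],
    ["human-like", "human-like", "uncertain"],
    ["uncertain", "non-human-like", "human-like"],
    ["non-human-like", "non-human-like", "human-like"],
]
score, unanimous_share = human_likeness(votes)
```

Inter-annotator agreement would be computed separately from the same vote table, for example with a chance-corrected statistic; which statistic the paper uses is not stated in the abstract.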

research
01/27/2021

An Empirical Study of Cross-Lingual Transferability in Generative Dialogue State Tracker

There has been a rapid development in data-driven task-oriented dialogue...
research
06/03/2021

ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation

Despite the recent advancement in NLP research, cross-lingual transfer f...
research
03/17/2020

XPersona: Evaluating Multilingual Personalized Chatbot

Personalized dialogue systems are an essential step toward better human-...
research
05/24/2023

Cross-lingual Data Augmentation for Document-grounded Dialog Systems in Low Resource Languages

This paper proposes a framework to address the issue of data scarcity in...
research
11/09/2020

CLAR: A Cross-Lingual Argument Regularizer for Semantic Role Labeling

Semantic role labeling (SRL) identifies predicate-argument structure(s) ...
research
03/04/2023

Self-tuning hyper-parameters for unsupervised cross-lingual tokenization

We explore the possibility of meta-learning for the language-independent...
research
06/05/2021

BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

Task-oriented dialogue (ToD) benchmarks provide an important avenue to m...
