Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model

05/31/2022
by Xuan-Phi Nguyen, et al.

Numerous recent works on unsupervised machine translation (UMT) imply that competent unsupervised translation of low-resource and unrelated languages, such as Nepali or Sinhala, is only possible if the model is trained in a massive multilingual environment, where these low-resource languages are mixed with high-resource counterparts. Nonetheless, while the high-resource languages greatly help kick-start the target low-resource translation tasks, the language discrepancy between them may hinder further improvement. In this work, we propose a simple refinement procedure to disentangle languages from a pre-trained multilingual UMT model so that it can focus on only the target low-resource task. Our method achieves the state of the art in the fully unsupervised translation tasks of English to Nepali, Sinhala, Gujarati, Latvian, Estonian and Kazakh, with BLEU score gains of 3.5, 3.5, 3.3, 4.1, 4.2, and 3.3, respectively. Our codebase is available at https://github.com/nxphi47/refine_unsup_multilingual_mt
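The core idea of the refinement stage — continuing training of the pre-trained multilingual model on only the target low-resource pair, rather than the full multilingual mix — can be sketched as follows. This is a hypothetical illustration; the data layout and function names are assumptions for exposition, not the authors' actual code (see their repository above for the real implementation).

```python
# Hypothetical sketch: narrow a multilingual training pool down to the
# target low-resource pair (e.g. English-Nepali) before refinement,
# so the model stops splitting capacity across unrelated languages.

def filter_to_target(pool, target_pair):
    """Keep only examples whose language pair matches the target."""
    return [ex for ex in pool if ex["langs"] == target_pair]

# Toy multilingual pool mixing high- and low-resource pairs.
pool = [
    {"langs": ("en", "hi"), "text": "..."},  # high-resource helper pair
    {"langs": ("en", "ne"), "text": "..."},  # target: English-Nepali
    {"langs": ("en", "ne"), "text": "..."},
    {"langs": ("en", "si"), "text": "..."},  # another low-resource pair
]

refinement_data = filter_to_target(pool, ("en", "ne"))
print(len(refinement_data))  # 2: only English-Nepali examples remain
```

In the paper's setting, the multilingual pre-training provides the cross-lingual initialization, and this disentangling refinement lets the target pair improve without interference from the other languages.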


