MMTAfrica: Multilingual Machine Translation for African Languages

04/08/2022
by Chris C. Emezue, et al.

In this paper, we focus on multilingual machine translation for African languages and describe our contribution to the 2021 WMT Shared Task: Large-Scale Multilingual Machine Translation. We introduce MMTAfrica, the first many-to-many multilingual translation system for six African languages: Fon (fon), Igbo (ibo), Kinyarwanda (kin), Swahili/Kiswahili (swa), Xhosa (xho), and Yoruba (yor), and two non-African languages: English (eng) and French (fra). To leverage monolingual data effectively in multilingual translation involving African languages, we introduce a novel backtranslation and reconstruction objective, BT&REC, inspired by random online back translation and the T5 modeling framework, respectively. Additionally, we report improvements from MMTAfrica over the FLORES 101 benchmarks (spBLEU gains ranging from +0.58 for Swahili to French to +19.46 for French to Xhosa). We release our dataset and source code at https://github.com/edaiofficial/mmtafrica.

research
09/17/2021

Back-translation for Large-Scale Multilingual Machine Translation

This paper illustrates our approach to the shared task on large-scale mu...
research
01/21/2018

A Universal Semantic Space

Multilingual embeddings build on the success of monolingual embeddings a...
research
09/09/2023

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

We introduce MADLAD-400, a manually audited, general domain 3T token mon...
research
09/13/2023

Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding

Hallucinations and off-target translation remain unsolved problems in ma...
research
05/31/2023

Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios

We tackle the task of automatically discriminating between human and mac...
research
10/31/2022

TaTa: A Multilingual Table-to-Text Dataset for African Languages

Existing data-to-text generation datasets are mostly limited to English....
research
12/16/2021

Can Multilinguality benefit Non-autoregressive Machine Translation?

Non-autoregressive (NAR) machine translation has recently achieved signi...
