Low-Resource Neural Machine Translation for Southern African Languages

04/01/2021
by Evander Nyoni, et al.

Low-resource African languages have not fully benefited from the progress in neural machine translation because of a lack of data. Motivated by this challenge, we compare zero-shot learning, transfer learning and multilingual learning on three Bantu languages (Shona, isiXhosa and isiZulu) and English. Our main target is English-to-isiZulu translation, for which we have just 30,000 sentence pairs, 28% of the average size of our other corpora. We show the importance of language similarity on the performance of English-to-isiZulu transfer learning based on English-to-isiXhosa and English-to-Shona parent models, whose BLEU scores differ by 5.2. We then demonstrate that multilingual learning surpasses both transfer learning and zero-shot learning on our dataset, with BLEU score improvements relative to the baseline English-to-isiZulu model of 9.9 (multilingual), 6.1 (transfer) and 2.0 (zero-shot). Our best model also improves on the previous state-of-the-art BLEU score by more than 10.


