Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages

07/19/2021
by Dana Ruiter, et al.

For most language combinations, parallel data is either scarce or simply unavailable. To address this, unsupervised machine translation (UMT) exploits large amounts of monolingual data by using synthetic data generation techniques such as back-translation and noising, while self-supervised NMT (SSNMT) identifies parallel sentences in smaller comparable data and trains on them. To date, the inclusion of UMT data generation techniques in SSNMT has not been investigated. We show that integrating UMT techniques into SSNMT significantly outperforms both SSNMT and UMT on all tested language pairs, with improvements of up to +4.3 BLEU, +50.8 BLEU and +51.5 BLEU over SSNMT, statistical UMT and hybrid UMT, respectively, on Afrikaans to English. We further show that the combination of multilingual denoising autoencoding, SSNMT with back-translation and bilingual finetuning enables us to learn machine translation even for distant language pairs for which only small amounts of monolingual data are available, for example reaching a BLEU score of 11.6 on English to Swahili.
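
As a rough illustration of the "noising" mentioned above, the following is a minimal sketch of a noise function of the kind often used for denoising objectives in UMT: random word dropout followed by a bounded local shuffle. The abstract does not specify which noise functions the authors use, so the function name `add_noise`, its parameters and the default values are all assumptions chosen for illustration.

```python
import random


def add_noise(tokens, drop_prob=0.1, max_shuffle_distance=3, rng=None):
    """Return a noised copy of `tokens` (hypothetical helper, not the paper's code).

    1. Word dropout: each token is dropped independently with `drop_prob`.
    2. Local shuffle: tokens are reordered so that no token moves more than
       `max_shuffle_distance` positions from where it was after dropout.
    """
    if rng is None:
        rng = random.Random()

    # Word dropout; keep at least one token when the input is non-empty.
    kept = [tok for tok in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        kept = [rng.choice(tokens)]

    # Local shuffle: sort by (index + noise), with noise drawn from
    # [0, max_shuffle_distance + 1). This bounds how far a token can move.
    keys = [i + rng.random() * (max_shuffle_distance + 1) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


if __name__ == "__main__":
    sentence = "parallel data is either scarce or simply unavailable".split()
    print(add_noise(sentence, rng=random.Random(0)))
```

In a denoising setup, a model would be trained to reconstruct the original sentence from the output of such a function; the noised and clean sentences form a synthetic training pair.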
