Multi-Agent Cross-Translated Diversification for Unsupervised Machine Translation

06/03/2020
by Xuan-Phi Nguyen, et al.

Recent unsupervised machine translation (UMT) systems usually employ three main principles: initialization, language modeling and iterative back-translation, though they may apply these principles differently. This work introduces another component to this framework: Multi-Agent Cross-translated Diversification (MACD). The method trains multiple UMT agents and then translates monolingual data back and forth using non-duplicative agents to acquire synthetic parallel data for supervised MT. MACD is applicable to all previous UMT approaches. In our experiments, the technique boosts the performance of commonly used UMT methods by 1.5-2.0 BLEU. In particular, in WMT'14 English-French, WMT'16 German-English and English-Romanian, MACD outperforms cross-lingual masked language model pretraining by 2.3, 2.2 and 1.6 BLEU, respectively. It also yields 1.5-3.3 BLEU improvements in IWSLT English-French and English-German translation tasks. Through extensive experimental analyses, we show that MACD is effective because it embraces data diversity while other similar variants do not.
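
To make the data flow concrete, below is a minimal Python sketch of the cross-translation step as described above. The Agent class, its translate() method, and the exact pairing of synthetic source and target sides are illustrative assumptions rather than the authors' implementation: each monolingual sentence is translated forward by one agent and back by a different agent, and the resulting variants are pooled as synthetic parallel data for a final supervised model.

from typing import List, Tuple


class Agent:
    """Hypothetical stand-in for one independently trained UMT model."""

    def __init__(self, name: str):
        self.name = name

    def translate(self, sentences: List[str], direction: str) -> List[str]:
        # Placeholder: a real agent would decode with its own encoder-decoder.
        # Tagging the text keeps the data flow visible in this toy example.
        return ["[{}:{}] {}".format(self.name, direction, s) for s in sentences]


def cross_translate(agents: List[Agent],
                    mono_src: List[str],
                    direction: str = "en-fr") -> List[Tuple[str, str]]:
    """Translate monolingual data back and forth, always using a different
    agent for the backward pass, and pool the results as synthetic pairs."""
    reverse = "-".join(reversed(direction.split("-")))
    synthetic: List[Tuple[str, str]] = []
    for fwd in agents:
        forward = fwd.translate(mono_src, direction)         # source -> target
        synthetic.extend(zip(mono_src, forward))              # real source, synthetic target
        for bwd in agents:
            if bwd is fwd:
                continue                                      # "non-duplicative": backward agent must differ
            reconstructed = bwd.translate(forward, reverse)   # target -> source by another agent
            synthetic.extend(zip(reconstructed, forward))     # diverse synthetic source, synthetic target
    return synthetic


if __name__ == "__main__":
    agents = [Agent("A1"), Agent("A2"), Agent("A3")]
    mono_en = ["the cat sat on the mat",
               "translation without any parallel data"]
    pairs = cross_translate(agents, mono_en, direction="en-fr")
    print(len(pairs), "synthetic en-fr pairs from", len(agents), "agents")
    # The same procedure would be run symmetrically on target-language
    # monolingual data before training a supervised MT model on the pooled pairs.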

