An Effective Approach to Unsupervised Machine Translation

02/04/2019
by Mikel Artetxe, et al.

While machine translation has traditionally relied on large amounts of parallel corpora, a recent line of research has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only. In this paper, we identify and address several deficiencies of existing unsupervised SMT approaches by exploiting subword information, developing a theoretically well-founded unsupervised tuning method, and incorporating a joint refinement procedure. Moreover, we use our improved SMT system to initialize a dual NMT model, which is further fine-tuned through on-the-fly back-translation. Together, these changes yield large improvements over the previous state of the art in unsupervised machine translation. For instance, we obtain 22.5 BLEU points on English-to-German WMT 2014, 5.5 points more than the previous best unsupervised system, and 0.5 points more than the (supervised) shared task winner back in 2014.
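The comparison at the end of the abstract can be unpacked with simple arithmetic. The sketch below derives the baseline scores implied by the quoted margins. The function and variable names are illustrative assumptions, and the derived baselines are computed from the abstract's figures rather than quoted from it.

```python
# Figures quoted in the abstract (BLEU, English-to-German, WMT 2014).
REPORTED_BLEU = 22.5
MARGIN_OVER_PREVIOUS_UNSUPERVISED = 5.5
MARGIN_OVER_2014_SUPERVISED_WINNER = 0.5


def implied_baseline(reported: float, margin: float) -> float:
    """Return the baseline score implied by a reported score and its margin.

    Hypothetical helper for illustration; it only subtracts and rounds
    to one decimal place, the precision used in the abstract.
    """
    return round(reported - margin, 1)


# Derived values, not stated directly in the abstract.
previous_unsupervised = implied_baseline(
    REPORTED_BLEU, MARGIN_OVER_PREVIOUS_UNSUPERVISED
)
supervised_winner_2014 = implied_baseline(
    REPORTED_BLEU, MARGIN_OVER_2014_SUPERVISED_WINNER
)

print(f"Implied previous best unsupervised BLEU: {previous_unsupervised}")
print(f"Implied 2014 supervised winner BLEU: {supervised_winner_2014}")
```

Read this way, the abstract implies that the previous best unsupervised system scored about 17.0 BLEU and the 2014 shared task winner about 22.0 BLEU on this test set.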


