Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation

04/04/2019
by Jiawei Wu, et al.

The overreliance on large parallel corpora significantly limits the applicability of machine translation systems to the majority of language pairs. Back-translation has been the dominant technique in previous approaches to unsupervised neural machine translation, where pseudo sentence pairs are generated to train the models with a reconstruction loss. However, the pseudo sentences are usually of low quality, as translation errors accumulate during training. To avoid this fundamental issue, we propose an alternative and more effective approach, extract-edit, which extracts and then edits real sentences from the target monolingual corpora. Furthermore, we introduce a comparative translation loss to evaluate the translated target sentences and thus train the unsupervised translation systems. Experiments show that the proposed approach consistently outperforms the previous state-of-the-art unsupervised machine translation systems across two benchmarks (English-French and English-German) and two low-resource language pairs (English-Romanian and English-Russian) by more than 2 BLEU points (up to 3.63).
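The sketch below illustrates the idea behind a comparative translation loss as described above: in a shared latent space, the model's translation of a source sentence should score higher against the real target sentences that were extracted and edited for that source than against unrelated target sentences. This is a minimal sketch under stated assumptions, not the paper's exact objective; the margin formulation, the cosine-similarity scoring, and all names (comparative_translation_loss, extracted_edited, negatives, margin) are illustrative.

    import torch
    import torch.nn.functional as F

    def comparative_translation_loss(translated, extracted_edited, negatives, margin=1.0):
        """Illustrative margin-based comparative loss (assumed formulation).

        translated       : (d,)   latent embedding of the model's translation
        extracted_edited : (k, d) embeddings of extracted-and-edited real target sentences
        negatives        : (m, d) embeddings of unrelated target sentences
        """
        # Cosine similarity between the translation and each candidate sentence.
        pos_sim = F.cosine_similarity(translated.unsqueeze(0), extracted_edited, dim=-1)  # (k,)
        neg_sim = F.cosine_similarity(translated.unsqueeze(0), negatives, dim=-1)         # (m,)

        # Push the translation to score higher against the extracted-edited
        # sentences than against unrelated ones, by at least the margin.
        return F.relu(margin - pos_sim.max() + neg_sim.max())

Taking the maximum over the extracted-edited candidates reflects the intuition that the translation only needs to match the best available real sentence; other aggregation choices (mean, top-k) would be equally plausible in this sketch.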
