Understanding Back-Translation at Scale

08/28/2018
by Sergey Edunov, et al.

An effective way to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target-language sentences. This work broadens the understanding of back-translation and investigates several methods for generating synthetic source sentences. We find that in all but resource-poor settings, back-translations obtained via sampling or noised beam outputs are the most effective. Our analysis shows that sampling or noisy synthetic data gives a much stronger training signal than data generated by beam or greedy search. We also compare synthetic data with genuine bitext and study various domain effects. Finally, we scale to hundreds of millions of monolingual sentences and achieve a new state of the art of 35 BLEU on the WMT'14 English-German test set.
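The generation methods contrasted in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a hand-written toy next-word distribution stands in for a real back-translation model, and the function names, the `<BLANK>` filler token, the noise probabilities, and the shuffle window are all illustrative assumptions.

```python
import random

FILLER = "<BLANK>"  # hypothetical placeholder token, chosen for this sketch


def greedy_pick(dist):
    """Return the single most probable token, as greedy search does per step."""
    return max(dist, key=dist.get)


def sample_pick(dist, rng):
    """Draw one token in proportion to its probability (unrestricted sampling)."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def add_noise(tokens, rng, p_drop=0.1, p_blank=0.1, max_shift=3):
    """Perturb a finished hypothesis: drop words, blank words, shuffle locally.

    The probabilities and the window size are assumed values for illustration.
    """
    kept = [t for t in tokens if rng.random() >= p_drop]
    blanked = [FILLER if rng.random() < p_blank else t for t in kept]
    # Local shuffle: sort by position plus a random offset in [0, max_shift],
    # so tokens more than max_shift positions apart never swap order.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(blanked))]
    ordered = sorted(zip(keys, blanked), key=lambda pair: pair[0])
    return [token for _, token in ordered]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy next-word distribution standing in for one decoder step.
    next_word = {"the": 0.6, "a": 0.3, "this": 0.1}
    print("greedy :", greedy_pick(next_word))
    print("sampled:", [sample_pick(next_word, rng) for _ in range(5)])
    hypothesis = "the cat sat on the mat".split()
    print("noised :", add_noise(hypothesis, rng))
```

Greedy selection always returns the same high-probability word, whereas sampling and noising produce more varied synthetic sources, which is the kind of contrast the abstract draws between beam or greedy outputs and sampled or noised ones.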


