Dynamic Data Selection and Weighting for Iterative Back-Translation

04/07/2020
by Zi-Yi Dou, et al.

Back-translation has proven to be an effective method to utilize monolingual data in neural machine translation (NMT), and iteratively conducting back-translation can further improve the model performance. Selecting which monolingual data to back-translate is crucial, as we require that the resulting synthetic data are of high quality and reflect the target domain. To achieve these two goals, data selection and weighting strategies have been proposed, with a common practice being to select samples close to the target domain but also dissimilar to the average general-domain text. In this paper, we provide insights into this commonly used approach and generalize it to a dynamic curriculum learning strategy, which is applied to iterative back-translation models. In addition, we propose weighting strategies based on both the current quality of the sentence and its improvement over the previous iteration. We evaluate our models on domain adaptation, low-resource, and high-resource MT settings and on two language pairs. Experimental results demonstrate that our methods achieve improvements of up to 1.8 BLEU points over competitive baselines.
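The "common practice" mentioned above (prefer sentences that are close to the target domain but unlike average general-domain text) is often realized as a cross-entropy difference score between an in-domain and a general-domain language model. The sketch below is only an illustration of that static baseline, not the dynamic curriculum or weighting method proposed in the paper. It assumes add-one smoothed unigram language models and whitespace tokenization; all function names (`train_unigram`, `cross_entropy`, `select_in_domain`) are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count whitespace-separated tokens; returns (counts, total_tokens)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(tokens, model, vocab_size):
    """Per-token cross-entropy (nats) under an add-one smoothed unigram model."""
    counts, total = model
    log_prob = sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return -log_prob / len(tokens)


def select_in_domain(candidates, in_domain, general, k):
    """Return the k candidates with the lowest cross-entropy difference.

    score(s) = H_in(s) - H_general(s); lower means closer to the target
    domain and further from general-domain text.
    """
    in_model = train_unigram(in_domain)
    gen_model = train_unigram(general)
    # Shared vocabulary size, plus one slot for unseen tokens (an assumption).
    vocab_size = len(set(in_model[0]) | set(gen_model[0])) + 1

    def score(sentence):
        tokens = sentence.split()
        if not tokens:
            return float("inf")  # never prefer empty sentences
        return (cross_entropy(tokens, in_model, vocab_size)
                - cross_entropy(tokens, gen_model, vocab_size))

    return sorted(candidates, key=score)[:k]


if __name__ == "__main__":
    in_domain = ["the patient received a dose", "the dose was reduced"]
    general = ["the game was played yesterday", "the team won the game"]
    monolingual = ["the team game", "the patient dose"]
    print(select_in_domain(monolingual, in_domain, general, k=1))
```

In an iterative back-translation setup, a score of this kind would be computed over the monolingual pool before each round to decide which sentences to back-translate; real systems would typically use neural or n-gram language models rather than unigram counts.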


