Data Diversification: An Elegant Strategy For Neural Machine Translation

11/05/2019
by Xuan-Phi Nguyen, et al.

A common approach to improving neural machine translation is to invent new architectures. However, the process of designing and refining such models is often exhausting. Another approach is to use large amounts of extra monolingual data for semi-supervised training, as in back-translation, but extra monolingual data is not always available, especially for low-resource languages. In this paper, we propose to diversify the available training data by using multiple forward and backward peer models to augment the original training dataset. Our method requires neither extra monolingual data, as back-translation does, nor the additional computation and parameters of pretrained models. It achieves a state-of-the-art BLEU score of 30.7 on the WMT'14 English-German task, and it consistently and substantially improves translation quality on 8 other translation tasks: 4 IWSLT tasks (English-German and English-French) and 4 low-resource tasks (English-Nepali and English-Sinhala).
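The procedure described in the abstract is simple enough to sketch. The following is a minimal illustration, not the authors' released code: `train_nmt` and `translate` are hypothetical stand-ins for a real NMT toolkit's training and batch-decoding routines, and `k` (the number of peer models per direction) is an assumed parameter.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

def diversify(
    train_pairs: List[Pair],
    train_nmt: Callable[[List[Pair], int], object],      # hypothetical: trains a model on pairs with a given seed
    translate: Callable[[object, List[str]], List[str]], # hypothetical: batch-decodes sentences with a model
    k: int = 3,
) -> List[Pair]:
    """Augment (src, tgt) pairs with translations from k forward and
    k backward peer models, each trained with a different random seed."""
    sources = [s for s, _ in train_pairs]
    targets = [t for _, t in train_pairs]

    augmented: List[Pair] = list(train_pairs)  # always keep the original data
    for seed in range(k):
        # Forward peer: translates the original source side into
        # synthetic targets, paired back with the real sources.
        fwd = train_nmt(train_pairs, seed)
        augmented += zip(sources, translate(fwd, sources))

        # Backward peer: trained on reversed pairs, translates the original
        # target side into synthetic sources, paired back with the real targets.
        bwd = train_nmt([(t, s) for s, t in train_pairs], seed)
        augmented += zip(translate(bwd, targets), targets)
    return augmented
```

The final model is then trained from scratch on the combined corpus, roughly (2k + 1) times the original size; consistent with the abstract, no monolingual data or pretrained components are involved.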
