Generalized Data Augmentation for Low-Resource Translation

06/10/2019
by Mengzhou Xia, et al.

Translation to or from low-resource languages (LRLs) poses challenges for machine translation in terms of both adequacy and fluency. Data augmentation utilizing large amounts of monolingual data is regarded as an effective way to alleviate these problems. In this paper, we propose a general framework for data augmentation in low-resource machine translation that not only uses target-side monolingual data, but also pivots through a related high-resource language (HRL). Specifically, we experiment with a two-step pivoting method to convert high-resource data to the LRL, making use of available resources to better approximate the true data distribution of the LRL. First, we inject LRL words into HRL sentences through an induced bilingual dictionary. Second, we further edit these modified sentences using a modified unsupervised machine translation framework. Extensive experiments on four low-resource datasets show that under extreme low-resource settings, our data augmentation techniques improve translation quality by up to 1.5 to 8 BLEU points compared to supervised back-translation baselines.
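
The first pivoting step, injecting LRL words into HRL sentences through a bilingual dictionary, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `inject_lrl_words`, the whitespace tokenization, and the placeholder dictionary entries are all assumptions made for illustration. The abstract does not say how the dictionary is induced or how words missing from it are handled, so here they are simply left unchanged. The second step, editing with an unsupervised machine translation model, is not sketched.

```python
from typing import Dict, List


def inject_lrl_words(hrl_sentence: str, hrl_to_lrl: Dict[str, str]) -> str:
    """Swap every HRL token that has a dictionary entry for its LRL counterpart.

    Tokens without an entry are kept as they are (an assumption of this sketch).
    """
    tokens: List[str] = hrl_sentence.split()  # naive whitespace tokenization
    swapped = [hrl_to_lrl.get(token, token) for token in tokens]
    return " ".join(swapped)


# Hypothetical placeholder entries standing in for an induced HRL-to-LRL dictionary.
toy_dictionary = {"hrl_word_a": "lrl_word_a", "hrl_word_c": "lrl_word_c"}

pseudo_lrl = inject_lrl_words("hrl_word_a hrl_word_b hrl_word_c", toy_dictionary)
print(pseudo_lrl)
```

In this toy run the first and third tokens are replaced while the second, which has no dictionary entry, stays in its HRL form. The result is a mixed sentence of the kind the second step would then edit further.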

research
05/01/2017

Data Augmentation for Low-Resource Neural Machine Translation

The quality of a Neural Machine Translation system depends substantially...
research
08/30/2019

Handling Syntactic Divergence in Low-resource Machine Translation

Despite impressive empirical successes of neural machine translation (NM...
research
06/12/2023

Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili

In this work we investigate the impact of applying textual data augmenta...
research
09/10/2021

AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages

Reproducible benchmarks are crucial in driving progress of machine trans...
research
03/27/2023

Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation

Neural machine translation (NMT) has progressed rapidly over the past se...
research
05/04/2021

Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution

In this paper, we investigate the driving factors behind concatenation, ...
research
06/09/2021

AUGVIC: Exploiting BiText Vicinity for Low-Resource NMT

The success of Neural Machine Translation (NMT) largely depends on the a...
