
Data Augmentation for Low-Resource Neural Machine Translation

05/01/2017
by Marzieh Fadaee, et al.

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.
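The core idea of placing a rare word into a new context on both sides of a sentence pair can be illustrated with a minimal sketch. The abstract does not say how substitution positions or translations are chosen, so everything here is an assumption for illustration: the function names `rare_words` and `augment_pair`, the frequency threshold `max_count`, the hand-written `alignment` map (source position to target position), and the toy `lexicon` are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def rare_words(sentences, max_count=1):
    """Return the set of whitespace tokens occurring at most `max_count` times.

    `max_count` is an illustrative threshold, not a value from the paper.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    return {tok for tok, c in counts.items() if c <= max_count}


def augment_pair(src, tgt, position, rare_src, alignment, lexicon):
    """Build one synthetic sentence pair (hypothetical helper).

    The source token at `position` is replaced by `rare_src`, and the target
    token aligned to that position is replaced by the lexicon translation of
    `rare_src`. Returns None when the position is unaligned or the rare word
    has no translation in the lexicon.
    """
    src_toks, tgt_toks = src.split(), tgt.split()
    if position not in alignment or rare_src not in lexicon:
        return None
    src_toks[position] = rare_src
    tgt_toks[alignment[position]] = lexicon[rare_src]
    return " ".join(src_toks), " ".join(tgt_toks)


# Toy data, invented for this sketch.
corpus = ["she bought a table", "he bought a table", "she sold a chair"]
rare = rare_words(corpus)  # contains "chair", which occurs only once

new_pair = augment_pair(
    src="she bought a table",
    tgt="sie kaufte einen Tisch",
    position=3,                            # replace "table"
    rare_src="chair",
    alignment={0: 0, 1: 1, 2: 2, 3: 3},    # assumed one-to-one alignment
    lexicon={"chair": "Stuhl"},            # assumed toy translation table
)
print(new_pair)
```

The example was chosen so that the substituted target word agrees grammatically with its context. A blind substitution like this one can easily produce ungrammatical or implausible sentences, so a practical system would need some way to decide which positions are suitable for a given rare word.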

11/04/2020

Optimizing Transformer for Low-Resource Neural Machine Translation

Language pairs with limited amounts of parallel data, also known as low-...
05/04/2021

Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution

In this paper, we investigate the driving factors behind concatenation, ...
06/10/2019

Generalized Data Augmentation for Low-Resource Translation

Translation to or from low-resource languages (LRLs) poses challenges for ...
09/08/2022

Knowledge Based Template Machine Translation In Low-Resource Setting

Incorporating tagging into neural machine translation (NMT) systems has ...
05/13/2018

Triangular Architecture for Rare Language Translation

Neural Machine Translation (NMT) performs poorly on the low-resource langu...
05/18/2022

Data Augmentation to Address Out-of-Vocabulary Problem in Low-Resource Sinhala-English Neural Machine Translation

Out-of-Vocabulary (OOV) is a problem for Neural Machine Translation (NMT...

Code Repositories

DataAugmentationNMT

Data Augmentation for Neural Machine Translation

