Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution

05/04/2021
by Toan Q. Nguyen, et al.

In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely to be the cause of the improvement of about +1 BLEU across four language pairs. Instead, we demonstrate that the improvement comes from three other factors unrelated to discourse: context diversity, length diversity, and (to a lesser extent) position shifting.
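
To make the idea concrete, here is a minimal sketch of concatenation-based augmentation. The abstract does not spell out the exact procedure, so this sketch assumes that each synthetic example is formed by joining two randomly chosen sentence pairs, source with source and target with target. The function name `concat_augment`, its parameters, and the toy corpus are hypothetical and only serve as illustration.

```python
import random


def concat_augment(pairs, num_new, seed=0, sep=" "):
    """Build `num_new` synthetic (source, target) pairs.

    Assumption (not taken from the paper): each synthetic pair joins two
    distinct, randomly chosen pairs from `pairs`, keeping the same order
    on the source and target sides.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(num_new):
        # Pick two different sentence pairs without replacement.
        (src_a, tgt_a), (src_b, tgt_b) = rng.sample(pairs, 2)
        augmented.append((src_a + sep + src_b, tgt_a + sep + tgt_b))
    return augmented


# Toy parallel corpus (hypothetical data).
corpus = [
    ("ein Hund", "a dog"),
    ("eine Katze", "a cat"),
    ("ein Vogel", "a bird"),
]

# Original pairs plus three concatenated pairs.
training_data = corpus + concat_augment(corpus, num_new=3)
```

Under this assumption, random pairing touches all three factors named above: each sentence appears next to a new neighbor (context diversity), the training set gains longer examples (length diversity), and the second sentence of each pair starts at a later position (position shifting).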

research
05/01/2017

Data Augmentation for Low-Resource Neural Machine Translation

The quality of a Neural Machine Translation system depends substantially...
research
06/08/2021

Cheap and Good? Simple and Effective Data Augmentation for Low Resource Machine Reading

We propose a simple and effective strategy for data augmentation for low...
research
09/24/2021

A Diversity-Enhanced and Constraints-Relaxed Augmentation for Low-Resource Classification

Data augmentation (DA) aims to generate constrained and diversified data...
research
06/10/2019

Generalized Data Augmentation for Low-Resource Translation

Translation to or from low-resource languages (LRLs) poses challenges for ...
research
01/06/2023

Mask-then-Fill: A Flexible and Effective Data Augmentation Framework for Event Extraction

We present Mask-then-Fill, a flexible and effective data augmentation fr...
research
09/22/2022

Semantically Consistent Data Augmentation for Neural Machine Translation via Conditional Masked Language Model

This paper introduces a new data augmentation method for neural machine ...
research
07/01/2021

Zero-pronoun Data Augmentation for Japanese-to-English Translation

For Japanese-to-English translation, zero pronouns in Japanese pose a ch...
