AdMix: A Mixed Sample Data Augmentation Method for Neural Machine Translation

05/10/2022
by Chang Jin, et al.

In Neural Machine Translation (NMT), data augmentation methods such as back-translation have proven effective at improving translation performance. In this paper, we propose a novel data augmentation approach for NMT that does not depend on any additional training data. Our approach, AdMix, consists of two parts: 1) introducing faint discrete noise (word replacement, word dropping, word swapping) into the original sentence pairs to form augmented samples; 2) generating new synthetic training data by softly mixing the augmented samples with their original samples in the training corpus. Experiments on three translation datasets of different scales show that AdMix achieves significant improvements (1.0 to 2.7 BLEU points) over a strong Transformer baseline. When combined with other data augmentation techniques (e.g., back-translation), our approach obtains further improvements.
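The two steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the noise rate is chosen, how dropped words are represented, or at which level the mixing happens. The function names `add_noise` and `soft_mix`, the `rate` and `lam` parameters, the `<pad>` placeholder for dropped words, and mixing as a per-position convex combination of embedding vectors are all assumptions made for illustration.

```python
import random


def add_noise(tokens, vocab, rate=0.1, pad="<pad>", rng=None):
    """Apply faint discrete noise to a token list (hypothetical helper).

    Each position is perturbed with probability `rate` by one of:
    replacement, dropping, or swapping with the next token.
    """
    rng = rng or random.Random()
    out = list(tokens)
    for i in range(len(out)):
        if rng.random() >= rate:
            continue
        op = rng.choice(("replace", "drop", "swap"))
        if op == "replace":
            out[i] = rng.choice(vocab)
        elif op == "drop":
            # Assumption: a dropped word becomes a pad token so the noisy
            # sentence stays position-aligned with the original.
            out[i] = pad
        elif i + 1 < len(out):
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def soft_mix(orig_vecs, aug_vecs, lam):
    """Convex combination of two aligned sequences of embedding vectors.

    Assumption: `lam` weights the original sample, `1 - lam` the augmented one.
    """
    return [
        [lam * o + (1.0 - lam) * a for o, a in zip(ov, av)]
        for ov, av in zip(orig_vecs, aug_vecs)
    ]


if __name__ == "__main__":
    # Toy embedding table, purely illustrative.
    emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "<pad>": [0.0, 0.0]}
    sentence = ["a", "b", "c", "a"]
    noisy = add_noise(sentence, vocab=["a", "b", "c"], rate=0.3, rng=random.Random(0))
    mixed = soft_mix([emb[t] for t in sentence], [emb[t] for t in noisy], lam=0.8)
    print(noisy)
    print(mixed)
```

Keeping the noisy sentence the same length as the original is a simplification chosen here so that the position-wise mix is well defined; a real system would need its own way of handling length changes caused by word dropping.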

research
08/22/2018

SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation

In this work, we examine methods for data augmentation for text-based ta...
research
04/01/2022

CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation

We propose a novel data-augmentation technique for neural machine transl...
research
09/22/2022

Semantically Consistent Data Augmentation for Neural Machine Translation via Conditional Masked Language Model

This paper introduces a new data augmentation method for neural machine ...
research
09/24/2021

Faithful Target Attribute Prediction in Neural Machine Translation

The training data used in NMT is rarely controlled with respect to speci...
research
04/27/2023

NAP at SemEval-2023 Task 3: Is Less Really More? (Back-)Translation as Data Augmentation Strategies for Detecting Persuasion Techniques

Persuasion techniques detection in news in a multi-lingual setup is non-...
research
01/07/2022

Semantic-based Data Augmentation for Math Word Problems

It's hard for neural MWP solvers to deal with tiny local variances. In M...
research
10/12/2021

Doubly-Trained Adversarial Data Augmentation for Neural Machine Translation

Neural Machine Translation (NMT) models are known to suffer from noisy i...
