Improving Neural Machine Translation Robustness via Data Augmentation: Beyond Back Translation

10/07/2019
by Zhenhao Li, et al.

Neural Machine Translation (NMT) models have proven strong at translating clean text, but they are very sensitive to noise in the input. Improving the robustness of NMT models can be seen as a form of "domain" adaptation to noise. The recently created Machine Translation on Noisy Text task corpus provides noisy-clean parallel data for a few language pairs, but this data is very limited in size and diversity. State-of-the-art approaches depend heavily on large volumes of back-translated data. This paper makes two main contributions. First, we propose new data augmentation methods that extend the limited noisy data and further improve NMT robustness to noise while keeping the models small. Second, we explore the effect of using noise from external data in the form of speech transcripts and show that it could help robustness.


