Syntax-aware Data Augmentation for Neural Machine Translation

04/29/2020
by Sufeng Duan, et al.

Data augmentation improves neural machine translation (NMT) performance by generating additional bilingual data. In this paper, we propose a novel data augmentation strategy for NMT. Unlike existing data augmentation methods, which select words for modification with the same probability across different sentences, we set sentence-specific selection probabilities by considering each word's role in its sentence. We use the dependency parse tree of the input sentence as a clue to determine the selection probability of every word in that sentence. The proposed method is evaluated on the WMT14 English-to-German dataset and the IWSLT14 German-to-English dataset. Results of extensive experiments show that the proposed syntax-aware data augmentation method can boost existing sentence-independent methods and yield significant improvements in translation performance.
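The abstract does not say how the dependency parse is turned into selection probabilities, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes that a word's weight grows with its depth in the dependency tree (so the root is the least likely to be modified), that the parse is supplied as a hypothetical `heads` list produced by some external parser, and that the modification is simple word blanking with a placeholder token. The function names and the `rate` parameter are likewise invented for this example.

```python
import random


def depths_from_heads(heads):
    """Depth of each token in the dependency tree (root has depth 1).

    heads[i] is the 0-based index of token i's head, or -1 for the root.
    This input format is an assumption made for the sketch.
    """
    depths = []
    for i in range(len(heads)):
        depth, j = 1, i
        while heads[j] != -1:  # walk up to the root
            j = heads[j]
            depth += 1
        depths.append(depth)
    return depths


def selection_probs(heads, rate=0.15):
    """Sentence-specific selection probability for every token.

    Assumption: probability is proportional to tree depth and scaled so
    that the expected number of selected tokens is rate * len(heads).
    """
    depths = depths_from_heads(heads)
    total = sum(depths)
    n = len(heads)
    return [min(1.0, rate * n * d / total) for d in depths]


def augment(tokens, heads, rate=0.15, blank="<blank>", rng=random):
    """Blank each token independently with its own probability."""
    probs = selection_probs(heads, rate)
    return [blank if rng.random() < p else tok
            for tok, p in zip(tokens, probs)]


if __name__ == "__main__":
    # Hand-written toy parse: "reads" is the root, "old" modifies "books".
    tokens = ["She", "reads", "old", "books"]
    heads = [1, -1, 3, 1]
    print(selection_probs(heads, rate=0.25))
    print(augment(tokens, heads, rate=0.25, rng=random.Random(0)))
```

A sentence-independent baseline corresponds to giving every token the same probability `rate`. The sketch only redistributes that budget within the sentence, which is one way to read the abstract's claim that the method builds on existing augmentation schemes.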


