Data Augmentation for Machine Translation via Dependency Subtree Swapping

07/13/2023
by Attila Nagy, et al.

We present a generic framework for data augmentation via dependency subtree swapping that is applicable to machine translation. We extract corresponding subtrees from the dependency parse trees of the source and target sentences and swap these across bisentences to create augmented samples. We perform thorough filtering based on graph-based similarities of the dependency trees and additional heuristics to ensure that the extracted subtrees correspond to the same meaning. We conduct resource-constrained experiments on 4 language pairs in both directions using the IWSLT text translation datasets and the Hunglish2 corpus. The results demonstrate consistent improvements in BLEU score over our baseline models in 3 out of 4 language pairs. Our code is available on GitHub.
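The abstract describes the augmentation only at a high level. What follows is a minimal Python sketch of the core swap. It rests on several assumptions that do not come from the paper. Trees are given as CoNLL-U-style head indices with Universal Dependencies labels. "Corresponding subtrees" are approximated as the object (`obj`) dependent of the root on both the source and target side. The paper's graph-based similarity filtering is replaced by a crude check that the source and target subtrees carry the same sorted label list. All names here (`swap_subtrees`, `subtree_span`, `root_child`, `label_signature`, `splice`) are hypothetical. The authors' own implementation is the GitHub code mentioned in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Token:
    form: str
    head: int    # 1-based index of the head token; 0 marks the sentence root
    deprel: str  # UD-style relation label, e.g. "obj"

Sentence = List[Token]
Span = Tuple[int, int]  # 1-based, inclusive token span

def subtree_span(sent: Sentence, root: int) -> Optional[Span]:
    """Span covered by the subtree rooted at `root`, or None if it has gaps."""
    nodes = {root}
    changed = True
    while changed:  # repeatedly add dependents of nodes already collected
        changed = False
        for i, tok in enumerate(sent, start=1):
            if tok.head in nodes and i not in nodes:
                nodes.add(i)
                changed = True
    lo, hi = min(nodes), max(nodes)
    # Only a gap-free (projective) subtree can be swapped as one token span.
    return (lo, hi) if hi - lo + 1 == len(nodes) else None

def root_child(sent: Sentence, deprel: str) -> Optional[int]:
    """Index of the first dependent of the root with the given relation label."""
    root = next(i for i, t in enumerate(sent, start=1) if t.head == 0)
    for i, tok in enumerate(sent, start=1):
        if tok.head == root and tok.deprel == deprel:
            return i
    return None

def label_signature(sent: Sentence, span: Span) -> List[str]:
    """Sorted relation labels inside a span (hypothetical similarity proxy)."""
    return sorted(t.deprel for t in sent[span[0] - 1 : span[1]])

def splice(sent: Sentence, span: Span, words: List[str]) -> str:
    """Replace the tokens in `span` with `words` and join with spaces."""
    forms = [t.form for t in sent]
    return " ".join(forms[: span[0] - 1] + words + forms[span[1]:])

def swap_subtrees(src1, tgt1, src2, tgt2, deprel="obj"):
    """Put bisentence 2's `deprel` subtree into bisentence 1 on both sides.

    Returns (new_src, new_tgt), or None if a subtree is missing, has gaps,
    or fails the label-signature filter.
    """
    spans = []
    for sent in (src1, tgt1, src2, tgt2):
        child = root_child(sent, deprel)
        span = subtree_span(sent, child) if child is not None else None
        if span is None:
            return None
        spans.append(span)
    s1, t1, s2, t2 = spans
    # Crude stand-in for the paper's filtering: within each bisentence,
    # the source and target subtrees must carry the same labels.
    if label_signature(src1, s1) != label_signature(tgt1, t1):
        return None
    if label_signature(src2, s2) != label_signature(tgt2, t2):
        return None
    new_src = splice(src1, s1, [t.form for t in src2[s2[0] - 1 : s2[1]]])
    new_tgt = splice(tgt1, t1, [t.form for t in tgt2[t2[0] - 1 : t2[1]]])
    return new_src, new_tgt

# Toy English-Hungarian bisentences with hand-written parses.
T = Token
EN1 = [T("I", 2, "nsubj"), T("see", 0, "root"), T("the", 4, "det"), T("book", 2, "obj")]
HU1 = [T("Látom", 0, "root"), T("a", 3, "det"), T("könyvet", 1, "obj")]
EN2 = [T("She", 2, "nsubj"), T("bought", 0, "root"), T("the", 5, "det"),
       T("red", 5, "amod"), T("car", 2, "obj")]
HU2 = [T("Megvette", 0, "root"), T("a", 4, "det"), T("piros", 4, "amod"),
       T("autót", 1, "obj")]

print(swap_subtrees(EN1, HU1, EN2, HU2))  # ('I see the red car', 'Látom a piros autót')
```

Calling `swap_subtrees(EN2, HU2, EN1, HU1)` yields the complementary pair ("She bought the book", "Megvette a könyvet"), so each matching pair of bisentences can produce two augmented samples. Both objects in the toy data are definite. Hungarian verbs agree with the definiteness of their object, so swapping a definite object for an indefinite one here would yield an ungrammatical target. Tree shape alone is therefore not a sufficient filter.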

research
11/07/2021

Developing neural machine translation models for Hungarian-English

I train models for the task of neural machine translation for English-Hu...
research
11/30/2021

Minor changes make a difference: a case study on the consistency of UD-based dependency parsers

Many downstream applications are using dependency trees, and are thus re...
research
06/12/2023

Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili

In this work we investigate the impact of applying textual data augmenta...
research
08/22/2018

SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation

In this work, we examine methods for data augmentation for text-based ta...
research
09/22/2022

Semantically Consistent Data Augmentation for Neural Machine Translation via Conditional Masked Language Model

This paper introduces a new data augmentation method for neural machine ...
research
09/09/2021

HintedBT: Augmenting Back-Translation with Quality and Transliteration Hints

Back-translation (BT) of target monolingual corpora is a widely used dat...
research
01/25/2021

Facilitating Terminology Translation with Target Lemma Annotations

Most of the recent work on terminology integration in machine translatio...