Developing neural machine translation models for Hungarian-English

11/07/2021
by Attila Nagy, et al.

I train neural machine translation (NMT) models for English-Hungarian and Hungarian-English on the Hunglish2 corpus. The main contribution of this work is an evaluation of different data augmentation methods applied during NMT training. I propose five structure-aware augmentation methods: instead of selecting words at random for blanking or replacement, they use the dependency tree of each sentence as the basis for augmentation. The thesis opens with a detailed literature review of neural networks, sequential modeling, neural machine translation, dependency parsing, and data augmentation. After a detailed exploratory data analysis and preprocessing of the Hunglish2 corpus, I run experiments with the proposed augmentation techniques. The best Hungarian-English model achieves a BLEU score of 33.9, and the best English-Hungarian model achieves a BLEU score of 28.6.
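
To make the idea of structure-aware blanking concrete, here is a minimal sketch of one way a dependency tree could guide which tokens get blanked. It is an illustration only, not one of the five methods proposed in the thesis, which the abstract does not spell out. The function names `subtree_indices` and `blank_subtree`, the `<blank>` placeholder, and the hand-written head indices are all assumptions; in practice the heads would come from a dependency parser.

```python
import random
from typing import List


def subtree_indices(heads: List[int], root: int) -> List[int]:
    """Return the sorted indices of `root` and all of its descendants.

    heads[i] is the index of token i's head, or -1 for the sentence root.
    """
    children = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            children[head].append(child)
    stack, seen = [root], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(children[node])
    return sorted(seen)


def blank_subtree(tokens: List[str], heads: List[int],
                  rng: random.Random, blank: str = "<blank>") -> List[str]:
    """Blank every token in one randomly chosen non-root subtree.

    Hypothetical augmentation: the sentence root is never chosen, so the
    main predicate of the sentence always survives.
    """
    candidates = [i for i, h in enumerate(heads) if h >= 0]
    if not candidates:
        return list(tokens)
    target = rng.choice(candidates)
    to_blank = set(subtree_indices(heads, target))
    return [blank if i in to_blank else tok for i, tok in enumerate(tokens)]


# Hand-annotated toy example (assumed parse): "chased" is the root.
tokens = ["The", "old", "dog", "chased", "the", "cat"]
heads = [2, 2, 3, -1, 5, 3]
print(blank_subtree(tokens, heads, random.Random(0)))
```

Compared with blanking random words, this removes a whole syntactic unit at a time, such as the noun phrase "The old dog", so the remaining tokens still form a coherent fragment of the tree.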

research
01/18/2022

Syntax-based data augmentation for Hungarian-English machine translation

We train Transformer-based neural machine translation models for Hungari...
research
04/29/2020

Syntax-aware Data Augmentation for Neural Machine Translation

Data augmentation is an effective performance enhancement in neural mach...
research
06/21/2020

AdvAug: Robust Adversarial Augmentation for Neural Machine Translation

In this paper, we propose a new adversarial augmentation method for Neur...
research
07/13/2023

Data Augmentation for Machine Translation via Dependency Subtree Swapping

We present a generic framework for data augmentation via dependency subt...
research
11/30/2021

Minor changes make a difference: a case study on the consistency of UD-based dependency parsers

Many downstream applications are using dependency trees, and are thus re...
research
12/30/2020

Synthetic Source Language Augmentation for Colloquial Neural Machine Translation

Neural machine translation (NMT) is typically domain-dependent and style...
research
07/01/2021

Zero-pronoun Data Augmentation for Japanese-to-English Translation

For Japanese-to-English translation, zero pronouns in Japanese pose a ch...
