The Source-Target Domain Mismatch Problem in Machine Translation

09/28/2019
by   Jiajun Shen, et al.

While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures, and many events we experience in our everyday life pertain only to the specific place we live in. As a result, people often talk about different things in different parts of the world. In this work we study the effect of local context in machine translation and postulate that, particularly in low-resource settings, this causes the domains of the source and target languages to mismatch greatly, as the two languages are often spoken in regions of the world that are farther apart, with more distinctive cultural traits and unrelated local events. We first propose a controlled setting to carefully analyze the source-target domain mismatch and its dependence on the amount of parallel and monolingual data. Second, we test both a model trained with back-translation and one trained with self-training. The latter leverages in-domain source monolingual data but uses potentially incorrect target references. We found that these two approaches are often complementary. For instance, on a low-resource Nepali-English dataset the combined approach improves upon the baseline using just parallel data by 2.5 BLEU points, and by 0.6 BLEU points when compared to back-translation.
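
The contrast between the two data-augmentation schemes named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`back_translate`, `self_train`, `combine_training_data`) and the toy stand-in translators are assumptions introduced here, and a real system would use trained translation models in each direction. The point is only which side of each synthetic pair is machine-generated.

```python
from typing import Callable, List, Tuple

# A training pair is (source sentence, target sentence).
Pair = Tuple[str, str]
Translator = Callable[[str], str]


def back_translate(target_mono: List[str], tgt_to_src: Translator) -> List[Pair]:
    """Pair genuine target sentences with machine-generated sources."""
    return [(tgt_to_src(t), t) for t in target_mono]


def self_train(source_mono: List[str], src_to_tgt: Translator) -> List[Pair]:
    """Pair genuine (in-domain) source sentences with machine-generated,
    and therefore potentially incorrect, target references."""
    return [(s, src_to_tgt(s)) for s in source_mono]


def combine_training_data(
    parallel: List[Pair],
    source_mono: List[str],
    target_mono: List[str],
    src_to_tgt: Translator,
    tgt_to_src: Translator,
) -> List[Pair]:
    """Concatenate parallel data with both kinds of synthetic pairs.
    Plain concatenation is an assumption of this sketch."""
    return (
        list(parallel)
        + back_translate(target_mono, tgt_to_src)
        + self_train(source_mono, src_to_tgt)
    )


# Hypothetical stand-ins for real models: upper-casing plays the role of
# source-to-target translation, lower-casing the reverse direction.
toy_forward: Translator = str.upper
toy_backward: Translator = str.lower

combined = combine_training_data(
    parallel=[("namaste", "HELLO")],
    source_mono=["local news"],
    target_mono=["WORLD NEWS"],
    src_to_tgt=toy_forward,
    tgt_to_src=toy_backward,
)
```

In this sketch, the back-translated pair keeps a clean target but draws its content from the target-side domain, while the self-trained pair keeps in-domain source text at the cost of a noisy target, which is one way to read why the abstract describes the two approaches as often complementary.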

research
03/10/2021

Self-Learning for Zero Shot Neural Machine Translation

Neural Machine Translation (NMT) approaches employing monolingual data a...
research
11/06/2019

Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation

The quality of neural machine translation can be improved by leveraging ...
research
10/28/2022

Domain Adaptation of Machine Translation with Crowdworkers

Although a machine translation model trained with a large in-domain para...
research
05/29/2018

Bi-Directional Neural Machine Translation with Synthetic Parallel Data

Despite impressive progress in high-resource settings, Neural Machine Tr...
research
06/09/2021

AUGVIC: Exploiting BiText Vicinity for Low-Resource NMT

The success of Neural Machine Translation (NMT) largely depends on the a...
research
05/23/2023

When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale

Multilingual machine translation (MMT), trained on a mixture of parallel...
