The Source-Target Domain Mismatch Problem in Machine Translation

by   Jiajun Shen, et al.

While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures and many events we experience in our every day life pertain only to the specific place we live in. As a result, people often talk about different things in different parts of the world. In this work we study the effect of local context in machine translation and postulate that particularly in low resource settings this causes the domains of the source and target language to greatly mismatch, as the two languages are often spoken in further apart regions of the world with more distinctive cultural traits and unrelated local events. In this work we first propose a controlled setting to carefully analyze the source-target domain mismatch, and its dependence on the amount of parallel and monolingual data. Second, we test both a model trained with back-translation and one trained with self-training. The latter leverages in-domain source monolingual data but uses potentially incorrect target references. We found that these two approaches are often complementary to each other. For instance, on a low-resource Nepali-English dataset the combined approach improves upon the baseline using just parallel data by 2.5 BLEU points, and by 0.6 BLEU point when compared to back-translation.


Self-Learning for Zero Shot Neural Machine Translation

Neural Machine Translation (NMT) approaches employing monolingual data a...

Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation

The quality of neural machine translation can be improved by leveraging ...

Domain Adaptation of Machine Translation with Crowdworkers

Although a machine translation model trained with a large in-domain para...

Bi-Directional Neural Machine Translation with Synthetic Parallel Data

Despite impressive progress in high-resource settings, Neural Machine Tr...

AUGVIC: Exploiting BiText Vicinity for Low-Resource NMT

The success of Neural Machine Translation (NMT) largely depends on the a...

When Does Monolingual Data Help Multilingual Translation: The Role of Domain and Model Scale

Multilingual machine translation (MMT), trained on a mixture of parallel...

Please sign up or login with your details

Forgot password? Click here to reset