Exploring Monolingual Data for Neural Machine Translation with Knowledge Distillation

12/31/2020
by   Alham Fikri Aji, et al.

We explore two types of monolingual data that can be included in knowledge distillation training for neural machine translation (NMT). The first is source-side monolingual data. The second is target-side monolingual data that is used as back-translation data. Both datasets are (forward-)translated by a teacher model from the source language to the target language, and the resulting translations are combined into a dataset for training smaller student models. We find that source-side monolingual data improves model performance when evaluated on a test set that originates from the source side. Likewise, target-side data has a positive effect on the test set in the opposite direction. We also show that the student model does not need to be trained on the same data used by the teacher, as long as the domains are the same. Finally, we find that combining source-side and target-side data yields better performance than relying on just one side of the monolingual data.
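
The data construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `build_student_corpus`, the `Translator` type, and the separate `back_translator` model are assumptions introduced here, and the toy lambdas merely stand in for real translation models.

```python
from typing import Callable, List, Tuple

# Hypothetical type: any function mapping one sentence to its translation.
Translator = Callable[[str], str]


def build_student_corpus(
    teacher: Translator,
    back_translator: Translator,
    source_mono: List[str],
    target_mono: List[str],
) -> List[Tuple[str, str]]:
    """Assemble (source, teacher_output) pairs for student training."""
    # Source-side monolingual data is used directly as teacher input.
    sources = list(source_mono)
    # Target-side monolingual data is first back-translated
    # (target -> source) to obtain synthetic source sentences.
    sources += [back_translator(t) for t in target_mono]
    # The teacher forward-translates every source sentence; the student
    # is trained on the teacher's outputs.
    return [(s, teacher(s)) for s in sources]


if __name__ == "__main__":
    # Toy stand-ins for real models (assumptions, for illustration only).
    toy_teacher = lambda s: s.upper()
    toy_back = lambda t: t.lower()
    corpus = build_student_corpus(toy_teacher, toy_back, ["ein haus"], ["A CAT"])
    for src, tgt in corpus:
        print(src, "->", tgt)
```

Passing an empty list for either `source_mono` or `target_mono` gives the single-side settings that the abstract compares against the combined one.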

research
07/11/2022

UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation

Most translation tasks among languages belong to the zero-resource trans...
research
05/02/2020

Improving Non-autoregressive Neural Machine Translation with Monolingual Data

Non-autoregressive (NAR) neural machine translation is usually done via ...
research
09/06/2023

A deep Natural Language Inference predictor without language-specific training data

In this paper we present a technique of NLP to tackle the problem of inf...
research
12/02/2022

Improving Simultaneous Machine Translation with Monolingual Data

Simultaneous machine translation (SiMT) is usually done via sequence-lev...
research
04/01/2021

Sampling and Filtering of Neural Machine Translation Distillation Data

In most of neural machine translation distillation or stealing scenarios...
research
12/05/2020

Reciprocal Supervised Learning Improves Neural Machine Translation

Despite the recent success on image classification, self-training has on...
research
04/14/2021

The Curious Case of Hallucinations in Neural Machine Translation

In this work, we study hallucinations in Neural Machine Translation (NMT...
