Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation

05/20/2019
by   Xinyi Wang, et al.
0

To improve low-resource Neural Machine Translation (NMT) with multilingual corpora, training on the most related high-resource language only is often more effective than using all data available (Neubig and Hu, 2018). However, it is possible that an intelligent data selection strategy can further improve low-resource NMT with data from other auxiliary languages. In this paper, we seek to construct a sampling distribution over all multilingual data, so that it minimizes the training loss of the low-resource language. Based on this formulation, we propose an efficient algorithm, Target Conditioned Sampling (TCS), which first samples a target sentence, and then conditionally samples its source sentence. Experiments show that TCS brings significant gains of up to 2 BLEU on three of four languages we test, with minimal training overhead.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/04/2020

Improving Target-side Lexical Transfer in Multilingual Neural Machine Translation

To improve the performance of Neural Machine Translation (NMT) for low-r...
research
10/30/2019

Adapting Multilingual Neural Machine Translation to Unseen Languages

Multilingual Neural Machine Translation (MNMT) for low-resource language...
research
02/09/2019

Multilingual Neural Machine Translation With Soft Decoupled Encoding

Multilingual training of neural machine translation (NMT) systems has le...
research
11/30/2020

Dynamic Curriculum Learning for Low-Resource Neural Machine Translation

Large amounts of data has made neural machine translation (NMT) a big su...
research
04/09/2022

Towards Better Chinese-centric Neural Machine Translation for Low-resource Languages

The last decade has witnessed enormous improvements in science and techn...
research
05/22/2023

Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation

Despite advances in multilingual neural machine translation (MNMT), we a...
research
03/20/2021

Token-wise Curriculum Learning for Neural Machine Translation

Existing curriculum learning approaches to Neural Machine Translation (N...

Please sign up or login with your details

Forgot password? Click here to reset