
On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation

by   Changtong Zan, et al.

Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). However, it usually fails to achieve notable gains (and sometimes even hurts performance) on resource-rich NMT compared with its Random-Initialization (RI) counterpart. We take the first step toward investigating the complementarity between PT and RI in resource-rich scenarios via two probing analyses, and find that: 1) PT improves not the accuracy but the generalization, achieving flatter loss landscapes than RI; 2) PT improves not the confidence of lexical choice but the negative diversity, assigning smoother lexical probability distributions than RI. Based on these insights, we propose to combine their complementary strengths with a model fusion algorithm that uses optimal transport to align neurons between PT and RI. Experiments on two resource-rich translation benchmarks, WMT'17 English-Chinese (20M) and WMT'19 English-German (36M), show that PT and RI can be nicely complementary to each other, achieving substantial improvements in translation accuracy, generalization, and negative diversity. Probing tools and code are released at:
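The fusion step described above can be illustrated with a minimal sketch. This is not the authors' released code: it assumes per-layer weight matrices and uses hard permutation matching (the degenerate case of optimal transport with uniform marginals, solved exactly by the Hungarian algorithm) to align the RI model's neurons to the PT model's before interpolating; all function and variable names here are illustrative.

```python
# Hypothetical sketch: align neurons of a randomly-initialized (RI) layer
# to a pre-trained (PT) layer, then fuse the two by interpolation.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_and_fuse(w_pt: np.ndarray, w_ri: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Align rows (neurons) of w_ri to w_pt, then interpolate.

    w_pt, w_ri: (n_neurons, fan_in) weight matrices of the same layer.
    alpha: fusion weight given to the pre-trained model.
    """
    # Transport cost: squared Euclidean distance between neuron weight vectors.
    cost = ((w_pt[:, None, :] - w_ri[None, :, :]) ** 2).sum(-1)
    # With uniform marginals the optimal transport plan is a permutation,
    # which the Hungarian algorithm recovers exactly.
    _, col_ind = linear_sum_assignment(cost)
    w_ri_aligned = w_ri[col_ind]          # reorder RI neurons to match PT
    return alpha * w_pt + (1.0 - alpha) * w_ri_aligned

# Toy check: an RI layer that is a permuted, slightly perturbed copy of PT.
rng = np.random.default_rng(0)
w_pt = rng.normal(size=(4, 3))
w_ri = w_pt[[2, 0, 3, 1]] + 0.01 * rng.normal(size=(4, 3))
fused = align_and_fuse(w_pt, w_ri)
```

In the toy check, alignment undoes the permutation, so the fused layer stays close to both inputs; without alignment, naive averaging would mix unrelated neurons.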

Self-supervised and Supervised Joint Training for Resource-rich Machine Translation

Self-supervised pre-training of text representations has been successful...

On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation

Pre-training (PT) and back-translation (BT) are two simple and powerful ...

On the Copying Behaviors of Pre-Training for Neural Machine Translation

Previous studies have shown that initializing neural machine translation...

Towards Better Chinese-centric Neural Machine Translation for Low-resource Languages

The last decade has witnessed enormous improvements in science and techn...

Linguistically-driven Multi-task Pre-training for Low-resource Neural Machine Translation

In the present study, we propose novel sequence-to-sequence pre-training...

Triangular Architecture for Rare Language Translation

Neural Machine Translation (NMT) performs poorly on low-resource langu...