Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation

10/16/2021
by Guanhua Chen, et al.

This paper demonstrates that multilingual pretraining, a proper fine-tuning method, and a large-scale parallel dataset from multiple auxiliary languages are all critical for zero-shot translation, where the NMT model is tested on source languages unseen during supervised training. Following this idea, we present SixT++, a strong many-to-English NMT model that supports 100 source languages but is trained once with a parallel dataset from only six source languages. SixT++ initializes the decoder embedding and the full encoder with XLM-R large and then trains the encoder and decoder layers with a simple two-stage training strategy. SixT++ achieves impressive performance on many-to-English translation: it significantly outperforms CRISS and m2m-100, two strong multilingual NMT systems, with average gains of 7.2 and 5.0 BLEU, respectively. Additionally, SixT++ offers a set of model parameters that can be further fine-tuned to develop unsupervised NMT models for low-resource languages. With back-translation on monolingual data of the low-resource language, it outperforms all current state-of-the-art unsupervised methods on Nepali and Sinhala for both translating into and from English.
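The sketch below illustrates the initialization scheme the abstract describes: the encoder and the decoder embedding start from XLM-R large, while the decoder layers are newly added and trained in two stages. This is a minimal PyTorch/Hugging Face illustration, not the authors' released code; the class name, decoder depth, and the exact stage split (stage 1 freezes the pretrained encoder, stage 2 fine-tunes everything) are assumptions for exposition.

```python
# Minimal sketch of SixT++-style initialization and two-stage training.
# Assumptions (not from the paper's code): class names, 12 decoder layers,
# and the exact freezing schedule per stage.
import torch.nn as nn
from transformers import XLMRobertaModel

class SixTPlusPlusSketch(nn.Module):
    def __init__(self, d_model: int = 1024, n_decoder_layers: int = 12):
        super().__init__()
        # Full encoder initialized from XLM-R large (hidden size 1024).
        self.encoder = XLMRobertaModel.from_pretrained("xlm-roberta-large")
        vocab_size = self.encoder.config.vocab_size
        # Decoder embedding copied from the pretrained XLM-R embedding table.
        self.dec_embed = nn.Embedding(vocab_size, d_model)
        self.dec_embed.weight.data.copy_(
            self.encoder.embeddings.word_embeddings.weight.data)
        layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=16, dim_feedforward=4096, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_decoder_layers)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        memory = self.encoder(
            input_ids=src_ids, attention_mask=src_mask).last_hidden_state
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(tgt_ids.device)
        hidden = self.decoder(self.dec_embed(tgt_ids), memory, tgt_mask=causal)
        return self.out_proj(hidden)

def configure_stage(model: SixTPlusPlusSketch, stage: int) -> None:
    # Assumed split: stage 1 trains only the randomly initialized decoder
    # while the XLM-R encoder stays frozen; stage 2 unfreezes the encoder
    # for joint fine-tuning on the six-language parallel data.
    for p in model.encoder.parameters():
        p.requires_grad = (stage == 2)
    for module in (model.dec_embed, model.decoder, model.out_proj):
        for p in module.parameters():
            p.requires_grad = True
```

The paper makes finer-grained choices about which components stay trainable in each stage; this sketch only captures the high-level idea of reusing a frozen multilingual encoder first and then fine-tuning end to end.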


