Multilingual Denoising Pre-training for Neural Machine Translation

by Yinhan Liu, et al.

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART – a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages; previous MT pre-training has focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables new types of transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.
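To make "denoising full texts" concrete, here is a minimal sketch of how a noised input could be built from a monolingual document. The abstract does not spell out the noising function; the sketch assumes the two noise types usually associated with the BART objective (span masking and sentence permutation), with a 35% mask ratio and Poisson-distributed span lengths (mean 3.5) as illustrative defaults. The names `MASK`, `mask_spans`, and `noise_document` are hypothetical, whitespace tokenization stands in for a real subword tokenizer, and spans are forced to at least one token for simplicity.

```python
import math
import random

MASK = "<mask>"  # placeholder token name; an assumption for this sketch


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mask_spans(tokens, rng, mask_ratio=0.35, mean_span=3.5):
    """Replace random token spans with one MASK each, covering at most
    mask_ratio of the original tokens."""
    tokens = list(tokens)
    budget = int(round(len(tokens) * mask_ratio))
    while budget > 0:
        # Simplification: every span covers at least one token.
        span = max(1, min(sample_poisson(rng, mean_span), budget))
        candidates = [
            i for i in range(len(tokens) - span + 1)
            if MASK not in tokens[i:i + span]
        ]
        if not candidates:
            break
        start = rng.choice(candidates)
        tokens[start:start + span] = [MASK]
        budget -= span
    return tokens


def noise_document(sentences, seed=0):
    """Shuffle sentence order, then mask spans in the flattened token list."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    tokens = [tok for sent in shuffled for tok in sent.split()]
    return mask_spans(tokens, rng)


if __name__ == "__main__":
    doc = ["the cat sat on the mat .", "it was warm .", "then it slept ."]
    print(" ".join(noise_document(doc, seed=0)))
```

Under these assumptions, a training pair is the noised token list as encoder input and the original, unshuffled document as the decoder target, so the full encoder-decoder learns to reconstruct complete texts rather than isolated fragments.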




