Log In Sign Up

VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation

by   Fuli Luo, et al.

Recent studies about learning multilingual representations have achieved significant performance gains across a wide range of downstream cross-lingual tasks. They train either an encoder-only Transformer mainly for understanding tasks, or an encoder-decoder Transformer specifically for generation tasks, ignoring the correlation between the two tasks and frameworks. In contrast, this paper presents a variable encoder-decoder (VECO) pre-training approach to unify the two mainstreams in both model architectures and pre-training tasks. VECO splits the standard Transformer block into several sub-modules trained with both inner-sequence and cross-sequence masked language modeling, and correspondingly reorganizes certain sub-modules for understanding and generation tasks during inference. Such a workflow not only ensures to train the most streamlined parameters necessary for two kinds of tasks, but also enables them to boost each other via sharing common sub-modules. As a result, VECO delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark covering text classification, sequence labeling, question answering, and sentence retrieval. For generation tasks, VECO also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1∼2 BLEU.


page 1

page 2

page 3

page 4


ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation

Recent cross-lingual cross-modal works attempt to extend Vision-Language...

mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs

Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive ...

Cross-Thought for Sentence Encoder Pre-training

In this paper, we propose Cross-Thought, a novel approach to pre-trainin...

MASS: Masked Sequence to Sequence Pre-training for Language Generation

Pre-training and fine-tuning, e.g., BERT, have achieved great success in...

GIT: A Generative Image-to-text Transformer for Vision and Language

In this paper, we design and train a Generative Image-to-text Transforme...

DOCmT5: Document-Level Pretraining of Multilingual Language Models

In this paper, we introduce DOCmT5, a multilingual sequence-to-sequence ...

A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models

Undirected neural sequence models such as BERT have received renewed int...