Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation

04/16/2022
by Changtong Zan, et al.

For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g., mBART, the self-supervised pretraining task is performed over a wide range of monolingual languages, e.g., 25 languages from CommonCrawl, while the downstream cross-lingual tasks generally operate on a bilingual language subset, e.g., English-German. This creates a cross-lingual data discrepancy, namely the domain discrepancy, and a cross-lingual learning objective discrepancy, namely the task discrepancy, between the pretraining and fine-tuning stages. To bridge these cross-lingual domain and task gaps, we extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task. Specifically, the first stage employs the self-supervised code-switching restore task as a pretext task, allowing the multilingual Seq2Seq PLM to acquire some in-domain alignment information. In the second stage, we fine-tune the model on labeled data as usual. Experiments on a variety of cross-lingual NLG tasks, including 12 bilingual translation tasks, 36 zero-shot translation tasks, and cross-lingual summarization tasks, show that our model consistently outperforms the strong baseline mBART. Comprehensive analyses indicate that our approach narrows the cross-lingual sentence representation distance and improves low-frequency word translation at trivial computational cost.
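As an illustrative sketch only (the abstract does not specify the exact procedure, and none of the names below come from the paper), the code-switching restore pretext task can be pictured as follows: a fraction of source-language words are swapped for their target-language translations using a bilingual lexicon, and the model is trained to restore the original sentence. The function name, lexicon, and switch ratio here are hypothetical.

# Minimal sketch of building a code-switching restore example,
# assuming a word-level bilingual lexicon for the downstream language pair.
import random

def make_code_switch_example(src_tokens, lexicon, switch_ratio=0.3, seed=0):
    """Replace a fraction of source tokens with target-language translations;
    the (corrupted, original) pair serves as input/target for the restore task."""
    rng = random.Random(seed)
    corrupted = []
    for tok in src_tokens:
        if tok in lexicon and rng.random() < switch_ratio:
            corrupted.append(lexicon[tok])  # swap in the aligned target word
        else:
            corrupted.append(tok)
    return corrupted, src_tokens

# Hypothetical English-German lexicon and sentence, for illustration only.
lexicon = {"cat": "Katze", "sat": "sass", "mat": "Matte"}
src = ["the", "cat", "sat", "on", "the", "mat"]
inp, tgt = make_code_switch_example(src, lexicon)
print(inp)  # e.g. ['the', 'Katze', 'sat', 'on', 'the', 'Matte']
print(tgt)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']

Training on such corrupted/original pairs exposes the model to bilingual word correspondences drawn from the downstream language pair, which is the kind of in-domain alignment information the abstract describes the first stage acquiring.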

