nmT5 – Is parallel data still relevant for pre-training massively multilingual language models?

06/03/2021
by Mihir Kale, et al.

Recently, mT5 - a massively multilingual version of T5 - leveraged a unified text-to-text format to attain state-of-the-art results on a wide variety of multilingual NLP tasks. In this paper, we investigate the impact of incorporating parallel data into mT5 pre-training. We find that multi-tasking language modeling with objectives such as machine translation during pre-training is a straightforward way to improve performance on downstream multilingual and cross-lingual tasks. However, the gains start to diminish as the model capacity increases, suggesting that parallel data might not be as essential for larger models. At the same time, even at larger model sizes, we find that pre-training with parallel data still provides benefits in the limited labelled data regime.
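The multi-task recipe the abstract describes keeps everything in mT5's text-to-text format: monolingual text is consumed through the usual span-corruption objective, while parallel sentence pairs are cast as translation examples and mixed into the pre-training stream. The sketch below is a minimal illustration of that mixing in Python; the helper names (make_span_corruption_example, make_translation_example, mixed_batch), the task-prefix wording, and the mixing rate are illustrative assumptions, not the authors' implementation or the T5/mT5 codebase.

```python
import random

# Illustrative sketch only: how a text-to-text pre-training stream could mix
# a T5-style span-corruption objective with a translation objective.
# Function names, prefixes, and rates are assumptions, not the paper's code.

SENTINELS = [f"<extra_id_{i}>" for i in range(100)]  # T5-style sentinel tokens


def make_span_corruption_example(text, span_len=3):
    """Mask one contiguous span and ask the model to reconstruct it."""
    tokens = text.split()
    if len(tokens) <= span_len:
        return {"inputs": text, "targets": text}
    start = random.randrange(len(tokens) - span_len)
    masked = tokens[:start] + [SENTINELS[0]] + tokens[start + span_len:]
    target = [SENTINELS[0]] + tokens[start:start + span_len] + [SENTINELS[1]]
    return {"inputs": " ".join(masked), "targets": " ".join(target)}


def make_translation_example(src, tgt, src_lang, tgt_lang):
    """Cast a parallel sentence pair as a text-to-text task via a prefix."""
    return {"inputs": f"translate {src_lang} to {tgt_lang}: {src}",
            "targets": tgt}


def mixed_batch(mono_texts, parallel_pairs, translation_rate=0.2):
    """Sample pre-training examples, mixing the two objectives at a fixed rate."""
    batch = []
    for text, (src, tgt) in zip(mono_texts, parallel_pairs):
        if random.random() < translation_rate:
            batch.append(make_translation_example(src, tgt, "English", "German"))
        else:
            batch.append(make_span_corruption_example(text))
    return batch


if __name__ == "__main__":
    mono = ["the quick brown fox jumps over the lazy dog"]
    parallel = [("the house is small", "das Haus ist klein")]
    for example in mixed_batch(mono, parallel, translation_rate=0.5):
        print(example)
```

The point of the sketch is only that adding parallel data does not require any architectural change: the translation objective enters purely through the example stream, and the choice of objective and mixing proportion are the empirical questions the paper studies.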


Related research

Cross-Lingual Supervision improves Large Language Models Pre-training (05/19/2023)
The recent rapid progress in pre-training Large Language Models has reli...

Tricks for Training Sparse Translation Models (10/15/2021)
Multi-task learning with an unbalanced data distribution skews model lea...

Let Your Heart Speak in its Mother Tongue: Multilingual Captioning of Cardiac Signals (03/19/2021)
Cardiac signals, such as the electrocardiogram, convey a significant amo...

ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain (05/20/2023)
The increasing number of benchmarks for Natural Language Processing (NLP...

Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models (06/14/2023)
Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each ...

Parameter and Data Efficient Continual Pre-training for Robustness to Dialectal Variance in Arabic (11/08/2022)
The use of multilingual language models for tasks in low and high-resour...

DOCmT5: Document-Level Pretraining of Multilingual Language Models (12/16/2021)
In this paper, we introduce DOCmT5, a multilingual sequence-to-sequence ...
