Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models

05/24/2022
by   Terra Blevins, et al.

The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior. However, because these analyses have focused on fully trained multilingual models, little is known about the dynamics of the multilingual pretraining process. We investigate when these models acquire their in-language and cross-lingual abilities by probing checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones. In contrast, when the model learns to transfer cross-lingually depends on the language pair. Interestingly, we also observe that, across many languages and tasks, the final, converged model checkpoint exhibits significant performance degradation and that no one checkpoint performs best on all languages. Taken together with our other findings, these insights highlight the complexity and interconnectedness of multilingual pretraining.
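To make the probing setup concrete, below is a minimal Python sketch of the general checkpoint-probing recipe the abstract describes: freeze each pretraining checkpoint, train a lightweight linear probe on English representations, and score it both in-language and on a second language. This is an illustration under stated assumptions, not the paper's exact method; the checkpoint paths, the toy binary probing task, and the example sentences are hypothetical placeholders for the authors' suite of linguistic tasks and data.

```python
# Hedged sketch of checkpoint probing with frozen XLM-R representations.
# Assumes intermediate checkpoints are saved locally in Hugging Face format;
# paths, the toy task, and all example sentences below are hypothetical.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

def embed(model, sentences):
    """Mean-pool the final hidden layer into one frozen feature vector per sentence."""
    feats = []
    with torch.no_grad():
        for text in sentences:
            inputs = tokenizer(text, return_tensors="pt", truncation=True)
            hidden = model(**inputs).last_hidden_state        # (1, seq_len, dim)
            feats.append(hidden.mean(dim=1).squeeze(0).numpy())
    return feats

# Hypothetical checkpoints from different points in pretraining, plus the final model.
checkpoints = ["ckpts/step_010k", "ckpts/step_100k", "xlm-roberta-base"]

# Toy probing task (binary label); a real probe would use labeled linguistic datasets.
en_train = (["the dogs run", "a dog runs"], [1, 0])           # train probe in English
en_test  = (["the cats sleep", "a cat sleeps"], [1, 0])       # in-language evaluation
es_test  = (["los perros corren", "un perro corre"], [1, 0])  # cross-lingual evaluation

for ckpt in checkpoints:
    encoder = AutoModel.from_pretrained(ckpt)                 # frozen encoder; only the probe is trained
    probe = LogisticRegression(max_iter=1000)
    probe.fit(embed(encoder, en_train[0]), en_train[1])
    in_lang = probe.score(embed(encoder, en_test[0]), en_test[1])
    x_ling  = probe.score(embed(encoder, es_test[0]), es_test[1])
    print(f"{ckpt}: in-language = {in_lang:.2f}, cross-lingual = {x_ling:.2f}")
```

Tracking the two probe scores across checkpoints is what lets one separate when in-language skills emerge from when cross-lingual transfer emerges, as the abstract discusses.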

