Unsupervised Cross-lingual Representation Learning at Scale

by Alexis Conneau, et al.

This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +13.8% average accuracy on XNLI and +12.3% average F1 score on MLQA. XLM-R performs particularly well on low-resource languages, improving 11.8% in XNLI accuracy for Swahili and 9.2% for Urdu over previous XLM models. We also present a detailed empirical evaluation of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high- and low-resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing per-language performance; XLM-R is very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We will make XLM-R code, data, and models publicly available.
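The pretraining objective behind XLM-R is masked language modeling: tokens are randomly selected and corrupted, and the Transformer is trained to recover the originals. A minimal sketch of the standard BERT-style masking recipe is below; the 15% selection rate and 80/10/10 replacement split follow that recipe, while `MASK_ID` and the vocabulary size are illustrative placeholders rather than the model's actual configuration.

```python
import random

MASK_ID = 4           # placeholder id for the <mask> token; real ids come from the tokenizer
VOCAB_SIZE = 250_002  # illustrative: XLM-R uses a large SentencePiece vocabulary

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """BERT-style corruption for masked language modeling.

    Each token is selected with probability `mask_prob`; a selected token is
    replaced by <mask> 80% of the time, by a random token 10% of the time, and
    kept unchanged 10% of the time. Returns (inputs, labels), where labels
    holds the original id at selected positions and -100 elsewhere (the usual
    "ignore" value for the cross-entropy loss).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tid in token_ids:
        if rng.random() < mask_prob:
            labels.append(tid)  # the model must predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_ID)
            elif r < 0.9:
                inputs.append(rng.randrange(VOCAB_SIZE))
            else:
                inputs.append(tid)
        else:
            inputs.append(tid)
            labels.append(-100)  # position ignored by the loss
    return inputs, labels

# Example: corrupt a toy 100-token sequence.
inputs, labels = mask_tokens(list(range(100)))
```

The 80/10/10 split (rather than always inserting `<mask>`) reduces the mismatch between pretraining, where `<mask>` appears, and fine-tuning, where it never does.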




Related Research

- Larger-Scale Transformers for Multilingual Masked Language Modeling
- Language Scaling for Universal Suggested Replies Model
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models
- Towards Best Practices for Training Multilingual Dense Retrieval Models
- Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning
- A Conditional Generative Matching Model for Multi-lingual Reply Suggestion
- Romanization-based Large-scale Adaptation of Multilingual Language Models
