
Larger-Scale Transformers for Multilingual Masked Language Modeling

by   Naman Goyal, et al.

Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models, dubbed XLM-R XL and XLM-R XXL, outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests pretrained models with larger capacity may obtain strong performance on high-resource languages while greatly improving low-resource languages. We make our code and models publicly available.
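The masked language modeling objective these models are pretrained with can be sketched as follows. This is a minimal, illustrative implementation of BERT-style masking (15% of tokens selected; of those, 80% replaced with a mask token, 10% with a random token, 10% kept), assuming a toy integer vocabulary — the constants and token representation are illustrative, not the paper's actual SentencePiece tokenizer or training pipeline.

```python
import random

MASK = "[MASK]"      # placeholder mask symbol (illustrative)
VOCAB_SIZE = 1000    # toy vocabulary size (illustrative)
MASK_PROB = 0.15     # fraction of tokens selected for the MLM objective

def mask_tokens(tokens, rng):
    """BERT-style masking: of the selected tokens, replace 80% with MASK,
    10% with a random token, and leave 10% unchanged; the model is trained
    to predict the original token at every selected position."""
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < MASK_PROB:
            labels.append(tok)              # model must recover the original
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)         # 80%: mask symbol
            elif roll < 0.9:
                inputs.append(rng.randrange(VOCAB_SIZE))  # 10%: random id
            else:
                inputs.append(tok)          # 10%: unchanged, still predicted
        else:
            inputs.append(tok)
            labels.append(None)             # position excluded from the loss
    return inputs, labels

rng = random.Random(0)
inp, lab = mask_tokens(list(range(20)), rng)
```

Only the positions with a non-`None` label contribute to the cross-entropy loss, which is what lets the model learn from unlabeled multilingual text.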



