Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning

by Barun Patra et al.

In this paper, we elaborate upon recipes for building multilingual representation models that are not only competitive with existing state-of-the-art models but also more parameter-efficient, thereby promoting better adoption in resource-constrained scenarios and practical applications. We show that going beyond English-centric bitexts, coupled with a novel sampling strategy aimed at reducing under-utilization of training data, substantially boosts performance across model sizes for both Electra and MLM pre-training objectives. We introduce XY-LENT (X-Y bitext enhanced Language ENcodings using Transformers), which not only achieves state-of-the-art performance on 5 cross-lingual tasks within all model size bands but is also competitive across bands. Our XY-LENT XL variant outperforms XLM-R XXL and exhibits competitive performance with mT5 XXL while being 5x and 6x smaller, respectively. We then show that our proposed method helps ameliorate the curse of multilinguality, with XY-LENT XL achieving 99.3% GLUE performance and 98.5% SQuAD 2.0 performance compared to a state-of-the-art English-only model in the same size band. Finally, we analyze our models' performance on extremely low-resource languages and posit that scaling alone may not be sufficient for improving performance in this scenario.
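
The abstract does not spell out the proposed sampling strategy. For context only, here is a minimal sketch of the standard temperature-based corpus sampling commonly used in multilingual pre-training (a swapped-in baseline, not the XY-LENT method), showing one way a fixed training budget can leave much of a large corpus unseen. The function names, the hypothetical bitext sizes and language-pair keys, and the exponent `alpha=0.3` are illustrative assumptions.

```python
from __future__ import annotations


def temperature_sampling_probs(sizes: dict[str, int], alpha: float = 0.3) -> dict[str, float]:
    """Sampling probability per corpus: p_i proportional to (n_i / N) ** alpha.

    alpha = 1 reproduces the natural data distribution; alpha < 1 flattens it,
    upsampling small corpora and downsampling large ones.
    Illustrative baseline only, not the XY-LENT sampling strategy.
    """
    total = sum(sizes.values())
    weights = {name: (n / total) ** alpha for name, n in sizes.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


def expected_epochs(sizes: dict[str, int], probs: dict[str, float], budget: int) -> dict[str, float]:
    """Expected passes over each corpus when drawing `budget` examples.

    Values below 1.0 mean part of that corpus is never seen (under-utilized);
    values well above 1.0 mean the corpus is repeated many times.
    """
    return {name: probs[name] * budget / sizes[name] for name in sizes}


if __name__ == "__main__":
    # Hypothetical bitext sizes (number of sentence pairs), not from the paper.
    sizes = {"en-fr": 1_000_000, "de-fr": 100_000, "sw-yo": 1_000}
    probs = temperature_sampling_probs(sizes, alpha=0.3)
    epochs = expected_epochs(sizes, probs, budget=500_000)
    for name in sizes:
        print(f"{name}: p={probs[name]:.4f}, expected epochs={epochs[name]:.2f}")
```

With these hypothetical numbers, the largest pair gets roughly a third of one pass over its data while the smallest is repeated dozens of times. This kind of imbalance is one way training data can end up under-utilized; how the paper's own strategy addresses it is not described in this abstract.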


Larger-Scale Transformers for Multilingual Masked Language Modeling

Recent work has demonstrated the effectiveness of cross-lingual language...

Unsupervised Cross-lingual Representation Learning at Scale

This paper shows that pretraining multilingual language models at scale ...

XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition

In this paper, we propose a weakly supervised multilingual representatio...

Multi-level Distillation of Semantic Knowledge for Pre-training Multilingual Language Model

Pre-trained multilingual language models play an important role in cross...

OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval

Aligning parallel sentences in multilingual corpora is essential to cura...

Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training

The goal of stance detection is to determine the viewpoint expressed in ...

Transformers without Tears: Improving the Normalization of Self-Attention

We evaluate three simple, normalization-centric changes to improve Trans...