Soft Language Clustering for Multilingual Model Pre-training

06/13/2023
by Jiali Zeng, et al.

Multilingual pre-trained language models have demonstrated impressive (zero-shot) cross-lingual transfer abilities; however, their performance is hindered when the target language is typologically distant from the source languages or when the pre-training data is limited in size. In this paper, we propose XLM-P, which contextually retrieves prompts as flexible guidance for conditionally encoding instances. XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods. On the XTREME tasks, including text classification, sequence labeling, question answering, and sentence retrieval, both base- and large-size language models pre-trained with our proposed method exhibit consistent performance improvements. Furthermore, our method provides substantial advantages for low-resource languages in unsupervised sentence retrieval and for target languages that differ greatly from the source language in cross-lingual transfer.
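
To make the idea of contextual prompt retrieval more concrete, below is a minimal PyTorch sketch of one way such instance-conditioned retrieval could look: a learnable pool of prompt vectors is softly weighted by an instance-level query and prepended to the token embeddings before the shared multilingual encoder. The class name, pool size, and mean-pooled query used here are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ContextualPromptRetriever(nn.Module):
    """Minimal sketch of instance-conditioned prompt retrieval.

    A small pool of learnable prompt vectors is softly retrieved
    (attention-weighted) per input instance and prepended to the token
    embeddings before the shared multilingual encoder. All names and
    sizes are illustrative assumptions.
    """

    def __init__(self, hidden_size: int = 768, pool_size: int = 32, num_prompts: int = 4):
        super().__init__()
        # Learnable pool of candidate prompt vectors shared across languages.
        self.prompt_pool = nn.Parameter(torch.randn(pool_size, hidden_size) * 0.02)
        self.query_proj = nn.Linear(hidden_size, hidden_size)
        self.num_prompts = num_prompts

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden)
        # Build an instance-level query by mean-pooling the token embeddings.
        query = self.query_proj(token_embeddings.mean(dim=1))   # (batch, hidden)
        scores = query @ self.prompt_pool.t()                   # (batch, pool_size)
        weights = torch.softmax(scores, dim=-1)                 # soft retrieval weights
        # Soft mixture over the pool, repeated as a short prompt prefix.
        retrieved = weights @ self.prompt_pool                  # (batch, hidden)
        prompts = retrieved.unsqueeze(1).expand(-1, self.num_prompts, -1)
        # Prepend the retrieved prompts so the encoder is conditioned on them.
        return torch.cat([prompts, token_embeddings], dim=1)


if __name__ == "__main__":
    retriever = ContextualPromptRetriever()
    x = torch.randn(2, 16, 768)   # dummy batch of token embeddings
    print(retriever(x).shape)     # torch.Size([2, 20, 768])
```

Soft, attention-weighted retrieval keeps the module lightweight and fully differentiable, which fits the paper's framing of lightweight modeling of language-invariant and language-specific knowledge.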

Related research

02/28/2022 · CINO: A Chinese Minority Pre-trained Language Model
Multilingual pre-trained language models have shown impressive performan...

05/12/2022 · Lifting the Curse of Multilinguality by Pre-training Modular Transformers
Multilingual pre-trained models are known to suffer from the curse of mu...

09/13/2021 · Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training
The goal of stance detection is to determine the viewpoint expressed in ...

12/13/2022 · ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
Software engineers working with the same programming language (PL) may s...

09/15/2021 · On the Universality of Deep Contextual Language Models
Deep Contextual Language Models (LMs) like ELMo, BERT, and their success...

10/26/2022 · Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning
In this paper, we elaborate upon recipes for building multilingual repre...

09/14/2022 · Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models
As pre-trained language models become more resource-demanding, the inequ...
