Lifting the Curse of Multilinguality by Pre-training Modular Transformers

05/12/2022
by Jonas Pfeiffer, et al.

Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (X-Mod) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
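
To make the idea concrete, below is a minimal PyTorch-style sketch of one way such a modular layer can be wired: a shared self-attention and feed-forward block followed by a per-language bottleneck module selected by a language code. All names here (ModularTransformerLayer, LanguageModule, d_bottleneck) and the exact module placement are illustrative assumptions, not the paper's released implementation.

```python
# Illustrative sketch only: class names, the bottleneck size, and the
# routing scheme are assumptions, not the paper's released code.
import torch
import torch.nn as nn


class LanguageModule(nn.Module):
    """Bottleneck feed-forward module owned by a single language."""

    def __init__(self, d_model: int, d_bottleneck: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck: only these weights are language-specific.
        return x + self.up(torch.relu(self.down(self.norm(x))))


class ModularTransformerLayer(nn.Module):
    """Shared attention and FFN plus one module per pre-training language."""

    def __init__(self, d_model: int, n_heads: int, languages: list,
                 d_bottleneck: int = 384):
        super().__init__()
        self.d_model, self.d_bottleneck = d_model, d_bottleneck
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ffn_norm = nn.LayerNorm(d_model)
        # Total capacity grows with the number of languages, but a forward
        # pass only ever touches the module of the active language.
        self.lang_modules = nn.ModuleDict(
            {lang: LanguageModule(d_model, d_bottleneck) for lang in languages})

    def add_language(self, lang: str) -> None:
        # Post-hoc extension: attach a fresh module; shared weights untouched.
        self.lang_modules[lang] = LanguageModule(self.d_model, self.d_bottleneck)

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        h, _ = self.attn(x, x, x, need_weights=False)
        x = self.attn_norm(x + h)
        x = self.ffn_norm(x + self.ffn(x))
        return self.lang_modules[lang](x)


# Route a batch through the English module; shape stays (batch, seq, d_model).
layer = ModularTransformerLayer(d_model=768, n_heads=12,
                                languages=["en", "de", "sw"])
out = layer(torch.randn(2, 16, 768), lang="en")
```

Because each input is routed through exactly one language module, the trainable parameter count per language stays constant while total capacity scales with the number of modules, and add_language mirrors the post-hoc extension the abstract describes.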


Related research

06/13/2023
Soft Language Clustering for Multilingual Model Pre-training
Multilingual pre-trained language models have demonstrated impressive (z...

06/03/2021
Syntax-augmented Multilingual BERT for Cross-lingual Transfer
In recent years, we have seen a colossal effort in pre-training multilin...

09/01/2021
Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty Estimation
Recent multilingual pre-trained language models have achieved remarkable...

04/30/2020
A Focused Study to Compare Arabic Pre-training Models on Newswire IE Tasks
The Arabic language is a morphologically rich language, posing many challe...

03/17/2021
SML: a new Semantic Embedding Alignment Transformer for efficient cross-lingual Natural Language Inference
The ability of Transformers to perform with precision a variety of tasks...

02/22/2023
Modular Deep Learning
Transfer learning has recently become the dominant paradigm of machine l...

10/23/2020
Natural Language Processing Chains Inside a Cross-lingual Event-Centric Knowledge Pipeline for European Union Under-resourced Languages
This article presents the strategy for developing a platform containing ...
