When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer

by Ameet Deshpande et al.

While recent work on multilingual language models has demonstrated their capacity for cross-lingual zero-shot transfer on downstream tasks, there is a lack of consensus in the community as to what shared properties between languages enable such transfer. Analyses involving pairs of natural languages are often inconclusive and contradictory since languages simultaneously differ in many linguistic aspects. In this paper, we perform a large-scale empirical study to isolate the effects of various linguistic properties by measuring zero-shot transfer between four diverse natural languages and their counterparts constructed by modifying aspects such as the script, word order, and syntax. Among other things, our experiments show that the absence of sub-word overlap significantly hurts zero-shot transfer when languages differ in their word order, and that there is a strong correlation between transfer performance and word embedding alignment between languages (e.g., R=0.94 on the task of NLI). Our results suggest that multilingual models should focus on explicitly improving word embedding alignment between languages rather than relying on its implicit emergence.
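The abstract reports a strong correlation (e.g., R=0.94 on NLI) between transfer performance and word embedding alignment across languages. A minimal sketch of how such quantities could be computed is below; the function names, the use of mean cosine similarity over a bilingual dictionary as the alignment measure, and the toy inputs are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def alignment_score(src_emb, tgt_emb):
    """Mean cosine similarity over translation pairs.

    src_emb, tgt_emb: (n_pairs, dim) arrays where row i of each matrix
    holds the embedding of one side of the i-th dictionary entry.
    (Assumed alignment measure; the paper may define it differently.)
    """
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    return float(np.mean(np.sum(src * tgt, axis=1)))

def pearson_r(x, y):
    """Pearson correlation between alignment scores and transfer scores."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

# Toy usage: one alignment score per language pair, correlated with a
# (hypothetical) downstream zero-shot accuracy for that pair.
alignments = [0.35, 0.52, 0.61, 0.78]
transfer_acc = [0.48, 0.60, 0.66, 0.81]
r = pearson_r(alignments, transfer_acc)
```

Perfectly aligned embeddings give a score of 1.0, and a perfectly linear relationship between alignment and transfer gives R=1.0, which makes the scale of the reported R=0.94 easy to interpret.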





Related papers:

- Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space
- Establishing Interlingua in Multilingual Language Models
- Finding Universal Grammatical Relations in Multilingual BERT
- Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning
- How Language-Neutral is Multilingual BERT?
- Multilingual Alignment of Contextual Word Representations