When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer

10/27/2021
by Ameet Deshpande et al.

While recent work on multilingual language models has demonstrated their capacity for cross-lingual zero-shot transfer on downstream tasks, there is a lack of consensus in the community as to what shared properties between languages enable such transfer. Analyses involving pairs of natural languages are often inconclusive and contradictory since languages simultaneously differ in many linguistic aspects. In this paper, we perform a large-scale empirical study to isolate the effects of various linguistic properties by measuring zero-shot transfer between four diverse natural languages and their counterparts constructed by modifying aspects such as the script, word order, and syntax. Among other things, our experiments show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order, and there is a strong correlation between transfer performance and word embedding alignment between languages (e.g., R=0.94 on the task of NLI). Our results call for focus in multilingual models on explicitly improving word embedding alignment between languages rather than relying on its implicit emergence.
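The abstract reports a strong correlation (R=0.94 on NLI) between transfer performance and word embedding alignment, but does not spell out the alignment metric here. As a minimal sketch of one common way to quantify such alignment, the code below solves orthogonal Procrustes over a set of translation pairs and scores the mean cosine similarity of mapped embeddings. The function and variable names (alignment_score, src, tgt) are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch: quantify word embedding alignment between two
# languages via orthogonal Procrustes, assuming paired embeddings for a
# bilingual dictionary. Not the paper's implementation.
import numpy as np

def alignment_score(src: np.ndarray, tgt: np.ndarray) -> float:
    """src, tgt: (n_pairs, dim) embeddings of translation pairs."""
    # Orthogonal Procrustes: W = argmin ||src @ W - tgt||_F over
    # orthogonal W, solved as W = U @ Vt where src.T @ tgt = U S Vt.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    mapped = src @ (u @ vt)
    # Mean cosine similarity between mapped source and target vectors.
    num = np.sum(mapped * tgt, axis=1)
    den = np.linalg.norm(mapped, axis=1) * np.linalg.norm(tgt, axis=1)
    return float(np.mean(num / den))

# Sanity check: a purely rotated copy of a space is perfectly alignable.
rng = np.random.default_rng(0)
src = rng.normal(size=(500, 768))
rotation, _ = np.linalg.qr(rng.normal(size=(768, 768)))
print(alignment_score(src, src @ rotation))  # ~1.0
```

Given such scores for several language pairs alongside their zero-shot transfer accuracies, the correlation the abstract refers to could then be computed with, e.g., np.corrcoef.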


