Establishing Interlingua in Multilingual Language Models

09/02/2021
by   Maksym Del, et al.
0

Large multilingual language models show remarkable zero-shot cross-lingual transfer performance on a range of tasks. Follow-up works hypothesized that these models internally project representations of different languages into a shared interlingual space. However, they produced contradictory results. In this paper, we correct claiming that "BERT is not an Interlingua" and show that with the proper choice of sentence representation different languages actually do converge to a shared space in such language models. Furthermore, we demonstrate that this convergence pattern is robust across four measures of correlation similarity and six mBERT-like models. We then extend our analysis to 28 diverse languages and find that the interlingual space exhibits a particular structure similar to the linguistic relatedness of languages. We also highlight a few outlier languages that seem to fail to converge to the shared space. The code for replicating our results is available at the following URL: https://github.com/maksym-del/interlingua.

READ FULL TEXT

page 4

page 7

research
12/04/2022

Cross-lingual Similarity of Multilingual Representations Revisited

Related works used indexes like CKA and variants of CCA to measure the s...
research
04/17/2021

A multilabel approach to morphosyntactic probing

We introduce a multilabel probing task to assess the morphosyntactic rep...
research
03/06/2023

Two-stage Pipeline for Multilingual Dialect Detection

Dialect Identification is a crucial task for localizing various Large La...
research
06/09/2022

Ancestor-to-Creole Transfer is Not a Walk in the Park

We aim to learn language models for Creole languages for which large vol...
research
09/15/2021

On the Universality of Deep COntextual Language Models

Deep Contextual Language Models (LMs) like ELMO, BERT, and their success...
research
04/24/2019

Semantic Drift in Multilingual Representations

Multilingual representations have mostly been evaluated based on their p...
research
10/24/2022

Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation

Linguistic analysis of language models is one of the ways to explain and...

Please sign up or login with your details

Forgot password? Click here to reset