On the Language Neutrality of Pre-trained Multilingual Representations

04/09/2020
by Jindřich Libovický, et al.

Multilingual contextual embeddings, such as multilingual BERT (mBERT) and XLM-RoBERTa, have proved useful for many multilingual tasks. Previous work probed the cross-linguality of the representations indirectly, using zero-shot transfer learning on morphological and syntactic tasks. We instead focus on the language neutrality of mBERT with respect to lexical semantics. Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings, which are explicitly trained for language neutrality. By default, however, contextual embeddings are still only moderately language-neutral, so we present two simple methods for achieving stronger language neutrality: first, unsupervised centering of the representations for each language, and second, fitting an explicit projection on small parallel data. In addition, we show how to reach state-of-the-art accuracy on language identification and on word alignment in parallel sentences.
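Both methods admit a compact sketch. The following is a minimal, hypothetical illustration in numpy, not the authors' code: the dictionary embeddings_by_lang and the functions below are illustrative names, the contextual vectors are assumed to be precomputed mBERT outputs, and the projection is fit here with ordinary least squares on representations of parallel examples.

```python
import numpy as np

def center_by_language(embeddings_by_lang):
    """Language centering: subtract each language's mean vector.

    This removes the language-identity component that makes raw
    contextual representations cluster by language rather than meaning.
    embeddings_by_lang: dict mapping lang code -> (n_i, d) array.
    """
    return {
        lang: emb - emb.mean(axis=0, keepdims=True)
        for lang, emb in embeddings_by_lang.items()
    }

def fit_projection(src_emb, tgt_emb):
    """Fit a linear map W from source to target representations on a
    small set of parallel examples, by least squares:
        W = argmin_W || src_emb @ W - tgt_emb ||^2
    src_emb, tgt_emb: (n, d) arrays for aligned source/target pairs.
    """
    W, *_ = np.linalg.lstsq(src_emb, tgt_emb, rcond=None)
    return W

# Toy usage with random stand-ins for mBERT vectors (d = 768).
rng = np.random.default_rng(0)
embs = {"en": rng.normal(size=(100, 768)),
        "de": rng.normal(size=(100, 768))}
centered = center_by_language(embs)
W = fit_projection(centered["de"][:50], centered["en"][:50])
projected_de = centered["de"] @ W  # German vectors mapped toward English space
```

Centering is fully unsupervised (it needs only monolingual text per language), while the projection needs a small amount of parallel data; a least-squares fit is one plausible choice of linear map for such a sketch.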

Related research

11/08/2019: How Language-Neutral is Multilingual BERT?
Multilingual BERT (mBERT) provides sentence representations for 104 lang...

03/17/2022: Combining Static and Contextualised Multilingual Embeddings
Static and contextual multilingual embeddings have complementary strengt...

01/19/2023: Language Embeddings Sometimes Contain Typological Generalizations
To what extent can neural network models learn generalizations about lan...

02/10/2020: Multilingual Alignment of Contextual Word Representations
We propose procedures for evaluating and strengthening contextual embedd...

01/26/2021: Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT
We investigate how Multilingual BERT (mBERT) encodes grammar by examinin...

01/28/2020: Unsupervised Multilingual Alignment using Wasserstein Barycenter
We study unsupervised multilingual alignment, the problem of finding wor...

02/06/2014: An Autoencoder Approach to Learning Bilingual Word Representations
Cross-language learning allows us to use training data from one language...
