How Language-Neutral is Multilingual BERT?

11/08/2019
by   Jindřich Libovický, et al.
0

Multilingual BERT (mBERT) provides sentence representations for 104 languages, which are useful for many multi-lingual tasks. Previous work probed the cross-linguality of mBERT using zero-shot transfer learning on morphological and syntactic tasks. We instead focus on the semantic properties of mBERT. We show that mBERT representations can be split into a language-specific component and a language-neutral component, and that the language-neutral component is sufficiently general in terms of modeling semantics to allow high-accuracy word-alignment and sentence retrieval but is not yet good enough for the more difficult task of MT quality estimation. Our work presents interesting challenges which must be solved to build better language-neutral representations, particularly for tasks requiring linguistic transfer of semantics.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/09/2020

On the Language Neutrality of Pre-trained Multilingual Representations

Multilingual contextual embeddings, such as multilingual BERT (mBERT) an...
research
08/20/2020

Inducing Language-Agnostic Multilingual Representations

Multilingual representations have the potential to make cross-lingual sy...
research
10/19/2020

Cross-Lingual Transfer in Zero-Shot Cross-Language Entity Linking

Cross-language entity linking grounds mentions in multiple languages to ...
research
09/27/2020

What does it mean to be language-agnostic? Probing multilingual sentence encoders for typological properties

Multilingual sentence encoders have seen much success in cross-lingual m...
research
05/09/2020

Finding Universal Grammatical Relations in Multilingual BERT

Recent work has found evidence that Multilingual BERT (mBERT), a transfo...
research
05/01/2020

Identifying Necessary Elements for BERT's Multilinguality

It has been shown that multilingual BERT (mBERT) yields high quality mul...
research
10/11/2020

Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

Contextualized word representations, such as ELMo and BERT, were shown t...

Please sign up or login with your details

Forgot password? Click here to reset