An Isotropy Analysis in the Multilingual BERT Embedding Space

10/09/2021
by Sara Rajaee, et al.

Several studies have explored the advantages of multilingual pre-trained models (e.g., multilingual BERT) in capturing shared linguistic knowledge, but their limitations have received far less attention. In this paper, we investigate the representation degeneration problem in the multilingual contextual word representations (CWRs) of BERT and show that the embedding spaces of the selected languages suffer from the anisotropy problem. Our experimental results demonstrate that, as with their monolingual counterparts, increasing the isotropy of the multilingual embedding space significantly improves its representation power and downstream performance. Our analysis further indicates that although the degenerated directions vary across languages, they encode similar linguistic knowledge, suggesting a shared linguistic space among languages.
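Isotropy in this line of work is commonly quantified with the partition-function measure of Mu & Viswanath (min/max of Z(u) over principal directions), and a standard way to increase it is to project out the dominant principal components that all embeddings share. The sketch below is an illustration of that general recipe, not the paper's exact procedure; the function names and the choice of k are my own.

```python
import numpy as np

def isotropy_score(embeddings):
    """Approximate isotropy as min(Z(u)) / max(Z(u)), where
    Z(u) = sum_i exp(u . x_i) and u ranges over the principal
    directions of the (centered) embedding matrix.
    A perfectly isotropic space scores 1.0; a degenerated,
    anisotropic space scores near 0."""
    centered = embeddings - embeddings.mean(axis=0)
    # Right singular vectors = eigenvectors of the covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    z = np.exp(embeddings @ vt.T).sum(axis=0)  # partition function per direction
    return z.min() / z.max()

def remove_dominant_directions(embeddings, k=3):
    """Increase isotropy by mean-centering and projecting out the
    top-k principal components, which carry the 'degenerated'
    directions common to all embeddings."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:k]  # (k, d) dominant directions
    return centered - centered @ top.T @ top
```

In practice one would apply `remove_dominant_directions` to the CWRs of each language separately (since, per the abstract, the degenerated directions differ across languages) and compare `isotropy_score` before and after.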

Related research

- A Universal Semantic Space (01/21/2018)
- A multilabel approach to morphosyntactic probing (04/17/2021)
- Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses (04/29/2021)
- Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT (01/26/2021)
- DirectProbe: Studying Representations without Classifiers (04/13/2021)
- Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding (04/21/2018)
- Identifying Necessary Elements for BERT's Multilinguality (05/01/2020)
