The Geometry of Multilingual Language Models: An Equality Lens

05/13/2023
by Cheril Shah, et al.

Understanding the representations of different languages in multilingual language models is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying any biases across languages. In our study, we analyze the geometry of three multilingual language models in Euclidean space and find that each language is represented by a unique geometry. Using a geometric separability index, we find that although languages tend to lie closer to others in the same linguistic family, they are almost separable from languages of other families. We also introduce a Cross-Lingual Similarity Index to measure the distance between languages in the semantic space. Our findings indicate that low-resource languages are not represented as well as high-resource languages in any of the models.

