Semantic Drift in Multilingual Representations

04/24/2019
by Lisa Beinborn, et al.

Multilingual representations have mostly been evaluated by their performance on specific tasks. In this article, we look beyond engineering goals and analyze the relations between languages in computational representations. We introduce a methodology for comparing languages based on how they organize semantic concepts: we apply an adapted version of representational similarity analysis to a selected set of concepts in computational multilingual representations. Using this analysis method, we can reconstruct a phylogenetic tree that closely resembles those assumed by linguistic experts. These results indicate that multilingual distributional representations that are trained only on monolingual text and bilingual dictionaries preserve relations between languages without the need for any etymological information. In addition, we propose a measure to identify semantic drift between language families. We perform experiments on word-based and sentence-based multilingual models and provide both quantitative results and qualitative examples. Analyses of semantic drift in multilingual representations can serve two purposes: they can indicate unwanted characteristics of the computational models, and they provide a quantitative means to study linguistic phenomena across languages. The code is available at https://github.com/beinborn/SemanticDrift.
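
To make the idea of representational similarity analysis more concrete, the following is a minimal sketch and not the authors' implementation (their code is in the repository linked above). It assumes that each language provides one embedding vector per concept, with concepts in the same row order across languages. The function names, the choice of cosine similarity within a language, and the choice of Pearson correlation between languages are all assumptions made for illustration. The toy data is random and only shows that the comparison depends on how concepts relate to each other, not on the orientation of the embedding space.

```python
import numpy as np


def concept_similarity_matrix(embeddings):
    """Cosine similarity between all pairs of concepts within one language.

    embeddings: array of shape (n_concepts, dim), one row per concept.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return unit @ unit.T


def upper_triangle(matrix):
    """Flatten the strict upper triangle (each concept pair counted once)."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def language_distance_matrix(embeddings_by_language):
    """Compare languages by how similarly they organize the same concepts.

    Hypothetical helper: returns the sorted language names and a matrix of
    1 - Pearson correlation between the languages' concept-similarity
    profiles (Pearson is an assumption of this sketch).
    """
    languages = sorted(embeddings_by_language)
    profiles = np.stack([
        upper_triangle(concept_similarity_matrix(embeddings_by_language[name]))
        for name in languages
    ])
    return languages, 1.0 - np.corrcoef(profiles)


# Toy data (random, purely illustrative): lang_b is a rotated copy of lang_a,
# so its concept-to-concept similarities are identical; lang_c is unrelated.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 8))
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
toy = {
    "lang_a": base,
    "lang_b": base @ rotation,
    "lang_c": rng.normal(size=(20, 8)),
}
languages, distances = language_distance_matrix(toy)
```

A distance matrix of this kind could then be passed to a hierarchical clustering routine to obtain a tree over languages; which clustering method to use is a separate choice that this sketch leaves open.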


