Are All Good Word Vector Spaces Isomorphic?

04/08/2020
by Ivan Vulić, et al.

Existing algorithms for aligning cross-lingual word vector spaces assume that vector spaces are approximately isomorphic. As a result, they perform poorly or fail completely on non-isomorphic spaces. Such non-isomorphism has been hypothesised to result almost exclusively from typological differences between languages. In this work, we ask whether non-isomorphism is also crucially a sign of degenerate word vector spaces. We present a series of experiments across diverse languages which show that, besides inherent typological differences, variance in performance across language pairs can largely be attributed to the size of the monolingual resources available, and to the properties and duration of monolingual training (e.g. "under-training").
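To make the isomorphism assumption concrete, here is a minimal toy sketch, not the authors' method. The abstract names no specific alignment algorithm, so the sketch assumes a rotation-only (Procrustes-style) mapping between two 2-D point sets, for which the best rotation angle has a closed form. All function names, the synthetic data, and the distortion applied to the second target space are hypothetical and chosen purely for illustration.

```python
import math
import random


def rotate(v, theta):
    """Rotate a 2-D vector v counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def fit_rotation(src, tgt):
    """Angle minimising sum ||R(theta) x_i - y_i||^2 over paired 2-D points.

    Closed form for the rotation-only case:
    theta = atan2(sum of cross products, sum of dot products).
    """
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(src, tgt))
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(src, tgt))
    return math.atan2(cross, dot)


def mean_residual(src, tgt, theta):
    """Average distance between rotated source points and their targets."""
    return sum(math.dist(rotate(x, theta), y) for x, y in zip(src, tgt)) / len(src)


rng = random.Random(0)

# Hypothetical "source language" space: 200 random 2-D vectors.
src = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]

# Isomorphic target: an exact rotation of the source space.
true_theta = 0.7
iso_tgt = [rotate(v, true_theta) for v in src]

# Non-isomorphic target (assumed distortion): stretch one axis and add noise,
# so no single rotation can map the source onto it.
noniso_tgt = [
    (3.0 * x + rng.gauss(0, 0.5), y + rng.gauss(0, 0.5)) for x, y in iso_tgt
]

theta_iso = fit_rotation(src, iso_tgt)
theta_non = fit_rotation(src, noniso_tgt)

res_iso = mean_residual(src, iso_tgt, theta_iso)
res_non = mean_residual(src, noniso_tgt, theta_non)

print(f"isomorphic:     fitted angle {theta_iso:.3f}, mean residual {res_iso:.3g}")
print(f"non-isomorphic: fitted angle {theta_non:.3f}, mean residual {res_non:.3g}")
```

In the isomorphic case the fitted rotation recovers the true angle and the residual is essentially zero. In the distorted case a sizeable residual remains whatever angle is chosen, which is the kind of failure the abstract describes for rotation-style alignment on non-isomorphic spaces.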

research
06/01/2017

Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints

We present Attract-Repel, an algorithm for improving the semantic qualit...
research
10/16/2019

Meemi: A Simple Method for Post-processing Cross-lingual Word Embeddings

Word embeddings have become a standard resource in the toolset of any Na...
research
07/11/2018

Cross-lingual Word Analogies using Linear Transformations between Semantic Spaces

We generalize the word analogy task across languages, to provide a new i...
research
08/21/2019

On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning

Cross-lingual word embeddings are vector representations of words in dif...
research
10/11/2022

IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces

The ability to extract high-quality translation dictionaries from monoli...
research
07/11/2018

Linear Transformations for Cross-lingual Semantic Textual Similarity

Cross-lingual semantic textual similarity systems estimate the degree of...
research
10/24/2020

Word2vec Conjecture and A Limitative Result

Being inspired by the success of word2vec in capturing analogies,...
