Analyzing the Limitations of Cross-lingual Word Embedding Mappings

06/12/2019
by   Aitor Ormazabal, et al.

Recent research in cross-lingual word embeddings has almost exclusively focused on offline methods, which independently train word embeddings in different languages and map them to a shared space through linear transformations. While several authors have questioned the underlying isomorphism assumption, which states that word embeddings in different languages have approximately the same structure, it is not clear whether this is an inherent limitation of mapping approaches or a more general issue when learning cross-lingual embeddings. To answer this question, we experiment with parallel corpora, which allow us to compare offline mapping to an extension of skip-gram that jointly learns both embedding spaces. We observe that, under these ideal conditions, joint learning yields more isomorphic embeddings, is less sensitive to hubness, and obtains stronger results in bilingual lexicon induction. We thus conclude that current mapping methods do have strong limitations, calling for further research to jointly learn cross-lingual embeddings with a weaker cross-lingual signal.
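
To make the notion of an offline mapping concrete, the following is a minimal sketch of one common way to fit a linear transformation between two independently trained embedding spaces: an orthogonal Procrustes solution over a seed dictionary. It is an illustration under stated assumptions, not necessarily the mapping method evaluated in the paper. The function name `fit_orthogonal_map` and the toy data are hypothetical; the rows of `X` and `Y` are assumed to be embeddings of word pairs that translate each other.

```python
import numpy as np


def fit_orthogonal_map(X, Y):
    """Return the orthogonal matrix W minimizing ||X W - Y||_F.

    X and Y are (n_pairs, dim) arrays whose i-th rows are assumed to be
    the source and target embeddings of the i-th dictionary entry.
    """
    # Orthogonal Procrustes: W = U V^T, where U S V^T is the SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Toy data (an assumption for illustration): the target space is an exact
# rotation of the source space, i.e. the two spaces are perfectly isomorphic.
rng = np.random.default_rng(0)
dim = 5
X = rng.normal(size=(50, dim))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # hidden orthogonal map
Y = X @ Q

W = fit_orthogonal_map(X, Y)
mapped = X @ W  # source embeddings projected into the target space
```

In this idealized setting the fitted map reproduces the target embeddings exactly. When the two spaces are not close to isomorphic, no single linear map can align them well, which is the limitation the abstract refers to.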


