Wasserstein distances for evaluating cross-lingual embeddings

10/24/2019
by Georgios Balikas, et al.

Word embeddings are high-dimensional vector representations of words that capture their semantic similarity in the vector space. Several algorithms exist for learning such embeddings, both for a single language and for several languages jointly. In this work we propose to evaluate collections of embeddings by adapting downstream natural language tasks to the optimal transport framework. We show how the family of Wasserstein distances can be used to solve the cross-lingual document retrieval and cross-lingual document classification problems. We argue for the advantages of this approach over more traditional methods for evaluating embeddings, such as bilingual lexicon induction. Our experimental results suggest that using Wasserstein distances on these problems outperforms several strong baselines and performs on par with state-of-the-art models.
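The abstract does not spell out the computation, but the core idea, treating each document as a distribution of word vectors in a shared cross-lingual space and comparing documents by optimal transport, can be sketched with the POT library (pip install POT). This is a minimal illustration only: the uniform word weights, the Euclidean ground cost, and the helper names (wasserstein_document_distance, retrieve) are assumptions made for the example, not the authors' exact setup.

```python
import numpy as np
import ot  # POT: Python Optimal Transport


def wasserstein_document_distance(doc_src, doc_tgt, reg=0.1):
    """Entropy-regularized Wasserstein distance between two documents,
    each given as an (n_words, dim) array of cross-lingual word embeddings."""
    # Uniform mass per word is assumed here; tf-idf weights are a common alternative.
    a = np.full(doc_src.shape[0], 1.0 / doc_src.shape[0])
    b = np.full(doc_tgt.shape[0], 1.0 / doc_tgt.shape[0])
    # Ground cost: pairwise Euclidean distances between word vectors.
    M = ot.dist(doc_src, doc_tgt, metric="euclidean")
    # Sinkhorn iterations give a fast regularized transport cost;
    # ot.emd2(a, b, M) would compute the exact, unregularized distance instead.
    return ot.sinkhorn2(a, b, M, reg)


def retrieve(query_doc, candidate_docs):
    """Rank candidate documents (in another language) by distance to the query."""
    dists = [wasserstein_document_distance(query_doc, d) for d in candidate_docs]
    return np.argsort(dists)  # candidate indices, nearest first
```

Under this framing, cross-lingual retrieval reduces to nearest-neighbour search with the transport distance, and a k-nearest-neighbour classifier over labelled source-language documents gives a natural route to cross-lingual document classification.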

Related research

05/15/2020
Cross-lingual Transfer of Twitter Sentiment Models Using a Common Vector Space
Word embeddings represent words in a numeric space in such a way that se...

07/09/2018
Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings
The notions of concreteness and imageability, traditionally important in...

04/01/2016
Cross-lingual Models of Word Embeddings: An Empirical Comparison
Despite interest in using cross-lingual knowledge to learn word embeddin...

05/11/2018
Cross-lingual Document Retrieval using Regularized Wasserstein Distance
Many information retrieval algorithms rely on the notion of a good dista...

09/15/2018
CLUSE: Cross-Lingual Unsupervised Sense Embeddings
This paper proposes a modularized sense induction and representation lea...

08/31/2018
Gromov-Wasserstein Alignment of Word Embedding Spaces
Cross-lingual or cross-domain correspondences play key roles in tasks ra...

11/12/2018
Unseen Word Representation by Aligning Heterogeneous Lexical Semantic Spaces
Word embedding techniques heavily rely on the abundance of training data...
