Evaluating Word Embeddings with Categorical Modularity

06/02/2021
by Sílvia Casacuberta, et al.

We introduce categorical modularity, a novel low-resource intrinsic metric for evaluating word embedding quality. Categorical modularity is a graph modularity metric computed on the k-nearest neighbor graph built from the embedding vectors of words drawn from a fixed set of semantic categories; its goal is to measure the proportion of words whose nearest neighbors fall within the same category. We use a core set of 500 words belonging to 59 neurobiologically motivated semantic categories in 29 languages and analyze three word embedding models per language (FastText, MUSE, and subs2vec). We find moderate to strong positive correlations between categorical modularity and performance on the monolingual tasks of sentiment analysis and word similarity calculation, and on the cross-lingual task of bilingual lexicon induction both to and from English. Overall, we suggest that categorical modularity provides non-trivial predictive information about downstream task performance; breakdowns of the correlations by model further suggest some meta-predictive properties regarding semantic information loss.
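
To make the idea in the abstract concrete, here is a minimal sketch of a modularity score on a k-nearest neighbor graph. It rests on several assumptions not stated in the abstract: cosine similarity as the neighbor criterion, an unweighted graph symmetrized to be undirected, and the standard Newman modularity formula. The function names `knn_adjacency` and `modularity` and the toy vectors are hypothetical illustrations, not the authors' implementation or data.

```python
import numpy as np

def knn_adjacency(vectors, k):
    """Build an undirected k-nearest-neighbor adjacency matrix.

    Assumption: neighbors are ranked by cosine similarity.
    """
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)  # a word is never its own neighbor
    nearest = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar words
    n = X.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        A[i, nearest[i]] = 1.0
    return np.maximum(A, A.T)  # add an edge if either word lists the other

def modularity(A, labels):
    """Newman modularity of the partition induced by category labels."""
    labels = np.asarray(labels)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges
    same_category = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m  # edges expected by chance
    return float(((A - expected) * same_category).sum() / two_m)

# Toy example: six 2-D "embeddings" forming two well-separated groups.
toy_vectors = [[1.0, 0.1], [1.0, 0.2], [0.9, 0.0],
               [0.1, 1.0], [0.2, 1.0], [0.0, 0.9]]
matched_labels = [0, 0, 0, 1, 1, 1]   # categories agree with the geometry
shuffled_labels = [0, 1, 0, 1, 0, 1]  # categories cut across the geometry

graph = knn_adjacency(toy_vectors, k=2)
q_matched = modularity(graph, matched_labels)
q_shuffled = modularity(graph, shuffled_labels)
print(q_matched, q_shuffled)
```

In this toy setting the score is higher when the category labels agree with the neighborhood structure than when they cut across it, which is the behavior the metric is meant to capture.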

research
06/05/2019

A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity

Cross-lingual word embeddings encode the meaning of words from different...
research
07/18/2016

Language classification from bilingual word embedding graphs

We study the role of the second language in bilingual word embeddings in...
research
07/24/2019

Bilingual Lexicon Induction through Unsupervised Machine Translation

A recent research line has obtained strong results on bilingual lexicon ...
research
11/08/2019

Interactive Refinement of Cross-Lingual Word Embeddings

Cross-lingual word embeddings transfer knowledge between languages: mode...
research
11/16/2017

Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian

We explore the use of unsupervised methods in Cross-Lingual Word Sense D...
research
04/04/2019

Density Matching for Bilingual Word Embedding

Recent approaches to cross-lingual word embedding have generally been ba...
research
12/13/2021

A cognitively driven weighted-entropy model for embedding semantic categories in hyperbolic geometry

In this paper, an unsupervised and cognitively driven weighted-entropy m...
