Analogies minus analogy test: measuring regularities in word embeddings

10/07/2020
by Louis Fournier et al.

Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim. We decompose and empirically analyze the classic arithmetic word analogy test in order to motivate two new metrics that address the issues with the standard test, and which distinguish between class-wise offset concentration (similar offset directions between pairs of words drawn from different broad classes, even mismatched pairs such as France–London, China–Ottawa, ...) and pairing consistency (the existence of a regular transformation between correctly-matched pairs such as France:Paris::China:Beijing). We show that, while the standard analogy test is flawed, several popular word embeddings do nevertheless encode linguistic regularities.
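
To make the terminology above concrete, the sketch below (Python, NumPy only) shows the classic arithmetic analogy test the abstract refers to, together with a crude offset-direction measure. The toy embedding table, the function names, and the offset_concentration proxy are illustrative assumptions for this page, not the paper's actual metrics or data.

```python
# Minimal sketch of the arithmetic ("vector offset") analogy test and a
# simple offset-concentration proxy. The tiny random embedding table below
# is made up purely for illustration; real experiments would load
# pretrained vectors (word2vec, GloVe, fastText, ...).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["France", "Paris", "China", "Beijing", "Canada", "Ottawa", "London"]
emb = {w: rng.normal(size=50) for w in vocab}  # hypothetical toy embeddings

def unit(v):
    return v / np.linalg.norm(v)

def analogy(a, b, c, exclude=True):
    """Return the word d maximizing cos(d, b - a + c), i.e. a:b :: c:d.

    The query words are excluded from the candidates, as in the
    standard analogy test setup.
    """
    target = unit(emb[b] - emb[a] + emb[c])
    banned = {a, b, c} if exclude else set()
    scores = {w: float(unit(v) @ target)
              for w, v in emb.items() if w not in banned}
    return max(scores, key=scores.get)

def offset_concentration(pairs):
    """Mean pairwise cosine similarity between offset vectors b - a.

    High values mean the offsets point in similar directions across pairs,
    which can hold even for mismatched pairs (e.g. France–London) and so
    does not by itself show a regular a:b :: c:d transformation.
    """
    offsets = [unit(emb[b] - emb[a]) for a, b in pairs]
    sims = [offsets[i] @ offsets[j]
            for i in range(len(offsets)) for j in range(i + 1, len(offsets))]
    return float(np.mean(sims))

print(analogy("France", "Paris", "China"))  # ideally "Beijing" with real vectors
print(offset_concentration([("France", "Paris"), ("China", "Beijing")]))
```

In this setup, offsets between any country-class word and any capital-class word can point in similar directions even when the pairing is wrong, which is why, as the abstract's distinction suggests, offset concentration alone is not evidence of pairing consistency.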


Related research

10/23/2020  Dynamic Contextualized Word Embeddings
Static word embeddings that represent words by a single vector cannot ca...

02/23/2021  Paraphrases do not explain word analogies
Many types of distributional word embeddings (weakly) encode linguistic ...

08/14/2018  Embedding Grammars
Classic grammars and regular expressions can be used for a variety of pu...

08/16/2021  A visual remote associates test and its validation
The Remote Associates Test (RAT) is a widely used test for measuring cre...

09/04/2017  Hypothesis Testing based Intrinsic Evaluation of Word Embeddings
We introduce the cross-match test - an exact, distribution free, high-di...

07/28/2015  Reasoning about Linguistic Regularities in Word Embeddings using Matrix Manifolds
Recent work has explored methods for learning continuous vector space wo...

09/19/2020  Word class flexibility: A deep contextualized approach
Word class flexibility refers to the phenomenon whereby a single word fo...
