Invariance and identifiability issues for word embeddings

11/06/2019
by   Rachel Carrington, et al.
0

Word embeddings are commonly obtained as optimizers of a criterion function f of a text corpus, but assessed on word-task performance using a different evaluation function g of the test data. We contend that a possible source of disparity in performance on tasks is the incompatibility between classes of transformations that leave f and g invariant. In particular, word embeddings defined by f are not unique; they are defined only up to a class of transformations to which f is invariant, and this class is larger than the class to which g is invariant. One implication of this is that the apparent superiority of one word embedding over another, as measured by word task performance, may largely be a consequence of the arbitrary elements selected from the respective solution sets. We provide a formal treatment of the above identifiability issue, present some numerical examples, and discuss possible resolutions.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/17/2019

KaWAT: A Word Analogy Task Dataset for Indonesian

We introduced KaWAT (Kata Word Analogy Task), a new word analogy task da...
research
04/21/2015

Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representation on Sequence Labelling Tasks

Word embeddings -- distributed word representations that can be learned ...
research
02/06/2019

Word Embeddings for Entity-annotated Texts

Many information retrieval and natural language processing tasks due to ...
research
01/10/2020

MoRTy: Unsupervised Learning of Task-specialized Word Embeddings by Autoencoding

Word embeddings have undoubtedly revolutionized NLP. However, pre-traine...
research
07/14/2017

Rotations and Interpretability of Word Embeddings: the Case of the Russian Language

Consider a continuous word embedding model. Usually, the cosines between...
research
10/11/2021

A Comprehensive Comparison of Word Embeddings in Event Entity Coreference Resolution

Coreference Resolution is an important NLP task and most state-of-the-ar...
research
01/18/2021

Alignment and stability of embeddings: measurement and inference improvement

Representation learning (RL) methods learn objects' latent embeddings wh...

Please sign up or login with your details

Forgot password? Click here to reset