
Invariance and identifiability issues for word embeddings

11/06/2019
by Rachel Carrington, et al., The University of Nottingham

Word embeddings are commonly obtained as optimizers of a criterion function f of a text corpus, but assessed on word-task performance using a different evaluation function g of the test data. We contend that a possible source of disparity in performance on tasks is the incompatibility between the classes of transformations that leave f and g invariant. In particular, word embeddings defined by f are not unique; they are defined only up to a class of transformations to which f is invariant, and this class is larger than the class to which g is invariant. One implication is that the apparent superiority of one word embedding over another, as measured by word-task performance, may largely be a consequence of the arbitrary elements selected from the respective solution sets. We provide a formal treatment of this identifiability issue, present some numerical examples, and discuss possible resolutions.
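The invariance gap can be illustrated numerically. The sketch below is an assumption-laden toy example, not the paper's construction: it takes f to be a matrix-factorization-style objective that depends on word vectors W and context vectors C only through the product WC^T (as in SGNS-style factorizations), and g to be cosine similarity between word vectors. Then any invertible linear map T leaves f invariant, since (WT)(CT^{-T})^T = WC^T, but cosine similarity is only invariant to orthogonal maps (and uniform scaling), so the two solution sets differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 8
W = rng.normal(size=(n, d))   # word vectors
C = rng.normal(size=(n, d))   # context vectors

# f is assumed to depend on (W, C) only through this product.
M = W @ C.T

# Any invertible T leaves the product (hence f) unchanged:
# (W T) (C T^{-T})^T = W C^T.
T = rng.normal(size=(d, d)) + 5 * np.eye(d)   # well-conditioned, invertible
W2, C2 = W @ T, C @ np.linalg.inv(T).T
assert np.allclose(W2 @ C2.T, M)

def cosine(u, v):
    """Cosine similarity, a common choice for the evaluation function g."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# g is invariant to orthogonal transformations, which preserve inner
# products and norms ...
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
print(np.isclose(cosine(W[0], W[1]), cosine((W @ Q)[0], (W @ Q)[1])))

# ... but not to a generic invertible T, even though f cannot
# distinguish W from W2:
print(np.isclose(cosine(W[0], W[1]), cosine(W2[0], W2[1])))
```

Because f cannot distinguish W from W2 while g scores them differently, any word-task comparison between the two is driven by an arbitrary choice within f's solution set, which is the disparity the abstract describes.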

