How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks

02/07/2017
by Stanisław Jastrzębski, et al.

Perhaps the single most important goal of representation learning is to make subsequent learning faster. Surprisingly, this goal is not well reflected in how embeddings are evaluated. Moreover, recent practice in word embeddings points towards the importance of learning specialized representations. We argue that the focus of word-representation evaluation should reflect these trends and shift towards measuring what useful information is easily accessible. Specifically, we propose that evaluation should focus on data efficiency and simple supervised tasks, where the amount of available training data is varied and the scores of a supervised model are reported for each subset (as is commonly done in transfer learning). To illustrate the significance of such analysis, we present a comprehensive evaluation of selected word embeddings. The proposed approach yields a more complete picture and brings new insight into performance characteristics; for instance, information about word similarity or analogy tends to be non-linearly encoded in the embedding space, which calls into question cosine-based, unsupervised evaluation methods. All results and analysis scripts are available online.
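The evaluation protocol described above — training a deliberately simple supervised model on growing fractions of the data and reporting a score per subset — can be sketched as follows. This is a minimal illustration, not the authors' released code: the "embeddings" and labels here are synthetic stand-ins, and the nearest-centroid probe is one hypothetical choice of simple supervised model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "word embeddings": 1000 words in 50 dimensions, with a
# synthetic binary label that depends on the first dimension. In a real
# evaluation these would be pretrained vectors and task labels.
X = rng.normal(size=(1000, 50))
y = (X[:, 0] > 0).astype(int)

n_test = 300
X_train, y_train = X[:-n_test], y[:-n_test]
X_test, y_test = X[-n_test:], y[-n_test:]

def nearest_centroid_score(X_tr, y_tr, X_te, y_te):
    """Accuracy of a nearest-centroid probe, a deliberately simple model."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return float((pred == y_te).mean())

def data_efficiency_curve(fractions=(0.1, 0.25, 0.5, 1.0)):
    """Train the probe on each training-set fraction and report one score
    per subset, transfer-learning style."""
    scores = []
    for frac in fractions:
        n = int(frac * len(X_train))
        scores.append(nearest_centroid_score(X_train[:n], y_train[:n],
                                             X_test, y_test))
    return scores

curve = data_efficiency_curve()
```

The resulting curve — score as a function of training-set size — is the data-efficiency profile the paper argues for, rather than a single aggregate number.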


