Deconstructing and reconstructing word embedding algorithms

11/29/2019
by Edward Newell, et al.

Uncontextualized word embeddings are reliable feature representations of words that are used to obtain high-quality results in various NLP applications. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others into a common form, unveiling some of the necessary and sufficient conditions for making performant word embeddings. We find that each algorithm: (1) fits vector-covector dot products to approximate pointwise mutual information (PMI); and (2) modulates the loss gradient to balance weak and strong signals. We demonstrate that these two algorithmic features are sufficient conditions to construct a novel word embedding algorithm, Hilbert-MLE. We find that its embeddings achieve performance equivalent to or better than that of the other algorithms across 17 intrinsic and extrinsic datasets.
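
As an illustration of these two conditions, the short sketch below (our own toy example, not the authors' Hilbert-MLE implementation) fits vector-covector dot products to empirical PMI computed from a co-occurrence matrix, while a GloVe-style weight modulates the loss gradient so that weak (low-count) signals contribute less than strong ones. The vocabulary size, dimensions, and weighting function are assumptions chosen only for illustration.

# Minimal sketch (not the authors' Hilbert-MLE implementation): fit
# vector-covector dot products to empirical PMI, damping the gradient
# on weak (low-count) pairs with a GloVe-style weight. Toy counts,
# dimensions, and the weighting exponent are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric co-occurrence counts N_ij for a 5-word vocabulary.
N = rng.integers(1, 50, size=(5, 5)).astype(float)
N = (N + N.T) / 2.0

# Empirical PMI_ij = log( N_ij * N_total / (N_i * N_j) ).
N_total = N.sum()
N_i = N.sum(axis=1, keepdims=True)
PMI = np.log(N * N_total / (N_i * N_i.T))

d = 10                                  # embedding dimension
V = 0.1 * rng.standard_normal((5, d))   # word vectors
W = 0.1 * rng.standard_normal((5, d))   # covectors (context vectors)

# Weight that modulates the loss gradient: frequent (strong) pairs get
# weight near 1, rare (weak) pairs are down-weighted.
weight = (N / N.max()) ** 0.75

lr = 0.05
for _ in range(1000):
    err = weight * (V @ W.T - PMI)      # weighted residual
    grad_V = err @ W                    # gradient of 0.5*sum(weight*(v.w - PMI)^2) w.r.t. V
    grad_W = err.T @ V                  # ...and w.r.t. W
    V -= lr * grad_V
    W -= lr * grad_W

print("max |v_i . w_j - PMI_ij|:", np.abs(V @ W.T - PMI).max())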

Related research

Deconstructing word embedding algorithms (11/12/2020)
Word embeddings are reliable feature representations of words used to ob...

Word Embedding Algorithms as Generalized Low Rank Models and their Canonical Form (11/06/2019)
Word embedding algorithms produce very reliable feature representations ...

Contrastive Loss is All You Need to Recover Analogies as Parallel Lines (06/14/2023)
While static word embedding models are known to represent linguistic ana...

Lex2vec: making Explainable Word Embedding via Distant Supervision (03/03/2021)
In this technical report we propose an algorithm, called Lex2vec, that e...

TransDrift: Modeling Word-Embedding Drift using Transformer (06/16/2022)
In modern NLP applications, word embeddings are a crucial backbone that ...

Transparent, Efficient, and Robust Word Embedding Access with WOMBAT (07/02/2018)
We present WOMBAT, a Python tool which supports NLP practitioners in acc...

Graph Convolutional Networks based Word Embeddings (09/12/2018)
Recently, word embeddings have been widely adopted across several NLP ap...
