Context Vectors are Reflections of Word Vectors in Half the Dimensions

02/26/2019
by Zhenisbek Assylbekov, et al.

This paper takes a step towards a theoretical analysis of the relationship between word embeddings and context embeddings in models such as word2vec. We start from basic probabilistic assumptions about the nature of word vectors, context vectors, and text generation; these assumptions are well supported, either empirically or theoretically, by the existing literature. We then show that under these assumptions the widely used word-word PMI matrix is approximately a symmetric random Gaussian ensemble. This, in turn, implies that context vectors are reflections of word vectors in approximately half the dimensions. As a direct application of this result, we suggest a theoretically grounded way of tying weights in the SGNS model.
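A minimal sketch of the weight-tying idea this result suggests: instead of training a separate context-embedding matrix, the context vector of a word is taken to be its word vector with the sign flipped in half of the coordinates. The dimension, vocabulary size, and the exact split of coordinates below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

vocab_size, dim = 10_000, 100

rng = np.random.default_rng(0)
# Word embeddings (here random, standing in for trained SGNS word vectors).
word_vectors = rng.normal(scale=1.0 / np.sqrt(dim), size=(vocab_size, dim))

# Reflection in half the dimensions: +1 on the first half of the coordinates,
# -1 on the second half (the choice of which coordinates to flip is an assumption).
reflection = np.ones(dim)
reflection[dim // 2:] = -1.0

# Tied context vectors: c_w = D u_w, so no separate context matrix is trained.
context_vectors = word_vectors * reflection

def sgns_score(word_id: int, context_id: int) -> float:
    """SGNS (word, context) score <u_w, D u_c>, computed from word embeddings alone."""
    return float(word_vectors[word_id] @ context_vectors[context_id])
```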


