The Spectral Underpinning of word2vec

02/27/2020
by Ariel Jaffe, et al.

word2vec, introduced by Mikolov et al. (2013), is a word embedding method that is widely used in natural language processing. Despite its great success and frequent use, a theoretical justification is still lacking. The main contribution of our paper is a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings with numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
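
To make the phrase "spectral method" concrete, the following is a minimal sketch of one generic spectral approach to word embeddings: build a co-occurrence matrix, convert it to positive pointwise mutual information (PPMI), and keep the leading singular directions. This is an illustrative assumption, not necessarily the construction analyzed in the paper; the function names (`cooccurrence_matrix`, `ppmi`, `spectral_embedding`), the window size, and the toy corpus are all hypothetical.

```python
import numpy as np

def cooccurrence_matrix(corpus, window=2):
    """Count symmetric word co-occurrences within a fixed window (assumed setup)."""
    vocab = sorted({w for sent in corpus for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        for i, w in enumerate(sent):
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[sent[j]]] += 1.0
    return vocab, counts

def ppmi(counts):
    """Positive pointwise mutual information of a co-occurrence matrix."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # pairs that never co-occur contribute nothing
    return np.maximum(pmi, 0.0)

def spectral_embedding(matrix, dim):
    """Embed each row using the top `dim` singular vectors, scaled by sqrt of singular values."""
    u, s, _ = np.linalg.svd(matrix)
    return u[:, :dim] * np.sqrt(s[:dim])

# Toy corpus, purely illustrative.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab, counts = cooccurrence_matrix(corpus)
embedding = spectral_embedding(ppmi(counts), dim=2)
```

Unlike word2vec, this procedure involves no iterative nonlinear optimization: the embedding comes directly from a matrix decomposition, which is the kind of structure that makes provable guarantees more tractable.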

