A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution

08/16/2015
by   Shaohua Li, et al.

Most existing word embedding methods fall into two categories: neural embedding models and Matrix Factorization (MF)-based methods. However, some models are opaque to probabilistic interpretation, and MF-based methods, typically solved using Singular Value Decomposition (SVD), may incur loss of corpus information. In addition, it is desirable to incorporate global latent factors, such as topics, sentiments, or writing styles, into the word embedding model. Since generative models provide a principled way to incorporate latent factors, we propose a generative word embedding model, which is easy to interpret and can serve as a basis for more sophisticated latent factor models. The model inference reduces to a low-rank weighted positive semidefinite approximation problem. Its optimization is approached by eigendecomposition on a submatrix, followed by online blockwise regression, which is scalable and avoids the information loss in SVD. In experiments on 7 common benchmark datasets, our vectors are competitive with word2vec, and better than those of other MF-based methods.
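To illustrate the core subproblem, the sketch below computes a low-rank positive semidefinite approximation of a symmetric matrix via eigendecomposition, keeping the top-k nonnegative eigenvalues. This is a minimal, unweighted version for intuition only; the paper's actual formulation is a *weighted* PSD approximation solved with a submatrix eigendecomposition plus online blockwise regression, which this sketch does not reproduce. The function name and interface are hypothetical.

```python
import numpy as np

def low_rank_psd_approx(M, k):
    """Rank-k positive semidefinite approximation of a symmetric matrix M
    (hypothetical helper; unweighted Frobenius-norm version for illustration).

    Keeps the k largest eigenvalues, clipped to be nonnegative, so the
    result is PSD with rank at most k. Returns the approximation and an
    embedding matrix E with M_k = E @ E.T, whose rows act as word vectors.
    """
    M = (M + M.T) / 2                    # symmetrize against numerical noise
    vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]     # indices of the top-k eigenvalues
    vals_k = np.clip(vals[idx], 0, None) # drop negative parts to stay PSD
    embeddings = vecs[:, idx] * np.sqrt(vals_k)
    return embeddings @ embeddings.T, embeddings
```

In the embedding setting, M would be a (shifted) pointwise-mutual-information-style matrix over the vocabulary, and each row of the returned embedding matrix serves as a word vector; the weighted variant in the paper additionally downweights unreliable entries rather than treating all of M uniformly.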


Related research

09/21/2023 · Word Embedding with Neural Probabilistic Prior
To improve word representation learning, we propose a probabilistic prio...

04/10/2017 · Word Embeddings via Tensor Factorization
Most popular word embedding techniques involve implicit or explicit fact...

12/12/2018 · Word Embedding based on Low-Rank Doubly Stochastic Matrix Decomposition
Word embedding, which encodes words into vectors, is an important starti...

10/02/2017 · Weighted-SVD: Matrix Factorization with Weights on the Latent Factors
The Matrix Factorization models, sometimes called the latent factor mode...

11/06/2015 · Towards a Better Understanding of Predict and Count Models
In a recent paper, Levy and Goldberg pointed out an interesting connecti...

09/21/2019 · Low-Rank Approximation of Matrices for PMI-based Word Embeddings
We perform an empirical evaluation of several methods of low-rank approx...

08/21/2018 · Downsampling Strategies are Crucial for Word Embedding Reliability
The reliability of word embeddings algorithms, i.e., their ability to pr...
