Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations

06/02/2016
by Alexandre Salle et al.

In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Pointwise Mutual Information (PPMI) matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties to errors on frequent co-occurrences while still accounting for negative co-occurrences. Evaluation on word similarity and analogy tasks shows that LexVec matches, and often outperforms, state-of-the-art methods on many of these tasks.
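The core idea can be sketched in a few lines: count co-occurrences in a sliding window, compute the PPMI matrix, and fit low-rank word/context vectors by SGD, sampling observed cells in proportion to their frequency (so frequent co-occurrences receive heavier effective penalties) while random negative samples pull unobserved cells toward their (typically zero) PPMI value. This is a minimal illustrative sketch in plain numpy, not the authors' released implementation; the toy corpus, window size, learning rate, and sampling details are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus and symmetric window co-occurrence counts (window size 1, assumed).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

C = np.zeros((V, V))
for i, w in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            C[idx[w], idx[corpus[j]]] += 1.0

# Positive pointwise mutual information: PPMI = max(0, log P(w,c) / (P(w) P(c))).
total = C.sum()
pw = C.sum(axis=1) / total
pc = C.sum(axis=0) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / np.outer(pw, pc))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# Low-rank factorization via SGD. Observed (word, context) cells are sampled
# proportionally to their co-occurrence count; each positive update is followed
# by a few negative samples with a random context (negative sampling).
dim, lr, steps, negatives = 10, 0.025, 20000, 2
W = (rng.random((V, dim)) - 0.5) / dim    # word vectors
Ctx = (rng.random((V, dim)) - 0.5) / dim  # context vectors

pairs = np.argwhere(C > 0)                # both argwhere and boolean
probs = C[C > 0] / C[C > 0].sum()         # indexing iterate row-major

def sgd_step(w, c):
    # Squared-error gradient step toward the target PPMI cell value.
    err = W[w] @ Ctx[c] - ppmi[w, c]
    grad_w = err * Ctx[c]
    Ctx[c] -= lr * err * W[w]
    W[w] -= lr * grad_w

init_err = np.abs(W @ Ctx.T - ppmi)[C > 0].mean()
for _ in range(steps):
    w, c = pairs[rng.choice(len(pairs), p=probs)]
    sgd_step(w, c)
    for _ in range(negatives):            # negative sampling
        sgd_step(w, rng.integers(V))

# After training, W @ Ctx.T approximates the PPMI matrix on observed cells.
err = np.abs(W @ Ctx.T - ppmi)[C > 0].mean()
```

On this toy corpus the mean absolute reconstruction error over observed cells drops well below its initial value, which is the behavior the weighted-factorization objective is designed to produce.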


