Swivel: Improving Embeddings by Noticing What's Missing

02/06/2016
by Noam Shazeer, et al.

We present Submatrix-wise Vector Embedding Learner (Swivel), a method for generating low-dimensional feature embeddings from a feature co-occurrence matrix. Swivel performs approximate factorization of the point-wise mutual information matrix via stochastic gradient descent. It uses a piecewise loss with special handling for unobserved co-occurrences, and thus makes use of all the information in the matrix. While this requires computation proportional to the size of the entire matrix, we make use of vectorized multiplication to process thousands of rows and columns at once to compute millions of predicted values. Furthermore, we partition the matrix into shards in order to parallelize the computation across many nodes. This approach results in more accurate embeddings than can be achieved with methods that consider only observed co-occurrences, and can scale to much larger corpora than can be handled with sampling methods.
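The piecewise loss and the shard-at-a-time computation described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. The abstract does not give the exact loss terms, so the sketch assumes a squared error weighted by the square root of the count for observed cells, and a soft hinge against a PMI value computed as if the count were 1 for unobserved cells. The function names (`pmi_matrix`, `shard_loss`, `gradient_step`) and the learning rate are hypothetical.

```python
import numpy as np

def pmi_matrix(counts):
    """Point-wise mutual information for every cell of a co-occurrence matrix.

    Assumption: unobserved cells (count 0) are scored as if the count were 1,
    so they still receive a finite target. Every row and column must have a
    positive total.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    smoothed = np.where(counts > 0, counts, 1.0)
    return np.log(smoothed) + np.log(total) - np.log(row) - np.log(col)

def shard_loss(W, C, counts, pmi):
    """Piecewise loss over one shard (a block of rows and columns)."""
    err = W @ C.T - pmi  # one matrix product predicts every cell in the shard
    squared = 0.5 * np.sqrt(counts) * err ** 2  # observed cells (assumed weighting)
    soft_hinge = np.logaddexp(0.0, err)         # unobserved: penalize over-prediction
    return np.where(counts > 0, squared, soft_hinge).sum()

def gradient_step(W, C, counts, pmi, lr=0.01):
    """One gradient-descent update of the row and column embeddings for a shard."""
    err = W @ C.T - pmi
    sigmoid = 1.0 / (1.0 + np.exp(-err))  # derivative of the soft hinge
    grad = np.where(counts > 0, np.sqrt(counts) * err, sigmoid)
    return W - lr * grad @ C, C - lr * grad.T @ W
```

Because the loss is a sum over cells, the loss of the full matrix equals the sum of the losses of its shards. Under the assumptions above, that is what would let each shard be handed to a separate worker.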

