Word Embeddings via Tensor Factorization

04/10/2017
by Eric Bailey, et al.

Most popular word embedding techniques involve implicit or explicit factorization of a word co-occurrence-based matrix into low-rank factors. In this paper, we aim to generalize this trend by using numerical methods to factor higher-order word co-occurrence-based arrays, or tensors. We present four word embeddings based on tensor factorization and analyze their advantages and disadvantages. One of our main contributions is a novel joint symmetric tensor factorization technique related to the idea of coupled tensor factorization. We show that embeddings based on tensor factorization can be used to discern the various meanings of polysemous words without being explicitly trained to do so, and motivate the intuition behind why this works in a way that existing matrix-based methods cannot. We also modify an existing word embedding evaluation metric known as Outlier Detection [Camacho-Collados and Navigli, 2016] to evaluate the quality of the order-N relations that a word embedding captures, and show that tensor-based methods outperform existing matrix-based methods at this task. Experimentally, we show that all of our word embeddings either outperform or are competitive with state-of-the-art baselines commonly used today on a variety of recent datasets. We suggest applications of tensor factorization-based word embeddings, and all source code and pre-trained vectors are publicly available online.
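To make the core operation concrete, the following is a minimal sketch of a rank-R CP decomposition of a third-order tensor via alternating least squares (ALS), the kind of numerical factorization the abstract refers to when it speaks of factoring higher-order co-occurrence arrays. This is a generic, illustrative implementation with toy dimensions; it is not the paper's joint symmetric factorization, and the function name `cp_als` and its parameters are our own.

```python
import numpy as np

def cp_als(T, rank, n_iter=200, seed=0):
    """Rank-`rank` CP decomposition of a 3rd-order tensor T via
    alternating least squares: T ~= sum_r a_r (outer) b_r (outer) c_r."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-n unfoldings of T (row-major column ordering).
    T1 = T.reshape(I, J * K)
    T2 = np.moveaxis(T, 1, 0).reshape(J, I * K)
    T3 = np.moveaxis(T, 2, 0).reshape(K, I * J)
    # Khatri-Rao (column-wise Kronecker) product.
    kr = lambda X, Y: np.einsum('ir,jr->ijr', X, Y).reshape(-1, rank)
    for _ in range(n_iter):
        # Each factor update is a linear least-squares solve
        # against the matching unfolding.
        A = T1 @ np.linalg.pinv(kr(B, C).T)
        B = T2 @ np.linalg.pinv(kr(A, C).T)
        C = T3 @ np.linalg.pinv(kr(A, B).T)
    return A, B, C

# Sanity check on a synthetic exact rank-2 tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (4, 5, 6))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, rank=2)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```

In the word-embedding setting, T would be a (sparse, PPMI-weighted) word-triple co-occurrence tensor and the rows of the factor matrices would serve as the word vectors; a symmetric variant ties A, B, and C together.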

