CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions

02/21/2018
by Kevin Tian, et al.

Word embedding is a useful approach for capturing co-occurrence structure in a large corpus of text. In addition to the text itself, we often have covariates associated with individual documents in the corpus, e.g., the demographics of the author or the time and venue of publication, and we would like the embedding to naturally capture this covariate information. In this paper, we propose CoVeR, a new tensor decomposition model for vector embeddings with covariates. CoVeR jointly learns a base embedding for all the words as well as a weighted diagonal transformation that models how each covariate modifies the base embedding. To obtain the embedding specific to a particular author or venue, for example, we simply multiply the base embedding by the transformation matrix associated with that author or venue. The main advantages of our approach are data efficiency and the interpretability of the covariate transformation matrices. Our experiments demonstrate that the joint model learns substantially better embeddings conditioned on each covariate than the standard approach of learning a separate embedding for each covariate from only the relevant subset of data, as well as other related methods. Furthermore, CoVeR encourages the embeddings to be "topic-aligned" in the sense that each dimension has a specific, independent meaning. This allows covariate-specific embeddings to be compared topic by topic, enabling downstream differential analysis. We empirically evaluate the benefits of our algorithm on several datasets and demonstrate how it can be used to address many natural questions about the effects of covariates.
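To make the composition concrete, here is a minimal NumPy sketch of the idea. The abstract only specifies that a covariate-specific embedding is the base embedding scaled by a learned diagonal transform; the GloVe-style squared loss on log co-occurrences, the plain gradient-descent loop, and all names here (V, K, d, base, diag, covariate_embeddings) are illustrative assumptions, not the paper's exact training procedure, whose full objective may include additional terms such as biases and co-occurrence weighting.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, d = 100, 3, 10  # vocabulary size, number of covariate values, embedding dim

# X[k] holds word-word co-occurrence counts within the slice of the corpus
# tagged with covariate value k (random toy data in this sketch).
X = rng.poisson(1.0, size=(K, V, V)).astype(float)
logX = np.log1p(X)  # log-transformed targets; log1p avoids log(0)

base = 0.1 * rng.standard_normal((V, d))  # shared base embedding v_w
diag = np.ones((K, d))                    # per-covariate diagonal weights c_k

def covariate_embeddings(base, diag):
    """Embedding of word w under covariate k: the elementwise product
    c_k * v_w, i.e. the base embedding scaled by the diagonal transform."""
    return diag[:, None, :] * base[None, :, :]  # shape (K, V, d)

# Jointly fit base and diag with a squared loss over all covariate slices.
lr = 1e-3
for _ in range(200):
    E = covariate_embeddings(base, diag)          # (K, V, d)
    err = np.einsum('kid,kjd->kij', E, E) - logX  # residual of inner products
    # Gradient of 0.5 * sum(err**2) w.r.t. E, then chain rule into base, diag.
    gE = np.einsum('kij,kjd->kid', err, E) + np.einsum('kij,kid->kjd', err, E)
    g_base = np.einsum('kid,kd->id', gE, diag)
    g_diag = np.einsum('kid,id->kd', gE, base)
    base -= lr * g_base
    diag -= lr * g_diag

# Covariate-specific embedding for a given word and covariate value:
word, venue = 42, 1
emb = diag[venue] * base[word]  # simply scale the base embedding
```

The sketch also illustrates the data-efficiency claim: all K slices share the single V-by-d base matrix, so each additional covariate value contributes only d diagonal parameters rather than a full embedding table of its own.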

Related research:

06/21/2017 · Jointly Learning Word Embeddings and Latent Topics
Word embedding models such as Skip-gram learn a vector-space representat...

02/02/2019 · Understanding Composition of Word Embeddings via Tensor Decomposition
Word embedding is a powerful tool in natural language processing. In thi...

06/06/2022 · Class Prior Estimation under Covariate Shift – no Problem?
We show that in the context of classification the property of source and...

01/08/2019 · Deconstructing Word Embeddings
A review of Word Embedding Models through a deconstructive approach reve...

04/10/2017 · Word Embeddings via Tensor Factorization
Most popular word embedding techniques involve implicit or explicit fact...

05/11/2018 · Covariate-Adjusted Tensor Classification in High-Dimensions
In contemporary scientific research, it is of great interest to predict ...

05/23/2018 · Embedding Syntax and Semantics of Prepositions via Tensor Decomposition
Prepositions are among the most frequent words in English and play compl...
