Vector of Locally-Aggregated Word Embeddings (VLAWE): A Novel Document-level Representation

02/23/2019
by   Radu Tudor Ionescu, et al.

In this paper, we propose a novel representation for text documents based on aggregating word embedding vectors into document embeddings. Our approach is inspired by the Vector of Locally-Aggregated Descriptors used for image representation, and it works as follows. First, the word embeddings gathered from a collection of documents are clustered by k-means in order to learn a codebook of semantically-related word embeddings. Each word embedding is then associated with its nearest cluster centroid (codeword). The Vector of Locally-Aggregated Word Embeddings (VLAWE) representation of a document is then computed by accumulating the differences between each codeword vector and each word vector (from the document) associated with the respective codeword. We plug the VLAWE representation, which is learned in an unsupervised manner, into a classifier and show that it is useful for a diverse set of text classification tasks. We compare our approach with a broad range of recent state-of-the-art methods, demonstrating the effectiveness of our approach. Furthermore, we obtain a considerable improvement on the Movie Review data set, reporting an accuracy of 93.3%, surpassing the state-of-the-art approach.
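The pipeline described in the abstract (a k-means codebook learned over the corpus word embeddings, followed by per-codeword accumulation of residuals for each document) can be sketched in a few lines of Python. The snippet below is an illustrative sketch rather than the authors' implementation: it assumes pre-trained word embeddings are already available as NumPy arrays, uses scikit-learn's KMeans for the codebook, and all function names are hypothetical.

```python
# Sketch of the VLAWE computation described in the abstract (not the authors'
# code). Assumes word embeddings are pre-computed NumPy arrays, e.g. from
# word2vec or GloVe.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_word_vectors, k=10, seed=0):
    """Cluster word embeddings from the whole collection with k-means
    to obtain k codewords (cluster centroids)."""
    kmeans = KMeans(n_clusters=k, random_state=seed, n_init=10)
    kmeans.fit(all_word_vectors)
    return kmeans  # exposes .cluster_centers_ and .predict()

def vlawe(document_word_vectors, kmeans):
    """Compute the VLAWE representation of one document: for each codeword,
    accumulate the differences between the codeword and the word vectors
    assigned to it, then concatenate the per-codeword residual sums."""
    k, dim = kmeans.cluster_centers_.shape
    residuals = np.zeros((k, dim))
    assignments = kmeans.predict(document_word_vectors)
    for word_vec, cluster_id in zip(document_word_vectors, assignments):
        residuals[cluster_id] += kmeans.cluster_centers_[cluster_id] - word_vec
    return residuals.ravel()  # final representation of size k * dim

if __name__ == "__main__":
    # Random vectors stand in for real word embeddings in this toy example.
    rng = np.random.default_rng(0)
    collection_vectors = rng.normal(size=(1000, 300))  # all words in the corpus
    doc_vectors = rng.normal(size=(40, 300))           # words of one document
    codebook = build_codebook(collection_vectors, k=10)
    representation = vlawe(doc_vectors, codebook)
    print(representation.shape)  # (3000,) -> fed to a classifier (e.g. an SVM)
```

The resulting fixed-length vector (number of codewords times the embedding dimension) is what the paper plugs into a standard classifier for the text classification experiments.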

