Spherical Paragraph Model

07/18/2017
by Ruqing Zhang, et al.

Representing texts as fixed-length vectors is central to many language processing tasks. Most traditional methods build text representations on the simple Bag-of-Words (BoW) representation, which loses the rich semantic relations between words. Recent advances in natural language processing have shown that semantically meaningful word representations can be learned efficiently by distributed models, making it possible to build text representations on a better foundation, the Bag-of-Word-Embedding (BoWE) representation. However, existing text representation methods that use BoWE often lack a sound probabilistic foundation or fail to adequately capture the semantic relatedness encoded in word vectors. To address these problems, we introduce the Spherical Paragraph Model (SPM), a probabilistic generative model based on BoWE, for text representation. SPM has good probabilistic interpretability and can fully leverage the rich semantics of words, word co-occurrence information, and corpus-wide information when learning text representations. Experimental results on topical classification and sentiment analysis show that SPM achieves new state-of-the-art performance on several benchmark datasets.
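The contrast between BoW and BoWE drawn in the abstract can be illustrated with a minimal sketch. This is not the SPM model itself: the two-dimensional word vectors, the vocabulary, and the helper names (`bow_vector`, `bowe`, `cosine`) are all hypothetical and chosen only for illustration. Under BoW, two different words are always orthogonal, whereas a bag of word embeddings keeps related words close.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors, made up for illustration only.
TOY_EMBEDDINGS = {
    "good": (0.9, 0.1),
    "great": (0.8, 0.2),
    "bad": (-0.9, 0.1),
}
VOCAB = sorted(TOY_EMBEDDINGS)  # ["bad", "good", "great"]


def bow_vector(tokens, vocab=VOCAB):
    """Bag-of-Words: one count per vocabulary word."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def bowe(tokens, embeddings=TOY_EMBEDDINGS):
    """Bag-of-Word-Embedding: the list of word vectors for the known tokens."""
    return [embeddings[t] for t in tokens if t in embeddings]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# BoW treats "good" and "great" as unrelated dimensions.
bow_sim = cosine(bow_vector(["good"]), bow_vector(["great"]))

# BoWE keeps the (toy) semantic relation between the two words.
bowe_sim = cosine(bowe(["good"])[0], bowe(["great"])[0])

print(f"BoW similarity:  {bow_sim:.3f}")
print(f"BoWE similarity: {bowe_sim:.3f}")
```

With these toy vectors, the BoW similarity between the one-word texts is zero, while the embedding-based similarity is close to one. This is the kind of semantic relatedness the abstract says BoW loses.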


