Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews

12/27/2015
by   Bofang Li, et al.

Despite the loss of semantic information, bag-of-ngram based methods still achieve state-of-the-art results for tasks such as sentiment classification of long movie reviews. Many document embedding methods have been proposed to capture semantics, but they still cannot outperform bag-of-ngram based methods on this task. In this paper, we modify the architecture of the recently proposed Paragraph Vector, allowing it to learn document vectors by predicting not only words, but n-gram features as well. Our model is able to capture both semantics and word order in documents while keeping the expressive power of learned vectors. Experimental results on the IMDB movie review dataset show that our model outperforms previous deep learning models and bag-of-ngram based models due to the above advantages. More robust results are also obtained when our model is combined with other models. The source code of our model will also be published together with this paper.
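To make the idea of predicting n-gram features concrete, the following is a minimal sketch of how prediction targets could be enumerated for each document. It is not the authors' implementation: the function names `ngram_targets` and `training_pairs`, the underscore-joined n-gram representation, and the default `max_n = 2` are assumptions chosen for illustration, and the abstract does not specify the training objective that would consume these pairs.

```python
from typing import List, Tuple


def ngram_targets(tokens: List[str], max_n: int = 2) -> List[str]:
    """Return every n-gram of length 1..max_n, joined with underscores.

    Hypothetical helper: the joining convention is an assumption.
    """
    targets = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            targets.append("_".join(tokens[i:i + n]))
    return targets


def training_pairs(docs: List[List[str]], max_n: int = 2) -> List[Tuple[int, str]]:
    """Pair each document index with every word and n-gram it contains.

    In a Paragraph Vector style model, the document vector for doc_id
    would be trained to predict each paired target (assumed setup).
    """
    pairs = []
    for doc_id, tokens in enumerate(docs):
        for target in ngram_targets(tokens, max_n):
            pairs.append((doc_id, target))
    return pairs
```

With bigrams included, a review containing "not good" yields the target `not_good` alongside `not` and `good`, which is how word-order information could reach the document vector in this sketch.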
