Word Embeddings for Sentiment Analysis: A Comprehensive Empirical Survey

02/02/2019
by Erion Çano, et al.

This work investigates how factors such as training method, training corpus size, and thematic relevance of texts affect the performance of word embedding features in sentiment analysis of tweets, song lyrics, movie reviews, and item reviews. We also explore specific training or post-processing methods that can be used to enhance the performance of word embeddings in certain tasks or domains. Our empirical observations indicate that models trained on large, multithematic, vocabulary-rich texts are the best at answering syntactic and semantic word analogy questions. We further observe that the influence of thematic relevance is stronger on movie and phone reviews, but weaker on tweets and lyrics. These two latter domains are more sensitive to corpus size and training method, with GloVe outperforming Word2vec. "Injecting" extra intelligence from lexicons or generating sentiment-specific word embeddings are two prominent alternatives for increasing the performance of word embedding features.
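
The word analogy questions mentioned above are commonly answered with vector arithmetic over the embeddings: for "a is to b as c is to ?", the candidate closest (by cosine similarity) to b - a + c is returned. The following is a minimal sketch of that idea, not the evaluation code of the paper; the three-dimensional vectors, the `TOY_VECTORS` table, and the `answer_analogy` helper are all invented here for illustration.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT trained embeddings and do not come from the paper.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def answer_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using the offset b - a + c.

    The three question words are excluded from the candidates,
    as is usual in analogy benchmarks.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    # "man is to king as woman is to ?"
    print(answer_analogy("man", "king", "woman", TOY_VECTORS))
```

With real embeddings the same lookup would run over the full vocabulary, and accuracy on a set of such questions is the fraction answered with the expected word.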
