A New Statistical Approach for Comparing Algorithms for Lexicon Based Sentiment Analysis

by Mateus Machado, et al.

Lexicon-based sentiment analysis usually relies on identifying words to which a numerical sentiment value can be assigned. In principle, classifiers can be obtained from these algorithms by comparison with human annotation, which is considered the gold standard. In practice this is difficult in languages such as Portuguese, where there is a paucity of human-annotated texts. The next best step is therefore to compare different algorithms directly with each other, without referring to human annotation. In this paper we develop methods for a statistical comparison of algorithms which does not rely on human annotation or on known class labels. We motivate the use of marginal homogeneity tests, as well as log-linear models, within the framework of maximum likelihood estimation. We also show how some uncertainties present in lexicon-based sentiment analysis may be similar to those which occur in human-annotated tweets. Finally, we show that the variability in the output of different algorithms is lexicon dependent, and we quantify this variability within the framework of log-linear models.
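To make the idea of a marginal homogeneity test concrete, here is a minimal sketch in Python. It assumes two algorithms have each labelled the same texts as negative, neutral, or positive, and that their outputs are cross-tabulated in a 3x3 table of counts. The abstract does not say which marginal homogeneity test the authors use; the Stuart-Maxwell test shown here is one standard choice, and the function name and the counts are hypothetical.

```python
import math


def stuart_maxwell_3x3(table):
    """Stuart-Maxwell test of marginal homogeneity for a 3x3 table of counts.

    table[i][j] is the number of texts that algorithm A put in class i and
    algorithm B put in class j. Returns (statistic, p_value).
    """
    k = 3
    row = [sum(table[i]) for i in range(k)]                       # A's marginals
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]  # B's marginals

    # Differences between marginals; the last class is dropped because the
    # differences sum to zero and carry no extra information.
    d0 = row[0] - col[0]
    d1 = row[1] - col[1]

    # Covariance matrix of (d0, d1) under the null hypothesis.
    s00 = row[0] + col[0] - 2 * table[0][0]
    s11 = row[1] + col[1] - 2 * table[1][1]
    s01 = -(table[0][1] + table[1][0])

    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("covariance matrix is singular; the test is undefined")

    # Quadratic form d' S^{-1} d, written out for the 2x2 case.
    stat = (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det

    # The statistic is asymptotically chi-square with k - 1 = 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


# Hypothetical counts, for illustration only (not data from the paper).
counts = [
    [30, 10, 5],
    [4, 40, 6],
    [2, 3, 25],
]
stat, p_value = stuart_maxwell_3x3(counts)
print(f"statistic = {stat:.3f}, p-value = {p_value:.3f}")
```

A small p-value would suggest that the two algorithms distribute texts across the three classes differently, and this conclusion needs no human-annotated labels. The closed-form p-value applies only to the three-class case; a table with more classes would need a general matrix inverse and a chi-square survival function.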




NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis

Sentiment analysis is one of the most widely studied applications in NLP...

SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets

In this paper, we present the results of the SemEval-2020 Task 9 on Sent...

RETUYT in TASS 2017: Sentiment Analysis for Spanish Tweets using SVM and CNN

This article presents classifiers based on SVM and Convolutional Neural ...

Sentiment of Emojis

There is a new generation of emoticons, called emojis, that is increasin...

CIDER: Context sensitive sentiment analysis for short-form text

Researchers commonly perform sentiment analysis on large collections of ...

Towards Enhancing Lexical Resource and Using Sense-annotations of OntoSenseNet for Sentiment Analysis

This paper illustrates the interface of the tool we developed for crowd ...

SentiWords: Deriving a High Precision and High Coverage Lexicon for Sentiment Analysis

Deriving prior polarity lexica for sentiment analysis - where positive o...
