Current sentiment analysis methods - ranging from baseline bag-of-words methods to state-of-the-art neural methods - typically focus on deducing subjectivity or polarity only (Section 2). Human emotions move far beyond these simple metrics and are much more diverse. This implies that subjectivity or polarity analysis alone gives only limited information on the actual intent of the author of a message.
Defining axes of polarity is not a hard task: typically one has negativity, positivity and a notion of neutrality or objectivity in between. For emotions, however, defining a complete and clear set is much more difficult. Though several researchers have attempted to define standards in this field [Parrott2001, Plutchik1980, Schroder et al.2011], among them the AAAC (The Association for the Advancement of Affective Computing - http://emotion-research.net/), there is still no consensus on a basic set of emotions that is generally accepted and could be objectively verified.
The goal of this paper is to present a sentiment analysis approach, accompanied by a model of emotions that fits it well, in order to set a standard in emotion analysis to expand upon.
We present a new RBEM-Emo approach for emotion detection from human-written texts (a revised and extended version of this manuscript describing RBEM-Emo is expected to appear in [Tromp and Pechenizkiy2015]). The algorithm builds on the work of [Tromp and Pechenizkiy2013], in which the authors introduced the Rule-Based Emission Model (RBEM) algorithm for polarity detection only. RBEM generates positive and negative emissions based on several groups of patterns that capture various ways in which sentiment can be expressed in natural language. We show how this approach can be developed further to go beyond polarity and measure emotions as given by Plutchik’s wheel of emotions.
We conducted an experimental evaluation of RBEM-Emo on a publicly available benchmark and on a new benchmark that we constructed. The results of our evaluation suggest that RBEM-Emo outperforms the current state-of-the-art approaches for emotion detection. To facilitate reproducibility of the results and further progress in emotion classification we made our benchmark publicly available.
2 Related Work
Moving beyond polarity in sentiment analysis is an emerging direction that is not yet well studied. A few examples can be found where novel methods are introduced to capture more information than just polarity, such as the work of [Socher et al.2011], where a recursive auto-encoder is used to predict sentiment distributions in five dimensions. [Cambria and Hussain2012] and [Cambria et al.2012] promote affective computing using a framework they call SenticNet. The sentiment dimensions of this framework are modeled in an hourglass model, which is a derivative of Plutchik’s wheel of emotions [Plutchik1980]. In [Mohammad2012] the author collected and experimented with a large collection of tweets with self-labeled emotion hashtags.
The closest work to our approach is [Andreevskaia and Bergler2007], in which the authors considered a rule-based approach based on a set of positive and negative patterns and valence shifters for handling negations and other linguistic constructs defining the sentiment of a sentence.
Standards on emotion frameworks are difficult to define, as emotions are usually subjective and cannot be crisply delineated. The works of [Parrott2001, Plutchik1980, Schroder et al.2011] do aim to define standards in this area by defining a minimal set of basic emotions from which more complex ones can be derived or constructed by combination. In [Cambria et al.2012] the authors develop methods to reason about emotions. In [Ekman1989], facial expressions are linked to emotions and six universal basic emotions are presented.
3 Approach to Emotion Detection
3.1 Plutchik’s Wheel of Emotions
To tackle the problem of emotion detection, one needs a notion of emotion. As in text mining, the problem can be formulated differently depending on whether we have just two classes, as in spam filtering, several categories, as in topic classification, or a large number of categories, as in automated tagging. We choose the wheel of emotions defined by Robert Plutchik [Plutchik1980] (see Figure 1) because it defines only eight basic emotions, which keeps the problem manageable for the envisioned applications and makes RBEM-Emo a good match for classification according to this model of emotions.
These eight emotions are assumed to be complete in the sense that any expressed emotion is related to or subsumed by one of the eight. In his work, Plutchik states that these emotions are culturally independent. Given this assumption, we can apply this model to any given language, which we consider a strong point.
Another reason for using this model is that each of these eight basic emotions is the opposite of one of the other basic emotions. This means that we can in fact measure four axes, where opposite emotions sit at the two extremes of a single axis. Additionally, Plutchik defines eight human feelings that are derived from combinations of two basic emotions. This means that by modeling only four axes, we obtain a total of sixteen dimensions of emotions and feelings.
3.2 RBEM for Emotion Detection
In our previous work we conducted several case studies with RBEM, illustrating that it is rather generic and easily extendable, allowing us to develop solutions that are scalable, transparent, and easy to maintain and adapt to the needs of a particular domain. We considered integrating RBEM into a larger data analytics project [Tromp and Pechenizkiy2011] and into mobile settings where computing resources are a bottleneck [Chambers et al.2012]. Hence, we had a good incentive to extend it to emotion classification on social media.
The original Rule-Based Emission Model (RBEM) algorithm [Tromp and Pechenizkiy2013] can be used for polarity detection, assigning new messages one of the labels positive, neutral or negative. Internally, the algorithm generates positive or negative emissions, on which different rules are subsequently executed to modify these emissions.
The rules work on patterns that belong to one of the following groups: positive and negative patterns, e.g. good, well done and bad, terrible; amplifier and attenuator patterns that strengthen or weaken the polarity of entities, e.g. very much, a lot and a little, a tiny bit; right- and left-flip patterns to handle negations, e.g. not, no, and sentences containing constructs with e.g. but or however; continuator patterns to handle constructs with e.g. and, and also; and stop patterns that interrupt the emission of polarity whenever punctuation signs such as a dot or an exclamation mark appear in a message, capturing the general rule that polarity does not cross sentence boundaries.
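As a rough illustration of how these pattern groups and the two classification steps might be organized in code, consider the following sketch. The data layout, the group names and the naive substring matching are our own assumptions; the actual RBEM implementation matches wildcard patterns over Part-of-Speech-tagged tokens.

```python
# Pattern groups used by RBEM, illustrated with the example entries from
# the text. This sketches only the data layout, not the authors' model.
PATTERN_GROUPS = {
    "positive":    ["good", "well done"],
    "negative":    ["bad", "terrible"],
    "amplifier":   ["very much", "a lot"],
    "attenuator":  ["a little", "a tiny bit"],
    "rightflip":   ["not", "no"],
    "leftflip":    ["but", "however"],
    "continuator": ["and", "and also"],
    "stop":        [".", "!"],
}

def matching_groups(message, groups=PATTERN_GROUPS):
    """Step 1 of classification: collect the groups whose patterns occur
    in a message (naive substring match for illustration only). Step 2,
    applying the group-specific rules, is not sketched here."""
    found = {}
    for group, patterns in groups.items():
        hits = [p for p in patterns if p in message.lower()]
        if hits:
            found[group] = hits
    return found
```

For example, `matching_groups("This is not good, but not terrible!")` would report hits in the positive, negative, rightflip, leftflip and stop groups.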
[Figure: emission-of-emotions rules.]
Crucial to the algorithm is that positivity and negativity are opposites of each other, which allows, for example, negations to simply invert the emission. This specific characteristic of the algorithm makes it work well with Plutchik’s model, since the emotions defined in that model are also opposites of each other. We extend the RBEM algorithm to apply the same types of rules, but instead of measuring a single axis, with positive at one extreme and negative at the other, we measure four different axes, together yielding eight different emotions.
The RBEM algorithm requires pattern groups to be defined. It uses pattern matching with wildcards to identify patterns in a message. When classifying previously unseen messages, two steps are performed. First, all patterns in the model that match the message are collected. Then, the rules associated with the pattern group of each matched pattern are applied.
The actual internal algorithms for constructing and applying RBEM models remain unchanged; we refer to the original paper on RBEM for their formal description [Tromp and Pechenizkiy2013].
RBEM-Emo extends RBEM for emotion detection by introducing new pattern groups. The RBEM algorithm uses two base pattern groups to define the emission of polarity: positive and negative patterns. For our RBEM-Emo algorithm, we replace these two pattern groups with eight new pattern groups, one for each basic emotion of Plutchik’s model: joy, sadness, trust, disgust, fear, anger, surprise and anticipation. Similarly, we replace the two rules that are defined on positive and negative patterns with eight new rules. Note that conceptually, we perform exactly the same process as for positive polarity on the one hand and negative polarity on the other, but now four times, once for each axis.
Since we no longer operate on a single emission score but instead on four, we define a mapping ax from emotions to an axis index in {1, 2, 3, 4} and a sign counterpart for each emotion on a single axis. Here ax(Joy) = ax(Sadness) = 1, ax(Trust) = ax(Disgust) = 2, ax(Fear) = ax(Anger) = 3 and ax(Surprise) = ax(Anticipation) = 4, where the first emotion of each pair emits positively on its axis and the second negatively. We also define a subscripted emission score e_i, where i ∈ {1, 2, 3, 4} and the value of e_i corresponds with the emotion axis for the emotions that map to i using ax (i.e. e_1 is the axis score used by Joy and Sadness).
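This mapping from emotions to axis indices and signs can be sketched as a small data structure. The +1/-1 sign convention and the variable names below are our own illustrative choices:

```python
# Plutchik's eight basic emotions arranged as four axes of opposites.
# The first pole of each pair emits positively on its axis, the second
# negatively (an illustrative convention, not part of Plutchik's model).
AXES = [("joy", "sadness"), ("trust", "disgust"),
        ("fear", "anger"), ("surprise", "anticipation")]

EMOTION_AXIS = {}  # emotion -> (axis index ax(e), sign)
for i, (pos, neg) in enumerate(AXES, start=1):
    EMOTION_AXIS[pos] = (i, +1)
    EMOTION_AXIS[neg] = (i, -1)
```

With this layout, a pattern labeled with an emotion contributes its emission to the axis `ax(e)` with the corresponding sign.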
The new rules that replace the original rules defining positive and negative sentiment emission are defined analogously, one for each emotion pattern group. Note that the RBEM algorithm requires rules to be executed in order.
All the other original RBEM rules are executed four times, once for every axis i ∈ {1, 2, 3, 4}. When the algorithm terminates, this yields four emission scores, i.e. one score per axis.
Once the algorithm has terminated, we can obtain a total score for each pair of opposite emotions, e.g. for Joy and Sadness by summing all emissions of e_1. Whenever e_1 > 0 we say that Joy was expressed in the original message. Similarly, when e_1 < 0, we say that Sadness was expressed. If e_1 = 0, neither Joy nor Sadness was expressed. The other three emission axes are interpreted similarly.
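This interpretation of the four axis totals can be sketched as a small helper; the function and the axis ordering (Joy–Sadness, Trust–Disgust, Fear–Anger, Surprise–Anticipation) are our own illustrative choices:

```python
# Opposite-emotion pairs, one per axis; the first pole counts as the
# positive direction of the axis.
PAIRS = [("Joy", "Sadness"), ("Trust", "Disgust"),
         ("Fear", "Anger"), ("Surprise", "Anticipation")]

def expressed_emotions(axis_totals):
    """For each axis total e_i, return the positive-pole emotion if
    e_i > 0, the negative-pole emotion if e_i < 0, or None if e_i = 0."""
    result = []
    for (pos, neg), e in zip(PAIRS, axis_totals):
        result.append(pos if e > 0 else neg if e < 0 else None)
    return result
```

For example, totals of (2.0, 0.0, -1.5, 0.5) would indicate Joy, no trust/disgust signal, Anger and Surprise.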
As an illustrative example, consider the sentence I thought I would like the new XYZ phone, but now that I have it, it is a huge disappointment, it makes me angry. Suppose also that we have the following patterns (Part-of-Speech tags left out for simplicity): an Anticipation pattern matching I thought I would, a Sadness pattern matching disappointment, an Anger pattern matching angry, an amplifier pattern matching huge and a leftflip pattern matching but. The algorithm would first assign the emotion scores to all parts of the sentence where patterns are found. This would yield the first part emitting negatively on e_4, the third phrase emitting negatively on e_1 and the last phrase emitting negatively on e_3. Next, the pattern indicated by the word huge will amplify the emissions, with the biggest effect on e_1. Finally, the leftflip indicated by but will convert all negative emissions on its left, influencing mainly e_4, to the opposite direction, yielding positive emissions on e_4. The final outcome will hence be that, ordered by decreasing strength, Sadness, Anger and Surprise are present.
4 Experimental Evaluation
With the experimental study we aim to evaluate the proposed RBEM-Emo algorithm, which is tailored towards Plutchik’s model of emotions.
4.1 Experiment Setup
We compare our method against a majority class baseline, Support Vector Machines (SVMs), regression and the recursive auto-encoder of [Socher et al.2011], and evaluate on accuracy. In [Socher et al.2011] a five-dimensional sentiment model originating from the Experience Project (http://www.experienceproject.com) is introduced. It would be reasonable to evaluate on this dataset, but the five labels used to express emotions in that dataset (Sorry, Hugs, You Rock, Teehee, I Understand and Wow, Just Wow) are quite arbitrary and ambiguous, as the authors themselves indicate. In addition, these labels are produced by users that read an actual confession by a different person, and hence capture the emotion triggered in an external reader instead of the emotion of the actual message.
Due to the impracticalities of the Experience Project dataset for our experiments, we instead benchmark on a different, well-accepted dataset introduced in [Alm2008]. This dataset is annotated with Ekman’s emotions [Ekman1989] instead of Plutchik’s, but since Ekman’s six basic emotions are subsumed by the eight emotions of Plutchik’s model, we can use the labels in a straightforward manner: labels produced by RBEM-Emo that do not exist in Ekman’s model are ignored, and the majority class is produced as the label whenever we find such a non-existing emotion. We refer to this dataset as the Affect Dataset.
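The label mapping described above can be sketched as follows. The exact label strings and the fallback value are assumptions for illustration; only the subsumption of Ekman's six emotions by Plutchik's eight is given in the text:

```python
# Ekman's six universal emotions mapped onto Plutchik emotions; the
# label spellings here are illustrative assumptions.
EKMAN_TO_PLUTCHIK = {
    "happy": "joy", "sad": "sadness", "angry": "anger",
    "fearful": "fear", "disgusted": "disgust", "surprised": "surprise",
}

def to_ekman_label(plutchik_label, majority_class):
    """Map a Plutchik prediction back to an Ekman label. Plutchik's
    trust and anticipation have no Ekman counterpart; as described in
    the text, such predictions fall back to the majority class."""
    inverse = {v: k for k, v in EKMAN_TO_PLUTCHIK.items()}
    return inverse.get(plutchik_label, majority_class)
```

For instance, a trust prediction would be replaced by the dataset's majority class label.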
In addition to benchmarking on a well-accepted public dataset, we also introduce our own Twitter Dataset, which is annotated with Plutchik’s emotions.
For the SVM and regression classification we use LibShortText [Yu et al.2013]. We experiment using both word counts and TF-IDF scores as features. For the recursive auto-encoder, we use the Java version referenced by the authors of [Socher et al.2011] (https://github.com/sancha/jrae). To ensure we have the right setup of the auto-encoder, we reproduced the polarity detection experiments on the rotten tomatoes dataset as done in [Socher et al.2011] and obtained an accuracy of 77.0%. This is in line with the results presented in [Socher et al.2011], illustrating that our setup is valid. When we apply our RBEM-Emo classifier, we get one score for each of the four axes in Plutchik’s model, covering eight emotions. Finally, we assign a single label corresponding to the highest of all eight emotion scores.
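The final label assignment can be sketched as follows. Expanding four signed axis totals into eight per-emotion scores is our own reading of the procedure, with helper names chosen for illustration:

```python
# Opposite-emotion pairs, one per axis; the first pole is the positive
# direction of the axis.
PAIRS = [("joy", "sadness"), ("trust", "disgust"),
         ("fear", "anger"), ("surprise", "anticipation")]

def single_label(axis_totals):
    """Expand the four signed axis totals into eight emotion scores
    (positive pole: max(e, 0); negative pole: max(-e, 0)) and return
    the emotion with the highest score."""
    scores = {}
    for (pos, neg), e in zip(PAIRS, axis_totals):
        scores[pos] = max(e, 0.0)
        scores[neg] = max(-e, 0.0)
    return max(scores, key=scores.get)
```

For example, axis totals of (0.1, 0.0, -2.0, 0.3) yield anger as the single label, since the negative fear–anger total dominates.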
4.2 Datasets Description
The Affect Dataset we use is presented in [Alm2008] and is publicly available at http://lrc.cornell.edu/swedish/dataset/affectdata/. This dataset consists of snippets of text obtained from books written by three different authors.
For each snippet, every sentence is annotated by two annotators. Each annotator provides two labels: one for the prevailing emotion found in the sentence and one for the mood. The available labels are the six basic emotions of Ekman’s universal emotions: angry, disgusted, fearful, happy, sad and surprised. In addition, the annotators could indicate neutrality.
We use only those messages for which both annotators agree on the emotion, and we discard the mood label produced by the annotators. Moreover, since 85% of all sentences in the dataset are neutral, and many general-purpose classification techniques suffer from class imbalance, we produce two different datasets: one where neutral sentences are removed and only emotion-bearing sentences are maintained, and one where neutral messages are included. For evaluation purposes, we use roughly two thirds of the data for training and one third for testing. The resulting training set sizes are 7527 and 1084 instances, depending on the inclusion or exclusion of the neutral class; the corresponding test sets contain 3590 and 488 instances.
Since the proposed RBEM-Emo method is tightly integrated with Plutchik’s wheel of emotions, we also evaluate on data annotated with these emotions. We collected a large number of tweets in three different languages: English, Dutch and German. At least two independent annotators annotated each of these messages using a dedicated Web-based annotation tool. In case of disagreement, we used the prevailing emotion label given by the annotators as the actual label for a message. If there was no agreement on the prevailing emotion, the message was discarded.
In addition, the annotators were asked to identify patterns in these messages so that we can later construct the RBEM-Emo model from them.
The data was collected from Twitter, where, as a first step, a language detection algorithm was used to filter out those messages written in English, Dutch or German. Messages whose language was wrongly identified were later filtered out by the annotators.
In line with the setup of the experiments presented in [Socher et al.2011] and adhered to here, we randomly split the data into training and test sets. The resulting training/test set sizes are 289/113 for Dutch, 235/113 for English and 225/109 for German.
The Twitter dataset is made publicly available at http://www.win.tue.nl/~mpechen/projects/smm/.
4.3 Results
The accuracies of the best performing general purpose classification techniques on the Affect Dataset are compared to those of RBEM-Emo in Table 1. The majority class classification accuracy is given as a baseline. We report accuracies both for the case when neutral messages are kept in the dataset and when they are filtered out. We do this because neutral messages compose 85% of the entire original dataset, and generic classification techniques are expected to suffer from this class imbalance, learning biases towards the majority class rather than finding actual emotions. This is reflected in the accuracies of the SVM and regression classifiers, which are only marginally higher than the majority class baseline. Surprisingly, the recursive auto-encoder (RAE), currently claimed to be the state-of-the-art technique for emotion classification, performs worse than several simpler classifiers and is in fact only as good as the majority class classifier. One possible reason might be that the size of our dataset is relatively small. The RBEM-Emo classifier, being an approach tailored to deducing emotional patterns, outperforms the other classifiers.
In the second column of Table 1, we report the accuracies when all messages belonging to the neutral class are removed, yielding a more class-balanced dataset. Here we see much clearer improvements over the majority class baseline for SVM and regression, and now also for the recursive auto-encoder. Using TF-IDF scores as features is favored over using plain word counts. The RBEM-Emo method, however, still outperforms the other classifiers.
Table 2 lists the accuracies obtained per language on our own Twitter corpus. For each classifier, we report the accuracy on each language (Dutch, English and German) as well as a total accuracy, which is the average accuracy over all messages in all three languages. A general result across all classifiers is that the accuracies on English data are the lowest, suggesting the most ambiguity within this language. Remarkably, the recursive auto-encoder performs worse than the SVM and regression models and yields no benefit over the majority class guess. Again, this could be due to the small size of the corpus or the difficulty of finding the most suitable model parameters. There is no clear evidence on whether TF-IDF scores or word counts work better for this dataset. The RBEM-Emo classifier yields the highest accuracy for each of the three languages.
|Method||Acc. w/ Neutral||Acc. no Neutral|
5 Conclusions
In this work we have introduced a new rule-based classification technique called RBEM-Emo for emotion classification on social media. This emotion classification approach is tightly coupled with Plutchik’s model of emotions. We proposed to use this model because it is relatively compact yet complete and models emotions as opposites of each other, a property that works well with RBEM-Emo.
The results of our experimental study show that RBEM-Emo is competitive to the current state-of-the-art approaches to sentiment and emotion classification.
New approaches for emotion classification appear every year, and it is important to facilitate an easy way to benchmark and compare their performance. For studying emotion classification with Plutchik’s model, we developed a new benchmark with carefully annotated Twitter messages in three different languages. To increase the reproducibility of our work and facilitate further development in this area, we have made this benchmark publicly available, together with the RBEM-Emo patterns extracted from the training dataset.
- [Alm2008] E.C.O. Alm. 2008. Affect in Text and Speech. PhD thesis.
- [Andreevskaia and Bergler2007] A. Andreevskaia and S. Bergler. 2007. Clac and clac-nb: Knowledge-based and corpus-based approaches to sentiment tagging. In Proceedings of the 4th International Workshop on Semantic Evaluations, SemEval ’07, pages 117–120, Stroudsburg, PA, USA. Association for Computational Linguistics.
- [Cambria and Hussain2012] E. Cambria and A. Hussain. 2012. Sentic Computing: Techniques, Tools, and Applications. Berlin Heidelberg: Springer.
- [Cambria et al.2012] E. Cambria, C. Havasi, and A. Hussain. 2012. SenticNet 2: A semantic and affective resource for opinion mining and sentiment analysis. In Proceedings of the Twenty-Fifth International Florida Artificial Intelligence Research Society Conference. AAAI Press.
- [Chambers et al.2012] L. Chambers, E. Tromp, M. Pechenizkiy, and M. Gaber. 2012. Mobile sentiment analysis. In Advances in Knowledge-Based and Intelligent Information and Engineering Systems - 16th Annual KES Conference, pages 470–479.
- [Ekman1989] P. Ekman, 1989. Handbook of Social Psychophysiology, chapter The Argument and Evidence about Universals in Facial Expressions of Emotion, pages 143–164. Wiley Handbooks of Psychophysiology.
- [Mohammad2012] S. Mohammad. 2012. #emotional tweets. In SEM 2012: The 1st Joint Conference on Lexical and Computational Semantics, pages 246–255. Association for Computational Linguistics.
- [Parrott2001] W.G. Parrott. 2001. Emotions in Social Psychology. Psychology Press, Philadelphia.
- [Plutchik1980] R. Plutchik, 1980. A general psychoevolutionary theory of emotion, pages 3–33. Academic press, New York.
- [Schroder et al.2011] M. Schroder, P. Baggia, F. Burkhardt, C. Pelachaud, C. Peter, and E. Zovato. 2011. Emotionml - an upcoming standard for representing emotions and related states. In Proceedings of the 4th International Conference on Affective Computing and Intelligent Interaction - Volume Part I, ACII’11, pages 316–325, Berlin, Heidelberg. Springer-Verlag.
- [Socher et al.2011] R. Socher, J. Pennington, E.H. Huang, A.Y. Ng, and C.D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP’11, pages 151–161, Stroudsburg, PA, USA. Association for Computational Linguistics.
- [Tromp and Pechenizkiy2011] E. Tromp and M. Pechenizkiy. 2011. Senticorr: Multilingual sentiment analysis of personal correspondence. In 11th IEEE International Conference on Data Mining (demo paper), pages 1247–1250.
- [Tromp and Pechenizkiy2013] E. Tromp and M. Pechenizkiy. 2013. RBEM: a rule based approach to polarity detection. In Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining, WISDOM 2013.
- [Tromp and Pechenizkiy2015] E. Tromp and M. Pechenizkiy. 2015. Pattern-based emotion classification on social media. In (to appear) book chapter in Advances in Social Media Analysis. Springer.
- [Yu et al.2013] H. Yu, C. Ho, Y. Juan, and C. Lin. 2013. LibShortText: A library for short-text classification and analysis.