This paper will be presented at the 3rd ExPROM workshop at COLING 2016. http://www.cse.unt.edu/exprom2016/
Sarcasm detection is the computational task of predicting sarcasm in text. Past approaches in sarcasm detection rely on designing classifiers with specific features (to capture sentiment changes or incorporate context about the author, environment, etc.) [Joshi et al.2015, Wallace et al.2014, Rajadesingan et al.2015, Bamman and Smith2015], or model conversations using the sequence labeling-based approach of [Joshi et al.2016c]. Alternatives to this statistical classifier-based paradigm include deep learning-based approaches, as in [Silvio Amir et al.2016], and rule-based approaches such as [Riloff et al.2013, Khattri et al.2015].
This work employs a machine learning technique that, to the best of our knowledge, has not been used for computational sarcasm. Specifically, we introduce a topic model for extraction of sarcasm-prevalent topics and, as a result, for sarcasm detection. Our model, based on a supervised version of the Latent Dirichlet Allocation (LDA) model [Blei et al.2003], is able to discover clusters of words that correspond to sarcastic topics. The goal of this work is to discover sarcasm-prevalent topics based on sentiment distribution within text, and use these topics to improve sarcasm detection. The key idea of the model is that (a) some topics are more likely to be sarcastic than others, and (b) sarcastic tweets are likely to have a different distribution of positive-negative words as compared to literal positive or negative tweets. Hence, the distribution of sentiment in a tweet is the central component of our model.
Our sarcasm topic model is learned on tweets that are labeled with three sentiment labels: literal positive, literal negative and sarcastic. In order to extract sarcasm-prevalent topics, the model uses three latent variables: a topic variable to indicate words that are prevalent in sarcastic discussions, a sentiment variable for sentiment-bearing words related to a topic, and a switch variable that switches between the two kinds of words (topic and sentiment-bearing words). Using a dataset of 166,955 tweets, our model is able to discover words corresponding to topics that are found in our corpus of positive, negative and sarcastic tweets.
We evaluate our model in two steps: a qualitative evaluation that ascertains sarcasm-prevalent topics based on the ones extracted, and a quantitative evaluation that evaluates sub-components of the model. We also demonstrate how it can be used for sarcasm detection. To do so, we compare our model with two prior approaches, and observe a significant improvement of around 25% in the F-score.
The rest of the paper is organized as follows. Section 2 discusses the related work. Section 3 presents our motivation for using topic models for automatic sarcasm detection. Section 4 describes the design rationale and structure of our model. Section 5 describes the dataset and the experiment setup. Section 6 discusses the results in three steps: qualitative results, quantitative results and application of our topic model to sarcasm detection. Section 7 concludes the paper and points to future work.
2 Related Work
Topic models are popular for sentiment aspect extraction. [Jo and Oh2011] present an aspect-sentiment unification model that learns different aspects of a product, and the words that are used to express sentiment towards the aspects. In using two latent variables, one for aspect and one for sentiment, their model is related to ours. [Mukherjee and Liu2012a] use a semi-supervised model in order to extract aspect-level sentiment. The role of the supervised sentiment label in our model is similar to their work. Finally, [McAuley and Leskovec2013a] attempt to generate rating dimensions of products using topic models. However, the topic models that have been reported in past work have been for sentiment analysis in general. They give no special consideration to either sarcasm as a label or sarcastic tweets as a special case of tweets. The hierarchy-based structure (specifically, the chain of distributions for sentiment label) in our model is based on [Joshi et al.2016a], who extract politically relevant topics from a dataset of political tweets. The chain in their case is the sentiment distribution of an individual and a group.
Sarcasm detection approaches have also been reported in the past [Joshi et al.2016b, Liebrecht et al.2013, Wang et al.2015, Joshi et al.2015]. [Wang et al.2015] present a contextual model for sarcasm detection that collectively models a set of tweets, using a sequence labeling algorithm - however, the goal is to detect sarcasm in the last tweet in the sequence. The idea of distribution of sentiment that we use in our model is based on the idea of context incongruity. In order to evaluate the benefit of our model to sarcasm detection, we compare two sarcasm detection approaches based on our model with two prior approaches, namely [Buschmeier et al.2014] and [Liebrecht et al.2013]. [Buschmeier et al.2014] train their classifiers using features such as unigrams, laughter expressions, hyperbolic expressions, etc. [Liebrecht et al.2013] experiment with unigrams, bigrams and trigrams as features. To the best of our knowledge, past approaches for sarcasm detection do not use topic modeling, which we do.
3 Motivation
Topic models enable discovery of thematic structures in a large-sized corpus. The motivation behind using topic models for sarcasm detection arises from two reasons: (a) presence of sarcasm-prevalent topics, and (b) differences in sentiment distribution in sarcastic and non-sarcastic text. In the context of sentiment analysis, topic models have been used for aspect-based sentiment analysis in order to discover topic and sentiment words [Jo and Oh2011]. The general idea is that for a restaurant review, the word ‘spicy’ is more likely to describe food than ambiance. On similar lines, the discovery that a set of words belongs to a sarcasm-prevalent topic - a topic regarding which sarcastic remarks are common - can be useful as additional information to a sarcasm detection system. The key idea of our approach is that some topics are more likely to evoke sarcasm than others. For example, a tweet about working late at the office or doing school homework late into the night is much more likely to be sarcastic than a tweet on Mother’s Day. A sarcasm detection system can benefit from incorporating this information about sarcasm-prevalent topics. The second reason is the difference in sentiment distributions. A positive tweet is likely to contain only positive words, and a negative tweet is likely to contain only negative words. On the other hand, a sarcastic tweet may contain a mix of the two kinds of words (for example, ‘I love being ignored’ where ‘love’ is a positive word and ‘ignored’ is a negative word), except in the case of hyperbolic sarcasm (for example ‘This is the best movie ever!’ where ‘best’ is a positive word and there is no negative word). Hence, in addition to sarcasm-prevalent topics, sentiment distributions for tweets also form a critical component of our topic model.
|Observed Variables and Distributions|
|Word in a tweet|
|Label of a tweet; takes values: positive, negative, sarcastic|
|Distribution over switch values given a word w|
|Latent Variables and Distributions|
|Topic of a tweet|
|Sentiment of a word in a tweet; takes values: positive, negative|
|Switch variable indicating whether a word is a topic word or a sentiment word; takes values: 0, 1|
|Distribution over topics given a label l, with prior|
|Distribution over words given a topic z and switch =0 (topic word), with prior|
|Distribution over words given sentiment s and switch=1 (sentiment word), with prior|
|Distribution over words given a sentiment s and topic z and switch=1 (sentiment word), with prior|
|Distribution over sentiment given a label l and switch =1 (sentiment word), with prior|
|Distribution over sentiment given a label l and topic z and switch =1 (sentiment word), with prior|
4 Sarcasm Topic Model
4.1 Design Rationale
Our topic model requires sentiment labels of tweets, as used in [Ramage et al.2009]. This sentiment can be positive or negative. However, in order to incorporate sarcasm, we re-organize the two sentiment values into three: literal positive, literal negative and sarcastic. The observed variable l in our model indicates this sentiment label. For the sake of simplicity, we refer to the three values of l as positive, negative and sarcastic in the rest of the paper.
Every word in a tweet is either a topic word or a sentiment word. A topic word arises due to a topic, whereas a sentiment word arises due to a combination of topic and sentiment. This notion is common to several sentiment-based topic models from past work [Jo and Oh2011]. To determine which of the two (topic or sentiment word) a given word is, our model uses three latent variables: a tweet-level topic label z, a word-level sentiment label s, and a switch variable. Each tweet is assumed to have a single topic indicated by z. The single-topic assumption is reasonable considering the length of a tweet. At the word level, we introduce two variables: the sentiment s and the switch variable. For each word w in the dictionary, the switch distribution of w denotes the probability of the word being a topic word or a sentiment word. Thus, the model estimates three sets of distributions: (A) the probability of a word belonging to a topic (the topic-word distribution) or to a sentiment-topic combination (the sentiment-word distribution), (B) sentiment distributions over label and topic, and (C) topic distributions over label. The switch variable is sampled from the switch distribution of the word, i.e., the probability of the word being a topic word or a sentiment word. We thus allow a word to be either a topic word or a sentiment word. (Note that the switch distribution is not estimated during the sampling but learned from a large-scale corpus, as will be described later.)
4.2 Plate Diagram
Our sarcasm topic model to extract sarcasm-prevalent topics is based on supervised LDA [Blei et al.2003]. Figure 1 shows the plate diagram while Table 1 details the variables and distributions in the model. Every tweet consists of a set of observed words w and one tweet-level, observed sentiment label l. The label takes three values: positive, negative or sarcastic. The third label value ‘sarcastic’ indicates a scenario where a tweet appears positive on the surface but is implicitly negative (hence, sarcastic). z is a tweet-level latent variable, denoting the topic of the tweet. The number of topics is experimentally determined. The switch variable is a word-level latent variable representing whether a word is a topic word or a sentiment word, similar to [Mukherjee and Liu2012c]. If the word is a sentiment word, the word-level latent variable s represents the sentiment of that word. Intuitively, the number of unique values that s can take is set to 2: positive and negative.
Among the distributions, the switch distribution of a word is an observed distribution that is estimated beforehand. It denotes the probability of the word being a topic word or a sentiment word. The topic distribution represents the distribution over topics z given the label l of the tweet. The two sentiment distributions form a hierarchical pair: the first conditions on the label l alone, while the second represents the distribution over the sentiment of a word given the topic and label of the tweet, and given that the word is a sentiment word. The two sentiment-word distributions likewise form a hierarchical pair: the first conditions on the sentiment s alone, while the second represents the distribution over words given that the word is a sentiment word with sentiment s and topic z. The topic-word distribution is a distribution over words given that the word is a topic word with topic z. The generative story of our model is:
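For each label l, select the distribution over topics given l, from its prior.
For each label l, select the distribution over sentiment given l, from its prior.
For each topic z, select the distribution over sentiment given label l and topic z, from its prior.
For each topic z and sentiment s, select the distribution over words given s, and the distribution over words given s and z, each from its prior.
For each topic z, select the distribution over topic words given z, from its prior.
For each tweet, select the topic z from the distribution over topics given the label l of the tweet, and then:
(a) the switch value for all words, from the switch distribution of each word w,
(b) the sentiment s for all sentiment words, from the distribution over sentiment given l and z,
(c) all topic words, from the distribution over words given z,
(d) all sentiment words, from the distribution over words given s and z.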
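The generative story above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the estimation procedure used in the paper: the priors are taken to be Dirichlet, the hierarchical pairs are modelled by centring the child prior on its parent, and the names `draw_model`, `sample_tweet_latents`, `prior` and `strength` are hypothetical. Because the switch distribution is indexed by the observed word, the sketch samples the latent variables for an already observed tweet instead of generating words from scratch.

```python
import random

LABELS = ["positive", "negative", "sarcastic"]
SENTIMENTS = ["positive", "negative"]


def dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def draw_model(vocab_size, n_topics, rng, prior=1.0, strength=10.0):
    """Draw each distribution of the generative story once (all steps before the per-tweet loop)."""

    def child(parent):
        # Assumed hierarchical link: the child's prior is centred on its parent.
        return dirichlet([strength * p + 0.1 for p in parent], rng)

    topics = range(n_topics)
    sent_given_label = {l: dirichlet([prior] * len(SENTIMENTS), rng) for l in LABELS}
    word_given_sent = {s: dirichlet([prior] * vocab_size, rng) for s in SENTIMENTS}
    return {
        "topic_given_label": {l: dirichlet([prior] * n_topics, rng) for l in LABELS},
        "sent_given_label": sent_given_label,
        "sent_given_topic_label": {(z, l): child(sent_given_label[l])
                                   for z in topics for l in LABELS},
        "word_given_sent": word_given_sent,
        "word_given_sent_topic": {(s, z): child(word_given_sent[s])
                                  for s in SENTIMENTS for z in topics},
        "word_given_topic": {z: dirichlet([prior] * vocab_size, rng) for z in topics},
    }


def sample_tweet_latents(word_ids, label, switch_prob, model, rng):
    """Sample the topic, switch and sentiment variables for one observed tweet."""
    n_topics = len(model["word_given_topic"])
    z = rng.choices(range(n_topics), weights=model["topic_given_label"][label])[0]
    assignments = []
    for w in word_ids:
        # switch_prob[w] is the probability that word w is a sentiment word.
        is_sentiment = rng.random() < switch_prob[w]
        if is_sentiment:
            s = rng.choices(SENTIMENTS, weights=model["sent_given_topic_label"][(z, label)])[0]
            p_word = model["word_given_sent_topic"][(s, z)][w]
        else:
            s = None
            p_word = model["word_given_topic"][z][w]
        assignments.append((is_sentiment, s, p_word))
    return z, assignments
```

For example, `sample_tweet_latents([0, 2, 4], "sarcastic", switch_prob, draw_model(5, 3, rng), rng)` returns a topic index and, for each word, whether it was treated as a sentiment word, its sampled sentiment (if any), and the probability of the word under the chosen distribution.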
We estimate these distributions using Gibbs sampling. The joint probability over all variables is decomposed into these distributions, based on dependencies in the model. Estimation details have not been included due to lack of space.
5 Experiment Setup
We create a dataset of English tweets for our topic model. We do not use datasets reported in past work (related to classifiers) because topic models typically require larger datasets than classifiers. The tweets are downloaded from Twitter using the Twitter API (https://dev.twitter.com/rest/public) using hashtag-based supervision. Hashtag-based supervision is common in sarcasm-labeled datasets [Joshi et al.2015]. Tweets containing hashtags #happy, #excited are labeled as positive tweets. Tweets with #sad, #angry are labeled as negative tweets. Tweets with #sarcasm and #sarcastic are labeled as sarcastic tweets. The tweets are converted to lowercase, and the hashtags used for supervision are removed. Function words (www.sequencepublishing.com), punctuation, hashtags, author names and hyperlinks are removed from the tweets. Duplicate tweets (same tweet text repeated for multiple tweets) and re-tweets (tweet text with ‘RT’ added at the beginning) are discarded. Finally, words which occur fewer than three times are also removed. Tweets that are left with fewer than 3 words as a result are removed. This results in a dataset of 166,955 tweets. Out of these, 70,934 are positive, 20,253 are negative and the remaining 75,769 are sarcastic. A total of 35,398 tweets are used for testing, out of which 26,210 are of positive sentiment, 5,535 are of negative sentiment and 3,653 are sarcastic. We repeat that these labels are determined based on hashtags, as stated above.
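The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the paper: the `FUNCTION_WORDS` set is a tiny hypothetical stand-in for the actual function-word list, duplicates are detected on the cleaned text, and tweets carrying hashtags of more than one label are dropped, which is an assumption the text does not state.

```python
import re
from collections import Counter

LABEL_HASHTAGS = {
    "#happy": "positive", "#excited": "positive",
    "#sad": "negative", "#angry": "negative",
    "#sarcasm": "sarcastic", "#sarcastic": "sarcastic",
}
# Hypothetical stand-in for the function-word list used in the paper.
FUNCTION_WORDS = {"a", "an", "the", "is", "at", "to", "of", "and", "i", "it", "my"}


def label_from_hashtags(text):
    """Return the label implied by the supervision hashtags, or None if absent or conflicting."""
    tags = re.findall(r"#\w+", text.lower())
    labels = {LABEL_HASHTAGS[t] for t in tags if t in LABEL_HASHTAGS}
    return labels.pop() if len(labels) == 1 else None


def clean_tokens(text):
    """Lowercase, then strip hyperlinks, hashtags, author names, punctuation and function words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in FUNCTION_WORDS]


def build_dataset(raw_tweets, min_count=3, min_len=3):
    """Apply the filtering steps and return a list of (tokens, label) pairs."""
    seen, rows = set(), []
    for text in raw_tweets:
        if text.lower().startswith("rt "):  # re-tweet
            continue
        label = label_from_hashtags(text)
        if label is None:
            continue
        tokens = clean_tokens(text)
        key = " ".join(tokens)
        if key in seen:  # duplicate tweet text
            continue
        seen.add(key)
        rows.append((tokens, label))
    counts = Counter(t for tokens, _ in rows for t in tokens)
    dataset = []
    for tokens, label in rows:
        kept = [t for t in tokens if counts[t] >= min_count]  # drop rare words
        if len(kept) >= min_len:  # drop tweets that became too short
            dataset.append((kept, label))
    return dataset
```

The thresholds `min_count=3` and `min_len=3` mirror the two cut-offs mentioned in the text; everything else about the tokenization is a simplification.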
The total number of distinct labels is 3, and the total number of distinct sentiment values is 2. The total number of distinct topics is experimentally determined as 50. We use block-based Gibbs sampling to estimate the distributions. The block-based sampler samples all latent variables together based on their joint distributions. We set asymmetric priors based on the sentiment word-list from [McAuley and Leskovec2013b].
A key parameter of the model is the switch distribution of each word, since it drives the split of a word as a topic or a sentiment word. SentiWordNet [Baccianella et al.2010] is used to learn this distribution before the model is estimated. We determine the probability from the SentiWordNet scores of all senses of a word, averaging across the senses.
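One way to turn sense-level SentiWordNet scores into a switch distribution is sketched below. The exact formula is not given in the text, so the mapping here is an assumption: the probability of a word being a sentiment word is taken as the average, over its senses, of the positive score plus the negative score (in SentiWordNet each sense has positive, negative and objective scores that sum to 1). The function names and the example scores are hypothetical.

```python
def sentiment_word_probability(sense_scores):
    """Average the subjectivity (positive + negative score) over all senses of a word.

    sense_scores is a list of (positive, negative) pairs, one per sense.
    Words without any scored sense are treated as pure topic words.
    """
    if not sense_scores:
        return 0.0
    return sum(pos + neg for pos, neg in sense_scores) / len(sense_scores)


def switch_distribution(sense_scores):
    """Return [P(switch = 0, topic word), P(switch = 1, sentiment word)] for one word."""
    p_sentiment = sentiment_word_probability(sense_scores)
    return [1.0 - p_sentiment, p_sentiment]
```

With made-up scores for a two-sense word, `switch_distribution([(0.5, 0.0), (0.25, 0.25)])` gives equal probability to the word being a topic word or a sentiment word.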
6 Results
6.1 Qualitative Evaluation
The goal of this section is to present topics discovered by our sarcasm topic model. We do so in two steps. We first describe the topics generated when only sarcastic tweets from our corpus are used to estimate the distributions, followed by the ones when the full corpus is used. In case of the former, since only sarcastic tweets are used, the topics generated here indicate words corresponding to sarcasm-prevalent topics. In case of the latter, the sentiment-topic distributions in the model capture the prevalence of sarcasm.
The model estimates the word distributions corresponding to topic words and sentiment words. The top five words for a subset of topics (as estimated by the topic-word distribution) are shown in Table 2. The headings in boldface are manually assigned (this is a common practice in topic model papers, in order to interpret topics [Mukherjee and Liu2012b, Joshi et al.2016a, Kim et al.2013]). Sarcasm-prevalent topics, as discovered by our topic model, are work, party, weather, women, etc. The corresponding sentiment topics for each of these sarcasm-prevalent topics (as estimated by the sentiment-word distribution) are given in Table 3. The headings in boldface are manually assigned. For topics corresponding to ‘party’ and ‘women’, we observe that the two columns contain words from opposing sentiment polarities. An example sarcastic tweet about work is ‘Yaay! Another night spent at office! I love working late night’.
The previous set of topics is entirely from sarcastic text. We now show the topics extracted by our model from the full corpus. These topics indicate the peculiarity of topics for each of the three labels, allowing us to infer which topics are sarcasm-prevalent. Table 4 shows the top 5 topic words for the topics discovered (as estimated by the topic-word distribution) from the full corpus (i.e., containing tweets of all three tweet-level sentiment labels: positive, negative and sarcastic). Table 5 shows the top 3 sentiment words for each sentiment (as estimated by the sentiment-word distribution) of each of the topics discovered. As in the previous case, the heading in boldface is manually assigned. One of the topics discovered was ‘Music’. The top 5 topic words for the topic ‘Music’ are Pop, Country, Rock, Bluegrass and Beatles. The corresponding sentiment words for Music are ‘love’, ‘happy’, ‘good’ on the positive side and ‘sad’, ‘passion’ and ‘pain’ on the negative side.
The remaining sections present results when the model is learned on the full corpus.
6.2 Quantitative Evaluation
In this section, we answer three questions: (A) What is the likely sentiment label, if a user is talking about a particular topic? (Section 6.2.1), (B) We hypothesize that sarcastic text tends to have mixed-polarity words. Does this hold in the case of our model? (Section 6.2.2), and (C) How can the sarcasm topic model be used for sarcasm detection? (Section 6.2.3).
6.2.1 Probability of sentiment label, given topic
We compute the probability of a label l given a topic z, P(l | z), based on the model.
Table 6 shows these values for a subset of topics. Topics with a majority positive sentiment are Father’s Day (0.9224), holidays (0.9538), etc. The topic with the highest probability of a negative sentiment is the Orlando shooting incident (0.95). Gun laws (0.5230), work and humor are topics where sarcasm is prevalent.
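The text does not spell out how the probability of a label given a topic is obtained from the model. One plausible route, shown here purely as an assumption, is Bayes' rule applied to the distribution over topics given a label together with a prior over labels (for instance, the label proportions in the training corpus). The function name and the toy numbers are hypothetical.

```python
def label_given_topic(topic_given_label, label_prior, topic):
    """Compute P(label | topic) from P(topic | label) and P(label) via Bayes' rule."""
    joint = {label: topic_given_label[label][topic] * label_prior[label]
             for label in label_prior}
    evidence = sum(joint.values())
    return {label: value / evidence for label, value in joint.items()}


# Toy example with two topics and a uniform label prior (made-up numbers).
toy_topic_given_label = {
    "positive": [0.6, 0.4],
    "negative": [0.2, 0.8],
    "sarcastic": [0.2, 0.8],
}
toy_prior = {"positive": 1 / 3, "negative": 1 / 3, "sarcastic": 1 / 3}
posterior = label_given_topic(toy_topic_given_label, toy_prior, topic=0)
```

In this toy setting, topic 0 is most strongly associated with the positive label, which is the kind of reading Table 6 supports for the real topics.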
6.2.2 Distribution of sentiment words for tweet-level sentiment labels
Figure 2 shows the proportion of word-level sentiment labels, for the three tweet-level sentiment labels, as estimated by our model. The X-axis indicates the percentage of positive sentiment words in a tweet, while the Y-axis indicates the percentage of tweets that have a specific value of this percentage. More than 60% of negative tweets (bar in red) have 0% positive content words. The ‘positive’ here indicates the value of s for a word in a tweet. In other words, the said red bar indicates that more than 60% of negative tweets have 0% words sampled with s as positive.
It follows intuition that negative tweets have low percentage of positive words (red bar on the left part of the graph) while positive tweets have high percentage of positive words (blue bar on the right part of the graph). The interesting variations are observed in case of sarcastic tweets. It must be highlighted that the sentiment labels considered for these proportions are as estimated by our topic model. Many sarcastic tweets contain very high percentage of positive sentiment words. Similarly, the proportion of tweets with around 50% positive sentiment words is around 20%, as expected. Thus, the model is able to capture the sentiment mixture as expected in the three tweet-level sentiment labels: (literal) positive, (literal) negative and sarcastic.
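The quantity plotted in Figure 2 can be computed from the sampled word-level sentiment values along the following lines. This is a sketch under assumptions: the bin width is not stated in the text (25 is used here for illustration), tweets without any sentiment word are skipped, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def positive_share_histogram(tweets, bin_width=25):
    """For each tweet-level label, return {bin start: percentage of tweets in that bin}.

    tweets is a list of (label, sentiments) pairs, where sentiments holds the
    sampled sentiment ("positive" or "negative") of each sentiment word.
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for label, sentiments in tweets:
        if not sentiments:  # no sentiment words in this tweet
            continue
        pct_positive = 100 * sentiments.count("positive") / len(sentiments)
        bucket = int(pct_positive // bin_width) * bin_width
        counts[label][bucket] += 1
        totals[label] += 1
    return {
        label: {b: 100 * c / totals[label] for b, c in sorted(buckets.items())}
        for label, buckets in counts.items()
    }
```

Each inner dictionary corresponds to one set of bars in the figure: its keys are positions on the X-axis and its values are heights on the Y-axis.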
6.2.3 Application to Sarcasm Detection
We now use our sarcasm topic model to detect sarcasm, and compare it with two prior approaches. The task here is to classify a tweet as either sarcastic or not. We use the topic model for sarcasm detection using two methods:
Log-likelihood based: The topic model is first learned using the training corpus where the distributions in the model are estimated. Then, the topic model performs sampling for a pre-determined number of samples, in three runs - once for each label. For each run, the log-likelihood of the tweet given the estimated distributions (in the training phase) and the sampled values of the latent variables (for this tweet) is computed. The label of the tweet is returned as the one with the highest log-likelihood.
Sampling-based: As in the previous case, the topic model first estimates distributions using the training corpus. Then, the topic model is learned again where the label l is assumed to be latent, in addition to the tweet-level latent variable z, and the word-level latent variables s and the switch variable. The value of l as learned by the sampler is returned as the predicted label.
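The log-likelihood based method can be illustrated with a toy model. The sketch below departs from the description in one labeled respect: instead of sampling the latent variables, it sums over the tweet's topic and over each word's switch and sentiment values exactly, which is feasible only because the toy model is tiny. All distributions, words and names here are made up for illustration, and the sentiment distributions are kept identical across topics for brevity.

```python
import math

LABELS = ["positive", "negative", "sarcastic"]
TOPICS = ["work", "family"]
SENTIMENTS = ["positive", "negative"]

# Probability that a word is a sentiment word (the switch distribution); toy values.
SWITCH = {"office": 0.1, "mother": 0.1, "love": 0.9, "ignored": 0.9}
TOPIC_GIVEN_LABEL = {
    "positive": {"work": 0.2, "family": 0.8},
    "negative": {"work": 0.5, "family": 0.5},
    "sarcastic": {"work": 0.8, "family": 0.2},
}
WORD_GIVEN_TOPIC = {
    "work": {"office": 0.7, "mother": 0.1, "love": 0.1, "ignored": 0.1},
    "family": {"office": 0.1, "mother": 0.7, "love": 0.1, "ignored": 0.1},
}
_SENT_GIVEN_LABEL = {
    "positive": {"positive": 0.9, "negative": 0.1},
    "negative": {"positive": 0.1, "negative": 0.9},
    "sarcastic": {"positive": 0.5, "negative": 0.5},
}
_WORD_GIVEN_SENT = {
    "positive": {"office": 0.1, "mother": 0.1, "love": 0.7, "ignored": 0.1},
    "negative": {"office": 0.1, "mother": 0.1, "love": 0.1, "ignored": 0.7},
}
SENT_GIVEN_TOPIC_LABEL = {(z, l): _SENT_GIVEN_LABEL[l] for z in TOPICS for l in LABELS}
WORD_GIVEN_SENT_TOPIC = {(s, z): _WORD_GIVEN_SENT[s] for s in SENTIMENTS for z in TOPICS}


def word_score(word, topic, label):
    """Score one word, summing over its switch value and, if a sentiment word, its sentiment."""
    eta = SWITCH[word]
    as_topic_word = WORD_GIVEN_TOPIC[topic][word]
    as_sentiment_word = sum(
        SENT_GIVEN_TOPIC_LABEL[(topic, label)][s] * WORD_GIVEN_SENT_TOPIC[(s, topic)][word]
        for s in SENTIMENTS
    )
    return (1 - eta) * as_topic_word + eta * as_sentiment_word


def log_likelihood(words, label):
    """Log-likelihood of a tweet under a label, summing over the tweet-level topic."""
    total = 0.0
    for topic in TOPICS:
        p = TOPIC_GIVEN_LABEL[label][topic]
        for word in words:
            p *= word_score(word, topic, label)
        total += p
    return math.log(total)


def predict(words):
    """Return the label with the highest log-likelihood, as in the first method."""
    return max(LABELS, key=lambda label: log_likelihood(words, label))
```

In this toy model, a tweet that mixes a positive and a negative word about work, such as `["love", "ignored", "office"]`, scores highest under the sarcastic label, while `["love", "mother"]` scores highest under the positive label. For the binary task, a predicted positive or negative label would be mapped to non-sarcastic.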
We compare our results with two previously existing techniques, [Buschmeier et al.2014] and [Liebrecht et al.2013]. We ensure that our implementations result in performance comparable to the reported papers. The two rely on designing sarcasm-level features, and training classifiers for these features. For these classifiers, the positive and negative labels are combined as non-sarcastic. As stated above, the test set is separate from the training set. The results of these two past methods compared with the two based on topic models are shown in Table 7. Both prior approaches show a poor F-score (around 18-19%) while our log-likelihood based approach achieves the best F-score of 41.34%. The low values, in general, may be because our corpus is large in size and diverse in terms of topics. Also, the features in [Liebrecht et al.2013] are unigrams, bigrams and trigrams, which may result in sparse features.
7 Conclusion & Future Work
We presented a novel topic model that discovers sarcasm-prevalent topics. Our topic model uses a dataset of tweets (labeled as positive, negative and sarcastic), and estimates distributions corresponding to the prevalence of a topic and the prevalence of sentiment-bearing words. We observed that topics such as work, weather, politics, etc. were discovered as sarcasm-prevalent topics. We evaluated the model in three steps: (a) Based on the distributions learned by our model, we show the most likely label for all topics. This is to understand the sarcasm-prevalence of topics when the model is learned on the full corpus. (b) We then show the distribution of word-level sentiment for each tweet-level sentiment label as estimated by our model. Our intuition that the sentiment distribution in a tweet is different for the three labels (positive, negative and sarcastic) holds true. (c) Finally, we show how topics from this topic model can be harnessed for sarcasm detection. We implement two approaches: one based on the most likely label as per log-likelihood, and another based on the last sampled value during iteration. With our log-likelihood based approach, we are able to significantly outperform two prior approaches based on feature design, by an F-score of around 25%.
The current model is limited because of its key intuition about sentiment mixture in sarcastic text. Instances such as hyperbolic sarcasm go against the intuition. The current approach relies only on bag of words which may be extended to n-grams since a lot of sarcasm is expressed through phrases with implied sentiment. This work, being an initial work in topic models for sarcasm, sets up the promise of topic models for sarcasm detection, as also demonstrated in corresponding work in aspect-based sentiment analysis.
- [Baccianella et al.2010] Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In LREC, volume 10, pages 2200–2204.
- [Bamman and Smith2015] David Bamman and Noah A Smith. 2015. Contextualized sarcasm detection on twitter. In Ninth International AAAI Conference on Web and Social Media.
- [Blei et al.2003] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of machine Learning research, 3(Jan):993–1022.
- [Buschmeier et al.2014] Konstantin Buschmeier, Philipp Cimiano, and Roman Klinger. 2014. An impact analysis of features in a classification approach to irony detection in product reviews. In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 42–49.
- [Jo and Oh2011] Yohan Jo and Alice H Oh. 2011. Aspect and sentiment unification model for online review analysis. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 815–824. ACM.
- [Joshi et al.2015] Aditya Joshi, Vinita Sharma, and Pushpak Bhattacharyya. 2015. Harnessing context incongruity for sarcasm detection. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, volume 2, pages 757–762.
- [Joshi et al.2016a] Aditya Joshi, Pushpak Bhattacharyya, and Mark Carman. 2016a. Political issue extraction model: A novel hierarchical topic model that uses tweets by political and non-political authors. In 7th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, pages 82–90.
- [Joshi et al.2016b] Aditya Joshi, Pushpak Bhattacharyya, and Mark James Carman. 2016b. Automatic sarcasm detection: A survey. arXiv preprint arXiv:1602.03426.
- [Joshi et al.2016c] Aditya Joshi, Vaibhav Tripathi, Pushpak Bhattacharyya, and Mark Carman. 2016c. Harnessing sequence labeling for sarcasm detection in dialogue from tv series ‘friends’. CoNLL 2016, page 146.
- [Khattri et al.2015] Anupam Khattri, Aditya Joshi, Pushpak Bhattacharyya, and Mark James Carman. 2015. Your sentiment precedes you: Using an author’s historical tweets to predict sarcasm. In 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2015), page 25.
- [Kim et al.2013] Suin Kim, Jianwen Zhang, Zheng Chen, Alice H Oh, and Shixia Liu. 2013. A hierarchical aspect-sentiment model for online reviews. In AAAI.
- [Liebrecht et al.2013] CC Liebrecht, FA Kunneman, and APJ van den Bosch. 2013. The perfect solution for detecting sarcasm in tweets #not.
- [McAuley and Leskovec2013a] Julian McAuley and Jure Leskovec. 2013a. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems, pages 165–172. ACM.
- [McAuley and Leskovec2013b] Julian John McAuley and Jure Leskovec. 2013b. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In Proceedings of the 22nd international conference on World Wide Web, pages 897–908. International World Wide Web Conferences Steering Committee.
- [Mukherjee and Liu2012a] Arjun Mukherjee and Bing Liu. 2012a. Aspect extraction through semi-supervised modeling. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pages 339–348. Association for Computational Linguistics.
- [Mukherjee and Liu2012b] Arjun Mukherjee and Bing Liu. 2012b. Aspect extraction through semi-supervised modeling. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pages 339–348. Association for Computational Linguistics.
- [Mukherjee and Liu2012c] Arjun Mukherjee and Bing Liu. 2012c. Modeling review comments. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pages 320–329. Association for Computational Linguistics.
- [Rajadesingan et al.2015] Ashwin Rajadesingan, Reza Zafarani, and Huan Liu. 2015. Sarcasm detection on twitter: A behavioral modeling approach. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM ’15, pages 97–106, New York, NY, USA. ACM.
- [Ramage et al.2009] Daniel Ramage, David Hall, Ramesh Nallapati, and Christopher D. Manning. 2009. Labeled lda: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1, EMNLP ’09, pages 248–256, Stroudsburg, PA, USA. Association for Computational Linguistics.
- [Riloff et al.2013] Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as contrast between a positive sentiment and negative situation. In EMNLP, volume 13, pages 704–714.
- [Silvio Amir et al.2016] Silvio Amir, Byron C Wallace, Hao Lyu, Paula Carvalho, and Mário J Silva. 2016. Modelling context with user embeddings for sarcasm detection in social media. CoNLL 2016, page 167.
- [Wallace et al.2014] Byron C. Wallace, Do Kook Choe, Laura Kertz, and Eugene Charniak. 2014. Humans require context to infer ironic intent (so computers probably do, too). In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 512–516, Baltimore, Maryland, June. Association for Computational Linguistics.
- [Wang et al.2015] Zelin Wang, Zhijian Wu, Ruimin Wang, and Yafeng Ren. 2015. Twitter sarcasm detection exploiting a context-based model. In Web Information Systems Engineering–WISE 2015, pages 77–91. Springer.