Topic-Based Sentiment Analysis Using Deep Learning

In this paper, we tackle sentiment analysis conditioned on a topic in Twitter data using deep learning. We propose a two-tier approach: in the first phase, we create our own word embeddings and show that they perform better than state-of-the-art embeddings when used with standard classifiers. We then perform inference on these embeddings to learn more about a word with respect to all the topics being considered, and to find the top n influencing words for each topic. In the second phase, we use these embeddings to predict the sentiment of a tweet with respect to a given topic, as well as all other topics under discussion.




1 Introduction

Twitter is a commonly used microblogging platform where millions of users express their opinions on various topics or domains. These opinions can concern a political issue, technology, sports and entertainment, a particular product, a celebrity, or any person of interest.

Due to the sheer volume of data, it is impossible to manually go through the tweets to extract opinions and sentiments. Hence, there arises a need for automated systems that extract the polarity of an opinion with respect to a particular topic.

It is vital that the context with respect to a topic is understood, since words and phrases carry different sentiments under different contexts. For instance, the word "court" has radically different sentiment in a political scenario than in a sporting one. All these subtleties and nuances have to be addressed for the task of topic/entity-based sentiment prediction. In the rest of the paper, we use the terms topic and entity interchangeably.

We solve this problem by creating our own word embeddings and then using these embeddings for sentiment classification. In this paper, we have also tried to demystify the word embedding by showing the sentiment it contributes for every topic (see Figure 5). We have introduced a model in which it is possible to build positive and negative sentiment nuclei for each topic (Figure 6).

2 Related Work

Social media, especially micro-blogging websites like Twitter, are a mine of unstructured but very vocal data. People take to these platforms to express their views and sentiments on several topics.

There has been a lot of work on topic-based sentiment analysis, particularly using recurrent neural networks because of their recent success on language tasks [Nallapati et al.2016], [Cheng and Lapata2016], [Bahdanau et al.2014]. Twitter-based word embeddings have been trained in [Xiong2016] on a few million tweets. There has also been work on feeding topic embeddings into the network, and on attention over sentence-level nodes to extract information with respect to the topic [Wang et al.] [Jebbara and Cimiano2016].

There is also feature-engineered topic extraction that uses information from POS trees to distribute probability mass across every word in a sentence with respect to a topic. Distance embeddings have also been used to weight each word in a sentence according to its position relative to the topic of interest [Jebbara and Cimiano2016]. There has also been work on identifying aspects in a sentence and predicting the sentiment on those aspects [Wang et al.] [Dhanush et al.2016].

There are also methods that use LDA or BTM topic modeling and then apply semantic and syntactic approaches to classify sentiment by aspect [Pavlopoulos2014]. Moreover, most of these approaches are restricted to a single domain or a small number of domains [Gulaty2016] [Thet et al.2010] and do not generalize well to other topics.

All of these approaches either require hand engineering with respect to topics, POS tags, etc., or completely lack interpretability.

The model we propose uses the generated custom embeddings to predict the sentiment conditioned on the tweet and the topic. The CNN model is interpretable in the sense that it highlights the sentiment contribution of each word with respect to every topic. Also, given a topic, we can establish its positive and negative aspects by visualizing the words or phrases with the most positive or negative contributions in the predictions obtained from phase 1.

Figure 1: Sample topics that were selected

3 Data

Our data comes from SemEval 2016 Task 4 (Sentiment Analysis in Twitter) as well as from the Sanders data-set. Initially, our data has the following attributes:

  1. Tweet id

  2. Tweet

  3. Topic

  4. Sentiment Score (from -2 to +2)

For the scope of our testing, we picked topics from 4 domains: technology, sports, politics and music. Overall, we selected data from 77 topics (see Figure 1 for sample topics).

We performed standard preprocessing on the tweets, including cleaning HTML, removing mentions, removing trailing hashtags, and removing special characters and punctuation. For our custom embedding, we also filter out all words below a minimum occurrence frequency (currently 3).
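The cleaning steps above can be sketched as follows (a minimal sketch; `clean_tweet` and its regexes are our illustrative stand-ins, not the paper's exact pipeline):

```python
import re
from collections import Counter

def clean_tweet(text):
    """Hypothetical cleaner mirroring the preprocessing steps described
    above; the paper's exact pipeline is not published."""
    text = re.sub(r"<[^>]+>", " ", text)               # leftover HTML tags
    text = re.sub(r"@\w+", " ", text)                  # @mentions
    text = re.sub(r"(#\w+\s*)+$", " ", text)           # trailing hashtags
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # special chars, punctuation
    return re.sub(r"\s+", " ", text).strip()

def build_vocab(tweets, min_count=3):
    """Keep only words at or above the minimum occurrence frequency."""
    counts = Counter(w for t in tweets for w in clean_tweet(t).split())
    return {w for w, c in counts.items() if c >= min_count}
```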

We also noticed that our data was heavily skewed towards the Positive and Neutral classes; there was not enough data for the Negative class. Because of this, the model would tend to over-fit to the Positive and Neutral classes. To overcome this, we randomly filtered out one-third of the data from the Positive and Neutral classes to obtain a more balanced distribution. At the end of preprocessing:

  • vocabulary: 14k words

  • genres (domains): 10

  • number of topics: 370

  • number of tweets: 16895

We trained our model on 13300 tweets, cross-validated on 700 tweets, and tested on the remaining 2895 tweets.
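The rebalancing and splitting described above can be sketched as follows (the `drop_frac` reading of "filtered one-third" and both helper names are our assumptions):

```python
import random

def rebalance(examples, drop_labels=("positive", "neutral"), drop_frac=1/3, seed=0):
    """Randomly drop a fraction of the over-represented classes
    (our reading of the rebalancing step; the fraction is illustrative)."""
    rng = random.Random(seed)
    return [ex for ex in examples
            if ex["label"] not in drop_labels or rng.random() > drop_frac]

def split(examples, n_train=13300, n_dev=700):
    """Train / cross-validation / test split with the sizes used above."""
    return (examples[:n_train],
            examples[n_train:n_train + n_dev],
            examples[n_train + n_dev:])
```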

Figure 2: Diagrammatic representation of word2Topic architecture

4 Custom Word Embedding

4.1 The Model

The model to create the custom word embedding is depicted in Figure 2.

The first step is obtaining our custom word embeddings, which represent the sentiment of a word towards every topic. If the number of topics is t, then we can express each word as a 1 x t vector.

We made the decision to use custom word embeddings as opposed to standard embeddings because we found that the latter over-fit (Subsection 4.2). We obtain the custom word embeddings (henceforth called word2topic) by training a CNN. The input to the CNN is a stacked array where each row is the one-hot encoding of a word in our vocabulary. If our vocabulary has n words, then the input is an array of size n x n.

The labels for the CNN are a normalized mean of the sentiment expressed by a word with respect to every topic. For every word, the label vector is of size 1 x t, so the label array is of size n x t.
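Our reading of this label construction can be sketched as follows (a hypothetical helper; the exact normalization is not specified in the text, so we scale the [-2, 2] mean scores to [-1, 1]):

```python
import numpy as np

def word_topic_labels(tweets, vocab, topics):
    """Build the n x t label matrix: for each (word, topic) pair, the mean
    sentiment score of the tweets on that topic containing the word,
    scaled from the [-2, 2] score range to [-1, 1]."""
    w_ix = {w: i for i, w in enumerate(vocab)}
    t_ix = {t: i for i, t in enumerate(topics)}
    sums = np.zeros((len(vocab), len(topics)))
    counts = np.zeros_like(sums)
    for text, topic, score in tweets:          # tweets: (text, topic, score)
        for w in set(text.split()):
            if w in w_ix:
                sums[w_ix[w], t_ix[topic]] += score
                counts[w_ix[w], t_ix[topic]] += 1
    # Mean where the word co-occurred with the topic, zero elsewhere.
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return means / 2.0
```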

The inputs and labels are fed into a 6-layer 2D CNN. We take inspiration from the architecture of word2vec [Mikolov et al.2013] and use the representation from the penultimate (feed-forward) hidden layer for the computation of sentiment with respect to topic (see Section 5). We use the Adam optimizer and a mean squared error loss function.
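To make the setup concrete, here is a toy numpy sketch of the idea (a single tanh layer stands in for the actual 6-layer CNN; all sizes, weights and the training loop are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_topics, hidden = 50, 7, 100       # toy sizes; the real vocabulary is ~14k

X = np.eye(n_words)                          # one-hot input, one row per word
Y = rng.uniform(-1, 1, (n_words, n_topics))  # stand-in for the n x t label matrix

# A single tanh layer stands in for the 6-layer CNN: the essential point is
# that the penultimate activations become the word2topic embeddings.
W1 = rng.normal(0.0, 0.1, (n_words, hidden))
W2 = rng.normal(0.0, 0.1, (hidden, n_topics))

mse_before = np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2)
for _ in range(200):                         # plain gradient descent on MSE
    H = np.tanh(X @ W1)                      # penultimate layer -> embeddings
    err = (H @ W2) - Y                       # predicted 1 x t vector per word
    W2 -= 0.2 * H.T @ err / n_words
    W1 -= 0.2 * X.T @ ((err @ W2.T) * (1 - H ** 2)) / n_words
mse_after = np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2)

embeddings = np.tanh(X @ W1)                 # n_words x 100, reused in phase 2
```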

The output from the last layer (of the same size as the label, 1 x t for every word) is used to perform inference (Subsection 6.1).

Figure 3: Confusion Matrix for Logistic Regression of Test Data for Word2Vec and Word2Topic embedding

4.2 Baselines

The task we are performing can be split into two parts: the first is to generate a custom word embedding for each word that reflects its sentiment with respect to each topic. The second is to predict the sentiment of a tweet with respect to all topics, not just the topic we were given labeled data for.

Since the first task is fairly standard, we compared our word embeddings against state-of-the-art word2vec embeddings by training a Logistic Regression classifier; the results can be seen in Figure 3. Each tweet is represented as the concatenated word-embedding representation of its words (padded up to 30 words per sentence) together with a one-hot encoding of the topic. The label is the sentiment score.
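The feature construction for this baseline can be sketched as follows (hypothetical helper; `emb` is any word-index-to-vector matrix):

```python
import numpy as np

def tweet_features(words, topic_id, emb, n_topics, max_len=30):
    """Concatenate per-word embeddings (padded/truncated to max_len words)
    with a one-hot topic encoding -- the vector fed to logistic regression."""
    d = emb.shape[1]
    x = np.zeros(max_len * d + n_topics)
    for i, w in enumerate(words[:max_len]):   # words are vocabulary indices
        x[i * d:(i + 1) * d] = emb[w]
    x[max_len * d + topic_id] = 1.0           # one-hot topic slot
    return x
```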

We see that not only does word2topic give better accuracy than word2vec, but also that word2vec tends to over-fit the data: it classifies most tweets as neutral/positive and hardly anything as negative. Word2Topic performs better, and its confusion matrix is stronger on the diagonal. This motivates us to move forward with word2topic as our word embedding for classification.

Figure 4: Diagrammatic representation of Sentiment Classification RNN architecture
Figure 5: Topic-wise Representation of the word ’court’ for Inference
Figure 6: Top influencing words for the topic ’Amazon Prime Day’

5 Sentiment Classification

We classify topic sentiment at the sentence level and treat it as a conditional modeling problem: predicting the sentiment score, whether positive, negative or neutral, given a tweet text 't' and a topic 'a'. Both variables serve as input to a model that classifies the sentiment as positive, negative or neutral.

Each word in a tweet is represented by the value of the hidden layer of size 100 extracted from phase 1. The topic is represented as a single word embedding. We use a bidirectional recurrent neural network with LSTM cells for the proposed task; the architecture of the network is shown in Figure 4. The network can be decomposed into two blocks: the sentence block and the topic block. The sentence block consists of 30 LSTM units and takes a sentence as input, where each sentence is represented as a vector of word embeddings. The topic block takes as input the word embedding of the topic, and the outputs of both blocks are concatenated so that the sentence-level information and the topic information can interact. We include two additional bidirectional LSTM layers to increase model capacity, and a 3-node softmax output layer for classification. Adam was used as the optimizer with a learning rate of 0.0005. The model was trained with a batch size of 64 for 40 epochs with categorical cross-entropy as the loss function.
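The two-block structure can be sketched as follows (a structural sketch only: mean pooling stands in for the Bi-LSTM blocks, and `Wf`, `Wo` are stand-ins for the trained layers):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(sent_emb, topic_emb, Wf, Wo):
    """sent_emb: 30 x 100 matrix of phase-1 word representations;
    topic_emb: 100-dim topic embedding. Mean pooling stands in for the
    Bi-LSTM sentence block; concatenation is where the blocks interact."""
    sent_summary = sent_emb.mean(axis=0)               # sentence block
    joint = np.concatenate([sent_summary, topic_emb])  # blocks interact here
    hidden = np.tanh(joint @ Wf)                       # extra layers stand-in
    return softmax(hidden @ Wo)                        # 3-way class probabilities
```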

5.1 Results

The results from the RNN are shown in Figures 7 and 8: the RNN gives an accuracy of 74.4% on 3-class classification (Figure 8) and 64.81% on 5-class classification (Figure 7). It can be clearly seen from the confusion matrix that the classifier separates negative opinions from positive ones with very high accuracy. It misses the fine line between neutral-positive and neutral-negative in a few cases. The training accuracy is shown in Figures 9 and 10. Hence, future work can focus on classifiers that first separate neutral comments from opinionated ones and, as a next step, classify positive opinions from negative ones.
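The confusion-matrix bookkeeping behind these figures is straightforward (a generic sketch, not the paper's code):

```python
import numpy as np

def confusion(y_true, y_pred, n_classes=3):
    """Confusion matrix: rows are true labels, columns predicted labels.
    Accuracy is the trace divided by the total count."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```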

Figure 7: Confusion Matrix(Test) for RNN 5 class
Figure 8: Confusion Matrix(Test) for RNN 3 class
Figure 9: Confusion Matrix(Train) for RNN 5 class
Figure 10: Confusion Matrix(Train) for RNN 3 class

6 Inference

One of the novel features of our model is that it is interpretable. By simply examining the output vectors at each phase, we can perform different kinds of inference.

6.1 Sentiment contributed by a word given topic

If we look at the output of the CNN model discussed in Section 4.1, we see that for each word we get a 1 x t vector, where t is the number of topics under discussion. For example, we can see the word embedding of the word 'court' for 77 topics in Figure 5. The arrays have been reshaped to 11 x 7 for better visibility.
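Reading a single word's vector off the model and reshaping it for display amounts to the following (the vector values and topic names are placeholders, not the paper's actual data):

```python
import numpy as np

t = 77
court = np.random.default_rng(1).uniform(-2, 2, t)  # stand-in for the real 1 x t output
grid = court.reshape(11, 7)                         # reshaped only for visualization
topics = [f"topic_{i}" for i in range(t)]           # placeholder topic names
score = court[topics.index("topic_12")]             # sentiment of 'court' for one topic
```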

If one examines this matrix carefully, one can see that the word 'court' has a positive sentiment for the topic 'angela merkel' (1.5) and also for 'real madrid'. Although the context in which the word is used for these two topics is completely different, we are able to capture the sentiment that the word contributes with respect to each topic.

Another way to interpret this is to think of these values as potential functions, with the sentiment of a tweet proportional to the (weighted) sum of the potential functions of all its constituent words. Therefore, the presence of the word 'court' in a tweet about the topic 'real madrid' is going to have a positive effect on the sentiment.
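Under this reading, scoring a tweet for a topic is just an (unweighted) sum of per-word potentials, with out-of-vocabulary words contributing nothing:

```python
import numpy as np

def tweet_potential(words, topic_ix, emb, w_ix):
    """Sum the topic_ix column of the word2topic matrix over the tweet's
    in-vocabulary words (unweighted instance of the potential-function view)."""
    return sum(emb[w_ix[w], topic_ix] for w in words if w in w_ix)
```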

This representation helps us interpret the sentiment contributed by every word in the vocabulary for every topic.

6.2 Most influential words for a topic

By doing some analysis on our data, we can also infer, for any given topic, the words that bear the most positive and negative sentiment towards it. For example, consider the topic 'Amazon Prime Day' as shown in Figure 6.

Currently we obtain the most positive and negative words for a topic by examining the sentiment they express in the custom word embedding. Words like 'credible', 'smartest' and 'faster' contribute a very positive sentiment, whereas words like 'worst', 'throw' and 'slipping' contribute a very negative one. The number associated with each word can be interpreted as the potential function of the word given the topic.
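Extracting these lists from the embedding matrix is a single sort per topic (a sketch; `emb` here is the n x t word2topic matrix and the example values are illustrative):

```python
import numpy as np

def top_words(emb, vocab, topic_ix, k=3):
    """Most positive and most negative words for one topic, obtained by
    sorting that topic's column of the word2topic matrix."""
    order = np.argsort(emb[:, topic_ix])      # ascending by potential
    pos = [vocab[i] for i in order[::-1][:k]]
    neg = [vocab[i] for i in order[:k]]
    return pos, neg
```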

By performing further analysis on these 'influential key words', one can create positive and negative nuclei for every topic. Although we have not yet used these nuclei to refine our results, we plan to incorporate them in future work.

7 Conclusion

To conclude, we have built a deep learning model for sentiment classification of tweets given topics. We do this in two parts: first by creating custom word embeddings, and second by predicting sentiment using these embeddings. Our model is interpretable, as we can infer the sentiment a word contributes to every topic, as well as the most influential words for any topic. Our model is novel in that, given any tweet, it can predict the sentiment for not just one but many topics.

Although we have created a model that gives interpretable results, our macro F-score can still be improved. We plan to do this by introducing POS-tagging weights on the labels when generating the word embeddings. We also plan to introduce a more sophisticated distance embedding of each tweet from a given topic (again incorporating POS tagging).


We would like to acknowledge the efforts of our course instructor Amita and our fellow classmates, who gave us critical advice and guided us to complete our project on time.