Citizens' Emotion on GST: A Spatio-Temporal Analysis over Twitter Data

by Deepak Uniyal, et al.

People may not be physically close to one another, yet they remain connected by virtue of social networks, which have transformed lives in many ways. People express their views, opinions and life experiences on platforms such as Twitter and Facebook. Such posts include reviews of products or services, views on political debates, predictions of share prices, and feedback on government policies such as Demonetization or GST. These platforms can be used to study the emotional curve that the general public generates. Such analysis can help improve a product, predict future prospects, and implement public policies more effectively, and research on sentiment analysis of this kind is growing rapidly. In this paper, we perform temporal and spatial analysis on 1,42,508 and 58,613 tweets respectively, posted during the post-GST implementation period from July 04, 2017 to July 25, 2017. The tweets were collected using the Twitter streaming API. For opinion mining we use a well-known lexicon, the National Research Council Canada (NRC) Emotion Lexicon, which covers eight basic emotions (joy, trust, anticipation, surprise, fear, sadness, anger and disgust) and two sentiments (positive and negative) for 6,554 words.




1 Introduction

Goods and Services Tax (GST) embodies the concept of "One Nation, One Tax": a single tax applied to all goods and services at the national level. The introduction of GST brings the various taxes under the umbrella of a uniform tax base. GST was rolled out by the Union government on the recommendations of the GST Council, the key decision-making body of the GST regime. The Council is chaired by the Union Finance Minister, Arun Jaitley. As per Article 279A of the amended Constitution, the GST Council is a joint forum of the Center and the states.

The first draft of the GST bill was brought up in Parliament in 1999 under the leadership of the then Prime Minister of India, Mr. Atal Bihari Vajpayee. However, it took almost two decades to implement. GST was implemented at midnight (June 30th to July 1st, 2017) by the President of India and the Government of India.

Before the GST regime, various indirect taxes were levied by the Central and State governments. Indian consumers had to pay taxes such as Value Added Tax (VAT) or local sales tax, service tax, central excise duty and customs duty. GST now subsumes almost all the indirect taxes of the Center and the State governments, and the new tax system is considerably simpler than its predecessor. The GST tax slabs [7] in India are 0%, 5%, 12%, 18% and 28%, depending on the type of product or service. For example, the 0% slab applies to all essential items, including food grains, and the 5% slab to items of mass consumption, including essential commodities. A few products, such as petroleum, alcoholic drinks, electricity and real estate, are still taxed separately by the respective state governments, as under the previous tax regime. Three forms of GST are implemented in India: Central GST (CGST), State GST (SGST) and Integrated GST (IGST). A seller collects both CGST and SGST from the buyer in an intra-state transaction, and IGST in an inter-state transaction.
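As a minimal illustration of how the rate slabs and the CGST/SGST/IGST split described above combine, the Python sketch below computes the tax components of a single transaction. It assumes the standard convention that an intra-state rate is split equally between CGST and SGST, while an inter-state sale attracts IGST at the full rate; the function name `gst_components` is hypothetical, and cess, exemptions and the separately taxed items mentioned above are ignored.

```python
from decimal import Decimal

# GST slabs listed above, in percent.
SLABS = {Decimal("0"), Decimal("5"), Decimal("12"), Decimal("18"), Decimal("28")}


def gst_components(value, rate, intra_state):
    """Return GST components for a transaction of `value` taxed at `rate` percent.

    Assumption: an intra-state rate is split equally between CGST and SGST,
    while an inter-state sale attracts IGST at the full rate.
    """
    value, rate = Decimal(str(value)), Decimal(str(rate))
    if rate not in SLABS:
        raise ValueError(f"{rate}% is not one of the GST slabs")
    tax = value * rate / 100
    if intra_state:
        # Both components are collected together by the seller.
        return {"CGST": tax / 2, "SGST": tax / 2}
    return {"IGST": tax}


print(gst_components(1000, 18, intra_state=True))
print(gst_components(1000, 18, intra_state=False))
```

For example, a Rs. 1,000 intra-state sale in the 18% slab yields Rs. 90 each of CGST and SGST, whereas the same sale across state lines yields Rs. 180 of IGST.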

Before India, more than 160 countries had already implemented a GST form of tax structure. The GST in India, in its current form, is somewhat more distinctive and complex than in other countries: unlike other nations, India charges GST at different rates depending on the category, value or state of the business. In the Asia-Pacific region alone, more than 40 kinds of GST models are in operation, making for a diverse set of rules and regulations.

It was considered one of the biggest tax reforms in Indian history and was expected to boost the overall growth of the economy. Correspondingly, its repercussions were large. The introduction of GST created a lot of ripples on major social media platforms, including Facebook and Twitter, where people engaged in discourse, expressing all kinds of opinions on the taxation system in India and the plausibility of a brand-new idea.

Researchers use such data to gain insight into public sentiments, emotions or opinions about an event, and can report to the concerned authority in case further action is required, e.g. to make a better product, predict the target price of shares, or use public input to implement a government policy more effectively. Such analysis, in which data collected from social media platforms is mined to analyse public sentiment, is called sentiment analysis or opinion mining.

The first step in sentiment analysis is to collect data from social media and preprocess it to extract meaningful information. Opinion mining can then be performed by identifying the sentiments associated with individual words in the cleaned data. To deduce emotions and sentiments we have used the NRC Lexicon, which consists of 2554 words and lists eight emotions (joy, trust, anticipation, surprise, fear, sadness, anger and disgust) and two sentiments (positive and negative).

We performed two types of analysis: temporal and spatial. In the temporal analysis, we calculated the frequency of tweets per hour and presented the varying emotions graphically, while in the spatial analysis we identified the location of tweets and tried to show the degree of acceptance region-wise. Data were collected from July 4th to July 25th, 2017. The purpose of collecting data over 22 days is to make a fair judgement about people's sentiments over an extended period of time.

2 Related Work

Sentiment analysis can be defined as the process of automatically determining the sentiments, views, emotions, opinions and attitudes towards a social event. According to [1], the terms opinion, view and sentiment are often used interchangeably, although there are differences between them.

According to [2], sentiment analysis is about finding opinions, identifying the sentiments people express and then classifying their polarity as positive, negative or neutral. There are two types of approaches to sentiment analysis: machine learning techniques (supervised and unsupervised) and lexicon-based approaches. The lexicon-based approach is further divided into the dictionary-based approach and the corpus-based approach. The dictionary-based approach relies on terms collected and annotated manually, and grows by adding synonyms and antonyms of the dictionary words, whereas the corpus-based approach uses a domain-specific dictionary.
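To make the dictionary-based idea concrete, the sketch below grows a small polarity lexicon from manually annotated seed words: synonyms inherit the polarity of the word they are reached from, and antonyms receive the opposite polarity. The synonym and antonym tables are toy, hypothetical stand-ins for a real thesaurus such as WordNet, and `expand_lexicon` is an illustrative name rather than a method taken from [2].

```python
def expand_lexicon(seed, synonyms, antonyms, max_rounds=2):
    """Grow a polarity lexicon outward from manually annotated seed words.

    seed:      word -> +1 (positive) or -1 (negative)
    synonyms:  word -> list of synonyms   (toy stand-in for a thesaurus)
    antonyms:  word -> list of antonyms
    Words already in the lexicon keep their first assigned polarity.
    """
    lexicon = dict(seed)
    frontier = list(seed)
    for _ in range(max_rounds):
        next_frontier = []
        for word in frontier:
            polarity = lexicon[word]
            # Synonyms share the polarity; antonyms take the opposite one.
            related = [(w, polarity) for w in synonyms.get(word, [])]
            related += [(w, -polarity) for w in antonyms.get(word, [])]
            for new_word, new_polarity in related:
                if new_word not in lexicon:
                    lexicon[new_word] = new_polarity
                    next_frontier.append(new_word)
        frontier = next_frontier
    return lexicon


# Hypothetical toy data for illustration only.
SEED = {"good": 1}
SYNONYMS = {"good": ["fine"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"]}

print(expand_lexicon(SEED, SYNONYMS, ANTONYMS))
```

Limiting the number of rounds matters in practice: each hop away from the seed words makes polarity drift more likely, which is one reason manual annotation remains part of the dictionary-based approach.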

We have studied some of the past research on sentiment analysis in various fields [3] [4]. Some researchers use standard, already annotated datasets, whereas others build their own datasets for analysis.

In [5], sentiment analysis was performed on 104,003 tweets collected before the German national election, between August 13th and September 19th, 2009. The analysis shows that a small section of people dominates social media: more than 40% of the messages were posted by only 4% of the users. It also indicates a close correspondence between political discussion on public platforms and the offline political scenario.

In [6], a survey is presented that discusses different features of news content, such as linguistic features, that are helpful in detecting fake news. The paper also discusses many related tasks, such as spam detection, rumour classification and truth discovery.

A similar work, [7], identifies fake tweets using various features of tweets and Twitter accounts.

In [8], a system was developed for real-time analysis of public sentiment towards the 2012 US presidential election. The system analyses and visualizes over 36 million tweets, showing the number of real-time tweets every minute and the number of tweets of each sentiment polarity every five minutes.

Research [9] uses the Naive Bayes algorithm to analyse public sentiment in approximately 20000 tweets collected from 27th June 2017 to 07th July 2017. The analysis shows that approximately 52% of the tweets are positive about the new taxation system in India, while the remaining negative and neutral tweets are also significant in number.

In [10] [11], a technique is used to process and analyse large amounts of real-time data using a Hadoop cluster and the Naive Bayes approach. The focus is on the speed of the sentiment analysis rather than its accuracy. The data is processed by removing stop words, converting unstructured data to structured data and replacing emoticons with their corresponding words.

In [12], sentiment analysis was performed on over 2,50,000 tweets mentioning #Microsoft, #Windows, $MSFT etc., using a supervised machine learning approach, to find the correlation between stock market movements and the sentiments in the tweets.

In [13], the authors find a correlation between cricketers' performances and fans' emotions; their work shows that fans' emotions depend on the players' performance in the tournament. In [14], the authors perform geospatial sentiment analysis of Twitter data for the UK-EU referendum, identifying the most talked-about British politicians and the public sentiment towards them. We have used the same approach to remove noise from tweets as discussed in [15].

In [16], the authors observed correlations between good performance and positive emotions, and between bad performance and negative emotions. Besides this, they found trust (a positive emotion) to be the emotion most closely associated with each performance. Moreover, the trading prices of commercial brands were found to have a transitive relationship with their brand ambassadors' performance in the match. As a reference, they used Google Trends to verify the influence of players' performance across the globe.

3 Methodology

3.1 Data Collection

Before performing sentiment analysis, we need to collect the data, and the most important decisions before collecting it are the source of the data and the time period to cover. We selected Twitter as the data source because it is currently a very popular platform worldwide on which the public expresses views, opinions and sentiments, such as happiness, celebration, anger, surprise or fear, towards an event.

A dataset can be collected in three time periods: before, during or after the event. We chose to collect and analyse data starting three days after GST was implemented, since by then people would have gathered some information about GST via social media, TV debates and other sources, which decreases the probability of biased or prejudiced views towards the policy.

We collected 1,63,372 tweets over 22 days using the Twitter streaming API, tracking popular and trending hashtags that we identified through a manual scan. During data preprocessing and cleaning we removed more than 20000 tweets that lacked meaningful information, leaving 1,42,508 tweets for sentiment analysis. Table 1 shows the number of tweets per day over the 22 days from 4th July, 2017 to 25th July, 2017.

Date | No. of Tweets | Date | No. of Tweets
4th July | 13025 | 5th July | 16171
6th July | 7644 | 7th July | 10778
8th July | 15834 | 9th July | 12964
10th July | 6170 | 11th July | 5240
12th July | 1772 | 13th July | 7821
14th July | 4241 | 15th July | 5869
16th July | 4099 | 17th July | 6801
18th July | 4782 | 19th July | 4439
20th July | 1892 | 21st July | 2034
22nd July | 2252 | 23rd July | 3677
24th July | 4419 | 25th July | 644
Total Tweets | 1,42,508
Table 1: Number of Tweets Per Day (July 2017)
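The Twitter streaming API performs keyword tracking on the server side, so the snippet below is only a sketch of the equivalent local check that the collection step described above implies, e.g. for re-filtering an archived stream. It assumes each tweet is a dict with a `text` field; the `TRACKED` hashtag set and the helper names are hypothetical, since the full tracking list is not given in this paper.

```python
import re

# Hypothetical tracking list; the full set of tracked hashtags is not given.
TRACKED = {"#gst", "#onenationonetax", "#gstforcommonman"}
HASHTAG_RE = re.compile(r"#\w+")


def is_tracked(text, tracked=TRACKED):
    """True if the text contains at least one tracked hashtag (case-insensitive)."""
    return any(tag.lower() in tracked for tag in HASHTAG_RE.findall(text))


def filter_tweets(tweets, tracked=TRACKED):
    """Yield only the tweets (dicts with a 'text' field) that carry a tracked hashtag."""
    for tweet in tweets:
        if is_tracked(tweet["text"], tracked):
            yield tweet


sample = [
    {"id": 1, "text": "Finally! #GST is live"},
    {"id": 2, "text": "Monsoon update for Mumbai"},
    {"id": 3, "text": "#OneNationOneTax explained"},
]
print([t["id"] for t in filter_tweets(sample)])
```

Matching whole hashtags (rather than substrings) means that a variant such as #GSTSimplified is kept only if it is itself in the tracking list.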

3.2 Data Preprocessing and Cleaning

Data preprocessing is a data mining technique that transforms raw data into meaningful information that is understandable and ready for mining. This phase is a crucial step in data mining, as it strongly affects the analysis: unprocessed data is likely to remain inconsistent and noisy, which may result in incorrect analysis. A number of previous studies have followed different approaches to handling noisy content [8][10][11][12][13]. To make the data uniform and clean, we preprocess it by removing all characters, words and phrases that are less significant or carry little weight in sentiment analysis.

We passed every tweet through a Python program that performs the preprocessing, as shown for a sample tweet in Fig-1. It starts by removing all hashtags, mentions and URLs, and then removes all punctuation, single- and double-character words, and recurring characters. It then removes stop words, matched against a commonly used stop-word list, so that only meaningful data is kept. In the final step of preprocessing we trimmed the tweets and removed all blank tweets from the data corpus. The resulting dataset contains tweets that are more significant for the analysis and carry some sentiment or emotion.
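A minimal Python sketch of this preprocessing pipeline follows, assuming the steps run in the order described above. Two details are our assumptions rather than documented behaviour: runs of three or more identical letters are collapsed to two (so "gooood" becomes "good"), and `STOP_WORDS` is a short illustrative list standing in for the full standard stop-word list.

```python
import re
import string

# Short illustrative stop-word list; a full standard list would be used in practice.
STOP_WORDS = {"the", "and", "for", "this", "that", "with", "are", "was",
              "will", "has", "have", "its", "our", "you"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
REPEAT_RE = re.compile(r"([a-z])\1{2,}")  # a letter repeated three or more times
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(tweet):
    """Clean one tweet following the steps described above."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE).lower()  # punctuation -> spaces
    text = REPEAT_RE.sub(r"\1\1", text)         # "gooood" -> "good"
    # Drop single- and double-character words and stop words.
    tokens = [t for t in text.split() if len(t) > 2 and t not in STOP_WORDS]
    return " ".join(tokens)                     # joining also trims whitespace


def clean_corpus(tweets):
    """Preprocess every tweet and discard those that end up blank."""
    cleaned = (preprocess(t) for t in tweets)
    return [t for t in cleaned if t]


print(preprocess("@arunjaitley GST is a gooood move for the common man!!! #GST https://t.co/abc"))
```

URLs are stripped first so that the mention and hashtag patterns never see fragments of links; a tweet consisting only of hashtags, mentions and a link is reduced to an empty string and dropped by `clean_corpus`.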

(a) Preprocessing and Sentiment Analysis -1
(b) Preprocessing and Sentiment Analysis -2
Figure 1: Showing the Data Preprocessing and Sentiment Analysis Process on Sample Tweets

3.3 Sentiment Analysis

Once preprocessed, the data is ready for analysis, with each tweet retaining only the words that carry more weight or sentiment. In this paper, we performed time-based and location-based analysis on 1,42,508 and 58,613 tweets respectively.

(a) Sentiments
(b) Emotions
(c) Hourly Frequency
Figure 2: Showing the varying Sentiments, Emotions and Hourly Frequency of Tweets over 22 days in July 2017

3.3.1 Temporal Analysis

In the temporal (time-based) analysis, sentiment analysis was performed on the data on an hourly and a daily basis. For the preprocessed tweets, we calculated the number of tweets with each emotion (joy, trust, anticipation, surprise, fear, sadness, anger and disgust) and sentiment (positive and negative) using the NRC Emotion Lexicon, as shown for a sample tweet in Fig-1.

To calculate the number of tweets for each emotion or sentiment, we applied the following methodology:

  • Tag each word of the tweet with the corresponding emotions and sentiments as per the NRC Emotion Lexicon. If a word corresponds to more than one emotion, tag the word with each of them.

  • Tag a tweet as neutral if none of its words has been tagged with any emotion or sentiment.

  • Map the emotions, sentiments and neutral responses to their counts in the tweet.

  • Assign an emotion, a sentiment or a neutral tag to the tweet on the basis of the maximum count (score) obtained in the previous step.
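The four steps above can be sketched in Python as follows. `TOY_LEXICON` is a hypothetical, hand-made stand-in for the NRC Emotion Lexicon (its entries are illustrative, not taken from the real lexicon). We also assume that each tweet receives one dominant emotion and one dominant sentiment, with ties broken by the order in which the labels are listed, since the tie-breaking rule is not specified above.

```python
from collections import Counter

EMOTIONS = ("joy", "trust", "anticipation", "surprise",
            "fear", "sadness", "anger", "disgust")
SENTIMENTS = ("positive", "negative")

# Hypothetical stand-in for the NRC Emotion Lexicon: word -> associated labels.
TOY_LEXICON = {
    "good": {"joy", "trust", "positive"},
    "hope": {"anticipation", "trust", "positive"},
    "tax": {"sadness", "negative"},
    "confusion": {"fear", "negative"},
}


def dominant(labels, counts):
    """Label with the highest count (first listed wins ties), or 'neutral' if all are zero."""
    best = max(labels, key=lambda label: counts[label])
    return best if counts[best] > 0 else "neutral"


def tag_tweet(tokens, lexicon=TOY_LEXICON):
    """Return (emotion, sentiment) for a preprocessed tweet given as a list of words."""
    counts = Counter()
    for word in tokens:
        # A word linked to several labels contributes one count to each of them.
        counts.update(lexicon.get(word, ()))
    # With no tagged words, both parts come out as 'neutral'.
    return dominant(EMOTIONS, counts), dominant(SENTIMENTS, counts)


print(tag_tweet(["good", "hope", "gst"]))
```

Here "good" and "hope" both carry trust, so trust outscores joy and anticipation, while a tweet whose words are all missing from the lexicon is tagged neutral.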

After applying the above methodology, every tweet is tagged with an emotion, a sentiment, or as neutral. Finally, we mapped the emotions, sentiments and neutral tweets to their counts per day from July 04, 2017 to July 25, 2017.
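The per-day mapping can be sketched as a dictionary of counters keyed by date. The sketch assumes each tagged tweet has already been reduced to a `(day, label)` pair; `daily_counts` and `series` are hypothetical helper names, and `series` fills days without tweets with zero so that the result can be plotted directly as a time series like those in Fig-2.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def daily_counts(tagged):
    """Map each day to a Counter of labels, given (day, label) pairs."""
    per_day = defaultdict(Counter)
    for day, label in tagged:
        per_day[day][label] += 1
    return dict(per_day)


def series(per_day, label, start, end):
    """Daily counts of `label` from `start` to `end` inclusive, 0 on days without data."""
    n_days = (end - start).days + 1
    return [per_day.get(start + timedelta(days=i), Counter())[label]
            for i in range(n_days)]


tagged = [
    (date(2017, 7, 4), "positive"),
    (date(2017, 7, 4), "neutral"),
    (date(2017, 7, 6), "positive"),
]
per_day = daily_counts(tagged)
print(series(per_day, "positive", date(2017, 7, 4), date(2017, 7, 6)))
```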

The graph in Fig-2(a) shows that most tweets are either neutral or positive, and that negative tweets number approximately half of the positive ones, which suggests that most people are positive about the modified taxation system of India.

Fig-2(b) shows the daily counts for each emotion. Most tweets express trust, followed by sadness, anticipation, surprise and fear, in that order.

Both graphs show a spike on 17th July, indicating a sudden jump in negativity and sadness in public emotion, which rebounds to more positive and trust-tagged tweets from the next day.

User | Description | No. of Mentions
@NARENDRAMODI | Mr. Narendra Modi, PM of India | 14,160
@ARUNJAITLEY | Mr. Arun Jaitley, FM of India | 8,675
@PMOINDIA | Office of PM | 3,661
@PIYUSHGOYAL | Mr. Piyush Goyal, Union Minister | 3,641
@INCINDIA | Indian National Congress | 3,183
@FINMININDIA | Ministry of Finance | 3,176
@OFFICEOFRG | Rahul Gandhi, President, INC | 3,086
@ASKGST_GOI | GOI official Handle for GST Queries | 3,004
@BJP4INDIA | Bharatiya Janata Party | 2,916
@ADHIA03 | Dr Hasmukh Adhia, Fin. Sec., GOI | 2,815
@SAPINDIA | SAP India Pvt. Ltd. | 2,194
@ARVINDKEJRIWAL | Mr. Arvind Kejriwal, CM, Delhi | 2,064
@SANJAYAZADSLN | Mr. Sanjay Singh, Rajya Sabha MP, AAP | 1,873
@GST_COUNCIL | GST Council | 1,808
@AMITSHAH | Mr. Amit Shah, President of BJP | 1,656
@PREETISMENON | Ms. Preeti Menon, Member, AAP | 1,652
@AJAYMAKEN | Mr. Ajay Maken, President Delhi Congress | 1,628
@MSISODIA | Mr. Manish Sisodia, Deputy CM of Delhi | 1,601
Total Mentions | | 62,793
Table 2: Number of Mentions Per User (July 2017)
(a) Hashtag Word Cloud
(b) Mention @user Word Cloud
Figure 3: Showing the Word Cloud for Top 40 Hashtags and Top 40 Mentions
(a) Sentiments on Mr. Modi Tweets
(b) Emotions on Mr. Modi Tweets
(c) Sentiments on Mr. Arun Jaitley Tweets
(d) Emotions on Mr. Arun Jaitley Tweets
Figure 4: Showing the Variation of Sentiments and Emotions On Tweets Addressed To Mr. Narendra Modi and Mr. Arun Jaitley
Location | No. of Tweets | Location | No. of Tweets
Delhi-NCR | 10919 | Maharashtra | 8437
Karnataka | 3583 | Gujarat | 3146
Tamil Nadu | 3067 | Rajasthan | 2548
Uttar Pradesh | 2586 | Madhya Pradesh | 2552
Haryana | 1047 | Punjab | 1225
West Bengal | 1204 | Chhattisgarh | 649
Bihar | 590 | Jammu & Kashmir | 520
Uttarakhand | 361 | Orissa | 371
Jharkhand | 362 | Kerala | 303
Goa | 167 | Assam | 165
Telangana | 95 | Andhra Pradesh | 243
Himachal Pradesh | 108
India (State Not Mentioned) | 11,524
India (Total Tweets) | 55,773
Foreign | 2840
Total Tweets | 58,613
Table 3: Location-Wise Number of Tweets

We also analysed the tweets on an hourly basis by dividing each day into 24 one-hour slots, where the 1st slot starts at 00:00:00 Hrs and ends at 00:59:59 Hrs, and the 24th slot starts at 23:00:00 Hrs and ends at 23:59:59 Hrs. We summed the count of tweets in each slot over the 22-day period to find the peak time of day, when the most people were tweeting. The graph in Fig-2(c) shows that most tweets were posted between 07:00:00 Hrs and 19:00:00 Hrs, and that the peak slot is from 17:00:00 Hrs to 18:00:00 Hrs.
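A sketch of the hourly slotting is given below. It assumes timestamps arrive in the Twitter v1.1 `created_at` format (UTC, e.g. `Wed Jul 05 12:34:56 +0000 2017`) and are converted to IST (UTC+05:30) before bucketing, since the peak hour is reported in IST; `hourly_histogram` and `peak_slot` are illustrative names.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))
CREATED_AT = "%a %b %d %H:%M:%S %z %Y"  # e.g. "Wed Jul 05 12:34:56 +0000 2017"


def hourly_histogram(created_at_values):
    """Count tweets in 24 one-hour IST slots; index 0 covers 00:00:00-00:59:59."""
    slots = [0] * 24
    for raw in created_at_values:
        local = datetime.strptime(raw, CREATED_AT).astimezone(IST)
        slots[local.hour] += 1
    return slots


def peak_slot(slots):
    """Human-readable label of the busiest slot (earliest one on ties)."""
    hour = max(range(24), key=lambda h: slots[h])
    return f"{hour:02d}:00:00-{hour:02d}:59:59 Hrs"


stamps = [
    "Wed Jul 05 11:45:00 +0000 2017",  # 17:15 IST
    "Wed Jul 05 12:10:00 +0000 2017",  # 17:40 IST
    "Thu Jul 06 02:00:00 +0000 2017",  # 07:30 IST
]
slots = hourly_histogram(stamps)
print(peak_slot(slots))
```

Because the slots are summed across all days, the histogram deliberately ignores the date; combining it with the per-day counters would give a day-by-hour matrix instead.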

We also plotted a word cloud of hashtags; most of the hashtags convey that the taxation system has been simplified and will boost the Indian economy. The hashtag word cloud in Fig-3(a) shows that the top hashtags in the GST data are #GST, #GSTFORCOMMONMAN, #GSTBOOSTFORBIZ, #GSTFORNEWINDIA, #GSTSIMPLIFIED, #ONENATIONONETAX and #INDIAWELCOMESGST, which suggests that the public is expressing trust and positive sentiment towards the modified tax policy of India.

We also extracted the user mentions (@user) from the tweets and calculated their counts over the entire dataset, as listed in Table 2. This analysis identifies the personalities addressed by the largest number of people. We also plotted a word cloud of the top forty mentions in Fig-3(b) to make the top mentions during the event easy to visualize. From this analysis we found that more than 15,400 users were mentioned 2,77,200 times in total in the collected GST Twitter dataset. The Prime Minister of India, Mr. Narendra Modi, was mentioned the highest number of times (14,160), followed by the Finance Minister of India, Mr. Arun Jaitley. We have not shown the data for the remaining 15,390 users, as each of them was mentioned fewer times than the users listed in Table 2.
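Hashtag and mention counts of this kind can be obtained with a regular expression and a `Counter`, as in the sketch below. It assumes counting is done on the raw tweet text (preprocessing removes hashtags and mentions), counts every occurrence, and upper-cases entities to match the style of Table 2; the helper name `top_entities` is hypothetical, and `n=40` mirrors the top-forty word clouds.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def top_entities(tweets, pattern, n=40):
    """Count every hashtag or @mention occurrence (upper-cased) and return the n most common."""
    counts = Counter(match.upper() for text in tweets for match in pattern.findall(text))
    return counts.most_common(n)


tweets = [
    "@narendramodi #GST is here",
    "Thanks @NarendraModi and @arunjaitley #GST #OneNationOneTax",
    "RT @narendramodi: #GSTForCommonMan",
]
print(top_entities(tweets, MENTION_RE, n=2))
print(top_entities(tweets, HASHTAG_RE, n=1))
```

Upper-casing merges "@narendramodi" and "@NarendraModi" into one entity, which matters because Twitter handles are case-insensitive.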

We plotted the sentiment and emotion graphs for the top two leaders, Mr. Narendra Modi and Mr. Arun Jaitley (Fig-4). All the graphs show that tweets mentioning these two personalities are highly positive in sentiment, with trust as the dominant emotion.

3.3.2 Spatial Analysis

In the spatial analysis, we mined the data to extract the locations of tweets, where available, and analysed the count and sentiment of tweets for the cities with the most tweets.

We also extracted the count and sentiment of tweets posted from abroad, which shows that people outside India also took an interest in this process and presented their views on the new tax regime. We assume that most of these tweets were posted by Non-Resident Indians (NRIs) concerned about their country.

Table 3 shows the state-wise tweet counts: the maximum number of tweets was posted from the Delhi-NCR region around the national capital of India. Approximately 80000 tweets carried some location information, of which approximately 59,000 mentioned a real location.

Tweets from users who are in India but did not mention any state or city are shown under a separate tag, INDIA.
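The matching of free-text profile locations is not described in detail, so the sketch below shows only one plausible approach under stated assumptions. Locations are tokenized into lowercase words and matched against a small, hypothetical gazetteer, with a state or city match taking precedence over a bare "India", and a separate, equally hypothetical keyword set standing in for foreign locations. Unmatched strings are discarded, analogous to the drop from roughly 80000 location-bearing tweets to about 59,000 with a real location.

```python
import re
from collections import Counter

# Hypothetical gazetteer: keyword sets are illustrative, not the ones actually used.
STATE_KEYWORDS = {
    "Delhi-NCR": {"delhi", "gurgaon", "gurugram", "noida"},
    "Maharashtra": {"mumbai", "pune", "maharashtra"},
    "Karnataka": {"bangalore", "bengaluru", "karnataka"},
}
INDIA_KEYWORDS = {"india", "bharat"}
FOREIGN_KEYWORDS = {"london", "dubai", "singapore", "usa"}


def resolve_location(raw):
    """Map a free-text profile location to a state, 'INDIA', 'Foreign', or None."""
    # Whole-word matching avoids false hits such as "usa" inside "Busan".
    words = set(re.findall(r"[a-z]+", (raw or "").lower()))
    for state, keywords in STATE_KEYWORDS.items():
        if words & keywords:
            return state          # a named state or city takes precedence
    if words & INDIA_KEYWORDS:
        return "INDIA"            # in India, but no state or city given
    if words & FOREIGN_KEYWORDS:
        return "Foreign"
    return None                   # not a recognisable real location


def location_counts(raw_locations):
    """Tally resolved locations, discarding unrecognised ones."""
    return Counter(loc for loc in map(resolve_location, raw_locations) if loc)


print(location_counts(["Mumbai", "Pune, India", "New Delhi", "India", "London, UK", "Mars"]))
```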

Table 3 shows that most tweets were posted from metropolitan areas. This could be due to several reasons: people elsewhere may not be interested in the event or may not express their views on social media, or people in other cities and towns may have limited access to social media or the Internet, or little information about the event, and are therefore unable to give their take on the issue. Of the 58,613 tweets, more than 70% come from places such as Delhi-NCR, Mumbai, Bangalore, Chennai, Gujarat, Hyderabad and Pune, which shows that social media is dominated by a small group of people, and hence the views and opinions on social media are generated by a small section of society. A major challenge in analysing Twitter, or any other social media platform, is to obtain the true sentiment, i.e. to know whether these groups represent the views of the whole country.

4 Concluding Remarks and Future Research

In this paper we analysed GST Twitter data on the basis of time and location. We performed temporal analysis to show the varying sentiments and emotions of the public after the implementation of GST in India. The analysis was performed on post-GST data, which we started collecting three days after implementation and continued collecting for twenty-two days. We found that most tweets express positivity and trust. We also found that most people address the PM of India, Mr. Narendra Modi, and the Finance Minister of India, Mr. Arun Jaitley, in their tweets, expressing positive sentiment and trust. Our temporal analysis also identified the peak hour of the day, when most people express their views on Twitter, as 17:00:00 Hrs to 18:00:00 Hrs IST.

In the spatial (location-based) analysis, we observed that more than 45% of the tweets were posted from Delhi-NCR and Mumbai. The top six states, as seen in Table 3, account for approximately 70% of the tweets.

Overall, the data analysed comes from a small, comparatively more developed part of the country. This might be due to many reasons: other people may not be interested in expressing their views, may not have enough knowledge about the event, or may lack Internet connectivity.

In this research we have assumed that all users are genuine. To refine this research further, in future work we will try to identify and remove redundant data and data from fake users.


References

  • [1] Kharde, Vishal, and Prof Sonawane. "Sentiment analysis of twitter data: a survey of techniques." arXiv preprint arXiv:1601.06971 (2016).
  • [2] Medhat, Walaa, Ahmed Hassan, and Hoda Korashy. "Sentiment analysis algorithms and applications: A survey." Ain Shams Engineering Journal 5, no. 4 (2014): 1093-1113.
  • [3] Agarwal, A. and Toshniwal, D., 2019. "SmPFT: Social media based profile fusion technique for data enrichment." Computer Networks, 158, pp.123-131.
  • [4] Agarwal, A. and Toshniwal, D., 2019. "Face off: Travel habits, Road conditions and Traffic city characteristics bared using Twitter." IEEE Access.
  • [5] Tumasjan, Andranik, Timm Oliver Sprenger, Philipp G. Sandner, and Isabell M. Welpe. "Predicting elections with twitter: What 140 characters reveal about political sentiment." ICWSM 10, no. 1 (2010): 178-185.
  • [6] Shu, K., Sliva, A., Wang, S., Tang, J. and Liu, H., 2017. "Fake news detection on social media: A data mining perspective." ACM SIGKDD Explorations Newsletter, 19(1), pp.22-36.
  • [7] Krishnan, S. and Chen, M., 2018, July. "Identifying Tweets with Fake News." In 2018 IEEE International Conference on Information Reuse and Integration (IRI) (pp. 460-464). IEEE.
  • [8] Wang, H., Can, D., Kazemzadeh, A., Bar, F. and Narayanan, S., 2012, July. "A system for real-time twitter sentiment analysis of 2012 us presidential election cycle." In Proceedings of the ACL 2012 System Demonstrations (pp. 115-120). Association for Computational Linguistics.
  • [9] Das, S. and Kolya, A.K., 2017, November. "Sense GST: Text mining & sentiment analysis of GST tweets by Naive Bayes algorithm." In Research in Computational Intelligence and Communication Networks (ICRCICN), 2017 Third International Conference on (pp. 239-244). IEEE.
  • [10] Ganguly, M. and Roy, S., 2018, January. "A social network analysis of opinions on GST in India within Twitter." In Proceedings of the Workshop Program of the 19th International Conference on Distributed Computing and Networking (p. 18). ACM.
  • [11] Mane, S.B., Sawant, Y., Kazi, S. and Shinde, V., 2014. "Real time sentiment analysis of twitter data using hadoop." International Journal of Computer Science and Information Technologies (IJCSIT), 5(3), pp.3098-3100.
  • [12] Pagolu, V.S., Reddy, K.N., Panda, G. and Majhi, B., 2016, October. "Sentiment analysis of Twitter data for predicting stock market movements." In Signal Processing, Communication, Power and Embedded System (SCOPES), 2016 International Conference on (pp. 1345-1350). IEEE.
  • [13] Agarwal, A. and Toshniwal, D., 2018, June. "Application of Lexicon Based Approach in Sentiment Analysis for short Tweets." In 2018 International Conference on Advances in Computing and Communication Engineering (ICACCE) (pp. 189-193). IEEE.
  • [14] Agarwal, A., Singh, R. and Toshniwal, D., 2018. "Geospatial sentiment analysis using twitter data for UK-EU referendum." Journal of Information and Optimization Sciences, 39(1), pp.303-317.
  • [15] Agarwal, A., Gupta, B., Bhatt, G. and Mittal, A., 2015, December. "Construction of a Semi-Automated model for FAQ Retrieval via Short Message Service." In Proceedings of the 7th Forum for Information Retrieval Evaluation (pp. 35-38). ACM.
  • [16] Agarwal, Amit, Brijraj Singh, Jatin Bedi, and Durga Toshniwal. "A Datamining Approach for Emotions Extraction and Discovering Cricketers performance from Stadium to Sensex." arXiv preprint arXiv:1809.00310 (2018).