Characterizing Long-Running Political Phenomena on Social Media

by   Emre Calisir, et al.

Social media provides many opportunities to monitor and evaluate political phenomena such as referendums and elections. In this study, we propose a set of approaches to analyze long-running political events on social media with a real-world experiment: the debate about Brexit, i.e., the process through which the United Kingdom activated the option of leaving the European Union. We address the following research questions: Could Twitter-based stance classification be used to demonstrate public stance with respect to political events? What is the most efficient and comprehensive approach to measuring the impact of politicians on social media? Which of the polarized sides of the debate is more responsive to politician messages and the main issues of the Brexit process? What is the share of bot accounts in the Brexit discussion and which side are they for? By combining the user stance classification, topic discovery, sentiment analysis, and bot detection, we show that it is possible to obtain useful insights about political phenomena from social media data. We are able to detect relevant topics in the discussions, such as the demand for a new referendum, and to understand the position of social media users with respect to the different topics in the debate. Our comparative and temporal analysis of political accounts can detect the critical periods of the Brexit process and the impact they have on the debate.





1 Introduction

Social media provides many opportunities to monitor and evaluate political phenomena such as referendums and elections. Citizens from all around the world, voters, politicians, and private and public authorities participate in and contribute to debates on social media platforms with tremendous interest. According to a survey, 66% of social media users have employed these platforms to post their thoughts about civic and political issues, react to others’ postings, press friends to act on issues and vote, follow candidates, like and link to others’ content, and belong to groups formed on social networking sites [1]. In this context, Twitter is known as one of the most convenient social media platforms, with prominent features including hashtag-based information annotation and retrieval, mention-based referencing of people, and agreement with opinions expressed through retweets and likes. Segesten and Bossetta found that citizens, not political parties, were the primary initiators and sharers of political calls for action leading up to the 2015 British General Elections [2].

The political issue investigated in this study is one of the most important political events of recent times: the process of the United Kingdom’s exit from the European Union (EU), informally named Brexit. On 23 June 2016, the United Kingdom voted to leave the EU, with 51.9% voting Leave and 48.1% voting Remain. However, the local and global impacts of the referendum have kept the issue a highly active and long-standing discussion well beyond the referendum itself, as seen in the continuity of the Google search trend (Fig. 1). Another indicator of the constant interest in the subject is the continuing discussion on Twitter, whose trend is highly correlated with the Google search trend on the topic (Pearson correlation of 0.92). This result makes Twitter a convenient data source for analyzing various aspects of the Brexit phenomenon.

In this study, we aim to address the following research questions:

Question 1: Can we determine the political stance of Twitter users with respect to Brexit based on the content they share? Can we analyze how stance evolves in time?

Question 2: What are the main discussion topics, what is the general sensitivity on these issues and which polarized side reacts to the different issues?

Question 3: Which politicians have been discussed most, what is the general sensitivity with respect to these politicians and which polarized side is more responsive to them?

Question 4: What is the impact of automated bot accounts to the online discussions, and to which side are they aligned most?

The rest of the paper is organized as follows: We first present the primary studies in our research focus. Then we give detailed information about our collected data and findings including user demographics and public interest to tweets. We present our two-fold stance classification approach and experiment on the Brexit referendum. Next, we analyze the Twitter accounts regarding their bot behavior, and then we interpret the stance of bot accounts for the Brexit experiment. In topic analysis part, we share the results of topic discovery implementation concerning the public attitude and sentiment of discovered topics. Additionally, we share our findings of engagement of social media users with the politicians, and we analyze the public reaction to these accounts over time. We conclude our work by providing the technical details of our implementation, and with the detailed tables in the appendix section.

2 Related Work

2.1 Social Media and Politics

Social media has an essential role in terms of sharing information during political happenings. In a study related to German federal elections, the authors found that Twitter is used intensively for political deliberation [3]. In another study related to 2013 Italian elections, Vaccari et al. demonstrated that the political deliberation on social media also makes people more conscious and active on the political news [4]. In a recent study on the Brexit referendum, it is argued that social media data could be used to elucidate the underlying themes/concerns of the political discourse [5]. The intense use of social media in politics makes this platform a vast source of information for understanding various aspects of human behavior and political facts.

Figure 1: After the referendum, public interest in the Brexit debate continues with the same trend in terms of Twitter posts and Google searches (correlation 0.92)

2.2 Stance Classification

In recent years, researchers have shown great interest in estimating public opinion about political phenomena through social media data. Even though some studies [6] argue that social media cannot be used as a source for electoral predictions in general, several studies have achieved notable results. Identifying the users who are in favor of, against, or neutral towards a target is known as stance classification. The target of the stance analysis may be a person, an organization, a government policy, a movement, a product, and so on. Stance classification is often confused with sentiment detection. According to [7], while in sentiment analysis the goal is to extract the sentiment from a piece of text, in stance classification the purpose is to determine favorability toward a given (pre-chosen) target of interest. The examples in Table 1 show the difference: tweets may have the same stance but opposite sentiment.

Tweets have by nature a concise text structure, which makes the stance classification task more challenging. To overcome this obstacle, many studies have focused on different steps of the machine learning pipeline. For the data annotation part of the supervised learning task, manual [8, 9, 10] or automatic [11] methods have been used. There also exist studies presenting richer datasets in order to define a gold standard [12]. Specifically for Twitter, various feature engineering techniques have been implemented, such as lexical (n-grams), word-embedding [7], syntactic (sentiment, grammatical) [13, 3], meta-data (retweet count, follower count, mentions), network-specific (retweet-based propagation) [14], and argumentative analysis (argumentativeness, source type) [8]. As for machine learning algorithms, authors have achieved successful results with Naive Bayes [8, 14], Support Vector Machines (SVM) [7, 8, 9], Decision Trees, Recurrent Neural Networks (RNN) [15], and a combination of an RNN with long short-term memory (LSTM) and a target-specific attention extractor [16].

As a complementary step of stance classification, some studies also have applied an age-adjustment since Twitter users do not represent the demographics of voters genuinely. In a recent study, Grcar et al. argue that the age correction changed their prediction outcome from Remain to Leave, by achieving a very close ratio compared to referendum outcome [9]. In another study, Lopez et al. achieved 71% correlation for Leave and 65% for Remain without applying any age adjustment [11].

Tweet | Stance | Sentiment
I voted #Remain in the #Referendum I love my #European brothers and sisters | ProRemain | Positive
#Brexit consequences just seem to get worse and worse | ProRemain | Negative
Congratulations, Great Britain on #Brexit Independence day Enjoy | ProLeave | Positive
I voted leave because I dont think the #EU works i dont see anything to suggest that it ever will #Brexit | ProLeave | Negative

Table 1: Stance and sentiment analysis are separate tasks; expressions of a specific stance may have opposite sentiment.

2.3 Role of Automated Accounts (Bots) in Elections and Referendums

While social media is a platform made for the use of people, it is also known that a large share of accounts are automated generators of posts and other activities on social networks. These accounts are often referred to as bots. One type is the political social media bot, which specializes in political issues and is particularly active in public policy debates, elections, and polarized political discussions [17]. Their presence in online political discussions can be harmful in many ways. Ratkiewicz et al.’s work on the 2010 US Midterm elections and Metaxas et al.’s work on the 2010 Massachusetts special election showed that political bots might artificially inflate support for a political candidate [6, 18]. In a recent study on the 2016 US Elections, Bessi and Ferrara found that bot accounts generated about one-fifth of the entire conversation, and that their presence negatively affected democratic political discussion rather than improving it, which in turn could potentially alter public opinion and endanger the integrity of the Presidential elections [19]. Similarly, in the case of the Brexit referendum, political bots profoundly dominated Twitter in spreading information supporting the idea of leaving the EU, and they generated almost one-third of all content [17]. Again for the Brexit referendum, Bastos and Mercea uncovered a bot network comprising 13,493 accounts that massively retweeted user-generated hyperpartisan news and then disappeared from Twitter shortly after the day of the referendum [20]. These studies indicate that political bots play an active role in political phenomena and that their presence may have negative impacts on voting results and public opinion.

2.4 Topic Discovery

With the large number of people participating in online social discussions, it becomes challenging to track the discussed topics. For this reason, applying automatic topic discovery methods can be an efficient way to explore the focus of the discussion. Chinnov et al. summarize the challenges of dealing with short social media texts in topic discovery [21]. As a specific approach to these problems, Hong and Davison follow an aggregation strategy to increase the amount of short text content available for training the topic models [22]. As an example of topic discovery in the political science domain, the authors of [23] studied the US presidential elections and the Brexit referendum by creating a general framework based on latent topic models and user features. As a baseline for their topic discovery method, they used the algorithm suggested by Zhao et al. [24], which is an adaptation of the Latent Dirichlet Allocation (LDA) model. In another study [25], the authors examined how social and political topics are related to the South Korean presidential elections of 2012 using a two-fold method: first, they implemented a temporal LDA to analyze and validate the relationship between topics, and then they developed a term co-occurrence retrieval technique to compensate for LDA’s limitations.

3 Data Collection and Analysis

In our study, we queried for the tweets containing the keyword Brexit posted between January 2016 and October 2018. Although the meaning of Brexit is UK’s exit from the EU, the neutrality of this term has been proven by empirical studies [11]. By using Twitter’s API, we collected 10 million tweets sent by 1.5 million users in different languages. As shown in Fig.2, more than half of the users participated in the discussion only once.

Figure 2: Tweet post frequency of users related to Brexit
Figure 3: User demographics and tweet meta-data analysis: (a) users by ethnicity, (b) users by gender, (c) users by age, (d) tweets by language, (e) tweets by country, (f) impact of tweets based on number of retweets and likes

3.1 User Demographics, Spatial Analysis

Social media messages may contain additional attributes that provide demographic and location information about users. In our approach to demographic analysis, we relied on the profile photos of Twitter users. Taking into account Jung et al.’s experiments, we analyzed profile images through face detection and recognition in order to find the age, gender and ethnicity of users with a single face in the profile photo [26]. According to our analysis, 30% of the user base has a single face in the profile photo, and we were able to make demographic inferences for that user base. Our results showed that users of every ethnic background share their opinions on the Brexit process (see Fig. 3(a)). On the other hand, the percentage of male users is slightly higher than the Twitter average (see Fig. 3(b); source for the Twitter average: Statista).

Surprisingly, we found that young people are less interested in the Brexit debate. Although 37% of Twitter users are under 18 years old according to the latest statistics (source: Omnicore Agency), this ratio is only 15% in our database (see Fig. 3(c)). This result is important because in some of the Brexit-related stance classification studies [9], the authors performed age adjustments on their prediction results, claiming that Twitter users are much younger than English voters. However, our result shows that the participants in the Brexit debate on Twitter do not represent general Twitter users.

In our language and spatial analysis, we found that 81% of tweets are written in English (Fig. 3(d)), and 45% of tweets are posted from the United Kingdom (Fig. 3(e)). In the stance classification and topic discovery analyses, where the textual content is the main feature, we only use the tweets written in English.

3.2 Tweet and User Meta-Data Analysis

In this section, we report insights from our meta-data analysis of Twitter users and their posts. The first finding is that the average number of followers of Twitter users participating in the Brexit discussions is six times higher than that of the average Twitter user (source for the Twitter average: DMR Business Statistics), which suggests that the audience discussing Brexit is composed of highly influential people.

Stance | Characterizing Hashtags
Remain | #strongerin, #voteremain, #intogether, #labourinforbritain, #moreincommon, #greenerin, #catsagainstbrexit, #bremain, #betteroffin, #leadnotleave, #remain, #stay, #ukineu, #votein, #voteyes, #yes2eu, #yestoeu, #sayyes2europe, #fbpe, #stopbrexit, #stopbrexitsavebritain
Leave | #leaveeuofficial, #leaveeu, #leave, #labourleave, #votetoleave, #voteleave, #takebackcontrol, #ivotedleave, #beleave, #betteroffout, #britainout, #nottip, #takecontrol, #voteno, #voteout, #voteleaveeu, #leavers, #vote_leave, #leavetheeu, #voteleavetakecontrol, #votedleave
Ambiguous | #euref, #eureferendum, #eu, #uk
Table 2: Stance-Indicative (SI) and Stance-Ambiguous (SA) Hashtags

Our second finding shows that Twitter users have become more interested in Brexit-related content over time, even more than on the day of the referendum. Figure 3(f) illustrates the increase in the number of retweets and likes per tweet over time.

4 Brexit Stance Classification

In stance classification, we aim to find users with a proRemain or proLeave stance and analyze their participation in the Brexit discussions. Some studies [9, 11] considered the presence of stance-indicative (SI) hashtags an effective way to discover polarized tweets and users. The disadvantage of this method is that it cannot evaluate tweets that do not contain SI hashtags, and these typically make up a substantial share of tweets. The solution we propose is to divide our dataset into two subsets: the tweets that contain SI hashtags and the ones that do not. We then classify the tweets with SI hashtags using a rule-based method, and the remaining tweets using machine learning methods. In our dataset, only 8% of the tweets contain SI hashtags; our approach allows us to analyze the remaining 92% as well. After classifying each tweet as proRemain, proLeave or non-polarized, we determine each user’s stance by looking at the number of tweets in each class.

4.1 Rule-based Classification

Hashtags are commonly used by Twitter users to express their stance on a political phenomenon. According to our analysis, between January 2016 and September 2018, more than 600 thousand unique hashtags were used together with the Brexit hashtag. As shown in Table 2, we created a list of stance-indicative (SI) and stance-ambiguous (SA) hashtags by finding the most commonly used hashtags and considering the findings of other Brexit-related studies. We then classified the tweets according to the following rules. The stance of a tweet is:

  • ProRemain (PRT), if it contains at least one Remain, but not any Leave related hashtag,

  • ProLeave (PRL), if it contains at least one Leave, but not any Remain hashtag,

  • Non-polarized for all other cases.
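The three rules above can be sketched as a small function. This is a minimal illustration, not the authors' implementation: the hashtag sets are abbreviated subsets of Table 2, and the names `REMAIN_TAGS`, `LEAVE_TAGS` and `classify_tweet` are hypothetical.

```python
import re

# Abbreviated, illustrative subsets of the Table 2 lists (assumption: the
# full lists would be loaded in practice).
REMAIN_TAGS = {"#strongerin", "#voteremain", "#remain", "#stopbrexit", "#fbpe"}
LEAVE_TAGS = {"#leaveeu", "#voteleave", "#takebackcontrol", "#leave", "#betteroffout"}

HASHTAG_RE = re.compile(r"#\w+")


def classify_tweet(text):
    """Return 'proRemain', 'proLeave' or 'non-polarized' for one tweet."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    has_remain = bool(tags & REMAIN_TAGS)
    has_leave = bool(tags & LEAVE_TAGS)
    if has_remain and not has_leave:
        return "proRemain"
    if has_leave and not has_remain:
        return "proLeave"
    # No SI hashtag, or hashtags from both sides.
    return "non-polarized"
```

Matching is case-insensitive here, which is a design choice of this sketch; a tweet carrying hashtags from both sides falls into the non-polarized class, as the third rule requires.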

Then, to calculate the user stance, we applied the following formula considering all tweets of the user in our database.

In our comparative approach, we only take into account proLeave and proRemain users, and we obtain the ratio of a class by dividing its count by the sum of the two classes. As a result, we found that the number of proRemain users is relatively higher than the number of proLeave users (see Table 3). However, this method classified 92% of tweets as non-polarized because they do not contain SI hashtags. To our knowledge, Twitter has become the primary place for online social discussions on the Brexit referendum, so there should be a higher number of active polarized users on Twitter. Therefore, we developed the following complementary method, which uses machine learning techniques for stance classification of the tweets not featuring SI hashtags.
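The user-level step and the class ratio can be illustrated as follows. The exact user stance formula is not reproduced in this text, so the majority rule in `user_stance` is an assumption based on the statement that a user's stance is determined from the number of tweets in each class; `class_ratios` follows the ratio definition given above. Both function names are hypothetical.

```python
from collections import Counter


def user_stance(tweet_labels):
    """Assumed majority rule over a user's tweet labels (not the paper's formula)."""
    counts = Counter(tweet_labels)
    remain, leave = counts["proRemain"], counts["proLeave"]
    if remain > leave:
        return "proRemain"
    if leave > remain:
        return "proLeave"
    return "non-polarized"


def class_ratios(n_remain, n_leave):
    """Ratio of each class over the sum of the two polarized classes."""
    total = n_remain + n_leave
    return n_remain / total, n_leave / total


# Rule-based user counts from Table 3: 62K proRemain and 38K proLeave users.
remain_ratio, leave_ratio = class_ratios(62_000, 38_000)
```

With the rule-based user counts of Table 3, the two ratios sum to one and the proRemain share is the larger of the two.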

4.2 Machine Learning (ML) Based Classification

In this task, we focused only on the tweets labeled as non-polarized by the previous method. To prepare the training and development sets for our learning-based classifier, a subject expert involved in our study prepared three sets of 1000 tweets, one for each class: proRemain, proLeave and non-polarized. In terms of feature engineering, we normalized the tweets with a Twitter-specific tokenizer and then transformed them into n-grams (unigrams, bigrams and trigrams). We tested various classification algorithms and obtained the best results with Support Vector Machines with a linear kernel. In a recent shared task on stance classification, Mohammad et al. obtained the highest score with a machine learning model similar to ours [7].
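The feature extraction step described above can be sketched in plain Python. The tokenizer below is a simplified stand-in for a Twitter-specific tokenizer (the paper does not name the one it used), and the patterns `URL_RE` and `TOKEN_RE` are assumptions; the resulting n-gram lists would then be vectorized and passed to a linear SVM.

```python
import re

# Assumed normalization: lowercase, drop URLs, keep hashtags and mentions
# as single tokens.
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Simplified tweet tokenizer (stand-in for a Twitter-specific one)."""
    text = URL_RE.sub(" ", text.lower())
    return TOKEN_RE.findall(text)


def ngrams(tokens, max_n=3):
    """Return all unigrams, bigrams and trigrams as space-joined strings."""
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


features = ngrams(tokenize("Stop #Brexit now https://t.co/x"))
```

Keeping hashtags and mentions intact matters for this task because they often carry the stance signal that plain words lack.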

As a result of the 10-fold cross-validation, the weighted average F1 and AUC scores reached 0.71 and 0.80, respectively. By predicting the tweets using this model, we obtained 2.1 million proRemain tweets and 1.8 million proLeave tweets. Then, to validate the classification task, a subject expert evaluated the predicted labels on a randomly selected subset of the data. As a result, we found that the model’s variance is less than 5% for both classes.

This method allowed us to detect a significant number of polarized tweets. In the final step, we obtained a complete tweet set of about 2.56 million proRemain and 2.05 million proLeave tweets by combining the results of the rule-based and machine learning-based methods (see Table 3). Over this dataset, we applied the user stance evaluation, and we found that 432,000 users are proRemain and 309,000 users are proLeave.

4.3 Analysis of Changes of Users’ Stance in time

Besides the static classification of users’ stance, we also analyzed the change in stance from two perspectives. In our first approach, we compared the users’ pre and post-referendum tweets, and we found that the number of users who change their stance is significantly higher in the proLeave side (62%) than the proRemain side (33%) (Fig. 4).

Figure 4: When we compare the users before and after the referendum, we see that the change of stance is mostly from proLeave to proRemain.

In our second approach, we analyzed monthly changes in the stance of users. By calculating a single stance value for each user from their monthly tweets, we visualized the increases and decreases in participation in the debate from each side (Fig. 5). Our result is consistent with the referendum outcome, with 51% proLeave and 49% proRemain users. Furthermore, our results show that the percentage of proRemain users has varied between 60% and 70% over the past two years.
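A minimal sketch of this monthly aggregation is given below. It assumes tweets arrive as `(user_id, month, label)` tuples and that a user's monthly stance is the majority label among their polarized tweets in that month; the paper does not spell out this rule, so both the input format and the tie handling are assumptions.

```python
from collections import Counter, defaultdict


def monthly_remain_share(tweets):
    """Share of proRemain users per month among users with a single stance.

    tweets: iterable of (user_id, 'YYYY-MM', label) tuples.
    """
    per_user_month = defaultdict(Counter)
    for user, month, label in tweets:
        if label in ("proRemain", "proLeave"):
            per_user_month[(month, user)][label] += 1

    users_per_month = defaultdict(Counter)
    for (month, _user), counts in per_user_month.items():
        if counts["proRemain"] == counts["proLeave"]:
            continue  # tie: no single stance for this user in this month
        if counts["proRemain"] > counts["proLeave"]:
            users_per_month[month]["proRemain"] += 1
        else:
            users_per_month[month]["proLeave"] += 1

    return {
        month: c["proRemain"] / (c["proRemain"] + c["proLeave"])
        for month, c in users_per_month.items()
    }
```

Plotting the returned shares over consecutive months gives a curve of the kind the text describes for Fig. 5.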

Method | Type | Remain | Leave
Rule-based (RB) | Tweets | 462K | 254K
Rule-based (RB) | Users | 62K | 38K
Machine Learning based (MLB) | Tweets | 2.1M | 1.8M
Machine Learning based (MLB) | Users | 408K | 296K
Merged (RB + MLB) | Tweets | 2.56M | 2.05M
Merged (RB + MLB) | Users | 432K | 309K
Table 3: Two-fold approach to classify the user stance

5 Impact of Bots on Online Social Debate and Overall Stance

As we described in the Related Work section, various studies show the relevance of political bot accounts during elections and referendums. In a recent article [28], the author states that the computational propaganda powered by political bots takes many forms: networks of highly automated Twitter accounts; fake users on Facebook, YouTube, and Instagram; chatbots on Tinder, Snapchat, and Reddit. These bot accounts follow different strategies to mimic human users, making it difficult for social media providers to identify them. In our Brexit experiment, we found that many accounts have been deactivated or suspended. On the other hand, we found that many Twitter bot accounts are still alive. To identify Twitter bot accounts, we relied on the state-of-the-art bot detector Botometer, which assigns a bot score to a Twitter account in the range (0,1) describing how likely it is to be an automated account, with 1 being the maximum probability [27]. As suggested by the author, we mark an account as a bot if its score is higher than 0.8. As a result of our analysis, we found that the percentage of bot accounts that are still alive on Twitter is 2.2%, and that their average post frequency was 25% higher than that of the non-bot accounts. Our result is in line with the statement of Howard and Kollanyi [17] that bot accounts were highly active in the Brexit debate.

By extending our findings one step further, we combined the bot scores with the results of user stance classification described in the previous section. Interestingly, our result shows that the higher the bot score, the more likely the account is in a proLeave position (See Fig.6).
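The combination of bot scores and user stance can be sketched as below. The 0.8 threshold comes from the text; the binning of scores into equal-width intervals and the function names are assumptions made for illustration, and the bot scores are taken as given rather than fetched from any API.

```python
BOT_THRESHOLD = 0.8  # threshold stated in the text


def is_bot(score):
    """An account is marked as a bot if its score is higher than the threshold."""
    return score > BOT_THRESHOLD


def leave_share_by_bin(accounts, n_bins=5):
    """Share of proLeave accounts in each equal-width bot-score bin.

    accounts: iterable of (bot_score, stance) with bot_score in [0, 1].
    Returns one value per bin, or None for bins without polarized accounts.
    """
    totals = [0] * n_bins
    leaves = [0] * n_bins
    for score, stance in accounts:
        if stance not in ("proRemain", "proLeave"):
            continue
        b = min(int(score * n_bins), n_bins - 1)
        totals[b] += 1
        if stance == "proLeave":
            leaves[b] += 1
    return [leaves[i] / totals[i] if totals[i] else None for i in range(n_bins)]
```

A proLeave share that rises from the lowest to the highest bin would correspond to the pattern reported for Fig. 6.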

Figure 5: Our daily analysis identifies a major change in stance after the referendum. In the following time, the rate of participation in the proRemain side was consistently higher than ProLeave.
Figure 6: The higher the bot score for a Twitter account, the more likely it is to be in the proLeave stance

6 Topic Discovery

We analyzed the topics of Brexit-related discussions on Twitter. Brexit is a long-term event in terms of its impact on society; therefore, a variety of topics have been discussed by Twitter users in the context of Brexit, including immigration, borders, and economic impacts. In our study, we used the Latent Dirichlet Allocation (LDA) algorithm to extract the topics [29]. One questionable aspect of applying LDA in our scenario is the shortness of the texts and the ambiguity of the data. To overcome this limitation, we applied a data selection strategy to eliminate the shortest and non-influential tweets. As a result, we executed the topic discovery algorithm on a dataset containing 306 thousand tweets posted between January 2016 and October 2018. We evaluated the LDA algorithm based on the coherence score and subject expert feedback. We did not use the perplexity score because perplexity and human judgment are often not correlated, and sometimes even slightly anti-correlated [30].

6.1 Full-period Topic Analysis

In our first experiment, we directly fed the LDA algorithm with the whole set of 306 thousand tweets posted between January 2016 and October 2018. Then, our subject expert assigned labels to the discovered topics based on their representative words, as shown in Table 4. However, we found that the quality of the topics is not high in terms of both the coherence score and the subject expert evaluation. This is mainly because the model is ineffective at finding time-varying topics when it operates over a long period, which makes it unable to find short-term but essential issues.

For this reason, we decided to reduce the time interval, and we did this systematically by examining the change in the participation to the topics on a monthly basis. As shown in Figure 7, we found that the percentages of change were higher in four specific times. Therefore, we applied the LDA algorithm separately with the tweets sent over these periods. In this way, we achieved more consistent and specific topics than our first experiment.
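One way to quantify the month-to-month change in topic participation is sketched below. The paper does not state the exact measure, so the total absolute difference in topic shares between consecutive months is an assumption, as are the function names and the input format.

```python
def topic_shares(counts):
    """Convert {topic: tweet_count} into {topic: share of the month's tweets}."""
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}


def monthly_change(monthly_counts):
    """Total absolute change in topic shares between consecutive months.

    monthly_counts: list of (month, {topic: tweet_count}) in chronological order.
    Returns a list of (month, change) starting from the second month.
    """
    changes = []
    previous = None
    for month, counts in monthly_counts:
        shares = topic_shares(counts)
        if previous is not None:
            topics = set(previous) | set(shares)
            delta = sum(abs(shares.get(t, 0.0) - previous.get(t, 0.0)) for t in topics)
            changes.append((month, delta))
        previous = shares
    return changes
```

Months with the largest values would be candidates for the boundaries of periods such as P4, over which LDA is then run separately.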

For instance, our experiment on time period P4 successfully discovered many key topics related to the cabinet, trade deals with the EU, new referendum expectations, the Scottish referendum and the Irish border (see Table 5). Other results are shown in the Appendix (Tables 8, 9, and 10).

Label | Top Representative Words
1. US-Russia | trump, farage, attack, threat, russia, boris, putin, usa
2. Europe | europe, trust, germany, maymustgo, france, cameron, merkel
3. Results | live, ask, watch, question, immigration, war, secretary, maga
4. Labour | tory, labour, party, scotland, corbyn, leader, political
5. Trade | deal, good, trade, bad, ita, free, dona, agreement, sell, offer
6. Inconsist. | explain, medium, consequence, message, send, suggest, letter
7. Vote | vote, leave, people, want, remain, british, government
8. Plan | theresa_may, conservative, plan, borisjohnson, deliver, fail
9. Inconsist. | britain, great, world, article, wrong, nation, damage
10. Economy | year, job, business, economy, warn, economic, impact, face
11. Inconsist. | take, govt, control, law, place, interest, protect, strong, bill
12. Remain | stay, customs_union, single_market, england, membership
13. Economy | nhs, pay, money, tax, fund, save, foreign, spend
14. Remain | stop, join, help, stand, thank, pro, fight, speak, march, lord
15. Remain | stopbrexit, fbpe, country, right, work, thing, remainer, finalsay
16. Leave | lie, campaign, ukip, euref, voteleave, leaveeu, blame, truth
17. Borders | hard, ireland, border, idea, problem, possible, irish, mess
18. Polarized | nigel_farage, feel, freedom, disaster, act, remainernow
19. Inconsist. | today, new, day, pm, post, talk, eu, look, future, read
20. Inconsist. | prime_minister, reality, everything, westminster, charge
Table 4: The experiment on the whole time period yields topics that are inconsistent and repetitive, and it fails to discover many key topics in the Brexit process.

6.2 Relations Among Topics, User Stance and Sentiment

Taking the results of the previous section one step further, we also examined which polarized side (proRemain or proLeave) shares each of the discovered topics, and what the general sentiment toward these topics is. Our aim is to generate statements such as: for the topic related to immigration, mostly the proRemain/proLeave users are tweeting, and the overall sentiment toward this topic is positive/negative. In this task, we used our stance classification results and a syntactic word-based sentiment detection approach.

In our findings, we included the comparison of proRemain/proLeave stances and positive/negative sentiments for each discovered topic (see Table 5). One of the most striking results is that 97% of the tweets on the New Referendum Request topic come from the proRemain side, with a negative sentiment. On the other hand, for the topic entitled cabinet, 73% of the tweets are posted by the proLeave side. 88% of the tweets related to the Irish border issue have a positive sentiment.
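The per-topic breakdown can be sketched as follows. The tiny word lists stand in for the sentiment lexicon, which the paper does not name, so `POSITIVE`, `NEGATIVE` and both function names are hypothetical; percentages are computed over polarized tweets for stance and over non-neutral tweets for sentiment, which is also an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"good", "great", "love", "win"}
NEGATIVE = {"bad", "worse", "lie", "disaster"}


def word_sentiment(tokens):
    """Word-based sentiment: count positive minus negative lexicon hits."""
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def topic_breakdown(rows):
    """rows: iterable of (topic, stance, sentiment) tuples, one per tweet."""
    stance = defaultdict(Counter)
    sentiment = defaultdict(Counter)
    for topic, st, se in rows:
        stance[topic][st] += 1
        sentiment[topic][se] += 1
    result = {}
    for topic in stance:
        s, e = stance[topic], sentiment[topic]
        polarized = s["proRemain"] + s["proLeave"]
        valenced = e["positive"] + e["negative"]
        result[topic] = {
            "proRemain_pct": 100 * s["proRemain"] / polarized if polarized else None,
            "positive_pct": 100 * e["positive"] / valenced if valenced else None,
        }
    return result
```

Each entry of the returned dictionary corresponds to one row of a table shaped like Table 5, with the proLeave and negative percentages being the complements.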

Table 5: Stance and sentiments of topics discovered in the P4 time period (Nov 2017 to Sep 2018)
Figure 7: Monthly cumulative change of topics based on full-period analysis

Politicians | News Channels | Campaign/Party
@theresa_may | @BBCNews | @UKLabour
@jeremycorbyn | @SkyNews | @Conservatives
@Nigel_Farage | @guardian | @LeaveEUOfficial
@BorisJohnson | @LBC | @vote_leave*
@realDonaldTrump | @FT | @LibDems
@David_Cameron | @Independent | @UKIP
@DavidDavisMP | @Telegraph | @StrongerIn*
@Jacob_Rees_Mogg | @afneil | @theSNP*
@Anna_Soubry | @BBCr4today |
@ChukaUmunna | @MailOnline |
@Keir_Starmer | @business |
Table 6: Twitter accounts that are mentioned by other users more than 10K times. The starred accounts have very high bot scores.

7 Analysis of Politician Accounts on Twitter

Online social media is a significant platform for politicians to interact directly with the public. Twitter users can reach politicians directly by mentioning their accounts and declaring their opinions. In our study, we looked for the politician accounts that attracted the most interaction, and based on our categorization of the most frequently mentioned accounts (Table 6), we focused on ten politician accounts. In our comparative temporal analysis (see Fig. 8), we obtained the following insights:

  • David Cameron lost his influence on Twitter after handing over his Prime Minister (PM) role to Theresa May. The new PM, Theresa May, became the essential actor of the Brexit process, although she was not widely known by the public before the referendum.

  • At the beginning of July 2017, we observed a sudden increase in Jacob Rees-Mogg’s influence on Twitter. He increased his popularity and surpassed the account of Nigel Farage.

  • After becoming President of the United States, Donald Trump became very popular at the center of the Brexit debate, and this interest continued until 2017. However, as of February 2017, another politician, Jeremy Corbyn, was discussed more than Trump and other politicians.
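The mention counts behind such a temporal comparison can be sketched as below. The input format `(month, text)`, the rule of counting an account at most once per tweet, and the function name are assumptions of this sketch; the handles in the usage example are taken from Table 6.

```python
import re
from collections import Counter, defaultdict

MENTION_RE = re.compile(r"@(\w+)")


def monthly_mentions(tweets, accounts):
    """Count tweets mentioning each tracked account, per month.

    tweets: iterable of (month, text); accounts: handles without the '@'.
    An account is counted at most once per tweet.
    """
    wanted = {a.lower() for a in accounts}
    counts = defaultdict(Counter)
    for month, text in tweets:
        handles = {h.lower() for h in MENTION_RE.findall(text)}
        for handle in handles & wanted:
            counts[month][handle] += 1
    return counts


tracked = ["Jacob_Rees_Mogg", "Nigel_Farage", "theresa_may"]
example = [
    ("2017-07", "@Jacob_Rees_Mogg on #Brexit"),
    ("2017-07", "@Nigel_Farage and @jacob_rees_mogg disagree"),
    ("2017-08", "hello @theresa_may @theresa_may"),
]
mention_counts = monthly_mentions(example, tracked)
```

Comparing the per-month counters across accounts shows when one account overtakes another, which is the kind of crossover described in the list above.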

Figure 8: Comparative influence levels of politicians over time
Table 7: Stance and sentiments of tweets related to specific politician accounts

In addition to the temporal analysis, we also measured the sentiment and stance of Brexit-related tweets that mention politician accounts (Table 7). The characteristics of mentions of Nigel Farage and Donald Trump are very similar: those tweets are mostly positive and sent by pro-Leave users. On the other hand, Jeremy Corbyn and Boris Johnson are mostly discussed by pro-Remain users.
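
The aggregation behind such a table can be sketched as follows. This is only an illustrative sketch: it assumes that each tweet already carries a stance label and a sentiment label from upstream classifiers, and the tuple layout, label names, `summarize_mentions` helper, and toy rows are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_mentions(labeled_tweets):
    """Compute per-account stance and sentiment shares.

    `labeled_tweets` is an iterable of (account, stance, sentiment) tuples,
    where the labels are assumed to come from upstream classifiers.
    """
    stance = defaultdict(Counter)
    sentiment = defaultdict(Counter)
    for account, stance_label, sentiment_label in labeled_tweets:
        stance[account][stance_label] += 1
        sentiment[account][sentiment_label] += 1

    summary = {}
    for account in stance:
        total = sum(stance[account].values())
        summary[account] = {
            "stance": {k: v / total for k, v in stance[account].items()},
            "sentiment": {k: v / total for k, v in sentiment[account].items()},
        }
    return summary


# Toy rows, invented for illustration only.
rows = [
    ("@Nigel_Farage", "leave", "positive"),
    ("@Nigel_Farage", "leave", "positive"),
    ("@Nigel_Farage", "remain", "negative"),
    ("@Nigel_Farage", "leave", "negative"),
    ("@jeremycorbyn", "remain", "negative"),
    ("@jeremycorbyn", "remain", "positive"),
]

summary = summarize_mentions(rows)
```

Reporting shares rather than raw counts keeps accounts with very different mention volumes comparable.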

8 Conclusions

In this study, we provide a comprehensive analysis of large-scale, long-running political phenomena in online social media. By focusing on one of the most important political events of recent times, the Brexit referendum, we applied several computational social science techniques to 33 months of public Twitter data. We first performed a demographic analysis of the users participating in the online social discussions on Twitter, and then we predicted their polarized stance with a combination of rule-based and machine learning-based classification methods. As a result of our temporal analysis, we found that the highest change in user stance after the referendum occurred on the pro-Leave side. Additionally, we extracted the most significant topics of debate, and we measured the public stance and sentiment with respect to these topics. Finally, we analyzed reactions to the public accounts of politicians in terms of stance and sentiment, and we compared the volumetric distribution of reactions over time. As a result of our study, we show that social media-based analysis can provide useful insights for understanding people and facts during political phenomena.

9 Implementation Details

In the Tweet and User MetaData Analysis section, we used the Face++ services to determine the number of faces and to obtain demographic information when the profile photo of a Twitter account contains a single face. In the location analysis section, we used Yandex geocoding services to convert geocoordinates and missing or incomplete location data into a standard format.

In the stance classification, sentiment detection, and topic discovery parts, we only used the tweets written in English.

In the topic discovery section, we used the LDA algorithm [29] provided by the Gensim library [31]. To exclude non-influential and short tweets from topic discovery, we filtered out the tweets that were retweeted by other users fewer than 10 times and that contain fewer than 10 words. On the resulting dataset, we performed the following operations: preprocessing with the methods of the Gensim library, removing stopwords, lemmatizing the words, and converting words to bigrams. Based on the coherence score and human judgment of the topics, we concluded that the LDA model achieves its best results with the following parameters: topic count = 20, iteration count = 500.
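
The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: the filter is read as keeping only tweets with at least 10 retweets and at least 10 words, the paper's "iteration count" is mapped to Gensim's `iterations` argument, the `Phrases` settings are Gensim defaults chosen for illustration, and lemmatization is omitted because it would need an additional library. The tweet dictionary layout and function names are hypothetical.

```python
def keep_for_topics(tweet, min_retweets=10, min_words=10):
    """Return True if a tweet passes both thresholds (assumed reading of the filter)."""
    return (tweet["retweet_count"] >= min_retweets
            and len(tweet["text"].split()) >= min_words)


def train_lda(texts, num_topics=20, iterations=500):
    """Fit an LDA model with Gensim on raw tweet texts.

    Gensim is imported lazily so the filter above can be used without it.
    Lemmatization is intentionally left out of this sketch.
    """
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel, Phrases
    from gensim.parsing.preprocessing import STOPWORDS
    from gensim.utils import simple_preprocess

    # Tokenize, lowercase, and drop stopwords.
    docs = [[tok for tok in simple_preprocess(text) if tok not in STOPWORDS]
            for text in texts]
    # Merge frequent word pairs into single bigram tokens.
    bigram = Phrases(docs, min_count=5, threshold=10.0)
    docs = [bigram[doc] for doc in docs]

    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]
    # Mapping "iteration count" to `iterations` is an assumption.
    return LdaModel(corpus=corpus, id2word=dictionary,
                    num_topics=num_topics, iterations=iterations)


# Toy tweets, invented for illustration only.
tweets = [
    {"retweet_count": 25,
     "text": "The government must publish its full plan for leaving the European Union"},
    {"retweet_count": 3,
     "text": "The government must publish its full plan for leaving the European Union"},
    {"retweet_count": 50, "text": "Short one"},
]

kept = [t for t in tweets if keep_for_topics(t)]
```

With a real corpus, `train_lda([t["text"] for t in kept])` would return a model whose topics can be inspected with `print_topics()` and compared across topic counts using a coherence measure.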

At the beginning of our politician account analysis, we divided the accounts that had more than ten thousand mentions into three categories: politicians, news channels, and campaign/party accounts. We also analyzed these accounts for bot behavior and found it in only two campaign accounts and one party account (see Table 6).
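
The selection and grouping step can be sketched as follows. This is only an illustration under assumptions: the category lookup is a hypothetical, manually maintained dictionary covering three handles from Table 6, the function names are invented for this example, and the toy run uses a small threshold in place of the 10,000 mentions used in the study.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")

# Hypothetical manual labels for a few handles from Table 6 (lowercased).
CATEGORIES = {
    "theresa_may": "politician",
    "bbcnews": "news channel",
    "uklabour": "campaign/party",
}


def frequent_accounts(texts, min_mentions=10_000):
    """Return {handle: count} for handles mentioned more than `min_mentions` times."""
    counts = Counter(h.lower() for text in texts for h in MENTION_RE.findall(text))
    return {h: n for h, n in counts.items() if n > min_mentions}


def group_by_category(accounts, categories=CATEGORIES):
    """Group handles by their manually assigned category."""
    grouped = {}
    for handle in accounts:
        grouped.setdefault(categories.get(handle, "uncategorized"), []).append(handle)
    return grouped


# Toy data, invented for illustration only, with a threshold of 2 mentions.
texts = ["@BBCNews reports on @theresa_may"] * 3 + ["@UKLabour responds"]
frequent = frequent_accounts(texts, min_mentions=2)
grouped = group_by_category(frequent)
```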


  • [1] Rainie, L., Smith, A., Schlozman, K., Brady, H. and Verba, S. (2012). Social Media and Political Engagement. Pew Research Center: Internet, Science & Technology
  • [2] Segesten, A. D., Bossetta, M. (2016). A typology of political participation online: how citizens used Twitter to mobilize during the 2015 British general elections. Information, Communication & Society, 1-19.
  • [3] Tumasjan, A., Sprenger, T., Sandner, P. and Welpe, I. (2010). Election Forecasts With Twitter. Social Science Computer Review 29(4), pp. 402-418.
  • [4] Vaccari, C., Valeriani, A., Barbera, P., Bonneau, R., Jost, J., Nagler, J. and Tucker, J. (2015). Political Expression and Action on Social Media: Exploring the Relationship Between Lower- and Higher-Threshold Political Activities Among Twitter Users in Italy. Journal of Computer-Mediated Communication, 20(2), pp. 221-239.
  • [5] Khatua A. and Khatua A. (2016). Leave or Remain? Deciphering Brexit Deliberations on Twitter. 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 428-433.
  • [6] Metaxas, P. T., Mustafaraj, E., & Gayo-Avello, D. (2011). How (Not) to predict elections. In Proceedings of PASSAT/SocialCom 2011, 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom) IEEE Computer Society, Los Alamitos, CA, 165-171.
  • [7] Mohammad S., Sobhani P. and Kiritchenko S. (2017). Stance and Sentiment in Tweets. ACM Transactions on Internet Technology, 17(3), 1-23.
  • [8] Addawood A., Schneider J. and Bashir M. (2017). Stance Classification of Twitter Debates. In Proceedings of the 8th International Conference on Social Media & Society SMSociety17.
  • [9] Grcar M., Cherepnalkoski D. and Mozetic I. et al. (2017). Stance and influence of Twitter users regarding the Brexit referendum. Computational Social Networks, 4(1).
  • [10] Celli F., Stepanov E., Poesio M., and Riccardi G. (2016). Predicting Brexit: Classifying agreement is better than sentiment and pollsters. In: Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, 110-118.
  • [11] Amador Diaz Lopez J, Collignon-Delmar S and Benoit K et al. (2017). Predicting the Brexit Vote by Tracking and Classifying Public Opinion Using Twitter Data. Statistics, Politics and Policy, 8(1).
  • [12] Hurlimann M, Davis B and Cortis K et al. (2016) A Twitter Sentiment Gold Standard for the Brexit Referendum. In: Proceedings of the 12th International Conference on Semantic Systems - SEMANTiCS 2016.
  • [13] Somasundaran S. and Wiebe J. (2009). Recognizing stances in online debates. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 226-234.
  • [14] Rajadesingan A. and Liu H. (2014). Identifying Users with Opposing Opinions in Twitter Debates. Social Computing, Behavioral-Cultural Modeling and Prediction, 153-160.
  • [15] Zarrella G. and Marsh A. (2016). MITRE at SemEval-2016 Task 6: Transfer learning for stance detection. In: Proceedings of the International Workshop on Semantic Evaluation.
  • [16] Du J., Xu R., He Y., Gui L. (2016). Stance classification with target-specific neural attention networks. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI2017).
  • [17] Howard P. and Kollanyi B. (2016). Bots, #Strongerin, and #Brexit: Computational Propaganda During the UK-EU Referendum. CoRR abs/1606.06356
  • [18] Ratkiewicz, J., Conover, M., Meiss, M.R., Goncalves, B., Flammini, A., & Menczer, F. (2011). Detecting and Tracking Political Abuse in Social Media. ICWSM.
  • [19] Bessi A. and Ferrara E. (2016). Social bots distort the 2016 U.S. Presidential election online discussion. First Monday, 21(11)
  • [20] Bastos M. and Mercea D. (2017). The Brexit Botnet and User-Generated Hyperpartisan News. Social Science Computer Review.
  • [21] Chinnov, A., Kerschke, P., Meske, C., Stieglitz, S., and Trautmann, H. (2015). An Overview of Topic Discovery in Twitter Communication through Social Media Analytics. AMCIS.
  • [22] Hong L. and Davison B. (2010). Empirical study of topic modeling in Twitter. In: Proceedings of the First Workshop on Social Media Analytics.
  • [23] Guimaraes A., Wang L. and Weikum G. (2017). Us and Them: Adversarial Politics on Twitter. In: 2017 IEEE International Conference on Data Mining Workshops (ICDMW).
  • [24] Zhao W., Jiang J. and Weng J. et al. (2011). Comparing Twitter and Traditional Media Using Topic Models. Lecture Notes in Computer Science, 338-349.
  • [25] Song M., Kim M. and Jeong Y. (2014). Analyzing the Political Landscape of 2012 Korean Presidential Election in Twitter. IEEE Intelligent Systems, 29(2), 18-26.
  • [26] Jung S.G., An J., Kwak H., Salminen J. and Jansen B.J. (2017). Inferring social media users’ demographics from profile pictures: A Face++ analysis on Twitter users. In: Proceedings of The 17th International Conference on Electronic Business, 140-145.
  • [27] Davis C., Varol O. and Ferrara E. et al. (2016). BotOrNot. In: Proceedings of the 25th International Conference Companion on World Wide Web (WWW ’16 Companion), pp. 273-274.
  • [28] Howard P. (2018). The Rise of Computational Propaganda. IEEE Spectrum 55(11), 26-31.
  • [29] Blei D., Ng A. and Jordan M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
  • [30] Chang J., Gerrish S., Wang C., Boyd-Graber JL., Blei DM. (2009). Reading Tea Leaves: How Humans Interpret Topic Models. In: Proceedings of the 22nd International Conference on Neural Information Processing Systems, 288-296.
  • [31] Rehurek R., Sojka P. (2010). Software Framework for Topic Modelling with Large Corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. Valletta, Malta: ELRA, 45-50.