An analysis of community structure in Brazilian political topic-based Twitter networks

06/14/2019
by Camila P. S. Tautenhain et al.
Unife

Online social networks such as Twitter are important platforms for spreading public opinion on a variety of subjects. Classifying users according to the opinions they share in their Twitter posts can help marketing and political campaigns focus on specific user groups. Community detection-based techniques are especially useful for classifying Twitter users, as they require neither rule-based methods nor labeled data to perform the clustering task. In this paper, we constructed networks using data related to political discussions in Brazil extracted from Twitter. We show that (i) these networks follow a power-law degree distribution, indicating that a few popular users are responsible for most of the "mentions" and "retweets"; (ii) the most popular tweets are viral and spread across the communities, whereas most of the remaining tweets are trapped in the communities where they originated; and (iii) words associated with positive sentiments are predominant in network communities related to the Brazilian presidential elections and appear in viral tweets.


1 Introduction

Social networks play an important role in spreading the political views of their users, who may be public figures or common citizens (Weller et al., 2013). Twitter, in particular, is an online social network used to post short texts, called tweets, through which users express their opinions, thoughts and feelings. In addition, users can make or share public statements they agree with (Pak & Paroubek, 2010). They may also interact with each other through the “follow” option and by retweeting other users' tweets. In this sense, Twitter offers valuable information on public opinion in various areas and can be employed for a variety of purposes, e.g., to direct marketing ads (Cambria et al., 2013) and to guide political campaigns (Larsson & Moe, 2012; Golbeck et al., 2010; Yardi & Boyd, 2010). This online social network has approximately 321 million active users (Kastrenakes, 2019), and, according to information provided by the company, Brazil is one of the 10 countries with the highest number of active accounts on Twitter. Given this volume of data, data mining algorithms play a key role in the analysis of networks created with data collected from Twitter.

Context mining, in particular, is a research topic that has attracted the attention of the scientific community (Conover et al., 2011; Pennacchiotti & Popescu, 2011; Makazhanov et al., 2014; Surian et al., 2016; Gaul & Vincent, 2017; Ahmadi et al., 2018). A number of tools have been proposed for the analysis of Twitter data, among which we highlight those based on natural language processing techniques (Ding & Liu, 2007; Sun et al., 2017) and on supervised machine learning algorithms (Pennacchiotti & Popescu, 2011; Conover et al., 2011; Makazhanov et al., 2014). These strategies aim to detect users’ opinions and feelings from the content they publish. A major drawback of natural language processing techniques is the diversity of linguistic rules. Furthermore, supervised machine learning methods can only be used when there is enough labeled data to train the algorithm.

Graph-based techniques, on the other hand, compile Twitter data into graphs whose vertices are the users and whose arcs are the interactions between users. These techniques use graphs built from unlabeled data to detect public opinions. Examples of graph-based techniques can be found in (Conover et al., 2011; Pennacchiotti & Popescu, 2011; Sluban et al., 2015; Surian et al., 2016). According to a social network property known as homophily (Tang et al., 2014), users are more likely to retweet or mention other users with whom they share a common interest. Community detection in networks can therefore provide useful information on whether or not tweets share the same context.

In light of the recent corruption investigations in Brazil and the presidential elections, discussions in social networks about the Brazilian political scenario have seen a significant increase. In the study reported here, we extracted data from tweets containing words related to politicians who would reportedly run for the 2018 Brazilian presidential elections and to corruption investigations in Brazil. The Twitter API (Twitter, 2018) was employed to extract data concerning interactions between users, such as retweets and mentions, to construct the Twitter networks proposed in this paper.

The primary purpose of this study was to analyze the structure and topology of networks constructed with Twitter data in order to classify users according to the subject of their tweets using community detection algorithms, and to investigate the sentiment associated with the communities in the networks related to the Brazilian presidential elections. We also confirmed the observation by Weng et al. (2013) that a few popular tweets, referred to as viral tweets, spread across communities like viruses and are therefore not expected to affect the community structure. Non-viral tweets, on the other hand, get trapped in the communities where they originated and are thus expected to remain within those communities.

The networks under investigation are scale-free and therefore have few high-degree vertices and a large number of low-degree vertices. In other words, a few popular users account for most of the retweets and mentions in these networks. Consequently, the most retweeted tweets tend to be those posted by the most popular users. In addition, a case study on the networks related to the Brazilian presidential elections showed that words associated with positive sentiments predominate in the communities. Among these words, those that contribute most to the positive sentiment of the communities are the same ones that appear in viral tweets.

The remainder of the paper is organized as follows: Section 2 introduces basic concepts and algorithms for community detection in networks; Section 3 briefly discusses related works; Section 4 describes the networks constructed from Twitter data; Section 5 presents the computational experiments carried out with community detection algorithms; Section 6 presents a network analysis in which the structure of the introduced networks is investigated; Section 7 investigates the communities related to the 2018 Brazilian presidential elections and the sentiments associated with them; and Section 8 summarizes the contributions of the paper and gives directions for future work.

2 Community Detection in Networks

This section presents the theoretical background and the main notation used throughout the paper. It also introduces classical community detection algorithms that have been employed in related studies and that are used in this paper for the network analysis.

2.1 Basic Concepts and Notations

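In this paper, a network is defined as a directed graph (digraph) $G = (V, A, w)$, where $V$ and $A$ are, respectively, its sets of vertices and arcs and $w: A \rightarrow \mathbb{R}_{>0}$ is a function that assigns a weight to each arc of the digraph. An arc $a \in A$ is defined as an ordered pair $a = (u, v)$, where the points $u$ and $v$ are called the tail and head of $a$, respectively. Let $n = |V|$, $m = |A|$ and $W = \sum_{a \in A} w(a)$ be, respectively, the number of vertices, the number of arcs and the total weight of the arcs of $G$.

The in- and out-degrees of a vertex $v \in V$ are given by, respectively, $d^-(v) = \sum_{(u, v) \in A} w(u, v)$ and $d^+(v) = \sum_{(v, u) \in A} w(v, u)$. Moreover, the degree of a vertex is defined by $d(v) = d^-(v) + d^+(v)$.

A partition of $V$ into $k$ communities is given by $P = \{C_1, \ldots, C_k\}$, where $\bigcup_{i=1}^{k} C_i = V$ and $C_i \cap C_j = \emptyset$ for all $i \neq j$. The community of a vertex $v$ is referred to as $C(v)$. Moreover, let $G[C_i] = (C_i, A_i)$ be the vertex-induced subgraph, that is, the subgraph of $G$ such that $C_i \subseteq V$ and $A_i$ is composed of all arcs in $A$ whose points are both in $C_i$.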

2.2 Community Evaluation Metrics

One way of evaluating the community structure of a given partition is to measure how close it is to the expected partition with metrics such as the Normalized Mutual Information (NMI), which builds on the mutual information concept introduced by Shannon (1948). However, as the expected partition is not available in most applications, we evaluated the quality of the partitions with the widely used measure known as modularity.

Modularity is based on the intuitive notion that vertices belonging to the same community are more likely to connect with each other than with vertices from different communities. Equation (1) defines the modularity measure for weighted undirected graphs (Leicht & Newman, 2008) applied to a partition $P$ of digraph $G$:

$$Q(P) = \sum_{C \in P} \left[ \frac{W(C)}{W} - \left( \frac{\sum_{v \in C} d(v)}{2W} \right)^{2} \right] \qquad (1)$$

where $W$ is the total weight of the arcs of $G$, $W(C)$ is the total weight of the arcs whose endpoints both belong to community $C$, and $d(v)$ is the (weighted) degree of vertex $v$.

As Equation (1) shows, the modularity of $P$ is the difference between the fraction of the total weight carried by arcs between vertices of the same community and the expected fraction of weight inside communities in a random undirected graph whose vertices have the same degree sequence as the digraph under consideration. Its values range from $-1/2$ to $1$, and the higher the value, the better the partition.
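As an illustration of Equation (1), the following Python sketch computes the modularity of a partition of a weighted digraph on its undirected projection. It is a minimal sketch, not the authors' code (the experiments in Section 5 rely on igraph); the input format (a dictionary mapping (tail, head) pairs to weights, and a dictionary mapping vertices to community labels) and the function name are our assumptions.

```python
from collections import defaultdict


def modularity(arcs, membership):
    """Modularity of a partition of a weighted digraph, computed on its
    undirected projection as in Equation (1).

    arcs: dict mapping (tail, head) -> weight w(u, v)
    membership: dict mapping vertex -> community label
    """
    total = sum(arcs.values())                 # W: total arc weight
    internal = defaultdict(float)              # W(C): weight of arcs inside C
    degree_sum = defaultdict(float)            # sum of d(v) = d+(v) + d-(v) over C
    for (u, v), w in arcs.items():
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += w                    # w counts towards d+(u) ...
        degree_sum[cv] += w                    # ... and towards d-(v)
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / total - (degree_sum[c] / (2 * total)) ** 2
               for c in degree_sum)


if __name__ == "__main__":
    # Two directed triangles joined by the arc (c, d).
    arcs = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1,
            ("d", "e"): 1, ("e", "f"): 1, ("f", "d"): 1, ("c", "d"): 1}
    split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(modularity(arcs, split))             # 5/14, about 0.357
```

Placing each triangle in its own community gives $Q = 5/14$, while putting every vertex in a single community gives $Q = 0$, since the single community then holds all the weight and all the degree.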

2.3 Community Detection Algorithms

Since the modularity measure was proposed by Newman & Girvan (2004), several authors have studied ways to detect community structures in a plethora of complex networks. In particular, community detection algorithms that aim at maximizing modularity have been extensively studied (Newman, 2004; Blondel et al., 2008; Santos et al., 2016; Francisquini et al., 2017). Blondel et al. (2008) presented the Louvain method, a multilevel greedy algorithm for modularity maximization that was successfully applied to a web graph with 118 million vertices and 1 billion edges. Newman (2006) studied a spectral decomposition using the leading eigenvector of the modularity matrix and introduced a greedy algorithm to define the communities.

Despite the good results achieved by modularity maximization-based methods, the measure has a resolution limit that restricts the size of the communities the algorithms can find. Yang et al. (2016) showed that the leading eigenvector algorithm proposed by Newman (2006) found partitions far from the expected ones even in networks with small mixture coefficients (the mixture coefficient of a network is the percentage of edges or arcs crossing communities with respect to the total number of edges or arcs in the graph). Yang et al. (2016) also showed that the Louvain method found the expected communities in networks with small mixture coefficients, and partitions sufficiently close to the expected ones in networks with high mixture coefficients.
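The mixture coefficient can be computed directly from a partition. The sketch below is illustrative only: it assumes a hypothetical input format of `{(tail, head): weight}` and `{vertex: community}`, and it counts arcs without their weights (our assumption), returning a fraction rather than a percentage.

```python
def mixture_coefficient(arcs, membership):
    """Fraction of arcs whose endpoints lie in different communities.

    arcs: dict mapping (tail, head) -> weight (weights are ignored here)
    membership: dict mapping vertex -> community label
    """
    crossing = sum(1 for (u, v) in arcs if membership[u] != membership[v])
    return crossing / len(arcs)


if __name__ == "__main__":
    arcs = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1,
            ("d", "e"): 1, ("e", "f"): 1, ("f", "d"): 1, ("c", "d"): 1}
    split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(mixture_coefficient(arcs, split))   # 1 crossing arc out of 7
```

Multiplying the result by 100 gives the percentage used in the definition above.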

Among the algorithms that optimize other measures, we highlight those proposed in (Pons & Latapy, 2005; Rosvall & Bergstrom, 2007; Lancichinetti et al., 2011). Pons & Latapy (2005) introduced the Walktrap algorithm, which detects vertex partitions based on the probabilities of random walkers moving between vertices in the network.

Rosvall & Bergstrom (2007) introduced the map equation, which measures the description length of a random walker's movements on a digraph. The Infomap algorithm, also proposed by these authors, minimizes the map equation to find communities in networks. Walktrap achieved results comparable to those of Louvain, whereas Infomap achieved worse results in networks with higher mixture coefficients (Yang et al., 2016).

Lancichinetti et al. (2011) suggested finding communities by evaluating their statistical significance. They proposed an algorithm named Order Statistics Local Optimization Method (OSLOM), which, according to its authors, obtained better results than all the aforementioned algorithms in networks with higher mixture coefficients. Nevertheless, OSLOM requires more computational time than the other algorithms to find the communities.

The community detection algorithm known as Label Propagation (LP) (Raghavan et al., 2007) is neighborhood-based and does not optimize any measure. It starts from a partition in which every vertex is isolated in its own community (singletons) and iteratively assigns each vertex to the community to which the majority of its neighbors belong. LP has quasi-linear computational complexity, which makes it very fast at detecting communities. However, its outcome depends strongly on the number of iterations: if this number is too high, LP tends to merge all communities into a single cluster.
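The label propagation procedure described above can be sketched as follows. This is a simplified, unweighted, asynchronous variant written for illustration; the function names, the uniform random tie-breaking and the `max_iter` cap are our assumptions and do not reproduce igraph's implementation.

```python
import random
from collections import Counter


def _dominant_labels(v, adjacency, labels):
    """Labels carried by the largest number of v's neighbours."""
    counts = Counter(labels[u] for u in adjacency[v])
    if not counts:
        return {labels[v]}                      # isolated vertex keeps its label
    best = max(counts.values())
    return {lab for lab, c in counts.items() if c == best}


def label_propagation(adjacency, max_iter=100, seed=None):
    """Asynchronous label propagation on an undirected, unweighted graph.

    adjacency: dict mapping each vertex to a list of its neighbours.
    Returns a dict mapping each vertex to its community label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adjacency}          # every vertex starts as a singleton
    order = list(adjacency)
    for _ in range(max_iter):
        rng.shuffle(order)                      # visit vertices in random order
        for v in order:
            candidates = sorted(_dominant_labels(v, adjacency, labels), key=repr)
            labels[v] = rng.choice(candidates)  # break ties uniformly at random
        # Stop once every vertex already holds one of its dominant labels.
        if all(labels[v] in _dominant_labels(v, adjacency, labels) for v in adjacency):
            break
    return labels
```

Each sweep costs time proportional to the number of edges, which is why LP is fast; the stopping rule here is the one suggested by Raghavan et al. (2007), with `max_iter` only as a safety cap.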

The next section presents a brief literature review of studies related to the classification of users in Twitter networks.

3 Related Works

This section briefly discusses classical approaches for mining public opinion in social networks through natural language processing and supervised machine learning algorithms. Special attention is given to applications using information extracted from Twitter and to works that classify users based on community detection algorithms.

Natural language processing techniques use a set of opinion words and linguistic rules to infer users’ opinions on Twitter. Although rule-based techniques are rather effective, linguistic rules usually change over the years and differ from language to language. In addition, users do not always follow linguistic rules in their posts. Thus, these techniques are very limited and strongly dependent on specialist knowledge (Wang et al., 2011). For further details on these techniques, we refer the reader to the recent review by Sun et al. (2017).

Supervised machine learning approaches can be applied across different domains and can therefore overcome the challenges posed by rule-based methods. Among these approaches, we highlight the methods studied in (Conover et al., 2011; Pennacchiotti & Popescu, 2011; Makazhanov et al., 2014). Pennacchiotti & Popescu (2011) trained a decision tree to learn information about Twitter users, such as their political affiliation, their ethnicity and whether they were fans of a famous coffee shop, from the number of followers and friends, the number of replies, the average number of tweets and the linguistic content of the tweets. Makazhanov et al. (2014) also trained a decision tree, coupled with logistic regression, to learn political positions from the interactions between users. Conover et al. (2011) predicted the political alignment of Twitter users using a Support Vector Machine (SVM) trained on the content of users’ tweets.

However, these supervised methods require labeled training data, and real-time data are generally not labeled. In such cases, a specialist can manually label a small sample to train the machine learning algorithms, as done by Conover et al. (2011). With big data, however, manual labeling is not viable, and if only a very small sample is used for training, the model becomes unreliable (Raudys et al., 1991).

Community detection in networks does not require any prior knowledge about the data classes and, for the most part, does not rely on rule-based techniques. These algorithms detect communities whose users interact more among themselves, in terms of friendship, retweets or mentions, than with the rest of the users. Community detection can therefore provide useful information on how tweets are shared among users (Conover et al., 2011; Sluban et al., 2015). In addition to the SVM classifier, Conover et al. (2011) employed community detection based on the leading eigenvector method (Newman, 2006) and on the label propagation algorithm (Raghavan et al., 2007). Sluban et al. (2015) detected the communities of a Twitter network using the Louvain method (Blondel et al., 2008) and then identified the topics of the tweets in each community using a content-based classification algorithm.

Surian et al. (2016) investigated discussions about HPV vaccines on Twitter, using a community detection algorithm to infer the topics of the tweets. Although they did not explicitly employ a community detection algorithm, Pennacchiotti & Popescu (2011) refined the results of a supervised classification method by updating the classes of the users according to the classes of their friends.

Weng et al. (2013) studied the spreading of hashtags on Twitter networks and observed that a few viral hashtags spread across communities like viruses, while most get trapped within the communities where they originated. According to Weng et al. (2013), the virality of a tweet can be measured as the percentage of the retweets it receives from users in communities other than the community of the user who posted it. Let $P$ be a partition of the vertices into communities, and let $C(v)$ denote the community of vertex $v$. Equation (2) defines the virality $\nu(t)$ of a tweet $t$ posted by a user $u$ from community $C(u)$:

$$\nu(t) = \frac{\left| \{ v \in R(t) : C(v) \neq C(u) \} \right|}{\left| R(t) \right|} \qquad (2)$$

where $R(t)$ is the set of users who retweeted tweet $t$.
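The virality measure of Weng et al. (2013) reduces to counting, among the distinct users who retweeted a tweet, those outside the author's community. In the sketch below, the function name, the input format and the choice of returning 0 for a tweet with no retweets (for which the ratio is undefined) are our assumptions.

```python
def virality(author, retweeters, membership):
    """Fraction of the distinct retweeters of a tweet that belong to a
    community different from the author's.

    author: the user who posted the tweet
    retweeters: iterable of users who retweeted it (duplicates are ignored)
    membership: dict mapping user -> community label
    """
    retweeters = set(retweeters)
    if not retweeters:
        return 0.0                       # undefined ratio; 0 by convention here
    home = membership[author]
    outside = sum(1 for v in retweeters if membership[v] != home)
    return outside / len(retweeters)


if __name__ == "__main__":
    communities = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(virality("a", ["b", "c", "d"], communities))  # two of three outside
```

A value close to 1 means the tweet spread mostly outside its community of origin, which is the viral behavior described above.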

As Weng et al. (2013) observed that viral tweets spread across communities, we can expect them not to affect the community structure of the networks. Non-viral tweets, on the other hand, spread mostly within the communities where they originated and are thus expected to define the community structure of the networks.

The next section describes the methodology employed to extract information from Twitter and to construct the networks evaluated in this paper.

4 Extracted Networks

In this paper, we constructed directed networks from data collected on Twitter using the streaming API (Twitter, 2018), which returns samples of recent tweets matching the application's queries. The vertices of the digraphs represent the Twitter users in the sample of tweets. An arc $(u, v)$ in digraph $G$ expresses that user $u$ was mentioned or replied to by user $v$, or posted a tweet that was retweeted by $v$. The weight of arc $(u, v)$, $w(u, v)$, is the number of times $u$ was mentioned, replied to or retweeted by user $v$.

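Figure 1 shows a triplet of users whose arcs were created by tweets. In this figure, one user retweeted a tweet from another user, creating an arc from the retweeted user to the retweeting user. In addition, a user mentioned another user in a separate tweet, which is represented by an arc from the mentioned user to the user who mentioned them. Note that the arcs of the networks do not distinguish retweets from mentions.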

Figure 1: Example of a network with 3 users whose arcs were created by retweets (arc label "was retweeted by") and mentions (arc label "was mentioned by").
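The arc construction rule of this section can be expressed in a few lines. The sketch below assumes hypothetical interaction records of the form `(actor, target)`, meaning that `actor` mentioned, replied to or retweeted `target`; this record format is ours, not the Twitter API's.

```python
from collections import Counter


def build_arcs(interactions):
    """Weighted arcs from hypothetical (actor, target) interaction records.

    A record (actor, target) means that `actor` mentioned, replied to or
    retweeted `target`. Each record adds one unit of weight to the arc
    (target, actor), i.e. from the user who received the interaction to
    the user who made it.
    """
    return dict(Counter((target, actor) for actor, target in interactions))


if __name__ == "__main__":
    records = [("b", "a"), ("b", "a"), ("c", "b")]  # b -> a twice, c -> b once
    print(build_arcs(records))
```

Because retweets, mentions and replies all add weight to the same arc, the resulting digraph does not distinguish between them.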

Table 1 presents the queries employed to construct the networks. Table 2 presents the number of vertices ($n$) and arcs ($m$); the total weight of the arcs ($W$); and the average (avg), standard deviation (std), minimum (min) and maximum (max) in- and out-degrees of the proposed networks. Note that the sum of the in-degrees and the sum of the out-degrees of all vertices are both equal to the total weight of the arcs, $W$, since each arc $(u, v) \in A$ contributes $w(u, v)$ to the out-degree of $u$ and to the in-degree of $v$. Consequently, the average in- and out-degrees are the same, namely $W/n$. In all the networks introduced in this paper, the maximum out-degree is at least five times larger than the maximum in-degree (the smallest ratio, about 5.6, occurs in the Bolsonaro network).
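The identity stated above (sum of in-degrees = sum of out-degrees = total arc weight) and the statistics reported in Table 2 can be reproduced with a sketch like the following. The function name and input format are illustrative assumptions, and since the paper does not say whether the population or the sample standard deviation was used, the sketch assumes the population version (`statistics.pstdev`).

```python
from collections import defaultdict
from statistics import mean, pstdev


def degree_summary(arcs):
    """Weighted in-/out-degree statistics of a digraph given as {(u, v): w}."""
    vertices = {x for arc in arcs for x in arc}
    out_deg = defaultdict(float)
    in_deg = defaultdict(float)
    for (u, v), w in arcs.items():
        out_deg[u] += w   # arc (u, v) adds w(u, v) to d+(u) ...
        in_deg[v] += w    # ... and the same amount to d-(v)

    def stats(deg):
        values = [deg.get(x, 0.0) for x in vertices]   # absent vertex -> degree 0
        return {"avg": mean(values), "std": pstdev(values),
                "min": min(values), "max": max(values)}

    return {
        "n": len(vertices), "m": len(arcs), "W": sum(arcs.values()),
        "sum_in": sum(in_deg.values()), "sum_out": sum(out_deg.values()),
        "in": stats(in_deg), "out": stats(out_deg),
    }
```

Because `sum_in` and `sum_out` both equal `W`, the two averages coincide at `W / n`, as in every row of Table 2.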

The Politicians network was constructed by querying the Twitter API for tweets matching the names of politicians who had shown interest in running in the 2018 Brazilian presidential elections. As this network was constructed before the politicians had officially presented their candidacies, we selected the names based on an article published by the BBC (Odilla, 2018). The Bolsonaro and Lula networks captured interactions in tweets that referred to politicians Jair Bolsonaro and Luiz Inácio Lula da Silva, respectively. The LavaJato network was formed by tweets reacting to Operation Car Wash, in Brazil, which investigates executives who have reportedly accepted bribes from construction firms in exchange for inflating contract prices. The 1st_Round and 2nd_Round networks were constructed using, respectively, tweets expressing reactions to the first and second rounds of the 2018 Brazilian presidential elections. (The Brazilian presidential elections may have two rounds: if the most voted candidate in the first round does not obtain an absolute majority of the votes, a second round is held, in which only the two most voted candidates of the first round run.) These last two networks only model retweet interactions.

Network       Query                                                                                   Date
Politicians   lula, bolsonaro, alckmin, temer, cirogomes, marinasilva, alvarodias, fernandohaddad,      May 15, 2018 to May 16, 2018
              manueladavila, guilhermeafif, fernandocollor, guilhermeboulos, joaoamoedo
Lula          lulapreso, lula preso, lulalivre, lula livre                                              May 7, 2018 to May 9, 2018
Bolsonaro     bolsonaro, bolsonaro2018, jair bolsonaro, bolsonaro presidente, #bolsonaro2018,            May 7, 2018 to May 14, 2018
              #bolsonaropresidente, #bolsonaro
LavaJato      lavajato, lava-jato, “lava jato”                                                          May 7, 2018 to May 14, 2018
1st_Round     jairbolsonaro, bolsonaro, fernandohaddad, haddad, cirogomes, ciro, geraldoalckmin,          October 7, 2018
              alckmin, marinasilva, marina, alvarodias, henriquemeirelles, meirelles, joaoamoedo,
              amoedo, guilhermeboulos, boulos, josemariaeymael, cabodaciolo
2nd_Round     jairbolsonaro, bolsonaro, fernandohaddad, haddad, cirogomes, ciro, geraldoalckmin,          October 28, 2018
              alckmin, marinasilva, marina, alvarodias, henriquemeirelles, meirelles, joaoamoedo,
              amoedo, guilhermeboulos, boulos, josemariaeymael, cabodaciolo
Table 1: Information on the queries submitted to Twitter streaming API.
Network       Vertices  Arcs     Total weight  in-degree                    out-degree
                                               avg    std    min   max      avg    std     min   max
Politicians   16137     31933    40858         2.53   6.06   0     359      2.53   32.39   0     2875
Lula          13319     33519    40456         3.04   7.8    0     236      3.04   35.8    0     1798
Bolsonaro     873       1264     1676          1.92   2.75   0     32       1.92   9.62    0     178
LavaJato      14081     35642    42937         3.05   7.8    0     237      3.05   35.16   0     1798
1st_Round     135865    242679   249042        1.83   2.49   0     103      1.83   42.76   0     6032
2nd_Round     100755    171813   176955        1.76   2.38   0     79       1.76   33.17   0     3863
Table 2: Information on the networks extracted from Twitter.

The following section presents computational experiments with community detection algorithms in these networks.

5 Computational Experiments with Community Detection Algorithms

In the experiments presented in this section, we considered four community detection algorithms available in the igraph package (Csardi et al., 2006): the Louvain method (Blondel et al., 2008), Label Propagation (LP) (Raghavan et al., 2007), Infomap (Rosvall & Bergstrom, 2007) and Walktrap (Pons & Latapy, 2005). Even though Infomap has a version for digraphs, its undirected version found communities with higher modularity values and was therefore selected for the experiments. All the experiments were carried out on a computer with an Intel Xeon E5-1620 processor (3.7 GHz) and 32 GB of main memory.
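Running the undirected version of Infomap requires turning each digraph into an undirected weighted graph first. The paper does not say how opposite arcs were combined; the sketch below assumes that arcs (u, v) and (v, u) are merged into a single edge whose weight is the sum of both arc weights. The function name and input format are illustrative.

```python
from collections import defaultdict


def undirected_projection(arcs):
    """Merge opposite arcs (u, v) and (v, u) into one undirected edge whose
    weight is the sum of both arc weights (an assumption, see the text).

    arcs: dict mapping (tail, head) -> weight
    Returns a dict mapping frozenset({u, v}) -> summed weight.
    """
    edges = defaultdict(float)
    for (u, v), w in arcs.items():
        edges[frozenset((u, v))] += w   # unordered key: {u, v}
    return dict(edges)
```

Summing the weights preserves each vertex's total degree $d(v) = d^-(v) + d^+(v)$, so the undirected graph keeps the same degree sequence as the digraph.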

Table 3 shows the modularity of the partitions obtained by the algorithms for each of the networks used. In this table, the columns marked as “Modularity” refer to the modularity value of the partitions obtained by the community detection algorithms according to Equation (1). The columns marked as “Time” report the time (in seconds) required to detect the communities. Due to computer memory limitations, Walktrap could not find the communities for networks 1st_Round and 2nd_Round.

Network       Louvain               LP                    Infomap               Walktrap
              Modularity  Time (s)  Modularity  Time (s)  Modularity  Time (s)  Modularity  Time (s)
Politicians   0.7211      0.16      0.6521      0.2       0.6162      10.09     0.6527      8.98
Lula          0.5464      0.15      0.4107      0.14      0.4676      8.77      0.4465      7.92
Bolsonaro     0.7993      0.01      0.746       0.01      0.7497      0.15      0.7435      0.02
LavaJato      0.5504      0.17      0.4247      0.15      0.4673      9.58      0.4594      8.89
1st_Round     0.6771      1.48      0.6488      8.48      0.5401      2.98      -           -
2nd_Round     0.7228      1.54      0.6664      2.52      0.5742      1.95      -           -
Table 3: Modularity values of the partitions found by the reference community detection algorithms and their running times in seconds ("-" marks the runs that Walktrap could not complete due to memory limitations).

The Louvain method is the only algorithm in Table 3 that explicitly maximizes modularity. Thus, as expected, it achieved the highest modularity values for all the networks, although the remaining algorithms also obtained partitions with high modularity values. As previously explained, modularity was selected to evaluate the quality of the partitions because no ground-truth communities are available for the constructed networks. Therefore, in the remainder of this paper we use the Louvain method to detect communities, since it obtained the highest modularity values with running times that were the lowest, or close to the lowest, among the algorithms tested.

The partitions achieved by the Louvain method for the 1st_Round and 2nd_Round networks were chosen to be analyzed at length in this paper. To illustrate these partitions, Figures 2(a) and 2(b) exhibit the 1st_Round and 2nd_Round networks along with the communities found. In these figures, vertices belonging to the same community have the same color.

(a) 1st_Round network.
(b) 2nd_Round network.
Figure 2: Communities found by the Louvain method.

The next section presents the analysis performed to describe the topology and the community structure of the constructed networks.

6 Network Analysis

Tedjamulia et al. (2005) suggested dividing the users of online communities into categories according to their popularity and activity. In this paper, we divide the users into 8 sets (categories) according to their popularity (number of mentions and retweets received by the user) and their activity (number of mentions and retweets made by the user). The popularity categories are called $P_1$, $P_2$, $P_3$ and $P_4$, whereas the activity categories are $A_1$, $A_2$, $A_3$ and $A_4$. $P_1$ and $A_1$ contain the top 0.1% most popular and most active users, respectively; $P_2$ and $A_2$ comprise the next most popular users not in $P_1$ and the next most active users not in $A_1$, respectively; $P_3$ and $A_3$ contain the remaining users among the top 10% most popular users (not in $P_1$ or $P_2$) and among the top 10% most active users (not in $A_1$ or $A_2$), respectively; and $P_4$ and $A_4$ comprise the 90% least popular and the 90% least active users, respectively.

Thus, the top 10% most popular and most active users are those in $P_1 \cup P_2 \cup P_3$ and $A_1 \cup A_2 \cup A_3$, respectively.

Now, consider the division of tweets into four sets (categories) according to their popularity, measured only by the number of retweets they received: $T_1$, $T_2$, $T_3$ and $T_4$ are the categories of tweets ranked, respectively, as the top 0.1% most popular tweets, the next most popular tweets not in $T_1$, the remaining tweets among the top 10% most popular (not in $T_1$ or $T_2$), and the 90% least popular tweets.
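Assigning users (or tweets) to rank-based categories like these amounts to sorting by score and cutting at fixed fractions. The sketch below is illustrative: the function name is hypothetical, ties are broken by input order, and the default cut-offs of 0.1%, 1% and 10% are our assumption (in particular, the 1% boundary between the second and third categories is a hypothetical choice, not a value confirmed by the text).

```python
def rank_categories(scores, cutoffs=(0.001, 0.01, 0.10)):
    """Split items into len(cutoffs) + 1 rank-based categories.

    scores:  dict mapping item -> score (e.g. popularity or activity)
    cutoffs: increasing fractions; category 1 holds the top cutoffs[0]
             share, category 2 the items up to cutoffs[1] not in
             category 1, and so on; the last category holds the rest.
             The default values are an assumption.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    bounds = [round(c * len(ranked)) for c in cutoffs]
    return {item: 1 + sum(pos >= b for b in bounds)
            for pos, item in enumerate(ranked)}


if __name__ == "__main__":
    tiers = rank_categories({user: user for user in range(1000)})
    print([list(tiers.values()).count(k) for k in (1, 2, 3, 4)])  # category sizes
```

With 1000 items, the categories hold 1, 9, 90 and 900 items, mirroring the 0.1% / 1% / 10% / 90% split assumed above.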

The following sections analyze the constructed networks with respect to the popularity and activity of their users, tweets and communities, and also show how the network topologies evolve over time. The classification of users into communities and the spread of viral tweets are also discussed. In particular, the results for the largest network, 1st_Round, are investigated further.

6.1 Users’ Popularity and Activity

This section analyzes the degree distributions of the proposed networks in order to describe the popularity and activity of the users and their influence on the constructed networks. The 1st_Round network is investigated in depth, whereas the other networks are discussed more briefly.

The popularity of a user is measured by the total number of mentions and retweets they receive, and a user's activity by the number of mentions and retweets they make. According to the definition of arcs in Section 4, in which an arc points from the mentioned or retweeted user to the user who mentioned or retweeted them, the in- and out-degrees of a vertex represent, respectively, the activity and the popularity of the corresponding user.

Large social networks are usually scale-free. In particular, in Twitter-based networks, the most visible users are the vertices with the highest degrees. To analyze the topology of the constructed networks, Figures 3(a) and 3(b) exhibit histograms of the out- and in-degree frequencies of 1st_Round, respectively.

(a) Out-degree distribution.
(b) In-degree distribution.
Figure 3: Histograms of the out- and in-degree distributions of 1st_Round.

As can be noted in Figure 3, the degree distributions follow a power law. This power-law degree distribution corroborates the observation that a few users are far more active or visible than the rest in social networks.
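The histograms in Figure 3 are a visual check. A quantitative complement, not used by the authors, is the approximate maximum-likelihood estimate of the exponent of a discrete power law given by Clauset, Shalizi & Newman (2009): alpha ≈ 1 + N / Σ ln(x_i / (x_min − 1/2)), summed over the N degrees x_i ≥ x_min. The lower cut-off x_min must be chosen by the analyst; the default of 1 below, which discards zero degrees, is our assumption.

```python
import math


def powerlaw_alpha(values, x_min=1):
    """Approximate MLE of the exponent of a discrete power law:
    alpha ~= 1 + N / sum(ln(x / (x_min - 1/2))) over the N values x >= x_min.

    values: iterable of non-negative integers (e.g. out-degrees)
    x_min:  lower cut-off of the power-law tail (chosen by the analyst)
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    return 1 + len(tail) / sum(math.log(x / (x_min - 0.5)) for x in tail)
```

A heavier tail (more very large degrees) increases the sum of logarithms and therefore yields a smaller exponent. The estimate is only meaningful if a power law actually fits the tail, which Clauset et al. recommend checking with a goodness-of-fit test.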

Bruns & Stieglitz (2013) stated that a minority of users is very active or popular in a Twitter network. In the 1st_Round network, users from and were responsible for, respectively, and of the total number of tweets. Users from and received, respectively, and of the total number of retweets. As a consequence, on the one hand, the 90% least active users were responsible for posting most of the mentions and retweets. On the other hand, the top most popular users () received most of the mentions and retweets.
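Statements of the form "the top x% of users received y% of the retweets and mentions" can be computed from each user's popularity (weighted out-degree). A minimal sketch, with a hypothetical function name, assuming a `{user: score}` dictionary and rounding the number of top users to the nearest integer (with at least one user counted):

```python
def top_share(scores, fraction):
    """Share of the total score held by the top `fraction` of items.

    scores: dict mapping item -> score (e.g. a user's weighted out-degree)
    At least one item is always counted as "top" (our assumption).
    """
    ranked = sorted(scores.values(), reverse=True)
    k = max(1, round(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


if __name__ == "__main__":
    popularity = {user: (100 if user == 0 else 1) for user in range(10)}
    print(top_share(popularity, 0.10))   # one user holds 100 of 109 units
```

Applied to activity (weighted in-degree) instead of popularity, the same function gives the share of mentions and retweets made by the most active users.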

Also, of users from were among the least active users, i.e., they belonged to . A total of of users who ranked between the and the most popular users (in ) are also between the and most active ones (in ). Only of the top most popular users (in ) belong to the top most active users (in ). None of the 0.1% most popular users, i.e. in , is among the top 0.1% most active users, i.e. in . Since users from are responsible for of the tweets, it is possible to conclude that the most popular users are not necessarily the most active ones. This conclusion makes sense, since the top most popular users are news organizations or public figures, such as politicians and journalists, who receive many more retweets than the number of tweets they actually post. The most popular users must therefore post popular tweets that receive most of the retweets and mentions.
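The overlap percentages discussed above compare each user's popularity category with their activity category. Given two category assignments (for instance, rank-based splits of popularity and of activity), the computation can be sketched as follows; the function name and the `{user: category}` input format are hypothetical.

```python
def overlap_share(tiers_a, tier_a, tiers_b, tier_b):
    """Fraction of the users in category `tier_a` of one ranking who also
    fall in category `tier_b` of another ranking.

    tiers_a, tiers_b: dicts mapping user -> category (e.g. popularity and
                      activity categories); returns 0.0 for an empty category.
    """
    members = [u for u, t in tiers_a.items() if t == tier_a]
    if not members:
        return 0.0
    shared = sum(1 for u in members if tiers_b.get(u) == tier_b)
    return shared / len(members)


if __name__ == "__main__":
    popularity = {"a": 1, "b": 2, "c": 2, "d": 4}
    activity = {"a": 4, "b": 2, "c": 4, "d": 4}
    print(overlap_share(popularity, 2, activity, 2))  # b is shared, c is not
```

A value of zero for the top categories, as reported for the 0.1% most popular and most active users, means that the two groups are disjoint.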

Table 4 shows the percentage of the tweets in each popularity category that were posted by users from each popularity category. As the table shows, the more popular the tweets, the more popular the users who posted them.

Users
Tweets 0.26% 94.88% 4.34% 0.52%
0% 27.96% 67.68% 4.37%
0% 0% 44.16% 55.84%
0% 0% 0% 100%
Table 4: Percentage of tweets posted by users from each popularity category for network 1st_Round.

Table 5 summarizes the activity and popularity of users in the networks introduced in this paper. In this table, column “Network” identifies the network under analysis; column “Contribution to tweets posted” presents the percentage of mentions and retweets sent by the users from each activity category; column “Contribution to retweets or mentions” shows the percentage of mentions and retweets received by the users from each popularity category; column “Popular users ranked as active” presents the percentage of users in popularity categories who are also in the corresponding activity categories; and column “Popular tweets posted by users” presents the percentage of the 90% least and 10% most popular tweets posted by the 90% least and 10% most popular users, respectively.

Network Contribution to Contribution to Popular users Popular tweets
tweets posted retweets or mentions ranked as active posted by users
in in in in in
Politicians 53.24% 30.66% 11.03% 5.07% 5.11% 29.15% 36.88% 28.87% 91.04% 15.08% 5.56% 44.59% 100%
Lula 44.73% 35.61% 14.05% 5.62% 2.91% 25.86% 39.14% 32.09% 91.11% 15.03% 5.97% 90.34% 9.63%
Bolsonaro 58.65% 29.89% 6.8% 4.65% 22.37% 34.37% 18.62% 24.64% 89.94% 5.13% 10% 85.62% 16.67%
LavaJato 44.63% 35.76% 13.99% 5.61% 3.2% 26.86% 38.54% 31.39% 91.11% 15.23% 7.04% 91.29% 11.58%
1st_Round 61.08% 28.54% 8.28% 2.11% 1.88% 15.75% 29.62% 52.75% 89.77% 7.01% 1.18% 27.09% 100%
2nd_Round 61.53% 28.03% 8.31% 2.14% 0.87% 17.61% 35.43% 46.09% 89.5% 4.73% 1.39% 14.49% 100%
Table 5: Analysis of popularity and activity of users in the proposed networks.

The values in Table 5 corroborate the findings on network 1st_Round: the 90% least active users () are responsible for posting most of the mentions and retweets; the top most popular users receive most of the mentions and retweets; the top most popular users are mostly not ranked among the top most active; and less than of the to most popular users () are also ranked as the to most active users ().

However, for all the networks but Politicians, 1st_Round and 2nd_Round, the percentage of the top most popular users () who posted the top most popular tweets () ranged from to . According to this result, the tweets that received most of the retweets were not necessarily posted by the most popular users. The users’ popularity might therefore be due to: (i) retweets received from a plethora of tweets; or (ii) mentions received from other users.

In the next section, following Sluban et al. (2015), we analyze the influence of the most popular users on the communities.

6.2 Influence of the Most Popular Users and Tweets

The popularity of a community is defined by the sum of the popularity of its users, that is, the total out-degree of the vertices that belong to the community.
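The definition above reduces to summing out-degrees over the members of each community. The following minimal Python sketch computes community popularity and, optionally, the percentage share contributed by each user popularity category, in the spirit of Figure 5. It assumes the same arc direction used for out-degree in this paper (from the retweeted or mentioned user to the user who retweeted or mentioned them). The partition is taken as a given dictionary, for example the output of the Louvain method, and the function name is hypothetical.

```python
from collections import Counter


def community_popularity(arcs, community, user_cat=None):
    """Popularity of each community = sum of its members' out-degrees.

    Assumes arcs (u, v) point from the retweeted/mentioned user u to the user
    v who retweeted or mentioned u. `community` maps user -> community id,
    e.g. a Louvain partition. If `user_cat` (user -> popularity category) is
    given, also returns each category's percentage share of every community's
    popularity; communities with zero popularity get no shares.
    """
    out_deg = Counter(u for u, _ in arcs)
    totals, parts = Counter(), Counter()
    for user, c in community.items():
        totals[c] += out_deg[user]
        if user_cat is not None:
            parts[(c, user_cat[user])] += out_deg[user]
    shares = {key: 100.0 * value / totals[key[0]]
              for key, value in parts.items() if totals[key[0]] > 0}
    return dict(totals), shares


if __name__ == "__main__":
    arcs = [("a", "b"), ("a", "c"), ("b", "c")]
    totals, shares = community_popularity(
        arcs, {"a": 0, "b": 0, "c": 1}, user_cat={"a": 1, "b": 0, "c": 0})
    print(totals, shares)
```

In this toy example, a single popular member ("a") supplies two thirds of its community's popularity, which is the kind of concentration Figure 5 displays.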

Figure 4 presents the percentage of users from each popularity category in the communities of network 1st_Round. Each community is placed on the x-axis according to its size. Figure 5 shows the contribution of the users from each popularity category, in percentage, to the total popularity of the communities found by the Louvain method in network 1st_Round. Again, the x-axis indicates the size of the communities, that is, the number of vertices in each community.

Figure 4: Percentage of users in the communities found by the Louvain method in network 1st_Round.
Figure 5: Contribution of each popularity category, in percentage, to the popularity of the communities found by the Louvain method in network 1st_Round.

Although fewer in number, the top most popular users () account for the majority of the popularity of most communities with more than 28 vertices. Users from have also contributed significantly to the popularity of the communities. As expected, although the least popular users () make up the majority of users in the communities, they do not contribute to the popularity of the communities.

The social influence on a community is the percentage of the retweets or mentions received by its users that are made by users from other communities (Sluban et al., 2015). Figures 6 and 7 summarize the influence exerted by, respectively, users and tweets from each popularity category in the communities.
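The definition of social influence can be computed directly from the arcs and a community partition. The following minimal Python sketch returns it both per receiving user and per community. It assumes that each arc (u, v) means "v retweeted or mentioned u", so that u is the receiver. Averaging the per-user values within each popularity category is one plausible way to obtain curves like those in Figure 6, but the paper does not state this aggregation. The function name is hypothetical.

```python
from collections import Counter


def social_influence(arcs, community):
    """Per-user and per-community social influence, in percent.

    Influence = share of received retweets/mentions that are made by users
    from other communities. Assumes each arc (u, v) means "v retweeted or
    mentioned u", so u is the receiver.
    """
    recv_user, ext_user = Counter(), Counter()
    recv_comm, ext_comm = Counter(), Counter()
    for u, v in arcs:
        c = community[u]
        external = community[v] != c  # bool counts as 0 or 1
        recv_user[u] += 1
        recv_comm[c] += 1
        ext_user[u] += external
        ext_comm[c] += external
    per_user = {u: 100.0 * ext_user[u] / n for u, n in recv_user.items()}
    per_comm = {c: 100.0 * ext_comm[c] / n for c, n in recv_comm.items()}
    return per_user, per_comm


if __name__ == "__main__":
    per_user, per_comm = social_influence(
        [("a", "b"), ("a", "c"), ("b", "c")], {"a": 0, "b": 0, "c": 1})
    print(per_user, per_comm)
```

A value near 0 means that a user's or community's retweets and mentions come almost entirely from inside its own community.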

Figure 6: Influence exerted by users of the communities in network 1st_Round.
Figure 7: Influence exerted by tweets of the communities in network 1st_Round.

The least popular users have an average influence very close to , meaning that most of the retweets and mentions they receive are made by users within their own community. Users from and have higher social influence on their communities than . A similar conclusion can be drawn about the influence of the most retweeted tweets on communities. The social influence of tweets from , and is substantially higher than that of users from , which suggests that the most retweeted tweets are indeed good candidates to be viral tweets.

We classify a tweet as viral if its virality is greater than the threshold , that is, if at least of the retweets it receives come from communities other than the one where it originated. Figure 8 shows the percentage of viral tweets in the communities of network 1st_Round for each tweet popularity category. According to this figure, the more popular a tweet, the higher its chance of being viral. In particular, all of the most popular tweets () are viral.
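The virality of a tweet, as defined above, is the fraction of its retweets that come from outside the community of its author. The following minimal Python sketch computes it and applies a threshold. The threshold value used in the paper is not reproduced here. The value 0.3 in the usage example is arbitrary, and the function names are hypothetical.

```python
from collections import Counter


def virality(retweets, author_of, community):
    """Fraction of each tweet's retweets coming from outside its origin community.

    retweets:  iterable of (tweet_id, retweeter_id) pairs
    author_of: tweet id -> id of the user who posted it
    community: user id  -> community id
    """
    total, outside = Counter(), Counter()
    for tweet, user in retweets:
        origin = community[author_of[tweet]]
        total[tweet] += 1
        outside[tweet] += community[user] != origin
    return {t: outside[t] / n for t, n in total.items()}


def viral_tweets(vir, threshold):
    """Tweets whose virality is strictly greater than `threshold`."""
    return {t for t, v in vir.items() if v > threshold}


if __name__ == "__main__":
    vir = virality([("t1", "b"), ("t1", "c"), ("t2", "c")],
                   {"t1": "a", "t2": "c"}, {"a": 0, "b": 0, "c": 1})
    print(vir, viral_tweets(vir, threshold=0.3))
```

Tweet "t1" receives half of its retweets from another community and is classified as viral. Tweet "t2" is retweeted only inside its own community, so it remains "trapped".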

Figure 8: Percentage of viral tweets in the communities of network 1st_Round.

Table 6 presents a brief analysis of the communities found by the Louvain method in the networks introduced in this paper. In this table, columns “Number of users in communities (%)” and “Popularity of users in communities (%)” show the percentage of users from each popularity category and their popularity, respectively, in the communities. Column “Viral tweets” shows the percentage of the most popular tweets that are viral. The columns marked “avg” and “std” show, respectively, the average and standard deviation of the aforementioned values.

Network | Number of users in communities (%) | Popularity of users in communities (%) | Viral tweets (%) | Virality of tweets (%)
(each cell: avg std; the first two column groups list the four popularity categories, from least to most popular)
Politicians | 88.85 15.46; 10.75 15.39; 0.38 1.09; 0.02 0.07 | 42.4 46.81; 65.29 35.43; 56.93 27.27; 55.85 27.32 | 32.78 31.41 | 10.14 10.78
Lula | 78.62 18.1; 20.95 18.27; 0.41 0.72; 0.03 0.07 | 11.6 22.96; 65.69 34.08; 53.07 24.22; 54.05 29.54 | 100 0 | 91.11 11.36
Bolsonaro | 87.58 9.16; 11.67 9.7; 0.64 1.12; 0.11 0.37 | 31.45 19.94; 49.03 25.99; 62.2 26.66; 75.37 16.26 | 97.44 9.25 | 87.35 19.01
LavaJato | 80.2 14.16; 19.45 14.31; 0.33 0.68; 0.02 0.07 | 17.77 29.22; 67.76 32.26; 50.2 25.04; 51.54 31.1 | 100 0 | 92.55 9.36
1st_Round | 71.47 17.31; 28.1 17.63; 0.4 0.76; 0.03 0.05 | 0 0; 70.43 37.59; 36.07 22.88; 58.56 13.1 | 55.54 40.09 | 4.6 6.33
2nd_Round | 74.72 15.6; 24.91 15.93; 0.34 0.66; 0.03 0.05 | 0 0; 72.7 37.87; 41.14 25.69; 56.4 21.42 | 47.24 41.86 | 7.99 9.98
Table 6: Analysis of the communities found by Louvain for the constructed networks.

On the one hand, the “Number of users in communities (%)” and “Popularity of users in communities (%)” results in Table 6 confirm the preceding analysis of network 1st_Round: the most popular users are responsible for most of the popularity of the communities. On the other hand, in the Lula, Bolsonaro and LavaJato networks, all or almost all of the most popular tweets are viral.

Except for the Politicians, 1st_Round and 2nd_Round networks, tweets had viralities higher than . The average virality of tweets in the Politicians, 1st_Round and 2nd_Round networks, on the other hand, was lower than .

The next section analyzes the growth patterns of network 1st_Round.

6.3 Temporal Analysis of 1st_Round Network

To understand how the popularity and activity of users in network 1st_Round evolve over time, we took snapshots of the network every 5000 tweets. Each snapshot represents the network constructed from the tweets posted up to that point. Figure 9 presents the average out-degree, that is, the number of retweets received by users from each popularity category, across the snapshots. Figure 10 presents the average in-degree, that is, the number of mentions and retweets made by users from each activity category, across the snapshots.
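The snapshot procedure can be sketched in a few lines of Python. The minimal version below grows the network cumulatively and records the average out-degree per popularity category after every `step` collected tweets. Several of its details are assumptions: each collected tweet is represented as the list of arcs it creates, arcs point from the retweeted or mentioned user to the user who retweets or mentions them, and categories are fixed in advance (for example, from the final network). The same loop with in-degree and activity categories would produce the curves of Figure 10.

```python
from collections import Counter, defaultdict


def snapshot_average_out_degree(tweets, user_cat, step=5000):
    """Average out-degree per popularity category in cumulative snapshots.

    tweets:   chronologically ordered list; each item is the list of arcs
              (u, v) created by one collected tweet, u being the user who is
              retweeted or mentioned (assumed convention)
    user_cat: user id -> popularity category (e.g. from the final network)
    Returns one dict {category: average out-degree} per snapshot, taken after
    every `step` tweets and once more at the end if needed.
    """
    out_deg = Counter()
    seen = set()
    snapshots = []
    for i, arcs in enumerate(tweets, start=1):
        for u, v in arcs:
            out_deg[u] += 1
            seen.update((u, v))
        if i % step == 0 or i == len(tweets):
            groups = defaultdict(list)
            for user in seen:
                groups[user_cat[user]].append(out_deg[user])
            snapshots.append({c: sum(d) / len(d) for c, d in groups.items()})
    return snapshots


if __name__ == "__main__":
    stream = [[("a", "b")], [("a", "c")], [("b", "c")]]
    print(snapshot_average_out_degree(stream, {"a": 1, "b": 0, "c": 0}, step=2))
```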

Figure 9: Average out-degree of users in popularity categories along time windows.
Figure 10: Average in-degree of users in activity categories along time windows.

The top most popular users () received an increasing number of retweets over time, as shown by their growing average out-degree. This observation agrees with the remark that, in scale-free networks, new vertices are more likely to connect to the existing vertices with the highest degrees (preferential attachment).

Figure 11 presents the evolution of network 1st_Round over time. For each snapshot, it reports the total number of communities obtained by the Louvain method, the average community size, the total numbers of arcs and nodes, the number of new tweets and the total number of distinct tweets in the network. As shown in Table 2, network 1st_Round has 135865 nodes and 242679 arcs. Figure 11 shows that the number of new tweets that appear in the network at each snapshot is nearly constant. However, at snapshot 48, the number of new tweets drops significantly and then immediately starts to rise again.

Figure 11: Evolution of network 1st_Round over time.

On average, 15% of the new arcs that appear in the network in a given snapshot represent retweets and mentions of new tweets. The other 85% represent new retweets and mentions of tweets that were already in the network in previous snapshots. The number of arcs created from a tweet decreases rapidly over time. Figure 9 shows that the popularity of users from stops increasing only at snapshot 48, where the number of new tweets is smaller.
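The 15%/85% split can be obtained by checking, snapshot by snapshot, whether each new arc refers to a tweet that no earlier snapshot had referenced. The following minimal Python sketch assumes that the collected stream is a chronological list of (tweet_id, arc) pairs, where tweet_id is the original tweet being retweeted or mentioned and each pair corresponds to one collected tweet. This input format and the function name are hypothetical.

```python
def new_tweet_arc_share(events, step=5000):
    """Per snapshot, fraction of new arcs that point to tweets not seen before.

    events: chronologically ordered (tweet_id, arc) pairs, where tweet_id is
            the original tweet being retweeted or mentioned (assumed format)
    A tweet counts as "new" in a snapshot if no earlier snapshot referenced it.
    """
    known = set()
    shares = []
    for start in range(0, len(events), step):
        chunk = events[start:start + step]
        fresh = sum(tweet not in known for tweet, _ in chunk)
        shares.append(fresh / len(chunk))
        known.update(tweet for tweet, _ in chunk)
    return shares


if __name__ == "__main__":
    events = [("t1", "x"), ("t1", "y"), ("t2", "z"), ("t1", "w")]
    print(new_tweet_arc_share(events, step=2))
```

Averaging the returned shares over all snapshots gives a figure comparable to the 15% reported above.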

Figure 12: Duration of each snapshot in seconds.

Figure 12 shows the duration (in seconds) of each snapshot. The average duration of a snapshot was 108 seconds, that is, 5000 tweets were collected every 108 seconds on average. The average durations of the snapshots before and after snapshot 48 were, respectively, 110 and 104 seconds. Snapshot 48 was collected when the results of the first round of the elections were disclosed at 19:00:00 Brasília Time (BRT), while snapshot 49 was collected from 19:00:38 to 19:02:23 BRT. On the one hand, the average duration of the snapshots obtained after the release of the first-round results decreased. That is, the 5000 tweets of each snapshot were collected in a shorter time window, indicating that Twitter users were more active. On the other hand, the number of new tweets appearing in snapshot 49 is smaller than in previous snapshots. This possibly means that, during the disclosure of the election results, users created fewer new tweets and interacted more with previously posted tweets. However, the rate of new tweets soars after snapshot 48, and these tweets were probably about the released results.
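The snapshot durations translate directly into collection rates. With 5000 tweets per snapshot, 108 seconds corresponds to about 46.3 tweets per second overall. The rate rises from about 45.5 (110 s) before snapshot 48 to about 48.1 (104 s) after it. The following minimal Python sketch computes the durations from sorted collection timestamps. Measuring a snapshot from its first to its last tweet is an assumption, since the paper does not state how durations were measured.

```python
def snapshot_durations(timestamps, step=5000):
    """Seconds between the first and last tweet of each block of `step` tweets.

    timestamps: collection times in seconds, sorted ascending. Measuring a
    snapshot from its first to its last tweet is an assumption.
    """
    return [timestamps[min(start + step, len(timestamps)) - 1] - timestamps[start]
            for start in range(0, len(timestamps), step)]


if __name__ == "__main__":
    print(snapshot_durations([0, 10, 20, 35, 50], step=2))
    print(round(5000 / 108, 1), round(5000 / 110, 1), round(5000 / 104, 1))
```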

7 Context and Sentiment of the Communities in Relation to the Elections

The output of a community detection algorithm provides a straightforward classification of users into communities. Tweets can then be classified according to the communities of the users who posted them. In this section, we study the context of the communities and their sentiment with respect to the words in the tweets of networks 1st_Round and 2nd_Round.

Prior to the sentiment analysis, we first selected the words used at least 10 times in the communities with at least 1000 vertices. Then, we translated all the words into English. Finally, we employed the R library sentimentr (Rinker, 2016) to analyze the sentiment of the selected words in each community. Each word is associated with a score in the range [-1, 1], where -1 and 1 indicate the most negative and the most positive sentiments, respectively.
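The word-selection step can be sketched in Python as follows. This is a minimal version, not the authors' pipeline: it assumes the tweets are already translated into English and grouped by community, and it applies the 10-occurrence filter within each community. It also uses a simple lowercase regular-expression tokenizer and a small dictionary `lexicon` as a hypothetical stand-in for the sentimentr lexicon, which is an R package and is not reproduced here.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")


def select_words(texts_by_comm, comm_size, min_count=10, min_size=1000):
    """Per community, words used at least `min_count` times.

    texts_by_comm: community id -> list of tweet texts (assumed already
                   translated into English)
    comm_size:     community id -> number of vertices
    Communities with fewer than `min_size` vertices are skipped.
    """
    selected = {}
    for c, texts in texts_by_comm.items():
        if comm_size[c] < min_size:
            continue
        counts = Counter(w for text in texts for w in TOKEN.findall(text.lower()))
        selected[c] = {w: n for w, n in counts.items() if n >= min_count}
    return selected


def word_scores(words, lexicon):
    """Look up each word's score in [-1, 1]; words missing from the lexicon are dropped."""
    return {w: lexicon[w] for w in words if w in lexicon}


if __name__ == "__main__":
    sel = select_words({0: ["good good bad"] * 5, 1: ["good"] * 20}, {0: 1500, 1: 10})
    print(sel, word_scores(sel[0], {"good": 0.5, "bad": -0.75}))
```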

Figures 13(a) and 13(b) illustrate word clouds with the most frequent words in the tweets from networks 1st_Round and 2nd_Round. The word cloud in Figure 13(a) shows that the names of the three most-voted candidates in the first round of the elections were also among the most used words. This word cloud also shows the words “segundo” and “turno”, which mean “second” and “round” in English, respectively, suggesting interest in the second round already during the first round of the elections. The word cloud in Figure 13(b) shows the names of the two candidates who ran in the second round of the elections.

(a) Word Cloud for network 1st_Round.
(b) Word Cloud for network 2nd_Round.
Figure 13: Word clouds with the most frequent words.

Figures 14(a) and 14(b) show the sentiment of the most common words in the tweets from networks 1st_Round and 2nd_Round. The x-axis in the figures shows the size of the communities, i.e., their number of vertices, and the y-axis shows the sentiment of the words. The center of each circle indicates the sentiment of a word, whereas its diameter indicates the word's frequency in the collected data: larger diameters indicate words used more often. These figures show that the sentiments and frequencies of the words in the data collected for the first and second rounds of the elections are similar. The larger the community, the higher the frequency of the words. Moreover, although the numbers of words associated with negative and positive sentiments are similar, the words associated with positive sentiments appear more often.

(a) Network 1st_Round.
(b) Network 2nd_Round.
Figure 14: Sentiments associated with the most common words on the communities.

Because the words associated with positive sentiments are more frequent, the sentiment of the words that appear more often in the tweets should weigh more in an overall sentiment score. Therefore, we calculate the average sentiment score of each community as the sum of the sentiments associated with each word multiplied by its frequency in the community.
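The community score described above is a frequency-weighted sum, score(C) = Σ_w sentiment(w) · freq_C(w). The following minimal Python sketch implements it. The `lexicon` dictionary is a hypothetical stand-in for the word scores produced by sentimentr. The optional `normalize=True` variant divides by the total frequency of the scored words to obtain a weighted mean in [-1, 1]. The paper does not state whether such a normalization was applied.

```python
def community_sentiment(word_counts, lexicon, normalize=False):
    """Frequency-weighted sentiment of one community.

    word_counts: word -> number of occurrences in the community
    lexicon:     word -> score in [-1, 1] (hypothetical stand-in for sentimentr)
    With normalize=False this is the sum sum_w score(w) * freq(w); with
    normalize=True it is divided by the total count of scored words
    (an optional variant, not confirmed by the source).
    """
    scored = {w: n for w, n in word_counts.items() if w in lexicon}
    total = sum(lexicon[w] * n for w, n in scored.items())
    if normalize:
        count = sum(scored.values())
        return total / count if count else 0.0
    return total


if __name__ == "__main__":
    counts = {"good": 10, "bad": 5, "the": 40}
    lexicon = {"good": 0.5, "bad": -0.75}
    print(community_sentiment(counts, lexicon))
    print(community_sentiment(counts, lexicon, normalize=True))
```

Although "bad" has the stronger score in absolute value, "good" occurs twice as often, so the community score is positive. This is the effect described in the text.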

Figures 15 and 16 present the average sentiment score of the communities with at least 1000 vertices. In these figures, the vertices belonging to each community found by the Louvain method are collapsed into a supernode, and an edge between two supernodes indicates the existence of arcs between their vertices. The larger a supernode, the higher the number of vertices in the community. Similarly, the thickness of an edge is proportional to the number of arcs between the communities. Additionally, for the 7 largest communities, these figures show the three positive and the three negative words with the highest and lowest scores, respectively. Each word is shown in Portuguese next to a green or red square whose color intensity is proportional to the sentiment score and to the number of times the word was used in the community.
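Collapsing communities into supernodes amounts to counting vertices per community and arcs between each pair of communities. The following minimal Python sketch does this with plain dictionaries and keeps only communities above a size threshold. Following the figure description, it treats supernode edges as undirected and counts arcs in both directions. The function name is hypothetical.

```python
from collections import Counter


def supernode_graph(arcs, community, min_size=1000):
    """Collapse communities into supernodes.

    Returns (sizes, edges): sizes maps each kept community to its number of
    vertices; edges maps an unordered pair of kept communities to the number
    of arcs between their vertices (in either direction). Arcs inside a
    community and arcs touching smaller communities are ignored.
    """
    sizes = Counter(community.values())
    kept = {c for c, n in sizes.items() if n >= min_size}
    edges = Counter()
    for u, v in arcs:
        cu, cv = community[u], community[v]
        if cu != cv and cu in kept and cv in kept:
            edges[tuple(sorted((cu, cv)))] += 1
    return {c: sizes[c] for c in kept}, dict(edges)


if __name__ == "__main__":
    arcs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
    print(supernode_graph(arcs, {"a": 0, "b": 0, "c": 1}, min_size=1))
```

Supernode sizes and edge counts can then be mapped to drawing attributes (node size, edge thickness) by whatever plotting tool is used.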

Figure 15: Average sentiment score and words associated with the highest positive and negative scores in communities considering network 1st_Round.
Figure 16: Average sentiment score and words associated with the highest positive and negative scores in communities considering network 2nd_Round.

The scores presented in Figures 15 and 16 show that, on average, the sentiment associated with the words in the communities was positive. This result, however, is due to the higher frequency of positive words. These words are also among the most frequent words shown in the word clouds of Figures 13(a) and 13(b).

Moreover, the words associated with the highest scores are the same in all the communities in both Figures 15 and 16. These words occur in tweets classified as , i.e., the most popular tweets, which, according to Figure 8, are viral tweets. This observation is consistent with Weng et al. (2013), who state that viral tweets spread across communities.

8 Final Remarks and Future Works

This paper presents a thorough analysis of the structure and topology of networks constructed from political discussions on Twitter and, in particular, discusses networks composed of tweets expressing reactions to the 2018 Brazilian presidential elections.

These networks are scale-free, and a minority of their users receive most of the “retweets” and “mentions” while posting few tweets themselves. The evolution of the networks over time showed that new users are more likely to interact with the most popular users. We verified that the findings of Weng et al. (2013) on the spreading patterns of hashtags also hold for the spreading of retweets in a political network about the 2018 Brazilian presidential elections: most tweets were retweeted by users within the community of the user who originally posted them, but a minority of tweets were retweeted from several different communities and spread like viruses.

The most popular users, although few in number, dictate the popularity of the communities, and the higher their popularity, the greater their influence on other communities. These users are also responsible for posting viral tweets.

Moreover, we studied the sentiment of the most frequent words in tweets about the 2018 Brazilian presidential elections and found that, overall, words associated with positive sentiments predominate. The words that contribute most to the positive sentiment of the communities are the same across communities and appear in viral tweets.

As future work, we intend to further investigate and classify tweets from larger databases, in particular reactions to the 2018 Truckers’ Strike and to the 2018 World Cup. To deal with large-scale networks, we plan to design distributed algorithms. In addition, studying the evolution of the networks with respect to the spreading patterns of tweets might help us understand how communities form and how tweets proliferate over time. Finally, the observation that viral tweets do not affect the community structure of the networks, while non-viral tweets get trapped within communities, could be exploited by community detection algorithms to find better communities.

Acknowledgments

The authors would like to acknowledge the financial support provided by São Paulo Research Foundation (FAPESP): Grant Numbers 2016/22688-2; 2017/24185-0 and 2015/21660-4; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq): Grant Number 306036/2018-5; and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. The authors are grateful to Professor Ana Carolina Lorena (Instituto Tecnológico de Aeronáutica) for providing a thoughtful review of the first draft of this paper. The last author would also like to thank Leonardo V. Rosset for giving her a hand.

References

  • Ahmadi, P., Gholampour, I. & Tabandeh, M. (2018). Cluster-based sparse topical coding for topic mining and document clustering. Advances in Data Analysis and Classification, 12(3), 537–558.
  • Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
  • Bruns, A. & Stieglitz, S. (2013, March). Towards more systematic Twitter analysis: metrics for tweeting activities. International Journal of Social Research Methodology, 16(2), 91–108.
  • Cambria, E., Schuller, B., Xia, Y. & Havasi, C. (2013). New avenues in opinion mining and sentiment analysis. IEEE Intelligent Systems, 28(2), 15–21.
  • Conover, M. D., Goncalves, B., Ratkiewicz, J., Flammini, A. & Menczer, F. (2011, October). Predicting the political alignment of Twitter users. In 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing (pp. 192–199).
  • Csardi, G. & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5), 1–9.
  • Ding, X. & Liu, B. (2007). The utility of linguistic rules in opinion mining. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 811–812).
  • Francisquini, R., Rosset, V. & Nascimento, M. C. V. (2017). GA-LP: A genetic algorithm based on label propagation to detect communities in directed networks. Expert Systems with Applications, 74, 127–138.
  • Gaul, W. & Vincent, D. (2017). Evaluation of the evolution of relationships between topics over time. Advances in Data Analysis and Classification, 11(1), 159–178.
  • Golbeck, J., Grimes, J. M. & Rogers, A. (2010). Twitter use by the U.S. Congress. Journal of the Association for Information Science and Technology, 61(8), 1612–1621.
  • Kastrenakes, J. (2019). Twitter keeps losing monthly users, so it’s going to stop sharing how many. The Verge. https://www.theverge.com/2019/2/7/18213567/twitter-to-stop-sharing-mau-as-users-decline-q4-2018-earnings
  • Lancichinetti, A., Radicchi, F., Ramasco, J. J. & Fortunato, S. (2011). Finding statistically significant communities in networks. PLOS ONE, 6(4), 1–18.
  • Larsson, A. O. & Moe, H. (2012). Studying political microblogging: Twitter users in the 2010 Swedish election campaign. New Media & Society, 14(5), 729–747.
  • Leicht, E. A. & Newman, M. E. (2008). Community structure in directed networks. Physical Review Letters, 100(11), 118703.
  • Makazhanov, A., Rafiei, D. & Waqar, M. (2014). Predicting political preference of Twitter users. Social Network Analysis and Mining, 4(1), 193.
  • Newman, M. E. (2004). Analysis of weighted networks. Physical Review E, 70(5), 056131.
  • Newman, M. E. (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3), 036104.
  • Newman, M. E. & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2), 026113.
  • Odilla, F. (2018). Eleições 2018: Os pré-candidatos à Presidência e quais dificuldades têm de superar até a campanha. http://www.bbc.com/portuguese/brasil-42313908. Accessed on 10 May 2018.
  • Pak, A. & Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining. In LREC (Vol. 10, pp. 1320–1326).
  • Pennacchiotti, M. & Popescu, A.-M. (2011). Democrats, Republicans and Starbucks afficionados: user classification in Twitter. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 430–438).
  • Pons, P. & Latapy, M. (2005). Computing communities in large networks using random walks. In International Symposium on Computer and Information Sciences (pp. 284–293).
  • Raghavan, U. N., Albert, R. & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3), 036106.
  • Raudys, S. J. & Jain, A. K. (1991). Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(3), 252–264.
  • Rinker, T. (2016). Package ‘sentimentr’. https://cloud.r-project.org/web/packages/sentimentr/sentimentr.pdf
  • Rosvall, M. & Bergstrom, C. T. (2007). An information-theoretic framework for resolving community structure in complex networks. Proceedings of the National Academy of Sciences, 104(18), 7327–7331.
  • Santos, C. P., Carvalho, D. M. & Nascimento, M. C. V. (2016). A consensus graph clustering algorithm for directed networks. Expert Systems with Applications, 54, 121–135.
  • Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423.
  • Sluban, B., Smailović, J., Battiston, S. & Mozetič, I. (2015, July 14). Sentiment leaning of influential communities in social networks. Computational Social Networks, 2(1), 9.
  • Sun, S., Luo, C. & Chen, J. (2017). A review of natural language processing techniques for opinion mining systems. Information Fusion, 36, 10–25.
  • Surian, D., Nguyen, D. Q., Kennedy, G., Johnson, M., Coiera, E. & Dunn, A. G. (2016). Characterizing Twitter discussions about HPV vaccines using topic modeling and community detection. Journal of Medical Internet Research, 18(8), e232.
  • Tang, J., Chang, Y. & Liu, H. (2014). Mining social media with social theories: a survey. ACM SIGKDD Explorations Newsletter, 15(2), 20–29.
  • Tedjamulia, S. J. J., Dean, D. L., Olsen, D. R. & Albrecht, C. C. (2005). Motivating content contributions to online communities: Toward a more comprehensive theory. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences (p. 193b).
  • Twitter. (2018). Twitter Streaming API. https://developer.twitter.com/en/docs/tweets/filter-realtime/overview. Accessed on 15 April 2018.
  • Wang, X., Qian, B., Ye, J. & Davidson, I. (2013). Multi-objective multi-view spectral clustering via Pareto optimization. In Proceedings of the 2013 SIAM International Conference on Data Mining (pp. 234–242).
  • Wang, X., Wei, F., Liu, X., Zhou, M. & Zhang, M. (2011). Topic sentiment analysis in Twitter: a graph-based hashtag sentiment classification approach. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (pp. 1031–1040).
  • Weller, K., Bruns, A., Burgess, J., Mahrt, M. & Puschmann, C. (2013). Twitter and society. Bern, Switzerland: Peter Lang.
  • Weng, L., Menczer, F. & Ahn, Y.-Y. (2013). Virality prediction and community structure in social networks. Scientific Reports, 3, 2522.
  • Yang, Z., Algesheimer, R. & Tessone, C. J. (2016). A comparative analysis of community detection algorithms on artificial networks. Scientific Reports, 6, 30750.
  • Yardi, S. & Boyd, D. (2010). Dynamic debates: An analysis of group polarization over time on Twitter. Bulletin of Science, Technology & Society, 30(5), 316–327.