The Impact of Bots on Opinions in Social Networks

10/29/2018
by Zakaria el Hjouji, et al.
MIT

We present an analysis of the impact of automated accounts, or bots, on opinions in a social network. We model the opinions using a variant of the famous DeGroot model, which connects opinions with network structure. We find a strong correlation between opinions based on this network model and based on the tweets of Twitter users discussing the 2016 U.S. presidential election between Hillary Clinton and Donald Trump, providing evidence supporting the validity of the model. We then utilize the network model to predict what the opinions would have been if the network did not contain any bots which may be trying to manipulate opinions. Using a bot detection algorithm, we identify bot accounts which comprise less than 1% of the network, and we find twice as many bots supporting Donald Trump as there are supporting Hillary Clinton. We remove the bots from the network and recalculate the opinions using the network model. We find that the bots produce a significant shift in the opinions, with the Clinton bots producing almost twice as large a change as the Trump bots, despite being fewer in number. Analysis of the bot behavior reveals that the large shift is due to the fact that the bots post one hundred times more frequently than humans. The asymmetry in the opinion shift is due to the fact that the Clinton bots post 50% more frequently than the Trump bots. Our results suggest a small number of highly active bots in a social network can have a disproportionate impact on opinions.



1 Introduction

Social networks have given us the ability to spread messages and influence large populations very easily. Malicious actors can take advantage of social networks to manipulate opinions using artificial accounts, or bots. It is suspected that the 2016 U.S. presidential election was the victim of such social network interference, potentially by foreign actors (Parlapiano and Lee 2018). Foreign influence bots are also suspected of having attacked European elections (Ferrara 2017). The bots' main action was sharing politically polarized content in an effort to shift opinions (Shane 2018). The potential threat to election security from social networks has become a concern for the U.S. government. Members of Congress have not been satisfied with the response of major social networks (Fandos and Shane 2017) and have asked them to take actions to prevent future interference in the U.S. democratic process by foreign actors (Price 2018). In response, major social media companies have taken serious steps. Facebook has identified several pages and accounts tied to foreign actors (O'Sullivan and Herb 2018) and Twitter suspended over 70 million bot accounts (Timberg and Dwoskin 2018).

Despite all of the efforts taken to counter the threat posed by bots, one important question remains unanswered: how many people were impacted by these influence campaigns? More generally, how can we quantify the effect of bots on the opinions of users in a social network? Answering this question would allow one to assess the potential threat of an influence campaign. Also, it would allow one to test the efficacy of different responses to the threat. Studies have looked at the volume of content produced by bots and their social network reach during the 2016 election (Bessi and Ferrara 2016). However, this data alone does not indicate the effectiveness of the bots in shifting opinions. The challenge is that we do not know what would have happened if the bots had not been there. Such a counterfactual analysis is only possible if there is a model which can predict the opinions of users in the presence or absence of bots. Such models do exist in the literature (DeGroot 1974, Ghaderi and Srikant 2013, Hunter and Zaman 2018). However, for a model to be useful in assessing the impact of bots, it must be validated on real social network data. Once validated, an opinion model can then be used to assess the impact of different groups of bots.

1.1 Our Contribution

In this work we present a method to quantify the impact of bots on the opinions of users in a social network. We focus our analysis on a network of Twitter users discussing the 2016 presidential election between Hillary Clinton and Donald Trump. The core of our approach is a model for opinion dynamics in a social network. First, we validate the model by showing that the user opinions predicted by the model align with these users' opinions based on their social media posts. Second, we identify bots in the network using a state-of-the-art algorithm. Third, we use the opinion model to calculate how the opinions shift when we remove the bots from the network. Our high-level finding is that a small number of bots have a disproportionate impact on the network opinions, and this impact is primarily due to their elevated activity levels. In our dataset, we find that the bots which support Clinton cause a bigger shift in opinions than the bots which support Trump, even though there are more Trump bots in the network.

2 Literature Review

A detailed study of bots in the 2016 presidential election was conducted by Bessi and Ferrara (2016). The authors found a large fraction of the election discussion came from bots and the bots were connected to many users. Similar conclusions were reached for bots deployed in the run up to the Brexit vote (Bastos and Mercea 2017) and French elections (Ferrara 2017). A comprehensive survey of social bots is provided in (Ferrara et al. 2016).

The detection of bots is an active area of research. Algorithms which use predictive features of individual users have been developed (Davis et al. 2016, Chu et al. 2012, Benevenuto et al. 2010, Wang 2010, Egele et al. 2013, Viswanath et al. 2014, Thomas et al. 2011), but require extensive data about the users. Another approach utilizes network structure to identify coordinated groups of bots, also referred to as sybils (Benevenuto et al. 2009, Aggarwal 2014, Cao et al. 2014, Yang et al. 2014, Ghosh et al. 2012, Tran et al. 2009, Yu et al. 2008, Danezis and Mittal 2009, Yu et al. 2006, Wang et al. 2013, Alvisi et al. 2013, Cao et al. 2012, Mesnards and Zaman 2018). The network based approaches look for anomalous behavior in network interactions. They have the advantage of requiring less data about the users and also being able to simultaneously detect multiple bots, unlike the feature based approaches.

Bots are designed to shift opinions. A variety of models have been developed to quantify such opinion shifts in networks. One of the earliest is the DeGroot model (DeGroot 1974), where users' opinions equal the weighted average of their neighbors' opinions. This model has a similar flavor to many distributed consensus algorithms (Tsitsiklis 1984, Tsitsiklis et al. 1986, Olshevsky and Tsitsiklis 2009, Jadbabaie et al. 2003), as the goal of each user is to reach consensus with his neighbors. Related to the DeGroot model is the voter model (Clifford and Sudbury 1973, Holley and Liggett 1975), where users update their opinions to match a randomly chosen neighbor. There is a large body of theoretical research concerning the limiting behavior of the voter model (Cox and Griffeath 1986, Gray 1986, Krapivsky 1992, Liggett 2012, Sood and Redner 2005). Another class of models takes a Bayesian perspective on how opinions evolve, where each message a user posts causes his neighbors to update their beliefs according to Bayes' theorem (Bikhchandani et al. 1992, Banerjee and Fudenberg 2004, Acemoglu et al. 2011, Banerjee 1992, Jackson 2010).

Some opinion models allow for stubborn users whose opinions do not evolve. The notion of stubborn users was introduced by Mobilia (2003). The impact of stubborn agents has been analyzed in a variety of opinion models (Galam and Jacobs 2007, Wu and Huberman 2004, Chinellato et al. 2015, Mobilia et al. 2007, Yildiz et al. 2013, Acemoğlu et al. 2013, Ghaderi and Srikant 2013). The model in Hunter and Zaman (2018) is similar in flavor to the DeGroot model, but allows users to grow stubborn with time. Common to all of these models is an opinion equilibrium in which the non-stubborn users' opinions are determined by the stubborn users.

3 Network Opinion Model

We consider users in a directed social network where each user follows a set of individuals, which we refer to as his friends. The user can see any social media content posted by his friends. To model the opinions in a social network we utilize the model proposed by Hunter and Zaman (2018) which is a variant of the classic DeGroot model (DeGroot 1974). The model assumes that individuals in a social network update their opinions based upon the opinions contained in the social media posts of their friends. Another assumption of the model is that certain users in the network are stubborn, meaning that their opinions do not change. These stubborn users end up driving the opinions of the rest of the users in the network.

Let us define the opinion of user $i$ in the network as $\theta_i$. We assume the opinions are between zero and one. As an example, in the 2016 U.S. presidential election, an opinion of zero indicates strong support for Hillary Clinton while an opinion of one indicates strong support for Donald Trump. Let us also define the posting rate of user $i$ as $\lambda_i$. The posting rate is in units of posts per day, and is easily measured. The network opinion model predicts that in equilibrium the opinion of a non-stubborn user $i$ satisfies the following condition:

$$\sum_{j \in N(i)} \lambda_j \left( \theta_j - \theta_i \right) = 0, \qquad (1)$$

where $N(i)$ is the set of friends of user $i$.

This equilibrium condition has a natural interpretation in terms of electrical circuits. One can view the user opinions as the voltages of nodes in an electrical circuit and the posting rate as the conductance between pairs of nodes connected by a resistor. Using this analogy, and Ohm’s Law from circuit theory, the quantity in the summation is equal to the electrical current flowing into a user from all of his friends. The equilibrium condition simply states that the total current flowing into each user must be zero. This same equilibrium and circuit interpretation was obtained for a very similar opinion model by Ghaderi and Srikant (2013).
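Rearranging this zero-current condition makes the DeGroot flavor of the model explicit: using the notation above, each non-stubborn opinion is the posting-rate-weighted average of the user's friends' opinions,

$$\theta_i = \frac{\sum_{j \in N(i)} \lambda_j \theta_j}{\sum_{j \in N(i)} \lambda_j}.$$

Friends who post more often thus carry proportionally more weight in determining a user's opinion.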

Equation (1) has a unique solution as long as each non-stubborn user can be reached by at least one stubborn user. This means that in equilibrium, the non-stubborn user opinions are determined by the stubborn users. Therefore, if the reachability condition is satisfied, then once the stubborn user opinions are known, the non-stubborn user opinions can be calculated using equation (1).
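Since equation (1) is linear in the unknown opinions, the equilibrium can be computed by solving one sparse linear system. Below is a minimal sketch of such a solver in Python; the data structures and function names are our own, not the authors' code.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def equilibrium_opinions(friends, rates, stubborn_opinion):
    """Solve equation (1) for the non-stubborn users.

    friends:          dict mapping each user to the set of friends they follow
    rates:            dict mapping each user to their posting rate (lambda)
    stubborn_opinion: dict mapping each stubborn user to their fixed opinion
    """
    free = [u for u in friends if u not in stubborn_opinion]
    idx = {u: k for k, u in enumerate(free)}
    A = sp.lil_matrix((len(free), len(free)))
    b = np.zeros(len(free))
    for u in free:
        i = idx[u]
        for v in friends[u]:
            A[i, i] += rates[v]  # coefficient of theta_i: sum of friends' rates
            if v in stubborn_opinion:
                b[i] += rates[v] * stubborn_opinion[v]  # known stubborn term
            else:
                A[i, idx[v]] -= rates[v]  # unknown non-stubborn friend
    theta = spsolve(A.tocsr(), b)
    return dict(zip(free, theta))
```

The system matrix is nonsingular exactly when the reachability condition above holds, so the solver then returns the unique equilibrium.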

4 Social Network Data

To utilize the network opinion model for our analysis, we first need to confirm that the opinions it predicts align with the true opinions of users in the social network. The dataset we use to validate the model consists of Twitter users discussing the second debate of the 2016 presidential election. The selected users were those who had at least one post, or tweet, which mentioned the second debate. We then collected all tweets of these users which were related to the 2016 election. Details on the dataset are provided in Littman et al. (2016). In total this dataset consists of over 2.3 million tweets belonging to 77,563 users. In addition, we were able to build the follower graph for these users using the Twitter API (Twitter 2018); this graph contains over 5.0 million edges.

For each user we want to calculate their opinion with respect to the two presidential candidates. We assume each user's opinion is between zero and one, with a zero or one indicating strong support for Clinton or Trump, respectively. The opinion of a user is manifested in the content of their tweets. Therefore, we use the tweets to estimate each user's opinion. We did this by training a neural network to predict the opinion expressed in a tweet's text.

We first needed a set of tweets with labeled opinions to serve as training data for the neural network. We began by identifying several extremely politically polarized hashtags, such as #ImWithHer or #LockHerUp, and manually labeled each of these hashtags as either pro-Clinton or pro-Trump. Any user who used these hashtags received the corresponding label. Then, all tweets in the dataset belonging to pro-Clinton users were given an opinion of zero, and all tweets of pro-Trump users were given an opinion of one. This process produced 200,000 labeled tweets which served as our training data.
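A minimal sketch of this weak-labeling step (only #ImWithHer and #LockHerUp come from the text; the data layout and function names are our own):

```python
# Hypothetical weak-labeling pass: 0 = pro-Clinton, 1 = pro-Trump.
PRO_CLINTON_TAGS = {"imwithher"}
PRO_TRUMP_TAGS = {"lockherup"}

def label_users(tweets):
    """tweets: iterable of (user_id, text) pairs. Returns {user_id: 0 or 1}."""
    labels = {}
    for user, text in tweets:
        tags = {w.lstrip("#").lower() for w in text.split() if w.startswith("#")}
        if tags & PRO_CLINTON_TAGS:
            labels[user] = 0
        elif tags & PRO_TRUMP_TAGS:
            labels[user] = 1  # a user matching both camps keeps the first label here
    return labels

def label_tweets(tweets, user_labels):
    # Every tweet of a labeled user inherits that user's label.
    return [(text, user_labels[user]) for user, text in tweets if user in user_labels]
```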

Next, we trained a neural network on the text of these tweets to predict their opinion. The full details of the neural network are provided in the appendix. We trained on 80% of the labeled data and tested on the remaining 20%. The neural network achieves an accuracy of 92% on the test set, indicating that it predicts the opinions well.

We then used the neural network to calculate tweet opinions for all users in our dataset. For each user we first computed the neural network opinion of each of their tweets, and then averaged these values to obtain the user's opinion. Looking at the distribution of the tweet opinions reveals some interesting properties of the data. For the 77,563 users in our follower graph, the mean opinion is 0.38, indicating a bias towards Clinton. Looking more closely at the opinion distribution, we find that there are 57,781 users whose opinion is below 0.5 (pro-Clinton) and 19,782 users whose opinion is above 0.5 (pro-Trump). It is not clear why this bias exists in the data. It is possible that pro-Clinton users were more likely to use the hashtags and keywords that were used to collect the tweets for the dataset.
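The per-user aggregation is plain averaging of the model's per-tweet outputs; a short sketch (names are illustrative, and the commented figures are the values reported above):

```python
import numpy as np

def user_opinions(tweet_preds):
    """tweet_preds: dict mapping each user to the list of their tweets'
    neural network opinions in [0, 1]."""
    return {u: float(np.mean(p)) for u, p in tweet_preds.items()}

def summarize(opinions):
    vals = np.array(list(opinions.values()))
    return {
        "mean": float(vals.mean()),              # reported: 0.38
        "pro_clinton": int((vals < 0.5).sum()),  # reported: 57,781
        "pro_trump": int((vals > 0.5).sum()),    # reported: 19,782
    }
```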

Figure 1: Visualization of the network of Twitter users discussing the second 2016 presidential debate. Node sizes are proportional to their follower-count in the network and node colors indicate their tweet based opinion. Nodes favoring Trump are red and nodes favoring Clinton are blue.

5 Model Validation

We assessed the validity of the network opinion model in Hunter and Zaman (2018) based on its ability to reproduce the tweet based opinions of users in our dataset. We show a visualization of the follower network of these users in Figure 1. To calculate the network based opinions, we first determine which users are stubborn. We do this by setting lower and upper opinion intervals for Clinton and Trump supporters. Any user whose tweet opinion falls within either of these intervals is declared stubborn. We set the network based opinions of these stubborn users equal to their tweet based opinions. For the posting rates, we use the number of tweets of each user in the dataset. This is a good estimate of the rates since the data covers the same period of time for each user.

We set the Clinton stubborn interval near zero and the Trump stubborn interval near one, at the extremes of the opinion scale. Using these stubborn intervals, we have 69,861 non-stubborn users and 7,702 stubborn users. The mean opinion of the stubborn users is 0.23, indicating that there is a greater number of stubborn users supporting Clinton than Trump. In total, 6,147 users have opinions in the Clinton interval and 1,555 users have opinions in the Trump interval. We then solve the system of equations given by (1) to obtain the network opinions of the non-stubborn users.
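In code, the stubborn-user selection and solve might look as follows, reusing the equilibrium_opinions sketch from Section 3. The interval endpoints below are hypothetical placeholders, not the paper's actual values:

```python
CLINTON_INTERVAL = (0.0, 0.2)  # hypothetical lower stubborn interval
TRUMP_INTERVAL = (0.8, 1.0)    # hypothetical upper stubborn interval

def stubborn_users(tweet_opinions):
    """Fix the network opinion of any user whose tweet opinion falls
    inside either stubborn interval."""
    (lo1, hi1), (lo2, hi2) = CLINTON_INTERVAL, TRUMP_INTERVAL
    return {u: op for u, op in tweet_opinions.items()
            if lo1 <= op <= hi1 or lo2 <= op <= hi2}

# network_opinions = equilibrium_opinions(friends, rates,
#                                         stubborn_users(tweet_opinions))
```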

To see how well the network model predicts the opinions, we look at the correlation between the tweet and network based opinions. We find that the correlation coefficient is equal to 0.43, a statistically significant value at this sample size. Thus, we find that there is a non-trivial relationship between the tweet and network based opinions. We plot the two types of opinions for the non-stubborn users in Figure 2. As can be seen, the network opinions have a similar mean to the tweet based opinions, but a lower variation. The mean tweet based opinion of the non-stubborn users is 0.40 and the mean network based opinion is 0.42. The standard deviation of the tweet based opinions is 0.19 and the standard deviation of the network based opinions is 0.04. These differences may be due to properties of the network model. The equilibrium equation (1) states that the opinions adjust to create a zero net opinion "current" flow into each non-stubborn user. This pulls neighboring opinions closer together. The net result is that the variation of non-stubborn opinions across the network is reduced.

Despite this difference in opinion variation, we still obtain a non-trivial correlation between the network model’s opinions and the tweet based opinions. Also, the mean opinions are very close. This indicates that the network model is useful for capturing the impact of stubborn users on the mean opinion of a population of non-stubborn users. For this reason, the network model will be sufficient for our analysis of how bots shift the mean opinion.

Figure 2: Scatter plot of non-stubborn users' tweet based opinions and network model based opinions. The stubborn intervals are those described above. The correlation coefficient for the tweet and network based opinions is 0.45.

To make sure the network opinions were robust to the choice of stubborn interval, we recalculated the opinions using several different intervals. Overall we found that the opinions were highly correlated and did not change significantly. This indicates that our results are robust to the choice of stubborn intervals. We summarize the robustness results in Table 1.

Lower stubborn interval   Upper stubborn interval   Minimum correlation coefficient   Mean correlation coefficient
–                         –                         0.71                              0.85
–                         –                         0.69                              0.78
–                         –                         0.64                              0.80
–                         –                         0.74                              0.85
–                         –                         0.74                              0.81
–                         –                         0.66                              0.80
–                         –                         0.64                              0.76

Table 1: Statistics of the correlation coefficient between the non-stubborn opinions for different stubborn intervals. The mean and minimum correlation coefficient for each stubborn interval with all other stubborn intervals are shown.
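The statistics in Table 1 can be computed along these lines; a minimal sketch, assuming the opinion vectors for each interval choice are aligned on a common set of users (e.g., users that are non-stubborn under every interval, an alignment the text does not spell out):

```python
import numpy as np

def robustness_stats(opinions_by_interval):
    """opinions_by_interval: {interval_label: np.array of network opinions},
    arrays aligned on the same users across interval choices."""
    stats = {}
    for a, va in opinions_by_interval.items():
        corrs = [np.corrcoef(va, vb)[0, 1]
                 for b, vb in opinions_by_interval.items() if b != a]
        stats[a] = {"min_corr": min(corrs), "mean_corr": float(np.mean(corrs))}
    return stats
```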

6 Bot Impact on Opinions

Bots are automated accounts which are most likely not influenced by content in the social network, and so must be stubborn. We now investigate the effect social network bots have on user opinions. We do this by first identifying the bots, and then calculating the opinion distribution using the network model when we remove the bots. To identify the bots, we use the algorithm of Mesnards and Zaman (2018). This algorithm looks for groups of accounts that often retweet others (a retweet is a forwarded tweet) but do not receive many retweets themselves in the interaction network related to an event. In our case, we use the network of retweets related to the second presidential debate. We chose this algorithm because it was shown to have higher accuracy than other state-of-the-art algorithms while requiring less data.

The bot detection algorithm assigns each user a likelihood of being a bot between zero and one. We declared any user whose likelihood was greater than or equal to 0.51 a bot. However, we found that on our dataset the algorithm misidentified as bots several humans who retweeted frequently but were rarely retweeted themselves. Therefore, to be even more careful with our bot detection, we further refined the set of bots by eliminating any users with verified Twitter accounts (a verified account has the user's identity confirmed by Twitter). We also manually inspected the accounts of the 30 users with the highest follower counts and removed anyone who appeared not to be a bot. At the end of this process, we identified 396 bots, of which 260 supported Trump (tweet based opinion greater than 0.5) and 136 supported Clinton (tweet based opinion less than 0.5). These bots represent less than 0.5% of the users in our dataset.
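A minimal sketch of this refinement pipeline (only the 0.51 threshold and the exclusion rules come from the text; the names are our own):

```python
def refine_bots(bot_scores, verified_accounts, manually_cleared):
    """bot_scores:       {user: bot likelihood in [0, 1]} from the detector
    verified_accounts:   set of users whose identity Twitter has confirmed
    manually_cleared:    hand-inspected high-follower accounts to exclude"""
    bots = {u for u, score in bot_scores.items() if score >= 0.51}
    bots -= verified_accounts
    bots -= manually_cleared
    return bots
```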

When the bots are included, they are considered stubborn users, even if their tweet opinion does not fall within our stubborn threshold intervals. We show a histogram of the bot opinions in Figure 3. It can be seen that most bots do not fall within our stubborn thresholds. Therefore, despite not being persuadable, most of the bots would not be considered stubborn based on their tweet opinion alone. It is possible these bots are designed to sometimes post less polarized content in an attempt to appear more human.

We calculated the network model opinions for four different scenarios: no bots, only Trump bots, only Clinton bots, and all bots. The resulting opinion cumulative distributions for the different scenarios are shown in Figure 4. We see some striking features here. First, when all bots are included, the opinions shift towards Clinton. The median opinion goes from 0.58 with no bots to 0.42 with both Trump and Clinton bots. This indicates an asymmetry in the influence of the bots. This is more striking given the fact that there are nearly twice as many Trump bots as Clinton bots. To emphasize this asymmetry, one can look at the shift caused by each group of bots operating without the opposing bots. The Clinton bots alone create a huge shift from 0.58 to 0.26, while the Trump bots alone only cause a shift from 0.58 to 0.76.

Figure 3: Histogram of the bots’ opinions.
Figure 4: Plot of cumulative distribution of non-stubborn user opinions based on the network opinion model with different types of bots removed.
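The four scenarios can be evaluated by re-solving equation (1) on a network with the relevant bot groups removed. A sketch, reusing the equilibrium_opinions solver from Section 3 (structure and names are our own):

```python
import numpy as np

def drop_users(friends, excluded):
    """Remove excluded accounts and any follow edges pointing to them."""
    return {u: {v for v in vs if v not in excluded}
            for u, vs in friends.items() if u not in excluded}

def scenario_medians(friends, rates, stubborn, clinton_bots, trump_bots,
                     bot_opinions):
    all_bots = clinton_bots | trump_bots
    scenarios = {"no bots": set(), "Trump bots": trump_bots,
                 "Clinton bots": clinton_bots, "all bots": all_bots}
    medians = {}
    for name, active in scenarios.items():
        graph = drop_users(friends, all_bots - active)
        fixed = dict(stubborn)
        fixed.update({b: bot_opinions[b] for b in active})  # bots are stubborn
        theta = equilibrium_opinions(graph, rates, fixed)
        medians[name] = float(np.median(list(theta.values())))
    return medians
```

Passing a rates dict with the same constant for every user reproduces the uniform-rates experiment described later in this section.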

Our results indicate that the Clinton bots are more effective than the Trump bots at shifting opinions. To understand why, we looked closer at these two groups of bots. We focus on two hypotheses. One is that the Clinton bots are followed by more people, and therefore have greater reach. The second is that the Clinton bots post content more frequently, allowing them to be more effective in their influence. According to the network opinion model, either of these conditions will lead to the observed asymmetry. This can be seen by looking at equation (1). To satisfy the equilibrium condition, non-stubborn users will shift their opinion towards those with a higher posting rate or more followers.

We first consider the friend and follower count of the bots and non-bots in the entire Twitter network. For the non-bots, we separate the users into pro-Trump and pro-Clinton using their tweet based opinions. We choose the tweet based opinions as our separation criterion because this is a more accurate measure of opinion than the network model, which compresses the variance of the opinion distribution. We show the resulting cumulative distributions in Figure 5. We see that both Clinton and Trump bots are more connected than the non-bot users. We also see that there is no visible difference between these distributions for the two groups of bots. This is confirmed by a Kolmogorov-Smirnov (KS) test (no significant difference at the 1% level). Therefore, the connectivity of the bots does not seem to be the cause of the asymmetry.

Next, we consider the posting rate. Recall that this is the number of tweets in our dataset posted by each user that are relevant to the election. Figure 6 shows the cumulative distribution of the rates for the bots and non-bots. The first observation is that the bots' median posting rate is nearly two orders of magnitude larger than the non-bots'. This would seem to explain why the model predicts that their removal creates such a large opinion shift, despite their small number. Second, it appears the Clinton bots post at a higher rate than the Trump bots. Not only is the median rate 50% higher for the Clinton bots, but their distribution has a fatter tail than the Trump bots'. This means there are several Clinton bots that post at very high rates. A KS test shows that the Clinton and Trump bot rates have different distributions (significant at the 1% level). Third, there is no clear difference between the rate distributions for the non-bot Clinton and Trump supporters. The median rate is five and six tweets for the Trump and Clinton supporters, respectively.
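The distributional comparisons in this section can be run with a standard two-sample KS test, e.g. via SciPy (the variable names are illustrative):

```python
from scipy.stats import ks_2samp

# Compare the posting-rate distributions of the two bot groups.
stat, p = ks_2samp(clinton_bot_rates, trump_bot_rates)
print(f"KS statistic {stat:.3f}, p-value {p:.2e}")  # significant at 1% per the text
```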

As a second test, we recalculated the network opinions, but this time we gave every user in the network the same posting rate. This eliminates any advantage the Clinton bots had due to higher activity levels. The resulting opinion distributions are shown in Figure 7. As can be seen, when the rates are equal, the bots' ability to shift opinions is reduced considerably. The Clinton bots alone cause almost no opinion shift. The inclusion of all bots shifts the median opinion slightly towards Trump. When the rates are equalized, the Trump bots have an advantage, most likely because there are nearly twice as many of them as Clinton bots. Therefore it seems that the asymmetry between the influence of the Clinton and Trump bots is due to the Clinton bots' higher posting rate.

Another interesting observation about the uniform rates model is that the median opinion shifts towards Clinton compared to the model with the true posting rates. Recall that the rate distributions of Trump and Clinton supporters were very similar. This suggests that there are some Trump supporters with high posting rates who have an effective position in the network for influencing others.

Figure 5: Plot of cumulative distribution of the bots’ and non-bots’ friend and follower counts in the entire Twitter network.
Figure 6: Plot of cumulative distribution of the bots’ and non-bots’ posting rate.
Figure 7: Plot of cumulative distribution of non-stubborn user opinions based on the network opinion model with different types of bots removed. All users in the network are given uniform posting rates.

7 Conclusion

It is important to understand the impact of bots on the opinions of users in social networks. The method we presented here shows how opinion dynamics models can be used to accomplish this. We validated the model by comparing its predictions to opinions based on the content of users' posts. We were then able to evaluate different scenarios by removing bots from the network. We found that the bots had a disproportionate effect on the opinions. Also, we found that the Clinton bots had a much greater effect on opinions than the Trump bots, and that this asymmetry was due to their higher posting rate. Effectively, the bots shout louder than the humans, and the Clinton bots were shouting louder than the Trump bots. Our results suggest that a small number of highly active bots could be sufficient to significantly shift opinions in a social network. Therefore, it is important to limit the presence of bots on major social networks, as they can distort the public discourse on many important issues.

References

  • Acemoglu et al. (2011) Daron Acemoglu, Munther A Dahleh, Ilan Lobel, and Asuman Ozdaglar. Bayesian learning in social networks. The Review of Economic Studies, 78(4):1201–1236, 2011.
  • Acemoğlu et al. (2013) Daron Acemoğlu, Giacomo Como, Fabio Fagnani, and Asuman Ozdaglar. Opinion fluctuations and disagreement in social networks. Mathematics of Operations Research, 38(1):1–27, 2013.
  • Aggarwal (2014) Charu C Aggarwal. Data classification: algorithms and applications. CRC Press, 2014.
  • Alvisi et al. (2013) Lorenzo Alvisi, Allen Clement, Alessandro Epasto, Silvio Lattanzi, and Alessandro Panconesi. Sok: The evolution of sybil defense via social networks. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 382–396. IEEE, 2013.
  • Banerjee and Fudenberg (2004) Abhijit Banerjee and Drew Fudenberg. Word-of-mouth learning. Games and Economic Behavior, 46(1):1–22, 2004.
  • Banerjee (1992) Abhijit V Banerjee. A simple model of herd behavior. The quarterly journal of economics, 107(3):797–817, 1992.
  • Bastos and Mercea (2017) Marco T Bastos and Dan Mercea. The brexit botnet and user-generated hyperpartisan news. Social Science Computer Review, page 0894439317734157, 2017.
  • Benevenuto et al. (2009) Fabricio Benevenuto, Tiago Rodrigues, Virgilio Almeida, Jussara Almeida, and Marcos Goncalves. Detecting spammers and content promoters in online video social networks. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 620–627. ACM, 2009.
  • Benevenuto et al. (2010) Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida. Detecting spammers on twitter. In Collaboration, electronic messaging, anti-abuse and spam conference (CEAS), volume 6, page 12, 2010.
  • Bessi and Ferrara (2016) Alessandro Bessi and Emilio Ferrara. Social bots distort the 2016 US presidential election online discussion. First Monday, 21(11), 2016.
  • Bikhchandani et al. (1992) Sushil Bikhchandani, David Hirshleifer, and Ivo Welch. A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of political Economy, 100(5):992–1026, 1992.
  • Cao et al. (2012) Qiang Cao, Michael Sirivianos, Xiaowei Yang, and Tiago Pregueiro. Aiding the detection of fake accounts in large scale social online services. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, pages 15–15. USENIX Association, 2012.
  • Cao et al. (2014) Qiang Cao, Xiaowei Yang, Jieqi Yu, and Christopher Palow. Uncovering large groups of active malicious accounts in online social networks. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 477–488. ACM, 2014.
  • Chinellato et al. (2015) David D Chinellato, Irving R Epstein, Dan Braha, Yaneer Bar-Yam, and Marcus AM de Aguiar. Dynamical response of networks under external perturbations: exact results. Journal of Statistical Physics, 159(2):221–230, 2015.
  • Chu et al. (2012) Zi Chu, Steven Gianvecchio, Haining Wang, and Sushil Jajodia. Detecting automation of twitter accounts: Are you a human, bot, or cyborg? IEEE Transactions on Dependable and Secure Computing, 9(6):811–824, 2012.
  • Clifford and Sudbury (1973) Peter Clifford and Aidan Sudbury. A model for spatial conflict. Biometrika, 60(3):581–588, 1973.
  • Cox and Griffeath (1986) J Theodore Cox and David Griffeath. Diffusive clustering in the two dimensional voter model. The Annals of Probability, pages 347–370, 1986.
  • Danezis and Mittal (2009) George Danezis and Prateek Mittal. Sybilinfer: Detecting sybil nodes using social networks. In NDSS, pages 1–15. San Diego, CA, 2009.
  • Davis et al. (2016) Clayton Allen Davis, Onur Varol, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. Botornot: A system to evaluate social bots. In Proceedings of the 25th International Conference Companion on World Wide Web, pages 273–274. International World Wide Web Conferences Steering Committee, 2016.
  • DeGroot (1974) Morris H DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.
  • Egele et al. (2013) Manuel Egele, Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna. Compa: Detecting compromised accounts on social networks. In NDSS, 2013.
  • Fandos and Shane (2017) Nicholas Fandos and Scott Shane. Senator Berates Twitter Over ‘Inadequate’ Inquiry Into Russian Meddling. The New York Times, September 2017. URL https://www.nytimes.com/2017/09/28/us/politics/twitter-russia-interference-2016-election-investigation.html?mtrref=www.google.com.
  • Ferrara (2017) Emilio Ferrara. Disinformation and social bot operations in the run up to the 2017 French presidential election. First Monday, 22(8), 2017.
  • Ferrara et al. (2016) Emilio Ferrara, Onur Varol, Clayton Davis, Filippo Menczer, and Alessandro Flammini. The rise of social bots. Communications of the ACM, 59(7):96–104, 2016.
  • Galam and Jacobs (2007) Serge Galam and Frans Jacobs. The role of inflexible minorities in the breaking of democratic opinion dynamics. Physica A: Statistical Mechanics and its Applications, 381:366–376, 2007.
  • Ghaderi and Srikant (2013) Javad Ghaderi and R Srikant. Opinion dynamics in social networks: A local interaction game with stubborn agents. In American Control Conference (ACC), 2013, pages 1982–1987. IEEE, 2013.
  • Ghosh et al. (2012) Saptarshi Ghosh, Bimal Viswanath, Farshad Kooti, Naveen Kumar Sharma, Gautam Korlam, Fabricio Benevenuto, Niloy Ganguly, and Krishna Phani Gummadi. Understanding and combating link farming in the twitter social network. In Proceedings of the 21st international conference on World Wide Web, pages 61–70. ACM, 2012.
  • Goldberg and Levy (2014) Yoav Goldberg and Omer Levy. word2vec explained: deriving mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722, 2014.
  • Gray (1986) Lawrence Gray. Duality for general attractive spin systems with applications in one dimension. The Annals of Probability, pages 371–396, 1986.
  • Holley and Liggett (1975) Richard A Holley and Thomas M Liggett. Ergodic theorems for weakly interacting infinite systems and the voter model. The annals of probability, pages 643–663, 1975.
  • Hunter and Zaman (2018) D Scott Hunter and Tauhid Zaman. Opinion dynamics with stubborn agents. arXiv preprint arXiv:1806.11253, 2018.
  • Jackson (2010) Matthew O Jackson. Social and economic networks. Princeton university press, 2010.
  • Jadbabaie et al. (2003) Ali Jadbabaie, Jie Lin, and A Stephen Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on automatic control, 48(6):988–1001, 2003.
  • Kim (2014) Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
  • Krapivsky (1992) PL Krapivsky. Kinetics of monomer-monomer surface catalytic reactions. Physical Review A, 45(2):1067, 1992.
  • Liggett (2012) Thomas Milton Liggett. Interacting particle systems, volume 276. Springer Science & Business Media, 2012.
  • Littman et al. (2016) Justin Littman, Laura Wrubel, and Daniel Kerchner. 2016 United States presidential election tweet ids. https://doi.org/10.7910/DVN/PDI7IN, 2016.
  • Mesnards and Zaman (2018) Nicolas Guenon des Mesnards and Tauhid Zaman. Detecting influence campaigns in social networks using the ising model. arXiv preprint arXiv:1805.10244, 2018.
  • Mobilia (2003) Mauro Mobilia. Does a single zealot affect an infinite group of voters? Physical review letters, 91(2):028701, 2003.
  • Mobilia et al. (2007) Mauro Mobilia, A Petersen, and Sidney Redner. On the role of zealotry in the voter model. Journal of Statistical Mechanics: Theory and Experiment, 2007(08):P08029, 2007.
  • Olshevsky and Tsitsiklis (2009) Alex Olshevsky and John N Tsitsiklis. Convergence speed in distributed consensus and averaging. SIAM Journal on Control and Optimization, 48(1):33–55, 2009.
  • O’Sullivan and Herb (2018) Donie O’Sullivan and Jeremy Herb. Facebook takes down suspected Russian network of pages. CNN Tech, 2018. URL https://money.cnn.com/2018/07/31/technology/facebook-removes-pages/index.html.
  • Parlapiano and Lee (2018) Alicia Parlapiano and Jasmine C. Lee. The Propaganda Tools Used by Russians to Influence the 2016 Election. The New York Times, February 2018. URL https://www.nytimes.com/interactive/2018/02/16/us/politics/russia-propaganda-election-2016.html.
  • Price (2018) Molly Price. Democrats urge Facebook and Twitter to probe Russian bots. CNET, January 2018. URL https://www.cnet.com/news/facebook-and-twitter-asked-again-to-investigate-russian-bots/.
  • Python. Python Word Segmentation. http://www.grantjenks.com/docs/wordsegment/. Accessed: 2018-08-14.
  • Shane (2018) Scott Shane. How Unwitting Americans Encountered Russian Operatives Online. The New York Times, February 2018. URL https://www.nytimes.com/2018/02/18/us/politics/russian-operatives-facebook-twitter.html.
  • Sood and Redner (2005) Vishal Sood and Sidney Redner. Voter model on heterogeneous graphs. Physical review letters, 94(17):178701, 2005.
  • Thomas et al. (2011) Kurt Thomas, Chris Grier, Dawn Song, and Vern Paxson. Suspended accounts in retrospect: an analysis of twitter spam. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, pages 243–258. ACM, 2011.
  • Timberg and Dwoskin (2018) Craig Timberg and Elizabeth Dwoskin. Twitter is sweeping out fake accounts like never before, putting user growth at risk. The Washington Post, 2018. URL https://www.washingtonpost.com/technology/2018/07/06/twitter-is-sweeping-out-fake-accounts-like-never-before-putting-user-growth-risk/?noredirect=on&utm_term=.97cf42cc0bed&wpisrc=al_technology__alert-economy--alert-tech&wpmk=1.
  • Tran et al. (2009) Dinh Nguyen Tran, Bonan Min, Jinyang Li, and Lakshminarayanan Subramanian. Sybil-resilient online content voting. In NSDI, volume 9, pages 15–28, 2009.
  • Tsitsiklis et al. (1986) John Tsitsiklis, Dimitri Bertsekas, and Michael Athans. Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE transactions on automatic control, 31(9):803–812, 1986.
  • Tsitsiklis (1984) John Nikolas Tsitsiklis. Problems in decentralized decision making and computation. Technical report, Massachusetts Institute of Technology, Laboratory for Information and Decision Systems, 1984.
  • Twitter (2018) Twitter. Follow, search, and get users. https://developer.twitter.com/en/docs/accounts-and-users/follow-search-get-users/api-reference/get-friends-ids.html, 2018.
  • Viswanath et al. (2014) Bimal Viswanath, Muhammad Ahmad Bashir, Mark Crovella, Saikat Guha, Krishna P Gummadi, Balachander Krishnamurthy, and Alan Mislove. Towards detecting anomalous user behavior in online social networks. In USENIX Security Symposium, pages 223–238, 2014.
  • Wang (2010) Alex Hai Wang. Detecting spam bots in online social networking sites: a machine learning approach. In IFIP Annual Conference on Data and Applications Security and Privacy, pages 335–342. Springer, 2010.
  • Wang et al. (2013) Gang Wang, Tristan Konolige, Christo Wilson, Xiao Wang, Haitao Zheng, and Ben Y Zhao. You are how you click: Clickstream analysis for sybil detection. In USENIX Security Symposium, volume 9, pages 1–008, 2013.
  • Wu and Huberman (2004) Fang Wu and Bernardo A Huberman. Social structure and opinion formation. arXiv preprint cond-mat/0407252, 2004.
  • Yang et al. (2014) Zhi Yang, Christo Wilson, Xiao Wang, Tingting Gao, Ben Y Zhao, and Yafei Dai. Uncovering social network sybils in the wild. ACM Transactions on Knowledge Discovery from Data (TKDD), 8(1):2, 2014.
  • Yildiz et al. (2013) Ercan Yildiz, Asuman Ozdaglar, Daron Acemoglu, Amin Saberi, and Anna Scaglione. Binary opinion dynamics with stubborn agents. ACM Transactions on Economics and Computation, 1(4):19, 2013.
  • Yu et al. (2006) Haifeng Yu, Michael Kaminsky, Phillip B Gibbons, and Abraham Flaxman. Sybilguard: defending against sybil attacks via social networks. In ACM SIGCOMM Computer Communication Review, volume 36, pages 267–278. ACM, 2006.
  • Yu et al. (2008) Haifeng Yu, Phillip B Gibbons, Michael Kaminsky, and Feng Xiao. Sybillimit: A near-optimal social network defense against sybil attacks. In Security and Privacy, 2008. SP 2008. IEEE Symposium on, pages 3–17. IEEE, 2008.

8 Neural Network Details

We now present the details of the neural network used to calculate the tweet based opinions of the users. We used a convolutional neural network architecture with two channels for two versions of the same tweet. The model architecture was inspired by Kim (2014), whose approach was to train a text classification model on two different word embeddings of the same text: one static channel comprised of word2vec embeddings (Goldberg and Levy 2014) and another channel which is the output of an embedding layer trained during back-propagation.

Each tweet goes through a processing phase where we remove punctuation and stopwords and convert it into a format that the model can process. Each processed tweet is then converted into two versions: one where hashtags are left as they are and another where hashtags have been split into actual words. For example, “I hope @candidate_x will be our next president #voteforcandidate_x #hatersgonnahate.” is converted into the dirty version “I hope candidate_x will be our next president voteforcandidate_x hatersgonnahate” and the clean version “I hope candidate_x will be our next president vote for candidate_x haters gonna hate”. We do this to prevent our model from being a lazy learner that learns only from the hashtags. The hashtag splitting was done using the WordSegment library in Python (Python). The sequence length of the tweets is set to 20 tokens (i.e., words). Any tweet with more than 20 tokens is truncated, while tweets with fewer than 20 tokens are padded with zeros.
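A minimal sketch of the hashtag-splitting step using the WordSegment library cited above (the surrounding tokenization is our own illustration):

```python
from wordsegment import load, segment  # Grant Jenks's WordSegment library

load()  # load the library's word-frequency data before segmenting

def clean_version(tokens):
    """Replace each hashtag token with its segmented words,
    e.g. '#hatersgonnahate' -> 'haters gonna hate'."""
    words = []
    for tok in tokens:
        if tok.startswith("#"):
            words.extend(segment(tok[1:]))
        else:
            words.append(tok)
    return " ".join(words)
```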

We show the complete neural network architecture in Figure 8. Each version of the processed tweet goes through its own embedding layer (embedding dimension 128), producing two separate channels, each of size (20, 128). Each channel goes through its own set of 32 1D-convolution filters (kernel size = 3, stride = 1, padding = ‘valid’) with a ReLU activation, a 1D max-pooling layer (pool size = 2), and a flattening layer. The resulting output is two (288, 1) layers that we concatenate to form a (576, 1) layer. This layer then goes through two fully connected layers with ReLU activations and 64 and 32 units, respectively. The final layer is a softmax layer that outputs the probability distribution over our labels (i.e., Clinton, Trump).
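For concreteness, here is a sketch of the described architecture in Keras. This is our reconstruction from the dimensions above, not the authors' code; in particular, loading frozen word2vec weights into one of the two embedding layers is omitted.

```python
from tensorflow.keras import layers, Model

def build_tweet_cnn(vocab_size, seq_len=20, embed_dim=128):
    """Two-channel CNN matching the layer sizes described above."""
    inputs, branches = [], []
    for name in ("dirty", "clean"):  # raw-hashtag and split-hashtag versions
        inp = layers.Input(shape=(seq_len,), dtype="int32", name=f"{name}_tokens")
        # One channel would use frozen word2vec weights (weights=..., trainable=False);
        # the other embedding is trained during back-propagation.
        x = layers.Embedding(vocab_size, embed_dim)(inp)            # (20, 128)
        x = layers.Conv1D(32, kernel_size=3, strides=1,
                          padding="valid", activation="relu")(x)    # (18, 32)
        x = layers.MaxPooling1D(pool_size=2)(x)                     # (9, 32)
        x = layers.Flatten()(x)                                     # (288,)
        inputs.append(inp)
        branches.append(x)
    x = layers.Concatenate()(branches)                              # (576,)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dense(32, activation="relu")(x)
    out = layers.Dense(2, activation="softmax")(x)  # P(Clinton), P(Trump)
    return Model(inputs=inputs, outputs=out)
```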

Figure 8: Convolutional neural network architecture used to learn tweet opinions with respect to Hillary Clinton and Donald Trump.