Bot identification and data preparation
In order to detect bots we use BotOrNot, a general supervised learning system designed to detect socialbot accounts on Twitter. It uses over 1,000 features, including user meta-data, social contacts, diffusion networks, content, sentiment, and temporal signatures. In an evaluation on a large set of labeled accounts, BotOrNot is highly accurate in distinguishing bots from human accounts, with an Area Under the ROC Curve (AUC) of 94%.
When a Twitter account is evaluated by BotOrNot, the output is a JSON file with several scores. Since we are examining a corpus of tweets in Spanish, we focus on the language-independent classifiers, which flag a large number of potential bot accounts. Surprisingly, combining the results of these language-independent classifiers is sufficient for detecting bots in Spanish. This suggests that simply discarding the language-dependent features of BotOrNot can yield non-English bot detection. Further research is needed to validate the transferability of BotOrNot outside English Twitter.
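As an illustration of this combination step, one could average the language-independent sub-scores extracted from the returned JSON. The field names below are hypothetical placeholders, not BotOrNot's actual schema:

```python
# Sketch of combining language-independent BotOrNot sub-scores.
# The dictionary keys ("friend", "network", "temporal", ...) are
# illustrative only and do not reflect the real BotOrNot JSON schema.

def combined_bot_score(scores):
    """Average the language-independent classifier scores (each in [0, 1])."""
    keys = ("friend", "network", "temporal")
    return sum(scores[k] for k in keys) / len(keys)

# Example: an account scoring high on the language-independent classifiers
# but low on a (language-dependent) content classifier, which is ignored.
print(combined_bot_score({"friend": 0.9, "network": 0.8,
                          "temporal": 0.85, "content": 0.2}))
```

An account is then treated as a bot candidate when this combined score is high, regardless of its language-dependent features.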
We streamed 20,854 tweets from Twitter’s API between 2016-08-19 15:06:17 and 2016-08-22 02:13:35. These tweets were generated by 9,730 different users (see Figures 1 and 6 for the relation between humans and bots), and among them are 12,905 retweets. When a user (human or bot) generates a tweet, that tweet can be retweeted by either a human or a bot. Consequently, there are four possibilities: a tweet created by a human and retweeted by another human (H-H), created by a human and retweeted by a bot (H-B), created by a bot and retweeted by a human (B-H), or created by a bot and retweeted by a bot (B-B). In Figure 1 we show the evolution of #Tanhuato over the collection period. The percentages of accounts that are humans and those that are bots are shown in Figure 5.
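These four categories can be tallied with a few lines of code; here the bot labels are hypothetical placeholders for the output of the bot-detection step:

```python
from collections import Counter

def retweet_category(creator_is_bot, retweeter_is_bot):
    """Map a retweet to one of H-H, H-B, B-H, B-B
    (first letter: tweet creator; second letter: retweeter)."""
    creator = "B" if creator_is_bot else "H"
    retweeter = "B" if retweeter_is_bot else "H"
    return f"{creator}-{retweeter}"

# Hypothetical (creator_is_bot, retweeter_is_bot) pairs:
retweets = [(False, False), (True, False), (False, True), (False, False)]
counts = Counter(retweet_category(c, r) for c, r in retweets)
print(counts)
```

Applied to the full collection, this tally produces the H-H, H-B, B-H, and B-B counts reported below.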
In Figure 3 we show the bivariate kernel density estimates for pairwise combinations of the Friend, Network, and Temporal classifiers from BotOrNot. The regions towards the upper right-hand corner correspond to areas where the bot scores are high. It can be clearly seen how the bot accounts naturally cluster. The final visualization of this analysis is presented in Figure 4, where we now compute the kernel density estimate that incorporates all three classifiers: Friend, Network, and Temporal. In this image the smaller cluster in the upper right corner is the region where the bots accumulate. This 3D image is formed by taking iso-surfaces of the 3D kernel density estimate. Again, as in the 2D images, we can separate the bot accounts in a natural way and isolate them for further analysis. Notice that these three classifiers are all language-independent, which is why we focus on them instead of on the overall bot score produced by BotOrNot. Having identified the bots present in our sample, we can now examine how they appeared over the collection period, as shown in Figure 5.
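The clustering behind these figures can be illustrated with a plain-Python Gaussian kernel density estimate over synthetic three-dimensional scores; the actual estimates were of course computed on the real classifier outputs, with proper bandwidth selection:

```python
import math
import random

def gaussian_kde(samples, bandwidth=0.05):
    """Return a Gaussian kernel density estimator over d-dimensional points
    (plain-Python sketch; a real analysis would use numerical libraries)."""
    d = len(samples[0])
    norm = (math.sqrt(2 * math.pi) * bandwidth) ** d * len(samples)

    def density(point):
        total = 0.0
        for s in samples:
            sq = sum((p - q) ** 2 for p, q in zip(point, s))
            total += math.exp(-sq / (2 * bandwidth ** 2))
        return total / norm

    return density

random.seed(1)
# Synthetic scores: most accounts score low on all three classifiers,
# while a small cluster (the "bots") scores high on all three.
humans = [[random.gauss(0.2, 0.05) for _ in range(3)] for _ in range(400)]
bots = [[random.gauss(0.85, 0.05) for _ in range(3)] for _ in range(50)]
kde = gaussian_kde(humans + bots)

# The bot cluster shows up as a second high-density region separated by a gap.
print(kde([0.85, 0.85, 0.85]) > kde([0.5, 0.5, 0.5]))  # True
```

Iso-surfaces of such a density then separate the two clusters, which is exactly what the 3D figure visualizes.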
Now that we have performed our bot analysis, we can analyze the bot and human Twitter network. In Figure 6(a) we see that the nodes with the highest betweenness centrality in the full retweet network are all human, except for two bot accounts. These bot accounts in fact belong to the news organizations @pictoline and @Pajaropolitico. Thus, by betweenness centrality in the retweet network, human users constitute the shortest paths of dialogue. With the exception of the formal news bots, socialbots are not playing an active role in the retweet network.
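For intuition, betweenness centrality counts, for each node, the fraction of shortest paths between other pairs of nodes that pass through it. A brute-force sketch suitable only for tiny graphs (a real analysis of the retweet network would use a dedicated graph library):

```python
from collections import deque
from itertools import permutations

def betweenness(adj):
    """Brute-force betweenness centrality on a small undirected graph,
    given as an adjacency dict {node: [neighbors]}."""
    def shortest_paths(s, t):
        # BFS enumerating ALL shortest paths from s to t.
        paths, best = [], None
        queue = deque([[s]])
        while queue:
            path = queue.popleft()
            if best is not None and len(path) > best:
                break  # paths are dequeued in nondecreasing length
            node = path[-1]
            if node == t:
                best = len(path)
                paths.append(path)
                continue
            for nb in adj[node]:
                if nb not in path:
                    queue.append(path + [nb])
        return paths

    score = {v: 0.0 for v in adj}
    for s, t in permutations(adj, 2):
        paths = shortest_paths(s, t)
        for p in paths:
            for v in p[1:-1]:  # interior nodes only
                score[v] += 1 / len(paths)
    return score

# Star graph: every leaf-to-leaf shortest path passes through the center.
star = {"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a", "b", "d"]}
print(betweenness(star)["c"])  # 6.0 (six ordered leaf pairs)
```

In the retweet network, the high-betweenness nodes are the accounts that bridge otherwise separate conversations, which is why the measure identifies the main conduits of dialogue.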
Figure 6(b) shows the number of retweets by each user (a measure of degree in the retweet network); again, humans are the more active retweeters. In Figure 6 (bottom) we show the relation between these quantities for our data. Furthermore, we observed in the data that the bots with the highest number of retweets among humans were mainly news organizations: @pictoline, @Pajaropolitico, @emeequis, @CNNEE, and @NewsweekEspanol.
In Figure 8 we show the entire retweet network for our collection. It can be seen that very few bot accounts are responsible for a large proportion of the retweets by humans. This point is also clear in Figure 10, where only the retweets of bot tweets by humans are shown. Here the central nodes with high valency are the accounts that were retweeted most by humans. In contrast, Figure 9 shows that bots did not retweet each other much; in fact most bot accounts lie in the outer circle, isolated without edges.
The total number of tweets created by bots is 4,153, which represents 19.9146% of all tweets. In total, 12,905 of all tweets are retweets: 11,895 retweets were made by humans and 1,010 by bots. Breaking the retweets down by creator and retweeter:
-  tweets created by bots and retweeted by humans: 1,450
-  tweets created by humans and retweeted by bots: 848
-  tweets created by bots and retweeted by bots: 76
-  tweets created by humans and retweeted by humans: 9,896
There are more retweets of humans (10,744) than of bots (1,526). This leaves a difference of 635 retweets: all retweets = humans retweeted + bots retweeted + 635. These ‘missing’ 635 retweets belong to tweets created earlier (before the first tweet we registered). Fortunately, retweets store the information of the original tweet. Searching for the string http in the text of each tweet, we found that 17,474 tweets from humans include web pages, and 4,736 tweets from bots include web pages.
We extract bag-of-words features represented as TF-IDF (term frequency–inverse document frequency) using . We then used Singular Value Decomposition (SVD, also referred to as Latent Semantic Indexing in the context of information retrieval and text mining) to look at the distribution of tweets along the top singular vectors. While the top singular vectors capture the most variance in the bag-of-words feature set, for this corpus the difference between the bot and human tweets was not clear. We repeated the analysis after removing Spanish stop words and still did not find any discrimination between bots and humans.
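The TF-IDF weighting itself can be sketched in a few lines. This is a toy version for illustration; the actual features were extracted with standard library tooling, and smoothing conventions vary between implementations:

```python
import math
from collections import Counter

def tfidf(docs):
    """Tiny TF-IDF sketch: term frequency times log inverse document
    frequency, with no smoothing (conventions differ across libraries)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({w: (c / len(toks)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors

vecs = tfidf(["bots retweet news", "humans retweet news", "news retweet bots"])
# Words that appear in every document ("retweet", "news") get zero weight;
# words confined to a subset of documents ("bots") get positive weight.
print(vecs[0])
```

Stacking these sparse vectors into a matrix and taking its SVD gives the top singular vectors examined above.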
However, as seen in Figure 11, by computing the log-odds ratio of the word counts between the human and bot cohorts (as was done in  for discriminating between two tweet corpora), we see several terms that are discriminating. Thus, although the bag-of-words features do not capture strong discrimination between bots and humans, the two cohorts are clearly different (specific word usages among bots can differ by orders of magnitude, since the horizontal axis in Figure 11 is on a log scale).
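A minimal version of this log-odds computation, with additive smoothing so that words absent from one cohort remain defined, can be sketched as follows (the smoothing constant and the toy counts are illustrative, not the paper's exact weighting):

```python
import math
from collections import Counter

def log_odds(counts_a, counts_b, smoothing=0.5):
    """Smoothed log-odds ratio of word usage between two corpora.
    Positive values: word is over-represented in corpus A; negative: in B."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    out = {}
    for w in vocab:
        pa = (counts_a.get(w, 0) + smoothing) / (total_a + smoothing * len(vocab))
        pb = (counts_b.get(w, 0) + smoothing) / (total_b + smoothing * len(vocab))
        out[w] = math.log((pa / (1 - pa)) / (pb / (1 - pb)))
    return out

# Toy word counts for a human and a bot cohort:
human = Counter({"justicia": 40, "noticia": 10})
bot = Counter({"justicia": 5, "noticia": 45})
ratios = log_odds(human, bot)
print(ratios)
```

Words far from zero on this scale are the discriminating terms that show up at the extremes of Figure 11.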
To better understand the nature of the words bots and humans used, we apply basic sentiment analysis using LabMT . As discussed in , the top 10,000 Spanish words were presented on Amazon Mechanical Turk, where 50 workers rated the happiness of each word on a scale of 1 to 9 (where 1 is least happy, 9 is most happy, and 5 is neutral). Using these scores, we compute the average sentiment for the human and bot corpora using Equation 1 in . As discussed in , however, a great many words have near-neutral sentiment (essentially commonly used stop words), and the average sentiment score may be biased heavily towards the neutral score of 5.0. Therefore, the authors suggest removing words whose score lies within a chosen threshold of 5.0, so that only words with stronger sentiment remain. By selecting an appropriate threshold, we can systematically remove stop words that do not contribute to sentiment.
It is not clear what value to select for this threshold. While the authors in  suggest a particular value, here we compute the average sentiment score over a range of thresholds for a more complete understanding. Figure 12, left panel, shows how the average tweet sentiment changes as we filter out more neutral words. As the neutral words are filtered, the average sentiment is pulled down significantly. This is to be expected, as most tweets use words related to violence. Interestingly, however, the bots seem to be less emotional than the humans: their average sentiment is consistently above that of humans regardless of the threshold we use.
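The threshold-filtered average can be sketched as follows, with a tiny made-up happiness lexicon standing in for the LabMT word scores:

```python
def avg_sentiment(words, happiness, threshold):
    """Average happiness over scored words lying at least `threshold`
    away from the neutral value 5.0; neutral if nothing survives."""
    scored = [happiness[w] for w in words
              if w in happiness and abs(happiness[w] - 5.0) >= threshold]
    return sum(scored) / len(scored) if scored else 5.0

# Hypothetical lexicon (real LabMT scores come from the Mechanical Turk study).
lex = {"muerte": 1.5, "violencia": 2.0, "dia": 5.2, "feliz": 8.0}
tweet = ["dia", "violencia", "muerte", "dia"]
# Raising the threshold filters the near-neutral "dia" and pulls the
# average toward the strongly negative words, as seen in Figure 12.
print(avg_sentiment(tweet, lex, 0.0), avg_sentiment(tweet, lex, 1.0))
```

Sweeping the threshold over a range of values, as in Figure 12, traces out how quickly each cohort's average sentiment drops as neutral words are removed.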
To investigate this hypothesis further, we removed all retweets and recomputed the average sentiments. Figure 12, right panel, shows that removing the retweets does not change the fact that filtering neutral words yields more negative scores. However, the bot sentiment no longer correlates strongly with the human tweets: as we filter more neutral words, the human tweets become more negative as before, but the bot tweets remain closer to neutral. These findings all suggest that the bots were using less emotionally charged words than humans. In other words, it appears that the purpose of the bots in this case was simply to distribute information in a non-sensational manner rather than to purposefully stir up emotions.
In addition to using LabMT, we also hand-coded a list of negative words, extracted from the corpus of collected tweets, and used it to compare the bot and human corpora by the frequency of words from this list. To match these words across a wider volume of tweets, where possible we suppressed trailing letters (that is, we applied “stemming”) so that each entry could match different tenses (for verbs) and different genders and numbers (for nouns and adjectives) while keeping its connotation. We refer to Table 4 for this list of word stems.
To check for matches between the words in Table 4 and the text of the tweets, we remove URLs from the tweet text and replace non-ASCII characters (such as “ñ”, the stressed vowels “á”, “é”, “í”, “ó”, “ú”, and “¿”) with their ASCII equivalents (“n”, “a”, “e”, “i”, “o”, “u”, “?”). We also transform all capital letters to lowercase. The transformed text is then split into single words, which are compared individually. To speed up the comparison, we group the words alphabetically and compare each one only against the Table 4 entries starting with the same letter, also skipping words that start with symbols or numbers. Finally, we check whether each word in the split message starts with one of the stems in Table 4 sharing its initial letter.
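A compact sketch of this normalization-and-matching pipeline follows; the stems below are placeholders for the actual entries of Table 4:

```python
import re
import unicodedata

# Placeholder stems grouped by initial letter (stand-ins for Table 4).
STEMS = {"m": ["muert"], "v": ["violen"]}

def normalize(text):
    """Strip URLs, fold accents/ñ to ASCII, lowercase, and split."""
    text = re.sub(r"http\S+", "", text)                    # drop URLs
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")  # á→a, ñ→n, ...
    return text.lower().split()

def matches(text):
    """Count words starting with a same-initial-letter stem from the table."""
    count = 0
    for word in normalize(text):
        if not word or not word[0].isalpha():
            continue  # skip tokens starting with symbols or digits
        if any(word.startswith(stem) for stem in STEMS.get(word[0], ())):
            count += 1
    return count

print(matches("Violencia y muertos en Tanhuato http://example.com"))  # 2
```

Grouping the stems by initial letter mirrors the alphabetical bucketing described above, so each word is compared only against a small subset of the table.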
To prevent a misplaced punctuation mark from blocking a match, a second analysis was performed suppressing the first letter of each word and checking whether this shorter word matches Table 4. This analysis also reveals no difference. Our method of comparison fails when a negative-sentiment word is misspelled, but one expects the sentiment of the tweet to remain consistent throughout the text. Thus, if the text is long, we are more likely to find another negative word that is spelled correctly; conversely, short texts are likely to contain fewer misspelled words.
To distinguish what kind of information is most shared, we consider all tweets and assign a numerical value to each one. This value is initialized to 0 and increased by a constant for each match with Table 4. Treating a tweet as carrying negative sentiment when its value is nonzero, we show in Figure 13 that the largest volume of tweets comes from retweets with negative-sentiment text. A closer reading of the entire tweet corpus revealed that most of the non-negative messages cannot be identified as positive or neutral: their texts share URLs and/or the sentiment cannot be determined by word inspection.
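The scoring step can be sketched as follows (the weight constant and the stems are illustrative, not the actual Table 4 entries):

```python
# Each tweet starts at 0 and is incremented by a constant per stem match.
NEGATIVE_STEMS = ("muert", "violen", "masacr")  # placeholder stems
WEIGHT = 1  # assumed constant per match

def negativity_score(words):
    """Score a tokenized tweet by counting negative-stem matches."""
    return WEIGHT * sum(1 for w in words
                        if any(w.startswith(s) for s in NEGATIVE_STEMS))

def is_negative(words):
    """A tweet is treated as negative when its score is nonzero."""
    return negativity_score(words) != 0

print(is_negative(["masacre", "en", "tanhuato"]))  # True
```

The volume comparison in Figure 13 then amounts to counting tweets and retweets on each side of this nonzero-score split.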
In this work we presented a case study of socialbots for a specific trending topic on Mexican Twitter. While numerous studies have suggested that socialbots act as disrupting agents of information, in our case study we found the opposite. The socialbots were in fact enabling the flow of information, ensuring that the reports about these atrocities reached the public and that information was not stifled. Of course, from the point of view of the police authorities the socialbots may be viewed as agents of disruption, so whether socialbots are enablers or not is a matter of perspective. Our case study suggests that the role and landscape of socialbots is far more complex than simple binary categorizations. Our work highlights the need for further research to understand the ethical implications of such automated social activity.
We thank IPAM at UCLA and the organizers of the Cultural Analytics program, CNetS and the BotOrNot team at IU, and Twitter for allowing access to data through their APIs. PSS acknowledges support from UNAM-DGAPA-PAPIIT-IN102716 and UC-MEXUS CN-16-43.
-  Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. Copycatch: Stopping group attacks by spotting lockstep behavior in social networks. In Proceedings of the 22nd International Conference on World Wide Web, WWW ’13, pages 119–130, New York, NY, USA, 2013. ACM.
-  Yazan Boshmaf, Ildar Muslukhov, Konstantin Beznosov, and Matei Ripeanu. Design and analysis of a social botnet. Comput. Netw., 57(2):556–578, February 2013.
-  Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Gaël Varoquaux. API design for machine learning software: experiences from the scikit-learn project. In ECML PKDD Workshop: Languages for Data Mining and Machine Learning, pages 108–122, 2013.
-  Nikan Chavoshi, Hossein Hamooni, and Abdullah Mueen. Identifying correlated bots in twitter. In International Conference on Social Informatics, pages 14–21. Springer International Publishing, 2016.
-  Zi Chu, Steven Gianvecchio, Haining Wang, and Sushil Jajodia. Who is tweeting on twitter: Human, bot, or cyborg? In Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC ’10, pages 21–30, New York, NY, USA, 2010. ACM.
-  Eric M. Clark, Jake Ryland Williams, Chris A. Jones, Richard A. Galbraith, Christopher M. Danforth, and Peter Sheridan Dodds. Sifting robotic from organic text: A natural language approach for detecting automation on twitter. Journal of Computational Science, 16:1 – 7, 2016.
-  Clayton Allen Davis, Onur Varol, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. Botornot: A system to evaluate social bots. In Proceedings of the 25th International Conference Companion on World Wide Web, WWW ’16 Companion, pages 273–274, Republic and Canton of Geneva, Switzerland, 2016. International World Wide Web Conferences Steering Committee.
-  John P. Dickerson, Vadim Kagan, and V.S. Subrahmanian. Using sentiment to detect bots on twitter: Are humans more opinionated than bots? In 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 620–627, 2014.
-  Peter Sheridan Dodds, Kameron Decker Harris, Isabel M Kloumann, Catherine A Bliss, and Christopher M Danforth. Temporal patterns of happiness and information in a global social network: Hedonometrics and twitter. PloS one, 6(12):e26752, 2011.
-  Emilio Ferrara, Onur Varol, Clayton Davis, Filippo Menczer, and Alessandro Flammini. The rise of social bots. Commun. ACM, 59(7):96–104, June 2016.
-  Xia Hu, Jiliang Tang, Huiji Gao, and Huan Liu. Social spammer detection with sentiment information. In Proceedings of the 2014 IEEE International Conference on Data Mining, ICDM ’14, pages 180–189, Washington, DC, USA, 2014. IEEE Computer Society.
-  Xia Hu, Jiliang Tang, Yanchao Zhang, and Huan Liu. Social spammer detection in microblogging. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI ’13, pages 2633–2639. AAAI Press, 2013.
-  Gary King, Jennifer Pan, and Margaret E. Roberts. How censorship in china allows government criticism but silences collective expression. American Political Science Review, 107(2):1–18, May 2013.
-  Kyumin Lee, Brian David Eoff, and James Caverlee. Seven months with the devils: A long-term study of content polluters on twitter. In AAAI International Conference on Weblogs and Social Media (ICWSM), 2011.
-  Sangho Lee and Jong Kim. Early filtering of ephemeral malicious accounts on twitter. Comput. Commun., 54(C):48–57, December 2014.
-  Jacob Ratkiewicz, Michael Conover, Mark Meiss, Bruno Gonçalves, Snehal Patil, and Alessandro Flammini. Truthy: Mapping the spread of astroturf in microblog streams. In Proceedings of the 20th International Conference Companion on World Wide Web, 2011.
-  Saiph Savage, Andres Monroy-Hernandez, and Tobias Höllerer. Botivist: Calling volunteers to action using online bots. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, CSCW ’16, pages 813–822, New York, NY, USA, 2016. ACM.
-  Julia Silge and David Robinson. tidytext: Text mining and analysis using tidy data principles in R. JOSS, 1(3), 2016.
-  Daniel Sparks. How many users does twitter have?, April 2017. [Online; accessed 06-June-2017].
-  Kurt Thomas, Chris Grier, Dawn Song, and Vern Paxson. Suspended accounts in retrospect: An analysis of twitter spam. In Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, IMC ’11, pages 243–258, New York, NY, USA, 2011. ACM.
-  Kurt Thomas, Damon McCoy, Chris Grier, Alek Kolcz, and Vern Paxson. Trafficking fraudulent accounts: The role of the underground market in twitter spam and abuse. In Proceedings of the 22nd USENIX Conference on Security, SEC’13, pages 195–210, Berkeley, CA, USA, 2013. USENIX Association.
-  Alex Hai Wang. Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach, pages 335–342. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
-  Samuel Woolley. Automating power: Social bot interference in global politics. First Monday, 21(4), 2016.
-  Zhi Yang, Christo Wilson, Xiao Wang, Tingting Gao, Ben Y. Zhao, and Yafei Dai. Uncovering social network sybils in the wild. In Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, IMC ’11, pages 259–268, New York, NY, USA, 2011. ACM.
-  Yin Zhu, Xiao Wang, Erheng Zhong, Nanthan N. Liu, He Li, and Qiang Yang. Discovering spammers in social networks. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI’12, pages 171–177. AAAI Press, 2012.