BoostNet: Bootstrapping detection of socialbots, and a case study from Guatemala

by E. I. Velazquez Richards, et al.

We present a method to reconstruct networks of socialbots from minimal input. We then use Kernel Density Estimates of Botometer scores from 47,000 social networking accounts to find clusters of automated accounts, discovering over 5,000 socialbots. This statistical, data-driven approach allows thresholds for socialbot detection to be inferred, as illustrated in a case study we present from Guatemala.



1 Introduction

In this paper we analyze data from the social networking platform Twitter. We use a statistical approach, based on bivariate Kernel Density Estimates, to detect automated accounts (socialbots) at scale in a large dataset. We present our BoostNet algorithm, which detects networks of socialbots in microblogs and social media platforms given a very small number of initial accounts. We illustrate its performance with empirical data collected from Twitter in relation to current events in Guatemala.

To describe the context of the events that led to this particular social media situation, we first note that the displacement of people due to armed conflict and corruption is a problem that affects many countries around the world, and one that has strongly affected the Central American countries of Honduras and Guatemala. Nevertheless, according to a Pew Research Center analysis of government data Pew18 , the number of undocumented immigrants in the US is currently at its lowest level in a decade. The same study indicates that border apprehensions have declined for Mexicans but risen for Central Americans.

What are the root causes of migration? Understanding them can help prevent the forced displacement of people, and thus also its effects on the societies that receive them. In previous work we investigated the use of socialbots in Honduras in relation to protests alleging electoral fraud GSV19 .

Consider the case of Guatemala. The International Commission against Impunity in Guatemala (CICIG) UN-CICIG was created in 2006 by the United Nations and Guatemala. Its mission is to investigate and prosecute serious crime UN-CICIG .

As an independent international body, CICIG investigates illegal security groups and clandestine security organizations in Guatemala. These are criminal groups believed to have infiltrated state institutions, fostering impunity and undermining democratic advances since the end of the armed conflict in the 1990s. The third impeachment request against President Jimmy Morales, for illicit financing of his 2015 electoral campaign, was filed by the Attorney General and the CICIG.

The CICIG's mandate was originally set to end on September 3rd, 2019, but it was cut abruptly short when Guatemalan President Morales ordered the CICIG to leave the country on January 7th, 2019 Linthicum2019 .

After we published our work on socialbots in Honduras GSV19 , a Guatemalan journalist contacted us, claiming that similar socialbots were acting against the population there. Specifically, multiple Twitter accounts were allegedly being used to systematically intimidate and harass members of the CICIG and of the media covering its activities. In April 2018 we were given 19 seed accounts of potential socialbots that were notorious for this negative behaviour.

From these 19 seed accounts we reconstructed a network of over 35,000 accounts by collecting their followers and their followees. The rationale is that socialbot accounts are not generally followed by human accounts. Following this premise, we start from the 19 seed accounts and take two hops out into the follower network to find other accounts that are also automated and being used for the same purpose. This method, which we call BoostNet, is described in Algorithm 1. Figure 1 visualizes the reach and spread of the full network, and figure 2 shows a subset of the most active bot accounts and their retweet relationships. To separate socialbots from human accounts, we queried Botometer Davis2016 and performed a statistical analysis of the scores it provides (explained below; see figure 3). This strategy led us to discover a socialbot network of over 3,000 accounts. We further validated our method using 14 additional accounts mentioned in a November 2018 media interview about socialbot harassment in Guatemala. From these 14 seed accounts we reconstructed a full network of over 12,000 accounts and found over 2,000 socialbots (see figure 4). Over 600 socialbots were common to both datasets.

To better understand the magnitude of this socialbot network, note that Guatemala has a population of around 17 million people, of whom only about 4.5 million are internet users Guate-Net . Measurements of social media use in Guatemala indicate that 5.24% of internet users are active on Twitter Guate-Tw . We can therefore extrapolate an admittedly rough estimate of around 250,000 Twitter users in Guatemala (2018 figures). In this light, socialbot networks of 3,000 and 2,000 accounts can have a considerable impact.
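The estimate above is simple arithmetic, and a short Python check makes each step explicit. Multiplying the quoted figures gives roughly 236,000 users, which the text rounds to around 250,000. The bot-to-user ratios printed at the end are illustrative only: they assume, as a simplification not made in the text, that every detected socialbot is counted against this Guatemalan user base.

```python
# Rough estimate of Guatemala's Twitter user base from the 2018 figures quoted above.
internet_users = 4_500_000  # internet users in Guatemala (Guate-Net)
twitter_share = 0.0524      # share of internet users active on Twitter (Guate-Tw)

twitter_users = internet_users * twitter_share
print(f"Estimated Twitter users: {twitter_users:,.0f}")  # 235,800

# Illustrative only: assumes every detected socialbot counts against this user base.
for label, bots in [("April 2018", 3009), ("November 2018", 2154)]:
    share = bots / twitter_users
    print(f"{label}: {bots:,} socialbots, about {share:.1%} of the estimate")
```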

Conclusions: Our work demonstrates how statistical methods can reveal the existence of a considerable network of linked socialbot accounts. Given the likely size of Guatemala's total Twitter user base, this number of socialbot accounts could certainly impede freedom of expression. These findings corroborate the experience of users (and journalists) who claimed that this technology was being widely abused for nefarious purposes in Guatemala.

Moreover, our BoostNet strategy can be employed in other circumstances and on other social media platforms, where limited observational data can lead to a complete reconstruction of networks of malicious accounts.

2 Data Collection

In this section we describe our strategy to gather a large network of socialbot accounts from a small number of accounts that are reported to be abusing a social media service. We present an algorithm that can be replicated in other circumstances, and can be easily implemented to reconstruct a complete network of linked accounts.

2.1 BoostNet: A method to find socialbot networks with minimal input

The following pseudo-code illustrates our workflow for constructing networks in which human and socialbot accounts can be analyzed. Our method finds large networks of socialbots given a small number of starting accounts. In the empirical case study presented here we discovered two sets of socialbots, one containing over 3,000 socialbot accounts and the other over 2,000, starting from only 19 and 14 accounts respectively that were reportedly harassing journalists and members of the CICIG.

Input: A collection S of Twitter accounts
Output: Full linked network N of S, with a socialbot score for every account
For each account s in S:
        Collect the followers of s from Twitter's REST API
        Collect the accounts that s is following (its followees) from Twitter's REST API
        Obtain the Botometer score of every account in S and in the collected follower and followee sets, to determine whether it is human or socialbot
Construct the follower-followee network N annotated with Botometer scores
Algorithm 1 BoostNet: Bootstrapping Socialbot Network Detection
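A minimal Python sketch of Algorithm 1 follows. It is not the authors' implementation: `get_followers`, `get_followees` and `get_bot_score` are hypothetical callables standing in for Twitter REST API and Botometer requests, and the `hops` parameter is our addition. With `hops=1` the sketch expands only the seed accounts, as in Algorithm 1; `hops=2` corresponds to the two-hop expansion described in the introduction.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (follower, followee)


def boostnet(
    seeds: Iterable[str],
    get_followers: Callable[[str], List[str]],
    get_followees: Callable[[str], List[str]],
    get_bot_score: Callable[[str], float],
    hops: int = 1,
) -> Tuple[Set[Edge], Dict[str, float]]:
    """Expand seed accounts into a follower-followee network annotated with bot scores."""
    edges: Set[Edge] = set()
    visited: Set[str] = set(seeds)
    frontier: Set[str] = set(visited)
    for _ in range(hops):
        discovered: Set[str] = set()
        for account in sorted(frontier):
            for follower in get_followers(account):  # accounts following `account`
                edges.add((follower, account))
                discovered.add(follower)
            for followee in get_followees(account):  # accounts `account` follows
                edges.add((account, followee))
                discovered.add(followee)
        frontier = discovered - visited  # only expand accounts not seen before
        visited |= frontier
    # One score lookup per account (Botometer in the paper; a stub here).
    scores = {account: get_bot_score(account) for account in sorted(visited)}
    return edges, scores


# Toy follow graph, invented for illustration: (follower, followee) pairs.
FOLLOWS = {("a", "seed"), ("b", "seed"), ("seed", "c"), ("d", "a")}


def toy_followers(user: str) -> List[str]:
    return sorted(f for f, t in FOLLOWS if t == user)


def toy_followees(user: str) -> List[str]:
    return sorted(t for f, t in FOLLOWS if f == user)


def toy_score(user: str) -> float:
    return 0.9 if user in {"seed", "a"} else 0.1


edges, scores = boostnet(["seed"], toy_followers, toy_followees, toy_score)
print(sorted(edges))
print(scores)
```

Passing the API clients in as callables keeps the expansion logic testable offline; in a real run each call would be a rate-limited network request, so collecting every account before scoring them, as above, avoids scoring the same account twice.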

2.2 Comparison with Twitter's Streaming API

One pointed criticism of certain Twitter studies is their reliance on Twitter's Streaming API for data acquisition. While the Streaming API provides free public access to a sample of tweets and has promoted research into social networks, its sampling method imposes certain limitations. We circumvent these difficulties in finding networks of linked accounts by querying follower and followee connections from Twitter's REST API. In this way we reconstructed a full dataset of accounts that are linked in the same connected network.

Some studies have avoided the uncertainty caused by the Streaming API's sampling bias by using the Search API to obtain complete datasets Stella18 . Another option is to work directly with Twitter, and some research has successfully established influence relations using this kind of access Aral18 .

For this work we reconstructed the full dataset of interest using only the REST API.
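Collecting complete follower and followee lists from the REST API means following its pagination. Twitter's v1.1 `followers/ids` and `friends/ids` endpoints returned IDs in pages linked by a `next_cursor` value, starting from cursor -1 and ending when `next_cursor` is 0. The Python sketch below shows that loop; `fetch_page` is a hypothetical callable standing in for the HTTP request, and the page contents are invented.

```python
from typing import Callable, Dict, List

# Hypothetical page fetcher modelled on the cursored responses of Twitter's v1.1
# followers/ids and friends/ids endpoints: {"ids": [...], "next_cursor": int}.
FetchPage = Callable[[str, int], Dict]


def collect_all_ids(user_id: str, fetch_page: FetchPage) -> List[int]:
    """Follow next_cursor from -1 (first page) until it is 0 (no more pages)."""
    ids: List[int] = []
    cursor = -1
    while cursor != 0:
        page = fetch_page(user_id, cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids


# Stand-in for the REST API: two pages of invented follower IDs.
PAGES = {
    -1: {"ids": [101, 102, 103], "next_cursor": 42},
    42: {"ids": [104], "next_cursor": 0},
}

print(collect_all_ids("seed_account", lambda user, cursor: PAGES[cursor]))
# [101, 102, 103, 104]
```

A real collector would also need to pause between requests to stay within the endpoint's rate limit (documented for v1.1 as 15 requests per 15-minute window); that logic is omitted to keep the sketch focused on cursoring.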


Figure 1: Gephi network graph created using OpenOrd and Force Atlas 2 force-directed layout algorithms. The network contains 35,208 nodes, 59,471 edges and 8 distinct clusters or communities. As per Twitter’s data policies we have used the user ID to label the nodes, and not the account handle.


Figure 2: Gephi network graph created using the OpenOrd and Force Atlas 2 force-directed layout algorithms. The complete network (see Fig. 1), which contains 35,208 nodes, 59,471 edges and 8 distinct clusters or communities, was filtered by a degree range of 50, leaving 14 visible nodes (0.04%) and 100 visible edges (0.17%) of the complete network.
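Gephi's Degree Range filter keeps the nodes whose degree lies in a chosen interval. We read "degree range 50" as keeping nodes with a total (in plus out) degree of at least 50; that reading, and the hypothetical `degree_filter` function below, are assumptions. Edges survive only if both endpoints do. The last line reproduces the visible percentages quoted in the caption.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def degree_filter(edges: Iterable[Edge], min_degree: int) -> Tuple[Set[str], Set[Edge]]:
    """Keep nodes with total (in + out) degree >= min_degree, and edges between kept nodes."""
    edge_set = set(edges)
    degree: Counter = Counter()
    for u, v in edge_set:
        degree[u] += 1
        degree[v] += 1
    kept_nodes = {n for n, d in degree.items() if d >= min_degree}
    kept_edges = {(u, v) for u, v in edge_set if u in kept_nodes and v in kept_nodes}
    return kept_nodes, kept_edges


# Visible fractions quoted in the Figure 2 caption.
print(f"nodes: {100 * 14 / 35208:.2f}%  edges: {100 * 100 / 59471:.2f}%")
# nodes: 0.04%  edges: 0.17%
```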

3 Statistical Detection of Socialbot Networks

For a review of Botometer, we recommend RiseSocialBots2016 . Socialbots have been employed for political purposes Wooley2016 , and this technology has also been observed in marketing and propaganda Varol2017-2 . Although research has uncovered other successful methods of bot detection chavoshi2016identifying ; Dickerson2014 ; Chu2012 ; Clark2015 , Botometer has the advantage of providing public API access. Its built-in features, as well as a review of how it compares with, and surpasses, other methods, can be found in Davis2016 ; Varol2017 .

We have previously used this method for identifying bots in online communities in Latin America, specifically in Mexico and Honduras suarez2016influence ; GSV19 ; VYS18 .

In this work we concentrate on three of the language-independent classifiers that Botometer provides: Temporal, Network, and Friend. We aggregate the scores from these three classifiers for every account in our dataset and use KDE to reveal a bimodal behaviour in each 2D pairwise projection, as illustrated in figures 3 and 4. A numerical summary of the number of accounts found appears in table 1.
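The text does not spell out how thresholds are read off the density plots, so the following Python sketch shows one possible way under stated assumptions. Scores are taken to lie in [0, 1]. Each pair of classifiers is smoothed with a hand-written isotropic Gaussian KDE of bandwidth `h=0.05`, an arbitrary choice. The threshold is placed at the density minimum along the diagonal x = y between the two highest modes, and accounts scoring above it on both axes fall in the upper-right region. The function names and the synthetic scores are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

Point = Tuple[float, float]


def kde2d(points: Sequence[Point], x: float, y: float, h: float) -> float:
    """Bivariate Gaussian kernel density estimate at (x, y), isotropic bandwidth h."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * h * h)
    return norm * sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h * h)) for px, py in points
    )


def diagonal_threshold(points: Sequence[Point], h: float = 0.05, steps: int = 101) -> float:
    """Density minimum on the diagonal between the two highest modes (assumes bimodality)."""
    ts = [i / (steps - 1) for i in range(steps)]
    dens = [kde2d(points, t, t, h) for t in ts]
    padded = [-math.inf] + dens + [-math.inf]
    maxima = [i for i in range(steps) if padded[i] < padded[i + 1] >= padded[i + 2]]
    if len(maxima) < 2:
        raise ValueError("density along the diagonal is not bimodal")
    lo, hi = sorted(sorted(maxima, key=lambda i: dens[i])[-2:])
    valley = min(range(lo, hi + 1), key=lambda i: dens[i])
    return ts[valley]


def flag_upper_right(
    scores: Dict[str, Dict[str, float]],
    classifiers: Tuple[str, ...] = ("temporal", "network", "friend"),
    h: float = 0.05,
) -> Dict[Tuple[str, str], Tuple[float, Set[str]]]:
    """For each classifier pair, infer a threshold and list accounts above it on both axes."""
    result = {}
    for a, b in combinations(classifiers, 2):
        points = [(s[a], s[b]) for s in scores.values()]
        t = diagonal_threshold(points, h)
        result[(a, b)] = (t, {acc for acc, s in scores.items() if s[a] > t and s[b] > t})
    return result


# Synthetic scores in [0, 1]: 25 "human-like" and 10 "bot-like" accounts (invented).
scores = {f"h{i}": {"temporal": 0.15 + 0.025 * (i % 5),
                    "network": 0.15 + 0.025 * ((i // 5) % 5),
                    "friend": 0.15 + 0.025 * ((i + 2) % 5)} for i in range(25)}
scores.update({f"b{i}": {"temporal": 0.75 + 0.025 * (i % 5),
                         "network": 0.75 + 0.025 * (i // 5),
                         "friend": 0.75 + 0.025 * ((i + 3) % 5)} for i in range(10)})

for pair, (threshold, flagged) in flag_upper_right(scores).items():
    print(pair, round(threshold, 2), sorted(flagged))
```

In practice a library estimator such as `scipy.stats.gaussian_kde`, with data-driven bandwidth selection, would replace the hand-written `kde2d`; it is written out here only to keep the sketch dependency-free.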

Figure 3: 2D kernel density estimates for the Network-Friend, Network-Temporal, and Temporal-Friend pairs of Botometer classifiers, for the 35,208 Twitter accounts in our first dataset, obtained through our BoostNet method. The regions in the upper right corners correspond to the over 3,000 socialbot accounts that we discovered. These results were obtained between April 9th and 18th, 2018.
Figure 4: 2D kernel density estimates for the Network-Friend, Network-Temporal, and Temporal-Friend pairs of Botometer classifiers, for the 12,044 Twitter accounts in our second dataset, obtained through our BoostNet method. The regions in the upper right corners correspond to the 2,154 socialbot accounts that we discovered. These results were obtained between November 2018 and January 2019.
Dataset     Accounts in network   Total bots
April       35,208                3,009
November    12,044                2,154
Table 1: Linked accounts in our two datasets, from the full networks BoostNet reconstructed in April and in November 2018. A comparison of the two datasets yields 3,688 shared accounts, at least 646 of which were classified as socialbots.
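The overlap reported in Table 1 is a set intersection over account IDs. The Python sketch below assumes that each dataset maps an account ID to a boolean socialbot label, and counts an account as a shared socialbot only if it is labelled as one in both datasets; the function name and the toy labels are hypothetical.

```python
from typing import Dict, Set, Tuple


def compare_datasets(
    first: Dict[str, bool], second: Dict[str, bool]
) -> Tuple[Set[str], Set[str]]:
    """Return accounts present in both datasets, and those labelled socialbot in both."""
    shared = set(first) & set(second)
    shared_bots = {acc for acc in shared if first[acc] and second[acc]}
    return shared, shared_bots


# Invented labels: account ID -> True if classified as a socialbot.
april = {"u1": True, "u2": False, "u3": True}
november = {"u2": True, "u3": True, "u4": False}

shared, shared_bots = compare_datasets(april, november)
print(len(shared), len(shared_bots))  # 2 1
```

Requiring the socialbot label in both datasets is the more conservative of the two obvious choices; requiring it in either dataset would produce a larger count.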
We thank the OSoMe team at Indiana University for access to Botometer, and Twitter for allowing access to data through its APIs. PSS acknowledges support from UNAM-DGAPA-PAPIIT-IN104819.


  • (1) Aral S., and Dhillon P.S., Social influence maximization under empirical influence models, Nature Human Behaviour (2018): 1.
  • (2) CICIG (International Commission against Impunity in Guatemala), United Nations, U.N. Department of Political Affairs. Cited 28 Dec 2018.
  • (3) Internet Live Stats,
  • (4) Social Media Stats Guatemala, GlobalStats,
  • (5) Gallagher, E., Suárez-Serrato P. and Velazquez Richards E. I., Socialbots Whitewashing Contested Elections; A Case Study from Honduras, Third International Congress on Information and Communication Technology, Advances in Intelligent Systems and Computing 797, 2019, Springer Singapore, 547–552.
  • (6) Velazquez Richards E. I., Yazdani M., Suárez-Serrato P., Socialbots supporting human rights, AAAI/ACM Artificial Intelligence, Ethics, and Society, 2018.

  • (7) Pew Research Center: U.S. Unauthorized Immigrant Total Dips to Lowest Level in a Decade (2018). Cited 26 Dec 2018.
  • (8) Chavoshi, N., Hamooni, H., Mueen, A.: Identifying correlated bots in twitter. In: International Conference on Social Informatics. pp. 14–21. Springer International Publishing (2016)
  • (9) Chu, Z., Gianvecchio, S., Wang, H., Jajodia, S.: Who is tweeting on twitter: Human, bot, or cyborg? In: Proceedings of the 26th Annual Computer Security Applications Conference. pp. 21–30. ACSAC ’10, ACM, New York, NY, USA (2010)
  • (10) Clark, E.M., Williams, J.R., Jones, C.A., Galbraith, R.A., Danforth, C.M., Dodds, P.S.: Sifting robotic from organic text: A natural language approach for detecting automation on twitter. Journal of Computational Science 16, 1 – 7 (2016)
  • (11) Davis, Clayton Allen and Varol, Onur and Ferrara, Emilio and Flammini, Alessandro and Menczer, Filippo: BotOrNot: A System to Evaluate Social Bots, Proceedings of the 25th International Conference Companion on World Wide Web, WWW ’16 Companion, 2016, 273–274, International World Wide Web Conferences Steering Committee
  • (12) Dickerson, J.P., Kagan, V., Subrahmanian, V.: Using sentiment to detect bots on twitter: Are humans more opinionated than bots? 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 620–627 (2014)
  • (13) Ferrara, E., Varol, O., Davis, C., Menczer, F., Flammini, A.: The rise of social bots. Commun. ACM 59(7), 96–104 (Jun 2016)
  • (14) Linthicum K., A U.N. anti-corruption commission is fleeing Guatemala after president's order, Los Angeles Times, Mexico & The Americas, Jan 8 2019.
  • (15) Suárez-Serrato P., Roberts M.E., Davis C., Menczer F. (2016): On the Influence of Social Bots in Online Protests. In: Spiro E., Ahn YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10047. Springer, Cham
  • (16) Stella M., Ferrara E., and De Domenico M. ,Bots increase exposure to negative and inflammatory content in online social systems, Proceedings of the National Academy of Sciences 115, no. 49 (2018): 12435–12440.
  • (17) Varol O., Ferrara E., Davis C.A., Menczer F., Flammini A., Online Human-Bot Interactions: Detection, Estimation, and Characterization, Proceedings of the Eleventh International Conference on Web and Social Media, ICWSM 2017, Montréal, Québec, Canada, May 15-18, 2017,280–289
  • (18) Varol O., Ferrara E., Menczer F., and Flammini A., Early detection of promoted campaigns on social media, EPJ Data Science, 2193–1127, 2017

  • (19) Velázquez E., Yazdani M., Suárez-Serrato P., Socialbots supporting human rights, to appear in Proceedings AAAI-ACM International Conference on Artificial Intelligence Ethics and Society, 2018.
  • (20) Woolley, S.: Automating power: Social bot interference in global politics. First Monday 21(4) (2016)