Bot Electioneering Volume: Visualizing Social Bot Activity During Elections

02/06/2019 ∙ by Kai-Cheng Yang, et al. ∙ Indiana University

It has been widely recognized that automated bots may have a significant impact on the outcomes of national events. It is important to raise public awareness of the threat that bots on social media pose during events such as the 2018 US midterm election. To this end, we deployed a web application to help the public explore the activities of likely bots on Twitter on a daily basis. The application, called Bot Electioneering Volume (BEV), reports the level of likely bot activity and visualizes the topics targeted by these bots. With this paper we release our code base for the BEV framework, with the goal of facilitating future efforts to combat malicious bots on social media.


1. Introduction

Social bots have recently drawn great public attention. They are social media accounts controlled, at least in part, by algorithms to generate, share, or retweet content and to interact with human users (Ferrara et al., 2016). Because social bots are automated, they scale easily: a single person can control thousands of accounts on one or more social media platforms. When needed, these bots can work collectively to manipulate the public by promoting certain accounts or opinions.

As social animals, human users are inevitably vulnerable to the efforts of social bots. Studies have shown that social bots are ubiquitous (Varol et al., 2017) and distort online discussions, particularly political ones. During the 2010 US midterm election, primitive social bots were used to attack some candidates (Metaxas and Mustafaraj, 2012) and to spread tweets with links to fake news websites (Ratkiewicz et al., 2011). A similar pattern emerged in the 2016 US presidential election, but with more sophisticated bots designed to push their messages effectively to target audiences (Bessi and Ferrara, 2016). In particular, bots were most active in the core of the misinformation-sharing network (Shao et al., 2018b) and effectively amplified the spread of low-credibility content by posting it within seconds and by targeting influential accounts (Shao et al., 2018a). Analogous automated campaigns have been reported in countries around the globe (Stella et al., 2018; Ferrara, 2017).

Figure 1. Illustration of BEV’s (a) crawler, (b) database, and (c) analyzer, and (d) a screenshot of the front-end interface. The upper panel of the front end shows the Bot Electioneering Volume for the past 8 days; users can select a day of interest to explore its top topics. The bottom panels show a tag cloud and entity lists for the selected day. The tag cloud presents all entities together, with the size of each entity proportional to how often it is tweeted by bots. The entity lists display hashtags, mentions, and links ranked by how often each is tweeted by bots.

Here we present Bot Electioneering Volume (BEV), a platform that visualizes the volume of content generated by bots and the topics they target. BEV tracks election-related traffic on Twitter by feeding a list of selected hashtags to the streaming API. By incorporating the bot-detection capability of Botometer (Davis et al., 2016; Varol et al., 2017; Yang et al., 2019), BEV distinguishes content generated by likely bots from content generated by humans. The average bot activity is then compared with that of a random sample of tweets to produce a number that quantifies electioneering activity by bots. BEV also collects the topics of content shared by likely bots, including hashtags, mentions, and URLs, and reports their relative volumes.

BEV (botometer.iuni.iu.edu/bev) monitored public tweets about the 2018 US midterm elections between October 22 and December 30, 2018, and drew over 3,000 visits during that period. An archive of the data from October to December 2018 remains publicly available for retrospective inspection. We plan to activate BEV again for future elections. Our goal is to raise public awareness of bot activity and its impact during past elections and, more importantly, future ones.

2. System design

The BEV system contains four major components: a crawler, a database, an analyzer, and a front-end interface, as illustrated in Figure 1.

The crawler is in charge of monitoring Twitter’s filtering API for public election-related tweets, querying Twitter’s Spritzer API for random samples of public tweets, and fetching bot scores. Crawled data is stored in the database. The analyzer then extracts the required information and generates the statistics visualized by the front end. The front end has three major parts: the Bot Electioneering Volume timeline, a tag cloud, and entity lists. The Bot Electioneering Volume measures the activity of likely bots, while the tag cloud and entity lists display the topics most tweeted by likely bots. Clicking on a link in an entity list directs users to Hoaxy (hoaxy.iuni.iu.edu) (Shao et al., 2016), where they can explore more in-depth visualizations of the influence of bots around that entity on Twitter over recent minutes, hours, or days.

Data collection runs continuously in a streaming fashion, but fetching bot scores and analyzing the data take time, so the front-end interface is updated every 4 hours to reflect newly incoming data.

2.1. Data collection

The collection of election-related tweets is crucial to our application. Our collection process starts with a set of election-related hashtags that are tracked using the Twitter filtering API. We seeded our set with several widely used political hashtags, including #2018midterms, #maga, and #bluewave. We then repeatedly expanded the set with co-occurring hashtags (Conover et al., 2012), resulting in a set of 110 hashtags. From this set we manually removed 6 hashtags that were too general or irrelevant to the election. We also added the hashtags for each US state’s Senate race: #casen, #nysen, and so on. The full list of hashtags can be found on the FAQ page of the BEV website. This methodology allows our system to collect most tweets with newly emerging election hashtags, because such tweets are likely to also contain some of the hashtags in our list.
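The iterative expansion step can be sketched as a snowball over hashtag co-occurrence. This is a minimal illustration, not the BEV code: the function name `expand_hashtags`, the Jaccard-similarity criterion, and the `threshold` and `max_rounds` values are all assumptions chosen for clarity. Each tweet is represented simply as the set of hashtags it contains.

```python
from collections import Counter
from typing import Iterable, List, Set


def expand_hashtags(
    tweets: List[Set[str]],
    seeds: Iterable[str],
    threshold: float = 0.005,
    max_rounds: int = 10,
) -> Set[str]:
    """Grow a seed set of hashtags with co-occurring hashtags (hypothetical helper).

    ASSUMPTION (illustrative only): a candidate hashtag h joins the tracked set
    when the Jaccard similarity between the tweets containing h and the tweets
    matched by the tracked set exceeds `threshold`.
    """
    tracked = {h.lower() for h in seeds}
    # Lowercase every tweet's hashtags once so matching is case-insensitive.
    normalized = [{h.lower() for h in tags} for tags in tweets]
    # Number of tweets containing each hashtag.
    freq = Counter(h for tags in normalized for h in tags)

    for _ in range(max_rounds):
        # Tweets that the current tracked set would capture.
        matched = [tags for tags in normalized if tags & tracked]
        # For each untracked hashtag: how many captured tweets also contain it.
        co_counts = Counter(h for tags in matched for h in tags if h not in tracked)
        new = set()
        for h, both in co_counts.items():
            union = freq[h] + len(matched) - both  # |A u B| = |A| + |B| - |A n B|
            if both / union > threshold:
                new.add(h)
        if not new:  # converged: no further co-occurring hashtags qualify
            break
        tracked |= new
    return tracked
```

For example, with tweets `[{"#MAGA", "#vote"}, {"#bluewave", "#vote"}, {"#vote"}, {"#cats"}]`, seeds `["#maga", "#bluewave"]`, and `threshold=0.3`, the sketch adds `#vote` in the first round and then converges, leaving `#cats` out. The manual curation described above (dropping overly general hashtags, adding Senate-race hashtags) would be applied to the returned set afterwards.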

Twitter’s free filtering API delivers at most 1% of Twitter’s traffic. We estimated that the traffic captured by our method is about 0.3–0.5% of Twitter’s complete traffic (see Figure 2(a)). Because this is below the 1% cap, the stream is not rate-limited, so BEV collects and visualizes bot activity based on all of the targeted election-related tweets rather than on a sample.
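The argument above is a simple comparison against the 1% cap, which can be sketched as follows. The function names are hypothetical, and both daily counts in the usage example are made-up round numbers rather than measurements from BEV.

```python
STREAM_CAP = 0.01  # the free filtering API delivers at most ~1% of all tweets


def coverage_fraction(matched_per_day: int, total_per_day: int) -> float:
    """Fraction of all Twitter traffic matched by the tracked hashtags (hypothetical helper)."""
    return matched_per_day / total_per_day


def stream_is_complete(matched_per_day: int, total_per_day: int) -> bool:
    """True if matched traffic stays under the cap, so no matching tweets are dropped."""
    return coverage_fraction(matched_per_day, total_per_day) < STREAM_CAP


# Hypothetical inputs: 2 million matched tweets out of 500 million tweets per day.
print(coverage_fraction(2_000_000, 500_000_000))   # 0.4% of traffic
print(stream_is_complete(2_000_000, 500_000_000))  # below the cap
```

A coverage in the reported 0.3–0.5% range leaves headroom below the cap, which is why the collected stream can be treated as complete rather than sampled.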

2.2. Bot identification

BEV uses Botometer (botometer.iuni.iu.edu) (Davis et al., 2016; Varol et al., 2017; Yang et al., 2019) to obtain bot scores for Twitter accounts involved in election-related discourse. Botometer is a supervised machine learning algorithm that considers more than a thousand features of an account and its activity to estimate the likelihood that the account is automated. We consider accounts with a bot score above 4 (on a 5-point scale) to be social bots. This is a fairly conservative threshold, corresponding to a posterior probability of automation near 50% (Yang et al., 2019) based on a 15% prior probability (Varol et al., 2017).
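The link between the threshold and the posterior is Bayes' rule. The sketch below is illustrative only: the detection rates `p_flag_given_bot = 0.34` and `p_flag_given_human = 0.06` are hypothetical numbers, not Botometer's measured rates. They are chosen so that their ratio, about 5.67 (that is, 0.85/0.15), is exactly the likelihood ratio needed to turn a 15% prior into a 50% posterior.

```python
BOT_THRESHOLD = 4.0  # "above 4" on the 5-point scale counts as a likely bot


def is_likely_bot(score: float, threshold: float = BOT_THRESHOLD) -> bool:
    """Strictly greater than the threshold, matching 'bot score above 4'."""
    return score > threshold


def posterior_bot(prior: float, p_flag_given_bot: float, p_flag_given_human: float) -> float:
    """Bayes' rule: P(bot | score > threshold)."""
    numerator = p_flag_given_bot * prior
    denominator = numerator + p_flag_given_human * (1.0 - prior)
    return numerator / denominator


# Hypothetical detection rates (NOT Botometer's published values).
print(posterior_bot(prior=0.15, p_flag_given_bot=0.34, p_flag_given_human=0.06))  # about 0.5
```

In odds form, posterior odds equal the likelihood ratio times the prior odds (0.15/0.85), so any threshold whose likelihood ratio is about 5.67 yields roughly even odds of automation under a 15% prior.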

2.3. Bot activity measurement

Figure 2. (a) Number of election tweets and unique users for each day. (b) Average bot scores for election tweets and the random sample. (c) BEV timeline. Election day of the 2018 US midterms, November 6, is marked by a vertical dashed line.

To measure the Bot Electioneering Volume, we first take daily averages of the bot scores of accounts generating political tweets, weighted by their tweet frequencies. Because spamming the same message is a common bot strategy, the weighted average better highlights the amount of bot-generated content and its potential influence. To obtain a baseline of bot activity, we compute the same weighted average for random tweets sampled from Twitter’s Spritzer API at a rate of 1,000 tweets per hour. The daily average bot scores for election and random tweets are shown in Figure 2(b).
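Weighting each account's score by its tweet count is the same as averaging the author's score over every tweet. A minimal sketch, assuming tweets are given as a list of author IDs and scores as a dictionary (the function name is hypothetical, and skipping unscored authors is an assumption):

```python
from typing import Dict, Iterable


def weighted_average_score(tweet_authors: Iterable[str], scores: Dict[str, float]) -> float:
    """Tweet-weighted average bot score (hypothetical helper).

    Equivalent to sum(n_u * s_u) / sum(n_u), where n_u is the number of tweets
    by account u and s_u its bot score. ASSUMPTION: authors without a score
    are skipped.
    """
    total, count = 0.0, 0
    for author in tweet_authors:  # one entry per tweet
        if author in scores:
            total += scores[author]
            count += 1
    if count == 0:
        raise ValueError("no scored tweets")
    return total / count


# Account "a" (score 4.5) posts three tweets; account "b" (score 0.5) posts one.
print(weighted_average_score(["a", "a", "a", "b"], {"a": 4.5, "b": 0.5}))  # 3.5
```

The per-account mean in this example would be 2.5; weighting by tweet frequency raises it to 3.5 because the high-scoring account posts more, which is the effect the weighting is meant to capture.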

The Bot Electioneering Volume is defined as the relative difference between the two averages:

$$\mathrm{BEV} = \frac{\bar{s}_e - \bar{s}_r}{\bar{s}_r} \qquad (1)$$

where $\bar{s}_e$ and $\bar{s}_r$ represent the average bot scores of electoral and random tweets, respectively. BEV is displayed as a percentage difference in the front-end interface (Figure 1(d)). Figure 2(c) plots the Bot Electioneering Volume over the whole midterm election period.
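Eq. 1 and its percentage display can be sketched in a few lines. The function names and the one-decimal signed format are assumptions for illustration, not taken from the BEV front end.

```python
def bot_electioneering_volume(election_avg: float, random_avg: float) -> float:
    """Eq. 1: relative difference between the election and baseline averages."""
    if random_avg <= 0:
        raise ValueError("baseline average must be positive")
    return (election_avg - random_avg) / random_avg


def as_percentage(bev: float) -> str:
    """Signed percentage string (ASSUMED display format)."""
    return f"{bev * 100:+.1f}%"


print(as_percentage(bot_electioneering_volume(1.5, 1.0)))   # +50.0%
print(as_percentage(bot_electioneering_volume(0.75, 1.0)))  # -25.0%
```

A positive value means election tweets come from more bot-like accounts than the random baseline; a negative value means less bot-like.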

In the design stage of BEV, we considered alternative metrics obtained by replacing the average bot score in Eq. 1 with the median score or with the proportion of tweets by likely bots. The version based on median scores yields patterns similar to the average-score version. However, the version based on the proportion of tweets by bots (denoted BEV2) shows different trends. As shown in Figure 3(a), the baseline proportion of tweets by bots decreases steadily after the midterm elections. Because this baseline appears in Eq. 1 both as the subtracted term and as the denominator, the BEV2 value increases accordingly (Figure 3(b)). Since the proportion of tweets by likely bots in the random sample is not stable, we decided to deploy the BEV metric based on average bot scores.
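The three candidate metrics differ only in the per-tweet statistic plugged into Eq. 1. The sketch below uses hypothetical function names and made-up per-tweet scores (one score per tweet) to show why BEV2 rises when the baseline proportion of bot tweets falls, even though the election tweets themselves do not change.

```python
from statistics import mean, median
from typing import Callable, Sequence


def proportion_bots(scores: Sequence[float], threshold: float = 4.0) -> float:
    """Share of tweets whose author scores above the bot threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def relative_difference(election: Sequence[float], baseline: Sequence[float],
                        stat: Callable[[Sequence[float]], float]) -> float:
    """Eq. 1 with a pluggable statistic (mean, median, or proportion of bots)."""
    b = stat(baseline)
    return (stat(election) - b) / b


election = [4.5, 1.0, 1.0, 1.0, 1.0]         # 20% of tweets by likely bots
baseline_before = [4.5] + [1.0] * 9           # 10% in the random sample
baseline_after = [4.5] * 2 + [1.0] * 23       # baseline drops to 8%

for name, stat in [("mean", mean), ("median", median), ("BEV2", proportion_bots)]:
    print(name, relative_difference(election, baseline_before, stat))
print("BEV2 after baseline drop", relative_difference(election, baseline_after, proportion_bots))
```

With identical election tweets, BEV2 moves from 1.0 (+100%) to 1.5 (+150%) purely because the baseline shrank, which illustrates why an unstable baseline proportion makes BEV2 hard to interpret.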

Figure 3. (a) Proportion of tweets by bots for election tweets and random sample. (b) BEV2.

The BEV front-end interface also visualizes the topics targeted by likely bots. For simplicity, we only extract hashtags, mentioned accounts, and links from the tweets generated by bots. The targeted topics are represented in a tag cloud and entity lists.
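Counting these entities can be sketched with `collections.Counter`. This assumes tweets arrive as Twitter API v1.1 JSON objects whose `entities` field contains `hashtags` (with `text`), `user_mentions` (with `screen_name`), and `urls` (with `expanded_url`). The function names and the lowercasing of hashtags are hypothetical choices, and entities nested inside retweeted or extended tweets are ignored for simplicity.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_entities(bot_tweets: Iterable[dict]) -> Dict[str, Counter]:
    """Count hashtags, mentions, and links in tweets by likely bots (hypothetical helper)."""
    counts = {"hashtags": Counter(), "mentions": Counter(), "links": Counter()}
    for tweet in bot_tweets:
        entities = tweet.get("entities", {})
        for h in entities.get("hashtags", []):
            counts["hashtags"]["#" + h["text"].lower()] += 1  # ASSUMPTION: case-folded
        for m in entities.get("user_mentions", []):
            counts["mentions"]["@" + m["screen_name"]] += 1
        for u in entities.get("urls", []):
            url = u.get("expanded_url") or u.get("url")
            if url:
                counts["links"][url] += 1
    return counts


def top_entities(counts: Dict[str, Counter], k: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    """Ranked entity lists, one per entity type."""
    return {kind: c.most_common(k) for kind, c in counts.items()}


def tag_cloud_weights(counts: Dict[str, Counter]) -> Counter:
    """All entity types merged; tag size would be proportional to these counts."""
    merged = Counter()
    for c in counts.values():
        merged.update(c)
    return merged
```

Keeping one counter per entity type supports the separate ranked lists, while the merged counter matches the tag cloud, which presents all entities together.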

3. Discussion

BEV reveals many interesting patterns. In terms of discussion intensity, the volume of election tweets before the election was about twice that after the election-day peak, as shown in Figure 2(a). As for bot activity, all the metrics (average bot score, median bot score, and proportion of tweets by likely bots) are much higher for election tweets than for the random sample, suggesting that bots actively generate election-related content; they are indeed employed for electioneering.

The average bot score of election-related tweets fluctuated from day to day. The BEV on November 6 and 7 was drastically lower than on other days. Given the spike in election tweets, we hypothesize that bot activity was diluted by the large number of ordinary users active around election day. Furthermore, the average bot scores of election-related tweets after the election are generally higher than those before it, perhaps because many human users stopped tweeting political content after the election while the bots kept working. Because BEV is based on daily averages of scores, it cannot reflect changes in the total volume of tweets. In the future, a measurement that leverages both the average bot score and the tweet volume may better represent the volume generated by bots.
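The dilution hypothesis can be made concrete with a toy calculation. All numbers below are hypothetical: bots post a constant 1,000 tweets per day with score 4.5, while the human volume (score 1.0) varies.

```python
def average_score(bot_tweets: int, human_tweets: int,
                  bot_score: float = 4.5, human_score: float = 1.0) -> float:
    """Tweet-weighted average score for a toy mix of bot and human tweets (hypothetical)."""
    total = bot_tweets * bot_score + human_tweets * human_score
    return total / (bot_tweets + human_tweets)


# Same bot output, different human activity levels.
print(average_score(1000, 9000))   # ordinary day: 1.35
print(average_score(1000, 39000))  # election-day spike: 1.0875
print(average_score(1000, 4000))   # post-election lull: 1.7
```

Bot output is identical in all three cases, yet the average swings in both directions with human activity alone, which is why an average-based metric cannot by itself convey how much content bots produce.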

4. Conclusion

We offer a real-time tool to visualize the electioneering activities of social bots. Open-source code for BEV is at github.com/IUNetSci/BEV.

Around the 2018 US midterm elections, bot electioneering was rampant. The great majority of content amplified by likely bots was on the conservative side of the political spectrum. It remains to be seen if this will change in the future.

The tool enables the public to gauge how organic the online discussion about elections is and to spot possibly polluted content. By repeating the hashtag list generation procedure with different seeds, BEV can be adapted to target future US elections, as well as elections and events in other countries.

Acknowledgments.

We are grateful to the IU Network Science Institute (iuni.iu.edu) and the AWS Cloud Credits for Research program for supporting the BEV infrastructure and to Twitter for their free APIs. Botometer was developed with support from the Democracy Fund.

References

  • Bessi and Ferrara (2016) Alessandro Bessi and Emilio Ferrara. 2016. Social bots distort the 2016 US Presidential election online discussion. First Monday 21, 11 (2016).
  • Conover et al. (2012) Michael D Conover, Bruno Gonçalves, Alessandro Flammini, and Filippo Menczer. 2012. Partisan asymmetries in online political activity. EPJ Data Science 1 (2012), 6. https://doi.org/10.1140/epjds6
  • Davis et al. (2016) Clayton Allen Davis, Onur Varol, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. 2016. BotOrNot: A system to evaluate social bots. In Proc. 25th Intl. Conf. Companion on World Wide Web. IW3C2, Montreal, 273–274.
  • Ferrara (2017) Emilio Ferrara. 2017. Disinformation and Social Bot Operations in the Run Up to the 2017 French Presidential Election. First Monday 22, 8 (2017).
  • Ferrara et al. (2016) Emilio Ferrara, Onur Varol, Clayton Davis, Filippo Menczer, and Alessandro Flammini. 2016. The rise of social bots. Commun. ACM 59, 7 (2016), 96–104.
  • Metaxas and Mustafaraj (2012) Panagiotis T Metaxas and Eni Mustafaraj. 2012. Social media and the elections. Science 338, 6106 (2012), 472–473.
  • Ratkiewicz et al. (2011) Jacob Ratkiewicz, Michael Conover, Mark R Meiss, Bruno Gonçalves, Alessandro Flammini, and Filippo Menczer. 2011. Detecting and tracking political abuse in social media. In Proc. Intl. Conf. on Weblogs and Social Media (ICWSM). AAAI, Barcelona, 297–304.
  • Shao et al. (2016) Chengcheng Shao, Giovanni Luca Ciampaglia, Alessandro Flammini, and Filippo Menczer. 2016. Hoaxy: A platform for tracking online misinformation. In Proc. 25th Intl. Conf. Companion on World Wide Web. IW3C2, Montreal, 745–750.
  • Shao et al. (2018a) Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kai-Cheng Yang, Alessandro Flammini, and Filippo Menczer. 2018a. The spread of low-credibility content by social bots. Nat. Commun. 9 (2018), 4787. https://doi.org/10.1038/s41467-018-06930-7
  • Shao et al. (2018b) Chengcheng Shao, Pik-Mai Hui, Lei Wang, Xinwen Jiang, Alessandro Flammini, Filippo Menczer, and Giovanni Luca Ciampaglia. 2018b. Anatomy of an online misinformation network. PLoS ONE 13, 4 (2018), e0196087. https://doi.org/10.1371/journal.pone.0196087
  • Stella et al. (2018) Massimo Stella, Emilio Ferrara, and Manlio De Domenico. 2018. Bots increase exposure to negative and inflammatory content in online social systems. Proc. Natl. Acad. Sci. U.S.A. 115, 49 (2018), 12435–12440.
  • Varol et al. (2017) Onur Varol, Emilio Ferrara, Clayton A Davis, Filippo Menczer, and Alessandro Flammini. 2017. Online human-bot interactions: Detection, estimation, and characterization. In Proc. Intl. Conf. on Web and Social Media (ICWSM). AAAI, Montreal, 280–289.
  • Yang et al. (2019) Kai-Cheng Yang, Onur Varol, Clayton A Davis, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. 2019. Arming the public with AI to counter social bots. Human Behavior and Emerging Technologies 1, 1 (2019). https://doi.org/10.1002/hbe2.115 In press; arXiv preprint 1901.00912.