Analyzing Twitter Users' Behavior Before and After Contact by the Internet Research Agency

08/04/2020 ∙ by Upasana Dutta, et al. ∙ University of Colorado Boulder

Social media platforms have been exploited to conduct election interference in recent years. In particular, the Russian-backed Internet Research Agency (IRA) has been identified as a key source of misinformation spread on Twitter prior to the 2016 presidential election. The goal of this research is to understand whether general Twitter users changed their behavior in the year following first contact from an IRA account. We compare the before and after behavior of contacted users to determine whether there were differences in their mean tweet count, the sentiment of their tweets, and the frequency and sentiment of tweets mentioning @realDonaldTrump or @HillaryClinton. Our results indicate that users overall exhibited statistically significant changes in behavior across most of these metrics, and that those users that engaged with the IRA generally showed greater changes in behavior.


Introduction

For many years, the growth of online social media has been seen as a tool to bring people together across borders and time zones. However, malicious actors have harnessed social media to instead sow regional discord (Banjo, 2019; Woolley and Howard, 2017). Prior to the 2016 presidential election, the Internet Research Agency (IRA), a Russian organization, created a “troll farm” that registered and deployed thousands of sham accounts across various social media platforms (U.S. House of Representatives Permanent Select Committee on Intelligence, 2018). These IRA accounts targeted U.S. citizens and worked toward interfering in American politics, specifically the 2016 presidential election contest between Hillary Clinton and Donald Trump (U.S. House of Representatives Permanent Select Committee on Intelligence, 2018). Their operations were deeply sophisticated and mainly aimed at turning social media users against each other along socio-economic, political, and voting lines (Graff, 2018).

The IRA deployed 3,841 fake accounts on Twitter, a micro-blogging site; Twitter reports that 1.4 million of its users were exposed to IRA tweets (Policy, 2018), which generated roughly 73 million total engagements (Thompson, 2018). The main goal of this research is to understand to what extent the behavior of Twitter users changed after contact by IRA bots during or prior to the 2016 U.S. presidential election cycle. The implications are significant as the 2020 presidential election looms, which may be disrupted by the continuing efforts of the IRA (Ward and Lister, 2020) or other like-minded organizations. This work presents an extensive before and after analysis of Twitter users' behavior based on six quantifiable metrics to identify the extent of changes following initial IRA contact.

While prior work has studied in great detail the behavior of the IRA bot accounts and the techniques they employed to spread misinformation (Howard et al., 2019; Boatwright et al., 2018; Boyd et al., 2018; Keller et al., 2020), based on a data set released by Twitter (Policy, 2018), much less work has examined the targets of the IRA bots, namely the overall Twitter user population. One recent work (Bail et al., 2020) examining such targets surveyed the attitudes and behaviors of 1,239 Republican and Democratic Twitter users and found no evidence that Russian agents successfully changed user opinions or voting decisions. However, a limitation of that study is that it focused on a subset of Twitter users who already held fairly strong partisan beliefs and, as a result, may not have been easily swayed by IRA bots.

Our interest is to study a broader population consisting of all the Twitter users contacted by the IRA to understand if their behavior changed before and after contact. This broader population of users may be less partisan in nature and hence more amenable to influence. Specifically, this study focuses on the following research questions:

  • RQ1: Did the contacted Twitter users exhibit a change in behavior following contact with IRA accounts?

  • RQ2: Did the contacted Twitter users who engaged back with the IRA accounts behave differently after the contact compared with those who did not?

  • RQ3: Did the behavior of users with high follower counts (“influencers”) differ from those with low follower counts (“general users”) after initial contact with the IRA?

To answer these questions, we collected tweets of Twitter users whom the IRA bots contacted through retweets, replies, and mentions. To identify whether there were significant changes in tweeting behavior before and after interaction with the IRA, and if so by how much, we selected objective behavioral metrics that measured monthly tweet count, mean sentiment of tweets, frequency of mentioning either @realDonaldTrump or @HillaryClinton, and the mean sentiment of tweets that tag either @realDonaldTrump or @HillaryClinton.

In addition, our research considers whether the behavior of those contacted Twitter users with a high follower count (≥5,000), whom we term “influencers”, differed from the behavior of contacted users with a more typical follower count (<5,000), whom we term “general users”. Combined with our interest in distinguishing users who did and did not respond to the IRA's contact attempts, this creates four groups of users that we studied: responsive influencers, non-responsive influencers, responsive general users, and non-responsive general users.

We further desired to restrict the data set to American Twitter users due to our interest in the 2016 presidential election, but lacked geo-location or other information to determine whether a Twitter user is American. Our closest approximation was to filter the data set for English-language tweets, as the IRA bots also conversed with users in other languages, especially Russian. In addition, the user pool was filtered to remove potential bots, protected accounts, and accounts with insufficient data to be studied. We also focused our efforts on studying those users who were most targeted by the IRA, namely those who had been contacted in all three ways: through retweets, replies, and mentions by the IRA. We reasoned that if this group could first be shown to exhibit a change in behavior, then it would justify expanding our data collection to the much larger set of users who had been contacted in at least one of the three ways. Figure 1 illustrates the overall process of data pre-processing and the before and after analysis.

Figure 1. The process we used to create and analyze the Twitter dataset. The final user population has been divided into four subsets: responsive and unresponsive general users, as well as responsive and unresponsive influencers. All users in this work were filtered to remove those with bot-like behavior, protected accounts, predominantly non-English tweets, or insufficient data to be studied. Before and after analysis was performed to evaluate monthly tweet volumes, overall sentiment, as well as the frequencies and sentiment of tweets mentioning either @realDonaldTrump or @HillaryClinton.

With these caveats in mind, this work provides some of the first evidence that contacted Twitter users’ behavior underwent significant changes in a variety of ways following interactions with Russia’s Internet Research Agency prior to the 2016 presidential election. Specifically, this paper makes the following contributions:

  • A substantially broader population of Twitter users is studied to understand whether user behavior changed before and after contact by the IRA accounts.

  • Evidence is found of statistically significant increases in tweeting frequency across all user groups, including responsive and non-responsive general users and influencers, after initial contact by the IRA. (RQ1)

  • Three of the four user groups exhibited statistically significant decreases in sentiment (i.e., less positive or more negative) after initial contact by the IRA. (RQ1)

  • The frequency of monthly mentions of @realDonaldTrump is shown to significantly increase for all user groups after initial contact by the IRA. (RQ1)

  • Statistically significant increases in monthly mentions of @HillaryClinton occurred for all four user groups after initial contact by the IRA. (RQ1)

  • Across a variety of different metrics, responsive users who engaged with the IRA generally showed change that was either higher in percentage or satisfied stricter statistical significance compared with non-responsive users. (RQ2)

  • We found no statistically significant difference in the behavior of influencers versus general users. (RQ3)

The intent of this paper is to identify and quantify the correlation, where it exists, between initial contact by the IRA and changed behavior by targeted users. It is not to establish causality, i.e., that IRA contact caused the changes in behavior, which would require a different path of research. The Discussion section describes this topic in more detail.

In the following, we first describe background and related work, then explain our data collection and pre-processing filters, followed by a description of our method for performing before and after analysis, including the metrics used. We then present the key results of our data analysis. The paper finishes with a discussion of the implications and limitations of our work as well as a summary of our conclusions.

Background and Related Work

Internet Research Agency

The Internet Research Agency (IRA) is a well-known Russian company engaged in online influence operations on behalf of Russian business and political interests (Kriel and Pavliuc, 2019; Ruck et al., 2019; Boatwright et al., 2018; DiResta et al., 2018). According to the Mueller Report (Mueller and Cat, 2019), the IRA used social media to undermine the electoral process and was at the center of the resulting federal indictment. The agency collected sensitive information, such as names, addresses, phone numbers, email addresses, and other valuable data, about thousands of Americans. The IRA campaign during the 2016 election has drawn great interest from the research community. Earlier research has characterized the strategies of IRA activity on social media from various perspectives (Howard et al., 2019; Boatwright et al., 2018; Boyd et al., 2018; Keller et al., 2020). To the best of our knowledge, most of this work does not examine how IRA accounts interacted with general Twitter users, or whether these campaigns played a role in changing the attitudes and behaviors of such users. Bail et al. (2020) used survey data describing the attitudes and behaviors of 1,239 Republican and Democratic Twitter users and found no evidence that these Russian agents successfully changed user opinions or voting decisions. In comparison, we have expanded the scope of our study to consider a much broader population of users over a longer observational period, and found significant differences in their behaviors before and after contact.

Malicious Bots on Social Media

While some bots are designed to provide better service and management, such as auto-moderators and chat bots, bots can also be misused by extremist groups (Bessi and Ferrara, 2016). For instance, researchers have detected bots that spread misinformation (Shao et al., 2017; Ferrara et al., 2016), interfere with elections (Ferrara, 2017), and amplify hate against minority groups (Albadi et al., 2019). ISIS propagandists are also known to use bots to inflate their influence and promote extreme ideologies (Benigni et al., 2017). Given bots' increasingly sophisticated behavior, it is important to understand them and to investigate strategies for combating malicious bots online.

Social Media Manipulation during Elections

There is growing interest in understanding manipulation campaigns deployed on social media sites that specifically target elections. Beyond the IRA campaign mentioned previously, similar approaches have been observed in other countries. For example, during the 2016 UK Brexit referendum, political bots were found to have played a small but strategic role in shaping Twitter conversations (Howard and Kollanyi, 2016). Ferrara (2017) studied the MacronLeaks disinformation campaign during the 2017 French presidential election and found that most users who engaged in this campaign were foreigners with a preexisting interest in alt-right topics and alternative news media. Arnaudo (2017) examined computational propaganda during three election-related political events in Brazil from 2014 to 2016, and found that social media manipulation was becoming more diverse and operating at a much larger scale. Starbird et al. conducted an interpretative analysis of a cross-platform campaign targeting a rescue group in Syria, which conceptualized what a disinformation campaign is and how it works (Starbird et al., 2019; Wilson and Starbird, 2020).

Data Collection

The goal of this research is to understand the behavioral changes of general American Twitter users after interactions with the IRA. To achieve this goal, IRA tweets were analyzed to find a sample of the user population that IRA accounts interacted with on Twitter. To keep the scope of this work focused on the most heavily targeted users, only those users who were mentioned, replied to, and retweeted by IRA accounts were considered for the analysis, as these users are potentially the ones most impacted by IRA interactions. To isolate American users, the pool of potential users was put through a number of vetting processes to weed out users who did not have a majority of their tweets published in English. The screening also removed users with protected accounts, users who had a high probability of being a Twitter bot account, and users whose first accessible tweet occurred after IRA contact. This process is illustrated in Figure 1.

IRA Data Set: This study used an unhashed version of the data set made public by Twitter for research purposes (Policy, 2018). The IRA accounts were confirmed to be Russian agents by the U.S. government during the 2017 investigation into the 2016 U.S. presidential election cycle (U.S. House of Representatives Permanent Select Committee on Intelligence, 2018). The data set contains the tweets of 3,479 IRA accounts that are assumed to be connected to the propaganda efforts of the Internet Research Agency and are currently suspended (Policy, 2018).

Contacted User Data Set: Twitter has identified 1.4 million users who interacted with the IRA via Twitter (Policy, 2018). However, in order to preserve the anonymity of these users, that data set has not been released to the public. Therefore, to study the users who interacted with the IRA, our research first analyzed a total of 8,768,633 tweets shared by the 3,479 IRA accounts, from May 2009 to June 2018, and identified users who were either replied to, retweeted by, or mentioned by an IRA account. Approximately 6% of the IRA tweets were replies to other users, yielding 82,806 distinct users contacted by IRA accounts; 1,533 (~2%) of these were identified as themselves being affiliated with the IRA. Approximately 38% of the IRA tweets were retweets of other Twitter users, yielding 204,273 distinct users retweeted by IRA accounts, of whom 1,819 (~1%) were identified as IRA-affiliated. Approximately 46% of the IRA tweets mentioned other users, yielding 829,080 distinct users mentioned by IRA accounts; among these, 2,141 (~0.3%) were identified as IRA-affiliated.

In aggregate, we found 837,688 non-IRA Twitter users who received either mentions, replies, or retweets from an IRA account. The set of highly targeted users that were contacted by all three methods (mention, reply, retweet) consists of 15,929 users, and is the focus of this study. We plan to extend this study in the future to consider the wider set of 837,688 users contacted by at least one of the three means.
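As an illustration, the following sketch shows how the three contact sets and their intersection could be derived from the released IRA tweets. The column names (userid, in_reply_to_userid, retweet_userid, user_mentions) are assumptions based on the general layout of Twitter's election-integrity data releases, not a confirmed schema.

```python
import ast
import pandas as pd

# Load the Twitter-released IRA tweet data (column names are assumed here).
ira = pd.read_csv("ira_tweets.csv", dtype=str)
ira_ids = set(ira["userid"])

replied   = set(ira["in_reply_to_userid"].dropna())   # users the IRA replied to
retweeted = set(ira["retweet_userid"].dropna())       # users the IRA retweeted
mentioned = {str(uid)                                 # users the IRA mentioned
             for field in ira["user_mentions"].dropna()
             for uid in ast.literal_eval(field)}

# Highly targeted users: contacted all three ways, excluding IRA accounts.
# The paper reports 15,929 such users.
targets = (replied & retweeted & mentioned) - ira_ids
print(len(targets))
```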

General Users vs. Influencers: We separated the set of users identified above into two groups, influencers and general users, to better understand the IRA's impact. We consider influencers to be those users who have at least 5,000 followers. We used Tweepy, a Python library for the Twitter API (Roesslein, 2019), to retrieve each user's follower count. Tweepy was unable to access information on 2,873 protected, deleted, or similarly inaccessible Twitter users. Of the remaining 13,056 accounts, 6,565 were general users and 6,491 were influencers.
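A minimal sketch of this lookup, assuming Tweepy 3.x with API credentials already in hand (the uppercase names are placeholders); the 5,000-follower threshold is the one defined above.

```python
import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

def classify(screen_name):
    """Return 'influencer', 'general', or None if the account is inaccessible."""
    try:
        user = api.get_user(screen_name)
    except tweepy.TweepError:   # protected, deleted, or suspended account
        return None
    return "influencer" if user.followers_count >= 5000 else "general"
```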

For these 13,056 users we utilized Twint, a Twitter-scraping Python library, to collect their user data (team, 2020). Twint has less severe rate restrictions than Twitter's official API and specializes in excavating archival data (team, 2020). Twint can compile all tweets publicly authored by a user, but it is restricted in the retweets and favorited tweets that it can scrape: it can only retrieve the most recent retweets and the most recently favorited tweets (team, 2020). When working with data from previous years, it is unlikely that a majority of favorited tweets from that period will be reachable. For this reason, retweets and favorited tweets were not utilized in this analysis, as their presence could be unevenly distributed due to this time-sensitive constraint.
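For example, a user's public tweet archive might be pulled with a Twint configuration along these lines; this is a sketch, and the username, date window, and output path are illustrative.

```python
import twint

c = twint.Config()
c.Username = "example_user"        # hypothetical account
c.Since = "2013-01-01"             # far enough back to cover the before window
c.Until = "2018-01-01"
c.Retweets = False                 # retweets/favorites excluded (see above)
c.Store_csv = True
c.Output = "tweets/example_user.csv"
c.Hide_output = True
twint.run.Search(c)
```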

Twint is also limited in which accounts it can retrieve. It is unable to retrieve account data for deleted accounts (team, 2020). Additionally, Twint cannot scrape tweets published while a Twitter account was “protected” (private), even if the account is later made public (team, 2020). Users “shadow-banned” from Twitter were also inaccessible via Twint (team, 2020); shadow-banning is a process wherein a user's data is not available through the Twitter search feature for a period of time (Gadde and Beykpour, 2018; Stack, 2018).

Responsive vs. Unresponsive Users: After removing the deleted and inaccessible accounts within the general user set, 5,001 users' tweet histories were scraped. Analysis of these tweets revealed that 2,285 of the users in this set (~45.6%) engaged back with the IRA, either by mentioning an IRA account in their tweets or by replying to tweets shared by an IRA agent, while the remaining ~54.4% showed no signs of engagement with an IRA account. Within the influencer set, 5,712 users' tweet histories could be obtained. Analysis of these tweets revealed that 1,054 of them (~18.5%) engaged back with IRA accounts by mentioning or replying to the IRA, while the remaining ~81.5% showed no signs of engaging back with an IRA account.

Removal of Potential Bots: In order to ensure that the users under the purview of this study were legitimate, a number of filters were utilized to remove accounts suspected of deviating from this criterion. A Python API, Botometer, was used to assess the likelihood that any individual user was an automated account rather than a natural user (Yang, 2020). Botometer returns a “universal” and an “English” score for each user, representing the probability that an account is run by a bot (Yang, 2020). The “English” score considers only English-specific information, while the “universal” score is not language specific (Yang, 2020). Accounts that received probabilities of 0.4 or greater from Botometer were deemed suspected bots and removed from the analysis. This filter removed 151 of the 5,001 users in the general user set and 182 of the 5,712 users in the influencer set.
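A sketch of this bot filter, assuming the botometer-python client and its v3-era response format (scores nested under result['scores']); the credential names are placeholders.

```python
import botometer

twitter_app_auth = {
    "consumer_key": CONSUMER_KEY,
    "consumer_secret": CONSUMER_SECRET,
    "access_token": ACCESS_TOKEN,
    "access_token_secret": ACCESS_TOKEN_SECRET,
}
bom = botometer.Botometer(wait_on_ratelimit=True,
                          rapidapi_key=RAPIDAPI_KEY,
                          **twitter_app_auth)

def is_suspected_bot(screen_name, threshold=0.4):
    """Flag an account if either Botometer score reaches the threshold."""
    result = bom.check_account(screen_name)
    scores = result["scores"]   # {'english': ..., 'universal': ...} in v3
    return max(scores["english"], scores["universal"]) >= threshold
```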

Language Analysis: Next, a language analysis was performed using langdetect, a language-detection Python library (Danilak, 2020). The majority of the tweets examined were published in either English or Russian. For general users who engaged back with an IRA account, tweets published in Russian outnumbered those written in English, whereas for users who did not engage back, English tweets outnumbered Russian. In the influencer set, English tweets outnumbered tweets in any other language both for users who engaged back and for those who did not. Only users with more than 75% of their total tweets in English were retained in the study.
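The English-fraction filter can be sketched with langdetect as follows; the 75% cutoff is the one stated above, and the seed line makes langdetect's otherwise non-deterministic guesses repeatable.

```python
from langdetect import detect, DetectorFactory
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # make detection deterministic across runs

def mostly_english(tweets, cutoff=0.75):
    """Keep a user only if more than `cutoff` of their tweets are English."""
    langs = []
    for text in tweets:
        try:
            langs.append(detect(text))
        except LangDetectException:   # e.g. empty or emoji-only tweets
            continue
    return bool(langs) and langs.count("en") / len(langs) > cutoff
```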

Removal of Insufficient Data: Additionally, users whose first date of contact by an IRA account preceded their first recorded tweet on file were removed (i.e., the IRA contact occurred before the account's tweets were publicly accessible), leaving a total of 1,839 users in the general user set, 578 of whom engaged back with an IRA account while the remaining 1,261 did not. These users formed the final responsive general and unresponsive general user sets on which the before and after analysis was performed. Of the influential users with 5,000 or greater follower counts, 3,056 users remained, 430 of whom engaged back with an IRA account and 2,626 of whom did not. These two groups formed the responsive influencer and unresponsive influencer user groups, respectively. Table 1 shows the number of tweets analyzed in each category of users for the before and after analysis.

Category of users   Total users   Behavior              No. of users     No. of tweets
General             1,839         Engaged back          578 (31.4%)      19,548,890
                                  Did not engage back   1,261 (68.6%)    33,960,883
Influencers         3,056         Engaged back          430 (14.07%)     29,246,645
                                  Did not engage back   2,626 (85.93%)   148,780,230

Table 1. Size of the data set used for the before-and-after analysis.

Method

A before and after analysis was performed on the users contacted by IRA accounts. As each user was contacted on a potentially different date, time ‘zero’ for each user was assigned as the date of that user's first IRA contact, allowing us to aggregate before and after behaviors across users contacted at different times. To analyze behavior before IRA contact, we considered for each user the 12-month period preceding their time zero event; similarly, to analyze behavior after IRA contact, we considered the 12-month period following it. For each monthly score, a period of 31 days was used in place of a calendar month.
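Concretely, the alignment can be sketched as follows: each tweet is assigned to a 31-day “month” relative to the user's first-contact date. This is a pandas-based sketch, and the column names are illustrative.

```python
import pandas as pd

def relative_months(tweets, first_contact):
    """Bucket a user's tweets into 31-day 'months' around time zero.

    tweets: DataFrame with a datetime 'date' column
    first_contact: the user's first IRA contact date (time zero)
    Month 0 is the first 31 days after contact; month -1 is the
    31 days immediately before, and so on.
    """
    days = (tweets["date"] - pd.Timestamp(first_contact)).dt.days
    out = tweets.assign(month=days // 31)
    # keep 12 such months on each side of time zero
    return out[(out["month"] >= -12) & (out["month"] < 12)]
```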

The before and after analysis monitored user behavior through six specific metrics: monthly user tweet count, monthly user average sentiment, number of monthly @realDonaldTrump mentions, number of monthly @HillaryClinton mentions, monthly average sentiment of tweets mentioning @realDonaldTrump, and monthly average sentiment of tweets mentioning @HillaryClinton.

Monthly Tweet Count was determined for each user by assessing the total number of tweets published by a user within the bounds of a one-month period. This count includes tweets published by the Twitter user and replies to other Twitter accounts during that time period. We excluded favorite tweets, retweets of tweets authored by another Twitter account, or other Twitter activity from the monthly tweet count because of the limitations mentioned in the Data Collection Section.

VADER Sentiment Analysis, a Python sentiment analysis tool built specifically to examine text written on social media, was used to return a sentiment score in the range [-1, 1] for each tweet authored by a given user within an allotted time span (Hutto and Gilbert, 2014). A sentiment score of -1 indicates a highly negative sentiment while a score of 1 indicates a highly positive sentiment. These values were then averaged to form the Monthly Sentiment score for each user.
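A sketch of the monthly sentiment computation using the vaderSentiment package's compound score, which lies in [-1, 1]:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def monthly_sentiment(texts):
    """Average VADER compound score over one user-month of tweets."""
    scores = [analyzer.polarity_scores(t)["compound"] for t in texts]
    return sum(scores) / len(scores) if scores else None
```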

A Trump or Hillary mention is defined within this study as a tweet that directly tags @realDonaldTrump or @HillaryClinton, respectively. These counts include tweets written in reply to either account. The sums of @realDonaldTrump and @HillaryClinton mentions over the assigned time period form the Trump Mention Count and Hillary Mention Count, respectively.
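The mention counts can be computed with a simple pattern match over each user-month of tweets; this is a sketch, matching the handles case-insensitively on the assumption that capitalization varies in practice.

```python
import re

TRUMP = re.compile(r"@realDonaldTrump\b", re.IGNORECASE)
HILLARY = re.compile(r"@HillaryClinton\b", re.IGNORECASE)

def mention_counts(texts):
    """Count tweets in a user-month that tag each candidate's handle."""
    trump = sum(1 for t in texts if TRUMP.search(t))
    hillary = sum(1 for t in texts if HILLARY.search(t))
    return trump, hillary
```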

In order to examine the intersection of Trump or Hillary mentions and user sentiment, we used the VADER Sentiment Analysis on the tweets directed at @realDonaldTrump and @HillaryClinton. The results form the Trump Mention Sentiment and Hillary Mention Sentiment scores.

To assess the statistical significance of results from the before and after analysis of user tweeting behaviors, the Wilcoxon signed-rank test was used (Solutions, 2020), a non-parametric alternative to the dependent-samples t-test. We use this alternative because the distribution of our sample data does not approximate a normal distribution. Also, since the before and after samples are not independent of each other, the Mann-Whitney U-test cannot be used. The assumptions of the Wilcoxon test are that (a) the samples compared are paired, (b) each pair is chosen randomly and independently, and (c) the data are measured on at least an interval scale when within-pair differences are calculated to perform the test. The samples we use to perform the Wilcoxon hypothesis tests satisfy all three assumptions. Since not all users had 12 months of activity on Twitter before they were first contacted by the IRA, the hypothesis tests consider only the subset of users who were active on Twitter for at least 6 months prior to their interaction with the IRA; this also removes users who were very new to the platform. All hypothesis tests were performed at a 0.05 level of significance.
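The paired test can be run per metric with SciPy. A minimal sketch, where `before` and `after` hold one paired value per user (the toy values here are purely illustrative):

```python
from scipy.stats import wilcoxon

# before[i] and after[i] are paired observations for the same user,
# e.g. mean monthly tweet count in the 6 months before/after contact.
before = [120.0, 88.5, 45.2, 300.1, 10.7]   # toy values for illustration
after  = [150.3, 91.0, 60.8, 310.4, 12.2]

stat, p_value = wilcoxon(before, after)
if p_value < 0.05:
    print(f"significant change at the 0.05 level (p = {p_value:.4f})")
```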

Results

User Behavior

A before and after analysis was performed on the behavior of Twitter users contacted by IRA accounts. Using the prior definitions of general users and influencers, as well as responsive and non-responsive users, four groups were analyzed: responsive general users, unresponsive general users, responsive influencers, and unresponsive influencers. Plots display behavior for the year prior to and the year following IRA contact, whereas significance testing, means, medians, and other descriptive statistics are computed using only the six months before and after IRA contact.

Figure 2. Total monthly tweet volumes have changed in a statistically significant way after first contact in both responsive and unresponsive general user groups. Both subsets had local maximum tweet counts in the month following IRA contact.

Figure 3. Total monthly tweet volumes have increased in a statistically significant manner after first contact in both responsive and unresponsive influencer groups. Only responsive influencers are shown to have had a local maximum tweet count occur in the month directly following initial IRA engagement.

Monthly Tweet Frequency: Users across all groups significantly changed their tweeting frequency in the months following initial IRA contact (see Figure 2 and Figure 3). Responsive general users had an average increase in tweet volume of 26.02% following first IRA contact, and unresponsive general users had an average increase of 15.43%. Influencers saw an average increase of 15.96% for responsive users and 9.83% for unresponsive users. For both the responsive and unresponsive general users, as well as the responsive influencers, a local maximum occurred in the month following first IRA contact; however, this trend is not followed in the unresponsive influencer subset. The responsive users within both the general and influencer sets clearly sustained higher mean tweet counts in the year following IRA interaction. While the unresponsive general users also show a peak during the month following IRA interaction, their tweet counts descend to levels similar to those found before IRA contact.

Monthly Tweet Sentiment: Overall mean tweet sentiment by month decreased (i.e., became less positive or more negative) in a statistically significant manner for responsive general users and for both responsive and unresponsive influencers (see Figure 4 and Figure 5). However, the unresponsive general users showed no significant change at the 0.05 level of significance. For responsive general users, mean sentiment rested at 0.023 prior to IRA contact and fell to 0.015 after, whereas unresponsive general users had a mean sentiment of 0.049 before IRA contact and 0.05 after. For influencers, mean sentiment for responsive users was 0.022 before contact and 0.019 after, while for unresponsive influencers it was 0.092 before IRA contact and 0.088 after. Responsive general users became notably more negative in sentiment after IRA contact, while responsive influencers tended to be more negative in general than their unresponsive counterparts.

Figure 4. Average monthly sentiment significantly changed only for general users who responded back to IRA accounts. Users who chose not to engage back with IRA accounts showed no significant change in sentiment at the 0.05 level of significance in the months following contact.

Figure 5. Average monthly sentiment significantly changed for both influencer user groups, responsive and unresponsive.

@realDonaldTrump Mentions: The number of monthly mentions of the @realDonaldTrump account significantly increased for all user segments (see Figure 6 and Figure 7). For general users, mentions of the @realDonaldTrump handle increased by 65.51% for responsive users and by 27.93% for unresponsive users. Among influencers, @realDonaldTrump was mentioned 94.72% more after IRA contact by responsive users and 145.82% more by unresponsive users. In the responsive user groups, both general and influencer, the number of @realDonaldTrump mentions dwarfs that of the corresponding unresponsive groups. Responsive influencers tagged @realDonaldTrump more regularly than any other user set.

Figure 6. The monthly @realDonaldTrump mention counts for the general user groups, those who responded to an IRA account and those who did not.

Figure 7. The monthly @realDonaldTrump mention counts for the influencer user groups, those who responded to an IRA account and those who did not. The responsive influencer accounts mentioned @realDonaldTrump more regularly than any other group.

Looking at the intersection of user sentiment and @realDonaldTrump mentions, a significant change was observed only within the unresponsive influencer set (see Figure 8 and Figure 9). While responsive general users became increasingly negative in the months following IRA contact (going from a mean sentiment of 0.002 before to -0.015 after IRA contact), this change was not statistically significant. A manual inspection of a random sample of these tweets indicated that while tweet sentiment did grow more negative, the negativity most often targeted the opponents of @realDonaldTrump, often in tweets replying to @realDonaldTrump himself. Some examples of this phenomenon are: ”@readDonaldTrump We know Obama had Trump Tower wire tapped no doubt at all keep tweeting”, ”@foxnews […] @realDonaldTrump It’s not bigotry to keep terrorists out of this country.” and ”@nytimes @realDonaldTrump […] Everyone knows media & NYT is dishonest. That is why you are popular, you tell them to their face.” (some identifying usernames have been redacted from these examples). The sentiment change in @realDonaldTrump tweets may thus indicate a growing desire to engage Trump's perceived adversaries rather than an increasingly negative sentiment toward Trump himself. Unresponsive general users' sentiment decreased from 0.015 before IRA contact to 0.009 after. The decline into negative sentiment of @realDonaldTrump mentions seen in the general user groups is not seen within the influencer subset. Unresponsive influencers conversely grew more positive in their tweets regarding @realDonaldTrump, with a mean sentiment of 0.019 before and 0.029 after, while responsive influencers had a mean sentiment of @realDonaldTrump mentions of 0.031 before IRA contact and 0.013 after.

Figure 8. No significant change was detected in the sentiment of @realDonaldTrump tweets for either the responsive or unresponsive general users, despite the dip into overall negative sentiment seen among responsive general users.

Figure 9. Sentiment of @realDonaldTrump tweets by contacted influencers significantly changed only for unresponsive influencers. For responsive influencers, no significant change was detected.

@HillaryClinton Mentions: There is a statistically significant increase in the number of monthly mentions of @HillaryClinton within all user groups (see Figure 10 and Figure 11). Among general users, responsive users mentioned @HillaryClinton 55.44% more often after IRA contact, and unresponsive users mentioned the handle 14.75% more often. For influencers, there was an increase of 101.62% and 67.4% in @HillaryClinton mentions for responsive and unresponsive users, respectively. While all groups showed increased mentions of @HillaryClinton, responsive users in both the general and influencer sets clearly sustained a larger number of mentions in the year after first IRA contact, following the same pattern exhibited by @realDonaldTrump mention counts, though to a far lesser degree. Again, responsive influencers mentioned @HillaryClinton more than any other user group, though their use of the @realDonaldTrump handle was roughly triple their use of @HillaryClinton in most months.

Figure 10. The changes in monthly @HillaryClinton mention counts are significant for both general user groups, responsive and unresponsive.

Figure 11. The changes in monthly @HillaryClinton mention counts are significant for both influencer subsets, those who responded to an IRA account and those who did not.

There is a statistically significant increase in the negativity of user sentiment in @HillaryClinton mentions only within the unresponsive general users and the responsive influencers (see Figure 12 and Figure 13). Responsive general users have a mean sentiment of @HillaryClinton mentions of -0.017 before IRA contact and -0.021 after. Unresponsive general users have a before mean of 0.002 and an after mean of -0.001. Influencers' mean sentiment changes from -0.003 to -0.025 for responsive users and from 0.018 to 0.017 for unresponsive users. Possibly too few data points exist to show significant change in the remaining groups, as @HillaryClinton was not mentioned as often as @realDonaldTrump.

Figure 12. Sentiment has significantly changed only for unresponsive general users mentioning @HillaryClinton. There was no significant change for responsive general users.

Figure 13. Sentiment has significantly changed for responsive influencers mentioning @HillaryClinton. There is no significant change for unresponsive influencers.

IRA to User Contact Points

The following data is presented in order to contextualize the strategies and the timeline of events surrounding the IRA's misinformation campaign (Figure 14 and Figure 15).

The IRA accounts listed within the data were created as early as 2009, though the majority of accounts were registered with Twitter in 2014, and accounts continued to be created through 2016. Examination of the distribution of first contact dates, i.e., the date upon which a user was first mentioned by an IRA account, showed that while 2014 saw the first large wave of users being contacted by IRA accounts, the most extensive campaign to reach Twitter users initiated contact in 2015. Activity in 2016 decreased to roughly mirror the number of users contacted in 2014, before dropping off further in 2017, the final year of actively reaching new Twitter users (see Figure 14 and Figure 15).

Figure 14. IRA outreach to general Twitter users increased until reaching a maximum in 2015, the year preceding the 2016 presidential election. After the diminished activity in 2017, no further users were contacted by the IRA accounts contained in this study's data set.

Figure 15. The majority of influencer accounts were first contacted by the IRA in 2015, and no further initial contacts were found after 2017. Far fewer influencer accounts were reached in the years prior to 2014.

The mean number of contact points from an IRA account to a user is highly skewed because a relatively small number of accounts, such as @realDonaldTrump, received a very large number of contacts. As a result, the median number of contacts for each group is used to give a more balanced depiction. Across all users, the median number of contact points is nine. Among the general users, responsive users have a median contact frequency of seven, while unresponsive users have a median of five. Within the influencers, responsive users have a median contact frequency of fourteen and unresponsive users have a median of twenty.

The average time span from the first to the last contact with an IRA account across all users is 503.67 days, well over a year. Within the general user group, the mean span of contact is 355.23 days (449.08 days for responsive general users and 276.12 days for unresponsive general users). For influencers, the average span of contact is 636.32 days (629.23 days for responsive influencers and 642.85 days for unresponsive influencers).

Summary of Research Questions

RQ1: Did the contacted Twitter users exhibit a change in behavior following contact with IRA accounts? For all user groups, responsive and unresponsive general users and influencers, mean tweet count significantly increased after IRA contact, as did the number of mentions of @realDonaldTrump and @HillaryClinton. There was also a significant increase in negative sentiment for influencers as well as responsive general users.

RQ2: Did the contacted Twitter users who engaged back with the IRA accounts behave differently after the contact compared with those who did not? For certain behavioral metrics, the study observed a higher percentage change in users who engaged back with the IRA compared with those who did not, in both the general and influencer categories. The percentage change in tweet counts was higher for responsive users than for unresponsive ones (26.02% versus 15.43% among general users, and 15.96% versus 9.83% among influencers). In addition, although the responsive general users and both categories of influencers show a statistically significant change in overall tweet sentiment at the 0.05 level of significance, at the 0.01 level only the general users who engaged back with the IRA show a statistically significant change. This indicates that the change in sentiment was most prominent for the general users who engaged back.

Our findings also show that the general and influencer users who responded back had a greater percentage change in mentions of @HillaryClinton compared with those who did not respond to the IRA accounts (55.44% versus 14.75% among general users, and 101.62% versus 67.4% among influencers). Among general users, the percentage change in mentions of @realDonaldTrump was greater for users who responded back (65.51%) than for unresponsive ones (27.93%). In contrast, among influencers, the users who did not engage back showed a greater percentage change (145.82%) than those who did (94.72%).

RQ3: Did the behavior of users with high follower counts (“influencers”) differ from those with low follower counts (“general users”) after initial contact with the IRA? We found that influencers and general users did not exhibit sufficiently large differences in behavior according to the metrics of this study. Both influencers and general users showed similar behavioral changes across all groups in overall tweet count and in mention frequency of the @realDonaldTrump and @HillaryClinton handles. All groups except unresponsive general users also saw a significant change in overall sentiment. No clear distinction can be made between general users and influencers for @realDonaldTrump or @HillaryClinton sentiment. Deeper analysis of the percentage change in each behavior reveals that while tweet count increased more among general users, influencers increased their use of the @realDonaldTrump and @HillaryClinton handles more than the general user population did. For all three sentiment measures, neither influencers nor general users showed a clearly more extreme change across both responsive and unresponsive users.

Discussion

Implications

As it has been shown that Russia and other foreign actors have continued their efforts to sow discord around elections worldwide (Ward and Lister, 2020; Mary Ilyushina and Lister, 2019; Adam Goldman and Fandos, 2020), it has become more important than ever to understand the behavior of users targeted by these efforts. We hope that the findings in this work will spur social media platforms and governments around the globe to cooperate in implementing stronger policies that protect elections from online meddling. Given the important real-world impact of this problem, we also hope that greater collaboration between industry and academia will be stimulated.

Another potential implication of this work is that it may encourage more research investigating the effect of nefarious online political activities on targeted user populations. This would include studying whether users on other platforms besides Twitter, such as Facebook and Instagram, exhibited similar before/after changes in behavior. Indeed, there is evidence that the spread of misinformation is not confined solely to Twitter (O’Sullivan, 2020; Lomas, 2020; Burkhardt, 2017; Cooke, 2018). This would also include investigating which techniques achieved the greatest impact in terms of changing user behavior. Further, while this work focused on users who were contacted by IRA accounts through mentions, retweets, and replies, i.e., the intersection of these three mechanisms, the findings may motivate further investigation of larger data sets such as the union of these three sets, which contains a much larger number of users (837,688). Given our promising initial results, we intend to explore this wider data set in the future.

While our work has focused on ‘first order’ contacts between the IRA and Twitter users, this work may also raise the question of whether impacted users also impacted their friends and followers through their retweets, replies and mentions. Thus, we encourage investigators to study the impact of ‘second order’ contacts.

Limitations

We have been careful not to claim cause and effect in this research. That is, while we observe significant changes in behavior before and after contact by the IRA, we cannot definitively attribute these changes to being primarily, or even exclusively, caused by contact from the IRA. There may be other factors that caused them. For example, tweet counts may have risen generally throughout the time frame considered, sentiment may have become generally more negative, and mentions of @realDonaldTrump and @HillaryClinton may have increased across Twitter as a whole.

Establishing causality would require a different path of research, which is not the focus of this work. If we wanted to pursue this path, we could start by seeking to rule out other potential causes. We would need to collect a baseline set of random Twitter users who had accounts during 2013-2017. Given the distribution of initial contacts from Figure 14 and Figure 15, we could randomly select a user from the baseline for each initial contact date and study their behavior before and after this date. Since most random users would likely not have mentioned @realDonaldTrump or @HillaryClinton, we could restrict the baseline to users who have mentioned them. This would allow us to measure tweet count, mean sentiment, and political mentions. If the baseline's behavioral changes are similar to ours, then we cannot rule out that general trends may have caused the changes we observed; if the baseline changes are different, then those general trends are ruled out as a cause, though this still does not establish causality for the IRA. We also note that even if there are general trends among Twitter users, our study remains robust, as it found cases (for example, the overall change in tweet sentiment for general users) where there was a statistically significant change in users who engaged back with the IRA, whereas no statistically significant change was found in users who did not engage back. This difference is an indication that the trends observed in this study are not merely general trends on Twitter.

A limitation of this study was the inability to access several sources of archival Twitter data, including user retweets and favorites, through either Twint or Tweepy for the 2014-2017 time period for the majority of the users in the study (team, 2020; Roesslein, 2019). Although engaging back with the IRA via mentions and replies can perhaps be deemed a stronger sign of involvement than retweets, a log of favorited tweets and retweets would have allowed a much more encompassing view of user behavior on the platform.

Another limitation of this study was the inability to scientifically ascertain whether the users are American. Twitter does not allow the extraction of location information when users have not made their locations public. Since most Twitter users do not turn on location services, and even fewer go on to actually geo-tag their tweets (Sloan and Morgan, 2015), relying on tweet location data to discern whether a user is American would have left very few tweets to analyze. Therefore, this study focused only on users who had more than 75% of their tweets in English. It is unclear what fraction of these English-language posters are actually American.

It is also difficult to ascertain user attitudes, as attitude does not directly correlate with sentiment, especially when working with small amounts of text such as tweets. Although tweets mentioning @realDonaldTrump or @HillaryClinton were almost certainly pertaining to the respective candidates in some capacity, it was unclear whether the sentiment was directed at the individuals themselves, their detractors, or some other third party.

Conclusions

This paper has investigated the behavior of Twitter users before and after being contacted by Russia's Internet Research Agency prior to the 2016 U.S. presidential election. Our key finding was that for all groups of highly targeted users (namely responsive and unresponsive general users and influencers), there were significant increases after first IRA contact in mean tweet count and in the number of mentions of @realDonaldTrump and @HillaryClinton. There was also a significant increase in negative sentiment for influencers as well as for responsive general users. Another key finding was that responsive users who engaged with the IRA generally showed change that was either higher in percentage or satisfied stricter statistical significance compared with non-responsive users across a variety of metrics.

References

  • M. H. Adam Goldman and N. Fandos (2020) [Online; accessed 15-May-2020].
  • N. Albadi, M. Kurdi, and S. Mishra (2019) Hateful people or hateful bots? Detection and characterization of bots spreading religious hatred in Arabic social media. Proceedings of the ACM on Human-Computer Interaction 3 (CSCW), pp. 1–25.
  • D. Arnaudo (2017) Computational propaganda in Brazil: social bots during elections.
  • C. A. Bail, B. Guay, E. Maloney, A. Combs, D. S. Hillygus, F. Merhout, D. Freelon, and A. Volfovsky (2020) Assessing the Russian Internet Research Agency's impact on the political attitudes and behaviors of American Twitter users in late 2017. Proceedings of the National Academy of Sciences 117 (1).
  • S. Banjo (2019) [Online].
  • M. C. Benigni, K. Joseph, and K. M. Carley (2017) Online extremism and the communities that sustain it: detecting the ISIS supporting community on Twitter. PLoS ONE 12 (12).
  • A. Bessi and E. Ferrara (2016) Social bots distort the 2016 US presidential election online discussion. First Monday 21 (11-7).
  • B. C. Boatwright, D. L. Linvill, and P. L. Warren (2018) Troll factories: the Internet Research Agency and state-sponsored agenda building. Resource Centre on Media Freedom in Europe.
  • R. L. Boyd, A. Spangher, A. Fourney, B. Nushi, G. Ranade, J. Pennebaker, and E. Horvitz (2018) Characterizing the Internet Research Agency's social media operations during the 2016 US presidential election using linguistic analyses.
  • J. M. Burkhardt (2017) Combating fake news in the digital age. Vol. 53, American Library Association.
  • N. A. Cooke (2018) Fake news and alternative facts: information literacy in a post-truth era. American Library Association.
  • M. M. Danilak (2020) langdetect Python library [Online; accessed 15-May-2020].
  • R. DiResta, K. Shaffer, B. Ruppel, D. Sullivan, R. Matney, R. Fox, J. Albright, and B. Johnson (2018) The tactics and tropes of the Internet Research Agency.
  • E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini (2016) The rise of social bots. Communications of the ACM 59 (7), pp. 96–104.
  • E. Ferrara (2017) Disinformation and social bot operations in the run up to the 2017 French presidential election. arXiv preprint arXiv:1707.00086.
  • V. Gadde and K. Beykpour (2018) [Online].
  • G. M. Graff (2018) [Online].
  • P. N. Howard, B. Ganesh, D. Liotsiou, J. Kelly, and C. François (2019) The IRA, social media and political polarization in the United States, 2012-2018. U.S. Senate Documents.
  • P. N. Howard and B. Kollanyi (2016) Bots, #StrongerIn, and #Brexit: computational propaganda during the UK-EU referendum. Available at SSRN 2798311.
  • C. J. Hutto and E. E. Gilbert (2014) VADER sentiment analysis tool [Online; accessed 15-May-2020].
  • F. B. Keller, D. Schoch, S. Stier, and J. Yang (2020) Political astroturfing on Twitter: how to coordinate a disinformation campaign. Political Communication 37 (2), pp. 256–280.
  • C. Kriel and A. Pavliuc (2019) Reverse engineering Russian Internet Research Agency tactics through network analysis. Defence Strategic Communications, pp. 199–227.
  • N. Lomas (2020) [Online; accessed 15-May-2020].
  • G. M. Mary Ilyushina and T. Lister (2019) [Online; accessed 15-May-2020].
  • R. S. Mueller and M. W. A. Cat (2019) Report on the investigation into Russian interference in the 2016 presidential election. Vol. 1, U.S. Department of Justice, Washington, DC.
  • D. O'Sullivan (2020) [Online; accessed 15-May-2020].
  • U.S. House of Representatives Permanent Select Committee on Intelligence (2018) Exposing Russia's effort to sow discord online: the Internet Research Agency and advertisements.
  • T. P. Policy (2018) Update on Twitter's review of the 2016 US election. https://blog.twitter.com/en_us/topics/company/2018/2016-election-update.html [Online; accessed 15-May-2020].
  • J. Roesslein (2019) Tweepy Python library [Online; accessed 15-May-2020].
  • D. J. Ruck, N. M. Rice, J. Borycz, and R. A. Bentley (2019) Internet Research Agency Twitter activity predicted 2016 US election polls. First Monday 24 (7).
  • C. Shao, G. L. Ciampaglia, O. Varol, A. Flammini, and F. Menczer (2017) The spread of fake news by social bots. arXiv preprint arXiv:1707.07592.
  • L. Sloan and J. Morgan (2015) Who tweets with their location? Understanding the relationship between demographic characteristics and the use of geoservices and geotagging on Twitter. PLoS ONE 10 (11).
  • S. Solutions (2020) [Online; accessed 15-May-2020].
  • L. Stack (2018) [Online].
  • K. Starbird, A. Arif, and T. Wilson (2019) Disinformation as collaborative work: surfacing the participatory nature of strategic information operations. Proceedings of the ACM on Human-Computer Interaction 3 (CSCW), pp. 1–26.
  • O. team (2020) Twint Python library [Online; accessed 15-May-2020].
  • N. Thompson (2018) [Online; accessed 15-May-2020].
  • C. Ward and T. Lister (2020) [Online; accessed 15-May-2020].
  • T. Wilson and K. Starbird (2020) Cross-platform disinformation campaigns: lessons learned and next steps. Harvard Kennedy School Misinformation Review 1 (1).
  • S. C. Woolley and P. Howard (2017) Computational propaganda worldwide: executive summary.
  • K. Yang (2020) Botometer Python API [Online; accessed 15-May-2020].