Neutral Bots Reveal Political Bias on Social Media

05/17/2020
by Wen Chen, et al.

Social media platforms attempting to curb abuse and misinformation have been accused of political bias. We deploy neutral social bots on Twitter to probe biases that may emerge from interactions between user actions, platform mechanisms, and manipulation by inauthentic actors. We find evidence of bias affecting the news and information to which U.S. Twitter users are likely to be exposed, depending on their own political alignment. Partisan accounts, especially conservative ones, tend to receive more followers, follow more automated accounts, are exposed to more low-credibility content, and find themselves in echo chambers. Liberal accounts are exposed to moderate content shifting their experience toward the political center, while the interactions of conservative accounts are skewed toward the right. We find weak evidence of liberal bias in the news feed ranking algorithm for conservative accounts. These findings help inform the public debate about how social media shape exposure to political information.


Introduction

Compared with traditional media, online social media connect more people, more cheaply and quickly, than ever before. As a large portion of the population frequently uses social media to generate content, consume information, and interact with others (29), online platforms are also shaping the norms and behaviors of their users. Experiments show that simply altering the messages appearing on social feeds can affect the online expressions and real-world actions of users (19, 5), and that social media users are sensitive to early social influence (25, 36). At the same time, discussions on social media tend to be polarized around critical yet controversial topics like elections (10), vaccination (32), and climate change (37). Polarization is often accompanied by the segregation of users with incongruent points of view into so-called echo chambers (15, 21), which have been associated with ideological radicalization and misinformation spreading (38, 12).

Countering such undesirable phenomena requires a deep understanding of their underlying mechanisms. On the one hand, several socio-cognitive biases of humans, including selection of belief-consistent information (26) and the tendency to seek homophily in social ties (23), have been identified as major contributors (13, 31). On the other hand, web platforms have their own algorithmic biases (2, 27). For example, engagement bias in ranking algorithms may create a vicious cycle amplifying noise over quality (9, 1). For a more extreme illustration, recent studies and media reports suggest that the YouTube recommendation system might lead to videos with more misinformation or extreme viewpoints regardless of the starting point (30).

Beyond the socio-cognitive biases of individual users and algorithmic biases of technology platforms, we have a very limited understanding of how collective interactions mediated by social media may bias the view of the world that we obtain through the online information ecosystem. The major obstacle is the complexity of the system — not only do users exchange huge amounts of information with large numbers of others via many hidden mechanisms, but these interactions can be manipulated overtly and covertly by legitimate influencers as well as inauthentic, adversarial actors who are motivated to influence opinions or radicalize behaviors (35). Evidence suggests that malicious entities like social bots and trolls have already been deployed to spread misinformation and influence public opinion on critical matters (33, 34, 7).

In this study, we aim to reveal biases in the news and information to which people are exposed in social media ecosystems. We are particularly interested in clarifying the role of social media during the polarization process and the formation of echo chambers. We therefore focus on U.S. political discourse on Twitter since this platform plays an important role in American politics and strong polarization and echo chambers have been observed (10).

Our goal of studying ecosystem bias requires the exclusion of biases from individual users, which is a challenge when using observational methods. Social media accounts that mimic human users but are completely controlled by algorithms, known as social bots, can be used for this purpose (18). Here we deploy social bots with unbiased random behavior as instruments to probe exposure biases in social media. We call our bots drifters to distinguish their neutral behavior from other types of benign and malicious social bots on Twitter (14).

Drifters share an identical behavior model; the only difference among them is their initial friend — the very first account they follow. After this initial action, which represents the single independent variable in our experiment, each drifter was let loose in the wild. After five months, we examined the content consumed and generated by the drifters and analyzed their exposure to low-credibility information and the characteristics of their friends and followers, including their political alignment and automated activities. This methodology allows us to examine the combined biases that stem both from Twitter’s system design and recommendation/ranking algorithms, and from the organic and inorganic social interactions between the drifters and other accounts.

We find that the political alignment of the initial friend has a major impact on the popularity, social network structure, exposure to bots and low-credibility sources, and political alignment manifest in the actions of each drifter. The unique insights provided by our study into the political currents in Twitter’s information ecosystem can aid the public debate about how social media platforms shape people’s exposure to political information.

Results

We developed 15 drifter bots with the same behavior model (see Methods for details), divided them into five groups, and initialized all drifters in a group with the same initial friend. Each Twitter account used as a first friend is a news source aligned with the left, center-left, center, center-right, or right of the U.S. political spectrum (see details in Methods). We refer to the drifters by the political alignment of their initial friends; for example, bots initialized with center-left sources are called “C. Left” drifters.

Between their deployment on July 10, 2019 and their deactivation on December 1, 2019, we monitored the behaviors of the drifters and collected data on a daily basis. In particular, we measured: (1) the number of followers of each drifter to compare their ability to gain influence; (2) the bot scores of friends and followers of the drifters to check for automated activities; (3) the transitivity of the ego network of each drifter as a proxy for echo-chamber exposure; (4) the proportion of low-credibility information to which the drifters are exposed; and (5) the political valence of content generated by the drifters and their friends to probe political biases.

Influence

Figure 1: Average numbers of followers of different drifter groups over time. In this and other line charts, colored confidence intervals indicate standard errors.

The number of followers can be used as a crude proxy for influence (8). To gauge how political alignment affects influence dynamics, Fig. 1 plots the average number of followers of drifters in different groups over time. Two trends emerge. First, among drifters on the same side of the political spectrum, those with more extreme sources as initial friends tend to attract more followers; Center drifters tend to be the least influential. Second, drifters with right-leaning initial sources gain followers at a significantly higher rate than those with left-leaning initial sources.

Automated Activities

Figure 2: Average bot scores of friends and followers of drifters in different groups. The bot score is a number between zero and one, with higher scores signaling likely automation. In this and other bar charts, error bars indicate standard errors.

Social bots were actively involved in online discussions about recent U.S. elections (4, 11, 33). It is therefore expected for the drifters to encounter automated accounts. We used the Botometer service (40) to collect bot scores of friends and followers of the drifters. We report the average bot scores for both friends and followers of the drifters in Fig. 2. Focusing on the friends, we find that the accounts followed by centrist, moderate, and partisan drifters are, in that order, increasingly bot-like. Among partisan groups, right-leaning drifters tend to follow significantly more bots than left-leaning ones.

Followers are more bot-like than friends for all groups, without significant differences across the political spectrum. This is not surprising; human users are more likely to identify the true nature of the drifters and therefore less likely to follow them.

Echo Chambers

We wish to investigate whether the structure of the social networks in which the drifter bots find themselves amplifies exposure to homogeneous content. Density and transitivity of ego networks can be used as proxies for the presence of echo chambers, which we define as highly clustered social media neighborhoods in which users are likely to be exposed to the same information from multiple sources. Transitivity measures the fraction of possible triangles that are actually present among the nodes of a network. High transitivity means that friends and followers are likely to follow each other too. See Methods for further details.
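As an illustration of these structural measures, the following sketch computes density and transitivity for a small undirected ego network with networkx; the toy nodes and edges are made up for illustration and are not data from the experiment.

    # Illustrative sketch: density and transitivity of an undirected ego network.
    # The toy edges below are hypothetical, not data from the experiment.
    import networkx as nx

    ego_net = nx.Graph()
    ego_net.add_edges_from([
        ("a", "b"), ("b", "c"), ("a", "c"),  # closed triangle: raises transitivity
        ("c", "d"), ("d", "e"),              # open paths: lower transitivity
    ])

    density = nx.density(ego_net)            # fraction of possible edges that exist
    transitivity = nx.transitivity(ego_net)  # fraction of possible triangles that are closed

    print(f"density={density:.2f}, transitivity={transitivity:.2f}")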

Figure 3: (A) Density, (B) transitivity, and (C) normalized transitivity of drifters’ ego networks in different groups. (D) Ego networks of the drifters in the five groups. Nodes represent accounts and edges represent friend/follower relations. Node size and color represent degree (number of neighbors) and political alignment of shared content, respectively. Black nodes have missing valence scores because they did not share content with identifiable political valence.

Fig. 3(A,B) shows the average density and transitivity of ego networks for the drifters (see details in Methods). Since the two metrics are correlated in an ego network, Fig. 3(C) also plots the transitivity rescaled by that of shuffled random networks (see Methods). All metrics indicate that partisan accounts are more densely clustered than centrists, and right-leaning accounts are in stronger echo chambers than left-leaning ones.

To get a better sense of what these echo chambers look like, Fig. 3(D) maps the ego networks of the 15 drifters. In addition to the clustered structure, we observe a degree of homogeneity in shared content, as illustrated by the colors of the nodes, which represent the political alignment of the links shared by the corresponding accounts (see Methods; similar results are obtained by measuring political alignment based on shared hashtags). In general, the neighbors of a drifter tend to share links to sources that are politically aligned with the drifter’s first friend. We note a few exceptions, however. The left drifters and their neighbors are more moderate, having shifted their alignment toward the center. One of the center-left drifters has become connected to many conservative accounts, shifting its alignment to the right. And one of the center-right drifters has shifted its alignment to the left, becoming connected to mostly liberal accounts after randomly following @CNN. In most cases, drifters find themselves in structural echo chambers where they are exposed to content with homogeneous political alignment that mirrors their own.

Exposure to Low-credibility Content

Figure 4: Proportions of low-credibility links in drifter home timelines.

Since the 2016 U.S. presidential election, concern has been heightened about the spread of misinformation in social media (20). We therefore analyze exposure to content from low-credibility sources for different groups of drifters in Fig. 4. Details about low-credibility sources are found in Methods. We observe that drifters initialized with right-leaning sources receive significantly more low-credibility content in their social feeds than other groups. For Right drifters, over 15% of the links that appear in their timelines are from low-credibility sources. We also measured the absolute number of low-credibility links, and used the total number of tweets or the number of tweets with links as the denominator of the proportions; the same pattern emerges in all cases.

Note that Breitbart News appears in lists of low-credibility (hyper-partisan) sources compiled by some news and fact-checking organizations and used in the literature. However, we used Breitbart News as the right-leaning initial friend account for Right drifters because it is one of the most popular conservative news sources. To prevent biasing our results, Breitbart News is not labeled as a low-credibility source in this analysis and does not contribute to the proportions in Fig. 4.

Political Alignment and Algorithmic Bias

Figure 5: Time series of (A) political valence scores to which drifters are exposed in their home timelines; (B) valence scores expressed by drifter posts in their user timelines; and (C) algorithmic bias experienced by drifters in different groups, where the political valence score is derived from the content generated by friends. The political valence scores are calculated from hashtags in tweets. Negative scores mean left-leaning and positive scores mean right-leaning. Missing values are replaced by preceding available ones.

We wish to measure the political valence of content consumed and produced by drifters. Given a link (URL), we can extract the source (website) domain name and obtain a valence score based on the known political slant of the source. Given a hashtag, we can calculate a score based on its co-occurrence with other hashtags on Twitter that are known to signal a political alignment. These scores can then be averaged across the links or hashtags contained in a feed of tweets to measure their aggregate political valence. Further details can be found in Methods.
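As a sketch of this aggregation step, the snippet below averages link-based scores over the tweets in a feed; the domains, scores, and helper names are hypothetical stand-ins for the source scores described in Methods.

    # Illustrative sketch: aggregate political valence of a feed from shared links.
    # DOMAIN_VALENCE is a hypothetical stand-in for the source scores used in the study.
    from urllib.parse import urlparse

    DOMAIN_VALENCE = {"left-outlet.example": -0.8, "center-outlet.example": 0.0, "right-outlet.example": 0.7}

    def tweet_valence(urls):
        """Average score of the known source domains linked in one tweet (None if no match)."""
        scores = [DOMAIN_VALENCE[urlparse(u).netloc] for u in urls if urlparse(u).netloc in DOMAIN_VALENCE]
        return sum(scores) / len(scores) if scores else None

    def feed_valence(tweets_links):
        """Average the tweet-level scores across a feed, ignoring tweets without scored links."""
        scores = [v for v in (tweet_valence(links) for links in tweets_links) if v is not None]
        return sum(scores) / len(scores) if scores else None

    feed = [["https://right-outlet.example/story"], ["https://left-outlet.example/a", "https://center-outlet.example/b"]]
    print(feed_valence(feed))  # 0.15 for this toy feed: slightly right of center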

The home timeline (also known as news feed) is the set of tweets to which accounts are exposed. The user timeline is the set of tweets produced by an account. In Fig. 5(A,B) we observe how the political valence of information to which drifters are exposed in their home timelines and of content generated by them in their user timelines changed during the experiment. The initial friends strongly affect the political trajectories of the drifters. Both in terms of information to which they are exposed and content they produce, drifters initialized with right-leaning sources stay on the conservative side of the political spectrum. Those initialized with left-leaning sources, on the other hand, tend to drift toward the political center: they are exposed to more conservative content and even start spreading it. This analysis is based on hashtags, but similar results are obtained with political valence scores inferred from links (as seen in Fig. 3 and S1).

We measure the political bias of the home timeline ranking algorithm by calculating the difference between the valence score of tweets posted by friends of the drifters and the score of tweets in the home timeline. The algorithm selects the latter from the former. We observe little evidence of political bias by the algorithm. For right-leaning drifters, there is a small but significant shift to the left, suggesting a weak bias of the platform algorithm prioritizing left-leaning content (Fig. 5(C)). For the other groups, no algorithmic bias is detected. This analysis is again based on hashtags; with links, we don’t find significant evidence of favoritism in any of the groups (Fig. S1(C)).

The political valence scores of the home timeline of each drifter account and the corresponding algorithmic bias can be found in Figs. S2–S4.

Discussion

Social bots can be used as unbiased instruments to probe political (and other) biases in online information ecosystems. Though we examined Twitter in this paper, the same methodology (and our software) can be applied by the research community in different contexts. It will be interesting to see whether our findings can be replicated on other platforms, in other countries, with more drifter bots, and during elections.

The present results suggest that early choices about which sources to follow have a strong impact on the experiences of social media users. This is consistent with previous studies (25, 36). But beyond those initial actions, drifter bots are designed to be neutral with respect to partisan content and users. Therefore the partisan-dependent differences in their experiences and behaviors can be attributed to their interactions with users and information mediated by the social media platform — they reflect biases of the online information ecosystem.

Drifters with right-wing initial friends are gradually embedded into dense and homogeneous networks where they are constantly exposed to right-leaning content. They even start to spread right-leaning content themselves. Such online feedback loops reinforcing group identity may lead to radicalization (38), especially in conjunction with social and cognitive biases like in-/out-group bias and group polarization.

The fact that right-leaning drifters are exposed to considerably more low-credibility content than other groups is in line with previous findings that conservative users are more likely to engage with misinformation on social media (17). Our experiment suggests that the ecosystem can lead completely unbiased agents to this condition; it is therefore not necessary to attribute this vulnerability to individual characteristics. Other mechanisms that may contribute to the exposure to low-credibility content observed for the drifters initialized with right-leaning sources involve the actions of neighbor accounts (friends and followers) in the right-leaning group, including inauthentic accounts that target this group.

While most drifters are part of clustered and homogeneous network communities, the echo chambers of conservative accounts grow especially dense and include a larger portion of politically active accounts. Social bots also seem to play an important role in the partisan social networks; the drifters, especially right-leaning ones, end up following many of them. Since bots also amplify the spread of low-credibility news (33), this may help explain the prevalent exposure of right-leaning drifters to low-credibility sources. Drifters initialized with far-left sources do gain more followers, follow more bots, and form denser social networks compared with the center and center-left groups. However, these effects are less pronounced, and the vulnerability to low-credibility content is lower, than for the right and center-right groups.

Twitter has been accused of favoring liberal content and users. Our analysis of temporal shifts in political valence of the drifters and their friends reveals a complex picture in which the ecosystem has a significant bias even though the platform does not. We examined the possible bias in Twitter’s news feed ranking algorithm, i.e., whether the content to which a user is exposed in the home timeline is selected in a way that amplifies or suppresses certain political content. Our results suggest this is not the case in general: the drifters seem to receive content that is closely aligned with whatever their friends produce. The one exception is a weak liberal bias captured only by hashtag analysis and observed only among conservative accounts. Such an algorithmic bias may not be due to any intentional interference by the platform. Other explanations are possible; for example, Twitter may remove or demote information from low-credibility sources, resulting in a bias toward the center due to the prevalence of low-credibility information in conservative groups.

On the other hand, drifters that start with left-leaning sources shift toward the right during the course of the experiment, sharing and being exposed to more moderate content. Drifters that start with right-leaning sources do not experience a similar exposure to moderate information and produce increasingly partisan content. These results also confirm previous findings that right-leaning bots are more effective at influencing users (22). In summary, we observe a net conservative bias that emerges from the complex interactions within the information ecosystem.

Our experiment demonstrates that even if a platform has no bias in its algorithms and policies, the social networks and activities of its users may still create an environment in which unbiased agents end up in echo chambers with constant exposure to partisan, inauthentic, and misleading content. Users have to make extra efforts to moderate the content they consume and the social ties they form in order to counter these currents and create a healthy and balanced online experience. Platforms must recognize that neutral algorithms do not necessarily yield neutral outcomes. A key question is how to design mechanisms capable of mitigating the biases that emerge in online information ecosystems.

Methods

Here we provide details about the design of Drifter bots, the computation of political valence metrics, the identification of low-credibility sources, and the characterization of echo chambers.

Drifter Behavior Model

Drifter bots are the key instrument for this study. They are designed to mimic social media users, so that the data collected from their actions and interactions reflects realistic experiences on the platform. The drifters lack any ability to comprehend the content to which they are exposed or the users with whom they interact. All actions are controlled by a stochastic model which was unchanged during the experiment.

Like many human behaviors, social media activity is bursty (16). To reproduce this feature, we draw the time interval between two successive actions from a power-law distribution P(Δt) ∝ Δt^(−α), with the exponent α manually tuned to minimize the bot score obtained from the Botometer service. The distribution was cut off at a maximum sleep duration of seven hours between consecutive actions. Intervals were further scaled to obtain a reasonable average frequency of 20–30 actions per day. Moreover, the drifters were inactive between midnight and 7 a.m.
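A minimal sketch of this sampling step is shown below; the exponent and scaling constant are placeholders, since the tuned values are not reported here.

    # Illustrative sketch: bursty wait times from a truncated power-law distribution.
    # ALPHA and SCALE are placeholders, not the tuned values used in the experiment.
    import random

    ALPHA = 1.5              # power-law exponent (placeholder)
    SCALE = 60.0             # seconds; rescales intervals toward ~20-30 actions per day (placeholder)
    MAX_SLEEP = 7 * 3600     # cut off at a maximum of seven hours between actions

    def next_interval():
        """Sample a wait time (seconds) from P(t) ~ t^(-ALPHA) for t >= 1, truncated at MAX_SLEEP."""
        u = random.random()
        t = (1.0 - u) ** (-1.0 / (ALPHA - 1.0))  # inverse-CDF sampling of a Pareto-like tail
        return min(t * SCALE, MAX_SLEEP)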

Figure 6: Drifter bot behavior model. Each action box is connected with boxes that indicate the sources used for that action. For example, the source of a retweet can be a trending tweet, a tweet in the home timeline, or a tweet liked by a friend. Links to actions and sources are associated with probabilities. Follow and unfollow actions require additional constraints to be satisfied (gray diamonds).

Every time a drifter is activated, it randomly selects an action and a source as illustrated in Fig. 6. Actions include tweets, retweets, likes, replies, etc. Sources include the home timeline, trends, friends, etc. Each action is selected with a predefined probability. Given the selected action, one of a set of possible sources is selected with a predefined conditional probability. A random object is then drawn from the source and the action is performed on it. For example, if the action is a retweet and the source is the home timeline, then a random tweet in the drifter’s home timeline is retweeted. Non-English sources (users and tweets) are disregarded when they can be identified from metadata. Finally, the bot sleeps for a random interval until the next action. To avoid behaviors typical of spam bots that violate Twitter’s policies, the follow and unfollow actions have additional constraints regarding the ratio between friends and followers. The constraints are mutually exclusive, so that if one of these two actions fails due to a constraint not being satisfied, the other action is performed. Details about actions, sources, their associated probabilities, and constraints can be found in Supplementary Materials.
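The selection logic amounts to two weighted random draws, as sketched below using the probabilities from Table S1 in the Supplementary Materials; the actual fetching of objects and the Twitter API calls are omitted.

    # Illustrative sketch: one activation cycle of the drifter behavior model.
    # Action and source probabilities follow Table S1; fetching and API calls are omitted.
    import random

    ACTIONS = {"reply": 0.05, "tweet": 0.15, "retweet": 0.10, "like": 0.35, "follow": 0.25, "unfollow": 0.10}
    SOURCES = {
        "reply":    {"mention_timeline": 1.0},
        "tweet":    {"random_quotes": 0.3, "trends": 0.3, "home_timeline": 0.4},
        "retweet":  {"trends": 0.1, "home_timeline": 0.6, "tweets_liked_by_friends": 0.3},
        "like":     {"trends": 0.1, "home_timeline": 0.6, "tweets_liked_by_friends": 0.3},
        "follow":   {"home_timeline": 0.2, "tweets_liked_by_friends": 0.2, "friends_of_friends": 0.5, "followers": 0.1},
        "unfollow": {"friends": 1.0},
    }

    def weighted_choice(options):
        names, weights = zip(*options.items())
        return random.choices(names, weights=weights, k=1)[0]

    def one_cycle():
        action = weighted_choice(ACTIONS)          # first draw: which action to take
        source = weighted_choice(SOURCES[action])  # second draw: which source, given the action
        print(f"perform '{action}' on a random object drawn from '{source}'")

    one_cycle()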

When creating the drifter profiles, we avoided any political references that would bias the experiment, as well as deceptive impersonation that would violate platform policies. The screen names and user names were selected from famous bots in the arts and literature. Corresponding profile images were drawn from the public domain. Random quotes were used as profile descriptions. Since the drifter bots interacted with human subjects, the experiment was vetted by Indiana University’s ethics board and deemed exempt from review.

The only difference among the drifters was the way their friend lists were initialized. This was our experiment’s independent variable. We started from five Twitter accounts associated with established, active, and popular U.S. news sources: The Nation (left), The Washington Post (center-left), USA Today (center), The Wall Street Journal (center-right), and Breitbart News (right). These sources were selected because they span the full range of the U.S. political spectrum, according to the valence of the corresponding websites computed with a method described below. The 15 drifters were divided into five groups so that the three bots in each group started by following the same source. The friend list of each drifter was then expanded by following a random sample of five English-speaking friends of the first friend and a random sample of five English-speaking followers of the first friend — 11 accounts in total.

Political Alignment Metrics and Algorithmic Bias

To probe political bias, we need to measure the political alignment of tweets and accounts. We adopt two independent approaches, one based on hashtags and one on links, to ensure the robustness of our results. Both approaches start by assigning political valence scores to entities that may be present in tweets, namely hashtags and links. The entity scores are then averaged at the tweet level to obtain valence scores for the tweets, and further at the user level to measure the political alignment of users.

The hashtag-based approach relies on hashtags, keywords preceded by the hash mark (#) that users commonly include in their social media posts as a concise and efficient way to label topics, ideas, or memes so that others can find related messages. Hashtags are often used to signal political identities, beliefs, campaigns, or alignment. We apply the word2vec algorithm (24) to assign political valence scores to hashtags in a semi-supervised fashion. Word2vec maps words in text to continuous vector representations. The axis between a pair of carefully selected word vectors can encode a meaningful cultural dimension, and an arbitrary word’s position along this axis reflects its association with this cultural dimension. Using hashtags as words, we look for an axis representing political alignment in the embedding vector space. We leverage a dataset of political tweets collected during the 2018 U.S. midterm elections (39). The hashtags from the same tweet are grouped together as a single “sentence” and fed to the word2vec algorithm to obtain vector representations for the hashtags. After removing hashtags appearing fewer than 10 times in the dataset, we end up with 54,533 hashtag vectors. To define the political alignment axis, we choose #voteblue and #votered as the two poles because they show clear alignment with U.S. liberal and conservative political orientations, respectively. The rest of the hashtags are then projected onto this axis, and the relative positions, scaled into the interval [-1, 1], are used to measure the political valence, where negative/positive scores indicate left/right alignment.
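The projection step could be implemented roughly as follows with gensim; the toy corpus and training parameters are illustrative and not the exact settings used in the study.

    # Illustrative sketch: a political-alignment axis in hashtag embedding space.
    # The toy "sentences" and training parameters are placeholders.
    import numpy as np
    from gensim.models import Word2Vec

    # Each "sentence" is the list of hashtags co-occurring in one tweet.
    hashtag_sentences = [["#voteblue", "#resist"], ["#votered", "#maga"], ["#voteblue", "#bluewave"]]

    model = Word2Vec(hashtag_sentences, vector_size=100, window=5, min_count=1, seed=42)

    blue, red = model.wv["#voteblue"], model.wv["#votered"]
    axis = (red - blue) / np.linalg.norm(red - blue)  # liberal-to-conservative direction

    def raw_valence(hashtag):
        """Projection of a hashtag vector onto the #voteblue -> #votered axis."""
        return float(np.dot(model.wv[hashtag], axis))

    # Raw projections would then be rescaled into [-1, 1] across all hashtags.
    print(raw_valence("#maga"))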

The link-based approach considers links (URLs) commonly used to share news and other websites in social media posts for the purpose of spreading information, expressing opinions, or flagging identity, especially around political matters. Many websites show a clear political bias, and the number of popular ones is quite limited. Therefore the websites (domains) extracted from links provide another convenient proxy for the political alignment of tweets and users. To assess the political alignment of a website, we start with a dataset of 500 ideologically diverse news sources (3), where each domain is assigned a score reflecting its political valence in the liberal-conservative range [-1, 1]. We manually clean the dataset by removing outdated domains and updating the ones with new names. For example, myfoxdetroit.com becomes fox2detroit.com. For each link found in the tweets, after expanding shortened links, we match the domain to the list to obtain a score.

We further aggregate the political valence scores of the tweets, obtained using either hashtags or links, at the account level. We examine different types of political valence for accounts, each measured on a daily basis. The political valence to which a drifter is exposed, which we denote v_home, is computed by averaging the scores of the 50 most recent tweets in its home timeline. We also evaluate the political valence expressed by an account by averaging the scores of the recent tweets it posts; we measure this expressed valence for each drifter using its most recent 20 tweets. We use v_friends to represent the political valence expressed by the friends of each drifter, computed from their most recent 500 collective tweets. In the timeline plots, missing values are replaced by preceding available ones. In Supplementary Materials we detail how political valence scores are calibrated so that a value of zero can be interpreted as aligned with the political center.

Since v_friends represents the political alignment expressed by the friends of a drifter and v_home represents the valence of the posts to which the drifter is actually exposed in its home timeline, the difference v_home − v_friends can be used to measure any political bias in Twitter’s ranking algorithm, which prioritizes posts on one’s news feed.
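In other words, the bias is the gap between the average valence of the curated feed and the average valence of what friends post; a minimal sketch with placeholder numbers:

    # Illustrative sketch: algorithmic bias as the valence gap between the home
    # timeline and friends' posts. The scores below are placeholders.
    def algorithmic_bias(home_scores, friend_scores):
        """Positive: feed skews right of friends' output; negative: feed skews left."""
        v_home = sum(home_scores) / len(home_scores)
        v_friends = sum(friend_scores) / len(friend_scores)
        return v_home - v_friends

    print(algorithmic_bias([0.1, 0.3, 0.2], [0.3, 0.4, 0.5]))  # -0.2: feed shifted to the left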

Identification of Low-credibility Content

In evaluating the credibility of the content to which drifters are exposed, we focus on the sources of shared links to circumvent the challenge of assessing the accuracy of individual news articles (20). Annotating content credibility at the domain (website) level rather than the link level is an established approach in the literature (33, 17, 28, 6).

We use a list of low-credibility sources compiled from several recent research papers. Specifically, we consider a source as low-credibility if it fulfills any one of the following criteria: (1) labeled as low-credibility by Shao et al. (33); (2) labeled as “Black,” “Red,” or “Satire” by Grinberg et al. (17); (3) labeled as “fake news” or “hyperpartisan” by Pennycook et al. (28); or (4) labeled as “extreme left,” “extreme right,” or “fake news” by Bovet et al. (6). This provides us with a list of 570 sources.

To measure the percentage of low-credibility links, we extracted the links from the home timelines of the drifters (expanding those that are shortened) and then cross-referenced them with the list of low-credibility sources.
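A rough sketch of this cross-referencing step is below, assuming the 570 domains are available as a set; the example domains and the URL-expansion helper are illustrative.

    # Illustrative sketch: share of home-timeline links pointing to low-credibility sources.
    # The domains below are placeholders for the compiled list of 570 sources.
    from urllib.parse import urlparse
    import requests

    LOW_CREDIBILITY = {"fake-news.example", "hyperpartisan.example"}

    def expand(url, timeout=5):
        """Follow redirects to resolve shortened links (e.g., t.co, bit.ly)."""
        try:
            return requests.head(url, allow_redirects=True, timeout=timeout).url
        except requests.RequestException:
            return url

    def low_credibility_fraction(links):
        domains = [urlparse(expand(u)).netloc.lower().replace("www.", "") for u in links]
        flagged = sum(d in LOW_CREDIBILITY for d in domains)
        return flagged / len(domains) if domains else 0.0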

Echo Chambers

We wish to measure the density and transitivity of each drifter bot’s ego network. Since reconstructing the full network of friends and followers of each bot is prohibitively time-consuming due to the Twitter API’s rate limits, we approximated each bot’s ego network by sampling 100 random neighbors from a list of the latest 200 friends and 200 followers returned by the Twitter API. We then checked each pair of sampled neighbors for friendship, adding an undirected edge if there is a friend/follower link in either direction, so that the sampled ego network is undirected and unweighted. Finally, we computed the density and transitivity of each ego network.

Since transitivity is correlated with density, we also normalized the transitivity by the average transitivity of 30 shuffled networks generated by a configuration model that preserves the degree sequence of the original ego network. Any self-loops and parallel edges generated by the configuration model are replaced with random edges.
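This normalization could be computed along the following lines; for simplicity the sketch discards, rather than rewires, the self-loops and parallel edges produced by the configuration model, so it only approximates the procedure described above.

    # Illustrative sketch: transitivity normalized by degree-preserving random baselines.
    # Self-loops and parallel edges are dropped here instead of being replaced by
    # random edges, a simplification of the procedure described in the text.
    import networkx as nx

    def normalized_transitivity(ego_net, n_shuffles=30, seed=0):
        observed = nx.transitivity(ego_net)
        degrees = [d for _, d in ego_net.degree()]
        baselines = []
        for i in range(n_shuffles):
            multigraph = nx.configuration_model(degrees, seed=seed + i)
            shuffled = nx.Graph(multigraph)  # collapse parallel edges
            shuffled.remove_edges_from(nx.selfloop_edges(shuffled))
            baselines.append(nx.transitivity(shuffled))
        mean_baseline = sum(baselines) / len(baselines)
        return observed / mean_baseline if mean_baseline > 0 else float("inf")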

References

  • [1] M. Avram, N. Micallef, S. Patil, and F. Menczer (2020) Exposure to social engagement metrics increases vulnerability to misinformation. arXiv preprint arXiv:2005.04682.
  • [2] R. Baeza-Yates (2018) Bias on the web. Communications of the ACM 61(6), pp. 54–61.
  • [3] E. Bakshy, S. Messing, and L. Adamic (2015) Replication data for: Exposure to ideologically diverse news and opinion on Facebook. Harvard Dataverse 2.
  • [4] A. Bessi and E. Ferrara (2016) Social bots distort the 2016 U.S. presidential election online discussion. First Monday 21(11).
  • [5] R. M. Bond, C. J. Fariss, J. J. Jones, A. D. Kramer, C. Marlow, J. E. Settle, and J. H. Fowler (2012) A 61-million-person experiment in social influence and political mobilization. Nature 489(7415), pp. 295–298.
  • [6] A. Bovet and H. A. Makse (2019) Influence of fake news in Twitter during the 2016 U.S. presidential election. Nature Communications 10(1), pp. 1–14.
  • [7] D. A. Broniatowski, A. M. Jamison, S. Qi, L. AlKulaib, T. Chen, A. Benton, S. C. Quinn, and M. Dredze (2018) Weaponized health communication: Twitter bots and Russian trolls amplify the vaccine debate. American Journal of Public Health 108(10), pp. 1378–1384.
  • [8] M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi (2010) Measuring user influence in Twitter: The million follower fallacy. In Proc. 4th Intl. AAAI Conference on Weblogs and Social Media (ICWSM).
  • [9] G. L. Ciampaglia, A. Nematzadeh, F. Menczer, and A. Flammini (2018) How algorithmic popularity bias hinders or promotes quality. Scientific Reports 8(1), pp. 1–7.
  • [10] M. D. Conover, J. Ratkiewicz, M. Francisco, B. Gonçalves, F. Menczer, and A. Flammini (2011) Political polarization on Twitter. In Proc. 5th Intl. AAAI Conference on Weblogs and Social Media (ICWSM).
  • [11] A. Deb, L. Luceri, A. Badaway, and E. Ferrara (2019) Perils and challenges of social media and election manipulation analysis: The 2018 U.S. midterms. In Companion Proceedings of The 2019 World Wide Web Conference, pp. 237–247.
  • [12] M. Del Vicario, A. Bessi, F. Zollo, F. Petroni, A. Scala, G. Caldarelli, H. E. Stanley, and W. Quattrociocchi (2016) The spreading of misinformation online. Proceedings of the National Academy of Sciences 113(3), pp. 554–559.
  • [13] M. Del Vicario, A. Scala, G. Caldarelli, H. E. Stanley, and W. Quattrociocchi (2017) Modeling confirmation bias and polarization. Scientific Reports 7, pp. 40391.
  • [14] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini (2016) The rise of social bots. Communications of the ACM 59(7), pp. 96–104.
  • [15] R. K. Garrett (2009) Echo chambers online? Politically motivated selective exposure among Internet news users. Journal of Computer-Mediated Communication 14(2), pp. 265–285.
  • [16] R. Ghosh, T. Surachawala, and K. Lerman (2011) Entropy-based classification of ‘retweeting’ activity on Twitter. In Proc. 4th Workshop on Social Network Mining and Analysis (SNA-KDD).
  • [17] N. Grinberg, K. Joseph, L. Friedland, B. Swire-Thompson, and D. Lazer (2019) Fake news on Twitter during the 2016 U.S. presidential election. Science 363(6425), pp. 374–378.
  • [18] E. Hargreaves, C. Agosti, D. Menasché, G. Neglia, A. Reiffers-Masson, and E. Altman (2019) Fairness in online social network timelines: Measurements, models and mechanism design. Performance Evaluation 129, pp. 15–39.
  • [19] A. D. Kramer, J. E. Guillory, and J. T. Hancock (2014) Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences 111(24), pp. 8788–8790.
  • [20] D. Lazer, M. Baum, Y. Benkler, A. Berinsky, K. Greenhill, F. Menczer, et al. (2018) The science of fake news. Science 359(6380), pp. 1094–1096.
  • [21] J. K. Lee, J. Choi, C. Kim, and Y. Kim (2014) Social media, network heterogeneity, and opinion polarization. Journal of Communication 64(4), pp. 702–722.
  • [22] L. Luceri, A. Deb, A. Badawy, and E. Ferrara (2019) Red bots do it better: Comparative analysis of social bot partisan behavior. In Companion Proceedings of The 2019 World Wide Web Conference, pp. 1007–1012.
  • [23] M. McPherson, L. Smith-Lovin, and J. M. Cook (2001) Birds of a feather: Homophily in social networks. Annual Review of Sociology 27(1), pp. 415–444.
  • [24] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013) Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
  • [25] L. Muchnik, S. Aral, and S. J. Taylor (2013) Social influence bias: A randomized experiment. Science 341(6146), pp. 647–651.
  • [26] R. S. Nickerson (1998) Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology 2(2), pp. 175–220.
  • [27] D. Nikolov, M. Lalmas, A. Flammini, and F. Menczer (2019) Quantifying biases in online information exposure. Journal of the Association for Information Science and Technology 70(3), pp. 218–229.
  • [28] G. Pennycook and D. G. Rand (2019) Fighting misinformation on social media using crowdsourced judgments of news source quality. Proceedings of the National Academy of Sciences 116(7), pp. 2521–2526.
  • [29] A. Perrin and M. Anderson (2019) Share of U.S. adults using social media, including Facebook, is mostly unchanged since 2018. Pew Research Center 10.
  • [30] M. H. Ribeiro, R. Ottoni, R. West, V. A. Almeida, and W. Meira Jr (2020) Auditing radicalization pathways on YouTube. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 131–141.
  • [31] K. Sasahara, W. Chen, H. Peng, G. L. Ciampaglia, A. Flammini, and F. Menczer (2019) On the inevitability of online echo chambers. arXiv preprint arXiv:1905.03919.
  • [32] A. L. Schmidt, F. Zollo, A. Scala, C. Betsch, and W. Quattrociocchi (2018) Polarization of the vaccination debate on Facebook. Vaccine 36(25), pp. 3606–3612.
  • [33] C. Shao, G. L. Ciampaglia, O. Varol, K. Yang, A. Flammini, and F. Menczer (2018) The spread of low-credibility content by social bots. Nature Communications 9(1), pp. 1–9.
  • [34] M. Stella, E. Ferrara, and M. De Domenico (2018) Bots increase exposure to negative and inflammatory content in online social systems. Proceedings of the National Academy of Sciences 115(49), pp. 12435–12440.
  • [35] R. Thompson (2011) Radicalization and the use of social media. Journal of Strategic Security 4(4), pp. 167–190.
  • [36] T. Weninger, T. J. Johnston, and M. Glenski (2015) Random voting effects in social-digital spaces: A case study of Reddit post submissions. In Proceedings of the 26th ACM Conference on Hypertext & Social Media, pp. 293–297.
  • [37] H. T. Williams, J. R. McMurray, T. Kurz, and F. H. Lambert (2015) Network analysis reveals open forums and echo chambers in social media discussions of climate change. Global Environmental Change 32, pp. 126–138.
  • [38] M. Wojcieszak (2010) ‘Don’t talk to me’: Effects of ideologically homogeneous online groups and politically dissimilar offline ties on extremism. New Media & Society 12(4), pp. 637–655.
  • [39] K. Yang, P. Hui, and F. Menczer (2019) Bot electioneering volume: Visualizing social bot activity during elections. In Companion Proceedings of The 2019 World Wide Web Conference, pp. 214–217.
  • [40] K. Yang, O. Varol, C. A. Davis, E. Ferrara, A. Flammini, and F. Menczer (2019) Arming the public with artificial intelligence to counter social bots. Human Behavior and Emerging Technologies 1(1), pp. 48–61.

Supplementary Materials

Supplementary methods

Drifter Actions and Probabilities

An action is performed upon a sentence, an existing tweet, or a user. These inputs are selected from sources that are described below. A drifter can perform the following actions:

  • Tweet – gather a sentence from Trends, Home Timeline, or Random Quotes, and post it as its own tweet.

  • Retweet – select a tweet from Trends, Home Timeline, or a list of Tweets Liked by its Friends, and retweet it.

  • Like – like a tweet selected in the same way as for a Retweet.

  • Reply – reply to a tweet from the Mention Timeline.

  • Follow – select a user to follow from the list of Followers, list of Friends of Friends, list of users who posted Tweets liked by Friends, or from the Home Timeline.

  • Unfollow – select a user to unfollow from the latest 200 in the list of Friends.

Input elements for actions are selected from candidate lists that we call sources. The selection is random with a uniform probability distribution unless otherwise explained below. Due to limitations of the Twitter APIs, we imitate some basic mechanisms offered by the platform, such as suggestions to follow friends of friends. Sources are defined as follows:

  • Random Quotes – sentences obtained from a random quote API (api.quotable.io/random).

  • Mention Timeline – the latest 10 tweets in the mention timeline. If the drifter replied to any mentions in the past, this source only considers subsequent tweets.

  • Friends of Friends – the model randomly selects three friends of the drifter and requests their latest 5,000 friends, ignoring those that are already friends of the drifter. A new friend is selected from the combined list with probability proportional to the occurrences in the list, to favor friends of multiple friends.

  • Friends – most recent 200 friends. The user is selected from this list at random, but older friends are more likely to be unfollowed. We implement this mechanism by ranking friends chronologically; the latest friend has rank one. The unfollow probability is proportional to the rank. The initial friend can never be unfollowed.

  • Followers – most recent 200 followers.

  • Trends – list obtained by randomly selecting three trending topics in the U.S. and fetching the top five tweets in each topic by the default ranking.

  • Tweets Liked by Friends – the latest 15 tweets from the home timeline. Depending on the selected action, the source returns either the authors of these tweets (excluding the drifter itself) or a tweet selected at random from among the latest three tweets liked by those authors.

  • Home Timeline – text content or authors of the latest 15 tweets in the home timeline.

Action (probability)           Source (conditional probability)
Reply (0.05)                   Mention Timeline (1.0)
Tweet (0.15)                   Random Quotes (0.3), Trends (0.3), Home Timeline (0.4)
Retweet (0.1) / Like (0.35)    Trends (0.1), Home Timeline (0.6), Tweets Liked by Friends (0.3)
Follow (0.25)                  Home Timeline (0.2), Tweets Liked by Friends (0.2), Friends of Friends (0.5), Followers (0.1)
Unfollow (0.1)                 Friends (1.0)

Table S1: Probabilities of actions and sources in the drifter bot behavior model. The probabilities of the actions add up to one, and so do the conditional probabilities of the sources given each action.

We list the probabilities used in the bot behavior model in Table S1. The numbers are inferred from a random sample of Twitter users. If the Follow or Unfollow action is selected, a precondition check is triggered. If the Follow precondition is not met, the Unfollow action is performed, and vice versa; the two checks cannot both fail. A new friend can only be followed if the number of friends is sufficiently small compared to the number of followers: less than the number of followers plus 113. A friend can only be unfollowed if the drifter has at least 50 friends.
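The precondition logic can be expressed as two mutually exclusive checks, sketched below with the thresholds stated above.

    # Illustrative sketch: follow/unfollow preconditions with the stated thresholds.
    def can_follow(n_friends, n_followers):
        # A new friend is allowed only while friends stay below followers + 113.
        return n_friends < n_followers + 113

    def can_unfollow(n_friends):
        # Unfollowing is allowed only with at least 50 friends.
        return n_friends >= 50

    def resolve_action(action, n_friends, n_followers):
        """If the chosen action's precondition fails, perform the other action instead."""
        if action == "follow" and not can_follow(n_friends, n_followers):
            return "unfollow"
        if action == "unfollow" and not can_unfollow(n_friends):
            return "follow"
        return action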

Calibration of Political Valence Scores

We calibrated valence scores so that positive scores indicate right-leaning hashtags/links and negative scores indicate left-leaning hashtags/links. To this end, we selected the news source account @USATODAY to have a zero valence score. We used the 200 most recent tweets by @USATODAY in early June to calculate a raw center valence score s_c for each of the link-based and hashtag-based approaches. The political valence scores are then calibrated as

v = (1/N) Σ_t (s_t − s_c),

where s_t is the raw score for tweet t and N is the number of tweets across which the score is aggregated.
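In code, the calibration amounts to subtracting the raw center score before averaging; a small sketch with placeholder numbers:

    # Illustrative sketch: calibrated valence of a feed given a raw center score s_c
    # estimated from @USATODAY tweets. The numbers below are placeholders.
    def calibrated_valence(tweet_scores, s_c):
        """v = (1/N) * sum_t (s_t - s_c); zero means aligned with the political center."""
        return sum(s - s_c for s in tweet_scores) / len(tweet_scores)

    print(calibrated_valence([0.2, -0.1, 0.4], s_c=0.05))  # about 0.12, slightly right of center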

Supplementary Analyses

In this section, we provide additional results and analyses for the political valence estimations and algorithmic biases. Figure S1 shows the aggregated political valence scores inferred from links. The absolute values differ from those of the hashtag-based approach, but the qualitative results hold. We further provide the individual trajectory of the political alignment of each drifter. Figure S2 shows the results from the link-based approach and Figure S3 shows the results from the hashtag-based approach. Finally, Fig. S4 shows the algorithmic bias computed for each drifter with both methods.

Figure S1: Time series of (A) political valence scores to which drifters are exposed in their home timelines; (B) valence scores expressed by drifter posts; and (C) algorithmic bias experienced by drifters in different groups. The political valence scores are calculated from links in tweets. Negative scores mean left-leaning and positive scores mean right-leaning. See Fig. S2 and Fig. S4 for plots for each drifter.


Figure S2: Political valence timelines based on links for all fifteen bots. A tweet is assigned a score between −1 (liberal) and +1 (conservative) based on the shared link domains. (A) Home timeline: daily average score of the last 50 tweets in the home timeline. (B) User timeline: daily average score of the last 20 tweets in the user timeline. The summary represents the average for each group.


Figure S3: Political valence timelines based on hashtags for all fifteen bots. A tweet is assigned a score between −1 (liberal) and +1 (conservative) based on the shared hashtags. (A) Home timeline: daily average score of the last 50 tweets in the home timeline. (B) User timeline: daily average score of the last 20 tweets in the user timeline. The summary represents the average for each group.


Figure S4: Algorithmic bias for all fifteen bots, measured by the difference in valence between the account’s home timeline and its friends’ user timelines, based on (A) links and (B) hashtags. The summary represents the average for each group.