Popping the bubble may not be enough: news media role in online political polarization

09/18/2021 ∙ by Jordan K. Kobellarz, et al. ∙ UTFPR

Politics in different countries show diverse degrees of polarization, which tends to be stronger on social media, given how easy it became to connect and engage with like-minded individuals on the web. One way of reducing polarization would be to distribute cross-partisan news among individuals with distinct political orientations, i.e., to "pop the bubbles". This study investigates whether this strategy holds in the context of nationwide elections in Brazil and Canada. We collected politics-related tweets shared during the 2018 Brazilian presidential election and the 2019 Canadian federal election. Next, we proposed a new centrality metric that enables identifying highly central bubble poppers: nodes that can distribute content among users with diverging political opinions, a metric fundamental to the proposed study. We then analyzed how users engage with the news content shared by bubble poppers, as well as with its source and its topics, in relation to users' political orientation. Among other results, we found that, even though news media disseminate content that interests different sides of the political spectrum, users tend to engage considerably more with content that aligns with their own political orientation, regardless of the topic.


1 Introduction

Polarized scenarios are characterized by the tendency of people with similar characteristics, interests, and behaviours to form tightly linked groups [McPherson2001]. In online social networks, this characteristic tends to be stronger because users start to be selectively exposed to similar points of view through mechanisms whose goal is to optimize engagement [pariser_filter_2011]. A small portion of bridging nodes, called brokers, can form weak ties between polarized groups by linking users with access to complementary information and creating opportunities for intergroup contact and exchange of information [burt2003social]. On Twitter’s online social network, highly central brokers tend to be related to news media accounts [kobellarz:2019], through which such media publish links to content available on their respective websites.

This study aims to understand how Twitter users engage with tweets shared by highly central brokers – users who can reach others with diverging political orientations. For the sake of simplicity, these users are referred to as “bubble poppers” in this study because of their potential ability to reestablish the dialogue among polarized groups of people. More specifically, we study tweets containing URLs to external news by analyzing the relationship between users’ political orientation and the content of such news, the content source (i.e., website domain), and the content topic (extracted using automated processes). We are also interested in understanding the extent to which exposing users with diverging political orientations to the same content contributes to meaningful engagement, i.e., engagement that increases the diversity of exchanged ideas and helps to avoid situations such as “echo chambers” [mcewan_mediated_2018]. To conduct these analyses, we draw on the 2018 Brazilian presidential election and the 2019 Canadian federal election, two political situations with different polarization levels. For each of these cases, we collected tweets in the weeks before and after the vote and the announcement of the election’s results. Datasets are publicly available at: URL will be released after the review process.

The main contributions of this study can be summarized as: (1) the proposition of a new centrality metric that enables the identification of highly central brokers (bubble poppers) that reach users with diverging political opinions; (2) the creation and analysis of tweet engagement networks, which allow us to identify common patterns in the Brazilian and Canadian cases, such as: (i) neutral domains tend to distribute information in a balanced way to different polarized groups; however, (ii) when users in these polarized groups are exposed to content that interests both sides of the political spectrum, they tend to engage only with content that aligns with their own political orientation, regardless of the topic; and (iii) this phenomenon grows stronger as user polarity increases. Our results indicate that, while news media can “pop the bubbles,” i.e., disseminate content that reaches and appeals to different sides of the political spectrum, users still tend to engage with the information that reinforces their points of view. This clarifies the nature of news media brokerage, suggesting it can integrate users across the political spectrum, but is less effective in overcoming selective exposure. These findings speak to current debates on media, bias, and polarization.

The rest of this study is organized as follows. Section 2 presents the related work. Section 3 presents the data and methods used. Section 4 presents the results. Finally, Section 5 concludes the study.

2 Related Work

Democratic theorists have long extolled the virtues of ties that cross-cut political cleavages [mutz_hearing_2006, putnam_bowling_2000]. Interactions with members of an outgroup are said to promote inclusive relations that encourage civil restraint, and mutual understanding of opposing views [mutz_hearing_2006]. Conversely, in the absence of bridging ties, the public sphere becomes fractured, breeding segmented communities that foment antagonistic partisan identities [putnam_bowling_2000]. Against this backdrop, rising political polarization has been identified as a serious threat facing contemporary democracies. Growing distance between partisans, whether in their attitudes, lifestyles, or relations, has eroded the cross-ties that once tempered partisan conflict [dellaposta_pluralistic_2020]. As a result, partisan politics have become more rancorous, testing the limits of democratic institutions [mason_uncivil_2018].

Within this context, the role of online media, in particular, has been heavily scrutinized. Early optimism over the democratic potential of networked communities has waned, giving way to concerns over ideologically homophilous networks that proliferate “fake news” marred by partisan bias at the expense of factual veracity [faris2017partisanship, pariser_filter_2011]. Some scholars have approached these issues through heuristics such as “filter bubbles” or “echo chambers” [mcewan_mediated_2018]. According to these arguments, processes like value homophily and selective exposure, which drive partisans to seek like-minded individuals and information corroborating prior attitudes [McPherson2001], are intensified in online spaces, reinforcing ideological viewpoints while insulating partisans from opposing views [pariser_filter_2011]. Without the tempering effect of such cross-cutting exposure, moreover, some claim partisan commitments become more extreme [mutz_hearing_2006, mason_uncivil_2018]. Consistent with this, research finds that individuals without cross-cutting exposure become less politically tolerant, less likely to regard opposing views as legitimate, and more likely to hold extreme attitudes [huckfeldt_disagreement_2004].

These developments have brought questions about information brokerage to the fore. Scholars identify brokers as those actors that bridge divides between opposing groups [burt2003social]. Owing to this relational position, brokers benefit from greater control over the transmission of information while also adopting conciliatory attitudes that mediate tensions [burt2003social]. As gatekeepers of political communication, media outlets have traditionally assumed this role [graber_mass_2018]. By selecting the news that citizens read and scrutinizing the quality of information they consume, media outlets integrate citizens into common narratives [anderson_imagined_1991]. Some scholars refer to this as ‘agenda-setting,’ whereby news media have the power to integrate public opinion by telling the public which issues are important and worthy of attention [mccombs2020setting, lippmann2017public].

The emergence of social media, however, is said to have upended this role [ferguson_square_2017]. Platforms like Twitter have reconfigured how information is diffused by enabling grassroots actors to bypass traditional media outlets and spread crowdsourced information [ferguson_square_2017]. As discussed, these platforms are seen as encouraging dynamics like homophily, moral sensationalism, and “factual polarization” that drive echo chambers [brady_emotion_2017]. Taking heed of these issues, social media platforms and mainstream news outlets have sought to remedy this situation by re-asserting their brokerage function. Initiatives include expanding “fact-checking” efforts, with social media platforms like Twitter and Facebook collaborating with third-party programs that independently review content and flag false information. By directing citizens to institutionally validated facts, such initiatives intend to quell polarization by offering a “neutral” medium that foregrounds partisan debate and counteracts filter bubbles.

Recent research, however, suggests that the ‘filter bubble’ perspective misdiagnoses the nature of online polarization [judith_moller_filter_2021]. Empirical evidence finds that online ideological homophily is overstated, demonstrating that online networks are diverse in viewpoints and news sources, often more so than offline networks [beam_facebook_2018]. In fact, users often seek out counter-attitudinal political information. Morales et al. [de_francisci_morales_no_2021] find that during discussions of the 2016 US elections, Reddit users were less likely to engage with members of their own party than they were to engage across party lines. Likewise, Dylko et al. [dylko_impact_2018] find that online engagement is higher when users are exposed to counter-attitudinal items. Rather than moderating viewpoints, however, exposure to counter-attitudinal information can backfire, reinforcing commitment to prior beliefs [bail2018exposure]. Summarizing the empirical record, Moller [judith_moller_filter_2021] claims that “the reality of social media and algorithmic filter systems today seems to be that they inject diversity rather than reducing it”. Overall, this research challenges the premise that cross-cutting ties mediate conflict in online settings, suggesting that exposure to opposing views alone appears insufficient as a remedy to polarization. The information we receive and the evaluations we make appear to be guided by an effort to preserve the integrity of prior beliefs, even when confronted with opposing evidence [jerit_partisan_2012]. Brokers who pop the filter bubble by affording universal access to similar content might not necessarily break the echo chambers, as partisan bias continues to motivate users to select stories and narratives that corroborate prior beliefs, and reconcile challenging information by rejecting its validity [mcewan_mediated_2018].

Brokers’ ability to temper opposing positions may thus be limited because of selective exposure. That said, they may still mediate polarization in other ways. As mentioned above, an important function of brokerage traditionally provided by news media involves ‘agenda-setting’, whereby news media effectively tell people what topics to pay attention to, even if they don’t dictate what the specific positions should be [mccombs2020setting, lippmann2017public]. From that angle, established news media can still promote integration by delineating important topics of public discussion, thus drawing interest from partisans across the spectrum, uniting them in their attention, even if as hostile counterparts. Given the fragmentation of online media, however, the extent to which established news media that aim to be neutral brokers succeed in this capacity is unclear.

Taken together, this research raises numerous questions on brokerage in online settings. Our study seeks to adjudicate divergent perspectives by investigating how online brokers mediate polarization during two elections in Brazil and Canada. Specifically, we pursue four research questions: (1) How can we identify social media brokers (bubble poppers) who reach users across partisan divides? (2) Does content shared by bubble poppers elicit high or low levels of polarization? (3) Are more “neutral” users less likely to engage with polarized content when compared to “polarized” users? (4) Does the source of content shared by bubble poppers affect the degree to which it elicits a polarized response, based on the specific article or its semantic content (represented by its topic)?

3 Data and Methods

3.1 Data

We collected tweets from the Twitter streaming API regarding the 2018 Brazilian presidential election and the 2019 Canadian federal election. To obtain tweets related to the Brazilian and Canadian cases, we collected trending political hashtags ranked by Twitter in those countries at different moments of the preceding collection day to use as filtering criteria (“seeds”) on the Twitter API. We made sure to have a balanced number of trending hashtags supporting different candidates. Examples of hashtags for the Brazilian scenario are #bolsonaro, #haddad, #elenao, #elesim, #lula and #TodosComJair, and for the Canadian scenario, #CanadaElection2019, #cdnpoli, #TrudeauMustGo, #TrudeauWorstPM, #TeamTrudeau and #Trudeau4MoreYears. Data collection for Brazil started before the first-round vote, on 2018-Oct-01, and ended after the second-round vote, on 2018-Dec-11. This dataset was filtered to keep only tweets whose text and author profile language was Portuguese. For the Canadian case, tweet collection started before the polls, on 2019-Sep-18, and ended after the polls. This dataset was filtered to keep only tweets whose text language was English or French. We did not filter by users’ profile language for the Canadian dataset, because the Twitter streaming API no longer made this attribute available in the tweet body. The Brazilian and Canadian datasets were analyzed separately.
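The language-filtering step described above can be sketched as follows. This is a minimal illustration that assumes tweet records shaped like Twitter streaming-API payloads with the standard top-level `lang` attribute; the sample tweets are hypothetical.

```python
def filter_by_language(tweets, allowed):
    """Keep only tweets whose detected text language is in `allowed`."""
    return [t for t in tweets if t.get("lang") in allowed]

# hypothetical sample payloads shaped like streaming-API tweets
tweets = [
    {"id": 1, "lang": "pt", "text": "#bolsonaro"},
    {"id": 2, "lang": "en", "text": "#cdnpoli"},
    {"id": 3, "lang": "fr", "text": "#TeamTrudeau"},
]
canadian = filter_by_language(tweets, {"en", "fr"})
```

For the Brazilian dataset the allowed set would instead be `{"pt"}`, with an additional check on the author's profile language.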

The political situation for each election provides important context for the analysis. In Brazil, polls during the election (http://media.folha.uol.com.br/datafolha/2018/10/19/692a4086c399805ae503454cf8cd0d36IV.pdf) indicated high polarization among voters. The dispute was between Jair Messias Bolsonaro, representing the possibility of a 15-year break from the ruling Workers’ Party (PT), and Fernando Haddad, representing the continuity of PT’s rule, after a brief period in which an elected PT president was replaced by her vice-president as a result of impeachment. The election culminated in Bolsonaro’s victory over Haddad with a majority of the valid votes (https://politica.estadao.com.br/eleicoes/2018/cobertura-votacao-apuracao/segundo-turno). In Canada, Justin Trudeau represented the Liberals, who had previously held a parliamentary majority after unseating the Conservatives in 2015. In a close election, the Liberals won 157 seats in parliament, while the Conservatives, led by Andrew Scheer, won 121 (https://newsinteractives.cbc.ca/elections/federal/2019/results). The (left-wing) New Democratic Party continued to lose ground from its 2011 peak, especially in French-speaking Quebec, where the Liberals and the separatist Bloc Québécois subsequently gained ground. The 2019 election resulted in the Liberals forming a minority government; such governments have historically exhibited instability, since the prime minister relies on representatives of other parties to remain in power (https://web.archive.org/web/20130627154515/http://www2.parl.gc.ca/Parlinfo/compilations/parliament/DurationMinorityGovernment.aspx). Despite the multi-party character of the Canadian political system, at least since the 1980s, national politics has largely revolved around left-right differences [cochrane2015left].

3.2 User Polarity Estimation

The next step was to estimate each user’s polarity based on the retweeted content. For this, we used the political orientation of the hashtags that users applied in their tweets as an indication of their polarity. We first identified the unique hashtags in the Brazilian and Canadian datasets. For each of these two sets, we extracted the top 100 most frequent hashtags and classified them manually according to their political orientation; the labels “L”, “N”, “R” and “U” were used to represent left-wing, neutral, right-wing and uncertain political leaning, respectively. Six volunteers (not the authors), three in each country, classified the top 100 hashtags independently of one another. We kept only the hashtags that received the same classification from all three raters, resulting in 64 and 78 out of 100 hashtags for Brazil and Canada, respectively. We relied on Fleiss’ kappa [fleiss1981measurement] to measure the degree of agreement between the raters, obtaining values indicating “substantial agreement” for Brazil and “almost perfect agreement” for Canada, according to Landis and Koch’s (1977) [landis1977measurement] interpretation of kappa scores.
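For reference, Fleiss' kappa for a panel of raters can be computed with a short self-contained function. This is a generic implementation of the standard formulation; the rating tallies in the test below are hypothetical examples, not the study's data.

```python
def fleiss_kappa(items):
    """Fleiss' kappa for a fixed number of raters per item.

    items: per-item rating tallies, e.g. {"L": 2, "N": 1} means two raters
    chose "L" and one chose "N" for that hashtag.
    """
    n = sum(items[0].values())   # raters per item (assumed constant)
    N = len(items)               # number of rated items
    categories = set()
    for item in items:
        categories.update(item)
    # mean observed per-item agreement
    P_bar = sum((sum(c * c for c in item.values()) - n) / (n * (n - 1))
                for item in items) / N
    # expected chance agreement from the marginal category proportions
    p = {cat: sum(item.get(cat, 0) for item in items) / (N * n)
         for cat in categories}
    P_e = sum(v * v for v in p.values())
    return (P_bar - P_e) / (1 - P_e)
```

For instance, two hashtags each rated by three raters, with a split across two categories, gives a kappa below chance, while unanimous but varied ratings give kappa = 1.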

To classify the rest of the hashtags at scale, we assumed that hashtags describing a common topic usually occur together in the same tweet. This way, a network of hashtag co-occurrences was built for Brazil and Canada, in which a node represents a hashtag and an edge between two nodes represents the occurrence of both in the same tweet. Standalone hashtags were eliminated. A semi-supervised machine learning algorithm [zhu2003semi], which uses the edges’ weights to compute the similarity between nodes, was applied over the network to label unlabeled hashtags starting from the manually classified ones. By applying this procedure, the remaining hashtags in the Brazilian and Canadian datasets were classified. For the Brazilian case we obtained 35,788 (62.3%) hashtags classified as “R” (right), 643 (1.1%) as “N” (neutral), 21,039 (36.6%) as “L” (left), and 17 (0.0%) as “U” (uncertain). For the Canadian case we obtained 8,963 (13.3%) as “R”, 52,416 (77.5%) as “N”, 2,597 (3.8%) as “L”, and 3,663 (5.4%) as “U”. To assess the consistency of this method, the same hashtag classification procedure was repeated several times, randomly hiding 10% of the manually classified hashtags each time. Classification results were submitted to the Fleiss’ kappa assessment [fleiss1981measurement] (simulating raters), where we obtained kappa values meaning “almost perfect agreement” for both Brazil and Canada. We also considered a more aggressive strategy, hiding 20% of the manually classified hashtags, obtaining values meaning “substantial agreement” in both cases, showing that the classification procedure was robust.
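The propagation step above can be sketched with a minimal weighted label-propagation routine in the spirit of [zhu2003semi]: unlabeled hashtags repeatedly absorb the class distributions of their co-occurrence neighbours, weighted by edge weight, while manually labelled seeds stay clamped. The graph and seeds below are hypothetical.

```python
def propagate_labels(adj, seeds, classes=("L", "N", "R"), iters=50):
    """Weighted label propagation over a hashtag co-occurrence network.

    adj: {hashtag: {neighbour: co-occurrence weight}}
    seeds: {hashtag: class} for the manually labelled hashtags (clamped).
    """
    dist = {v: {c: 0.0 for c in classes} for v in adj}
    for v, c in seeds.items():
        dist[v][c] = 1.0
    for _ in range(iters):
        for v in adj:
            if v in seeds or not adj[v]:
                continue  # seeds stay fixed; isolated nodes were removed anyway
            total = sum(adj[v].values())
            dist[v] = {c: sum(w * dist[u][c] for u, w in adj[v].items()) / total
                       for c in classes}
    # hard label = most probable class
    return {v: max(dist[v], key=dist[v].get) for v in adj}

# hypothetical co-occurrence chain: h1 (seed L) -- h2 -- h3 (seed R),
# with h2 co-occurring three times more often with h1 than with h3
adj = {"h1": {"h2": 3.0}, "h2": {"h1": 3.0, "h3": 1.0}, "h3": {"h2": 1.0}}
labels = propagate_labels(adj, {"h1": "L", "h3": "R"})
```

Here `h2` inherits the "L" label because its ties to the left-labelled seed are stronger.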

Results showed an imbalance in both datasets between hashtags classified as “N” and those classified as “R” or “L”. Taking a close look at the Canadian dataset, it is possible to conclude that users tend to apply neutral hashtags more frequently, for example, #cdnpoli, #elxn43, #onpoli, and #abpoli, together with polarized hashtags, such as #TrudeauMustGo, #blackface, #ChooseForward, or #IStandWithTrudeau, which, in turn, are applied in proportionally smaller numbers, helping to explain the more prominent number of neutral hashtags in this case. On the other hand, in the Brazilian dataset, the use of left and/or right hashtags is higher than that of neutral hashtags, even when they appear together. This imbalance in both datasets was addressed by adding weights to each class during the users’ polarity estimation, presented in Equation 1. Given the low relevance of hashtags with uncertain positioning (“U”) for the analysis, only those classified as “L”, “N” and “R” were retained. Finally, we removed all tweets that did not contain any classified hashtag, to preserve only tweets related to the political domain.

The next step was to classify users according to their political orientation, which was performed on a weekly basis. For this, we separated each dataset into six weeks, each starting on Monday and ending on Sunday. Less active users, with fewer than five tweets per week, were removed, because they did not create enough tweets to estimate their polarity. After this filtering, we obtained the final sets of unique users for the Brazilian and Canadian datasets. For each of these users, a list of the hashtags used in all of their tweets for the week was created, and their polarity, P(H), was calculated using Equation 1:

(1)

where L, N and R are the hashtag multisets (a set that allows multiple instances of each of its elements) for classes “L”, “N” and “R”, respectively, and w_L, w_N and w_R are the weights for those classes. In these equations, avg is a function that returns the average number of hashtags in two sets, and H is the set containing all hashtags applied by a user. The weights are important to mitigate the class imbalance characteristic of our datasets, inspired by usual techniques for classification with imbalanced classes [tan2016introduction]. The general idea is to penalize classes with a higher number of hashtags, such as class “N” in Canada and classes “R” and “L” in Brazil, and to increase the relevance of classes with fewer hashtags, such as “R” and “L” in Canada and “N” in Brazil. This is necessary to avoid polarity estimates biased toward the hashtags users tweeted most frequently. Without this strategy, most users would have their polarity wrongly estimated as neutral in the Canadian case. For example, a user who tweeted #ChooseForward, #ChooseForwardWithTrudeau, #cdnpoli, #elxn43, #ScheerCowardice, #onpoli, #VoteLiberal, #cdnpoli, #elxn43, #ItsOurVote, #CPC and #climatestrikecanada (a representative example from our dataset) would have a near-neutral P(H) value without weights and a clearly polarized value with weights, better reflecting the polarity of the user. Similarly, a user who applied the hashtags #cdnpoli, #elxn43, #elxn43, #RCMP, #cdnpoli, #TrudeauIsARacist and #TrudeauCorruption would have a near-neutral P(H) value without weights and a clearly polarized value with weights; again, the weights express the user’s polarity much better. Without weights, the opposite would happen in the Brazilian case: most users would have their P(H) values improperly skewed to the extremes.

The result of P(H) is a value in the continuous range [-1, 1]. Positive values represent a right-wing political orientation, negative values represent a left-wing orientation, and values close to 0 represent a neutral orientation. Based on this result, we labelled users according to their political orientation: the first third of values on the P(H) scale represents the left-wing (L) users, the middle third the neutral (N) users, and the last third the right-wing (R) users.
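Since the exact form of Equation 1 is not reproduced here, the following is only a minimal sketch of a weighted polarity estimate. It assumes a weighted right-minus-left count normalized to [-1, 1]; the hashtag labels and uniform weights below are hypothetical, and the thirds-of-the-scale labelling follows the description in the text.

```python
def user_polarity(hashtags, labels, weights):
    """Weekly polarity estimate for one user (a sketch, not the paper's
    exact Equation 1).

    hashtags: the user's hashtags for the week, with repeats (multiset);
    labels: {hashtag: 'L' | 'N' | 'R'}; weights: per-class weights.
    """
    counts = {"L": 0, "N": 0, "R": 0}
    for h in hashtags:
        if h in labels:
            counts[labels[h]] += 1
    num = weights["R"] * counts["R"] - weights["L"] * counts["L"]
    den = sum(weights[c] * counts[c] for c in counts)
    return num / den if den else 0.0

def polarity_label(p):
    # thirds of the [-1, 1] scale, as described in the text
    if p < -1 / 3:
        return "L"
    if p > 1 / 3:
        return "R"
    return "N"
```

With uniform weights, a user whose week contains three right-labelled hashtags and one left-labelled hashtag scores (3 - 1) / 4 = 0.5 and is labelled "R".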

3.3 Detecting bubble poppers

After identifying all users’ political orientation, a network was created for each week, connecting one user to another through retweets. Each network was represented in the form of a weighted undirected graph, where each user is a node, and the retweet is an edge that starts from the user who was retweeted and ends at the user who retweeted. The edge weight represents the number of retweets between users. Self-loops and isolated nodes were eliminated.
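The weekly network construction can be sketched as follows, assuming retweet records are available as (retweeted user, retweeting user) pairs; the input in the test is hypothetical.

```python
from collections import defaultdict

def build_retweet_network(retweets):
    """Build a weighted undirected retweet network.

    retweets: iterable of (retweeted_user, retweeting_user) pairs.
    Returns {(u, v): weight} with u < v; self-loops are dropped, and
    isolated nodes never appear since only edges are stored.
    """
    weights = defaultdict(int)
    for src, dst in retweets:
        if src == dst:
            continue  # self-loop: a user retweeting themselves
        u, v = sorted((src, dst))
        weights[(u, v)] += 1  # edge weight = number of retweets between the pair
    return dict(weights)
```

Repeated retweets between the same pair of users accumulate into a single weighted edge, matching the description above.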

To achieve the main goal of our study, we needed to detect highly central nodes linked to both sides of the network, i.e., brokers (bubble poppers) capable of reaching users with diverging political views. For this task, betweenness centrality [freeman1978centrality] could be applied as a starting point, because it ranks nodes by their capacity to act as information bridges between pairs of nodes in the network, relying on the number of shortest paths that pass through each node. However, in a polarized situation, we can have nodes with high betweenness both within and between polarized groups. That is, highly influential nodes inside bubbles could be ranked the same as highly influential nodes between bubbles. The former are called “local bridges” and the latter “global bridges” [jensen2016detecting]. We were interested only in the global bridges, nodes that act as brokers most of the time by linking both sides of the network.

Considering that betweenness centrality was not an appropriate metric to distinguish local bridges from global bridges, we identified in the literature a relatively new centrality metric called “bridgeness” [jensen2016detecting]. This metric is based on the betweenness algorithm, but while computing the number of shortest paths between all pairs of nodes that pass through a source node, it does not include the shortest paths that either start or end at the immediate neighbourhood of the source node [jensen2016detecting]. Even though the bridgeness algorithm better emphasizes global bridges over local ones, it brings up a problem when the considered network has a small average path length, which is precisely the case for the retweet networks we are analyzing: it can disregard some important short paths that either start or end in the neighbourhood of a node. Considering that we already know the political orientation of all users in our dataset, and that users with similar orientations tend to form tightly linked groups on the network, we used the political orientation as a filtering criterion for the shortest paths.

Algorithm 1 presents this proposed process, which we call the “bubble popper” algorithm. It builds on the betweenness and bridgeness algorithms: while computing the number of shortest paths between all pairs of nodes that pass through a node, it does not include the shortest paths that either start or end at the immediate neighbourhood of the source node if the neighbour in question has the same class (political orientation, in our case) as the source node – this restriction to a different class is the key twist added to the bridgeness algorithm. Put simply, this new algorithm measures a node’s capacity to disseminate information to distinct groups on the network with a political orientation different from its own. To construct the bubble popper algorithm, we relied on Brandes’ “faster algorithm” [brandes2001faster]. The proposed changes appear in line 35: in the count of the shortest paths that pass through node w, it is verified that the considered path is not a self-loop, with w ≠ s (a check that already exists in Brandes’ original algorithm), and that either w is not in the neighbourhood N(s) of the source s, or w is in N(s) but has a political orientation different from that of s. We refer to the measure created by this algorithm as “bubble popper centrality.” Note that this metric is also applicable to other problems with the same characteristics.

1:  G ← adjacency matrix;
2:  o ← node orientations, o[v] ∈ {L, N, R};
3:  C[v] ← 0, v ∈ V;
4:  for s ∈ V do
5:     S ← empty stack;
6:     P[w] ← empty list, w ∈ V;
7:     σ[t] ← 0, t ∈ V; σ[s] ← 1;
8:     d[t] ← −1, t ∈ V; d[s] ← 0;
9:     Q ← empty queue;
10:    enqueue s → Q;
11:    while Q not empty do
12:       dequeue v ← Q;
13:       push v → S;
14:       for all neighbours w of v do
15:          // w found for the first time?
16:          if d[w] < 0 then
17:             enqueue w → Q;
18:             d[w] ← d[v] + 1;
19:          end if
20:          // shortest path to w via v?
21:          if d[w] = d[v] + 1 then
22:             σ[w] ← σ[w] + σ[v];
23:             append v → P[w];
24:          end if
25:       end for
26:    end while
27:    δ[v] ← 0, v ∈ V;
28:    // S returns vertices in order of non-increasing distance from s
29:    while S not empty do
30:       pop w ← S;
31:       for v ∈ P[w] do
32:          δ[v] ← δ[v] + (σ[v] / σ[w]) · (1 + δ[w]);
33:       end for
34:       // this is the part where bubble popper differs from the betweenness and bridgeness algorithms
35:       if w ≠ s and (w ∉ N(s) or (w ∈ N(s) and o[w] ≠ o[s])) then
36:          C[w] ← C[w] + δ[w];
37:       end if
38:    end while
39: end for
Algorithm 1: Bubble popper algorithm, adapted from Brandes’ “faster algorithm” for betweenness centrality [brandes2001faster] and the bridgeness algorithm [jensen2016detecting].
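A compact Python sketch of this procedure follows. It mirrors the Brandes-style two-phase structure with the accumulation filter described for line 35, on an unweighted, undirected graph for simplicity; the tiny path graph in the test is hypothetical.

```python
from collections import deque

def bubble_popper_centrality(adj, orient):
    """Bubble popper centrality on an unweighted, undirected graph.

    adj: {node: set of neighbours}; orient: {node: 'L' | 'N' | 'R'}.
    A node w in the source's neighbourhood only accumulates dependency
    when its orientation differs from the source's.
    """
    C = {v: 0.0 for v in adj}
    for s in adj:
        # phase 1: breadth-first shortest-path counting
        S, P = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        d = {v: -1 for v in adj}; d[s] = 0
        Q = deque([s])
        while Q:
            v = Q.popleft()
            S.append(v)
            for w in adj[v]:
                if d[w] < 0:                 # w found for the first time
                    d[w] = d[v] + 1
                    Q.append(w)
                if d[w] == d[v] + 1:         # shortest path to w via v
                    sigma[w] += sigma[v]
                    P[w].append(v)
        # phase 2: dependency accumulation with the orientation filter
        delta = {v: 0.0 for v in adj}
        while S:
            w = S.pop()
            for v in P[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s and (w not in adj[s] or orient[w] != orient[s]):
                C[w] += delta[w]
    return C
```

On a path L–N–R, the middle node accumulates from both endpoints; if the middle node instead shares the left endpoint's orientation, it only accumulates from the right side, illustrating the filter.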

To illustrate the difference between centrality metrics, a synthetic network was used: the same one evaluated by Jensen et al. (2016) [jensen2016detecting] to compare betweenness and bridgeness, with the addition of a label for each node representing its political orientation (L, N or R), to allow computation of the bubble popper metric. Figure 1 presents this network, including the computed values for all metrics, with nodes colour-coded by political orientation: L in red, N dark-coloured, and R in blue. In the figure, it is possible to observe that the bubble popper metric was more effective at detecting nodes that bridge distinct groups with a political orientation different from their own, represented by nodes A and B. To complement this analysis, Figure 2 shows centrality values for the users with the highest betweenness values in Brazil in week 4. These values were normalized using a min-max strategy for each metric. Each user’s political orientation (L, N, or R) is presented next to their name. Note that the values of bridgeness and bubble popper centrality follow a similar pattern for most users in Brazil, with the former slightly higher than the latter, showing that these metrics capture similar information in most cases. However, they differ considerably in specific cases, especially when important nodes reach different sides of the spectrum in a more balanced way, as is the case for @UOLNoticias and @g1, two Brazilian news media profiles.

Figure 1: Synthetic network centrality comparison.

Figure 2: Centrality comparison among top 25 users according to betweenness for the Brazilian dataset - week 4.

This result can be difficult to understand without familiarity with the network structure; thus, Figure 3 compares the same metrics on a zoomed example of the same network analyzed above. Nodes are sized by their respective centrality measure. On the left and right corners of the figures, one can view the left-wing (red) and right-wing (blue) groups. Between them, a small number of nodes (dark-coloured) link these groups. In the first representation, for betweenness centrality, two big nodes appear: the accounts @Haddad_Fernando and @ManuelaDavila, representing Fernando Haddad and Manuela Davila, the candidates for president and vice-president. On the left-wing side of the figure, these accounts were ranked higher than the bridging nodes @UOLNoticias and @G1. In the second representation (bridgeness centrality), as expected, nodes from polarized groups were not highlighted either; rather, @Trump_The_Robot, a spamming account banned from Twitter after data collection, was ranked higher than @UOLNoticias and @G1. Finally, in the third representation, for bubble popper centrality, nodes from polarized groups were again not highlighted; however, in contrast to bridgeness centrality, @UOLNoticias ranked higher than @Trump_The_Robot and @G1. This same pattern prevailed in other weeks. For Canada (not shown due to lack of space), the bubble popper metric also helps to highlight important nodes that receive the attention of users from different sides of the political spectrum. These findings regarding the effectiveness of the bubble popper centrality algorithm speak to our first research question.

Figure 3: Comparison among centralities on week 4 in Brazil.

3.4 Domain, Content and Topic Polarity Estimation

The next step was to extract the entities (i.e., domain, content, and topic) related to the links to external sites present in tweets made by bubble poppers. Recall that content refers to a news article represented by its URL, domain refers to the news website domain, and topic refers to a latent topic extracted from the content using standard automated processes. This section presents the process for extracting these entities and estimating their polarity.

Week    Brazil                      Canada
        Var     Kurt     Deg        Var     Kurt     Deg
2       0.320   -0.895   0.586      0.253   -1.306   0.475
3       0.495   -1.590   0.676      0.299   -1.450   0.492
4       0.448   -1.413   0.658      0.280   -1.338   0.475
5       0.144   -0.129   0.310      0.231   -0.682   0.368
6       0.201   -0.466   0.462      0.095   -0.257   0.225

Table 1: Polarization analysis. P(H) summary variables: variance (Var), kurtosis (Kurt) and polarization degree (Deg).

Bubble popper centrality was used to obtain the verified Twitter accounts with the highest value for this metric in each of the weeks. Knowing the most central nodes according to bubble popper, we obtained all retweets they made that included links to external sites, except social networking sites (i.e., Facebook, Instagram, Periscope, and Snapchat) and content aggregators (i.e., YouTube, Apple Podcasts and, in the case of Brazil, Globo Play). In total, and retweets with links were identified in the Brazilian and Canadian datasets, respectively. To capture the textual content and metadata from the pages referred to in these links, the news-please library (https://pypi.org/project/news-please) was used. A total of and unique links were extracted from the Brazilian and Canadian datasets, respectively. Any textual content with fewer than 300 characters was removed to avoid potentially incomplete articles. The textual content was then pre-processed in the following order: (1) removal of stop-words using language-specific lists for each dataset with the NLTK library (https://www.nltk.org); (2) removal of tokens that were not nouns, proper names, adjectives or verbs using pre-trained POS-tagging models for the languages of each dataset with the spaCy library (https://spacy.io); (3) lemmatization of tokens using the same pre-trained spaCy models (spaCy provides lemmatization rather than stemming); (4) transformation of tokens into lower case and removal of special characters and accents; (5) extraction of bigrams and trigrams using the Gensim library (https://radimrehurek.com/gensim).
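The link-filtering step above can be sketched with the standard library's urllib.parse; the exact host strings for the excluded platforms (e.g. periscope.tv, globoplay.globo.com) are our assumptions, not values given in the text.

```python
from urllib.parse import urlparse

EXCLUDED = {
    # social networking sites named in the text (host strings assumed)
    "facebook.com", "instagram.com", "periscope.tv", "snapchat.com",
    # content aggregators named in the text (host strings assumed)
    "youtube.com", "podcasts.apple.com", "globoplay.globo.com",
}

def keep_link(url):
    """Return True when the link points to an external news site."""
    host = urlparse(url).netloc.lower()
    host = host[4:] if host.startswith("www.") else host
    # drop the URL if its host is, or is a subdomain of, an excluded platform
    return all(host != d and not host.endswith("." + d) for d in EXCLUDED)
```

A domain such as noticias.uol.com.br passes the filter, while a www.youtube.com link is dropped before content extraction.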

Having cleaned the data, we extracted topics from the pre-processed textual content. For this task, we applied the Latent Dirichlet Allocation (LDA) [blei2003latent] algorithm using the Gensim implementation. This algorithm identifies topics in a set of texts, considering that each text contains a mixture of topics [blei2003latent]. The number of topics must be chosen manually for the LDA algorithm; therefore, for each dataset, multiple models were created with the number of topics ranging from 20 to 50. Within this range, we selected the model whose number of topics yielded the highest degree of semantic similarity between texts, measured through the coherence metric computed with Gensim, being topics in Brazil and in Canada. In all cases, the same seed was used so that the topic identification results could be replicated.
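The model-selection step amounts to a grid search over the topic count, keeping the value with the best coherence. The sketch below shows only that selection logic; `train_and_score` is a hypothetical stand-in for training a Gensim LDA model with k topics and returning its coherence score, and `fake_score` is a toy function used purely for illustration.

```python
def pick_topic_count(texts, candidate_ks, train_and_score):
    """Grid-search the LDA topic count, keeping the k with best coherence.

    `train_and_score` stands in for training an LDA model with k topics
    and computing its coherence on `texts`.
    """
    scores = {k: train_and_score(texts, k) for k in candidate_ks}
    return max(scores, key=scores.get), scores

# toy stand-in whose coherence peaks at k = 30 (illustration only)
def fake_score(texts, k):
    return 1.0 - abs(k - 30) / 100.0

best_k, _ = pick_topic_count(["doc"], range(20, 51), fake_score)
```

With a real scoring function, fixing the random seed before each training run keeps the selected model reproducible, as the text describes.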

The last step was to estimate the polarity of domains, content, and topics based on the polarity (P(H)) of the users who retweeted a tweet. For this task, a new metric called relative polarity (RP(H)) was created, which is calculated as follows: (1) for each entity (i.e., domain, content or topic), a list with 21 positions was created, where each cell counts the number of retweets that the entity received from users in the polarity bins represented by the set {-1.0, -0.9, ..., 0.9, 1.0}; (2) the entities were allocated in a matrix, in which each row represents an entity and each column one of the 21 polarity bins; (3) to avoid data imbalance across bins, considering the overall dataset, each cell of the matrix was divided by the maximum value of its column; (4) for each entity (row), the values were normalized using a min-max strategy, putting them on a [0, 1] interval; (5) for each entity (row), each cell value was multiplied by its respective polarity and the results were summed; for example, the first cell value was multiplied by -1.0, the second by -0.9, the third by -0.8, and so on, up to 1.0; cells whose value was zero, or which represented the neutral polarity 0, were disregarded and not counted; (6) finally, the sum was divided by the number of considered cells, which became the RP(H) value for the entity. The resulting RP(H) value is interpreted in the same way as the metric P(H), presented in Subsection 3.2.
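Our reading of the six steps can be sketched in Python as follows; the bin boundaries and tie-breaking when assigning a polarity to a bin are our assumptions.

```python
def relative_polarity(retweets, n_bins=21):
    """RP(H) of each entity from (entity, user P(H)) retweet pairs.

    A sketch of the six steps described above, not the authors' code.
    """
    bins = [round(-1.0 + 0.1 * i, 1) for i in range(n_bins)]  # -1.0 ... 1.0

    def bin_index(p):
        # nearest bin for a polarity value (tie-breaking assumed)
        return min(range(n_bins), key=lambda i: abs(bins[i] - p))

    # steps 1-2: entity-by-bin retweet count matrix
    entities = sorted({e for e, _ in retweets})
    m = {e: [0] * n_bins for e in entities}
    for e, p in retweets:
        m[e][bin_index(p)] += 1

    # step 3: divide each column by its maximum to correct bin imbalance
    col_max = [max(m[e][j] for e in entities) or 1 for j in range(n_bins)]
    for e in entities:
        m[e] = [m[e][j] / col_max[j] for j in range(n_bins)]

    # steps 4-6: min-max normalize each row, then take the polarity-weighted
    # average over non-zero, non-neutral cells
    rp = {}
    for e in entities:
        lo, hi = min(m[e]), max(m[e])
        row = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in m[e]]
        terms = [v * bins[j] for j, v in enumerate(row)
                 if v != 0 and bins[j] != 0.0]
        rp[e] = sum(terms) / len(terms) if terms else 0.0
    return rp
```

An entity retweeted only by users at P(H) = 1.0 gets RP(H) = 1.0, and one retweeted only by users at P(H) = -1.0 gets RP(H) = -1.0, matching the interpretation of P(H).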

4 Results

4.1 Polarization analysis

To provide general context, we first analyze the global polarization of each dataset; the metrics of variance, kurtosis, and polarization degree were extracted based on the P(H) values of the users who retweeted each week; the analysis is inspired by [dimaggio1996have]. Variance refers to the dispersion of opinions among individuals: the more dispersed opinions become, the more difficult it is for the political system to maintain consensus [dimaggio1996have]. Kurtosis captures bimodality in the distribution of opinions: positive values indicate consensus, while negative values, as they approach -2, indicate disagreement among individuals. The more these curves diverge, the greater the probability of social disagreement [dimaggio1996have]. The polarization degree, calculated as the average of the absolute P(H) values of all users, refers to how much the opinions of individuals diverge, with 0 being totally convergent and 1 totally divergent. Table 1 shows the values for these metrics, and Figure 4 shows the weekly histograms of users by P(H) range, divided into 21 bins.
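The three summary statistics just described can be computed directly; this sketch uses population moments and excess kurtosis, so a perfectly bimodal two-point distribution scores -2.

```python
def polarization_summary(polarities):
    """Variance, excess kurtosis, and polarization degree of P(H) values."""
    n = len(polarities)
    mean = sum(polarities) / n
    var = sum((p - mean) ** 2 for p in polarities) / n
    m4 = sum((p - mean) ** 4 for p in polarities) / n
    kurt = m4 / var ** 2 - 3.0   # excess kurtosis; -2 at the bimodal extreme
    degree = sum(abs(p) for p in polarities) / n  # mean |P(H)|
    return var, kurt, degree
```

On a perfectly bimodal sample (half the users at P(H) = -1, half at +1) this returns variance 1, kurtosis -2, and polarization degree 1, the most extreme values each statistic can take here.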

(a) Brazil
(b) Canada
Figure 4: Number of users by polarity histograms.
(a) Brazil
(b) Canada
Figure 5: User engagement with domains.

In both datasets, it is possible to visually identify bimodal curves in the first four weeks. These weeks represent the period before the elections. The presence of the bimodal curve indicates that both datasets present evidence of polarization [Fiorina2008, dimaggio1996have], with the polarization in the Brazilian case stronger than in the Canadian case. The variance indicates that the Brazilian case had a greater dispersion in political opinions, while opinions were slightly less dispersed in Canada. Left-wing radicals in Brazil appear proportionally more concentrated than right-wing radicals, which could be interpreted as the former being more prone to retweet content shared by the top bubble poppers than the latter. Finally, the polarization degree indicates a more extreme scenario in Brazil, mainly in week 4, the week before voting day. After the election in Brazil, polarization became lower on both the left and right-wing sides. The histograms show some polarization in Canada, but with a significant amount of neutral users during the entire observed period. However, weeks 5 and 6 exhibit proportionally more neutral users relative to polarized users than previous weeks, and a substantial reduction of polarized users occurs after the election result (week 6). This high degree of overall polarization, even among users exposed to the content shared by the top “bubble poppers”, is an initial sign that exposure to cross-cutting content alone is not enough to remedy polarization, especially in the emotionally intensified context of a national election. This finding speaks to our second research question.

4.2 User engagement

This section discusses how users with specific political orientations engage by retweeting messages that support their political biases from different perspectives: domain, content, and topic. For this, we examine the correlation between the P(H) of the user who made a retweet and the RP(H) of the domain, content, and topics that were retweeted. The results of the correlations are shown in Table 2, where columns refer to the variables that were tested: “User P(H)” is the P(H) value of the user who made a retweet, and “Domain RP(H)”, “Content RP(H)”, and “Topic RP(H)” are the RP(H) values of the retweeted entity. To structure the tests, we considered that users classified with a neutral political orientation (label “N”) behave differently from polarized users (labels “L” or “R”). For this reason, three tests were created, each with a distinct set of retweets filtered by user P(H): (1) including all retweets, regardless of the user’s political orientation; (2) including only retweets made by neutral users; and (3) including only retweets made by polarized users. See Table 2.

Considering that an entity’s RP(H) value is defined based on the P(H) of the users who retweeted it, we evaluated the possible existence and impact of circularity in the correlation tests. For this purpose, a synthetic dataset of random retweets was created for each performed correlation test, maintaining the same number of retweets, users, and entities present in the original retweet dataset, but with P(H) values generated randomly following a uniform distribution. Each synthetic dataset was generated 100 times with different random seeds. For each synthetic dataset, we generated two null models: Model A, with RP(H) calculated following the method presented in Subsection 3.4, and Model B, with RP(H) generated randomly following a uniform distribution. As expected for a fully random model, Model B showed no correlation between the P(H) of users and the RP(H) of domains, content, or topics in any test. On the other hand, Model A showed weak signs of positive correlations, so we focus on presenting results from this model.
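Model B, the fully random baseline, can be sketched with the standard library: correlations between independently drawn uniform P(H) and RP(H) values hover near zero, giving an empirical band against which the observed correlations can be compared. The 100 trials follow the text; the rest of the setup is illustrative.

```python
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def null_model_bounds(n_retweets, trials=100, seed=0):
    """Range of correlations under a fully random (Model B style) null."""
    rng = random.Random(seed)
    rs = []
    for _ in range(trials):
        user_ph = [rng.uniform(-1, 1) for _ in range(n_retweets)]
        entity_rph = [rng.uniform(-1, 1) for _ in range(n_retweets)]
        rs.append(pearson(user_ph, entity_rph))
    return min(rs), max(rs)
```

Observed correlations falling well outside the (min, max) band produced by the null trials is the kind of evidence the text uses to rule out circularity as the driver of the results; Model A additionally recomputes RP(H) from the random data rather than drawing it uniformly.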

              Brazil                                       Canada
User P(H)     Domain RP(H)  Content RP(H)  Topic RP(H)     Domain RP(H)  Content RP(H)  Topic RP(H)
(1) all           ***            ***           ***             ***            ***           ***
(2) neutral       ***            ***           ***             ***            ***           ***
(3) polarized     ***            ***           ***             ***            ***           ***
Note: r and r_s denote the Pearson and Spearman correlation coefficients, respectively; each cell reports both. All correlations are significant (two-tailed).
Table 2: Correlation between User P(H) and retweeted Domain, Content or Topic RP(H).

User P(H) vs domain RP(H): Considering all retweets, a moderate positive correlation was found, with *** and ***, in the Brazilian dataset, and a strong positive correlation, with *** and ***, in the Canadian dataset. When looking only at neutral users, this correlation is slightly less intense than for polarized users in both datasets. This result indicates that, in general, users tend to retweet content from domains with a political orientation aligned to their own, even if they do not always do so. Figure 5 shows the density of retweets made by users of each polarity for the top most retweeted domains in each dataset and their respective RP(H); domains are sorted in increasing order of RP(H). In this figure, it is possible to note that a relatively small number of domains could be popping the bubbles more than others, such as “noticias.uol.com.br” and “globalnews.ca”. Considering that these domains could be producing content that appeals to users of diverging political orientations, we also have to verify whether partisans interact in a balanced way with content that goes towards or against their own political bias (we present results regarding this point in Section 4.3). The correlations found for the null Model A were ***, and *** for tests 1, 2 and 3, respectively, in Canada, and ***, and ***, respectively, in Brazil. Values are similar for Spearman’s correlations. The observed circularity is thus very low and negligible; moreover, the original correlation values lie well beyond the limits of the null model, so circularity has no significant impact on the results.

(a) Brazil
(b) Canada
Figure 6: User engagement with news content.
(a) Brazil
(b) Canada
Figure 7: User engagement with topics.

User P(H) vs content RP(H): When considering all retweets, a strong to very strong positive correlation was identified in both datasets, with *** and *** for Brazil and *** and *** for Canada. The similarity between these results may indicate that, regardless of the dataset, users mostly engage with content that aligns with their political orientation. As with the correlation between user P(H) and domain RP(H), when neutral users are observed in isolation, the correlation is slightly less intense compared to polarized users. The fact that the correlation between user P(H) and content RP(H) is stronger than that between user P(H) and domain RP(H) may indicate that the content’s political orientation is slightly more important than the domain’s political orientation in user retweet behaviour. Regardless, content shared by bubble poppers remains highly polarized, and even “neutral” users engage with content in segmented ways. Figure 6 shows the density of retweets made by users of each polarity for the top most retweeted URLs (content) in each dataset and their respective RP(H); URLs are ordered in ascending order of RP(H). In this figure, it is possible to note that none of the URLs popped the bubbles. Instead, each piece of content was shared only by users with a political orientation that resonated with its bias. This visualization indicates that, although some domains could be popping the bubbles, users tend to select only the content that aligns with their beliefs, even when exposed to content that appeals to different groups. The correlations for the null Model A are ***, *** and *** for tests 1, 2 and 3, respectively, in Canada, and ***, *** and ***, respectively, in Brazil. Values are similar for Spearman’s correlations. Note that although the limits produced by this model are slightly higher than in the previous case, the original correlation results lie well beyond the observed circularity effects.

User P(H) vs topic RP(H): In the test with all retweets, a weak positive correlation was found in both datasets, with *** and *** in the Brazilian dataset and *** and *** in the Canadian one. The similarity between the results may indicate that, regardless of the situation, the topic’s political orientation has less influence on retweeting behaviour than the content or domain. This means that the content itself, and where it appears, is generally more important than the topic it addresses. Figure 7 shows the density of retweets made by users of each polarity for the top most retweeted topics in each dataset and their respective RP(H); topics are sorted in increasing order of RP(H). It is possible to observe more topics slightly biased to the left in Brazil; in Canada, most topics are dispersed among partisans, regardless of their political orientation. The correlations for the null Model A are ***, and *** for tests 1, 2 and 3, respectively, in Canada, and ***, and ***, respectively, in Brazil. Values are again similar for Spearman’s correlations. Here, too, the original correlations lie very far from the observed circularity limits; therefore, these results, like the others, are not due to pure circularity.

Regarding our third research question, we found that more polarized users show stronger partisan alignment; however, relatively neutral users still tend to share content from media outlets that align with their own partisan leanings.

              Brazil                           Canada
Domain RP(H)  Content RP(H)  Topic RP(H)      Content RP(H)  Topic RP(H)
(1) all           ***            ***              ***            ***
(2) neutral       ***            ***              ***            ***
(3) polarized     ***            ***              ***            ***
Note: r and r_s denote the Pearson and Spearman correlation coefficients, respectively; each cell reports both. All correlations are significant (two-tailed).
Table 3: Correlation between retweeted Domain RP(H) and Content or Topic RP(H).

4.3 The impact of the content source

The analyses carried out in the previous section indicate a stronger correlation between users’ polarity and content polarity than between users’ polarity and domain polarity. However, it is still not possible to know whether users who engage with domains that produce content appealing to both sides, that is, domains with a neutral RP(H), retweet more content that reinforces their political views or tend to balance their retweets between content that goes in favour of and against their political orientation. To better understand this, correlation tests were carried out between the retweeted domain RP(H) and the RP(H) of the content or topic produced by the domain. The results are shown in Table 3. The tests were conducted in the same way as the previous ones, separating the dataset according to the domain RP(H): (1) all retweets, regardless of the retweeted domain RP(H); (2) only retweets from domains with neutral RP(H); and (3) only retweets from domains with polarized RP(H).

Domain RP(H) vs content RP(H): When considering all retweets, a strong positive correlation was identified in both datasets, with *** and *** in the Brazilian dataset and *** and *** in the Canadian one. Figure 8 shows the scatterplot with the regression line (in green) for the correlation test involving domain RP(H), on the horizontal axis, and content RP(H) on the vertical axis, where each point is colour-coded considering the polarity value of users who made each retweet. This result reinforces what had already been found in the correlations between user P(H) and domain and content RP(H). However, when neutral domains are analyzed in isolation, it is possible to observe a weak positive correlation between domain RP(H) and content RP(H), with *** and *** in the Brazilian dataset and a moderate positive correlation with *** and ***, in the Canadian dataset. This correlation is significantly lower than the correlation between domain RP(H) and content RP(H) of polarized domains in Brazil, with *** and *** and similar in the Canadian dataset, with *** and ***. These results show that neutral domains can generate engagement of users from both extremes of the political spectrum in a more balanced way, i.e., engage a similar proportion of users with different political orientations. However, this is not the case for polarized domains in Brazil. As we saw in the previous results, this balanced engagement does not mean that users with different political orientations engage with any content. In fact, the indication we have is that users are selective, engaging with content that matches their convictions.

(a) Brazil
(b) Canada
Figure 8: Correlation between retweeted Domain RP(H) and Content RP(H).

Domain RP(H) vs topic RP(H): Finally, in the test including all retweets involving the domain RP(H) and topic RP(H), a weak positive correlation was identified in both datasets, with *** and ***, in the Brazilian dataset, and with *** and ***, in the Canadian dataset. Figure 9 shows the scatterplot with the regression line (in green) for the correlation test involving domain RP(H), on the horizontal axis, and topic RP(H) on the vertical axis, where each point is colour-coded according to the polarity value of the users who made each retweet. Nevertheless, when we study neutral users in isolation, as in the previous tests, the correlation is slightly less intense compared to polarized users. These results suggest that the political orientation of topics and domains is weakly related to retweeting behaviour.

(a) Brazil
(b) Canada
Figure 9: Correlation between retweeted Domain RP(H) and Topic RP(H).

Overall, speaking to our fourth research question, we found that neutral domains do foster greater engagement across partisan divisions compared with more partisan domains. Nevertheless, this high degree of cross-partisan exposure does not mean users of different partisan orientations engage with the same content to the same degree. Rather, they selectively engage with content that aligns with their partisan orientation. Moreover, this selective engagement is driven less by the topic itself than by where the content appears: articles on similar topics generate very different responses depending on the media outlet covering them.

5 Discussion and Conclusion

The results indicate a common phenomenon across the Brazilian and Canadian datasets: neutral domains can distribute content that interests both sides of the political spectrum in a balanced way. Despite that, in most cases, users tend to engage only with content that reinforces their convictions, regardless of the topic. The same topic can generate different retweet behaviours, depending more strongly on the content or the domain’s political orientation (in polarized domains).

Social media platforms and mainstream news organizations have pursued initiatives to expose users to more frequently shared and neutral content, hoping that this expedient would reduce polarization. The results presented in this study partially contrast with this received wisdom. We find that when users engage with neutral domains and are exposed to content that interests both sides of the political spectrum, they prefer to select the content that better fits their prior beliefs. In short, users seek to engage with information more aligned with their biases, even when exposed to contrasting information. Neither cross-cutting exposure nor “neutral” media alone is enough to free people from their echo chambers, and intergroup contact with the same content may not have the expected effect in reducing polarization. Our results thus build on previous literature emphasizing the role of selective exposure in structuring online polarization [iyengar2009red, Hameleers2018]. By using a novel centrality metric, we offer greater confidence in these claims: selective exposure is not simply an artifact of prior measurement techniques but reflects something fundamental about engagement on social media.

On the other hand, evidence that neutral domains generate engagement from distinct polarized groups suggests news media can still function as a ‘bubble popper’, but in a different way. By presenting prior biases as the decisive factor in building public opinion, selective exposure arguments tend to minimize the impact of news media. However, while news media may be ineffective in telling people what to think, others note that it is more successful in telling people what to think about, through ‘agenda setting’ [lippmann2017public, mccombs2020setting]. On that basis, news media can still promote cohesion by delineating what the important topics are, even when positions on these issues diverge. Our analysis speaks to this dynamic by finding that even when neutral domains are unable to upend partisan biases, they still manage to draw interest from across the political spectrum. Neutral domains can thus conceivably mediate polarization by integrating opposing partisans within common topical agendas, even if this means accommodating antagonistic perspectives. Along similar lines, the fact that neutral domains can draw interest from both sides of the political spectrum could be interpreted as consistent with their promise to be impartial with regard to partisan “coverage bias” [d2000media]. This is an important point nowadays, considering that news media are constantly attacked by country leaders who claim that they produce fake or biased news to weaken their power [faris2017partisanship, huckfeldt_disagreement_2004].

Despite the interesting results of this study, there are limitations that we should highlight. The action of retweeting was considered a sign of a user’s endorsement of a domain, content, or topic. However, retweets do not always indicate endorsement; users may even retweet content they disagree with in order to mock it in front of their peers. In addition, Twitter’s recommendation mechanisms may generate filter bubbles [pariser2011filter] by selecting content published by traditional media that would potentially generate more engagement on either side of the political spectrum. Nevertheless, given the sample sizes of the analyzed datasets and the fact that a balance was identified in the distribution of neutral media content among users across the political spectrum, that possibility is unlikely to have affected the obtained results. We recognize that the coarse granularity of the L, N, and R labels greatly simplified our analysis; however, real extremists could behave differently from those who are merely biased, and those we considered neutral in their orientation were not necessarily so. Lastly, it is important to consider that filtering the most active users on a weekly basis may have captured more politically motivated individuals, which could inflate the polarization indicators. Even so, the dataset contains a large number of users, and politically motivated individuals could be in a smaller proportion compared with individuals simply sharing political content among other non-political content, even considering the election setting.

For future studies, it is important to verify whether the same dynamics found in the engagement of users with content created by neutral top bubble poppers are replicated in other online social networks, such as Facebook, WhatsApp and Reddit, or even in offline social networks. Understanding this phenomenon in distinct contexts is important for consolidating policies or building mechanisms that aim to reduce polarization, whether online or offline. Such research could build on recent work outlining the different functions of cross-partisan brokerage [keuchenius2021important, brocic2021influence] by elaborating the role of neutral bubble poppers. Drawing on our earlier discussion, this could mean exploring their role in topical ‘agenda-setting’.

Regarding the role of news media, an important next step is to assess the degree to which the media are able to distribute content in a balanced way to different social groups and how individuals behave when exposed to such content, considering not only sharing behaviour, as in this work, but also the discussion cascades generated by the consumption of online content. In this sense, recent work on message cascades exchanged by WhatsApp users [caetano2019characterizing] identified that the reach of messages with false political information was greater than that of other types of information. Similar research could be conducted to understand users’ reactions through discussions when exposed to information shared by news media, considering the political bias of the users and the kind of information to which they were exposed.

Acknowledgements

The authors would like to thank Michelle Reddy for the ongoing discussions on the thematic of this study.

Funding

All stages of this study were financed in part by CAPES - Finance Code 001, by the project GoodWeb (Grant 2018/23011-1 from the São Paulo Research Foundation - FAPESP), by CNPq (grant 310998/2020-4), and by a Connaught Global Challenge Award.

Availability of data and materials

The datasets generated and analyzed during the current study will be made available in a public repository after review.

Competing interests

The authors declare that they have no competing interests.

Author’s contributions

JK ran the analysis and wrote the paper. TS, MB and DS conceptualized the analysis and wrote the paper. AG revised the paper. All authors read and approved the final manuscript.

References