Tubes & Bubbles – Topological confinement of YouTube recommendations

01/15/2020
by   Camille Roth, et al.
Humboldt-Universität zu Berlin

The role of recommendation algorithms in online user confinement is at the heart of a fast-growing literature. Recent empirical studies generally suggest that filter bubbles may principally be observed in the case of explicit recommendation (based on user-declared preferences) rather than implicit recommendation (based on user activity). We focus on YouTube which has become a major online content provider but where confinement has until now been little-studied in a systematic manner. Starting from a diverse number of seed videos, we first describe the properties of the sets of suggested videos in order to design a sound exploration protocol able to capture latent recommendation graphs recursively induced by these suggestions. These graphs form the background of potential user navigations along non-personalized recommendations. From there, be it in topological, topical or temporal terms, we show that the landscape of what we call mean-field YouTube recommendations is often prone to confinement dynamics. Moreover, the most confined recommendation graphs i.e., potential bubbles, seem to be organized around sets of videos that garner the highest audience and thus plausibly viewing time.


Introduction

The effect of algorithms on the filtering of information and interactions in online platforms is currently at the heart of a very active debate. The question of whether or not algorithmic recommendation fosters serendipity in contact and content discovery is of particular interest. On the one hand, a growing literature aims at empirically comparing what happens when users do rely, at least in part, on the output of some recommendation algorithm vs. when they do not. This kind of scientific endeavor generally need not venture into knowing or reverse-engineering which principles drive these algorithms. Contrary, perhaps, to intuitions related to the popularization of so-called “filter bubbles”, several recent studies appear to show that algorithmic suggestions do not necessarily contribute to restricting the horizon of users. Be it in terms of interaction or information consumption, users do not seem to be offered less diverse content than what they would encounter in the absence of recommendation [1, 2, 3, 4, 5, 6] or with distinct recommendation approaches [7, 8], except for what stems from explicit personalization (i.e. explicitly chosen [9], or self-selected [10], by users [11]). In short, the picture that seems to emerge is that filter bubbles, and possibly echo chambers, mostly occur when platforms recommend content based on explicit personal preferences (e.g., channel subscriptions, specified lists of interests, etc.) rather than implicit traces (either at the user level or aggregated from the activities of all users).

On the other hand, at a more downstream level, user reactions to algorithmic curation are an equally important issue. The current state of the art exhibits mixed results, and user populations cannot be deemed homogeneous. For one, users may variously seek diversity [12], be variously responsive to recommendation [1], use it for various purposes [13] or have various expectations about it [14]; in these respects, the “average user” does not really exist. Users however seem to be generally sensitive to social signals and goaded by the indication that some content is popular or appreciated [15, 16, 17], whereas they are only weakly sensitive to content-based signals, for instance when they are informed of the diversity of what they are currently consuming [18]. Such studies generally require the design of sophisticated experimental protocols or privileged access to private company data. On the whole, there appears to be no blanket answer to the complex interplay between the structure of proposed recommendations and user attitudes towards them.

In this context, while YouTube has become a key content provider (being the second most popular site as of 2019), the influence of its recommendation system on user navigation dynamics and exploration diversity has been little studied (even though it is already a current news topic, see e.g., [19, 20]). The present contribution intends to bridge this gap by focusing on the global, platform-level and thus non-personalized recommendations of YouTube. Indeed, irrespective of the personalized, user-centric adjustments to recommendation, studying platform-level suggestions should provide an overview of the forces that are likely to apply globally to all users. As such, characterizing a possible confinement on this recommendation landscape constitutes a primary step toward characterizing confinement as a whole.

On content-sharing platforms, a model of user behavior toward recommendations may be construed as the navigation on a recommendation graph where nodes are items (such as videos) and links are recommendation suggestions, which users may or may not follow. Understanding the heterogeneity of the subsequent navigation topologies is crucial to appraise possible confinement processes. The issue of potential navigation topology has already generated several key studies on other platforms such as Twitter or Facebook, especially with respect to polarization and fragmentation, yielding convincing graph typologies (see, inter alia, [21, 22, 23, 24, 25]). By contrast, the state of the art relevant to YouTube’s algorithms appears to have essentially focused on their technical underpinnings [26], their improvement [27] or their impact on consumption and audience statistics [28, 29, 30]. To our knowledge, very few academic works focus on the general structure of the browsing network: for instance, [31] describes the potential navigation dynamics in relation to audience or macro-level features, while [32] principally uses the recommendation graph as a data source for extracting crowdsourced content taxonomies.

The paper is broadly organized as follows. We first carry out an instrumental step by exploring how video recommendation sets are being provided by the platform in the absence of personalization. This enables us to devise a robust protocol of collection of node-centric recommendation graphs. We then analyze the shape of the graphs that are thus generated and their various confinement features. Most importantly, we discuss them in relation to various intrinsic properties of videos (especially in terms of popularity, consensus, or topics), both in a static and longitudinal manner.

Node-centric analysis of recommendations

Most YouTube videos include a tab featuring a list of suggested videos. How user-specific these suggestions are depends on whether users are logged in or share cookies and other identification information. While some suggestions seem to be clearly user-centric and depend on user navigation history (generally labeled by YouTube as Recommended for you), others appear to be node-centric, i.e. stemming from a pool of suggestions attached to the video itself, independently of the user history. In this latter case, suggested videos most likely depend on inferences made from platform-level behavioral traces accumulated over an unknown pool of users and an unknown period of time. This engenders a dichotomy between user-specific suggestions and what may be called a “mean field” of user-independent suggestions. We aim at characterizing this mean field, while leaving personalized recommendation outside of the scope of this paper. We do not aim to reverse-engineer the way node-centric suggestions are computed by the platform, but rather wish to understand the navigation landscape that YouTube algorithms contribute to shape. In other words, we take for granted how these recommendations are built and focus on characterizing this landscape. Users are admittedly exposed to both types of suggestions, yet we contend that the analysis of the mean recommendation field is already likely to shed light on the attraction forces that are due to node-centric suggestions, all other things being equal.

Data

In practice, we thus study user-independent suggestion lists attached to videos by creating non-persistent, anonymous sessions with simple HTTPS requests on a given page from a set of about a hundred IP addresses located in the region of Paris, France. We first define a diverse set of YouTube videos by arbitrarily selecting five distinct sets of sources which feature links to such videos. Two of these sets aim to capture mainstream use by focusing on “Top” videos listed on Reddit and Wikipedia. The first set, denoted as “Reddit top” consists of the YouTube URLs contained in the most voted-up posts of 20 of the most subscribed subreddits (i.e., forums) listed on redditlist.com. The second set gathers the 5 most viewed videos from the 50 YouTube channels listed on the “List of most-subscribed YouTube channels” Wikipedia page [33], which we denote as “Wikipedia Top”. This yields a diverse range of popular content mainly categorized by YouTube as “Music”, “Entertainment”, “Howto & Style” or “Science & Technology”. The remaining sets focus on the activity surrounding the 2019 European Parliament election. While not as popular or representative of the use of YouTube, focusing on this context also contributes to reach political and election-related content. More precisely, the third set, denoted as “Twitter”, consists of the most shared YouTube video links found on the micro-blogging platform over the 3 weeks leading to the 2019 European Parliament Elections and associated with a set of 22 hashtags that were manually selected to cover discussions related to the elections in 4 European languages: English, French, German and Italian (such as #EuropeanElections2019, #Europeennes2019, #Europawahl2019, #elezionieuropee, among others). The fourth and fifth sets are based on channels of political parties engaged in the 2019 European elections in France and in Germany. In Germany, we focus on the 13 political parties that obtained a seat after the vote. 
In France, we equivalently consider the 13 main parties in descending order of obtained votes. In both cases, we identify the ten most viewed videos on each channel. We denote these seed sets respectively as “Political DE” and “Political FR”. This political selection yields a more homogeneous set of videos than the mainstream ones: the three subsets consist of content that is principally categorized by YouTube as “News & Politics”.

| Seed set | Seeds | Views (min) | Views (median) | Views (max) | Top categories % (using YouTube labels) |
|---|---|---|---|---|---|
| Reddit Top | 178 | 2,102 | 1,595k | 228m | Entertainment (15.7%), Science & Technology (12.9%), People & Blogs (12.9%), Howto & Style (11.8%), Music (10.1%) |
| Wikipedia Top | 161 | 528,159 | 91,430k | 4,242m | Music (34.8%), Entertainment (22.4%) |
| Political DE | 73 | 299 | 111k | 1.25m | News & Politics (100%) |
| Political FR | 88 | 116 | 33k | 0.8m | News & Politics (86.4%) |
| Twitter Top | 184 | 20 | 10k | 34m | News & Politics (68.5%) |

Table 1: Seed categories and basic statistics.

Table 1 gathers basic statistics for these seed sets. Server errors, videos deleted during the data acquisition process and a handful of unidentified crawling and parsing errors explain the discrepancy between the number of videos targeted for each set and the number of actually extracted seeds. While selecting IDs in a purely random manner across the platform could yield a more uniform sampling of video IDs on YouTube, it would risk overemphasizing insignificant videos with an extremely limited audience (using a protocol similar to [29], we verify that this would indeed be the case: a selection of 50 such random videos yields a median number of views of 115, far below the seed categories considered here).

Each HTTP request for the recommendations attached to a YouTube video returns a set of at most 20 suggestions (in practice, exactly 20 suggestions four fifths of the time, and 19 about a fifth of the time). Our first conception of a model of a user navigating through these node-centric recommendations would thus consist of a walk in a directed recommendation graph whose nodes all have an out-degree of 19-20. However, for a given video, this set appears to fluctuate significantly from one request to the next, bearing the risk of exploring a very unstable and thus unreliable recommendation graph: examining the temporal features of these suggestions is thus a prerequisite to constructing such a graph.

To this end, we proceed with a long crawl centered on seeds and aimed at understanding the variation and potential evolution of suggestion sets across successive requests. Along the way, we also collect video metadata such as the number of views and appreciation statistics: number of thumbs up (likes), down (dislikes). For each seed, we carry out a total of 2,000 requests at a regular average interval of about 10 minutes, thus covering a bit less than two weeks of sampling. This yields a node-centric time series of sets of suggestions.

Stability of a recommendation plateau

We first compute the frequency of occurrence and recurrence aggregated over a certain number of requests in order to appraise the stability of suggestions and thus of the related network. Irrespective of the sampling duration, yet even more so for shorter time spans, a “plateau” of consistently highly frequently suggested videos quickly emerges (several of them are often recommended nearly 100% of the time), beyond which occurrence frequencies decrease steeply. The size of this plateau may be dynamically determined through a simple change-point analysis restricted to suggestions appearing at least, say, 1% of the time. This lower bound does not significantly change the position of the detected change point but is needed to prevent the very flat long tail of the distribution from interfering with the detection process. The plateau is generally found to feature between around 20 and 30 videos (, ). Nonetheless, its erosion over time suggests the existence of a slow renewal process. In the longer term, the ordered distribution of occurrence frequencies progressively takes the shape of a heterogeneous distribution apparently exhibiting a power-law-like tail with a cut-off. In figure 3(b), we gather the frequency of occurrence of videos with respect to their rank, for various durations of aggregation. For instance, we see that the tenth most frequent suggestion after requests (i.e. over about hours, red curve) appears about 70% of the time. For all sampling durations , and , and all the more for the shorter ones, occurrence frequency is relatively high up to the 20th most frequent suggestion and then markedly decreases afterwards. This suggests that exit routes leaving from a given seed and, thus, the recommendation graph induced by mean-field suggestions, are rather stable when observed on a relatively short time span of a couple of days.
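The plateau estimation described above can be illustrated with a minimal Python sketch. The text only mentions "a simple change-point analysis", so the two-segment least-squares split used here is an assumption chosen for illustration, as are the function names `occurrence_frequencies` and `plateau_size`; only the 1% lower bound comes from the text.

```python
from collections import Counter


def occurrence_frequencies(requests):
    """Return (video, frequency) pairs sorted by decreasing frequency.

    `requests` is a list of suggestion lists, one per request for a given seed.
    The frequency is the fraction of requests in which a video appears.
    """
    counts = Counter(v for suggestions in requests for v in set(suggestions))
    n = len(requests)
    return sorted(((v, c / n) for v, c in counts.items()),
                  key=lambda pair: (-pair[1], pair[0]))


def plateau_size(ranked_freqs, min_freq=0.01):
    """Estimate the plateau size from frequencies sorted in decreasing order.

    Assumption: the change point is the split into two constant segments
    that minimizes the total squared error. Suggestions seen less than
    `min_freq` of the time are ignored, as in the text.
    """
    ys = [f for f in ranked_freqs if f >= min_freq]
    if len(ys) < 2:
        return len(ys)

    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((y - mean) ** 2 for y in segment)

    best_k, best_cost = 1, float("inf")
    for k in range(1, len(ys)):
        cost = sse(ys[:k]) + sse(ys[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

Calling `plateau_size([f for _, f in occurrence_frequencies(requests)])` on the request log of one seed would give the number of videos treated as its direct neighbors under this assumed detection rule.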

(a) Occurrence frequencies of suggested videos
(b) Recommendation lifespans.
Figure 3: (a). Number of recommendations with a lifespan for various thresholds , and (averages are central lines, along with their 95%-confidence intervals). A lifespan of means that a recommendation appeared at least of the time over a sliding window of successive requests, at two distinct moments at least requests apart. Inset: average presence of a suggestion over its lifespan as a function of the lifespan (again for the three thresholds). (b). Recommendation lifespans after sampling requests, ordered by rank, averaged over all seeds. Inset: zoom on the inflection area typically occurring around the 20th suggestion for .

To further qualify this observation, we turn to the study of the lifespan of suggestions. For a given seed video, we define the occurrence frequency of a suggested video over a sliding window of sampling requests as . We fix , consistently with the above-observed minimal number of requests needed to observe a robust plateau. We then denote as the lifespan of , defined as the difference between the first and the last moment where its average occurrence frequency goes above a certain threshold. Put simply, the lifespan of a suggestion is such that it appeared above this threshold frequency at two moments separated by such a length of time. This does not mean, however, that it appeared consistently above this threshold over that length of time. We plot the numbers of suggestions having a lifespan of at least (instead of exactly , since we do not know what happens before or after we started collecting data). Figure 3(a) shows the distribution of lifespans for various thresholds: corresponds to suggestions appearing at least once (i.e. at all), whereas focuses on very dominant suggestions which appear at least 90% of the time over their lifespan and thus principally belong to the plateau. While this graph exhibits a relatively large number of short-lived suggestions, it also demonstrates that the plateau videos are likely to be present for a significant time. This is all the more so as suggestions with higher lifespans also appear more frequently across their lifespan and not just at its extremities, as demonstrated by the inset in figure 3(a).
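As a hedged illustration of the lifespan definition, the sketch below computes sliding-window occurrence frequencies for one suggestion and measures the distance, in requests, between the first and last windows at or above a threshold. The window size and threshold are left as parameters since their values are not recoverable here; the names `windowed_frequency` and `lifespan`, and the use of "greater than or equal" for the threshold test, are assumptions.

```python
def windowed_frequency(presence, window):
    """Occurrence frequency of one suggestion over each sliding window.

    `presence` is a list of 0/1 flags, one per request, telling whether
    the suggestion was returned for the seed at that request.
    """
    if len(presence) < window:
        raise ValueError("need at least `window` requests")
    total = sum(presence[:window])
    freqs = [total / window]
    for i in range(window, len(presence)):
        # Slide the window by one request: add the new flag, drop the oldest.
        total += presence[i] - presence[i - window]
        freqs.append(total / window)
    return freqs


def lifespan(presence, window, threshold):
    """Requests elapsed between the first and last window at/above threshold.

    Returns None when the suggestion never reaches the threshold.
    """
    freqs = windowed_frequency(presence, window)
    hits = [i for i, f in enumerate(freqs) if f >= threshold]
    if not hits:
        return None
    return hits[-1] - hits[0]
```

As in the text, a long lifespan under this definition does not imply that the suggestion stayed above the threshold throughout: `lifespan([1, 1, 0, 0, 1, 1, 0, 0], window=2, threshold=0.9)` spans the gap in the middle.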

This bears two conclusions when considering the recommendation graph induced by suggestions. First, focusing on the plateau would suffice as it concentrates most of the density of suggestion occurrence. This plateau has a modal distribution size and thus entails a network with a modal, homogeneous degree distribution – a quite peculiar object with respect to classical web topologies, which are generally heterogeneous. Second, this graph should be relatively stable in the short term, which substantiates the idea that a graph exploration protocol spanning over a short period would plausibly approximate well the recommendation graph faced by users during a navigation session.

Induced recommendation graphs

Figure 4: Illustration of the recursive crawl focused on a given seed video.

Recommendations are crawled for the seed until a plateau may be estimated, which defines the direct neighbors of the seed and a set of nodes at depth 1. This process is repeated for all nodes at depth 1 in parallel, thus defining depth 2, and again with nodes at depth 2. In the end, the recommendation graph induced by the seed contains nodes at depths 1 and 2 and potentially includes links towards already explored nodes, i.e. at depth 0 (seed), 1 (seed’s direct neighbors) and 2 (seed’s indirect neighbors). There are on average 23.6 nodes at depth 1 (), 325 nodes at depth 2 () and 2830 nodes at depth 3 (). Some elements are shaded simply to indicate that we do not represent all nodes and links on this figure for the sake of clarity.

For each seed video, we now recursively crawl suggestions belonging to the above-mentioned plateau, computed by change-point analysis over 20 requests. We repeat this until reaching a depth of 3, which constitutes the horizon we consider for recommendation graphs. In other words, the graph induced by a seed video contains its direct suggestions and two levels of indirect suggestions, as well as all the links between these nodes. See an illustration in figure 4. Choosing an arbitrary depth of 3 is a trade-off between sampling frequency (to keep a reasonable bandwidth with YouTube servers) and sufficiently deep exploration of the various graphs. Each graph is collected in about 58.2 hours (), which roughly remains within the plateau stability window (this corresponds to the time elapsed after about 350 requests in figure 3(a)). Graphs contain an average of nodes and edges. We crawled plateaus from nodes up to depth 2 (i.e., for around 89.0k videos) and thus visited nodes up to depth 3 (reaching a total of 540k videos).
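The recursive crawl can be sketched as a breadth-first exploration. This is an illustrative outline, not the crawler used for the study: `fetch_plateau` is a hypothetical callable standing in for "issue 20 requests for a video and return its plateau", and `crawl_induced_graph` is a name introduced here.

```python
def crawl_induced_graph(seed, fetch_plateau, max_depth=3):
    """Breadth-first crawl of the recommendation graph induced by `seed`.

    `fetch_plateau(video_id)` is a hypothetical function returning the
    plateau (list of suggested video IDs) of a video. Plateaus are fetched
    for nodes at depths 0 .. max_depth - 1, so nodes at `max_depth` are
    visited but not expanded.
    Returns (depth, edges): the depth at which each node was first seen,
    and the set of directed recommendation links.
    """
    depth = {seed: 0}
    edges = set()
    frontier = [seed]
    for d in range(max_depth):
        next_frontier = []
        for node in frontier:
            for target in fetch_plateau(node):
                # Links back to already explored nodes are kept as edges.
                edges.add((node, target))
                if target not in depth:
                    depth[target] = d + 1
                    next_frontier.append(target)
        frontier = next_frontier
    return depth, edges
```

With a toy dictionary in place of live requests, for example `crawl_induced_graph("s", lambda v: toy.get(v, []))`, the function returns the depth of each reachable video together with the link set, including links pointing back to the seed.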

Graph entropy, diversity and confinement

We are specifically interested in exploring confinement within recommendation graphs. To this end, we devised two metrics. The first one is based on random walks, which play the role of a very simple and abstract model of user navigation (e.g., [25], to describe graph families). Random walks always start from ego (the seed video of the induced recommendation graph), and terminate when they reach a length of . Results were not very sensitive to this constant, unless it is so small that meaningful walks can no longer be captured (). Other plausible random walk strategies include a restart once revisiting a node, or a restart once revisiting ego. Again, we found very similar results under such strategies, so we settled for the simplest one. We measure the diversity of visited nodes by computing the information entropy of the set of frequencies of visits. For one random walk, we refer to this measure as . For each graph we perform random walks – again, a value chosen to be high enough so that the results are stable across runs. The mean random walk entropy () gives us an estimation of the confinement of an idealized user exploring the recommendation graph from ego. The lower the entropy / diversity, the higher the confinement. Another metric that we consider is the number of nodes in a recommendation graph (). This provides a direct measure of the number of video recommendations that can be accessed from ego while not exceeding our maximal depth, independently of the probability of a user reaching a given node. Given that all out-degrees are roughly equal to 20 and maximal depth is 3 for all graphs, this number indeed becomes smaller when the set of targets accessible from the graph exhibits redundancy. To summarize, the first metric measures the propensity for diversity from the perspective of an idealized user following recommendations, and is determined by the topology of the graph. The second metric measures the global potential for diversity of the graph, independently of user behavior, and is simply determined by the size of the set of recommendations up to a certain depth.
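The two metrics can be sketched as follows. This is a minimal illustration under stated assumptions: the walk length and number of walks are left as parameters because their values are not given here, the entropy is computed in bits, ego counts as a first visit, and a walk stops early at a node without outgoing links. The function names are introduced for this example only.

```python
import math
import random
from collections import Counter


def walk_entropy(graph, ego, length, rng):
    """Shannon entropy (in bits) of visit frequencies along one random walk.

    `graph` maps each node to the list of nodes it recommends.
    """
    node = ego
    visits = Counter([ego])
    for _ in range(length):
        neighbors = graph.get(node)
        if not neighbors:
            break  # dead end: unexpanded node at the crawl horizon
        node = rng.choice(neighbors)
        visits[node] += 1
    total = sum(visits.values())
    return -sum((c / total) * math.log2(c / total) for c in visits.values())


def mean_walk_entropy(graph, ego, length, n_walks, seed=0):
    """Average entropy over `n_walks` independent walks starting from ego."""
    rng = random.Random(seed)
    return sum(walk_entropy(graph, ego, length, rng)
               for _ in range(n_walks)) / n_walks


def graph_size(graph):
    """Number of distinct nodes appearing as source or target of a link."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    return len(nodes)
```

A walk trapped in a two-node cycle yields an entropy of 1 bit however long it runs, whereas a walk that spreads evenly over many nodes yields a higher value; `graph_size` ignores walk probabilities entirely, which is what distinguishes the second metric from the first.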

(a) Induced recommendation graphs plotted according to number of nodes in the graph () and mean random walk entropy (). Points are colored according to number of views, on a log scale presented on the right. Solid green lines indicate medians, red dashed line is a linear regression of the distribution. Three points are marked in this latter line: one at each extremity and one in its middle.

(b) Sample graph 1
(c) Sample graph 2
(d) Sample graph 3
Figure 9: Induced recommendation graphs and sample visualizations. The three sample graphs (b), (c) and (d) are the closest ones to the three points indicated in the regression line of plot (a). Nodes (and adjacent edges) are colored according to the category of the video they correspond to.

In figure 9 we show that the two metrics are negatively correlated (). This is somewhat counter-intuitive: it means that the more diverse the mean random walk is, the less diverse the graph is, overall. The dots in the scatter plot are colored according to the number of views of ego on a log scale. The darker the dot, the more views ego has. This helps illustrate another interesting fact: the number of views is positively correlated with mean random walk entropy () and negatively correlated with the number of nodes in the graph (). All of these correlations have a p-value . It appears that, as videos receive views, their overall recommendation graphs contract, becoming significantly smaller in number of nodes, while the diversity of the mean random walk increases. We first provide illustrative visualizations of three sample graphs, corresponding to the closest graphs to the two extremes and the middle point of the regression line. These sample graphs provide a preliminary intuition of how topology changes across the spectrum defined by the correlation line.

Higher random walk entropy thus corresponds to smaller graphs, as well as denser graphs: there is a strong correlation between and , the average degree of the graph (). Graph contraction goes with increased connectivity – in the sense that everywhere is more accessible from everywhere else: even if the number of potentially accessible videos gets smaller (as graph size decreases), the number of actually accessible videos increases (as further exemplified by the very strong correlation between and ). Put differently, graph contraction nevertheless results in more isotropy in a smaller space: graphs with higher entropy lead to more videos being visited on average (higher ) whereas they stem from a smaller potential selection (smaller ).

Furthermore, we could confirm that graphs with higher (i.e., more diverse random walks while having a smaller ) do also qualitatively appear to users to be more confined semantically. To substantiate this empirically, we designed a simple human-based protocol. We produced three sets of 20 seed videos which are closest respectively to each extreme and the middle point, similarly to the above procedure. We recruited six participants: each of them received plateau recommendations for 20 seed videos randomly selected among the 60, without knowing anything about them. We then asked them to tell us, for each seed, whether plateau videos are similar to one another or not, on a scale of 5 stars, from most similar (*****) to least similar (*). We gathered the aggregation of their subjective evaluation of the semantic confinement of plateau videos in figure 10. We see that region 1 videos were perceived as most confined, while region 3 videos were seen as least similar, thus confirming a link between and semantic confinement.

Figure 10: Human evaluation of confinement. Plateau recommendations for seed videos stemming from region 1 (largest entropy ) are generally perceived as most similar (five stars), while the opposite holds for region 3. Region 2 appears as a middle way.

Confinement and seed properties

To expand our empirical exploration of confinement, we consider a number of other metrics. For the seed videos, we consider their age in seconds (), their number of likes () and dislikes (), and the number of subscribers () of the channel that they belong to. For the recommendation graphs, we apply the same random walk strategy to measure confinement or diversity in terms of video authors () and categories (), as provided by YouTube.

Figure 11: Pearson correlations between various recommendation graph metrics. Asterisks indicate significance: *** for , ** for , * for .

In figure 11 we present the correlations found between the above-mentioned metrics as well as the two original diversity metrics ( and ) and the number of views (). It can be observed that all metrics that correspond to explicit user actions (, and ) are highly inter-correlated, and also highly correlated with the number of subscribers () of the channel of the seed video, hinting at an audience effect. To evaluate consensus around a video, we also derive from and a contentment index (), computed as the log of the ratio of the number of likes (plus one, for consistency reasons regarding the log) over the number of dislikes (plus one, to avoid divisions by zero) i.e., . There are generally more likes than dislikes and the opposite happens in about only 0.6% of the cases. Interestingly, this index is at best weakly correlated with explicit actions, thus denoting an intrinsic property of videos. As for the two extra random walk entropy measures, we find that unlike , is positively correlated with (), and that is only very weakly correlated with (). The mild positive correlation between and is already hinted at by the category coloring of the sample graphs in the lower panel in figure 9. As already mentioned, mean number of distinct visited nodes per random walk () and mean degree () are very strongly correlated with . Finally, we see that age shows close to no correlation with any of the metrics, except for a weak correlation with ().
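The contentment index defined above, i.e. the log of (likes + 1) over (dislikes + 1), can be written as a short function. The base of the logarithm is not stated in the text, so base 10 is used here purely as an assumption; the function name `contentment_index` is likewise introduced for illustration.

```python
import math


def contentment_index(likes, dislikes, base=10):
    """Log of the ratio (likes + 1) / (dislikes + 1).

    The +1 terms keep the ratio defined when there are no dislikes and the
    log defined when there are no likes. The base is an assumption.
    """
    return math.log((likes + 1) / (dislikes + 1), base)
```

The index is zero when likes and dislikes are equal, positive when likes dominate, and negative in the rare opposite case.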

A plausible interpretation for the interplay between random walk diversities (especially and ), recommendation graph size () and number of views (), arises from modeling the recommendation engine as a knowledge-discovery process. By viewing a video, the user provides empirical data on the probability of relatedness of the video being watched and all the videos the user has watched before. Of course, there are certainly myriad implementation details on how different signals and pieces of information about the user and the video are taken into account to tweak the recommendation process. Here we are not interested in reverse-engineering a given recommendation engine, but instead in using empirical data to try to uncover more general dynamics from a user’s perspective. This is of particular interest to understand how a generic recommendation engine may mediate the exploration of a given cultural space by human actors. Independently of the details, it appears trivial to assume that users viewing videos also provide a connection between this video and the videos previously seen by them. The observation that the age of a video has almost no correlation with any of the other metrics goes in favor of this interpretation: the dynamics of the system appears to be dominantly driven by the actions of its users.

This standpoint invites us to take the number of nodes in the recommendation graph as an expression of uncertainty. The user is given more choices, but these choices lead to more constrained paths. As a video receives views, and so knowledge about relatedness of this video to other videos in the system increases, recommendations become possibly more focused: smaller in overall number, but more inter-related between themselves, and thus further constraining the user in a general sense, while providing a more diverse navigation path, in terms of distinct video IDs, within this more constrained realm. This interpretation is given further credence by the fact that, even though random walk video diversity increases with , random walk category diversity decreases. In other words, the user is exposed to a higher diversity of unique videos on a less diverse set of topics.

Confinement and transitions

Figure 12: Recommendation transition matrices for all nodes, with respect to topical categories (left), contentment index (middle) and quartiles of numbers of views (right).

We dig further into this notion of topical confinement by focusing on the node level and especially the navigations induced by jumping from one video to another. More precisely, for each node that appears in any crawl, we compute the outgoing transition probabilities for immediate recommendations i.e., we examine dyadic directed links from a node to the members of the plateau found for that node. We distinguish three types of features related to topics, on the one hand, and to explicit user actions, on the other hand; all of which are linked to some intrinsic property of a seed video (semantics, popularity, consensus):

  • topical categories, found in the metadata of the respective videos. We focus on the six top categories in the whole data set (News & Politics, Entertainment, Music, People & Blogs, Science & Technology, Howto & Style). YouTube provides many other possible categories, each of which appears fewer than a dozen times here, so we gathered them as “[Other]”.

  • contentment indices, defined as before as the log of the ratio of likes over dislikes. Since negative values are rare, we gather them into a single category denoted as “negative”. Integer ranges strictly above 4 are also strongly underpopulated (fewer than a dozen occurrences each) and are, again, gathered as “[Other]”.

  • number of views, binned as quartiles whose boundaries are views.

In figure 12, we show the likelihood of jumping from a video with some property to a recommended video of the plateau with some property as transition matrices. Results are aggregated over all nodes appearing in the various seed-centric crawls.
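The computation behind such a transition matrix can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exact procedure used for figure 12: the edge-list input format and the function name `transition_matrix` are hypothetical, and aggregation is assumed to pool all links before row-normalizing (averaging per-node probabilities would be another possible reading of "aggregated").

```python
from collections import Counter, defaultdict

def transition_matrix(edges, label):
    """Row-normalized transition probabilities between node labels.

    edges: iterable of (source, target) pairs, one per directed link from a
           video to a member of its plateau (hypothetical input format).
    label: dict mapping each video ID to a label, e.g. its topical category,
           its contentment bin or its view quartile.
    Assumption: links are pooled over all nodes before normalizing each row.
    """
    counts = defaultdict(Counter)
    for src, dst in edges:
        counts[label[src]][label[dst]] += 1
    matrix = {}
    for origin, row in counts.items():
        total = sum(row.values())
        matrix[origin] = {dest: n / total for dest, n in row.items()}
    return matrix

# Toy data: four videos, two categories.
edges = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "d"), ("d", "c"), ("d", "a")]
category = {"a": "Music", "b": "Music", "c": "Entertainment", "d": "Entertainment"}
matrix = transition_matrix(edges, category)
```

In this toy example two of the three links leaving a "Music" video lead to another "Music" video, so `matrix["Music"]["Music"]` equals 2/3. A large diagonal entry of this kind is what a self-reinforcing category looks like in the matrix.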

For one, it appears that topical categories are also generally topological categories, even though we observe large variations across topics: from “Music”, which is massively self-reinforcing, to “People & Blogs”, which rather redistributes users toward other topics, especially “Entertainment”.

The effect of “contentment” displays a quite different picture. There are few negatively rated videos and contentment typically ranges between 1 and 3. Yet, there is also a tendency to redistribute users toward videos which are more positively rated so, in a sense, the recommendation landscape does not confine users into controversial areas.

Views follow a rather automorphic tendency where, irrespective of the origin quartile, recommended videos generally exhibit the same order of magnitude of views as the origin video. This effect is particularly strong for the most viewed videos. As such, the recommendation landscape does not seem to push viewers of less viewed videos towards the most viewed videos. Similarly, mainstream videos do not appear to forward users towards less viewed videos, which seems likely to induce a reinforcement mechanism in these areas, contrary to the conclusions of [28]. One may suggest that we just observe here the result of an a posteriori redistribution mechanism where videos recommended from the most viewed ones incidentally garner views and end up in the highest quartile as well. This possibility is however invalidated by the computation of these transition matrices restricted to newly appearing videos only i.e., videos that were not part of the plateau when collecting recommendation graphs (see below): these matrices exhibit exactly the same patterns as the ones shown in figure 12.

In other words and to summarize, following mean-field recommendations, users are incited (1) to navigate within the same topical category, especially so for musical and political/news videos, (2) to remain in sets of videos which have rather comparable numbers of views, especially so for mainstream videos, and (3) to go towards more consensual videos, to a lesser extent when videos are moderately consensual.

Evolution of recommendation graphs and origin of novelty

Figure 13: Provenance of new suggestions for seed videos. Left: Distribution of the percentage of novelty: percentage of plateau recommendations which are new at the end of the long crawl (after requests) vs. its beginning. Middle: Distribution of the percentage of such novelty which could already be found deeper in the recommendation graph at the beginning of the crawl. Right: Average, over all seeds, of the provenance of plateau recommendations, with respect to their position in the recommendation graph. Error bars indicate standard deviations.

We previously observed that recommendation sets attached to a seed video slowly evolve with time. New suggestions appear in the plateau over time. We may ask in which direction the introduction of novelty in recommendation sets alters the picture that we have sketched so far and, in particular, where new suggestions come from and what percentage of them stems from inside vs. outside the known recommendation graph. Put differently, is novelty really novel? To check this, we consider as novelty the new plateau suggestions for seed videos appearing at the end of the long crawl i.e., requests after the recommendation graph has been collected. We first notice that percentages vary greatly across seed videos, as shown in the left panel of figure 13: most plateaus nevertheless exhibit at least a third of novel videos, with an average of about 58%. However, many of these novel recommendations can be found not far in the recommendation graph, at depth 2 or 3. In other words, a significant portion of suggestions at comes from inside the known graph at (almost four in five): reinforcement is also at work here, in the sense that new suggestions are either direct or indirect neighbors.
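The provenance check described above can be sketched as follows, under stated assumptions: the recommendation graph is stored as a hypothetical adjacency dictionary, the initial plateau is taken to be the seed's direct neighbors in that graph, and depth is measured by breadth-first search from the seed. The function names `bfs_depths` and `novelty_report` are illustrative only.

```python
from collections import deque

def bfs_depths(graph, seed):
    """Shortest-path depth of every video reachable from the seed."""
    depths = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in depths:
                depths[nxt] = depths[node] + 1
                queue.append(nxt)
    return depths

def novelty_report(graph, seed, new_plateau):
    """Share of novel plateau videos and where they sat in the known graph.

    graph:       adjacency dict of the initially collected recommendation
                 graph (hypothetical format).
    new_plateau: list of plateau videos observed for the seed later on.
    """
    depths = bfs_depths(graph, seed)
    old_plateau = set(graph.get(seed, ()))
    novel = [v for v in new_plateau if v not in old_plateau]
    inside = {v: depths[v] for v in novel if v in depths}
    return {
        "novelty_share": len(novel) / len(new_plateau) if new_plateau else 0.0,
        "inside_share": len(inside) / len(novel) if novel else 0.0,
        "depths": inside,
    }

# Toy graph: seed "s" initially recommends "a" and "b".
graph = {"s": ["a", "b"], "a": ["c"], "b": ["a"], "c": ["d"]}
report = novelty_report(graph, "s", ["a", "c", "d", "x"])
```

In this toy case three of the four later plateau videos are novel, and two of those three ("c" at depth 2, "d" at depth 3) were already present deeper in the known graph, whereas "x" comes from outside it.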

Similarly, we could also verify that transition matrices restricted to novel recommendations are of the same nature as those observed in figure 12: the aggregated matrices look almost indistinguishable from the original matrices (we thus do not show them here).

Concluding remarks

This work focused on recommendation graphs extracted from YouTube. Two types of findings were attained: firstly about the temporal dynamics of the mean-field recommendations provided by the platform for a given seed video, and secondly about the configuration of local recommendation graphs centered around seed videos, especially in regard to confinement and diversity. The former does not aim at reverse-engineering: it is purely instrumental to the purpose of the latter. In this respect, we could exhibit a plateau of highly frequently suggested videos and characterize this phenomenon statistically, both in terms of size and duration. This led to an exploration and retrieval protocol that is computationally feasible and leads to observables (the recommendation graphs) with well-justified and empirically grounded boundaries. Recommendation graphs are, for one, a peculiar sort of network, with a modal degree distribution.

In turn, the analysis of these graphs according to several metrics, notably measures of confinement, led to a better understanding of recommendation dynamics, including its interaction with users. In a nutshell, be it in topological, topical or temporal terms, the landscape of what we call mean-field YouTube recommendations generally exhibits confinement. However, we could also show that this claim must be nuanced in various directions.

  • First, recommendation graphs exhibit a wide range of values of entropies: some graphs are more confined or confining than others. Counter-intuitively, higher entropies (in terms of navigation) are associated with lower diversity (in terms of distinct number of accessible videos). This hints at a dichotomy where some seed videos are at the root of an isotropic navigation (higher entropy) in a more limited space of videos (lower size).

  • Second, we could demonstrate that higher entropies are found for seed videos with a higher number of views. We hypothesized that a higher popularity means that more information could be collected, which plausibly enabled the platform to refine and, in passing, contract the associated recommendation graph. This further hints at a dynamic of increasing confinement driven by user activity.

  • Third, we exhibited the existence of confinement in topical terms (categories are endogenous), temporal terms (seemingly new recommendations are not to be found too far in the recommendation graph), popularity terms (high view videos transition to high view videos, keeping in mind the correlation between the number of views and topological confinement), but not in contentment terms.

Future work should certainly appraise a variety of other modes of recommendation (such as personalized suggestions), other types of behavior (such as organic navigation, whereby users search for videos by themselves) and a mix thereof (such as browsing on subscription-based channels). On the whole, the analysis of the graphs we extracted nonetheless demonstrates the diversity of navigation anisotropy on YouTube in a variety of dimensions. It also suggests that the most confined graphs i.e., potential bubbles, are organized around videos that garner the highest audience and plausibly viewing time. Admittedly, our work could help devise algorithms that make users aware of their possible confinement, in line with [34] and [35]. While our results further indicate that it is difficult to provide a binary answer to the question of confinement on this platform, they appear to nuance the emerging picture in the literature that implicit recommendation has a neutral or even horizon-expanding role.


Acknowledgements

We are grateful to Katharina Tittel for her participation in the Twitter data set perimeter definition, and to Lucie Lamy, Serge Reubi and Ayşe Yuva for their kind contribution to the human-based confinement evaluation step. This paper has been partially realized in the framework of the “Algodiv” grant (ANR-15-CE38-0001) funded by the ANR (French National Agency of Research) and the “Socsemics” Consolidator grant funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 772743).

References

  • [1] Nguyen TT, Hui PM, Harper FM, Terveen L, and Konstan JA. Exploring the filter bubble: the effect of using recommender systems on content diversity. In Proceedings of the 23rd international conference on World wide web, pages 677–686. ACM, 2014.
  • [2] Bakshy E, Messing S, and Adamic LA. Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239):1130–1132, 2015.
  • [3] Aiello LM and Barbieri N. Evolution of ego-networks in social media with link recommendations. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pages 111–120. ACM, 2017.
  • [4] Datta H, Knox G, and Bronnenberg BJ. Changing Their Tune: How Consumers’ Adoption of Online Streaming Affects Music Consumption and Discovery. Marketing Science, 37(1):5–21, 2018.
  • [5] Roth C. Algorithmic Distortion of Informational Landscapes. Intellectica, 70(1):97–118, 2019.
  • [6] Haim M, Graefe A, and Brosius HB. Burst of the Filter Bubble? Effects of personalization on the diversity of Google News. Digital Journalism, 6(3):330–343, 2018.
  • [7] Bessi A, Zollo F, Del Vicario M, Puliga M, Scala A, Caldarelli G, Uzzi B, and Quattrociocchi W. Users Polarization on Facebook and Youtube. PLoS ONE, 11(8):e0159641, 2016.
  • [8] Möller J, Trilling D, Helberger N, and van Es B. Do not blame it on the algorithm: an empirical assessment of multiple recommender systems and their impact on content diversity. Information, Communication & Society, 21(7):959–977, 2018.
  • [9] Thurman N and Schifferes S. The Future of Personalization at News Websites. Journalism Studies, 13(5–6):775–790, 2012.
  • [10] Zuiderveen Borgesius FJ, Trilling D, Möller J, Bodó B, de Vreese CH, and Helberger N. Should we worry about filter bubbles? Internet Policy Review, 5(1), 2016.
  • [11] Dylko I, Dolgov I, Hoffman W, Eckhart N, Molina M, and Aaziz O. The dark side of technology: An experimental investigation of the influence of customizability technology on online political selective exposure. Computers in Human Behavior, 73:181 – 190, 2017.
  • [12] Munson SA and Resnick P. Presenting diverse political opinions: how and how much. In Proc. SIGCHI Conf. on human factors in computing systems, pages 1457–1466. ACM, 2010.
  • [13] Chen J, Nairn R, and Chi EH. Speak Little and Well: Recommending Conversations in Online Social Systems. In Proc CHI’11 Vancouver, BC, Canada, pages 217–226. 2011.
  • [14] Rader E and Gray R. Understanding User Beliefs About Algorithmic Curation in the Facebook News Feed. In Proc. ACM CHI’15, pages 173–182. 2015.
  • [15] Salganik MJ, Dodds PS, and Watts DJ. Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market. Science, 311:854–856, 2006.
  • [16] Steck H. Item Popularity and Recommendation Accuracy. In Proc. RecSys’11, Oct 23-27, 2011, Chicago, IL, pages 125–132. 2011.
  • [17] Messing S and Westwood SJ. Selective exposure in the age of social media: Endorsements trump partisan source affiliation when selecting news online. Communication Research, 41(8):1042–1063, 2014.
  • [18] Munson SA, Lee SY, and Resnick P. Encouraging reading of diverse political viewpoints with a browser widget. In Proc. ICWSM 7th AAAI Intl. Conf. Weblogs and Social Media, pages 419–428. AAAI press, 2013.
  • [19] Nicas J. How YouTube Drives People to the Internet’s Darkest Corners. https://www.wsj.com/articles/how-youtube-drives-viewers-to-the-internets-darkest-corners-1518020478, February 2018.
  • [20] Tufekci Z. YouTube, the Great Radicalizer. https://www.nytimes.com/2018/03/10/opinion/sunday/youtube-politics-radical.html, March 2018.
  • [21] Conover MD, Ratkiewicz J, Francisco M, Gonçalves B, Menczer F, and Flammini A. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media. 2011.
  • [22] Barberá P, Jost JT, Nagler J, Tucker JA, and Bonneau R. Tweeting from left to right: Is online political communication more than an echo chamber? Psychological science, 26(10):1531–1542, 2015.
  • [23] Jacobson S, Myung E, and Johnson SL. Open media or echo chamber: the use of links in audience discussions on the Facebook Pages of partisan news organizations. Information, Communication & Society, 19(7):875–891, 2016.
  • [24] Vicario MD, Zollo F, Caldarelli G, Scala A, and Quattrociocchi W. Mapping social dynamics on Facebook: The Brexit debate. Social Networks, 50:6 – 16, 2017.
  • [25] Garimella K, Morales GDF, Gionis A, and Mathioudakis M. Quantifying controversy on social media. ACM Transactions on Social Computing, 1(1):3, 2018.
  • [26] Davidson J, Liebald B, Liu J, Nandy P, Van Vleet T, Gargi U, Gupta S, He Y, Lambert M, Livingston B, et al. The YouTube video recommendation system. In Proceedings of the fourth ACM conference on Recommender systems, pages 293–296. ACM, 2010.
  • [27] Covington P, Adams J, and Sargin E. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM conference on recommender systems, pages 191–198. ACM, 2016.
  • [28] Zhou R, Khemmarat S, and Gao L. The impact of YouTube recommendation system on video views. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, pages 404–410. ACM, 2010.
  • [29] Park M, Naaman M, and Berger J. A data-driven study of view duration on youtube. In Tenth International AAAI Conference on Web and Social Media. 2016.
  • [30] Zhou R, Khemmarat S, Gao L, Wan J, and Zhang J. How YouTube videos are discovered and its impact on video views. Multimedia Tools and Applications, 75(10):6035–6058, 2016.
  • [31] Cheng X, Dale C, and Liu J. Statistics and social network of youtube videos. In 2008 16th International Workshop on Quality of Service, pages 229–238. IEEE, 2008.
  • [32] Airoldi M, Beraldo D, and Gandini A. Follow the algorithm: An exploratory investigation of music on YouTube. Poetics, 57(August):1–13, 2016.
  • [33] Wikipedia. List of most-subscribed YouTube channels. https://en.wikipedia.org/wiki/List_of_most-subscribed_YouTube_channels, 2019. [Online; accessed 15-May-2019].
  • [34] Resnick P, Garrett RK, Kriplean T, Munson SA, and Stroud NJ. Bursting your (filter) Bubble: Strategies for Promoting Diverse Exposure. In CSCW ’13 Companion, Feb. 23–27, 2013, San Antonio, Texas, USA, pages 95–100. 2013.
  • [35] Ekstrand MD, Kluver D, Harper FM, and Konstan JA. Letting Users Choose Recommender Algorithms: An Experimental Study. In Proc. ACM RecSys’15 Ninth ACM Conf. on Recommender Systems, pages 11–18. 2015.