Understanding the Incel Community on YouTube

01/22/2020 ∙ by Kostantinos Papadamou, et al. ∙ Boston University, Binghamton University, UCL

YouTube is by far the largest host of user-generated video content worldwide. Alas, the platform also hosts inappropriate, toxic, and/or hateful content. One community that has come into the spotlight for sharing and publishing hateful content is the so-called Involuntary Celibates (Incels), a loosely defined movement ostensibly focusing on men's issues, which has often been linked to misogynistic views. In this paper, we set out to analyze the Incel community on YouTube. We collect videos shared on Incel-related communities within Reddit, and perform a data-driven characterization of the content posted on YouTube along several axes. Among other things, we find that the Incel community on YouTube is growing rapidly, that they post a substantial number of negative comments, and that they discuss a broad range of topics ranging from ideology, e.g., around the Men Going Their Own Way movement, to discussions filled with racism and/or misogyny. Finally, we quantify the probability that a user will encounter an Incel-related video by virtue of YouTube's recommendation algorithm. Within five hops when starting from a non-Incel-related video, this probability is 1 in 5, which is alarmingly high given the toxicity of said content.

1 Introduction

While YouTube has revolutionized the way people discover and consume video content online, it has also enabled the spread of inappropriate and hateful content.

One fringe community that has been particularly active on YouTube is the so-called Involuntary Celibates, or Incels [33]. While not particularly structured, Incel ideology revolves around the idea of the “blackpill,” a bitter and painful truth about society, which roughly postulates that life trajectories are determined by how attractive one is. For example, Incels often deride the so-called rise of “lookism,” whereby traits that are largely out of personal control, like facial structure, are more “valuable” than those that are under our own control, like fitness level. Taken to the extreme, these beliefs can lead to a dystopian outlook on society, where the only solution is a radical, potentially violent shift towards traditionalism, especially in terms of the role of women in society [7].

To the best of our knowledge, Incels are the most extreme subgroup of the Manosphere [3], a larger collection of movements driven by what is generally characterized as misogynistic beliefs [16]. In fact, Incel ideology has been linked to multiple mass murders and violent offenses. In May 2014, Elliot Rodger killed six people (and himself) in Isla Vista, CA. This incident was a harbinger of things to come. Before the attack, Rodger uploaded a video on YouTube with his “manifesto,” announcing that he planned to commit mass murder, motivated by what is now generally understood to be Incel ideology [48]. He served as an apparent “mentor” to another mass murderer who shot nine people at Umpqua Community College in Oregon the following year [44]. In 2018, another mass murderer killed nine people in Toronto, and during his interview with the police claimed that he had been radicalized online by the Incel ideology [6]. Thus, while the concepts underpinning Incel ideology may seem “just” absurd, they also have serious real-world impact.

This paper explores the footprint of the Incel community on YouTube. More precisely, we set out to answer the following research questions:

  1. How has the Incel community grown on YouTube over the last decade?

  2. What can we learn by studying the use of language by the Incel community on YouTube? In particular, what are the main themes involved in the discussions below Incel-related videos?

  3. Does YouTube’s recommendation algorithm contribute to steering users towards Incel communities?

To answer these questions, we collect a set of 18K YouTube videos shared on Incel-related subreddits (e.g., /r/incels, /r/braincels, etc.). We then build a lexicon of 200 Incel-related terms via manual annotation, using expressions found on the Incel Wiki. We use the lexicon to label videos as Incel-related, based on the appearance of terms in the title, tags, description, and comments of the videos. We also collect a set of 18K random videos and use it as a control dataset, to capture more general trends on YouTube. Next, we use several tools, including graph analysis, word embeddings, and sentiment analysis, to understand the use of language by the Incel community and investigate whether YouTube’s recommendation algorithm contributes to steering users towards Incel content.

Overall, our study leads to several interesting findings:

  • We find an increase in Incel-related activity on YouTube over the past few years and in particular in the publication of Incel-related videos, as well as in comments that include related terms. This indicates that Incels are increasingly exploiting the YouTube platform to spread their views.

  • Using word2vec, along with graph visualization techniques, we find several interesting themes in Incel-related videos, with topics related to the Manosphere (e.g., the Men Going Their Own Way movement [31]) or expressing racism, misogyny, and anti-feminism.

  • Sentiment analysis reveals that Incel-related videos attract a substantially larger number of negative comments compared to other videos.

  • By performing random walks on YouTube’s recommendation graph, we find that, with a probability of roughly 1 in 5, a user will encounter an Incel-related video within five hops if they start from a non-Incel-related video posted to Incel-related communities on Reddit. At the same time, we find that Incel-related videos are more likely to be recommended within the first two hops than in subsequent hops. By comparing to the control dataset, we find that a user casually browsing YouTube is unlikely to end up in a region of the YouTube recommendation graph that consists largely of Incel-related videos.

2 Background & Related Work

We now review Incel ideology as well as related work.

The Manosphere. Incels are a part of the broader “Manosphere,” a loose collection of groups revolving around a common shared interest in men’s rights in society. While we focus on Incels, it is worth understanding the Manosphere to get a broader picture. Although the Manosphere had roots in academic-style feminism [36, 11], it is ultimately a reactionary community, with its ideology evolving and spreading mostly on the Web. Blais et al. [4] analyze this belief system from a sociology perspective and refer to it as masculinism. They conclude that masculinism is: “a trend within the anti-feminist counter-movement mobilized not only against the feminist movement, but also for the defense of a non-egalitarian social and political system, that is, patriarchy.” Coston et al. [8] argue, with respect to the growth of feminism: “If women were imprisoned in the home […] then men were exiled from the home, turned into soulless robotic workers, in harness to a masculine mystique, so that their only capacity for nurturing was through their wallets.” Subgroups within the Manosphere differ quite a bit. For instance, Men Going Their Own Way (MGTOWs) are hyper-focused on a particular set of men’s rights, often in the context of a bad relationship with a woman.

Overall, research studying the Manosphere has been mostly theoretical and/or qualitative in nature [21, 16, 31, 18]. This body of work is particularly important because it provides guidance for our study in terms of framework and conceptualization, while also motivating large-scale quantitative work like ours.

Incels. Incels are, to the best of our knowledge, the most extreme subgroup of the Manosphere [3]. Incel ideology differs from that of the other subgroups in the significance it places on the “involuntary” aspect of celibacy. Incels believe that society is rigged against them in terms of sexual activity, and that there is no personal solution to systemic dating problems for men [23, 41]. Further, Incel ideology differs from, for example, MGTOW ideology in the distinction between voluntary and involuntary celibacy: MGTOWs choose not to partake in sexual activities, whereas Incels believe that society adversarially deprives them of sexual activity. This difference is crucial, as it gives rise to some of their more violent tendencies [16].

Incels believe they are doomed from birth to suffer in a modern society where women are not only able, but encouraged, to focus on superficial aspects of potential mates, e.g., facial structure or racial attributes. Some of the earliest studies of “involuntary celibacy” note that celibates tend to be more introverted, and that, unlike women, celibate men in their 30s tend to be lower class or even unemployed [29]. In this distorted view of reality, men with these desirable attributes (colloquially known as Chads) are placed at the top of society’s hierarchy. While a perusal of powerful people in the world would perhaps lend credence to the idea that “handsome” white men are indeed at the top, Incel ideology takes it to the extreme.

Incels rarely hesitate to call for violence. This is often expressed in the form of self-harm, for example, “roping” (hanging oneself); however, it also extends to calls for outright gendercide. Zimmerman et al. [52] associate Incel ideology with white supremacy, highlighting how Incel ideologies should be taken as seriously as other forms of violent extremism.

Incels and the Web. Massanari [34] performs a qualitative study of how Reddit’s algorithms, policies, and general community structure enable, and even support, toxic culture. She focuses on the #GamerGate and Fappening incidents, both of which primarily victimized women, and argues that specific design decisions make matters even worse for victims. For instance, the default ordering of posts on Reddit favors mobs of users promoting content over a smaller set of victims attempting to have it removed. She notes that these issues are exacerbated in the context of online misogyny because many of the perpetrators are extremely techno-literate, and thus able to exploit more advanced features of social media platforms.

Farrell et al. [10] perform the largest quantitative study of misogynistic language across the Manosphere on Reddit. They create nine lexicons of misogynistic terms, which they use to investigate how misogynistic language is used in 6M posts from Manosphere-related subreddits. Jaki et al. [27] study misogyny on the Incels.me forum, analyzing the language of users and detecting instances of misogyny, homophobia, and racism using a deep learning classifier that achieves up to 95% accuracy.

Harmful Activity on YouTube. YouTube’s role in harmful activity has been studied mostly in the context of detection. Agarwal et al. [1] present a binary classifier trained with user and video features to detect videos promoting hate and extremism on YouTube, while Giannakopoulos et al. [15] develop a k-nearest neighbor classifier trained with video, audio, and textual features to detect violence in YouTube videos. Jiang et al. [28] investigate how channel partisanship and video misinformation affect comment moderation on YouTube, finding that comments are more likely to be moderated if the video channel is ideologically extreme. Sureka et al. [45] use data mining and social network analysis techniques to discover hateful YouTube videos, while Ottoni et al. [38] analyze user comments and video contents on alt-right channels. Zannettou et al. [49] present a deep learning classifier for detecting videos that use manipulative techniques to increase their views (i.e., clickbait videos). Papadamou et al. [39] and Tahir et al. [46] focus on detecting inappropriate videos targeting children on YouTube, while Mariconti et al. [32] build a classifier to predict, at upload time, whether or not a YouTube video will be “raided” by hateful users.

YouTube Recommendations. Covington et al. [9] provide a description of YouTube’s recommendation algorithm, focusing on two models: 1) a deep candidate generation model used to retrieve a small subset of videos from a large corpus; and 2) a deep ranking model used to rank those videos based on their relevance to the user’s activity. Zhao et al. [51] introduce a large-scale ranking system for YouTube recommendations that extends the Wide & Deep model architecture with Multi-gate Mixture-of-experts for multitask learning. The proposed model ranks the candidate recommendations taking into account user engagement and satisfaction metrics. Ribeiro et al. [42] perform a large-scale audit of user radicalization on YouTube: they analyze videos from Intellectual Dark Web, Alt-lite, and Alt-right channels, showing that they increasingly share the same user base. They also analyze YouTube’s recommendation algorithm finding that Alt-right channels can be reached from both Intellectual Dark Web and Alt-lite channels.

3 Dataset

In this section, we present our video collection and annotation process.

3.1 Data Collection

Aiming to collect Incel-related videos on YouTube, we look for YouTube links on Reddit, because of extensive anecdotal evidence suggesting that Incels are particularly active on the platform. We start by building a set of subreddits that can be confidently considered related to Incels. To do so, we inspect posts on the Incel Wiki [25] looking for references to subreddits, and compile a list of Incel-related subreddits (https://tinyurl.com/list-of-incel-subreddits). This list also includes a set of communities broadly relevant to Incel ideology (including possibly anti-Incel ones, like /r/IncelTears) to capture a broader set of relevant videos.

Subreddit #Users #Posts #Videos Min. Date Max. Date
ForeverAlone 86,670 1,921,363 6,761 2010-09 2019-05
Braincels 51,443 2,830,522 6,250 2017-10 2019-05
IncelTears 93,684 1,477,204 2,984 2017-05 2019-05
Incels 39,130 1,191,797 2,344 2014-01 2017-11
IncelsWithoutHate 7,141 163,820 550 2017-04 2019-05
ForeverAloneDating 27,460 153,039 465 2011-03 2019-05
askanincel 1,700 39,799 90 2018-11 2019-05
BlackPillScience 1,363 9,048 41 2018-03 2019-05
ForeverUnwanted 1,136 24,855 40 2016-02 2018-04
Incelselfies 7,057 60,988 32 2018-07 2019-05
Truecels 714 6,121 22 2015-12 2016-06
MaleForeverAlone 831 6,306 11 2017-12 2018-06
foreveraloneteens 450 2,077 9 2011-11 2019-04
gymcels 296 1,430 6 2018-03 2019-04
SupportCel 474 6,095 3 2017-10 2019-01
IncelDense 388 2,058 3 2018-06 2019-04
Truefemcels 95 311 2 2018-09 2019-04
gaycel 43 117 1 2014-02 2018-10
Foreveralonelondon 19 57 0 2013-01 2019-01
Table 1: Dataset overview.

We collect all submissions and comments made between June 1, 2005 and May 1, 2019 on the 19 Incel-related subreddits using the Reddit monthly dumps from Pushshift [2]. We parse them to gather links to YouTube videos, extracting 5M posts including 18K unique links to YouTube videos. Next, we collect the metadata of each YouTube video using the YouTube Data API [17]. Specifically, we collect: 1) title and description; 2) tags; 3) video statistics like the number of views, likes, etc.; and 4) the top comments, up to 1K, and their replies. Throughout the rest of this paper we refer to this set of videos, which is derived from Incel-related subreddits, as the “Incel-derived” videos.
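To make the collection pipeline concrete, the sketch below shows one way to extract YouTube video IDs from a Pushshift monthly dump and fetch their metadata through the YouTube Data API. It is a minimal illustration, not the exact crawler used in the paper: the dump path, the API key placeholder, and the choice of JSON fields are assumptions (real dumps mix submissions, which carry “url”/“selftext” fields, and comments, which carry “body”).

```python
import json
import re
import requests

# Assumptions for illustration: a decompressed Pushshift monthly dump
# (one JSON object per line) and a YouTube Data API key.
DUMP_PATH = "RC_2019-04.json"
API_KEY = "YOUR_YOUTUBE_API_KEY"

# Capture the 11-character video ID from common YouTube URL forms.
YT_ID = re.compile(r"(?:youtube\.com/watch\?v=|youtu\.be/)([\w-]{11})")

def extract_video_ids(dump_path):
    """Collect unique YouTube video IDs linked in Reddit submissions/comments."""
    ids = set()
    with open(dump_path, encoding="utf-8") as fh:
        for line in fh:
            obj = json.loads(line)
            # Comments carry text in "body"; submissions in "selftext"/"url".
            text = " ".join(str(obj.get(k, "")) for k in ("body", "selftext", "url"))
            ids.update(YT_ID.findall(text))
    return ids

def fetch_metadata(video_ids):
    """Fetch title, description, tags, and statistics for each video."""
    url = "https://www.googleapis.com/youtube/v3/videos"
    for vid in video_ids:
        resp = requests.get(url, params={"part": "snippet,statistics",
                                         "id": vid, "key": API_KEY})
        yield vid, resp.json()
```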

Table 1 reports the total number of users, posts, linked YouTube videos, and the period of available information for each subreddit. Although recently created, /r/Braincels has the largest number of posts and the second largest number of YouTube videos. Also, even though it was banned in November 2017 for inciting violence against women [19], /r/Incels is fourth in terms of YouTube videos shared. Finally, note that most of the subreddits in our sample were created between 2015 and 2018, which indicates an increase in the popularity of the Incel community.

Ethics. Note that we only collect publicly available data, make no attempt to de-anonymize users, and overall follow standard ethical guidelines [43]. In addition, our data collection fully complies with the terms of use of the APIs we employ.

3.2 Video Annotation

To determine how relevant the videos collected from Incel-related subreddits are to Incel ideology, we use a lexicon of terms that are routinely used by members of the Incel community. We first crawl the “glossary” available on the Incel Wiki [24], gathering 395 terms. Since several terms can also be regarded as general purpose (e.g., fuel, hole, legit), three researchers with domain expertise manually checked them to determine whether or not they are specific to the Incel community. Annotators were told to consider a term relevant only if it expresses hate or misogyny, or if it is directly associated with Incel ideology (e.g., “Beta male”) or any Incel-related incident (e.g., “supreme gentleman,” an indirect reference to the Isla Vista killer Elliot Rodger [48]). General-purpose terms were not considered relevant (e.g., “legit”).

#Comments with Incel-related Terms in a Video | %Incel-related Comments | %Incel-related Videos
1 | – | –
2 | – | –
3 | – | –
4 | – | –
5 | – | –
Table 2: Percentage of Incel-related comments and videos in random samples of videos that contain exactly 1, 2, 3, 4, or 5 or more comments with Incel-related terms.

We also compute the Fleiss agreement score [13] across the annotators, obtaining a value that is considered “substantial” agreement [30]. We then create our lexicon by only considering the terms annotated as relevant, based on the majority agreement of the annotators, which yields a lexicon of 200 Incel-related terms (https://tinyurl.com/incel-related-terms-lexicon).
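As an aside, the agreement score can be reproduced with standard tooling; the snippet below is a minimal sketch using statsmodels, with made-up annotations standing in for the three annotators' actual labels.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical annotations: rows are glossary terms, columns are the three
# annotators; 1 = "Incel-related", 0 = "general purpose".
annotations = np.array([
    [1, 1, 1],   # e.g., "blackpill"
    [0, 0, 1],   # e.g., "legit"
    [1, 1, 0],   # e.g., "fuel"
    [1, 1, 1],
])

# aggregate_raters() turns per-rater labels into per-category counts,
# which is the input format fleiss_kappa() expects.
counts, _ = aggregate_raters(annotations)
kappa = fleiss_kappa(counts)
print(f"Fleiss' kappa = {kappa:.2f}")
```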

Next, we use the lexicon to label all the videos in our dataset. We look for these terms in the title, description, tags, and comments of the videos in our dataset. After inspecting the results, we find that most matches come from the comments. Hence, we decide to use the comments to determine whether a video is Incel-related or not. To select the lower bound of Incel-related comments that a video should contain to be labeled as “Incel-related,” we devise the following methodology:

  1. We consider a comment as possibly Incel-related if it contains at least one Incel-related term;

  2. We create sets of videos based on the number of Incel-related comments they contain – namely, exactly 1, 2, 3, 4, or 5 – and randomly sample 50 videos from each set;

  3. The resulting 250 randomly sampled videos are manually checked, along with all their comments that contain Incel-related terms, by the first author of this paper to determine if the video itself contains content directly related to Incel ideology (e.g., it is produced by or it is a news story about Incels) or if the comments are likely posted by Incels (e.g., they express hate against women or physically attractive men).

Subreddit #Incel-related Videos #Other Videos
ForeverAlone 358 6,403
Braincels 943 5,307
IncelTears 369 2,615
Incels 314 2,030
IncelsWithoutHate 88 462
ForeverAloneDating 9 456
askanincel 14 76
BlackPillScience 15 26
ForeverUnwanted 10 30
Incelselfies 11 21
Truecels 5 17
MaleForeverAlone 2 9
foreveraloneteens 1 8
gymcels 5 1
IncelDense 0 3
SupportCel 1 2
Truefemcels 0 2
gaycel 0 1
Foreveralonelondon 0 0
Total (Unique) 1,773 16,521
Table 3: Labeled dataset overview.

Table 2 shows the results of this manual annotation process. For videos with fewer than five Incel-related comments, we find many ambiguous examples of comments that are relevant to Incels but probably not posted by Incel users – e.g., “sounds like Alpha Dog,” “don’t ever stop crying because once you stop feeling it’s over,” “Alex Jones: Alpha Male Confirmed,” etc. We also observe that, as the number of Incel-related comments per video increases, so does the percentage of comments that are relevant to Incels and likely posted by them. We select five or more comments as the threshold, as it yields an acceptable percentage of comments probably posted by Incels. Taking this threshold into account, we consider a video “Incel-related” if it contains at least five Incel-related comments; otherwise we consider it as “Other.” Note that labeling a video as Incel-related does not necessarily mean that the video itself is related to Incel ideology; it may be a benign video that is heavily commented on by Incels.
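The labeling rule boils down to a simple count over a video's comments. The sketch below illustrates it with a toy lexicon; the real lexicon contains the 200 manually validated terms, and the naive substring matching shown here is an assumption about how term matching could be implemented, not the paper's exact procedure.

```python
# Toy lexicon standing in for the 200 validated Incel-related terms (assumption).
LEXICON = {"blackpill", "chad", "beta male", "roping"}
THRESHOLD = 5  # minimum number of Incel-related comments for the label

def is_incel_related_comment(comment, lexicon=LEXICON):
    """A comment is possibly Incel-related if it contains at least one lexicon term.

    Naive substring matching; word-boundary matching would reduce false positives.
    """
    text = comment.lower()
    return any(term in text for term in lexicon)

def label_video(comments):
    """Label a video "Incel-related" if >= THRESHOLD of its comments match."""
    n_matches = sum(is_incel_related_comment(c) for c in comments)
    return "Incel-related" if n_matches >= THRESHOLD else "Other"
```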

Table 3 reports the number of videos in our labeled dataset, which includes a total of 1,773 Incel-related and 16,521 other videos. (Note that the per-subreddit counts do not sum to these totals because some videos are posted in more than one subreddit.) Interestingly, we observe that /r/ForeverAlone and /r/Braincels are the subreddits with the most YouTube videos (6,761 and 6,250, respectively), while /r/Braincels has the most Incel-related videos (943). Also, although on /r/ForeverAlone we find substantially more videos than on /r/IncelTears and /r/Incels, the three subreddits have roughly the same number of Incel-related videos.

3.3 Control set

Additionally, we collect a dataset of random videos and use it as a control, to capture more general trends on YouTube. To collect Control videos, we parse all submissions and comments made on Reddit between June 1, 2005 and May 1, 2019 using the Reddit monthly dumps from Pushshift, and gather all links to YouTube videos. From these, we randomly select 18,294 links for which we collect metadata using the YouTube Data API. Following a similar approach as with the Incel-derived videos, we label the Control videos, finding a total of 477 Incel-related and 17,817 other videos.

Figure 1: Temporal evolution of the number of YouTube videos shared on each subreddit.
Figure 2: Cumulative percentage of videos published for both Incel-derived and Control videos.
(a)
(b)
Figure 3: Temporal evolution of the number of comments (a) and unique commenting users (b).
Figure 4: Self-similarity of commenting users in adjacent years for both Incel-derived and Control videos.

4 Temporal Analysis

In this section, we present a temporal analysis of our dataset.

4.1 Videos

We start by studying the “evolution” of the Incel communities in terms of the number of videos they share. First, we look at the frequency with which YouTube videos are shared on the various Incel-related subreddits over the years; see Fig. 1. The drop in 2019 is due to our dataset covering activity only up to April 2019. We observe that, after June 2016, linking to YouTube videos becomes more frequent, even more so in 2018, and in particular on /r/Braincels. This likely indicates that the use of YouTube to spread Incel ideology is increasing.

In Figure 2, we plot the cumulative percentage of videos published for both Incel-derived and Control videos. While the increase in the number of other videos remains relatively constant over the years for both sets, this is not the case for Incel-related videos, as and of them in the Incel-derived and Control sets, respectively, were published after 2014. Overall, we observe a steady increase in Incel activity, especially after 2018; this is particularly worrisome, as there are several examples of users who were radicalized online and have gone on to undertake deadly attacks, e.g., the Toronto shooter [20].

4.2 Comments

Next, we study the commenting activity on both Reddit and YouTube. In Figure 3(a), we plot the number of comments posted over the years for both YouTube Incel-derived and Control videos, and for Reddit. Activity on both platforms starts to markedly increase after September 2015, with Reddit and YouTube Incel-derived videos having many more comments than the Control videos. Once again, the sharp increase in commenting activity over the last few years confirms the increased popularity of the Incel Web communities and, likely, their user base.

To further analyze this trend, we also look at the number of unique commenting users on both platforms; see Figure 3(b). On Reddit, we observe that the number of unique users remains relatively stable until the end of 2015, after which it increases substantially, from 23K to 204K in Q1 2019. This is mainly because the majority of the subreddits in our dataset () were created after the end of 2015. On YouTube, for the Incel-derived videos we once again observe a substantial increase, from 126K in Q4 2015 to 773K in Q1 2019. An increase is also observed for the unique commenting users of the Control videos (from 133K in Q4 2015 to 415K in Q1 2019); however, it is not as sharp as that of the Incel-derived videos ( vs increase in unique commenting users after 2015 in the Control and Incel-derived videos, respectively).

To verify whether the sharp increase in unique commenting users of Incel-derived videos is due to increased interest by random users or to the Incel community maintaining/strengthening its user base, we use the Overlap Coefficient [47] similarity metric:

overlap(X, Y) = |X ∩ Y| / min(|X|, |Y|)

to measure user retention over the years for the videos in our dataset. More precisely, we calculate the similarity of commenting users with those doing so the year before, for both Incel-related and other videos in the Incel-derived and Control sets. The results of this calculation are shown in Figure 4. Interestingly, for the Incel-derived set we find a sharp growth in user retention on Incel-related videos after 2014, while this is not the case for the Control. We believe that this might be related to the increased popularity of the Incel communities. Last, we believe that the higher user retention of other videos in both sets is due to the much higher proportion of other videos in each set.
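For concreteness, a minimal sketch of the retention computation follows; the per-year user sets are made-up placeholders for the actual sets of commenting users.

```python
def overlap_coefficient(x, y):
    """Overlap coefficient between two sets: |X ∩ Y| / min(|X|, |Y|)."""
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))

# Hypothetical mapping: year -> set of user IDs who commented that year.
users_per_year = {
    2014: {"u1", "u2", "u3"},
    2015: {"u2", "u3", "u4", "u5"},
    2016: {"u3", "u5", "u6"},
}

# User retention: similarity between each year's commenters and the previous year's.
retention = {
    year: overlap_coefficient(users_per_year[year], users_per_year[year - 1])
    for year in sorted(users_per_year)
    if year - 1 in users_per_year
}
print(retention)
```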

Video Category Incel-derived (Incel-related) Incel-derived (Other) Control (Incel-related) Control (Other)
People & Blogs 408 (23.0%) 2,188 (13.2%) 34 (7.1%) 1,797 (10.1%)
Entertainment 285 (16.1%) 2,155 (13.0%) 56 (11.7%) 2,056 (11.5%)
Music 210 (11.8%) 6,090 (36.9%) 79 (16.6%) 4,521 (25.4%)
Comedy 199 (11.2%) 1,888 (11.4%) 51 (10.7%) 1,489 (8.4%)
News & Politics 171 (9.6%) 551 (3.3%) 27 (5.7%) 711 (4.0%)
Education 160 (9.0%) 674 (4.1%) 26 (5.5%) 628 (3.5%)
Howto & Style 88 (5.0%) 256 (1.5%) 3 (0.6%) 405 (2.3%)
Film & Animation 71 (4.0%) 1,229 (7.4%) 23 (4.8%) 1,148 (6.4%)
Gaming 62 (3.5%) 594 (3.6%) 125 (26.2%) 2,649 (14.9%)
Sports 33 (1.9%) 230 (1.4%) 22 (4.6%) 1,088 (6.1%)
Science & Technology 33 (1.9%) 242 (1.5%) 20 (4.2%) 615 (3.5%)
Nonprofits & Activism 27 (1.5%) 162 (1.0%) 6 (1.3%) 162 (0.9%)
Pets & Animals 18 (1.0%) 123 (0.7%) 2 (0.4%) 160 (0.9%)
Autos & Vehicles 4 (0.2%) 48 (0.3%) 2 (0.4%) 244 (1.4%)
Travel & Events 4 (0.2%) 76 (0.5%) 1 (0.2%) 131 (0.7%)
Trailers 0 (0.0%) 11 (0.1%) 0 (0.0%) 10 (0.1%)
Shows 0 (0.0%) 1 (0.0%) 0 (0.0%) 0 (0.0%)
Movies 0 (0.0%) 3 (0.0%) 0 (0.0%) 3 (0.0%)
Table 4: Categories of videos in the Incel-derived and Control sets.

5 Video Analysis

In this section, we present a multi-faceted analysis of Incel-related videos.

Categories. Table 4 reports the categories of the videos in our dataset for both the Incel-derived and Control sets. Interestingly, most of the Incel-related videos among the Incel-derived ones fall in the People & Blogs (23.0%), Entertainment (16.1%), and Music (11.8%) categories, while among the Control videos most of the Incel-related ones fall under the Gaming category (26.2%). We manually inspect a sample of 20 Incel-related videos from the Incel-derived set in People & Blogs, finding that for 19 of them the commenters are indeed men discussing Incel ideology. We do the same for the music videos, finding that: 1) 7 are Incel-heavy videos deliberately (mis)labeled as music videos by their uploaders, as shown in Figure 6; 2) 10 are random songs; and 3) the remaining 3 are video clips of popular songs featuring women with millions of views.

(a)
(b)
Figure 5: Proportion of Incel-related vs other videos for the top 15 stems found in the titles of Incel-derived (a) and Control (b) videos.

For the Incel-derived set, we also observe a non-negligible number of both Incel-related and other videos in the News & Politics category (9.6% and 3.3%, respectively). Since these videos have been shared in Incel-related Web communities, and given anecdotal evidence suggesting that the Incel community is influenced by the alt-right [14], we count how many videos have been posted by alt-right YouTube channels, using a list obtained from the authors of [42]. Alarmingly, we find that and , respectively, of the Incel-related and other videos in this category are indeed posted by alt-right YouTube channels.

Finally, we notice a non-negligible percentage of Incel-related videos among the Incel-derived ones belonging to seemingly innocent categories like Education (9.0%) and Science & Technology (1.9%). After inspecting a random sample of 20 videos from each category, we find that these are indeed related to Incel ideology (e.g., videos discussing what women like in men) and have been deliberately mislabeled.

Title. We study the titles of the videos in our dataset to identify the most common terms used by the uploaders of those videos. We split each title into words and perform stemming using the Porter Stemmer algorithm [40]. Figure 5 reports the top 15 stems appearing in the titles of the Incel-derived and Control videos. Note that a substantial portion of the Incel-derived videos with stems referring to women in their title are Incel-related. For example, and of the Incel-derived videos with, respectively, “women” and “girl” in the title are Incel-related. Other examples include “guy” (), “life” (), “man” (), and “love” (). On the other hand, the most common terms in the titles of Incel-related Control videos are “trailer” () and “world” (). This suggests that most of the videos that are shared and commented on by Incels are about women, love, and life, while this is not the case for the Control.
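The stemming step itself is straightforward; below is a minimal sketch using NLTK's Porter stemmer on a few hypothetical titles (the titles are illustrative, not drawn from the dataset).

```python
from collections import Counter
import re
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

def title_stems(title):
    """Tokenize a video title and reduce each word to its Porter stem."""
    return [stemmer.stem(w) for w in re.findall(r"[a-z]+", title.lower())]

# Hypothetical titles standing in for the dataset.
titles = [
    "Why women only date tall guys",
    "A girl explains what women want",
    "My life as a forever alone guy",
]

# Count stems across all titles and report the most frequent ones.
stem_counts = Counter(s for t in titles for s in title_stems(t))
print(stem_counts.most_common(15))
```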

Figure 6: Examples of Incel-derived Incel-related videos (mis)labeled as music videos.
(a)
(b)
Figure 7: Proportion of Incel-related vs other videos for the top 15 stems found in the tags of Incel-derived (a) and Control (b) videos.
(a)
(b)
(c)
(d)
Figure 8: CDF of the number of views (a) and comments per video (b), as well as of the fraction of likes over the total likes and dislikes (c), and comments over views per video (d), for Incel-related and other videos in the Incel-derived and Control sets.

Tags. Video tags are words specified by the uploader and determine the topics that a video is relevant to. Figure 7 reports the top 15 stems appearing in the tags of Incel-derived and Control videos. Note that for both Incel-derived and Control videos there is a substantial overlap between the stems in the tags and the titles. Unsurprisingly, tags like “date” () and “girl” () appear to have a higher portion of Incel-related Incel-derived videos, while for the Control the tags that appear more in Incel-related videos are “game” (), “best” (), and “world” (). Once again, Incels seem to be mostly interested in videos about dating, and love, or featuring women.

(a)
(b)
Figure 9: CDF of the number of Incel-related comments (a) and fraction of negative comments (b) per video, for both Incel-related and other in the Incel-derived and Control sets.
Figure 10: Graph representation of the words associated with “incel” in the comments of Incel-related videos in the Incel-derived set.

Statistics. Next, we examine various video statistics in our dataset. Figures 8(a) and 8(b) show the CDF of the number of views and of comments, respectively, per video in both the Incel-derived and Control sets. We observe that the Incel-related videos in both sets have more views and many more comments than the other videos. Then, Figure 8(c) shows the CDF of the fraction of likes over the total likes and dislikes per video in both sets. Interestingly, Incel-related videos in the Incel-derived set have a lower fraction, suggesting that they are more often disliked by viewers, while this is not the case for the Control videos, where Incel-related and other videos have a similar fraction of likes. Finally, Figure 8(d) plots the fraction of comments over views for both sets of videos. We observe that views are more often converted into comments for Incel-related Incel-derived videos, while this is not the case for the Control.

Comments. We now look at the comments of the videos in our dataset using our lexicon of Incel-related terms. Recall that we consider a comment Incel-related when it contains at least one Incel-related term from our lexicon. Figure 9(a) plots the CDF of the number of Incel-related comments per video for both Incel-related and other videos in the Incel-derived and Control sets. Almost half and , respectively, of the Incel-related videos in the Incel-derived and Control sets have ten or more Incel-related comments. There is also a small number of videos with four or fewer Incel-related comments that are labeled as “other” in accordance with our methodology (see Section 3.2).

Comments (Sentiment). Next, we look at the sentiment expressed in the comments using VADER [22], a rule-based model for general sentiment analysis. Figure 9(b) plots the CDF of the fraction of negative comments per video, which shows that Incel-related videos in both the Incel-derived and Control sets attract more negative comments. We also find that a substantial proportion of the comments, and , in almost half of the Incel-related Incel-derived and Control videos, respectively, are negative.
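Classifying a comment as negative with VADER reduces to thresholding its compound score; the sketch below assumes the conventional -0.05 cut-off (whether the paper used the same threshold is an assumption), with two made-up comments.

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Requires the VADER lexicon: nltk.download("vader_lexicon")
analyzer = SentimentIntensityAnalyzer()

def is_negative(comment, threshold=-0.05):
    """Treat a comment as negative if VADER's compound score is below the threshold.

    The -0.05 cut-off is the convention suggested by VADER's authors; whether
    the paper used the same cut-off is an assumption of this sketch.
    """
    return analyzer.polarity_scores(comment)["compound"] <= threshold

comments = ["I love this video", "Women only care about looks, it's over"]
negative_fraction = sum(is_negative(c) for c in comments) / len(comments)
print(f"Fraction of negative comments: {negative_fraction:.2f}")
```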

Comments (Language). To understand the use of language on Incel-related videos, we train a word2vec model [37] on all comments of the Incel-derived videos. A word2vec model is a shallow neural network that maps words into a high-dimensional vector space, so that words used in similar contexts are mapped closer together in the space. To train the model, we first process all the 1.8M comments made on Incel-related videos by tokenizing them, lowering the case of each word, and performing stemming using the Porter Stemmer algorithm. Then, we use the methodology from [50]: for each word in the vocabulary, we get the cosine similarity between the specific word and all the other words.

To visualize the association of words, we create a graph that shows the most similar words around a specific word of interest. Figure 10 shows the graph for the word “incel.” We build this graph by taking the 2-hop ego network around the word “incel,” while only considering the words that have at least 0.6 cosine similarity (again, following the methodology in [50]). In this graph, the size of a node is proportional to the number of times the corresponding word appears in the comments. Nodes are colored based on the community they belong to after running the Louvain method [5]. The graph is laid out in space with the ForceAtlas2 algorithm [26], which lays nodes closer to each other according to their cosine similarity (i.e., words that are used in similar contexts are closer in the visualization). We observe several interesting themes in the graph. First, we find a community of words related to Incels (purple) and one related to Alpha Males (peacock blue). We also notice a community related to the Men Going Their Own Way (MGTOW) movement and feminism (red). Finally, we can see a community that seemingly expresses several hateful ideologies, including misogynistic, racist, sexist, and homophobic content (green). Overall, our analysis shows that Incels comment on videos related to a variety of topics, ranging from men’s issues expressed in their own vocabulary (e.g., Chad, thundercock, etc.) to others that can be considered hateful or divisive, including expressions of racism, homophobia, and disgust towards women.
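A rough sketch of this pipeline, using gensim for word2vec and networkx for the ego network, is shown below. The toy token lists, the `topn` cut-off, and the use of networkx's Louvain implementation (rather than Gephi's) are assumptions made for illustration.

```python
import networkx as nx
from gensim.models import Word2Vec

# Toy stand-in for the tokenized, stemmed comments of Incel-related videos.
tokenized_comments = [["incel", "blackpill", "chad"],
                      ["mgtow", "feminist", "incel"],
                      ["chad", "blackpill", "rope"]]

model = Word2Vec(tokenized_comments, vector_size=100, window=5, min_count=1)

def ego_network(model, seed="incel", threshold=0.6, topn=50):
    """2-hop ego network around `seed`, keeping edges whose cosine similarity is
    at least `threshold`, mirroring the construction described in the text."""
    g = nx.Graph()
    g.add_node(seed)
    first_hop = [w for w, s in model.wv.most_similar(seed, topn=topn) if s >= threshold]
    for w in first_hop:
        g.add_edge(seed, w)
        for w2, s2 in model.wv.most_similar(w, topn=topn):
            if s2 >= threshold:
                g.add_edge(w, w2)
    return g

g = ego_network(model)
# Louvain community detection over the word graph (requires networkx >= 2.8);
# in the paper's figure, communities determine the node colors.
communities = nx.community.louvain_communities(g)
```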

Rec. Graph Type Incel-related (%) Other (%)
Incel-derived 7,196 () 124,945 ()
Control 4,047 () 152,642 ()
Table 5: Number of Incel-related and other videos in each recommendation graph.
Source Destination Incel-derived Rec. Graph (%) Control Rec. Graph (%)
Incel-related Incel-related 11,269 () 2,044 ()
Incel-related Other 26,641 () 9,926 ()
Other Other 432,659 () 466,357 ()
Other Incel-related 29,544 () 19,714 ()
Table 6: Number of transitions between Incel-related and “other” videos in each recommendation graph.

6 Recommendations Analysis

In this section, we present an analysis of how YouTube’s recommendation algorithm behaves with respect to Incel-related videos. First, we investigate how likely it is for YouTube to recommend an Incel-related video. Then, we look at the probability of discovering Incel-related content by performing random walks on YouTube’s recommendation graph, thus simulating the behavior of a user that views videos based on the recommendations.

Graph Analysis. To build the recommendation graphs used in our analysis, for each video in the Incel-derived and Control sets, we collect the top 10 recommended videos associated with it, as returned by the YouTube Data API. We collect the recommendations for the Incel-derived videos between September 20, 2019 and October 4, 2019, and for the Control videos between October 15, 2019 and November 1, 2019. Note that the YouTube Data API is associated with a specific account only for the purposes of authentication; hence, our data collection does not capture how specific account features or the viewing history affect recommendations. To annotate the collected videos, we follow the same approach as described in Section 3.2.

Next, we build a directed graph for each set of recommendations, where nodes are videos (either our dataset videos or their recommendations), and edges indicate recommendations between videos (up to ten per video). For instance, if video2 is recommended when viewing video1, we add an edge from video1 to video2. Throughout the rest of this paper, we refer to the collected recommendations of each set of videos as separate recommendation graphs.
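A minimal sketch of how such a graph could be assembled is shown below. It assumes the Data API's search endpoint with the `relatedToVideoId` parameter (available at the time of the study) as a stand-in for the "top recommended videos"; the paper does not specify the exact endpoint, so treat this as an approximation rather than the authors' implementation.

```python
import networkx as nx
import requests

API_KEY = "YOUR_YOUTUBE_API_KEY"  # assumption: Data API access for illustration

def top_recommendations(video_id, max_results=10):
    """Fetch up to `max_results` videos related to `video_id`.

    Uses the Data API's search endpoint with `relatedToVideoId`, which
    approximates, but is not identical to, the recommendations shown in the
    YouTube UI.
    """
    resp = requests.get("https://www.googleapis.com/youtube/v3/search", params={
        "part": "snippet",
        "relatedToVideoId": video_id,
        "type": "video",
        "maxResults": max_results,
        "key": API_KEY,
    })
    items = resp.json().get("items", [])
    return [item["id"]["videoId"] for item in items]

def build_recommendation_graph(seed_videos):
    """Directed graph: an edge u -> v means v was recommended when viewing u."""
    g = nx.DiGraph()
    for vid in seed_videos:
        for rec in top_recommendations(vid):
            g.add_edge(vid, rec)
    return g
```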

(a)
(b)
Figure 11: Percentage of random walks where the random walker encounters at least one Incel-related video for both starting scenarios.

First, we investigate the prevalence of Incel-related videos in each recommendation graph. Table 5 shows the number of Incel-related and other videos in each recommendation graph. For the Incel-derived recommendation graph, we find 125K () other videos and 7K () Incel-related videos, while in the Control recommendation graph we find 153K () other videos and 4K () Incel-related videos. These findings highlight that, although the proportion of Incel-related video recommendations in the Control recommendation graph is smaller, a non-negligible amount is still recommended to users. Also, note that we reject the null hypothesis that the differences between the two recommendation graphs are due to chance via Fisher’s exact test [12].
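The test itself can be reproduced directly from the counts in Table 5; a minimal sketch with SciPy follows (the grouping of the counts into a 2x2 contingency table is an assumption about how the test was set up).

```python
from scipy.stats import fisher_exact

# 2x2 contingency table of (Incel-related, Other) video counts in the
# Incel-derived and Control recommendation graphs, taken from Table 5.
table = [
    [7196, 124945],   # Incel-derived graph
    [4047, 152642],   # Control graph
]

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3g}")
```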

(a)
(b)
Figure 12: Percentage of Incel-related videos over all unique videos that the random walk encounters at each hop, for both starting scenarios.

Next, to understand how likely it is for YouTube to recommend an Incel-related video, we study the interplay between the Incel-related and other videos in each recommendation graph. For each video, we calculate the out-degree in terms of Incel-related and “other” labeled nodes. We can then count the number of transitions the graph makes between differently labeled nodes. Table 6 summarizes the number and percentage of each type of transition for both recommendation graphs. Unsurprisingly, we find that most of the transitions, and , in the Incel-derived and Control recommendation graphs, respectively, are between “other” videos, mainly because of the large number of “other” videos in each graph. We also find a high percentage of transitions between “other” and Incel-related videos. When a user watches an “other” video and randomly follows one of the top ten recommended videos, there is a and probability, in the Incel-derived and Control recommendation graphs respectively, that they will end up at an Incel-related video. Interestingly, in both graphs, Incel-related videos are more often recommended by “other” videos than by Incel-related videos, but again this is due to the large number of other videos.
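Counting these transitions is a single pass over the graph's edges; the sketch below assumes a label dictionary produced by the annotation step in Section 3.2.

```python
from collections import Counter

def count_transitions(graph, labels):
    """Count edges by (source label, destination label), as in Table 6.

    `labels` maps each video ID to "Incel-related" or "Other"; unlabeled
    nodes default to "Other".
    """
    counts = Counter()
    for src, dst in graph.edges():
        counts[(labels.get(src, "Other"), labels.get(dst, "Other"))] += 1
    total = sum(counts.values()) or 1
    # Return both the raw count and the fraction of all transitions.
    return {pair: (n, n / total) for pair, n in counts.items()}
```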

Does YouTube’s recommendation algorithm contribute to steering users towards Incel communities? Next, we study how YouTube’s recommendation algorithm behaves with respect to discovering Incel-related videos. Through our graph analysis, we showed that the problem of Incel-related videos on YouTube is quite prevalent. However, it is still unclear how often YouTube’s recommendation algorithm leads users to this type of abhorrent content.

To measure this, we perform experiments considering a “random walker.” This allows us to simulate the behavior of a random user who starts from one video and then watches several videos by following the recommendations. The random walker begins from a randomly selected node and navigates the graph choosing edges at random until it reaches five hops, which constitutes the end of a single random walk. We repeat this process for random walks, considering two starting scenarios: in each, the starting node is restricted to either Incel-related or other videos. The same experiment is performed on both the Incel-derived and Control recommendation graphs.

Next, for the random walks of each recommendation graph, we calculate two metrics: 1) the percentage of random walks where the random walker finds at least one Incel-related video up to the k-th hop; and 2) the percentage of Incel-related videos over all unique videos that the random walker encounters up to the k-th hop, for both starting scenarios. The two metrics, at each hop, are shown in Figures 11 and 12 for both recommendation graphs.
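A compact sketch of the random-walk simulation and the two metrics follows; the number of walks and the handling of dead ends are assumptions, as the paper does not spell them out.

```python
import random

def random_walk(graph, start, hops=5):
    """Follow random out-edges for up to `hops` steps, returning visited nodes."""
    path, node = [start], start
    for _ in range(hops):
        succ = list(graph.successors(node))
        if not succ:  # dead end: stop the walk early (assumption)
            break
        node = random.choice(succ)
        path.append(node)
    return path

def walk_metrics(graph, labels, start_nodes, n_walks=1000, hops=5):
    """Estimate (1) the fraction of walks that hit an Incel-related video and
    (2) the fraction of unique encountered videos that are Incel-related."""
    hits, seen = 0, set()
    for _ in range(n_walks):
        path = random_walk(graph, random.choice(start_nodes), hops)
        seen.update(path)
        if any(labels.get(v) == "Incel-related" for v in path[1:]):
            hits += 1
    incel_seen = sum(labels.get(v) == "Incel-related" for v in seen)
    return hits / n_walks, incel_seen / max(len(seen), 1)
```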

We observe that, when starting from an “other” video, there is a and probability of encountering at least one Incel-related video after five hops in the Incel-derived and Control recommendation graphs, respectively (see Fig. 11(a)). When starting from an Incel-related video, we find at least one Incel-related video in and of the random walks performed on the Incel-derived and Control recommendation graphs, respectively (see Fig. 11(b)). We also find that, when starting from other videos, most of the Incel-related videos are found early in our random walks (i.e., at the first hop), and this number remains almost the same as the number of hops increases (see Fig. 12(a)). The same holds when starting from Incel-related videos, but in this case the percentage of Incel-related videos decreases as the number of hops increases for both recommendation graphs (see Fig. 12(b)).

As expected, in all cases the probability of encountering Incel-related videos in random walks performed on the Incel-derived recommendation graph is higher compared to random walks performed on the Control recommendation graph. We verify that the difference between the distributions of Incel-related videos encountered in the random walks of the two recommendation graphs is significant via the Kolmogorov-Smirnov test [35]. Overall, we find that Incel-related videos are usually recommended within the first two hops. However, in subsequent hops the number of encountered Incel-related videos decreases. These findings indicate that a user casually browsing YouTube videos is unlikely to end up in a region dominated by Incel-related videos.

Take-Aways. Overall, the analysis of YouTube’s recommendation algorithm yields the following findings:

  1. We find a non-negligible amount of Incel-related videos () within YouTube’s recommendation graph being recommended to users;

  2. When a user watches an “other” video, if they randomly follow one of the top ten recommended videos, there is a chance that they will end up watching an Incel-related video;

  3. By performing random walks on YouTube’s recommendation graph, we find that when starting from an “other” video, there is a probability to encounter at least one Incel-related video within five hops.

7 Discussion & Conclusion

This paper presented a data-driven characterization of the Incel community on YouTube. We collected thousands of YouTube videos shared by users in Incel-related communities within Reddit and used them to understand how Incel ideology spreads on YouTube, as well as to study the evolution of the Incel community.

Overall, we found a non-negligible growth in Incel-related activity on YouTube over the past few years in terms of Incel-related videos published and comments likely posted by Incels, which suggests that members gravitating around the Incel community are increasingly using YouTube to disseminate their views. By analyzing comments posted on Incel-related videos using word2vec embeddings, we also observed associations with topics expressing racism, misogyny, and anti-feminism. Finally, we took a look at how YouTube’s recommendation algorithm behaves with respect to Incel-related videos. We found that there is a non-negligible chance that a user who watches a non-Incel-related video will end up watching an Incel-related video if they randomly follow one of the top ten recommended videos. By performing random walks on YouTube’s recommendation graph, we estimated a roughly 1 in 5 chance that a user who starts by watching non-Incel-related videos will be recommended Incel-related ones within five recommendations.

To the best of our knowledge, our work is the first to focus on the characterization of Incel-related content on YouTube. We hope that the insights provided by our work can be used as a starting point to gain a deeper understanding of how misogynistic ideologies spread on YouTube, which in turn may help in effectively moderating this type of content.

Limitations. Naturally, our work is not without limitations. First, the video recommendations we collected only represent a snapshot of the recommendation system. Given that we do not take personalization into account, it is hard to derive conclusions about YouTube at large. However, we believe that the recommendation graphs we obtain still allow us to understand how YouTube’s recommendation system is behaving in our scenario. Also note that a similar methodology for auditing YouTube’s recommendation algorithm has been used in previous work [42]. Lastly, the moderation status of a video comment is not publicly available via the YouTube Data API, thus we were not able to study moderation of Incel-related comments.

Future Work. We plan to extend our work to include live random walks on YouTube recommendations with personalization. We will also work towards detecting Incel-related content based on the video content itself, as well as investigating other Manosphere communities.

References

  • [1] S. Agarwal and A. Sureka. A Focused Crawler for Mining Hate and Extremism Promoting Videos on YouTube. In Proceedings of the 25th ACM conference on Hypertext and social media, pages 294–296. ACM, 2014.
  • [2] J. Baumgartner. Pushshift Reddit API. 2019.
  • [3] BBC. How rampage killer became misogynist “hero”. https://www.bbc.com/news/world-us-canada-43892189, 2018.
  • [4] M. Blais and F. Dupuis-Déri. Masculinism and the Antifeminist Countermovement. Social Movement Studies, 11(1):21–39, 2012.
  • [5] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast Unfolding of Communities in Large Networks. Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008.
  • [6] L. Cecco. Toronto van attack suspect says he was ’radicalized’ online by ’incels’. https://www.theguardian.com/world/2019/sep/27/alek-minassian-toronto-van-attack-interview-incels, 2019.
  • [7] J. Cook. Inside incels’ looksmaxing obsession: Penis stretching, skull implants and rage. https://www.huffpost.com/entry/incels-looksmaxing-obsession_n_5b50e56ee4b0de86f48b0a4f, 2018.
  • [8] B. M. Coston and M. Kimmel. White Men as the New Victims: Reverse Discrimination Cases and the Men’s Rights Movement. Nevada Law Journal, 13:368, 2012.
  • [9] P. Covington, J. Adams, and E. Sargin. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM conference on recommender systems, pages 191–198. ACM, 2016.
  • [10] T. Farrell, M. Fernandez, J. Novotny, and H. Alani. Exploring Misogyny across the Manosphere in Reddit. In Proceedings of the 10th ACM Conference on Web Science, WebSci ’19, page 87–96, New York, NY, USA, 2019. Association for Computing Machinery.
  • [11] W. Farrell. The myth of male power. Berkeley Publishing Group, 1996.
  • [12] R. A. Fisher. On the Interpretation of χ² from Contingency Tables, and the Calculation of P. Journal of the Royal Statistical Society, 85(1):87–94, 1922.
  • [13] J. L. Fleiss. Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5):378, 1971.
  • [14] D. Futrelle. The ’alt-right’ is fueled by toxic masculinity — and vice versa. https://www.nbcnews.com/think/opinion/alt-right-fueled-toxic-masculinity-vice-versa-ncna989031, 2019.
  • [15] T. Giannakopoulos, A. Pikrakis, and S. Theodoridis. A Multimodal Approach to Violence Detection in Video Sharing Sites. In 2010 20th International Conference on Pattern Recognition, pages 3244–3247. IEEE, 2010.
  • [16] D. Ging. Alphas, Betas, and Incels: Theorizing the Masculinities of the Manosphere. Men and Masculinities, page 1097184X17706401, 2017.
  • [17] Google Developers. YouTube Data API. https://developers.google.com/youtube/v3/, 2019.
  • [18] L. Gotell and E. Dutton. Sexual Violence in the “Manosphere”: Antifeminist Men’s Rights Discourses on Rape. International Journal for Crime, Justice and Social Democracy, 5(2):65, 2016.
  • [19] C. Hauser. Reddit bans ”incel” group for inciting violence against women. https://www.nytimes.com/2017/11/09/technology/incels-reddit-banned.html, 2017.
  • [20] T. Hume. Toronto van attack suspect says he used reddit and 4chan to chat with other incel killers. https://www.vice.com/en_us/article/bjwdk3/man-accused-of-toronto-van-attack-says-he-corresponded-with-other-incel-killers, 2019.
  • [21] Z. Hunte and K. Engström. “Female Nature, Cucks, and Simps”: Understanding Men Going Their Own Way as Part of the Manosphere. Master’s thesis, 2019.
  • [22] C. J. Hutto and E. Gilbert. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth international AAAI conference on weblogs and social media, 2014.
  • [23] Incels Wiki. Blackpill. https://incels.wiki/w/Blackpill, 2019.
  • [24] Incels Wiki. Incel Forums Term Glossary. https://incels.wiki/w/Incel_Forums_Term_Glossary, 2019.
  • [25] Incels Wiki. The Incel Wiki. 2019.
  • [26] M. Jacomy, T. Venturini, S. Heymann, and M. Bastian. ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software. PloS one, 9(6):e98679, 2014.
  • [27] S. Jaki, T. De Smedt, M. Gwóźdź, R. Panchal, A. Rossa, and G. De Pauw. Online hatred of women in the Incels. me forum: Linguistic analysis and automatic detection. Journal of Language Aggression and Conflict, 7(2):240–268, 2019.
  • [28] S. Jiang, R. E. Robertson, and C. Wilson. Bias Misperceived: The Role of Partisanship and Misinformation in YouTube Comment Moderation. In Proceedings of the 13th International AAAI Conference on Web and Social Media, volume 13, pages 278–289, 2019.
  • [29] K. E. Kiernan. Who Remains Celibate? Journal of Biosocial Science, 20(3):253–264, 1988.
  • [30] J. R. Landis and G. G. Koch. The measurement of observer agreement for categorical data. biometrics, pages 159–174, 1977.
  • [31] J. L. Lin. Antifeminism Online: MGTOW (Men Going Their Own Way). In Digital Environments, Ethnographic Perspectives Across Global Online and Offline Spaces. 2017.
  • [32] E. Mariconti, G. Suarez-Tangil, J. Blackburn, E. De Cristofaro, N. Kourtellis, I. Leontiadis, J. L. Serrano, and G. Stringhini. “You Know What to Do”: Proactive Detection of YouTube Videos Targeted by Coordinated Hate Attacks. In Proceedings of the ACM Conference on Computer Supported Cooperative Work and Social Computing, 2019.
  • [33] P. Martineau. YouTube Is Banning Extremist Videos. Will It Work? https://www.wired.com/story/how-effective-youtube-latest-ban-extremism/, 2019.
  • [34] A. Massanari. #Gamergate and The Fappening: How Reddit’s Algorithm, Governance, and Culture Support Toxic Technocultures. New Media & Society, 19(3):329–346, 2017.
  • [35] F. J. Massey Jr. The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American statistical Association, 46(253):68–78, 1951.
  • [36] M. A. Messner. The Limits of “The Male Sex Role” An Analysis of the Men’s Liberation and Men’s Rights Movements’ Discourse. Gender & Society, 12(3):255–276, 1998.
  • [37] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781, 2013.
  • [38] R. Ottoni, E. Cunha, G. Magno, P. Bernardina, W. Meira Jr, and V. Almeida. Analyzing Right-wing YouTube Channels: Hate, Violence and Discrimination. In Proceedings of the 10th ACM Conference on Web Science, pages 323–332. ACM, 2018.
  • [39] K. Papadamou, A. Papasavva, S. Zannettou, J. Blackburn, N. Kourtellis, I. Leontiadis, G. Stringhini, and M. Sirivianos. Disturbed YouTube for Kids: Characterizing and Detecting Inappropriate Videos Targeting Young Children. In Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020.
  • [40] M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.
  • [41] Rational Wiki. Incel. https://rationalwiki.org/wiki/Incel, 2019.
  • [42] M. H. Ribeiro, R. Ottoni, R. West, V. A. Almeida, and W. Meira. Auditing Radicalization Pathways on YouTube. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 2019.
  • [43] C. M. Rivers and B. L. Lewis. Ethical research standards in a world of big data. F1000Research, 3, 2014.
  • [44] S. Sara, K. Lah, S. Almasy, and R. Ellis. Oregon Shooting: Gunman a Student at Umpqua Community College. https://www.cnn.com/2015/10/02/us/oregon-umpqua-community-college-shooting/index.html, 2015.
  • [45] A. Sureka, P. Kumaraguru, A. Goyal, and S. Chhabra. Mining YouTube to Discover Extremist Videos, Users and Hidden Communities. In Asia Information Retrieval Symposium, pages 13–24. Springer, 2010.
  • [46] R. Tahir, F. Ahmed, H. Saeed, S. Ali, F. Zaffar, and C. Wilson. Bringing the Kid back into YouTube Kids: Detecting Inappropriate Content on Video Streaming Platforms. In Proceedings of the IEEE/ACM International Conference on Advances in Social Network Analysis and Mining, 2019.
  • [47] M. Vijaymeena and K. Kavitha. A Survey on Similarity Measures in Text Mining. Machine Learning and Applications: An International Journal, 3(2):19–28, 2016.
  • [48] Wikipedia. 2014 Isla Vista Killings. https://en.wikipedia.org/wiki/2014_Isla_Vista_killings, 2019.
  • [49] S. Zannettou, S. Chatzis, K. Papadamou, and M. Sirivianos. The Good, the Bad and the Bait: Detecting and Characterizing Clickbait on YouTube. In 2018 IEEE Security and Privacy Workshops (SPW), pages 63–69. IEEE, 2018.
  • [50] S. Zannettou, J. Finkelstein, B. Bradlyn, and J. Blackburn. A Quantitative Approach to Understanding Online Antisemitism. In Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020.
  • [51] Z. Zhao, L. Hong, L. Wei, J. Chen, A. Nath, S. Andrews, A. Kumthekar, M. Sathiamoorthy, X. Yi, and E. Chi. Recommending What Video to Watch Next: A Multitask Ranking System. In Proceedings of the 13th ACM Conference on Recommender Systems, pages 43–51, 2019.
  • [52] S. Zimmerman, L. Ryan, and D. Duriesmith. Who are Incels? Recognizing the Violent Extremist Ideology of ’Incels’. September 2018.