Characterizing and Detecting Hateful Users on Twitter

03/23/2018 ∙ by Manoel Horta Ribeiro, et al. ∙ Universidade Federal de Minas Gerais

Most current approaches to characterize and detect hate speech focus on content posted in Online Social Networks. They face shortcomings in collecting and annotating hateful speech due to the incompleteness and noisiness of OSN text and the subjectivity of hate speech. These limitations are often sidestepped with constraints that oversimplify the problem, such as considering only tweets containing hate-related words. In this work we partially address these issues by shifting the focus towards users. We develop and employ a robust methodology to collect and annotate hateful users which does not depend directly on lexicon and where the users are annotated given their entire profile. This results in a sample of Twitter's retweet graph containing 100,386 users, out of which 4,972 were annotated. We also collect the users who were banned in the three months that followed the data collection. We show that hateful users differ from normal ones in terms of their activity patterns, word usage, and network structure. We obtain similar results comparing the neighbors of hateful vs. neighbors of normal users and also suspended vs. active users, increasing the robustness of our analysis. We observe that hateful users are densely connected, and thus formulate the hate speech detection problem as a task of semi-supervised learning over a graph, exploiting the network of connections on Twitter. We find that a node embedding algorithm, which exploits the graph structure, outperforms content-based approaches for the detection of both hateful (95% AUC vs. 88% AUC) and suspended users (93% AUC vs. 88% AUC). Altogether, we present a user-centric view of hate speech, paving the way for better detection and understanding of this relevant and challenging issue.








The importance of understanding hate speech in Online Social Networks (OSNs) is manifold. Countries such as Germany have strict legislation against the practice [Stein1986], the presence of such content may pose problems for advertisers [The Guardian2017] and users [Sabatini and Sarracino2017], and manually inspecting all possibly hateful content in OSNs is unfeasible [Schmidt and Wiegand2017]. Furthermore, the trade-off between banning such behavior from platforms and not censoring dissenting opinions is a major societal issue [Rainie, Anderson, and Albright2017].

This scenario has motivated work that aims to understand and detect hateful content [Greevy and Smeaton2004, Warner and Hirschberg2012, Burnap and Williams2016] by creating representations for tweets or comments in an OSN, e.g. word2vec [Mikolov et al.2013], and then classifying them as hateful or not, often drawing insights on the nature of hateful speech. However, in OSNs the meaning of such content is often not self-contained, referring, for instance, to some event which just happened, and the texts are packed with informal language, spelling errors, special characters and sarcasm [Dhingra et al.2016, Riloff et al.2013]. Besides that, hate speech itself is highly subjective, reliant on temporal, social and historical context, and occurs sparsely [Schmidt and Wiegand2017]. These problems, although observed, remain unaddressed [Davidson et al.2017, Magu, Joshi, and Luo2017]. Consider the tweet:

Timesup, yall getting w should have happened long ago

This tweet was posted in reply to another tweet that mentioned the Holocaust. Although the tweet, whose author’s profile contained white-supremacy imagery, incited violence, it is hard to conceive how it could be detected as hateful with only textual features. Furthermore, the lack of hate-related words makes it difficult for this kind of tweet to be sampled.

Fortunately, as we just hinted, the data in posts, tweets or messages are not the only signals we may use to study hate speech in OSNs. Most often, these signals are linked to a profile representing a person or organization. Characterizing and detecting hateful users shares many of the benefits of detecting hateful content and presents plenty of opportunities to explore a richer feature space. Furthermore, in a practical hate speech guideline enforcement process with humans in the loop, it is natural that user profiles will be checked, rather than isolated tweets. The case can be made that this wider context is sometimes needed to define hate speech, such as in the example, where the abuse was made clear by the neo-nazi signs in the user’s profile.

Analyzing hate on a user-level rather than content-level enables our characterization to explore not only content, but also dimensions such as the user’s activity and connections. Moreover, it allows us to use the very structure of Twitter’s network in the task of detecting hateful users [Hamilton, Ying, and Leskovec2017b].

Figure 1: Network of users sampled from Twitter after our diffusion process. Red nodes indicate the proximity of users to those who employed words in our lexicon.

Present Work

In this paper we characterize and detect hateful users on Twitter, which we define according to Twitter’s hateful conduct guidelines. We collect a dataset of users, along with their most recent tweets, using a random-walk-based crawler on Twitter’s retweet graph. We identify users that employed words from a set of hate-speech-related lexicon, and generate a subsample by selecting users at different distances from such users. These are manually annotated as hateful or not through crowdsourcing. The aforementioned distances are real-valued numbers obtained through a diffusion process in which the users who used the words in the lexicon are seeds. We create a dataset containing 4,972 manually annotated users, a portion of which were labeled as hateful. We also find the users that have been suspended after the data collection - before and after Twitter’s guideline changes, which happened on 18/Dec/17.

Studying these users, we find significant differences between the activity patterns of hateful and normal users: hateful users tweet more frequently, follow more people each day, and their accounts are more short-lived and recent.

While the media stereotypes hateful individuals as “lone wolves” [Burke2017], we find that hateful users are not in the periphery of the retweet network we sampled. Although they have fewer followers, the median for several network centrality measures in the retweet network is higher for those users. We also find that these users do not seem to behave like spammers.

A lexical analysis using Empath [Fast, Chen, and Bernstein2016] shows that their choice of vocabulary is different: words related to hate, anger and politics occur less often when compared to their normal counterparts, and words related to masculinity, love and curses occur more often. This is noteworthy, as much of the previous work directly employs hate-related words as a data-collection mechanism.

We compare the neighborhood of hateful with the neighborhood of normal users in the retweet graph, as well as accounts that have been suspended with those who were not. We argue that these suspended accounts and accounts that retweeted hateful users are also proxies for hateful speech online, and the similar results found in many of the analyses performed increase the robustness of our findings.

We also compare users who have been banned before and after Twitter’s recent guideline change, finding an increase in the number of users banned per day, but little difference in terms of their vocabulary, activity and network structure.

Finally, we find that hateful users and suspended users are very densely connected in the retweet network we sampled. Hateful users are many times more likely to retweet other hateful users, and suspended users are many times more likely to retweet other suspended users. This motivates us to pose the problem of detecting hate speech as a task of semi-supervised learning over graphs. We employ a node embedding algorithm that creates a low-dimensional representation of each node in the network, which we then classify. We demonstrate robust performance in detecting both hateful and suspended users in this fashion (95% AUC and 93% AUC) and show that this approach outperforms traditional state-of-the-art classifiers (88% AUC and 88% AUC, respectively).

Altogether, this work presents a user-centric view of the problem of hate speech. Our code and data are publicly available.


Hateful Users

We define “hateful user” and “hate speech” according to Twitter’s guidelines. For the purposes of this paper, “hate speech” is any type of content that “promotes violence against or directly attacks or threatens other people on the basis of race, ethnicity, national origin, sexual orientation, gender, gender identity, religious affiliation, age, disability, or disease” [Twitter2017]. On the other hand, a “hateful user” is a user that, according to annotators, endorses such type of content.

Retweet Graph

The retweet graph is a directed graph where each node represents a user on Twitter, and each edge (u1, u2) implies that user u1 has retweeted user u2. Previous work suggests that retweets are better than followers to judge users’ influence [Cha et al.2010]. As influence flows in the opposite direction of retweets, we invert the graph’s edges.
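As a minimal sketch (with hypothetical usernames, not from the dataset), the edge inversion can be expressed by storing, for each retweeted user, the set of users they influence:

```python
# Minimal sketch of the inverted retweet graph. Input edges follow the paper's
# convention: (u1, u2) means u1 retweeted u2, so influence flows u2 -> u1.
from collections import defaultdict

retweets = [("alice", "bob"), ("carol", "bob"), ("bob", "dave")]  # toy edges

influence = defaultdict(set)
for retweeter, retweeted in retweets:
    influence[retweeted].add(retweeter)  # store the inverted edge
```

Here `influence["bob"]` would contain the users bob influences, i.e. everyone who retweeted him.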

Offensive Language

We employ Waseem et al.’s [waseem2017understanding] definition of explicit abusive language: language that is unambiguous in its potential to be abusive, for example language that contains racial or homophobic slurs. The use of this kind of language does not necessarily imply hate speech, although there is a clear correlation [Davidson et al.2017].

Suspended Accounts

Most Twitter accounts are suspended due to spam; however, such accounts are harder to reach in the retweet graph, as they rarely get retweeted. We use Twitter’s API to find the accounts that have been suspended among the collected users, and use these as another source of potentially hateful behavior. We collect accounts that were suspended two months after the data collection, on 12/Dec/2017, and again after Twitter’s hateful conduct guideline changes, on 14/Jan/2018. The new guidelines are allegedly stricter, considering, for instance, off-the-platform behavior.

Data Collection

Most previous work on detecting hate speech on Twitter employs a lexicon-based data collection, which involves sampling tweets that contain specific words [Davidson et al.2017, Waseem and Hovy2016], such as wetb*cks or fagg*t. However, this methodology is biased towards a very direct, textual and offensive form of hate speech. It has difficulties with statements that subtly disseminate hate with no offensive words, as in "Who convinced Muslim girls they were pretty?" [Davidson et al.2017], and with the usage of code words, as in the word "skypes", employed to reference Jews [Magu, Joshi, and Luo2017, Meme2016]. In this scenario, we propose collecting users rather than tweets, relying on lexicon only indirectly, and collecting the network structure around these users, which we will later use to characterize and detect hate.

We represent the connections among users on Twitter using the retweet network [Cha et al.2010]. Sampling the retweet network is hard, as we can only observe outgoing edges (due to API limitations), and it is known that unbiased in-degree estimation is impossible without sampling most of these “hidden” edges in the graph [Ribeiro et al.2012]. Acknowledging this limitation, we employ Ribeiro et al.’s Direct Unbiased Random Walk algorithm, which estimates the out-degree distribution efficiently by performing random jumps in an undirected graph it constructs online [Ribeiro, Wang, and Towsley2010]. Fortunately, in the retweet graph the outgoing edges of each user represent the other users she - usually [Guerra et al.2017] - endorses. With this strategy, we collect a sample of Twitter’s retweet graph with 100,386 users and their retweet edges, along with the most recent tweets of each user, as shown in Figure 1. This graph is unbiased w.r.t. the out-degree distribution of nodes.
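As a rough illustration only (not the actual DURW implementation, which interleaves API calls with degree-correcting estimators), a random walk with occasional random jumps over a toy undirected graph can be sketched as:

```python
import random

# Toy undirected graph; in the real crawler, neighbors are discovered online
# through the API rather than known up front.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
nodes = list(graph)

def random_walk_with_jumps(start, steps, jump_prob=0.1, seed=0):
    rng = random.Random(seed)
    visited, cur = [start], start
    for _ in range(steps):
        if rng.random() < jump_prob or not graph[cur]:
            cur = rng.choice(nodes)        # random jump anywhere in the graph
        else:
            cur = rng.choice(graph[cur])   # step to a uniformly chosen neighbor
        visited.append(cur)
    return visited

walk = random_walk_with_jumps(0, 50)
```

The jump probability trades off exploration against the bias correction needed later; DURW chooses it in a principled way.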

Figure 2: Toy example of the diffusion process. (i) We begin with the sampled retweet graph; (ii) we revert the direction of the edges (the way influence flows), add self loops to every node, and mark the users who employed words in our lexicon; (iii) we iteratively update the beliefs of the other nodes.

As the sampled graph is too large to be annotated entirely, we need to select a subsample for annotation. If we choose users uniformly at random, we risk having a negligible percentage of hate speech in the subsample. On the other hand, if we choose only users whose tweets have obvious hate speech features, such as offensive racial slurs, we will run into the same problems pointed out in previous work. We propose a method between these two extremes. We:

  1. Create a lexicon of words that are mostly used in the context of hate speech. This is unlike other work [Davidson et al.2017], as we do not consider words that are employed in a hateful context but often used in other contexts in a harmless way (e.g. n*gger). We use words such as holohoax, racial treason and white genocide, handpicked from [Hate Base2017] and ADL’s hate symbol database [ADL2017].

  2. Run a diffusion process on the graph based on DeGroot’s learning model [Golub and Jackson2010], assigning an initial belief to each user who employed words in the lexicon. This prevents our sample from being excessively small or biased towards some vocabulary.

  3. Divide the users into strata according to their associated beliefs after the diffusion process, and perform a stratified sampling, obtaining up to a fixed number of users per stratum.

Figure 3: Average values for several activity-related statistics for hateful users, normal users, users in the neighborhood of those, and suspended/active users. The avg(interval) was calculated on the tweets extracted for each user. Error bars represent confidence intervals. The legend used in this graph is kept in the remainder of the paper.
Figure 4: KDEs of the creation dates of user accounts. The white dot indicates the median and the thicker bar indicates the first and third quartiles.

We briefly present our diffusion model, as illustrated in Figure 2. Let A be the adjacency matrix of our retweet graph, where each node represents a user and each edge represents a retweet: a_ij = 1 if user i retweeted user j. We create a transition matrix T by inverting the edges in A (as influence flows from the retweeted user to the user who retweeted him or her), adding a self loop to each node, and then normalizing each row of T so it sums to 1. This means each user is equally influenced by every user he or she retweets.

We then associate an initial belief p_i = 1 with every user who employed one of the words in our lexicon, and p_i = 0 with all who didn’t. Lastly, we create new beliefs using the update rule p^(t+1) = T p^(t). Notice that all the beliefs converge to the same value as t grows, thus we run the diffusion process for a small, fixed number of iterations. With this real value p_i associated with each user, we get 4 strata by randomly selecting up to a fixed number of users with beliefs in the intervals [0, .25), [.25, .50), [.50, .75) and [.75, 1]. This ensures that we annotate users who didn’t employ any of the words in our lexicon, yet have a high potential to be hateful due to homophily.
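The diffusion above can be sketched on a toy graph (illustrative adjacency and seed, not the paper's data), using the convention that a_ij = 1 means user i retweeted user j, so user i's belief is updated with the beliefs of the users i retweets:

```python
import numpy as np

# Toy 3-user graph: user 1 retweeted user 0, user 2 retweeted user 1.
A = np.array([[0., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.]])
T = A + np.eye(3)                     # add a self loop to every node
T = T / T.sum(axis=1, keepdims=True)  # normalize each row to sum to 1

p = np.array([1., 0., 0.])  # initial belief: user 0 employed a lexicon word
for _ in range(2):          # a few update steps; beliefs converge as t grows
    p = T @ p
```

After two steps the belief has diffused from the seed to the users who (transitively) retweeted it, while never exceeding the seed's own belief.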

We annotate users as hateful or not using CrowdFlower, a crowdsourcing service. The annotators were given the definition of hateful conduct according to Twitter’s guidelines and asked, for each user:

Does this account endorse content that is humiliating, derogatory or insulting towards some group of individuals (gender, religion, race) or support narratives associated with hate groups (white genocide, holocaust denial, jewish conspiracy, racial superiority)?

Annotators were asked to consider the entire profile (limiting the tweets to the ones collected) rather than individual publications or isolated words, and were given examples of terms and code words in ADL’s hate symbol database. Each user profile was independently annotated by multiple annotators and, in case of disagreement, by additional annotators. In the end, the annotators identified the hateful users and the normal ones. The sample of the retweet network was collected between the 1st and the 7th of Oct/17, and annotation began immediately after. We also obtained all users suspended up to 12/Dec/17 and up to 14/Jan/18.

Characterizing Hateful Users

We analyze how hateful and normal users differ w.r.t. their activity, vocabulary and network centrality. We also compare the neighbors of hateful and of normal users, as well as suspended and active users, to reinforce our findings: homophily suggests that neighbors will share many characteristics with the annotated users, and suspended users may have been banned because of hateful conduct (we use suspended and banned interchangeably). We compare these populations in pairs, as the sampling mechanism for each population is different. We argue that each of these pairs contains a proxy for hateful speech on Twitter, and thus inspecting all three increases the robustness of our analysis. Reported p-values are from unequal-variances t-tests comparing averages across distinct populations. When we refer to “hateful users”, we mean the ones annotated as hateful. The number of users in each of these groups is given in the table below:

Hateful Normal
Banned Active
Table 1: Number of users in each group.

Hateful users have newer accounts

The account creation dates of users are depicted in Figure 4. Hateful users were created significantly later than normal ones. A hypothesis for this difference is that hateful users are banned more often due to infringements of Twitter’s guidelines. This resonates with existing methods for detecting fake accounts, in which using the account’s creation date has been successful [Viswanath et al.2015]. We obtain similar results w.r.t. the 1-neighborhood of such users, where the hateful neighbors were also created more recently, and also when comparing suspended and active accounts.
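The comparisons throughout this section rely on unequal-variances (Welch's) t-tests. As a self-contained sketch with toy numbers (not the paper's data), the test statistic can be computed as:

```python
import numpy as np

def welch_t(a, b):
    """Unequal-variances (Welch's) t statistic for two independent samples."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va = a.var(ddof=1) / len(a)   # squared standard error of each sample mean
    vb = b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

t = welch_t([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```

A p-value would then come from the t distribution with Welch-Satterthwaite degrees of freedom; in practice `scipy.stats.ttest_ind(a, b, equal_var=False)` does both steps.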

Figure 5: Boxplots for the distribution of metrics that indicate spammers. Hateful users have slightly fewer followers per followee, fewer URLs per tweet, and fewer hashtags per tweet.
Figure 6: Average values for the relative occurrence of several categories in Empath. Notice that not all Empath categories were analyzed and that the analyzed categories were chosen beforehand to avoid spurious correlations. Error bars represent confidence intervals.
Figure 7: Box plots for the distribution of sentiment, subjectivity and bad-word usage. Suspended users, hateful users and their neighborhoods are more negative and use more bad words than their counterparts.
Figure 8: Network centrality metrics for hateful and normal users, their neighborhood, and suspended/non-suspended users calculated on the sampled retweet graph.

Hateful users are power users

Other interesting metrics for analysis are the number of tweets, followers, followees and favorited tweets a user has, and the interval in seconds between their tweets. We show these statistics in Figure 3. We normalize the numbers of tweets, followers and followees by the number of days since each account’s creation date. Our results suggest that hateful users are “power users” in the sense that they tweet more, in shorter intervals, favorite more tweets by other people and follow more users per day. The analysis yields similar results when we compare the 1-neighborhoods of hateful and normal users, and when comparing suspended and active accounts (except for the number of favorites when comparing suspended/active users, and for the average interval when comparing the neighborhoods).

Hateful users don’t behave like spammers

We investigate whether users who propagate hate speech are spammers. We analyze metrics that have been used by previous work to detect spammers, such as the number of URLs per tweet, of hashtags per tweet, and of followers per followee [Benevenuto et al.2010]. Boxplots of these distributions are shown in Figure 5. We find that hateful users use, on average, fewer hashtags and fewer URLs per tweet than normal users. The same analysis holds if we compare the 1-neighborhoods of hateful and non-hateful users, or suspended and active users (except for the number of followers per followee, where the t-test shows no statistical significance). Additionally, we find that, on average, normal users have more followers per followee than hateful ones, which also holds for their neighborhoods. This suggests that hateful and suspended users do not use systematic and programmatic methods to deliver their content. Notice that it is not possible to extrapolate this finding to Twitter in general, as there may be hateful users with other behaviors which our data collection methodology does not capture, since we do not specifically look for trending topics or popular hashtags.
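A minimal sketch of these per-user spam metrics, on made-up tweets and counts (URL detection here is a naive substring match, much cruder than what one would use in practice):

```python
def spam_metrics(tweets, followers, followees):
    """Compute the three spam-related metrics for one user (toy version)."""
    n = len(tweets)
    urls = sum(t.count("http") for t in tweets) / n       # URLs per tweet
    hashtags = sum(t.count("#") for t in tweets) / n      # hashtags per tweet
    return {"urls_per_tweet": urls,
            "hashtags_per_tweet": hashtags,
            "followers_per_followee": followers / max(followees, 1)}

m = spam_metrics(["check http://x.co #win", "hello world"],
                 followers=10, followees=40)
```

A low followers-per-followee ratio combined with many URLs and hashtags per tweet is the classic spammer signature; the point of this section is that hateful users do not show it.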

The median hateful user is more central

We analyze different measures of centrality for users, as depicted in Figure 8. The median hateful user is more central in all measures when compared to their normal counterparts. This is a counter-intuitive finding, as hate crimes have long been associated with “lone wolves” and anti-social people [Burke2017]. We observe similar results when comparing the median eigenvector centrality of the neighbors of hateful and normal users, as well as of suspended and active users. In the latter pair, suspended users also have a higher median out-degree. When analyzing the averages of such statistics, we observe that the average eigenvector centrality is higher on the opposite sides of the previous comparisons. This happens because some very influential users distort the value: for example, the most central users according to this metric are normal. Notice that, despite this, hateful and suspended users have a higher average out-degree than normal and active users, respectively.

Hateful users use non-trivial vocabulary

We characterize users w.r.t. their content with Empath [Fast, Chen, and Bernstein2016], as depicted in Figure 6. Hateful users use fewer words related to hate, anger, shame, terrorism, violence and sadness when compared to normal users. A question this raises is how sampling tweets based exclusively on a hate-related lexicon biases the sample of content to be annotated towards a very specific type of “hate-spreading” user, and it reinforces the claims that sarcasm, code words and very specific slang play a significant role in defining such users [Davidson et al.2017, Magu, Joshi, and Luo2017].

Categories of words used more often by hateful users include positive emotions, negative emotions, suffering, work, love and swearing, suggesting the use of emotional vocabulary. An interesting direction would be to analyze the sensationalism of their statements, as has been done in the context of clickbait [Chen, Conroy, and Rubin2015]. When we compare the neighborhoods of hateful and normal users, and suspended vs. active users, we obtain very similar results (except when comparing suspended vs. active users’ usage of anger, terrorism, sadness, swearing and love). Overall, the non-triviality of the vocabulary of these groups of users reinforces the difficulties NLP approaches face to sample, annotate and detect hate speech [Davidson et al.2017, Magu, Joshi, and Luo2017].

Figure 9: Cohort-like depiction of the banning of users.
Susp. Accounts Hateful Normal Others
2017-12-12 9.09%/55 0.32%/14 0.33%/318
2018-01-14 17.64%/96 0.90%/40 0.55%/532
Table 2: Percentage/number of accounts that were suspended before and after the guideline changes.

We also explore the sentiment of the tweets users write using a corpus-based approach, as depicted in Figure 7. We find that sentences written by hateful and suspended users are more negative and less subjective. The neighbors of hateful users in the retweet graph are also more negative, although not less subjective. We also analyze the distribution of profanity per tweet for hateful and non-hateful users, obtained by matching all words in Shutterstock’s “List of Dirty, Naughty, Obscene, and Otherwise Bad Words”. We find that suspended users, hateful users and their neighbors employ more profane words per tweet, also confirming the results from the analysis with Empath.
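The profanity-per-tweet statistic amounts to matching tokens against a word list. A toy sketch with stand-in "bad words" (the paper uses Shutterstock's list):

```python
# Stand-in word list; replace with the actual profanity list in practice.
BAD_WORDS = {"darn", "heck"}

def profanity_per_tweet(tweets):
    """Average number of list words per tweet (naive tokenization)."""
    counts = [sum(w.lower().strip(".,!?") in BAD_WORDS for w in t.split())
              for t in tweets]
    return sum(counts) / len(tweets)

rate = profanity_per_tweet(["darn it!", "what the heck, darn", "fine"])
```

Real tokenization (handling punctuation, elongations, obfuscated spellings like f*ck) matters a lot for this metric; the sketch only strips trailing punctuation.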

Table 3: Occurrence of the edges between hateful and normal users, and between suspended and active users. Results are normalized w.r.t. the type of the source node, as in: #(source type → dest type) / #(source type → any). Notice that the probabilities do not add up to 1 for hateful and normal users, as we do not present the statistics for non-annotated users.

More users are banned after the guideline changes, but they are similar to the ones banned before

Twitter changed its enforcement of the hateful conduct guidelines on 18/Dec/2017. We analyze the differences between the accounts that had been suspended by 12/Dec/2017 (two months after the end of the annotation) and by 14/Jan/2018 (after the guideline change).

The intersection between these groups and the users we annotated as hateful or not is shown in Table 2. In the first period, from the end of the data annotation to 12/Dec, fewer users were banned per day than in the second period. This trend, illustrated in Figure 9, suggests increased banning activity.

Performing the lexical analysis we previously applied to compare hateful and normal users, we find no statistically significant difference w.r.t. the averages for users banned before and after the guideline change (except for government-related words). We also analyze the number of tweets, followers/followees, and the previously mentioned centrality measures, and observe no statistically significant difference between the averages or the distributions (which were compared using a KS-test). This suggests that Twitter has not changed the type of users it bans.

Hateful users are densely connected

Finally, we analyze how frequently hateful and normal users, as well as suspended and active users, interact within their own group and with each other. Table 3 depicts the probability of a node of a given type retweeting each type of node. We find that a large fraction of the retweets of hateful users go to other hateful users, which means that, considering the occurrence of hateful users in the graph, they are many times more likely to retweet another hateful user. We observe a similar phenomenon with suspended users, who have a substantial share of their retweets directed towards other suspended users; as suspended users correspond to only a small fraction of the users sampled, they too are many times more likely to retweet other suspended users.
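The normalization used in Table 3 can be sketched as follows, on a toy labelled edge list (labels and edges are illustrative): for each source type, divide the count of edges to each destination type by the total number of edges leaving that source type.

```python
from collections import Counter

# Toy node labels and retweet edges (source, destination).
labels = {0: "hateful", 1: "hateful", 2: "normal", 3: "normal"}
edges = [(0, 1), (0, 2), (1, 0), (2, 3), (3, 2), (2, 0)]

pair_counts = Counter((labels[s], labels[d]) for s, d in edges)
src_counts = Counter(labels[s] for s, _ in edges)

# P(dest type | source type) = #(source type -> dest type) / #(source type -> any)
probs = {pair: c / src_counts[pair[0]] for pair, c in pair_counts.items()}
```

On this toy graph, two of the three edges leaving hateful users point to other hateful users, mirroring (in exaggerated form) the homophily reported in the paper.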

The high density of connections among hateful and suspended users suggests strong modularity. We exploit this, along with activity and network centrality attributes, to detect these users robustly.

Hateful/Normal Suspended/Active
Model Features Accuracy F1-Score AUC Accuracy F1-Score AUC
GradBoost user+glove
AdaBoost user+glove
GraphSage user+glove
Table 4: Prediction results and standard deviations for the two proposed settings: detecting hateful users and detecting suspended users. The semi-supervised node embedding approach performs better than state-of-the-art supervised learning algorithms on all assessed criteria, suggesting the benefits of exploiting the network structure to detect hateful and suspended users.

Detecting Hateful Users

As we consider users and their connections in the network, we can detect hate speech using information that is unavailable to models operating at the granularity of tweets or comments.

  • Activity/Network: Features such as the number of statuses, followers, followees and favorites, and centrality measures such as betweenness, eigenvector centrality and the in/out-degree of each node. We refer to these as user.

  • GloVe: We also use spaCy’s off-the-shelf 300-dimensional GloVe vectors [Pennington, Socher, and Manning2014] as features. We average the representation across all words in a given tweet and, subsequently, across all tweets a user has. We refer to these as glove.
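The two-level averaging can be sketched with a toy 3-dimensional embedding table standing in for the 300-dimensional GloVe vectors (words and values are illustrative):

```python
import numpy as np

# Toy embedding table; in the paper, spaCy's 300-d GloVe vectors play this role.
emb = {"good": np.array([1., 0., 0.]),
       "day":  np.array([0., 1., 0.])}

def tweet_vec(tweet, dim=3):
    """Average word vectors of a tweet; zero vector for out-of-vocab tweets."""
    vecs = [emb[w] for w in tweet.split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def user_vec(tweets):
    """Average the tweet vectors across all of a user's tweets."""
    return np.mean([tweet_vec(t) for t in tweets], axis=0)

u = user_vec(["good day", "good"])
```

Averaging discards word order but yields a fixed-size representation per user, which is what the downstream classifiers need.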

Using these features, we experimentally compare two traditional machine learning models known to perform very well when the number of instances is not very large, Gradient Boosted Trees (GradBoost) and Adaptive Boosting (AdaBoost), and a model aimed specifically at learning on graphs, GraphSage [Hamilton, Ying, and Leskovec2017a] (GraphSage). Interestingly, the latter approach is semi-supervised, and allows us to use the neighborhoods of the users we are classifying even though those neighbors are not labeled, exploiting the modularity between hateful and suspended users we observed. The algorithm creates low-dimensional embeddings for nodes, given associated features (unlike other node embeddings, such as node2vec [Grover and Leskovec2016]). Moreover, it is inductive, which means we don’t need the entire graph to run it. For additional information on node embedding methods, refer to [Hamilton, Ying, and Leskovec2017b].

The GraphSage algorithm creates embeddings for each node given that the nodes have associated features (in our case, the GloVe embeddings and activity/network-centrality attributes of each user). Instead of generating embeddings for all nodes, it learns a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. This strategy exploits the structure of the graph beyond merely using the features of the neighborhood of a given node.
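The core aggregation step can be illustrated with a single untrained mean-aggregator layer (random toy features and weights; real GraphSAGE learns the weight matrix by gradient descent, samples neighbors, and stacks several such layers):

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))             # 4 toy nodes, 3 features each
neigh = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
W = rng.normal(size=(6, 2))                 # projects [self ; neighbor-mean] to 2 dims

def embed(v):
    """One GraphSAGE-style layer: combine a node's features with the
    mean of its neighbors' features, project, apply a ReLU."""
    agg = feats[neigh[v]].mean(axis=0)
    h = np.concatenate([feats[v], agg]) @ W
    return np.maximum(h, 0)
```

Because the embedding is a function of features rather than a per-node lookup table, it can be applied to nodes unseen during training, which is what makes the approach inductive.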

Experimental Settings

We run the algorithms to detect both hateful and normal users, as annotated via the crowdsourcing service, as well as to detect which users got banned. We perform 5-fold cross-validation and report the F1-score, the accuracy and the area under the ROC curve (AUC).
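A minimal sketch of 5-fold index splitting (in practice a library routine such as scikit-learn's StratifiedKFold would be used, which also preserves class ratios across folds):

```python
def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(10, k=5))
```

Each instance appears in exactly one test fold, so the reported metrics are averaged over predictions made on held-out data only.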

In all approaches we accounted for the class imbalance in the loss function. We keep the same ratio of positive/negative classes in both tasks, which, in practice, means we used the annotated users in the first setting and, in the second setting, a selection of users from the graph including the suspended users and other users randomly sampled from the graph.

Notice that, as we are dealing with a binary classification problem, we may control the trade-off between specificity and sensitivity by varying the positive-class threshold. In this work we simply report the threshold-independent AUC score, which can be interpreted as the probability of a classifier correctly ranking a random positive case higher than a random negative case.
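This probabilistic interpretation of the AUC can be computed directly from classifier scores (toy scores, illustrative only): count, over all positive/negative pairs, how often the positive is ranked above the negative, with ties worth half.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outscores a random negative."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

a = auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

This pairwise definition is equivalent to the area under the ROC curve; library implementations (e.g. scikit-learn's roc_auc_score) compute the same quantity more efficiently.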


The results of our experiments are shown in Table 4. We find that the node embedding approach using both the user features and the GloVe embeddings yields the best results on all metrics in the two considered scenarios. The Adaptive Boosting approach yields good AUC scores, but incorrectly classifies many normal users as hateful, which results in low accuracy and F1-score.

Using the features related to users makes little difference in many settings, yielding, for example, nearly identical results in the Gradient Boosting models trained with the two feature sets. However, the usage of the retweet network yields promising results, especially because we observe improvements in both the detection of hateful users and of suspended users, which shows that the performance improvement occurs independently of our annotation process.

Related Work

We review previous efforts to characterize and detect hate speech in OSNs. Tangential problems such as cyberbullying and offensive language are not extensively covered; we refer the reader to the survey by Schmidt and Wiegand (2017).

Characterizing Hate

Hate speech has been characterized on websites and in different Online Social Networks. Gerstenfeld, Grant, and Chiang (2003) analyze hateful websites, characterizing their modus operandi w.r.t. monetization, recruitment, and international appeal. Chau and Xu (2007) identified and analyzed how hate groups organize around blogs. Silva et al. (2016) match regex-like expressions on large Twitter and Whisper datasets to characterize the targets of hate in Online Social Networks. Chatzakou et al. (2017a) characterized users and their tweets in the specific context surrounding the #GamerGate controversy. More generally, online abuse has also been characterized on Community-Based Question Answering platforms [Kayes et al.2015] and in the context of cyberbullying [Van Hee et al.2015].

Detecting Hate

We briefly go through the different steps carried out by previous work on the task of detecting hate speech, analyzing the similarities and differences to this work.

Data Collection

Many previous studies collect data by sampling OSNs with the aid of a lexicon of terms associated with hate speech [Davidson et al.2017, Waseem and Hovy2016, Burnap and Williams2016, Magu, Joshi, and Luo2017]. This may be followed by expanding the lexicon with co-occurring terms [Waseem and Hovy2016]. Other techniques employed include matching regular expressions [Warner and Hirschberg2012] and selecting features in tweets from users known to have reproduced hate speech [Kwok and Wang2013]. We employ a random-walk-based methodology to obtain a generic sample of Twitter’s retweet graph, use a lexicon of hateful words to obtain a subsample of potentially hateful users, and then select users to annotate at different “distances” from these users, which we obtain through a diffusion process. This allows us not to rely directly on a lexicon to obtain the sample to be annotated.
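The diffusion step can be illustrated with a DeGroot-style belief-averaging sketch: users who matched the lexicon start with belief 1, everyone else with 0, and beliefs are repeatedly averaged over neighbors. The tiny directed “retweet” graph below is invented for illustration; in our data the process runs over the sampled retweet graph:

```python
# Hypothetical sketch of a DeGroot-style diffusion over a toy retweet graph.
import numpy as np

# Adjacency of a tiny directed "retweet" graph: entry (i, j) = 1 means
# user i retweets user j. User 0 matched the hate lexicon.
A = np.array([
    [0, 1, 0, 0, 0],   # user 0 retweets user 1
    [1, 0, 0, 0, 0],   # user 1 retweets user 0
    [0, 1, 0, 0, 0],   # user 2 retweets user 1
    [0, 0, 1, 0, 0],   # user 3 retweets user 2
    [0, 0, 0, 1, 0],   # user 4 retweets user 3
], dtype=float)
A += np.eye(5)                          # self-loops: users keep part of their belief
T = A / A.sum(axis=1, keepdims=True)    # row-stochastic transition matrix

belief = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # initial belief: lexicon users only
for _ in range(10):                     # a few DeGroot averaging rounds
    belief = T @ belief

# Users can now be stratified by belief (e.g. into quartiles) to pick
# accounts at different "distances" from lexicon users for annotation.
print(np.round(belief, 3))
```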


Human annotators are used in most previous work on hate speech detection. This labeling may be done by the researchers themselves [Waseem and Hovy2016, Kwok and Wang2013, Djuric et al.2015, Magu, Joshi, and Luo2017], by selected annotators [Warner and Hirschberg2012, Gitari et al.2015], or through crowdsourcing services [Burnap and Williams2016]. Hate speech has been pointed out as a difficult subject to annotate [Waseem2016, Ross et al.2017]. Chatzakou et al. (2017b) annotate tweets in sessions, clustering several tweets to help annotators get a grasp of context. We also employ CrowdFlower to annotate our data. Unlike previous work, we give annotators the entire profile of a user instead of individual tweets. We argue this provides better context for the annotators [Waseem et al.2017]. The extent to which additional context improves annotation quality is a promising research direction.


Features used in previous work are almost exclusively content-related. The content of tweets, posts and websites has been represented as n-grams and BoWs [Waseem and Hovy2016, Kwok and Wang2013, Magu, Joshi, and Luo2017, Greevy and Smeaton2004, Gitari et al.2015], and as word embeddings such as paragraph2vec [Djuric et al.2015], GloVe [Pennington, Socher, and Manning2014] and FastText [Badjatiya et al.2017]. Other techniques used to extract features from content include POS tagging, sentiment analysis and ease-of-reading measures [Warner and Hirschberg2012, Davidson et al.2017, Burnap and Williams2016, Gitari et al.2015]. Waseem and Hovy (2016) employ features not related to the content itself, using the gender and the location of the creator of the content. We use attributes related to users’ activity, their network centrality, and the content they produce in our characterization and detection. In the context of detecting aggression and cyberbullying on Twitter, Chatzakou et al. (2017b) employ a similar set of features.
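As an illustration of the content representations surveyed above, one common choice is to represent a user by the average word vector of everything they tweeted. The sketch below uses a made-up vocabulary with random vectors as a stand-in for pretrained GloVe embeddings:

```python
# Sketch: user-level content features as the mean of per-word embeddings.
import numpy as np

rng = np.random.default_rng(42)
dim = 8
vocab = ["the", "game", "is", "great", "people", "love", "work"]
glove = {w: rng.normal(size=dim) for w in vocab}   # stand-in for real GloVe vectors

def user_embedding(tweets):
    """Average the vectors of all in-vocabulary words across a user's tweets."""
    vecs = [glove[w] for t in tweets for w in t.lower().split() if w in glove]
    if not vecs:
        return np.zeros(dim)   # users with no known words get a zero vector
    return np.mean(vecs, axis=0)

emb = user_embedding(["The game is great", "people love work"])
print(emb.shape)  # (8,)
```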


Models used to classify these features in the existing literature include supervised methods such as Naïve Bayes [Kwok and Wang2013], Logistic Regression [Waseem and Hovy2016, Davidson et al.2017, Djuric et al.2015], Support Vector Machines [Warner and Hirschberg2012, Burnap and Williams2016, Magu, Joshi, and Luo2017], Rule-Based Classifiers [Gitari et al.2015], Random Forests [Burnap and Williams2016], Gradient-Boosted Decision Trees [Badjatiya et al.2017] and Deep Neural Networks [Badjatiya et al.2017]. We use Gradient-Boosted Decision Trees, Adaptive Boosting and a semi-supervised node embedding approach (GraphSage). Our experiments show that the latter performs significantly better. A possible explanation is that hateful users retweet other hateful users very often, which makes exploiting the network structure beneficial.
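The intuition behind the node embedding approach can be sketched with a single GraphSAGE-style mean-aggregation layer: each node combines its own features with the averaged features of its neighbors, so densely connected users pull each other's representations together. This is an untrained, one-layer illustration on a toy graph, not the full algorithm:

```python
# Sketch: one GraphSAGE-style layer with mean aggregation (untrained weights).
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 4
X = rng.normal(size=(n, d))                     # per-user features (activity, GloVe, ...)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1],   # a dense triangle of users
             3: [4], 4: [3, 5], 5: [4]}         # a sparse chain of other users

W_self = rng.normal(size=(d, d))                # untrained weights, for illustration
W_neigh = rng.normal(size=(d, d))

def sage_layer(X):
    out = np.zeros_like(X)
    for v in range(n):
        h_neigh = X[neighbors[v]].mean(axis=0)  # mean-aggregate neighbor features
        h = X[v] @ W_self + h_neigh @ W_neigh   # combine self and neighborhood
        out[v] = np.maximum(h, 0)               # ReLU non-linearity
        norm = np.linalg.norm(out[v])
        if norm > 0:
            out[v] /= norm                      # L2-normalize, as in GraphSAGE
    return out

Z = sage_layer(X)
print(Z.shape)
```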

Conclusion and Discussion

We present an approach to characterize and detect hate speech on Twitter at a user-level granularity. Our methodology differs from previous efforts, which focused on isolated pieces of content such as tweets and comments [Greevy and Smeaton2004, Warner and Hirschberg2012, Burnap and Williams2016]. We developed a methodology to sample Twitter which consists of obtaining a generic subgraph, finding users who employed words in a lexicon of hate-related words, and running a diffusion process based on DeGroot’s learning model to sample users in the neighborhood of these users. We then used CrowdFlower, a crowdsourcing service, to manually annotate 4,972 users, a fraction of which were considered to be hateful. We argue that this methodology addresses two shortcomings of existing work: it allows the researcher to balance between having a generic sample and a sample biased towards a set of words in a lexicon, and it provides annotators with realistic context, which is sometimes necessary to identify hateful speech.

Our findings shed light on how hateful users differ from normal ones with respect to their activity patterns, network centrality measures, and the content they produce. We discover that hateful users have created their accounts more recently and write more negative sentences. They use words from lexical categories such as hate, terrorism, violence and anger less than normal users, and words from categories such as love, work and masculinity more frequently. We also find that the median hateful user is more central and that hateful users are densely connected in the retweet network. The latter finding motivates the use of an inductive graph embedding approach to detect hateful users, which outperforms widely used algorithms such as Gradient Boosted Trees. As moderation in Online Social Networks often operates at the level of users, characterizing and detecting hate at a user-level granularity is an essential step for creating workflows where humans and machines can interact to ensure OSNs obey legislation, and to provide a better experience for the average user.

Nevertheless, our approach still has limitations that may lead to interesting future research directions. Firstly, our characterization only considered the behavior of users on Twitter, and the same scenario in other Online Social Networks such as Instagram or Facebook may present different challenges. Secondly, although classifying hateful users provides contextual clues that are not available when looking only at a piece of content, it is still a non-trivial task, as hateful speech is subjective and people can disagree about what is hateful. In that sense, an interesting direction would be to create mechanisms of consensus, where online communities could help moderate their content in a more decentralized fashion (as in Wikipedia [Shi et al.2017]). Lastly, a research question in the context of detecting hate speech at a user-level granularity that this work fails to address is how much hateful content comes from how many users. This is particularly important because, if we have a Pareto-like distribution where most of the hate is generated by very few users, then analyzing hateful users rather than content becomes even more attractive.

An interesting debate which may arise when shifting the focus of hate speech detection from content to users is how this can potentially blur the line between individuals and their speech. Twitter, for instance, has implied it will consider conduct occurring “off the platform” in making suspension decisions [Kadri, Thomas2018]. In this scenario, approaching the hate speech detection problem as we propose could lead to users being suspended due to “contextual” factors, and not for a specific piece of content they wrote. However, as mentioned previously, such models can be used as a first step to detect these users, who would then be assessed by humans or other more specific methods.

The broader question this brings is to what extent a “black-box” model may be used to aid in tasks such as content moderation, where the model may contain accidental or intentional bias. Such models could be used to moderate Online Social Networks without the supervision of a human, in which case their bias could be very damaging to certain groups, even leading to the suppression of individuals’ rights, notably the right to free speech. Another option would be to make a clear distinction between using the model to flag possibly hateful or inadequate content and delegating the task of moderation exclusively to a human. Although there are many shades of gray between these two approaches, an important research direction is how to make the automated parts of the moderation process fair, accountable and transparent, which is hard to achieve even for content-based approaches.


Acknowledgments

We would like to thank Nikki Bourassa, Ryan Budish, Amar Ashar and Robert Faris from the BKC for their insightful suggestions. This work was partially supported by CNPq, CAPES and Fapemig, as well as by the projects InWeb, INCT-Cyber, MASWEB, BigSea and Atmosphere.


References

  • [ADL2017] ADL. 2017. ADL Hate Symbols Database.
  • [Badjatiya et al.2017] Badjatiya, P.; Gupta, S.; Gupta, M.; and Varma, V. 2017. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, 759–760. International World Wide Web Conferences Steering Committee.
  • [Benevenuto et al.2010] Benevenuto, F.; Magno, G.; Rodrigues, T.; and Almeida, V. 2010. Detecting spammers on twitter. In Collaboration, electronic messaging, anti-abuse and spam conference (CEAS), volume 6,  12.
  • [Burke2017] Burke, J. 2017. The myth of the ‘lone wolf’ terrorist.
  • [Burnap and Williams2016] Burnap, P., and Williams, M. L. 2016. Us and them: identifying cyber hate on twitter across multiple protected characteristics. EPJ Data Science.
  • [Cha et al.2010] Cha, M.; Haddadi, H.; Benevenuto, F.; and Gummadi, P. K. 2010. Measuring user influence in twitter: The million follower fallacy. ICWSM 10(10-17):30.
  • [Chatzakou et al.2017a] Chatzakou, D.; Kourtellis, N.; Blackburn, J.; De Cristofaro, E.; Stringhini, G.; and Vakali, A. 2017a. Hate is not binary: Studying abusive behavior of #gamergate on twitter. In Proceedings of the 28th ACM Conference on Hypertext and Social Media, HT ’17. ACM.
  • [Chatzakou et al.2017b] Chatzakou, D.; Kourtellis, N.; Blackburn, J.; De Cristofaro, E.; Stringhini, G.; and Vakali, A. 2017b. Mean birds: Detecting aggression and bullying on twitter. arXiv preprint arXiv:1702.06877.
  • [Chau and Xu2007] Chau, M., and Xu, J. 2007. Mining communities and their relationships in blogs: A study of online hate groups. International Journal of Human-Computer Studies.
  • [Chen, Conroy, and Rubin2015] Chen, Y.; Conroy, N. J.; and Rubin, V. L. 2015. Misleading online content: recognizing clickbait as false news. In Proceedings of the 2015 ACM on Workshop on Multimodal Deception Detection, 15–19.
  • [Davidson et al.2017] Davidson, T.; Warmsley, D.; Macy, M.; and Weber, I. 2017. Automated hate speech detection and the problem of offensive language. arXiv preprint arXiv:1703.04009.
  • [Dhingra et al.2016] Dhingra, B.; Zhou, Z.; Fitzpatrick, D.; Muehl, M.; and Cohen, W. W. 2016. Tweet2vec: Character-based distributed representations for social media. arXiv preprint arXiv:1605.03481.
  • [Djuric et al.2015] Djuric, N.; Zhou, J.; Morris, R.; Grbovic, M.; Radosavljevic, V.; and Bhamidipati, N. 2015. Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web, 29–30.
  • [Fast, Chen, and Bernstein2016] Fast, E.; Chen, B.; and Bernstein, M. S. 2016. Empath: Understanding topic signals in large-scale text. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 4647–4657. ACM.
  • [Gerstenfeld, Grant, and Chiang2003] Gerstenfeld, P. B.; Grant, D. R.; and Chiang, C.-P. 2003. Hate online: A content analysis of extremist internet sites. Analyses of social issues and public policy 3(1):29–44.
  • [Gitari et al.2015] Gitari, N. D.; Zuping, Z.; Damien, H.; and Long, J. 2015. A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering 10(4):215–230.
  • [Golub and Jackson2010] Golub, B., and Jackson, M. O. 2010. Naive learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics 2(1):112–149.
  • [Greevy and Smeaton2004] Greevy, E., and Smeaton, A. F. 2004. Classifying racist texts using a support vector machine. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, 468–469. ACM.
  • [Grover and Leskovec2016] Grover, A., and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, 855–864. ACM.
  • [Guerra et al.2017] Guerra, P. H. C.; Nalon, R.; Assunção, R.; and Jr., W. M. 2017. Antagonism also flows through retweets: The impact of out-of-context quotes in opinion polarization analysis. In Eleventh International AAAI Conference on Weblogs and Social Media (ICWSM 2017).
  • [Hamilton, Ying, and Leskovec2017a] Hamilton, W. L.; Ying, R.; and Leskovec, J. 2017a. Inductive representation learning on large graphs. arXiv preprint arXiv:1706.02216.
  • [Hamilton, Ying, and Leskovec2017b] Hamilton, W. L.; Ying, R.; and Leskovec, J. 2017b. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584.
  • [Hate Base2017] Hate Base. 2017. Hate Base.
  • [Kadri, Thomas2018] Kadri, Thomas. 2018. Speech vs. Speakers.
  • [Kayes et al.2015] Kayes, I.; Kourtellis, N.; Quercia, D.; Iamnitchi, A.; and Bonchi, F. 2015. The social world of content abusers in community question answering. In Proceedings of the 24th International Conference on World Wide Web, 570–580. International World Wide Web Conferences Steering Committee.
  • [Kwok and Wang2013] Kwok, I., and Wang, Y. 2013. Locate the hate: Detecting tweets against blacks. In AAAI.
  • [Magu, Joshi, and Luo2017] Magu, R.; Joshi, K.; and Luo, J. 2017. Detecting the hate code on social media. arXiv preprint arXiv:1703.05443.
  • [Meme2016] Meme, K. Y. 2016. Operation Google.
  • [Mikolov et al.2013] Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  • [Pennington, Socher, and Manning2014] Pennington, J.; Socher, R.; and Manning, C. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1532–1543.
  • [Rainie, Anderson, and Albright2017] Rainie, L.; Anderson, J.; and Albright, J. 2017. The future of free speech, trolls, anonymity and fake news online. Pew Research Center, March 29.
  • [Ribeiro et al.2012] Ribeiro, B.; Wang, P.; Murai, F.; and Towsley, D. 2012. Sampling directed graphs with random walks. In INFOCOM, 2012 Proceedings IEEE. IEEE.
  • [Ribeiro, Wang, and Towsley2010] Ribeiro, B.; Wang, P.; and Towsley, D. 2010. On estimating degree distributions of directed graphs through sampling.
  • [Riloff et al.2013] Riloff, E.; Qadir, A.; Surve, P.; De Silva, L.; Gilbert, N.; and Huang, R. 2013. Sarcasm as contrast between a positive sentiment and negative situation. In EMNLP, volume 13, 704–714.
  • [Ross et al.2017] Ross, B.; Rist, M.; Carbonell, G.; Cabrera, B.; Kurowsky, N.; and Wojatzki, M. 2017. Measuring the reliability of hate speech annotations: The case of the european refugee crisis. arXiv preprint arXiv:1701.08118.
  • [Sabatini and Sarracino2017] Sabatini, F., and Sarracino, F. 2017. Online networks and subjective well-being. Kyklos 70(3):456–480.
  • [Schmidt and Wiegand2017] Schmidt, A., and Wiegand, M. 2017. A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. Association for Computational Linguistics, Valencia, Spain, 1–10.
  • [Shi et al.2017] Shi, F.; Teplitskiy, M.; Duede, E.; and Evans, J. 2017. The wisdom of polarized crowds. arXiv preprint arXiv:1712.06414.
  • [Silva et al.2016] Silva, L. A.; Mondal, M.; Correa, D.; Benevenuto, F.; and Weber, I. 2016. Analyzing the targets of hate in online social media. In ICWSM, 687–690.
  • [Stein1986] Stein, E. 1986. History against free speech: The new German law against the “Auschwitz” - and other - “lies”. Michigan Law Review 85(2):277–324.
  • [The Guardian2017] The Guardian. 2017. Google’s bad week: YouTube loses millions as advertising row reaches US.
  • [Twitter2017] Twitter. 2017. Hateful conduct policy.
  • [Van Hee et al.2015] Van Hee, C.; Lefever, E.; Verhoeven, B.; Mennes, J.; Desmet, B.; De Pauw, G.; Daelemans, W.; and Hoste, V. 2015. Automatic detection and prevention of cyberbullying. In International Conference on Human and Social Analytics (HUSO 2015), 13–18. IARIA.
  • [Viswanath et al.2015] Viswanath, B.; Bashir, M. A.; Zafar, M. B.; Bouget, S.; Guha, S.; Gummadi, K. P.; Kate, A.; and Mislove, A. 2015. Strength in numbers: Robust tamper detection in crowd computations. In Proceedings of the 2015 ACM on Conference on Online Social Networks, 113–124. ACM.
  • [Warner and Hirschberg2012] Warner, W., and Hirschberg, J. 2012. Detecting hate speech on the world wide web. In Proceedings of the Second Workshop on Language in Social Media, 19–26. Association for Computational Linguistics.
  • [Waseem and Hovy2016] Waseem, Z., and Hovy, D. 2016. Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In SRW @ HLT-NAACL, 88–93.
  • [Waseem et al.2017] Waseem, Z.; Chung, W. H. K.; Hovy, D.; and Tetreault, J. 2017. Understanding abuse: A typology of abusive language detection subtasks. In Proceedings of the First Workshop on Abusive Language Online, 78–85.
  • [Waseem2016] Waseem, Z. 2016. Are you a racist or am i seeing things? annotator influence on hate speech detection on twitter. In Proceedings of the 1st Workshop on Natural Language Processing and Computational Social Science, 138–142.