The question of where the borderline lies, or whether there is any borderline at all, between ‘free speech’ and ‘hate speech’ is an ongoing subject of debate that has recently gained a lot of attention. With hate crimes on the rise in recent times (https://www.justice.gov/hatecrimes/hate-crime-statistics), hate speech is considered one of the fundamental problems that plague the Internet. The online dissemination of hate speech has even led to tragic real-life events such as the genocide of the Rohingya community in Myanmar, anti-Muslim mob violence in Sri Lanka, and the Pittsburgh shooting. Even the big tech companies are unable to control the massive dissemination of hate speech (https://tinyurl.com/facebook-leaked-moderation).
Recently, there has been a lot of research on multiple aspects of hate speech such as detection [15, 4, 72], analysis [10, 53], target identification [62, 49, 20], counter-hate speech [24, 45, 5], etc. However, very little is known about the temporal effects of hate speech in online social media, especially when it is considered normative. To gain a clear understanding of this, we need to observe its effects on a platform that allows the free flow of hate speech. To understand the true nature of hateful users, we need to study them in an environment that does not stop them from following/acting on their beliefs. This led us to focus our study on Gab, a social media site that calls itself the ‘champion of free speech’. The site does not prohibit users from posting hateful content. This natural environment, in which the only moderation is what the community members impose on themselves, provides a rich platform for our study. Using a large dataset of posts spanning roughly two years since the inception of the site, we develop a data pipeline that allows us to study the temporal effects of hate speech in an unmoderated environment. Our work adds the temporal dimension to the existing literature on hate speech and aims to study and characterize hate in unmoderated online social media.
Despite the importance of understanding hate speech in the current socio-political environment, there is little HCI work that looks into the temporal aspects of these issues. This paper fills an important research gap in understanding how hate speech evolves in an environment where it is protected under the umbrella of free speech. It also opens up questions on how new HCI design policies of online platforms should be regulated to minimize/mitigate the temporal growth of hate speech. We posit that HCI research, acknowledging the far-reaching harmful consequences of this problem, should factor it into the ongoing popular initiative of platform governance (https://www.tandfonline.com/eprint/KxDwNEpqTY86MNpRDHE9/full).
Outline of the work
To understand the temporal characteristics, we needed data from consecutive time points in Gab. As a first step, using a heuristic, we generate successive graphs which capture different time snapshots of Gab at one-month intervals. Then, using the DeGroot model, we assign a hate intensity score to every user in each temporal snapshot and categorize users based on their degree of hate. We then perform several linguistic and network studies on these users across the different time snapshots.
RQ1: How can we characterize the growth of hate speech in Gab?
RQ2: How have the hate speakers affected the Gab community as a whole?
RQ1 investigates the general growth of hate speech in Gab. Previous research on Gab reports that it carries 2.4x as much hateful content as Twitter. RQ2, on the other hand, attempts to identify how hateful users have affected the Gab community. We study this from two different perspectives: language and network characteristics.
For RQ1, we found that the amount of hate speech in Gab is consistently increasing, and this holds for newly joining users as well. We found that recently joined users take much less time to become hateful than those who joined in earlier time periods. Further, the fraction of users getting exposed to hate speech is increasing as well.
For RQ2, we found that the language used by the community is aligning more with the hateful users as compared to the non-hateful ones. The hateful users also seem to be playing a pivotal role from the network point of view. We found that the hateful users reach the core of the community faster and in larger sizes.
2 Prior Work
Hate speech research has a substantial literature and has recently gained a lot of attention from the computer science community. In the following sections, we examine the various aspects of research on hate speech. Interested readers can refer to Fortuna et al. and Schmidt et al. for comprehensive surveys of this subject.
Definition of hate speech
Hate speech lies at a complex confluence of freedom of expression; individual, group, and minority rights; and concepts of dignity, liberty, and equality. Owing to the subjective nature of this issue, deciding whether a given piece of text contains hate speech is onerous. In this paper, we use the hate speech definition outlined by Elsherief et al., who define hate speech as a “direct and serious attack on any protected category of people based on their race, ethnicity, national origin, religion, sex, gender, sexual orientation, disability or disease.” Others define hate speech slightly differently, but the spirit is mostly the same. In our work, we go by this definition unless otherwise explicitly mentioned.
Hate speech is a complex phenomenon, intrinsically associated with relationships among groups, and also relying on linguistic nuances. It is related to social science concepts such as incivility, radicalization, cyberbullying, abusive language [11, 52], toxicity [32, 65], profanity, and extremism. Owing to the overlap between hate speech and these concepts, it is sometimes hard to differentiate between them. Teh et al. obtained a list of frequently used profane words from comments on YouTube videos and categorized them into 8 different types of hate speech, with the aim of using these profane words for automatic hate speech detection. Malmasi et al. attempt to distinguish profanity from hate speech by building models with features such as n-grams, skip-grams, and clustering-based word representations.
Effects of hate speech
Previous studies have found that public expressions of hate speech affect the devaluation of minority members, the exclusion of minorities from society, the psychological well-being and suicide rate among minorities, and the discriminatory distribution of public resources. Frequent and repetitive exposure to hate speech has been shown to desensitize individuals to this form of speech, subsequently lowering their evaluations of the victims and increasing distancing, thereby increasing outgroup prejudice. Olteanu et al. studied the effect of violent attacks on the volume and type of hateful speech on two social media platforms, Twitter and Reddit, and found that extremist violence tends to lead to an increase in online hate speech, in particular in messages directly advocating violence.
Interest in hate speech research from a computer science perspective is growing. Researchers have devised larger datasets [15, 23, 16] and different approaches to detect hateful social media comments, including dictionary-based, distributional-semantic, multi-feature, and neural network techniques.
Burnap et al. [27] used sentiment analysis along with subjectivity detection to generate a set of words related to hate speech for hate speech classification. Chau et al. analyzed hyperlinks among web pages to identify hate group communities. Zhou et al. used the multidimensional scaling (MDS) algorithm to represent the proximity of hate websites and thus capture their level of similarity. Lie et al. incorporated LDA topic modelling to improve the performance of the hate speech detection task. Saleem et al. proposed an approach to detecting hateful speech that uses self-identifying hate communities as training data for hate speech classifiers. Davidson et al. used crowd-sourcing to label tweets into three categories: hate speech, only offensive language, and neither. Waseem et al. presented a list of criteria based on critical race theory to identify racist and sexist slurs.
More recently, researchers have started using deep learning methods [4, 72] and graph embedding techniques to detect hate speech. Badjatiya et al. applied several deep learning architectures and improved the benchmark score by 18 F1 points. Zhang et al. used a deep neural network combining convolutional and gated recurrent layers to improve results on 6 out of 7 datasets. Gao et al. utilized the context information accompanying the text to develop hate speech detection models. Grondahl et al. found that several existing state-of-the-art hate speech detection models work well only when tested on the same type of data they were trained on.
While most computational approaches focus on detecting whether a given text contains hate speech, very few works focus on detection at the user-account level. Qian et al. proposed a model that leverages intra-user and inter-user representation learning for hate speech detection.
Gibson studied the moderation policies of Reddit communities and observed that ‘safe spaces’ have higher levels of censorship, which is directly related to the politeness of the community. Studying the effects of hate speech in online social media remains an understudied area in HCI research. Using our data processing pipeline, we study the temporal effects of hate speech on Gab.
The Gab social network
Gab is a social media platform launched in August 2016 and known for promoting itself as the “champion of free speech”. However, it has been criticized for acting as a shield for alt-right users. The site is very similar to Twitter but has very loose moderation policies: according to its guidelines, Gab does not restrain users from posting hateful speech (https://gab.com/about/tos). The site allows users to read and write posts of up to 3,000 characters, employs an upvoting and downvoting mechanism for posts, and categorizes posts into topics such as News, Sports, and Politics.
We use the dataset developed by Mathew et al. for our analysis. For the sake of completeness, we present the general statistics of the dataset in Table 1. The dataset covers August 2016 to July 2018. We exclude the first two months (August–September 2016) and the last month (July 2018), as they have comparatively few posts.
| Number of posts | 21,207,961 |
| Number of reply posts | 6,601,521 |
| Number of quote posts | 2,085,828 |
| Number of reposts | 5,850,331 |
| Number of posts with attachments | 9,669,374 |
| Number of user accounts | 341,332 |
| Average followers per account | 62.56 |
| Average following per account | 60.93 |
To address our research questions, we need a temporal overview of each user's activity. Our first task therefore involves generating temporal snapshots that capture the month-wise activity of the users. For this purpose, we develop a pipeline that generates a hate vector for each user: a representation that captures the user's activity over time. Higher values in the hate vector indicate hatefulness, whereas lower values mean that the user likely did not indulge in any hateful activity.
In this section, we will explain the pipeline we used to study the temporal properties of hate. The pipeline mainly consists of the following three tasks:
Generating temporal snapshots: We divide the data such that a snapshot will represent the activities of a particular month.
Hate intensity calculation: We calculate a hate intensity score for each user, which represents the hateful activity of the user based on his/her posts, reposts, and network connections.
User profiling: We profile users based on his/her temporal activity of hate speech, which is represented by a vector of his/her hate intensity score.
Figure 1 shows our overall data processing pipeline.
5 Generating Temporal Snapshots
In order to study the temporal nature of hate speech, we need a temporal sequence of posts, reposts, users being followed, and users following the account. Thus, for each snapshot, we should have a list of the new posts, reposts, followers, and following of each user. This would allow us to have a better picture of the user stance/opinion in each snapshot. The Gab dataset gives us the information regarding the post creation date, but it does not provide any information about when a particular user started following another user. Using various data points we have, we come up with a technique in the following section to approximate the month in which a user started following another user.
New followers in each snapshot
While the post creation date is available in the dataset, the Gab API does not provide information on when a particular user started following another user. Hence, we apply a heuristic by Meeder et al., also used in previous works [36, 3], to get a lower bound on the follow-link creation date. The heuristic is based on the fact that the API returns the list of followers/friends of a user ordered by link creation time. We can thus obtain a lower bound on the follow time using the account creation dates of the followers. For instance, if a user $u$ is followed by users $u_1, u_2, \dots, u_n$ (in this order through time) and these users joined Gab on dates $t_1, t_2, \dots, t_n$, then we know for certain that $u_i$ was not following $u$ before $t_i$. We applied this heuristic to our dataset and ordered all the following relationships accordingly. The authors showed that this heuristic is quite accurate (within several minutes), especially during periods with high follow rates. Since we consider a much larger window of one month, it provides a fairly accurate estimate of the link creation time.
Hence, the above heuristic gives us the list of followers/friends of a particular user in each month. This information, combined with the creation dates of his/her posts, allows us to construct a temporal snapshot of his/her activity each month.
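The running-maximum logic behind this heuristic can be sketched in a few lines of Python (the function name and the integer date representation are illustrative, not from the paper): since follower $u_i$ must have followed after both joining Gab and after every earlier follower in the list, a running maximum of join dates bounds each follow time from below.

```python
from itertools import accumulate

def follow_time_lower_bounds(follower_join_dates):
    """Lower-bound the follow time of each follower of a user.

    `follower_join_dates` lists the account-creation times of the user's
    followers in follow order (earliest follower first).  Follower i must
    have followed after (a) creating their own account and (b) every
    earlier follower in the list, so a running maximum of join dates is a
    valid lower bound on each follow time.
    """
    return list(accumulate(follower_join_dates, max))

# Followers joined in months 3, 1, 7, 5 (in follow order): the third
# follower cannot have followed before month 7, and neither can the fourth.
print(follow_time_lower_bounds([3, 1, 7, 5]))  # [3, 3, 7, 7]
```

With month-granularity snapshots, this lower bound is enough to place each follow edge into a snapshot.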
Dynamic graph generation
We consider the Gab graph $G$ as a dynamic graph with no parallel edges. We represent the dynamic graph as a set of successive time-step graphs $G_1, G_2, \dots, G_T$, where $G_t = (V_t, E_t)$ denotes the graph at snapshot $t$, with $V_t$ the set of nodes and $E_t$ the set of edges. In this paper, we consider the time duration between successive snapshots to be one month. An example of this dynamic graph is provided in figure 2.
Each snapshot $G_t$ is a weighted directed graph with the users as nodes and the edges representing the following relationship. The edge weight is calculated based on the user's posting and reposting activity; we explain the exact mechanism of this weight calculation in the following section.
6 Hate Intensity calculation
We use the temporal snapshots to calculate the hate intensity of a user. The notion of hate intensity allows us to capture the overall hatefulness of a user: a user with a high hate intensity is considered more likely to be hateful than one with a lower value. The hate intensity ranges from 0 to 1, with values near 1 representing a highly hateful user and values close to 0 representing a non-hateful user.
We use the DeGroot model [17, 28, 56, 43] to calculate the hate intensity of a user at each snapshot. Similar to Mathew et al., our purpose in using the DeGroot model is to capture users who do not use hate keywords explicitly yet have a high potential to spread hate. We later perform a manual evaluation to ensure the quality of the model.
In the DeGroot opinion dynamics model, each individual has a fixed set of neighbours, and local interaction is captured by taking a convex combination of his/her own opinion and the opinions of his/her neighbours at each time step. The DeGroot model describes how each user repeatedly updates his/her opinion to the average of those of his/her neighbours. Since this model reflects the fundamental human cognitive capability of taking convex combinations when integrating related information, it has been studied extensively over the past decades. We now briefly explain the DeGroot model and how we modify it to calculate the hate intensity of a user account.
In the DeGroot model, each user starts with an initial belief. In each time step, the user interacts with his/her neighbours and updates his/her belief based on the neighbours' beliefs. Recall that each snapshot is a directed graph $G_t = (V_t, E_t)$, with $V_t$ representing the set of vertices and $E_t$ the set of edges at snapshot $t$. Let $N(i)$ denote the set of neighbours of node $i$ and $b_i(k)$ denote the belief of node $i$ at iteration $k$. The update rule in this model is $b_i(k+1) = \sum_{j \in N(i)} w_{ij} \, b_j(k)$, where $\sum_{j \in N(i)} w_{ij} = 1$.
For each snapshot, we assign the initial edge weights (equation 1) based on the following quantities: $R_{ij}$ denotes the number of reposts done by user $i$ of posts originally made by user $j$; $F_{ij}$ represents the following relationship, with $F_{ij} = 1$ meaning that user $i$ follows user $j$ and $F_{ij} = 0$ meaning that he/she does not; and $P_i$ denotes the number of posts made by user $i$.
We then run the DeGroot model on each snapshot graph for 5 iterations, similar to Mathew et al. , to obtain the hate score for each of the users.
We initially started with the lexicon set available in Mathew et al. These high-precision keywords were selected from Hatebase (https://hatebase.org) and Urban Dictionary (https://www.urbandictionary.com). To further enhance the quality of the lexicon, we adopt the skip-gram word embedding method to learn distributed representations of the words in our Gab dataset in an unsupervised manner. This allows us to enhance the hate lexicon with words specific to the dataset as well as spelling variations used by Gab users. For example, we found more than five variants of the derogatory term ni**er used by hateful users in the dataset. We manually went through the words and carefully selected only those that could be used in a hateful context. This resulted in a final set of 187 phrases, which we have made public (https://www.dropbox.com/sh/spidpraeln0qrtj/AACyFRPAWURXT05dbHwH9-Kta?dl=0) for future researchers. In figure 3, we plot the % of posts that contain at least one of the words from this hate lexicon. We can observe from these initial results that the volume of hateful posts on Gab is increasing over time. Further, to establish the quality of the lexicon, we randomly collected three posts for each word in it. Two of the authors independently annotated these posts for the presence of hate speech, which yielded 88.5% agreement that the posts were hateful, indicating that the developed lexicon is of high quality. The annotators were instructed to follow the definition of hate speech used in Elsherief et al.
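The embedding-based lexicon expansion amounts to a nearest-neighbour search in embedding space. The sketch below uses toy two-dimensional vectors in place of skip-gram embeddings trained on Gab posts; the similarity threshold is an illustrative assumption, and, as in the paper, candidates still require manual vetting.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def expand_lexicon(seed_words, embeddings, threshold=0.7):
    """Return candidate lexicon additions: words whose embedding is close
    to any seed word.  Candidates must be vetted manually for hateful use."""
    candidates = set()
    for word, vec in embeddings.items():
        if word in seed_words:
            continue
        if any(cosine(vec, embeddings[s]) >= threshold
               for s in seed_words if s in embeddings):
            candidates.add(word)
    return candidates

# Toy embeddings; in practice these would come from a skip-gram model
# trained on the Gab corpus, where spelling variants land near each other.
emb = {"slur": [1.0, 0.1], "slurr": [0.95, 0.15], "weather": [0.0, 1.0]}
print(expand_lexicon({"slur"}, emb))  # {'slurr'}
```

This captures misspellings and platform-specific variants that a static seed list would miss.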
Calculating the hate score
Using the high-precision hate lexicon directly to assign a hate score to a user would be problematic for two reasons. First, we might miss a large set of users who do not use any of the words in the hate lexicon directly or who use spelling variations, and who would thereby get a much lower score. Second, many users share hateful messages via images, videos, and external links, for which the hate lexicon does not work. Instead, we use a variant of the methodology of Ribeiro et al. to assign each user in each snapshot a value in the range $[0, 1]$ which indicates the user's propensity to be hateful.
We enumerate the steps of our methodology below. We apply this procedure for each snapshot to get the hate score for each user.
We identify the initial set of potential hateful users as those who have used words from the hate lexicon in at least two posts. The rest of the users are identified as non-hateful.
Using the snapshot graph, we assign the edge weights according to equation 1. We convert this graph into a belief graph by reversing the edges of the original graph and normalizing the edge weights to lie between 0 and 1.
We then run a diffusion process based on DeGroot's learning model on the belief network. We assign an initial belief value of 1 to the set of potential hateful users identified earlier and 0 to all other users.
We observe the belief values of all the users in the network after five iterations of the diffusion process.
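The diffusion steps above can be sketched as follows. This is a minimal pure-Python rendering of the DeGroot update under simplifying assumptions (the self-loop convention and per-user weight normalization are ours, not necessarily the paper's exact formulation): seed users start at belief 1, everyone else at 0, and each iteration replaces a user's belief with the weighted average of the beliefs it listens to.

```python
def degroot_hate_scores(weights, seeds, iterations=5):
    """Run DeGroot belief diffusion on a belief graph.

    `weights[i]` maps each neighbour j of user i (optionally including i
    itself as a self-loop) to an edge weight; each user's weights are
    normalized to sum to 1 before averaging.  Users in `seeds` start with
    belief 1.0 (potential hateful), everyone else with 0.0.
    """
    belief = {u: (1.0 if u in seeds else 0.0) for u in weights}
    for _ in range(iterations):
        updated = {}
        for u, nbrs in weights.items():
            total = sum(nbrs.values())
            if total == 0:  # no incoming belief: keep the current value
                updated[u] = belief[u]
            else:
                updated[u] = sum(w * belief[v] for v, w in nbrs.items()) / total
        belief = updated
    return belief

# User "a" is a seed (belief 1) with a self-loop; "b" averages its own
# belief with a's, so b's belief climbs toward 1 over five iterations.
beliefs = degroot_hate_scores({"a": {"a": 1.0}, "b": {"a": 0.5, "b": 0.5}}, {"a"})
print(round(beliefs["b"], 5))  # 0.96875
```

After five iterations, the resulting values are taken as the per-snapshot hate scores.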
DeGroot's model assigns each user a hate score in the range $[0, 1]$, with 0 implying least hateful and 1 implying highly hateful. To draw the boundary between hateful and non-hateful users, we need a threshold above which a user can be called hateful, and likewise a threshold below which a user can be considered non-hateful.
To select such threshold values, we apply $k$-means [38, 34] clustering to the scalar hate scores. Briefly, $k$-means selects $k$ points in space as initial guesses for the centroids, allocates the remaining points to the nearest centroid, and repeats the procedure until no points switch cluster assignment or a maximum number of iterations is performed. In our case, we set $k = 3$, which gives three regions in the range $[0, 1]$ represented by three centroids $C_L$, $C_M$, and $C_H$, denoting ‘low hate’, ‘medium hate’, and ‘high hate’, respectively. The purpose of the medium-hate category is to capture ambiguous users: those whose values are neither high enough to be considered hateful nor low enough to be considered non-hateful. We apply the $k$-means algorithm to the list of hate scores from all the snapshots. Figure 5 shows the fraction of users in each hate category in each snapshot. The DeGroot model is biased toward non-hate users, as in every snapshot a substantial fraction of users is initially assigned a value of zero. As shown in figure 4, the centroid values are 0.0421 ($C_L$), 0.2111 ($C_M$), and 0.5778 ($C_H$) for the low-, mid-, and high-hate-score users, respectively.
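For scalar hate scores, Lloyd's $k$-means is only a few lines; a minimal one-dimensional sketch follows (in practice a library implementation such as scikit-learn's KMeans would be used; the initialization scheme here is a simplifying assumption).

```python
def kmeans_1d(values, k=3, max_iters=100):
    """Lloyd's k-means on scalars; returns the k centroids in sorted order.

    Centroids are initialized by spreading them across the sorted data
    (requires k >= 2), then alternately assigning points to the nearest
    centroid and recomputing each centroid as its cluster mean.
    """
    data = sorted(values)
    centroids = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return sorted(centroids)

scores = [0.0, 0.1, 0.2, 0.3, 0.9, 1.0]
print(kmeans_1d(scores))  # approximately [0.05, 0.25, 0.95]
```

The three returned centroids play the roles of $C_L$, $C_M$, and $C_H$, and each user's score is labelled by its nearest centroid.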
7 User profiling
Using the centroid values ($C_L$, $C_M$, and $C_H$), we transform the activity of a user into a sequence of low, medium, and high hate over time. We denote this sequence by a hate vector, each entry of which takes one of the three values low (‘L’), medium (‘M’), or high (‘H’) hate. This allows us to track changes in a user's perspective across multiple time points.
Consider the example in figure 5(a): the vector represents a user who had a high hate score for most of the time period, with intermittent low and medium hate scores. Similarly, figure 5(b) shows a user who had a low hate score for most of the time period. For the purpose of this study, we focus on only two types of user profiles: consistently hateful users and consistently non-hateful users.
We note that other types of variation are also possible; for example, a user's hate score might change from one category to another multiple times. We do not consider such cases here.
In order to find these users, we adopt a conservative approach and categorize the users based on the following criteria:
Hateful: We call a user hateful if at least 75% of his/her entries are ‘H’.
Non-hateful: We call a user non-hateful if at least 75% of his/her entries are ‘L’.
In addition, we used the following filters on the user accounts as well:
The user should have posted at least five times.
The account should be created before February 2018 so that there are at least six snapshots available for the user.
We do not consider users with hate scores in the mid-region, as they are ambiguous. After the filtering, the number of users in each of the two categories is noted in Table 2. In the following section, we perform textual and network analysis on these types of users and try to characterize them.
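The 75% rule, together with the mid-region exclusion, can be expressed as a small labelling function (an illustrative sketch; the function and label names are ours):

```python
def classify_profile(hate_vector, threshold=0.75):
    """Label a user from the monthly entries of his/her hate vector.

    `hate_vector` is a sequence of 'L', 'M', 'H' entries, one per snapshot
    (users with fewer than five posts or fewer than six snapshots would
    already have been filtered out).  Users dominated by neither 'H' nor
    'L' are ambiguous and excluded from the study.
    """
    if not hate_vector:
        return None
    n = len(hate_vector)
    if hate_vector.count('H') / n >= threshold:
        return 'hateful'
    if hate_vector.count('L') / n >= threshold:
        return 'non-hateful'
    return None  # ambiguous: excluded

print(classify_profile("HHHHHHLM"))  # 'hateful' (6/8 = 75% 'H')
print(classify_profile("LLLLMMHH"))  # None (only 50% 'L')
```

Applying this over all hate vectors yields the two user populations studied in the rest of the paper.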
Sampling the appropriate set of non-hateful users
We use the non-hateful users as candidates for the control group. The idea of the control group is to find non-hateful users with characteristics similar to the hateful users. As a sanity check, we identify users who have (nearly) the same activity rate as the users in the hate set. We define the activity rate of a user as the sum of the number of posts and reposts made by the user, divided by the account age as of June 2018. For each hateful user, we identify the user from the non-hateful set with the nearest activity rate, repeating this process for all users in the hate list. We then performed a Mann-Whitney $U$-test to measure the goodness of the match and found a $p$-value of 0.441, indicating that the hate and non-hate users have nearly the same activity distribution. By using this subset of non-hate users, we aim to capture any general trend in Gab. Our final set consists of 1,019 hateful users and 1,019 corresponding non-hateful users with very similar activity profiles.
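The nearest-activity-rate matching can be sketched as a greedy pairing without replacement (an illustrative simplification; the paper does not spell out the exact matching procedure, and all identifiers below are hypothetical):

```python
def match_controls(hateful_rates, candidate_rates):
    """Pair each hateful user with the non-hateful candidate whose activity
    rate is closest; each candidate is used at most once.

    Both arguments map user id -> activity rate, where the activity rate is
    (posts + reposts) / account age.
    """
    available = dict(candidate_rates)
    pairs = {}
    for user, rate in hateful_rates.items():
        best = min(available, key=lambda c: abs(available[c] - rate))
        pairs[user] = best
        del available[best]  # sample without replacement
    return pairs

hate = {"h1": 10.0, "h2": 2.5}
nonhate = {"n1": 2.4, "n2": 9.0, "n3": 50.0}
print(match_controls(hate, nonhate))  # {'h1': 'n2', 'h2': 'n1'}
```

A Mann-Whitney $U$-test on the two matched samples' activity rates (e.g. via `scipy.stats.mannwhitneyu`) then checks that the distributions are statistically indistinguishable.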
Evaluation of user profiles
We evaluate the quality of the final dataset of hateful and non-hateful accounts through human judgment. We ask two annotators to determine if a given account is hateful or non-hateful as per their perception. Since Gab does not have any policy for hate speech, we use the definition of hate speech provided by Elsherief et al.  for this task. We provide the annotators with a class balanced random sample of 200 user accounts.
Each account was evaluated by two independent annotators. After the labelling was complete, we only took those annotations where both the annotators agreed. This gave us a final set of 258 user accounts, where 135 accounts were hateful and 123 accounts were non-hateful. We compared these ground truth annotations with our model predictions and found that they were in almost 100% agreement.
8 RQ1: How can we characterize the growth of hate speech in Gab?
The volume of hate speech is increasing
As a first step to measure the growth of hate speech in Gab, we use the hate lexicon that we generated to find the number of posts which contain them in each month. We can observe from figure 3 that the amount of hate speech in Gab is indeed increasing.
More new users are becoming hateful
Another important aspect of the growth that we considered was the fraction of new users getting exposed to hate. In this scenario, we say that a user has become hateful if his/her hate vector contains the entry ‘H’ at least a given number of times within a fixed number of months from account creation. In figure 7, we plot, for one, two, and three ‘H’ entries, the fraction of users in each month who are becoming hateful. As we can observe, the fraction of users being exposed to hate speech is increasing over time.
New users are becoming hateful at a faster rate
In figure 8, we show how much time a user takes to have the first, second, and third ‘H’ entry. (The initial dip in the plot arises because some users who have one ‘H’ never go on to accumulate two or three ‘H’ entries.) We observe that, over time, the time required for a user to get his/her first exposure to hate on Gab decreases.
9 RQ2: What was the impact of the hateful users on Gab?
Hate users receive replies much faster
To understand user engagement, we define the ‘first reply time’ (FRT), which measures the time taken to get the first comment/reply to a post by a user.
We define the FRT for a set of users $U$ as $\mathrm{FRT}(U) = \frac{1}{|U|} \sum_{u \in U} t_u$, where $t_u$ represents the average time taken to get the first reply to the posts written by user $u$.
We calculated the FRT values for the sets of hateful and non-hateful users and found that the average time to first reply is 51.32 minutes for non-hate users, compared to 44.38 minutes for hate users (the difference is statistically significant). This indicates that the community engages with the hateful users faster than with the non-hateful ones.
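The FRT computation reduces to a mean of per-user means; a sketch (data layout is ours for illustration):

```python
def first_reply_time(posts_by_user):
    """FRT for a set of users: the average, over users, of each user's
    mean time-to-first-reply.

    `posts_by_user` maps a user to a list of (post_time, first_reply_time)
    pairs; posts that never received a reply are assumed to be excluded.
    """
    per_user_means = []
    for posts in posts_by_user.values():
        deltas = [reply - post for post, reply in posts]
        per_user_means.append(sum(deltas) / len(deltas))
    return sum(per_user_means) / len(per_user_means)

# Times in minutes: user A averages 30, user B averages 50, so FRT = 40.
print(first_reply_time({"A": [(0, 20), (100, 140)], "B": [(0, 50)]}))  # 40.0
```

Averaging per user first (rather than pooling all posts) prevents very prolific users from dominating the statistic.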
Hateful users: lone wolves or clans
In this section, we study the hateful and non-hateful users from a network-centric perspective by leveraging user-level dynamic graphs. This approach has been shown to be effective in extracting anomalous patterns in microblogging platforms such as Twitter and Weibo [71, 61]. Along similar lines, we conduct a unique experiment in which we track the influence of hateful and non-hateful users across successive temporal snapshots.
We utilize the node metric $k$-core, or coreness, to identify influential users in the network. Nodes with high coreness are embedded in major information pathways and have been shown to be influential spreaders that can diffuse information to a large portion of the network [40, 35]. For further details about coreness and its applications in functional role identification, we refer to Malliaros et al. We first calculate the coreness of the undirected follower/followee graph for each temporal snapshot using $k$-core decomposition. In each snapshot, we subdivide all the nodes into percentile buckets, with consecutive buckets comprising increasingly influential nodes, i.e., from the bottom-percentile to the top-percentile nodes in the network. We calculate the proportion of each category of users in all the buckets across the dynamic graphs, and further estimate the proportion of migration between buckets in consecutive snapshots. We illustrate the results as a flow diagram in figure 9. The innermost core is labeled 0, the next one 1, and so on. The bars annotated with a label denote the proportion of users who are eventually detected to be in a particular category but have not yet entered the network at that time point (i.e., the account is not yet created).
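Coreness can be computed with the standard $k$-core peeling procedure; below is a small pure-Python version for illustration (graph libraries such as networkx provide this directly as `core_number`):

```python
def core_numbers(adj):
    """Coreness of every node of an undirected graph via iterative peeling.

    `adj` maps node -> set of neighbours.  For k = 1, 2, ..., repeatedly
    remove nodes whose remaining degree is below k; a node removed during
    round k gets core number k - 1.
    """
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    alive = {u: set(nbrs) for u, nbrs in adj.items()}  # working copy
    core = {}
    k = 0
    while alive:
        k += 1
        changed = True
        while changed:  # keep peeling until no node has degree < k
            changed = False
            for u in [u for u, d in degree.items() if u in alive and d < k]:
                core[u] = k - 1
                for v in alive[u]:
                    alive[v].discard(u)
                    degree[v] -= 1
                del alive[u]
                changed = True
    return core

# Triangle a-b-c (2-core) plus a pendant node d attached to a (1-core).
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(core_numbers(adj))  # {'d': 1, 'a': 2, 'b': 2, 'c': 2}
```

The resulting core numbers can then be binned into percentile buckets per snapshot to build the flow diagram described above.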
Position of hateful users: We show the flow of hateful users in figure 8(a). The leftmost bar denotes the entire group strength, and the following bars indicate consecutive time points, each showcasing the evolution of the network.
We observe several interesting patterns in figure 8(a). In the initial three time points, a large proportion of users are confined to the outer shells of the network. This forms a network-centric validation of the hypothesis that newly joined users tend to familiarize themselves with the norms of the community and do not exert considerable influence. However, in the final time points, the hateful users rapidly rise in rank, and the majority of them assimilate into the inner cores. This trend among Gab users is consistent with other microblogging sites like Twitter, where hate mongers have been found to have higher eigenvector and betweenness centrality than normal accounts. There are also surprising cases in which a fraction of users who have just joined the network become part of the inner core very quickly. We believe this is by virtue of their already knowing many ‘inner core’ Gab users even before joining the platform.
Position of non-hateful users: Focusing on figure 8(b), which illustrates the case of non-hateful users, we see a contrasting trend. The flow diagram shows that users already in influential buckets continue to stay there over consecutive time periods. The increase in core size at a time point can be mostly attributed to the nodes of the nearby cores in the previous time point. We also observe that in the final snapshot of the graph all the cores tend to have a similar number of nodes. These results are in sharp contrast with those observed for the hateful users (figure 8(a)).
Acceleration toward the core: We were also interested in the rate at which users accelerate toward the core. To this end, we calculated the time it took users to reach bucket 0 from their account creation time. We found that a hateful user takes only 3.36 months on average to do this, whereas a non-hateful user requires 6.94 months to reach the inner core of the network. We tested this result with the Mann-Whitney $U$-test and found the difference to be significant.
To further understand the volume of users transitioning between the cores of the network, we compute the ratio of hateful to non-hateful users in a given core for each month. Figure 10 plots the ratio values. A value of 1.0 means that an equal number of hateful and non-hateful users occupy the same core in a particular month; a value less than one means that the core contained more non-hateful users than hateful ones. We observe that in the initial time period (October 2016 - July 2017), the non-hateful users occupied the inner cores more. After this, however, the fraction of hateful users in the innermost core started increasing, and around August 2017 it surpassed that of the non-hateful users. We observe similar trends in all four innermost cores (0, 1, 2, and 3).
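The per-core monthly ratio described above reduces to a grouped count; a sketch with pandas, using invented records rather than the Gab data:

```python
# Illustrative computation of the hateful/non-hateful ratio per core per
# month (as in figure 10); the records below are made up for the sketch.
import pandas as pd

records = [
    # (month, core, label)
    ("2016-10", 0, "non-hate"), ("2016-10", 0, "non-hate"), ("2016-10", 0, "hate"),
    ("2017-08", 0, "hate"), ("2017-08", 0, "hate"), ("2017-08", 0, "non-hate"),
]
df = pd.DataFrame(records, columns=["month", "core", "label"])

counts = df.groupby(["month", "core", "label"]).size().unstack(fill_value=0)
ratio = counts["hate"] / counts["non-hate"]
print(ratio.loc[("2016-10", 0)])  # 0.5 -> non-hateful users dominate the core
print(ratio.loc[("2017-08", 0)])  # 2.0 -> hateful users have surpassed them
```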
The Gab community is increasingly using language similar to users with high hate scores
Gab started in August 2016 with the intent of becoming the ‘champion of free speech’. Since its inception, it has attracted several types of users. As the community evolves, so do its members. To understand the temporal nature of the language of the users and the community, we utilize the framework developed by Danescu-Niculescu-Mizil et al., in which the authors use language models to track linguistic change in communities.
We use KenLM to generate language models for each snapshot. These ‘Snapshot Language Models’ (SLMs) are generated for each month and capture the linguistic state of the community at a single point in time. The SLMs allow us to measure how close a particular utterance is to a community. The ‘Hate Snapshot Language Model’ (H-SLM) is trained on the posts written by users with high hate scores in a snapshot. Similarly, the ‘Non-hate Snapshot Language Model’ (N-SLM) is trained on the posts written by users with low hate scores in that snapshot. Note that unlike in the previous sections, where we built hate vectors aggregated over different time snapshots to label a user hateful/non-hateful, here we consider the posts of users with high/low hate scores for a given snapshot to build the snapshot-wise training data for the language models (it is not possible to extend the hate vector concept here, as we are building the language models snapshot by snapshot). For a given snapshot, we use the full data for testing: we evaluate both models on all posts of the month and report the average cross-entropy
$H(c, \mathrm{SLM}_m) = -\frac{1}{N_c}\sum_{i=1}^{N_c} \log P_{\mathrm{SLM}_m}(b_i^c)$
where $H$ represents the cross-entropy, $N_c$ is the number of bigrams in comment $c$, and $P_{\mathrm{SLM}_m}(b_i^c)$ is the probability assigned to bigram $b_i^c$ from comment $c$ in community-month $m$. Here, the community can be hate (H-SLM) or non-hate (N-SLM). (We controlled for the spurious length effect by considering only the initial 30 words; the same control is used in the cross-entropy calculations.)
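The average cross-entropy comparison described above can be illustrated with a simplified, self-contained stand-in for the snapshot language models: a bigram model with add-one smoothing. The actual pipeline uses KenLM with modified Kneser-Ney smoothing; the toy corpora here are invented.

```python
# Bigram language model with add-one smoothing, scored by the average
# per-bigram cross-entropy. A teaching sketch, not the KenLM pipeline.
import math
from collections import Counter

def train_bigram_model(corpus):
    """Return a function giving smoothed bigram probabilities P(w2 | w1)."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for post in corpus:
        tokens = post.split()
        vocab.update(tokens)
        unigrams.update(tokens[:-1])             # bigram contexts
        bigrams.update(zip(tokens, tokens[1:]))
    V = len(vocab)
    return lambda w1, w2: (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def cross_entropy(post, prob):
    """H(c, SLM) = -(1/N) * sum of log2 P(b_i) over the N bigrams of c."""
    tokens = post.split()
    bg = list(zip(tokens, tokens[1:]))
    return -sum(math.log2(prob(w1, w2)) for w1, w2 in bg) / len(bg)

# Posts close to the training community receive a lower cross-entropy.
h_slm = train_bigram_model(["they are vermin", "they are a plague"])
print(cross_entropy("they are vermin", h_slm))   # low: matches the community
print(cross_entropy("hello kind world", h_slm))  # higher: deviates from it
```

Training one such model per community-month and averaging the test posts' cross-entropies gives the kind of curves plotted in figure 11.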
A higher value of cross-entropy indicates that the posts of the month deviate more from the respective community type (hate/non-hate). We plot the entropy values in figure 11. As can be observed, the whole Gab community appears more aligned with the language model built from the posts of users with high hate scores. A remarkable observation is that from around May 2017, the language used by the Gab community started moving closer to the language of users with high hate scores. This may indicate that the hate community is evolving its own linguistic culture and that the other users on the platform are slowly (and possibly unknowingly) aligning themselves with this culture, thus making it the norm. If left unmoderated, one might find in the near future that there is only one language on this platform – the language of hate.
Ethical considerations and implications
The ongoing discussion of whether to include hate speech under the umbrella of free speech is a subject of great significance. A major argument made by those who would leave hate speech unregulated is that any form of censorship of speech is a violation of freedom of speech. Our work provides a peek into the hate ecosystem that develops on a platform without any moderation apart from ‘self-censorship’.
We caution against our work being perceived as support for full-scale censorship. We simply argue that the ‘free flow’ of hate speech should be stopped, and we leave it to platforms or governments to implement systems that reduce such speech in the online space.
Although the intent of Gab was to support free speech, it is acting as fertile ground for fringe communities such as the alt-right and neo-Nazis. Freedom of speech that harms others is no longer freedom. While banning users and/or comments is arguably undemocratic, platforms/governments still need to curb such hateful content for the proper functioning of the ‘online’ society. Recently, some works have started looking into alternatives to the banning approach. One of the contenders is counterspeech [6, 45, 44]. The basic idea is that instead of banning hate speech, crowdsourced responses should be used to reply to these messages. The main advantage of such an approach is that it does not violate freedom of speech. However, there are doubts about how applicable and practical the approach is, and large-scale studies are needed to observe its benefits and costs. Understanding the effects of such moderation is an area of future work.
We suggest that social media platforms could start incentive programs for counterspeech. They could also provide an interface for group moderators to identify hateful activities and take precautionary steps. This would allow platforms to stop the spread of hateful messages at an early stage. The effect of hate speech from influential users is also much greater than that from others, and thus targeted campaigns are required to overcome such adverse effects.
Monitoring the growth of hate speech
The platform should provide an interface that allows moderators to monitor the growth of hate speech in the community. This could be a crowdsourced effort that helps identify users who are attempting to spread hate speech.
As we have seen, new users are gravitating toward the hateful community at a faster rate and in greater numbers, so there is a need for methods that can detect and prevent such movement. There may exist radicalization pipelines that steer a user toward hateful content; platforms should ensure that their user feed and recommendation algorithms are free from such issues. Exposure to such content can also lead to desensitization toward the victim community. We would also need methods that take the user network into consideration. Instead of waiting for a user to post hateful content after indoctrination, platforms will need to be proactive rather than reactive. Simple interventions such as nudges, or changing the user feed to reduce polarization, could be an initial step. Further research is required to study these points more carefully.
Platform governance – the rising role of HCI
All the points discussed above related to moderation and monitoring can be aptly summarized as initiatives toward platform governance. We believe that within this initiative the HCI design principles of social media platforms need to be completely overhauled. In February 2019, the United Kingdom’s Digital, Culture, Media and Sport (DCMS) committee ruled that social media platforms can no longer hide behind the claim that they are merely a ‘platform’ and therefore bear no responsibility for regulating the content of their sites (https://policyreview.info/articles/analysis/platform-governance-triangle-conceptualising-informal-regulation-online-content). In fact, the European Union now has the ‘EU Code of Conduct on Terror and Hate Content’ (CoT) that applies to the entire EU region. Despite the increase in toxic content and harassment, Twitter did not have a policy of its own to mitigate these issues until the company created a new organisation, the ‘Twitter Trust and Safety Council’, in 2015. A common way the EU combats such online hate content is the creation of working groups that combine voices from different avenues, including academia, industry and civil society. For instance, in January 2018, 39 experts met to frame the ‘Code of Practice on Online Disinformation’, which was signed by tech giants like Facebook and Google. We believe that HCI practitioners have a lead role to play in such committees, and no code of conduct can materialize unless the HCI design policies of these platforms are reexamined from scratch.
Limitations and future work
Our work has several limitations. We are well aware that studies conducted on a single social media platform such as Gab have certain drawbacks. In particular, since other social media sites delete/suspend hateful posts and/or users, it is hard to conduct similar studies on those platforms. The initial keywords selected for identifying hateful users were in English; this biases the initial belief value assignment, as users who post non-English hate speech would not be detected directly. However, since these users follow similar hateful users and repost much of their content, we would still detect many of them. We plan to take up the multilingual aspect as immediate future work. Another major limitation is the high-precision focus of our method, which leaves out several users who could have been hateful.
As part of future work, we also plan to use the discourse structure of the posts of these hateful users to better understand the tactics they use to spread hate speech. This would allow us to break hate speech discourse into multiple components and study them in detail.
In this paper, we perform the first temporal analysis of hate speech in online social media. Using an extensive dataset of 21M posts by 314K users on Gab, we divide the dataset into multiple snapshots and assign a hate score to each user at every snapshot. We then check for variations in the hate scores of users and characterize the account types on the basis of text and network structure. We observe that a large fraction of hateful users occupy the core of the Gab network, and that they reach the core much faster than non-hateful users. The language of the hateful users seems to pervade the whole network and to convert even benign users, unknowingly, to the language of hate. Our work should be useful to platform designers for detecting hateful users at an early stage and introducing appropriate measures to change those users’ stance.
- Using KNN and SVM based one-class classifier for detecting online radicalization on Twitter. In International Conference on Distributed Computing and Internet Technology, pp. 431–442. Cited by: §2.
-  (1981) Foundations of information integration theory. Cited by: §6.
-  (2015) Evolving twitter: an experimental analysis of graph properties of the social graph. arXiv preprint arXiv:1510.01091. Cited by: §5.
-  (2017) Deep learning for hate speech detection in tweets. WWW, pp. 759–760. Cited by: §1, §2, §2.
-  (2016) Counterspeech on twitter: a field study. Dangerous Speech Project. Available at: https://dangerousspeech.org/counterspeech-on-twitter-a-field-study. Cited by: §1.
-  (2014) Countering dangerous speech: new ideas for genocide prevention. Washington, DC: US Holocaust Memorial Museum. Cited by: §10.
- Us and them: identifying cyber hate on Twitter across multiple protected characteristics. EPJ Data Science 5 (1), pp. 11. Cited by: §2.
-  (2019) Controlling polarization in personalization: an algorithmic framework. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 160–169. Cited by: §10.
-  (2013) Models for the diffusion of beliefs in social networks: an overview. IEEE Signal Processing Magazine 30 (3), pp. 16–29. Cited by: §6.
-  (2017) You can’t stay here: the efficacy of reddit’s 2015 ban examined through hate speech. PACMHCI 1, pp. 31:1–31:22. Cited by: §1.
-  (2017) The bag of communities: identifying abusive behavior online with preexisting internet data. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 3175–3187. Cited by: §2.
-  (2007) Mining communities and their relationships in blogs: a study of online hate groups. International Journal of Human-Computer Studies 65 (1), pp. 57–70. Cited by: §2.
-  (2011) Detecting offensive language in social medias for protection of adolescent online safety. Ph.D. Dissertation. The Pennsylvania State University. Cited by: §2.
-  (2013) No country for old members: user lifecycle and linguistic change in online communities. In Proceedings of the 22nd international conference on World Wide Web, pp. 307–318. Cited by: §9, §9, footnote 10.
-  (2017) Automated hate speech detection and the problem of offensive language. In Eleventh International AAAI Conference on Web and Social Media, Cited by: §1, §2, §2, §2.
-  (2018) Hate speech dataset from a white supremacy forum. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pp. 11–20. Cited by: §2.
-  (1974) Reaching a consensus. Journal of the American Statistical Association 69 (345), pp. 118–121. Cited by: §6, §6.
-  (2005) Wilcoxon–Mann–Whitney test. Encyclopedia of Statistics in Behavioral Science. Cited by: §7.
-  (2015) Hate speech detection with comment embeddings. In Proceedings of the 24th international conference on world wide web, pp. 29–30. Cited by: §2.
-  (2018) Hate lingo: a target-based linguistic analysis of hate speech in social media. In Twelfth International AAAI Conference on Web and Social Media, Cited by: §1, §2, §6, §7.
-  (2015) Labelling and discrimination: do homophobic epithets undermine fair distribution of resources?. British Journal of Social Psychology 54 (2), pp. 383–393. Cited by: §2.
-  (2018) A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR) 51 (4), pp. 85. Cited by: §2, §2.
-  (2018) Large scale crowdsourcing and characterization of twitter abusive behavior. In Twelfth International AAAI Conference on Web and Social Media, Cited by: §2.
-  (2015) Countering online hate speech. Unesco Publishing. Cited by: §1, §2.
-  (2017) Detecting online hate speech using context aware models. arXiv preprint arXiv:1710.07395. Cited by: §2.
-  (2017) Safe spaces & free speech: effects of moderation policy on structures of online forum discussions. In Proceedings of the 50th Hawaii International Conference on System Sciences, Cited by: §2.
-  (2015) A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering 10 (4), pp. 215–230. Cited by: §2.
-  (2010) Naive learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics 2 (1), pp. 112–49. Cited by: §6.
-  (1985) The effect of an overheard ethnic slur on evaluations of the target: how to spread a social disease. Journal of Experimental Social Psychology 21 (1), pp. 61–72. Cited by: §2.
-  (2018) All you need is “love”: evading hate-speech detection. arXiv preprint arXiv:1808.09115. Cited by: §2.
-  (2007) Using a semi-automatic keyword dictionary for improving violent web site filtering. In 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, pp. 337–344. Cited by: §2.
-  (2018) A review of standard text classification practices for multi-label toxicity identification of online content. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pp. 21–25. Cited by: §2.
-  (2013-08) Scalable modified Kneser-Ney language model estimation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, pp. 690–696. Cited by: §9.
-  (1999) Data clustering: a review. ACM computing surveys (CSUR) 31 (3), pp. 264–323. Cited by: §6.
-  (2010) Identification of influential spreaders in complex networks. Nature physics 6 (11), pp. 888. Cited by: §9.
-  (2011) Anti-preferential attachment: if i follow you, will you follow me?. In Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third Inernational Conference on Social Computing (SocialCom), 2011 IEEE Third International Conference on, pp. 339–346. Cited by: §5.
- New classification models for detecting hate and violence web content. In 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), Vol. 1, pp. 487–495. Cited by: §2.
-  (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1, pp. 281–297. Cited by: §6.
-  (2018) Opinion conflicts: an effective route to detect incivility in twitter. Proceedings of the ACM on Human-Computer Interaction 2 (CSCW), pp. 117. Cited by: §2.
-  (2016) Locating influential nodes in complex networks. Scientific reports 6, pp. 19307. Cited by: §9.
-  (2019) The core decomposition of networks: theory, algorithms and applications. Cited by: §9.
- Challenges in discriminating profanity from hate speech. Journal of Experimental & Theoretical Artificial Intelligence 30 (2), pp. 187–202. Cited by: §2.
-  (2018) Spread of hate speech in online social media. arXiv preprint arXiv:1812.01693. Cited by: §3, §6, §6, §6.
-  (2018) Analyzing the hate and counter speech accounts on twitter. arXiv preprint arXiv:1812.02712. Cited by: §10.
-  (2018) Thou shalt not hate: countering online hate speech. arXiv preprint arXiv:1808.04409. Cited by: §1, §10.
-  (2010) A call to educate, participate, invoke and indict: understanding the communication of online hate groups. Communication Monographs 77 (2), pp. 257–280. Cited by: §2.
-  (2011) We know who you followed last summer: inferring social link creation times in twitter. In Proceedings of the 20th international conference on World wide web, pp. 517–526. Cited by: §5.
-  (2013) Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pp. 3111–3119. Cited by: §6.
-  (2017) A measurement study of hate speech in social media. In Proceedings of the 28th ACM Conference on Hypertext and Social Media, pp. 85–94. Cited by: §1.
-  (2003) Ethnophaulisms and exclusion: the behavioral consequences of cognitive representation of ethnic immigrant groups. Personality and Social Psychology Bulletin 29 (8), pp. 1056–1067. Cited by: §2.
-  (2004) Immigrant suicide rates as a function of ethnophaulisms: hate speech predicts death. Psychosomatic Medicine 66 (3), pp. 343–348. Cited by: §2.
-  (2016) Abusive language detection in online user content. In Proceedings of the 25th international conference on world wide web, pp. 145–153. Cited by: §2.
-  (2018) The effect of extremist violence on hateful speech online. In ICWSM, Cited by: §1, §2.
-  (2018) Framing hate with hate frames: designing the codebook. In Companion of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing, pp. 201–204. Cited by: §11.
-  (2018) Leveraging intra-user and inter-user representation learning for automated hate speech detection. In NAACL, Vol. 2, pp. 118–123. Cited by: §2.
-  (2018) Characterizing and detecting hateful users on twitter. In Twelfth International AAAI Conference on Web and Social Media, Cited by: §2, §6, §6, §9.
-  (2019) Auditing radicalization pathways on youtube. arXiv preprint arXiv:1908.08313. Cited by: §10.
-  (2017) A web of hate: tackling hateful speech in online social spaces. arXiv preprint arXiv:1709.10159. Cited by: §2.
-  (2018) Anatomy of online hate: developing a taxonomy and machine learning models for identifying and classifying hate in online news media. In Twelfth International AAAI Conference on Web and Social Media, Cited by: §2.
- A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pp. 1–10. Cited by: §2.
-  (2016) CoreScope: graph mining using k-core analysis—patterns, anomalies and algorithms. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 469–478. Cited by: §9, §9.
-  (2016) Analyzing the targets of hate in online social media. In Tenth International AAAI Conference on Web and Social Media, Cited by: §1.
-  (2012) Profanity use in online communities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1481–1490. Cited by: §2.
-  (2018) Exposure to hate speech increases prejudice through desensitization. Aggressive behavior 44 (2), pp. 136–146. Cited by: §10, §2.
-  (2018) Identifying aggression and toxicity in comments using capsule network. In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), pp. 98–105. Cited by: §2.
-  (2018) Identifying and categorising profane words in hate speech. In Proceedings of the 2nd International Conference on Compute and Data Analysis, pp. 65–69. Cited by: §2.
-  (2009) Nudge: improving decisions about health, wealth, and happiness. Penguin. Cited by: §10.
-  (2016) Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In Proceedings of the NAACL student research workshop, pp. 88–93. Cited by: §2.
-  (2015) On a modified degroot-friedkin model of opinion dynamics. In 2015 American Control Conference (ACC), pp. 1047–1052. Cited by: §6.
-  (2018) What is gab: a bastion of free speech or an alt-right echo chamber. In Companion of the The Web Conference 2018 on The Web Conference 2018, pp. 1007–1014. Cited by: §1, §3.
-  (2015) Who influenced you? predicting retweet via social influence locality. ACM Transactions on Knowledge Discovery from Data (TKDD) 9 (3), pp. 25. Cited by: §9.
-  (2018) Detecting hate speech on twitter using a convolution-gru based deep neural network. In European Semantic Web Conference, pp. 745–760. Cited by: §1, §2.
-  (2005) US domestic extremist groups on the web: link and content analysis. IEEE intelligent systems 20 (5), pp. 44–51. Cited by: §2.