You too Brutus! Trapping Hateful Users in Social Media: Challenges, Solutions and Insights

08/01/2021 ∙ by Mithun Das, et al. ∙ IIT Kharagpur

Hate speech is regarded as one of the crucial issues plaguing online social media. The current literature on hate speech detection leverages primarily the textual content to find hateful posts and subsequently identify hateful users. However, this methodology disregards the social connections between users. In this paper, we run a detailed exploration of the problem space and investigate an array of models ranging from purely textual to graph based to finally semi-supervised techniques using Graph Neural Networks (GNN) that utilize both textual and graph-based features. We run exhaustive experiments on two datasets – Gab, which is loosely moderated, and Twitter, which is strictly moderated. Overall the AGNN model achieves 0.791 macro F1-score on the Gab dataset and 0.780 macro F1-score on the Twitter dataset using only 5% of the labelled instances, outperforming the other models including the fully supervised ones. We perform detailed error analysis on the best performing text and graph based models and observe that hateful users have unique network neighborhood signatures and the AGNN model benefits by paying attention to these signatures. This property, as we observe, also allows the model to generalize well across platforms in a zero-shot setting. Lastly, we utilize the best performing GNN model to analyze the evolution of hateful users and their targets over time in Gab.


1. Introduction

Hate speech is regarded as one of the major issues plaguing online social media, making its detection a crucial task. Hate speech detection is itself challenging, with the literature including techniques such as dictionary-based approaches (Guermazi et al., 2007), distributional semantics (Djuric et al., 2015), multi-feature models (Salminen et al., 2018) and neural networks (Badjatiya et al., 2017).

The majority of these systems rely solely or mostly on the textual content of the post. However, hate speech is highly subjective, reliant on temporal, social and historical context, and occurs sparsely (Schmidt and Wiegand, 2017). Hate speech is often crafted in a subtle manner and cannot be precisely identified solely using text-based features. Furthermore, hate speech detection from text is a closed-loop problem (MacAvaney et al., 2019): since hateful users are conscious of the detection systems, they introduce deliberate spelling errors or code names for their targets to spread hate speech. Hence, a better approach is to consider the users' profiles, interests, posting behaviour, and the people they are linked to in online social networks (OSNs). Recently, there have been efforts to leverage this user information to increase performance (Qian et al., 2018), but a critical analysis of such network-based methods is lacking in the literature. In this work, we explore several graph-based methods to detect hateful users in online social networks while focusing on the challenges involved. We further provide insights and observations based on our analysis and suggest future research directions.

Challenges: While the motivation to escalate the detection problem from the post level to the user level looks apparent, one has to note that it is extremely tedious to obtain large-scale annotated data at the user level. This is because annotators would need to look through all (or a majority) of a user's posts to designate the user as hateful. Therefore, the option of adapting existing popular supervised text-based models to detect hateful users is ruled out, since these are data hungry and would typically not perform well in low-data settings.

Solution: Recently, a number of studies have tried to use graph-based algorithms; however, most of these do not perform an extensive comparison with supervised algorithms (Mishra et al., 2018). In this paper we attempt to bridge this gap by presenting a rigorous evaluation of the graph algorithms through few-shot and cross-platform test scenarios. We utilize several text- and graph-based methods for identifying hateful users, including LSTM, doc2vec, BERT, node2vec, DeepWalk, GCN and GraphSAGE. We test these models on two different datasets (Twitter and Gab). To test the models' capability to work with less data, we evaluate their performance on 5%, 10%, 15%, 20%, 50%, and 80% of the training data. In the Gab dataset, users are nodes and the edges represent the follower-followee relationship. In the Twitter dataset, users are nodes and the edges represent retweet information. We thus experiment on two types of networks and report the performance.

Our key contributions are noted below.

  • We create a dataset of 423 hateful users and 375 non-hateful users from the social media platform Gab, where each user was adjudged to be hateful or not based on the profile information. For Twitter, we make use of the data already made available by Ribeiro et al. (2018). We have made the Gab dataset along with all the codes public for advancing research in hate speech (https://github.com/hate-alert/Hateful-users-detection).

  • We explore several supervised, unsupervised and semi-supervised machine learning models, including the state-of-the-art deep learning models, to classify users as hateful or non-hateful.

  • We perform detailed error analysis on the best performing text-based and graph-based models.

  • We apply the trained model to label the entire Gab dataset and perform a post-facto analysis of the evolution of hatred across certain target communities.

The most important observations that we make are as follows.

  • We observe that semi-supervised approaches using GNNs, which leverage both the textual features and the social connections between users, significantly outperform other models. For the Gab dataset, the best GNN model achieves a macro F1-score of 0.791, and for the Twitter dataset it achieves a macro F1-score of 0.780, with only 5% of the labelled instances. This performance is comparable with that of supervised machine learning classifiers like LSTM and doc2vec, which use the entire set of labelled instances for training.

  • On cross-platform zero-shot evaluation, we observe that AGNN performs better than the best text-based models. The attention on the neighborhood structure (learnt from one dataset) seems to capture the signature characteristic of the users and constitutes one of the most effective ingredients for predictions on an unseen dataset.

  • A detailed error analysis reveals that when words from the hate lexicon are infrequent in a user’s post, standard supervised models (e.g., doc2vec+LR) are unable to detect if the user is hateful; in contrast, graph based semi-supervised models (e.g., AGNN) are able to detect such a user correctly by making use of the hateful influential nodes (read users) in the neighbourhood of that user.

  • Post-facto analysis of the machine-labelled Gab data shows that ethnic and religious groups like Blacks, Jews and Muslims face ever-increasing hatred from the community; i.e., as the site grows older, the hatred against these communities keeps rising.

2. Related work

Hate speech detection: Hate speech is a complex phenomenon, intrinsically associated with relationships among groups and reliant on linguistic nuances (Fortuna and Nunes, 2018). The public expression of such hate speech has been shown to devalue members of the minority community (Greenberg and Pyszczynski, 1985). To tackle this issue, researchers have developed methods to detect such hateful content. Some of the initial works on hate speech detection relied on lexicons (Warner and Hirschberg, 2012; Gitari et al., 2015). Liu and Forss (2015) incorporated LDA topic modelling to improve the performance of the hate speech detection task. Saleem et al. (2017) proposed an approach to detect hateful speech using self-identifying hate communities as training data for hate speech classifiers, thus bypassing the expensive annotation process.

Deep learning approaches: Recently, larger datasets for hate speech detection have been made available (Davidson et al., 2017; Founta et al., 2018; de Gibert et al., 2018). Most of these datasets have hate speech class as the minority. Researchers have also started using deep learning methods (Zhang et al., 2018) and graph embedding techniques (Ribeiro et al., 2018) to detect hate speech. Badjatiya et al. (2017) applied several deep learning architectures and improved the benchmark score by 18 F1 points. Zhang et al. (2018) used novel deep neural models to improve the results on 6 out of 7 datasets.

Hate speech at user level: While most computational approaches focus on detecting whether a given text contains hate speech, very few works focus on detecting this at the user level. Detecting hate speech at the user level allows algorithms to use additional dimensions, such as user activity and connections, which could help in improving the classifier performance (Ribeiro et al., 2018). Qian et al. (2018) propose a model that learns intra-user and inter-user representations for hate speech detection.

In our paper, we explore an array of methods to detect hateful accounts on Gab and Twitter. The work by Ribeiro et al. (2018) is closest to ours and hence we describe it briefly here. Ribeiro et al. (2018) build a Twitter retweet graph and use a graph embedding approach to detect hateful users. They collect and annotate around 5K users on Twitter and characterize their Twitter accounts. The authors then employ a node embedding algorithm (GraphSAGE (Hamilton et al., 2017)), which exploits the graph structure, and show that it outperforms content-based approaches for hateful user detection. Our work, on the other hand, explores the performance of supervised/semi-supervised models that use both the content and the network structure to detect hateful users on Gab and Twitter. The uniqueness of our models lies in the use of a very small number of labelled instances. Further, we add two novelties: (i) an extensive error analysis which shows how the neighborhood characteristic of hateful users benefits GNN models, and (ii) predictions in the zero-shot setting.

3. Methodology

In this section, we present a suite of models for hateful user detection, ranging from text-based approaches, to graph-based approaches, to finally text+graph approaches. The methods are enumerated below. This pipeline of efforts provides a hitherto unreported, complete picture of how the different online attributes can be effective in determining hateful users.

Figure 1. Schematic representation of a GNN classifier (in this case GCN), which takes as input the social network of the users and the corresponding feature embeddings of each user. After applying "n" GCN filters on the input data, we obtain the output, i.e., the probability of the user being hateful or not.

Figure 2. We show the distribution of the belief scores of the users. The dashed lines in the diagram correspond to the centroids of “low”, “medium” and “high” hate intensity.

3.1. Text-based classification

Here the idea is to classify users as hateful or non-hateful solely based on the content of their posts. We apply standard pre-processing techniques on each post to remove urls, mentions, hashtags, emoticons and other stray characters. Finally, we concatenate all the posts of a user into a single paragraph/document and provide it as input to the classifiers. We experiment with the following text classifiers.
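A minimal sketch of this preprocessing and document-construction step; the exact cleaning rules are not given in the paper, so the regular expressions below are illustrative:

```python
import re

def clean_post(text):
    """Strip urls, mentions, hashtags, emoticons and stray characters."""
    text = re.sub(r"https?://\S+", " ", text)   # urls
    text = re.sub(r"[@#]\w+", " ", text)        # mentions and hashtags
    text = re.sub(r"[^\w\s.,!?']", " ", text)   # emoticons / stray characters
    return re.sub(r"\s+", " ", text).strip()

def user_document(posts):
    """Concatenate all cleaned posts of a user into one document."""
    return " ".join(clean_post(p) for p in posts)
```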

fastText: We leverage the pre-trained fastText embeddings by Grave et al. (2018), which have been trained on Common Crawl and Wikipedia. Each user's document is represented as a 100 dimensional vector. We use logistic regression as the classifier.

GloVe: We use the pre-trained GloVe embeddings (Pennington et al., 2014), which have been trained on 2B tweets, to represent each word as a 100 dimensional vector. We then represent a user as the mean of the GloVe embeddings of all the words in his/her posts. We use logistic regression as the classifier.

LSTM: We experiment with a standard LSTM (Hochreiter and Schmidhuber, 1997) with randomly initialized embeddings, applied over the user document. The loss is binary cross-entropy. The model is run for 10 epochs with the Adam optimizer using the default parameters of keras (https://bit.ly/2zBs87Z).

Doc2vec: We apply the doc2vec model (Le and Mikolov, 2014) on the user document to generate a default 100 dimensional document embedding for each user. We use logistic regression as the classifier. We use the default hyper-parameters of the doc2vec implementation available in gensim (Rehurek and Sojka, 2010).
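A sketch of this doc2vec+LR pipeline with gensim and scikit-learn; the user_docs and labels containers are assumed inputs, not names from the paper:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

# user_docs: {user_id: document string}; labels: {user_id: 0/1} (assumed inputs)
tagged = [TaggedDocument(words=doc.split(), tags=[uid])
          for uid, doc in user_docs.items()]
d2v = Doc2Vec(tagged, vector_size=100)   # otherwise gensim defaults, as in the text

X = [d2v.dv[uid] for uid in labels]      # one 100-d vector per labelled user
y = [labels[uid] for uid in labels]
clf = LogisticRegression(max_iter=1000).fit(X, y)
```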

BERT: In order to see if contextual embeddings can better represent a user, we fine-tuned the already pre-trained BERT model. For fine-tuning BERT we follow a setup similar to DocBERT (Adhikari et al., 2019). For each user in the labelled set, we combine all the posts of that user and consider it as a document to be used in training the model. While tokenizing a document we consider the first 512 tokens, which is the input limit of BERT. We train the BERT model for 10 epochs with a default learning rate of 2e-5 and retain the model with the best validation score.
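A condensed sketch of this fine-tuning with HuggingFace Transformers; no batching or validation-based model selection is shown, and full-batch training is an illustrative simplification:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)   # learning rate from the text

# docs: list of user documents; labels: list of 0/1 labels (assumed inputs)
enc = tok(docs, truncation=True, max_length=512,       # BERT's 512-token input limit
          padding=True, return_tensors="pt")
y = torch.tensor(labels)

model.train()
for epoch in range(10):                                # 10 epochs, as in the text
    opt.zero_grad()
    out = model(**enc, labels=y)                       # loss computed internally
    out.loss.backward()
    opt.step()
```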

TSVM: We use the Transductive Support Vector Machine (TSVM) model (Joachims, 1999) for semi-supervised classification. Applying doc2vec (Le and Mikolov, 2014) on the user document yields the user feature vector. TSVM aims to learn the manifold between the hateful and non-hateful users by leveraging the feature vectors of both the labelled and unlabelled users. We use the default hyper-parameters of the implementation (https://bit.ly/3iviua2).

3.2. Network embeddings

Research on hate speech in social media has revealed that hateful users are densely connected in the network (Mathew et al., 2020). They exhibit a strong degree of homophily and have high reciprocity values (Ribeiro et al., 2018; Mathew et al., 2019a). Consequently, the network structure might provide additional insights for detecting hateful users. Network or node embeddings enable us to project the nodes to a lower-dimensional latent embedding space while preserving their network characteristics, and have been instrumental in node classification (Cui et al., 2019). We explore the embeddings generated by DeepWalk and node2vec in this work.

DeepWalk: DeepWalk (Perozzi et al., 2014) performs random walks on the network to generate sequences of nodes. Each sequence of nodes can be imagined as a sentence, with each node representing a word. Applying word2vec on these simulated sentences generates an embedding for each node. We apply DeepWalk on the network to learn a 128 dimensional embedding for each node and apply logistic regression on the learned representations to classify the users. The hyper-parameters are 10 random walks per node with a walk-length of 80 and a window-size of 10.
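A sketch of DeepWalk with the stated hyper-parameters, treating walks as sentences for gensim's Word2Vec; the adjacency-list representation adj is an assumed input:

```python
import random
from gensim.models import Word2Vec

def random_walks(adj, num_walks=10, walk_len=80):   # 10 walks of length 80 per node
    """adj: {node: [neighbour, ...]}; returns walks as 'sentences' of node ids."""
    walks = []
    for _ in range(num_walks):
        for node in adj:
            walk = [node]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(random.choice(adj[walk[-1]]))
            walks.append([str(n) for n in walk])
    return walks

walks = random_walks(adj)                           # adj: the user graph (assumed)
w2v = Word2Vec(walks, vector_size=128, window=10, sg=1, min_count=1)
embedding = {n: w2v.wv[str(n)] for n in adj}        # 128-d vector per node
```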

Node2vec: The node2vec algorithm (Grover and Leskovec, 2016) learns low-dimensional representations for nodes in a graph that maximize the likelihood of preserving the network neighborhoods of the nodes. We apply node2vec on the network to learn a 128 dimensional embedding for each node, using the default hyper-parameters, and apply logistic regression as the classifier.

3.3. Graph neural networks (GNN)

Node embeddings are typically shallow encoders; they do not allow parameter sharing, nor do they incorporate node features (Hamilton et al., 2017). To overcome these limitations, we employ Graph Neural Networks (GNN). A GNN operates on a graph and can be envisioned as a neural architecture with one or more hidden layers H^{(l)}. Each successive layer is the output of an activation function f that takes the current hidden layer and the adjacency matrix A:

(1)  H^{(l+1)} = f(H^{(l)}, A),

with H^{(0)} = X and H^{(L)} = Z, where X is the input feature matrix, Z is the final feature matrix (in node classification, Z holds the predicted label for each node) and L is the number of layers. GNN variants differ in their choice of the propagation/activation function. We use the following variants of GNNs in our experiments.

ChebNet: The GNN model by Defferrard et al. (2016) approximates the spectral convolution filter using Chebyshev polynomials of the diagonal matrix of eigenvalues of the input graph. The applied filters are spatially localized and can thus extract features of a node's neighbours independent of the graph size.

GCN: The GCN model by Kipf and Welling (2017) uses a localized first-order approximation of ChebNet with certain re-normalization tricks to enhance performance.
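For reference, the renormalized layer-wise propagation rule of Kipf and Welling (2017) reads

```latex
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
\qquad \tilde{A} = A + I_N, \qquad \tilde{D}_{ii} = \sum\nolimits_j \tilde{A}_{ij},
```

where σ is a non-linearity and W^{(l)} is the trainable weight matrix of layer l; the self-loops added in Ã keep a node's own features in the aggregation.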

GraphSAGE: The GraphSAGE model by Hamilton et al. (Hamilton et al., 2017) performs spatial convolution operations on the graph by sampling a fixed number of neighbours for a node and aggregating its features.

AGNN: The AGNN model by Thekumparampil et al. (2018) applies an attention-based mechanism over a linear propagation function to aggregate neighbourhood information. Simply put, it assigns varying degrees of importance to a node's neighbours, instead of treating all neighbours uniformly.

ARMA: The ARMA model by Bianchi et al. (2019) utilizes auto regressive moving average (ARMA) filters instead of polynomial filters to perform convolution operations.

GAT: The GAT model (Veličković et al., 2018) leverages masked self-attentional layers to assign different importance to nodes of the same neighborhood, enabling a leap in model capacity.

Popular GNN models like GCN and GraphSAGE have been frequently used in the literature for node classification (Morris et al., 2018) and hence they have been employed in this work. In fact, the GraphSAGE model was leveraged by Ribeiro et al. (2018) to classify hateful users on Twitter and has the highest reported performance amongst all the supervised frameworks. We also experiment with the recent AGNN, ARMA and GAT models in this study, since they have been shown to outperform the GCN model on the node classification task across several datasets (Thekumparampil et al., 2018; Bianchi et al., 2019).

3.4. GNN + text setup

We adopt the same setup for the different variants of GNN. The graph is the followership network for Gab and the retweet network for Twitter. In order to take advantage of the textual content, the input feature to the model is set to a 100 dimensional embedding obtained by applying doc2vec on the user document (we choose doc2vec as the input feature embedding since it outperforms other embeddings like fastText and GloVe in the supervised setting, as shown in Table 1). The first filter (Conv1) performs convolution on the input feature vector and produces a 32 dimensional feature vector, while the second filter (Conv2) further reduces it to a 2 dimensional feature vector. A ReLU layer is added between Conv1 and Conv2 for non-linearity. We pass the final feature vector through a log-softmax layer with negative log-likelihood loss. This gives the probability of the user being hateful or not. The model is run for 200 epochs with the Adam optimizer and a dropout of 0.2. In Figure 1 we show the schematic representation of a GNN classifier.
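A minimal sketch of this setup in PyTorch Geometric, using GCNConv as the example filter; the dropout placement and the unspecified optimizer hyper-parameters are assumptions:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class HatefulUserGNN(torch.nn.Module):
    """Two graph filters: 100-d doc2vec input -> 32 -> 2 classes."""
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(100, 32)
        self.conv2 = GCNConv(32, 2)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))            # ReLU between Conv1 and Conv2
        h = F.dropout(h, p=0.2, training=self.training)  # dropout placement assumed
        return F.log_softmax(self.conv2(h, edge_index), dim=1)

# data: torch_geometric.data.Data with doc2vec features, edges and masks (assumed)
model = HatefulUserGNN()
opt = torch.optim.Adam(model.parameters())  # lr / weight decay: defaults assumed
model.train()
for epoch in range(200):
    opt.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    opt.step()
```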

4. Dataset

For our experiments we have used the Gab (Mathew et al., 2019a) and the Twitter (Ribeiro et al., 2018) datasets. Gab is a social media platform which promotes itself as a "Champion of free speech", but has been criticized for being an echo-chamber for alt-right users (Zannettou et al., 2018). The site is very similar to Twitter in terms of creating posts and following others, but has a loose moderation policy. Consequently, it is possible to retrieve hateful posts of users, which would otherwise be difficult on any other platform.

Twitter, on the other hand, is a much more mainstream social media platform with stricter moderation policies. Consequently, Twitter has a significantly larger proportion of non-hateful users and closely mimics the real-world distribution.

4.1. Gab data

Data sampling: We have used the Gab dataset (Mathew et al., 2019a), which comprises 381K users along with their posts and their followership network. To ensure sufficient representation of hateful and non-hateful users, we used the sampling strategy from Mathew et al. (2020). In this strategy, a lexicon of 45 high-precision hate terms (like 'kike', 'ni*ger') is used to identify hateful posts (lexicon available here: https://goo.gl/8iHTDP). An initial seed set of 2,769 hateful users was created from the users who have posted at least 10 such posts. Then a repost network was created, where nodes represent users and edge-weights denote the reposting frequency. This repost network is converted to its corresponding belief network by reversing the edges and normalizing the edge-weights between 0 and 1, as outlined in (Mathew et al., 2019a). Afterwards, the belief score of each user was computed using a diffusion model (Golub and Jackson, 2010): an initial belief score of 1 was assigned to the hateful users and 0 to the others, and the final belief values of all users in the network were obtained after five iterations of the diffusion process. The users are then clustered on the basis of this score into three tiers – "low", "medium" and "high" – using a 1D k-means algorithm. We show the distribution of the belief scores along with the three tiers in Figure 2. The three tiers allow us better control in selecting the number of hateful and non-hateful users for annotation, which would not be possible with random sampling.
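A compact sketch of this diffusion-and-clustering step, assuming a belief matrix W (from the reversed, [0,1]-normalized repost network) and a boolean seed mask have already been built:

```python
import numpy as np
from sklearn.cluster import KMeans

# W: belief matrix from the reversed, normalized repost network (n x n)
# seed: boolean mask marking the 2,769 seed hateful users (assumed inputs)
belief = seed.astype(float)                 # initial belief: 1 for seeds, else 0
for _ in range(5):                          # five iterations of the diffusion
    belief = W @ belief

tiers = KMeans(n_clusters=3, n_init=10).fit_predict(belief.reshape(-1, 1))
# tiers partitions users into the "low", "medium" and "high" hate-intensity groups
```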

Annotation guidelines: To annotate the users as hateful, we followed the hate speech definition proposed by ElSherief et al. (2018). Annotators were asked to go through all the posts of a user (rather than considering only isolated derogatory words) and use them to estimate whether the account is hateful. Specifically, we asked the annotators to judge whether a Gab user endorses content that is humiliating, attacking or insulting towards groups or individuals based on their race, ethnicity, national origin, religion, sex, gender, sexual orientation, disability or disease. The labelling of users was carried out by two PhD students with extensive prior experience in hate speech annotation. We kept the user's information (e.g., user name, follower/followee counts) hidden from the annotators to maintain the privacy of the users.

Gold labels for training and evaluation: We randomly sample 300 users from each of the three tiers with the additional constraint that the user must have posted at least 10 times. Using the annotation guidelines, the annotators manually annotated each of these 900 users as hateful or non-hateful (dubious cases which arose as a result of conflict were dropped). We achieved an inter-annotator agreement of 0.772 (Cohen's κ). On completion of the annotation and after dropping the dubious cases, we were left with 423 hateful and 375 non-hateful users, constituting our set of 798 labelled instances.

Followership network to train GNNs: We then construct a 1.5-degree network of these labelled users, which consists of their immediate followers, followings and the connections among themselves. The nodes in the network represent user accounts and the edges represent the following relationship: a directed edge from user u to user v means that u follows v. We filter the graph further by removing users with fewer than 10 posts. The filtered graph has 47K users and 13.8M edges, and constitutes the network used to train the GNNs for the Gab data.
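A sketch of the 1.5-degree network construction with networkx; the graph G, the labelled user set and the per-user post counts are assumed inputs:

```python
import networkx as nx

# G: full directed followership graph; labelled: annotated user ids;
# posts_count: {user: number of posts} (all assumed inputs)
nodes = set(labelled)
for u in labelled:
    nodes.update(G.successors(u))     # accounts that u follows
    nodes.update(G.predecessors(u))   # followers of u

sub = G.subgraph(nodes).copy()        # 1.5-degree network around labelled users
sub.remove_nodes_from([n for n in list(sub) if posts_count.get(n, 0) < 10])
```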

Ethical considerations: We only analyzed publicly available data. We followed standard ethical guidelines (Rivers and Lewis, 2014), making no attempts to track users across sites or deanonymize them. Also, taking user privacy into account, we anonymized user information such as user names and user ids.

4.2. Twitter data

We also experiment with the publicly available Twitter dataset (Ribeiro et al., 2018), which followed a similar procedure to collect and label the users as hateful or non-hateful. The authors sampled and labelled 544 users as hateful and 4,427 users as non-hateful. Here the network is a retweet graph (as opposed to a followership graph) consisting of 100K nodes and 2.28M edges. A retweet graph is a directed graph where each node represents a user on Twitter, and a directed edge from user u to user v indicates that u has retweeted v. Since followership information is not present for this dataset, we use the retweet network as a proxy, as also assumed by Ribeiro et al. (2018).

5. Experiments and results

5.1. Experimental setup

To evaluate our models, we use k-fold stratified cross validation, which is beneficial for evaluating models with little labelled data (Yadav and Shukla, 2016). We set k to 5; for each fold, up to 80% of the dataset is used for training and the remaining 20% for testing. Further, to simulate a resource-constrained setting, we take only x% of the data (out of the data available in each fold) for training. We report the average performance of a model across the 5 folds, varying x as 5, 10, 15, 20, 50 and 80. The same 20% is always held out across all models for testing so that the comparison is fair.
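A sketch of this evaluation protocol with scikit-learn; the interpretation that x% is measured against the full labelled set (hence the frac / 0.80 rescaling) and the evaluate helper are our assumptions:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# X, y: feature matrix and labels as numpy arrays; evaluate(): fit a model on the
# train indices and return the macro F1 on the test indices (assumed helper)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for frac in [0.05, 0.10, 0.15, 0.20, 0.50, 0.80]:
    scores = []
    for train_idx, test_idx in skf.split(X, y):   # the 20% test fold is shared
        if frac < 0.80:                           # subsample the 80% training pool
            train_idx, _ = train_test_split(
                train_idx, train_size=frac / 0.80,
                stratify=y[train_idx], random_state=42)
        scores.append(evaluate(train_idx, test_idx))
    print(f"{frac:.0%}: {np.mean(scores):.3f}")
```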

                 Gab                                Twitter
Method      5%    10%    15%    20%    50%    80%    5%    10%    15%    20%    50%    80%
fastText 0.492 0.537 0.571 0.603 0.690 0.709 0.624 0.634 0.648 0.651 0.670 0.676
Glove 0.695 0.720 0.745 0.750 0.778 0.784 0.650 0.666 0.674 0.681 0.691 0.695
LSTM 0.579 0.600 0.605 0.608 0.622 0.645 0.514 0.487 0.567 0.564 0.592 0.608
Doc2vec 0.733 0.767 0.783 0.779 0.779 0.781 0.715 0.715 0.719 0.729 0.749 0.758
BERT 0.631 0.660 0.682 0.701 0.740 0.764 0.603 0.665 0.690 0.709 0.729 0.740
TSVM 0.686 0.704 0.712 0.712 0.739 0.753 0.480 0.520 0.533 0.533 0.585 0.611
DeepWalk 0.652 0.676 0.700 0.713 0.723 0.734 0.757 0.764 0.767 0.767 0.773 0.779
Node2vec 0.647 0.672 0.695 0.704 0.725 0.744 0.692 0.720 0.732 0.734 0.749 0.748
GraphSAGE 0.778 0.808 0.806 0.811 0.827 0.828 0.762 0.773 0.774 0.780 0.782 0.777
GCN 0.721 0.735 0.730 0.738 0.751 0.758 0.756 0.759 0.767 0.773 0.776 0.770
AGNN 0.791 0.796 0.818 0.824 0.830 0.833 0.780 0.785 0.785 0.790 0.786 0.787
ARMA 0.765 0.778 0.783 0.797 0.809 0.805 0.757 0.760 0.761 0.762 0.770 0.769
ChebNet 0.778 0.802 0.796 0.798 0.805 0.812 0.746 0.750 0.754 0.762 0.761 0.766
GAT 0.683 0.718 0.725 0.726 0.745 0.758 0.757 0.774 0.781 0.777 0.787 0.782
Table 1. Performance of different models for classifying users on Gab and Twitter into hateful and non-hateful, based on the mean macro F1-score. The column x% means that x% of the labelled instances were used for training. Each method's inputs comprise some combination of the feature vectors of the labelled users only, the feature vectors of all users, the user labels, and the network. We perform 5-fold cross validation and report the mean macro F1-score across the 5 folds. The AGNN model outperforms the other models for almost all training fractions. The best performance is marked in bold and the second best is underlined.

5.2. Observations and insights

5.2.1. Observations

We report the performance of the different hateful user detection models on the Gab and the Twitter data in terms of the macro F1-score in Table 1 (we experimented with other metrics such as accuracy and observed a similar trend for both datasets; we use the macro F1-score to account for the high class imbalance in Twitter).

GNNs vs text classifiers: We observe that GNNs, which combine both textual and network features, exhibit improved performance over the individual text-based classifiers and the network embeddings. Amongst the GNNs, AGNN almost always achieves the highest performance, both in terms of accuracy and macro F1-score, across different amounts of labelled instances.

GNNs performance: The attention mechanism, which assigns varying importance to a node's neighbours, accounts for the improved performance of the AGNN model over the GCN model. It is to be noted that the followership/retweet network is orders of magnitude larger than the conventional networks on which earlier experimental results have been reported (the Gab followership network and the Twitter retweet network have 13.8M and 2.28M edges, as opposed to Cora (https://relational.fit.cvut.cz/dataset/CORA) and Pubmed (https://linqs.soe.ucsc.edu/data), which have only 5K and 44K edges, respectively); hence a first-order approximation may be inadequate in this setting. A similar argument holds for the improved performance of AGNN over GraphSAGE. GraphSAGE (Hamilton et al., 2017) essentially performs a linear approximation of a localized spectral convolution and is similar to the GCN model of Morris et al. (2018) barring a normalization constant.

Text classifier performance: We observe that the doc2vec model performs reasonably well in the supervised setting, particularly with small amounts of training data. However, on including the unlabelled instances, we notice a drop in the performance of the TSVM model with doc2vec feature vectors.

On the other hand, models which fine-tune their parameters on the available training data, such as LSTM, BERT, DeepWalk, node2vec and the GNNs, show improved performance as the number of labelled instances increases. Nevertheless, the ability of the GNN models (AGNN) to achieve a 0.79 macro F1-score with only 5% labelled instances on Gab and 0.78 on Twitter justifies the use of GNNs for detecting hateful users.

5.2.2. Insights

We aimed at comparing different classifiers to detect hateful users. Based on the results, it appears that GNNs (especially AGNN) can leverage the network and text features to improve performance on this task. To understand why this works, we perform a detailed error analysis on the best model using text-based features, i.e., doc2vec+LR, and the best model using text with network features, i.e., AGNN. In particular, we focus on two situations – (i) AGNN predicts the hateful users correctly but doc2vec+LR fails (AGNN wins), and (ii) doc2vec+LR predicts correctly but AGNN fails (doc2vec wins). For this purpose, we sampled at most 20 users for each situation from the predictions obtained on the Twitter and Gab networks. To interpret the results of doc2vec+LR, we use the LIME explainer (Ribeiro et al., 2016) to get the top 10 words most important for a prediction. Similarly, for the GNN, we use the GNN explainer (Ying et al., 2019) to obtain the top 10 most influential nodes (read users). We further annotate these users manually into hate and non-hate classes. We note our observations below.
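As an illustration, a hedged sketch of the LIME step; the predict_proba bridge from raw text to the doc2vec+LR pipeline is our own scaffolding, not code from the paper:

```python
from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["non-hateful", "hateful"])

def predict_proba(texts):
    """Bridge raw documents to the trained doc2vec + LR models (assumed to exist)."""
    vecs = [d2v.infer_vector(t.split()) for t in texts]
    return clf.predict_proba(vecs)

exp = explainer.explain_instance(user_doc, predict_proba, num_features=10)
print(exp.as_list())   # the 10 words most responsible for the prediction
```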

AGNN wins: We first consider the cases where AGNN predicts correctly but doc2vec does not. As a first step, from the posts of each user, we find the percentage of posts having words from the hate lexicon (HL posts) (Mathew et al., 2020). This was found to be around 2%. In contrast, in the cases where doc2vec is typically successful, the proportion of HL posts was around 5%. Hence, it can be speculated that the doc2vec representation cannot capture the hate dimension properly. Manually checking the top 10 words from the posts of each user reveals that it captures the wrong features from the users' posts; examples of such words are 'fam', 'girlfriend', 'earthnext' and 'pack'. On the other hand, AGNN is able to make correct predictions because the user (to be classified) has several hateful neighbors in its vicinity: on average, 7 of the 10 influential nodes returned by the GNN explainer for that user are hateful. The attention on this neighborhood allows the model to learn a signature characteristic of the hateful users, which makes the predictions successful.

Doc2vec wins: Next, we consider the cases where doc2vec predicts correctly. We explore the top 10 influential nodes for these users as returned by the GNN explainer; on average, 6 of these nodes (read users) are non-hateful. The dominance of non-hateful users in the neighbourhood of these users hampers the predictions made by the AGNN model. This highlights the fact that GNN-based classification is less beneficial when detecting isolated hateful nodes (read users) in the network. Since such models rely not only on the textual representation of a user but also on its neighbourhood, the natural expectation is that homophily is prevalent in the network.

Contribution of the network: In order to understand the role of network properties in the classification task, we perform an additional experiment on the Gab data (since we have the actual follower-followee network). In this setup, we take a subset of the test set containing only the hateful users and use the AGNN model to predict the class on this subset. However, instead of using the doc2vec embeddings of the hateful users, we average the doc2vec embeddings of the non-hateful users from the test set and provide this as the input user embedding to the AGNN model. If the AGNN model were more reliant on the doc2vec embedding, one would expect to see a lot of misclassifications on this hate-user test set. Note that the network part of the hateful user remains the same; only the doc2vec embedding of the test set user is changed.

The results seem to suggest that, using the network information alone, the AGNN is able to (re)produce 51% of all the correct hate class predictions that would have been obtained by using the actual hateful user embeddings. We perform the same experiment with the non-hateful users, changing their embedding to the average doc2vec embedding of the hateful users. In this case, AGNN is able to produce the correct class for only 7% of the non-hateful users out of all that would have been produced if the actual non-hateful user embeddings were used. This shows that the hateful users have a discriminative neighborhood structure and the AGNN model benefits by attending to this structure.
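A sketch of this embedding-swap experiment, assuming the two-filter GNN interface from Section 3.4 and index tensors test_hate/test_nonhate over the test users (both assumptions):

```python
import torch

# x: doc2vec feature matrix; edge_index: the Gab graph; test_hate / test_nonhate:
# index tensors over the test users (all assumed, matching the sketch in Sec. 3.4)
mean_nonhate = x[test_nonhate].mean(dim=0)

x_swapped = x.clone()
x_swapped[test_hate] = mean_nonhate   # erase the textual signal of hateful users

model.eval()
with torch.no_grad():
    pred = model(x_swapped, edge_index).argmax(dim=1)
# compare pred[test_hate] against the original correct predictions: the fraction
# reproduced reflects how much the network alone carries the classification
```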

Methods   Train    Test     F1     F1 (H)   P (H)   R (H)
AGNN      Twitter  Gab      0.75   0.81     0.71    0.94
Doc2vec   Twitter  Gab      0.77   0.79     0.76    0.77
AGNN      Gab      Twitter  0.74   0.54     0.58    0.50
Doc2vec   Gab      Twitter  0.58   0.31     0.22    0.45
Table 2. Results for zero-shot cross-platform evaluation. H: hate class, P: precision, R: recall.

6. Cross platform evaluation

Having observed the superior performance of the GNN classifiers on both the Gab and Twitter data individually, we test whether such models generalize across platforms. In particular, we train the best performing GNN classifier on one dataset, say Gab, and measure its performance on the other, i.e., Twitter, in a zero-shot setting. We also compare these results with the best performing text-based model, doc2vec+LR. In Table 2, we observe that, when trained on Twitter, both the GNN and doc2vec classifiers achieve an F1-score of around 0.8, which tells us that the textual features learnt from user profiles in the Twitter dataset are generalizable enough; the network provides a slight benefit to the overall performance, especially for the hate class. The contribution of the network is more visible when we use the Gab dataset for training. The doc2vec+LR model trained on the user profiles in the Gab network performs badly when evaluated on user profiles from Twitter; here, the network-based system performs very well. This is because users on Twitter also use offensive words in a non-hateful sense, so the number of posts containing words from the hate lexicon is, on average, comparable for hateful and non-hateful users, and the doc2vec representation alone cannot detect the hateful users. In this case, the network neighborhood structure of hateful users learnt from the Gab network proves quite useful for detection.

Figure 3. Notable target communities.
Figure 4. Distribution of hateful posts made by the identified hateful users of a month to the three target communities – Blacks, Muslims and Jews.
Figure 5. Distribution of the top three joint target communities.
Figure 6. Distribution of the notable joint targeted communities, i.e., ‘Jews-Blacks’, ‘Muslims-Blacks’ and ‘Jews-Muslims’ per month based on the number of hateful users who targeted these joint communities.

7. Post facto analysis

In this section, we investigate the evolution of hateful users in Gab over time.

The precise reasons for choosing the Gab dataset for this analysis are (i) the availability of the full longitudinal data, including temporal snapshots of the followership network, and (ii) the loose moderation policies of the platform, which enable the use of high-precision keywords to obtain reasonable results; neither of these holds for Twitter.

We divide our entire dataset into 21 monthly snapshots ranging from October 2016 to June 2018, following the snapshot generation mechanism explained in (Mathew et al., 2019b). They utilized a heuristic (Meeder et al., 2011) which allows one to obtain a lower bound on the follow-link creation date. For each user, we note his/her followers, followings and posts on a monthly basis. We again impose the constraint that each user in the network has posted at least 10 times over his/her whole account age, to ensure that the user is sufficiently represented through the posts. Each monthly snapshot consists of the users with all of their posts and the set of followers and followees up to that particular month.

We take the best-performing AGNN model trained on the entire Gab data and use it to label the users present in each snapshot as hateful or not. Once a user is labelled hateful, the user is permanently marked as hateful for the subsequent months, since the subsequent months also carry the information of the current month (some users could possibly turn from hateful to non-hateful, but this hypothesis can be safely ruled out for a platform like Gab). We randomly select 10 hateful users predicted by the model and manually validate them; 8 of them turn out to be indeed hateful. We attempt to answer the following research questions using the machine-labelled data.

  • What are the target communities of these hateful users?

  • What is the distribution of hateful users/posts targeting a community?

To answer these questions, we leverage the set of high-precision lexicons obtained from Mathew et al. (2019b), where each keyword is a derogatory slur. These keywords are categorized (https://bit.ly/3BmJqBk) into the different communities that they target. The categories are assigned through manual inspection by the authors and in consultation with the Urban Dictionary (https://www.urbandictionary.com) and Hatebase (https://hatebase.org). For example, "n*gger", "coon", and "porch-monkey" are all derogatory terms used to describe Blacks. We say a hateful user targets a particular community if any of his/her posts mentions any of the aforementioned keywords associated with that community. We observe that 80% of the hateful users have used at least one of the keywords (note that applying the lexicon to the entire set of users might give false positives).
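A sketch of this keyword-to-community assignment; the lexicon mapping and per-user post store are assumed inputs:

```python
# lexicon: {keyword: community}, e.g. {"coon": "Blacks"}, built from the categorized
# slur list; posts_by_user: {user: [post tokens]} (both assumed inputs)
def targets_of(posts, lexicon):
    hit = set()
    for post in posts:
        words = set(w.lower() for w in post)
        hit.update(comm for kw, comm in lexicon.items() if kw in words)
    return hit

user_targets = {u: targets_of(posts_by_user[u], lexicon) for u in hateful_users}
```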

The predominant targets: In Figure 3, we compute the distribution of targets of the hateful users using the lexicon. We observe that ‘Jews’, ‘Muslims’ and ‘Blacks’ are the most prominent targets. We thus restrict our analysis to these three communities.

The rise and rise of hatred: In Figure 4, we plot the distribution of the number of hateful posts against each target community. We observe a rise in the gross number of posts over time, highlighting that hatred is on the rise (Mathew et al., 2019b) as the website grows older. The three communities are almost equally targeted till July 2017; afterwards, 'Jews' and 'Blacks' become slightly more prominent targets (the user distribution is similar to the post distribution).

Joint targets: To probe deeper, we look at users who target multiple communities. We say that a given hateful user has targeted multiple communities if he/she uses keywords belonging to more than one community, whether within a single post or across different posts. We observe that a large fraction of users attack all three communities in their posts. Figure 5 shows the overall distribution of the hateful users targeting one or multiple communities. These categories are mutually exclusive: if a user targets both 'Jews' and 'Muslims' ('Jews-Muslims'), he/she is not counted in the respective single-target communities. Figure 6 plots the temporal distribution of multi-community hatred. 'Blacks-Jews' is the most targeted pair, followed by 'Muslims-Blacks' and 'Jews-Muslims'.

Trending hashtags: As on Twitter, hashtags on Gab indicate the topics being discussed by the users. We attempt to find the hashtags in the posts of the hateful users which could help us understand what kind of hate speech is being spread and whether it is correlated with offline events. We collect the hashtags and their frequencies for each of the 21 snapshots. Next, we find the trending hashtags for each month, i.e., the hashtags which are frequent in the current month but were infrequent in the previous month. In December 2016, some of the trending hashtags we observe are #BanIslam and #StopWhiteGenocide, which might have originated in response to the Berlin truck attack (https://en.wikipedia.org/wiki/2016_Berlin_truck_attack) that took place in December 2016 and was an act of Islamist terrorism. In January 2017, we observe a lot of incitement around the Chicago torture incident (https://en.wikipedia.org/wiki/2017_Chicago_torture_incident) committed by a few black individuals. We previously observed a jump in hate speech against 'Jews' in August 2017; upon investigation, we find the hashtags #UniteTheRight and #Charlottesville, which can be linked to the Unite the Right rally (https://en.wikipedia.org/wiki/Unite_the_Right_rally). The hateful users even react to cultural events of the target communities, such as #BlackHistoryMonth, labelling 'Blacks' as violent. Another interesting observation is the support among hateful users for Tommy Robinson (https://en.wikipedia.org/wiki/Tommy_Robinson_(activist)), an anti-Islam activist, when he was arrested in June 2018, making the hashtag #FreeTommyRobinson trend.
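A sketch of the trending-hashtag heuristic described above; the frequency thresholds are illustrative, as the paper does not state its exact cut-offs:

```python
from collections import Counter

def trending(cur_posts, prev_posts, min_count=20):
    """Hashtags frequent this month but infrequent in the previous one.

    cur_posts / prev_posts: iterables of tokenized posts; the thresholds are
    illustrative, not taken from the paper.
    """
    cur = Counter(t for p in cur_posts for t in p if t.startswith("#"))
    prev = Counter(t for p in prev_posts for t in p if t.startswith("#"))
    return [t for t, c in cur.most_common()
            if c >= min_count and prev[t] < min_count // 4]
```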

8. Limitations and future work

There are a few limitations to our work. Although the AGNN model performs well by making use of both the textual and network features to classify a user as hateful, one of its limitations is that it can fail in cases where hateful nodes have mostly non-hateful nodes in their neighbourhood, or vice-versa. Further, if a hateful user is weakly connected to the hateful network, the network features cannot be utilized to classify that user. Also, to find the hateful users targeting a particular community, we have used the high-precision lexicon in which each keyword is a derogatory slur; this method can miss target communities for which no derogatory slur is used by any of the hateful users.

As part of future work, we plan to study the users who are less connected to other hateful users and identify techniques to detect them. One could link such users to other hateful users based on the targets they attack. Hence, we plan to develop a model which will not only detect hateful users but also detect the target communities of these hateful users. Another direction could be user-based monitoring and possibly red-alerting potential hateful users.

9. Conclusions

In this work, we detected hateful users on the Gab and Twitter datasets using supervised and semi-supervised machine learning models. GNNs that exploit both the textual features and the social connections of the users significantly outperform other models; the best model achieves a macro F1-score of 0.791 on Gab and 0.780 on Twitter using only 5% of the labelled data. To understand the models further, we performed a detailed error analysis on doc2vec and AGNN, the best performing models using text and text+network features, respectively. We found that doc2vec usually does not perform well when the number of hateful words in the users' posts is low; in such cases, the neighbourhood of a user helps the AGNN model make correct predictions. We also notice that structural signatures learnt from one network are transferable to an unseen dataset in a zero-shot setting. We performed an extensive post-facto analysis to identify how hateful posts and hateful users target different communities.

References

  • A. Adhikari, A. Ram, R. Tang, and J. Lin (2019) DocBERT: bert for document classification. External Links: 1904.08398 Cited by: §3.1.
  • P. Badjatiya, S. Gupta, M. Gupta, and V. Varma (2017) Deep learning for hate speech detection in tweets. WWW, pp. 759–760. Cited by: §1, §2.
  • F. M. Bianchi, D. Grattarola, L. Livi, and C. Alippi (2019) Graph neural networks with convolutional arma filters. arXiv preprint arXiv:1901.01343. Cited by: §3.3, §3.3.
  • P. Cui, X. Wang, J. Pei, and W. Zhu (2019) A survey on network embedding. IEEE Transactions on Knowledge and Data Engineering 31 (5), pp. 833–852. External Links: Document, ISSN 1558-2191 Cited by: §3.2.
  • T. Davidson, D. Warmsley, M. Macy, and I. Weber (2017) Automated hate speech detection and the problem of offensive language. In Eleventh International AAAI Conference on Web and Social Media, Cited by: §2.
  • O. de Gibert, N. Perez, A. G. Pablos, and M. Cuadros (2018) Hate speech dataset from a white supremacy forum. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pp. 11–20. Cited by: §2.
  • M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, USA, pp. 3844–3852. External Links: ISBN 978-1-5108-3881-9, Link Cited by: §3.3.
  • N. Djuric, J. Zhou, R. Morris, M. Grbovic, V. Radosavljevic, and N. Bhamidipati (2015) Hate speech detection with comment embeddings. In Proceedings of the 24th international conference on world wide web, pp. 29–30. Cited by: §1.
  • M. ElSherief, V. Kulkarni, D. Nguyen, W. Y. Wang, and E. Belding (2018) Hate lingo: a target-based linguistic analysis of hate speech in social media. ICWSM ’18. Cited by: §4.1.
  • P. Fortuna and S. Nunes (2018) A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR) 51 (4), pp. 85. Cited by: §2.
  • A. M. Founta, C. Djouvas, D. Chatzakou, I. Leontiadis, J. Blackburn, G. Stringhini, A. Vakali, M. Sirivianos, and N. Kourtellis (2018) Large scale crowdsourcing and characterization of twitter abusive behavior. In Twelfth International AAAI Conference on Web and Social Media, Cited by: §2.
  • N. D. Gitari, Z. Zuping, H. Damien, and J. Long (2015) A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering 10 (4), pp. 215–230. Cited by: §2.
  • B. Golub and M. O. Jackson (2010) Naive learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics 2 (1), pp. 112–49. Cited by: §4.1.
  • E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov (2018) Learning word vectors for 157 languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), Cited by: §3.1.
  • J. Greenberg and T. Pyszczynski (1985) The effect of an overheard ethnic slur on evaluations of the target: how to spread a social disease. Journal of Experimental Social Psychology 21 (1), pp. 61–72. Cited by: §2.
  • A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Cited by: §3.2.
  • R. Guermazi, M. Hammami, and A. B. Hamadou (2007) Using a semi-automatic keyword dictionary for improving violent web site filtering. In 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, pp. 337–344. Cited by: §1.
  • W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034. Cited by: §2, §3.3, §3.3, §5.2.1.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Comput. 9 (8), pp. 1735–1780. External Links: ISSN 0899-7667, Link, Document Cited by: §3.1.
  • T. Joachims (1999) Transductive inference for text classification using support vector machines. In Proceedings of ICML-99, 16th International Conference on Machine Learning, I. Bratko and S. Dzeroski (Eds.), Bled, SL, pp. 200–209. Cited by: §3.1.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), Cited by: §3.3.
  • Q. Le and T. Mikolov (2014) Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML’14, pp. II–1188–II–1196. External Links: Link Cited by: §3.1, §3.1.
  • S. Liu and T. Forss (2015) New classification models for detecting hate and violence web content. In 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), Vol. 1, pp. 487–495. Cited by: §2.
  • S. MacAvaney, H. Yao, E. Yang, K. Russell, N. Goharian, and O. Frieder (2019) Hate speech detection: challenges and solutions. PloS one 14 (8), pp. e0221152. Cited by: §1.
  • B. Mathew, R. Dutt, P. Goyal, and A. Mukherjee (2019a) Spread of hate speech in online social media. In Proceedings of WebSci, Cited by: §3.2, §4.1, §4.
  • B. Mathew, A. Illendula, P. Saha, S. Sarkar, P. Goyal, and A. Mukherjee (2019b) Temporal effects of unmoderated hate speech in gab. External Links: 1909.10966 Cited by: §7, §7, §7.
  • B. Mathew, A. Illendula, P. Saha, S. Sarkar, P. Goyal, and A. Mukherjee (2020) Hate begets hate: a temporal study of hate speech. Proceedings of the ACM on Human-Computer Interaction (CSCW). Cited by: §3.2, §4.1, §5.2.2.
  • B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes (2011) We know who you followed last summer: inferring social link creation times in twitter. In Proceedings of the 20th international conference on World wide web, pp. 517–526. Cited by: §7.
  • P. Mishra, M. Del Tredici, H. Yannakoudakis, and E. Shutova (2018) Author profiling for abuse detection. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 1088–1098. External Links: Link Cited by: §1.
  • C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe (2018) Weisfeiler and leman go neural: higher-order graph neural networks. CoRR abs/1810.02244. External Links: Link, 1810.02244 Cited by: §3.3, §5.2.1.
  • J. Pennington, R. Socher, and C. D. Manning (2014) GloVe: global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. Cited by: §3.1.
  • B. Perozzi, R. Al-Rfou, and S. Skiena (2014) Deepwalk: online learning of social representations. In Proceedings of KDD, pp. 701–710. Cited by: §3.2.
  • J. Qian, M. ElSherief, E. Belding, and W. Y. Wang (2018) Leveraging intra-user and inter-user representation learning for automated hate speech detection. In NAACL, Vol. 2, pp. 118–123. Cited by: §1, §2.
  • R. Rehurek and P. Sojka (2010) Software framework for topic modelling with large corpora. In In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Cited by: §3.1.
  • M. H. Ribeiro, P. H. Calais, Y. A. Santos, V. A. Almeida, and W. Meira Jr (2018) Characterizing and detecting hateful users on twitter. In Twelfth International AAAI Conference on Web and Social Media, Cited by: 1st item, §2, §2, §2, §3.2, §3.3, §4.2, §4.
  • M. T. Ribeiro, S. Singh, and C. Guestrin (2016) ”Why should I trust you?”: explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pp. 1135–1144. Cited by: §5.2.2.
  • C. Rivers and B. Lewis (2014) Ethical research standards in a world of big data. F1000Research 3. External Links: Document Cited by: §4.1.
  • H. M. Saleem, K. P. Dillon, S. Benesch, and D. Ruths (2017) A web of hate: tackling hateful speech in online social spaces. arXiv preprint arXiv:1709.10159. Cited by: §2.
  • J. Salminen, H. Almerekhi, M. Milenković, S. Jung, J. An, H. Kwak, and B. J. Jansen (2018) Anatomy of online hate: developing a taxonomy and machine learning models for identifying and classifying hate in online news media. In Twelfth International AAAI Conference on Web and Social Media, Cited by: §1.
  • A. Schmidt and M. Wiegand (2017) A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pp. 1–10. Cited by: §1.
  • K. K. Thekumparampil, C. Wang, S. Oh, and L. Li (2018) Attention-based graph neural network for semi-supervised learning. arXiv preprint arXiv:1803.03735. Cited by: §3.3, §3.3.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2018) Graph attention networks. In International Conference on Learning Representations, External Links: Link Cited by: §3.3.
  • W. Warner and J. Hirschberg (2012) Detecting hate speech on the world wide web. In Proceedings of the Second Workshop on Language in Social Media, pp. 19–26. Cited by: §2.
  • S. Yadav and S. Shukla (2016) Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. pp. 78–83. External Links: Document Cited by: §5.1.
  • Z. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec (2019) Gnnexplainer: generating explanations for graph neural networks. In Advances in neural information processing systems, pp. 9244–9255. Cited by: §5.2.2.
  • S. Zannettou, B. Bradlyn, E. De Cristofaro, H. Kwak, M. Sirivianos, G. Stringini, and J. Blackburn (2018) What is gab: a bastion of free speech or an alt-right echo chamber. In Proceedings of WWW (Companion), pp. 1007–1014. Cited by: §4.
  • Z. Zhang, D. Robinson, and J. Tepper (2018) Detecting hate speech on twitter using a convolution-gru based deep neural network. In European Semantic Web Conference, pp. 745–760. Cited by: §2.