Abusive Language Detection with Graph Convolutional Networks

04/05/2019 ∙ by Pushkar Mishra, et al. ∙ University of Cambridge ∙ Facebook ∙ University of Amsterdam

Abuse on the Internet represents a significant societal problem of our time. Previous research on automated abusive language detection in Twitter has shown that community-based profiling of users is a promising technique for this task. However, existing approaches only capture shallow properties of online communities by modeling follower-following relationships. In contrast, working with graph convolutional networks (GCNs), we present the first approach that captures not only the structure of online communities but also the linguistic behavior of the users within them. We show that such a heterogeneous graph-structured modeling of communities significantly advances the current state of the art in abusive language detection.






1 Introduction

Matthew Zook (2012) carried out an interesting study showing that the racist tweets posted in response to President Obama’s re-election were not distributed uniformly across the United States but instead formed clusters. This phenomenon is known as homophily: people, both in real life and online, tend to cluster with those who appear similar to themselves. To model homophily, recent research in abusive language detection on Twitter (Mishra et al., 2018a) incorporates embeddings for authors (i.e., users who have composed tweets) that encode the structure of their surrounding communities. The embeddings (called author profiles) are generated by applying a node embedding framework to an undirected unlabeled community graph where nodes denote the authors and edges the follower–following relationships amongst them on Twitter. However, these profiles do not capture the linguistic behavior of the authors and their communities and do not convey whether their tweets tend to be abusive or not.

In contrast, we represent the community of authors as a heterogeneous graph consisting of two types of nodes, authors and their tweets, rather than a homogeneous community graph of authors only. The primary advantage of such heterogeneous representations is that they enable us to model both the community structure and the linguistic behavior of authors in these communities. To generate richer author profiles, we then propose a semi-supervised learning approach based on graph convolutional networks (gcns) applied to the heterogeneous graph representation. To the best of our knowledge, our work is the first to use gcns to model online communities in social media. We demonstrate that our methods provide significant improvements over existing techniques.

2 Related work

Supervised learning for abusive language detection was first explored by Spertus (1997), who extracted rule-based features to train their classifier. Subsequently, manually-engineered lexical–syntactic features formed the crux of most approaches to the task (Yin et al., 2009; Warner and Hirschberg, 2012). Djuric et al. (2015) showed that dense comment representations generated using paragraph2vec outperform bag-of-words features. Several works have since utilized (deep) neural architectures to achieve impressive results on a variety of abuse-annotated datasets (Nobata et al., 2016; Pavlopoulos et al., 2017a). Recently, the research focus has shifted towards extraction of features that capture behavioral and social traits of users. Pavlopoulos et al. (2017b) showed that including randomly-initialized user embeddings improved the performance of their rnn methods. Qian et al. (2018) employed lstms to generate inter- and intra-user representations based on tweets, but they did not leverage community information.

3 Dataset

Following previous work (Mishra et al., 2018a), we experiment with a subset of the Twitter dataset compiled by Waseem and Hovy (2016). Waseem and Hovy released a list of tweet IDs along with their corresponding annotations (https://github.com/ZeerakW/hatespeech/), labeling each tweet as racist, sexist or neither (clean). Recently, Mishra et al. (2018a) could only retrieve of these tweets since some of them are no longer available. This is the dataset we use in our experiments. () of tweets are racist, () are sexist, and the remaining () are clean. The tweets have been authored by a total of unique users. Tweets in the racist class come from 5 of the users, while those in the sexist class come from of them.

4 Approach

4.1 Representing online communities

We create two different graphs. The first one is identical to the community graph of Mishra et al. (2018a) (referred to as the community graph). It contains nodes representing each of the authors in the dataset. Two authors/nodes are connected by a single undirected edge if either one follows the other on Twitter. There are solitary authors in the graph who are neither followed by nor follow any other author in the dataset. This graph is homogeneous, i.e., it has nodes (and hence edges) of a single type only.

Our second graph is an extended version of the first (referred to as the extended graph) that additionally contains nodes representing the tweets of the authors. Specifically, in addition to the author nodes, the graph contains tweet nodes. Each tweet node is connected to a single author node, denoting that the tweet was written by that particular author. This graph is no longer homogeneous since it contains nodes and edges of two different types.
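The two constructions can be sketched in a few lines. This is only an illustration under stated assumptions: follow relations are taken to be (follower, followee) pairs, tweets to be (tweet_id, author) pairs, and the function names and toy identifiers are hypothetical rather than taken from the released code.

```python
def build_community_graph(authors, follows):
    """Undirected author graph: a single edge if either author follows the other."""
    adj = {a: set() for a in authors}
    for follower, followee in follows:
        if follower in adj and followee in adj and follower != followee:
            adj[follower].add(followee)
            adj[followee].add(follower)
    return adj


def build_extended_graph(community, tweets):
    """Copy the community graph and add one node per tweet, linked to its author only."""
    adj = {node: set(nbrs) for node, nbrs in community.items()}
    for tweet_id, author in tweets:
        adj[tweet_id] = {author}
        adj[author].add(tweet_id)
    return adj


# Toy data (hypothetical); tweet ids are assumed not to clash with author ids.
authors = ["a1", "a2", "a3"]
follows = [("a1", "a2"), ("a2", "a1")]   # a mutual follow still yields one edge
tweets = [("t1", "a1"), ("t2", "a3")]

community = build_community_graph(authors, follows)
extended = build_extended_graph(community, tweets)
solitary = [a for a in authors if not community[a]]   # authors without any edge
```

In this toy example a3 is solitary in the community graph but gains a neighbor (its own tweet) in the extended graph.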

4.2 Generating author profiles

We first describe the approach of Mishra et al. (2018a) that learns author embeddings using node2vec (Grover and Leskovec, 2016); this serves as our baseline. We then move on to our semi-supervised approach based on graph convolutional networks (Kipf and Welling, 2017).

Node2vec. Node2vec extends the word2vec skip-gram model (Mikolov et al., 2013) to graphs in order to create low-dimensional embeddings for nodes based on their position and neighborhood. Specifically, for a given graph with nodes $V = \{v_1, v_2, \dots, v_n\}$, node2vec aims to maximize the following log probability:

$$\sum_{v \in V} \log \Pr\big(N_s(v) \mid v\big)$$

where $N_s(v)$ denotes the neighbor set of node $v$ generated using neighbor sampling strategy $s$. The framework utilizes two different strategies for sampling neighbor sets of nodes: Depth-First Sampling (DFS) and Breadth-First Sampling (BFS). The former captures the structural role of nodes, while the latter captures the local neighborhood around them. Two hyper-parameters control the overall contribution of each of these strategies. Following Mishra et al. (2018a), we initialize these parameters to their default value of and set the embedding size and number of iterations to and respectively. Since node2vec cannot produce embeddings for nodes without edges, we map the solitary authors to a single zero embedding as done by Mishra et al.
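To make the role of the two hyper-parameters concrete, the following is a rough sketch of the biased random walk that node2vec uses to sample neighbor sets. The names p (return parameter) and q (in-out parameter) follow Grover and Leskovec (2016); the function names, the adjacency-dict representation and the fixed seed are assumptions made for illustration, and this is not the implementation used in the experiments.

```python
import random


def transition_weights(adj, prev, cur, p=1.0, q=1.0):
    """Unnormalized weights for the next step from `cur`, having arrived from `prev`."""
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p   # step back to the previous node
        elif nxt in adj[prev]:
            weights[nxt] = 1.0       # stay near the previous node (BFS-like)
        else:
            weights[nxt] = 1.0 / q   # move away from the previous node (DFS-like)
    return weights


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one walk of at most `length` nodes starting at `start`."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break                    # node without edges: the walk cannot leave it
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))   # first step is uniform
            continue
        w = transition_weights(adj, walk[-2], cur, p, q)
        walk.append(rng.choices(nbrs, weights=[w[n] for n in nbrs])[0])
    return walk
```

A node without edges yields a walk containing only itself, which is why solitary authors receive no learned embedding and need a fallback such as the zero vector.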

Graph convolutional networks. We propose an approach for learning author profiles using gcns applied to the extended graph. In contrast to node2vec, our method allows us to additionally propagate information with respect to whether tweets composed by authors and their communities are abusive or not. Specifically, as labels are available for a subset of nodes in our graph (i.e., the tweet nodes), we frame the task as a graph-based semi-supervised learning problem, allowing the model to distribute gradient information from the supervised loss on the labeled tweet nodes. This, in turn, allows us to create profiles for authors that not only capture the structural traits of their surrounding community but also their own linguistic behavior based on the types of tweets that they have composed.

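We consider a graph $G = (V, E)$, where $V$ is the set of nodes ($|V| = n$) and $E$ is the set of edges. $A$ denotes the adjacency matrix of $G$. We assume that $A$ is symmetric ($A = A^{\top}$), and that all nodes in $G$ have self loops ($A_{ii} = 1$). The significance of these assumptions is explained in Kipf and Welling (2017). Let $D$ be the diagonal degree matrix defined as $D_{ii} = \sum_{j} A_{ij}$, and $X \in \mathbb{R}^{n \times m}$ be the input feature matrix that holds feature vectors of length $m$ for the nodes in $G$. We can now recursively define the computation that takes place at the $i$-th convolutional layer of a $k$-layer gcn as:

$$O^{(i)} = \sigma\big(\tilde{A}\, O^{(i-1)}\, W^{(i)}\big)$$

with the computation at the first layer being:

$$O^{(1)} = \sigma\big(\tilde{A}\, X\, W^{(1)}\big)$$

Here, $\sigma$ denotes an activation function; $\tilde{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is the normalized adjacency matrix; $W^{(i)} \in \mathbb{R}^{d_{i-1} \times d_i}$ is the weight matrix of the $i$-th convolutional layer; $O^{(i-1)} \in \mathbb{R}^{n \times d_{i-1}}$ represents the output from the preceding convolutional layer, where $d_i$ is the number of hidden units in the $i$-th layer (note that $d_0 = m$, i.e., the length of the input feature vectors).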
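A single propagation step can be illustrated with plain Python lists. This is a didactic sketch, not the code used in the experiments: it assumes the symmetric normalization of Kipf and Welling (2017), i.e., the adjacency matrix (with self loops) scaled by the inverse square root of the degrees on both sides, and it assumes ReLU as the activation. All function names are hypothetical.

```python
import math


def normalize_adjacency(A):
    """Symmetric normalization of a symmetric A that already contains self loops."""
    deg = [sum(row) for row in A]
    n = len(A)
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(P, Q):
    """Dense matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def relu(M):
    return [[max(0.0, x) for x in row] for row in M]


def gcn_layer(A_norm, H, W, activation=relu):
    """One convolutional layer: activation(A_norm @ H @ W)."""
    return activation(matmul(matmul(A_norm, H), W))


# Path graph 0 - 1 - 2 with self loops, one-hot input features, toy weights.
A = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W = [[1.0, -1.0], [0.0, 1.0], [1.0, 0.0]]

A_norm = normalize_adjacency(A)
out = gcn_layer(A_norm, X, W)   # one 2-dimensional vector per node
```

Each output row mixes the features of a node with those of its neighbors, weighted by the normalized adjacency entries.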

In our experiments, we apply a 2-layer gcn to the extended graph (stacking more layers does not improve results on the validation set further). Specifically, our gcn performs the following computation, yielding a softmax distribution over the classes in the dataset for each of the nodes:

$$O = \mathrm{softmax}\big(\tilde{A}\, \sigma\big(\tilde{A}\, X\, W^{(1)}\big)\, W^{(2)}\big)$$

where $\tilde{A}$ is the normalized adjacency matrix, $X$ the input feature matrix, $\sigma$ the activation function, and $W^{(1)}$ and $W^{(2)}$ the weight matrices of the two layers.

We set the input feature vectors in $X$ to be the binary bag-of-words representations of the nodes (following Kipf and Welling, 2017); for author nodes, these representations are constructed over the entire set of their respective tweets. Note that $X$ is row-normalized prior to being fed to the gcn. We set the number of hidden units in the first convolutional layer to in order to extract -dimensional embeddings for author nodes so that they are directly comparable with those from node2vec. The number of hidden units in the second convolutional layer is set to 3 for the output of the gcn to be a softmax distribution over the classes in the data.
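The input features can be sketched as follows. The sketch assumes simple lowercased whitespace tokenization, which the text does not specify, and the function names and toy tweets are hypothetical.

```python
def binary_bow(texts, vocab):
    """1.0 if a vocabulary word occurs in any of the given texts, else 0.0."""
    tokens = {tok for text in texts for tok in text.lower().split()}
    return [1.0 if word in tokens else 0.0 for word in vocab]


def row_normalize(M):
    """Scale every row to sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in M:
        total = sum(row)
        out.append([x / total for x in row] if total else list(row))
    return out


tweets_by_author = {"a1": ["good morning", "morning all"], "a2": ["hello"]}
vocab = sorted({tok for ts in tweets_by_author.values()
                for t in ts for tok in t.lower().split()})

# An author node is described by all of its tweets, a tweet node by its own text.
author_row = binary_bow(tweets_by_author["a1"], vocab)
tweet_row = binary_bow(["good morning"], vocab)
features = row_normalize([author_row, tweet_row])
```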

The gcn is trained by minimizing the cross-entropy loss with respect to the labeled nodes of the graph. Once the model is trained, we extract -dimensional embeddings from the first layer (i.e., the layer’s output without activation). This contains embeddings for author nodes as well as tweet nodes. For our experiments on author profiles, we make use of the former.
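The semi-supervised objective can be sketched as a cross-entropy that is averaged over labeled nodes only, so that author nodes and held-out tweet nodes contribute no loss term. In this hypothetical helper, unlabeled nodes carry the label None and the class indices are arbitrary.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one row of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_entropy(logits, labels):
    """Mean negative log-likelihood over nodes whose label is not None."""
    losses = [-math.log(softmax(row)[y])
              for row, y in zip(logits, labels) if y is not None]
    if not losses:
        raise ValueError("at least one labeled node is required")
    return sum(losses) / len(losses)


# Node 0 is a labeled tweet node, node 1 is unlabeled (e.g., an author node).
loss = masked_cross_entropy([[0.0, 0.0, 0.0], [9.0, 1.0, 1.0]], [2, None])
```

Although unlabeled nodes are excluded from the loss, they still influence it through the graph convolutions, since the scores of labeled nodes depend on the features of their neighbors.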

4.3 Classification methods

We experiment with five different supervised classification methods for tweets in the dataset. The first three (lr, lr + auth, lr + extd) serve as our baselines (their implementations are taken from https://github.com/pushkarmishra/AuthorProfilingAbuseDetection), and the last two, which use gcns (the code we use for our gcn models can be found at https://github.com/tkipf/gcn), are the methods we propose.

lr. This method is adopted from Waseem and Hovy (2016), wherein they train a logistic regression classifier on character n-grams (up to -grams) of the tweets. Character n-grams have been shown to be highly effective for abuse detection due to their robustness to spelling variations.
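The robustness of character n-grams to spelling variation can be seen in a small sketch. The maximum n-gram length is left as a parameter (max_n), and the function name and example words are hypothetical.

```python
from collections import Counter


def char_ngrams(text, max_n):
    """Count all character n-grams of length 1..max_n in `text`."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


plain = char_ngrams("idiot", 3)
obfuscated = char_ngrams("idi0t", 3)
shared = set(plain) & set(obfuscated)   # n-grams that survive the obfuscation
```

A word-level feature would treat the two spellings as unrelated tokens, whereas their character n-gram representations still overlap.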

lr + auth. This is the state-of-the-art method (Mishra et al., 2018a) for the dataset we are using. For each tweet, the profile of its author (generated by node2vec from the community graph) is appended onto the tweet’s character n-gram representation for training the lr classifier as above.

lr + extd. This method is identical to lr + auth, except that we now run node2vec on the extended graph to generate author profiles. Intuitively, since node2vec treats both author and tweet nodes as the same and does not take into account the labels of tweets, the author profiles generated should exhibit the same properties as those generated from the community graph.

gcn. Here, we simply assign a label to each tweet based on the highest score from the softmax distribution provided by our gcn model for the (tweet) nodes of the extended graph.

lr + gcn. Identical to lr + extd, except that we replace the author profiles from node2vec with those extracted by our gcn approach.

Method       Racism                  Sexism                  Overall
             p      r      f         p      r      f         p      r      f
lr           80.59  70.62  75.28     83.12  62.54  71.38     83.18  75.62  78.75
lr + auth    77.95  78.35  78.15     87.28  78.41  82.61     85.26  83.28  84.18
lr + extd    77.95  78.35  78.15     87.02  78.73  82.67     85.17  83.33  84.17
gcn          74.12  64.95  69.23     82.48  82.22  82.35     81.90  79.42  80.56
lr + gcn     79.08  79.90  79.49     88.24  80.95  84.44     86.23  84.73  85.42
Table 1: The baselines (lr, lr + auth/extd) vs. our gcn approaches () on the racism and sexism classes. Overall shows the macro-averaged metrics computed over the classes: sexism, racism, and clean.

5 Experiments and results

5.1 Experimental setup

We run every method times with random initializations and stratified train–test splits. Specifically, in each run, the dataset is split into a randomly-sampled train set () and test set () with identical distributions of the classes in each. In methods involving our gcn, a small part of the train set is held out as validation data to prevent over-fitting using early-stopping regularization. When training the gcn, we only have labeled tweet nodes for those tweets in the extended graph that are part of the train set. Our gcn is trained using the parameters from the original paper (Kipf and Welling, 2017): Glorot initialization (Glorot and Bengio, 2010), adam optimizer (Kingma and Ba, 2015) with a learning rate of , dropout regularization (Srivastava et al., 2014) rate of , training epochs with an early-stopping patience of
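A stratified split of the kind described above can be sketched as follows. The test fraction and seed in the usage example are placeholders chosen for illustration, not the values used in the experiments, and the function name is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed):
    """Return (train_idx, test_idx) with the class proportions preserved in each part."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Toy labels; 0.2 and the seed are illustrative placeholders.
labels = ["clean"] * 10 + ["sexist"] * 5 + ["racist"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=13)
```

Repeating the call with different seeds gives the different random splits of the individual runs.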


5.2 Results and analysis

In Table 1, we report the mean precision, recall, and F1 on the racism and sexism classes over the runs. We further report the mean macro-averaged precision, recall, and F1 for each method (‘Overall’) to investigate their overall performance on the data. lr + gcn significantly ( on paired t-test) outperforms all other methods. The author profiles from node2vec only capture the structural and community information of the authors; however, those from the gcn also take into account the (abusive) nature of the tweets composed by the authors. As a result, tweets like “#MKR #mkr2015 Who is gonna win the peoples choice?” that are misclassified as sexist by lr + auth (because their author is surrounded by others producing sexist tweets) are correctly classified as clean by lr + gcn.
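For reference, macro-averaged metrics of the kind reported under ‘Overall’ can be computed as below. The sketch assumes that macro F1 is the unweighted mean of the per-class F1 scores; the function names and the toy predictions are hypothetical.

```python
def per_class_prf(gold, pred, cls):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(gold, pred, classes):
    """Unweighted mean of the per-class precision, recall and F1."""
    scores = [per_class_prf(gold, pred, c) for c in classes]
    return tuple(sum(s[i] for s in scores) / len(classes) for i in range(3))


gold = ["racist", "sexist", "clean", "clean"]
pred = ["racist", "clean", "clean", "clean"]
macro_p, macro_r, macro_f1 = macro_average(gold, pred, ["racist", "sexist", "clean"])
```

Because every class counts equally, the small racist and sexist classes weigh as much as the much larger clean class in the overall scores.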

gcn on its own achieves a high performance, particularly on the sexism class where its performance is typical of a community-based profiling approach, i.e., high recall at the expense of precision. However, on the racism class, its recall is hindered by the same factor that Mishra et al. (2018a) highlighted for their node2vec-only method, i.e., that racist tweets come from only 5 unique authors, who have also contributed sexist or clean tweets. The racist activity of these authors is therefore eclipsed, leading to misclassifications of their tweets. lr + gcn alleviates this problem by incorporating character n-gram representations of the tweets, hence not relying solely on the linguistic behavior of their authors.

Figure 3 shows the t-sne (van der Maaten and Hinton, 2008) visualizations of node2vec author profiles from the community and extended graphs. Both visualizations show that some authors belong to densely-connected communities while others are part of sparser ones. The differences between the results of lr + auth and lr + extd are insignificant, further confirming that their author profiles have similar properties. In essence, node2vec is unable to gain anything more from the extended graph than it does from the community graph.

(a) Author profiles from the community graph
(b) Author profiles from the extended graph
Figure 3: Visualizations of the node2vec author profiles from the community and extended graphs.
Figure 4: Visualization of the author profiles extracted from our gcn. Red dots represent the authors who are deemed abusive (racist or sexist) by the gcn.

Figure 4 shows a t-sne visualization of the author profiles generated using our gcn approach. Red dots denote the authors who are abusive (sexist or racist) according to our model (i.e., as per the softmax outputs for the author nodes); note that there are no such gold labels for authors in the dataset itself. The red dots are mostly clustered in a small portion of the visualization, which corroborates the notion of homophily amongst abusive authors.

Despite the addition of improved author profiles, several abusive tweets remain misclassified. As per our analysis, many of these tend to contain urls to abusive content but not the content itself, e.g., “@MENTION: Logic in the world of Islam http://t.co/6nALv2HPc3” and “@MENTION Yes. http://t.co/ixbt0uc7HN”. Since Twitter shortens all urls into a standard format, there is no indication of what they refer to. One possible way to address this limitation could be to append the content of the url to the tweet; however, this can lead to misclassifications in cases where the tweet is disagreeing with the url. Another factor in misclassifications is the deliberate obfuscation of words and phrases by authors in order to evade detection, e.g., “Kat, a massive c*nt. The biggest ever on #mkr #cuntandandre”. Mishra et al. (2018b) demonstrate in their work that character-based word composition models can be useful in dealing with this aspect.

6 Conclusions

In this paper, we built on the work of Mishra et al. (2018a) that introduces community-based profiling of authors for abusive language detection. We proposed an approach based on graph convolutional networks to show that author profiles that directly capture the linguistic behavior of authors along with the structural traits of their community significantly advance the current state of the art.


We would like to thank the anonymous reviewers for their useful feedback. Helen Yannakoudakis was supported by Cambridge Assessment, University of Cambridge.