DeepHate: Hate Speech Detection via Multi-Faceted Text Representations

03/14/2021 · Rui Cao et al. · University of Saskatchewan · L3S Research Center

Online hate speech is an important issue that breaks the cohesiveness of online social communities and even raises public safety concerns in our societies. Motivated by this rising issue, researchers have developed many traditional machine learning and deep learning methods to automatically detect hate speech on online social platforms. However, most of these methods have only considered a single type of textual feature, e.g., term frequency or word embeddings, neglecting the other rich textual information that could be utilized to improve hate speech detection. In this paper, we propose DeepHate, a novel deep learning model that combines multi-faceted text representations, such as word embeddings, sentiments, and topical information, to detect hate speech on online social platforms. We conduct extensive experiments and evaluate DeepHate on three large, publicly available, real-world datasets. Our experiment results show that DeepHate outperforms the state-of-the-art baselines on the hate speech detection task. We also perform case studies to provide insights into the salient features that best aid in detecting hate speech on online social platforms.


1. Introduction

Motivation. The proliferation of social media has enabled users to share and spread ideas at a prodigious rate. While the information exchanged on social media platforms may improve an individual’s sense of connectedness with real and virtual communities, these platforms are increasingly exploited for the propagation of toxic content such as hate speech (Schmidt and Wiegand, 2017; Fortuna and Nunes, 2018), which is defined by the Cambridge dictionary as “public speech that expresses hate or encourages violence towards a person or group based on something such as race, religion, sex, or sexual orientation” (Cambridge). The spread of hate speech in social media has not only sowed discord among individuals and communities online but has also resulted in violent hate crimes (Williams, 2019; Relia et al., 2019; Mathew et al., 2019). It is therefore a pressing issue to detect and curb hate speech in online social media.

Major social media platforms such as Facebook and Twitter have made great efforts to combat the spread of hate speech on their platforms (Times, 2019; Bloomberg, 2019). For instance, the platforms have provided clear policies on hateful conduct (Facebook; Twitter), implemented mechanisms for users to report hate speech, and employed content moderators to actively detect hate speech. However, such approaches are labor-intensive, time-consuming, and thus not scalable or sustainable in the long run (Waseem and Hovy, 2016; Gambäck and Sikdar, 2017).

The gravity of the issue and the limitations of manual approaches have motivated the search for automatic hate speech detection methods. In recent years, researchers from the data mining, information retrieval, and Natural Language Processing (NLP) fields have proposed several such methods (Fortuna and Nunes, 2018; Schmidt and Wiegand, 2017). These methods can be broadly grouped into two categories: (i) methods that adopt classic machine-learning strategies (Chen et al., 2012; Waseem and Hovy, 2016; Waseem, 2016; Nobata et al., 2016; Chatzakou et al., 2017; Davidson et al., 2017) and, more recently, (ii) deep learning-based methods (Djuric et al., 2015; Mehdad and Tetreault, 2016; Gambäck and Sikdar, 2017; Badjatiya et al., 2017; Park and Fung, 2017; Gröndahl et al., 2018; Zhang et al., 2018; Arango et al., 2019; Founta et al., 2019).

For traditional machine learning-based methods, textual features such as bag-of-words are commonly extracted from social media posts to train a classifier that detects hate speech in the posts. For deep learning-based methods, the words in a post are often represented by word embedding vectors and fed as input to train a neural network for hate speech detection. Although existing methods, especially the deep learning ones, have shown promising results in automatic hate speech detection in social media, these models have limitations. Firstly, most existing methods have only considered a single type of textual feature, neglecting the other rich textual information that could be utilized to improve hate speech detection. Secondly, current deep learning methods offer limited explainability as to why a particular post should be flagged as hate speech.

Research Objectives. In this paper, we address the limitations of existing methods and propose DeepHate (code: https://gitlab.com/bottle_shop/safe/deephate), a novel deep learning architecture that effectively combines multi-faceted textual representations for automatic hate speech detection in social media. At a high level, DeepHate utilizes three types of textual representations: semantic, sentiment, and topical information. The semantic representations of social media posts are extracted from several word embeddings pre-trained on much larger text corpora (Mikolov et al., 2018; Pennington et al., 2014; Wieting et al., 2015). For the sentiment representation, we propose a two-step approach that utilizes an existing sentiment analysis tool (Hutto and Gilbert, 2014) and a neural network to train a new word embedding that captures the sentiment information in a social media post. Latent Dirichlet Allocation (LDA) (Blei et al., 2003) is used to extract topical representations of posts. The three types of textual representations are subsequently used as input to train our DeepHate model for hate speech classification. The underlying intuition is that the multi-faceted textual representations enrich the representation of social media posts, enabling the deep learning classifier to perform better hate speech detection. To gain a better understanding of how DeepHate decides which posts to classify as hate speech, we also conduct empirical studies to analyze the salient features that aid hate speech detection.

Contributions. Our main contributions in this work consist of the following.

  • We propose a novel deep learning model called DeepHate, which utilizes multi-faceted textual representations for automatic hate speech detection in social media.

  • We conduct extensive experiments on three real-world and publicly available datasets. Our experiment results show that DeepHate consistently outperforms state-of-the-art methods in the hate speech detection task.

  • We conduct empirical analyses on the DeepHate model and provide insights into the salient features that helped in detecting hate speech in social media. The salient feature analysis also improves the explainability of our proposed model.

2. Related Work

Automatic detection of hate speech has received considerable attention from the data mining, information retrieval, and natural language processing (NLP) research communities. Interest in this field has increased with the proliferation of social media and social platforms. In this section, we review the existing works on detecting hate speech in social media text content, focusing mostly on hate speech detection in Twitter short messages (i.e., tweets). These works can be broadly categorized into two approaches: (i) works that adopt classic machine-learning strategies and, more recently, (ii) those that adopt deep learning methods. Readers are encouraged to refer to the recent surveys on the topic (Fortuna and Nunes, 2018; Schmidt and Wiegand, 2017) for a more detailed treatment of methods for both Twitter and other social media.

Traditional machine learning methods have been applied to detect hate speech in social media (Chen et al., 2012; Waseem and Hovy, 2016; Waseem, 2016; Nobata et al., 2016; Chatzakou et al., 2017; Davidson et al., 2017). Typically, these methods include an initial feature extraction phase, where features are extracted from the raw textual content. The most commonly extracted features include Term Frequency-Inverse Document Frequency (TF-IDF) scores, Bag-of-Words vectors, and other linguistic attributes. Xiang et al. (2012) also explored latent semantic features extracted from tweets for hate speech detection by mining the topics of the tweets. Beyond the textual content, some studies have also utilized other meta-information from users’ profiles and network structures (i.e., followers, mentions, etc.) (Chatzakou et al., 2017; Papegnies et al., 2017; Singh et al., 2017). The extracted features are subsequently used as input to classifiers such as Logistic Regression, SVM, and Random Forest to predict whether a given tweet contains hate speech. In this paper, we focus on utilizing only the textual content to perform hate speech detection, as other meta-information is scarce and often incomplete. Moreover, by not using such meta-information, we avoid biasing our hate speech detector with users’ personal information.

Deep learning methods have achieved notable performance in many classification tasks. Unlike traditional machine learning methods, deep learning methods are able to automatically learn latent representations of the input data to perform classification (Goodfellow et al., 2016). Such approaches have been applied to several natural language processing tasks, including text classification (Goldberg, 2016; Yang et al., 2016). The increasing popularity of deep learning has also led a number of recent studies to adopt these methods for detecting hate speech in social media (Djuric et al., 2015; Mehdad and Tetreault, 2016; Gambäck and Sikdar, 2017; Badjatiya et al., 2017; Park and Fung, 2017; Gröndahl et al., 2018; Zhang et al., 2018; Arango et al., 2019; Founta et al., 2019).

Mehdad and Tetreault (2016) experimented with applying a Recurrent Neural Network (RNN) model to different input types, such as word embeddings and unigram and bigram character embeddings, for hate speech detection. Gambäck and Sikdar (2017) conducted similar studies using a Convolutional Neural Network (CNN). Badjatiya et al. (2017) proposed an ensemble approach that combines a Long Short-Term Memory (LSTM) model and Gradient-Boosted Decision Trees to detect hate speech on Twitter. Park and Fung (2017) introduced the HybridCNN method, which trains a CNN over both word and unigram character embeddings for hate speech detection. Zhang et al. (2018) proposed a new neural network architecture that combines a CNN with a Gated Recurrent Unit (GRU) to classify hate speech using word embeddings as input. Founta et al. (2019) trained a combined RNN and Multilayer Perceptron (MLP) on textual content and meta-information to perform hate speech detection. Most of these studies applied and experimented with their proposed methods on the datasets introduced in (Waseem and Hovy, 2016; Davidson et al., 2017). Unlike the existing deep learning methods, which mainly utilize word or character embeddings as input, we propose a novel deep learning architecture that combines multi-faceted text representations for hate speech classification. We evaluate and benchmark our proposed model against these state-of-the-art methods in Section 5.

3. Proposed Model

In this section, we elaborate on our proposed DeepHate model. The intuition behind the model is to learn the latent representations of multi-faceted text information and effectively combine them to improve hate speech detection. Figure 1 illustrates the overall architecture of the DeepHate model. The proposed model first utilizes different types of feature embeddings to represent a post. The feature embeddings are subsequently fed into neural network models to learn three types of latent textual representations, namely semantic, sentiment, and topic representations. The latent representations are then combined via a feed-forward network. Finally, a softmax layer takes the combined representation as input to predict a probability distribution over all possible classes. The details of the individual components of the DeepHate model are described in the subsequent subsections.

Figure 1. Overall architecture of DeepHate model

3.1. Semantic Representation

To learn the latent semantic representation of a post $p$, we first represent the post as a word sequence, i.e., $p = \{w_1, w_2, \ldots, w_n\}$, where $n$ refers to the length of the post. Each word in the sequence is then represented by pre-trained word embeddings. In order to obtain expressive word-level representations, we use three popular pre-trained word embeddings, namely GloVe (Pennington et al., 2014), word2vec trained on wiki news (Mikolov et al., 2018), and Paragram (Wieting et al., 2015). Consequently, we denote the pre-trained word embeddings of post $p$ as follows:

  • GloVe: $E_{glove}$

  • Word2Vec-Wiki: $E_{wiki}$

  • Paragram: $E_{para}$

Note that all pre-trained word embeddings are trained as 300-dimensional vectors. Each pre-trained word embedding of a post is used as input into a C-LSTM-Att encoder to learn a latent representation of the post. For instance, we learn the latent post representations $f_{glove}$ from $E_{glove}$, $f_{wiki}$ from $E_{wiki}$, and $f_{para}$ from $E_{para}$. The detailed implementation of the C-LSTM-Att encoder is presented in Section 3.4.

Finally, we generate the post’s latent semantic representation, $f_{sem}$, by combining the three latent post representations learned from the various pre-trained word embeddings. To fully exploit the advantage of each kind of representation, we assign an attention weight, which is a single scalar value, to each representation vector. To ease training, these attention weights are initialized from a standard uniform distribution, and element-wise summation is performed on the three attended latent post representation vectors:

$f_{sem} = \alpha_{glove} f_{glove} + \alpha_{wiki} f_{wiki} + \alpha_{para} f_{para}$   (1)

$f_{sem}$ will subsequently be combined with the other latent textual representations for hate speech detection in the fusion process described in Section 3.5.
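To make the combination in Eq. (1) concrete, below is a minimal PyTorch sketch under our reconstructed notation; the module and parameter names are illustrative and not taken from the released code.

```python
import torch
import torch.nn as nn

class SemanticFusion(nn.Module):
    """Scalar-attention combination of the three post representations (Eq. 1)."""

    def __init__(self):
        super().__init__()
        # one learnable scalar weight per pre-trained embedding channel,
        # initialized from a standard uniform distribution as described above
        self.alpha = nn.Parameter(torch.rand(3))

    def forward(self, f_glove, f_wiki, f_para):
        # element-wise sum of the attended latent post representations
        return (self.alpha[0] * f_glove
                + self.alpha[1] * f_wiki
                + self.alpha[2] * f_para)
```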

3.2. Sentiment Representation

Tweets often contain abundant sentiment information (Giachanou and Crestani, 2016). Each tweet encodes the attitude and emotion of its writer, and such information may be helpful in hate speech detection. Based on this intuition, we aim to incorporate sentiment information into the DeepHate model. Inspired by the method proposed in (Tang et al., 2014), our goal is to train a word embedding that contains sentiment information. Then, similar to the training of the latent semantic representation, this trained sentiment-specific word embedding, denoted by $E_{senti}$, is used to represent the words in a post, which is subsequently fed into the C-LSTM-Att encoder to learn the latent sentiment representation of the post.

However, the technique proposed in (Tang et al., 2014) is a supervised method that requires labels for sentiment classification, and the existing hate speech datasets lack such sentiment labels. To overcome this limitation, we first use a sentiment analysis tool, VADER (Hutto and Gilbert, 2014), to label the sentiment of the tweets in the hate speech datasets. VADER is a lexicon- and rule-based sentiment analysis tool specifically tuned to extract sentiments expressed in social media. Given a tweet, the tool generates scores over three polarities: negative, positive, and neutral. In our implementation, we take the sentiment label of a tweet to be the polarity with the highest score. With the generated sentiment labels, we train a sentiment-specific word embedding (Tang et al., 2014) by performing a sentiment classification task to predict the sentiment labels in the hate speech datasets. Similar to the pre-trained word embeddings, the sentiment-specific word embedding (a.k.a. sentiment embedding) is trained as a 300-dimensional vector.
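As an illustration of the labeling step, the following sketch uses the vaderSentiment package to assign each tweet the polarity with the highest score; the helper name is ours.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def vader_label(tweet: str) -> str:
    # polarity_scores returns {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
    scores = analyzer.polarity_scores(tweet)
    names = {'neg': 'negative', 'neu': 'neutral', 'pos': 'positive'}
    # take the polarity with the highest score as the weak sentiment label
    return names[max(names, key=lambda k: scores[k])]

print(vader_label("I love this!"))  # -> 'positive'
```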

Finally, we represent the word sequence of a post using the trained sentiment embedding, $E_{senti}$, and use it as input into a C-LSTM-Att encoder to learn a latent sentiment representation of the post, $f_{senti}$. Note that we freeze the parameters in the sentiment embedding layer so that the sentiment embedding is not updated; this ensures the sentiment information is preserved rather than overwritten by back-propagation from the hate speech detection task.

3.3. Topic Representation

We employ the probabilistic topic modeling approach (Blei, 2012) to derive the topic representation of the posts. Generally, this approach allows us to represent each text document by a multinomial distribution over topics, where each topic is mathematically defined as a multinomial distribution over words. Often, in mining topics from Twitter posts, documents are formed by aggregating posts by hashtag, published time, or author (Hong and Davison, 2010; Zhao et al., 2011). However, since our datasets are highly sparse in those dimensions (e.g., there are few common hashtags among posts, posts are published over a long span of time, and each user has very few posts), we consider each post a document and use the original Latent Dirichlet Allocation (LDA) model (Blei et al., 2003) to compute the posts’ topic distributions. Moreover, inspired by the observation in previous work that each post tends to focus on one or a few topics, we apply sparsity regularization when learning the LDA model (Balasubramanyan and Cohen, 2013). Thus, the posts’ topic representations, $f_{topic}$, are sparse vectors (i.e., highly skewed multinomial distributions over topics).

Specifically, we first remove all stopwords and non-English words. Next, we iteratively filter out infrequent words and overly short posts such that each word appears in at least five remaining posts while each post contains at least three remaining words. These minimum thresholds ensure that, for each post and each word, we have enough observations to learn the topics accurately. To determine the appropriate number of topics for each dataset, we run the LDA model (with sparsity regularization) with the number of topics varied from 5 to 20 and measure the likelihood of the learned model in each case. The number of topics for each dataset is then set by considering both computational complexity and the improvement in likelihood. Lastly, for the overly short posts that were filtered out, we assume their topic distributions are uniform. Such posts make up only a small proportion (often less than 10%) of each dataset.
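A sketch of this topic-representation step using gensim is shown below. Note that gensim’s standard LDA does not implement the sparsity regularization of Balasubramanyan and Cohen (2013); we approximate its effect with a small symmetric Dirichlet prior, and all names are illustrative.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

def topic_representations(posts, num_topics=15):
    """posts: list of token lists, already stripped of stopwords/non-English words."""
    vocab = Dictionary(posts)
    vocab.filter_extremes(no_below=5, no_above=1.0)  # word in >= 5 posts
    corpus = [vocab.doc2bow(tokens) for tokens in posts]
    # A small symmetric Dirichlet prior pushes each post toward few topics,
    # approximating the sparsity regularization described above.
    lda = LdaModel(corpus, num_topics=num_topics, id2word=vocab,
                   alpha=0.01, passes=10, random_state=0)
    vecs = []
    for bow in corpus:
        dist = dict(lda.get_document_topics(bow, minimum_probability=0.0))
        vecs.append([dist.get(t, 0.0) for t in range(num_topics)])
    return vecs  # one (skewed) topic-distribution vector per post
```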

3.4. C-LSTM-Att Encoder

Zhou et al. (2015) introduced the C-LSTM model, which stacks a convolutional neural network (CNN) and a long short-term memory network (LSTM) (Hochreiter and Schmidhuber, 1997) for text classification. Motivated by C-LSTM’s good performance on text classification, we modify C-LSTM by adding an attention mechanism (Yang et al., 2016) to select informative words that contribute more towards detecting hate speech in social media posts. Our proposed network also supports convolution with multiple filter sizes. We name the modified model C-LSTM-Att. The C-LSTM-Att model is used to encode the post’s pre-trained word embeddings, i.e., $E_{glove}$, $E_{wiki}$, and $E_{para}$, and its sentiment embedding, $E_{senti}$, into the post’s latent semantic and sentiment representations. As the process of encoding the four feature embeddings is similar, in the remainder of this section we use $E$ to denote a generic word embedding and $f$ a generic latent semantic or sentiment representation of a post.

CNN Component. This component is a slight variant of the traditional convolutional network (Goodfellow et al., 2016). Let $x_i \in \mathbb{R}^d$ be the $d$-dimensional word vector of the $i$-th word in a post $p$, and let $x \in \mathbb{R}^{n \times d}$ denote the input post, where $n$ is the length of $p$. Let $l$ be the length of a filter, and let the vector $m \in \mathbb{R}^{l \times d}$ be a filter for the convolution operation. For each position $j$ in the post, we have a window vector $w_j$ with $l$ consecutive word vectors, denoted as:

$w_j = [x_j, x_{j+1}, \ldots, x_{j+l-1}]$   (2)

where the commas represent row vector concatenation. A feature map $c \in \mathbb{R}^{n-l+1}$ is generated as the filter $m$ convolves with the window vectors ($l$-grams) at each position. Each element $c_j$ of the feature map for window vector $w_j$ is produced as follows:

$c_j = g(w_j \circ m + b)$   (3)

where $b$ is a bias term and $g$ is the ReLU nonlinear transformation function. We use $k$ filters to generate $k$ feature maps. More specifically, the output of the convolution operation with window size $l$ and $k$ filters is $W \in \mathbb{R}^{(n-l+1) \times k}$. We assume there are $s$ different filter sizes, so the overall output is $\{W_1, \ldots, W_s\}$. Each output $W_i$ is subsequently fed into the LSTM component individually. Because the LSTM expects sequential input while pooling operators break the sequence structure by selecting discontinuous features, we do not apply pooling after the convolution operation.
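The convolution stage can be sketched in PyTorch as follows. The window sizes here are placeholders (Section 3.6 states only that three window sizes with 50 filters each are used), and pooling is deliberately omitted so the LSTM receives an ordered feature sequence.

```python
import torch
import torch.nn as nn

class ConvFeatures(nn.Module):
    """CNN component sketch: k filters per window size, ReLU, no pooling."""

    def __init__(self, emb_dim=300, num_filters=50, window_sizes=(2, 3, 4)):
        super().__init__()
        # one Conv1d per assumed window size l; each yields k = num_filters maps
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, kernel_size=l) for l in window_sizes
        )

    def forward(self, x):
        # x: (batch, seq_len, emb_dim) -> Conv1d expects (batch, emb_dim, seq_len)
        x = x.transpose(1, 2)
        # each output: (batch, seq_len - l + 1, num_filters), one per window size
        return [torch.relu(conv(x)).transpose(1, 2) for conv in self.convs]
```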

LSTM Component. We adopt the LSTM model (Goodfellow et al., 2016) to learn long-range dependencies from the higher-order sequential features generated by the CNN component. Formally, the transition functions are defined as follows:

$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
$h_t = o_t \odot \tanh(c_t)$   (4)

where $x_t$ denotes the current input feature representation drawn from $\{W_1, \ldots, W_s\}$; $h_{t-1}$ is the hidden state at the previous time step; and $i_t$, $o_t$, $f_t$ represent the values of the input gate, output gate, and forget gate at time step $t$, respectively. We use $u$ hidden states in the LSTM. These gates collectively decide how to update the current memory cell $c_t$ and the current hidden state $h_t$. $W_*$ and $U_*$ are weight matrices and $b_*$ are biases parameterizing the transformations. Here, $\sigma$ is the logistic sigmoid function with output in [0, 1], $\tanh$ denotes the hyperbolic tangent function with output in [-1, 1], and $\odot$ denotes element-wise multiplication. An attention mechanism is subsequently applied to the learned hidden states to capture the informative words that contribute more towards detecting hate speech in a social media post.

Attention Mechanism. Not all words in a post contribute equally to the detection of hate speech. We adapt the word attention proposed by Yang et al. (2016) to emphasize the important word features extracted by the LSTM and aggregate the representations of these informative word features to form the post’s latent representation.

Specifically, we let $H \in \mathbb{R}^{u \times T}$ be a matrix consisting of the hidden vectors $[h_1, \ldots, h_T]$ produced by the C-LSTM model, where $u$ is the size of the hidden layers and $T$ is the length of the output; $T = n - l + 1$ when the filter window size is $l$. The final hidden state of the LSTM, $h_T$, contains information about the whole sentence. To implement self-attention over the sentence, we consider the relation between each word and the meaning of the whole sentence. Consequently, we use the final hidden state $h_T$ to attend to the hidden state of each word to obtain the importance of each word for detecting hate speech. The attention mechanism produces an attention weight vector $\alpha$ and the post’s latent representation $r$:

$M = \tanh(W_1 H + (W_2 h_T)\mathbf{1}^{\top})$   (5)
$\alpha = \mathrm{softmax}(v^{\top} M)$   (6)
$r = H \alpha^{\top}$   (7)

where $W_1, W_2 \in \mathbb{R}^{m \times u}$ are projection parameters, $m$ is a middle dimension of the computation, and $v \in \mathbb{R}^{m}$ projects to the attention scores. $\alpha \in \mathbb{R}^{T}$ is a vector of attention weights and $r$ is the attended output for a single filter size. As there are multiple filter sizes, we concatenate the outputs of the attention module to form the final latent representation of the post. In this fashion, the post’s semantic representations $f_{glove}$, $f_{wiki}$, and $f_{para}$, and its sentiment representation $f_{senti}$, are all learned.
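The following PyTorch sketch mirrors our reconstruction of Eqs. (5)–(7) for a single filter size; using the final hidden state as the attention query and projecting to a middle dimension follow the description above, while the variable names are ours.

```python
import torch
import torch.nn as nn

class FinalStateAttention(nn.Module):
    """Self-attention sketch (Eqs. 5-7): the final LSTM hidden state
    attends over all hidden states; mid_dim plays the role of m."""

    def __init__(self, hidden_dim=200, mid_dim=100):
        super().__init__()
        self.W1 = nn.Linear(hidden_dim, mid_dim, bias=False)
        self.W2 = nn.Linear(hidden_dim, mid_dim, bias=False)
        self.v = nn.Linear(mid_dim, 1, bias=False)

    def forward(self, H):
        # H: (batch, T, hidden_dim); h_T is the final hidden state
        h_T = H[:, -1, :]                                        # (batch, hidden)
        M = torch.tanh(self.W1(H) + self.W2(h_T).unsqueeze(1))   # Eq. 5
        alpha = torch.softmax(self.v(M).squeeze(-1), dim=1)      # Eq. 6, (batch, T)
        r = torch.bmm(alpha.unsqueeze(1), H).squeeze(1)          # Eq. 7, (batch, hidden)
        return r, alpha
```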

Figure 2. Sentiment distribution of the datasets. Panels a, b, and c correspond to the distributions of WZ-LS, DT, and FOUNTA, respectively. In the pie charts, green denotes tweets labeled positive, orange denotes negative, and blue denotes neutral.

3.5. Representation Fusion

After learning the latent semantic, sentiment, and topic representations, we perform operations to effectively combine the different post representations in the representation fusion layer.

In this layer, we adopt a gated attention network to fuse information from multiple modalities. The gate attention mechanism proposed by Tang et al. (2019) is effective for combining two vectors; here we extend the gate fusion from two modalities to three. The sentiment and topic representations are regarded as information complementary to the semantic representation, so they are first weighted by their interaction with the semantic representation. An element-wise sum then combines the weighted sentiment and topic representations with the semantic representation:

$g_{senti} = \sigma(W_{senti}[f_{sem}; f_{senti}])$   (8)
$g_{topic} = \sigma(W_{topic}[f_{sem}; f_{topic}])$   (9)
$f_{joint} = f_{sem} + g_{senti} \odot f_{senti} + g_{topic} \odot f_{topic}$   (10)

where $W_{senti}$ and $W_{topic}$ are weight matrices. The resulting joint representation $f_{joint}$ is a high-level representation of the post and is used as the feature vector for multi-class hate speech classification in the softmax layer.
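Below is a minimal sketch of the three-modality gate fusion (Eqs. (8)–(10)) under our reading of the gating form: the gates are sigmoid projections of the concatenated vector pairs. The exact parameterization in the released code may differ, and the topic vector is assumed to have been projected to the shared dimension.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Three-modality gate fusion sketch (Eqs. 8-10)."""

    def __init__(self, dim):
        super().__init__()
        self.W_senti = nn.Linear(2 * dim, dim)
        self.W_topic = nn.Linear(2 * dim, dim)

    def forward(self, f_sem, f_senti, f_topic):
        # gates computed from the interaction with the semantic vector
        g_senti = torch.sigmoid(self.W_senti(torch.cat([f_sem, f_senti], dim=-1)))
        g_topic = torch.sigmoid(self.W_topic(torch.cat([f_sem, f_topic], dim=-1)))
        # element-wise sum of the semantic and the gated complementary vectors
        return f_sem + g_senti * f_senti + g_topic * f_topic
```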

3.6. Implementation Details

We use the same embedding size (300) for the pre-trained word embeddings and the sentiment embedding. Additionally, we add dropout layers after the embedding layer and the fully connected layer for regularization. Each CNN layer uses three filter window sizes with 50 filters each. The number of hidden states in the LSTM is set to 200. We use ADAM (Kingma and Ba, 2014) as the optimizer to train our model.

4. Dataset

We evaluate DeepHate on three publicly available datasets and one dataset constructed by merging the tweets of the three. The first three datasets, namely WZ-LS (Park and Fung, 2017), DT (Davidson et al., 2017), and FOUNTA (Founta et al., 2018), are widely used in hate speech detection studies. The merged dataset was constructed by concatenating the three datasets and removing the spam tweets. Table 1 shows a statistical summary of these datasets. The table clearly shows that the datasets are diverse in both size and nature, allowing us to evaluate our proposed model more comprehensively.

Dataset #tweets Classes (#tweets)
WZ-LS 13,202 racism (82), sexism (3,332), both (21), neither (9,767)
DT 24,783 hate (1,430), offensive (19,190), neither (4,163)
FOUNTA 89,990 normal (53,011), abusive (19,232), spam (13,840), hate (3,907)
COMBINED 114,120 normal (66,941), inappropriate (47,194)
Table 1. Statistics of the datasets used in our experiments

WZ-LS dataset. Park and Fung (2017) combined two Twitter datasets (Waseem and Hovy, 2016; Waseem, 2016) to form the WZ-LS dataset. The dataset breaks down hate speech into four classes: racism, sexism, both, and neither. As only the ids of the tweets were released in (Park and Fung, 2017), we retrieved the text of the tweets using Twitter’s APIs. However, some of the tweets have since been deleted by Twitter due to their inappropriate content; thus, our dataset is slightly smaller than the original dataset reported in (Park and Fung, 2017).

DT dataset. Davidson et al. (2017) argued that hate speech should be differentiated from offensive tweets; some tweets may contain hateful words yet be merely offensive, without meeting the threshold for classification as hate speech. The researchers constructed the DT Twitter dataset, in which tweets were manually labeled and categorized into three categories: offensive, hate, and neither.

FOUNTA dataset. The FOUNTA dataset was recently published in (Founta et al., 2018). It is a human-annotated dataset that went through two rounds of annotation. In the first round, annotators were required to classify tweets into three categories: normal, spam, and inappropriate. Subsequently, the annotators were asked to further refine the labels of the tweets in the “inappropriate” category. The final version of the dataset consists of four classes: normal, spam, hate, and abusive. We found duplicated tweets in the FOUNTA dataset because its annotators included retweets. For our experiments, we removed the retweets, resulting in the distribution shown in Table 1.

COMBINED dataset. The COMBINED dataset is an aggregation of all inappropriate tweets (including offensive and hate tweets) and normal tweets from the three datasets above. We postulate that the aggregated dataset is closer to real-world applications, as social media platform providers are likely to be interested in detecting and reducing both hate and offensive tweets. The COMBINED dataset is the largest dataset and contains diverse types of inappropriate tweets.

5. Experiment

In this section, we first describe the settings of the experiments conducted to evaluate our DeepHate model. Next, we discuss the experiment results and evaluate how DeepHate fares against other state-of-the-art baselines. We then conduct more in-depth ablation studies on the various components of DeepHate. Empirical analyses of the sentiment, topics, and salient features that aid hate speech detection are also conducted. Finally, we discuss case studies that illustrate DeepHate’s strengths and limitations in hate speech detection.

5.1. Experiment Setup

Baselines. For evaluation, we compare the DeepHate model with the following state-of-the-art baselines that utilized textual content for hate speech detection:

Model Micro-Prec Micro-Rec Micro-F1
CNN-W 75.95 78.57 75.54
CNN-C 54.77 74.01 62.95
CNN-B 76.30 79.08 74.78
LSTM-W 75.39 79.52 74.52
LSTM-C 74.82 78.13 71.95
LSTM-B 54.77 74.01 62.95
HybridCNN 76.35 78.93 75.98
CNN-GRU 75.33 79.27 74.42
DeepHate 77.95 79.48 78.19
Table 2. Experiment results of DeepHate and baselines on WZ-LS dataset
Model Micro-Prec Micro-Rec Micro-F1
CNN-W 87.88 88.65 87.95
CNN-C 60.53 77.43 67.60
CNN-B 78.02 80.33 77.01
LSTM-W 88.08 89.08 87.87
LSTM-C 77.21 79.88 76.47
LSTM-B 59.97 77.44 67.60
HybridCNN 88.33 88.96 88.07
CNN-GRU 87.60 88.24 87.23
DeepHate 89.97 90.39 89.92
Table 3. Experiment results of DeepHate and baselines on DT dataset
Model Micro-Prec Micro-Rec Micro-F1
CNN-W 78.26 79.71 78.27
CNN-C 69.66 70.15 64.40
CNN-B 52.01 58.41 50.64
LSTM-W 78.54 79.87 78.48
LSTM-C 70.15 70.89 66.30
LSTM-B 55.22 62.71 54.52
HybridCNN 78.34 79.24 77.73
CNN-GRU 78.62 80.17 78.39
DeepHate 78.95 80.43 79.09
Table 4. Experiment results of DeepHate and baselines on FOUNTA dataset.
Model Micro-Prec Micro-Rec Micro-F1
CNN-W 91.86 91.77 91.72
CNN-C 79.56 79.06 78.50
CNN-B 42.26 58.63 43.67
LSTM-W 91.54 91.54 91.52
LSTM-C 82.46 80.64 79.70
LSTM-B 64.93 65.50 63.85
HybridCNN 91.77 91.72 91.67
CNN-GRU 91.63 91.40 91.31
DeepHate 92.48 92.45 92.43
Table 5. Experiment results of DeepHate and baselines on COMBINED dataset.
  • CNN: Previous studies have utilized CNN to perform automatic hate speech detection and achieved good results (Badjatiya et al., 2017; Gambäck and Sikdar, 2017; Agrawal and Awekar, 2018). For baselines, we train three CNN models with different input embeddings: word embedding (CNN-W), character embedding (CNN-C), and character-bigram embedding (CNN-B).

  • LSTM: The LSTM model is another model commonly explored in previous hate speech detection studies (Badjatiya et al., 2017; Agrawal and Awekar, 2018; Gröndahl et al., 2018). Similarly, we train three LSTM models with different input embeddings: word embedding (LSTM-W), character embedding (LSTM-C), and character-bigram embedding (LSTM-B).

  • HybridCNN: We replicate the HybridCNN model proposed by Park and Fung (Park and Fung, 2017) for comparison. The HybridCNN model trains CNN over both word and character embeddings for hate speech detection.

  • CNN-GRU: The CNN-GRU model that was proposed in a recent study by Zhang et al. (Zhang et al., 2018) is also replicated in our study as a baseline. The CNN-GRU model takes word embeddings as input.

Sentiment Learning. As mentioned in Section 3.2, we learn each post’s sentiment using the VADER (Hutto and Gilbert, 2014) sentiment analysis tool. The learned sentiments are used as labels (i.e., negative, positive, and neutral) to train the sentiment representations of the posts. Figure 2 shows the sentiment distributions of the three datasets.

Topic Modeling. In learning the posts’ topic representations, we set the numbers of topics for the WZ-LS, DT, and FOUNTA datasets to 15, 10, and 15, respectively.

Training and Testing Set. In our experiments, we adopt an 80-20 split, where for each dataset, 80% of the posts are used for training with the remaining 20% used for testing.

Evaluation Metrics. Similar to most existing hate speech detection studies, we use micro-averaged precision (Micro-Prec), recall (Micro-Rec), and F1 score (Micro-F1) as the evaluation metrics. Micro-averaging is preferred in our experiments because of the class imbalance in the hate speech datasets. Five-fold cross-validation is used in our experiments, and the average results over the five folds are reported.
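For reproducibility, the micro-averaged metrics can be computed with scikit-learn as sketched below (variable names are illustrative):

```python
from sklearn.metrics import precision_recall_fscore_support

def micro_scores(y_true, y_pred):
    # micro-averaging aggregates counts over all classes, which is
    # robust to the class imbalance in the hate speech datasets
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average='micro')
    return prec, rec, f1
```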

5.2. Experiment Results

Table 2 shows the experiment results on the WZ-LS dataset, while Tables 3, 4, and 5 show those on the DT, FOUNTA, and COMBINED datasets, respectively. In each table, the highest figures are highlighted in bold. We observe that DeepHate outperforms the state-of-the-art baselines. Interestingly, the CNN and LSTM models with word embeddings perform better than those with character-level embedding inputs, suggesting that word-level semantic information yields good performance for hate speech detection. Character-level bi-gram embedding performs worst among the three types of input feature embeddings; a possible reason is that posts are short and character-level bi-grams can be ambiguous. We also note that the basic CNN and LSTM models with word embeddings outperform more complicated models such as HybridCNN and CNN-GRU on some occasions.

It is worth noting that there are differences between the results of the HybridCNN and CNN-GRU models in our experiments and the results reported in previous studies (Park and Fung, 2017; Zhang et al., 2018). For instance, previous studies of HybridCNN (Park and Fung, 2017) and CNN-GRU (Zhang et al., 2018) conducted experiments on the WZ-LS dataset. However, we did not cite the previous scores directly because some of the tweets in WZ-LS have been deleted. Similarly, CNN-GRU was previously applied to the DT dataset, but in that work (Zhang et al., 2018) the researchers recast the problem as binary classification by re-labeling the offensive tweets as non-hate. In our experiments, we perform the classification based on the original DT dataset (Davidson et al., 2017). Therefore, we replicated the HybridCNN and CNN-GRU models and applied them to the updated WZ-LS dataset and the original DT dataset.

5.3. Ablation Study

Our proposed DeepHate model is made up of several components. In this evaluation, we perform an ablation study to investigate the effects of different latent post representations. Specifically, we compare the following settings:

  • Semantic: We apply the DeepHate model using only the post’s latent semantic representation $f_{sem}$.

  • Topic+Semantic: We apply the DeepHate model using only the post’s latent semantic representation $f_{sem}$ and topic representation $f_{topic}$.

  • Sentiment+Semantic: We apply the DeepHate model using only the post’s latent semantic representation $f_{sem}$ and sentiment representation $f_{senti}$.

  • DeepHate: Our original DeepHate model, which utilizes the post’s semantic, topic, and sentiment representations.

Model Micro-Prec Micro-Rec Micro-F1
Semantic 77.00 78.75 77.04
Topic+Semantic 77.98 79.32 77.98
Sentiment+Semantic 77.09 78.62 77.35
DeepHate 77.95 79.48 78.19
Table 6. Performance of various DeepHate components on WZ-LS dataset
Model Micro-Prec Micro-Rec Micro-F1
Semantic 89.44 90.24 89.49
Topic+Semantic 89.56 90.28 89.64
Sentiment+Semantic 89.59 90.39 89.64
DeepHate 89.97 90.39 89.92
Table 7. Performance of various DeepHate components on DT dataset
Model Micro-Prec Micro-Rec Micro-F1
Semantic 78.68 80.40 78.57
Topic+Semantic 78.77 80.45 78.62
Sentiment+Semantic 78.88 80.53 78.79
DeepHate 78.95 80.43 79.09
Table 8. Performance of various DeepHate components on FOUNTA dataset
Model Micro-Prec Micro-Rec Micro-F1
Semantic 92.26 92.23 92.20
Topic+Semantic 92.33 92.30 92.27
Sentiment+Semantic 92.32 92.28 92.25
DeepHate 92.48 92.45 92.43
Table 9. Performance of various DeepHate components on COMBINED dataset

Tables 6, 7, 8, and 9 show the results of our ablation studies on the WZ-LS, DT, FOUNTA, and COMBINED datasets, respectively. The post’s latent semantic representation, which combines the latent representations learned from three pre-trained word embeddings, outperforms LSTM or CNN models with randomly initialized word embeddings. We postulate that the combination of three pre-trained word embeddings yields more expressive semantic representations and ultimately better hate speech detection. Adding the post’s latent topic and sentiment representations to its semantic representation is observed to improve performance, suggesting that topic and sentiment information are useful for hate speech detection. Finally, the full DeepHate model, which utilizes the post’s semantic, topic, and sentiment representations, consistently outperforms the other configurations across the four datasets, confirming that multi-faceted textual representations improve hate speech detection.

5.4. Salient Feature Analysis

To analyze how DeepHate extracts important features from textual information to perform hate speech detection, we examine the most salient sections of a post’s latent semantic representations. We adopt the saliency score defined in (Li et al., 2016) to measure the saliency of input features. The score indicates how sensitive the model is to changes in the embedding input, i.e., in our case, how much a specific word contributes to the final classification decision. Figures 3, 4, and 5 show the visualized saliency scores for the WZ-LS, DT, and FOUNTA datasets, respectively. The more salient a section of the post’s input feature representation, the darker its shade of red.
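A sketch of this computation, following the first-derivative saliency of Li et al. (2016): we take the gradient of the predicted class score with respect to the word embeddings and use its norm per word. The model interface here is a simplifying assumption.

```python
import torch

def saliency_scores(model, embedded_post, target_class):
    """embedded_post: (1, seq_len, emb_dim) tensor of word embeddings.
    'model' is assumed to map embedded posts to class logits."""
    embedded_post = embedded_post.clone().detach().requires_grad_(True)
    logits = model(embedded_post)          # (1, num_classes)
    logits[0, target_class].backward()     # d(class score)/d(embedding)
    # saliency of each word = L2 norm of its embedding gradient
    return embedded_post.grad.norm(dim=-1).squeeze(0)   # (seq_len,)
```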

From the visualizations, we notice that the salient words highlighted in a post are descriptive of its true label. For example, in Figure 3, the words “lady she’s awful” are highlighted for a post labeled as sexism. Similarly, words such as “nigga” are salient in a racism post. More interestingly, words indicative of both sexism and racism are highlighted for posts labeled as containing both. Similar observations hold for the other datasets; for instance, in the DT dataset, the Figure 4 visualization shows that offensive lexicons are differentiated from hateful ones. The salient feature analysis demonstrates DeepHate’s capability to extract critical textual features for hate speech detection.

Figure 3. Visualized saliency scores on example posts (labeled Sexism, Racism, and Both) in the WZ-LS dataset.
Figure 4. Visualized saliency scores on example posts (labeled Offensive and Hate) in the DT dataset.
Figure 5. Visualized saliency scores on example posts (labeled Abusive and Hate) in the FOUNTA dataset.

The extracted critical textual features also provide a form of explanation for DeepHate’s classification decisions: they highlight the keywords that suggest hateful or abusive language in the text. Besides achieving excellent detection performance, hate speech detection models should also be explainable so that content moderators can provide reasons for content removal. To the best of our knowledge, this is the first deep-learning-based hate speech detection study that supports explainability.

5.5. Case Studies

To gain a better understanding of the differences between DeepHate and the baseline models, we qualitatively evaluate the models by examining some example posts and their classification results. Tables 10, 11, and 12 show example posts from the WZ-LS, DT, and FOUNTA datasets, respectively. For each example post, we display the true label and the predicted labels from DeepHate and the best baseline, CNN-GRU. Correct predictions are marked in green font, while incorrect predictions are shown in red font.

From the example posts, we observe that the posts’ sentiments have a profound impact on DeepHate’s hate speech detection performance. For instance, for posts with “normal” or “non-hate” true labels (e.g., posts 1, 9, and 15), DeepHate correctly classified the posts while CNN-GRU made incorrect predictions. We postulate that the sentiment information utilized by DeepHate, i.e., the neutral sentiment, helped the model to minimize false positives, i.e., labeling non-hate speech as hate speech. However, the dependence on sentiment information also exposes a limitation of the DeepHate model: inaccurate sentiment information may adversely affect DeepHate’s performance. For example, in posts 5, 11, 12, 16, and 17, the posts’ sentiments are labeled as neutral, and DeepHate wrongly classified these posts; the sentiment labels learned using the VADER model are inaccurate here, as examining the posts’ text clearly suggests negative sentiment. Thus, for future work, we will explore utilizing or designing better sentiment classification techniques to boost DeepHate’s performance in hate speech detection.

S/N Post True Label CNN-GRU DeepHate
1 Kats face the only thing worse than her cooking. Neither Sexist Neither (Sentiment: Neutral)
2 I’m annoyed to have missed this live need some b*tchy. Sexist Neither Sexist (Sentiment: Negative)
3 Man w*f is this d*mn ass nig*a doing on that d*mn commercial. Racist Sexist Racist (Sentiment: Negative)
4 Kat is f*cking c*nt. Sexist Sexist Neither (Sentiment: Negative)
5 It’s not the only thing. She’s done lot of, I’m guessing. Sexist Neither Neither (Sentiment: Neutral)
6 Would slap the smirk off Kat’s face. Neither Sexist Sexist (Sentiment: Neutral)
Table 10. Example WZ-LS dataset posts evaluated using DeepHate and the best baseline (i.e., CNN-GRU)
S/N Post True Label CNN-GRU DeepHate
7 B*tch shut the f*ck up goddam your sl*t b*tch wh*re nig*a. Hate Offensive Hate (Sentiment: Negative)
8 Stop looking p*rn fag. Offensive Non-hate Offensive (Sentiment: Negative)
9 Okay, I’m going to say it once comb yer beards. Non-hate Offensive Non-hate (Sentiment: Neutral)
10 Why do people even talk about white privilege when the majority of food stamp recipients are white people. Hate Hate Non-hate (Sentiment: Neutral)
11 And f*ck you too ya, little b*tch. You look like Mexican sucking c*ck in ur profile. Hate Offensive Offensive (Sentiment: Neutral)
12 The nig is fun to watch. Got to admit Ebola boy can speak broken Spanish too, lol. Hate Offensive Offensive (Sentiment: Neutral)
Table 11. Example DT dataset posts evaluated using DeepHate and the best baseline (i.e., CNN-GRU)
S/N Post True Label CNN-GRU DeepHate
13 Hate this b*tch with all the hate on the world. Hate Abusive Hate (Sentiment: Negative)
14 Slow replies annoy the hell out of me. Abusive Normal Abusive (Sentiment: Negative)
15 We have the worst maternal mortality rate of any modern country and that will kill women. What is wrong with them? Normal Hate Normal (Sentiment: Neutral)
16 Don’t like to hear women call their baby their little men. This is just another gender storm in cup. Normal Normal Abusive (Sentiment: Neutral)
17 Looks like both the Chinese and the India news media have temporary amnesia. Hate Normal Abusive (Sentiment: Neutral)
18 Netizens are so f*ckin annoying and dumb. Hate Abusive Abusive (Sentiment: Negative)
Table 12. Example FOUNTA dataset posts evaluated using DeepHate and the best baseline (i.e., CNN-GRU)

6. Conclusion and Future Work

In this paper, we proposed a novel deep learning framework known as DeepHate, which utilizes multi-faceted text representations for automatic hate speech detection. We evaluated DeepHate on three publicly available real-world datasets, and our extensive experiments showed that DeepHate outperforms the state-of-the-art baselines. We also empirically analyzed the DeepHate model and provided insights into the salient features that best aid in detecting hate speech on online social platforms. Our salient feature analysis provides a form of explanation for DeepHate’s classification decisions. For future work, we would like to incorporate non-textual features into the DeepHate model and improve the posts’ sentiment and topic representations with more advanced techniques.

References

  • S. Agrawal and A. Awekar (2018) Deep learning for detecting cyberbullying across multiple social media platforms. In European Conference on Information Retrieval, pp. 141–153.
  • A. Arango, J. Pérez, and B. Poblete (2019) Hate speech detection is not as easy as you may think: a closer look at model validation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 45–54.
  • P. Badjatiya, S. Gupta, M. Gupta, and V. Varma (2017) Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion, pp. 759–760.
  • R. Balasubramanyan and W. W. Cohen (2013) Regularization of latent variable models to obtain sparsity. In Proceedings of the 2013 SIAM International Conference on Data Mining, pp. 414–422.
  • D. M. Blei, A. Y. Ng, and M. I. Jordan (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 3 (Jan), pp. 993–1022.
  • D. M. Blei (2012) Probabilistic topic models. Communications of the ACM 55 (4), pp. 77–84.
  • Bloomberg (2019). (Online article.)
  • Cambridge Dictionary. (Website.)
  • D. Chatzakou, N. Kourtellis, J. Blackburn, E. De Cristofaro, G. Stringhini, and A. Vakali (2017) Mean birds: detecting aggression and bullying on Twitter. In Proceedings of the 2017 ACM Web Science Conference, pp. 13–22.
  • Y. Chen, Y. Zhou, S. Zhu, and H. Xu (2012) Detecting offensive language in social media to protect adolescent online safety. In 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, pp. 71–80.
  • T. Davidson, D. Warmsley, M. Macy, and I. Weber (2017) Automated hate speech detection and the problem of offensive language. In Eleventh International AAAI Conference on Web and Social Media.
  • N. Djuric, J. Zhou, R. Morris, M. Grbovic, V. Radosavljevic, and N. Bhamidipati (2015) Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web, pp. 29–30.
  • Facebook. (Website.)
  • P. Fortuna and S. Nunes (2018) A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR) 51 (4), pp. 1–30.
  • A. M. Founta, D. Chatzakou, N. Kourtellis, J. Blackburn, A. Vakali, and I. Leontiadis (2019) A unified deep learning architecture for abuse detection. In Proceedings of the 10th ACM Conference on Web Science, pp. 105–114.
  • A. M. Founta, C. Djouvas, D. Chatzakou, I. Leontiadis, J. Blackburn, G. Stringhini, A. Vakali, M. Sirivianos, and N. Kourtellis (2018) Large scale crowdsourcing and characterization of Twitter abusive behavior. In Twelfth International AAAI Conference on Web and Social Media.
  • B. Gambäck and U. K. Sikdar (2017) Using convolutional neural networks to classify hate-speech. In Proceedings of the First Workshop on Abusive Language Online, pp. 85–90.
  • A. Giachanou and F. Crestani (2016) Like it or not: a survey of Twitter sentiment analysis methods. ACM Computing Surveys (CSUR) 49 (2), pp. 1–41.
  • Y. Goldberg (2016) A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research 57, pp. 345–420.
  • I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep feedforward networks. In Deep Learning, pp. 168–227.
  • T. Gröndahl, L. Pajola, M. Juuti, M. Conti, and N. Asokan (2018) All you need is “love”: evading hate speech detection. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, pp. 2–12.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
  • L. Hong and B. D. Davison (2010) Empirical study of topic modeling in Twitter. In Proceedings of the First Workshop on Social Media Analytics, pp. 80–88.
  • C. J. Hutto and E. Gilbert (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. In Eighth International AAAI Conference on Weblogs and Social Media.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • J. Li, X. Chen, E. Hovy, and D. Jurafsky (2016) Visualizing and understanding neural models in NLP. In NAACL-HLT.
  • B. Mathew, R. Dutt, P. Goyal, and A. Mukherjee (2019) Spread of hate speech in online social media. In Proceedings of the 10th ACM Conference on Web Science, pp. 173–182.
  • Y. Mehdad and J. Tetreault (2016) Do characters abuse more than words? In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 299–303.
  • T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin (2018) Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).
  • C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, and Y. Chang (2016) Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web, pp. 145–153.
  • E. Papegnies, V. Labatut, R. Dufour, and G. Linares (2017) Graph-based features for automatic online abuse detection. In International Conference on Statistical Language and Speech Processing, pp. 70–81.
  • J. H. Park and P. Fung (2017) One-step and two-step classification for abusive language detection on Twitter. In Proceedings of the First Workshop on Abusive Language Online, pp. 41–45.
  • J. Pennington, R. Socher, and C. D. Manning (2014) GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543.
  • K. Relia, Z. Li, S. H. Cook, and R. Chunara (2019) Race, ethnicity and national origin-based discrimination in social media and hate crimes across 100 US cities. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13, pp. 417–427.
  • A. Schmidt and M. Wiegand (2017) A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pp. 1–10.
  • V. K. Singh, S. Ghosh, and C. Jose (2017) Toward multimodal cyberbullying detection. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 2090–2099.
  • D. Tang, F. Wei, N. Yang, M. Zhou, T. Liu, and B. Qin (2014) Learning sentiment-specific word embedding for Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1555–1565.
  • M. Tang, J. Cai, and H. H. Zhuo (2019) Multi-matching network for multiple choice reading comprehension. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pp. 7088–7095.
  • Times (2019). (Online article.)
  • Twitter. (Website.)
  • Z. Waseem and D. Hovy (2016) Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop, pp. 88–93.
  • Z. Waseem (2016) Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science, pp. 138–142.
  • J. Wieting, M. Bansal, K. Gimpel, and K. Livescu (2015) From paraphrase database to compositional paraphrase model and back. Transactions of the Association for Computational Linguistics 3, pp. 345–358.
  • M. Williams (2019). (Online article.)
  • G. Xiang, B. Fan, L. Wang, J. Hong, and C. Rose (2012) Detecting offensive tweets via topical feature discovery over a large scale Twitter corpus. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1980–1984.
  • Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy (2016) Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1480–1489.
  • Z. Zhang, D. Robinson, and J. Tepper (2018) Detecting hate speech on Twitter using a convolution-GRU based deep neural network. In European Semantic Web Conference, pp. 745–760.
  • W. X. Zhao, J. Jiang, J. Weng, J. He, E. Lim, H. Yan, and X. Li (2011) Comparing Twitter and traditional media using topic models. In European Conference on Information Retrieval, pp. 338–349.
  • C. Zhou, C. Sun, Z. Liu, and F. Lau (2015) A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630.