Context-Aware Attention for Understanding Twitter Abuse

09/24/2018 ∙ by Tuhin Chakrabarty, et al. ∙ Columbia University

The original goal of any social media platform is to enable users to engage in healthy and meaningful conversations. More often than not, however, these platforms become avenues for wanton attacks. To help alleviate this issue, we provide a detailed analysis of how abusive behavior on Twitter can be monitored. The complexity of natural language constructs makes this task challenging. We show how applying contextual attention to Long Short Term Memory networks helps us achieve near state-of-the-art results on multiple benchmark abuse detection data sets from Twitter.




1. Introduction & Related Work

Any social interaction involves an exchange of viewpoints and thoughts, and these views and thoughts can be caustic. Users often resort to verbal abuse to win an argument or to overshadow someone else's opinion. On Twitter, people from every sphere have experienced online abuse, from famous celebrities with millions of followers to members of marginalized communities such as LGBTQ people and women. We want to harness Natural Language Processing (NLP) for social good and aid in flagging abusive tweets and users. Detecting abuse on Twitter is challenging, particularly because the text is often noisy, and abuse can take many different forms. (Waseem and Hovy, 2016) released one of the first Twitter data sets aimed at identifying racism and sexism. (Thomas Davidson, 2016) pointed out that hate speech is different from offensive language and released a data set of 25k tweets with the goal of distinguishing the two.

Stop saying dumb blondes with pretty faces as you need a pretty face to pull them off !!! #mkr
In Islam women must be locked in their houses and Muslims claim this is treating them well
Table 1. Tweets from the (Waseem and Hovy, 2016) data set demonstrating online abuse

(Thomas Davidson, 2016) find that racist and homophobic tweets are more likely to be classified as hate speech, whereas sexist tweets are generally classified as offensive.

(Jennifer Golbeck, 2017) introduced a large, hand-coded corpus of online harassment data for studying the nature of harassing comments and the culture of trolling. With these motivations in mind, we make the following contributions:

  • We build a deep, context-aware, attention-based model for abusive behavior detection on Twitter. To the best of our knowledge, ours is the first work that exploits context-aware attention for this task.

  • Our model is robust and achieves consistent performance gains on all three abuse data sets.

  • We show how context-aware attention helps the model focus on abusive keywords when they are used in specific contexts, improving abusive behavior detection.

2. Related Work

Existing approaches to abusive text detection fall broadly into two categories: 1) feature-intensive machine learning algorithms such as Logistic Regression (LR) and the Multilayer Perceptron (MLP), and 2) deep learning models, which learn feature representations on their own.

(Waseem and Hovy, 2016) released the popular data set of 16k tweets annotated as sexism, racism, or none, and provided a feature-engineered model for detecting abuse in their corpus. (Thomas Davidson, 2016) use a similar handcrafted, feature-engineered model to identify offensive language and distinguish it from hate speech. (Badjatiya et al., 2017) experiment with multiple deep learning architectures for hate speech detection on Twitter, using the same data set as (Waseem and Hovy, 2016). Their best reported F1 score is achieved with Long Short Term Memory networks (LSTM) + Gradient Boosting.

On the data set released by (Waseem and Hovy, 2016), (Park and Fung, 2017) experiment with a two-step approach: they first detect abusive language and then classify it into specific types, i.e. racist, sexist, or none. They achieve their best results with a Hybrid Convolutional Neural Network (CNN), with the intuition that character-level input counters purposely or mistakenly misspelled words and made-up vocabulary.

(Pavlopoulos et al., 2017a) ran experiments on the Gazetta data set and the DETOX system (Wulczyn et al., 2017) and show that a Recurrent Neural Network (RNN) coupled with deep, classification-specific attention outperforms the previous state of the art in abusive comment moderation. In more recent work, (Pavlopoulos et al., 2017b) explored how user embeddings, user-type embeddings, and user-type biases can improve their previous RNN-based model on the Gazetta data set. Attentive neural networks have been shown to perform well on a variety of NLP tasks (Yang et al., 2016; Wenpeng Yin and Zhou, 2015). (Yang et al., 2016) use hierarchical contextual attention for text classification (i.e. attention at both the word and the sentence level) on six large-scale text classification tasks and demonstrate that the proposed architecture outperforms previous methods by a substantial margin. We focus primarily on word-level attention because most tweets consist of a single sentence.

3. Model

LSTMs are a natural choice for modeling tweets because their gating mechanism keeps gradients propagating properly through the network, which lets them capture long-term dependencies. We use bidirectional LSTMs because they capture information from both past and future states. A bidirectional LSTM (BiLSTM) consists of a forward LSTM that reads the sentence from $w_1$ to $w_T$ and a backward LSTM that reads it from $w_T$ to $w_1$, where $T$ is the number of words in the sentence under consideration and $w_t$ is the $t$-th word. We obtain the final annotation $h_t$ for a given word $w_t$ by concatenating the annotations from both directions (Eq. 1), where $\Vert$ denotes concatenation:

$$h_t = \overrightarrow{h_t} \,\Vert\, \overleftarrow{h_t} \tag{1}$$

(Alex Graves, 2013) show that LSTMs can benefit from depth in space: stacking multiple recurrent hidden layers on top of each other, just as feed-forward layers are stacked in conventional deep networks, gives performance gains. We therefore use a stacked BiLSTM in our experiments.
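To make the bidirectional, stacked encoding concrete, here is a minimal plain-Python sketch; it is not the authors' code. It uses a textbook LSTM cell with small random, untrained weights (the initialization range, hidden size, and seeds are arbitrary assumptions), runs one cell left-to-right and another right-to-left, and concatenates the two states for each word. Stacking feeds each layer's concatenated outputs into the next layer. A real model would use a deep learning framework and learned weights.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GATES = "ifoc"  # input, forget, output gates and candidate cell


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMCell:
    """A textbook LSTM cell; every gate reads the concatenation [x_t ; h_{t-1}]."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # Random untrained weights (assumption: uniform in [-0.1, 0.1]), zero biases.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_dim)] for g in GATES}
        self.b = {g: [0.0] * hidden_dim for g in GATES}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # list concatenation = [x_t ; h_{t-1}]
        pre = {g: [s + bias for s, bias in zip(matvec(self.W[g], z), self.b[g])]
               for g in GATES}
        i = [sigmoid(s) for s in pre["i"]]
        f = [sigmoid(s) for s in pre["f"]]
        o = [sigmoid(s) for s in pre["o"]]
        cand = [math.tanh(s) for s in pre["c"]]
        c_new = [fg * cp + ig * cg for fg, cp, ig, cg in zip(f, c, i, cand)]
        h_new = [og * math.tanh(cn) for og, cn in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs: List[Vector]) -> List[Vector]:
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(xs: List[Vector], input_dim: int, hidden_dim: int, seed: int = 0) -> List[Vector]:
    fwd = LSTMCell(input_dim, hidden_dim, seed)
    bwd = LSTMCell(input_dim, hidden_dim, seed + 1)
    forward = fwd.run(xs)                # reads w_1 ... w_T
    backward = bwd.run(xs[::-1])[::-1]   # reads w_T ... w_1, then re-aligned to positions 1..T
    return [hf + hb for hf, hb in zip(forward, backward)]  # h_t = forward_t || backward_t


def stacked_bilstm(xs: List[Vector], input_dim: int, hidden_dim: int,
                   num_layers: int = 2) -> List[Vector]:
    """Each layer's 2 * hidden_dim outputs become the next layer's inputs."""
    outputs, dim = xs, input_dim
    for layer in range(num_layers):
        outputs = bilstm(outputs, dim, hidden_dim, seed=2 * layer)
        dim = 2 * hidden_dim
    return outputs


# Usage: a 5-word "tweet" with hypothetical 4-dimensional embeddings.
rng = random.Random(42)
tweet = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
annotations = stacked_bilstm(tweet, input_dim=4, hidden_dim=3, num_layers=2)
print(len(annotations), len(annotations[0]))  # 5 6
```

Reversing the backward outputs before concatenating is the key detail: it aligns the right-to-left state for word $w_t$ with the left-to-right state for the same word.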

Figure 1. Architecture for Stacked BiLSTM + Word Level Contextual Attention. Figure is inspired by (Yang et al., 2016)

3.1. Word Attention

The attention mechanism assigns a weight $\alpha_t$ to each word annotation $h_t$ obtained from the BiLSTM layer. We compute a fixed-size representation $v$ of the whole message as the weighted sum of all word annotations; $v$ is then fed to a final fully-connected softmax layer to obtain the class probabilities. We first feed the LSTM output $h_t$ of each word through a Multi Layer Perceptron to obtain $u_t$ as its hidden representation. $u_w$ is our word-level context vector, which is randomly initialized and learned as we train the network. Once $u_t$ is obtained, we compute the importance of the word as the similarity of $u_t$ with $u_w$ and get a normalized importance weight $\alpha_t$ through a softmax function. The context vector can be seen as a learned, fixed query that scores which words are most informative across the whole tweet, similar to the queries used in memory networks (Yang et al., 2016). Figure 1 shows the high-level architecture of this model. $W_w$ and $b_w$ are the attention layer's weights and biases. More formally,

$$u_t = \tanh(W_w h_t + b_w) \tag{2}$$

$$\alpha_t = \frac{\exp(u_t^{\top} u_w)}{\sum_{j=1}^{T} \exp(u_j^{\top} u_w)} \tag{3}$$

$$v = \sum_{t=1}^{T} \alpha_t h_t \tag{4}$$
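The word-level attention step can be sketched in plain Python as follows, assuming the BiLSTM annotations $h_1, \dots, h_T$ are already computed. The function name `word_attention` and the random parameter values in the usage example are illustrative assumptions; in the model, $W_w$, $b_w$, and the context vector $u_w$ are learned during training.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_attention(h: List[Vector], W_w: Matrix, b_w: Vector,
                   u_w: Vector) -> Tuple[Vector, Vector]:
    """Return (v, alphas) for word annotations h_1..h_T."""
    # Hidden representation: u_t = tanh(W_w h_t + b_w)
    u = [[math.tanh(sum(w * x for w, x in zip(row, h_t)) + b)
          for row, b in zip(W_w, b_w)] for h_t in h]
    # Importance: alpha_t = exp(u_t . u_w) / sum_j exp(u_j . u_w)
    alphas = softmax([sum(a * c for a, c in zip(u_t, u_w)) for u_t in u])
    # Message vector: v = sum_t alpha_t h_t
    v = [sum(a * h_t[k] for a, h_t in zip(alphas, h)) for k in range(len(h[0]))]
    return v, alphas


# Usage with random (untrained) parameters: T = 4 words, annotation size 6,
# attention size 5 (all sizes are arbitrary choices for the sketch).
rng = random.Random(0)
T, d_h, d_a = 4, 6, 5
h = [[rng.uniform(-1, 1) for _ in range(d_h)] for _ in range(T)]
W_w = [[rng.uniform(-0.5, 0.5) for _ in range(d_h)] for _ in range(d_a)]
b_w = [0.0] * d_a
u_w = [rng.uniform(-0.5, 0.5) for _ in range(d_a)]
v, alphas = word_attention(h, W_w, b_w, u_w)
print([round(a, 3) for a in alphas])
```

Because the weights come from a softmax, $v$ is always a convex combination of the word annotations: if every annotation were identical, $v$ would equal that annotation regardless of $u_w$.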


4. Experiments

In this section, we first describe the data sets and then present the results we obtained on all three of them. We also show some examples where our model fails. Finally, we show how attention helps us better understand the model.

4.1. Data Sets

We use three benchmark data sets for abusive content detection on Twitter. At the time of our experiments, the (Waseem and Hovy, 2016) data set had a total of 15,844 tweets, of which 1,924 were labelled as racism, 3,058 as sexism, and 10,862 as none. The (Thomas Davidson, 2016) data set had a total of 25,112 tweets, of which 1,498 were labelled as hate speech, 19,326 as offensive language, and 4,288 as neither. The (Jennifer Golbeck, 2017) data set had 20,362 tweets, of which 5,235 were positive harassment examples and 15,127 were negative.

We refer to the (Waseem and Hovy, 2016) data set as D1, the (Thomas Davidson, 2016) data set as D2, and the (Jennifer Golbeck, 2017) data set as D3.

Data set                          Tweets
D1 (Waseem and Hovy, 2016)        15,844
D2 (Thomas Davidson, 2016)        25,112
D3 (Jennifer Golbeck, 2017)       20,362
Table 2. Data sets and their total tweet counts
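A quick calculation on the class counts reported above shows how skewed the label distributions are, which is why weighted F1 is reported in Section 4.2. The snippet below only does arithmetic on the published numbers; the dictionary layout is our own.

```python
# Class counts as reported in Section 4.1.
DATASETS = {
    "D1": {"racism": 1924, "sexism": 3058, "none": 10862},
    "D2": {"hate speech": 1498, "offensive language": 19326, "neither": 4288},
    "D3": {"positive": 5235, "negative": 15127},
}


def class_proportions(counts):
    """Fraction of tweets in each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


for name, counts in DATASETS.items():
    shares = ", ".join(f"{label} {p:.1%}" for label, p in class_proportions(counts).items())
    print(f"{name} ({sum(counts.values()):,} tweets): {shares}")
```

For example, hate speech makes up only about 6% of D2, while offensive language makes up about 77%.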

For tweet tokenization, we use Ekphrasis, a text processing tool built specifically for social platforms such as Twitter.

(Baziotis et al., 2017) used a large collection of Twitter messages (330M) to generate word embeddings with a vocabulary of 660K words, using GloVe (Pennington et al., 2014). We use these pre-trained word embeddings to initialize the first layer (embedding layer) of our neural networks.
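Initializing an embedding layer from pre-trained vectors can be sketched as below. The paper does not say how padding, out-of-vocabulary words, or fine-tuning are handled, so the all-zero padding row and the small uniform random vectors for unseen words are our assumptions. `toy_vectors` is a hypothetical stand-in for the 660K-word table; loading the real vectors and tokenizing with Ekphrasis are not shown.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]


def build_embedding_matrix(vocab: List[str], pretrained: Dict[str, Vector], dim: int,
                           scale: float = 0.05, seed: int = 0
                           ) -> Tuple[List[Vector], Dict[str, int], int]:
    """Return (matrix, word_to_index, n_found).

    Row 0 is an all-zero padding row (an assumption); word i of `vocab` gets row i + 1.
    Words missing from `pretrained` get small uniform random vectors (also an assumption).
    """
    rng = random.Random(seed)
    matrix: List[Vector] = [[0.0] * dim]
    word_to_index: Dict[str, int] = {}
    n_found = 0
    for i, word in enumerate(vocab, start=1):
        word_to_index[word] = i
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            matrix.append(list(vec))  # copy the pre-trained vector
            n_found += 1
        else:
            matrix.append([rng.uniform(-scale, scale) for _ in range(dim)])
    return matrix, word_to_index, n_found


# Usage with a toy 3-dimensional "pre-trained" table (hypothetical values).
toy_vectors = {"stop": [0.1, 0.2, 0.3], "#mkr": [0.0, -0.1, 0.4]}
vocab = ["stop", "saying", "#mkr"]
matrix, word_to_index, n_found = build_embedding_matrix(vocab, toy_vectors, dim=3)
print(n_found, len(vocab) - n_found)  # 2 1
```

Reporting how many vocabulary words were found in the pre-trained table is a cheap sanity check that tokenization matches the tokenization used to build the embeddings.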

4.2. Results

The network is trained with a learning rate of 0.001 for 10 epochs, with a dropout of 0.2 to prevent over-fitting. Results are averaged over 10-fold cross-validation for D1 and D3 and over 5-fold cross-validation for D2, because (Thomas Davidson, 2016) reported results using 5-fold cross-validation. Because all our data sets are class-imbalanced, we report weighted F1 scores.
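For readers reproducing the evaluation, the following sketch shows what weighted F1 means: per-class F1 averaged with weights proportional to each class's number of true examples, which is also the definition behind scikit-learn's `f1_score(..., average="weighted")`. It includes a simple k-fold splitter; the paper does not state whether its folds were shuffled or stratified, so the shuffled, unstratified split here is an assumption. Per-fold scores would then be averaged.

```python
import random
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    scores = {}
    for c in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights proportional to each class's true support."""
    support = Counter(y_true)
    f1 = per_class_f1(y_true, y_pred)
    return sum(f1[c] * n for c, n in support.items()) / len(y_true)


def k_fold_splits(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; shuffled but NOT stratified (an assumption)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    for i in range(k):
        test = idx[i::k]  # every k-th shuffled index, so the k test folds partition the data
        held_out = set(test)
        yield [j for j in idx if j not in held_out], test


# Usage: a toy 4-tweet prediction. F1 is 2/3 for "none" and "racism" and 1 for "sexism";
# weighting by support (2, 1, 1) gives (2 * 2/3 + 2/3 + 1) / 4 = 0.75.
y_true = ["none", "none", "racism", "sexism"]
y_pred = ["none", "racism", "racism", "sexism"]
print(weighted_f1(y_true, y_pred))
```

Weighting by support keeps frequent classes from being swamped by rare ones in the average, but it also means a model can score well while doing poorly on a small class such as hate speech in D2.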

Table 3 shows our results in detail. We compare our model with the best model reported in each paper. Because (Jennifer Golbeck, 2017) is a data set paper, there is no model with which to fill the corresponding row. * denotes numbers taken from the baseline papers. All results were reproducible except for the (Badjatiya et al., 2017) result on D1. For the (Waseem and Hovy, 2016) data set, (Badjatiya et al., 2017) claim that applying Gradient Boosting to LSTM embeddings learned from randomly initialized word embeddings boosted their performance by 12 F1 points, from 81.0 to 93.0. When we tried to reproduce this result, we found no significant improvement over 81.0. The results show that our model performs robustly on all three data sets.

Models                          D1       D2       D3
(Waseem and Hovy, 2016)         73.8*    82.3     63.0
(Thomas Davidson, 2016)         78.0     90.0*    69.0
(Jennifer Golbeck, 2017)        -        -        -
(Park and Fung, 2017)           82.7*    88.0     70.6
(Badjatiya et al., 2017)        93.1*    88.0     65.7
Our Model                       84.2     91.1     72.7
Table 3. Data sets and the results (weighted F1) of different models. We reproduced the results for each model on all three data sets.

We also share some examples from the three data sets in Figure 2 that our BiLSTM attention model could not classify correctly. On closer investigation, we find that most cases where our model fails are instances where the annotation is either noisy or the difference between classes is very blurred and subtle.

Figure 2. The first tweet is from the (Waseem and Hovy, 2016) data set, the second from the (Thomas Davidson, 2016) data set, and the third from the (Jennifer Golbeck, 2017) data set.

4.3. Why Contextual Attention?

The attention mechanism enables our neural network to focus on the relevant parts of the input more than the irrelevant parts while performing a prediction task. But relevance often depends on context, so the importance of a word is highly context-dependent. For example, the word "islam" may appear in racist tweets as well as in ordinary conversation. The top tweet in Figure 3 belongs to the None class, while the bottom tweet belongs to the Racism class.

Figure 3. An example showing how our model captures diverse context and assigns context-dependent weights to the same word in two different tweets.

4.4. Attention Heat Map Visualization

The color intensity corresponds to the weight given to each word by the contextual attention.

Figure 4. The first tweet is a sexist tweet from (Waseem and Hovy, 2016), whereas the second is a racist tweet from the same data set. The third tweet is from the (Thomas Davidson, 2016) data set and is labelled as offensive language.
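The heat maps encode each word's attention weight as color intensity. A text-only stand-in can scale each weight by the largest weight in the tweet, as in the hypothetical helper `attention_bars` below. The bar rendering and the weights in the usage example are made up for illustration and are not taken from the paper's figures.

```python
from typing import List, Sequence


def attention_bars(words: Sequence[str], weights: Sequence[float], width: int = 20) -> str:
    """Render one row per word: a bar scaled so the highest-weighted word fills `width`."""
    if len(words) != len(weights):
        raise ValueError("need exactly one weight per word")
    top = max(weights)
    rows: List[str] = []
    for word, a in zip(words, weights):
        n = round(width * a / top) if top > 0 else 0  # relative intensity, like a color scale
        rows.append(f"{word:<12}{'#' * n:<{width}} {a:.3f}")
    return "\n".join(rows)


# Usage with made-up weights (illustrative only).
words = ["stop", "saying", "dumb", "blondes"]
weights = [0.1, 0.2, 0.4, 0.3]
print(attention_bars(words, weights))
```

Normalizing by the maximum rather than showing raw weights makes tweets of different lengths comparable at a glance, since raw softmax weights shrink as the number of words grows.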

5. Conclusion and Future Work

We built a deep context-aware attention-based model and applied it to the task of abusive tweet detection. We ran experiments on three relevant data sets and showed empirically that our model is robust at detecting abuse on Twitter. We also showed how context-aware attention helps us interpret the model's behavior, by visualizing the attention weights and conducting an error analysis.
As for future work, we want to experiment with a model that learns user embeddings from users' historical tweets. We also want to model abusive text classification on Twitter by taking tweets in context, because standalone tweets often do not give a clear picture of the author's intent.


  • Alex Graves (2013) Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech Recognition with Deep Recurrent Neural Networks. In International Conference on Acoustics, Speech, and Signal Processing (2013).
  • Badjatiya et al. (2017) Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, and Vasudeva Varma. 2017. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 759–760.
  • Baziotis et al. (2017) Christos Baziotis, Nikos Pelekis, and Christos Doulkeridis. 2017. DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Association for Computational Linguistics, 747–754.
  • Jennifer Golbeck (2017) Jennifer Golbeck, Zahra Ashktorab, Rashad O. Banjo, Alexandra Berlinger, Siddharth Bhagwan, Cody Buntain, Paul Cheakalos, Alicia A. Geller, Quint Gergory, Rajesh Kumar Gnanasekaran, Raja Rajan Gunasekaran, Kelly M. Hoffman, Jenny Hottle, Vichita Jienjitlert, Shivika Khare, Ryan Lau, Marianna J. Martindale, Shalmali Naik, Heather L. Nixon, Piyush Ramachandran, Kristine M. Rogers, Lisa Rogers, Meghna Sardana Sarin, Gaurav Shahane, Jayanee Thanki, Priyanka Vengataraman, Zijian Wan, and Derek Michael Wu. 2017. Hack Harassment: Technology Solutions to Combat Online Harassment. In ACM. 229–233.
  • Park and Fung (2017) Ji Ho Park and Pascale Fung. 2017. One-step and Two-step Classification for Abusive Language Detection on Twitter. (2017).
  • Pavlopoulos et al. (2017a) John Pavlopoulos, Prodromos Malakasiotis, and Ion Androutsopoulos. 2017a. Deeper attention to abusive user content moderation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 1125–1135.
  • Pavlopoulos et al. (2017b) John Pavlopoulos, Prodromos Malakasiotis, Juli Bakagianni, and Ion Androutsopoulos. 2017b. Improved Abusive Comment Moderation with User Embeddings. arXiv preprint arXiv:1708.03699 (2017).
  • Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.
  • Thomas Davidson (2016) Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2016. Automated Hate Speech Detection and the Problem of Offensive Language. In ICWSM 2017. 3952–3958.
  • Waseem and Hovy (2016) Zeerak Waseem and Dirk Hovy. 2016. Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In Proceedings of the NAACL Student Research Workshop. 88–93.
  • Wenpeng Yin and Zhou (2015) Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. 2015. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs. arXiv preprint arXiv:1512.05193 (2015).
  • Wulczyn et al. (2017) Ellery Wulczyn, Nithum Thain, and Lucas Dixon. 2017. Ex machina: Personal attacks seen at scale. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1391–1399.
  • Yang et al. (2016) Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1480–1489.