Unified and Multilingual Author Profiling for Detecting Haters

by Ipek Baris Schlicht, et al.

This paper presents a unified user profiling framework to identify hate speech spreaders by processing their tweets regardless of the language. The framework encodes the tweets with sentence transformers and applies an attention mechanism to select important tweets for learning user profiles. Furthermore, the attention layer helps to explain why a user is a hate speech spreader by producing attention weights at both token and post level. Our proposed model outperformed the state-of-the-art multilingual transformer models.




1 Introduction

Hate speech is a type of online harm that expresses hostility toward individuals and social groups based on race, beliefs, sexual orientation, etc. [levy1986encyclopedia]. Hateful content is disseminated faster and reaches a wider audience than non-hateful content on social media [MathewDG019, ziems]. This dissemination can trigger prejudice and violence. As a recent example, during the COVID-19 pandemic, people of Chinese origin suffered discrimination and hate crimes [wang2021m, he2020discrimination]. Policymakers and social media companies work hard to mitigate hate speech and other types of abusive language [preslav_survey_2021] while preserving freedom of expression. AI systems are encouraged to ease this process and to help understand the rationales behind hate speech dissemination [schmidt2017survey, FortunaN18].

In natural language processing, hate speech has been widely studied in social media (e.g., [DBLP:conf/semeval/BasileBFNPPRS19, poletto2020resources]) or as a task of news comment moderation (e.g., [korencic-etal-2021-block, shekhar2020automating]). However, the majority of prior studies formulate the problem as text classification [macavaney2019hate, schmidt2017survey], determining whether an individual post is hate speech. This year, the PAN 2021 organization [bevendorff:2021b] proposed to explore the task as an author profiling problem [rangel:2021]. In this case, the objective is to identify possible hate speech spreaders on Twitter as an initial effort towards preventing hate speech from being propagated among online users [rangel:2021].

In a similar shared task on profiling fake news spreaders [PardoGGR20], many approaches encode the input by concatenating each user's tweets into one text (e.g., [vogel2020fake, buda2020ensemble, Pizarro20]). However, this approach can be problematic: not all the tweets shared by hate speech spreaders convey hateful messages, and a human moderator needs a detailed justification to ban users or delete related tweets. Furthermore, global issues such as COVID-19 attract heated discussions from users worldwide; thus, there is a need for multilingual systems to moderate those discussions. With these motivations, we propose a unified framework that is scalable to other languages and explains why a user receives a certain label, based on the language used in their tweets, by using token-level and post-level attention mechanisms [VaswaniSPUJGKP17], as shown in Figure 1. Our model outperformed multilingual DistilBERT [DBLP:journals/corr/abs-1910-01108] models. The source code is publicly available at https://github.com/isspek/Cross-Lingual-Cyberbullying.

Figure 1: The Proposed Framework

2 Methodology

Our proposed framework is shown in Figure 1. The input of the framework is an author profile consisting of n tweets. Each post is encoded with a Sentence Transformer, and the encoded tweets then pass through an attention layer. Finally, the output of the attention layer is fed into a classification layer that decides whether the author is a hate speech spreader or not. We give more details on each component in the subsequent sections.

2.1 Post Encodings

We encode the tweets with Sentence-BERT (SBERT) [ReimersG19], a modification of the BERT [DevlinCLT19] network that uses Siamese and triplet network structures. SBERT models are computationally more efficient than BERT models and can provide semantically more meaningful sentence representations. Like BERT models, SBERT also has publicly available variants [DBLP:journals/corr/abs-1910-03771]. Since we have limited resources for training our framework and aim to use a language model that captures social media language, we choose the pre-trained SBERT that was trained on the Quora corpus in 50 languages and then distilled [reimers-2020-multilingual-sentence-bert]. The SBERT produces outputs with a hidden size of 768. We set the maximum post length to 32 tokens and apply zero padding to any shorter texts. The sentence embeddings are obtained by mean pooling over the last hidden states of the outputs.
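The mean pooling step can be sketched as follows (a minimal NumPy illustration; `mean_pool` is a hypothetical helper, not the authors' code):

```python
import numpy as np

def mean_pool(last_hidden, attention_mask):
    """Mean-pool token embeddings, ignoring zero-padded positions.

    last_hidden:    (n_tokens, hidden_size) token embeddings from the encoder
    attention_mask: (n_tokens,) 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(float)   # (n_tokens, 1)
    summed = (last_hidden * mask).sum(axis=0)      # sum over real tokens only
    count = np.clip(mask.sum(), 1e-9, None)        # avoid division by zero
    return summed / count                          # (hidden_size,)

# Toy example: max length 4, hidden size 3, last token is padding.
hidden = np.array([[1., 2., 3.],
                   [3., 2., 1.],
                   [2., 2., 2.],
                   [9., 9., 9.]])   # padded position, must be ignored
mask = np.array([1, 1, 1, 0])
print(mean_pool(hidden, mask))      # -> [2. 2. 2.]
```

In the actual model, `hidden` would be the last hidden states of the SBERT with hidden size 768 and maximum length 32.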

2.2 Post-Level Attention Layer

We employ an attention layer in order to learn importance scores for determining author profile vectors. First, the pooled tweet embeddings P = [p_1, …, p_n] are fed to a linear layer that produces a hidden representation of the author profile, H_ap, as shown in Equation 1. Next, a softmax layer is applied to obtain similarity scores α between each post and the author profile H_ap. Lastly, the similarity scores are multiplied with the post representations and summed to obtain the attended author profile A_ap, as seen in Equation 2.

    H_ap = P W_p + b_p                                    (1)
    α = softmax(H_ap),    A_ap = Σ_{i=1}^{n} α_i p_i      (2)
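This post-level attention can be sketched in NumPy as follows, assuming the linear layer projects each pooled post embedding to a scalar score (the exact projection shape in the original implementation may differ):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_posts(P, w, b):
    """Post-level attention over pooled tweet embeddings.

    P: (n_posts, d) pooled tweet embeddings
    w: (d,) projection weights, b: scalar bias (linear layer)
    Returns the attended author profile (d,) and the post weights (n_posts,).
    """
    scores = P @ w + b                 # one score per post
    alpha = softmax(scores)            # similarity / importance weights
    profile = alpha @ P                # weighted sum of the posts
    return profile, alpha

rng = np.random.default_rng(0)
P = rng.normal(size=(5, 8))            # 5 posts, toy dimension 8
w = rng.normal(size=8)
profile, alpha = attend_posts(P, w, 0.0)
```

The weights `alpha` are exactly what the post-level visualizations in Section 5 would display.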


2.3 Classification Layer

The classification layer consists of two linear layers. The output of the first layer is passed through a tanh activation to capture non-linearity in the features; the second layer outputs the probabilities for each class. The input to the classification layer is the attended author profile, to which a dropout layer is applied to prevent over-fitting. We use a cross-entropy loss on the outputs of the classification layer and the Adam optimizer with weight decay. During training, the weights of the model are optimized by minimizing the loss, and the batches contain mixed English and Spanish samples.
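The forward pass of this head can be sketched as follows (an illustrative NumPy sketch with hypothetical names; the actual implementation uses a deep learning framework):

```python
import numpy as np

def classify(profile, W1, b1, W2, b2, dropout_mask=None):
    """Two-layer classification head on the attended author profile.

    profile: (d,) attended author profile
    W1: (h, d), W2: (2, h) -- hidden and output layer weights
    dropout_mask: optional (d,) binary mask applied at train time
    Returns class probabilities (hater vs. non-hater).
    """
    x = profile if dropout_mask is None else profile * dropout_mask
    hidden = np.tanh(W1 @ x + b1)           # non-linear hidden layer
    logits = W2 @ hidden + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # softmax probabilities

rng = np.random.default_rng(1)
d, h = 8, 4                                 # toy dimensions
probs = classify(rng.normal(size=d),
                 rng.normal(size=(h, d)), np.zeros(h),
                 rng.normal(size=(2, h)), np.zeros(2))
```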

3 Experiments

3.1 Dataset

The PAN Profiling Hate Speech Spreaders task [rangel:2021] provides a dataset in English and Spanish whose samples were collected from Twitter. There are 200 profiles per language, and each profile is composed of a feed of 200 tweets. The class distribution of the dataset is perfectly balanced. We observe a noticeable difference between the lengths of tweets by hate speech spreaders and normal profiles in the Spanish set. The statistics of the dataset are summarized in Table 1.

Stats                                              En             Es
#Total Profiles                                    200            200
#Hate Speech Spreaders                             100            100
#Tweets per Profile                                200            200
Mean ± std tweet length (hate speech spreaders)    67.72 ± 30.34  75.32 ± 28.91
Mean ± std tweet length (normal profiles)          67.42 ± 29.05  68.47 ± 28.99
Table 1: The statistics of the training dataset

3.2 Preprocessing

The organizers have already cleaned the samples in the dataset. For example, certain patterns have been replaced with special tags. We extend the vocabulary of the models’ tokenizers with these tags as follows:

  • #URL# is replaced with [URL]

  • #HASHTAG# is replaced with [HASHTAG]

  • #USER# is replaced with [USER]

  • RT is replaced with [RT]
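A minimal sketch of this tag normalisation (`TAG_MAP` and `normalise_tags` are illustrative names, not the authors' code; a real implementation should also guard against replacing "RT" inside longer words):

```python
# Map the dataset's masked patterns to the special tokens added to the
# tokenizer vocabulary. (Illustrative names, not the authors' code.)
TAG_MAP = {
    "#URL#": "[URL]",
    "#HASHTAG#": "[HASHTAG]",
    "#USER#": "[USER]",
    "RT": "[RT]",
}

def normalise_tags(tweet: str) -> str:
    """Replace the organisers' masked patterns with special tokens."""
    for src, dst in TAG_MAP.items():
        tweet = tweet.replace(src, dst)
    return tweet

# The special tokens would also be added to the tokenizer vocabulary,
# e.g. with Hugging Face: tokenizer.add_tokens(list(TAG_MAP.values()))
print(normalise_tags("RT #USER#: check #URL#"))
# -> [RT] [USER]: check [URL]
```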

3.3 Baselines and Ablation Models

We compare the performance of our model with a set of baselines and an ablation model as follows:

  • DistilBERT [DBLP:journals/corr/abs-1910-01108]: We use its multilingual, case-sensitive version. First, each author's tweets are joined into one text. Then the joined texts are used to fine-tune DistilBERT with a maximum length of 500 tokens.

  • DistilBERT*: We additionally add [POSTSTART] and [POSTEND] tags, which mark the start and end of each tweet, to the vocabulary of the extended DistilBERT tokenizer.

  • SBERT-Mean: An ablation model that replaces the attention layer with a mean pooling layer, which computes the mean of the tweets' hidden representations.

3.4 Training Settings

We train the models with 5-fold cross-validation (we also experimented with 10-fold, but those models performed worse on the test set), 5 epochs, a learning rate of 1e-5, and a batch size of 2. We use Google Colab (https://colab.research.google.com/) GPUs as the training environment. We use a fixed random seed of 1234 to ensure reproducible results. The official results are obtained on a TIRA machine [potthast:2019n].
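The seeded fold construction can be sketched as follows (plain Python; `k_fold_indices` is an illustrative helper, not the authors' code):

```python
import random

def k_fold_indices(n_samples, k=5, seed=1234):
    """Shuffle indices with a fixed seed and split them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)       # fixed seed -> reproducible folds
    fold_size = n_samples // k
    folds = [idx[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # Distribute any remainder over the first folds.
    for j, extra in enumerate(idx[k * fold_size:]):
        folds[j].append(extra)
    return folds

folds = k_fold_indices(400, k=5)           # 400 profiles (En + Es combined)
```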

4 Results and Discussion

We report the F1-Macro, F1-Weighted, accuracy, precision, and recall for each model. Table 2 presents the results of the 5-fold cross-validation training. SBERT-Attn, the model that we propose, outperformed the other models on all metrics. Comparing SBERT-Mean and SBERT-Attn, the standard deviations of SBERT-Attn are lower than those of the ablation model. This result indicates that the attention layer enables more generalized feature representations. It also suggests that the tweets by hate speech spreaders are not necessarily hateful, and vice versa for the non-haters. For this reason, the DistilBERT models, which join all of a user's tweets into a single text, underperformed.

Models       F1-Macro       F1-Weighted    Accuracy       Precision      Recall
DistilBERT   67.46 ± 5.28   67.58 ± 5.37   67.75 ± 5.15   67.04 ± 5.68   71.46 ± 1.63
DistilBERT*  61.90 ± 3.01   62.04 ± 3.22   62.25 ± 3.39   63.13 ± 4.40   59.86 ± 7.49
SBERT-Mean   69.55 ± 6.82   69.58 ± 6.71   69.75 ± 6.86   67.38 ± 3.61   77.10 ± 12.12
SBERT-Attn   73.62 ± 4.11   73.77 ± 4.12   74.00 ± 4.14   70.97 ± 5.39   81.23 ± 5.39
Table 2: The results of the 5-fold cross-validation experiment
Mode             Language  Accuracy
Cross-Val        En        67.09 ± 7.88
                 Es        80.54 ± 1.78
Official Result  En        58
                 Es        77
Table 3: Cross-validation accuracy per language and the official PAN shared task result.

For the submission to the PAN shared task, we use the five models from the 5-fold training to obtain predictions on the official test set; the final prediction for each author is the majority class over the five models. Table 3 shows the cross-validation results for the English and Spanish samples, and the official results of the PAN shared task, where accuracy is the evaluation metric. The official results lie in a similar range to the cross-validation results. Performance on the English set is worse than on the Spanish set; cultural bias or topical differences could be reasons for this gap. We leave a detailed analysis of these issues as future work.
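The majority vote over the fold models can be sketched as follows (illustrative Python, not the authors' code):

```python
from collections import Counter

def majority_vote(fold_predictions):
    """Final label per author: majority class over the k fold models.

    fold_predictions: list of per-model prediction lists, all same length.
    """
    per_author = zip(*fold_predictions)    # gather each author's k votes
    return [Counter(votes).most_common(1)[0][0] for votes in per_author]

# Five fold models voting on three authors (1 = hate speech spreader).
preds = [[1, 0, 1],
         [1, 0, 0],
         [0, 1, 1],
         [1, 0, 1],
         [1, 1, 1]]
print(majority_vote(preds))   # -> [1, 0, 1]
```

With five voters and two classes, ties cannot occur, so the majority class is always well defined.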

5 Visualizations

Figure 2: Attention visualizations for English and Spanish. The original sentence in English is [USER] [USER] Yes, you’re a part of feminism. And that’s because you aren’t a man; and the one in Spanish is [USER] [USER] Le quedan grandes, como su vicepresidencia ("They are too big for you, like your vice presidency") (some emojis)

Our framework can provide explanations with tweet-level and token-level attention, as shown in Figure 2. The token-level attention scores are the averages of the self-attention weights in the last layer of the SBERT. The tweet-level attention scores are obtained from the attention layer that feeds the classification layer. The examples in the figure are the most hateful tweets from the analysed authors. In the English example, the model pays attention to feminism; in the Spanish example, vicepresidencia is the important entity.

6 Conclusion

In this paper, we presented a unified framework for monitoring hate speech spreaders in a multilingual setting. The framework leverages multilingual SBERT representations to encode texts regardless of language and uses an attention mechanism to determine the importance of an author's tweets for the task. Our method outperformed multilingual DistilBERT and an SBERT variant that applies mean pooling over the tweets.

In the future, we plan to evaluate the method on related user profiling tasks such as profiling fake news spreaders [PardoGGR20] and to investigate advanced methods (e.g., [pfeiffer2020adapterhub]) for effectively transferring knowledge across languages.