Hate speech is a type of online harm that expresses hostility toward individuals and social groups based on race, beliefs, sexual orientation, etc. [levy1986encyclopedia]. Hateful content is disseminated faster and reaches a wider audience than non-hateful content on social media [MathewDG019, ziems]. This dissemination can trigger prejudice and violence. As a recent example, during the COVID-19 pandemic, people of Chinese origin suffered from discrimination and hate crimes [wang2021m, he2020discrimination]. Policymakers and social media companies work hard on mitigating hate speech and other types of abusive language [preslav_survey_2021] while balancing freedom of expression. AI systems are encouraged to ease this process and to help understand the rationales behind hate speech dissemination [schmidt2017survey, FortunaN18].
In natural language processing, hate speech has been widely studied in social media (e.g., [DBLP:conf/semeval/BasileBFNPPRS19, poletto2020resources]) or as a news comment moderation task (e.g., [korencic-etal-2021-block, shekhar2020automating]). However, the majority of prior studies formulate the problem as text classification [macavaney2019hate, schmidt2017survey], determining whether an individual post is hate speech. This year, the PAN 2021 organizers [bevendorff:2021b] proposed to explore the task as an author profiling problem [rangel:2021]. In this setting, the objective is to identify possible hate speech spreaders on Twitter as an initial effort towards preventing hate speech from being propagated among online users [rangel:2021].
In a similar shared task on profiling fake news spreaders [PardoGGR20], many approaches encode the input by concatenating all tweets into one text per user (e.g., [vogel2020fake, buda2020ensemble, Pizarro20]). However, this approach can be problematic: not all tweets shared by hate speech spreaders convey hateful messages, and a human moderator needs a detailed justification to ban users or delete related tweets. Furthermore, global issues such as COVID-19 attract heated discussion from users worldwide, so multi-language systems are needed to moderate those discussions. With these motivations, we propose a unified framework that is scalable to other languages and explains why a user receives a certain label based on the language used in their tweets, using token-level and post-level attention mechanisms [VaswaniSPUJGKP17], as shown in Figure 1. Our model outperformed multilingual DistilBERT [DBLP:journals/corr/abs-1910-01108] models. The source code is publicly available at https://github.com/isspek/Cross-Lingual-Cyberbullying.
Our proposed framework is shown in Figure 1. The input of the framework is an author profile consisting of n tweets. Each post is encoded with a Sentence Transformer, the encoded tweets then pass through an attention layer, and the output of the attention layer is fed into a classification layer that decides whether the author is a hate speech spreader. We give more details of each component in the subsequent sections.
2.1 Post Encodings
We encode the tweets with Sentence-BERT (SBERT) [ReimersG19], a modified BERT [DevlinCLT19] network that uses Siamese and triplet network structures. SBERT models are computationally more efficient than BERT models and can provide semantically more meaningful sentence representations. Like BERT models, SBERT also has publicly available variants [DBLP:journals/corr/abs-1910-03771]. Since we have limited resources to train our framework and aim for a language model that captures the usage of social media language, we use the pre-trained multilingual SBERT that was trained on the Quora corpus and distilled for 50 languages [reimers-2020-multilingual-sentence-bert]. The SBERT produces 768-dimensional hidden states. We set the maximum length of a post to 32 tokens and apply zero padding to any shorter texts. The sentence embeddings are obtained by a mean pooling operation over the last hidden states.
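The mean pooling step over padded token embeddings can be sketched as follows. This is an illustrative NumPy sketch, not the framework's actual code; the function name and toy dimensions are our own (in practice the hidden size is 768 and the padded length is 32):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mean-pool token embeddings, ignoring zero-padded positions.

    token_embeddings: (n_tokens, hidden_dim) last hidden states, zero-padded
    attention_mask:   (n_tokens,) 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None]                  # (n_tokens, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum over real tokens only
    count = max(mask.sum(), 1)                      # avoid division by zero
    return summed / count                           # (hidden_dim,)

# Toy example: 4 token slots (2 real, 2 padding), hidden size 3.
emb = np.array([[1.0, 2.0, 3.0],
                [3.0, 4.0, 5.0],
                [0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
mask = np.array([1, 1, 0, 0])
pooled = mean_pool(emb, mask)  # -> [2.0, 3.0, 4.0]
```

Masking before averaging matters: naively averaging over all 32 slots would shrink the embeddings of short tweets toward zero.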
2.2 Post-Level Attention Layer
We employ an attention layer in order to learn importance scores for determining author profile vectors. First, the pooled tweet representations (Hp) are fed to a linear layer that produces a hidden representation of the author profile (Hap), as shown in Equation 1. Next, a softmax layer is applied to the similarity between the posts and the author profile (Hap) to obtain attention weights. Lastly, these weights are multiplied with the author profile to obtain the attended author profile (H̃ap), as shown in Equation 2.
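The post-level attention described above can be sketched in NumPy. Since the bodies of Equations 1 and 2 are not reproduced in the text, the exact scoring form below (a dot-product similarity between each pooled post and its projection) is our reading of the prose, and all names are hypothetical:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def post_attention(H_p: np.ndarray, W: np.ndarray, b: np.ndarray):
    """Post-level attention over an author's tweets.

    H_p: (n_posts, d) pooled tweet embeddings
    W:   (d, d), b: (d,) parameters of the linear projection (Eq. 1)
    """
    H_ap = H_p @ W + b                 # projected author-profile representation
    scores = (H_p * H_ap).sum(axis=1)  # similarity of each post to the profile
    alpha = softmax(scores)            # post-level attention weights
    return alpha @ H_ap, alpha         # attended author profile (Eq. 2), weights
```

The returned `alpha` vector is what later supports tweet-level explanations: it tells the moderator which of the author's posts drove the decision.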
2.3 Classification Layer
The classification layer consists of two linear layers. The output of the first layer is activated with the tanh function to learn non-linear interactions among the features, and the second layer outputs the probabilities for each class. The input of the classification layer is the attended author profile, passed through a dropout layer that prevents over-fitting. We use a cross-entropy loss on the outputs of the classification layer and an Adam optimizer with weight decay. During training, the weights of the models are optimized by minimizing the loss, and the batches contain mixed English and Spanish samples.
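A minimal sketch of this two-layer head, assuming the structure described above (dropout is omitted here for brevity; names and dimensions are illustrative):

```python
import numpy as np

def classify(profile_vec: np.ndarray,
             W1: np.ndarray, b1: np.ndarray,
             W2: np.ndarray, b2: np.ndarray) -> np.ndarray:
    """Two linear layers: tanh hidden features, then class probabilities."""
    h = np.tanh(profile_vec @ W1 + b1)  # non-linear hidden features
    logits = h @ W2 + b2                # one logit per class
    e = np.exp(logits - logits.max())
    return e / e.sum()                  # P(spreader), P(non-spreader)
```

At training time the softmax would be folded into the cross-entropy loss; it is written out here only to show the probability outputs.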
The PAN Profiling Hate Speech Spreaders task [rangel:2021] provides a dataset in English and Spanish whose samples were collected from Twitter. There are 200 profiles per language, and each profile is composed of a feed of 200 tweets. The class distribution of the dataset is perfectly balanced. We observe a notable difference in mean tweet length between hate speech spreaders and normal profiles in the Spanish set. The statistics of the dataset are summarized in Table 1.
| | English | Spanish |
| #Hate Speech Spreaders | 100 | 100 |
| #Tweets per Profile | 200 | 200 |
| Mean ± Std tweet length, hate speech spreaders | 67.72 ± 30.34 | 75.32 ± 28.91 |
| Mean ± Std tweet length, normal profiles | 67.42 ± 29.05 | 68.47 ± 28.99 |
The organizers have already cleaned the samples in the dataset. For example, certain patterns have been replaced with special tags. We extend the vocabulary of the models’ tokenizers with these tags as follows:
#URL# is replaced with [URL]
#HASHTAG# is replaced with [HASHTAG]
#USER# is replaced with [USER]
RT is replaced with [RT]
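The tag normalisation above can be sketched as a whole-token replacement (the whole-word matching is our choice, so that e.g. "START" is not touched by the "RT" rule; in practice the new tags would also be registered with the tokenizer, e.g. via `add_tokens` in Hugging Face tokenizers):

```python
TAG_MAP = {
    "#URL#": "[URL]",
    "#HASHTAG#": "[HASHTAG]",
    "#USER#": "[USER]",
    "RT": "[RT]",
}

def normalise_tags(tweet: str) -> str:
    # Replace whole-word occurrences of the organizers' special tags.
    return " ".join(TAG_MAP.get(tok, tok) for tok in tweet.split())

normalise_tags("RT #USER# check #URL#")  # -> "[RT] [USER] check [URL]"
```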
3.3 Baselines and Ablation Models
We compare the performance of our model with a set of baselines and an ablation model as follows:
DistilBERT [DBLP:journals/corr/abs-1910-01108]: We use its multilingual, case-sensitive version. First, the tweets of each author are joined into one text. Then the joined texts are used to fine-tune DistilBERT, keeping a maximum length of 500 tokens.
DistilBERT*: We additionally add [POSTSTART] and [POSTEND] tags, which indicate the start and end of each tweet, to the vocabulary of the extended DistilBERT tokenizer.
SBERT-Mean: is an ablation model that replaces the attention layer with a mean pooling layer which computes the mean values of the tweets’ hidden representations.
3.4 Training Settings
We train the models with 5-fold cross validation (we also experimented with 10-fold, but the models performed worse on the test set), 5 epochs, a learning rate of 1e-5, and a batch size of 2. We use a Google Colab (https://colab.research.google.com/) GPU as the training environment and a fixed random seed of 1234 to ensure reproducible results. The official results are obtained on a TIRA machine [potthast:2019n].
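A seeded fold split like the one above can be sketched as follows (this illustrates the reproducibility point only; the actual training code and its splitting utility may differ):

```python
import numpy as np

def five_fold_indices(n_samples: int, n_folds: int = 5, seed: int = 1234):
    """Shuffle sample indices with a fixed seed and split into folds."""
    rng = np.random.RandomState(seed)    # fixed seed -> reproducible splits
    idx = rng.permutation(n_samples)
    return np.array_split(idx, n_folds)  # one held-out fold per split

folds = five_fold_indices(400)           # 400 profiles (200 EN + 200 ES)
```

With a fixed seed, every rerun assigns the same profiles to the same folds, which is what makes the reported cross-validation numbers repeatable.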
4 Results and Discussion
We report F1-Macro, F1-Weighted, accuracy, precision, and recall for each model. Table 2 presents the results of the 5-fold cross-validation training. SBERT-Attn, the model that we propose, outperformed the other models on all metrics. Comparing SBERT-Mean and SBERT-Attn, the standard deviations of SBERT-Attn are lower than those of the ablation model. This result indicates that the attention layer enables more generalized feature representations. It also suggests that the tweets by hate speech spreaders are not necessarily hateful, and vice versa for non-haters. For this reason, the DistilBERT models, which join all of a user's tweets into one text, underperformed.
| Model | F1-Macro | F1-Weighted | Accuracy | Precision | Recall |
| DistilBERT | 67.46 ± 5.28 | 67.58 ± 5.37 | 67.75 ± 5.15 | 67.04 ± 5.68 | 71.46 ± 1.63 |
| DistilBERT* | 61.90 ± 3.01 | 62.04 ± 3.22 | 62.25 ± 3.39 | 63.13 ± 4.40 | 59.86 ± 7.49 |
| SBERT-Mean | 69.55 ± 6.82 | 69.58 ± 6.71 | 69.75 ± 6.86 | 67.38 ± 3.61 | 77.10 ± 12.12 |
| SBERT-Attn | 73.62 ± 4.11 | 73.77 ± 4.12 | 74.0 ± 4.14 | 70.97 ± 5.39 | 81.23 ± 5.39 |
For the submission to the PAN shared task, we leverage the five models trained during cross validation to obtain predictions on the official test set; the final prediction for each author is the majority vote of these models. Table 3 shows the cross-validation results for the English and Spanish samples, along with the official results of the PAN shared task, where accuracy is the evaluation metric. Our model obtained official results in a range similar to cross-validation. Performance on the English set is worse than on the Spanish set; cultural bias or topical differences could account for this gap. We leave a detailed analysis of these issues for future work.
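The majority-vote ensembling over the five fold models can be sketched as (function and variable names are illustrative):

```python
from collections import Counter

def majority_vote(fold_predictions):
    """fold_predictions: one label list per fold model, aligned by author."""
    return [Counter(votes).most_common(1)[0][0]   # most frequent label wins
            for votes in zip(*fold_predictions)]

# Five fold models voting on three authors (1 = hate speech spreader).
majority_vote([[1, 0, 1],
               [1, 1, 0],
               [0, 1, 1],
               [1, 0, 1],
               [1, 0, 0]])  # -> [1, 0, 1]
```

With an odd number of fold models and two classes, ties cannot occur, so the vote is always decisive.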
Our framework can provide explanations with tweet-level and token-level attention, as shown in Figure 2. The token-level attention scores are the averages of the self-attention weights in the last layer of the SBERT, while the tweet-level attention scores are obtained from the attention layer connected to the classification layer. The examples in the figure are the most hateful tweets of the analysed authors. In the English example, the model pays attention to feminism; in the Spanish example, vice presidencia ("vice presidency") is the important entity.
In this paper, we presented a unified framework for monitoring hate speech spreaders in a multilingual setting. The framework leverages multilingual SBERT representations to encode texts regardless of the language and uses an attention mechanism to determine the importance of an author's tweets for the task. Our method outperformed multilingual DistilBERT and an SBERT variant that applies mean pooling over the tweets.
In the future, we plan to evaluate the method on related user profiling tasks such as profiling fake news spreaders [PardoGGR20] and to investigate advanced methods (e.g., [pfeiffer2020adapterhub]) for effectively transferring knowledge across languages.