Exploring Transformer Based Models to Identify Hate Speech and Offensive Content in English and Indo-Aryan Languages

by   Somnath Banerjee, et al.
IIT Kharagpur

Hate speech is considered one of the major issues currently plaguing online social media. Repeated exposure to hate speech has been shown to create physiological effects on the targeted users. Thus, hate speech in all its forms should be addressed on these platforms in order to maintain a healthy environment. In this paper, we explore several Transformer-based machine learning models for the detection of hate speech and offensive content in English and Indo-Aryan languages at FIRE 2021. We experiment with models such as mBERT, XLMR-base and XLMR-large under the team name "Super Mario". Our models placed 2nd in the Code-Mixed dataset (Macro F1: 0.7107), 2nd in Hindi two-class classification (Macro F1: 0.7797), 4th in the English four-class category (Macro F1: 0.8006) and 12th in the English two-class category (Macro F1: 0.6447).




1 Introduction

Online social media platforms such as Twitter and Facebook have connected billions of people and allowed them to publish their ideas and opinions instantly. Problems arise when bad actors (users) use these platforms to spread propaganda, fake news, hate speech, etc. [hate-speech-websci-19]. Companies like Facebook have been accused of instigating anti-Muslim mob violence in Sri Lanka that left three people dead (http://www.aaiusa.org/unprecedented_increase_expected_in_upcoming_fbi_hate_crime_report), and a UN report blamed the platform for playing a leading role in the possible genocide of the Rohingya community in Myanmar by spreading hate speech (https://www.reuters.com/investigates/special-report/myanmar-facebook-hate). In order to mitigate the spread of hateful/offensive content, these platforms have come up with guidelines (https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy) and expect users to follow them before sharing any content. Violation of such policies can lead to the post being deleted or the user account being suspended.

To reduce harmful content (such as offensive/hate speech) on these platforms, companies employ moderators [newton_2019] who keep conversations healthy and people-friendly by manually checking posts. With the ever-increasing volume of data on these platforms, manual moderation does not seem a feasible solution in the long run. Hence, platforms are looking toward automatic moderation systems for maintaining civility. It has already been observed that Facebook actively removes a large portion of malicious content from its platform even before users report it [robertson_2020]. However, the limitation is that these platforms can detect such abusive content only in certain major languages, such as English and Spanish [perrigo_2019, Das2021YouTB]. Hence, an effort is required to detect and mitigate offensive/hate speech in low-resource languages. It has been found that Facebook has the highest number of users, and Twitter the third-highest number of users, in India. So it is necessary for these platforms to have moderation systems for Indian languages as well.

There is a large body of state-of-the-art hate speech detection research, mostly for the English language [DasNews2020]. To extend this research to other languages, we study methods that detect offensive/hate content in Hindi, Marathi and code-mixed text using the data from this shared task.

Despite being the third most spoken language, Hindi is still considered a low-resource language because of its limited digital representation. Marathi is an even lower-resource language, as there is hardly any prior work on identifying hate/offensive content in it. Finally, code-mixed data follows a current trend, driven by the difficulty of writing local languages with standard keyboard inputs.

Earlier, in HASOC 2019 (https://hasocfire.github.io/hasoc/2019/index.html), three datasets were released to identify hate and offensive content in English, German and Hindi, and in HASOC 2020 (https://hasocfire.github.io/hasoc/2020/index.html) another dataset was released, aiming to identify offensive posts in code-mixed text. Extending the previous work, this year HASOC [hasoc2021mergeoverview, hasoc2021overview] has introduced two sub-tasks, where Sub-task 1 is further divided into two parts. Sub-task 1A focuses on hate speech and offensive language identification for English, Hindi and Marathi; it is a coarse-grained binary classification in which posts have to be classified into two classes, namely Hate and Offensive (HOF) and Non-Hate and Offensive (NOT). Sub-task 1B is a fine-grained classification offered for English and Hindi: hate and offensive posts from Sub-task 1A are further classified into three categories, namely hate speech, offensive and profane. On the other hand, Sub-task 2 [hasoc2021ICHCLoverview] focuses on identifying conversational hate speech in code-mixed languages. In Sub-task 1B, we participated only in the English language. The definitions of the different class labels are given below:

  • HATE - Hate Speech [mathew2020hatexplain]: A post targeting a specific group of people based on their ethnicity, religious beliefs, geographical belonging, race, etc., with the malicious intention of spreading hate or encouraging violence.

  • OFFN - Offensive [6406271] (https://www.vocabulary.com/dictionary/offensive): Offensive describes rude or hurtful behaviour, or a military or sports incursion into an opponent's territory. In any context, "on the offensive" means on the attack.

  • PRFN - Profane [10.1145/3193077.3193078] (https://www.vocabulary.com/dictionary/profane): A post that expresses deeply offensive behaviour and shows a lack of respect, especially for someone's religious beliefs.

In this paper, we investigate several Transformer-based models for our classification tasks; such models have already been shown to outperform existing baselines and stand as the state of the art. We perform pre-processing, data sampling, hyper-parameter tuning, etc., to build the models. Our models placed 2nd in the Code-Mixed dataset, 2nd in Hindi two-class classification, 4th in English four-class classification and 12th in English two-class classification. The rest of the paper is organised as follows: related literature on hate speech and offensive language detection is reviewed in Section 2. We discuss the dataset description in Section 3; in Section 4, we present the system description. Finally, we evaluate the experimental setup in Section 5 and conclude in Section 6.

2 Related Works

The problem of hate/offensive speech has been studied for a long time in the research community, with continuous efforts to improve models that identify hateful/offensive content more precisely. One of the earliest works tried to detect hate speech by using lexicon-based features [6406271Chen]. Although the authors provided an efficient framework for future research, their dataset was too small for any conclusive evidence. In 2017, Davidson et al. [Davidson2017AutomatedHS] contributed a dataset in which thousands of tweets were labelled as hate, offensive, or neither, with the classification task of detecting hate/offensive speech in tweets in mind. Using this dataset, they explored how linguistic features such as character and word n-grams affected the performance of a classifier aimed at distinguishing the three types of tweets. Additional features in their classification involved binary and count indicators for hashtags, mentions, retweets, and URLs, as well as features for the number of characters, words, and syllables in each tweet. The authors found that one of the issues with their best-performing models was that they could not distinguish between hate and offensive posts. With neural networks becoming more accessible and usable, many researchers subsequently tried solutions based on these models.

In 2018, Pitsilis et al. [Pitsilis2018DetectingOL] tried deep learning models such as recurrent neural networks (RNNs) to identify offensive language in English and found them quite effective for this task. RNNs remember the output of each step the model conducts; this approach can capture linguistic context within a text, which is critical for detection. Besides RNNs, other neural network models such as CNNs and LSTMs have had notable success in detecting hate/offensive speech [Goldberg2015, Sarracn2018HateSD].

Although research on hate/offensive speech detection has been growing rapidly, one current issue is that most datasets are available only in English. Thus, hate/offensive speech in other languages is not detected properly, which could be harmful. This is also a problem for companies like Facebook, which can only detect hate speech in certain languages (English, Spanish, and Mandarin) [perrigo_2019]. Recently, the research community has begun to focus on hate/offensive language detection in other low-resourced languages like Danish [sigurbergsson-derczynski-2020-offensive], Greek [Pitenis2020OffensiveLI] and Turkish [ltekin2020ACO]. In the Indian context, the HASOC 2019 shared task (https://hasocfire.github.io/hasoc/2019/) by Mandl et al. [Mandl2019OverviewOT] was a significant effort in that direction, where the authors created a dataset of hateful and offensive posts in Hindi and English. The best model in this competition used an ensemble of multilingual Transformers, fine-tuned on the given dataset [Mishra20193IdiotsAH]. In the Dravidian part of HASOC 2020 (https://hasocfire.github.io/hasoc/2020/), Renjit and Idicula [Renjit2020] used an ensemble of deep learning and simple neural networks to identify offensive posts in Manglish (Malayalam in Roman script).

Recently, Transformer-based [Vaswani2017AttentionIA] language models such as BERT, mBERT and XLM-RoBERTa [Devlin2019BERTPO] have become quite popular in several downstream tasks, such as classification, spam detection, etc. These Transformer-based models have already been seen to outperform several deep learning models [mathew2020hatexplain] such as CNN-GRU, LSTM, etc. Having observed their superior performance, we focus on building such models for our classification problem.

3 Dataset Description

The shared tasks in this competition are divided into two parts. The datasets have been sampled from Twitter. Subtask 1 is offered in English and Hindi with two problems each, and in Marathi with one problem. The Subtask 2 dataset contains English, Hindi and code-mixed Hindi tweets. Details of the problem statements are discussed below:

3.1 Subtask 1A: Identifying Hate, offensive and profane content from the post

The primary focus of Subtask 1A is hate speech and offensive language identification, mainly for English, Hindi and Marathi [gaikwad2021cross], as a coarse-grained binary classification. In Table 1 we present the dataset statistics on English and Hindi for binary classification.

3.2 Subtask 1B: Discrimination between Hate, profane and offensive posts

This subtask is a fine-grained classification task for English and Hindi. Hate and offensive posts from Subtask 1A are further classified as (HATE) Hate speech, (OFFN) Offensive, (PRFN) Profane, or (NONE) Non-Hate. In Table 2 we present the dataset statistics on English for four-class classification (we participated for the English language only).

3.3 Subtask 2: Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL)

A conversational thread can also contain hate and offensive content that cannot always be identified from a single comment or reply. In such situations, context is important for identifying the hate or offensive content.

In Figure 1, the parent tweet expresses hate and profanity towards Muslim countries regarding the controversy happening in Israel at the time. The two comments on the tweet say "Amine", which means "truthfully" in Persian, and thus support the hate, but only in the context of the parent. This sub-task focuses on the binary classification of such conversational tweets with tree-structured data into (NOT) Non-Hate-Offensive and (HOF) Hate and Offensive. In Table 1 we also present the dataset statistics of the code-mixed data for binary classification.

Figure 1: Example of conversational Hate speech (Image has been taken from HASOC 2021 website)
Category                    English        Hindi          Marathi        Code-Mixed
                            Train   Test   Train   Test   Train   Test   Train   Test
(NOT) Non Hate-Offensive    1342    483    3161    1027   1205    418    2899    695
(HOF) Hate and Offensive    2501    798    1433    505    669     207    2841    653
Total                       3843    1281   4594    1532   1874    625    5740    1348
Table 1: Two-class dataset statistics for English, Hindi, Marathi and Code-Mixed
Category               Train   Test
(HATE) Hate speech     683     224
(OFFN) Offensive       622     195
(PRFN) Profane         1196    379
(NONE) Non-Hate        1342    483
Total                  3843    1281
Table 2: Four-class dataset statistics for English

3.4 Pre-Processing

While manually going through the data, we found that it contains many special characters, emojis, blank spaces, links, etc. Mostly custom functions were used for pre-processing, with libraries such as "emoji" and "nltk" as a baseline. The pre-processing steps performed are:

  • We replaced all tagged user names with @user.

  • We removed all non-alphanumeric characters except the full stop and punctuation (। , ?) in Hindi and Marathi. We kept all stop words so that the model can identify the sequence of tokens properly.

  • We removed emojis, flags and emoticons.

  • We removed all URLs.

  • We kept the hashtags, because hashtags contain contextual information.

Figure 2: Design of the proposed methods

4 System Description

We present our proposed models for offensive language and hate speech detection in English, Marathi, Hindi and code-mixed posts. The overall pipeline of the methodology is shown in Figure 2. Our baseline is the Transformer-based pre-trained BERT architecture [Devlin2019BERTPO]. More specifically, we used several variants of BERT, namely mBERT [Devlin2019BERTPO] and XLM-RoBERTa [conneau2020unsupervised]. The appeal of XLM-RoBERTa is that it is trained in an unsupervised manner on a multilingual corpus, and it has achieved state-of-the-art results on most language modelling tasks.

4.1 Binary Classification

Most of our tasks were binary classification problems based on the respective embeddings. We fine-tuned the BERT Transformer with a classifier layer on top and used binary target labels for the individual classes. We used this procedure with dehatebert-mono-english [aluru2021deep] and XLM-RoBERTa [conneau2020unsupervised] for the English dataset, and multilingual BERT (mBERT) and XLM-RoBERTa for the Marathi, Hindi and Code-Mixed classification. The binary cross-entropy loss for the previously mentioned classification task can be mathematically formulated as:

CE = -t1 log(s1) - (1 - t1) log(1 - s1)

where it is assumed that there are two classes, C1 and C2; t1 ∈ {0,1} and s1 are the ground truth and the score for C1, and t2 = 1 - t1 and s2 = 1 - s1 are the ground truth and the score for C2.
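As a numeric sanity check, this loss can be computed directly for a single example (a small illustration, not the authors' code):

```python
import math

def binary_cross_entropy(t1: float, s1: float) -> float:
    """CE = -t1*log(s1) - (1 - t1)*log(1 - s1) for one example."""
    return -(t1 * math.log(s1) + (1 - t1) * math.log(1 - s1))

# A confident correct prediction yields a small loss...
print(round(binary_cross_entropy(1, 0.9), 4))   # → 0.1054
# ...while a confident wrong prediction is penalised heavily.
print(round(binary_cross_entropy(1, 0.1), 4))   # → 2.3026
```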

4.2 Multi-class Classification

In this procedure, we treated the problem as a multi-class classification task. We fine-tuned the BERT model to obtain contextualized embeddings via the attention mechanism, and tried weighted XLM-RoBERTa-large and weighted dehatebert-mono-english for the four-class classification task.

4.3 Weighted Binary Classification

The main challenge in any classification problem is class imbalance in the data, which may create a bias towards the most frequent labels and lead to a decrease in classification performance. From Table 1 it is clearly evident that, except for the Code-Mixed dataset, class imbalance is present in the English, Hindi and Marathi datasets. In the English dataset, (HOF) Hate and Offensive labels are 46% more numerous than the (NOT) Non-Hate-Offensive class. Similarly, in Hindi, (NOT) Non-Hate-Offensive labels are 54% more numerous than the (HOF) Hate and Offensive class, and in Marathi, (NOT) Non-Hate-Offensive labels are 44% more numerous than the (HOF) Hate and Offensive class.

A lot of research has been done in this domain on balancing data. Oversampling and undersampling are popular balancing methods, but they have inherent disadvantages. Instead, we handled the imbalance using class weights. Table 3 describes the class weight distribution we used.

Task Name   (NOT) Non Hate-Offensive   (HOF) Hate and Offensive
            Class Weight               Class Weight
English     1.4318                     0.7682
Hindi       0.7266                     1.6029
Marathi     0.7775                     1.4005
Table 3: Normalized class weights for two-class classification
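The weights in Table 3 are consistent with the common "balanced" heuristic weight_c = N / (K * N_c) (total examples over the number of classes times the class count), applied to the Table 1 training counts. The paper does not state its exact formula, so the sketch below is our reconstruction:

```python
def balanced_class_weights(counts):
    """weight_c = N / (K * N_c): inverse-frequency weights with mean ~1."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}

# Training counts from Table 1 (English).
english = balanced_class_weights({"NOT": 1342, "HOF": 2501})
print({label: round(w, 4) for label, w in english.items()})
# → {'NOT': 1.4318, 'HOF': 0.7683}  (matches Table 3 up to rounding)
```

The same formula reproduces the Hindi and Marathi rows of Table 3 as well.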

4.4 Weighted Multi-Class Classification

In multi-class classification too there is clear data imbalance, which we normalized. It is evident from Table 2 that the (HATE) Hate Speech and (OFFN) Offensive counts are quite similar, but each is almost 50% smaller than the (PRFN) Profane and (NONE) Non-Hate counts. The class weights we computed for multi-class classification are presented in Table 4.

Task Name   (HATE) Hate speech   (OFFN) Offensive   (PRFN) Profane   (NONE) Non-Hate
            Class Weight         Class Weight       Class Weight     Class Weight
English     1.4066               1.5446             0.8033           0.7159
Table 4: Normalized multi-class class weights
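With these weights, the per-example cross-entropy is simply scaled by the weight of the true class, so errors on minority classes cost more. A pure-Python illustration (the function name is ours, not from the paper):

```python
import math

# Table 4 class weights for the English four-class task.
WEIGHTS = {"HATE": 1.4066, "OFFN": 1.5446, "PRFN": 0.8033, "NONE": 0.7159}

def weighted_cross_entropy(probs, true_label):
    """Weighted CE for one example: -w_true * log(p_true)."""
    return -WEIGHTS[true_label] * math.log(probs[true_label])

probs = {"HATE": 0.2, "OFFN": 0.2, "PRFN": 0.2, "NONE": 0.4}
# The same predicted probability is penalised more for minority classes.
print(weighted_cross_entropy(probs, "OFFN") > weighted_cross_entropy(probs, "PRFN"))
# → True
```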

4.5 Tuning Parameters

For all the models presented here, we pre-trained on the target dataset for 20 epochs in order to capture its semantics. We then fine-tuned the weighted and unweighted models using cross-entropy loss functions [10.1145/1102351.1102422]. We used HuggingFace [wolf2020huggingfaces] and PyTorch [paszke2019pytorch]. In the initial phases, we used the Adam optimizer [loshchilov2019decoupled] with an initial learning rate of 2e-5. We did not use early stopping while training.
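A fine-tuning setup along these lines can be sketched with the HuggingFace Transformers API. The hyper-parameters follow the text (20 epochs, Adam-style optimizer, learning rate 2e-5, no early stopping), but the data handling and batch size are our own illustrative assumptions, not the authors' published code:

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "xlm-roberta-large"   # best-performing model in Table 5
NUM_LABELS = 2                     # HOF vs. NOT
LEARNING_RATE = 2e-5
EPOCHS = 20

def fine_tune(texts, labels, class_weights=None):
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=NUM_LABELS)
    optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
    # Optionally pass the Table 3 weights as a tensor for the weighted variant.
    loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights)

    enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
    loader = DataLoader(
        list(zip(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))),
        batch_size=16, shuffle=True)

    model.train()
    for _ in range(EPOCHS):        # fixed epoch budget, no early stopping
        for input_ids, attention_mask, y in loader:
            optimizer.zero_grad()
            logits = model(input_ids=input_ids,
                           attention_mask=attention_mask).logits
            loss_fn(logits, y).backward()
            optimizer.step()
    return model
```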

5 Results

Among the individual Transformer-based BERT models, the best performance came from XLM-RoBERTa-large (XLMR-large) on the English two-class and four-class datasets and on the Marathi dataset, while on the Code-Mixed dataset the best performance came from a custom XLM-RoBERTa-large. On the Hindi dataset, mBERT gave the best performance. A strength of XLM-RoBERTa is that it has been pre-trained on a parallel corpus; we noted that the performance of XLM-RoBERTa-large is very consistent across most of the regional languages.

While measuring performance, we used multiple random seeds and observed that performance was heavily impacted by the choice of seed. With mBERT, performance varied by 6-7% across our experimented languages; with the XLM-RoBERTa models, performance was mostly stable, varying by at most 1-2%. Table 5 shows the performance of XLMR-base, XLMR-large and mBERT-base for two-class classification. The four-class classification results are shown in Table 6.

Our team did not participate in the competition for the Marathi dataset, but after the competition we implemented all the Transformer-based models mentioned in this paper and obtained a Macro F1 of 0.8756, which matches the 3rd-ranked team from the competition.

Classifiers       English    Hindi      Code-Mixed    Marathi
                  Macro F1   Macro F1   Macro F1      Macro F1
XLMR-base         0.7834     0.6862     0.6456        0.8133
XLMR-large        0.8006     0.7112     0.7107        0.8756
mBERT-base        0.7328     0.7797     0.6277        0.8611
Indic-BERT        0.7002     0.6323     0.5912        0.8176
dehate-BERT       0.7811     0.6533     0.6377        0.7550
Submission Name   "Bestn"    "T2"       "Context 1"   -
Table 5: Two-class classification results
Classifiers       English
                  Macro F1
XLMR-base         0.5824
XLMR-large        0.6447
mBERT-base        0.5443
Indic-BERT        0.5119
dehate-BERT       0.4845
Submission Name   "Final"
Table 6: Four-class classification results

6 Conclusion

In this shared task, we compared and evaluated multiple Transformer-based architectures and found that XLM-RoBERTa-large generally performs better than the other Transformer-based models. However, performance varies with the random seed; we observed that changing the random seed impacted XLM-RoBERTa less than the other Transformer-based models, and one line of future work is to confirm this observation and investigate the reason behind it. We also used Transformer-based models such as IndicBERT and dehateBERT but did not obtain performance competitive with XLM-RoBERTa and mBERT. Our immediate next step will be to investigate the reasons for the lower performance of IndicBERT and dehateBERT, since IndicBERT is specifically pretrained on Indian languages and dehateBERT is already fine-tuned on a hate speech dataset.