Contextual Hate Speech Detection in Code Mixed Text using Transformer Based Approaches

10/18/2021
by Ravindra Nayak, et al.

In recent years, social media platforms have helped people connect and communicate with a wider audience, but this has also led to a drastic increase in cyberbullying. It is essential to detect and curb hate speech to preserve the sanity of social media platforms. Code mixed text, which contains more than one language, is also frequently used on these platforms. We therefore propose automated techniques for hate speech detection in code mixed text scraped from Twitter. We focus specifically on code mixed English-Hindi text and transformer-based approaches. While regular approaches analyze the text independently, we also make use of context text in the form of parent tweets. We evaluate the performance of multilingual BERT and Indic-BERT in single-encoder and dual-encoder settings. The first approach concatenates the target text and the context text using a separator token and obtains a single representation from the BERT model. The second approach encodes the two texts independently using a dual BERT encoder and averages the corresponding representations. We show that the dual-encoder approach using independent representations yields better performance. We also employ simple ensemble methods to further improve performance. Using these methods we report the best F1 score of 73.07
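
The difference between the two settings can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_encode` is a hypothetical stand-in for a BERT encoder (it mean-pools hash-derived token vectors, nothing is learned), the example tokens are invented, and the dual-encoder function reuses one toy encoder for both inputs, whereas the abstract describes a dual BERT encoder. Only the wiring is taken from the text: concatenation with a separator token in the first setting, independent encoding followed by averaging in the second.

```python
import hashlib
from typing import List

DIM = 8  # toy vector size (assumption); real BERT models use far larger vectors


def toy_token_vector(token: str) -> List[float]:
    """Deterministic pseudo-embedding for one token (stand-in for a learned embedding)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:DIM]]


def toy_encode(tokens: List[str]) -> List[float]:
    """Hypothetical stand-in for a BERT encoder: mean-pool the token vectors."""
    vectors = [toy_token_vector(t) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def single_encoder(target: List[str], context: List[str]) -> List[float]:
    """Concatenate target and context with a separator token, then encode once."""
    tokens = ["[CLS]"] + target + ["[SEP]"] + context + ["[SEP]"]
    return toy_encode(tokens)


def dual_encoder(target: List[str], context: List[str]) -> List[float]:
    """Encode target and context independently, then average the two vectors."""
    target_vec = toy_encode(["[CLS]"] + target + ["[SEP]"])
    context_vec = toy_encode(["[CLS]"] + context + ["[SEP]"])
    return [(t + c) / 2.0 for t, c in zip(target_vec, context_vec)]


# Invented example: a reply tweet (target) and its parent tweet (context).
target = "yeh bilkul galat hai".split()
context = "match ka result aa gaya".split()

single_vec = single_encoder(target, context)
dual_vec = dual_encoder(target, context)
print("single-encoder representation:", [round(x, 3) for x in single_vec])
print("dual-encoder representation:  ", [round(x, 3) for x in dual_vec])
```

In a real system either representation would be passed to a classification layer. The sketch stops at the representation because the abstract does not describe the classifier.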

