Deep Learning Approach for Classifying the Aggressive Comments on Social Media: Machine Translated Data Vs Real Life Data

03/13/2023
by Mst Shapna Akter, et al.

Aggressive comments on social media negatively affect people's lives, and such offensive content contributes to depression and suicide-related activity. As online social networking grows, so does hateful content. Several studies have addressed cyberbullying, cyberaggression, hate speech, and related problems, but most of this research has focused on English. Some languages, such as Hindi and Bangla, still lack thorough investigation because suitable datasets are unavailable. This paper works with Hindi, Bangla, and English datasets to detect aggressive comments and presents a novel way of generating machine-translated data to address data unavailability. A fully machine-translated English dataset is analyzed with Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), LSTM-Autoencoder, word2vec, Bidirectional Encoder Representations from Transformers (BERT), and Generative Pre-trained Transformer 2 (GPT-2) models to observe how they perform on a noisy, machine-translated dataset. We compare performance on this noisy data with two other datasets: raw data, which contains no noise, and semi-noisy data, which contains a portion of noisy data. Both the raw and the semi-noisy data are classified with the same models. To evaluate the models, we use F1-score, accuracy, precision, and recall. The highest accuracy is achieved by GPT-2 on the raw data and by BERT on both the semi-noisy and the fully machine-translated data. Because many languages lack adequate data, our approach can help researchers create machine-translated datasets for a range of analyses.
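The four evaluation metrics named above can be made concrete with a short sketch. It assumes a binary setting (label 1 = aggressive, 0 = non-aggressive) and scores only the positive class; the abstract does not state how many classes were used or whether scores were macro- or weighted-averaged, so `classification_metrics` is a hypothetical helper for illustration, not the authors' evaluation code.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a single positive class.

    Hypothetical helper: assumes binary labels where `positive` (default 1)
    marks an aggressive comment.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")

    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels: two true positives, one false positive, one false negative.
    print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

On the toy labels, four of six predictions are correct, so accuracy is 4/6, and precision, recall, and F1 each come out to 2/3. In practice a library routine such as scikit-learn's `precision_recall_fscore_support` would typically be used instead, with its `average` parameter chosen to match the reporting convention.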
