Detecting Hate Speech and Offensive Language on Twitter using Machine Learning: An N-gram and TFIDF based Approach

09/23/2018
by Aditya Gaydhani, et al.

Toxic online content has become a major issue due to an exponential increase in the use of the internet by people of different cultures and educational backgrounds. Differentiating hate speech from offensive language is a key challenge in the automatic detection of toxic text content. In this paper, we propose an approach to automatically classify tweets on Twitter into three classes: hateful, offensive, and clean. Using a Twitter dataset, we perform experiments that take n-grams as features and pass their term frequency-inverse document frequency (TFIDF) values to multiple machine learning models. We perform a comparative analysis of the models across several values of n in n-grams and several TFIDF normalization methods. After tuning the model that gives the best results, we achieve 95.6% accuracy on test data. We also create a module that serves as an intermediary between the user and Twitter.
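
The feature step described in the abstract, word n-grams weighted by TFIDF under a chosen normalization, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, whitespace tokenization, the smoothed IDF formula ln((1 + N) / (1 + df)) + 1, and the toy tweets are all hypothetical. The resulting vectors would then be passed to a classifier, which is omitted here.

```python
import math
from collections import Counter


def word_ngrams(text, n_max):
    """Return all word n-grams of length 1..n_max (whitespace tokens, lowercased)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2, norm="l2"):
    """Map each document to a dict of n-gram -> TFIDF weight.

    Assumed weighting: raw term count times a smoothed IDF,
    ln((1 + N) / (1 + df)) + 1. `norm` may be "l1", "l2", or None.
    """
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1.0 for g, d in df.items()}

    vectors = []
    for c in counts:
        vec = {g: tf * idf[g] for g, tf in c.items()}
        if norm == "l1":
            scale = sum(abs(w) for w in vec.values())
        elif norm == "l2":
            scale = math.sqrt(sum(w * w for w in vec.values()))
        else:
            scale = 1.0
        if scale > 0:
            vec = {g: w / scale for g, w in vec.items()}
        vectors.append(vec)
    return vectors


# Toy, hypothetical examples standing in for tweets.
tweets = [
    "have a nice day",
    "what a nice photo",
    "this is a rude remark",
]
l2_vectors = tfidf_vectors(tweets, n_max=2, norm="l2")
l1_vectors = tfidf_vectors(tweets, n_max=2, norm="l1")
```

Varying `n_max` and `norm` in this sketch mirrors the two settings the abstract says were compared: the value of n and the TFIDF normalization method.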

research
08/06/2021

Offensive Language and Hate Speech Detection with Deep Learning and Transfer Learning

Toxic online speech has become a crucial problem nowadays due to an expo...
research
08/30/2018

Comparative Studies of Detecting Abusive Language on Twitter

The context-dependent nature of online aggression makes annotating large...
research
03/13/2018

Automatic Detection of Online Jihadist Hate Speech

We have developed a system that automatically detects online jihadist ha...
research
06/01/2017

Deep Learning for Hate Speech Detection in Tweets

Hate speech detection on Twitter is critical for applications like contr...
research
11/21/2016

Ontology Driven Disease Incidence Detection on Twitter

In this work we address the issue of generic automated disease incidence...
research
02/01/2018

A Unified Deep Learning Architecture for Abuse Detection

Hate speech, offensive language, sexism, racism and other types of abusi...
research
08/17/2022

BIC: Twitter Bot Detection with Text-Graph Interaction and Semantic Consistency

Twitter bot detection is an important and meaningful task. Existing text...
