Detect Toxic Content to Improve Online Conversations

10/29/2019
by Deepshi Mediratta, et al.

Social media is filled with toxic content. The aim of this paper is to build a model that can detect insincere questions. We use the 'Quora Insincere Questions Classification' dataset for our analysis. The dataset consists of sincere and insincere questions, the large majority of which are sincere. The data is processed and analyzed in Python using libraries such as sklearn, NumPy, pandas, and Keras. The questions are converted to vector form using word embeddings such as GloVe and Wiki-news, as well as TF-IDF weighting. The class imbalance in the dataset is handled with resampling techniques. We train and compare various machine learning and deep learning models to identify the best-performing one. The models discussed include SVM, Naive Bayes, GRU, and LSTM.
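To make the classical part of this pipeline concrete, here is a minimal, dependency-free Python sketch of three of its steps: TF-IDF vectorization, random oversampling to balance the classes, and a multinomial Naive Bayes classifier trained on the resulting features. It rests on several assumptions that do not come from the paper: whitespace tokenization, a smoothed IDF of the form ln((1 + n) / (1 + df)) + 1, random oversampling as the resampling technique (only one of several options), and a tiny hypothetical toy dataset in which label 1 marks an insincere question. The helper names (`random_oversample`, `fit_idf`, `tfidf_vector`, `TfidfNaiveBayes`) are illustrative only. In practice one would more likely use scikit-learn's `TfidfVectorizer` and `MultinomialNB` together with imbalanced-learn's `RandomOverSampler`.

```python
import math
import random
from collections import Counter

# Hypothetical toy data (label 1 = insincere, 0 = sincere); NOT taken from the Quora dataset.
TEXTS = [
    "how do i learn python for data science",
    "what is the best way to study linear algebra",
    "which books help with machine learning",
    "how can i improve my writing skills",
    "why is everyone in that city so stupid",
    "why do stupid idiots ask questions here",
]
LABELS = [0, 0, 0, 0, 1, 1]


def tokenize(text):
    # Assumed preprocessing: lowercase + whitespace split (no punctuation handling).
    return text.lower().split()


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class is as large as the biggest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


def fit_idf(docs):
    """Map each vocabulary word to a smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each word once per document
    return {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}


def tfidf_vector(doc, idf):
    """Sparse, L2-normalized TF-IDF vector; words outside the vocabulary are dropped."""
    tf = Counter(w for w in tokenize(doc) if w in idf)
    vec = {w: c * idf[w] for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {w: v / norm for w, v in vec.items()}


class TfidfNaiveBayes:
    """Multinomial Naive Bayes that accepts fractional (TF-IDF) feature counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.vocab = set().union(*vectors)
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            totals = Counter()
            for v in rows:
                totals.update(v)  # sums the float TF-IDF weights per word
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {w: math.log((totals[w] + self.alpha) / denom) for w in self.vocab}
        return self

    def predict(self, vector):
        def score(c):
            # Log prior plus TF-IDF-weighted log likelihoods of the known words.
            return self.log_prior[c] + sum(
                x * self.log_lik[c][w] for w, x in vector.items() if w in self.vocab
            )
        return max(self.classes, key=score)


if __name__ == "__main__":
    idf = fit_idf(TEXTS)  # IDF fitted on the original, un-resampled texts
    bal_texts, bal_labels = random_oversample(TEXTS, LABELS, seed=0)
    model = TfidfNaiveBayes().fit([tfidf_vector(t, idf) for t in bal_texts], bal_labels)
    for q in ["why are they so stupid", "how do i learn linear algebra"]:
        print(q, "->", model.predict(tfidf_vector(q, idf)))
```

Fitting the IDF weights on the original texts keeps the duplicated minority examples from skewing document frequencies, so oversampling only affects the classifier's training counts and priors. The SVM and the embedding-based GRU/LSTM models mentioned in the abstract would replace the `TfidfNaiveBayes` step and are not sketched here.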

research
04/10/2021

Identifying and Categorizing Offensive Language in Social Media

Offensive language is pervasive in social media. Individuals frequently ...
research
09/01/2023

Detecting Suicidality in Arabic Tweets Using Machine Learning and Deep Learning Techniques

Social media platforms have revolutionized traditional communication tec...
research
04/18/2020

Identifying Semantically Duplicate Questions Using Data Science Approach: A Quora Case Study

Identifying semantically identical questions on, Question and Answering ...
research
07/01/2021

Tackling COVID-19 Infodemic using Deep Learning

Humanity is battling one of the most deleterious virus in modern history...
research
07/22/2019

Detecting Radical Text over Online Media using Deep Learning

Social Media has influenced the way people socially connect, interact an...
research
09/28/2018

Overview of PicTropes, a film trope dataset

From the database DBTropes.org, we have created a dataset of films and t...
research
06/12/2023

Izindaba-Tindzaba: Machine learning news categorisation for Long and Short Text for isiZulu and Siswati

Local/Native South African languages are classified as low-resource lang...
