Can We Achieve More with Less? Exploring Data Augmentation for Toxic Comment Classification

07/02/2020
by Chetanya Rastogi, et al.

This paper tackles one of the greatest limitations in machine learning: data scarcity. Specifically, we explore whether high-accuracy classifiers can be built from small datasets by combining data augmentation techniques with machine learning algorithms. We experiment with two augmentation techniques, Easy Data Augmentation (EDA) and backtranslation, and with three popular learning algorithms: logistic regression, support vector machine (SVM), and bidirectional long short-term memory network (Bi-LSTM). We use the Wikipedia Toxic Comments dataset so that, while exploring the benefits of data augmentation, we also develop a model that detects and classifies toxic speech in comments, helping to fight cyberbullying and online harassment. Ultimately, we find that data augmentation techniques can significantly boost classifier performance and are an excellent strategy for combating a lack of data in NLP problems.
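To make the augmentation idea concrete, here is a minimal sketch of two of the four operations commonly associated with EDA: random swap and random deletion. The other two, synonym replacement and random insertion, need a thesaurus and are left out. This is an illustration under stated assumptions, not the authors' implementation: the function names (`random_swap`, `random_deletion`, `augment`), the whitespace tokenization, and the default values of `n_aug` and `p` are all hypothetical choices.

```python
import random


def random_swap(words, n_swaps, rng):
    """Return a copy of `words` with two random positions swapped, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word independently with probability p, keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    # If every word was dropped, fall back to a single random word.
    return kept if kept else [rng.choice(words)]


def augment(sentence, n_aug=4, p=0.1, seed=0):
    """Produce n_aug perturbed variants of `sentence` (hypothetical helper).

    Alternates between random swap and random deletion; `p` controls both
    the deletion probability and the fraction of words swapped.
    """
    rng = random.Random(seed)
    words = sentence.split()  # assumption: plain whitespace tokenization
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            n_swaps = max(1, int(p * len(words)))
            new_words = random_swap(words, n_swaps, rng)
        else:
            new_words = random_deletion(words, p, rng)
        variants.append(" ".join(new_words))
    return variants


if __name__ == "__main__":
    for variant in augment("this comment is rude and completely uncalled for"):
        print(variant)
```

Each variant would keep the label of the original comment, so a small labeled set can be expanded several times over before training a classifier. Whether a perturbed comment still deserves its original label is something to spot-check by hand.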

research
12/19/2021

Data Augmentation for Mental Health Classification on Social Media

The mental disorder of online users is determined using social media pos...
research
12/05/2018

Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs

In practice, it is common to find oneself with far too little text data ...
research
08/25/2022

Image augmentation improves few-shot classification performance in plant disease recognition

With the world population projected to near 10 billion by 2050, minimizi...
research
03/26/2019

Augmented Ultrasonic Data for Machine Learning

Flaw detection in non-destructive testing, especially in complex signals...
research
01/31/2020

Data Augmentation for Histopathological Images Based on Gaussian-Laplacian Pyramid Blending

Data imbalance is a major problem that affects several machine learning ...
research
10/03/2018

Machine Learning Suites for Online Toxicity Detection

To identify and classify toxic online commentary, the modern tools of da...
