Is augmentation effective to improve prediction in imbalanced text datasets?

04/20/2023
by Gabriel O. Assunção, et al.

Imbalanced datasets present a significant challenge for machine learning models, often leading to biased predictions. To address this issue, data augmentation techniques are widely used in natural language processing (NLP) to generate new samples for the minority class. In this paper, we challenge the common assumption that data augmentation is always necessary to improve predictions on imbalanced datasets. Instead, we argue that adjusting the classifier cutoffs without data augmentation can produce results similar to those of oversampling techniques. Our study provides theoretical and empirical evidence to support this claim. Our findings contribute to a better understanding of the strengths and limitations of different approaches to dealing with imbalanced data, and help researchers and practitioners make informed decisions about which methods to use for a given task.
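
The idea that a cutoff adjustment can stand in for oversampling can be illustrated with a small sketch. It assumes a well-calibrated classifier and models oversampling as replicating every minority-class sample r times, which multiplies the posterior odds of the minority class by r. This is a standard prior-shift argument and not necessarily the authors' own derivation; the function names, the factor r = 9, and the example scores are hypothetical.

```python
def oversampled_posterior(p, r):
    """Minority-class posterior after replicating minority samples r times.

    Assumption: replication multiplies the posterior odds p / (1 - p) by r.
    """
    return r * p / (r * p + (1.0 - p))


def equivalent_cutoff(r, cutoff=0.5):
    """Cutoff on the ORIGINAL posterior that matches `cutoff` applied
    to the posterior of a model trained on the oversampled data."""
    return cutoff / (cutoff + r * (1.0 - cutoff))


# Hypothetical setup: minority class replicated 9 times (e.g. 10% -> 50%).
r = 9
scores = [0.02, 0.08, 0.12, 0.30, 0.60, 0.95]  # original posteriors

# Route 1: oversample, then apply the default 0.5 cutoff.
via_oversampling = [oversampled_posterior(p, r) >= 0.5 for p in scores]

# Route 2: keep the data as is, lower the cutoff instead.
adjusted = equivalent_cutoff(r)
via_cutoff = [p >= adjusted for p in scores]

print("adjusted cutoff:", adjusted)
print("same decisions:", via_oversampling == via_cutoff)
```

Under these assumptions both routes label the same samples as minority class, because the two conditions are algebraically identical. With a real model trained on finite data the match is only approximate, which is why an empirical comparison is still needed.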

research
01/26/2023

Experimenting with an Evaluation Framework for Imbalanced Data Learning (EFIDL)

Introduction Data imbalance is one of the crucial issues in big data ana...
research
04/06/2023

A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation

Class imbalance (CI) in classification problems arises when the number o...
research
02/18/2023

Data Augmentation for Imbalanced Regression

In this work, we consider the problem of imbalanced data in a regression...
research
10/24/2022

GradMix for nuclei segmentation and classification in imbalanced pathology image datasets

An automated segmentation and classification of nuclei is an essential t...
research
10/10/2022

A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing

Many natural language processing (NLP) tasks are naturally imbalanced, a...
research
06/06/2023

Augmenting Reddit Posts to Determine Wellness Dimensions impacting Mental Health

Amid ongoing health crisis, there is a growing necessity to discern poss...
research
08/29/2023

From SMOTE to Mixup for Deep Imbalanced Classification

Given imbalanced data, it is hard to train a good classifier using deep ...