Balanced Split: A new train-test data splitting strategy for imbalanced datasets

12/17/2022
by   Azal Ahmad Khan, et al.
0

Classification data sets with skewed class proportions are called imbalanced. Class imbalance is a problem since most machine learning classification algorithms are built with an assumption of equal representation of all classes in the training dataset. Therefore to counter the class imbalance problem, many algorithm-level and data-level approaches have been developed. These mainly include ensemble learning and data augmentation techniques. This paper shows a new way to counter the class imbalance problem through a new data-splitting strategy called balanced split. Data splitting can play an important role in correctly classifying imbalanced datasets. We show that the commonly used data-splitting strategies have some disadvantages, and our proposed balanced split has solved those problems.

READ FULL TEXT
research
04/06/2023

A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation

Class imbalance (CI) in classification problems arises when the number o...
research
08/20/2022

A Novel Hybrid Sampling Framework for Imbalanced Learning

Class imbalance is a frequently occurring scenario in classification tas...
research
04/07/2020

Long-Tailed Recognition Using Class-Balanced Experts

Classic deep learning methods achieve impressive results in image recogn...
research
12/04/2018

Bad practices in evaluation methodology relevant to class-imbalanced problems

For research to go in the right direction, it is essential to be able to...
research
12/20/2019

Divide and Conquer: an Accurate Machine Learning Algorithm to Process Split Videos on a Parallel Processing Infrastructure

Every day the number of traffic cameras in cities rapidly increase and h...
research
11/02/2017

Oversampling for Imbalanced Learning Based on K-Means and SMOTE

Learning from class-imbalanced data continues to be a common and challen...
research
10/16/2018

An empirical evaluation of imbalanced data strategies from a practitioner's point of view

This research tested the following well known strategies to deal with bi...

Please sign up or login with your details

Forgot password? Click here to reset