CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification

12/12/2017
by Farshid Rayhan, et al.

Classification under class imbalance is a challenging research problem in data mining and machine learning, as most real-life datasets are imbalanced in nature. Existing learning algorithms maximise classification accuracy by correctly classifying the majority class, but they tend to misclassify the minority class. In real-life applications, however, the minority class instances usually represent the concept of greater interest. Several techniques have recently been used in the literature for classifying imbalanced datasets: sampling methods (under-sampling the majority class and over-sampling the minority class), cost-sensitive learning methods, and ensemble learning. In this paper, we introduce a new clustering-based under-sampling approach combined with the boosting (AdaBoost) algorithm, called CUSBoost, for effective imbalanced classification. The proposed algorithm provides an alternative to the RUSBoost (random under-sampling with AdaBoost) and SMOTEBoost (synthetic minority over-sampling with AdaBoost) algorithms. We compared the performance of the CUSBoost algorithm against state-of-the-art ensemble learning methods (AdaBoost, RUSBoost, and SMOTEBoost) on 13 imbalanced binary and multi-class datasets with various imbalance ratios. The experimental results show that CUSBoost is a promising and effective approach for dealing with highly imbalanced datasets.
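
The cluster-based under-sampling step described above can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or sampling rate is used, so the choices here are assumptions made for illustration only: a plain k-means over the majority class, followed by keeping a fixed fraction of each cluster at random. The names `kmeans`, `cluster_undersample`, `k`, and `keep_fraction` are hypothetical and do not come from the paper.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples.

    Returns one cluster index per point. Assumes k <= len(points).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_undersample(X, y, majority_label, k=3, keep_fraction=0.5, seed=0):
    """Under-sample the majority class cluster by cluster (illustrative assumption).

    Every minority instance is kept; from each majority cluster a random
    subset of roughly keep_fraction of its members (at least one) is kept.
    """
    rng = random.Random(seed)
    maj_idx = [i for i, label in enumerate(y) if label == majority_label]
    min_idx = [i for i, label in enumerate(y) if label != majority_label]
    labels = kmeans([X[i] for i in maj_idx], k, seed=seed)

    kept = []
    for c in range(k):
        members = [i for i, lab in zip(maj_idx, labels) if lab == c]
        if not members:
            continue
        n_keep = max(1, round(keep_fraction * len(members)))
        kept.extend(rng.sample(members, n_keep))

    idx = sorted(kept + min_idx)
    return [X[i] for i in idx], [y[i] for i in idx]
```

In a boosting setup modelled on RUSBoost, a function like this would presumably be called before fitting each weak learner, so that every round trains on a different, more balanced subset. Sampling from every cluster, rather than from the majority class as a whole, is meant to keep each region of the majority class represented in the reduced set.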

research
12/18/2017

MEBoost: Mixing Estimators with Boosting for Imbalanced Data Classification

Class imbalance problem has been a challenging research problem in the f...
research
11/09/2020

Synthetic Over-sampling with the Minority and Majority classes for imbalance problems

Class imbalance is a substantial challenge in classifying many real-worl...
research
10/17/2019

KDE sampling for imbalanced class distribution

Imbalanced response variable distribution is not an uncommon occurrence ...
research
10/10/2021

Time Series Classification Using Convolutional Neural Network On Imbalanced Datasets

Time Series Classification (TSC) has drawn a lot of attention in literat...
research
07/06/2022

A Hybrid Approach for Binary Classification of Imbalanced Data

Binary classification with an imbalanced dataset is challenging. Models ...
research
09/22/2020

Gamma distribution-based sampling for imbalanced data

Imbalanced class distribution is a common problem in a number of fields ...
research
10/09/2020

Handling Imbalanced Data: A Case Study for Binary Class Problems

For several years till date, the major issues in terms of solving for cl...
