MixBoost: Synthetic Oversampling with Boosted Mixup for Handling Extreme Imbalance

09/03/2020
by Anubha Kabra, et al.

Training a classification model on a dataset in which the instances of one class outnumber those of the other is a challenging problem. Such imbalanced datasets are common in real-world settings such as fraud detection, medical diagnosis, and computational advertising. We propose an iterative data augmentation method, MixBoost, which intelligently selects (Boost) and then combines (Mix) instances from the majority and minority classes to generate synthetic hybrid instances that have characteristics of both classes. We evaluate MixBoost on 20 benchmark datasets, show that it outperforms existing approaches, and assess its efficacy with statistical significance tests. We also present ablation studies that analyze the impact of the different components of MixBoost.
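To make the "Mix" step concrete, the following is a minimal sketch of combining one minority instance with one majority instance into a hybrid point. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how instances are selected or weighted, so this sketch picks pairs uniformly at random (standing in for the "Boost" selection step) and draws the mixing weight from a Beta distribution, as in standard mixup. The function name `mix_instances` and the parameters `alpha` and `seed` are hypothetical.

```python
import random


def mix_instances(minority, majority, n_new, alpha=0.5, seed=0):
    """Create n_new hybrid instances from minority/majority pairs.

    Assumptions (not taken from the paper's abstract):
    - pairs are chosen uniformly at random, a placeholder for the
      selection ("Boost") step, which the abstract does not detail;
    - the mixing weight lam is drawn from Beta(alpha, alpha), as in
      standard mixup.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x_min = rng.choice(minority)   # one minority-class instance
        x_maj = rng.choice(majority)   # one majority-class instance
        lam = rng.betavariate(alpha, alpha)
        # Convex combination of the two feature vectors.
        x_new = [lam * a + (1.0 - lam) * b for a, b in zip(x_min, x_maj)]
        # Soft label: lam is the weight given to the minority class.
        synthetic.append((x_new, lam))
    return synthetic


# Toy data: two minority points and three majority points in 2-D.
minority = [[1.0, 1.0], [1.2, 0.9]]
majority = [[0.0, 0.0], [0.1, -0.1], [-0.2, 0.2]]
hybrids = mix_instances(minority, majority, n_new=5)
for features, minority_weight in hybrids:
    print(features, minority_weight)
```

Each hybrid lies on the line segment between its two parent instances, so it carries characteristics of both classes; the soft label records how much of the minority instance went into it.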


research
11/09/2020

Synthetic Over-sampling with the Minority and Majority classes for imbalance problems

Class imbalance is a substantial challenge in classifying many real-worl...
research
11/05/2021

Solving the Class Imbalance Problem Using a Counterfactual Method for Data Augmentation

Learning from class imbalanced datasets poses challenges for many machin...
research
04/06/2023

A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation

Class imbalance (CI) in classification problems arises when the number o...
research
01/06/2020

Identifying and Compensating for Feature Deviation in Imbalanced Deep Learning

We investigate learning a ConvNet classifier with class-imbalanced data....
research
10/23/2019

GenSample: A Genetic Algorithm for Oversampling in Imbalanced Datasets

Imbalanced datasets are ubiquitous. Classification performance on imbala...
research
04/20/2022

Neurochaos Feature Transformation and Classification for Imbalanced Learning

Learning from limited and imbalanced data is a challenging problem in th...
research
02/27/2023

Semantic-aware Node Synthesis for Imbalanced Heterogeneous Information Networks

Heterogeneous graph neural networks (HGNNs) have exhibited exceptional e...
