Generalized Oversampling for Learning from Imbalanced Datasets and Associated Theory

08/05/2023
by Samuel Stocksieker, et al.

In supervised learning, real-world datasets are frequently imbalanced, which makes learning difficult for standard algorithms. Research and solutions in imbalanced learning have focused mainly on classification tasks; despite its importance, imbalanced regression has very few solutions. In this paper, we propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates, which can be used for both classification and regression. This general approach encompasses two large families of synthetic oversampling: those based on perturbations, such as Gaussian Noise, and those based on interpolations, such as SMOTE. It also provides an explicit form of these machine learning algorithms and an expression of their conditional densities, in particular for SMOTE, from which new synthetic data generators are deduced. We apply GOLIATH to imbalanced regression by combining such generators with a wild-bootstrap resampling technique for the target values. We empirically evaluate GOLIATH in imbalanced regression settings, compare it with existing state-of-the-art techniques, and demonstrate significant improvements over them.
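
The abstract names two families of synthetic oversampling that GOLIATH brings under one kernel-density framework. The sketch below is not the GOLIATH algorithm. It is a minimal NumPy illustration, under our own assumptions, of (i) Gaussian-noise perturbation, which is equivalent to drawing from a Gaussian kernel density estimate centred on the observed rows, (ii) SMOTE-style interpolation between a seed row and one of its k nearest neighbours, and (iii) a simple wild-bootstrap rule for synthetic regression targets. The function names, the isotropic bandwidth `h`, the ordinary least squares fit and the Rademacher weights are hypothetical illustrative choices, not taken from the paper.

```python
import numpy as np


def gaussian_noise_oversample(X, n_new, h, rng):
    """Perturbation family: pick seed rows uniformly at random and add
    isotropic N(0, h^2 I) noise. This is the same as sampling from a
    Gaussian kernel density estimate of X with bandwidth h."""
    seeds = rng.integers(0, len(X), size=n_new)
    noise = rng.normal(0.0, h, size=(n_new, X.shape[1]))
    return X[seeds] + noise, seeds


def smote_oversample(X, n_new, k, rng):
    """Interpolation family (SMOTE-style): pick a seed row, pick one of its
    k nearest neighbours, and draw a point uniformly on the segment between them."""
    # Pairwise squared Euclidean distances; a row is never its own neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    seeds = rng.integers(0, len(X), size=n_new)
    partners = neighbours[seeds, rng.integers(0, k, size=n_new)]
    u = rng.uniform(0.0, 1.0, size=(n_new, 1))
    return X[seeds] + u * (X[partners] - X[seeds]), seeds


def wild_bootstrap_targets(X, y, X_new, seeds, rng):
    """Hypothetical target rule (an assumption, not the paper's): fit ordinary
    least squares on (X, y), then set y* = fitted value at x* + v * residual of
    the seed row, with Rademacher weights v in {-1, +1}."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    v = rng.choice([-1.0, 1.0], size=len(X_new))
    return A_new @ beta + v * residuals[seeds]


# Usage: X_rare is a hypothetical stand-in for the rare samples to augment.
rng = np.random.default_rng(0)
X_rare = rng.normal(size=(20, 2))
y_rare = X_rare @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=20)

X_gn, seeds_gn = gaussian_noise_oversample(X_rare, 50, h=0.1, rng=rng)
X_sm, seeds_sm = smote_oversample(X_rare, 50, k=5, rng=rng)
y_sm = wild_bootstrap_targets(X_rare, y_rare, X_sm, seeds_sm, rng)
```

The two generators differ in where they place mass: Gaussian noise spreads synthetic points around each seed in every direction, while SMOTE-style interpolation keeps them on segments between existing samples, so they never leave the per-feature range of the data.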
