An empirical evaluation of imbalanced data strategies from a practitioner's point of view

10/16/2018
by Jacques Wainer, et al.

This research tested the following well-known strategies for dealing with binary imbalanced data on 82 different real-life data sets (sampled to imbalance rates of 5 [...]): class weight, SMOTE, underbagging, and a baseline (just the base classifier). As base classifiers we used SVM with RBF kernel, random forests, and gradient boosting machines, and we measured the quality of the resulting classifier using 6 different metrics (area under the curve, accuracy, F-measure, G-mean, Matthews correlation coefficient, and balanced accuracy). The best strategy strongly depends on the metric used to measure the quality of the classifier: for AUC and accuracy, class weight and the baseline perform better; for F-measure and MCC, SMOTE performs better; and for G-mean and balanced accuracy, underbagging performs better.
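
To illustrate why the ranking of strategies can change with the metric, here is a minimal sketch that computes five of the six metrics named above from a confusion matrix, using their standard textbook definitions. It is not the authors' evaluation code. The function names (`confusion_counts`, `evaluate`) and the two toy classifiers are hypothetical, and the labels are synthetic (1000 samples, 5% positives), not data from the paper. AUC is left out because it needs ranking scores rather than hard predictions.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, F-measure, G-mean, MCC and balanced accuracy."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = safe_div(tp, tp + fn)        # true positive rate
    specificity = safe_div(tn, tn + fp)   # true negative rate
    precision = safe_div(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "f_measure": safe_div(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(recall * specificity),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Synthetic labels (assumption for illustration): 50 positives, 950 negatives.
y_true = [1] * 50 + [0] * 950

# Toy classifier A: always predicts the majority (negative) class.
pred_majority = [0] * 1000

# Toy classifier B: finds 40 of the 50 positives but also flags 190 negatives.
pred_detector = [1] * 40 + [0] * 10 + [1] * 190 + [0] * 760

always_negative = evaluate(y_true, pred_majority)
detector = evaluate(y_true, pred_detector)

for name in always_negative:
    print(f"{name:18s} A={always_negative[name]:.3f}  B={detector[name]:.3f}")
```

On this toy data the majority-class predictor wins on accuracy while the second classifier wins on G-mean, balanced accuracy, F-measure, and MCC, so which one counts as "better" depends entirely on the metric chosen.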


