On Model Evaluation under Non-constant Class Imbalance

01/15/2020
by Jan Brabec, et al.

Many real-world classification problems are significantly class-imbalanced, to the detriment of the class of interest. The standard set of proper evaluation metrics is well known, but it is usually assumed that the class imbalance of the test dataset equals the real-world imbalance. In practice, this assumption is often broken for various reasons. The reported results are then often too optimistic and may lead to wrong conclusions about the industrial impact and suitability of the proposed techniques. We introduce methods focusing on evaluation under non-constant class imbalance. We show that a change in the imbalance rate affects not only the absolute values of commonly used metrics but even the ranking of classifiers under a given evaluation metric. Finally, we demonstrate that subsampling the test dataset to match the class imbalance observed in the wild is not necessary and can even lead to significant errors in the estimate of a classifier's performance.
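To illustrate the claim about metric values and classifier ranking, the following is a minimal sketch and not the authors' method. It assumes each classifier is summarized by a fixed operating point (true positive rate and false positive rate) that does not change with the class prior, and it uses the standard identity precision = TPR·π / (TPR·π + FPR·(1 − π)), where π is the positive-class prevalence. The function names and the two operating points are hypothetical, chosen only for illustration.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision implied by a fixed (TPR, FPR) point at a given positive-class prevalence."""
    true_pos = tpr * prevalence            # expected fraction of samples that are true positives
    false_pos = fpr * (1.0 - prevalence)   # expected fraction of samples that are false positives
    if true_pos + false_pos == 0.0:
        return float("nan")                # no positive predictions: precision is undefined
    return true_pos / (true_pos + false_pos)


def f1_at_prevalence(tpr, fpr, prevalence):
    """F1 score implied by a fixed (TPR, FPR) point at a given prevalence (recall equals TPR)."""
    precision = precision_at_prevalence(tpr, fpr, prevalence)
    if precision != precision or precision + tpr == 0.0:  # NaN check or degenerate case
        return 0.0
    return 2.0 * precision * tpr / (precision + tpr)


# Hypothetical operating points, not taken from the paper.
classifier_a = {"tpr": 0.90, "fpr": 0.10}  # high recall, more false alarms
classifier_b = {"tpr": 0.60, "fpr": 0.01}  # lower recall, few false alarms

for prevalence in (0.5, 0.01):
    f1_a = f1_at_prevalence(prevalence=prevalence, **classifier_a)
    f1_b = f1_at_prevalence(prevalence=prevalence, **classifier_b)
    better = "A" if f1_a > f1_b else "B"
    print(f"prevalence={prevalence}: F1(A)={f1_a:.3f}, F1(B)={f1_b:.3f}, better={better}")
```

Under these assumed numbers, classifier A has the higher F1 on a balanced test set, while classifier B has the higher F1 when positives make up 1% of the data. Because the metric is recomputed analytically from TPR and FPR, no subsampling of the test set is needed to evaluate it at a different prevalence.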


