A Model-Agnostic Algorithm for Bayes Error Determination in Binary Classification

07/24/2021
by Umberto Michelucci, et al.

This paper presents the intrinsic limit determination algorithm (ILD algorithm), a novel technique for determining the best possible performance, measured in terms of AUC (area under the ROC curve) and accuracy, that any model can obtain from a specific dataset in a binary classification problem with categorical features. This limit, namely the Bayes error, is independent of the model used and is an intrinsic property of the dataset. The ILD algorithm thus provides important information about the prediction limits of any binary classification algorithm applied to the dataset under consideration. The paper describes the algorithm in detail, presents its entire mathematical framework, and gives pseudocode to facilitate its implementation. Finally, it gives an example with a real dataset.
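
To make the idea of a model-independent limit concrete, the following is a minimal sketch and not the paper's ILD algorithm, whose details are not reproduced here. It rests on one assumption: with purely categorical features, any deterministic classifier must treat all records that share the same feature tuple identically. Under that assumption, the best accuracy on a given sample comes from predicting the majority class in each feature cell, and the best AUC comes from ranking cells by their fraction of positive records. The function name `empirical_limits` and its arguments are hypothetical.

```python
from collections import Counter


def empirical_limits(rows, labels):
    """Return (best_accuracy, best_auc) on this sample for any classifier
    that sees only the categorical feature tuples in `rows`.

    Hypothetical helper for illustration; labels must be 0 or 1.
    """
    pos, neg = Counter(), Counter()
    for features, y in zip(rows, labels):
        cell = tuple(features)
        if y == 1:
            pos[cell] += 1
        else:
            neg[cell] += 1

    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    cells = set(pos) | set(neg)

    # Accuracy limit: predict the majority class inside every cell.
    best_accuracy = sum(max(pos[c], neg[c]) for c in cells) / (n_pos + n_neg)

    # AUC limit: rank cells by positive fraction, lowest first, then count
    # (positive, negative) pairs ranked correctly. A pair inside the same
    # cell is an unavoidable tie and counts as one half.
    ordered = sorted(cells, key=lambda c: pos[c] / (pos[c] + neg[c]))
    correctly_ranked = 0.0
    negatives_below = 0
    for c in ordered:
        correctly_ranked += pos[c] * (negatives_below + 0.5 * neg[c])
        negatives_below += neg[c]
    best_auc = correctly_ranked / (n_pos * n_neg)

    return best_accuracy, best_auc


# Toy data: one categorical feature with two values, six records.
rows = [("a",), ("a",), ("a",), ("b",), ("b",), ("b",)]
labels = [1, 1, 0, 0, 0, 1]
accuracy, auc = empirical_limits(rows, labels)
print(accuracy, auc)
```

In the toy data, cell `a` holds two positives and one negative and cell `b` holds one positive and two negatives, so no classifier can label more than four of the six records correctly. These are in-sample limits computed from observed counts; estimating the limit for unseen data from the same distribution needs additional care that this sketch does not attempt.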


