Prevalence Threshold and bounds in the Accuracy of Binary Classification Systems

12/25/2021
by Jacques Balayla, et al.

The accuracy of a binary classification system is defined as the proportion of correct predictions, both positive and negative, made by a classification model or computational algorithm. Accuracy takes a value between 0 (no accuracy) and 1 (perfect accuracy) and depends on several factors, notably the classification rule or algorithm used, the intrinsic characteristics of the tool used to do the classification, and the relative frequency of the elements being classified. Several accuracy metrics exist, each with its own advantages in different classification scenarios. In this manuscript, we show that, relative to a perfect accuracy of 1, the positive prevalence threshold (ϕ_e), a critical point of maximum curvature in the precision-prevalence curve, bounds the F_β score between 1 and 1.8, 1.5, and 1.2 for β values of 0.5, 1.0, and 2.0, respectively; the F_1 score (the β = 1 case) between 1 and 1.5; and the Fowlkes-Mallows Index (FM) between 1 and √2 ≈ 1.414. We likewise describe a novel negative prevalence threshold (ϕ_n), the point of sharpest curvature in the negative predictive value-prevalence curve, such that ϕ_n > ϕ_e. The area between these two thresholds bounds the Matthews Correlation Coefficient (MCC) between √2/2 and √2. Conversely, the ratio of the maximum possible accuracy to the accuracy at any point below the prevalence threshold ϕ_e goes to infinity with decreasing prevalence. Though applications are numerous, the ideas discussed herein may be used in computational complexity theory, artificial intelligence, and medical screening, amongst others. Where computational time is a limiting resource, attaining the prevalence threshold in binary classification systems may be sufficient to yield levels of accuracy comparable to that under maximum prevalence.
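The quoted bounds can be checked numerically with a short sketch. It rests on several assumptions that the abstract does not state: the bounds are read as ratios of a perfect score (1) to the score at the threshold; the worst case is taken to be a precision of 0.5 with a recall of 1; and the closed form used for ϕ_e is the one from the authors' earlier prevalence-threshold work, with sensitivity a and specificity b. All function names are hypothetical, and this is an illustration, not the paper's derivation.

```python
import math


def ppv(prevalence, sensitivity, specificity):
    """Precision (positive predictive value) from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def prevalence_threshold(sensitivity, specificity):
    """Assumed closed form for phi_e: sqrt(1-b) / (sqrt(a) + sqrt(1-b))."""
    root_fpr = math.sqrt(1.0 - specificity)
    return root_fpr / (math.sqrt(sensitivity) + root_fpr)


def f_beta(precision, recall, beta):
    """Standard F_beta score: weighted harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def fowlkes_mallows(precision, recall):
    """Fowlkes-Mallows index: geometric mean of precision and recall."""
    return math.sqrt(precision * recall)


if __name__ == "__main__":
    # Assumed limiting case: sensitivity 1 and specificity 0 give a
    # precision of exactly 0.5 at the threshold.
    phi_e = prevalence_threshold(1.0, 0.0)
    precision = ppv(phi_e, 1.0, 0.0)
    recall = 1.0  # assumption: recall equals the sensitivity of 1
    print("phi_e =", phi_e, "precision =", precision)

    # Ratio of a perfect score (1) to the score at the threshold.
    for beta in (0.5, 1.0, 2.0):
        print("beta =", beta, "ratio =", 1.0 / f_beta(precision, recall, beta))
    print("FM ratio =", 1.0 / fowlkes_mallows(precision, recall))
```

Under these assumptions the ratios come out at about 1.8, 1.5, and 1.2 for β = 0.5, 1.0, and 2.0, and about 1.414 for FM, which agrees with the figures in the abstract. The MCC bounds depend on the negative threshold ϕ_n, whose formula is not given here, so they are left out of the sketch.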

research
05/31/2020

Prevalence Threshold and the Geometry of Screening Curves

The relationship between a screening test's positive predictive value, ρ...
research
06/05/2022

Information Threshold, Bayesian Inference and Decision-Making

We define the information threshold as the point of maximum curvature in...
research
06/01/2016

On the equivalence between Kolmogorov-Smirnov and ROC curve metrics for binary classification

Binary decisions are very common in artificial intelligence. Applying a ...
research
04/15/2021

The SIR-P Model: An Illustration of the Screening Paradox

In previous work by this author, the screening paradox - the loss of pre...
research
12/13/2020

Invariant Points on the Screening Plane: a Geometric Definition of the Likelihood Ratio (LR+)

From the fundamental theorem of screening we obtain the following mathem...
research
10/25/2018

Between a ROC and a Hard Place: Using prevalence plots to understand the likely real world performance of biomarkers in the clinic

The Receiver Operating Characteristic (ROC) curve and the Area Under the...
research
04/25/2017

Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts

Understanding how ideas relate to each other is a fundamental question i...
