Evaluating Probabilistic Classifiers: The Triptych

01/25/2023
by Timo Dimitriadis, et al.

Probability forecasts for binary outcomes, often referred to as probabilistic classifiers or confidence scores, are ubiquitous in science and society, and methods for evaluating and comparing them are in great demand. We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance: The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value. A Murphy curve shows a forecast's mean elementary scores, including the widely used misclassification rate, and the area under a Murphy curve equals the mean Brier score. For a calibrated forecast, the reliability curve lies on the diagonal, and for competing calibrated forecasts, the ROC and Murphy curves share the same number of crossing points. We invoke the recently developed CORP (Consistent, Optimally binned, Reproducible, and Pool-Adjacent-Violators (PAV) algorithm based) approach to craft reliability diagrams and decompose a mean score into miscalibration (MCB), discrimination (DSC), and uncertainty (UNC) components. Plots of the DSC measure of discrimination ability versus the calibration metric MCB visualize classifier performance across multiple competitors. The proposed tools are illustrated in empirical examples from astrophysics, economics, and social science.
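To make the score decomposition concrete, the following is a minimal sketch of a CORP-style decomposition of the mean Brier score, assuming the form mean score = MCB - DSC + UNC. Here MCB compares the original forecast with its PAV-recalibrated version, and DSC compares a constant forecast at the observed event frequency with the recalibrated version. The function names (`pav_recalibrate`, `mean_brier`, `corp_decomposition`) and the toy data are illustrative assumptions, not taken from the paper or from any particular library.

```python
def pav_recalibrate(forecasts, outcomes):
    """Pool-adjacent-violators fit of outcomes on forecasts.

    Returns recalibrated probabilities in the original input order.
    Cases with identical forecast values are pooled before merging.
    """
    n = len(forecasts)
    order = sorted(range(n), key=lambda i: forecasts[i])
    blocks = []  # each block: [sum of outcomes, count, member indices]
    i = 0
    while i < n:
        # Collect all cases that share the same forecast value.
        j, total, members = i, 0.0, []
        while j < n and forecasts[order[j]] == forecasts[order[i]]:
            members.append(order[j])
            total += outcomes[order[j]]
            j += 1
        blocks.append([total, len(members), members])
        i = j
        # Merge backwards while block means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total2, count2, members2 = blocks.pop()
            blocks[-1][0] += total2
            blocks[-1][1] += count2
            blocks[-1][2].extend(members2)
    recalibrated = [0.0] * n
    for total, count, members in blocks:
        for k in members:
            recalibrated[k] = total / count
    return recalibrated


def mean_brier(forecasts, outcomes):
    """Mean Brier score: average squared difference of forecast and outcome."""
    return sum((x - y) ** 2 for x, y in zip(forecasts, outcomes)) / len(outcomes)


def corp_decomposition(forecasts, outcomes):
    """Split the mean Brier score into MCB, DSC, and UNC components."""
    n = len(outcomes)
    recalibrated = pav_recalibrate(forecasts, outcomes)
    base_rate = sum(outcomes) / n
    score = mean_brier(forecasts, outcomes)
    score_recal = mean_brier(recalibrated, outcomes)
    score_ref = mean_brier([base_rate] * n, outcomes)  # constant reference forecast
    return {
        "score": score,
        "MCB": score - score_recal,      # miscalibration
        "DSC": score_ref - score_recal,  # discrimination
        "UNC": score_ref,                # uncertainty
    }


# Toy data, made up for illustration only.
example_forecasts = [0.1, 0.4, 0.35, 0.8]
example_outcomes = [0, 0, 1, 1]
result = corp_decomposition(example_forecasts, example_outcomes)
print(result)
```

Plotting the step function traced by the `pav_recalibrate` values against the sorted forecasts would give a basic CORP-style reliability curve, and the (MCB, DSC) pairs of several forecasts could be placed on one scatter plot to compare competitors.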


