Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing

09/21/2023
by Jarosław Błasiok, et al.

Calibration measures and reliability diagrams are two fundamental tools for measuring and interpreting the calibration of probabilistic predictors. Calibration measures quantify the degree of miscalibration, and reliability diagrams visualize the structure of this miscalibration. However, the most common constructions of reliability diagrams and calibration measures – binning and ECE – both suffer from well-known flaws (e.g. discontinuity). We show that a simple modification fixes both constructions: first smooth the observations using an RBF kernel, then compute the Expected Calibration Error (ECE) of this smoothed function. We prove that with a careful choice of bandwidth, this method yields a calibration measure that is well-behaved in the sense of (Błasiok, Gopalan, Hu, and Nakkiran 2023a) – a consistent calibration measure. We call this measure the SmoothECE. Moreover, the reliability diagram obtained from this smoothed function visually encodes the SmoothECE, just as binned reliability diagrams encode the BinnedECE. We also provide a Python package with simple, hyperparameter-free methods for measuring and plotting calibration: `pip install relplot`.
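The two-step recipe in the abstract (kernel-smooth the observations, then take the ECE of the smoothed function) can be illustrated with a minimal NumPy sketch. This is not the `relplot` API and not the paper's exact estimator: the function name `smoothed_ece` and the parameters `sigma` and `grid_size` are hypothetical, the bandwidth is fixed by hand rather than chosen by the paper's procedure, and a plain Gaussian kernel is used with no special handling of the boundaries of [0, 1].

```python
import numpy as np

def smoothed_ece(f, y, sigma=0.05, grid_size=512):
    """Illustrative kernel-smoothed calibration error (assumed form, fixed bandwidth).

    f: predicted probabilities in [0, 1]; y: binary outcomes in {0, 1}.
    sigma and grid_size are hypothetical knobs for this sketch only.
    """
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.linspace(0.0, 1.0, grid_size)  # evaluation grid on [0, 1]

    # Gaussian (RBF) kernel weight between every grid point and every prediction.
    z = (t[:, None] - f[None, :]) / sigma
    kernel = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

    # Kernel-smoothed residual y - f, averaged over the sample. Averaging the
    # kernel-weighted residuals already weights each grid point by the
    # estimated density of predictions there.
    residual = kernel @ (y - f) / len(f)

    # Integrate |residual| over [0, 1] with the trapezoid rule.
    abs_r = np.abs(residual)
    dt = t[1] - t[0]
    return float(dt * (abs_r.sum() - 0.5 * (abs_r[0] + abs_r[-1])))

# Synthetic example: outcomes drawn with probability f**2, so the
# predictor f is systematically overconfident about the positive class.
rng = np.random.default_rng(0)
f_demo = rng.uniform(0.0, 1.0, size=2000)
y_demo = (rng.uniform(size=2000) < f_demo**2).astype(float)
score = smoothed_ece(f_demo, y_demo)
```

Unlike a binned ECE, this quantity changes continuously as predictions move, because each observation contributes through a smooth kernel rather than through membership in a single bin. Predictions close to 0 or 1 lose some kernel mass outside the unit interval in this simplified version.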


research
11/30/2022

A Unifying Theory of Distance from Calibration

We study the fundamental question of how to define and measure the dista...
research
06/03/2020

Plots of the cumulative differences between observed and expected values of ordered Bernoulli variates

Many predictions are probabilistic in nature; for example, a prediction ...
research
07/27/2022

Calibrate: Interactive Analysis of Probabilistic Model Output

Analyzing classification model performance is a crucial task for machine...
research
08/07/2020

Evaluating probabilistic classifiers: Reliability diagrams and score decompositions revisited

A probability forecast or probabilistic classifier is reliable or calibr...
research
08/06/2021

Regression Diagnostics meets Forecast Evaluation: Conditional Calibration, Reliability Diagrams, and Coefficient of Determination

Model diagnostics and forecast evaluation are two sides of the same coin...
research
05/19/2022

Metrics of calibration for probabilistic predictions

Predictions are often probabilities; e.g., a prediction could be for pre...
research
01/25/2023

Evaluating Probabilistic Classifiers: The Triptych

Probability forecasts for binary outcomes, often referred to as probabil...
