Regression Diagnostics meets Forecast Evaluation: Conditional Calibration, Reliability Diagrams, and Coefficient of Determination

08/06/2021
by Tilmann Gneiting, et al.

Model diagnostics and forecast evaluation are two sides of the same coin. A common principle is that fitted or predicted distributions ought to be calibrated or reliable, ideally in the sense of auto-calibration, where the outcome is a random draw from the posited distribution. For binary outcomes, this is the universal concept of reliability. For general real-valued outcomes, practitioners and theoreticians have relied on weaker, unconditional notions, most prominently probabilistic calibration, which corresponds to uniformity of the probability integral transform. Conditional concepts give rise to hierarchies of calibration. In a nutshell, a predictive distribution is conditionally T-calibrated if it can be taken at face value in terms of the functional T. Whenever T is defined via an identification function, as in the cases of threshold (non-)exceedance probabilities, quantiles, expectiles, and moments, auto-calibration implies T-calibration. However, the notion of T-calibration also applies to stand-alone point forecasts or regression output in terms of the functional T. We introduce population versions of T-reliability diagrams and revisit a score decomposition into measures of miscalibration (MCB), discrimination (DSC), and uncertainty (UNC). In empirical settings, stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, we propose a universal coefficient of determination, R^∗ = (DSC - MCB)/UNC, that nests and reinterprets the classical R^2 in least squares (mean) regression and its natural analogue R^1 in quantile regression, yet applies to T-regression in general, with MCB ≥ 0, DSC ≥ 0, and R^∗ ∈ [0, 1] under modest conditions.
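To make the score decomposition concrete, here is a minimal Python sketch for the special case of the mean functional under squared error: recalibrated forecasts are obtained by nonparametric isotonic regression (implemented via the pool-adjacent-violators algorithm), the mean score splits into UNC - DSC + MCB, and R^∗ = (DSC - MCB)/UNC. The function name corp_decomposition_mean and the use of scikit-learn's IsotonicRegression are illustrative choices and not the paper's own software; other functionals T would require the corresponding identification and scoring functions.

```python
# Sketch of the isotonic-regression-based score decomposition for the
# mean functional under squared error. Names are illustrative, not from
# the paper's code.

import numpy as np
from sklearn.isotonic import IsotonicRegression  # pool-adjacent-violators under the hood


def corp_decomposition_mean(x, y):
    """Decompose the mean squared error of point (mean) forecasts x for outcomes y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Recalibrated forecasts: isotonic regression of the outcomes on the forecasts.
    x_rc = IsotonicRegression(out_of_bounds="clip").fit_transform(x, y)

    # Marginal (climatological) reference forecast: the overall mean of the outcomes.
    x_mg = np.full_like(y, y.mean())

    def mse(f):
        return np.mean((f - y) ** 2)  # squared error scoring rule

    score, score_rc, score_mg = mse(x), mse(x_rc), mse(x_mg)

    mcb = score - score_rc     # miscalibration: score reduction from recalibration
    dsc = score_mg - score_rc  # discrimination: gain over the marginal forecast
    unc = score_mg             # uncertainty: score of the marginal forecast

    r_star = (dsc - mcb) / unc  # universal coefficient of determination
    return {"MCB": mcb, "DSC": dsc, "UNC": unc, "R*": r_star}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = x + rng.normal(scale=0.5, size=500)  # outcomes loosely tied to the forecasts
    print(corp_decomposition_mean(x, y))
```

In this special case R^∗ = 1 - score/UNC, which coincides with the classical in-sample R^2 when x are the fitted values of a least squares regression, consistent with the nesting claim in the abstract.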


