Regression Diagnostics meets Forecast Evaluation: Conditional Calibration, Reliability Diagrams, and Coefficient of Determination
Model diagnostics and forecast evaluation are two sides of the same coin. A common principle is that fitted or predicted distributions ought to be calibrated or reliable, ideally in the sense of auto-calibration, where the outcome is a random draw from the posited distribution. For binary outcomes, this is the universal concept of reliability. For general real-valued outcomes, practitioners and theoreticians have relied on weaker, unconditional notions, most prominently probabilistic calibration, which corresponds to uniformity of the probability integral transform. Conditional concepts give rise to hierarchies of calibration. In a nutshell, a predictive distribution is conditionally T-calibrated if it can be taken at face value in terms of the functional T. Whenever T is defined via an identification function - as in the cases of threshold (non-)exceedance probabilities, quantiles, expectiles, and moments - auto-calibration implies T-calibration. However, the notion of T-calibration also applies to stand-alone point forecasts or regression output in terms of the functional T. We introduce population versions of T-reliability diagrams and revisit a score decomposition into measures of miscalibration (MCB), discrimination (DSC), and uncertainty (UNC). In empirical settings, stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, we propose a universal coefficient of determination, R^∗ = (DSC - MCB)/UNC, that nests and reinterprets the classical R^2 in least squares (mean) regression and its natural analogue R^1 in quantile regression, yet applies to T-regression in general, with MCB ≥ 0, DSC ≥ 0, and R^∗ ∈ [0, 1] under modest conditions.
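As an illustrative sketch (not the authors' implementation), the empirical decomposition can be computed for the mean functional with squared error as the scoring function, using isotonic regression via the pool-adjacent-violators algorithm for recalibration. The helper name corp_mean_decomposition is hypothetical, and the use of scikit-learn's IsotonicRegression solver is an assumption for illustration.

```python
# A minimal sketch of the score decomposition S = MCB - DSC + UNC and the
# coefficient of determination R* = (DSC - MCB)/UNC for the mean functional,
# with mean squared error as the consistent scoring function.
import numpy as np
from sklearn.isotonic import IsotonicRegression


def corp_mean_decomposition(forecast, outcome):
    """Return (MCB, DSC, UNC, R*) for point forecasts of the mean."""
    forecast = np.asarray(forecast, dtype=float)
    outcome = np.asarray(outcome, dtype=float)

    def mean_score(f):
        # Mean squared error, a consistent scoring function for the mean.
        return np.mean((f - outcome) ** 2)

    # Recalibrated forecasts: nonparametric isotonic regression of the
    # outcomes on the original forecasts (pool-adjacent-violators algorithm).
    recalibrated = IsotonicRegression().fit(forecast, outcome).predict(forecast)
    # Marginal (climatological) reference forecast: the outcome mean.
    climatology = np.full_like(outcome, outcome.mean())

    s_fc = mean_score(forecast)
    s_rc = mean_score(recalibrated)
    s_mg = mean_score(climatology)

    mcb = s_fc - s_rc            # miscalibration, >= 0 since the identity map is isotonic
    dsc = s_mg - s_rc            # discrimination, >= 0 since constants are isotonic
    unc = s_mg                   # uncertainty (marginal variance of the outcomes)
    r_star = (dsc - mcb) / unc   # equals 1 - S/UNC with S = MCB - DSC + UNC
    return mcb, dsc, unc, r_star
```

When the forecasts are in-sample fitted values from least squares regression, r_star reduces to the classical R^2; in the same spirit, swapping in the pinball loss and isotonic quantile recalibration would yield the quantile-regression analogue R^1.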