Recalibration of Predictive Models as Approximate Probabilistic Updates

by Evan T. R. Rosenman, et al.

The output of predictive models is routinely recalibrated by reconciling low-level predictions with known derived quantities defined at higher levels of aggregation. For example, models predicting turnout probabilities at the individual level in U.S. elections can be adjusted so that their aggregation matches the observed vote totals in each state, thus producing better-calibrated predictions. In this research note, we provide theoretical grounding for one of the most commonly used recalibration strategies, known colloquially as the "logit shift." Although it is typically cast as a heuristic optimization problem (an adjustment is chosen to minimize the difference between the aggregated predictions and the target totals), we show that the logit shift in fact offers a fast and accurate approximation to a principled, but often computationally impractical, adjustment strategy: computing the posterior prediction probabilities, conditional on the target totals. After deriving analytical bounds on the quality of the approximation, we illustrate the accuracy of the approach using Monte Carlo simulations. The simulations also confirm analytical results regarding the scenarios in which users of the simple logit shift can expect it to perform best: when the aggregated targets comprise many individual predictions, and when the distribution of true probabilities is symmetric and tight around 0.5.
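To make the two adjustment strategies named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes independent Bernoulli outcomes and an integer target total. The function names (`logit_shift`, `exact_posterior`), the use of bisection to find the shift, and the toy probabilities are all illustrative assumptions. The first function adds a single constant to every prediction on the logit scale so that the shifted probabilities sum to the target; the second computes each unit's posterior probability conditional on the total by dynamic programming, which costs far more as the number of units grows.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _logit(p):
    return math.log(p / (1.0 - p))


def logit_shift(probs, target_total, lo=-50.0, hi=50.0, tol=1e-10):
    """Find one constant alpha, added on the logit scale, such that the
    shifted probabilities sum to target_total. Bisection is an assumed
    solver choice; it works because the sum is increasing in alpha."""
    if not 0 < target_total < len(probs):
        raise ValueError("target_total must lie strictly between 0 and n")
    logits = [_logit(p) for p in probs]

    def total(alpha):
        return sum(_sigmoid(z + alpha) for z in logits)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if total(mid) < target_total:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2.0
    return alpha, [_sigmoid(z + alpha) for z in logits]


def _pmf_of_sum(probs):
    """Distribution of a sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)
            new[k + 1] += mass * p
        pmf = new
    return pmf


def exact_posterior(probs, d):
    """P(y_i = 1 | sum_j y_j = d) for each i, assuming independence.
    Cubic in the number of units, so only practical for small inputs."""
    if d == 0:
        return [0.0 for _ in probs]
    denom = _pmf_of_sum(probs)[d]
    posterior = []
    for i, p in enumerate(probs):
        rest = probs[:i] + probs[i + 1:]
        posterior.append(p * _pmf_of_sum(rest)[d - 1] / denom)
    return posterior


if __name__ == "__main__":
    toy_probs = [0.2, 0.5, 0.7, 0.4]  # hypothetical individual predictions
    alpha, shifted = logit_shift(toy_probs, 3)
    exact = exact_posterior(toy_probs, 3)
    print("alpha:", alpha)
    print("logit shift:", shifted)
    print("exact posterior:", exact)
```

Both outputs sum to the target total by construction, so comparing them entry by entry on a small example gives a rough feel for how close the shift comes to the conditional posterior.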




