Comparing Sequential Forecasters

09/30/2021
by Yo Joong Choe, et al.

Consider two or more forecasters, each making a sequence of predictions for different events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the forecasts or outcomes were generated? This work presents a novel and rigorous answer to this question. We design a sequential inference procedure for estimating the time-varying difference in forecast quality as measured by a relatively large class of proper scoring rules (bounded scores with a linear equivalent). The resulting confidence intervals are nonasymptotically valid, and can be continuously monitored to yield statistically valid comparisons at arbitrary data-dependent stopping times ("anytime-valid"); this is enabled by adapting variance-adaptive supermartingales, confidence sequences, and e-processes to our setting. Motivated by Shafer and Vovk's game-theoretic probability, our coverage guarantees are also distribution-free, in the sense that they make no distributional assumptions on the forecasts or outcomes. In contrast to a recent work by Henzi and Ziegel, our tools can sequentially test a weak null hypothesis about whether one forecaster outperforms another on average over time. We demonstrate their effectiveness by comparing forecasts on Major League Baseball (MLB) games and statistical postprocessing methods for ensemble weather forecasts.
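To make the idea of an anytime-valid comparison concrete, here is a minimal sketch of a confidence sequence for the running average score difference between two forecasters. It is not the variance-adaptive construction described in the abstract: it swaps in a simpler and looser Hoeffding-style bound, combined with a union bound over time, chosen only because it is short and easy to check. The function names (`brier_score`, `hoeffding_confidence_sequence`), the positively oriented Brier score, and the error budget alpha_t = 6 * alpha / (pi^2 * t^2) are all assumptions made for illustration.

```python
import math


def brier_score(p, y):
    """Positively oriented Brier score in [0, 1] for a binary outcome y in {0, 1}."""
    return 1.0 - (p - y) ** 2


def hoeffding_confidence_sequence(deltas, alpha=0.05, bound=1.0):
    """Time-uniform intervals for the running average of conditional mean score differences.

    Assumes every delta lies in [-bound, bound]. At time t the error budget is
    alpha_t = 6 * alpha / (pi^2 * t^2); these budgets sum to alpha over all t,
    so a union bound over Azuma-Hoeffding inequalities gives coverage that
    holds simultaneously at every t (and hence at any stopping time).
    This is an illustrative simplification, not the paper's method.
    """
    intervals = []
    running_sum = 0.0
    for t, delta in enumerate(deltas, start=1):
        running_sum += delta
        mean = running_sum / t
        alpha_t = 6.0 * alpha / (math.pi ** 2 * t ** 2)
        radius = bound * math.sqrt(2.0 * math.log(2.0 / alpha_t) / t)
        # Clip to the range that the average difference can actually take.
        intervals.append((max(-bound, mean - radius), min(bound, mean + radius)))
    return intervals


if __name__ == "__main__":
    # Toy data (hypothetical): forecaster P always says 0.9, forecaster Q always
    # says 0.5, and the event occurs in every round.
    outcomes = [1] * 2000
    deltas = [brier_score(0.9, y) - brier_score(0.5, y) for y in outcomes]
    intervals = hoeffding_confidence_sequence(deltas, alpha=0.05)
    for t in (10, 100, 1000, 2000):
        lower, upper = intervals[t - 1]
        print(f"t={t:5d}  CS=({lower:+.3f}, {upper:+.3f})")
```

Because the intervals are valid at all times simultaneously, one may stop as soon as the lower endpoint rises above zero and conclude that the first forecaster has the higher average score so far, without the usual penalty for peeking. A variance-adaptive bound of the kind the abstract mentions would typically give narrower intervals when the score differences have low variance.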


research
03/15/2021

Valid sequential inference on probability forecast performance

Probability forecasts for binary events play a central role in many appl...
research
09/24/2021

Sequentially valid tests for forecast calibration

Forecasting and forecast evaluation are inherently sequential tasks. Pre...
research
01/23/2023

Huber-Robust Confidence Sequences

Confidence sequences are confidence intervals that can be sequentially t...
research
05/25/2021

Ranking earthquake forecasts using proper scoring rules: Binary events in a low probability environment

Operational earthquake forecasting for risk management and communication...
research
02/27/2023

Design-Based Inference for Multi-arm Bandits

Multi-arm bandits are gaining popularity as they enable real-world seque...
research
10/04/2022

Game-theoretic statistics and safe anytime-valid inference

Safe anytime-valid inference (SAVI) provides measures of statistical evi...
research
05/26/2023

Angular Combining of Forecasts of Probability Distributions

When multiple forecasts are available for a probability distribution, fo...
