Conformal prediction for exponential families and generalized linear models

05/09/2019
by Daniel J. Eck, et al.

Conformal prediction methods construct prediction regions for iid data that are valid in finite samples. Distribution-free conformal prediction methods have been proposed for regression. Generalized linear models (GLMs) are a widely used class of regression models, and researchers often seek predictions from fitted GLMs. We provide a parametric conformal prediction region for GLMs that possesses finite sample validity and is asymptotically of minimal length when the model is correctly specified. This parametric conformal prediction region is asymptotically minimal at the √(log(n)/n) rate when the dimension d of the predictor is one or two, and converges at the O{(log(n)/n)^{1/d}} rate when d > 2. We develop a novel concentration inequality for maximum likelihood estimation in exponential families that induces these convergence rates. We analyze prediction region coverage properties, large-sample efficiency, and robustness properties of four methods for constructing conformal prediction intervals for GLMs: fully nonparametric kernel-based conformal, residual-based conformal, normalized residual-based conformal, and parametric conformal, which uses the assumed GLM density as a conformity measure. Extensive simulations compare these approaches to standard asymptotic prediction regions. The utility of the parametric conformal prediction region is demonstrated in an application to interval prediction of glycosylated hemoglobin levels, a blood measurement used to diagnose diabetes.
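The parametric conformal idea, using the fitted GLM density as the conformity measure, can be sketched for the simplest case. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Poisson GLM with a single predictor and the canonical log link, fits it by maximum likelihood with Newton-Raphson, and runs a full conformal procedure over a finite grid of candidate counts 0..y_max. The function names (fit_poisson_glm, poisson_pmf, parametric_conformal_region), the toy data, and the grid bound are hypothetical choices made for illustration.

```python
import math


def fit_poisson_glm(xs, ys, iters=50, tol=1e-10):
    """Hypothetical helper: ML fit of log(mu) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = math.log(max(sum(ys) / len(ys), 1e-8)), 0.0
    for _ in range(iters):
        # Score vector (g) and Fisher information (h) under the canonical log link
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu
            g1 += (y - mu) * x
            h00 += mu
            h01 += mu * x
            h11 += mu * x * x
        # Solve the 2x2 Newton system h @ step = g in closed form
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def poisson_pmf(y, mu):
    """Poisson density f(y | mu), used here as the conformity score."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def parametric_conformal_region(xs, ys, x_new, alpha=0.1, y_max=50):
    """Hypothetical helper: full conformal set for Y at x_new over the grid 0..y_max."""
    n = len(xs)
    region = []
    for y_cand in range(y_max + 1):
        # Augment the data with the candidate (x_new, y_cand) and refit the GLM
        xa, ya = xs + [x_new], ys + [y_cand]
        b0, b1 = fit_poisson_glm(xa, ya)
        # Conformity score of each point: its fitted GLM density
        scores = [poisson_pmf(y, math.exp(b0 + b1 * x)) for x, y in zip(xa, ya)]
        # Conformal p-value: share of scores no larger than the candidate's own
        p_value = sum(s <= scores[-1] for s in scores) / (n + 1)
        if p_value > alpha:
            region.append(y_cand)
    return region


# Deterministic toy data (hypothetical): counts growing roughly like exp(0.5 + 0.4 x)
xs = [i / 10 for i in range(40)]
ys = [int(math.exp(0.5 + 0.4 * x)) + (0, 2, 1, 0, 3)[i % 5] for i, x in enumerate(xs)]
print(parametric_conformal_region(xs, ys, x_new=2.0, alpha=0.1, y_max=20))
```

Because the GLM is refit for every candidate value, this is full (not split) conformal prediction, which is what gives the finite-sample validity under exchangeability; the bound y_max is only an assumption needed to make the candidate set finite for count data.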
