Using leave-one-out cross-validation (LOO) in a multilevel regression and poststratification (MRP) workflow: A cautionary tale

09/05/2022
by Swen Kuh, et al.

In recent decades, multilevel regression and poststratification (MRP) has surged in popularity for population inference. However, the validity of the estimates can depend on details of the model, and there is currently little research on validation. We explore how leave-one-out cross-validation (LOO) can be used to compare Bayesian models for MRP. We investigate two approximations to LOO: Pareto smoothed importance sampling (PSIS-LOO) and a survey-weighted alternative (WTD-PSIS-LOO). Using two simulation designs, we examine how accurately these two criteria recover the correct ordering of models by how well they predict population and small-area-level estimands. Focusing first on variable selection, we find that neither PSIS-LOO nor WTD-PSIS-LOO correctly recovers the models' order for an MRP population estimand (although both criteria correctly identify the best and worst model). When considering small-area estimation, the best model differs across small areas, highlighting the complexity of MRP validation. When considering different priors, the recovered ordering of the models appears slightly more accurate at smaller area levels. These findings suggest that, while not terrible, PSIS-LOO-based ranking techniques may not be suitable for evaluating MRP as a method. We suggest this is due to the aggregation stage of MRP, where individual-level prediction errors average out. These results show that, in practice, PSIS-LOO-based model validation tools need to be used with caution and might not convey the full story when validating MRP as a method.
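The aggregation effect mentioned in the abstract can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the cell values, the population counts, and the helper names `poststratify` and `elpd_sum` are assumptions, not the authors' code. The weighted sum in `elpd_sum` is only one plausible way to form a survey-weighted total of pointwise LOO contributions and may differ from the paper's exact WTD-PSIS-LOO definition.

```python
import numpy as np


def poststratify(cell_estimates, cell_counts):
    """Population estimate as the count-weighted mean of cell-level estimates."""
    cell_estimates = np.asarray(cell_estimates, dtype=float)
    cell_counts = np.asarray(cell_counts, dtype=float)
    return float(np.sum(cell_counts * cell_estimates) / np.sum(cell_counts))


def elpd_sum(pointwise_elpd, weights=None):
    """Sum pointwise elpd contributions, optionally with survey weights.

    Assumption: weights are rescaled to sum to n so that the weighted and
    unweighted totals are on the same scale.
    """
    pointwise_elpd = np.asarray(pointwise_elpd, dtype=float)
    if weights is None:
        return float(pointwise_elpd.sum())
    weights = np.asarray(weights, dtype=float)
    weights = weights * len(weights) / weights.sum()
    return float(np.sum(weights * pointwise_elpd))


# Hypothetical truth for three poststratification cells of equal size.
truth = np.array([0.2, 0.5, 0.8])
counts = np.array([100, 100, 100])

# Model A: large cell-level errors with opposite signs.
model_a = truth + np.array([0.10, 0.00, -0.10])
# Model B: small cell-level errors that all share the same sign.
model_b = truth + np.array([0.02, 0.02, 0.02])

pop_truth = poststratify(truth, counts)
err_a = abs(poststratify(model_a, counts) - pop_truth)
err_b = abs(poststratify(model_b, counts) - pop_truth)

# Mean absolute error at the cell level favours model B,
# while the population-level error favours model A.
cell_mae_a = float(np.mean(np.abs(model_a - truth)))
cell_mae_b = float(np.mean(np.abs(model_b - truth)))
```

In this toy setup, model A has the larger cell-level errors, yet they cancel after poststratification, so a criterion that scores individual-level predictions can rank the two models differently from their accuracy on the population estimand.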


