Estimating prediction error for complex samples

11/13/2017
by Andrew Holbrook, et al.

Non-uniform random samples are commonly generated in scientific fields ranging from economics to medicine. Complex sampling designs afford researchers increased precision when estimating parameters of interest in less prevalent sub-populations. With growing interest in using complex samples to build prediction models for numerous outcomes, it is necessary to account for the sampling design that gave rise to the data in order to assess the generalized predictive utility of a proposed prediction rule. Specifically, after learning a prediction rule from a complex sample, it is of interest to estimate the rule's error rate when applied to unobserved members of the population. Efron proposed a general class of covariance-inflated prediction error estimators that assume the available training data are representative of the target population to which the prediction rule is to be applied. We extend Efron's estimator to the complex sample context by incorporating Horvitz-Thompson sampling weights and show that it is consistent for the true generalization error rate when applied to the underlying superpopulation giving rise to the training sample. The resulting Horvitz-Thompson-Efron (HTE) estimator is equivalent to dAIC, a recent extension of AIC to survey sampling data, and is more widely applicable. The proposed methodology is assessed via empirical simulations and is applied to predicting renal function using data obtained from the National Health and Nutrition Examination Survey (NHANES).
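To make the role of the sampling weights concrete, the following minimal Python sketch compares an unweighted error estimate with an inverse-probability-weighted one on a simulated population. It is not the HTE estimator itself: it illustrates only the Horvitz-Thompson weighting idea for a fixed prediction rule and omits Efron's covariance inflation term. The population, the inclusion probabilities, the squared-error loss, and the names `ht_weighted_error` and `demo` are all assumptions made for illustration.

```python
import numpy as np


def ht_weighted_error(y, y_hat, incl_prob):
    """Inverse-probability-weighted mean squared error (normalized by the weight sum)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    w = 1.0 / np.asarray(incl_prob, dtype=float)  # Horvitz-Thompson style weights
    return float(np.sum(w * (y - y_hat) ** 2) / np.sum(w))


def demo(seed=0, pop_size=200_000):
    rng = np.random.default_rng(seed)
    # Hypothetical population: a rare subgroup (about 10%) with noisier outcomes.
    rare = rng.random(pop_size) < 0.1
    x = rng.normal(size=pop_size)
    y = 2.0 * x + rng.normal(scale=np.where(rare, 3.0, 1.0))
    y_hat = 2.0 * x  # a fixed prediction rule, not fitted to the sample
    # Unequal-probability sampling that oversamples the rare subgroup.
    incl_prob = np.where(rare, 0.5, 0.05)
    sampled = rng.random(pop_size) < incl_prob
    pop_err = float(np.mean((y - y_hat) ** 2))
    naive = float(np.mean((y[sampled] - y_hat[sampled]) ** 2))
    weighted = ht_weighted_error(y[sampled], y_hat[sampled], incl_prob[sampled])
    return pop_err, naive, weighted


if __name__ == "__main__":
    pop_err, naive, weighted = demo()
    print(f"population error: {pop_err:.3f}")
    print(f"unweighted sample error: {naive:.3f}")
    print(f"weighted sample error: {weighted:.3f}")
```

In this setup the unweighted sample error is pulled toward the oversampled subgroup's error, while the weighted version tracks the population error more closely. When the rule is fitted to the same sample, an additional correction for optimism is needed, which is the part played by the covariance inflation described above.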


