Improving Survey Inference in Two-phase Designs Using Bayesian Machine Learning

06/07/2023
by Xinru Wang, et al.

The two-phase sampling design is a cost-effective sampling strategy that has been widely used in public health research. The conventional approach in this design is to create subsample-specific weights that adjust for the probability of selection and response in the second phase. However, these weights can be highly variable, which in turn results in unstable weighted analyses. Alternatively, we can use the rich data collected in the first phase of the study to improve survey inference for the second-phase sample. In this paper, we use a Bayesian tree-based multiple imputation (MI) approach to estimate population means under a two-phase survey design. We demonstrate how to incorporate complex survey design features, such as strata, clusters, and weights, into the imputation procedure. We use a simulation study to evaluate the performance of the tree-based MI approach in comparison with the alternative weighted analyses that use the subsample weights. We find that the tree-based MI method outperforms the weighting methods, with smaller bias, reduced root mean squared error, and narrower 95% confidence intervals whose coverage rates are closer to the nominal level. We illustrate the application of the proposed method by estimating the prevalence of diabetes among the United States non-institutionalized adult population using fasting blood glucose data collected on only a subsample of participants in the 2017-2018 National Health and Nutrition Examination Survey.
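As a rough illustration of how estimates from multiply imputed datasets are usually pooled, the sketch below applies Rubin's standard combining rules to a set of per-imputation point estimates and their variances. It is not taken from the paper: the function name `rubin_combine` and the input numbers are hypothetical, and the abstract does not say which combining rules or variance estimators the authors use. In a survey setting, each per-imputation variance would normally come from a design-based estimator that accounts for strata, clusters, and weights.

```python
from math import inf
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool M completed-data estimates with Rubin's combining rules.

    estimates: point estimates, one per imputed dataset.
    variances: the matching within-imputation variance estimates.
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")

    q_bar = mean(estimates)    # pooled point estimate
    w_bar = mean(variances)    # average within-imputation variance
    b = variance(estimates)    # between-imputation variance (divisor m - 1)

    # Total variance: within part plus inflated between part.
    total = w_bar + (1 + 1 / m) * b

    # Rubin's degrees of freedom for the t reference distribution.
    if b == 0:
        df = inf
    else:
        df = (m - 1) * (1 + w_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, total, df


if __name__ == "__main__":
    # Hypothetical prevalence estimates from three imputed datasets.
    est, tot, df = rubin_combine([0.10, 0.12, 0.11], [0.0004, 0.0004, 0.0004])
    print(est, tot, df)
```

The between-imputation term is what carries the uncertainty from not observing the second-phase measurement on everyone. When the imputation model predicts well from the first-phase data, that term shrinks and the pooled interval narrows.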


