Optimization of Survey Weights under a Large Number of Conflicting Constraints

01/12/2019
by Matthew R. Williams, et al.

In the analysis of survey data, sampling weights are needed for consistent estimation of population quantities. However, the original inverse probability weights from the survey sample design are typically modified to account for non-response, to increase efficiency by incorporating auxiliary population information, and to reduce the variability in estimates due to extreme weights. It is often the case that no single set of weights can be found which successfully incorporates all of these modifications because together they induce a large number of constraints and restrictions on the feasible solution space. For example, a unique combination of categorical variables may not be present in the sample data, even if the corresponding population level information is available. Additional requirements for weights to fall within specified ranges may also lead to fewer population level adjustments being incorporated. We present a framework and accompanying computational methods to address this issue of constraint achievement or selection within a restricted space that will produce revised weights with reasonable properties. By combining concepts from generalized raking, ridge and lasso regression, benchmarking of small area estimates, augmentation of state-space equations, path algorithms, and data-cloning, this framework simultaneously selects constraints and provides diagnostics suggesting why a fully constrained solution is not possible. Combinatorial operations such as brute force evaluations of all possible combinations of constraints and restrictions are avoided. We demonstrate this framework by applying alternative methods to post-stratification for the National Survey on Drug Use and Health. We also discuss strategies for scaling up to even larger data sets. Computations were performed in R and code is available from the authors.
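The conflict described above, where a population total is known for a category that has no respondents, can be illustrated with a small ridge-penalized linear calibration. This is a minimal sketch under stated assumptions and not the authors' framework or their R code: it is written in Python, it uses the chi-square distance with a quadratic penalty on each unmet control total, and it omits range restrictions on the weights. The function names `solve` and `ridge_calibrate`, the `cost` parameter, and the toy data are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_calibrate(d, X, T, cost):
    """Minimize sum_i (w_i - d_i)^2 / d_i + sum_j cost_j * (T_j - sum_i w_i x_ij)^2.

    d: design weights, X: n-by-p auxiliary matrix, T: p control totals,
    cost: p penalties (hypothetical tuning values; larger means the
    constraint is enforced more tightly).
    Returns the revised weights and the unmet gap for each constraint.
    """
    n, p = len(d), len(T)
    # Gap between each control total and its design-weighted estimate.
    gap = [T[j] - sum(d[i] * X[i][j] for i in range(n)) for j in range(p)]
    # X' D X plus the ridge term 1/cost_j on the diagonal.
    A = [[sum(d[i] * X[i][j] * X[i][k] for i in range(n))
          + (1.0 / cost[j] if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    lam = solve(A, gap)
    w = [d[i] * (1.0 + sum(X[i][j] * lam[j] for j in range(p)))
         for i in range(n)]
    resid = [T[j] - sum(w[i] * X[i][j] for i in range(n)) for j in range(p)]
    return w, resid


# Toy data: four respondents, three post-strata; the third has no respondents.
d = [10.0, 10.0, 10.0, 10.0]
X = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
T = [30.0, 20.0, 5.0]
w, resid = ridge_calibrate(d, X, T, cost=[1e6, 1e6, 1e6])
print("weights:", w)
print("unmet gap per constraint:", resid)
```

Without the ridge term, the matrix X'DX is singular here because the third column of `X` is all zeros, so exact calibration has no solution. With it, the first two totals are matched almost exactly while the third keeps its full gap, which serves as a simple diagnostic of which constraint cannot be met. Linear calibration of this kind can produce negative weights in less benign data, which is one reason bounded alternatives exist.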

research
09/02/2022

Population level information combined parameter estimation from complex survey datasets

We consider an empirical likelihood framework for inference for a statis...
research
07/06/2020

Adjusted Logistic Propensity Weighting Methods for Population Inference using Nonprobability Volunteer-Based Epidemiologic Cohorts

Many epidemiologic studies forgo probability sampling and turn to nonpro...
research
05/15/2023

Bayesian predictive inference when integrating a non-probability sample and a probability sample

We consider the problem of integrating a small probability sample (ps) a...
research
05/02/2019

A Conditional Empirical Likelihood Based Method for Model Parameter Estimation from Complex survey Datasets

We consider an empirical likelihood framework for inference for a statis...
research
04/24/2018

Estimation and inference of domain means subject to shape constraints

Population domain means are frequently expected to respect shape or orde...
