Simultaneous Edit and Imputation for Household Data with Structural Zeros

04/14/2018
by Olanrewaju Akande, et al.

Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child as older than the child's biological parent) as well as missing values. Generally, agencies prefer datasets to be free from erroneous or missing values before analyzing them or disseminating them to secondary data users. We present a model-based engine for editing and imputation of household data based on a Bayesian hierarchical model that includes (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations of errors and missing values, generates plausible datasets that satisfy all edit constraints, and can preserve multivariate relationships within and across individuals in the same household. We illustrate the approach using data from the 2012 American Community Survey.
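To make the notion of an edit constraint concrete, the following is a minimal sketch of how a single household-level edit (a biological child reported as older than the household head) might be checked. It is not the paper's model or code: the field names `rel` and `age`, the relationship labels, and the assumption that the household head is the parent are all hypothetical choices made for illustration. Records with a missing age are skipped, since the edit cannot be evaluated for them.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: each person is a dict with a relationship
# label ("head" or "biological_child") and an age that may be missing (None).
Person = Dict[str, Optional[object]]
Household = List[Person]


def violates_age_edit(household: Household) -> bool:
    """Return True if any biological child is reported as older than the head.

    Assumption for this sketch: the household head is the biological parent.
    Pairs with a missing age are skipped because the edit cannot be checked.
    """
    head_ages = [p["age"] for p in household
                 if p["rel"] == "head" and p["age"] is not None]
    child_ages = [p["age"] for p in household
                  if p["rel"] == "biological_child" and p["age"] is not None]
    return any(child > head for head in head_ages for child in child_ages)


def failed_households(households: List[Household]) -> List[int]:
    """Return the indices of households that fail the age edit."""
    return [i for i, hh in enumerate(households) if violates_age_edit(hh)]


# Illustrative data only; not drawn from the American Community Survey.
example_households: List[Household] = [
    [{"rel": "head", "age": 40}, {"rel": "biological_child", "age": 12}],
    [{"rel": "head", "age": 30}, {"rel": "biological_child", "age": 45}],
    [{"rel": "head", "age": None}, {"rel": "biological_child", "age": 9}],
]

flagged = failed_households(example_households)
print(flagged)
```

A check like this only identifies households that fail an edit. Deciding which of the reported values is wrong, and what to replace it with, is what the error-location and reporting models described in the abstract are for.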


