Multiple Imputation with Massive Data: an Application to the Panel Study of Income Dynamics

07/06/2020
by Yajuan Si, et al.

Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data in this survey are currently handled with traditional hot deck methods. We use a sequential regression/chained-equation approach, implemented in the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and we compare analyses of the resulting imputed data with results from the current hot deck approach. We describe the practical difficulties that arise in this setting and our approaches to overcoming them. We evaluate imputation quality and validity using internal diagnostics and external benchmarking data. Although MI produces some improvements over the existing hot deck approach, the gains are limited because the fraction of missing information in this application is relatively small. We demonstrate a practical implementation and expect greater gains when the fraction of missing information is large.
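The abstract attributes the limited gains to a small fraction of missing information (FMI). As a rough illustration of where that quantity comes from, the sketch below applies Rubin's standard combining rules to estimates from m completed data sets and reports the large-sample FMI approximation (1 + 1/m)B/T. It is a generic textbook sketch, not the authors' code or IVEware's implementation; the names `pool_rubin` and `PooledEstimate` are hypothetical, and within-imputation variances are assumed to be positive.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PooledEstimate:
    estimate: float  # pooled point estimate (average of the m estimates)
    within: float    # average within-imputation variance
    between: float   # between-imputation variance
    total: float     # total variance: within + (1 + 1/m) * between
    df: float        # Rubin's (1987) degrees of freedom
    fmi: float       # large-sample approximation to the fraction of missing information


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates with Rubin's rules (hypothetical helper)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    qbar = mean(estimates)
    ubar = mean(variances)
    if ubar <= 0:
        raise ValueError("within-imputation variances must be positive")
    # Between-imputation variance: spread of the estimates across imputations
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)
    total = ubar + (1 + 1 / m) * b
    # Relative increase in variance due to nonresponse
    r = (1 + 1 / m) * b / ubar
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else float("inf")
    fmi = (1 + 1 / m) * b / total
    return PooledEstimate(qbar, ubar, b, total, df, fmi)


if __name__ == "__main__":
    # Illustrative numbers only: estimates that barely differ across imputations
    res = pool_rubin([10.0, 10.2, 9.8, 10.1, 9.9], [1.0] * 5)
    print(f"estimate={res.estimate:.3f} total={res.total:.3f} fmi={res.fmi:.3f}")
```

When the imputations barely disagree, as in the illustrative input, B is small relative to the within-imputation variance, so the FMI is close to zero and the choice of imputation method has little effect on the pooled variance. This is the situation the abstract describes for the 2013 PSID wealth data.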

research
09/09/2021

Evaluation of imputation techniques with varying percentage of missing data

Missing data is a common problem which has consistently plagued statisti...
research
01/19/2017

Random Forest Missing Data Algorithms

Random forest (RF) missing data algorithms are an attractive approach fo...
research
08/21/2021

A computational study on imputation methods for missing environmental data

Data acquisition and recording in the form of databases are routine oper...
research
01/28/2019

CollaGAN : Collaborative GAN for Missing Image Data Imputation

In many applications requiring multiple inputs to obtain a desired outpu...
research
01/24/2022

Imputing Missing Values in the Occupational Requirements Survey

The U.S. Bureau of Labor Statistics allows public access to much of the ...
research
01/12/2018

Multiple Imputation: A Review of Practical and Theoretical Findings

Multiple imputation is a straightforward method for handling missing dat...
research
07/12/2021

Choosing Imputation Models

Imputing missing values is an important preprocessing step in data analy...
