Multiple Imputation with Massive Data: an Application to the Panel Study of Income Dynamics

07/06/2020
by Yajuan Si, et al.

Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data for this survey are currently handled using traditional hot deck methods. We use a sequential regression (chained-equation) approach, implemented in the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and we compare analyses of the resulting imputed data with results from the current hot deck approach. We describe the practical difficulties that arise in this setting and our approaches to overcoming them. We evaluate the quality and validity of the imputations with internal diagnostics and external benchmarking data. Although MI produces some improvements over the existing hot deck approach, the gains are limited because the fraction of missing information in this application is relatively small. We demonstrate the practical implementation and expect greater gains when the fraction of missing information is large.
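
To make the role of the fraction of missing information concrete, the following is a minimal sketch of the standard combining rules for MI (Rubin's rules), not the authors' code and not IVEware's interface. The function name `pool_rubin` and the input numbers are hypothetical, and the quantity reported as `lam` is the large-sample approximation to the fraction of missing information, without the degrees-of-freedom adjustment.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    (hypothetical helper for illustration only)
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching inputs")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance
    # Share of total variance due to missing data; approximates the
    # fraction of missing information when m is large (assumption).
    lam = (1 + 1 / m) * b / t
    return {"estimate": q_bar, "within": u_bar, "between": b,
            "total": t, "lam": lam}


# Hypothetical inputs: three imputations of the same quantity.
pooled = pool_rubin([10.0, 12.0, 14.0], [4.0, 4.0, 4.0])
print(pooled)
```

When the between-imputation variance is small relative to the total, `lam` is close to zero and MI can offer little improvement over a single-imputation method such as the hot deck, which is consistent with the limited gains reported here.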
