Influence of parallel computing strategies of iterative imputation of missing data: a case study on missForest

04/23/2020
by Shangzhi Hong, et al.

Machine learning iterative imputation methods are widely accepted by researchers for imputing missing data, but they can be time-consuming on large datasets. Parallel computing strategies have been proposed to overcome this drawback, but their impact on imputation results and subsequent statistical analyses is relatively unknown. This study examines the two parallel strategies (variable-wise distributed computation and model-wise distributed computation) implemented in the random forest imputation method missForest. Results from the simulation experiments showed that the two strategies can influence both the imputation process and the final imputation results differently. Specifically, even though both strategies produced similar normalized root mean squared prediction errors, the variable-wise distributed strategy led to additional biases when estimating the mean and inter-correlation of the covariates and their regression coefficients.
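To illustrate why the choice of parallel strategy could change the imputation process itself, the following minimal Python sketch contrasts two update schemes within one imputation iteration. It rests on an assumption not stated in the abstract: that variable-wise distribution imputes all variables concurrently from the same data snapshot, whereas model-wise distribution keeps the sequential variable order and parallelises only the model building. The function `fit_predict` is a hypothetical stand-in for a random forest (it simply averages the other columns), and all names here are illustrative rather than part of missForest.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_predict(columns, target, missing_rows):
    """Hypothetical stand-in for a random forest (assumption, not missForest):
    predicts each missing cell of column `target` as the mean of the other
    columns in the same row."""
    others = [col for j, col in enumerate(columns) if j != target]
    return {i: mean(col[i] for col in others) for i in missing_rows}


def sequential_pass(columns, missing):
    """One iteration with sequential variable updates: each variable is
    imputed using the values already refreshed earlier in the same pass.
    Assumed here to mirror the model-wise strategy, where only the model
    fitting would be parallelised."""
    cols = [list(col) for col in columns]
    for j, rows in missing.items():
        for i, value in fit_predict(cols, j, rows).items():
            cols[j][i] = value
    return cols


def variable_wise_pass(columns, missing):
    """One iteration with concurrent variable updates: every variable is
    imputed from the same snapshot taken at the start of the pass, so no
    variable sees another variable's refreshed values."""
    snapshot = [list(col) for col in columns]
    with ThreadPoolExecutor() as pool:
        futures = {
            j: pool.submit(fit_predict, snapshot, j, rows)
            for j, rows in missing.items()
        }
    cols = [list(col) for col in columns]
    for j, future in futures.items():
        for i, value in future.result().items():
            cols[j][i] = value
    return cols


if __name__ == "__main__":
    # Three variables, two rows; row 0 of variables 0 and 1 is missing and
    # starts from an initial guess of 0.0.
    data = [[0.0, 1.0], [0.0, 2.0], [6.0, 3.0]]
    missing_cells = {0: [0], 1: [0]}
    print("sequential:   ", sequential_pass(data, missing_cells))
    print("variable-wise:", variable_wise_pass(data, missing_cells))
```

In this toy setting the two passes yield different imputed values for the second variable, because only the sequential pass lets it use the freshly imputed first variable. This is meant only as an intuition for how the update order can differ, not as a reproduction of the study's simulations.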


