Who wins the Miss Contest for Imputation Methods? Our Vote for Miss BooPF

by Burim Ramosaj, et al.
Universität Ulm

Missing data are an expected issue when large amounts of data are collected, and several imputation techniques have been proposed to tackle this problem. Besides classical approaches such as MICE, the application of Machine Learning techniques is tempting. Here, the recently proposed missForest imputation method has shown high imputation accuracy under the Missing (Completely) at Random scheme with various missing rates. At its core, it is based on a random forest for classification and regression, respectively. In this paper we study whether this approach can be improved further by other methods such as the stochastic gradient tree boosting method, the C5.0 algorithm or modified random forest procedures. In particular, other resampling strategies within the random forest protocol are suggested. In an extensive simulation study, we analyze their performance for continuous, categorical and mixed-type data. Therein, MissBooPF, a combination of the stochastic gradient tree boosting method with the parametrically bootstrapped random forest method, appeared promising. Finally, an empirical analysis focusing on credit information and Facebook data is conducted.
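To make the evaluation setup concrete, the following is a minimal sketch of how a Missing Completely at Random scenario with a chosen missing rate might be simulated and scored for continuous data. It is not the authors' code: the function names (`make_mcar`, `mean_impute`, `rmse_on_missing`) are hypothetical, column-mean imputation is only a stand-in for any of the imputers discussed above, and the root mean squared error over the removed cells is an assumed accuracy measure, not necessarily the one used in the paper.

```python
import math
import random


def make_mcar(rows, missing_rate, seed=0):
    """Return a copy of `rows` in which every cell is replaced by None
    independently with probability `missing_rate` (MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_rate else value for value in row]
            for row in rows]


def mean_impute(rows):
    """Placeholder imputer (assumption): fill gaps with the observed column mean.
    A column with no observed values is filled with 0.0."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if value is None else value for j, value in enumerate(row)]
            for row in rows]


def rmse_on_missing(truth, masked, imputed):
    """Root mean squared error computed only over the cells that were removed."""
    squared_errors = [
        (t - i) ** 2
        for truth_row, masked_row, imputed_row in zip(truth, masked, imputed)
        for t, m, i in zip(truth_row, masked_row, imputed_row)
        if m is None
    ]
    if not squared_errors:
        return 0.0
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

In a simulation study of this kind, one would typically repeat the masking over several seeds and missing rates, swap `mean_impute` for each competing imputer, and compare the resulting errors.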

