Bootstrapping and Multiple Imputation Ensemble Approaches for Missing Data

02/01/2018
by Shehroz S. Khan, et al.

The presence of missing values in a dataset can adversely affect the performance of a classifier, and performance deteriorates rapidly as missingness increases. Single imputation and Multiple Imputation (MI) are normally performed to fill in the missing values. In this paper, we present several variants of combining MI and bootstrapping to create ensembles that can model uncertainty and diversity in the data and that are robust to high missingness. We present three ensemble strategies: bootstrapping on incomplete data followed by single imputation, bootstrapping on incomplete data followed by MI, and an MI ensemble without bootstrapping. We use mean imputation, Gaussian random imputation, and expectation maximization as the base imputation methods in these ensemble strategies. We perform an extensive evaluation of the proposed ensemble strategies on 8 datasets by varying the missingness ratio. Our results show that bootstrapping followed by the average of MIs using expectation maximization is the most robust method and prevents the classifier's performance from degrading, even at a high missingness ratio (30%). [...] perform equivalently but better than their single imputation counterparts. Kappa-error plots suggest that accurate classifiers with reasonable diversity are the reason for this behaviour. A consistent observation across all the datasets suggests that for small missingness (up to 10%) [...] data without any imputation produces equivalent results to other ensemble methods with imputations.
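The strategy of bootstrapping on incomplete data followed by an average of multiple imputations can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses per-column Gaussian random imputation (rather than expectation maximization), a nearest-centroid classifier as a stand-in base learner (the abstract does not name the classifier), and majority voting to combine members. All function names and parameters (`gaussian_random_impute`, `bootstrap_mi_ensemble`, `n_members`, `n_imputations`) are hypothetical.

```python
import numpy as np

def gaussian_random_impute(X, rng):
    """Fill each NaN with a draw from N(mean, std) of that column's observed values."""
    X = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]                      # view into the copy
        missing = np.isnan(col)
        if not missing.any():
            continue
        observed = col[~missing]
        # Fallback values for a fully missing column are an arbitrary assumption.
        mu = observed.mean() if observed.size else 0.0
        sigma = observed.std() if observed.size else 1.0
        col[missing] = rng.normal(mu, sigma, size=int(missing.sum()))
    return X

def fit_centroids(X, y):
    """Stand-in base classifier: one mean vector per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_centroids(model, X):
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def bootstrap_mi_ensemble(X, y, n_members=15, n_imputations=5, seed=0):
    """Bootstrap the incomplete data, impute each sample several times,
    average the imputations, and train one classifier per bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)   # sample rows with replacement
        Xb, yb = X[idx], y[idx]
        imputations = [gaussian_random_impute(Xb, rng) for _ in range(n_imputations)]
        X_avg = np.mean(imputations, axis=0)   # average of the multiple imputations
        members.append(fit_centroids(X_avg, yb))
    return members

def predict_ensemble(members, X_complete):
    """Combine member predictions by majority vote (assumes complete test rows)."""
    votes = np.stack([predict_centroids(m, X_complete) for m in members])
    result = []
    for sample_votes in votes.T:
        values, counts = np.unique(sample_votes, return_counts=True)
        result.append(values[counts.argmax()])
    return np.array(result)
```

Calling `bootstrap_mi_ensemble(X, y)` on a feature matrix `X` that encodes missing entries as `NaN` returns a list of fitted members, which `predict_ensemble` then applies to complete test rows. Dropping the bootstrap step (using `X` directly for every member) would give a sketch of the MI ensemble without bootstrapping, and setting `n_imputations=1` corresponds to bootstrapping followed by single imputation.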

