
On the Relation between Prediction and Imputation Accuracy under Missing Covariates

by Burim Ramosaj, et al.

Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research shows an increasing trend towards using modern Machine Learning algorithms for imputation, motivated by their favourable prediction accuracy in different learning problems. In this work, we analyze through simulation the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when Machine Learning based methods are used for both imputation and prediction. In addition, we explore imputation performance with respect to statistical inference procedures in prediction settings, such as coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets provided by the UCI Machine Learning Repository and an extensive simulation study.
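To make the two quantities in the abstract concrete, the following is a minimal sketch of how imputation accuracy and prediction accuracy can be measured side by side in a toy simulation. It is not the authors' setup: the data-generating model, the 30% missing-completely-at-random rate, the function names (simple_ols, run_experiment) and the use of mean and linear-regression imputation in place of Machine Learning imputers are all assumptions made for illustration.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares and return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def run_experiment(n=500, miss_rate=0.3, seed=0):
    """Compare two imputers on imputation RMSE and downstream prediction MSE.

    Hypothetical toy design: x1 is always observed, x2 is correlated
    with x1 and partly missing, and y depends on x2 only.
    """
    rng = random.Random(seed)

    def draw(m):
        x1 = [rng.gauss(0, 1) for _ in range(m)]
        x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]
        y = [2.0 * b + rng.gauss(0, 1) for b in x2]
        return x1, x2, y

    x1, x2, y = draw(n)            # training data (x2 will get holes)
    _, x2_test, y_test = draw(n)   # complete test data

    # Delete x2 completely at random (assumed mechanism).
    missing = [rng.random() < miss_rate for _ in range(n)]
    obs = [i for i in range(n) if not missing[i]]
    mis = [i for i in range(n) if missing[i]]

    # Imputer 1: mean of the observed x2 values.
    mean_x2 = sum(x2[i] for i in obs) / len(obs)
    # Imputer 2: regression of x2 on x1, fitted on observed cases.
    a, b = simple_ols([x1[i] for i in obs], [x2[i] for i in obs])
    imputed = {
        "mean": [mean_x2 if missing[i] else x2[i] for i in range(n)],
        "regression": [a + b * x1[i] if missing[i] else x2[i] for i in range(n)],
    }

    results = {}
    for name, x2_imp in imputed.items():
        # Imputation accuracy: RMSE on the entries that were deleted.
        imp_rmse = math.sqrt(sum((x2_imp[i] - x2[i]) ** 2 for i in mis) / len(mis))
        # Prediction accuracy: fit y ~ x2 on imputed data, score on test data.
        c, d = simple_ols(x2_imp, y)
        pred_mse = sum((c + d * xt - yt) ** 2 for xt, yt in zip(x2_test, y_test)) / n
        results[name] = {"imputation_rmse": imp_rmse, "prediction_mse": pred_mse}
    return results


if __name__ == "__main__":
    for method, scores in run_experiment().items():
        print(method, scores)
```

Reporting both metrics per imputer, rather than imputation error alone, is the point of the sketch: it lets one check whether a lower imputation error actually carries over to a lower prediction error in a given setting.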




Evolving imputation strategies for missing data in classification problems with TPOT

Missing data has a ubiquitous presence in real-life applications of mach...

A cautionary tale on using imputation methods for inference in matched pairs design

Imputation procedures in biomedical fields have turned into statistical ...

Bootstrap Inference for Multiple Imputation under Uncongeniality and Misspecification

Multiple imputation has become one of the most popular approaches for ha...

Multiple-level Point Embedding for Solving Human Trajectory Imputation with Prediction

Sparsity is a common issue in many trajectory datasets, including human ...

Influence of parallel computing strategies of iterative imputation of missing data: a case study on missForest

Machine learning iterative imputation methods have been well accepted by...

Efficient surrogate-assisted inference for patient-reported outcome measures with complex missing mechanism

Patient-reported outcome (PRO) measures are increasingly collected as a ...

Imputation procedures in surveys using nonparametric and machine learning methods: an empirical comparison

Nonparametric and machine learning methods are flexible methods for obta...