A cautionary tale on using imputation methods for inference in matched pairs design

06/18/2018
by Burim Ramosaj, et al.

Imputation procedures have become standard statistical practice in biomedical fields, since they allow further analyses to be conducted as if no values had been missing. In particular, non-parametric imputation schemes such as the random forest, or its combination with stochastic gradient boosting, have shown favorable imputation performance compared with the more traditionally used MICE procedure. However, their effect on valid statistical inference has not been analyzed so far. This paper closes this gap by investigating their validity for inferring mean differences in incompletely observed pairs, and contrasts them with a recent approach that works only with the observations at hand. Our findings indicate that machine learning schemes for (multiply) imputing missing values heavily inflate the type-I error rate in small to moderate samples of matched pairs, even after the test statistics are modified using Rubin's multiple imputation rule. In addition to an extensive simulation study, an illustrative data example from a breast cancer gene study is considered.
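
The abstract mentions modifying the test statistics with Rubin's multiple imputation rule. As a minimal sketch of what that pooling step usually looks like for a paired mean difference, the Python below combines per-imputation estimates with Rubin's rules as commonly stated: pooled estimate, average within-imputation variance, between-imputation variance, total variance and Rubin's degrees of freedom. The function names `paired_mean_diff` and `rubin_pool` and the toy data are hypothetical, and this is not the authors' code. The imputation step itself (random forest, boosting or MICE) is not shown; the sketch assumes it has already produced M completed datasets.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_mean_diff(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Mean of the within-pair differences and its squared standard error.

    Assumes x and y hold the two (already completed) measurements of the same units.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    d = [a - b for a, b in zip(x, y)]
    mean_d = statistics.fmean(d)
    # Sample variance (divisor n - 1) divided by n is the squared standard error.
    se2 = statistics.variance(d) / len(d)
    return mean_d, se2


def rubin_pool(estimates: Sequence[float], variances: Sequence[float]):
    """Pool M estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance, t statistic for H0: mean diff = 0,
    Rubin's degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)    # pooled point estimate
    w_bar = statistics.fmean(variances)    # average within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b        # total variance
    if b == 0:
        df = math.inf                      # imputations agree: no extra uncertainty
    else:
        r = (1 + 1 / m) * b / w_bar        # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2
    t_stat = q_bar / math.sqrt(total)
    return q_bar, total, t_stat, df


if __name__ == "__main__":
    # Hypothetical toy data: the third x value was missing and imputed three ways.
    imputed = [
        ([3.0, 5.0, 7.0], [1.0, 2.0, 3.0]),
        ([3.0, 5.0, 6.0], [1.0, 2.0, 3.0]),
        ([3.0, 5.0, 8.0], [1.0, 2.0, 3.0]),
    ]
    per_imputation = [paired_mean_diff(x, y) for x, y in imputed]
    ests, vars_ = zip(*per_imputation)
    print(rubin_pool(ests, vars_))
```

A two-sided p-value can then be read from a t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t_stat), df)`. The sketch shows only the mechanics of the pooled statistic; according to the findings summarized above, this correction alone does not keep the type-I error rate at its nominal level when machine learning imputations are used in small to moderate samples of matched pairs.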

Related research

Who wins the Miss Contest for Imputation Methods? Our Vote for Miss BooPF (11/30/2017)
Missing data is an expected issue when large amounts of data is collecte...

Asymptotic based bootstrap approach for matched pairs with missingness in a single-arm (12/10/2019)
The issue of missing values is an arising difficulty when dealing with p...

On the Relation between Prediction and Imputation Accuracy under Missing Covariates (12/09/2021)
Missing covariates in regression or classification problems can prohibit...

Imputation techniques on missing values in breast cancer treatment and fertility data (11/16/2020)
Clinical decision support using data mining techniques offers more intel...

Multiple imputation using chained random forests: a preliminary study based on the empirical distribution of out-of-bag prediction errors (04/30/2020)
Missing data are common in data analyses in biomedical fields, and imput...

Multiple imputation and test-wise deletion for causal discovery with incomplete cohort data (08/30/2021)
Causal discovery algorithms estimate causal graphs from observational da...

Multiplication-Combination Tests for Incomplete Paired Data (01/26/2018)
We consider statistical procedures for hypothesis testing of real valued...
