Multiple imputation and test-wise deletion for causal discovery with incomplete cohort data

by Janine Witte, et al.

Causal discovery algorithms estimate causal graphs from observational data. Such graphs can provide a valuable complement to analyses focussing on the causal relation between individual treatment-outcome pairs. Constraint-based causal discovery algorithms rely on conditional independence testing when building the graph. Until recently, these algorithms have been unable to handle missing values. In this paper, we investigate two alternative solutions: test-wise deletion and multiple imputation. We establish necessary and sufficient conditions for the recoverability of causal structures under test-wise deletion, and argue that multiple imputation is more challenging in the context of causal discovery than for estimation. We conduct an extensive comparison by simulating from benchmark causal graphs: as one might expect, we find that test-wise deletion and multiple imputation both clearly outperform list-wise deletion and single imputation. Crucially, our results further suggest that multiple imputation is especially useful in settings with a small number of either Gaussian or discrete variables, but when the dataset contains a mix of both, neither method is uniformly best. The methods we compare include random forest imputation and a hybrid procedure combining test-wise deletion and multiple imputation. An application to data from the IDEFICS cohort study on diet- and lifestyle-related diseases in European children serves as an illustrative example.




