Fast Causal Inference with Non-Random Missingness by Test-Wise Deletion

05/25/2017
by   Eric V. Strobl, et al.

Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, that is, they delete every sample with any missing value, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but it also eliminates otherwise good samples that contain only a few missing values. In this report, we show that test-wise deletion uses the observed values more efficiently while still maintaining algorithmic soundness. Here, test-wise deletion refers to list-wise deleting samples only among the variables required for each conditional independence (CI) test used in constraint-based searches. Test-wise deletion therefore often retains more samples than list-wise deletion for each CI test, especially when the underlying graph is sparse. Our theoretical results show that test-wise deletion is sound under the justifiable assumption that none of the missingness mechanisms causally affect each other in the underlying causal graph. We also find that FCI and RFCI with test-wise deletion outperform their list-wise deletion and imputation counterparts on average when MNAR holds, in both synthetic and real data.


