Identifying Patient-Specific Root Causes of Disease

by Eric V. Strobl, et al.

Complex diseases are caused by a multitude of factors that may differ between patients. As a result, hypothesis tests comparing all patients to all healthy controls can detect many significant variables with inconsequential effect sizes. A few highly predictive root causes may nevertheless generate disease within each patient. In this paper, we define patient-specific root causes as variables subject to exogenous "shocks" which go on to perturb an otherwise healthy system and induce disease. In other words, the variables are associated with the exogenous errors of a structural equation model (SEM), and these errors predict a downstream diagnostic label. We quantify predictivity using sample-specific Shapley values. This derivation allows us to develop a fast algorithm called Root Causal Inference for identifying patient-specific root causes by extracting the error terms of a linear SEM and then computing the Shapley value associated with each error. Experiments highlight considerable improvements in accuracy because the method uncovers root causes that may have large effect sizes at the individual level but clinically insignificant effect sizes at the group level. An R implementation is available at
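The two steps named in the abstract (extract the error terms of a linear SEM, then compute a Shapley value for each error) can be illustrated with a minimal Python sketch. It is not the authors' R implementation. The coefficient matrix `B`, the log-odds weights `GAMMA`, the zero error means, and the example patient are all hypothetical values chosen for illustration. The sketch also assumes the SEM coefficients are already known and that the errors are mutually independent, whereas a real analysis would have to estimate these quantities from data.

```python
from itertools import permutations

# Hypothetical linear SEM over three variables in causal order X0 -> X1 -> X2:
#   X_j = sum_k B[j][k] * X_k + E_j, with B strictly lower triangular.
B = [
    [0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, 0.5, 0.0],
]
# Hypothetical weights of a linear model for the log-odds of diagnosis
# as a function of the error terms (assumed, not estimated here).
GAMMA = [0.2, 1.5, 0.1]
# Assumed population means of the error terms.
ERROR_MEANS = [0.0, 0.0, 0.0]


def extract_errors(x):
    """Recover the error terms of one patient: E = X - B X."""
    n = len(x)
    return [x[j] - sum(B[j][k] * x[k] for k in range(n)) for j in range(n)]


def value(subset, errors):
    """Expected log-odds when only the errors in `subset` are known.

    Under independent errors and a linear log-odds model, unknown errors
    are simply replaced by their means.
    """
    return sum(
        GAMMA[i] * (errors[i] if i in subset else ERROR_MEANS[i])
        for i in range(len(errors))
    )


def shapley_values(errors):
    """Exact Shapley values by averaging marginal gains over all orderings."""
    n = len(errors)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        known = set()
        for i in order:
            before = value(known, errors)
            known.add(i)
            phi[i] += value(known, errors) - before
    return [p / len(orderings) for p in phi]


def root_causes(x):
    """Return errors, Shapley values, and variables ranked by Shapley value."""
    errors = extract_errors(x)
    phi = shapley_values(errors)
    ranking = sorted(range(len(phi)), key=lambda i: phi[i], reverse=True)
    return errors, phi, ranking


# Hypothetical patient whose shock enters at X1 and propagates to X2.
patient = [0.1, 2.08, 1.14]
errors, phi, ranking = root_causes(patient)
print("errors:", errors)
print("Shapley values:", phi)
print("ranking (most to least predictive):", ranking)
```

In this toy patient, X2 has a large observed value only because it inherits the shock from X1; its own error is small, so its Shapley value is small and X1 is ranked first. Enumerating all orderings is exponential in the number of variables and is used here only to make the definition explicit; for a linear model with independent errors each Shapley value reduces to the weight times the deviation of the error from its mean.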



Identifying Patient-Specific Root Causes with the Heteroscedastic Noise Model

Complex diseases are caused by a multitude of factors that may differ be...

Sample-Specific Root Causal Inference with Latent Variables

Root causal analysis seeks to identify the set of initial perturbations ...

Evaluating the root causes of fatigue and associated risk factors in the Brazilian regular aviation industry

This work evaluates the potential root causes of fatigue using a biomath...

Counterfactual Formulation of Patient-Specific Root Causes of Disease

Root causes of disease intuitively correspond to root vertices that incr...

Learning DAGs from Data with Few Root Causes

We present a novel perspective and algorithm for learning directed acycl...

A Neural Attention Model for Categorizing Patient Safety Events

Medical errors are leading causes of death in the US and as such, preven...

Meta-analysis of Gene Expression in Neurodegenerative Diseases Reveals Patterns in GABA Synthesis and Heat Stress Pathways

Neurodegenerative diseases are characterized as the progressive loss of ...
