Harmonization with Flow-based Causal Inference

by Rongguang Wang, et al.

Heterogeneity in medical data, for example in data collected at different sites and with different protocols in a clinical study, is a fundamental hurdle for accurate prediction with machine learning models, which often fail to generalize across such differences. This paper presents a normalizing-flow-based method that performs counterfactual inference on a structural causal model (SCM) to harmonize such data. We formulate a causal model in which the observed effects (brain magnetic resonance imaging data) result from known confounders (site, gender and age) and exogenous noise variables. Our method exploits the bijection induced by the flow for harmonization: we can infer the posterior of the exogenous variables, intervene on observations, and draw samples from the resulting SCM to obtain counterfactuals. Evaluated on multiple large, real-world medical datasets, the method achieves better cross-domain generalization than state-of-the-art algorithms. Further experiments use regression and classification tasks to evaluate the quality of the confounder-independent data generated by our model.
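The three steps named in the abstract (infer the exogenous noise, intervene, then sample from the modified model) can be illustrated with a deliberately small sketch. This is not the paper's model: the class `ConditionalAffineFlow`, the function `harmonize`, and the per-site linear fit are hypothetical stand-ins, assuming a single scalar feature, integer site labels, and age as the only other confounder. A closed-form affine bijection replaces the learned normalizing flow so that the inversion is easy to follow.

```python
import numpy as np


class ConditionalAffineFlow:
    """Toy bijection x = shift(site, age) + scale(site) * eps.

    Hypothetical stand-in for a learned conditional flow: the shift is a
    per-site linear function of age and the scale is a per-site constant.
    """

    def fit(self, x, site, age):
        self.params = {}
        for s in np.unique(site):
            mask = site == s
            # Least-squares line x ~ a + b * age within this site.
            design = np.column_stack([np.ones(mask.sum()), age[mask]])
            coef, *_ = np.linalg.lstsq(design, x[mask], rcond=None)
            resid = x[mask] - design @ coef
            # Small constant keeps the scale strictly positive (invertible).
            self.params[int(s)] = (coef[0], coef[1], resid.std() + 1e-8)
        return self

    def _shift_scale(self, site, age):
        a = np.array([self.params[int(s)][0] for s in site])
        b = np.array([self.params[int(s)][1] for s in site])
        sd = np.array([self.params[int(s)][2] for s in site])
        return a + b * age, sd

    def inverse(self, x, site, age):
        """Map an observation to its exogenous noise given the confounders."""
        shift, scale = self._shift_scale(site, age)
        return (x - shift) / scale

    def forward(self, eps, site, age):
        """Map exogenous noise back to an observation given the confounders."""
        shift, scale = self._shift_scale(site, age)
        return shift + scale * eps


def harmonize(flow, x, site, age, target_site):
    """Counterfactual: what would x have been had it been acquired at target_site?"""
    eps = flow.inverse(x, site, age)            # abduction: recover the noise
    cf_site = np.full_like(site, target_site)   # action: set site to the target
    return flow.forward(eps, cf_site, age)      # prediction: push the noise forward
```

Because the map is a bijection for fixed confounders, the noise recovered in the first step is unique, and samples that already belong to the target site are returned unchanged. Calling `harmonize(flow, x, site, age, target_site=0)` on data from two sites maps every sample to its site-0 counterfactual while preserving each subject's age effect and individual residual.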



