Quantifying Confounding Bias in Neuroimaging Datasets with Causal Inference

07/09/2019
by Christian Wachinger, et al.

Neuroimaging datasets keep growing in size to address increasingly complex medical questions. However, even the largest datasets available today are, on their own, too small for training complex machine learning models. A potential solution is to increase sample size by pooling scans from several datasets. In this work, we combine 12,207 MRI scans from 15 studies and show that simple pooling is often ill-advised because it introduces various types of bias into the training data. First, we systematically define these biases. Second, we detect bias by experimentally showing that scans can be correctly assigned to their respective dataset with 73.3% accuracy. Finally, we propose to tell causal from confounding factors by quantifying the extent of confounding and causality in a single dataset using causal inference. We achieve this by finding the simplest graphical model in terms of Kolmogorov complexity. As Kolmogorov complexity is not directly computable, we employ the minimum description length to approximate it. We empirically show that our approach is able to estimate plausible causal relationships from real neuroimaging data.
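The idea of choosing the simplest graphical model by description length can be illustrated with a minimal sketch. This is not the authors' method: it assumes linear Gaussian relationships, uses a crude two-part code (a BIC-style parameter cost plus a Gaussian data cost, up to a constant that depends on the discretization), and runs on synthetic data. The variable names `age` and `volume` and the helper functions are hypothetical. With this score a linear Gaussian model cannot distinguish `age -> volume` from `volume -> age`, so the sketch only compares a dependent graph against an independent one.

```python
import numpy as np

def gaussian_code_length(residuals):
    """Bits needed to encode residuals under a zero-mean Gaussian with fitted variance."""
    n = residuals.size
    var = max(float(np.mean(residuals ** 2)), 1e-12)  # guard against log(0)
    return 0.5 * n * np.log2(2 * np.pi * np.e * var)

def description_length(y, parents):
    """Two-part score for one node: parameter cost plus data cost given its parents."""
    n = y.size
    design = np.column_stack([np.ones(n)] + list(parents))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    k = design.shape[1] + 1  # regression coefficients plus the noise variance
    return 0.5 * k * np.log2(n) + gaussian_code_length(residuals)

def score_graph(data, graph):
    """Total description length of the data under a DAG given as {node: [parents]}."""
    return sum(
        description_length(data[node], [data[p] for p in parents])
        for node, parents in graph.items()
    )

# Synthetic example (assumption): volume depends linearly on age plus noise.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(size=n)
volume = -0.8 * age + 0.5 * rng.normal(size=n)
data = {"age": age, "volume": volume}

candidates = {
    "independent": {"age": [], "volume": []},
    "age -> volume": {"age": [], "volume": ["age"]},
}
scores = {name: score_graph(data, graph) for name, graph in candidates.items()}
best = min(scores, key=scores.get)  # graph with the shortest description
print(best, scores)
```

The graph with the smallest total score is preferred: an extra edge is kept only when the bits it saves in encoding the data outweigh the bits spent on its parameters.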


research
02/12/2020

Detect and Correct Bias in Multi-Site Neuroimaging Datasets

The desire to train complex machine learning algorithms and to increase ...
research
04/28/2018

Detect, Quantify, and Incorporate Dataset Bias: A Neuroimaging Analysis on 12,207 Individuals

Neuroimaging datasets keep growing in size to address increasingly compl...
research
08/22/2023

Does Misclassifying Non-confounding Covariates as Confounders Affect the Causal Inference within the Potential Outcomes Framework?

The Potential Outcome Framework (POF) plays a prominent role in the fiel...
research
08/27/2021

Pulling Up by the Causal Bootstraps: Causal Data Augmentation for Pre-training Debiasing

Machine learning models achieve state-of-the-art performance on many sup...
research
06/22/2022

Causal inference in multi-cohort studies using the target trial approach

Longitudinal cohort studies have the potential to examine causal effects...
research
12/12/2019

It's easy to fool yourself: Case studies on identifying bias and confounding in bio-medical datasets

Confounding variables are a well known source of nuisance in biomedical ...
research
08/05/2017

Quantifying homologous proteins and proteoforms

Many proteoforms - arising from alternative splicing, post-translational...
