Detect, Quantify, and Incorporate Dataset Bias: A Neuroimaging Analysis on 12,207 Individuals

04/28/2018
by Christian Wachinger, et al.

Neuroimaging datasets keep growing in size to address increasingly complex medical questions. However, even the largest datasets today are, on their own, too small for training complex models or for finding genome-wide associations. A solution is to grow the sample size by merging data across several datasets. However, bias in datasets complicates this approach, because merging introduces additional sources of variation into the data. In this work, we combine 15 large neuroimaging datasets to study bias. First, we detect bias by demonstrating that scans can be correctly assigned to their dataset with 73.3% accuracy. Next, we introduce metrics to quantify the compatibility across datasets and to create embeddings of neuroimaging sites. Finally, we incorporate the presence of bias into the selection of a training set for predicting autism. For the quantification of dataset bias, we introduce two metrics: the Bhattacharyya distance between datasets and the age prediction error. The presented embedding of neuroimaging sites provides a new visualization of the similarity between sites. This could be used to guide the merging of data sources while limiting the introduction of unwanted variation. Finally, we demonstrate a clear performance increase when incorporating dataset bias into training set selection for autism prediction. Overall, we believe that the growing amount of neuroimaging data necessitates incorporating data-driven methods for quantifying dataset bias in future analyses.
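The abstract names the Bhattacharyya distance as one of the two bias metrics but does not say how it is computed. As a minimal sketch, assuming each dataset is summarized by a multivariate Gaussian fitted to its image-derived features (an assumption made here for illustration, not a detail confirmed by the abstract), the distance between two datasets could be computed as follows. The function name `bhattacharyya_gaussian` and the synthetic feature matrices are hypothetical.

```python
import numpy as np


def bhattacharyya_gaussian(x_a, x_b):
    """Bhattacharyya distance between Gaussians fitted to two feature matrices.

    x_a, x_b: arrays of shape (n_samples, n_features), one row per scan.
    Assumes (hypothetically) that each dataset is well described by a
    single multivariate Gaussian over its features.
    """
    mu_a, mu_b = x_a.mean(axis=0), x_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(x_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(x_b, rowvar=False))
    cov = 0.5 * (cov_a + cov_b)  # average covariance of the two datasets

    # First term: separation of the means, scaled by the average covariance.
    diff = mu_a - mu_b
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)

    # Second term: mismatch between the covariances, via log-determinants.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_b = np.linalg.slogdet(cov_b)
    term_cov = 0.5 * (logdet - 0.5 * (logdet_a + logdet_b))

    return float(term_mean + term_cov)


# Synthetic stand-ins for per-scan features from two sites (not real data).
rng = np.random.default_rng(0)
site_a = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
site_b = rng.normal(loc=0.5, scale=1.2, size=(500, 3))
print(bhattacharyya_gaussian(site_a, site_b))
```

Under this Gaussian assumption the distance is zero for identical distributions and grows as the means or covariances of two sites drift apart, which is the property a pairwise compatibility measure between datasets needs.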


research
07/09/2019

Quantifying Confounding Bias in Neuroimaging Datasets with Causal Inference

Neuroimaging datasets keep growing in size to address increasingly compl...
research
02/12/2020

Detect and Correct Bias in Multi-Site Neuroimaging Datasets

The desire to train complex machine learning algorithms and to increase ...
research
05/25/2023

A Robust Classifier Under Missing-Not-At-Random Sample Selection Bias

The shift between the training and testing distributions is commonly due...
research
08/24/2023

CALM : A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias

As language models (LMs) become increasingly powerful, it is important t...
research
12/19/2022

Towards Assessing Data Bias in Clinical Trials

Algorithms and technologies are essential tools that pervade all aspects...
research
10/21/2022

Men Also Do Laundry: Multi-Attribute Bias Amplification

As computer vision systems become more widely deployed, there is increas...
research
09/11/2023

Challenges in Annotating Datasets to Quantify Bias in Under-represented Society

Recent advances in artificial intelligence, including the development of...
