Efficacy of MRI data harmonization in the age of machine learning. A multicenter study across 36 datasets

11/08/2022
by Chiara Marzi, et al.

Pooling publicly available MRI data from multiple sites makes it possible to assemble large groups of subjects, increase statistical power, and promote data reuse with machine learning techniques. Harmonizing multicenter data is necessary to reduce the confounding effect of non-biological sources of variability in the data. However, when harmonization is applied to the entire dataset before machine learning, it leads to data leakage: information from outside the training set can affect model building, so performance may be overestimated. We propose 1) a measurement of the efficacy of data harmonization and 2) a harmonizer transformer, i.e., an implementation of ComBat harmonization that can be encapsulated among the preprocessing steps of a machine learning pipeline, avoiding data leakage. We tested these tools using brain T1-weighted MRI data from 1740 healthy subjects acquired at 36 sites. After harmonization, the site effect was removed or reduced. We also measured the data leakage effect in predicting individual age from MRI data, showing that introducing the harmonizer transformer into a machine learning pipeline avoids data leakage.
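The leakage-free idea in the abstract can be illustrated with a minimal sketch. This is not the authors' harmonizer transformer and it does not implement ComBat: it swaps in a much simpler per-site location and scale alignment (no empirical Bayes shrinkage, no preservation of biological covariates). The class name `SiteLocationScaleHarmonizer`, the convention that column 0 of `X` holds an integer site label, and the synthetic data are all assumptions made for illustration. The point it shows is only that the harmonization parameters are estimated in `fit` on the training subjects and then reused unchanged in `transform` on held-out subjects.

```python
import numpy as np


class SiteLocationScaleHarmonizer:
    """Hypothetical, simplified stand-in for a ComBat-style harmonizer.

    Assumption: column 0 of X is an integer site label and the remaining
    columns are features. Parameters are learned in fit() only.
    """

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        sites, feats = X[:, 0].astype(int), X[:, 1:]
        # Reference distribution taken from the training data only.
        self.grand_mean_ = feats.mean(axis=0)
        self.pooled_std_ = feats.std(axis=0) + 1e-12
        self.site_mean_, self.site_std_ = {}, {}
        for s in np.unique(sites):
            rows = feats[sites == s]
            self.site_mean_[s] = rows.mean(axis=0)
            self.site_std_[s] = rows.std(axis=0) + 1e-12
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        sites, feats = X[:, 0].astype(int), X[:, 1:]
        out = np.empty_like(feats)
        for s in np.unique(sites):
            mask = sites == s
            # Standardize with the stored training statistics of site s,
            # then map onto the training reference distribution.
            # A site never seen in fit() raises KeyError in this sketch.
            z = (feats[mask] - self.site_mean_[s]) / self.site_std_[s]
            out[mask] = z * self.pooled_std_ + self.grand_mean_
        return out


# Synthetic example: two sites, the second with an additive offset of 5.
rng = np.random.default_rng(0)
n = 200
site = rng.integers(0, 2, size=n)
features = rng.normal(size=(n, 3)) + 5.0 * site[:, None]
X = np.column_stack([site, features])

train, test = X[:150], X[150:]

# Fit on the training subjects only, then apply to both splits.
harmonizer = SiteLocationScaleHarmonizer().fit(train)
train_h = harmonizer.transform(train)
test_h = harmonizer.transform(test)
```

Because the class exposes `fit` and `transform`, the same pattern can be repeated inside each cross-validation fold, so that statistics from held-out subjects never influence the harmonization parameters. Calling `fit` on the full array `X` before splitting would reproduce the leakage the abstract warns about.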


research
12/08/2016

Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker

Machine learning analysis of neuroimaging data can accurately predict ch...
research
12/24/2022

Hybrid Representation Learning for Cognitive Diagnosis in Late-Life Depression Over 5 Years with Structural MRI

Late-life depression (LLD) is a highly prevalent mood disorder occurring...
research
07/04/2021

Survey: Leakage and Privacy at Inference Time

Leakage of data from publicly available Machine Learning (ML) models is ...
research
05/31/2022

FedHarmony: Unlearning Scanner Bias with Distributed Data

The ability to combine data across scanners and studies is vital for neu...
research
03/26/2018

Removing scanner biases using Generative Adversarial Networks

Magnetic Resonance Imaging (MRI) of the brain has been used to investiga...
research
10/01/2019

Harmonization of diffusion MRI datasets with adaptive dictionary learning

Diffusion weighted magnetic resonance imaging is a noninvasive imaging t...
research
10/17/2022

Confound-leakage: Confound Removal in Machine Learning Leads to Leakage

Machine learning (ML) approaches to data analysis are now widely adopted...
