How You Split Matters: Data Leakage and Subject Characteristics Studies in Longitudinal Brain MRI Analysis

Deep learning models have revolutionized medical image analysis and offer significant promise for improving diagnosis and patient care. However, their reported performance can be misleadingly optimistic because of a hidden pitfall known as 'data leakage'. In this study, we investigate data leakage in 3D medical imaging, specifically in 3D Convolutional Neural Networks (CNNs) for brain MRI analysis. Although 3D CNNs appear less prone to leakage than their 2D counterparts, improper data splitting during cross-validation (CV) can still cause problems, especially with longitudinal imaging data that contain repeated scans of the same subject. We examine how different data splitting strategies affect model performance in longitudinal brain MRI analysis and identify potential sources of data leakage. Grad-CAM visualizations help reveal shortcuts in the CNN models caused by identity confounding, in which a model learns to recognize individual subjects alongside diagnostic features. Our findings, consistent with prior research, underscore the importance of splitting data by subject and of further evaluating models on held-out data from different subjects, to ensure the integrity and reliability of deep learning models in medical image analysis.
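To make the contrast between splitting strategies concrete, the following is a minimal sketch, not the authors' code. It assumes each scan is described by a plain dictionary with a hypothetical `subject_id` key and compares a naive scan-wise split, which can place different visits of one subject in both training and test sets, with a subject-wise K-fold split, which keeps every scan of a subject in a single fold. The helper names (`scan_wise_split`, `subject_wise_kfold`, `leaked_subjects`) are illustrative only. In practice, scikit-learn's `GroupKFold`, given subject IDs through its `groups` argument, implements the same grouping idea.

```python
import random
from collections import defaultdict


def scan_wise_split(scans, test_fraction=0.2, seed=0):
    """Naive split: shuffles individual scans and ignores subject identity,
    so repeated scans of one subject can end up on both sides."""
    rng = random.Random(seed)
    shuffled = list(scans)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def subject_wise_kfold(scans, k=5, seed=0):
    """Yield (train, test) pairs in which all scans of a subject share one fold."""
    by_subject = defaultdict(list)
    for scan in scans:
        by_subject[scan["subject_id"]].append(scan)
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    for fold in range(k):
        # Assign subjects (not scans) to folds round-robin.
        test_subjects = set(subjects[fold::k])
        train = [s for s in scans if s["subject_id"] not in test_subjects]
        test = [s for s in scans if s["subject_id"] in test_subjects]
        yield train, test


def leaked_subjects(train, test):
    """Return subject IDs that appear in both training and test data."""
    return {s["subject_id"] for s in train} & {s["subject_id"] for s in test}


if __name__ == "__main__":
    # Hypothetical longitudinal cohort: 20 subjects with 3 visits each.
    scans = [{"subject_id": f"sub-{i:02d}", "visit": v}
             for i in range(20) for v in range(3)]
    train, test = scan_wise_split(scans)
    print("scan-wise leaked subjects:", len(leaked_subjects(train, test)))
    for fold, (tr, te) in enumerate(subject_wise_kfold(scans, k=5)):
        print(f"fold {fold}: leaked subjects = {len(leaked_subjects(tr, te))}")
```

With the scan-wise split, some subjects almost always appear on both sides, so a model can score well by recognizing individuals rather than disease. The subject-wise folds report zero overlap by construction. A separate hold-out set of unseen subjects, as the abstract recommends, can be carved out the same way before running CV.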
