Everything is Varied: The Surprising Impact of Individual Variation on ML Robustness in Medicine

10/10/2022
by Andrea Campagner, et al.

In medical settings, Individual Variation (IV) refers to variation that is due not to population differences or errors but to within-subject variation, that is, the intrinsic and characteristic patterns of variation pertaining to a given instance or to the measurement process. Although accounting for IV is considered critical for the proper analysis of medical data, this source of uncertainty and its impact on robustness have so far been neglected in Machine Learning (ML). To fill this gap, we examine how IV affects ML performance and generalization and how its impact can be mitigated. Specifically, we provide a methodological contribution that formalizes the problem of IV in the statistical learning framework. Then, through an experiment based on one of the largest real-world laboratory medicine datasets for COVID-19 diagnosis, we show that: 1) common state-of-the-art ML models are severely impacted by the presence of IV in the data; and 2) advanced learning strategies, based on data augmentation and data imprecisiation, together with proper study designs, can be effective at improving robustness to IV. Our findings demonstrate the critical relevance of correctly accounting for IV to enable the safe deployment of ML in clinical settings.
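To make the idea of augmentation against IV concrete, here is a minimal sketch. It is not the authors' method, which this abstract does not specify. It assumes IV can be approximated by Gaussian noise whose standard deviation is a per-feature coefficient of variation times the measured value. The function name `augment_with_iv`, the `cv` values, and the sample rows are all hypothetical.

```python
import random


def augment_with_iv(rows, labels, cv, n_copies=3, seed=0):
    """Return the original rows plus n_copies noisy replicas of each row.

    Assumption (not from the paper): feature j of a replica is drawn from a
    Gaussian centred on the measured value x with standard deviation
    cv[j] * |x|, where cv[j] is a hypothetical coefficient of variation.
    Each replica keeps the label of the row it was generated from.
    """
    rng = random.Random(seed)
    aug_rows = [list(r) for r in rows]
    aug_labels = list(labels)
    for row, label in zip(rows, labels):
        for _ in range(n_copies):
            replica = [rng.gauss(x, abs(x) * c) for x, c in zip(row, cv)]
            aug_rows.append(replica)
            aug_labels.append(label)
    return aug_rows, aug_labels


# Illustrative values only: two samples with two laboratory-style features.
rows = [[13.5, 250.0], [11.2, 180.0]]
labels = [1, 0]
cv = [0.03, 0.10]  # hypothetical per-feature coefficients of variation

aug_rows, aug_labels = augment_with_iv(rows, labels, cv)
```

A model trained on `aug_rows` sees several plausible readings of each instance rather than a single point measurement, which is the intuition behind augmenting for robustness to IV.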


