Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging

09/27/2019 ∙ by Luke Oakden-Rayner, et al.

Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, the overall performance of a cancer detection model may be high, but the model may still consistently miss a rare but aggressive cancer subtype. We refer to this problem as hidden stratification, and observe that it results from incompletely describing the meaningful variation in a dataset. While hidden stratification can substantially reduce the clinical efficacy of machine learning models, its effects remain difficult to measure. In this work, we assess the utility of several possible techniques for measuring and describing hidden stratification effects, and characterize these effects both on multiple medical imaging datasets and via synthetic experiments on the well-characterised CIFAR-100 benchmark dataset. We find evidence that hidden stratification can occur in unidentified imaging subsets with low prevalence, low label quality, subtle distinguishing features, or spurious correlates, and that it can result in relative performance differences of over 20% on clinically important subsets. Finally, we explore the clinical implications of our findings, and suggest that evaluation of hidden stratification should be a critical component of any machine learning deployment in medical imaging.
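To make the idea of a relative performance difference concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes that each positive case carries a subset label (here the hypothetical names "common" and "rare") and compares recall on each subset with overall recall. The function names `recall_by_subset` and `relative_gap` and all of the data are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict


def recall_by_subset(y_true, y_pred, subset):
    """Return (overall_recall, {subset_name: recall}), counting positive cases only."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, name in zip(y_true, y_pred, subset):
        if truth == 1:
            totals[name] += 1
            hits[name] += int(pred == 1)
    overall = sum(hits.values()) / sum(totals.values())
    per_subset = {name: hits[name] / totals[name] for name in totals}
    return overall, per_subset


def relative_gap(overall, subset_value):
    """Relative drop of a subset metric compared with the overall metric."""
    return (overall - subset_value) / overall


# Toy data, invented for illustration: 100 positive cases,
# 90 from a common subtype and 10 from a rare subtype.
y_true = [1] * 100
subset = ["common"] * 90 + ["rare"] * 10
# The hypothetical model finds 86 of 90 common cases but only 5 of 10 rare ones.
y_pred = [1] * 86 + [0] * 4 + [1] * 5 + [0] * 5

overall, per_subset = recall_by_subset(y_true, y_pred, subset)
print(f"overall recall: {overall:.2f}")
for name, value in per_subset.items():
    print(f"{name}: recall {value:.2f}, relative gap {relative_gap(overall, value):+.2f}")
```

In this toy setting the overall recall looks strong because the common subtype dominates the count, while the rare subtype is missed half of the time. The gap only becomes visible once the subset labels exist, which is the difficulty the abstract describes: in practice the subsets are often not identified in the dataset at all.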


