Problems and shortcuts in deep learning for screening mammography

03/29/2023
by Trevor Tsue, et al.

This work reveals previously undiscovered challenges in the performance and generalizability of deep learning models. We (1) identify spurious shortcuts and evaluation issues that can inflate performance and (2) propose training and analysis methods to address them. We trained an AI model to classify cancer on a retrospective dataset of 120,112 US exams (3,467 cancers) acquired from 2008 to 2017 and 16,693 UK exams (5,655 cancers) acquired from 2011 to 2015. We evaluated on a screening mammography test set of 11,593 US exams (102 cancers; 7,594 women; age 57.1 ± 11.0) and 1,880 UK exams (590 cancers; 1,745 women; age 63.3 ± 7.2). A model trained on images containing only view markers (no breast tissue) achieved a 0.691 AUC. The original model, trained on both datasets, achieved a 0.945 AUC on the combined US+UK test set but, paradoxically, only 0.838 and 0.892 on the US and UK datasets, respectively. Sampling cancers equally from both datasets during training mitigated this shortcut. A similar AUC paradox occurred for diagnostic and screening exams: the AUC was 0.903 when they were evaluated together but only 0.862 and 0.861, respectively, when they were evaluated separately. Removing diagnostic exams during training alleviated this bias. Finally, the model did not exhibit the AUC paradox across scanner models but still exhibited a bias toward Selenia Dimension (SD) over Hologic Selenia (HS) exams. Analysis showed that the AUC paradox occurred when a dataset attribute had values with a higher cancer prevalence (dataset bias) and the model consequently assigned a higher probability to these attribute values (model bias). Stratifying evaluation by such attributes and balancing cancer prevalence can mitigate shortcuts during evaluation. Dataset and model bias can introduce shortcuts and the AUC paradox, which are potentially pervasive issues within the healthcare AI space. Our methods can verify and mitigate shortcuts while providing a clear understanding of performance.
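The mechanism behind the AUC paradox (an attribute value with higher cancer prevalence that the model also scores higher) can be reproduced on toy data. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' code or data: the scores, the site names "A" and "B", and the helper names `auc`, `stratified_auc`, and `balanced_positive_sampler` are all hypothetical. `auc` computes ROC AUC as the Mann-Whitney statistic, `stratified_auc` computes it separately per attribute value, and `balanced_positive_sampler` is one plausible reading of "sampling cancers equally from both datasets", not necessarily the sampling scheme used in the paper.

```python
import random
from collections import defaultdict


def auc(scores, labels):
    """ROC AUC as the Mann-Whitney statistic: the fraction of
    (cancer, non-cancer) pairs in which the cancer scores higher,
    with ties counted as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_auc(scores, labels, groups):
    """AUC computed separately within each value of a dataset attribute
    (e.g. source dataset, exam type, scanner model)."""
    by_group = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        by_group[g][0].append(s)
        by_group[g][1].append(y)
    return {g: auc(s, y) for g, (s, y) in by_group.items()}


def balanced_positive_sampler(positives_by_group, n, seed=0):
    """Hypothetical training-time sampler: pick a group uniformly at random,
    then a cancer case from that group, so every group supplies roughly
    n / len(groups) cancers regardless of its size or prevalence."""
    rng = random.Random(seed)
    groups = sorted(positives_by_group)
    samples = []
    for _ in range(n):
        g = rng.choice(groups)
        samples.append((g, rng.choice(positives_by_group[g])))
    return samples


# Hypothetical toy scores. Site "A": 1 cancer in 5 exams, low scores.
# Site "B": 4 cancers in 5 exams, high scores. Within each site the model
# is no better than chance, but site B exams score higher overall
# (dataset bias + model bias).
scores = [0.1, 0.2, 0.3, 0.4, 0.25, 0.7, 0.6, 0.65, 0.75, 0.8]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
groups = ["A"] * 5 + ["B"] * 5

print(stratified_auc(scores, labels, groups))  # {'A': 0.5, 'B': 0.5}
print(auc(scores, labels))                     # 0.8 (pooled AUC is inflated)

# Indices of the cancer exams in each site, for balanced sampling.
cancers = {"A": [4], "B": [6, 7, 8, 9]}
draws = balanced_positive_sampler(cancers, 1000, seed=1)
print(sum(g == "A" for g, _ in draws), "of 1000 cancer draws come from site A")
```

The pooled AUC of 0.8 exceeds both per-site AUCs of 0.5 only because site membership separates the classes, which is why reporting per-attribute AUCs exposes the shortcut. In practice a library routine such as scikit-learn's `roc_auc_score` would replace the quadratic pairwise loop; it is written out here only to keep the sketch self-contained.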
