Individual predictions matter: Assessing the effect of data ordering in training fine-tuned CNNs for medical imaging

12/08/2019
by John R. Zech, et al.

We reproduced the results of CheXNet with fixed hyperparameters and 50 different random seeds to identify 14 findings in chest radiographs (x-rays). Because CheXNet fine-tunes a pre-trained DenseNet, the random seed affects the ordering of the batches of training data but not the initialized model weights. We found substantial variability in predictions for the same radiograph across model runs (mean ln[(maximum probability)/(minimum probability)] 2.45, coefficient of variation 0.543). This individual radiograph-level variability was not fully reflected in the variability of AUC on a large test set. Averaging predictions from 10 models reduced variability by nearly 70% (coefficient of variation from 0.543 to 0.169; t-test 15.96, p-value < 0.0001). We encourage researchers to be aware of the potential variability of CNNs and to ensemble predictions from multiple models to minimize the effect this variability may have on the care of individual patients when these models are deployed clinically.
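The two per-radiograph variability measures quoted above, ln(max/min) and the coefficient of variation, and the effect of 10-model ensembling can be sketched as follows. This is a minimal illustration with simulated probabilities, not the authors' code or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated predicted probabilities for a single radiograph from 50
# independently seeded model runs (illustrative values only).
probs = rng.beta(2.0, 5.0, size=50)

# Per-radiograph variability metrics used in the abstract:
log_range = np.log(probs.max() / probs.min())     # ln(max prob / min prob)
cv_single = probs.std(ddof=1) / probs.mean()      # coefficient of variation

# Ensembling: average the predictions of 10 models at a time,
# yielding 5 ensemble predictions from the 50 runs.
ensembled = probs.reshape(5, 10).mean(axis=1)
cv_ensemble = ensembled.std(ddof=1) / ensembled.mean()

print(f"single-model CV:      {cv_single:.3f}")
print(f"10-model-ensemble CV: {cv_ensemble:.3f}")
```

Averaging 10 independent predictions shrinks the standard deviation by roughly a factor of sqrt(10) while leaving the mean unchanged, so the ensemble's coefficient of variation drops accordingly, mirroring the reduction reported in the abstract.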


Related research

03/10/2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
The conventional recipe for maximizing model accuracy is to (1) train mu...

11/12/2021
Histograms lie about distribution shapes and Pearson's coefficient of variation lies about variability
Background and Objective: Histograms and Pearson's coefficient of variat...

03/21/2021
Understanding performance variability in standard and pipelined parallel Krylov solvers
In this work, we collect data from runs of Krylov subspace methods and p...

06/10/2019
Analyzing the Role of Model Uncertainty for Electronic Health Records
In medicine, both ethical and monetary costs of incorrect predictions ca...

12/19/2022
Dataless Knowledge Fusion by Merging Weights of Language Models
Fine-tuning pre-trained language models has become the prevalent paradig...

11/07/2019
BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance
If the same neural architecture is trained multiple times on the same da...

10/08/2022
The effect of variable labels on deep learning models trained to predict breast density
Purpose: High breast density is associated with reduced efficacy of mamm...
