CheXpert++: Approximating the CheXpert labeler for Speed, Differentiability, and Probabilistic Output

06/26/2020
by Matthew B. A. McDermott, et al.

It is often infeasible or impossible to obtain ground-truth labels for medical data. To circumvent this, one may build rule-based or other expert-knowledge-driven labelers that ingest data and yield silver labels without any ground-truth training data. One popular such labeler is CheXpert, which produces diagnostic labels for chest X-ray radiology reports. CheXpert is very useful, but it has three limitations: it is computationally slow, especially when integrated with end-to-end neural pipelines; it is non-differentiable, so it cannot be used in applications that require gradients to flow through the labeler; and it does not yield probabilistic outputs, which limits our ability to improve the quality of the silver labeler through techniques such as active learning. In this work, we solve all three of these problems with CheXpert++, a BERT-based, high-fidelity approximation to CheXpert. CheXpert++ achieves 99.81% parity with CheXpert, which means it can be reliably used as a drop-in replacement for CheXpert while being significantly faster, fully differentiable, and probabilistic in output. Error analysis also shows that CheXpert++ tends to correct errors in the CheXpert labels: when the two disagree, a clinician prefers the CheXpert++ label more often than the CheXpert label on all but one disease task. To further demonstrate the utility of these advantages, we conduct a proof-of-concept active learning study, showing that one iteration of active-learning-inspired re-training improves accuracy on an expert-labeled random subset of report sentences by approximately 8% over raw, unaltered CheXpert. These findings suggest that simple techniques in co-learning and active learning can yield high-quality labelers under minimal and controllable human labeling demands.
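
To make concrete why probabilistic output matters for active learning, here is a minimal sketch of uncertainty sampling: rank sentences by the entropy of the labeler's predicted class distribution and send the most uncertain ones to an expert. This is a generic illustration, not the re-training procedure used in the paper, which the abstract does not spell out. The function names, the four-class layout, and the probabilities are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_review(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k sentences whose predicted distributions are most uncertain."""
    ranked = sorted(predictions, key=lambda s: entropy(predictions[s]), reverse=True)
    return ranked[:k]


# Hypothetical probabilities for a single finding, assumed here to be ordered as
# (positive, negative, uncertain, no mention). The numbers are made up.
example_predictions = {
    "No focal consolidation.": [0.01, 0.97, 0.01, 0.01],
    "Possible small left effusion.": [0.40, 0.05, 0.50, 0.05],
    "Heart size is normal.": [0.02, 0.02, 0.01, 0.95],
}

# Pick the single sentence the (hypothetical) model is least sure about.
to_review = select_for_review(example_predictions, k=1)
```

A rule-based labeler that emits only hard labels gives no such ranking signal, which is the limitation the abstract points to; entropy is just one of several possible uncertainty scores.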

