Improving Medical Annotation Quality to Decrease Labeling Burden Using Stratified Noisy Cross-Validation

by Joy Hsu, et al.

As machine learning has become increasingly applied to medical imaging data, noise in training labels has emerged as an important challenge. Variability in the diagnosis of medical images is well established; in addition, variability in training and in attention to task among medical labelers may exacerbate this issue. Methods for identifying and mitigating the impact of low-quality labels have been studied, but are not well characterized in medical imaging tasks. For instance, Noisy Cross-Validation splits the training data into halves and has been shown to identify low-quality labels in computer vision tasks, but it has not been applied to medical imaging tasks specifically. In this work we introduce Stratified Noisy Cross-Validation (SNCV), an extension of Noisy Cross-Validation. SNCV can provide estimates of confidence in model predictions by assigning a quality score to each example; stratify labels to handle class imbalance; and identify likely low-quality labels to analyze the causes. We assess the performance of SNCV on diagnosis of glaucoma suspect risk from retinal fundus photographs, a clinically important yet nuanced labeling task. Using training data from a previously published deep learning model, we compute a continuous quality score (QS) for each training example. We relabel 1,277 low-QS examples using a trained glaucoma specialist; the new labels agree with the SNCV prediction over the initial label >85% of the time, indicating that low-QS examples mostly reflect labeler errors. We then quantify the impact of training with only high-QS labels, showing that strong model performance may be obtained with many fewer examples. By applying the method to a randomly sub-sampled training dataset, we show that our method can reduce labeling burden by approximately 50% while achieving model performance comparable to using the full dataset on multiple held-out test sets.
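The abstract does not spell out how the quality score is computed, so the following is only a minimal sketch of the general idea: split the data into two class-stratified halves, fit a model on each half, and score every example by how strongly the model trained on the other half supports its given label. The nearest-centroid model on a single feature, the averaging over repeated splits, and all function names (`stratified_halves`, `fit_centroids`, `prob_of_label`, `quality_scores`) are assumptions made for illustration, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def stratified_halves(labels, rng):
    """Split example indices into two halves with similar class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    half_a, half_b = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        mid = len(idx) // 2
        half_a.extend(idx[:mid])
        half_b.extend(idx[mid:])
    return half_a, half_b


def fit_centroids(xs, ys, idx):
    """Toy stand-in for a real model: per-class mean of a 1-D feature."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in idx:
        sums[ys[i]] += xs[i]
        counts[ys[i]] += 1
    return {c: sums[c] / counts[c] for c in sums}


def prob_of_label(centroids, x, label):
    """Softmax over negative squared distances; returns P(label | x)."""
    dist = {c: (x - m) ** 2 for c, m in centroids.items()}
    d_min = min(dist.values())  # shift for numerical stability
    scores = {c: math.exp(-(d - d_min)) for c, d in dist.items()}
    return scores.get(label, 0.0) / sum(scores.values())


def quality_scores(xs, ys, repeats=10, seed=0):
    """Assumed QS: mean held-out probability of each example's given label."""
    rng = random.Random(seed)
    totals = [0.0] * len(xs)
    for _ in range(repeats):
        a, b = stratified_halves(ys, rng)
        # Each example is scored once per repeat, by the model that never saw it.
        for train, held in ((a, b), (b, a)):
            model = fit_centroids(xs, ys, train)
            for i in held:
                totals[i] += prob_of_label(model, xs[i], ys[i])
    return [t / repeats for t in totals]


# Toy data: the last example sits in the class-0 cluster but is labeled 1.
xs = [0.0, 0.1, 0.2, 0.3, 0.15, 2.0, 2.1, 2.2, 2.3, 0.05]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
qs = quality_scores(xs, ys)
lowest = min(range(len(qs)), key=qs.__getitem__)
print("lowest-QS example:", lowest)
```

On this toy data the deliberately mislabeled example receives the lowest score, which is the kind of candidate the abstract describes sending for relabeling. The sketch assumes every class has at least two examples so that both halves contain each class.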




