How inter-rater variability relates to aleatoric and epistemic uncertainty: a case study with deep learning-based paraspinal muscle segmentation

by Parinaz Roshanzamir, et al.

Recent developments in deep learning (DL) techniques have led to substantial performance improvements in medical image segmentation tasks, especially with Transformer models and their variants. While labels obtained by fusing multi-rater manual segmentations are often treated as ideal ground truths in DL model training, inter-rater variability due to factors such as training bias, image noise, and extreme anatomical variability can still affect the performance and uncertainty of the resulting algorithms. Knowledge of how inter-rater variability affects the reliability of the resulting DL algorithms, a key element in clinical deployment, can help inform better training data construction and DL models, but has not been explored extensively. In this paper, we measure aleatoric and epistemic uncertainties using test-time augmentation (TTA), test-time dropout (TTD), and deep ensembles to explore their relationship with inter-rater variability. Furthermore, we compare UNet and TransUNet to study the impact of Transformers on model uncertainty under two label fusion strategies. We conduct a case study using multi-class paraspinal muscle segmentation from T2-weighted (T2w) MRIs. Our study reveals the interplay between inter-rater variability and uncertainties, which is affected by the choice of label fusion strategy and DL model.
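
As an illustration only, the following is a minimal sketch of one common way to summarize repeated stochastic predictions, such as those produced by TTA, TTD, or ensemble members, into per-pixel uncertainty scores. The abstract does not state which uncertainty metrics the paper uses, so the entropy-based decomposition here is an assumption, and the function names `entropy` and `decompose_uncertainty` are hypothetical.

```python
import math


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p + eps) for p in probs)


def decompose_uncertainty(samples):
    """Summarize T stochastic predictions for a single pixel.

    samples: list of T class-probability vectors, e.g. softmax outputs from
    T augmented inputs, T dropout passes, or T ensemble members.
    Returns (total, expected, disagreement), where
      total        = entropy of the averaged prediction,
      expected     = average entropy of the individual predictions,
      disagreement = total - expected (mutual information between the
                     prediction and the sampling source).
    """
    t = len(samples)
    n_classes = len(samples[0])
    mean = [sum(s[c] for s in samples) / t for c in range(n_classes)]
    total = entropy(mean)
    expected = sum(entropy(s) for s in samples) / t
    return total, expected, total - expected


if __name__ == "__main__":
    # Hypothetical two-class pixel: every sample is equally unsure.
    agree = [[0.5, 0.5]] * 4
    # Hypothetical two-class pixel: samples are confident but contradict each other.
    conflict = [[1.0, 0.0], [0.0, 1.0]]
    print(decompose_uncertainty(agree))
    print(decompose_uncertainty(conflict))
```

In this sketch, samples that agree with one another give a disagreement term near zero even when each one is individually unsure, whereas confident but contradictory samples give a large disagreement term. How each term maps onto aleatoric or epistemic uncertainty depends on where the samples come from (augmentations, dropout masks, or ensemble members), and the mapping used in the paper may differ.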
