
Towards a Guideline for Evaluation Metrics in Medical Image Segmentation

by Dominik Müller, et al.

In the last decade, research on artificial intelligence has grown rapidly, driven by deep learning models, especially in the field of medical image segmentation. Various studies have demonstrated that these models have powerful prediction capabilities and achieve results similar to those of clinicians. However, recent studies revealed that the evaluation in image segmentation studies lacks reliable model performance assessment and shows statistical bias caused by incorrect metric implementation or usage. Thus, this work provides an overview and interpretation guide for the following metrics for medical image segmentation evaluation in binary as well as multi-class problems: Dice similarity coefficient, Jaccard index, sensitivity, specificity, Rand index, ROC curves, Cohen's Kappa, and Hausdorff distance. In summary, we propose a guideline for standardized medical image segmentation evaluation to improve evaluation quality, reproducibility, and comparability in the research field.
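Four of the metrics named in the abstract (Dice similarity coefficient, Jaccard index, sensitivity, and specificity) can all be derived from the pixel-wise confusion matrix of a binary mask. The following is a minimal sketch using their standard textbook definitions, not the authors' implementation. The function names, the flat 0/1 list representation of a mask, and the choice to return NaN when a denominator is zero are assumptions made for illustration.

```python
def confusion_counts(truth, pred):
    """Count TP, FP, TN, FN for two equal-length binary masks (values 0 or 1)."""
    if len(truth) != len(pred):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Assumption: report NaN when a metric is undefined (zero denominator)."""
    return numerator / denominator if denominator else float("nan")


def segmentation_metrics(truth, pred):
    """Overlap-based metrics for one binary segmentation (hypothetical helper)."""
    tp, fp, tn, fn = confusion_counts(truth, pred)
    return {
        "dice": safe_div(2 * tp, 2 * tp + fp + fn),
        "jaccard": safe_div(tp, tp + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
    }


# Toy flattened masks: 4 foreground pixels in the ground truth.
truth = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

for name, value in segmentation_metrics(truth, pred).items():
    print(f"{name}: {value:.3f}")
```

On this toy example the prediction has 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, giving a Dice score of 0.75 and a Jaccard index of 0.6. Specificity depends on the true-negative count, so it tends to grow with the size of the background region, which is one reason to interpret it with care on images dominated by background.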


MISeval: a Metric Library for Medical Image Segmentation Evaluation

Correct performance assessment is crucial for evaluating modern artifici...

SSEGEP: Small SEGment Emphasized Performance evaluation metric for medical image segmentation

Automatic image segmentation is a critical component of medical image an...

Assessing the Role of Random Forests in Medical Image Segmentation

Neural networks represent a field of research that can quickly achieve v...

Distribution-aware Margin Calibration for Medical Image Segmentation

The Jaccard index, also known as Intersection-over-Union (IoU score), is...

Roughness Index and Roughness Distance for Benchmarking Medical Segmentation

Medical image segmentation is one of the most challenging tasks in medic...