1 Introduction
Medical image segmentation is the process of making per-pixel predictions in an image in order to separate organs or lesions from the background. Medical images vary widely, depending on the acquisition process and the type of object to be segmented. Imaging modalities include magnetic resonance imaging (MRI), computed tomography (CT), nuclear medicine functional imaging, ultrasound imaging and microscopy, to name a few. They therefore differ in their characteristics and cover a broad range of anatomical objects of interest. As such, guaranteeing high segmentation performance on medical images can be considerably more challenging than on other types of images or segmentation tasks. Regardless, segmentation in the medical domain is considered a key step in assisting early disease detection, diagnosis, treatment monitoring and follow-up.
In recent years, deep learning has marked a pivotal milestone in many fields, including pattern recognition, object detection and natural language processing, and medical image segmentation is no exception. Convolutional neural networks (CNNs), a class of deep learning models, are known to achieve strong results due to their generalization ability. Since segmentation involves indicating not only what is present in an image but also where, medical image segmentation via CNNs involves a trade-off between contextual and spatial understanding. A pioneering approach for segmentation is the U-Net [27] model, which is known for its ability to consider semantic and contextual information while achieving promising performance. U-Net has been highly successful in image segmentation in general, and in medical image segmentation in particular, owing to its architectural properties and predictive power.
Despite undeniable success, segmentation networks for medical images, including U-Net and its variants, may still generate anatomically aberrant segmentations, with holes or inaccuracies near the object boundaries, as demonstrated in [14, 4]. Thus, such models lack the anatomical plausibility and background knowledge that a medical expert has. Moreover, they often require large amounts of annotated training data, which are not easy to obtain in the medical domain. Unannotated or partially labeled data are, by contrast, more readily available and less costly to obtain.
To mitigate these limitations, recent research studies have focused on incorporating medical expert information, known as prior knowledge, as constraints within the deep learning framework. Prior knowledge can be information concerning the object shape, size, topology, boundary or location, and was already known to be useful in variational approaches before the deep learning era. The exploitation of prior knowledge allows enforcing anatomical plausibility within segmentations provided by deep networks and can also overcome the need for fully labeled data [16, 19].
Constraints via prior knowledge can be incorporated in CNNs either at the level of the network architecture [10, 11, 30, 13, 26, 21] or at the level of the loss function [17, 20, 18, 28, 6, 24, 1]. Whereas structural constraints are rather robust, loss constraints are more generic and can be plugged into any backbone network. Thus, prior-based loss functions offer a versatile way to include constraints at different scales, while maintaining interactions between regions as well as the computational efficiency of the backbone network.
An integrated prior can be low-level, in which case it is a reformulated representation extracted from the ground-truth segmentations, for example distance maps [6, 16, 25] or Laplacian filters [1]. A prior can also be high-level, representing actual external medical information such as the shape, compactness or size of the organ, and is then optimized directly based on ground-truth prior tags [17, 9, 24].
Over the past few years, prior-based losses, whether low-level or high-level, have been a rising trend in semantic image segmentation research, particularly in the medical field. Given the diversity of prior-based losses across medical imaging challenges and tasks, it has become hard to identify which loss works best for which dataset. For this reason, we establish in this paper a benchmark of recent prior-based losses for medical image segmentation. Our main objective is to provide intuition on which losses to choose for a particular task or dataset, based on dataset characteristics and properties.
In the literature, the benchmarks most relevant to ours are those proposed in [23] and [22]. In [22], a benchmark of 20 losses is conducted with a thorough comparison on 4 main segmentation tasks: liver, liver tumor, pancreas and multi-abdominal organ segmentation. However, the authors do not address prior-based losses. Instead, they consider regular fitting losses such as Dice, cross-entropy and their variants, and their benchmark is limited to 4 datasets. The benchmark proposed in [23] targets some low-level prior losses. However, it is limited to losses based on distance maps, such as the boundary loss [16] or the Hausdorff loss [15], and does not compare them with high-level prior losses. Benchmark [23] also reports results on some structural constraints (i.e., regarding the architecture), which do not lie within the scope of our work. In addition, that benchmark is limited to two datasets: an organ segmentation task of the left atrial structure in MRI images and a liver tumor segmentation task in CT scans. In this work, we specifically target prior-based losses, both high-level and low-level, on 8 datasets covering different tasks and modalities. Hence, to the best of our knowledge, there is no existing benchmark that compares prior-based losses on a number of datasets in order to quantify common trends and limitations.
The main objective of the proposed benchmark is to study the performance of prior-based losses on a variety of datasets, tasks and modalities. In this way, we provide readers with intuition on which losses to choose for a particular task of interest. Prior-based losses are of particular interest because they allow the integration of expert knowledge while remaining architecture-agnostic, that is to say, they can be plugged into any backbone network. As a result, we are able to use a single segmentation network and the same learning environment while varying the prior-based losses. We note that each of the considered losses was proposed in its respective paper to address a particular task. We believe that, beyond the initial purpose for which each loss was designed, additional insight may be drawn on other segmentation tasks and dataset characteristics. For this reason, we validate the chosen prior-based losses on 8 different datasets from a variety of medical image segmentation challenges, including the Decathlon (http://medicaldecathlon.com/), the ISLES (http://www.isles-challenge.org/) and the WMH (https://wmh.isi.uu.nl/) challenges. The main contributions of this paper are summarized as follows:
- We present a benchmark of architecture-agnostic prior-based losses for medical image segmentation.
- We attempt to shed light on the underlying relationship between the prior-based losses and some dataset characteristics.
The rest of the paper is organized as follows. Section 2 presents the selected loss functions for the proposed benchmark and elaborates on why they were chosen. In Section 3, we describe the datasets considered and the meta-features extracted to compare the loss performances. Section 4 describes the experimental setting adopted to evaluate the considered prior-based losses on the different datasets. Finally, Section 5 presents the results and analyzes the loss performances relative to segmentation tasks and dataset characteristics.
2 Selected Loss Functions
In this section, we present the prior-based losses chosen for our benchmark. Prior-based losses can be high-level, when the type of prior considered is based on external knowledge (e.g., shape), or low-level, when they integrate ground-truth map transformations such as distance or contour maps in order to reveal geometrical and location properties, as discussed in the survey [12]. The proposed benchmark focuses on 4 recent prior-based losses that have raised interest within the field of medical image segmentation: 2 are low-level and the other 2 are high-level.
2.1 Low-level prior-based losses
A low-level prior can be based on distance maps, as demonstrated in [16, 15]. In this context, two major contributions are the boundary loss [16] and the Hausdorff loss [15].
The Boundary loss is an approximation of the distance between the real and the estimated boundaries. Based on graph theory [5], an equivalent term that fine-tunes the probability distribution via ground-truth distance maps is derived in [16] and is defined as:
$$\mathcal{L}_{Boundary} = \sum_{p \in \Omega} \phi_G(p)\, s_\theta(p) \tag{1}$$
where $\phi_G(p)$ denotes the distance of pixel $p$ to the closest contour ($\partial G$) point, $s_\theta(p)$ being the predicted value at pixel $p$, and $\Omega$ the image spatial domain.
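As an illustration, the boundary loss term above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the implementation of [16]: the function names are hypothetical, the distance map is taken to be signed (negative inside the object, positive outside), and it is computed by brute force, which is only practical for small arrays.

```python
import numpy as np

def signed_distance_map(mask):
    """Signed Euclidean distance to the nearest pixel of the other class.

    Assumption: negative inside the object, positive outside; all zeros when
    the mask is empty or full. Brute force, for small illustrative arrays.
    """
    mask = np.asarray(mask, dtype=bool)
    phi = np.zeros(mask.shape, dtype=float)
    fg = np.argwhere(mask)    # coordinates of object pixels (row-major order)
    bg = np.argwhere(~mask)   # coordinates of background pixels
    if len(fg) == 0 or len(bg) == 0:
        return phi
    # Distance from every background pixel to its nearest object pixel.
    d_bg = np.linalg.norm(bg[:, None, :] - fg[None, :, :], axis=-1).min(axis=1)
    # Distance from every object pixel to its nearest background pixel.
    d_fg = np.linalg.norm(fg[:, None, :] - bg[None, :, :], axis=-1).min(axis=1)
    phi[~mask] = d_bg
    phi[mask] = -d_fg
    return phi

def boundary_loss(probs, gt_mask):
    """Sum over the image domain of distance map times predicted probability."""
    phi = signed_distance_map(gt_mask)
    return float(np.sum(phi * np.asarray(probs, dtype=float)))

# Toy example: a 3 x 3 object in a 5 x 5 image.
gt = np.zeros((5, 5), dtype=int)
gt[1:4, 1:4] = 1
perfect = gt.astype(float)
shifted = np.roll(perfect, 1, axis=1)  # same shape, moved one pixel to the right
print(boundary_loss(perfect, gt), boundary_loss(shifted, gt))
```

With a signed map, probability mass placed outside the ground-truth object increases the loss and mass placed inside decreases it, so the shifted prediction scores higher than the perfect one.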
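The Hausdorff loss [15] conducts a direct point-by-point optimization of the predicted and ground-truth contours, arriving at the following loss term:
$$\mathcal{L}_{Hausdorff} = \frac{1}{|\Omega|} \sum_{p \in \Omega} \big(s_\theta(p) - g(p)\big)^2 \big(d_G(p)^{\alpha} + d_S(p)^{\alpha}\big) \tag{2}$$
where $g(p)$ is the ground-truth label at pixel $p$, $d_G$ and $d_S$ are the distance maps of the ground-truth and predicted contours respectively, and $\alpha$ is an exponent that controls the penalty on distant errors, following the distance-transform formulation of [15].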
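To make the role of the two distance maps concrete, a distance-transform-based Hausdorff loss in the spirit of [15] can be sketched as follows. This is a sketch under assumptions rather than the reference implementation: the function names, the exponent `alpha`, the 0.5 threshold used to binarize the prediction, and the brute-force distance computation are all illustrative choices.

```python
import numpy as np

def boundary_distance_map(mask):
    """Unsigned distance of each pixel to the nearest pixel of the other class.

    Brute force; returns zeros when the mask is empty or full.
    """
    mask = np.asarray(mask, dtype=bool)
    dist = np.zeros(mask.shape, dtype=float)
    fg, bg = np.argwhere(mask), np.argwhere(~mask)
    if len(fg) == 0 or len(bg) == 0:
        return dist
    dist[~mask] = np.linalg.norm(bg[:, None, :] - fg[None, :, :], axis=-1).min(axis=1)
    dist[mask] = np.linalg.norm(fg[:, None, :] - bg[None, :, :], axis=-1).min(axis=1)
    return dist

def hausdorff_dt_loss(probs, gt_mask, alpha=2.0, threshold=0.5):
    """Squared pixel error weighted by ground-truth and predicted distance maps."""
    probs = np.asarray(probs, dtype=float)
    gt = np.asarray(gt_mask, dtype=float)
    d_gt = boundary_distance_map(gt > 0.5)
    # The predicted distance map must be recomputed for every new prediction,
    # which is what makes this family of losses costly during training.
    d_pred = boundary_distance_map(probs > threshold)
    return float(np.mean((probs - gt) ** 2 * (d_gt ** alpha + d_pred ** alpha)))

# Toy example: one false-positive pixel, either next to the object or far away.
gt = np.zeros((5, 7))
gt[2, 2] = 1
near = gt.copy()
near[2, 3] = 1
far = gt.copy()
far[2, 6] = 1
print(hausdorff_dt_loss(near, gt), hausdorff_dt_loss(far, gt))
```

In this toy case the distant false positive is penalized more heavily than the adjacent one, which is the behaviour expected from a loss that targets the Hausdorff distance.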
The boundary loss has been initially designed in order to segment lesions within the brain, with the WMH and the ISLES datasets, whereas the Hausdorff loss has been tested on 4 different single-organ segmentation tasks, including the prostate, liver and pancreas from the Decathlon and PROMISE challenges. However, these losses were not evaluated in multi-organ segmentation. Since both losses lie in the same spectrum of low-level prior-based losses, and rely on the distance map, it may be interesting to investigate their performance on the same datasets in order to pinpoint common behaviors. Moreover, we aim to also extend the scope of these losses to the multi-organ case.
2.2 High-level prior-based losses
Regarding high-level prior losses, we analyze the performance of the clDice loss [28] and the size loss [17] .
The Size loss [17] estimates the organ size from a soft probability map and constrains it, based on upper and lower threshold values of the organ size, according to the following:
$$V_S = \sum_{p \in \Omega} s_\theta(p) \tag{3}$$
$$\mathcal{L}_{Size} = \begin{cases} (V_S - a)^2 & \text{if } V_S < a \\ (V_S - b)^2 & \text{if } V_S > b \\ 0 & \text{otherwise} \end{cases} \tag{4}$$
where $b$ and $a$ are respectively the upper and lower permissible bounds that the size of the considered object can attain. The size loss was originally designed for weakly supervised learning, to guide the network through the training despite the lack of full label maps. We are particularly interested in studying the effect of the size loss on small structures that are known to be more difficult to segment.
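The size constraint described above can be sketched in a few lines of NumPy. The function name `size_loss` and the argument names `lower` and `upper` are hypothetical, and the quadratic penalty outside the bounds is assumed from the formulation of [17]; this is an illustrative sketch, not the authors' code.

```python
import numpy as np

def size_loss(probs, lower, upper):
    """Quadratic penalty when the soft object size leaves [lower, upper].

    probs : array of foreground probabilities in [0, 1]
    lower : smallest admissible object size (in pixels)
    upper : largest admissible object size (in pixels)
    """
    # Soft size estimate: sum of the predicted probabilities over the image.
    size = float(np.sum(np.asarray(probs, dtype=float)))
    if size < lower:
        return (size - lower) ** 2
    if size > upper:
        return (size - upper) ** 2
    return 0.0

# Toy example: a 4 x 4 probability map whose soft size is 4 pixels.
probs = np.full((4, 4), 0.25)
print(size_loss(probs, lower=6, upper=10))  # predicted object too small
print(size_loss(probs, lower=2, upper=8))   # size within the bounds
print(size_loss(probs, lower=1, upper=3))   # predicted object too large
```

Because the penalty depends only on the total predicted mass, it carries no information about where the object lies, which is why it is paired with a fitting loss such as Dice in the benchmark.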
The clDice loss [28], also called skeleton loss, exploits skeletonization maps, which are compact representations of images and objects that preserve topological properties. The objective of this loss is to constrain the skeleton of the predicted map to match the skeleton of the ground-truth map. This prior was used in the segmentation of vessels and neurons in both 2D and 3D. Let $S_G$ and $S_P$ be the ground-truth and the predicted skeleton respectively, of the same size as the image domain $\Omega$. The sensitivity (or recall) between the predicted segmentation $V_P$ and the ground-truth skeleton is introduced as $T_{sens}(S_G, V_P) = \frac{|S_G \cap V_P|}{|S_G|}$. Likewise, the precision between the ground-truth mask $V_G$ and the predicted skeleton is defined as $T_{prec}(S_P, V_G) = \frac{|S_P \cap V_G|}{|S_P|}$. The clDice is defined as the F1-score between precision and sensitivity as follows:
$$clDice = 2 \times \frac{T_{prec} \times T_{sens}}{T_{prec} + T_{sens}} \tag{5}$$
The clDice was originally designed to segment vessels; however, since it targets the skeleton of the object, we believe that it may be good at distinguishing between different structures lying in close proximity to each other, such as organs made of multiple instances.
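The clDice score can be illustrated with a short NumPy sketch. To keep the example self-contained, the skeletons are assumed to be given as inputs (here, hand-drawn centerlines of a toy bar); in practice they would come from a skeletonization step, which is not shown. The function name `cldice` and the stabilizing constant `eps` are hypothetical.

```python
import numpy as np

def cldice(pred_mask, gt_mask, pred_skel, gt_skel, eps=1e-8):
    """Harmonic mean of topology precision and topology sensitivity.

    All inputs are binary (or soft) arrays of the same shape.
    """
    pred_mask = np.asarray(pred_mask, dtype=float)
    gt_mask = np.asarray(gt_mask, dtype=float)
    pred_skel = np.asarray(pred_skel, dtype=float)
    gt_skel = np.asarray(gt_skel, dtype=float)
    # Precision: fraction of the predicted skeleton lying inside the ground-truth mask.
    t_prec = (pred_skel * gt_mask).sum() / (pred_skel.sum() + eps)
    # Sensitivity: fraction of the ground-truth skeleton covered by the predicted mask.
    t_sens = (gt_skel * pred_mask).sum() / (gt_skel.sum() + eps)
    return float(2.0 * t_prec * t_sens / (t_prec + t_sens + eps))

# Toy example: a horizontal bar, three pixels thick, whose centerline is its middle row.
gt_mask = np.zeros((5, 8))
gt_mask[1:4, :] = 1
gt_skel = np.zeros((5, 8))
gt_skel[2, :] = 1
# Prediction that recovers only the left half of the bar.
pred_mask = np.zeros((5, 8))
pred_mask[1:4, :4] = 1
pred_skel = np.zeros((5, 8))
pred_skel[2, :4] = 1
print(cldice(gt_mask, gt_mask, gt_skel, gt_skel))
print(cldice(pred_mask, gt_mask, pred_skel, gt_skel))
```

A corresponding loss is typically obtained as one minus this score; how it is combined with the Dice loss in this benchmark is described in Section 4.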

Table 1: Summary of the meta-dataset characteristics. Organ size is the percentage of the image occupied by the anatomical object; CC denotes connected components.

Dataset | Structure | # of patients (train) | # of patients (test) | Organ size (mean) | Organ size (std) | # of classes | # of modalities | # of CC
---|---|---|---|---|---|---|---|---
WMH | | 48 | 12 | 0.33 | 0.56 | 1 | 2 | 0–26
ISLES | | 74 | 20 | 2.11 | 1.91 | 1 | 5 | 0–3
Atrium | | 16 | 4 | 0.69 | 0.43 | 1 | 1 | 0–4
Colon | | 100 | 38 | 0.6 | 0.59 | 1 | 1 | 0–3
Spleen | | 32 | 9 | 1.57 | 1.03 | 1 | 1 | 0–1
Hippocampus | H1 | 206 | 54 | 4.08 | 3.87 | 2 | 1 | 0–4
Hippocampus | H2 | | | 3.53 | 2.53 | | | 0–3
Prostate | CG | 26 | 6 | 0.9 | 0.89 | 2 | 2 |
Prostate | PZ | | | 3.1 | 2.98 | | | 0–1
ACDC | RVC | 99 | 24 | 1.29 | 1.03 | 3 | 1 | 0–1
ACDC | MYO | | | 1.38 | 0.69 | | |
ACDC | LVC | | | 1.28 | 0.84 | | |

3 Datasets and Tasks
In this section, we present a brief description of the datasets under consideration. The datasets were chosen to cover different tasks, modalities and characteristics. Each dataset encompasses a particular set of challenges the segmentation network must consider while training. A summary of the meta-dataset characteristics is presented in Table 1.
3.1 Brain Lesion Segmentation
To investigate the significance of prior-based losses on brain lesion segmentation tasks, we mainly focus on the white matter hyperintensities (WMH) segmentation dataset and the ischemic stroke lesion segmentation (ISLES) dataset. Both datasets are multi-modal, with anatomical objects that are sparse and composed of multiple instances (see Figure 1).
3.2 Single Organ Segmentation
Organs can generally be single-connected, made of only one structure, or multi-connected, composed of multiple structures that are close to each other. To investigate the segmentation performance of prior-based losses on single-organ segmentation tasks where the organ is characterized by multi-connected structures, we target the segmentation of the Atrium and the Colon from the Decathlon Challenge. In addition, we target the Spleen to investigate the performance of prior-based losses on single-label, single-connected organs. The spleen and colon are characterized by a largely varying size and mild convexity issues at the boundary level. The atrium, on the other hand, is a multi-instance anatomical object with up to 4 elements of varying sizes lying in close proximity to each other (see Figure 2).
3.3 Multi-Organ Segmentation
For multi-organ segmentation, we target the Prostate (central gland and peripheral zone) and Hippocampus (tissues H1 and H2) datasets from the Decathlon Challenge, as well as the ACDC dataset (three cardiac structures).
3.4 Meta-dataset Features
In order to reveal the underlying relationship between loss performance and dataset characteristics, we propose a set of meta-features that describe the datasets. This includes the size of the anatomical object taken as the percentage of occupation from the entire image, the number of connected components, which means how many instances an anatomical object is constituted of, and the number of classes, i.e., whether the segmentation task is single or multi-label.
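The meta-features listed above can be computed directly from a label map. The sketch below is illustrative only: the function names are hypothetical, label 0 is assumed to be the background, and connected components are counted with 8-connectivity, which is an assumption since the connectivity rule is not specified here.

```python
import numpy as np
from collections import deque

def count_connected_components(mask):
    """Count 8-connected components of a 2D binary mask with a flood fill."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    count = 0
    for r in range(height):
        for c in range(width):
            if not mask[r, c] or visited[r, c]:
                continue
            count += 1  # new, not yet visited component
            visited[r, c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            queue.append((ny, nx))
    return count

def meta_features(label_map):
    """Organ size (% of the image), component count per class, and class count."""
    label_map = np.asarray(label_map)
    classes = [int(c) for c in np.unique(label_map) if c != 0]  # 0 = background (assumed)
    features = {"n_classes": len(classes)}
    for c in classes:
        mask = label_map == c
        features[c] = {
            "size_percent": 100.0 * float(mask.mean()),
            "n_components": count_connected_components(mask),
        }
    return features

# Toy label map: class 1 has two separate blobs, class 2 has one.
label_map = np.zeros((10, 10), dtype=int)
label_map[0:2, 0:2] = 1
label_map[5:7, 5:7] = 1
label_map[8:10, 0:3] = 2
print(meta_features(label_map))
```

Averaging these per-image values over a dataset would give summary statistics of the kind reported in Table 1.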
4 Experimental Setting
We deploy a unified U-Net based framework [16, 17] and modify the loss function accordingly. Training is done using a batch size of 8 and an initial learning rate that is halved if the validation performance does not improve during 20 epochs. The U-Net model is trained via each prior-based loss in conjunction with the Dice loss, weighted by a parameter $\alpha$ according to the following equation:
$$\mathcal{L} = \mathcal{L}_{Dice} + \alpha\, \mathcal{L}_{prior} \tag{6}$$
The parameter is fine-tuned via the dynamic training strategy [16]: its value is initially set to 0.01 and increased by 0.01 per epoch for 200 epochs. Our code is publicly available on GitHub (https://github.com/rosanajurdi/Prior-based-Losses-for-Medical-Image-Segmentation).
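The dynamic weighting strategy can be sketched as a simple linear schedule. In the sketch below, the function names are hypothetical, epochs are assumed to be counted from 0, and the prior term is assumed to be added to the Dice loss after scaling by the weight; the soft Dice loss shown is a common formulation and not necessarily the exact one used in our code.

```python
import numpy as np

def prior_weight(epoch, start=0.01, step=0.01):
    """Weight of the prior-based term: starts at `start`, grows by `step` per epoch."""
    return start + step * epoch

def soft_dice_loss(probs, gt, eps=1e-8):
    """One minus the soft Dice overlap between probabilities and ground truth."""
    probs = np.asarray(probs, dtype=float)
    gt = np.asarray(gt, dtype=float)
    intersection = (probs * gt).sum()
    return float(1.0 - (2.0 * intersection + eps) / (probs.sum() + gt.sum() + eps))

def combined_loss(probs, gt, prior_loss_value, epoch):
    """Dice loss plus the weighted prior-based loss (additive form is an assumption)."""
    return soft_dice_loss(probs, gt) + prior_weight(epoch) * prior_loss_value

# Weight at the first and last of 200 epochs (0-based indexing assumed).
print(prior_weight(0), prior_weight(199))

# Toy example: perfect prediction, so only the weighted prior term remains.
gt = np.zeros((4, 4))
gt[1:3, 1:3] = 1
print(combined_loss(gt, gt, prior_loss_value=3.0, epoch=9))
```

Under these assumptions the weight grows from 0.01 to 2.0 over the 200 epochs, so the prior term has little influence early in training and dominates progressively later on.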
For pre-processing, we resized the images to 256 × 256 pixels and normalized them to the range [0, 1]. For multi-modal datasets, we concatenated the channels at the input level. Each dataset was split into training and validation sets based on an 80 % / 20 % partition, as shown in Table 1, and validated via three Monte-Carlo simulations [2].
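The normalization, channel concatenation and repeated random splitting described above can be sketched as follows. The function names are hypothetical, min-max scaling per modality is an assumption about how the [0, 1] range is reached, the resizing step is omitted, and the random seed is arbitrary.

```python
import numpy as np

def min_max_normalize(image):
    """Rescale intensities to [0, 1]; a constant image maps to all zeros."""
    image = np.asarray(image, dtype=float)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

def stack_modalities(modalities):
    """Normalize each modality and concatenate them along a leading channel axis."""
    return np.stack([min_max_normalize(m) for m in modalities], axis=0)

def monte_carlo_splits(n_patients, n_runs=3, train_fraction=0.8, seed=0):
    """Draw `n_runs` independent random train/validation partitions of patient indices."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_fraction * n_patients))
    splits = []
    for _ in range(n_runs):
        order = rng.permutation(n_patients)
        splits.append((order[:n_train], order[n_train:]))
    return splits

# Toy example: two modalities of a 4 x 4 image, and 60 patients split three times.
image = stack_modalities([np.arange(16).reshape(4, 4), 10 * np.ones((4, 4))])
print(image.shape)
for train_ids, val_ids in monte_carlo_splits(60):
    print(len(train_ids), len(val_ids))
```

Splitting at the patient level, as done here, keeps all slices of one patient on the same side of the partition.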

Table 2: Dice score (DSC, in %) of the Dice baseline and of each prior-based loss combined with the Dice loss (mean ± standard deviation).

Dataset | Dice | Dice + Boundary | Dice + Hausdorff | Dice + Size | Dice + clDice
---|---|---|---|---|---
WMH | 74.64 ± 1.34 | 77.29 ± 0.75 | 78.77 ± 0.70 | 78.06 ± 1.61 | 66.97 ± 11.48
ISLES | 53.41 ± 4.61 | 62.93 ± 2.24 | 63.53 ± 1.66 | 46.86 ± 7.74 | 62.53 ± 5.22
Atrium | 83.67 ± 3.66 | 82.80 ± 3.68 | 84.57 ± 1.86 | 84.59 ± 2.62 | 83.85 ± 2.56
Colon | 84.82 ± 1.71 | 88.71 ± 0.48 | 88.30 ± 0.78 | 88.71 ± 0.48 | 84.52 ± 2.64
Spleen | 76.80 ± 7.59 | 80.38 ± 5.46 | 91.79 ± 2.67 | 86.44 ± 15.86 | 87.15 ± 13.61
Hippocampus-H1 | 49.38 ± 0.33 | 65.20 ± 0.31 | 68.54 ± 1.46 | 66.24 ± 0.33 | 68.39 ± 2.60
Hippocampus-H2 | 71.70 ± 1.30 | 81.33 ± 0.74 | 82.12 ± 0.44 | 81.84 ± 0.63 | 82.82 ± 1.22
Prostate-CG | 45.17 ± 6.41 | 44.89 ± 7.09 | 44.15 ± 5.61 | 34.12 ± 7.49 | 42.45 ± 7.03
Prostate-PZ | 65.13 ± 11.57 | 68.99 ± 9.94 | 64.38 ± 9.33 | 29.61 ± 12.07 | 61.57 ± 11.44
ACDC-RVC | 80.79 ± 0.95 | 81.04 ± 0.87 | 80.54 ± 1.30 | 41.02 ± 38.39 | 83.83 ± 1.39
ACDC-MYO | 83.92 ± 0.13 | 84.16 ± 0.83 | 83.91 ± 0.85 | 83.41 ± 0.72 | 83.24 ± 0.66
ACDC-LVC | 90.26 ± 0.13 | 89.53 ± 0.74 | 88.98 ± 0.90 | 89.74 ± 0.71 | 89.56 ± 1.10


Table 3: Hausdorff distance (HD) of the Dice baseline and of each prior-based loss combined with the Dice loss (mean ± standard deviation).

Dataset | Dice | Dice + Boundary | Dice + Hausdorff | Dice + Size | Dice + clDice
---|---|---|---|---|---
WMH | 0.98 ± 0.13 | 0.94 ± 0.17 | 0.93 ± 0.16 | 0.94 ± 0.18 | 1.16 ± 0.38
ISLES | 3.75 ± 0.35 | 3.05 ± 0.22 | 3.07 ± 0.18 | 3.45 ± 0.79 | 3.29 ± 0.62
Atrium | 1.62 ± 0.16 | 1.64 ± 0.16 | 1.67 ± 0.13 | 1.59 ± 0.17 | 1.64 ± 0.16
Colon | 0.58 ± 0.04 | 0.50 ± 0.02 | 0.51 ± 0.03 | 0.50 ± 0.02 | 0.58 ± 0.07
Spleen | 0.92 ± 0.15 | 1.07 ± 0.53 | | |
Hippocampus-H1 | 2.31 ± 0.05 | 1.99 ± 0.01 | 1.98 ± 0.03 | 1.97 ± 0.02 | 1.99 ± 0.04
Hippocampus-H2 | 3.82 ± 0.14 | 3.09 ± 0.01 | 2.97 ± 0.05 | 3.07 ± 0.01 | 3.20 ± 0.18
Prostate-CG | 2.80 ± 0.34 | 2.77 ± 0.43 | 2.88 ± 0.27 | 3.10 ± 0.26 | 3.48 ± 0.66
Prostate-PZ | 3.24 ± 0.35 | 2.94 ± 0.27 | 3.17 ± 0.47 | 4.41 ± 0.90 | 3.45 ± 0.58
ACDC-RVC | 2.44 ± 0.04 | 2.41 ± 0.05 | 2.33 ± 0.04 | 3.88 ± 1.44 | 2.34 ± 0.08
ACDC-MYO | 2.60 ± 0.01 | 2.57 ± 0.01 | 2.65 ± 0.01 | 2.62 ± 0.00 | 2.71 ± 0.04
ACDC-LVC | 1.95 ± 0.02 | 1.95 ± 0.02 | 1.98 ± 0.01 | 1.94 ± 0.01 | 1.98 ± 0.04


Table 4: Mean absolute error on the number of connected components for the Dice baseline and for each prior-based loss combined with the Dice loss (mean ± standard deviation).

Dataset | Dice | Dice + Boundary | Dice + Hausdorff | Dice + Size | Dice + clDice
---|---|---|---|---|---
WMH | 1.04 ± 0.14 | 0.98 ± 0.17 | 1.01 ± 0.23 | 0.91 ± 0.22 | 2.14 ± 1.26
ISLES | 0.69 ± 0.19 | 0.48 ± 0.04 | 0.57 ± 0.19 | 1.34 ± 1.08 | 0.39 ± 0.10
Atrium | 0.25 ± 0.01 | 0.29 ± 0.02 | 0.32 ± 0.01 | 0.28 ± 0.03 | 0.28 ± 0.03
Colon | 0.17 ± 0.02 | 0.13 ± 0.01 | 0.13 ± 0.01 | 0.13 ± 0.01 | 0.18 ± 0.03
Spleen | 0.22 ± 0.05 | 0.24 ± 0.09 | 0.09 ± 0.01 | 0.18 ± 0.13 | 0.12 ± 0.15
Hippocampus-H1 | 3.76 ± 0.14 | 1.81 ± 0.14 | 2.67 ± 0.11 | 2.88 ± 0.39 | 1.30 ± 0.96
Hippocampus-H2 | 0.95 ± 0.01 | 0.23 ± 0.01 | 0.87 ± 0.10 | 0.74 ± 0.06 | 0.10 ± 0.03
Prostate-CG | 8.96 ± 3.33 | 9.05 ± 3.33 | 8.89 ± 3.11 | 8.78 ± 2.79 | 8.98 ± 3.32
Prostate-PZ | 0.36 ± 0.13 | 0.23 ± 0.09 | 0.33 ± 0.07 | 0.80 ± 0.12 | 0.26 ± 0.11
ACDC-RVC | 0.18 ± 0.03 | 0.16 ± 0.00 | 0.13 ± 0.02 | 0.11 ± 0.04 | 0.12 ± 0.03
ACDC-MYO | 0.04 ± 0.02 | 0.06 ± 0.01 | 0.08 ± 0.03 | 0.07 ± 0.03 | 0.06 ± 0.02
ACDC-LVC | 0.06 ± 0.01 | 0.06 ± 0.01 | 0.07 ± 0.01 | 0.07 ± 0.02 | 0.05 ± 0.02

5 Results and Analysis
In this section, we report the results on the benchmark datasets for the losses under consideration, based on the training strategy explained in Section 4. The segmentation performances are compared via two usual segmentation metrics: the Dice score [8] (DSC), presented in Table 2, and the Hausdorff distance [3] (HD), presented in Table 3. In addition, we have computed the mean absolute error on the number of instances (connected components), presented in Table 4.
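For reference, the two overlap and distance metrics can be sketched on binary masks as follows. This is an illustrative sketch with hypothetical function names: the Hausdorff distance is computed by brute force between the sets of foreground pixels, in pixel units, and any rescaling or boundary extraction applied in the actual evaluation is not reproduced here.

```python
import numpy as np

def dice_score(pred, gt):
    """Dice overlap between two binary masks (1.0 when both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    denominator = pred.sum() + gt.sum()
    if denominator == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / denominator)

def hausdorff_distance(pred, gt):
    """Symmetric Hausdorff distance between the foreground pixel sets (in pixels)."""
    p = np.argwhere(np.asarray(pred, dtype=bool))
    g = np.argwhere(np.asarray(gt, dtype=bool))
    if len(p) == 0 and len(g) == 0:
        return 0.0
    if len(p) == 0 or len(g) == 0:
        return float("inf")
    # Pairwise Euclidean distances between predicted and ground-truth pixels.
    d = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# Toy example: a 3 x 3 object and the same object shifted by one pixel.
gt = np.zeros((6, 6), dtype=int)
gt[1:4, 1:4] = 1
pred = np.zeros((6, 6), dtype=int)
pred[1:4, 2:5] = 1
print(dice_score(pred, gt), hausdorff_distance(pred, gt))
```

The Dice score rewards overlap, whereas the Hausdorff distance reports the worst-case mismatch, which is why the two metrics can rank the losses differently.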
5.1 Added value of prior losses over the Dice loss baseline
From the performance tables, we observe that for most anatomical objects there is at least one prior-based loss that is superior to the Dice baseline (denoted by cells with blue background in the tables). Overall, the exploitation of prior-based losses enhanced segmentation performance in 10 out of 12 anatomical objects of the 8 datasets. For example, the Hausdorff loss registered the best performances on brain lesion segmentation tasks (WMH, ISLES) and single-organ segmentation datasets. The boundary loss registered performances close to the best case on lesion tasks (ISLES, WMH). The clDice loss registers the best performance in 1 out of 3 multi-organ segmentation datasets, and the size loss obtained good results on a selection of datasets including WMH, Atrium and Colon.
Looking closely at the Dice baseline performance over the entire set of datasets (first column in the tables), one can observe that the Prostate is quite challenging, since it has the lowest Dice baseline performance. In contrast, the ACDC dataset is the easiest, with the highest Dice accuracy; the problem of cardiac structure segmentation is well known and has been argued to be almost solved [4]. Intuitively, an easy dataset would already register good performance with the simple Dice baseline, and one would expect the addition of prior-based losses to bring no added value, other than increasing the complexity of the training and possibly degrading performance. Indeed, the results on the ACDC dataset show little to no added value relative to the baseline. Conversely, if the dataset is too complex, as in the case of the Prostate (multi-label segmentation, large organ size imbalance, large number of connected components), customized prior-based losses may be needed to accommodate its characteristics: almost no gain is obtained from the prior losses on the Prostate dataset.
5.2 Low-level vs. High-level Prior-based Losses
Both the Hausdorff and Boundary losses register good performances on most datasets and across all segmentation tasks: brain lesion, single-organ and multi-organ segmentation. The Hausdorff loss outperforms the Boundary loss on some datasets (Spleen, Hippocampus). For example, on the Spleen dataset, the Hausdorff loss registered the best performance in Dice accuracy (added value of 14 %) and reduced the Hausdorff distance by over 30 % in comparison to the Boundary loss. This superiority is mainly due to the fact that the Hausdorff loss extracts distance maps from both the predicted and the ground-truth contours and minimizes the error between the two maps, whereas the Boundary loss simply fine-tunes the probability distribution via the ground-truth distance maps. Since the Hausdorff loss directly optimizes the distance maps of the predicted and ground-truth labels, it can guarantee a better mapping between predicted outputs and the ground truth than the Boundary loss. Despite this advantage, the Hausdorff loss is computationally expensive, since it requires computing the predicted distance maps online during training, which directly increases training time. Hence, the Boundary loss may represent a reasonable trade-off between good segmentation performance and computational cost.
Regarding the high-level prior-based losses, results are mixed: the size loss can either provide a clear improvement (e.g., WMH, Atrium, Colon) or much worse results (e.g., ISLES). For example, on the WMH dataset the size loss performs on par with the best segmentation result, but it performs poorly on ISLES, despite the similarity in nature between the two datasets. We hypothesize that this may be due to the overall lesion sizes. A closer look at the meta-data characteristics in Table 1 shows that, on small organs (e.g., WMH, Atrium, Colon), the size loss registers performance either better than or equivalent to the Dice baseline. On datasets that have large size variability (e.g., ISLES, Prostate or ACDC), the size loss degrades segmentation performance. This is mainly because the size loss generally leads the network to learn average organ sizes. In the same vein, the results show that the size loss cannot accommodate multi-organ segmentation. These observations are illustrated in Figure 4, which shows the Dice performance relative to organ sizes. We note that the datasets where the size loss registered degraded results (red dots) are those whose organ sizes vary widely or that involve multi-label segmentation. Hence, although the size loss was initially designed for weakly supervised segmentation, it may be useful under full supervision when the anatomical objects under consideration are very small structures occupying a tiny percentage of the overall image, as in the case of the WMH dataset.
The clDice loss behaves similarly, but to a lesser extent. It generally registers better performance than the Dice baseline in most single-label segmentation cases and on one multi-organ segmentation dataset. However, the clDice loss degraded performance on other datasets such as WMH and ACDC. Despite the equivalence in Dice accuracy between the Hausdorff loss and the clDice loss on the Hippocampus dataset, the Hausdorff loss outperformed the clDice loss in terms of Hausdorff distance (about 8 % lower for the Hausdorff loss than for the clDice loss). This indicates the ability of the Hausdorff loss to take shape and border specifications into consideration. The degraded Hausdorff distance of the clDice loss can be explained by the fact that the loss is based on skeleton maps, which tend to blur boundary specifications for the sake of revealing topological properties. This limitation is further confirmed by the Hausdorff distance results of the clDice loss on the ACDC dataset. Thus, even when the clDice loss achieved the best-ranked Dice accuracy, its Hausdorff distance is degraded, becoming even worse than that of the Dice baseline for the Myocardium, for instance. For tasks with high border irregularities, such as lesions, failing to consider boundary specifications can hinder overall performance (e.g., brain lesions in the WMH dataset).
When studying other meta-data features such as the number of connected components, one can see that the exploitation of high-level prior-based losses does not have a great influence on the results (see Table 4). We hypothesize that this may be due to the fact that high-level prior-based losses are rather customized to serve a particular task, or satisfy a particular constraint. If the task at hand does not conform with the dataset characteristics or attributes, the prior-based loss may generally have no added value.
Overall, we can hypothesize that contour-based losses are rather generic and can be useful for enhancing segmentation performance on any type of dataset. However, if the aim is to preserve a particular characteristic or anatomical property, a customized high-level prior-based loss may be a feasible solution. Thus, high-level losses may provide improvement; however, they are not very stable and cannot be generalized to all datasets and tasks.

5.3 Limitations of the Current Proposed Benchmark
Despite our intuitive analysis of some relationships between loss performance and dataset characteristics, we acknowledge several limitations. First, the proposed benchmark cannot be considered generic, as there are many existing prior-based losses that we do not include: low-level priors [6, 31, 25], high-level topological priors [7] or shape priors [9, 24]. Moreover, because high-level prior-based losses are customized to target a particular property, comparing their effectiveness is open to debate. Another key component to take into consideration is their optimization algorithms. Many prior-based losses are discrete in nature; hence, they require particular optimization strategies in order to ensure good performance. Our proposed benchmark is based on plugging the considered losses into a penalty-based Lagrangian optimization technique and training via stochastic gradient descent with the Adam optimizer. At the level of the datasets, despite some similarities (lesion tasks: ISLES, WMH; single- vs. multi-organ tasks), the datasets are very different, each with its own set of characteristics and properties. Hence, there are many variables to take into consideration, which often limits the means of comparison. Despite these limitations, a benchmark that tests prior-based losses on different tasks and datasets is important, because it can give the reader an intuitive initial judgment on which loss to choose based on the requirements and dataset properties considered.
6 Conclusion
In this paper, we proposed a benchmark of prior-based losses on medical image segmentation datasets. We provided intuitive explanations of a few existing relationships between the benefit of prior-based losses and dataset characteristics. The paper's findings can be summarized as follows: the size loss is generally beneficial on datasets with small structures and limited size variability, whereas the contour-based losses in general, and the Hausdorff loss in particular, accommodate objects with multiple structures and border irregularities.
Future work includes expanding the proposed benchmark in order to encompass a broader perspective of losses. Moreover, we aim to add other metadata features, in order to better characterize the organ and the task at hand, develop robust similarity feature vectors between datasets for more accurate comparison and conduct meta-learning to predict loss ranks and outputs so as to address the computational complexity issues between losses and their peers.
Disclosures
The authors have no relevant financial interests in this article and no potential conflicts of interest to disclose.
Acknowledgments
The authors would like to acknowledge the ANR (Project APi, grant ANR-18-CE23-0014) and the CRIANN for providing computational resources. This work is part of the WeSmile project funded by PHC VanGogh.
Data, Materials, and Code Availability
The code used to conduct the benchmark is publicly available on GitHub:
https://github.com/rosanajurdi/Prior-based-Losses-for-Medical-Image-Segmentation
References
- [1] (2018) Shape-aware deep convolutional neural network for vertebrae segmentation. In Computational Methods and Clinical Applications in Musculoskeletal Imaging, pp. 12–24. ISBN 978-3-319-74113-0.
- [2] (2010) A survey of cross-validation procedures for model selection. Statistics Surveys 4, pp. 40–79.
- [3] (1998) On the Hausdorff distance used for the evaluation of segmentation results.
- [4] (2018-11) Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved?. IEEE Transactions on Medical Imaging 37 (11), pp. 2514–2525. ISSN 1558-254X.
- [5] (2001) Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (11), pp. 1222–1239.
- [6] (2019-07) Distance map loss penalty term for semantic segmentation. In International Conference on Medical Imaging with Deep Learning – Extended Abstract Track, London, UK.
- [7] (2019-01) Explicit topological priors for deep-learning based image segmentation using persistent homology. arXiv:1901.10244 [cs].
- [8] (2006) Generalized overlap measures for evaluation and validation in medical image analysis. IEEE Transactions on Medical Imaging 25 (11), pp. 1451–1461. ISSN 1558-254X.
- [9] (2017) Unbiased shape compactness for segmentation. In MICCAI, pp. 755–763. ISBN 978-3-319-66182-7.
- [10] (2020) BB-UNet: U-net with bounding box prior. IEEE Journal of Selected Topics in Signal Processing 14 (6), pp. 1189–1198.
- [11] (2020-11) Investigating CoordConv for fully and weakly supervised medical image segmentation. In Tenth International Conference on Image Processing Theory, Tools and Applications, IPTA 2020, Paris, France.
- [12] (2021) High-level Prior-based Loss Functions for Medical Image Segmentation: A Survey. Submitted to Computer Vision and Image Understanding.
- [13] (2016-10) Location sensitive deep convolutional neural networks for segmentation of white matter hyperintensities. Scientific Reports 7.
- [14] (2018) Automatic Cardiac Disease Assessment on cine-MRI via Time-Series Segmentation and Domain Specific Features. In Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges, Lecture Notes in Computer Science, pp. 120–129. ISBN 978-3-319-75541-0.
- [15] (2019-04) Reducing the Hausdorff Distance in Medical Image Segmentation with Convolutional Neural Networks. arXiv:1904.10030 [cs, eess, stat].
- [16] (2019-07) Boundary loss for highly unbalanced segmentation. In Medical Imaging with Deep Learning, Proceedings of Machine Learning Research, Vol. 102, London, UK, pp. 285–296.
- [17] (2018) Constrained-CNN losses for weakly supervised segmentation. In 1st Conference on Medical Imaging with Deep Learning (MIDL), Amsterdam, the Netherlands.
- [18] (2020-04) Bounding boxes for weakly supervised segmentation: Global constraints get close to full supervision. arXiv:2004.06816 [cs].
- [19] (2020) Bounding boxes for weakly supervised segmentation: Global constraints get close to full supervision. In Medical Imaging with Deep Learning.
- [20] (2019-09) Constrained Deep Networks: Lagrangian Optimization via Log-Barrier Extensions. arXiv:1904.04205 [cs].
- [21] (2017-07) Simple does it: Weakly supervised instance and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1665–1674. ISSN 1063-6919.
- [22] (2021) Loss odyssey in medical image segmentation. Medical Image Analysis 71, pp. 102035. ISSN 1361-8415.
- [23] (2020) How distance transform maps boost segmentation CNNs: an empirical study. In Medical Imaging with Deep Learning.
- [24] (2018) Star shape prior in fully convolutional networks for skin lesion segmentation. In MICCAI, Vol. 11073, pp. 737–745.
- [25] (2018) Beyond the pixel-wise loss for topology-aware delineation. In CVPR, pp. 3136–3145.
- [26] (2017) Learning and incorporating shape models for semantic segmentation. In MICCAI, pp. 203–211. ISBN 978-3-319-66182-7.
- [27] (2015) U-net: Convolutional networks for biomedical image segmentation. In MICCAI, pp. 234–241. ISBN 978-3-319-24574-4.
- [28] (2019) clDice - a topology-preserving loss function for tubular structure segmentation. In Medical Imaging Meets NeurIPS 2019 Workshop.
- [29] (2019) A large annotated medical image dataset for the development and evaluation of segmentation algorithms. CoRR abs/1902.09063.
- [30] (2017) Joint segmentation of multiple thoracic organs in CT images with two collaborative deep architectures. MICCAI'17 workshop Deep Learning in Medical Image Analysis.
- [31] (2018-01) Class-balanced deep neural network for automatic ventricular structure segmentation. In STACOM@MICCAI, pp. 152–160.