clDice
Accurate segmentation of tubular, network-like structures, such as vessels, neurons, or roads, is relevant to many fields of research. For such structures, the topology is their most important characteristic, e.g. preserving connectedness: in the case of vascular networks, missing a single connected vessel entirely alters the blood-flow dynamics. We introduce a novel similarity measure termed clDice, which is calculated on the intersection of the segmentation masks and their (morphological) skeletons. Crucially, we theoretically prove that clDice guarantees topological correctness for binary 2D and 3D segmentation. Extending this, we propose a computationally efficient, differentiable soft-clDice as a loss function for training arbitrary neural segmentation networks. We benchmark the soft-clDice loss for segmentation on four public datasets (2D and 3D). Training on soft-clDice leads to segmentation with more accurate connectivity information, higher graph similarity, and better volumetric scores.
Segmentation of tubular and curvilinear structures is an essential problem in numerous domains, such as clinical and biological applications (blood vessel and neuron segmentation from microscopic, optoacoustic, and radiology images), remote sensing applications (road network segmentation from satellite images), industrial quality control, etc. In these domains, a topologically accurate segmentation is necessary to guarantee error-free downstream tasks, e.g. computational hemodynamics, Alzheimer's disease prediction [10], or stroke modeling [11]. Analogous to most other image segmentation tasks, the two most commonly used categories of quantitative performance measures for evaluating the segmentation accuracy of tubular structures are 1) overlap-based measures, such as the Dice score, precision, recall, and the Jaccard index; and 2) volumetric distance measures, such as the Hausdorff distance and the Mahalanobis distance [12, 28, 24, 8]. However, in most segmentation problems where the object of interest is 1) locally a tubular structure and 2) globally forms a network, the most important characteristic to be preserved for the success of subsequent tasks is the connectivity of the global network topology. Note that network in this context implies a physically connected structure, such as a vessel network, a road network, etc., which is also the primary structure of interest to be extracted as accurately as possible from the given image data. As an example, consider brain vasculature analysis, where a missed vessel segment in the segmentation mask can be pathologically interpreted as a stroke or may lead to dramatic changes in a global simulation of blood flow. On the other hand, limited over- or under-segmentation can be tolerated, because a marginally thicker or thinner segmentation of a vessel does not affect clinical diagnosis.
For evaluating segmentation in such tubular-network structures, traditional performance indices are sub-optimal. For example, Dice and Jaccard rely on the average voxel-wise hit or miss prediction [33]. In a task like network-topology extraction, a spatially contiguous sequence of correct voxel predictions is more meaningful than a spurious correct prediction. Further, a globally averaged metric does not equally weight tubular structures with large, medium, and small radii (cf. Fig. 1). In real vessel datasets, where vessels of wide radius ranges exist, e.g. 30 µm for arterioles [35, 4] and 5 µm for capillaries, training on a globally averaged loss induces a strong bias towards the volumetric segmentation of large vessels. This is pronounced in imaging modalities, such as fluorescence microscopy [35, 42] and optoacoustics, which focus on mapping small capillary structures. In Figure 1, an example illustrates the sub-optimality of traditional scores in some scenarios.
Furthermore, most traditional metrics are ambiguous when some of the objects of interest are of the same order as the resolution of the signal. Single-voxel shifts in the prediction can change the topology of the network while maintaining a similar global segmentation score, thus making the metric difficult to interpret [33]. To this end, we are interested in a topology-aware segmentation of an image, eventually enabling correct network extraction. Therefore, we ask the following research questions:
What is a good measure for benchmarking segmentation algorithms for tubular, linear, and curvilinear structures while guaranteeing the preservation of network topology?
Can we use this improved measure as a loss function for neural networks?
Achieving topology preservation can be crucial to obtain meaningful segmentation, particularly for elongated and connected shapes, e.g. vascular structures or roads. However, analyzing the preservation of topology while simplifying geometries is a difficult analytical and computational problem [5, 6].
For binary geometries, various algorithms based on thinning and medial surfaces have been proven to be topology-preserving according to varying definitions of topology [14, 15, 16, 23]. For non-binary geometries, existing methods applied topology and connectivity constraints to variational and Markov-random-field-based segmentation methods: tree shape priors for vessel segmentation [31], graph representation priors for natural images [1], higher-order cliques connecting superpixels for road network extraction [39], or integer programming for general curvilinear structures [36], among others [7, 21, 20, 22, 25, 29, 38, 41]. Furthermore, topological priors of containment and detachment were applied to convolutional neural network (CNN)-based segmentation of image features in histology scans [2].

It is critical to differentiate between enforcing a topology prior and training using a topology-preserving loss function: minimizing a topology-preserving loss function guarantees a perfect topology of the segmentation mask. Recently, some approaches have directly implemented topology-aware loss functions for structure segmentation with CNNs. Hu et al. proposed a continuous-valued loss function based on the Betti number [9]. Mosinska et al. claimed that pixel-wise loss functions are unsuitable for topology and used selected filter responses from a VGG19 network as an additional penalty [19]. Nonetheless, the latter approach does not prove topology preservation. The method by Hu et al. is based on a matching of critical points, which, according to the authors, makes the computation very expensive and error-prone for real image-sized patches [9]. Furthermore, these approaches have not been extended to three-dimensional (3D) data.
The objective of this paper is topology preservation while segmenting tubular objects. We introduce a novel connectivity-aware similarity measure named clDice for benchmarking tubular-segmentation algorithms. Importantly, we provide theoretical guarantees for the topological correctness of clDice for binary 2D and 3D segmentation. As a consequence of its formulation based on morphological skeletons, our measure emphasizes the network's topology instead of equally weighting every voxel. Using a differentiable soft-skeletonization, we show that the clDice measure can be used to train neural networks.
We show experimental results for various 2D and 3D network segmentation settings and tasks to demonstrate the practical applicability of our proposed similarity measure and loss function.
In this section, we first introduce clDice as a similarity measure and subsequently introduce a differentiable loss function, the soft-clDice.
We propose a novel connectivity-preserving metric to evaluate tubular and linear structure segmentation based on intersecting skeletons with masks. We call this metric the centerline-in-mask Dice coefficient, or clDice. We consider two binary masks: the ground-truth mask V_L and the predicted segmentation mask V_P. First, the skeletons S_L and S_P are extracted from V_L and V_P, respectively. Subsequently, we compute the fraction of S_P that lies within V_L, which we call topology precision or Tprec(S_P, V_L), and, vice versa, we obtain the topology sensitivity Tsens(S_L, V_P), as defined below:
Tprec(S_P, V_L) = |S_P ∩ V_L| / |S_P|,   Tsens(S_L, V_P) = |S_L ∩ V_P| / |S_L|    (1)
We observe that the Tprec measure is susceptible to false positives in the prediction, while the Tsens measure is susceptible to false negatives. This explains our rationale behind referring to Tprec as the topology's precision and to Tsens as its sensitivity. Since we want to maximize both precision and sensitivity (recall), we construct clDice to be symmetric with respect to both measures:
clDice(V_P, V_L) = 2 × Tprec(S_P, V_L) × Tsens(S_L, V_P) / (Tprec(S_P, V_L) + Tsens(S_L, V_P))    (2)
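As an illustration, clDice of Eq. (2) can be computed from the two masks and their skeletons in a few lines of NumPy; `cl_dice` below is a hypothetical helper and assumes the skeletons are precomputed (e.g. by morphological thinning):

```python
import numpy as np

def cl_dice(v_p, v_l, s_p, s_l, eps=1e-8):
    """clDice of Eq. (2): harmonic mean of topology precision and sensitivity.

    v_p, v_l: predicted and ground-truth binary masks;
    s_p, s_l: their (precomputed) skeletons.
    """
    tprec = (s_p * v_l).sum() / (s_p.sum() + eps)  # fraction of S_P inside V_L
    tsens = (s_l * v_p).sum() / (s_l.sum() + eps)  # fraction of S_L inside V_P
    return 2.0 * tprec * tsens / (tprec + tsens + eps)

# Toy example: a 3-voxel-thick horizontal vessel and its 1-voxel centerline.
v_l = np.zeros((7, 11)); v_l[2:5, :] = 1.0
s_l = np.zeros_like(v_l); s_l[3, :] = 1.0

# A prediction that misses the right half of the vessel.
v_p = v_l.copy(); v_p[:, 6:] = 0.0
s_p = s_l.copy(); s_p[:, 6:] = 0.0
```

A perfect prediction yields clDice ≈ 1, while the truncated prediction above scores markedly lower because Tsens drops, even though its Tprec is still perfect.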
Different approaches to extract skeletons have been described; particularly popular are approaches based on the Euclidean distance transform and approaches that rely on repeated morphological thinning. Although the Euclidean distance transform has been used on multiple occasions to induce skeletons [30, 40], it is a discrete operation and, to the best of our knowledge, an end-to-end differentiable approximation remains to be developed. This prevents the use of the Euclidean distance transform in a loss function for training neural networks. On the contrary, morphological thinning is a sequence of dilation and erosion operations (cf. Fig. 3). Min- and max-filters are commonly used as the grey-scale alternatives of morphological erosion and dilation. Motivated by this, we propose 'soft-skeletonization', where iterative min- and max-pooling is applied as a proxy for morphological erosion and dilation. Algorithm 1 describes the iterative process involved in its computation. The hyper-parameter k involved in its computation represents the number of iterations and has to be greater than or equal to the maximum radius of the tube-like structures. In our experiments, this parameter depends on the dataset (e.g. it differs between the synthetic and the real 3D vessel data). Choosing a larger k does not reduce performance but increases computation time; a too low k, on the other hand, leads to incomplete skeletonization. In Figure 3, the successive steps of our skeletonization are intuitively represented: in the early iterations, structures with a small radius are skeletonized and preserved until the later iterations, when the thicker structures also become skeletonized. This enables the extraction of a morphologically motivated soft-skeleton on real-valued data. The aforementioned soft-skeletonization enables us to use clDice as a fully differentiable, real-valued, optimizable measure; Algorithm 2 describes its implementation. We refer to this as the soft-clDice.

Betti numbers describe and quantify topological differences in algebraic topology. The first three Betti numbers (β0, β1, and β2) comprehensively capture the manifolds appearing in 2D and 3D topological space. Specifically, β0 counts the connected components, β1 the circular holes (loops), and β2 the cavities (in 3D).
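The iterative soft-skeletonization described above can be sketched with NumPy as follows; the 3×3 pooling window and the replicate border padding are assumptions of this sketch, standing in for the grey-scale erosion and dilation (in a training loop the same operations would be expressed as min-/max-pooling layers so that gradients can flow):

```python
import numpy as np

def _pool(img, reduce_fn):
    """3x3 min- or max-pooling with stride 1 (replicate padding at borders)."""
    p = np.pad(img, 1, mode='edge')
    h, w = img.shape
    shifts = [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return reduce_fn(np.stack(shifts), axis=0)

def soft_erode(img):
    # grey-scale erosion as a min-filter
    return _pool(img, np.min)

def soft_dilate(img):
    # grey-scale dilation as a max-filter
    return _pool(img, np.max)

def soft_skeleton(img, k):
    """Iterative soft-skeletonization; k must cover the maximum tube radius."""
    img1 = soft_dilate(soft_erode(img))          # soft morphological opening
    skel = np.clip(img - img1, 0.0, None)        # ReLU(img - open(img))
    for _ in range(k):
        img = soft_erode(img)
        img1 = soft_dilate(soft_erode(img))
        delta = np.clip(img - img1, 0.0, None)
        # accumulate new skeleton responses without double-counting
        skel = skel + np.clip(delta - skel * delta, 0.0, None)
    return skel
```

On a binary bar of thickness three, for instance, the skeleton collapses to the one-voxel centerline after a single iteration; further iterations leave it unchanged.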
Based on the above notation, we formulate the conditions of topology preservation between a labeled binary mask V_L and a predicted binary mask V_P according to Kong et al. [14] in 3D in Fig. 6.
Thinning using morphological operations (skeletonization) is topology-preserving [23]. Therefore, all the topological differences between the labeled mask and a predicted mask are preserved in the topological differences between the skeletons of the actual mask and the predicted mask, respectively. Note that this holds for the skeletons of both the foreground and background regions. Following this, we postulate that topology preservation of a binary mask can be verified through its skeletons using two voxel-specific conditions:
- Top 1 — No ghosts in the skeleton: S_P ⊆ V_L; the predicted skeleton is completely included in the true mask. Otherwise, S_P ⊄ V_L implies ghosts in S_P.
- Top 2 — No misses in the skeleton: S_L ⊆ V_P; the true skeleton is completely included in the predicted mask. Otherwise, S_L ⊄ V_P implies misses in S_L.
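For binary masks, these two conditions reduce to simple set inclusions and can be checked directly; `has_ghosts` and `has_misses` below are hypothetical helper names for such a check:

```python
import numpy as np

def has_ghosts(s_p, v_l):
    # Top 1 violated: part of the predicted skeleton lies outside the true mask.
    return bool((s_p.astype(bool) & ~v_l.astype(bool)).any())

def has_misses(s_l, v_p):
    # Top 2 violated: part of the true skeleton lies outside the predicted mask.
    return bool((s_l.astype(bool) & ~v_p.astype(bool)).any())

# A one-voxel-wide vessel; the prediction misses its last voxel.
v_l = np.zeros((5, 5)); v_l[2, :] = 1
s_l = v_l.copy()
v_p = v_l.copy(); v_p[2, 4] = 0
s_p = v_p.copy()
```

In this example the prediction introduces no ghosts (its skeleton stays inside the true mask) but does produce a miss, since one voxel of the true skeleton falls outside the predicted mask.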
Table 1 redefines the topology-preserving conditions defined for masks (cf. Fig. 6) in terms of Top 1 and Top 2 properties described above. Essentially, it summarizes the necessary conditions when the topology is not preserved based on these two key properties, in terms of the foreground and background skeleton.
Denoting the sets of foreground and background voxels with subscripts F and B, respectively, we represent the voxels in the true mask with V_LF and V_LB and the voxels in the predicted mask with V_PF and V_PB. We define loss functions for the foreground and background classes as:
clDice_F = clDice(V_PF, V_LF)    (3)

clDice_B = clDice(V_PB, V_LB)    (4)
Equipped with this notation and with the conditions in Table 1, we prove the following aspects of clDice:
Optimal clDice score in voxel-specific conditions achieves perfect topology.
Minimizing topology mismatch implies maximizing clDice.
Any misses or ghosts in the skeleton of the prediction decrease the clDice.
If clDice = 1, the topology is preserved.
We formulate the following theorems to show that minimizing topology mismatch implies maximizing clDice.
Table 1: Topological changes and the violated condition (Top 1: ghosts, Top 2: misses) in the foreground or background skeleton.

| | Topological change | Foreground | Background |
|---|---|---|---|
| I. | New connected component is created | Top 1 | – |
| II. | Connected components are merged | – | Top 2 |
| III. | Connected component is deleted | Top 2 | – |
| IV. | New hole is created | – | Top 1 |
| V. | Holes have been merged | Top 2 | – |
| VI. | Hole is deleted | – | Top 2 |
| VII. | New cavity is created | – | Top 1 |
| VIII. | Cavities are merged | Top 2 | – |
| IX. | Cavity is deleted | – | Top 2 |
Theorem 1. Any ghosts in the skeleton of the prediction decrease the clDice.
Let us consider the true skeleton S_L of a true mask V_L and a perfectly predicted skeleton S_P′, without any ghosts and misses, from a predicted mask V_P′, where S_L and S_P′ are the skeleton points of V_L and V_P′, respectively. Since there are no ghosts or missing components in the skeleton, we have S_P′ ⊆ V_L and S_L ⊆ V_P′, which implies that the skeletons S_L and S_P′, as well as the corresponding masks V_L and V_P′, have the same topology. Considering the topological precision Tprec for S_P′:
Tprec(S_P′, V_L) = |S_P′ ∩ V_L| / |S_P′| = 1    (5)
Now, without loss of generality, let us consider the case of a topological change, such that for a predicted mask V_P″ with no misses in the skeleton, a ghost skeleton S_P″ was reconstructed that contains connected segments outside V_L. Let us denote the clDice of the perfect prediction and of the prediction with ghosts as clDice′ and clDice″, respectively.
S_P″ ⊄ V_L, i.e. |S_P″ ∩ V_L| < |S_P″|    (6)
Considering the topological precision Tprec for S_P″:
Tprec(S_P″, V_L) = |S_P″ ∩ V_L| / |S_P″| < 1 = Tprec(S_P′, V_L)    (7)
Since the skeletonization algorithm preserves topology and there are no missing components in the prediction V_P″, we have S_L ⊆ V_P″. Considering the topological sensitivities Tsens′ and Tsens″ of V_P′ and V_P″, respectively,
Tsens(S_L, V_P″) = Tsens(S_L, V_P′) = 1    (8)
Combining (7) and (8) in (2), and given that the values of sensitivity and precision belong to [0, 1] by definition, we obtain the following:
clDice(V_P″, V_L) < clDice(V_P′, V_L)    (9)
Theorem 2. Any misses in the skeleton of the prediction decrease the clDice.
Similar to the proof of Theorem 1, we consider a true mask V_L and a predicted mask V_P′, with their respective true skeleton S_L and perfectly predicted skeleton S_P′ without any ghosts and misses. Since there are no ghosts or missing components in the skeleton, we have S_P′ ⊆ V_L and S_L ⊆ V_P′; considering the topological sensitivity Tsens for S_L:
Tsens(S_L, V_P′) = |S_L ∩ V_P′| / |S_L| = 1    (10)
Similar to the formulation of Theorem 1, without loss of generality, let us consider the case of a topological change for a predicted mask V_P″ with no ghosts, but with one or more misses in the skeleton, such that there exists a connected segment of the true skeleton S_L which is outside of V_P″. Let us denote the clDice of the optimal (perfect) prediction and of the prediction with misses as clDice′ and clDice″, respectively.
S_L ⊄ V_P″, i.e. |S_L ∩ V_P″| < |S_L|    (11)
Considering the topological sensitivity Tsens for V_P″:
Tsens(S_L, V_P″) = |S_L ∩ V_P″| / |S_L| < 1 = Tsens(S_L, V_P′)    (12)
Since the skeletonization algorithm preserves topology and there are no ghosts in the predicted skeleton S_P″, we have S_P″ ⊆ V_L. Considering the topological precisions Tprec′ and Tprec″ of V_P′ and V_P″, respectively,
Tprec(S_P″, V_L) = Tprec(S_P′, V_L) = 1    (13)

Combining (12) and (13) in (2), and given that sensitivity and precision take values in [0, 1], we obtain:

clDice(V_P″, V_L) < clDice(V_P′, V_L)    (14)
Since our objective here is to preserve topology while achieving accurate segmentations, we combine our proposed soft-clDice with soft-Dice as follows:

L_c = (1 − α)(1 − softDice) + α(1 − soft-clDice)    (15)
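Assuming soft skeletons are available (e.g. from the soft-skeletonization of Algorithm 1), the combined loss of Eq. (15) can be sketched as below; the helper names are illustrative, not the authors' reference implementation:

```python
import numpy as np

def soft_dice(v_p, v_l, eps=1e-8):
    inter = (v_p * v_l).sum()
    return (2.0 * inter + eps) / (v_p.sum() + v_l.sum() + eps)

def soft_cl_dice(v_p, v_l, s_p, s_l, eps=1e-8):
    tprec = ((s_p * v_l).sum() + eps) / (s_p.sum() + eps)
    tsens = ((s_l * v_p).sum() + eps) / (s_l.sum() + eps)
    return 2.0 * tprec * tsens / (tprec + tsens)

def combined_loss(v_p, v_l, s_p, s_l, alpha=0.5):
    # L_c = (1 - alpha) * (1 - softDice) + alpha * (1 - soft-clDice)  (Eq. 15)
    return ((1.0 - alpha) * (1.0 - soft_dice(v_p, v_l))
            + alpha * (1.0 - soft_cl_dice(v_p, v_l, s_p, s_l)))

# A 3-voxel-thick vessel, its centerline, and a prediction missing one half.
v = np.zeros((7, 11)); v[2:5, :] = 1.0
s = np.zeros_like(v); s[3, :] = 1.0
bad = v.copy(); bad[:, 6:] = 0.0
sb = s.copy(); sb[:, 6:] = 0.0
```

A perfect prediction drives the loss to zero, while the truncated prediction is penalized by both terms; α trades off the volumetric and the topological component.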
In stark contrast to previous works, where segmentation and centerline prediction have been learned jointly via multi-task learning [37, 34], we are not interested in learning the centerline: we are interested in learning a topology-preserving segmentation. Therefore, we restrict our experimental choice of α to the range [0.1, 0.5].
We use the proposed clDice to evaluate the segmentation performance of two state-of-the-art network architectures: i) a 2D and 3D U-Net [26, 3], and ii) a 2D and 3D fully convolutional network (FCN) [34]. As baselines, we use the same architectures trained using the generalized soft-Dice loss [17, 32].
In all, we employ four datasets for validating clDice and soft-clDice as a measure and an objective function, respectively. In 2D, we test on the DRIVE retina dataset (https://drive.grand-challenge.org/) and the Massachusetts Roads dataset [18] (https://www.cs.toronto.edu/~vmnih/data/). In 3D, we use a synthetic and a real brain-vessel dataset. The generation of the synthetic vessel data is described in [27] (https://github.com/giesekow/deepvesselnet/wiki/Datasets); additionally, we add a Gaussian noise term to this generated data. The real 3D dataset consists of multi-channel volumetric scans of the brain vasculature, which were obtained using light-sheet microscopy of tissue-cleared murine brains and made publicly available in [35] (http://discotechnologies.org/VesSAP/).
For the DRIVE vessel segmentation dataset, we perform three-fold cross-validation with 30 images and deploy the best-performing model on the test set of 10 images. For the Massachusetts Roads dataset, we choose a subset of 120 images (ignoring images without a network of roads) for three-fold cross-validation and test the models on the 13 official test images. For the 3D synthetic dataset, we perform experiments using 15 single-channel volumes for training, 2 for validation, and 5 for testing. For the real 3D dataset, we use 11 volumes for training, 2 for validation, and 4 for testing. In each of these cases, we report the performance of the model with the highest Dice score on the validation set.
As described in Section 3, in theory clDice is defined for the two-class case and should be computed on both the foreground and the background channels. However, in practice, this is hindered by the imbalance between foreground and background classes (e.g. in vessel and road datasets). The class imbalance would substantially increase the computational cost of calculating skeletons on the majority class (typically the background class). Thus, we calculate clDice only on the foreground. Note that this is not detrimental to the performance of clDice in the context of the datasets considered in our experiments. We attribute this to the non-applicability of the necessary conditions specific to the background (i.e. II, IV, VI, VII, and IX in Table 1), as explained below:
II. In tubular structures, all foreground objects are eccentric (or anisotropic). Therefore isotropic skeletonization will highly likely produce a ghost in the foreground.
IV. Creating a hole outside the labeled mask means adding a ghost in the foreground. Creating a hole inside the labeled mask is extremely unlikely because no such holes exist in our training data.
VI. The deletion of a hole without creating a miss is extremely unlikely because of the sparsity of the data.
VII. (only for 3D) Creating a cavity is very unlikely because no cavities exist in our training data.
IX. (only for 3D) Cavities do not exist in the real dataset.
We compare the performance of the various experimental setups using two types of metrics: overlap-based and topology-based.
Overlap-based: Dice coefficient, Accuracy, and the proposed clDice.
Topology-based: We extract a vascular graph from the skeleton of the predicted segmentation and compute the relative accuracy (1 − relative error) of the total vascular network length (Dist.) and the ratio of detected bifurcation points (Bifurc.) with respect to the ground truth, which describe graph similarity. Finally, we measure topological similarity using the Euler characteristic, χ = V − E + F, where V is the number of vertices, E the number of edges, and F the number of faces. We report the relative Euler characteristic error as the ratio of the χ of the predicted mask to that of the ground truth (the χ ratio). Note that a ratio closer to one is preferred.
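For a skeleton graph without faces, the Euler characteristic reduces to χ = V − E, so the reported ratio can be computed directly from vertex and edge counts; the helpers below are an illustrative sketch, not the evaluation code used in the paper:

```python
def euler_characteristic(n_vertices, edges, n_faces=0):
    # chi = V - E + F; for an extracted vascular graph, F = 0.
    return n_vertices - len(edges) + n_faces

def euler_ratio(chi_pred, chi_true):
    # Ratio of predicted to ground-truth Euler characteristic; 1 is ideal.
    return chi_pred / chi_true

# A tree with 4 vertices and 3 edges: one component, no loops -> chi = 1.
chi_tree = euler_characteristic(4, [(0, 1), (1, 2), (1, 3)])
# Closing a loop lowers chi by one: chi = 4 - 4 = 0.
chi_loop = euler_characteristic(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

Since χ = β0 − β1 (+ β2 in 3D), a spurious loop or a broken component in the prediction shifts the ratio away from one even when the volumetric overlap barely changes.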
Table 2: Quantitative results on the 2D datasets (for the DRIVE FCN, α denotes the weight of soft-clDice in the combined loss (15)).

| Data | Network | Loss | Dice | clDice | Acc. | Dist. | Bifurc. | χ ratio |
|---|---|---|---|---|---|---|---|---|
| DRIVE retina | FCN | soft-dice | 78.23 | 78.02 | 96.27 | 0.82 | 0.72 | 1.35 |
| | | α = 0.1 | 78.36 | 79.02 | 96.25 | 0.83 | 0.78 | 1.32 |
| | | α = 0.2 | 78.75 | 80.22 | 96.29 | 0.83 | 0.79 | 1.10 |
| | | α = 0.3 | 78.29 | 80.28 | 96.20 | 0.81 | 0.73 | 1.08 |
| | | α = 0.4 | 78.00 | 80.43 | 96.11 | 0.81 | 0.77 | 1.17 |
| | | α = 0.5 | 77.76 | 80.95 | 96.04 | 0.83 | 0.79 | 0.97 |
| DRIVE retina | U-Net | soft-dice | 74.25 | 75.71 | 95.63 | 0.73 | 0.58 | 1.56 |
| | | + soft-clDice | 75.21 | 76.86 | 95.82 | 0.77 | 0.72 | 1.08 |
| Road-Network | U-Net | soft-dice | 70.98 | 81.45 | 96.38 | 0.86 | 0.73 | 2.09 |
| | | + soft-clDice | 71.16 | 82.12 | 96.30 | 0.88 | 0.74 | 1.48 |
Table 3: Quantitative results on the 3D datasets.

| Data | Network | Loss | Dice | clDice | Acc. | Dist. | Bifurc. | χ ratio |
|---|---|---|---|---|---|---|---|---|
| Synthetic | FCN, 1 ch | soft-dice | 99.41 | 99.45 | 99.97 | 0.92 | 0.91 | 0.81 |
| | | + soft-clDice | 99.16 | 99.77 | 99.96 | 0.92 | 0.91 | 0.82 |
| | U-Net, 1 ch | soft-dice | 99.61 | 99.90 | 99.98 | 0.88 | 0.86 | 0.83 |
| | | + soft-clDice | 98.73 | 99.90 | 99.94 | 0.88 | 0.86 | 0.84 |
| Vessap data | FCN, 1 ch | soft-dice | 75.28 | 90.98 | 89.88 | 0.87 | 0.72 | 1.51 |
| | | + soft-clDice | 85.57 | 96.16 | 95.09 | 0.82 | 0.88 | 0.97 |
| | FCN, 2 ch | soft-dice | 78.54 | 92.03 | 91.66 | 0.90 | 0.82 | 1.33 |
| | | + soft-clDice | 85.28 | 95.75 | 94.91 | 0.91 | 0.91 | 1.11 |
| | U-Net, 1 ch | soft-dice | 87.11 | 95.03 | 95.78 | 0.92 | 0.82 | 0.77 |
| | | + soft-clDice | 86.94 | 95.28 | 95.86 | 0.94 | 0.83 | 0.78 |
| | U-Net, 2 ch | soft-dice | 80.20 | 93.05 | 92.33 | 0.95 | 0.93 | 1.24 |
| | | + soft-clDice | 83.96 | 96.10 | 94.18 | 0.96 | 0.89 | 0.92 |
We trained a U-Net and an FCN with the different loss functions in identical settings. In Table 2, we present an experiment in which we trained five models with α varying from 0.1 to 0.5 on the DRIVE dataset. We observe that including soft-clDice in any proportion leads to improved topological similarity. Further, increasing α consistently improves the clDice measure. The inclusion of soft-clDice improves Dice and accuracy and, more importantly, preserves connectedness and improves the topological and graph similarity. In the case of 3D data, we observe similar trends, although less pronounced in the synthetic data. We attribute this to the relatively simple features of the synthetic data, which has a high signal-to-noise ratio and lacks significant illumination variation. However, we observe significant improvements for all measures in the case of the more complex multi-channel microscopic vessel data; see Figure 8. Despite not optimizing the soft-clDice on the background class, all of our networks converge to superior segmentation results. This not only reinforces our assumptions on dataset-specific necessary conditions but also validates the practical applicability of our loss. Our findings hold for the different network architectures, for 2D and 3D, and for tubular and curvilinear structures, strongly indicating generalizability to analogous binary segmentation tasks. In Figure 8, typical results for our datasets are depicted. Networks trained on the proposed loss recover connections that are false negatives when trained with the soft-dice loss. Interestingly, on the real 3D vessel dataset, the soft-dice loss over-segments stray light around large vessels, while the proposed loss function does not, owing to its topology-preserving nature.
We introduce clDice, a novel connectivity-preserving similarity measure for tubular structure segmentation. Importantly, we present a theoretical guarantee that clDice enforces topology preservation in 3D. First, we use the new metric to benchmark segmentation quality from a topology-preserving perspective. Next, we use a differentiable version, soft-clDice, in a loss function, to train state-of-the-art 2D and 3D neural networks. We find that training on soft-clDice leads to segmentations with more accurate connectivity information, better Euler characteristics and improved Dice and Accuracy. Our soft-clDice is computationally efficient and can be readily deployed in other tubular or linear-structured object segmentation tasks such as neuron segmentation in biomedical imaging, crack detection in industrial quality control or remote sensing.
Suprosanna Shit, Andrey Zhylka, and Ivan Ezhov are supported by the Translational Brain Imaging Training Network (TRABIT) under the European Union's Horizon 2020 research and innovation program (grant agreement ID: 765148), with the support of the Technical University of Munich – Institute for Advanced Study, funded by the German Excellence Initiative. Johannes C. Paetzold and Suprosanna Shit are supported by the Graduate School of Bioengineering, Technical University of Munich. We thank Mihail I. Todorov and Ali Ertürk.
Quantitative results for the Massachusetts Roads dataset with the FCN (α denotes the weight of soft-clDice in the combined loss (15)).

| Data | Network | Loss | Dice | clDice | Acc. | Dist. | Bifurc. | χ ratio |
|---|---|---|---|---|---|---|---|---|
| Road-Network | FCN | soft-dice | 64.84 | 70.79 | 95.16 | 0.88 | 0.56 | 28.22 |
| | | α = 0.1 | 66.52 | 74.80 | 95.70 | 0.86 | 0.65 | 15.41 |
| | | α = 0.2 | 67.42 | 76.25 | 95.80 | 0.86 | 0.67 | 13.73 |
| | | α = 0.3 | 65.90 | 74.86 | 95.35 | 0.87 | 0.61 | 15.39 |
| | | α = 0.4 | 67.18 | 76.92 | 95.46 | 0.91 | 0.67 | 15.35 |
| | | α = 0.5 | 65.77 | 75.22 | 95.09 | 0.91 | 0.71 | 17.39 |
The architectures are built from the following blocks: convolutional layers, each followed by an activation function and batch normalization; transposed convolutional layers, each followed by an activation function and batch normalization; max-pooling; and, for the U-Net, skip connections concatenating information from the corresponding encoder block into the decoder. We chose a different FCN architecture for the Massachusetts Roads dataset, since a larger model is needed to learn useful features for this more complex task.
The remaining architectures are the same as for the DRIVE dataset.
| Dataset | Network | Number of parameters |
|---|---|---|
| DRIVE | FCN | 15.52K |
| | U-Net | 28.94M |
| Road | FCN | 279.67K |
| 3D | FCN, 2 ch | 58.71K |
| | U-Net, 2 ch | 178.45M |