clDice – a Topology-Preserving Loss Function for Tubular Structure Segmentation

03/16/2020 ∙ by Suprosanna Shit, et al. ∙ Technische Universität München

Accurate segmentation of tubular, network-like structures, such as vessels, neurons, or roads, is relevant to many fields of research. For such structures, the topology is their most important characteristic, e.g. preserving connectedness: in case of vascular networks, missing a connected vessel entirely alters the blood-flow dynamics. We introduce a novel similarity measure termed clDice, which is calculated on the intersection of the segmentation masks and their (morphological) skeletons. Crucially, we theoretically prove that clDice guarantees topological correctness for binary 2D and 3D segmentation. Extending this, we propose a computationally efficient, differentiable soft-clDice as a loss function for training arbitrary neural segmentation networks. We benchmark the soft-clDice loss for segmentation on four public datasets (2D and 3D). Training on soft-clDice leads to segmentation with more accurate connectivity information, higher graph similarity, and better volumetric scores.







1 Introduction

Segmentation of tubular and curvilinear structures is an essential problem in numerous domains, such as clinical and biological applications (blood vessel and neuron segmentation from microscopic, optoacoustic, and radiology images), remote sensing applications (road network segmentation from satellite images), industrial quality control, etc. In these domains, a topologically accurate segmentation is necessary to guarantee error-free downstream tasks, e.g. computational hemodynamics, Alzheimer's disease prediction [10], or stroke modeling [11]. Analogous to most other image segmentation tasks, the two most commonly used categories of quantitative performance measures for evaluating the segmentation accuracy of tubular structures are 1) overlap-based measures, such as the Dice score, precision, recall, and the Jaccard index; and 2) volumetric distance measures, such as the Hausdorff distance and the Mahalanobis distance [12, 28, 24, 8].

Figure 1: Motivation: (a) an exemplary 2D slice of real microscopic data; (b) and (c) are two segmentation results (not from our presented model) of near-identical quality in terms of the traditional Dice score. Note that (b) does not capture any of the small vessels while segmenting the large vessel very accurately; on the other hand, segmentation (c) captures all vessels in the image while being less accurate on the diameter of the large vessel. From a topology or network perspective, segmentation (c) is preferable.

However, in most segmentation problems where the object of interest is 1) locally a tubular structure and 2) globally forms a network, the most important characteristic to preserve for the success of subsequent tasks is the connectivity of the global network topology. Note that network in this context implies a physically connected structure, such as a vessel network or a road network, which is also the primary structure of interest to be extracted as accurately as possible from the given image data. As an example, consider brain vasculature analysis, where a missed vessel segment in the segmentation mask can pathologically be interpreted as a stroke or may lead to dramatic changes in a global simulation of blood flow. On the other hand, limited over- or under-segmentation can be tolerated, because a marginally thicker or thinner segmentation of a vessel does not affect clinical diagnosis.

For evaluating segmentation of such tubular-network structures, traditional performance indices are sub-optimal. For example, Dice and Jaccard rely on the average voxel-wise hit-or-miss prediction [33]. In a task like network-topology extraction, a spatially contiguous sequence of correct voxel predictions is more meaningful than scattered correct predictions. Further, a globally averaged metric does not equally weight tubular structures with large, medium, and small radii (cf. Fig. 1). In real vessel datasets, where vessels span a wide range of radii, e.g. ~30 µm for arterioles [35, 4] and ~5 µm for capillaries, training on a globally averaged loss induces a strong bias towards the volumetric segmentation of large vessels. This is pronounced in imaging modalities, such as fluorescence microscopy [35, 42] and optoacoustic imaging, which focus on mapping small capillary structures. Figure 1 illustrates the sub-optimality of traditional scores in such scenarios.

Furthermore, most traditional metrics are ambiguous when some of the objects of interest are of the same order as the resolution of the signal. Single-voxel shifts in the prediction can change the topology of the network while maintaining a similar global segmentation score, thus making the metric difficult to interpret [33]. To this end, we are interested in a topology-aware segmentation of an image, eventually enabling correct network extraction. Therefore, we ask the following research questions:

  1. What is a good measure to benchmark segmentation algorithms for tubular, linear, and curvilinear structures while guaranteeing the preservation of the network topology?

  2. Can we use this improved measure as a loss function for neural networks?

1.1 Related Literature

Achieving topology preservation can be crucial to obtain meaningful segmentation, particularly for elongated and connected shapes, e.g. vascular structures or roads. However, analyzing the preservation of topology while simplifying geometries is a difficult analytical and computational problem [5, 6].

For binary geometries, various algorithms based on thinning and medial surfaces have been proven to be topology-preserving according to varying definitions of topology [14, 15, 16, 23]. For non-binary geometries, existing methods applied topology and connectivity constraints to variational and Markov-random-field-based segmentation methods: tree-shape priors for vessel segmentation [31], graph-representation priors for natural images [1], higher-order cliques connecting superpixels for road-network extraction [39], or integer programming for general curvilinear structures [36], among others [7, 21, 20, 22, 25, 29, 38, 41]. Furthermore, topological priors of containment and detachment were applied to convolutional neural network (CNN)-based segmentation of image features in histology scans [2].
It is critical to differentiate between enforcing a topology prior and training with a topology-preserving loss function: minimizing a topology-preserving loss function guarantees a perfect topology of the segmentation mask. Recently, some approaches have directly implemented topology-aware loss functions for structure segmentation with CNNs. Hu et al. proposed a continuous-valued loss function based on the Betti number [9]. Mosinska et al. claimed that pixel-wise loss functions are unsuitable for topology and used selected filter responses from a VGG19 network as an additional penalty [19]. Nonetheless, the latter approach does not guarantee topology preservation. The method by Hu et al. is based on a matching of critical points, which, according to the authors, makes the computation very expensive and error-prone for patches of realistic image size [9]. Furthermore, these approaches have not been extended to three-dimensional (3D) data.

Figure 2: Schematic overview of our proposed method: our proposed clDice loss can be applied to any arbitrary segmentation network. The soft-skeletonization can be easily implemented using pooling functions from any standard deep-learning toolbox.

1.2 Our Contributions

The objective of this paper is topology preservation while segmenting tubular objects. We introduce a novel connectivity-aware similarity measure named clDice for benchmarking tubular-segmentation algorithms. Importantly, we provide theoretical guarantees for the topological correctness of clDice for binary 2D and 3D segmentation. As a consequence of its formulation based on morphological skeletons, our measure emphasizes the network's topology instead of equally weighting every voxel. Using a differentiable soft-skeletonization, we show that the clDice measure can be used to train neural networks.

We show experimental results for various 2D and 3D network segmentation settings and tasks to demonstrate the practical applicability of our proposed similarity measure and loss function.

2 Let’s Emphasize Connectivity

In this section, we first introduce clDice as a similarity measure and subsequently introduce a differentiable loss function, namely soft-clDice.

Figure 3: Based on the initial vessel structure (purple), sequential bagging of skeleton voxels (red) via iterative skeletonization leads to a complete skeletonization, where d denotes the diameter and k the number of iterations.

2.0.1 clDice Measure:

We propose a novel connectivity-preserving metric to evaluate tubular and linear structure segmentation based on intersecting skeletons with masks. We call this metric the centerline-in-mask Dice coefficient, or clDice. We consider two binary masks: the ground-truth mask (V_L) and the predicted segmentation mask (V_P). First, the skeletons S_L and S_P are extracted from V_L and V_P, respectively. Subsequently, we compute the fraction of S_P that lies within V_L, which we call Topology Precision or Tprec(S_P, V_L), and vice versa we obtain the Topology Sensitivity or Tsens(S_L, V_P), as defined below:

    Tprec(S_P, V_L) = |S_P ∩ V_L| / |S_P|,    Tsens(S_L, V_P) = |S_L ∩ V_P| / |S_L|.    (1)

We observe that the Tprec measure is susceptible to false positives in the prediction, while the Tsens measure is susceptible to false negatives. This explains our rationale behind referring to Tprec as the topology's precision and to Tsens as its sensitivity. Since we want to maximize both precision and sensitivity (recall), we construct clDice to be symmetric with respect to both measures, as their harmonic mean:

    clDice(V_P, V_L) = 2 · Tprec(S_P, V_L) · Tsens(S_L, V_P) / (Tprec(S_P, V_L) + Tsens(S_L, V_P)).    (2)
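For binary masks with precomputed skeletons, the two measures reduce to simple overlap fractions. A minimal NumPy sketch (the function name and the toy arrays are ours, for illustration only):

```python
import numpy as np

def cl_dice(v_p, s_p, v_l, s_l):
    # Tprec: fraction of the predicted skeleton inside the true mask.
    tprec = (s_p & v_l).sum() / s_p.sum()
    # Tsens: fraction of the true skeleton inside the predicted mask.
    tsens = (s_l & v_p).sum() / s_l.sum()
    # clDice: harmonic mean of the two overlap fractions.
    return 2 * tprec * tsens / (tprec + tsens)

# Toy example: a thick horizontal bar and its one-voxel centerline.
v_l = np.zeros((5, 10), dtype=bool); v_l[1:4, :] = True
s_l = np.zeros_like(v_l);            s_l[2, :] = True
v_p = v_l.copy(); v_p[:, 6:] = False   # prediction misses the right part
s_p = s_l.copy(); s_p[:, 6:] = False
print(cl_dice(v_p, s_p, v_l, s_l))     # Tprec = 1, Tsens = 0.6 -> clDice = 0.75
```

In practice the skeletons would come from a topology-preserving thinning algorithm rather than being constructed by hand.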
2.0.2 Soft-clDice as a Loss Function using Soft-skeletonization:

Different approaches to extract skeletons have been described; popular ones use the Euclidean distance transform or repeated morphological thinning. Although the Euclidean distance transform has been used on multiple occasions to induce skeletons [30, 40], it is a discrete operation and, to the best of our knowledge, an end-to-end differentiable approximation remains to be developed. This prevents its use in a loss function for training neural networks. Morphological thinning, on the contrary, is a sequence of dilation and erosion operations [c.f. Fig. 3]. Min- and max-filters are commonly used as the grey-scale alternatives of morphological erosion and dilation. Motivated by this, we propose 'soft-skeletonization', where iterative min- and max-pooling is applied as a proxy for morphological erosion and dilation. Algorithm 1 describes the iterative process involved in its computation. The hyper-parameter k involved in its computation represents the number of iterations and has to be greater than or equal to the maximum observed radius of the tube-like structures. In our experiments, this parameter depends on the dataset; e.g., it differs between the synthetic and the real 3D vessel data. Choosing a larger k does not reduce performance but increases computation time; a too small k leads to incomplete skeletonization. In Figure 3, the successive steps of our skeletonization are intuitively represented: in the early iterations, structures with a small radius are skeletonized and preserved until the later iterations, when the thicker structures also become skeletonized. This enables the extraction of a parameter-free, morphologically motivated soft-skeleton on real-valued data. The aforementioned soft-skeletonization enables us to use clDice as a fully differentiable, real-valued, optimizable measure. Algorithm 2 describes its implementation; we refer to this as the soft-clDice.

Input : I, k
S ← ReLU(I − maxpool(minpool(I)))
for i ← 1 to k do
    I ← minpool(I)
    S ← S + (1 − S) ⊙ ReLU(I − maxpool(minpool(I)))
end for
Output : S
Algorithm 1 soft-skeleton
Input : V_P, V_L
S_P ← soft-skeleton(V_P);  S_L ← soft-skeleton(V_L)
Tprec ← |S_P ⊙ V_L| / |S_P|;  Tsens ← |S_L ⊙ V_P| / |S_L|
clDice ← 2 · Tprec · Tsens / (Tprec + Tsens)
Output : clDice
Algorithm 2 soft-clDice
Figure 4: Algorithm description: in Algorithm 1, I is the mask to be soft-skeletonized and k the number of iterations for skeletonization. In Algorithm 2, V_P is a real-valued probabilistic prediction from a segmentation network and V_L is the true mask. We denote the Hadamard product using ⊙.
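Algorithms 1 and 2 can be sketched in NumPy, using 3×3 min-/max-filters with edge padding as stand-ins for the min- and max-pooling layers of a deep-learning toolbox (all names and the padding choice are our assumptions, not from the paper):

```python
import numpy as np

def _pool(img, mode):
    # 3x3 min-/max-filter, stride 1, edge padding: a NumPy stand-in for
    # the min-/max-pooling layers of a deep-learning framework.
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    views = [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return (np.min if mode == "min" else np.max)(np.stack(views), axis=0)

def soft_skeleton(img, k):
    # Algorithm 1: accumulate the ridge response ReLU(I - open(I))
    # while I is iteratively eroded k times.
    s = np.maximum(img - _pool(_pool(img, "min"), "max"), 0.0)
    for _ in range(k):
        img = _pool(img, "min")
        s = s + (1.0 - s) * np.maximum(img - _pool(_pool(img, "min"), "max"), 0.0)
    return s

def soft_cldice(v_p, v_l, k=10, eps=1e-8):
    # Algorithm 2: harmonic mean of soft topology precision and sensitivity.
    s_p, s_l = soft_skeleton(v_p, k), soft_skeleton(v_l, k)
    tprec = (s_p * v_l).sum() / (s_p.sum() + eps)
    tsens = (s_l * v_p).sum() / (s_l.sum() + eps)
    return 2.0 * tprec * tsens / (tprec + tsens + eps)
```

In a training loss one would replace the NumPy filters with the framework's pooling operators on batched tensors, so that gradients flow through the skeletonization.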

3 Topology Preserving Guarantees for clDice

Figure 5: Examples of the topology terminology. Left, a hole in 2D; middle, a hole in 3D; right, a cavity inside a sphere in 3D.

Betti numbers describe and quantify topological differences in algebraic topology. The first three Betti numbers (β0, β1, and β2) comprehensively capture the topology of the manifolds appearing in 2D and 3D space. Specifically,

  • β0 represents the number of distinct connected components,

  • β1 represents the number of circular holes [c.f. Fig. 5], and

  • β2 represents the number of cavities [c.f. Fig. 5] (only applicable in 3D).

Figure 6: Taxonomy of the conditions to preserve topology in 3D [14, 13].

Based on the above notation, we formulate the conditions of topology preservation between a labeled binary mask (V_L) and a predicted binary mask (V_P) according to Kong et al. [14] in 3D in Fig. 6.

3.0.1 Topology-preserving skeletonization:

Thinning using morphological operations (skeletonization) is topology-preserving [23]. Therefore, all topological differences between the labeled mask and a predicted mask are preserved as topological differences between their respective skeletons. Note that this holds for the skeletons of both the foreground and the background regions. Following this, we express the topology preservation of a binary mask through its skeletons using two voxel-specific conditions:

  1. Top 1 - No ghosts in the skeleton: S_P ⊆ V_L; the predicted skeleton is completely included in the true mask. Otherwise, S_P ⊄ V_L implies ghosts in S_P.

  2. Top 2 - No misses in the skeleton: S_L ⊆ V_P; the true skeleton is completely included in the predicted mask. Otherwise, S_L ⊄ V_P implies misses in the prediction.
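For binary arrays, the two conditions are plain inclusion checks; a small sketch (helper names are ours):

```python
import numpy as np

def no_ghosts(s_p, v_l):
    # Top 1: every predicted-skeleton voxel lies inside the true mask.
    return bool(np.all(v_l[s_p]))

def no_misses(s_l, v_p):
    # Top 2: every true-skeleton voxel lies inside the predicted mask.
    return bool(np.all(v_p[s_l]))
```

Both checks must hold for the foreground and, symmetrically, for the background skeletons.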

Table 1 redefines the topology-preserving conditions defined for masks (cf. Fig. 6) in terms of Top 1 and Top 2 properties described above. Essentially, it summarizes the necessary conditions when the topology is not preserved based on these two key properties, in terms of the foreground and background skeleton.

Denoting the set of foreground and background voxels with subscripts F and B, respectively, we represent the voxels in the true mask with V_{L,F} and V_{L,B} and the voxels in the predicted mask with V_{P,F} and V_{P,B}. We define the clDice measure for the foreground and background classes as:

    clDice_F = clDice(V_{P,F}, V_{L,F}),    clDice_B = clDice(V_{P,B}, V_{L,B}).
Equipped with this notation and with the conditions in Table 1, we prove the following aspects of clDice:

  1. An optimal clDice score, under the voxel-specific conditions, implies a perfect topology.

  2. Minimizing topology mismatch implies maximizing clDice.

  3. Any misses or ghosts in the skeleton of the prediction decrease the clDice.

Theorem 3.1

If clDice_F = 1 and clDice_B = 1, then the topology is preserved.


Proof. By the definition of topology precision and sensitivity in Equation 1, clDice_F = clDice_B = 1 implies Tprec = 1 and Tsens = 1 for both the foreground and the background, i.e. S_P ⊆ V_L and S_L ⊆ V_P for both classes. This means all sufficient conditions to preserve topology, i.e. Top 1 and Top 2, are satisfied, and hence topology is preserved.

We formulate the following theorems to show that minimizing topology mismatch implies maximizing clDice.

Topological change | Foreground | Background
I. New connected component is created | Top 1 | –
II. Connected components are merged | – | Top 2
III. Connected component is deleted | Top 2 | –
IV. New hole is created | – | Top 1
V. Holes are merged | Top 2 | –
VI. Hole is deleted | – | Top 2
VII. New cavity is created | – | Top 1
VIII. Cavities are merged | Top 2 | –
IX. Cavity is deleted | – | Top 2
Table 1: Necessary violation of the skeleton properties (Top 1 and Top 2) for each of the topological changes.
Figure 7: Intuitive depictions of ghosts and misses in the prediction, for the skeleton of the foreground (left) and the skeleton of the background (right).
Theorem 3.2

Any ghosts in the skeleton of the prediction decrease the clDice.


Proof. Let us consider the true skeleton S_L of a true mask V_L and a perfectly predicted skeleton S_P, without any ghosts or misses, from a predicted mask V_P. Since there are no ghosts or missing components in the skeleton, we have S_P ⊆ V_L and S_L ⊆ V_P, which implies that the skeletons S_L and S_P, as well as the corresponding masks V_L and V_P, have the same topology. Considering the topological precision (Tprec) for S_P:

    Tprec(S_P, V_L) = |S_P ∩ V_L| / |S_P| = 1.    (6)

Now, without loss of generality, let us consider the case of a topological change, such that for a predicted mask V_P^g with no misses in the skeleton, a ghost-containing skeleton S_P^g = S_P ∪ S_g is reconstructed, where S_g ≠ ∅ contains connected segments outside V_L, i.e. S_g ∩ V_L = ∅. Let us denote the clDice of the perfect prediction and of the prediction with ghosts as clDice^p and clDice^g, respectively.

Considering the topological precision (Tprec) for S_P^g:

    Tprec(S_P^g, V_L) = |S_P^g ∩ V_L| / |S_P^g| = |S_P| / (|S_P| + |S_g|) < 1 = Tprec(S_P, V_L).    (7)

Since the skeletonization algorithm preserves topology and there are no missing components in the prediction V_P^g, S_L ⊆ V_P^g. Considering the topological sensitivities Tsens(S_L, V_P) and Tsens(S_L, V_P^g) of V_P and V_P^g, respectively,

    Tsens(S_L, V_P) = Tsens(S_L, V_P^g) = |S_L ∩ V_P| / |S_L| = 1.    (8)

Combining (7) and (8) in (2), and given that the values of sensitivity and precision belong to [0, 1] by definition, we obtain:

    clDice^g < clDice^p.    (9)
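As a concrete illustration of Theorem 3.2 (the numbers are ours, for illustration): suppose |S_P| = 10 and a ghost segment of size |S_g| = 2 appears outside V_L. Then

```latex
\mathrm{Tprec}(S_P^g, V_L) = \frac{|S_P|}{|S_P| + |S_g|} = \frac{10}{12}, \qquad
\mathrm{Tsens}(S_L, V_P^g) = 1,
\qquad\Rightarrow\qquad
\mathrm{clDice}^g = \frac{2 \cdot \tfrac{10}{12} \cdot 1}{\tfrac{10}{12} + 1}
                  = \frac{10}{11} < 1 = \mathrm{clDice}^p .
```

so any ghost strictly lowers the score, with the penalty growing with the relative size of the ghost.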
Theorem 3.3

Any misses in the skeleton of the prediction decrease the clDice.


Proof. Similar to the proof of Theorem 3.2, we consider a true mask V_L and a predicted mask V_P, with their respective true skeleton S_L and perfectly predicted skeleton S_P without any ghosts or misses. Since there are no ghosts or missing components in the skeleton, we have S_P ⊆ V_L and S_L ⊆ V_P; considering the topological sensitivity (Tsens) for S_L:

    Tsens(S_L, V_P) = |S_L ∩ V_P| / |S_L| = 1.    (11)

Similar to the formulation in the proof of Theorem 3.2, without loss of generality, let us consider the case of a topological change for a predicted mask V_P^m with no ghosts, but with one or more misses in the skeleton, such that there exists a connected segment S_m ⊆ S_L, S_m ≠ ∅, which lies outside of V_P^m. Let us denote the clDice of the optimal (perfect) prediction and of the prediction with misses as clDice^p and clDice^m, respectively.

Considering the topological sensitivity (Tsens) for S_L with respect to V_P^m:

    Tsens(S_L, V_P^m) = |S_L ∩ V_P^m| / |S_L| = (|S_L| − |S_m|) / |S_L| < 1 = Tsens(S_L, V_P).    (12)

Since the skeletonization algorithm preserves topology and there are no ghosts in the predicted skeleton S_P^m, S_P^m ⊆ V_L. Considering the topological precisions Tprec(S_P, V_L) and Tprec(S_P^m, V_L) of V_P and V_P^m, respectively,

    Tprec(S_P, V_L) = Tprec(S_P^m, V_L) = 1.    (13)

In analogy to Theorem 3.2, based on (12) and (13) we conclude that

    clDice^m < clDice^p.    (14)
4 Experiments

Since our objective here is to preserve topology while achieving accurate segmentations, we combine our proposed soft-clDice with soft-Dice as follows:

    L_c = (1 − α) (1 − softDice) + α (1 − soft-clDice).

In stark contrast to previous works, where segmentation and centerline prediction have been learned jointly as multi-task learning [37, 34], we are not interested in learning the centerline: we are interested in learning a topology-preserving segmentation. Therefore, we restrict our experimental choice of α to α ∈ (0, 0.5].
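The combination above is a plain convex mixture of the two loss terms; a one-line sketch (assuming softDice and soft-clDice values in [0, 1]; the function name is ours):

```python
def combined_loss(soft_dice, soft_cldice, alpha):
    # L_c = (1 - alpha) * (1 - softDice) + alpha * (1 - soft-clDice)
    return (1.0 - alpha) * (1.0 - soft_dice) + alpha * (1.0 - soft_cldice)
```

With alpha = 0 this reduces to the plain soft-Dice loss; larger alpha weights topological correctness more heavily.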

We use the proposed clDice to evaluate the segmentation performance of two state-of-the-art network architectures: i) a 2D and a 3D U-Net [26, 3], and ii) a 2D and a 3D fully convolutional network (FCN) [34]. As baselines, we use the same architectures trained with the generalized soft-Dice loss [17, 32].

4.1 Datasets

In all, we employ four datasets to validate clDice and soft-clDice as a measure and an objective function, respectively. In 2D, we test on the DRIVE retina dataset and the Massachusetts Roads dataset [18]; in 3D, on a synthetic and a real brain-vessel dataset. The generation of the synthetic vessel data is described in [27]; additionally, we add a Gaussian noise term to the generated data. The real 3D dataset consists of multi-channel volumetric scans of the brain vasculature, obtained using light-sheet microscopy of tissue-cleared murine brains and made publicly available in [35].

For the DRIVE vessel segmentation dataset, we perform three-fold cross-validation with 30 images and deploy the best-performing model on the test set of 10 images. For the Massachusetts Roads dataset, we choose a subset of 120 images (ignoring images without a network of roads) for three-fold cross-validation and test the models on the 13 official test images. For the 3D synthetic dataset, we perform experiments using 15 single-channel volumes for training, 2 for validation, and 5 for testing. For the real 3D dataset, we use 11 volumes for training, 2 for validation, and 4 for testing. In each of these cases, we report the performance of the model with the highest Dice score on the validation set.

4.2 clDice in Practice.

As described in Section 3, in theory, clDice is formulated for the two-class case and should be computed on both the foreground and the background channels. In practice, however, this is hindered by the imbalance between the foreground and background classes (e.g. in vessel and road datasets). The class imbalance substantially increases the computational cost of calculating the skeleton of the majority class (typically the background). Thus, we calculate clDice only on the foreground. Note that this is not detrimental to the performance of clDice in the context of the datasets considered in our experiments. We attribute this to the non-applicability of the necessary conditions specific to the background (i.e. II, IV, VI, VII, and IX in Table 1), as explained below:

  • II. In tubular structures, all foreground objects are eccentric (i.e. anisotropic); therefore, isotropic skeletonization would very likely produce a ghost in the foreground.

  • IV. Creating a hole outside the labeled mask means adding a ghost in the foreground. Creating a hole inside the labeled mask is extremely unlikely because no such holes exist in our training data.

  • VI. The deletion of a hole without creating a miss is extremely unlikely because of the sparsity of the data.

  • VII. (only for 3D) Creating a cavity is very unlikely because no cavities exist in our training data.

  • IX. (only for 3D) Cavities do not exist in the real dataset.

4.3 Evaluation Metrics

We compare the performance of the various experimental setups using two types of metrics: overlap-based and topology-based.

  1. Overlap-based: Dice coefficient, Accuracy, and the proposed clDice.

  2. Topology-based: We extract a vascular graph from the skeleton of the predicted segmentation and compute the relative accuracy (1 − relative error) of the total vascular network length (Dist.) and the ratio of detected bifurcation points (Bifurc.) with respect to the ground truth, which together describe graph similarity. Finally, we measure topological similarity using the Euler characteristic, χ = V − E + F, where V is the number of vertices, E the number of edges, and F the number of faces. We report the relative Euler characteristic error (χ) as the ratio of the χ of the predicted mask to that of the ground truth; a value closer to one is preferred.
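For a skeleton graph there are no faces, so χ reduces to V − E; a toy sketch of the computation (helper names and numbers are ours):

```python
def euler_characteristic(n_vertices, n_edges, n_faces=0):
    # chi = V - E + F; a skeleton graph has no faces, so chi = V - E.
    return n_vertices - n_edges + n_faces

# A spanning tree of a vessel graph (10 nodes, 9 edges, one connected
# component, no cycles) has chi = 1; one spurious extra edge closes a
# cycle and lowers chi to 0, which the relative chi ratio exposes.
chi_true = euler_characteristic(10, 9)
chi_pred = euler_characteristic(10, 10)
```

The ratio chi_pred / chi_true is the quantity reported in the tables; an extra cycle or a broken component moves it away from one.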

Data Network Loss Dice clDice Acc. Dist. Bifurc. χ
DRIVE retina FCN soft-dice 78.23 78.02 96.27 0.82 0.72 1.35
DRIVE retina FCN L_c (α=0.1) 78.36 79.02 96.25 0.83 0.78 1.32
DRIVE retina FCN L_c (α=0.2) 78.75 80.22 96.29 0.83 0.79 1.10
DRIVE retina FCN L_c (α=0.3) 78.29 80.28 96.20 0.81 0.73 1.08
DRIVE retina FCN L_c (α=0.4) 78.00 80.43 96.11 0.81 0.77 1.17
DRIVE retina FCN L_c (α=0.5) 77.76 80.95 96.04 0.83 0.79 0.97
DRIVE retina U-Net soft-dice 74.25 75.71 95.63 0.73 0.58 1.56
DRIVE retina U-Net L_c (ours) 75.21 76.86 95.82 0.77 0.72 1.08
Road-Network U-Net soft-dice 70.98 81.45 96.38 0.86 0.73 2.09
Road-Network U-Net L_c (ours) 71.16 82.12 96.30 0.88 0.74 1.48
Table 2: Experimental results for 2D networks on the DRIVE dataset and the Massachusetts road dataset; L_c denotes the combined soft-Dice and soft-clDice loss. Bold numbers indicate the best performance. All images are RGB (3 channels). Compared to soft-Dice, we observe that soft-clDice results in improved traditional scores, clDice, and the Euler characteristic χ, for varying values of α.
Data Network Loss Dice clDice Acc. Dist. Bifurc. χ
Synthetic FCN, 1 ch soft-dice 99.41 99.45 99.97 0.92 0.91 0.81
Synthetic FCN, 1 ch L_c (ours) 99.16 99.77 99.96 0.92 0.91 0.82
Synthetic U-Net, 1 ch soft-dice 99.61 99.90 99.98 0.88 0.86 0.83
Synthetic U-Net, 1 ch L_c (ours) 98.73 99.90 99.94 0.88 0.86 0.84
Vessap data FCN, 1 ch soft-dice 75.28 90.98 89.88 0.87 0.72 1.51
Vessap data FCN, 1 ch L_c (ours) 85.57 96.16 95.09 0.82 0.88 0.97
Vessap data FCN, 2 ch soft-dice 78.54 92.03 91.66 0.90 0.82 1.33
Vessap data FCN, 2 ch L_c (ours) 85.28 95.75 94.91 0.91 0.91 1.11
Vessap data U-Net, 1 ch soft-dice 87.11 95.03 95.78 0.92 0.82 0.77
Vessap data U-Net, 1 ch L_c (ours) 86.94 95.28 95.86 0.94 0.83 0.78
Vessap data U-Net, 2 ch soft-dice 80.20 93.05 92.33 0.95 0.93 1.24
Vessap data U-Net, 2 ch L_c (ours) 83.96 96.10 94.18 0.96 0.89 0.92
Table 3: Experimental results for 3D U-Nets and 3D FCNs on synthetic and real data; L_c denotes the combined soft-Dice and soft-clDice loss. We observe a consistent performance improvement on real data with the combination of soft-clDice and soft-Dice. Bold numbers indicate the best-performing loss function on the same network with identical train, validation, and test sets. Overall, clDice leads to results that are preferable to those obtained with soft-Dice alone.

4.4 Discussion

We trained a U-Net and an FCN with the different loss functions in identical settings. In Table 2, we present an experiment in which we trained five models with α varying from 0.1 to 0.5 on the DRIVE dataset. We observe that including soft-clDice in any proportion leads to improved topological similarity, and increasing α consistently improves the clDice measure. The inclusion of soft-clDice improves Dice and accuracy and, more importantly, preserves connectedness and improves the topological and graph similarity. For 3D data, we observe similar trends, though less pronounced on the synthetic data. We attribute this to the relatively simple features of the synthetic data, which has a high signal-to-noise ratio and lacks significant illumination variation. However, we observe significant improvements for all measures in the case of the more complex multi-channel microscopic vessel data, see Figure 8. Despite not optimizing soft-clDice on the background class, all of our networks converge to superior segmentation results. This not only reinforces our assumptions on the dataset-specific necessary conditions but also validates the practical applicability of our loss. Our findings hold for different network architectures, for 2D and 3D, and for tubular and curvilinear structures, strongly indicating generalizability to analogous binary segmentation tasks.

Figure 8: Qualitative results: from top to bottom, for the DRIVE retina dataset, the Massachusetts road dataset, and 2D slices from our real 3D vessel dataset. From left to right: the real image, the label, the prediction using soft-dice, and the U-Net prediction using our proposed loss, respectively. Note that clDice segments road and retina-vessel connections which the soft-dice loss misses, and also does not segment false-positive vessels in 3D.

In Figure 8, typical results for our datasets are depicted. Networks trained on the proposed loss term recover connections which are false negatives when trained with the soft-dice loss. Interestingly, on the real 3D vessel dataset, the soft-dice loss over-segments stray light from large vessels, while the proposed loss function does not, owing to its topology-preserving nature.

5 Conclusions

We introduce clDice, a novel connectivity-preserving similarity measure for tubular structure segmentation. Importantly, we present a theoretical guarantee that clDice enforces topology preservation for binary 2D and 3D segmentation. First, we use the new metric to benchmark segmentation quality from a topology-preserving perspective. Next, we use a differentiable version, soft-clDice, in a loss function to train state-of-the-art 2D and 3D neural networks. We find that training on soft-clDice leads to segmentations with more accurate connectivity information, better Euler characteristics, and improved Dice and accuracy. Our soft-clDice is computationally efficient and can be readily deployed in other tubular or linear-structured object segmentation tasks, such as neuron segmentation in biomedical imaging, crack detection in industrial quality control, or remote sensing.


Acknowledgments

Suprosanna Shit, Andrey Zhylka and Ivan Ezhov are supported by the Translational Brain Imaging Training Network (TRABIT) under the European Union's Horizon 2020 research and innovation program (grant agreement ID: 765148), and by the Technical University of Munich – Institute for Advanced Study, funded by the German Excellence Initiative. Johannes C. Paetzold and Suprosanna Shit are supported by the Graduate School of Bioengineering, Technical University of Munich. We thank Mihail I. Todorov and Ali Ertürk.


References

  • [1] B. Andres et al. (2011) Probabilistic image segmentation with closedness constraints. In ICCV, pp. 2611–2618. Cited by: §1.1.
  • [2] A. BenTaieb and G. Hamarneh (2016) Topology aware fully convolutional networks for histology gland segmentation. In MICCAI, pp. 460–468. Cited by: §1.1.
  • [3] Ö. Çiçek et al. (2016) 3D U-Net: learning dense volumetric segmentation from sparse annotation. In MICCAI, pp. 424–432. Cited by: §4.
  • [4] A. P. Di Giovanna et al. (2018) Whole-brain vasculature reconstruction at the single capillary level. Scientific reports 8 (1), pp. 12573. Cited by: §1.
  • [5] H. Edelsbrunner and J. Harer (2010) Computational topology: an introduction. American Mathematical Soc.. Cited by: §1.1.
  • [6] H. Edelsbrunner et al. (2000) Topological persistence and simplification. In FOCS, pp. 454–463. Cited by: §1.1.
  • [7] X. Han et al. (2003) A topology preserving level set method for geometric deformable models. IEEE TPAMI 25 (6), pp. 755–768. Cited by: §1.1.
  • [8] K. Hu et al. (2018) Retinal vessel segmentation of color fundus images using multiscale convolutional neural network with an improved cross-entropy loss function. Neurocomputing 309, pp. 179–191. Cited by: §1.
  • [9] X. Hu et al. (2019) Topology-preserving deep image segmentation. In NeurIPS, pp. 5658–5669. Cited by: §1.1.
  • [10] J. M. Hunter et al. (2012) Morphological and pathological evolution of the brain microcirculation in aging and Alzheimer’s disease. PloS one 7 (5), pp. e36893. Cited by: §1.
  • [11] A. Joutel et al. (2010) Cerebrovascular dysfunction and microcirculation rarefaction precede white matter lesions in a mouse genetic model of cerebral ischemic small vessel disease. JCI 120 (2), pp. 433–445. Cited by: §1.
  • [12] C. Kirbas and F. Quek (2004) A review of vessel extraction techniques and algorithms. CSUR 36 (2), pp. 81–121. Cited by: §1.
  • [13] T. Y. Kong and A. Rosenfeld (1989) Digital topology: introduction and survey. Computer Vision, Graphics, and Image Processing 48 (3), pp. 357–393. Cited by: Figure 6.
  • [14] T. Y. Kong (1995) On topology preservation in 2-D and 3-D thinning. International Journal of Pattern Recognition and Artificial Intelligence 9 (05), pp. 813–844. Cited by: §1.1, Figure 6, §3.
  • [15] T. Lee et al. (1994) Building skeleton models via 3-D medial surface axis thinning algorithms. CVGIP: Graphical Models and Image Processing 56 (6), pp. 462–478. Cited by: §1.1.
  • [16] C. M. Ma (1994) On topology preservation in 3D thinning. CVGIP: Image understanding 59 (3), pp. 328–339. Cited by: §1.1.
  • [17] F. Milletari et al. (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. In 3DV, pp. 565–571. Cited by: §4.
  • [18] V. Mnih (2013) Machine learning for aerial image labeling. Ph.D. Thesis, University of Toronto. Cited by: §4.1.
  • [19] A. Mosinska et al. (2018) Beyond the pixel-wise loss for topology-aware delineation. In CVPR, pp. 3136–3145. Cited by: §1.1.
  • [20] F. Navarro et al. (2019) Shape-aware complementary-task learning for multi-organ segmentation. In International Workshop on MLMI, pp. 620–627. Cited by: §1.1.
  • [21] S. Nowozin and C. H. Lampert (2009) Global connectivity potentials for random field models. In CVPR, pp. 818–825. Cited by: §1.1.
  • [22] M. R. Oswald et al. (2014) Generalized connectivity constraints for spatio-temporal 3D reconstruction. In ECCV, pp. 32–46. Cited by: §1.1.
  • [23] K. Palágyi (2002) A 3-subiteration 3D thinning algorithm for extracting medial surfaces. Pattern Recognition Letters 23 (6), pp. 663–675. Cited by: §1.1, §3.0.1.
  • [24] R. Phellan et al. (2017) Vascular segmentation in TOF MRA images of the brain using a deep convolutional neural network. In MICCAI Workshop, pp. 39–46. Cited by: §1.
  • [25] M. Rempfler et al. (2017) Efficient algorithms for moral lineage tracing. In ICCV, pp. 4695–4704. Cited by: §1.1.
  • [26] O. Ronneberger et al. (2015) U-net: convolutional networks for biomedical image segmentation. In MICCAI, pp. 234–241. Cited by: §4.
  • [27] M. Schneider et al. (2012) Tissue metabolism driven arterial tree generation. Med Image Anal. 16 (7), pp. 1397–1414. Cited by: §4.1.
  • [28] M. Schneider et al. (2015) Joint 3-D vessel segmentation and centerline extraction using oblique Hough forests with steerable filters. Med Image Anal. 19 (1), pp. 220–249. Cited by: §1.
  • [29] F. Ségonne (2008) Active contours under topology control—genus preserving level sets. International Journal of Computer Vision 79 (2), pp. 107–117. Cited by: §1.1.
  • [30] F. Y. Shih and C. C. Pu (1995) A skeletonization algorithm by maxima tracking on Euclidean distance transform. Pattern Recognition 28 (3), pp. 331–341. Cited by: §2.0.2.
  • [31] J. Stuhmer et al. (2013) Tree shape priors with connectivity constraints using convex relaxation on general graphs. In ICCV, pp. 2336–2343. Cited by: §1.1.
  • [32] C. H. Sudre et al. (2017) Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In MICCAI Workshop, pp. 240–248. Cited by: §4.
  • [33] A. A. Taha and A. Hanbury (2015) Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Medical Imaging 15 (1), pp. 29. Cited by: §1, §1.
  • [34] G. Tetteh et al. (2018) Deepvesselnet: vessel segmentation, centerline prediction, and bifurcation detection in 3-d angiographic volumes. arXiv preprint arXiv:1803.09340. Cited by: §4, §4.
  • [35] M. I. Todorov et al. (2019) Automated analysis of whole brain vasculature using machine learning. bioRxiv, pp. 613257. Cited by: §1, §4.1.
  • [36] E. Türetken et al. (2016) Reconstructing curvilinear networks using path classifiers and integer programming. IEEE TPAMI 38 (12), pp. 2515–2530. Cited by: §1.1.
  • [37] F. Uslu and A. A. Bharath (2018) A multi-task network to detect junctions in retinal vasculature. In MICCAI, pp. 92–100. Cited by: §4.
  • [38] S. Vicente et al. (2008) Graph cut based image segmentation with connectivity priors. In CVPR, pp. 1–8. Cited by: §1.1.
  • [39] J. D. Wegner et al. (2013) A higher-order CRF model for road network extraction. In CVPR, pp. 1698–1705. Cited by: §1.1.
  • [40] M. W. Wright et al. (1995) Skeletonization using an extended Euclidean distance transform. Image and Vision Computing 13 (5), pp. 367–375. Cited by: §2.0.2.
  • [41] Y. Zeng et al. (2008) Topology cuts: a novel min-cut/max-flow algorithm for topology preserving segmentation in n–d images. CVIU 112 (1), pp. 81–90. Cited by: §1.1.
  • [42] S. Zhao et al. (2020) Cellular and molecular probing of intact human organs. Cell. Cited by: §1.

Appendix 0.A Additional qualitative results

Figure 9: Qualitative results for the DRIVE retina dataset. From left to right: the real image, the label, the soft-dice prediction, and the U-Net predictions using the proposed soft-clDice-based losses, respectively. This indicates that soft-clDice recovers retina vessel connections which the soft-dice loss misses.
Figure 10: Qualitative results for the Massachusetts Road dataset. From left to right: the real image, the label, the soft-dice prediction, and the predictions using the proposed soft-clDice-based losses, respectively. The first three rows are U-Net results and the last row is an FCN result. This indicates that soft-clDice segments road connections which the soft-dice loss misses.
Figure 11: Qualitative results: 2D slices of the 3D vessel dataset at different-sized fields of view. From left to right: the real image, the label, the soft-dice prediction, and the FCN predictions using the proposed soft-clDice-based losses, respectively. These images show that soft-clDice helps to better segment vessel connections. Importantly, the networks trained using soft-dice over-segment the vessel radius and segment incorrect connections; neither error is present when we include soft-clDice in the loss.

Appendix 0.B Additional quantitative results

Data Network Loss Dice clDice Acc. Dist. Bifurc.
Road-Network FCN soft-dice 64.84 70.79 95.16 0.88 0.56 28.22
66.52 74.80 95.70 0.86 0.65 15.41
67.42 76.25 95.80 0.86 0.67 13.73
65.90 74.86 95.35 0.87 0.61 15.39
67.18 76.92 95.46 0.91 0.67 15.35
65.77 75.22 95.09 0.91 0.71 17.39
Table 4: Experimental results for 2D networks on the Massachusetts road dataset. Bold numbers indicate the best performance. All images are RGB (3 channels). Compared to soft-Dice, we observe that soft-clDice yields improved traditional scores and clDice, as well as better Euler characteristic agreement, for varying values of the loss-weighting parameter α.
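The Euler characteristic agreement referred to in the caption compares χ of the predicted and ground-truth masks. As an illustration of what χ measures (this is a sketch of the standard cubical-complex formula χ = V − E + F, not the implementation used in the paper), it can be computed for a binary 2D mask in plain NumPy:

```python
import numpy as np

def euler_characteristic(mask):
    # Euler characteristic chi = V - E + F of the cubical complex of
    # "on" pixels: number of objects minus number of holes in 2D.
    # Illustrative sketch only -- not the paper's implementation.
    p = np.pad(np.asarray(mask, dtype=bool), 1, constant_values=False)
    faces = int(p.sum())                         # one unit square per on-pixel
    e_h = int((p[:-1, :] | p[1:, :]).sum())      # edges between row-neighbors
    e_v = int((p[:, :-1] | p[:, 1:]).sum())      # edges between column-neighbors
    verts = int((p[:-1, :-1] | p[:-1, 1:] | p[1:, :-1] | p[1:, 1:]).sum())
    return verts - (e_h + e_v) + faces

ring = np.ones((3, 3), dtype=int)
ring[1, 1] = 0
print(euler_characteristic(ring))                       # 0: one object, one hole
print(euler_characteristic(np.ones((3, 3), dtype=int))) # 1: one solid object
```

A segmentation that merges or splits structures, or opens a spurious hole, changes χ even when its volumetric overlap is high, which is why the caption reports χ agreement alongside Dice.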

Appendix 0.C Network architectures

We use the following notation: input, output, and bottleneck information (for U-Net); a convolutional layer followed by an activation and batch-normalization; a transposed convolutional layer followed by an activation and batch-normalization; max-pooling; and concatenation of information from an encoder block. We had to choose a different FCN architecture for the Massachusetts road dataset because we found that a larger model is needed to learn useful features for this complex task.

0.c.1 DRIVE Dataset

0.c.1.1 FCN:

0.c.1.2 U-Net:

ConvBlock :


Encoder :

Decoder :

0.c.2 Road Dataset

0.c.2.1 FCN:

0.c.2.2 U-Net:

Same as the DRIVE dataset.

0.c.3 3D Dataset

0.c.3.1 3D FCN:

0.c.3.2 3D U-Net:

ConvBlock :


Encoder :

Decoder :

Dataset Network Number of parameters
DRIVE FCN 15.52K
DRIVE U-Net 28.94M
Road FCN 279.67K
3D FCN (2 ch) 58.71K
3D U-Net (2 ch) 178.45M
Table 5: Total number of parameters for each of the architectures used in our experiments.

Appendix 0.D Code for the clDice similarity measure and the soft-clDice loss (PyTorch):

0.d.1 clDice measure

from skimage.morphology import skeletonize
import numpy as np

def cl_score(v, s):
    # fraction of the skeleton s that lies inside the mask v
    return np.sum(v * s) / np.sum(s)

def clDice(v_p, v_l):
    # harmonic mean of the two skeleton-mask overlaps between the
    # predicted mask v_p and the ground-truth mask v_l
    tprec = cl_score(v_p, skeletonize(v_l))
    tsens = cl_score(v_l, skeletonize(v_p))
    return 2 * tprec * tsens / (tprec + tsens)
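As a sanity check of the arithmetic, the two overlap terms of cl_score can be traced by hand on a toy mask pair. The snippet below is purely illustrative: it substitutes hand-crafted one-pixel-wide skeletons for the skeletonize calls, but applies the same formulas as the listing above.

```python
import numpy as np

# Toy example: a 3x5 ground-truth bar, and a prediction missing the last column.
v_l = np.ones((3, 5))             # label mask (15 pixels)
v_p = v_l.copy()
v_p[:, 4] = 0                     # prediction misses one column

# Hand-crafted 1-pixel-wide skeletons (stand-ins for skeletonize()).
s_l = np.zeros_like(v_l); s_l[1, :] = 1   # label skeleton: middle row, 5 pixels
s_p = np.zeros_like(v_p); s_p[1, :4] = 1  # prediction skeleton: 4 pixels

tprec = (v_p * s_l).sum() / s_l.sum()          # cl_score(v_p, s_l) = 4/5 = 0.8
tsens = (v_l * s_p).sum() / s_p.sum()          # cl_score(v_l, s_p) = 4/4 = 1.0
cl_dice = 2 * tprec * tsens / (tprec + tsens)  # 8/9, roughly 0.889
```

One term penalizes predicted skeleton pixels that leave the label mask, the other penalizes label skeleton pixels the prediction fails to cover; here the missing column costs one of the five label skeleton pixels.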

0.d.2 soft-clDice in 2D

import torch
import torch.nn.functional as F

def soft_erode(img):
    # soft erosion: min-pooling over a cross-shaped neighborhood,
    # implemented as -max_pool(-img)
    p1 = -F.max_pool2d(-img, (3,1), (1,1), (1,0))
    p2 = -F.max_pool2d(-img, (1,3), (1,1), (0,1))
    return torch.min(p1, p2)

def soft_dilate(img):
    # soft dilation: max-pooling over a 3x3 neighborhood
    return F.max_pool2d(img, (3,3), (1,1), (1,1))

def soft_open(img):
    return soft_dilate(soft_erode(img))

def soft_skel(img, num_iter):
    # iterative soft skeletonization: accumulate the residues removed
    # by morphological opening at each erosion level
    img1 = soft_open(img)
    skel = F.relu(img - img1)
    for j in range(num_iter):
        img = soft_erode(img)
        img1 = soft_open(img)
        delta = F.relu(img - img1)
        skel = skel + F.relu(delta - skel*delta)  # soft union of skel and delta
    return skel

def soft_clDice(v_p, v_l, num_iter=50, smooth=1.):
    s_p = soft_skel(v_p, num_iter)
    s_l = soft_skel(v_l, num_iter)
    tprec = ((s_p*v_l).sum()+smooth)/(s_p.sum()+smooth)  # topology precision
    tsens = ((s_l*v_p).sum()+smooth)/(s_l.sum()+smooth)  # topology sensitivity
    return 2.*tprec*tsens/(tprec+tsens)
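To see what soft_skel computes, the same recurrence can be traced in plain NumPy on a binary image. This is a sketch of the iteration above, with zero-padded min/max filters standing in for the pooling calls, not the training code:

```python
import numpy as np

def erode(img):
    # min over a cross-shaped (3x1 and 1x3) neighborhood, zero-padded,
    # mirroring -max_pool2d(-img, ...) in the PyTorch listing
    p = np.pad(img, 1, constant_values=0)
    vert = np.minimum(np.minimum(p[:-2, 1:-1], p[1:-1, 1:-1]), p[2:, 1:-1])
    horz = np.minimum(np.minimum(p[1:-1, :-2], p[1:-1, 1:-1]), p[1:-1, 2:])
    return np.minimum(vert, horz)

def dilate(img):
    # max over a full 3x3 neighborhood, zero-padded
    h, w = img.shape
    p = np.pad(img, 1, constant_values=0)
    out = np.zeros_like(img)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out = np.maximum(out, p[dy:dy + h, dx:dx + w])
    return out

def soft_skel_np(img, num_iter):
    relu = lambda x: np.maximum(x, 0.0)
    skel = relu(img - dilate(erode(img)))        # residue of the first opening
    for _ in range(num_iter):
        img = erode(img)
        delta = relu(img - dilate(erode(img)))   # residue at this erosion level
        skel = skel + relu(delta - skel * delta) # soft union of skel and delta
    return skel

# A 3-pixel-thick horizontal bar reduces to its 1-pixel-wide centerline.
bar = np.zeros((7, 11))
bar[2:5, 2:9] = 1.0
skel = soft_skel_np(bar, num_iter=5)  # nonzero only on row 3, columns 3..7
```

On binary inputs every intermediate value stays in {0, 1}, so the accumulation step skel + relu(delta − skel·delta) acts as a logical OR; on soft probability maps the same expressions remain differentiable, which is what makes soft_clDice usable as a loss.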