Angiography offers insights into blood flow and conditions of vascular network. Three dimensional volumetric angiography information can be obtained using magnetic resonance (MRA), ultrasound, or x-ray based technologies like computed tomography (CT). A common first step in analyzing these data is vessel segmentation. Still, moving from raw angiography images to vessel segmentation alone might not provide enough information for clinical use, and other vessel features like centerline, diameter, or bifurcations of the vessels are also needed to accurately extract information about the vascular network, for example, to characterize its structural properties of flow pattern. In this work, we present a deep learning approach, called DeepVesselNet, to perform vessel segmentation, centerline prediction, and bifurcation detection tasks. DeepVesselNet deals with challenges that result from speed and memory requirements, unbalanced class labels, and the difficulty of obtaining well-annotated data for curvilinear volumetric structures by addressing the following three key limitations.
Fast cross-hair filters
Processing 3-D medical volumes poses a memory consumption and speed challenge. Using 3-D CNNs leads to drastic increase in number of parameters and computations compared to 2-D CNNs. At the same time, applying a 2-D CNN in a slice-wise fashion discards valuable 3-D context information that is crucial for tracking curvilinear structures in 3-D. Inspired by the ideas of [1, 2, 3] who proposed separable filters and using of intersecting 2-D planes, we demonstrate the use of cross-hair filters from three intersecting 2-D filters, which helps to avoid the memory and speed problems of classical 3-D networks, while at the same time making use of 3-D information in volumetric data. Unlike the existing ideas where 2-D planes are extracted at a pre-processing stage and used as input channels (see discussion in Sec. III-A), our cross-hair filters (see Sec. III-A) are implemented on a layer level which help retain the 3-D information throughout the network.
Extreme class balancing
The vessel, centerline and bifurcation prediction tasks is characterized by high class imbalances. Vessels account for less than 3% of the total voxels in a patient volume, centerlines represent a fraction of the segmented vessels, and visible bifurcations are in the hundreds at best – even when dealing with volumes with and more voxels. This bias towards the background class is a common problem in medical data [4, 5, 6]. Unfortunately current class balancing loss functions for training CNNs turns out to be numerically unstable in extreme cases as ours. We offer a solution to this ’extreme class imbalance’ problem by introducing a new loss function (see Sec. III-B) that we demonstrate to work well with our vascular features of interest.
Transfer learning from synthetic data
Manually annotating vessels, centerlines, and bifurcations requires many hours of work and expertise. To this end, we demonstrate the successful use of simulation based frameworks [7, 8] that can be used for generating synthetic data with accurate labels (see Sec. III-C) for pre-training our networks, rendering the training of our supervised classification algorithm feasible.
Ii Prior Work and Open Challenges
Vessel enhancement and segmentation is a longstanding task in medical image analysis (see reviews by [9, 10]). The range of methods employed for vessel segmentation reflect the development of image processing during the past decades, some examples including region growing techniques , active contours , statistical and shape models [13, 14, 15, 16], particle filtering [17, 18, 19] and path tracing . All of these examples are interactive, starting from a set of seed label as root and propagating towards the branches. Other approaches aim at an unsupervised enhancement of vascular structures: the multi scale second order local structure of an image (Hessian) was examined by 
with the purpose of developing a vessel enhancement filter. A measure of vessel-likeliness is then obtained as a function of all eigenvalues of the Hessian. A novel curvilinear structure detector, called Optimally Oriented Flux (OOF) was proposed by to find an optimal axis on which image gradients are projected to compute the image gradient flux. OOF has a lower computational load than the calculation of the Hessian matrix proposed in . A level-set segmentation approach with vesselness-dependent anisotropic energy weights is presented and evaluated in [23, 24] for 3-D time-of-flight (TOF) MRA. Phellan and Forkert 
presented a comparative analysis of the accuracy gains in vessel segmentation generated by the use of nine vessel enhancement algorithms on time-of-flight MRA that included multi scale vesselness algorithms, diffusion-based filters, and filters that enhance tubular shapes. A machine learning approach was followed by, combining joint 3-D vessel segmentation and centerline extraction using oblique Hough forest with steerable filters. In a similar fashion, 
used deep artificial neural network as a pixel classifier to automatically segment neuronal structures in stacks of electron microscopy images, a task somewhat similar to vessel segmentation. One example using deep learning architecture is,
who used a deep convolutional neural network to automatically segment the vessels of the brain in TOF MRA by extracting manually annotated bi-dimensional image patches in the axial, coronal, and sagittal directions as an input to the training process.
Identifying the center of a vessel is relevant for calculating the vessel diameter, but also for extracting the ’skeleton’ of a vessel when extracting the vascular network (see Fig. 1). The vessels’ skeleton and center can be found by post-processing a previously generated vessel segmentation or deal with centerline extraction in raw images with high vessel contrast. A method based on morphological operations is developed by  which performs erosion using neighborhoods of a pixel to determine if a pixel is a centerline candidate. Active contour models are applied in  as well as path planning and distance transforms for extracting centerline in vessels, and  proposed a geodesic or minimal path technique. A morphology-guided level set model is used in 
to performed centerline extraction by learning the structural patterns of a tubular-like object, and estimating the centerline of a tubular object as the path with minimal cost with respect to outward flux in gray level images. Zheng et al. adopted vesselness filters to predict the location of the centerline, while 
used Houghs transforms. Hough random forest with local image filters is designed in[26, 7] to predict the centerline, and trained on centerline data previously extracted using one of the level set approaches. The application of deep learning to the extraction of vessel centerline has not been explored. One reason may be the lack of annotated data necessary to train deep architectures that is hard to obtain especially in 3-D datasets.
Vessel bifurcation refers to the point on a vessel centerline where the vessel splits into two or more smaller vessels (see Fig. 1). Bifurcations represent the nodes of the vascular network and knowing their locations is important both for network extraction and for studying its properties . They represent structures that can easily be used as landmarks in image registration, but also indicate the locations of modified blood flow velocity and pressure within the network itself . Bifurcations are hard to detect in volumetric data – as they are rare point-like features varying in size and shape significantly. Similar to centerline extraction, the detection of bifurcations often happens by post-processing a previously generated vessels segmentation or by searching a previously extracted vessel graph. A two staged deep learning architecture is proposed in  for detecting carotid artery bifurcations as a specific landmark in volumetric CT data by first training a shallow network for predicting candidate regions followed by a sparse deep network for final prediction. A three stage algorithm for bifurcation detection is proposed in  for digital eye fundus images, a 2-D task, and their approach included image enhancement, clustering, and searching the graph for bifurcations. The direct predicting of the location of bifurcations in a full volumetric data set is a task which has not been attempted yet. Bifurcations – and the connections in between them – are the nodes and, hence, the building blocks of a vascular network. To this end, we aim at a direct prediction of their location to offer new alternatives to a plain search of the potentially inaccurate network graph.
Convolutional Neural Networks for Angiography Analysis
The use of Convolutional Neural Networks (CNNs) for image segmentation has seen rapid progress during the past years. Originally aiming at predicting global labels or scores. CNNs are employed in  for pixel-wise classification using as input the raw intensity values of a square window centered on the pixel of interest. Unfortunately, these patch-wise architectures suffer from replication of information in memory, which limits the amount of training data that can be used during training and affects speed during testing. The problems were addressed in [38, 39] through inverting the downsampling and convolutional operations in CNN with upsampling and deconvolutional operators leading to a Fully Convolutional Neural Network (FCNN) which is trained end-to-end for semantic segmentation. Based on this idea different variants of state-of-the-art FCNN have been presented for medical image segmentation [5, 40, 41, 42, 43, 44, 45]. Most of these architectures were either employed on 2-D images or extended to 3-D volumes in a manner which leads to loss of 3-D object information or to high memory needs. For example,  applied the U-NET architecture proposed by  on 3-D data in a 2-D slice-wise fashion (so called 2.5D CNNs), which does not account for inter-slice information. At the same time, full 3-D architectures, such as the V-Net by , come with the cost of increasing the number of parameters to be learned during training by at least a factor of three over 2-D architectures of similar depth.
Iii-a Cross-hair Filters Formulation
In this section, we discuss the formulation of the 3-D convolutional operator, which utilizes cross-hair filters to improve speed and memory usage while maintaining accuracy. Let be a 3-D volume, M a 3-D convolutional kernel of shape , and be a convolutional operator. We define as:
where is the greatest integer less or equal to . From Eq. (1), we see that a classical 3-D convolution involves multiplications and additions for each voxel of the resulting image. For a kernel, we have multiplications and additions per voxel. Changing the kernel size to increases the complexity to multiplications and additions per voxel. This then scales up with the dimension of the input image. For example, a volume of size and a kernel results in about multiplications and additions. To handle this increased computational complexity, we approximate the standard 3-D convolution operation by
where are 2-D cross-hair filters. Using cross-hair filters results in multiplications and additions. If we let be the sizes of the kernel such that , we can show that
where strict inequality holds for all . Eq. 3 shows a better scaling in speed and also in memory since the filters sizes in (1) and (III-A) are affected by the same inequality. With the approximation in (III-A), and using the same example as above (volume of size and a kernel), we now need less than multiplications and additions to compute the convolution leading to a reduction in computation by more than multiplications and additions when compared to a classical 3-D convolution. Increasing the volume or kernel size, further increases the gap between the computational complexity of (1) and (III-A). Moreover, we will see later from our experiments that (III-A) still retains essential 3-D context information needed for the classification task.
In Eq. (III-A), we presented our 2-D crosshair filters. However, applying (III-A) independently for each voxel (as defined in Eq. (III-A)) leads to a redundant use of memory. More precisely, voxels close to each other share some neigbourhood information and making multiple copies of it is not memory efficient. We now present an efficient implementation, which we develop and use in our experiments (Fig. 3). Consider as defined in Eq. (1) and let us extract the sagital, coronal, and axial planes as , and respectively. By application of Eqs (1), and (III-A), we have a final implementation as follows:
where refers to a 2-D convolution along the first and second axes of the left hand side matrix over all slices in the third axis and refers to our crosshair filter operation. This implementation is efficient in the sense that it makes use of one volume at a time instead of copies of the volume in memory where voxels share the same neighbourhood. In other words, we still have only one volume in memory but rather rotate the kernels to match the slices in the different orientaions. This lowers the memory requirements during training and inference, allowing to train on more data with little memory.
2.5-D Networks vs. 3-D Networks with Cross-hair Filters
We end the presentation of cross-hair filters by discussing the difference between existing 2.5-D networks and our proposed cross-hair filters. Given a 3-D task (e.g. vessel segmentation in 3-D volume) a 2.5-D based network handles the task by considering one 2-D slice at a time. More precisely, the network takes a 2-D slice as input and classifies all pixels in this slice. This is repeated for each slice in the volume and the final results from the slices are fused again to form the 3-D result. Other 2.5-D methods include a pre-processing stage where several 2-D planes are extracted and used as input channels to the 2-D network [3, 2]. On the architecture level, 2.5-D networks are 2-D networks with a preprocessing method for extracting 2-D slices and a postprocessing method for fusing 2-D results into a 3-D volume. We note that the predictions of 2.5-D networks are solely based on 2-D context information. Example of 2.5-D networks is the implementation of U-Net in  used for liver and lesion segmentation tasks in CT volumetric dataset and the network architecture of  for annotation of lumbar vertebrae.
On the other hand, 3-D networks based on our proposed cross-hair filters take the whole 3-D volume as input and at each layer in the network, we apply the convolutional operator discussed in Section III-A. Therefore, our filters make use of 3-D context information at each convolutional layer and do not require specific preprocessing or post processing. Our proposed method differs from classical 3-D networks in the sense that it uses less parameters and memory since it does not use full 3-D convolutions. However, it is worth noting that our filters scale exactly the same as 2.5-D (i.e. in only two directions) with respect to changes in filter and volume sizes. More precisely, given a square or cubic filter of size , we have parameters in a 2.5-D network and in our cross-hair filter based network. Increasing the filter size by a factor of will scale up as quadratically in both situations (i.e. for 2.5-D and in cross-hair filter case) as compared to full 3-D networks where the parameter size scales as a cube of .
Unlike the existing 2.5-D ideas where 2-D planes are extracted at a pre-processing stage and used as input channels to a 2-D network architecture, our cross-hair filters are implemented on a layer level which help retain the 3-D information throughout the network making it a preferred option when detecting curvilinear objects in 3-D.
Iii-B Extreme Class Balancing with Stable Weights
Often in medical image analysis, the object of interest (e.g. vessel, tumor etc.) accounts for a minority of the total voxels of the image. Fig. 4 shows that the objects of interest in the datasets used in this work account for less than 2.5% of the voxels (the different datasets are described in Section IV-A). Using a standard cross entropy loss function given by
is the probability of obtaining the ground truth label given the dataand network weights , is the label for the th example, is the feature set, is the set of parameters of the network, is the set of positive labels, and is the set of negative (background) labels, could cause the training process to be biased towards detecting background voxels at the expense of the object of interest. This normally results in predictions with high precision against low recall. To remedy this problem,  proposed a biased sampling loss function for training multi scale convolutional neural networks for a contour detection task. This loss function introduced additional trade-off parameters and then samples twice more edge patches than non-edge ones for positive cost-sensitive fine-tuning, and vice versa, for negative cost-sensitive fine-tuning. Based on this,  proposed a class-balancing cross entropy loss function of the form
denotes the standard set of parameters of the network, which are trained with backpropagation andand are the class weighting multipliers, which are calculated as , . ’s are the probabilities from the final layer of the network, and and are the set of positive and negative class labels respectively. This idea, which is to give more weight to the cost associated with the class with the lowest count, has been used in other recent works [5, 40, 42, 43]. However, our experiments (in Section B) show that the above loss function raises two main challenges:
The gradient computation is numerically unstable for very big training sets due to the high values taken by the loss. More precisely, there is a factor of , that scales the final sum to the mean cost in the standard cross-entropy loss function in Eq. (III-B). This factor ensures that the gradients are stable irrespective of the size of the training data . However, in Eq. (6), the weights and do not scale the cost to the mean. For high values of (usually the case of voxel-wise tasks), the sums explode leading to numerical instability. For example, given a perfectly balanced data, we have , irrespective of the value of . Thus, increasing the size of the dataset (batch size) has no effect on the weights (). However, the number of elements in the sums increases, causing the computations to be unstable.
High False Positive Rate
We observe a high rate of false positives leading to high recall values. This is caused by the fact that in most cases the object of interest accounts for less than 5% of the total voxels (about 2.5 % in our case). Therefore, we have a situation where , which implies that wrongly predicting background voxels as foreground is less penalized in the loss than predicting foreground voxels as background. This leads to high false positive rate and, hence, high recall values.
To address these challenges, we introduce different weighting ratios and an additional factor to take care of the high false positive rate; and define:
is a more numerically stable version of Eq. 6 since it computes the voxel-wise, cost which scales well with the size of the dataset or batch. But the ratio of to is maintained as desired. (FP Rate Correction) is introduced to penalize the network for false predictions. However, we do not want to give false positive () and false negatives () the same weight as total predictions (,), since we will end up with a loss function without any class balancing because the weights will offset each other. Therefore, we introduce and , which depend on the mean absolute distance of the wrong predicted probabilities to 0.5 (the value can be changed to suit the task). This allows us to penalize false predictions, which are very far from the central point (0.5). The false predictions () are obtained through a 0.5 probability threshold. Experimental results from application of FP rate correction can be found in Section B.
Iii-C Synthetic Data Generation and Preparation
To generate synthetic data, we follow the method of  which considers the mutual interplay of arterial oxygen () supply and vascular endothelial growth factor (VEGF) secreted by ischemic cells to achieve physiologically plausible results. Each vessel segment is modeled as a rigid cylindrical tube with radius and length . It is represented by a single directed edge connecting two nodes. Semantically, this gives rise to four different types of nodes, namely root, leaf, bifurcation, and inter nodes. Each node is uniquely identified by the 3-D coordinate . Combining this with connectivity information, fully captures the geometry of the approximated vasculature. Radius of parent bifurcation branch , and the radius of left () and right () daughter branches are related by a bifurcation law (also known as Murray’s law) given by , where is the bifurcation exponent. Further constraints:
are placed on the bifurcation angles of the left () and right () respectively, which geometrically corresponds to the optimal position of the branching point with respect to a minimum volume principle . The tree generation model and the bifurcation configuration is shown in Fig. 5. In the arterial tree generation experiment, the parameters in Table 1 of  are used. We use the default (underlined) values for all model parameters and generate 136 volumes.
The output of the generation process is a tree with information on the 3-D position of the nodes, their type (root, bifircation, inter, leaf), and connectivity information, which includes the edge between two nodes and , and its radius . We used this information to construct the actual 3-D volume by modeling each vessel segment as a cylinder in 3-D space. Vessel intensities are randomly chosen in the interval and non-vessel intensities are chosen from the interval . Gaussian noise is then applied to the generated volume randomly changing the mean (i.e. in the range
) and the standard deviation (i.e. in the range) for each volume.
Iii-D Network Architecture and Implementations
We construct a Fully Convolutional Network FCN with four convolutional layers and a sigmoid classification layer. In this implementation, we do not use any down-sampling layer and we carry out the convolutions in a way that the output image is of the same size as the input image by zero-padding. The removal of the down-sampling layer is motivated by the fact that the tasks (vessel segmentation, centerline prediction, and bifurcation detection) involve fine detailed objects and down-sampling has an effect of averaging over voxels which causes these fine details to be lost. With this network implementation, we have a very simple 5-layer fully-convolutional network, which takes a volume of arbitrary size and outputs a segmentation map of the same size. For the network structure and a description of the parameters, see Fig.6.
As a proof of principle, we take the V-Net architecture proposed by  (see Fig. A.6 in the Appendix) and replace all 3-D convolutions with our proposed cross-hair filters discussed in section III-A. The aim is to test the improvement in memory requirements as well as the speed up that our cross-hair implementation provides over the original architecture proposed in . We also use it to evaluate whether speed and memory consumption have a significant effect on prediction accuracy.
Network Configuration, Initialization, and Training
We use the above described architecture to implement three binary networks for vessel segmentation, centerline prediction, and bifurcation detection. Network parameters are randomly initialized, according to the method proposed in 
, by sampling from a uniform distribution in the intervalwhere is the size of the given kernel in a particular layer. For each volume, we extract non-overlapping boxes of size (
) covering the whole volume and then feed them through the network for the fine-tuning of parameters. After this, we then train the network using a stochastic gradient descent without regularization. During pre-training, we use a learning rate of 0.01 and decay of 0.99, which is applied after every 200 iterations. For fine-tuning, we use a learning rate of 0.001 and a decay of 0.99 applied after every 200 iterations. We implement our algorithm using the THEANO Python framework and train on a machine with 64GB of RAM and Nvidia TITAN X 12GB GPU.
Iv Experiments, Results and Discussion
In this work, we use three different datasets to train and test the networks. In all three data sets, the test cases are kept apart from the training data and are used only for testing purposes.
Training convolutional networks from scratch typically requires significant amounts of training data. However, assembling a properly labeled dataset of 3-D curvilinear structures, such as vessels and vessel features, takes a lot of human effort and time, which turns out to be the bottleneck for most medical applications. To overcome this problem, we generate synthetic data based on the method proposed in . A brief description of this process has already been presented in Section III-C. We initialize the processes with different random seeds and scale the resulting vessel sizes in voxels to match the sizes of vessels in clinical datasets. After this, we add different levels of Gaussian noise, as mentioned earlier in Section III-C to increase the randomness and to make it more realistic. We generate 136 volumes of size with corresponding labels for vessel segmentation, centerlines, and bifurcation detection. We then select twenty volumes out of the 136 as a test set for the pre-training phase and use the remaining volumes for pre-training in the various tasks at hand. An example of the synthetic dataset can be found in Fig. 7(c).
Clinical MRA Time-of-Flight (TOF) Dataset
To fine-tune and test our network architectures on real data, we obtain 40 volumes of clinical TOF MRA of the human brain, 20 of which are fully annotated and the remaining 20 partially annotated using the method proposed by . Each volume has a size of and spacial resolution of on the coronal, sagittal, and axial axes respectively. We select 15 out of the 20 fully annotated volumes for testing and use the remaining five as a validation set. We also correct the 20 partially annotated volumes by manually verifying some of the background and foreground voxels. This leads to three labels, which are true foreground (verified foreground), true background (verified background), and the third class, which represent the remaining voxels not verified. After this, we use the true foreground and background labels to fine-tune our network after pre-training with the synthetic dataset. This approach helps in avoiding any uncertainty with respect to using the partially annotated data for fine-tuning of the network. A sample of volume from the TOF MRA dataset can be found in Fig. 7(a).
Synchrotron Radiation X-ray Tomographic Microscopy (SRXTM)
A 3-D volume of size and spacial resolution is obtained from synchrotron radiation X-ray tomographic microscopy of a rat brain. From this large volume, we extract a dataset of 20 non-overlaping volumes of size , which were segmented using the method proposed by  and use them to fine-tune the network. To create a test set, we manually annotate 52 slices in 4 other volumes (208 slices in total). Detailed description of the SRXTM data can be found in , and a sample volume is presented in Fig. 7(b).
Iv-B Data Preprocessing with Intensity Projection
In our experiments, we use different datasets that come from different sources and acquisition modalities. Therefore, we need to normalize their intensity ranges and contrast. We test the following preprocessing strategies to achieve homogeneity between the datasets.
First, the original intensities are normalized to the range using
where is the pixel intensity and denotes the range of all intensities in the volume. The second strategy involves clipping the intensity values by and then normalizing the intensities by . Our experiments show that a value of is optimal for the intensity range of the datasets. The final strategy builds on the second preprocessing strategy by clipping by , normalizing by , and then projecting the resulting intensities by the function . In our experiments, we test quadratic and cubic projections (i.e. and , respectively).
Iv-C Evaluating DeepVesselNet Performance
Prior to evaluating the performance of DeepVesselNet, we conducted a series of experiments to test the components of DeepVesselNet which includes the preprocessing strategies, fast cross-hair filters, the FP rate correction, and pre-training on synthetic data. These experiments and their results are discussed in appendix B. In this subsection, we retain the best training strategy from the above described experiments and assess the performance of our proposed network architecture with other available methods mainly on the vessel segmentation task. As a further validation of our methodology we handle centerline prediction, and bifurcation detection using the proposed architectures. Given a good vessel segmentation, centerline prediction and bifurcation detection tasks can best be handled by applying vessel skeletonization as a post processing step. Our aim in applying our architecture to handle these tasks is not to show superiority over the existing vessel skeletonization methods but it is to serve as a further verification of the effects of our described methodology. We therefore discuss mainly results on vessel segmentation and briefly mention the results from centerline and bifurcation task. The full result and discussion of these experiments are therefore given in the appendix section C.
Experiment 1: Vessel Segmentation
We pre-train DeepVesselNet-(FCN and VNet) architectures (from Figs. 6 and A.6) on synthetic volumes for vessel segmentation and evaluate its performance on TOF MRA volumes. We then fine-tune the networks with additional clinical TOF MRA data, repeating the evaluation. Table I reports results of these tests, together with performances of competing methods. We obtain a Dice score of 0.81 for DeepVesselNet-FCN and 0.80 for DeepVesselNet-VNet before, and 0.86 (DeepVesselNet-FCN) as well as 0.84 (DeepVesselNet-VNet) after fine tuning. Generating vessel segmentations takes less than 13s per volume using DeepVesselNet-FCN and 20s for DeepVesselNet-VNet. Table I also reports results from the methods of  (V-Net) and 
both of which are outperformed by DeepVesselNet-FCN both in terms of speed (execution time) and Dice score. Comparing DeepVesselNet-VNet and original V-Net on the MRA data, we find a small advantage for the latter in terms of Dice score which cannot be considered as significant (0.8425 and 0.8497 respectively with sample standard error of 0.0066 and T-test significance probability of 0.1950). However, DeepVesselNet-VNet has the advantage of being six seconds faster (about 23% improvement) during prediction, a time difference that will scale up with volume size and filter sizes, and our results show that cross-hair filters can be used in DeepVesselNet at a little to no cost in terms of vessel segmentation accuracy.
|Schneider et al.||99.47||99.56||99.52||n/a|
|TOF MRA||DVN-FCN (F)||86.44||86.93||86.68||13 s|
|DVN-FCN (P)||82.76||80.25||81.48||13 s|
|DVN-VNet (F)||85.00||83.51||84.25||20 s|
|DVN-VNet (P)||83.32||77.12||80.10||20 s|
|V-Net (F)||84.34||85.62||84.97||26 s|
|V-Net (P)||82.41||75.82||78.98||26 s|
|Schneider et al.||84.81||82.15||83.46||min|
|Forkert et al.||84.99||73.00||78.57||n/a|
|Schneider et al.||95.15||91.51||93.30||23 min|
|Schneider et al.||77.18||85.08||80.94||n/a|
Experiment 2: Centerline Prediction
For centerline prediction, we train DeepVesselNet on the synthetic dataset and test it on synthetic as well as clinical datasets. The network uses the probabilistic segmentation masks from experiment 1 as an input (together with the same training and test splits described in experiment 1) and uses the vessel predictions to restrict predicted centerlines to be within vessel regions. We obtain a Dice score of 0.80 for DeepVesselNet-FCN, outperforming all other methods by a margin of more than 5% in Dice score, these methods include Schneider et al  which is based on morphological operations for vessel skeletonization (the state of the art). DeepVesselNet-FCN is also better than the competing methods in terms of execution time. Fig. 8 shows a qualitative example of centerline prediction with DeepVesselNet-FCN. Full quantitative results are in Table V in the appendix.
Experiment 3: Bifurcation Detection
For a quantitative evaluation of DeepVesselNet in bifurcation detection, we use synthetically generated data, and adopt a two-input-channels strategy. We use the vessel segmentations from experiments 1 as one input channel and the centerline predictions from Experiment 2 as a second input channel relying on the same training and test splits as in the previous experiments. Results from Table I and Fig. A.5 show that DeepVesselNet-FCN performs better than the other architectures in 5 out of 6 metrics with a detection Dice score of 85% and again showing superiority in execution time. DeepVesselNet-FCN again performs better than Schneider et al  which is based on first reconstructing the underlying network and then assigning bifurcation points based on node connectivity. Fig. 9 shows a qualitative example of bifurcation detection with DeepVesselNet-FCN. Full qualitative results are in Table VI in the appendex.
V Summary and Conclusions
We present DeepVesselNet, an architecture tailored to the challenges of extracting vessel networks and network features using deep learning. Our experiments show that the cross-hair filters, which is one of the components of DeepVesselNet, performs comparably well as 3-D filters and, at the same time, improves significantly both speed and memory usage, easing an upscaling to larger data sets. Another component of DeepVesselNet, the introduction of new weights and the FP rate correction in the class balancing loss function helps in maintaining a good balance between precision and recall during training. This turns out to be crucial for preventing over and under-segmentation problems, which are common problems in vessel segmentation. Finally, we successfully demonstrated that transfer learning of DeepVesselNet through pre-training on synthetically generated data improves segmentation and detection results, especially in situations where obtaining manually annotated data is a challenge.
As future work, we will generalize DeepVesselNet to multiclass segmentation, handling vessel segmentation, centerline prediction, and bifurcation detection simultaneously, rather than in three subsequent binary tasks. We also expect that network architectures tailored to our three hierarchically nested classes will improve the performance of the DeepVesselNet, for example, in a single, but hierarchical approach starting from a base network for vessel segmentation, additional layers for centerline prediction, and a final set of layers for bifurcation detection. The current implementation (cross-hair layers, the networks, cost function), and future extensions of DeepVesselNet, will be made publicly available on Github.
-  R. Rigamonti, A. Sironi, V. Lepetit, and P. Fua, “Learning separable filters,” in
-  H. R. Roth, L. Lu, A. Seff, K. M. Cherry, J. Hoffman, S. Wang, J. Liu, E. Turkbey, and R. M. Summers, “A new 2.5d representation for lymph node detection using random sets of deep convolutional neural network observations,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI 2014. Springer Int. Publishing, 2014, pp. 520–527.
-  S. Liu, D. Zhang, Y. Song, H. Peng, and W. Cai, “Triple-crossing 2.5d convolutional neural network for detecting neuronal arbours in 3d microscopic images,” in Mach. Learn. in Med. Imaging. Springer Int. Publishing, 2017, pp. 185–193.
-  J. W. Grzymala-Busse, L. K. Goodwin, W. J. Grzymala-Busse, and X. Zheng, “An approach to imbalanced data sets based on changing rule strength,” in Rough-Neural Computing: Techniques for Computing with Words, S. K. Pal, L. Polkowski, and A. Skowron, Eds. Berlin, Heidelberg: Springer, 2004, pp. 543–553.
-  P. F. Christ, F. Ettlinger, F. Grün, M. E. A. Elshaera, J. Lipkova, S. Schlecht, F. Ahmaddy, S. Tatavarty, M. Bickel, P. Bilic, M. Rempfler, F. Hofmann, M. D. Anastasi, S.-A. Ahmadi, G. Kaissis, J. Holch, W. Sommer, R. Braren, V. Heinemann, and B. Menze, “Automatic liver and tumor segmentation of ct and mri volumes using cascaded fully convolutional neural networks,” ArXiv e-prints, Feb. 2017.
-  G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, and G. Bing, “Learning from class-imbalanced data: Review of methods and applications,” Expert Systems with Applications, vol. 73, pp. 220 – 239, 2017.
-  M. Schneider, J. Reichold, B. Weber, G. Székely, and S. Hirsch, “Tissue metabolism driven arterial tree generation,” Med. Image Anal., pp. 1397–1414, 2012.
-  D. Szczerba and G. Székely, “Simulating vascular systems in arbitrary anatomies,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI: 8th Int. Conf., Palm Springs, CA, USA, October 26-29, 2005, Proc.,. Berlin, Heidelberg: Springer, 2005, pp. 641–648.
-  C. Kirbas and F. Quek, “A review of vessel extraction techniques and algorithms.” ACM Comput. Surv, vol. 36, pp. 81–121, 2004.
-  D. Lesage, E. Angelini, I. Bloch, and G. Funka-Lea, “A review of 3d vessel lumen segmentation techniques: models, features and extraction schemes.” Med. Image Anal., vol. 13(6), p. 819–845, 2009.
-  M. E. Martínez-Pérez, A. D. Hughes, A. V. Stanton, S. A. Thom, A. A. Bharath, and K. H. Parker, “Retinal blood vessel segmentation by means of scale-space analysis and region growing,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI. Berlin, Heidelberg: Springer, 1999, pp. 90–97.
-  D. Nain, A. Yezzi, and G. Turk, “Vessel segmentation using a shape driven flow,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI. Berlin, Heidelberg: Springer, 2004, pp. 51–59.
-  A. C. S. Chung and J. A. Noble, “Statistical 3d vessel segmentation using a rician distribution,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI: Second Int. Conf., Cambridge, UK, Proc., C. Taylor and A. Colchester, Eds. Berlin, Heidelberg: Springer, 1999, pp. 82–89.
-  W. Liao, K. Rohr, and S. Wörz, “Globally optimal curvature-regularized fast marching for vessel segmentation,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI. Berlin, Heidelberg: Springer, 2013, pp. 550–557.
-  R. Moreno, C. Wang, and Ö. Smedby, “Vessel wall segmentation using implicit models and total curvature penalizers,” in Image Anal.: 18th Scandinavian Conf., Proc. Berlin, Heidelberg: Springer, 2013, pp. 299–308.
-  S. Young, V. Pekar, and J. Weese, “Vessel segmentation for visualization of mra with blood pool contrast agent,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI. Berlin, Heidelberg: Springer, 2001, pp. 491–498.
-  A. Dalca, G. Danagoulian, R. Kikinis, E. Schmidt, and P. Golland, “Segmentation of nerve bundles and ganglia in spine mri using particle filters,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI: 14th Int. Conf., Toronto, Canada, Proc. Berlin, Heidelberg: Springer, 2011, pp. 537–545.
-  C. Florin, N. Paragios, and J. Williams, “Globally optimal active contours, sequential monte carlo and on-line learning for vessel segmentation,” in Computer Vision – ECCV 2006: 9th European Conf. on Computer Vision, Graz, Austria, Proc. Berlin, Heidelberg: Springer, 2006, pp. 476–489.
-  S. Wörz, W. J. Godinez, and K. Rohr, “Probabilistic tracking and model-based segmentation of 3d tubular structures,” in Bildverarbeitung für die Medizin 2009: Algorithmen — Systeme — Anwendungen Proc. des Workshops. Berlin, Heidelberg: Springer, 2009, pp. 41–45.
-  S. Wang, B. Peplinski, L. Lu, W. Zhang, J. Liu, Z. Wei, and R. M. Summers, “Sequential monte carlo tracking for marginal artery segmentation on ct angiography by multiple cue fusion,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI: 16th Int. Conf., Nagoya, Japan, Proc. Berlin, Heidelberg: Springer, 2013, pp. 518–525.
-  A. F. Frangi, W. J. Niessen, K. L. Vincken, and M. A. Viergever, “Multiscale vessel enhancement filtering,” in Med. Image Comput. and Computer-Assisted Intervention — MICCAI, W. M. Wells, A. Colchester, and S. Delp, Eds. Berlin, Heidelberg: Springer, 1998, pp. 130–137.
-  M. W. K. Law and A. C. S. Chung, “Three dimensional curvilinear structure detection using optimally oriented flux,” in Computer Vision – ECCV 2008, D. Forsyth, P. Torr, and A. Zisserman, Eds. Berlin, Heidelberg: Springer, 2008, pp. 368–382.
-  N. D. Forkert, A. Schmidt-Richberg, J. Fiehler, T. Illies, D. Möller, D. Säring, H. Handels, and J. Ehrhardt, “3d cerebrovascular segmentation combining fuzzy vessel enhancement and level-sets with anisotropic energy weights,” Magn. Reson. Imaging, vol. 31, pp. 262–271, 2013.
-  N. D. Forkert, A. Schmidt-Richberg, J. Fiehler, T. Illies, D. Möller, H. Handels, and D. Säring, “Fuzzy-based vascular structure enhancement in time-of-flight mra images for improved segmentation,” Methods of Information Medicine, vol. 50, pp. 74–83, 2011.
-  R. Phellan and N. D. Forkert, “Comparison of vessel enhancement algorithms applied to time-of-flight mra images for cerebrovascular segmentation,” Med. Physics, vol. 44, no. 11, pp. 5901–5915, 2017.
-  M. Schneider, S. Hirsch, B. Weber, G. Székely, and B. H. Menze, “Joint 3-d vessel segmentation and centerline extraction using oblique hough forests with steerable filters,” Med. Image Anal., vol. 19, no. 1, pp. 220–249, 2015.
-  D. C. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes.” electron microscopy images. In NIPS, p. 2852–2860, 2012.
-  R. Phellan, A. Peixinho, A. Falcão, and N. D. Forkert, “Vascular segmentation in tof mra images of the brain using a deep convolutional neural network,” in Intravascular Imaging and Computer Assisted Stenting, and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. Springer Int. Publishing, 2017, pp. 39–46.
-  B. Shagufta, S. A. Khan, A. Hassan, and A. Rashid, “Blood vessel segmentation and centerline extraction based on multilayered thresholding in ct images,” Proc. of the 2nd Int. Conf. on Intelligent Systems and Image Processing, pp. 428–432, 2014.
-  M. Maddah, A. Afzali-khusha, and H. Soltanian, “Snake modeling and distance transform approach to vascular center line extraction and quantification,” Computerized Med. Imag. and Graphics, vol. 27 (6), pp. 503–512, 2003.
-  D. Chen and L. D. Cohen, “Piecewise geodesics for vessel centerline extraction and boundary delineation with application to retina segmentation,” in Scale Space and Variational Methods in Computer Vision: 5th Int. Conf., SSVM 2015, Lège-Cap Ferret, France, May 31 - June 4, 2015, Proc. Springer Int. Publishing, 2015, pp. 270–281.
-  A. Santamaría-Pang, C. M. Colbert, P. Saggau, and I. A. Kakadiaris, “Automatic centerline extraction of irregular tubular structures using probability volumes from multiphoton imaging,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI 2007: 10th Int. Conf., Brisbane, Australia, October 29 - November 2, 2007, Proc., Part II. Berlin, Heidelberg: Springer, 2007, pp. 486–494.
-  Y. Zheng, J. Shen, H. Tek, and G. Funka-Lea, “Model-driven centerline extraction for severely occluded major coronary arteries,” in Mach. Learn. in Medical Imaging: Third Int. Workshop, MLMI 2012, Held in Conjunction with MICCAI: Nice, France, Revised Selected Papers. Berlin, Heidelberg: Springer, 2012, pp. 10–18.
-  M. M. G. Macedo, C. Mekkaoui, and M. P. Jackowski, “Vessel centerline tracking in cta and mra images using hough transform,” in Progress in Pattern Recognition, Image Anal., Computer Vision, and Applications: 15th Iberoamerican Congress on Pattern Recognition, CIARP 2010, Sao Paulo, Brazil, November 8-11, 2010. Proc. Berlin, Heidelberg: Springer, 2010, pp. 295–302.
-  M. Rempfler, M. Schneider, G. D. Ielacqua, X. Xiao, S. R. Stock, J. Klohs, G. Székely, B. Andres, and B. H. Menze, “Reconstructing cerebrovascular networks under local physiological constraints by integer programming,” Med. Image Anal., pp. 86–94, 2015.
-  T. Chaichana, Z. Sun, M. Barrett-Baxendale, and A. Nagar, “Automatic location of blood vessel bifurcations in digital eye fundus images,” in Proc. of Sixth Int. Conf. on Soft Computing for Problem Solving: SocProS 2016, Volume 2. Singapore: Springer Singapore, 2017, pp. 332–342.
-  Y. Zheng, D. Liu, B. Georgescu, H. Nguyen, and D. Comaniciu, “3d deep learning for efficient and robust landmark detection in volumetric data,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI, N. Navab, J. Hornegger, W. M. Wells, and A. Frangi, Eds. Springer Int. Publishing, 2015, pp. 565–572.
-  J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” CoRR, abs/1411.4038, 2014.
-  O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation.” Med. Image Comput. and Computer-Assisted Intervention (MICCAI), vol. 9351, pp. 234–241, 2015.
-  K.-K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. Van Gool, “Deep retinal image understanding,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI 2016: 19th Int. Conf., Athens, Greece, October 17-21, 2016, Proc., Part II. Springer Int. Publishing, 2016, pp. 140–148.
-  F. Milletari, N. Navab, and S. Ahmadi, “V-net: Fully convolutional neural networks for volumetric med. image segmentation,” in Fourth Int. Conf. on 3D Vision (3DV), 2016, pp. 565–571.
-  I. Nogues, L. Lu, X. Wang, H. Roth, G. Bertasius, N. Lay, J. Shi, Y. Tsehay, and R. M. Summers, “Automatic lymph node cluster segmentation using holistically-nested neural networks and structured optimization in ct images,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI 2016: 19th Int. Conf., Athens, Greece, October 17-21, 2016, Proc., Part II. Springer Int. Publishing, 2016, pp. 388–397.
-  H. R. Roth, L. Lu, A. Farag, A. Sohn, and R. M. Summers, “Spatial aggregation of holistically-nested networks for automated pancreas segmentation,” in Med. Image Comput. and Computer-Assisted Intervention – MICCAI 2016: 19th Int. Conf., Athens, Greece, October 17-21, 2016, Proc., Part II. Springer Int. Publishing, 2016, pp. 451–459.
-  A. Sekuboyina, A. Valentinitsch, J. Kirschke, and B. H. Menze, “A localisation-segmentation approach for multi-label annotation of lumbar vertebrae using deep nets,” arXiv, vol. 1703.04347, 2017.
G. Tetteh, M. Rempfler, C. Zimmer, and B. H. Menze, “Deep-fext: Deep feature extraction for vessel segmentation and centerline prediction,” inMach. Learn. in Med. Imaging. Springer Int. Publishing, 2017, pp. 344–352.
-  J.-J. Hwang and T.-L. Liu, “Pixel-wise deep learning for contour detection,” in ICLR, 2015.
-  S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proc. of the IEEE Int. Conf. on computer vision, 2015, pp. 1395–1403.
Y. Bengio and X. Glorot, “Understanding the difficulty of training deep
feedforward neuralnetworks,” in
Proc. of the 13th Int. Conf. on Artificial Intelligence and Statistics (AISTATS), vol. 9, 2010.
-  Theano Development Team, “Theano: A Python framework for fast computation of mathematical expressions,” arXiv e-prints, vol. abs/1605.02688, May 2016.
-  J. Reichold, M. Stampanoni, A. L. Keller, A. Buck, P. Jenny, and B. Weber, “Vascular graph model to simulate the cerebral blood flow in realistic vascular networks,” Journal of Cerebral Blood Flow & Metabolism, vol. 29, no. 8, pp. 1429–1443, 2009.
Appendix A Effect of Preprocessing Strategies
Appendix B Evaluating the DeepVesselNet Components
Experiment 1: Testing Preprocessing Strategies
In a first experiment we test the different preprocessing strategies proposed in Section IV-B. We carry out these experiments considering the synthetic dataset as ideal. Therefore, we train DeepVesselNet-FCN solely on our synthetic data for vessel segmentation and then use this trained network to segment vessels in the clinical MRA dataset after applying each of the four preprocessing strategies discussed in Section IV-B. This enables us to measure how similar our clinical MRA dataset is to the synthetic dataset after the preprocessing. Table II shows the outcome of these experiments. From the results, we observe that the intensity projections (quadratic and cubic) improve the results (a lot in terms of Dice score). Quadratic intensity projection performs slightly better than the cubic intensity projection. Therefore, we use the quadratic intensity projection in all following experiments.
Experiment 2: Fast Cross-hair Filters
To investigate the usefulness of the 3-D context information extracted by the cross-hair filters of DeepVesselNet, we construct a similar network as in Fig. 6 (DeepVesselNet-FCN) but replace the cross-hair filters by 2-D filters. We train this 2.5-D network on 2-D slices (as described in Section III-A) and then fuse the final slice-wise results to obtain the final volumetric predictions. We perform this experiment using the synthetic dataset for centerline prediction with the training and test splits discussed in Section IV-A and measure the performance of the two networks in terms of execution time per volume (Ex. time) and in terms of accuracy by the Dice score. Results are reported in Table III. We observe a benefit of 2.5-D network in terms of run time (6s vs. 13s) when compared to the 3-D network but at a cost of accuracy in terms of Dice Score (0.70 vs. 0.80), which can be explained by lack of 3-D context information.
|3-D (Cross-hair filters)||0.7763||0.8235||0.7992||13 s|
|2.5-D (2-D on slices)||0.6580||0.7472||0.7017||6 s|
Experiment 3: The FP Rate Correction Loss Function ()
To test the effect of FP rate correction loss function discussed in section III-B, we train DeepVesselNet-FCN architecture on a sub-sample of four clinical MRA volumes from scratch, with and without FP rate correction described in Eq. (7). We train for 5000 iterations and record the ratio of precision to recall every 5 iterations using a threshold of 0.5 on the probability maps. A plot of the precision - recall ratio during training without FP rate correction ( Only) and with FP rate correction () is presented in Fig. A.2. The results of this experiments suggest that training with both factors in the loss function, as proposed in Section III-B, keeps a better balance between precision and recall (i.e. a ratio closer to 1.0) than without the second factor. A balanced precision-recall ratio implies that the training process is not bias towards the background or the foreground. This helps prevent over-segmentation, which is caused by the introduction of the class balancing.
Experiment 4: Pre-training on Synthetic Data
We assess the usefulness of transfer learning with synthetic data by comparing the training convergence speed, and various other scores that we obtain when we pre-train DeepVesselNet-FCN on synthetic data and fine-tune on the clinical MRA dataset, compared to training DeepVesselNet-FCN from scratch on the clinical MRA. For this experiment, we only consider the vessel segmentation task, as no annotated clinical data is available for centerline and bifurcation tasks. Results of this experiment are reported in Table IV. We achieve a Dice score of 0.8639 for training from scratch without pre-training on synthetic data and 0.8668 when pre-training on synthetic data. This shows that training from scratch or pre-training on synthetic data does not make a big difference regarding the accuracy of the results. However, training from scratch requires about iterations more than pre-training on synthetic data for the network to converge (i.e. 50% more longer).
Appendix C Full Results and Discussion on Centerline and Bifurcation Detection tasks
Experiment 2: Centerline Prediction
For centerline prediction, we train DeepVesselNet on the synthetic dataset and test it on synthetic as well as clinical datasets. The network uses the probabilistic segmentation masks from experiment 1 as an input (together with the same training and test splits described in experiment 1) and uses the vessel predictions to restrict predicted centerlines to be within vessel regions. Qualitative results are presented in Figs. 8 and A.3 together with quantitative scores in Table V. We obtain a Dice score of 0.80 for DeepVesselNet-FCN, outperforming all other methods by a margin of more than 5%. Here we note that DeepVesselNet-FCN shows a higher precision (0.78 vs 0.48) than the mmorphological operations based method of Schneider et al.  which is one of the standard method for centerline prediction. The results also show that DeepVesselNet-FCN is better than the competing methods in terms of execution time. Again, DeepVesselNet-VNet performs slightly worse than V-Net in terms of the Dice score (0.67 vs. 0.75). However, DeepVesselNet-VNet has an advantage in speed of execution (17s vs. 23s) as expected.
|Schneider et al.||0.4807||0.8603||0.6168||n/a|
Experiment 3: Bifurcation Detection
For a quantitative evaluation of DeepVesselNet in bifurcation detection, we use synthetically generated data, and adopt a two-input-channels strategy. We use the vessel segmentations from experiments 1 as one input channel and the centerline predictions from Experiment 2 as a second input channel relying on the same training and test splits as in the previous experiments. In our predictions we aim at localizing a cubic region of size around the bifurcation points, which are contained within the vessel segmentation. We evaluate the results based on a hit-or-miss criterion: a bifurcation point in the ground truth is counted as hit if a region of a cube of size centered on this point overlaps with the prediction, and counted as a miss otherwise; a hit is considered as true positive (TP) and a miss is considered as false negative (FN); a positive label in the prediction is counted as false positive (FP) if a cube of size centered on this point contains no bifurcation point in the ground truth. Qualitative results on synthetic and clinical MRA TOF are shown in Figs. 9 and A.4, respectively. Results for  are obtained by first extracting the vessel network and then all nodes with two or more splits are treated as bifurcations – this being one of the standard methods for bifurcation extraction. In Fig. A.5, we present the precision-recall curve obtained by varying the threshold for converting probability maps, from the networks, into binary predictions. Results from Table VI and Fig. A.5 show that DeepVesselNet-FCN performs better than the other architectures in 5 out of 6 metrics. In our experiments, it became evident that V-Net tends to over-fit, possibly suffering from its high number of parameters to be determined from the rather few bifurcations present in our training data. This may explain why results for V-Net are worse than all other methods, also suggesting that in cases where little training data is available, the DeepVesselNet-FCN architecture may be the preferable.
|Method||Precision||Recall||Det. %||Mean Err||Err Std||Ex. Time|
|Schneider et al.||0.7718||0.8508||84.30||0.1529||0.7074||n/a|