1 Introduction
Centerline, or skeleton, provides a concise representation of the object topology. An ideal centerline extraction algorithm generates centerline points close enough to “centers” of the object crosssectionally, captures all “branches”, and has no false positive spurious branches. ^{†}^{†}* Zhihui Guo and Junjie Bai – equal contribution
Many semiautomated and automated approaches exist for centerline extraction. Morphological thinning and erosion based methods are popular in road centerline extraction [2]. However, centerlines extracted by these methods often come along with spurious branches and usually need adhoc post pruning.
In contrast, minimal path based centerline extraction methods guarantee a more structured centerline output by requiring explicitly specified endpoints. A centerline distance map (also called cost image) is generated from an object segmentation mask by methods such as Euclidean distance transform, which assigns smaller values to voxels closer to centerline and larger values to voxels farther away. A minimal path between the endpoints in the centerline distance map thus corresponds to the object centerline.
Minimal path algorithms are widely used in blood vessel centerline extraction [4, 2, 1, 5, 9]. In [4], Metz et al. adopted vesselness and region statistics as cost metrics and manually specified each branch endpoint. Gülsün et al. [1] used human selected features to compute flow field as cost image. Mirikharaji et al. [5] integrated a predefined tree topology and tubularity scores to get minimal paths. Zheng et al. [9]
used a machine learning based vesselness algorithm to generate the cost image. These methods require either human crafted features/priors or manual specification of branch endpoints.
Convolutional neural networks (CNN) have been prevalent in medical image analysis recently and achieved great success. There are three advantages of CNNbased methods over traditional methods. First, multilayer CNNs have enough capacity to learn complex functions that cannot be described by simple models. Second, CNNs do not require humans to select features and support endtoend training. Third, a single CNN model has the ability to handle multiple tasks.
Coronary computed tomography angiography (CCTA) is a noninvasive technique widely used in clinical practice for coronary artery disease detection. Given a coronary artery segmentation mask, extracting its centerline is a prerequisite step for automatic stenosis grading, calcium evaluation, plaque evaluation and visualization [7, 3]. Extracting coronary artery centerline from a segmentation mask faces multiple notable challenges. First, multiple branches, usually more than a dozen, with large intrasubject and intersubject variations of length, thickness, and shape are presented, forming a complex tree structure. Detecting all branches without false positive is quite challenging. Second, branch diameter changes significantly from proximal to distal portion of coronary artery. The proximal end can be several times wider than the distal end. Third, tortuous course of vessel branches hinders the performance of a minimal path based algorithm, by which straight paths are inherently preferred. Fourth, imperfections of segmentation masks such as brief touching of two nearby branches could lead to incorrect bifurcation in centerline.
To address these challenges, we propose a twohead multitask FCN which simultaneously generates a locally normalized distance map and a list of branch endpoints (Fig. 1). One head of the multitask FCN outputs a normalized centerline distance map that is scaleinvariant and robust to image segmentation imperfections. Logtransform and attention mechanism are also incorporated to increase model sensitivity. The other head of the FCN automatically detects the sparsely distributed endpoints of the object skeleton with high accuracy. The resulting distance map and endpoint list are fed into a minimal path extractor which gives the final centerline extraction results.
2 Method
The proposed method consists of two main steps (Fig. 1): a multitask FCN computing a locally normalized centerline distance map and a list of endpoints simultaneously, and a minimal cost path extractor taking the output of the FCN to generate a set of paths as centerline.
2.1 Multitask FCN architecture
The multitask FCN accomplishes two tasks: computing a normalized centerline distance map and detecting branch endpoints. As shown in Fig. 2
, the two tasks share the same encoder layers consisting of convolution (Conv), batch normalization (BN), ReLU activation function and max pooling (shown as ‘Down’ in Fig.
2) operations. The two tasks then have different decoder layers tailored for each task, consisting of conv, BN, ReLU, and upsampling operations. Skip connections are applied at the same scale between encoder and decoder layers to make effective use of both highlevel and lowlevel features similar to [6]. Suppose a volumetric segmentation mask is , and denotes the set of all voxel locations in the image.2.1.1 Centerline distance map
A centerline distance map is defined as an image whose voxel intensity shows how close each voxel is to the nearest centerline point. Due to the large variations in branch radius (coronary artery proximal radius can be five times bigger than the distal radius), a straightforward Euclidean distance transform computation generates centerline distance map with largely variable range of values at different sections of the branch. To obtain a centerline consistently wellpositioned in the “center” from beginning to end requires tricky balancing of cost image contrast between thick and thin sections.
To achieve the desired scaleinvariance property, we propose to use FCN to generate a locally normalized centerline distance map. More specifically, during training, a local crosssectional view of the segmentation mask perpendicular to the centerline tangent direction at each centerline point is obtained. Suppose the set of all foreground voxels in the crosssectional view is . Then the locally normalized distance map value for voxel index is computed as
(1) 
where is the Euclidean distance of voxel to the centerline point on the view. To further highlight contrast at portions closer to centerline, logtransform is applied to generate the reference centerline distance map
(2) 
where is the locally normalized centerline distance map throughout the whole segmentation mask image and is a small positive constant to avoid numerical issues. Note that the distance computation in Eq.(1) and Eq.(2) is only carried out when generating training reference standards. During testing phase, the FCN will directly predict centerline distance map .
2.1.2 Spatialwise and channelwise attention for centerline distance map
Traditionally the convolutional features at different spatial locations and channels are treated equally by the following layers. However, centerline extraction is inherently a localized task. Specifically, the narrow region surrounding the underlying centerline requires most attention for best discriminating contrast. Thus, a spatialwise attention module is proposed to weight feature maps at different spatial locations. Similarly, different channels of the feature maps can highlight different regions (some channels may focus more around centerline, while other channels may focus more on object mask boundary, etc). It also makes sense to add a channelwise attention to weight different channels accordingly. Fig. 3 shows the proposed spatialwise and channelwise attention module, inspired by [8]. The feature map is first weighted by channelwise attention, generating , and then weighted by the spatial attention, generating .
Channelwise attention weights different channels by vector
, where . To obtain this weighting vector, average pooling is first applied on each channel of the feature map to obtain a summarized channel feature vector . Then a convolutional layer and a ReLU nonlinearity are added to obtain the raw attention weights . In Eq.(3), is the convolution operator. is the convolutional kernel andis the bias vector. A softmax function is applied on the raw attention weights
to obtain the final channelwise attention vector , as shown in Eq.(4).(3)  
(4) 
To apply the channelwise attention weights, the input feature map at channel is multiplied by attention weight to obtain channelweighted feature map
(5) 
Spatialwise attention weight matrix is obtained similar to the channelwise attention. The raw spatial attention map is computed by applying a convolutional layer with bias and a ReLU nonlinearity to (Eq.(6)). Then a softmax function is used to obtain the final spatial attention weights , where . The spatial attention weight is applied by multiplying with features at location (Eq.(8)).
(6)  
(7)  
(8) 
2.1.3 Branch endpoint detection
Different from centerline distance map which consists of continuous values inside the whole segmentation mask, branch endpoints are just a few isolated points. Directly predicting these points using a voxelwise classification or segmentation framework is not feasible due to the extreme class imbalance. To tackle the class imbalance problem, a voxelwise endpoint confidence map is generated by constructing a Gaussian distribution around each endpoint to occupy a certain area spatially. The FCN is then trained to predict the endpoint confidence map, which has a more balanced ratio between nonzero and zero voxels.
Specifically, a spatial Gaussian field is generated around each endpoint
(9) 
where is the geodesic distance from voxel to the nearest branch endpoint inside the segmentation mask. controls the scale of the Gaussian field.
In testing phase, the predicted endpoint confidence map is thresholded by half of the maxium possible value, i.e., . The centroid of each connected component is then returned as branch endpoints.
2.1.4 Loss function
The loss function shown in Eq. (
10) consists of two terms, one for the centerline distance map prediction and the other for the branch endpoint detection. We enforce the loss function to only account for regions inside the segmentation mask. Suppose the segmentation mask is . is 1 for every voxel inside the segmentation mask, and 0 otherwise.(10) 
In Eq. (10), and are the reference standard and the predicted centerline distance map. and are the reference standard and the predicted endpoint confidence map. denotes the Hadamard matrix product operation, which is elementwise multiplication of two matrices. is a weighting factor that balances losses of centerline distance map and branch endpoint confidence map.
2.2 Minimal path extraction
Given a root point, a list of branch endpoints, and the underlying centerline distance map, a minimal path algorithm is used to extract the centerline of a treestructured object.
We construct an undirected graph , where set contains all vertices corresponding to voxels in the segmentation mask and set includes all edges connecting two neighboring vertices in set under a 26neighborhood setting. On each vertex , weight is set according to the centerline distance map. Note that each vertex carries a weight that is smaller when the corresponding voxel is closer to the centerline location, and larger when it is farther away from centerline. Given a starting vertex and an ending vertex , a minimal path between the two is defined as such that (1). , ; (2). every two neighboring vertices in the path is connected by an edge; (3). the sum of vertex weights along this path is minimized (Eq. (11)).
(11) 
Such a minimal path from to corresponds to the desired centerline between the two points. To extract the centerline of a treestructured object such as a coronary artery tree, one root point usually correspond to multiple branch endpoints . In this case, the minimal path between root point and each endpoint is computed respectively. Then we trace each path from the end to the start point sequentially. Once the current path intersects with some previously traced paths, it is merged into the previously traced paths. The centerline points are finally smoothed by an iterative mean filtering in a small window for smoother appearance.
3 Experiments and Results
3.1 Experimental design
To evaluate the proposed method, 620 volumetric coronary CTA scans of 620 patients are used. The image spacing is first normalized to . Coronary arteries and ascending aorta are segmented by a semiautomatic software with manual review and editing. The segmentation masks of coronary arteries and ascending aorta serve as input to the experiment. Since coronary artery originates from ascending aorta, the root points of each coronary vessel tree are readily available as the artery voxels connected to aorta. To simplify notation, we use CL as a shorthand for ‘centerline’.
Manual annotations of centerline are hard to obtain due to the complex 3D structure of vessels and the singlepixelwide requirement. Thus, during training, centerlines extracted by a stateoftheart traditional method (called baseline) serve as the training reference truth for DeepCL. During testing, the degree of matching between DeepCL and baseline is first studied as a sanity check. Then various metrics requiring no “truth” centerline such as centerline to segmentation mask Hausdorff distance, and independent human expert review, are utilized to evaluate DeepCL and baseline method.
3.1.1 Baseline
The baseline method is also a minimal path approach. However, both branch endpoints and centerline distance map are computed by traditional methods. The centerline distance map is derived from the Euclidean distance map
of the segmentation mask by summing three sigmoid functions to highlight the contrast in regions close to the centerline area.
(12) 
Three pairs of parameters (), each controlling the width and level of a contrast window, are tuned to enhance central contrast for vessel segments with large, medium, and small diameters respectively. Summing of these three sigmoid functions results in a relatively good contrast around centerline area throughout the whole vessel tree. The branch endpoints are detected as local maxima of the arrival time by a breadth first search, starting from the root point at the junction of aorta and artery to each voxel throughout the artery segmentation mask. All related parameters are tuned on a different dataset.
3.1.2 DeepCenterline
We randomly divided 620 scans into three dataset: 200 scans for training, 20 scans for validation and 400 scans for testing. On the training set, centerlines extracted by the baseline method are used as the reference truth. Although generated by a strong baseline method, the reference truth still contains errors such as missing branches, wrong bifurcations in case of imperfect segmentation mask, etc. The locally normalized centerline distance map and the branch endpoint confidence map are generated based on the reference truth. The parameters of the proposed method are tuned based on the validation set. The tuned model is applied to the 400 testing scans to evaluate the performance.
The input patch size is
voxels. The standard deviation of Gaussian field
in Eq. (9) is set to 3 mm. The loss weighting coefficient in Eq. (10) is 0.5. Our multitask FCN network is optimized by stochastic gradient descent, with batch size of 3. Total number of epochs is 20. The initial learning rate is
, which is divided by a factor of 2 every 5 epochs.3.1.3 Evaluation metrics
Several evaluation metrics based on either objective metrics or independent human expert review are used for a thorough comparison of the performance of baseline method and DeepCL on the test set.

Mean centerline to centerline distance. The mean absolute distance from centerline A to centerline B is defined as the mean of the absolute distance to the nearest point on B for every point on A.

Coverage percentage. A point on centerline A is covered by centerline B if the closest point on B is within half a voxel (0.2 mm).

Number of missing endpoints. The number of endpoints not found by automated algorithm is counted manually in patientlevel as well as branchlevel.

Number of scans with wrong bifurcations. This usually happens when two branches are spatially close or even briefly joined at a certain section. The centerline could wrongly consider this brief joining as a bifurcation.

Average patientlevel centerline length. The patientlevel centerline length is computed as the sum of lengths of each centerline segment. In general, the less straight “shortcut” centerline takes, and/or the more branch endpoints are detected, the longer centerline will be.

Hausdorff distance. Hausdorff distance is defined as the maximum of distances from every artery segmentation mask voxel to the closest centerline point. Hausdorff distance shows how close each segmentation voxel is being covered by the extracted centerline.

Overall success rate. A centerline extraction is called fully successful when an expert reviews the centerline and determines that the centerline covers all branches sufficiently, has no spurious false positive branch, no wrong bifurcation, and no obvious deviation from the center throughout all sections.
3.2 Results
Fig. 4 displays three examples of coronary artery segmentation masks overlaid with centerlines extracted by DeepCL. For each coronary artery, radius changes substantially from the proximal to the distal side. Different coronary arteries have large variations of vessel curvature, shape and branch topology. Despite all these difficulties, our method is able to extract wellpositioned centerline for all branches without false positive branches.
3.2.1 Matching to baseline
baseline DeepCL  DeepCL baseline  

mean distance (mm)  0.0660.053  0.0680.014 
being covered by (%)  99.70.7  99.50.6 
Table 1 shows the degree of matching between DeepCL and baseline method. The mean centerline distance and “being covered by” percentage both shows how close/well one centerline is being covered by the other centerline. The low mean distance value and the very high coverage percentage on both direction (baseline to DeepCL and DeepCL to baseline) show that the two methods are in good alignment in general. However, a larger portion of baseline centerline points are being covered by DeepCL centerline (99.7%) by a smaller distance (0.066 mm) than the other way around (99.5%, 0.068mm). This implies DeepCL provides slightly better coverage than baseline, which will be assessed in detail in the following analysis.
Notably, both DeepCL and baseline generate no spurious false positive branches in the extracted centerline according to visual inspection.
3.2.2 Performance difference
metrics  raw number #/#  ratio %  

baseline  DeepCL  baseline  DeepCL  

patientlevel  170/400  34/400  42.5%  8.5%  
branchlevel  233/6048  35/6048  3.9%  0.6%  
scans with wrong bifurcation  28/400  11/400  7.0%  2.8%  
CL length (mm)  308.9  314.3      
overall success rate  217/400  355/400  54.3%  88.8% 
Table 2 shows a detailed analysis regarding the difference between results generated by DeepCL and baseline. Bold items show the method with better performance on each metric. DeepCL finds more branch endpoints on both branchlevel and patientlevel. The number of wrong bifurcations shown in DeepCL is also less than that in baseline. Besides, DeepCL increased the average patientlevel centerline length, due to finding of more endpoints and staying to the center instead of taking straight shortcuts at tortuous regions. Overall, the percentage of scans with successful centerline extraction without any type of error on any branch is substantially increased from 54.3% to 88.8%.
Fig. 5(a) shows qualitative comparison of both methods. Compared to baseline, DeepCL shows significant improvement in finding more endpoints, reducing number of wrong centerline bifurcations at region with vessels close together, and staying at center instead of taking straight shortcut at regions with high curvature.
Fig. 5(b) shows the distribution of patientlevel Hausdorff distance from any segmentation mask voxel to centerline. A smaller Hausdorff distance means that all voxels in the segmentation mask are “covered” by a closer nearby centerline point. The majority of Hausdorff distances for DeepCL centerlines form a peak around 1.7 mm. In contrast, the baseline method has a longer tail towards higher Hausdorff distance values, with a significant percentage of scans having Hausdorff distance above 2 mm. This shows that DeepCL covers all segmentation mask voxels more closely.
3.2.3 Importance of attention
Fig. 6 compares the centerline distance maps generated with and without the attention module. As shown in Fig. 6(e), the centerline distance map has a clear peak around the central location by using attention. In contrast, the CL distance map generated by model without attention results in a “plateau” for a large area. If this situation occurred at regions with high curvature, then the minimal path extractor can easily pick a straight shortcut passing through noncenter plateau points as the resulting centerline. This problem is greatly alleviated by utilizing the attention module to improve the centerline distance map contrast around real centerline point.
4 Discussion
The proposed method tackles multiple longexisting challenges of centerline extraction. A novel branch endpoint detection algorithm using Gaussianfield based endpoint confidence map is developed to detect the extremely sparse branch endpoints. The centerline distance map is made scaleinvariant to the substantial diameter change of vessel branches from proximal to distal sections through local normalization within each crosssectional view. The scaleinvariant centerline distance map helps generate wellpositioned centerline throughout all sections. Logtransform and attention module are utilized to further highlight the central locations, aiding accurate localization of the singlepixelwide centerline. The large model capacity of FCN provides robustness to minor imperfections of segmentation masks.
Note that the reference “true” centerlines used in the training phase are results generated by the baseline method without manual correction, due to the difficulty of manual correction of a singlepixelwide centerline. Despite of this disadvantage, our method achieves better performance than baseline on the test set. This shows the good generalization ability of the proposed model. A further study topic is to use our current method’s output as reference standard to train another FCN model. It would be interesting to see whether this second model would further improve upon the firstgeneration model.
5 Conclusion
We propose a novel centerline extraction framework which combines a multitask FCN computing a locally normalized centerline distance map and detecting branch endpoints, with a minimal path extractor. The proposed method is the first deeplearning based centerline extraction method that guarantees singlepixelwide centerline for a complex treestructured object. Designed to be robust to substantial scale changes at different locations and minor imperfections of segmentation mask, the proposed method generates centerlines with more complete and closer coverage of segmentation masks without false positive branches.
5.0.1 Acknowledgement
The authors would like to thank Xiaoyang Xu and Bin Ouyang for organizing the dataset. This study has received funding by Shenzhen Municipal Government (KQTD2016112809330877).
References
 [1] Gülsün, M.A., FunkaLea, G., Sharma, P., Rapaka, S., Zheng, Y.: Coronary centerline extraction via optimal flow paths and CNN path pruning. In: MICCAI. pp. 317–325. Springer (2016)

[2]
Jin, D., Iyer, K.S., Chen, C., Hoffman, E.A., Saha, P.K.: A robust and efficient curve skeletonization algorithm for treelike objects using minimum cost paths. Pattern Recognit Lett
76, 32–40 (2016)  [3] Kirişli, H., Schaap, M., Metz, C., Dharampal, A., Meijboom, W.B., Papadopoulou, S.L., Dedic, A., Nieman, K., De Graaf, M., Meijs, M., et al.: Standardized evaluation framework for evaluating coronary artery stenosis detection, stenosis quantification and lumen segmentation algorithms in computed tomography angiography. MedIA 17(8), 859–876 (2013)
 [4] Metz, C., Schaap, M., Weustink, A., Mollet, N., van Walsum, T., Niessen, W.: Coronary centerline extraction from CT coronary angiography images using a minimum cost path approach. Med Phys 36(12), 5568–5579 (2009)
 [5] Mirikharaji, Z., Zhao, M., Hamarneh, G.: Globallyoptimal anatomical tree extraction from 3d medical images using pictorial structures and minimal paths. In: MICCAI. pp. 242–250. Springer (2017)
 [6] Ronneberger, O., Fischer, P., Brox, T.: Unet: Convolutional networks for biomedical image segmentation. In: MICCAI. pp. 234–241. Springer (2015)
 [7] Xiong, G., Sun, P., Zhou, H., Ha, S., ó Hartaigh, B., Truong, Q.A., Min, J.K.: Comprehensive modeling and visualization of cardiac anatomy and physiology from CT imaging and computer simulations. IEEE Trans Vis Comput Graph 23(2), 1014–1028 (2017)
 [8] Zhang, X., Wang, T., Qi, J., Lu, H., Wang, G.: Progressive attention guided recurrent network for salient object detection. In: CVPR. pp. 714–722 (2018)
 [9] Zheng, Y., Tek, H., FunkaLea, G.: Robust and accurate coronary artery centerline extraction in CTA by combining modeldriven and datadriven approaches. In: MICCAI. pp. 74–81. Springer (2013)