Retinal vessel segmentation is a fundamental and crucial step in developing a computer-aided diagnosis (CAD) system for retinal images. Although many deep-learning-based works have been devoted to precise retinal vessel segmentation, few of them have paid attention to the connectivity of the segmented vessels. In these methods, deep networks are developed to predict dense probability maps indicating how likely each pixel is to belong to a retinal vessel, and the vessels are then segmented by thresholding these maps. Many works have evaluated their methods by calculating segmentation metrics such as the area under the ROC curve (AUC). However, these metrics cannot quantify the topology or connectivity of the segmented vessels. Even if high values are achieved on them, breakpoints still exist in the binary segmentation results (see Fig. 1). In this paper, we propose a novel deep network to improve the connectivity of retinal vessel segmentation and evaluate it using metrics that quantify the topology of the segmented vessels.
The U-net architecture is widely adopted in many previous works for retinal vessel segmentation. The original U-net progressively connects high-level layers to shallow layers, which causes the semantic information embedded in the high-level layers to be gradually diluted. Semantic information is important for retinal vessel segmentation. It can not only provide more robust features to boost the segmentation of weak vessels while eliminating the effects of abnormal lighting and retinal pathology, but also give holistic cues for recognizing whole vessel trees. Thus, semantic information is valuable for removing breakpoints in segmented vessels. Several previous works try to solve this problem by adding more semantic information into the U-net. Wang et al. design a dual encoding U-Net with a context path to capture more semantic information. Xu et al. improve U-net by introducing carefully designed semantics and multi-scale aggregation blocks. These works are devoted to designing a complicated network architecture by either adding a large number of connections or inserting a relatively large extra sub-network. Different from these efforts, we combine several widely used network blocks and operations to design a simple but efficient module, which can not only fully extract semantic information from high-level layers but also guide the network to learn powerful features for better vessel connectivity.
Besides, refinement is a common way to enhance segmentation in the literature and has been adopted to improve vessel segmentation in previous works. Wu et al. refine the vessel segmentation by using an extra multi-scale network cascaded after a preceding network. Araújo et al. claim that the connectivity of segmented vessels can be improved by stacking another variational-auto-encoder-based network after a previous one. The cascaded manner adopted in these approaches introduces many extra network parameters, which demands much more labelled data for training. In this paper, we adopt a different refinement that uses a single network and recursively refines its output. This refinement introduces no extra network parameters while enhancing the connectivity of segmented vessels.
The contributions of this paper can be summarized as follows. 1) A simple but efficient network is proposed to boost connectivity in retinal vessel segmentation. 2) A semantics-guided module, which exploits semantic information to guide the network to learn more powerful features, is introduced to enhance the capacity of the U-net. 3) A recursive refinement that requires no extra network parameters is exploited to iteratively refine the results.
2 Recursive Semantics-Guided Network
2.1 Network Architecture
We propose a recursive semantics-guided network to enhance the connectivity in retinal vessel segmentation. Fig. 2 illustrates the detailed network architecture, which is designed to address the semantics dilution problem of the original U-net. Semantic information is crucial for retinal vessel segmentation. It is usually embedded in the high-level layers of a deep network and suffers less from abnormalities such as non-uniform lighting and retinal pathology. It also provides the holistic cue that is helpful for recognizing the whole blood vessel tree. Thus, semantic information should be fully exploited to boost vessel segmentation and connectivity. In this paper, we design a semantics-guided module that distills semantic information to guide the network to produce more powerful and robust features. Besides, we adopt a recursive refinement to further boost the connectivity of the segmented vessels. Different from stacked or cascaded refinement, our method re-uses the proposed network, which iteratively takes its previous result as input to produce a better output. We find that this refinement introduces no extra network parameters while gradually improving the connectivity of the segmented retinal vessels.
The top part of Fig. 2 shows the whole structure of the proposed network, which is based on a 3-layered encoder-decoder architecture. The encoder on each layer is a convolutional block comprising two stacked convolutional layers with rectified-linear-unit (ReLU) activation functions. These convolutional blocks are serially connected by max-pooling operations that halve the feature map size to produce hierarchical features at different levels. We carefully design the decoder part by introducing a semantics-guided module that can fully exploit the semantic information of the deep network. Besides, at the end of the 2nd and 3rd layers, there is a side output path that uses upsampling and convolution to obtain a prediction for deep supervision. In the following subsections, we describe the semantics-guided module, the recursive refinement, and the network training in detail.
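As an illustration, the encoder path described above can be sketched in a few lines of PyTorch. This is a minimal sketch under assumptions: the channel widths (32/64/128), the 3x3 kernel size, and the single-channel input are illustrative choices, not values taken from the paper.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # two stacked convolutions, each followed by ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    """3-layered encoder: conv blocks joined by 2x2 max-pooling."""
    def __init__(self, in_ch=1, widths=(32, 64, 128)):
        super().__init__()
        self.enc1 = conv_block(in_ch, widths[0])
        self.enc2 = conv_block(widths[0], widths[1])
        self.enc3 = conv_block(widths[1], widths[2])
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        f1 = self.enc1(x)              # full resolution
        f2 = self.enc2(self.pool(f1))  # 1/2 resolution
        f3 = self.enc3(self.pool(f2))  # 1/4 resolution
        return f1, f2, f3
```

The three returned feature maps feed the lateral connections and, for the deepest one, the semantics-guided module.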
2.2 Semantics-Guided Module
The semantics-guided module is indicated by the red dashed block in Fig. 2. It is mainly composed of a pyramid pooling block and two feature aggregation blocks. The pyramid pooling block is fed with the feature maps generated on the 3rd layer. As pointed out by some previous works, the effective receptive field of a deep network is often not wide enough, even in the deepest layers. For retinal vessel segmentation, semantic information from a wider region gives more holistic cues to identify the global profile of a vessel tree and prevent breakpoints. In order to extract more holistic semantic information, we connect the pyramid pooling block to the deepest layer. The detailed structure of the pyramid pooling block is given in the bottom-left part of Fig. 2. At first, the feature maps are fed into four parallel adaptive pooling layers to produce feature maps at four progressively coarser spatial sizes. Then a convolution followed by upsampling is used to compress the channels and match the spatial size of the original feature maps. Finally, they are concatenated with the original feature maps and passed to a convolution with a ReLU activation function to produce the final holistic semantic information, which is provided to the feature aggregation blocks.
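A minimal sketch of a pyramid pooling block in this spirit is given below. The four bin sizes (1, 2, 3, 6, as in the PSPNet paper this block is based on), the branch channel width, and the fusion kernel size are illustrative assumptions; the paper's exact pooling sizes are not recoverable from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Pool the deepest feature maps at several spatial sizes, compress each
    branch with a 1x1 convolution, upsample back to the input size,
    concatenate with the input, and fuse with a convolution + ReLU."""
    def __init__(self, in_ch, branch_ch=32, bins=(1, 2, 3, 6)):
        super().__init__()
        self.bins = bins
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1) for _ in bins
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch + branch_ch * len(bins), in_ch,
                      kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [x]
        for bin_size, conv in zip(self.bins, self.branches):
            pooled = F.adaptive_avg_pool2d(x, bin_size)  # bin_size x bin_size
            feats.append(F.interpolate(conv(pooled), size=(h, w),
                                       mode='bilinear', align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))
```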
The structure of the feature aggregation blocks is given in the bottom-right part of Fig. 2. They exploit semantic information to guide the aggregation of more powerful features. Semantic information is inserted via semantics flows, which use upsampling and convolution to match the feature maps coming from a deconvolution flow. An element-wise summation is used for fusion, and the result is concatenated with the feature maps from the lateral connection. Finally, a convolution block consisting of two stacked convolutions is used to learn a more powerful feature representation.
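The aggregation just described might be sketched as follows; the channel widths, the bilinear upsampling in the semantics flow, and the 1x1 projection are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregation(nn.Module):
    """Fuse (i) semantics delivered via a semantics flow (upsampling + 1x1
    conv), (ii) features from the deconvolution flow, and (iii) the lateral
    skip connection, then learn a joint representation with two stacked
    convolutions."""
    def __init__(self, sem_ch, dec_ch, lat_ch, out_ch):
        super().__init__()
        self.sem_proj = nn.Conv2d(sem_ch, dec_ch, kernel_size=1)
        self.block = nn.Sequential(
            nn.Conv2d(dec_ch + lat_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, semantics, deconv_feat, lateral):
        h, w = deconv_feat.shape[-2:]
        # semantics flow: upsample, then project to the deconv-flow width
        sem = self.sem_proj(F.interpolate(semantics, size=(h, w),
                                          mode='bilinear', align_corners=False))
        fused = deconv_feat + sem  # element-wise summation
        return self.block(torch.cat([fused, lateral], dim=1))
```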
2.3 Recursive Refinement & Network Training
The semantics-guided module can improve the connectivity of segmented vessels; however, the results can be further improved by a recursive refinement, which gradually boosts weak vessels and eliminates breakpoints. In our refinement, the predicted vessels together with the original image patch are iteratively fed into the same deep network to successively obtain a better result. A similar iterative approach is also advocated in previous works, showing that a previous result can always be iteratively improved when Lipschitz continuity is assumed. Besides, this refinement is less demanding of labeled data, since it introduces no extra network parameters for training. Our recursive refinement can be formulated as y_t = f(x ⊕ y_{t-1}), where x denotes the input image patch, f denotes the network, y_t denotes the predicted result in the t-th iteration, ⊕ denotes channel concatenation, and T is the total iteration number. The refinement is initialized with y_0 = 0, an empty prediction, and the final result is y_T.
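The refinement loop can be sketched directly from the formulation above: the same network is applied repeatedly, each time concatenating the original patch with the previous prediction. The single-channel prediction and the all-zero initial prediction are assumptions consistent with the text.

```python
import torch
import torch.nn as nn

def recursive_refine(net, x, num_iters=3):
    """Recursive refinement: y_t = net(x concat y_{t-1}), starting from an
    empty prediction y_0 = 0. The same network (and hence the same parameters)
    is re-used in every iteration."""
    # y_0: empty (all-zero) prediction with one output channel
    y = x.new_zeros(x.size(0), 1, x.size(2), x.size(3))
    preds = []
    for _ in range(num_iters):
        y = net(torch.cat([x, y], dim=1))  # image patch + previous prediction
        preds.append(y)
    return preds  # preds[-1] is the final result y_T
```

Returning the prediction of every iteration allows a loss to be attached to each of them during training.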
Our network can be trained in an end-to-end manner by minimizing a weighted sum of the loss functions over all iterations. In each iteration, there are a master loss L_m for the predicted result and two auxiliary losses (L_a1 and L_a2) for deep supervision. Thus, the total loss in the t-th iteration can be formulated as L_t = L_m + L_a1 + L_a2. For all of these losses, we adopt a weighted binary cross-entropy, which penalizes false negatives more than false positives. The final refinement loss can be expressed as L = (1/Z) Σ_{t=1}^{T} w_t L_t, where the weights w_t increase with t and Z = Σ_{t=1}^{T} w_t is the normalization factor. We give more weight to the loss in a later iteration to increase numerical stability during network training. In practice, we initially train the network without the refinement, i.e., with T = 1. Then, we increase T to start the training with the refinement. In experiments, we set T to 3.
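A sketch of this loss computation, assuming a false-negative weight of 3 and linearly increasing iteration weights w_t = t (both illustrative values, not taken from the paper):

```python
import torch

def weighted_bce(pred, target, fn_weight=3.0, eps=1e-7):
    """Weighted binary cross-entropy that penalizes false negatives (missed
    vessel pixels) more than false positives."""
    pred = pred.clamp(eps, 1 - eps)
    return -(fn_weight * target * torch.log(pred)
             + (1 - target) * torch.log(1 - pred)).mean()

def refinement_loss(losses_per_iter):
    """Weight the per-iteration total losses so that later iterations count
    more: L = (1/Z) * sum_t w_t * L_t with w_t = t and Z = sum_t w_t."""
    T = len(losses_per_iter)
    Z = sum(range(1, T + 1))
    return sum(t * L for t, L in enumerate(losses_per_iter, start=1)) / Z
```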
3 Experiments & Results
3.1 Databases & Evaluation Protocol
We evaluate the proposed method and other methods on three publicly available datasets: DRIVE, STARE, and CHASE_DB1. The DRIVE dataset consists of 40 images, 7 of which show retinal pathology. The STARE dataset contains 20 images, 10 of which belong to sick individuals. The CHASE_DB1 dataset includes 28 images collected from both eyes of 14 children. The DRIVE dataset is officially divided into training and testing sets, each containing 20 images. For the STARE and CHASE_DB1 datasets, we randomly select 10 images for testing and use the rest for training.
We evaluate all methods by calculating three widely used segmentation metrics and two metrics that quantify the connectivity of segmented vessels. The segmentation metrics include the area under the receiver operating characteristic curve (AUC), sensitivity (SE), and specificity (SP). The connectivity metrics were previously used to measure the topological quality of road networks and have recently been adopted to measure the connectivity of segmented vessels. Their calculation requires randomly selecting two points that lie on both the ground truth and the binary segmentation result, and then checking whether the shortest paths between the two points have the same length. This is repeated many times to record the percentages of correct and infeasible paths. A large correct (COR) percentage together with a low infeasible (INF) percentage indicates good connectivity of the segmented vessels. In experiments, we find that 1000 repetitions per testing image are enough to obtain converged percentages. Besides, for all methods in the experiments, binary segmentation results are produced by applying Otsu's thresholding method to the predicted probability maps.
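The path-sampling protocol can be sketched as follows. This is a simplified illustration, not the paper's exact implementation: it treats vessel masks as 8-connected pixel graphs, measures path length by breadth-first search, counts a path as correct when the two lengths agree within a 10% tolerance (an assumed value), and skips point pairs that are disconnected in the ground truth.

```python
from collections import deque
import random

def bfs_dist(mask, src, dst):
    """Shortest 8-connected path length between two pixels, or None if the
    destination is unreachable within the mask."""
    h, w = len(mask), len(mask[0])
    seen = {src}
    q = deque([(src, 0)])
    while q:
        (r, c), d = q.popleft()
        if (r, c) == dst:
            return d
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < h and 0 <= nc < w and mask[nr][nc]
                        and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    q.append(((nr, nc), d + 1))
    return None

def cor_inf(gt, seg, n_pairs=1000, tol=0.1, seed=0):
    """Sample point pairs lying on both masks; COR counts pairs whose path
    lengths match within tol, INF counts pairs unreachable in seg."""
    rng = random.Random(seed)
    common = [(r, c) for r in range(len(gt)) for c in range(len(gt[0]))
              if gt[r][c] and seg[r][c]]
    correct = infeasible = 0
    for _ in range(n_pairs):
        a, b = rng.choice(common), rng.choice(common)
        d_gt = bfs_dist(gt, a, b)
        if d_gt is None:
            continue  # pair not connected in the ground truth: skip
        d_seg = bfs_dist(seg, a, b)
        if d_seg is None:
            infeasible += 1  # breakpoint lies between the two points
        elif abs(d_seg - d_gt) <= tol * max(d_gt, 1):
            correct += 1
    return correct / n_pairs, infeasible / n_pairs
```

On a perfect segmentation this yields COR = 1 and INF = 0; breakpoints push pairs from the correct bucket into the infeasible one.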
3.2 Ablation Study
We perform an ablation study to check whether the proposed semantics-guided module and recursive refinement are effective. This study evaluates different network configurations on the DRIVE dataset. Detailed results are summarized in Table 1. The baseline method is a 3-layered U-Net trained with deep supervision, using neither the semantics-guided module nor the recursive refinement. Method-a and method-b denote that either the semantics-guided module or the recursive refinement is utilized, respectively, while method-c denotes that both of them take effect. Judging only from the three segmentation-related metrics, the four methods do not differ much, though method-c achieves the highest AUC and SE. However, they differ considerably in the metrics related to vessel connectivity. Both INF and COR become better when either the semantics-guided module or the recursive refinement is activated. The best connectivity is achieved when both of them take effect. Compared with the baseline method, method-c decreases the INF by 21.7% while increasing the COR by 12.2%. Therefore, these results quantitatively demonstrate that the connectivity of segmented vessels can be enhanced by the proposed recursive semantics-guided network.
Besides, we also give visualization results in Fig. 4 to show how these methods differ visually in their binary segmentation results. Compared with the baseline, the semantics-guided module improves the connectivity slightly and enables more capillary vessels to be extracted. The refinement strategy significantly reduces the breakpoints in segmented vessels. By using the semantics-guided module together with the refinement, not only does the vessel extraction become more accurate, but the connectivity is also largely enhanced.
3.3 Comparison with Other Leading Methods
We compare the proposed method with three leading methods on the DRIVE, CHASE_DB1, and STARE datasets. Araújo et al. propose a stacked variational-auto-encoder-based network to improve the connectivity of segmented vessels, which, to our knowledge, is the only recently published work specifically targeting vessel connectivity. The other two methods are aimed at improving retinal vessel segmentation in general. Oliveira et al. propose a multiscale fully convolutional network for vessel segmentation that combines the multiscale analysis provided by the stationary wavelet transform. Xu et al. improve vessel segmentation by introducing a carefully designed semantics and multi-scale aggregation network. The results are summarized in Table 2. If these methods were evaluated only by AUC, SE, and SP, one might conclude that the performance of our method is roughly equal to that of the others. However, the difference becomes obvious when they are compared using INF and COR, which quantify the connectivity of segmented vessels. Especially on the CHASE_DB1 dataset, our method decreases the INF by 19.1% while increasing the COR by 15.4%, compared with the 2nd-ranking method. These results show that the proposed method outperforms the three leading methods and achieves the best performance on vessel connectivity. Finally, we give examples of our method for retinal vessel segmentation on the three public datasets in Fig. 5.
4 Conclusion
In this paper, we propose a recursive semantics-guided network for better connectivity in retinal vessel segmentation. It features a semantics-guided module that fully exploits semantic information to guide the network to learn more powerful features, and a recursive refinement that iteratively enhances the results while saving network parameters. Its effectiveness is demonstrated by extensive experimental results.
-  Araújo, R.J., Cardoso, J.S., Oliveira, H.P.: A deep learning design for improving topology coherence in blood vessel segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 93–101. Springer (2019)
-  Fraz, M.M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A.R., Owen, C.G., Barman, S.A.: Blood vessel segmentation methodologies in retinal images–a survey. Computer methods and programs in biomedicine 108(1), 407–433 (2012)
-  Hoover, A., Kouznetsova, V., Goldbaum, M.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Transactions on Medical imaging 19(3), 203–210 (2000)
-  Mosinska, A., Marquez-Neila, P., Koziński, M., Fua, P.: Beyond the pixel-wise loss for topology-aware delineation. In: Conference on Computer Vision and Pattern Recognition (2018)
-  Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: European Conference on Computer Vision (2016)
-  Oliveira, A., Pereira, S., Silva, C.A.: Retinal vessel segmentation based on fully convolutional neural networks. Expert Systems with Applications 112, 229–242 (2018)
-  Otsu, N.: A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9(1), 62–66 (1979)
-  Owen, C.G., Rudnicka, A.R., Mullen, R., Barman, S.A., Monekosso, D., Whincup, P.H., Ng, J., Paterson, C.: Measuring retinal vessel tortuosity in 10-year-old children: validation of the computer-assisted image analysis of the retina (caiar) program. Investigative ophthalmology & visual science 50(5), 2004–2010 (2009)
-  Srinidhi, C.L., Aparna, P., Rajan, J.: Recent advancements in retinal vessel segmentation. Journal of Medical Systems 41(4), 70 (2017)
-  Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Van Ginneken, B.: Ridge-based vessel segmentation in color images of the retina. IEEE transactions on medical imaging 23(4), 501–509 (2004)
-  Wang, B., Qiu, S., He, H.: Dual encoding u-net for retinal vessel segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 84–92. Springer (2019)
-  Wegner, J.D., Montoya-Zegarra, J.A., Schindler, K.: A higher-order crf model for road network extraction. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
-  Wu, Y., Xia, Y., Song, Y., Zhang, Y., Cai, W.: Multiscale network followed network model for retinal vessel segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 119–126. Springer (2018)
-  Xie, S., Tu, Z.: Holistically-nested edge detection. In: International Conference on Computer Vision (ICCV) (2015)
-  Xu, R., Ye, X., Jiang, G., Liu, T., Li, L., Tanaka, S.: Semantics and multi-scale aggregation network for retinal vessel segmentation. In: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020) (2020)
-  Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2017)