
Connection Sensitive Attention U-NET for Accurate Retinal Vessel Segmentation

by   Ruirui Li, et al.
NetEase, Inc

We develop a connection sensitive attention U-Net (CSAU) for accurate retinal vessel segmentation. This method improves the recent attention U-Net for semantic segmentation with four key improvements: (1) a connection sensitive loss that models structural properties to improve the accuracy of pixel-wise segmentation; (2) an attention gate with a novel neural network structure and concatenating DOWN-Link to learn better attention weights on fine vessels; (3) integration of the connection sensitive loss and the attention gate to further improve accuracy on detailed vessels by additionally concatenating attention weights to features before output; (4) a connection sensitive accuracy metric that reflects segmentation performance on boundaries and thin vessels. Our method improves on state-of-the-art vessel segmentation methods, which struggle in the presence of abnormalities, bifurcations and microvasculature. The connection sensitive loss integrates tightly with the proposed attention U-Net to (i) segment retinal vessels accurately, and (ii) preserve the connectivity of thin vessels by modeling their structural properties. Our method achieves the leading position on the DRIVE, STARE and HRF datasets among the state-of-the-art methods.





1 Introduction

The structure of the retinal vasculature carries important information and helps the ophthalmologist detect and diagnose a variety of retinal pathologies such as Retinopathy of Prematurity (RoP), Diabetic Retinopathy (DR), glaucoma, hypertension, and Age-related Macular Degeneration (AMD), which are leading causes of blindness. The segmentation of retinal vessels is particularly important for diagnosis assistance, treatment and surgery planning of retinal diseases. Changes in vessel morphology such as shape, tortuosity, branching pattern and width allow accurate early detection of many retinal diseases.

Over the past two decades, a tremendous amount of research has been devoted to segmenting the vessels from retinal fundus images. Numerous fully automated methods[24, 14, 17] have been proposed in the literature and were quite successful in achieving segmentation accuracy on par with trained human annotators. Despite this, there is considerable room for further improvement due to various challenges posed by the complex nature of vascular structures. Some of the open problems include segmentation in the presence of abnormalities, segmentation of thin vessel structures and segmentation near bifurcation and crossover regions.

Comprehensive and detailed surveys of retinal vessel segmentation methods can be found in [21, 1, 5]. The works of concern in this paper are deep learning based methods for accurate retinal vessel segmentation. Liskowski et al. [10] proposed a deep neural network model, achieving an area under the ROC curve (ROC AUC) of 0.97 on the DRIVE dataset. Their method performs reasonably well on pathological images. A novel CNN architecture was proposed in [12] to solve both the retinal vessel and optic disc segmentation problems. Fu et al. [6] formulated vessel segmentation as a boundary detection problem using a CNN model combined with fully-connected conditional random fields. In the field of semantic segmentation, U-Net[18] is a fully convolutional network for biomedical image segmentation.

Though many deep learning based approaches have been proposed, existing methods tend to miss fine vessel structures or allow false positives at terminal branches. Attention U-Net[15] automatically learns to focus on target structures of varying shapes and sizes. Mosinska et al. [13] found that pixel-wise losses are unsuitable for retinal vessel segmentation because of their inability to reflect the topological impact of mistakes in the final prediction. The work in [25] added a coefficient to the cross-entropy loss. It estimates connectivity from the Euclidean distance between the focused pixel and the nearest pixel belonging to the class. Ventura [23] defined a new way to evaluate connectivity on a patch. The most recent approach by Son et al. [20] generates a precise map of retinal vessels using generative adversarial training (GAN). Unfortunately, with limited data, generative models are considered much harder to train than discriminative models.

For thin vessel segmentation, this paper proposes an efficient topology-aware loss and a novel attention mechanism based on the U-Net to improve accuracy. The proposed loss is called the connection sensitive loss (CS loss) because it considers the probability of connectivity in the neighboring region when computing the loss. Moreover, new attention gates are added to the network, which learns a better matrix of attention weights before output. The proposed method is trained end to end without any intervention during learning. With the well-designed attention U-Net architecture, the proposed connection sensitive loss gets the highest F1-score on all three datasets, namely DRIVE[22], STARE[8] and HRF[4]. It also extracts thin vessel structures better than the state-of-the-art methods. In summary, the paper makes the following contributions:

  1. For vessel segmentation, the paper proposes a connection sensitive loss. It is designed for simultaneous region-wise structure extraction and pixel-wise semantic segmentation. It helps achieve accurate results, even for thin vessel structures in crossover regions.

  2. A new attention mechanism is designed based on the standard U-Net. The proposed attention gates improve the quality and the effectiveness of the features and thus take better advantage of them during segmentation.

  3. The paper proposes the connection sensitive attention U-Net (CSAU), which combines the connection sensitive loss and the attention gates. In the experiments, CSAU gets the highest F1-score on all three datasets compared with the state-of-the-art methods.

  4. To better reflect the quality of segmentation details, this paper introduces a new metric to evaluate the segmentation of boundaries and thin vessel structures. We name it connection sensitive accuracy.

Section 2 introduces the proposed method. Section 3 gives implementation details, including data preprocessing and the training process. Section 4 describes the experiments and analyzes the results. The last section concludes the paper.

2 Proposed methodology

In this section, we present the architecture of the connection sensitive attention U-Net (CSAU). The main framework is shown in Fig. 2. Its structure closely follows the original attention U-Net except for the connections and the design of the attention gates. Moreover, the framework uses a new connection sensitive loss, with which the attention gate learns better attention weights and helps improve the accuracy of details.

Figure 2: The proposed framework

The parameters of the convolutional layers are listed in Table 1. The network contains four encoder blocks and four decoder blocks, connected by skip connections. Each encoder block consists of two successive 3×3 convolutional layers and a max pooling layer. Every convolutional layer is followed by a batch normalization layer and a ReLU layer. The decoder block is the same as the encoder block except that it uses a transposed convolutional layer instead of the pooling layer.

Layer               Kernel, output channels
conv1_1             3×3, 32
conv1_2             3×3, 32
max pool            2×2, stride 2
conv2_1             3×3, 64
conv2_2             3×3, 64
max pool            2×2, stride 2
conv3_1             3×3, 128
conv3_2             3×3, 128
max pool            2×2, stride 2
conv4_1             3×3, 256
conv4_2             3×3, 256
max pool            2×2, stride 2
conv5_1             3×3, 512
conv5_2             3×3, 512
convTranspose5_1    2×2, 256
conv6_1             3×3, 256
conv6_2             3×3, 256
convTranspose6_1    2×2, 128
conv7_1             3×3, 128
conv7_2             3×3, 128
convTranspose7_1    2×2, 64
conv8_1             3×3, 64
conv8_2             3×3, 64
convTranspose8_1    2×2, 32
conv9_1             3×3, 32
conv9_2             3×3, 1
conv10_1            3×3, 32
conv10_2            3×3, 32
conv10_3            3×3, 1
Table 1: The parameters of the convolutional neural layers.
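
As a rough illustration of how the layers in Table 1 fit together, the sketch below traces the channel count and spatial size through the four encoder and four decoder levels. It is not the authors' code. It assumes the 3×3 convolutions are padded so that they keep the spatial size, and the function name `trace_shapes` is hypothetical.

```python
def trace_shapes(size, base_channels=32, depth=4):
    """Return (stage, channels, spatial size) for each level of the U-Net.

    Assumption: 3x3 convolutions are padded and keep the spatial size;
    only pooling and transposed convolution change it.
    """
    stages = []
    channels = base_channels
    # Encoder: two 3x3 convs per level, then a 2x2 max pool with stride 2.
    for level in range(1, depth + 1):
        stages.append((f"conv{level}", channels, size))
        size //= 2
        channels *= 2
    # Bottleneck (conv5_x in Table 1).
    stages.append((f"conv{depth + 1}", channels, size))
    # Decoder: a 2x2 transposed conv doubles the size and halves the channels.
    for level in range(depth + 2, 2 * depth + 2):
        size *= 2
        channels //= 2
        stages.append((f"conv{level}", channels, size))
    return stages


for name, channels, size in trace_shapes(640):
    print(f"{name}: {channels} channels at {size}x{size}")
```

With a 640×640 input, the bottleneck works on a 40×40 map with 512 channels, and the last decoder level returns to 640×640 with 32 channels.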

2.1 Connection sensitive loss

The parameters of the model are learnt by minimizing a training objective with Adam-based stochastic optimization. In this paper, we build a new training objective on top of the proposed attention U-Net architecture. In the following discussion, let $x$ be the input image, and let $y$ be the corresponding ground-truth labeling, with 1 indicating pixels in the vessels and 0 indicating background pixels. Let $f$ be the proposed neural network parameterized by weights $w$. The output of the network is an image $\hat{y} = f(x, w)$. Every element of $\hat{y}$ is interpreted as the probability of pixel $i$ having label 1: $\hat{y}_i = p(Y_i = 1 \mid x, w)$, where $Y_i$ is a Bernoulli random variable with parameter $\hat{y}_i$.

Cross entropy is widely used as the loss function in deep learning networks for binary classification problems, where it scores the predicted probability of belonging to one specific class. The proposed loss function is therefore also built on the cross-entropy loss, defined by

$$L_{CE}(x, y, w) = -\sum_i \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right] \qquad (1)$$

From the definition of the cross-entropy loss in (1), we can see that it assigns equal weights to the loss of different pixels and fails to consider fine object structures. Therefore, cross-entropy loss is not well suited to segmenting connected vascular structures. Fig. 3 shows the segmentation results produced by the U-Net with the cross-entropy loss. The colored pixels are false negatives. Cross-entropy loss clearly tends to produce broken vessels in terminal branches, which are critical for diagnosis.
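
The equal weighting can be seen in a few lines of plain Python. The sketch below uses the standard summed binary cross entropy. The four-pixel "images" are made-up toy values, not data from the experiments.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Summed binary cross entropy over a flat list of pixels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


labels = [1, 1, 1, 0]
# Same single false negative (probability 0.1), at two different positions.
miss_first = binary_cross_entropy(labels, [0.1, 0.9, 0.9, 0.1])
miss_third = binary_cross_entropy(labels, [0.9, 0.9, 0.1, 0.1])
print(miss_first, miss_third)
```

Both losses are the same up to floating-point rounding: the loss only counts how badly each pixel is predicted, not whether the missed pixel breaks a thin vessel.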

Figure 3: Results trained by binary cross entropy in which red pixels are false negatives.

The connection sensitive loss is designed for neural network training tasks in which the structural connectivity of the segmented objects matters. To this end, we take connectivity into consideration by encoding two coefficients into the cross-entropy loss, as shown in (3). The two coefficients represent local structural properties in the labeled ground truth and in the predicted map, respectively. An additional weighting factor multiplies the encoded loss on every pixel and is explained later.


To model the structural properties, an exponential function of the probability of connectivity in local regions is constructed. This probability has upper bound 1 and lower bound 0, and it is computed on either the ground truth or the predicted map, depending on which of the two the pixel belongs to.


It is observed that the probability of connectivity is strongly correlated with the local density. To estimate it, the function uses a polynomial model with constant coefficients and computes the local density by averaging the values in the region. The region is a square area of the map, given by its side length and the coordinates of its center point, and can be defined as a matrix.


To get the values of the constant coefficients, we draw N sampling points on the region for different densities through Monte Carlo importance sampling. Inspired by the definition of connectivity in [23], on each sampled patch we decide whether the region is connected by checking whether there exist two paths from the center point to the boundary of the region under eight-connectivity. Fig. 4 shows some cases when the density is 0.2 in 5×5 areas. Fig. 5 shows the fitted curve. The sampled blue curve is very close to the modeled red curve.

Figure 4: Some samples when adding 5 points in a 5×5 area.
Figure 5: Curves of connectivity probability with different densities in a 5×5 region.
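
The sampling procedure can be sketched as follows. This is only one possible reading of the text, under several assumptions that are not confirmed by the paper: points are placed uniformly at random (plain Monte Carlo rather than importance sampling), the density is the number of placed points divided by the patch area, and a patch counts as connected when the 8-connected component of the center pixel reaches at least two distinct boundary pixels. The function names are hypothetical.

```python
import random


def is_connected(patch):
    """Assumed criterion: the centre's 8-connected component touches the
    patch boundary in at least two distinct pixels."""
    n = len(patch)
    c = n // 2
    if not patch[c][c]:
        return False
    seen = {(c, c)}
    stack = [(c, c)]
    boundary_hits = set()
    while stack:
        r, q = stack.pop()
        if r in (0, n - 1) or q in (0, n - 1):
            boundary_hits.add((r, q))
        for dr in (-1, 0, 1):
            for dq in (-1, 0, 1):
                rr, qq = r + dr, q + dq
                inside = 0 <= rr < n and 0 <= qq < n
                if inside and patch[rr][qq] and (rr, qq) not in seen:
                    seen.add((rr, qq))
                    stack.append((rr, qq))
    return len(boundary_hits) >= 2


def connectivity_probability(density, n=5, trials=2000, seed=0):
    """Estimate the fraction of random n x n patches that are connected."""
    rng = random.Random(seed)
    cells = [(r, q) for r in range(n) for q in range(n)]
    k = round(density * n * n)  # number of foreground pixels to place
    hits = 0
    for _ in range(trials):
        patch = [[0] * n for _ in range(n)]
        for r, q in rng.sample(cells, k):
            patch[r][q] = 1
        hits += is_connected(patch)
    return hits / trials


for d in (0.2, 0.4, 0.6, 0.8):
    print(d, connectivity_probability(d))
```

Evaluating the estimate over a range of densities gives a sampled curve to which a polynomial could then be fitted, in the spirit of Fig. 5.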

It is recommended to fix the side length of the region during local connectivity estimation, which simplifies the computation without sacrificing too much accuracy. In fact, such an area can be seen as a local pattern, and images with complex contents and other resolutions can be mapped to this local pattern. Fig. 6 illustrates the connectivity feature map, in which each pixel is computed from the ground truth. The larger the value, the more attention should be paid to the pixel: a large value is assumed to indicate a high risk of being poorly connected.

Figure 6: (a) is a local region of the label image, the purple region is the background and the yellow region is the vessels. (b) is the corresponding region of connectivity feature map where the pixels with dark value have higher probability of connectivity than those with bright colors.

A weighting factor is proposed to further decrease the false negatives.

If a pixel is expected to connect other vessel pixels but is predicted with a small probability, the factor becomes larger and puts more punishment on the false negative pixels by increasing their losses. The punishment is region-aware. The first term of the factor indicates, to some extent, the probability of classifying the pixel as vessel. The larger its value, the more easily the pixel is recognized, and vice versa. The second term captures the difference between that probability and the predicted value. It is expected to become smaller during the training process.

2.2 Attention gates

The proposed attention gates are incorporated into the standard U-Net architecture to highlight salient features that are passed through the skip connections (see Fig. 2). The attention gate has two input signals. One is the feature map carried by the skip connection. The other is the coarse feature map output by the previous layer. Information extracted at the coarse scale is used in gating to disambiguate irrelevant and noisy responses in skip connections. The output of the attention gate is connected to the next decoder. The gating signal for each skip connection aggregates information from multiple imaging scales, which increases the resolution of the attention weights and helps achieve better performance.

Figure 7: The proposed attention gate.

The proposed attention gate is shown in Fig. 7. It is a sub-network in a simple encoder-decoder pattern, consisting of five convolutional layers, five batch normalization layers, five ReLUs, two max pooling layers and a transposed convolutional layer. The feature maps X and G are first transformed to an intermediate space. Their sum is then up-sampled by transposed convolution. We use additive attention[2] to obtain the gating coefficient. Additive attention is formulated as follows:

$$\alpha = \sigma\left(F(X, G; \Theta)\right)$$

Here $\sigma$ corresponds to the sigmoid activation function. The attention gate is characterized by a set of parameters $\Theta$ containing linear transformations, non-linear transformations and bias terms. $F$ defines the operations on $X$ and $G$ by parameters $\Theta$.

We tried two connection modes when designing the attention gates, which we call the UP-Link and the DOWN-Link. In the UP-Link, there is a connection between the input G and the output of the attention gate, as shown in Fig. 7. In the DOWN-Link, the connection is between the input X and the output instead. CSAU chooses the UP-Link mode since this mechanism improves the quality and influence of detailed features during training. The parameters of the attention gates are updated using gradients passed not only from the decoder layers but also from the encoder layers. Experimentally, this results in better attention weights for the segmentation model. Examples of intermediate attention weights are visualized in Fig. 8, in which (c) shows the last attention weights obtained with the UP-Link and (d) shows those obtained with the DOWN-Link in the same situation. The UP-Link mode provides sufficient detailed information as well as strengthened salient features for the following decoders in forward propagation. As a result, both the vessels and the structures are well preserved.

Figure 8: Visualizations of attention weights. (a) is a fundus image, (b) is the ground truth and (c) (d) are visualized attention weights by UP-Link and DOWN-Link respectively.

At the end of the network shown in Fig. 2, the last attention weights are extracted and concatenated to the output features, which further emphasizes the attended pixels. Experiments in Section 4.3 validate the proposed attention mechanism for thin vessel segmentation and connectivity preservation.

2.3 Metrics of connection sensitive accuracy

General metrics for image segmentation can judge how well the main vessels are segmented, but they cannot clearly distinguish minor changes in boundaries and fine vessel structures, which are critical for early diagnosis. To solve this problem, this paper presents a new evaluation metric for segmentation performance on boundaries and thin structures. Based on a factor of the CS loss, we define the connection sensitive accuracy over a mask map.

The prediction of the network and the ground truth are first binarized by threshold functions. The mask map is calculated as the union of two sets. The first set contains the pixels belonging to fine vessel structures that are hard to segment. The second set is the boundary of the ground truth, extracted with the DoG edge detection algorithm. Fig. 9 presents two examples of mask maps, on image No. 3 and image No. 11 of DRIVE. In effect, the metric is the ratio of correctly segmented pixels to total pixels within the mask.

Figure 9: Two examples of mask maps on DRIVE.
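
Read this way, the metric is an accuracy restricted to the mask. A minimal sketch follows, assuming the mask map has already been computed by other means, that all three inputs are same-sized nested lists, and that the binary threshold is 0.5 (the threshold value is an assumption, not taken from the paper).

```python
def masked_accuracy(prediction, ground_truth, mask, threshold=0.5):
    """Fraction of masked pixels whose thresholded prediction matches the label."""
    correct = 0
    total = 0
    for p_row, g_row, m_row in zip(prediction, ground_truth, mask):
        for p, g, m in zip(p_row, g_row, m_row):
            if m:  # only pixels inside the mask are counted
                total += 1
                correct += int((p >= threshold) == bool(g))
    return correct / total if total else 0.0


prediction = [[0.9, 0.2], [0.4, 0.8]]
ground_truth = [[1, 1], [0, 0]]
mask = [[1, 1], [0, 1]]
print(masked_accuracy(prediction, ground_truth, mask))
```

In this toy case three pixels lie inside the mask and only one of them is classified correctly, so the score is one third even though the unmasked pixel is also correct.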

3 Implementation and Experiments setup

3.1 Implementation details

In this part, we briefly introduce the implementation of the connection sensitive attention U-Net. The experiments are carried out on a laboratory computer whose configuration is shown in Table 2. The operating system is Ubuntu 16.04. The main required packages are Python 3.6, CUDA 8.0, cuDNN 7.0 and PyTorch 0.4.0.

CPU          Intel(R) Core(TM) i7-4790K, 4.00 GHz
GPU          GeForce GTX 1080 Ti
Hard disk    Toshiba SSD, 512 GB
System       Ubuntu 16.04
Table 2: Experimental environments

To avoid complex CUDA coding, we make full use of functions provided by PyTorch, mainly nn.functional.conv2d and nn.MaxPool2d. Specifically, to calculate the sum of the probabilities in the region centered at a focused pixel, we use nn.functional.conv2d with a 5×5 kernel and perform the convolution on the whole image, except the padding part, which remains zero. To get the maximum probability in that region, we use nn.MaxPool2d with the kernel size set to 7.
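
To make the two operations concrete without depending on PyTorch, the sketch below computes the same quantities in plain Python: a windowed sum, which corresponds to a convolution with an all-ones kernel and zero padding, and a windowed maximum. It assumes stride 1 and size-preserving padding, which the text does not state explicitly. The helper names are hypothetical.

```python
def window_reduce(prob, k, reducer):
    """Apply `reducer` to the k x k window centred on every pixel.

    Pixels outside the image are skipped, which matches zero padding for a
    sum and ignored padding for a max over non-negative probabilities.
    """
    h, w = len(prob), len(prob[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                prob[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(reducer(window))
        out.append(row)
    return out


def local_sum(prob):
    """Sum of probabilities in the 5x5 region around each pixel."""
    return window_reduce(prob, 5, sum)


def local_max(prob):
    """Maximum probability in the 7x7 region around each pixel."""
    return window_reduce(prob, 7, max)
```

On a GPU the same result would come from the library calls named above. The loops here only serve to show what is being computed.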

3.2 Datasets and preparation

Our approach is evaluated on three widely used benchmarks: DRIVE[22], STARE[8] and HRF[4], provided by different organizations. All photographs in these benchmarks are RGB images, while the annotated images are binary. DRIVE contains 20 training images and 20 testing images, each of size 584×565. STARE contains 20 fundus images, each of size 605×700. We manually divide the STARE dataset into 10 training and 10 testing images. For DRIVE and STARE, we use only one image from the training set for validation. The HRF dataset comprises 45 images organized as 15 subsets. Each subset contains one healthy fundus image, one image of a patient with diabetic retinopathy and one glaucoma image. We use the first 5 subsets as our training set and the rest as the testing set. Five validation images are randomly selected from the training set.

For DRIVE, we resize each image to 640×640 by padding it with zeros on the four margins. For STARE, we resize the images to 720×720 in the same way. Each image in HRF is digitized to 2336×3504 pixels. Because of the high resolution of HRF images and the limits of GPU memory, we crop each image into 640×640 tiles and test the tiles one by one, from bottom left to top right, in a sliding window manner. To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image. We use the overlap strategy described in [9]. For each tile, we compute the weights of the overlapped pixels with a Gaussian function. Through weighted summation, we composite the overlapped tiles and seamlessly stitch the whole segmented image.
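
The padding arithmetic is simple enough to write down. The sketch below assumes the padding is centered, with any odd remainder going to the bottom and right margins. The text only says that zeros are added on the four margins, so the exact split is an assumption, and `pad_margins` is a hypothetical helper.

```python
def pad_margins(height, width, target):
    """Return (top, bottom, left, right) zero-padding for a target x target canvas."""
    top = (target - height) // 2
    left = (target - width) // 2
    bottom = target - height - top
    right = target - width - left
    return top, bottom, left, right


print(pad_margins(584, 565, 640))  # DRIVE
print(pad_margins(605, 700, 720))  # STARE
```

Under this assumption a DRIVE image receives 28 rows above and below and 37 or 38 columns on each side.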

To augment the data, each image is rotated in steps of 4 degrees over the full circle. Each rotated image is then also flipped horizontally and vertically. Thus, 270 images are generated from a single image.
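
The count of 270 follows from 90 rotation angles times three variants per angle. The enumeration below makes that explicit, assuming the three variants are the rotated image itself, its horizontal flip and its vertical flip (an interpretation consistent with the stated total).

```python
def augmentation_plan(step_degrees=4):
    """List every (angle, flip) pair applied to one source image."""
    angles = range(0, 360, step_degrees)
    flips = ("none", "horizontal", "vertical")
    return [(angle, flip) for angle in angles for flip in flips]


plan = augmentation_plan()
print(len(plan))  # 90 angles x 3 variants
```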

3.3 Training methodology

The model is trained by AdamW[11] with momentum parameters $\beta_1$ and $\beta_2$ and an initial learning rate of 0.002. We propose a new learning rate strategy for the experiments. We evaluate the latest model on the validation set every fifty batches and use the validation loss to adjust the learning rate. If the loss does not decrease for five consecutive validations, the learning rate is set to the maximum of 0.0001 and 0.1 times the current learning rate. If the loss does not decrease for 20 consecutive validations, the learning rate is reset to the initial value of 0.002.

We use a mini-batch size of 2 images for DRIVE, STARE and HRF. The model with the minimal validation loss is chosen as the final model for testing. In our experiments, the validation loss tends to converge within 20 training epochs, and we set the maximum number of training epochs to 25.
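
The learning rate strategy can be sketched as a small state machine. This is an interpretation rather than the authors' code: "does not decrease" is taken to mean "is not lower than the best validation loss so far", the decay is applied at every fifth stale validation, and the stale counter is cleared after the restart. The class name is hypothetical.

```python
class PlateauRestartSchedule:
    """Decay the learning rate on plateaus and restart it after a long stall."""

    def __init__(self, initial_lr=0.002, min_lr=0.0001, factor=0.1,
                 patience=5, restart_after=20):
        self.initial_lr = initial_lr
        self.min_lr = min_lr
        self.factor = factor
        self.patience = patience
        self.restart_after = restart_after
        self.lr = initial_lr
        self.best = float("inf")
        self.stale = 0  # consecutive validations without improvement

    def step(self, val_loss):
        """Record one validation loss and return the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
            return self.lr
        self.stale += 1
        if self.stale >= self.restart_after:
            self.lr = self.initial_lr  # restart from the initial value
            self.stale = 0
        elif self.stale % self.patience == 0:
            self.lr = max(self.min_lr, self.factor * self.lr)
        return self.lr


schedule = PlateauRestartSchedule()
history = [schedule.step(1.0) for _ in range(21)]
print(history[5], history[10], history[20])
```

With a loss that never improves after the first validation, the rate drops from 0.002 to 0.0002 and then to the 0.0001 floor, and returns to 0.002 at the twentieth stale validation.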

4 Results and Analysis

4.1 Evaluation metrics

We use F1-score, PR AUC, ROC AUC, and accuracy (Acc) to evaluate the performance of the binary segmentation model. False Negative (FN), True Positive (TP), True Negative (TN) and False Positive (FP) are the four basic elements used to compute the metrics. We also introduce connection sensitive accuracy to measure segmentation performance on terminal thin vessels.

F1-score considers both Recall and Precision, and is defined as:

$$F_1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}, \qquad Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$

F1-score is positively related to the performance of the model.

Acc is the ratio of correctly segmented pixels to total pixels:

$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

PR AUC and ROC AUC. A Precision-Recall (PR) curve plots Precision against Recall, while a Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR), which are defined as:

$$TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$

AUC is the area under the curve, and the performance of the model is positively related to the value of the area.
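
For reference, the point metrics can be computed from the four confusion counts with their standard definitions. The counts in the example are made up for illustration.

```python
def segmentation_metrics(tp, fp, tn, fn):
    """Standard point metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to the true positive rate
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "tpr": recall,
        "fpr": fp / (fp + tn),
    }


print(segmentation_metrics(tp=80, fp=20, tn=880, fn=20))
```

Because vessel pixels are a small share of the image, accuracy stays high here (0.96) while precision, recall and F1 are all 0.8. This is one reason accuracy alone says little about thin vessels.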

4.2 Overall performance

We trained the CSAU model on DRIVE, STARE and HRF separately and compared it with the state-of-the-art methods. The comparison results on DRIVE and STARE are obtained from the web site of VGAN[20], and we use the segmented images directly to compute the metrics. The comparison results on HRF are taken from [17]. Since neither the source code nor the result images are provided, we simply copy the metrics reported in that paper. To guarantee fairness, we choose the training, validation and testing sets in the same way, as described in Section 3.2.

Tables 3-5 show the comparison results. As observed, CSAU gets the highest F1-score, accuracy and ROC AUC on all the benchmarks. On DRIVE, the proposed method leads on all the evaluation metrics. The compared methods include K-Boost[3], HED[26], Wavelets[19], $N^4$-Fields[7], DRIU[12], CRFs[16] and VGAN[20]. Among them, HED, DRIU and VGAN are deep learning based methods, which show superior performance compared with the non-deep-learning methods. Fig. 10 displays the PR curves and the ROC curves. VGAN also performs well and ranks second. Compared with VGAN, the F1-score of CSAU is 0.2% higher and the accuracy of CSAU is improved by 0.6%. CSAU improves the PR AUC by 0.2% and the ROC AUC by 0.4%. In fact, most deep learning based networks segment the main vessels well. The real challenge is to segment thin vessel structures. In fundus images, pixels of thin vessels make up a much smaller proportion than the other pixels. As a result, even when the improvement on thin vessel segmentation is obvious, the gain is slight when evaluated by general metrics on the whole image. The last column in Table 3 shows the connection sensitive accuracy of the different methods. The connection sensitive accuracy of CSAU is 2.8% higher than that of VGAN and 3.8% higher than that of DRIU. This means that CSAU performs better on segmenting boundaries and thin vessels. Fig. 11 shows a group of examples on DRIVE. The results segmented by VGAN and CSAU look similar from an overall perspective. Zooming in on the areas surrounded by red rectangles, it is clear that VGAN tends to produce inaccurate boundaries and broken thin vessels, whereas CSAU gets more accurate boundaries and more complete vessel structures.

Methods DRIVE
K-Boost[3] 0.9307 0.8464 0.7797 0.7563 0.9456 0.6739
HED[26] 0.9696 0.8773 0.7938 0.7943 0.9475 0.7016
Wavelets[19] 0.9436 0.8149 0.7601 0.7628 0.9387 0.6839
$N^4$-Fields[7] 0.9686 0.8851 0.8021 0.7994 0.9498 0.7178
DRIU[12] 0.9793 0.9064 0.8210 0.8261 0.9541 0.7470
CRFs[16] 0.7799 0.7829 0.9438 0.6785
VGAN[20] 0.9803 0.9142 0.8277 0.8300 0.9560 0.7537
CSAU 0.9807 0.9157 0.8294 0.8349 0.9563 0.7751
Table 3: Comparison of different methods on DRIVE.
Figure 10: Precision and Recall curves and Receiver Operating Characteristic curves for different methods on DRIVE.
Figure 11: Comparison of details between VGAN and CSAU.

On STARE, a similar pattern is observed as on DRIVE. CSAU takes first place on all the metrics except the PR AUC. It gets 0.34% higher ROC AUC, 0.1% higher F1-score and 0.6% higher accuracy than VGAN. Several zoomed-in images are displayed in Fig. 1, which indicate that CSAU obtains good vessel structures on STARE as well.

On HRF, CSAU is compared with Odstrcilik, Vostatek(Soares), Vostatek(Sofka) and Orlando. The results of the different methods differ widely on the general segmentation metrics, so we did not compute the connection sensitive accuracy for further analysis. CSAU gets the highest scores in this group of experiments. Compared with Orlando, the F1-score is improved by more than 14 percent.

Methods STARE
HED[26] 0.9764 0.8888 0.8057 0.8200 0.9588 0.7257
Wavelets[19] 0.9694 0.8433 0.7756 0.7817 0.9529 0.7226
DRIU[12] 0.9772 0.9101 0.8323 0.8380 0.9648 0.7667
VGAN[20] 0.9777 0.9159 0.8353 0.8350 0.9657 0.7694
CSAU 0.9834 0.9206 0.8435 0.8465 0.9673 0.7878
Table 4: Comparison of different methods on STARE.
Methods HRF
Odstrcilik[14] 0.967 0.7316 0.6950 0.7772
0.97 0.7340
0.937 0.5830
Orlando[17] 0.7168 0.7199 0.7201
CSAU 0.9867 0.9047 0.8171 0.8043 0.8303
Table 5: Comparison of different methods on HRF.

4.3 Experiment Analysis

To explore why CSAU performs well, we carried out extra experiments on the datasets. We tried four different combinations: U-Net with CE loss (UCE), U-Net with CS loss (UCS), Attention U-Net with CE loss (AUCE) and Attention U-Net with CS loss (CSAU). Tables 6 and 7 display the results of the different combinations on DRIVE and STARE, respectively. The table for HRF is provided in the supplementary materials. The results show that either the proposed attention mechanism or the CS loss alone improves the performance. With both techniques, CSAU gets the best results in the group. Fig. 12 visually compares UCE and CSAU on an image of DRIVE. The proposed CSAU clearly segments fine vessels more correctly while preserving topological structures well.

For quantitative analysis, on DRIVE, the result of CSAU is 0.6% higher in F1-score, 0.2% higher in ROC AUC and 0.6% higher in accuracy than that of UCE. As previously discussed, the results on general metrics do not improve much, but in Fig. 12 the enhancement is noticeable. To further analyze the source of the gains, we calculate the connection sensitive accuracy of the results of the different combinations. It can be seen that CSAU enhances the accuracy of segmentation mainly by improving the performance on boundaries and thin vessels. The other groups of experiments on STARE and HRF confirm the effectiveness of the proposed method. Full experimental results can be found in the supplementary materials.

Figure 12: Comparison between UCE and CSAU.
Methods DRIVE
0.8243 0.9084 0.9776 0.8318 0.9549 0.7435
0.8255 0.9101 0.9802 0.8307 0.9554 0.7523
0.8258 0.9111 0.9777 0.8303 0.9553 0.7524
0.8294 0.9157 0.9807 0.8349 0.9563 0.7751
Table 6: Comparison of different combinations on DRIVE.
Methods STARE
0.8310 0.9096 0.9789 0.8350 0.9646 0.7513
0.8372 0.9155 0.9796 0.8492 0.9656 0.7702
0.8393 0.9202 0.9842 0.8455 0.9663 0.7619
0.8435 0.9206 0.9834 0.8465 0.9673 0.7878
Table 7: Comparison of different combinations on STARE.

5 Conclusions

In this paper, we proposed a symmetric neural network named connection sensitive attention U-Net for retinal vessel segmentation. Unlike other end-to-end semantic segmentation networks, the proposed CSAU is concerned not only with pixel-level accuracy but also with topological structure, through a novel connection sensitive loss and a new attention gate. The network also learns attention weights and concatenates them at the end of the network, which further improves accuracy.

We verify the validity of CSAU on three public datasets: DRIVE, STARE, and HRF. CSAU not only gets the highest F1-score, ROC AUC and accuracy on all three datasets, but also segments thin vessel structures well compared with the state-of-the-art methods. We also propose a new metric named connection sensitive accuracy to evaluate the improvement on thin vessel segmentation. Based on it, we conclude that CSAU segments thin vessels with high accuracy, which is important for clinical diagnosis.

In the future, we intend to try multiscale techniques and semi-supervised learning techniques to further enhance accuracy and efficiency.


  • [1] J. Almotiri, K. Elleithy, and A. Elleithy. Retinal vessels segmentation techniques and algorithms: A survey. Applied Sciences, 8(2):155, 2018.
  • [2] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
  • [3] C. Becker, R. Rigamonti, V. Lepetit, and P. Fua. Supervised feature learning for curvilinear structure segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 526–533. Springer, 2013.
  • [4] A. Budai, R. Bock, A. Maier, J. Hornegger, and G. Michelson. Robust vessel segmentation in fundus images. International journal of biomedical imaging, 2013, 2013.
  • [5] M. M. Fraz, P. Remagnino, A. Hoppe, B. Uyyanonvara, A. R. Rudnicka, C. G. Owen, and S. A. Barman. Blood vessel segmentation methodologies in retinal images–a survey. Computer methods and programs in biomedicine, 108(1):407–433, 2012.
  • [6] H. Fu, Y. Xu, D. W. K. Wong, and J. Liu. Retinal vessel segmentation via deep learning network and fully-connected conditional random fields. In Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on, pages 698–701. IEEE, 2016.
  • [7] Y. Ganin and V. Lempitsky. -fields: Neural network nearest neighbor fields for image transforms. In

    Asian Conference on Computer Vision

    , pages 536–551. Springer, 2014.
  • [8] A. Hoover, V. Kouznetsova, and M. Goldbaum. Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Transactions on Medical imaging, 19(3):203–210, 2000.
  • [9] R. Li, W. Liu, L. Yang, S. Sun, W. Hu, F. Zhang, and W. Li. Deepunet: A deep fully convolutional network for pixel-level sea-land segmentation. IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing, PP(99):1–9, 2017.
  • [10] P. Liskowski and K. Krawiec. Segmenting retinal blood vessels with deep neural networks. IEEE transactions on medical imaging, 35(11):2369–2380, 2016.
  • [11] I. Loshchilov and F. Hutter. Fixing weight decay regularization in adam. arXiv preprint arXiv:1711.05101, 2017.
  • [12] K.-K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. Van Gool. Deep retinal image understanding. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 140–148. Springer, 2016.
  • [13] A. Mosinska, P. Marquez-Neila, M. Kozinski, and P. Fua. Beyond the pixel-wise loss for topology-aware delineation. In

    Conference on Computer Vision and Pattern Recognition (CVPR)

    , number CONF, 2018.
  • [14] J. Odstrcilik, R. Kolar, A. Budai, J. Hornegger, J. Jan, J. Gazarek, T. Kubena, P. Cernosek, O. Svoboda, and E. Angelopoulou. Retinal vessel segmentation by improved matched filtering: evaluation on a new high-resolution fundus image database. IET Image Processing, 7(4):373–383, 2013.
  • [15] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018.
  • [16] J. I. Orlando and M. Blaschko. Learning fully-connected crfs for blood vessel segmentation in retinal images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 634–641. Springer, 2014.
  • [17] J. I. Orlando, M. Fracchia, V. del Río, and M. del Fresno. Retinal blood vessel segmentation in high resolution fundus photographs using automated feature parameter estimation. In 13th International Conference on Medical Information Processing and Analysis, volume 10572, page 1057210. International Society for Optics and Photonics, 2017.
  • [18] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
  • [19] J. V. Soares, J. J. Leandro, R. M. Cesar, H. F. Jelinek, and M. J. Cree. Retinal vessel segmentation using the 2-d gabor wavelet and supervised classification. IEEE Transactions on medical Imaging, 25(9):1214–1222, 2006.
  • [20] J. Son, S. J. Park, and K.-H. Jung. Retinal vessel segmentation in fundoscopic images with generative adversarial networks. arXiv preprint arXiv:1706.09318, 2017.
  • [21] C. L. Srinidhi, P. Aparna, and J. Rajan. Recent advancements in retinal vessel segmentation. Journal of medical systems, 41(4):70, 2017.
  • [22] J. Staal, M. D. Abràmoff, M. Niemeijer, M. A. Viergever, and B. Van Ginneken. Ridge-based vessel segmentation in color images of the retina. IEEE transactions on medical imaging, 23(4):501–509, 2004.
  • [23] C. Ventura, J. Pont-Tuset, S. Caelles, K.-K. Maninis, and L. Van Gool. Iterative deep learning for network topology extraction. arXiv preprint arXiv:1712.01217, 2017.
  • [24] P. Vostatek, E. Claridge, H. Uusitalo, M. Hauta-Kasari, P. Fält, and L. Lensu. Performance comparison of publicly available retinal blood vessel segmentation methods. Computerized Medical Imaging and Graphics, 55:2–12, 2017.
  • [25] Y. Wei, Z. Wang, and M. Xu. Road structure refined cnn for road extraction in aerial image. IEEE Geosci. Remote Sensing Lett., 14(5):709–713, 2017.
  • [26] S. Xie and Z. Tu. Holistically-nested edge detection. In Proceedings of the IEEE international conference on computer vision, pages 1395–1403, 2015.