DUNet: A deformable network for retinal vessel segmentation

11/03/2018 ∙ by Qiangguo Jin, et al. ∙ Linköping University, Tianjin University

Automatic segmentation of retinal vessels in fundus images plays an important role in the diagnosis of diseases such as diabetes and hypertension. In this paper, we propose Deformable U-Net (DUNet), which exploits the local features of retinal vessels with a U-shaped architecture, in an end-to-end manner, for retinal vessel segmentation. Inspired by the recently introduced deformable convolutional networks, we integrate deformable convolution into the proposed network. DUNet, with upsampling operators to increase the output resolution, is designed to extract context information and enable precise localization by combining low-level feature maps with high-level ones. Furthermore, DUNet captures retinal vessels of various shapes and scales by adaptively adjusting the receptive fields according to the vessels' scales and shapes. Three public datasets, DRIVE, STARE and CHASE_DB1, are used to train and test our model. Detailed comparisons between the proposed network, the deformable convolutional network and U-Net are provided in our study. Results show that DUNet extracts more detailed vessels and exhibits state-of-the-art performance for retinal vessel segmentation, with a global accuracy of 0.9697/0.9722/0.9724 and AUC of 0.9856/0.9868/0.9863 on DRIVE, STARE and CHASE_DB1, respectively. Moreover, to show the generalization ability of DUNet, we used two additional retinal vessel datasets, WIDE and a synthetic dataset with diverse styles named SYNTHE, for qualitative and quantitative analysis and comparison with other methods. Results indicate that DUNet outperforms other state-of-the-art methods.




I Introduction

The morphological and topographical changes of retinal vessels may indicate pathological conditions such as diabetes and hypertension. Diabetic Retinopathy (DR), caused by elevated blood sugar levels, is a complication of diabetes in which retinal blood vessels leak into the retina, accompanied by swelling of the retinal vessels [1]. Swelling of the retinal vessels in a diabetic patient therefore deserves attention. Hypertensive Retinopathy (HR) is another common retinal disease, caused by high blood pressure [2]. Increased vascular tortuosity or narrowing of vessels can be observed in patients with high blood pressure [3]. Therefore, retinal blood vessels extracted from fundus images can be used in the early diagnosis of severe diseases. This motivates the development of more accurate retinal blood vessel detection algorithms to facilitate the early diagnosis of pathological diseases.

However, retinal blood vessels present extremely complicated structures, with high tortuosity and various shapes [4], which makes blood vessel segmentation quite challenging. Approaches to blood vessel detection fall into two main categories: manual segmentation and algorithmic segmentation. Manual segmentation is time-consuming and requires skilled technical staff. Therefore, automated segmentation of retinal vessels, which can relieve the heavy burden of manual segmentation, is in high demand. However, due to the uneven intensity distribution of retinal images, the subtle contrast between the vessels and the background, the high complexity of the vessel structures, and image noise, it is quite challenging to segment retinal blood vessels accurately and efficiently.

Deep learning has shown excellent performance in medical imaging tasks. Recently, networks based on the Fully Convolutional Network (FCN), such as U-Net [5], have attracted more attention than traditional Convolutional Neural Networks (CNNs) because of their ability to obtain a coarse-to-fine representation. In this study, we propose an FCN-based network named Deformable U-Net (DUNet) that enhances the ability of deep neural networks to segment vessels in an end-to-end, pixel-to-pixel manner. It has a U-shape similar to U-Net [5], in which upsampling operators with a large number of feature channels are stacked symmetrically to the conventional CNN, so that context information is captured and propagated to higher-resolution layers and a more precise segmentation is obtained. Furthermore, inspired by the recently proposed deformable convolutional networks (Deformable-ConvNet) [6], we stack deformable convolution blocks in both the encoder and the decoder to capture geometric transformations. As a result, the receptive fields adapt to the objects' scales and shapes, and complicated vessel structures can be detected well. Deformable-ConvNet and U-Net are used for comparison. All networks were trained from scratch, and a detailed analysis of the experimental results is provided. In the next section, we give a brief review of related work. Section III explains the architecture of DUNet and the retinal blood vessel segmentation pipeline, and briefly introduces Deformable-ConvNet and U-Net. Experimental results are presented in Section IV, where we evaluate the proposed method on three retinal blood vessel datasets. Conclusions and discussions are given in Section V.

Fig. 1: The pipeline of the three networks. (a) Original image; (b) Training samples; (c) Snapshots of proposed DUNet and compared models. Note that blue blocks refer to deformable convolution and the white ones represent regular convolution; (d) Inference results; (e) Re-composition of segmentation results.

II Related work

The goal of retinal blood vessel segmentation is to locate and identify retinal vessel structures in fundus images. With the development of imaging technology, various intelligent algorithms have been applied to retinal vessel segmentation. According to their learning patterns, segmentation methods can be divided into supervised and unsupervised methods. Supervised learning learns a model from labeled training data and uses that model to predict the category of test data. A brief overview of vessel segmentation from these two aspects is given next.

II-A Unsupervised method

Unsupervised methods use no labeled training samples and, in most cases, construct models directly. Zana et al. presented an algorithm based on mathematical morphology and curvature evaluation for detecting vessel-like patterns in a noisy environment and obtained an accuracy of 0.9377 [7]. Fraz et al. combined vessel centerline detection with morphological bit-plane slicing to extract vessels from retinal images [8]. Martinez-Perez et al. proposed a method that automatically segments retinal blood vessels based on multiscale feature extraction [9]. Niemeijer et al. compared a number of vessel segmentation algorithms [10]; the highest accuracy among the compared algorithms was 0.9416. Zhang et al. presented a retinal vessel segmentation algorithm using an unsupervised texton dictionary, where vessel textons were derived from the responses of a multi-scale Gabor filter bank [11]; better performance could be obtained with proper pre-processing. Hassan et al. proposed a method that combined mathematical morphology and k-means clustering to segment blood vessels [12]. However, this method did not handle vessels of varying widths well, and tiny structures might be lost. Oliveira et al. used a combination of a matched filter, Frangi's filter and a Gabor wavelet filter to enhance the vessels [13]. They took the average of a few performance metrics to enhance the contrast between vessels and background. Jouandeau et al. presented an algorithm based on adaptive random sampling [14]. Garg et al. proposed a segmentation approach that modeled the vessels as trenches [15]. They first corrected the illumination, detected trenches by high curvature, and oriented the trenches in a particular direction; they then used a modified region growing method to extract the complete vessel structure. However, the empirically set threshold on the mean illumination level might introduce bias. Zardadi et al. presented a faster unsupervised method for automatic detection of blood vessels in fundus images [16]. They enhanced the blood vessels in various directions, applied an activation function to the cellular responses, classified each pixel with an adaptive thresholding algorithm, and finally carried out morphological post-processing. However, several spots were falsely segmented as vessels, which degraded the final performance of the algorithm.

II-B Supervised method

Unlike unsupervised learning, supervised learning requires hand-labeled data to build an optimal predictive model, which maps each input to the corresponding output. It has been widely applied to segmentation tasks. Segmentation in this setting requires two components: a feature extractor that computes a feature vector for each pixel, and a classifier that maps the extracted vectors to the corresponding labels. A number of feature extractors have been proposed, for instance the Gabor filter [17] and the Gaussian filter [18]. Various classifiers, such as the k-NN classifier [19], support vector machines (SVM) [20], [21], artificial neural networks (ANN) [22] and AdaBoost [23], have been proposed to deal with different tasks.

Supervised methods have been widely used in retinal vessel segmentation. Aslani et al. proposed a segmentation method that characterized each pixel with a vector of hybrid features computed by different extractors, and trained a Random Forest classifier on these hybrid feature vectors to classify vessel and non-vessel pixels. To simplify the model and increase efficiency, the number of Gabor features should be kept as small as possible. Marín et al. used a Neural Network (NN) scheme for pixel classification and computed a 7-D vector composed of gray-level and moment invariants-based features to represent each pixel [25]. However, the computational cost was high and needed to be optimized.

For these traditional supervised methods, the choice of features used for classification greatly influences the final prediction results. However, the features are often defined empirically, which requires human intervention and may introduce bias. Therefore, an automated and effective feature extractor is highly desirable.

Deep learning refers to a family of algorithms, based on multi-layer neural networks trained with backpropagation, that can solve image, text and other tasks. One of the most significant contributions of deep learning is that it can replace handcrafted features with features learned automatically through deep hierarchical feature extraction.


In a number of fields such as image processing, bioinformatics and natural language processing, deep learning architectures such as Deep Neural Networks, Convolutional Neural Networks, Deep Belief Networks and Recurrent Neural Networks have been widely used and have produced state-of-the-art results on various tasks. Recently, several studies have investigated vessel segmentation based on deep learning. Wang et al. preprocessed the retinal vessel images and then combined two strong classifiers, a Convolutional Neural Network (CNN) and a Random Forest (RF), to carry out the segmentation [27]. Fu et al. formulated vessel segmentation as a holistically-nested edge detection (HED) problem and used fully convolutional networks to generate a vessel probability map [28]. Maji et al. used a ConvNet-ensemble framework to process color fundus images and detect blood vessels [29]. Jiang et al. proposed a method that defined and computed pixel-wise primary features for segmentation and then trained a Neural Network (NN) classifier on selected training data [30]. In this method, each pixel was represented by an 8-D vector, and unlabeled pixels were classified based on this vector. Azemin et al. estimated the impact of aging based on the results of supervised vessel segmentation using an artificial neural network [31], and showed that different age groups affected different aspects of the segmentation results. Liskowski et al. proposed a supervised segmentation architecture that used a Deep Neural Network trained on a large dataset preprocessed via global contrast normalization, zero-phase whitening, geometric transformations and gamma correction [32]. Their network classified multiple pixels simultaneously using a variant of structured prediction. Fu et al. regarded segmentation as a boundary detection problem and combined Convolutional Neural Network (CNN) and Conditional Random Field (CRF) layers into an integrated deep network [33].

Overall, deep learning approaches are expected to overcome the difficulties of traditional unsupervised and supervised methods. In our study, we developed a systematic framework based on fully convolutional networks to perform effective, automatic segmentation of retinal blood vessels.

III Methodology

The goal of our work is to build deep learning models that segment retinal vessels in fundus images. Inspired by U-Net [5] and the deformable convolutional network (Deformable-ConvNet) [6], we propose a new network named Deformable U-Net (DUNet) for retinal vessel segmentation. The proposed approach is designed to integrate the advantages of both the deformable unit and the U-Net architecture. We introduce the proposed method in detail and briefly explain the two other networks as well.

Fig. 1 shows an overview of the proposed DUNet, U-Net and Deformable-ConvNet. The raw images are preprocessed and cropped into small patches to build the training and validation datasets. In the comparative experiments, each model is trained with its own patch size. Since DUNet and U-Net are both end-to-end segmentation frameworks, their patch size is chosen to trade off computational complexity against efficiency. Deformable-ConvNet, in contrast, is a pixel-classification model, so a separate patch size is chosen for its training. After inference on an image from the test dataset, the outputs of each model are re-composed to form a complete segmentation map.
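The paper does not spell out how patch predictions are re-composed into a full map. A minimal sketch, assuming overlapping sliding-window tiles whose predictions are averaged where they overlap (a common choice, not necessarily the authors'), is given below in plain Python; `tile_positions`, `recompose` and the `predict` callback are hypothetical names, and the patch size and stride are left to the caller.

```python
def tile_positions(length, patch, stride):
    """Start offsets of patch-sized windows covering [0, length); assumes length >= patch."""
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        # Add one last window flush with the border so no pixel is left uncovered.
        starts.append(length - patch)
    return starts


def recompose(image, patch, stride, predict):
    """Predict overlapping patches and average the overlaps into a full-size map.

    image: list of rows (height x width). predict: hypothetical stand-in for a
    trained model, mapping a patch x patch list of rows to a patch x patch map.
    """
    h, w = len(image), len(image[0])
    if h < patch or w < patch:
        raise ValueError("image must be at least as large as the patch")
    if not 0 < stride <= patch:
        raise ValueError("stride must be in (0, patch] so that windows leave no gaps")
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for top in tile_positions(h, patch, stride):
        for left in tile_positions(w, patch, stride):
            crop = [row[left:left + patch] for row in image[top:top + patch]]
            pred = predict(crop)
            for i in range(patch):
                for j in range(patch):
                    total[top + i][left + j] += pred[i][j]
                    count[top + i][left + j] += 1
    # Every pixel is covered at least once, so the division is safe.
    return [[t / c for t, c in zip(trow, crow)] for trow, crow in zip(total, count)]
```

With an identity `predict`, the re-composed map equals the input image; averaging overlapping predictions, rather than overwriting them, smooths the seams between neighboring patches.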

Fig. 2: Example original retinal images (upper row) and corresponding ground truth (bottom row) from DRIVE, STARE and CHASE, respectively.

III-A Datasets and material

Performance was evaluated on three public datasets: DRIVE, STARE and CHASE_DB1 (CHASE). DRIVE (Digital Retinal Images for Vessel Extraction) contains 40 color fundus photographs obtained from a diabetic retinopathy (DR) screening program in the Netherlands [34]. The plane resolution of DRIVE is . The STARE (Structured Analysis of the Retina) dataset, which contains 20 images, was proposed to assist ophthalmologists in diagnosing eye disease [35]. The plane resolution of STARE is . The CHASE dataset contains 28 images, two per patient for 14 children in the Child Heart and Health Study in England [36]. The plane resolution of CHASE is . Experts' manual annotations of the vasculature are available as ground truth (Fig. 2).

III-B Image preprocessing and dataset preparation

Deep neural networks can learn effectively from unpreprocessed image data, but they tend to be much more efficient when appropriate preprocessing has been applied. In this study, three image preprocessing strategies were employed. Single-channel images show better vessel-background contrast than RGB images [37], so the raw RGB images were converted into single-channel images. Normalization and Contrast Limited Adaptive Histogram Equalization (CLAHE) [38] were applied over the whole dataset to enhance the foreground-background contrast. Finally, gamma correction was applied to further improve image quality. Intermediate images after each preprocessing step are shown in Fig. 3.
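The preprocessing chain can be sketched in dependency-free Python. This is an illustration under stated assumptions, not the authors' code: the gray conversion uses ITU-R BT.601 luma weights, normalization means per-image standardization followed by rescaling to [0, 255], and the gamma value of 1.2 is hypothetical. CLAHE is only marked as a step, since in practice it would come from a library implementation such as OpenCV's `cv2.createCLAHE`.

```python
import statistics


def to_gray(rgb):
    """RGB rows of (r, g, b) in [0, 255] -> single-channel rows.

    Uses ITU-R BT.601 luma weights; the paper does not state which conversion it used.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def normalize(img):
    """Standardize to zero mean / unit variance, then rescale linearly to [0, 255]."""
    flat = [v for row in img for v in row]
    mean, std = statistics.mean(flat), statistics.pstdev(flat)
    if std == 0:
        return [[0.0 for _ in row] for row in img]  # constant image: nothing to stretch
    z = [[(v - mean) / std for v in row] for row in img]
    lo = min(min(row) for row in z)
    hi = max(max(row) for row in z)
    return [[255.0 * (v - lo) / (hi - lo) for v in row] for row in z]


def gamma_correct(img, gamma=1.2):
    """out = 255 * (in / 255) ** (1 / gamma); gamma = 1.2 is a hypothetical setting."""
    return [[255.0 * (v / 255.0) ** (1.0 / gamma) for v in row] for row in img]


def preprocess(rgb, gamma=1.2):
    gray = to_gray(rgb)
    norm = normalize(gray)
    # CLAHE would be applied here (e.g. with OpenCV's cv2.createCLAHE); it is
    # omitted so that this sketch stays dependency-free.
    return gamma_correct(norm, gamma)
```

With gamma greater than 1, this form of correction brightens mid-tones while leaving 0 and 255 fixed.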

Fig. 3: Typical images after each preprocessing step. (a) Original image; (b) Normalized image; (c) Image after CLAHE operation; (d) Image after Gamma correction.

To reduce overfitting, our models were trained on small patches randomly extracted from the images. To reduce computational complexity while preserving the surrounding local features, we set the size of the patch to for DUNet and U-Net. The corresponding label patch was taken from the same location in the ground truth image (Fig. 4).
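Each image patch and its label patch must be cut from the same location. A minimal sketch in plain Python, assuming uniform sampling of patch positions over the whole image (the paper does not say whether sampling is restricted, for example to the field of view); `crop_window`, `sample_patches` and the seed are hypothetical, and the patch size is supplied by the caller.

```python
import random


def crop_window(a, top, left, patch):
    """patch x patch window of a (list of rows) whose top-left corner is (top, left)."""
    return [row[left:left + patch] for row in a[top:top + patch]]


def sample_patches(image, label, patch, n, seed=0):
    """Return n (image_patch, label_patch) pairs cut at the same random positions."""
    h, w = len(image), len(image[0])
    if (len(label), len(label[0])) != (h, w):
        raise ValueError("image and label must have the same size")
    if patch > h or patch > w:
        raise ValueError("patch is larger than the image")
    rng = random.Random(seed)  # seeded so that the sampled set is reproducible
    pairs = []
    for _ in range(n):
        top = rng.randint(0, h - patch)   # randint includes both end points
        left = rng.randint(0, w - patch)
        pairs.append((crop_window(image, top, left, patch),
                      crop_window(label, top, left, patch)))
    return pairs
```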

Fig. 4: Typical patches selected for model training. (a) shows the patches from the original images; (b) shows the patches from the preprocessed image; (c) shows the patches from the corresponding ground truth.

All datasets were divided into a training set, a validation set and a test set. The training set is used to adjust the weights, the validation set to select the best weights, and the test set to evaluate performance. For DRIVE, 20 images were used for training and validation and the remaining 20 for testing. Since no training/test split is provided for STARE and CHASE, we manually assigned the first 10/14 images to training and validation and the remaining 10/14 to testing. From each training/validation image of DRIVE, 10000 patches were randomly sampled, 8000 for training and 2000 for validation. From each training/validation image of STARE/CHASE, 20000/15000 patches were randomly sampled, 16000/12000 for training and 4000/3000 for validation. Therefore, DRIVE and STARE each had 160000 training patches and 40000 validation patches, while CHASE had 168000 training patches and 42000 validation patches. The test set consists of the remaining full images. Since the patch dataset is large enough, data augmentation was not applied.
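The patch totals above follow from multiplying the number of training/validation images by the per-image patch counts. A small Python check using only the figures stated in the text (the dictionary layout and function name are illustrative):

```python
# Per dataset: (training/validation images, training patches per image,
# validation patches per image), taken from the split described above.
SPLITS = {
    "DRIVE": (20, 8000, 2000),
    "STARE": (10, 16000, 4000),
    "CHASE": (14, 12000, 3000),
}


def patch_totals(splits):
    """Total training and validation patches per dataset."""
    return {name: (n * train, n * val) for name, (n, train, val) in splits.items()}


for name, (train, val) in patch_totals(SPLITS).items():
    print(f"{name}: {train} training / {val} validation patches")
```

In every dataset, 80% of the patches sampled from each image go to training and 20% to validation.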

III-C Deformable U-Net (DUNet)

Inspired by U-Net [5] and Deformable-ConvNet [6], we propose DUNet for retinal vessel segmentation. The network has a U-shaped architecture with an encoder and a decoder on the two sides, in which the original convolutional layers are replaced by deformable convolutional blocks. The model is trained to integrate low-level features with high-level features, and its receptive fields and sampling locations are trained to adapt to the vessels' scales and shapes; both properties enable precise segmentation. In short, DUNet builds on top of U-Net and uses the deformable convolutional block as its encoding and decoding unit.

Fig. 5 illustrates the network architecture; the detailed design of the deformable convolutional block is shown in the dashed window. The architecture consists of a convolutional encoder (left side) and a decoder (right side) in a U-Net framework. In each encoding and decoding phase, deformable convolutional blocks model retinal vessels of various shapes and scales by learning local, dense and adaptive receptive fields. Each deformable convolutional block consists of a convolution offset layer, which is the core of deformable convolution, a convolution layer, a batch normalization layer [39] and an activation layer. In the decoding phase, we additionally insert a regular convolution layer after each merge operation to adjust the number of filters fed to the convolution offset layer. With this architecture, DUNet can learn discriminative features and generate detailed retinal vessel segmentation results.

Fig. 5: DUNet architecture with convolutional encoder and decoder using deformable convolutional block based on U-Net architecture. Output size of feature map is listed beside each layer.

III-C1 U-Net as the basic architecture

Our U-Net follows the overall design of the standard U-Net, with an encoder and a decoder arranged symmetrically on the two sides. The encoding phase encodes input images into a lower-dimensional representation with richer filters, while the decoding phase inverts the encoding by upsampling and merging low-dimensional feature maps, which enables precise localization. In addition, a large number of feature channels is used in the upsampling part to propagate context information to higher-resolution layers. To reduce internal covariate shift and speed up training, a batch normalization layer is inserted after each unit.
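How resolution and channel counts evolve through such an encoder/decoder can be traced with a few lines of Python. This is a sketch under hypothetical settings, not the configuration in Fig. 5: it assumes "same"-padded convolutions, 2×2 pooling and 2× upsampling, filters doubling per encoder level, channel concatenation at each merge, and a convolution after the merge that restores the skip connection's channel count (in the spirit of the extra convolution DUNet inserts after merging).

```python
def unet_shapes(size, base_filters, depth):
    """Trace (stage, spatial size, channels) through a U-Net-style encoder/decoder.

    size, base_filters and depth are hypothetical inputs; see the assumptions above.
    """
    if size % (2 ** depth):
        raise ValueError("size must be divisible by 2**depth")
    trace, skips = [], []
    s, c = size, base_filters
    for level in range(depth):
        trace.append((f"enc{level}", s, c))
        skips.append((s, c))          # kept for the skip connection
        s, c = s // 2, c * 2          # pooling halves resolution; next level doubles filters
    trace.append(("bottleneck", s, c))
    for level in reversed(range(depth)):
        _, skip_c = skips[level]
        s *= 2                        # upsampling restores the skip's resolution
        trace.append((f"dec{level}-merge", s, c + skip_c))  # channel concatenation
        c = skip_c                    # convolution after the merge reduces the channel count
        trace.append((f"dec{level}", s, c))
    return trace
```

For example, `unet_shapes(48, 32, 2)` passes through a 12×12 bottleneck with 128 channels and ends at 48×48 with 32 channels; the widest tensors appear right after each merge, which is where the upsampling path carries its large number of feature channels.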

Fig. 6: Illustration of the sampling locations in regular and deformable convolutions. The upper row shows regular convolution and the bottom row the corresponding deformable convolution. Each sampling location is shifted by an offset to generate a new sampling location.

III-C2 Deformable Convolutional Blocks

A major challenge in vessel segmentation is modeling vessels of various shapes and scales [6]. Traditional methods such as the steerable filter [40] and the Frangi filter [41] exploit vessel features through a linear combination of responses at multiple scales or directions, which may introduce bias. The deformable convolutional network (Deformable-ConvNet) addresses this problem by introducing deformable convolutional layers and deformable RoI pooling layers into conventional neural networks. We were inspired by the idea from Deformable-ConvNet that various shapes and scales can be captured via deformable receptive fields that adapt to the input features. Therefore, we integrated deformable convolution into the proposed network.

In deformable convolution, offsets are added to the regular grid sampling locations used in standard convolution. The offsets are learned from the preceding feature maps by additional convolutional layers. Therefore, the deformation can adapt to different scales, shapes, orientations, etc. Fig. 6 illustrates deformable convolution with an example.

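As Fig. 6 shows, for a $k \times k$ kernel with grid size 1, the regular sampling grid $\mathcal{R}$ can be formalized as:

$$\mathcal{R} = \{(i, j) \mid i, j \in \{-\lfloor k/2 \rfloor, \ldots, \lfloor k/2 \rfloor\}\}, \quad \text{e.g. } \mathcal{R} = \{(-1,-1), (-1,0), \ldots, (0,1), (1,1)\} \text{ for } k = 3$$

Thus, each location $p_0$ on the output feature map $y$ can be formalized as:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n)$$

where $x$ denotes the input feature map, $w$ represents the weights of the sampled values, and $p_n$ enumerates the locations in $\mathcal{R}$. In deformable convolution, the regular grid $\mathcal{R}$ is augmented with offsets $\{\Delta p_n \mid n = 1, \ldots, N\}$, where $N = |\mathcal{R}|$, and we have

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)$$

Because the offset $\Delta p_n$ is usually not an integer, bilinear interpolation is applied to determine the value at each shifted sampling point. As mentioned above, the offsets $\Delta p_n$ are learned by an additional convolution layer; this procedure is illustrated in Fig. 7. Compared with the regular U-Net, DUNet incurs some additional computational cost in exchange for this more local and adaptive sampling.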
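To make the two formulas concrete, here is a plain-Python sketch of one output value $y(p_0)$ of a single-channel deformable convolution, with bilinear sampling at fractional positions. The offsets are passed in directly rather than predicted by an offset convolution, zero padding outside the image is an assumption, and the function names are hypothetical; in DUNet the offsets come from the additional convolution layer described above, and a framework computes all output locations in batched form.

```python
import math


def bilinear(x, py, px):
    """Sample the 2-D list x at fractional (py, px); out-of-range neighbors read as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    dy, dx = py - y0, px - x0

    def at(i, j):
        return x[i][j] if 0 <= i < h and 0 <= j < w else 0.0

    return ((1 - dy) * (1 - dx) * at(y0, x0) + (1 - dy) * dx * at(y0, x0 + 1)
            + dy * (1 - dx) * at(y0 + 1, x0) + dy * dx * at(y0 + 1, x0 + 1))


def regular_grid(k):
    """The grid R for a k x k kernel with grid size 1, in row-major order."""
    r = k // 2
    return [(i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)]


def deform_conv_at(x, weights, p0, offsets, k=3):
    """y(p0) = sum over p_n in R of w(p_n) * x(p0 + p_n + dp_n).

    weights and offsets are listed in the same order as regular_grid(k);
    each offset is a (dy, dx) pair.
    """
    total = 0.0
    for (pn_y, pn_x), w_n, (dp_y, dp_x) in zip(regular_grid(k), weights, offsets):
        total += w_n * bilinear(x, p0[0] + pn_y + dp_y, p0[1] + pn_x + dp_x)
    return total
```

With all offsets set to zero, `deform_conv_at` reduces to an ordinary convolution (in the cross-correlation form used by deep learning frameworks). On a linear intensity ramp, shifting every sample by half a pixel moves the output by exactly half a step, which is a quick check of the interpolation.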

Fig. 7: Illustration of a deformable convolution. The offset field is computed from the input patches and features and has 2N channels, corresponding to N 2-D offsets. The deformable convolution kernel has the same resolution as the current convolution layer. The convolution kernels and the offsets are learned simultaneously.

III-D Comparison with U-Net and Deformable-ConvNet

We compared the proposed model with two state-of-the-art models. One is the regular U-Net, introduced above; the other is the deformable convolutional network (Deformable-ConvNet). In our comparison, Deformable-ConvNet is used to decide whether a pixel belongs to a vessel or not, so vessel segmentation is treated as a classification task. A pixel's class is determined from its neighborhood, defined as the patch centered on that pixel: for each pixel to be classified, the pixel values in the patch centered on it are used to capture local information at a high level. To reduce computational complexity while capturing as much local information as possible, the size of the patch was set to . The architecture of the Deformable-ConvNet is shown in Fig. 8.
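For the pixel-classification formulation, every pixel yields one training example: the patch centered on it, together with the pixel's own label. A minimal plain-Python sketch, assuming an odd patch size and zero padding at the image border (the paper states neither); `centered_patch` and `pixel_dataset` are hypothetical names.

```python
def centered_patch(image, r, c, size):
    """size x size patch centered on pixel (r, c); pixels outside the image are zero-padded."""
    if size % 2 == 0:
        raise ValueError("an odd patch size gives a well-defined center pixel")
    half = size // 2
    h, w = len(image), len(image[0])
    return [[image[i][j] if 0 <= i < h and 0 <= j < w else 0
             for j in range(c - half, c + half + 1)]
            for i in range(r - half, r + half + 1)]


def pixel_dataset(image, label, size):
    """One (patch, center label) pair per pixel, for a vessel/non-vessel classifier."""
    return [(centered_patch(image, r, c, size), label[r][c])
            for r in range(len(image)) for c in range(len(image[0]))]
```

Unlike the patch-to-patch training of DUNet and U-Net, this produces one example per pixel. As a result, inference requires one forward pass per pixel unless the classifier is applied convolutionally.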

Fig. 8: The architecture of the Deformable-ConvNet. It is mainly composed of convolution layers (Conv), deformable convolutional layers (ConvOffset), batch normalization layers and activation layers (ReLU).

III-E Performance evaluation metrics

We evaluated our model using several metrics: Accuracy (ACC), Positive Predictive Value (PPV), True Positive Rate (TPR), True Negative Rate (TNR), and the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). ACC is the ratio of correctly classified pixels to the total number of pixels in the dataset. PPV, also called precision, is the proportion of true positives among all predicted positives. TPR, also known as sensitivity, measures the proportion of positives that are correctly identified. TNR, or specificity, measures the proportion of negatives that are correctly identified. These metrics are defined as follows:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{PPV} = \frac{TP}{TP + FP}, \quad \mathrm{TPR} = \frac{TP}{TP + FN}, \quad \mathrm{TNR} = \frac{TN}{TN + FP}$$

where TP, TN, FP and FN denote the numbers of true positive, true negative, false positive and false negative samples, respectively.

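Additionally, performance was evaluated with the F-measure [42] and the Jaccard similarity (JS) [43], which measure the similarity and diversity between the segmentation results and the ground truth on the test datasets. Here GT refers to the ground truth and SR to the segmentation result:

$$F_1 = \frac{2\,|GT \cap SR|}{|GT| + |SR|} = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}}, \qquad \mathrm{JS} = \frac{|GT \cap SR|}{|GT \cup SR|}$$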
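All of these pixel-level metrics derive from the four confusion counts. The plain-Python sketch below computes them from flattened binary masks; the function names and the convention of returning 0.0 for an empty denominator are assumptions, not part of the paper.

```python
def confusion_counts(gt, sr):
    """TP, FP, TN, FN for flat binary ground-truth (GT) and segmentation-result (SR) masks."""
    tp = fp = tn = fn = 0
    for g, s in zip(gt, sr):
        if s and g:
            tp += 1
        elif s:
            fp += 1
        elif g:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    return num / den if den else 0.0  # convention for empty denominators (an assumption)


def segmentation_metrics(gt, sr):
    tp, fp, tn, fn = confusion_counts(gt, sr)
    return {
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
        "PPV": _ratio(tp, tp + fp),
        "TPR": _ratio(tp, tp + fn),
        "TNR": _ratio(tn, tn + fp),
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),  # 2 * |GT intersect SR| / (|GT| + |SR|)
        "JS": _ratio(tp, tp + fp + fn),          # |GT intersect SR| / |GT union SR|
    }


def f1_from_rates(ppv, tpr):
    """Harmonic mean of precision (PPV) and sensitivity (TPR)."""
    return 2 * ppv * tpr / (ppv + tpr)
```

As a consistency check, `f1_from_rates(0.8537, 0.7894)`, using DUNet's PPV and TPR on DRIVE from Table II, gives approximately 0.8203, which matches the F1 column of that table.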


IV Experimental results

The proposed DUNet has upsampling layers that increase the output resolution. This enables propagation of context information to higher-resolution layers and detection of vessels of various shapes and scales, yielding accurate segmentation results. In this section, we systematically compare DUNet with Deformable-ConvNet and U-Net. We first show results on the validation set, which is used for parameter selection, and then present results on the test set. We also briefly compare DUNet with other recently published approaches, most of which are based on deep neural networks while the others are standard segmentation approaches. All experiments were conducted with the TensorFlow and Keras [45] frameworks on an NVIDIA GeForce GTX 1080Ti GPU.

IV-A Comparisons with Deformable-ConvNet and U-Net

We compared the three models, Deformable-ConvNet, U-Net and DUNet, on the DRIVE, STARE and CHASE datasets. As described in Section III, we split the data into training, validation and test sets. All three models were trained from scratch on the training set, with weights initialized to random values. We used a batch size of 60, 100 training epochs, the Adam optimizer and binary cross-entropy as the loss function. To ensure fast convergence and avoid overfitting, the learning rate was adjusted dynamically. The initial learning rate was set to 0.001. If the loss remained stable for 4 epochs, the learning rate was reduced by a factor of 10, and training was stopped if the loss stayed almost unchanged for 20 epochs; both values were set empirically. The validation accuracy and loss were recorded during training. Performance on the validation dataset is reported in Table I.
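The learning-rate rule can be replayed on a sequence of per-epoch losses to see when the rate drops and when training stops. A plain-Python sketch, assuming that "loss" means the monitored (validation) loss and that "stable" means an improvement smaller than a hypothetical `min_delta`; the function name is illustrative.

```python
def plateau_schedule(losses, lr0=1e-3, factor=10.0, lr_patience=4,
                     stop_patience=20, min_delta=1e-4):
    """Replay the reduce-on-plateau / early-stopping rule on per-epoch losses.

    Returns (learning rate used in each epoch, index of the stopping epoch or None).
    """
    lr, best = lr0, float("inf")
    wait_lr = wait_stop = 0
    lrs = []
    for epoch, loss in enumerate(losses):
        lrs.append(lr)
        if loss < best - min_delta:      # a real improvement resets both counters
            best = loss
            wait_lr = wait_stop = 0
            continue
        wait_lr += 1
        wait_stop += 1
        if wait_stop >= stop_patience:   # loss flat for stop_patience epochs: stop
            return lrs, epoch
        if wait_lr >= lr_patience:       # loss flat for lr_patience epochs: divide the LR
            lr /= factor
            wait_lr = 0
    return lrs, None
```

In Keras, roughly the same policy can be expressed with the `ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=4)` and `EarlyStopping(monitor="val_loss", patience=20)` callbacks, although their default tolerance settings differ slightly from this sketch.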

Model DRIVE-ACC DRIVE-Loss STARE-ACC STARE-Loss CHASE-ACC CHASE-Loss
Deformable-ConvNet 0.9622 0.1101 0.9501 0.1593 0.9651 0.0962
U-Net 0.9648 0.1413 0.9573 0.2659 0.9664 0.1366
DUNet 0.9650 0.0919 0.9543 0.1477 0.9704 0.0833
TABLE I: Validation accuracy (ACC) and loss of the three models trained from scratch on the DRIVE, STARE and CHASE datasets

Table I shows that DUNet achieved the highest validation accuracy (0.9650) and the lowest loss (0.0919) on DRIVE. On STARE, it had the second-highest accuracy and the lowest loss. On CHASE, it achieved the highest validation accuracy (0.9704) and the lowest loss (0.0833). A bar chart of the validation performance is shown in Fig. 9.

Fig. 9: Performance comparison of the three models on the validation dataset. (a) Validation performance on DRIVE; (b) validation performance on STARE; (c) validation performance on CHASE.
Fig. 10: ROC curves of different models. (a) ROC curves on DRIVE; (b) ROC curves on STARE; (c) ROC curves on CHASE.

We further evaluated the models on the test data. PPV, TPR, TNR, ACC, F1-score, JS and AUC are compared in Table II, Table III and Table IV. The tables show that DUNet achieves the best performance on most metrics, while U-Net obtains higher PPV and TNR on all three datasets. DUNet achieves the highest accuracy on DRIVE and STARE, while on CHASE its accuracy (0.9724) is marginally below that of U-Net (0.9728). The global accuracy of Deformable-ConvNet, U-Net and DUNet is 0.9642/0.9681/0.9697 on DRIVE, 0.9673/0.9705/0.9729 on STARE and 0.9659/0.9728/0.9724 on CHASE, respectively.

Model PPV TPR TNR ACC F1 JS AUC
Deformable-ConvNet 0.8180 0.7618 0.9837 0.9642 0.7889 0.9642 0.9745
U-Net 0.8795 0.7373 0.9903 0.9681 0.8021 0.9681 0.9830
DUNet 0.8537 0.7894 0.9870 0.9697 0.8203 0.9697 0.9856
TABLE II: Performance of the three models tested on DRIVE
Model PPV TPR TNR ACC F1 JS AUC
Deformable-ConvNet 0.8447 0.7036 0.9892 0.9673 0.7677 0.9674 0.9742
U-Net 0.9225 0.6712 0.9953 0.9705 0.7770 0.9705 0.9813
DUNet 0.8856 0.7428 0.9920 0.9729 0.8079 0.9729 0.9868
TABLE III: Performance of the three models tested on STARE
Model PPV TPR TNR ACC F1 JS AUC
Deformable-ConvNet 0.7024 0.7727 0.9786 0.9659 0.7359 0.9659 0.9772
U-Net 0.8211 0.7124 0.9898 0.9728 0.7629 0.9728 0.9830
DUNet 0.7510 0.8229 0.9821 0.9724 0.7853 0.9724 0.9863
TABLE IV: Performance of the three models tested on CHASE
Method Type Year PPV TPR TNR ACC AUC
Azzopardi et al. [46] STA 2015 - 0.7655 0.9704 0.9442 0.9614
Li et al. [47] DNN 2015 - 0.7569 0.9816 0.9527 0.9738
Liskowski et al. [32] DNN 2016 - 0.7811 0.9807 0.9535 0.9790
Fu et al. [33] DNN 2016 - 0.7603 - 0.9523 -
Dasgupta et al. [48] DNN 2017 0.8498 0.7691 0.9801 0.9533 0.9744
Roychowdhury et al. [49] STA 2017 - 0.7250 0.9830 0.9520 0.9620
Chen et al. [50] DNN 2017 - 0.7426 0.9735 0.9453 0.9516
Alom et al. [51] DNN 2018 - 0.7792 0.9813 0.9556 0.9784
DUNet DNN 2018 0.8537 0.7894 0.9870 0.9697 0.9856
TABLE V: Comparisons against existing approaches on DRIVE dataset
Method Type Year PPV TPR TNR ACC AUC
Azzopardi et al. [46] STA 2015 - 0.7716 0.9701 0.9497 0.9563
Li et al. [47] DNN 2015 - 0.7726 0.9844 0.9628 0.9879
Liskowski et al. [32] DNN 2016 - 0.8554 0.9862 0.9729 0.9928
Fu et al. [33] DNN 2016 - 0.7412 - 0.9585 -
Roychowdhury et al. [49] STA 2017 - 0.7720 0.9730 0.9510 0.9690
Chen et al. [50] DNN 2017 - 0.7295 0.9696 0.9449 0.9557
Alom et al. [51] DNN 2018 - 0.8298 0.9862 0.9712 0.9914
DUNet DNN 2018 0.8856 0.7428 0.9920 0.9729 0.9868
TABLE VI: Comparisons against existing approaches on STARE dataset
Method Type Year PPV TPR TNR ACC AUC
Azzopardi et al. [46] STA 2015 - 0.7585 0.9587 0.9387 0.9487
Li et al. [47] DNN 2015 - 0.7507 0.9793 0.9581 0.9716
Fu et al. [33] DNN 2016 - 0.7130 - 0.9489 -
Roychowdhury et al. [49] STA 2017 - 0.7201 0.9824 0.9530 0.9532
Alom et al. [51] DNN 2018 - 0.7759 0.9820 0.9634 0.9715
DUNet DNN 2018 0.7510 0.8229 0.9821 0.9724 0.9863
TABLE VII: Comparisons against existing approaches on CHASE dataset

We further evaluated the models using ROC curves, shown in Fig. 10. The closer an ROC curve lies to the top-left corner of the ROC plot, the more accurate the model. DUNet's curves lie closest to the top-left corner among the three models, while the Deformable-ConvNet curve is the lowest. The figures also show that DUNet obtains the largest area under the ROC curve (AUC), consistent with Tables II-IV.
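The AUC can be computed without drawing the curve. It equals the probability that a randomly chosen vessel pixel receives a higher predicted score than a randomly chosen background pixel (the Mann-Whitney U statistic), with ties counted as one half. The rank-based sketch below illustrates this equivalence in plain Python; the paper does not say how its AUC was computed, and a library routine such as scikit-learn's `roc_auc_score` returns the same value.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via average ranks (Mann-Whitney U); ties count 1/2."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1        # 1-based average rank of a block of tied scores
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative pixel")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` ranks three of the four positive/negative pairs correctly and returns 0.75.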

Fig. 11: Segmentation results using the different models on DRIVE.
Fig. 12: Segmentation results using the different models on STARE.
Fig. 13: Segmentation results using the different models on CHASE.
Fig. 14: Magnified views of green-boxed patches predicted by different models on DRIVE (top two rows), STARE (middle two rows) and CHASE (bottom two rows).
Fig. 15: Four distinct styles of retinal images synthesized by generative adversarial nets.
Fig. 16: Detailed views of four images from WIDE. Red boxes mark cases where DUNet performs better than the other two methods.

IV-B Retinal vessel segmentation results

We display the retinal vessel segmentation results in Fig. 11, Fig. 12 and Fig. 13. These figures show that DUNet produces more distinct vessel segmentations. DUNet can detect weak vessels and vessels that are tied up with one another, which U-Net and Deformable-ConvNet may miss; it therefore preserves more detail.

Fig. 14 shows magnified views of the three models' segmentation results on DRIVE, STARE and CHASE, covering vascular junctions, where several vessels are tied up and lie close to each other, as well as tiny vessels. Because the vascular tree is complicated, such structures are difficult to segment precisely. In junction regions, Deformable-ConvNet and U-Net extracted only coarse information owing to the limitations of their networks. Notably, Deformable-ConvNet extracted more vessel detail than U-Net in some junction regions, which demonstrates its ability to capture retinal vessels of various shapes. With the help of the deformable convolutional blocks, DUNet successfully segmented the tied-up vessels. In tiny-vessel regions, U-Net showed its limitations in handling details, whereas Deformable-ConvNet picked up some of them. DUNet, in turn, produced the desired segmentation of these tiny and weak vessels.

With this structure, DUNet is able to distinguish different vessels and performs better than the other models. The experimental results lead to the conclusion that, among the three models, the DUNet architecture performs best when dealing with complicated and weak vessel structures.

IV-C Comparison against existing methods

We also compared our method with several state-of-the-art approaches. Some of them are standard segmentation algorithms (denoted STA), while the others are based on deep neural networks (denoted DNN). Tables V, VI and VII summarize the type of each algorithm, its year of publication, and its performance on the DRIVE, STARE and CHASE datasets, respectively. The results show that DUNet performs best among these methods on DRIVE and CHASE: it achieves the highest global accuracy (0.9697 and 0.9724) and the highest AUC (0.9856 and 0.9863) with a small number of training samples, which shows that DUNet exhibits state-of-the-art performance compared with both standard segmentation methods and deep-neural-network-based methods. On STARE, DUNet does not outperform Liskowski et al.'s method [32] or Alom et al.'s method [51]: its AUC (0.9868) and TPR (0.7428) are lower than theirs (AUC 0.9928 and 0.9914; TPR 0.8554 and 0.8298). However, DUNet uses fewer training patches than these methods and still reaches competitive results.

Additionally, we compared our method with Dasgupta et al.'s method [48] and Alom et al.'s method [51] on two further datasets for qualitative and quantitative analysis. The first dataset, named WIDE and used for tree topology estimation, contains 15 high-resolution, wide-field RGB images. Each retinal image was taken from a different individual and captured as an uncompressed TIFF file at the widest setting [52]. Because WIDE does not contain ground truth for retinal vessel segmentation, we used it only for qualitative comparison of the proposed method with the other two methods. The second dataset (denoted SYNTHE) was synthesized with generative adversarial nets [53]. The dataset contains 20 retinal images at resolution, covering four styles: DRIVE [19], STARE [35], Kaggle and HRF [54]. Each style contains 5 retinal images generated from 5 corresponding ground-truth vessel maps. Fig. 15 shows examples from SYNTHE: four retinal images of distinct styles generated from the same vessel map.

To assess the generalization ability of the three models, we applied the weights trained on DRIVE to the WIDE and SYNTHE datasets, preprocessing these images and cropping them into patches in the same way. Fig. 16 qualitatively indicates that DUNet produces competitive results.
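
This section states only that the WIDE and SYNTHE images were preprocessed and cropped into patches "in the same way"; patch size, stride and the way overlapping predictions are merged are not given here. The sketch below is one plausible crop-predict-stitch loop under stated assumptions: overlapping square patches, a hypothetical 48-pixel patch with a 16-pixel stride, and averaging wherever patches overlap. `model` stands in for any per-patch predictor, such as a network trained on DRIVE; all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # row-major 2-D grid of pixel values


def patch_starts(length: int, patch: int, stride: int) -> List[int]:
    """Top-left offsets along one axis; the last patch is snapped to the border."""
    if not 0 < stride <= patch <= length:
        raise ValueError("need 0 < stride <= patch <= image size")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def extract_patches(img: Image, patch: int, stride: int) -> List[Tuple[int, int, Image]]:
    """Crop overlapping patch x patch windows, keeping each top-left corner."""
    h, w = len(img), len(img[0])
    return [
        (r, c, [row[c:c + patch] for row in img[r:r + patch]])
        for r in patch_starts(h, patch, stride)
        for c in patch_starts(w, patch, stride)
    ]


def stitch(preds: List[Tuple[int, int, Image]], h: int, w: int) -> Image:
    """Average overlapping patch predictions into one full-size map."""
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for r, c, p in preds:
        for i, row in enumerate(p):
            for j, v in enumerate(row):
                total[r + i][c + j] += v
                count[r + i][c + j] += 1
    # stride <= patch guarantees every pixel is covered at least once
    return [[t / n for t, n in zip(trow, nrow)] for trow, nrow in zip(total, count)]


def predict_image(img: Image, model: Callable[[Image], Image],
                  patch: int = 48, stride: int = 16) -> Image:
    """Hypothetical pipeline: crop, run `model` on each patch, stitch back."""
    patches = extract_patches(img, patch, stride)
    preds = [(r, c, model(p)) for r, c, p in patches]
    return stitch(preds, len(img), len(img[0]))


# Sanity check with an identity "model": stitching must reproduce the input.
img = [[float(10 * r + c) for c in range(6)] for r in range(5)]
assert predict_image(img, lambda p: p, patch=3, stride=2) == img
```

Averaging overlaps is a common way to smooth seams between patches; other choices, such as keeping only each patch's central region, are equally plausible and are not implied by the text.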

To validate the performance of these models quantitatively, we also tested the weights trained on DRIVE on the SYNTHE dataset. We mixed the images of the four distinct styles together, then preprocessed SYNTHE and cropped it into patches in the same way. The performance of the three models is summarized in Table VIII, which shows quantitatively that DUNet achieves the best overall performance of the three.

Dasgupta et al. [48] 0.8485 0.7660 0.9868 0.9675 0.8052 0.9675 0.9822
Alom et al. [51] 0.8509 0.7728 0.9870 0.9682 0.8100 0.9682 0.9831
DUNet 0.8537 0.7894 0.9870 0.9697 0.8203 0.9697 0.9855
TABLE VIII: Performance of the three models tested on SYNTHE using weights trained on DRIVE

V Conclusion

Deep neural networks, which use hierarchical layers of learned features to accomplish high-level tasks, have been applied to a wide range of medical image processing tasks. In this study, we propose a fully convolutional neural network, named DUNet, to handle retinal vessel segmentation in a pixel-wise manner. DUNet extends U-Net by replacing convolutional layers with deformable convolutional blocks. With its symmetric U-shaped architecture, DUNet is designed to capture context through the encoder and to enable precise localization through the decoder by combining low-level feature maps with high-level ones. It also allows context to be propagated to higher-resolution layers through a larger number of feature channels in the upsampling path. Furthermore, with the deformable convolutional blocks, DUNet is able to capture retinal blood vessels of various shapes and scales by adaptively adjusting its receptive fields to the vessels' scales and shapes. Adding offsets to the regular sampling grid of standard convolution makes the receptive fields deformable and augmented, although the convolutional offset layers add some extra computational cost. To test the performance of the proposed network, we trained Deformable-ConvNet and U-Net from scratch for comparison. This is also the first time DUNet has been used for retinal vessel segmentation. In addition, we compared DUNet with several standard segmentation algorithms and other deep-neural-network-based approaches. We trained and tested the models on three public datasets: DRIVE, STARE and CHASE_DB1. To validate the generalization ability of our model, we tested DUNet on the WIDE and SYNTHE datasets and analyzed the results qualitatively (WIDE) and quantitatively (SYNTHE). The results show that, with the help of deformable convolutional blocks, more detailed vessels are extracted and DUNet exhibits state-of-the-art performance in segmenting retinal vessels.
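
The idea of adding offsets to the regular sampling grid can be written, following the formulation of Dai et al. [6], as y(p0) = Σ_n w(p_n) · x(p0 + p_n + Δp_n), where p_n runs over the regular kernel grid and fractional sample positions are read by bilinear interpolation. The single-channel sketch below evaluates one output location with the offsets Δp_n passed in directly. In an actual network the offsets are predicted by an extra convolutional layer, which is the source of the additional computation noted above; the offset layer, multiple channels and batching are omitted here as simplifications, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # single-channel feature map, row-major


def bilinear(x: Grid, py: float, px: float) -> float:
    """Sample x at a fractional (row, col) position; pixels outside the map count as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # bilinear kernel g(a, b) = max(0, 1 - |a - b|) along each axis
                val += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy][xx]
    return val


def deform_conv_at(x: Grid, weights: Grid,
                   offsets: Sequence[Tuple[float, float]],
                   p0: Tuple[int, int]) -> float:
    """One output of a k x k deformable convolution (odd k, stride 1, one channel).

    y(p0) = sum_n w(p_n) * x(p0 + p_n + dp_n): p_n walks the regular grid
    {-r..r}^2 and dp_n is the offset for tap n (learned in a real network,
    passed in directly here).
    """
    k = len(weights)
    r = k // 2
    taps = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    if len(offsets) != len(taps):
        raise ValueError("need one (dy, dx) offset per kernel tap")
    out = 0.0
    for (dy, dx), (oy, ox) in zip(taps, offsets):
        out += weights[dy + r][dx + r] * bilinear(x, p0[0] + dy + oy, p0[1] + dx + ox)
    return out


# 5x5 ramp image and an all-ones 3x3 kernel.
x = [[float(5 * r + c) for c in range(5)] for r in range(5)]
w = [[1.0] * 3 for _ in range(3)]
print(deform_conv_at(x, w, [(0.0, 0.0)] * 9, (2, 2)))  # 108.0, same as a standard 3x3 convolution
print(deform_conv_at(x, w, [(0.0, 0.5)] * 9, (2, 2)))  # 112.5, every tap shifted half a pixel right
```

With all offsets set to zero the operation reduces to an ordinary convolution, so a deformable block can only add flexibility: per-tap offsets let the kernel bend along a curved or thin vessel instead of sampling a fixed square.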

In the future, more retinal vessel data will be incorporated to validate the proposed end-to-end model. We also plan to extend our DUNet architecture to three dimensions, aiming to obtain more accurate results in medical image analysis tasks.


This work is supported by the Science and Technology Program of Tianjin, China [Grant No. 16ZXHLGX00170], the National Key Technology R&D Program of China [Grant No. 2015BAH52F00] and the National Natural Science Foundation of China [Grant No. 61702361].


  • [1] T. J. Smart, C. J. Richards, R. Bhatnagar, C. Pavesio, R. Agrawal, and P. H. Jones, “A study of red blood cell deformability in diabetic retinopathy using optical tweezers,” in Optical Trapping and Optical Micromanipulation XII, vol. 9548.   International Society for Optics and Photonics, 2015, p. 954825.
  • [2] S. Irshad and M. U. Akram, “Classification of retinal vessels into arteries and veins for detection of hypertensive retinopathy,” in Biomedical Engineering Conference (CIBEC), 2014 Cairo International.   IEEE, 2014, pp. 133–136.
  • [3] C. Y.-l. Cheung, Y. Zheng, W. Hsu, M. L. Lee, Q. P. Lau, P. Mitchell, J. J. Wang, R. Klein, and T. Y. Wong, “Retinal vascular tortuosity, blood pressure, and cardiovascular risk factors.” Ophthalmology, vol. 118, no. 5, pp. 812–818, 2011.
  • [4] Z. Han, Y. Yin, X. Meng, G. Yang, and X. Yan, “Blood Vessel Segmentation in Pathological Retinal Image,” in IEEE International Conference on Data Mining Workshop, 2014, pp. 960–967.
  • [5] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.   Springer, 2015, pp. 234–241.
  • [6] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable Convolutional Networks,” in International Conference on Computer Vision, 2017, pp. 764–773.
  • [7] F. Zana and J. Klein, “Segmentation of vessel-like patterns using mathematical morphology and curvature evaluation,” IEEE Transactions on Image Processing, vol. 10, no. 7, pp. 1010–1019, 2001.
  • [8] M. M. Fraz, S. A. Barman, P. Remagnino, A. Hoppe, A. Basit, B. Uyyanonvara, A. R. Rudnicka, and C. G. Owen, “An approach to localize the retinal blood vessels using bit planes and centerline detection,” Computer Methods & Programs in Biomedicine, vol. 108, no. 2, pp. 600–616.
  • [9] M. Martinez-Perez, A. D. Hughes, S. A. Thom, A. Bharath, and K. Parker, “Segmentation of blood vessels from red-free and fluorescein retinal images,” Medical Image Analysis, vol. 11, no. 1, pp. 47–61, 2007.
  • [10] M. Niemeijer, J. Staal, B. van Ginneken, M. Loog, and M. D. Abramoff, “Comparative study of retinal vessel segmentation methods on a new publicly available database,” in Medical Imaging 2004: Image Processing, vol. 5370.   International Society for Optics and Photonics, 2004, pp. 648–657.
  • [11] L. Zhang, M. Fisher, and W. Wang, “Retinal vessel segmentation using multi-scale textons derived from keypoints,” Computerized Medical Imaging & Graphics, vol. 45, pp. 47–56, 2015.
  • [12] G. Hassan, N. El-Bendary, A. E. Hassanien, A. Fahmy, A. M. Shoeb, and V. Snasel, “Retinal Blood Vessel Segmentation Approach Based on Mathematical Morphology,” Procedia Computer Science, vol. 65, pp. 612–622, 2015.
  • [13] W. S. Oliveira, J. V. Teixeira, T. I. Ren, G. D. Cavalcanti, and J. Sijbers, “Unsupervised Retinal Vessel Segmentation Using Combined Filters,” Plos One, vol. 11, no. 2, p. e0149943, 2016.
  • [14] N. Jouandeau, Z. Yan, P. Greussay, B. Zou, and Y. Xiang, “Retinal Vessel Segmentation Based on Adaptive Random Sampling,” Journal of Medical and Bioengineering, vol. 3, no. 3, pp. 199–202, 2014.
  • [15] S. Garg, J. Sivaswamy, and S. Chandra, “Unsupervised curvature-based retinal vessel segmentation,” in IEEE International Symposium on Biomedical Imaging: From Nano To Macro.   IEEE, 2007, pp. 344–347.
  • [16] M. Zardadi, N. Mehrshad, and S. M. Razavi, “Unsupervised Segmentation of Retinal Blood Vessels Using the Human Visual System Line Detection Model,” Information Systems & Telecommunication, vol. 4, pp. 125–133, Mar. 2016.
  • [17] Y. Hamamoto, S. Uchimura, M. Watanabe, T. Yasuda, Y. Mitani, and S. Tomita, “A gabor filter-based method for recognizing handwritten numerals,” Pattern Recognition, vol. 31, no. 4, pp. 395–400, 1998.
  • [18] V. Nguyen and M. Blumenstein, “An Application of the 2D Gaussian Filter for Enhancing Feature Extraction in Off-line Signature Verification,” in International Conference on Document Analysis and Recognition, 2011, pp. 339–343.
  • [19] J. Staal, M. D. Abràmoff, M. Niemeijer, M. A. Viergever, and B. Van Ginneken, “Ridge-based vessel segmentation in color images of the retina,” IEEE Transactions on Medical Imaging, vol. 23, no. 4, pp. 501–509, 2004.
  • [20] E. Ricci and R. Perfetti, “Retinal blood vessel segmentation using line operators and support vector classification,” IEEE Transactions on Medical Imaging, vol. 26, no. 10, pp. 1357–65, 2007.
  • [21] X. You, Q. Peng, Y. Yuan, Y. M. Cheung, and J. Lei, “Segmentation of retinal blood vessels using the radial projection and semi-supervised approach,” Pattern Recognition, vol. 44, no. 10, pp. 2314–2324, 2011.
  • [22] C. Sinthanayothin, J. F. Boyce, H. L. Cook, and T. H. Williamson, “Automated localisation of the optic disc, fovea, and retinal blood vessels from digital colour fundus images,” British Journal of Ophthalmology, vol. 83, no. 8, p. 902, 1999.
  • [23] X. Li, L. Wang, and E. Sung, “AdaBoost with SVM-based component classifiers,” Engineering Applications of Artificial Intelligence, vol. 21, no. 5, pp. 785–795, 2008.
  • [24] S. Aslani and H. Sarnel, “A new supervised retinal vessel segmentation method based on robust hybrid features,” Biomedical Signal Processing & Control, vol. 30, pp. 1–12, 2016.
  • [25] D. Marin, A. Aquino, M. E. Gegundez-Arias, and J. M. Bravo, “A new supervised method for blood vessel segmentation in retinal images by using gray-level and moment invariants-based features,” IEEE Transactions on Medical Imaging, vol. 30, no. 1, pp. 146–158, 2011.
  • [26] H. A. Song and S. Y. Lee, “Hierarchical Representation Using NMF,” in International Conference on Neural Information Processing, 2013, pp. 466–473.
  • [27] S. Wang, Y. Yin, G. Cao, B. Wei, Y. Zheng, and G. Yang, “Hierarchical retinal blood vessel segmentation based on feature and ensemble learning,” Neurocomputing, vol. 149, pp. 708–717, 2015.
  • [28] H. Fu, Y. Xu, D. W. K. Wong, and J. Liu, “Retinal vessel segmentation via deep learning network and fully-connected conditional random fields,” in IEEE International Symposium on Biomedical Imaging, 2016, pp. 698–701.
  • [29] D. Maji, A. Santara, P. Mitra, and D. Sheet, “Ensemble of Deep Convolutional Neural Networks for Learning to Detect Retinal Vessels in Fundus Images,” arXiv preprint arXiv:1603.04833, 2016.
  • [30] P. Jiang, Q. Dou, and X. Hu, “A supervised method for retinal image vessel segmentation by embedded learning and classification,” Journal of Intelligent and Fuzzy Systems, vol. 29, no. 5, pp. 2305–2315, 2015.
  • [31] M. Z. C. Azemin, F. A. Hamid, M. I. B. M. Tamrin, and A. H. M. Amin, “Supervised Retinal Vessel Segmentation Based on Neural Network Using Broader Aging Dataset.” in IWBBIO, 2014, pp. 1235–1242.
  • [32] P. Liskowski and K. Krawiec, “Segmenting Retinal Blood Vessels With Deep Neural Networks,” IEEE Transactions on Medical Imaging, vol. 35, no. 11, pp. 2369–2380, 2016.
  • [33] H. Fu, Y. Xu, S. Lin, D. W. K. Wong, and J. Liu, “DeepVessel: Retinal Vessel Segmentation via Deep Learning and Conditional Random Field,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2016, pp. 132–139.
  • [34] J. Staal, M. D. Abràmoff, M. Niemeijer, M. A. Viergever, and B. Van Ginneken, “Ridge-based vessel segmentation in color images of the retina,” IEEE Transactions on Medical Imaging, vol. 23, no. 4, pp. 501–509, 2004.
  • [35] A. Hoover, V. Kouznetsova, and M. Goldbaum, “Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response,” IEEE Transactions on Medical Imaging, vol. 19, no. 3, pp. 203–210, 2000.
  • [36] C. G. Owen, A. R. Rudnicka, R. Mullen, S. A. Barman, D. Monekosso, P. H. Whincup, J. Ng, and C. Paterson, “Measuring retinal vessel tortuosity in 10-year-old children: validation of the Computer-Assisted Image Analysis of the Retina (CAIAR) program.” Investigative Ophthalmology & Visual Science, vol. 50, no. 5, pp. 2004–2010, 2009.
  • [37] J. V. Soares, J. J. Leandro, R. M. Cesar, H. F. Jelinek, and M. J. Cree, “Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification,” IEEE Transactions on Medical Imaging, vol. 25, no. 9, pp. 1214–1222, 2006.
  • [38] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld, “Adaptive histogram equalization and its variations,” Computer Vision Graphics & Image Processing, vol. 39, no. 3, pp. 355–368, 1987.
  • [39] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” in International Conference on Machine Learning, 2015, pp. 448–456.
  • [40] W. T. Freeman and E. H. Adelson, “The design and use of steerable filters,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 9, pp. 891–906, 1991.
  • [41] A. F. Frangi, W. J. Niessen, K. L. Vincken, and M. A. Viergever, “Multiscale Vessel Enhancement Filtering,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, vol. 1496, 1998, pp. 130–137.
  • [42] Y. Sasaki, “The truth of the F-measure,” Teach Tutor Mater, 2007.
  • [43] P. Jaccard, “The Distribution of the Flora in the Alpine Zone,” New Phytologist, vol. 11, no. 2, pp. 37–50, 1912.
  • [44] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems,” arXiv preprint arXiv:1603.04467, 2015.
  • [45] F. Chollet and others, Keras.   GitHub. [Online]. Available: https://github.com/keras-team/keras
  • [46] G. Azzopardi, N. Strisciuglio, M. Vento, and N. Petkov, “Trainable COSFIRE filters for vessel delineation with application to retinal images,” Medical Image Analysis, vol. 19, no. 1, pp. 46–57, 2015.
  • [47] Q. Li, B. Feng, L. P. Xie, P. Liang, H. Zhang, and T. Wang, “A Cross-Modality Learning Approach for Vessel Segmentation in Retinal Images,” IEEE Transactions on Medical Imaging, vol. 35, no. 1, pp. 109–118, 2015.
  • [48] A. Dasgupta and S. Singh, “A fully convolutional neural network based structured prediction approach towards the retinal vessel segmentation,” in International Symposium on Biomedical Imaging.   IEEE, 2017, pp. 248–251.
  • [49] S. Roychowdhury, D. D. Koozekanani, and K. K. Parhi, “Blood Vessel Segmentation of Fundus Images by Major Vessel Extraction and Subimage Classification,” IEEE Journal of Biomedical & Health Informatics, vol. 19, no. 3, pp. 1118–1128, 2017.
  • [50] Y. Chen, “A Labeling-Free Approach to Supervising Deep Neural Networks for Retinal Blood Vessel Segmentation.” arXiv preprint arXiv:1704.07502, 2017.
  • [51] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, “Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation,” arXiv preprint arXiv:1802.06955, 2018.
  • [52] R. Estrada, C. Tomasi, S. C. Schmidler, and S. Farsiu, “Tree Topology Estimation.” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 37, no. 8, pp. 1688–1701, 2015.
  • [53] H. Zhao, H. Li, S. Maurer-Stroh, and L. Cheng, “Synthesizing retinal and neuronal images with generative adversarial nets,” Medical Image Analysis, vol. 49, pp. 14–26, 2018.
  • [54] T. Köhler, A. Budai, M. F. Kraus, J. Odstrčilik, G. Michelson, and J. Hornegger, “Automatic no-reference quality assessment for retinal fundus images using vessel segmentation,” in IEEE International Symposium on Computer-Based Medical Systems, 2013, pp. 95–100.