DPN: Detail-Preserving Network with High Resolution Representation for Efficient Segmentation of Retinal Vessels

09/25/2020
by   Song Guo, et al.

Retinal vessels are important biomarkers for many ophthalmological and cardiovascular diseases. It is therefore of great significance to develop an accurate and fast vessel segmentation model for computer-aided diagnosis. Existing methods, such as U-Net, follow the encoder-decoder pipeline, where detailed information is lost in the encoder in order to achieve a large field of view. Although detailed information can be recovered in the decoder via multi-scale fusion, it still contains noise. In this paper, we propose a deep segmentation model, called the detail-preserving network (DPN), for efficient vessel segmentation. To preserve detailed spatial information and learn structural information at the same time, we design the detail-preserving block (DP-Block) and stack eight DP-Blocks together to form the DPN. Importantly, there are no down-sampling operations among these blocks, so the DPN maintains a high resolution throughout processing, which helps to locate the boundaries of thin vessels. To illustrate the effectiveness of our method, we conducted experiments on three public datasets. Compared with state-of-the-art methods, our method shows competitive or better performance in terms of segmentation accuracy, segmentation speed, extensibility and the number of parameters. Specifically, 1) the AUC of our method ranks first/second/third on the STARE/CHASE_DB1/DRIVE datasets, respectively; 2) our method requires only one forward pass to generate a vessel segmentation map, and its segmentation speed is 20-160x faster than that of other methods on the DRIVE dataset; 3) cross-training experiments demonstrate the extensibility of our method, revealing superior performance; and 4) the number of parameters of our method is only around 96k, fewer than in all comparison methods.

1 Introduction

Retinal blood vessels are an important part of fundus images, and they can be applied to the diagnosis of many ophthalmological diseases, such as diabetic retinopathy [35], cataract [2], and hypertensive retinopathy [12]. Specifically, in patients with diffuse choroidal hemangioma, retinal blood vessels become dilated [26], and vascular structures in patients with cataract are unclear or even invisible [2]. In addition, since retinal blood vessels and cerebral blood vessels are similar in anatomical, physiological and embryological characteristics, retinal vessels are also important biomarkers for some cardiovascular diseases [36, 25]. Accurate segmentation of blood vessels is the basic step of efficient computer-aided diagnosis (CAD). However, manual segmentation of retinal vessels is time-consuming and relies heavily on human experience. Therefore, it is necessary to develop accurate and fast vessel segmentation methods for CAD.

Considering clinical application scenarios, a good vessel segmentation model for CAD should satisfy the following three conditions. 1) High accuracy. The model needs to be capable of recognizing both thin and thick vessels, even extremely thin vessels of one-pixel width. For example, the appearance of neovascularization can be used to diagnose and grade diabetic retinopathy. 2) Good extensibility. The model needs to show good extensibility/generalization ability after training is done. In other words, when the model is applied to clinical images, it needs to perform well, not just on the test set. 3) Fast running speed. The model needs to have a fast processing speed to meet clinical requirements, as a faster speed means greater throughput and higher processing efficiency.

Existing vessel segmentation methods can be divided into two categories [9]: unsupervised methods and supervised methods. Unsupervised methods utilize manually designed low-level features and rules [1]; therefore, they show poor extensibility. Supervised methods utilize human-annotated training images, and their segmentation accuracy is usually higher than that of unsupervised methods [25]. Deep learning based supervised methods can learn high-level features in an end-to-end manner, and they show superior performance in terms of segmentation accuracy and extensibility [14, 40]. Most deep vessel segmentation models follow the architecture of the fully convolutional network (FCN) [27], in which the resolution of features is first down-sampled and then up-sampled to generate pixel-wise segmentation maps. However, detailed information is lost in the FCN. The U-Net [24] model was proposed to address this, utilizing intermediate layers in the up-sampling path to fuse more spatial information and generate fine segmentation maps. Although detailed information can be utilized in the U-Net, extra noise is also introduced. Moreover, most U-Net variants [14] require multiple forward passes to generate a segmentation map for one test image, since they split one fundus image into hundreds of small patches. As a result, they show slow segmentation speed and the contextual information is not fully utilized.

To preserve detailed information and avoid introducing noise, in this paper we present the detail-preserving network (DPN). Inspired by HRNet [32], the DPN learns a high resolution representation directly rather than a low resolution representation. In this manner, the DPN can locate the boundaries of thin vessels accurately. To this end, on one hand, we present the detail-preserving block (DP-Block), in which multi-scale features are fused in a cascaded manner so that more contextual information can be utilized; moreover, the resolution of the input and output features of the DP-Block is never changed, so that the detailed spatial information is preserved. On the other hand, we stack several DP-Blocks together to form the DPN. We note that there are no down-sampling operations among these DP-Blocks, so the DPN can learn semantic features via a large field of view and preserve the detailed information simultaneously. To validate the effectiveness of our method, we conducted experiments on the DRIVE, STARE and CHASE_DB1 datasets. Experimental results reveal that our method shows competitive or better performance compared with other state-of-the-art methods.

Overall, our contributions are summarized as follows.

  1. We present the detail-preserving block, which could learn the structural information and preserve the detailed information via intra-block multi-scale fusion.

  2. We present the detail-preserving network, which mainly consists of eight serially connected DP-Blocks, and it maintains high resolution representations during the whole process. As a result, the DPN could learn both semantic features and preserve the detailed information simultaneously.

  3. We conducted experiments over three public datasets. Experimental results reveal that our method achieves comparable or even superior performance over other methods in terms of segmentation accuracy, segmentation speed, extensibility, and the number of parameters.

The rest of this paper is organized as follows. Related works about vessel segmentation are introduced in Section 2. Our method is described in Section 3. Experimental results are analyzed in Section 4. Conclusions are drawn in Section 5.

(a) DPN
(b) DP-Block
(c) DPR-Block
Fig. 1:

(a) Overview of the proposed detail-preserving network (DPN). DPN consists of one DP-Block and seven DPR-Blocks, and it maintains high resolution representations during the whole process. (b) Overview of the proposed detail-preserving block (DP-Block), where C0, C1 and C2 denote the number of convolutional filters for each branch. (c) Overview of the proposed detail-preserving block with residual connection (DPR-Block).

2 Related Works

Retinal vessel segmentation is a pixel-wise binary classification problem, and the objective is to locate each vessel-pixel accurately for further processing. According to whether annotations are used, existing methods could be divided into two categories: unsupervised methods and supervised methods.

2.1 Unsupervised Methods

Unsupervised methods usually utilize human-designed low-level features, such as edges, lines and color; manually annotated information is not utilized. Unsupervised methods can be roughly divided into matched filter based, vessel tracking based, threshold based and morphology based methods.

Wang et al. [34] proposed a multi-stage method for vessel segmentation: matched filtering was first adopted for vessel enhancement, and vessels were then located via a multi-scale hierarchical decomposition. Yin et al. [41] proposed a vessel tracking method, in which local grey-level information was utilized to select vessel edge points, and a Bayesian method was then used to determine the direction of vessels. Garg et al. [7] proposed a curvature-based method, in which vessel lines were first extracted using curvature information and a region growing method was then used to generate the whole vessel tree. Li et al. [18] proposed an adaptive threshold method for vessel segmentation, and their method could detect both large and small vessels. Christodoulidis et al. [3] utilized a line detector and tensor voting for vessel segmentation, and thin vessels were well detected.

A major limitation of unsupervised methods is that the features and rules are designed by humans. It is hard to design satisfactory features that work well on large-scale fundus images, so this kind of method may show poor generalization ability.

2.2 Supervised Methods

In contrast to unsupervised methods, supervised methods need annotation information to build vessel segmentation models. Before deep learning was applied to vessel segmentation, supervised methods usually consisted of two procedures: feature extraction and classification. In the first procedure, features were extracted by human-designed rules, just as in unsupervised methods. In the second procedure, supervised classifiers were employed to classify the extracted features into vessels or non-vessels. As deep learning methods unify the feature extraction and classification procedures, they can extract more discriminative features.

Deep learning based methods can be roughly divided into classification-based methods and segmentation-based methods. For classification-based methods, the category of each pixel is determined by its surrounding small image patch [19, 33]. This kind of method does not make full use of contextual information. Segmentation-based methods follow the architecture of the FCN, where the resolution of feature maps is first down-sampled to encode structural information and then up-sampled to generate pixel-wise segmentation maps. However, the down-sampling operation sacrifices detailed spatial information, which is detrimental to identifying thin vessels. To alleviate this problem, multi-scale fusion methods and graph models have been adopted. For instance, Maninis et al. [20] proposed an FCN for vessel segmentation, adopting multi-scale feature fusion to generate fine vessel maps. Fu et al. [6] adopted a holistically-nested edge detection model [39] to generate coarse segmentation maps, and then a conditional random field was adopted to model the relationships among long-range pixels and refine the segmentation maps. Besides the above methods, Ronneberger et al. proposed a u-shaped network, called U-Net, to preserve spatial information [24]. Similar to the FCN, the feature maps are first down-sampled to a low resolution and then up-sampled step by step; in each step, the high-resolution intermediate features of the encoder are utilized. Several methods based on U-Net have been proposed for vessel segmentation. For instance, Jin et al. [14] proposed DUNet, which uses deformable convolution rather than grid convolution in U-Net to capture the shape of vessels. Wu et al. [38] designed a two-branch network, where each branch consists of two U-Nets, and the output of their method is the average of the predictions of the two branches. In addition, different from [20] and [6], which used entire images as training samples, both [14] and [38] used overlapped image patches of size 48x48 as training samples, and a re-composition procedure is required to complete a segmentation map during testing. Hence, they suffer from high computational complexity. Despite their success, the problem of losing spatial information in the down-sampling phase has not been fully addressed. Meanwhile, considering both computational complexity and segmentation accuracy, a fast and accurate vessel segmentation model is still lacking.

3 Our Method

In this section, we describe our method in detail, including the architecture of the proposed detail-preserving network, the detail-preserving block, and finally the loss function.

3.1 Detail-Preserving Network

A good vessel segmentation model should segment both thick vessels and thin vessels; this requires the model to learn structural semantic information and preserve detailed spatial information simultaneously. The structural information is beneficial for locating thick vessels, and it requires the model to have a large field of view, while the detailed spatial information is important for locating vessel boundaries accurately, especially for thin vessels. However, it is easy to lose detailed information when learning structural information. For example, the structural information of U-Net [24] is learned by successive down-sampling operations, and the resolution of feature maps is decreased by a factor of 8 or even more (as can be seen in Fig. 2). Such a low resolution implies that the spatial information of thin vessels is lost. U-Net utilizes the intermediate features of the encoder to recover the spatial information; however, these intermediate features may contain noise due to their small field of view.

Fig. 2: The architecture of U-Net [24]. H and W denote the height and width of feature maps.

Our study is motivated by the question of whether it is possible to preserve detailed information while the network has a large field of view. To this end, we present a high resolution representation network, called the detail-preserving network, for vessel segmentation. The architecture of our model is visualized in Fig. 1. We can observe that DPN mainly consists of a front convolution operation, eight detail-preserving blocks (specifically, one DP-Block and seven DPR-Blocks) and four loss functions. The DPN has four characteristics. 1) Different from U-Net, there are no down-sampling operations among these DP-Blocks, which implies that the resolution of features among these DP-Blocks remains the same. In other words, the DPN maintains a full/high resolution representation during the whole process (from input to output), thereby preserving detailed spatial information. 2) For the DP-Block, the receptive field of an output neuron can be as large as four times that of an input neuron, while the detailed information is also preserved. Hence, the DPN achieves a large field of view via successive DP-Blocks, which ensures that the DPN learns structural semantic information instead of only local information. The architecture of the DP-Block is described in the next section. 3) Different from U-Net variants that utilize VGGNet or ResNet as the backbone, which incurs a large number of parameters, the total number of parameters of DPN is only about 96k. 4) The input of DPN is the entire image, so that it can integrate more contextual information than patch-level segmentation models. Meanwhile, our method needs only one forward pass to generate a complete segmentation map, so its inference speed is faster than that of patch-level models.

3.2 Detail-Preserving Block

The DP-Block, as the key component of DPN, learns structural semantic information and preserves detailed spatial information at the same time. An overview of the DP-Block is visualized in Fig. 1(b). We can observe that the input feature of the DP-Block is fed into three branches, and each branch is processed at a different scale. The output feature of the DP-Block is obtained by fusing the features of the three scales. The computing procedure of the DP-Block is as follows.

For the first branch, a convolution operation with a 3x3 kernel is adopted to learn detailed information. For the second branch, a pooling operation with stride 2 is adopted, so the resolution of the feature maps is down-sampled by a factor of 2, and a convolution operation with a 3x3 kernel is then applied. The third branch is used to enlarge the field of view and learn structural information: a pooling operation with stride 4 is first adopted, so the resolution of the feature maps is down-sampled by a factor of 4 and the receptive field is increased by a factor of 4 as well, and a convolution operation with a 3x3 kernel is then adopted to extract features. The extracted features of the branches are fused in a cascaded manner. Specifically, the features learned by the third branch are first up-sampled by a factor of 2 and connected to the second branch, and the output of the second branch is further up-sampled and connected to the first branch. Here, we use the concatenation operation for feature fusion. We note that the resolution of the output feature of the DP-Block is the same as that of the input feature, so the DP-Block not only preserves detailed information but also learns multi-scale features.
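To make the block structure concrete, below is a minimal PyTorch sketch of the DP-Block as we read the description above (the paper's implementation is in Caffe). The pooling type, the activation function, and the absence of extra fusion convolutions after each concatenation are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DPBlock(nn.Module):
    """Detail-preserving block (sketch): three branches at full, 1/2 and 1/4
    resolution, fused coarse-to-fine by up-sampling and concatenation."""
    def __init__(self, in_ch, c0=16, c1=8, c2=8):
        super().__init__()
        self.conv0 = nn.Conv2d(in_ch, c0, 3, padding=1)  # full-resolution branch
        self.conv1 = nn.Conv2d(in_ch, c1, 3, padding=1)  # 1/2-resolution branch
        self.conv2 = nn.Conv2d(in_ch, c2, 3, padding=1)  # 1/4-resolution branch

    def forward(self, x):
        f0 = F.relu(self.conv0(x))
        f1 = F.relu(self.conv1(F.max_pool2d(x, 2)))       # pooling type is assumed
        f2 = F.relu(self.conv2(F.max_pool2d(x, 4)))
        # cascaded fusion: third branch -> second branch -> first branch
        f2_up = F.interpolate(f2, size=f1.shape[2:], mode='bilinear', align_corners=False)
        f12 = torch.cat([f1, f2_up], dim=1)
        f12_up = F.interpolate(f12, size=f0.shape[2:], mode='bilinear', align_corners=False)
        return torch.cat([f0, f12_up], dim=1)             # same H x W as the input
```

With C0=16, C1=8 and C2=8, the fused output has 32 channels at the input resolution.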

Furthermore, we extend the DP-Block and propose the detail-preserving block with residual connection (DPR-Block). The residual connection is helpful for gradient propagation [10], as no pre-trained model is available to train DPN. An overview of the DPR-Block is visualized in Fig. 1(c). As we can see, the DPR-Block is built upon the DP-Block, except that the output of the DP-Block is added to the input of the DPR-Block and the result is then passed to a convolution operation. Therefore, the size (Height x Width x Channel) of the output feature map of the DPR-Block is the same as that of the input feature map.
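Continuing the sketch above, the DPR-Block could look as follows. The text does not state how the channel widths of the DP-Block output and the block input are matched before the residual sum, so the 1x1 projection below is a hypothetical choice, as is the kernel size of the final convolution.

```python
class DPRBlock(nn.Module):
    """DP-Block with residual connection (sketch): the DP-Block output is added
    to the block input and then passed through a convolution, so the output
    keeps the input size H x W x C."""
    def __init__(self, channels, c0=16, c1=8, c2=8):
        super().__init__()
        self.dp = DPBlock(channels, c0, c1, c2)
        # hypothetical 1x1 projection so that the residual sum is well defined
        self.project = nn.Conv2d(c0 + c1 + c2, channels, 1)
        self.out_conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        y = self.project(self.dp(x))
        return F.relu(self.out_conv(x + y))
```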

The number of parameters. In our experiments, the numbers of convolutional filters C0, C1 and C2 for the branches of the DP-Block and DPR-Block were set to 16, 8 and 8, respectively. Suppose the dimension of the input feature of the DPR-Block is H x W x C0; then the number of parameters of each DPR-Block is only 11,592. In DPN, the dimension of the output feature of the first convolution operation is H x W x 32, so the number of parameters of the DP-Block is 13,880. Hence, the DP-Block and DPR-Block can be effectively learned even though the number of parameters is small.
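As a rough sanity check on these orders of magnitude, the parameter count of the sketched blocks can be inspected directly; because the fusion and projection details above are assumptions, the printed numbers will not match the reported 11,592 and 13,880 exactly.

```python
# Count trainable parameters of the sketched blocks (illustrative only).
dp = DPBlock(in_ch=32)      # first block, fed by the 32-channel front convolution
dpr = DPRBlock(channels=32)
print(sum(p.numel() for p in dp.parameters()))   # on the order of 10^4
print(sum(p.numel() for p in dpr.parameters()))
```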

Relationship with the Inception Module. Different from the inception module [30], which uses parallel convolution operations with different kernel sizes to learn multi-scale features, our DP-Block adopts down-sampling first, so that the receptive field is further enlarged. The receptive field of each output neuron is four times that of the input neuron; as a result, the receptive field grows exponentially when stacking multiple DP-Blocks. Furthermore, rather than the parallel processing of branches in the inception module, the features of different branches are fused in a cascaded manner in the DP-Block to better learn multi-scale features.

3.3 Loss Function

Blood vessels account for only a small proportion of the entire image. Specifically, the proportion of vessel pixels is 8.69%/7.6%/6.93% on the DRIVE/STARE/CHASE_DB1 datasets, respectively, so there exists a class imbalance problem in vessel segmentation. To address this problem, we adopted the class-balanced cross-entropy loss [39], which uses a weight factor to balance vessel pixels and non-vessel pixels. The class-balanced cross-entropy loss is defined as follows.

$\ell(W) = -\beta \sum_{j \in Y_{+}} \log P_j \; - \; (1-\beta) \sum_{j \in Y_{-}} \log (1 - P_j)$   (1)

where $P$ is a probability map obtained by a sigmoid operation, and $P_j$ denotes the probability that pixel $j$ belongs to a vessel. In addition, $Y$ denotes the ground truth, with $Y_{+}$ and $Y_{-}$ being its vessel and non-vessel pixels, and $W$ denotes the model parameters. Rather than using a fixed value, the weight factor $\beta$ is calculated at each iteration based on the distribution of vessel pixels and non-vessel pixels. The weight factor is defined as below.

$\beta = \dfrac{|Y_{-}|}{|Y_{+}| + |Y_{-}|}$   (2)

where $|Y_{+}|$ denotes the number of vessel pixels, and $|Y_{-}|$ denotes the number of non-vessel pixels. Since $|Y_{-}| > |Y_{+}|$, the weight for vessel pixels is larger than the weight for non-vessel pixels, so the model focuses more on vessel pixels than on non-vessel pixels.
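A compact PyTorch sketch of this class-balanced cross-entropy loss, with the weight factor recomputed from the ground truth at every iteration, might look as follows; the reduction (summation over pixels) and the numerical-stability clamp are our choices.

```python
import torch

def class_balanced_bce(prob, target, eps=1e-7):
    """prob: sigmoid probability map; target: binary vessel ground truth.
    The weight factor beta = |Y-| / (|Y+| + |Y-|) is recomputed per iteration."""
    n_pos = target.sum()
    n_neg = target.numel() - n_pos
    beta = n_neg / (n_pos + n_neg)
    prob = prob.clamp(eps, 1.0 - eps)
    loss_pos = -beta * (target * torch.log(prob)).sum()
    loss_neg = -(1.0 - beta) * ((1.0 - target) * torch.log(1.0 - prob)).sum()
    return loss_pos + loss_neg
```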

Besides the segmentation loss after the last layer of DPN, we add three auxiliary losses to intermediate layers of DPN to pass extra gradient signals and alleviate the gradient vanishing problem, as in DSN [16] and GoogLeNet [30]. As can be seen in Fig. 1, the first auxiliary loss follows DPR-Block1, the second follows DPR-Block3, and the last one follows DPR-Block5; the segmentation loss follows DPR-Block7. Taking the first auxiliary loss as an example, we first apply a convolution operation with one 1x1 filter to the output features of DPR-Block1 to obtain a single-channel feature map, which is then fed into the class-balanced cross-entropy loss function.

Hence, the overall objective function of DPN is the sum of three auxiliary losses and one segmentation loss, and it can be formulated as follows.

$L(W) = \sum_{k=1}^{4} \ell\big(P^{(k)}; W\big) + \lambda \lVert W \rVert_2^2$   (3)

where $P^{(k)}$ denotes the probability map of the $k$-th loss function (three auxiliary losses and one segmentation loss), and $\lambda$ denotes the weight decay coefficient.

In conclusion, we aim to minimize the above objective function during training. In the test phase, the output associated with the last segmentation loss is taken as the segmentation result of DPN, and the segmentation probability maps of the auxiliary losses are ignored.
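Putting the pieces of Sections 3.1-3.3 together, the overall network and training objective can be sketched as below, reusing the DPBlock, DPRBlock and class_balanced_bce sketches from above. The 32-channel front convolution and the placement of the prediction heads follow the text and Fig. 1; activations, padding and the handling of weight decay are assumptions.

```python
class DPN(nn.Module):
    """Sketch of the detail-preserving network: a front convolution, one DP-Block,
    seven DPR-Blocks, and 1x1 prediction heads after DPR-Block1/3/5/7.
    No down-sampling between blocks, so the output stays at full resolution."""
    def __init__(self, in_ch=3, width=32):
        super().__init__()
        self.front = nn.Conv2d(in_ch, width, 3, padding=1)
        self.block1 = DPBlock(width)                      # outputs `width` channels
        self.blocks = nn.ModuleList([DPRBlock(width) for _ in range(7)])
        self.heads = nn.ModuleDict({
            str(i): nn.Conv2d(width, 1, 1) for i in (1, 3, 5, 7)
        })

    def forward(self, x):
        maps = []
        f = F.relu(self.front(x))
        f = self.block1(f)
        for i, blk in enumerate(self.blocks, start=1):
            f = blk(f)
            if str(i) in self.heads:
                maps.append(torch.sigmoid(self.heads[str(i)](f)))
        return maps   # three auxiliary maps + the final segmentation map

def total_loss(maps, target):
    # sum of the three auxiliary losses and the segmentation loss (Eq. (3));
    # the weight decay term is left to the optimizer in this sketch
    return sum(class_balanced_bce(m, target) for m in maps)
```

At test time only maps[-1] would be kept, matching the description above.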

4 Experiments

4.1 Materials

Performances of our method were evaluated on three public datasets: DRIVE [29], STARE [11] and CHASE_DB1 [5].

The DRIVE (Digital Retinal Images for Vessel Extraction) dataset contains 40 color fundus images captured with a 45-degree FOV (field of view). Each image has the same resolution of 565x584 (width x height). The dataset is officially partitioned into a training set and a test set, each containing 20 images. For the test set, two groups of annotations are provided; we used the annotations of the first group as ground truth to evaluate our model, as other methods did. In addition, the FOV masks for calculating evaluation metrics are also provided.

The STARE (Structured Analysis of the Retina) dataset contains 20 equal-sized (700x605) color fundus images. For each image, two groups of annotations are provided. To be consistent with other methods, we used the annotations of the first group to train and test our model. Since a partition into training and test sets is not provided explicitly, for fair comparison we performed two sets of experiments. In the first set, leave-one-out cross validation was adopted. In the second set, a 10/10 partition was adopted, where the first 10 images were selected as the training set and the remaining 10 images as the test set. Moreover, as the FOV masks are not provided explicitly, we used the masks provided in [21].

The CHASE_DB1 dataset contains 28 fundus images (999x960) captured with a 30-degree FOV. As a split into training and test sets is not provided, for fair comparison with other methods we performed two sets of experiments. We adopted a 20/8 partition for the first set, where the first 20 images were selected for training and the remaining 8 images for testing. For the second set, we adopted a 14/14 (training/test) partition. As FOV masks are not provided, we created the masks manually. The FOV masks for DRIVE, STARE and CHASE_DB1 are presented in Fig. 3.

Fig. 3: Fundus images (the first row) and the corresponding FOV masks (the second row) from DRIVE, STARE and CHASE_DB1 datasets, from left to right.

4.2 Image Preparation

To avoid over-fitting, several transformations were adopted to augment the training set, including flipping (horizontal and vertical) and rotation (22, 45, 90, 135, 180, 225, 270 and 315 degrees). As a result, the training set was augmented by a factor of 10 offline. Moreover, each training image was randomly mirrored at every iteration during training.
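As an illustration, the offline augmentation described above could be reproduced with torchvision roughly as follows; only the flips and the listed rotation angles come from the text, the rest (library choice, interpolation defaults) is an assumption, and the ground-truth maps must of course be transformed identically.

```python
from PIL import Image
import torchvision.transforms.functional as TF

def augment_offline(img: Image.Image):
    """Return the augmented copies of one training image:
    horizontal/vertical flips plus a set of fixed rotations."""
    angles = [22, 45, 90, 135, 180, 225, 270, 315]
    out = [TF.hflip(img), TF.vflip(img)]
    out += [TF.rotate(img, a) for a in angles]
    return out   # 10 extra images per original
```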

For the DRIVE and CHASE_DB1 datasets, no preprocessing was performed and the raw color fundus images were fed into the segmentation model. For the STARE dataset, we applied contrast limited adaptive histogram equalization (CLAHE) [23] to the green channel of the fundus images to enhance low-contrast vessels, as can be seen in Fig. 4.

Fig. 4: (a) A fundus image from STARE dataset. (b) Green channel of color fundus image. (c) Image after CLAHE.
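For the STARE images, the green-channel CLAHE step could be implemented with OpenCV roughly as below; the clip limit and tile grid size are not given in the text and are therefore placeholders.

```python
import cv2

def preprocess_stare(path):
    """Extract the green channel of a color fundus image and apply CLAHE."""
    bgr = cv2.imread(path)                        # OpenCV loads images as BGR
    green = bgr[:, :, 1]
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # placeholder settings
    return clahe.apply(green)
```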

4.3 Training Details

Our model was implemented based on the open-source deep learning framework Caffe [13], and it ran on a workstation equipped with one NVIDIA RTX 2080 Ti GPU. We initialized the weights of our model with Xavier initialization [8]. The learning rate was initialized to 1e-3, and we trained our model for 100k/30k/100k iterations with ADAM [15] (batch size 1) and weight decay 0.0005 on the DRIVE/STARE/CHASE_DB1 datasets, respectively.

To reduce computational complexity, each training image was randomly cropped into 512x512 patches during training on the DRIVE and STARE datasets. For the CHASE_DB1 dataset, a 736x736 patch was cropped to use as much spatial information as possible. The crop operation was performed via the data layer of Caffe. At test time, the entire fundus image is fed into the network without cropping, so that our model can generate a segmentation map with only one forward pass.
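In PyTorch terms, the optimizer settings and random cropping described above translate into something like the following sketch (the original implementation relies on Caffe's data layer; the crop helper here is hypothetical).

```python
import torch

def random_crop(img, gt, size=512):
    """Randomly crop an aligned image/ground-truth pair to size x size."""
    _, h, w = img.shape
    top = torch.randint(0, h - size + 1, (1,)).item()
    left = torch.randint(0, w - size + 1, (1,)).item()
    return (img[:, top:top + size, left:left + size],
            gt[:, top:top + size, left:left + size])

model = DPN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
# batch size 1: one randomly cropped patch (512x512 on DRIVE/STARE) per iteration
```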

4.4 Evaluation Metrics

We use five metrics to evaluate our method: Sensitivity (Se), Specificity (Sp), Accuracy (Acc), the Area Under the receiver operating characteristic Curve (AUC), and F1-score (F1). They are defined as follows.

$Se = \dfrac{TP}{TP + FN}$   (4)

$Sp = \dfrac{TN}{TN + FP}$   (5)

$Acc = \dfrac{TP + TN}{TP + TN + FP + FN}$   (6)

$F1 = \dfrac{2 \times TP}{2 \times TP + FP + FN}$   (7)

where true positive (TP) denotes the number of vessel pixels classified correctly and true negative (TN) denotes the number of non-vessel pixels classified correctly. Similarly, false positive (FP) denotes the number of non-vessel pixels misclassified as vessels and false negative (FN) denotes the number of vessel pixels misclassified as non-vessels. To calculate Se, Sp and Acc, we select the threshold that corresponds to the optimal operating point of the receiver operating characteristic (ROC) curve to generate the binary segmentation maps from a probability map. We note that TP, FN, FP and TN are counted pixel-by-pixel, and only pixels inside the FOV mask are counted, not the whole fundus image. The ROC curve is obtained by plotting Se against (1-Sp) at varying thresholds. The AUC evaluates the segmentation probability maps rather than the binary maps, and is therefore more comprehensive. The AUC ranges from 0 to 1, and the AUC of a perfect segmentation model is 1.

Besides these five evaluation metrics, we also report the segmentation speed of our model in fps (frames per second). The segmentation time for each image is counted from reading the raw test image from the hard disk to writing the segmentation map back to the hard disk; the fps is then the reciprocal of this per-image segmentation time.
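The metric computation restricted to the FOV mask, including the choice of the ROC operating point, can be sketched with scikit-learn as follows; we use Youden's J statistic (maximizing Se + Sp - 1) as one common reading of the "optimal operating point".

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate(prob_map, gt, fov_mask):
    """Compute Se, Sp, Acc, AUC and F1 using only pixels inside the FOV."""
    y_score = prob_map[fov_mask > 0].ravel()
    y_true = gt[fov_mask > 0].ravel().astype(int)
    auc = roc_auc_score(y_true, y_score)
    fpr, tpr, thr = roc_curve(y_true, y_score)
    t = thr[np.argmax(tpr - fpr)]             # Youden's J as the operating point
    y_pred = (y_score >= t).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return se, sp, acc, auc, f1
```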

4.5 Results and Analysis

4.5.1 Compare with Existing Methods

We compared our method with several state-of-the-art deep vessel segmentation methods on three public datasets in terms of segmentation performance, segmentation speed and the number of parameters. The comparison results are summarized in Table I, Table II and Table III.

As we can see from Table I, compared with DRIU [20] and BTS-DSN [9], which need only one forward pass to generate the segmentation map during testing, our method achieves much higher Se, Acc, AUC and F1 at a very similar segmentation speed. Specifically, the Acc and AUC of our method are about 0.2% higher than those of DRIU and BTS-DSN. These two models utilized VGGNet [28] as the backbone and adopted multi-scale feature fusion to recover the spatial structure and detailed information of vessels. Different from these two models, our method learns a high resolution representation directly, which is friendly to thin-vessel detection. We observe that our method achieves much higher Se, which means it can detect more vessel pixels, verifying the effectiveness of our DP-Block. In addition, compared with the other eight methods, which need multiple forward passes to generate the segmentation map of a single fundus image during testing, the Se, Acc, AUC and F1 of our method are higher than those of six of the eight methods. Although the segmentation accuracy of our method is slightly lower than that of FCN [22] and Vessel-Net [37], the segmentation speed of our method is over 20x faster than FCN [22]. Specifically, our model can segment over 10 fundus images within 1 second, which greatly improves the throughput. Moreover, the number of parameters of our model is only about 96k, much lower than that of all state-of-the-art models. Therefore, our model is more lightweight and more suitable for deployment on mobile devices.

On the STARE dataset, we carried out two sets of experiments; the difference between them is the partition strategy of the dataset. As we can observe from Table II, our model achieves 0.8020, 0.9848, 0.9649, 0.9859, 0.8237 and 9.1 for Se, Sp, Acc, AUC, F1 and fps, respectively, under leave-one-out cross validation. Among these metrics, our method shows superior performance in terms of Se, Acc, AUC, F1-score and fps compared with DUNet, U-Net and Three-stage FCN. In particular, the segmentation speed of our model is 180x faster than that of DUNet [14], and the AUC of our model is 0.27% higher than that of DUNet. DUNet is a variant of U-Net that uses deformable convolution rather than grid convolution to better capture the shape characteristics of vessels, while our model uses the DP-Block to capture detailed and structural information simultaneously. We argue that our method and DUNet represent two different directions for improving segmentation accuracy, and the performance of our model might be further improved by replacing grid convolution with deformable convolution. Compared with DRIU, DeepVessel and BTS-DSN, although the fps of our model is slightly lower than that of image-level BTS-DSN, our model achieves the highest Sp, Acc, AUC and F1-score. In conclusion, our method shows superior performance on the STARE dataset.

On the CHASE_DB1 dataset, we compare our method with seven existing methods. Most existing state-of-the-art methods require multiple forward passes and a re-composition operation to generate a segmentation map for one fundus image, so they show slow segmentation speed. DUNet [14] and DEU-Net [31] need over 10 s to segment a fundus image with resolution 999x960. In contrast, our method runs in an end-to-end way and can segment an image within 0.2 s, which is over 280x and 70x faster than DUNet and DEU-Net, respectively. Meanwhile, our model achieves higher AUC than DUNet and DEU-Net. Compared with MS-NFN, Three-stage FCN, Vessel-Net and BTS-DSN, the Se, Acc and AUC of our model are lower only than those of Vessel-Net; however, the Sp of our method is 0.27% higher than that of Vessel-Net. In summary, taking both segmentation accuracy and segmentation speed into consideration, our method shows competitive performance compared with other state-of-the-art methods.

Method One Forward Pass? Se Sp Acc AUC F1 fps Params(M)
FCN [22] No 0.8039 0.9804 0.9576 0.9821 N.A 0.5 0.2
U-Net [14] No 0.7849 0.9802 0.9554 0.9761 0.8175 0.32 3.4
DUNet [14] No 0.7963 0.9800 0.9566 0.9802 0.8237 0.07 0.9
DEU-Net [31] No 0.7940 0.9816 0.9567 0.9772 0.8270 0.15 N.A
MS-NFN [38] No 0.7844 0.9819 0.9567 0.9807 N.A 0.1 0.4
Patch BTS-DSN [9] No 0.7891 0.9804 0.9561 0.9806 0.8249 N.A 7.8
Three-stage FCN [40] No 0.7631 0.9820 0.9538 0.9750 N.A N.A 20.4
Vessel-Net [37] No 0.8038 0.9802 0.9578 0.9821 N.A N.A 1.7
DRIU [20] Yes 0.7855 0.9799 0.9552 0.9793 0.8220 N.A 7.8
Image BTS-DSN [9] Yes 0.7800 0.9806 0.9551 0.9796 0.8208 12.3 7.8
Our Method Yes 0.8004 0.9801 0.9572 0.9815 0.8293 11.8 0.1
  • N.A : Not Available

  • * : The metric was computed by ourselves

TABLE I: Comparison results on the DRIVE dataset (For each metric, the top three scores are marked as red, green and blue, respectively.)
Method One Forward Pass? Se Sp Acc AUC F1 fps Split of dataset
DRIU [20] Yes 0.8036 0.9845 0.9658 0.9773 0.8310 N.A 10/10 (train/test)
DeepVessel [6] Yes 0.7412 N.A 0.9585 N.A N.A N.A 10/10 (train/test)
Image BTS-DSN [9] Yes 0.8201 0.9828 0.9660 0.9872 0.8362 9.3 10/10 (train/test)
Patch BTS-DSN [9] No 0.8212 0.9843 0.9674 0.9859 0.8421 N.A 10/10 (train/test)
Our Method Yes 0.8109 0.9857 0.9674 0.9885 0.8424 9.1 10/10 (train/test)
U-Net [14] No 0.7640 0.9867 0.9637 0.9789 0.8133 0.26 leave-one-out
DUNet [14] No 0.7595 0.9878 0.9641 0.9832 0.8143 0.05 leave-one-out
Three-stage FCN [40] No 0.7735 0.9857 0.9638 0.9833 N.A N.A leave-one-out
Our Method Yes 0.8020 0.9848 0.9649 0.9859 0.8237 9.1 leave-one-out
  • N.A : Not Available

  • * : The metric was computed by ourselves

TABLE II: Comparison results on the STARE dataset (For each metric, the top three scores are marked as red, green and blue, respectively.)
Method One Forward Pass? Se Sp Acc AUC F1 fps Split of dataset
MS-NFN [38] No 0.7538 0.9847 0.9637 0.9825 N.A 0.1 20/8 (train/test)
Three-stage FCN [40] No 0.7641 0.9806 0.9607 0.9776 N.A N.A 20/8 (train/test)
Vessel-Net [37] No 0.8132 0.9814 0.9661 0.9860 N.A N.A 20/8 (train/test)
DEU-Net [31] No 0.8074 0.9821 0.9661 0.9812 0.8037 0.08 20/8 (train/test)
BTS-DSN [9] Yes 0.7888 0.9801 0.9627 0.9840 0.7983 6.0 20/8 (train/test)
Our Method Yes 0.7757 0.9841 0.9652 0.9854 0.8080 5.6 20/8 (train/test)
U-Net [14] No 0.8355 0.9698 0.9578 0.9784 0.7792 0.10 14/14 (train/test)
DUNet [14] No 0.8155 0.9752 0.9610 0.9804 0.7883 0.02 14/14 (train/test)
Our Method Yes 0.8374 0.9747 0.9619 0.9831 0.7980 5.6 14/14 (train/test)
  • N.A : Not Available

  • * : The metric was computed by ourselves

TABLE III: Comparison results on the CHASE_DB1 dataset (For each metric, the top three scores are marked as red, green and blue, respectively.)

4.5.2 Visualization

To show the effectiveness of our proposed DPN, we present the segmentation probability maps and the corresponding binary maps in Fig. 5. We can observe that our model could detect both thin vessels and thick vessel trees, verifying the effectiveness of our proposed DP-Block and DPR-Block.

Fig. 5: Visualization of the segmentation maps. The first, third and fifth column correspond to the highest accuracy on the DRIVE, STARE and CHASE_DB1 datasets. The second, fourth and sixth column correspond to the lowest accuracy on the DRIVE, STARE and CHASE_DB1 datasets. From row 1 to 4: fundus images, ground truth, probability maps and binary maps.

Moreover, we present three challenging cases in Fig. 6. We can observe that our model detects thin vessels of only one-pixel width, as DPN always preserves the spatial information. In addition, our model is able to segment some extremely thin, low-contrast vessels near the macula. In the third row of Fig. 6, there are two patches of hemorrhage, which share similar local features with vessels; since the DPN captures structural information, it is robust to the presence of hemorrhages. Our model also segments well some true vessels that were not annotated. In summary, the proposed method can segment both thick and thin vessels and is robust to noise.

(a) Segmentation of extremely thin vessels
(b) Segmentation of low-contrast vessels
(c) Segmentation in the presence of hemorrhages
Fig. 6: Visualization of some challenging cases. From left to right: fundus images patches, ground-truth and the segmentation probability maps generated by proposed DPN.

4.5.3 Cross-Training Experiments

A good vessel segmentation model should perform well not only on the test set of one dataset but also on other datasets without retraining. To show the generalization ability of our model, we conducted cross-training experiments on the DRIVE and STARE datasets. Different from BTS-DSN [9], which trained the model on one whole dataset and tested it on another, we followed the cross-training setting of DUNet [14] and Three-stage FCN [40], in which, for example, the model trained on the DRIVE training set is applied to the whole STARE dataset without retraining. We compare with four methods, and the comparison results are summarized in Table IV.

When transferring our method from STARE to DRIVE, our method achieves the highest Se and Acc. In particular, the sensitivity of our model is nearly 4% higher than that of Three-stage FCN [40], while the specificity is almost the same. Hence, our model segments more vessels than Three-stage FCN, and high sensitivity is critical for clinical application. Different from Three-stage FCN, which designs specialized network structures for thin and thick vessels respectively, our DP-Block captures thin vessels by learning high resolution features and thick vessels by preserving global structural information simultaneously. The cross-training experiments show the superior performance of our method over Three-stage FCN.

When transferring from DRIVE to STARE, our method shows poor performance. We argue that the reason is that we performed no preprocessing of the training samples, while there is a large gap between the two datasets in terms of color and illumination. Therefore, we retrained our model on the DRIVE training set using CLAHE, as was done for the STARE dataset. We can observe from Table IV that all four evaluation metrics improve after adopting CLAHE preprocessing. Compared with the other four methods, our model then shows superior performance in terms of Sp and AUC. Taking the cross-training results and segmentation speed into consideration, our method is more suitable for clinical application than existing methods.

Dataset Methods Se Sp Acc AUC
STARE→DRIVE Fraz [4] 0.7242 0.9792 0.9456 0.9697
Li [17] 0.7273 0.9810 0.9486 0.9677
Yan [40] 0.7014 0.9802 0.9444 0.9568
Jin [14] 0.6505 0.9914 0.9481 0.9718
Our Method 0.7410 0.9801 0.9499 0.9685
DRIVE→STARE Fraz [4] 0.7010 0.9770 0.9495 0.9660
Li [17] 0.7027 0.9828 0.9545 0.9671
Yan [40] 0.7319 0.9840 0.9580 0.9678
Jin [14] 0.7000 0.9759 0.9474 0.9571
Our Method 0.6635 0.9821 0.9492 0.9559
Our Method* 0.7100 0.9841 0.9558 0.9689
  • * : The results are obtained by retraining our model on the DRIVE training set (20 images) with CLAHE preprocessing.

TABLE IV: Results of the cross-training experiments.

4.5.4 Effectiveness of Auxiliary Losses

To show the effectiveness of using auxiliary losses at intermediate layers, we removed all three auxiliary losses from DPN; the experimental results are summarized in Table V. We can observe that almost all evaluation metrics are improved after adopting the auxiliary losses. Specifically, the segmentation accuracy is improved by over 0.1% on all three datasets. This experiment verifies the rationality and effectiveness of adopting auxiliary losses.

Dataset Auxiliary Loss? Se Sp Acc AUC F1
DRIVE No 0.7838 0.9811 0.9560 0.9804 0.8237
Yes 0.8004 0.9801 0.9572 0.9815 0.8293
STARE No 0.8075 0.9847 0.9662 0.9872 0.8361
Yes 0.8109 0.9857 0.9674 0.9885 0.8424
CHASE_DB1 No 0.7626 0.9841 0.9640 0.9833 0.8009
Yes 0.7757 0.9841 0.9652 0.9854 0.8080
TABLE V: Comparison results of employing auxiliary losses or not (best results shown in bold).

5 Conclusion

Deep learning models have been applied to fundus vessel segmentation and achieve remarkable performance. In this paper, we propose a deep model, called DPN, to segment fundus vessel trees. Different from U-Net and FCN, in which the resolution of features is first down-sampled and then up-sampled, our method maintains a high resolution throughout the whole process, so that vessel boundaries can be located accurately. To accomplish this goal, we further propose the DP-Block, in which multi-scale fusion is adopted to preserve detailed information and learn structural information at the same time. To show the effectiveness of our method, we trained DPN from scratch on three publicly available datasets: DRIVE, STARE and CHASE_DB1. Experimental results show that our method achieves competitive or better performance with only about 96k parameters. Specifically, the segmentation speed of our method is over 20-160x faster than that of other state-of-the-art methods on the DRIVE dataset. Moreover, to evaluate the generalization ability of our method, we conducted cross-training experiments; the results reveal that our method achieves competitive performance. Considering segmentation accuracy, segmentation speed and model generalization ability together, our model shows superior performance and is suitable for real-world application. In the future, we aim to extend our method and develop robust deep models for fundus microaneurysm segmentation.

References

  • [1] G. Azzopardi and N. Petkov (2013) Automatic detection of vascular bifurcations in segmented retinal images using trainable cosfire filters. Pattern Recognition Letters 34 (8), pp. 922–933. External Links: Document Cited by: §1.
  • [2] L. Cao, H. Li, Y. Zhang, L. Zhang, and L. Xu (2020) Hierarchical method for cataract grading based on retinal images using improved haar wavelet. Information Fusion 53, pp. 196–208. External Links: Document Cited by: §1.
  • [3] A. Christodoulidis, T. Hurtut, H. B. Tahar, and F. Cheriet (2016) A multi-scale tensor voting approach for small retinal vessel segmentation in high resolution fundus images. Computerized Medical Imaging and Graphics 52, pp. 28–43. External Links: Document Cited by: §2.1.
  • [4] M. M. Fraz, P. Remagnino, A. Hoppe, B. Uyyanonvara, A. R. Rudnicka, C. G. Owen, and S. A. Barman (2012) An ensemble classification-based approach applied to retinal blood vessel segmentation. IEEE Transactions on Biomedical Engineering 59 (9), pp. 2538–2548. External Links: Document Cited by: TABLE IV.
  • [5] M. M. Fraz, P. Remagnino, A. Hoppe, B. Uyyanonvara, A. R. Rudnicka, C. G. Owen, and S. A. Barman (2012) Blood vessel segmentation methodologies in retinal images - a survey. Computer Methods and Programs in Biomedicine 108 (1), pp. 407–433. External Links: Document Cited by: §4.1.
  • [6] H. Fu, Y. Xu, S. Lin, D. W. K. Wong, and J. Liu (2016) DeepVessel: retinal vessel segmentation via deep learning and conditional random field. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 132–139. External Links: Document Cited by: §2.2, TABLE II.
  • [7] S. Garg, J. Sivaswamy, and S. Chandra (2007) Unsupervised curvature-based retinal vessel segmentation. In IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 344–347. External Links: Document Cited by: §2.1.
  • [8] X. Glorot and Y. Bengio (2010) Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, Vol. 9, pp. 249–256. Cited by: §4.3.
  • [9] S. Guo, K. Wang, H. Kang, Y. Zhang, Y. Gao, and T. Li (2019) BTS-dsn: deeply supervised neural network with short connections for retinal vessel segmentation. International Journal of Medical Informatics 126, pp. 105–113. External Links: Document Cited by: §1, §4.5.1, §4.5.3, TABLE I, TABLE II, TABLE III.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. External Links: Document Cited by: §3.2.
  • [11] A. W. Hoover, V. L. Kouznetsova, and M. H. Goldbaum (2000) Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Transactions on Medical Imaging 19 (3), pp. 203–210. External Links: Document Cited by: §4.1.
  • [12] S. Irshad and M. U. Akram (2014) Classification of retinal vessels into arteries and veins for detection of hypertensive retinopathy. In 2014 Cairo International Biomedical Engineering Conference (CIBEC), pp. 133–136. Cited by: §1.
  • [13] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell (2014) Caffe: convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia, pp. 675–678. External Links: Document Cited by: §4.3.
  • [14] Q. Jin, Z. Meng, T. D. Pham, Q. Chen, L. Wei, and R. Su (2019) DUNet: a deformable network for retinal vessel segmentation. Knowledge-Based Systems 178, pp. 149–162. External Links: Document Cited by: §1, §2.2, §4.5.1, §4.5.1, §4.5.3, TABLE I, TABLE II, TABLE III, TABLE IV.
  • [15] D. P. Kingma and J. L. Ba (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations, pp. 1–13. Cited by: §4.3.
  • [16] C. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu (2015) Deeply-Supervised Nets. In International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 38, pp. 562–570. External Links: Link Cited by: §3.3.
  • [17] Q. Li, B. Feng, L. Xie, P. Liang, H. Zhang, and T. Wang (2015) A cross-modality learning approach for vessel segmentation in retinal images. IEEE Transactions on Medical Imaging 35 (1), pp. 109–118. External Links: Document Cited by: TABLE IV.
  • [18] Q. Li, J. You, L. Zhang, and P. Bhattacharya (2006) A multiscale approach to retinal vessel segmentation using gabor filters and scale multiplication. In IEEE International Conference on Systems, Man and Cybernetics, Vol. 4, pp. 3521–3527. External Links: Document Cited by: §2.1.
  • [19] P. Liskowski and K. Krawiec (2016) Segmenting retinal blood vessels with deep neural networks. IEEE Transactions on Medical Imaging 35 (11), pp. 2369–2380. External Links: Document Cited by: §2.2.
  • [20] K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. Van Gool (2016) Deep retinal image understanding. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 140–148. External Links: Document Cited by: §2.2, §4.5.1, TABLE I, TABLE II.
  • [21] D. Marín, A. Aquino, M. E. Gegúndez-Arias, and J. M. Bravo (2010) A new supervised method for blood vessel segmentation in retinal images by using gray-level and moment invariants-based features. IEEE Transactions on Medical Imaging 30 (1), pp. 146–158. External Links: Document Cited by: §4.1.
  • [22] A. Oliveira, S. Pereira, and C. A. Silva (2018) Retinal vessel segmentation based on fully convolutional neural networks. Expert Systems with Applications 112, pp. 229–242. External Links: Document Cited by: §4.5.1, TABLE I.
  • [23] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld (1987) Adaptive histogram equalization and its variations. Computer Vision, Graphics, and Image Processing 39 (3), pp. 355–368. External Links: Document Cited by: §4.2.
  • [24] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. External Links: Document Cited by: §1, §2.2, Fig. 2, §3.1.
  • [25] U. Schmidt-Erfurth, A. Sadeghipour, B. S. Gerendas, S. M. Waldstein, and H. Bogunović (2018) Artificial intelligence in retina. Progress in Retinal and Eye Research 67, pp. 1–29. External Links: Document Cited by: §1, §1.
  • [26] I. U. Scott, G. Alexandrakis, G. J. Cordahi, and T. G. Murray (1999) Diffuse and circumscribed choroidal hemangiomas in a patient with sturge-weber syndrome. Archives of Ophthalmology 117 (3), pp. 406–407. Cited by: §1.
  • [27] E. Shelhamer, J. Long, and T. Darrell (2017) Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (4), pp. 640–651. External Links: Document Cited by: §1.
  • [28] K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, External Links: Link Cited by: §4.5.1.
  • [29] J. Staal, M. D. Abràmoff, M. Niemeijer, M. A. Viergever, and B. Van Ginneken (2004) Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging 23 (4), pp. 501–509. External Links: Document Cited by: §4.1.
  • [30] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9. External Links: Document Cited by: §3.2, §3.3.
  • [31] B. Wang, S. Qiu, and H. He (2019) Dual encoding u-net for retinal vessel segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 84–92. External Links: Document Cited by: §4.5.1, TABLE I, TABLE III.
  • [32] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, et al. (2020) Deep high-resolution representation learning for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. External Links: Document Cited by: §1.
  • [33] X. Wang, X. Jiang, and J. Ren (2019) Blood vessel segmentation from fundus image by a cascade classification framework. Pattern Recognition 88, pp. 331–341. External Links: Document Cited by: §2.2.
  • [34] Y. Wang, G. Ji, P. Lin, and E. Trucco (2013) Retinal vessel segmentation using multiwavelet kernels and multiscale hierarchical decomposition. Pattern Recognition 46 (8), pp. 2117–2133. External Links: Document Cited by: §2.1.
  • [35] T. Y. Wong, J. Sun, R. Kawasaki, P. Ruamviboonsuk, N. Gupta, V. C. Lansingh, M. Maia, W. Mathenge, S. Moreker, M. M. Muqit, et al. (2018) Guidelines on diabetic eye care: the international council of ophthalmology recommendations for screening, follow-up, referral, and treatment based on resource settings. Ophthalmology 125 (10), pp. 1608–1622. External Links: Document Cited by: §1.
  • [36] T. Y. Wong, J. Coresh, R. Klein, P. Muntner, D. J. Couper, A. R. Sharrett, B. E. Klein, G. Heiss, L. D. Hubbard, and B. B. Duncan (2004) Retinal microvascular abnormalities and renal dysfunction: the atherosclerosis risk in communities study. Journal of the American Society of Nephrology 15 (9), pp. 2469–2476. External Links: Document Cited by: §1.
  • [37] Y. Wu, Y. Xia, Y. Song, D. Zhang, D. Liu, C. Zhang, and W. Cai (2019) Vessel-net: retinal vessel segmentation under multi-path supervision. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 264–272. External Links: Document Cited by: §4.5.1, TABLE I, TABLE III.
  • [38] Y. Wu, Y. Xia, Y. Song, Y. Zhang, and W. Cai (2018) Multiscale network followed network model for retinal vessel segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 119–126. External Links: Document Cited by: §2.2, TABLE I, TABLE III.
  • [39] S. Xie and Z. Tu (2017) Holistically-nested edge detection. International Journal of Computer Vision 125 (1-3), pp. 3–18. External Links: Document Cited by: §2.2, §3.3.
  • [40] Z. Yan, X. Yang, and K. Cheng (2018) A three-stage deep learning model for accurate retinal vessel segmentation. IEEE Journal of Biomedical and Health Informatics 23 (4), pp. 1427–1436. External Links: Document Cited by: §1, §4.5.3, §4.5.3, TABLE I, TABLE II, TABLE III, TABLE IV.
  • [41] Y. Yin, M. Adel, and S. Bourennane (2012) Retinal vessel segmentation using a probabilistic tracking method. Pattern Recognition 45 (4), pp. 1235–1244. External Links: Document Cited by: §2.1.