Segmentation of retinal vessels is a crucial step in fundus image analysis. It provides information on the distribution, thickness and curvature of the retinal vessels, and thus greatly assists the early-stage diagnosis of diseases related to the circulatory system, such as diabetic retinopathy. Researchers have worked in this field for decades, and with the development of deep learning technologies, many deep networks have been proposed to tackle this problem. Examples include a Convolutional Neural Network (CNN) combined with a conditional random field, a network pipeline concatenating a preprocessing net and a vesselness Frangi-Net, and the U-Net. Since its publication, U-Net has achieved remarkable performance in various fields. Some researchers even claim that tuning the hyper-parameters of the U-Net, rather than constructing new CNN architectures, is the key to high performance. However, U-Net generally contains a huge number of parameters and is resource-consuming. Previous work proposed pruning U-Net levels in the testing phase to reduce the network size. Yet these modifications introduce even more parameters in the training phase, and only one decisive factor of the architecture, the number of levels, is considered.
We work on retinal vessel segmentation on the DRIVE database and start with a three-level U-Net with 16 filters in the input layer. First, we aim to enhance its performance by integrating common deep learning blocks into the architecture. As the expected performance boost is not observed, we hypothesize that the basic U-Net alone is adequate or even overqualified for the task. To verify this hypothesis, we design a series of experiments to compress the basic U-Net. The number of levels, the number of convolutional layers per level, and the number of filters per convolutional layer are each reduced in turn. In addition, the non-linear activation layers are removed, and the number of training images is decreased to probe the limits of the network training procedure. Results show that surprisingly harsh conditions are required for the U-Net to degenerate, indicating that the default configuration is redundant. Our contributions are two-fold: we report the minimum U-Net for this task, indicating the possibility of real-time retinal vessel segmentation on mobile devices; and we expose and highlight the issue of excessive computational power use.
2 Materials and Methods
2.1 Default U-Net Configuration
A three-level U-Net with 16 filters in the input layer is used as the baseline architecture, as shown in Fig. 1 (a). Batch normalization layers are used to stabilize the training process. The deconvolution layers are replaced with upsampling followed by convolution layers to avoid checkerboard artifacts.
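To make the size of such a configuration tangible, the following minimal sketch counts trainable parameters for a U-Net-like layout. Several details are assumptions made for illustration and are not stated in the text: 3×3 kernels, two convolutions per level by default, filter counts that double at each deeper level, one convolution after each upsampling step, a single input channel, a final 1×1 convolution, and two batch normalization parameters per output channel. The function names `conv_params` and `unet_params` are hypothetical, and the resulting numbers are not expected to match the parameter counts reported in the tables.

```python
def conv_params(c_in, c_out, k=3, batch_norm=True):
    """Weights and biases of a k x k convolution, plus BN scale/shift if used."""
    n = k * k * c_in * c_out + c_out
    if batch_norm:
        n += 2 * c_out  # one scale and one shift per output channel
    return n


def unet_params(levels=3, base_filters=16, convs_per_level=2, in_channels=1):
    """Parameter count of an assumed U-Net layout (see the stated assumptions)."""
    widths = [base_filters * 2 ** i for i in range(levels)]
    total = 0
    c_in = in_channels

    # Encoder path; the deepest level acts as the bottleneck.
    for width in widths:
        for _ in range(convs_per_level):
            total += conv_params(c_in, width)
            c_in = width

    # Decoder path: upsampling (no parameters) followed by a convolution,
    # concatenation with the skip connection, then the level's convolutions.
    for width in reversed(widths[:-1]):
        total += conv_params(c_in, width)
        c_in = 2 * width  # channels after concatenating the skip connection
        for _ in range(convs_per_level):
            total += conv_params(c_in, width)
            c_in = width

    # Assumed 1x1 output convolution producing one vessel probability map.
    total += conv_params(c_in, 1, k=1, batch_norm=False)
    return total


if __name__ == "__main__":
    for levels in (3, 2, 1):
        print("levels", levels, "->", unet_params(levels=levels))
    for filters in (16, 8, 4, 2, 1):
        print("filters", filters, "->", unet_params(base_filters=filters))
```

Under these assumptions, the count is dominated by the 3×3 convolution weights, which scale with the product of input and output channels, so halving the number of filters reduces the size far more than linearly.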
2.2 Additive Variants
Three popular CNN blocks are used to modify the network structure. The dense block is inserted in each level of the encoder path. The side-output layer is employed to provide deep supervision in the decoder path. The residual block is integrated into the encoder, the bottleneck and the decoder. The block structures are illustrated in Fig. 1 (b-d).
2.3 Subtractive Variants
The experiment design of the subtractive variants of the U-Net follows the “control variates” strategy, meaning that only one factor is changed from the default configuration in each series. Both the structural limits and the training-condition limits of the U-Net are studied in the following experiments:
- The non-linear activation layers, i.e. the ReLU layers, are removed.
- The number of convolutional layers in each level decreases to one.
- The number of filters in the input layer is reduced from 16 to 1.
- The number of levels decreases step-wise down to one, until the network degenerates into a chain of consecutive convolution layers.
- The default U-Net is trained with subsets of the training data. The size of the subset is reduced from 16 down to 1 by a factor of 2.
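The one-factor-at-a-time design listed above can be sketched as a small configuration generator. This is only an illustration: the class `UNetConfig`, its field names and the helper functions are hypothetical. The text states that the subset size shrinks by a factor of 2; for the number of filters it only gives the range from 16 to 1, so the halving steps used here are an assumption.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class UNetConfig:
    """Hypothetical description of one experiment (defaults = default U-Net)."""
    levels: int = 3
    base_filters: int = 16
    convs_per_level: int = 2
    use_relu: bool = True
    train_images: int = 16


DEFAULT = UNetConfig()


def halvings(start):
    """Return start, start/2, ..., 1 (assumes start is a power of two)."""
    values = []
    while start >= 1:
        values.append(start)
        start //= 2
    return values


def subtractive_series(default=DEFAULT):
    """Build each series by changing exactly one field of the default config."""
    return {
        "no_relu": [replace(default, use_relu=False)],
        "one_conv_per_level": [replace(default, convs_per_level=1)],
        # Assumption: the filter count is halved step by step.
        "fewer_filters": [replace(default, base_filters=f)
                          for f in halvings(default.base_filters)[1:]],
        "fewer_levels": [replace(default, levels=n)
                         for n in range(default.levels - 1, 0, -1)],
        "fewer_train_images": [replace(default, train_images=n)
                               for n in halvings(default.train_images)[1:]],
    }


def changed_fields(config, default=DEFAULT):
    """Names of the fields in which config differs from the default."""
    base = asdict(default)
    return [name for name, value in asdict(config).items() if value != base[name]]


if __name__ == "__main__":
    for name, configs in subtractive_series().items():
        for config in configs:
            print(name, changed_fields(config), config)
```

Keeping the default configuration immutable and deriving every variant with `replace` makes it easy to check that no experiment changes more than one factor.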
2.4 Database Description
All networks are trained and evaluated on the Digital Retinal Images for Vessel Extraction (DRIVE) database. DRIVE is composed of 40 RGB fundus photographs with a size of 565 × 584 pixels. All images are provided with manual labels and Field of View (FOV) masks. The database is equally divided into a training set and a testing set of 20 images each. A subset containing four images is randomly selected from the training set for validation.
The raw images are prepared with a preprocessing pipeline: the green channels are extracted, the inhomogeneous illumination is balanced with CLAHE, and pixel intensities within the FOV masks are standardized to (-1, 1). The borders of all FOV masks are eroded by four pixels to remove potential border effects and ensure a meaningful comparison. Additionally, multiplicative pixel-wise weight maps are generated from the manual labels to emphasize thin vessels using the equation: , where represents the vessel diameter in the given manual label, and is manually set to .
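Parts of this pipeline can be sketched in plain Python on nested lists. The sketch covers green channel extraction, intensity rescaling inside the FOV and mask erosion; CLAHE and the weight maps are left out. Two points are assumptions: "standardized to (-1, 1)" is read as a linear min-max mapping, and the erosion uses a square window. All function names are hypothetical.

```python
def green_channel(rgb_image):
    """rgb_image: rows of (r, g, b) tuples -> rows of green values."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


def rescale_in_fov(image, fov_mask):
    """Linearly map intensities inside the FOV to [-1, 1]; outside becomes 0."""
    inside = [value
              for row, mask_row in zip(image, fov_mask)
              for value, keep in zip(row, mask_row) if keep]
    low, high = min(inside), max(inside)
    span = (high - low) or 1  # avoid division by zero on flat images
    return [[(2 * (value - low) / span - 1) if keep else 0.0
             for value, keep in zip(row, mask_row)]
            for row, mask_row in zip(image, fov_mask)]


def erode(mask, pixels):
    """Binary erosion with a (2 * pixels + 1)-wide square window.

    Positions outside the image count as background, so the mask also
    shrinks away from the image border.
    """
    height, width = len(mask), len(mask[0])
    offsets = range(-pixels, pixels + 1)
    return [[int(all(0 <= y + dy < height and 0 <= x + dx < width
                     and mask[y + dy][x + dx]
                     for dy in offsets for dx in offsets))
             for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    rgb = [[(0, 10, 0), (0, 20, 0)],
           [(0, 30, 0), (0, 99, 0)]]
    fov = [[1, 1],
           [1, 0]]
    print(rescale_in_fov(green_channel(rgb), fov))
    print(erode([[1] * 5 for _ in range(5)], 1))
```

In practice an array library would replace the nested loops; the list-based version is only meant to make each step explicit.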
2.5 Experimental Details
The loss function in this work is composed of two main parts: weighted focal loss and -norm weight regularizer. The objective function is minimized with the Adam optimizer with a decaying learning rate initialized to . Early stopping is applied according to the validation loss curve. Each batch contains 50 image patches sized , and data augmentation techniques such as rotation, shearing, additive Gaussian noise and intensity shifting are employed. All experiments are repeated five times with random initializations to show that the performance is stable and that the conclusions are not dominated by a specific initialization. For the subset training experiments, the training images are selected randomly.
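As an illustration of the first loss term, the sketch below computes a pixel-weighted binary focal loss in the form given by Lin et al., averaged over pixels. The focusing parameter `gamma=2.0`, the clipping constant `eps` and the function name are assumptions, since the text does not state them. The weight regularizer is omitted.

```python
import math


def weighted_focal_loss(probs, labels, weights, gamma=2.0, eps=1e-7):
    """Mean of -w * (1 - p_t)^gamma * log(p_t) over all pixels.

    probs:   predicted vessel probabilities in [0, 1]
    labels:  ground-truth labels (1 = vessel, 0 = background)
    weights: multiplicative per-pixel weights, e.g. larger for thin vessels
    """
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)   # keep log() finite
        p_t = p if y == 1 else 1.0 - p    # probability of the true class
        total += -w * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    probs = [0.9, 0.6, 0.2, 0.4]
    labels = [1, 1, 0, 0]
    weights = [1.0, 2.0, 1.0, 1.0]
    print(weighted_focal_loss(probs, labels, weights))
```

With `gamma=0` the expression reduces to a weighted cross-entropy; larger values of `gamma` down-weight pixels that are already classified with high confidence.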
3 Results
The evaluation of each experiment over the five differently initialized runs is reported in Tables 1-4. The means and standard deviations of five commonly used metrics, namely specificity, sensitivity, F1 score, accuracy and the AUC score, are presented. The threshold for binarization is selected such that the F1 score is maximized on the validation set. The threshold-independent AUC score is chosen as the main performance indicator. The output probability maps of the degenerated trials are presented in Fig. 2 (c-f).
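The evaluation protocol can be sketched as follows. The inputs are assumed to be flattened lists of pixel probabilities and binary labels, for example restricted to the FOV. The candidate thresholds and all function names are hypothetical. The AUC is computed through its pairwise ranking interpretation, which is quadratic in the number of pixels and therefore only suitable for small examples.

```python
def metrics(probs, labels, threshold):
    """Specificity, sensitivity, F1 score and accuracy at a given threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = p >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "specificity": ratio(tn, tn + fp),
        "sensitivity": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def best_f1_threshold(val_probs, val_labels, candidates=None):
    """Threshold with the highest F1 score on the validation data."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # assumed search grid
    return max(candidates,
               key=lambda t: metrics(val_probs, val_labels, t)["f1"])


def auc(probs, labels):
    """Probability that a random vessel pixel scores above a random
    background pixel; ties count as one half."""
    positives = [p for p, y in zip(probs, labels) if y]
    negatives = [p for p, y in zip(probs, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    val_probs = [0.9, 0.8, 0.4, 0.3, 0.2]
    val_labels = [1, 1, 0, 1, 0]
    threshold = best_f1_threshold(val_probs, val_labels)
    print(threshold, metrics(val_probs, val_labels, threshold))
    print(auc(val_probs, val_labels))
```

The threshold is chosen on validation data and then reused unchanged on the test data, whereas the AUC does not depend on any threshold.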
Table 1 shows that the AUC scores of the additive U-Net variants fluctuate only in the fourth digit, meaning that the expected performance boost is missing. Reducing the number of convolutional layers in each level impairs the network only marginally, while the absence of non-linearity has a stronger impact on the performance. As for the subtractive experiment series with decreasing numbers of network levels in Table 2 and initial filters in Table 3, the segmentation results surprisingly remain satisfactory, with an AUC score above 0.97, until the U-Net contains only one level and collapses into a sequence of convolution layers, or the number of initial filters drops to one. In the generalization study reported in Table 4, a monotonic decline of the AUC score is observed as the number of training images decreases, in accordance with our expectation. However, we did not anticipate that training with only two images already achieves an AUC score above 0.96, which indicates that the default U-Net has a high generalization capability for retinal vessel segmentation on the DRIVE database.
Table 1
| var | param | AUC | specificity | sensitivity | F1 score | accuracy |
| Uden | 2 501 067 |
Table 2
| # | param | AUC | specificity | sensitivity | F1 score | accuracy |
Table 3
| # | param | AUC | specificity | sensitivity | F1 score | accuracy |
Table 4
| # | AUC | specificity | sensitivity | F1 score | accuracy |
4 Discussion
In this work, we explore extreme U-Net configurations for retinal vessel segmentation and report the results on the DRIVE database. This work is motivated by the observation that additive modifications, such as the dense block, introduce additional parameters yet fail to improve the segmentation performance. Hence, a series of experiments that decrease the network size and simplify the network structure is conducted. The results do not follow our expectations. It is understandable that non-linearity, rather than the number of convolutional layers per level, has a stronger impact on the representation capability of the network. However, we did not expect that a U-Net with only two levels, or even a U-Net with only two initial filters, can reach an AUC score of over 0.97. The generalization ability of the U-Net trained with only two images, which achieves an AUC score above 0.96, is also surprising. The minimum set-up needed for the U-Net to generate satisfactory results is small for this particular task.
Our findings challenge the trend towards networks with ever larger numbers of parameters that often yield only marginal improvements in segmentation performance. They also emphasize that, depending on the task, very few samples can be sufficient to train CNNs that generalize to unseen data. One can argue that these results are due to the simplicity of retinal vessel segmentation. Nevertheless, retinal vessel segmentation may not be the only application for which these observations hold. We therefore question research approaches that focus merely on performance improvement regardless of excessive resource demand. In the future, similar approaches designed under the proposed paradigm could be applied to other tasks to save computational resources.
The research leading to these results has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (ERC grant no. 810316).
- Srinidhi CL, Aparna P, Rajan J. Recent Advancements in Retinal Vessel Segmentation. J Med Syst. 2017.
- Maier A, Syben C, Lasser T, et al. A Gentle Introduction to Deep Learning in Medical Image Processing. Zeitschrift für Medizinische Physik. 2019.
- Fu H, Xu Y, Wong DWK, et al. Retinal Vessel Segmentation via Deep Learning Network and Fully-Connected Conditional Random Fields. In: ISBI; 2016.
- Fu W, Breininger K, Schaffert R, et al. A Divide-and-Conquer Approach towards Understanding Deep Networks. In: MICCAI; 2019.
- Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: MICCAI; 2015.
- Isensee F, Petersen J, Klein A, et al. nnU-Net: Self-Adapting Framework for U-Net-Based Medical Image Segmentation. arXiv:1809.10486. 2018.
- Zhou Z, Siddiquee MMR, et al. Unet++: A Nested U-Net Architecture for Medical Image Segmentation. In: DLMIA; 2018.
- Huang G, Liu Z, Van Der Maaten L, et al. Densely Connected Convolutional Networks. In: CVPR; 2017.
- He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. In: CVPR; 2016.
- Lin TY, Goyal P, Girshick R, et al. Focal Loss for Dense Object Detection. In: Proc IEEE Int Conf Comput Vis; 2017.
- Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv:1412.6980. 2014.