What Do We Really Need? Degenerating U-Net on Retinal Vessel Segmentation

by Weilin Fu, et al.

Retinal vessel segmentation is an essential step in fundus image analysis. With the recent advances in deep learning, many convolutional neural networks have been applied in this field, including the highly successful U-Net. In this work, we first modify the U-Net with functional blocks aiming at higher performance. The absence of the expected performance boost then leads us in the opposite direction: shrinking the U-Net and exploring how extreme the conditions can become before its segmentation performance degrades. We design experiment series that simplify the network structure, reduce the network size, and restrict the training conditions. The results show that, for retinal vessel segmentation on the DRIVE database, the U-Net does not degenerate until surprisingly acute conditions are reached: one level, one filter per convolutional layer, and one training sample. This experimental discovery is both counter-intuitive and worthwhile: not only are the extremes of the U-Net explored on a well-studied application, but an intriguing warning is also raised for research methodology that seeks marginal performance enhancement regardless of the resource cost.




1 Introduction

Segmentation of retinal vessels is a crucial step in fundus image analysis. It provides information about the distribution, thickness and curvature of the retinal vessels, and thus greatly assists early-stage diagnosis of diseases related to the circulatory system, such as diabetic retinopathy. Researchers have been devoted to this field for decades [1], and with the development of deep learning technologies [2], many deep networks have been proposed to tackle this problem: for instance, a Convolutional Neural Network (CNN) combined with a conditional random field [3], a pipeline concatenating a preprocessing net with a vesselness Frangi-Net [4], and the U-Net [5]. Since its publication, the U-Net has achieved remarkable performance in various fields. Researchers [6] even claim that hyper-parameter tuning of the U-Net, rather than constructing new CNN architectures, is the key to high performance. However, the U-Net generally contains a huge number of parameters and is resource-consuming. Previously, researchers [7] proposed pruning U-Net levels in the testing phase to reduce the network size. Yet this modification introduces even more parameters in the training phase, and only one decisive factor of the architecture, the number of levels, is considered.

We work on retinal vessel segmentation on the DRIVE database and start with a three-level U-Net with 16 filters in the input layer. First, we aim to enhance its performance by integrating common deep learning blocks into the architecture. As the expected performance boost is not observed, we hypothesize that the basic U-Net alone is adequate, or even overqualified, for the task. To verify this hypothesis, we design an experiment series that compresses the basic U-Net: the numbers of levels, of convolutional layers per level, and of filters per convolutional layer are reduced in turn. Non-linear activation layers are removed, and the number of training samples is decreased to further probe the limits of the training procedure. The results show that surprisingly harsh conditions are required for the U-Net to degenerate, indicating that the default configuration is redundant. Our contributions are two-fold: the minimum U-Net for this task is reported, indicating the possibility of real-time retinal vessel segmentation on mobile devices; and the issue of excessive computational power use is exposed and stressed.

2 Materials and Methods

2.1 Default U-Net Configuration

A three-level U-Net with 16 filters in the input layer is used as the baseline architecture, as shown in Fig. 1 (a). Batch normalization layers are utilized to stabilize the training process. The deconvolution layers are replaced with upsampling combined with convolution layers to avoid the checkerboard artifact.
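The upsampling-plus-convolution decoder can be illustrated with a small sketch (a hypothetical NumPy-only stand-in; the actual network uses learned convolution kernels):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor upsampling: every pixel becomes a 2x2 block,
    so each output pixel is covered by the same number of kernel taps."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def conv2d_same(x, kernel):
    """Naive 'same' 2D convolution with zero padding (for illustration only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

# Upsample first, then smooth with a 3x3 kernel (a stand-in for a learned one):
feat = np.random.rand(8, 8)
up = conv2d_same(upsample2x(feat), np.full((3, 3), 1 / 9.0))
assert up.shape == (16, 16)
```

In contrast, a strided deconvolution applies its kernel at overlapping positions with uneven coverage, which is what produces the checkerboard pattern.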

2.2 Additive Variants

Three popular CNN blocks are utilized to modify the network structure: the dense block [8] is inserted into each level of the encoder path; the side-output layer [3] is employed to provide deep supervision in the decoder path; and the residual block [9] is integrated into the encoder, the bottleneck, and the decoder. The block structures are illustrated in Fig. 1 (b-d).

Figure 1: U-Net (a), residual block (b), dense block (c) and side-output layer (d).
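The skip-connection patterns behind the residual and dense blocks can be sketched as follows (hypothetical 1-D linear maps stand in for the 3x3 convolutions; the function names are ours, not the paper's):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Residual pattern: the block learns an update that is added to its input."""
    return relu(x @ w1.T) @ w2.T + x

def dense_layer(x, w):
    """Dense pattern: new features are concatenated to all previous ones."""
    return np.concatenate([x, relu(x @ w.T)], axis=1)

x = np.array([[1.0, 2.0]])
identity = np.eye(2)
print(residual_block(x, identity, identity))  # input is added back: [[2., 4.]]
print(dense_layer(x, identity).shape)         # feature count grows: (1, 4)
```

The residual path eases gradient flow by keeping an identity shortcut, while the dense path reuses all earlier features at the cost of rapidly growing channel counts, which matches the large parameter count of the dense variant in Table 1.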

2.3 Subtractive Variants

The design of the subtractive U-Net variants follows a controlled-variable strategy: only one factor is changed from the default configuration in each series. Both the structural limits and the training-condition limits of the U-Net are studied in the following experiments:

  1. The non-linear activation layers, i.e. the ReLU layers, are removed.

  2. The number of convolutional layers in each level is reduced to one.

  3. The number of filters in the input layer is reduced from 16 to 1.

  4. The number of levels is decreased step-wise down to one, at which point the network degenerates into a chain of consecutive convolution layers.

  5. The default U-Net is trained with subsets of the training data. The subset size is halved step-wise from 16 down to 1.
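To see how the number of levels and initial filters drives the network size, a schematic parameter count can be written down (a rough sketch under simplifying assumptions: 3x3 convolutions, channel doubling per level, no batch-normalization parameters, so the totals will not match the paper's tables exactly):

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution layer."""
    return c_in * c_out * k * k + c_out

def unet_param_count(levels, base_filters, convs_per_level=2, in_channels=1):
    """Schematic count for a symmetric U-Net with skip concatenations."""
    total, c_in, enc = 0, in_channels, []
    for lvl in range(levels):                      # encoder path (and bottleneck)
        c_out = base_filters * 2 ** lvl
        for _ in range(convs_per_level):
            total += conv_params(c_in, c_out)
            c_in = c_out
        enc.append(c_out)
    for lvl in reversed(range(levels - 1)):        # decoder path
        c_out = enc[lvl]
        total += conv_params(c_in, c_out)          # upsampling convolution
        c_in = c_out + enc[lvl]                    # skip concatenation
        for _ in range(convs_per_level):
            total += conv_params(c_in, c_out)
            c_in = c_out
    total += conv_params(c_in, 1, k=1)             # 1x1 output layer
    return total

# Parameter counts shrink rapidly with fewer levels or fewer initial filters:
for levels, filters in [(3, 16), (2, 16), (1, 16), (3, 1)]:
    print(levels, filters, unet_param_count(levels, filters))
```

Because each level roughly quadruples the per-layer weights (channels double on both input and output), removing levels or filters cuts the size far faster than removing convolution layers per level.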

2.4 Database Description

All experiments are trained and evaluated on the Digital Retinal Images for Vessel Extraction (DRIVE) database. DRIVE is composed of 40 RGB fundus photographs of size 565 × 584 pixels. All images are provided with manual labels and Field of View (FOV) masks. The database is equally divided into a training set and a testing set of 20 images each. A subset of four images is randomly selected from the training set for validation.

The raw images are prepared with a preprocessing pipeline: the green channels are extracted, the inhomogeneous illumination is balanced with Contrast Limited Adaptive Histogram Equalization (CLAHE), and the pixel intensities within the FOV masks are standardized to (-1, 1). The borders of all FOV masks are eroded by four pixels to remove potential border effects and to ensure a meaningful comparison. Additionally, multiplicative pixel-wise weight maps, which decrease with the vessel diameter in the given manual label, are generated from the manual labels to emphasize thin vessels.
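Two of these preprocessing steps, FOV border erosion and intensity standardization to (-1, 1), can be sketched with NumPy (a simplified stand-in: the CLAHE step would typically use a library such as OpenCV or scikit-image, and min-max scaling is assumed here for the standardization):

```python
import numpy as np

def erode_mask(mask, iterations=4):
    """Shrink a binary FOV mask by one pixel per iteration (4-neighbourhood)."""
    m = mask.astype(bool)
    for _ in range(iterations):
        shrunk = m.copy()
        shrunk[1:, :] &= m[:-1, :]   # keep a pixel only if its upper ...
        shrunk[:-1, :] &= m[1:, :]   # ... lower ...
        shrunk[:, 1:] &= m[:, :-1]   # ... left ...
        shrunk[:, :-1] &= m[:, 1:]   # ... and right neighbours are set
        shrunk[0, :] = shrunk[-1, :] = False   # outside the image counts
        shrunk[:, 0] = shrunk[:, -1] = False   # as background
        m = shrunk
    return m

def standardize_in_fov(channel, fov):
    """Rescale intensities inside the FOV to the range (-1, 1)."""
    lo, hi = channel[fov].min(), channel[fov].max()
    out = np.zeros_like(channel, dtype=float)
    out[fov] = 2.0 * (channel[fov] - lo) / (hi - lo) - 1.0
    return out

fov = erode_mask(np.ones((10, 10), dtype=bool), iterations=4)
print(fov.sum())  # only the central 2x2 region survives: 4
```

Eroding the FOV border before evaluation prevents the bright rim of the field of view from being scored as vessel or background pixels.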

2.5 Experimental Details

The loss function in this work is composed of two main parts: a weighted focal loss [10] and a weight-norm regularizer. The objective function is minimized with the Adam optimizer [11] using a decaying learning rate. Early stopping is applied according to the validation loss curve. Each batch contains 50 image patches, and data augmentation techniques such as rotation, shearing, additive Gaussian noise and intensity shifting are employed. All experiments are conducted five times with random initializations to show that the performance is stable and that the conclusions are not dominated by specific initialization settings. For the subset training experiments, the training sets are selected randomly.
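A weighted focal loss of this kind can be sketched as follows (a NumPy sketch of the standard focal-loss formulation [10] with the multiplicative weight map of Sec. 2.4 applied per pixel; the paper's exact weighting may differ):

```python
import numpy as np

def weighted_focal_loss(p, y, weight, gamma=2.0, eps=1e-7):
    """Focal loss: down-weights easy pixels via (1 - p_t)^gamma,
    then applies the precomputed pixel-wise weight map."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)   # probability of the true class
    return float(np.mean(-weight * (1.0 - p_t) ** gamma * np.log(p_t)))

w = np.ones(1)
confident = weighted_focal_loss(np.array([0.9]), np.array([1]), w)
wrong = weighted_focal_loss(np.array([0.1]), np.array([1]), w)
print(confident < wrong)  # easy, correct pixels contribute far less: True
```

The focusing term handles the heavy class imbalance between vessel and background pixels, while the weight map additionally emphasizes thin vessels.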

3 Results

The evaluation of each experiment over the five initialization roll-outs is reported in Tables 1-4. The means and standard deviations of five commonly used metrics, namely specificity, sensitivity, F1 score, accuracy and the AUC score, are presented. The threshold for binarization is selected such that the F1 score is maximized on the validation sets. The threshold-independent AUC score is chosen as the main performance indicator. The output probability maps of the degenerated trials are presented in Fig. 2 (c-f).
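The threshold selection can be sketched as a simple grid search over candidate thresholds, keeping the one with the best F1 score on validation data (an illustrative NumPy sketch; function names are ours, not the paper's):

```python
import numpy as np

def f1_score(pred, target):
    """F1 score for binary masks."""
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    return 2.0 * tp / (2.0 * tp + fp + fn) if tp > 0 else 0.0

def best_threshold(probs, target, candidates=np.linspace(0.05, 0.95, 19)):
    """Return the binarization threshold that maximizes F1."""
    scores = [f1_score(probs >= t, target) for t in candidates]
    return float(candidates[int(np.argmax(scores))])

# Toy validation data: vessel pixels get probability 0.9, background 0.1.
target = np.array([[True, False], [False, True]])
probs = np.where(target, 0.9, 0.1)
t = best_threshold(probs, target)
print(f1_score(probs >= t, target))  # a separating threshold is found: 1.0
```

The AUC score, by contrast, integrates over all thresholds, which is why it is used as the threshold-independent main indicator.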

Table 1 shows that the AUC scores of the additive U-Net variants fluctuate only in the fourth digit, meaning that the expected performance boost is missing. Reducing the number of convolutional layers per level impairs the network only marginally, while removing the non-linearity has a clear impact on the performance. As for the subtractive experiment series with decreasing numbers of levels (Table 2) and initial filters (Table 3), the segmentation results surprisingly remain satisfactory, with an AUC score above 0.97, until the U-Net contains only one level and collapses into a sequence of convolution layers, or until the number of initial filters drops to one. Regarding the generalization study reported in Table 4, a monotonic AUC decline is observed with a decreasing number of training sets, in accordance with our prediction. However, we did not anticipate that two training sets already achieve an AUC score above 0.96, which indicates that the default U-Net has a high generalization capability for retinal vessel segmentation on the DRIVE database.

var     # param
U       108 976
Ures    154 768
Uden    2 501 067
Uside   109 072
U-lin   108 976
U-1C    49 072
Table 1: Performance w.r.t. structural variants. Additive variants: Ures, Uden and Uside denote the U-Net with residual blocks, dense blocks and side-output layers, respectively; subtractive variants: U-lin and U-1C denote the U-Net without ReLU layers and the U-Net with one convolutional layer per level, respectively.
# levels  # param
2         23 984
1         7 344
Table 2: U-Net performance w.r.t. different numbers of levels.
# filters  # param
8          27 352
4          6 892
2          1 750
1          451
Table 3: U-Net performance w.r.t. different numbers of initial filters.
Table 4: U-Net performance w.r.t. different numbers of training sets.
Figure 2: Probability output of U-Net variants: (a) preprocessed image, (b) manual label, (c) default U-Net, (d) U-Net with 1 filter, (e) U-Net with 1 level, (f) trained with 1 set.

4 Discussion

In this work, we explore extreme U-Net configurations for retinal vessel segmentation and report the results on the DRIVE database. The work is motivated by the observation that additive modifications, such as the dense block, introduce additional parameters yet fail to improve the segmentation performance. Hence, an experiment series that decreases the network size and simplifies the network structure is conducted. The results do not match our expectations. It is understandable that the non-linearity, rather than the number of convolutional layers per level, has the stronger impact on the representation capability of the network. However, we did not expect that a U-Net with two levels (23 984 parameters), and even a U-Net with two initial filters (1 750 parameters), can reach an AUC score of over 0.97. The generalization ability of the default U-Net trained with only two training sets, achieving an AUC score above 0.96, is also surprising. The minimum set-up needed for the U-Net to generate satisfactory results is remarkably small for this particular task.

Our discoveries challenge the trend towards networks with ever-larger numbers of parameters that are trained for often marginal improvements in segmentation performance. They also emphasize that, depending on the task, very few samples can be sufficient to train a CNN that generalizes to unseen data. One can argue that these results stem from the simplicity of retinal vessel segmentation. Nevertheless, retinal vessel segmentation is unlikely to be the only application for which such observations hold. We therefore question research approaches that focus merely on performance improvement regardless of excessive resource demands. In the future, similar studies could be conducted on other tasks to save computational resources.

5 Acknowledgements

The research leading to these results has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (ERC grant no. 810316).


  • [1] Srinidhi CL, Aparna P, Rajan J. Recent Advancements in Retinal Vessel Segmentation. J Med Syst. 2017.
  • [2] Maier A, Syben C, Lasser T, et al. A Gentle Introduction to Deep Learning in Medical Image Processing. Zeitschrift für Medizinische Physik. 2019.
  • [3] Fu H, Xu Y, Wong DWK, et al. Retinal Vessel Segmentation via Deep Learning Network and Fully-Connected Conditional Random Fields. In: ISBI; 2016.
  • [4] Fu W, Breininger K, Schaffert R, et al. A Divide-and-Conquer Approach towards Understanding Deep Networks. In: MICCAI; 2019.
  • [5] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: MICCAI; 2015.
  • [6] Isensee F, Petersen J, Klein A, et al. nnU-Net: Self-Adapting Framework for U-Net-Based Medical Image Segmentation. arXiv:1809.10486. 2018.
  • [7] Zhou Z, Siddiquee MMR, et al. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In: DLMIA; 2018.
  • [8] Huang G, Liu Z, Van Der Maaten L, et al. Densely Connected Convolutional Networks. In: CVPR; 2017.
  • [9] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. In: CVPR; 2016.
  • [10] Lin TY, Goyal P, Girshick R, et al. Focal Loss for Dense Object Detection. In: ICCV; 2017.
  • [11] Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv:1412.6980. 2014.