Fundus image analysis serves as a key non-invasive tool in the diagnosis and treatment of many ophthalmological and cardiovascular diseases. With the development of deep learning methods, many network architectures based on U-Net or adversarial training have been proposed to learn the end-to-end mapping between an original image and a binary ground-truth mask manually labeled by experts. Maninis et al. proposed Deep Retinal Image Understanding (DRIU), which fine-tunes VGGNet. As deep learning approaches have progressed, segmentation performance on thin vessels has become a major challenge and focus. Zhang et al. propose a U-Net architecture (ML-UNet) for multi-label segmentation of thin and stem (thick) vessels. Yan et al. propose a novel segment-level loss in addition to the pixel-level loss to train a U-Net architecture (JL-UNet), and report increased segmentation accuracy for thin vessels. However, Zhang et al. and Yan et al. essentially propose a single, mixed multi-label network and do not provide end-to-end networks dedicated to specific binary classification tasks that focus on different types of features. Gu et al. propose a context encoder network (CE-Net) to better extract high-level information from the image, but CE-Net does not specifically focus on thin vessels and boundary areas.
In this paper, we examine the rationale behind this problem from the perspective of data balancing. Ordinary neural networks do not achieve desirable segmentation performance on thin vessels and boundary areas because vessel data suffer from imbalance within what is assumed to be a single class (vascular or non-vascular). Vessels of different thickness levels may have different features for identification and localization, which makes them essentially different classes in a segmentation task. Balancing across these classes therefore becomes important to avoid bias in learning. However, such balancing remains challenging because the ground-truth masks in most available segmentation datasets are binary and provide no direct information about thickness levels. To address this challenge, we propose a novel morphological model that automatically segments and classifies (stratifies) ground-truth masks into strata by vessel thickness level using hierarchical opening operations. To further increase segmentation performance, we also propose a two-stream model that learns both general retinal vascular features and features specific to thin vessels and boundary areas, by processing all strata in one stream and only the thin-vessel stratum in the other (hereafter, "thin vessels" refers to both thin vessels and boundary areas). The results of the two streams are united (pixel-wise ORed) to produce the final result.
Our contributions are threefold. (1) We propose a novel two-stream architecture that synthesizes features of different thickness levels. (2) We propose an efficient hierarchical model of opening operations that automatically stratifies the ground-truth masks to make our model sensitive to thickness levels; it is used jointly with a proposed CE-GAN model whose generator is based on the CE-Net architecture. (3) We propose a balanced loss function and an integration operation that unify vessel classes of various thickness levels and allow them to be weighted.
2 Proposed Method
2.1 Automatic Stratification
For each original sample, the ground-truth mask is stratified into componential masks (strata), each containing only the vessel labels of the corresponding thickness level. The stratification is achieved via opening (erosion followed by dilation) with thresholding kernels, i.e., with thresholds on the kernel size. We define the diameter $D$ of a vessel as the discrete Fréchet distance between its two border curves $P = (p_1, \dots, p_n)$ and $Q = (q_1, \dots, q_m)$:

$$D(P, Q) = \min_{\alpha, \beta} \; \max_{i \in [1:N]} \; d\big(p_{\alpha(i)}, q_{\beta(i)}\big),$$

where $\alpha: [1:N] \to [1:n]$ and $\beta: [1:N] \to [1:m]$ are two non-decreasing surjections and $d$ is the Chebyshev distance between two pixels. All vessels whose diameter falls below the kernel-size threshold are guaranteed to be completely erased by the kernel, while all vessels whose diameter reaches the threshold, although attenuated during erosion, restore their original outlines after dilation and are left intact by the whole opening process. This process results in an intermediary semi-limited mask, from which we derive the final, precisely selective strata.
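The stratification step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single square kernel of side `k` (the hierarchical thresholds and their values are not reproduced), a hand-written binary opening, and the classic Eiter–Mannila dynamic programme for the discrete Fréchet distance with the Chebyshev pixel metric. The function names `discrete_frechet`, `binary_opening`, and `stratify` are hypothetical.

```python
import numpy as np


def discrete_frechet(P, Q):
    """Discrete Fréchet distance between pixel curves P (n x 2) and Q (m x 2),
    using the Chebyshev distance between pixels (Eiter-Mannila dynamic programme)."""
    P, Q = np.asarray(P), np.asarray(Q)
    n, m = len(P), len(Q)
    # d[i, j] = max(|row difference|, |column difference|) between p_i and q_j.
    d = np.abs(P[:, None, :] - Q[None, :, :]).max(axis=2)
    ca = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                prev = 0.0
            elif i == 0:
                prev = ca[0, j - 1]
            elif j == 0:
                prev = ca[i - 1, 0]
            else:
                prev = min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1])
            # The best coupling that reaches (i, j) must also pay for this pair.
            ca[i, j] = max(prev, d[i, j])
    return float(ca[-1, -1])


def _shift(mask, di, dj):
    """Return s with s[i, j] = mask[i + di, j + dj], and False outside the image."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    r0, r1 = max(0, -di), min(h, h - di)
    c0, c1 = max(0, -dj), min(w, w - dj)
    if r0 < r1 and c0 < c1:
        out[r0:r1, c0:c1] = mask[r0 + di:r1 + di, c0 + dj:c1 + dj]
    return out


def binary_opening(mask, k):
    """Opening (erosion then dilation) with a k x k square structuring element."""
    offsets = [(di, dj) for di in range(-(k // 2), k - k // 2)
               for dj in range(-(k // 2), k - k // 2)]
    # Erosion: keep a pixel only if the whole kernel placed there lies inside the mask.
    eroded = np.logical_and.reduce([_shift(mask, di, dj) for di, dj in offsets])
    # Dilation with the reflected kernel, so the result is a subset of the input mask.
    return np.logical_or.reduce([_shift(eroded, -di, -dj) for di, dj in offsets])


def stratify(mask, k):
    """Split a binary vessel mask into (raw, stem, thin) strata.
    Assumption: one kernel size k; vessels too narrow for the kernel are erased by
    the opening and form the thin stratum, wider vessels survive as the stem stratum."""
    raw = np.asarray(mask, dtype=bool)
    stem = binary_opening(raw, k)
    thin = raw & ~stem
    return raw, stem, thin
```

For example, on a mask containing a three-pixel-wide bar and a one-pixel-wide bar, `stratify(mask, 3)` places the wide bar in the stem stratum and the narrow bar in the thin stratum; by construction the two strata are disjoint and together cover the raw mask.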
2.2 Two-stream Model
To learn vessel features of different specificities, we propose a novel two-stream model that captures both general features and those specifically related to thin vessels. In one stream, a network learns general features by training against all three strata. To effectively learn the features of different thickness levels, we concatenate the two stratified masks (stem and thin) and the original mask (raw) along a third, strata dimension to form a three-channel target for training. Samples with stratified masks are fed to a general end-to-end U-Net-like segmentation network that outputs one prediction map for each stratum. In the other stream, an additional end-to-end network dedicated to segmenting thin vessels outputs a single prediction map, trained only against the thin-vessel stratum.
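A minimal sketch of how the training targets of the two streams might be assembled, assuming the three strata are available as boolean arrays of equal shape (H, W). The channel order (raw, stem, thin), the channels-last layout, and the helper name `build_targets` are illustrative assumptions.

```python
import numpy as np


def build_targets(raw, stem, thin):
    """Hypothetical helper building the targets of both streams.
    General stream: raw, stem and thin masks stacked along a third, strata axis.
    Thin stream: the thin stratum alone, kept as a single-channel map."""
    general = np.stack([raw, stem, thin], axis=-1).astype(np.float32)  # (H, W, 3)
    thin_only = np.asarray(thin).astype(np.float32)[..., None]         # (H, W, 1)
    return general, thin_only
```

Keeping the raw mask as one of the channels means the general stream still sees every vessel pixel, while the stem and thin channels expose the thickness split explicitly.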
We use a weighted mean squared error (MSE) as the loss of the network and apply the corresponding backward updates, where the error of each stratum is measured by the Frobenius norm of the residual tensor between prediction and ground truth. In this way, vessels of different thickness levels receive configurable weights in the final loss, and a segmentation dataset without thickness annotations can be internally balanced.
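A NumPy sketch of a per-stratum weighted MSE. It assumes that each stratum's error is the squared Frobenius norm of its residual divided by the number of pixels, and that the final loss is a weighted sum over strata; the weights and normalization are illustrative rather than the paper's settings, and `weighted_strata_mse` is a hypothetical name. In a training framework the same expression would be differentiated automatically.

```python
import numpy as np


def weighted_strata_mse(pred, target, weights):
    """Weighted MSE over strata (assumed form).
    pred, target: arrays of shape (H, W, S); weights: S per-stratum weights."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    h, w, s = target.shape
    residual = pred - target
    # Squared Frobenius norm of each stratum's residual, normalized by pixel count.
    per_stratum = np.array(
        [np.linalg.norm(residual[..., c], ord="fro") ** 2 for c in range(s)]
    ) / (h * w)
    return float(np.dot(weights, per_stratum))
```

Raising the weight of the thin stratum relative to raw and stem is one way such a loss could counter the dominance of thick-vessel pixels.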
The segmentation problem can also be formulated as an image-to-image translation task from the original image to the ground-truth mask. Accordingly, we instantiate the two streams as adversarial CE-GAN models, whose generators are trained with adversarial losses against their discriminators.
In addition, the generators are trained directly against the ground-truth strata with an L1 loss to refine the segmentation results. The adversarial segmentation networks are updated with a min-max algorithm, in which the adversarial and L1 losses are balanced by a weighting hyper-parameter.
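The min-max training described above can be sketched with a pix2pix-style conditional GAN objective: a binary cross-entropy adversarial term plus an L1 term weighted by a hyper-parameter `lam`. This form is an assumption, since the exact CE-GAN loss equations are not given here; the functions operate on NumPy arrays of discriminator logits and predicted strata, and all names are hypothetical.

```python
import numpy as np


def bce_with_logits(logits, labels):
    """Numerically stable binary cross-entropy on raw discriminator scores."""
    logits = np.asarray(logits, dtype=float)
    return float(np.mean(np.maximum(logits, 0) - logits * labels
                         + np.log1p(np.exp(-np.abs(logits)))))


def generator_loss(d_fake_logits, fake_strata, real_strata, lam):
    """Generator side: fool the discriminator, plus lam * L1 to the ground-truth strata."""
    adv = bce_with_logits(d_fake_logits, 1.0)  # generator wants fakes judged "real"
    l1 = float(np.mean(np.abs(np.asarray(fake_strata, dtype=float)
                              - np.asarray(real_strata, dtype=float))))
    return adv + lam * l1


def discriminator_loss(d_real_logits, d_fake_logits):
    """Discriminator side: real (image, mask) pairs -> 1, generated pairs -> 0."""
    return 0.5 * (bce_with_logits(d_real_logits, 1.0)
                  + bce_with_logits(d_fake_logits, 0.0))
```

Alternating updates that decrease `discriminator_loss` for the discriminator and `generator_loss` for the generator realize the min-max game; a larger `lam` pulls the generator more strongly toward the labeled strata.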
|CE-GAN + stratify||0.9839||0.9850||0.9840|
|CE-GAN + stratify + thin||0.9844||0.9883||0.9858|
Since both networks produce smooth (non-binary) predictions, we first binarize their preliminary outputs with a threshold of 127. The positive binarized predictions from all prediction maps are then united (pixel-wise ORed) to form the final output of our system.
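The integration step can be sketched as follows, assuming 8-bit prediction maps in [0, 255] and that every map from both streams (for instance the general stream's per-stratum maps and the thin stream's map) is passed in. Whether the comparison is `> 127` or `>= 127` is not stated; `>` is assumed, and `fuse_predictions` is a hypothetical name.

```python
import numpy as np


def fuse_predictions(prediction_maps, threshold=127):
    """Binarize each prediction map and unite the results with a pixel-wise OR."""
    binary = [np.asarray(p) > threshold for p in prediction_maps]  # assumed strict '>'
    return np.logical_or.reduce(binary)
```

Because the union is taken pixel-wise, a thin vessel detected only by the thin-specific stream still appears in the final mask.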
3.1 Datasets and Experimental Setup
We evaluate our model on three standard datasets widely used for retinal vessel segmentation. None of the three datasets contains annotations of vessel thickness levels, so they are appropriate for our stratification model to process. DRIVE (https://www.isi.uu.nl/Research/Databases/DRIVE/) contains 40 color fundus (CF) images with manually labeled ground-truth masks, of which 20 are used for training and the remaining 20 for testing. To reduce selection bias, we repeat the experiment 5 times and report the averaged result. STARE (http://cecas.clemson.edu/~ahoover/stare/) contains 20 manually labeled CF images; we report average results over 4-fold cross-validation with 15 training samples and 5 testing samples. CHASE_DB1 (https://blogs.kingston.ac.uk/retinal/chasedb1/) contains 28 labeled samples, for which we report average performance over 4-fold cross-validation.
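The 4-fold protocol for STARE and CHASE_DB1 can be sketched with a shuffled k-fold split over image indices. Shuffling and the fixed seed are assumptions, since how the folds were formed is not stated; `kfold_splits` is a hypothetical name. With 20 STARE images each fold gives 15 training and 5 testing samples, and with 28 CHASE_DB1 images, 21 and 7.

```python
import numpy as np


def kfold_splits(n_samples, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.
    Assumption: indices are shuffled once with a fixed seed before splitting."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Each image is tested exactly once, so averaging the per-fold metrics gives the reported cross-validated performance.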
3.2 Evaluation Metrics
Standard metrics for binary classification, namely the Area Under the Receiver Operating Characteristic (ROC) Curve ($AUC$), Accuracy ($Acc$), Specificity ($Sp$), and Sensitivity ($Se$, i.e., Recall), are used to evaluate our model. The threshold-based metrics are defined as $Acc = \frac{TP + TN}{TP + TN + FP + FN}$, $Sp = \frac{TN}{TN + FP}$, and $Se = \frac{TP}{TP + FN}$, where $TP$, $TN$, $FP$, and $FN$ respectively stand for true positives, true negatives, false positives, and false negatives.
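A NumPy sketch of the four metrics. Accuracy, specificity, and sensitivity follow the confusion-count definitions above; AUC is computed with the rank-sum (Mann–Whitney) formulation, one standard way to obtain the area under the ROC curve, with tied scores given average ranks. The function names are illustrative, and which implementation the paper used is not stated.

```python
import numpy as np


def confusion_metrics(pred, gt):
    """Accuracy, specificity and sensitivity from binary prediction and ground-truth masks."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    tp, tn = np.sum(pred & gt), np.sum(~pred & ~gt)
    fp, fn = np.sum(pred & ~gt), np.sum(~pred & gt)
    acc = (tp + tn) / (tp + tn + fp + fn)
    sp = tn / (tn + fp)
    se = tp / (tp + fn)
    return float(acc), float(sp), float(se)


def roc_auc(scores, gt):
    """ROC AUC via the rank-sum statistic; tied scores receive average ranks."""
    scores = np.asarray(scores, dtype=float).ravel()
    gt = np.asarray(gt, dtype=bool).ravel()
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)               # last 1-based rank of each distinct score
    avg_rank = ends - (counts - 1) / 2.0   # mean rank within each group of ties
    ranks = avg_rank[inverse]
    n_pos, n_neg = gt.sum(), (~gt).sum()
    return float((ranks[gt].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg))
```

For example, `confusion_metrics([[1, 0], [1, 1]], [[1, 0], [0, 1]])` has TP = 2, TN = 1, FP = 1, FN = 0, giving Acc = 0.75, Sp = 0.5, Se = 1.0. The rank-based AUC runs in O(n log n), which matters at the pixel counts of fundus images.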
3.3 Experimental Results
To evaluate the performance of our model, we compare the four metrics against eight representative previous works on all three open-access datasets. The comparison results in Table 1 show that our MP-Net model outperforms the state-of-the-art methods in terms of accuracy and AUC on all three datasets; these metrics measure the practical prediction quality and the overall, threshold-independent prediction quality, respectively. The improvement is largest on DRIVE, which we attribute to the fact that DRIVE contains more thin vessels, the main target of our model. Our specificity is also the highest on DRIVE and CHASE_DB1, while our sensitivity is the highest on STARE. In particular, our method outperforms ML-UNet and JL-UNet, which adopt a different multi-class approach to tackle the same thin-vessel challenge. Figure 2 shows an example of our segmentation maps on DRIVE; most thin vessels and boundary areas are meticulously picked up.
3.4 Ablation Study
Our proposed MP-Net can be roughly decomposed into four progressive phases: (1) the backbone Context Encoder Network (CE-Net) as a standalone generator segmenting non-stratified images; (2) the non-stratified CE-Net from (1) combined with a discriminator to form a CE-GAN; (3) the CE-GAN with a stratified CE-Net (i.e., trained on the raw, stem, and thin strata), forming one stream of MP-Net; and (4) the one-stream MP-Net from (3) combined with a second, thin-stratum-specific GAN stream as in (2), forming the complete two-stream MP-Net. We perform ablation studies on all datasets to verify the effect of each component in isolation. The results in Table 2 confirm that both the stratification with mixed-strata training and the thin-specific design are effective improvements over the baseline system.
In this paper, we propose the Meticulous-Processing Network (MP-Net), which improves segmentation performance on thin vessels by stratifying ground-truth masks by thickness level and training on the resulting strata. The performance comparison and ablation study validate our design. The combined method can also be extended to other segmentation tasks involving vessel-like structures.
- (2018) A hierarchical image matting model for blood vessel segmentation in fundus images. IEEE Transactions on Image Processing 28 (5), pp. 2367–2377.
- (2012) An ensemble classification-based approach applied to retinal blood vessel segmentation. IEEE Transactions on Biomedical Engineering 59 (9), pp. 2538–2548.
- (2019) CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging.
- (1998) Locating blood vessels in retinal images by piece-wise threshold probing of a matched filter response. In Proceedings of the AMIA Symposium, pp. 931.
- (2016) Deep retinal image understanding. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 140–148.
- (2009) Measuring retinal vessel tortuosity in 10-year-old children: validation of the computer-assisted image analysis of the retina (CAIAR) program. Investigative Ophthalmology & Visual Science 50 (5), pp. 2004–2010.
- (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.
- (2004) Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging 23 (4), pp. 501–509.
- (2019) Blood vessel segmentation from fundus image by a cascade classification framework. Pattern Recognition 88, pp. 331–341.
- (2018) Joint segment-level and pixel-wise losses for deep learning based retinal vessel segmentation. IEEE Transactions on Biomedical Engineering 65 (9), pp. 1912–1923.
- (2016) Robust retinal vessel segmentation via locally adaptive derivative frames in orientation scores. IEEE Transactions on Medical Imaging 35 (12), pp. 2631–2644.
- (2018) Deep supervision with additional labels for retinal vessel segmentation task. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 83–91.