Evaluation of Deep Segmentation Models for the Extraction of Retinal Lesions from Multi-modal Retinal Images

by   Taimur Hassan, et al.

Identification of lesions plays a vital role in the accurate classification of retinal diseases and in helping clinicians analyze disease severity. In this paper, we present a detailed evaluation of RAGNet, PSPNet, SegNet, UNet, FCN-8 and FCN-32 for the extraction of retinal lesions such as intra-retinal fluid, sub-retinal fluid, hard exudates, drusen, and other chorioretinal anomalies from retinal fundus and OCT scans. We also discuss the transferability of these models for extracting retinal lesions by varying the training-testing dataset pairs. A total of 363 fundus and 173,915 OCT scans were considered in this evaluation from seven publicly available datasets, of which 297 fundus and 59,593 OCT scans were used for testing purposes. Overall, the best performance is achieved by RAGNet with a mean dice coefficient (D_C) score of 0.822 for extracting retinal lesions. The second-best performance is achieved by PSPNet (mean D_C: 0.785) using ResNet50 as a backbone. Moreover, the best performance for extracting drusen is achieved by UNet (D_C: 0.864). The source code is available at: http://biomisa.org/index.php/downloads/.




1 Introduction

Retinopathy, or retinal disease, tends to damage the retina, which may result in a non-recoverable loss of vision or even blindness. Most of these diseases are associated with diabetes; however, they may also occur due to aging, uveitis and cataract surgeries. The two most common retinal diseases are macular edema (ME) and age-related macular degeneration (AMD). ME is the accumulation of fluid within the macula of the retina, mostly caused by hyperglycemia, uveitis and cataract surgeries. ME caused by diabetes is often termed diabetic macular edema (DME), which is identified by examining the patient’s diabetic history and by checking for the presence of retinal thickening (due to retinal fluid) or hard exudates (HE) within one disc diameter of the center of the macula [1].

Figure 1: Retinal lesions in fundus and OCT scans of Ci-CSME (A, B) and dry AMD pathology (C, D).

The Early Treatment Diabetic Retinopathy Study (ETDRS) classified DME as clinically significant if: 1) retinal thickening is observed within 500 µm of the macular center; 2) HE along with adjacent retinal thickening is observed within 500 µm of the macular center; or 3) retinal thickening regions of one (or more) disc diameter are observed, some part of which lies within one disc diameter of the center of the macula [2]. However, with the advent of new imaging techniques such as OCT, the classification of DME has been redefined: it is termed centrally involved clinically significant macular edema (Ci-CSME) if retinal thickening, due to retinal fluid or hard exudates, is discovered within the central subfield zone of the macula (having a diameter of 1 mm or greater); otherwise, DME is classified as non-centrally involved [3]. AMD is another retinal syndrome mostly found in elderly people. It is typically classified into two stages, i.e. non-neovascular AMD and neovascular AMD. Non-neovascular AMD is the “dry” form of AMD in which small, medium or large-sized drusen can be observed. With further progression of the disease, abnormal blood vessels from the choroid intercept the retina, causing chorioretinal anomalies such as fibrotic scars and choroidal neovascular membranes (CNVM). In such cases, AMD is classified as wet or neovascular AMD. Fig. 1 shows fundus and OCT scans containing retinal lesions at different stages of AMD and DME.

2 Related Work

In the literature, a large body of solutions for assessing retinal regions employed feature extraction techniques coupled with classic machine learning (ML) tools. The majority of these methods were tested on a limited number of images and thus exhibited limited reproducibility. More recently, with the advent of deep learning, a wide variety of end-to-end approaches, operating on much larger datasets, have been proposed.

Traditional Approaches: Fundus imagery has long been the modality of choice for examining retinal pathology [4, 5] and is still used as a secondary examination technique in analyzing complex retinal pathologies. However, with the advent of OCT, most solutions for retinal image analysis have migrated towards this newer modality due to its ability to objectively visualize retinal abnormalities at early stages. Chiu et al. [6] developed a kernel regression-based graph theory and dynamic programming (KR+GTDP) scheme to extract retinal layers and retinal fluid from DME-affected scans. In [7], a Random Forest-based framework was proposed for the automated extraction of retinal layers and fluid from scans affected by central serous retinopathy. Wilkins et al. [8] presented an automated method for the extraction of intra-retinal cystoid fluid from OCT images. Vidal et al. [9] used a linear discriminant classifier, support vector machines and a Parzen window for the identification of intra-retinal fluid (IRF). Apart from this, we have also proposed several methods for extracting retinal layers and retinal fluid, and for retinopathy classification, using traditional ML techniques [10, 11, 12, 13, 14].

Deep Learning Methods: Many researchers have applied deep learning for the extraction of retinal layers [15] and retinal lesions such as IRF [16], sub-retinal fluid (SRF) [17] and HE [18]. Seebock et al. [19] proposed a Bayesian UNet based strategy for recognizing retinal anomalies in DME, AMD, geographic atrophy and retinal vein occlusion pathologies. Fang et al. [20] developed the lesion-aware convolutional neural network (LACNN) model for the accurate classification of DME, choroidal neovascularization, drusen (AMD) and normal pathologies. LACNN is composed of a lesion detection network (LDN) and a lesion-attention module, where the LDN first generates soft attention maps to weight the lesion-aware features extracted from the lesion-attention module; these features are then used for the accurate classification of retinal pathologies. Apart from this, we have recently proposed a hybrid retinal analysis and grading architecture (RAGNet) [21] that utilizes a single feature extraction model for retinal lesion segmentation, lesion-aware classification and severity grading of retinopathy based on OCT images.

3 Contribution

In this paper, we present a thorough evaluation of popular deep segmentation models for the extraction of IRF, SRF, HE, drusen and other chorioretinal anomalies such as fibrotic scars and CNVM from multimodal retinal images. To the best of our knowledge, no literature to date provides a thorough evaluation of deep segmentation models for extracting this multitude of lesions in one go from multimodal retinal imagery. Accordingly, the main contributions of this paper are:

  • A first comprehensive evaluation of deep segmentation models such as RAGNet [21], PSPNet [22], SegNet [23], UNet [24], and FCN (8s and 32s) [25] for extracting multiple lesions from multimodal retinal imagery.

  • A comprehensive study encompassing seven publicly available datasets, and five different retinal pathologies represented in a total of 363 fundus and 173,915 OCT scans from which 297 fundus and 59,593 OCT scans were used for testing purposes.

  • A detailed exploration of the transferability of these models across varying pairs of training-testing datasets.

4 Proposed Approach

We propose a study involving state-of-the-art segmentation frameworks for the extraction of multiple retinal lesions in one go from retinal fundus and OCT imagery. The models which we considered in the evaluation are:

RAGNet: is a hybrid convolutional network that can perform pixel-level segmentation and scan-level classification at the same time [21]. The uniqueness of the RAGNet architecture is that it uses the same feature extractor for both classification and segmentation. So, if a problem demands segmentation and classification from the same image based upon similar features, RAGNet is an ideal choice compared to using two separate models [21]. Here, we have only used the RAGNet segmentation unit, since we focus on retinal lesion segmentation.

PSPNet: is a state-of-the-art scene parsing network that contains a pyramid pooling module generating four pyramids of feature maps, representing coarser to finer details of the scene, to minimize the loss of global contextual information during scan decomposition [22]. The pooled results are then resized and concatenated with the original feature maps to generate the final segmentation results.
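The pyramid pooling idea can be sketched with numpy (a simplified illustration, not the PSPNet implementation; the grid scales 1, 2, 3 and 6 follow the original paper, while the feature-map sizes here are arbitrary toy values): the backbone feature map is average-pooled at several grid scales, resized back, and concatenated with the original map.

```python
import numpy as np

def avg_pool(fmap, bins):
    """Average-pool a (H, W, C) feature map into a (bins, bins, C) grid."""
    h, w, c = fmap.shape
    hs, ws = h // bins, w // bins
    out = np.zeros((bins, bins, c))
    for i in range(bins):
        for j in range(bins):
            out[i, j] = fmap[i*hs:(i+1)*hs, j*ws:(j+1)*ws].mean(axis=(0, 1))
    return out

def upsample_nearest(fmap, h, w):
    """Nearest-neighbour resize of a (b, b, C) pooled map back to (h, w, C)."""
    rows = np.repeat(np.arange(fmap.shape[0]), h // fmap.shape[0])
    cols = np.repeat(np.arange(fmap.shape[1]), w // fmap.shape[1])
    return fmap[rows][:, cols]

def pyramid_pooling(fmap, scales=(1, 2, 3, 6)):
    """Concatenate the original map with coarse-to-fine pooled-and-resized maps."""
    h, w, _ = fmap.shape
    pyramids = [upsample_nearest(avg_pool(fmap, s), h, w) for s in scales]
    return np.concatenate([fmap] + pyramids, axis=-1)

fmap = np.random.rand(12, 12, 8)   # toy backbone feature map
out = pyramid_pooling(fmap)
print(out.shape)                   # (12, 12, 40): 8 original + 4 scales x 8 channels
```

The scale-1 pyramid carries the global average of the scene, which is how the module preserves global context that plain convolutions would lose.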

SegNet: is a fully convolutional network for semantic segmentation. The uniqueness of the SegNet model is that it uses pooling indices from the corresponding encoder block to up-sample the feature maps at the decoder end in a non-linear fashion. Afterwards, the up-sampled (sparse) feature maps are convolved with trainable filters to densify them. Moreover, SegNet has a smaller number of trainable parameters, which makes it computationally more efficient.
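The index-preserving up-sampling that distinguishes SegNet can be illustrated as follows (a toy numpy sketch, not the actual SegNet code): max-pooling records the argmax locations, and unpooling scatters values back to exactly those positions, producing the sparse maps that the decoder then densifies by convolution.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max-pooling that also records the argmax locations (flat indices)."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    idx = np.zeros((h // k, w // k), dtype=int)
    for i in range(h // k):
        for j in range(w // k):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k]
            r, c = np.unravel_index(np.argmax(win), win.shape)
            pooled[i, j] = win[r, c]
            idx[i, j] = (i*k + r) * w + (j*k + c)
    return pooled, idx

def max_unpool(pooled, idx, shape):
    """SegNet-style up-sampling: scatter pooled values back to the recorded
    argmax positions, leaving every other location zero (hence sparse)."""
    out = np.zeros(shape).ravel()
    out[idx.ravel()] = pooled.ravel()
    return out.reshape(shape)

x = np.array([[1., 3., 2., 0.],
              [0., 2., 4., 1.],
              [5., 0., 0., 0.],
              [0., 1., 2., 6.]])
pooled, idx = max_pool_with_indices(x)
restored = max_unpool(pooled, idx, x.shape)
print(restored)   # maxima restored in place, all other entries zero
```

Because only indices (not full feature maps) are stored, this scheme is lighter than the skip connections used by UNet-style decoders.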

UNet: is an auto-encoder inspired by FCN for semantic segmentation. The key feature of UNet is that it is fast and can generate good segmentation results from a small number of training samples because of its built-in data augmentation strategy [24]. UNet uses up-sampling instead of pooling operations and generates a large number of feature maps to propagate contextual information to the higher-resolution layers [24].
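The UNet decoder step described above can be sketched as follows (illustrative numpy only; the shapes and channel counts are arbitrary, and real UNet uses learned transposed convolutions rather than nearest-neighbour resizing): decoder features are up-sampled and concatenated with the matching encoder features, so high-resolution context reaches the deeper layers.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of a (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_connect(decoder_fmap, encoder_fmap):
    """UNet-style decoder step: up-sample the deep map, then concatenate the
    same-resolution encoder map along the channel axis."""
    return np.concatenate([upsample2x(decoder_fmap), encoder_fmap], axis=-1)

enc = np.random.rand(8, 8, 16)   # encoder features at higher resolution
dec = np.random.rand(4, 4, 32)   # decoder features one level deeper
merged = skip_connect(dec, enc)
print(merged.shape)              # (8, 8, 48)
```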


FCN: is an end-to-end model proposed for semantic segmentation. FCN reuses learned representations from pre-trained classification models, fine-tunes them, and generates finer pixel-level predictions in one go by up-sampling lower network layers with finer strides. In this study, we have utilized FCN-8 and FCN-32 (i.e. the finest and the coarsest versions of FCN) for retinal lesion extraction.

We have applied the above segmentation models for extracting retinal lesions from both fundus and OCT imagery. Here, we note that our study covers some of the most complex and commonly occurring retinal pathologies including non-neovascular AMD, neovascular AMD, Ci-CSME and non-Ci-CSME. We also note that the related scans were collected using machines from different manufacturers and exhibit varying scan quality. To make the comparison objective and highly reproducible, we have used publicly available datasets in our investigations. Furthermore, we have tested the transferability of these models through an extensive cross-dataset validation. The series of experiments we conducted in this work provide a reliable benchmark for assessing the robustness and generalization capacity of each model.

5 Experimental Setup

We have rigorously evaluated all the segmentation models on the publicly available Rabbani-I [26], Rabbani-II [27], Duke-I [28], Duke-II [6], Duke-III [29], BIOMISA [30] and Zhang [31] datasets. We utilized all the ground truths for lesion segmentation originally contained in the datasets and followed their standard criteria for the training/testing split. Datasets that do not contain any ground truth for retinal lesions (e.g. Rabbani-I, Duke-I and Duke-III) were annotated by a group of clinicians from the Armed Forces Institute of Ophthalmology, Rawalpindi, Pakistan. The training details and the evaluation metrics are presented below:

Training Details:

All the models have been evaluated on the Anaconda platform using the Keras API and Python 3.7.4, on a machine with an Intel Core i5-8400 CPU, 16 GB RAM and an NVIDIA RTX 2080 GPU, where ResNet50 was used as the backbone. Training of all the models was conducted for 20 epochs (with 512 iterations per epoch). Moreover, the optimizer used during training was an adaptive learning rate method (ADADELTA) with an initial learning rate of 1.0 and a decay factor of 0.95. The source code has been released for reproducibility.
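For illustration, the ADADELTA update rule with the stated decay factor of 0.95 can be sketched in plain Python (a minimal sketch of the optimizer's mathematics on a toy quadratic, not the Keras training code used here; in Keras the stated learning rate of 1.0 is a multiplier on the computed step):

```python
import numpy as np

def adadelta_step(x, grad, eg2, edx2, rho=0.95, eps=1e-6):
    """One ADADELTA update: running averages of squared gradients (eg2) and
    squared updates (edx2), both decayed by rho, set an adaptive step size."""
    eg2 = rho * eg2 + (1 - rho) * grad ** 2
    dx = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * grad
    edx2 = rho * edx2 + (1 - rho) * dx ** 2
    return x + dx, eg2, edx2

# minimize f(x) = x^2 (gradient 2x), starting from x = 5.0
x, eg2, edx2 = 5.0, 0.0, 0.0
for _ in range(5000):
    x, eg2, edx2 = adadelta_step(x, 2.0 * x, eg2, edx2)
print(abs(x))   # |x| has shrunk towards the minimum at 0
```

Because the step size is the ratio of the two accumulators, ADADELTA adapts per parameter without a hand-tuned global learning rate, which suits a benchmark where six different architectures are trained with one configuration.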

Evaluation Metrics: In the proposed study, all the segmentation models have been evaluated using the following metrics:

Mean Dice Coefficient: The dice coefficient (D_C) computes the degree of similarity between the ground truth and the extracted results using the following relation: D_C = 2*T_P / (2*T_P + F_P + F_N), where T_P indicates the true positives, F_P indicates the false positives and F_N indicates the false negatives. After computing D_C for each lesion class, the mean dice coefficient is computed for each network by averaging its per-class D_C scores.
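The dice coefficient D_C = 2*T_P / (2*T_P + F_P + F_N) for a binary lesion mask can be computed as follows (a minimal numpy sketch with hypothetical toy masks, not the evaluation code of this study):

```python
import numpy as np

def dice_coefficient(pred, gt):
    """D_C = 2*T_P / (2*T_P + F_P + F_N) for boolean masks."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return 2 * tp / (2 * tp + fp + fn)

gt   = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)   # toy ground-truth lesion
pred = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)   # toy prediction
print(dice_coefficient(pred, gt))   # 2*2 / (2*2 + 1 + 1) = 0.666...
```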

Mean Intersection-over-Union: The mean intersection-over-union (IoU) is computed by averaging the IoU scores of each lesion class, where IoU = T_P / (T_P + F_P + F_N).
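IoU = T_P / (T_P + F_P + F_N) is equivalently the intersection of the two masks over their union; a minimal numpy sketch on the same hypothetical toy masks:

```python
import numpy as np

def iou(pred, gt):
    """IoU = T_P / (T_P + F_P + F_N), i.e. |pred AND gt| / |pred OR gt|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

gt   = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
pred = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(iou(pred, gt))   # 2 / 4 = 0.5
```

Note that the two overlap metrics are related by D_C = 2*IoU / (1 + IoU), which is why the model rankings in Tables 2 and 3 largely agree.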

Recall, Precision and F-score: To further evaluate the models, we computed the pixel-level recall R_c = T_P / (T_P + F_N), precision P_r = T_P / (T_P + F_P) and F-score F_1 = 2 * (P_r * R_c) / (P_r + R_c).
Qualitative Evaluations: The performance of all the models for lesion extraction has also been qualitatively evaluated through visual examples.
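The pixel-level recall, precision and F-score can be sketched together (a minimal numpy illustration with a hypothetical 1-D "mask"; the real evaluation operates on 2-D scans):

```python
import numpy as np

def pixel_scores(pred, gt):
    """Pixel-level (recall, precision, F-score) from boolean masks."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score

gt   = np.array([1, 1, 1, 0, 0, 0], dtype=bool)   # toy ground truth
pred = np.array([1, 1, 0, 1, 0, 0], dtype=bool)   # toy prediction
print(pixel_scores(pred, gt))   # (0.666..., 0.666..., 0.666...)
```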

6 Results and Discussion

The evaluation of segmentation models has been conducted on the combination of all seven datasets containing mixed OCT and fundus scans. In terms of recall and F-score, as shown in Table 1, RAGNet achieves 9.48% and 3.36% improvements compared to UNet and PSPNet, respectively. However, in terms of precision, SegNet has a 1.52% lead over PSPNet, indicating that SegNet produces fewer false positives than the rest of the models. For the pixel-level comparison, we have excluded accuracy because it gives results biased towards the dominant negative class, i.e. the background.

Network Recall Precision F-score
RAGNet 0.8547 0.8606 0.8576
PSPNet 0.7540 0.9200 0.8287
SegNet 0.6388 0.9342 0.7587
UNet 0.7736 0.8842 0.8252
FCN-8 0.6238 0.6165 0.6201
FCN-32 0.4755 0.5611 0.5147
Table 1: Performance evaluation in terms of pixel-level recall, precision and F-scores on the combined dataset. Bold indicates the best performance.

Tables 2 and 3 report the performance of all the models for extracting retinal lesions in terms of mean D_C and mean IoU, respectively. From Table 2, it can be observed that RAGNet achieves the best mean D_C score of 0.822, leading PSPNet by 4.5% and FCN-32 by 51.33%. Moreover, in terms of mean IoU, RAGNet also achieves the overall best performance (mean IoU: 0.710), showing a clear gap over its competitors for extracting IRF, SRF and HE regions. In Table 3, the second-best performance is achieved by PSPNet, which lags behind RAGNet by 6.9%. We also noticed that on fundus images UNet achieves optimal lesion extraction results, with an overall performance comparable to that of PSPNet. Fig. 2 shows qualitative results of all the models when trained on multimodal images from all seven datasets at once, where the best overall performance of RAGNet can be observed. It should be noted that extracting lesions accurately from both modalities at once is quite challenging, as their image features vary considerably.

Figure 2: Comparison of retinal lesions extraction on combined dataset. (From left to right: original image, ground truth, RAGNet, PSPNet, UNet, FCN-8, FCN-32, SegNet). Blue, red, yellow, green and pink indicate HE, IRF, SRF, CA, and drusen, respectively.
Network IRF SRF CA HE Drusen Mean
RAGNet 0.846 0.850 0.941 0.633 0.840 0.822
SegNet 0.810 0.610 0.886 0.373 0.695 0.675
PSPNet 0.843 0.809 0.944 0.594 0.735 0.785
UNet 0.816 0.757 0.878 0.581 0.864 0.779
FCN-8 0.681 0.568 0.761 0.124 0.410 0.509
FCN-32 0.651 0.434 0.638 0.032 0.243 0.400
Table 2: Performance evaluation of deep segmentation models for retinal lesion extraction in terms of D_C. Bold indicates the overall best performance.
Network IRF SRF CA HE Drusen Mean
RAGNet 0.733 0.739 0.890 0.464 0.725 0.710
SegNet 0.681 0.439 0.796 0.229 0.533 0.535
PSPNet 0.728 0.680 0.895 0.423 0.581 0.661
UNet 0.689 0.609 0.783 0.409 0.761 0.650
FCN-8 0.517 0.397 0.615 0.066 0.257 0.370
FCN-32 0.482 0.277 0.468 0.016 0.138 0.276
Table 3: Performance evaluation of deep segmentation models for retinal lesion extraction in terms of IoU. Bold indicates the overall best performance.
Training → Testing RAGNet PSPNet SegNet UNet FCN-8 FCN-32
R → D 0.624 0.589 0.414 0.574 0.281 0.170
D → R 0.649 0.601 0.426 0.612 0.301 0.194
R → Z 0.657 0.615 0.468 0.604 0.322 0.225
Z → R 0.663 0.632 0.472 0.629 0.329 0.245
B → R 0.573 0.542 0.389 0.534 0.236 0.144
R → B 0.554 0.535 0.374 0.521 0.213 0.115
Z → D 0.809 0.752 0.613 0.734 0.476 0.342
D → Z 0.794 0.741 0.598 0.721 0.445 0.335
D → B 0.582 0.534 0.487 0.525 0.229 0.136
B → D 0.571 0.529 0.479 0.517 0.214 0.127
B → Z 0.564 0.513 0.465 0.524 0.221 0.152
Z → B 0.552 0.507 0.457 0.512 0.235 0.178
Table 4: Transferability analysis (Training → Testing) for all models in terms of mean IoU. Bold and blue indicate the first and second-best performance, respectively. (Dataset names are coded as follows: R: Rabbani, D: Duke, Z: Zhang and B: BIOMISA.)

In a second series of experiments, we conducted a transferability analysis to assess the generalization capabilities of all models across multi-vendor OCT images. Here, we combined Duke-I, II and III into one dataset (Duke), and Rabbani-I and Rabbani-II into Rabbani, to avoid redundant combinations, as they have similar image features. We report the results in Table 4, where it can be observed that all the methods perform well on the Duke and Zhang dataset pairs; this is natural because both datasets were acquired with Spectralis (Heidelberg Inc.) machines. Moreover, RAGNet achieved the overall best performance, as evident from Table 4, whereas PSPNet stood second best with a performance comparable to UNet. In another experiment, we used the Rabbani-II dataset to test how many false positives each model produces. Since Rabbani-II contains only healthy scans, there are no actual lesions in this dataset. The best performance in this experiment is achieved by RAGNet, with a true negative (T_N) rate of 0.9999, indicating that it produces a minimal number of false lesions. The worst performance is achieved by FCN-32 (T_N rate: 0.9379). Even this worst T_N rate is above 90% because the ratio of true negative (background) pixels to false positive pixels is extremely high. The results for this experiment are available in the codebase package at: http://biomisa.org/index.php/downloads/.
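The healthy-scan experiment can be illustrated as follows (a toy numpy sketch with hypothetical pixel counts, not the actual measurement): on a scan whose ground truth is all background, every predicted lesion pixel is a false positive, and the T_N rate is the fraction of background left unlabelled.

```python
import numpy as np

def true_negative_rate(pred, gt):
    """T_N / (T_N + F_P): fraction of background pixels left unlabelled."""
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    return tn / (tn + fp)

gt   = np.zeros(10000, dtype=bool)     # healthy scan: no lesion pixels at all
pred = np.zeros(10000, dtype=bool)
pred[:100] = True                      # 100 spurious lesion pixels
print(true_negative_rate(pred, gt))    # 9900 / 10000 = 0.99
```

This also shows why even a weak model scores above 0.9 here: background pixels vastly outnumber the false positives, so the denominator is dominated by T_N.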

7 Conclusion and Future Research

In this paper, we presented a thorough evaluation of deep segmentation models for lesion extraction from retinal fundus and OCT images. We also assessed the generalization of each model through comprehensive cross-dataset validation. This study revealed the superiority of RAGNet on both aspects. The benchmarking proposed in this work will be of great utility to researchers and practitioners who want to employ deep learning models for lesion-aware grading of the retina. In the future, we plan to extend this study to the extraction of the optic disc and retinal layers in the optic nerve head region for glaucoma analysis.


  • [1] G. M. Comers, “Cystoid macular edema,” in Kellog Eye Center. Accessed: June 2019.
  • [2] “Diabetic macular edema,” in EyeWiki. Accessed: November 4th, 2019.
  • [3] N. Relhan et al., “The early treatment diabetic retinopathy study historical review and relevance to today’s management of diabetic macular edema,” in Current Opinion in Ophthalmology. Wolters Kluwer, May 2017.
  • [4] M. U. Akram et al., “An automated system for the grading of diabetic maculopathy in fundus images,” in 19th International Conference on Neural Information Processing. November 12th-15th, 2012.
  • [5] T. Hassan et al., “Review of oct and fundus images for detection of macular edema,” in IEEE International Conference on Imaging Systems and Techniques (IST). September, 2015.
  • [6] S. J. Chiu et al., “Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema,” in Biomedical Optics Express. Vol. 6, No. 4, April 2015.
  • [7] D. Xiang et al., “Automatic retinal layer segmentation of oct images with central serous retinopathy,” in IEEE Journal of Biomedical and Health Informatics. Vol 23, No. 1, January 2019.
  • [8] G. R. Wilkins et al., “Automated segmentation of intraretinal cystoid fluid in optical coherence tomography,” in IEEE Transactions on Biomedical Engineering. pp. 1109-1114, 2012.
  • [9] P. L. Vidal et al., “Intraretinal fluid identification via enhanced maps using optical coherence tomography images,” in Biomedical Optics Express. October 2018.
  • [10] S. Khalid et al., “Automated segmentation and quantification of drusen in fundus and optical coherence tomography images for detection of armd,” in Journal of Digital Imaging. December 2017.
  • [11] S. Khalid et al., “Fully automated robust system to detect retinal edema, central serous chorioretinopathy, and age related macular degeneration from optical coherence tomography images,” in BioMed Research International. March 2017.
  • [12] T. Hassan et al., “Automated segmentation of subretinal layers for the detection of macular edema,” in Applied Optics. 55, 454-461, 2016.
  • [13] B. Hassan et al., “Structure tensor based automated detection of macular edema and central serous retinopathy using optical coherence tomography images,” in Journal of the Optical Society of America A. 33, 455-463, 2016.
  • [14] A. M. Syed et al., “Automated diagnosis of macular edema and central serous retinopathy through robust reconstruction of 3D retinal surfaces,” in Computer Methods and Programs in Biomedicine. 137, 1-10, 2016.
  • [15] L. Fang et al., “Automatic segmentation of nine retinal layer boundaries in oct images of non-exudative amd patients using deep learning and graph search,” in Biomedical Optics Express. Vol. 8, No. 5, May 2017.
  • [16] A. G. Roy et al., “ReLayNet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks,” in Biomedical Optics Express. Vol. 8, No. 8, 1 August 2017.
  • [17] T. Schlegl et al., “Fully automated detection and quantification of macular fluid in oct using deep learning,” in Ophthalmology. Vol. 125, No. 4, April 2018.
  • [18] B. Hassan et al., “Deep ensemble learning based objective grading of macular edema by extracting clinically significant findings from fused retinal imaging modalities,” in MDPI Sensors. July 2019.
  • [19] P. Seebock et al., “Exploiting epistemic uncertainty of anatomy segmentation for anomaly detection in retinal oct,” in IEEE Transactions on Medical Imaging. May 2019.
  • [20] L. Fang et al., “Attention to lesion: Lesion-aware convolutional neural network for retinal optical coherence tomography image classification,” in IEEE Transactions on Medical Imaging. August 2019.
  • [21] T. Hassan et al., “RAG-FW: A hybrid convolutional framework for the automated extraction of retinal lesions and lesion-influenced grading of human retinal pathology,” in IEEE Journal of Biomedical and Health Informatics. March 2020.
  • [22] H. Zhao et al., “Pyramid scene parsing network,” in IEEE CVPR. 2017.
  • [23] V. Badrinarayanan et al., “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” in IEEE Transactions on Pattern Analysis and Machine Intelligence. December 2017.
  • [24] O. Ronneberger et al., “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI. 2015.
  • [25] J. Long et al., “Fully convolutional networks for semantic segmentation,” in IEEE CVPR. 2015.
  • [26] R. Rasti et al., “Macular oct classification using a multi-scale convolutional neural network ensemble,” in IEEE Transactions on Medical Imaging. vol. 37, no. 4, pp. 1024-1034, April 2018.
  • [27] T. Mahmudi et al., “Comparison of macular octs in right and left eyes of normal people,” in Proc. SPIE, Medical Imaging, San Diego, California, United States, Feb. 15-20, 2014.
  • [28] S. Farsiu et al., “Quantitative classification of eyes with and without intermediate age-related macular degeneration using optical coherence tomography,” in Ophthalmology. 121(1), 162-172 January 2014.
  • [29] P. P. Srinivasan et al., “Fully automated detection of diabetic macular edema and dry age-related macular degeneration from optical coherence tomography images,” in Biomedical Optics Express. Vol. 5, No. 10, DOI: 10.1364/BOE.5.003568, 12 Sep 2014.
  • [30] T. Hassan et al., “BIOMISA Retinal Image Database for Macular and Ocular Syndromes,” in ICIAR-2018. Portugal, June 2018.
  • [31] D. Kermany et al., “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell. 172(5):1122-1131, 2018.