Deep Learning based Early Detection and Grading of Diabetic Retinopathy Using Retinal Fundus Images

12/27/2018 ∙ by Sheikh Muhammad Saiful Islam, et al. ∙ Manarat International University

Diabetic Retinopathy (DR) is a progressively deteriorating disease and one of the leading causes of vision impairment and blindness. Subtle distinctions among different grades and the existence of many significant small features make the recognition task very challenging. In addition, the present approach to retinopathy detection is a laborious and time-intensive task that relies heavily on the skill of a physician. Automated detection of diabetic retinopathy is therefore essential to tackle these problems. Early-stage detection of diabetic retinopathy is also very important for diagnosis, as timely treatment can prevent blindness. In this paper, we develop a novel deep convolutional neural network that performs early-stage detection by identifying all microaneurysms (MAs), the first signs of DR, and correctly assigns labels to retinal fundus images graded into five categories. We tested our network on the largest publicly available Kaggle diabetic retinopathy dataset and achieved a 0.851 quadratic weighted kappa score and a 0.844 AUC score, attaining state-of-the-art performance on severity grading. In early-stage detection, our proposed method achieved a sensitivity of 98% and a specificity of 94%. At the same time, our proposed architecture is simple and efficient with respect to both computational time and space.




1 Introduction

Diabetic retinopathy, a chronic, progressive eye disease, has become one of the most common causes of vision impairment and blindness, especially among the working-age population worldwide [1]. It results from prolonged diabetes. Diabetic retinopathy mainly affects the blood vessels in the light-sensitive tissue of the eye (i.e., the retina). Non-proliferative diabetic retinopathy (NPDR) occurs when the blood vessels leak blood into the retina. Proliferative DR (PDR), which can cause blindness in the patient, is the stage that follows NPDR.

Figure 1: Example eye image of proliferative diabetic retinopathy. Additional new blood vessels begin to grow on the surface of the retina. Due to their abnormal and fragile nature, retinal hemorrhages and ruptured blood vessels occur at this stage, which can lead to permanent vision loss.

The progression of DR can be categorized into four stages: mild, moderate, and severe non-proliferative diabetic retinopathy, and the advanced stage of proliferative diabetic retinopathy. In mild NPDR, small areas in the blood vessels of the retina, called microaneurysms, swell like balloons. In moderate NPDR, multiple microaneurysms, hemorrhages, and venous beading occur, impairing the vessels' ability to transport blood to the retina. In the third stage, severe NPDR, the deprived retina secretes a growth factor that triggers the formation of new blood vessels. The worst stage of DR is proliferative diabetic retinopathy, illustrated in Fig. 1, in which fragile new blood vessels and scar tissue form on the surface of the retina, increasing the likelihood of blood leaking and leading to permanent vision loss.

At present, retinopathy detection is carried out by a well-trained physician manually inspecting retinal fundus images for vascular abnormalities and structural changes of the retina; the images are captured after dilating the pupil with a dilating agent. Due to the manual nature of DR screening methods, however, results are highly inconsistent across different readers, so automated diabetic retinopathy diagnosis techniques are essential for solving these problems.

Although DR can damage the retina without showing any indication at the preliminary stage [2], successful early-stage detection of DR can minimize the risk of progression to more advanced stages. Diagnosis is particularly difficult at the early stage because the process relies on discerning the presence of microaneurysms, retinal hemorrhages, and other features in the retinal fundus images. Furthermore, accurate detection and staging of DR can greatly improve intervention, ultimately reducing the risk of permanent vision loss.

Earlier automated diabetic retinopathy detection systems were based on hand-crafted feature extraction followed by standard machine learning algorithms for prediction [3]. These approaches suffered greatly from the hand-crafted nature of DR feature extraction, since feature extraction in color fundus images is more challenging than in traditional object detection tasks. Moreover, hand-crafted features are highly sensitive to the quality of the fundus images, the focus angle, and the presence of artifacts and noise. These limitations make it important to develop an effective feature extraction algorithm that can analyze the subtle features relevant to the DR detection task.

In recent times, most computer vision problems have been solved with greater accuracy with the help of modern deep learning algorithms, Convolutional Neural Networks (CNNs) being a prime example. CNNs have proven revolutionary in different fields of computer vision such as object detection and tracking, image and medical disease classification and localization, pedestrian detection, action recognition, etc. The key attribute of a CNN is that it extracts features in a task-dependent and automated way. In this paper, we present an efficient CNN architecture for DR detection on a large-scale database. Our proposed network consists of a multi-layer CNN architecture followed by two fully connected layers and an output layer. Our network outperforms other state-of-the-art networks in early-stage detection and achieves state-of-the-art performance in severity grading of diabetic retinopathy.

The rest of the paper is organized as follows: related work is presented in Section 2, followed by the proposed method in Section 3, while the experimental set-up and results are discussed in Section 4. Finally, we draw our conclusions in Section 5.

2 Related Work

The earlier works on automatic diabetic retinopathy detection were based on designing hand-crafted feature detectors to measure the blood vessels and optic disc, and on counting the presence of abnormalities such as microaneurysms, red lesions, hemorrhages, and hard exudates. Detection was performed on these extracted features using various machine learning methods such as support vector machines (SVM) and k-nearest neighbors (kNN) [3, 4]. In [5], Acharya et al. used features of blood vessel area, microaneurysms, exudates, and hemorrhages with an SVM classifier, reporting its accuracy, specificity, and sensitivity. Roychowdhury et al. [6] developed a two-step hierarchical classification approach, where non-lesions or false positives were rejected in the first step. For lesion classification in the second step, they used classifiers such as the Gaussian mixture model (GMM), kNN, and SVM, and reported sensitivity, specificity, and AUC. However, these types of approaches have the disadvantage of utilizing a limited number of features.

Deep learning based algorithms have become popular in the last few years. For example, standard ImageNet architectures were used in [11, 13]. Furthermore, Kaggle [9] launched a DR detection competition in which all the top-ranked solutions employed CNNs as the key algorithm. Pratta et al. [11] developed a CNN-based model that surpassed human experts in classifying advanced stages of DR. In [14], a CNN-based method was employed to detect microaneurysms for DR stage grading. An ensemble of CNNs was employed to simultaneously detect DR and macular edema by Kori et al. [12], who used a variant of ResNet [15] and densely connected networks [16]. To make model predictions more interpretable, a visual map was generated by Torre et al. [17] using a CNN model, which can be used to detect lesions in the tested retinal fundus images. A similar approach was used in [18], along with the generation of a regression activation map (RAM).

Some research has focused on breaking the classification task down into sub-problem prediction tasks. For example, Yang et al. [19] employed a two-stage deep convolutional neural network methodology, where exudates, microaneurysms, and haemorrhages were first detected by a local network and the subsequent severity grading was performed by a global network. By introducing an unbalanced weight map to emphasize lesion detection, they further improved their AUC. The authors of [23] implemented architectures like the VGG-16 [24] and Inception-v4 [21] networks for DR classification.

Some recent works [22] in diabetic retinopathy have leveraged a mean squared error objective function to convert the classification task into a regression task. There, an ensemble of classical machine learning algorithms, such as the naive Bayes classifier and SVM, together with state-of-the-art ImageNet networks trained with a mean squared error objective, was applied to the DR detection problem, and accuracy, kappa, and F-score were reported.

3 Proposed Method

3.1 Data Preprocessing

Figure 2: Demonstration of the retinal fundus image before and after preprocessing.

The original fundus images exhibit large variations in exposure and lighting, so we applied several preprocessing steps suggested by Graham [26] to standardize the image conditions. First, we re-scaled the images so that the eye in each has the same radius, and subtracted the local average color from each pixel; the local average color of the images was thereby mapped to gray. We also clipped the images to remove boundary effects. Samples of the resulting preprocessed images, along with the original images, are illustrated in Fig. 2.
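As a rough illustration of the color-normalization step, the sketch below is a minimal NumPy-only approximation, not the authors' exact pipeline: the blur width `sigma` and the boundary-clip fraction `keep_frac` are assumptions. It subtracts a Gaussian-blurred local average from each channel (mapping the local average to gray) and grays out the boundary.

```python
import numpy as np

def _gaussian_kernel(sigma):
    # Normalized 1-D Gaussian kernel, truncated at 3 sigma.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def _blur(channel, kernel):
    # Separable blur: convolve rows, then columns.
    tmp = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 1, channel)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 0, tmp)

def enhance(img, sigma=3, keep_frac=0.9):
    """Map the local average color to gray and clip the boundary."""
    img = img.astype(np.float64)
    k = _gaussian_kernel(sigma)
    out = np.stack(
        [4 * img[..., c] - 4 * _blur(img[..., c], k) + 128 for c in range(img.shape[-1])],
        axis=-1,
    )
    # Gray out everything outside a central circle to remove boundary effects.
    h, w = out.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    r = keep_frac * min(h, w) / 2.0
    mask = (yy - (h - 1) / 2.0) ** 2 + (xx - (w - 1) / 2.0) ** 2 <= r**2
    out = np.where(mask[..., None], out, 128.0)
    return np.rint(np.clip(out, 0, 255)).astype(np.uint8)
```

A uniformly colored region maps to the neutral gray value 128, so only local deviations (vessels, lesions, microaneurysms) survive the subtraction.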

3.2 Data Augmentation

Figure 3: Examples of some augmentation operations performed on preprocessed retinal images. After each operation, the augmented image is resized while maintaining the aspect ratio.

The performance of a deep neural network is strongly correlated with the size of the available training data. Although the Kaggle EyePACS dataset is the largest available for retinopathy detection, we could use only its labeled training portion of around 35,000 images for the disease severity grading task, and its classes are heavily imbalanced, requiring us to heavily augment our training data to obtain a model that is stable and not overfitted. The major data augmentation operations that we performed are listed below.

| Grade | Raw | Training | Validation | Operations | Total |
|---|---|---|---|---|---|
| Normal | 25810 | 25610 | 200 | 0 | 25810 |
| Mild NPDR | 2443 | 2243 | 200 | 11 | 26916 |
| Moderate NPDR | 5292 | 5092 | 200 | 4 | 25460 |
| Severe NPDR | 873 | 673 | 200 | 27 | 18844 |
| Proliferative DR | 708 | 508 | 200 | 35 | 18288 |

Table 1: Statistics of the augmentation operations performed on the training set of the Kaggle EyePACS dataset. Due to the highly imbalanced nature of the dataset, different grades were augmented differently; the Operations column gives the number of additional augmented versions generated per training image.
  • Rotation: images were randomly rotated.

  • Shearing: images were randomly sheared with an angle between −20° and 20°.

  • Flip: images were flipped both horizontally and vertically.

  • Zoom: images were randomly stretched by a factor between 1/1.3 and 1.3.

  • Crop: images were randomly cropped to a fraction of the original size.

  • Krizhevsky augmentation: images were augmented by the Krizhevsky color augmentation technique [20].

  • Translation: images were randomly shifted horizontally and vertically.

Also, we scaled and centered each image channel (RGB) to zero mean and unit variance over the dataset. Fig. 3 shows some post-augmentation example images. We eventually obtained up to 35 augmented versions of each training image. Our data augmentation pipeline produces output images at two resolutions.
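A minimal sketch of two of these operations plus the per-channel standardization, in NumPy; the crop fraction and flip probabilities are illustrative assumptions, not the paper's exact settings, and the resize back to a fixed resolution is omitted.

```python
import numpy as np

def augment(img, rng, crop_frac=0.9):
    # Random horizontal / vertical flips.
    if rng.random() < 0.5:
        img = img[:, ::-1]
    if rng.random() < 0.5:
        img = img[::-1, :]
    # Random crop to a fraction of the original size.
    h, w = img.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    return img[y:y + ch, x:x + cw]

def standardize(batch):
    # Scale and center each RGB channel to zero mean and unit
    # variance over the whole dataset (batch shape: N x H x W x C).
    mean = batch.mean(axis=(0, 1, 2), keepdims=True)
    std = batch.std(axis=(0, 1, 2), keepdims=True)
    return (batch - mean) / (std + 1e-7)
```

In practice `standardize` would be fit on the training set only, with the training-set statistics reused at test time.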

3.3 Network Architecture

| Layer Type | Kernel Size & Number | Stride |
|---|---|---|
| Input | – | – |
| Convolution | 4 x 4 x 32 | 2 |
| Convolution | 4 x 4 x 32 | 1 |
| Max-pooling | 3 x 3 | 2 |
| Convolution | 4 x 4 x 64 | 2 |
| Convolution | 4 x 4 x 64 | 1 |
| Max-pooling | 3 x 3 | 2 |
| Convolution | 4 x 4 x 128 | 1 |
| Convolution | 4 x 4 x 128 | 1 |
| Max-pooling | 3 x 3 | 2 |
| Convolution | 4 x 4 x 256 | 1 |
| Max-pooling | 3 x 3 | 2 |
| Convolution | 4 x 4 x 384 | 1 |
| Max-pooling | 3 x 3 | 2 |
| Convolution | 4 x 4 x 512 | 1 |
| Max-pooling | 3 x 3 | 2 |
| Fully connected | 1024 | – |
| Fully connected | 1024 | – |
| Fully connected | 1 | – |

Table 2: The proposed network architecture of our early-stage detection and severity grading model. The depth of the network is 18 layers, the kernel size of the convolutional filters is 4 x 4, and max-pooling of 3 x 3 is used to downsample the activation maps. After the convolutional layers, two fully connected layers of 1024 neurons each, followed by a single-neuron output layer, are added, treating the early-stage detection and severity grading task as a regression problem.

Table 2 illustrates the network architecture of our proposed DR detection method. The input layer of the network accepts the preprocessed fundus images. We tried several kernel sizes (3 x 3, 4 x 4, and 5 x 5) and found the best results with kernels of size 4 x 4. Hence, all the convolutional layers of our network use a kernel size of 4 x 4 with untied biases and a stride of 1, except the first and third convolutional layers, which have a stride of 2.

LeakyReLU [25] was used as the activation function for nonlinearity in all convolutional layers. All max-pooling layers use the same kernel size of 3 x 3. The final extracted local features are flattened before passing through the fully connected layers. There are two fully connected layers, each with 1024 neurons. Dropout was added after all but the last fully connected layer to reduce overfitting. Since it is much worse to misclassify severe NPDR or PDR as a normal eye than as moderate retinopathy, we treat this multi-class classification as a regression problem, so an output layer with a single neuron was added. We took mean squared error as our objective function. We also clipped the predicted values between 0 and 4, since our class labels range between these values.
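The stack in Table 2 can be sanity-checked with a small shape-and-parameter walk-through. The sketch below assumes a hypothetical 448 x 448 x 3 input and 'same' spatial padding (neither is stated here) and counts one shared bias per filter; under these assumptions, the count lands just above the 8.9 million parameters reported in Section 3.4.

```python
import math

# ("conv", out_channels, stride) for convolutions, ("pool", stride) for
# max-pooling, following Table 2. Conv kernels are 4 x 4, pool windows 3 x 3.
LAYERS = [
    ("conv", 32, 2), ("conv", 32, 1), ("pool", 2),
    ("conv", 64, 2), ("conv", 64, 1), ("pool", 2),
    ("conv", 128, 1), ("conv", 128, 1), ("pool", 2),
    ("conv", 256, 1), ("pool", 2),
    ("conv", 384, 1), ("pool", 2),
    ("conv", 512, 1), ("pool", 2),
]
K = 4  # convolutional kernel size

def model_summary(input_size=448, in_ch=3):
    size, ch, params = input_size, in_ch, 0
    for layer in LAYERS:
        if layer[0] == "conv":
            _, out_ch, stride = layer
            params += K * K * ch * out_ch + out_ch  # weights + biases
            ch = out_ch
        else:
            stride = layer[1]
        size = math.ceil(size / stride)  # 'same' padding assumption
    flat = size * size * ch  # flattened features fed to the FC layers
    for n_in, n_out in [(flat, 1024), (1024, 1024), (1024, 1)]:
        params += n_in * n_out + n_out
    return size, flat, params
```

With these assumptions the walk gives a 2 x 2 x 512 map before flattening and roughly 8.9M parameters; the exact numbers depend on the true input size, padding, and bias tying.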

| Layer Type |
|---|
| Input layer |
| Fully connected |
| Maxout |
| Fully connected |
| Maxout |
| Fully connected |

Table 3: Network architecture of the feature blending network

In diabetic retinopathy experiments, blending the features from both eyes of a patient usually leads to a significant improvement in performance [27]. Thus, we blended our features following the state-of-the-art blending method of [27], where the output of our last max-pooling layer is used as the input features to the blending network, as shown in Table 3. To improve feature quality, feature extraction was repeated as many as 40 times with different augmentations per image. The mean and standard deviation of each feature were then used as input to our blending network.
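A sketch of how such a blending input could be assembled; the shapes are illustrative, since the per-eye feature dimension depends on the network.

```python
import numpy as np

def summarize(features):
    # features: (n_augmentations, d) array of pooled CNN features for one eye.
    # Reduce the repeated augmented extractions to per-feature mean and std.
    return np.concatenate([features.mean(axis=0), features.std(axis=0)])

def blend_input(left_features, right_features):
    # Concatenate the summaries for both eyes of one patient; this vector
    # is the input to the blending network of Table 3.
    return np.concatenate([summarize(left_features), summarize(right_features)])
```

For d-dimensional per-eye features the blending input has length 4d (mean and standard deviation for each of the two eyes).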

3.4 Training

Our eighteen-layer-deep proposed network has more than 8.9 million parameters, which were randomly initialized using orthogonal weight initialization. The network was trained with SGD with Nesterov momentum of 0.90 on a fixed schedule over 300 epochs, with data augmentation at each step. L2 regularization with a factor of 0.0005 was applied to all weighted layers. We tested several initial learning rates, and the learning rate was decreased in stages as training progressed. A summary of our training hyperparameter settings is given in Table 4.


| Hyperparameter | Value |
|---|---|
| Objective function | Mean Squared Error (MSE) |
| Learning rate | fixed multi-stage schedule: one rate for the first 80 epochs, lower rates for the next 70, then the next 40, and a final rate for the last 110 epochs |
| Batch size | |

Table 4: Hyperparameter settings of our proposed network architecture

The blending network was trained with the Adam [28] optimization algorithm on a fixed schedule. The ReLU activation function was used after each fully connected layer, and L2 regularization with a factor of 0.001 was applied to every layer. The batch size used was 32, again with mean squared error as our objective function.
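The fixed four-stage schedule of Table 4 can be expressed as a simple lookup. The stage lengths below come from the table; the rate values are hypothetical placeholders, since the exact rates are not preserved in this copy of the text.

```python
# (stage length in epochs, learning rate); lengths follow Table 4,
# rate values are hypothetical placeholders.
SCHEDULE = [(80, 3e-3), (70, 3e-4), (40, 3e-5), (110, 3e-6)]

def learning_rate(epoch):
    """Return the fixed learning rate for a given 0-based epoch."""
    for length, lr in SCHEDULE:
        if epoch < length:
            return lr
        epoch -= length
    return SCHEDULE[-1][1]  # keep the final rate past 300 epochs
```

The schedule is piecewise constant, matching the paper's description of a learning rate that decreases at fixed epoch boundaries rather than continuously.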

4 Experimental Results

4.1 Dataset

There are several publicly available databases of fundus images for DR, including the DIARETDB0 [7], DIARETDB1 [8], Kaggle EyePACS [9], and Messidor [10] databases. The DIARETDB0 dataset contains 130 color fundus images, of which 20 are normal and 110 are affected by DR. DIARETDB1 contains a total of 89 color fundus images, of which 84 contain signs of microaneurysms and five are normal. The Messidor database contains 1200 retinal fundus images.

Figure 4: Examples of poor-quality retinal images in the Kaggle EyePACS dataset, including too-bright, too-dark, and blurred images, which make it difficult for a learning algorithm to accurately classify the DR grades.

In this work, we used EyePACS, the largest publicly available dataset for diabetic retinopathy, from the Kaggle Diabetic Retinopathy Detection competition [9] sponsored by the California Healthcare Foundation. It contains high-resolution retinal fundus images captured under a variety of imaging conditions. The dataset also contains images with artifacts, as well as out-of-focus, too-bright, and too-dark images, as illustrated in Fig. 4.

Fundus images are categorized into five grades according to the severity of DR, named sequentially: healthy or normal image, mild non-proliferative DR, moderate non-proliferative DR, severe non-proliferative DR, and proliferative DR. The dataset is highly imbalanced, with around 74% of the images belonging to grade 0 (no DR). Table 5 shows the distribution of the different grades over the training set of the dataset. We split the dataset into training and test sets as suggested by the Kaggle competition [9] settings. From each class of the training set, we held out 200 images for validation and used the rest for training (Table 1).
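The per-class validation holdout of Table 1 (200 images per grade) can be sketched as follows, assuming an indexable list of per-image grade labels:

```python
import random
from collections import defaultdict

def holdout_split(labels, per_class=200, seed=0):
    """Hold out `per_class` images of every grade for validation."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, grade in enumerate(labels):
        by_class[grade].append(idx)
    val = set()
    for grade, idxs in by_class.items():
        rng.shuffle(idxs)
        val.update(idxs[:per_class])
    train = [i for i in range(len(labels)) if i not in val]
    return train, sorted(val)
```

Holding out the same number of images per grade, rather than a fixed fraction, keeps the heavily imbalanced classes equally represented in the validation set.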

| DR Grade | Grade Name | Total Images | Percentage |
|---|---|---|---|
| 0 | Normal | 25810 | 73.84% |
| 1 | Mild NPDR | 2443 | 6.96% |
| 2 | Moderate NPDR | 5292 | 15.07% |
| 3 | Severe NPDR | 873 | 2.43% |
| 4 | Proliferative DR | 708 | 2.01% |

Table 5: Grade distribution in the training set of the Kaggle EyePACS dataset

4.2 Performance Evaluation on Early-Stage Detection

In this work, we performed two binary classification experiments for early-stage detection: sick (grades 1, 2, 3, 4) vs. healthy (grade 0), and low (grades 0, 1) vs. high (grades 2, 3, 4). We considered both of these sub-problems equally important for early-stage detection.

| Classification Problem | Sensitivity | Specificity |
|---|---|---|
| Healthy (0) vs Sick (1,2,3,4) | 94.5% | 90.2% |
| Low (0,1) vs High (2,3,4) | 98% | 94% |

Table 6: Performance evaluation of our proposed method on DR early-stage detection. Treating early-stage detection as a binary classification problem, our proposed method achieved 98% sensitivity and 94% specificity in low-high DR detection.

We calculated the sensitivity and specificity metrics for both of these binary classification problems and found higher performance in low-high DR classification than in healthy-sick classification. This is because the retinal features of grade 1 (mild NPDR) are very similar to those of grade 0 (healthy) images.
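Both binary sub-problems reduce to grouping the five grades into positive and negative sets before computing the usual confusion-matrix metrics, e.g.:

```python
def sensitivity_specificity(y_true, y_pred, positive_grades):
    # y_true / y_pred: iterables of DR grades 0..4;
    # positive_grades: the set of grades treated as "positive"
    # (e.g. {1, 2, 3, 4} for healthy-vs-sick, {2, 3, 4} for low-vs-high).
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        t_pos, p_pos = t in positive_grades, p in positive_grades
        if t_pos and p_pos:
            tp += 1
        elif t_pos:
            fn += 1
        elif p_pos:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Here sensitivity is the fraction of truly positive eyes flagged as positive, and specificity the fraction of truly negative eyes left unflagged.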

4.3 Performance Evaluation on Severity Grading

The quadratic weighted kappa, a widely used performance metric for ordinal multi-class classification and the suggested evaluation metric for DR [9], is adopted as the performance metric for our severity grading predictions. We applied thresholds to discretize the predicted regression values into integer class labels for computing the kappa score. We achieved a 0.851 quadratic weighted kappa on the test set of the Kaggle dataset after submitting our solution to [9].
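A sketch of the discretization and the quadratic weighted kappa computation; the threshold values below are the natural class midpoints, an assumption, since the paper may have tuned its own.

```python
import numpy as np

def discretize(pred, thresholds=(0.5, 1.5, 2.5, 3.5)):
    # Map continuous regression outputs to integer grades 0..4.
    return np.digitize(pred, thresholds)

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    # Observed agreement matrix.
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic disagreement weights and expected matrix under independence.
    idx = np.arange(n_classes)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()
```

The quadratic weights penalize predictions by the squared distance between the predicted and true grades, so confusing PDR with a healthy eye costs far more than confusing adjacent grades.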

| Metric | Value |
|---|---|
| Mean squared error | |
| Quadratic weighted kappa | 0.851 |
| Area under the ROC curve | 0.844 |

Table 7: Performance evaluation of our proposed method on severity grading of diabetic retinopathy.

We also calculated the Area Under the ROC Curve (AUROC) and the F-score of our proposed architecture on the same dataset; the AUROC achieved was 0.844.

5 Conclusion

In this paper, we have presented a novel CNN-based deep neural network to perform early-stage detection and severity grading of diabetic retinopathy in retinal fundus images. In our work, we found that without heavy data augmentation, a high-capacity network can easily overfit the training data. Even with data augmentation, any network can overfit on oversampled classes such as the healthy eye (grade 0). Thus, designing a small-capacity network with L2 regularization and dropout is of significant importance in retinopathy detection. Accordingly, we have presented a 4 x 4 kernel based CNN with several preprocessing and augmentation methods to improve the performance of the architecture. Our network achieved 98% sensitivity and more than 94% specificity in early-stage detection, and a kappa score of more than 0.85 in severity grading, on the challenging Kaggle EyePACS dataset. The experimental results demonstrate that our proposed algorithm is effective enough to be considered for clinical applications.


  • [1] Congdon, N.-G., Friedman, D.-S., Lietman, T.: Important causes of visual impairment in the world today. Jama 290 (15), 2057–2060 (2003).
  • [2] Melville, A., et al.: Complications of diabetes: screening for retinopathy and management of foot ulcers. Qual. Saf. Health Care. 9(2), 137–141 (2000)
  • [3] Silberman, N., Ahrlich, K., Fergus, R., Subramanian, L.: Case for automated detection of diabetic retinopathy. In: AAAI Spring Symposium: Artificial Intelligence for Development, (2010)

  • [4] Sopharak, A., Uyyanonvara, B. and Barman, S.: Automatic exudate detection from non-dilated diabetic retinopathy retinal images using fuzzy c-means clustering. In: Sensors, 9(3), pp. 2148–2161, (2009)
  • [5] Acharya, U.-R., Lim, C.-M., Ng, E.-Y.-K., Chee, C., Tamura, T.: Computer-based detection of diabetes retinopathy stages using digital fundus images. In: Proceedings of the institution of mechanical engineers, part H: journal of engineering in medicine 223(5), pp. 545–553, (2009)
  • [6] Roychowdhury, S., Koozekanani, D.D., Parhi, K.K.: DREAM: diabetic retinopathy analysis using machine learning. IEEE journal of biomedical and health informatics 18(5), pp. 1717–1728 (2014)
  • [7] Kauppi, T., Kalesnykiene, V., Kamarainen, J.K., Lensu, L., Sorri, I., Uusitalo, H., Kälviäinen, H., Pietilä, J.: DIARETDB0: Evaluation database and methodology for diabetic retinopathy algorithms. Machine Vision and Pattern Recognition Research Group, Lappeenranta University of Technology, Finland (2006)

  • [8] Kamarainen, T.K.K.K., Sorri, L., Pietilä, A.R.V., Uusitalo, H.K.: the DIARETDB1 diabetic retinopathy database and evaluation protocol. In: Proceedings of British Machine Vision Conference, pp. 15.1–15.10. BMVA Press (2007)
  • [9] Kaggle: Diabetic Retinopathy Detection., (2015)
  • [10] Decencière, E., Zhang, X., Cazuguel, G., Lay, B., Cochener, B., Trone, C., Gain, P., Ordonez, R., Massin, P., Erginay, A., Charton, B.: Feedback on a publicly distributed image database: the Messidor database. Image Analysis & Stereology 33(3), 231–234 (2014)
  • [11] Pratta, H., Coenenb, F., Broadbentc, D.-M., Hardinga, S.-P., Zheng, Y.: Convolutional neural networks for diabetic retinopathy. In: Procedia Computer Science, pp. 200–205. (2016)
  • [12] Kori, A., Chennamsetty, S.-S., Alex, V.: Ensemble of Convolutional Neural Networks for Automatic Grading of Diabetic Retinopathy and Macular Edema. In: arXiv preprint arXiv:1809.04228, (2018)
  • [13] Wang, S., Yin, Y., Cao, G., Wei, B., Zheng, Y., Yang, G.: Hierarchical retinal blood vessel segmentation based on feature and ensemble learning. In: Neurocomputing, 149, 708–717 (2015)
  • [14] Antal, B., Hajdu, A.: An ensemble-based system for microaneurysm detection and diabetic retinopathy grading. In: IEEE transactions on biomedical engineering, 59(6), p. 1720. (2012)
  • [15] He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In. European Conference on Computer Vision (ECCV), pp. 630–645, Springer, Cham (2016)
  • [16] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.-Q.: Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), vol. 1, p. 3. (2017)
  • [17] de la Torre, J., Valls, A., Puig, D.: A Deep Learning Interpretable Classifier for Diabetic Retinopathy Disease Grading. In: arXiv preprint arXiv:1712.08107, (2017)
  • [18] Wang, Z., Yang, J.: Diabetic Retinopathy Detection via Deep Convolutional Networks for Discriminative Localization and Visual Explanation. In: arXiv preprint arXiv:1703.10757, (2017)
  • [19] Yang, Y., Li, T., Li, W., Wu, H., Fan, W., Zhang, W.: Lesion detection and grading of diabetic retinopathy via two-stages deep convolutional neural networks. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 533-540. Springer, Cham (2017)
  • [20] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105 (2012)
  • [21] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: AAAI, vol. 4, p. 12. (2017)

  • [22] Butterworth, D.T., Mukherjee, S., Sharma, M.: Ensemble Learning for Detection of Diabetic Retinopathy
  • [23] Bravo, M.-A., Arbeláez, P.A.: Automatic diabetic retinopathy classification. In: 13th International Conference on Medical Information Processing and Analysis, vol. 10572, p. 105721E (2017).
  • [24] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR, (2015)
  • [25] Maas, A.L., Hannun, A.Y. and Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of ICML, vol. 30, p. 3. (2013)
  • [26] Graham, B.: Kaggle diabetic retinopathy detection competition report. (2015)
  • [27] Antony, M., Brüggemann, S.:Kaggle Diabetic Retinopathy Detection Team o_O solution. (2015)
  • [28] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. International Conference for Learning Representations (ICLR) (2015)