Optic-Net: A Novel Convolutional Neural Network for Diagnosis of Retinal Diseases from Optical Tomography Images

by Sharif Amit Kamran, et al.

Diagnosing different retinal diseases from Spectral Domain Optical Coherence Tomography (SD-OCT) images is a challenging task. Different automated approaches such as image processing, machine learning and deep learning algorithms have been used for early detection and diagnosis of retinal diseases. Unfortunately, these are prone to error and computational inefficiency, which requires further intervention from human experts. In this paper, we propose a novel convolutional neural network architecture to successfully distinguish between different degenerations of retinal layers and their underlying causes. The proposed architecture outperforms other classification models while addressing the issue of gradient explosion. Our approach reaches near-perfect accuracies of 99.8% and 100% on two publicly available Retinal SD-OCT data-sets, respectively. Additionally, our architecture predicts retinal diseases in real time while outperforming human diagnosticians.



Code Repositories


[ICMLA'19] [Tensorflow] Classifying different Retinal Diseases using Deep Learning from Optical Coherence Tomography Images


I Introduction

Diabetes is a major health concern which affects up to 7.2% of the population world-wide, and the number of affected people could rise to 600 million by the year 2040 [1, 2]. Roughly one in three diabetic patients develops Diabetic Retinopathy (DR) [3], a major cause of vision loss that affects nearly 2.8% of the population [4]. Although effective vision tests for DR screening and early treatment exist in developed countries, avoiding erroneous results has always been a challenge for diagnosticians. Moreover, DR is often mistreated in developing and poorer economies, where access to trained ophthalmologists and eye-care equipment may be insufficient. An automated system that helps diagnose Diabetic Retinopathy and other related retinal diseases with high precision and speed is therefore urgently needed. This paper proposes a novel convolutional neural network architecture that can identify Diabetic Retinopathy while categorizing multiple retinal diseases with near-perfect accuracy.

In ophthalmology, Spectral Domain Optical Coherence Tomography (SD-OCT) is used to view the morphology of the retinal layers [5]. Moreover, depth-resolved tissue formation data, encoded in the magnitude and delay of the back-scattered light and retrieved by spectral analysis, is also used to treat these diseases [6]. Although this process produces the image, the differential diagnosis is still conducted by an ophthalmologist. Consequently, there will always be room for human error while performing the differential. Hence, an expert system is required to distinguish between different retinal diseases with fewer mistakes.

One of the major reasons for misclassification is the stark similarity between Diabetic Retinopathy and other retinal diseases. These can be grouped into three major categories: i) Diabetic Macular Edema (DME) and Age-related degeneration of retinal layers (AMD), ii) Drusen, a condition where lipid or protein build-up occurs in the retinal layer, and iii) Choroidal Neovascularization (CNV), a growth of new blood vessels in the sub-retinal space. Diabetic Retinopathy and Age-related Macular Degeneration are the most common causes of retinal disease worldwide [7], while Drusen acts as an underlying cause that can trigger DR or AMD over a prolonged time-frame. Choroidal Neovascularization, on the other hand, is an advanced stage of age-related macular degeneration that affects about 200,000 people worldwide every year [8, 9].

Despite a decade of improvements to existing algorithms, identification of retinal diseases still produces erroneous results and requires expert intervention. To address this problem, we propose a novel architecture which not only identifies retinal diseases in real time but also performs better than human experts on specific tasks. In the following sections, we elaborate on our principal contributions and provide a comparative analysis of different approaches.

Type of Convolution used in the Middle of Residual Unit   Approximate # Parameters (Without Counting Bias)   Depletion Factor for Parameters (Compared to Regular Convolution)   Test Accuracy
Regular Convolution   36,864   100%   99.30%
Atrous Convolution   16,384   44.9%   97.29%
Separable Convolution   4,672   12.5%   98.03%
Atrous Separable Convolution   4,352   11.6%   96.69%
Atrous Convolution and Atrous Separable Convolution Branched   5,248   14.4%   99.80%
  • Here, kernel size (f, f) = (3, 3). The depth (# kernels) of the residual unit's middle operation and of its first operation is 64 in each case.

  • The Test Accuracy reported in the table is obtained by training on OCT2017 [10] data-set, while the backbone network is Optic-Net 71.

TABLE I: Comparison between different convolution operations used in the middle portion of residual unit.

II Literature Review

II-A Traditional Image Analysis

The earliest approaches to detecting and classifying retinal diseases from images combined multiple image processing techniques with feature extraction and classification [11]. One such automated technique found abnormalities such as micro-aneurysms, haemorrhages, exudates and cotton-wool spots in Retinal Fundus images [12]. This approach used noise reduction and blurring to split the four-class problem into two two-class problems. Background subtraction followed by shape estimation was then used to extract important features, and those features were finally used to classify each of the four abnormalities. A similar feature-based technique was used for detecting Diabetic Macular Edema and Choroidal Neovascularization, where the images were analysed with a focus on five distinct parameters: Retinal Thickness, augmentation of Retinal Thickening, Macular volume, retinal morphology and vitreoretinal relationship [13]. Other approaches combined statistical classification with edge detection algorithms to detect sharp edges [14]. Sánchez et al.'s [14] algorithm achieved a sensitivity of 79.6% for classifying Diabetic Retinopathy. Ege et al.'s [12] approach, incorporating a Mahalanobis classifier, detected microaneurysms, haemorrhages, exudates and cotton-wool spots with sensitivities of 69%, 83%, 99% and 80%, respectively. Although each of these techniques showed slight improvements, none of them achieved the desired precision.

II-B Segmentation based approaches

The most notable sign that a patient has Diabetic Macular Edema is an enlargement of macular density in the retinal layers [15, 6]. Many proposed and implemented approaches involve segmentation of the retinal layers, and some further identify likely causes of fluid build-up in the sub-retinal space [16, 17, 18]. In [19, 20], the authors proposed segmenting the intra-retinal layers into ten parts and then extracting texture and depth information from each layer; aberrant retinal features are subsequently detected by classifying the dissimilarity between healthy and diseased retinas. Niemeijer et al. [16] introduced a technique for 3D segmentation of fluid-containing regions in OCT images using a graph-based implementation, in which a graph-cut algorithm produces the final predictions from information initially retrieved through layer-based segmentation of fluid regions. Even though implementations based on a prior segmentation of retinal layers have reported high prediction scores, this initial step is reportedly troublesome and error-prone [21, 22]. As reported in [23], retinal thickness measurements obtained by different systems differ starkly, so comparing retinal depth information retrieved by separate machines is not effective. This reinforces the conclusion that segmentation-based approaches are not effective as a universal retinal disease recognition system.

II-C Machine Learning and Deep Learning techniques

Lately, combinations of machine learning and deep learning architectures have become the go-to approach for achieving state-of-the-art accuracy in recognizing various retinal diseases [24, 25, 26]. Awais et al. [34] combined VGG16 [27] with KNN and Random Forest classifiers (100 trees) to create a deep classification architecture for differentiating between Normal Retina and Diabetic Macular Edema. On the other hand, Lee et al. [25] used a standalone VGG16 architecture with a binary output to detect Age-related Macular Degeneration (AMD). Although these techniques exploit automatic feature learning from large arrays of images, the architectures themselves are not efficient in terms of speed and memory usage. Moreover, transfer learning methods depend on weeks of training on millions of images and are not ideal for finding stark differences between retinal diseases. To alleviate these challenges, an architecture catered specifically to identifying retinal diseases with high precision, speed and low memory usage is needed.

Fig. 1: Illustration of the Building Blocks of our proposed CNN [OpticNet-71]. Only the very first convolution (7×7) layer in the CNN and the very last convolution layer in the Residual Convolutional Unit of Stages 2, 3 and 4 use stride 2, while all other convolution operations use stride 1.

II-D Our Contributions

In this work, we propose a novel convolutional neural network which specializes in identifying retinal diseases with near-perfect precision. Through this architecture, we propose (a) a new residual unit subsuming Atrous Separable Convolution, (b) a novel building block and (c) a mechanism to prevent gradient degradation. The proposed network outperforms other architectures with respect to number of parameters, accuracy and memory size. Our proposed architecture is trained from scratch and benchmarked on two publicly available data-sets: OCT2017 [10] and Srinivasan2014 [5]. Hence, it does not require any pre-trained weights, reducing the training and deployment time of the model many-fold. We believe that deploying this model can enable rapid identification and treatment with near-perfect certainty. Additionally, it can give ophthalmologists a second expert opinion for their differential diagnosis.

III Proposed Methodology

Fig. 1 illustrates the Deep Convolutional Neural Network (CNN) architecture we propose for classifying retinal diseases from Optical Coherence Tomography (OCT) images. In Fig. 1(a), we delineate how the proposed Residual Learning Unit improves feature learning capabilities, and we discuss the techniques we adopt to reduce the computational complexity of this performance enhancement. Fig. 1(b) depicts the proposed mechanism to handle gradient degradation, while Fig. 1(c) shows the entire CNN architecture. We discuss the constituent segments of our CNN architecture, called Optic-Net, in the following subsections.

III-A Proposed Residual Learning Mechanism

Historically, the Residual Units [28, 29] used in Deep Residual Convolutional Neural Networks (CNNs) process the incoming input through three convolution operations and add the incoming input to the processed output. These three convolution operations are (1×1), (3×3) and (1×1) convolutions. Therefore, replacing the (3×3) convolution in the middle with other types of convolution operations can potentially change the learning behaviour, the computational complexity and, eventually, the prediction performance.

We experimented with different convolution operations as replacements for the (3×3) middle convolution and observed which choice contributes most to reducing the number of parameters, and hence the computational complexity, as shown in Table I. Table I also reports a depletion factor for parameters: the ratio of the number of parameters in the replacement convolution to that in a regular convolution, expressed as a percentage. The first four rows of Table I indicate that Atrous Separable Convolution is the most computationally effective method. However, as Table I also shows, our experiment indicates that it does not lead to the best prediction performance.
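The parameter counts in Table I can be reproduced with the standard bias-free formulas: a k×k regular or atrous convolution from c_in to c_out channels has k·k·c_in·c_out weights (dilation adds none), and a depthwise-separable one has k·k·c_in + c_in·c_out. The minimal sketch below assumes 64 input and 64 output channels for the first four rows. Reproducing the 5,248 entry of the branched row additionally requires each branch to map 32 channels to 32 channels (half the depth); that is our inference from the numbers, not a statement from the text. Function names are illustrative.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k regular or atrous convolution, bias excluded.

    Dilation spreads the taps apart but adds no weights.
    """
    return k * k * c_in * c_out


def separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise-separable convolution, bias excluded:
    one k x k depthwise filter per input channel plus a 1 x 1 pointwise mix."""
    return k * k * c_in + c_in * c_out


def table_one_counts(depth: int = 64) -> dict:
    """Parameter counts for the five middle-convolution variants of Table I."""
    half = depth // 2  # assumed in/out depth of each branch in the branched unit
    return {
        "Regular Convolution (3x3)": conv_params(3, depth, depth),
        "Atrous Convolution (2x2, rate 2)": conv_params(2, depth, depth),
        "Separable Convolution (3x3)": separable_params(3, depth, depth),
        "Atrous Separable Convolution (2x2, rate 2)": separable_params(2, depth, depth),
        "Atrous + Atrous Separable, Branched": (
            conv_params(2, half, half) + separable_params(2, half, half)
        ),
    }


if __name__ == "__main__":
    counts = table_one_counts()
    baseline = counts["Regular Convolution (3x3)"]
    for name, count in counts.items():
        # Depletion factor: parameters relative to the regular 3x3 convolution.
        print(f"{name:44s} {count:>7,d} {100 * count / baseline:6.1f}%")
```

Running it prints the five counts listed in Table I. The plain ratios it prints for the last four rows (44.4%, 12.7%, 11.8% and 14.2%) differ slightly from the rounded depletion factors listed in the table.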

In this work, however, we replace the middle (3×3) convolution operation with two different operations running in parallel, as detailed in Fig. 1(a). Whereas a conventional residual unit uses the full number of channels for the middle convolution, we use half that number of channels for each of the newly introduced operations to prevent any surge in parameters. In the proposed branching operation, the left branch uses a (2×2) Atrous convolution with dilation rate 2 to obtain a (3×3) receptive field, while the right branch uses a (2×2) Atrous separable convolution with dilation rate 2 to obtain the same (3×3) receptive field. The outputs of the two branches are then added together. Separable convolution [30] processes the spatial and depth-wise (channel) dimensions separately, while Atrous convolution inspects both spatial and depth channels together. We hypothesize that adding two such feature maps, learned in very different ways, helps trigger more robust and subtle features.
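The (3×3) receptive field of a (2×2) kernel with dilation rate 2 follows from the usual effective-kernel-size formula, k_eff = k + (k − 1)(d − 1). The minimal sketch below (helper names are ours) also lists the tap offsets along one axis, showing that the four taps of the dilated (2×2) kernel sit at the corners of a (3×3) window, so the receptive field grows without adding weights.

```python
def effective_kernel_size(k: int, dilation: int) -> int:
    """Span covered by a k-tap kernel whose taps are `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)


def tap_offsets(k: int, dilation: int) -> list:
    """Positions sampled along one axis by a dilated kernel, starting at 0."""
    return [i * dilation for i in range(k)]


if __name__ == "__main__":
    # 2x2 kernel with dilation rate 2, as used in both branches
    print(effective_kernel_size(2, 2))  # 3
    print(tap_offsets(2, 2))            # [0, 2]
    # A regular 3x3 kernel (dilation 1) covers the same span with 9 taps
    print(effective_kernel_size(3, 1))  # 3
```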


Fig. 2 shows how adding Atrous and Atrous separable feature maps helps disentangle the input image space with more depth information instead of activating only the predominant edges. Moreover, the last row of Table I confirms that adopting this strategy still reduces the computational complexity by a reasonable margin (to 14.4% of the parameters of a regular convolution) while improving inference accuracy. Equation (1) further clarifies how input signals travel through the proposed residual unit shown in Fig. 1(a), where refers to the convolution operation.

Fig. 2: Atrous Separable Convolution. Exploiting convolutions instead of that yields more fine-grained and coarse features with better depth resolution compared to regular Atrous Convolution.

III-B Proposed Building Block and Signal Propagation

In this section, we discuss the proposed building block as a constituent part of Optic-Net. As shown in Fig. 1(b), we split the input signal into two branches: (1) a Stack of Residual Units and (2) Signal Exhaustion. Later in this section, we explain how we connect these two branches to propagate signals further in the network.


III-B1 Stack of Residual Units

To build a learning chain that propagates signals through several linearly stacked residual units, we combine the global residual effect of pre-activation residual units [29] with our proposed set of convolution operations (Fig. 1(a)). As shown in (1), a single function denotes the entire set of proposed convolution operations inside a residual unit for a given input. We stack these residual units sequentially N times over the input of our proposed building block, as shown in Fig. 1(b). Equation (2) gives the output signal after it is processed through a stack of N residual units; for brevity, we refer to this output as the residual signal in the rest of this section.
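A compact way to write one residual unit and its N-fold stack is sketched below, using our own symbols, which may differ from those of the paper's Equations (1) and (2). It follows the pre-activation ordering of [29] (batch normalization and ReLU before every convolution); applying that ordering to the two middle branches is our assumption. A and S stand for the (2×2), dilation-2 Atrous and Atrous separable convolutions. The last line follows by unrolling x_{i+1} = x_i + F(x_i), so the block input x_0 reaches the end of the stack through an identity path.

```latex
\begin{aligned}
h_1 &= W_{1\times 1} \ast \mathrm{ReLU}\big(\mathrm{BN}(x_i)\big) \\
h_2 &= A^{(d=2)}_{2\times 2}\Big(\mathrm{ReLU}\big(\mathrm{BN}(h_1)\big)\Big)
     + S^{(d=2)}_{2\times 2}\Big(\mathrm{ReLU}\big(\mathrm{BN}(h_1)\big)\Big) \\
\mathcal{F}(x_i) &= W'_{1\times 1} \ast \mathrm{ReLU}\big(\mathrm{BN}(h_2)\big) \\
x_{i+1} &= x_i + \mathcal{F}(x_i) \\
x_N &= x_0 + \sum_{i=0}^{N-1} \mathcal{F}(x_i)
\end{aligned}
```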

III-B2 Signal Exhaustion

In the proposed building block, we propagate the input signal through a max-pooling layer to achieve spatial down-sampling, and then up-sample the result through bilinear interpolation. Since the down-sampling module only forwards the strongest activations, the interpolated reconstruction builds a dense spatial volume from the down-sampled representation, intrinsically exhausting the incoming signal. As detailed in Fig. 1(b), we then pass the exhausted signal space through a sigmoid activation. Recent research [31] has shown how auto-encoding with residual skip connections improves attention-oriented classification performance. However, unlike auto-encoders, max-pooling and bilinear interpolation have no learnable parameters. In Optic-Net, we enable the CNN to activate spikes from an exhausted signal space, which we use as a mechanism to avert gradient degradation. For brevity, we refer to the output of this exhausted-signal activation module as the exhausted signal.
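Because the exhaustion branch has no learnable weights, it can be sketched without a deep-learning framework. The minimal single-channel example below assumes a 2×2 max-pooling window with stride 2 (our choice) and up-samples back to the input resolution with bilinear interpolation in the align-corners convention; frameworks also offer other conventions (for example half-pixel centres), so exact values may differ from Optic-Net's implementation. Function names are illustrative.

```python
import math


def max_pool_2x2(grid):
    """2x2 max-pooling with stride 2 on a 2-D list (even height and width assumed)."""
    h, w = len(grid), len(grid[0])
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def bilinear_resize(grid, out_h, out_w):
    """Bilinear interpolation with the align-corners convention:
    the corner pixels of the input and output grids coincide."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along the width on two rows, then along the height.
            top = (1 - wx) * grid[y0][x0] + wx * grid[y0][x1]
            bottom = (1 - wx) * grid[y1][x0] + wx * grid[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def exhaust_signal(grid):
    """Max-pool, up-sample back to the input size, then apply a sigmoid.
    None of these steps has learnable parameters."""
    pooled = max_pool_2x2(grid)
    restored = bilinear_resize(pooled, len(grid), len(grid[0]))
    return [[sigmoid(v) for v in row] for row in restored]


if __name__ == "__main__":
    feature_map = [
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 2.0],
        [3.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(max_pool_2x2(feature_map))  # [[1.0, 2.0], [3.0, 0.0]]
    for row in exhaust_signal(feature_map):
        print([round(v, 3) for v in row])
```

Only the strongest activation of each 2×2 window survives pooling, and interpolation spreads it back over a dense map, which is the "exhausted" signal the text describes.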

Layer Name ResNet50 V1[28] OpticNet71 [Ours]
Stage1: Res Conv
Stage1: Res Unit
Stage2: Res Conv
Stage2: Res Unit
Stage3: Res Conv
Stage3: Res Unit
Stage4: Res Conv
Stage4: Res Unit
Global Avg Pool 2048 2048
Dense Layer 1 K (Classes) 256
Dense Layer 2 – K (Classes)
Parameters 25.64 Million 12.50 Million
Required FLOPs 3.8 × 10^9 2.5 × 10^9
CNN Memory 98.20 MB 48.80 MB
TABLE II: Architectural Specifications for OpticNet-71 and Layer-wise Analysis for Number of Feature Maps in Comparison with ResNet50-v1 [28].

III-B3 Signal Propagation

As shown in Fig. 1(b), we combine the residual signal and the exhausted signal following (3), and we refer to the result as the output signal of the proposed building block. Our hypothesis behind this design is that whenever one of the branches falls prey to gradient degradation on a mini-batch, the other branch still propagates signals unaffected by that mini-batch, with an amplified absolute gradient. In support of this hypothesis, (3) illustrates how the unaffected branch survives the degradation in the affected branch. However, when neither branch is affected, the multiplication balances out the increase in signal propagation caused by adding both branches. Equation (4) gives the gradient of the building block's output with respect to its input, calculated during back-propagation.
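One reading of this description, consistent with Section IV-E (the sigmoid-activated exhausted signal is multiplied with the residual output and the raw signals are added as well), is the element-wise combination Y = R + E + R ⊙ E of the residual signal R and the exhausted signal E. This is our interpretation for illustration, not a transcription of Equation (3). Under it, ∂Y/∂R = 1 + E and ∂Y/∂E = 1 + R, so if one branch collapses to zero the other still passes through with a unit local gradient, which is the behaviour the hypothesis describes. The framework-free sketch below checks these properties on plain lists; function names are ours.

```python
def combine(residual, exhausted):
    """Element-wise Y = R + E + R * E (our reading, not the paper's Eq. (3))."""
    return [r + e + r * e for r, e in zip(residual, exhausted)]


def local_gradients(residual, exhausted):
    """Element-wise partial derivatives of Y: dY/dR = 1 + E and dY/dE = 1 + R."""
    d_r = [1.0 + e for e in exhausted]
    d_e = [1.0 + r for r in residual]
    return d_r, d_e


if __name__ == "__main__":
    R = [0.5, -1.0, 2.0]      # residual-stack output (not activated)
    dead = [0.0, 0.0, 0.0]    # idealized branch that has collapsed to zero
    print(combine(R, dead))                # [0.5, -1.0, 2.0]: residual signal passes
    print(combine(dead, [0.3, 0.6, 0.9]))  # [0.3, 0.6, 0.9]: exhausted signal passes
    print(local_gradients(R, dead)[0])     # [1.0, 1.0, 1.0]: unit gradient reaches R
```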

III-C CNN Architecture and The Optimization Chain

Fig. 1(c) portrays the entire CNN architecture with all the building blocks and constituent components joined together. First, the input batch (224×224×3) is propagated through a 7×7 convolution with stride 2 that follows batch normalization and ReLU activation. Then we propagate the signals through a Residual Convolution Unit (the same unit as used in [29]), followed by our proposed building block. We repeat this procedure (a Residual Convolution Unit followed by a Building Block) S = 4 times, and we call these repetitions stages 1, 2, 3 and 4, respectively. Global average pooling is then applied to the signals, which pass through two more Fully Connected (FC) layers before the loss function.


In Table II, we show the number of feature maps (Layer Depth) used for each layer in the network. The output shapes of the input tensor after the four consecutive stages are (112×112×256), (56×56×512), (28×28×1024) and (14×14×2048), respectively. Moreover, the first FC layer has 256 units and the second has K units, where K is the number of classes.
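The four output shapes follow from the strides stated in the caption of Fig. 1. A minimal sketch, assuming 'same' padding so that every stride-2 convolution halves an even spatial size (the function name is ours); because no pooling step is described after the stem, stage 1 keeps the 112×112 resolution produced by the 7×7 convolution.

```python
def stage_output_shapes(input_hw: int = 224, stage_depths=(256, 512, 1024, 2048)):
    """(height, width, depth) after each of the four stages.

    Assumes 'same' padding, so each stride-2 convolution halves an even size,
    and that only the 7x7 stem convolution and the Residual Convolution Unit
    of stages 2-4 use stride 2 (per the caption of Fig. 1).
    """
    hw = input_hw // 2  # 7x7 stem convolution with stride 2: 224 -> 112
    shapes = []
    for stage, depth in enumerate(stage_depths, start=1):
        if stage > 1:
            hw //= 2  # stride-2 convolution in this stage's Residual Convolution Unit
        shapes.append((hw, hw, depth))
    return shapes


if __name__ == "__main__":
    for stage, shape in enumerate(stage_output_shapes(), start=1):
        print(f"stage {stage}: {shape}")
```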


Equation (5) represents the gradient calculated for the entire network chain, distributed over the stages of optimization. As (4) suggests, and in contrast to [29], one of its terms works as an extra layer of protection against possible gradient explosion caused by the stacked residual units, by multiplying non-zero activations with the residual units' gradients. Moreover, (5) indicates that the optimization chain still has access to signals from much earlier in the network, and that the corresponding term can still mitigate unwanted spikes in activations and the gradient expansion that could otherwise jeopardize learning.

Normal Drusen CNV DME
Normal 0 1 1 1
Drusen 1 0 1 1
CNV 4 2 0 1
DME 4 2 1 0
  • CNV : Choroidal Neovascularization

  • DME : Diabetic Macular Edema

TABLE III: Penalty Weights Proposed for OCT2017 [10]
Fig. 3: Confusion matrix generated by OpticNet-71 for OCT2017[10] data-set.

IV Experiments

IV-A Specifications of Data-sets and Pre-processing Techniques

We benchmark our model on two distinct data-sets (differing in scale, sample space, etc.). The first data-set, OCT2017 [10], requires correctly recognizing and differentiating between four distinct retinal states: normal healthy retina, Drusen, Choroidal Neovascularization (CNV) and Diabetic Macular Edema (DME). The OCT2017 [10] data-set contains 84,484 images (provided as high-quality TIFF files with 3 non-RGB color channels), which we split into a train set of 83,484 images and a test set of 1,000 images. The second data-set, Srinivasan2014 [5], consists of three classes and aims at classifying normal healthy retinas, Age-Related Macular Degeneration (AMD) and Diabetic Macular Edema (DME). It contains 3,231 image samples, which we split into a train set of 2,916 images and a test set of 315 images. We resize images from both data-sets to 224×224 for both training and testing. For both data-sets, we perform 10-fold cross-validation on the training set to select the best models.
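The 10-fold cross-validation on the training split can be sketched as follows. This is a generic illustration: the text does not specify how folds were formed, so the contiguous split, the helper name and the omission of shuffling or class stratification are our choices. For the 2,916 Srinivasan2014 training images it produces six folds of 292 images and four of 291.

```python
def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train_idx, val_idx) lists for k contiguous folds.

    Fold sizes differ by at most one sample. In practice the sample order
    would be shuffled (or stratified by class) first; that step is omitted.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


if __name__ == "__main__":
    # Srinivasan2014 training split: 2,916 images.
    sizes = [len(val) for _, val in k_fold_indices(2916, 10)]
    print(sizes)  # [292, 292, 292, 292, 292, 292, 291, 291, 291, 291]
```

Each model is trained on nine folds and validated on the remaining one, and the procedure is repeated so that every fold serves once as the validation set.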

IV-B Performance Metrics

We calculate four metrics to evaluate our CNN model on both data-sets: Accuracy (6), Sensitivity (7), Specificity (8) and a Special Weighted Error (9) from [10]. Here, N is the number of image samples and K is the number of classes, while TP, FP, FN and TN denote True Positives, False Positives, False Negatives and True Negatives, respectively. We report the True Positive Rate (TPR), or Sensitivity (7), and the True Negative Rate (TNR), or Specificity (8), for both data-sets [5, 10]. To do so, we calculate the TPR and TNR for each individual class, sum these values and divide the sum by the number of classes K, i.e. we macro-average them.


As reported in [10], the penalty points for incorrect categorization of a retinal disease can be arbitrary. Table III shows the penalty weights for misidentifying each category, as set by [10]; they are specific to the OCT2017 [10] data-set. To calculate the Weighted Error (9), we multiply element-wise the confusion matrix generated by a specific model (Fig. 3 shows the confusion matrix generated by OpticNet-71 on the OCT2017 [10] data-set) with the weight matrix in Table III, sum the products and divide by the number of samples. Here, the penalty weights from Table III are denoted by W, the model's prediction (confusion matrix) by X, and i, j index the rows and columns of the confusion matrix.
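The metrics in (6) to (9) can be computed directly from a confusion matrix. The minimal sketch below assumes that rows index the true class and columns the predicted class, the reading under which Table III charges 4 points for labelling a CNV case as Normal; the function name is ours. The usage example feeds it a hypothetical confusion matrix built only from the description in Section IV-C1, assuming the 1,000 OCT2017 test images are split evenly (250 per class) and that one Drusen and one DME image are predicted as CNV. Under these assumptions it prints 99.80% accuracy, 99.80% sensitivity, 99.93% specificity and 0.20% weighted error, the values in the last row of Table IV.

```python
def confusion_metrics(cm, weights=None):
    """Accuracy, macro-averaged sensitivity (TPR) and specificity (TNR), and
    optionally the weighted error, from a K x K confusion matrix cm where
    cm[i][j] counts samples of true class i predicted as class j."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / n
    tpr, tnr = [], []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class c predicted as something else
        fp = sum(cm[r][c] for r in range(k)) - tp    # other classes predicted as c
        tn = n - tp - fn - fp
        tpr.append(tp / (tp + fn))
        tnr.append(tn / (tn + fp))
    result = {
        "accuracy": accuracy,
        "sensitivity": sum(tpr) / k,   # macro-average over classes
        "specificity": sum(tnr) / k,
    }
    if weights is not None:
        # Weighted error: sum_ij W_ij * X_ij averaged over the N samples.
        result["weighted_error"] = sum(
            weights[i][j] * cm[i][j] for i in range(k) for j in range(k)
        ) / n
    return result


if __name__ == "__main__":
    # Class order: Normal, Drusen, CNV, DME.
    penalty = [[0, 1, 1, 1],
               [1, 0, 1, 1],
               [4, 2, 0, 1],
               [4, 2, 1, 0]]  # Table III
    # Hypothetical matrix: 250 images per class (assumed), one Drusen and
    # one DME image predicted as CNV.
    cm = [[250, 0, 0, 0],
          [0, 249, 1, 0],
          [0, 0, 250, 0],
          [0, 0, 1, 249]]
    for name, value in confusion_metrics(cm, penalty).items():
        print(f"{name}: {100 * value:.2f}%")
```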

Fig. 4: Test accuracy (%), CNN memory (Mega-Bytes) and model parameters (Millions) on OCT2017 [10] data-set and Srinivasan2014 [5] data-set.
Fig. 5: (a) Visualizing input images from each class through different layers of Optic-Net 71. As shown, the feature maps at the end of each building block learn more fine-grained features, sometimes focusing on the same shapes but in different regions of the image, learning to decide which features lead the image to the ground truth. (b) The learning progression, however, shows how exhausting the signal propagated with residual activation learns to detect thinner edges, delving further into the macular region to learn anomalies. When the signal exhaustion mechanism is used, important features can sometimes be lost during training. Our experiments show that using more of these building blocks reduces this risk of feature loss and improves the overall optimization of Optic-Net 71.

IV-C Training OpticNet-71 and Obtained Results

IV-C1 OCT2017 Data-set

In Table IV, we report a comprehensive study on the OCT2017 [10] data-set, evaluated with Test Accuracy, Sensitivity, Specificity and Weighted Error. OpticNet-71 scores the highest Test Accuracy (99.80%) among existing solutions, with a Sensitivity of 99.80% and a Specificity of 99.93%. Furthermore, its Weighted Error is a mere 0.20%, which can be traced in Fig. 3: our architecture misidentifies one Drusen and one DME sample as CNV, and the penalty weight for each of these misclassifications is only 1 (Table III). Consequently, OpticNet-71 obtains state-of-the-art results on the OCT2017 [10] data-set across all four performance metrics, while surpassing the human benchmarks reported in Table IV.

Architectures   Test Accuracy   Sensitivity   Specificity   Weighted Error
93.40   96.60   94.00   12.70
Expert 2 [10]   92.10   99.39   94.03   10.50
InceptionV3 [10]   96.60   97.80   97.40   6.60
ResNet50-v1 [28]   99.30   99.30   99.76   1.00
MobileNet-v2 [32]   99.40   99.40   99.80   0.60
Expert 5 [10]   99.70   99.70   99.90   0.40
Xception [33]   99.70   99.70   99.90   0.30
OpticNet-71 [Ours]   99.80   99.80   99.93   0.20
TABLE IV: Results on OCT2017 [10] Data-set.

IV-C2 Srinivasan2014 Data-set

We benchmark OpticNet-71 against other methods in Table V on the Srinivasan2014 [5] data-set using three metrics: Accuracy, Sensitivity and Specificity. Among the solutions listed in Table V, Lee et al. [25] use a modified VGG-16, Awais et al. [34] use a VGG architecture with KNN in the final layer, and Karri et al. [35] use GoogLeNet, all with weights obtained through transfer learning on ImageNet [36]. As shown in Table V, OpticNet-71 achieves state-of-the-art results, scoring 100% Accuracy, Sensitivity and Specificity.

Furthermore, to compare with our results (Tables IV and V), we train ResNet50-v1 [28], ResNet50-v2 [29], MobileNet-v2 [32] and Xception [33] from weights pre-trained on the 3.2-million-image ImageNet data-set of 1000 categories [36], whereas we train Optic-Net from scratch with randomly initialized weights.

Architectures Test Accuracy Sensitivity Specificity
Lee et al. [25] 87.63 84.63 91.54
Awais et al. [34] 93.00 87.00 100.00
ResNet50-v1 [28] 94.92 94.92 97.46
Karri et al. [35] 96.00
MobileNet-v2 [32] 97.46 97.46 98.73
Xception [33] 99.36 99.36 99.68
OpticNet-71 [Ours] 100.00 100.00 100.00
TABLE V: Results on Srinivasan2014[5] Dataset

IV-D Hyper-parameter Tuning and Performance Evaluation

The hyper-parameters used to train OpticNet-47, OpticNet-63, OpticNet-71, MobileNet-v2 [32], XceptionNet [33], ResNet50-v2 [29] and ResNet50-v1 [28] are as follows: batch size, b = 8; epochs = 30; learning rate, ; step decay, . We use an adaptive learning rate and decrease it using , if the validation loss does not decrease for six consecutive epochs. Moreover, we set the lowest learning rate to . Furthermore, we use the Adam optimizer with its default parameters, β1 = 0.9 and β2 = 0.999, for all training schemes. We train on the OCT2017 [10] data-set for 44 hours and on the Srinivasan2014 [5] data-set for 2 hours, using an 8 GB NVIDIA GTX 1070 GPU.
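The adaptive schedule described above (lower the learning rate when the validation loss has not decreased for six consecutive epochs, down to a floor) can be sketched without a framework. The decay factor and the learning-rate floor are not given in this text, so the values in the usage example are placeholders rather than the authors' settings, and the class name is ours. Keras provides a comparable callback, `tf.keras.callbacks.ReduceLROnPlateau`, whose `factor`, `patience` and `min_lr` arguments play the same roles (it also has options such as `min_delta` and `cooldown` that this sketch omits).

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `factor` once the validation loss has not
    improved for `patience` consecutive epochs, never going below `min_lr`."""

    def __init__(self, lr, factor, patience=6, min_lr=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.wait = 0  # epochs since the last improvement

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns the new lr."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


if __name__ == "__main__":
    # Placeholder hyper-parameters, not the paper's settings.
    schedule = ReduceOnPlateau(lr=1e-3, factor=0.5, patience=6, min_lr=1e-5)
    losses = [0.9, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
    for epoch, loss in enumerate(losses, start=1):
        print(epoch, schedule.step(loss))
```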

The Inception-v3 model of [10] under-performs both the other pre-trained models and OpticNet-71, as seen in Table IV. OpticNet-71 takes 0.03 seconds to make a prediction on an OCT image, which is real time; while accomplishing state-of-the-art results on the OCT2017 [10] and Srinivasan2014 [5] data-sets, it also surpasses human-level prediction on OCT images, as depicted in Table IV. The human experts are practicing diagnosticians, as reported in [10]: there are 6 diagnosticians, of whom Human Expert 5 performs best and Human Expert 2 performs worst. To validate the optimization strength of our CNN architecture, we also train two smaller versions of OpticNet-71 on both data-sets: OpticNet-47 (configuration [2 2 2 2]) and OpticNet-63 (configuration [3 3 3 3]). Fig. 4 shows how all variants of OpticNet outperform the pre-trained CNNs on the Srinivasan2014 [5] data-set, while OpticNet-71 outperforms all the pre-trained CNNs on the OCT2017 [10] data-set in terms of accuracy as well as performance-memory trade-off.

IV-E Analysis of Proposed Residual Interpolated Block

To understand how the Residual Interpolated Block works, we visualize features by passing a test image through our CNN model. Fig. 5(a) illustrates some of the sharp signals propagated by the residual blocks, while the interpolation reconstruction routine propagates a weak signal activation; yet the resulting signal space is both sharper and more fine-grained than its residual counterpart. Since the convolution layers in the following stage activate the incoming signals first, we do not output an activated signal space from a stage. Instead, we activate only the interpolation counterpart, multiply it with the last residual block's non-activated output, and add the raw signals to this product; the result is the output of each stage, as shown in Fig. 5(b). Fig. 5(b) further portrays how combining element-wise addition with element-wise multiplication helps the learning propagation of OpticNet-71. It also shows why this optimization chain is significant: a zero activation can cancel out a live signal channel from the residual counterpart, and a dead signal channel can cancel out a non-zero activation from the interpolation counterpart, but because the raw signals are added as well, not all signals of a stage die at once. This prevents the catastrophic optimization failure that dead weights or gradient explosion would otherwise cause.

V Conclusion

In this work, we propose a novel convolutional neural network that can potentially help demystify the abstract features of different retinal diseases and identify them with human-level precision in real time. Moreover, we incorporate two novel architectural ideas to address the issues of gradient explosion and degradation. We hope to extend this work to boundary segmentation of retinal layers, so that more subtle features and abnormalities can be detected autonomously with higher certainty, which could become a useful tool for ophthalmologists around the world.


We would like to thank the “Center for Cognitive Skill Enhancement” Lab for providing technical support.


  • [1] C. for Disease Control, Prevention et al., “National diabetes statistics report, 2017,” 2017.
  • [2] J. W. Yau, S. L. Rogers, R. Kawasaki, E. L. Lamoureux, J. W. Kowalski, T. Bek, S.-J. Chen, J. M. Dekker, A. Fletcher, J. Grauslund et al., “Global prevalence and major risk factors of diabetic retinopathy,” Diabetes care, vol. 35, no. 3, pp. 556–564, 2012.
  • [3] D. S. W. Ting, G. C. M. Cheung, and T. Y. Wong, “Diabetic retinopathy: global prevalence, major risk factors, screening practices and public health challenges: a review,” Clinical & experimental ophthalmology, vol. 44, no. 4, pp. 260–277, 2016.
  • [4] R. R. Bourne, G. A. Stevens, R. A. White, J. L. Smith, S. R. Flaxman, H. Price, J. B. Jonas, J. Keeffe, J. Leasher, K. Naidoo et al., “Causes of vision loss worldwide, 1990–2010: a systematic analysis,” The lancet global health, vol. 1, no. 6, pp. e339–e349, 2013.
  • [5] P. P. Srinivasan, L. A. Kim, P. S. Mettu, S. W. Cousins, G. M. Comer, J. A. Izatt, and S. Farsiu, “Fully automated detection of diabetic macular edema and dry age-related macular degeneration from optical coherence tomography images,” Biomedical optics express, vol. 5, no. 10, pp. 3568–3577, 2014.
  • [6] K. Alsaih, G. Lemaitre, M. Rastgoo, J. Massich, D. Sidibé, and F. Meriaudeau, “Machine learning techniques for diabetic macular edema (dme) classification on sd-oct images,” Biomedical engineering online, vol. 16, no. 1, p. 68, 2017.
  • [7] D. S. Friedman, B. J. O’Colmain, B. Munoz, S. C. Tomany, C. McCarty, P. De Jong, B. Nemesure, P. Mitchell, J. Kempen et al., “Prevalence of age-related macular degeneration in the united states,” Arch ophthalmol, vol. 122, no. 4, pp. 564–572, 2004.
  • [8] N. Ferrara, “Vascular endothelial growth factor and age-related macular degeneration: from basic science to therapy,” Nature medicine, vol. 16, no. 10, p. 1107, 2010.
  • [9] W. L. Wong, X. Su, X. Li, C. M. G. Cheung, R. Klein, C.-Y. Cheng, and T. Y. Wong, “Global prevalence of age-related macular degeneration and disease burden projection for 2020 and 2040: a systematic review and meta-analysis,” The Lancet Global Health, vol. 2, no. 2, pp. e106–e116, 2014.
  • [10] D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan et al., “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell, vol. 172, no. 5, pp. 1122–1131, 2018.
  • [11] H. Nguyen, A. Roychoudhry, and A. Shannon, “Classification of diabetic retinopathy lesions from stereoscopic fundus images,” in Proceedings of the 19th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. ‘Magnificent Milestones and Emerging Opportunities in Medical Engineering’ (Cat. No. 97CH36136), vol. 1. IEEE, 1997, pp. 426–428.
  • [12] B. M. Ege, O. K. Hejlesen, O. V. Larsen, K. Møller, B. Jennings, D. Kerr, and D. A. Cavan, “Screening for diabetic retinopathy using computer based image analysis and statistical classification,” Computer Methods and Programs in Biomedicine, vol. 62, no. 3, pp. 165–175, 2000.
  • [13] G. Panozzo, B. Parolini, E. Gusson, A. Mercanti, S. Pinackatt, G. Bertoldo, and S. Pignatto, “Diabetic macular edema: an OCT-based classification,” in Seminars in Ophthalmology, vol. 19, no. 1-2. Taylor & Francis, 2004, pp. 13–20.
  • [14] C. I. Sánchez, R. Hornero, M. I. Lopez, and J. Poza, “Retinal image analysis to detect and quantify lesions associated with diabetic retinopathy,” in The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 1. IEEE, 2004, pp. 1624–1627.
  • [15] R. A. Costa, M. Skaf, L. A. Melo Jr, D. Calucci, J. A. Cardillo, J. C. Castro, D. Huang, and M. Wojtkowski, “Retinal assessment using optical coherence tomography,” Progress in Retinal and Eye Research, vol. 25, no. 3, pp. 325–353, 2006.
  • [16] X. Chen, M. Niemeijer, L. Zhang, K. Lee, M. D. Abràmoff, and M. Sonka, “3D segmentation of fluid-associated abnormalities in retinal OCT: Probability constrained graph-search-graph-cut,” IEEE Transactions on Medical Imaging, vol. 31, no. 8, pp. 1521–1531, 2012.
  • [17] A. Lang, A. Carass, M. Hauser, E. S. Sotirchos, P. A. Calabresi, H. S. Ying, and J. L. Prince, “Retinal layer segmentation of macular OCT images using boundary classification,” Biomedical Optics Express, vol. 4, no. 7, pp. 1133–1152, 2013.
  • [18] A. Mishra, A. Wong, K. Bizheva, and D. A. Clausi, “Intra-retinal layer segmentation in optical coherence tomography images,” Optics Express, vol. 17, no. 26, pp. 23719–23728, 2009.
  • [19] G. Quellec, K. Lee, M. Dolejsi, M. K. Garvin, M. D. Abramoff, and M. Sonka, “Three-dimensional analysis of retinal layer texture: identification of fluid-filled regions in SD-OCT of the macula,” IEEE Transactions on Medical Imaging, vol. 29, no. 6, pp. 1321–1330, 2010.
  • [20] K. Lee, M. Niemeijer, M. K. Garvin, Y. H. Kwon, M. Sonka, and M. D. Abramoff, “Segmentation of the optic disc in 3-D OCT scans of the optic nerve head,” IEEE Transactions on Medical Imaging, vol. 29, no. 1, pp. 159–168, 2010.
  • [21] I. Ghorbel, F. Rossant, I. Bloch, S. Tick, and M. Paques, “Automated segmentation of macular layers in OCT images and quantitative evaluation of performances,” Pattern Recognition, vol. 44, no. 8, pp. 1590–1603, 2011.
  • [22] R. Kafieh, H. Rabbani, and S. Kermani, “A review of algorithms for segmentation of optical coherence tomography from retina,” Journal of Medical Signals and Sensors, vol. 3, no. 1, p. 45, 2013.
  • [23] J. Y. Lee, S. J. Chiu, P. P. Srinivasan, J. A. Izatt, C. A. Toth, S. Farsiu, and G. J. Jaffe, “Fully automatic software for retinal thickness in eyes with diabetic macular edema from images acquired by Cirrus and Spectralis systems,” Investigative Ophthalmology & Visual Science, vol. 54, no. 12, pp. 7595–7602, 2013.
  • [24] G. Lemaître, M. Rastgoo, J. Massich, C. Y. Cheung, T. Y. Wong, E. Lamoureux, D. Milea, F. Mériaudeau, and D. Sidibé, “Classification of SD-OCT volumes using local binary patterns: experimental validation for DME detection,” Journal of Ophthalmology, vol. 2016, 2016.
  • [25] C. S. Lee, D. M. Baughman, and A. Y. Lee, “Deep learning is effective for classifying normal versus age-related macular degeneration OCT images,” Ophthalmology Retina, vol. 1, no. 4, pp. 322–327, 2017.
  • [26] M. Treder, J. L. Lauermann, and N. Eter, “Automated detection of exudative age-related macular degeneration in spectral domain optical coherence tomography using deep learning,” Graefe’s Archive for Clinical and Experimental Ophthalmology, vol. 256, no. 2, pp. 259–265, 2018.
  • [27] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [28] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  • [29] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in European Conference on Computer Vision. Springer, 2016, pp. 630–645.
  • [30] L. Sifre and S. Mallat, “Rigid-motion scattering for image classification,” Ph.D. thesis, 2014.
  • [31] F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang, “Residual attention network for image classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3156–3164.
  • [32] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
  • [33] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.
  • [34] M. Awais, H. Müller, T. B. Tang, and F. Meriaudeau, “Classification of SD-OCT images using a deep learning approach,” in 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA). IEEE, 2017, pp. 489–492.
  • [35] S. P. K. Karri, D. Chakraborty, and J. Chatterjee, “Transfer learning based classification of optical coherence tomography images with diabetic macular edema and dry age-related macular degeneration,” Biomedical Optics Express, vol. 8, no. 2, pp. 579–592, 2017.
  • [36] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.