Learned Pre-Processing for Automatic Diabetic Retinopathy Detection on Eye Fundus Images

07/27/2020 ∙ by Asim Smailagic, et al.

Diabetic Retinopathy is the leading cause of blindness in the working-age population of the world. The main aim of this paper is to improve the accuracy of Diabetic Retinopathy detection by implementing shadow removal and color correction as a pre-processing stage for eye fundus images. For this, we rely on recent findings indicating that application of image dehazing on the inverted intensity domain amounts to illumination compensation. Inspired by this work, we propose a Shadow Removal Layer that allows us to learn the pre-processing function for a particular task. We show that learning the pre-processing function improves the performance of the network on the Diabetic Retinopathy detection task.




1 Introduction

Diabetic Retinopathy (DR) is an eye disease that affects more than [...] of the estimated 425 million diabetic patients worldwide [ruta2013prevalence]. Consequently, DR is a leading cause of blindness in the working-age population of the world and, therefore, screening all diabetic patients is of paramount importance. With the growth in the prevalence of diabetes, the burden on ophthalmologists to screen the entire diabetic population also grows. For these reasons, a system capable of detecting DR is becoming increasingly important.

Screening for DR in the US and UK relies mainly on the correct interpretation of a digital retinal image to recognize pathological features. Prompt recognition and treatment of this pathology can save sight, and for this reason much research has been devoted in recent years to the design of machine learning pipelines that can help in its correct diagnosis. Unfortunately, the lesions that characterize the early stages of this disease are subtle, and when improperly illuminated by a fundus camera at acquisition time they can be confused with other harmless signs of similar appearance.

A reasonable approach to deal with this problem is to improve the quality of the image by obtaining a shadow-free version of the image. Although this step can be performed in a manual way [foracchia_luminosity_2005, savelli_illumination_2017, saha_novel_2018], it may be preferable to learn the preprocessing function with minimal intervention, directly from the data. We propose to do so by implementing a U-net architecture [DBLP:journals/corr/RonnebergerFB15], which is the convolutional neural network architecture of choice for biomedical image segmentation.

2 Related Work

The pre-processing of retinal images has been proposed in several papers before. One of the first proposed techniques for improving the visual appearance of this kind of data was introduced in [foracchia_luminosity_2005]. The authors estimated an illumination field by first removing foreground pixels and then fitting a Gaussian model to the background. Similarly, the technique proposed in [leahy_illumination_2012] relies on Laplace interpolation and a multiplicative model of illumination to remove its impact. In [xiong_enhancement_2017], an image formation model involving scattering and background illumination was proposed and inverted to retrieve well-illuminated images. A different model, based on cataract formation, was used in [mitra_enhancement_2018] to reduce blurriness and improve contrast. Also recently, the authors of [saha_novel_2018] introduced a luminosity correction technique with a focus on avoiding the creation of visual artifacts on regions of the image that were initially well-illuminated. It is important to stress that all these methods are designed and applied on retinal images in a static manner. This means that any subsequent automatic image understanding task for diagnostic purposes remains isolated from the pre-processing stage.

Figure 1: Example of an eye fundus image. Left: Unprocessed retinal image. Right: Illumination-compensation by shadow removal.

In this paper, we follow previous observations from [savelli_illumination_2017, galdran_duality_2018] that fog/haze removal can be interpreted as illumination compensation when applied to inverted intensities on retinal images, as shown in Fig. 1. The standard model used to describe hazy images is given by the haze imaging equation [Narasimhan2002, Narasimhan00chromaticframework, Fattal:2008:SID:1360612.1360671, inproceedings]:

$$I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr) \qquad (1)$$

where $I$ is the observed image, $J$ is the scene radiance, $t$ is the transmission map and $A$ is the atmospheric light. Therefore, haze removal involves estimating the transmission map $t$ (depth map), soft matting for its refinement, estimating the atmospheric light $A$ and recovering the scene radiance $J$. While we also aim to apply the above model, in contrast with previous techniques our goal in this paper is to automatically learn to estimate these unknowns in such a way that they are optimal for the downstream task of diabetic retinopathy detection, which will be simultaneously solved.
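The link between dehazing and illumination compensation can be illustrated with a minimal NumPy sketch. It assumes the standard form of the haze imaging equation, I = J·t + A·(1 − t), and that the inverted-domain atmospheric light is white (A = 1); the function names, the clipping threshold `t_min` and the toy values are illustrative assumptions, not the authors' code.

```python
import numpy as np

def add_haze(J, t, A):
    """Haze imaging model: I = J * t + A * (1 - t)."""
    return J * t + A * (1.0 - t)

def remove_haze(I, t, A, t_min=0.1):
    """Invert the haze model for J; t is clipped to avoid dividing by ~0."""
    t = np.clip(t, t_min, 1.0)
    return (I - A) / t + A

def compensate_illumination(I, t, A=1.0):
    """Dehaze the inverted image, then invert the result back."""
    return 1.0 - remove_haze(1.0 - I, t, A)

rng = np.random.default_rng(0)
J = rng.uniform(0.2, 0.8, size=(4, 4))   # toy "true" radiance
t = rng.uniform(0.3, 1.0, size=(4, 4))   # toy transmission map
I = add_haze(J, t, A=0.9)                # synthetic hazy observation
J_rec = remove_haze(I, t, A=0.9)         # recovers J when t and A are known

# With white atmospheric light in the inverted domain, the compensation
# amounts to a per-pixel division of the image by the transmission map.
compensated = compensate_illumination(I, t, A=1.0)
```

Under these assumptions, `compensated` equals `I / t`, which is why a dark (shadowed) region with low transmission is brightened.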

Figure 2: Pipeline of the proposed method. A segmentation CNN is used to estimate the transmission map $t$. Then, the input image $I$ and $t$ are provided to a Shadow Removal Layer that outputs the normalized image $J$. Finally, $J$ is given as input to a classifier CNN that outputs whether the image shows Diabetic Retinopathy or not. Both CNNs can be trained to minimize the classification error.

3 Method

Pre-processing the images to have more consistent illumination and colors across the dataset can help improve the performance of DR detection. In this paper, we aim to remove shadows from eye fundus images by dehazing the inverted image [savelli_illumination_2017]. Dehazing methods require the estimation of the transmission map $t$ using heuristics that may not be optimal. To overcome this issue, we pose the problem of transmission map estimation as an optimization problem, and propose to learn the function that maps an eye fundus image $I$ to a transmission map $t$ by minimizing a classification error. This allows us to optimize the transmission map estimation for a particular classification task.
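One widely used heuristic of this kind is the Dark Channel Prior, which this paper cites in Section 3.2. The sketch below shows a simplified version applied to the inverted fundus image. It omits soft matting and atmospheric light estimation, and the values `omega=0.95`, `patch=15` and `A=1.0` are commonly used defaults taken here as assumptions, not settings reported by the authors.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dark_channel(image, patch=15):
    """Minimum over color channels and over a patch x patch neighborhood."""
    min_rgb = image.min(axis=2)                      # (H, W)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")       # keep output size (odd patch)
    windows = sliding_window_view(padded, (patch, patch))
    return windows.min(axis=(2, 3))                  # (H, W)

def dcp_transmission(image, A=1.0, omega=0.95, patch=15):
    """Heuristic transmission estimate: t = 1 - omega * dark_channel(image / A)."""
    return 1.0 - omega * dark_channel(image / A, patch)

def fundus_transmission(fundus, **kwargs):
    """Apply the heuristic to the inverted fundus image (values in [0, 1])."""
    return dcp_transmission(1.0 - fundus, **kwargs)

# Toy example: a uniformly mid-gray image yields a constant transmission map.
gray = np.full((32, 32, 3), 0.5)
t_gray = fundus_transmission(gray)
```

The fixed patch size and `omega` are exactly the kind of hand-set choices that a learned transmission estimator is meant to replace.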

3.1 Shadow Removal Layer

In order to accomplish this, we develop a Shadow Removal Layer. This layer takes an estimated transmission map $t$ and an input image $I$ and outputs a pre-processed image $J$, with shadows removed. This layer applies the following equation:

$$J(x) = \frac{I(x) - A}{t(x)} + A \qquad (2)$$

We can assume that $A = 0$ if we white balance the images before applying the illumination estimation function, as shown in [savelli_illumination_2017]. The equation then reduces to $J(x) = I(x) / t(x)$, i.e. simply dividing the input image intensities by the transmission map. In this work, we use a segmentation Convolutional Neural Network (CNN) to learn the function that maps $I$ to $t$.

The problem is that we do not have the ground-truth data to train the segmentation model. To solve this issue, we derive the training signal from a classification CNN that learns to detect DR from the pre-processed image $J$. Therefore, the segmentation CNN learns to output the transmission map that minimizes the classification CNN error. This is possible as Equation 2 is differentiable, and the training signal can flow to the segmentation CNN's parameters. The entire architecture is shown in Figure 2.
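To make the differentiability argument concrete, here is a minimal NumPy sketch of the reduced layer described in Section 3.1 (dividing the input image by the transmission map) together with its gradient with respect to the transmission map. The function names and the stability floor `eps` are assumptions for illustration; in practice an automatic differentiation framework would compute this gradient.

```python
import numpy as np

def shadow_removal_forward(I, t, eps=1e-3):
    """J = I / t, with t floored at eps for numerical stability (assumption).

    I has shape (H, W, 3); t has shape (H, W, 1) and broadcasts over channels.
    """
    return I / np.maximum(t, eps)

def shadow_removal_backward(I, t, grad_J, eps=1e-3):
    """Gradient of a scalar loss w.r.t. t, given grad_J = dLoss/dJ."""
    t_safe = np.maximum(t, eps)
    # dJ/dt = -I / t^2 per channel; sum the channel contributions for each pixel.
    grad_t = np.sum(-grad_J * I / t_safe**2, axis=-1, keepdims=True)
    return np.where(t > eps, grad_t, 0.0)   # no gradient where t was floored

rng = np.random.default_rng(0)
I = rng.uniform(0.0, 1.0, size=(2, 2, 3))
t = rng.uniform(0.3, 1.0, size=(2, 2, 1))
J = shadow_removal_forward(I, t)
# Example loss: sum of all output intensities, so dLoss/dJ is all ones.
grad_t = shadow_removal_backward(I, t, grad_J=np.ones_like(J))
```

Because `grad_t` is well defined wherever `t` is above the floor, an error signal computed on the classifier's output can be propagated back to whatever network produced `t`.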

3.2 Transmission Map Supervision

For the transmission map estimation model to be able to learn something close to the depth map of the image, we add a term to the loss. On top of the classification loss, we minimize the mean squared error between the estimated transmission map $t$ and a reference transmission map $\bar{t}$. This reference transmission map is obtained by computing the depth map of each image in the dataset manually, as per the Dark Channel Prior theory [He:2011:SIH:2068459.2068579], and taking an average over all the depth maps, as shown in Figure 3. The objective of the network is hence modified to decrease the difference between the manually computed reference depth map and the learned depth map. The new loss function is:

$$\mathcal{L}(\theta_c, \theta_s) = \mathcal{L}_c(\theta_c, \theta_s) + \frac{1}{N} \sum_{x} \bigl(t(x) - \bar{t}(x)\bigr)^2 \qquad (3)$$

where $\mathcal{L}_c$ is the classification loss, $\theta_c$ are the classification network's parameters, $\theta_s$ are the segmentation network's parameters (on which $t$ depends) and $N$ is the number of pixels. In this paper we used Binary Cross-Entropy as the classification loss $\mathcal{L}_c$.

Figure 3: Average depth map computed manually from the entire dataset of eye fundus images, used as additional supervision to the transmission map.
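The combined objective described in this section, a Binary Cross-Entropy classification term plus a mean squared error term against the reference transmission map, could be sketched as follows. The `weight` parameter is an assumption added for illustration (the text does not state a weighting between the two terms), and the function names are hypothetical.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean BCE between binary labels and predicted DR probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

def transmission_mse(t_pred, t_ref):
    """MSE between predicted maps (N, H, W) and one reference map (H, W)."""
    return float(np.mean((t_pred - t_ref) ** 2))

def total_loss(y_true, p_pred, t_pred, t_ref, weight=1.0):
    """Classification loss plus (optionally weighted) transmission supervision."""
    return binary_cross_entropy(y_true, p_pred) + weight * transmission_mse(t_pred, t_ref)

# Toy batch of two images with 4x4 transmission maps.
y = np.array([1.0, 0.0])
p = np.array([0.5, 0.5])
t_ref = np.full((4, 4), 0.5)
t_pred = np.stack([t_ref + 0.1, t_ref + 0.1])
loss = total_loss(y, p, t_pred, t_ref)
```

The second term only anchors the predicted map near the dataset average; the classification term is what lets the map deviate from that average when doing so helps detection.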

4 Experiments

4.1 Implementation Details

We used a network inspired by U-Net as the segmentation CNN that estimates the transmission map and a pre-trained Inception v3 network as the classification network. The eye fundus images are resized to a fixed resolution and provided to the U-Net. The pre-processed images that are given to the Inception v3 network have the same resolution. To accommodate the larger input image size, we remove the last layer of the Inception v3 network and add a global average pooling layer followed by a Fully-Connected layer with a single output.
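The reason a global average pooling head copes with a larger input can be seen in a short NumPy sketch: pooling collapses whatever spatial size the feature maps have into one vector per image. The sigmoid on the single output is an assumption consistent with a binary cross-entropy loss, and the function name, weights and spatial sizes below are illustrative.

```python
import numpy as np

def classification_head(features, weights, bias):
    """Map (N, H, W, C) feature maps to (N,) DR probabilities."""
    pooled = features.mean(axis=(1, 2))        # global average pooling -> (N, C)
    logits = pooled @ weights + bias           # fully connected layer, one output
    return 1.0 / (1.0 + np.exp(-logits))       # sigmoid (assumed activation)

rng = np.random.default_rng(0)
C = 2048                                       # channels of Inception v3's last feature maps
w, b = rng.normal(scale=0.01, size=C), 0.0

# The same head works for feature maps of different spatial sizes.
small = classification_head(rng.normal(size=(2, 8, 8, C)), w, b)
large = classification_head(rng.normal(size=(2, 14, 14, C)), w, b)
```

Both calls return one probability per image, so the classifier weights do not depend on the input resolution.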

The two models are trained using the Adam optimizer with a learning rate of 2x. The training process consists of 2 phases:

  1. Fitting: Here, the parameters of the Inception network are frozen and the U-net alone is trained;

  2. Fine-tuning: Here, the layers of the Inception network are made trainable and thus fine-tuned along with the U-net parameters.

Both fitting and fine-tuning are performed for 100 epochs each with a batch-size of 4.
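The two-phase schedule above can be sketched in a framework-agnostic way. This toy example swaps in plain gradient descent for Adam and a quadratic loss for the real objective, purely to show which parameter groups move in each phase. The group names, `gd_step`, `run_schedule` and the learning rate value are all hypothetical.

```python
import numpy as np

# Phase names and epoch counts follow the text; the group names are illustrative.
PHASES = [
    {"name": "fitting", "trainable": {"unet"}, "epochs": 100},
    {"name": "fine-tuning", "trainable": {"unet", "inception"}, "epochs": 100},
]

def gd_step(params, grads, trainable, lr):
    """Plain gradient-descent update applied only to the trainable groups."""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }

def run_schedule(params, grad_fn, lr, steps_per_epoch=1):
    """Run both phases in order and snapshot the parameters after each one."""
    snapshots = {}
    for phase in PHASES:
        for _ in range(phase["epochs"] * steps_per_epoch):
            params = gd_step(params, grad_fn(params), phase["trainable"], lr)
        snapshots[phase["name"]] = {k: v.copy() for k, v in params.items()}
    return snapshots

# Toy problem: loss = 0.5 * ||p||^2 for each group, so the gradient equals p.
start = {"unet": np.ones(3), "inception": np.ones(3)}
snapshots = run_schedule(start, grad_fn=lambda p: p, lr=0.01)
```

After the fitting phase only the `unet` group has changed; the `inception` group starts moving in the fine-tuning phase.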

Model                                    Test Accuracy
Inception v3                             89.50%
U-Net + Shadow Removal + Inception v3    90.34%

Table 1: Our Shadow Removal Layer improves the classification accuracy over the baseline.

4.2 Dataset

The Messidor dataset [decenciere_feedback_2014] is a collection of eye fundus images of healthy and unhealthy patients. It consists of digital color eye fundus images acquired by 3 ophthalmologic departments. The image sizes are 1440×960, 2240×1488 or 2304×1536 pixels. The retinopathy grade has been provided by medical experts, where a grade of 0 corresponds to healthy and grades 1, 2 and 3 correspond to unhealthy.

The training data consists of 949 images, 441 healthy and 508 unhealthy, and the test data consists of 238 images, 106 healthy and 132 unhealthy. The images in both the training and test sets are distributed equally among the 3 ophthalmologic departments.

The images are center cropped and resized. Each image yields 4 images: the original image, a copy rotated by a random angle in the range of 230, and the horizontally flipped versions of both.
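The four-way augmentation can be sketched with NumPy alone. The rotation range is not recoverable from the text, so `max_angle` is left as a parameter and the symmetric range around zero is an assumption; nearest-neighbor sampling and zero fill outside the image are likewise illustrative choices.

```python
import numpy as np

def rotate_nearest(image, angle_deg):
    """Rotate an (H, W, C) image about its center with nearest-neighbor sampling."""
    h, w = image.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find the source coordinate.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x, src_y = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)                 # pixels mapped from outside stay zero
    out[valid] = image[src_y[valid], src_x[valid]]
    return out

def augment(image, rng, max_angle):
    """Return [original, rotated, flipped original, flipped rotated]."""
    angle = rng.uniform(-max_angle, max_angle)  # assumed symmetric range
    rotated = rotate_nearest(image, angle)
    return [image, rotated, image[:, ::-1], rotated[:, ::-1]]

rng = np.random.default_rng(0)
image = rng.uniform(size=(16, 16, 3))
views = augment(image, rng, max_angle=30.0)    # 30 degrees is a placeholder value
```

Each source image contributes four views, so the effective training set is four times the number of original images.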

4.3 Results

We trained the classifier on the original dataset for the task of DR detection and obtained 89.50% test accuracy. Our pre-processing method achieves a test accuracy of 90.34% after fine-tuning, as shown in Table 1, an improvement of 0.84 percentage points on the test set over the baseline. Our model converges better than the baseline and also improves the detection.
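As a quick sanity check on the scale of this improvement, the reported accuracies in Table 1 can be converted back into image counts. This assumes accuracy is computed over the 238 test images of Section 4.2 without augmentation, which the text does not state explicitly; the helper name is hypothetical.

```python
N_TEST = 238  # test-set size reported in the Dataset section

def correct_predictions(accuracy_percent, n_images=N_TEST):
    """Number of correctly classified images implied by a rounded accuracy."""
    return round(accuracy_percent / 100.0 * n_images)

baseline = correct_predictions(89.50)   # Inception v3
proposed = correct_predictions(90.34)   # U-Net + Shadow Removal + Inception v3

for name, k in (("baseline", baseline), ("proposed", proposed)):
    print(f"{name}: {k}/{N_TEST} = {100.0 * k / N_TEST:.2f}%")
print(f"difference: {proposed - baseline} images")
```

Under that assumption, the two accuracies correspond to 213 and 215 correctly classified test images respectively, a difference of 2 images.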

Figure 4: Transmission map learned by the U-Net and corresponding output of the Shadow Removal Layer.

Furthermore, we can visually inspect the estimated transmission maps $t$. As shown in Figure 4, we can verify that the U-Net was able to output valid transmission maps, different from the mean transmission map $\bar{t}$. Moreover, the images produced by the Shadow Removal Layer have similar illumination, indicating that the learned pre-processing step is effectively removing shadows from the eye fundus images.

5 Conclusion and Future Work

In this paper we proposed a method to learn how to pre-process eye fundus images for the task of DR detection. We draw inspiration from haze/shadow removal methods and devise a methodology to train a segmentation CNN to estimate the transmission map of the input eye fundus image. Then, we apply a Shadow Removal Layer to pre-process the input image and then provide that image to a classifier. The entire system can be trained to minimize the classification error.

We show that, by learning to pre-process eye fundus images for a particular task, the performance of DR detection is improved. As future work, we plan to verify whether the learned pre-processing function is useful for other retinal tasks, such as vessel segmentation.


This work is financed by the ERDF - European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme, by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project CMUP-ERI/TIC/0028/2014.