Vision impairment due to pathological damage of the retina can largely be prevented through periodic screening using fundus color imaging. However the challenge with large scale screening is the inability to exhaustively detect fine blood vessels crucial to disease diagnosis. In this work we present a computational imaging framework using deep and ensemble learning for reliable detection of blood vessels in fundus color images. An ensemble of deep convolutional neural networks is trained to segment vessel and non-vessel areas of a color fundus image. During inference, the responses of the individual ConvNets of the ensemble are averaged to form the final segmentation. In experimental evaluation with the DRIVE database, we achieve the objective of vessel detection with maximum average accuracy of 94.7% and area under ROC curve of 0.9283.READ FULL TEXT VIEW PDF
Pathological conditions of the retina examined through regular screening [1, 2] can heavily assist prevention of visual blindness. Fundus imaging is the most widely used modality for early screening and detection of such blindness causing diseases like diabetic retinopathy, glucoma, age-related macular degeneration , hypertension and stroke induced changes . Imaging of fundus has largely improved with progress from the film based photography camera to use of electronic imaging sensors; as well as red free imaging, stereo photography, hyperspectral imaging, angiography, etc. , thereby reducing inter- and intra-observer reporting variability. Retinal image analysis has also significantly contributed to this technological development [5, 6]. Since fundus imaging is predominantly used for first level of abnormality screening, research focus includes: (i) detection and segmentation of retinal structures (vessels, fovea, optic disc), (ii) segmentation of abnormalities, and (iii) quality quantification of images acquired to assess reporting fitness .
Related Work: The process of clinical reporting of retinal abnormalities is systematic and lesions are reported with respect to their location from vessels or optic disc. Computer assisted diagnosis systems are accordingly being developed to improve the clinical workflow . Some of the developments include, assessment of image quality ; blood vessel detection , branch pattern, diameter and vascular tree analysis [3, 4, 9, 10]; followed by reporting of lesions and their location with respect to the vessels [4, 10]. An important challenge in this context is robust and exhaustive detection of retinal vessels in color fundus images leveraging the potential of computer assisted diagnosis [5, 11] and assist in routine screening [4, 3].
predominantly use image filters, vector geometry, statistical distribution studies, and machine learning of low-level features and photon distribution models for vessel detection. Such methods rely on use of handcrafted features or heuristic assumptions for solving the problem and are not generalized to learn pattern attributes from the data itself, thus making them vulnerable to performance subjectivity on account of the method’s inherent weaknesses. Recently fully data-driven ,deep learning based models have been proposed. However, they are weaker in performance compared to the state of the art methods that use the conventional paradigm. The primary challenge here is to design an end-to-end framework which learns pattern representation from the data without any domain knowledge based heuristic information to identify both coarse and fine vascular structures and is at least at par if not better than the heuristic-based models.
Approach: This paper makes an attempt to ameliorate the issue of subjectivity induced bias in feature representation by training an ensemble of Convolutional Neural Networks (ConvNets)  on raw color fundus images to discriminate vessel pixels from non-vessel ones. Fig. 1
illustrates an example of exhaustive retinal vessel detection using this approach. Each ConvNet has three convolutional layers and two fully connected layers and is trained independently on randomly selected patches from the training images. At the time of inference, the vesselness-probabilities independently output by each ConvNet are averaged to form the final vesselness probability of each pixel.
§II gives a brief theoretical background of the proposed method. The problem statement is formally defined in §III and the proposed approach is described in §IV. The results of experimental evaluation on DRIVE dataset have been presented in §V. The paper is concluded with a summary of the proposed method and a discussion of possible impact of end-to-end deep learning based solutions for medical image analysis in §VI.
This section introduces some concepts regarding ConvNets and ensemble learning which form the pillars of the proposed solution.
Convolutional Neural Networks: Convolutional neural networks (CNN or ConvNet) are a special category of artificial neural networks designed for processing data with a grid-like structure [14, 15]. The ConvNet architecture is based on sparse interactions and parameter sharing and is highly effective for efficient learning of spatial invariances in images [16, 17]. There are four kinds of layers in a typical ConvNet architecture: convolutional (conv), pooling (pool), fully-connected (affine
) and rectifying linear unit (ReLU). Each convolutional layer transforms one set of feature maps into another set of feature maps by convolution with a set of filters. Mathematically, if and denote the weights and the bias of the filter of the convolutional layer and be its activation-map, then:
where is the convolution operator. Pooling layers perform a spatial downsampling of the input feature maps. Pooling helps to make the representation become invariant to small translations of the input. Fully-connected layers are similar to the layers in a vanilla neural network. Let denote the incoming weight matrix and
, the bias vector of a fully-connected layer,. Then:
where operator tiles the feature-maps of the input volume along the height, is matrix multiplication and
is element-wise addition. ReLU layers perform a pointwise rectification of the input and correspond to the activation function. For theunit of layer :
In a deep ConvNet, units in the deeper layers indirectly interact with a larger area of the input, thus forming a high level abstraction of the input data.
Ensemble learning is a technique of using multiple models or experts for solving a particular artificial intelligence problem. Ensemble methods seek to promote diversity among the models they combine and reduce the problem related to overfitting of the training data. The outputs of the individual models of the ensemble are combined (e.g. by averaging) to form the final prediction. Concretely, if be models of an ensemble and is the probability that the input
is classified asunder the model , then the ensemble predicts:
Ensemble learning promotes better generalization and often provides higher accuracy of prediction than the individual models.
Let be an image acquired by the RGB sensor of a color fundus camera. The intensity observed at location is denoted by . Let be a set of pixels in the local neighborhood of . Let be the set of class labels for the pixel at location . In a machine learning framework, the probability of finding a tissue of type at location , is modelled by a class of functions where is a set of parameters which are learned from the training data and
, a set of hyperparameters tuned using the validation data. In the proposed method,is an ensemble of ConvNets, the architecture of which is described next.
Each layer of a ConvNet transforms one volume of features into another. A volume of features is described as where is the number of feature maps of spatial dimension . The input to each ConvNet of the proposed ensemble is a color fundus image patch. The ConvNets have the same organization of layers which can be described as: input- [conv - relu]-[conv - relu - pool] x 2 - affine - relu - [affine with dropout] - softmax. Fig.2 gives a schematic diagram of the organization. Each conv layer has receptive field size - , stride - and output volume - . The pool layers have receptive field size - and stride - . Dropout 
is a regularization method for neural networks that enforces sparsity, prevents co-adaptation of features and promotes better generalization by forcing a fraction of neurons to be inactive during each episode of learning. The output of the final layer is passed to asoftmax function which converts the outputs into class probabilities. Let denote the activation of the neuron of the fully-connected output-layer and
denote the posterior probability of theoutput class. Then:
This section presents experimental validation of the proposed technique and its performance comparison with earlier methods .
Dataset: The ensemble of ConvNets is is evaluated by learning with the DRIVE training set (image id. 21-40) and testing over the DRIVE test set (image id. 1-20)111DRIVE dataset: http://www.isi.uu.nl/Research/Databases/DRIVE/.
Learning mechanism: Each ConvNet is trained independently on a set of randomly chosen patches. Learning rate and annealing rate were kept constant across models at and respectively. Dropout probability, regularization coefficient and number of hidden units in the penultimate affine layer of the different models were sampled respectively from , and where20] with minibatch size .
Performance assessment: Table I presents the accuracy and consistency of detection in comparison with those reported in earlier techniques. It is clearly evident that although our approach does not have the highest accuracy as compared with other methods, it does exhibit superior performance than the previously proposed deep learning based method for learning vessel representations from data . The kappa score being a study of observer consistency indicates sensitivity of the technique to detect both coarse and fine vessels as desired. Typical response to detection of both coarse and fine vessels are presented in Fig. 3.
|Max. avg. Accuracy||Kappa|
|Maji et al. ||0.9327||0.6287|
|Sheet et al. ||0.9766||0.8213|
|Staal et al. .||0.9422||-|
|Niemeijer et al.||0.9416||0.7145|
|Zana et al.||0.9377||0.6971|
|Jiang et al.||0.9212||0.6399|
|Martínez-Pérez et al.||0.9181||0.6389|
|Chaudhuri et al.||0.8773||0.3357|
Fig: 4 gives the receiver operating characteristic (ROC) curve of the proposed method. Area under ROC curve obtained is .
This paper presents a ConvNet-ensemble based framework for processing color fundus images for detection of coarse and fine vessels. The method is evaluated experimentally on the DRIVE dataset. The remarkable ability of ConvNets to recognize images and that of ensemble learning at generalization is leveraged to design a heuristics independent, data driven approach to analyzing medical images. This presents a feasible solution to subjectivity induced bias in medical image analysis. This is an improvement of our previous work on data-driven analysis of fundus images . This approach in general also provides a strong alternative approach to solve complex medical data analysis problem through deep learning combined with the power of ensemble learning.
“Detection of retinal vessels in fundus images through transfer learning of tissue specific photon interaction statistical physics,”in Proc. Int. Symp. Biomed. Imaging, 2013, pp. 1452–1456.
“Deep neural network and random forest hybrid architecture for learning to detect retinal vessels in fundus images,”in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, 2015, pp. 3029–3032.