Identifying and Categorizing Anomalies in Retinal Imaging Data

by Philipp Seeböck et al.
MedUni Wien

The identification and quantification of markers in medical images is critical for diagnosis, prognosis and management of patients in clinical practice. Supervised or weakly supervised training enables the detection of findings that are known a priori, but it does not scale well, and the a priori definition of markers limits the vocabulary to known entities, reducing the accuracy of diagnosis and prognosis. Here, we propose the identification of anomalies in large-scale medical imaging data using healthy examples as a reference. We detect and categorize candidates for anomalous findings that are untypical for the observed data. A deep convolutional autoencoder is trained on healthy retinal images. The learned model generates a new feature representation, and the distribution of healthy retinal patches is estimated by a One-Class Support Vector Machine. Results demonstrate that we can identify pathologic regions in images without using expert annotations. A subsequent clustering categorizes the findings into clinically meaningful classes. In addition, the learned features outperform standard embedding approaches in a classification task.





1 Introduction

The detection of diagnostically relevant markers in imaging data is critical for diagnosis and treatment guidance in medicine. Typically, detectors are trained on a priori defined categories and annotated imaging data. This makes large-scale annotation necessary, which is often not feasible, limits the use to known marker categories, and overall slows the process of discovering novel markers based on available evidence. Additionally, in contrast to natural images such as photographs, the diagnostically relevant regions of retinal images cover only a small fraction of the overall imaging data; although they are the focus of diagnostic attention, they are often dominated by the natural variability of even healthy anatomy.

Optical Coherence Tomography (OCT) huang1991optical is an important diagnostic modality in ophthalmology and offers a high-resolution 3D image of the layers and entities in the retina. Each position of the retina sampled by an optical beam results in a vector, the A-scan. Adjacent A-scans form a B-scan, which in turn form the entire volume. Anomaly detection pimentel2014review in retinal images is a difficult and unsolved task, though many patients are affected by retinal diseases that cause vision loss (e.g., age-related macular degeneration has a prevalence of 9% wong2014global). We propose to identify abnormal regions, which can serve as potential biomarkers, in retinal spectral-domain OCT (SD-OCT) images, using a Deep Convolutional Autoencoder (DCAE) trained unsupervised on healthy examples as feature extractor and a One-Class SVM to estimate the distribution of normal appearance. By using only healthy examples for training, we avoid the need to collect a dataset that represents all the variability that may occur in the data and contains a sufficient number of anomalies.

Results show that the trained model achieves a Dice of 0.55 with respect to annotated pathologies in OCT images (Table 1). Since different pathologies and structures occur in the regions detected as anomalous, a meaningful sub-classification of these areas is of particular medical interest. Therefore, we further cluster regions identified as anomalous in order to retrieve a meaningful segmentation of these areas. In addition, we evaluate to which extent the learned features can be used in a classification task, where intraretinal cystoid fluid (IRC), subretinal fluid (SRF) and the remaining part of the retina are classified. A classification accuracy of 86.6% indicates the discriminative power of the learned feature representation.

Related Work

Deep Convolutional Neural Networks (CNNs) in combination with large quantities of labeled data have recently improved the state of the art in various tasks such as image classification szegedy2015going or object detection ren2015faster. CNNs can automatically learn translation-invariant visual input representations, enabling the adaptation of visual feature extractors to data instead of manual feature engineering. While purely supervised training of networks is suitable if labeled data is abundant, unsupervised methods enable the exploitation of unlabeled data cho2015unsupervised ; doersch2015unsupervised ; dosovitskiy2014discriminative ; zhao2015stacked , discovering the underlying structure of the data.

Doersch et al. doersch2015unsupervised use spatial context as a supervisory signal to train a CNN without labels, learning visual similarity across natural images, which does not necessarily extend to the medical domain. Dosovitskiy et al. dosovitskiy2014discriminative train a CNN unsupervised by learning to discriminate between surrogate image classes created by data augmentation. A limitation of this approach is that it does not scale to arbitrarily large amounts of unlabeled data. Zhao et al. zhao2015stacked propose joint instead of layer-wise training of convolutional autoencoders, but they perform experiments in a semi-supervised setting.

A variety of anomaly detection techniques are reported in the literature. Carrera et al. carrera2015detecting use convolutional sparse models to detect anomalous structures in texture images. In the work of Erfani et al. erfani2016high, Deep Belief Networks are combined with One-Class SVMs to address anomaly detection in real-life datasets. In contrast to our paper, these works address natural images and real-life datasets, which have considerably different characteristics compared to medical images, as explained above. An overview of anomaly detection methods can be found in pimentel2014review.

Regarding unsupervised learning in OCT images, Venhuizen et al. venhuizen2015automated train random forests with features from a bag-of-words approach in order to classify whole OCT volumes, as opposed to our pixel-wise segmentation approach. Schlegl et al. schlegl2015automatic use weakly supervised learning of CNNs to link image information to semantic descriptions of image content. In contrast to our work, the aim is to identify a priori defined categories in OCTs using clinical reports.

2 Method

The following preprocessing is applied to all volumes. First, we identify the top and bottom layer of the retina using a graph-based surface segmentation algorithm garvin2009automated, where the bottom layer is used to project the retina onto a horizontal plane. Then each B-scan is brightness- and contrast-normalized. Finally, we over-segment all B-scans into monoSLIC superpixels of a fixed average size mholzer2014superpixel.
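The paper does not spell out the exact normalization scheme, so the following is a minimal sketch of one common choice, per-B-scan standardization to zero mean and unit variance; the image size is illustrative:

```python
import numpy as np

def normalize_bscan(bscan: np.ndarray) -> np.ndarray:
    """Brightness/contrast-normalize a single B-scan.

    Standardizes intensities to zero mean and unit variance; this is an
    assumed normalization, one of several plausible implementations.
    """
    bscan = bscan.astype(np.float64)
    std = bscan.std()
    if std == 0:  # flat image: only remove the brightness offset
        return bscan - bscan.mean()
    return (bscan - bscan.mean()) / std

# toy example on random intensities (size is illustrative)
bscan = np.random.default_rng(0).integers(0, 255, size=(496, 512))
norm = normalize_bscan(bscan)
```

Standardizing per B-scan makes patches from different devices and acquisition settings more comparable before they are fed to the autoencoder.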

To capture visual information at different levels of detail, we use a multi-scale approach to perform superpixel-wise segmentation of the visual input. We conduct unsupervised training of two DCAEs in parallel on pairs of extracted patches sampled at the same positions for two scales, with the patches of the coarser scale down-sampled to the same input size. The network architecture used for both scales is 512c9-3p-2048f-512f for the encoder, implying a matching decoder structure that uses deconvolution and unpooling operations. All layers except pooling and unpooling are followed by Exponential Linear Units (ELUs) clevert2015fast. The loss function for training is the Mean Squared Error between the input patch and its reconstruction. In addition, we use dropout in each layer, which corresponds to unsupervised joint training with local constraints in each layer.
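The training objective above is plain mean squared reconstruction error; a minimal numpy version (patch shapes are illustrative) looks like:

```python
import numpy as np

def mse_loss(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Mean squared error between an input patch x and its reconstruction x_hat."""
    return float(np.mean((x - x_hat) ** 2))

# a perfect reconstruction yields zero loss
patch = np.random.default_rng(1).random((32, 32))
perfect = mse_loss(patch, patch)
```

In practice this loss would be averaged over mini-batches of patch pairs from both scales and minimized by backpropagation through the encoder and decoder.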

The feature representations of both scales are concatenated and used as input for training a Denoising Autoencoder (DAE), whose single-layer architecture is denoted as 256f. All three models together form our final model, which yields a 256-dimensional feature representation for a specific superpixel in the B-scan.

We then train a One-Class SVM scholkopf2001estimating with a linear kernel to find a boundary that describes the distribution of healthy examples in the learned feature space, which serves as decision boundary for unseen data. New samples are classified as coming from the same data distribution if they lie within the boundary (normal), and as anomalies otherwise. For each OCT, the features and the corresponding class are computed for each superpixel lying between the top and bottom layer of the retina. This provides a segmentation of the retina into two classes.
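A sketch of this step with scikit-learn's OneClassSVM; the feature vectors, dimensionality, and the nu value are toy stand-ins, not the paper's settings:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-ins for superpixel feature vectors: "healthy" features cluster in
# one region of the feature space, anomalies fall elsewhere (toy data).
healthy_train = rng.normal(loc=5.0, scale=0.5, size=(500, 16))
healthy_test = rng.normal(loc=5.0, scale=0.5, size=(20, 16))
anomalies = rng.normal(loc=0.0, scale=0.5, size=(20, 16))

# nu upper-bounds the fraction of training points treated as outliers
# (0.05 is an illustrative choice).
ocsvm = OneClassSVM(kernel="linear", nu=0.05).fit(healthy_train)

pred_healthy = ocsvm.predict(healthy_test)  # +1 = normal
pred_anom = ocsvm.predict(anomalies)        # -1 = anomaly
```

Because the model sees only healthy features at training time, anything that falls outside the learned support is flagged, without ever defining pathology categories up front.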

Subsequently, we use spherical K-means clustering hornik2012spherical with cosine distance to sub-segment regions identified as anomalous in the former step into multiple clusters. The number of cluster centroids is determined by the Davies-Bouldin (DB) index halkidi2001clustering, an internal evaluation criterion for which a small value indicates compact and well-separated clusters. To segment an unseen OCT, each superpixel labeled "anomalous" is assigned to the nearest cluster centroid in the feature space.
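One way to sketch this model selection: L2-normalizing the features makes standard K-means approximate spherical (cosine-distance) K-means, and scikit-learn's davies_bouldin_score picks the number of clusters. The data, dimensionality, and k range below are illustrative, not the paper's:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)

# Three synthetic "anomalous" feature clusters with distinct directions.
centers = np.eye(3, 8) * 10.0
feats = np.vstack([rng.normal(c, 0.5, size=(100, 8)) for c in centers])

# Project onto the unit sphere so Euclidean K-means approximates
# cosine-distance (spherical) K-means.
unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)

scores = {}
for k in range(2, 7):  # the paper searches k = 2..30
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(unit)
    scores[k] = davies_bouldin_score(unit, labels)

best_k = min(scores, key=scores.get)  # smallest DB index = best clustering
```

On these well-separated toy clusters the DB index is minimized at the true cluster count; on real anomaly features the curve is flatter and the chosen k is checked qualitatively by experts, as done in the paper.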

3 Evaluation

The primary purpose of the evaluation is to test if we can identify novel marker candidates in imaging data algorithmically, instead of relying on a priori defined object categories. We evaluated (1) if we can segment anomalies, (2) if we can find categories of these regions that correspond to a fine-grained ontology of known findings, and (3) if the learned features have discriminative power for image classification.

We used scans from 704 patients: 283 healthy OCT volumes formed the healthy training set (277,340 pairs of image patches) for the DCAE and the One-Class SVM, 411 unlabeled OCTs formed the anomaly training set (295,920 patches) for clustering, and a validation set and a test set of 5 volumes each, with voxel-wise ground-truth annotations of anomalous regions and additional annotations for specific pathologies (IRC, SRF), were used for model selection and evaluation, respectively. The scans were acquired using Spectralis OCT instruments (Heidelberg Engineering, GER) and have a resolution of depicting a volume of the retina, where the distance between voxel centers is about in the first, in the second, and in the third dimension.

The learned anomaly detection model is evaluated on the test set, where Dice, Precision and Recall are calculated for anomalous regions with respect to the ground-truth annotation. These quantitative values are interpreted carefully and only in combination with qualitative evaluation by clinical retina experts.
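The three metrics are standard overlap measures on binary masks; a small self-contained implementation:

```python
import numpy as np

def overlap_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Dice, precision and recall for binary anomaly masks (1 = anomalous)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)    # anomalous voxels correctly detected
    fp = np.sum(pred & ~gt)   # normal voxels flagged as anomalous
    fn = np.sum(~pred & gt)   # anomalous voxels missed
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# toy masks: one true positive, one false positive, one false negative
pred = np.array([[1, 1, 0, 0]])
gt = np.array([[1, 0, 1, 0]])
m = overlap_metrics(pred, gt)
```

Dice balances precision and recall in a single score, which is why it is reported alongside both.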

In addition, we compare our model with conventional PCA embeddings. To ensure a fair comparison, we train two models. In the first model, the dimensionality is chosen to match the feature dimension of our proposed model: for both scales, the first 128 principal components are kept and concatenated. In the second model, for each scale the first components that describe 95% of the variance in the dataset are kept.

For the categorization of anomalous regions, the number of clusters is varied between 2 and 30; the model with the lowest DB index on the anomaly training set is selected and applied to the test set. We qualitatively evaluate whether the categories found in the regions identified as abnormal are clinically meaningful, to provide a link to actual disease.

Besides anomaly detection, we test the learned models in a classification task to evaluate the discriminative power of the learned feature representation. Here, we train a linear Support Vector Machine (L2-SVM) in the learned feature space on a balanced classification dataset with 15,000 labeled examples and three classes (33.3% IRC, 33.3% SRF, 33.3% remaining part of the retina). 5-fold cross-validation is performed such that samples of a patient are present in only one fold; this labeled dataset is used only to train the L2-SVM.
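The patient-disjoint folds can be sketched with scikit-learn's GroupKFold; the feature vectors, patient counts, and classifier settings below are toy stand-ins for the paper's setup:

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy stand-ins: 150 labeled feature vectors from 15 "patients",
# three balanced classes (IRC / SRF / remaining retina).
X = rng.normal(size=(150, 16))
y = np.tile([0, 1, 2], 50)
patients = np.repeat(np.arange(15), 10)

# GroupKFold keeps all samples of a patient in a single fold, so no
# patient contributes to both the training and the test split.
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=patients):
    assert set(patients[train_idx]).isdisjoint(patients[test_idx])
    clf = LinearSVC().fit(X[train_idx], y[train_idx])
```

Grouping by patient prevents the optimistic bias that arises when highly correlated patches from one eye appear on both sides of a split.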

4 Results

We report quantitative and qualitative results illustrating anomaly segmentation, visualize the anomaly categorization outcome, provide descriptions of the clusters according to the experts, and describe the results of the classification task.

Table 1: Dice, Precision and Recall for anomalous regions with ground-truth annotations. The same One-Class SVM settings are used for all methods.

Algorithm            Dice   Precision   Recall
PCA (matched dim.)   0.33   0.31        0.39
PCA (95% variance)   0.32   0.32        0.35
Proposed (DCAE)      0.55   0.53        0.58

Table 2: Mean classification accuracies of L2-SVMs on the generated OCT dataset with balanced classes, trained with features from different models.

Accuracy (in percent)
Algorithm            IRC    SRF    Other   Overall
PCA (matched dim.)   71.9   74.0   73.9    73.4 (±3.9)
PCA (95% variance)   73.3   76.9   70.0    73.4 (±3.9)
Proposed (DCAE)      87.3   88.3   84.1    86.6 (±1.6)

As can be seen in Table 1, our method achieves a Dice of 0.55, a clear improvement over both PCA baselines. This is also reflected by the visualization in Figure 1 (a)-(d), which shows substantial overlap between the ground-truth annotations and our segmentation. Our method provides a less diffuse segmentation than the PCA baseline, capturing the retinal pathology in a meaningful way.

Regarding anomaly categorization, the lowest DB index was found for 7 clusters, as illustrated in Figure 1 (f). Figure 1 (e) shows a 2D t-SNE embedding van2008visualizing of the learned features. Categories were identified and described as follows by two clinical retina experts and are summarized in Figure 1 (g)-(h).

Bright horizontal edges are segmented in cluster "1", highlighted in blue. In the majority of cases this cluster corresponds to the Retinal Pigment Epithelium (RPE). Cluster "2" is marked in light green and corresponds to areas below these horizontal structures. Both clusters segment areas situated next to fluid, which changes the local appearance of patches to abnormal. Cluster "4" is highlighted in yellow and corresponds to fluid within the retina. This finding is supported by the fact that 62% of manual IRC annotations located in anomalous regions are assigned to cluster "4". Marked in grey-blue and pink, clusters "3" and "5" both segment regions that correspond to fluid beneath the RPE. Clusters "6" (dark green) and "7" (brown) highlight the border between vitreous and retina, where irregular curvature or fluid situated below alters the appearance of extracted patches.

The classification results are shown in Table 2, which reports the mean overall accuracy as well as class-specific accuracies. The proposed unsupervised feature learning approach clearly outperforms both conventional PCA models, achieving a mean overall accuracy of 86.6% compared to 73.4% for the PCA models.

Figure 1: The same B-scan is illustrated for (a) the original scan, (b) the ground-truth annotations (pathologic region in blue, IRC in green), and anomaly detection results for (c) the comparison method and (d) our proposed method (normal = red, anomaly = blue). The clustering result is shown in (g), where identified anomalous regions are segmented into 7 categories, each indicated by a separate color. The corresponding cluster descriptions identified by the experts are shown in (h) together with the nearest neighbors of the cluster centroids. A 2D t-SNE embedding of the feature space is shown in (e), and the calculated values of the DB index are plotted in (f).

5 Discussion

In this paper we propose a method to detect anomalous regions in OCT images that requires only healthy training data. A deep convolutional autoencoder is used as feature extractor, while a One-Class SVM is used to estimate the distribution of healthy retinal patches. In a second step, the identified anomalous regions are segmented into subclasses by clustering. Results show that our proposed method is not only capable of finding anomalies in unseen images, but also categorizes these findings into clinically meaningful classes. The identification of new anomalies, instead of the automation of expert annotation of known anomalies, is a critical shift in medical image analysis. It will impact the categorization of diseases, the monitoring of treatment, and the discovery of relationships between genetic factors and phenotypes. Additionally, the power of our learned feature extractor is also indicated by its performance in a classification task.


This work was funded by the Austrian Federal Ministry of Science, Research and Economy, and the FWF (I2714-B31). A Tesla K40 used for this research was donated by the NVIDIA Corporation.


  • [1] D. Carrera, G. Boracchi, A. Foi, and B. Wohlberg. Detecting anomalous structures by convolutional sparse models. In Neural Networks (IJCNN), 2015 International Joint Conference on, pages 1–8. IEEE, 2015.
  • [2] M. Cho, S. Kwak, C. Schmid, and J. Ponce. Unsupervised object discovery and localization in the wild: Part-based matching with bottom-up region proposals. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1201–1210, 2015.
  • [3] D.-A. Clevert, T. Unterthiner, and S. Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289, 2015.
  • [4] C. Doersch, A. Gupta, and A. A. Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, pages 1422–1430, 2015.
  • [5] A. Dosovitskiy, J. T. Springenberg, M. Riedmiller, and T. Brox. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems, pages 766–774, 2014.
  • [6] S. M. Erfani, S. Rajasegarar, S. Karunasekera, and C. Leckie. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognition, 2016.
  • [7] M. K. Garvin, M. D. Abràmoff, X. Wu, S. R. Russell, T. L. Burns, and M. Sonka. Automated 3-d intraretinal layer segmentation of macular spectral-domain optical coherence tomography images. Medical Imaging, IEEE Transactions on, 28(9):1436–1447, 2009.
  • [8] M. Halkidi, Y. Batistakis, and M. Vazirgiannis. On clustering validation techniques. Journal of Intelligent Information Systems, 17(2):107–145, 2001.
  • [9] M. Holzer and R. Donner. Over-segmentation of 3d medical image volumes based on monogenic cues. In Proceedings of the 19th CVWW, pages 35–42, 2014.
  • [10] K. Hornik, I. Feinerer, M. Kober, and C. Buchta. Spherical k-means clustering. Journal of Statistical Software, 50(10):1–22, 2012.
  • [11] D. Huang, E. A. Swanson, C. P. Lin, J. S. Schuman, W. G. Stinson, W. Chang, M. R. Hee, T. Flotte, K. Gregory, C. A. Puliafito, et al. Optical coherence tomography. Science, 254(5035):1178–1181, 1991.
  • [12] M. A. Pimentel, D. A. Clifton, L. Clifton, and L. Tarassenko. A review of novelty detection. Signal Processing, 99:215–249, 2014.
  • [13] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.
  • [14] T. Schlegl, S. M. Waldstein, W.-D. Vogl, U. Schmidt-Erfurth, and G. Langs. Predicting semantic descriptions from medical images with convolutional neural networks. In Information Processing in Medical Imaging, pages 437–448. Springer, 2015.
  • [15] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural computation, 13(7):1443–1471, 2001.
  • [16] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
  • [17] L. Van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
  • [18] F. G. Venhuizen, B. van Ginneken, B. Bloemen, M. J. van Grinsven, R. Philipsen, C. Hoyng, T. Theelen, and C. I. Sánchez. Automated age-related macular degeneration classification in oct using unsupervised feature learning. In SPIE Medical Imaging, pages 94141I–94141I. International Society for Optics and Photonics, 2015.
  • [19] W. L. Wong, X. Su, X. Li, C. M. G. Cheung, R. Klein, C.-Y. Cheng, and T. Y. Wong. Global prevalence of age-related macular degeneration and disease burden projection for 2020 and 2040: a systematic review and meta-analysis. The Lancet Global Health, 2(2):e106–e116, 2014.
  • [20] J. Zhao, M. Mathieu, R. Goroshin, and Y. Lecun. Stacked what-where auto-encoders. arXiv preprint arXiv:1506.02351, 2015.