Semantic denoising autoencoders for retinal optical coherence tomography

03/23/2019 ∙ by Max-Heinrich Laves, et al. ∙ uni hannover

Noise in speckle-prone optical coherence tomography tends to obfuscate important details necessary for medical diagnosis. In this paper, a denoising approach that preserves disease characteristics in retinal optical coherence tomography images in ophthalmology is presented. By combining a deep convolutional autoencoder with a previously trained ResNet image classifier as regularizer, the perceptibility of delicate details is encouraged and only uninformative background noise is filtered out. With our approach, a higher peak signal-to-noise ratio of PSNR = 31.2 dB and a higher classification accuracy of ACC = 85.0 % are achieved for denoised images, compared to the best state-of-the-art results of PSNR = 29.4 dB and ACC = 70.3 %, which are obtained by different methods. It is shown that regularized autoencoders are capable of denoising retinal OCT images without blurring details of diseases.




1 Purpose

Optical coherence tomography (OCT) is the most common imaging technique for diagnosis in ophthalmology. However, because image acquisition is based on the interference of coherent light, OCT suffers from speckle noise. This results in grainy, low-contrast images whose diagnostic interpretation requires trained expert observers. Denoising of OCT has already been addressed in the literature, and the approaches can be separated into two categories [1]. The first employs denoising during OCT acquisition, e.g. by averaging multiple frames of the same object. This prolongs the acquisition process and is therefore not applicable to dynamic objects. The second category comprises post-processing methods such as median, bilateral, wavelet-based or other linear and nonlinear filtering techniques. These can be executed in real time but are prone not only to blurring the image but also to erasing important disease-related details in the process. This paper describes a domain-specific post-processing method for denoising OCT images with machine learning, more specifically convolutional autoencoders (AE), that maintains disease characteristics.

2 Methods

The dataset used in this paper contains 84,484 retinal OCT images from 4,657 patients showing the disease states drusen, diabetic macular edema (DME), choroidal neovascularization (CNV) and normal, and is publicly available [2]. First, a ResNet-34 image classifier [3] pretrained on ImageNet is fine-tuned on the dataset. It acts as a medical expert, as it has been shown that the performance of convolutional neural networks (CNNs) in classifying retinal conditions is on par with that of trained ophthalmologists [2]. Second, the ErfNet CNN autoencoder [4] is trained to reconstruct input images that have been corrupted by additive Gaussian white noise.

In general, an AE consists of two components. The encoder takes an input image, or in our case its corrupted version, and maps it from the high-dimensional input space into a low-dimensional latent representation. This is then fed into the decoder and mapped back to a reconstructed image in input space. The parameters of the AE are optimized by minimizing the pixel-wise mean squared reconstruction error. Essentially, an autoencoder learns a low-dimensional representation similar to principal component analysis (PCA). When training with a large dataset, noise tends to “average out” and the AE reconstructs distinct and relevant (noise-free) image features.

In order to promote enhancement of these features, the trained ResNet with fixed weights is used as an additional optimization criterion. It is applied to the reconstructed, denoised image and tries to predict the retinal disease class. This regularizes the AE during training and enhances disease characteristics in denoised images. The proposed approach is therefore optimized using a weighted loss function that combines the mean squared reconstruction error with the cross entropy between the ResNet prediction for the denoised image and the true disease label of the original image. The weighting factor was set empirically.

The method is implemented with PyTorch 1.0 and trained for 200 epochs using the Adam optimizer [5]. Reduce-on-plateau learning rate scheduling lowers the learning rate when the validation loss saturates. The weight configuration with the lowest loss value on the validation set is chosen for testing (early stopping).
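The weighted objective described in this section can be sketched in a few lines. The following dependency-free Python only illustrates the arithmetic and is not the authors' PyTorch code: the function names, the parameter name `weight`, the decision to scale the cross-entropy term rather than the reconstruction term, and all example numbers are assumptions.

```python
import math


def mse(reference, reconstruction):
    """Pixel-wise mean squared error between two equally sized flat images."""
    return sum((r - d) ** 2 for r, d in zip(reference, reconstruction)) / len(reference)


def cross_entropy(logits, label):
    """Cross entropy of one sample, computed from raw class scores (log-sum-exp form)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[label]


def regularized_loss(clean, denoised, class_logits, label, weight):
    """Reconstruction error plus a weighted classification penalty.

    `weight` is a hypothetical name for the empirically chosen weighting factor;
    which term it scales is an assumption of this sketch.
    """
    return mse(clean, denoised) + weight * cross_entropy(class_logits, label)


# Toy example: a 4-pixel "image" and scores for the four disease classes.
clean = [0.0, 0.5, 1.0, 0.5]
denoised = [0.1, 0.5, 0.9, 0.5]      # stand-in for the autoencoder output
class_logits = [2.0, 0.0, 0.0, 0.0]  # stand-in for the frozen classifier output
print(regularized_loss(clean, denoised, class_logits, label=0, weight=0.1))
```

In PyTorch the two terms would typically be computed with `torch.nn.functional.mse_loss` and `torch.nn.functional.cross_entropy`. Keeping the classifier weights fixed means that gradients still pass through the classifier to the autoencoder, but only the autoencoder parameters are updated.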

3 Results

The CNNs are optimized using 79,484 OCT images for training, 4,000 for validation and 1,000 for testing. To assess denoising performance, the proposed method is compared to total variation (TV) minimization [6], wavelet [7], and anisotropic diffusion (AD) [8] denoising with regard to peak signal-to-noise ratio (PSNR) and the classification performance of the ResNet.
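For readers unfamiliar with the metric, the sketch below shows how an image can be corrupted with additive Gaussian white noise and how PSNR is then computed against the clean reference. It is a minimal illustration in plain Python: the noise level `sigma`, the intensity range of [0, 1], and the function names are assumptions and are not taken from the paper.

```python
import math
import random


def add_gaussian_noise(image, sigma, rng):
    """Corrupt a flat image with additive Gaussian white noise of standard deviation sigma."""
    return [p + rng.gauss(0.0, sigma) for p in image]


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB for images whose maximum value is `peak`."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


rng = random.Random(0)      # fixed seed for repeatability
image = [0.5] * 10000       # flat toy image with intensities in [0, 1]
noisy = add_gaussian_noise(image, sigma=0.1, rng=rng)  # sigma is hypothetical
print(f"PSNR of corrupted image: {psnr(image, noisy):.1f} dB")
```

With a peak value of 1 and a noise standard deviation of 0.1, the expected mean squared error is 0.01, which corresponds to a PSNR of about 20 dB. A denoiser raises PSNR by lowering the mean squared error relative to the clean image.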

           corrupted   TV     wavelet   AD     AE (ours)
PSNR (dB)  19.2        29.4   28.0      24.6   31.2
ACC (%)    50.2        49.3   52.6      70.3   85.0
Table 1: Results of denoising reported for the test set with mean peak signal-to-noise ratio (PSNR) in dB and mean classification accuracy (ACC) in %. Values for corrupted images are given for comparison. The best results in both rows are those of AE (ours).

The results are summarized in Tab. 1. Our approach not only provides the highest disease classification accuracy after denoising, with ACC = 85.0 %, but also the highest peak signal-to-noise ratio, with PSNR = 31.2 dB, compared to the other methods.
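The margins implied by Table 1 can be read off with a short script. The numbers are copied from the table; the dictionary layout and helper names are only illustrative.

```python
# Test-set means copied from Table 1.
results = {
    "corrupted": {"psnr_db": 19.2, "acc_pct": 50.2},
    "TV": {"psnr_db": 29.4, "acc_pct": 49.3},
    "wavelet": {"psnr_db": 28.0, "acc_pct": 52.6},
    "AD": {"psnr_db": 24.6, "acc_pct": 70.3},
    "AE (ours)": {"psnr_db": 31.2, "acc_pct": 85.0},
}


def best_method(metric):
    """Name of the column with the highest value for the given metric."""
    return max(results, key=lambda name: results[name][metric])


def margin_over_runner_up(metric):
    """Difference between the best and second-best value, rounded to one decimal."""
    ranked = sorted((row[metric] for row in results.values()), reverse=True)
    return round(ranked[0] - ranked[1], 1)


for metric in ("psnr_db", "acc_pct"):
    print(metric, best_method(metric), margin_over_runner_up(metric))
```

The AE leads the best baseline by 1.8 dB in PSNR (over TV) and by 14.7 percentage points in accuracy (over AD). Among the baselines, a higher PSNR does not imply a higher accuracy: TV has the best baseline PSNR, yet its accuracy of 49.3 % is slightly below that of the corrupted images.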

Figure 1: Results of our approach compared to state-of-the-art denoising for retinal OCT disease conditions from the test set. Columns from left to right: original, corrupted, total variation, wavelet, AD, AE (ours). Digital zoom is recommended for optimal comparison.

Fig. 1 visualizes qualitative results for example OCT images from the test set showing different disease conditions. The methods are used to restore the input image (first column) from the corrupted image (second column). In contrast to state-of-the-art denoising, our approach distinctly preserves the retinal layers while removing speckle noise. Pathological alterations of the retina remain clearly visible and the explanatory power for diagnosis is not reduced. The mean processing time of the AE for one image is 13.1 ms on an NVIDIA GeForce GTX 1080 Ti.
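How the processing time was measured is not described. One common way to obtain a mean per-image time, and to convert it into a throughput, is sketched below. The helper names and the warm-up convention are assumptions of this sketch.

```python
import time


def mean_processing_time_ms(process, images, warmup=1):
    """Average wall-clock time per image in milliseconds."""
    for image in images[:warmup]:
        process(image)  # warm-up calls are not timed
    start = time.perf_counter()
    for image in images:
        process(image)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(images)


def images_per_second(ms_per_image):
    """Throughput that corresponds to a given per-image time."""
    return 1000.0 / ms_per_image


# Stand-in workload; a real measurement would call the trained autoencoder instead.
dummy_images = [[0.0] * 1000 for _ in range(20)]
print(mean_processing_time_ms(sum, dummy_images))
print(images_per_second(13.1))
```

A per-image time of 13.1 ms corresponds to roughly 76 images per second. When timing a model on a GPU, pending kernels have to be finished before the clock is read, for example with `torch.cuda.synchronize()` in PyTorch, because kernel launches are asynchronous.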

4 Conclusion

It has been shown that convolutional AEs are capable of denoising retinal OCT images without suppressing characteristics of diseases. This was achieved by regularizing the denoising AE during training with another CNN, which was previously trained for disease classification. The trained decoder can also be used to generate new images by sampling the latent space. Future work therefore aims at variational AEs and generative adversarial networks for OCT denoising. It should be noted, however, that speckle noise can also contain significant information, as it creates a unique fingerprint of tissue. This information cannot be interpreted by humans, but CNNs could become valuable tools for extracting and utilizing it in the future.

Conflict of Interest

The authors declare that they have no conflict of interest.

Formal Consent

The medical images used in this article were made available to the public in a previous study [2]; therefore, formal consent is not required.


  • [1] Salinas, H. M. and Fernandez, D. C., “Comparison of PDE-Based Nonlinear Diffusion Approaches for Image Enhancement and Denoising in Optical Coherence Tomography,” IEEE Transactions on Medical Imaging 26(6), 761–771 (2007).
  • [2] Kermany, D. S., Goldbaum, M., Cai, W., Valentim, C. C., et al., “Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning,” Cell 172(5), 1122–1131 (2018).
  • [3] He, K., Zhang, X., Ren, S., and Sun, J., “Deep residual learning for image recognition,” in [IEEE Conference on Computer Vision and Pattern Recognition (CVPR)], 770–778 (2016).
  • [4] Romera, E., Álvarez, J. M., Bergasa, L. M., and Arroyo, R., “ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation,” IEEE Trans. Intell. Transp. Syst. 19(1), 263–272 (2018).
  • [5] Kingma, D. P. and Ba, J., “Adam: A method for stochastic optimization,” arXiv e-prints (2014).
  • [6] Chambolle, A., “An Algorithm for Total Variation Minimization and Applications,” Journal of Mathematical Imaging and Vision 20(1–2), 89–97 (2004).
  • [7] Chang, S. G., Yu, B., and Vetterli, M., “Adaptive wavelet thresholding for image denoising and compression,” IEEE Transactions on Image Processing 9(9), 1532–1546 (2000).
  • [8] Perona, P. and Malik, J., “Scale-space and edge detection using anisotropic diffusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence 12(7), 629–639 (1990).