Optical coherence tomography (OCT) is the most common imaging technique for diagnosis in ophthalmology. However, because image acquisition is based on the interference of coherent light, OCT suffers from speckle noise. This results in grainy, low-contrast images in which diagnosing medical conditions requires trained expert observers. Denoising of OCT has already been addressed in the literature and can be separated into two categories. The first employs denoising during OCT acquisition, e.g. by averaging multiple frames of the same object. This prolongs the acquisition process and is therefore not applicable to dynamic objects. The second category comprises post-processing methods such as median, bilateral, wavelet-based, or other linear and nonlinear filtering techniques. These can be executed in real time but are prone not only to blurring the image but also to erasing important disease-related details in the process. This paper describes a domain-specific post-processing method for denoising OCT images with machine learning, more specifically convolutional autoencoders (AEs), while maintaining disease characteristics.
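To make the classical post-processing category concrete, a naive median filter can be sketched in a few lines of NumPy. This is an illustrative reference implementation only, not one of the optimized baselines (TV, wavelet, AD) evaluated later in the paper:

```python
import numpy as np

def median_filter(img: np.ndarray, size: int = 3) -> np.ndarray:
    """Naive 2-D median filter: replace each pixel by the median of its
    size x size neighbourhood (borders are reflected before filtering)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

# A flat region with one speckle-like outlier: the median suppresses it.
img = np.full((5, 5), 10.0)
img[2, 2] = 255.0  # isolated bright speckle
den = median_filter(img)
print(den[2, 2])  # the outlier is replaced by the neighbourhood median, 10.0
```

This behaviour is exactly the trade-off noted above: isolated speckle is removed, but small, genuinely bright disease-related structures would be erased in the same way.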
The dataset used in this paper contains 84,484 retinal OCT images from 4,657 patients showing the disease states drusen, diabetic macular edema (DME), choroidal neovascularization (CNV), and normal, and is publicly available. First, a ResNet-34 image classifier pretrained on ImageNet is fine-tuned on the dataset. This acts as a medical expert, as it has been shown that the performance of convolutional neural networks (CNNs) in classifying retinal conditions is on par with that of trained ophthalmologists. Second, the ErfNet CNN autoencoder is trained to reconstruct input images
corrupted by additive white Gaussian noise, resulting in $\tilde{x} = x + n$ with $n \sim \mathcal{N}(0, \sigma^2)$. In general, an AE consists of two components. The encoder takes an input image $x$, or in our case $\tilde{x}$, and maps it from the high-dimensional input space to a low-dimensional latent representation $z$. This is then fed into the decoder and mapped back to a reconstructed image $\hat{x}$ in input space. The parameters of the AE are optimized by minimizing the pixel-wise mean squared reconstruction error $\mathcal{L}_{\mathrm{MSE}}(x, \hat{x}) = \lVert x - \hat{x} \rVert_2^2$
. Essentially, an autoencoder learns a low-dimensional representation, similar to principal component analysis (PCA). When training with a large dataset, noise tends to "average out" and the AE reconstructs distinct, relevant (noise-free) image features. To promote the enhancement of these features, the trained ResNet with fixed weights is used as an additional optimization criterion. It is applied to the reconstructed, denoised image and predicts the retinal disease class. This regularizes the AE during training and enhances disease characteristics in the denoised images. The proposed approach is therefore optimized using the weighted loss function
$\mathcal{L} = \mathcal{L}_{\mathrm{MSE}}(x, \hat{x}) + \lambda \, \mathcal{L}_{\mathrm{CE}}$ with denoised corrupted image $\hat{x}$, true disease label $y$ of image $x$, and cross entropy $\mathcal{L}_{\mathrm{CE}}$ for the ResNet prediction on $\hat{x}$. The weighting factor $\lambda$ for $\mathcal{L}_{\mathrm{CE}}$ was set empirically. A reduce-on-plateau learning rate schedule reduces the learning rate by a constant factor when the validation loss saturates. The weight configuration with the lowest loss on the validation set is chosen for testing (early stopping).
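The training objective can be sketched numerically with NumPy. This is a minimal sketch, assuming a single image and a single example per step; the helper names (`corrupt`, `total_loss`) and the values of `sigma` and `lam` are illustrative placeholders, not the paper's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, sigma):
    """Additive white Gaussian noise: x_tilde = x + n with n ~ N(0, sigma^2)."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def mse_loss(x, x_hat):
    """Pixel-wise mean squared reconstruction error."""
    return np.mean((x - x_hat) ** 2)

def cross_entropy(logits, label):
    """Softmax cross entropy of the fixed classifier for one example."""
    z = logits - logits.max()              # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def total_loss(x, x_hat, logits, label, lam):
    """Weighted objective: reconstruction error plus lambda-weighted
    classification loss of the classifier applied to the denoised image."""
    return mse_loss(x, x_hat) + lam * cross_entropy(logits, label)

x = rng.random((32, 32))                   # stand-in for a clean OCT B-scan
x_tilde = corrupt(x, sigma=0.1)            # corrupted network input
x_hat = x_tilde                            # placeholder "reconstruction"
logits = np.array([2.0, 0.1, -1.0, 0.3])   # classifier scores for 4 classes
loss = total_loss(x, x_hat, logits, label=0, lam=0.5)
```

With `lam = 0` the objective reduces to the plain denoising AE; the classification term only steers the reconstruction toward images the fixed classifier still recognizes, which is the regularization effect described above.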
The CNNs are optimized using 79,484 OCT images for training, 4,000 for validation, and 1,000 for testing. To assess denoising performance, the proposed method is compared with total variation (TV) minimization, wavelet denoising, and anisotropic diffusion (AD) in terms of peak signal-to-noise ratio (PSNR) and the classification performance of the ResNet.
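PSNR, the image-quality metric used in this comparison, has a standard closed form, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. A small sketch (the 8-bit peak value of 255 and the toy images are assumptions for illustration):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

clean = np.zeros((4, 4))
noisy = clean + 16.0          # constant error of 16 grey levels
print(round(psnr(clean, noisy), 2))  # → 24.05
```

Higher PSNR means the denoised image is pixel-wise closer to the reference; halving the error raises the score by about 6 dB.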
The results are summarized in Tab. 1. Our approach not only provides the highest disease classification accuracy after denoising, but also achieves the highest peak signal-to-noise ratio compared with the other methods.
(Fig. 1 column labels: original | corrupted | total variation | wavelet | AD | AE (ours))
Fig. 1 visualizes qualitative results for example OCT images from the test set showing different disease conditions. The methods are used to restore the input image (first column) from the corrupted image (second column). In contrast to the state-of-the-art denoising methods, our approach distinctly preserves the retinal layers while removing speckle noise. Pathological alterations of the retina remain clearly visible, and the explanatory power for diagnosis is not reduced. The mean processing time of the AE for one image is 13.1 ms on an NVIDIA GeForce GTX 1080 Ti.
It has been shown that convolutional AEs are capable of denoising retinal OCT images without suppressing disease characteristics. This was achieved by regularizing the denoising AE during training with a second CNN that was previously trained for disease classification. The trained decoder can also be used to generate new images by sampling the latent space. Future work therefore aims at variational AEs and generative adversarial networks for OCT denoising. It should be noted, however, that speckle noise can also carry significant information, as it creates a unique fingerprint of the tissue. This information cannot be interpreted by humans, and CNNs can be valuable tools for acquiring and utilizing it in the future.
Conflict of Interest
The authors declare that they have no conflict of interest.
The medical images used in this article were made available to the public in a previous study; therefore, formal consent is not required.
-  Salinas, H. M. and Fernandez, D. C., “Comparison of PDE-Based Nonlinear Diffusion Approaches for Image Enhancement and Denoising in Optical Coherence Tomography,” IEEE Transactions on Medical Imaging 26(6), 761–771 (2007).
-  Kermany, D. S., Goldbaum, M., Cai, W., Valentim, C. C., et al., “Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning,” Cell 172(5), 1122–1131 (2018).
-  He, K., Zhang, X., Ren, S., and Sun, J., “Deep residual learning for image recognition,” in [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)], 770–778 (2016).
-  Romera, E., Álvarez, J. M., Bergasa, L. M., and Arroyo, R., “ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation,” IEEE Transactions on Intelligent Transportation Systems 19(1), 263–272 (2018).
-  Kingma, D. P. and Ba, J., “Adam: A method for stochastic optimization,” arXiv e-prints (2014).
-  Chambolle, A., “An Algorithm for Total Variation Minimization and Applications,” Journal of Mathematical Imaging and Vision 20(1–2), 89–97 (2004).
-  Chang, S. G., Yu, B., and Vetterli, M., “Adaptive wavelet thresholding for image denoising and compression,” IEEE Transactions on Image Processing 9(9), 1532–1546 (2000).
-  Perona, P. and Malik, J., “Scale-space and edge detection using anisotropic diffusion,” IEEE Transactions on Pattern Analysis and Machine Intelligence 12(7), 629–639 (1990).