Optical coherence tomography (OCT) imaging from different camera devices causes challenging domain shifts and can cause a severe drop in accuracy for machine learning models. In this work, we introduce a minimal noise adaptation method based on a singular value decomposition (SVDNA) to overcome the domain gap between target domains from three different device manufacturers in retinal OCT imaging. Our method utilizes the difference in noise structure to successfully bridge the domain gap between different OCT devices and transfer the style from unlabeled target domain images to source images for which manual annotations are available. We demonstrate how this method, despite its simplicity, is comparable to or even outperforms state-of-the-art unsupervised domain adaptation methods for semantic segmentation on a public OCT dataset. SVDNA can be integrated with just a few lines of code into the augmentation pipeline of any network, in contrast to many state-of-the-art domain adaptation methods, which often need to change the underlying model architecture or train a separate style transfer model. The full code implementation for SVDNA is available at https://github.com/ValentinKoch/SVDNA.
Diseases of the human retina are among the leading causes of reduced vision and blindness globally. It is estimated that roughly 170 million people are currently affected by age-related macular degeneration [Pennington2016-vj], while diabetic retinopathy is recognized as a global epidemic with numbers increasing at ever higher rates [Lee2015-eu]. Optical coherence tomography (OCT) is a powerful technique used in many medical applications to generate real-time, non-invasive cross-sectional images of live biological tissues [Al-Mujaini2013-ww]. In the field of ophthalmology, OCT images help doctors to make therapy decisions and monitor the treatment outcome. As eye diseases are becoming more and more prevalent due to the increased age of populations [Fricke2018-ho], the need for research in automating diagnosis and aiding doctors is increasing.
Recently, deep learning methods have shown promising results in areas such as disease prediction [Banerjee2020-zd, Holmberg2020-rs], semantic segmentation [Borkovkina2020-cq, Hassan2021-zf] or improving the quality [Cheong2021-qc] of retinal OCT images. Although a lot of progress has been made, challenges remain in applying artificial intelligence methods to real-world OCT data, where image characteristics such as signal-to-noise ratio, brightness, and contrast can vary and cause changes in data distribution, so-called domain shifts. For OCT images, a particular domain shift is introduced by the use of different OCT imaging devices, which have different image-quality properties. While for a medical doctor those differences are only a mild annoyance, machine learning models can quickly fail when faced with only small disturbances in the underlying data distribution. One solution is to label images from all possible devices, but as manually labeling images is very costly and needs highly skilled specialists, other methods need to be developed.
Domain adaptation methods offer a solution to the reduced performance of AI algorithms that is caused by differences in data distribution [Guan2021-wh, Patel2015-bn]. Domain adaptation has also been used to bridge domain shifts between OCT imaging devices from different manufacturers. Yang et al. [Yang2020-cq] detect lesions in OCT images from different camera devices using an adversarial approach, and several recent works use CycleGAN approaches to transfer style between domains [Romo-Bucheli2020-az, Zhang2020-mu, Manakov2019-cw]. In particular, Romo-Bucheli et al. train a CycleGAN and measure the improved performance on a segmentation task, which makes it the most comparable work to ours [Romo-Bucheli2020-az]. While GAN architectures perform well when dealing with domain shifts where the structural difference between the domains is much larger than between OCT camera devices [yang2020phase], we argue that for domain adaptation between retinal OCT devices these models are unnecessarily complex and can, through small changes to image content, even be detrimental to performance.
In this work, we present a novel method for unsupervised domain adaptation (UDA) for semantic segmentation of retinal OCT images from multiple devices. The underlying observation motivating our approach is twofold. First, we observe that the noise structure is a key difference between images from different OCT cameras and needs to be specifically modeled. Second, we observe that many UDA methods are developed for unsupervised domain adaptation between synthetic and real-world data and are not necessarily optimal for retinal OCT images. We therefore develop and evaluate a simple but effective unsupervised domain adaptation method using a singular value decomposition-based noise adaptation approach (SVDNA). We train a semantic segmentation model using SVDNA and show that the model generalizes to unseen OCT devices and even performs on par with supervised methods trained with manual labels. Further, we show that the SVDNA method is comparable to or even outperforms other state-of-the-art UDA methods that often require more complex training schemes or the implementation and usage of separate style transfer models. Finally, to the best of our knowledge, previous work was evaluated on private datasets, making comparisons difficult, whereas we benchmark SVDNA on publicly available OCT datasets from multiple devices.
We make the following contributions to OCT imaging analysis and biomedical unsupervised domain adaptation:
We present SVDNA, a minimal method for unsupervised domain adaptation that transfers style between images by using a singular value decomposition-based noise transfer model.
We demonstrate that our method performs on par with or even outperforms more complex state-of-the-art UDA methods as well as models trained with supervised labels, while considerably reducing training complexity.
First, we introduce our singular value decomposition-based noise adaptation method. Second, we describe how we trained our segmentation network with the help of the SVDNA-restyled images.
SVDNA achieves style transfer between a source and a target domain image by decomposing both images into their respective singular value decompositions $U \Sigma V^{\top}$, where $U$ corresponds to the left singular vectors, $V$ to the right singular vectors and $\Sigma$ to the singular values. Then, the reconstructed noise from the target domain image is added to the reconstructed content of the source image. To this end, we use the first $k$ singular values and their corresponding right and left singular vectors from the source image, where the content is encoded, and add the noise that is encoded in the remaining singular values (from index $k+1$ onward) and their corresponding vectors from the target image. The resulting image can be expressed as a matrix multiplication, as can be seen in algorithm 1. The parameter $k$ must be chosen by hand but is, in our experience, not very sensitive to variation. In practice, values of $k$ between [...] and [...] were used to train our network with images of [...] pixels. For more details on the feasibility of the used values of $k$, see supplementary figure 2.
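The noise transfer described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' reference implementation (that is available in the repository linked in the abstract): the function name `svd_noise_transfer` and the argument name `k` (the number of leading source components kept as content) are chosen here for illustration, and both inputs are assumed to be single-channel images of identical shape.

```python
import numpy as np


def svd_noise_transfer(source: np.ndarray, target: np.ndarray, k: int) -> np.ndarray:
    """Combine the first k SVD components of `source` with the remaining ones of `target`."""
    if source.shape != target.shape:
        raise ValueError("source and target must have the same shape")

    # Thin SVD: image = U @ diag(s) @ Vt, singular values sorted in descending order.
    u_s, s_s, vt_s = np.linalg.svd(source.astype(np.float64), full_matrices=False)
    u_t, s_t, vt_t = np.linalg.svd(target.astype(np.float64), full_matrices=False)

    # Content: the k largest components of the source image.
    content = (u_s[:, :k] * s_s[:k]) @ vt_s[:k, :]
    # Noise: all components after the first k of the target image.
    noise = (u_t[:, k:] * s_t[k:]) @ vt_t[k:, :]

    # The result may leave the valid pixel range; clipping is handled separately.
    return content + noise
```

Two limiting cases are useful as a sanity check: with `k = 0` the function returns the target image, and with `k` equal to the number of singular values it returns the source image (both up to floating-point error).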
To account for possible out-of-bounds pixel values that can occur after this noise transfer operation, values are clipped to the valid pixel intensity interval. In addition, a histogram matching [Gonzalez2009-ky] step is performed after the noise adaptation to match the pixel intensity distribution of the target image. This step is motivated by the fact that, after combining target image noise and source image content, the pixel values of the resulting restyled image and the target image are still differently distributed. The effect of this step can be seen in the ablation study in table 1 of the supplementary material.
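A minimal sketch of the clipping and histogram matching steps is given below. The clipping bounds `low=0.0` and `high=255.0` are an assumption made for illustration (the interval used in the paper is not stated in this text), and the helper names `match_histogram` and `clip_and_match` are hypothetical. The matching is the standard quantile-mapping form of histogram matching.

```python
import numpy as np


def match_histogram(image: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Remap `image` intensities so their empirical CDF follows that of `reference`."""
    src_values, src_inverse, src_counts = np.unique(
        image.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_counts) / image.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    mapped_values = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped_values[src_inverse.ravel()].reshape(image.shape)


def clip_and_match(restyled, target, low=0.0, high=255.0):
    """Clip to [low, high] (assumed bounds), then match the target's intensity distribution."""
    clipped = np.clip(restyled, low, high)
    return match_histogram(clipped, target)
```

Because the mapping is monotone, the relative ordering of pixel intensities in the restyled image is preserved; only their distribution is changed.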
Combined, these steps not only achieve a visually convincing style transfer but also match different noise-related metrics of the target domain well, as seen in figure 2. We evaluate the noise transfer from our private source domain dataset, where images were taken with a Spectralis device, on three different datasets: two datasets from the RETOUCH challenge [Bogunovic2019-hh] (access can be requested at https://retouch.grand-challenge.org/), which use Cirrus and Topcon devices, and a dataset with images taken with a Bioptigen device [Farsiu2014-xv] (https://www.kaggle.com/paultimothymooney/farsiu-2014).
SVDNA can be used to train any segmentation network and is applied before augmenting the training images. When loading an image, either no style transfer (probability $1/D$) or an SVDNA style transfer to a randomly chosen target domain (probability $1 - 1/D$) is applied, where $D$ is the total number of domains, including the source domain. When a target domain is chosen, one image is randomly selected from it and used as a style target for the input image. When the source domain is chosen, no style transfer is applied. To maximize style variability, the hyperparameter $k$, which determines the amount of noise to be transferred, is randomly sampled for each style transfer individually within a fixed range. As the content of the source image is combined with the style of the target image, the annotated labels of the source dataset can be used as ground truth to train the network.
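The sampling scheme described above could look roughly as follows when an image is loaded. This is a sketch under assumptions: the function `sample_training_image`, the list-of-lists layout of `target_domains`, and the `k_range` bounds are hypothetical, and a compact SVD noise transfer is included only so that the block is self-contained.

```python
import random

import numpy as np


def svd_noise_transfer(source, target, k):
    """First k SVD components of `source` plus the remaining components of `target`."""
    u_s, s_s, vt_s = np.linalg.svd(source, full_matrices=False)
    u_t, s_t, vt_t = np.linalg.svd(target, full_matrices=False)
    return (u_s[:, :k] * s_s[:k]) @ vt_s[:k] + (u_t[:, k:] * s_t[k:]) @ vt_t[k:]


def sample_training_image(source_img, target_domains, k_range, rng):
    """Return `source_img` unchanged or restyled towards a randomly chosen target domain.

    target_domains: one list of unlabeled images per target domain (assumed layout).
    k_range: inclusive (low, high) bounds for k; concrete values are an assumption.
    """
    n_domains = len(target_domains) + 1       # target domains plus the source domain
    domain = rng.randrange(n_domains)         # every domain is equally likely
    if domain == 0:                           # index 0 stands for the source domain
        return source_img                     # no style transfer
    style_img = rng.choice(target_domains[domain - 1])  # random style target image
    k = rng.randint(k_range[0], k_range[1])   # fresh k for every style transfer
    return svd_noise_transfer(source_img, style_img, k)
```

The segmentation labels of `source_img` are left untouched in either branch, which is what allows the source annotations to serve as ground truth for the restyled image.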
We compare SVDNA against state-of-the-art domain adaptation approaches that we trained with the same segmentation model architecture, data processing pipeline, and augmentations. As the baseline, we use a network without any domain adaptation, which we compare to Fourier Domain Adaptation (FDA) [Yang2020-vd], Confidence regularized self-training (CRST) [Yang2019-ik], the CycleGAN approach [Romo-Bucheli2020-az] and an SVDNA-trained model. For each method, the main hyperparameters were individually optimized to include each model's best possible results in the comparison. Training details of all methods can be found in the supplementary material.
Our source domain training set consists of 462 OCT scans of the macula, taken with a Spectralis device (Spectralis; Heidelberg Engineering GmbH, Heidelberg, Germany) from different patients suffering from age-related macular degeneration. It is a private dataset annotated by three retinal experts of the Department of Ophthalmology, Ludwig-Maximilians-University, Munich (B.A., J.S. and M.H.), where each pixel of an OCT scan is labeled with one of 14 classes following the Consensus Nomenclature for Reporting Neovascular Age-Related Macular Degeneration Data of the AAO (American Academy of Ophthalmology) [SPAIDE2020616]: Intraretinal Fluid, Subretinal Fluid, Pigment Epithelium Detachments, Fibrosis, Epiretinal Membrane, Posterior Hyaloid Membrane, Subretinal Hyper Reflective Material, Neurosensory Retina, Choroid layers, Choroid Border, Vitreous and Subhyaloid Space, Retinal Pigment Epithelium, imaging artifacts, and image padding. For the quantitative evaluation, the RETOUCH challenge dataset is used, where images are taken with the Topcon and Cirrus devices and are annotated with Subretinal Fluid, Intraretinal Fluid, and Pigment Epithelium Detachments. The annotations of these OCT images were not used for training any algorithm, only for evaluating performance.
For all experiments, a Unet++ [Zhou2018-ab] with a ResNet18 [He2016-ux] encoder is used as the segmentation model. In figure 2a, SVDNA is applied to images of three different domains, showing how one image can be fitted to the style characteristics of different target domain images. Figure 2b shows a UMAP [mcinnes2020umap] embedding of a noise representation. For the embedding, three different noise statistics (signal-to-noise ratio [Janesick2007-wo], noise variance estimator [Immerkaer1996-og], and wavelet noise standard deviation estimator [Donoho1994-gx]) are used. After SVDNA, the noise embeddings of the restyled Spectralis images align closely with those of the target domains Topcon and Cirrus. Only for Bioptigen, a domain with a very high noise level, does a gap remain between the respective embeddings. Additionally, in figure 3 we compare SVDNA against state-of-the-art domain adaptation approaches for the task of semantic segmentation on two datasets consisting of images from a Topcon or a Cirrus device, respectively. As the baseline, a network without any domain adaptation is used, which is compared to Fourier Domain Adaptation (FDA) [Yang2020-vd], Confidence regularized self-training (CRST) [Yang2019-ik], the CycleGAN approach of Romo-Bucheli et al. [Romo-Bucheli2020-az], and our SVDNA-trained model. For each method, the individual hyperparameters were optimized to include each model's best possible results in the comparison. For further training details of all methods, we refer to the supplementary material. The evaluation is done with a 5-fold cross-validation over both datasets: in each fold, 80% of the target domain images are used as style targets for SVDNA or Fourier Domain Adaptation, or as training images for self-training, and the remaining 20% are used for evaluation; this is repeated until the algorithm has been evaluated on all data samples. The CycleGAN method was, due to its complicated proposed evaluation scheme, trained on all of the images. As done in the RETOUCH challenge, we measure the performance with the Dice score. The largest performance gain over the baseline can be seen for images from the Topcon device, where the SVDNA model consistently outperformed all other methods in segmenting subretinal fluid and PED and is on par with the CycleGAN method for intraretinal fluid. When considering the mean performance difference across all classes, the SVDNA model performs better than all other methods.
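As a rough illustration of the evaluation protocol described above, the following sketch computes a per-class Dice score on integer label maps and builds the five 80%/20% splits of target domain images. The function names, the fixed seed, and the choice to return NaN when a class is absent from both maps are assumptions for illustration; the exact aggregation used in the RETOUCH challenge may differ.

```python
import numpy as np


def dice_score(pred, truth, class_id):
    """Dice overlap for one class between two integer label maps of equal shape."""
    p = np.asarray(pred) == class_id
    t = np.asarray(truth) == class_id
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")  # class absent in both maps; this convention is an assumption
    return float(2.0 * np.logical_and(p, t).sum() / denom)


def five_fold_indices(n_images, seed=0):
    """Yield (style_idx, eval_idx) pairs so that every image is evaluated exactly once."""
    order = np.random.default_rng(seed).permutation(n_images)
    folds = np.array_split(order, 5)
    for i, eval_idx in enumerate(folds):
        # The other four folds (about 80%) serve as unlabeled style targets.
        style_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield style_idx, eval_idx
```

In each fold, only the images in `style_idx` would be shown to the adaptation method (without labels), and the Dice score would be computed on the images in `eval_idx`.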
We also benchmark SVDNA on the separate held-out non-public test datasets, where we compare our domain adaptation method to results achieved by fully supervised networks submitted to the RETOUCH challenge. We submitted predictions of the naive baseline as well as of our SVDNA-trained model, again on the three classes Subretinal Fluid (SRF), Intraretinal Fluid (IRF) and Pigment Epithelium Detachments (PED). With SVDNA, we achieved the second-best result of 10 submitted segmentation methods on the Topcon dataset and the sixth-best on the Cirrus dataset. SVDNA achieved a large improvement over the baseline, as well as a very competitive performance compared to fully supervised models, as can be seen in table 1. A qualitative comparison of the methods can be seen in supplementary figure 1, where the three retinal experts (B.A., J.S. and M.H.) annotated all 14 classes on images from the two domains Topcon and Cirrus, as well as on the dataset of the manufacturer Bioptigen. There, accurate segmentations of the SVDNA model can be seen, whereas other methods often struggle to segment correctly.
We demonstrate that the minimal SVDNA method outperforms or performs on par with state-of-the-art UDA methods and allows for accurate cross-device segmentation of OCT images without using any additional labeled data. The other main benefit of the SVDNA method is that it integrates directly into the regular training pipeline of semantic segmentation networks and does not need any modifications to the model's architecture or training of a separate style transfer model as done in the CycleGAN approach [Romo-Bucheli2020-az], requirements which can influence the feasibility of applying a method in practice. One possible limitation of the SVDNA method could be that it does not necessarily denoise images well. This would mean that domain adaptation from Spectralis, a less noisy domain, to Topcon, Cirrus, or Bioptigen works well, but the opposite direction might not be as successful. One possible solution would then be to do test-time SVDNA style transfer: when predicting on low-noise images with a model trained on high-noise images, one could add noise to the images before feeding them into the model, similar to the idea used by the CycleGAN approach, where images are restyled from the target to the source domain at test time. The other benchmarked state-of-the-art domain adaptation methods did not consistently perform well for all devices. In our experiments with the CycleGAN model, we found that it can sometimes slightly alter the content of the image or even fail completely to produce images similar to the input OCT image. As OCT biomarkers such as intraretinal cysts and PEDs are often represented by ambiguous and hard-to-detect textural features, even the slightest distortion to the morphology of the tissue can cause incorrect segmentation results. For examples of content distortions produced by an optimized CycleGAN, see supplementary figure 4.
It is worth noting that in other domains, such as natural images, small content distortions might have a smaller effect on the segmentation performance. The features that distinguish objects such as cars, houses, and trees are considerably different from those representing OCT biomarkers. The FDA method [Yang2020-vd] does not directly distort the content of the source image; however, with it we were not able to improve meaningfully over the baseline. Depending on the hyperparameter settings, either little to no style transfer was achieved for small beta values, or image-distorting artifacts were introduced into images for higher beta values, as can be seen in supplementary figure 3. Finally, using a self-training-based domain adaptation method did not work well in our experiments, which might be due to the limited size of the datasets.
The present contribution is supported by the Helmholtz Association under the joint research school “Munich School for Data Science - MUDS”. H.S. and F.J.T. acknowledge support by the BMBF (grant number: 031L0210A) and by the Helmholtz Association’s Initiative and Networking Fund through Helmholtz AI (grant number: ZT-I-PF-5-01). We want to thank Dr. Carsten Marr for his support and Dr. Hrvoje Bogunović for his help in evaluating our results on the non-public test set.