Modern machine learning methods require large amounts of training data. Such data are rarely available in the field of medical image analysis, since obtaining clinical annotations is often a costly process. Therefore, the possibility of synthetically generating medical visual data is greatly appealing, and has been explored for years. However, the realistic generation of high-quality medical imagery remains a complex, unsolved challenge for current computer vision methods.
Early methods for medical image generation consisted of digital phantoms, following simplified mathematical models of human anatomy collins_design_1998 . These models slowly evolved into more complex techniques, able to reliably model relevant aspects of the different acquisition devices. When combined with anatomical and physiological information arising from expert medical knowledge, they can produce realistic images fiorini_automatic_2014 . Such images are useful to validate image analysis techniques hodneland_physical_2016 , for medical training liu_simulation_2010 , and for therapy planning cai_integrated_2014 , among a wide range of other applications.
However, the traditional top-down approach of observing the available data and formulating mathematical models that explain it (image simulation) implies modeling complex natural laws through unavoidably simplifying assumptions. More recently, a new paradigm has arisen in the field of medical image generation, exploiting the bottom-up approach of learning the relevant information directly from the data. This is achieved with machine learning systems able to automatically learn the inner variability of a large training dataset tulder_why_2015 . Once trained, the same system can be sampled to output a new but plausible image (image synthesis).
In the general computer vision field, the synthesis of natural images has recently experienced dramatic progress, based on the general idea of adversarial learning goodfellow_generative_2014 . In this context, a generator component synthesizes images from random noise, and an auxiliary discriminator system trained on real data is assigned the task of discerning whether the generated data is real or not. During training, the generator is expected to learn to produce images that pose an increasingly difficult classification problem for the discriminator.
Although adversarial techniques have achieved great success in the generation of natural images, their application to medical imaging is still incipient. This is partly due to the lack of large amounts of training data, and partly to the difficulty of finely controlling the output of the adversarial generator. In this work, we propose to apply the adversarial learning framework to retinal images. Notably, instead of generating images from scratch, we propose to generate new plausible images from binary retinal vessel trees. Therefore, the task of the generator remains achievable, as it only needs to learn how to generate part of the retinal content, such as the optical disk or the texture of the background (Figure 1).
The remainder of this work is organized as follows: we first describe a recent generative adversarial framework isola_image–image_2016 that can be employed on pairs of vessel trees and retinal images to learn how to map the former to the latter. Then, we briefly review U-Net, a Deep Convolutional Neural Network architecture designed for image segmentation, which allows us to generate pairs of retinal images and corresponding binary vessel trees. This model provides us with a dataset of vessel trees and corresponding retinal images that we then use to train an adversarial model, producing new good-quality retinal images out of a new vessel tree. Finally, the quality of the generated images is evaluated qualitatively and quantitatively, and potential future research directions are described.
2 Adversarial Retinal Image Synthesis
2.1 Adversarial Translation from Vessel Trees to Retinal Images
Image-to-image translation is a relatively recent computer vision task in which the goal is to learn a mapping $G$, called Generator, from an image $x$ into another representation $y$ isola_image–image_2016 . Once the model has been trained, it is able to predict the most likely representation $G(x_{\text{new}})$ for a previously unseen image $x_{\text{new}}$.
However, for many problems a single input image can correspond to many different correct representations. If we consider the mapping $G$ between a retinal vessel tree $v$ and a corresponding retinal fundus image $r$, variations in color or illumination may produce many acceptable retinal images that correspond to the same vessel tree, i.e. $v \rightarrow \{r_1, r_2, \ldots, r_k\}$. Directly related to this is the choice of the objective function to be minimized while learning $G$, which turns out to be critical. Training a model to naively minimize the distance between $G(v)$ and $r$ for a collection of training pairs $\{(v_i, r_i)\}_{i=1}^{N}$ is known to produce low-quality results that lack detail lotter_pgn_2015 , because the model ends up selecting an average of many equally valid representations.
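The averaging effect can be seen in a toy example with hypothetical pixel values (not taken from this work), assuming the distance is the squared $L_2$ error: when one vessel tree admits two valid renderings, the single prediction that minimizes the summed error to both is their pixel-wise mean, which matches neither.

```python
# Hypothetical toy data: two equally valid 4-pixel "renderings" of the same
# vessel tree v, differing only in overall brightness.
r1 = [0.1, 0.2, 0.8, 0.2]
r2 = [0.5, 0.6, 1.0, 0.6]


def mse(a, b):
    """Mean squared error between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(pred, targets):
    """Summed MSE of one prediction against every valid target."""
    return sum(mse(pred, t) for t in targets)


def pixelwise_mean(targets):
    """Pixel-wise average of the targets: the minimizer of total_loss."""
    return [sum(px) / len(targets) for px in zip(*targets)]


pred = pixelwise_mean([r1, r2])
# The averaged prediction scores better than committing to either rendering...
print(total_loss(pred, [r1, r2]), total_loss(r1, [r1, r2]))
# ...yet it equals neither of them: it is a washed-out compromise.
print(pred)
```

Since a deterministic model can output only one image per vessel tree, this compromise is what a purely distance-based objective rewards.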
Instead of explicitly defining a particular loss function for each task, it is possible to employ Generative Adversarial Networks to implicitly build a more appropriate loss isola_image–image_2016 . In this case, the learning process attempts to maximize the misclassification error of a neural network (called Discriminator, $D$) that is trained jointly with $G$, but with the goal of discriminating between real and generated images. This way, not only $G$ but also the loss are progressively learned from examples, and they adapt to each other: while $G$ tries to generate increasingly plausible representations that can deceive $D$, $D$ becomes better at its task, which in turn improves the ability of $G$ to generate high-quality samples. Specifically, the adversarial loss is defined by:

$$\mathcal{L}_{\text{adv}}(G, D) = \mathbb{E}_{v, r \sim p_{\text{data}}(v, r)}\left[\log D(v, r)\right] + \mathbb{E}_{v \sim p_{\text{data}}(v)}\left[\log\left(1 - D(v, G(v))\right)\right] \tag{1}$$

where $\mathbb{E}_{v, r \sim p_{\text{data}}(v, r)}$ represents the expectation of the log-likelihood of the pair $(v, r)$ being sampled from the underlying probability distribution of real pairs $p_{\text{data}}(v, r)$, while $p_{\text{data}}(v)$ corresponds to the distribution of real vessel trees. An overview of this process is shown in Figure 2.
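For intuition, the two expectations in $\mathcal{L}_{\text{adv}}(G, D) = \mathbb{E}[\log D(v, r)] + \mathbb{E}[\log(1 - D(v, G(v)))]$ can be estimated by averaging over a mini-batch of discriminator outputs. The sketch below is plain Python with made-up probabilities; the function name and the clipping constant `eps` are illustrative choices, not part of the original method.

```python
import math


def adversarial_loss(d_real, d_fake, eps=1e-7):
    """Sample estimate of L_adv(G, D) from discriminator outputs.

    d_real: outputs D(v, r) on real pairs, probabilities in (0, 1).
    d_fake: outputs D(v, G(v)) on generated pairs.
    Returns mean(log D(v, r)) + mean(log(1 - D(v, G(v)))).
    D is trained to increase this value and G to decrease it.
    """
    def clip(p):
        # Keep probabilities away from 0 and 1 so the logs stay finite.
        return min(max(p, eps), 1.0 - eps)

    real_term = sum(math.log(clip(p)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - clip(p)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from generated pairs well...
confident = adversarial_loss([0.9, 0.95], [0.1, 0.05])
# ...versus one that G has fooled completely (it outputs 0.5 everywhere).
fooled = adversarial_loss([0.5, 0.5], [0.5, 0.5])
print(confident, fooled)
```

A well-performing $D$ pushes the value toward 0, while a generator that fully deceives $D$ drives it down to $2\log 0.5$.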
To generate realistic retinal images from binary vessel trees, we follow recent ideas from shrivastava_su_learning_2016 ; isola_image–image_2016 , which propose to combine the adversarial loss with a global $L_1$ loss to produce sharper results. Thus, the loss function to optimize becomes:

$$\mathcal{L}(G, D) = \mathcal{L}_{\text{adv}}(G, D) + \lambda\, \mathbb{E}_{v, r \sim p_{\text{data}}(v, r)}\left[\lVert r - G(v) \rVert_1\right] \tag{2}$$

where $\lambda$ balances the contribution of the two losses. The goal of the learning process is thus to find an equilibrium of this expression. The discriminator $D$ attempts to maximize eq. (2) by classifying each patch of a retinal image as coming from a real or a synthetic image, while the generator $G$ aims at minimizing it. The $L_1$ loss controls low-frequency information in the images generated by $G$ in order to produce globally consistent results, while the adversarial loss promotes sharp results. Once $G$ is trained, it is able to produce a realistic retinal image from a new binary vessel tree.
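The combined objective, with the discriminator scoring each patch separately, can be sketched as follows. This is an illustrative plain-Python version for a single pair $(v, r)$: the per-patch probabilities, the tiny images, and `lam` are hypothetical values, and the $L_1$ term is averaged per pixel rather than summed, which only rescales $\lambda$.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def patch_adversarial_term(d_real_patches, d_fake_patches, eps=1e-7):
    """Adversarial term when D outputs one real/fake probability per patch.

    Each argument is a flat list of per-patch probabilities for one image
    pair; the log terms are averaged over all patches.
    """
    def clip(p):
        return min(max(p, eps), 1.0 - eps)

    real = mean([math.log(clip(p)) for p in d_real_patches])
    fake = mean([math.log(1.0 - clip(p)) for p in d_fake_patches])
    return real + fake


def l1_term(real_img, fake_img):
    """Mean absolute difference between r and G(v), pixel by pixel."""
    return mean([abs(a - b) for a, b in zip(real_img, fake_img)])


def total_objective(d_real_patches, d_fake_patches, real_img, fake_img, lam):
    """L(G, D) = L_adv(G, D) + lam * L1 for a single (v, r) pair.

    D is updated to increase this value and G to decrease it; D only affects
    the adversarial term, so the L1 term only matters for G.
    """
    return (patch_adversarial_term(d_real_patches, d_fake_patches)
            + lam * l1_term(real_img, fake_img))


# Hypothetical example: a 2x2 grid of patch scores and 2-pixel images.
value = total_objective([0.5] * 4, [0.5] * 4, [0.0, 1.0], [0.5, 0.5], lam=10.0)
print(value)
```

When $G(v)$ reproduces $r$ exactly, the $L_1$ term vanishes and only the adversarial term remains, which is why $\lambda$ controls the trade-off between global consistency and sharpness.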
2.2 Obtaining Training Data
The model described above requires training data in the form of pairs of binary retinal vessel trees and corresponding retinal images. Since no such large-scale manually annotated database is available, we apply a state-of-the-art retinal vessel segmentation algorithm to obtain enough data for the model to learn the mapping from vessel trees to retinal images. A large number of methods are capable of providing reliable retinal vessel segmentations. Here we employ a supervised method based on Convolutional Neural Networks (CNNs), namely the U-Net architecture, first proposed in ronneberger_u-net:_2015 for the segmentation of biomedical images. This technique extends the idea of Fully Convolutional Networks, introduced in shelhamer_fcn_2015 , adapting it to be trained with a small number of images and to produce more precise segmentations.
The architecture of the U-Net consists of a downsampling path and an upsampling path. The first half of the network follows a typical CNN architecture, with stacked convolutional layers of stride two and Rectified Linear Unit (ReLU) activations. The second half upsamples the input feature map symmetrically to the downsampling path. The feature map of the last layer of the downsampling path is upsampled so that it has the same dimensions as the second-to-last layer. The result is concatenated with the feature map of the corresponding layer in the downsampling path, and this new feature map undergoes convolution and activation. This is repeated until the layers of the upsampling path reach the same dimensions as the first layer of the network.
The final layer is a convolution followed by a sigmoid activation that maps each feature vector into the vessel/non-vessel classes. The concatenation operation allows for very precise spatial localization, while preserving the coarse-level features learned along the downsampling path. A representation of this architecture, as used in the present work, is depicted in Figure 3.
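The size bookkeeping of the downsampling and upsampling paths can be traced with a few lines of Python. This is a sketch under stated assumptions, not the exact network of Figure 3: each downsampling layer is taken to halve the spatial size and double the channels, each upsampling step to double the size and concatenate the same-size skip connection, and a convolution to map the concatenated channels back to that level's width. The function name and sizes are hypothetical.

```python
def unet_shapes(input_size, base_channels, depth):
    """Trace (spatial size, channels) through a U-Net-like encoder/decoder.

    Assumptions (illustrative only):
    - each downsampling layer is a stride-2 convolution that halves the
      spatial size and doubles the number of channels;
    - each upsampling step doubles the spatial size, concatenates the skip
      connection of the same size, and a convolution maps the concatenated
      channels back to that level's width.
    """
    if input_size % (2 ** depth) != 0:
        raise ValueError("input_size must be divisible by 2**depth")
    down = [(input_size, base_channels)]
    size, ch = input_size, base_channels
    for _ in range(depth):
        size, ch = size // 2, ch * 2
        down.append((size, ch))
    up = []
    for skip_size, skip_ch in reversed(down[:-1]):
        size *= 2                     # upsample the current feature map
        assert size == skip_size      # sizes must match to concatenate
        concat_ch = ch + skip_ch      # concatenation with the skip connection
        ch = skip_ch                  # convolution back to this level's width
        up.append((size, concat_ch, ch))
    return down, up


down, up = unet_shapes(input_size=64, base_channels=16, depth=2)
print(down)  # [(64, 16), (32, 32), (16, 64)]
print(up)    # [(32, 96, 32), (64, 48, 16)]
```

The last upsampling stage returns to the input resolution, so a final per-pixel convolution and sigmoid can assign a vessel probability to every pixel. Under these assumptions, adding one level to each path lets the same bottleneck size correspond to an input twice as large.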
For the purpose of retinal vessel segmentation, the DRIVE database staal:2004-855 was used to train the method described above. The images and the ground-truth annotations were divided into overlapping patches of pixels and fed randomly to the U-Net, with 10% of the patches used for validation. The network was trained using the Adam optimizer kingma:adam_2014 with binary cross-entropy as the loss function.
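The patch-based data preparation can be sketched as below. This is a minimal illustration, assuming images and annotations are 2D lists; the patch size, stride, seed, and helper names are hypothetical choices, and only the 10% validation fraction comes from the text.

```python
import random


def extract_patches(image, patch, stride):
    """All patch x patch windows of a 2D list, moved by `stride` pixels.

    With stride < patch, neighbouring windows overlap.
    """
    h, w = len(image), len(image[0])
    return [
        [row[left:left + patch] for row in image[top:top + patch]]
        for top in range(0, h - patch + 1, stride)
        for left in range(0, w - patch + 1, stride)
    ]


def paired_patches(image, mask, patch, stride):
    """Cut the image and its ground-truth mask at the same positions."""
    return list(zip(extract_patches(image, patch, stride),
                    extract_patches(mask, patch, stride)))


def split_train_val(items, val_fraction=0.1, seed=0):
    """Shuffle the items and hold out `val_fraction` of them for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = round(len(items) * val_fraction)
    return items[n_val:], items[:n_val]


# Hypothetical 8x8 image and mask; patch size 4 and stride 2 are arbitrary.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
mask = [[int(c == 3) for c in range(8)] for _ in range(8)]
pairs = paired_patches(image, mask, patch=4, stride=2)
train, val = split_train_val(pairs)
print(len(pairs), len(train), len(val))  # 9 8 1
```

Cutting the image and its annotation with the same windows keeps every input patch aligned with its vessel labels after shuffling.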
Retinal vessel segmentation using the U-Net was evaluated on DRIVE’s test set, achieving an AUC aligned with state-of-the-art results Liskowski_2016 . The binarization threshold was chosen as the one maximizing the Youden index Youden_1950 . Messidor decenciere_feedback_2014 images were cropped to display only the field of view, and then downscaled. Then, the segmentation method was applied to these images. Messidor contains images annotated with the corresponding diabetic retinopathy grade, and displays more color and texture variability than DRIVE’s training images. Because the U-Net was trained and tested on different datasets, some of the produced segmentations were not entirely correct. This may be related to DRIVE containing only examples of images with signs of mild diabetic retinopathy (grade 1). For this reason, we decided to retain only the pairs of images and vessel trees whose image had grade 0, 1, or 2.
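Threshold selection by the Youden index $J = \text{sensitivity} + \text{specificity} - 1$ can be sketched as a search over candidate thresholds. The scores, labels, and candidate list below are hypothetical toy values; in practice they would be the per-pixel U-Net outputs and DRIVE ground truth.

```python
def youden_threshold(scores, labels, thresholds):
    """Pick the threshold maximizing Youden's J = sensitivity + specificity - 1.

    scores: per-pixel vessel probabilities from the segmentation network.
    labels: ground truth, 1 for vessel and 0 for background.
    A pixel is labeled vessel when its score is >= the threshold.
    Returns (best_threshold, best_J).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / positives + tn / negatives - 1
        if best is None or j > best[1]:
            best = (t, j)
    return best


# Hypothetical scores for four pixels: two background, two vessel.
print(youden_threshold([0.1, 0.3, 0.6, 0.8], [0, 0, 1, 1], [0.2, 0.5, 0.7]))
# (0.5, 1.0)
```

Maximizing $J$ weights missed vessels and false detections equally, which suits the strong class imbalance between vessel and background pixels.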
The final dataset collected for training our adversarial model consisted of the retained Messidor image pairs, which were randomly divided into training, validation, and test sets. Regarding image resolution, the original model in isola_image–image_2016 used lower-resolution image pairs, with a U-Net-like generator $G$. We modified the architecture to handle larger pairs, closer to the resolution of DRIVE images. For that, we added one layer to the downsampling part and another to the upsampling part of $G$. The discriminator $D$ classifies overlapping image patches. The implementation was developed in Python using Keras chollet_keras_2015 (code to reproduce our results is available at https://github.com/costapt/vess2ret). The learning process starts by training $D$ with real pairs $(v, r)$ and generated pairs $(v, G(v))$. Then, $G$ is trained with real pairs $(v, r)$. This process is repeated iteratively until the losses of $D$ and $G$ stabilize.
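The alternating schedule can be summarized as the loop below. This is a framework-agnostic sketch, not the released Keras code: `d_step` and `g_step` are hypothetical callables standing in for one update of $D$ and of $G$, and "stabilize" is interpreted, as an assumption, as both losses varying less than a tolerance over a window of recent iterations.

```python
from typing import Callable, List


def train_alternating(d_step: Callable[[], float],
                      g_step: Callable[[], float],
                      max_iters: int = 10_000,
                      window: int = 5,
                      tol: float = 1e-3) -> int:
    """Alternate D and G updates until both losses stop changing.

    d_step: hypothetical callable doing one D update on real pairs (v, r)
            and generated pairs (v, G(v)); returns D's loss.
    g_step: hypothetical callable doing one G update using real pairs (v, r);
            returns G's loss.
    Assumed stopping rule: over the last `window` iterations, each loss
    varies by less than `tol`. Returns the number of iterations run.
    """
    d_hist: List[float] = []
    g_hist: List[float] = []
    for it in range(1, max_iters + 1):
        d_hist.append(d_step())  # first update the discriminator...
        g_hist.append(g_step())  # ...then the generator
        if it >= window:
            d_win, g_win = d_hist[-window:], g_hist[-window:]
            if max(d_win) - min(d_win) < tol and max(g_win) - min(g_win) < tol:
                return it
    return max_iters


def make_decaying_loss(start: float) -> Callable[[], float]:
    """Stand-in for a real training step: the loss halves on every call."""
    state = [start]

    def step() -> float:
        state[0] *= 0.5
        return state[0]

    return step


n_iters = train_alternating(make_decaying_loss(1.0), make_decaying_loss(1.0))
print(n_iters)
```

In a real GAN the losses oscillate rather than decay monotonically, so the window and tolerance would need tuning, or training could simply run for a fixed number of epochs.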
3 Experimental Evaluation
Figure 4 shows some results for a subjective visual evaluation of the images generated by our model. The first row depicts a random sample of real images extracted from the held-out test set, which was not used during training. The second row shows the vessel trees segmented from those images with the method outlined in Section 2.2, and the bottom row shows the synthetic retinal images produced by the proposed technique. The original and generated images share some global geometric characteristics, which is natural since they approximately share the same vascular structure. However, the synthetic images have markedly different high-level visual features, such as the color and tone of the image, or the illumination. This information was extracted by our model from the training set and effectively applied to the input vessel trees in order to produce realistic retinal images.
The first seven columns of Figure 4 show results in which the model behaved as expected: the vessel trees retrieved from the images in the first row were approximately correct, and provided sufficient information for the generator to create new, consistent content in the synthetic images shown in the last row. The last column of Figure 4 shows a failure case of the proposed technique, in which the segmentation method described in Section 2.2 failed to produce a meaningful vessel network from the original image, probably due to the strong defocus of the input image. In this situation, the binary vessel tree supplied to the generator contained too little information, leading to spurious artifacts and chromatic noise in the synthetic image. Fortunately, such cases were relatively rare in our test set.
Objective image quality verification is known to be a hard challenge when no reference is available wang_why_2002 . In addition, for generative models it has recently been observed that specialized evaluation should be performed for each problem theis_note_2016 . In our case, to achieve a meaningful objective quantitative evaluation of the quality of the generated images, we apply two different retinal image quality metrics, namely the vessel-based quality score proposed in koler_2013 , and the Image Structure Clustering (ISC) metric niemeijer_image_2006 . Both metrics have previously been employed to assess the quality of retinal images. While the vessel-based score focuses on the contrast around vessel pixels, the ISC metric performs a more global evaluation. Together, they therefore provide an appropriate mechanism to quantitatively evaluate the correctness of a synthetically generated retinal image.
It is worth noting that in cases where artifacts and distortions were generated due to the undercomplete vessel network problem explained above, the ISC metric tended to artificially raise the quality score of the synthetic image relative to the real one. For this reason, synthetic images containing this class of degradation were manually identified and removed from the ISC analysis below, together with their real counterparts. A more detailed discussion of both retinal image quality metrics, and of their behavior when distorted images were supplied to them, is provided in Appendix A, together with supplementary results generated by the proposed technique.
The ISC score was computed on a reduced test set of 171 images (after removing the images with visual artifacts), while the vessel-based score was computed on all the test images. According to the Kolmogorov-Smirnov test, both quality score distributions were consistent with normality. The resulting data were therefore expressed as mean ± standard deviation, and compared with the paired Student’s t-test. All $p$-values were two-tailed and compared against a fixed significance threshold. Statistical analyses were performed using GraphPad Prism 7 (GraphPad Software Inc.). Results obtained with this methodology are shown in Table 1.
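The paired comparison can be sketched with the Python standard library. The scores below are hypothetical, not the values of Table 1, and the code only computes the paired $t$ statistic and its degrees of freedom; turning it into a two-tailed $p$-value needs the Student $t$ distribution, for which, if SciPy is available, `scipy.stats.ttest_rel` returns both the statistic and the $p$-value (and `scipy.stats.kstest` provides a Kolmogorov-Smirnov test).

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired Student's t statistic for matched samples a[i] and b[i].

    Returns (t, degrees_of_freedom). The two-tailed p-value is
    2 * P(T >= |t|) for a Student t variable with that many degrees of
    freedom; its CDF is not in the standard library.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical quality scores of four real images and their synthetic versions.
real = [0.80, 0.75, 0.90, 0.85]
synthetic = [0.78, 0.70, 0.88, 0.80]
print(f"real: {statistics.mean(real):.3f} ± {statistics.stdev(real):.3f}")
t, df = paired_t_statistic(real, synthetic)
print(round(t, 2), df)
```

Pairing each synthetic image with the real image whose vessel tree produced it removes the image-to-image variability from the comparison, which is why a paired rather than an unpaired test fits this design.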
Table 1. Quality scores of the real and synthetic images (mean score and standard deviation) for each quality metric.
Statistically significant results are shown in bold.
In the case of the ISC metric, the synthetic images obtained a slightly higher quality score than the real ones, although the difference between them was not statistically significant. For the vessel-based score, the real images were considered to be of better quality than their synthetic counterparts, and the difference was statistically significant. However, it should be considered that this score consists of an anisotropy measure weighted by the values of a simple vessel detector (see Appendix A.1). Since each synthetic image is generated from the vessel tree segmented from its real counterpart, the regions around the vessels of a synthetic image can hardly be expected to be of better quality than those of the original. On the other hand, the results on the ISC metric, which has a more global nature, point to a similar quality in the real and synthetic images, in agreement with the subjective visual quality of the produced images (see Appendix A.2).
4 Conclusions and Future Work
The above visual and quantitative results demonstrate the feasibility of learning to synthesize new retinal images from a dataset of pairs of retinal vessel trees and corresponding retinal images with current generative adversarial models. In addition, the produced images have a higher resolution than the images commonly generated in general computer vision problems. We believe that achieving this resolution was only possible due to the constrained class of images to which the method was applied: contrary to generic natural images, retinal images show a repetitive geometry, in which high-level structures such as the field of view, the optical disc, or the macula are usually present and act as a guide for the model to learn how to produce new texture and background intensities.
The main limitation of the presented method is its dependence on a pre-existing vessel tree in order to generate a new image. Furthermore, if the vessel tree comes from the application of a segmentation technique to the original image, the potential weaknesses of the segmentation algorithm will be inherited by the synthesized image. We are currently working on overcoming these challenges.
This work is financed by the ERDF – European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme, by National Funds through the FCT – Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project CMUP-ERI/TIC/0028/2014 and by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement within the project "NanoSTIMA: Macro-to-Nano Human Sensing: Towards Integrated Multimodal Health Monitoring and Analytics/NORTE-01-0145-FEDER-000016". MDA is the recipient of the Robert C. Watzke Professor of Ophthalmology and Visual Sciences. IDx LLC has no interest in any of the algorithms discussed in this study.
Appendix A Synthetic Retinal Image Quality Evaluation - Discussion
A.1 Image Quality Metrics
We now discuss the technical details of the two retinal image quality metrics employed in this work. The vessel-based score koler_2013 is a no-reference quality metric that proceeds by computing a local degree of vesselness around each pixel. This is achieved by building a multiscale representation of the input image, based on the local Hessian matrix around each pixel of the green channel. Frangi’s vesselness measure is then computed frangi_multiscale_1998 ; zhu_automatic_2010 , and the final quality score is obtained as an average of the local anisotropy values weighted by the vesselness map. This way, only vessel pixels are effectively considered by this metric, since they are expected to be good candidates for a reliable contrast and focus estimate.
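The two ingredients of the vessel-based score, a Hessian-based vesselness and a vesselness-weighted average, can be sketched as follows. This is a simplified, single-scale illustration: the multiscale combination, the parameter values `beta` and `c`, and the anisotropy computation of zhu_automatic_2010 are not reproduced, and the anisotropy values are assumed to be given. The vesselness follows Frangi's 2D formula with the sign convention for dark vessels on a brighter background.

```python
import math


def hessian_eigenvalues(hxx, hxy, hyy):
    """Eigenvalues of the symmetric 2x2 Hessian, ordered so |l1| <= |l2|."""
    root = math.sqrt((hxx - hyy) ** 2 + 4.0 * hxy ** 2)
    a = 0.5 * (hxx + hyy + root)
    b = 0.5 * (hxx + hyy - root)
    return (a, b) if abs(a) <= abs(b) else (b, a)


def frangi_vesselness(hxx, hxy, hyy, beta=0.5, c=15.0):
    """Single-scale Frangi vesselness for dark vessels on a brighter background.

    A dark tubular structure (such as a vessel in the green channel) has a
    large positive eigenvalue across the vessel and a small one along it.
    """
    l1, l2 = hessian_eigenvalues(hxx, hxy, hyy)
    if l2 <= 0:
        return 0.0  # not a dark tube-like structure
    rb = l1 / l2                      # blob measure: ~0 for lines, ~1 for blobs
    s = math.sqrt(l1 ** 2 + l2 ** 2)  # overall second-order structure strength
    return math.exp(-rb ** 2 / (2 * beta ** 2)) * (1 - math.exp(-s ** 2 / (2 * c ** 2)))


def vesselness_weighted_score(vesselness, anisotropy):
    """Average of per-pixel anisotropy values weighted by vesselness."""
    total = sum(vesselness)
    if total == 0:
        return 0.0
    return sum(v * a for v, a in zip(vesselness, anisotropy)) / total


tube = frangi_vesselness(10.0, 0.0, 0.0)   # strong curvature in one direction
blob = frangi_vesselness(10.0, 0.0, 10.0)  # equal curvature in both directions
print(tube, blob)
print(vesselness_weighted_score([0.0, 1.0, 3.0], [5.0, 1.0, 2.0]))  # 1.75
```

Pixels with zero vesselness contribute nothing to the weighted average, which is how background regions are effectively excluded from this kind of score.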
On the other hand, the Image Structure Clustering (ISC) metric proposed in niemeijer_image_2006 follows a substantially different approach. Although it is also a no-reference quality metric, it is trained on a dataset of retinal images. This dataset contained images (independent of our training set) that had been previously labeled by medical experts according to whether they showed enough visibility to perform a diagnosis. The ISC metric assesses whether pixel intensities are correctly distributed over the relevant anatomical structures present in the retina. This is achieved by extracting features consisting of intensities and Gaussian derivatives of the three color channels, and then employing k-means to group them into different clusters. These clusters are observed to be sufficient to model the relevant regions of a retinal image (vessels, optical disk, macula, background-to-foreground and foreground-to-background transitions). Histograms of the cluster memberships of the computed features are then passed to an SVM, which is trained to predict whether the presence and proportion of pixels associated with those structures is consistent with the corresponding quantities in the training set.
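The clustering-and-histogram step of the ISC metric can be sketched in plain Python. This is an illustration under stated assumptions: pixel features are taken to be precomputed vectors (for example, intensity and Gaussian-derivative responses), the centroid initialization is deliberately naive, and the feature extraction, the number of clusters used in niemeijer_image_2006 , and the final SVM classifier are not reproduced.

```python
def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: _sq_dist(point, centroids[j]))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; initial centroids are simply the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cluster_histogram(points, centroids):
    """Fraction of an image's pixel features assigned to each cluster."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return [c / len(points) for c in counts]


# Hypothetical 2D features: cluster centers are learned on training pixels...
centroids_fit = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
# ...and a new image is summarized by how its pixels fall into those clusters.
hist = cluster_histogram([(1, 0), (0, 2), (9, 9)], centroids_fit)
print(centroids_fit, hist)
```

The resulting histogram is the kind of fixed-length descriptor that a classifier such as an SVM can then compare against the proportions observed in images of acceptable quality.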
The two metrics therefore seem quite complementary, since the ISC technique considers regions of the image that are not addressed by the vessel-based score. In our experiments, however, we noticed that the artifacts produced when the generative model was provided with an undercomplete vessel tree tended to raise the ISC score. This drawback was not observed for the vessel-based score.
We believe the reason for this is the following: starting from a real image, our method employs the vessel tree extracted from it to synthesize a new image; thus, the number of vessel pixels present in a real image will always be greater than in the corresponding synthetic image, which favors the real image under the vessel-based score. The ISC metric relies not only on vessels, but also on other anatomical structures. In addition, it considers the three color channels, while the vessel-based score employs only one of them. When supplied with an image with artifacts such as those in Figure 5, the ISC metric finds that the proportion of colors and edges is not adequate, but still relatively acceptable (note that the scores assigned to the synthetic images are not high in these cases). This situation was detected in only a few images of our test set. Accordingly, for a fair comparison, those images were removed from the statistical experiments involving the ISC score. Since the vessel-based score seemed to be unaffected by this problem, we included every test image in its analysis.
We believe that current retinal image quality metrics are reasonably suitable for assessing the visual quality of synthetic images. However, the study of the anatomical plausibility of these images may benefit from specifically designed quality metrics, which may combine different aspects (local and global) of existing quality assessment approaches.
A.2 Supplementary Results
Below we show a random sample of the results produced by our model, together with their real counterparts.
Further results are displayed below:
-  D. L. Collins, A. P. Zijdenbos, V. Kollokian, J. G. Sled, N. J. Kabani, C. J. Holmes, and A. C. Evans. Design and construction of a realistic digital brain phantom. IEEE Transactions on Medical Imaging, 17(3):463–468, June 1998.
-  Samuele Fiorini, Lucia Ballerini, Emanuele Trucco, and Alfredo Ruggeri. Automatic generation of synthetic retinal fundus images. In Constantino Carlos Reyes-Aldasoro and Greg Slabaugh, editors, Medical Image Understanding and Analysis 2014, pages 7–12. BMVA Press, 2014.
-  Erlend Hodneland, Erik Hanson, Antonella Z. Munthe-Kaas, Arvid Lundervold, and Jan M. Nordbotten. Physical Models for Simulation and Reconstruction of Human Tissue Deformation Fields in Dynamic MRI. IEEE Transactions on Bio-Medical Engineering, 63(10):2200–2210, October 2016.
-  X. Liu, H. Liu, A. Hao, and Q. Zhao. Simulation of Blood Vessels for Surgery Simulators. In 2010 International Conference on Machine Vision and Human-machine Interface, pages 377–380, April 2010.
-  Jing Cai, You Zhang, Irina Vergalasova, Fan Zhang, W. Paul Segars, and Fang-Fang Yin. An Integrated Simulation System Based on Digital Human Phantom for 4D Radiation Therapy of Lung Cancer. Journal of Cancer Therapy, 2014, July 2014.
-  Gijs van Tulder and Marleen de Bruijne. Why Does Synthesized Data Improve Multi-sequence Classification? In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 531–538. Springer International Publishing, October 2015.
-  Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
-  Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. arXiv.org, November 2016. arXiv: 1611.07004.
-  William Lotter, Gabriel Kreiman, and David Cox. Unsupervised Learning of Visual Structure using Predictive Generative Networks. arXiv preprint arXiv:1511.06380, 2015.
-  Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, and Russ Webb. Learning from Simulated and Unsupervised Images through Adversarial Training. arXiv preprint arXiv:1612.07828, 2016.
-  Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241. Springer, Cham, October 2015.
-  Evan Shelhamer, Jonathan Long, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
-  J. J. Staal, M. D. Abramoff, M. Niemeijer, M. A. Viergever, and B. van Ginneken. Ridge based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging, 23(4):501–509, 2004.
-  Diederik Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations, pages 1–13, 2014.
-  P. Liskowski and K. Krawiec. Segmenting retinal blood vessels with deep neural networks. IEEE Transactions on Medical Imaging, 35(11):2369–2380, Nov 2016.
-  W. J. Youden. Index for rating diagnostic tests. Cancer, 3(1):32–35, 1950.
-  Etienne Decencière, Xiwei Zhang, Guy Cazuguel, Bruno Lay, Béatrice Cochener, Caroline Trone, Philippe Gain, Richard Ordonez, Pascale Massin, Ali Erginay, Béatrice Charton, and Jean-Claude Klein. Feedback on a publicly distributed database: the Messidor database. Image Analysis & Stereology, 33(3):231–234, August 2014.
-  François Chollet. Keras. https://github.com/fchollet/keras, 2015.
-  Z. Wang, A. C. Bovik, and L. Lu. Why is image quality assessment so difficult? In 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 4, pages IV–3313–IV–3316, May 2002.
-  L. Theis, A. van den Oord, and M. Bethge. A note on the evaluation of generative models. In International Conference on Learning Representations, 2016.
-  T. Köhler, A. Budai, M. F. Kraus, J. Odstrčilik, G. Michelson, and J. Hornegger. Automatic no-reference quality assessment for retinal fundus images using vessel segmentation. In Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems, pages 95–100, June 2013.
-  Meindert Niemeijer, Michael D. Abràmoff, and Bram van Ginneken. Image structure clustering for image quality verification of color retina images in diabetic retinopathy screening. Medical Image Analysis, 10(6):888–898, December 2006.
-  Alejandro F. Frangi, Wiro J. Niessen, Koen L. Vincken, and Max A. Viergever. Multiscale vessel enhancement filtering. In Medical Image Computing and Computer-Assisted Intervention — MICCAI’98, pages 130–137. Springer, Berlin, Heidelberg, October 1998. DOI: 10.1007/BFb0056195.
-  X. Zhu and P. Milanfar. Automatic Parameter Selection for Denoising Algorithms Using a No-Reference Measure of Image Content. IEEE Transactions on Image Processing, 19(12):3116–3132, December 2010.