, including object detection, image classification, face recognition, and medical image analysis. Large-scale training data is extremely important for training accurate, deep models. Although it is easy to collect data in conventional computer vision tasks, it is often difficult to obtain sufficient high quality data in medical imaging. Recently, Generative Adversarial Networks (GANs) have been proposed to generate a distribution that matches the real data distribution via an adversarial process. Owing to their powerful image generation capability, GANs have been successfully applied to many medical image synthesis tasks, including the synthesis of retinal fundus [2, 19], X-ray, CT and MRI images.
GAN algorithms can be divided into conditional and unconditional approaches. Conditional GANs direct the data generation process by conditioning the model on additional information, and have been widely used in cross-modality synthesis and conditioned segmentation. For example, the pix2pix method was proposed to translate images from one type to another. The auxiliary classifier GAN (ACGAN) produces higher quality samples by adding more structure to the GAN latent space. A CT and MRI translation network has also been proposed to segment multimodal medical volumes. By contrast, unconditional GANs synthesize images from random noise without any conditional constraint, and are mainly used to generate images. For example, Deep Convolutional GAN (DCGAN) uses a deep convolutional structure to generate images. S²-GAN uses a two-stage network and depth maps to generate images with realistic surface normal maps (i.e., RGBD images). However, S²-GAN requires depth maps for the training dataset, while medical image datasets usually do not come with paired depth maps. Wasserstein GAN (WGAN) improves the loss and training stability of previous GANs to obtain better performance. Progressive Growing GAN (PGGAN) progressively grows the depth of the convolutional layers to produce high resolution natural images.
In this paper, we aim to generate high quality medical images with correct anatomical objects and realistic foreground structures. Inspired by the realistic drawing procedure of human painting, which is composed of stroking and color rendering, we propose a novel unconditional GAN named Sketching-rendering Unconditional Generative Adversarial Network (SkrGAN) for medical image synthesis. Our SkrGAN is decomposed into two tractable sub-modules: a sketch guidance module that generates the structural sketch from random noise, and a color render mapping module that produces the structure-preserved medical images. The main contributions of this paper are summarized as follows:
1) An unconditional GAN, named SkrGAN, is proposed for medical image synthesis. By decomposing the whole image generation into sketch guidance and color rendering stages, our SkrGAN can embed the sketch structural representations to guide high quality medical image generation.
2) Experiments on synthesis tasks in four medical imaging modalities show that our SkrGAN is more accurate and more robust to variations in the size, intensity inhomogeneity and modality of the data than other state-of-the-art GAN methods.
3) The medical image segmentation experiments demonstrate that our SkrGAN can be applied as a data augmentation method to improve segmentation performance effectively.
2 Proposed Method
Inspired by the realistic drawing skills of human painting, which suggest that a painting is usually accomplished by moving from simple to difficult procedures, i.e., from sketching to color rendering, we propose a novel Sketching-rendering Unconditional Generative Adversarial Network (SkrGAN) to generate high quality medical images with realistic anatomical structures. As shown in Fig. 2, we decompose the entire image generator into two phases: a sketch guidance module (Sec. 2.2) and a color render mapping (Sec. 2.3). The sketch guidance module generates the sketch structural representations with a sketch discriminator, while the color render mapping embeds the sketch representations to generate the final image with a color discriminator.
2.1 Sketch Draft Preparation
In order to train our SkrGAN, the sketch discriminator requires a sketch draft corresponding to each input training image. We aim to retain the main structural information of the given images, such as the blood vessels of the retinal fundus and the bones in X-ray images. In our method, the Sobel edge detection method is first used to extract the initial structural boundaries, and then Gaussian lowpass filtering is applied to remove isolated noise and pixels. Finally, a morphological operation consisting of an opening process followed by a closing process is employed to remove noise further and fill the vessel-like structures. This procedure greatly reduces the complexity of the sketch images, which makes sketch synthesis easier than with traditional edge detection methods alone. An example of the sketch draft detection method can be found at the bottom of Fig. 2, where the main sketch structures (e.g., vessels and bones) are extracted.
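The three steps above (Sobel edges, Gaussian lowpass filtering, then opening followed by closing) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names, the normalization to [0, 1], the value of `sigma`, and the 3x3 structuring element are all assumptions that the text does not specify.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img, size):
    """All size x size neighbourhoods of a 2-D image (edge-padded)."""
    pad = size // 2
    return sliding_window_view(np.pad(img, pad, mode="edge"), (size, size))


def sobel_magnitude(img):
    """Gradient magnitude computed with the 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    win = _windows(img.astype(float), 3)
    gx = (win * kx).sum(axis=(-1, -2))
    gy = (win * kx.T).sum(axis=(-1, -2))
    return np.hypot(gx, gy)


def gaussian_lowpass(img, sigma=1.0):
    """Separable Gaussian smoothing (sigma is a hypothetical setting)."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def erode(img, size=3):
    return _windows(img, size).min(axis=(-1, -2))


def dilate(img, size=3):
    return _windows(img, size).max(axis=(-1, -2))


def extract_sketch(img, sigma=1.0, size=3):
    """Sobel -> Gaussian lowpass -> opening -> closing."""
    edges = sobel_magnitude(img)
    peak = edges.max()
    if peak > 0:                                  # assumed normalization to [0, 1]
        edges = edges / peak
    smooth = gaussian_lowpass(edges, sigma)
    opened = dilate(erode(smooth, size), size)    # opening removes small specks
    return erode(dilate(opened, size), size)      # closing fills thin gaps
```

For a grayscale image stored as a 2-D array, `extract_sketch(img)` returns a sketch of the same shape. In practice the smoothing width and structuring element would need tuning per modality.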
2.2 Sketch Guidance Module
Given the dataset and the corresponding sketch draft set obtained by the sketch draft extraction, the sketch guidance module is trained with the loss of the sketch discriminator (Eq. (1)). The loss is defined in terms of a noise pattern and a latent code, which enter through an element-wise multiplication, and a data distribution. The discriminator consists of discriminating layers at different levels, whose inputs are at different resolutions, and the generator consists of generating layers at the corresponding resolutions. More concretely, our method iteratively adds convolutional layers to the generator and the discriminator during training, so that images are synthesized at progressively higher resolutions. Additionally, the training process fades in each new high resolution layer smoothly by using skip connections and smooth coefficients. For simplicity, we utilize the network structure of PGGAN as the backbone of the sketch guidance module.
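The smooth fade-in of a newly added high resolution layer can be illustrated with a small NumPy sketch in the style of progressive growing. It is only an illustration under stated assumptions: the names `upsample2x`, `fade_in` and `fade_alpha`, the nearest-neighbour upsampling, and the linear schedule are hypothetical choices, not details given in the text.

```python
import numpy as np


def upsample2x(img):
    """Nearest-neighbour 2x upsampling of an (H, W) array."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)


def fade_alpha(step, fade_steps):
    """Linear smooth coefficient that ramps from 0 to 1 (assumed schedule)."""
    return min(1.0, step / fade_steps)


def fade_in(low_res, high_res, alpha):
    """Blend the skip connection (upsampled previous stage) with the new stage.

    alpha = 0 uses only the previous, lower-resolution output;
    alpha = 1 uses only the newly added high resolution layer.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * upsample2x(low_res) + alpha * high_res
```

As `alpha` grows during training, the output moves gradually from the upsampled coarse image to the output of the new layer, which avoids a sudden shock to the already trained lower-resolution layers.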
2.3 Color Render Mapping
The color render mapping translates the generated sketch representations into color images; it uses the U-net structure as its backbone, together with a color discriminator for adversarial training. Two losses are used for training (Eq. (2)), both defined over training pairs of a real image and its sketch. The first provides the adversarial loss for training the color render mapping, while the second calculates a norm-based loss that accelerates training. Finally, the full objective of our SkrGAN is given by the combination of the loss functions in Eq. (1) and Eq. (2).
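To make the combination of an adversarial term and a norm-based term concrete, the following NumPy sketch computes a generator-side loss for the color render mapping. It rests on several assumptions not confirmed by the text: the norm is taken to be the L1 norm (as in pix2pix), the adversarial term uses the non-saturating binary cross-entropy form, and the weight `lam` and all function names are hypothetical.

```python
import numpy as np


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between discriminator outputs and targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))


def render_generator_loss(d_fake, fake_img, real_img, lam):
    """Adversarial loss plus a weighted L1 reconstruction loss.

    d_fake   : discriminator probabilities for the generated images
    fake_img : images rendered from sketches
    real_img : the real images paired with those sketches
    lam      : weight of the L1 term (hypothetical name and value)
    """
    adv = bce(d_fake, np.ones_like(d_fake))            # try to fool the discriminator
    l1 = float(np.mean(np.abs(fake_img - real_img)))   # assumed L1 norm
    return adv + lam * l1, adv, l1
```

The L1 term pulls the rendered image toward its paired real image from the first iterations, which is one plausible reading of why a norm-based loss accelerates training.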
3 Experiments
3.0.1 Datasets:
Three public datasets and one in-house dataset are utilized in our experiments: the Chest X-Ray dataset (https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia) with 5,863 images categorized into pneumonia and normal; the Kaggle Lung dataset (https://www.kaggle.com/kmader/finding-lungs-in-ct-data/data/) with 267 CT images; the Brain MRI dataset (http://neuromorphometrics.com) with 147 selected images; and a local retinal color fundus dataset (RCF) with 6,432 retinal images collected from local hospitals. Since our experiments are unconditional, no labeling information is needed.
3.0.2 Evaluation Metrics:
In this work, we employ three metrics to evaluate the quality of the synthetic medical images: multi-scale structural similarity (MS-SSIM), Sliced Wasserstein Distance (SWD), and Fréchet Inception Distance (FID). MS-SSIM is a widely used metric for measuring the similarity of paired images, where a higher MS-SSIM indicates better performance. SWD efficiently computes a randomized approximation to the earth mover's distance and has also been used for measuring GAN performance, where a lower SWD indicates better performance. FID calculates the distance between real and fake images at the feature level, where a lower FID indicates better performance.
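As a rough illustration of the two distribution-level metrics, the sketch below computes a sliced Wasserstein distance and a Fréchet distance between two sets of feature vectors. It is a simplified version under stated assumptions: real FID uses Inception network features and published SWD variants operate on image patches, whereas here both functions accept arbitrary `(n_samples, n_features)` arrays; the function names, `n_proj` and `seed` are hypothetical.

```python
import numpy as np


def sliced_wasserstein(a, b, n_proj=128, seed=0):
    """Randomized approximation to the earth mover's distance.

    Projects both sample sets onto random unit directions and averages
    the 1-D Wasserstein-1 distances (requires equal sample counts).
    """
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(a.shape[1], n_proj))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    pa = np.sort(a @ dirs, axis=0)   # sorted projections give the 1-D matching
    pb = np.sort(b @ dirs, axis=0)
    return float(np.mean(np.abs(pa - pb)))


def frechet_distance(feat_a, feat_b):
    """Fréchet distance between Gaussians fitted to two feature sets."""
    mu_a, mu_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    cov_a = np.cov(feat_a, rowvar=False)
    cov_b = np.cov(feat_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues.
    eig = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Both functions return zero for identical sample sets and grow as the generated distribution drifts away from the real one, which matches the "lower is better" reading above.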
3.0.3 Experimental Results:
The images from all datasets are first resized to a fixed resolution. All generators and discriminators are trained with Adam optimizers, with the learning rates set separately for two pairs of networks. Based on experience, we fix the value of the coefficient in Eq. (2); a small change in this coefficient does not affect the performance much. A fixed batch size is used for our model. The proposed SkrGAN is implemented with the PyTorch library on two NVIDIA GPUs (GeForce TITAN XP).
To justify the performance of the proposed method, we compare our SkrGAN with four state-of-the-art GANs: DCGAN, ACGAN, WGAN and PGGAN. Each method is used to generate 100 images, and the aforementioned metrics are computed on these generated images for quantitative comparison. Table 1 summarizes the results. It can be seen that our SkrGAN achieves better SWD, MS-SSIM and FID than the other GANs on the generated retinal color fundus, chest X-ray, lung CT and brain MRI images. On the one hand, DCGAN, ACGAN, WGAN and PGGAN are not designed for generating high resolution images from a small dataset, so they produce relatively poor results when generating medical images from small training datasets. On the other hand, these methods only consider the global contextual information and ignore the foreground structures, which leads to discontinuous and distorted sketch structures, such as discontinuous vessels and a distorted disc and cup in retinal color fundus images, discontinuous bones and distorted lungs in chest X-ray images, discontinuous ribs in CT, and distorted textures in MRI. By contrast, our method uses the sketch to guide the intermediate training step, which helps the network generate high quality medical images with realistic anatomical structures.
Fig. 3 illustrates examples of the synthetic images produced by DCGAN, ACGAN, WGAN, PGGAN, and our method in the four medical image modalities: CT, X-ray, retinal color fundus and MRI. It can be observed that SkrGAN produces visually appealing results, in which most of the structural features, such as the vessels in color fundus images, the bones in X-ray images, the ribs and backbone in CT, and the texture distribution in MRI, are close to those in real images. By contrast, the images generated by the other GANs contain structural distortions, as illustrated by the green arrows in Fig. 3.
3.0.4 Application to Vessel Segmentation:
Besides the above quantitative and qualitative comparisons, we further apply the proposed SkrGAN as a data augmentation method on a vessel segmentation task using the DRIVE dataset (https://www.isi.uu.nl/Research/Databases/DRIVE/), which includes 20 training images and 20 testing images. The DRIVE dataset provides two expert manual annotations, and, following the literature, the first one is chosen as the ground truth for performance evaluation. We generated 2000 synthetic images and used the generated sketches as labels to pretrain a vessel detection network. In this paper, we use the U-net, which is widely used in many biomedical segmentation tasks. The pretrained model is then finetuned for vessel detection using the 20 training images and tested on the 20 testing images.
To justify the benefits of the synthetic images for training the segmentation network, we compared the model pretrained on synthetic images with the model trained without pretraining. Several metrics were calculated to provide an objective evaluation, including sensitivity (SEN) and the Area Under the ROC Curve (AUC). The results summarized in Table 2 show that pretraining with synthetic images improves the SEN of the vessel detection, and that the other metrics are also improved by pretraining with the synthetic pairs.
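For reference, sensitivity and AUC can be computed from a predicted vessel map and a ground-truth mask as in the NumPy sketch below. The function names are hypothetical, and the pairwise AUC computation is chosen for clarity, not efficiency: it uses memory proportional to the number of positive pixels times the number of negative pixels, so a rank-based formula would be preferable for full-size images.

```python
import numpy as np


def sensitivity(pred_mask, true_mask):
    """SEN = TP / (TP + FN) for binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.sum(pred & true)      # vessel pixels correctly detected
    fn = np.sum(~pred & true)     # vessel pixels that were missed
    return float(tp / (tp + fn))


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive pixel receives a
    higher score than a random negative pixel (ties count as one half).
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)
```

Sensitivity is computed on a thresholded prediction, whereas AUC is computed directly on the continuous probability map, so the two numbers capture different aspects of the same segmentation output.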
4 Conclusion
In this paper, we have proposed an unconditional GAN named Sketching-rendering Unconditional Generative Adversarial Network (SkrGAN) that is capable of generating high quality medical images. Our SkrGAN embeds the sketch representation to guide unconditional medical image synthesis and to generate images with realistic foreground structures. The experiments on four types of medical images, including retinal color fundus, chest X-ray, lung CT and brain MRI, showed that our SkrGAN obtained state-of-the-art performance in medical image synthesis, demonstrating that the sketch information benefits structure generation. In addition, the application to retinal vessel segmentation showed that SkrGAN can be used as a data augmentation method to improve deep network training.
- (2017) Wasserstein GAN. arXiv.
- (2017) End-to-end adversarial retinal image synthesis. IEEE TMI 37 (99), pp. 781–791.
- (2018) Joint optic disc and cup segmentation based on multi-label deep network and polar transformation. IEEE TMI 37 (7), pp. 1597–1605.
- (2014) Generative adversarial networks. NIPS.
- (2019) CE-Net: context encoder network for 2D medical image segmentation. IEEE TMI.
- Image-to-image translation with conditional adversarial networks. In CVPR, pp. 1125–1134.
- (2017) Progressive growing of GANs for improved quality, stability, and variation. arXiv.
- (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 (5), pp. 1122–1131.
- (2018) Semi-supervised learning with generative adversarial networks for chest X-ray classification with ability of data domain adaptation. In ISBI, pp. 1038–1042.
- (2014) Conditional generative adversarial nets. arXiv.
- (2017) Conditional image synthesis with auxiliary classifier GANs. In ICML, pp. 2642–2651.
- (2012) Perceptual constancies and visual selection as predictors of realistic drawing skill. Psychology of Aesthetics, Creativity, and the Arts 6 (2), pp. 124–136.
- Computational optimal transport. Foundations and Trends® in Machine Learning 11 (5-6), pp. 355–607.
- (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR.
- (2015) U-net: Convolutional networks for biomedical image segmentation. In MICCAI, pp. 234–241.
- (2004) Ridge-based vessel segmentation in color images of the retina. IEEE TMI 23 (4), pp. 501–509.
- (2016) Generative image modeling using style and structure adversarial networks. In ECCV, pp. 318–335.
- (2018) Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network. In CVPR, pp. 9242–9251.
- Synthesizing retinal and neuronal images with generative adversarial nets. Medical Image Analysis 49, pp. 14–26.