SkrGAN: Sketching-rendering Unconditional Generative Adversarial Networks for Medical Image Synthesis

by Tianyang Zhang, et al.

Generative Adversarial Networks (GANs) have the capability of synthesizing images, and have been successfully applied to medical image synthesis tasks. However, most existing methods merely consider the global contextual information and ignore the fine foreground structures, e.g., vessels and skeletons, which may contain diagnostic indicators for medical image analysis. Inspired by the human painting procedure, which is composed of stroking and color rendering steps, we propose a Sketching-rendering Unconditional Generative Adversarial Network (SkrGAN) that introduces a sketch prior constraint to guide medical image generation. In our SkrGAN, a sketch guidance module is utilized to generate a high quality structural sketch from random noise, and a color render mapping is then used to embed the sketch-based representations and reproduce the background appearance. Experimental results show that the proposed SkrGAN achieves state-of-the-art results in synthesizing images of various modalities, including retinal color fundus, X-ray, Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). In addition, we show that the performance of a medical image segmentation method is improved by using our synthesized images for data augmentation.



1 Introduction

In the last decade, deep learning techniques have proven very promising in many visual recognition tasks [3, 5], including object detection, image classification, face recognition, and medical image analysis. Large-scale training data is extremely important for training accurate deep models. Although it is easy to collect data for conventional computer vision tasks, it is often difficult to obtain sufficient high quality data in the medical imaging area. Recently, Generative Adversarial Networks (GANs) have been proposed to generate a distribution that matches the real data distribution via an adversarial process [4]. Due to their powerful capability of image generation, GANs have been successfully applied to many medical image synthesis tasks, including the synthesis of retinal fundus [2, 19], X-ray [9], and CT and MRI images [18].

Figure 1: Synthesized retinal images by PGGAN [7], DCGAN [14], ACGAN [11] and our SkrGAN. Compared with these methods, our method performs better in retaining structural details, e.g., blood vessels, disc and cup regions, as indicated by green arrows.

GAN algorithms can be divided into conditional and unconditional manners. Conditional GANs direct the data generation process by conditioning the model on additional information [10], and have been widely used in cross-modality synthesis and conditioned segmentation. For example, the pix2pix method is proposed to translate images from one type to another [6]. An auxiliary classifier GAN (ACGAN) produces higher quality samples by adding more structure to the GAN latent space [11]. In [18], a CT and MRI translation network is provided to segment multimodal medical volumes. By contrast, unconditional GANs synthesize images from random noise without any conditional constraint, and are mainly used to generate images. For example, Deep Convolutional GAN (DCGAN) [14] uses a deep convolutional structure to generate images. S²-GAN [17] materializes a two-stage network and depth maps to generate images with a realistic surface normal map (i.e., it generates RGBD images). However, S²-GAN requires depth maps for the training dataset, while medical image datasets rarely come with paired depth maps. Wasserstein GAN (WGAN) [1] improves the loss and training stability of previous GANs to obtain better performance. Progressive Growing GAN (PGGAN) [7] grows the depth of convolutional layers to produce high resolution natural images.

In this paper, we aim to generate high quality medical images with correct anatomical objects and realistic foreground structures. Inspired by the realistic drawing procedure of human painting [12], which is composed of stroking and color rendering steps, we propose a novel unconditional GAN named Sketching-rendering Unconditional Generative Adversarial Network (SkrGAN) for medical image synthesis. Our SkrGAN decomposes into two tractable sub-modules: a sketch guidance module that generates the structural sketch from random noise, and a color render mapping module that produces structure-preserved medical images. The main contributions of this paper are summarized as follows:
1) An unconditional GAN, named SkrGAN, is proposed for medical image synthesis. By decomposing image generation into sketch guidance and color rendering stages, our SkrGAN embeds sketch structural representations to guide high quality medical image generation.
2) Experiments on four medical image synthesis tasks show that our SkrGAN is more accurate and more robust to variations in the size, intensity inhomogeneity and modality of the data than other state-of-the-art GAN methods.
3) Medical image segmentation experiments demonstrate that our SkrGAN can be applied as a data augmentation method to effectively improve segmentation performance.

2 Proposed Method

Figure 2: Illustration of our SkrGAN structure, which generates medical images from input noise. The sketch guidance module G_s (blue block) obtains the sketch representations under a sketch structure discriminator D_s. The color render mapping G_c (green block) embeds the sketch representations to generate the final color image with a discriminator D_c. Moreover, we also extract a sketch draft dataset (bottom) for training the model. (Best viewed in color.)

Inspired by the realistic drawing skills of human painters [12], in which a painting is usually accomplished in stages of increasing difficulty, i.e., from sketching to color rendering, we propose a novel Sketching-rendering Unconditional Generative Adversarial Network (SkrGAN) to generate high quality medical images with realistic anatomical structures. As shown in Fig. 2, we decompose the image generator into two phases: a sketch guidance module G_s (Sec. 2.2) and a color render mapping G_c (Sec. 2.3). The sketch guidance module generates the sketch structural representations with a sketch discriminator D_s, while the color render mapping embeds the sketch representations to generate the final image with a color discriminator D_c.

2.1 Sketch Draft Preparation

In order to train our SkrGAN, the sketch draft corresponding to each input training image is required by the sketch discriminator D_s. We aim to retain the main structural information of the given images, such as the blood vessels in retinal fundus and the bones in X-ray images. In our method, the Sobel edge detection method is first used to extract the initial structural boundaries; then a Gaussian low-pass filter is applied to remove isolated noisy pixels. Finally, a morphological operation consisting of an opening followed by a closing is employed to further remove noise and fill the vessel-like structures. This procedure greatly reduces the complexity of the sketch images, which makes the sketch synthesis easier than with traditional edge detection alone. An example of the sketch draft extraction can be found at the bottom of Fig. 2, where the main sketch structures (e.g., vessels and bones) are extracted.
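The three preprocessing steps above (Sobel edges, Gaussian low-pass, morphological open/close) can be sketched in a few lines. The filter sizes, the threshold and the helper name `extract_sketch_draft` below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy import ndimage

def extract_sketch_draft(image, blur_sigma=1.0, edge_thresh=0.2):
    """Binary sketch draft from a grayscale image in [0, 1]."""
    # 1) Sobel gradients along both axes give the initial boundaries.
    gx = ndimage.sobel(image, axis=0)
    gy = ndimage.sobel(image, axis=1)
    edges = np.hypot(gx, gy)
    # 2) Gaussian low-pass filtering suppresses isolated noisy pixels.
    edges = ndimage.gaussian_filter(edges, sigma=blur_sigma)
    sketch = edges > edge_thresh * edges.max()
    # 3) Opening removes remaining speckle; closing fills thin
    #    vessel-like structures.
    sketch = ndimage.binary_opening(sketch, structure=np.ones((2, 2)))
    sketch = ndimage.binary_closing(sketch, structure=np.ones((3, 3)))
    return sketch

# Toy input: a bright "vessel" stripe on a dark background.
img = np.zeros((64, 64))
img[30:34, 8:56] = 1.0
draft = extract_sketch_draft(img)
```

With a real fundus or X-ray image, the same pipeline keeps the vessel- and bone-like boundaries while discarding smooth background regions.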

2.2 Sketch Guidance Module

With the given dataset and the corresponding sketch draft set obtained by the sketch draft extraction, the sketch guidance module G_s is trained using the loss of the sketch discriminator D_s:

L_Ds = Σ_{i=1..n} ( E_{x_s ~ p_s}[log D_s^i(x_s^i)] + E_{z ~ p_z}[log(1 − D_s^i(G_s^i(z ⊙ w)))] ),   (1)

where z and w represent the noise pattern and the latent code respectively, p_z represents the distribution of z, and ⊙ is element-wise multiplication. D_s^1, ..., D_s^n denote the discriminating layers of D_s at different levels, whose inputs correspond to different resolutions, and G_s^1, ..., G_s^n are the generating layers at the corresponding resolutions, with x_s^i the sketch draft at level i. More concretely, our method iteratively adds convolutional layers to the generator and the discriminator during training, which allows images to be synthesized at progressively higher resolutions. Additionally, the training process fades in the higher resolution layers smoothly by using skip connections and smoothing coefficients. For simplicity, we utilize the network structure of PGGAN [7] as the backbone of G_s.
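The smooth fade-in of a newly added high-resolution layer can be illustrated with a PGGAN-style convex blend between the new layer's output and an upsampled copy of the previous resolution, with a coefficient alpha ramped from 0 to 1 during training. The nearest-neighbour upsampling and the `fade_in` helper are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def fade_in(low_res, high_res, alpha):
    """PGGAN-style smooth transition between resolutions.

    low_res:  output of the previous (coarser) stage, shape (H, W)
    high_res: output of the newly added (finer) stage, shape (2H, 2W)
    alpha:    ramps 0 -> 1 over the course of training
    """
    # Nearest-neighbour upsample of the coarse output to the fine grid.
    up = np.repeat(np.repeat(low_res, 2, axis=0), 2, axis=1)
    # Convex combination: at alpha=0 the new layer is invisible,
    # at alpha=1 it fully replaces the upsampled coarse path.
    return (1.0 - alpha) * up + alpha * high_res

low = np.ones((4, 4))    # stand-in for the coarse stage output
high = np.zeros((8, 8))  # stand-in for the new fine stage output
assert np.allclose(fade_in(low, high, 0.0), 1.0)
assert np.allclose(fade_in(low, high, 1.0), 0.0)
```

This gradual blend is what keeps training stable when a new resolution level is introduced: the network never sees an abrupt change in its output distribution.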

2.3 Color Render Mapping

The color render mapping G_c translates the generated sketch representations to color images. It uses the U-net [15] structure as its backbone, together with a color discriminator D_c for adversarial training. The two losses L_adv and L_1 used for training are:

L_adv = E_{(x, x_s)}[log D_c(x, x_s)] + E_{x_s}[log(1 − D_c(G_c(x_s), x_s))],
L_1 = E_{(x, x_s)}[ ||x − G_c(x_s)||_1 ],   (2)

where (x, x_s) represents a training pair of real image and sketch. L_adv provides the adversarial loss for training G_c, while L_1 computes the L1 norm to accelerate training. Finally, the full objective of our SkrGAN is given by the combination of the loss functions in Eq. (1) and Eq. (2):

L = L_Ds + L_adv + λ L_1.   (3)
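Assuming the standard cross-entropy form of the adversarial term, combining it with a λ-weighted L1 term can be sketched as follows. The function names and the value of `lam` are illustrative placeholders, not the paper's settings.

```python
import numpy as np

def adversarial_loss(d_real, d_fake, eps=1e-8):
    """Standard GAN discriminator objective from probabilities in (0, 1)."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def l1_loss(real, fake):
    """L1 term that penalises pixel-wise deviation of the rendered image."""
    return np.mean(np.abs(real - fake))

def full_objective(d_real, d_fake, real_img, fake_img, lam=10.0):
    """Adversarial term plus lambda-weighted L1 term, as in Eq. (2)-style
    objectives; lam is an assumed placeholder weight."""
    return adversarial_loss(d_real, d_fake) + lam * l1_loss(real_img, fake_img)

# A perfectly rendered image contributes zero L1 penalty, so the total
# reduces to the (small) adversarial term.
real = np.full((4, 4), 0.5)
loss_perfect = full_objective(np.array([0.99]), np.array([0.01]), real, real)
```

The L1 term gives the renderer a dense, stable gradient signal from the very first iterations, which is why it "accelerates" training relative to the adversarial term alone.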


3 Experiments

3.0.1 Datasets:

Three public datasets and one in-house dataset are utilized in our experiments: the Chest X-Ray dataset [8] with 5,863 images categorized into pneumonia and normal; the Kaggle Lung dataset with 267 CT images; a Brain MRI dataset with 147 selected images; and a local retinal color fundus (RCF) dataset with 6,432 retinal images collected from local hospitals. In our unconditional setting, no labeling information is needed.

3.0.2 Evaluation Metrics:

In this work, we employ three metrics to evaluate the quality of the synthetic medical images: multi-scale structural similarity (MS-SSIM), Sliced Wasserstein Distance (SWD) [13], and Fréchet Inception Distance (FID) [7]. MS-SSIM is a widely used metric for measuring the similarity of paired images; higher MS-SSIM indicates better performance. SWD is an efficient metric that computes a random approximation to the earth mover's distance and has also been used to measure GAN performance; lower SWD is better. FID calculates the distance between real and fake images at the feature level; lower FID is better.
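As a rough illustration of why SWD is cheap to compute, a minimal Monte-Carlo estimate over random 1-D projections can be written as follows (this is a simplified sketch, not the multi-scale patch-based SWD used in the GAN literature):

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Monte-Carlo SWD between two equally sized point sets of shape (n, d).

    Each random 1-D projection reduces the optimal-transport problem
    to sorting, which is what makes SWD inexpensive.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # random unit direction
        px, py = np.sort(x @ theta), np.sort(y @ theta)
        # 1-D Wasserstein-1 distance = mean of sorted pairwise gaps.
        total += np.mean(np.abs(px - py))
    return total / n_projections

rng = np.random.default_rng(1)
a = rng.normal(size=(256, 8))
swd_same = sliced_wasserstein(a, a)        # identical sets -> 0
swd_diff = sliced_wasserstein(a, a + 3.0)  # shifted set -> clearly larger
```

In practice the feature vectors x and y would be image descriptors (e.g., Laplacian-pyramid patches) drawn from real and generated images respectively.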

3.0.3 Experimental Results:

Figure 3: Images generated by different GANs. From left to right: (a) PGGAN [7], (b) WGAN [1], (c) DCGAN [14], (d) ACGAN [11] and (e) our SkrGAN; the synthetic sketches generated from random noise are shown in (f). From top to bottom: CT, X-ray, retinal color fundus and MRI. The green arrows indicate structural distortions in the generated images. (More visualization results can be found in the Supplementary Material.)

The images from all datasets are first resized to the same resolution. We train G_s, D_s, G_c and D_c with Adam optimizers. Based on experience, a small change of the weight λ in Eq. (2) does not affect the performance much, so λ is kept fixed in all experiments. The proposed SkrGAN is implemented with the PyTorch library and trained on two NVIDIA GPUs (GeForce TITAN Xp).

To assess the performance of the proposed method, we compare our SkrGAN with four state-of-the-art GANs: DCGAN [14], ACGAN [11], WGAN [1] and PGGAN [7]. Each method is used to generate 100 images, and the aforementioned metrics are computed on these generated images for quantitative comparison. Table 1 summarizes the results. It can be seen that our SkrGAN achieves SWD of 0.025, 0.026, 0.020 and 0.028, MS-SSIM of 0.614, 0.506, 0.359 and 0.436, and FID of 27.59, 114.6, 79.97 and 27.51 on the generated retinal color fundus, chest X-ray, lung CT and brain MRI images respectively, better than the other GANs. On one hand, DCGAN, ACGAN, WGAN and PGGAN are not designed for generating high resolution images from a small dataset, so these methods produce relatively poor results when generating medical images from small training sets. On the other hand, these methods only consider the global contextual information and ignore the foreground structures, which leads to discontinued and distorted sketch structures, such as discontinued vessels and a distorted disc and cup in retinal color fundus, discontinued bones and a distorted lung in chest X-ray, discontinued ribs in CT, and distorted textures in MRI. By contrast, our method uses the sketch to guide the intermediate training step, which helps the network generate high quality medical images with realistic anatomical structures.

Fig. 3 shows examples of images synthesized by DCGAN, ACGAN, WGAN, PGGAN and our method for the four medical image modalities: CT, X-ray, retinal color fundus and MRI. It can be observed that SkrGAN produces visually appealing results, in which most of the structural features, such as the vessels in color fundus, the bones in X-ray, the ribs and backbone in CT, and the texture distribution in MRI, are close to those in real images. In contrast, the images generated by the other GANs contain structural distortions, as indicated by the green arrows in Fig. 3.

Dataset       Metric   SkrGAN  DCGAN [14]  ACGAN [11]  WGAN [1]  PGGAN [7]
Color Fundus  SWD      0.025   0.160       0.149       0.078     0.036
              MS-SSIM  0.614   0.418       0.490       0.584     0.537
              FID      27.59   64.83       96.72       240.7     110.8
Chest X-ray   SWD      0.026   0.118       0.139       0.196     0.031
              MS-SSIM  0.506   0.269       0.301       0.401     0.493
              FID      114.6   260.3       235.2       300.7     124.2
Lung CT       SWD      0.020   0.333       0.317       0.236     0.057
              MS-SSIM  0.359   0.199       0.235       0.277     0.328
              FID      79.97   285.0       222.5       349.1     91.89
Brain MRI     SWD      0.028   0.163       0.122       0.036     0.042
              MS-SSIM  0.436   0.277       0.235       0.314     0.411
              FID      27.51   285.0       222.5       176.1     33.76
Table 1: Performance (mean) of different GANs on retinal color fundus, chest X-ray, lung CT and brain MRI.

3.0.4 Application to Vessel Segmentation:

Besides the above quantitative and qualitative comparisons, we further apply the proposed SkrGAN as a data augmentation method on a vessel segmentation task on DRIVE [16] (20 training images and 20 testing images). The DRIVE dataset provides two expert manual annotations, the first of which is conventionally chosen as the ground truth for performance evaluation. We generated 2,000 synthetic images and utilized the generated sketches as labels to pretrain a vessel detection network. In this paper, we use the U-net [15], which is widely used in many biomedical segmentation tasks. The pretrained model is then finetuned for vessel detection using the 20 training images and tested on the 20 testing images.

Pretrain   SEN     ACC     AUC
with       0.8464  0.9513  0.9762
without    0.7781  0.9477  0.9705
Table 2: Segmentation performance of U-net with and without pretraining on synthetic images.

To justify the benefit of the synthetic images for training the segmentation network, we compared the model pretrained with synthetic images against the model without pretraining. The following metrics were calculated to provide an objective evaluation: sensitivity (SEN), accuracy (ACC), and the Area Under the ROC Curve (AUC). The results summarized in Table 2 show that pretraining with synthetic images improves the SEN of vessel detection from 0.7781 to 0.8464, while ACC and AUC are also improved by pretraining with the synthetic pairs.

4 Conclusion

In this paper, we have proposed an unconditional GAN named Sketching-rendering Unconditional Generative Adversarial Network (SkrGAN) that is capable of generating high quality medical images. Our SkrGAN embeds a sketch representation to guide unconditional medical image synthesis and generates images with realistic foreground structures. Experiments on four types of medical images, including retinal color fundus, chest X-ray, lung CT and brain MRI, showed that our SkrGAN obtains state-of-the-art performance in medical image synthesis, demonstrating that sketch information can benefit structure generation. In addition, the retinal vessel segmentation application showed that SkrGAN can be used as a data augmentation method to improve deep network training.


  • [1] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein GAN. arXiv. Cited by: §1, Figure 3, §3.0.3, Table 1.
  • [2] P. Costa, A. Galdran, et al. (2017) End-to-end adversarial retinal image synthesis. IEEE TMI 37 (99), pp. 781–791. Cited by: §1.
  • [3] H. Fu, J. Cheng, Y. Xu, D. W. K. Wong, J. Liu, and X. Cao (2018) Joint optic disc and cup segmentation based on multi-label deep network and polar transformation. IEEE TMI 37 (7), pp. 1597–1605. Cited by: §1.
  • [4] I. J. Goodfellow, J. Pouget-Abadie, et al. (2014) Generative adversarial networks. NIPS. Cited by: §1.
  • [5] Z. Gu, J. Cheng, H. Fu, K. Zhou, H. Hao, Y. Zhao, T. Zhang, S. Gao, and J. Liu (2019) CE-Net: context encoder network for 2D medical image segmentation. IEEE TMI. Cited by: §1.
  • [6] P. Isola, J. Zhu, et al. (2017) Image-to-image translation with conditional adversarial networks. In CVPR, pp. 1125–1134. Cited by: §1.
  • [7] T. Karras, T. Aila, et al. (2017) Progressive growing of GANs for improved quality, stability, and variation. arXiv. Cited by: Figure 1, §1, §2.2, Figure 3, §3.0.2, §3.0.3, Table 1.
  • [8] D. S. Kermany, M. Goldbaum, et al. (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 (5), pp. 1122–1131. Cited by: §3.0.1.
  • [9] A. Madani, M. Moradi, et al. (2018) Semi-supervised learning with generative adversarial networks for chest X-ray classification with ability of data domain adaptation. In ISBI, pp. 1038–1042. Cited by: §1.
  • [10] M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. arXiv. Cited by: §1.
  • [11] A. Odena, C. Olah, and J. Shlens (2017) Conditional image synthesis with auxiliary classifier GANs. In ICML, pp. 2642–2651. Cited by: Figure 1, §1, Figure 3, §3.0.3, Table 1.
  • [12] J. Ostrofsky, A. Kozbelt, and A. Seidel (2012) Perceptual constancies and visual selection as predictors of realistic drawing skill. Psychology of Aesthetics, Creativity, and the Arts 6 (2), pp. 124–136. Cited by: §1, §2.
  • [13] G. Peyré, M. Cuturi, et al. (2019) Computational optimal transport. Foundations and Trends in Machine Learning 11 (5-6), pp. 355–607. Cited by: §3.0.2.
  • [14] A. Radford, L. Metz, and S. Chintala (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR. Cited by: Figure 1, §1, Figure 3, §3.0.3, Table 1.
  • [15] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In MICCAI, pp. 234–241. Cited by: §2.3, §3.0.4.
  • [16] J. Staal, M. D. Abràmoff, et al. (2004) Ridge-based vessel segmentation in color images of the retina. IEEE TMI 23 (4), pp. 501–509. Cited by: §3.0.4.
  • [17] X. Wang and A. Gupta (2016) Generative image modeling using style and structure adversarial networks. In ECCV, pp. 318–335. Cited by: §1.
  • [18] Z. Zhang, L. Yang, and Y. Zheng (2018) Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network. In CVPR, pp. 9242–9251. Cited by: §1.
  • [19] H. Zhao, H. Li, et al. (2018) Synthesizing retinal and neuronal images with generative adversarial nets. Medical Image Analysis 49, pp. 14–26. Cited by: §1.