The performance of deep learning systems in medical image analysis is constrained by the quantity of high-quality image annotations. Large-scale datasets for training and testing are essential both to reduce the variance of networks trained with supervised learning and to provide a reliable estimate of their long-term performance after deployment. Most medical image datasets contain only hundreds to thousands of patients, acquired from a few clinical imaging sites. Unlike the annotation of natural image datasets, diagnostic AI applications normally require annotators with years of medical training, so the datasets are expensive to scale in both time and money. The distribution of such images is also highly biased towards a small portion of the global population, and rare abnormalities may have too few exemplars in the training dataset for a model to generalize well to prospective patients.
The data efficiency of such learning systems is thus essential. It can be improved either by (1) better supervised learning algorithms, including network architectures, optimization algorithms, and objective functions, or by (2) synthesizing images and their annotations from the manually annotated images, for example with data augmentation techniques based on simple image transformations. We explore the latter path in this work by synthesizing objects in medical images with high fidelity while allowing their properties to be manipulated. This manipulability is important for limited medical image datasets because it allows synthesis to (1) reproduce the variability of semantically meaningful features that are observed in the clinic but not in the limited dataset and (2) over-sample realistic but challenging samples where system performance is more clinically important.
We propose a framework for 3D manipulable object synthesis with structured object decomposition and adversarial image refining. We start by training a conditional variational autoencoder (cVAE) [12, 16] on mesh vertices to generate realistic 3D object meshes. We then train an image decomposition network to decompose the object patch into a segmentation mask and a 1D vector containing the residual information related to the object intensity, texture, etc. A decoder network is trained to reconstruct the object patch from the decomposed components and to blend the reconstructed object into its context. In the last training stage, the decoder is fine-tuned with two adversarial discriminators to synthesize objects on patches that initially do not contain the target objects. Unlike existing object in-painting methods for medical images, the proposed framework allows the shapes, sizes and intensities of the generated objects to be manipulated. We evaluated the proposed framework on an example application: synthesizing lung nodules in 3D CT images and using the synthetic nodules to improve nodule detection performance. With a dataset of 2800 3D CT volumes in total, we show that the synthetic nodule patches can improve lung nodule detection performance. Our contributions can be summarized as: (1) an object synthesis framework that can synthesize objects, such as lesions, with manipulable properties at random locations in 3D medical images; (2) an investigation of using synthetic patches to improve the detection of pulmonary nodules.
II Related Work
II-A Medical Image In-painting
In one prior work, the authors proposed to use a fully convolutional neural network to in-paint lung nodules in a masked area of a 3D chest CT image patch. The network output is sent to an adversarial discriminator network to ensure the realism of the synthetic nodule. Because the appearance of the in-painted nodules is conditioned only on the context of the in-painting area, the benefit for data augmentation may be highly limited in many applications where the diversity of the synthetic objects cannot be controlled. In a similar work, the generated objects are conditioned on segmentation masks, and the object texture is controlled by the noise pixels in the input mask. The masks used in that work are taken directly from the manually labelled ground truths, so it is hard to synthesize objects with shape diversity. Though both studies showed that synthetic data could help improve the performance of supervised learning tasks, the capability of manipulating the object synthesis is still lacking in such methods. In another work, the authors propose to obtain structured representations of medical images by training an auto-encoder network to factorize the input image into a segmentation mask and a 1D vector. Inspired by this work, we believe that manipulating the factorized image components could allow the manipulation of the synthetic image objects.
II-B Disentangled Image Generation
Using semantic-level information to guide image synthesis or in-painting has been explored by several computer vision studies with natural images. InfoGAN was proposed as an extension to GANs to learn disentangled representations using mutual information in an unsupervised manner. The β-VAE and Factor-VAE were also proposed for unsupervised image disentangling. DRGAN was proposed to learn both a generative and a discriminative representation from one or multiple face images to synthesize identity-preserving faces at target poses. Another work proposed to combine the information from both the semantic segmentation mask and the object bounding boxes to manipulate the object in-painting. Unlike most image synthesis methods, which start with random noise, a further work proposed a two-stage training strategy for image synthesis networks: the first stage trains an auto-encoder network to obtain the embedding of the real images; the second stage maps noise from a Gaussian distribution to the embedding distribution and then to the real data. Following prior work, we use the KL divergence to train the embedding distribution to be close to a standard normal distribution instead of training another distribution mapping. Sun et al. and Di et al. proposed to split face synthesis into two sub-tasks: (1) facial landmark generation from the image context and (2) facial-landmark-conditioned head in-painting [26, 5].
We first obtain a 3D shape synthesizer by training a conditional VAE on the annotated segmentation masks. Then an auto-encoder-like network is trained to (1) decompose the object of interest into a segmentation mask and a residual embedding vector, and (2) reconstruct the object of interest from a segmentation mask and a residual vector. Finally, we fine-tune the decoder from the previous stage to blend the reconstructed object into an image background that originally did not contain the target object.
III-A Shape Synthesis in 3D
As shown in Fig. 1, we use a conditional variational autoencoder (cVAE) to obtain (1) an encoder of the object shapes that compresses the shape parameters into the distribution of a compact 1D vector, and (2) a decoder capable of reconstructing the shape given a vector randomly sampled from the standard normal distribution. We use the coordinates of the 3D mesh vertices to represent the object shape.
To approximate different nodules with a consistent parameterization, we fit a template mesh with fixed topology to binary segmentation masks. Object shapes are thereby parameterized by the mesh vertex coordinates. Specifically, a spherical template mesh is registered to the marching-cube-based isosurface of each mask using the coherent-point-drift algorithm. Each 3D shape is thus represented by a 1D vector that concatenates the 3D coordinates of all template vertices. A cVAE is trained on these shape vectors. The encoder maps the shape parameters to the distribution of the latent variables, and an embedding vector is drawn from this distribution; the decoder reconstructs an output shape parameter vector from the embedding. The cVAE is conditioned on 3 parameters that control the scale and aspect ratio of the generated shape, which are respectively the L2 norms of all the 3D coordinates in each of the three dimensions. Although the scale of an object could be set analytically, our construction captures the correlations between nodule shape and size. The cVAE is optimized by combining the L1 reconstruction loss and the KL divergence of the latent variables. To generate a random shape, the embedding vector is sampled from the standard normal distribution and decoded together with the conditional parameters. The resulting binary mask is derived from the generated mesh by 3D rasterization.
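As a rough illustration of the shape parameterization, the sketch below computes the three conditional parameters as the L2 norm of the vertex coordinates along each axis and re-scales a mesh to a target diameter. It is a minimal sketch in plain Python; the function names, the centering step, and the use of the largest vertex-to-vertex distance as the diameter are assumptions made for illustration, not details stated in the text.

```python
import math
from itertools import combinations

def center(vertices):
    """Translate vertices so their centroid sits at the origin (assumed preprocessing)."""
    n = len(vertices)
    cx, cy, cz = (sum(v[i] for v in vertices) / n for i in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in vertices]

def conditional_params(vertices):
    """L2 norm of all vertex coordinates along each of the three axes."""
    return tuple(math.sqrt(sum(v[i] ** 2 for v in vertices)) for i in range(3))

def diameter(vertices):
    """Largest pairwise vertex distance (one possible definition of diameter)."""
    return max(math.dist(a, b) for a, b in combinations(vertices, 2))

def rescale_to_diameter(vertices, target):
    """Uniformly scale a centered mesh so that its diameter equals `target`."""
    s = target / diameter(vertices)
    return [(x * s, y * s, z * s) for x, y, z in vertices]

# Toy "mesh": the six axis-aligned extreme points of an ellipsoid.
mesh = center([(2, 0, 0), (-2, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)])
c = conditional_params(mesh)              # first entry is largest: elongated along x
scaled = rescale_to_diameter(mesh, 10.0)  # same shape, diameter set analytically
```

In this toy case the conditional parameters encode both overall scale and aspect ratio, which is why a model conditioned on them can, in principle, pick up correlations between size and shape.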
III-B Stage 1: Image Object Decomposition
To generate an object in an image, we formulate the problem as learning an invertible distribution matching between the object and a set of latent variables that can represent the objects of interest. To fit the generated object into a real-world image, an additional transform is needed to blend the object into the background, making the resulting image indistinguishable from real-world images containing similar objects. This transform defines the operation of fusing the generated object with a real-world image. To make part of the latent variables manipulable and interpretable, they can be decomposed into two parts: the first contains the parameters that can be specified with known properties such as the size and the intensity; the second contains the residual information needed to represent the object. In this work, we decompose the latent variables into the instance segmentation of the object and a residual vector that contains the information about textures and boundary appearance. Given an image patch and the instance segmentation of the object, we train an auto-encoder-like architecture to decompose the masked image patch into the shape mask and a residual vector. The decomposition network is built with a 2D hour-glass network that outputs a binary segmentation mask of the same size as the input. Element-wise masking is expressed with the Hadamard product throughout the paper, and the input is masked by the bounding box region covering the object. The binary Dice loss is used to optimize the network to segment the correct masks.
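To make the segmentation objective concrete, here is a minimal sketch of a soft binary Dice loss in plain Python. The smoothing constant `eps` and the helper names are assumptions for illustration; the text only states that a binary Dice loss is used.

```python
def flatten(mask2d):
    """Flatten a 2D mask (list of rows) into a flat list of values."""
    return [v for row in mask2d for v in row]

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between two flat sequences of values in [0, 1].

    Returns a value near 0 for perfect overlap and near 1 for no overlap.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

# Predicted probabilities versus a ground-truth binary mask.
predicted = flatten([[0.9, 0.1], [0.8, 0.0]])
reference = flatten([[1, 0], [1, 0]])
loss = dice_loss(predicted, reference)
```

Because the loss is normalized by the total mask area, it is less sensitive than a per-pixel loss to the imbalance between a small object and a large background.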
By applying global average pooling to the output features of the bottom block of the hour-glass network, we obtain a 1D vector and forward it to two fully connected layers that output the parameters of the distribution from which the residual vector is sampled. This distribution gives a smooth manifold for randomly sampling residual vectors during training stage 2 and inference.
The input of the object reconstruction network is the residual vector, permuted into a tensor indexed by the batch size and the feature dimension. The network progressively upsamples this tensor with upsampling layers and 2D convolutional blocks until the resampled features reach the original patch size. The upsampled features are then concatenated with the segmentation mask and fed into a Res-UNet to output the masked area of the input image, where the mask is a rectangle surrounding the object. The reconstructed object area is added to the background patch to form the initial in-painting. To blend the reconstructed object into the context, the fused patch is then fed into a fully convolutional neural network that reconstructs the entire patch. Another segmentation network is added on top of the final reconstruction. It is optimized to reproduce the segmentation mask from the final output, regularizing the refining network to preserve the original shape. The reconstruction loss includes a KL divergence term that regularizes the distribution of the residual vector, so that the residual vector can be sampled from a standard normal distribution.
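The KL regularizer has a well-known closed form when the encoder outputs a diagonal Gaussian. The sketch below shows that form together with the usual reparameterized sampling; parameterizing the variance by its logarithm and the function names are assumptions made for this example.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def sample_latent(mu, log_var, rng=random):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

# The penalty is zero only when the predicted distribution is already N(0, I).
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
kl_shifted = kl_to_standard_normal([1.0], [0.0])
z = sample_latent([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
```

Driving this term towards zero is what makes it reasonable to draw the embedding directly from a standard normal distribution once the encoder is no longer available.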
III-C Stage 2: Object Synthesis on Random Patches
After the image decomposition training, the decomposition network is discarded, since it was only used to help the reconstruction network learn the mapping from the latent embedding and a segmentation mask to an image object. The weights of the remaining networks are preserved for fine-tuning the system to synthesize objects at random locations in the images. For this training stage, we use random negative patches that do not contain the object of interest as the input background patches. The trained 3D shape synthesizer is used to generate masks with different sizes and shapes. The masks are fed into the object reconstruction network together with a random embedding vector sampled from the standard normal distribution. The masked output of the reconstruction network is added to the masked background patch to form a coarse synthetic patch. Unlike in the image decomposition stage, we use the synthesized mask here to mask out the background rather than a square mask, because (1) the mask is more reliable at this stage and (2) the final synthesized image could otherwise suffer from unnecessary artefacts at the square mask boundaries. This patch is fed into the refining network to blend the synthetic object into its context and obtain the final output. We use two Wasserstein GAN (WGAN) [1, 7] discriminators, a local discriminator and a context discriminator, on the final output to improve the appearance of the output object. The local discriminator is applied to the masked area of the output patch; the context discriminator is applied to a larger region of the output to discriminate whether the synthetic object has been blended into the background as real objects are. The weights of are frozen throughout this stage. Both discriminators are built with a small DenseNet with spectral normalization in each convolutional layer. The objective function for the generator can be summarized as
The terms shared with the stage-1 loss keep the same definitions as in Eq. 12. The reconstruction term here is the L1 loss between the areas surrounding the object in the final reconstruction and the corresponding areas of the original patch. The adversarial term is the weighted sum of the losses from the local discriminator and the context discriminator, which are trained with the WGAN criteria and a gradient penalty.
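For reference, a commonly used form of the WGAN criterion with gradient penalty [7] is written out below. The notation is an assumption made for this sketch, since the original symbols are not recoverable from the text: $D$ stands for either discriminator, $x$ for a real patch region, $\tilde{x}$ for the corresponding synthetic region, $\hat{x}$ for a random interpolation between the two, and $\lambda_{gp}$, $\lambda_{local}$, $\lambda_{context}$ for hypothetical weights.

```latex
\begin{aligned}
\mathcal{L}_{D} &= \mathbb{E}_{\tilde{x}}\left[D(\tilde{x})\right] - \mathbb{E}_{x}\left[D(x)\right]
  + \lambda_{gp}\,\mathbb{E}_{\hat{x}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right], \\
\hat{x} &= \epsilon\,x + (1-\epsilon)\,\tilde{x}, \qquad \epsilon \sim U[0, 1], \\
\mathcal{L}_{adv} &= -\lambda_{local}\,\mathbb{E}\left[D_{local}(\tilde{x}_{local})\right]
  - \lambda_{context}\,\mathbb{E}\left[D_{context}(\tilde{x}_{context})\right].
\end{aligned}
```

Under these assumptions, each discriminator minimizes $\mathcal{L}_{D}$ on its own region, while the generator minimizes $\mathcal{L}_{adv}$ together with its reconstruction terms.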
The trained generator networks can be used for placing random synthetic objects of specified diameters at random locations in a 3D image volume. Though the 3D shape synthesizer is trained by conditioning on the size to learn the correlations between the shape distribution and the object sizes, it does not guarantee that the output mask will be of precisely the expected size. We instead re-scale the generated mesh to the target size and rasterize it to a 3D mask. We crop the 3D patch surrounding the target location and feed its 2D slices to the trained networks. Before adding the output of the reconstruction network to the masked background, we multiply it by a scale factor to adjust the intensity of the generated object. The 2D outputs of the refining network are stitched into a 3D patch before being put back into the original 3D volume.
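The slice-wise compositing step at inference can be sketched as follows. This is a minimal illustration in plain Python: the function names are hypothetical, the slices are plain nested lists, and the refining network that would process each composited slice before stitching is deliberately left out.

```python
def composite_slice(background, obj, mask, intensity_scale=1.0):
    """Coarse synthetic slice: scaled object inside the mask, background outside."""
    return [
        [m * intensity_scale * o + (1 - m) * b for b, o, m in zip(brow, orow, mrow)]
        for brow, orow, mrow in zip(background, obj, mask)
    ]

def stitch(slices):
    """Stack processed 2D slices back into a 3D patch (a list of slices)."""
    return [[row[:] for row in s] for s in slices]

background = [[0.2, 0.2], [0.2, 0.2]]   # negative (nodule-free) patch slice
generated = [[0.8, 0.8], [0.8, 0.8]]    # stand-in for the reconstruction output
mask = [[1, 0], [0, 0]]                 # rasterized slice of the synthetic shape

coarse = composite_slice(background, generated, mask, intensity_scale=0.5)
volume = stitch([coarse, coarse])
```

Lowering `intensity_scale` dims only the pixels inside the mask, which is how the brightness of the synthetic object can be adjusted without touching its surroundings.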
III-D Application: Hard-Case Sampling of Synthetic Patches for Improving Lung Nodule Detection
One example application of the proposed framework is to improve the performance of pulmonary nodule detection systems. Such systems are normally built with two-stage coarse-to-fine network training: (1) a fully convolutional neural network with a large receptive field is trained to obtain the nodule candidates; (2) a patch classifier is trained on the candidate patches to reduce the number of false positives. When training the 3D patch classifier network, the positive patches in each batch are sampled from both the synthetic patches and the real patches. We control the proportion of synthetic patches to be between 20% and 50%. The hard cases among the synthetic patches can be selected based on the output of a patch classifier trained with real data only and the output of the trained discriminators. Since the synthetic patches are all constructed to contain a nodule, the patches with low classifier probability are considered hard positives. At the same time, we would like to preserve only the nodule patches that look real, because the knowledge learned from such patches is more likely to generalize to unseen data. We use the output of the local discriminator to discard from the training set the 20% of synthetic patches with the lowest quality.
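The selection rule described above can be sketched in a few lines of plain Python. The dictionary keys, the function name and the 5% hardness threshold are assumptions made for this example; the text itself only fixes the 20% discard rate based on the local discriminator output.

```python
def select_hard_synthetic(patches, drop_fraction=0.2, hard_threshold=0.05):
    """Select realistic but hard synthetic positives.

    patches: list of dicts with hypothetical keys
      'clf_prob'   - nodule probability from a classifier trained on real data only
      'disc_score' - mean local-discriminator output (higher = more realistic)
    """
    # 1) Discard the least realistic patches according to the discriminator.
    ranked = sorted(patches, key=lambda p: p["disc_score"])
    n_drop = int(len(ranked) * drop_fraction)
    realistic = ranked[n_drop:]
    # 2) Every synthetic patch contains a nodule, so a low probability means "hard".
    return [p for p in realistic if p["clf_prob"] < hard_threshold]

# Ten toy patches: realism grows with the index, every other one is hard.
toy = [{"id": i, "disc_score": float(i), "clf_prob": 0.01 if i % 2 == 0 else 0.99}
       for i in range(10)]
hard = select_hard_synthetic(toy)
hard_ids = [p["id"] for p in hard]
```

Filtering on realism first keeps the hard set from being dominated by patches that are hard only because they contain synthesis artefacts.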
IV Experiments and Results
We acquired chest CT images with lung nodules from the LUNA16 challenge dataset, the NLST cohort and an in-house dataset. The breakdown of the three datasets is shown in Table I. We reserved the test images from our in-house dataset, which were reviewed by experienced radiologists. Because the original NLST images were annotated only with the slice number of the nodules, we had radiologists annotate the precise 3D locations of the nodules. The NLST images were used only for extracting positive training patches, since not all the nodules were guaranteed to be annotated. We extracted positive training patches with a nodule centered in the image. The negative training patches were sampled within the lung area where no nodule appears. All patches were sampled at a fixed size and resolution; the image intensities were clipped to a fixed range of Hounsfield unit (HU) values and rescaled. We generated the segmentation masks of the lung nodules for all the positive CT patches with a 3D DenseUNet that was trained on 710 images (LUNA subset 2 to subset 9) obtained from the LIDC dataset. The segmentation masks are used for training both the shape synthesizer and the image object decomposition network. With the trained nodule synthesizer, we synthesized 47400 3D positive nodule patches with the background patches randomly sampled from the lung area of the training images in all three datasets. To generate the synthetic masks, we randomly sampled the shape embedding from a standard normal distribution and re-scaled the synthetic meshes so that the diameters of the synthetic nodules are uniformly distributed between 4mm and 30mm.
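The two preprocessing ideas in this paragraph, intensity clipping with rescaling and uniform sampling of target diameters, can be sketched as below. The HU window of -1000 to 400 and the output range of 0 to 1 are placeholder assumptions chosen for the example, since the actual values are not given here; only the 4mm to 30mm diameter range comes from the text.

```python
import random

def clip_and_rescale(hu_values, hu_min, hu_max):
    """Clip HU values to [hu_min, hu_max] and map them linearly to [0, 1]."""
    span = hu_max - hu_min
    return [(min(max(v, hu_min), hu_max) - hu_min) / span for v in hu_values]

def sample_diameters(n, low_mm=4.0, high_mm=30.0, seed=0):
    """Draw n target nodule diameters uniformly between low_mm and high_mm."""
    rng = random.Random(seed)
    return [rng.uniform(low_mm, high_mm) for _ in range(n)]

# Hypothetical HU window, used here only to exercise the function.
normalized = clip_and_rescale([-2000, -1000, -300, 400, 3000], hu_min=-1000, hu_max=400)
diameters = sample_diameters(5)
```

Sampling the diameter independently of the shape embedding is what lets the size distribution of the synthetic set be chosen freely rather than inherited from the training data.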
IV-B Architecture and Training
The shape synthesizer VAE is built as a multi-layer perceptron with ReLU activations. The encoder has 3 layers, which compress the input of 1452 template 3D vertices to a variational embedding of 100 variables. The decoder is built with the symmetric architecture and a linear output. This VAE directly learns the distribution of the 3D coordinates of the transformed meshes. The network was optimized using AMSGrad with a batch size of 512.
The encoder of the decomposition network is built with 3 ResNet blocks, each followed by max-pooling, and a bottom ResNet block without max-pooling. The residual embedding is obtained from the output of the bottom block, which has 256 feature maps. The feature maps are first converted into a 1D vector using global average pooling and fed into two separate fully connected layers to obtain the variables for sampling the residual vector. The reconstruction network first uses 6 pairs of an upsampling layer and a convolutional layer to upsample the residual vector to the original patch size. The feature maps are then concatenated with the predicted image segmentation mask and fed into a Res-UNet. The refining network has the same architecture as this Res-UNet. AMSGrad is used for optimizing all the networks used in image decomposition and refining. We use the initial learning rate of for training all the networks in the generators except the discriminators. The discriminators are trained with the initial learning rate of . To balance the GAN loss with the L1 loss in the training stage 2, we fixed to be .
To compare our proposed method with conventional in-painting methods, we also implemented a baseline 3D in-painting method that resembles a previously proposed pulmonary nodule in-painting framework. The generator network was built with a 3D Res-UNet, and a WGAN discriminator was built with a 3D DenseNet. Note that these are 3D networks, since it does not make sense to stitch 2D network outputs into 3D if the generator is not conditioned on the nodule shapes. The input of the network is a 3D lung CT patch with the center area cropped out. The networks are optimized using a combined L1 loss over the local and global areas together with the WGAN adversarial loss. Consistent with previously reported observations, we also found that conditioning on a random vector could hamper the performance. We instead introduce generation diversity with test-time dropout in the generator network.
IV-C Qualitative Analysis of the Synthesis Networks
In Fig. 4, we show example shape meshes generated by the shape synthesis VAE. By sampling the hidden embedding variables from a standard normal distribution, the VAE is able to output diverse 3D meshes. Though the network was initialized randomly, most of the generated meshes tend to be roundish, resembling real pulmonary nodules. We show in Fig. 5 how the same generated mesh can be re-scaled to generate lung nodules of different sizes in the image. The refining networks are able to slightly alter the appearance of the nodule to blend the generated nodule into the context. Though the image synthesis networks were trained in 2D, we show in Fig. 6 that the nodule can remain consistent with its volumetric shape across contiguous slices. The generated object slices are conditioned on the segmentation slices as well as the residual embedding. In Fig. 7, we show zoomed-in synthetic nodules with the same masks and different randomly sampled residual vectors. The residual vectors can manipulate the textures inside the synthetic nodules as well as slightly alter the nodule boundaries. By fixing the shapes and the residual vectors, we show in Fig. 8 that the intensity of the generated nodules can also be controlled by the intensity scale factor. In Fig. 9, we compare the synthesis results from the network with and without the last training stage with WGAN discriminators. The adversarial training helps refine the intensities at the core and the boundaries of the nodule to blend them into the tissue context. In Fig. 10, we present example real and synthetic patches. We define a patch as easy when the classifier output is larger than 95% and as hard when the classifier output is smaller than 5%. In both the real and the synthetic patches, nodules with high-intensity solid cores are easier to classify. The hard patches tend to contain nodules of smaller size and lower average intensity. The classifier is also confused when the nodule is hidden beside the pulmonary wall or other high-intensity tissues such as vessels or other types of abnormalities. We define a patch as having low fidelity when the mean output of the local discriminator is in the lowest 20% of the training set. It is easier for the discriminator to identify a synthetic patch that contains a nodule with a larger-than-average diameter or an irregular shape. The generator also does not handle the boundary well when it is asked to generate a large nodule beside the pulmonary wall, because it was trained to preserve the nodule boundaries. In Fig. 11, we compare example results of our proposed method (Ours) with the results of the baseline method (Baseline).
IV-D Quantitative Analysis of the Synthesis Networks
We focus on the second stage of the nodule detection framework by freezing the candidate generation network and training only the 3D patch classifier with different settings. The patch classifier is a 3D ResNet50 with weights pre-trained on the videos in the Kinetics dataset [8, 14]. In Fig. 12, we compare the FROC curves and the competition performance metric (CPM) scores on the test images when sampling different proportions of synthetic patches together with all the real patches: (1) training without sampling from the synthetic patches; (2) with 20% of the patches sampled from all the synthetic samples (20%); (3) with 50% of the patches sampled from the synthetic samples (50%). We show that the synthetic data can help improve the detection performance, especially when the number of false positives is low. Using a larger proportion of synthetic patches only slightly improves the classification performance. The confidence bands were generated with bootstrapping. With the same sampling strategy, the patches generated by the baseline in-painting method (Baseline) did not show improvement. In our experiment, we also tried sampling the positive patches only from the synthetic patches, which did not work well because the synthetic patches do not cover the entire distribution of the real data, for example sub-solid nodules. We also obtained higher detection performance by sampling only from the hard cases (Hard) selected based on the criteria described in Sec. III-D. We observed that training with batches that mix real and selected hard synthetic patches (Scratch) works slightly better than fine-tuning the classifier already trained on real data only (Finetune).
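For readers unfamiliar with the CPM score, the sketch below computes it from an FROC curve as the mean sensitivity at seven false-positive rates (1/8, 1/4, 1/2, 1, 2, 4 and 8 false positives per scan), which is the definition used in the LUNA16 challenge. The linear interpolation between FROC points, the function names and the toy curve are assumptions for illustration; the false-positive rates passed in are assumed to be strictly increasing.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation; clamps outside the range of xs (strictly increasing)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

CPM_FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)

def cpm_score(fps_per_scan, sensitivities):
    """Average sensitivity at the seven predefined false-positive rates."""
    values = [interp(r, fps_per_scan, sensitivities) for r in CPM_FP_RATES]
    return sum(values) / len(values)

# Toy FROC curve sampled exactly at the seven operating points.
toy_fps = list(CPM_FP_RATES)
toy_sens = [0.5, 0.6, 0.7, 0.8, 0.9, 0.9, 1.0]
toy_cpm = cpm_score(toy_fps, toy_sens)
```

Because four of the seven operating points lie at one false positive per scan or fewer, gains in the low-false-positive regime have a visible effect on this score.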
In this paper, we proposed a manipulable object synthesis framework for generating objects in medical images. The proposed framework is evaluated by generating synthetic lung nodules in 3D CT volumes. With qualitative results, we demonstrate that the proposed framework can synthesize realistic lung nodules at random locations with different sizes, shapes, textures and average intensities. By evaluating on an example application of lung nodule detection using a combined dataset of CT volumes, we show that the nodules synthesized by the proposed method improve the overall detection performance by 8.44% in CPM score. The detection performance can be further improved by selecting only the hard samples from the synthetic patches based on the outputs of both the patch classifier and the discriminator. The limitations of the current framework include: (1) it does not generate high-fidelity nodules close to the pulmonary wall, since the networks are trained to preserve the complete nodule shapes, which might be addressed by constraining the nodule generation with more detailed semantic segmentation masks; and (2) the proposed simple shape synthesis method does not support generating objects with more complex structures, for example nested models with multiple semantic labels or connected components.
Disclaimer: This feature is based on research, and is not commercially available. Due to regulatory reasons, its future availability cannot be guaranteed.
-  M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. ArXiv e-prints, page arXiv:1701.07875, Jan. 2017.
-  S. G. Armato et al. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans. Medical Physics, 38(2):915–931, 1 2011.
-  A. Chartsias, T. Joyce, G. Papanastasiou, S. Semple, M. Williams, D. Newby, R. Dharmakumar, and S. A. Tsaftaris. Factorised spatial representation learning: Application in semi-supervised myocardial segmentation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018.
-  X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. ArXiv e-prints, page 1606.03657, 6 2016.
-  X. Di, V. A. Sindagi, and V. M. Patel. GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks. ArXiv e-prints, page arXiv:1710.00962, Oct. 2017.
-  M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan. GAN-based Synthetic Medical Image Augmentation for increased CNN Performance in Liver Lesion Classification. ArXiv e-prints, page arXiv:1803.01229, Mar. 2018.
-  I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved Training of Wasserstein GANs. ArXiv e-prints, page arXiv:1704.00028, Mar. 2017.
-  K. Hara, H. Kataoka, and Y. Satoh. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? ArXiv e-prints, page arXiv:1711.09577, Nov. 2017.
-  I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. beta-vae: Learning basic visual concepts with a constrained variational framework. ICLR2017, 2017.
-  S. Hong, X. Yan, T. Huang, and H. Lee. Learning Hierarchical Semantic Image Manipulation through Structured Representations. ArXiv e-prints, page arXiv:1808.07535, Aug. 2018.
-  G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269, 2017.
-  D. Jimenez Rezende, S. Mohamed, and D. Wierstra. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ArXiv e-prints, page arXiv:1401.4082, Jan. 2014.
-  D. Jin, Z. Xu, Y. Tang, A. P. Harrison, and D. J. Mollura. CT-Realistic Lung Nodule Simulation from 3D Conditional Generative Adversarial Networks for Robust Lung Segmentation. ArXiv e-prints, page arXiv:1806.04051, June 2018.
-  W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijayanarasimhan, F. Viola, T. Green, T. Back, P. Natsev, M. Suleyman, and A. Zisserman. The Kinetics Human Action Video Dataset. ArXiv e-prints, page arXiv:1705.06950, May 2017.
-  H. Kim and A. Mnih. Disentangling by Factorising. ArXiv e-prints, page arXiv:1802.05983, Feb. 2018.
-  D. P. Kingma, D. J. Rezende, S. Mohamed, and M. Welling. Semi-Supervised Learning with Deep Generative Models. ArXiv e-prints, page arXiv:1406.5298, June 2014.
-  D. Korkinof, T. Rijken, M. O’Neill, J. Yearsley, H. Harvey, and B. Glocker. High-Resolution Mammogram Synthesis using Progressive Generative Adversarial Networks. ArXiv e-prints, page arXiv:1807.03401, July 2018.
-  F. Lau, T. Hendriks, J. Lieman-Sifry, B. Norman, S. Sall, and D. Golden. ScarGAN: Chained Generative Adversarial Networks to Simulate Pathological Tissue on Cardiovascular MR Scans. ArXiv e-prints, page arXiv:1808.04500, Aug. 2018.
-  L. Ma, Q. Sun, S. Georgoulis, L. Van Gool, B. Schiele, and M. Fritz. Disentangled Person Image Generation. ArXiv e-prints, page arXiv:1712.02621, Dec. 2017.
-  T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral Normalization for Generative Adversarial Networks. ArXiv e-prints, page arXiv:1802.05957, Feb. 2018.
-  A. Myronenko and X. Song. Point set registration: Coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12):2262–2275, Dec 2010.
-  M. Niemeijer, M. Loog, M. D. Abramoff, M. A. Viergever, M. Prokop, and B. van Ginneken. On combining computer-aided detection systems. IEEE Transactions on Medical Imaging, 30(2):215–223, Feb 2011.
-  S. J. Reddi, S. Kale, and S. Kumar. On the Convergence of Adam and Beyond. In Proceedings of the International Conference on Learning Representations (ICLR), 2018.
-  A. A. A. Setio et al. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: The LUNA16 challenge. Medical image analysis, 42:1–13, 12 2017.
-  H.-C. Shin, N. A. Tenenholtz, J. K. Rogers, C. G. Schwarz, M. L. Senjem, J. L. Gunter, K. Andriole, and M. Michalski. Medical Image Synthesis for Data Augmentation and Anonymization using Generative Adversarial Networks. ArXiv e-prints, page arXiv:1807.10225, July 2018.
-  Q. Sun, L. Ma, S. J. Oh, L. Van Gool, B. Schiele, and M. Fritz. Natural and Effective Obfuscation by Head Inpainting. ArXiv e-prints, page arXiv:1711.09001, Nov. 2017.
-  N. L. S. T. R. Team. The national lung screening trial: overview and study design. Radiology, 258(1):243–253, 2011.
-  L. Tran, X. Yin, and X. Liu. Disentangled Representation Learning GAN for Pose-Invariant Face Recognition. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1283–1292. IEEE, 7 2017.
-  B. Wang, G. Qi, S. Tang, L. Zhang, L. Deng, and Y. Zhang. Automated pulmonary nodule detection: High sensitivity with few candidates. In A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, and G. Fichtinger, editors, Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, pages 759–767, Cham, 2018. Springer International Publishing.
-  E. Wu, K. Wu, D. Cox, and W. Lotter. Conditional Infilling GANs for Data Augmentation in Mammogram Classification. ArXiv e-prints, page arXiv:1807.08093, July 2018.
-  B. Zhao, X. Wu, Z.-Q. Cheng, H. Liu, Z. Jie, and J. Feng. Multi-View Image Generation from a Single-View. In 2018 ACM Multimedia Conference on Multimedia Conference - MM ’18, pages 383–391, New York, New York, USA, 2018. ACM Press.