Deep learning has achieved significant recent successes. However, producing high-quality results often requires large amounts of training samples that sufficiently cover the population diversity. Unfortunately, data availability in the medical image domain, especially when pathologies are involved, is quite limited for several reasons: significant image acquisition costs, protections on sensitive patient information, limited numbers of disease cases, difficulties in data labeling, and large variations in locations, scales, and appearances. Although efforts have been made towards constructing large medical image datasets, options beyond simple automatic methods, huge amounts of radiologist labor, or mining from radiologist reports remain limited. Thus, how to generate effective and sufficient medical data samples with limited or no expert intervention remains an open question.
One enticing alternative is to generate synthetic training data. Historically, however, synthetic data has been less desirable due to shortcomings in realistically simulating true cases. The advent of generative adversarial networks (GANs) has enabled game-changing strides in simulating real images and data. This ability has been further expanded by developments on fully convolutional and conditional GANs. In particular, Isola et al. extend the conditional GAN (CGAN) concept to predict pixels from known pixels. Within medical imaging, Nie et al. use a GAN to simulate CT slices from MRI data, whereas Wolterink et al. introduce a bi-directional CT/MRI generator. For lung nodules, Chuquicusma et al. train a simple GAN to generate simulated images from random noise vectors, but do not condition on surrounding context.
In this work, we explore using a CGAN to augment training data for specific tasks. We focus on pathological lung segmentation, where the recent progressive holistically nested network (P-HNN) has demonstrated state-of-the-art results. However, P-HNN can struggle when relatively large peripheral nodules touch the lung boundary, mainly because such nodules are not common in Harrison et al.'s training set. To improve P-HNN's robustness, we generate synthetic 3D lung nodules of different sizes and appearances, at multiple locations, that naturally blend with surrounding tissues (see Fig. 1 for an illustration). We develop a 3D CGAN model that learns nodule shape and appearance distributions directly in 3D space. For the generator, we use a U-Net-like structure, where the input to our CGAN is a volume of interest (VOI) cropped from the original CT image with the central part, containing the nodule, erased (Fig. 1(c)). We note that filling in this region with a realistic nodule faces different challenges than generating a random 2D nodule image from scratch: our CGAN must generate realistic and natural 3D nodules conditioned upon, and consistent with, the surrounding tissue information. To produce high-quality nodule images and ensure their natural blending with surrounding lung tissues, we propose a specific multi-mask reconstruction loss that complements the adversarial loss.
The main contributions of this work are: (1) we formulate lung nodule generation using a 3D GAN conditioned on surrounding lung tissues; (2) we design a new multi-mask reconstruction loss to generate high-quality, realistic nodules while alleviating boundary discontinuity artifacts; (3) we provide a feasible way to help overcome difficulties in obtaining data for “edge cases” in medical images; and (4) we demonstrate that GAN-synthesized data can improve the training of a discriminative model, in this case P-HNN for segmenting pathological lungs.
Fig. 2 depicts an overview of our method. Below, we outline the CGAN formulation, architecture, and training strategy used to generate realistic lung nodules.
2.1 CGAN Formulation
In their original formulation, GANs are generative models that learn a mapping from a random noise vector $z$ to an output image $y$. The generator, $G$, tries to produce outputs that fool a binary classifier discriminator, $D$, which aims to distinguish real data from generated “fake” outputs. In our work, the goal is to generate synthetic 3D lung nodules of different sizes, with various appearances, at multiple locations, and have them naturally blend with surrounding lung tissues. For this purpose, we use a CGAN conditioned on the image $x$, a 3D CT VOI cropped from a specific lung location. Importantly, as shown in Fig. 1(c), we erase the central region containing the nodule. The advantage of this conditional setting is that the generator not only learns the distribution of nodule properties from the surrounding context, but is also forced to naturally fuse the generated nodules with the background context. While it is possible to also condition on a random vector $z$, we found it hampered performance. Instead, like Isola et al., we use dropout to inject randomness into the generator.
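The conditioning step above amounts to zeroing out a central spherical region of the VOI before it is fed to the generator. A minimal sketch of that preprocessing (the 64-voxel cube, the erasure diameter, and the zero fill value are illustrative assumptions, not values taken from the paper):

```python
import numpy as np

def erase_center(voi, diameter):
    """Zero out a central sphere of the given diameter (in voxels),
    producing the conditional input x from an original VOI y."""
    d, h, w = voi.shape
    zz, yy, xx = np.ogrid[:d, :h, :w]
    cz, cy, cx = (d - 1) / 2, (h - 1) / 2, (w - 1) / 2
    dist2 = (zz - cz) ** 2 + (yy - cy) ** 2 + (xx - cx) ** 2
    mask = dist2 <= (diameter / 2) ** 2   # the erased region M
    x = voi.copy()
    x[mask] = 0.0                         # region the generator must fill in
    return x, mask

voi = np.random.randn(64, 64, 64).astype(np.float32)  # stand-in VOI
x, mask = erase_center(voi, diameter=32)
```

The returned `mask` is exactly the binary mask the reconstruction loss operates on, so the same routine can serve both the data pipeline and the loss computation.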
The adversarial loss for CGANs can then be expressed as
$$\mathcal{L}_{adv}(G,D) = \mathbb{E}_{x,y}\left[\log D(x,y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(x, G(x))\right)\right],\qquad(1)$$
where $y$ is the original VOI and $G$ tries to minimize this objective against an adversarial discriminator, $D$, that tries to maximize it. Like others [12, 6], we also observe that an additional reconstruction loss is beneficial, as it provides a means to learn the latent representation from surrounding context to recover the missing region. However, reconstruction losses tend to produce blurred results because they average together multiple modes in the data distribution. Therefore, we combine the reconstruction and adversarial losses, making the former responsible for capturing the overall structure of the missing region while the latter learns to pick specific data modes based on the context. We use the $\ell_1$ loss, since the $\ell_2$ loss performed poorly in our experiments.
Since the generator is meant to learn the distribution of nodule appearances in the erased region, it is intuitive to apply the $\ell_1$ loss only to this region. However, completely ignoring surrounding regions during the generator's training can produce discontinuities between generated nodules and the background. Thus, to increase coherence we use a new multi-mask loss. Formally, let $M$ be the binary mask whose erased region is filled with $1$'s, and let $M_d$ be a dilated version of $M$. We assign a higher loss weight $\alpha$ to voxels where $M$ is equal to one:
$$\mathcal{L}_{\ell_1}(G) = \mathbb{E}_{x,y}\left[\alpha\left\|M \odot (y - G(x))\right\|_1 + \left\|(M_d - M) \odot (y - G(x))\right\|_1\right],\qquad(2)$$
where $\odot$ is the element-wise multiplication operation and $\alpha > 1$ is a weight factor. We find that a dilation of a few voxels generally works well. By adding this multi-mask loss, our final CGAN objective is
$$G^{*} = \arg\min_{G}\max_{D}\ \mathcal{L}_{adv}(G,D) + \lambda\,\mathcal{L}_{\ell_1}(G),\qquad(3)$$
where $\lambda$ and $\alpha$ are determined experimentally.
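The multi-mask weighting described above can be sketched directly with numpy. This is an illustrative reading of the loss, not the paper's reference implementation; the weight `alpha=5.0` and the tiny example masks are placeholder choices:

```python
import numpy as np

def multi_mask_l1(y, g_x, mask, mask_dilated, alpha=5.0):
    """Weighted l1 reconstruction loss: the erased region (mask) is
    weighted by alpha, the border ring added by dilation
    (mask_dilated minus mask) gets unit weight, and voxels outside
    the dilated mask are ignored.  alpha=5 is an illustrative value."""
    ring = mask_dilated & ~mask
    inner = np.abs(mask * (y - g_x)).sum()    # erased region term
    border = np.abs(ring * (y - g_x)).sum()   # blending / border term
    return alpha * inner + border

# Tiny worked example: 1 erased voxel, 2 ring voxels, |y - G(x)| = 1 everywhere.
y = np.ones((4, 4, 4))
g_x = np.zeros((4, 4, 4))
m = np.zeros((4, 4, 4), bool);  m[1, 1, 1] = True
md = m.copy();  md[1, 1, 2] = True;  md[1, 2, 1] = True
loss = multi_mask_l1(y, g_x, m, md)   # 5*1 + 2 = 7
```

Ignoring everything outside the dilated mask is what lets the generator concentrate capacity on the nodule and its immediate border rather than on reconstructing the whole VOI.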
2.2 3D CGAN Architecture
Fig. 2 depicts our architecture, which builds off of Isola et al.'s 2D work, extending it to 3D images. More specifically, the generator consists of an encoding path of convolutional layers and a decoding path of an equal number of de-convolutional layers, with short-cut connections added in a similar fashion to U-Net. The encoding path takes an input VOI with its central region missing and produces a latent feature representation; the decoding path takes this feature representation and produces the erased nodule content. We find that without short-cut connections our CGAN models do not converge, suggesting that they are important for information flow across the network and for handling fine-scale 3D structures, as confirmed by others. To inject randomness, we apply dropout on the first two convolutional layers of the decoding path.
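For intuition about the encoder depth, the feature-map side length can be traced through a stack of strided convolutions. The kernel/stride/padding triple 4/2/1 is the common DCGAN-style choice and is assumed here for illustration, not taken from the paper:

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Spatial output size of one strided convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

# Trace a 64-voxel cube side through successive stride-2 convolutions.
sizes = [64]
while sizes[-1] > 1:
    sizes.append(conv_out(sizes[-1]))
# sizes: [64, 32, 16, 8, 4, 2, 1]
```

Each stride-2 step halves the side length, so a 64-cube input reaches a 1x1x1 bottleneck after six layers; the decoder mirrors this with de-convolutions, and the short-cut connections link the encoder and decoder stages of matching size.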
The discriminator, $D$, likewise contains an encoding path of convolutional layers. We also follow the design principles of Radford et al. to increase training stability, which include strided convolutions instead of pooling operations, LeakyReLUs in the encoding paths of $G$ and $D$, and a Tanh activation for the last output layer of $G$.
2.3 CGAN Optimization
We train the CGAN model end-to-end. To optimize our networks, we use the standard GAN training approach, which alternates between optimizing $D$ and $G$, as we found this to be the most stable training regimen. As suggested by Goodfellow et al., we train $G$ to maximize $\log D(x, G(x))$ rather than minimize $\log(1 - D(x, G(x)))$. Training employs the Adam optimizer, with the same learning rate and momentum parameters for both the generator and discriminator.
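The reason for the non-saturating generator objective can be checked numerically. Early in training the discriminator easily rejects fakes, so $D(x, G(x))$ is close to 0; the derivative of $\log(1-d)$ with respect to $d$ has magnitude $1/(1-d) \approx 1$ there, while the derivative of $\log d$ has magnitude $1/d$, which is large. A small worked example (the value `d = 0.01` is just an illustrative early-training score):

```python
d = 0.01  # discriminator's score for a fake sample early in training

# Gradient magnitudes w.r.t. d of the two generator objectives:
grad_saturating = 1.0 / (1.0 - d)     # from minimizing log(1 - d): ~1.01
grad_nonsaturating = 1.0 / d          # from maximizing log d: 100.0
```

The non-saturating form therefore gives the generator roughly a hundredfold stronger learning signal exactly when it needs it most, which is why it is the standard recommendation.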
3 Experiments and Results
We first validate our CGAN using the LIDC dataset. Then, using artificially generated nodules, we test whether they can help fine-tune the state-of-the-art P-HNN pathological lung segmentation method.
3.1 3D CGAN Performance
The LIDC dataset contains chest CT scans of patients with observed lung nodules. We set aside a subset of patients and their accompanying nodules as a test set. Each nodule can have multiple radiologist readers, and in such cases we use the union of the reader masks. True nodule images, $y$, are generated by cropping cubic VOIs centered at each nodule, using three random scales relative to the maximum dimension of the nodule mask. All VOIs are then resampled to a fixed cubic size. Conditional images, $x$, are derived by erasing the pixels within a sphere centered at the VOI. We exclude nodules below a minimum diameter, since small nodules provide very limited contextual information after resampling and our goal is to generate relatively large nodules. The result is our set of training sample pairs, on which we train the CGAN.
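The crop-and-resample step above can be sketched as follows. The scale range `(1.5, 2.5)`, the 64-cube output size, and the nearest-neighbour resampling are illustrative placeholders standing in for the paper's (elided) settings and for a proper interpolating resampler:

```python
import numpy as np

rng = np.random.default_rng(0)

def crop_and_resample(ct, center, max_dim, out_size=64):
    """Crop a cubic VOI around a nodule at a random scale of the nodule
    mask's maximum dimension, then resample to a fixed cube.
    Nearest-neighbour indexing keeps the sketch dependency-free."""
    scale = rng.uniform(1.5, 2.5)                 # assumed scale range
    half = int(round(scale * max_dim / 2))
    lo = [max(c - half, 0) for c in center]
    hi = [min(c + half, s) for c, s in zip(center, ct.shape)]
    voi = ct[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    idx = [np.linspace(0, s - 1, out_size).round().astype(int)
           for s in voi.shape]
    return voi[np.ix_(idx[0], idx[1], idx[2])]

ct = rng.standard_normal((128, 128, 128)).astype(np.float32)  # stand-in scan
voi = crop_and_resample(ct, center=(64, 64, 64), max_dim=20)
```

Drawing the scale at random for each crop gives the generator examples of the same nodule at several context sizes, which is what lets it learn size variation.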
We test against three variants of our method: 1) only using an all-image $\ell_1$ loss; 2) using both the adversarial and all-image $\ell_1$ losses, which is identical to Isola et al.'s approach, except extended to 3D; and 3) using the same combined objective in (3), but without the multi-mask weighting, i.e., only using the first term of equation (2). As reconstruction quality hinges on subjective assessment, we visually examine nodule generation on our test set. Selected examples are shown in Fig. 3.
As can be seen, our proposed CGAN produces realistic, high-quality nodules with various shapes and appearances that naturally blend with surrounding tissues, such as vessels, soft tissue, and parenchyma. In contrast, when only using the reconstruction loss, results are considerably blurred, with very limited variations in shape and appearance. Results from Isola et al.'s method improve upon the $\ell_1$-only loss; however, they exhibit obvious inconsistencies/misalignments with the surrounding tissues and undesired sampling artifacts inside the nodules. It is possible that forcing the generator to reconstruct the entire image distracts it from learning the nodule appearance distribution. Finally, when the $\ell_1$ loss is applied only to the erased region, the artifacts seen in Isola et al.'s results are not exhibited; however, there are stronger border artifacts between the erased region and the rest of the VOI. In contrast, by incorporating the multi-mask loss, our method produces nodules with realistic interiors and without such border artifacts.
3.2 Improving Pathological Lung Segmentation
With the CGAN trained, we test whether it benefits pathological lung segmentation. In particular, the P-HNN model shared by Harrison et al. can struggle when peripheral nodules touch the lung boundary, as these were not well represented in their training set. Prior to any experiments, we selected images from the LIDC dataset exhibiting such peripheral nodules. We then randomly chose LIDC subjects from among relatively healthy subjects with no large nodules. For each of these, we pick random VOI locations centered close to the lung boundary, with random sizes. The VOIs are resampled to 64×64×64 voxels and simulated lung nodules are generated in each VOI, using the same process as in §3.1, except that the trained CGAN is used only for inference. The resulting VOIs are resampled back to their original resolution and pasted back into the original LIDC images, and the axial slices containing the simulated nodules are then used as training data to fine-tune the P-HNN model. For comparison, we also fine-tune P-HNN using images generated by the $\ell_1$-only loss and by Isola et al.'s CGAN.
Fig. 4 depicts quantitative results. First, as the chart demonstrates, fine-tuning using any of the CGAN variants improves P-HNN's performance on peripheral lung nodules. This confirms the value of using simulated data to augment training datasets. Moreover, the quality of the nodules also matters, since results using nodules generated by only an all-image $\ell_1$ loss show the least improvement. Importantly, out of all alternatives, our proposed CGAN produces the greatest improvements in Dice scores, Hausdorff distances, and average surface distances, improving P-HNN's mean Dice score and reducing both the Hausdorff and average surface distances. Worst-case performance is also much better for our proposed system, showing that it can help P-HNN deal with edge cases. In terms of visual quality, Fig. 5 depicts two examples. As these demonstrate, our proposed CGAN allows P-HNN to produce considerably better segmentation masks at peripheral nodules, overcoming an important limitation.
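Of the reported metrics, the Dice overlap is the simplest to state precisely; a minimal sketch for binary masks (the toy 8×8 masks are made up for the worked example):

```python
import numpy as np

def dice(pred, gt):
    """Dice overlap between two binary masks (1.0 when both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

a = np.zeros((8, 8), bool); a[2:6, 2:6] = True   # 16 pixels
b = np.zeros((8, 8), bool); b[3:7, 3:7] = True   # 16 pixels, 9 overlap
score = dice(a, b)                               # 2*9 / 32 = 0.5625
```

Because Dice averages over the whole mask, boundary errors around a single peripheral nodule move it only slightly; that is why the Hausdorff and average surface distances, which are boundary-sensitive, are reported alongside it.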
We use a 3D CGAN, coupled with a novel multi-mask loss, to effectively generate CT-realistic, high-quality lung nodules conditioned on a VOI with an erased central region. Our new multi-mask loss ensures a natural blending of the generated nodules with the surrounding lung tissues. Tests demonstrate the superiority of our approach over three competitor CGANs on the LIDC dataset, including Isola et al.'s state-of-the-art method. We further use our proposed CGAN to generate a fine-tuning dataset for the published P-HNN model, which can struggle when encountering lung nodules adjoining the lung boundary. Armed with our CGAN images, P-HNN captures the true lung boundaries much better than both its original state and when fine-tuned using the other CGAN variants. As such, our CGAN approach provides an effective and generic means to help overcome the dataset bottleneck commonly encountered within medical imaging.
-  Armato, S.G., McLennan, G., Bidaut, L., et al.: The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical Physics 38(2), 915–931 (2011)
-  Chuquicusma, M.J., Hussein, S., Burt, J., Bagci, U.: How to fool radiologists with generative adversarial networks? A visual Turing test for lung cancer diagnosis. In: Proc. IEEE ISBI. pp. 240–244 (2017)
-  Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., et al.: 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation, pp. 424–432. Springer (2016)
-  Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al.: Generative adversarial nets. In: Advances in neural information processing systems. pp. 2672–2680 (2014)
-  Harrison, A.P., Xu, Z., George, K., et al.: Progressive and multi-path holistically nested neural networks for pathological lung segmentation from ct images. In: Proc. MICCAI. pp. 621–629. Springer (2017)
-  Jin, D., Xu, Z., Harrison, A.P., et al.: 3D convolutional neural networks with graph refinement for airway segmentation using incomplete data labels. In: International Workshop on Machine Learning in Medical Imaging. pp. 141–149. Springer (2017)
-  Karwoski, R.A., Bartholmai, B., Zavaletta, V.A., et al.: Processing of ct images for analysis of diffuse lung disease in the lung tissue research consortium. In: Proc. SPIE 6916 (2008)
-  Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
-  Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)
-  Nie, D., Trullo, R., Lian, J., et al.: Medical image synthesis with context-aware generative adversarial networks. In: Proc. MICCAI. pp. 417–425. Springer (2017)
-  Pathak, D., Krahenbuhl, P., Donahue, J., et al.: Context encoders: Feature learning by inpainting. In: Proc. IEEE CVPR. pp. 2536–2544 (2016)
-  Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
-  Wang, X., Peng, Y., Lu, L., et al.: ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proc. IEEE CVPR. pp. 3462–3471 (2017)
-  Wolterink, J.M., Dinkla, A.M., Savenije, M.H., et al.: Deep MR to CT synthesis using unpaired data. In: International Workshop on Simulation and Synthesis in Medical Imaging. pp. 14–23. Springer (2017)