Generative Adversarial Neural Networks (GANs) are state-of-the-art machine learning models that can learn the statistical regularities of input data and then generate a nearly endless stream of synthetic examples that resemble, but do not exactly replicate, the input data. These models have been applied to generate a variety of natural images, including images of bedrooms, faces and animals. In this work GANs are applied to generate realistic-looking synthetic images of prostate lesions resembling the SPIE ProstateX Challenge 2016 training data. Multiple aligned MRI modalities are generated simultaneously, and the model produces compelling results with a relatively small amount of training data.
The ability to create synthetic data that resembles real data in key statistical aspects is well studied and particularly important in the medical field where anonymity is critical. In the appropriate circumstances, machine learning or data mining can be carried out on surrogate synthetic data instead of raw sensitive data, giving improved anonymization. When there are only a small number of training examples, generated data can be used as extra training data, a powerful way to combat overfitting and increase model performance. Synthetic image generation can also be used as an aid for education and medical training.
2 Generative Models
Generative models are distinct from discriminative models because they capture the distribution of the data itself, instead of the conditional probability of a label given data. This data distribution can then be sampled, a process that generates new data that ‘looks like’ real data.
$X$ is a random variable taking values from the input domain and $Y$ is a random variable of associated labels. $P(Y \mid X)$ is the conditional distribution of labels given data and $P(X)$ is the data distribution itself.
Generative models are often much more complicated than discriminative models. They must capture all the intricacies of the data, not just the parts specific to a label. For example, with a blood test, a statistical model may only need to look for elevated levels in one or two dimensions to indicate a pathology, but generating a whole new blood panel that looks as if it had come from a patient would require capturing complex interdependencies between different levels.
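To illustrate the distinction, the sketch below fits a very simple generative model, a two-dimensional Gaussian, to a handful of made-up paired measurements and then samples new pairs that preserve the correlation between them. The data values and function names are hypothetical and chosen only for illustration; this is not the model used in this work.

```python
import math
import random


def fit_bivariate_gaussian(points):
    """Estimate the mean and covariance of 2-D points (maximum likelihood)."""
    n = len(points)
    mean_a = sum(a for a, _ in points) / n
    mean_b = sum(b for _, b in points) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in points) / n
    var_b = sum((b - mean_b) ** 2 for _, b in points) / n
    cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in points) / n
    return (mean_a, mean_b), (var_a, cov_ab, var_b)


def sample_bivariate_gaussian(mean, cov, rng):
    """Draw one correlated pair using the Cholesky factor of the covariance."""
    var_a, cov_ab, var_b = cov
    l11 = math.sqrt(var_a)
    l21 = cov_ab / l11
    l22 = math.sqrt(max(var_b - l21 ** 2, 0.0))
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * u, mean[1] + l21 * u + l22 * v


# Hypothetical paired measurements that rise and fall together.
observed = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2)]
mean, cov = fit_bivariate_gaussian(observed)
rng = random.Random(0)
synthetic = [sample_bivariate_gaussian(mean, cov, rng) for _ in range(2000)]
```

A discriminative model would only need a threshold on one of the two measurements; the generative model has to reproduce how they vary together, which is what the covariance term captures here.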
3 Generative Adversarial Models
An adversarial model is formalized as a game played between two players with distinct, competing objectives, called the generator $G$ and the discriminator $D$. $G$ is an ‘artist’ that tries to create realistic-looking images. $D$ is a ‘critic’ that tries to classify images either as fake, created by the artist, or as real images sampled from the world. The principal equilibrium strategy in this game is for $G$ to draw from the data distribution $P(X)$, in which case $D$ performs no better than random guessing, i.e. the best way for $G$ to fool $D$ is to create images that are indistinguishable from real images (according to $D$).
Interestingly, unlike most models, there is no global loss function that must be minimized. Instead these models are trained to an equilibrium point where neither player can improve their performance through a small unilateral change to their strategy, where a strategy is represented by continuous neural network weights. A leap-frog gradient descent algorithm is used for training: a gradient descent step is taken for one player with the other held constant, then the roles are swapped. With some luck, and under conditions that are in general not well understood, this algorithm can move both players into a suitable equilibrium strategy.
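The alternating update can be sketched on a toy two-player game with scalar strategies. The payoff function, step size and starting point below are hypothetical and were chosen so that the equilibrium at the origin is stable; real GAN training offers no such guarantee.

```python
def leapfrog_descent(x, y, learning_rate=0.1, steps=200):
    """Alternating gradient steps on V(x, y) = 0.5*x**2 + x*y - 0.5*y**2.

    Player x tries to minimise V and player y tries to maximise it, so each
    player descends its own objective while the other is held constant.
    """
    for _ in range(steps):
        # Step for x with y held constant: dV/dx = x + y.
        x = x - learning_rate * (x + y)
        # Step for y with the new x held constant: y minimises -V,
        # and d(-V)/dy = y - x.
        y = y - learning_rate * (y - x)
    return x, y


x_final, y_final = leapfrog_descent(1.0, 1.0)
```

At the origin neither player can improve by changing only their own strategy, which is the equilibrium condition described above.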
This method is particularly powerful if the discriminative models are large Deep Convolutional Neural Networks. If there are any recognizable statistical aberrations in the data generated by the generator $G$, then the discriminator $D$ can catch out the generator by recognizing these aberrations. Unrealistic structures are thus suppressed once training has reached equilibrium, and $G$ produces highly realistic samples.
4 Practical Training of GANs
GANs are notorious for being hard to train: equilibrium strategies are often unstable and hard to reach compared to the optima of a single function. If either the generator $G$ or the discriminator $D$ is too powerful, it will dominate the other, gradients will vanish, and the models will become stuck in a poor equilibrium, often producing images that look like noise or have no content. In general $G$ and $D$ must be designed together and matched in terms of power, i.e. they should be commensurate in layer size and depth. Implementors should be aware that only certain combinations of generator and discriminator will work well together, and compatibility is hard to predict in advance. The authors recommend iterative development informed by existing literature, intuition, and empirical testing.
It can also be beneficial to introduce a large amount of activation noise and dropout into the discriminator $D$, allowing the generator $G$ to compete with a wide variety of slightly different strategies; this can help to escape from poor equilibria. Using batch normalization and special activation functions has also been shown to be effective in some cases.
5.1 Data Preparation
All training data is extracted from the SPIE ProstateX Challenge 2016 data set and prepared using the same method the authors used for competition entries. Patches of in size are extracted around the centres of 330 prostate lesion MRI scans at a resolution of . Three modalities are aligned and utilized: T2, ADC and . All channels are normalized to approximately lie in the range . Each input image patch has three channels, one for each modality. See figure 1 for diagram.
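A minimal sketch of this kind of patch preparation is shown below, using plain nested lists for a single 2-D slice per modality. The patch size, the target range and the use of min-max scaling are all placeholder assumptions, since the normalization method is not described in detail here; the function names are hypothetical, and resampling to a fixed resolution is omitted.

```python
def extract_patch(image, centre, size):
    """Cut a size x size patch whose window is centred on (row, col)."""
    row0 = centre[0] - size // 2
    col0 = centre[1] - size // 2
    if row0 < 0 or col0 < 0 or row0 + size > len(image) or col0 + size > len(image[0]):
        raise ValueError("patch extends beyond the image boundary")
    return [row[col0:col0 + size] for row in image[row0:row0 + size]]


def rescale(patch, low, high):
    """Linearly map patch values onto [low, high] (min-max scaling)."""
    flat = [v for row in patch for v in row]
    v_min, v_max = min(flat), max(flat)
    span = (v_max - v_min) or 1.0  # avoid division by zero on flat patches
    return [[low + (high - low) * (v - v_min) / span for v in row] for row in patch]


def make_example(modalities, centre, size, low, high):
    """Stack one normalized patch per aligned modality into a channel list."""
    return [rescale(extract_patch(m, centre, size), low, high) for m in modalities]


# Toy 6x6 "slice" standing in for one modality; values are row * 6 + col.
toy = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
example = make_example([toy, toy, toy], centre=(3, 3), size=2, low=0.0, high=1.0)
```

Because the modalities are aligned, the same centre and patch size can be reused for every channel, which is what keeps the three channels of one example spatially consistent.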
5.2 Generator Architecture
|T. conv. / ReLU|
|T. conv. / ReLU|
|T. conv. / ReLU|
The generator neural network has 5 layers and includes transposed convolutional layers (also called ‘deconvolutional’ layers). The input is a 25-dimensional vector of standard normal random numbers, followed by a fully connected layer and 3 transposed convolutions. See figure 2 for a schematic and table 1 for layer details.
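The spatial size produced by each transposed convolution follows from standard convolution arithmetic. The sketch below applies the usual relation (ignoring output padding and dilation) to a chain of three layers; the kernel size, stride, padding and starting size are hypothetical, not the values from table 1.

```python
def transposed_conv_output_size(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel


# Hypothetical settings: three stride-2 layers, each doubling the size.
sizes = [4]
for _ in range(3):
    sizes.append(transposed_conv_output_size(sizes[-1], kernel=4, stride=2, padding=1))
```

With these placeholder settings the feature map grows from 4 to 8 to 16 to 32 pixels per side, which shows how a small fully connected output can be upsampled to a full image patch.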
5.3 Discriminator Architecture
|conv. / L. ReLU|32|gaussian|
|conv. / L. ReLU|64|gaussian|
|conv. / L. ReLU|128|gaussian|
|global avg. pool||dropout|
The discriminator neural network has 6 layers: an initial image input layer, 3 layers of convolutions followed by global average pooling, and a fully connected layer. The final hidden layer uses dropout; all other hidden layers have Gaussian noise added. To improve gradient flow by preventing saturation, ‘leaky’ ReLU activation functions are used: , where in this work. Gaussian noise is drawn from . See figure 3 for a schematic and table 2 for details.
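The building blocks named above can be sketched in a few lines of plain Python. The leaky ReLU is written in its common form (identity for positive inputs, a small slope `alpha` for negative inputs), and dropout uses the common ‘inverted’ scaling; both forms are assumptions. The values of `alpha`, `sigma` and `rate` are left as parameters because they are hypothetical placeholders, not the settings used in this work.

```python
import random


def leaky_relu(x, alpha):
    """Pass positive inputs through; scale negative inputs by alpha."""
    return x if x > 0 else alpha * x


def add_gaussian_noise(values, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def dropout(values, rate, rng):
    """Zero each value with probability rate (rate < 1), rescaling the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def global_average_pool(feature_maps):
    """Reduce each 2-D feature map (a list of rows) to its mean value."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


rng = random.Random(0)
activations = [leaky_relu(x, alpha=0.1) for x in [-2.0, 0.5]]
noisy = add_gaussian_noise(activations, sigma=0.1, rng=rng)
pooled = global_average_pool([[[1.0, 2.0], [3.0, 4.0]]])
```

Because the negative branch of the leaky ReLU has a non-zero slope, units that receive negative inputs still pass a gradient back to the generator.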
5.4 Training Objective
Formulas essentially the same as the empirical cross entropy are used for both the $D$ and $G$ loss functions:
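The two losses can be written out as follows. This is a sketch consistent with the term-by-term description in this section; the symbol names ($L_D$, $L_G$, $\theta_D$, $\theta_G$, $\tilde{X}$ for the generated batch, $X_r$ for the real batch, $N_f$, $N_r$) and the non-saturating form of the generator loss are assumptions rather than the authors' exact notation.

```latex
\begin{align}
L_D(\theta_D) &= -\frac{1}{N_f} \sum_{\tilde{x} \in \tilde{X}} \log\bigl(1 - D(\tilde{x})\bigr)
                 - \frac{1}{N_r} \sum_{x \in X_r} \log D(x) \tag{1} \\
L_G(\theta_G) &= -\frac{1}{N_f} \sum_{\tilde{x} \in \tilde{X}} \log D(\tilde{x}) \tag{2}
\end{align}
```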
Where $L_D$ is the discriminator loss function and $L_G$ is the generator loss function. $\theta_D$ and $\theta_G$ are the respective neural network parameters. $D(x)$ is the probability that $D$ assigns to $x$ being real. $\tilde{X}$ is a set of images generated by $G$ for random normal inputs $Z$. $X_r$ is a sample of natural images from the data distribution. The first sum of equation 1 is taken over fake images and penalizes high probabilities from $D$; the second term is taken over real images and penalizes low probabilities. $N_f$ is the number of fake images in a batch and $N_r$ is the number of real images in a batch.
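As a worked illustration, the sketch below evaluates cross-entropy losses of this kind on lists of discriminator output probabilities. The discriminator loss follows the description above (a sum over fake images penalizing high probabilities plus a sum over real images penalizing low probabilities); the generator loss is assumed to be the common non-saturating form, and the function names are hypothetical.

```python
import math


def discriminator_loss(p_fake, p_real):
    """Empirical cross entropy for the discriminator.

    p_fake: probabilities of being real assigned to generated images.
    p_real: probabilities of being real assigned to natural images.
    """
    fake_term = -sum(math.log(1.0 - p) for p in p_fake) / len(p_fake)
    real_term = -sum(math.log(p) for p in p_real) / len(p_real)
    return fake_term + real_term


def generator_loss(p_fake):
    """Assumed non-saturating generator loss: reward fooling the discriminator."""
    return -sum(math.log(p) for p in p_fake) / len(p_fake)


# A discriminator that is guessing at random outputs 0.5 everywhere.
d_at_chance = discriminator_loss([0.5, 0.5], [0.5, 0.5])
g_at_chance = generator_loss([0.5, 0.5])
```

When the discriminator outputs 0.5 for every image, the discriminator loss equals 2 ln 2 and the generator loss equals ln 2, the values expected when the discriminator performs no better than random guessing.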
5.5 Training Procedure
A leapfrog gradient descent is used to find an equilibrium point of the GAN game. The following updates are iterated until convergence:
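One way to write the two alternating updates is given below. The symbol names and the order of the two steps within an iteration (discriminator first) are assumptions; the arrow stands for an update computed by the optimizer from the given gradient.

```latex
\begin{align}
\theta_D &\leftarrow \nabla_{\theta_D} L_D(\theta_D, \theta_G) \\
\theta_G &\leftarrow \nabla_{\theta_G} L_G(\theta_D, \theta_G)
\end{align}
```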
Where $\theta_G$ is a vector of generator neural network parameters, $\theta_D$ are the discriminator parameters, and $L_G$ and $L_D$ are their respective loss functions. The arrow indicates application of the Adam accelerated gradient descent algorithm for the update. The model is trained for 15,000 iterations with a batch size of 200 (200 fake and 200 real images).
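The structure of this procedure can be sketched with a small, self-contained Adam implementation and an alternating loop. The hyperparameters are the defaults suggested in the Adam paper rather than values reported here, the discriminator-first ordering is an assumption, and `grad_d` and `grad_g` are hypothetical callables standing in for back-propagation through the two networks.

```python
import math


class Adam:
    """Minimal Adam optimizer over a flat list of float parameters."""

    def __init__(self, n_params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # running mean of gradients
        self.v = [0.0] * n_params  # running mean of squared gradients
        self.t = 0                 # number of steps taken so far

    def step(self, params, grads):
        """Return updated parameters for one Adam step."""
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


def train(theta_d, theta_g, grad_d, grad_g, iterations):
    """Alternate Adam steps: discriminator first, then generator."""
    opt_d, opt_g = Adam(len(theta_d)), Adam(len(theta_g))
    for _ in range(iterations):
        theta_d = opt_d.step(theta_d, grad_d(theta_d, theta_g))
        theta_g = opt_g.step(theta_g, grad_g(theta_d, theta_g))
    return theta_d, theta_g


# Constant toy gradients, used only to exercise the loop.
theta_d, theta_g = train([0.0], [0.0],
                         grad_d=lambda d, g: [1.0],
                         grad_g=lambda d, g: [-1.0],
                         iterations=1)
```

Each player keeps its own optimizer state, so the moment estimates of the discriminator and the generator never mix.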
See figure 4 for a full-page comparison of real and synthetic images. Qualitatively, the synthetic T2 mode has captured the rough, broken textures of the real patches, and the ADC mode correctly darkens the lesion centre. The mode displays large coherent blobs similar to how they appear in real data. Notice that bright areas are accompanied by matching darker regions in the ADC mode. This is a benefit of generating all modes simultaneously: they are coherent with each other.
For any random input $z$, the generator output $G(z)$ should fool the discriminator $D$ with a high probability. Thus the input space of $G$ forms an implicit latent representation of prostate lesions. See figure 5 for an example of linear interpolation between two lesion images in $z$ space. There is a smooth transition between two lesion morphologies, demonstrating the high quality of the implicit latent representation.
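Linear interpolation in the latent space can be sketched as follows. The two endpoints are drawn as 25-dimensional standard normal vectors to match the generator input described earlier; the number of steps and the function names are hypothetical, and the generator itself is not included, so each interpolated vector would still need to be passed through a trained generator to obtain an image.

```python
import random


def lerp(z0, z1, t):
    """Point a fraction t of the way along the straight line from z0 to z1."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def interpolation_path(z0, z1, steps):
    """Evenly spaced latent vectors from z0 to z1 inclusive (steps >= 2)."""
    return [lerp(z0, z1, i / (steps - 1)) for i in range(steps)]


rng = random.Random(0)
z_start = [rng.gauss(0.0, 1.0) for _ in range(25)]
z_end = [rng.gauss(0.0, 1.0) for _ in range(25)]
path = interpolation_path(z_start, z_end, steps=8)
```

The first and last vectors of `path` are the two endpoints, so the corresponding generated images bracket the transition.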
All included research has been independently self funded by the authors outside of the institutional system. No conflicts of interest, financial or otherwise, are declared by the authors.
The authors would like to acknowledge the organizers of the SPIE ProstateX Challenge 2016 for their hard work in organizing the competition and preparing the training data used in this work.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, et al., Eds., 2672–2680, Curran Associates, Inc. (2014).
-  A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” CoRR abs/1511.06434 (2015).
-  T. Salimans, I. Goodfellow, W. Zaremba, et al., “Improved techniques for training GANs,” in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, et al., Eds., 2234–2242, Curran Associates, Inc. (2016).
-  A. Kitchen and J. Seah, “Support vector machines for prostate lesion classification,” Proc. SPIE 10134, 1013427–1013427–4 (2017).
-  V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning,” ArXiv e-prints 1603.07285 (2016).
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR abs/1412.6980 (2014).