It is widely known that sufficient data volume is necessary for training a successful machine learning algorithm for medical image analysis. Data with high class imbalance or insufficient variability leads to poor classification performance. This is often problematic in medical imaging, where abnormal findings are by definition uncommon. Moreover, for image segmentation tasks, the time required to manually annotate volumetric data exacerbates this disparity: manually segmenting an abnormality in three dimensions can require upwards of fifteen minutes per study, making it impractical in a busy radiology practice. The result is a paucity of annotated data and considerable challenges when attempting to train an accurate algorithm. While traditional data augmentation techniques (e.g., crops, translation, rotation) can mitigate some of these issues, they fundamentally produce highly correlated training images.
In this paper we demonstrate one potential solution to this problem: generating synthetic images using a generative adversarial network (GAN), which provides an additional form of data augmentation and also serves as an effective method of data anonymization. Multi-parametric magnetic resonance images (MRIs) of abnormal brains (with tumor) are generated from segmentation masks of brain anatomy and tumor. This offers an automatable, low-cost source of diverse data that can be used to supplement the training set. For example, we can alter the tumor’s size, change its location, or place a tumor in an otherwise healthy brain, and systematically obtain both the image and the corresponding annotation. Furthermore, a GAN trained on a hospital’s data can generate synthetic images that can be shared outside the institution, so the GAN also serves as an anonymization tool.
Medical image simulation and synthesis have been studied for some time and are gaining traction in the medical imaging community. This is partly due to the exponential growth in data availability, and partly due to better machine learning models and supporting systems. Twelve recent studies on medical image synthesis and simulation were presented in the special issue on Simulation and Synthesis in Medical Imaging.
This work falls into the synthesis category, and the most closely related works are those of Chartsias et al. and Costa et al. We use publicly available data sets (ADNI and BRATS) to demonstrate multi-parametric MRI synthesis, whereas Chartsias et al. use BRATS and the Ischemic Stroke Lesion Segmentation (ISLES) 2015 challenge data set. However, their synthetic images were evaluated using MSE, SSIM, and PSNR, but not directly on diagnostic quality. Costa et al. used a GAN to generate synthetic retinal images with labels, but its ability to represent diverse pathological patterns was limited compared to this work. Also, both previous works were demonstrated on 2D images or slices/views of 3D images, whereas in this work we directly process 3D input/output. The input/output is 4D when it is multi-parametric (T1/T2/T1c/FLAIR). We believe that processing the data in its native 3D/4D form better reflects the reality of the data and its associated problems.
Reflecting the general trend in the machine learning community, the use of GANs in medical imaging has increased dramatically in the last year. GANs have been used to generate a motion model from a single preoperative MRI, upsample a low-resolution fundus image, create a synthetic head CT from a brain MRI, and synthesize T2-weighted MRI from T1-weighted MRI (and vice versa). Segmentation using GANs was demonstrated in [22, 21]. Finally, Frid-Adar et al. leveraged a GAN for data augmentation in the context of liver lesion classification. To the best of our knowledge, there is no existing literature on the generation of synthetic medical images as a form of anonymization and data augmentation for tumor segmentation tasks.
We use two publicly available data sets of brain MRI:
Alzheimer’s Disease Neuroimaging Initiative (ADNI) data set
The ADNI was launched in 2003 as a public-private partnership, led by principal investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information on the ADNI study, see www.adni-info.org. We follow the approach of  that is shown to be effective for segmenting the brain atlas of ADNI data. The atlas of white matter, gray matter, and cerebrospinal fluid (CSF) in the ADNI T1-weighted images are generated using the SPM12  segmentation and the ANTs SyN  non-linear registration algorithms. In total, there are 3,416 pairs of T1-weighted MRI and their corresponding segmented tissue class images.
Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) data set
BRATS utilizes multi-institutional pre-operative MRIs and focuses on the segmentation of intrinsically heterogeneous (in appearance, shape, and histology) brain tumors, namely gliomas. Each patient’s MRI set includes several series (T1-weighted, T2-weighted, contrast-enhanced T1, and FLAIR), along with a ground-truth voxel-wise annotation of edema, enhancing tumor, and non-enhancing tumor. For more details about the BRATS data set, see braintumorsegmentation.org. While the BRATS challenge is held annually, we used the BRATS 2015 training data set, which is publicly available.
2.2 Dataset Split and Pre-Processing
As a pre-processing step, we perform skull-stripping on the ADNI data set, as skulls are not present in the BRATS data set. The BRATS 2015 training set provides 264 studies, of which we used the first 80% as a training set and the remaining 20% as a test set to assess final algorithm performance. Hyper-parameter optimization was performed within the training set, and the test set was evaluated only once for each algorithm and setting assessed. Our GAN operates in 3D, and due to memory and compute constraints, training images were cropped axially to include the central 108 slices, discarding those above and below this central region, then resampled to 128x128x54 for model training and inference. For a fair comparison with the BRATS challenge, we used the original images with a resolution of 256x256x108 for evaluation. However, it is possible that very small tumors are lost in the downsampling, affecting the final segmentation performance.
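The split, crop, and resampling steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the axial direction is assumed to be the last array axis, and the resampling is approximated by averaging 2x2x2 blocks (halving every dimension, consistent with the half-resolution figure reported in Section 4.1). The actual interpolation method is not stated in the text.

```python
import numpy as np

def split_studies(study_ids, train_fraction=0.8):
    """Use the first `train_fraction` of the studies for training, the rest for testing."""
    n_train = int(round(len(study_ids) * train_fraction))
    return study_ids[:n_train], study_ids[n_train:]

def crop_central_slices(volume, n_slices=108):
    """Keep the central `n_slices` along the last axis (assumed to be axial)."""
    depth = volume.shape[-1]
    if depth < n_slices:
        raise ValueError("volume has fewer slices than requested")
    start = (depth - n_slices) // 2
    return volume[..., start:start + n_slices]

def downsample_by_two(volume):
    """Halve each dimension of a 3D image by averaging 2x2x2 blocks (an assumed method)."""
    x, y, z = volume.shape
    if x % 2 or y % 2 or z % 2:
        raise ValueError("all dimensions must be even")
    blocks = volume.reshape(x // 2, 2, y // 2, 2, z // 2, 2)
    return blocks.mean(axis=(1, 3, 5))
```

Block averaging is only appropriate for image intensities. A label map would need a method that preserves class values, for example taking every second voxel with `label[::2, ::2, ::2]`.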
The image-to-image translation conditional GAN (pix2pix) model introduced by Isola et al. is adopted to translate label-to-MRI (synthetic image generation) and MRI-to-label (image segmentation). For brain segmentation, the generator G is given a T1-weighted image from ADNI as input and is trained to produce a brain mask with white matter, gray matter, and CSF. The discriminator D, on the other hand, is trained to distinguish “real” labels from synthetically generated “fake” labels. During this procedure (depicted in Figure 1 (a)), the generator G learns to segment brain labels from a T1-weighted MRI input. Since we did not have an appropriate off-the-shelf segmentation method available for brain anatomy in the BRATS data set, and the ADNI data set does not contain tumor information, we first train the pix2pix model to segment normal brain anatomy from the T1-weighted images of the ADNI data set. We then use this model to perform inference on the T1 series of the BRATS data set. The segmentation of neural anatomy, in combination with the tumor segmentations provided by the BRATS data set, provides a complete segmentation of the brain with tumor.
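For reference, the objective of the original pix2pix formulation can be written as below. This is the loss from the cited image-to-image translation work and is given only as context: the text here does not state the loss weighting or any changes made for 3D inputs, so it should not be read as the exact objective used in this study. In the segmentation direction, x is the T1-weighted image and y the label map; in the synthesis direction the roles are swapped. The symbol z denotes the generator's noise source and λ the weight of the L1 term.

```latex
\begin{align}
\mathcal{L}_{\mathrm{cGAN}}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big], \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_{1}\big], \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{\mathrm{cGAN}}(G, D) + \lambda\, \mathcal{L}_{L1}(G).
\end{align}
```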
Synthetic image generation is trained by reversing the inputs to the generator and training the discriminator to perform the inverse task, i.e., “is this imaging data acquired from a scanner or synthetically generated?” as opposed to “is this segmentation the ground-truth annotation or synthetically generated?” (Figure 1 (b)). We generate synthetic abnormal brain MRI from the labels and introduce variability by adjusting those labels (e.g., changing the tumor size, moving the tumor’s location, or placing a tumor on an otherwise tumor-free brain label). The GAN segmentation module is then used once again to segment tumor from the BRATS data set (input: multi-parametric MRI; output: tumor label). We compare the segmentation performance 1) with and without additional synthetic data, and 2) using only the synthetic data and fine-tuning the model on 10% of the real data; we also compare the performance of the GAN to that of a top-performing algorithm from the BRATS 2017 challenge (https://github.com/taigw/brats17).
3.1 Data Augmentation with Synthetic Images
The GAN trained to generate synthetic images from labels allows for the generation of arbitrary multi-series abnormal brain MRIs. Since we have the brain anatomy label and the tumor label separately, we can alter either one to obtain synthetic images with the characteristics we desire. For instance, we can alter tumor characteristics such as the size or location within an existing brain and tumor label set, or place a tumor label on an otherwise tumor-free brain label. Examples of this are shown in Figure 3.
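The label edits described above (moving a tumor, enlarging it, or placing it on a tumor-free brain label) can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the function names, the label encoding (brain classes 1 to 3, tumor classes offset by `N_BRAIN_CLASSES` in the merged map), and the simple one-voxel growth rule are hypothetical and are not taken from the paper.

```python
import numpy as np

N_BRAIN_CLASSES = 3  # assumed encoding: 1..3 for CSF, gray matter, white matter

def shift_label(label, offset):
    """Translate a label volume by an integer voxel offset, padding with 0 (no wrap-around)."""
    shifted = np.zeros_like(label)
    src, dst = [], []
    for size, d in zip(label.shape, offset):
        d = max(-size, min(size, int(d)))  # clamp so the slices stay valid
        if d >= 0:
            src.append(slice(0, size - d))
            dst.append(slice(d, size))
        else:
            src.append(slice(-d, size))
            dst.append(slice(0, size + d))
    shifted[tuple(dst)] = label[tuple(src)]
    return shifted

def grow_tumor(tumor_label, iterations=1):
    """Enlarge the tumor by one voxel per iteration along each axis direction."""
    grown = tumor_label.copy()
    for _ in range(iterations):
        current = grown.copy()
        for axis in range(grown.ndim):
            for step in (-1, 1):
                offset = [0] * grown.ndim
                offset[axis] = step
                neighbour = shift_label(current, offset)
                # Fill only background voxels; existing tumor classes are kept.
                grown = np.where(grown == 0, neighbour, grown)
    return grown

def merge_labels(brain_label, tumor_label):
    """Overlay tumor classes on a brain label; tumor voxels outside the brain are dropped."""
    tumor = np.where(brain_label > 0, tumor_label, 0)
    return np.where(tumor > 0, tumor + N_BRAIN_CLASSES, brain_label)
```

Placing a tumor on a tumor-free brain then amounts to calling `merge_labels(healthy_brain_label, tumor_label)`, optionally after `shift_label` or `grow_tumor`. The merged label map is what a label-to-MRI generator would receive as input.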
The effect of the brain segmentation algorithm’s performance has not been evaluated in this study.
Since the GAN was first trained on 3,416 pairs of T1-weighted (T1) images from the ADNI data set, the generated T1 images are of high quality and qualitatively difficult to distinguish from their original counterparts. BRATS data was used to train the generation of the non-T1-weighted image series. Contrast-enhanced T1-weighted images use the same image acquisition scheme as T1-weighted images. Consequently, the synthesized contrast-enhanced T1 images appear reasonably realistic, although higher contrast along the tumor boundary is observed in some of the generated images. T2-weighted (T2) and FLAIR image acquisitions are fundamentally different from T1-weighted acquisitions, and the resulting synthetic images are easier to distinguish from scanner-acquired images. However, this early evidence suggests that, given a sufficiently large training set for all of these modalities, generating realistic synthetic images for every modality may be possible.
Beyond increasing the image resolution and obtaining more data, especially for sequences other than T1-weighted images, a few important avenues remain to be explored to improve overall image quality. For instance, more attention likely needs to be paid to the tumor boundaries so that a synthetic tumor does not look superimposed and discrete when placed. Also, the performance of the brain segmentation algorithm and its ability to generalize across data sets need to be examined in order to obtain higher-quality synthetic images when combining data sets from different patient populations.
Augmentation using synthetic images can be used in addition to the usual data augmentation methods such as random cropping, rotation, translation, or elastic deformation. Moreover, the GAN-based approach gives us more control over the augmented images, because the label provides an additional input that can be perturbed. The usual data augmentation methods rely mostly on random processes and operate on the whole image rather than on a specific location, such as the tumor. Additionally, since we generate each image from its corresponding label, we obtain more training images without going through the labor-intensive manual annotation process. Figure 4 shows the process of training the GAN with real and synthetic image and label pairs.
3.2 Generating Anonymized Synthetic Images with Variation
Protection of personal health information (PHI) is a critical aspect of working with patient data. Concern over the dissemination of patient data often restricts its availability to the research community, hindering development of the field. While removing all DICOM metadata and skull-stripping will often eliminate nearly all identifiable information, demonstrably proving this to a hospital’s data sharing committee is nearly impossible. Simply de-identifying the data is insufficient. Furthermore, models themselves are subject to caution when derived from sensitive patient data: it has been shown that private data can be extracted from a trained model.
Development of a GAN that generates synthetic, but realistic, data may address these challenges. The first two rows of Figure 3 illustrate how, even with the same segmentation mask, notable variations can be observed between the generated and original studies. This indicates that the GAN produces images that do not reflect the underlying patients as individuals, but rather draws individuals from the population in aggregate. It generates new data that cannot be attributed to a single patient but is rather an instantiation of the training population conditioned upon the provided segmentation.
| Method | Real | Real + Synthetic | Synthetic only | Synthetic only, fine-tune on 10% real |
|---|---|---|---|---|
| GAN-based (no aug) | 0.64/0.14 | 0.80/0.07 | 0.25/0.14 | 0.80/0.18 |
| GAN-based (with aug) | 0.81/0.13 | 0.82/0.08 | 0.44/0.16 | 0.81/0.09 |
| Wang et al. | 0.85/0.15 | 0.86/0.09 | 0.66/0.13 | 0.84/0.15 |

Table 1: Dice score evaluation (mean / standard deviation) of the GAN-based segmentation algorithm and the BRATS’17 top-performing algorithm, trained on “real” data only; on real + synthetic data; on synthetic data only; and on synthetic data only followed by fine-tuning on 10% of the real data. GAN-based models were trained both with (with aug) and without (no aug) the usual data augmentation techniques (crop, rotation, translation, and elastic deformation). All models were trained for 200 epochs to convergence.
4 Experiments and Results
4.1 Data Augmentation using Synthetic Data
Dice score evaluations of the whole-tumor segmentation produced by the GAN-based model and by the model of Wang et al., each trained on real data and on real plus synthetic data, are shown in Table 1. The segmentation models are trained either on 80% of the BRATS’15 training data only or on that training data supplemented with synthetic data. Dice scores are evaluated on the 20% held-out set from the BRATS’15 training data. All models are trained for 200 epochs on NVIDIA DGX systems.
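For readers unfamiliar with the metric, the Dice score between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The sketch below shows one way to compute it and to summarize it as mean and standard deviation over studies, as reported in Table 1. The function names are hypothetical, and the whole-tumor mask assumes that every tumor sub-region is encoded as a non-zero class in the label map; the paper's own evaluation code is not shown in the text.

```python
import numpy as np

def whole_tumor_mask(label):
    """Binary whole-tumor mask: any non-zero tumor class (assumed encoding)."""
    return np.asarray(label) > 0

def dice_score(pred_mask, true_mask):
    """Dice overlap between two binary masks; defined as 1.0 when both are empty."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denominator = pred.sum() + true.sum()
    if denominator == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, true).sum() / denominator

def summarize(scores):
    """Mean and (population) standard deviation of per-study Dice scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std()
```

Whether the reported standard deviation is the population or the sample estimate is not stated, so `summarize` uses NumPy's default as a placeholder choice.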
Without the usual data augmentation (crop, rotation, elastic deformation), adding synthetic data improves performance considerably (GAN-based (no aug): mean Dice 0.64 to 0.80 in Table 1). With the usual data augmentation, only a small increase is observed (GAN-based (with aug): 0.81 to 0.82); the same holds for the model of Wang et al., which incorporates the usual data augmentation techniques (0.85 to 0.86).
The model of Wang et al. operates at full resolution (256x256), combining three 2D models for the axial/coronal/sagittal views, whereas our model and generator operate at half the resolution (128x128x54) due to GPU memory limits. We up-sampled the GAN-generated images to twice the generated resolution for a fair comparison with the BRATS challenge; however, it is possible that very small tumors are lost during the down-/up-sampling. Better performance may be achievable with the GAN-based model given a GPU with more memory. We also believe that the half-resolution synthetic images, coupled with the lack of training images for sequences other than T1-weighted ones, possibly led to the relatively small increase in segmentation performance compared to using the usual data augmentation techniques. We cautiously hypothesize that, with more T2/FLAIR images available, image quality will improve for these sequences, leading to better performance for more models and tumor types.
4.2 Training on Anonymized Synthetic Data
We also evaluated the performance of the GAN-based segmentation when trained on synthetic data only, in amounts greater than or equal to the amount of real data but without including any of the original data. The Dice score evaluations are shown in Table 1. Sub-optimal performance is achieved by both our GAN-based model and the model of Wang et al. when training on an amount of synthetic data equal to the original 80% training set. However, higher performance, comparable to training on real data, is achieved when the two models are trained on more than five times as much synthetic data (only) and then fine-tuned on a 10% random selection of the “real” training data. In this case, the synthetic data provides a form of pre-training, allowing much less “real” data to be used to achieve a comparable level of performance.
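The four training regimes compared in Table 1 differ only in which studies feed the main training stage and whether a fine-tuning stage follows. A minimal sketch of that bookkeeping is given below. The regime names, the function names, and the seeded sampling are hypothetical choices for illustration; the text only states that a 10% random selection of the real training data is used for fine-tuning.

```python
import random

def fine_tune_subset(real_train_ids, fraction=0.10, seed=0):
    """Pick a reproducible random subset of the real training studies."""
    rng = random.Random(seed)
    n_keep = max(1, round(len(real_train_ids) * fraction))
    return sorted(rng.sample(list(real_train_ids), n_keep))

def training_plan(real_train_ids, synthetic_ids, regime, seed=0):
    """Return the study lists for the main training stage and the optional fine-tuning stage."""
    real = list(real_train_ids)
    synthetic = list(synthetic_ids)
    if regime == "real":
        return {"train": real, "fine_tune": []}
    if regime == "real+synthetic":
        return {"train": real + synthetic, "fine_tune": []}
    if regime == "synthetic":
        return {"train": synthetic, "fine_tune": []}
    if regime == "synthetic+fine_tune":
        return {"train": synthetic, "fine_tune": fine_tune_subset(real, seed=seed)}
    raise ValueError(f"unknown regime: {regime}")
```

In the last regime no real study enters the main training stage, which is what makes it possible to pre-train on shared synthetic data and fine-tune later on a small institution-specific set.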
In this paper, we propose a generative algorithm to produce synthetic abnormal brain tumor multi-parametric MRI images from their corresponding segmentation masks using an image-to-image translation GAN. High levels of variation can be introduced when generating such synthetic images by altering the input label map. This results in improvements in segmentation performance across multiple algorithms. Furthermore, these same algorithms can be trained on completely anonymized data sets allowing for sharing of training data. When combined with smaller, institution-specific data sets, modestly sized organizations are provided the opportunity to train successful deep learning models.
- [1] John Ashburner and Karl J Friston. Unified segmentation. NeuroImage, 26(3):839–851, 2005.
- [2] Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, and Dawn Song. The secret sharer: Measuring unintended neural network memorization & extracting secrets. arXiv preprint arXiv:1802.08232, 2018.
- [3] A. Chartsias, T. Joyce, M. V. Giuffrida, and S. A. Tsaftaris. Multimodal MR synthesis via modality-invariant latent representation. IEEE Transactions on Medical Imaging, 37(3):803–814, 2018.
- [4] P. Costa, A. Galdran, M. I. Meyer, M. Niemeijer, M. Abràmoff, A. M. Mendonça, and A. Campilho. End-to-end adversarial retinal image synthesis. IEEE Transactions on Medical Imaging, 37(3):781–791, 2018.
- [5] Salman Ul Hassan Dar, Mahmut Yurt, Levent Karacan, Aykut Erdem, Erkut Erdem, and Tolga Çukur. Image synthesis in multi-contrast MRI with conditional generative adversarial networks. arXiv preprint arXiv:1802.01221, 2018.
- [6] Pedro Domingos. A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87, 2012.
- [7] A. F. Frangi, S. A. Tsaftaris, and J. L. Prince. Simulation and synthesis in medical imaging. IEEE Transactions on Medical Imaging, 37(3):673–679, 2018.
- [8] Maayan Frid-Adar, Eyal Klang, Michal Amitai, Jacob Goldberger, and Hayit Greenspan. Synthetic data augmentation using GAN for improved liver lesion classification. In IEEE International Symposium on Biomedical Imaging (ISBI), 2018.
- [9] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- [10] Yipeng Hu, Eli Gibson, Tom Vercauteren, Hashim U Ahmed, Mark Emberton, Caroline M Moore, J Alison Noble, and Dean C Barratt. Intraoperative organ motion models with an ensemble of conditional generative adversarial networks. In MICCAI, 2017.
- [11] Juan Eugenio Iglesias, Cheng-Yi Liu, Paul M Thompson, and Zhuowen Tu. Robust brain extraction across datasets and comparison with publicly available methods. IEEE Transactions on Medical Imaging, 30(9):1617–1634, 2011.
- [12] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- [13] Dwarikanath Mahapatra, Behzad Bozorgtabar, Sajini Hewavitharanage, and Rahil Garnavi. Image super resolution using generative adversarial networks and local saliency maps for retinal image analysis. In MICCAI, 2017.
- [14] Bjoern H Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging, 34(10):1993–2024, 2015.
- [15] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 565–571. IEEE, 2016.
- [16] Dong Nie, Roger Trullo, Jun Lian, Caroline Petitjean, Su Ruan, Qian Wang, and Dinggang Shen. Medical image synthesis with context-aware generative adversarial networks. In MICCAI, 2017.
- [17] Christopher G Schwarz, Jeffrey L Gunter, Heather J Wiste, Scott A Przybelski, Stephen D Weigand, Chadwick P Ward, Matthew L Senjem, Prashanthi Vemuri, Melissa E Murray, Dennis W Dickson, et al. A large-scale comparison of cortical thickness and volume methods for measuring Alzheimer’s disease severity. NeuroImage: Clinical, 11:802–812, 2016.
- [18] Hoo-Chang Shin, Holger R Roth, Mingchen Gao, Le Lu, Ziyue Xu, Isabella Nogues, Jianhua Yao, Daniel Mollura, and Ronald M Summers. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35(5):1285–1298, 2016.
- [19] Nicholas J Tustison, Philip A Cook, Arno Klein, Gang Song, Sandhitsu R Das, Jeffrey T Duda, Benjamin M Kandel, Niels van Strien, James R Stone, James C Gee, et al. Large-scale evaluation of ANTs and FreeSurfer cortical thickness measurements. NeuroImage, 99:166–179, 2014.
- [20] Guotai Wang, Wenqi Li, Sebastien Ourselin, and Tom Vercauteren. Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. arXiv preprint arXiv:1709.00382, 2017.
- [21] Dong Yang, Daguang Xu, S Kevin Zhou, Bogdan Georgescu, Mingqing Chen, Sasa Grbic, Dimitris Metaxas, and Dorin Comaniciu. Automatic liver segmentation using an adversarial image-to-image network. In MICCAI, 2017.
- [22] Yizhe Zhang, Lin Yang, Jianxu Chen, Maridel Fredericksen, David P Hughes, and Danny Z Chen. Deep adversarial networks for biomedical image segmentation utilizing unannotated images. In MICCAI, 2017.