Generative Adversarial Networks (GANs) have been widely used in medical image synthesis for more than two years. This technique has shown promising results across different modalities, organs and applications: colour retinal image synthesis, CT lung patches for lesion-detection augmentation, and MRI brain lesion removal/addition for segmentation and classification (augmentation) tasks, to name a few. However, many of these works focused solely on improving the performance of a deep-learning-based model (application-driven evaluation). In contrast, few papers have shed light on visualising the feature space of the generated images and/or collecting specialists' opinions. In a previous work, we used Deep Convolutional GANs (DCGANs) and showed that the generated synthetic images were effective in enhancing the classification process, even under unbalanced conditions (where the class under study, the positive class, was the minority relative to the normal, negative class). To conduct an integrated study and to satisfy the requirements of the image synthesis definition given by Frangi et al. (the realism and anatomical plausibility conditions), different aspects of the evaluation of the synthetic images should be examined. Here, we give more importance to the nature of the synthetic images (previously generated masses and microcalcifications) by visualising the 2D feature space using t-distributed Stochastic Neighbor Embedding (t-SNE). Additionally, the professional opinion of two radiologists was collected through a computerised study to analyse how realistic the generated images appeared to specialists.
The dataset used in this work is the OPTIMAM Mammography Image Database (OMI-DB). This database includes more than 145,000 cases (over 2.4 million images) and comprises unprocessed and processed digital mammograms from the Breast Screening Programme of the United Kingdom. We obtained a subset of this database comprising over 80,000 cases. This subset contains images from four vendors; however, only images acquired with Hologic Selenia Dimensions systems (Hologic, Inc; Bedford, Massachusetts, USA) were used in this work. The database includes expert annotations describing each image and any clinical findings. After applying histogram normalisation, a total of 5,351 mass and microcalcification lesion patches and 22,000 normal tissue patches were extracted with size pixels.
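The text does not state which histogram normalisation was applied before patch extraction. As a minimal sketch, assuming classical histogram equalisation on a patch of integer grey levels, the mapping below stretches the cumulative histogram so that the intensities present in the patch spread over the full output range. The function name `equalise_histogram` and its `levels` parameter are hypothetical illustrations, not part of any released code.

```python
from collections import Counter


def equalise_histogram(patch, levels=256):
    """Histogram-equalise a 2D patch (list of rows) of integer grey levels in [0, levels).

    Assumption: "histogram normalisation" is read here as classical histogram
    equalisation; the paper does not specify the exact method.
    """
    pixels = [value for row in patch for value in row]
    n = len(pixels)
    counts = Counter(pixels)

    # Cumulative distribution function (CDF) over the grey levels present.
    cdf = {}
    running = 0
    for level in sorted(counts):
        running += counts[level]
        cdf[level] = running

    cdf_min = cdf[min(counts)]
    if n == cdf_min:
        # Constant patch: there is no contrast to redistribute.
        return [list(row) for row in patch]

    # Map each grey level so that the CDF spans the full output range.
    lookup = {
        level: round((count - cdf_min) / (n - cdf_min) * (levels - 1))
        for level, count in cdf.items()
    }
    return [[lookup[value] for value in row] for row in patch]
```

For example, `equalise_histogram([[10, 10], [20, 30]])` returns `[[0, 0], [128, 255]]`: the three grey levels present are pushed to the bottom, middle and top of the 8-bit range.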
As described in our previous work, DCGANs were used to generate synthetic mammographic mass lesions and microcalcifications. The DCGAN training process is illustrated in the schematic in Fig. 1. In this figure, a batch of 64 latent vectors of length 200 is sampled from the latent space and fed to the generator G, which learns to map them to the distribution of mammographic lesions. The discriminator D, in turn, learns to distinguish between real and synthetic lesions, outputting a value between 0 and 1 (zero denotes definitely fake, while one denotes definitely real). For more details about the DCGAN training techniques used, see Alyafi et al.
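To make the adversarial set-up of Fig. 1 concrete, the sketch below samples a batch of 64 latent vectors of length 200 and evaluates binary cross-entropy objectives on discriminator outputs in [0, 1]: the standard GAN discriminator loss of Goodfellow et al. and the common non-saturating generator loss. It assumes a standard normal prior for the latent vectors (the text does not specify the sampling distribution) and omits the networks themselves; all function names are hypothetical, and the actual training details are those reported in Alyafi et al.

```python
import math
import random

BATCH_SIZE = 64   # batch size stated in the text
LATENT_DIM = 200  # latent vector length stated in the text


def sample_latent_batch(batch_size=BATCH_SIZE, latent_dim=LATENT_DIM, seed=None):
    """Sample a batch of latent vectors; assumption: z ~ N(0, I)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(batch_size)]


def sigmoid(x):
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy of one prediction against a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """Mean BCE with real patches labelled 1 and synthetic patches labelled 0."""
    losses = [bce(p, 1.0) for p in d_real] + [bce(p, 0.0) for p in d_fake]
    return sum(losses) / len(losses)


def generator_loss(d_fake):
    """Non-saturating loss: G is rewarded when D scores its samples as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)
```

When D outputs 0.5 for every patch (it cannot tell real from synthetic), its loss equals ln 2, which is the equilibrium value of this objective.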
To assess the realism of the generated images (patches), we projected the patches into a 2D feature space, which shows where each cluster (real mass lesions, synthetic mass lesions, and real normal tissue) lies. Additionally, a human-observer study was performed with radiologists to evaluate the realism of the synthetic images.
4.1 t-SNE experiment
t-SNE was used to reduce the dimensionality of the image patches from their original pixel dimensionality to 2, using 500 patches from each cluster (real mass lesions, synthetic mass lesions, and real normal tissue). We used 4,000 iterations and a perplexity of 250.
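The perplexity parameter sets the effective number of neighbours t-SNE considers for each point: for every patch, t-SNE searches for the Gaussian bandwidth whose conditional neighbour distribution P_i has perplexity 2^H(P_i) equal to the chosen value (here 250, out of 1,499 possible neighbours among the 1,500 patches). The pure-Python sketch below shows only this per-point calibration step as described by van der Maaten and Hinton; it is not the full embedding optimisation, and the function names are hypothetical. In practice a library implementation would be used, for instance scikit-learn's `sklearn.manifold.TSNE` with `n_components=2` and `perplexity=250`, with the iteration count set through `max_iter` (named `n_iter` in older releases).

```python
import math


def conditional_probabilities(sq_dists, beta):
    """p_{j|i} for one point i, given squared distances to every other point.

    beta is the Gaussian precision 1 / (2 * sigma_i**2).
    """
    d_min = min(sq_dists)
    # Subtracting d_min avoids underflow and cancels out after normalisation.
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity 2**H(P), with the Shannon entropy H measured in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy


def calibrate_beta(sq_dists, target, tol=1e-5, max_iter=200):
    """Binary-search beta so that the perplexity of P_i matches the target."""
    if not 1.0 < target <= len(sq_dists):
        raise ValueError("perplexity must lie in (1, number of neighbours]")
    lo, hi = 0.0, math.inf
    beta = 1.0
    for _ in range(max_iter):
        probs = conditional_probabilities(sq_dists, beta)
        perp = perplexity(probs)
        if abs(perp - target) < tol:
            return beta, probs
        if perp > target:
            # Distribution too flat: raise the precision (narrower Gaussian).
            lo = beta
            beta = beta * 2.0 if hi == math.inf else (beta + hi) / 2.0
        else:
            # Distribution too peaked: lower the precision (wider Gaussian).
            hi = beta
            beta = (beta + lo) / 2.0
    return beta, conditional_probabilities(sq_dists, beta)
```

A larger perplexity such as 250 forces wider Gaussians, so each patch is positioned relative to a broad neighbourhood; this favours the global cluster structure that Fig. 2 is meant to reveal.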
4.2 Radiologists study
Two radiologists from two different hospitals in Catalonia (Spain), with 7 and more than 25 years of experience in breast radiology, respectively, participated in a human-observer study using a balanced random sample of 150 patches ( pixels) containing cancerous mass lesions (75 real and 75 synthetic). For each image, they chose among six options: extremely, moderately, or slightly confident real; or extremely, moderately, or slightly confident synthetic. These options were then converted to numerical values by assigning each a probability, with the highest value assigned to extremely confident real. These values were then used to calculate the accuracy (using 0.5 as the threshold) and to draw the Receiver Operating Characteristic (ROC) curve.
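The conversion from ratings to scores, the thresholded accuracy and the ROC analysis can be sketched as follows. The text only states that the six options were mapped to probabilities with the highest value for extremely confident real; the evenly spaced values in `RATING_TO_PROB` are a hypothetical choice, as are all function names. Real patches are treated as the positive class, and the AUC is computed with the trapezoidal rule over the empirical ROC points, which is equivalent to counting tied real/synthetic pairs as half-correct.

```python
# Hypothetical mapping: only the ordering (highest for "extremely real")
# comes from the text; the evenly spaced values are an assumption.
RATING_TO_PROB = {
    "extremely real": 1.0,
    "moderately real": 0.8,
    "slightly real": 0.6,
    "slightly synthetic": 0.4,
    "moderately synthetic": 0.2,
    "extremely synthetic": 0.0,
}


def accuracy(scores, labels, threshold=0.5):
    """Fraction of patches classified correctly; labels: 1 = real, 0 = synthetic."""
    correct = sum((score >= threshold) == (label == 1) for score, label in zip(scores, labels))
    return correct / len(labels)


def roc_points(scores, labels):
    """Empirical ROC curve as (false positive rate, true positive rate) pairs."""
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Sweep the threshold from the most to the least "real" score.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC near 0.5 means a randomly chosen real patch is rated as more real than a randomly chosen synthetic one only about half of the time, i.e. chance-level discrimination.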
5 Results and discussion
5.1 t-SNE analysis
As a result of the experiment described in Sec. 4.1, Fig. 2 shows the distribution of: i) real lesions (red crosses), ii) synthetic lesions (green circles), and iii) real normal tissue (purple triangles). The figure shows that the distribution of the synthetic images largely matches that of the real ones, suggesting high realism and diversity. Moreover, even though the real lesions include some outliers located on the side of the negative (normal tissue) class, one of which is indicated by the arrow, the DCGAN learned the main distribution and produced very few synthetic outliers (the circles on the side of the triangles).
5.2 Radiologists assessment
The accuracy and the ROC curve of the observer study were computed. The accuracies were and for observers 1 and 2, respectively, and the Area Under the ROC Curve (AUC) values were and for observers 1 and 2, respectively. These results suggest that the generated images appeared anatomically plausible and were hard to distinguish from real ones, even for specialists (AUC values close to the 0.5 expected from a random classifier).
5.3 Qualitative results
Fig. 3 shows a sample of 16 synthetic breast lesions (mass and microcalcification) alongside a similarly sized sample of real lesions, illustrating that the DCGAN could generate mammographic patches that look similar to real ones. Additionally, the diversity of the generated lesions is evident in their different shapes and types (mass, microcalcification, or mass with microcalcification).
Application-driven evaluations do not necessarily guarantee plausible-looking synthetic images. Consequently, they must be accompanied by observer studies and analyses of the feature-space distribution. Here, we conducted an observer study and analysed the distribution of the real and generated patches to support the findings of our previous application-driven work.
-  (2020) DCGANs for Realistic Breast Mass Augmentation in X-ray Mammography. In Medical Imaging 2020: Computer-Aided Diagnosis. Note: arXiv:1909.02062.
-  (2020) Quality analysis of DCGAN-generated mammography lesions. In IWDM 14: International Workshop on Breast Imaging.
-  (2018-03) End-to-end adversarial retinal image synthesis. IEEE Transactions on Medical Imaging 37 (3), pp. 781–791.
-  (2018-03) Simulation and synthesis in medical imaging. IEEE Transactions on Medical Imaging 37 (3), pp. 673–679.
-  (2018) GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321, pp. 321–331.
-  (2014-03) The oncology medical image database (OMI-DB). In Proc. SPIE 9039 Medical Imaging 2014: PACS and Imaging Informatics: Next Generation and Innovations, Vol. 9039, pp. 903906–1.
-  (2014) Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27, pp. 2672–2680.
-  (2016) Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In 2016 International Conference on Learning Representations (ICLR).
-  (2018-10) An Adversarial Learning Approach to Medical Image Synthesis for Lesion Detection. arXiv:1810.10850.
-  (2008) Visualizing Data using t-SNE. Journal of Machine Learning Research 9, pp. 2579–2605.