Deep learning in computer vision has achieved remarkable results in the past few years Deng et al. (2009); Karras et al. (2017). Most of these advances have been enabled by increased computational power and large amounts of data. Unfortunately, in many scientific fields such as medical imaging, there are usually several orders of magnitude fewer data samples to work with than in large-scale computer vision datasets. Leaving aside issues like anonymization and privacy, this poses several specific problems for anyone wishing to use medical imaging datasets:
Scarcity — Data is hard to obtain; usually only a few samples are available per dataset.
Bias — Medical and other small datasets usually contain many more negative (healthy) images than positive ones (with a valid and confirmed illness). This is because the data usually comes from a real-world diagnostic process, where images are acquired even at a low suspicion threshold, since the potential harms of the imaging procedure are far outweighed by the benefit of a prompt diagnosis. Furthermore, in screening settings a large population of completely symptom-free subjects is deliberately examined.
Also often present are “confirmation” images: for a patient with a positive finding, many more images are taken to confirm the diagnosis and monitor its progress. This only increases the bias, as the dataset then contains several positive images of the same patient. In other fields, variations on these processes also exist, all resulting in a similar bias.
Noise — Introduced by capturing devices, errors made during data processing or storage, or from a naturally noisy population (e.g. synthetic implants, marker wires, or prior surgery related to the illness).
All three of these issues pose a significant challenge for training classification models. In this work, we aim to partially alleviate the first two problems in the context of binary image classification. Our contributions are the following:
We train a generative model that can transform data from one class to the other and back, using a CycleGAN architecture Zhu et al. (2017).
We show that the classifier is partially fooled into thinking that the transformed images are of the respective real class-label distributions.
We show that the performance of the classifier may improve when its training data is augmented with the transformed images, in comparison to classical image augmentation.
2 Related work
Generative Adversarial Networks (GANs), proposed by Goodfellow et al. (2014), have shown great potential for generating and modifying images. Many studies have focused on image augmentation using GANs Shrivastava et al. (2017); Mueller et al. (2018); Bousmalis et al. (2018). The application to the medical domain is natural, because medical data is generally difficult to obtain and the datasets are naturally heavily imbalanced. Shin et al. (2018) focus on brain MRI augmentation using paired image-to-image translation similar to the pix2pix approach of Isola et al. (2017).
However, paired images (e.g. the same breast in the same view with and without cancer) are very hard to obtain. Thus, we focus on unpaired image augmentation. In their work on CycleGAN, Zhu et al. (2017) used a pair of GANs coupled with a cycle-consistency loss for unpaired image-to-image translation, and succeeded in converting images between two domains (e.g. horses to zebras). In our work, we adopt this idea to generate cancerous features into, or remove them from, mammography images.
Parallel to our research, Sun et al. (2018) have applied the CycleGAN architecture to augment brain and liver MRI scans. Aligned with our work, they show that such augmentation boosts the classifier’s performance.
For the actual cancerous lesion detection, there have been several studies utilizing deep neural networks on image patches, like Lévy and Jain (2016). The reason for looking at smaller patches is mostly dimensionality reduction. There have also been attempts at detection by training on whole images Ribli et al. (2018); Shen (2017); Hussain et al. (2017). They all augment the dataset by translating, rotating, or flipping the images to improve the system’s performance, which we also compare against in our experiments.
3 Method
Our approach consists of two models, trained separately. In the first step, we train a specific GAN architecture to learn a transformation from the domain of images of one class label to the domain of images of the other class label. In the second step, we use the generative model to augment the training data of a Faster R-CNN classifier Ren et al. (2015) to improve its performance.
3.1 Generative augmentation model
The generative model is based on CycleGAN Zhu et al. (2017). Its goal is to perform unpaired translation of images from one domain to another, and back. It achieves this by training two generator–discriminator pairs and introducing a cycle-consistency loss. In our case, we apply it to generate and remove cancerous features from mammography images. Figure 1 shows the output of the generative model on two training samples.
More formally, CycleGAN transforms images from a domain $X$ to another domain $Y$. For that, it uses two independent mappings, $G: X \to Y$ and $F: Y \to X$. To train these mappings directly, one would need paired images, which are very hard to obtain (for example, the same patient’s image with and without breast cancer, in the exact same orientation). Instead, CycleGAN uses a GAN-like loss: a discriminator $D_Y$ attempts to differentiate the generated images $G(x)$ from real images of the domain $Y$ (and analogously $D_X$ for the mapping $F$).
Furthermore, it adds a cycle-consistency loss $\mathcal{L}_{cyc}$, which enforces the “identity” property $F(G(x)) \approx x$. All of this is done analogously for domain $Y$ as well, i.e. $G(F(y)) \approx y$. Figure 2 shows a simple diagram of the model.
The loss is composed of the following partial loss terms. The first is the classic adversarial GAN loss, where $D_Y$ is the discriminator of the GAN on domain $Y$, and $G$ is the generator of samples in the domain $Y$ given a sample from $X$:
$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p(y)}\left[\log D_Y(y)\right] + \mathbb{E}_{x \sim p(x)}\left[\log\left(1 - D_Y(G(x))\right)\right]$$
The second loss term corresponds to cycle-consistency losses for both directions:
$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p(x)}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y \sim p(y)}\left[\lVert G(F(y)) - y \rVert_1\right]$$
The loss of the final model sums all the partial loss terms with constant weights (regarded as hyperparameters):
$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda\,\mathcal{L}_{cyc}(G, F)$$
The objective of training is summarized by the following optimization problem:
$$G^*, F^* = \arg\min_{G, F}\,\max_{D_X, D_Y}\; \mathcal{L}(G, F, D_X, D_Y)$$
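As a sketch, the loss terms above can be written in a few lines of NumPy (the function and argument names are ours, and real training would compute these on network outputs; the default weight of 10.0 is a placeholder, not necessarily the value used here):

```python
import numpy as np

def adversarial_loss(d_real, d_fake):
    # Classic GAN loss: E[log D_Y(y)] + E[log(1 - D_Y(G(x)))],
    # where d_real = D_Y(y) and d_fake = D_Y(G(x)) lie in (0, 1).
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def cycle_loss(x, x_cycled, y, y_cycled):
    # L1 cycle-consistency: E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1].
    return np.mean(np.abs(x_cycled - x)) + np.mean(np.abs(y_cycled - y))

def total_loss(adv_xy, adv_yx, cyc, lam=10.0):
    # Weighted sum of the partial terms; lam is a hyperparameter.
    return adv_xy + adv_yx + lam * cyc
```

With perfect reconstructions the cycle term vanishes, e.g. `cycle_loss(x, x, y, y) == 0`.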
3.1.1 Conditioning on regions of interest
To enhance the usefulness of our model, we add another input modality into our generative model that represents regions of interest in the picture. For example, for breast cancer imaging, this modality could contain a boolean mask indicating segmented regions with “suspicious” (potentially cancerous) tissue. This also allows various invariants to be encoded into the dataset. By varying the position of the additional mask spatially, we obtain several variants of the transformed image, which together encode a spatial equivariance of cancerous tissue that might not be represented in the original dataset due to the low number of samples. The datasets we use all contain masks (of varying quality) with highlighted lesions or benign masses, of the same dimensions as the image.
To model the additional data source, we append another channel to our input image and let the model train using both the original image and the mask as both input and output. The generator now receives a “two-channel” image and produces two channels instead of one. The final loss function is a sum of our loss function applied to each channel individually. The rest of the model remains the same. The changes in the formulation of the generators and discriminators are the following (shown for $G$; $M$ denotes the domain of masks): $G: X \times M \to Y \times M$ and $D_Y: Y \times M \to [0, 1]$.
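A minimal NumPy sketch of this two-channel encoding and the per-channel loss (names are ours; the real model concatenates along the channel axis of the network tensors):

```python
import numpy as np

def to_two_channel(image, mask):
    # Append the region-of-interest mask as a second channel: H x W -> H x W x 2.
    assert image.shape == mask.shape
    return np.stack([image, mask], axis=-1)

def per_channel_l1(pred, target):
    # Final loss: the (here L1) loss applied to each channel individually, summed.
    return sum(np.mean(np.abs(pred[..., c] - target[..., c]))
               for c in range(target.shape[-1]))
```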
3.1.2 Removing checkerboard artifacts
Empirically, models with deconvolutional layers tend to exhibit “checkerboard” artifacts, especially when trained for longer amounts of time Odena et al. (2016). Therefore, in our experiments we 1) substitute each deconvolution with nearest-neighbor upsampling followed by a convolution, and 2) initialize the kernel weights using ICNR Aitken et al. (2017). Generally, deconvolution preserves more detail and produces less blurry results than upsampling followed by convolution. We also evaluated bilinear upsampling, but it empirically produced more artifacts than nearest-neighbor upsampling.
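The substitution in step 1) can be sketched as follows, for a single feature map in plain NumPy (a real model would use the framework's upsampling and convolution layers; the names here are ours):

```python
import numpy as np

def nn_upsample(x, scale=2):
    # Nearest-neighbour upsampling of an H x W feature map.
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)

def conv2d_same(x, kernel):
    # Naive 'same'-padded 2D convolution (cross-correlation), applied
    # after upsampling in place of a strided deconvolution.
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out
```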
3.2 Neural classification model
Faster R-CNN is a convolution-based network capable of classifying and localizing objects in an image. Pure classification networks (predicting a binary answer) are easier to train, and thus more commonly used for mammography images. However, we believe that localizing malignant tumors is important if the system were to be used in clinical routine, since it helps in verifying the decision. The network is based on ResNet-50, a 50-layer network with residual connections, pretrained on ImageNet Deng et al. (2009). Similarly to Ribli et al. (2018), we also changed the following parameters: we enabled the proposal network, and changed the proposal non-maximal suppression threshold to 0.5.
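For illustration, greedy non-maximal suppression with the 0.5 threshold mentioned above can be sketched as follows (box format `(x1, y1, x2, y2)` and function names are ours, not the detection framework's API):

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.5):
    # Keep the highest-scoring box, drop all boxes overlapping it by more
    # than the threshold, and repeat on the remainder.
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep
```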
4 Experiments
To validate our ideas and claims, we propose several simple experiments in the domain of breast cancer recognition from 2D mammography images.
4.1 Model implementation
The generative augmentation models for all our experiments are based on the CycleGAN Zhu et al. (2017) architecture, and are implemented in TensorFlow Abadi et al. (2016), based on the TensorFlow research CycleGAN implementation: https://github.com/tensorflow/models. More details about the architectures and training procedures are provided in Appendix A.
4.2 Datasets
There are several datasets related to breast cancer diagnosis. In most of them, one can observe the limitations that we outlined in the introduction. For our experiments, we used the following datasets: (1) BCDR Guevara Lopez et al. (2012), the Breast Cancer Digital Repository, several datasets from Portugal; (2) INbreast Moreira et al. (2012), the INbreast digital breast database, also from Portugal; samples with a BI-RADS classification greater than 3 were considered positive (cancerous), and those lower than 3 negative (healthy); and (3) CBIS–DDSM Sawyer Lee et al. (2016), the Curated Breast Imaging Subset of DDSM (Digital Database for Screening Mammography), from the USA.
For the generative model, we use BCDR-1 and BCDR-2 (merged together) for training. For the classifier, we use both BCDR datasets along with CBIS, with an 85% training and 15% evaluation split. Due to the high noise ratio in CBIS, we only used it for the classifier. We use the held-out INbreast dataset as a test dataset for both models. All images were downscaled to pixels due to hardware limitations. We also experimented with pixels, but the image quality was poorer. Table 1 shows the number of samples in the respective datasets.
4.3 Training a classifier
Our Faster R-CNN-based classifier Ribli et al. (2018) was trained to localize malignant and benign lesions. We convert the pixel masks into a set of bounding boxes by applying Otsu threshold segmentation and taking the bounding box around every disconnected region. Images with no lesions, or with lesions whose bounding box area was smaller than pixels, were discarded, as the R-CNN does not need to train on “negative” images. For each image, the model predicts a set of bounding boxes with corresponding scores and classes. For evaluation, we treat an image as positive (cancerous) if the score of any bounding box with the malignant class is higher than a chosen, constant confidence threshold.
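The mask-to-boxes step can be sketched as follows; a simple 4-connected flood fill stands in for the Otsu-based segmentation actually used, and the names and the `min_area` parameter are ours:

```python
import numpy as np

def mask_to_boxes(mask, min_area=0):
    # One bounding box (r1, c1, r2, c2), exclusive on r2/c2, per
    # 4-connected region of a binary mask; small regions are discarded.
    visited = np.zeros_like(mask, dtype=bool)
    boxes = []
    h, w = mask.shape
    for si in range(h):
        for sj in range(w):
            if mask[si, sj] and not visited[si, sj]:
                stack, pixels = [(si, sj)], []
                visited[si, sj] = True
                while stack:
                    i, j = stack.pop()
                    pixels.append((i, j))
                    for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                        if 0 <= ni < h and 0 <= nj < w and mask[ni, nj] and not visited[ni, nj]:
                            visited[ni, nj] = True
                            stack.append((ni, nj))
                rows = [p[0] for p in pixels]
                cols = [p[1] for p in pixels]
                box = (min(rows), min(cols), max(rows) + 1, max(cols) + 1)
                if (box[2] - box[0]) * (box[3] - box[1]) >= min_area:
                    boxes.append(box)
    return boxes
```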
We train the classifier on different datasets for a maximum of 100,000 steps (batch size 8) and pick the best model based on ROC AUC Bradley (1997) on the evaluation set. Based on inspection of the evaluation set loss, we empirically chose the models trained for steps (for all model variants).
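ROC AUC as used here for model selection can be computed directly from scores via its rank-statistic interpretation, i.e. the probability that a random positive sample is scored above a random negative one. A small NumPy sketch (not necessarily the implementation used in our experiments):

```python
import numpy as np

def roc_auc(labels, scores):
    # Mann-Whitney U formulation of ROC AUC: count pairwise wins of
    # positive over negative scores; ties count as half a win.
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))
```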
4.4 “Fooling” a trained classifier
As a first step, we want to see if our classifier, trained only on original images, is “fooled” by the generated images. In other words, for correctly classified images, how many times does the label change after we run the images through the generative augmentation model? We evaluate this question on all of our test data (see Section 5).
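The “fooled” rate described above can be made precise with a small sketch (array names are ours): among images the original classifier labels correctly, count how often the predicted label flips on the transformed image.

```python
import numpy as np

def fooled_fraction(true_labels, pred_orig, pred_transformed):
    # Fraction of correctly classified images whose predicted label
    # changes after the generative transformation.
    true_labels, pred_orig, pred_transformed = map(
        np.asarray, (true_labels, pred_orig, pred_transformed))
    correct = pred_orig == true_labels
    if correct.sum() == 0:
        return 0.0
    flipped = pred_transformed[correct] != pred_orig[correct]
    return flipped.mean()
```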
4.5 Improving the classifier
Secondly, we evaluate whether a classifier trained in the same way on a mixed dataset of original and GAN-augmented images performs better, in terms of both classification metrics and “being fooled”. We also compare the model to standard augmentation techniques such as image translation, rotation, and horizontal flipping. We use the same training/evaluation/testing split, but balance the training dataset by converting all the healthy images to cancerous ones and adding them to our dataset. We then balance the dataset in a similar way as in Section 4.3.
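The balancing step above can be sketched as follows (`generator` stands in for the trained healthy-to-cancerous mapping $G$; the function name is ours):

```python
def balance_with_gan(healthy, cancerous, generator):
    # Translate every healthy image to a synthetic cancerous one and
    # add it to the positive class, enlarging and balancing the dataset.
    synthetic = [generator(img) for img in healthy]
    return list(healthy), list(cancerous) + synthetic
```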
To visualize the results of our generative augmentation models, we show a random uniform selection of images augmented by our generative model from the INbreast test dataset in Figure 4 and 5 (Appendix B).
5 Results
Table 2 (columns: classifier training data, correctly classified %, fooled %, ROC AUC %, F1 score %): Fooling and improving the classifier, evaluated on the test dataset INbreast (a different patient population than the training set). GAN-augmented images are from the unconditioned GAN model because of its better image quality. Each run was repeated three times; shown are the average and the standard deviation for each value.
The first and second columns of Table 2 show that the classifier learns to be less fooled by our generative augmentation model if we augment the training set images using the same model, which confirms the intuition that this makes the classifier slightly more robust.
As shown in the first row of Table 2, the classifier performs reasonably well when trained on the original dataset and evaluated on a test split from that dataset (both in terms of ROC AUC and F1 score). The F1 score is computed using a custom bounding box proposal confidence threshold of , the same value as in Ribli et al. (2018).
When the training set images are augmented by our GAN (third row), the average ROC AUC decreases slightly, but the error margin is too large for a conclusive result. In line with Becker et al. (2018a, b) and our subjective assessment, this suggests that the GAN-generated data might be challenging for our classifier. The same conclusion applies to the experiment where we augment the training set images using traditional image augmentation techniques.
6 Discussion
Overall, our GAN training has been very prone to checkerboard and “S”-shaped artifacts, as can partially be seen in Figures 4 and 5 (Appendix B). We also experimented with both higher ( px) and lower ( px) image resolutions: the lower resolutions generally had fewer artifacts and faster training times, but a higher resolution is desirable when thinking about moving to full-field mammographic images in the future. Unfortunately, due to GPU memory limitations, the resolution could not be increased further. Our GAN models and R-CNN-based classifiers train in less than 24 hours on an NVIDIA TITAN Xp GPU.
The classifier results are inconclusive, and it is not clear whether adding our augmented images helps the classifier achieve better performance. We hypothesize that this might be due to the noise in our data, as the results of Sun et al. (2018) suggest that the overall method is sound and can improve classifier performance when applied well.
7 Future work
Possible future improvements to our work include increasing the resolution without introducing artifacts, with approaches similar to Wang et al. (2018); stabilizing the conditioned model’s training and results; and leveraging that model fully to augment the images in pre-specified places. For more detailed images, we could explore approaches similar to Self-Attention GAN Zhang et al. (2018), which attends to specific parts of the input image during output generation. This would also help in interpreting the changes made by the GAN. Unfortunately, this approach is very memory-intensive.
Traditionally, Variational Autoencoders (VAEs) Kingma and Welling (2013) lack detail in the output images, and GANs lack “truthfulness”: they may overgenerate parts of the image Sajjadi et al. (2018). As a hybrid approach, we could combine a VAE with a GAN to model both the location and the image details jointly with one model, similarly to the approaches in Liu et al. (2017); Huang et al. (2018); Andermatt et al. (2018). To simplify the model, one could also try a StarGAN-like approach Choi et al. (2018), using only one generator–discriminator pair conditioned on the class label, instead of two generators and discriminators.
8 Conclusion
In our work, we have shown that for binary image classification, there exists a simple way to potentially increase prediction accuracy via generative dataset augmentation. Leveraging the idea behind CycleGAN, we have designed a GAN that is able to translate images from one class label to the other, and used that property to augment the training dataset of a classifier into a bigger, more balanced, and less sparse dataset. We have provided a proof-of-concept implementation and shown that on the challenging, noisy example of breast cancer recognition from mammography images, we may be able to help improve the performance of classifiers. This suggests that our generative augmentation model learns a meaningful approximation of the manifolds of our class labels.
Acknowledgments
We would like to thank the Computer Vision Lab at ETH Zürich for providing us with computational resources.
References
- Abadi et al. (2016) Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16), pages 265–284, 2016. URL https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi.
- Aitken et al. (2017) Andrew Aitken, Christian Ledig, Lucas Theis, Jose Caballero, Zehan Wang, and Wenzhe Shi. Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize. arXiv preprint arXiv:1707.02937, 2017.
- Andermatt et al. (2018) Simon Andermatt, Antal Horváth, Simon Pezold, and Philippe C. Cattin. Pathology Segmentation using Distributional Differences to Images of Healthy Origin. CoRR, abs/1805.10344, 2018. URL http://arxiv.org/abs/1805.10344.
- Becker et al. (2018a) Anton S Becker, Lukas Jendele, Ondrej Skopek, Nicole Berger, Soleen Ghafoor, Magda Marcon, and Ender Konukoglu. Injecting and removing malignant features in mammography with CycleGAN: Investigation of an automated adversarial attack using neural networks. arXiv preprint arXiv:1811.07767, 2018a.
- Becker et al. (2018b) Anton S. Becker, Lukas Jendele, Ondrej Skopek, Soleen Ghafoor, Nicole Berger, Magda Marcon, and Ender Konukoglu. Generative Neural Network Inserting or Removing Cancer into Mammograms Fools Radiologists and Deep Learning Alike: Example of an Adversarial Attack. In Proceedings of the RSNA Annual Meeting, 2018b.
- Bousmalis et al. (2018) Konstantinos Bousmalis, Alex Irpan, Paul Wohlhart, Yunfei Bai, Matthew Kelcey, Mrinal Kalakrishnan, Laura Downs, Julian Ibarz, Peter Pastor, Kurt Konolige, et al. Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 4243–4250. IEEE, 2018.
- Bradley (1997) Andrew P Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern recognition, 30(7):1145–1159, 1997.
- Choi et al. (2018) Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- Deng et al. (2009) Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
- Goodfellow et al. (2014) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
- Guevara Lopez et al. (2012) Miguel Angel Guevara Lopez, Naimy González Posada, Daniel Moura, Raúl Pollán, José Franco-Valiente, César Ortega, Manuel Del Solar, Guillermo Díaz-Herrero, Isabel Pereira M A Ramos, Joana Pinheiro Loureiro, Teresa Cardoso Fernandes, and Bruno Ferreira M Araújo. BCDR: A Breast Cancer Digital Repository, 2012.
- Hahnloser et al. (2000) Richard H. R. Hahnloser, Rahul Sarpeshkar, Misha A. Mahowald, Rodney J. Douglas, and H. Sebastian Seung. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405, 2000. URL http://dx.doi.org/10.1038/35016072.
- Huang et al. (2018) Xun Huang, Ming-Yu Liu, Serge Belongie, and Jan Kautz. Multimodal Unsupervised Image-to-image Translation. In ECCV, 2018.
- Hussain et al. (2017) Zeshan Hussain, Francisco Gimenez, Darvin Yi, and Daniel Rubin. Differential Data Augmentation Techniques for Medical Imaging Classification Tasks. In AMIA Annual Symposium Proceedings, volume 2017, page 979. American Medical Informatics Association, 2017.
- Isola et al. (2017) Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-Image Translation with Conditional Adversarial Networks. In CVPR, 2017.
- Karras et al. (2017) Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
- Kingma and Ba (2015) Diederik P. Kingma and Jimmy Lei Ba. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations (ICRL), 2015. URL http://arxiv.org/abs/1412.6980.
- Kingma and Welling (2013) Diederik P Kingma and Max Welling. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- Lévy and Jain (2016) Daniel Lévy and Arzav Jain. Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks. arXiv preprint arXiv:1612.00542, 2016.
- Liu et al. (2017) Ming-Yu Liu, Thomas Breuel, and Jan Kautz. Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems, pages 700–708, 2017.
- Maas et al. (2013) Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. Rectifier Nonlinearities Improve Neural Network Acoustic Models, 2013.
- Mao et al. (2017) Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least Squares Generative Adversarial Networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, pages 2813–2821. IEEE, 2017.
- Moreira et al. (2012) Inês C Moreira, Igor Amaral, Inês Domingues, António Cardoso, Maria João Cardoso, and Jaime S Cardoso. INbreast: Toward a Full-field Digital Mammographic Database. Academic radiology, 19(2):236–248, 2012.
- Mueller et al. (2018) Franziska Mueller, Florian Bernard, Oleksandr Sotnychenko, Dushyant Mehta, Srinath Sridhar, Dan Casas, and Christian Theobalt. GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB. In Proceedings of Computer Vision and Pattern Recognition (CVPR), 2018. URL https://handtracker.mpi-inf.mpg.de/projects/GANeratedHands/.
- Nair and Hinton (2010) Vinod Nair and Geoffrey E Hinton. Rectified Linear Units improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML), pages 807–814, 2010.
- Odena et al. (2016) Augustus Odena, Vincent Dumoulin, and Chris Olah. Deconvolution and Checkerboard Artifacts. Distill, 2016. doi: 10.23915/distill.00003. URL http://distill.pub/2016/deconv-checkerboard.
- Ren et al. (2015) Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 91–99, 2015. URL https://arxiv.org/abs/1506.01497.
- Ribli et al. (2018) Dezso Ribli, Anna Horváth, Zsuzsa Unger, Péter Pollner, and István Csabai. Detecting and classifying lesions in mammograms with Deep Learning. Scientific Reports, 8(1):4165, 2018. ISSN 2045-2322. doi: 10.1038/s41598-018-22437-z. URL https://doi.org/10.1038/s41598-018-22437-z.
- Sajjadi et al. (2018) M. S. M. Sajjadi, O. Bachem, M. Lucic, O. Bousquet, and S. Gelly. Assessing Generative Models via Precision and Recall. In Workshop on Theoretical Foundations and Applications of Deep Generative Models (TADGM) at the 35th International Conference on Machine Learning (ICML), 2018.
- Sawyer Lee et al. (2016) Rebecca Sawyer Lee, Francisco Gimenez, Assaf Hoogi, and Daniel Rubin. Curated Breast Imaging Subset of DDSM. The Cancer Imaging Archive, 2016. doi: http://dx.doi.org/10.7937/K9/TCIA.2016.7O02S9CY.
- Shen (2017) Li Shen. End-to-end Training for Whole Image Breast Cancer Diagnosis using An All Convolutional Design. arXiv preprint arXiv:1708.09427, 2017.
- Shin et al. (2018) Hoo-Chang Shin, Neil A Tenenholtz, Jameson K Rogers, Christopher G Schwarz, Matthew L Senjem, Jeffrey L Gunter, Katherine P Andriole, and Mark Michalski. Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In International Workshop on Simulation and Synthesis in Medical Imaging, pages 1–11. Springer, 2018.
- Shrivastava et al. (2017) Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, and Russell Webb. Learning from Simulated and Unsupervised Images through Adversarial Training. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2242–2251, 2017.
- Sun et al. (2018) Liyan Sun, Jiexiang Wang, Xinghao Ding, Yue Huang, and John Paisley. An Adversarial Learning Approach to Medical Image Synthesis for Lesion Removal. arXiv preprint arXiv:1810.10850, 2018.
- Wang et al. (2018) Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
- Zhang et al. (2018) Han Zhang, Ian J. Goodfellow, Dimitris N. Metaxas, and Augustus Odena. Self-Attention Generative Adversarial Networks. arXiv:1805.08318, 2018.
- Zhu et al. (2017) Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.
Appendix A Model implementation
We train all our GAN models for steps, using a learning rate of for the discriminators and for the generators. The optimization is performed using Adam Kingma and Ba (2015) with a batch size of 1. All code is available on GitHub: https://github.com/BreastGAN/augmentation.
The architectures of both discriminators are the same: 4 convolutional layers with reflection padding, with 64, 128, 256, and 512 filters, and stride 2 for all layers except the last one, which has stride 1, each followed by a LeakyReLU activation function Hahnloser et al. (2000); Nair and Hinton (2010); Maas et al. (2013). All the convolutions have a kernel size of . The output is subsequently flattened to one channel using a stride-1 convolution with a sigmoid activation function.
Both generator networks consist of two convolutions with stride 2 to compress the dimensionality of the image, followed by 9 ResNet blocks (2 convolutional layers each). Lastly, the result is upsampled using two additional convolutional layers, as described in Section 3.1.2. All the generator layers use ReLU activation functions.