DCGANs for Realistic Breast Mass Augmentation in X-ray Mammography

09/04/2019 ∙ by Basel Alyafi, et al.

Early detection of breast cancer contributes greatly to curability, and it can be achieved non-invasively using mammographic images. Supervised deep learning, currently the dominant CADe tool, has played a major role in object detection in computer vision, but it has a limiting property: it needs a large amount of labelled data. This requirement is even harder to meet for medical datasets, which require costly and time-consuming annotations. Furthermore, medical datasets are usually imbalanced, a condition that often hinders classifier performance. The aim of this paper is to learn the distribution of the minority class and synthesise new samples in order to improve lesion detection in mammography. Deep Convolutional Generative Adversarial Networks (DCGANs) can efficiently generate breast masses. They are trained on subsets of increasing size drawn from one mammographic dataset and used to generate diverse and realistic breast masses. The effect of including the generated images and/or applying horizontal and vertical flipping is tested in an environment where a 1:10 imbalanced dataset of masses and normal tissue patches is classified by a fully-convolutional network. A maximum F1 score improvement of 0.09 is reported when using DCGANs together with flipping augmentation, compared with using the original images. We show that DCGANs can synthesise photo-realistic breast mass patches with considerable diversity. In this environment, appending synthetic images along with flipping outperforms the traditional augmentation method of flipping alone, and yields faster improvements as a function of the training set size.

1 Introduction

Breast cancer is the second deadliest cancer in women globally after lung cancer. In 2018, it was the most frequently diagnosed cancer in 154 countries and the leading cause of cancer death in women in 100 countries [1]. Computer-aided detection (CADe) systems, benefiting from recent advances in supervised deep learning, have become a good alternative to double reading strategies in breast cancer screening, helping to reduce false negative and false positive cases. Supervised deep learning tools, however, require large amounts of annotated data. Unfortunately, publicly available medical datasets are usually small and imbalanced (cancer vs. non-cancer) due to privacy issues and the high cost of expert annotations. Generative Adversarial Networks (GANs) [5] have shown promising results in synthesising medical images [2]. A GAN consists of a network (called the generator, G) that implicitly learns the distribution of the input data with the aid of another network (called the discriminator, D), which in turn learns to distinguish real from synthetic images. Deep Convolutional GANs (DCGANs) [6], chosen here for their training stability, are used to neutralise a wide range of non-pertinent sources of variance, provided that the dataset has enough examples. The synthetic breast mass images are then used to augment an imbalanced dataset to improve classification performance. All materials are available online for scientific use (link).

2 Description of Purpose

To use DCGANs to synthesise realistic and diverse breast masses for augmenting imbalanced datasets in a classification problem. The ultimate goal is to improve the performance of CADe systems in breast mass detection tasks.

3 Materials

The dataset used in this work is the OPTIMAM Mammography Image Database (OMI-DB) [4]. This database includes over 145,000 cases (over 2.4 million images) and comprises unprocessed and processed digital mammograms from the NHS Breast Screening Programme of the United Kingdom. A subset of this database comprising over 80,000 cases was obtained. This subset contains images from four vendors; however, only images acquired with Hologic Selenia Dimensions systems (Hologic, Inc.; Bedford, Massachusetts, USA) were used in this work. The database includes expert annotations identifying each image and any clinical observations. After histogram normalisation, a total of 2,215 mass lesion patches and 22,000 normal tissue patches of fixed size were extracted. Negative (normal tissue) patches were extracted at random locations, subject to the constraint that they did not overlap the background or any mass.
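The negative-patch rule above (random locations, rejected if they overlap the background or a mass) amounts to rejection sampling. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the background and lesion masks are hypothetical boolean 2-D lists with the same shape as the image, the patch size and helper names are illustrative, and histogram normalisation is left out.

```python
import random
from typing import List, Tuple

Mask = List[List[bool]]  # True marks pixels to avoid (background or annotated mass)


def window_overlaps(mask: Mask, top: int, left: int, size: int) -> bool:
    """Return True if any pixel of the size x size window at (top, left) is set."""
    return any(
        mask[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    )


def sample_negative_patches(
    background: Mask,
    lesions: Mask,
    size: int,
    n: int,
    rng: random.Random,
    max_tries: int = 100_000,
) -> List[Tuple[int, int]]:
    """Rejection-sample top-left corners of normal-tissue patches.

    A random candidate window is kept only if it overlaps neither the
    background mask nor the lesion mask; otherwise it is discarded.
    """
    height, width = len(background), len(background[0])
    corners: List[Tuple[int, int]] = []
    tries = 0
    while len(corners) < n and tries < max_tries:
        tries += 1
        top = rng.randrange(height - size + 1)
        left = rng.randrange(width - size + 1)
        if window_overlaps(background, top, left, size):
            continue
        if window_overlaps(lesions, top, left, size):
            continue
        corners.append((top, left))
    return corners


# Toy 40 x 40 image (hypothetical): the left 10 columns are background, plus one 10 x 10 mass.
H, W = 40, 40
background = [[c < 10 for c in range(W)] for _ in range(H)]
lesions = [[20 <= r < 30 and 25 <= c < 35 for c in range(W)] for r in range(H)]
corners = sample_negative_patches(background, lesions, size=8, n=5, rng=random.Random(0))
```

The `max_tries` cap simply prevents an endless loop on images with little valid tissue; in that case fewer than `n` corners are returned.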

4 Method

DCGAN [6] was used in this work with an additional layer in both G and D to allow the generation of higher-resolution patches. The aim of the generator is to learn a mapping from the latent space (here, a normal distribution) to the space of breast masses, such that it can transform a 200-value latent vector into a breast mass image that fools the discriminator. The loss functions used to train the DCGAN, originally proposed by Goodfellow et al. [5], were the cross entropy for D (Equation (1)) and the non-saturating loss for G (Equation (2)):

$$\mathcal{L}_D = -\mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] - \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$

$$\mathcal{L}_G = -\mathbb{E}_{z \sim p_z}\left[\log D(G(z))\right] \tag{2}$$

where $p_{\text{data}}$ is the distribution of real breast masses and $p_z = \mathcal{N}(0, I)$ is the normal distribution with zero mean and unit standard deviation. As noted by Goodfellow et al. [5], minimising Equation (2) is preferred to minimising $\mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]$ because it provides larger gradients at the beginning of training, when D easily rejects the fake samples, which makes G learn faster.
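Equations (1) and (2), and the reason for preferring the non-saturating generator loss, can be checked numerically. The sketch below is a minimal illustration, assuming D outputs probabilities in (0, 1) and using plain Python lists in place of mini-batches; the function names are hypothetical. Writing D(G(z)) = sigmoid(u) for a discriminator logit u, the saturating objective log(1 − D(G(z))) has derivative −sigmoid(u) with respect to u, which vanishes when D confidently rejects a fake (u much less than 0), whereas the non-saturating loss −log D(G(z)) has derivative sigmoid(u) − 1, which stays close to −1 in that regime.

```python
import math
from typing import Sequence


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Equation (1): -E[log D(x)] - E[log(1 - D(G(z)))], using batch means."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """Equation (2), non-saturating form: -E[log D(G(z))]."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


def saturating_grad(u: float) -> float:
    """Derivative w.r.t. the logit u of log(1 - sigmoid(u)) (the minimax G objective)."""
    return -sigmoid(u)


def non_saturating_grad(u: float) -> float:
    """Derivative w.r.t. the logit u of -log(sigmoid(u)) (Equation (2) per sample)."""
    return sigmoid(u) - 1.0


# Early in training D confidently rejects fakes, e.g. logit u = -6 (D(G(z)) is about 0.0025).
u_early = -6.0
weak = abs(saturating_grad(u_early))        # about 0.0025: almost no learning signal for G
strong = abs(non_saturating_grad(u_early))  # about 0.9975: strong learning signal for G
```

Both objectives share the same fixed point (G matching the data distribution); only the gradient magnitude in the early, easily-discriminated regime differs.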

Figure 1: (a) Training DCGAN. Dotted arrows refer to fake patches. Steps one to seven are: generate a noise batch z, forward z through G, forward the real and fake batches through D, calculate the discriminator loss $\mathcal{L}_D$, update D, calculate the generator loss $\mathcal{L}_G$, and update G. (b) The proposed framework for evaluating the DCGAN when used for data augmentation to support the minority class in an imbalanced dataset. Four strategies are investigated: ORG (real images only), GAN (real and synthetic images), Aug ORG (horizontal and vertical flipping applied to real images only), and Aug GAN (horizontal and vertical flipping applied to real and synthetic images).
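The seven training steps of Fig. 1(a), together with the one-sided label smoothing [7] used in this work, can be illustrated on a toy one-dimensional problem. This is a minimal sketch, not the authors' implementation: G and D are hypothetical scalar linear models (G(z) = a·z + b and D(x) = sigmoid(w·x + c)) with hand-derived gradients in place of convolutional networks, the latent space is one-dimensional rather than 200-dimensional, the toy "real" data are Gaussian, and the smoothed real label of 0.9 is an assumption, since the text does not give the value. Only the real targets are smoothed and the fake targets stay at 0, which is what makes the smoothing one-sided.

```python
import math
import random
from dataclasses import dataclass
from typing import List

REAL_TARGET = 0.9  # assumed smoothed label for real samples; fake target stays 0.0


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def softplus(u: float) -> float:
    """log(1 + exp(u)) without overflow."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def bce_from_logit(u: float, target: float) -> float:
    """Binary cross entropy of sigmoid(u) against target."""
    return target * softplus(-u) + (1.0 - target) * softplus(u)


@dataclass
class ToyGAN:
    a: float = 1.0  # generator scale: G(z) = a*z + b
    b: float = 0.0  # generator shift
    w: float = 0.1  # discriminator weight: D(x) = sigmoid(w*x + c)
    c: float = 0.0  # discriminator bias

    def generate(self, z: List[float]) -> List[float]:
        return [self.a * zi + self.b for zi in z]

    def logit(self, x: float) -> float:
        return self.w * x + self.c

    def d_loss(self, real: List[float], fake: List[float]) -> float:
        """Equation (1) with one-sided label smoothing on the real term only."""
        real_term = sum(bce_from_logit(self.logit(x), REAL_TARGET) for x in real) / len(real)
        fake_term = sum(bce_from_logit(self.logit(x), 0.0) for x in fake) / len(fake)
        return real_term + fake_term

    def g_loss(self, z: List[float]) -> float:
        """Equation (2): mean of -log D(G(z)), i.e. BCE against target 1."""
        return sum(bce_from_logit(self.logit(x), 1.0) for x in self.generate(z)) / len(z)

    def d_step(self, real: List[float], fake: List[float], lr: float) -> None:
        """Gradient step on d_loss; dBCE/dlogit = sigmoid(logit) - target."""
        gw = gc = 0.0
        for batch, target in ((real, REAL_TARGET), (fake, 0.0)):
            for x in batch:
                err = (sigmoid(self.logit(x)) - target) / len(batch)
                gw += err * x
                gc += err
        self.w -= lr * gw
        self.c -= lr * gc

    def g_step(self, z: List[float], lr: float) -> None:
        """Gradient step on g_loss through the (already updated) discriminator."""
        ga = gb = 0.0
        for zi in z:
            err = (sigmoid(self.logit(self.a * zi + self.b)) - 1.0) / len(z)
            ga += err * self.w * zi  # chain rule: dlogit/da = w*z
            gb += err * self.w       # chain rule: dlogit/db = w
        self.a -= lr * ga
        self.b -= lr * gb


def train_iteration(gan: ToyGAN, real_batch: List[float], rng: random.Random, lr: float = 0.05) -> None:
    z = [rng.gauss(0.0, 1.0) for _ in real_batch]  # step 1: noise batch
    fake = gan.generate(z)                          # step 2: forward z through G
    gan.d_step(real_batch, fake, lr)                # steps 3-5: D loss and D update
    gan.g_step(z, lr)                               # steps 6-7: G loss and G update


rng = random.Random(0)
gan = ToyGAN()
for _ in range(200):
    real_batch = [rng.gauss(3.0, 0.5) for _ in range(64)]  # toy stand-in for real masses
    train_iteration(gan, real_batch, rng)
```

Each call to `train_iteration` performs one D update followed by one G update on a batch of 64, mirroring the batch size in the text. With a linear discriminator the toy dynamics can oscillate, so the example illustrates the update order and the loss terms rather than a converged model.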

Fig. 1(a) shows a schematic of the DCGAN training procedure. In each training iteration, 64 random latent vectors are sampled (step one in Fig. 1(a)). This pure-noise batch is normalised and then forwarded through G to generate a batch of fake images G(z) (step two). These fake images are normalised and forwarded through D to obtain realism probabilities (step three, dashed arrows). An equal-size batch of real images is normalised and forwarded through D so that it learns the boundary between the real and fake breast mass spaces (step three, solid arrow). In step four, Equation (1) is used to calculate the discriminator loss $\mathcal{L}_D$, and the parameters of D are then updated (step five). Next, the fake batch is forwarded through D and Equation (2) is used to calculate the generator loss $\mathcal{L}_G$ (step six). Finally, backpropagation updates the parameters of G (step seven). To complete one epoch, these steps are repeated until all the real breast mass patches have been used.

As recommended by Salimans et al. [7], one-sided label smoothing was used to reduce discriminator over-confidence. In addition, conventional data augmentation (horizontal and vertical flipping) was applied to increase the diversity of the generated images. One critical issue faced during training was the checkerboard effect, in which a grid pattern appears in the synthesised images. The solution was inspired by a talk by Goodfellow [3], in which using different kernel sizes in G and D was suggested.

To evaluate the trained generator, an augmentation environment was used in which a 1:10 imbalanced dataset of masses (positive, minority class) and normal tissue (negative, majority class) was classified by a fully-convolutional network whose architecture is similar to that of the DCGAN discriminator. Fig. 1(b) shows the pipeline used to evaluate the effect of data augmentation with four different approaches. These augmentation effects on classification were also investigated for different training set sizes. For the classifier, the data were split into training, validation and test sets. For the positive class (2,215 breast masses), the training part was divided into subsets indexed by k, where k is the number of positive training images in the subset. These subsets were sampled so that each subset is contained in the next larger one. For the negative class (22,000 normal tissue patches), the training subsets were designed with an imbalance ratio of 1:10, i.e. 10k negative patches for each positive subset of size k. The four augmentation approaches investigated (see Fig. 1(b)) are:

  • ORG: the original images are used; the classifier input is the k positive images plus the 10k negative images.

  • Aug ORG: original images were augmented using random horizontal and vertical flipping.

  • GAN: the classifier training set contains real and synthetic masses as the positive class, and normal tissue patches as the negative class.

  • Aug GAN: the generated images as well as the real ones were augmented on the fly by random horizontal and vertical flipping.
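The nested positive subsets, the 1:10 negative subsets and the four training-set compositions described above can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' pipeline: patches are identified by integer IDs or represented as 2-D lists, the subset sizes are supplied by the caller, and all helper names are hypothetical. Nesting is obtained by shuffling once and taking prefixes, so each subset is contained in the next larger one.

```python
import random
from typing import Dict, List, Sequence, Tuple

Patch = List[List[float]]

# (use synthetic masses, apply random flipping) for each strategy in Fig. 1(b)
STRATEGIES: Dict[str, Tuple[bool, bool]] = {
    "ORG": (False, False),
    "Aug ORG": (False, True),
    "GAN": (True, False),
    "Aug GAN": (True, True),
}


def nested_subsets(ids: Sequence[int], sizes: Sequence[int], seed: int = 0) -> Dict[int, List[int]]:
    """Shuffle once and take prefixes, so each subset is contained in every larger one."""
    order = list(ids)
    random.Random(seed).shuffle(order)
    return {k: order[:k] for k in sizes}


def split_by_size(
    pos_ids: Sequence[int],
    neg_ids: Sequence[int],
    sizes: Sequence[int],
    ratio: int = 10,
    seed: int = 0,
) -> Dict[int, Tuple[List[int], List[int]]]:
    """Pair each positive subset of size k with a nested negative subset of size ratio * k."""
    pos = nested_subsets(pos_ids, sizes, seed)
    neg = nested_subsets(neg_ids, [ratio * k for k in sizes], seed + 1)
    return {k: (pos[k], neg[ratio * k]) for k in sizes}


def random_flip(patch: Patch, rng: random.Random) -> Patch:
    """Random horizontal and vertical flips, each applied with probability 0.5."""
    out = [row[:] for row in patch]
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]  # horizontal flip (mirror columns)
    if rng.random() < 0.5:
        out = out[::-1]  # vertical flip (reverse row order)
    return out


def positive_batch(strategy: str, real: List[Patch], synthetic: List[Patch], rng: random.Random) -> List[Patch]:
    """Positive-class training patches for one epoch under the given strategy."""
    use_synthetic, flip = STRATEGIES[strategy]
    patches = real + (synthetic if use_synthetic else [])
    return [random_flip(p, rng) for p in patches] if flip else list(patches)
```

Calling `positive_batch` once per epoch redraws the flips, which corresponds to the on-the-fly augmentation of Aug ORG and Aug GAN; the negative class would be passed through unchanged in this sketch.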

Because the dataset is imbalanced, the F1 score, which gives equal importance to precision and recall, was used as the evaluation metric. As shown in Fig. 1(b), the test and validation sets were fixed for all values of k. Three-fold cross-validation was used to obtain reliable results.
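The F1 score is the harmonic mean of precision P and recall R, F1 = 2PR/(P + R) = 2TP/(2TP + FP + FN). The short sketch below, assuming binary labels with masses as the positive class (the helper names are hypothetical), shows why it suits the 1:10 setting: a trivial classifier that labels every patch as normal tissue reaches an accuracy of 100/110 ≈ 0.91 but an F1 score of 0.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score of the positive class (1 = mass, 0 = normal tissue)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 1:10 imbalance: 10 masses and 100 normal-tissue patches.
y_true = [1] * 10 + [0] * 100
all_normal = [0] * 110  # trivial classifier that never predicts a mass
```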

5 Results and Discussion

Fig. 2(a) shows two synthetic masses (left column) and two real masses (right column), indicating that the DCGAN can generate masses visually similar to the ones it was trained on. Fig. 2(b) shows the F1 score for different training set sizes, where each line represents one augmentation approach. The blue line (ORG) shows the classification results using the original imbalanced training dataset. As more training images become available, the classifier improves until k = 750, where the performance saturates. Compared with the blue line, the green line (GAN) shows faster improvements, which suggests that the generator has learned to produce plausible samples from the real distribution that are not present in the training set, helping the classifier to distinguish masses from normal tissue. If, on the other hand, the original data are augmented using horizontal and vertical flipping (the orange line, Aug ORG), the classifier performs similarly to GAN (green) at medium sizes. Finally, the red line (Aug GAN) shows the F1 score when random online flipping is applied to the combined real and synthetic images. As the figure shows, Aug GAN outperformed all other approaches at every k except for a negligible drop at k = 1,300, with the maximum improvement of about 0.09 over ORG at k = 250.

Figure 2: (a) Synthetic (left column) and real (right column) breast mass lesions. (b) F1 score as a function of the size of the positive training set for the four approaches: ORG, GAN, Aug ORG, and Aug GAN.

6 Breakthrough work

The main contribution of this work is the development of a DCGAN-based model to generate synthetic breast masses. Additionally, we investigated the performance of a fully-convolutional network classifier on an imbalanced mammography image dataset when the training dataset was enriched with synthetic images, with flipping augmentation, or with a combination of both. To the best of our knowledge, this is the first analysis of its kind at this scale. This work has not been submitted elsewhere and is not under consideration elsewhere.

7 Conclusions

In this study, we used a modified version of DCGAN to generate realistic fixed-size breast mass patches. These synthetic images were used to increase the size of the training dataset of a breast mass classifier, and this was compared with conventional augmentation, i.e. flipping. The F1 score results suggest that classifiers trained on fewer than 750 cases can benefit greatly from the synthetic images, whereas conventional data augmentation strategies are sufficient for larger datasets.

References

  • [1] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal (2018) Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians 68 (6), pp. 394–424. https://onlinelibrary.wiley.com/doi/abs/10.3322/caac.21492
  • [2] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan (2018) GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321, pp. 321–331.
  • [3] I. Goodfellow (2016-12) NIPS 2016 Tutorial: Generative Adversarial Networks. Barcelona. arXiv:1701.00160. https://channel9.msdn.com/Events/Neural-Information-Processing-Systems-Conference/Neural-Information-Processing-Systems-Conference-NIPS-2016/Generative-Adversarial-Networks
  • [4] M. D. Halling-Brown, P. T. Looney, M. N. Patel, L. M. Warren, A. Mackenzie, and K. C. Young (2014-03) The oncology medical image database (OMI-DB). In Proc. SPIE 9039, Medical Imaging 2014: PACS and Imaging Informatics: Next Generation and Innovations, Vol. 9039, pp. 903906–1.
  • [5] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27, pp. 2672–2680.
  • [6] A. Radford, L. Metz, and S. Chintala (2016) Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In 2016 International Conference on Learning Representations (ICLR).
  • [7] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved Techniques for Training GANs. In Advances in Neural Information Processing Systems 29, pp. 2234–2242.