I Introduction
Publicly available biomedical image datasets often contain an insufficient number of images to train a deep learning model [1]. To augment these datasets, Generative Adversarial Networks (GANs) are used to produce synthetic images [2]. A GAN consists of two models: the generator, which produces synthetic images, and the discriminator, which distinguishes synthetic images from real images. While training a GAN, the generator takes random noise as input and learns the distribution of real images from feedback provided by the discriminator. The discriminator classifies the generated images as real or synthetic and back-propagates gradient feedback to the generator. Based on this feedback, the generator updates its learned feature distributions and endeavors to generate improved synthetic images.

In biomedical image analysis, diverse features are important for training a classifier to learn the region of interest and improve prediction results. Diverse features are more significant in biomedical images than in natural images because biomedical images contain vital information about the disease being classified [3]. Thus, for the biomedical image domain, a GAN must generate diverse images that are representative of the real images.
A significant barrier to the generation of diverse images is the mode collapse problem. Mode collapse can occur in one of two forms: intra-class mode collapse, where a GAN generates identical synthetic images from distinct input images within a single class, and inter-class mode collapse, whereby a GAN generates identical synthetic images from distinct input images across all classes. The generator in a GAN can find it difficult to capture every feature from diverse input images when generating synthetic ones [4]. Both variants of mode collapse degrade the diversity of the images a GAN generates and, subsequently, the performance of machine learning models trained on those less diversified images [5].
This work proposes the adaptive input-image normalization (AIIN) technique as a means of enabling the DCGAN to generate more diversified synthetic X-ray images. AIIN is a pre-processing technique for input images that enhances the prominence of desirable input features using a contrast-based computer-vision technique. These features include the shape and texture of body parts in a biomedical image; in chest X-ray images, they include the spine, heart, and lungs, with visual signatures such as the ribs, the aortic arch, and the distinct curvature of the lower lungs. Using normalized X-ray images, the discriminator learns the input image features more accurately and provides constructive gradient feedback to the generator. Consequently, the generator is forced to produce more diversified images.
The contribution of this work is an empirical evaluation of the efficacy of AIIN for the DCGAN architecture as a means of generating more diversified X-ray images. Three key parameters are considered: window size, contrast threshold, and batch size. AIIN is also compared with other image preprocessing techniques, namely median and Gaussian filtering. The occurrence of the intra-class mode collapse problem is evaluated with the Multi-scale Structural Similarity Index Measure (MS-SSIM), and the intra-class diversity of generated images is evaluated with the Fréchet Inception Distance (FID) score.
II Methodology
II-A Dataset
In this work, the publicly available dataset published by Kermany et al. [6] is utilized. The dataset contains 1340 healthy and 3875 Pneumonia chest X-ray images for training, and 624 X-ray images (234 healthy and 390 Pneumonia) for testing. The dataset is imbalanced, with the healthy chest X-ray images forming the minority class and the Pneumonia chest X-ray images the majority class. Augmentation of the healthy chest X-ray images is therefore required to balance the dataset. Images were resized to 128x128 as detailed in [7].
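A minimal loading sketch is given below; the `NORMAL`/`PNEUMONIA` folder layout and `.jpeg` extension follow the dataset's published distribution, and the paths are illustrative rather than taken from the paper.

```python
import glob
import cv2
import numpy as np

def load_class(folder, size=(128, 128)):
    """Load every X-ray in a class folder as a grayscale 128x128 array."""
    images = []
    for path in glob.glob(f"{folder}/*.jpeg"):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        images.append(cv2.resize(img, size))
    return np.stack(images)

healthy = load_class("chest_xray/train/NORMAL")       # 1340 images (minority)
pneumonia = load_class("chest_xray/train/PNEUMONIA")  # 3875 images (majority)
```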
II-B Data Preprocessing
The images are pre-processed using AIIN before being passed to the DCGAN. As part of the normalization process, contrast-limited adaptive histogram equalization is used to normalize the images [8]. Contrast, one of the morphological features, is normalized to help highlight the diverse features of the chest X-ray images. Normalized images are visually inspected to find a suitable combination of window size and contrast threshold.
In this work, window sizes of 4x4, 8x8, and 16x16 are adopted based on visual inspection of image features. A window size of 32x32 was also considered, but degradation in image quality and loss of features warranted its exclusion. A series of contrast threshold values (0, 5, 10, 20, 50) was selected for normalizing the X-ray images.
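As an illustration, the sketch below applies contrast-limited adaptive histogram equalization [8] with OpenCV over the evaluated parameter grid. This is a minimal approximation of AIIN, assuming the paper's window size maps to OpenCV's `tileGridSize` and its contrast threshold to `clipLimit`; the file name is illustrative.

```python
import cv2

def adaptive_normalize(img, window_size=8, contrast_threshold=50):
    """Contrast-limited adaptive histogram equalization over local windows."""
    # tileGridSize sets the local window grid; clipLimit caps the contrast
    # amplification (a clipLimit of 0 effectively disables clipping in OpenCV).
    clahe = cv2.createCLAHE(clipLimit=float(contrast_threshold),
                            tileGridSize=(window_size, window_size))
    return clahe.apply(img)  # expects a single-channel 8-bit image

# Sweep the window sizes and contrast thresholds evaluated in this work.
xray = cv2.imread("healthy_xray.jpeg", cv2.IMREAD_GRAYSCALE)
normalized = {(ws, ct): adaptive_normalize(xray, ws, ct)
              for ws in (4, 8, 16)
              for ct in (0, 5, 10, 20, 50)}
```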
To compare the performance of the AIIN technique with alternative image preprocessing approaches, median and Gaussian filtering are used. In Gaussian filtering, the window size defines a discrete approximation of a Gaussian distribution, and the central pixel value of each window is replaced by a weighted average of the neighboring pixels to remove noise from the image. In median filtering, the central pixel value of each window is replaced by the median value of that window. In this work, window sizes of 3x3 and 9x9 were used to normalize the X-ray images as detailed in [9]. A window size of 15x15 was also considered but excluded due to the loss of visual information of image features.
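These baseline filters can be reproduced with standard OpenCV calls; a brief sketch with the kernel sizes used above (image variable as in the previous snippet):

```python
import cv2

xray = cv2.imread("healthy_xray.jpeg", cv2.IMREAD_GRAYSCALE)

# Gaussian filtering: each pixel becomes a Gaussian-weighted average of its
# neighborhood; sigma is derived from the kernel size when passed as 0.
gauss_3x3 = cv2.GaussianBlur(xray, (3, 3), 0)
gauss_9x9 = cv2.GaussianBlur(xray, (9, 9), 0)

# Median filtering: each pixel becomes the median of its window.
median_3x3 = cv2.medianBlur(xray, 3)
median_9x9 = cv2.medianBlur(xray, 9)
```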
II-C DCGAN Architecture
The architecture of the DCGAN is depicted in Fig. 1. The DCGAN has been reimplemented as detailed in [10] and further fine-tuned as detailed in [11]. The DCGAN uses a Gaussian latent random input of size 100, the ADAM optimizer, and separate real and fake batches for training, with binary cross-entropy as the loss function. The DCGAN is trained for 500 epochs to enable convergence of both models in the GAN [7]. The DCGAN is trained on the images for each permutation of window size, contrast threshold, and training batch size. A suitable batch size is one that can utilize all of the images available in the minority class. As such, three batch sizes that are factors of the available training data for healthy chest X-ray images (20, 67, and 134) were selected to evaluate the DCGAN.
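The configuration above can be summarized in a condensed PyTorch sketch. The 100-dimensional Gaussian latent input, ADAM optimizer, binary cross-entropy loss, and separate real/fake discriminator batches follow the text; the channel widths and learning rate are assumptions in the spirit of [10] and [11], not the paper's exact implementation.

```python
import torch
import torch.nn as nn

LATENT_DIM = 100  # Gaussian latent input size used in this work

# Generator: 100-d noise -> 1x128x128 image (channel widths are assumptions).
netG = nn.Sequential(
    nn.ConvTranspose2d(LATENT_DIM, 512, 4, 1, 0, bias=False),
    nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
    nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
    nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 32, 4, 2, 1, bias=False),
    nn.BatchNorm2d(32), nn.ReLU(True),
    nn.ConvTranspose2d(32, 1, 4, 2, 1, bias=False), nn.Tanh(),
)

# Discriminator: 1x128x128 image -> probability of being real.
netD = nn.Sequential(
    nn.Conv2d(1, 32, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, True),
    nn.Conv2d(32, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64), nn.LeakyReLU(0.2, True),
    nn.Conv2d(64, 128, 4, 2, 1, bias=False),
    nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
    nn.Conv2d(128, 256, 4, 2, 1, bias=False),
    nn.BatchNorm2d(256), nn.LeakyReLU(0.2, True),
    nn.Conv2d(256, 512, 4, 2, 1, bias=False),
    nn.BatchNorm2d(512), nn.LeakyReLU(0.2, True),
    nn.Conv2d(512, 1, 4, 1, 0, bias=False), nn.Sigmoid(), nn.Flatten(),
)

criterion = nn.BCELoss()
optD = torch.optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))
optG = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(real):  # real: (B, 1, 128, 128) scaled to [-1, 1]
    b = real.size(0)
    fake = netG(torch.randn(b, LATENT_DIM, 1, 1))
    # Discriminator update: separate real and fake batches, per [11].
    optD.zero_grad()
    loss_d = (criterion(netD(real), torch.ones(b, 1)) +
              criterion(netD(fake.detach()), torch.zeros(b, 1)))
    loss_d.backward()
    optD.step()
    # Generator update: gradient feedback through the discriminator.
    optG.zero_grad()
    loss_g = criterion(netD(fake), torch.ones(b, 1))
    loss_g.backward()
    optG.step()
```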


II-D Identification of the Mode Collapse and Diversity of Synthetic Images
The MS-SSIM scores of real and generated images are analyzed to identify the occurrence of mode collapse, and the FID score is used to quantify the diversity of synthetic images generated by the DCGAN. Together, these metrics enable an evaluation of the DCGAN's capacity to generate images with a diverse set of features [12], [13].
II-D1 Intra-class Mode Collapse Problem
Odena et al. [12] first investigated the use of MS-SSIM to measure the intra-class diversity of generated imagery and to assess the occurrence of intra-class mode collapse in GANs. The similarity between two images is computed from image pixels and structures. MS-SSIM scores are measured between randomly selected pairs of real-to-real images and pairs of synthetic-to-synthetic images separately, with the cumulative mean score being reported; the score lies between 0 and 1. In this work, 670 randomly selected image pairs are used to measure the MS-SSIM score. A higher score for synthetic images than for real images is indicative of mode collapse occurring; synthetic images should possess a similar or lower MS-SSIM score than real images.
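A sketch of this pairing protocol is shown below, assuming the third-party `pytorch_msssim` package (the paper does not name its MS-SSIM implementation); the tensor layout and fixed seed are illustrative.

```python
import torch
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim

def mean_msssim(images, n_pairs=670, seed=0):
    """Mean MS-SSIM over randomly selected pairs drawn from `images`,
    a tensor of shape (N, 1, H, W) scaled to [0, 1]."""
    gen = torch.Generator().manual_seed(seed)
    scores = []
    for _ in range(n_pairs):
        i, j = torch.randint(0, images.size(0), (2,), generator=gen).tolist()
        # win_size=7 keeps all five MS-SSIM scales valid for 128x128 inputs.
        scores.append(ms_ssim(images[i:i + 1], images[j:j + 1],
                              data_range=1.0, win_size=7))
    return torch.stack(scores).mean().item()

# Mode collapse is indicated when synthetic pairs score above real pairs:
# collapsed = mean_msssim(synthetic_images) > mean_msssim(real_images)
```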
MS-SSIM considers luminance and contrast estimations for its metric score. It is computed over $M$ scales using luminance ($l$), contrast ($c$), and structure ($s$) components, as defined in Eq. (1) [14]:

$$\mathrm{MS\text{-}SSIM}(x, y) = [l_M(x, y)]^{\alpha_M} \prod_{j=1}^{M} [c_j(x, y)]^{\beta_j}\,[s_j(x, y)]^{\gamma_j} \tag{1}$$

where $\alpha_M$, $\beta_j$, and $\gamma_j$ weight the relative importance of each component at scale $j$.
II-D2 Intra-class Diversity
The intra-class diversity of generated images is assessed using FID. FID evaluates the distance between synthetic images and real images using feature activations [13]. In this work, FID is measured using a sample size of 1340 images selected separately from the real and synthetic images, with a score ranging from 0.0 to $\infty$. A lower FID score indicates a higher degree of diversity of synthetic images relative to real images.
$$\mathrm{FID} = \lVert \mu_r - \mu_s \rVert^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_s - 2\,(\Sigma_r \Sigma_s)^{1/2}\right) \tag{2}$$

In Eq. (2), the subscripts $r$ and $s$ denote real and synthetic images, while $\mu$ and $\Sigma$ denote the mean and covariance of their feature activations.
Because FID uses the last pooling layer of the Inception V3 model, which contains a 2048-dimensional feature vector, it requires 2048 or more image samples as input. As only 1340 healthy chest X-ray images are available, the pre-aux classifier layer containing 768-dimensional features is used instead of the 2048-dimensional features.
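Given 768-dimensional activations from that layer, the FID of Eq. (2) can be computed directly; a minimal sketch (feature extraction omitted, array shapes assumed):

```python
import numpy as np
from scipy import linalg

def fid_score(real_feats, synth_feats):
    """Eq. (2): Frechet distance between Gaussians fitted to feature
    activations, given as arrays of shape (n_samples, 768)."""
    mu_r, mu_s = real_feats.mean(axis=0), synth_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_s = np.cov(synth_feats, rowvar=False)
    covmean = linalg.sqrtm(sigma_r @ sigma_s)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary numerical residue
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(sigma_r + sigma_s - 2.0 * covmean))
```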
II-E Assessing the Utility of Synthetic Chest X-ray Images
To assess the utility of synthetic X-ray images, a sequential CNN [15] is implemented for the classification of healthy X-ray images. The intent is to augment the minority class with synthetic X-ray images of varying degrees of similarity and diversity, which enables an evaluation of the efficacy of AIIN for the DCGAN in augmenting chest X-ray images. Note that in [15] the CNN was used to classify Pneumonia X-ray images, with the results assessed for Pneumonia representing the positive labels.
Synthetic images were rescaled to 150x150 with the OpenCV library, and the CNN model was reimplemented as detailed in [15]. The CNN model is trained on the dataset with 13 GAN-based augmentation variants; 1340 synthetic images are generated for each variant and used to address the data imbalance problem. Variants were selected based on the most promising MS-SSIM and FID scores. Geometric transformations, namely rotation by 15° and shear and zoom ranges of 0.2, were also used (see the sketch below). Classification scores are compared with those obtained using un-normalized generated healthy chest X-ray images, as detailed in Table I.
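A sketch of this setup, assuming the Keras `ImageDataGenerator` API for the geometric transformations (`synthetic_image` is an illustrative placeholder for one generated X-ray array):

```python
import cv2
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale a 128x128 synthetic DCGAN output to the CNN's 150x150 input size.
resized = cv2.resize(synthetic_image, (150, 150))

# Geometric transformations applied during classifier training.
datagen = ImageDataGenerator(rotation_range=15,  # rotation of 15 degrees
                             shear_range=0.2,    # shear range of 0.2
                             zoom_range=0.2)     # zoom range of 0.2
```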
Table I: Classification scores of the CNN trained with traditional and GAN-based augmentation variants.

| Classifier (BS=10) | Traditional Aug | GAN Aug | GAN-training BS | WS | CT | MS-SSIM | FID | Accuracy (%) | Precision | Recall | Specificity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CNN [15] | ✓ | x | x | x | x | x | x | 94.39 | 0.92 | 0.99 | 0.86 |
| CNN (Re-implemented) | ✓ | x | x | x | x | x | x | 91.20 | 0.88 | 0.99 | 0.78 |
| CNN | ✓ | Un-norm | 20 | x | x | +0.473 | 1.903 | 87.5 | 0.84 | 0.98 | 0.69 |
| CNN | ✓ | Un-norm | 67 | x | x | +0.054 | 1.096 | 87.20 | 0.84 | 0.98 | 0.69 |
| CNN | ✓ | Un-norm | 134 | x | x | +0.029 | 0.687 | 88.94 | 0.86 | 0.98 | 0.74 |
| CNN | ✓ | Norm | 134 | 4x4 | 20 | +0.013 | 0.580 | 87.6 | 0.85 | 0.98 | 0.71 |
| CNN | ✓ | Norm | 134 | 8x8 | 50 | +0.025 | 0.430 | 91.50 | 0.89 | 0.99 | 0.79 |
| CNN | ✓ | Norm | 67 | 16x16 | 10 | +0.031 | 0.444 | 87.98 | 0.85 | 0.98 | 0.71 |
| CNN | ✓ | Norm | 134 | 4x4 | 0 | +0.036 | 0.540 | 91.50 | 0.89 | 0.99 | 0.79 |
| CNN | ✓ | Norm | 67 | 8x8 | 10 | +0.035 | 0.425 | 90.06 | 0.87 | 0.99 | 0.74 |
| CNN | ✓ | Norm | 67 | 16x16 | 20 | +0.058 | 0.362 | 84.13 | 0.82 | 0.96 | 0.65 |
| CNN | ✓ | Gaussian | 134 | 3x3 | x | +0.01 | 0.426 | 85.57 | 0.82 | 0.99 | 0.63 |
| CNN | ✓ | Gaussian | 134 | 9x9 | x | +0.0008 | 0.547 | 90.38 | 0.88 | 0.99 | 0.76 |
| CNN | ✓ | Median | 134 | 3x3 | x | +0.015 | 0.557 | 88.62 | 0.85 | 0.99 | 0.72 |
| CNN | ✓ | Median | 134 | 9x9 | x | +0.001 | 0.550 | 87.50 | 0.84 | 0.99 | 0.69 |

Aug: Augmentation; Un-norm: Un-normalized data; Norm: Normalized data; CT: Contrast Threshold; BS: Batch Size; WS: Window Size
III Results and Discussion
The MS-SSIM and FID scores of the AIIN-, Gaussian-, and median-normalized generated X-ray images, together with those of the un-normalized generated X-ray images, are depicted in Fig. 2 and Fig. 3.
Intra-class mode collapse is identified by the higher MS-SSIM score of un-normalized synthetic X-ray images relative to real images. AIIN alleviates the mode collapse by improving the capacity of the DCGAN to generate diversified normalized X-ray images, as shown by the improved MS-SSIM scores. Parameters such as window size, contrast threshold, and batch size have a significant impact on the generation of diversified synthetic images, as indicated by the varying MS-SSIM scores. The Gaussian and median filtering approaches achieve seemingly better MS-SSIM scores for normalized synthetic images than AIIN; however, these approaches suppress noise by blurring the edges of an image, which reduces the structural information of features and thereby artificially improves the MS-SSIM score. Therefore, these filtering approaches offer no advantage to the DCGAN in alleviating the mode collapse problem.
FID analysis indicates the efficacy of AIIN in improving the intra-class diversity of synthetic normalized X-ray images, as depicted in Fig. 2. The results show that window size, contrast threshold, and batch size have a considerable impact on the diversity of synthetic X-ray images, as indicated by the varying FID scores. AIIN achieves comparatively better FID scores than the Gaussian and median filtering techniques, as depicted in Fig. 3.
Several GAN-based variants have been implemented to augment the healthy X-ray images, as reported in Table I. Because the CNN is a classifier of Pneumonia X-ray images, predictions for the minority class of healthy X-ray images represent negative labels; the results are therefore evaluated by the specificity score. The best specificity score is achieved at batch size 134 using window sizes of 4x4 and 8x8 with contrast thresholds of 0 and 50, respectively. This improvement demonstrates the capacity of normalized synthetic images to augment the healthy X-ray images, outperforming the un-normalized approaches. The CNN has known issues such as randomization of features and overfitting. In this case, the CNN focuses on classifying Pneumonia and learning the features of the lung segments to differentiate healthy from Pneumonia chest X-ray images. The lung segment is highlighted more by the 4x4 and 8x8 window sizes than by the alternate permutations of the experiment, and these settings achieve good classification measures. In comparison with the Gaussian and median filtering approaches, AIIN has the advantage that it does not degrade the structural information of features in images, and it achieves better classification scores, as presented in Table I.
IV Conclusion
In this work, the AIIN technique is proposed for the DCGAN to improve the diversity of generated chest X-ray images. Results show that the DCGAN with AIIN can generate more diversified X-ray images than the DCGAN without AIIN (indicated by better MS-SSIM and FID scores) while alleviating the mode collapse problem. AIIN also outperformed the Gaussian and median filtering preprocessing techniques by preserving diverse image features. Furthermore, the efficacy of the proposed approach is verified by using the augmented images (generated X-ray images combined with real images) to train machine learning classifiers.
References
- [1] M. Mostapha and M. Styner, “Role of deep learning in infant brain MRI analysis,” Magnetic resonance imaging, vol. 64, pp. 171–189, 2019.
- [2] Z. Wang, Q. She, and T. E. Ward, “Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy,” ACM Computing Surveys (CSUR), vol. 54, no. 2, pp. 1–38, 2021.
- [3] A. S. Lundervold and A. Lundervold, “An overview of deep learning in medical imaging focusing on MRI,” Zeitschrift für Medizinische Physik, vol. 29, no. 2, pp. 102–127, 2019.
- [4] A. Torfi, M. Beyki, and E. A. Fox, “On the Evaluation of Generative Adversarial Networks By Discriminative Models,” in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 991–998.
- [5] Z. Gong, P. Zhong, and W. Hu, “Diversity in machine learning,” IEEE Access, vol. 7, pp. 64 323–64 350, 2019.
- [6] D. S. Kermany, M. Goldbaum, W. Cai, C. C. S. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan et al., “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell, vol. 172, no. 5, pp. 1122–1131, 2018.
- [7] S. Kora Venu and S. Ravula, “Evaluation of Deep Convolutional Generative Adversarial Networks for Data Augmentation of Chest X-ray Images,” Future Internet, vol. 13, no. 1, p. 8, 2021.
- [8] K. Zuiderveld, “Contrast limited adaptive histogram equalization,” Graphics gems, pp. 474–485, 1994.
- [9] A. Shah, J. I. Bangash, A. W. Khan, I. Ahmed, A. Khan, A. Khan, and A. Khan, “Comparative analysis of median filter and its variants for removal of impulse noise from gray scale images,” Journal of King Saud University-Computer and Information Sciences, 2020.
- [10] A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” arXiv preprint arXiv:1511.06434, 2015.
- [11] Soumith Chintala. (2021) How to Train a GAN? Tips and tricks to make GANs work. [Accessed on May. 05, 2021]. [Online]. Available: https://github.com/soumith/ganhacks#how-to-train-a-gan-tips-and-tricks-to-make-gans-work
- [12] A. Odena, C. Olah, and J. Shlens, “Conditional Image Synthesis with Auxiliary Classifier GANs,” in International conference on machine learning. PMLR, 2017, pp. 2642–2651.
- [13] T. Miyato and M. Koyama, “cGANs with Projection Discriminator,” in International Conference on Learning Representations, 2018. [Online]. Available: https://openreview.net/forum?id=ByS1VpgRZ
- [14] A. Borji, “Pros and cons of GAN evaluation measures,” Computer Vision and Image Understanding, vol. 179, pp. 41–65, 2019.
- [15] R. Siddiqi, “Automated pneumonia diagnosis using a customized sequential convolutional neural network,” in Proceedings of the 2019 3rd International Conference on Deep Learning Technologies, 2019, pp. 64–70.