Addressing the Intra-class Mode Collapse Problem using Adaptive Input Image Normalization in GAN-based X-ray Images

by Muhammad Muneeb Saad, et al.

Biomedical image datasets can be imbalanced due to the rarity of targeted diseases. Generative Adversarial Networks (GANs) play a key role in addressing this imbalance by enabling the generation of synthetic images to augment and balance datasets. It is important to generate synthetic images that incorporate a diverse range of features, so that they accurately represent the distribution of features present in the training imagery. Furthermore, the absence of diverse features in synthetic images can degrade the performance of machine learning classifiers. The mode collapse problem can impair a GAN's capacity to generate diverse images. Mode collapse comes in two varieties: intra-class and inter-class. In this paper, the intra-class mode collapse problem is investigated, and its subsequent impact on the diversity of synthetic X-ray images is evaluated. This work contributes an empirical demonstration of the benefits of integrating adaptive input-image normalization (AIIN) with the Deep Convolutional GAN (DCGAN) to alleviate the intra-class mode collapse problem. Results demonstrate that the DCGAN with AIIN outperforms the DCGAN trained on un-normalized X-ray images, as evidenced by superior diversity scores.




I Introduction

Publicly available biomedical image datasets often contain an insufficient number of images to train a deep learning model [1]. To augment these datasets, Generative Adversarial Networks (GANs) are used to produce synthetic images [2]. A GAN consists of two models: the generator, which produces synthetic images, and the discriminator, which distinguishes synthetic images from real images. During training, the generator takes random noise as input and learns the distribution of real images from feedback provided by the discriminator. The discriminator classifies its inputs as real or synthetic and back-propagates gradient feedback to the generator. The generator uses this gradient feedback to update its learned feature distributions and to generate improved synthetic images.
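The adversarial feedback described above is commonly implemented with binary cross-entropy (BCE), which is also the loss used for the DCGAN in this work. The NumPy sketch below is a minimal illustration, assuming the discriminator outputs probabilities in (0, 1); the function names are hypothetical and the networks themselves are omitted. The generator loss shown is the common non-saturating form, an assumption rather than a detail stated in the source.

```python
import numpy as np

def bce(probs, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a constant label."""
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def discriminator_loss(d_real, d_fake):
    # Real batch is labelled 1 and fake batch 0; the two batches are scored separately.
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    # Non-saturating objective: the generator wants its samples to be labelled real.
    return bce(d_fake, 1.0)

d_real = np.array([0.9, 0.8, 0.95])  # discriminator scores on a real batch
d_fake = np.array([0.2, 0.1, 0.3])   # discriminator scores on a fake batch
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

The gradient of these losses with respect to the generator's parameters is the "feedback" the discriminator provides.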

In biomedical image analysis, diverse features are important for training a classifier to learn the region of interest and achieve better prediction results. Feature diversity matters more for biomedical images than for natural images because biomedical images contain vital information about the disease being classified [3]. Thus, in the biomedical image domain, a GAN must generate diverse images that are representative of the real images.

A significant barrier to the generation of diverse images is the mode collapse problem. Mode collapse can occur in one of two forms: intra-class mode collapse, where a GAN generates identical synthetic images from distinct input images of a single class, and inter-class mode collapse, where a GAN generates identical synthetic images from distinct input images across all classes. The generator in a GAN can find it difficult to capture every feature of diverse input images when generating synthetic ones [4]. Both variants of mode collapse reduce the diversity of the images a GAN generates and, subsequently, degrade the performance of machine learning models trained on these less diverse images [5].

This work proposes the adaptive input-image normalization (AIIN) technique as a means of enabling the DCGAN to generate more diverse synthetic X-ray images. AIIN is a pre-processing technique for input images that enhances the prominence of desirable input features using a contrast-based computer-vision technique. These features include the shape and texture of body parts in a biomedical image. In X-ray images, they include the spine, heart, and lungs, with visual signatures such as the ribs, the aortic arch, and the distinct curvature of the lower lungs. When trained on normalized X-ray images, the discriminator learns the input image features more accurately and provides constructive gradient feedback to the generator. Consequently, the generator is forced to produce more diverse images.

The contribution of this work is an empirical evaluation of the efficacy of AIIN for the DCGAN architecture as a means of generating more diverse X-ray images. The key parameters considered are window size, contrast threshold, and batch size. AIIN is also compared with other image pre-processing techniques, namely median and Gaussian filtering. The occurrence of the intra-class mode collapse problem is evaluated using the Multi-scale Structural Similarity Index Measure (MS-SSIM), and the intra-class diversity of generated images is evaluated using the Fréchet Inception Distance (FID).

II Methodology

II-A Dataset

In this work, the publicly available dataset published by Kermany et al. [6] is utilized. The dataset contains 1340 healthy and 3875 Pneumonia chest X-ray images for training purposes. There are 624 X-ray (234 healthy and 390 Pneumonia) images available for testing purposes. The dataset is imbalanced with the healthy chest X-ray images being the minority class and the Pneumonia chest X-ray images being the majority class. The augmentation of healthy chest X-ray images is required to balance the dataset. Images were resized to 128x128 as detailed in [7].

II-B Data Preprocessing

The images are pre-processed using AIIN before being used to train the DCGAN. As part of the normalization process, contrast-based histogram equalization, following the contrast-limited adaptive histogram equalization approach of [8], is used to normalize the images. Contrast, one of the morphological features, is normalized to help highlight the diverse features of the chest X-ray images. The normalized images are visually inspected to find a suitable combination of window size and contrast threshold.

In this work, window sizes of 4x4, 8x8, and 16x16 are adopted based on visual inspection of image features. A window size of 32x32 was also considered, but the resulting degradation in image quality and loss of features warranted its exclusion. A series of contrast threshold values (0, 5, 10, 20, and 50) was selected for normalizing the X-ray images.
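The role of the two AIIN parameters can be illustrated with a simplified, tile-wise contrast-limited histogram equalization in NumPy. This is a sketch under stated assumptions, not the authors' implementation: `grid` is assumed to play the role of the window size (tiles per side, as in OpenCV's `tileGridSize`), `contrast_threshold` is assumed to be a relative clip limit scaled the way OpenCV scales `clipLimit`, and the bilinear interpolation between neighboring tiles that full CLAHE [8] performs is omitted. The function names are hypothetical.

```python
import numpy as np

def clip_histogram(hist, clip_limit):
    """Clip bins at clip_limit and spread the clipped excess uniformly over all bins."""
    excess = np.maximum(hist - clip_limit, 0.0).sum()
    return np.minimum(hist, clip_limit) + excess / hist.size

def tilewise_clahe(img, grid=8, contrast_threshold=2.0):
    """Simplified contrast-limited histogram equalization, applied tile by tile.

    img: 2-D uint8 array whose height and width are divisible by `grid`.
    grid: tiles per side (assumed reading of the paper's "window size").
    contrast_threshold: relative clip limit (assumed reading); <= 0 disables clipping.
    Unlike full CLAHE, tile mappings are not interpolated, so tile borders
    may show block artifacts.
    """
    h, w = img.shape
    if h % grid or w % grid:
        raise ValueError("image sides must be divisible by grid")
    th, tw = h // grid, w // grid
    out = np.empty_like(img)
    for r in range(grid):
        for c in range(grid):
            tile = img[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            hist = np.bincount(tile.ravel(), minlength=256).astype(np.float64)
            if contrast_threshold > 0:
                # Scale the relative threshold to a per-bin count (OpenCV-style; assumption).
                hist = clip_histogram(hist, max(contrast_threshold * tile.size / 256.0, 1.0))
            cdf = np.cumsum(hist)
            lut = np.round(255.0 * cdf / cdf[-1]).astype(np.uint8)  # equalizing lookup table
            out[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = lut[tile]
    return out

rng = np.random.default_rng(0)
xray = rng.integers(60, 120, size=(128, 128), dtype=np.uint8)  # low-contrast stand-in image
for grid in (4, 8, 16):
    for ct in (0, 5, 10, 20, 50):
        normalized = tilewise_clahe(xray, grid=grid, contrast_threshold=ct)
```

A lower threshold clips the histogram more aggressively and limits how far contrast is stretched, while a threshold of 0 reduces to plain per-tile equalization. For production use, OpenCV's `cv2.createCLAHE(clipLimit=..., tileGridSize=...)` provides the full interpolated algorithm.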

To compare the performance of the AIIN technique with alternative image pre-processing approaches, median and Gaussian filtering are used. In Gaussian filtering, the window size defines the extent of a discrete approximation of the Gaussian distribution, and the central pixel of each window is replaced by the Gaussian-weighted average of the neighboring pixels to remove noise from the image. In median filtering, the central pixel of each window is replaced by the median value of that window. In this work, window sizes of 3x3 and 9x9 were used to filter the X-ray images as detailed in [9]. A window size of 15x15 was also considered but excluded due to the loss of visual information about image features.
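For comparison, the two baseline filters can be sketched in NumPy. The source does not state the Gaussian standard deviation, so the sketch assumes OpenCV's default rule for deriving sigma from the kernel size; reflect padding at the image borders is likewise an assumption, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_kernel(k, sigma=None):
    """Discrete k x k approximation of a 2-D Gaussian, normalized to sum to 1."""
    if sigma is None:
        sigma = 0.3 * ((k - 1) * 0.5 - 1) + 0.8  # OpenCV's default sigma for size k (assumption)
    ax = np.arange(k) - (k - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def _windows(img, k):
    # Reflect-pad so the output keeps the input size, then view every k x k neighborhood.
    padded = np.pad(img.astype(np.float64), k // 2, mode="reflect")
    return sliding_window_view(padded, (k, k))

def gaussian_filter(img, k):
    """Replace each pixel by the Gaussian-weighted average of its k x k window."""
    return np.einsum("ijkl,kl->ij", _windows(img, k), gaussian_kernel(k))

def median_filter(img, k):
    """Replace each pixel by the median of its k x k window."""
    return np.median(_windows(img, k), axis=(-2, -1))

img = np.zeros((9, 9))
img[4, 4] = 100.0  # a single impulse-noise pixel
print(median_filter(img, 3)[4, 4], gaussian_filter(img, 3)[4, 4])
```

The median filter removes the isolated bright pixel entirely, whereas the Gaussian filter only spreads it over its neighbors. Both filters act over the whole window, so larger windows smooth more strongly and blur edges more.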

II-C DCGAN Architecture

The architecture of the DCGAN is depicted in Fig. 1. The DCGAN has been reimplemented as detailed in [10] and further fine-tuned as detailed in [11].

The DCGAN uses a 100-dimensional Gaussian latent input, the Adam optimizer, and separate real and fake batches for training, with the binary cross-entropy loss function. The DCGAN is trained for 500 epochs to enable the convergence of both models in the GAN [7]. The DCGAN is trained on the images for each combination of window size, contrast threshold value, and training batch size. A batch size is considered suitable if it makes use of all of the images available in the minority class. As such, three batch sizes that are factors of the number of healthy chest X-ray training images (1340), namely 20, 67, and 134, were selected to evaluate the DCGAN.
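The batch-size criterion is simple arithmetic: a batch size is suitable when it divides the 1340 minority-class training images evenly, so every epoch consists of full batches. A minimal Python sketch, with hypothetical helper names:

```python
N_HEALTHY_TRAIN = 1340  # healthy (minority-class) chest X-ray training images

def divisors(n):
    """All positive divisors of n in ascending order."""
    small = [d for d in range(1, int(n ** 0.5) + 1) if n % d == 0]
    return sorted(set(small + [n // d for d in small]))

def batches_per_epoch(n, batch_size):
    """Number of full batches per epoch; reject sizes that leave a partial final batch."""
    if n % batch_size:
        raise ValueError(f"batch size {batch_size} leaves a partial batch of {n % batch_size} images")
    return n // batch_size

print(divisors(N_HEALTHY_TRAIN))  # [1, 2, 4, 5, 10, 20, 67, 134, 268, 335, 670, 1340]
for b in (20, 67, 134):
    print(b, batches_per_epoch(N_HEALTHY_TRAIN, b))  # 67, 20 and 10 batches respectively
```

Since 1340 = 2² × 5 × 67, the selected sizes 20, 67, and 134 are three of its twelve divisors.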

Fig. 1: DCGAN Architecture: AIIN is used as a pre-processing step for the DCGAN. The internal structure of the generator and discriminator is depicted. MS-SSIM and FID metrics are used to assess the intra-class mode collapse and intra-class diversity of generated X-ray images.
Fig. 2: MS-SSIM and FID scores for un-normalized and normalized X-ray images to enable assessment of mode collapse and diversity measures. The DCGAN is trained to generate un-normalized and normalized images at three batch sizes. X-ray images are preprocessed for a series of contrast threshold values.
Fig. 3: Panels A and B show MS-SSIM scores, while panels C and D show FID scores, for un-normalized, Gaussian-filtered, and median-filtered X-ray images to enable assessment of mode collapse and diversity measures. The DCGAN is trained on three batch sizes.

II-D Identification of the Mode Collapse and Diversity of Synthetic Images

The MS-SSIM score of real and generated images is analyzed to identify the occurrence of mode collapse. FID score is used to identify the level of diversity of synthetic images generated by the DCGAN. These are combined to enable the evaluation of DCGAN’s capacity to generate images with a diverse set of features [12] [13].

II-D1 Intra-class Mode Collapse Problem

Odena et al. [12] first investigated the use of MS-SSIM to measure the intra-class diversity of generated imagery and to assess the occurrence of intra-class mode collapse in GANs. The similarity between two images is computed based on image pixels and structures. MS-SSIM scores are measured separately for randomly selected pairs of real images and for pairs of synthetic images, and the mean score of each set is reported. The MS-SSIM score ranges from 0 to 1. In this work, 670 randomly selected image pairs are used to measure the MS-SSIM score. A higher score for synthetic images than for real images indicates that mode collapse is occurring; synthetic images should have an MS-SSIM score similar to or lower than that of real images.
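The pair-sampling protocol and the mode-collapse decision rule can be sketched as follows. `score_fn` is any pairwise similarity; the `toy_similarity` stand-in (one minus the mean absolute pixel difference for images scaled to [0, 1]) only keeps the example self-contained and is not MS-SSIM. The default of 670 pairs follows the text; the function names and the fixed seed are assumptions.

```python
import numpy as np

def mean_pairwise_score(images, score_fn, n_pairs=670, seed=0):
    """Average score_fn over n_pairs randomly drawn pairs of distinct images."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_pairs):
        i, j = rng.choice(len(images), size=2, replace=False)
        scores.append(score_fn(images[i], images[j]))
    return float(np.mean(scores))

def shows_mode_collapse(real, synthetic, score_fn, n_pairs=670):
    # Synthetic images that are more similar to each other than real images are
    # to each other indicate intra-class mode collapse.
    return mean_pairwise_score(synthetic, score_fn, n_pairs) > mean_pairwise_score(real, score_fn, n_pairs)

def toy_similarity(a, b):
    """Stand-in similarity in [0, 1] for images scaled to [0, 1]; replace with MS-SSIM."""
    return 1.0 - float(np.mean(np.abs(a - b)))

rng = np.random.default_rng(1)
real = rng.random((50, 16, 16))
collapsed = np.repeat(real[:1], 50, axis=0)  # every "synthetic" image is identical
print(shows_mode_collapse(real, collapsed, toy_similarity))
```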

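MS-SSIM combines luminance, contrast, and structure comparisons into a single score. It is computed from the luminance (l), contrast (c), and structure (s) terms as defined in Eq. (1) [14]:

$$\text{MS-SSIM}(x, y) = \left[l_M(x, y)\right]^{\alpha_M} \prod_{j=1}^{M} \left[c_j(x, y)\right]^{\beta_j} \left[s_j(x, y)\right]^{\gamma_j} \tag{1}$$

where $x$ and $y$ are the two images being compared, $j$ indexes the $M$ scales obtained by repeatedly low-pass filtering the images and downsampling them by a factor of 2, the luminance term is evaluated only at the coarsest scale $M$, and the exponents $\alpha_M$, $\beta_j$, and $\gamma_j$ weight the relative importance of each term.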
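To make Eq. (1) concrete, the sketch below evaluates it with a deliberate simplification: luminance, contrast, and structure are computed from whole-image statistics at each scale, and 2x2 average pooling stands in for the low-pass filter. Standard MS-SSIM computes these terms in local 11x11 Gaussian windows and averages the resulting maps, so the values will differ from library implementations. The five exponents are the commonly used published MS-SSIM weights, and the constants $C_1 = (0.01L)^2$, $C_2 = (0.03L)^2$, $C_3 = C_2/2$ follow the usual SSIM convention; the function names are hypothetical.

```python
import numpy as np

# Exponents for M = 5 scales: beta_j = gamma_j = WEIGHTS[j], alpha_M = WEIGHTS[-1].
WEIGHTS = np.array([0.0448, 0.2856, 0.3001, 0.2363, 0.1333])

def lcs_global(x, y, data_range=1.0):
    """Luminance, contrast and structure terms from whole-image statistics."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    c3 = c2 / 2.0
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = np.mean((x - mx) * (y - my))
    l = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    c = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)
    s = (sxy + c3) / (sx * sy + c3)
    return l, c, s

def downsample2(img):
    """2x2 average pooling: a crude low-pass filter followed by subsampling by 2."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def ms_ssim_global(x, y, weights=WEIGHTS, data_range=1.0):
    """Eq. (1) with global (not windowed) statistics at each of len(weights) scales."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    score = 1.0
    for j, w in enumerate(weights):
        l, c, s = lcs_global(x, y, data_range)
        # Clamp negative structure terms (anti-correlated images) so fractional powers stay real.
        score *= (c ** w) * (max(s, 0.0) ** w)
        if j == len(weights) - 1:
            score *= l ** w  # luminance is compared only at the coarsest scale M
        else:
            x, y = downsample2(x), downsample2(y)
    return float(score)

rng = np.random.default_rng(0)
x = rng.random((128, 128))
noisy = np.clip(x + 0.2 * rng.normal(size=x.shape), 0.0, 1.0)
print(ms_ssim_global(x, x), ms_ssim_global(x, noisy))
```

At 128x128, five scales bring the coarsest level down to 8x8, which is still large enough for the global statistics used here.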


II-D2 Intra-class Diversity

The intra-class diversity of generated images is assessed using FID. FID measures the distance between the distributions of real and synthetic images using feature activations of a pre-trained Inception V3 network [13]. In this work, FID is measured using samples of 1340 images drawn separately from the real and synthetic images, and the score ranges from 0.0 to ∞. A lower FID score indicates a higher degree of diversity of the synthetic images relative to the real images.


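FID is computed as shown in Eq. (2):

$$\text{FID}(x, g) = \lVert \mu_x - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_x + \Sigma_g - 2\left(\Sigma_x \Sigma_g\right)^{1/2}\right) \tag{2}$$

In Eq. (2), $x$ and $g$ denote the real and synthetic images, while $\mu_x$, $\mu_g$ and $\Sigma_x$, $\Sigma_g$ denote the means and covariances of the feature activations of the real and synthetic images, respectively.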
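Eq. (2) can be evaluated directly from two matrices of feature activations. This NumPy sketch assumes the features have already been extracted (for example, the 768-dimensional Inception V3 features used in this work); the feature extractor is not shown and the function name is hypothetical. It uses the identity that the trace of $(\Sigma_x \Sigma_g)^{1/2}$ equals the sum of the square roots of the eigenvalues of $\Sigma_x \Sigma_g$, which avoids computing a general matrix square root.

```python
import numpy as np

def fid_from_features(feat_real, feat_fake):
    """Eq. (2) computed from two (n_samples, n_features) activation matrices."""
    mu_x, mu_g = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    sigma_x = np.cov(feat_real, rowvar=False)
    sigma_g = np.cov(feat_fake, rowvar=False)
    # Eigenvalues of sigma_x @ sigma_g are real and non-negative for PSD covariances;
    # clip tiny negative or imaginary parts that arise from rounding.
    eig = np.linalg.eigvals(sigma_x @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_g) ** 2)
    return float(mean_term + np.trace(sigma_x) + np.trace(sigma_g) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 16))
print(fid_from_features(a, a), fid_from_features(a, a + 3.0))
```

Shifting every feature by 3 leaves the covariances unchanged, so only the mean term contributes and the result is close to 16 × 3² = 144.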

By default, FID uses the 2048-dimensional features of the last pooling layer of the Inception V3 model, which requires at least 2048 image samples so that the 2048×2048 feature covariance matrix is full rank. As only 1340 healthy chest X-ray images are available, the 768-dimensional features of the pre-auxiliary classifier layer are used instead.
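The sample-size requirement follows from a rank argument: a covariance matrix estimated from $n$ samples has rank at most $n - 1$, so 1340 images cannot give a full-rank 2048×2048 covariance, whereas a 768×768 covariance can be full rank. The check below uses random Gaussian stand-in features rather than Inception activations, and the helper name is hypothetical.

```python
import numpy as np

def covariance_rank(n_samples, n_features, seed=0):
    """Rank of the sample covariance of random features (at most n_samples - 1)."""
    rng = np.random.default_rng(seed)
    feats = rng.normal(size=(n_samples, n_features))
    centred = feats - feats.mean(axis=0)
    # rank(cov) equals the rank of the centred data matrix, whose SVD is cheaper.
    return int(np.linalg.matrix_rank(centred))

print(covariance_rank(1340, 2048), covariance_rank(1340, 768))
```

A rank-deficient covariance makes the matrix square root in Eq. (2) ill-conditioned, which is why the smaller feature layer is preferred for this sample size.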

II-E Assessing the Utility of Synthetic Chest X-ray Images

To assess the utility of synthetic X-ray images, a sequential CNN [15] is implemented for the binary classification of healthy and Pneumonia X-ray images. The intent is to augment the minority class with synthetic X-ray images of varying degrees of similarity and diversity, which enables an evaluation of the efficacy of AIIN for the DCGAN in augmenting chest X-ray images. In [15], the CNN was used to classify Pneumonia X-ray images, and the results were assessed with Pneumonia as the positive label.

Synthetic images were rescaled to 150x150 with the OpenCV library, and the CNN model was reimplemented as detailed in [15]. The CNN model is trained separately on the dataset augmented by each of 13 GAN-based variants. For each variant, 1340 synthetic images are generated and used to address the data imbalance problem. The variants were selected based on the most promising MS-SSIM and FID scores. Geometric transformations, namely rotation (15°), shear, and zoom (range 0.2), were also used. The classification scores are compared with those obtained using un-normalized generated healthy chest X-ray images in Table I.
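The effect of this augmentation on class balance is straightforward arithmetic. Assuming, as the conclusion states, that the 1340 synthetic images are combined with the 1340 real healthy training images, the Pneumonia-to-healthy ratio roughly halves. The helper name below is hypothetical.

```python
N_HEALTHY, N_PNEUMONIA = 1340, 3875  # training images in the Kermany et al. dataset [6]
N_SYNTHETIC = 1340                   # synthetic healthy images generated per GAN variant

def imbalance_ratio(majority, minority):
    """Number of majority-class images per minority-class image."""
    return majority / minority

before = imbalance_ratio(N_PNEUMONIA, N_HEALTHY)
after = imbalance_ratio(N_PNEUMONIA, N_HEALTHY + N_SYNTHETIC)
print(f"Pneumonia:healthy ratio {before:.2f} -> {after:.2f}")  # prints 2.89 -> 1.45
```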

Classifier (BS=10)    Traditional Aug  GAN Aug   GAN-training BS  WS     CT  MS-SSIM  FID    Accuracy (%)  Precision  Recall  Specificity
CNN [15]              ✓                x         x                x      x   x        x      94.39         0.92       0.99    0.86
CNN (Re-implemented)  ✓                x         x                x      x   x        x      91.20         0.88       0.99    0.78
CNN                   ✓                Un-norm   20               x      x   +0.473   1.903  87.5          0.84       0.98    0.69
CNN                   ✓                Un-norm   67               x      x   +0.054   1.096  87.20         0.84       0.98    0.69
CNN                   ✓                Un-norm   134              x      x   +0.029   0.687  88.94         0.86       0.98    0.74
CNN                   ✓                Norm      134              4x4    20  +0.013   0.580  87.6          0.85       0.98    0.71
CNN                   ✓                Norm      134              8x8    50  +0.025   0.430  91.50         0.89       0.99    0.79
CNN                   ✓                Norm      67               16x16  10  +0.031   0.444  87.98         0.85       0.98    0.71
CNN                   ✓                Norm      134              4x4    0   +0.036   0.540  91.50         0.89       0.99    0.79
CNN                   ✓                Norm      67               8x8    10  +0.035   0.425  90.06         0.87       0.99    0.74
CNN                   ✓                Norm      67               16x16  20  +0.058   0.362  84.13         0.82       0.96    0.65
CNN                   ✓                Gaussian  134              3x3    x   +0.01    0.426  85.57         0.82       0.99    0.63
CNN                   ✓                Gaussian  134              9x9    x   +0.0008  0.547  90.38         0.88       0.99    0.76
CNN                   ✓                Median    134              3x3    x   +0.015   0.557  88.62         0.85       0.99    0.72
CNN                   ✓                Median    134              9x9    x   +0.001   0.550  87.50         0.84       0.99    0.69
Aug: Augmentation; Un-norm: Un-normalized data; Norm: Normalized data; CT: Contrast Threshold; BS: Batch Size; WS: Window Size
TABLE I: Results for CNN’s binary classification performance of healthy and Pneumonia X-ray images under different GAN-based training scenarios. This table compares the best scores of MS-SSIM and FID of a specific window size using classification performance metrics.

III Results and Discussion

Fig. 2 and Fig. 3 depict the MS-SSIM and FID scores of synthetic X-ray images generated from AIIN-normalized, Gaussian-filtered, median-filtered, and un-normalized training images.

Intra-class mode collapse is identified by a higher MS-SSIM score for un-normalized synthetic X-ray images than for real images. AIIN alleviates the mode collapse by improving the capacity of the DCGAN to generate diverse normalized X-ray images, as shown by the improved MS-SSIM scores. Parameters such as window size, contrast threshold, and batch size have a significant impact on the generation of diverse synthetic images, as indicated by the varying MS-SSIM scores. The Gaussian and median filtering approaches achieve better MS-SSIM scores for synthetic images than AIIN. However, these filtering approaches suppress noise by blurring the edges of an image, which reduces the structural information of features even as it improves the MS-SSIM score. Therefore, these filtering approaches offer no advantage to the DCGAN in alleviating the mode collapse problem.

FID analysis indicates the efficacy of AIIN in improving the intra-class diversity of synthetic normalized X-ray images, as depicted in Fig. 2. The results show that window size, contrast threshold, and batch size have a considerable impact on the diversity of synthetic X-ray images, as indicated by the varying FID scores. AIIN achieves relatively better FID scores than the Gaussian and median filtering techniques, as depicted in Fig. 3.

Several GAN-based variants have been implemented to augment the healthy X-ray images, as reported in Table I. Because the CNN is a Pneumonia classifier, predictions for the minority class of healthy X-ray images correspond to negative labels; therefore, the results are evaluated by the specificity score. The best specificity score is achieved at batch size 134 using window sizes 4x4 and 8x8 with contrast thresholds of 0 and 50, respectively. This improvement demonstrates the capacity of normalized synthetic images to augment the healthy X-ray images, outperforming the un-normalized approaches. The CNN is susceptible to issues such as randomization of features and overfitting. In this case, the CNN focuses on classifying Pneumonia and learns the features of the lung segments to differentiate healthy images from Pneumonia images. The 4x4 and 8x8 window sizes highlight the lung segments more than the other experimental permutations do, and these settings achieve good classification measures. In comparison with the Gaussian and median filtering approaches, AIIN has the advantage that it does not degrade the structural information of features in images, and it achieves better classification scores, as presented in Table I.
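The choice of specificity can be made concrete from a confusion matrix with Pneumonia as the positive class. The counts below are hypothetical, chosen only to match the sizes of the 624-image test set (390 Pneumonia, 234 healthy); they are not results from Table I, and the function name is likewise hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Metrics with Pneumonia as the positive class and healthy as the negative class."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # fraction of Pneumonia images detected
        "specificity": tn / (tn + fp),   # fraction of healthy images correctly recognized
    }

# Hypothetical counts: 380 + 10 = 390 Pneumonia images, 170 + 64 = 234 healthy images.
m = binary_metrics(tp=380, fn=10, tn=170, fp=64)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because healthy images are the negatives, only specificity depends solely on how the minority class is classified, which is why it is the metric most sensitive to augmenting that class.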

IV Conclusion

In this work, the AIIN technique is proposed for the DCGAN to improve the diversity of generated chest X-ray images. The results show that the DCGAN with AIIN generates more diverse X-ray images than the DCGAN without AIIN (as indicated by better MS-SSIM and FID scores) while alleviating the mode collapse problem. AIIN also outperformed the Gaussian and median filtering pre-processing techniques in terms of the diversity of image features. Furthermore, the efficacy of the proposed approach is verified by using the augmented images (generated X-ray images combined with the real images) to train a CNN classifier.


  • [1] M. Mostapha and M. Styner, “Role of deep learning in infant brain MRI analysis,” Magnetic resonance imaging, vol. 64, pp. 171–189, 2019.
  • [2] Z. Wang, Q. She, and T. E. Ward, “Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy,” ACM Computing Surveys (CSUR), vol. 54, no. 2, pp. 1–38, 2021.
  • [3] A. S. Lundervold and A. Lundervold, “An overview of deep learning in medical imaging focusing on MRI,” Zeitschrift für Medizinische Physik, vol. 29, no. 2, pp. 102–127, 2019.
  • [4] A. Torfi, M. Beyki, and E. A. Fox, “On the Evaluation of Generative Adversarial Networks By Discriminative Models,” in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 991–998.
  • [5] Z. Gong, P. Zhong, and W. Hu, “Diversity in machine learning,” IEEE Access, vol. 7, pp. 64323–64350, 2019.
  • [6] D. S. Kermany, M. Goldbaum, W. Cai, C. C. S. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, et al., “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell, vol. 172, no. 5, pp. 1122–1131, 2018.
  • [7] S. Kora Venu and S. Ravula, “Evaluation of Deep Convolutional Generative Adversarial Networks for Data Augmentation of Chest X-ray Images,” Future Internet, vol. 13, no. 1, p. 8, 2021.
  • [8] K. Zuiderveld, “Contrast limited adaptive histogram equalization,” Graphics gems, pp. 474–485, 1994.
  • [9] A. Shah, J. I. Bangash, A. W. Khan, I. Ahmed, A. Khan, A. Khan, and A. Khan, “Comparative analysis of median filter and its variants for removal of impulse noise from gray scale images,” Journal of King Saud University-Computer and Information Sciences, 2020.
  • [10] A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” arXiv preprint arXiv:1511.06434, 2015.
  • [11] Soumith Chintala. (2021) How to Train a GAN? Tips and tricks to make GANs work. [Accessed on May. 05, 2021]. [Online]. Available:
  • [12] A. Odena, C. Olah, and J. Shlens, “Conditional Image Synthesis with Auxiliary Classifier GANs,” in International conference on machine learning.   PMLR, 2017, pp. 2642–2651.
  • [13] T. Miyato and M. Koyama, “cGANs with Projection Discriminator,” in International Conference on Learning Representations, 2018. [Online]. Available:
  • [14] A. Borji, “Pros and cons of GAN evaluation measures,” Computer Vision and Image Understanding, vol. 179, pp. 41–65, 2019.
  • [15] R. Siddiqi, “Automated pneumonia diagnosis using a customized sequential convolutional neural network,” in Proceedings of the 2019 3rd International Conference on Deep Learning Technologies, 2019, pp. 64–70.