3N-GAN: Semi-Supervised Classification of X-Ray Images with a 3-Player Adversarial Framework

by Shafin Haque, et al.

The success of deep learning for medical imaging tasks, such as classification, is heavily reliant on the availability of large-scale datasets. However, acquiring datasets with large quantities of labeled data is challenging, as labeling is expensive and time-consuming. Semi-supervised learning (SSL) is a growing alternative to fully-supervised learning, but requires unlabeled samples for training. In medical imaging, many datasets lack unlabeled data entirely, so SSL can't be conventionally utilized. We propose 3N-GAN, or 3 Network Generative Adversarial Networks, to perform semi-supervised classification of medical images in fully-supervised settings. We incorporate a classifier into the adversarial relationship such that the generator trains adversarially against both the classifier and discriminator. Our preliminary results show improved classification performance and GAN generations over various algorithms. Our work can seamlessly integrate with numerous other medical imaging model architectures and SSL methods for greater performance.




1 Introduction

Medical image analysis via deep learning is heavily reliant on large-scale, labeled datasets Shen et al. (2017). Semi-supervised learning (SSL) has gained attention as an alternative to fully-supervised learning. SSL targets tasks whose datasets contain unlabeled samples, and many works have pursued this direction Anwar et al. (2018); Gu et al. (2019); Haque et al. (2021); Liu et al. (2020). However, in medical imaging, datasets that are fully labeled but extremely low in samples are more common. For example, X-rays are most often taken for diagnostic purposes, so they are generally accompanied by labels; the difficulty lies in aggregating these X-rays.

Restricted, fully-supervised datasets can be aided by deep generative models, which can generate artificial samples to supplement fully-supervised training. Generative Adversarial Networks (GANs) Goodfellow et al. (2014) are a prominent class of generative models in which a discriminator and a generator work adversarially against one another: the generator learns to replicate the data distribution to produce realistic images, while the discriminator learns to distinguish between real and fake samples. Because GANs are structured as a 2-player game, only a few works construct an adversarial framework with an additional neural classifier Li et al. (2017); Shen et al. (2018). To our best knowledge, none utilize classifier predictions on unlabeled generations to update the generator, especially using pseudo-labels Lee et al. (2013), and no methods in medical imaging use GANs to generate artificial samples for semi-supervised classification.

We introduce 3N-GAN, or 3 Network Generative Adversarial Network, for semi-supervised classification of X-ray images using generated samples as supplemental data. Our results confirm that our 3-player adversarial framework with innovative adversarial loss functions improves classifier and generator performance over baseline models at varying levels of supervision.

2 Methods

Figure 1: Schematic of 3N-GAN. The generator (blue) produces fake X-Ray images, which are unlabeled. The classifier (red) is trained on both real and fake images and uses pseudo-labeling for semi-supervised classification. The generator is updated on the classifier adversarial loss (based on the classifier-produced pseudo-label) and the discriminator adversarial loss.

Figure 1 displays the adversarial training procedure for 3N-GAN. This method can be viewed as an extension of Haque (2021), which is only published as a 2-page abstract. Our method, 3N-GAN, introduces multiple significant improvements, such as truly incorporating the classifier into the adversarial framework, which the previous method did not attempt.

All 3 networks are trained simultaneously. The discriminator is trained conventionally. The generator is given a random latent vector as input and outputs fake images. Contrary to many semi-supervised GAN classification methods Salimans et al. (2016), we separate the discriminator and classifier. Achieving two tasks with a single network when the tasks are not closely related, such as discrimination and classification, may be suboptimal, as the network must approximate two distributions Haque (2021).

The classifier is simultaneously trained on real images and artificial images produced by the generator. Generated images act as supplemental data, increasing the amount of data available to the classifier. GAN-generated samples do not have labels, requiring the use of semi-supervised algorithms. For semi-supervised classification, we use both pseudo-labeling Lee et al. (2013) and a KL divergence loss, which to our best knowledge has not previously been used with GAN-based semi-supervised classification. The KL divergence loss enforces consistency by penalizing divergence between predictions on real and generated samples.

The classification loss objective

$$\mathcal{L}_C = \mathrm{CE}\big(y, C(x)\big) + \lambda_1\,\mathbb{1}\big[\max C(G(z)) > \tau\big]\,\mathrm{CE}\big(\hat{y}, C(G(z))\big) + \lambda_2\,D_{\mathrm{KL}}\big(C(x)\,\big\|\,C(G(z))\big)$$

has a supervised component for the labels ($y$) and predictions ($C(x)$) of real samples, an unsupervised component for the predictions on generated samples ($C(G(z))$) and their pseudo-labels ($\hat{y}$), and a second unsupervised component computing KL divergence loss between the predictions on real samples and GAN-generated samples. $\lambda_1$ and $\lambda_2$ are unsupervised loss weights and $\tau$ is the pseudo-labeling threshold.
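No reference implementation accompanies the paper, so as an illustration only, the three components of the classification loss can be sketched in NumPy as follows. The helper names, the batch shapes, and the pairing of equal-sized real and generated batches for the KL term are our assumptions:

```python
import numpy as np

def cross_entropy(probs, labels):
    # Mean cross-entropy between one-hot labels and predicted probabilities.
    return -np.mean(np.sum(labels * np.log(probs + 1e-12), axis=1))

def kl_divergence(p, q):
    # Mean KL divergence D_KL(p || q) between two batches of distributions.
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1))

def classifier_loss(real_probs, real_labels, fake_probs,
                    lambda1=0.3, lambda2=0.01, tau=0.9):
    """Semi-supervised classification loss sketch (our reconstruction).

    real_probs:  classifier softmax outputs on real, labeled images
    real_labels: one-hot ground-truth labels for the real images
    fake_probs:  classifier softmax outputs on GAN-generated images
    """
    # Supervised component on real, labeled samples.
    supervised = cross_entropy(real_probs, real_labels)

    # Pseudo-labeling: keep only predictions on generated samples whose
    # maximum class probability exceeds the threshold tau.
    confident = fake_probs.max(axis=1) > tau
    if confident.any():
        pseudo = np.eye(fake_probs.shape[1])[fake_probs[confident].argmax(axis=1)]
        unsupervised = cross_entropy(fake_probs[confident], pseudo)
    else:
        unsupervised = 0.0

    # KL consistency between predictions on real and generated samples.
    consistency = kl_divergence(real_probs, fake_probs)

    return supervised + lambda1 * unsupervised + lambda2 * consistency
```

When no generated prediction clears the threshold and the two prediction batches coincide, the loss reduces to the supervised cross-entropy term, which matches the intent of the pseudo-labeling gate.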

To incorporate the classifier into the adversarial framework, we update the generator from the unsupervised classification loss. The generator now has a discriminator adversarial loss and a classifier adversarial loss. The generator objective

$$\mathcal{L}_G = \mathrm{BCE}\big(\mathbf{1}, D(G(z))\big) + \lambda_1\,\mathbb{1}\big[\max C(G(z)) > \tau\big]\,\mathrm{CE}\big(\hat{y}, C(G(z))\big)$$

has the discriminator adversarial loss based on the discriminator predictions ($D(G(z))$) on generated images and the classifier adversarial loss, which is identical to the unsupervised classification loss from $\mathcal{L}_C$.

The classifier adversarial loss trains the generator to produce images whose class can be discerned accurately. Certain features that distinguish between classes, such as lung inflammations, may therefore be rendered more accurately in the generator's samples. The classifier and generator provide feedback to one another, completing the 3-player adversarial framework. To our best knowledge, updating the generator on classification predictions using a pseudo-label is novel and absent from the literature.
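As a minimal sketch of the two-term generator update (the function names and the non-saturating BCE form of the discriminator term are our assumptions; the classifier term mirrors the unsupervised, pseudo-labeled classification loss described above):

```python
import numpy as np

def bce_real(d_fake):
    # Non-saturating generator loss: binary cross-entropy against the
    # "real" label (1) on the discriminator's outputs for generated images.
    return -np.mean(np.log(d_fake + 1e-12))

def generator_loss(d_fake, fake_probs, lambda1=0.3, tau=0.9):
    """Generator objective sketch: discriminator adversarial loss plus a
    classifier adversarial loss on confidently pseudo-labeled fakes."""
    adv_d = bce_real(d_fake)

    # Classifier adversarial term: cross-entropy of the classifier's
    # predictions on generated images against their own pseudo-labels,
    # applied only where confidence exceeds the threshold tau.
    confident = fake_probs.max(axis=1) > tau
    if confident.any():
        pseudo = np.eye(fake_probs.shape[1])[fake_probs[confident].argmax(axis=1)]
        adv_c = -np.mean(np.sum(pseudo * np.log(fake_probs[confident] + 1e-12),
                                axis=1))
    else:
        adv_c = 0.0

    return adv_d + lambda1 * adv_c
```

When no generated sample clears the confidence threshold, only the discriminator term drives the generator update, so the classifier feedback activates gradually as generations become classifiable.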

3 Results and Conclusion

We experiment with a binary pneumonia classification dataset (CheX) Kermany et al. (2018). The dataset contains 5,863 X-rays in total, with 624 held out as an external validation set. We perform experiments at various quantities of training data: 200, 500, 750, 1000, and 2000 X-rays (evenly split between classes). We use the DCGAN Radford et al. (2015) implementation for our generator and discriminator; our classifier architecture is the DCGAN discriminator. All inputs were normalized, gray-scaled, and resized to 64 × 64 × 1 before training. Each experiment was trained for 100 epochs with mini-batch size 10 and repeated 5 times. We compare against a vanilla classifier, a multi-tasking discriminator, and EC-GAN. We set $\tau = 0.9$, $\lambda_1 = 0.3$, and $\lambda_2 = 0.01$ through tuning experiments.
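The preprocessing described above (gray-scaling, resizing to 64 × 64, normalizing) might look like the following NumPy sketch. This is an illustration under our own assumptions: the input is an RGB array whose sides are multiples of 64, gray-scaling uses standard luma weights, resizing is simple block averaging, and normalization targets the [-1, 1] range typical for a tanh-output DCGAN generator; the authors' exact pipeline is not specified.

```python
import numpy as np

def preprocess(img):
    """Preprocessing sketch: gray-scale, resize to 64x64, normalize.

    `img` is assumed to be an RGB uint8 array whose height and width are
    multiples of 64; block averaging stands in for whatever interpolation
    the authors actually used.
    """
    # Gray-scale via standard luma weights (sum to 1.0).
    gray = img.astype(np.float64) @ np.array([0.299, 0.587, 0.114])

    # Block-average down to 64x64.
    fh, fw = gray.shape[0] // 64, gray.shape[1] // 64
    small = gray.reshape(64, fh, 64, fw).mean(axis=(1, 3))

    # Normalize pixel values from [0, 255] to [-1, 1].
    return (small / 127.5) - 1.0
```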

Model                                              | 200   | 500   | 750   | 1000  | 2000
---------------------------------------------------|-------|-------|-------|-------|------
Vanilla Classifier                                 | 85.04 | 87.82 | 89.17 | 90.80 | 91.67
Multi-Tasking Discriminator Salimans et al. (2016) | 89.06 | 92.12 | 91.85 | 94.12 | 94.85
EC-GAN Haque (2021)                                | 90.30 | 92.44 | 93.10 | 94.95 | 95.35
3N-GAN                                             | 91.52 | 93.49 | 94.16 | 94.88 | 96.03

Table 1: Accuracy of baselines against 3N-GAN at various dataset sizes (columns give the number of training X-rays). 3N-GAN achieves the best score at every dataset size except 1000 images, where EC-GAN leads.
Figure 2: Generated X-rays from (a) DCGAN and (b) our model, 3N-GAN (using the same input vector), compared to (c) real X-rays.

Table 1 displays the pneumonia-diagnosis accuracy of each model at the 5 data settings. The results confirm the advantage of 3N-GAN over the other methods: our model trained with just 200 images almost reaches the performance of a vanilla classifier trained with 2000 images. Compared to EC-GAN, accuracy increases slightly (about 1% in raw score), demonstrating the effectiveness of the classifier adversarial loss. Figure 2 compares fake X-rays from DCGAN and 3N-GAN to real X-rays. Visually, 3N-GAN generations are less blurry and more defined; key structures such as ribs and organs are sharp and present, and the generations resemble real X-rays more closely than DCGAN's.

Ultimately, our preliminary results are promising, as 3N-GAN outperforms related methods in both diagnostic classification and X-ray image generation. In 3N-GAN, classifier performance increases because the supplemental images supplied by the generator are of higher quality. Our future work will include evaluations against more methods and will investigate additional changes to improve image generation.

4 Potential Negative Impacts

Potential negative impacts of our work include the deepfake-related ethical risks of GANs. GANs have been used to create deepfakes Korshunov and Marcel (2018): AI-generated media that replicate and falsify a particular individual's likeness. Our GAN could be exploited to generate fake X-ray scans for patients, and synthetic X-rays could be used to produce false diagnostic results by patients or medical professionals. Fairness issues would also arise if our work were used in a clinical setting, such as determining which patients receive priority access to deep learning-based diagnostics. Regarding privacy, collecting patient data for deep learning-based methods poses patient privacy concerns. Another major concern is false-positive or false-negative diagnoses on X-ray scans: while recent advancements suggest deep learning-based medical imaging analysis is reaching performance on par with professionals, incorrect diagnoses from our classification algorithm would raise major liability and ethical issues, a common challenge for deep learning diagnostic models Aggarwal et al. (2021). By our interpretation, our work introduces no unique negative impacts beyond the long-existing challenges of deep learning-based medical imaging analysis. However, these challenges must be addressed before algorithms such as ours could be applied in clinical practice.


  • R. Aggarwal, V. Sounderajah, G. Martin, D. S. Ting, A. Karthikesalingam, D. King, H. Ashrafian, and A. Darzi (2021) Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digital Medicine 4 (1), pp. 1–23.
  • S. M. Anwar, M. Majid, A. Qayyum, M. Awais, M. Alnowami, and M. K. Khan (2018) Medical image analysis using convolutional neural networks: a review. Journal of Medical Systems 42 (11), pp. 226.
  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial networks. arXiv preprint arXiv:1406.2661.
  • Z. Gu, J. Cheng, H. Fu, K. Zhou, H. Hao, Y. Zhao, T. Zhang, S. Gao, and J. Liu (2019) CE-Net: context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging 38 (10), pp. 2281–2292.
  • A. Haque, A. Wang, D. Terzopoulos, et al. (2021) MultiMix: sparingly-supervised, extreme multitask learning from medical images. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), pp. 693–696.
  • A. Haque (2021) EC-GAN: low-sample classification using semi-supervised algorithms and GANs (student abstract). Proceedings of the AAAI Conference on Artificial Intelligence 35 (18), pp. 15797–15798.
  • D. S. Kermany, M. Goldbaum, W. Cai, C. C. S. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, et al. (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 (5), pp. 1122–1131.
  • P. Korshunov and S. Marcel (2018) DeepFakes: a new threat to face recognition? Assessment and detection. arXiv preprint arXiv:1812.08685.
  • D. Lee et al. (2013) Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, Vol. 3, pp. 896.
  • C. Li, K. Xu, J. Zhu, and B. Zhang (2017) Triple generative adversarial nets. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 4091–4101.
  • Q. Liu, L. Yu, L. Luo, Q. Dou, and P. A. Heng (2020) Semi-supervised medical image classification with relation-driven self-ensembling model. IEEE Transactions on Medical Imaging 39 (11), pp. 3429–3440.
  • A. Radford, L. Metz, and S. Chintala (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
  • T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pp. 2234–2242.
  • D. Shen, G. Wu, and H. Suk (2017) Deep learning in medical image analysis. Annual Review of Biomedical Engineering 19, pp. 221–248.
  • Y. Shen, P. Luo, J. Yan, X. Wang, and X. Tang (2018) FaceID-GAN: learning a symmetry three-player GAN for identity-preserving face synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 821–830.