TUNA-Net: Task-oriented UNsupervised Adversarial Network for Disease Recognition in Cross-Domain Chest X-rays

08/21/2019 ∙ by Yuxing Tang, et al. ∙ National Institutes of Health

In this work, we address the unsupervised domain adaptation problem for radiology image interpretation across domains. Specifically, we study how to adapt a disease recognition model from a labeled source domain to an unlabeled target domain, so as to reduce the effort of labeling each new dataset. To address the shortcoming of cross-domain, unpaired image-to-image translation methods, which typically ignore class-specific semantics, we propose a task-driven, discriminatively trained, cycle-consistent generative adversarial network, termed TUNA-Net. It is able to preserve 1) low-level details, 2) high-level semantic information and 3) mid-level feature representations during the image-to-image translation process, to favor the target disease recognition task. The TUNA-Net framework is general and can be readily adapted to other learning tasks. We evaluate the proposed framework on two public chest X-ray datasets for pneumonia recognition. The TUNA-Net model can adapt labeled adult chest X-rays in the source domain such that they appear as if they were drawn from pediatric X-rays in the unlabeled target domain, while preserving the disease semantics. Extensive experiments show the superiority of the proposed method as compared to state-of-the-art unsupervised domain adaptation approaches. Notably, TUNA-Net achieves an AUC of 96.3% on pediatric pneumonia classification, which is very close to that of the supervised approach (98.1%) trained on the labeled target domain.




1 Introduction

While deep convolutional neural networks (CNNs) have achieved encouraging results across a number of tasks in the medical imaging domain, they frequently suffer from generalization issues due to source and target domain divergence. Examples of such divergence include distribution shift caused by images collected with distinct protocols, from different institutions, or from different patient groups. This can be alleviated by supervised domain adaptation (SDA) [1, 19], which adapts certain layers of a model trained on large amounts of well-labeled source data using moderate amounts of additional labeled target data. However, obtaining abundant labels in each new, unseen domain is a non-trivial and laborious process that relies heavily on skilled clinicians in the majority of clinical applications. Alternatively, unsupervised domain adaptation (UDA) [15] aims to mitigate the harmful effects of domain divergence when transferring knowledge [12, 13] from a supervised (labeled) source domain to an unsupervised (unlabeled) target domain. Because of its potential benefits for medical image processing, UDA of deep learning models has attracted many researchers' attention [7, 18, 2].

Adversarial adaptation methods [15, 5] have become increasingly popular with the recent success of generative adversarial networks (GANs) [3] and their variants [20]. In medical imaging, most previous work on adversarial adaptation focuses on lesion or organ segmentation [7, 19, 18, 2]. For instance, Kamnitsas et al. [7] derive domain-invariant features with an adversarial network for brain lesion segmentation of MR images from two different datasets. GAN-based image-to-image (I2I) translation methods [20] are also widely used to generate medical images [10, 8] across modalities to help adaptation. For example, Zhang et al. [18] segment multiple organs in unlabeled X-ray images with labeled digitally reconstructed radiographs rendered from 3D CT volumes, using I2I translation. Zhang et al. [19] improve Cycle-GAN [20] by introducing shape-consistency for CT and MRI cardiovascular 3D image translation to help organ segmentation. Though CT and MR images are not necessarily paired, the shape-consistency loss requires supervision by pixel-wise annotations from both domains. Chen et al. [2] preserve semantic structural information of the lungs in chest radiographs (X-rays) for cross-dataset lung segmentation.

All the previous methods deal with limited domain shift, or with large organs appearing at approximately fixed positions with clear boundaries, or both. Moreover, they do not necessarily preserve class-specific semantic information about lesions or abnormalities during distribution alignment. For example, when translating an adult X-ray into a pediatric X-ray, there is no guarantee that fine-grained disease content in the original image will be explicitly transferred. The capability of preserving class-specific semantic context across domains is crucial for certain clinically relevant tasks in medical image analysis, such as disease or lesion classification, detection and segmentation [14, 11, 9, 17]. However, to the best of our knowledge, solutions to this problem of adversarial adaptation for medical imaging are limited.

In this paper, we present a novel framework to tackle the target task of disease recognition in cross-domain chest X-rays. Specifically, we propose a task-oriented unsupervised adversarial network (TUNA-Net) for pneumonia recognition (whose findings on X-rays include airspace opacity, lobar consolidation, and interstitial opacity) in cross-domain X-rays. Two visually discrepant but intrinsically related domains are involved: adult and pediatric chest X-rays. The TUNA-Net consists of a cyclic I2I translation framework with class-aware semantic constraint modules. In the absence of labels from one domain, the proposed model is able to 1) synthesize “radio-realistic” (i.e., anatomically realistic synthesized radiographs) images with sufficient low-level details across the two domains, 2) preserve high-level class-specific semantic contextual information during translation, 3) regularize learned mid-level features of real and synthetic target domains to be similar, and 4) optimize these objectives simultaneously to generalize to the unlabeled domain. We demonstrate the effectiveness of our approach for pneumonia recognition on two public chest X-ray datasets with sufficient domain shift.

2 Method

2.1 Problem Formulation

In this work, we focus on the problem of unsupervised domain adaptation, where we are given a source domain S = {X_S, Y_S} with both images X_S (e.g., adult X-rays) and labels Y_S (e.g., normal or pneumonia), and a target domain T = {X_T} with only images X_T (e.g., pediatric X-rays), but no labels. The goal is to learn a classification model from images of both domains, but with only the source labels Y_S, and to predict the labels in the target domain. Note that X_S are naturally unpaired with X_T, as these images are from two different patient populations (adults and children).

A naive baseline method is to learn a model solely from source images and labels, then apply it directly to the target domain. While such a model performs well on data with a distribution similar to the source data, it typically leads to degraded performance on the target data because of domain divergence. To alleviate this effect, we follow previous methods [20, 19, 18] and map images between the two domains (S ↔ T) using multi-domain I2I translation with unpaired training data. During translation, we add constraints at different levels to preserve both holistic and fine-grained class-specific image content. Consequently, the model learned on the source domain generalizes well to the target domain. The flowchart of the proposed framework for UDA is shown in Figure 1.

Figure 1: The framework of TUNA-Net. The question we investigate is whether class-specific semantics can be preserved in an I2I translation framework (e.g., Cycle-GAN [20]) to help domain adaptation when disease labels are available only in the source domain (e.g., translating an adult chest X-ray into a pediatric chest X-ray while preserving the disease semantics, i.e., normal or pneumonia). At test time, the adapted model is applied to target pediatric images to make predictions. In this figure, for inputs from both domains, the top two examples are normal and the bottom two contain pneumonia.

2.2 Pixel-level image-to-image translation with unpaired images

GANs [3] have been widely used for image-to-image translation. Given unpaired images from two domains, we adopt Cycle-GAN [20] to first learn two mappings, S → T and T → S, with two generators G_{S→T} and G_{T→S}, so that the discriminators D_T and D_S cannot distinguish between real images and the synthetic images produced by the generators. For G_{S→T} and its discriminator D_T, the objective is expressed as the adversarial learning loss:

L_GAN(G_{S→T}, D_T) = E_{x_t~X_T}[log D_T(x_t)] + E_{x_s~X_S}[log(1 − D_T(G_{S→T}(x_s)))].   (1)

A similar adversarial loss can be designed for the mapping G_{T→S} and its discriminator D_S, i.e., L_GAN(G_{T→S}, D_S).

To preserve sufficient low-level content information for domain adaptation, we then use the cycle-consistency loss [20] to force the reconstructed synthetic images x̂_s and x̂_t to resemble their inputs x_s and x_t:

L_cyc(G_{S→T}, G_{T→S}) = E_{x_s~X_S}[||x̂_s − x_s||_1] + E_{x_t~X_T}[||x̂_t − x_t||_1],   (2)

where x̂_s = G_{T→S}(G_{S→T}(x_s)), x̂_t = G_{S→T}(G_{T→S}(x_t)), and ||·||_1 is the L1 norm.

The generative adversarial training with cycle-consistency enables synthesizing realistic looking radiographs across domains. However, there is no guarantee that high-level semantics would be preserved during translation. For example, when translating an adult X-ray with lung opacities, sometimes it might be converted into a normal pediatric X-ray without opacities, since the disease semantics are not explicitly modelled in the learning process.
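A minimal PyTorch sketch of the adversarial and cycle-consistency losses above (our illustration, not the authors' code; we use the logistic GAN formulation here, though Cycle-GAN implementations often substitute a least-squares loss):

```python
import torch
import torch.nn.functional as F

def gan_loss_d(d_real, d_fake):
    """Discriminator side of the adversarial loss: push real logits toward 1,
    synthetic logits toward 0."""
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

def gan_loss_g(d_fake):
    """Generator side: fool the discriminator into predicting 1 for synthetic images."""
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

def cycle_loss(x_s, x_s_rec, x_t, x_t_rec):
    """L1 cycle-consistency between inputs and their cycle reconstructions."""
    return F.l1_loss(x_s_rec, x_s) + F.l1_loss(x_t_rec, x_t)
```

When the generators reproduce their inputs exactly, the cycle term vanishes, which is the behavior the loss is designed to reward.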

2.3 High-level class-specific semantics modelling

To preserve the high-level class-specific semantic information that indicates abnormalities in the image before and after translation, we propose to explicitly incorporate disease labels into the translation framework via auxiliary classification models trained with the source labels.

A source classification model F_S is first learned on the labeled source data {X_S, Y_S} using a cross-entropy loss to classify the source images:

L_cls(F_S, X_S, Y_S) = −E_{(x_s, y_s)~(X_S, Y_S)} [ Σ_c 1[y_s = c] log σ(F_S^(c)(x_s)) ],   (3)

where σ is the softmax function and 1[y_s = c] equals 1 if the input image x_s belongs to class c, and 0 otherwise. We then enforce the learned F_S to perform similarly on the reconstructed source data X̂_S by minimizing L_cls(F_S, X̂_S, Y_S). In this way, the high-level class-specific content is preserved within the source → target → source cycle.

To retain similar semantics within the target → source → target cycle in the absence of target labels Y_T, we learn a target classification model F_T (fine-tuned from F_S) on synthetic target images to minimize L_cls(F_T, G_{S→T}(X_S), Y_S), while at the same time minimizing a consistency term L_con between the predictions of F_T on real target images and those of F_S on their source-style translations G_{T→S}(X_T), so that the classifiers in both domains produce consistent predictions and semantic consistency is kept. The total semantic classification loss is:

L_sem = L_cls(F_S, X_S, Y_S) + L_cls(F_S, X̂_S, Y_S) + L_cls(F_T, G_{S→T}(X_S), Y_S) + L_con(F_T(X_T), F_S(G_{T→S}(X_T))).   (4)
By incorporating disease labels into the translation network, the synthesized images maintain meaningful semantics to favor the clinically relevant target task. For instance, F_T can act as a disease classifier on the target domain.
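One plausible instantiation of the semantic classification loss above, sketched in PyTorch (the KL form of the consistency term is our assumption; the paper does not pin down its exact form):

```python
import torch
import torch.nn.functional as F

def semantic_loss(logits_s_real, logits_s_rec, logits_t_syn, y_s,
                  logits_t_real_tgt, logits_s_translated_tgt):
    """Cross-entropy on the three labeled terms (F_S on real and reconstructed
    source images, F_T on synthetic target images), plus a KL consistency term
    between F_T on real target images and F_S on their source-style translations.
    The KL choice for the consistency term is our assumption, not the paper's."""
    ce = (F.cross_entropy(logits_s_real, y_s)
          + F.cross_entropy(logits_s_rec, y_s)
          + F.cross_entropy(logits_t_syn, y_s))
    con = F.kl_div(F.log_softmax(logits_t_real_tgt, dim=1),
                   F.softmax(logits_s_translated_tgt, dim=1),
                   reduction="batchmean")
    return ce + con
```

The consistency term vanishes when the two classifiers agree on the target images, which is exactly the semantic-consistency behavior described above.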

2.4 Mid-level feature regularization

Now that we have both low-level content and high-level semantics preserved in the translation network, we further add mid-level feature constraints on the target model F_T, so that features extracted from the middle layers of F_T on real target data are similar to those on synthetic target data. Inspired by the perceptual loss [6], which encourages images before and after translation to be perceptually similar, we impose a feature reconstruction loss to encourage a real target image x_t and its synthetic reconstruction x̂_t = G_{S→T}(G_{T→S}(x_t)) to be similar in the feature space. In our experiments, applying this feature regularization to middle CNN layers during training also tends to yield images that are visually indistinguishable from the target domain. The feature reconstruction loss is the normalized Euclidean distance between feature representations:

L_feat(F_T) = E_{x_t~X_T} [ (1 / (C·H·W)) · ||φ(x_t) − φ(x̂_t)||_2^2 ],   (5)

where φ is a convolutional block from the target model F_T, and φ(x_t) and φ(x̂_t) are feature maps of size C × H × W output by that convolutional block.
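A small sketch of the feature reconstruction loss (our illustration; it takes pre-extracted feature maps rather than hooking into a specific ResNet block):

```python
import torch

def feature_reconstruction_loss(feats_real, feats_rec):
    """Normalized squared Euclidean distance between feature maps of the target
    model on a real target image and on its cycle reconstruction, summed over
    the chosen conv blocks. Each entry is an (N, C, H, W) tensor."""
    loss = torch.zeros(())
    for f_r, f_s in zip(feats_real, feats_rec):
        n, c, h, w = f_r.shape
        loss = loss + torch.sum((f_r - f_s) ** 2) / (n * c * h * w)
    return loss
```

In practice the two feature lists would come from forward passes of the target classifier over the real and reconstructed target images (e.g., its conv_3 and conv_4 outputs, as used in Sec. 2.5).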

2.5 Final objective and implementation details

The final objective of TUNA-Net is the sum of the adversarial learning losses, the cycle-consistency loss, the semantic classification loss and the feature reconstruction loss:

L = L_GAN(G_{S→T}, D_T) + L_GAN(G_{T→S}, D_S) + λ L_cyc(G_{S→T}, G_{T→S}) + L_sem + L_feat(F_T),   (6)

where λ weights the relative importance of the cycle-consistency term. Driven by the target task of disease recognition, optimizing this objective yields the adapted target model F_T.
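Assembling the total objective from the individual terms is then a weighted sum (function and argument names are ours):

```python
def tuna_net_objective(l_gan_st, l_gan_ts, l_cyc, l_sem, l_feat, lam=10.0):
    """Total TUNA-Net loss: two adversarial terms, the cycle-consistency term
    weighted by lambda (set to 10 in Sec. 2.5, following Cycle-GAN), the
    semantic classification loss and the feature reconstruction loss."""
    return l_gan_st + l_gan_ts + lam * l_cyc + l_sem + l_feat
```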

We adopt Cycle-GAN [20] for training the I2I translation framework. We use 9 residual blocks [4] in each generator network for input X-ray images of size 512 × 512. For the source classification network F_S, we use an ImageNet pre-trained 18-layer ResNet [4] as a trade-off between performance and GPU memory usage. The target classification model F_T is fine-tuned from the source model and hence has the same network structure as F_S. Feature maps of conv_3 (56 × 56 × 128) and conv_4 (28 × 28 × 256) are extracted from F_T as mid-level feature representations to calculate the feature reconstruction loss. λ in Eq. 6 is set to 10 as in [20]. All other network components are trained from scratch with a batch size of 1 and an initial learning rate of 0.0002 for the first 100 epochs, linearly decayed to 0 over the next 100 epochs. All network components are optimized using the Adam solver. TUNA-Net is implemented in the PyTorch framework, and all experiments are run on a 32 GB NVIDIA Tesla V100 GPU.

3 Experiments

Material and settings: We extensively evaluate the proposed TUNA-Net for unsupervised domain adaptation on two public chest X-ray datasets containing normal and pneumonia frontal-view X-rays: an adult chest X-ray dataset used in the RSNA Pneumonia Detection Challenge (https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data, a subset of the NIH Chest X-ray 14 dataset [16]) and a pediatric chest X-ray dataset (https://doi.org/10.17632/rscbjbr9sj.3) from Guangzhou Women and Children's Medical Center in China. We set the adult dataset as the source domain and the pediatric dataset as the target domain. For the adult dataset, we use 6993 normal X-rays and 4659 X-rays with pneumonia. For the pediatric dataset, we use 5232 X-rays (either normal (n=1349) or abnormal with pneumonia (n=3883), with labels removed in our setting) for training and validation. The combined datasets are used to train the adult ↔ pediatric translation framework, and 5-fold cross-validation is performed. Classification performance of the proposed adaptation method is evaluated on a hold-out test set of 624 pediatric X-rays (normal: 234, pneumonia: 390) from the target domain.

Reference methods: Although unsupervised adversarial domain adaptation methods exist in the medical imaging field, they are mainly designed for segmentation. Here we compare the performance of our proposed TUNA-Net with the following five reference models:

1. NoAdapt: A ResNet-50 [4] CNN trained on adult X-rays is applied to the pediatric X-rays for pneumonia prediction. This serves as a lower bound method.

2. Cycle-GAN [20]: I2I translation using [20] without considering the disease labels of the X-rays. A model trained on labeled real adult X-rays is applied to synthetic adult X-rays generated from pediatric X-rays.

3. ADDA [15]: First we train an adult classification network with labeled X-rays. Then we adversarially learn a target encoder CNN such that a domain discriminator is unable to differentiate between the source and target domain. During testing, pediatric images are mapped with the target encoder to the shared feature space of the source adult domain and classified by the adult disease classifier.

4. CyCADA [5]: It improves upon ADDA by incorporating cycle consistency at both pixel and feature levels.

5. Supervised: We assume that disease labels for the target domain are accessible, so a supervised model can be trained and tested on the labeled target domain. This serves as an upper bound method.

Quantitative results and ablation studies:

We calculate the area under the receiver operating characteristic curve (AUC), accuracy (Acc.), sensitivity (Sen.), specificity (Spec.) and F1 score to evaluate the classification performance of our model. The validation set is used only to optimize the classification threshold via Youden's index (i.e., J = Sensitivity + Specificity − 1) for normal versus pneumonia classification. The classification results of our TUNA-Net and the reference methods are shown in Table 1. The baseline method without adaptation (NoAdapt) performs poorly on the target task of pediatric pneumonia recognition, even though the source classifier excels at pneumonia recognition on adult chest X-rays (AUC=98.0%). This demonstrates that the gap between the source and target domains is fairly large, although they share the same disease labels. Cycle-GAN does not consider disease labels during I2I translation: it generates X-rays without preserving high-level semantics, so that many normal adult X-rays are converted into pediatric X-rays with opacities on the lungs, and adults with lung opacities are converted into normal pediatric X-rays. This substantially decreases the adaptation performance on the classification task, where correct labels are crucial. Our full TUNA-Net, which considers high-level class-specific semantics, achieves an AUC of 96.3% with both sensitivity and specificity above 91%. It outperforms both ADDA and CyCADA under similar settings. It is also worth noting that the performance of TUNA-Net is very close to that of the supervised model, for which labeled training images on the target dataset are available.

We ablate different modules of the TUNA-Net to measure their influence on the final model: a) we exclude the feature reconstruction loss on the target classification model; b) we do not use reconstructed images to retrain the source classification model; c) we exclude the target classification model from training and instead use the synthetic images to train it offline. As shown in Table 1, each component contributes to the final TUNA-Net. The online end-to-end learning of F_T together with the other components is crucial and contributes most to the performance improvement.
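To make the threshold-selection step concrete, Youden's index can be computed by sweeping candidate thresholds over the validation scores (a minimal NumPy sketch; function and variable names are ours):

```python
import numpy as np

def youden_threshold(y_true, y_score):
    """Return the threshold maximizing Youden's J = sensitivity + specificity - 1,
    along with the J value itself, by sweeping all distinct score values."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    best_t, best_j = None, -1.0
    for t in np.unique(y_score):
        pred = (y_score >= t).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fn = np.sum((pred == 0) & (y_true == 1))
        tn = np.sum((pred == 0) & (y_true == 0))
        fp = np.sum((pred == 1) & (y_true == 0))
        sen = tp / max(tp + fn, 1)
        spec = tn / max(tn + fp, 1)
        j = sen + spec - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

The chosen threshold is then applied unchanged to the hold-out test set when computing accuracy, sensitivity, specificity and F1.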

Model AUC(%) Acc.(%) Sen.(%) Spec.(%) F1
NoAdapt 89.3±0.4 82.5±0.3 83.6±0.7 80.8±0.8 0.86±0.02
Cycle-GAN [20] 80.4±2.5 74.2±2.7 76.9±3.3 69.9±2.8 0.76±0.04
ADDA [15] 91.8±0.4 88.1±0.4 88.2±0.5 87.0±0.4 0.89±0.02
CyCADA [5] 93.5±0.5 90.0±0.4 90.4±0.4 89.6±0.5 0.91±0.02
TUNA-Net 96.3±0.2 93.1±0.4 92.9±0.3 91.1±0.4 0.93±0.01
a) w/o feature loss 95.9±0.1 91.9±0.3 91.7±0.3 90.6±0.2 0.92±0.01
b) w/o F_S on rec. 94.6±0.2 91.3±0.2 91.0±0.3 91.1±0.3 0.92±0.01
c) w/o F_T, offline 94.1±0.2 90.7±0.2 91.0±0.4 90.5±0.2 0.91±0.01
Supervised 98.1±0.1 96.3±0.1 94.6 92.8 0.96
Table 1: Comparison of normal versus pneumonia classification results on the test set of pediatric X-ray dataset.
Figure 2: Qualitative comparison of image-to-image translation. Cycle-GAN is trained without using labels indicating normal or pneumonia, while CyCADA and our TUNA-Net consider labels in the source domain during training. The left part shows adult → pediatric translation, the right part pediatric → adult. The first row shows two normal X-rays as input. The appearances of pneumonia are indicated by arrows. Please refer to the supplementary material for higher-resolution images.

Qualitative results: We show some qualitative image-to-image translation examples in Figure 2. Cycle-GAN fails to preserve important semantic information during transfer. CyCADA is able to preserve certain high-level semantics, but not as robustly as the proposed TUNA-Net. TUNA-Net retains image content at various levels: low-level content, mid-level features, and high-level semantics. For example, for the bottom-left adult input, Cycle-GAN removes the pathology while our TUNA-Net preserves it. The X-rays synthesized by TUNA-Net are closest to the input source image semantically and to the target domain anatomically.

Discussion: We specifically focused on normal versus pneumonia classification in a cross-domain setting. We showed that the I2I translation framework can be constrained by semantic classification components to preserve class-specific disease content during medical image synthesis. We used two public chest X-ray datasets with sufficient domain shift to demonstrate the ability of our unsupervised domain adaptation method. The domain adaptation from adult to pediatric chest X-rays is natural and intuitive; for example, medical students and radiology residents learn in a similar way: they first learn to read adult chest X-rays, and then transfer the learned knowledge to pediatric X-rays.

4 Conclusion

In this paper, we investigated how knowledge about class-specific labels can be transferred from a source domain to an unlabeled target domain for unsupervised domain adaptation. Using adversarially learned cross-domain image-to-image translation networks, we found clear evidence that semantic labels could be translated across medical image domains. The proposed TUNA-Net is general and has the potential to be extended to more disease classes (e.g., pneumothorax), other imaging modalities (such as CT and MRI) and more clinically relevant tasks.


This research was supported by the Intramural Research Program of the National Institutes of Health Clinical Center and by the Ping An Technology Co., Ltd. through a Cooperative Research and Development Agreement. The authors thank NVIDIA for GPU donations.


  • [1] Ghafoorian, M., Mehrtash, A., Kapur, T., Karssemeijer, N., Marchiori, E., Pesteie, M., Guttmann, C.R., de Leeuw, F.E., Tempany, C.M., van Ginneken, B., et al.: Transfer learning for domain adaptation in MRI: Application in brain lesion segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 516–524. Springer (2017)
  • [2] Chen, C., Dou, Q., Chen, H., Heng, P.A.: Semantic-aware generative adversarial nets for unsupervised domain adaptation in chest x-ray segmentation. In: International Workshop on Machine Learning in Medical Imaging. Springer (2018)
  • [3] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in neural information processing systems, pp. 2672–2680 (2014)
  • [4] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
  • [5] Hoffman, J., Tzeng, E., Park, T., Zhu, J.Y., Isola, P., Saenko, K., Efros, A., Darrell, T.: CyCADA: Cycle-consistent adversarial domain adaptation. In: Proceedings of the 35th International Conference on Machine Learning. pp. 1989–1998 (2018)
  • [6] Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: European conference on computer vision. pp. 694–711. Springer (2016)

  • [7] Kamnitsas, K., Baumgartner, C., Ledig, C., Newcombe, V., Simpson, J., Kane, A., Menon, D., Nori, A., Criminisi, A., Rueckert, D., et al.: Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In: International conference on information processing in medical imaging. pp. 597–609. Springer (2017)
  • [8] Tang, Y., Tang, Y., Han, M., Xiao, J., Summers, R.M.: Abnormal chest x-ray identification with generative adversarial one-class classifier. In: IEEE 16th International Symposium on Biomedical Imaging. pp. 1358–1361 (2019)
  • [9] Tang, Y., Yan, K., Tang, Y., Liu, J., Xiao, J., Summers, R.M.: Uldor: A universal lesion detector for ct scans with pseudo masks and hard negative example mining. In: IEEE 16th International Symposium on Biomedical Imaging. pp. 833–836 (2019)
  • [10] Tang, Y., Tang, Y., Xiao, J., Summers, R.M.: Xlsor: A robust and accurate lung segmentor on chest x-rays using criss-cross attention and customized radiorealistic abnormalities generation. arXiv preprint arXiv:1904.09229 (2019)
  • [11] Tang, Y.X., Tang, Y.B., Han, M., Xiao, J., Summers, R.M.: Deep adversarial one-class learning for normal and abnormal chest radiograph classification. In: Medical Imaging 2019: Computer-Aided Diagnosis. vol. 10950, p. 1095018. International Society for Optics and Photonics (2019)
  • [12] Tang, Y., Wang, J., Gao, B., Dellandréa, E., Gaizauskas, R., Chen, L.: Large scale semi-supervised object detection using visual and semantic knowledge transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2119–2128 (2016)
  • [13] Tang, Y., Wang, J., Wang, X., Gao, B., Dellandréa, E., Gaizauskas, R., Chen, L.: Visual and semantic knowledge transfer for large scale semi-supervised object detection. IEEE transactions on pattern analysis and machine intelligence 40(12), 3045–3058 (2017)
  • [14] Tang, Y., Wang, X., Harrison, A.P., Lu, L., Xiao, J., Summers, R.M.: Attention-guided curriculum learning for weakly supervised classification and localization of thoracic diseases on chest radiographs. In: International Workshop on Machine Learning in Medical Imaging. pp. 249–258. Springer (2018)
  • [15] Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain adaptation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7167–7176 (2017)
  • [16] Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2097–2106 (2017)
  • [17] Yan, K., Tang, Y., Peng, Y., Sandfort, V., Bagheri, M., Lu, Z., Summers, R.M.: Mulan: Multitask universal lesion analysis network for joint lesion detection, tagging, and segmentation. arXiv preprint arXiv:1908.04373 (2019)
  • [18] Zhang, Y., Miao, S., Mansi, T., Liao, R.: Task driven generative modeling for unsupervised domain adaptation: Application to x-ray image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 599–607. Springer (2018)
  • [19] Zhang, Z., Yang, L., Zheng, Y.: Translating and segmenting multimodal medical volumes with cycle-and shape-consistency generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9242–9251 (2018)
  • [20] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision. pp. 2223–2232 (2017)