The dataset used in COVID-DA: Deep Domain Adaptation from Typical Pneumonia to COVID-19
The outbreak of novel coronavirus disease 2019 (COVID-19) has already infected millions of people and is still rapidly spreading all over the globe. Most COVID-19 patients suffer from lung infection, so one important diagnostic method is to screen chest radiography images, e.g., X-ray or CT images. However, such examinations are time-consuming and labor-intensive, leading to limited diagnostic efficiency. To solve this issue, AI-based technologies, such as deep learning, have recently been used as effective computer-aided means to improve diagnostic efficiency. However, one practical and critical difficulty is the limited availability of annotated COVID-19 data, due to prohibitive annotation costs and the urgent work of doctors fighting the pandemic. This makes the learning of deep diagnosis models very challenging. To address this, motivated by the facts that typical pneumonia shares characteristics with COVID-19 and that many pneumonia datasets are publicly available, we propose to conduct domain knowledge adaptation from typical pneumonia to COVID-19. There are two main challenges: 1) the discrepancy of data distributions between domains; 2) the task difference between the diagnosis of typical pneumonia and COVID-19. To address them, we propose a new deep domain adaptation method for COVID-19 diagnosis, namely COVID-DA. Specifically, we alleviate the domain discrepancy via feature adversarial adaptation and handle the task difference issue via a novel classifier separation scheme. In this way, COVID-DA is able to diagnose COVID-19 effectively with only a small number of COVID-19 annotations. Extensive experiments verify the effectiveness of COVID-DA and its great potential for real-world applications.
The outbreak of novel coronavirus disease 2019 (COVID-19) has rapidly spread worldwide [46, 42]. To date (April 28, 2020), there have been 2,954,222 confirmed cases of COVID-19 (including 202,597 deaths, a fatality rate of 6.9%), posing a great threat to global public health. Due to COVID-19, many countries have been forced into states of emergency and have suffered devastating effects on population health and the social economy [13, 3]. To fight COVID-19, one key step is to diagnose patients and provide immediate medical treatment, thereby preventing further spread of the disease.
Currently, the main diagnosis method for COVID-19 is the real-time Reverse-Transcriptase Polymerase Chain Reaction (RT-PCR) test, which is regarded as the gold standard for COVID-19 detection. However, RT-PCR has relatively low diagnostic sensitivity and generally requires repeated tests for the final confirmation of infection. As a result, the RT-PCR test is very time-consuming and laborious. Meanwhile, it is extremely difficult for hospitals in hyper-endemic regions to provide sufficient RT-PCR tests for the large number of suspected patients. To overcome this issue, an alternative diagnostic method is based on the screening of chest radiography images (CRIs), e.g., X-ray or computed tomography (CT) images, since COVID-19 patients often present abnormal characteristics of lung infection on CRIs [19, 28]. Compared with RT-PCR, the CRI-based diagnosis method is more efficient and has been widely used in clinical practice [12, 36]. Nevertheless, when dealing with large numbers of patients, medical specialists still need to screen CRIs one by one, which is highly stressful and time-consuming. Hence, there is an urgent need to develop computer-aided methods for the diagnosis of COVID-19, which help to improve the diagnostic efficiency of medical specialists.
Recently, deep learning (DL) has achieved remarkable success in medical image analysis [15, 44, 2, 1, 38, 17, 29]. It is a natural idea to develop DL-based methods for COVID-19 diagnosis. One of the key factors behind the success of DL is the large amount of labeled data. However, in the diagnosis of COVID-19, such extensive annotations are currently unavailable due to prohibitive annotation costs and the urgent work of doctors fighting the pandemic. Hence, there is a strong motivation to develop domain adaptation to improve DL-based diagnostic models for COVID-19. Specifically, domain adaptation leverages a source domain with rich labeled data to help the model learn on the target domain. Considering that typical pneumonia shares some characteristics with COVID-19 and that many open-source pneumonia datasets are accessible, we seek to conduct domain adaptation from typical pneumonia to COVID-19 in this paper.
In this task, there are two major challenges. The first one is the domain discrepancy of data distributions, which mainly derives from different medical imaging devices or techniques. As a result, a deep model trained on the typical pneumonia domain tends to perform poorly when applied directly to the COVID-19 domain. The second challenge lies in the diagnostic task difference between typical pneumonia and COVID-19. The two tasks are similar but not identical, which may result in poor generalization of the source-trained classifier to the target domain. However, most existing domain adaptation methods [49, 40, 41] neglect the task difference issue and adopt only one domain-shared classifier. As a result, they may perform poorly in COVID-19 diagnosis.
To solve the above challenges, we propose a novel deep domain adaptation method for the diagnosis of COVID-19 (namely COVID-DA), which relies on domain adversarial learning and a new classifier separation scheme. To be specific, we alleviate the domain discrepancy by aligning the feature distributions of two domains via feature adversarial adaptation. In this way, COVID-DA is able to learn domain-invariant features for classification. Based on such features, we handle the task difference issue based on a novel classifier separation scheme, which disentangles the classifier into a domain-shared classifier and two domain-specific classifiers (for two domains). Specifically, the domain-shared classifier aims to learn task-shared classification knowledge between typical pneumonia and COVID-19, while the domain-specific classifiers seek to learn task-specific classification knowledge. To this end, we train the domain-shared classifier to learn task-shared semantic information by aligning the joint distributions of two domains over features and predictions. Meanwhile, we maximize the diversity between the domain-shared and domain-specific classifiers to make the latter ones focus on task-specific classification information. Based on the above, COVID-DA is able to diagnose COVID-19 effectively with only a limited amount of labeled data, and thus is more applicable in real-world applications.
Our main contributions are summarized as follows:
We propose a novel deep domain adaptation method for the diagnosis of COVID-19. To the best of our knowledge, this is the first attempt to study domain adaptation from typical pneumonia to COVID-19.
Based on a novel classifier separation scheme and a new domain adversarial adaptation method, the proposed method is able to overcome the task difference and domain discrepancy simultaneously.
We conduct extensive experiments to evaluate the proposed method. Promising results demonstrate its effectiveness and superiority, e.g., the proposed method improves the diagnostic performance for COVID-19 from 0.6875 to 0.9298 in terms of the F1 metric.
To control the transmission of COVID-19, one of the most important steps is to screen out the infected patients and then provide proper treatment for them. Due to its relatively low time and labour costs, chest radiography imaging (CRI), e.g., X-ray or computed tomography (CT) imaging, has been widely adopted to provide diagnostic evidence for radiologists. However, when facing large numbers of suspected patients, it is still time-consuming for radiologists to screen medical images one by one, leading to inferior diagnostic efficiency. To address this, many computer-aided diagnosis methods based on deep learning techniques have been developed, and some of them have been deployed in hospitals. For example, Xu et al. proposed a deep learning (DL) method for the early detection of COVID-19 from Influenza-A viral pneumonia and normal cases. Chen et al. proposed a UNet++ based deep model for segmenting the infected regions of COVID-19. Nevertheless, since deep models are notoriously data-hungry, these DL-based methods require plenty of annotated data to achieve satisfactory performance. In the diagnosis of COVID-19, however, such rich supervision is unavailable in most practical scenarios due to prohibitive annotation costs and the urgent work of doctors fighting the pandemic.
To solve this issue, recent studies [43, 48] directly combined publicly available typical pneumonia datasets and COVID-19 dataset together to train a multi-class classification model. However, these methods ignore the domain discrepancy between typical pneumonia and COVID-19, thereby resulting in limited diagnostic performance for COVID-19. Therefore, there is an urgent need to develop task-specific domain adaptation methods for COVID-19 to improve the performance of DL-based diagnosis models.
Most existing domain adaptation methods for natural images seek to alleviate the domain discrepancy either by adding adaptation layers to match high-order moments of distributions, e.g., DDC, or by devising a domain discriminator to learn domain-invariant features in an adversarial manner, e.g., DANN and MCD. Following the latter approach, CLAN proposed to conduct category-aware domain adaptation instead of only global alignment of domain distributions. In medical image analysis, by taking the characteristics of medical imaging into account, Kamnitsas et al. introduced a multi-connected domain discriminator for improved adversarial training. Ren et al. proposed a Siamese architecture on the target domain to add a regularization for whole-slide images. However, all these methods ignore the task difference between domains, and thus may perform poorly on the adaptation from typical pneumonia to COVID-19. To handle this, we propose a new deep domain adaptation method for COVID-19, which aims to alleviate the domain discrepancy and overcome the task difference simultaneously. In this way, the proposed method is able to diagnose COVID-19 more effectively in real-world applications.
Problem Definition. This paper studies the problem of domain adaptation from typical pneumonia (source domain) to COVID-19 (target domain), where the model has access to only limited labeled data from the target domain. Formally, let $\mathcal{D}_s$ be the labeled source data, $\mathcal{D}_{tl}$ be the limited labeled target data and $\mathcal{D}_{tu}$ be the unlabeled target data. Here, $n_s$, $n_{tl}$ and $n_{tu}$ denote the numbers of source, labeled target and unlabeled target samples, where $n_{tl}$ is small because target annotations are scarce. Moreover, let $\mathcal{D}_t = \mathcal{D}_{tl} \cup \mathcal{D}_{tu}$ be the complete target domain with $n_t = n_{tl} + n_{tu}$ as the sample number.
The goal is to learn a well-performing deep model for the target domain, using both source samples (labeled) and target samples (partially labeled). This task, however, is very difficult due to (1) the limited number of labeled samples in the target domain and (2) the apparent discrepancy between typical pneumonia and COVID-19 in terms of domain distributions and tasks. Existing domain adaptation methods for medical images focus only on alleviating the discrepancy in domain distributions, while ignoring the task difference between domains. As a result, directly applying them to this task tends to perform poorly in practice. To solve this, we propose a new deep domain adaptation method for the diagnosis of COVID-19, namely COVID-DA.
To enforce effective domain knowledge adaptation, we seek to alleviate the domain discrepancy with domain adversarial adaptation and handle the task difference via a novel classifier separation scheme. To this end, as shown in Fig. 1, COVID-DA consists of three main parts: (1) a domain-shared feature extractor for extracting domain-invariant features; (2) two domain discriminators for feature adaptation and classifier adaptation, respectively; (3) a domain-shared classifier and two domain-specific classifiers for the diagnosis of COVID-19. Note that the separation of domain-shared and domain-specific classifiers helps to disentangle task-shared and task-specific pathological information regarding typical pneumonia and COVID-19.
Overall, COVID-DA adopts three main strategies. (a) Feature adversarial adaptation: we impose a domain loss to align the feature distributions of the two domains, so that the domain discrepancy is minimized in an adversarial learning manner [40, 49]. (b) Classifier adversarial adaptation: we exploit a domain loss to conduct joint distribution alignment for the domain-shared classifier, enabling it to learn domain-shared pathological information in an adversarial learning manner. (c) Classifier diversity maximization: we maximize the diversity between the domain-shared and domain-specific classifiers via a diversity loss, so that the domain-specific classifiers can learn task-specific information in the two domains. Note that strategies (b) and (c) help COVID-DA to handle the task difference effectively. Lastly, we train the feature extractor and all classifiers via a focal classification loss, which makes the model class-imbalance-aware and discriminative. In this way, COVID-DA is able to adapt the source domain knowledge to the target domain and diagnose COVID-19 effectively.
The overall training procedure of COVID-DA solves a minimax problem over two groups of parameters: the parameters of the feature extractor and all classifiers are optimized against the parameters of the two discriminators. Two trade-off parameters weight the domain adversarial losses and the classifier diversity loss, respectively, relative to the focal classification loss.
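One way to write this objective, consistent with the description of the losses in this section, is sketched below. All symbol names are assumptions introduced for illustration: $\theta_g$ stands for the parameters of the feature extractor and all classifiers, $\theta_d$ for the parameters of the two discriminators, $\mathcal{L}_{fo}$ for the focal classification loss, $\mathcal{L}_d^{f}$ and $\mathcal{L}_d^{c}$ for the feature-level and classifier-level domain losses, $\mathcal{L}_{div}$ for a diversity loss that is small when the classifiers disagree, and $\alpha$, $\beta$ for the trade-off parameters.

```latex
\min_{\theta_g}\;\max_{\theta_d}\;
  \mathcal{L}_{fo}(\theta_g)
  \;-\;\alpha\,\bigl[\mathcal{L}_d^{f}(\theta_g,\theta_d)
                   +\mathcal{L}_d^{c}(\theta_g,\theta_d)\bigr]
  \;+\;\beta\,\mathcal{L}_{div}(\theta_g)
```

Under this sign convention the discriminators (maximizing over $\theta_d$) minimize the domain losses, while the feature extractor and the domain-shared classifier (minimizing over $\theta_g$) maximize them, which matches the adversarial roles described in the text.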
Diverse imaging devices and preprocessing techniques intrinsically result in a large domain discrepancy in terms of data distribution. To resolve this discrepancy, we resort to domain adversarial learning to align the feature distributions of the two domains. Specifically, on the one hand, a domain discriminator is trained to distinguish the feature representations of the two domains by minimizing a domain loss. On the other hand, the feature extractor is trained to confuse the discriminator by maximizing the same domain loss. As a result, the learned feature extractor is able to extract domain-invariant features that confuse the discriminator well. Based on the least square distance, we define the domain loss for feature adversarial adaptation as the squared difference between the discriminator's prediction for a sample and that sample's domain label, where we set the label of the target domain to 1 and that of the source domain to 0.
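The least-squares domain loss described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name, the use of plain lists of discriminator scores, and the averaging over all samples are choices made here, not details confirmed by the text. Only the label convention (source 0, target 1) is taken from the paper.

```python
def least_squares_domain_loss(source_scores, target_scores):
    """Mean squared error between discriminator scores and domain labels.

    Label convention from the text: source -> 0, target -> 1.
    Averaging over all samples is an assumption of this sketch.
    """
    errors = [(s - 0.0) ** 2 for s in source_scores]
    errors += [(t - 1.0) ** 2 for t in target_scores]
    return sum(errors) / len(errors)


# A discriminator that separates the domains well has a low loss ...
well_separated = least_squares_domain_loss([0.1, 0.0], [0.9, 1.0])
# ... while a fully confused one (every score at 0.5) has a higher loss.
confused = least_squares_domain_loss([0.5, 0.5], [0.5, 0.5])
print(well_separated < confused)
```

The discriminator is trained to push this value down, whereas the feature extractor is trained to push it up, so that features from both domains become hard to tell apart.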
Most existing adversarial domain adaptation methods [21, 14, 49] assume that the source domain deals with the same classification task as the target domain. Hence, they usually focus on feature distribution alignment as in Section III-B and use a classifier trained on the source domain to classify the target data. However, they may fail to handle the problem in this paper, since pneumonia diagnosis and COVID-19 diagnosis are similar but not identical, as the two diseases originate from different pathological mechanisms. Specifically, let $P_s(f)$ and $P_t(f)$ denote the feature distributions of the source and target domains, and let $P_s(\hat{y}\mid f)$ and $P_t(\hat{y}\mid f)$ be the prediction conditional distributions of the two domains. Even though the feature distributions have been matched (i.e., $P_s(f) = P_t(f)$), the task difference potentially results in different prediction conditional distributions (i.e., $P_s(\hat{y}\mid f) \neq P_t(\hat{y}\mid f)$) [27, 25]. As a result, using only the source-trained classifier may not be able to diagnose COVID-19 well.
To solve this issue, as shown in Fig. 1, we propose a novel classifier separation scheme that disentangles the domain-shared classifier from the domain-specific classifiers. To be specific, the domain-shared classifier seeks to acquire task-shared classification knowledge, while the domain-specific classifiers aim to learn task-specific knowledge. By taking the average of the domain-shared prediction and the corresponding domain-specific prediction as the final prediction, COVID-DA is able to handle the task difference and diagnose COVID-19 well.
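The averaging step can be illustrated with a small, framework-free sketch. The function name and the representation of predictions as lists of class probabilities are assumptions made for this example.

```python
def ensemble_prediction(shared_probs, specific_probs):
    """Average class probabilities from the shared and a domain-specific head."""
    if len(shared_probs) != len(specific_probs):
        raise ValueError("both heads must predict the same number of classes")
    return [(a + b) / 2.0 for a, b in zip(shared_probs, specific_probs)]


# For a target (COVID-19) sample, the shared head is paired with the
# target-specific head; for a source sample, with the source-specific head.
final = ensemble_prediction([0.25, 0.75], [0.75, 0.25])
print(final)  # [0.5, 0.5]
```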
In this scheme, one key issue is how to learn the domain-shared classifier. In fact, when the feature distributions match well (i.e., $P_s(f) = P_t(f)$), if we train the classifier to align the joint distributions over features and predictions (i.e., $P_s(f, \hat{y}) = P_t(f, \hat{y})$), then the domain-shared classifier is able to learn task-shared classification knowledge, since $P(f, \hat{y}) = P(\hat{y}\mid f)\,P(f)$ and the conditional distributions must then agree as well. Motivated by this, we propose to train the domain-shared classifier by aligning the joint distributions via domain adversarial learning.
On the one hand, a discriminator is trained to differentiate the joint distributions of the two domains by minimizing a domain loss. Specifically, the input of the discriminator consists of both features and predictions. On the other hand, the domain-shared classifier is trained to confuse the discriminator by maximizing the domain loss. As in Section III-B, we define the domain loss for classifier adversarial adaptation based on the least square distance: it is the squared difference between the domain label and the discriminator's prediction on the joint input formed by the feature of a sample and the domain-shared classifier's prediction for it. As before, we denote the label of the target domain as 1 and that of the source domain as 0.
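A possible closed form of this classifier-level domain loss is sketched below. The notation is assumed for illustration: $G_f$ is the feature extractor, $C_{sh}$ the domain-shared classifier, $D_c$ the discriminator on the joint input, $[\cdot,\cdot]$ denotes concatenation, $d_x$ is the domain label of sample $x$, and the normalization by the total number of samples is a choice of this sketch.

```latex
\mathcal{L}_d^{c}
  = \frac{1}{n_s + n_t}
    \sum_{x \in \mathcal{D}_s \cup \mathcal{D}_t}
    \Bigl( D_c\bigl([\,G_f(x),\; C_{sh}(G_f(x))\,]\bigr) - d_x \Bigr)^{2},
\qquad
d_x =
\begin{cases}
  0, & x \in \mathcal{D}_s \\
  1, & x \in \mathcal{D}_t
\end{cases}
```

The only difference from the feature-level loss is the discriminator input: the prediction of the shared classifier is appended to the feature, so the alignment acts on the joint distribution rather than on the features alone.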
In COVID-DA, we handle the issue of task difference by disentangling the domain-shared and domain-specific classifiers. In Section III-C, we have enforced the domain-shared classifier to acquire domain-invariant classification knowledge. In this section, we further enforce the domain-specific classifiers to acquire domain-private classification knowledge. To this end, we maximize the diversity between each domain-specific classifier and the domain-shared classifier. Specifically, given any sample $x$, let $\hat{y}_{sh}$ denote the prediction of the domain-shared classifier, and let $\hat{y}_{s}$ and $\hat{y}_{t}$ denote the predictions of the source-specific and target-specific classifiers. Based on the cosine distance, we maximize the classifier diversity through a diversity loss that compares $\hat{y}_{sh}$ with the corresponding domain-specific prediction.
In this way, the two domain-specific classifiers differ from the domain-shared classifier as much as possible, and are thus able to learn domain-specific classification information. Moreover, since the final prediction of COVID-DA is the average ensemble of two classifiers, maximizing classifier diversity also enhances the performance of ensemble learning.
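A minimal sketch of a cosine-based diversity term is given below. It assumes the loss is the mean cosine similarity between paired predictions, so that minimizing it maximizes the cosine distance; the function names and the exact averaging are illustrative assumptions rather than the paper's definition.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def diversity_loss(shared_preds, specific_preds):
    """Mean cosine similarity over paired predictions (lower = more diverse)."""
    sims = [cosine_similarity(p, q) for p, q in zip(shared_preds, specific_preds)]
    return sum(sims) / len(sims)


# Identical predictions give the maximum value, orthogonal ones give zero.
print(diversity_loss([[1.0, 0.0]], [[1.0, 0.0]]))
print(diversity_loss([[1.0, 0.0]], [[0.0, 1.0]]))
```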
For the diagnosis of COVID-19, one can adopt any classification loss to train COVID-DA, e.g., cross-entropy. Nevertheless, considering the class imbalance issue in medical diagnosis, we use the focal loss, a widely used loss for class-imbalanced problems. The loss is computed on the final prediction $\hat{y}$ of COVID-DA for a given sample $x$, where $\hat{y}$ is the average prediction of the domain-shared classifier and the corresponding domain-specific (source or target) classifier. The focal loss combines the true label with the log-prediction through an element-wise product $\odot$ and down-weights well-classified samples by the factor $(1-\hat{y})^{\gamma}$, where $\gamma$ is a hyper-parameter of the focal loss.
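For concreteness, a per-sample focal loss can be sketched as follows. The function name, the one-hot label encoding and the `eps` clamp are assumptions of this example; the default `gamma=2.0` is the value suggested in the original focal loss paper and is not necessarily the value used in these experiments.

```python
import math


def focal_loss(probs, onehot, gamma=2.0, eps=1e-12):
    """Focal loss for one sample: -sum_k y_k * (1 - p_k)^gamma * log(p_k)."""
    return -sum(
        y * (1.0 - p) ** gamma * math.log(max(p, eps))
        for p, y in zip(probs, onehot)
    )


# With gamma = 0 the focal loss reduces to the ordinary cross-entropy.
cross_entropy = focal_loss([0.9, 0.1], [1, 0], gamma=0.0)
# With gamma > 0 a confidently correct sample contributes much less.
focal = focal_loss([0.9, 0.1], [1, 0], gamma=2.0)
print(focal < cross_entropy)
```

Down-weighting easy samples lets the rare class (here, COVID-19 cases) contribute more to the gradient than it would under plain cross-entropy.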
The minimax problem can be optimized with a gradient reversal layer, which reverses the gradient of the domain loss when backpropagating to the feature extractor or domain-shared classifier. In this way, we are able to train COVID-DA through standard backpropagation in an end-to-end manner.
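The behaviour of such a gradient-reversing step can be shown with a framework-free sketch. The class below is hypothetical and written by hand for illustration; in an autograd framework the same idea would be expressed as a custom differentiable function, and the `scale` argument is an assumption that plays the role of a trade-off weight.

```python
class GradientReversal:
    """Identity in the forward pass; negates (and scales) gradients going back."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def forward(self, x):
        # Features pass through unchanged on their way to the discriminator.
        return list(x)

    def backward(self, grad_output):
        # The gradient of the domain loss is flipped before it reaches the
        # feature extractor, so a descent step there increases the domain loss.
        return [-self.scale * g for g in grad_output]


layer = GradientReversal(scale=0.5)
print(layer.forward([1.0, -2.0]))   # [1.0, -2.0]
print(layer.backward([2.0, -4.0]))  # [-1.0, 2.0]
```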
To verify the proposed method (we will make the source code publicly available), we evaluate COVID-DA on two main aspects: (1) the performance in the diagnosis of COVID-19; (2) the algorithm characteristics of COVID-DA.
The dataset used in this experiment (available at https://github.com/qiuzhen8484/COVID-DA.git) is collected from three open-source datasets, i.e., the COVID chest X-ray dataset, the COVID-19 Radiography Database (https://www.kaggle.com/tawsifurrahman/covid19-radiography-database) and the dataset of the RSNA Pneumonia Detection Challenge on Kaggle (https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data). Based on these collected data, we randomly choose part of the normal cases and all typical pneumonia cases to make up the source domain, and use the rest of the normal cases and all COVID-19 cases as the target domain. The statistics of the two domains are summarized in Table I.
Note that only 30% of the training COVID-19 samples are labeled in the training process, which would be more practical in real-world scenarios. Moreover, the three datasets are acquired from different countries with various imaging devices, while the tasks of two domains are similar but not completely the same. Therefore, this domain adaptation task suffers from severe domain discrepancy and task difference. In addition, as shown in Table I, the class imbalance is also severe. Considering the above issues, such a diagnosis task of COVID-19 is very challenging.
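The partial-labeling setup can be mimicked with a short sketch. How the labeled subset was actually selected is not stated here, so the random shuffle, the fixed seed and the function name are all assumptions; only the 30% fraction comes from the text.

```python
import random


def split_labeled(sample_ids, labeled_fraction=0.3, seed=0):
    """Randomly mark a fraction of training samples as labeled."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_labeled = round(labeled_fraction * len(ids))
    return ids[:n_labeled], ids[n_labeled:]


# Ten hypothetical training COVID-19 samples: 3 keep their label, 7 do not.
labeled, unlabeled = split_labeled(range(10))
print(len(labeled), len(unlabeled))  # 3 7
```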
We compare COVID-DA with four categories of methods. (1) Baselines: Source-only (training the model on the well-labeled source domain), Target-only (training the model on the target domain with limited labels) and Fine-tuning (training the model on the well-labeled source domain and then fine-tuning it on the target domain with limited labels); (2) Deep diagnostic models for COVID-19: DLAD and COVID-Net (trained on the labeled target domain); (3) Unsupervised domain adaptation: MCD, DSN, DANN and DMAN, which train deep models on both the labeled source domain and the unlabeled target domain; (4) Semi-supervised domain adaptation: SDT and semi-DMAN (extended from DMAN). Note that the semi-supervised domain adaptation methods train deep models using both labeled source data and partially labeled target data.
|Method||F1 (%)||Recall (%)||Precision (%)||AUC||Sum (%)||Cost|
|Backbone||F1 (%)||Recall (%)||Precision (%)||AUC||Sum (%)||Cost|
We implement our method based on PyTorch. For a fair comparison, we adopt a ResNet-18 model, pretrained on ImageNet, as the backbone of all methods. For all compared methods, we keep the same hyper-parameters as in the original papers. For COVID-DA, the details consist of two parts. (1) Network architectures: we implement the feature extractor based on ResNet-18, the two domain discriminators as two-layer fully-connected networks, and each classifier as one fully-connected layer. (2) Parameter settings: in the training process, we use an SGD optimizer with a learning rate of 0.001 to train the whole network. The batch size for each domain is set to 16. The two trade-off parameters of the overall objective (Section III-A) are selected via cross-validation, and the hyper-parameter of the focal loss is set following prior work.
We evaluate the diagnostic performance of all considered methods for COVID-19 in terms of six metrics: F1 score (%), Recall (%), Precision (%), AUC, Sum (%) and Cost. Specifically, the first three metrics are calculated following previous studies [49, 50], the AUC metric follows prior work, and the Sum and Cost metrics are based on [51, 52]. We refer readers to these papers for detailed implementations of these metrics.
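As a reminder of how the first three metrics relate to each other, the sketch below computes precision, recall and F1 from confusion counts using the standard definitions. The counts are hypothetical, chosen only so that the resulting percentages line up with the 92.98 / 88.33 / 98.15 (F1 / Recall / Precision) row reported in the tables; they are not the paper's actual confusion matrix.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 53 detected COVID-19 cases, 1 false alarm, 7 misses.
p, r, f1 = precision_recall_f1(tp=53, fp=1, fn=7)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
# precision=0.9815 recall=0.8833 f1=0.9298
```

Because F1 is the harmonic mean of precision and recall, it is pulled toward the lower of the two, which is why a recall of 88.33% limits the F1 score even when precision is above 98%.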
We evaluate all methods in terms of six metrics and report the results in Table II. Overall, the proposed method performs the best, which confirms its effectiveness and superiority in COVID-19 diagnosis. Since there is an urgent need for computer-aided diagnosis for COVID-19, COVID-DA makes great medical sense in practice.
According to Table II, we draw the following observations. (1) Neither Target-only nor Fine-tuning consistently outperforms Source-only on all considered evaluation metrics. One possible reason is that the deep model may overfit to the limited labeled data of the target domain, and thus generalize poorly to the test set. (2) Deep diagnostic models (DLAD and COVID-Net) outperform Target-only, which demonstrates the contribution of specially devised architectures. (3) Most unsupervised domain adaptation (DA) methods (e.g., DANN, DSN and DMAN) outperform Source-only, which confirms the effectiveness of DA. (4) The semi-supervised DA methods (i.e., SDT and semi-DMAN) further improve the model performance on the target domain. This result verifies the contribution of limited target annotations in DA. (5) COVID-DA outperforms all compared methods, which demonstrates the ability of the proposed method to leverage both well-labeled source data and partially labeled target data.
|Parameter||Value||F1 (%)||Recall (%)||Precision (%)||AUC||Sum (%)||Cost|
|Objective||Measurement||F1 (%)||Recall (%)||Precision (%)||AUC||Sum (%)||Cost|
|Domain loss||Least Square Loss||92.98||88.33||98.15||0.985||94.11||6.4|
In this section, we use the Gradient-weighted Class Activation Mapping  (Grad-CAM) method to visualize the important regions that the devised classifiers focus on for predictions. We present the visualization results of three COVID-19 patients in Fig. 2. As expected, the domain-shared classifier focuses more on the surrounding regions to capture the task-shared classification information, while the target-specific classifier focuses more on pathological regions of COVID-19 itself. By combining the above two classifiers, COVID-DA is able to make predictions based on both task-shared and target-specific classification information. In addition, the visualized Grad-CAMs are also an interpretation for the prediction of COVID-DA, which helps doctors to judge the prediction reliability in practice.
We conduct ablation studies to evaluate the effectiveness of different components in COVID-DA. As shown in Table III, all components of our method (i.e., feature adversarial adaptation, classifier adversarial adaptation, classifier diversity maximization, and focal loss) make empirical contributions and play important roles. In particular, the two domain adversarial losses (for feature and classifier adaptation) are relatively important. Moreover, the classifier diversity loss is able to further improve diagnostic performance. These results demonstrate the necessity of reducing the domain discrepancy and overcoming the task difference in domain adaptation for COVID-19.
As mentioned in Section IV-A3, we fix the two trade-off parameters in all experiments, where one adjusts the domain adversarial losses (for feature and classifier adaptation) and the other controls the classifier diversity loss. In this section, we evaluate the sensitivity of these two parameters, varying one parameter at a time while fixing all others. From Table IV, our proposed method achieves the best or relatively good performance with the adopted setting.
In our method, we define the domain adversarial losses (in Eqns. (2) and (3)) relying on the least square loss, since it helps to improve domain confusion and stabilize training by preserving the domain distance information. In this section, we empirically compare it with the focal loss and the original GAN loss. As shown in Table V, the least-square loss outperforms the other two losses, which demonstrates the superiority of the adopted domain loss.
In our method, we define the classifier diversity loss (in Eqn. (4)) relying on the cosine distance. In fact, one can also use other distance metrics, e.g., L1 distance, L2 distance, KL divergence and JS divergence, according to the tasks at hand. To find the most suitable one, we conduct many preliminary experiments and report the results in Table V. To be specific, the cosine distance performs the best over all considered distances in the domain adaptation task for COVID-19.
In this paper, we have proposed a deep domain adaptation method for the diagnosis of COVID-19 (namely COVID-DA), which aims to transfer the domain knowledge from the well-labeled source domain (i.e., typical pneumonia) to the partially-labeled target domain (i.e., COVID-19). To be specific, we minimize the domain discrepancy by aligning the feature distributions of two domains via domain adversarial learning. Meanwhile, we develop a novel classifier separation scheme to overcome the issue of task difference between domains. In this way, the proposed method is able to learn a well-performed deep model with very limited annotations of COVID-19. Extensive experiments demonstrate the effectiveness and superiority of COVID-DA.
It is worth mentioning that our proposed method is of great clinical importance, since extensive annotations of COVID-19 are inaccessible now, while there is an urgent demand to develop deep learning based diagnosis methods for COVID-19. In the future, one can apply COVID-DA to CT imaging-based diagnosis of COVID-19 and extend it to the segmentation task for fine-grained diagnosis of COVID-19.
Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Transactions on Medical Imaging 35 (5), pp. 1207–1216.
Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for COVID-19. arXiv.
Deep transfer learning with joint adaptation networks. In ICML, pp. 2208–2217.
COVID-19 screening on chest X-ray images using deep learning based anomaly detection. arXiv.
Online adaptive asymmetric active learning for budgeted imbalanced data. In SIGKDD, pp. 2768–2777.