Multi-Task Driven Explainable Diagnosis of COVID-19 using Chest X-ray Images

by   Aakarsh Malhotra, et al.
IIT Rajasthan
IIIT Delhi

With the increasing number of COVID-19 cases globally, countries are ramping up their testing numbers. While RT-PCR kits are available in sufficient quantity in several countries, others face limited availability of testing kits and processing centers in remote areas. This has motivated researchers to find alternative testing methods that are reliable, easily accessible, and faster. Chest X-ray (CXR) is one of the modalities gaining acceptance for screening. In this direction, the paper makes two primary contributions. Firstly, we present the COVID-19 Multi-Task Network, an automated end-to-end network for COVID-19 screening. The proposed network not only predicts whether the CXR has COVID-19 features present, it also performs semantic segmentation of the regions of interest to make the model explainable. Secondly, with the help of medical professionals, we manually annotate the lung regions of 9000 frontal chest radiographs taken from ChestXray-14, CheXpert, and a consolidated COVID-19 dataset. Further, 200 chest radiographs pertaining to COVID-19 patients are also annotated for semantic segmentation. This database will be released to the research community.



I Introduction

The COVID-19 pandemic has affected the health and well-being of people across the globe and continues to have a devastating effect on the global population. The total number of cases has increased at an alarming rate and has reached the millions worldwide [31]. The increasing number of COVID-19 patients raises the need for effective screening of infected patients. The current testing process for COVID-19 is time-consuming and requires the availability of testing kits. This necessitates alternative screening methods that are available to the general population, cost-effective, time-efficient, and scalable.


Fig. 1: Samples of chest x-ray images. (a) Different kinds of lung abnormalities and (b) AP and PA views corresponding to CheXpert and ChestXray-14, and COVID-19 datasets. The bounding box highlights the diseased region.

Dyspnea is a common symptom of COVID-19. Analyzing chest X-rays, radiologists have observed that the disease introduces specific abnormalities in a patient’s lungs [25]. For instance, COVID-19 pneumonia has a typical appearance on chest radiographs, with bilateral peripheral patchy lung opacities, lower lung distribution, rounded morphology, and absence of pleural effusion and lymphadenopathy. Fig. 1 shows samples of chest x-ray images with different lung abnormalities, including COVID-19. Motivated by this observation and the fact that x-ray imaging is faster, cheaper, accessible, and has scope for portability, many recent studies have proposed machine learning algorithms to predict COVID-19 using CXRs.


In this research, we propose a deep learning network termed the COVID-19 Multi-Task Network (CMTNet), which learns the abnormalities present in chest x-ray images to differentiate between a COVID-19 affected lung and a Non-COVID affected lung. Since the explainability of machine learning systems, particularly for medical applications, is of paramount importance, the proposed network also incorporates the task of lung and disease segmentation. The proposed CMTNet simultaneously processes the input X-ray for semantic lung segmentation, disease localization, and healthy/unhealthy classification. Incorporating additional tasks while performing the primary task of COVID classification has multiple advantages. While processing for COVID classification, the additional segmentation tasks force the network to focus only on lung regions and disease-affected areas. Further, including healthy/unhealthy classification helps the CMTNet to effectively identify a healthy lung. Finally, assistance from other tasks reduces dependence on the enormous amounts of data otherwise required during training. The key research highlights of this work are as follows:

  1. Develop the COVID-19 Multi-Task Network (CMTNet) for classification and segmentation of the lung and disease regions. (In our context, the terms ‘abnormality’, ‘disease’, and ‘radiological finding’ are used synonymously.) The CMTNet further predicts whether the lungs are affected by COVID-19 or Non-COVID-19 disorders and differentiates them from healthy lungs.

  2. Inclusion of simultaneous disease segmentation in the CMTNet helps in making the decisions explainable.

  3. Extensive evaluation and comparison against the existing deep learning algorithms for COVID-19 prediction, lung, and disease segmentation.

  4. Assemble frontal chest x-rays from various sources, that can be used for diverse tasks such as classification and semantic segmentation of lungs and disease.

  5. Creating and publicly releasing manual annotations for lung semantic segmentation for healthy, unhealthy, and COVID-19 affected X-ray images.

Fig. 2: The proposed CMTNet to perform multiple related tasks to improve the classification performance for COVID-19 disease diagnostics using frontal x-ray. The figure contrasts the multitask network with single task network.

II Literature Review

Recently, researchers have proposed AI-based techniques for detecting COVID-19 using chest CT and x-ray images. Apostolopoulos and Mpesiana [2] explored transfer learning through various CNNs and observed that MobileNet v2 [29] yields the best results. Narin et al. [21] proposed to use three CNN models, namely, ResNet50 [10], InceptionV3 [36], and InceptionResNetV2 [35], for detecting COVID-19 using chest x-rays. The authors fine-tuned these pre-trained deep models to distinguish COVID-19 from normal x-rays and found that ResNet-50 performed the best. They used 50 chest x-ray images of COVID-19 patients from a GitHub repository [6] and 50 normal chest X-ray images [20]. Abbas et al. [1] also performed transfer learning by fine-tuning a pre-trained AlexNet [17]. Their dataset comprises 80 normal, 105 COVID-19, and 11 SARS affected lung radiographs. Cohen et al. [5] predicted the severity score for COVID-19 using a deep regression model. Apart from healthy and Non-COVID disease affected frontal lung x-rays, their dataset included 94 COVID-19 affected lung x-rays (all PA views). Oh et al. [24] proposed a patch-based CNN approach for COVID-19 diagnosis. During testing, the final decision is made by majority voting over multiple patches taken at different lung locations.

Studies on interpretation and explainability are limited. Mangal et al. [19] utilized DenseNet121 [12] for classification into four classes: healthy, bacterial pneumonia, viral pneumonia, and COVID-19. They showed Class Activation Maps (CAM) for interpretation. Karim et al. [16] also classified into these four categories using a modified ResNet architecture. With an emphasis on explainability, the authors showed CAM and confusion matrices. Along similar lines, Ghoshal and Tucker [8] showed the application of ResNet50v2 [11] for the above four classes. The authors interpret the results using CAM, confusion matrices, Bayesian uncertainty, and Spearman correlation.

The problem of the small sample size of COVID-19 chest X-ray images was tackled by Loey et al. [18], who generate new COVID-19 infected images using GANs. Zhang et al. [38] create a model composed of three components, namely, a backbone network, a classification head, and an anomaly detection head. Wang and Wong [37] introduced COVID-Net for detecting COVID-19 cases. Further, the authors investigate the predictions made by COVID-Net to gain insights into the critical factors associated with COVID-19 cases. In their work, a three-class classification is performed to distinguish COVID-19 cases from regular and Non-COVID cases.

These studies demonstrate that AI-driven techniques can diagnose COVID-19 using chest x-ray images. Such techniques could potentially overcome the challenge of limited test kits and speed up the screening of COVID-19 cases. However, existing studies have significant limitations. Firstly, the algorithms work as a black box. They predict whether the input x-ray is affected by COVID-19 or some related disease, but most studies fail to explain the decisions, for instance, which lung regions are salient for a specific decision. Secondly, existing studies do not focus on radiological abnormalities such as consolidation, opacities, or pneumothorax. Without a clear emphasis on the lung or the abnormality, it is hard to make an algorithm explainable in an application as crucial as COVID-19 diagnosis. Further, most of these studies work with a limited number of COVID-19 samples, around 100 in most scenarios. Thirdly, as shown in Fig. 1(b), the posteroanterior (PA) and anteroposterior (AP) views of CXR images vary due to the acquisition mechanisms. During training, samples from both views need to be considered, but existing algorithms are generally silent on these details.

III COVID-19 Multi-Task Network (CMTNet)

This section provides the details of the proposed CMTNet. Multi-task networks are known to learn similar and related tasks together based on the input data. As shown in Fig. 2, multi-task networks have a base network with multi-objective outputs. Since each task shares the same base network, the weights are learned to be optimal for all tasks jointly. The four tasks of CMTNet are (i) lung localization, (ii) disease localization, (iii) healthy/unhealthy classification, and (iv) multi-label classification for COVID-19 prediction. These tasks are accomplished by using five loss functions: two for segmentation and three for classification. The details of these loss functions are described in the following subsections.

Let $X$ be the train set with $n$ images and $x_i$ represent an image. $x_i$ is associated with five labels $\{M_i^{L}, M_i^{D}, y_i^{H}, y_i^{C}, y_i^{N}\}$, where $M_i^{L}$ and $M_i^{D}$ represent the ground truth binary masks for lung and disease localization, respectively. $y_i^{H}$, $y_i^{C}$, and $y_i^{N}$ represent the healthy/unhealthy, COVID/Non-COVID, and Non-COVID diseases discriminator labels, respectively. Let $\mathcal{C}$ be the proposed CMTNet that performs the four different tasks. The task set is defined as $T = \{T_L, T_D, T_H, T_C\}$, where $T_L$ and $T_D$ represent the tasks of lung and disease localization, respectively. $T_H$ and $T_C$ represent the tasks of healthy/unhealthy and COVID/Non-COVID classification, respectively.

Fig. 3: Architecture of the proposed COVID-19 Multi-Task Network (CMTNet), which is based on an Encoder-Decoder architecture (best viewed in color).

III-A Segmentation Loss

Chest x-rays contain peripheral organs along with the lung regions. The primary objective of this research is to differentiate between COVID and Non-COVID samples. Since the key information lies in the lungs, the first segmentation task is lung segmentation. The second segmentation loss aims to learn semantic segmentation of the diseased regions.

Lung segmentation can be achieved by learning a model that differentiates between the background and foreground lung regions. The CMTNet accomplishes this by utilizing a VGG16 Encoder-Decoder architecture [3]. The encoder has VGG16 as a base network. It has five blocks with 2, 2, 3, 3, and 3 layers of convolution + batch norm + ReLU, respectively. The decoder network builds upon the representation obtained from the encoder network, with a transposed architecture of the encoder network. At the final layer, the output is derived from a SoftMax layer. The output dimension equals the input spatial resolution of the X-ray image, with the number of channels equal to the number of segmentation classes. Hence, the final layer consists of two channels (lung and non-lung).
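As a rough check on the shapes involved, the sketch below traces the feature-map side length through five encoder blocks and back through a mirrored decoder. It assumes a 224-pixel input and a 2×2 pooling step after each encoder block, as in standard VGG16-style encoders; the pooling step and the helper name `trace_resolution` are assumptions for illustration, not details stated in the text.

```python
# Convolution layers per encoder block, as listed in the text.
ENCODER_BLOCKS = [2, 2, 3, 3, 3]


def trace_resolution(input_size=224, blocks=ENCODER_BLOCKS):
    """Return (encoder_sizes, decoder_sizes) of the feature-map side length.

    Assumption: every encoder block ends with a 2x2 pooling that halves the
    resolution, and every decoder block undoes exactly one pooling step.
    """
    encoder_sizes = []
    size = input_size
    for _ in blocks:
        size //= 2
        encoder_sizes.append(size)

    decoder_sizes = []
    for _ in blocks:
        size *= 2
        decoder_sizes.append(size)
    return encoder_sizes, decoder_sizes


enc, dec = trace_resolution()
print(sum(ENCODER_BLOCKS))  # 13 convolution layers in the encoder
print(enc)                  # [112, 56, 28, 14, 7]
print(dec)                  # [14, 28, 56, 112, 224]
```

Under these assumptions the decoder restores the input resolution, which is what allows the final two-channel output to be compared pixel by pixel with the mask.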

Similar to lung localization, the disease localization also builds upon the encoder representation. However, the disease localization task has a separate decoder branch and is optimized for localizing multiple lung-related disorders. For both the lung and disease localization, the gradients are backpropagated via the decoder network into the encoder layers.

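Let $\mathcal{C}_L$ and $\mathcal{C}_D$ represent the sub-networks for lung and disease localization, respectively. For any image $x_i$, the output predicted binary masks for lung and disease localization are represented as:

$$\hat{M}_i^{L} = \mathcal{C}_L(x_i), \qquad \hat{M}_i^{D} = \mathcal{C}_D(x_i)$$

In this research, binary cross entropy loss is used for lung and disease localization. Mathematically, it is represented as:

$$\mathcal{L}_L(x_i) = -\sum_{(j,k)} \Big[ M_i^{L}(j,k) \log \hat{M}_i^{L}(j,k) + \big(1 - M_i^{L}(j,k)\big) \log \big(1 - \hat{M}_i^{L}(j,k)\big) \Big]$$

$$\mathcal{L}_D(x_i) = -\sum_{(j,k)} \Big[ M_i^{D}(j,k) \log \hat{M}_i^{D}(j,k) + \big(1 - M_i^{D}(j,k)\big) \log \big(1 - \hat{M}_i^{D}(j,k)\big) \Big]$$

where $\mathcal{L}_L$ and $\mathcal{L}_D$ are the lung and disease loss, respectively, for image $x_i$. $M_i^{L}(j,k)$ and $M_i^{D}(j,k)$ represent the pixel value at location $(j,k)$ for the ground truth lung and disease masks, respectively, and the hatted quantities are the corresponding predicted values.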
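The pixel-wise binary cross-entropy described above can be sketched in a few lines. This is a minimal illustration on nested lists, assuming the loss is summed over pixel locations (dividing by the pixel count would give the mean form used by many libraries); the function name `bce_mask_loss` and the clamping constant `eps` are hypothetical choices, not part of the paper.

```python
import math


def bce_mask_loss(pred, target, eps=1e-7):
    """Binary cross-entropy between two 2D masks, summed over all pixels.

    pred   -- predicted foreground probabilities in [0, 1]
    target -- ground-truth binary mask (0 or 1) of the same shape
    eps    -- clamp to avoid log(0); an implementation assumption
    """
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total


target = [[1, 0], [0, 1]]
pred = [[0.9, 0.1], [0.2, 0.8]]
print(round(bce_mask_loss(pred, target), 4))  # about 0.657
```

The same function would serve both the lung and the disease branch, since each compares one predicted mask with one ground-truth mask.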

III-B Classification Loss

The two classification tasks are healthy/unhealthy classification of the lung X-ray and multi-label classification for the presence of COVID-19 or other abnormalities. These tasks are performed using three classification loss functions. The lung and disease localization provides supervision for the three classification losses. For healthy/unhealthy, COVID/Non-COVID, and Non-COVID diseases discrimination, two branches are derived over the compact encoder representation (after GAP). Each branch has three fully connected (FC) layers, followed by ReLU, ReLU, and SoftMax activations, respectively (Fig. 3).

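Let $\mathcal{C}_H$ and $\mathcal{C}_C$ represent the sub-networks for healthy/unhealthy and multi-label classification, respectively. The output of $\mathcal{C}_H$ for image $x_i$ is represented as:

$$P(c \mid x_i) = \mathcal{C}_H(x_i)$$

where $P(c \mid x_i)$ is the probability of predicting image $x_i$ to class $c$. With $y_i^{H}$ denoting the ground truth healthy/unhealthy label, the loss function for healthy/unhealthy classification is represented as:

$$\mathcal{L}_H(x_i) = -\sum_{c} \mathbb{1}\big[y_i^{H} = c\big] \log P(c \mid x_i)$$

where $\mathcal{L}_H$ represents the healthy/unhealthy loss for image $x_i$. For multi-label classification, the output of sub-network $\mathcal{C}_C$ for an image $x_i$ is written as:

$$\big(\hat{y}_i^{C}, \hat{y}_i^{N}\big) = \mathcal{C}_C(x_i)$$

where $\hat{y}_i^{C}$ and $\hat{y}_i^{N}$ represent the output predicted score ($\in [0, 1]$) for the COVID/Non-COVID and Non-COVID diseases discriminator, respectively. The radiological findings of COVID-19 pneumonia may overlap those of other viral pneumonia and acute respiratory distress syndrome due to other etiologies. The network needs supervision to segregate COVID-19 pneumonia from Non-COVID lung diseases. Hence, the joint optimization for COVID/Non-COVID along with Non-COVID diseases discrimination helps differentiate COVID-19 affected lungs from lungs affected with diseases other than COVID-19. With $y_i^{C}$ and $y_i^{N}$ denoting the corresponding ground truth labels, the joint loss for predicting both COVID/Non-COVID and Non-COVID diseases discrimination is written as:

$$\mathcal{L}_C(x_i) = -\Big[ y_i^{C} \log \hat{y}_i^{C} + \big(1 - y_i^{C}\big) \log \big(1 - \hat{y}_i^{C}\big) \Big] - \Big[ y_i^{N} \log \hat{y}_i^{N} + \big(1 - y_i^{N}\big) \log \big(1 - \hat{y}_i^{N}\big) \Big]$$
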
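Overall Loss Function: It is possible that the ground truth labels or segmentation masks are not available for all the images during training. In this case, not all branches of the network will be active during training of CMTNet. For instance, if the ground truth mask is unavailable for disease segmentation, then the sub-network $\mathcal{C}_D$ will remain inactive and the loss $\mathcal{L}_D$ for image $x_i$ will become zero. In the same manner, other losses can have a 0/1 “switch”. Therefore, the total loss is computed as:

$$\mathcal{L}(x_i) = s_L \, \mathcal{L}_L(x_i) + s_D \, \mathcal{L}_D(x_i) + s_H \, \mathcal{L}_H(x_i) + s_C \, \mathcal{L}_C(x_i)$$

where $\mathcal{L}_L$, $\mathcal{L}_D$, $\mathcal{L}_H$, and $\mathcal{L}_C$ denote the lung, disease, healthy/unhealthy, and joint multi-label losses, and $s_L$, $s_D$, $s_H$, and $s_C$ are the switches pertaining to the tasks $T_L$, $T_D$, $T_H$, and $T_C$, respectively. The values of these switches are either 0 or 1 depending on the availability of ground truth labels/masks of the respective tasks for the image.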
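The 0/1 switching of task losses can be illustrated with a small sketch. It assumes equal weighting of the per-task losses and uses hypothetical task names (`lung`, `disease`, `healthy`, `covid`); in a real training loop the inactive branch would simply not be evaluated.

```python
def total_loss(losses, available):
    """Sum per-task losses, switching off tasks whose ground truth is missing.

    losses    -- dict mapping a task name to its loss value for one image
    available -- dict mapping a task name to True if its label/mask exists
    """
    total = 0.0
    for task, loss in losses.items():
        switch = 1 if available.get(task, False) else 0
        total += switch * loss
    return total


# One image with every label except the disease mask.
losses = {"lung": 0.5, "disease": 0.25, "healthy": 0.125, "covid": 0.0625}
available = {"lung": True, "disease": False, "healthy": True, "covid": True}
print(total_loss(losses, available))  # 0.6875
```

Because the disease switch is 0, that term contributes nothing and no gradient would flow through the disease decoder for this image.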

Database              Health status   View   Images
Chest X-Ray-14 [27]   Healthy         PA     4088
                      Healthy         AP     2688
                      Unhealthy       PA     3469
                      Unhealthy       AP     3115
CheXpert [14]         Healthy         PA     1331
                      Healthy         AP     2163
                      Unhealthy       PA     3279
                      Unhealthy       AP     11305
TABLE I: Details of the databases used in the experiments.
Source             AP View   PA View   Total Images
GitHub [6]         50        26        76
Italy [33]         30        39        69
Spain [13]         0         110       110
RadioPaedia [26]   9         85        94
BSTI [4]           3         39        42
EuroRad [7]        6         18        24
Total              98        317       415
TABLE II: Details for the COVID-19 databases used in the experiments.

IV Experimental Details

We next summarize the databases used for training and testing, the lung and disease annotations performed as part of this research, and the implementation details.

IV-A Database and Protocol

For the different tasks of the network, we require a chest X-ray database with multiple annotations and diverse properties. Thus, the database for experiments is created by combining subsets from the ChestXray-14, CheXpert, and COVID-19 infected X-ray databases. We use only frontal X-rays in our experiments, drawn from the following publicly available databases:

Fig. 4: Annotations provided for COVID-19 affected frontal lung x-ray images as a part of this study: (a) Labeled COVID-19 X-ray for locations of radiological finding, (b) Description of the radiological finding, (c) Corresponding binary masks for training deep semantic segmentation algorithms for disease segmentation.
Mask Disease-wise Views
Normal Covid
Train 8730 1456 8173 1740 16551 10161 16690
Test 1837 251 2077 125 4097 2464 3968
Total 10567 1707 10250 1865 20648 12625 20658
TABLE III: Details of train-test split across different parameters. Train set for Covid-19 includes augmentation.
  • ChestXray-14 [27]: The dataset contains healthy and unhealthy x-ray images. It has a total of 112,120 chest x-ray images, out of which 67,310 are PA view images and the remaining 44,810 are AP view. Multiple radiographs of the same patient taken at different times are also present. From the database, we derive a subset of 13,360 images, spanning both PA and AP views. The unhealthy X-rays are labeled for one or more of 14 classes, marking the presence of different radiological signs such as pleural effusion and consolidation. Additionally, the dataset provides localization information of abnormalities for 880 X-rays. The details of the subset drawn from ChestXray-14 are given in Table I.

  • CheXpert [14]: The CheXpert dataset contains a total of 223,414 chest x-ray images, out of which 29,420 are PA view, 161,590 are AP view, and the remaining are lateral or single lung view images. Multiple case studies of the same patient are available in the dataset. This dataset contains healthy and unhealthy X-ray images. We selected a subset of 18,078 images. Based on the radiological findings, each X-ray image is labeled positive/negative for 14 pre-defined classes (a few overlapping with ChestXray-14). The details of the x-ray images selected from the CheXpert database are shown in Table I.

  • COVID-19: For this study, we collected a total of 415 X-rays from various internet sources. The sources have a mixed number of PA and AP view frontal chest x-ray. The number of X-rays collected from each source has been summarized in Table II.

Since the above COVID-19 subset has a limited number of images, we perform data augmentation. Each image is augmented in five ways: clockwise rotation by 10°, anti-clockwise rotation by 10°, and translation by 10 pixels in the X, Y, and XY directions. Since pneumonia is a closely related pathology to COVID [22], we select all the pneumonia samples of the ChestXray-14 and CheXpert datasets. Further, to accommodate the variations in non-healthy x-ray samples, about 50% more unhealthy samples are selected compared to healthy samples. AP view x-rays are more prominent than PA views in the CheXpert dataset. Hence, we select more AP view X-ray images.
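The five augmentations can be written down as a small plan, together with a pure-Python translation on a list-of-lists image. This is an illustrative sketch only: the rotation unit is assumed to be degrees, the sign convention and zero padding are assumptions, rotation itself is left to an image library, and `translate` and `augmented_count` are hypothetical helper names.

```python
# The five augmentations, encoded as (rotation_degrees, dx, dy).
# Positive rotation means clockwise here; this convention is an assumption.
AUGMENTATIONS = [
    (10, 0, 0),    # clockwise rotation
    (-10, 0, 0),   # anti-clockwise rotation
    (0, 10, 0),    # translation along X
    (0, 0, 10),    # translation along Y
    (0, 10, 10),   # translation along X and Y
]


def translate(image, dx, dy, fill=0):
    """Shift a 2D list-of-lists image by (dx, dy) pixels, padding with fill."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                out[ny][nx] = image[y][x]
    return out


def augmented_count(n_originals, n_augmentations=len(AUGMENTATIONS)):
    """Number of samples when each original is kept alongside its copies."""
    return n_originals * (1 + n_augmentations)


print(translate([[1, 2], [3, 4]], 1, 0))  # [[0, 1], [0, 3]]
print(augmented_count(290))               # 1740
```

If the originals are kept alongside their five augmented copies, each image yields six samples, so 290 images would expand to 1740.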

The data is split into 80% training and 20% testing, in a subject disjoint manner, ensuring that there is no patient overlap in the train and test sets. The details of the dataset split across different properties are specified in Table III. Note that all the numbers mentioned in the table are post-augmentation.
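A subject-disjoint split can be obtained by partitioning patient identifiers rather than individual images. The sketch below is one possible way to do this, assuming each record carries a patient ID; the record layout, the function name `subject_disjoint_split`, and the fixed seed are illustrative assumptions, and the resulting image-level ratio is only approximately 80/20 when patients have different numbers of images.

```python
import random


def subject_disjoint_split(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_id) records so no patient is in both sets."""
    patients = sorted({patient_id for patient_id, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patients)

    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test


# Ten hypothetical patients with two radiographs each.
records = [(f"p{i}", f"img{i}_{j}") for i in range(10) for j in range(2)]
train, test = subject_disjoint_split(records)
train_ids = {pid for pid, _ in train}
test_ids = {pid for pid, _ in test}
print(len(train), len(test))          # 16 4
print(train_ids.isdisjoint(test_ids))  # True
```

Splitting at the patient level matters here because both source datasets contain multiple radiographs of the same patient.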

IV-B Lung and Disease Region Annotation

The datasets mentioned above lack lung localization details. The proposed CMTNet requires a ground-truth lung location to identify the lung region from the x-ray. For this purpose, we manually annotated a total of about 9000 lung x-rays. These x-rays include well-balanced healthy/unhealthy, AP/PA subsets taken equally from the CheXpert and ChestXray-14 datasets. All x-ray images available for COVID-19 are also manually annotated for lung segmentation. Mask for each x-ray image has been created by drawing two solid bounding boxes, corresponding to the area covered by each lung. As a part of this study, we also plan to release the ground truth masks for the manually annotated lung regions.
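Turning two solid bounding boxes into a binary lung mask is a simple rasterization step. The sketch below shows one way to do it, assuming boxes are given as `(x0, y0, x1, y1)` pixel coordinates with exclusive upper bounds; the coordinate convention and the function name `boxes_to_mask` are assumptions for illustration.

```python
def boxes_to_mask(height, width, boxes):
    """Rasterize solid boxes (x0, y0, x1, y1) into a 0/1 mask.

    Upper bounds are exclusive, and boxes are clipped to the image extent.
    """
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask


# A toy 4x6 "radiograph" with one box per lung.
mask = boxes_to_mask(4, 6, [(0, 1, 2, 3), (4, 1, 6, 3)])
for row in mask:
    print(row)
# [0, 0, 0, 0, 0, 0]
# [1, 1, 0, 0, 1, 1]
# [1, 1, 0, 0, 1, 1]
# [0, 0, 0, 0, 0, 0]
```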

The datasets included as a part of this study have only 880 images with disease localization annotations (from the ChestXray-14 database). For COVID-19 affected frontal lung x-ray images, we lacked disease segmentation masks. Hence, as a part of this study, the x-ray images are annotated by a radiologist for various radiological findings. The findings the radiologists looked for include: (i) atelectasis, (ii) consolidation, (iii) interstitial shadows (reticular, nodular, ground glass), (iv) pneumothorax, (v) pleural effusion, (vi) pleural thickening, (vii) cardiomegaly, and (viii) lung lesion. The experts annotated a total of 200 COVID-19 affected chest x-rays. A few sample annotations can be seen in Fig. 4(a), with the corresponding descriptions in Fig. 4(b). Training deep learning algorithms requires binary masks as annotation. Hence, we created these masks based on the annotations (Fig. 4(c)). We will release the ground truth binary masks to promote the training of deep semantic segmentation algorithms for abnormality localization.

IV-C Implementation Details

The proposed multi-task network requires input X-ray images of size 224×224×3. The encoder stream is initialized using a pre-trained VGG16 model. With a batch size of 16, the model is optimized over binary cross-entropy loss using the Adam optimizer. Each loss is weighted equally. The model is trained for 30 epochs on an NVIDIA GeForce RTX 2080Ti and implemented in PyTorch.

V Results and Analysis

We next evaluate the performance of the proposed CMTNet for classification and localization tasks. The performance is compared with existing deep learning algorithms for COVID-19 chest radiograph studies. Further, to study the effectiveness of the proposed CMTNet, we perform experiments by selecting different combinations of sub-networks from the CMTNet.

Fig. 5: Samples of lung segmentation output for existing algorithms and the proposed CMTNet.

Fig. 6: Samples of semantic disease segmentation for existing algorithms and the proposed CMTNet. The x-ray images and corresponding abnormality localization for “Unhealthy” are derived from ChestXray-14 database [27].

V-A Lung and Disease Localization

In this subsection, the segmentation results of the proposed CMTNet are compared against region predictions from UNet [28], Mask RCNN [9], and SegNet [3]. For lung segmentation, sample predictions of the proposed and existing algorithms are shown in Fig. 5. Additional samples for lung segmentation can be seen in Fig. 1 of the supplementary material. From the sample predictions, we observe that all four algorithms perform well and give comparable results. However, the proposed CMTNet yields the most precise bound for lung segmentation. Since the lung and disease localization tasks are performed simultaneously, and diseases are present within the lungs, the lung decoder network learns to focus more on the lung regions than on regions outside the lungs.

For disease segmentation, the prediction results are shown in Fig. 6. Additional results are shown in Fig. 2 of the supplementary material. The first two rows of Fig. 6 illustrate abnormalities in COVID-19 affected lungs, while the last two rows show abnormality localization in unhealthy but Non-COVID affected lungs. Of the four algorithms, UNet performs the worst. From the perspective of shape, Mask-RCNN tends to provide well-defined shape boundaries for Non-COVID unhealthy lungs. On the other hand, SegNet and CMTNet provide irregularly shaped predictions, localizing the radiological findings compactly. Overall, we observe that each of the four algorithms predicts additional regions for the abnormalities. The detected abnormalities include false positive regions when compared to the ground-truth, sometimes localizing better than the ground-truth (for SegNet and the proposed CMTNet). The same trend is elaborated in Fig. 3 of the supplementary material.

Further, we observe that for certain abnormalities in the ‘Unhealthy’ case, deep models fail to localize the abnormality. One of the reasons for this is the limited training data for abnormality localization, combined with large variations in the diseased regions. The unhealthy Non-COVID lung abnormalities are derived from ChestXray-14, which has 700 samples corresponding to 14 labels. As a result of the small sample size for each abnormality, the networks cannot localize diseases properly. However, the proposed CMTNet has assistance from other tasks. For instance, the lung prediction task would implicitly reinforce CMTNet to predict diseases within the lung. Hence, of the four algorithms, the proposed CMTNet provides the prediction that overlaps most with the ground truth.

Compared to 700 samples for 14+ different radiological findings (approx. 50 images per abnormality), the COVID-19 affected lung x-rays are 290 in number (prior to augmentation). The majority of the COVID-19 affected chest radiographs demonstrate consolidations, which tend to be bilateral and more common in the lower zones [30]. Hence, deep models have more samples from which to learn the localization of COVID-19 specific abnormalities than of other diseases (290 vs. 50). Accordingly, the first two rows of Fig. 6 illustrate that all four models perform relatively better for COVID-19 localization than for the “unhealthy” localization in the last two rows. In most cases, each of the four models predicts affected regions in the lower lung zones bilaterally. However, the proposed CMTNet outperforms the other algorithms. For instance, in the first row of Fig. 6, both Mask-RCNN and SegNet tend to leave out the darker region in the right lung, while the ground-truth and CMTNet have that region marked as diseased. Further, in the low contrast x-ray in row two, the less opaque part of the right lower lung looks darker (despite being diseased). Hence, UNet fails to detect any finding in the right lower lung, while Mask RCNN and SegNet detect only a few small regions. Nevertheless, the proposed CMTNet can detect such faint differences in lung density. The same pattern can also be noticed in the low contrast x-ray (row two) of Fig. 2 in the supplementary material.

Algorithm                           Sensitivity @        Sensitivity @        EER (%)
                                    Specificity = 99%    Specificity = 90%
DenseNet121 + FC                    60.80                90.40                9.82
MobileNetv2 + FC                    67.20                93.60                8.04
ResNet18 + FC                       56.00                81.60                13.78
VGG19 + FC                          50.40                82.40                13.70
CMTNet Embedding                    79.20                95.20                7.34
CMTNet Embedding + SVM (Sigmoid)    6.40                 24.00                41.46
CMTNet Embedding + SVM (Gaussian)   82.40                88.80                11.38
CMTNet Embedding                    82.40                88.80                11.38
CMTNet (Proposed)                   87.20                96.80                7.30
TABLE IV: Evaluation and comparison of the proposed CMTNet with existing learning algorithms for COVID-19 prediction. FC represents fully connected classification layers.

Fig. 7: Standard deviation ($\sigma$) of Sensitivity (at 1% FAR) for different algorithms. The performance is computed for different initializations of the deep networks. The results show the stability in sensitivity for CMTNet, which delivers consistent results across initializations.
Fig. 8: ROC curves summarizing the performance for COVID-19 classification: (a) comparing the proposed CMTNet with existing deep learning models, and (b) CMTNet embedding in combination with different classifiers.

V-B Classification

Next, we evaluate the CMTNet’s performance for healthy/unhealthy classification (Task 3) and multi-label classification of COVID-19 and other diseases (Task 4). The results of the CMTNet are compared against popular deep networks. These include DenseNet121 [19][12], MobileNetv2 [2][29], ResNet18 [21][10], and VGG19 [32]. For each of these networks, the ImageNet pre-trained version is selected. The model is then fine-tuned with the dataset and protocol used for the proposed CMTNet. Further, we draw a comparison with RDF [15] and SVM [34] with three different kernels (sigmoid, gaussian, and RBF). The training of RDF and SVM is performed using feature embeddings of the training samples, obtained from the last encoder layer of the CMTNet.

The results for classification performance are presented in Table IV. It is observed that the proposed CMTNet achieves a sensitivity of 87.20% at 99% specificity, with an overall test classification accuracy of 98.79%. The proposed CMTNet achieves the highest TPR and lowest EER compared to the existing algorithms. With the implicit supervision from the lung and disease localization tasks, the proposed CMTNet outperforms all other existing algorithms. To assess the stability of the different algorithms, each network is trained three times with different initialization parameters. Across these training runs, we report the standard deviation in sensitivity to evaluate the stability (a lower standard deviation implies higher stability). As shown in Fig. 7, the proposed CMTNet is the most stable algorithm across different initializations. Classifiers that use embeddings from CMTNet also report lower standard deviation. Hence, it can be inferred that CMTNet consistently provides a discriminative representation, resulting in stable performance. Fig. 8 further compares the proposed CMTNet and existing algorithms using ROC curves.
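The metric reported in Table IV, sensitivity at a fixed specificity, can be computed from raw classifier scores as sketched below. This is a minimal illustration rather than the authors' evaluation code: it assumes a sample is called positive when its score is strictly above the threshold, and the function name and the small tolerance used before rounding are implementation assumptions.

```python
import math


def sensitivity_at_specificity(pos_scores, neg_scores, specificity=0.99):
    """Sensitivity at the lowest threshold that reaches the target specificity.

    A sample is predicted positive when its score is strictly above the
    threshold, so negatives at or below the threshold are true negatives.
    """
    neg_sorted = sorted(neg_scores)
    # Number of negatives that must fall at or below the threshold.
    n_keep = math.ceil(specificity * len(neg_sorted) - 1e-9)
    threshold = neg_sorted[n_keep - 1] if n_keep > 0 else float("-inf")
    true_positives = sum(1 for s in pos_scores if s > threshold)
    return true_positives / len(pos_scores)


neg = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.9]
pos = [0.3, 0.5, 0.7, 0.95]
print(sensitivity_at_specificity(pos, neg, specificity=0.9))  # 0.75
```

In this toy example, nine of the ten negatives must be rejected, which puts the threshold at 0.45 and leaves three of the four positives above it.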

Fig. 9: COVID-19 positive case misclassified as both healthy and Non-COVID by the proposed CMTNet.

Fig. 10: Interpretation of feature representation based on (a) ground-truth and (b) predicted labels using t-SNE plot for COVID/Non-COVID classification.

The CMTNet’s classification of the COVID-19 samples into the healthy and unhealthy classes is also analyzed. The proposed network classifies 97.25% of COVID-19 samples into the unhealthy class and 2.75% into the healthy class. The high TPR of the COVID-19 class, together with the majority of the COVID-19 samples being classified as unhealthy, showcases the effectiveness of the proposed network for COVID-19 detection. Overall, the performance of healthy/unhealthy classification is 75.17% over all the test samples, while that of Non-COVID disease classification is 73.87%. Fig. 9 shows some of the misclassified samples, where the proposed CMTNet predicts COVID-19 positive instances (as per the RT-PCR test) as healthy (Task 3). Correspondingly, the same samples are also predicted as Non-COVID by Task 4 of the proposed CMTNet. In retrospect, we believe that minimal opacities in the lung region could be the probable cause of misclassification. This led us to check the ground truth for the hospitalization day. Of the four misclassified samples shown in Fig. 9, three turned out to be from the early days of the patient’s hospitalization (up to day 3). Based on these observations, it can be concluded that the CMTNet predicts an x-ray as affected when opacities and consolidations are present.

V-C Ablation Study

To study the importance of different tasks in the proposed CMTNet, we perform an ablation study by choosing different combinations of tasks. The four tasks in the CMTNet are Task 1: Semantic lung segmentation, Task 2: Semantic disease segmentation, Task 3: Healthy/Unhealthy classification of the lung X-ray, and Task 4: Multi-label classification for the presence of COVID-19 or other diseases. With at least one segmentation task included, we perform six different ablation study experiments, which are presented in the six rows of Table V.

It is observed that for COVID-19 prediction, each task (loss function) has an important role. Removing any of the three assisting tasks deteriorates the performance. Of these three assisting tasks, the lung segmentation task plays a pivotal role. A common trait of a COVID-19 affected X-ray is that the lungs are affected bilaterally. Hence, the comprehensive view provided by the lung segmentation task gives more weight to the lung regions, resulting in better performance with Task 1 than with any other task. We perform disease segmentation and healthy/unhealthy classification since their efficacy improves in conjunction with lung segmentation and they have a positive impact on Non-COVID disease classification. As validated by the ground-truth t-SNE feature space plot (shown in Fig. 10(a)), the predictions of the test COVID-19 samples (Fig. 10(b)) are well separated from Non-COVID samples. This shows that the model can distinguish COVID-19 affected samples and can predict unseen test labels correctly.

Tasks used         COVID-19 Sensitivity (%)
All 4 Tasks        96.80
Task 1 and 4       94.40
Task 2 and 4       57.60
Task 1, 2 and 4    92.80
Task 1, 3 and 4    87.20
Task 2, 3 and 4    54.40
TABLE V: An ablation study on reducing the number of tasks and observing its effect on COVID-19 prediction.
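To make the effect of each configuration easier to read, the following sketch computes the drop in COVID-19 sensitivity relative to the full four-task model, using only the values reported in Table V. The variable names are hypothetical.

```python
# COVID-19 sensitivity values as reported in Table V, keyed by the tasks used.
TABLE_V = {
    (1, 2, 3, 4): 96.80,
    (1, 4): 94.40,
    (2, 4): 57.60,
    (1, 2, 4): 92.80,
    (1, 3, 4): 87.20,
    (2, 3, 4): 54.40,
}

FULL = (1, 2, 3, 4)

# Drop (in percentage points) of each reduced configuration versus all four tasks.
drops = {
    tasks: round(TABLE_V[FULL] - sens, 2)
    for tasks, sens in TABLE_V.items()
    if tasks != FULL
}

# Split the configurations by whether lung segmentation (Task 1) is included.
with_task1 = [s for t, s in TABLE_V.items() if 1 in t]
without_task1 = [s for t, s in TABLE_V.items() if 1 not in t]

for tasks, drop in sorted(drops.items(), key=lambda kv: kv[1]):
    print(f"Tasks {tasks}: -{drop:.2f} points")
print("Lowest sensitivity with Task 1:", min(with_task1))
print("Highest sensitivity without Task 1:", max(without_task1))
```

Every configuration that includes Task 1 stays at or above 87.20, while every configuration without it stays at or below 57.60, which is consistent with the pivotal role attributed to lung segmentation.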

VI Conclusion

In the face of the SARS-CoV-2 pandemic, it has become essential to perform mass screening and testing of patients. However, many countries around the world are not equipped with enough laboratory testing kits or medical personnel to do so. X-ray imaging is among the most popular, inexpensive and widely available imaging technologies across the world. This paper attempts to provide an “explainable solution” for detecting COVID-19 pneumonia in patients through chest radiographs. We propose the CMTNet, which performs the tasks of classification and segmentation simultaneously. Experiments conducted on different chest radiograph datasets show promising results for the proposed algorithm in COVID-19 prediction. The ablation study also supports the use of different tasks in the proposed multi-task network.


Aakarsh Malhotra is partially supported by Visvesvaraya Ph.D. Fellowship. Surbhi Mittal is partially supported by UGC-Net JRF Fellowship. Puspita Majumdar is partially supported by DST Inspire Ph.D. Fellowship.


  • [1] A. Abbas, M. M. Abdelsamea, and M. M. Gaber (2020) Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. arXiv preprint arXiv:2003.13815. Cited by: §II.
  • [2] I. D. Apostolopoulos and T. A. Mpesiana (2020) Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine, pp. 1. Cited by: §II, §V-B.
  • [3] V. Badrinarayanan, A. Kendall, and R. Cipolla (2017) Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12), pp. 2481–2495. Cited by: §III-A, §V-A.
  • [4] BSTI (2020) COVID-19 BSTI IMAGING DATABASE. External Links: Link Cited by: TABLE II.
  • [5] J. P. Cohen, L. Dao, P. Morrison, K. Roth, Y. Bengio, B. Shen, A. Abbasi, M. Hoshmand-Kochi, M. Ghassemi, H. Li, et al. (2020) Predicting covid-19 pneumonia severity on chest x-ray with deep learning. arXiv preprint arXiv:2005.11856. Cited by: §II.
  • [6] J. P. Cohen, P. Morrison, and L. Dao (2020) COVID-19 image data collection. arXiv 2003.11597. External Links: Link Cited by: §II, TABLE II.
  • [7] EuroRad (2020) EuroRad Search results for COVID-19. External Links: Link Cited by: TABLE II.
  • [8] B. Ghoshal and A. Tucker (2020) Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection. arXiv preprint arXiv:2003.10769. Cited by: §II.
  • [9] K. He, G. Gkioxari, P. Dollár, and R. Girshick (2017) Mask R-CNN. In IEEE International Conference on Computer Vision, pp. 2961–2969. Cited by: §V-A.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §II, §V-B.
  • [11] K. He, X. Zhang, S. Ren, and J. Sun (2016) Identity mappings in deep residual networks. In European Conference on Computer Vision, pp. 630–645. Cited by: §II.
  • [12] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely Connected Convolutional Networks. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708. Cited by: §II, §V-B.
  • [13] C. Imaging (2020) COVID-19 CXR Spain. External Links: Link Cited by: TABLE II.
  • [14] J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B. Haghgoo, R. Ball, K. Shpanskaya, et al. (2019) Chexpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In AAAI Conference on Artificial Intelligence, Vol. 33, pp. 590–597. Cited by: TABLE I, 2nd item.
  • [15] H. T. Kam et al. (1995) Random decision forest. In International Conference on Document Analysis and Recognition, Vol. 1416, pp. 278–282. Cited by: §V-B.
  • [16] M. Karim, T. Döhmen, D. Rebholz-Schuhmann, S. Decker, M. Cochez, O. Beyan, et al. (2020) Deepcovidexplainer: Explainable covid-19 predictions based on chest x-ray images. arXiv preprint arXiv:2004.04582. Cited by: §II.
  • [17] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105. Cited by: §II.
  • [18] M. Loey, F. Smarandache, and N. E. M Khalifa (2020) Within the Lack of Chest COVID-19 X-ray Dataset: A Novel Detection Model Based on GAN and Deep Transfer Learning. Symmetry 12 (4), pp. 651. Cited by: §II.
  • [19] A. Mangal, S. Kalia, H. Rajgopal, K. Rangarajan, V. Namboodiri, S. Banerjee, and C. Arora (2020) CovidAID: COVID-19 Detection Using Chest X-Ray. arXiv preprint arXiv:2004.09803. Cited by: §II, §V-B.
  • [20] P. Mooney (2018) Chest X-Ray Images (Pneumonia). External Links: Link Cited by: §II.
  • [21] A. Narin, C. Kaya, and Z. Pamuk (2020) Automatic detection of coronavirus disease (covid-19) using x-ray images and deep convolutional neural networks. arXiv preprint arXiv:2003.10849. Cited by: §II, §V-B.
  • [22] B. Nazari (2020) Coronavirus and Pneumonia. External Links: Link Cited by: §IV-A.
  • [23] T. T. Nguyen (2020) Artificial intelligence in the battle against coronavirus (COVID-19): a survey and future research directions. Preprint, DOI: 10.13140/RG.2.2.36491.23846 10. Cited by: §I.
  • [24] Y. Oh, S. Park, and J. C. Ye (2020) Deep learning covid-19 features on cxr using limited training data sets. IEEE Transactions on Medical Imaging, DOI: 10.1109/TMI.2020.2993291. Cited by: §II.
  • [25] F. Pan, T. Ye, P. Sun, S. Gui, B. Liang, L. Li, D. Zheng, J. Wang, R. L. Hesketh, L. Yang, et al. (2020) Time course of lung changes on chest CT during recovery from 2019 novel coronavirus (COVID-19) pneumonia. Radiology, pp. 200370. Cited by: §I.
  • [26] RadioPaedia (2020) Search results for “covid 19”. External Links: Link Cited by: TABLE II.
  • [27] P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya, et al. (2017) Chexnet: radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225. Cited by: TABLE I, 1st item, Fig. 6.
  • [28] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §V-A.
  • [29] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520. Cited by: §II, §V-B.
  • [30] J. Sawani (2020) How Does COVID-19 Appear in the Lungs?. External Links: Link Cited by: §V-A.
  • [31] A. Schiffmann (2020) World COVID-19 Stats. Cited by: §I.
  • [32] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §V-B.
  • [33] SIRM (2020) COVID-19 Database. External Links: Link Cited by: TABLE II.
  • [34] J. A. Suykens and J. Vandewalle (1999) Least squares support vector machine classifiers. Neural Processing Letters 9 (3), pp. 293–300. Cited by: §V-B.
  • [35] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI Conference on Artificial Intelligence. Cited by: §II.
  • [36] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826. Cited by: §II.
  • [37] L. Wang and A. Wong (2020) COVID-net: a tailored deep convolutional neural network design for detection of covid-19 cases from chest radiography images. arXiv preprint arXiv:2003.09871. Cited by: §II.
  • [38] J. Zhang, Y. Xie, Y. Li, C. Shen, and Y. Xia (2020) Covid-19 screening on chest x-ray images using deep learning based anomaly detection. arXiv preprint arXiv:2003.12338. Cited by: §II.