A Cascaded Learning Strategy for Robust COVID-19 Pneumonia Chest X-Ray Screening

04/24/2020 ∙ by Chun-Fu Yeh, et al. ∙ National Taiwan University 1

We introduce a comprehensive screening platform for the COVID-19 (a.k.a., SARS-CoV-2) pneumonia. The proposed AI-based system works on chest x-ray (CXR) images to predict whether a patient is infected with the COVID-19 disease. Although the recent international joint effort on making the availability of all sorts of open data, the public collection of CXR images is still relatively small for reliably training a deep neural network (DNN) to carry out COVID-19 prediction. To better address such inefficiency, we design a cascaded learning strategy to improve both the sensitivity and the specificity of the resulting DNN classification model. Our approach leverages a large CXR image dataset of non-COVID-19 pneumonia to generalize the original well-trained classification model via a cascaded learning scheme. The resulting screening system is shown to achieve good classification performance on the expanded dataset, including those newly added COVID-19 CXR images.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 4

page 7

page 11

page 12

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The outbreak of COVID-19 disease (a.k.a., SARS-CoV-2) has been affecting the world in an unprecedented way. While intensively global research efforts are being made to seek its effective treatments or vaccines, at the core of the most urgent concern is to prevent the pandemic from further spreading into an uncontrollable and chaotic status. In this work, we are endeavoring to establish an open AI-based platform to seamlessly carry out preliminary large-scale screening of potential COVID-19 patients, and provide early detection at the onset of the infection from the viewpoint of radiology imaging.

Countries such as the United States and Japan are now actively practicing social distancing and seem to receive encouraging outcomes to curb further spread of COVID-19. Still, an affordable and reliable procedure to effectively screen the potentially infected patients is much needed. The reverse-transcription polymerase chain reaction (RT-PCR) testing is the preliminary evaluation to assess whether a subject is at risk of COVID-19. However, the less satisfactory sensitivity of RT-PCR has its limitation on reducing the false-negative rate [3] and may overlook a good number of COVID-19 patients, especially in their early stage of infection. It is essential to find a complementary testing to boost the screening confidence and chest radiology, especially chest x-ray (CXR), with the advantages of affordability, efficiency and reliability, is promising in this regard. As will be demonstrated in the experiments, the proposed AI-based CXR screening system of COVID-19 could effectively detect the infection with high sensitivity, even several days prior to the confirmed RT-PCR results.

Compared with chest computer tomography (CT), chest x-ray imaging is more suitable for being incorporating into a large-scale AI-based screening platform for COVID-19. The supporting evidence is threefold. First, performing CT scanning for comprehensive screening is not practical to contain a pandemic outbreak in that even the well-established healthcare system of a developed country simply does not have such a capacity to do so. Second, cleaning the CT scanning equipment takes considerably longer time than the case of using chest x-ray. The deep cleaning is necessary to prevent subsequent patients from being infected of COVID-19. Third, chest x-rays are already widely adopted as a de facto screening procedure, while CT scanning equipment is not equally popular and is mostly available in primary healthcare institutes. Their distinction in public use also reflects in the availability of large-scale open datasets of chest x-ray images over CT. The ease of collecting image data is crucial in training a reliable AI-based screening system, especially under the current circumstance.

The proposed chest x-ray screening platform leverages the pneumonia classification system developed by Taiwan AI Labs to detect COVID-19. To extend the model, we collaborate with several medical research centers in Taiwan to collect chest x-ray images from COVID-19 patients at various stages, and re-train the pneumonia classification system using a three-stage cascaded learning strategy. Specifically, in the first stage, the AI model is trained to predict from a given CXR image the regions of interest, i.e., the mask of lung regions. In the second stage, features from the predicted regions are extracted to decide whether the image is a positive case of pneumonia. In the final and the third stage, the platform would further make decision on whether the underlying case is COVID-19 or another type of pneumonia. The resulting screening system also yields the stage-wise predicted heatmaps and thus provides explainable image clues leading to the classifications. To demonstrate the effectiveness of the proposed platform for COVID-19 screening, we report the inference results from both the open datasets and those from the collected clinical cases.

2 Methodology

Figure 1: Our AI-based COVID-19 screening platform.

We denote the dataset for learning the classification network as , where is a chest x-ray image and is the pneumonia class label. In our implementation, we have to reflect that is a CXR image of Normal case and for the COVID-19 and non-COVID-19 Pneumonia, respectively. The training set can be further decomposed as and to indicate that is the original (large) collection for conventional pneumonia classification and includes all the COVID-19 CXR images. The overall network architecture of the proposed three-stage cascaded learning is illustrated in Fig. 1. We detail the design details as follows.

Lung segmentation.

The convenience of working with an open dataset often comes with a price—the image quality could vary among samples in the collection and noisy or irrelevant annotations may be included in some images. Figures 2(A) and 2(B) illustrate examples of such issues regarding the image or annotation quality from an open dataset of medical images.

To prevent an AI-based system from learning inconsistent information or noisy annotations, we design the pneumonia classification system to focus on the essential regions of interest, which in our case is the lung areas. We consider a U-Net model [6] to first predict a mask of the lung regions in a CXR image, and use the resulting mask to filter out image areas that could distract the AI system from learning the crucial pneumonia-relevant features for the underlying classification problem. Simply put, the task of the first stage, as shown in Fig. 2(C), can be thought of as a preprocessing step to segment the mask of the lung regions in each given CXR image.

Figure 2: (A) Misleading model attentions (right column) caused by non-informative areas (left column). (B) Four examples of “noisy” CXR images from open datasets. The alphabet at the upper-right corner, date stamp, and arrow marks in CXR images may confuse an AI-based system in both training and inference. (C) Stage 1 of the proposed DNN architecture functions as a preprocessing step to filter out non-informative areas and enable the system to focus on the lung regions.

Pneumonia classification.

Having excluded non-informative regions from each CXR image , the task of the succeeding second stage of the proposed cascaded learning is to carry out binary classifications: Normal versus Pneumonia. That is, the DenseNet-121 [4] pneumonia classifier is trained to predict as negative if , and otherwise positive if . Under this design setting, all CXR images of pneumonia are expected to be classified as positive no matter what their type is. After all, the goal of this stage is to filter out non-pneumonia samples from further considerations.

It is constructive to explain the pneumonia classification outcome of a CXR image by using the CAM [9] interpretation technique. As shown in Fig. 1, performing global average pooling (GAP) yields a

-D feature vector

, which can be reshaped into a tensor and then upsampled to of the original input resolution, . We express the CAM modeling at stage 2 as follows.

(1)

The heatmap derived in (1) can be interpreted as the importance distribution over the lung regions leading to the pneumonia classification outcome of . Intuitively, the AI-based readout of a CAM heatmap can be used to aide physicians to identify anomalous spots in the input CXR image .

Recall that the training dataset comprises and . The latter includes the COVID-19 images and is expected to expand constantly when new CXR samples are provided by our collaborative medical centers. The frequent changes in the available training data may require re-training the AI model to improve classification performance on COVID-19 images, while it also may degrade the overall pneumonia classification in stage 2. To overcome this dilemma, we deploy incremental learning to ensure a robust model learning in stage 2. We first adopt a pneumonia classification model by Taiwan AI Labs, which is pre-trained on , and then perform incremental learning on by optimizing the model parameters

with respect to the following loss function:

(2)

where is the cross-entropy loss, is the knowledge distillation loss for learning the prediction output of the pre-trained model and is a parameter to weigh the effect of knowledge distillation. When is set to , the incremental learning is simply reduced to re-training the model with the updated data .

COVID-19 screening.

At the stage 3 of the cascaded learning, the task is to distinguish the specific type of pneumonia: COVID () and non-COVID (). Notice that the dataset solely comprises non-COVID CXR images, which could correspond to normal cases and viral or bacterial pneumonia. We can write the input to the stage 3 classification as

(3)

where symbolizes pixel-wise product. In Fig. 1, an example of is illustrated as a mask of informative regions relevant to predicting an input CXR image as a COVID-19 or non-COVID-19 pneumonia case at the stage 3.

Analogous to the derivation of in (1), we can explore the heatmap responses to gain insights into how the classification system distinguishes COVID-19 from other types of pneumonia. We consider GradCAM [7] to generate the heatmap in that the resulting GradCAM responses tend to be concentrated rather than diffusive. The heatmap yields interpretable image cues relevant to how likely the AI system predicts the pneumonia case of as COVID-19. Inspired by [7], we also generate Guided-GradCAM (Guided Activation) as detailed responses to further focus on patterns to distinguish COVID-19 pneumonia.

3 Experiments

Data source Number of images Data split (train/val/test)
Total N P C Total N P C
Padchest 41,364 36,142 5,222 - 28,955 4,142 -
3,591 547
3,596 533
RSNA 18,406 8,851 9,555 - 7,092 7,616 -
882 947
877 992
Covid-chestxray-dataset 167 - - 167 89 - - 89
43 43
35 35
Table 1: Open datasets used in the experiments. Abbreviations: N: Normal, P: Pneumonia (non-COVID-19), C: COVID-19 [2]
Data source Number of images/cases Data split (train/val/test)
Total N P C Total N P C
NTUH 107/72 37/37 60/25 10/10 48 14 29 5
30 12 15 3
29 11 16 2
TMUH 27/8 - - 27/8 16 - - 16
8 8
3 3
NHIA 3,375 - 3,069 306 2,033 - 1,836 197
668 - 614 54
674 - 619 55
Table 2: Clinical datasets used in the experiments. Abbreviations: N: Normal, P: Pneumonia (non-COVID-19), C: COVID-19
Figure 3: Lung segmentation results on Open COVID-19 samples by our stage 1 model.

In collaborations with NTUH, TMUH and NHIA, we evaluate our method on both open and clinical collections of CXR images. Details about the datasets are listed in Table 1 and Table  2. We learn the model of each stage separately, i.e., all parameters other than those of the current training stage are kept unchanged.

Lung segmentation masking.

The preprocessing stage 1, lung segmentation, is trained on Tuberculosis Chest X-ray Image Data Sets that consist of two parts, Montgomery County X-ray Set and Shenzhen Hospital X-ray Set. The total number of samples contains 406 normal case images and 394 tuberculosis case images. We randomly sample 80% as our training/validation sets and 20% as our test set. Our stage 1 model achieves 0.88 dice similarity coefficient (DSC) on the test set. Fig. 3 shows some segmentation outcomes from our stage 1 model. We further investigate the effect of stage 1 considering the whole screening pipeline. Table 3 compares the pneumonia classification (stage 2) results between models with lung segmentation masking and models without lung segmentation masking.

Split Lung mask AUC Sensitivity Specificity
Validation 97.58 90.64 94.95
97.58 90.05 95.37
Test 97.33 90.37 95.66
97.18 87.56 95.44
Table 3: With ()/without () using stage-1 lung segmentation on open datasets

COVID-19 chest x-ray screening using open datasets.

We first train the stage 2 model to carry out pneumonia classification on the open datasets, summarized in Table 1. We separate the data into three groups: Noraml (N), Pneumonia (P, non-COVID-19), and COVID-19 (C). Normal and non-COVID-19 pneumonia CXR images are collected from Padchest [1] and RSNA [5], while COVID-19 samples are selected from [2]. For the data from Padchest and RSNA, we randomly divide it into training/validation/testing in the ratio of 80%/10%/10%. As to the data of COVID-19, we split it into training/validation/testing in the ratio of 50%/25%/25%, due to its small sample size. During stage 2, binary labels are considered, where non-COVID-19 Pneumonia (P) and COVID-19 (C) samples are labeled as 1 (positive) and Normal samples are labeled as 0 (negative). For each batch of training, we evenly sample each group to account for imbalanced distributions among Normal, non-COVID-19 Pneumonia, and COVID-19 samples. The masked images by stage 1 model (lung segmentation, as in Fig. 3) are used as the input of stage 2 model for both training and inference. After achieving results of pneumonia classification of stage 2 as in Table 4, we then train the stage 3 model to distinguish COVID-19 from non-COVID-19 pneumonia, using only non-COVID-19 Pneumonia (P) and COVID-19 (C) training samples. All parameters but those of stage 3 are fixed during training. In this stage, marked as “3: COVID-19” in Table 4, COVID-19 (C) samples are labeled as 1 (positive) while non-COVID-19 Pneumonia (P) samples are labeled as 0 (negative). Table 4 shows the results of our method on both open and clinical datasets, where only open datasets are utilized during the training phase.

Dataset Split Stage AUC Sensitivity Specificity
Open Validation 2:Pneumonia 96.97 93.05 90.79
3:COVID-19 86.91 90.34 74.42
Test 2:Pneumonia 96.72 92.85 90.05
3:COVID-19 88.04 85.26 85.86
NTUH+TMUH Validation 2:Pneumonia 91.99 33.33 92.31
3:COVID-19 67.88 66.67 63.34
Test 2:Pneumonia 93.94 63.64 90.48
3:COVID-19 40.00 50.00 40.00
Table 4: Result of our model trained on open datasets only
Dataset Split Stage AUC Sensitivity Specificity
Open Validation 2:Pneumonia 97.00 87.07 95.57
3:COVID-19 99.70 88.37 98.90
Test 2:Pneumonia 96.64 86.54 95.22
3:COVID-19 99.88 91.43 99.44
NTUH+TMUH Validation 2:Pneumonia 97.44 96.15 75.00
3:COVID-19 87.88 100.00 80.00
Test 2:Pneumonia 98.70 95.24 90.91
3:COVID-19 97.50 100.00 75.00
Table 5: Result of our model trained on open and clinical datasets

Fine-tune stage 2 and 3 on clinical datasets (NTUH and TMUH).

From Table 4, we observe the performance gaps in both validation and testing between clinical data and open data. (We will further elaborate this discovery in the next section.) To further improve the AI-based classification performance and achieve clinical applicability, we collaborate with NTUH, TMUH, and NHIA to collect more clinical COVID-19 CXR samples, as summarized in Table 2. Starting from a pre-trained model using only open datasets, we fine-tune our model using clinical data. Instead of using only clinical data, we combine open datasets and clinical datasets to prevent from overfitting during our fine-tuning phase, and consider the incremental learning strategy as in (2). As in the experiments on open datasets, we fine-tune our pneumonia classification (stage 2) and COVID-19 screening (stage 3) sequentially. Based on the final results shown in Table 5, our method is able to recognize Normal, Pneumonia (both non-COVID-19 and COVID-19), non-COVID-19, and COVID-19 cases respectively with high AUC, sensitivity, and specificity. For qualitative result, those cases sampled from the testing set of open dataset in Fig. 6 and the clinical dataset in Fig. 5 demonstrate reasonable model activation in different stages.

Dataset Split Stage Sensitivity Specificity
NHIA Validation 3:COVID-19 83.33 80.46
Test 3:COVID-19 81.82 80.45
Table 6: Result of our stage 3 model trained and evaluated on NHIA dataset

Fine-tune stage 3 on NHIA dataset.

To evaluate the clinical applicability of our method, we fine-tune our stage 3 model on a relatively larger dataset (NHIA), which includes 306 COVID-19 (C) samples from 306 confirmed cases in Taiwan and 3,069 non-COVID-19 Pneumonia (P) samples. Of those 3,069 Pneumonia (P) samples, 50% of them are diagnosed with bacterial pneumonia and the other half are diagnosed with viral pneumonia. Table 6 includes the validation and testing results on NHIA dataset, suggesting the potential of using our method for distinguishing COVID-19 from both common bacterial and viral pneumonia.

Figure 4: The chest x-ray captured date of each case with respect to subjective symptom onset date and confirmed date by RT-PCR. Red line indicates the chest x-ray captured date in which our model also predicted each case as positive (COVID-19). The left border of each light blue box denotes the subjective symptom onset date and the right end indicates the confirmed date. There are only 89 cases out of 109 cases shown in this graph due to missing dates in the other 20 cases.

Early COVID-19 detection with our screening system

Further, to gain insights into the advantages of using the proposed CXR screening platform, we investigate its efficiency of early prediction over RT-PCR so that the needed medical treatment can be performed as early as possible. To this end, we describe the overall performance of our screening platform and the case study on the NHIA dataset, particularly the validation (n=54) and the test (n=55) set. Out of these 109 samples detected by RT-PCR, there are 47 samples with visible signs of pneumonia, which are double confirmed by our medical specialist. Based on this subset, our model correctly detects 42 samples; that is, the sensitivity of our model achieves 89.36%.

Out of these 109 samples, taken in a period of three months (January 2020 to March 2020), the advantage of using our system for early detection of COVID-19 versus using RT-PCR can be observed from in the comparative study. Fig. 4 illustrates the details of each case.

  • Case 24 was detected by our system, 17 days prior to RT-PCR confirmed (on January 31, 2020).

  • Case 268 was detected by our system, 6 days prior to RT-PCR confirmed (on March 20, 2020).

  • There were 27 out of 109 cases detected at least 2 days earlier than the RT-PCR confirmed dates (Case 24, 226, 227, 235, 256, 268, 271, 272, 273, 279, 284, 287, 289, 290, 292, 308, 310, 313, 316, 317, 335, 349, 353, 355, 361, 363, and 379). Among these cases, Case 24, 268, 279, and 292 were detected 5 days earlier.

Figure 5: Visualization of qualitative results by our method on COVID-19 cases from clinical dataset. (A) Original chest x-ray image. (B) Regions of interest for pneumonia (Stage 2). (C) Regions of interest specific to COVID-19 (Stage 3). (D) Guided activation specific to COVID-19 (Stage 3).
Figure 6: Visualization of qualitative results by our method on COVID-19 cases from open dataset. (A) Original chest x-ray image. (B) Regions of interest for pneumonia (Stage 2). (C) Regions of interest specific to COVID-19 (Stage 3). (D) Guided activation specific to COVID-19 (Stage 3).

Comparison with relevant studies.

To evaluate the efficacy of our multi-stage method, we further conduct an experiment on the training/testing data split setting of COVID-Net [8]. The CXR samples are taken entirely from open datasets [2, 5], which consist of 13,594 CXR images (7966/5476/152 in N/P/C) in the training set and 231 CXR images (100/100/31 in N/P/C) in the testing set. As shown in Table 7, our method is more sensitive (and specific at inverse perspective of binary classification) than COVID-Net in both Normal and COVID-19 predictions. Note that in the stage 2 of our method, the binary classification is to predict Normal versus both COVID-19 and non-COVID-19 pneumonia.

Dataset Sensitivity
Normal Pneumonia COVID-19
COVID-Net [8] 97.0 90.0 87.1
Our Stage Pneumonia 98.0 90.84
Our Stage COVID-19 - 87.0 96.8
Table 7: Classification comparison of our method versus COVID-Net [8]

4 Conclusion

Since the outbreak of COVID-19, intensive efforts have being made by healthcare experts in hospitals to reach diagnosis result for each patient even with supplemental symptoms. Suggested by medical experts, because of the efficiency and availability of CXR, we propose an AI-based screening system to recognize COVID-19 pneumonia in CXR images. Regarding coarse to fine manners, our cascaded method consists of lung segmentation, pneumonia recognition, and COVID-19 recognition as hierarchical screening. The proposed approach outperforms a previous method on open dataset of COVID-19 cases and is able to reach clinical-grade performance on NTUH and TMUH clinical data. Moreover, our method has been integrated into the internal system of Taiwan NHIA and CDC, achieving over 80% sensitivity and specificity on NHIA clinical test cases. Key future research challenges include the sensitivity improvement on cases with mild symptoms, medical studies of COVID-19 cases with recognized lesion patterns, and the refinement of lung segmentation to uncover subtle and relevant regions.

5 Acknowledgement

We would like to thank National Taiwan University Hospital (NTUH), Taipei Medical University Hospital (TMUH), and Taiwan National Health Insurance Administration (NHIA) to provide clinical COVID-19 data for our studies. The experts from NTUH also provide us insightful feedback about medical findings and results evaluation on CXR images. For the deployment of our COVID-19 screening system, we appreciate the support from Taiwan Centers for Disease Control and Taiwan Executive Yuan, letting us contribute to the safeguard of Taiwan citizens’ well-being during this global pandemic.

References

  • [1] A. Bustos, A. Pertusa, J. Salinas, and M. de la Iglesia-Vayá (2019) Padchest: a large chest x-ray image dataset with multi-label annotated reports. arXiv preprint arXiv:1901.07441. Cited by: §3.
  • [2] J. P. Cohen, P. Morrison, and L. Dao (2020) COVID-19 image data collection. arXiv 2003.11597. External Links: Link Cited by: §3, §3, Table 1.
  • [3] Y. Fang, H. Zhang, J. Xie, M. Lin, L. Ying, P. Pang, and W. Ji (2020) Sensitivity of chest ct for COVID-19: comparison to RT-PCR. In https://doi.org/10.1148/radiol.2020200432, pp. . Cited by: §1.
  • [4] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In

    Proceedings of the IEEE conference on computer vision and pattern recognition

    ,
    pp. 4700–4708. Cited by: §2.
  • [5] R. S. of North America. RSNA pneumonia detection challenge. Note: https://www.kaggle.com/c/rsnapneumonia-detection-challenge/data Cited by: §3, §3.
  • [6] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §2.
  • [7] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pp. 618–626. Cited by: §2.
  • [8] L. Wang and A. Wong (2020)

    COVID-net: a tailored deep convolutional neural network design for detection of covid-19 cases from chest radiography images

    .
    arXiv preprint arXiv:2003.09871. Cited by: §3, Table 7.
  • [9] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba (2016)

    Learning deep features for discriminative localization

    .
    In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2921–2929. Cited by: §2.