COVID-Xpert: An AI Powered Population Screening of COVID-19 Cases Using Chest Radiography Images

04/06/2020 ∙ by Xin Li, et al. ∙ Wayne State University

With the increasing demand for millions of COVID-19 tests, Computed Tomography (CT) based testing has emerged as a promising alternative to the gold-standard RT-PCR test. However, it is primarily available in emergency department and hospital settings because it requires expensive equipment and trained radiologists. An accurate, rapid yet inexpensive test suitable for population screening of COVID-19 cases in mobile, urgent and primary care settings is urgently needed. Here we design a deep convolutional neural network (CNN) that extracts X-ray Chest Radiography (XCR) imaging features from large-scale pneumonia and normal training cases and refines them with a small number of COVID-19 cases, learning imaging features capable of automatically discriminating COVID-19 cases from pneumonia and/or normal XCR images. We demonstrate the strong potential of our XCR based population screening approach, COVID-Xpert, for detecting COVID-19 cases through strong experimental performance. The trained models and information on the compiled data set are available from https://github.com/xinli0928/COVID-Xray.


1 Introduction

The rapid spread of the COVID-19 virus around the world and the exponential increase in the size of the susceptible population demand accurate, rapid yet inexpensive population-based screening approaches for positive cases. The gold-standard screening approach based on RT-PCR demonstrates high accuracy but is subject to the significant limitations of high cost and slow turnaround time, making it not scalable to the ever-increasing population at risk [wang2020detection]. Thanks to high-volume testing machines and new rapid tests, the total number of tests topped 1.4 million as of early April [nytimes]. However, millions more tests are still urgently needed as the virus keeps communities across the country in lockdown and hospitals overwhelmed with patients. Besides nucleic acid-based tests, Computed Tomography (CT) based approaches [salehi2020coronavirus, lee2020covid, bai2020performance, li2020artificial] have also been widely employed for testing COVID-19 cases. CT based tests have shown better sensitivity and specificity than nucleic acid-based tests [ai2020correlation], although mixed results exist [hope2020role]. To date, most medical imaging based diagnostic tools are built on CT and deployed in hospitals and emergency departments (EDs), where expensive CT equipment and expert radiologists are available. For example, Alibaba’s model [alibabacloud] is trained on more than 5,000 confirmed cases, and the Infervision system [mak_2020], likewise trained on over 5,000 confirmed cases, is deployed at 34 hospitals in China.

The wide availability of X-ray Chest Radiography (XCR) in diverse health care settings makes it an appealing option for rapid, accurate yet inexpensive screening, particularly in mobile, urgent and primary care settings. At present, the bottleneck lies in the shortage of trained human radiologists capable of differentiating COVID-19 positive cases from other lung diseases and normal conditions directly from medical images. The intensive development of Convolutional Neural Network (CNN) powered XCR image classification has seen unprecedented success in automatic lung disease classification [wang2017chestx, wang2018tienet]. As such, adequate knowledge has been accumulated from training Artificial Intelligence (AI) systems to accurately discern the subtle differences among different lung disease conditions by learning discriminating XCR imaging features [rajpurkar2017chexnet, tang2018attention]. With the initially available COVID-19 XCR images, albeit with scarce labels, it is possible to build an accurate AI powered screening tool using the discriminating imaging features learned from previous XCR based lung disease classification models [wang2017chestx, rajpurkar2017chexnet]. These models can be further trained, fine-tuned and validated using the small number of labeled COVID-19 XCR imaging cases.

Figure 1: An overview of COVID-Xpert architecture.

However, as medical resources at EDs and Intensive Care Units (ICUs) fall short, XCR based population screening emerges as a cost-effective approach to combat the COVID-19 pandemic at secondary health care facilities. As far as we know, there is still no AI system designed to use XCR images for COVID-19 screening in mobile, urgent and primary care settings. Moreover, the above-mentioned CT based AI systems are often trained primarily on the limited COVID-19 positive cases only and discriminate between COVID-19 and normal cases without exploiting the prior knowledge acquired from previous CT imaging research on lung diseases. Without including closely related pneumonia cases in training, the models are not sufficiently sensitive to discern between COVID-19 and pneumonia cases. Furthermore, these models do not exploit the lung disease imaging features from prior studies and do not explain positive screening results directly on the XCR images. Here we design a novel AI approach for XCR based COVID-19 population screening that can be reliably deployed in mobile, urgent and primary care settings and enjoys the following advantages: 1) accurately detecting positive COVID-19 cases, particularly against closely related pneumonia cases; 2) identifying the important regions on XCR images that correspond to (and are hopefully responsible for) the positive screening results; and 3) visually dissecting the spatial relationships among COVID-19 positive cases in an attempt to link them to differential clinical outcomes.

2 Methods

2.1 COVID-Xpert Model Architecture

In this study, we employ the DenseNet-121 deep neural network architecture as a template for pre-training on the source task, i.e., lung disease classification, and for training, validation and testing on the destination task, i.e., COVID-19 screening. Different from recent studies [apostolopoulos2020covid, wang2020covid] that pre-train models on natural image data sets such as ImageNet [deng2009imagenet], we pre-train our model using the more closely related ChestX-ray8 data set [wang2017chestx] so that it extracts XCR imaging features instead of generic natural imaging features. Specifically, beyond the dense block, we employ a shared fully connected layer for extracting general XCR imaging features and 8 fully connected disease-specific layers (including pneumonia as one disease layer) to extract disease-specific features (Figure 1). After pre-training with the ChestX-ray8 data set of 108,948 image samples, the weights defining the general XCR imaging feature and the pneumonia disease feature are transferred to improve the training of our COVID-19 screening model using a smaller compiled data set with 3 classes of XCR images, i.e., COVID-19, normal and pneumonia. Collectively, a total of 555 XCR images are used for training, validation and testing of COVID-Xpert. The COVID-19 screening model is randomly initialized with two sets of weight parameters corresponding to the normal and COVID-19 classes, while the initial values of the other weight parameters are transferred from the pre-trained source model. The network is trained with the Adam optimizer for 50 epochs with a mini-batch size of 32. The parameter values that give the best performance on the validation data set are used for testing.
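The weight-transfer step described above can be sketched as follows. This is an illustrative sketch only: the layer sizes, disease names and initialization scale are our assumptions, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT = 1024  # pooled feature size after the DenseNet-121 dense blocks

# Source model: shared FC layer + 8 disease-specific heads (pre-trained on ChestX-ray8)
shared_fc = rng.normal(scale=0.01, size=(FEAT, 256))          # general XCR feature
disease_heads = {d: rng.normal(scale=0.01, size=(256,))
                 for d in ["atelectasis", "cardiomegaly", "effusion", "infiltration",
                           "mass", "nodule", "pneumonia", "pneumothorax"]}

# Destination model: 3-class screening head (COVID-19 / normal / pneumonia).
# Transfer the shared layer and the pneumonia head; randomly initialize the rest.
target_shared = shared_fc.copy()
target_heads = {
    "pneumonia": disease_heads["pneumonia"].copy(),       # transferred weights
    "covid19":   rng.normal(scale=0.01, size=(256,)),     # random initialization
    "normal":    rng.normal(scale=0.01, size=(256,)),     # random initialization
}

def predict(x):
    """Forward pass: shared XCR feature, then one logit per class."""
    h = np.maximum(x @ target_shared, 0.0)                # ReLU on shared feature
    return {c: float(h @ w) for c, w in target_heads.items()}

logits = predict(rng.normal(size=(FEAT,)))
```

All three heads then continue training on the compiled 3-class data set, so the transferred pneumonia weights serve only as a warm start.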

2.2 Experiment Setup

We compile a data set composed of 185 XCR images from the normal class [kaggle_2018], 185 from the pneumonia class [kaggle_2018] and 185 from the COVID-19 class [cohen2020covid], and split it into training/validation/testing sets with 120/20/45 cases per class. We pre-trained the source model using the large ChestX-ray8 data set of lung disease XCR images [wang2017chestx], followed by training, fine-tuning and validation using our training/validation sets of 120/20 images per class. In Figure 3a, we report the confusion matrix of classification performance on the 45 labeled testing images from each class. Our classifier is capable of discriminating COVID-19 from pneumonia and/or normal cases, as evidenced by a high accuracy of 88.9%.
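A per-class stratified split of this kind can be sketched as below; the file names are hypothetical placeholders for the compiled image IDs.

```python
import random

def stratified_split(images_by_class, n_train=120, n_val=20, n_test=45, seed=42):
    """Split each class's image list into train/val/test sets of fixed sizes."""
    rng = random.Random(seed)
    splits = {"train": {}, "val": {}, "test": {}}
    for cls, images in images_by_class.items():
        assert len(images) >= n_train + n_val + n_test
        shuffled = images[:]
        rng.shuffle(shuffled)
        splits["train"][cls] = shuffled[:n_train]
        splits["val"][cls] = shuffled[n_train:n_train + n_val]
        splits["test"][cls] = shuffled[n_train + n_val:n_train + n_val + n_test]
    return splits

# 185 (hypothetical) image IDs per class, matching the compiled data set sizes
data = {cls: [f"{cls}_{i:03d}.png" for i in range(185)]
        for cls in ["covid19", "normal", "pneumonia"]}
splits = stratified_split(data)
```

Splitting per class keeps the three classes balanced in every partition, which matters when the COVID-19 class is this small.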

To systematically evaluate the performance of COVID-Xpert under different decision thresholds, we use the Area Under the ROC curve (AUROC) to assess how well the model discriminates COVID-19 cases from normal cases, pneumonia cases, and normal plus pneumonia cases combined. In Figure 2, COVID-Xpert demonstrates remarkable performance on all discrimination tasks. Importantly, it achieves an AUROC of 0.973 when discriminating COVID-19 cases against mixed pneumonia and normal cases, underscoring its strong potential for real-world deployment.
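For a one-vs-rest task such as COVID-19 vs. normal plus pneumonia, AUROC equals the probability that a randomly chosen positive case scores above a randomly chosen negative case. A minimal sketch, using toy scores rather than the paper's outputs:

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the Mann-Whitney U statistic: the fraction of
    positive/negative pairs ranked correctly, counting ties as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy example: model scores for COVID-19 (positive) vs. normal + pneumonia (negative)
covid = [0.95, 0.88, 0.75, 0.60]
other = [0.40, 0.55, 0.30, 0.70]
score = auroc(covid, other)  # 15 of 16 pairs ranked correctly -> 0.9375
```

Because it is computed over all pairwise rankings, AUROC summarizes performance across every possible decision threshold at once.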

Figure 2: Discriminating (a) COVID-19 vs. Normal cases; (b) COVID-19 vs. Pneumonia cases and (c) COVID-19 vs. Normal + Pneumonia cases.

We further employ the dimension reduction techniques t-SNE and PCA to visualize the three classes of XCR images in low dimensions. As observed in Figures 3(b) and 3(c), the three classes of images demonstrate good separability in the manifold learned by COVID-Xpert. The three-class separation achieved by machine vision is consistent with the visual discrimination of the three classes by human radiologists. For example, one salient visual feature of the COVID-19 image class is bilateral multilobar ground-glass opacification (GGO) with a peripheral or posterior distribution [salehi2020coronavirus].
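The PCA projection of learned features to two dimensions can be sketched with a plain SVD; the feature matrix below is synthetic, standing in for the network's penultimate-layer features of the 135 test images.

```python
import numpy as np

def pca_project(X, k=2):
    """Project rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)            # center each feature dimension
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T               # (n_samples, k) embedding

rng = np.random.default_rng(0)
# Synthetic stand-in for 135 test-image feature vectors of width 256
features = rng.normal(size=(135, 256))
embedding = pca_project(features, k=2)
```

The first component captures the largest variance direction, so well-separated classes tend to spread along the leading axes of the embedding.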

Figure 3: (a) A confusion matrix for classification. Low-dimension visualization of the three classes of XCR images using (b) t-SNE and (c) PCA.

2.3 Explaining COVID-Xpert

Besides accurately screening COVID-19 images from other lung disease and normal conditions, the model has to explain how and why a prediction is generated before it is ready to be adopted for real-world screening. We use Grad-CAM [selvaraju2017grad] to interpret the COVID-19 screening results; it flows gradient information back to the final convolutional layer to decipher the importance of each neuron in classifying an image into each disease class. Figure 4 shows the COVID-19 disease progression in a patient over four time points, i.e., day 10, day 13, day 17 and day 25, with the worst status on day 17 followed by recovery. In Figure 4, the heatmap starts on the right side, then spreads to the entire lung, and finally migrates back to the right side upon recovery.

Even among positive COVID-19 cases, clinical outcomes are vastly different, ranging from very benign symptoms to very severe ones such as life-threatening acute respiratory distress syndrome (ARDS) [zhou2020clinical]. It is thus intriguing to investigate a further question: is it possible to use COVID-Xpert to stratify COVID-19 positive cases and dissect mechanisms that could potentially explain the vastly different clinical outcomes? Using Grad-CAM based model interpretation, we make an initial attempt to identify subgroups of COVID-19 cases. In Figure 5, all 45 testing cases can be partitioned into a few spatially distributed groups, where cases that are spatially close to each other are more similar than those that are farther away.

Figure 4: Longitudinal XCR images of a patient over four time points. (Upper row) XCR images. (Lower row) XCR images overlaid with heatmaps generated using Grad-CAM to highlight the change of sensitive regions over time.
Figure 5: A thumbnail view of the spatially distributed XCR images overlaid with heatmaps of sensitive regions using t-SNE.

3 Conclusion and Discussion

In this paper, we present a deep learning model, COVID-Xpert, for rapid, accurate yet inexpensive population screening of COVID-19 cases. We also attempt to explain COVID-19 positive cases by locating the sensitive regions in the images and identifying patient subgroups to explain differential clinical outcomes. Since our model is trained with a relatively small labeled COVID-19 imaging data set, there is a large margin for performance improvement with the continuous addition of new training cases, either labeled or unlabeled. In the near future, we will further develop a hardware-friendly model (e.g., using MobileNetV2 [sandler2018mobilenetv2]) for population screening of COVID-19 cases that is suitable for deployment on mobile devices.

References