COVID-19 Screening on Chest X-ray Images Using Deep Learning based Anomaly Detection

03/27/2020 ∙ by Jianpeng Zhang, et al. ∙ The University of Adelaide

Coronaviruses are important human and animal pathogens. The novel coronavirus disease COVID-19 is spreading rapidly worldwide, threatening the health of billions of people. Clinical studies have shown that most COVID-19 patients suffer from lung infection. Although chest CT has been shown to be an effective imaging technique for diagnosing lung-related disease, chest X-ray is more widely available owing to its faster imaging time and considerably lower cost. Deep learning, one of the most successful AI techniques, is an effective means of assisting radiologists in analyzing the vast number of chest X-ray images, which can be critical for efficient and reliable COVID-19 screening. In this work, we aim to develop a new deep anomaly detection model for fast, reliable screening. To evaluate the model, we collected 100 chest X-ray images of 70 patients confirmed with COVID-19 from a GitHub repository. Because deep learning needs more data, we also collected 1431 additional chest X-ray images of 1008 patients confirmed as having other pneumonia from the public ChestX-ray14 dataset. Our initial experimental results show that the model achieves a sensitivity of 96.00% and a specificity of 70.65% on 1531 X-ray images with two splits of the dataset.




1 Introduction

Since December 2019, the rapid spread of coronavirus disease 2019 (COVID-19) has caused panic worldwide. The rapid escalation of the COVID-19 pandemic, with hundreds of deaths and thousands of infections emerging in many areas every day, presents great challenges for stopping the virus [1, 2].

Figure 1: Diagram of the proposed model.

Currently, viral nucleic acid detection using real-time polymerase chain reaction (RT-PCR) is the accepted standard diagnostic method. However, many hyper-endemic regions or countries cannot provide sufficient RT-PCR testing for tens of thousands of suspected patients. To address the lack of reagents, efforts have been made to detect COVID-19 from CT images. Fang et al. [6] reported a high sensitivity (up to 98%) of chest CT for COVID-19 screening in a series of 51 patients. To speed up the screening, Gozes et al. [8] employed deep learning to detect COVID-19 on CT images. Shi et al. [13] collected a large-scale COVID-19 CT dataset and developed a machine learning-based method for COVID-19 screening. A drawback of CT imaging is that it typically takes considerably more time than X-ray imaging. Moreover, high-quality CT scanners may not be available in sufficient numbers in many under-developed regions, making timely COVID-19 screening impossible. In contrast, X-ray is the most common and widely available diagnostic imaging technique, playing a crucial role in clinical care and epidemiological studies [4, 7]. Most ambulatory care facilities, even in rural regions, have X-ray units as basic diagnostic imaging equipment. In addition, the real-time nature of X-ray imaging would significantly speed up disease screening.

In view of these advantages, we aim to develop a deep learning-based model that can detect COVID-19 from chest X-ray images with sufficiently high sensitivity, enabling fast and reliable screening. Patients flagged as suspected COVID-19 cases can then be referred for confirmatory viral nucleic acid testing.

Detecting COVID-19 from chest X-ray images with high sensitivity is very challenging, not only because the ribs overlie soft tissue and the contrast is low, but also because large amounts of annotated data are not available. This is particularly problematic for deep learning-based approaches, which are notoriously data hungry. To tackle this problem, we collect additional pneumonia images from the public ChestX-ray14 dataset [14] as non-COVID-19 cases. To address the data imbalance between COVID-19 and non-COVID-19 cases, we propose an X-ray-based COVID-19 screening model, inspired by [11], that supports the imbalanced binary classification task with an anomaly detection task.

2 Methods

We propose a deep learning model to distinguish COVID-19 from non-COVID-19 cases. As shown in Figure 1, the model is composed of three components: a backbone network, a classification head, and an anomaly detection head. Given an input chest X-ray image $x$, the backbone network extracts its high-level features, which are then fed into both the classification head and the anomaly detection head. The classification head generates a classification score $p_{cls}$, and the anomaly detection head generates a scalar anomaly score $p_{ano}$. We then compute a scalar reference score $\mu_R$ as the mean of the anomaly scores of randomly selected normal X-ray images. Finally, we optimize the model by minimizing the binary cross-entropy loss for classification and the deviation loss for anomaly detection, so that X-ray images with COVID-19 receive statistically significantly larger classification and anomaly scores than normal controls.

Backbone network.

We use the 18-layer residual convolutional neural network (ResNet-18) [10], pretrained on the ImageNet dataset [5], as the backbone network. In Figure 1(a), the rectangles with different colors represent the five stages of the backbone network. The first stage uses large $7 \times 7$ convolutions with a stride of 2, followed by a $3 \times 3$ max-pooling layer with a stride of 2. Each subsequent stage is composed of two residual blocks, each containing two convolutional layers and one skip connection. After these layer-by-layer convolutional operations, the input image is encoded as a feature map with an output stride of 32 (five successive $2\times$ downsampling steps).
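The downsampling arithmetic of the backbone can be checked with a short sketch. It assumes the standard ResNet-18 layout (stride 2 in the first convolution, the max-pooling layer, and the last three residual stages, with padding that halves the size at each step); the 224-pixel input size is a hypothetical value chosen for illustration, not one taken from this work.

```python
from math import prod

# Strides of the downsampling operations in a standard ResNet-18.
# Assumed layout for illustration; not taken from the authors' code.
STRIDES = {
    "stage1_conv": 2,   # large first convolution
    "max_pool": 2,      # max-pooling after the first convolution
    "stage2": 1,        # first residual stage keeps the resolution
    "stage3": 2,
    "stage4": 2,
    "stage5": 2,
}


def output_stride(strides):
    """Overall downsampling factor between the input and the final feature map."""
    return prod(strides.values())


def feature_map_size(input_size, strides):
    """Spatial size after each strided step, assuming each step yields ceil(size / stride)."""
    size = input_size
    for stride in strides.values():
        size = -(-size // stride)  # ceiling division
    return size


if __name__ == "__main__":
    print("output stride:", output_stride(STRIDES))
    print("feature map side for a 224-pixel input:", feature_map_size(224, STRIDES))
```

Five stride-2 steps multiply to an overall factor of $2^5 = 32$, so a hypothetical 224-pixel input would be reduced to a feature map of side 7.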

Classification head.

To classify the input image, we append a classification head to the end of the backbone network. It consists of a new convolutional layer with a stride of 2 followed by a multi-layer perceptron, which contains a 100-neuron hidden layer, a one-neuron output layer, and a sigmoid activation. To optimize the classification head, we use the binary cross-entropy loss

$$\mathcal{L}_{cls} = -\left[ y \log p_{cls} + (1 - y) \log (1 - p_{cls}) \right],$$

where $p_{cls}$ is the classification score and $y$ represents the ground truth label, i.e., $y = 0$ means that the input is a non-COVID-19 case and $y = 1$ means a COVID-19 case.
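As a minimal illustration of the binary cross-entropy loss for a single image, the sketch below takes a sigmoid output `p_cls` and a label `y` (0 for non-COVID-19, 1 for COVID-19). The function name and the clipping constant `eps` are assumptions made for numerical safety in this example, not details from this work.

```python
import math


def binary_cross_entropy(p_cls, y, eps=1e-7):
    """Binary cross-entropy for one sample.

    p_cls: sigmoid output in (0, 1); y: 0 (non-COVID-19) or 1 (COVID-19).
    eps is a hypothetical clipping constant that avoids log(0).
    """
    p = min(max(p_cls, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


if __name__ == "__main__":
    # A confident correct prediction is penalized far less than a confident wrong one.
    print(binary_cross_entropy(0.9, 1))
    print(binary_cross_entropy(0.1, 1))
```

The loss approaches zero only when the score agrees with the label, which is what pushes the classification score of COVID-19 images toward 1.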

Anomaly detection head. We also append an anomaly detection head to the backbone network; it has the same architecture as the classification head. Unlike the classification head, it generates a scalar anomaly score $p_{ano}$ and uses it to detect anomalous images (i.e., COVID-19 cases). To guide the learning of the anomaly detection branch, we randomly sample $l$ normal data from a Gaussian distribution, i.e., $r_1, r_2, \ldots, r_l \sim \mathcal{N}(\mu, \sigma^2)$, and define the reference score as $\mu_R = \frac{1}{l} \sum_{i=1}^{l} r_i$. Following [11], we set $\mu$ and $\sigma$ as in that work. With the obtained anomaly score and reference score, we employ the following contrastive loss [3, 9] to optimize the COVID-19 score generator:

$$\mathcal{L}_{ano} = (1 - y) \left| \frac{p_{ano} - \mu_R}{\sigma_R} \right| + y \max\left(0,\; a - \frac{p_{ano} - \mu_R}{\sigma_R}\right),$$

where $y$ is the ground truth label ($y = 1$ for a COVID-19 case), $\sigma_R$ is the standard deviation of the anomaly scores of the $l$ randomly selected normal data, and $a$ is the Z-score confidence interval parameter. For this work, $a$ is empirically set to 5.
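The deviation-style loss can be sketched for a single image as follows. The margin `a = 5` comes from the text; the standard Gaussian prior (`mu = 0`, `sigma = 1`) and the sample count `l = 5000` are assumptions made only for this illustration, and the function names are hypothetical.

```python
import random
import statistics


def reference_stats(l=5000, mu=0.0, sigma=1.0, seed=0):
    """Sample l reference scores from N(mu, sigma^2) and return their mean and std.

    l, mu and sigma are assumed values for illustration.
    """
    rng = random.Random(seed)
    refs = [rng.gauss(mu, sigma) for _ in range(l)]
    return statistics.fmean(refs), statistics.stdev(refs)


def deviation_loss(p_ano, y, mu_r, sigma_r, a=5.0):
    """Deviation loss for one sample; y = 1 marks a COVID-19 (anomalous) image."""
    dev = (p_ano - mu_r) / sigma_r          # Z-score of the anomaly score
    if y == 1:
        return max(0.0, a - dev)            # push anomalies at least a std above the reference
    return abs(dev)                          # pull normal cases toward the reference


if __name__ == "__main__":
    mu_r, sigma_r = reference_stats()
    print("normal image scored at the reference:", deviation_loss(mu_r, 0, mu_r, sigma_r))
    print("COVID-19 image scored at the reference:", deviation_loss(mu_r, 1, mu_r, sigma_r))
```

Under these assumptions, a COVID-19 image incurs no loss only once its anomaly score lies at least $a$ standard deviations above the reference score, while a normal image is penalized for any deviation from it.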

Training algorithm.

In the training stage, we use the standard stochastic gradient descent (SGD) algorithm with a batch size of 128 as the optimizer. We set the maximum number of epochs to 500 and linearly decay the learning rate during training. We resize each training image to a fixed size. To alleviate overfitting on the training data, we use data augmentation strategies, including randomly cropping patches from the resized images, zooming (90%–110%), and random horizontal flipping, to enlarge the training dataset.
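The schedule and augmentation settings can be sketched as below. Only the 500-epoch limit, the 90%–110% zoom range, and the horizontal flip come from the text; the base learning rate `BASE_LR`, the decay to zero at the last epoch, and the 50% flip probability are hypothetical choices for illustration.

```python
import random

MAX_EPOCHS = 500   # from the text
BASE_LR = 0.01     # hypothetical initial learning rate


def linear_decay_lr(epoch, base_lr=BASE_LR, max_epochs=MAX_EPOCHS):
    """Learning rate that falls linearly from base_lr at epoch 0 to 0 at max_epochs (assumed end point)."""
    return base_lr * (1.0 - epoch / max_epochs)


def sample_augmentation(rng):
    """Draw one set of augmentation parameters for a training image."""
    return {
        "zoom": rng.uniform(0.90, 1.10),        # zoom factor within 90%-110%
        "horizontal_flip": rng.random() < 0.5,   # assumed flip probability
    }


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in (0, 250, 499):
        print(epoch, linear_decay_lr(epoch), sample_augmentation(rng))
```

Sampling fresh augmentation parameters for every image in every epoch is what effectively enlarges the small COVID-19 training set.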

Inference algorithm. In the inference stage, we feed a test X-ray image into the trained model and obtain a classification score $p_{cls}$ and a scalar anomaly score $p_{ano}$ via forward propagation. The final decision is made from these two scores using a threshold $T$, which controls the trade-off between sensitivity and specificity.

3 Experiments

3.1 Dataset and Evaluation Metrics

The dataset used for this work includes 100 chest X-ray images acquired from 70 subjects, all of whom were confirmed with COVID-19, and 1431 chest X-ray images diagnosed as pneumonia (not COVID-19) from 1008 subjects. The COVID-19 cases are available from a GitHub repository, and the pneumonia cases are from the ChestX-ray14 dataset [14].

The screening performance of our model was assessed by sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC). Sensitivity and specificity give the proportions of positives and negatives that are correctly identified, respectively, and AUC measures the overall classification performance in a way that is insensitive to the imbalance between the two classes.
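The three metrics can be computed as in the following sketch. The toy labels and scores are invented for illustration, and treating a score greater than or equal to the threshold as a positive prediction is an assumption of this example.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive (assumed rule)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration: 1 = COVID-19, 0 = other pneumonia.
    labels = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.2, 0.7, 0.3, 0.1, 0.05]
    print(sensitivity_specificity(labels, scores, 0.5))
    print(auc(labels, scores))
```

Because the AUC depends only on how positives rank against negatives, it does not change when one class is much larger than the other.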

3.2 Results

We randomly split the data twice and run the experiments on each split. The first split uses 50 images of 33 COVID-19 patients and 714 images of 492 other pneumonia patients for testing and the remaining images for training. The second split uses 50 images of 37 COVID-19 patients and 717 images of 516 other pneumonia patients for testing and the remaining images for training. The final performance is the average over the two splits.
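The image counts of the two splits can be checked with a few lines of arithmetic. The counts are the ones stated in the text; the dictionary keys and helper name are hypothetical.

```python
# Image counts stated in the text.
TOTAL = {"covid": 100, "pneumonia": 1431}
TEST_SPLITS = [
    {"covid": 50, "pneumonia": 714},   # first split
    {"covid": 50, "pneumonia": 717},   # second split
]


def training_counts(total, test):
    """Images left for training once the test images of a split are held out."""
    return {name: total[name] - test[name] for name in total}


if __name__ == "__main__":
    for index, test in enumerate(TEST_SPLITS, start=1):
        print(f"split {index}: test={test}, train={training_counts(TOTAL, test)}")
```

In both splits only 50 COVID-19 images remain for training, against more than 700 pneumonia images, which illustrates the class imbalance the model has to handle.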

Threshold   Sensitivity (%)   Specificity (%)   AUC (%)
0.50        72.00             97.97             95.18
0.40        77.00             95.46             95.18
0.30        81.00             90.71             95.18
0.25        90.00             87.84             95.18
0.20        93.00             81.41             95.18
0.15        96.00             70.65             95.18
Table 1: COVID-19 screening performance of our model at different threshold values.

Figure 2: Confusion matrix of our model when the threshold is set to different values.

Figure 3: ROC curves of two comparisons: (a) single classification network vs. the classification head of our model; (b) single anomaly detection network vs. the anomaly detection head of our model.

Figure 4: Visualization of four patients’ chest X-ray images (a) and the corresponding Grad-CAMs obtained by our model: (b) Grad-CAMs obtained by the classification head and (c) Grad-CAMs obtained by the anomaly detection head.

The threshold $T$ controls the trade-off between the true positive rate (i.e., sensitivity) and the true negative rate (i.e., specificity). To investigate the impact of $T$ on the screening performance, we ran the inference stage with $T$ set to 0.50, 0.40, 0.30, 0.25, 0.20, and 0.15. The screening performance obtained by our model is reported in Table 1 and Figure 2. It shows that

  • setting $T$ to different values in the inference stage leaves the AUC unchanged, as expected because the AUC is computed over all thresholds, whereas both the sensitivity and the specificity vary; and

  • when $T$ decreases from 0.50 to 0.15, the sensitivity increases from 72.00% to 96.00% and the specificity drops from 97.97% to 70.65%.
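The two observations above can be reproduced on toy data: each threshold yields one point on the ROC curve, while the AUC is the area under the whole curve and therefore does not depend on which threshold is finally chosen. The labels and scores below are invented for illustration, and scores greater than or equal to the threshold are assumed to be called positive.

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) for every distinct threshold, strictest first."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Toy data, invented for illustration: 1 = COVID-19, 0 = other pneumonia.
    labels = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.2, 0.7, 0.3, 0.1, 0.05]
    points = roc_points(labels, scores)
    for fpr, tpr in points:
        print(f"sensitivity={tpr:.2f} specificity={1 - fpr:.2f}")
    print("AUC:", trapezoid_auc(points))
```

Lowering the threshold moves along the curve toward higher sensitivity and lower specificity, but the area under the curve stays the same.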

As a model designed for COVID-19 screening, the proposed method aims to reduce the false negative rate as much as possible: false positive cases can be identified in the subsequent viral nucleic acid test, but false negative cases will not get a “second test”. Therefore, we suggest setting $T$ to a small value such as 0.15, which reduces the false negative rate to 4%.
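The choice of operating point can be made explicit with the values of Table 1. The sketch below converts sensitivity and specificity into false negative and false positive rates and picks the threshold with the best specificity among those that meet a tolerated false negative rate; the tolerance `max_fnr` and the helper names are hypothetical.

```python
# (threshold, sensitivity %, specificity %) rows copied from Table 1.
TABLE_1 = [
    (0.50, 72.00, 97.97),
    (0.40, 77.00, 95.46),
    (0.30, 81.00, 90.71),
    (0.25, 90.00, 87.84),
    (0.20, 93.00, 81.41),
    (0.15, 96.00, 70.65),
]


def error_rates(sensitivity, specificity):
    """False negative rate and false positive rate, in percent."""
    return round(100.0 - sensitivity, 2), round(100.0 - specificity, 2)


def pick_threshold(table, max_fnr):
    """Threshold with the highest specificity among rows whose FNR is at most max_fnr, else None."""
    eligible = [row for row in table if 100.0 - row[1] <= max_fnr]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[2])[0]


if __name__ == "__main__":
    for threshold, sens, spec in TABLE_1:
        fnr, fpr = error_rates(sens, spec)
        print(f"threshold={threshold:.2f} FNR={fnr}% FPR={fpr}%")
    print("threshold for FNR <= 5%:", pick_threshold(TABLE_1, 5.0))
```

With a tolerated false negative rate of 5%, only the threshold of 0.15 qualifies, at the cost of a false positive rate of 29.35%.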

To demonstrate that our two-head learning is effective, we performed the following two comparisons: (1) a single classification network vs. the classification head of our model and (2) a single anomaly detection network vs. the anomaly detection head of our model. The ROC curves are plotted in Figure 3. As expected, our model, which combines the classification and anomaly detection tasks, outperforms each single-task model.

We use the Gradient-weighted Class Activation Mapping (Grad-CAM) method [12] to highlight the regions that our model might use to make its predictions. Four patients’ chest X-ray images with the accompanying Grad-CAMs are shown in Figure 4. They reveal that our model tends to highlight localized regions within the lungs.
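The core combination step of Grad-CAM [12] can be sketched without a deep learning framework: each feature channel is weighted by the spatial average of its gradient, the weighted channels are summed, and negative values are clipped. The tiny activation and gradient maps below are invented for illustration; in practice both would come from the last convolutional layer of a trained network.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from K feature maps and the gradients of the score w.r.t. them.

    Both arguments are lists of K maps, each an H x W nested list of floats.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for act, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over all spatial positions.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * act[i][j]
    # ReLU keeps only regions that increase the score.
    return [[max(0.0, value) for value in row] for row in cam]


if __name__ == "__main__":
    # Toy 2-channel, 2 x 2 example, invented for illustration.
    activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
    gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(activations, gradients))
```

In this toy case the first channel supports the prediction and the second opposes it, so only the top-left location remains highlighted after clipping.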

4 Discussion and Conclusion

Shi et al. [13] achieved a sensitivity of 90.70% and a specificity of 83.30% on a large-scale CT dataset, including 1658 subjects with COVID-19 and 1027 subjects with non-COVID-19 pneumonia. In this study, our model achieves a sensitivity of 90.00% and a specificity of 87.84% (when $T = 0.25$) or a sensitivity of 96.00% and a specificity of 70.65% (when $T = 0.15$) on the X-ray dataset that contains 100 images from 70 COVID-19 subjects and 1431 images from 1008 non-COVID-19 pneumonia subjects. Compared with the CT-based screening method, our X-ray-based model achieves comparable performance. More importantly, our model learns from only 70 COVID-19 subjects, which is less than 5 percent of the number used in [13]. Therefore, the proposed model that uses chest X-ray can be considered an effective computer-aided diagnosis (CAD) tool for low-cost and fast COVID-19 screening.

However, our model still has several limitations, such as missing 4% of COVID-19 cases and having a false positive rate of almost 30%. Our future work will focus on further reducing the false negative rate and, if possible, decreasing the false positive rate as well. We will also investigate how to differentiate COVID-19 severity using chest X-ray and then detect potentially severe cases for early treatment, which requires more clinical diagnostic information. Moreover, more clinical data are needed to further validate and improve the effectiveness of our model.

Declaration of Conflicting Interests
The authors declare that there is no conflict of interests regarding the publication of this article. Chunhua Shen and his employer received no financial support for the research, authorship, and/or publication of this article.

The authors thank Dr. Guansong Pang for constructive discussions. The authors appreciate the efforts devoted to collecting and sharing the COVID-19 chest X-ray images and the ChestX-ray14 dataset for research on low-cost and fast X-ray-based COVID-19 screening.


  • [1] Yan Bai, Lingsheng Yao, Tao Wei, Fei Tian, Dong-Yan Jin, Lijuan Chen, and Meiyun Wang. Presumed asymptomatic carrier transmission of COVID-19. Journal of the American Medical Association (JAMA), 2020.
  • [2] Huijun Chen, Juanjuan Guo, Chen Wang, Fan Luo, Xuechen Yu, Wei Zhang, Jiafu Li, Dongchi Zhao, Dan Xu, Qing Gong, et al. Clinical characteristics and intrauterine vertical transmission potential of COVID-19 infection in nine pregnant women: a retrospective review of medical records. The Lancet, 395(10226):809–815, 2020.
  • [3] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. arXiv:2002.05709, 2020.
  • [4] Thomas Cherian, E Kim Mulholland, John B Carlin, Harald Ostensen, Ruhul Amin, Margaret de Campo, David Greenberg, Rosanna Lagos, Marilla Lucero, Shabir A Madhi, et al. Standardized interpretation of paediatric chest radiographs for the diagnosis of pneumonia in epidemiological studies. Bulletin of the World Health Organization, 83:353–359, 2005.
  • [5] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
  • [6] Yicheng Fang, Huangqi Zhang, Jicheng Xie, Minjie Lin, Lingjun Ying, Peipei Pang, and Wenbin Ji. Sensitivity of chest CT for COVID-19: comparison to RT-PCR. Radiology, page 200432, 2020.
  • [7] T Franquet. Imaging of pneumonia: trends and algorithms. European Respiratory Journal, 18(1):196–208, 2001.
  • [8] Ophir Gozes, Maayan Frid-Adar, Hayit Greenspan, Patrick D Browning, Huangqi Zhang, Wenbin Ji, Adam Bernheim, and Eliot Siegel. Rapid AI development cycle for the coronavirus (COVID-19) pandemic: Initial results for automated detection & patient monitoring using deep learning CT image analysis. arXiv preprint arXiv:2003.05037, 2020.
  • [9] Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 1735–1742. IEEE, 2006.
  • [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • [11] Guansong Pang, Chunhua Shen, and Anton van den Hengel. Deep anomaly detection with deviation networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 353–362, 2019.
  • [12] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
  • [13] Feng Shi, Liming Xia, Fei Shan, Dijia Wu, Ying Wei, Huan Yuan, Huiting Jiang, Yaozong Gao, He Sui, and Dinggang Shen. Large-scale screening of COVID-19 from community acquired pneumonia using infection size-aware classification, 2020.
  • [14] Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M Summers. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 2097–2106, 2017.