Predicting COVID-19 Pneumonia Severity on Chest X-ray with Deep Learning

05/24/2020 ∙ by Joseph Paul Cohen, et al.

The need to streamline patient management for COVID-19 has become more pressing than ever. Chest X-rays provide a non-invasive (potentially bedside) tool to monitor the progression of the disease. In this study, we present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images. Such a tool can gauge the severity of COVID-19 lung infections (and pneumonia in general), which can be used for escalation or de-escalation of care as well as for monitoring treatment efficacy, especially in the ICU. Images from a public COVID-19 database were scored retrospectively by three blinded experts in terms of the extent of lung involvement as well as the degree of opacity. A neural network model that was pre-trained on large (non-COVID-19) chest X-ray datasets is used to construct features for COVID-19 images which are predictive for our task. This study finds that training a regression model on a subset of the outputs from this pre-trained chest X-ray model predicts our geographic extent score (range 0-8) with 1.14 mean absolute error (MAE) and our lung opacity score (range 0-6) with 0.78 MAE. All code, labels, and data are made available at and


1 Introduction

As the first countries explore deconfinement strategies [Wilson & Moulson, 2020], the death toll of COVID-19 keeps rising [O’Grady et al., 2020]. The increased strain caused by the pandemic on healthcare systems worldwide has prompted many physicians to resort to new strategies and technologies. Chest X-rays (CXRs) provide a non-invasive (potentially bedside) tool to monitor the progression of the disease [Yoon et al., 2020; Ng et al., 2020]. As early as March 2020, Chinese hospitals used artificial intelligence (AI)-assisted computed tomography (CT) imaging analysis to screen COVID-19 cases and streamline diagnosis [Jin et al., 2020]. Many teams have since launched AI initiatives to improve the triaging of COVID-19 patients (e.g., discharge, general admission, or ICU care) and the allocation of hospital resources (e.g., escalating from non-invasive to invasive ventilation) [Strickland, 2020]. While these recent tools exploit clinical data, practically deployable CXR-based predictive models are still lacking.

In this work, we build and study a model which predicts the severity of COVID-19 pneumonia, based on CXRs, to be used as an assistive tool when managing patient care. The ability to gauge severity of COVID-19 lung infections can be used for escalation or de-escalation of care, especially in the ICU. An automated tool can be applied to patients over time to objectively and quantitatively track disease progression and treatment response.

2 Materials and Methods

2.1 COVID-19 Cohort

We used a retrospective cohort of 94 posteroanterior (PA) CXR images from a public COVID-19 image data collection [Cohen et al., 2020b]. While the dataset currently contains 153 images, it contained only 94 images at the time of the experiment, all of which were included in the study. All patients were reported COVID-19 positive and were sourced from many hospitals around the world from December 2019 to March 2020. The images were de-identified prior to our use and there was no missing data. The male/female ratio was 44/36, with an average age of 56 ± 14.8 years (55 ± 15.6 for males and 57 ± 13.9 for females).

2.2 Labels

Radiological scoring was performed by three blinded experts: two chest radiologists (each with at least 20 years of experience) and a radiology resident. They staged disease severity using a scoring system adapted from [Wong et al., 2019], based on two types of scores (parameters): extent of lung involvement and degree of opacity.

  1. The extent of lung involvement by ground glass opacity or consolidation for each lung (right lung and left lung separately) was scored as: 0 = no involvement; 1 = <25% involvement; 2 = 25-50% involvement; 3 = 50-75% involvement; 4 = >75% involvement. The total extent score ranged from 0 to 8 (right lung and left lung together).

  2. The degree of opacity for each lung (right lung and left lung separately) was scored as: 0 = no opacity; 1 = ground glass opacity; 2 = consolidation; 3 = white-out. The total opacity score ranged from 0 to 6 (right lung and left lung together).
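The two scoring rules above can be expressed as a small function that sums per-lung sub-scores. This is a minimal sketch for illustration only: the raters in this study assigned the integer scores by visual assessment, whereas the hypothetical helpers `extent_subscore` and `total_scores` assume a per-lung percentage of involvement and an opacity label are already available. How exact boundary values (50%, 75%) are binned is also an assumption, since the scale does not specify it.

```python
# Hypothetical helpers: the percentage inputs and label strings are assumptions,
# not part of the study's labelling protocol.

def extent_subscore(percent_involved: float) -> int:
    """Map the percentage of one lung involved (0-100) to the 0-4 extent scale."""
    if not 0 <= percent_involved <= 100:
        raise ValueError("percent_involved must be in [0, 100]")
    if percent_involved == 0:
        return 0  # no involvement
    if percent_involved < 25:
        return 1  # <25%
    if percent_involved <= 50:
        return 2  # 25-50% (50% assumed to fall here)
    if percent_involved <= 75:
        return 3  # 50-75% (75% assumed to fall here)
    return 4      # >75%

# Per-lung degree of opacity, 0-3
OPACITY_SCALE = {"none": 0, "ground glass": 1, "consolidation": 2, "white-out": 3}

def total_scores(right_pct, left_pct, right_opacity, left_opacity):
    """Return (geographic extent 0-8, opacity 0-6), summed over both lungs."""
    extent = extent_subscore(right_pct) + extent_subscore(left_pct)
    opacity = OPACITY_SCALE[right_opacity] + OPACITY_SCALE[left_opacity]
    return extent, opacity

print(total_scores(30, 80, "ground glass", "consolidation"))  # (6, 3)
```

Summing the two lungs is what gives the 0-8 and 0-6 ranges quoted for the totals.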

A spreadsheet was maintained to pair filenames with their respective scores. Fleiss’ kappa for inter-rater agreement was 0.45 for the opacity score and 0.71 for the extent score.
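For readers unfamiliar with the agreement statistic, Fleiss’ kappa compares the observed agreement among a fixed number of raters with the agreement expected by chance from the overall category frequencies. The sketch below is a plain NumPy implementation of the standard formula; it is not the code used in this study, and the example ratings are made up.

```python
import numpy as np

def fleiss_kappa(labels, n_categories: int) -> float:
    """Fleiss' kappa for a (subjects x raters) array of integer category labels."""
    labels = np.asarray(labels)
    n_subjects, n_raters = labels.shape
    # counts[i, j] = number of raters who assigned subject i to category j
    counts = np.zeros((n_subjects, n_categories))
    for j in range(n_categories):
        counts[:, j] = (labels == j).sum(axis=1)
    # Observed agreement for each subject, then averaged over subjects
    p_i = (np.sum(counts**2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the marginal category proportions
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    p_e = np.sum(p_j**2)
    return float((p_bar - p_e) / (1 - p_e))

# Made-up extent scores (0-8) from three raters for five images
ratings = [[2, 2, 3], [0, 0, 0], [4, 4, 4], [1, 2, 1], [6, 5, 6]]
print(fleiss_kappa(ratings, n_categories=9))
```

Note that Fleiss’ kappa treats the scores as unordered categories, so a one-point disagreement counts the same as a four-point one.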

2.3 Non-COVID-19 (Pre-Training) Datasets

Figure 1: Detail of the different features used. The two dataset blocks show that COVID-19 images were not used to train the neural network. The network diagram is split into 3 sections. The feature extraction layers are convolutional layers which transform the image into a 1024-dimensional vector, referred to as the intermediate network features. These features are then transformed by the task prediction layer (a fully connected layer followed by a sigmoid function for each task) into the outputs for each task. The different groupings of outputs used in this work are shown.

Prior to the experiment, the model was trained on the following public datasets, none of which contained COVID-19 cases:

  1. RSNA Pneumonia CXR dataset on Kaggle;

  2. CheXpert dataset from Stanford University [Irvin et al., 2019];

  3. ChestX-ray8 dataset from the National Institutes of Health (NIH) [Wang et al., 2017];

  4. ChestX-ray8 dataset from the NIH with labels from Google [Majkowska et al., 2019];

  5. MIMIC-CXR dataset from MIT [Johnson et al., 2019];

  6. PadChest dataset from the University of Alicante [Bustos et al., 2019];

  7. OpenI [Demner-Fushman et al., 2016].

These seven datasets were manually aligned to each other on 18 common radiological findings (atelectasis, consolidation, infiltration, pneumothorax, edema, emphysema, fibrosis, effusion, pneumonia, pleural thickening, cardiomegaly, nodule, mass, hernia, lung lesion, fracture, lung opacity, and enlarged cardiomediastinum) so that a single model could be trained on all datasets at once. For example, “pleural effusion” from one dataset is treated as the same label as “effusion” from another. In total, 88,079 non-COVID-19 images were used to train the model on these tasks.
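The alignment step can be pictured as a synonym table that projects each dataset's label names onto one shared task list. The sketch below is hypothetical: the `SYNONYMS` entries, the four-task subset, and the use of NaN for findings a dataset does not annotate are assumptions made for illustration, not the exact mapping used for pre-training.

```python
import math

# Hypothetical synonym table: dataset-specific label names -> shared task names
SYNONYMS = {
    "pleural effusion": "effusion",
    "effusion": "effusion",
    "lung opacity": "lung opacity",
    "consolidation": "consolidation",
    "pneumonia": "pneumonia",
}
# A subset of the 18 shared tasks, kept short for the example
COMMON_TASKS = ["consolidation", "effusion", "lung opacity", "pneumonia"]

def align_labels(raw_labels: dict) -> list:
    """Project one image's labels onto COMMON_TASKS.

    NaN marks a task that the source dataset does not annotate (an assumed
    convention), so a training loss could skip it rather than treat it as negative.
    """
    aligned = {task: math.nan for task in COMMON_TASKS}
    for name, value in raw_labels.items():
        task = SYNONYMS.get(name.lower())
        if task is not None:
            aligned[task] = float(value)
    return [aligned[task] for task in COMMON_TASKS]

print(align_labels({"Pleural Effusion": 1, "Pneumonia": 0}))  # [nan, 1.0, nan, 0.0]
```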

2.4 Model, Preprocessing, and Pre-Training

In this study, we used a DenseNet model [Huang et al., 2017] from the TorchXRayVision library [Cohen et al., 2020c, a]. DenseNet models have been shown to predict pneumonia well [Rajpurkar et al., 2017]. Images were resized to 224 × 224 pixels, using a center crop when the image was not square, and pixel values were scaled to [-1024, 1024] for training. More details about the training can be found in [Cohen et al., 2020a].
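The preprocessing described above (center crop to a square, resize to 224 × 224, rescale to [-1024, 1024]) can be sketched in plain NumPy. TorchXRayVision ships its own transforms for these steps; this sketch does not reproduce them, and it assumes an 8-bit grayscale input (maximum value 255) and uses nearest-neighbour resizing purely to stay dependency-free.

```python
import numpy as np

def preprocess(img, size: int = 224, max_val: float = 255.0) -> np.ndarray:
    """Center-crop to a square, resize (nearest neighbour) and rescale to [-1024, 1024].

    Assumes a 2-D grayscale image with values in [0, max_val].
    """
    img = np.asarray(img, dtype=np.float32)
    h, w = img.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    img = img[top:top + side, left:left + side]          # center crop to a square
    idx = (np.arange(size) * side / size).astype(int)    # nearest-neighbour sample positions
    img = img[np.ix_(idx, idx)]                          # resize to size x size
    return (img / max_val) * 2048.0 - 1024.0             # [0, max_val] -> [-1024, 1024]

x = preprocess(np.random.default_rng(0).integers(0, 256, size=(300, 400)))
print(x.shape)  # (224, 224)
```

A real pipeline would use proper interpolation when resizing; nearest neighbour is only a stand-in here.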

Before processing any COVID-19 images, a pre-training step was performed on the seven datasets to train the feature extraction layers and a task prediction layer (shown in Figure 1). This “pre-training” step used a large set of data in order to construct general representations of the lungs and other aspects of CXRs that we could not have learned from the small set of COVID-19 images available. Some of these representations are expected to be relevant to our downstream tasks. Figure 1 details the different ways useful features can be extracted from the pre-trained model.

2.5 Training

As with the non-COVID-19 images used for pre-training, each image from the COVID-19 dataset was preprocessed (resized, center-cropped, rescaled) and then passed through the feature extraction layers and the task prediction layer of the network. The network was trained on the existing datasets and its weights were then frozen; the COVID-19 images were only processed by the network to generate features used in place of the images. As with the seven non-COVID-19 datasets, the feature extraction layers represented each of the 94 COVID-19 images as a 1024-dimensional vector, and the fully connected task prediction layer then produced outputs for each of the 18 original tasks. We build models on the pre-sigmoid outputs.

Linear regression was performed to predict the aforementioned scores (extent of lung involvement and opacity) using these different sets of features in place of the image itself:

  1. Intermediate network features - the result of the convolutional layers applied to the image, a 1024-dimensional vector which is passed to the task prediction layer;

  2. 18 outputs - each image was represented by the 18 (pre-sigmoid) outputs of the pre-trained model;

  3. 4 outputs - a hand-picked subset of (pre-sigmoid) outputs for radiological findings that are more frequent in pneumonia (lung opacity, pneumonia, infiltration, and consolidation);

  4. Lung opacity output - the single (pre-sigmoid) output for lung opacity, used because it is closely related to this task. Note that this network output is distinct from the opacity score that we aim to predict.
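The four feature sets can be derived from one forward pass: the 1024-dimensional vector is the intermediate representation, and the pre-sigmoid outputs are that vector passed through the fully connected task layer. In the sketch below, random weights `W` and `b` stand in for the frozen pre-trained layer, and the task-name strings and their order are hypothetical rather than the library's exact labels.

```python
import numpy as np

# Hypothetical task order; random weights stand in for the frozen task prediction layer.
TASKS = ["Atelectasis", "Consolidation", "Infiltration", "Pneumothorax", "Edema",
         "Emphysema", "Fibrosis", "Effusion", "Pneumonia", "Pleural Thickening",
         "Cardiomegaly", "Nodule", "Mass", "Hernia", "Lung Lesion", "Fracture",
         "Lung Opacity", "Enlarged Cardiomediastinum"]
rng = np.random.default_rng(0)
W, b = rng.normal(size=(18, 1024)), rng.normal(size=18)

def feature_sets(h: np.ndarray) -> dict:
    """Build the four feature sets from intermediate features h of shape (n_images, 1024)."""
    logits = h @ W.T + b                       # pre-sigmoid outputs, one column per task
    col = {task: i for i, task in enumerate(TASKS)}
    four = [col[t] for t in ("Lung Opacity", "Pneumonia", "Infiltration", "Consolidation")]
    return {
        "intermediate": h,                                 # 1024 features
        "18 outputs": logits,                              # 18 features
        "4 outputs": logits[:, four],                      # 4 features
        "lung opacity": logits[:, [col["Lung Opacity"]]],  # 1 feature
    }

h = rng.normal(size=(94, 1024))  # placeholder for the 94 COVID-19 images
for name, X in feature_sets(h).items():
    print(name, X.shape)
```

Fitting a linear regression on each matrix adds one weight per column plus an intercept, which is where the "# parameters" counts in Tables 1 and 2 come from.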

For each experiment, the 94-image COVID-19 dataset was randomly split into train and test sets of roughly equal size (50/50). Multiple timepoints from the same patient were grouped into the same split so that no patient spanned both sets. The random split was repeated to obtain a mean and standard deviation for each performance metric. Because linear regression was used, no early stopping was needed to prevent the model from overfitting; the final model was simply the one minimizing the mean squared error (MSE) on the training set.
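The evaluation protocol above (patient-grouped 50/50 splits, ordinary least squares fitted by minimizing training MSE, repeated to get a mean and standard deviation) can be sketched as follows. The 50 repetitions follow the table captions; the patient IDs and data in the usage example are synthetic, and the helper names are illustrative rather than taken from the released code.

```python
import numpy as np

def grouped_split(patient_ids, rng, test_frac=0.5):
    """Assign whole patients to train or test so that no patient spans both sets."""
    patient_ids = np.asarray(patient_ids)
    patients = rng.permutation(np.unique(patient_ids))
    n_test = int(round(len(patients) * test_frac))
    is_test = np.isin(patient_ids, patients[:n_test])
    return ~is_test, is_test

def fit_ols(X, y):
    """Ordinary least squares with an intercept: the closed-form minimizer of training MSE."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # one weight per feature, then the intercept

def predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def evaluate(X, y, patient_ids, n_splits=50, seed=0):
    """Mean and standard deviation of the test MAE over repeated patient-grouped splits."""
    rng = np.random.default_rng(seed)
    maes = []
    for _ in range(n_splits):
        train, test = grouped_split(patient_ids, rng)
        coef = fit_ols(X[train], y[train])
        maes.append(np.mean(np.abs(predict(coef, X[test]) - y[test])))
    return float(np.mean(maes)), float(np.std(maes))

# Synthetic example: 94 images from up to 60 hypothetical patients, one feature
rng = np.random.default_rng(1)
patient_ids = rng.integers(0, 60, size=94)
X = rng.normal(size=(94, 1))  # e.g. the single "lung opacity" output
y = 2.0 * X[:, 0] + 3.0 + rng.normal(scale=0.5, size=94)
print(evaluate(X, y, patient_ids))
```

With a single feature the fitted model has exactly two parameters (slope and intercept), and because OLS has a closed-form solution there is no training loop to stop early.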

2.6 Saliency maps

To check that the models are looking at reasonable aspects of the images [Reed & Marks, 1999; Zech et al., 2018; Viviano et al., 2019], a saliency map is computed as the gradient of the output prediction with respect to the input image (i.e., how much the prediction changes when a pixel changes). To smooth the saliency map, it is blurred using a 5x5 Gaussian kernel. Note that these saliency maps have limitations and offer only a restricted view into why a model made a prediction [Ross et al., 2017; Viviano et al., 2019].
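The saliency computation can be illustrated on a toy model whose gradient is known in closed form. In this sketch, `f(x) = tanh(sum(w * x))` is a stand-in for the DenseNet plus regression head (a real pipeline would obtain the gradient with automatic differentiation); taking the absolute value of the gradient and the Gaussian width `sigma = 1.0` are assumptions, since only the 5x5 kernel size is stated above.

```python
import numpy as np

def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Normalized 2-D Gaussian kernel (sigma is an assumed value)."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-(ax**2) / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size 2-D filtering with edge padding (the kernel is symmetric)."""
    r = kernel.shape[0] // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

# Toy stand-in model: f(x) = tanh(sum(w * x)), so df/dx = (1 - f**2) * w
rng = np.random.default_rng(0)
w = rng.normal(size=(224, 224)) * 1e-3
x = rng.normal(size=(224, 224))

def saliency(x: np.ndarray) -> np.ndarray:
    f = np.tanh(np.sum(w * x))
    grad = (1 - f**2) * w                         # gradient of the output w.r.t. each pixel
    return blur(np.abs(grad), gaussian_kernel())  # magnitude, then 5x5 Gaussian smoothing

print(saliency(x).shape)  # (224, 224)
```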

Table 1: Performance metrics of each set of features for the Geographic Extent prediction. Evaluation is performed on 50 randomly chosen train/test splits and the metrics here are computed on a held-out test set. R²: coefficient of determination; MAE: mean absolute error; MSE: mean squared error. “4 outputs” refers to lung opacity, pneumonia, infiltration, and consolidation.

Using features | # parameters (fewer is better) | Pearson correlation | R² | MAE | MSE
“lung opacity” output | 1+1 | 0.80 ± 0.05 | 0.60 ± 0.09 | 1.14 ± 0.11 | 2.06 ± 0.34
4 outputs | 4+1 | 0.79 ± 0.05 | 0.57 ± 0.10 | 1.19 ± 0.11 | 2.17 ± 0.37
18 outputs | 18+1 | 0.76 ± 0.08 | 0.47 ± 0.16 | 1.32 ± 0.17 | 2.73 ± 0.89
Intermediate network features | 1024+1 | 0.74 ± 0.08 | 0.43 ± 0.16 | 1.36 ± 0.13 | 2.88 ± 0.58
No data | 0+1 | 0.00 ± 0.00 | -0.08 ± 0.10 | 2.00 ± 0.17 | 5.60 ± 0.95

Table 2: Performance metrics of each set of features for the Opacity Score prediction. Evaluation is performed on 50 randomly chosen train/test splits and the metrics here are computed on a held-out test set. R²: coefficient of determination; MAE: mean absolute error; MSE: mean squared error. “4 outputs” refers to lung opacity, pneumonia, infiltration, and consolidation.

Using features | # parameters (fewer is better) | Pearson correlation | R² | MAE | MSE
“lung opacity” output | 1+1 | 0.78 ± 0.04 | 0.58 ± 0.09 | 0.78 ± 0.05 | 0.86 ± 0.11
4 outputs | 4+1 | 0.78 ± 0.04 | 0.58 ± 0.09 | 0.76 ± 0.05 | 0.87 ± 0.12
18 outputs | 18+1 | 0.73 ± 0.09 | 0.44 ± 0.16 | 0.86 ± 0.11 | 1.15 ± 0.33
Intermediate network features | 1024+1 | 0.66 ± 0.08 | 0.25 ± 0.21 | 1.01 ± 0.09 | 1.54 ± 0.28
No data | 0+1 | -0.00 ± 0.00 | -0.08 ± 0.10 | 1.24 ± 0.10 | 2.26 ± 0.36
Figure 2: Scatter plots showing the alignment between our best model’s predictions and human annotations (ground truth) for the Geographic Extent and Opacity scores. Evaluation is on a held-out test set. The grey dashed line is a perfect prediction. Red lines indicate the error from a perfect prediction. R²: coefficient of determination.

3 Results

Quantitative performance metrics The single “lung opacity” output as a feature yielded the best correlation (0.80), followed by the 4 outputs (lung opacity, pneumonia, infiltration, and consolidation) with 0.79 (Tables 1 and 2). Building a model on only a few outputs provides the best performance. The mean absolute error (MAE) expresses the error in the units of the predicted scores, while the mean squared error (MSE), which squares each error, ranks the different methods by their furthest outliers. One possible reason that fewer features work best is that having fewer parameters prevents overfitting. Some features could also serve as proxies for confounding attributes such as sex or age; excluding them keeps the model from relying on these spurious signals, which would hurt generalization. Hand-selecting feature subsets that are intuitively related to this task imparts domain knowledge as a bias on the model, which improves performance. Thus, the top performing model (using the single “lung opacity” output as a feature) is used for the subsequent qualitative analysis.
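The four metrics reported in Tables 1 and 2 can be computed as below. This is a minimal NumPy sketch of the standard definitions, not the evaluation code released with the paper; the example values are made up.

```python
import numpy as np

def regression_metrics(y_true, y_pred) -> dict:
    """Pearson r, R^2, MAE and MSE for one held-out test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))  # average error in score units
    mse = np.mean(err**2)       # squaring lets the largest errors dominate
    r2 = 1 - np.sum(err**2) / np.sum((y_true - y_true.mean())**2)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return {"pearson": float(r), "r2": float(r2), "mae": float(mae), "mse": float(mse)}

print(regression_metrics([1, 2, 3], [1, 2, 4]))
```

A constant prediction such as the training-set mean (the "No data" rows) cannot exceed R² = 0 on a test set, and it is slightly negative whenever the test mean differs from the training mean, which matches the -0.08 in those rows. Pearson correlation is undefined for a constant prediction (`np.corrcoef` returns NaN here), so such a baseline needs special handling.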

Qualitative analysis of predicted scores Figure 2 shows the top performing model’s predictions (using the single “lung opacity” output as a feature) against the ground-truth scores (given by the blinded experts) on held-out test data. The majority of the data points fall close to the line of unity. The model overestimates scores between 1 and 3 and underestimates scores above 4. However, the predictions generally seem reasonable given the level of agreement between the raters.

Figure 3: A spatial representation of pneumonia-specific features (lung opacity, pneumonia, infiltration, and consolidation) projected into two dimensions (2D) using t-distributed stochastic neighbor embedding (t-SNE) [van der Maaten & Hinton, 2008]. t-SNE preserves local neighborhoods of the high-dimensional (4D) space, so CXR images with similar outputs are close to each other. Features are extracted for all 208 images in the dataset, and the geographic extent prediction is shown for each image. The survival information available in the dataset is represented by the shape of the marker.
(a) Geographic Extent Score: 5, Predicted: 5.3
(b) Geographic Extent Score: 0, Predicted: -0.8
(c) Geographic Extent Score: 2, Predicted: 0.62
(d) Geographic Extent Score: 0, Predicted: 1.05
Figure 4: Examples of correct (a,b) and incorrect (c,d) predictions by the model are shown with a saliency map generated by computing the gradient of the output prediction with respect to the input image and then blurred using a 5x5 Gaussian kernel. The assigned and predicted scores for Geographic Extent are shown to the right.

Studying learned representations In Figure 3, we examine the representation used by one of the best models in order to identify signs of overfitting and to gain insight into the variation in the data. A t-distributed stochastic neighbor embedding (t-SNE) [van der Maaten & Hinton, 2008] is computed on all data (including images that were not scored) to project the features into a two-dimensional (2D) space. Each CXR is represented by a point, and its neighborhood relationships from the higher-dimensional space are preserved. Cases from the survival group tend to cluster together, as do cases from the deceased group. This clustering suggests that the score predictions align with clinical outcomes.

Inspecting saliency maps In Figure 4, we study images that were not seen by the model during training. For most of the results, the model correctly looks at opaque regions of the lungs. Figure 4b shows no signs of opacity and the model focuses on the heart and diaphragm, possibly using them as a brightness reference when determining what qualifies as opaque. Figures 4c and 4d show erroneous predictions.

4 Discussion

In the context of a pandemic and the urgency to contain the crisis, research has increased exponentially in order to alleviate the burden on healthcare systems. However, many prediction models for the diagnosis and prognosis of COVID-19 infection are poorly reported and at high risk of bias and overfitting, so their reported performance is likely optimistic [Wynants et al., 2020]. To prevent premature implementation in hospitals [Ross, 2020], tools must be robustly evaluated along several practical axes [Wiens et al., 2019; Ghassemi et al., 2019; Cohen et al., 2020a]. Indeed, while some AI-assisted tools might be powerful, they do not replace clinical judgment, and their diagnostic performance cannot be assessed or claimed without a proper clinical trial [Nagendran et al., 2020].

Our model’s ability to gauge the severity of COVID-19 lung infections could be used for escalation or de-escalation of care as well as for monitoring treatment efficacy, especially in the intensive care unit (ICU) [Toussie et al., 2020]. A score combining geographic extent and degree of opacity allows clinicians to compare CXR images with each other using a quantitative and objective measure. Such scoring can also be automated for large-scale analyses.

Existing work focuses on predicting severity from a variety of clinical indicators, including findings from chest imaging [Jiang et al., 2020; Shi et al., 2020]. Models such as the one presented in this work can complement and improve these models and may help clinicians make decisions from CXR rather than CT.

Challenges in creating a predictive model include labelling the data, achieving good inter-rater agreement, and learning a representation that will generalize to new images when so few labelled images are available. In the case of a predictive tool for COVID-19 CXR images, the lack of a large public database made it difficult to conduct large-scale robust evaluations. This small number of samples prevents proper cohort selection, which is a limitation of this study and exposes our evaluation to sample bias. However, we use a model that was trained on a large dataset with related tasks, which provides a robust feature extractor that is not biased by the COVID-19 data and allows us to learn only two parameters for our best linear regression model. Restricting the complexity of the learned model in this way reduces the possibility of overfitting.

Our evaluation could be improved by obtaining new cohorts labelled with the same severity score to ascertain the generalization of our model. Also, it is unknown whether these radiographic scores of disease severity reflect actual functional or clinical outcomes, as the open data do not include such outcome measures. We make the images, labels, model, and code from this work public so that other groups can perform follow-up evaluations.


This research is based on work partially supported by the CIFAR AI and COVID-19 Catalyst Grants. This work utilized the supercomputing facilities managed by Compute Canada and Calcul Quebec. We thank for making data available for our research.


This project is approved by the University of Montreal’s Ethics Committee #CERSES-20-058-D


  • Bustos et al. [2019] Bustos, Aurelia, Pertusa, Antonio, Salinas, Jose-Maria, and de la Iglesia-Vayá, Maria. PadChest: A large chest x-ray image dataset with multi-label annotated reports. arXiv preprint, 1 2019.
  • Cohen et al. [2020a] Cohen, Joseph Paul, Hashir, Mohammad, Brooks, Rupert, and Bertrand, Hadrien. On the limits of cross-domain generalization in automated X-ray prediction. In Medical Imaging with Deep Learning, 2020a.
  • Cohen et al. [2020b] Cohen, Joseph Paul, Morrison, Paul, and Dao, Lan. COVID-19 Image Data Collection, 2020b.
  • Cohen et al. [2020c] Cohen, Joseph Paul, Viviano, Joseph, Hashir, Mohammad, and Bertrand, Hadrien. TorchXRayVision: A library of chest X-ray datasets and models, 2020c.
  • Demner-Fushman et al. [2016] Demner-Fushman, Dina, Kohli, Marc D., Rosenman, Marc B., Shooshan, Sonya E., Rodriguez, Laritza, Antani, Sameer, Thoma, George R., and McDonald, Clement J. Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association, 3 2016. doi: 10.1093/jamia/ocv080.
  • Ghassemi et al. [2019] Ghassemi, Marzyeh, Naumann, Tristan, Schulam, P., Beam, Andrew L., Chen, Irene Y., and Ranganath, Rajesh. Practical guidance on artificial intelligence for health-care data, 8 2019.
  • Huang et al. [2017] Huang, Gao, Liu, Zhuang, van der Maaten, Laurens, and Weinberger, Kilian Q. Densely Connected Convolutional Networks. In Computer Vision and Pattern Recognition, 2017.
  • Irvin et al. [2019] Irvin, Jeremy, Rajpurkar, Pranav, Ko, Michael, Yu, Yifan, Ciurea-Ilcus, Silviana, Chute, Chris, Marklund, Henrik, Haghgoo, Behzad, Ball, Robyn, Shpanskaya, Katie, Seekins, Jayne, Mong, David A., Halabi, Safwan S., Sandberg, Jesse K., Jones, Ricky, Larson, David B., Langlotz, Curtis P., Patel, Bhavik N., Lungren, Matthew P., and Ng, Andrew Y. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. In AAAI Conference on Artificial Intelligence, 1 2019.
  • Jiang et al. [2020] Jiang, Xiangao, Coffee, Megan, Bari, Anasse, Wang, Junzhang, Jiang, Xinyue, Huang, Jianping, Shi, Jichan, Dai, Jianyi, Cai, Jing, Zhang, Tianxiao, Wu, Zhengxing, He, Guiqing, and Huang, Yitong. Towards an Artificial Intelligence Framework for Data-Driven Prediction of Coronavirus Clinical Severity. Computers, Materials & Continua, 2020. doi: 10.32604/cmc.2020.010691.
  • Jin et al. [2020] Jin, Ying-Hui, Cai, Lin, Cheng, Zhen-Shun, Cheng, Hong, Deng, Tong, Fan, Yi-Pin, Fang, Cheng, Huang, Di, Huang, Lu-Qi, Huang, Qiao, Han, Yong, Hu, Bo, Hu, Fen, Li, Bing-Hui, Li, Yi-Rong, Liang, Ke, Lin, Li-Kai, Luo, Li-Sha, Ma, Jing, Ma, Lin-Lu, Peng, Zhi-Yong, Pan, Yun-Bao, Pan, Zhen-Yu, Ren, Xue-Qun, Sun, Hui-Min, Wang, Ying, Wang, Yun-Yun, Weng, Hong, Wei, Chao-Jie, Wu, Dong-Fang, Xia, Jian, Xiong, Yong, Xu, Hai-Bo, Yao, Xiao-Mei, Yuan, Yu-Feng, Ye, Tai-Sheng, Zhang, Xiao-Chun, Zhang, Ying-Wen, Zhang, Yin-Gao, Zhang, Hua-Min, Zhao, Yan, Zhao, Ming-Juan, Zi, Hao, Zeng, Xian-Tao, Wang, Yong-Yan, and Wang, Xing-Huan. A rapid advice guideline for the diagnosis and treatment of 2019 novel coronavirus (2019-nCoV) infected pneumonia (standard version). Military Medical Research, 2020. doi: 10.1186/s40779-020-0233-6.
  • Johnson et al. [2019] Johnson, Alistair E. W., Pollard, Tom J., Berkowitz, Seth J., Greenbaum, Nathaniel R., Lungren, Matthew P., Deng, Chih-ying, Mark, Roger G., and Horng, Steven. MIMIC-CXR: A large publicly available database of labeled chest radiographs. Nature Scientific Data, 1 2019. doi: 10.1038/s41597-019-0322-0.
  • Majkowska et al. [2019] Majkowska, Anna, Mittal, Sid, Steiner, David F., Reicher, Joshua J., McKinney, Scott Mayer, Duggan, Gavin E., Eswaran, Krish, Cameron Chen, Po-Hsuan, Liu, Yun, Kalidindi, Sreenivasa Raju, Ding, Alexander, Corrado, Greg S., Tse, Daniel, and Shetty, Shravya. Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation. Radiology, 12 2019. doi: 10.1148/radiol.2019191293.
  • Nagendran et al. [2020] Nagendran, Myura, Chen, Yang, Lovejoy, Christopher A., Gordon, Anthony C., Komorowski, Matthieu, Harvey, Hugh, Topol, Eric J., Ioannidis, John P.A., Collins, Gary S., and Maruthappu, Mahiben. Artificial intelligence versus clinicians: Systematic review of design, reporting standards, and claims of deep learning studies in medical imaging. The BMJ, 3 2020. doi: 10.1136/bmj.m689.
  • Ng et al. [2020] Ng, Ming-Yen, Lee, Elaine Y P, Yang, Jin, Yang, Fangfang, Li, Xia, Wang, Hongxia, Lui, Macy Mei-sze, Lo, Christine Shing-Yen, Leung, Barry, Khong, Pek-Lan, Hui, Christopher Kim-Ming, Yuen, Kwok-yung, and Kuo, Michael David. Imaging Profile of the COVID-19 Infection: Radiologic Findings and Literature Review. Radiology: Cardiothoracic Imaging, 2 2020. doi: 10.1148/ryct.2020200034.
  • O’Grady et al. [2020] O’Grady, Siobhán, Noack, Rick, Mettler, Katie, Knowles, Hannah, Armus, Teo, Wagner, John, Shammas, Brittany, and Berger, Miriam. U.S. covid-19 death toll surpasses 2,000 in one day and 100,000 total worldwide, 2020.
  • Rajpurkar et al. [2017] Rajpurkar, Pranav, Irvin, Jeremy, Zhu, Kaylie, Yang, Brandon, Mehta, Hershel, Duan, Tony, Ding, Daisy, Bagul, Aarti, Langlotz, Curtis, Shpanskaya, Katie, Lungren, Matthew P., and Ng, Andrew Y. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arxiv, 11 2017.
  • Reed & Marks [1999] Reed, Russell D. and Marks, Robert J. Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. MIT Press, 1999.
  • Ross et al. [2017] Ross, Andrew, Hughes, Michael C, and Doshi-Velez, Finale. Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. In International Joint Conference on Artificial Intelligence, 2017.
  • Ross [2020] Ross, Casey. AI used to predict Covid-19 patients’ decline before proven to work, 2020.
  • Shi et al. [2020] Shi, Yu, Yu, Xia, Zhao, Hong, Wang, Hao, Zhao, Ruihong, and Sheng, Jifang. Host susceptibility to severe COVID-19 and establishment of a host risk score: Findings of 487 cases outside Wuhan. Critical Care, 12 2020. doi: 10.1186/s13054-020-2833-7.
  • Strickland [2020] Strickland, Eliza. AI Can Help Hospitals Triage COVID-19 Patients, 2020.
  • Toussie et al. [2020] Toussie, Danielle, Voutsinas, Nicholas, Finkelstein, Mark, Cedillo, Mario A, Manna, Sayan, Maron, Samuel Z, Jacobi, Adam, Chung, Michael, Bernheim, Adam, Eber, Corey, Concepcion, Jose, Fayad, Zahi, and Gupta, Yogesh Sean. Clinical and Chest Radiography Features Determine Patient Outcomes In Young and Middle Age Adults with COVID-19. Radiology, 5 2020. doi: 10.1148/radiol.2020201754.
  • van der Maaten & Hinton [2008] van der Maaten, Laurens and Hinton, Geoffrey. Visualizing Data using t-SNE. Journal of Machine Learning Research, 2008.
  • Viviano et al. [2019] Viviano, Joseph D., Simpson, Becks, Dutil, Francis, Bengio, Yoshua, and Cohen, Joseph Paul. Underwhelming Generalization Improvements From Controlling Feature Attribution. arxiv:1910.00199, 10 2019.
  • Wang et al. [2017] Wang, Xiaosong, Peng, Yifan, Lu, Le, Lu, Zhiyong, Bagheri, Mohammadhadi, and Summers, Ronald M. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. In Computer Vision and Pattern Recognition, 2017. doi: 10.1109/CVPR.2017.369.
  • Wiens et al. [2019] Wiens, Jenna, Saria, Suchi, Sendak, Mark, Ghassemi, Marzyeh, Liu, Vincent X., Doshi-Velez, Finale, Jung, Kenneth, Heller, Katherine, Kale, David, Saeed, Mohammed, Ossorio, Pilar N., Thadaney-Israni, Sonoo, and Goldenberg, Anna. Do no harm: a roadmap for responsible machine learning for health care. Nature Medicine, 8 2019. doi: 10.1038/s41591-019-0548-6.
  • Wilson & Moulson [2020] Wilson, Joseph and Moulson, Geir. Children in Spain allowed to play outdoors as country eases COVID-19 lockdown, 2020.
  • Wong et al. [2019] Wong, Ho Yuen Frank, Lam, Hiu Yin Sonia, Fong, Ambrose Ho Tung, Leung, Siu Ting, Chin, Thomas Wing Yan, Lo, Christine Shing Yen, Lui, Macy Mei Sze, Lee, Jonan Chun Yin, Chiu, Keith Wan Hang, Chung, Tom, Lee, Elaine Yuen Phin, Wan, Eric Yuk Fai, Hung, Fan Ngai Ivan, Lam, Tina Poy Wing, Kuo, Michael, and Ng, Ming Yen. Frequency and Distribution of Chest Radiographic Findings in COVID-19 Positive Patients. Radiology, 3 2019. doi: 10.1148/radiol.2020201160.
  • Wynants et al. [2020] Wynants, Laure, Van Calster, Ben, Bonten, Marc M.J., Collins, Gary S., Debray, Thomas P.A., De Vos, Maarten, Haller, Maria C., Heinze, Georg, Moons, Karel G.M., Riley, Richard D., Schuit, Ewoud, Smits, Luc J.M., Snell, Kym I.E., Steyerberg, Ewout W., Wallisch, Christine, and Van Smeden, Maarten. Prediction models for diagnosis and prognosis of covid-19 infection: Systematic review and critical appraisal. The BMJ, 4 2020. doi: 10.1136/bmj.m1328.
  • Yoon et al. [2020] Yoon, Soon Ho, Lee, Kyung Hee, Kim, Jin Yong, Lee, Young Kyung, Ko, Hongseok, Kim, Ki Hwan, Park, Chang Min, and Kim, Yun-Hyeon. Chest Radiographic and CT Findings of the 2019 Novel Coronavirus Disease (COVID-19): Analysis of Nine Patients Treated in Korea. Korean Journal of Radiology, 2020. doi: 10.3348/kjr.2020.0132.
  • Zech et al. [2018] Zech, John R., Badgeley, Marcus A., Liu, Manway, Costa, Anthony B., Titano, Joseph J., and Oermann, Eric Karl. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Medicine, 7 2018. doi: 10.1371/journal.pmed.1002683.