
A Comparative Study on Early Detection of COVID-19 from Chest X-Ray Images

In this study, our first aim is to evaluate the ability of recent state-of-the-art Machine Learning techniques for the early detection of COVID-19 from plain chest X-ray images. Both compact classifiers and deep learning approaches are considered in this study. Furthermore, we propose a recent compact classifier, the Convolutional Support Estimator Network (CSEN) approach, for this purpose since it is well-suited for a scarce-data classification task. Finally, this study introduces a new benchmark dataset called Early-QaTa-COV19, which consists of 175 early-stage COVID-19 pneumonia samples (very limited or no infection signs) labelled by medical doctors and 1579 samples for the control (normal) class. A detailed set of experiments shows that the CSEN achieves the top (over 98.5%) sensitivity with over 96% specificity. Moreover, transfer learning over the deep CheXNet fine-tuned with the augmented data produces the leading performance among the deep networks with 97.14% sensitivity and 99.49% specificity.


I Introduction

Coronavirus disease (COVID-19) became a global outbreak caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), which was first reported in Wuhan city of China in December 2019. The transmission rate of COVID-19 is so high that it rapidly spread over China and 33 other countries in just two months [cov-19]. Consequently, the World Health Organization (WHO) declared COVID-19 a pandemic on the 11th of March, 2020. Although infected patients tend to have mild and unspecific symptoms [cov-19_2] such as fever, myalgia or fatigue, and cough, the disease seriously affects people in high-risk groups, especially the elderly. Up to now, COVID-19 has caused hundreds of thousands of fatalities from over six million confirmed cases.

There have been different detection methods for COVID-19. In clinics, reverse transcription polymerase chain reaction (RT-PCR) has been used and is held as the reference method [cov-19_2] for COVID-19 detection. The WHO also recommends that suitable specimens rapidly collected from suspect cases should be tested with nucleic acid amplification tests such as RT-PCR [world2020laboratory], which implies the vital role of RT-PCR in preventing the spread of the disease. However, RT-PCR is known to have a low sensitivity, and the following reasons for false negatives can be listed [world2020laboratory]: the specimen might be collected in early or late stages of the disease, the quality of the specimen might be low (it may contain too little acceptable human DNA), or other reasons such as PCR inhibition and possible virus mutations. Overall, it is reported in [pcr_positive_rate] that RT-PCR has a limited total positive rate for throat samples, and low positive rates occur especially in mild cases. To this end, there are studies [cov-19, ct, sensitivity_ct] that investigate the usage of Chest-CTs and the correlation between Chest-CT and RT-PCR tests as diagnostic tools. It is stated in [cov-19] that Chest-CT scans have positive findings for a considerable portion of negative RT-PCR samples, and [ct] suggests repeating swab testing for the cases where CT scans have suspicious findings even though the RT-PCR results are negative. Finally, [sensitivity_ct] reports the sensitivity of Chest-CT to be notably higher than that of RT-PCR.

Fig. 1: (first row) Samples of COVID-19 pneumonia with very limited or no visible sign of COVID-19, and (second row) normal (healthy) class from Early-QaTa-COV19 dataset.

Although the above-mentioned studies propose to use Chest-CT scans rather than RT-PCR to detect COVID-19 in epicenters, where RT-PCR has a low sensitivity for mild cases, there are several limitations of CT scans such as the time for image acquisition, the associated cost, and the availability of CT devices. On the other hand, X-ray imaging is a highly available and faster diagnostic tool. Unlike CT, X-ray imaging is also cheaper, and patients are exposed to less radiation [ct_harm] during the acquisition process. Another advantage is that there are portable X-ray devices, and hence, as stated in [xray_ad], X-ray can reduce the risk of contamination compared to CT for suspects who could spread the disease along the transport route. Overall, chest X-ray images can be an alternative to other diagnostic tools (for example, RT-PCR) for COVID-19 detection, especially in heavily affected areas where the detection delay is critical and the resources are limited.

The outbreak has brought the urgent need for an automated, accurate, and robust COVID-19 detection/recognition system that can guide the practitioner to diagnose suspects, especially in the early stages. For example, many countries suffer from incorrect infection statistics because of the time-consuming nature of the manual diagnostic tools [Xray1]. Several studies [Xray1, Xray2, Xray3, exact4, csen_covid] propose to use X-ray images for automated COVID-19 recognition. However, all of them except [csen_covid] have been evaluated over only a small amount of data, e.g., the largest one includes only a few hundred X-ray images with only a few COVID-19 samples. To address this need, in an earlier study [csen_covid], we compiled the largest dataset in this domain, called QaTa-COV19, with chest X-ray images from COVID-19 patients. The compiled dataset is not only the largest dataset in this domain, but it also has additional categories for different pneumonia types, bacterial and viral, in addition to the normal (control) class.

As stated in [cov-19, cov-19_2], early detection plays a vital role in preventing the spread of the disease by detecting infected people, isolating them, starting the treatment, and preventing possible secondary infections in the same patient. On the one hand, COVID-19 detection from chest X-ray images is a straightforward task when the disease is already in a late stage and the patient's X-ray shows moderate or severe signs of infection. However, during the early stage, this can be difficult or perhaps not feasible at all, even for an expert medical doctor (MD). For example, in many studies [cov-19_2, xray_ad, chest_early], it is stated that chest X-ray images are not as sensitive as CT scans for early detection where the symptoms are mild, and they further claim that there can be traces of the infection that can only be detected by MDs in severe patients. Therefore, in this study, our first aim is to investigate state-of-the-art Machine Learning (ML) approaches for early COVID-19 detection from chest X-ray images. To accomplish this objective, we have first compiled a new benchmark dataset called Early-QaTa-COV19, which is formed from QaTa-COV19 with some additional images. For this purpose, X-ray images from the COVID-19 patients who are in the early stages of the disease are selected by a group of MDs. As shown for some of them in Fig. 1, these samples have limited or no visible sign of COVID-19 pneumonia observable by the human eye. Accordingly, the Early-QaTa-COV19 dataset consists of 175 early-stage COVID-19 samples (with no or very limited infection signs) and 1579 samples from the control (normal) class. The Early-QaTa-COV19 dataset has several unique properties: first and foremost, it is extracted from QaTa-COV19, the largest benchmark dataset ever formed in this domain, and further populated with new X-ray images. Next, the dataset is the most diverse database, encapsulating X-ray images from numerous countries (e.g., Italy, Spain, China, etc.) and different X-ray machines. Consequently, the images vary in quality, resolution, and noise level, as shown in Fig. 2.

As a consequence of recent advances in Deep Learning, techniques based on Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks such as image segmentation, recognition, and object detection. However, there are certain limitations of approaches based on deep CNNs: the requirement of a large dataset to achieve the required generalization capability of such deep networks, and the necessity of additional graphical processing unit (GPU) hardware to achieve a reasonable inference time. To overcome these limitations, transfer learning can be applied in two ways: (i) using a suitable pre-trained model without fine-tuning to extract features, which are then used by a compact classifier that does not require a large amount of training data, or (ii) using the pre-trained deep model as the initialization and then performing fine-tuning with the limited data.

Fig. 2: Sample X-ray images from Early-QaTa-COV19 in different quality, resolution and noise level and showing no or very limited sign of COVID-19 pneumonia.

Unlike deep learners, traditional supervised approaches can also be utilized, especially when the data is scarce. For example, representation-based classification approaches, consisting of Sparse Representation based Classification (SRC) [SRC1, SRC2] and Collaborative Representation based Classification (CRC) [collaborative], are proven to perform well with limited data. In representation-based classification, a dictionary D is formed by stacking samples from the training set. Then, when a test sample y is introduced, it is assumed that the query sample can be represented as a linear combination of the atoms in D. Therefore, the estimated representation coefficients, obtained by solving y = Dx, carry enough information about the class of y. For example, SRC approaches compute sparse solutions: the estimate has just enough non-zero coefficients, where only the dictionary samples sharing the class of the query contribute. Although SRC approaches [SRC1, SRC2] provide slightly improved results compared to CRC, they are iterative and computationally complex. On the other hand, CRC provides a non-iterative and relatively faster alternative via a least-squares solution, yet it produces comparable results, as presented in [collaborative].

The Convolutional Support Estimator Network (CSEN), introduced in a recent work [CSEN], is proposed to combine traditional representation-based classification with a learning-based methodology. We define the support set as the locations of the non-zero elements in x. Accordingly, the support set is more important than the exact values of x since it reveals the class information of the query. Hence, it was shown in [CSEN] that reconstructing the sparse x with SRC methods may not be needed to improve the performance of representation-based classification. Consequently, CSENs [CSEN] provide superior classification performance and computational efficiency compared to other representation-based methods by directly learning the mapping from the query sample to the corresponding support set from a small amount of training data. Moreover, CSENs were evaluated in our previous study [csen_covid] for COVID-19 recognition on the benchmark QaTa-COV19 dataset, where they achieved high sensitivity and specificity for COVID-19 recognition. Therefore, with their ability to perform well with a limited training dataset, both traditional representation-based classifiers and CSENs are good candidates for the Early-QaTa-COV19 dataset.

Overall, in this study, we propose to use the CSEN approach for the early detection of COVID-19 directly from X-ray images. To this end, for the first time, the CSEN approach is compared against several state-of-the-art approaches, including deep CNNs, through an extensive set of evaluations. The evaluated methods include compact and deep classifiers using the Early-QaTa-COV19 dataset. In the former group, SRC, CRC, CSEN, Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN) classifiers are evaluated. The pre-trained CheXNet model proposed in [chexnet] is used to produce highly representative features. This CheXNet version is based on the Dense Convolutional Network with 121 layers (DenseNet-121) [DenseNet], which is fine-tuned from ImageNet weights over frontal chest X-ray images from 14 pathology classes. In the latter group, we evaluate the following deep networks: DenseNet-121 initialized with CheXNet weights, and DenseNet-121, ResNet-50 [resnet50], and Inception-v3 [inception] networks initialized with ImageNet weights. For the former group, we perform only limited data augmentation for class balancing, whereas significant data augmentation is performed for the latter group.

The results demonstrate that it is possible to achieve a robust and highly accurate early COVID-19 detection with a tolerable false alarm rate by CSENs using the deep features. On the other hand, it is shown that deep learners provide comparable sensitivity levels with improved specificity when they are trained using intense data augmentation.

The rest of the paper is organized as follows: a brief overview and preliminaries will be provided related to generic sparse representation in Section II. Next, the state-of-the-art classification methods used in this study are detailed in Section III. The experimental results with the benchmark Early-QaTa-COV19 dataset are presented in Section IV. Finally, Section V concludes the paper.

Fig. 3: Feature extraction pipeline from the pre-trained CheXNet, which is originally a DenseNet-121 type of deep network trained on the ChestX-ray14 dataset. 1024-D feature vectors are extracted for the compact classifiers trained for the early detection of COVID-19.

II Background and Preliminaries

In this section, a brief overview of the representation-based classification approaches used in this study is provided. Accordingly, we define the following notation: ||x||_p denotes the l_p-norm of a vector x for p >= 1, i.e., ||x||_p = (sum_i |x_i|^p)^(1/p). The l_1-norm and l_0-norm of x are defined as ||x||_1 = sum_i |x_i| and ||x||_0 = |{i : x_i != 0}| (the number of non-zero entries), respectively.

Additionally, a signal s can be called strictly k-sparse if there is a proper domain Phi that can represent the signal with at most k non-zero coefficients, i.e., s = Phi x with ||x||_0 <= k. In other words, the signal can be represented in some domain, Phi, using only a small number of basis vectors. Hence, we define the (sparse) support set of the signal x as the set indexing the non-zero coefficients of x that correspond to these basis components, Lambda := {i : x_i != 0}.
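As a concrete, minimal illustration of these definitions (values chosen arbitrarily), the following sketch constructs a strictly 3-sparse vector and reads off its support set and norms with NumPy:

```python
import numpy as np

# A strictly 3-sparse coefficient vector x in R^10 (illustrative values only).
x = np.zeros(10)
x[[1, 4, 7]] = [0.8, -1.2, 0.5]

support = np.flatnonzero(x)      # support set: indices of non-zero entries -> [1, 4, 7]
l0 = np.count_nonzero(x)         # l0 "norm": number of non-zero coefficients -> 3
l1 = np.sum(np.abs(x))           # l1 norm -> 2.5
l2 = np.linalg.norm(x)           # l2 norm
```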

The signal s can be represented in a subspace Phi, i.e., s = Phi x. Accordingly, when s is measured by a compression matrix A, it can be sparse coded in the equivalent dictionary D = A Phi as follows:

\mathbf{y} = \mathbf{A}\mathbf{s} = \mathbf{A}\boldsymbol{\Phi}\mathbf{x} = \mathbf{D}\mathbf{x},    (1)

where D = A Phi and A is the compression matrix for s. If the signal is k-sparse in the sparsifying basis, then the solution of

\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{s.t.} \quad \mathbf{y} = \mathbf{D}\mathbf{x}    (2)

is unique if ||x||_0 < spark(D)/2 [spark]. Hence, it can be said based on (2) that pairs of at most k-sparse signals can be distinguished in the equivalent dictionary D provided that 2k < spark(D).

As the above-mentioned l_0-minimization problem is non-convex and NP-hard, its relaxation by the closest convex norm, the l_1-norm, can be applied:

\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{s.t.} \quad \mathbf{y} = \mathbf{D}\mathbf{x},    (3)

which is known as Basis Pursuit [BP].
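The l_1 relaxation in (3) can be made concrete with a small sketch. The paper does not prescribe a particular Basis Pursuit solver here; the code below, given only for illustration, uses the standard linear-programming reformulation (x split into non-negative parts u and v) and SciPy's generic LP solver:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, y):
    """Solve min ||x||_1 s.t. D x = y via the LP reformulation x = u - v,
    u, v >= 0, minimizing sum(u) + sum(v)."""
    m, n = D.shape
    c = np.ones(2 * n)                      # objective: sum(u) + sum(v)
    A_eq = np.hstack([D, -D])               # equality constraint: D u - D v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v

# Toy check: recover a 2-sparse vector from 8 random measurements in R^20.
rng = np.random.default_rng(0)
D = rng.standard_normal((8, 20))
x_true = np.zeros(20)
x_true[[3, 11]] = [1.0, -2.0]
x_hat = basis_pursuit(D, D @ x_true)
```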

On the other hand, as previously discussed, in representation-based classification [SRC2, collaborative, CSEN], estimating the support set Lambda would be more beneficial than the signal recovery itself. Let E(.) be a support estimator for the linear measurement scheme y = Dx + z with additive noise z:

\hat{\Lambda} = \mathcal{E}(\mathbf{y}, \mathbf{D}),    (4)

where the output is the estimated support set.

In practice, the performance of the recovery of Lambda is tied to the recovery performance of the sparse signal in traditional support estimation (SE) methods, since they are based on first applying a signal recovery method and then applying component-wise thresholding over the estimated signal to compute the support estimate. However, in [CSEN], we have shown that such a two-step approach most likely causes a noisy estimation, while CSEN is able to learn sparse patterns and accomplish better SE compared to the competing methods. The reader is referred to [CSEN] for a more detailed survey and evaluation of support estimation performance, indicating the limitations and drawbacks of traditional methods compared to the proposed CSEN approach.

III Methods for Early Detection of COVID-19

III-A Compact Approaches for Early Detection

In this section, we present the state-of-the-art methodologies and explain our configurations for their application to the early detection of COVID-19. First, we present the feature extraction procedure along with the compact classifier approaches of the first group; then, a detailed discussion is provided on the chosen deep networks for the early detection problem. Note that the methods in the first group are selected considering their suitability for the early detection task where the training data is scarce, whereas heavy data augmentation is needed for the second group of methods.

III-A1 Feature extraction by CheXNet

Fig. 4: Representation-based classification pipeline for the early detection of COVID-19 using chest X-ray images.

Traditional ML approaches need a feature extraction step for classification. In accordance with the purpose of this study, we utilize the pre-trained CheXNet model [chexnet], which was originally proposed to detect pneumonia cases from chest X-ray images. The network is based on the DenseNet-121 architecture with some modifications. In [chexnet], DenseNet-121 is modified by adding a 14-neuron output layer at the end to train the network over the benchmark ChestX-ray14 dataset [XrayDataset], which consists of 14 different pathology classes. The network is initialized with ImageNet weights, and it is fine-tuned over the frontal chest X-ray images of ChestX-ray14. The fine-tuning is performed with a modified loss function, which is the sum of the unweighted binary cross-entropies computed for each class. It is reported in [chexnet] that CheXNet produces the best results on the ChestX-ray14 dataset, and it also achieves better performance levels than the average of radiologists' decisions.

In this study, the pre-trained CheXNet is used to extract 1024-D feature vectors by taking the output of the global pooling layer just before the classification layer, as illustrated in Fig. 3. Then, dimensionality reduction is applied over the extracted features with principal component analysis (PCA) by keeping the leading principal components, where the PCA projection matrix is computed over the training data. Then, data normalization is applied over the projected features to have zero mean and unit variance for the MLP, SVM, and k-NN classifiers, and zero mean and unit norm for the SRC, CRC, and CSEN approaches.
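For illustration, the sketch below mirrors this feature-extraction pipeline with off-the-shelf components. It is a hedged approximation: Keras ships ImageNet weights for DenseNet-121 (the actual CheXNet checkpoint would have to be loaded separately), the number of retained principal components is a placeholder, and train_images is assumed to be prepared elsewhere:

```python
import numpy as np
from sklearn.decomposition import PCA
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.applications.densenet import preprocess_input

# DenseNet-121 backbone with global average pooling -> 1024-D feature vectors.
# ImageNet weights act as a stand-in for the CheXNet checkpoint.
backbone = DenseNet121(weights="imagenet", include_top=False, pooling="avg")

def extract_features(images):
    """images: float array of shape (N, 224, 224, 3) with values in [0, 255]."""
    return backbone.predict(preprocess_input(images), verbose=0)   # (N, 1024)

train_feats = extract_features(train_images)   # train_images assumed prepared elsewhere

# Dimensionality reduction; n_components is a placeholder value.
pca = PCA(n_components=256)
train_proj = pca.fit_transform(train_feats)

# Zero-mean / unit-variance scaling for MLP, SVM, and k-NN ...
mu, sigma = train_proj.mean(axis=0), train_proj.std(axis=0) + 1e-8
train_std = (train_proj - mu) / sigma
# ... and zero-mean / unit-norm samples for SRC, CRC, and CSEN.
train_unit = train_proj - train_proj.mean(axis=0)
train_unit /= np.linalg.norm(train_unit, axis=1, keepdims=True)
```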

III-A2 Sparse Representation based Classification (SRC)

In SRC, when a query sample y is introduced, it is expected that the estimated sparse code has its non-zero coefficients at the locations corresponding to the samples in the dictionary D that share the same class as the query. SRC techniques have been used in many different classification tasks such as face recognition [SRC2], hyperspectral image classification [hyperspecral], and human action recognition [human-action]. In the following, we briefly give more information about the SRC scheme.

Since, in representation-based classification techniques, the signal may not be exactly k-sparse due to the correlation between the samples in the dictionary, one alternative is to model the scheme with an additive noise z: y = Dx + z. In this case, stable recovery of the sparse signal can still be possible even though exact recovery is not, where the estimation error ||x - x_hat||_2 is bounded by a small constant times ||z||_2 for the stable solution. For example, using the Lasso formulation:

\min_{\mathbf{x}} \left\{ \|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2^2 + \lambda \|\mathbf{x}\|_1 \right\},    (5)

it is shown in [lasso-stable] that it is possible to recover the support exactly or partially in noise-free or noisy conditions, respectively.

The correlation between the samples of different classes has led many approaches to use a different strategy instead of solving (5) directly. Accordingly, [SRC2] proposes a four-step approach: (i) normalize the atoms of D and the query sample y to have unit l_2-norm; (ii) perform sparse recovery by solving (5) to obtain the coefficient estimate; (iii) compute the class-wise residuals using only the estimated coefficients corresponding to each class; (iv) predict the class with the minimum residual. Since such a four-step approach brings additional performance improvements, many SRC studies follow a similar procedure, e.g., [human-action, hyperspecral].
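A minimal sketch of this four-step scheme is given below. It is only illustrative: the l_1-regularized recovery in step (ii) is delegated to scikit-learn's Lasso instead of the dedicated solvers (Dalm, Homotopy, etc.) evaluated later in the paper, and the regularization weight is an arbitrary placeholder:

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_predict(D, labels, y, alpha=0.01):
    """D: (d, N) dictionary with training features as columns, labels: (N,)
    class ids of the atoms, y: (d,) query feature, alpha: illustrative weight."""
    # (i) normalize the atoms and the query to unit l2-norm
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    yn = y / np.linalg.norm(y)
    # (ii) sparse recovery: min ||y - D x||_2^2 + alpha * ||x||_1
    x_hat = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(Dn, yn).coef_
    # (iii) class-wise residuals using only the coefficients of each class
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x_hat, 0.0)
        residuals[c] = np.linalg.norm(yn - Dn @ x_c)
    # (iv) predict the class with the minimum residual
    return min(residuals, key=residuals.get)
```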

III-A3 Collaborative Representation based Classification (CRC)

The study in [collaborative] proposes to use l_2-minimization instead of l_1-minimization in (5):

\min_{\mathbf{x}} \left\{ \|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2^2 + \lambda \|\mathbf{x}\|_2^2 \right\}.    (6)

Hence, the solution can be computed from the closed form, (D^T D + lambda*I)^{-1} D^T y. In other words, instead of searching for a sparse solution, this approach utilizes the collaborative representation (as the CRC term states) among the atoms of the dictionary due to the least-squares minimization. Consequently, CRC is considerably faster than iterative l_1-minimization recovery algorithms. CRC is used in [collaborative] by modifying the second step of the four-step solution of Section III-A2, replacing the sparse estimate with its closed-form estimate. It is also reported in [collaborative] that CRC performance on different classification problems is comparable with l_1-minimization based approaches (and even better than some of them) for high compression rates. In this work, we use both the SRC and CRC approaches and provide comparative evaluations for the early detection task, as presented in Fig. 4.
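Since the CRC estimate is a closed-form least-squares solution, the whole classifier reduces to one matrix product per query. The sketch below follows the four-step framework with step (ii) replaced by the ridge-regression solution; the regularization value is a placeholder:

```python
import numpy as np

def crc_predict(D, labels, y, lam=0.01):
    """Collaborative representation: x_hat = (D^T D + lam*I)^{-1} D^T y followed
    by the same class-wise residual rule as SRC; lam is an illustrative value."""
    d, N = D.shape
    # The "denoiser" matrix B can be pre-computed once and reused for every query
    # (it is also the kind of coarse-estimation matrix later fed to CSEN as a proxy).
    B = np.linalg.inv(D.T @ D + lam * np.eye(N)) @ D.T
    x_hat = B @ y
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x_hat, 0.0)
        residuals[c] = np.linalg.norm(y - D @ x_c)
    return min(residuals, key=residuals.get)
```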

III-A4 Convolutional Support Estimator Networks (CSENs)

Fig. 5: The CSEN approach for the early COVID-19 detection from chest X-ray images.

If the aim is to compute the support set rather than the exact signal recovery, then a compact support estimator should be sufficient for the task. Moreover, in traditional approaches where signal recovery is first performed and the support is then computed, as discussed in Section II, the performance of SE depends on the recovery performance, which is not guaranteed in noisy cases or when the signal is not exactly sparse (e.g., in representation-based classification problems, from which the CRC classifier actually benefits, as stated in [collaborative]). As the recovery of the partial [Volkan, SE3, partial2] or complete [exact4, SE1, exact1, Volkan] support is still possible in these cases, it is shown in [CSEN] that CSEN performs well in SE compared to traditional methods, where a sparse recovery (SR) technique is first applied to y to compute the coefficient estimate and a thresholding is then made over it to estimate the support set.

The CSEN network proposed in [CSEN] aims to compute a direct mapping from the test sample y to its corresponding support set. Hence, the CSEN approach is faster than l_1-minimization techniques that work in an iterative manner. Moreover, CSENs have compact configurations with few convolutional layers, which also contributes to their computational efficiency. For example, ReconNet, proposed in [reconnet] originally for the signal recovery problem, requires a deeper network structure than the SE task does. For SE, another alternative is to use MLPs as the estimator networks; a similar network is used in [Lamp] for the recovery problem. However, for SE tasks, it is observed in [CSEN] that using MLPs decreases the generalization capability and the robustness to noise. Overall, thanks to their compact structures, CSENs can learn from a limited number of labelled samples, which is exactly the case in early detection of COVID-19.

Accordingly, an SE network should compute a binary mask v:

v_i = 1 \quad \text{if } i \in \Lambda,    (7a)
v_i = 0 \quad \text{otherwise}.    (7b)

Thus, the support set would be the set of indices with v_i = 1. Correspondingly, the CSEN network produces an output vector p such that p_i is the probability of index i being in the support set. Then, the estimated support set can be computed by thresholding p with a fixed threshold tau.

On the other hand, the input of the CSEN is a proxy of x, i.e., a coarse estimate computed via a linear mapping such as D^T y or (D^T D + lambda*I)^{-1} D^T y. Inference on such a proxy of x has been investigated by several studies for different applications; for example, the studies in [degerli, inference, dat] perform classification of compressively sensed images using the proxy as the input of reconstruction-free frameworks.

Since CSEN networks consist of 2-D convolutional layers, as illustrated in Fig. 5, the proxy is reshaped to a 2-D plane, denoted by X~. Then, it is convolved with the weight kernels connecting the input layer to the next layer with N_1 filters, and the biases are added to form the input of the activation function:

\mathbf{g}_k^{1} = b_k^{1} + \mathrm{S}\left(\mathbf{w}_k^{1} * \tilde{\mathbf{X}}\right), \quad k = 1, \ldots, N_1,    (8)

where S(.) is the up- or down-sampling operation and * denotes 2-D convolution. Hence, the k-th feature map of layer l can be given as

\mathbf{f}_k^{l} = \mathrm{ReLU}\left(b_k^{l} + \sum_{i=1}^{N_{l-1}} \mathrm{S}\left(\mathbf{w}_{ik}^{l} * \mathbf{f}_i^{l-1}\right)\right).    (9)

Overall, an L-layer CSEN has the trainable weight and bias parameters \boldsymbol{\Theta}_{\mathrm{CSEN}} = \left\{ \{\mathbf{w}_i^{l}, b_i^{l}\}_{i=1}^{N_l} \right\}_{l=1}^{L}.

Since samples from the same class are grouped together in representation-based classification, a group-sparsity term can be introduced into the l_1-minimization problem given in (5) as follows:

\min_{\mathbf{x}} \left\{ \|\mathbf{y} - \mathbf{D}\mathbf{x}\|_2^2 + \lambda \sum_{c=1}^{C} \|\mathbf{x}_{G_c}\|_2 \right\},    (10)

where x_{G_c} denotes the group of coefficients belonging to class c. Thus, the cost function of an SE network would be

E(\mathbf{x}) = \sum_{p} \left(\mathcal{P}_{\Theta}(\tilde{\mathbf{x}})_p - v_p\right)^2 + \lambda \sum_{c=1}^{C} \left\|\mathcal{P}_{\Theta}(\tilde{\mathbf{x}})_{G_c}\right\|_2,    (11)

where v is the true binary mask indicating the sparse codes of x, and the output of the network at pixel p is \mathcal{P}_{\Theta}(\tilde{\mathbf{x}})_p. Although such a regularization may increase the classification performance since it forces the network to produce supports that are grouped together, the cost function in (11) is computationally complex; therefore, it is approximated in CSEN by inserting average pooling layers after the last convolutional layer. Afterwards, the categorical cross-entropy over the class probabilities obtained by the SoftMax operation is used as the cost of CSEN. Consequently, the input-output pair for the training of CSEN is the reshaped proxy and the corresponding class label.

Since CSEN takes the reshaped proxy as input, the indices of the atoms in the dictionary D are re-ordered so that samples from the same class are grouped together in the reshaped 2-D plane. Note that the grouped samples may have different sizes depending on the number of dictionary samples and classes, which also determines the exact input, output mask, and average-pooling stride sizes of the CSEN. The modified CSEN configurations with the dictionary sizes for the two-class early detection problem are given in Section IV. The overall framework of CSEN for the early detection of COVID-19 is presented in Fig. 5.
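A compact Keras sketch of a CSEN-style estimator is given below for illustration. The filter counts, the proxy plane size, and the pooling region are placeholders (the actual CSEN1/CSEN2 configurations and the sizes dictated by the dictionary are those of [CSEN] and Section IV); the essential ingredients are a few convolutional layers over the reshaped proxy, class-wise average pooling approximating the cost in (11), and a SoftMax output:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_csen(proxy_shape, pool_size):
    """CSEN-style support estimator over a reshaped proxy (coarse estimate B*y).
    proxy_shape and pool_size are placeholders; each pooled region is meant to
    cover the dictionary atoms of one class."""
    inp = layers.Input(shape=proxy_shape + (1,))
    x = layers.Conv2D(48, 3, padding="same", activation="relu")(inp)
    x = layers.Conv2D(24, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(1, 3, padding="same", activation="relu")(x)
    # Average pooling over class-wise groups approximates the group-sparsity cost.
    x = layers.AveragePooling2D(pool_size=pool_size)(x)
    x = layers.Flatten()(x)
    out = layers.Softmax()(x)
    return models.Model(inp, out)

# Illustrative shapes only: a 46 x 54 proxy plane pooled into two class regions.
model = build_csen(proxy_shape=(46, 54), pool_size=(46, 27))
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
```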

III-A5 Multi-Layer Perceptrons (MLPs)

Fig. 6: MLP framework used for the early detection of COVID-19.

The MLP network used for the early detection task consists of 3 hidden layers; the details of its structure are depicted in Fig. 6. This network architecture was determined by testing different topologies with shallower or deeper networks via a parameter search. In the early detection problem with MLP-based classification, we follow a slightly different approach compared to the other compact classifiers. The neurons connecting the input layer to the first hidden layer are first initialized with the PCA matrix computed for the dimensionality reduction of the other classifiers. Then, all layers are trained together, including the PCA-initialized layer. Such an approach provides slightly improved performance and a fair comparison with the other classifiers used in this study, since they also utilize the PCA technique.
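The sketch below illustrates one plausible reading of this setup: the 1024-D CheXNet feature is fed directly to the network, and the first hidden layer is initialized with the PCA projection matrix before joint training. The hidden widths and the linear activation of the PCA layer are placeholders (the actual topology is the one in Fig. 6):

```python
from tensorflow.keras import layers, models

def build_mlp(input_dim, pca_components, n_classes=2):
    """3-hidden-layer MLP with a PCA-initialized first layer, trained end-to-end.
    pca_components: array of shape (n_components, input_dim), e.g. pca.components_."""
    n_components = pca_components.shape[0]
    model = models.Sequential([
        layers.Dense(n_components, activation=None, input_shape=(input_dim,)),
        layers.Dense(64, activation="relu"),    # placeholder width
        layers.Dense(32, activation="relu"),    # placeholder width
        layers.Dense(n_classes, activation="softmax"),
    ])
    # PCA components are rows of shape (n_components, input_dim); the Dense kernel
    # expects (input_dim, n_components), hence the transpose. Bias stays at zero.
    _, bias = model.layers[0].get_weights()
    model.layers[0].set_weights([pca_components.T, bias])
    return model

# Usage with a scikit-learn PCA fit on the 1024-D CheXNet features:
# mlp = build_mlp(1024, pca.components_)
```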

III-A6 Support Vector Machine (SVM)

The SVM topology is selected by performing a grid search to find the optimal hyper-parameters, such as the kernel type and its parameters, over the following settings: the kernel function {linear, polynomial, radial basis function (RBF)}, the kernel scale (the gamma parameter of the RBF kernel) varied in log-scale, the polynomial order, and the box constraint (C parameter) varied in log-scale. Since this is a detection (binary classification) problem, a single SVM suffices for the task.
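The grid search can be reproduced in spirit with scikit-learn; the concrete value ranges below are placeholders rather than the exact grid of the paper:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Kernel types, kernel scale (gamma), polynomial order, and box constraint (C)
# searched over log-scale grids; the ranges are illustrative only.
param_grid = [
    {"kernel": ["linear"], "C": np.logspace(-3, 3, 7)},
    {"kernel": ["rbf"], "C": np.logspace(-3, 3, 7), "gamma": np.logspace(-4, 2, 7)},
    {"kernel": ["poly"], "C": np.logspace(-3, 3, 7), "degree": [2, 3, 4]},
]
search = GridSearchCV(SVC(), param_grid,
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
# search.fit(train_std, train_labels)   # standardized CheXNet + PCA features
```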

III-A7 k-Nearest Neighbor (k-NN)

Similarly, the optimal parameters are searched over a pre-defined grid to build the k-NN classifier with the best configuration for the early detection task. Accordingly, in the search space, the k-values are varied in log-scale up to the number of observations in the training set, and the evaluated distance metrics are as follows: Euclidean, standardized Euclidean, correlation, City-block, cosine, Chebyshev, Hamming, Minkowski, Mahalanobis, Jaccard, and Spearman.
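Analogously, a hedged sketch of the k-NN grid search follows; the k grid and the metric subset are placeholders, and a brute-force neighbor search is used so that all listed metrics are supported:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

n_train = 2526   # balanced training samples per fold (Table I); adapt as needed
k_grid = np.unique(np.geomspace(1, n_train, num=10, dtype=int))   # log-scale k values
param_grid = {
    "n_neighbors": k_grid,
    "metric": ["euclidean", "cityblock", "cosine", "chebyshev", "correlation", "hamming"],
}
search = GridSearchCV(KNeighborsClassifier(algorithm="brute"), param_grid,
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
# search.fit(train_std, train_labels)
```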

III-B Deep Learning based Early Detection

Deep Learning methods, specifically deep CNNs, have achieved elegant results in many computer vision tasks. Hence, their learning capability should also be investigated for the early detection of COVID-19. Contrary to the aforementioned compact classifiers, deep CNNs do not need a prior feature extraction step since they combine feature extraction and classification in a single learning body and optimize them jointly. In this group, we investigate four recent deep configurations with transfer learning: CheXNet [chexnet], DenseNet-121 [DenseNet], ResNet-50 [resnet50], and Inception-v3 [inception]. As discussed previously, we use CheXNet, with the weights learned over the ChestX-ray14 dataset, for extracting the features of the compact classifiers; for classification, however, the DenseNet-121 based layers of CheXNet are trained over the Early-QaTa-COV19 dataset while the output layer is modified to have two neurons for binary classification. Moreover, the DenseNet-121, ResNet-50, and Inception-v3 models are also trained over the Early-QaTa-COV19 dataset starting from their ImageNet weights. In this way, by comparing against DenseNet-121 trained directly from its ImageNet weights, we also investigate whether pre-training on the ChestX-ray14 dataset improves the early detection of COVID-19.
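A transfer-learning sketch for the deep group is given below: DenseNet-121 with ImageNet weights and a new two-neuron SoftMax head fine-tuned on the augmented Early-QaTa-COV19 data (loading a CheXNet checkpoint instead of the ImageNet weights yields the CheXNet variant). The input size and optimizer settings are illustrative only:

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

base = DenseNet121(weights="imagenet", include_top=False,
                   pooling="avg", input_shape=(224, 224, 3))
out = layers.Dense(2, activation="softmax")(base.output)   # COVID-19 vs. normal
model = models.Model(base.input, out)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(augmented_train_generator, epochs=..., validation_data=...)
```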

IV Experimental Results

In this section, first the benchmark dataset released along with this study is introduced, and then the experimental setup is presented. Finally, we provide an extensive set of comparative evaluations over early detection performances of the state-of-the-art methods covered in this study.

IV-A Benchmark Datasets

IV-A1 QaTa-COV19 dataset

The researchers of Qatar University and Tampere University have compiled the largest COVID-19 dataset, called QaTa-COV19. As shown in Fig. 7, besides COVID-19 cases, there are images from the normal, viral pneumonia, and bacterial pneumonia classes, all of which are collected from the Kaggle chest X-ray database [2018chest]. The COVID-19 positive X-ray images in QaTa-COV19 are collected from various publicly available sources, including the Italian Society of Medical and Interventional Radiology (SIRM) COVID-19 Database [CovidDataSet2], Chest Imaging (Spain) at thread reader [CovidDataSet4], Radiopaedia [CovidDataSet3], and news portals and online articles. Correspondingly, the COVID-19 samples are from different genders, age groups, ethnicities, and countries. The authors have also performed the tedious task of indexing and collecting X-ray images from various published and preprint articles from China, the USA, Italy, Spain, South Korea, and Taiwan for COVID-19 positive cases, as well as from online news portals (until the 20th of April, 2020).

Fig. 7: Sample images from QaTa-Cov19 dataset per class.

IV-A2 Early-QaTa-COV19 dataset

This dataset is formed by selecting the early stages of COVID-19 pneumonia from the enlarged version of the QaTa-COV19 dataset of COVID-19 positive cases. Accordingly, the constructed Early-QaTa-COV19 consists of 175 early-stage COVID-19 samples (with no or very limited infection signs) labelled by the MDs, and 1579 samples for the control (normal) class. The dataset is highly unbalanced, and this particularly makes the early detection task harder. Furthermore, a high inter-class similarity exists in the dataset, as shown in Fig. 1. Finally, as illustrated in Fig. 2, there is a high intra-class dissimilarity, especially among the COVID-19 images, since they were compiled from different sources.

IV-B Experimental Setup

The comparative methods are evaluated by a 5-fold cross-validation (CV) scheme over the Early-QaTa-COV19 dataset. We have resized the chest X-ray images to fit the input dimensions of the evaluated deep network topologies. Table I shows the number of samples in each fold, where the data is split into training and test (unseen fold) sets by 80% and 20%, respectively.

(a) Early-QaTa-COV19 Dataset

Class                   Total Samples   Training Samples   Test Samples
Early-Stage COVID-19    175             140                35
Normal                  1579            1263               316

(b) Balancing & Augmentation

Class                   Balanced Training Samples   Augmented Training Samples   Test Samples
Early-Stage COVID-19    1263                        10K                          35
Normal                  1263                        10K                          316
Total                   2526                        20K                          351

TABLE I: Number of samples per class and per fold before and after data balancing / augmentation.

Since the dataset is highly unbalanced, we have balanced the training set by augmenting the data in order to have an equal number of samples in each class. However, such limited data augmentation for balancing is not enough for deep CNNs. Therefore, we have further augmented the training samples: as presented in Table I(b), the COVID-19 training samples are augmented up to 1263 X-ray images per fold for data balancing, whereas further augmentation yields 10K images per class (20K in total) for training the deep CNNs. The data augmentation is performed with the Image Data Generator in Keras. We have augmented the X-ray images by randomly rotating them and randomly shifting them horizontally and vertically. The blank sections remaining after rotating and shifting are filled in the "nearest" mode.
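A sketch of this augmentation setup with the Keras Image Data Generator is shown below; the rotation range and shift fractions are placeholders, while the "nearest" fill mode follows the description above:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(rotation_range=10,        # placeholder degree range
                               width_shift_range=0.1,    # placeholder shift fraction
                               height_shift_range=0.1,
                               fill_mode="nearest")      # fill blank regions
# Batches are drawn repeatedly until each class reaches its target count
# (class balancing for the compact classifiers, ~10K per class for the deep CNNs):
# batches = augmenter.flow(covid_train_images, covid_train_labels, batch_size=32)
```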

For the CSEN approaches used in this study, we follow the configurations proposed in [CSEN]. Accordingly, there are two compact networks: CSEN1 and CSEN2. CSEN1 has only two hidden convolutional layers, whereas CSEN2 includes additional max-pooling and transposed-convolutional layers. Both networks use Rectified Linear Unit (ReLU) activation functions. In addition to this setup, the ReconNet [reconnet] approach is modified to perform the SE task as a deeper version of the CSEN framework. ReconNet was originally proposed for the signal recovery problem as a non-iterative alternative to the traditional approaches, and it achieves state-of-the-art performance levels in compressive sensing applications, as shown in [reconnet]. The modified ReconNet for SE has 6 fully convolutional layers; it has neither the denoiser layer as the first block nor the Block-Matching and 3D filtering (BM3D) operation (see [reconnet]) at the output. The input-output pair used to train the network as a CSEN type of approach for the early detection problem is also different. Accordingly, its last layer is modified by inserting an average-pooling layer to mimic the cost in (11), and a SoftMax layer to produce the class probabilities.

The experimental evaluations of SRC, CRC, and k-NN are performed on a PC with an Intel CPU and MATLAB version 2019a, whereas SVM is implemented on the same computer setup but in Python. In the regularized least-squares solution of CRC, the regularization parameter is set to a fixed value. For the hyper-parameter selection of the k-NN and SVM classifiers, a grid search is performed using another stratified CV over the training sets of the previously explained CV folds. The other approaches, MLP, CSEN, and the deep learning methods, are implemented with the TensorFlow library [abadi2016tensorflow] using Python on an NVidia TITAN-X GPU card. The training of MLP, CSEN, and the deep CNNs is performed using the ADAM optimizer [adam] with its proposed default momentum update parameters and the categorical cross-entropy loss function. CSEN is trained for only a limited number of back-propagation epochs, whereas the MLP network and the deep learners are trained for more epochs with their respective learning rates and batch sizes.

IV-C Results

For the early detection of COVID-19, we have analyzed and evaluated several ML approaches over the Early-QaTa-COV19 dataset, including the compact classifiers (SRC, CRC, CSEN, MLP, SVM, and k-NN) and the deep CNNs (CheXNet, which is based on DenseNet-121, DenseNet-121, ResNet-50, and Inception-v3). For the SRC approach, we have investigated 8 different solvers: OMP [fast], Dalm [fast], L1LS [l1ls], ADMM [ADMM], Homotopy [homotopy], GPSR [gpsr], Palm [fast], and l1-magic [l1magic]. In this study, we report SRC results only from the Dalm and Homotopy solvers, since the others show poor performance in the detection task (considerably lower sensitivity).

In the representation-based classification approaches, the dictionary is constructed by using all samples from the balanced training set of each fold. Hence, D has 1263 samples from each class, and the PCA matrix is applied for dimensionality reduction of the features. On the other hand, since the CSEN networks need additional training samples, we use only a subset of the samples per class to construct D and the remaining samples per class from the training set of each fold to train the CSENs. Consequently, the corresponding denoiser matrix is used to perform the coarse estimation for CSEN. The resulting proxy is reshaped to a 2-D plane in such a way that the support sets from the corresponding classes are grouped together, and it is then fed to the CSENs.

Method Accuracy Sensitivity Specificity
SRC-Dalm 0.9818 ± 0.006 0.9371 ± 0.036 0.9867 ± 0.006
SRC-Hom. 0.9481 ± 0.010 0.8171 ± 0.057 0.9626 ± 0.009
CRC-light 0.9783 ± 0.007 0.9486 ± 0.033 0.9816 ± 0.007
CRC 0.9823 ± 0.006 0.9657 ± 0.027 0.9842 ± 0.006
CSEN1 0.9635 ± 0.009 0.9886 ± 0.016 0.9607 ± 0.010
CSEN2 0.9248 ± 0.012 0.9943 ± 0.011 0.9171 ± 0.014
ReconNet 0.9424 ± 0.011 0.9943 ± 0.011 0.9367 ± 0.012
MLP 0.9584 ± 0.009 0.9371 ± 0.036 0.9607 ± 0.010
SVM 0.9681 ± 0.008 0.9657 ± 0.027 0.9683 ± 0.009
k-NN 0.9458 ± 0.011 0.9257 ± 0.039 0.9481 ± 0.011
(a) Average detection performances of compact classifiers.
Method Accuracy Sensitivity Specificity
CheXNet 0.9926 ± 0.004 0.9714 ± 0.025 0.9949 ± 0.004
DenseNet-121 0.9949 ± 0.003 0.9543 ± 0.031 0.9994 ± 0.001
Inception-v3 0.9937 ± 0.004 0.9543 ± 0.031 0.9981 ± 0.002
ResNet-50 0.9943 ± 0.004 0.9600 ± 0.029 0.9981 ± 0.002
(b) Average detection performances of deep CNNs.
TABLE II: The average performances and their confidence intervals (CIs) of different approaches over 5 folds for the early detection of COVID-19 pneumonia from the normal chest X-ray images. CRC-light uses the same dictionary (denoiser matrix) as the CSENs, whereas CRC uses the complete training set.
Fig. 8: False negatives of the CSEN1 and CheXNet.

The early detection performances of the compact classifiers are given with their confidence intervals (CIs) in Table II(a). Accordingly, the CI half-width can be estimated for each performance metric as r = z * sqrt(metric * (1 - metric) / N), where N is the number of samples for that particular performance metric and z is the critical value, which is 1.96 for a 95% CI. Consequently, the CIs of the sensitivity measure are expectedly larger due to the unbalanced data. The presented results clearly indicate that the CSENs achieve the top sensitivities among the compact classifiers with acceptable specificity rates. In particular, the CSEN1 configuration achieves over 98% sensitivity with a high specificity (over 96%). The same denoiser matrix is also used in the CRC method, which is reported separately as the CRC-light version in order to observe whether the CSEN approach brings a performance improvement. Since representation-based classification approaches are known to perform well in limited-data scenarios, the second competitor is indeed the CRC approach from the first group, as expected. Note that the CRC method provides better classification performance for early detection compared to the SRC methods. This may be explained as follows: in representation-based classification problems, it may not be the sparsity that carries the class information but the collaborative representation among the samples in the dictionary. Similar findings are reported in [collaborative] for the face recognition problem. However, in this study, we have observed a much larger performance gap between CRC and SRC compared to other classification problems reported in previous studies [collaborative, CSEN].
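The CI computation can be verified against Table II; for instance, plugging the CSEN1 values into the formula above reproduces the reported interval half-widths:

```python
import numpy as np

def ci_half_width(metric, n, z=1.96):
    """Half-width of the 95% (z = 1.96) confidence interval of a proportion-type
    metric (accuracy, sensitivity, specificity) estimated over n samples."""
    return z * np.sqrt(metric * (1.0 - metric) / n)

# CSEN1, Table II(a): sensitivity over the 175 COVID-19 samples, specificity over
# the 1579 normal samples (the minority class yields the wider interval).
print(round(ci_half_width(0.9886, 175), 3))    # ~0.016
print(round(ci_half_width(0.9607, 1579), 3))   # ~0.010
```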

Table II(b) presents the average detection performances of the deep CNNs. Although the best sensitivity is obtained by the CSENs, the deep CNNs can also achieve quite high sensitivity levels with almost no false alarms (specificity above 99%). For example, CheXNet, which is trained over Early-QaTa-COV19 after being initialized with the ChestX-ray14 weights, outperforms the other deep networks and produces a comparable sensitivity but with a superior specificity of 99.49%.

The confusion matrices, accumulated over the confusion matrices of each fold's test set, are presented in Table III for the two top performers. Accordingly, CheXNet misses three more early cases of COVID-19 than CSEN1, but it is able to provide a much higher specificity (i.e., the detection rate of normal X-ray images). Moreover, the false-negative X-ray images are given in Fig. 8. It is observed that both CSEN1 and CheXNet are able to detect most of the early cases of COVID-19 that show no visible sign of the disease to the human eye.

(a) CSEN1 Confusion Matrix

                            Predicted
                     Normal     COVID-19
Ground Truth
Normal               1517       62
COVID-19             2          173

(b) CheXNet Confusion Matrix

                            Predicted
                     Normal     COVID-19
Ground Truth
Normal               1571       8
COVID-19             5          170

TABLE III: Cumulative confusion matrices of the leading compact (CSEN1) and deep (CheXNet) models for early COVID-19 detection.

To assess the computational complexity of the compared methods, we first consider the number of trainable network parameters, as presented in Table IV. Obviously, the CSENs have a crucial advantage, especially compared to the deep CNNs, in terms of computational complexity. On the other hand, for the deep CNNs with such complex configurations, intensive data augmentation (multiplying the COVID-19 training samples many times over, as discussed earlier) is a major requirement for proper training.

Model Number of Parameters
CSEN1 11,089
CSEN2 16,297
ReconNet 22,914
MLP 672,706
CheXNet 6,955,906
ResNet-50 23,538,690
Inception-v3 21,772,450
TABLE IV: Number of trainable parameters of comparative models.

As for the time complexity, Fig. 9 shows the sensitivity versus the computational time for each method evaluated in this study. It is clear that CSEN1 and ReconNet achieve the top sensitivity and computational efficiency among all methods. However, ReconNet suffers from a relatively low specificity, which indicates a substantial number of false positives. An interesting observation worth mentioning is that CheXNet achieves a slightly inferior sensitivity but with a specificity of around 99.5%, and it is still faster than some of the compact classifiers. Note that even though CRC provides a closed-form solution, representation-based classification techniques suffer from the highest time complexity in general because of the four-step classification framework, which involves computing the residuals, as discussed in Section III. However, the CSENs use only the denoiser multiplication part of CRC, which requires an insignificant amount of time for a test set (averaged over the 5 folds). Finally, such inference times for the deep networks are valid only if they can utilize recent GPU cards, whereas the other methods do not require additional hardware and can run on an ordinary computer. This is a crucial advantage for light-weight mobile applications with real-time analysis requirements.

Fig. 9: Time complexity versus the sensitivity of all the evaluated classifiers. Computational times are plotted in log-scale and measured for the evaluation of the test sets by averaging over the 5 folds.

V Conclusion

Since there is no known specific treatment for COVID-19, the early detection of the disease plays a vital role in preventing the spread of the pandemic. Currently, RT-PCR is widely used across the world for the diagnosis of COVID-19. However, RT-PCR tests can easily miss a positive case (false negatives) depending on the sample collection or the disease stage of the patient. As an alternative, chest CT scans have provided satisfactory results and outperformed the sensitivity levels of RT-PCR. Nevertheless, in many areas where hospitals are congested because of the pandemic, it may not be easy to access such expensive and time-consuming equipment.

X-ray acquisition, however, is cheaper, easily accessible, and the acquisition time is shorter than that of CT. Furthermore, X-ray imaging can be applied with greater ease since the equipment is portable. This justifies our motivation to investigate the feasibility of an accurate, robust, and fully automatic method for the early detection of COVID-19 from chest X-ray images. For this purpose, we first compiled the Early-QaTa-COV19 dataset, which encapsulates the largest number of COVID-19 patients who are in the early stages of the disease. Our findings have clearly demonstrated that early detection of COVID-19 infection from X-ray images can be performed with a very high sensitivity and specificity. In other words, even though it is a difficult or sometimes impossible task for experts to detect COVID-19 infection due to the early stage of the disease, with a proper setup and training, some particular compact and deep classifiers can accurately detect the disease with tolerable false positives. In particular, it is observed that the CSEN type of models provide the highest sensitivity levels, while CheXNet provides a comparable sensitivity with a higher specificity. Among those X-ray images where the MDs have found no trace of COVID-19 infection (and hence would naturally misdiagnose all of them as "normal"), CSEN1 and CheXNet can still accurately identify most of them. Finally, both CSEN models have the utmost computational efficiency, especially when compared to the CRC and the deep networks. This makes them a feasible solution for low-cost/low-power portable applications.

References