
Convolutional Sparse Support Estimator Based Covid-19 Recognition from X-ray Images

by Mehmet Yamac, et al.

Coronavirus disease (Covid-19) has been at the top of the global agenda since it emerged in December 2019. It has already caused thousands of casualties and infected several million people worldwide. Any technological tool that can be provided to healthcare practitioners to save time, effort, and possibly lives is of crucial importance. The main tools practitioners currently use to diagnose Covid-19 are Reverse Transcription-Polymerase Chain Reaction (RT-PCR) and Computed Tomography (CT), which require significant time, resources and acknowledged experts. X-ray imaging is a common and easily accessible tool that has great potential for Covid-19 diagnosis. In this study, we propose a novel approach for Covid-19 recognition from chest X-ray images. Despite the importance of the problem, recent studies in this domain have produced unsatisfactory results due to the limited datasets available for training. Although Deep Learning techniques can generally provide state-of-the-art performance in many classification tasks when trained properly over large datasets, such data scarcity can be a crucial obstacle when using them for Covid-19 detection. Alternative approaches such as representation-based classification (collaborative or sparse representation) might provide satisfactory performance with limited-size datasets, but they generally fall short in performance or speed compared to Machine Learning methods. To address this deficiency, the Convolutional Support Estimation Network (CSEN) has recently been proposed as a bridge between model-based and Deep Learning approaches by providing a non-iterative, real-time mapping from a query sample to the support of its (ideally sparse) representation coefficients, which is critical information for the class decision in representation-based techniques.





I Introduction

Coronavirus disease 2019 (Covid-19) was declared a pandemic by the World Health Organization (WHO) two months after its first appearance in December 2019 in Wuhan, China. It has infected more than 3 million people, caused thousands of casualties, and has so far paralyzed mobility all around the world. The spreading rate of Covid-19 is so high that the number of cases is expected to double every three days if social distancing is not strictly observed to slow this growth [coronaSpreading]. Roughly half of Covid-19 positive patients also exhibit a comorbidity [clinical], making it difficult to differentiate Covid-19 from other lung diseases. Automated and accurate Covid-19 diagnosis is critical both for saving lives and for preventing the rapid spread of the disease in the community. Currently, RT-PCR (reverse transcription polymerase chain reaction) and CT (computed tomography) are the common diagnostic techniques. RT-PCR results are available after 24 hours at the earliest for critical cases, and generally take several days to reach a decision [CT1]. CT may be an alternative at initial presentation; however, it is expensive and not easily accessible [erickson1993advanced]. The most common tool that medical experts use both for diagnosis and for monitoring the course of the disease is X-ray imaging. Compared to an RT-PCR or CT test, acquiring an X-ray image is an extremely low-cost and fast process, usually taking only a few seconds. Recently, WHO reported that even RT-PCR may give false results in Covid-19 cases for several reasons, such as a poor-quality specimen from the patient, inappropriate processing of the specimen, or taking the specimen at an early or late stage of the disease [world2020laboratory]. For this reason, X-ray imaging has great potential as an alternative technological tool to be used along with the other tests for an accurate diagnosis.

Accordingly, several recent works [Xray1, Xray2, Xray3, exact4] have been proposed for Covid-19 detection/classification from X-ray images. However, they use rather small datasets (the largest containing only a few hundred X-ray images) with only a few Covid-19 samples, which makes it difficult to generalize their results in practice. To address this deficiency and provide reliable results, in this study the researchers of Qatar University and Tampere University have compiled the largest Covid-19 dataset, called QaTa-Cov19. Compared to the earlier benchmark datasets created in this domain, such as the COVID Chestxray Dataset [CovidDataSet1] or the Covid-19 DATASET [CovidDataSet2], QaTa-Cov19 has the following unique benchmarking properties. First, it is the largest dataset, not only in terms of the number of images (more than 6200 images) but also in its versatility, i.e., QaTa-Cov19 contains additional major pneumonia categories, such as viral and bacterial, along with the control (normal) class. Moreover, it is the most diverse dataset, encapsulating X-ray images from several countries (e.g., Italy, Spain, China) produced by different X-ray machines. Finally, the images are of different quality, resolution and SNR levels, as shown in Fig. 1.

Fig. 1: Sample Covid-19 X-ray images from QaTa-Cov19.

QaTa-Cov19 contains many X-ray images from Covid-19 patients in the early stages of the disease; therefore, their X-ray images show mild or no signs of Covid-19 infection to the naked eye. Some sample images are shown in Fig. 2-(b). Another factor that makes the diagnosis far more challenging is that the inter-class similarity can be very high for many X-ray images, as illustrated by the samples in Fig. 2-(a). Given such high inter-class similarities and intra-class variations, in this study we aim for a high level of robustness. Our primary objective is to achieve the highest possible sensitivity in the diagnosis of Covid-19 induced pneumonia with an acceptable false-alarm rate (i.e., a sufficiently high specificity). In particular, the misdiagnosis of a Covid-19 X-ray image as a normal case (a false negative) should be minimized, whilst a small number of false positives is tolerable.

Fig. 2: Sample QaTa-Cov19 X-ray images: (a) X-ray images from different classes. (b) X-ray images from Covid-19 patients at different stages of the disease.

In numerous classification tasks, Deep Learning techniques have been shown to achieve state-of-the-art performance, both in terms of recognition accuracy and thanks to their parallelizable computing structures, which play an important role especially in real-time applications. Despite these advantages, a deep model usually needs proper training over a massive dataset to achieve the desired performance level. Unfortunately, this is not yet an option for this problem, since the available data is still rather limited.

An alternative supervised approach that requires only a limited number of training samples to achieve satisfactory classification accuracy is representation-based classification [collaborative, SRC1, SRC2]. In representation-based classification systems, a dictionary is pre-defined whose columns consist of the training samples, stacked in such a way that each subset of columns corresponds to one class. A test sample is expected to be a linear combination of all points from the same class as the test sample. Therefore, given a predefined dictionary matrix $\mathbf{D}$ and a test sample $\mathbf{y}$, we expect the solution $\mathbf{x}$ of $\mathbf{y} = \mathbf{D}\mathbf{x}$ to carry enough information about the class of $\mathbf{y}$. The two well-known representation-based classification methodologies are sparse representation-based classification (SRC) [SRC1] and collaborative representation-based classification (CRC) [collaborative]. Of these two, SRC provides slightly improved accuracy by solving a sparse representation problem, i.e., producing a sparse solution $\mathbf{x}$ of $\mathbf{y} = \mathbf{D}\mathbf{x}$. Then, the locations of the non-zero elements of $\mathbf{x}$, also known as the support set, provide the class of the query $\mathbf{y}$. Despite the improved recognition accuracy, SRC solutions are iterative and can be computationally demanding compared to CRC. In a recent work [CSEN], a compact neural network design that can be considered a bridge between learning-based and representation-based methodologies was proposed. The so-called Convolutional Support Estimation Network (CSEN) uses a pre-defined dictionary and learns, from a moderate/low-size training set, a direct mapping from query samples $\mathbf{y}$ to the support set of the representation coefficients $\mathbf{x}$ (which should be purely sparse in the ideal case).

In this study, to address the aforementioned limitations in Covid-19 diagnosis from X-ray images, we propose a CSEN-based approach. Since the largest set of Covid-19 X-ray images compiled to date is used in this study, the proposed approach can be evaluated rigorously against a high level of diversity to obtain a reliable analysis. The general pipeline of the proposed CSEN-based recognition scheme is illustrated in Fig. 3. In order to obtain highly discriminative features, we use the recently proposed CheXNet [chexnet], which is a fine-tuned version of the 121-layer Dense Convolutional Network (DenseNet-121) [DenseNet], trained on frontal-view X-ray images from the ChestX-ray14 dataset. Using the pre-trained CheXNet for feature extraction, we develop two different strategies to obtain the classes of query X-ray images: (i) collaborative representation-based classification with proper pre-processing; and (ii) slightly modified versions of our recently proposed convolutional support estimator (CSEN) models. The proposed CSEN scheme outperforms the competing methods and achieves over 98% sensitivity and over 95% specificity on this challenging dataset.

Fig. 3: The proposed approach for Covid-19 recognition from X-ray images. The proposed convolutional support estimator network (CSEN) can be trained from a moderate-size training set. The pipeline employs a pre-trained deep neural network for feature extraction. $\boldsymbol{\Phi}$ is the dimensionality reduction (PCA) matrix; the coarse estimate $\tilde{\mathbf{x}}$ of the representation coefficients (sparse in the ideal case) is obtained via the denoiser matrix $\mathbf{B}$ as $\tilde{\mathbf{x}} = \mathbf{B}\mathbf{y}$, where $\mathbf{B}$ is computed from the equivalent dictionary $\mathbf{A} = \boldsymbol{\Phi}\mathbf{D}$ and $\mathbf{D}$ is the pre-defined dictionary matrix of training samples (before dimensionality reduction).

The rest of the paper is organized as follows. In Section II, notations and mathematical preliminaries are given with emphasis on sparse representation and sparse support estimation. Then in Section III, a literature review on deep learning models over X-ray images and representation-based classification is presented. The proposed CSEN-based Covid-19 recognition system is introduced in Section IV along with two recent alternative approaches that are used as the competing methods. The data collection is also explained in this section. The experimental setup and the main results are provided in Section V. Finally, Section VI concludes the paper and suggests topics for future research.

II Preliminaries and Mathematical Notations

II-A Notations

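In this study, the $\ell_p$-norm of a vector $\mathbf{x} \in \mathbb{R}^n$ is defined as $\|\mathbf{x}\|_{\ell_p^n} = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$ for $p \geq 1$. On the other hand, the $\ell_0$-norm of the vector $\mathbf{x}$ is defined as $\|\mathbf{x}\|_{\ell_0^n} = \lim_{p \to 0} \sum_{i=1}^{n} |x_i|^p = \#\{j : x_j \neq 0\}$, and the $\ell_\infty$-norm is defined as $\|\mathbf{x}\|_{\ell_\infty^n} = \max_{i=1,\ldots,n} |x_i|$. A signal $\mathbf{x}$ is called strictly $k$-sparse if $\|\mathbf{x}\|_{\ell_0^n} \leq k$. The sparse support set, or simply support set, $\Lambda \subset \{1, 2, \ldots, n\}$ of a sparse signal $\mathbf{x}$ can be defined as the set of locations of its non-zero coefficients, i.e., $\Lambda := \{i : x_i \neq 0\}$.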
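The definitions above can be checked numerically. The following is a minimal NumPy sketch (NumPy is an assumption; the paper prescribes no code) that computes the $\ell_p$, $\ell_0$ and $\ell_\infty$ norms and the support set of a toy vector. Note that Python indexes from 0, whereas $\Lambda$ above is 1-based.

```python
import numpy as np

def lp_norm(x, p):
    """l_p-norm (sum_i |x_i|^p)^(1/p), valid for p >= 1."""
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

def l0_norm(x):
    """Number of non-zero entries (a pseudo-norm, not a true norm)."""
    return int(np.count_nonzero(x))

def linf_norm(x):
    """Largest absolute entry."""
    return float(np.max(np.abs(x)))

def support(x):
    """Support set: 0-based indices of the non-zero coefficients."""
    return set(np.flatnonzero(x).tolist())

def is_k_sparse(x, k):
    """True if x has at most k non-zero entries."""
    return l0_norm(x) <= k

x = np.array([0.0, 3.0, 0.0, -4.0, 0.0])
print(lp_norm(x, 1), lp_norm(x, 2), l0_norm(x), linf_norm(x), support(x))
# 7.0 5.0 2 4.0 {1, 3}
```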

II-B Sparse Signal Representation

Sparse representation (SR) of a signal $\mathbf{s} \in \mathbb{R}^d$ in a pre-defined set of waveforms, $\mathbf{D} \in \mathbb{R}^{d \times n}$, can be defined as representing $\mathbf{s}$ as a linear combination of only a small subset of the atoms in the dictionary $\mathbf{D}$, i.e., $\mathbf{s} = \mathbf{D}\mathbf{x}$ with a sparse $\mathbf{x}$. Defining these sets, which dates back to Fourier's pioneering work [Fourier], has been extensively studied in the literature. In early approaches, these sets of waveforms were selected as collections of linearly independent and generally orthogonal waveforms (called a complete dictionary or basis, i.e., $d = n$), such as the Fourier Transform, DCT and Wavelet Transform, until the pioneering work of Mallat [mallat1993] on overcomplete dictionaries ($d < n$). In the last decade, interest in SR research has increased tremendously, and its wide range of applications includes denoising [denoising], classification [classification], anomaly detection [AnomalyDetection, AnomalyDetection2], Deep Learning [deeplearning] and Compressive Sensing (CS) [CS1, CS2].

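With a possible dimensionality reduction, which can be realized via a compression matrix $\boldsymbol{\Phi} \in \mathbb{R}^{m \times d}$ ($m < d$), the sample $\mathbf{y} \in \mathbb{R}^m$ can be obtained from $\mathbf{s}$ as

$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{s} = \boldsymbol{\Phi}\mathbf{D}\mathbf{x} = \mathbf{A}\mathbf{x}, \quad (1)$$

where $\mathbf{A} = \boldsymbol{\Phi}\mathbf{D} \in \mathbb{R}^{m \times n}$ can be called the equivalent dictionary. Because Eq. (1) describes an under-determined system of linear equations, finding the representation coefficient vector $\mathbf{x}$ requires at least one more constraint to obtain a unique solution. Using prior information about sparsity, the following representation

$$\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{subject to} \quad \mathbf{A}\mathbf{x} = \mathbf{y}, \quad (2)$$

which is also a sparse representation of $\mathbf{y}$, has a unique solution provided that $\mathbf{A}$ satisfies some required properties [spark]. However, the optimization problem in Eq. (2) is NP-hard. Fortunately, the following relaxation

$$\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{subject to} \quad \mathbf{A}\mathbf{x} = \mathbf{y} \quad (3)$$

produces exactly the same solution as that of Eq. (2), provided that $\mathbf{A}$ obeys some criteria [candesRIP] and $\mathbf{x}$ is sufficiently sparse. In addition, real-world signals generally exhibit not exact but approximate sparsity. Furthermore, the query sample can be corrupted by additive noise. In this case, the equality constraint in Eq. (3) can be further relaxed as in Basis Pursuit Denoising (BPDN) [BP]: $\min_{\mathbf{x}} \|\mathbf{x}\|_1$ subject to $\|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2 \leq \epsilon$, where $\epsilon$ is a small constant that depends on the noise level.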
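Eq. (3) and BPDN are usually solved iteratively, which is the cost that CSEN avoids. As an illustrative sketch only, the following NumPy code solves the closely related Lagrangian form $\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2^2 + \lambda\|\mathbf{x}\|_1$ with the iterative shrinkage-thresholding algorithm (ISTA). ISTA is a swapped-in solver chosen for brevity, not necessarily the one used in the cited works, and the problem sizes, $\lambda$ and the iteration count are arbitrary assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink every entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam=0.01, n_iter=3000):
    """Approximately minimise 0.5*||A x - y||_2^2 + lam*||x||_1 with ISTA."""
    # Step size 1/L, where L = ||A||_2^2 bounds the Lipschitz constant of the gradient.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                   # gradient of the data-fit term
        x = soft_threshold(x - grad / L, lam / L)  # gradient step followed by shrinkage
    return x

# Toy under-determined problem: m = 60 measurements of an n = 100, 3-sparse x.
rng = np.random.default_rng(0)
m, n = 60, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[5, 30, 61]] = [1.5, -2.0, 2.5]
y = A @ x_true
x_hat = ista(A, y)
print(sorted(np.argsort(-np.abs(x_hat))[:3].tolist()))  # indices of the 3 largest entries
```

Each iteration is cheap, but hundreds or thousands of them are typically needed, which illustrates why iterative sparse signal recovery is slow compared with a single feed-forward pass.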

We refer to the Sparse Support Estimation (SE) problem as finding the index set $\Lambda$ of the non-zero elements of $\mathbf{x}$ [SE1, SE2]. Indeed, in many applications, SE can be more important than finding the magnitudes and signs of $\mathbf{x}$ along with $\Lambda$, which is referred to as Sparse Signal Recovery (SSR) via a recovery technique such as Eq. (3). For example, in a sparse representation-based classification system, a query sample $\mathbf{y}$ can be represented by a sparse coefficient vector $\mathbf{x}$ in the dictionary $\mathbf{A}$ in such a way that, when we recover this representation coefficient vector from $\mathbf{y} = \mathbf{A}\mathbf{x}$, the solution vector is expected to have a significant number of non-zero coefficients at the particular locations corresponding to the class of $\mathbf{y}$.

Readers are referred to [CSEN] for a more detailed literature review on SE and its applications. In the sequel, we briefly summarize the building blocks of the proposed approach.

III Background and Prior Art

III-A CheXNet

In the proposed approach, we first use the pre-trained deep network CheXNet to extract discriminative features from raw X-ray images. CheXNet was developed for pneumonia detection from chest X-ray images [chexnet]. In [chexnet], it was claimed that CheXNet can perform even better than expert radiologists in the pneumonia detection problem. This deep neural network design is based on the previously proposed DenseNet [DenseNet], which consists of 121 layers. It was first pre-trained on the ImageNet dataset [imagenet], and then transfer learning was performed over the frontal-view chest X-ray images in the ChestX-ray14 dataset [XrayDataset].

III-B Representation Based Classification

Let $\mathbf{y}$ be a test sample, which represents either the extracted features $\mathbf{s}$ or their dimensionally reduced version, i.e., $\mathbf{y} = \boldsymbol{\Phi}\mathbf{s}$. In developing the dictionary, training samples are stacked in $\mathbf{D}$ at particular locations in such a way that the optimal support for a given query $\mathbf{y}$ should be the set of all points coming from the same class as $\mathbf{y}$. Therefore, a solution vector $\mathbf{x}$ of $\mathbf{y} = \mathbf{A}\mathbf{x}$ is supposed to carry enough information, i.e., its sparse support should be the set of location indices of the training samples from the same class as $\mathbf{y}$. This strategy is generally known as representation-based classification. However, a typical solution of $\mathbf{y} = \mathbf{A}\mathbf{x}$ is not necessarily sparse, especially when its size grows with more training samples, which results in a highly under-determined system of linear equations. Fortunately, if one estimates the representation coefficient vector with a sparse recovery method such as $\ell_1$-minimization as in Eq. (3), we can expect the important non-zero entries of the solution $\hat{\mathbf{x}}$ to be grouped at the locations of the training samples from the same class as $\mathbf{y}$. This is a typical example of a scenario where support estimation can be more valuable than magnitude and sign recovery, as explained in Section II-B.

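For instance, [SRC2] proposed a systematic way of determining the identity of face images using $\ell_1$-minimization. The authors developed a three-step classification technique that includes: (i) normalizing all the atoms in $\mathbf{A}$ and the query $\mathbf{y}$ to unit $\ell_2$-norm; (ii) estimating the representation coefficient vector via sparse recovery, i.e., $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$ subject to $\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2 \leq \epsilon$; and (iii) finding the residual corresponding to each class $i$ via $e_i = \|\mathbf{y} - \mathbf{A}_{G_i}\hat{\mathbf{x}}_{G_i}\|_2$, where $\hat{\mathbf{x}}_{G_i}$ is the group of estimated coefficients that correspond to class $i$ and $\mathbf{A}_{G_i}$ holds the matching dictionary columns. The query is then assigned to the class with the smallest residual.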
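The three SRC steps can be sketched as follows. This is a toy illustration under stated assumptions: ISTA stands in for the $\ell_1$ solver of step (ii), the Lagrangian (penalized) form is solved instead of a constrained one, and the synthetic three-class dictionary (atoms scattered around random class prototypes) is hypothetical.

```python
import numpy as np

def src_classify(A, labels, y, lam=0.01, n_iter=1000):
    """Toy SRC: normalise, recover a sparse x_hat (ISTA stand-in), pick the min residual."""
    # (i) unit l2-norm atoms (columns) and query
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    y = y / np.linalg.norm(y)
    # (ii) sparse recovery: ISTA on 0.5*||y - A x||_2^2 + lam*||x||_1
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v = x - A.T @ (A @ x - y) / L
        x = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)
    # (iii) class-wise residuals e_i = ||y - A_{G_i} x_{G_i}||_2
    classes = np.unique(labels)
    residuals = np.array([np.linalg.norm(y - A[:, labels == c] @ x[labels == c])
                          for c in classes])
    return classes[np.argmin(residuals)], residuals

# Hypothetical dictionary: 3 classes x 5 atoms of dimension 20 around class prototypes.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 5)
prototypes = rng.standard_normal((3, 20))
A = np.stack([prototypes[c] + 0.1 * rng.standard_normal(20) for c in labels], axis=1)
y = A[:, labels == 1].mean(axis=1)   # a query lying in the span of the class-1 atoms
predicted, residuals = src_classify(A, labels, y)
print(predicted, np.round(residuals, 3))
```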

This technique, known as Sparse Representation-based Classification (SRC), and its variants have been applied to a wide range of applications in the literature [jointsparse, vehicleclassification], e.g., human action recognition [human-action] and hyperspectral image classification [hyperspecral], to name a few. Despite the good recognition accuracy of SRC systems, their main drawback is that their sparse recovery algorithms (e.g., $\ell_1$-minimization) are iterative and computationally costly, rendering them infeasible in real-time applications. Later, the authors of [collaborative] introduced Collaborative Representation-based Classification (CRC), which is similar to SRC except for the use of traditional $\ell_2$-minimization in the second step: $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + \lambda\|\mathbf{x}\|_2^2$. Thus, CRC does not require an iterative solution to obtain the representation coefficients, because this $\ell_2$-minimization has the closed-form solution $\hat{\mathbf{x}} = (\mathbf{A}^T\mathbf{A} + \lambda\mathbf{I})^{-1}\mathbf{A}^T\mathbf{y}$. Although the sparsity of $\hat{\mathbf{x}}$ cannot be guaranteed, CRC has often been reported to achieve comparable classification performance, especially with small training datasets.
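Because the CRC coefficients have a closed form, the whole classifier reduces to one linear solve followed by class-wise residuals. A minimal NumPy sketch, assuming a plain residual as in the SRC steps above and an arbitrary $\lambda$; the original CRC formulation additionally divides each class residual by the norm of that class's coefficients, which this sketch omits. The toy dictionary is hypothetical.

```python
import numpy as np

def crc_coefficients(A, y, lam):
    """Closed-form l2-regularised solution x = (A^T A + lam*I)^{-1} A^T y."""
    n = A.shape[1]
    # Solve the regularised normal equations instead of forming the inverse explicitly.
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

def crc_classify(A, labels, y, lam=0.1):
    """Assign y to the class whose coefficients reconstruct it with the smallest residual."""
    x_hat = crc_coefficients(A, y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - A[:, labels == c] @ x_hat[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]

# Hypothetical 3-class dictionary (5 atoms per class, 20-dimensional features).
rng = np.random.default_rng(2)
labels = np.repeat(np.arange(3), 5)
prototypes = rng.standard_normal((3, 20))
A = np.stack([prototypes[c] + 0.1 * rng.standard_normal(20) for c in labels], axis=1)
A /= np.linalg.norm(A, axis=0, keepdims=True)  # unit-norm atoms
y = A[:, labels == 2].mean(axis=1)
print(crc_classify(A, labels, y))
```

In practice the matrix $(\mathbf{A}^T\mathbf{A} + \lambda\mathbf{I})^{-1}\mathbf{A}^T$ depends only on the dictionary, so it can be precomputed once and applied to every query with a single matrix-vector product.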

IV Proposed Approach

Fig. 4: Baseline Approach I: collaborative representation-based classification fed with pre-processed, deep-learning-based features.
Fig. 5: Baseline Approach II: a 5-layer MLP is used over the features of CheXNet.

IV-A The Benchmark Dataset: QaTa-Cov19

Covid-19 chest X-ray images were gathered from different publicly available but scattered image sources. The major sources of Covid-19 images are the Italian Society of Medical and Interventional Radiology (SIRM) COVID-19 Database [CovidDataSet2], Radiopaedia [CovidDataSet3], Chest Imaging (Spain) at Thread Reader [CovidDataSet4], and online articles and news portals. The authors carried out the task of collecting and indexing the X-ray images of Covid-19 positive cases reported in published and preprint articles from China, South Korea, USA, Taiwan, Spain and Italy, as well as in online news portals (up to 20th April 2020). Therefore, these X-ray images represent different age groups, genders, ethnicities and countries. The Covid-19 negative cases are normal, viral pneumonia and bacterial pneumonia chest X-ray images collected from the Kaggle chest X-ray database. The Kaggle chest X-ray database contains 5863 chest X-ray images of normal, viral and bacterial pneumonia cases with varying resolutions [2018chest]; apart from the normal images, the remaining ones are bacterial and viral pneumonia images. Sample X-ray images from the QaTa-Cov19 dataset are shown in Fig. 6.

Fig. 6: Samples from the benchmark QaTa-Cov19 dataset.

IV-B Feature Extraction

With their outstanding performance in image classification and other inference tasks, deep neural networks have become a dominant paradigm. However, these techniques usually necessitate a large number of training samples (e.g., several hundred thousand to millions, depending on the network size) to achieve adequate generalization capability. That is to say, the aforementioned data scarcity in the Covid-19 case prevents us from training a deep network from scratch. Nevertheless, we can still leverage their power by finding properly pre-trained models for similar problems. To this end, we use a state-of-the-art pneumonia detection network, CheXNet, whose details are summarized in Section III-A. With the pre-trained model, we extract 1024-long vectors right after the last average pooling layer. After data normalization (zero mean and unit variance), we obtain a feature vector $\mathbf{s} \in \mathbb{R}^{d}$ with $d = 1024$.

A PCA-based dimensionality reduction is applied to $\mathbf{s}$ in order to obtain the query sample $\mathbf{y} = \boldsymbol{\Phi}\mathbf{s}$, where $\boldsymbol{\Phi} \in \mathbb{R}^{m \times d}$ is the PCA matrix ($m < d$).
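The normalization and projection steps can be sketched with NumPy as below. The random matrix merely stands in for CheXNet features, and the target dimension $m = 64$ is an arbitrary assumption; $\boldsymbol{\Phi}$ is taken as the top-$m$ principal directions (right singular vectors) of the centred training features.

```python
import numpy as np

def zscore(S):
    """Normalise each feature (column) of S (n_samples x d) to zero mean, unit variance."""
    mu = S.mean(axis=0)
    sigma = S.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (S - mu) / sigma, mu, sigma

def pca_matrix(S, m):
    """Return Phi (m x d): the top-m principal directions of the centred data."""
    S_centred = S - S.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(S_centred, full_matrices=False)
    return Vt[:m]

rng = np.random.default_rng(0)
S_train = rng.standard_normal((200, 1024))  # stand-in for 1024-long CheXNet features
S_norm, mu, sigma = zscore(S_train)
Phi = pca_matrix(S_norm, m=64)
y = Phi @ S_norm[0]                         # query sample y = Phi s
print(Phi.shape, y.shape)                   # (64, 1024) (64,)
```

At test time, the training mean and standard deviation (`mu`, `sigma`) and the same $\boldsymbol{\Phi}$ would be reused for every query, so that test features live in the same space as the dictionary.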

IV-C The proposed CSEN-based Classification

Considering the limited amount of training data in our Covid-19 dataset, representation-based classification can be applied hereafter to obtain the class of $\mathbf{y}$ using the dictionary $\mathbf{D}$ (in the form of $\mathbf{A} = \boldsymbol{\Phi}\mathbf{D}$), whose columns are the stacked training samples placed at class-specific locations.

As discussed earlier, sparse representation-based classification is a support estimation problem, which is expected to be an easier task than a sparse signal recovery problem. Indeed, even if exact signal recovery is not possible in noisy cases or in cases where $\mathbf{x}$ is only approximately sparse (which is almost always the case in dictionary-based classification problems), it is still possible to recover the support set exactly [exact4, SE1, exact1, Volkan] or partially [Volkan, SE3, partial2]. However, many works in the literature dealing with SE problems first apply a sparse recovery technique to $\mathbf{y}$ to obtain $\hat{\mathbf{x}}$, and then apply simple thresholding to $\hat{\mathbf{x}}$ to obtain a sparse support estimate $\hat{\Lambda}$. Nevertheless, SSR techniques such as $\ell_1$-minimization are rather slow, and their performance varies from one SSR tool to another [CSEN]. In our previous work [CSEN], we proposed an alternative to this hand-crafted sparse recovery approach, which learns a direct map from the test sample $\mathbf{y}$ to the support set $\Lambda$. Along with its speed and stability compared to conventional SSR-based techniques and recent deep-learning-based solutions to the SSR problem, CSEN has the crucial advantage of a compact design that can achieve a good performance level even with scarce training data.

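Mathematically speaking, an ideal CSEN is supposed to yield a binary mask $\mathbf{v} \in \{0, 1\}^n$:

$$v_i = \begin{cases} 1 & \text{if } i \in \Lambda, \\ 0 & \text{otherwise,} \end{cases} \quad (4)$$

which indicates the true support, i.e., $\Lambda = \{i : v_i = 1\}$. In order to approximate this ideal case, a CSEN network $\mathcal{P}(\cdot)$ produces a probability vector $\mathbf{p}$, which measures the probability of each index being in $\Lambda$, such that $0 \leq p_i \leq 1$. Having the estimated probability map, the support can easily be estimated as $\hat{\Lambda} = \{i \in \{1, 2, \ldots, n\} : p_i \geq \tau\}$, i.e., by thresholding $\mathbf{p}$ with $\tau$, where $\tau$ is a fixed threshold.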
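The binary mask and the thresholding rule above translate directly into code. In this sketch the probability vector is a hand-written stand-in for a CSEN output, and $\tau = 0.5$ is an assumed threshold; indices are 0-based.

```python
import numpy as np

def binary_mask(x):
    """Ideal CSEN target: 1 where x_i != 0, 0 elsewhere."""
    return (x != 0).astype(np.float64)

def estimate_support(p, tau=0.5):
    """Support estimate {i : p_i >= tau} (0-based indices)."""
    return set(np.flatnonzero(p >= tau).tolist())

x = np.array([0.0, 1.2, 0.0, -0.7])
p = np.array([0.05, 0.91, 0.40, 0.66])  # hypothetical CSEN output probabilities
print(binary_mask(x))                   # [0. 1. 0. 1.]
print(estimate_support(p, 0.5))         # {1, 3}
```

Lowering $\tau$ trades missed support entries for spurious ones; with $\tau = 0.3$ the index 2 would also be admitted.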

A CSEN is composed of fully convolutional layers, and as input it takes a proxy $\tilde{\mathbf{x}}$ of the sparse coefficient vector, which is a coarse estimate of $\mathbf{x}$, i.e., $\tilde{\mathbf{x}} = \mathbf{B}\mathbf{y}$ with a denoiser matrix $\mathbf{B}$, or simply $\tilde{\mathbf{x}} = \mathbf{A}^T\mathbf{y}$. Using such a proxy of $\mathbf{x}$ instead of making inferences directly on $\mathbf{y}$ has also been studied in a few recent works. For instance, in [degerli, inference], the authors proposed reconstruction-free image classification from compressively sensed images.

The input vector $\tilde{\mathbf{x}}$ is reshaped into a 2-D plane in order to use it with 2-D convolutional layers. This transformation is performed by re-ordering the indices of the atoms in $\mathbf{D}$ in such a way that the non-zero elements of the representation vector for a specific class come together in the 2-D plane. A representative illustration of the proposed dictionary design, compared with the traditional one, is shown in Fig. 7.
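One way to realize the class-grouped 2-D arrangement is sketched below. The layout is an assumption for illustration: the coefficient vector is taken to be ordered class by class, each class's $h \times w$ block becomes one tile of the plane, and the tiles are placed side by side. Fig. 7 shows the arrangement actually used.

```python
import numpy as np

def to_class_plane(x, n_classes, h, w):
    """Place each class's k = h*w coefficients in its own h x w tile.

    Assumes x is ordered class by class (class c occupies x[c*k:(c+1)*k]);
    tiles are laid side by side, giving an h x (n_classes*w) plane.
    """
    k = h * w
    if x.size != n_classes * k:
        raise ValueError("x must hold exactly n_classes * h * w coefficients")
    tiles = [x[c * k:(c + 1) * k].reshape(h, w) for c in range(n_classes)]
    return np.concatenate(tiles, axis=1)

def from_class_plane(X, n_classes, h, w):
    """Inverse mapping back to the class-grouped 1-D vector."""
    return np.concatenate([X[:, c * w:(c + 1) * w].reshape(-1) for c in range(n_classes)])

x = np.arange(24.0)                  # 4 classes x 6 coefficients each
X = to_class_plane(x, n_classes=4, h=2, w=3)
print(X.shape)                       # (2, 12)
print(X[:, 3:6])                     # the class-1 tile holds coefficients 6..11
```

Keeping each class in a compact tile means that a small convolution kernel sees mostly coefficients of a single class, which is the motivation for the re-ordering described above.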

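Hereafter, the (reshaped) proxy $\tilde{\mathbf{x}}$ is convolved with the weight kernels $W_1 = \{\mathbf{w}_1^i\}_{i=1}^{N_1}$, which connect the input to the first layer with $N_1$ filters, to yield the feature maps $F_1 = \{\mathbf{f}_1^i\}_{i=1}^{N_1}$ that serve as inputs of the next layer, with the biases $\mathcal{B}_1 = \{b_1^i\}_{i=1}^{N_1}$, as follows:

$$\mathbf{f}_1^i = S_1\left(\mathrm{ReLU}\left(b_1^i + \mathrm{conv2D}\left(\mathbf{w}_1^i, \tilde{\mathbf{x}}\right)\right)\right), \quad (5)$$

where $b_1^i$ is the bias, $S_1(\cdot)$ is the down- or up-sampling operation and $\mathrm{ReLU}(x) = \max(0, x)$. In a more general form, the $k$-th feature map of layer $l$ is defined as

$$\mathbf{f}_l^k = S_l\left(\mathrm{ReLU}\left(b_l^k + \sum_{i=1}^{N_{l-1}} \mathrm{conv2D}\left(\mathbf{w}_l^{ik}, \mathbf{f}_{l-1}^i\right)\right)\right). \quad (6)$$

Therefore, the trainable parameters of CSEN are $\Theta_{\text{CSEN}} = \{\{W_1, \mathcal{B}_1\}, \{W_2, \mathcal{B}_2\}, \ldots, \{W_L, \mathcal{B}_L\}\}$ for an $L$-layer CSEN design.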
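To make the layer equations above concrete, the following NumPy sketch implements one CSEN-style layer: per-input-map 2-D convolutions (cross-correlation with zero "same" padding, as deep learning frameworks compute it), their sum, a bias, ReLU, and an optional sampling operator $S_l$. The 2x2 max-pooling used for $S_l$ and all kernel values are assumptions; a real CSEN would be built and trained in a framework such as TensorFlow.

```python
import numpy as np

def conv2d_same(x, w):
    """2-D cross-correlation with zero 'same' padding (odd-sized kernel)."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out

def relu(z):
    return np.maximum(z, 0.0)

def maxpool2(z):
    """2x2 max-pooling: one possible choice for the sampling operator S (assumption)."""
    h, w = z.shape[0] // 2 * 2, z.shape[1] // 2 * 2
    return z[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def csen_layer(F_prev, W, b, sample=lambda z: z):
    """One layer in the form of the general layer equation.

    F_prev: list of input maps; W[i][k]: kernel from input map i to output map k;
    b[k]: bias of output map k. Returns the list of output maps.
    """
    return [sample(relu(b[k] + sum(conv2d_same(f, W[i][k]) for i, f in enumerate(F_prev))))
            for k in range(len(b))]

x_plane = np.array([[1.0, -2.0, 3.0, -4.0],
                    [-1.0, 2.0, -3.0, 4.0]])
delta = np.zeros((3, 3))
delta[1, 1] = 1.0                 # identity kernel
W = [[delta, -delta]]             # 1 input map -> 2 output maps
b = [0.0, 0.0]
F1 = csen_layer([x_plane], W, b)
print(F1[0])                      # ReLU(x): negative entries clipped to 0
```

With the identity kernel, the two output maps are simply the positive and negative parts of the input, which makes the bias/ReLU/sampling order easy to check by hand.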

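In developing the dictionary that is to be used in the sparse representation-based classification, the training samples are stacked in groups according to their classes. Thus, instead of the traditional $\ell_1$-minimization formulation in Eq. (3), the following group $\ell_1$-minimization formulation may result in increased classification accuracy:

$$\min_{\mathbf{x}} \sum_{i=1}^{c} \|\mathbf{x}_{G_i}\|_2 \quad \text{subject to} \quad \|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2 \leq \epsilon, \quad (7)$$

where $\mathbf{x}_{G_i}$ is the group of coefficients from the $i$-th class and $c$ is the number of classes. In this manner, one possible cost function for an SE network would be

$$E(\mathbf{x}) = \sum_{p} \left(\mathcal{P}_{\Theta}(\tilde{\mathbf{x}})_p - v_p\right)^2 + \lambda \sum_{i=1}^{c} \left\|\mathcal{P}_{\Theta}(\tilde{\mathbf{x}})_{G_i}\right\|_2, \quad (8)$$

where $\mathcal{P}_{\Theta}(\tilde{\mathbf{x}})_p$ is the network output at location $p$ and $v_p$ is the ground-truth binary mask of the sparse code $\mathbf{x}$ at that location. Due to its high computational complexity, we approximate the cost function in (8) with a simpler average pooling layer after the last convolutional layer, which directly produces the estimated class in our CSEN design. An illustration of the proposed CSEN-based Covid-19 recognition is shown in Fig. 3.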
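The group norm of the group formulation above and the average-pooling shortcut can be sketched as follows. The tile layout (one equal-width tile per class, placed side by side) is a hypothetical assumption for illustration, not the exact CSEN output layer.

```python
import numpy as np

def group_l1_l2(x, groups):
    """Mixed l1/l2 norm: sum over classes of ||x_{G_i}||_2."""
    return float(sum(np.linalg.norm(x[groups == g]) for g in np.unique(groups)))

def class_from_plane(P, n_classes):
    """Average-pool each class tile of a 2-D output map and return the best class.

    Assumes (hypothetically) one equal-width tile per class, placed side by side.
    """
    tiles = np.split(P, n_classes, axis=1)
    scores = np.array([tile.mean() for tile in tiles])
    return int(np.argmax(scores)), scores

x = np.array([3.0, 4.0, 0.0, 0.0, 1.0, 0.0])
groups = np.array([0, 0, 1, 1, 2, 2])
print(group_l1_l2(x, groups))                  # 6.0  (= 5 + 0 + 1)

P = np.full((2, 12), 0.1)
P[:, 6:9] = 0.9                                # output concentrated in class 2's tile
print(class_from_plane(P, n_classes=4)[0])     # 2
```

Averaging over a whole tile rewards outputs whose mass is concentrated in a single class group, which is the behaviour the group penalty in the cost function encourages, at the cost of a single pooling operation.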

Fig. 7: The illustration of the proposed dictionary design vs. the conventional design in representation-based classifiers.

IV-D Competing Methods

This section summarizes the competing methods, which were selected among numerous alternatives due to their superior performance levels in similar problems. For a fair comparative evaluation, all classification methods are fed the same input feature vectors as the proposed CSENs.

IV-D1 Collaborative representation-based classification

As a competing technique to the proposed CSEN-based technique, which is a hybrid method, CRC [collaborative] is a direct representation-based classification method. It is a non-iterative support estimation technique that achieves classification performance comparable to SRC at a faster speed, and it is more stable than existing iterative sparse recovery tools, as shown in [CSEN]. In the first step of CRC, the trade-off parameter $\lambda$ of the regularized least-squares solution is set to a fixed value.

IV-D2 Multi-layer Perceptron (MLP) classification

As one of the most common classifiers, a 4-hidden-layer MLP is used for this problem. For training, we used Back-Propagation (BP) with the Adam optimization technique [adam]. The training hyper-parameters are the learning rate $\alpha$, the moment update rates $\beta_1$ and $\beta_2$, and the number of epochs. Fig. 8 illustrates the network configuration in detail. This configuration achieved the best performance among the alternatives tested (deeper and shallower): deeper configurations suffered from over-fitting, while shallower ones exhibited inferior learning performance.

Fig. 8: The MLP configuration.

IV-D3 Support Vector Machines (SVMs)

For a multi-class problem, the first objective is to select the SVM topology for ensemble learning: one-vs-one or one-vs-all. In order to find the optimal topology and hyper-parameters (e.g., kernel type and its parameters), we first performed a grid search with the following variations and settings: kernel function {linear, radial basis function (RBF)}, box constraint (the $C$ parameter) varied over a log-scaled range, and kernel scale ($\gamma$ for the RBF kernel) varied over a log-scaled range.

IV-D4 k-Nearest-Neighbor (k-NN)

Finally, as a traditional approach, k-Nearest Neighbor (k-NN) is used with PCA dimensionality reduction. In a similar fashion, the distance metric and the value of k are optimized by a prior grid search. The following distance metrics are evaluated: city-block, Chebyshev, correlation, cosine, Euclidean, Hamming, Jaccard, Mahalanobis, Minkowski, standardized Euclidean and Spearman. The value of k is varied over a log-scaled range.

Method | Accuracy (Bacterial / Viral / Normal / Covid-19) | Sensitivity (Bacterial / Viral / Normal / Covid-19) | Specificity (Bacterial / Viral / Normal / Covid-19)
k-NN | 0.777 / 0.801 / 0.903 / 0.950 | 0.623 / 0.612 / 0.899 / 0.965 | 0.898 / 0.859 / 0.904 / 0.949
SVM | 0.771 / 0.788 / 0.928 / 0.928 | 0.586 / 0.632 / 0.911 / 0.981 | 0.916 / 0.837 / 0.933 / 0.924
MLP | 0.761 / 0.765 / 0.923 / 0.947 | 0.620 / 0.561 / 0.885 / 0.965 | 0.872 / 0.828 / 0.936 / 0.946
CRC | 0.820 / 0.827 / 0.928 / 0.955 | 0.758 / 0.550 / 0.922 / 0.968 | 0.869 / 0.913 / 0.930 / 0.954
ReconNet | 0.765 / 0.785 / 0.918 / 0.936 | 0.590 / 0.625 / 0.891 / 0.970 | 0.902 / 0.834 / 0.927 / 0.933
CSEN1 | 0.793 / 0.805 / 0.926 / 0.955 | 0.656 / 0.642 / 0.906 / 0.985 | 0.901 / 0.856 / 0.932 / 0.953
CSEN2 | 0.794 / 0.803 / 0.927 / 0.959 | 0.659 / 0.646 / 0.904 / 0.985 | 0.900 / 0.852 / 0.934 / 0.957
TABLE I: Classification Performances of the proposed CSEN and competing methods. The best Covid-19 recognition (sensitivity) rates are highlighted.

V Experimental Results

V-A Experimental Setup

We have performed our experiments over the QaTa-Cov19 dataset, which consists of normal and three pneumonia classes: bacterial, viral, and Covid-19. The proposed approach is evaluated using a stratified 5-fold cross-validation (CV) scheme with a ratio of 80% for training and 20% for the test (unseen folds) splits, respectively.
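A stratified 5-fold split assigns each class's samples evenly across the folds, so every test fold keeps the class proportions of the whole dataset. The NumPy sketch below deals each class's shuffled indices round-robin into the folds; the seed and the procedure are assumptions, since the paper does not specify how its folds were drawn. With the class totals of Table II, the bacterial class splits into exactly 552 test images per fold, while classes whose totals are not divisible by 5 differ by one image between folds.

```python
import numpy as np

def stratified_kfold(labels, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over the folds."""
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Deal the shuffled samples of class c round-robin into the folds.
        fold_of[idx] = np.arange(idx.size) % n_folds
    for f in range(n_folds):
        yield np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)

# Class totals from Table II: bacterial, viral, normal, Covid-19.
labels = np.repeat(np.arange(4), [2760, 1485, 1579, 462])
for train_idx, test_idx in stratified_kfold(labels):
    print(len(train_idx), np.bincount(labels[test_idx], minlength=4))
```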

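Class | Total # of Samples | Training Samples per Fold (before augmentation) | Training Samples per Fold (after augmentation) | Test Samples per Fold
Bacterial | 2760 | 2208 | 2208 | 552
Viral | 1485 | 1188 | 2208 | 297
Normal | 1579 | 1263 | 2208 | 316
Covid-19 | 462 | 370 | 2208 | 92
Total | 6286 | 5029 | 8832 | 1257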
TABLE II: Number of images per class and per-fold before and after data augmentation.

Table II shows the number of X-ray images per class in the QaTa-Cov19 dataset. Since the dataset is unbalanced, we applied data augmentation to the training set in order to balance the size of each class. Therefore, the X-ray images in the viral pneumonia, Covid-19 pneumonia and normal classes are augmented up to the same number as the bacterial pneumonia class in the training set. We use the Keras ImageDataGenerator to perform data augmentation by applying ZCA whitening, randomly rotating the X-ray images within a range of 10 degrees, and randomly shifting the images both horizontally and vertically within a fixed interval. In each CV fold, we use a total of 8832 and 1257 images in the training and test (unseen in the fold) sets, respectively.
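The balancing arithmetic behind Table II can be checked in a few lines of Python: each class is augmented up to the per-fold training size of the bacterial class.

```python
# Per-fold training counts before augmentation (Table II).
train_counts = {"Bacterial": 2208, "Viral": 1188, "Normal": 1263, "Covid-19": 370}

# Every class is augmented up to the size of the largest (bacterial) class.
target = max(train_counts.values())
extra = {c: target - n for c, n in train_counts.items()}
total_after = target * len(train_counts)
print(extra)        # {'Bacterial': 0, 'Viral': 1020, 'Normal': 945, 'Covid-19': 1838}
print(total_after)  # 8832
```

The Covid-19 class therefore receives the largest share of synthetic images, roughly five augmented images per original one.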

The experimental evaluations of SVM, k-NN and CRC are performed using MATLAB version 2019a, running on a PC with an Intel® i7-8650U CPU and 32 GB of system memory. On the other hand, the MLP and CSEN methods are implemented using the TensorFlow library [abadi2016tensorflow] with Python on an NVIDIA® TITAN-X GPU card. For the CSEN training, the ADAM optimizer [adam] is used with its proposed default learning parameters, i.e., learning rate $\alpha = 0.001$ and moment update rates $\beta_1 = 0.9$ and $\beta_2 = 0.999$, with only 15 Back-Propagation epochs. Neither grid search nor any other parameter or configuration optimization was performed for CSEN.

V-B Experimental Results

The same network configurations are used for CSEN as in [CSEN]. Accordingly, we use two compact CSEN designs, CSEN1 and CSEN2. The first CSEN network consists of only two hidden convolutional layers: the first layer has 48 neurons and the second has 24. The ReLU activation function is used in the hidden layers. On the other hand, CSEN2 uses max-pooling and has one additional hidden layer with 24 neurons that performs transposed convolution. CSEN1 and CSEN2 are compared against the 6 competing methods under the same experimental setup.

For the dictionary construction in each CSEN design, a fixed number of images per class (taken from the augmented training samples of each fold) are stacked in such a way that the representation coefficient vector, reshaped into the 2-D plane, has the layout shown in Fig. 7. The rest of the images in the training set are used to train each CSEN, with the same number of samples from each class. We use a PCA dimensionality reduction matrix $\boldsymbol{\Phi}$ with compression ratio $m/d$. Therefore, we have the equivalent dictionary $\mathbf{A} = \boldsymbol{\Phi}\mathbf{D}$ and the denoiser $\mathbf{B}$ to obtain a coarse estimate of the representation (sparse in the ideal case) coefficients, $\tilde{\mathbf{x}} = \mathbf{B}\mathbf{y}$. Hereafter, the CSEN networks are trained to predict the class of $\mathbf{y}$ from the input $\tilde{\mathbf{x}}$, as illustrated in Fig. 3.
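The construction of $\mathbf{A}$, $\mathbf{B}$ and $\tilde{\mathbf{x}}$ can be sketched as follows. The form of the denoiser is an assumption here: the sketch uses the regularized least-squares matrix $\mathbf{B} = (\mathbf{A}^T\mathbf{A} + \lambda\mathbf{I})^{-1}\mathbf{A}^T$, which approaches the pseudo-inverse of $\mathbf{A}$ as $\lambda \to 0$. All sizes, the value of $\lambda$, and the random matrices standing in for the PCA matrix and the dictionary are hypothetical.

```python
import numpy as np

def denoiser(A, lam):
    """B = (A^T A + lam*I_n)^{-1} A^T, computed as A^T (A A^T + lam*I_m)^{-1}."""
    m = A.shape[0]
    # M = A A^T + lam*I is symmetric, so solve(M, A).T equals A^T M^{-1}.
    return np.linalg.solve(A @ A.T + lam * np.eye(m), A).T

rng = np.random.default_rng(0)
d, m, n = 1024, 64, 400                      # hypothetical feature, PCA and dictionary sizes
D = rng.standard_normal((d, n))              # stand-in for class-grouped training features
Phi = np.linalg.qr(rng.standard_normal((d, m)))[0].T  # stand-in for the PCA matrix (m x d)
A = Phi @ D                                  # equivalent dictionary, m x n
B = denoiser(A, lam=2.0)                     # denoiser, n x m
y = Phi @ D[:, 0]                            # query: projected first training sample
x_tilde = B @ y                              # coarse proxy that would be fed to a CSEN
print(A.shape, B.shape, x_tilde.shape)       # (64, 400) (400, 64) (400,)
```

Computing $\mathbf{B}$ through the identity $(\mathbf{A}^T\mathbf{A} + \lambda\mathbf{I}_n)^{-1}\mathbf{A}^T = \mathbf{A}^T(\mathbf{A}\mathbf{A}^T + \lambda\mathbf{I}_m)^{-1}$ only requires solving an $m \times m$ system, which is much smaller than $n \times n$ when the dictionary has many columns.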

Due to the lack of other learning-based SE studies in the literature, we chose a network deeper than the CSEN designs to investigate the role of network depth in this problem. ReconNet [reconnet] was proposed as a non-iterative deep learning solution to the compressive sensing problem, i.e., recovering a signal from its compressive measurements, and it is one of the state-of-the-art methods in compressively sensed image recognition tasks. It consists of 6 fully convolutional layers and one dense layer in front of the convolutional ones, which acts as a learned denoiser mapping $\mathbf{y}$ to the proxy $\tilde{\mathbf{x}}$. The convolutional layers are then responsible for producing the reconstructed signal $\hat{\mathbf{x}}$ from $\tilde{\mathbf{x}}$. Therefore, by replacing this dense layer with the denoiser matrix $\mathbf{B}$, this network can be used as a competing method.

Both CSEN and the modified ReconNet take as input the coarse estimate of the representation coefficients, which is produced using the equivalent dictionary and its pseudo-inverse matrix.

Fig. 9: False negatives of the proposed Covid-19 recognition scheme.

To design the dictionary of the CRC system, all training samples are stacked in the dictionary, i.e., 2208 samples from each class (4 × 2208 = 8832 samples in total). The same PCA matrix used in CSEN-based recognition is applied to these features. Therefore, the CRC framework uses a considerably larger dictionary, and a correspondingly larger denoiser matrix, than the light dictionary used in CSEN.
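For comparison, a CRC decision of this kind can be sketched in two steps. First, regularized least-squares coding over the whole dictionary. Second, a decision by the class-wise regularized residual. The sketch follows the generic CRC-RLS rule. The function name `crc_classify`, the regularization value, and the toy data are assumptions. The original CRC formulation also normalizes the dictionary columns, which is omitted here.

```python
import numpy as np

def crc_classify(D, labels, y, lam=1e-2):
    """CRC-RLS: code y over the whole dictionary, then pick the class with the
    smallest regularized residual ||y - D_c x_c|| / ||x_c||."""
    n = D.shape[1]
    P = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T)  # (n, m) projection, computable offline
    x = P @ y                                            # collaborative coefficients
    classes = np.unique(labels)
    scores = []
    for c in classes:
        idx = labels == c
        residual = np.linalg.norm(y - D[:, idx] @ x[idx])
        scores.append(residual / (np.linalg.norm(x[idx]) + 1e-12))
    return int(classes[np.argmin(scores)])

# Toy data: 4 classes, 8 atoms each, 60-dimensional features.
rng = np.random.default_rng(1)
m, per_class, n_classes = 60, 8, 4
means = [5.0 * rng.normal(size=m) for _ in range(n_classes)]
D = np.column_stack([mu + 0.3 * rng.normal(size=m) for mu in means for _ in range(per_class)])
labels = np.repeat(np.arange(n_classes), per_class)
preds = [crc_classify(D, labels, mu + 0.05 * rng.normal(size=m)) for mu in means]
```

The projection `P` has one row per dictionary atom, so its size, and the cost of applying it, grows with the number of stacked training samples.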

Method                      MLP       CSEN1    CSEN2    ReconNet
# of trainable parameters   672,836   11,089   16,297   22,914
TABLE III: The number of trainable network parameters of each method.
Time (in sec.)
13.4176 40.7878 0.2196 0.2272 0.2993 0.2935
TABLE IV: Computation times (sec) of each method over 1257 test images.

The classification performance of the proposed CSEN-based approach and the competing methods is presented in Table I. As Table I shows, the proposed approaches surpass all competing methods in Covid-19 recognition, achieving over 98% sensitivity and over 95% specificity. As shown in Table III, the proposed CSEN designs are very compact compared to MLP and ReconNet. They are also computationally efficient, as Table IV shows: it reports the computational complexity as the total computation time over the 1257 test images.

When compared against CRC in particular, CSEN-based classification has two advantages: computational efficiency and superior Covid-19 recognition performance. The computational efficiency comes from the fact that CRC uses a larger dictionary matrix (2208 instead of 625 samples per class), which requires more computation in the matrix-vector multiplications. Furthermore, storing the CSEN trainable parameters together with the coefficients of the light dictionary on the test device is more memory-efficient than storing the coefficients of the larger dictionary used in CRC.
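The memory argument can be made concrete with a back-of-the-envelope count. The reduced feature dimension `M = 512` is a hypothetical value chosen only for illustration. The network size is taken as 16,297 trainable parameters, a Table III entry assumed here to belong to CSEN2. Both schemes share the PCA matrix, so it is left out.

```python
# Back-of-the-envelope storage comparison (number of stored coefficients).
M = 512                          # hypothetical PCA-reduced feature dimension (assumption)
N_CLASSES = 4
CSEN_ATOMS = 625 * N_CLASSES     # light dictionary used by CSEN
CRC_ATOMS = 2208 * N_CLASSES     # all training samples, used by CRC
CSEN_PARAMS = 16_297             # Table III entry, assumed to be CSEN2

# Each scheme stores an (atoms x M) dictionary-derived matrix; CSEN also stores its network weights.
csen_storage = CSEN_PARAMS + CSEN_ATOMS * M
crc_storage = CRC_ATOMS * M
print(csen_storage, crc_storage, round(crc_storage / csen_storage, 2))  # 1296297 4521984 3.49
```

For any `M`, the dictionary-derived part shrinks by a factor of 2208/625 ≈ 3.5. The fixed network weights matter only when `M` is very small.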

CRC (Light)   Accuracy   Sensitivity   Specificity
Bacterial     0.8129     0.7464        0.8650
Viral         0.8163     0.5461        0.8998
Normal        0.9267     0.9170        0.9299
Covid-19      0.9564     0.9394        0.9578
TABLE V: Performance of the CRC algorithm when it uses the light dictionary of CSEN (625 samples per class).

For further analysis, we also tested the CRC framework with the light dictionary (625 samples per class) used in CSEN-based recognition. We call this variant CRC (Light). As Table V shows, the performance of CRC degrades further, and there is no significant improvement in computational cost. Deeper convolutional networks, such as the modified ReconNet, could be used instead of the CSEN designs. However, the results in Table I show that compact CSEN structures are indeed preferable, as they achieve superior classification performance compared to deeper networks.

CSEN2                       Predicted
Real          Bacterial   Viral   Normal   Covid-19
Bacterial     1818        636     180      126
Viral         338         959     127      61
Normal        15          71      1428     65
Covid-19      0           3       4        455
TABLE VI: The overall (cumulative) confusion matrix of the proposed recognition scheme.

Finally, Table VI presents the overall (cumulative) confusion matrix of the proposed CSEN-based Covid-19 recognition approach over the new QaTa-Cov19 dataset. The most critical misclassifications are the false negatives, that is, the misclassified Covid-19 X-ray images. The confusion matrix shows that the proposed approach misclassified 7 Covid-19 images (out of 462). Three of these 7 misclassifications still fall into the "Viral Pneumonia" category, which is an expected confusion given the viral nature of Covid-19. However, the other four cases are misclassified as "Normal", which is indeed a severe clinical misdiagnosis. A closer look at these false negatives in Fig. 9 reveals that they are indeed very similar to normal images: typical Covid-19 patterns are hardly visible, even to an expert's naked eye. These images may come from patients in the very early stages of Covid-19.
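The Covid-19 figures discussed above follow directly from Table VI. The sketch below treats Covid-19 as the positive class in a one-vs-rest view and uses the matrix entries as given. The helper name `one_vs_rest` is illustrative.

```python
# Rows: true class; columns: predicted class (CSEN2, cumulative over folds), from Table VI.
CLASSES = ["Bacterial", "Viral", "Normal", "Covid-19"]
CM = [
    [1818, 636, 180, 126],
    [338, 959, 127, 61],
    [15, 71, 1428, 65],
    [0, 3, 4, 455],
]

def one_vs_rest(cm, k):
    """Sensitivity, specificity and accuracy with class k treated as the positive class."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # class-k images predicted as another class
    fp = sum(row[k] for row in cm) - tp     # other images predicted as class k
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / total

sens, spec, acc = one_vs_rest(CM, CLASSES.index("Covid-19"))
print(f"sensitivity={sens:.4f} specificity={spec:.4f} accuracy={acc:.4f}")
# sensitivity=0.9848 specificity=0.9567 accuracy=0.9588
```

Sensitivity is driven by the 7 false negatives. Specificity is driven by the 252 non-Covid images predicted as Covid-19 (126 + 61 + 65).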

VI Conclusions

The methods commonly used in Covid-19 diagnosis, namely Reverse Transcription-Polymerase Chain Reaction and Computed Tomography, have certain limitations and drawbacks, such as long processing times and unacceptably high misdiagnosis rates. Because data from Covid-19 cases are scarce, most recent deep-learning-based works in the literature share these drawbacks. Deep Learning based recognition techniques dominate Computer Vision, where they have achieved state-of-the-art performance. However, their performance degrades quickly under data scarcity, which is the reality of the problem at hand. This study addresses these limitations by proposing a robust and highly accurate Covid-19 recognition approach that works directly on raw X-ray images without any pre- or post-processing. The proposed approach is based on CSEN, which can be seen as a bridge between Deep Learning models and representation-based methods. CSEN uses both a dictionary and a set of training samples to learn a direct mapping from query samples to the sparse support set of the representation coefficients. With this unique ability and the advantage of a compact network, the proposed CSEN-based Covid-19 recognition systems surpass the competing methods and achieve over 98% sensitivity and over 95% specificity. Furthermore, they yield the most computationally efficient scheme in terms of speed and memory. Finally, the largest dataset of X-ray images, QaTa-Cov19, will be released along with this study as a benchmark dataset in this domain. This will help accelerate research efforts globally and support the fight against Covid-19 worldwide.