
OSegNet: Operational Segmentation Network for COVID-19 Detection using Chest X-ray Images

by   Aysen Degerli, et al.

Coronavirus disease 2019 (COVID-19) has been diagnosed automatically using Machine Learning algorithms over chest X-ray (CXR) images. However, most of the earlier studies used Deep Learning models over scarce datasets, bearing the risk of overfitting. Additionally, previous studies have revealed that deep networks are not reliable for classification, since their decisions may originate from irrelevant areas of the CXRs. Therefore, in this study, we propose the Operational Segmentation Network (OSegNet), which performs detection by segmenting COVID-19 pneumonia for a reliable diagnosis. To address the data scarcity encountered in training and especially in evaluation, this study extends the largest COVID-19 CXR dataset, QaTa-COV19, to 121,378 CXRs, including 9,258 COVID-19 samples with their corresponding ground-truth segmentation masks, which are publicly shared with the research community. Consequently, OSegNet has achieved a detection performance with the highest accuracy of 99.65%.



1 Introduction

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), has infected millions since it was first reported in 2019. The World Health Organization (WHO) has declared COVID-19 a pandemic since it is highly contagious (especially its mutations) and seriously affects immunocompromised and elderly patients [vishnevetsky2020rethinking]. However, performing a reliable diagnosis of COVID-19 is challenging since its symptoms, such as cough, breathlessness, and fever, are similar to those of other viral diseases [singhal2020review]. Moreover, COVID-19 is not always symptomatic, which allows asymptomatic individuals to spread the disease broadly [10.3389/fpubh.2020.00473]. Consequently, computer-aided diagnosis is necessary to perform fast and accurate COVID-19 detection to prevent the further spread of the disease.

COVID-19 diagnosis can be performed via nucleic acid detection with real-time polymerase chain reaction (RT-PCR) or via imaging techniques: computed tomography (CT) and chest X-ray (CXR) imaging. Even though RT-PCR is defined as the reference standard to diagnose COVID-19, it lacks stability in laboratory test results, with high false-negative rates [tahamtan2020real]. Contrary to RT-PCR, CT has a higher sensitivity [bernheim2020chest]. However, its clinical utility is limited, especially for asymptomatic individuals [waller2020diagnostic]. Thus, CXR imaging is widely used due to its fast acquisition, easy accessibility, lower radiation exposure, and lower risk of cross-infection compared to other diagnostic tools [cozzi2020chest].

Deep Learning (DL) has achieved remarkable performance in COVID-19 diagnosis using CXRs. Many studies [narin2021automatic, apostolopoulos2020covid, wang2020covid, chowdhury2020pdcovidnet, pham2020classification] used DL models to perform COVID-19 classification by transfer learning. However, they have evaluated the performance of deep networks only over scarce, limited-size datasets. Such data scarcity has the potential to cause overfitting, since DL models need a significantly large amount of data for generalization. Moreover, the control group of the aforementioned studies contains only healthy subjects or a limited set of thoracic diseases, i.e., bacterial or other viral pneumonia against COVID-19 pneumonia. Thus, their clinical usage is unfeasible for real-case scenarios. Additionally, several studies [degerliICIP, keidar2021covid, tahir2022deep] have investigated the decision-making process of deep models in classification tasks. Accordingly, the unreliability of DL models was revealed by their activation maps, where the attention was on irrelevant areas of the CXRs, such as the background, text, or bones, rather than the lungs. Thus, a few studies [degerli2021covid, tahir2021covid] performed COVID-19 pneumonia segmentation for reliable COVID-19 detection with deep networks using CXRs.

Figure 1: The proposed OSegNet model for COVID-19 pneumonia segmentation is illustrated, where transfer learning is performed in the encoder block, and operational layers (Oper2D) are used in the decoder block.

In this study, to address the aforementioned limitations, we propose the Operational Segmentation Network (OSegNet) that performs COVID-19 pneumonia segmentation for the diagnosis using CXR images. Contrary to the convolutional layers used in many deep networks, operational layers with the generative neurons of Self-Organized Operational Neural Networks (Self-ONNs) [KIRANYAZ2021294, malik2021self, yilmaz2021self, kelecs2021self, devecioglu2021real] are used in the decoder block. Self-ONNs are heterogeneous network models with generative neurons that can create any non-linear transformation in each kernel element. Such diversity does not only yield a superior learning performance but also allows a significant reduction in the network depth and complexity. The proposed OSegNet has an autoencoder structure, except that operational layers are used at the decoder, as illustrated in Fig. 1. Thus, this study uses operational layers for the first time for image segmentation. Additionally, in this study, the QaTa-COV19 dataset that was introduced previously in our study [degerli2021covid] is extended to reach 9,258 COVID-19 samples with their corresponding ground-truth segmentation masks. Thus, together with a control group of CXRs from healthy subjects and different thoracic diseases, QaTa-COV19 is the largest publicly available dataset for COVID-19 pneumonia segmentation over CXR images. (The benchmark QaTa-COV19 is publicly shared at the repository.)

The rest of the paper is organized as follows. The proposed OSegNet model and the QaTa-COV19 dataset are introduced in Section 2. The experimental results and conclusion are given in Section 3 and Section 4, respectively.

2 Methodology and Materials

In this section, the proposed OSegNet model is first introduced, and then, the details of the benchmark QaTa-COV19 dataset are presented.

2.1 OSegNet: Operational Segmentation Network

Convolutional Neural Networks (CNNs) are widely used for many computer vision tasks, including COVID-19 diagnosis. However, the potential of CNNs is limited due to their homogeneous network structure and linear neuron model. Thus, many studies have proposed deeper structures with skip connections to diversify CNNs and boost their performance. Furthermore, the performance is increased by transfer learning, which helps the model converge faster and more stably.

Contrary to convolutional layers, operational layers [KIRANYAZ2021294] have a generative neuron model that can create any non-linear transformation of each kernel element to achieve a highly heterogeneous network. Accordingly, the input of the k-th neuron at layer l is calculated as follows:

x_k^l = b_k^l + \sum_{i=1}^{N_{l-1}} \psi(w_{ik}^l, y_i^{l-1})    (1)

where b_k^l is the bias, N_{l-1} is the number of neurons at the previous layer, and the nodal operation \psi is performed between the weights w_{ik}^l of the layer and the outputs y_i^{l-1} of the previous layer. Nodal operator functions are generated during back-propagation training using the Taylor polynomial approximation of any non-linear function. Thus, nodal operator functions can define any arbitrary function f as the infinite sum of the function's derivatives at a point a as follows:

f(y) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (y - a)^n    (2)

where f^{(n)}(a) is the n-th derivative of f at the point a, and n! is the factorial of n. Accordingly, nodal operator functions can be truncated by the Q-th order Taylor approximation as follows:

\psi(w, y) = \sum_{q=0}^{Q} w_q (y - a)^q    (3)

where w is the array that contains the weights w_q = f^{(q)}(a) / q!. The Maclaurin series representation of (3) can be formulated for a = 0 using the tangent hyperbolic (tanh) activation function, which maps the neuron outputs into [-1, 1], as follows:

\psi(w, y) = \sum_{q=1}^{Q} w_q y^q    (4)

where w_0 is dropped due to the compensation from the common bias element b_k^l of each neuron. The structure of OSegNet is similar to an autoencoder that maps the input image X to its output mask \hat{M}, where the network consists of encoder and decoder parts as depicted in Fig. 1.

Accordingly, the OSegNet encoder is composed of a state-of-the-art model whose weights are initialized with ImageNet weights by transfer learning. The proposed model uses operational layers to decode the features of the state-of-the-art encoder, where the decoder consists of operational layers organized into five decoder blocks. Each decoder block includes an operational transposed layer for upsampling, batch normalization, and a tanh activation function. The output of the last block is fed to an operational layer with a sigmoid activation function. A fixed kernel size is used for each operational layer with a sequence of filter sizes. Finally, OSegNet is trained over N pairs of samples (X, M), where X and M are the training CXRs and their ground-truth masks, respectively.
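As a concrete illustration, the Q-th order nodal operation of a generative neuron, \psi(w, y) = \sum_{q=1}^{Q} w_q y^q, can be sketched in a few lines of NumPy (the function name and sample values are illustrative only, not from the paper's implementation):

```python
import numpy as np

def nodal_operation(w, y):
    """Q-th order nodal operation of a generative neuron:
    psi(w, y) = sum_{q=1}^{Q} w_q * y**q  (Maclaurin form; w_0 is
    absorbed into the neuron's common bias). w has length Q, and y
    may be a scalar or a NumPy array."""
    Q = len(w)
    return sum(w[q - 1] * y ** q for q in range(1, Q + 1))

# With Q = 1 the generative neuron reduces to the ordinary linear
# (convolutional) neuron: psi(w, y) = w_1 * y.
y = np.array([0.5, -0.2])
assert np.allclose(nodal_operation([2.0], y), 2.0 * y)

# With Q = 3 the neuron realizes a cubic polynomial of its input:
# 1.0*0.5 + 0.5*0.5**2 + 0.25*0.5**3 = 0.65625
assert np.isclose(nodal_operation([1.0, 0.5, 0.25], 0.5), 0.65625)
```

Setting Q = 1 recovers the ordinary linear neuron, which is why convolutional layers can be seen as a special case of operational layers.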

In this study, the state-of-the-art networks DenseNet-121 [huang2017densely] and Inception-v3 [szegedy2016rethinking] are used as the encoder of the OSegNet model. Additionally, the decoder structures UNet++ [zhou2018unet++] and DLA [yu2018deep], which merge the encoder and decoder with skip connections and nested convolutional blocks, are used as competing networks against the proposed model.
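One common way to realize such operational layers with standard deep-learning primitives is to stack the element-wise powers of the input along the channel axis and apply an ordinary convolution. The following NumPy sketch uses a 1x1 kernel for brevity; the shapes, Q, and the tanh placement are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def oper2d_1x1(x, weights, bias, Q):
    """Minimal 1x1 operational (Oper2D) layer sketch: a Q-th order
    operational layer can be realized by stacking the element-wise
    powers x, x**2, ..., x**Q along the channel axis and applying one
    ordinary (here 1x1) convolution.
    x:       (H, W, C)    input feature map
    weights: (C * Q, F)   one weight per (input channel, power) pair
    bias:    (F,)         common bias per output filter
    """
    powers = np.concatenate([x ** q for q in range(1, Q + 1)], axis=-1)  # (H, W, C*Q)
    return np.tanh(powers @ weights + bias)  # tanh bounds outputs to [-1, 1]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
w = rng.standard_normal((4 * 3, 16)) * 0.1
b = np.zeros(16)
out = oper2d_1x1(x, w, b, Q=3)
assert out.shape == (8, 8, 16)
assert np.abs(out).max() <= 1.0  # bounded by the tanh activation
```

Larger kernels work the same way, with the power-stacked tensor fed to a standard KxK convolution, which is why operational layers remain compatible with existing deep-learning frameworks.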

Encoder Model Sensitivity Specificity Precision F1-Score F2-Score Accuracy
DenseNet-121 UNet++ 99.91
DLA 99.91
OSegNet (q=1) 99.91
OSegNet (q=2)
OSegNet (q=3) 87.25 87.42 87.32
OSegNet (q=4) 99.91 89.85 99.78
OSegNet (q=5)
Inception-v3 UNet++
DLA 99.91 89.63 87.89 99.78
OSegNet (q=1)
OSegNet (q=2) 99.78
OSegNet (q=3) 89.36 88.75 99.78
OSegNet (q=4) 99.78
OSegNet (q=5)
Table 1: COVID-19 pneumonia segmentation performance results (%) computed over the test (unseen data) set of QaTa-COV19 dataset using state-of-the-art and the proposed OSegNet models.

2.2 QaTa-COV19 Dataset

Tampere University and Qatar University researchers have compiled the QaTa-COV19 dataset, which is the largest CXR dataset for COVID-19 pneumonia segmentation. The control group images of the dataset are obtained from the ChestX-ray14 dataset [wang2017chestx], which consists of CXRs from healthy subjects and different thoracic diseases. Additionally, the COVID-19 images are collected from the publicly available BIMCV-COVID19+ dataset [vaya2020bimcv], along with the CXRs from our previous study [degerli2021covid]. In this study, we annotated the CXRs of BIMCV-COVID19+ [vaya2020bimcv] to create the extended version of QaTa-COV19. For this purpose, we first eliminated repeated acquisitions from the same patient, session, and run in BIMCV-COVID19+ [vaya2020bimcv] to remove any duplications.

Training Samples
Total 93,669 106,524 27,709
Table 2: Details of QaTa-COV19 dataset.

The ground-truths of the CXRs are generated by a collaborative human-machine annotation approach that enables fast and accurate annotation of COVID-19 pneumonia regions using deep networks inspired by the U-Net [ronneberger2015u], UNet++ [zhou2018unet++], and DLA [yu2018deep] architectures, as used in our previous study [degerli2021covid]. These networks are trained on previously annotated COVID-19 samples and healthy subjects from the group-I data in [degerli2021covid]. The trained segmentation networks are then used to predict the ground-truth masks of the CXRs from BIMCV-COVID19+ [vaya2020bimcv]. Accordingly, the best predictions of the segmentation networks are selected as the ground-truth segmentation masks in collaboration with expert medical doctors. Lastly, for a small number of CXR images, the predicted segmentation masks are not selected since they are not accurate enough; hence, their masks are manually drawn by medical doctors.

Table 2 shows the details of the QaTa-COV19 dataset. Since the train and test sets of ChestX-ray14 [wang2017chestx] are predefined, the COVID-19 samples are split with the same train/test ratio as in [wang2017chestx] by taking the patient information into account; thus, the sets contain different subjects. The CXRs in the QaTa-COV19 dataset are resized to a fixed resolution. We have applied data augmentation using the Image Data Generator in Keras. Accordingly, the CXRs are randomly rotated within a limited degree range and shifted vertically and horizontally with the nearest mode to fill pixels outside the input boundaries.
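The shift-with-nearest-fill behavior described above can be sketched in NumPy as follows (the shift amounts are illustrative, since the exact augmentation ranges are not given here):

```python
import numpy as np

def shift_nearest(img, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling the exposed border
    by replicating the nearest edge pixels ('nearest' fill mode)."""
    h, w = img.shape
    rows = np.clip(np.arange(h) - dy, 0, h - 1)
    cols = np.clip(np.arange(w) - dx, 0, w - 1)
    return img[np.ix_(rows, cols)]

img = np.arange(9).reshape(3, 3)
# Shift one pixel to the right: the left column is replicated.
out = shift_nearest(img, 0, 1)
assert (out[:, 0] == out[:, 1]).all()   # replicated edge column
assert (out[:, 2] == img[:, 1]).all()   # remaining content moved right
```

In practice the same effect is obtained in Keras by setting the fill mode of the data generator to nearest; the sketch only makes the pixel-level behavior explicit.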

3 Experimental Evaluation

In this section, the experimental setup is introduced. Then, the experimental results are reported over the QaTa-COV19 dataset.

3.1 Experimental Setup

The experimental evaluations are performed over the test (unseen) set of the QaTa-COV19 dataset. COVID-19 pneumonia segmentation is evaluated at the pixel level, where the foreground (pneumonia) and background are considered as the positive and negative classes, respectively. Accordingly, the standard performance metrics are calculated as follows: sensitivity is the ratio of correctly identified COVID-19 samples in the positive class, specificity is the ratio of correctly detected control group samples in the negative class, precision is the ratio of correctly detected COVID-19 samples among the samples that are detected as the positive class, and accuracy is the ratio of correctly identified samples in the dataset. Lastly, the F-score is defined as follows:

F_\beta = (1 + \beta^2) \frac{\mathrm{Precision} \times \mathrm{Sensitivity}}{\beta^2 \times \mathrm{Precision} + \mathrm{Sensitivity}}    (5)

where the F1-Score is the harmonic average between sensitivity and precision for \beta = 1, whereas the F2-Score weights the sensitivity metric more heavily for \beta = 2. Accordingly, the objective is to achieve a high sensitivity level and F2-Score while minimizing the false alarms (false positives).
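These metrics follow directly from the pixel-level confusion counts; the sketch below implements the definitions above (the example counts are made up for illustration):

```python
def segmentation_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion counts, returned as percentages."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)

    def f_beta(beta):
        # F_beta = (1 + b^2) * P * S / (b^2 * P + S)
        b2 = beta ** 2
        return (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)

    return {
        "sensitivity": 100 * sensitivity,
        "specificity": 100 * specificity,
        "precision": 100 * precision,
        "f1": 100 * f_beta(1),
        "f2": 100 * f_beta(2),  # weights sensitivity over precision
        "accuracy": 100 * accuracy,
    }

m = segmentation_metrics(tp=90, fp=10, tn=880, fn=20)
assert round(m["sensitivity"], 2) == 81.82   # 90 / 110
assert round(m["f1"], 2) == 85.71
assert m["f2"] < m["f1"]  # F2 leans toward the lower sensitivity here
```

Since sensitivity is lower than precision in this example, the F2-score sits closer to the sensitivity than the F1-score does, which is exactly why it is the preferred metric when missed detections are costlier than false alarms.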

The networks are implemented with the TensorFlow library on an NVIDIA GeForce RTX 2080 Ti GPU card. For the optimizer, we have used Adam with its default parameter settings. Furthermore, a hybrid loss function is used that combines the dice and focal losses by summation. Let the ground-truth mask be M with binary pixel labels m_{i,j}, and let the model prediction be \hat{M} with pixel probabilities \hat{m}_{i,j}. Accordingly, the probability of the true class is defined as p_t = \hat{m}_{i,j} if m_{i,j} = 1, and p_t = 1 - \hat{m}_{i,j} otherwise. Thus, we define the dice loss as follows:

\mathcal{L}_{Dice} = 1 - \frac{2 \sum_{i=1}^{H} \sum_{j=1}^{W} m_{i,j} \hat{m}_{i,j}}{\sum_{i=1}^{H} \sum_{j=1}^{W} m_{i,j} + \sum_{i=1}^{H} \sum_{j=1}^{W} \hat{m}_{i,j}}    (6)

where H and W are the height and width of the CXRs. Furthermore, the focal loss is defined as follows:

\mathcal{L}_{Focal} = - \sum_{i=1}^{H} \sum_{j=1}^{W} \alpha (1 - p_t)^{\gamma} \log(p_t)    (7)

where \alpha is the class-balancing parameter and \gamma is the focusing parameter. Accordingly, the models are trained with these settings over a fixed number of epochs.
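A minimal NumPy sketch of this hybrid loss, following the definitions above (the α and γ values shown are common focal-loss defaults, not values reported by this study):

```python
import numpy as np

def dice_loss(m, m_hat, eps=1e-7):
    """1 minus the Dice coefficient between binary mask m and prediction m_hat."""
    return 1.0 - (2.0 * (m * m_hat).sum()) / (m.sum() + m_hat.sum() + eps)

def focal_loss(m, m_hat, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean focal loss; p_t is the predicted probability of the true class."""
    p_t = np.where(m == 1, m_hat, 1.0 - m_hat)
    return (-alpha * (1.0 - p_t) ** gamma * np.log(p_t + eps)).mean()

def hybrid_loss(m, m_hat):
    # The paper's hybrid loss combines the two terms by summation.
    return dice_loss(m, m_hat) + focal_loss(m, m_hat)

m = np.array([[1.0, 0.0], [0.0, 1.0]])
perfect = m.copy()       # exact prediction
poor = 1.0 - m           # every pixel wrong
assert hybrid_loss(m, perfect) < 1e-3
assert hybrid_loss(m, poor) > hybrid_loss(m, perfect)
```

The focal term down-weights easy, well-classified pixels via the (1 − p_t)^γ factor, while the dice term directly rewards region overlap, which suits the small pneumonia regions relative to the whole CXR.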

Encoder Model Sensitivity Specificity Precision F1-Score F2-Score Accuracy
DenseNet-121 UNet++ 99.87 98.37
OSegNet (q=1) 98.15 97.69
OSegNet (q=2)
OSegNet (q=3)
OSegNet (q=4) 97.70 99.65
OSegNet (q=5)
Inception-v3 UNet++ 98.53 97.77
DLA 99.84
OSegNet (q=1)
OSegNet (q=2)
OSegNet (q=3) 99.84 98.09 97.72 99.65
OSegNet (q=4)
OSegNet (q=5)
Table 3: COVID-19 detection performance results (%) computed over the test (unseen data) set of QaTa-COV19 dataset using state-of-the-art and the proposed OSegNet models.

3.2 Experimental Results

In this section, we report the performances of COVID-19 pneumonia segmentation and detection. The COVID-19 pneumonia segmentation results are shown in Table 1, where state-of-the-art and the proposed OSegNet models are compared. The variation in the performance of OSegNet is investigated by changing the order parameter Q. Primarily, we have observed that each model achieves a successful pneumonia segmentation with a high F-score and specificity. It can be seen from Table 1 that any model with the Inception-v3 encoder outperforms its DenseNet-121 counterpart, simply due to its more complex structure and higher number of trainable parameters. Accordingly, among the state-of-the-art models, the best segmentation performance has been achieved by the duo of UNet++ and Inception-v3. Nevertheless, OSegNet with the Inception-v3 encoder has achieved the highest sensitivity level and F-score among all.

(a) Confusion matrix of UNet++:

                    Predicted
                    Control Group   COVID-19
    Control Group        –              –
    COVID-19             –              –

(b) Confusion matrix of OSegNet:

                    Predicted
                    Control Group   COVID-19
    Control Group        –              –
    COVID-19             –              –

Table 4: Confusion matrices of the best performing UNet++ and the proposed OSegNet models with Inception-v3 encoders for COVID-19 detection.

The detection performances are presented in Table 3, which are calculated per CXR sample. Accordingly, a CXR sample is classified as COVID-19 if any pixel in the output mask is predicted as COVID-19 pneumonia. The duo of UNet++ and Inception-v3 holds the best detection performance among the state-of-the-art models with the highest sensitivity level. Nevertheless, the highest F-score and accuracy have once again been achieved by the OSegNet model. Accordingly, the confusion matrices of the best performing models, UNet++ and OSegNet with Inception-v3 encoders, are given in Table 4. It is observed that UNet++ misses only a few COVID-19 cases, whereas OSegNet produces fewer false alarms. Lastly, the OSegNet model has millions fewer parameters than the UNet++ model for both the DenseNet-121 and Inception-v3 versions, as reported in Table 5.
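The per-sample detection rule described above can be sketched as follows (the 0.5 binarization threshold is an illustrative assumption):

```python
import numpy as np

def detect_covid(pred_masks, threshold=0.5):
    """Per-CXR detection rule: a sample is labeled COVID-19 (1) if any
    pixel of its predicted segmentation mask is positive, else 0.
    pred_masks: (N, H, W) array of per-pixel probabilities."""
    binary = pred_masks > threshold                  # binarize each mask
    return (binary.sum(axis=(1, 2)) > 0).astype(int)  # any positive pixel?

masks = np.zeros((3, 4, 4))
masks[1, 2, 2] = 0.9   # a single positive pixel marks sample 1 as COVID-19
labels = detect_covid(masks)
assert labels.tolist() == [0, 1, 0]
```

This is how the pixel-level segmentation output is reduced to the sample-level labels from which the detection metrics in Table 3 are computed.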

Model           Trainable   Non-Trainable

DenseNet-121 encoder:
UNet++          M           K
OSegNet (q=1)   M           K
OSegNet (q=2)   M           K
OSegNet (q=3)   M           K
OSegNet (q=4)   M           K
OSegNet (q=5)   M           K

Inception-v3 encoder:
UNet++          M           K
OSegNet (q=1)   M           K
OSegNet (q=2)   M           K
OSegNet (q=3)   M           K
OSegNet (q=4)   M           K
OSegNet (q=5)   M           K
Table 5: The number of trainable and non-trainable parameters of the models.

4 Conclusions

Computer-aided diagnosis plays a vital role in COVID-19 detection to prevent the further spread of the disease. As a major contribution, this study publicly shares the largest CXR dataset, QaTa-COV19, which consists of 9,258 COVID-19 samples with their corresponding ground-truth segmentation masks, along with control group CXRs. The experimental results over the QaTa-COV19 dataset show that the proposed OSegNet model has achieved the highest sensitivity level for COVID-19 segmentation and the highest precision for COVID-19 detection, while the network depth and complexity have been reduced.