Coronavirus disease 2019 (COVID-19) was first detected at the end of 2019 and has since spread rapidly across countries around the world. Computed Tomography (CT) images have been used as a crucial alternative to the time-consuming RT-PCR test. However, purely manual segmentation of CT images faces a serious challenge as the number of suspected cases increases, resulting in an urgent need for accurate and automatic segmentation of COVID-19 infections. Unfortunately, since the imaging characteristics of the COVID-19 infection are diverse and similar to the background, existing medical image segmentation methods cannot achieve satisfactory performance. In this work, we try to establish a new deep convolutional neural network (CNN) tailored for segmenting chest CT images with COVID-19 infections. We first build a large new chest CT image dataset consisting of 21,658 annotated chest CT images from 861 patients with confirmed COVID-19. Inspired by the observation that the boundary of the infected lung can be enhanced by adjusting the global intensity, we introduce in the proposed deep CNN a feature variation (FV) block, which adaptively adjusts the global properties of the features for segmenting COVID-19 infection. The proposed FV block can enhance the capability of feature representation effectively and adaptively for diverse cases. We fuse features at different scales by proposing Progressive Atrous Spatial Pyramid Pooling (PASPP) to handle the sophisticated infection areas with diverse appearances and shapes. We conducted experiments on data collected in China and Germany and show that the proposed deep CNN achieves impressive performance.
In December 2019, coronavirus disease 2019 (COVID-19), a new febrile respiratory tract illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was detected. The typical onset symptoms of COVID-19 patients are fever, cough, myalgia, and dyspnea. Despite the imposition of strict quarantine rules to limit its propagation, the COVID-19 infection has spread rapidly, affecting countries worldwide. At the end of January 2020, the World Health Organization (WHO) declared COVID-19 a Public Health Emergency of International Concern. As of 11 April 2020, the WHO reported 1,610,909 cases worldwide with 99,690 deaths. While infection rates are decreasing in China, the number of new infections is still growing exponentially in many other countries.
Reverse transcription polymerase chain reaction (RT-PCR) is one of the standard diagnostic methods to detect nucleotides from specimens obtained by oropharyngeal swab, nasopharyngeal swab, bronchoalveolar lavage, or tracheal aspirate. However, recent reports have indicated that the sensitivity of RT-PCR might not be high enough for detecting COVID-19 [2, 7], which can possibly be attributed to the quality and stability of the specimens and to insufficient viral material in them. On the other hand, since chest computed tomography (CT) images captured from COVID-19 patients frequently show bilateral patchy shadows or ground-glass opacity in the lung, CT has become a vital complementary tool for detecting lung abnormalities associated with COVID-19. Compared with the RT-PCR test, chest CT is relatively easy to operate and has a high sensitivity for screening COVID-19 infection. Therefore, CT could serve as a practical approach for early screening and diagnosis of COVID-19 in China. However, with the increase in confirmed and suspected cases of COVID-19, manual contouring of lung lesions has become a tedious and labor-intensive task. To speed up diagnosis and improve access to treatment, developing a fast automatic segmentation method for COVID-19 infection is critical for disease assessment.
Recently, with the rapid development of artificial intelligence [24, 23, 22, 21, 9, 11, 8], deep learning technology has been widely used in medical image processing due to its powerful feature representation. Several techniques based on deep learning have been published to detect COVID-19 pneumonia from CT images [18, 19, 12, 5]. Wang et al. developed a deep learning method that could extract the graphical features of COVID-19 in order to provide a clinical diagnosis ahead of the pathogenic test. Ayrton adopted the transfer learning technique with a ResNet50 backbone to detect COVID-19. Wang et al. introduced a tailored deep convolutional neural network design, called COVID-Net, for the detection of COVID-19 cases from chest radiography images. Gozes et al. presented a system that utilizes 2D and 3D deep learning models, modifying and adapting existing deep network models and combining them with clinical understanding. Tang et al. trained a random forest (RF) model to assess the severity (non-severe or severe) based on quantitative features. Shi et al. proposed an infection Size Aware Random Forest method (iSARF) for classification. Shan et al. developed a deep learning-based system for segmentation and quantification of infection regions from CT scans. In summary, some deep learning based methods have been proposed to detect COVID-19 and viral pneumonia in chest CT images. To our knowledge, however, only a few publications have investigated the segmentation task for COVID-19 chest CT images.
In this paper, we try to establish a new tailored deep convolutional neural network (CNN) for segmenting the chest CT images with COVID-19 infections. Fig. 1 shows the chest CT images with COVID-19 infection, which contain ground-glass opacities (GGOs), areas of consolidation, and a mix of both in all lung lobes. Most lesions were located peripherally, with a slight preponderance of dorsal lung areas. Due to the special structure and visual characteristics, the boundaries of COVID-19 infection regions are difficult to distinguish from the chest wall, making accurate segmentation for COVID-19 infection regions difficult. We observe that the boundaries of COVID-19 infection regions will be revealed by adjusting different parameters of window breadth and window locations in annotation processing, as shown in Fig. 1, which can be beneficial for the COVID-19 infection image segmentation.
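The effect of adjusting the display window described above can be illustrated with a minimal sketch. It assumes the usual center/width form of CT windowing; the function name `apply_window`, the sample Hounsfield values, and the two window settings are hypothetical choices made for illustration and are not taken from the annotation protocol of this work.

```python
def apply_window(hu_values, center, width):
    """Map Hounsfield units to [0, 1] using a display window (center, width)."""
    low = center - width / 2.0
    high = center + width / 2.0
    out = []
    for v in hu_values:
        clipped = min(max(v, low), high)        # clip to the window
        out.append((clipped - low) / (high - low))  # rescale to [0, 1]
    return out

# Hypothetical intensities along one scan line (in Hounsfield units).
row = [-1000, -700, -600, -100, 40]

# A wide window compresses the contrast between neighbouring values.
wide = apply_window(row, center=-600, width=1500)
# A narrower window around the same center stretches that contrast.
narrow = apply_window(row, center=-600, width=400)

print(wide[2] - wide[1], narrow[2] - narrow[1])
```

With the narrower window, the displayed difference between the two middle intensities is larger, which is the kind of boundary enhancement the annotators exploit.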
We propose a three-dimensional (3D) convolution based deep learning method, referred to as COVID-SegNet, for automatic segmentation of COVID-19 infection regions as well as the entire lung from chest CT images. The proposed method can be hugely beneficial for the early screening of patients with COVID-19. Inspired by the observation in annotation processing that the boundaries of COVID-19 infection regions are highlighted by adjusting the window breadth and window locations, we design a Feature Variation (FV) block to handle the confusing boundaries. The central idea of the FV block is to implicitly enhance the contrast and adjust the intensity at the feature level, automatically and adaptively for different images. Based on the features captured by previous layers, the FV block employs channel attention to obtain a global parameter that is used to generate new features. In addition to the channel attention, the FV block uses spatial attention to guide the feature extraction from inputs in the encoder. Aggregating these features can effectively enhance the capability of feature representation for the segmentation of COVID-19. Furthermore, we propose Progressive Atrous Spatial Pyramid Pooling (PASPP) to handle the challenging shape variations of COVID-19 infection areas. PASPP consists of a base convolution module followed by a cascade of atrous convolutional layers, and uses multistage parallel fusion branches to obtain the final features. Each atrous convolutional layer in PASPP uses atrous filters with its own dilation rate, so that the layers cover different receptive fields. By progressively aggregating the information from the atrous convolutional layers, the information from multiple scales is effectively fused, which further improves the performance of COVID-19 pneumonia segmentation.
The main contributions of the paper can be summarized as:
We propose a novel deep neural network (COVID-SegNet) for the segmentation of COVID-19 infection regions as well as the entire lung from chest CT images.
To address the key issue in the delineation of COVID-19 infection regions, namely the difficulty of distinguishing COVID-19 pneumonia from the lung, we propose a dedicated block called the Feature Variation (FV) block.
We introduce Progressive Atrous Spatial Pyramid Pooling (PASPP), which progressively aggregates information and obtains more effective contextual features.
To train the proposed networks, we build a large new dataset that consists of 21,658 chest CT images from 861 patients with confirmed COVID-19, all annotated by experts. Ten cases acquired in Germany are also used to test the robustness of the model.
This study was approved by the medical ethics committees of the participating hospitals. Further consent was waived with approval. In total, chest CT images of 861 patients with COVID-19 confirmed by RT-PCR are included in this study. These CT images were acquired at 5 Chinese hospitals (Beijing Tsinghua Changgung Hospital, Wuhan No.7 Hospital, Zhongnan Hospital of Wuhan University, Tianyou Hospital Affiliated to Wuhan University of Science & Technology, Wuhan’s Leishenshan Hospital) between January 2 and February 26, 2020. All imaging data were reconstructed using a medium sharp reconstruction algorithm with a slice thickness of 0.625-10 mm (81% under 2 mm). To protect privacy, we deleted the personally identifiable information (PII) from all CT scans. The CT images of 731 patients were randomly selected for training. The CT images of the remaining 130 patients were used as the testing set.
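A patient-level random split of this kind can be sketched as follows. This is only an illustration under stated assumptions: the identifiers, the seed, and the helper name `split_patients` are hypothetical, and the actual procedure used to draw the 731 training patients is not described beyond being random.

```python
import random

def split_patients(patient_ids, n_train, seed=0):
    """Shuffle patient identifiers and split them into training and test sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# Hypothetical identifiers standing in for the 861 patients.
patients = [f"patient_{i:03d}" for i in range(861)]
train_ids, test_ids = split_patients(patients, n_train=731)

print(len(train_ids), len(test_ids))
```

Splitting by patient rather than by image keeps all slices of one patient on the same side of the split, so test images never come from a patient seen during training.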
Although we collected sufficient COVID-19 chest CT images, accurately annotated labels are also indispensable. To enable the model to learn from accurate annotations, we built a team of six annotators with deep radiology backgrounds and proficient annotation skills to annotate the areas and boundaries of the lung and COVID-19 infection regions. The quality of the final annotations was assessed by a senior radiologist with frontline clinical experience of COVID-19.
In this section, we start with the overview of the proposed approach, then introduce the feature variation block and progressive atrous spatial pyramid pooling block. We briefly discuss the training strategy and implementation details in the end.
We present a unified high-accuracy network for the segmentation of COVID-19 infection from chest CT images. This network consists of two parts: an encoder and a decoder. As shown in Fig. 2, the encoder with 4 layers (i.e. E1, E2, E3, E4) obtains robust information via the feature extractors and PASPP. Each layer except E4 employs residual and FV blocks as the basic operations of its feature extractor. The residual block adds the input features to the result of two convolutional layers, which effectively alleviates the vanishing gradient problem. To preserve multi-scale contextual information and enlarge the receptive field, we use PASPP with different dilation rates on the final E4 layer. After obtaining the encoded features, the decoder tries to restore the features to the original input size, which compensates for the information loss induced by down-sampling in the encoder. The decoder has three layers (D3, D2, D1). Each decoder layer allows the network to gradually propagate the global contextual information to a higher resolution layer. After a sigmoid activation function, we obtain the final segmentation of COVID-19 infection regions. In addition, skip connections are adopted to concatenate the output features of the encoder with the input features of the decoder. The main contribution of this paper is that we improve the encoder by adding the FV block and the PASPP block to better capture effective features. An overview of these two blocks follows.
We design the architecture of the FV block around one key observation: the boundaries of COVID-19 infection regions are highlighted by adjusting the window breadth and window locations. As shown in Fig. 3, the proposed FV block includes three branches, namely the contrast enhancement branch, the position sensitive branch, and the identity branch, which can automatically change the parameters to display the boundaries and position of COVID-19 infections. Specifically, the contrast enhancement branch learns a global parameter via a channel attention unit to highlight useful boundary information. The position sensitive branch obtains a weight map via a spatial attention unit to focus on the COVID-19 regions. Finally, the FV block preserves more useful information by fusing these refined features.
The PASPP block takes the features extracted with the FV block as input and acquires semantic information with different receptive fields, as shown in Fig. 4. Although ASPP has been proposed to capture global information for semantic segmentation, we argue that aggregating information progressively is a more reasonable approach to obtain effective features. The PASPP block adopts atrous convolutions with different dilation rates to obtain features at various scales. The final output is generated by assembling the residual branches in parallel.
As mentioned before, the boundaries of COVID-19 infection regions are highlighted by adjusting the window breadth and window locations. In Fig. 3, the designed FV block, which includes a contrast enhancement branch, a position sensitive branch and an identity branch, tries to enhance the contrast of features and highlight the useful regions. Let $F$ denote the input feature and $F_{out}$ the output feature. The output feature is given as
$$F_{out} = F + \mathrm{Conv}\big([\mathrm{CE}(F),\ \mathrm{PS}(F)]\big), \tag{1}$$
where $\mathrm{Conv}$ denotes the convolutional layer, $[\cdot,\cdot]$ is the concatenation operation, $\mathrm{CE}$ represents the contrast enhancement branch, and $\mathrm{PS}$ is the position sensitive branch. The residual form of Eq. (1) implies that the information from the early blocks can quickly flow to the later blocks, and the gradient can be quickly back-propagated from the later blocks to the early blocks. The details of each sub-module are as follows.
The contrast enhancement branch computes a global parameter $g$ from the input feature $F$ as
$$g = \mathrm{Conv}\big(\mathrm{GAP}(F)\big),$$
where $\mathrm{Conv}$ denotes the convolutional layer and $\mathrm{GAP}$ represents global average pooling. The value of $g$ is restricted to a bounded range. We obtain a channel weight map $G$ from $g$ via expansion, so that the size of $G$ is consistent with that of $F$. Finally, the output of the contrast enhancement branch can be formulated as
$$F_{CE} = G \otimes F,$$
where $\otimes$ denotes element-wise multiplication. Note that, instead of calculating a separate weight for each channel of $F$, we generate one weight for all the features of $F$. This process corresponds exactly to adjusting the window breadth and window locations, so we expect it to generate enhanced features.
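The idea of a single global weight shared by all channels can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the sigmoid squashing, the per-channel weights, and the name `contrast_enhancement` are assumptions introduced here, and a feature is represented as a plain list of flattened channels.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def contrast_enhancement(feature, weights, bias=0.0):
    """feature: list of channels, each a flat list of voxel values.
    weights: one hypothetical learned weight per channel (assumption)."""
    # Global average pooling: one mean value per channel.
    pooled = [sum(ch) / len(ch) for ch in feature]
    # A pointwise convolution over the pooled vector reduces it to one scalar.
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    g = sigmoid(logit)  # assumed squashing to keep the parameter bounded
    # Expansion: the same scalar scales every channel and every voxel.
    enhanced = [[g * v for v in ch] for ch in feature]
    return g, enhanced

feature = [[1.0, 3.0], [2.0, 2.0]]
g, enhanced = contrast_enhancement(feature, weights=[0.5, -0.5])
print(g, enhanced)
```

Because one scalar multiplies the whole feature, the operation behaves like a global intensity adjustment rather than a per-channel reweighting, which is the distinction the text draws against ordinary channel attention.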
The goal of the position sensitive branch is to discard harmful information and highlight the helpful features, which are used for the segmentation of COVID-19 infection. This branch in Eq. (1) is a small network, whose architecture is displayed in Fig. 3. The attention map $A$ is calculated from the input feature $F$ by two convolutional layers, and the output of the branch is obtained by element-wise multiplication between $F$ and the attention map, $F_{PS} = A \otimes F$.
The values in $A$ are still restricted to a bounded range. The attention map has the same size as the input feature.
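A spatial attention of this kind can be sketched as below. The sketch is illustrative only: it uses two pointwise layers in place of the two convolutions, a ReLU between them, and a sigmoid at the end, all of which are assumptions, and the name `position_sensitive` and the parameter values are hypothetical.

```python
import math

def position_sensitive(feature, w1, b1, w2, b2):
    """feature: flat list of values for one channel.
    Two pointwise layers stand in for the two convolutional layers (assumption)."""
    attention, refined = [], []
    for v in feature:
        hidden = max(0.0, w1 * v + b1)                   # layer 1 followed by ReLU
        a = 1.0 / (1.0 + math.exp(-(w2 * hidden + b2)))  # layer 2 followed by sigmoid
        attention.append(a)                              # one weight per position
        refined.append(v * a)                            # element-wise multiplication
    return attention, refined

attention, refined = position_sensitive(
    [0.0, 1.0, 4.0], w1=1.0, b1=0.0, w2=2.0, b2=-2.0
)
print(attention, refined)
```

The attention map has one entry per input position, so it matches the size of the input feature, and positions with stronger responses are suppressed less.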
In this subsection, we start with preliminary knowledge of atrous spatial pyramid pooling, then introduce the proposed PASPP block.
Global information captured by a large receptive field is essential for medical semantic segmentation. To increase the receptive field size while decreasing the number of convolutional layers, atrous convolution was first proposed to obtain enough global information while keeping the size of the feature map unchanged. In the one-dimensional case, let $y[i]$ represent the output and $x[i]$ denote the input. Atrous convolution can then be formulated as follows:
$$y[i] = \sum_{k=1}^{K} x[i + r \cdot k]\, w[k],$$
where $K$ denotes the filter size, $r$ represents the dilation rate, and $w[k]$ is the $k$-th parameter of the filter. A larger dilation rate captures a larger receptive field. To produce different receptive fields, atrous spatial pyramid pooling (ASPP) uses atrous convolutions with different dilation rates to generate features at various scales. These features are concatenated together. The output is thus a sampling of the input that carries information from different scales.
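The one-dimensional atrous convolution can be made concrete with a short sketch. The function name `atrous_conv1d` and the sample signal are hypothetical; the filter index runs from 0 here, as is natural in Python, and outputs are computed only at positions where the dilated filter fits inside the input (no padding).

```python
def atrous_conv1d(x, w, rate):
    """y[i] = sum_k x[i + rate * k] * w[k], evaluated where the filter fits."""
    span = rate * (len(w) - 1)  # extent covered by the dilated filter
    return [
        sum(x[i + rate * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span)
    ]

x = [1, 2, 3, 4, 5, 6, 7]
w = [1, 0, -1]

y1 = atrous_conv1d(x, w, rate=1)  # ordinary convolution, covers 3 samples
y2 = atrous_conv1d(x, w, rate=2)  # same 3 weights, covers 5 samples
print(y1, y2)
```

With the same three weights, the larger dilation rate compares samples that are further apart, which is how the receptive field grows without adding parameters.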
In the COVID-19 segmentation task, the infection regions often have very different sizes (see Fig. 1). To cope with this, the features must cover different receptive fields. For this goal, we employ ASPP in our network and progressively fuse the features with different receptive fields. The structure of PASPP is illustrated in Fig. 4. Given the input feature of PASPP, we obtain four features by four convolutional layers in parallel. Note that, compared with the input feature, the number of channels decreases to a quarter after each convolutional layer (see the second column in Fig. 4). Each branch then feeds its feature into a separate atrous convolutional layer with its own dilation rate, which yields one output feature per branch.
The inputs of two adjacent atrous convolution branches are summed, and the sum is added to the output of each residual branch to form the input of the subsequent layer. To obtain effective features, the branch outputs are progressively aggregated based on adjacent features in parallel.
This aggregation produces two features: one tends to fuse the information with small receptive fields, while the other tends to capture features with larger receptive fields. The number of channels of each of these two features is half that of the input feature. Finally, all the information is assembled into the output feature of the PASPP block.
The dataset used in this study consists of 21,658 annotated chest CT images from 861 patients with confirmed COVID-19. The CT images of 731 patients are randomly selected for training. The CT images of the remaining 130 patients are used as the testing set.
The performance of the proposed method is evaluated by the Dice similarity coefficient, sensitivity, and precision. The Dice similarity coefficient (Dice) is a similarity metric between the ground truth and the predicted segmentation maps. It is calculated as follows:
$$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|},$$
where $A$ is the segmented infection region, $B$ denotes the corresponding reference region, and $|A \cap B|$ represents the number of pixels common to both regions. Sensitivity is the fraction of actual positives that are correctly identified. Precision is the fraction of true positives among the instances predicted as positive.
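The three metrics can be computed from true positive, false positive, and false negative counts, as in the sketch below. The function name `segmentation_metrics` and the toy masks are hypothetical, masks are given as flat binary lists, and the sketch assumes that neither mask is empty (otherwise the divisions are undefined).

```python
def segmentation_metrics(pred, ref):
    """pred, ref: flat binary masks of equal length (1 = infection, 0 = background)."""
    tp = sum(1 for p, r in zip(pred, ref) if p and r)       # common pixels
    fp = sum(1 for p, r in zip(pred, ref) if p and not r)   # predicted only
    fn = sum(1 for p, r in zip(pred, ref) if not p and r)   # reference only
    dice = 2 * tp / (2 * tp + fp + fn)  # equals 2|A∩B| / (|A| + |B|)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return dice, sensitivity, precision

pred = [1, 1, 1, 0, 0, 0]
ref = [1, 1, 0, 1, 0, 0]
dice, sensitivity, precision = segmentation_metrics(pred, ref)
print(dice, sensitivity, precision)
```

In this toy case the prediction and the reference each contain three positive pixels and share two of them, so all three metrics equal 2/3.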
The parameters of the network.
For the proposed framework, the encoding layers are residual blocks, FV blocks, PASPP blocks, and downsampling, while the decoding layers are residual blocks and deconvolution layers with a stride of 1/2. The last layer is a softmax activation function that produces the segmentation results. All layers use the same kernel size unless specified otherwise. Each convolutional layer is followed by batch normalization and ReLU. The channel numbers are doubled at each layer from 64 to 512 during encoding and halved from 512 to 64 during decoding. We use the combination of Dice loss and cross-entropy loss, computed against the ground-truth label map, as the final loss function.
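A combination of Dice loss and cross-entropy loss can be sketched as follows for a binary mask. This is an illustration under stated assumptions: the soft Dice formulation, the smoothing constants, the equal weighting (`weight=1.0`), and the function names are all choices made here, since the exact form of the combination is not recoverable from the text.

```python
import math

def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss on flat lists of predicted probabilities and binary labels."""
    inter = sum(p * t for p, t in zip(probs, target))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(target) + eps)

def cross_entropy_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clamped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def combined_loss(probs, target, weight=1.0):
    """Dice loss plus weighted cross-entropy (the weighting is an assumption)."""
    return dice_loss(probs, target) + weight * cross_entropy_loss(probs, target)

target = [1, 0, 1, 0]
good = combined_loss([1.0, 0.0, 1.0, 0.0], target)
bad = combined_loss([0.0, 1.0, 0.0, 1.0], target)
print(good, bad)
```

The Dice term measures region overlap and is less sensitive to the imbalance between small lesions and a large background, while the cross-entropy term provides a per-voxel signal.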
We implement our COVID-SegNet using PyTorch. We train all models from scratch with randomly initialized parameters. All models are trained on a server with six Nvidia TITAN RTX GPUs with 24 GB memory. We randomly crop patches as the training samples. For optimization, we use the Adam optimizer with a batch size of 2 and a decaying learning rate. The proposed network performs both the lung and the COVID-19 segmentation tasks.
We compare our COVID-SegNet against previous state-of-the-art methods on two datasets (the collected domestic test set and the Germany data). Specifically, we compare the proposed method with FCN, UNet, VNet and UNet++. Note that all methods employ 3D convolution in the framework. The same training dataset and settings are used for all methods.
COVID-19 segmentation task: Fig. 5 (a)-(e) illustrate the results of different methods; the red line denotes the ground-truth COVID-19 segmentation. Since the contrast between the COVID-19 infection and the lung is low in this case, the compared methods cannot obtain satisfactory results. The FCN method cannot obtain the whole edge of the COVID-19 infection. The results of UNet++ and VNet are often scattered and overlook the overall structure of the infection. The proposed method and UNet achieve better results; however, the UNet result shows flaws in the center of the lung (white points in (b)). Since the proposed method employs FV blocks, which adaptively enhance the global contrast of features, it can avoid the scattered artifacts. In addition, the PASPP blocks further improve the performance of our method. Fig. 5 (f) shows the 3D surface rendering of the COVID-19 infection regions segmented by our method.
Fig. 6 (a)-(e) display an example of low-contrast CT images, in which the COVID-19 infection regions are similar to the chest wall. Most of the methods can capture the large structures of the COVID-19 infection. However, the proposed method generates a more reasonable edge for the infection regions owing to the FV blocks. Fig. 7 shows a different case from a non-severe patient, in which the COVID-19 infection regions are still hard to distinguish from the chest wall. Thus, FCN, UNet++ and VNet generate unsatisfactory results. The proposed method, which combines global and local information effectively, obtains satisfactory segmentation results for COVID-19 infection.
Lung segmentation task: For the lung segmentation task, we test the performance of the proposed network on the test set. In Fig. 8, (a)-(b) display the results of different methods and (f) is the 3D surface rendering of our method. From Fig. 8, we can easily observe that all results are close to the manual annotations. The UNet++ method often misses the boundary of the lung, and the VNet method cannot generate a smooth margin for the lung segmentation.
To verify the generalization ability of all methods, we use ten cases of data captured from Brainlab Co. Ltd. in Germany to test the segmentation of COVID-19 infection and the lung.
COVID-19 segmentation task: Fig. 9 shows the comparisons on the chest CT images of the Germany data. The intensity of the COVID-19 infection regions is very similar to that of the lung, which makes this a very challenging example. As displayed in Fig. 9, all compared state-of-the-art methods (i.e. FCN, UNet, UNet++, VNet) generate under-segmentation and over-segmentation. In contrast, the proposed method obtains results that closely resemble the manual annotation (see Fig. 9 (e)). The 3D surface rendering of the proposed method is shown in Fig. 9 (f), from which we can see that the small COVID-19 infection regions can also be segmented.
Lung segmentation task: The segmentation results of all methods on the Germany data are shown in Fig. 10. Most methods can generate a distinct outline of the lung. However, in the regions marked with the red arrow, our method shows a stronger segmentation ability than the other state-of-the-art methods. These results demonstrate the effectiveness of the FV and PASPP blocks.
| Blocks | COVID-19 Lesion | Lung Segmentation |
Based on the ground truths manually contoured by the radiology experts, we conduct the evaluations and comparisons to evaluate the accuracy of segmentation quantitatively. The results are reported in Table I, which includes lung segmentation and COVID-19 infection segmentation.
For the segmentation of COVID-19, as shown in Table I, the proposed method achieves the best results in all metrics. Thanks to the FV and PASPP blocks, COVID-SegNet can effectively segment COVID-19 infection regions and improves the segmentation performance over UNet by 3.8% in terms of Dice. All these metrics demonstrate the effectiveness of our model.
For the lung segmentation task, the average Dice similarity coefficient is 0.987. The average sensitivity and precision are 0.986 and 0.990, respectively. Although existing methods already perform well on this task and further improvement is hard to achieve, the proposed COVID-SegNet still surpasses state-of-the-art methods in terms of precision. We attribute these results to the proposed FV and PASPP blocks.
As shown in Table II, the baseline model is a UNet structure with 4 layers in the encoder. We evaluate the contrast enhancement branch (CEB), the position sensitive branch (PSB), and the FV block, respectively. In addition, we replace the CEB with the original channel attention block (CAB, i.e. the CEB with the global parameter removed) to verify the function of global contrast enhancement. To verify the PASPP block, we use ASPP and ResASPP, which removes the concatenation in PASPP, to demonstrate the advantage of progressively fusing features.
The quality of the FV block, which combines contrast, global and position information, is critical for accurate COVID-19 segmentation. In this section, we first evaluate the performance of the contrast enhancement branch (CEB) on both lung and COVID-19 segmentation. Then, we study the function of the position sensitive branch (PSB). All the comparisons are performed on both tasks (lung and COVID-19 segmentation). The results in Table II demonstrate the effectiveness of the FV blocks.
Context information is of great significance for segmenting the confusing boundary and position of COVID-19 infection regions. To verify the performance of the CEB, we employ the original channel attention block (CAB) to replace the CEB and PSB in the FV block. From Table II, we can see that the ASPP improves the segmentation performance over the UNet4. The reason is that the features contain redundant information. However, the performance is further improved when we replace the CAB with the CEB. Whereas the CAB merely learns a weight for each channel, the CEB uses global information to guide feature enhancement, which demonstrates the benefit of the CEB.
The PSB is essentially a spatial attention module, which has proved effective in many tasks. This branch focuses on the positions of features that are helpful for detecting and segmenting COVID-19 infection regions. As expected, the network with the PSB generates satisfactory numerical results. Combining these two branches in parallel, we obtain the FV block, which uses both global (CEB) and local (PSB) information to improve the segmentation task.
PASPP consists of multiple atrous convolutional layers with different dilation rates and progressive concatenations. In this part, we conduct experiments to study quantitatively how different settings of PASPP influence the performance. We compare the PASPP block with the original ASPP and a modified ResASPP (with the progressive concatenations removed). The results are reported in Table II, from which we draw several conclusions. First, the progressive fusing strategy is very effective for COVID-19 segmentation. We believe the reason is that features at different scales should not be fused all at once for the sophisticated COVID-19 segmentation. With progressive fusing, the adjacent information can better supplement the missing details. Second, comparing ASPP with ResASPP, since ResASPP includes residual learning, it obtains reasonably high performance. This implies that the information from the early blocks can quickly flow to the output of the atrous convolutional layers, and the gradient can be quickly back-propagated from the atrous convolutional layers to the early blocks. Third, the ASPP significantly improves the segmentation performance over the UNet4.
In general, to extract compact features and obtain semantic information from COVID-19 CT images, we insert FV blocks into the encoder and employ PASPP to enlarge the receptive fields. As reported in Table II, the proposed network achieves the best performance on both lung segmentation and COVID-19 segmentation.
In this paper, we designed and evaluated a three-dimensional deep learning model, called COVID-SegNet, for segmenting the lung and COVID-19 infection from chest CT images. Inspired by contrast enhancement methods and ASPP, the proposed network includes feature variation and progressive ASPP blocks, which help to highlight the boundary and position of COVID-19 infections. The results demonstrate that convolutional network based deep learning technology is able to segment COVID-19 infection from CT images. We collected a large number of CT images from 5 hospitals, covering 861 patients with confirmed COVID-19. More importantly, these data were manually annotated by senior annotators. These contributions show the promise of improving diagnosis and treatment for COVID-19. In the future, we will extend the number of CT images from patients through multi-center collaborations.