I Introduction
Colorectal cancer (CRC) is the second leading cause of cancer-related death and the third most common cancer worldwide [26]. The five-year survival rate is 90% for the 39% of patients diagnosed with localized-stage disease, but it declines to 71% and 14% for regional- and distant-stage disease, respectively [25]. Colonoscopy is considered the primary technique for colon cancer screening because it allows both detection and removal of polyps in a single procedure. The U.S. Preventive Services Task Force now recommends that CRC screening begin at age forty-five rather than fifty [8]. Colonoscopy can reduce mortality through early detection of cancer at a treatable stage and removal of precancerous adenomas [31, 3].
During colonoscopy, the average polyp miss rate is around 22-28% [21]. This is mainly because colonoscopy is an operator-dependent procedure, and high inter-observer variation is seen in endoscopists' skill at detecting polyps [12]. During routine colonoscopy, flat and small polyps are the most frequently missed [11, 24, 29]. Studies have shown that a 1% increase in the adenoma detection rate leads to a 3% decrease in the risk of interval colon cancer [7]. Therefore, it is highly critical to decrease the polyp miss rate via automated systems for CRC screening.
A computer-aided diagnosis (CADx) system can highlight suspicious frames and improve colonoscopy procedures. Jha et al. [16] proposed DoubleU-Net, which uses two U-Nets, where the output of the first U-Net acts as soft attention for the second. The network uses VGG-19 as an encoder together with efficient blocks such as squeeze-and-excitation [14] and atrous spatial pyramid pooling [5] to capture semantically meaningful information. DoubleU-Net showed state-of-the-art (SOTA) results on several biomedical image segmentation datasets. Wu et al. [30] proposed a lightweight context-aware network, PolypSeg+, for real-time polyp segmentation; the architecture captures distinguishable polyp features with fewer trainable parameters while retaining real-time speed. Tomar et al. [28] proposed a feedback attention network (FANet) for improved biomedical image segmentation and showed SOTA performance on seven publicly available benchmark datasets. FANet unifies the mask of the previous epoch with the current training epoch and iteratively rectifies predictions at test time for improved performance. Ji et al. [19] proposed a progressively normalized self-attention network (PNS-Net) for video polyp segmentation. Shen et al. [23] proposed a hard region enhancement network (HRENet) for automatic polyp segmentation.
Despite the several automated methods proposed to improve the accuracy of polyp segmentation, further investigation is required to demonstrate the generalizability of existing methods. Currently, most algorithms are trained and tested on the same dataset [19, 18, 33, 28, 9]. Therefore, we aim to develop a novel deep learning algorithm that works well on datasets with varying distributions collected at different institutions across different countries. To this end, we introduce a multiple kernel dilated convolution network (MKDCNet) architecture and test its performance on four still-image polyp datasets and one cell nuclei dataset.
The main contributions of this work can be summarized as follows:
- We propose a novel deep learning architecture, MKDCNet, that utilizes a novel multiple kernel dilated convolution block to increase the field of view of the convolution kernels and capture both local and global features. A multiscale feature fusion block fuses the outputs of the different decoder blocks into a more robust feature representation that helps in accurate polyp segmentation.
- Our proposed method shows SOTA results on four publicly available polyp datasets (same train-test distribution) and a nuclei segmentation dataset. Similarly, the proposed method outperforms other benchmark methods on three cross-center polyp test sets. Extensive experimental results show the strong learning and generalization ability of MKDCNet.
II Method
The proposed MKDCNet architecture is shown in Figure 1. The architecture begins with a pre-trained ResNet50 [10] encoder, from which we extract four different feature maps. Each of these feature maps is then passed through a sequence of a convolution layer, batch normalization, and a ReLU activation function. The output of the ReLU activation is then passed through our novel Multiple Kernel Dilated Convolution (MKDC) block, which consists of multiple parallel convolution layers with different kernel sizes and dilation rates. Next come three decoder blocks; the outputs of all three decoder blocks are passed through a Multiscale Feature Fusion (MSFF) block, where we upsample and fuse the feature maps to produce a more robust semantic representation. Finally, this feature map is passed through a convolution layer followed by a sigmoid activation function to generate a binary segmentation mask.
II-A Multiple kernel dilated convolution (MKDC) block
The MKDC block begins with four parallel convolution layers with progressively increasing kernel sizes. The progressive increase in kernel size helps capture a broad range of features, allowing the network to learn a more robust representation. Each convolution layer is followed by batch normalization and a ReLU activation function. Next, these feature maps are concatenated and passed through four parallel convolution layers, each with a different dilation rate. The use of different dilated convolutions further expands the field of view and allows the network to capture more detail and refine the significant features. In this sense, the MKDC block is similar to multi-resolution strategies, but in our case we capture rich details with convolutional kernels instead of using multiple parallel architectures or iterative and simultaneous connections across resolutions. Each of these convolution layers is also followed by batch normalization and a ReLU activation function. After that, we concatenate these features and feed them to a convolution layer followed by a residual connection. Finally, the generated feature maps are passed through a channel and spatial attention mechanism, which further highlights the significant features.
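To make the structure of the block concrete, the following is a minimal PyTorch sketch of an MKDC-style block. The exact kernel sizes, dilation rates, channel widths, and the precise form of the channel/spatial attention are not recoverable from the text above, so the values used here (kernel sizes 1, 3, 7, 11; dilation rates 1, 3, 7, 11; a squeeze-and-excitation-style channel gate) are illustrative assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class ConvBNReLU(nn.Module):
    """Convolution -> BatchNorm -> ReLU helper used throughout this sketch."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        padding = dilation * (kernel_size - 1) // 2  # preserve spatial size
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class MKDCBlock(nn.Module):
    """Multiple Kernel Dilated Convolution block (illustrative sketch)."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(1, 3, 7, 11), dilations=(1, 3, 7, 11)):
        super().__init__()
        # Stage 1: parallel convolutions with progressively larger kernels.
        self.multi_kernel = nn.ModuleList(
            [ConvBNReLU(in_ch, out_ch, kernel_size=k) for k in kernel_sizes]
        )
        # Stage 2: parallel dilated convolutions on the concatenated features.
        cat_ch = out_ch * len(kernel_sizes)
        self.multi_dilation = nn.ModuleList(
            [ConvBNReLU(cat_ch, out_ch, kernel_size=3, dilation=d) for d in dilations]
        )
        # Fuse the concatenated dilated features and add a residual path.
        self.fuse = ConvBNReLU(out_ch * len(dilations), out_ch, kernel_size=1)
        self.residual = ConvBNReLU(in_ch, out_ch, kernel_size=1)
        # Lightweight channel and spatial attention (assumed form).
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 4, out_ch, 1), nn.Sigmoid(),
        )
        self.spatial_att = nn.Sequential(nn.Conv2d(out_ch, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        k = torch.cat([conv(x) for conv in self.multi_kernel], dim=1)
        d = torch.cat([conv(k) for conv in self.multi_dilation], dim=1)
        y = self.fuse(d) + self.residual(x)
        y = y * self.channel_att(y)  # channel attention
        y = y * self.spatial_att(y)  # spatial attention
        return y
```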
II-B Decoder block
The decoder block begins with a bilinear upsampling that increases the spatial dimensions (height and width) of the input feature map by a factor of two. The upsampled feature map is then concatenated with the output of another MKDC block, which brings more semantic information into the decoder and enriches its feature representation. Next, we have two residual blocks, where each residual block consists of a convolutional block and an identity mapping connecting the input and output of the convolutional block. The convolutional block consists of two convolution layers, each followed by batch normalization and a ReLU activation function.
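As a hedged illustration of this description, the sketch below shows one possible decoder block. It reuses the `ConvBNReLU` helper defined in the MKDC sketch above; the channel sizes and the 1x1 shortcut projection inside the residual block are assumptions, not necessarily the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two Conv-BN-ReLU layers with a projection shortcut (assumed 1x1)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            ConvBNReLU(in_ch, out_ch, kernel_size=3),
            ConvBNReLU(out_ch, out_ch, kernel_size=3),
        )
        self.shortcut = ConvBNReLU(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.conv(x) + self.shortcut(x)

class DecoderBlock(nn.Module):
    """Upsample -> concatenate with an MKDC-refined skip -> two residual blocks."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.res1 = ResidualBlock(in_ch + skip_ch, out_ch)
        self.res2 = ResidualBlock(out_ch, out_ch)

    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=True)
        x = torch.cat([x, skip], dim=1)  # fuse with the MKDC output from the encoder side
        return self.res2(self.res1(x))
```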
II-C Multiscale feature fusion (MSFF) block
We use the proposed MSFF block to enhance features at different scales by aggregating them into a more robust feature representation. The MSFF block takes the output of the first decoder block and passes it through a bilinear upsampling layer to increase its spatial dimensions by a factor of two, followed by a convolution layer, batch normalization, and a ReLU activation function. The output of the ReLU activation is then concatenated with the output of the second decoder block. Next, the concatenated feature map is again upsampled by a factor of two with bilinear upsampling and passed through a convolution layer, batch normalization, and a ReLU activation function, and the result is concatenated with the output of the third decoder block. After this, the feature map is upsampled once more and passed through a convolution layer, batch normalization, and a ReLU activation function. Finally, the feature map is passed through a channel and spatial attention mechanism that focuses on significant features, improving the feature representation and its robustness.
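The following sketch mirrors the fusion order described above. It reuses the `ConvBNReLU` helper from the MKDC sketch, assumes each successive decoder output has twice the spatial resolution of the previous one, and uses placeholder channel widths and the same assumed attention form as before.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSFFBlock(nn.Module):
    """Multiscale Feature Fusion: progressively upsample and merge decoder outputs."""
    def __init__(self, d1_ch, d2_ch, d3_ch, out_ch):
        super().__init__()
        self.conv1 = ConvBNReLU(d1_ch, out_ch, kernel_size=3)
        self.conv2 = ConvBNReLU(out_ch + d2_ch, out_ch, kernel_size=3)
        self.conv3 = ConvBNReLU(out_ch + d3_ch, out_ch, kernel_size=3)
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 4, out_ch, 1), nn.Sigmoid(),
        )
        self.spatial_att = nn.Sequential(nn.Conv2d(out_ch, 1, 7, padding=3), nn.Sigmoid())

    @staticmethod
    def _up(x):
        return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=True)

    def forward(self, d1, d2, d3):
        x = self.conv1(self._up(d1))                           # upsample first decoder output
        x = self.conv2(self._up(torch.cat([x, d2], dim=1)))    # fuse with second decoder output
        x = self.conv3(self._up(torch.cat([x, d3], dim=1)))    # fuse with third decoder output
        x = x * self.channel_att(x)                            # channel attention
        x = x * self.spatial_att(x)                            # spatial attention
        return x
```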
Dataset | Images | Size | Application |
Kvasir-SEG [17] | 1000 | Variable | Colonoscopy |
BKAI-IGH [20] | 1000 | 1280x959 | Colonoscopy |
CVC-ClinicDB [1] | 612 | 384x288 | Colonoscopy |
MedAI Challenge test set [13] | 200 | Variable | Colonoscopy |
2018 Data Science Bowl [4] | 670 | Variable | Nuclei |
Method | DSC | mIoU | Rec. | Prec. | Acc. | F2 | FPS |
Dataset: Kvasir-SEG [17] | |||||||
U-Net[22] | 0.8264 | 0.7472 | 0.8504 | 0.8703 | 0.9510 | 0.8353 | 156.83 |
ResU-Net[32] | 0.7642 | 0.6634 | 0.8025 | 0.8200 | 0.9341 | 0.7740 | 196.85 |
U-Net++ [33] | 0.8228 | 0.7419 | 0.8437 | 0.8607 | 0.9491 | 0.8295 | 126.14 |
ResU-Net++ [18] | 0.6453 | 0.5341 | 0.6964 | 0.7080 | 0.9044 | 0.6575 | 57.99 |
HarDNet-MSEG [15] | 0.8260 | 0.7459 | 0.8485 | 0.8652 | 0.9492 | 0.8358 | 42.00 |
DeepLabV3+ (ResNet50) [6] | 0.8837 | 0.8173 | 0.9014 | 0.9028 | 0.9679 | 0.8904 | 102.62 |
DDANet [27] | 0.7415 | 0.6448 | 0.7953 | 0.7670 | 0.9326 | 0.7640 | 88.70 |
MKDCNet (Ours) | 0.8887 | 0.8267 | 0.9076 | 0.9088 | 0.9677 | 0.8954 | 47.54 |
Dataset: BKAI-IGH [20] | |||||||
U-Net [22] | 0.8286 | 0.7599 | 0.8295 | 0.8999 | 0.9903 | 0.8264 | 160.27 |
ResU-Net[32] | 0.7433 | 0.6580 | 0.7447 | 0.8711 | 0.9843 | 0.7387 | 128.93 |
U-Net++ [33] | 0.8275 | 0.7563 | 0.8388 | 0.8942 | 0.9895 | 0.8308 | 123.45 |
ResU-Net++ [18] | 0.7130 | 0.6280 | 0.7240 | 0.8578 | 0.9832 | 0.7132 | 55.86 |
HarDNet-MSEG [15] | 0.7627 | 0.6734 | 0.7532 | 0.8344 | 0.9863 | 0.7528 | 41.20 |
DeepLabV3+ (ResNet50) [6] | 0.8937 | 0.8314 | 0.8870 | 0.9333 | 0.9937 | 0.8882 | 99.16 |
DDANet [27] | 0.7269 | 0.6507 | 0.7454 | 0.7575 | 0.9851 | 0.7335 | 86.46 |
MKDCNet (Ours) | 0.8978 | 0.8392 | 0.8955 | 0.9365 | 0.9934 | 0.8947 | 45.98 |
Dataset: 2018 Data Science Bowl [4] | |||||||
U-Net [22] | 0.9122 | 0.8476 | 0.9021 | 0.9339 | 0.9799 | 0.9052 | 160.53 |
ResU-Net [32] | 0.9183 | 0.8546 | 0.9236 | 0.9198 | 0.9809 | 0.9207 | 188.74 |
U-Net++ [33] | 0.9114 | 0.8479 | 0.9107 | 0.9269 | 0.9799 | 0.9101 | 119.45 |
ResU-Net++ [18] | 0.9157 | 0.8508 | 0.9162 | 0.9211 | 0.9798 | 0.9153 | 55.91 |
HarDNet-MSEG [15] | 0.8344 | 0.7327 | 0.8686 | 0.8251 | 0.9640 | 0.8538 | 40.53 |
DeepLabV3+ (ResNet50) [6] | 0.9027 | 0.8306 | 0.9220 | 0.8902 | 0.9774 | 0.9134 | 98.53 |
DDANet [27] | 0.9117 | 0.8452 | 0.8452 | 0.9297 | 0.9792 | 0.9053 | 90.33 |
MKDCNet (Ours) | 0.9204 | 0.8586 | 0.9270 | 0.9194 | 0.9815 | 0.9237 | 46.56 |
III Experimental setup
In this section, we will present the datasets, evaluation metrics, and implementation details used in this study.
Method | DSC | mIoU | Rec. | Prec. | Acc. | F2 | FPS |
Train Dataset: Kvasir-SEG[17], Test Data: Unseen CVC-ClinicDB [1] | |||||||
U-Net [22] | 0.6336 | 0.5433 | 0.6982 | 0.7891 | 0.9484 | 0.6563 | 166.05 |
ResU-Net [32] | 0.5970 | 0.4967 | 0.6210 | 0.8005 | 0.9465 | 0.5991 | 195.38 |
U-Net++ [33] | 0.6350 | 0.5475 | 0.6933 | 0.7967 | 0.9504 | 0.6556 | 127.80 |
ResU-Net++ [18] | 0.4642 | 0.3585 | 0.5880 | 0.5770 | 0.9159 | 0.5084 | 57.96 |
HarDNet-MSEG [15] | 0.6960 | 0.6058 | 0.7173 | 0.8528 | 0.9592 | 0.7010 | 42.38 |
DeepLabV3+ (ResNet50) [6] | 0.8142 | 0.7388 | 0.8331 | 0.8735 | 0.9717 | 0.8198 | 103.17 |
DDANet[27] | 0.5234 | 0.4183 | 0.6502 | 0.5935 | 0.9275 | 0.5718 | 91.32 |
MKDCNet (Ours) | 0.8243 | 0.7466 | 0.8494 | 0.8637 | 0.9709 | 0.8325 | 46.71 |
Train Dataset: Kvasir-SEG[17], Test Data: Unseen BKAI-IGH [20] | |||||||
U-Net [22] | 0.6347 | 0.5686 | 0.6986 | 0.7882 | 0.9753 | 0.6591 | 162.60 |
ResU-Net [32] | 0.5836 | 0.4931 | 0.6716 | 0.6549 | 0.9671 | 0.6177 | 199.02 |
U-Net++ [33] | 0.6269 | 0.5592 | 0.6900 | 0.7968 | 0.9741 | 0.6493 | 128.59 |
ResU-Net++ [18] | 0.4166 | 0.3204 | 0.6979 | 0.3922 | 0.9061 | 0.5019 | 57.22 |
HarDNet-MSEG [15] | 0.6502 | 0.5711 | 0.7420 | 0.7469 | 0.9713 | 0.6830 | 42.44 |
DeepLabV3+ (ResNet50) [6] | 0.7286 | 0.6589 | 0.7919 | 0.8123 | 0.9787 | 0.7493 | 103.25 |
DDANet[27] | 0.5006 | 0.4115 | 0.6612 | 0.4825 | 0.9507 | 0.5592 | 91.73 |
MKDCNet (Ours) | 0.7483 | 0.6782 | 0.8087 | 0.8155 | 0.9756 | 0.7651 | 42.741 |
Train Dataset: Kvasir-SEG[17], Test Data: MedAI Challenge test data (polyp) [13] | |||||||
U-Net [22] | 0.6716 | 0.5725 | 0.7462 | 0.7438 | 0.9279 | 0.6957 | 159.90 |
ResU-Net [32] | 0.6165 | 0.4991 | 0.6726 | 0.6977 | 0.9139 | 0.6315 | 192.90 |
U-Net++ [33] | 0.6638 | 0.5702 | 0.7258 | 0.7594 | 0.9333 | 0.6845 | 128.64 |
ResU-Net++ [18] | 0.4306 | 0.3246 | 0.5865 | 0.4677 | 0.8629 | 0.4793 | 60.20 |
HarDNet-MSEG [15] | 0.6821 | 0.5877 | 0.756 | 0.7689 | 0.9271 | 0.7006 | 43.91 |
DeepLabV3+ (ResNet50) [6] | 0.7784 | 0.6875 | 0.8332 | 0.8054 | 0.9544 | 0.7989 | 106.77 |
DDANet [27] | 0.5738 | 0.4643 | 0.6638 | 0.6131 | 0.9141 | 0.6058 | 90.22 |
MKDCNet (Ours) | 0.7961 | 0.7054 | 0.8397 | 0.8151 | 0.9532 | 0.8103 | 46.59 |
Train Dataset: BKAI-IGH [20], Test Data: MedAI Challenge test data (polyp) [13] | |||||||
U-Net [22] | 0.5840 | 0.4837 | 0.5925 | 0.8147 | 0.9155 | 0.5726 | 166.94 |
ResU-Net [32] | 0.4620 | 0.3605 | 0.4822 | 0.6989 | 0.8930 | 0.4525 | 196.09 |
U-Net++ [33] | 0.5554 | 0.4530 | 0.6037 | 0.7475 | 0.8941 | 0.5591 | 126.01 |
ResU-Net++ [18] | 0.3288 | 0.2419 | 0.3560 | 0.4779 | 0.8666 | 0.3313 | 59.25 |
HarDNet-MSEG [15] | 0.4466 | 0.3550 | 0.4204 | 0.7427 | 0.9017 | 0.4210 | 42.84 |
DeepLabV3+ (ResNet50) [6] | 0.6541 | 0.5675 | 0.6711 | 0.8535 | 0.9284 | 0.6552 | 100.86 |
DDANet [27] | 0.5322 | 0.4281 | 0.5764 | 0.6547 | 0.8952 | 0.5351 | 91.02 |
MKDCNet (Ours) | 0.6985 | 0.6078 | 0.7210 | 0.8360 | 0.9366 | 0.7009 | 48.05 |

III-A Datasets and evaluation
For this study, we selected four publicly available polyp datasets and a nuclei segmentation dataset. Details on the number of images, their size, and their application can be found in Table I. We utilize the Kvasir-SEG [17], BKAI-IGH [20], CVC-ClinicDB [1], and MedAI challenge test set [13] datasets for the polyp segmentation task. For the cell nuclei segmentation task, we use the 2018 Data Science Bowl [4] dataset. To evaluate the performance of all models, we use metrics such as dice coefficient (DSC), mean intersection over union (mIoU), precision, recall, accuracy, F2-score, and frames per second (FPS).
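For clarity on how the two main overlap metrics can be computed from binary masks, here is a small hedged sketch; the 0.5 threshold and the smoothing constant are illustrative choices, not necessarily the exact evaluation settings used in the paper.

```python
import torch

def dice_coefficient(pred, target, threshold=0.5, eps=1e-7):
    """DSC = 2|A ∩ B| / (|A| + |B|) on thresholded binary masks."""
    pred = (pred > threshold).float()
    target = (target > threshold).float()
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def iou_score(pred, target, threshold=0.5, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B|; mIoU averages this over the test set."""
    pred = (pred > threshold).float()
    target = (target > threshold).float()
    intersection = (pred * target).sum()
    union = pred.sum() + target.sum() - intersection
    return (intersection + eps) / (union + eps)

# Example usage on a dummy prediction/ground-truth pair.
if __name__ == "__main__":
    pred = torch.rand(1, 1, 256, 256)                   # model output after sigmoid
    mask = (torch.rand(1, 1, 256, 256) > 0.5).float()   # dummy ground-truth mask
    print(dice_coefficient(pred, mask).item(), iou_score(pred, mask).item())
```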
III-B Implementation details
We implemented the proposed MKDCNet and the SOTA methods using the PyTorch framework. For a fair comparison, we use the same set of hyperparameters for all models in this study. All models are trained on an NVIDIA RTX 3090 GPU, where both the images and masks are first resized to a fixed resolution for better utilization of the GPU. The datasets are split into training, validation, and testing sets in the ratio of 80:10:10, except for Kvasir-SEG, where a split of 880/120 images is used for training and testing, respectively. An online data augmentation strategy is applied to the training set, including random rotation, horizontal flipping, vertical flipping, and coarse dropout; the augmentation helps to increase the robustness of the model. All models are trained with the Adam optimizer and a batch size of 16. A combination of binary cross-entropy loss and dice loss is used (a hedged sketch of this combined loss follows the ablation table, Table IV, below). ReduceLROnPlateau is used during training to reduce the learning rate for better performance, and an early stopping criterion stops training when the model stops improving.

No. | Method | DSC | mIoU | Recall | Precision |
#1 | MKDCNet w/o Multiple Kernel Dilated Convolution | 0.8763 | 0.8138 | 0.8997 | 0.9071 |
#2 | MKDCNet w/o Multiscale Feature Fusion | 0.8720 | 0.8045 | 0.8974 | 0.8931 |
#3 | MKDCNet w/o Multiple Kernel Dilated Convolution & Multiscale Feature Fusion | 0.8785 | 0.8073 | 0.9003 | 0.8953 |
#4 | MKDCNet | 0.8887 | 0.8267 | 0.9076 | 0.9088 |
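As referenced in the implementation details above, the following is a minimal sketch of a combined binary cross-entropy and dice loss. The equal weighting of the two terms and the smoothing constant are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    """Combined binary cross-entropy + dice loss for binary segmentation (sketch)."""
    def __init__(self, smooth=1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.smooth = smooth

    def forward(self, logits, targets):
        # BCE term computed on raw logits for numerical stability.
        bce_loss = self.bce(logits, targets)
        # Dice term computed on sigmoid probabilities, per sample.
        probs = torch.sigmoid(logits).flatten(1)
        targets_flat = targets.flatten(1)
        intersection = (probs * targets_flat).sum(dim=1)
        dice = (2.0 * intersection + self.smooth) / (
            probs.sum(dim=1) + targets_flat.sum(dim=1) + self.smooth
        )
        dice_loss = 1.0 - dice.mean()
        # Equal weighting of the two terms is an assumption for this sketch.
        return bce_loss + dice_loss

# Example: compute the loss on dummy logits and masks.
if __name__ == "__main__":
    logits = torch.randn(4, 1, 256, 256)
    masks = (torch.rand(4, 1, 256, 256) > 0.5).float()
    print(BCEDiceLoss()(logits, masks).item())
```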
IV Results
First, we validate the algorithms on the same datasets (same distribution). Next, we test the trained models on completely unseen polyp datasets from different medical centers (different distribution).
IV-A Performance test on the same dataset
Table II shows the results of MKDCNet and the SOTA methods. On the Kvasir-SEG dataset, MKDCNet achieves a dice coefficient (DSC) of 0.8887 and a mean intersection over union (mIoU) of 0.8267, outperforming the most competitive benchmark, DeepLabv3+ with a ResNet50 encoder, by a margin of 0.5% in DSC and 0.94% in mIoU. Similarly, MKDCNet has higher recall, precision, and F2-score and nearly equal accuracy. Both DeepLabv3+ and MKDCNet run at real-time speed. On BKAI-IGH [20], our method outperforms DeepLabv3+ by a margin of 0.41% in DSC and 0.78% in mIoU. Additionally, we perform experiments on the 2018 Data Science Bowl [4] dataset, where our method consistently outperforms all other baseline methods. Figure 2 shows examples of qualitative results along with heatmaps. The qualitative results show that MKDCNet produces better segmentations than U-Net [22] and DeepLabv3+ [6].
IV-B Performance test on completely unseen datasets
Table III shows the results on the unseen datasets. On the unseen CVC-ClinicDB [1], MKDCNet outperforms DeepLabv3+ by 1.01% in DSC and 0.78% in mIoU, showing the generalization capability of our proposed method. Similarly, on the unseen BKAI-IGH dataset [20], our method outperforms the best-performing DeepLabv3+ by 1.97% in DSC and 1.93% in mIoU. For the MedAI challenge test dataset, we only evaluate performance on the 200 positive polyp images provided by the task organizers. The models trained on Kvasir-SEG show better performance on the MedAI challenge dataset and slightly weaker performance on the BKAI dataset. This is because the BKAI-IGH dataset was captured at a different hospital (Institute of Gastroenterology and Hepatology (IGH), Vietnam), whereas the MedAI challenge dataset comes from HyperKvasir [2], whose distribution is similar to Kvasir-SEG (both were captured at Vestre Viken Hospital Trust, Norway), even though the image frames themselves differ. For the models trained on Kvasir-SEG and BKAI-IGH, the proposed MKDCNet outperforms DeepLabv3+ on the MedAI test data by 1.77% and 4.4% in DSC, respectively.
IV-C Ablation study
In Table IV, we present the ablation study on the Kvasir-SEG dataset. Comparing setting #3 with setting #4, adding the multiple kernel dilated convolution and multiscale feature fusion blocks to the network yields a 1.02% improvement in DSC and a 1.94% improvement in mIoU. Similarly, Table IV also shows improvements over each of the individual blocks.
V Conclusion
We presented a novel architecture, MKDCNet, that utilizes ResNet50 as an encoder and a novel multiple kernel dilated convolution block to learn more robust representations and automatically segment polyps from colonoscopy images with high performance. Extensive experiments on four publicly available datasets, both on the same train-test distribution and on completely unseen datasets, showed that MKDCNet has promising capability to improve segmentation accuracy. MKDCNet also obtained a real-time processing speed of nearly 45 frames per second. The results show that MKDCNet has strong generalizability and real-time speed; thus, it can serve as a strong new baseline for developing artificial-intelligence-based support to improve the traditional colonoscopy procedure. In the future, we plan to explore MKDCNet under federated learning settings, where models can be trained on datasets from multiple institutions while minimizing the privacy concerns raised by each center.
References
- [1] (2015) WM-DOVA maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians. Computerized Medical Imaging and Graphics 43, pp. 99–111.
- [2] (2020) HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data 7 (1), pp. 1–14.
- [3] (2011) Protection from colorectal cancer after colonoscopy: a population-based, case–control study. Annals of Internal Medicine 154 (1), pp. 22–30.
- [4] (2019) Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl. Nature Methods 16 (12), pp. 1247–1253.
- [5] (2017) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4), pp. 834–848.
- [6] (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818.
- [7] (2014) Adenoma detection rate and risk of colorectal cancer and death. New England Journal of Medicine 370 (14), pp. 1298–1306.
- [8] (2021) Screening for colorectal cancer: US Preventive Services Task Force recommendation statement. JAMA 325 (19), pp. 1965–1977.
- [9] (2020) PraNet: parallel reverse attention network for polyp segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 263–273.
- [10] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
- [11] (2008) Miss rate for colorectal neoplastic polyps: a prospective multicenter study of back-to-back video colonoscopies. Endoscopy 40 (04), pp. 284–290.
- [12] (2010) Variation in the detection of serrated polyps in an average risk colorectal cancer screening cohort. American Journal of Gastroenterology 105 (12), pp. 2656–2664.
- [13] (2021) MedAI: transparency in medical image segmentation. Nordic Machine Intelligence 1 (1), pp. 1–4.
- [14] (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141.
- [15] (2021) HarDNet-MSEG: a simple encoder-decoder polyp segmentation neural network that achieves over 0.9 mean Dice and 86 FPS. arXiv preprint arXiv:2101.07172.
- [16] (2020) DoubleU-Net: a deep convolutional neural network for medical image segmentation. In Proceedings of the International Symposium on Computer-Based Medical Systems (CBMS), pp. 558–564.
- [17] (2020) Kvasir-SEG: a segmented polyp dataset. In Proceedings of the International Conference on Multimedia Modeling (MMM), pp. 451–462.
- [18] (2019) ResUNet++: an advanced architecture for medical image segmentation. In Proceedings of the International Symposium on Multimedia (ISM), pp. 225–2255.
- [19] (2021) Progressively normalized self-attention network for video polyp segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 142–152.
- [20] (2021) NeoUNet: towards accurate colon polyp segmentation and neoplasm detection. arXiv preprint arXiv:2107.05023.
- [21] (2012) Factors influencing the miss rate of polyps in a back-to-back colonoscopy study. Endoscopy 44 (05), pp. 470–475.
- [22] (2015) U-Net: convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241.
- [23] (2021) HRENet: a hard region enhancement network for polyp segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 559–568.
- [24] (2015) Colorectal cancer screening and surveillance. American Family Physician 91 (2), pp. 93–100.
- [25] (2020) Colorectal cancer facts & figures 2020–2022. Published online, pp. 48.
- [26] (2021) Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians 71 (3), pp. 209–249.
- [27] (2021) DDANet: dual decoder attention network for automatic polyp segmentation. In Proceedings of the International Conference on Pattern Recognition Workshops, pp. 307–314.
- [28] (2022) FANet: a feedback attention network for improved biomedical image segmentation. IEEE Transactions on Neural Networks and Learning Systems.
- [29] (2018) Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nature Biomedical Engineering 2 (10), pp. 741–748.
- [30] (2022) PolypSeg+: a lightweight context-aware network for real-time polyp segmentation. IEEE Transactions on Cybernetics.
- [31] (2012) Colonoscopic polypectomy and long-term prevention of colorectal-cancer deaths. New England Journal of Medicine 366, pp. 687–696.
- [32] (2018) Road extraction by deep residual U-Net. IEEE Geoscience and Remote Sensing Letters 15 (5), pp. 749–753.
- [33] (2018) UNet++: a nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 3–11.