Medical imaging, e.g., magnetic resonance imaging (MRI) and computed tomography (CT), plays a crucial role in cancer diagnosis and treatment decisions, where precise and robust segmentation of organs and tumors is of great value. Benefiting from its powerful feature representation capability, deep learning has achieved breakthrough performance in many medical image analysis tasks such as pulmonary nodule detection [1] and brain tumor segmentation [2]. With the advent of convolutional neural networks (CNNs), abundant work on medical image segmentation has been proposed, including skip-connections [3], distance transform maps [4], attention mechanisms [5], etc. The performance on some simple tasks has reached the level of radiologists. However, many challenges remain before the practical requirements of organ and tumor segmentation can be met. Specifically, tumor tissues tend to have irregular shapes due to their invasive nature, leading to shape variations. Tumors often overlap with organs, which hinders accurate simultaneous segmentation of organs and tumors. Large size variations may exist both across and within subjects, caused by different cancer stages and inherent inter-category differences. Radiologists’ subjective annotations and the uncertainty of malignant tumor boundaries may introduce label noise. Extreme class imbalance between the area of interest (AOI) and the background region also makes medical image segmentation difficult.
To tackle the aforementioned challenges, some innovative building blocks have been incorporated into conventional CNNs to improve their robustness to shape variations. Dai et al. [6] first introduced deformable convolution. By adding offsets to the regular grid sampling locations of convolution kernels, it enhances a CNN’s capability of modeling geometric transformations. Despite this improvement, some issues remain in deformable convolution. First, deformable convolution requires precise position information to calculate the offsets, which conflicts with a CNN’s position insensitivity (a.k.a. translation invariance). Second, the offsets are learned from the preceding feature map, yet it is hard to guarantee that appropriate offsets are learned with the same receptive field. In this work, we propose a position-guided deformable network, namely PGD-UNet, to deal with the deformation of anatomical structures such as organs and tumors. It consists of a U-Net backbone incorporating deformable convolution and an auxiliary localization path. The localization path explicitly introduces position information to guide deformable convolution, which effectively improves the capability of modeling geometric transformations. Meanwhile, to accommodate structures of various sizes in an image, we use Atrous Spatial Pyramid Pooling (ASPP) [7] as the bottleneck layer to extract multi-scale features.
In medical image segmentation, small structures also cause class imbalance, where the anatomy of interest occupies only a very small portion of the image. For example, in the bladder MRI images used in our experiments, the tumor region accounts for only 0.63% of all pixels. Existing approaches to addressing class imbalance can be categorized into two groups, i.e., multi-stage cascaded CNNs and re-weighting the losses contributed by different classes. The former approach first detects the AOI and then segments the target within that region. It is computationally expensive and not easy to extend to multi-class segmentation. As for the latter, the focal loss [8] was proposed to make the network focus on hard-to-classify samples, which have a greater influence on classification performance. However, mislabeled samples and hard-to-classify samples are easily confused. In this work, we propose a novel noise suppression focal loss to suppress the effect of mislabeled samples and thus prevent the network from overfitting.
We test the proposed approach on two challenging medical segmentation tasks: bladder tumor segmentation in MRI and pancreas tumor segmentation in CT. Both the bladder dataset and the pancreas dataset from the Medical Segmentation Decathlon Challenge (MSD) [9] require segmenting organs and tumors simultaneously, and both suffer from class imbalance due to large (background), medium (pancreas, bladder wall) and small (tumor) structures. Experimental results show that our approach improves prediction accuracy on both datasets and achieves state-of-the-art performance.
II Related Work
II-A Spatial Transformation
Effective modeling of spatial transformation is a key challenge in visual recognition. The typical method is to augment the training samples with sufficient desired variations through translation, rotation, scaling, etc., which is simple but laborious. Alternatively, transformation-invariant features have been designed, such as the scale-invariant feature transform (SIFT) [10] and local binary patterns (LBP) [11]. Nevertheless, such handcrafted features require expert knowledge to design, yet lack sufficient generalization power across domains. Although deep CNNs have powerful representation capabilities, their invariance still implicitly relies on data augmentation, parameter sharing, pooling operations, etc. Spatial transformer networks (STN) [12] were the first work to model geometric transformations in a computational and parametric manner. The spatial transformer module dynamically learns a set of global affine transformation parameters from the feature map, and then passes the transformed feature map to subsequent layers to simplify recognition. Instead of performing global affine transformations, deformable convolution [6] learns dense kernel-wise offsets, which endows ordinary convolution operations with the flexibility to adapt to objects with more complex geometric transformations. Our work addresses two drawbacks of deformable convolution: position insensitivity and a local receptive field.
II-B Class Imbalance
Class imbalance is quite common in medical image segmentation. A general solution is to exploit multi-stage cascaded CNNs [13], which directly eliminate most of the background in the first detection stage of the pipeline. Another family of methods is re-weighting. Cross-Entropy (CE) based weighted losses [14, 15, 3] re-weight the different classes according to the frequency of the corresponding labels. Focal loss [8] further takes the difficulty of each sample into account when weighting. Gradient harmonizing mechanism (GHM) loss [16] directly calculates the gradient distribution of each batch, and alleviates class imbalance by flattening the gradient. Dice loss [17], based on regional integration, is commonly used to handle unbalanced medical segmentation. Kervadec et al. [4] proposed a boundary loss, which formulates a distance metric on the space of contours to mitigate the difficulties of regional losses.
II-C Label Noise
In medical image analysis, label noise is quite common due to uneven image quality and the high level of clinical expertise required for annotation. To address this problem, minimal annotation training [18] was developed to segment virus particles in microscopy images with coarse annotations. This method first generates masks for suspected noise regions, then ignores these regions when calculating the Dice similarity loss. In [19], a noise layer is added to the end of a CNN for breast lesion detection. The noise layer can be regarded as a transformation matrix between noisy and true labels, which is optimized with a combination of expectation maximization (EM) and error back-propagation. Other methods are based on sample re-weighting and feature consistency.
III-A Network Architecture
Fig. 1 illustrates the architecture of our PGD-UNet, where U-Net is adopted as the backbone. The backbone consists of an encoding path to extract semantic information and a symmetric decoding path for recovery. To accommodate irregular and complex geometric variations of organs and tumors, deformable convolutions are embedded into the middle three blocks of the two paths. Nevertheless, the deformable convolution operator (DCO) requires accurate position information to generate the coordinate offsets and mask, and this information is not available in a plain convolutional feature map due to a CNN’s inherent translation invariance. Consequently, we introduce an auxiliary position-sensitive localization path to provide the DCO with additional position information. The localization path does not share parameters with the encoding path, and position information is added in the form of coordinates. To handle size variations between organs and tumors, as well as between tumors of different stages, we adopt Atrous Spatial Pyramid Pooling (ASPP) as the bottleneck layer so that the network can represent structures of different sizes simultaneously by extracting features with different receptive fields.
III-B Position-Guided Deformable Convolutional Layers
An essential strength of our proposed segmentation network is its ability to model spatial transformations. To achieve this, deformable convolution is introduced to enable dense pixel-wise deformation. In addition, a novel position-aware path is included to further improve the current deformation paradigm.
III-B1 Deformable Convolution
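The standard convolution can be regarded as sampling the input feature map $x$ on a regular grid $\mathcal{R}$ and then summing the sampled values weighted by $w$. For example, a $3 \times 3$ kernel with dilation 1 is defined as:

$$\mathcal{R} = \{(-1,-1), (-1,0), \ldots, (0,1), (1,1)\}$$

The value at location $p_0$ on the output feature map $y$ is calculated as:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n) \tag{1}$$

where $w(p_n)$ is the kernel weight and $p_n$ enumerates the sampling locations in $\mathcal{R}$.

The deformable convolution shifts each grid sampling cell by an offset $\Delta p_n$ and multiplies each shifted sample by a modulation weight $\Delta m_n$, where $n = 1, \ldots, N$, and $N = |\mathcal{R}|$ is equal to the number of cells in the grid $\mathcal{R}$. For deformable convolution, Eq. 1 becomes

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n)\, x(p_0 + p_n + \Delta p_n)\, \Delta m_n \tag{2}$$

The offset $\Delta p_n$ is a pair of learnable parameters with unconstrained range, while the mask $\Delta m_n$ varies in $[0, 1]$. Since the shifted location is generally fractional, $x(p_0 + p_n + \Delta p_n)$ is computed via bilinear interpolation.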
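As a rough illustration of the sampling rule described above, the following pure-Python sketch evaluates a modulated deformable convolution at a single output location. The function names (`bilinear`, `deform_conv_at`), the zero padding outside the image, and the (row, column) coordinate order are assumptions made for this example; in the network, the offsets and masks are predicted by a convolution layer rather than passed in by hand.

```python
import math

def bilinear(x, r, c):
    """Bilinearly interpolate the 2D list x at a fractional (row, col).

    Samples that fall outside the image contribute zero (assumed padding).
    """
    h, w = len(x), len(x[0])
    r0, c0 = math.floor(r), math.floor(c)
    value = 0.0
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < h and 0 <= cc < w:
                weight = (1 - abs(r - rr)) * (1 - abs(c - cc))
                value += weight * x[rr][cc]
    return value

# Regular 3x3 sampling grid with dilation 1.
GRID = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def deform_conv_at(x, weights, p0, offsets, masks):
    """Sum of weight * x(p0 + grid cell + offset) * mask over the 9 cells."""
    total = 0.0
    for (dr, dc), w, (o_r, o_c), m in zip(GRID, weights, offsets, masks):
        sample = bilinear(x, p0[0] + dr + o_r, p0[1] + dc + o_c)
        total += w * sample * m
    return total

# A 5x5 ramp image, so interpolated values are easy to check by hand.
x = [[float(5 * r + c) for c in range(5)] for r in range(5)]
ones = [1.0] * 9
no_shift = [(0.0, 0.0)] * 9
half_row = [(0.5, 0.0)] * 9
print(deform_conv_at(x, ones, (2, 2), no_shift, ones))  # plain 3x3 sum
print(deform_conv_at(x, ones, (2, 2), half_row, ones))  # grid shifted by half a row
```

With all offsets at zero and all masks at one, the function reduces to an ordinary 3x3 convolution, which is a convenient sanity check for any implementation.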
As illustrated in Fig. 2, both the offsets and the mask are learned through an additional convolution layer applied to the same input feature map, with the same kernel size and dilation as the deformable convolution in the main branch. For example, a 3×3 deformable kernel with dilation 1 samples the input feature map on a shifted grid, while the offsets are learned on the regular grid, as shown in Fig. 2. Consequently, a natural problem arises: when a shifted sampling point falls outside the regular grid (points with red outline in Fig. 2), it is unclear whether an appropriate offset can be learned, because the point lies beyond the receptive field of the layer that computes it (the normal spatial range of a 3×3 grid).
III-B2 Localization Path
CNNs are generally considered to be position insensitive or translation invariant because features are extracted in a local manner. Nevertheless, recent studies exploring the interpretability of neural networks have shown that CNNs learn to encode position information within the feature maps implicitly, i.e., the neurons in deep layers know not only what they are representing, but also where they are. The success of position-dependent tasks (e.g., object detection and segmentation) also confirms this viewpoint. To evaluate the capability of CNNs to encode position information, Liu et al. [20] designed a simple coordinate mapping experiment. The results show that CNNs cannot recover the coordinates accurately. Therefore, CNNs can only learn a coarse position representation, which is insufficient for computing accurate offsets for deformable convolution. In this regard, we propose an auxiliary localization path that provides explicit position information to guide the offset computation and decouples semantic and position extraction.
Larger Receptive Field
As illustrated in Fig. 1, we stack three dilated convolution layers as the backbone of the localization path. To avoid the ‘gridding effect’ [21], we choose the dilation rates of the three dilated convolution layers accordingly. The localization path takes the output feature map of the first block of UNet as input, which is the same input received by the subsequent layers in the encoder path. To maintain the same spatial resolution as the feature map at each block of the main branch, we adopt strided convolutions for downsampling. The feature maps computed by the localization path are then concatenated into the main branch along the channel dimension to guide the offset and mask calculation. Because the stacked dilated convolutions in the localization path provide a larger receptive field than the standard convolutions in the encoding path, they help avoid the above-mentioned problem of shifted sampling points falling outside the range from which their offsets are computed.
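The receptive-field argument can be checked with a short calculation. The sketch below assumes stride-1 layers with 3x3 kernels and uses hypothetical dilation rates (1, 2, 5) purely for illustration; they are not taken from the localization path described here.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (along one axis) of a stack of stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d  # each layer widens the field by (k - 1) * dilation
    return rf

# Three plain 3x3 convolutions versus three dilated ones.
# The dilation rates (1, 2, 5) are hypothetical example values.
plain = receptive_field([3, 3, 3], [1, 1, 1])
dilated = receptive_field([3, 3, 3], [1, 2, 5])
print(plain, dilated)  # 7 17
```

Under these assumptions the dilated stack sees a 17-pixel span where the plain stack sees 7, which is why a shifted sampling point is more likely to remain inside the region from which its offset was computed.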
To obtain appropriate offsets, the localization path needs to be position sensitive. Consequently, we utilize the ‘CoordConv’ operator [20] to explicitly send the coordinates of each pixel in the image to the network as additional information. Specifically, before sending the feature map of the first block to the localization path, we add an ‘addCoord’ layer. The ‘addCoord’ layer generates the coordinates along the x and y axes for each pixel and normalizes them to a fixed range. The normalized coordinates are concatenated to the input feature map along the channel dimension, so the number of output channels increases by two.
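A minimal sketch of such a coordinate-adding layer is given below. The function name `add_coords`, the channel-first nested-list layout, and the normalization range [-1, 1] (the range used in the original CoordConv formulation) are assumptions for this example.

```python
def add_coords(feature):
    """Append normalized x and y coordinate channels to a feature map.

    feature: list of C channels, each an H x W nested list.
    Returns a new list with C + 2 channels.
    """
    h, w = len(feature[0]), len(feature[0][0])

    def norm(i, n):
        # Map index i in [0, n - 1] linearly to [-1, 1] (assumed range).
        return 2.0 * i / (n - 1) - 1.0 if n > 1 else 0.0

    x_chan = [[norm(c, w) for c in range(w)] for _ in range(h)]
    y_chan = [[norm(r, h) for _ in range(w)] for r in range(h)]
    return feature + [x_chan, y_chan]

feat = [[[0.0] * 3 for _ in range(2)]]  # one channel, 2 rows, 3 columns
out = add_coords(feat)
print(len(out))  # 3 channels: the original one plus x and y
print(out[1][0])  # x coordinates of the first row: [-1.0, 0.0, 1.0]
```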
Inspired by Unpooling [22], we further propose a novel max-pooling operation, called CoordPool, which performs the normal max-pooling operation while also outputting the locations of the maxima within each pooling region. As illustrated in Fig. 3, the locations represent the coordinates of the maxima in the pooling region along the x and y axes. In our network, the locations output by CoordPool at each block are concatenated to the corresponding feature map in the localization path.
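The idea behind CoordPool can be sketched in a few lines. In the example below, the function name `coord_pool`, the non-overlapping k x k windows, and the choice to report absolute (row, column) indices rather than positions relative to each window are all assumptions for illustration.

```python
def coord_pool(x, k=2):
    """Max-pool a 2D list with k x k windows and return the maxima locations.

    Returns (pooled, rows, cols), where rows[i][j] and cols[i][j] are the
    absolute indices of the maximum inside pooling window (i, j).
    """
    h, w = len(x) // k, len(x[0]) // k
    pooled = [[0.0] * w for _ in range(h)]
    rows = [[0] * w for _ in range(h)]
    cols = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [(x[r][c], r, c)
                      for r in range(i * k, (i + 1) * k)
                      for c in range(j * k, (j + 1) * k)]
            # On ties, max() keeps the first maximum in row-major order.
            pooled[i][j], rows[i][j], cols[i][j] = max(window, key=lambda t: t[0])
    return pooled, rows, cols

x = [[1, 2, 0, 0],
     [3, 4, 0, 9],
     [5, 0, 1, 1],
     [0, 0, 1, 2]]
pooled, rows, cols = coord_pool(x)
print(pooled)  # [[4, 9], [5, 2]]
print(rows)    # [[1, 1], [2, 3]]
print(cols)    # [[1, 3], [0, 3]]
```

The two location maps have the same spatial size as the pooled output, so they can be concatenated to a feature map as extra channels.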
Because we explicitly introduce coordinate information into the network, PGD-UNet constructs a position-sensitive deformable convolution. In PGD-UNet, CoordPool preserves the spatial information lost by max-pooling and passes it to the decoding path via skip-connections. In this way, our network has the capability of Unpooling.
III-C Noise Suppression Focal Loss
Tumor segmentation is a difficult problem due to the following challenges: 1) malignant tumors usually have unclear boundaries; 2) the quality of images generated by different devices varies significantly; 3) manual delineation of tumors is subject to inter- and intra-observer variations. Together, these problems make label noise almost inevitable in medical images, which seriously affects the training of neural networks. First, during the initial phase of network convergence, neural networks tend to learn common features shared among the data samples [23]. At this point, a noisy label will have a large error and appear as an outlier. Traditional loss functions, e.g., cross-entropy loss, strengthen the penalty for noise, which causes the gradient to be dominated by mislabeled samples. Second, the proportion of tumor pixels in medical images is very small, which makes networks easily overfit the noisy labels.
To solve this problem, we design a noise suppression focal loss to suppress the contribution of outliers to the gradient. In multi-class segmentation, the ground truth of each pixel is encoded by a one-hot vector, where the entry equal to 1 marks the true class. Let $p_t$ denote the predicted probability of the ground-truth class. The cross entropy (CE) loss can be written as:

$$\mathrm{CE}(p_t) = -\log(p_t)$$

As shown in Fig. 4, difficult examples have greater losses than easy examples in CE loss. However, a difference of this magnitude can easily be overwhelmed in the case of large class imbalance. Focal loss (FL) [8] further amplifies this difference by adding a modulating factor $(1 - p_t)^{\gamma}$ to the CE loss:

$$\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$$
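To make the effect of the modulating factor concrete, the sketch below compares CE and focal loss for an easy and a hard example. It uses the standard focal loss definition without class weighting; the focusing parameter `gamma=2.0` is the common default from the focal loss paper and is an assumption here, not a value stated in this section.

```python
import math

def cross_entropy(p_t):
    """CE loss given the predicted probability of the ground-truth class."""
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0):
    """Focal loss: CE scaled by the modulating factor (1 - p_t) ** gamma."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)

for p_t in (0.9, 0.1):  # an easy example and a hard example
    ratio = focal_loss(p_t) / cross_entropy(p_t)
    print(p_t, round(cross_entropy(p_t), 4), round(focal_loss(p_t), 4), round(ratio, 2))
```

With `gamma=2.0`, the easy example keeps about 1% of its CE loss while the hard example keeps about 81%, so the hard example dominates far more strongly than under plain CE.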
As our experiments will show, focal loss is very useful for dealing with extreme class imbalance. At the same time, however, mislabeled samples also lie in the low-probability region and receive large gradients. To alleviate the effects of noise, we design a piecewise focal loss, namely the noise suppression focal loss (NSFL). NSFL introduces a threshold on the predicted probability of the ground-truth class and replaces the modulating factor of focal loss with a suppressed factor whenever the probability falls below this threshold.
The replaced factor therefore suppresses the gradient of samples whose predicted probability is below the threshold. The degree of suppression is governed by an additional parameter: at one setting the factor is simply truncated, and at another it becomes a linear function, as shown in Fig. 4.
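The exact expression of the replaced factor is not recoverable from the text above, so the following is only a sketch of one piecewise form that satisfies the stated properties: it matches focal loss above the threshold, is continuous at the threshold, is truncated to a constant for one parameter setting, and is linear for another. The names `eps` (threshold), `beta` (suppression degree), the default values, and the formula itself are assumptions, not the authors' definition.

```python
import math

def nsfl(p_t, gamma=2.0, eps=0.3, beta=1.0):
    """Hypothetical noise-suppressed focal loss for one sample.

    Above eps the usual focal factor (1 - p_t) ** gamma is used. Below eps
    the factor is replaced by (1 - eps) ** gamma * (p_t / eps) ** beta:
    beta = 0 truncates the factor at its value at eps, beta = 1 makes it
    linear in p_t.
    """
    if p_t >= eps:
        factor = (1.0 - p_t) ** gamma
    else:
        factor = (1.0 - eps) ** gamma * (p_t / eps) ** beta
    return -factor * math.log(p_t)

# Compare against plain focal loss for a likely-mislabeled sample (low p_t).
p_t = 0.05
plain_focal = -((1.0 - p_t) ** 2.0) * math.log(p_t)
print(round(plain_focal, 4), round(nsfl(p_t, beta=0.0), 4), round(nsfl(p_t, beta=1.0), 4))
```

Under this assumed form, samples with very low predicted probability, which are the ones most likely to be mislabeled, contribute less to the loss than they would under focal loss, while samples above the threshold are treated identically.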
Furthermore, if the network is trained from scratch, it is recommended to apply the noise suppression focal loss only after a few epochs, because the prediction probability produced by a randomly initialized network is meaningless. In our experiments, the average predicted probability of the ground-truth class is used to decide when to switch to the noise suppression focal loss.
Finally, the overall loss function is a weighted combination of the noise suppression focal loss and the Dice loss, where a weighting coefficient adjusts the balance between the two loss terms according to the dataset.
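One plausible way to combine the two terms is sketched below. The soft Dice formulation with a smoothing constant, the convex combination, and the coefficient name `lam` are assumptions for illustration; the exact weighting scheme is not given in the text above.

```python
def soft_dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss for one class.

    probs: predicted probabilities in [0, 1]; targets: 0/1 labels.
    The smoothing constant avoids division by zero on empty masks.
    """
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)

def combined_loss(focal_term, dice_term, lam=0.5):
    """Hypothetical convex combination of a focal-type term and a Dice term."""
    return lam * focal_term + (1.0 - lam) * dice_term

dice = soft_dice_loss([0.9, 0.8, 0.1], [1, 1, 0])
print(round(combined_loss(0.2, dice, lam=0.5), 4))
```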
To validate the effectiveness of our approach, we evaluate it on two challenging tasks, both requiring simultaneous segmentation of organs and tumors from medical images with high class imbalance.
IV-A1 Bladder tumor dataset
The bladder tumor dataset contains 2200 MRI slices from 25 patients with pathologically confirmed bladder cancer. A high-resolution axial T2-weighted (T2W) MRI sequence was adopted. Each scan contained 80 to 124 slices of 512×512 pixels, with a pixel resolution of 0.5 × 0.5 . For each MRI scan, both bladder wall and tumor regions were manually delineated by an expert. During the delineation process, all target regions were outlined slice by slice by the expert, who was blinded to the pathological results of the patients.
IV-A2 Pancreas tumor dataset
The pancreas tumor dataset is a sub-dataset of the Medical Segmentation Decathlon (MSD) MICCAI 2018 challenge. It comprises 282 portal venous phase CT scans for training. An expert abdominal radiologist annotated the pancreatic parenchyma and pancreatic mass (cyst or tumor) in each slice. Please refer to [9] for more details.
IV-B Implementation Details
IV-B1 Data Pre-processing
We first extracted slices from the 3D scans along the axial plane. All 2D slices were normalized to a fixed intensity range and resized to a common size. To prevent extra noise from the interpolation operation, we did not use any data augmentation.
Our network was trained using the Adam optimizer with an initial learning rate of 0.0001 and a batch size of 12. Each dataset was randomly divided into 5 folds; each fold was used in turn for testing, while the remaining data were further split into a training set (75%) and a validation set (25%). The experiments were performed on two NVIDIA GTX 1080 Ti GPUs with a total of 22 GB of graphics memory. Training one fold takes about 12 hours for the bladder dataset and 24 hours for the pancreas dataset.
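The data split described above can be sketched as follows. The function name `five_fold_splits`, the fixed random seed, and the choice to split by case identifier (for example, by patient rather than by slice) are assumptions; the text does not state the unit at which the folds were drawn.

```python
import random

def five_fold_splits(ids, n_folds=5, val_fraction=0.25, seed=0):
    """Build train/val/test splits for n-fold cross-validation.

    Each fold serves once as the test set; the remaining cases are split
    into validation (val_fraction) and training (the rest).
    """
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        rest = [case for j, fold in enumerate(folds) if j != k for case in fold]
        n_val = round(len(rest) * val_fraction)
        splits.append({"train": rest[n_val:], "val": rest[:n_val], "test": test})
    return splits

splits = five_fold_splits(range(25))  # e.g. 25 hypothetical case IDs
for s in splits:
    print(len(s["train"]), len(s["val"]), len(s["test"]))  # 15 5 5
```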
IV-B3 Evaluation Metrics
To evaluate segmentation performance, we adopted the common Dice Similarity Coefficient (DSC) and Jaccard Similarity Coefficient as the quantitative metrics.
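Both metrics follow directly from the overlap between a predicted mask and the ground-truth mask. The sketch below computes them for one class from flattened binary masks; the function name and the convention of returning 1.0 when both masks are empty are assumptions for this example.

```python
def dice_and_jaccard(pred, truth):
    """Dice and Jaccard coefficients for flat sequences of 0/1 labels."""
    pred, truth = list(pred), list(truth)
    inter = sum(p * t for p, t in zip(pred, truth))
    p_sum, t_sum = sum(pred), sum(truth)
    union = p_sum + t_sum - inter
    dice = 2.0 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    jaccard = inter / union if union else 1.0
    return dice, jaccard

dice, jaccard = dice_and_jaccard([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, round(jaccard, 4))  # 0.5 0.3333
```

The two metrics are monotonically related (Jaccard = Dice / (2 - Dice)), so they rank methods identically on a single sample but can differ once averaged over samples.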
Table I
| Method | Bladder Wall | Bladder Tumors |
| UNet baseline [3] |
| Attention UNet [5] |
Table II
| Method | Categorization | Pancreas Dice | Pancreas Tumors Dice |
| nnUNet_3D Cascade [26] | 3D Cascade | 79.30 | 52.12 |
Table III
| Method | Bladder Wall | Bladder Tumors | Pancreas | Pancreas Tumors |
| Deform UNet (without local path) | 88.85 | 75.10 | 76.12 | 47.26 |
| Deform UNet (plain Conv) | 89.44 | 74.30 | 78.01 | 42.84 |
| Deform UNet (Cd Conv) | 89.23 | 74.98 | 77.24 | 45.62 |
| Deform UNet (Cd Pool) | 89.57 | 76.93 | 76.58 | 48.87 |
| Deform UNet (Cd Conv/Pool) | 89.32 | 80.38 | 77.01 | 50.12 |
We compare our PGD-UNet with recent UNet-based methods on the bladder dataset, and report the results of a 5-fold cross-validation in Table I. Our PGD-UNet achieves the best performance for both bladder wall and tumor segmentation. In particular, compared to the original UNet, PGD-UNet obtains a moderate improvement in bladder wall segmentation and a significant improvement in bladder tumor segmentation. This indicates that our approach is robust to irregular shape variations, especially for tumors. The results for pancreas tumor segmentation are compared with the reported state-of-the-art methods on the Medical Segmentation Decathlon (MSD) dataset in Table II, where the ‘Categorization’ column represents the type of method, ‘Search’ refers to automated network architecture search, and ‘Cascade’ refers to multi-stage methods. Our PGD-UNet obtains segmentation accuracy comparable to the state-of-the-art 3D methods with a much simpler 2D network that requires less computational power and does not rely on exhaustive annotations of the full 3D image volumes. Compared with the other 2D model, i.e., nnUNet_2D [26], our method improves Dice performance by 3.09% and 41.54% for pancreas and pancreas tumors, respectively. All results are reported per sample.
We visualize some segmentation results produced by different algorithms on both datasets in Fig. 5. As seen from the results, PGD-UNet is able to learn discriminative features that can effectively segment narrow structures such as the bladder wall and complex tumor patterns with varying shapes and sizes. Segmentation details in the highlighted areas also indicate that our method can effectively deal with boundary regions where tumors and the bladder wall mix together.
IV-D Ablation Experiments
The ablation experiments are performed to verify the contribution of each proposed module.
IV-D1 Localization Path
We compared the performance of the model with and without the localization path, and carried out ablation experiments on its key components, ‘CoordConv’ and ‘CoordPool’. As shown in Table III, segmentation performance degrades significantly when the localization path is removed. The second row represents a localization path consisting of plain convolutions. Comparing the second row with the following rows, it can be seen that using CoordConv alone has only a slight effect, whereas CoordPool, which preserves position information, has a larger impact on the DSC. In addition, the results in the last row show that the localization path improves the segmentation accuracy of tumors much more than that of normal tissues. This is consistent with the observation that tumors have more size and shape variations than normal tissues.
IV-D2 Noise Suppression Focal Loss
Due to the large proportion of background in our datasets, the network fails to converge when trained with the Cross-Entropy (CE) loss alone, and all outputs are predicted as background. We therefore chose Focal Loss (FL) as the baseline. In addition, other loss functions that aim at handling class imbalance were compared, including Gradient Harmonizing Mechanism (GHM) loss, DSC loss and their combination.
Table IV reports the results of ablation experiments using various loss functions on the bladder and pancreas datasets. The DSC of tumors consistently increases when NSFL is added, whereas the performance on normal tissue degrades slightly. This indicates that the impact of NSFL is positively related to the level of label noise. Using the DSC loss alone is unstable and may cause a sharp decline in tumor segmentation performance. We believe that this is due to the class imbalance between normal tissue and tumor. As DSC loss is based on regional integration, classes with abundant pixels tend to dominate the gradient, leading to poor results for other classes or even failure to converge.
Fig. 6 compares the evolution of the loss value and validation metrics between FL and NSFL on the MRI bladder dataset. After 50 epochs, the validation loss of FL began to rise, indicating that the network was overfitting. In contrast, NSFL suppressed this trend significantly. Moreover, as can be seen from the DSC curves on the validation set, normal tissues hardly overfit due to the large number of samples and clean labels, whereas tumors are prone to overfitting. Thus, NSFL helps to reach the optimal convergence point for both normal tissues and tumors, achieving precise segmentation results.
V Conclusions and Future Work
We proposed an improved UNet framework named PGD-UNet for medical image segmentation. PGD-UNet enhances the original UNet with deformable convolution guided by a localization path and with a noise suppression focal loss, in order to effectively address size and shape variations and severe class imbalance in tumor segmentation. By adding the ‘CoordConv’ and ‘CoordPool’ modules, we explicitly encode position information into the network to improve the offset learning of deformable convolution. To resolve the confusion between noisy and hard-to-classify samples that arises when focal loss is applied to class imbalance, we design a new loss function that suppresses the impact of outliers on the gradient. The effectiveness of our method is verified on two challenging medical segmentation tasks. In the future, we plan to extend our work to utilise complementary information from both MRI and CT images, where associated challenges such as registration [27] need to be solved.
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61671151 and 61573097, the Natural Science Foundation of JiangSu Province under Grant No. BK20181265, the Australian Research Council (ARC) under Grant No. LP170100416, LP180100114 and DP200102611, and the Research Grants Council of the Hong Kong SAR under Project CityU11202418.
- [1] F. Liao, M. Liang, Z. Li, X. Hu, and S. Song, “Evaluate the malignancy of pulmonary nodules using the 3-d deep leaky noisy-or network,” IEEE transactions on neural networks and learning systems, 2019.
- [2] A. Myronenko, “3d mri brain tumor segmentation using autoencoder regularization,” in International MICCAI Brainlesion Workshop. Springer, 2018, pp. 311–320.
- [3] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
- [4] H. Kervadec, J. Bouchtiba, C. Desrosiers, É. Granger, J. Dolz, and I. B. Ayed, “Boundary loss for highly unbalanced segmentation,” arXiv preprint arXiv:1812.07032, 2018.
- [5] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz et al., “Attention u-net: Learning where to look for the pancreas,” arXiv preprint arXiv:1804.03999, 2018.
- [6] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable convolutional networks,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 764–773.
- [7] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint arXiv:1706.05587, 2017.
- [8] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.
- [9] A. L. Simpson, M. Antonelli, S. Bakas, M. Bilello, K. Farahani, B. van Ginneken, A. Kopp-Schneider, B. A. Landman, G. Litjens, B. Menze et al., “A large annotated medical image dataset for the development and evaluation of segmentation algorithms,” arXiv preprint arXiv:1902.09063, 2019.
- [10] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
- [11] T. Ojala, M. Pietikäinen, and D. Harwood, “A comparative study of texture measures with classification based on featured distributions,” Pattern recognition, vol. 29, no. 1, pp. 51–59, 1996.
- [12] M. Jaderberg, K. Simonyan, A. Zisserman et al., “Spatial transformer networks,” in Advances in neural information processing systems, 2015, pp. 2017–2025.
- [13] H. R. Roth, L. Lu, N. Lay, A. P. Harrison, A. Farag, A. Sohn, and R. M. Summers, “Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation,” Medical image analysis, vol. 45, pp. 94–107, 2018.
- [14] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440.
- [15] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker, “Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation,” Medical image analysis, vol. 36, pp. 61–78, 2017.
- [16] B. Li, Y. Liu, and X. Wang, “Gradient harmonized single-stage detector,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 8577–8584.
- [17] C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. J. Cardoso, “Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations,” in Deep learning in medical image analysis and multimodal learning for clinical decision support. Springer, 2017, pp. 240–248.
- [18] D. J. Matuszewski and I.-M. Sintorn, “Minimal annotation training for segmentation of microscopy images,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 2018, pp. 387–390.
- [19] Y. Dgani, H. Greenspan, and J. Goldberger, “Training a neural network based on unreliable human annotation of medical images,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 2018, pp. 39–42.
- [20] R. Liu, J. Lehman, P. Molino, F. P. Such, E. Frank, A. Sergeev, and J. Yosinski, “An intriguing failing of convolutional neural networks and the coordconv solution,” in Advances in Neural Information Processing Systems, 2018, pp. 9605–9616.
- [21] P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell, “Understanding convolution for semantic segmentation,” in 2018 IEEE winter conference on applications of computer vision (WACV). IEEE, 2018, pp. 1451–1460.
- [22] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in European conference on computer vision. Springer, 2014, pp. 818–833.
- [23] D. Arpit, S. Jastrzębski, N. Ballas, D. Krueger, E. Bengio, M. S. Kanwal, T. Maharaj, A. Fischer, A. Courville, Y. Bengio et al., “A closer look at memorization in deep networks,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017, pp. 233–242.
- [24] Y. Qin, K. Kamnitsas, S. Ancha, J. Nanavati, G. Cottrell, A. Criminisi, and A. Nori, “Autofocus layer for semantic segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 603–611.
- [25] Z. Zhu, C. Liu, D. Yang, A. L. Yuille, and D. Xu, “V-nas: Neural architecture search for volumetric medical image segmentation,” 2019 International Conference on 3D Vision (3DV), pp. 240–248, 2019.
- [26] F. Isensee, J. Petersen, A. Klein, D. Zimmerer, P. F. Jaeger, S. Kohl, J. Wasserthal, G. Koehler, T. Norajitra, S. Wirkert et al., “nnu-net: Self-adapting framework for u-net-based medical image segmentation,” arXiv preprint arXiv:1809.10486, 2018.
- [27] M. Gong, Y. Wu, Q. Cai, W. Ma, A. K. Qin, Z. Wang, and L. Jiao, “Discrete particle swarm optimization for high-order graph matching,” Information Sciences, vol. 328, pp. 158–171, 2016.