1 Introduction
Semantic segmentation is a fundamental computer vision task that is particularly important for medical image analysis [1]. It serves as a preprocessing step for the treatment of different medical conditions. In recent years, methods based on convolutional neural networks (CNNs) have achieved tremendous improvements [2, 3]. With abundant pixel-wise labeled data, CNNs can easily achieve outstanding performance thanks to their strong nonlinear fitting capability [4]. However, pixel-wise annotated images are expensive to obtain, especially in the medical domain: annotation is time-consuming and relies heavily on experienced experts. Fortunately, semi-supervised methods leverage not only labeled data but also unlabeled data [5], reducing the labeling burden on researchers. Semi-supervised learning has therefore attracted great attention [6, 7, 8].
The currently dominant semi-supervised learning methods in deep learning include self-training [9], consistency regularization [10], pseudo-labeling [11], and adversarial learning [12]. All of these approaches assume that semi-supervised methods rely on intrinsic properties of the dataset distribution rather than on individual images, so parameter optimization can proceed with a combination of annotated and unannotated images rather than with labeled images only. For example, self-training [10] selects the most reliable predictions of the current model as labels (pseudo labels) to complement the labeled dataset. Other approaches, such as co-training [13], exchange internal information among an ensemble of models with the aim of increasing their consistency. These approaches are all constructive to a certain degree, but how to leverage unannotated data more effectively remains one of the most pressing issues in semi-supervised learning.
2 Related work
2.1 Semi-supervised segmentation
Semi-supervised learning is an indispensable part of deep learning theory [14]. Recently, many works have proposed semi-supervised approaches to segment human organs in medical images [15]. Mohamed et al. presented a semi-supervised approach for efficient segmentation of lung CT scans from COVID-19 patients. Yu et al. [16] proposed a teacher-student model to segment the 3D left atrium via self-ensembling. Li et al. [10] proposed a semi-supervised segmentation method for skin lesions that enforces consistency between the student and teacher models. Fang et al. [17] used a co-training framework to boost each sub-model for kidney tumor segmentation, and applied adversarial training [18] to constrain the models to output invariant results under different perturbations of the input data.
Similar to co-training, Peng et al. [13] presented a method based on an ensemble of deep segmentation networks, in which different models are trained on corresponding subsets of the annotated data and non-annotated images are used to exchange internal information among the sub-models. Semi-supervised methods differ in how they utilize unlabeled data and in how they relate to supervised algorithms [5]. Most approaches use predictions on unlabeled data for consistency training [19]. However, the model's predictions on unlabeled data, which are used to exchange information, are unstable. Additional components are needed to learn the confidence of each pixel of the prediction [20]. We therefore introduce uncertainty estimation [21] into our approach.
2.2 Uncertainty Estimation
Uncertainty estimation plays a pivotal role in reducing the impact of stochasticity during both optimization and decision-making [22]. Knowing the confidence (uncertainty) with which we can trust a neural network's predictions is essential for decision making [23]. Bayesian probability theory offers a mathematically grounded framework for learning about the uncertainty of a data-driven model, the Bayesian neural network (BNN). A BNN attempts to learn a distribution over each of the network's weight parameters [24]. However, Bayesian inference is computationally intractable in practice.
Deep ensembles [25] are an alternative to BNNs that estimate uncertainty from the variance of many trained models. In addition, several works have applied different uncertainty estimation methods to semantic segmentation, including Stochastic Batch Normalization [26] and Multiplicative Normalizing Flows [27]. Upadhyay et al. [28] proposed a GAN-based framework that estimates the per-voxel uncertainty of predictions on magnetic resonance imaging (MRI). Kendall and Gal [24] studied models under this framework with per-pixel semantic segmentation and proposed new loss functions based on uncertainty, interpreted as learned attenuation. Wu et al. [19] claimed that deep models require extra components to obtain performance gains, and Yu et al. [16] introduced uncertainty to ensemble models by using Monte Carlo sampling. Monte Carlo sampling has been shown to have clear advantages over conventional methods for uncertainty estimation, especially for the outputs of complex measurement systems [23].
3 Methods
3.1 Problem formulation
Since manual annotation is often time-consuming and expensive, only a small fraction of the images in a dataset can have full pixel-wise labels. Our proposed method therefore aims to learn from both labeled and abundant unlabeled images by using an uncertainty-aware deep co-training methodology. We formalize the problem of medical image semantic segmentation as follows.
Given a labeled dataset $\mathcal{D}_L = \{(x_i, y_i)\}_{i=1}^{N}$ containing $N$ labeled examples, each example consists of an input image $x_i$ with dimensions $H \times W$ and its corresponding pixel-level label $y_i \in \{0, 1\}^{C \times |\Omega|}$, where $C$ is the number of classes and $\Omega$ is the spatial domain of the image. In the semi-supervised setting, we also have an unlabeled dataset $\mathcal{D}_U = \{x_j\}_{j=1}^{M}$ of $M$ unlabeled images, with $N \ll M$. The purpose of semi-supervised segmentation is to train a segmentation model $f$ with parameters $\theta$ on $\mathcal{D}_L \cup \mathcal{D}_U$ that maps each pixel of an input image to its correct label.
3.2 Proposed approach
We train two models, $f_1$ and $f_2$, jointly on the labeled and unlabeled datasets in a collaborative manner. Motivated by the performance improvement that uncertainty estimation brings in Bayesian networks, we employ Monte Carlo sampling [24] to estimate uncertainty. Specifically, we perform $T$ stochastic forward passes on each of the two models under random dropout. This gives a set of predictions, and we choose predictive entropy as the metric to approximate the uncertainty. The predictive entropy can be summarized as:
$$\mu_c = \frac{1}{T} \sum_{t=1}^{T} p_t^c, \qquad u = -\sum_{c=1}^{C} \mu_c \log \mu_c \qquad (1)$$
where $p_t^c$ is the probability of the $c$-th class in the $t$-th prediction and $C$ is the number of classes. The uncertainty is estimated at the pixel level; the uncertainty of the whole volume $U$ is $\{u\}$.
Following co-training methods for semi-supervised semantic segmentation, we employ a loss function composed of a weighted sum of three separate terms to train the ensemble of models:
$$\mathcal{L} = \mathcal{L}_{sup} + \lambda_{cot} \mathcal{L}_{cot} + \lambda_{div} \mathcal{L}_{div} \qquad (2)$$
Uncertainty is used to adjust the weighting of the first two terms of the loss function. The details are explained in the following subsections.
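As a minimal sketch of the entropy-based uncertainty described above, the following pure-Python function averages the class probabilities of several stochastic forward passes for one pixel and returns the entropy of the mean. The function name and the toy probability vectors are illustrative assumptions; in practice the passes would come from the dropout-enabled networks.

```python
import math


def predictive_entropy(samples):
    """Entropy of the mean class distribution for one pixel.

    samples: list of T probability vectors (each of length C), one per
    stochastic forward pass. Toy stand-in for dropout-enabled predictions.
    """
    num_passes = len(samples)
    num_classes = len(samples[0])
    # Average the probability of each class over the T passes.
    mean = [sum(p[c] for p in samples) / num_passes for c in range(num_classes)]
    # Entropy of the averaged distribution (0 * log 0 is treated as 0).
    return -sum(m * math.log(m) for m in mean if m > 0.0)


# Hypothetical example: a pixel whose prediction is stable across passes
# versus a pixel whose prediction fluctuates strongly.
stable_pixel = [[0.98, 0.02], [0.97, 0.03], [0.99, 0.01]]
unstable_pixel = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
u_stable = predictive_entropy(stable_pixel)
u_unstable = predictive_entropy(unstable_pixel)
```

A pixel whose prediction fluctuates across passes has a mean distribution closer to uniform and therefore a larger entropy than a pixel with stable predictions.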
3.2.1 Supervised learning
As in standard multi-view learning approaches [29], we split the labeled dataset into two complementary subsets. Training the two models individually on their corresponding labeled subsets lets each model fit the data distribution to a certain degree, and it also ensures diversity between the two models.
We employ ENet [30] as our network backbone. To adapt ENet as a Bayesian network for uncertainty estimation, one dropout layer is added between the encoder and the decoder. The dropout layer is turned on during network training and uncertainty estimation. Since uncertainty is not needed in the testing phase, the dropout layer is turned off there.
By running several stochastic forward passes through each model, we obtain a different result from each pass, as shown in Fig. 2. We therefore obtain a set of softmax probability vectors for each pixel of the input. Using predictive entropy as the metric, we can approximate uncertainty maps. We interpret the uncertainty map as the model's confidence in each pixel prediction: if a pixel's prediction fluctuates strongly across the forward passes, its uncertainty is large.
Guided by the uncertainty maps estimated by the two models, we identify the unreliable predictions in an image and make the models learn more about these pixels from the corresponding labels. We design the uncertainty-aware supervised loss as the pixel-level cross-entropy loss of the two models:
(3)  
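The idea of an uncertainty-weighted supervised loss can be sketched as follows. The exact weighting used in Eq. 3 is not reproduced here; the factor `(1 + u)` below is purely an assumption chosen so that pixels with higher uncertainty contribute more, as the text describes. All names are hypothetical.

```python
import math


def uncertainty_weighted_ce(probs, labels, uncertainty):
    """Mean pixel-level cross-entropy, up-weighted at uncertain pixels.

    probs[i]       : class probabilities predicted at pixel i
    labels[i]      : integer ground-truth class at pixel i
    uncertainty[i] : non-negative uncertainty at pixel i
    The weight (1 + u) is an ASSUMED form, not the paper's Eq. 3.
    """
    total = 0.0
    for p, y, u in zip(probs, labels, uncertainty):
        ce = -math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        total += (1.0 + u) * ce
    return total / len(labels)


# Toy example with two pixels; the second is poorly predicted.
toy_probs = [[0.9, 0.1], [0.4, 0.6]]
toy_labels = [0, 0]
plain_loss = uncertainty_weighted_ce(toy_probs, toy_labels, [0.0, 0.0])
weighted_loss = uncertainty_weighted_ce(toy_probs, toy_labels, [0.0, 1.0])
```

With zero uncertainty everywhere the function reduces to ordinary cross-entropy, which makes it easy to compare against an unweighted baseline.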
3.2.2 Unsupervised learning
Semi-supervised learning exploits not only labeled images; unlabeled images are even more important for the learning process. The unsupervised strategy encourages the segmentation networks to output similar predictions for the same unlabeled image under perturbations. The deep co-training method minimizes the distance between the class distributions predicted by different models. However, the prediction obtained from a single forward pass is subject to chance, which limits the models' ability to fit the distribution of the unlabeled data.
We therefore also introduce uncertainty into unsupervised learning. As in the fully supervised phase, we perform stochastic forward passes on each model under random dropout, as shown in Fig. 3. Using predictive entropy as the metric, we obtain two uncertainty maps, one from each model. Unlike in the supervised stage, we combine the results of the two models here: we calculate the average of the two uncertainties and multiply the loss function by it as a weight coefficient. Since the calculated uncertainty fluctuates greatly, we normalize it in this way:
(4) 
where and are constants used for normalization. In order to keep the uncertainty within a controllable range, we set , .
In supervised learning, the models should pay more attention to the pixels with higher uncertainty. In unsupervised learning, by contrast, the models should learn more from pixels with lower uncertainty, because no labels are available at this stage. The estimated uncertainty guides the models to concentrate on reliable (low-uncertainty) predictions. We combine the uncertainties estimated by the two models and normalize the result. The uncertainty-aware ensemble agreement loss we use here is based on the Jensen-Shannon divergence:
(5) 
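A minimal sketch of an uncertainty-weighted agreement loss is given below. The Jensen-Shannon divergence itself is standard; the way the averaged uncertainty enters the loss, here a weight `exp(-u)` that favours low-uncertainty pixels, is an assumption for illustration and does not reproduce the normalization of Eq. 4 or the exact form of Eq. 5. Function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def agreement_loss(preds_a, preds_b, unc_a, unc_b):
    """Mean per-pixel JS divergence between two models' predictions.

    Each pixel is weighted by exp(-u), where u is the average of the two
    models' uncertainties. This weighting is an ASSUMPTION: it only encodes
    that low-uncertainty pixels should count more.
    """
    total = 0.0
    for p, q, ua, ub in zip(preds_a, preds_b, unc_a, unc_b):
        u = 0.5 * (ua + ub)
        total += math.exp(-u) * js_divergence(p, q)
    return total / len(preds_a)


# Toy example: two pixels on which the models fully disagree.
toy_a = [[1.0, 0.0], [1.0, 0.0]]
toy_b = [[0.0, 1.0], [0.0, 1.0]]
loss_certain = agreement_loss(toy_a, toy_b, [0.0, 0.0], [0.0, 0.0])
loss_uncertain = agreement_loss(toy_a, toy_b, [2.0, 2.0], [2.0, 2.0])
```

The same disagreement is penalized less when both models are uncertain about the pixel, so unreliable predictions have less influence on the unsupervised update.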
3.2.3 When to introduce uncertainty?
Introducing uncertainty into the models can improve the performance of the network, but introducing it at an inappropriate time can break the entire framework.
In the supervised stage, the models theoretically do not need prior knowledge to avoid being steered in the wrong direction, so the sooner uncertainty is introduced to guide network learning, the better the performance. We verify this with experiments that introduce uncertainty at different epochs. The results are shown in Fig. 4. Considering the performance in both DSC and HD, we leverage uncertainty from the beginning of the supervised stage.
In the unsupervised stage, the models' prediction error on unlabeled images is large at the beginning of training. This leads to a large fluctuation range of the uncertainty, which does not help the model focus on where it should learn more. We therefore tested adding uncertainty to the unsupervised stage at different epochs; the results are also shown in Fig. 4. We found that introducing uncertainty in the unsupervised stage requires the models to already have a certain understanding of the target distribution.
Taking all factors into account, we introduce uncertainty at epoch 0 for the supervised stage and at epoch 20 for the unsupervised stage in all experiments with our approach.
3.2.4 Adversarial learning
Diversity between the models in the ensemble is indispensable. In co-training, diversity is essential for avoiding the collapse of the decision boundaries, so that the models can learn from each other during training. The deep co-training method adopts the approach proposed by Qiao et al. for image classification and augments the dataset with adversarial examples. The adversarial examples are generated from both labeled and unlabeled data.
Each model uses its adversarial examples to teach the other model in the ensemble. In the original deep co-training method, the diversity loss is:
$$\mathcal{L}_{div} = \mathbb{E}_{x} \left[ H\big(f_1(x), f_2(g_1(x))\big) + H\big(f_2(x), f_1(g_2(x))\big) \right] \qquad (6)$$
where $H$ refers to cross entropy, $x$ is an input image, and $f_1$, $f_2$ are the two models. $g_i(x)$ is an adversarial example targeting model $f_i$. So $g_1(x)$ is an adversarial example for model 1, and we have $f_1(x) \neq f_1(g_1(x))$. Minimizing the first term of Eq. 6 makes $f_2(g_1(x))$ approximately equal to $f_1(x)$. Combining these relations, we obtain $f_1(g_1(x)) \neq f_2(g_1(x))$. Applying the same idea to model 2, we conclude that the models diverge in their predictions on adversarial examples.
Adversarial examples can be obtained by adding small perturbations to input images. As described in [13], we use different schemes to generate these examples depending on the source of the image $x$. We apply the Virtual Adversarial Training (VAT) method when $x$ is an unlabeled image, and we use the Fast Gradient Sign Method (FGSM) when $x$ comes from the labeled dataset.
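To illustrate the FGSM step for labeled images, the sketch below perturbs an input in the direction of the sign of the loss gradient. A tiny logistic classifier with an analytic gradient stands in for the segmentation network, which is an assumption made only to keep the example self-contained; the weights, input, and function names are hypothetical. VAT, which needs no label, is not covered here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss_and_input_grad(w, x, y):
    """Binary cross-entropy of a linear classifier and its gradient w.r.t. x.

    w: weights, x: input features, y: label in {0, 1}.
    Toy stand-in for a segmentation network (assumption).
    """
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = sigmoid(z)
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    # dL/dz = p - y, so dL/dx_i = (p - y) * w_i.
    grad_x = [(p - y) * wi for wi in w]
    return loss, grad_x


def fgsm(w, x, y, eps):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(dL/dx)."""
    _, grad_x = bce_loss_and_input_grad(w, x, y)
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad_x)]


# Hypothetical weights and input; eps = 0.03 as in the FGSM setting of the paper.
w = [2.0, -1.0]
x = [0.5, 0.2]
x_adv = fgsm(w, x, y=1, eps=0.03)
clean_loss, _ = bce_loss_and_input_grad(w, x, 1)
adv_loss, _ = bce_loss_and_input_grad(w, x_adv, 1)
```

The perturbation is bounded by `eps` in every coordinate, yet it moves the input in the direction that increases the loss of the targeted model.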
Method  DSC(%)  

RV  Myo  LV  Mean  
Full  
MT  
UAMT  
Part  avg  
vot  
DCT  avg  
vot  
Ours  avg  
vot  86.59(0.17)  86.65(0.35)  92.74(0.33)  88.45(0.12)  
Method  HD(mm)  
RV  Myo  LV  Mean  
Full  
MT  
UAMT  
Part  avg  
vot  
DCT  avg  
vot  4.84(0.57)  
Ours  avg  
vot  3.65(0.18)  2.68(0.08)  3.48(0.05) 
Method  DSC(%)  
5  10  20  30  40  50  
Part  
IND  avg  
vot  
DCT  avg  
vot  
Ours  avg  
vot  83.16  88.01  88.57  89.41  89.86  90.89  
Method  HD(mm)  
5  10  20  30  40  50  
Part  
IND  avg  
vot  
DCT  avg  
vot  
Ours  avg  
vot  4.70  3.42  3.54  3.09  3.15  2.78 
4 Experiments and results
4.1 Dataset and metrics
We evaluated our method on three publicly available medical image segmentation benchmark datasets: the Automated Cardiac Diagnosis Challenge (ACDC) [31], the Spinal Cord Gray Matter Challenge (SCGM) [32], and the Spleen subtask dataset of the Medical Segmentation Decathlon Challenge [33].
ACDC dataset: The University Hospital of Dijon created the ACDC dataset from real clinical exams. The ACDC dataset covers several well-defined pathologies with enough cases. It consists of 200 short-axis cine-MRI scans from 100 patients. The dataset is divided into 5 evenly distributed medical groups: patients without cardiac disease, myocardial infarction, hypertrophic cardiomyopathy, abnormal right ventricles, and dilated cardiomyopathy. The segmentation masks contain four regions of interest: right ventricle endocardium (RV), left ventricle endocardium (LV), left ventricle myocardium (Myo), and background. We used 75 subjects (150 scans) for training and 25 subjects (50 scans) for testing. All short-axis slices within the 3D MRI scans were resized to 256 × 256 as 2D images.
SCGM dataset: The Spinal Cord Gray Matter Challenge was organized to test different capabilities of various methods. The dataset of this challenge is a publicly available collection of multi-vendor, multi-center MRI data. All data were acquired at 4 different sites: University College London, Polytechnique Montreal, University of Zurich, and Vanderbilt University. It contains 80 healthy subjects (age range of 28.3 to 44.3 years), with 20 subjects from each center. The training set contains 40 labeled scans, each annotated slice-wise by 4 independent experts. The ground truth mask is obtained by majority voting. In our work, we use 30 images from the first center as the labeled dataset and 465 images from all centers as the unlabeled dataset. The test set contains 264 labeled images from center 3 and center 4. The slices are center-cropped to 200 × 200 pixels.
Spleen dataset: The Spleen dataset is one of the ten subtasks of the Medical Segmentation Decathlon Challenge [33]. This publicly available dataset consists of CT scans of patients undergoing chemotherapy treatment for liver metastases. It includes 61 portal venous phase CT scans (only 41 were given with ground truth). The ground truth segmentation was generated by a semi-automatic segmentation software and identified by an expert abdominal radiologist. Each slice obtained from the CT scans is resized to 256 × 256 pixels. We split the dataset into labeled (4 patients), unlabeled (32 patients), and validation (5 patients) subsets.
Dice similarity coefficient (DSC): DSC measures the overlap between the prediction of the model $S$ and the ground truth $G$:
$$\mathrm{DSC}(G, S) = \frac{2\,|G \cap S|}{|G| + |S|} \qquad (7)$$
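As a small worked example of the overlap measure, the function below computes the Dice coefficient of two binary masks given as flat lists of 0/1 values. The representation and the convention of returning 1.0 when both masks are empty are assumptions made for this sketch.

```python
def dice_coefficient(pred, gt):
    """Dice similarity coefficient of two binary masks (flat 0/1 lists).

    Returns 1.0 when both masks are empty (a convention assumed here).
    """
    intersection = sum(1 for p, g in zip(pred, gt) if p and g)
    size_sum = sum(1 for p in pred if p) + sum(1 for g in gt if g)
    if size_sum == 0:
        return 1.0
    return 2.0 * intersection / size_sum


# Toy masks: the prediction marks two pixels, one of which is correct.
toy_pred = [1, 1, 0, 0]
toy_gt = [1, 0, 0, 0]
toy_dsc = dice_coefficient(toy_pred, toy_gt)  # 2 * 1 / (2 + 1)
```

For a multi-class segmentation, the coefficient would be computed per class on the corresponding binary masks and then averaged.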
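Hausdorff distance (HD): HD is a boundary distance metric that measures the largest distance (in mm) between a point in the prediction $S$ and the closest point in the ground truth $G$, taken symmetrically over both sets:
$$\mathrm{HD}(G, S) = \max \left\{ \max_{s \in S} \min_{g \in G} \| s - g \|_2,\ \max_{g \in G} \min_{s \in S} \| g - s \|_2 \right\} \qquad (8)$$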
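The boundary distance can be illustrated with a brute-force sketch over two small point sets. It assumes the symmetric form of the Hausdorff distance, non-empty sets, and coordinates already expressed in physical units; converting pixel indices to mm would require the pixel spacing, which is not modelled here. It needs Python 3.8 or later for `math.dist`.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy boundary points (hypothetical coordinates).
boundary_pred = [(0.0, 0.0), (1.0, 0.0)]
boundary_gt = [(0.0, 0.0), (4.0, 0.0)]
toy_hd = hausdorff_distance(boundary_pred, boundary_gt)
```

The brute-force search is quadratic in the number of boundary points, which is acceptable for a sketch but slow for full-resolution masks.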
4.2 Implementation Details
For the sake of speed and accuracy, we use the well-known lightweight ENet [30] as our basic segmentation network. This architecture is one of the most popular models for segmentation. To adapt ENet as a Bayesian network for uncertainty estimation, one dropout layer is added between the encoder and decoder of ENet. All datasets are augmented with random rotation, random cropping, and flipping. We set the perturbation magnitude to 0.03 in FGSM and 10 in VAT. The learning rate is decreased by a factor of 10 every 90 epochs, and every 100 epochs in the SCGM experiments. To describe the level of supervision, we vary the ratio of labeled samples, between 0 and 1, in our experiments.
For all experiments, the loss weighting is set with a dynamic strategy: we use a ramp-up schedule that follows a Gaussian ramp-up curve. For all other hyperparameter settings, we follow [13]. The framework is implemented with the PyTorch library and trained on one NVIDIA 2080Ti GPU.
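A Gaussian ramp-up schedule can be sketched as below. The form `exp(-5 (1 - t)^2)` is the variant commonly used in consistency-based semi-supervised training and is an assumption here, as are the function name, the ramp-up length, and the maximum weight; the exact settings of the experiments are not reproduced.

```python
import math


def gaussian_rampup(epoch, rampup_length, max_weight=1.0):
    """Weight that rises smoothly from about 0 to max_weight.

    Uses the commonly used form exp(-5 * (1 - t)^2) with t = epoch / rampup_length,
    clipped to [0, 1]. The constant 5 and the parameters are assumptions.
    """
    if rampup_length <= 0:
        return max_weight
    t = min(max(epoch, 0), rampup_length) / rampup_length
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)


# Hypothetical schedule over an 80-epoch ramp-up.
schedule = [gaussian_rampup(e, 80) for e in (0, 20, 40, 80, 120)]
```

The weight starts near zero, so the unsupervised terms have little influence while the predictions are still unreliable, and it stays at the maximum once the ramp-up length is reached.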
4.3 Comparison with semi-supervised methods
Our uncertainty-aware co-training method is compared against several recent state-of-the-art semi-supervised methods in the medical domain:
Mean Teacher: Mean Teacher (MT) is a method that uses multiple deep CNNs for semi-supervised segmentation. It is an effective approach that averages model weights instead of predictions.
Uncertainty-aware Mean Teacher: Uncertainty-aware Mean Teacher (UAMT) applies uncertainty estimation to MT.
Deep Co-training: Deep Co-training (DCT) shares information between simultaneously trained models while preserving their diversity.
We also compare our method with fully supervised training on the same amount of labeled images as the semi-supervised methods, referred to as "Part". "IND" denotes the performance of individually trained models (Independent). We also report a fully supervised baseline ("Full"), which trains a single model on all available data.
For these baselines, we use the same learning rate decay, weight scheduler, data augmentation settings, and optimization as for our method. For MT and UAMT, we set the EMA (exponential moving average) parameter to 0.99. We report both the average performance of the individual models ("avg") and the performance obtained by combining the predictions of all models with ensemble soft voting ("vot"). This strategy leads to higher accuracy than the predictions of individual models for both DSC and HD. To reduce the effect of chance, we compute the average and standard deviation over three runs with different random seeds.
All baseline results reported in this paper were obtained with our own reimplementations.
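The "vot" entries rely on ensemble soft voting, which can be sketched as averaging the class probabilities of all models at each pixel and taking the most probable class. The data layout (one list of per-pixel probability vectors per model) and the function name are assumptions made for this illustration.

```python
def soft_vote(prob_maps):
    """Ensemble soft voting over per-pixel class probabilities.

    prob_maps: one entry per model; each entry is a list of pixels, and
    each pixel is a list of class probabilities. Returns one label per pixel.
    """
    num_models = len(prob_maps)
    labels = []
    for pixel_probs in zip(*prob_maps):
        # Average each class probability across the models.
        mean = [sum(class_probs) / num_models for class_probs in zip(*pixel_probs)]
        labels.append(max(range(len(mean)), key=mean.__getitem__))
    return labels


# Toy example: the two models disagree on the first pixel.
model_a = [[0.6, 0.4], [0.3, 0.7]]
model_b = [[0.1, 0.9], [0.4, 0.6]]
voted = soft_vote([model_a, model_b])
```

Unlike hard majority voting, soft voting lets a confident model outweigh a hesitant one, which is why the first pixel follows the second model in this example.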
Method  DSC(%)  HD(mm)  

Full  
Mean Teacher  
UAMT  
Independent  avg  
vot  
DCT  avg  
vot  
Ours  avg  
vot  74.31(0.30)  9.01(0.73) 
Method  DSC(%)  HD(mm)  

Full  
Mean Teacher  
UAMT  
Independent  avg  
vot  
DCT  avg  
vot  
Ours  avg  
vot  95.45(0.24)  5.57(0.21) 
Method  DSC(%)  
5  10  
Independent  avg  
vot  
DCT  avg  
vot  
Ours  avg  
vot  89.02(1.05)  91.94(0.77) 
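| Method | | RV DSC(%) | Myo DSC(%) | LV DSC(%) | Mean DSC(%) | RV HD(mm) | Myo HD(mm) | LV HD(mm) | Mean HD(mm) |
|---|---|---|---|---|---|---|---|---|---|
| DCTseg | avg | 81.76 | 82.21 | 91.75 | 85.24 | 9.31 | 9.16 | 5.21 | 7.89 |
| | vot | 82.84 | 83.55 | 92.53 | 86.31 | 5.64 | 4.26 | 2.90 | 4.27 |
| Supervised stage via uncertainty | avg | 85.79 | 85.75 | 92.11 | 87.88 | 6.66 | 6.60 | 4.59 | 5.95 |
| | vot | 86.51 | 86.50 | 92.17 | 88.39 | 3.93 | 3.52 | 2.76 | 3.40 |
| Unsupervised stage via uncertainty | avg | 84.55 | 84.73 | 92.26 | 87.18 | 7.42 | 5.75 | 4.50 | 5.89 |
| | vot | 85.29 | 85.37 | 92.45 | 87.70 | 4.42 | 3.41 | 2.72 | 3.52 |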
4.4 Experimental results
4.4.1 ACDC dataset
Our uncertainty-aware method is first evaluated on the ACDC dataset with a labeled ratio of 0.2. Tab. 2 presents the quantitative results. Compared with the other methods, our approach achieves higher overall performance on the test dataset in terms of DSC and HD. In particular, our approach improves on co-training by 2.29% in DSC and 0.49 mm in HD overall, and the standard deviation of its results over multiple trials (three runs) is the smallest. With only 20% of the images labeled, we are only 1.8% in DSC and 0.39 mm in HD behind supervision with a 100% label ratio. Some example results from the test dataset are shown in Fig. 5. Our method gives contours closer to the ground truth (GT), with more accurate segmentation of the details between different regions.
We also evaluate how the label ratio affects the results in a dual-view setting. Tab. 3 shows the results for different labeled data ratios: 5%, 10%, 20%, 30%, 40%, and 50%. As the label ratio increases, the mean DSC values increase sharply and the mean HD values decrease. In all cases, our approach performs better in DSC and HD than training the models separately and than the deep co-training method. The advantage of our approach is more pronounced at low label ratios. With a 5% label ratio, our method outperforms deep co-training by 2.73% in DSC and 1.34 mm in HD.
4.4.2 SCGM dataset
We evaluate our method on the task of segmenting spinal cord grey matter in images from the SCGM dataset. The SCGM data come from four different clinical centers, so the MRI images were acquired with different parameters. We use only a few labeled images (30 images from one center), and the test set comes from other centers. The labeling ratio is about 6.5%. Because of the different data sources, it is more difficult to extract image features from the samples, which also leads to the low segmentation accuracy of semi-supervised methods on this task. Uncertainty helps to address this problem: guided by the uncertainty, the models can learn effective semantic features from the different centers.
The results are summarized in Tab. 4. Our approach improves on the best baseline (deep co-training) by 5.96% in mean DSC and 6.35 mm in HD, and on the fully supervised baseline by 12.94% in mean DSC and 23.26 mm in HD. Fig. 6 shows segmentation results on the test dataset. In some difficult-to-recognize images, fully supervised training cannot even segment the lesion area, whereas our approach can complete the segmentation task to a certain extent.
4.4.3 Spleen dataset
We also validate the effectiveness of our uncertainty-aware deep co-training method on the task of segmenting the spleen in CT images from the Spleen dataset. The experiments use 2D slices of the CT scans resized to a resolution of 256 × 256 pixels. Tab. 5 summarizes the experimental results. The performance of the deep co-training method is almost the same as that of the fully supervised baseline. After taking uncertainty into account, the accuracy improves and surpasses full supervision, with better stability (lower standard deviation) of the results. Our method improves DSC over DCTseg from 94.09% to 95.45% and HD from 6.86 mm to 5.57 mm. Examples of segmentation results obtained by the tested methods are given in Fig. 7.
Semi-supervised learning aims to achieve better segmentation accuracy with fewer labeled images. We show the performance of our approach, DCTseg, and individually trained models (Independent) at smaller label ratios: 5% and 10%. Tab. 6 gives the mean DSC with standard deviation. Our approach remains in the leading position among all semi-supervised methods.
4.4.4 Ablation analysis of our method
We perform ablation studies on the ACDC dataset to show that our method is effective in both the supervised and the unsupervised stages. The results are shown in Tab. 7.
In the supervised stage: As mentioned before, the best time to introduce uncertainty into the fully supervised phase is from the beginning. Uncertainty allows the models to better capture the characteristics of the target area under full supervision. Fig. 8 shows that the main effect of uncertainty in the fully supervised stage occurs in the first 30 epochs. Comparing with the labels, the uncertainty maps focus more on the lesion area. After adding uncertainty to the fully supervised stage, the method achieves an overall improvement over co-training of 2.23% in DSC and 0.57 mm in HD.
In the unsupervised stage: The best time to leverage uncertainty at this stage is after the models have acquired a certain understanding of the semantic features of the lesion areas. After experiments at intervals of 10 epochs starting from epoch 0, we determined that uncertainty should be leveraged after the 20th epoch. We visualize the uncertainty maps from epoch 0 to the last epoch in Fig. 9. The uncertainty maps increasingly focus on the lesion areas, and uncertainty plays a role throughout the experiment. As training progresses, the color of the lesion areas in the uncertainty maps becomes lighter, which means that the predictions of the two models for the same unlabeled image gradually converge, although a certain difference remains. Introducing uncertainty only in the unsupervised stage yields an overall improvement over co-training of 1.39% in DSC and 0.75 mm in HD.
5 Conclusion
In this paper, we propose a novel uncertainty-aware deep co-training method and apply it to three medical image segmentation tasks. Our approach uses uncertainty in both the supervised and the unsupervised learning stages of the deep co-training method. We use uncertainty obtained from Monte Carlo sampling to guide the training process. We validate our method on three challenging medical image datasets, and the comparison with other semi-supervised methods confirms its effectiveness. On the ACDC dataset, our uncertainty-aware co-training method comes within 1.88% DSC of full supervision with only a 20% label ratio. On the SCGM dataset, it improves DSC by 12.94% over the fully supervised method, and on the Spleen dataset it exceeds the fully supervised approach by 0.89% in DSC. In future work, we will investigate the effect of different uncertainty estimation approaches and apply our method to other semi-supervised medical image segmentation tasks.
References
 (1) M. Z. Khan, M. K. Gajendran, Y. Lee, M. A. Khan, Deep neural architectures for medical image semantic segmentation, IEEE Access (2021).
 (2) H. Noh, S. Hong, B. Han, Learning deconvolution network for semantic segmentation, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1520–1528.
 (3) R. Yang, Y. Yu, Artificial convolutional neural network in object detection and semantic segmentation for medical imaging analysis, Frontiers in Oncology 11 (2021) 573.
 (4) S. A. Taghanaki, K. Abhishek, J. P. Cohen, J. Cohen-Adad, G. Hamarneh, Deep semantic segmentation of natural and medical images: a review, Artificial Intelligence Review 54 (1) (2021) 137–178.
 (5) L. Schmarje, M. Santarossa, S.-M. Schröder, R. Koch, A survey on semi-, self- and unsupervised learning for image classification, IEEE Access (2021).
 (6) V. Cheplygina, M. de Bruijne, J. P. Pluim, Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis, Medical Image Analysis 54 (2019) 280–296.
 (7) Y. Meng, H. Zhang, Y. Zhao, X. Yang, X. Qian, X. Huang, Y. Zheng, Spatial uncertainty-aware semi-supervised crowd counting, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15549–15559.
 (8) Y. Li, X. Chao, Semi-supervised few-shot learning approach for plant diseases recognition, Plant Methods 17 (1) (2021) 1–10.
 (9) C. Baur, S. Albarqouni, N. Navab, Semi-supervised deep learning for fully convolutional networks, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2017, pp. 311–319.
 (10) X. Li, L. Yu, H. Chen, C.-W. Fu, P.-A. Heng, Semi-supervised skin lesion segmentation via transformation consistent self-ensembling model, in: Proceedings of the British Machine Vision Conference (BMVC), 2018.
 (11) D.-H. Lee, et al., Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks, in: Workshop on challenges in representation learning, ICML, Vol. 3, 2013, p. 896.
 (12) W.-C. Hung, Y.-H. Tsai, Y.-T. Liou, Y.-Y. Lin, M.-H. Yang, Adversarial learning for semi-supervised semantic segmentation, in: Proceedings of the British Machine Vision Conference (BMVC), 2018.
 (13) J. Peng, G. Estrada, M. Pedersoli, C. Desrosiers, Deep co-training for semi-supervised image segmentation, Pattern Recognition 107 (2020) 107269.
 (14) X. J. Zhu, Semi-supervised learning literature survey (2005).
 (15) W. Bai, O. Oktay, M. Sinclair, H. Suzuki, M. Rajchl, G. Tarroni, B. Glocker, A. King, P. M. Matthews, D. Rueckert, Semi-supervised learning for network-based cardiac MR image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2017, pp. 253–260.
 (16) L. Yu, S. Wang, X. Li, C.-W. Fu, P.-A. Heng, Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, 2019.
 (17) K. Fang, W.-J. Li, DMNet: Difference minimization network for semi-supervised segmentation in medical images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2020, pp. 532–541.
 (18) W.-C. Hung, Y.-H. Tsai, Y.-T. Liou, Y.-Y. Lin, M.-H. Yang, Adversarial learning for semi-supervised semantic segmentation, arXiv preprint arXiv:1802.07934 (2018).
 (19) Y. Wu, M. Xu, Z. Ge, J. Cai, L. Zhang, Semi-supervised left atrium segmentation with mutual consistency training, arXiv preprint arXiv:2103.02911 (2021).
 (20) X. Zhang, Z. Cui, C. Chen, J. Wei, J. Lou, W. Hu, H. Zhang, T. Zhou, F. Shi, D. Shen, Confidence-aware cascaded network for fetal brain segmentation on MR images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2021, pp. 584–593.
 (21) W. Edeling, H. Arabnejad, R. Sinclair, D. Suleimenova, K. Gopalakrishnan, B. Bosak, D. Groen, I. Mahmood, D. Crommelin, P. V. Coveney, The impact of uncertainty on predictions of the CovidSim epidemiological code, Nature Computational Science 1 (2) (2021) 128–135.
 (22) M. Abdar, F. Pourpanah, S. Hussain, D. Rezazadegan, L. Liu, M. Ghavamzadeh, P. Fieguth, X. Cao, A. Khosravi, U. R. Acharya, et al., A review of uncertainty quantification in deep learning: Techniques, applications and challenges, Information Fusion (2021).
 (23) C. E. Papadopoulos, H. Yeung, Uncertainty estimation and Monte Carlo simulation method, Flow Measurement and Instrumentation 12 (4) (2001) 291–298.
 (24) A. Kendall, Y. Gal, What uncertainties do we need in Bayesian deep learning for computer vision?, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 5580–5590.
 (25) B. Lakshminarayanan, A. Pritzel, C. Blundell, Simple and scalable predictive uncertainty estimation using deep ensembles (2017). arXiv:1612.01474.
 (26) A. Atanov, A. Ashukha, D. Molchanov, K. Neklyudov, D. Vetrov, Uncertainty estimation via stochastic batch normalization, arXiv preprint arXiv:1802.04893 (2018).
 (27) C. Louizos, M. Welling, Multiplicative normalizing flows for variational Bayesian neural networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 2218–2227.
 (28) U. Upadhyay, V. P. Sudarshan, S. P. Awate, Uncertainty-aware GAN with adaptive loss for robust MRI image enhancement, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 3255–3264.
 (29) X. Yan, S. Hu, Y. Mao, Y. Ye, H. Yu, Deep multi-view learning methods: a review, Neurocomputing (2021).
 (30) A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, ENet: A deep neural network architecture for real-time semantic segmentation (2016). arXiv:1606.02147.
 (31) O. Bernard, A. Lalande, C. Zotti, F. Cervenansky, X. Yang, P.-A. Heng, I. Cetin, K. Lekadir, O. Camara, M. A. Gonzalez Ballester, G. Sanroma, S. Napel, S. Petersen, G. Tziritas, E. Grinias, M. Khened, V. A. Kollerathu, G. Krishnamurthi, M.-M. Rohé, X. Pennec, M. Sermesant, F. Isensee, P. Jäger, K. H. Maier-Hein, P. M. Full, I. Wolf, S. Engelhardt, C. F. Baumgartner, L. M. Koch, J. M. Wolterink, I. Išgum, Y. Jang, Y. Hong, J. Patravali, S. Jain, O. Humbert, P.-M. Jodoin, Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved?, IEEE Transactions on Medical Imaging 37 (11) (2018) 2514–2525. doi:10.1109/TMI.2018.2837502.
 (32) F. Prados, J. Ashburner, C. Blaiotta, T. Brosch, J. Carballido-Gamio, M. J. Cardoso, B. N. Conrad, E. Datta, G. Dávid, B. De Leener, et al., Spinal cord grey matter segmentation challenge, NeuroImage 152 (2017) 312–329.
 (33) A. L. Simpson, M. Antonelli, S. Bakas, M. Bilello, K. Farahani, B. van Ginneken, A. Kopp-Schneider, B. A. Landman, G. Litjens, B. Menze, O. Ronneberger, R. M. Summers, P. Bilic, P. F. Christ, R. K. G. Do, M. Gollub, J. Golia-Pernicka, S. H. Heckers, W. R. Jarnagin, M. K. McHugo, S. Napel, E. Vorontsov, L. Maier-Hein, M. J. Cardoso, A large annotated medical image dataset for the development and evaluation of segmentation algorithms (2019). arXiv:1902.09063.