
Uncertainty-Aware Deep Co-training for Semi-supervised Medical Image Segmentation

by Xu Zheng, et al.

Semi-supervised learning has made significant strides in the medical domain because it alleviates the heavy burden of collecting abundant pixel-wise annotated data for semantic segmentation tasks. Existing semi-supervised approaches enhance the ability to extract features from unlabeled data using prior knowledge obtained from limited labeled data. However, because labeled data are scarce, the features the models extract during supervised learning are limited, and the quality of the predictions on unlabeled data cannot be guaranteed. Both issues impede consistency training. To this end, we propose a novel uncertainty-aware scheme that makes models learn regions purposefully. Specifically, we employ Monte Carlo Sampling to obtain an uncertainty map, which serves as a weight for the losses and forces the models to focus on valuable regions according to the characteristics of supervised and unsupervised learning. Simultaneously, in the backward pass, we combine the unsupervised and supervised losses to accelerate network convergence by enhancing the gradient flow between the different tasks. We conduct extensive experiments on three challenging medical datasets, and the results show clear improvements over state-of-the-art counterparts.


1 Introduction

Semantic segmentation is a fundamental computer vision task that is particularly important for medical image analysis. It serves as a preprocessing step for the treatment of different medical conditions. In recent years, convolutional neural network (CNN) based methods have achieved tremendous improvements deconvolutionnetworkforsemanticsegmentation Artificial-Convolutional-Neural-Network. With abundant pixel-wise labeled data, CNNs can achieve outstanding performance thanks to their strong nonlinear fitting capability Deep-semantic-segmentation. However, pixel-wise annotated images are expensive to obtain, especially in the medical domain: annotation is time-consuming and relies heavily on experienced experts. Semi-supervised methods alleviate this by leveraging not only labeled data but also unlabeled data A-survey-on-semi, reducing the labeling burden on researchers. Therefore, semi-supervised learning has attracted great attention Not-so-supervised Semi-Supervised-Crowd-Counting semi-supervised-plant.

The currently dominant semi-supervised learning methods in deep learning include self-training Semi4FCN, consistency regularization self-ensembling, pseudo labeling Pseudo-label, and adversarial learning AdversarialLearning. All of these approaches rest on the assumption that semi-supervised methods exploit intrinsic properties of the dataset distribution rather than of individual images, so parameter optimization can proceed with a combination of annotated and unannotated images rather than with labeled images only. For example, self-training self-ensembling selects the most reliable predictions of the current model as labels (pseudo labels) to complement the labeled dataset. Other approaches, such as co-training Deep-co-training, exchange internal information among an ensemble of models to increase their consistency. These approaches are all effective to a certain degree, but how to leverage unannotated data more effectively remains one of the central open issues in semi-supervised learning.

2 Related work

2.1 Semi-supervised segmentation

Semi-supervised learning is an indispensable part of deep learning theory Semi-supervised-learning-literature-survey. Recently, many works have applied semi-supervised approaches to medical image segmentation to segment human organs Semi-supervised-learning-for-MR-image. Mohamed et al. presented an innovative semi-supervised approach for efficient segmentation of lung CT scans from COVID-19 patients. Yu et al. Uncertainty-aware proposed a teacher-student model to segment the 3D left atrium via self-ensembling. Li et al. self-ensembling proposed a semi-supervised segmentation method for skin lesions that enforces consistency between the student and teacher models. Fang et al. Dmnet used a co-training framework to boost each sub-model for kidney tumor segmentation. They also applied adversarial training Adversarial-learning to their network to constrain the models to output invariant results under different perturbations of the input data.

Similar to co-training, Peng et al. Deep-co-training presented a method based on an ensemble of deep segmentation networks: different models are trained on corresponding subsets of the annotated data, and non-annotated images are used to exchange internal information among the sub-models. Semi-supervised methods differ mainly in how they utilize unlabeled data and how they relate to supervised algorithms A-survey-on-semi. Most approaches use predictions on unlabeled data for consistency training Mutual-Consistency-Training. However, the model's predictions on unlabeled data, which are used to exchange information, are unstable. Additional components are therefore needed to capture the confidence of each pixel of the prediction Confidence-Aware, so we introduce uncertainty estimation The-impact-of-uncertainty into our approach.

2.2 Uncertainty Estimation

Uncertainty estimation plays a pivotal role in reducing the impact of randomness during both optimization and decision-making processes A-review-of-uncertainty-quantification. Knowing how confident we can be in a neural network's predictions is essential for decision making. Bayesian probability theory offers a mathematically grounded framework for reasoning about the uncertainty of a data-driven model, known as the Bayesian Neural Network (BNN). A BNN learns a distribution over each of the network's weight parameters. However, exact Bayesian inference is computationally intractable in practice.

Deep ensembles Deep-Ensembles are an alternative to BNNs that estimate uncertainty from the variance across many trained models. In addition, several works have applied other uncertainty estimation methods to semantic segmentation, including Stochastic Batch Normalization Uncertainty-estimation-via-stochastic and Multiplicative Normalizing Flows Multiplicative-normalizing-flows. Upadhyay et al. Uncertainty-Aware-GAN proposed a GAN-based framework that estimates the per-voxel uncertainty of predictions on magnetic resonance imaging (MRI). Kendall and Gal What-Uncertainties-Do-We-Need studied per-pixel semantic segmentation models in a Bayesian framework and proposed new uncertainty-based loss functions that can be interpreted as learned attenuation. Wu et al. Mutual-Consistency-Training argued that deep models require extra components to obtain performance gains, and Yu et al. Uncertainty-aware introduced uncertainty into ensemble models using Monte Carlo sampling. Monte Carlo sampling has been shown to have considerable advantages over conventional methods for estimating uncertainty, especially for the outputs of complex measurement systems Uncertainty-estimation.

3 Methods

3.1 Problem formulation

Since manual annotation is time-consuming and expensive, often only a small fraction of the images in a dataset have full pixel-wise labels. Our method therefore aims to learn from both the labeled images and the abundant unlabeled images using an uncertainty-aware deep co-training methodology. We formalize semi-supervised medical image segmentation as follows.

Given a labeled dataset $\mathcal{D}_L=\{(x_i,y_i)\}_{i=1}^{N}$ containing $N$ labeled examples, each example consists of an input image $x_i\in\mathbb{R}^{H\times W}$ and its corresponding pixel-level label $y_i\in\{1,\dots,C\}^{H\times W}$, where $C$ is the number of classes and $H\times W$ is the spatial dimension. In the semi-supervised setting, we also have an unlabeled dataset $\mathcal{D}_U=\{x_j\}_{j=1}^{M}$ of $M$ unlabeled images, with $N\ll M$. The goal of semi-supervised segmentation is to train a segmentation model $f_\theta$ with parameters $\theta$ on $\mathcal{D}_L\cup\mathcal{D}_U$ that maps each pixel of an input image to its correct label.

Figure 1: Illustration of supervised learning. Each model predicts the images from its corresponding labeled subset and computes the supervised loss with the labels (red lines). Simultaneously, uncertainty is estimated by Monte Carlo sampling: each model approximates its own uncertainty by performing multiple stochastic forward passes and uses the uncertainty as a weight coefficient on the supervised loss.

3.2 Proposed approach

We jointly train two models, $f_{\theta_1}$ and $f_{\theta_2}$, on the labeled dataset $\mathcal{D}_L$ and the unlabeled dataset $\mathcal{D}_U$ in a collaborative manner. Motivated by the performance gains that uncertainty estimation brings to Bayesian networks, we employ Monte Carlo sampling What-Uncertainties-Do-We-Need to estimate uncertainty. Specifically, we perform $T$ stochastic forward passes through each of the two models with random dropout, which yields a set of $T$ predictions, and we use the predictive entropy as the uncertainty metric:

$$\mu_c=\frac{1}{T}\sum_{t=1}^{T}p_t^c,\qquad u=-\sum_{c=1}^{C}\mu_c\log\mu_c \tag{1}$$

where $p_t^c$ is the probability of the $c$-th class in the $t$-th prediction and $\mu_c$ is its average over the $T$ passes. The uncertainty is estimated at the pixel level, so the uncertainty of the whole image is the map $U=\{u\}\in\mathbb{R}^{H\times W}$.
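The predictive entropy above can be sketched in NumPy as follows. This is a minimal illustration, assuming the $T$ softmax outputs of one model are stacked into an array of shape (T, C, H, W) and that the uncertainty is $u=-\sum_c\mu_c\log\mu_c$ with $\mu_c$ the mean of $p_t^c$ over the passes; the function name `predictive_entropy` and the small `eps` guard against log(0) are our own choices, not taken from the paper.

```python
import numpy as np

def predictive_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Pixel-wise predictive entropy of T Monte Carlo softmax samples.

    probs: shape (T, C, H, W); probs[t, :, i, j] is a probability vector.
    Returns the uncertainty map U with shape (H, W).
    """
    mu = probs.mean(axis=0)                      # mu_c = (1/T) * sum_t p_t^c, shape (C, H, W)
    return -(mu * np.log(mu + eps)).sum(axis=0)  # u = -sum_c mu_c * log(mu_c)

# Toy input: T=4 samples, C=3 classes, a 1x2 image.
T, C = 4, 3
agree = np.zeros((T, C, 1, 1))
agree[:, 0] = 1.0                         # every sample puts all its mass on class 0
spread = np.full((T, C, 1, 1), 1.0 / C)   # every sample is uniform over the classes
U = predictive_entropy(np.concatenate([agree, spread], axis=3))
# U[0, 0] is ~0 (confident pixel); U[0, 1] is ~log(3) (maximally uncertain pixel)
```

A pixel whose $T$ samples agree gets near-zero entropy, while a pixel whose averaged distribution is uniform reaches the maximum value $\log C$.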

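Following co-training methods for semi-supervised semantic segmentation, we train the ensemble with a loss function composed of a weighted sum of three terms: a supervised loss, an ensemble agreement (co-training) loss, and a diversity loss:

$$\mathcal{L}=\mathcal{L}_{sup}+\lambda_{cot}\,\mathcal{L}_{cot}+\lambda_{div}\,\mathcal{L}_{div} \tag{2}$$

Uncertainty is used to weight the first two terms of the loss function. The details are explained in the following subsections.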

3.2.1 Supervised learning

Following the standard multi-view learning approach Deep-multi-view-learning, we split the labeled dataset into two complementary subsets, $\mathcal{D}_L^1$ and $\mathcal{D}_L^2$. Training each model on its own labeled subset lets it fit the data distribution to a certain extent while also ensuring diversity between the two models.

We use E-Net enet as our network backbone. To adapt E-Net into a Bayesian network that can estimate uncertainty, we add one dropout layer between the encoder and decoder. The dropout layer is active during training and uncertainty estimation; since uncertainty does not need to be estimated at test time, it is turned off during testing.

By running $T$ forward passes through each model, we obtain $T$ different results, as shown in Fig. 1, and hence a set of softmax probability vectors for each pixel of the input. Using predictive entropy as the metric, we approximate the uncertainty maps, which express the model's confidence in each pixel prediction: if the prediction for a pixel fluctuates strongly across the $T$ forward passes, its uncertainty is large.
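The dropout-based Monte Carlo procedure can be illustrated with a toy stand-in for E-Net. The sketch below assumes a hypothetical per-pixel "encoder" and "decoder" made of random weight matrices (not the E-Net architecture) with a single dropout layer between them; dropout stays on for the $T$ stochastic passes and is switched off for a deterministic test-time prediction. The dropout rate of 0.5 and $T=8$ are illustrative values only.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D, K = 3, 4, 32                 # classes, input channels, hidden width (toy sizes)
W_enc = rng.normal(size=(K, D))    # toy per-pixel "encoder" weights (stand-in for E-Net)
W_dec = rng.normal(size=(C, K))    # toy per-pixel "decoder" weights

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def forward(x, mc_dropout, p=0.5):
    """x: (D, H, W) input -> (C, H, W) class probabilities."""
    h = np.maximum(np.einsum("kd,dhw->khw", W_enc, x), 0.0)  # encoder + ReLU
    if mc_dropout:                                            # dropout between encoder and decoder
        keep = rng.random(h.shape) >= p
        h = h * keep / (1.0 - p)                              # inverted-dropout rescaling
    return softmax(np.einsum("ck,khw->chw", W_dec, h), axis=0)

def mc_uncertainty(x, T=8, eps=1e-12):
    """Run T stochastic passes with dropout on and return the predictive-entropy map."""
    samples = np.stack([forward(x, mc_dropout=True) for _ in range(T)])  # (T, C, H, W)
    mu = samples.mean(axis=0)
    return -(mu * np.log(mu + eps)).sum(axis=0)                          # (H, W)

x = rng.normal(size=(D, 5, 5))
U = mc_uncertainty(x)                                # training / uncertainty estimation
pred = forward(x, mc_dropout=False).argmax(axis=0)   # testing: dropout off, deterministic
```

In a PyTorch model the same effect is usually obtained by keeping the dropout module in training mode during the $T$ passes and in evaluation mode at test time.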

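Guided by the estimated uncertainty maps $U_1$ and $U_2$, we identify unreliable predictions in the images and make the models learn more from these pixels using the corresponding labels. We design the uncertainty-aware supervised loss as the uncertainty-weighted pixel-level cross-entropy loss of the two models:

$$\mathcal{L}_{sup}=\sum_{i=1}^{2}\frac{1}{|\mathcal{D}_L^i|}\sum_{(x,y)\in\mathcal{D}_L^i}\frac{1}{HW}\sum_{k=1}^{HW}U_i^{(k)}\,H\big(y^{(k)},\,p_i^{(k)}(x)\big) \tag{3}$$

where $\mathcal{D}_L^i$ is the labeled subset of model $i$, $p_i(x)$ is the softmax output of model $i$, $U_i$ is its uncertainty map, $H(\cdot,\cdot)$ is the cross-entropy, and the superscript $(k)$ indexes the $HW$ pixels.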
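A NumPy sketch of the uncertainty-weighted supervised loss for one image is shown below; the two-model loss is the sum of two such terms, one per model and labeled subset. It assumes, as the text and the Figure 1 caption describe, that each model's raw uncertainty map multiplies its pixel-wise cross-entropy; the function names are hypothetical.

```python
import numpy as np

def uncertainty_weighted_ce(probs, labels, U, eps=1e-12):
    """Pixel-wise cross-entropy weighted by an uncertainty map, averaged over pixels.

    probs:  (C, H, W) softmax output of one model
    labels: (H, W) integer ground-truth classes
    U:      (H, W) uncertainty map of the same model (used as the weight)
    """
    rows, cols = np.indices(labels.shape)
    p_true = probs[labels, rows, cols]     # probability given to the true class at each pixel
    ce = -np.log(p_true + eps)             # pixel-wise cross-entropy with a one-hot label
    return float((U * ce).mean())

def supervised_loss(probs_pair, labels_pair, U_pair):
    """Sum of the two models' uncertainty-weighted losses on their own labeled subsets."""
    return sum(uncertainty_weighted_ce(p, y, u)
               for p, y, u in zip(probs_pair, labels_pair, U_pair))

# One 1x2 image, C=2: the model is confident and right on pixel 0, wrong on pixel 1.
probs = np.array([[[0.9, 0.2]],
                  [[0.1, 0.8]]])
labels = np.array([[0, 0]])
plain = uncertainty_weighted_ce(probs, labels, np.ones((1, 2)))           # ordinary mean CE
focused = uncertainty_weighted_ce(probs, labels, np.array([[0.0, 1.0]]))  # only the uncertain pixel counts
```

With a uniform weight of 1 the loss reduces to the ordinary mean cross-entropy; concentrating the weight on uncertain pixels shifts the gradient toward them.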


3.2.2 Unsupervised learning

Semi-supervised learning exploits not only labeled images; unlabeled images are even more important to the learning process. The unsupervised learning strategy encourages the segmentation networks to output similar predictions for the same unlabeled image under perturbations, and the deep co-training method minimizes the distance between the class distributions predicted by different models. However, the prediction obtained from a single forward pass is subject to chance, which limits the models' ability to fit the distribution of the unlabeled data.

We therefore also introduce uncertainty into unsupervised learning. As in the supervised phase, we perform $T$ stochastic forward passes through each model with random dropout, as shown in Fig. 2, and use predictive entropy as the metric to obtain the two models' uncertainty maps. Unlike the supervised stage, here we combine the results of the two models: we average the two uncertainty maps and multiply the result into the loss function as a weight coefficient. Since the calculated uncertainty fluctuates greatly, we normalize it as follows:


where the two constants are used for normalization and are set to fixed values to keep the uncertainty within a controllable range.

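In supervised learning, the models should pay more attention to pixels with higher uncertainty. In unsupervised learning, however, there are no labels, so the models should learn more from pixels with lower uncertainty; the estimated uncertainty guides the models to concentrate on reliable (low-uncertainty) predictions. We add the uncertainties $U_1$ and $U_2$ estimated by the two models and normalize the sum as $\hat{U}$. The uncertainty-aware ensemble agreement loss $\mathcal{L}_{cot}$ is based on the Jensen-Shannon divergence:

$$\mathcal{L}_{cot}=\frac{1}{|\mathcal{D}_U|}\sum_{x\in\mathcal{D}_U}\frac{1}{HW}\sum_{k=1}^{HW}\hat{U}^{(k)}\Big[\mathcal{H}\Big(\tfrac{1}{2}\big(p_1^{(k)}(x)+p_2^{(k)}(x)\big)\Big)-\tfrac{1}{2}\Big(\mathcal{H}\big(p_1^{(k)}(x)\big)+\mathcal{H}\big(p_2^{(k)}(x)\big)\Big)\Big] \tag{5}$$

where $\mathcal{H}(p)=-\sum_c p^c\log p^c$ is the entropy, $p_i(x)$ is the softmax output of model $i$, and the superscript $(k)$ indexes pixels.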
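The uncertainty-weighted Jensen-Shannon agreement loss can be sketched as follows, assuming per-pixel softmax maps for the two models. The paper normalizes the combined uncertainty with two constants whose exact form is not reproduced here, so the sketch takes the resulting per-pixel weight map as an input (`weight`, a hypothetical argument) rather than guessing the normalization.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy over the class axis 0 of a (C, H, W) probability map."""
    return -(p * np.log(p + eps)).sum(axis=0)

def weighted_js_loss(p1, p2, weight):
    """Uncertainty-weighted Jensen-Shannon agreement loss between two models.

    p1, p2: (C, H, W) softmax outputs of the two models on the same unlabeled image
    weight: (H, W) per-pixel weight derived from the normalized uncertainty
    """
    m = 0.5 * (p1 + p2)                                  # mixture of the two predictions
    js = entropy(m) - 0.5 * (entropy(p1) + entropy(p2))  # JS(p1, p2) at every pixel
    return float((weight * js).mean())

a = np.zeros((2, 1, 1))
a[0] = 1.0                            # model 1: class 0 with certainty
b = np.zeros((2, 1, 1))
b[1] = 1.0                            # model 2: class 1 with certainty
w = np.ones((1, 1))
same = weighted_js_loss(a, a, w)      # identical predictions: ~0
opposite = weighted_js_loss(a, b, w)  # disjoint predictions: ~log(2), the JS maximum
```

The divergence is symmetric and bounded by $\log 2$, and a zero weight removes a pixel from the agreement term entirely.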

Figure 2: Illustration of unsupervised learning. Two networks output predictions based on the knowledge from their respective views. We use the ensemble agreement loss, guided by the estimated uncertainty, to encourage the models' predictions to be consistent. The final uncertainty is the sum of the uncertainties produced by the two models with Monte Carlo dropout; each model performs multiple stochastic forward passes to obtain an uncertainty map.

3.2.3 When to introduce uncertainty?

Introducing uncertainty into the models can improve the performance of the network, but introducing it at an inappropriate time can make the entire framework collapse.

In the supervised stage, the ground-truth labels keep the models from moving in the wrong direction, so in theory the models need no prior knowledge before uncertainty is used, and the sooner uncertainty is introduced to guide learning, the better the performance. We verify this by introducing uncertainty at different epochs; the results are shown in Fig. 3. Considering the performance in both DSC and HD, we use uncertainty from the beginning of the supervised stage.

In the unsupervised stage, the models' prediction errors on unlabeled images are large at the beginning of training. This leads to a wide fluctuation range of the uncertainty, which prevents the models from focusing on where they should learn more. We therefore tested adding uncertainty to the unsupervised stage at different epochs; the results are shown in Fig. 3. They indicate that introducing uncertainty in the unsupervised stage requires the models to already have some understanding of the target distribution.

Taking all factors into account, we introduce uncertainty into the supervised stage at epoch 0 and into the unsupervised stage at epoch 20 in all experiments of our approach.

3.2.4 Adversarial learning

Diversity between the models in the ensemble is indispensable. In co-training, diversity prevents the models' decision boundaries from collapsing onto each other and allows the models to learn from each other during training. The deep co-training method adopts the approach that Qiao et al. proposed for image classification and augments the dataset with adversarial examples generated from both labeled and unlabeled data.

Each model uses adversarial examples to teach the other models in the ensemble. In the original deep co-training method, the diversity loss is:

$$\mathcal{L}_{div}(x)=H\big(p_1(x),\,p_2(g_1(x))\big)+H\big(p_2(x),\,p_1(g_2(x))\big) \tag{6}$$

where $H(\cdot,\cdot)$ denotes the cross-entropy, $x$ is an input image, $p_i(\cdot)$ is the prediction of model $i$, and $g_i(x)$ is an adversarial example targeting model $i$. Since $g_1(x)$ is an adversarial example for model 1, we have $p_1(g_1(x))\neq p_1(x)$. Minimizing the first term of Eq. 6 makes $p_2(g_1(x))$ approximately equal to $p_1(x)$. Combining these relations gives $p_1(g_1(x))\neq p_2(g_1(x))$. Applying the same reasoning to model 2, we conclude that the two models disagree on adversarial examples, which preserves their diversity.
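The role of each term in the diversity loss can be made concrete with a short NumPy sketch that takes the four prediction maps as inputs (producing the adversarial examples themselves is a separate step). The argument names are hypothetical; the sketch only evaluates the two cross-entropy terms.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Mean over pixels of H(p, q) = -sum_c p_c log q_c for (C, H, W) maps."""
    return float(-(p * np.log(q + eps)).sum(axis=0).mean())

def diversity_loss(p1_x, p2_x, p2_on_g1, p1_on_g2):
    """View-difference loss H(p1(x), p2(g1(x))) + H(p2(x), p1(g2(x))).

    p1_x, p2_x: predictions of models 1 and 2 on the clean image x
    p2_on_g1:   model 2's prediction on g1(x), the adversarial example against model 1
    p1_on_g2:   model 1's prediction on g2(x), the adversarial example against model 2
    """
    return cross_entropy(p1_x, p2_on_g1) + cross_entropy(p2_x, p1_on_g2)

p1 = np.array([[[0.7]], [[0.3]]])
p2 = np.array([[[0.2]], [[0.8]]])
matched = diversity_loss(p1, p2, p2_on_g1=p1, p1_on_g2=p2)     # each model reproduces the other's clean prediction
mismatched = diversity_loss(p1, p2, p2_on_g1=p2, p1_on_g2=p1)  # predictions do not match
```

Because $H(p,q)\ge H(p,p)$ with equality only when $q=p$, the loss is minimized exactly when model 2 reproduces $p_1(x)$ on $g_1(x)$ and model 1 reproduces $p_2(x)$ on $g_2(x)$.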

Adversarial examples can be obtained by adding small perturbations to input images. Following Deep-co-training, we use different schemes to generate these examples depending on the source of the image $x$: we apply Virtual Adversarial Training (VAT) when $x$ is an unlabeled image from $\mathcal{D}_U$, and the Fast Gradient Sign Method (FGSM) when $x$ comes from the labeled datasets.
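To make the two perturbation schemes concrete, the sketch below applies them to a toy linear softmax classifier on a single feature vector, where the input gradients have closed forms and no autograd is needed. The model, its weights, and the VAT probe size `xi` are hypothetical; `eps=0.03` (FGSM) and `eps=10` (VAT) mirror the values given in the implementation details, although their scale is only meaningful for real image inputs. VAT is shown with a single power-iteration step.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D = 3, 10
W = rng.normal(size=(C, D))   # toy linear softmax classifier p(x) = softmax(W x + b)
b = rng.normal(size=C)

def probs(x):
    z = W @ x + b
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def ce(x, y):
    """Cross-entropy of the toy model at input x with true class y."""
    return float(-np.log(probs(x)[y]))

def fgsm(x, y, eps=0.03):
    """FGSM (labeled input): step of size eps along the sign of grad_x CE(y, p(x))."""
    grad = W.T @ (probs(x) - np.eye(C)[y])   # closed-form input gradient of CE for this model
    return x + eps * np.sign(grad)

def vat(x, eps=10.0, xi=1e-6):
    """VAT (unlabeled input): one power-iteration step toward the direction that most
    increases KL(p(x) || p(x + r)), then rescale the perturbation to norm eps."""
    p = probs(x)
    d = rng.normal(size=x.shape)
    r = xi * d / np.linalg.norm(d)           # small random probe direction
    grad = W.T @ (probs(x + r) - p)          # grad_r KL(p || p(x + r)) for this model
    return x + eps * grad / np.linalg.norm(grad)

x = rng.normal(size=D)
x_fgsm = fgsm(x, y=1)
x_vat = vat(x)
```

FGSM needs the label and bounds each coordinate of the perturbation by `eps`, whereas VAT needs no label and bounds the perturbation's L2 norm, which is why it suits the unlabeled images.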

Figure 3: DSC (%) and HD (mm) results of introducing uncertainty at different epochs in the supervised and unsupervised stages. Panels (a) and (b) use uncertainty only in the unsupervised stage, and panels (c) and (d) use it only in the supervised stage. In each panel, the red dashed lines show the voting results of the two models and the blue dashed lines show the average results. Considering performance in both DSC and HD, we add uncertainty at epoch 0 of the supervised stage and at epoch 20 of the unsupervised stage.
Method DSC (%)
RV Myo LV Mean
Part avg
DCT avg
Ours avg
vot 86.59(0.17) 86.65(0.35) 92.74(0.33) 88.45(0.12)
Method HD (mm)
RV Myo LV Mean
Part avg
DCT avg
vot 4.84(0.57)
Ours avg
vot 3.65(0.18) 2.68(0.08) 3.48(0.05)
Table 1: Comparison of semi-supervised methods for segmentation on the ACDC dataset with a label ratio of 0.2. All experiments were reimplemented by us.
Method DSC (%)
5 10 20 30 40 50
IND avg
DCT avg
Ours avg
vot 83.16 88.01 88.57 89.41 89.86 90.89
Method HD (mm)
5 10 20 30 40 50
IND avg
DCT avg
Ours avg
vot 4.70 3.42 3.54 3.09 3.15 2.78
Table 2: Comparison of semantic segmentation results with different label ratios (%) on the ACDC dataset.

4 Experiments and results

4.1 Dataset and metrics

We evaluated our method on three publicly available medical image segmentation benchmarks: the Automated Cardiac Diagnosis Challenge (ACDC) Automatic-MRI-Cardiac, the Spinal Cord Gray Matter Challenge (SCGM) Spinal-cord-grey-matter, and the Spleen sub-task of the Medical Segmentation Decathlon Challenge A-large-annotated-medical-image-dataset.

ACDC dataset: The University Hospital of Dijon created the ACDC dataset from real clinical exams. It covers several well-defined pathologies with enough cases and consists of 200 short-axis cine-MRI scans from 100 patients, evenly divided into five groups: patients without cardiac disease, and patients with myocardial infarction, hypertrophic cardiomyopathy, abnormal right ventricles, or dilated cardiomyopathy. The segmentation masks have four classes: right ventricle endocardium (RV), left ventricle endocardium (LV), left ventricle myocardium (Myo), and background. We used 75 subjects (150 scans) for training and 25 subjects (50 scans) for testing. All short-axis slices of the 3D MRI scans were resized to 256 × 256 and used as 2D images.

SCGM dataset: The Spinal Cord Gray Matter Challenge was organised to test the capabilities of various methods. Its dataset is a publicly available multi-vendor, multi-center MRI collection acquired at four sites: University College London, Polytechnique Montreal, University of Zurich, and Vanderbilt University. It contains 80 healthy subjects (age range 28.3 to 44.3 years), 20 from each center. The training set contains 40 labeled scans, each annotated slice-wise by four independent experts, and the ground-truth mask is obtained by majority voting. In our work, we use 30 images from the first center as the labeled dataset and 465 images from all centers as the unlabeled dataset. The test set contains 264 labeled images from centers 3 and 4. The slices are center-cropped to 200 × 200 pixels.

Spleen dataset: The Spleen dataset is one of the ten sub-tasks of the Medical Segmentation Decathlon Challenge A-large-annotated-medical-image-dataset. This publicly available dataset consists of CT scans of patients undergoing chemotherapy for liver metastases. It includes 61 portal-venous-phase CT scans, of which only 41 come with ground truth. The ground-truth segmentation was generated by semi-automatic segmentation software and verified by an expert abdominal radiologist. Each slice obtained from the CT scans is resized to 256 × 256 pixels. We split the dataset into labeled (4 patients), unlabeled (32 patients), and validation (5 patients) subsets.

Dice similarity coefficient (DSC): DSC measures the overlap between the model prediction $P$ and the ground truth $G$:

$$\mathrm{DSC}(P,G)=\frac{2|P\cap G|}{|P|+|G|} \tag{7}$$

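Hausdorff distance (HD): HD is a boundary distance metric that measures the largest distance (in mm) from a point on one boundary to the closest point on the other, between the prediction $P$ and the ground truth $G$:

$$\mathrm{HD}(P,G)=\max\Big\{\sup_{p\in P}\inf_{g\in G}d(p,g),\ \sup_{g\in G}\inf_{p\in P}d(p,g)\Big\} \tag{8}$$

where $d(\cdot,\cdot)$ is the Euclidean distance.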
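Both metrics can be sketched in NumPy for 2D binary masks. This is a brute-force illustration under stated assumptions: HD is computed between boundary pixels (foreground pixels with a 4-connected background neighbour), `spacing` converts pixel offsets to millimetres, and both masks are assumed non-empty for HD. Evaluation toolkits may instead report the 95th-percentile variant; for point sets, SciPy's `scipy.spatial.distance.directed_hausdorff` is an existing alternative to the pairwise computation below.

```python
import numpy as np

def dice(pred, gt):
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary masks (1.0 if both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom

def boundary(mask):
    """Foreground pixels with at least one 4-connected background neighbour."""
    m = np.pad(mask.astype(bool), 1)     # pad with False so image borders count as background
    interior = (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
                & m[1:-1, :-2] & m[1:-1, 2:])
    return m[1:-1, 1:-1] & ~interior

def hausdorff(pred, gt, spacing=(1.0, 1.0)):
    """Symmetric Hausdorff distance between the boundaries of two non-empty binary masks,
    in the units of `spacing` (e.g. mm per pixel). Brute force, O(|P| * |G|) memory."""
    P = np.argwhere(boundary(pred)) * np.asarray(spacing)
    G = np.argwhere(boundary(gt)) * np.asarray(spacing)
    d = np.linalg.norm(P[:, None, :] - G[None, :, :], axis=-1)   # all pairwise distances
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

a = np.zeros((5, 5), dtype=int)
a[1:3, 1:3] = 1   # 2x2 square
b = np.zeros((5, 5), dtype=int)
b[1:3, 1:4] = 1   # 2x3 rectangle containing the square
```

Here `dice(a, b)` is 2·4/(4+6) = 0.8 and `hausdorff(a, b)` is 1 pixel, because the extra column of `b` lies one pixel away from the square; with a column spacing of 0.5 mm the same offset becomes 0.5 mm.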


4.2 Implementation Details

For the sake of speed and accuracy, we use the well-known lightweight E-Net enet as our basic segmentation network; it is one of the most popular architectures for segmentation. To adapt E-Net into a Bayesian network for uncertainty estimation, one dropout layer with a dropout rate of is added between the encoder and decoder of E-Net. All datasets are augmented with random rotation, random cropping, and flipping. We set the perturbation magnitude to 0.03 for FGSM and 10 for VAT. The learning rate is decreased by a factor of 10 every 90 epochs (every 100 epochs in the SCGM experiments). To describe the level of supervision, we vary the ratio of labeled samples (between 0 and 1) in our experiments.

For all experiments, the loss weights follow a dynamic strategy: we use a ramp-up schedule that follows a Gaussian ramp-up curve. For all other hyperparameter settings, we follow Deep-co-training. The framework is implemented with the PyTorch library and trained on one NVIDIA RTX 2080 Ti GPU.
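The schedules used in training can be summarized in a few small functions. This sketch assumes the widely used Gaussian ramp-up form $\exp(-5(1-t)^2)$ from the semi-supervised literature, since the exact curve is not written out here; `max_weight`, `rampup_length`, and `base_lr` are hypothetical parameters, while the step decay and the epochs at which uncertainty is switched on follow the settings stated in the text.

```python
import math

UNCERTAINTY_START_EPOCH = {"supervised": 0, "unsupervised": 20}   # from Section 3.2.3

def gaussian_rampup(epoch, rampup_length):
    """Gaussian ramp-up exp(-5 (1 - t)^2) with t = epoch / rampup_length clipped to [0, 1]."""
    if rampup_length == 0:
        return 1.0
    t = min(max(epoch / rampup_length, 0.0), 1.0)
    return math.exp(-5.0 * (1.0 - t) ** 2)

def loss_weight(epoch, max_weight, rampup_length):
    """Dynamic weight for an unsupervised loss term (max_weight is hypothetical)."""
    return max_weight * gaussian_rampup(epoch, rampup_length)

def step_lr(base_lr, epoch, step=90, gamma=0.1):
    """Learning rate divided by 10 every `step` epochs (90; 100 for SCGM)."""
    return base_lr * gamma ** (epoch // step)

def use_uncertainty(stage, epoch):
    """Whether the uncertainty weighting is active for 'supervised' or 'unsupervised'."""
    return epoch >= UNCERTAINTY_START_EPOCH[stage]
```

The ramp-up keeps the unsupervised terms small while the models' predictions on unlabeled data are still unreliable, which is the same reasoning behind delaying the unsupervised uncertainty weighting to epoch 20.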

4.3 Comparison with semi-supervised methods

Our uncertainty-aware co-training method is compared against several recent state-of-the-art semi-supervised methods in the medical domain:

Mean Teacher: Mean Teacher (MT) is a semi-supervised segmentation method that uses multiple deep CNNs (a student and a teacher). It is an effective approach that averages model weights instead of predictions.

Uncertainty-aware Mean Teacher: Uncertainty-aware Mean Teacher (UAMT) applies uncertainty estimation to MT.

Deep Co-training: Deep Co-training (DCT) shares information between simultaneously trained models while preserving their diversity.

We also compare our method with a fully supervised model trained on the same number of labeled images as the semi-supervised methods, referred to as "Part". "IND" denotes individually trained models (Independent). We also report a fully supervised baseline ("Full"), which trains a single model on all available labeled data.

For these baselines, we use the same learning rate decay, weight scheduler, data augmentation, and optimization settings as for our method. For MT and UAMT, we set the EMA (Exponential Moving Average) decay parameter to 0.99. We report both the average performance of the individual models ("avg") and the performance obtained by combining the predictions of all models with ensemble soft voting ("vot"). Soft voting yields higher accuracy than the individual models' predictions for both DSC and HD. To reduce the effect of chance, we report the mean and standard deviation over three runs with different random seeds.
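Ensemble soft voting and the multi-seed summary can be sketched as follows. The sketch averages the models' softmax maps before taking the per-pixel argmax; whether the reported standard deviation uses the population or the sample estimator is not stated, so the population form (`np.std` with its default `ddof=0`) is an assumption here, and the function names are hypothetical.

```python
import numpy as np

def soft_vote(prob_maps):
    """Ensemble soft voting: average the models' (C, H, W) softmax maps, then argmax over classes."""
    return np.mean(np.stack(prob_maps), axis=0).argmax(axis=0)

def mean_std(scores):
    """Mean and (population) standard deviation of a metric over runs, e.g. three seeds."""
    s = np.asarray(scores, dtype=float)
    return float(s.mean()), float(s.std())

p1 = np.array([[[0.6]], [[0.4]]])   # model 1 slightly prefers class 0
p2 = np.array([[[0.1]], [[0.9]]])   # model 2 strongly prefers class 1
vote = soft_vote([p1, p2])          # averaged probabilities (0.35, 0.65) -> class 1
```

With only two models, a hard majority vote would tie whenever they disagree; soft voting breaks the tie in favour of the more confident model.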

All the compared baseline results reported in our paper are reimplemented by us.

Figure 4: Examples of segmentation results on the ACDC dataset with 20% labeled images. From left to right: Ground truth (GT), Fully supervised baseline (Full), Independent, Mean Teacher (MT), Uncertainty-aware Mean Teacher (UAMT), Deep co-training (DCT), and our Uncertainty-aware Deep co-training method (Ours).
Method DSC (%) HD (mm)
Mean Teacher
Independent avg
DCT avg
Ours avg
vot 74.31(0.30) 9.01(0.73)
Table 3: Comparison of semi-supervised methods for segmentation on the SCGM dataset.
Method DSC (%) HD (mm)
Mean Teacher
Independent avg
DCT avg
Ours avg
vot 95.45(0.24) 5.57(0.21)
Table 4: Comparison of semi-supervised methods for segmentation on the Spleen dataset.
Method DSC (%)
5 10
Independent avg
DCT avg
Ours avg
vot 89.02(1.05) 91.94(0.77)
Table 5: Comparison of different label ratios (%) on the Spleen dataset.
Method DSC(%) HD(mm)
RV Myo LV Mean RV Myo LV Mean
DCT-seg avg 81.76 82.21 91.75 85.24 9.31 9.16 5.21 7.89
vot 82.84 83.55 92.53 86.31 5.64 4.26 2.90 4.27
Supervised stage via uncertainty avg 85.79 85.75 92.11 87.88 6.66 6.60 4.59 5.95
vot 86.51 86.50 92.17 88.39 3.93 3.52 2.76 3.40
Unsupervised stage via uncertainty avg 84.55 84.73 92.26 87.18 7.42 5.75 4.50 5.89
vot 85.29 85.37 92.45 87.70 4.42 3.41 2.72 3.52
Table 6: Ablation analysis of our method on the ACDC dataset.

4.4 Experimental results

4.4.1 ACDC dataset

Our uncertainty-aware method is first evaluated on the ACDC dataset with a label ratio of 0.2. Tab. 1 presents the quantitative results. Compared with the other methods, our approach achieves better overall performance on the test set in terms of both DSC and HD. In particular, it improves over co-training by 2.29% in overall DSC and 0.49 mm in overall HD, and it has the smallest standard deviation over three runs. With only 20% labeled images, our method is only 1.8% in DSC and 0.39 mm in HD behind full supervision with a 100% label ratio. Fig. 4 shows some example results on the test set: our method produces contours closer to the ground truth (GT), with more accurate segmentation of the details between different regions.

We also evaluate how the label ratio affects the results in the dual-view setting. Tab. 2 shows the results for labeled data ratios of 5%, 10%, 20%, 30%, 40%, and 50%. As the label ratio increases, the mean DSC increases, most sharply at the lowest ratios, and the mean HD decreases. In all cases, our approach achieves better DSC and HD than both independently trained models and the deep co-training method. The advantage of our approach is more pronounced at low label ratios: with a 5% label ratio, our method outperforms deep co-training by 2.73% in DSC and 1.34 mm in HD.

Figure 5: Examples of segmentation results for SCGM dataset using center 1 as training data. From left to right: Ground truth (GT), Fully supervised baseline (Full), Independent, Mean Teacher (MT), Uncertainty-aware Mean Teacher (UAMT), Deep co-training (DCT), and our Uncertainty-aware Deep co-training method (Ours).

4.4.2 SCGM dataset

We evaluate our method on segmenting spinal cord grey matter in images from the SCGM dataset. The SCGM images come from four different clinical centers, so different acquisition parameters were used to collect them. We use only a few labeled images (30 images from one center), and the test set comes from other centers; the labeling ratio is about 6.5%. Because of the different data sources, extracting the image features of the samples is more difficult, which leads to low segmentation accuracy for semi-supervised methods on this task. Using uncertainty mitigates this problem: guided by the uncertainty, the models can learn effective semantic features from the different centers.

The results are summarized in Tab. 3. Our approach improves the mean DSC by 5.96% and the mean HD by 6.35 mm over the best baseline (deep co-training), and by 12.94% in DSC and 23.26 mm in HD over the fully supervised baseline. Fig. 5 shows segmentation results on the test set. On some difficult images, fully supervised training fails to segment the target region at all, whereas our approach still segments it to a certain extent.

Figure 6: Examples of segmentation results for Spleen dataset with 20% label images. From left to right: Ground truth (GT), Fully supervised baseline (Full), Independent, Mean Teacher (MT), Uncertainty-aware Mean Teacher (UAMT), Deep co-training (DCT), and our Uncertainty-aware Deep co-training method (Ours).
Figure 7: Uncertainty heatmap under different epochs in supervised learning.
Figure 8: Uncertainty heatmap under different epochs in unsupervised learning.

4.4.3 Spleen dataset

We also validate the effectiveness of our uncertainty-aware deep co-training method on the task of segmenting the spleen in the Spleen dataset, which consists of 2D slices of CT scans resized to 256 × 256 pixels. Tab. 4 summarizes the experimental results. The performance of the deep co-training method is almost the same as that of the fully supervised baseline. After uncertainty is taken into account, the accuracy improves and surpasses full supervision, with better stability (lower standard deviation). Our method improves the DSC over DCT-seg from 94.09% to 95.45% and reduces the HD from 6.86 mm to 5.57 mm. Examples of segmentation results obtained by the tested methods are given in Fig. 6.

Semi-supervised learning aims to achieve better segmentation accuracy with fewer labeled images. We show the performance of our approach, DCT-seg, and individually trained models (Independent) at smaller label ratios of 5% and 10%. Tab. 5 gives the mean DSC with its standard deviation. Our approach still leads among all semi-supervised methods.

4.4.4 Ablation analysis of our method

We conduct ablation studies on the ACDC dataset to verify that our method is effective in both the supervised and unsupervised stages. The results are shown in Tab. 6.

In the supervised stage: As mentioned above, the best time to introduce uncertainty into the fully supervised phase is from the beginning. Here, uncertainty lets the models capture the characteristics of the target area better under full supervision. Fig. 7 shows that uncertainty mainly takes effect in the first 30 epochs of the fully supervised stage; compared with the labels, the uncertainty maps focus more on the lesion area. Adding uncertainty to the fully supervised stage improves over co-training by 2.23% in overall DSC and 0.57 mm in overall HD.

In the unsupervised stage: The best time to leverage uncertainty at this stage is after the models have some understanding of the semantic features of the lesion areas. After testing introduction points every 10 epochs starting from epoch 0, we determined that uncertainty should be leveraged from the 20th epoch onward. Fig. 8 visualizes the uncertainty maps from epoch 0 to the last epoch. The uncertainty maps focus increasingly on the lesion areas, and uncertainty plays a role throughout training. As training progresses, the lesion areas in the uncertainty maps become lighter, which means that the predictions of the two models on the same unlabeled image gradually converge, although some difference remains. Introducing uncertainty only in the unsupervised stage improves over co-training by 1.39% in overall DSC and 0.75 mm in overall HD.

5 Conclusion

In this paper, we propose a novel uncertainty-aware deep co-training method for medical image segmentation. Our approach uses uncertainty in both the supervised and the unsupervised learning stages of the deep co-training method, with uncertainty obtained from Monte Carlo sampling guiding the training process purposefully. We validate our method on three challenging medical image datasets, and the comparison with other semi-supervised methods confirms its effectiveness. With only 20% labeled data on the ACDC dataset, our uncertainty-aware co-training method comes within 1.88% of full supervision; on the SCGM dataset it improves DSC by 12.94% over the fully supervised method; and on the Spleen dataset it achieves a DSC 0.89% higher than the fully supervised approach. In future work, we will investigate the effect of different uncertainty estimation methods and apply our approach to other semi-supervised medical image segmentation tasks.


  • (1) M. Z. Khan, M. K. Gajendran, Y. Lee, M. A. Khan, Deep neural architectures for medical image semantic segmentation, IEEE Access (2021).
  • (2) H. Noh, S. Hong, B. Han, Learning deconvolution network for semantic segmentation, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1520–1528.
  • (3) R. Yang, Y. Yu, Artificial convolutional neural network in object detection and semantic segmentation for medical imaging analysis, Frontiers in Oncology 11 (2021) 573.
  • (4) S. A. Taghanaki, K. Abhishek, J. P. Cohen, J. Cohen-Adad, G. Hamarneh, Deep semantic segmentation of natural and medical images: a review, Artificial Intelligence Review 54 (1) (2021) 137–178.
  • (5) L. Schmarje, M. Santarossa, S.-M. Schröder, R. Koch, A survey on semi-, self- and unsupervised learning for image classification, IEEE Access (2021).
  • (6) V. Cheplygina, M. de Bruijne, J. P. Pluim, Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis, Medical image analysis 54 (2019) 280–296.
  • (7) Y. Meng, H. Zhang, Y. Zhao, X. Yang, X. Qian, X. Huang, Y. Zheng, Spatial uncertainty-aware semi-supervised crowd counting, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15549–15559.
  • (8) Y. Li, X. Chao, Semi-supervised few-shot learning approach for plant diseases recognition, Plant Methods 17 (1) (2021) 1–10.
  • (9) C. Baur, S. Albarqouni, N. Navab, Semi-supervised deep learning for fully convolutional networks, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2017, pp. 311–319.
  • (10) X. Li, L. Yu, H. Chen, C.-W. Fu, P.-A. Heng, Semi-supervised skin lesion segmentation via transformation consistent self-ensembling model, in: Proceedings of the British Machine Vision Conference (BMVC), 2018.
  • (11) D.-H. Lee, et al., Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks, in: Workshop on challenges in representation learning, ICML, Vol. 3, 2013, p. 896.
  • (12) W.-C. Hung, Y.-H. Tsai, Y.-T. Liou, Y.-Y. Lin, M.-H. Yang, Adversarial learning for semi-supervised semantic segmentation, in: Proceedings of the British Machine Vision Conference (BMVC), 2018.
  • (13) J. Peng, G. Estrada, M. Pedersoli, C. Desrosiers, Deep co-training for semi-supervised image segmentation, Pattern Recognition 107 (2020) 107269.
  • (14) X. J. Zhu, Semi-supervised learning literature survey (2005).
  • (15) W. Bai, O. Oktay, M. Sinclair, H. Suzuki, M. Rajchl, G. Tarroni, B. Glocker, A. King, P. M. Matthews, D. Rueckert, Semi-supervised learning for network-based cardiac MR image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2017, pp. 253–260.
  • (16) L. Yu, S. Wang, X. Li, C.-W. Fu, P.-A. Heng, Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2019.
  • (17) K. Fang, W.-J. Li, DMNet: Difference minimization network for semi-supervised segmentation in medical images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2020, pp. 532–541.
  • (18) W.-C. Hung, Y.-H. Tsai, Y.-T. Liou, Y.-Y. Lin, M.-H. Yang, Adversarial learning for semi-supervised semantic segmentation, arXiv preprint arXiv:1802.07934 (2018).
  • (19) Y. Wu, M. Xu, Z. Ge, J. Cai, L. Zhang, Semi-supervised left atrium segmentation with mutual consistency training, arXiv preprint arXiv:2103.02911 (2021).
  • (20) X. Zhang, Z. Cui, C. Chen, J. Wei, J. Lou, W. Hu, H. Zhang, T. Zhou, F. Shi, D. Shen, Confidence-aware cascaded network for fetal brain segmentation on MR images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2021, pp. 584–593.
  • (21) W. Edeling, H. Arabnejad, R. Sinclair, D. Suleimenova, K. Gopalakrishnan, B. Bosak, D. Groen, I. Mahmood, D. Crommelin, P. V. Coveney, The impact of uncertainty on predictions of the CovidSim epidemiological code, Nature Computational Science 1 (2) (2021) 128–135.
  • (22) M. Abdar, F. Pourpanah, S. Hussain, D. Rezazadegan, L. Liu, M. Ghavamzadeh, P. Fieguth, X. Cao, A. Khosravi, U. R. Acharya, et al., A review of uncertainty quantification in deep learning: Techniques, applications and challenges, Information Fusion (2021).
  • (23) C. E. Papadopoulos, H. Yeung, Uncertainty estimation and Monte Carlo simulation method, Flow Measurement and Instrumentation 12 (4) (2001) 291–298.
  • (24) A. Kendall, Y. Gal, What uncertainties do we need in Bayesian deep learning for computer vision?, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 5580–5590.
  • (25) B. Lakshminarayanan, A. Pritzel, C. Blundell, Simple and scalable predictive uncertainty estimation using deep ensembles (2017). arXiv:1612.01474.
  • (26) A. Atanov, A. Ashukha, D. Molchanov, K. Neklyudov, D. Vetrov, Uncertainty estimation via stochastic batch normalization, arXiv preprint arXiv:1802.04893 (2018).
  • (27) C. Louizos, M. Welling, Multiplicative normalizing flows for variational Bayesian neural networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 2218–2227.
  • (28) U. Upadhyay, V. P. Sudarshan, S. P. Awate, Uncertainty-aware GAN with adaptive loss for robust MRI image enhancement, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 3255–3264.
  • (29) X. Yan, S. Hu, Y. Mao, Y. Ye, H. Yu, Deep multi-view learning methods: a review, Neurocomputing (2021).
  • (30) A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, ENet: A deep neural network architecture for real-time semantic segmentation (2016). arXiv:1606.02147.
  • (31) O. Bernard, A. Lalande, C. Zotti, F. Cervenansky, X. Yang, P.-A. Heng, I. Cetin, K. Lekadir, O. Camara, M. A. Gonzalez Ballester, G. Sanroma, S. Napel, S. Petersen, G. Tziritas, E. Grinias, M. Khened, V. A. Kollerathu, G. Krishnamurthi, M.-M. Rohé, X. Pennec, M. Sermesant, F. Isensee, P. Jäger, K. H. Maier-Hein, P. M. Full, I. Wolf, S. Engelhardt, C. F. Baumgartner, L. M. Koch, J. M. Wolterink, I. Išgum, Y. Jang, Y. Hong, J. Patravali, S. Jain, O. Humbert, P.-M. Jodoin, Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved?, IEEE Transactions on Medical Imaging 37 (11) (2018) 2514–2525. doi:10.1109/TMI.2018.2837502.
  • (32) F. Prados, J. Ashburner, C. Blaiotta, T. Brosch, J. Carballido-Gamio, M. J. Cardoso, B. N. Conrad, E. Datta, G. Dávid, B. De Leener, et al., Spinal cord grey matter segmentation challenge, Neuroimage 152 (2017) 312–329.
  • (33) A. L. Simpson, M. Antonelli, S. Bakas, M. Bilello, K. Farahani, B. van Ginneken, A. Kopp-Schneider, B. A. Landman, G. Litjens, B. Menze, O. Ronneberger, R. M. Summers, P. Bilic, P. F. Christ, R. K. G. Do, M. Gollub, J. Golia-Pernicka, S. H. Heckers, W. R. Jarnagin, M. K. McHugo, S. Napel, E. Vorontsov, L. Maier-Hein, M. J. Cardoso, A large annotated medical image dataset for the development and evaluation of segmentation algorithms (2019). arXiv:1902.09063.