ACT: Semi-supervised Domain-adaptive Medical Image Segmentation with Asymmetric Co-training

by Xiaofeng Liu, et al.

Unsupervised domain adaptation (UDA) has been vastly explored to alleviate domain shifts between source and target domains, by transferring a model trained under the supervision of a labeled source domain to an unlabeled target domain. Recent literature, however, has indicated that the performance is still far from satisfactory in the presence of significant domain shifts. Nonetheless, delineating a few target samples is usually manageable and particularly worthwhile, given the substantial performance gain. Inspired by this, we aim to develop semi-supervised domain adaptation (SSDA) for medical image segmentation, which is largely underexplored. We, thus, propose to exploit both labeled source and target domain data, in addition to unlabeled target data, in a unified manner. Specifically, we present a novel asymmetric co-training (ACT) framework to integrate these subsets and avoid the domination of the source domain data. Following a divide-and-conquer strategy, we explicitly decouple the label supervisions in SSDA into two asymmetric sub-tasks, semi-supervised learning (SSL) and UDA, and leverage different knowledge from two segmentors to take into account the distinction between the source and target label supervisions. The knowledge learned in the two modules is then adaptively integrated with ACT, by iteratively teaching each other based on confidence-aware pseudo-labels. In addition, pseudo-label noise is well controlled with an exponential MixUp decay scheme for smooth propagation. Experiments on cross-modality brain tumor MRI segmentation tasks using the BraTS18 database showed that, even with limited labeled target samples, ACT yielded marked improvements over UDA and state-of-the-art SSDA methods and approached an "upper bound" of supervised joint training.



1 Introduction

Accurate delineation of lesions or anatomical structures is a vital step for clinical diagnosis, intervention, and treatment planning [tajbakhsh2020embracing]. While recently flourished deep learning methods excel at segmenting these structures, deep learning-based segmentors cannot generalize well in a heterogeneous domain, e.g., across different clinical centers, scanner vendors, or imaging modalities [liu2022deep, Liu_2021_ICCV, liuconstraining, che2019deep]. To alleviate this issue, unsupervised domain adaptation (UDA) has been actively developed, which transfers a model trained under the supervision of a labeled source domain to an unlabeled target domain [chen2019synergistic, liu2021domain, liu2021generative, liu2022unsupervisedFrontiers]. Due to diverse target domains, however, the performance of UDA is far from satisfactory [zou2020unsupervised, han2022deep, liu2022self]. Instead, labeling a small set of target domain data is usually more feasible [van2020survey]. As such, semi-supervised domain adaptation (SSDA) has shown great potential as a solution to domain shifts, as it can utilize both labeled source and target data, in addition to unlabeled target data. To date, while several SSDA classification methods have been proposed [donahue2013semi, yao2015semi, saito2019semi, kim2020attract] based on discriminative class boundaries, they cannot be directly applied to segmentation, since segmentation involves complex and dense pixel-wise predictions.

Recently, a few works [wang2020alleviating, chen2021semi, hoyer2021improving] have been proposed to extend SSDA to segmentation of natural images; to our knowledge, however, SSDA for medical image segmentation has not yet been explored. For example, depth estimation for natural images is used as an auxiliary task in [hoyer2021improving], but that approach cannot be applied to medical imaging data, e.g., MRI, as they do not have perspective depth maps. Wang et al. [wang2020alleviating] simply added supervision from labeled target samples to conventional adversarial UDA. Chen et al. [chen2021semi] averaged labeled source and target domain images at both region and sample levels to mitigate the domain gap. However, source domain supervision can easily dominate the training when the labeled source data are directly combined with the target data [saito2019semi]. In other words, the small amount of extra labeled target data is not effectively utilized, because the volume of labeled source data is much larger than that of labeled target data, and there is significant divergence across domains [saito2019semi].

To mitigate the aforementioned limitations, we propose a practical asymmetric co-training (ACT) framework that exploits each subset of data in SSDA in a unified and balanced manner. To prevent a segmentor jointly trained on both domains from being dominated by the source data, we adopt a divide-and-conquer strategy to decouple the label supervisions for two asymmetric segmentors, which share the same objective of achieving decent segmentation performance on the unlabeled data. By "asymmetric," we mean that the two segmentors are assigned different roles in utilizing the labeled data of either the source or the target domain, thereby providing complementary views of the unlabeled data. That is, the first segmentor learns from the labeled source domain data and unlabeled target domain data as a conventional UDA task, while the other segmentor learns from the labeled and unlabeled target domain data as a semi-supervised learning (SSL) task. To integrate these two asymmetric branches, we extend the idea of co-training [blum1998combining, balcan2005co, qiao2018deep], one of the most established multi-view learning methods. Instead of modeling two views of the same set of data with different feature extractors or adversarial sample generation, as in conventional co-training [blum1998combining, balcan2005co, qiao2018deep], our two cross-domain views are explicitly provided by the segmentors through the correlated and complementary UDA and SSL tasks. Specifically, we construct the pseudo label of each unlabeled target sample based on the pixel-wise confident predictions of the other segmentor. The segmentors are then trained on the pseudo labeled data iteratively with an exponential MixUp decay (EMD) scheme for smooth propagation. Finally, the target segmentor carries out the target domain segmentation.

The contributions of this work can be summarized as follows:

We present a novel SSDA segmentation framework to exploit the different supervisions with the correlated and complementary asymmetric UDA and SSL sub-tasks, following a divide-and-conquer strategy. The knowledge is then integrated with confidence-aware pseudo-label based co-training.

An EMD scheme is further proposed to mitigate noisy pseudo labels in the early epochs of training for smooth propagation.

To our knowledge, this is the first attempt at investigating SSDA for medical image segmentation. Comprehensive evaluations on cross-modality brain tumor (i.e., T2-weighted MRI to T1-weighted/T1ce/FLAIR MRI) segmentation tasks using the BraTS18 database demonstrate superior performance over conventional source-relaxed/source-based UDA methods.

Figure 1: Illustration of our proposed ACT framework for SSDA cross-modality (e.g., T2-weighted to T1-weighted MRI) image segmentation. Note that only the target-domain-specific segmentor is used in testing.

2 Methodology

In our SSDA setting for segmentation, we are given a labeled source set $\mathcal{D}_s=\{(x_i^s, y_i^s)\}_{i=1}^{N_s}$, a labeled target set $\mathcal{D}_{lt}=\{(x_i^{lt}, y_i^{lt})\}_{i=1}^{N_{lt}}$, and an unlabeled target set $\mathcal{D}_{ut}=\{x_i^{ut}\}_{i=1}^{N_{ut}}$, where $N_s$, $N_{lt}$, and $N_{ut}$ are the numbers of samples in each set, respectively. Note that the slices $x_i^s$, $x_i^{lt}$, and $x_i^{ut}$, and the segmentation mask labels $y_i^s$ and $y_i^{lt}$, have the same spatial size of $H\times W$. In addition, for each pixel indexed by $n$, the label has $C$ classes, i.e., $y_{i,n}\in\{1,\dots,C\}$. There is a distribution divergence between the source domain samples, $\mathcal{D}_s$, and the target domain samples, $\mathcal{D}_{lt}$ and $\mathcal{D}_{ut}$. Usually, $N_{lt}$ is much smaller than $N_s$. The learning objective is to perform well in the target domain.

2.1 Asymmetric Co-training for SSDA segmentation

To decouple SSDA via a divide-and-conquer strategy, we integrate $\mathcal{D}_{ut}$ with either $\mathcal{D}_s$ or $\mathcal{D}_{lt}$ to form the correlated and complementary sub-tasks of UDA and SSL. We configure a cross-domain UDA segmentor $f_{uda}$ and a target domain SSL segmentor $f_{ssl}$, which share the same objective of achieving decent segmentation performance on $\mathcal{D}_{ut}$. The knowledge learned by the two segmentors is then integrated with ACT. The overall framework of this work is shown in Fig. 1.

Conventional co-training has focused on two independent views of the source and target data, or on generating artificial multi-views with adversarial examples; it learns one classifier for each view, and the classifiers teach each other on the unlabeled data [blum1998combining, qiao2018deep]. By contrast, in SSDA, without multiple views of the data, we propose to leverage the distinct yet correlated supervisions, based on the inherent discrepancy between the labeled source and target data. We note that the sub-tasks and datasets adopted are different for the UDA and SSL branches. Therefore, all of the data subsets can be exploited, following well-established UDA and SSL solutions, without interfering with each other.

To achieve co-training, we adopt a simple deep pseudo labeling method [wei2020theoretical], which assigns a pixel-wise pseudo label to each unlabeled target slice $x_i^{ut}\in\mathcal{D}_{ut}$. Though UDA and SSL can be achieved by different advanced algorithms, deep pseudo labeling can be applied to either UDA [zou2019confidence] or SSL [wei2020theoretical]. Therefore, we can apply the same algorithm to the two sub-tasks, thereby greatly simplifying our overall framework. We note that, while a few methods [xia2021uncertainty] can be applied to either SSL or UDA like pseudo labeling, they have not been jointly adopted in the context of SSDA.

Specifically, we assign a pseudo label to each pixel of $x_i^{ut}\in\mathcal{D}_{ut}$ with the prediction of either $f_{uda}$ or $f_{ssl}$, thereby constructing the pseudo labeled sets $\hat{\mathcal{D}}_{ssl}$ and $\hat{\mathcal{D}}_{uda}$ for the training of the other segmentor, $f_{ssl}$ or $f_{uda}$, respectively:

$\hat{\mathcal{D}}_{ssl} = \{(x^{ut}_{i,n},\ \arg\max_c\, p^{uda}_{i,n,c})\ :\ \max_c\, p^{uda}_{i,n,c} > \epsilon\},$   (1)

$\hat{\mathcal{D}}_{uda} = \{(x^{ut}_{i,n},\ \arg\max_c\, p^{ssl}_{i,n,c})\ :\ \max_c\, p^{ssl}_{i,n,c} > \epsilon\},$   (2)

where $p^{uda}_{i,n,c}$ and $p^{ssl}_{i,n,c}$ are the predicted probabilities of class $c$ for the pixel $x^{ut}_{i,n}$ using $f_{uda}$ and $f_{ssl}$, respectively, and $\epsilon$ is a confidence threshold. Note that a low softmax prediction probability indicates low confidence for training [zou2019confidence, liu2021generative]. Then, the pixels in the selected pseudo label sets are merged with the labeled data to construct the training sets of $f_{uda}$ and $f_{ssl}$, which are trained with a conventional supervised segmentation loss. Therefore, the two segmentors with asymmetrical tasks act as teacher and student of each other to distill the knowledge with highly confident predictions.
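As a concrete illustration, the confidence-aware pseudo-label selection can be sketched in a few lines of NumPy. This is a minimal sketch with hypothetical function names, not the authors' implementation; the threshold value is only an example.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.75):
    """Pixel-wise confidence-aware pseudo-labeling (illustrative sketch).

    probs: (C, H, W) softmax output of the teaching segmentor for one
           unlabeled target slice.
    Returns (pseudo_label, mask): the arg-max class map and a boolean
    mask of pixels whose maximum class probability exceeds the
    confidence threshold; only masked pixels would be used to train
    the other (student) segmentor.
    """
    confidence = probs.max(axis=0)        # (H, W) max class probability
    pseudo_label = probs.argmax(axis=0)   # (H, W) predicted class index
    mask = confidence > threshold         # keep only confident pixels
    return pseudo_label, mask
```

In an ACT iteration, the pseudo labels produced from one segmentor's softmax map would be filtered by this mask before being merged with the labeled data of the other segmentor.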

Input: batch size $B$; $\mathcal{D}_s$, $\mathcal{D}_{lt}$, $\mathcal{D}_{ut}$; confidence threshold $\epsilon$; learning rate $\eta$; iteration index $t$; current network parameters $\theta_{uda}$, $\theta_{ssl}$;
Sample batches from $\{\mathcal{D}_s, \mathcal{D}_{ut}\}$ and $\{\mathcal{D}_{lt}, \mathcal{D}_{ut}\}$, respectively;
Initialize $\hat{\mathcal{D}}_{uda}=\emptyset$, $\hat{\mathcal{D}}_{ssl}=\emptyset$;
for each pixel $x^{ut}_{i,n}$ in the sampled unlabeled target slices do
          if $\max_c p^{uda}_{i,n,c} > \epsilon$: update $\hat{\mathcal{D}}_{ssl}$ with Eq. (1);
          if $\max_c p^{ssl}_{i,n,c} > \epsilon$: update $\hat{\mathcal{D}}_{uda}$ with Eq. (2);
end for
Obtain $\mathcal{M}_{uda}$ with Eq. (3);
Obtain $\mathcal{M}_{ssl}$ with Eq. (4);
Update $\theta_{uda}$ with Eq. (5); update $\theta_{ssl}$ with Eq. (6);
Output: Updated network parameters $\theta_{uda}$ and $\theta_{ssl}$.
Algorithm 1 An iteration of the ACT algorithm.

2.2 Pseudo-label with Exponential MixUp Decay

Pseudo labels initially generated by the two segmentors are typically noisy, which is especially acute in the early epochs of training, leading to a deviated solution with propagated errors. Numerous conventional co-training methods relied on the simple assumptions that there is no domain shift and that the predictions of the teacher model are reliable and can simply be used as ground truth. Due to the domain shift, however, the predictions of $f_{uda}$ in the target domain can be noisy, leading to aleatoric uncertainty [der2009aleatory, kendall2017uncertainties, hu2019supervised]. In addition, insufficient labeled target domain data can lead to epistemic uncertainty related to the model parameters [der2009aleatory, kendall2017uncertainties, hu2019supervised].

To smoothly exploit the pseudo labels, we propose to adjust the contributions of the supervision signals from both ground-truth labels and pseudo labels as the training progresses. Previously, vanilla MixUp [zhang2017mixup] was developed for efficient data augmentation, combining both samples and their labels to generate new data for training. We note that the MixUp used in SSL [berthelot2019mixmatch, chen2021semi] adopted a constant sampling and did not apply a decay scheme for gradual co-training. Thus, we propose to gradually exploit the pseudo labels by mixing up $\mathcal{D}_s$ or $\mathcal{D}_{lt}$ with the pseudo labeled target data, and to adjust their ratio with the EMD scheme. For the selected $\hat{\mathcal{D}}_{uda}$ and $\hat{\mathcal{D}}_{ssl}$, we mix up each pseudo labeled image with the images from $\mathcal{D}_s$ or $\mathcal{D}_{lt}$ to form the mixed pseudo labeled sets $\mathcal{M}_{uda}$ and $\mathcal{M}_{ssl}$. Specifically, our EMD can be formulated as:

$\mathcal{M}_{uda} = \{(\lambda_t\, x^s_i + (1-\lambda_t)\, x^{ut}_j,\ \lambda_t\, y^s_i + (1-\lambda_t)\, \hat{y}^{ut}_j)\},$   (3)

$\mathcal{M}_{ssl} = \{(\lambda_t\, x^{lt}_i + (1-\lambda_t)\, x^{ut}_j,\ \lambda_t\, y^{lt}_i + (1-\lambda_t)\, \hat{y}^{ut}_j)\},$   (4)

where $\lambda_t=\lambda_0\cdot\exp(-t)$ is the MixUp parameter with exponential decay w.r.t. iteration $t$, and $\lambda_0$ is the initial weight of the ground truth samples and labels, which is empirically set to 1. Therefore, as the iteration $t$ increases, $\lambda_t$ becomes smaller, so that the contribution of the ground truth labels is large at the start of training, while the pseudo labels are increasingly utilized in the later training epochs. Accordingly, $\mathcal{M}_{uda}$ and $\mathcal{M}_{ssl}$ gradually converge to the pseudo label sets $\hat{\mathcal{D}}_{uda}$ and $\hat{\mathcal{D}}_{ssl}$. We note that the MixUp operates at the image level, as indicated by the image indices $i$ and $j$. The number of generated mixed samples depends on the sizes of $\hat{\mathcal{D}}_{uda}$ and $\hat{\mathcal{D}}_{ssl}$ in each iteration and on the batch size $B$. With the labeled sets $\mathcal{D}_s$ and $\mathcal{D}_{lt}$, as well as the EMD pseudo labeled sets $\mathcal{M}_{uda}$ and $\mathcal{M}_{ssl}$, we update the parameters of the segmentors $f_{uda}$ and $f_{ssl}$, i.e., $\theta_{uda}$ and $\theta_{ssl}$, with SGD as:

$\theta_{uda} \leftarrow \theta_{uda} - \eta\, \nabla_{\theta_{uda}} \mathcal{L}(\mathcal{D}_s \cup \mathcal{M}_{uda};\ \theta_{uda}),$   (5)

$\theta_{ssl} \leftarrow \theta_{ssl} - \eta\, \nabla_{\theta_{ssl}} \mathcal{L}(\mathcal{D}_{lt} \cup \mathcal{M}_{ssl};\ \theta_{ssl}),$   (6)

where $\eta$ indicates the learning rate, and $\mathcal{L}(\cdot;\theta)$ denotes the segmentation loss on the corresponding training set with the current segmentor parameterized by $\theta$. The training procedure is detailed in Algorithm 1. After training, only the target domain specific SSL segmentor $f_{ssl}$ is used for testing.
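The EMD MixUp step can be sketched as below. This is our own NumPy illustration under stated assumptions: the decay rate and function names are hypothetical choices for clarity, not the authors' exact implementation, and labels are assumed to be one-hot encoded so that they can be linearly mixed.

```python
import numpy as np

def emd_weight(t, lam0=1.0, decay=1e-3):
    """Exponentially decayed MixUp weight at iteration t.
    lam0 is the initial weight of the ground-truth samples/labels;
    the decay rate is a hypothetical value for illustration."""
    return lam0 * np.exp(-decay * t)

def emd_mixup(x_lab, y_lab_onehot, x_ut, y_pseudo_onehot, t):
    """Mix a labeled (source or target) slice with a pseudo-labeled
    target slice: ground truth dominates early in training, while
    pseudo labels take over as t grows."""
    lam = emd_weight(t)
    x_mix = lam * x_lab + (1.0 - lam) * x_ut
    y_mix = lam * y_lab_onehot + (1.0 - lam) * y_pseudo_onehot
    return x_mix, y_mix
```

At iteration 0 the mixed pair equals the ground-truth pair; as training proceeds, the mixed set smoothly approaches the pseudo labeled set, matching the intended gradual propagation.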

Figure 2: Comparisons with other UDA/SSDA methods and ablation studies for the cross-modality tumor segmentation. We show target test slices of T1, T1ce, and FLAIR MRI from three subjects.

3 Experiments and Results

To demonstrate the effectiveness of our proposed SSDA method, we evaluated it on T2-weighted MRI to T1-weighted/T1ce/FLAIR MRI brain tumor segmentation using the BraTS2018 database [menze2014multimodal]. We denote our proposed method as ACT, and its ablation without the EMD-based pseudo label exploration as ACT-EMD.

Of note, the BraTS2018 database contains a total of 285 patients [menze2014multimodal] with MRI scans, including T1-weighted (T1), T1-contrast enhanced (T1ce), T2-weighted (T2), and T2 Fluid Attenuated Inversion Recovery (FLAIR) MRI. For the segmentation labels, each pixel belongs to one of four classes, i.e., enhancing tumor (EnhT), peritumoral edema (ED), necrotic and non-enhancing tumor core (CoreT), and background. In addition, the whole tumor covers CoreT, EnhT, and ED. We follow the conventional cross-modality UDA (i.e., T2-weighted to T1-weighted/T1ce/FLAIR) evaluation protocols [zou2020unsupervised, han2022deep, liu2022self] with an 8/2 training/testing split, and extend them to our SSDA task by accessing the labels of 1-5 target domain subjects at the adaptation training stage. All of the data were used in a subject-independent and unpaired manner. We use SSDA:1 or SSDA:5 to denote that one or five target domain subjects are labeled in training.

For a fair comparison, we used the same segmentor backbone as in DSA [han2022deep] and SSCA [liu2022self], which is based on Deeplab-ResNet50. Without loss of generality, we simply adopted the cross-entropy loss as the segmentation loss $\mathcal{L}$, and set the learning rate $\eta$ and the confidence threshold $\epsilon$ empirically. Both $f_{uda}$ and $f_{ssl}$ have the same network structure. For the evaluation metrics, we adopted the widely used DSC (the higher, the better) and Hausdorff distance (HD; the lower, the better) as in [han2022deep, liu2022self]. The standard deviation was reported over five runs.
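For reference, the two evaluation metrics can be computed on binary masks as follows; this is a self-contained NumPy sketch (brute-force HD, fine for small masks), not the evaluation code used in the paper.

```python
import numpy as np

def dice_score(pred, gt):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

def hausdorff_distance(pred, gt):
    """Symmetric Hausdorff distance (HD) between the foreground pixel
    sets of two binary masks; brute force, for illustration only."""
    a = np.argwhere(pred)  # foreground coordinates of the prediction
    b = np.argwhere(gt)    # foreground coordinates of the ground truth
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

In practice, optimized implementations (e.g., KD-tree based nearest-neighbor queries) are preferable for full-resolution MRI slices.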

                                       DSC [%]                              HD [mm]
Method                     | Task       | T1   | FLAIR | T1CE | Ave      | T1   | FLAIR | T1CE | Ave
Source Only                | No DA      | 4.2  | 65.2  | 6.3  | 27.7±1.2 | 55.7 | 28.0  | 49.8 | 39.6±0.5
Target Only                | SSL:5      | 43.8 | 54.6  | 47.5 | 48.6±1.7 | 31.9 | 29.6  | 35.4 | 32.3±0.8
SIFA [chen2019synergistic] | UDA        | 51.7 | 68.0  | 58.2 | 59.3±0.6 | 19.6 | 16.9  | 15.0 | 17.1±0.4
DSFN [zou2020unsupervised] | UDA        | 57.3 | 78.9  | 62.2 | 66.1±0.8 | 17.5 | 13.8  | 15.5 | 15.6±0.3
DSA [han2022deep]          | UDA        | 57.7 | 81.8  | 62.0 | 67.2±0.7 | 14.2 | 8.6   | 13.7 | 12.2±0.4
SSCA [liu2022self]         | UDA        | 59.3 | 82.9  | 63.5 | 68.6±0.6 | 12.5 | 7.9   | 11.2 | 11.5±0.3
SLA [wang2020alleviating]  | SSDA:1     | 64.7 | 82.3  | 66.1 | 71.0±0.5 | 12.2 | 7.1   | 10.5 | 9.9±0.3
DLD [chen2021semi]         | SSDA:1     | 65.8 | 81.5  | 66.5 | 71.3±0.6 | 12.0 | 7.1   | 10.3 | 9.8±0.2
ACT                        | SSDA:1     | 69.7 | 84.5  | 69.7 | 74.6±0.3 | 10.5 | 5.8   | 10.0 | 8.8±0.1
ACT-EMD                    | SSDA:1     | 67.4 | 83.9  | 69.0 | 73.4±0.6 | 10.9 | 6.4   | 10.3 | 9.2±0.2
ACT                        | SSDA:5     | 71.3 | 85.0  | 70.8 | 75.7±0.5 | 10.0 | 5.2   | 9.8  | 8.3±0.1
ACT-EMD                    | SSDA:5     | 70.3 | 84.4  | 69.8 | 74.8±0.4 | 10.4 | 5.7   | 10.2 | 8.8±0.2
Joint Training             | Supervised | 73.2 | 85.6  | 72.6 | 77.1±0.5 | 9.5  | 4.6   | 9.2  | 7.7±0.2
Table 1: Whole tumor segmentation performance of the cross-modality UDA and SSDA. Supervised joint training can be regarded as an "upper bound".
                                       DSC [%]                        HD [mm]
Method                    | Task       | CoreT    | EnhT     | ED       | CoreT    | EnhT     | ED
Source Only               | No DA      | 20.6±1.0 | 39.5±0.8 | 41.3±0.9 | 54.7±0.4 | 55.2±0.6 | 42.5±0.4
Target Only               | SSL:5      | 27.3±1.1 | 38.0±1.0 | 51.8±0.7 | 52.3±0.9 | —        | 46.4±0.6
DSA [han2022deep]         | UDA        | 57.8±0.6 | 44.0±0.6 | 56.8±0.5 | 25.8±0.4 | 34.2±0.3 | 25.6±0.5
SSCA [liu2022self]        | UDA        | 58.2±0.4 | 44.5±0.5 | 60.7±0.4 | 26.4±0.2 | 32.8±0.2 | 23.4±0.3
SLA [wang2020alleviating] | SSDA:1     | 58.9±0.6 | 48.1±0.5 | 65.4±0.4 | 24.5±0.1 | 27.6±0.3 | 20.3±0.2
DLD [chen2021semi]        | SSDA:1     | 60.3±0.6 | 48.2±0.5 | 66.0±0.3 | 24.2±0.2 | 27.8±0.1 | 19.7±0.2
ACT                       | SSDA:1     | 64.5±0.3 | 52.7±0.4 | 69.8±0.6 | 20.0±0.2 | 24.6±0.1 | 16.2±0.2
ACT                       | SSDA:5     | 66.9±0.3 | 54.0±0.3 | 71.2±0.5 | 18.4±0.4 | 23.7±0.2 | 15.1±0.2
Joint Training            | Supervised | 70.4±0.3 | 62.5±0.2 | 75.1±0.4 | 15.8±0.2 | 22.7±0.1 | 13.0±0.2
Table 2: Detailed comparison of CoreT/EnhT/ED segmentation. Results are averaged over the three tasks (T2-weighted to T1-weighted, T1CE, and FLAIR MRI) with the same backbone as in [han2022deep, liu2022self]. One value in the Target Only row was not recoverable from the source and is marked with a dash.

The quantitative evaluation results of the whole tumor segmentation are provided in Table 1. We can see that SSDA largely improved the performance over the compared UDA methods [han2022deep, liu2022self]. For the T2-weighted to T1-weighted MRI transfer task, we achieved more than 10% improvement over [han2022deep, liu2022self] with only one labeled target sample. Recent SSDA methods for natural image segmentation [wang2020alleviating, chen2021semi] did not take the balance between the two labeled supervisions into consideration, easily resulting in a source domain-biased solution in the case of limited labeled target domain data, and thus did not perform well on target domain data [saito2019semi]. In addition, the depth estimation in [hoyer2021improving] cannot be applied to the MRI data. Thus, we reimplemented the aforementioned methods [wang2020alleviating, chen2021semi] with the same backbone for comparison, which is also the first application of these methods to medical image segmentation. Our ACT outperformed [wang2020alleviating, chen2021semi] by a DSC of 3.3% w.r.t. the averaged whole tumor segmentation in the SSDA:1 task. The better performance of ACT over ACT-EMD demonstrates the effectiveness of our EMD scheme for smooth adaptation with pseudo labels. We note that we did not manage to outperform supervised joint training, which accesses all of the target domain labels and can be considered an "upper bound" of UDA and SSDA. Therefore, it is encouraging that our ACT approaches joint training with five labeled target subjects. In addition, the performance was stable for settings of $\lambda_0$ from 1 to 10.

In Table 2, we provide detailed comparisons for the more fine-grained segmentation of CoreT, EnhT, and ED. The improvements were consistent with those for whole tumor segmentation. The qualitative results for the three target modalities in Fig. 2 show the superior performance of our framework compared with the other methods.

In Fig. 3(a), we analyze how the proportion of testing pixels for which both, only one, or neither of the two segmentors produces a confident prediction, i.e., the maximum confidence is larger than $\epsilon$ as in Eq. (1), changes along with training. We can see that the consensus of the two segmentors keeps increasing, as they teach each other in the co-training scheme for knowledge integration. The low rate of "both" at the beginning indicates that $f_{uda}$ and $f_{ssl}$ provide different views based on their asymmetric tasks, which can be complementary to each other. The sensitivity study using different numbers of labeled target domain subjects is shown in Fig. 3(b). Our ACT was able to effectively use the limited labeled target data $\mathcal{D}_{lt}$. In Fig. 3(c), we show that using more EMD pairs improves the performance consistently.
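The agreement statistics of Fig. 3(a) amount to counting confident pixels under the two softmax maps. A minimal sketch, with hypothetical names and an example threshold:

```python
import numpy as np

def confidence_agreement(probs_uda, probs_ssl, threshold=0.75):
    """Fractions of pixels where both, exactly one, or neither
    segmentor is confident (max softmax probability > threshold).

    probs_uda, probs_ssl: (C, H, W) softmax maps from the two
    segmentors for the same slice."""
    conf_a = probs_uda.max(axis=0) > threshold
    conf_b = probs_ssl.max(axis=0) > threshold
    n = conf_a.size
    both = np.logical_and(conf_a, conf_b).sum() / n
    only_one = np.logical_xor(conf_a, conf_b).sum() / n
    neither = 1.0 - both - only_one
    return both, only_one, neither
```

Tracking these three fractions over training iterations reproduces the kind of consensus curve discussed above.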

Figure 3: Analysis of our ACT-based SSDA on the whole tumor segmentation task. (a) The proportion of testing pixels for which both, only one, or neither of the segmentors has high confidence; (b) the performance improvements with different numbers of labeled target domain training subjects; and (c) a sensitivity study of changing the proportion of EMD pairs.

4 Conclusion

This work proposed a novel and practical SSDA framework for segmentation, which has great potential to improve target domain generalization with a manageable labeling effort in clinical practice. To achieve our goal, we resorted to a divide-and-conquer strategy with two asymmetric sub-tasks to balance the supervisions from the labeled source and target domain samples. An EMD scheme was further developed to smoothly exploit the pseudo labels in SSDA. Our experimental results on the cross-modality SSDA task using the BraTS18 database demonstrated that the proposed method surpassed state-of-the-art UDA and SSDA methods.


This work is supported by NIH R01DC018511, R01DE027989, and P41EB022544.