1 Introduction
Semisupervised learning is of great significance for learning left atrium (LA) segmentation models when labelled data are insufficient. Automated and accurate LA segmentation is a crucial task to aid the diagnosis and treatment of patients with atrial fibrillation (AF) [1, 2, 3, 4]. Deep learning based approaches have great potential for LA segmentation [5, 6]. However, annotating large amounts of data by experienced experts to train an accurate deep learning based LA segmentation model is expensive and laborious [7]. Semisupervised learning can alleviate the need for labelled data by effectively exploiting unlabelled data to learn deep models [8]. It is therefore able to overcome the shortage of labelled data and advance accurate LA segmentation, benefiting the subsequent diagnosis and treatment of patients with AF.
Generalising semisupervised learning to crossdomain data for LA segmentation is of high importance for improving model robustness. Semisupervised learning aims to mine effective hidden information from unlabelled data to support model learning [9]. Because of noise interference and the limited collection capabilities of data sources, a single data domain cannot always provide sufficient high-quality unlabelled data and abundant data characteristics for robust semisupervised LA segmentation. For example, a single data domain usually covers only a limited variety of LA contrast, shape and texture, which limits robust model learning. Compared to a single data domain, crossdomain data can provide not only more high-quality data, but also complementary domain information and more comprehensive data characteristics to describe the LA of interest [10]. Therefore, it is important to effectively integrate crossdomain data for robust semisupervised LA segmentation.
However, generalising semisupervised learning to crossdomain data is difficult because of the difference in distributions and the sample mismatch, as shown in Fig. 1: (1) The difference in crossdomain data distributions. Semisupervised learning with generative models, low-density separation and graph-based methods can work, but it relies on a consistent data distribution under certain model assumptions, including the smoothness, cluster or manifold assumption [9]. The performance of a semisupervised model may degrade whenever the assumptions adopted for a particular task do not match the characteristics of the data distribution [9]. In the real world, crossdomain data collected from different sources exhibit heterogeneous properties [11], which can lead to differences in distributions. For example, in medical image analysis, the distributions of crossdomain data differ because of different subject groups, scanners or scanning protocols [12]. Therefore, directly generalising semisupervised learning to crossdomain data is not trivial. (2) Sample mismatch of crossdomain data. Semisupervised learning with disagreement-based methods requires matched samples from different domains, where the information of different domains is regarded as different characteristics of the matched samples [13]. Since crossdomain data are collected independently, the samples in different domains are not matched. This restricts the crossdomain generalisation of semisupervised learning.
To overcome the issues mentioned above, we propose an Adaptive Hierarchical Dual Consistency framework, called AHDC, for semisupervised LA segmentation on crossdomain data, as shown in Fig. 1. The AHDC consists of two modules: (1) A Bidirectional Adversarial Inference module (BAI), which performs mutual domain adaptation to align distributions and match samples for two different data domains. The adapted domains and the two corresponding source domains are merged to obtain two matched domains. The matched domains not only expand the amount of data in a specific source domain, but also provide complementary representations for the samples in that source domain. (2) A Hierarchical Dual Consistency learning module (HDC), which performs hierarchical semisupervised segmentation with dual consistency on the matched domains. The HDC builds two dualmodelling networks, applied to the matched domains, for mining complementary information both within and across domains. Within a specific domain, the segmentation task is represented as global modelling and local modelling. We then enforce consistency between the LAs predicted by the two complementary modelling branches for intradomain semisupervised learning. Across domains, we enforce consistency between the outputs of the dualmodelling networks estimated from different domains to exploit the complementary domain information.
Our main contributions are summarised as follows:

We propose a semisupervised LA segmentation framework for generalising across domains. It provides a solution for generalising semisupervised LA segmentation to crossdomain data with effectiveness on both different distributions and mismatched samples.

We propose a paradigm of hierarchical dual consistency learning to mine the effective information in both interdomain and intradomain settings. It explicitly enforces consistency under complementary information.

We have conducted comprehensive experiments on four 3D MR datasets from different centres and one 3D CT dataset. The experiment results demonstrated the feasibility and the superiority of our proposed crossdomain semisupervised segmentation framework.
2 Related Work
2.1 Domain Adaptation
Domain adaptation, which aims to overcome the distribution difference between domains, has drawn great attention in computer vision [14]. Because generative adversarial networks (GANs) excel at capturing data distributions, they have been widely used in domain adaptation to align the distributions of different domains [15, 16, 17, 18, 19]. There are different GAN based structures for domain adaptation. For unidirectional domain adaptation, a GAN usually leverages a generator and a discriminator to move the distribution of the source domain towards that of the target domain through adversarial learning. To handle high-resolution images with an emphasis on pixel-level reconstruction, Pix2pixHD extends conditional GANs with a decomposed generator and three multi-scale discriminators [20]. For bidirectional domain adaptation, CycleGAN [21], DualGAN [22] and DiscoGAN [23] couple two generators with two discriminators to enforce two cycle-consistency constraints between two domains. ALI [24] and BiGAN [25] employ two generators and a discriminator to match the joint distributions of different domains. However, ALI and BiGAN do not focus on pixel-level reconstruction and thus cannot effectively capture the position, colour and style of targets. ALICE extends ALI by exploiting cycle-consistency to focus on pixel-level reconstruction for the target domain [26]. It also proposes to enforce cycle-consistency using fully adversarial learning with an extra discriminator. Our domain adaptation method is based on the ALICE framework. We extend it to focus on bidirectional pixel-level reconstruction for two domains simultaneously. To reduce the computing resources and training difficulty incurred by fully adversarial learning, we adopt explicit cycle-consistency, thus using two generators and one discriminator for bidirectional domain adaptation with pixel-level reconstruction.
2.2 Semisupervised Learning
Semisupervised learning alleviates the problem of the lack of labelled data. Here we only discuss consistency-based and disagreement-based semisupervised learning. More information about semisupervised learning can be found in [9]. Consistency-based methods constrain the prediction consistency under different perturbations and ensembles. For example, the Π model enforces prediction consistency under input perturbations with different Gaussian noise and under model perturbation with dropout [27]. Unsupervised data augmentation (UDA) replaces the traditional noise perturbations with high-quality data augmentations (e.g., RandAugment, back-translation and TF-IDF) to improve consistency learning [28]. FixMatch applies a weak augmentation and a strong augmentation separately to the input data for consistency regularisation [29]. In contrast to these methods, Temporal Ensembling (TE) penalises the inconsistency between the current prediction and an exponential moving average (EMA) of previous predictions [27]. In contrast to TE, Mean Teacher averages the weights of a base model rather than its predictions [30]. However, these methods need multiple inference passes to provide predictions for consistency learning and therefore incur additional computational cost.
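The weight averaging used by Mean Teacher style methods can be illustrated with a minimal sketch. The function name `ema_update`, the flat list representation of weights and the smoothing factor `alpha` are assumptions made for illustration; they are not taken from [30] or from AHDC.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


# Toy example: the teacher drifts towards fixed student weights over three updates.
teacher = [0.0, 0.0]
student = [1.0, -2.0]
for _ in range(3):
    teacher = ema_update(teacher, student, alpha=0.5)
```

A larger `alpha` makes the teacher change more slowly, which smooths out noise in individual student updates.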
Disagreement-based semisupervised learning, which includes co-training and co-regularisation, exploits the disagreement between the predictions of multiple task learners during the learning process [13]. Co-training leverages two sufficient and redundant views of the data to train two task models for annotating the unlabelled data. The unlabelled data with high prediction confidence are then added to the training set to further improve the model [31, 32]. Co-regularisation tries to directly minimise the prediction disagreement of unlabelled samples on different views [33].
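The confidence-based selection step of co-training can be sketched as follows. This is only an illustration under stated assumptions: `select_confident` and the threshold of 0.9 are hypothetical, and a full co-training loop would let each learner label data for the other.

```python
def select_confident(probabilities, threshold=0.9):
    """Return (sample index, predicted label) pairs whose top probability
    reaches the threshold; these would be added to the training set."""
    selected = []
    for index, probs in enumerate(probabilities):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((index, label))
    return selected


# Class probabilities predicted by one learner for three unlabelled samples.
predictions = [[0.95, 0.05], [0.60, 0.40], [0.02, 0.98]]
pseudo_labelled = select_confident(predictions, threshold=0.9)
```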
TABLE I: Notations and definitions (domain from source 1, domain from source 2, discriminator, ground truth, network parameters, loss function, weight parameter).
3 Method
3.1 Overview
The overview of our proposed AHDC framework is illustrated in Fig. 2. The notations are summarised in TABLE I. The AHDC framework consists of two modules: a BAI module and a HDC module. Given two different data domains denoted by and . contains both labelled data and unlabelled data , where with labelled samples and with unlabelled samples, respectively. The only contains unlabelled data denoted as with unlabelled samples. The BAI module employs two mapping networks of and to generate complementary domains by adapting and to each other, where the domain adapted from to is denoted as while the domain adapted from to is denoted as . Then the targeted domains ( and ) and the corresponding adapted domains ( and ) merge to form two matched domains of and . Finally, two dualmodelling networks of and are fed with matched samples sampled from and to predict LAs, where the LAs predicted by the local modelling and the global modelling are denoted as and while the LAs predicted by the local modelling and the global modelling are denoted as and , respectively.
3.2 Bidirectional Adversarial Inference for Distribution Alignment and Sample Matching
Consider a to domain mapping network . Meanwhile, consider a to domain mapping network . We denote two domain marginal distributions of and as and . One domain can be inferred based on the other using parameterised conditional distributions, and , where and denote the parameters of two distributions. Then, we have the joint distributions of and . We aim to match to and match to by matching and . Then we use a discriminator network parameterised using to penalise mismatches in the joint distributions of and . Specifically, we consider the following objective:
(1)
where the denotes the sigmoid function.
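For orientation, an adversarial objective of the kind described for equation (1) can be written in the style of ALI [24] and ALICE [26]. The notation below is assumed for illustration and may differ from the symbols of equation (1): $x_1$ and $x_2$ are samples from the two domains with marginals $q(x_1)$ and $q(x_2)$, $p_\theta(x_2 \mid x_1)$ and $p_\phi(x_1 \mid x_2)$ are the conditionals realised by the two mapping networks, $f_\omega$ is the discriminator and $\sigma$ is the sigmoid function.

```latex
\min_{\theta,\phi}\,\max_{\omega}\;
\mathbb{E}_{x_1 \sim q(x_1),\, \tilde{x}_2 \sim p_\theta(x_2 \mid x_1)}
  \big[\log \sigma\big(f_\omega(x_1, \tilde{x}_2)\big)\big]
+ \mathbb{E}_{x_2 \sim q(x_2),\, \tilde{x}_1 \sim p_\phi(x_1 \mid x_2)}
  \big[\log\big(1 - \sigma\big(f_\omega(\tilde{x}_1, x_2)\big)\big)\big]
```

In an objective of this form, the discriminator sees pairs of samples rather than single images, so it penalises mismatches between the two joint distributions rather than between the marginals alone.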
Intuitively, if equation (1) is achieved, and match each other, which implies not only that and match each other, but also that and match each other. However, the relationship between the random variables and is not specified or constrained by equation (1). In order to obtain paired samples, following [26], we extend the conditional entropies from a single constraint to the bidirectional constraints and , which impose constraints on the conditionals and simultaneously. Because there are no explicit distributions from which to compute the conditional entropies, we bound them, according to [26], using the cycle-consistency ( and ):
(2)
Similarly,
(3) 
where the and are denoted as the reconstructions of and . KL denotes the Kullback-Leibler divergence. According to the equations of and , on the one hand, we have a function defined by , which first generates from based on , then produces from generated . On the other hand, we also have a function defined by , which first generates from based on , then produces from generated . In contrast to the fully adversarial training for solving and , we employ the reconstruction loss to reduce the difficulty of model training. Specifically, we consider the following objectives:
(4) 
(5) 
where the denotes the mean absolute error. Finally, we have the following objective for BAI:
(6) 
where and are hyperparameters to balance the adversarial loss and the reconstruction loss.
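The bidirectional reconstruction terms can be sketched as two round trips measured with the mean absolute error. The mapping functions `g12` and `g21` below are hypothetical toy stand-ins for the two domain mapping networks, and images are represented as flat lists of intensities purely for illustration.

```python
def mean_absolute_error(a, b):
    """Mean absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_reconstruction_loss(x1, x2, g12, g21):
    """L1 loss between each sample and its round-trip reconstruction."""
    x1_rec = g21(g12(x1))  # domain 1 -> domain 2 -> domain 1
    x2_rec = g12(g21(x2))  # domain 2 -> domain 1 -> domain 2
    return mean_absolute_error(x1, x1_rec), mean_absolute_error(x2, x2_rec)


def g12(x):
    """Toy mapping from domain 1 to domain 2 (an intensity rescaling)."""
    return [2.0 * v + 1.0 for v in x]


def g21(x):
    """Toy mapping from domain 2 to domain 1 (the exact inverse of g12)."""
    return [(v - 1.0) / 2.0 for v in x]


x1 = [0.0, 0.5, 1.0]
x2 = [1.0, 2.0, 3.0]
loss_1, loss_2 = cycle_reconstruction_loss(x1, x2, g12, g21)
```

Because the toy mappings are exact inverses, both losses are zero here; any mapping pair that fails to return a sample to itself is penalised in proportion to the reconstruction error.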
3.3 Hierarchical Dual Consistency for Semisupervised Segmentation
The BAI makes the crossdomain data adapt to each other to produce matched domains. In detail, the domain adapted to source is denoted as , where with labelled samples and with unlabelled samples. The domain adapted to source is denoted as with unlabelled samples. Then we merge two source domains and two adapted domains to obtain the matched domains of and . The , where with labelled samples and with unlabelled samples. The , where with labelled samples and with unlabelled samples. We denote two domain marginal distributions of and as and , respectively. The joint distribution of and is denoted as .
Based on the matched domains, we investigate complementary LA modelling and complementary domain knowledge learning to provide inherent prediction perturbation for the consistency based crossdomain semisupervised learning. Therefore, a hierarchical dual consistency is investigated. Specifically, for the intradomain, we consider two dualmodelling networks parameterised by and parameterised by applied to the matched domains of and , respectively. Each dualmodelling network estimates two targets by considering local information and global information of image, where simultaneously performs the global modelling of parameterised by and the local modelling of parameterised by . Similarly, the simultaneously performs the global modelling of parameterised by and the local modelling of parameterised by . Then we encourage the global modelling and the local modelling of each dualmodelling network to predict consistent targets via the consistency loss:
(7) 
(8) 
where denotes the dice loss function. For the dual consistency in interdomain, we maximise the agreement on two matched domains. Therefore, we encourage and to predict similar outputs by:
(9) 
where denotes the cross-entropy loss function. To prevent and from gradually resembling each other, we encourage and to produce conditionally independent features by orthogonalising the weights of feature layers:
(10) 
where the denotes the number of layers in and . represents the number of features in th layer. and denote the parameters of th feature layer in and , respectively.
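The two ingredients described above, a Dice-based consistency between global and local predictions and an orthogonality constraint on paired feature weights, can be sketched as follows. Both functional forms are assumptions for illustration: the soft Dice loss uses a common smoothed formulation, and the orthogonality penalty is taken here to be the mean absolute cosine similarity between corresponding weight vectors, which may differ from equation (10).

```python
import math


def soft_dice_loss(p, q, eps=1e-6):
    """1 - Dice overlap between two flattened probability maps."""
    intersection = sum(a * b for a, b in zip(p, q))
    return 1.0 - (2.0 * intersection + eps) / (sum(p) + sum(q) + eps)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def orthogonality_penalty(layers_a, layers_b):
    """Mean absolute cosine similarity between paired feature weights.

    Each argument is a list of layers; each layer is a list of flattened
    weight vectors, one per feature. The value is 0 when all pairs are orthogonal.
    """
    total, count = 0.0, 0
    for layer_a, layer_b in zip(layers_a, layers_b):
        for w_a, w_b in zip(layer_a, layer_b):
            total += abs(cosine_similarity(w_a, w_b))
            count += 1
    return total / count if count else 0.0


# Intradomain consistency between a global and a local prediction.
global_pred = [0.9, 0.8, 0.1, 0.0]
local_pred = [1.0, 0.7, 0.2, 0.0]
intra_loss = soft_dice_loss(global_pred, local_pred)

# One layer with two features per network: one orthogonal pair, one parallel pair.
net1 = [[[1.0, 0.0], [0.0, 1.0]]]
net2 = [[[0.0, 1.0], [0.0, 2.0]]]
penalty = orthogonality_penalty(net1, net2)
```

Minimising such a penalty pushes the two networks towards different feature directions, so that agreement between their outputs carries information instead of being a by-product of identical weights.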
Beyond the consistency learning above, and can explicitly learn from and with the supervision of the labels:
(11) 
(12) 
where the denotes the LA label. denotes the supervised loss functions (cross-entropy loss function and dice loss function). Then the final training objective for the learning of and is denoted as:
(13) 
where the , , and are hyperparameters to balance the loss terms.
3.4 Network Configuration
The BAI module contains three subnetworks: two domain mapping networks (, ) and a discriminative network . We use a 2D U-Net with bilinear upsampling as the backbone of both and . has six convolution layers with the numbers of filters of , respectively. Each of the first five convolutional layers with a stride of is followed by a batch normalisation layer and a ReLU layer. The final convolutional layer with a stride of is followed by a sigmoid layer.
The hierarchical dualmodelling network contains two dualmodelling networks with the same structure. Each dualmodelling network contains a 2D U-Net with bilinear upsampling, used to extract image features, and two branch networks used to estimate targets: the global modelling network and the local modelling network. The global modelling network is based on self-attention [34, 35, 36], as shown in Fig. 4. In the global modelling network, we use sinusoidal position encoding to emphasise the sequential relationship between input feature patches [37]. The local modelling network consists of three convolution blocks. The details are shown in Fig. 4.
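The sinusoidal position encoding mentioned for the global modelling network follows the standard formulation of [37] and can be sketched as below. The function name and the choice of a plain nested list are illustrative assumptions; how the encoding is combined with the feature patches in AHDC is not specified here.

```python
import math


def sinusoidal_position_encoding(num_positions, dim):
    """Return a num_positions x dim table with sine values at even indices
    and cosine values at odd indices."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):
            # i is the even channel index, so the exponent is i / dim.
            angle = pos / (10000.0 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


# Encoding for a sequence of three feature patches with four channels each.
encoding = sinusoidal_position_encoding(num_positions=3, dim=4)
```

Each patch position receives a distinct, deterministic vector, which lets a self-attention layer distinguish patches that would otherwise be treated as an unordered set.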
Centres  Acquired Resolution  TE/TR  Scanner  Source  Amount of Data
C1  mm  2.2/5.2 ms  1.5 Tesla Avanto  Royal Brompton Hospital  165 LGE-CMR scans
C2  mm  2.3/5.4 ms  1.5 Tesla Avanto or 3.0 Tesla Verio  CARMA, University of Utah  153 LGE-CMR scans
C3  mm  2.1/5.3 ms  1.5T Philips Achieva  Beth Israel Deaconess Medical Center  20 LGE-CMR scans
C4  mm  2.1/5.3 ms  1.5T Philips Achieva  Imaging Sciences at King’s College London  20 LGE-CMR scans
4 Experiments
4.1 Overview of Experiments
Comprehensive experiments were performed to validate our proposed AHDC.
(1) The feasibility of AHDC for generalising across domains: Our proposed AHDC was validated on four 3D late gadolinium enhancement cardiac MR (LGE CMR) datasets and a 3D CT dataset combined in pairs, which followed the independent validation protocol. Furthermore, we also investigated the impact of different ratios () of the labelled data for validating our proposed AHDC.
(2) The superiority of AHDC for generalising across domains: We compared AHDC with widely used and state-of-the-art semisupervised methods on crossdomain data, including the mean teacher (MT) method [30], the uncertainty-aware self-ensembling model (UAMT) [7], Dual-Task consistency (DTC) [38] and DualTeacher [39]. Note that MT, UAMT and DTC were proposed for single-domain semisupervised learning, while the DualTeacher method was proposed for crossdomain learning. In addition, DualTeacher requires labelled data from both domains for model learning. For a fair comparison, MT, UAMT and DTC were performed on one of the matched domains, i.e., . We also compared with a joint training method that combines the crossdomain data directly for LA segmentation based on our proposed semisupervised method.
(3) The effectiveness of the components in AHDC: Firstly, we compared the performance between different architectures of the BAI module. On the one hand, to validate the effectiveness of bidirectional reconstruction for specifying the relationship of matched samples, an experiment was performed on bidirectional adversarial inference without using bidirectional reconstruction (BAI/ALI/BiGAN). On the other hand, to validate the effectiveness of skip connection of domain mapping network for keeping target structure consistent, an experiment was performed on bidirectional adversarial inference without using skip connection in domain mapping network (BAI). Then, we further validated the performance of BAI by comparing it with the fully adversarial ALICE [26] on the downstream semisupervised tasks. Finally, for validating the effectiveness of HDC, we decomposed the HDC into independent intradomain dual consistency learning (HDC) by removing a dualmodelling network and interdomain dual consistency learning (HDC) by removing global modelling branch but retaining local modelling branch.
(4) The effectiveness of the BAI for matching domains: Firstly, we performed principal component analysis to show the data distributions of the source domains ( and ) and the adapted domains ( and ). The data distributions of the source domains and the adapted domains were compared to validate the effectiveness of AHDC for aligning distributions. Then, we made a qualitative visualisation of images before and after the bidirectional adversarial inference to validate the effectiveness of AHDC for matching samples.
(5) The effectiveness of the HDC for the availability of complementary information: To validate the availability of complementary modelling information in the intradomain setting, we compared the segmentation performance of the dualmodelling network (local-global modelling structure) with that of networks without a dualmodelling structure. Specifically, we replaced the local modelling branch with the global modelling branch (global-global modelling structure) and replaced the global modelling branch with the local modelling branch (local-local modelling structure) in the dualmodelling network. To validate the availability of complementary domain information in the interdomain setting, we compared the segmentation performance of HDC with and without the orthogonal weight constraint (WOW and WOOW).
(6) The effects of parameter settings on model performance: We explored two important parameter settings. (i) The impact of different patch sizes (, and ) for global modelling. (ii) The impact of different values of (0.0, 0.1, and 1.0 ) for interdomain learning.
4.2 Datasets
To evaluate the performance of our proposed AHDC, four 3D LGEMRI datasets (C1, C2, C3 and C4) and a 3D CT dataset (C5) were collected as a retrospective study. In our experiments, the collected datasets of C1 and C2 included segmentation of the LA epicardium and LA endocardium while the collected datasets of C3, C4 and C5 included segmentation of the LA endocardium. We have summarised the characteristics of the four 3D LGEMRI datasets to emphasise their differences as shown in TABLE 2.
LGEMRI scanning sequence of centre 1 (C1): Cardiac MR data were acquired in patients with longstanding persistent atrial fibrillation (AF) on a Siemens Magnetom Avanto 1.5T scanner (Siemens Medical Systems, Erlangen, Germany). Transverse navigatorgated 3D LGECMRI [40] was performed using an inversion prepared segmented gradient echo sequence (TE/TR 2.2ms/5.2ms) 15 minutes after gadolinium administration (Gadovistgadobutrol, 0.1mmol/kg body weight, BayerSchering, Berlin, Germany) [41]. The inversion time was set to null the signal from normal myocardium. The acquired resolution parameter of LGECMRI data was mm (reconstructed to mm). LGECMRI data were acquired during freebreathing using a crossedpairs navigator positioned over the dome of the right hemidiaphragm with navigator acceptance window size of and CLAWS respiratory motion control [42, 43]. The LGE CMR data were collected from the Royal Brompton Hospital. In total, 165 scans were used in this study.
LGEMRI scanning sequence of centre 2 (C2): Cardiac MR data were obtained on a 1.5 Tesla Avanto scanner or a 3.0 Tesla Verio scanner (Siemens Medical Solutions, Erlangen, Germany). The scans were acquired 20–25 minutes after 0.1 mmol/kg gadolinium contrast (Multihance, Bracco Diagnostics Inc., Princeton, NJ) using a 3D respiratory navigated, inversion recovery prepared gradient echo pulse sequence. Typical acquisition parameters were free breathing using navigator gating, a transverse imaging volume with voxel size = mm (reconstructed to mm), TR/TE = 5.4/2.3 ms and inversion time (TI) = 270–310 ms. The TI value for the LGEMRI scan was identified using a scout scan. Typical scan times for the LGEMRI study were 8–15 min at 1.5 T and 6–11 min at 3 T (for Siemens sequences), depending on subject respiration and heart rates. The LGE CMR data were collected from the Comprehensive Arrhythmia Research and Management, University of Utah. In total, 153 scans were used in this study.
LGEMRI scanning sequence of centre 3 (C3): C3 is from the ISBI 2012 Left Atrium Fibrosis and Scar Segmentation Challenge [44, 45]. The LGE CMR data were collected from the Beth Israel Deaconess Medical Center. In total, 20 scans were used in this study.
4.3 Experimental Setup
(1) Data partitioning: For C1, the 3D LGEMRI dataset with 165 scans was randomly split into a training set with 99 scans and a testing set with 66 scans (33 pre-ablation scans and 33 post-ablation scans). The training set was then randomly split into a labelled training set with 20 scans (20%) and an unlabelled training set with 79 scans (80%). For C2, the 3D LGEMRI dataset with 153 scans was randomly split into a training set with 91 scans and a testing set with 62 scans (31 pre-ablation scans and 31 post-ablation scans). The training set was then randomly split into a labelled training set with 18 scans (20%) and an unlabelled training set with 73 scans (80%). For C3 and C4, each 3D LGEMRI dataset with 20 scans was randomly split into a training set with 12 scans and a testing set with 8 scans (4 pre-ablation scans and 4 post-ablation scans). The training set was then randomly split into a labelled training set with 4 scans and an unlabelled training set with 8 scans. Because C5 only provides 60 CT scans, including 20 labelled scans and 40 unlabelled scans, we randomly selected 15 of the 20 labelled scans as the testing set. The remaining 5 labelled scans (labelled training set) and the 40 unlabelled scans (unlabelled training set) together formed the training set. Since each patient may have multiple 3D LGEMRI scans, the 3D LGEMRI datasets were split such that all scans from a given patient were in only one of the training or testing sets.
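The patient-level constraint described above, where all scans of a patient fall in the same subset, can be sketched as follows. The function `split_by_patient`, the scan and patient identifiers and the fixed seed are hypothetical and serve only to illustrate the splitting strategy.

```python
import random


def split_by_patient(scan_to_patient, test_fraction, seed=0):
    """Split scan IDs so that every patient appears in exactly one subset."""
    patients = sorted(set(scan_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])
    train = sorted(s for s, p in scan_to_patient.items() if p not in test_patients)
    test = sorted(s for s, p in scan_to_patient.items() if p in test_patients)
    return train, test


# Hypothetical toy data: ten scans from five patients (two scans each).
scans = {f"scan{i}": f"patient{i // 2}" for i in range(10)}
train_scans, test_scans = split_by_patient(scans, test_fraction=0.4)
```

Splitting on patients rather than scans means the realised scan ratio can deviate slightly from the requested fraction when patients have different numbers of scans.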
(2) Implementation details: Experiments were performed on five datasets combined in pairs for a cross-centre study (C1 and C2, C3 and C4) and a cross-modality study (C2 and C5). To reduce the dependence of models on annotated data and to avoid the impact of label variations between centres, there were two experiment settings for each pair of crossdomain data. Taking the experiments on C1 and C2 as an example, one setting used C1 to support C2: the model was trained using the labelled training set of C2 (18 labelled cases), the unlabelled training set of C2 (73 unlabelled cases) and the whole training set of C1 (99 cases, used as unlabelled data). The other setting used C2 to support C1: the model was trained using the labelled training set of C1 (20 labelled cases), the unlabelled training set of C1 (79 unlabelled cases) and the whole training set of C2 (91 cases, used as unlabelled data). We denote the results obtained by the fully supervised model trained with the labelled training set from C1 (20 cases), C2 (18 cases), C3 (4 cases), C4 (4 cases) and C5 (5 cases) as the baseline, and the results obtained by the fully supervised model trained with the whole training set from C1 (99 cases), C2 (91 cases), C3 (12 cases) and C4 (12 cases) as the upper bound.
We preprocessed the data with normalisation. Smaller patches of centred on the LA region were cropped. To avoid overfitting, we applied data augmentation with random rotation. The training time of our model is about 17.17 hours, while the testing time for one 3D case is about 0.259 seconds. For the learning of the BAI network, we used the Adam method to optimise the two mapping networks with an initial learning rate of and a decay rate of . The optimiser used for the discriminative network was Adam with a fixed learning rate of . For the learning of the two dualmodelling networks, we also used the Adam method with an initial learning rate of and a decay rate of . The current statistics of batch normalisation were used for both training and testing. All experiments were performed with an independent test. For the dual consistency learning, in each iteration, we first performed the intraconsistency learning with both labelled and unlabelled data simultaneously, then performed the interconsistency learning with both labelled and unlabelled data simultaneously, and finally performed supervised learning with labelled data. Our deep learning model was implemented using TensorFlow on an Ubuntu machine (the code will be released publicly once the manuscript is accepted for publication via https://github.com/HeyeSYSU/AHDC). It was trained and tested using an Nvidia RTX 8000 GPU (48 GB GPU memory).
The coefficients and , used to balance the adversarial loss and the reconstruction loss, were automatically learned based on the strategy of uncertainty [50]. The coefficient was dynamically changed over time with the function of . The coefficients , and were set to the values of , and , respectively.
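A loss coefficient that changes over training time is often implemented as a ramp-up schedule. The Gaussian ramp-up below is a common choice in consistency-based methods such as [27]; it is an assumption made for illustration, and the schedule and constants used by AHDC may differ. The names `gaussian_rampup`, `rampup_steps` and `max_weight` are hypothetical.

```python
import math


def gaussian_rampup(step, rampup_steps, max_weight=1.0):
    """Weight that grows smoothly from max_weight * exp(-5) to max_weight."""
    if rampup_steps <= 0:
        return max_weight
    t = min(max(step, 0), rampup_steps) / rampup_steps
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)


# Weight of a consistency term at a few points of a 100-step ramp-up.
weights = [gaussian_rampup(step, rampup_steps=100) for step in (0, 25, 50, 100, 200)]
```

Starting with a small weight keeps unreliable early predictions from dominating the supervised signal; the weight then saturates once the ramp-up period ends.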
(3) Evaluation criteria: To evaluate the segmentation performance, we used region-based metrics [51, 52], e.g., the Dice Similarity Coefficient (DSC) and the Jaccard Index (JI), to validate the predicted segmentation map against the manually defined ground truth. We also used a surface-based metric called Average Surface Distance (ASD), which provides the distance in mm, to quantify the accuracy of the predicted mesh compared to the ground-truth mesh [52].
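The two region-based metrics can be computed from binary masks as in the following sketch. The function name and the flat-list mask representation are illustrative assumptions, and returning 1.0 when both masks are empty is a convention chosen here rather than one stated in the text.

```python
def dice_and_jaccard(pred, truth):
    """Return (DSC, JI) for two binary masks given as flat sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_truth = sum(1 for t in truth if t)
    union = n_pred + n_truth - inter
    if union == 0:
        # Both masks are empty; treat this as perfect agreement.
        return 1.0, 1.0
    return 2.0 * inter / (n_pred + n_truth), inter / union


# Two foreground voxels in each mask, one of them shared.
dsc, ji = dice_and_jaccard([1, 1, 0, 0], [1, 0, 1, 0])
```

The two metrics are monotonically related through JI = DSC / (2 - DSC), so they rank methods identically and differ only in scale.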


Method  C5 (CT) supports C2 (MR)  C2 (MR) supports C5 (CT)  

DSC  JI  ASD (mm)  DSC  JI  ASD (mm)  
Upper Bound        
Baseline  
MT  
UAMT  
DTC  
DualTeacher  
Jointtraining  
AHDC 
5 Results and Analysis
In this section, we demonstrate the results of the above mentioned experiments to validate our proposed AHDC for the crossdomain semisupervised segmentation.
5.1 The Feasibility Analysis of AHDC for Generalising Across Domains
TABLE 3 and TABLE 4 summarise the quantitative segmentation results of AHDC on multicentre data and multimodality data. Our proposed AHDC obtains consistent improvements in terms of DSC, JI and ASD over the baselines. Furthermore, as summarised in TABLE 5, AHDC obtains consistent improvements over fully supervised learning under the , , labelled data settings. Fig. 5 and Fig. 6 provide the 2D and 3D qualitative LAs estimated by AHDC compared to the ground truth. AHDC is able to segment the LA accurately. These quantitative and qualitative results indicate the feasibility of our proposed AHDC for generalising across domains.
5.2 The Superiority Analysis of AHDC for Generalising Across Domains
TABLE 3 and TABLE 4 summarise the comparison results on multicentre data and multimodality data combined in pairs. The widely used semisupervised MT method improves the LA segmentation accuracy compared to the baseline. Adding uncertainty information to MT improves its performance further (UAMT). The DTC method improves the segmentation accuracy again, indicating the effectiveness of dual task consistency for semisupervised learning. Although these methods are able to mine effective information from unlabelled data to support task learning, they have no proper mechanism to exploit crossdomain information, which limits their segmentation results. Compared to these methods, DualTeacher leverages two teacher models to guide a student model in learning both intradomain and interdomain knowledge, and thus achieves large improvements in segmentation accuracy. Notably, our proposed AHDC obtains the best segmentation accuracy among these widely used and state-of-the-art semisupervised methods, which shows its superiority for generalising across domains. Furthermore, AHDC generally improves the segmentation accuracy compared to joint training, which combines the crossdomain data directly for semisupervised LA segmentation. This demonstrates that AHDC can leverage crossdomain information to improve model performance. We also provide a qualitative comparison between different methods in Fig. 5. The LAs estimated by the other methods present fragmentary parts and rough boundaries, whereas the LAs estimated by our proposed method are closer to the ground truth and have smoother boundaries.
5.3 Ablation Studies
We performed ablation studies on and (C1 supports C2) to validate the effectiveness of our proposed AHDC for the crossdomain semisupervised segmentation.
Method  Rate  Metrics  
L2/U2 (%)  L1/U1 (%)  DSC  JI  ASD  
Upper Bound  
Baseline  
AHDC  
Baseline  
AHDC  
Baseline  
AHDC 
(1) Model variation study for bidirectional adversarial inference: As summarised in TABLE 6, bidirectional adversarial inference with bidirectional reconstruction improves the LA segmentation accuracy in terms of DSC, JI and ASD compared with the BAI/ALI/BiGAN variant. The reason is that bidirectional reconstruction specifies and constrains the relationship between matched samples. It guarantees a one-to-one correspondence between matched samples, which subsequent hierarchical dual consistency learning on crossdomain data relies on. The segmentation accuracy also drops when the skip connection is removed from the domain mapping network. The reason is that the domain mapping network (U-Net structure) employs the skip connection to deliver low-level information. This allows the samples adapted to another domain to maintain the same LA structures, which makes subsequent dual consistency learning effective. Furthermore, our proposed BAI performs better on the downstream semisupervised LA segmentation task than the fully adversarial ALICE method, which indicates the superiority of our proposed BAI.
Method  Metrics  

DSC  JI  ASD  
Lower Bound  
BAI + HDC  
BAI + HDC  
ALICE + HDC  
BAI + HDC 