Adaptive Hierarchical Dual Consistency for Semi-Supervised Left Atrium Segmentation on Cross-Domain Data

09/17/2021 · Jun Chen, et al. · Imperial College London, Sun Yat-sen University

Semi-supervised learning is of great significance for left atrium (LA) segmentation when labelled data are insufficient. Generalising semi-supervised learning to cross-domain data is of high importance for further improving model robustness. However, the widely existing distribution difference and sample mismatch between different data domains hinder such generalisation. In this study, we alleviate these problems by proposing an Adaptive Hierarchical Dual Consistency (AHDC) framework for semi-supervised LA segmentation on cross-domain data. The AHDC mainly consists of a Bidirectional Adversarial Inference module (BAI) and a Hierarchical Dual Consistency learning module (HDC). The BAI overcomes the distribution difference and the sample mismatch between two different domains: it adversarially learns two mapping networks to obtain two matched domains through mutual adaptation. The HDC investigates a hierarchical dual learning paradigm for cross-domain semi-supervised segmentation based on the obtained matched domains: it builds two dual-modelling networks to mine complementary information both intra-domain and inter-domain. For intra-domain learning, a consistency constraint is applied to the dual modelling targets to exploit complementary modelling information. For inter-domain learning, a consistency constraint is applied to the LAs modelled by the two dual-modelling networks to exploit complementary knowledge among different data domains. We demonstrated the performance of our proposed AHDC on four 3D late gadolinium enhancement cardiac MR (LGE-CMR) datasets from different centres and a 3D CT dataset. Compared with other state-of-the-art methods, our proposed AHDC achieved higher segmentation accuracy, indicating its capability for cross-domain semi-supervised LA segmentation.


1 Introduction

Semi-supervised learning is of great significance for left atrium (LA) segmentation model learning when labelled data are insufficient. Automated and accurate LA segmentation is a crucial task for aiding the diagnosis and treatment of patients with atrial fibrillation (AF) [1, 2, 3, 4]. Deep learning based approaches have great potential for LA segmentation [5, 6]. However, it is expensive and laborious for experienced experts to annotate the large amounts of data needed to train an accurate deep learning based LA segmentation model [7]. Semi-supervised learning can alleviate the need for labelled data by effectively exploiting unlabelled data during model learning [8]; it is therefore able to overcome the shortage of labelled data and advance accurate LA segmentation, benefiting the subsequent diagnosis and treatment of patients with AF.

Generalising semi-supervised learning to cross-domain data for LA segmentation is of high importance for improving model robustness. Semi-supervised learning aims to mine effective hidden information from unlabelled data to support model learning [9]. Because of noise interference and the limited collection capabilities of data sources, a single data domain cannot always provide sufficient high-quality unlabelled data and abundant data characteristics for robust semi-supervised LA segmentation. For example, a single data domain usually offers only limited variety in LA contrast, shape and texture for robust model learning. Compared to a single data domain, cross-domain data not only provide more available high-quality data, but also provide complementary domain information and more comprehensive data characteristics to describe the LA of interest [10]. Therefore, it is important to effectively combine cross-domain data for robust semi-supervised LA segmentation.

However, generalising semi-supervised learning to cross-domain data is difficult because of the distribution difference and the sample mismatch, as shown in Fig. 1: (1) Difference of cross-domain data distributions. Semi-supervised learning with generative models, low-density separation or graph-based methods can work, but relies on a consistent data distribution under certain model assumptions, including the smoothness, cluster or manifold assumptions [9]. Performance degradation of the semi-supervised model may occur whenever the assumptions adopted for a particular task do not match the characteristics of the data distribution [9]. In the real world, cross-domain data collected from different sources exhibit heterogeneous properties [11], which leads to differences in distribution. For example, in medical image analysis, the distributions of cross-domain data differ because of different subject groups, scanners, or scanning protocols [12]. Therefore, directly generalising semi-supervised learning to cross-domain data is not trivial. (2) Sample mismatch of cross-domain data. Disagreement-based semi-supervised learning requires matched samples from different domains, where the information of different domains is regarded as different characteristics of the matched samples [13]. Since the collection of cross-domain data is independent, samples in different domains are not matched, which restricts the cross-domain generalisation of semi-supervised learning.

Figure 1: Our proposed adaptive hierarchical dual consistency overcomes the difference of data distributions and the sample mismatch between different domains for cross-domain semi-supervised segmentation.

To overcome the issues mentioned above, we propose an Adaptive Hierarchical Dual Consistency framework, called AHDC, for semi-supervised LA segmentation on cross-domain data, as shown in Fig. 1. The AHDC consists of two modules: (1) A Bidirectional Adversarial Inference module (BAI), which performs mutual domain adaptation to align distributions and match samples between two different data domains. The adapted domains and the two corresponding source domains are merged to obtain two matched domains. The obtained matched domains not only expand the number of samples in each source domain, but also provide complementary representations for the samples in that domain. (2) A Hierarchical Dual Consistency learning module (HDC), which performs hierarchical semi-supervised segmentation with dual consistency on the obtained matched domains. The HDC builds two dual-modelling networks, applied to the matched domains, for mining complementary information both intra-domain and inter-domain. Within a specific domain, the segmentation task is represented as global modelling and local modelling, and a consistency constraint between the complementary modelled LAs drives intra-domain semi-supervised learning. For the inter-domain part, we build a consistency constraint between the outputs of the dual-modelling networks estimated from different domains to exploit complementary domain information.

Our main contributions are summarised as follows:

  • We propose a semi-supervised LA segmentation framework for generalising across domains. It provides a solution for generalising semi-supervised LA segmentation to cross-domain data that remains effective under both distribution differences and sample mismatch.

  • We propose a hierarchical dual consistency learning paradigm to mine effective information both inter-domain and intra-domain. It explicitly enforces consistency under complementary information.

  • We have conducted comprehensive experiments on four 3D MR datasets from different centres and one 3D CT dataset. The experimental results demonstrated the feasibility and superiority of our proposed cross-domain semi-supervised segmentation framework.

2 Related Work

2.1 Domain Adaptation

Domain adaptation, which aims to overcome the distribution difference between domains, has drawn great attention in computer vision [14]. Because the generative adversarial network (GAN) has great superiority in capturing data distributions, it has been widely used in domain adaptation for aligning the distributions of different domains [15, 16, 17, 18, 19]. Different GAN-based structures exist for achieving domain adaptation. For single-direction domain adaptation, a GAN usually leverages a generator and a discriminator to move the distribution of the source domain towards that of the target domain by adversarial learning. To handle high-resolution images with an emphasis on pixel-level reconstruction, Pix2pixHD extends conditional GANs with a decomposed generator and three multi-scale discriminators [20]. For bidirectional domain adaptation, CycleGAN [21], DualGAN [22] and DiscoGAN [23] concatenate two generators with two discriminators to enforce two cycle-consistency constraints between two different domains. ALI [24] and BiGAN [25] employ two generators and a discriminator to match the joint distributions of different domains. However, ALI and BiGAN do not focus on pixel-level reconstruction, and thus cannot effectively capture the position, colour and style of targets. ALICE extends ALI by exploiting cycle-consistency to focus on pixel-level reconstruction of the target domain [26]; it also proposes to enforce cycle-consistency using fully adversarial learning with an extra discriminator. Our domain adaptation method is based on the ALICE framework. We extend it to focus on bidirectional pixel-level reconstruction for two domains simultaneously. To reduce computing resources and training difficulty compared with fully adversarial learning, we adopt explicit cycle-consistency, thus requiring only two generators and one discriminator for bidirectional domain adaptation with pixel-level reconstruction.

2.2 Semi-supervised Learning

Semi-supervised learning alleviates the problem of the lack of labelled data. Here we only discuss related consistency-based and disagreement-based semi-supervised learning; more information about semi-supervised learning can be found in [9]. Consistency-based methods constrain prediction consistency under different perturbations and ensembles. For example, the Π-model enforces prediction consistency under input perturbations with different Gaussian noise and model perturbation with dropout [27]. Unsupervised data augmentation (UDA) replaces traditional noise perturbations with high-quality data augmentations (e.g., RandAugment, back-translation and TF-IDF based word replacement) to improve consistency learning [28]. FixMatch applies a separate weak augmentation and a strong augmentation to the input data for consistency regularisation [29]. In contrast to these methods, Temporal Ensembling (TE) penalises the inconsistency between the current prediction and an exponential moving average (EMA) of previous predictions [27]. Compared with TE, the Mean Teacher averages the weights of a base model instead [30]. However, these methods need multiple inference passes to provide predictions for consistency learning, and are therefore subject to additional computational cost.
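As a concrete illustration of the teacher-student idea discussed above, the following minimal TensorFlow sketch pairs an EMA weight update with a perturbation-based consistency loss. It is a generic Mean Teacher sketch under the assumption of two structurally identical tf.keras models, `student` and `teacher`, and an unlabelled batch `x_u`; it is not the implementation of any of the cited papers.

```python
import tensorflow as tf

def ema_update(teacher: tf.keras.Model, student: tf.keras.Model, alpha: float = 0.99):
    """Mean Teacher update: teacher weights become an exponential
    moving average of the student weights."""
    for t_w, s_w in zip(teacher.weights, student.weights):
        t_w.assign(alpha * t_w + (1.0 - alpha) * s_w)

def consistency_loss(student, teacher, x_u, noise_std=0.1):
    """Penalise disagreement between student and teacher predictions
    on the same unlabelled batch under Gaussian input perturbation."""
    noise = tf.random.normal(tf.shape(x_u), stddev=noise_std)
    p_s = student(x_u + noise, training=True)
    p_t = tf.stop_gradient(teacher(x_u, training=False))  # teacher gives targets only
    return tf.reduce_mean(tf.square(p_s - p_t))
```

In this scheme only the student receives gradients; the teacher is refreshed by `ema_update` after each optimiser step, which is what makes the extra inference pass necessary.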

The disagreement-based semi-supervised learning exploits the disagreement among predictions from multiple task learners during the learning process [13], including co-training and co-regularisation. Co-training leverages two sufficient and redundant views of the data to train two task models that annotate the unlabelled data; the unlabelled data with high prediction confidence are then added to the training set to further improve the models [31, 32]. Co-regularisation directly minimises the prediction disagreement on unlabelled samples across different views [33].

Figure 2: Overview of our proposed AHDC framework for cross-domain semi-supervised segmentation. The framework consists of a bidirectional adversarial inference (BAI) module and a hierarchical dual consistency learning (HDC) module. The BAI module employs two mapping networks to perform a mutual adaptation of two different domains $\mathcal{A}$ and $\mathcal{B}$ to obtain the matched domains $\widehat{\mathcal{A}}$ and $\widehat{\mathcal{B}}$. The HDC module applies two dual-modelling networks to the matched domains to perform the semi-supervised segmentation tasks. Each dual-modelling network contains a global-modelling branch ($S_A^{g}$/$S_B^{g}$) used to capture the global correlation of feature maps to estimate the LA ($\hat{y}^{g}$), and a local-modelling branch ($S_A^{l}$/$S_B^{l}$) used to capture the local correlation of feature maps to estimate the LA ($\hat{y}^{l}$). Intra-domain, a consistency is enforced between $\hat{y}^{g}$ and $\hat{y}^{l}$ estimated by the complementary modellings; inter-domain, a consistency is enforced between the LAs estimated by the complementary domain networks.
Notation | Definition
$\mathcal{A}$, $\mathcal{B}$ | domains from source 1 and source 2
$\mathcal{A}'$ | domain adapted from $\mathcal{B}$ to $\mathcal{A}$
$\mathcal{B}'$ | domain adapted from $\mathcal{A}$ to $\mathcal{B}$
$\mathcal{A}^{l}$, $\mathcal{A}^{u}$ | labelled domain, unlabelled domain
$G_{A\to B}$, $G_{B\to A}$ | mutual mapping nets of $\mathcal{A}$ and $\mathcal{B}$
$S_A$, $S_B$ | dual-modelling nets
$S^{l}$, $S^{g}$ | local net, global net
$D$ | discriminator
$x_a$, $x_b$ | images from $\mathcal{A}$, $\mathcal{B}$
$\hat{y}^{l}$, $\hat{y}^{g}$ | estimated LAs from $S^{l}$, $S^{g}$
$\tilde{x}_a$, $\tilde{x}_b$ | reconstructions of $x_a$, $x_b$
$(x^{l}, y)$, $x^{u}$ | labelled data, unlabelled data
$p(x_a, x_b)$ | joint distribution of $x_a$, $x_b$
$y$ | ground truth
$p(x_a)$, $p(x_b)$, $p(\hat{x}_a)$, $p(\hat{x}_b)$ | marginal distributions
$p_\theta(x_b \mid x_a)$, $p_\phi(x_a \mid x_b)$ | parameterised conditional distributions
$\theta$, $\phi$ | params of $G_{A\to B}$, $G_{B\to A}$
$\omega$ | param of $D$
$\eta_A$, $\eta_A^{l}$, $\eta_A^{g}$ | params of $S_A$ in the modules of feature, local-modelling, global-modelling
$\eta_B$, $\eta_B^{l}$, $\eta_B^{g}$ | params of $S_B$ in the modules of feature, local-modelling, global-modelling
$\mathcal{L}$ | loss function
$\lambda$, $\alpha$, $\beta$, $\gamma$ | weight params
Table 1: Summary of notations

3 Method

3.1 Overview

The overview of our proposed AHDC framework is illustrated in Fig. 2, and the notations are summarised in TABLE 1. The AHDC framework consists of two modules: a BAI module and an HDC module. We are given two different data domains, denoted $\mathcal{A}$ and $\mathcal{B}$. $\mathcal{A}$ contains both labelled data $\mathcal{A}^{l}$ and unlabelled data $\mathcal{A}^{u}$, with $N_l$ labelled samples and $N_u$ unlabelled samples, respectively. $\mathcal{B}$ contains only unlabelled data, with $N_b$ unlabelled samples. The BAI module employs two mapping networks, $G_{A\to B}$ and $G_{B\to A}$, to generate complementary domains by adapting $\mathcal{A}$ and $\mathcal{B}$ to each other, where the domain adapted from $\mathcal{A}$ to $\mathcal{B}$ is denoted $\mathcal{B}'$ and the domain adapted from $\mathcal{B}$ to $\mathcal{A}$ is denoted $\mathcal{A}'$. The target domains ($\mathcal{A}$ and $\mathcal{B}$) and the corresponding adapted domains ($\mathcal{A}'$ and $\mathcal{B}'$) are then merged to form two matched domains, $\widehat{\mathcal{A}}$ and $\widehat{\mathcal{B}}$. Finally, two dual-modelling networks, $S_A$ and $S_B$, are fed with matched samples drawn from $\widehat{\mathcal{A}}$ and $\widehat{\mathcal{B}}$ to predict LAs, where the LAs predicted by the local and global modelling of $S_A$ are denoted $\hat{y}_A^{l}$ and $\hat{y}_A^{g}$, and those predicted by the local and global modelling of $S_B$ are denoted $\hat{y}_B^{l}$ and $\hat{y}_B^{g}$, respectively.

3.2 Bidirectional Adversarial Inference for Distribution Alignment and Sample Matching

Consider an $\mathcal{A}$-to-$\mathcal{B}$ domain mapping network $G_{A\to B}$ and a $\mathcal{B}$-to-$\mathcal{A}$ domain mapping network $G_{B\to A}$. We denote the two domain marginal distributions of $x_a$ and $x_b$ as $p(x_a)$ and $p(x_b)$. One domain can be inferred from the other using the parameterised conditional distributions $p_\theta(x_b \mid x_a)$ and $p_\phi(x_a \mid x_b)$, where $\theta$ and $\phi$ denote the parameters of the two distributions. We then have two joint distributions over $(x_a, x_b)$:

$$p_\theta(x_a, x_b) = p(x_a)\,p_\theta(x_b \mid x_a), \qquad p_\phi(x_a, x_b) = p(x_b)\,p_\phi(x_a \mid x_b).$$

We aim to match $\mathcal{A}'$ to $\mathcal{A}$ and $\mathcal{B}'$ to $\mathcal{B}$ by matching $p_\theta(x_a, x_b)$ and $p_\phi(x_a, x_b)$. We therefore use a discriminator network $D$, parameterised by $\omega$, to penalise mismatches between the two joint distributions. Specifically, we consider the following objective:

$$\min_{\theta,\phi}\max_{\omega}\ \mathcal{L}_{adv} = \mathbb{E}_{x_a\sim p(x_a)}\big[\log\sigma\big(D_\omega(x_a, G_{A\to B}(x_a))\big)\big] + \mathbb{E}_{x_b\sim p(x_b)}\big[\log\big(1-\sigma\big(D_\omega(G_{B\to A}(x_b), x_b)\big)\big)\big], \tag{1}$$

where $\sigma$ denotes the sigmoid function.

Intuitively, if the optimum of equation (1) is achieved, $p_\theta(x_a, x_b)$ and $p_\phi(x_a, x_b)$ match each other, which implies not only that $\mathcal{A}'$ and $\mathcal{A}$ match each other, but also that $\mathcal{B}'$ and $\mathcal{B}$ match each other. However, the relationship between the random variables $x_a$ and $x_b$ is not specified or constrained by equation (1). In order to obtain paired samples, following [26], we extend the conditional entropies from a single constraint to the bidirectional constraints $H(x_a \mid x_b)$ and $H(x_b \mid x_a)$, which imposes constraints on the conditionals $p_\phi(x_a \mid x_b)$ and $p_\theta(x_b \mid x_a)$ simultaneously. Because there is no explicit distribution with which to compute the conditional entropies, following [26] we bound them using the cycle-consistency ($\tilde{x}_a$ and $\tilde{x}_b$):

$$H(x_a \mid x_b) = \mathbb{E}\big[-\log p_\phi(x_a \mid x_b)\big] - \mathbb{E}\big[\mathrm{KL}\big(p(x_a \mid x_b)\,\|\,p_\phi(x_a \mid x_b)\big)\big] \le \mathbb{E}\big[-\log p_\phi(x_a \mid x_b)\big]. \tag{2}$$

Similarly,

$$H(x_b \mid x_a) = \mathbb{E}\big[-\log p_\theta(x_b \mid x_a)\big] - \mathbb{E}\big[\mathrm{KL}\big(p(x_b \mid x_a)\,\|\,p_\theta(x_b \mid x_a)\big)\big] \le \mathbb{E}\big[-\log p_\theta(x_b \mid x_a)\big], \tag{3}$$

where $\tilde{x}_a = G_{B\to A}(G_{A\to B}(x_a))$ and $\tilde{x}_b = G_{A\to B}(G_{B\to A}(x_b))$ denote the reconstructions of $x_a$ and $x_b$, and KL denotes the Kullback-Leibler divergence. According to the definitions of $\tilde{x}_a$ and $\tilde{x}_b$, on the one hand we have a function defined by $G_{B\to A}\circ G_{A\to B}$, which first generates $x_b$ from $x_a$ based on $p_\theta(x_b \mid x_a)$ and then reproduces $x_a$ from the generated $x_b$; on the other hand, we have a function defined by $G_{A\to B}\circ G_{B\to A}$, which first generates $x_a$ from $x_b$ based on $p_\phi(x_a \mid x_b)$ and then reproduces $x_b$ from the generated $x_a$. In contrast to fully adversarial training for bounding $H(x_a \mid x_b)$ and $H(x_b \mid x_a)$, we employ reconstruction losses to reduce the difficulty of model training. Specifically, we consider the following objectives:

$$\mathcal{L}_{cyc}^{A} = \mathbb{E}_{x_a\sim p(x_a)}\big[\lVert x_a - \tilde{x}_a \rVert_{1}\big], \tag{4}$$
$$\mathcal{L}_{cyc}^{B} = \mathbb{E}_{x_b\sim p(x_b)}\big[\lVert x_b - \tilde{x}_b \rVert_{1}\big], \tag{5}$$

where $\lVert\cdot\rVert_{1}$ denotes the mean absolute error. Finally, we have the following objective for the BAI:

$$\mathcal{L}_{BAI} = \mathcal{L}_{adv} + \lambda_{1}\mathcal{L}_{cyc}^{A} + \lambda_{2}\mathcal{L}_{cyc}^{B}, \tag{6}$$

where $\lambda_{1}$ and $\lambda_{2}$ are hyperparameters that balance the adversarial loss and the reconstruction losses.
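To make these objectives concrete, the following TensorFlow sketch assembles Eq. (1) and Eqs. (4)-(6) for one batch. The names `G_ab`, `G_ba` and `D` are placeholders for the two U-Net mapping networks and the joint discriminator, the default weights are illustrative, and the paper's exact architectures and update schedule may differ; this is a minimal sketch, not the reference implementation.

```python
import tensorflow as tf

# from_logits=True folds the sigma(.) of Eq. (1) into the loss.
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def bai_losses(G_ab, G_ba, D, x_a, x_b, lam1=1.0, lam2=1.0):
    """BAI losses: joint-distribution adversarial term (Eq. (1)) plus explicit
    bidirectional cycle reconstruction (Eqs. (4)-(5)), combined as in Eq. (6).
    lam1/lam2 defaults are placeholders; the paper learns them automatically."""
    fake_b = G_ab(x_a, training=True)   # a sample from p_theta(x_b | x_a)
    fake_a = G_ba(x_b, training=True)   # a sample from p_phi(x_a | x_b)
    # The discriminator sees the two joints: (x_a, G_ab(x_a)) vs (G_ba(x_b), x_b).
    d_ab = D(tf.concat([x_a, fake_b], axis=-1), training=True)
    d_ba = D(tf.concat([fake_a, x_b], axis=-1), training=True)
    # D separates the two joints; the generators try to swap its labels.
    d_loss = bce(tf.ones_like(d_ab), d_ab) + bce(tf.zeros_like(d_ba), d_ba)
    g_adv = bce(tf.zeros_like(d_ab), d_ab) + bce(tf.ones_like(d_ba), d_ba)
    # Explicit cycle-consistency with mean absolute error (Eqs. (4)-(5)).
    rec_a = tf.reduce_mean(tf.abs(x_a - G_ba(fake_b, training=True)))
    rec_b = tf.reduce_mean(tf.abs(x_b - G_ab(fake_a, training=True)))
    g_loss = g_adv + lam1 * rec_a + lam2 * rec_b   # Eq. (6)
    return g_loss, d_loss
```

The generator and discriminator losses would be minimised alternately with separate optimisers, as in standard adversarial training.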

Figure 3: Structure of the bidirectional adversarial inference network. The mapping networks $G_{A\to B}$ and $G_{B\to A}$ have the same structure.

3.3 Hierarchical Dual Consistency for Semi-supervised Segmentation

The BAI makes the cross-domain data adapt to each other to produce matched domains. In detail, the domain adapted to source $\mathcal{B}$ is denoted $\mathcal{B}'$, with $N_l$ labelled samples and $N_u$ unlabelled samples (inherited from $\mathcal{A}$), and the domain adapted to source $\mathcal{A}$ is denoted $\mathcal{A}'$, with $N_b$ unlabelled samples. We then merge the two source domains and the two adapted domains to obtain the matched domains $\widehat{\mathcal{A}} = \mathcal{A}\cup\mathcal{A}'$ and $\widehat{\mathcal{B}} = \mathcal{B}\cup\mathcal{B}'$, each containing $N_l$ labelled samples and $N_u + N_b$ unlabelled samples. We denote the two domain marginal distributions of $\widehat{\mathcal{A}}$ and $\widehat{\mathcal{B}}$ as $p(\hat{x}_a)$ and $p(\hat{x}_b)$, respectively, and the joint distribution of matched samples as $p(\hat{x}_a, \hat{x}_b)$.

Based on the matched domains, we investigate complementary LA modelling and complementary domain knowledge learning to provide inherent prediction perturbation for consistency-based cross-domain semi-supervised learning; hence a hierarchical dual consistency is investigated. Specifically, for the intra-domain part, we consider two dual-modelling networks, $S_A$ parameterised by $\eta_A$ and $S_B$ parameterised by $\eta_B$, applied to the matched domains $\widehat{\mathcal{A}}$ and $\widehat{\mathcal{B}}$, respectively. Each dual-modelling network estimates two targets by considering the local and global information of an image: $S_A$ simultaneously performs global modelling $S_A^{g}$, parameterised by $\eta_A^{g}$, and local modelling $S_A^{l}$, parameterised by $\eta_A^{l}$; similarly, $S_B$ simultaneously performs global modelling $S_B^{g}$, parameterised by $\eta_B^{g}$, and local modelling $S_B^{l}$, parameterised by $\eta_B^{l}$. We then encourage the global and local modelling of each dual-modelling network to predict consistent targets via the consistency losses:

$$\mathcal{L}_{intra}^{A} = \mathcal{L}_{Dice}\big(\hat{y}_A^{g}, \hat{y}_A^{l}\big), \tag{7}$$
$$\mathcal{L}_{intra}^{B} = \mathcal{L}_{Dice}\big(\hat{y}_B^{g}, \hat{y}_B^{l}\big), \tag{8}$$

where $\mathcal{L}_{Dice}$ denotes the dice loss function. For the dual consistency in the inter-domain part, we maximise the agreement on the two matched domains. We therefore encourage $S_A$ and $S_B$ to predict similar outputs for matched samples by:

$$\mathcal{L}_{inter} = \mathcal{L}_{CE}\big(S_A(\hat{x}_a),\, S_B(\hat{x}_b)\big), \tag{9}$$

where $\mathcal{L}_{CE}$ denotes the cross-entropy loss function. To avoid $S_A$ and $S_B$ gradually resembling each other, we encourage $S_A$ and $S_B$ to produce conditionally independent features by orthogonalising the weights of their feature layers:

$$\mathcal{L}_{orth} = \sum_{i=1}^{N}\frac{1}{M_i}\,\big\lVert \mathbf{W}_{A,i}^{\top}\mathbf{W}_{B,i} \big\rVert_{F}^{2}, \tag{10}$$

where $N$ denotes the number of feature layers in $S_A$ and $S_B$, $M_i$ represents the number of features in the $i$-th layer, and $\mathbf{W}_{A,i}$ and $\mathbf{W}_{B,i}$ denote the parameters of the $i$-th feature layer in $S_A$ and $S_B$, respectively.
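A minimal sketch of one way to implement the orthogonal weight constraint of Eq. (10). It assumes `feat_weights_a` and `feat_weights_b` are lists of corresponding convolutional kernels from the feature extractors of $S_A$ and $S_B$; the exact flattening and normalisation used in the paper may differ.

```python
import tensorflow as tf

def orthogonal_weight_loss(feat_weights_a, feat_weights_b):
    """Penalise correlation between corresponding feature-layer weights of the
    two dual-modelling networks so that they learn complementary features."""
    loss = 0.0
    for w_a, w_b in zip(feat_weights_a, feat_weights_b):
        # Flatten each kernel to (fan_in, num_features).
        wa = tf.reshape(w_a, [-1, w_a.shape[-1]])
        wb = tf.reshape(w_b, [-1, w_b.shape[-1]])
        gram = tf.matmul(wa, wb, transpose_a=True)      # cross-Gram matrix
        m_i = tf.cast(tf.shape(gram)[0], tf.float32)    # features in layer i
        loss += tf.reduce_sum(tf.square(gram)) / m_i    # one term of Eq. (10)
    return loss
```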

Beyond the consistency learning above, $S_A$ and $S_B$ can explicitly learn from $\widehat{\mathcal{A}}$ and $\widehat{\mathcal{B}}$ with the supervision of the labels:

$$\mathcal{L}_{sup}^{A} = \mathcal{L}_{seg}\big(\hat{y}_A^{g}, y\big) + \mathcal{L}_{seg}\big(\hat{y}_A^{l}, y\big), \tag{11}$$
$$\mathcal{L}_{sup}^{B} = \mathcal{L}_{seg}\big(\hat{y}_B^{g}, y\big) + \mathcal{L}_{seg}\big(\hat{y}_B^{l}, y\big), \tag{12}$$

where $y$ denotes the LA label and $\mathcal{L}_{seg}$ denotes the supervised loss functions (cross-entropy loss and dice loss). The final training objective for the learning of $S_A$ and $S_B$ is then:

$$\mathcal{L}_{HDC} = \mathcal{L}_{sup}^{A} + \mathcal{L}_{sup}^{B} + \alpha_{1}\mathcal{L}_{intra}^{A} + \alpha_{2}\mathcal{L}_{intra}^{B} + \beta\,\mathcal{L}_{inter} + \gamma\,\mathcal{L}_{orth}, \tag{13}$$

where $\alpha_{1}$, $\alpha_{2}$, $\beta$ and $\gamma$ are hyperparameters that balance the loss terms.
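The sketch below assembles Eq. (13) for one labelled batch of matched pairs, assuming the four branch outputs are probability maps; for unlabelled batches the supervised terms are dropped and the consistency terms kept. The helper names and the use of the local branches for the inter-domain term are illustrative readings, not the paper's exact formulation.

```python
import tensorflow as tf

def dice_loss(p, q, eps=1e-6):
    """Soft dice distance between two probability maps (Eqs. (7)-(8), (11)-(12))."""
    inter = tf.reduce_sum(p * q)
    return 1.0 - (2.0 * inter + eps) / (tf.reduce_sum(p) + tf.reduce_sum(q) + eps)

def seg_loss(pred, y):
    """Supervised loss: cross-entropy plus dice, as in Eqs. (11)-(12)."""
    ce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y, pred))
    return ce + dice_loss(pred, y)

def hdc_objective(ya_g, ya_l, yb_g, yb_l, y, alpha1, alpha2, beta, gamma, l_orth):
    """Total HDC objective (Eq. (13)) on a labelled batch of matched pairs."""
    l_sup_a = seg_loss(ya_g, y) + seg_loss(ya_l, y)   # Eq. (11)
    l_sup_b = seg_loss(yb_g, y) + seg_loss(yb_l, y)   # Eq. (12)
    l_intra_a = dice_loss(ya_g, ya_l)                 # Eq. (7)
    l_intra_b = dice_loss(yb_g, yb_l)                 # Eq. (8)
    # Eq. (9): cross-entropy agreement between the two domain networks
    # (here taken between the local-branch outputs as one possible reading).
    l_inter = tf.reduce_mean(tf.keras.losses.binary_crossentropy(ya_l, yb_l))
    return (l_sup_a + l_sup_b + alpha1 * l_intra_a + alpha2 * l_intra_b
            + beta * l_inter + gamma * l_orth)
```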

Figure 4: Dual-modelling network for intra-consistency learning. The local-modelling branch and the global-modelling branch share a feature extractor. For the global-modelling branch, the feature maps extracted from the input images are split into patches, and these patches are taken as a sequence of vectors fed to a self-attention based global-modelling structure.

3.4 Network Configuration

The BAI module contains three subnetworks: two domain mapping networks ($G_{A\to B}$, $G_{B\to A}$) and a discriminative network $D$. We use the 2D U-Net with bilinear upsampling as the network backbone of both $G_{A\to B}$ and $G_{B\to A}$. $D$ has six convolutional layers: each of the first five is followed by a batch normalisation layer and a ReLU layer, and the final convolutional layer is followed by a sigmoid layer.

The hierarchical dual-modelling structure contains two dual-modelling networks with the same architecture. Each dual-modelling network contains a 2D U-Net with bilinear upsampling, used to extract image features, and two branch networks used to estimate the targets: a global-modelling network and a local-modelling network. The global-modelling network is based on self-attention [34, 35, 36], as shown in Fig. 4; within it, we use sinusoidal position encoding to emphasise the sequential relationship between the input feature patches [37]. The local-modelling network consists of three convolution blocks. The details are shown in Fig. 4.
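For reference, a sketch of the standard sinusoidal position encoding [37] added to the patch sequence in the global-modelling branch; the patch count and feature dimension are illustrative parameters.

```python
import numpy as np

def sinusoidal_position_encoding(num_patches: int, dim: int) -> np.ndarray:
    """Fixed sinusoidal encoding: even channels use sine, odd channels use
    cosine, with geometrically increasing wavelengths across the dimension."""
    pos = np.arange(num_patches)[:, None]                    # (P, 1)
    i = np.arange(dim)[None, :]                              # (1, D)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)   # (P, D)
    enc = np.zeros((num_patches, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

# The encoding is added element-wise to the flattened patch vectors
# before they enter the self-attention layers.
```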

Centres | Acquired Resolution | TE/TR | Scanner | Source | Amount of Data
C1 | | 2.2/5.2 ms | 1.5 Tesla Avanto | Royal Brompton Hospital | 165 LGE-CMR scans
C2 | | 2.3/5.4 ms | 1.5 Tesla Avanto, 3.0 Tesla Verio | CARMA, University of Utah | 153 LGE-CMR scans
C3 | | 2.1/5.3 ms | 1.5T Philips Achieva | Beth Israel Deaconess Medical Center | 20 LGE-CMR scans
C4 | | 2.1/5.3 ms | 1.5T Philips Achieva | Imaging Sciences at King’s College London | 20 LGE-CMR scans
Table 2: Comparison of the four LGE-CMRI datasets from different centres. Abbreviations: TE, Echo Time; TR, Repetition Time; CARMA, Comprehensive Arrhythmia Research and Management.

4 Experiments

4.1 Overview of Experiments

Comprehensive experiments were performed to validate our proposed AHDC.

(1) The feasibility of AHDC for generalising across domains: Our proposed AHDC was validated on four 3D late gadolinium enhancement cardiac MR (LGE-CMR) datasets and a 3D CT dataset, combined in pairs, following an independent validation protocol. Furthermore, we investigated the impact of different ratios of labelled data on our proposed AHDC.

(2) The superiority of AHDC for generalising across domains: We compared AHDC with widely used and state-of-the-art semi-supervised methods on cross-domain data, including the mean teacher (MT) method [30], the uncertainty-aware self-ensembling model (UA-MT) [7], dual-task consistency (DTC) [38] and Dual-Teacher [39]. Note that MT, UA-MT and DTC were proposed for single-domain semi-supervised learning, while Dual-Teacher was proposed for cross-domain learning; moreover, Dual-Teacher requires labelled data from both domains for model learning. For a fair comparison, MT, UA-MT and DTC were performed on one of the matched domains, i.e., $\widehat{\mathcal{A}}$. We also compared with a joint-training method that combines the cross-domain data directly for LA segmentation based on our proposed semi-supervised method.

(3) The effectiveness of the components in AHDC: Firstly, we compared the performance of different architectures of the BAI module. On the one hand, to validate the effectiveness of bidirectional reconstruction for specifying the relationship between matched samples, an experiment was performed on bidirectional adversarial inference without bidirectional reconstruction (BAI-ALI/BiGAN). On the other hand, to validate the effectiveness of the skip connections of the domain mapping network for keeping the target structure consistent, an experiment was performed on bidirectional adversarial inference without skip connections in the domain mapping network (BAI-w/o-skip). Then, we further validated the performance of the BAI by comparing it with the fully adversarial ALICE [26] on the downstream semi-supervised tasks. Finally, to validate the effectiveness of the HDC, we decomposed the HDC into independent intra-domain dual consistency learning (HDC-intra), obtained by removing one dual-modelling network, and inter-domain dual consistency learning (HDC-inter), obtained by removing the global-modelling branch while retaining the local-modelling branch.

(4) The effectiveness of the BAI for matching domains: Firstly, we performed principal component analysis to show the data distributions of the source domains ($\mathcal{A}$ and $\mathcal{B}$) and the adapted domains ($\mathcal{A}'$ and $\mathcal{B}'$). The data distributions of the source domains and the adapted domains were compared to validate the effectiveness of AHDC for aligning distributions. Then, we made a qualitative visualisation of images before and after the bidirectional adversarial inference to validate the effectiveness of AHDC for matching samples.

(5) The effectiveness of the HDC for exploiting complementary information: To validate the availability of complementary modelling information intra-domain, we compared the segmentation performance of the dual-modelling network (local-global modelling structure) with variants that do not use the dual-modelling structure. Specifically, we replaced the local-modelling branch with a second global-modelling branch (global-global modelling structure) and replaced the global-modelling branch with a second local-modelling branch (local-local modelling structure). To validate the availability of complementary domain information inter-domain, we compared the segmentation performance of the HDC with and without the orthogonal weight constraint (w/ OW and w/o OW).

(6) The effects of parameter settings on model performance: We explored two important parameter settings: (i) the impact of different patch sizes for global modelling, and (ii) the impact of different values of the inter-domain consistency weight $\beta$ (0.0, 0.1 and 1.0) for inter-domain learning.

4.2 Datasets

To evaluate the performance of our proposed AHDC, four 3D LGE-MRI datasets (C1, C2, C3 and C4) and a 3D CT dataset (C5) were collected as a retrospective study. In our experiments, the C1 and C2 datasets included segmentations of the LA epicardium and LA endocardium, while the C3, C4 and C5 datasets included segmentations of the LA endocardium. We summarise the characteristics of the four 3D LGE-MRI datasets to emphasise their differences in TABLE 2.

LGE-MRI scanning sequence of centre 1 (C1): Cardiac MR data were acquired in patients with longstanding persistent atrial fibrillation (AF) on a Siemens Magnetom Avanto 1.5T scanner (Siemens Medical Systems, Erlangen, Germany). Transverse navigator-gated 3D LGE-CMRI [40] was performed using an inversion-prepared segmented gradient echo sequence (TE/TR 2.2 ms/5.2 ms) 15 minutes after gadolinium administration (Gadovist-gadobutrol, 0.1 mmol/kg body weight, Bayer-Schering, Berlin, Germany) [41]. The inversion time was set to null the signal from normal myocardium. LGE-CMRI data were acquired during free breathing using a crossed-pairs navigator positioned over the dome of the right hemi-diaphragm, with CLAWS respiratory motion control [42, 43]. The LGE-CMR data were collected from the Royal Brompton Hospital. In total, 165 scans were used in this study.

LGE-MRI scanning sequence of centre 2 (C2): Cardiac MR data were obtained on a 1.5 Tesla Avanto or a 3.0 Tesla Verio scanner (Siemens Medical Solutions, Erlangen, Germany). The scan was acquired 20–25 minutes after administration of 0.1 mmol/kg gadolinium contrast (Multihance, Bracco Diagnostics Inc., Princeton, NJ) using a 3D respiratory-navigated, inversion recovery prepared gradient echo pulse sequence. Typical acquisition parameters were free breathing using navigator gating, a transverse imaging volume, TR/TE = 5.4/2.3 ms, and an inversion time (TI) of 270–310 ms. The TI value for the LGE-MRI scan was identified using a scout scan. Typical scan times for the LGE-MRI study were between 8 and 15 min at 1.5 T and 6–11 min at 3 T (for Siemens sequences), depending on subject respiration and heart rate. The LGE-CMR data were collected from the Comprehensive Arrhythmia Research and Management (CARMA) Center, University of Utah. In total, 153 scans were used in this study.

LGE-MRI scanning sequence of center 3 (C3): C3 is from the ISBI 2012 Left Atrium Fibrosis and Scar Segmentation Challenge [44, 45]. The LGE CMR data were collected from the Beth Israel Deaconess Medical Center. In total, 20 scans were used in this study.

LGE-MRI scanning sequence of center 4 (C4): C4 is also from the ISBI 2012 Left Atrium Fibrosis and Scar Segmentation Challenge [44, 45]. The LGE CMR data were collected from the Imaging Sciences at King’s College. In total, 20 scans were used in this study.

CT scanning sequence of centre 5 (C5): C5 is from the Multi-modality Whole Heart Segmentation (MM-WHS) 2017 dataset [46, 47, 48, 49]. In total, 60 CT scans were used in this study.

4.3 Experimental Setup

(1) Data partitioning: For C1, the 3D LGE-MRI dataset with 165 scans was randomly split into a training set of 99 scans and a testing set of 66 scans (33 pre-ablation and 33 post-ablation). The training set was then randomly split into a labelled training set of 20 scans (20%) and an unlabelled training set of 79 scans (80%). For C2, the 3D LGE-MRI dataset with 153 scans was randomly split into a training set of 91 scans and a testing set of 62 scans (31 pre-ablation and 31 post-ablation). The training set was then randomly split into a labelled training set of 18 scans (20%) and an unlabelled training set of 73 scans (80%). For C3 and C4, each 3D LGE-MRI dataset with 20 scans was randomly split into a training set of 12 scans and a testing set of 8 scans (4 pre-ablation and 4 post-ablation); each training set was then randomly split into a labelled training set of 4 scans and an unlabelled training set of 8 scans. Because C5 provides only 60 CT scans, comprising 20 labelled and 40 unlabelled scans, we randomly selected 15 of the 20 labelled scans as the testing set; the remaining 5 labelled scans (labelled training set) and the 40 unlabelled scans (unlabelled training set) together formed the training set. Since each patient may have multiple 3D LGE-MRI scans, the datasets were split such that all scans from each unique patient appeared in only one of the training or testing sets.
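A minimal sketch of the patient-level splitting strategy described above, assuming scans are given as (patient_id, scan_path) pairs; the split fraction and seed are illustrative, not the paper's.

```python
import random
from collections import defaultdict

def patient_level_split(scans, train_fraction=0.6, seed=42):
    """Split scans so that all scans of one patient land in exactly one subset,
    avoiding patient-level leakage between training and testing sets."""
    by_patient = defaultdict(list)
    for pid, path in scans:
        by_patient[pid].append(path)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)          # reproducible shuffle
    n_train = int(round(train_fraction * len(patients)))
    train = [p for pid in patients[:n_train] for p in by_patient[pid]]
    test = [p for pid in patients[n_train:] for p in by_patient[pid]]
    return train, test
```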

(2) Implementation details: Experiments were performed on the five datasets combined in pairs for the cross-centre study (C1 and C2; C3 and C4) and the cross-modality study (C2 and C5). To reduce the dependence of models on annotated data and to avoid the impact of label variations across centres, there were two experiment settings for each cross-domain pair. Taking the experiments on C1 and C2 as an example: in the first setting, C1 supports C2, and the model was trained using the labelled training set of C2 (18 labelled cases), the unlabelled training set of C2 (73 unlabelled cases) and the whole training set of C1 (99 cases, treated as unlabelled). In the second setting, C2 supports C1, and the model was trained using the labelled training set of C1 (20 labelled cases), the unlabelled training set of C1 (79 unlabelled cases) and the whole training set of C2 (91 cases, treated as unlabelled). We denote the results obtained by the fully supervised model trained with the labelled training set of C1 (20 cases), C2 (18 cases), C3 (4 cases), C4 (4 cases) or C5 (5 cases) as the baseline, and the results obtained by the fully supervised model trained with the whole training set of C1 (99 cases), C2 (91 cases), C3 (12 cases) or C4 (12 cases) as the upper bound.

We pre-processed the data with intensity normalisation, and smaller patches centred on the LA region were cropped. To avoid overfitting, we applied data augmentation with random rotations. The training time of our model is about 17.17 hours, while the testing time for one 3D case is about 0.259 seconds. For the learning of the BAI network, we used the Adam optimiser with a decayed learning rate for the two mapping networks, and Adam with a fixed learning rate for the discriminative network. For the learning of the two dual-modelling networks, we also used the Adam optimiser with a decayed learning rate. The current statistics of batch normalisation were used for both training and testing. All experiments were performed with an independent test set. For the dual consistency learning, in each iteration we first performed the intra-consistency step with both labelled and unlabelled data, then the inter-consistency step with both labelled and unlabelled data, and finally supervised learning with the labelled data. Our deep learning model was implemented in TensorFlow on an Ubuntu machine (the code will be released publicly once the manuscript is accepted for publication via https://github.com/Heye-SYSU/AHDC). It was trained and tested on an Nvidia RTX 8000 GPU (48 GB GPU memory).

The coefficients $\lambda_{1}$ and $\lambda_{2}$, used to balance the adversarial loss and the reconstruction losses, were learned automatically based on the uncertainty strategy [50]. The coefficient $\beta$ was changed dynamically over time with a ramp-up function. The coefficients $\alpha_{1}$, $\alpha_{2}$ and $\gamma$ were set to fixed values.
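The paper's exact ramp function is not reproduced here. Purely as an assumption, a common choice in consistency-based semi-supervised learning is a Gaussian ramp-up of the consistency weight over training, sketched below.

```python
import math

def rampup_beta(step: int, rampup_steps: int, beta_max: float) -> float:
    """Hypothetical Gaussian ramp-up for the inter-domain weight beta:
    starts near 0 and rises smoothly to beta_max, so the inter-domain
    consistency term only dominates once predictions are reasonable."""
    t = min(float(step) / max(rampup_steps, 1), 1.0)
    return beta_max * math.exp(-5.0 * (1.0 - t) ** 2)
```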

(3) Evaluation criteria: To evaluate the segmentation performance, we used region-based metrics [51, 52], i.e., the Dice Similarity Coefficient (DSC) and the Jaccard Index (JI), to validate the predicted segmentation maps against the manually defined ground truth. We also used a surface-based metric, the Average Surface Distance (ASD), which quantifies the distance in millimetres between the predicted mesh and the ground-truth mesh [52].
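For completeness, a small NumPy sketch of the two region-based metrics on binary masks; ASD is omitted since it requires surface extraction from the meshes.

```python
import numpy as np

def dice_and_jaccard(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """Dice Similarity Coefficient and Jaccard Index for binary 3D masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()   # |P intersect G|
    union = np.logical_or(pred, gt).sum()    # |P union G|
    dsc = 2.0 * inter / (pred.sum() + gt.sum() + eps)
    ji = inter / (union + eps)
    return dsc, ji
```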

Method C2 (MR) supports C1 (MR) C1 (MR) supports C2 (MR)
DSC JI ASD (mm) DSC JI ASD (mm)
Upper Bound
Baseline
MT
UA-MT
DTC
Dual-Teacher
Joint-training
AHDC
(a) Experiments on C1 (MR) and C2 (MR)
Method C4 (MR) supports C3 (MR) C3 (MR) supports C4 (MR)
DSC JI ASD (mm) DSC JI ASD (mm)
Upper Bound
Baseline
MT
UA-MT
DTC
Dual-Teacher
Joint-training
AHDC
(b) Experiments on C3 (MR) and C4 (MR)
Table 3: Quantitative comparison between our proposed AHDC and other methods on multi-centre data. Abbreviations: DSC, Dice Similarity Coefficient; JI, Jaccard Index; ASD, Average Surface Distance.
Method C5 (CT) supports C2 (MR) C2 (MR) supports C5 (CT)
DSC JI ASD (mm) DSC JI ASD (mm)
Upper Bound - - -
Baseline
MT
UA-MT
DTC
Dual-Teacher
Joint-training
AHDC
Table 4: Quantitative comparison between our proposed AHDC and other methods on multi-modality data. Abbreviations: DSC, Dice Similarity Coefficient; JI, Jaccard Index; ASD, Average Surface Distance.

5 Results and Analysis

In this section, we present the results of the above-mentioned experiments to validate our proposed AHDC for cross-domain semi-supervised segmentation.

5.1 The Feasibility Analysis of AHDC for Generalising Across Domains

TABLE 3 and TABLE 4 summarise the quantitative segmentation results of AHDC on multi-centre data and multi-modality data. As can be seen, our proposed AHDC obtains consistent improvements in terms of DSC, JI and ASD against the baselines. Furthermore, as summarised in TABLE 5, AHDC obtains consistent improvements against fully supervised learning under different labelled-data ratios. Fig. 5 and Fig. 6 provide 2D and 3D qualitative views of the LAs estimated by AHDC compared with the ground truth. It is observed that our proposed AHDC is able to segment the LA accurately. These quantitative and qualitative results indicate the feasibility of our proposed AHDC for generalising across domains.

5.2 The Superiority Analysis of AHDC for Generalising Across Domains

TABLE 3 and TABLE 4 summarise the experimental results on multi-centre and multi-modality data combined in pairs for comparison. It is observed that the widely used semi-supervised MT method improves the LA segmentation accuracy compared with the baseline, and that adding uncertainty information to MT improves its performance further (UA-MT). The DTC method further improves the segmentation accuracy, indicating the effectiveness of dual-task consistency for semi-supervised learning. Although these methods can mine effective information from unlabelled data to support task learning, they have no proper mechanism for exploiting cross-domain information, which limits their segmentation results. Compared with these methods, Dual-Teacher leverages two teacher models to guide a student model in learning both intra-domain and inter-domain knowledge, thus achieving substantial improvements in segmentation accuracy. Notably, our proposed AHDC obtains the best segmentation accuracy among these widely used and state-of-the-art semi-supervised methods, which shows its superiority for generalising across domains. Furthermore, AHDC generally improves the segmentation accuracy compared with joint training, which combines the cross-domain data directly for semi-supervised LA segmentation; this demonstrates that AHDC can leverage cross-domain information to improve model performance. We also provide a qualitative comparison between the different methods in Fig. 5. The LAs estimated by the other methods present fragmentary parts and unsmooth boundaries, while the LAs estimated by our proposed method are closer to the ground truth, with smoother boundaries.

5.3 Ablation Studies

We performed ablation studies on the matched domains $\widehat{\mathcal{A}}$ and $\widehat{\mathcal{B}}$ (C1 supports C2) to validate the effectiveness of our proposed AHDC for cross-domain semi-supervised segmentation.

Figure 5: 2D visual comparison of LA segmentation results estimated by the different methods. Our estimated LAs (AHDC) are more similar to the ground truth (GT) than the others; the DSC-based segmentation accuracy of AHDC is reported for each row. Abbreviations: DSC, Dice Similarity Coefficient.
Figure 6: 3D visualisation of LA segmentation results estimated by AHDC. Each DSC score is calculated for the whole 3D LGE-MRI image and reported per column. Abbreviations: DSC, Dice Similarity Coefficient.
Method Rate Metrics
L2/U2 (%) L1/U1 (%) DSC JI ASD
Upper Bound
Baseline
AHDC
Baseline
AHDC
Baseline
AHDC
Table 5: The performance of AHDC on different percentages of labelled data. Abbreviations: Lx (%), the ratio of labelled data in the training set of centre x; Ux (%), the ratio of unlabelled data in the training set of centre x; DSC, Dice Similarity Coefficient; JI, Jaccard Index; ASD, Average Surface Distance.

(1) Model variation study for bidirectional adversarial inference: As summarised in TABLE 6, bidirectional adversarial inference with bidirectional reconstruction improves the LA segmentation accuracy in terms of DSC, JI and ASD compared with BAI-ALI/BiGAN. The reason behind the improvement is that bidirectional reconstruction makes the relationship between matched samples specified and constrained; it guarantees that the matched samples are in one-to-one correspondence, enabling effective hierarchical dual consistency learning on cross-domain data. It is also observed that the segmentation accuracy drops when the skip connections are removed from the domain mapping network. The reason is that the domain mapping network (a U-Net structure) relies on skip connections to deliver low-level information, which allows samples adapted to the other domain to maintain the same LA structures and thus makes the subsequent dual consistency learning effective. Furthermore, our proposed BAI performs better on the downstream semi-supervised LA segmentation task than the fully adversarial ALICE method, which indicates the superiority of our proposed BAI.

Method | DSC | JI | ASD
Lower Bound
BAI-ALI/BiGAN + HDC
BAI-w/o-skip + HDC
ALICE + HDC
BAI + HDC