e-UDA: Efficient Unsupervised Domain Adaptation for Cross-Site Medical Image Segmentation

01/25/2020 ∙ by Hongwei Li, et al. ∙ Technische Universität München

Domain adaptation in healthcare data is a potentially critical component in making computer-aided diagnostic systems applicable across multiple sites and imaging scanners. In this paper, we propose an efficient unsupervised domain adaptation framework for robust image segmentation across multiple similar domains. We enforce our algorithm not only to adapt to the new domains via adversarial optimization, rejecting unlikely segmentation patterns, but also to maintain its performance on the source training data, by incorporating both semantic and boundary information into the data distributions. Further, as we do not have labels for the target domain, we propose a new quality score for the adaptation process, together with strategies to retrain the diagnostic algorithm in a stable fashion. Using multi-centric data from a public benchmark for brain lesion segmentation, we demonstrate that recalibrating on just a few unlabeled image sets from the target domain improves segmentation accuracy drastically, with performance approaching that of algorithms trained on fresh and fully annotated data from the test domain.


1 Introduction

Figure 1: Illustration of domain shift in intensity distribution, contrast level, voxel size and noise between MRI sequences acquired in different centers.

Domain shifts are commonly observed in real-world machine learning applications. Especially in the field of medical imaging, with its inherently heterogeneous and complex data, the performance of computer-aided diagnostic (CAD) systems may drop rapidly when tested on new data acquired with scanners or sequences that underwent minor, but critical, updates. As in other fields, deep convolutional neural networks (CNNs) have propelled unprecedented advances in biomedical image analysis [24]. Medical image segmentation, an important step in quantifying and structuring medical image information in biomedical research and clinical practice, plays a crucial role in various medical applications. Although transformative advancements have been achieved by learning how to quantify medical images using CNNs [31, 19], most supervised learning approaches are built on the common assumption that training and test data are drawn from the same probability distribution. Consequently, the established models commonly suffer from a domain shift in the data encountered at the inference stage [3]. We denote a domain as a set of data samples from the same distribution. For example, MR images from one scanner with the same imaging sequence can represent one domain, whilst those acquired in another medical center, with a different scanner and a slightly modified imaging sequence, can represent another [20].

One way to deal with this common problem in medical imaging is to sample the different domains, including scans from a maximal number of centers in the training set. Unfortunately, high-quality images with expert annotation and clinical follow-up verification are often only available for data acquired under highly controlled conditions, for example, during a clinical study. A diagnostic algorithm trained on samples from such a source is very likely to drop significantly in performance when applied to real-world scans from arbitrary target domains [23]. As an example, Fig. 1 demonstrates the amount of domain shift in intensity distribution, contrast level, voxel size and noise among different sites. Domain adaptation [3] and transfer learning [30] methods have been studied to generalise established models. A naive solution is fine-tuning the models learned on the source domain with extra labeled data from the target domain. However, such annotation is prohibitively time-consuming and expensive in real-world healthcare applications. Moreover, without control and approval of the quality and diagnosis of the new cases, which may require a full clinical study, modifying the training data sets may be prohibitive for most clinical diagnostic algorithms, also raising the question of how to rate and verify the quality of the added annotations.

Unsupervised domain adaptation (UDA) methods [8] are more feasible, given that the models are learnt across domains without using extra labelling in the target domain. Although various works have tackled UDA, they face common drawbacks: 1) feature-level adaptation methods require an empirical selection of the feature level used during adaptation; 2) image-adaptation methods require high image quality in the synthesized target-domain images, and a large amount of data is required to learn the distributions; 3) the adaptation process cannot be validated, since it is an unsupervised process that does not include any labels from the target domain.

In this study, we propose to not only consider the relations of image features between different domains, but to also consider semantic and boundary information, i.e., the segmented images in both domains and the expected spatial patterns of the disease process. We assume that the understanding or knowledge of the disease patterns is domain-invariant and can be generalised to different domains. To learn this domain-invariant knowledge, we propose an adversarial framework that drives the segmentation network to adapt to different domains, constrained by semantic labels and by information on the boundaries of critical image structures. This does not require aligning the image features at an empirically selected feature level; meanwhile, the image features are still involved in the adaptation process.

Our unsupervised domain adaptation framework offers three-fold contributions:


  • We address the unsupervised domain adaptation problem using adversarial learning, with a few shots or instances of target-domain images and without requiring any annotation from unseen scanners, which offers substantial clinical benefit.

  • We propose to incorporate semantic and boundary information into the distribution to stabilize the adversarial learning process, and we identify an effective criterion for evaluating the algorithm’s behaviour in the target domain.

  • We benchmark our method on public datasets in an application scenario of cross-site brain lesion segmentation and quantification. Experiments show that the proposed method achieves good generalisation during the unsupervised 'recalibration' across multiple domains.

2 Related Work

Our work is related to unsupervised domain adaptation and cross-domain image segmentation.
Unsupervised domain adaptation. Early studies on UDA focused on aligning or matching the distributions in feature space by minimizing the distances between the features learnt from the source and target domains [26, 27]. More recently, with the advances of generative adversarial networks (GANs) and their extensions [12, 1], the latent feature spaces across different domains can be implicitly aligned by adversarial learning. Y. Ganin et al. [9] proposed to learn domain-invariant features by sharing weights between two CNNs. E. Tzeng et al. [32] introduced a unified framework in which each domain is equipped with a dedicated encoder before the last classification layer. However, as commented in [33], the above UDA methods for classification problems do not work well for dense segmentation problems [16], because the mapping functions from image space to label space may differ between source and target domains due to the domain shift.
Cross-domain image segmentation. In contrast to prior works, the medical image segmentation we study is a highly structured prediction problem, for which unsupervised domain adaptation attracts increasing attention. Transfer learning with simple fine-tuning strategies has been experimentally studied in [11] for the brain lesion segmentation application. The existing works can be divided into two main streams: feature-level adaptation and image-level adaptation. At the feature level, K. Kamnitsas et al. [19] proposed an early attempt to perform UDA for brain lesion segmentation, which aimed to learn domain-invariant features with a domain discriminator. The cross-modality segmentation problem with large domain shift is addressed in [7], in which specific feature layers are fine-tuned and an adversarial loss is used to supervise feature learning. K. Bousmalis et al. [4] proposed an image-level adaptation framework which aligns the image appearance between domains with a pixel-to-pixel transformation. In this direction, the domain shift problem is addressed at the input level. Although the images cannot be perfectly synthesized, existing studies [34, 4, 17] demonstrate that image adaptation brings improvements in pixel-wise predictions on a target domain. A recent work [5] combines feature-level and image-level adaptation and achieves state-of-the-art results on cross-modality segmentation.

While these results are inspiring and demonstrate the efficacy of adversarial learning, both classes of methods have limitations. For the feature-level adaptation methods: i) different levels of features are concatenated as the input of a domain discriminator, which requires much effort in the empirical design of the network architecture and is difficult to interpret; as shown in [19] and [7], different choices of feature levels result in varied segmentation results; ii) as mentioned in [33], the assumption that feature representations are aligned in source and target domains, without carefully considering the structured labels, becomes less likely to hold. For the image-level adaptation methods, commonly a CycleGAN-based component [35] serves to synthesize the target images. However, the quality of the synthetic images is not guaranteed without a large amount of data from the target domain, especially when the region of interest is tiny, such as the small brain lesions in our study. Furthermore, CycleGAN extensions are based on pixel-to-pixel transformation and thus cannot deal with domain shift in voxel size.

3 Method

Figure 2: Overview of the proposed efficient unsupervised domain adaptation framework, consisting of a segmentation model and a distribution discriminator. The semantic distributions of source and target domains are driven to be similar by adversarial learning. The weights of the two segmentation models are shared and they are trained on both source and target domains in supervised and adversarial manners respectively. The distribution discriminator takes the images, semantic masks and edge maps to learn the domain-invariant pixel-to-pixel relation.

Fig. 2 gives an overview of our proposed method. Starting from a segmentation model pre-trained on the source domain, we further adapt the model to target domains using a distribution discriminator, which judges whether the mapping from image space to label space is the same across domains. This process is trained with an adversarial loss in an unsupervised manner, as it does not require labels from the target domain.

3.1 Problem Definition, Assumption and Notation

Let $\mathcal{X}$ denote an input image space and $\mathcal{Y}$ a segmentation label space. We define a domain to be a joint distribution $P_{XY}$ on $\mathcal{X} \times \mathcal{Y}$. Let $\mathfrak{D}_s$ and $\mathfrak{D}_t$ denote the sets of joint distributions from the source domains and the target domains respectively. We observe source samples $S = \{(x_i^s, y_i^s)\}_{i=1}^{N_s}$, where $S$ is sampled from $\mathfrak{D}_s$ and contains $N_s$ samples, and target samples $T = \{x_j^t\}_{j=1}^{N_t}$, where $T$ is sampled from $\mathfrak{D}_t$. Notably, the samples from the target domain do not contain any ground-truth segmentation labels.

We consider two mapping functions from image spaces to label spaces. $F_s: \mathcal{X}_s \to \mathcal{Y}_s$ is the mapping learnt in the source domain, and $F_t: \mathcal{X}_t \to \mathcal{Y}_t$ is the one that would be learnt in the target domain (if labels were available). We assume that the two mappings from image space to label space are domain-invariant in image segmentation tasks, i.e., $F_s \approx F_t$. In other words, the knowledge for segmentation tasks is stable across different domains. The goal of unsupervised domain adaptation is to learn a generalised domain-invariant segmentation model $F$ given $\{S, T\}$ such that $F$ approximates $F_s$ and $F_t$. We aim to learn $F$ using $F_s$ as a reference or teacher model [14], since $F_t$ is not observed.

3.2 Semantic- and boundary-aware layer

In order to effectively learn a domain-invariant mapping, we introduce a semantic- and boundary-aware layer which incorporates semantic and boundary information into the distribution by spatially concatenating the image, semantic masks and edge maps as part of the model input. We claim that such a combination of information is crucial for domain adaptation in image segmentation tasks, especially the boundary-aware input, which has shown its effectiveness in previous studies [13, 6]. We introduce edge maps to detect the structure boundary in prediction masks using Sobel operators [10]. Since the labels of background and pathology are often highly unbalanced, especially in brain lesion segmentation tasks, we further develop an inverse-label map to facilitate the learning process.

Let $M$ denote the probability (or ground-truth) map, let $S_x$ and $S_y$ denote the Sobel operations for the two spatial directions, and let $J$ denote the all-ones matrix of the same size as $M$. The edge maps are defined as $E_x = S_x * M$ and $E_y = S_y * M$, and the inverse label map is defined as $\bar{M} = J - M$. Thus the semantic- and boundary-aware layer concatenates multiple maps as its input:

$I = \mathrm{concat}\big(x,\; M,\; \bar{M},\; E_x,\; E_y\big)$   (1)

The effectiveness of the edge maps and inverse maps is presented in Section 4.4.
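
As an illustration, the sketch below assembles such an input for a 2D slice in Python. The channel ordering, the combined edge magnitude and its inverse map are our assumptions, chosen so the result matches the seven-channel discriminator input described in Section 4.2.2; none of the names come from the authors' code.

```python
import numpy as np
from scipy import ndimage

def boundary_aware_input(flair, t1, mask):
    """Assemble a semantic- and boundary-aware discriminator input (cf. Eq. 1).

    flair, t1, mask: 2D float arrays of identical shape, mask values in [0, 1].
    Returns a (7, H, W) array: 2 images + mask + inverse mask + 2 Sobel edge
    maps + inverse edge map (the last channel is a hypothetical choice).
    """
    ones = np.ones_like(mask)
    inv_mask = ones - mask                     # inverse-label map, J - M
    edge_x = ndimage.sobel(mask, axis=0)       # Sobel edge map, S_x * M
    edge_y = ndimage.sobel(mask, axis=1)       # Sobel edge map, S_y * M
    edge = np.hypot(edge_x, edge_y)            # combined edge magnitude (assumed)
    inv_edge = ones - np.clip(edge, 0.0, 1.0)  # inverse-label map of the edges
    return np.stack([flair, t1, mask, inv_mask, edge_x, edge_y, inv_edge], axis=0)
```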

3.3 Adversarial Domain Adaptation

The segmentation model $F_s$ in the source domain can be learnt by supervised learning given $S$. Before domain adaptation, $F_s$ does not generalise to $T$, but it can be used as an initialisation of $F$. Since $F$ is expected to generalise on both $S$ and $T$, $F$ should remain partly supervised by $S$ during the domain adaptation process. The goal of learning a domain-invariant $F$ is equivalent to minimizing the distance between $F$ and $F_s$.

Inspired by [9], we use an adversarial network including a generator $F$, which performs label predictions given input images, and a discriminator $D$, which evaluates whether the mapping is the same as $F_s$ or not, thus pushing $F$ to be close to $F_s$. For this purpose, the mapping $F_s$ from source image space to ground-truth label space is treated as 'expert knowledge', while the mapping $F$ is treated as 'machine knowledge'. However, modeling $D$ faces the challenge of measuring the similarity between mappings, which is not straightforward to compute. Meanwhile, $F_s$ is difficult to formulate directly. We choose to parameterize these functions by using a deep neural network that maps the samples $\{S, T\}$, along with semantic and boundary information, to a latent feature space and discriminates whether the latent features are 'good' or not. The optimization of $F$ and $D$ can be formulated as:

$\min_{F}\max_{D}\; \mathbb{E}_{(x,y)\sim S}\big[\log D(x,y)\big] + \mathbb{E}_{x\sim T}\big[\log\big(1 - D(x, F(x))\big)\big]$   (2)

However, this objective can be problematic, since during the early training stage the discriminator converges quickly, causing a gradient-vanishing issue. It is typical to train the generator with the standard loss function with inverted labels [12]. This splits the optimization into two independent objectives, one for the discriminator and one for the generator:

$\max_{D}\; \mathbb{E}_{(x,y)\sim S}\big[\log D(x,y)\big] + \mathbb{E}_{x\sim T}\big[\log\big(1 - D(x, F(x))\big)\big]$   (3)
$\max_{F}\; \mathbb{E}_{x\sim T}\big[\log D(x, F(x))\big]$   (4)

Notably, this two-stage optimization provides stronger gradients to the target mapping.
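
For illustration, a minimal PyTorch sketch of this two-stage update follows. The helper `pack(...)` is hypothetical, standing in for the assembly of the semantic- and boundary-aware input of Eq. (1), and the batching details are standard GAN training conventions rather than details confirmed by the paper.

```python
import torch
import torch.nn.functional as F

def adversarial_step(seg_net, disc, opt_d, opt_g, batch_s, batch_t):
    """One adversarial update following Eqs. (3)-(4), with inverted labels
    for the generator. batch_s = (x_s, y_s) with ground-truth masks;
    batch_t = x_t is unlabeled. `pack` is a hypothetical helper building
    the 7-channel input of Eq. (1)."""
    x_s, y_s = batch_s
    x_t = batch_t

    # Discriminator stage (Eq. 3): expert mapping vs. machine mapping.
    with torch.no_grad():
        y_t = seg_net(x_t)                       # detach generator predictions
    real = disc(pack(x_s, y_s))                  # source images + GT labels
    fake = disc(pack(x_t, y_t))                  # target images + predictions
    loss_d = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) + \
             F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator stage (Eq. 4): inverted labels give stronger gradients.
    y_t = seg_net(x_t)
    fake = disc(pack(x_t, y_t))
    loss_g = F.binary_cross_entropy_with_logits(fake, torch.ones_like(fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```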
ConvNets for segmentation. One of the basic components of the proposed system is a fully convolutional network (ConvNet) for image segmentation [25]. With the labeled dataset of samples from the source domain, supervised learning is conducted to establish a mapping from the input image space to the label space. We borrow the top-performing U-shaped architecture from [23] and adopt the same configuration for all meta-parameters. The parameters of the network are learnt by iteratively minimizing a segmentation loss using stochastic gradient descent. The segmentation loss function is a linear combination of the Dice coefficient loss [29] and the cross-entropy loss, formulated as

$\mathcal{L}_{seg} = \Big(1 - \frac{2\sum_{i} y_i \hat{y}_i + \epsilon}{\sum_{i} y_i + \sum_{i} \hat{y}_i + \epsilon}\Big) + \lambda\, \mathcal{L}_{CE}(y, \hat{y})$   (5)

where $\epsilon$ is a smoothing factor to avoid numerical issues, $y_i$ and $\hat{y}_i$ are the ground-truth label and the prediction for voxel $i$ respectively, and $\lambda$ weights the cross-entropy term $\mathcal{L}_{CE}$. Notably, the loss function can be extended to a multi-class version depending on the segmentation task. Unlike existing frameworks [7, 32], which freeze the source-domain model and learn another model on the target domain, we use only one ConvNet for both source and target domains. We argue that a deep ConvNet offers large modeling capacity and can learn effective representations on diverse data from multiple domains, as observed in recent segmentation benchmarks [2, 22, 36]. Given this modeling capacity, the goal of this work is to learn a domain-invariant segmentation model.
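
For the binary case, Eq. (5) can be sketched as follows; the weight `lam` on the cross-entropy term is an assumption, since the paper only specifies a linear combination of the two losses.

```python
import torch

def seg_loss(pred, target, eps=1.0, lam=1.0):
    """Linear combination of Dice loss and cross-entropy, a sketch of Eq. (5).

    pred: sigmoid probabilities; target: binary ground-truth mask, same shape.
    eps is the smoothing factor, lam the (assumed) cross-entropy weight.
    """
    pred_f = pred.reshape(-1)
    target_f = target.reshape(-1).float()
    inter = (pred_f * target_f).sum()
    dice = (2.0 * inter + eps) / (pred_f.sum() + target_f.sum() + eps)
    bce = torch.nn.functional.binary_cross_entropy(pred_f, target_f)
    return (1.0 - dice) + lam * bce
```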

3.4 Monitoring Metrics

The performance of the network is hard to validate on the target domain, since no ground-truth labels are available. We therefore introduce two monitoring metrics to observe the UDA process. The idea is that after the supervised pre-training on the source domain, the model produces an initial segmentation mask on the target domain which differs considerably from the ground-truth mask. While performing UDA, in each training iteration the difference between the updated segmentation mask and the initial one increases. Metaphorically speaking, the segmentation mask moves away from the initial segmentation towards the ground-truth mask, increasing its difference from the starting point. Consequently, we expect the difference between the current and the initial segmentation mask to reach its global maximum and remain stable at the end of the UDA process.

Given a sample $x$ from the target domain, let $\hat{y}^{(0)}$ denote the initial mask predicted by the pre-trained model before UDA, formulated as $\hat{y}^{(0)} = F_s(x)$. Let $\hat{y}^{(t)}$ be the mask predicted by $F$ at iteration $t$ during the UDA process, formulated as $\hat{y}^{(t)} = F^{(t)}(x)$.

We use the Euclidean distance between the two masks as one of the monitoring metrics, formulated as:

$d^{(t)} = \big\lVert \hat{y}^{(t)} - \hat{y}^{(0)} \big\rVert_2$   (6)

where $\lVert \cdot \rVert_2$ represents the Euclidean distance.

To measure stability, we further compute the variance $v$ of the mask differences in an iteration interval, which can be formulated as:

$v = \frac{1}{n} \sum_{t=1}^{n} \big(d^{(t)} - \bar{d}\big)^2$   (7)

where $\bar{d}$ is the average of the mask differences in an iteration interval of length $n$. When $v$ falls below a certain threshold $\theta$, it indicates that the segmentation no longer improves, and we stop the training. This criterion is not only an indicator of the quality improvement of the segmentation but also a good stopping criterion for the learning process.
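
A sketch of this stopping rule follows; the window length and threshold below are placeholders rather than values from the paper.

```python
import numpy as np

def should_stop(initial_mask, masks, window=10, theta=1e-3):
    """Variance-based stopping rule of Eqs. (6)-(7).

    Tracks the Euclidean distance between each iteration's mask and the
    initial (pre-UDA) mask, and stops when the variance of the distances
    over the last `window` iterations falls below `theta`.
    """
    dists = [np.linalg.norm(m - initial_mask) for m in masks]  # Eq. (6)
    if len(dists) < window:
        return False
    recent = np.asarray(dists[-window:])
    return recent.var() < theta                                # Eq. (7)
```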

3.5 Training Strategies

In our training setting we define two stages (see Algorithm 1). In the first stage, we perform a supervised training on the source domain. In the second stage, we aim to learn a domain-invariant segmentation model by performing domain adaptation in an adversarial fashion.

Input: $S$ from the source domain, $T$ from the target domain, number of epochs $n$, stopping threshold $\theta$
Output: segmentation model $F$, discriminator $D$
Initialise $F$ and $D$

1: procedure Pre-training
2:     get batches from $S$, i = 0
3:     while i < number of batches do
4:         update $F$ by Eq. (5)
5:         i = i + 1
6:     compute the initial masks $\hat{y}^{(0)}$ on $T$
7: procedure Domain adaptation
8:     get batches with domain labels from $S$ and $T$
9:     while $v > \theta$ do
10:        j = 0
11:        while j < n do
12:            update $F$ and $D$ with a batch from $S$
13:            update $F$ and $D$ with a batch from $T$
14:            only update $F$ with a batch from $S$ by Eq. (5)
15:            compute $d^{(j)}$ by Eq. (6)
16:            j = j + 1
17:        compute $v$ by Eq. (7)
18:    return $F$, $D$
Algorithm 1 Unsupervised domain adaptation process
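
For concreteness, the following sketch mirrors Algorithm 1 using the helper sketches above (`seg_loss`, `adversarial_step`, `should_stop`); the data loaders, optimizers and the use of one fixed monitoring batch are our assumptions, not details from the authors' code.

```python
import torch

def e_uda(seg_net, disc, src_loader, tgt_loader, opt_seg, opt_d, epochs, theta):
    """Compact sketch of Algorithm 1: supervised pre-training on the source
    domain, then joint supervised + adversarial adaptation monitored by the
    variance criterion of Eqs. (6)-(7)."""
    # Stage 1: supervised pre-training on the source domain (Eq. 5).
    for x_s, y_s in src_loader:
        loss = seg_loss(seg_net(x_s), y_s)
        opt_seg.zero_grad(); loss.backward(); opt_seg.step()

    x_mon = next(iter(tgt_loader))                 # fixed batch for monitoring
    with torch.no_grad():
        m0 = seg_net(x_mon).cpu().numpy()          # initial mask before UDA
    masks = []

    # Stage 2: adversarial adaptation with continued supervision on the source.
    for _ in range(epochs):
        for (x_s, y_s), x_t in zip(src_loader, tgt_loader):
            loss = seg_loss(seg_net(x_s), y_s)     # supervised replay on S
            opt_seg.zero_grad(); loss.backward(); opt_seg.step()
            adversarial_step(seg_net, disc, opt_d, opt_seg, (x_s, y_s), x_t)
        with torch.no_grad():
            masks.append(seg_net(x_mon).cpu().numpy())
        if should_stop(m0, masks, theta=theta):    # Eqs. (6)-(7)
            break
    return seg_net, disc
```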

4 Experiments


4.1 Datasets and Evaluation Metrics

The public dataset of the MICCAI White Matter Hyperintensities (WMH) Segmentation Challenge 2017 [22] is used for evaluation and benchmarking. It contains 60 MRI scans from 3 different scanners in hospitals in the Netherlands and Singapore. Table 1 presents the characteristics of the imaging data. For each subject, 2D multi-slice FLAIR and T1 images, with the corresponding ground truth annotated by two raters, are available.

Dataset    Scanner Name         Voxel Size (mm³)  Volume Size  TR/TE/TI (ms)   Num.
Utrecht    3T Philips Achieva   –                 –            11000/125/2800  20
Singapore  3T Siemens TrioTim   –                 –            9000/82/2500    20
Amsterdam  3T GE Signa HDxt     –                 –            8000/126/2340   20
Table 1: Data characteristics of the MICCAI WMH challenge 2017 dataset, consisting of MRI data from three centers. TR/TE/TI are parameters of the specific imaging protocols.

We split the dataset into a source-domain set and a target-domain set. The source set contains the 40 scans from two scanners with manual segmentation masks, while the target set comprises the data of the remaining scanner without segmentation masks.

For evaluating the results of our proposed algorithm, we use five evaluation metrics taken from the MICCAI WMH challenge, to compare with the state of the art. Given the ground-truth segmentation mask $G$ and the mask $P$ generated by the segmentation model, the evaluation metrics are defined as follows.
Dice similarity coefficient (DSC): $DSC = \frac{2|G \cap P|}{|G| + |P|}$.
Hausdorff distance (H95): $H(G, P) = \max\big\{\sup_{x \in G} \inf_{y \in P} d(x, y),\; \sup_{y \in P} \inf_{x \in G} d(x, y)\big\}$, where $d(x, y)$ denotes the distance between $x$ and $y$, sup represents the supremum and inf the infimum. A robust version, i.e., the 95th percentile instead of the maximum distance, is used.
Absolute volume difference (AVD): let $V_G$ and $V_P$ denote the volumes of the lesion regions in $G$ and $P$ respectively. The AVD is defined in percentage as $AVD = \frac{|V_P - V_G|}{V_G} \times 100\%$.
Lesion-wise recall: let $N_G$ denote the number of individual lesion regions in $G$ and $N_{TP}$ the number of correctly detected lesions in $P$. Then $Recall = N_{TP} / N_G$.
Lesion-wise F1-score: let $N_{TP}$ denote the number of correctly detected lesions and $N_{FP}$ the number of wrongly detected lesions in $P$. With $N_{FN} = N_G - N_{TP}$ the number of missed lesions, the F1-score is defined as $F1 = \frac{2 N_{TP}}{2 N_{TP} + N_{FP} + N_{FN}}$.
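
For reference, a sketch of four of these metrics follows (H95 is omitted for brevity), using connected components from scipy.ndimage; counting any voxel overlap as a detection is a common convention and an assumption here.

```python
import numpy as np
from scipy import ndimage

def evaluate(gt, pred):
    """Sketch of WMH-challenge-style metrics: DSC, AVD (percent),
    lesion-wise recall and F1. gt and pred are binary 3D arrays."""
    dsc = 2.0 * np.logical_and(gt, pred).sum() / max(gt.sum() + pred.sum(), 1)
    avd = abs(int(pred.sum()) - int(gt.sum())) / max(gt.sum(), 1) * 100.0

    gt_lab, n_gt = ndimage.label(gt)       # individual lesion regions in G
    pr_lab, _ = ndimage.label(pred)
    # ground-truth lesions touched by at least one predicted voxel
    n_detected = len({l for l in np.unique(gt_lab[pred > 0]) if l > 0})
    recall = n_detected / max(n_gt, 1)
    # predicted components touching no ground-truth voxel are false positives
    pr_ids = {l for l in np.unique(pr_lab) if l > 0}
    pr_hit = {l for l in np.unique(pr_lab[gt > 0]) if l > 0}
    n_tp, n_fp = len(pr_hit), len(pr_ids - pr_hit)
    n_fn = n_gt - n_detected
    f1 = 2.0 * n_tp / max(2 * n_tp + n_fp + n_fn, 1)
    return {"DSC": dsc, "AVD": avd, "Recall": recall, "F1": f1}
```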

4.2 Implementations

4.2.1 Image preprocessing

The image intensities are normalised in a preprocessing step. We use 2D axial slices of both FLAIR and T1 sequences for training. All images are cropped or padded to a uniform size of 200 × 200 pixels. The voxel intensities are then normalised with z-score normalisation. We use data augmentation (rotation, shearing and scaling) to achieve the desired invariance during training.
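
A minimal sketch of this preprocessing for a single axial slice, assuming zero-valued padding:

```python
import numpy as np

def preprocess_slice(img, size=200):
    """Center crop/pad one axial slice to size x size, then z-score normalise.
    Computing the statistics over the whole padded slice is an assumption."""
    h, w = img.shape
    out = np.zeros((size, size), dtype=np.float32)
    ch, cw = min(h, size), min(w, size)       # region to copy
    y0, x0 = (h - ch) // 2, (w - cw) // 2     # crop offset in the source
    oy, ox = (size - ch) // 2, (size - cw) // 2  # paste offset in the output
    out[oy:oy + ch, ox:ox + cw] = img[y0:y0 + ch, x0:x0 + cw]
    return (out - out.mean()) / (out.std() + 1e-8)  # z-score normalisation
```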

4.2.2 Network architectures and parameters setting

The first part, the generator, is a fully convolutional network taken from [23]. The generator takes the concatenation of the FLAIR and T1 images as a two-channel input and follows a U-Net structure. A combination of convolutional and max-pooling layers downsamples the input data before the segmentation mask is produced by several upsampling layers. Additionally, skip connections between layers at the same level create a stronger relation between input and output. The output is a one-channel probability map with pixel-wise predictions for the input image.

We further introduce a 2.5D convolutional discriminator network which aims to identify the distribution difference between the source and target domains. The discriminator takes a seven-channel input consisting of the paired FLAIR & T1 images, the segmentation mask, the mask’s inverse-label map, the two Sobel edge maps and the edge maps’ inverse-label map. The use of the inverse and edge maps introduces semantic and boundary information of the region of interest, which helps the network identify whether the segmentation is plausible. This is necessary because the regions of interest are very small in our case. We use a PatchGAN [18] architecture with a small patch size of 10 × 10 as the output. The discriminator is trained with domain labels using a cross-entropy loss.
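
A sketch of such a discriminator is given below; the layer widths and strides are our assumptions, chosen so that a 200 × 200 input yields a patch map of logits close to the 10 × 10 output described above.

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Sketch of a 7-channel PatchGAN-style discriminator [18]."""

    def __init__(self, in_ch=7, base=64):
        super().__init__()
        layers, ch = [], in_ch
        for out_ch in (base, base * 2, base * 4, base * 8):
            # stride-2 convolutions halve the spatial resolution each time
            layers += [nn.Conv2d(ch, out_ch, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out_ch
        layers += [nn.Conv2d(ch, 1, 4, stride=1, padding=1)]  # patch logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: (N, 7, 200, 200) -> roughly (N, 1, 11, 11) patch logits
        return self.net(x)
```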

The Adam optimizer [21] is used for stochastic optimization. The learning rates for the segmentation network, the discriminator and the adversarial model are set to 0.0002, 0.001 and 0.0002 respectively.

4.3 Results

We first establish lower-bound and upper-bound performances, considering two scenarios: (i) Lower bound (L-bound): a baseline model is trained on the source dataset to establish a lower-bound performance. The segmentation network is trained on the source-domain images, henceforth referred to as $S$, and tested on the subjects from the target dataset $T$. (ii) Upper bound (U-bound): here, the segmentation model is trained in the same way as the L-bound; however, the training dataset is the union of the images and masks from $S$ and a subset of $T$ excluding one test sample. We then perform a leave-one-subject-out evaluation on the test set to establish a strong upper bound.

To demonstrate the efficiency of our approach, the algorithm is run under two conditions on the target domain: (iii) using only a few shots of images: in this scenario, we use the data from $S$ and only a tiny subset (i.e., one fixed batch) of the target domain, without using any annotation from the target domain. (iv) using the full set of the target domain: here, we use the data from $S$ and the full set of target-domain images, again without using any annotation from the target domain.

Conditions                      Dice score  H95 (mm)  AVD    Lesion Recall  Lesion F1
Utrecht + Amsterdam → Singapore
L-bound                         0.682       9.22      45.95  0.641          0.592
U-Net Ensembles [23]            0.703       8.83      37.21  0.672          0.642
CyCADA [15]                     0.452       15.23     67.13  0.462          0.344
e-UDA with a few shots (ours)   0.780       7.54      24.75  0.666          0.657
e-UDA with full set (ours)      0.782       7.51      22.14  0.754          0.649
U-bound                         0.803       4.62      11.22  0.761          0.711
Utrecht + Singapore → Amsterdam
L-bound                         0.674       11.51     37.6   0.692          0.673
U-Net Ensembles [23]            0.694       9.90      31.01  0.720          0.691
CyCADA [15]                     0.412       18.21     89.23  0.402          0.292
e-UDA with a few shots (ours)   0.733       7.90      16.01  0.785          0.725
e-UDA with full set (ours)      0.737       7.53      30.97  0.841          0.739
U-bound                         0.802       4.52      13.81  0.843          0.803
Table 2: Results on two cross-site segmentation tasks. The values are averaged over the target dataset. We compare our method with the current state of the art. L-bound denotes the performance without adaptation, whilst U-bound denotes the performance using almost the full annotated set of the target domain.

4.3.1 Domain Adaptation Results on Multi-site Data

Utrecht + Amsterdam → Singapore: For this setting, we take sites 1 and 3 (Utrecht and Amsterdam) as the source domain, which leaves site 2 (Singapore) as the target domain. Table 2 presents the final results of our e-UDA algorithm on the target dataset and compares them with the baseline and the state-of-the-art U-Net ensemble method [23]. Notably, [23] is the top-performing algorithm for cross-scanner segmentation as analyzed in [22]. CyCADA [15] is an image-level, CycleGAN-based adaptation method. We found that CyCADA achieves poor results because of the low image quality in the synthesized target domain; Fig. 3 shows a failure example. Our e-UDA significantly improves segmentation performance on the target dataset after domain adaptation (e-UDA vs. L-bound, p-value < 0.0001). We observe that e-UDA with a few shots of target images achieves a Dice score similar to that obtained with the full set of the target domain (78.0% vs. 78.2%). When using the full set, e-UDA achieves promising performance close to the upper bound in Dice score (78.2% vs. 80.3%) and lesion-wise recall (75.4% vs. 76.1%).

Utrecht + Singapore → Amsterdam: For this experiment, we take sites 1 and 2 (Utrecht and Singapore) as the source dataset, whilst site 3 (Amsterdam) represents the target dataset. Similarly, we observe that using a few shots of target images can significantly improve the segmentation results on the target domain. When using the full set, e-UDA achieves promising performance close to the upper bound in lesion-wise recall (84.1% vs. 84.3%). The AVD performance in Table 2 degrades when using the full set, whilst the Dice score stays stable and the recall increases. This indicates that the algorithm encourages the network to produce reasonable predictions based on the spatial patterns of the disease process.

Figure 3: The translation from Utrecht to Amsterdam using CycleGAN [35]. We observe that it introduces noise in the synthetic images, while the voxel size stays the same.
Figure 4: From left to right: results on five axial slices of the same subject. From top to bottom: FLAIR axial-view images, the segmentation results before domain adaptation, the segmentation results using the proposed method. Green color indicates overlap between the segmentation result and the ground truth masks, red color false positives, and gold color false negatives. (Best viewed in color)

4.3.2 Performance on both domains

Figure 5: Performance on both domains during the domain adaptation process. We observe that the performance in the source domain remains stable whilst the performance in the target domain is increasing.

Since the segmentation network is continually trained in a supervised fashion using data from the source domain, the model faces the risk of overfitting on the source domain. We argue that the distribution discriminator can regularise the training process and avoid such overfitting. We further observe the behaviour of the segmentation model on both the source and target domains without stopping the adaptation process. In this setting, we split the data from the source domain into a training set (80%) and a validation set (20%) for observation, using the Dice score and the lesion-wise recall as evaluation metrics. From Fig. 5, we confirm that the segmentation model does not overfit on the source domain, whilst the performance on the target domain increases stably.

4.4 Ablation Study

We conduct ablation experiments to evaluate the effectiveness of each key component of our proposed adversarial learning framework. Table 3 shows that segmentation performance improves as more semantic information is included, especially when incorporating the edge and inverse maps.

Method         Dice   H95 (mm)
w/o UDA        67.4%  10.55
+ Mask         68.1%  9.72
+ Edge Map     71.1%  8.20
+ Inverse Map  73.9%  7.41
Table 3: Ablation study on key components. Each row adds one component (segmentation mask, edge maps, inverse maps) to the discriminator input.

4.5 Feature Visualisation

As shown in Fig. 6, we visualise the features corresponding to the pathological pixels from the source and target domains to interpret the UDA process. We observe that before domain adaptation the features from the source and target domains are clearly separated, and that they move closer to each other after domain adaptation. We argue that traditional feature-level methods have limitations when features from the same domain come from different subspaces in image segmentation tasks.

Figure 6: Illustration of the pixel-wise feature distribution. Blue denotes pixels from the source domains; yellow denotes pixels from the target domain. We observe that the source and target features are closer to each other after domain adaptation. Visualization by t-SNE [28]. (Best viewed in color)

5 Conclusion

We presented an efficient adversarial domain adaptation framework for cross-site medical image segmentation. The proposed framework enforces the segmentation model to adapt to the target domain by learning semantic information in an adversarial fashion. We found that using a few shots of images from the target domain can significantly improve the segmentation results.

References

  • [1] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein gan. arXiv preprint arXiv:1701.07875. Cited by: §2.
  • [2] S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, A. Crimi, R. T. Shinohara, C. Berger, S. M. Ha, M. Rozycki, et al. (2018) Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv preprint arXiv:1811.02629. Cited by: §3.3.
  • [3] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan (2010) A theory of learning from different domains. Machine learning 79 (1-2), pp. 151–175. Cited by: §1, §1.
  • [4] K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan (2017) Unsupervised pixel-level domain adaptation with generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3722–3731. Cited by: §2.
  • [5] C. Chen, Q. Dou, H. Chen, J. Qin, and P. Heng (2019) Synergistic image and feature adaptation: towards cross-modality domain adaptation for medical image segmentation. arXiv preprint arXiv:1901.08211. Cited by: §2.
  • [6] H. Ding, X. Jiang, A. Q. Liu, N. M. Thalmann, and G. Wang (2019) Boundary-aware feature propagation for scene segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 6819–6829. Cited by: §3.2.
  • [7] Q. Dou, C. Ouyang, C. Chen, H. Chen, and P. Heng (2018) Unsupervised cross-modality domain adaptation of convnets for biomedical image segmentations with adversarial loss. arXiv preprint arXiv:1804.10916. Cited by: §2, §2, §3.3.
  • [8] Y. Ganin and V. Lempitsky (2014) Unsupervised domain adaptation by backpropagation. arXiv preprint arXiv:1409.7495. Cited by: §1.
  • [9] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky (2016) Domain-adversarial training of neural networks. The Journal of Machine Learning Research 17 (1), pp. 2096–2030. Cited by: §2, §3.3.
  • [10] W. Gao, X. Zhang, L. Yang, and H. Liu (2010) An improved sobel edge detection. In 2010 3rd International Conference on Computer Science and Information Technology, Vol. 5, pp. 67–71. Cited by: §3.2.
  • [11] M. Ghafoorian, A. Mehrtash, T. Kapur, N. Karssemeijer, E. Marchiori, M. Pesteie, C. R. Guttmann, F. de Leeuw, C. M. Tempany, B. van Ginneken, et al. (2017) Transfer learning for domain adaptation in mri: application in brain lesion segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 516–524. Cited by: §2.
  • [12] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §2, §3.3.
  • [13] Z. Hayder, X. He, and M. Salzmann (2017) Boundary-aware instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5696–5704. Cited by: §3.2.
  • [14] G. Hinton, O. Vinyals, and J. Dean (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. Cited by: §3.1.
  • [15] J. Hoffman, E. Tzeng, T. Park, J. Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell (2017) Cycada: cycle-consistent adversarial domain adaptation. arXiv preprint arXiv:1711.03213. Cited by: §4.3.1, Table 2.
  • [16] J. Hoffman, D. Wang, F. Yu, and T. Darrell (2016) Fcns in the wild: pixel-level adversarial and constraint-based adaptation. arXiv preprint arXiv:1612.02649. Cited by: §2.
  • [17] Y. Huo, Z. Xu, S. Bao, A. Assad, R. G. Abramson, and B. A. Landman (2018) Adversarial synthesis learning enables segmentation without target modality ground truth. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 1217–1220. Cited by: §2.
  • [18] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134. Cited by: §4.2.2.
  • [19] K. Kamnitsas, C. Baumgartner, C. Ledig, V. Newcombe, J. Simpson, A. Kane, D. Menon, A. Nori, A. Criminisi, D. Rueckert, et al. (2017) Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In International conference on information processing in medical imaging, pp. 597–609. Cited by: §1, §2, §2.
  • [20] N. Karani, K. Chaitanya, C. Baumgartner, and E. Konukoglu (2018) A lifelong learning approach to brain mr segmentation across scanners and protocols. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 476–484. Cited by: §1.
  • [21] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.2.2.
  • [22] H. J. Kuijf, J. M. Biesbroek, J. de Bresser, R. Heinen, S. Andermatt, M. Bento, M. Berseth, M. Belyaev, M. J. Cardoso, A. Casamitjana, et al. (2019) Standardized assessment of automatic segmentation of white matter hyperintensities; results of the wmh segmentation challenge. IEEE transactions on medical imaging. Cited by: §3.3, §4.1, §4.3.1.
  • [23] H. Li, G. Jiang, J. Zhang, R. Wang, Z. Wang, W. Zheng, and B. Menze (2018) Fully convolutional network ensembles for white matter hyperintensities segmentation in mr images. NeuroImage 183, pp. 650–665. Cited by: §1, §3.3, §4.2.2, §4.3.1, Table 2.
  • [24] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez (2017) A survey on deep learning in medical image analysis. Medical image analysis 42, pp. 60–88. Cited by: §1.
  • [25] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. Cited by: §3.3.
  • [26] M. Long, Y. Cao, J. Wang, and M. I. Jordan (2015) Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on International Conference on Machine Learning-Volume 37, pp. 97–105. Cited by: §2.
  • [27] M. Long, H. Zhu, J. Wang, and M. I. Jordan (2016) Unsupervised domain adaptation with residual transfer networks. In Advances in Neural Information Processing Systems, pp. 136–144. Cited by: §2.
  • [28] L. v. d. Maaten and G. Hinton (2008) Visualizing data using t-sne. Journal of machine learning research 9 (Nov), pp. 2579–2605. Cited by: Figure 6.
  • [29] F. Milletari, N. Navab, and S. Ahmadi (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. Cited by: §3.3.
  • [30] S. J. Pan and Q. Yang (2009) A survey on transfer learning. IEEE Transactions on knowledge and data engineering 22 (10), pp. 1345–1359. Cited by: §1.
  • [31] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §1.
  • [32] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell (2017) Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7167–7176. Cited by: §2, §3.3.
  • [33] Y. Zhang, P. David, and B. Gong (2017) Curriculum domain adaptation for semantic segmentation of urban scenes. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2020–2030. Cited by: §2, §2.
  • [34] H. Zhao, H. Li, S. Maurer-Stroh, Y. Guo, Q. Deng, and L. Cheng (2018) Supervised segmentation of un-annotated retinal fundus images by synthesis. IEEE transactions on medical imaging 38 (1), pp. 46–56. Cited by: §2.
  • [35] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232. Cited by: §2, Figure 3.
  • [36] X. Zhuang, L. Li, C. Payer, D. Stern, M. Urschler, M. P. Heinrich, J. Oster, C. Wang, O. Smedby, C. Bian, et al. (2019) Evaluation of algorithms for multi-modality whole heart segmentation: an open-access grand challenge. arXiv preprint arXiv:1902.07880. Cited by: §3.3.