Unsupervised Domain Adaptation via CycleGAN for White Matter Hyperintensity Segmentation in Multicenter MR Images

09/10/2020 ∙ by Julian Alberto Palladino, et al. ∙ Universidad Nacional del Litoral ∙ University of Buenos Aires

Automatic segmentation of white matter hyperintensities in magnetic resonance images is of paramount clinical and research importance. Quantification of these lesions serves as a predictor for risk of stroke, dementia and mortality. In recent years, convolutional neural networks (CNN) specifically tailored for biomedical image segmentation have outperformed all previous techniques in this task. However, they are extremely data-dependent, and maintain good performance only when the data distribution between training and test datasets remains unchanged. When such distribution changes but we still aim at performing the same task, we face a domain adaptation problem (e.g. using a different MR machine or different acquisition parameters for training and test data). In this work, we explore the use of cycle-consistent adversarial networks (CycleGAN) to perform unsupervised domain adaptation on multicenter MR images with brain lesions. We aim at learning a mapping function to transform volumetric MR images between domains, which are characterized by different medical centers and MR machines with varying brand, model and configuration parameters. Our experiments show that CycleGAN allows us to reduce the Jensen-Shannon divergence between MR domains, enabling automatic segmentation with CNN models on domains where no labeled data was available.






1 Introduction

White matter hyperintensities (WMH), also known as leukoaraiosis, are a characteristic of small vessel disease commonly observed in the brain of elderly subjects [5]. Magnetic resonance imaging (MRI) is the modality of choice to study WMH lesions. In recent years, it has been shown that accurate quantification of WMH volume is of paramount clinical importance since it may serve as a predictor for risk of stroke, dementia and mortality [1]. Given that manual delineation of these lesions is a difficult and time-consuming task, several computational methods were recently proposed to deal with automatic WMH segmentation. CNN architectures specifically tailored for biomedical image segmentation [17, 7, 20] have outperformed all previous techniques in the task of automatic brain structure segmentation in general, and WMH in particular [5, 16, 8, 18]. However, these models are extremely data-dependent, in the sense that they require many annotated images to be trained. More importantly, they maintain good performance only when the data distribution between training (source) and test (target) domains remains unchanged. When such distribution changes (resulting in a co-variate shift scenario [21]) but we still aim at performing the same task, domain adaptation techniques [15] can be used to achieve better performance in unseen target domains.

In the context of WMH segmentation in MRI, co-variate shift and domain adaptation problems may arise when we have trained a model with images coming from a particular medical center, MR machine brand or parameter setup, and we want to test it on images acquired under different conditions. In this case, the performance of the segmentation algorithm tends to decrease. Several studies have empirically shown this behaviour and proposed alternative methods to deal with it. Ghafoorian and co-workers [3] showed that it is possible to apply supervised transfer learning to re-use WMH segmentation models when annotated images are available in the target domain. In this case, simple fine-tuning of a previously trained model is enough to achieve state-of-the-art results in the new domain. The disadvantage of this supervised approach is that it requires manual annotations in the target domain.

In this work we focus on strategies which do not require manual annotations for the target domain. Following this idea, several approaches based on adversarial training have been proposed. The work of Kamnitsas et al. [6] was one of the first to employ adversarial training to learn domain-invariant features for the task of brain lesion segmentation. More recently, Orbes-Arteaga et al. [14] proposed a different strategy which also employs adversarial learning but combines it with a consistency loss term requiring multiple target domains with paired images, which is not a common situation in clinical scenarios. Moreover, both adversarial approaches require access to the unlabeled target images during training, making it difficult to apply the resulting models in completely unseen domains. Here we focus on learning a mapping function to shift the target distribution towards the source distribution, so that previously trained models can be directly applied in new scenarios.

Closest to our work are those of [19] and [10], which pose domain adaptation as an image translation problem and employ Cycle-Consistent Adversarial Networks (CycleGAN) [22] to translate from target to source domain. Unlike our work, [19] focuses on optical coherence tomography (OCT) images while [10] explores the use of CycleGAN for anatomical segmentation (bilateral amygdala) in brain MR.

Contributions: To the best of our knowledge, our work is the first one to provide empirical evidence that CycleGAN enables segmentation in multicentric MR data for brain lesions. To this end, we study one of the most challenging brain lesion segmentation problems, namely WMH. In addition, while previous approaches employ cycle-consistent adversarial networks operating only on 2D patches, we show that it is possible to train them operating directly on three-dimensional images. We measure the effectiveness of our approach by analyzing not only the multicenter segmentation results, but also the co-variate shift in terms of pairwise Jensen-Shannon divergences after domain mapping. We show that lower inter-domain Jensen-Shannon divergences correlate with better performance across different domains. Our experimental evaluation on brain MR images coming from three different medical centers demonstrates that unsupervised domain adaptation via CycleGAN improves WMH segmentation in multicenter MR images.

Figure 1: Qualitative results for WMH segmentation (in red). Examples considering Singapore as source and Utrecht as target domain, with segmentation models trained on Singapore and Utrecht respectively. From left to right: (i) no domain adaptation; (ii) adaptation via histogram matching; (iii) adaptation via CycleGAN; (iv) training in target domain and (v) ground-truth.

2 Unsupervised Domain Adaptation via CycleGAN

We highlight that the main contribution of this work is not a novel generative adversarial network. Instead, we aim at providing empirical evidence that existing CycleGANs tailored to process 3D images help to perform domain adaptation in the context of brain lesion segmentation for multicenter MR data. For completeness, we include a brief description of the CycleGAN framework.

CycleGAN is a style-transfer CNN model based on generative adversarial networks [4], but redesigned with the specific goal of translating images from a source domain $X$ to a target domain $Y$ in the absence of paired examples. The idea is to learn a mapping function $G: X \rightarrow Y$ such that the distribution of images from $G(X)$ is indistinguishable from the distribution $Y$ by using an adversarial loss. Because this mapping is highly under-constrained, it is coupled with an inverse mapping $F: Y \rightarrow X$, thus introducing a cycle consistency loss to enforce $F(G(X)) \approx X$ (and vice versa). The functions $G$ and $F$ are neural networks which follow an encoder-decoder architecture (see Appendix section 5.1 for a detailed description of the generator architecture). The framework also incorporates discriminators $D_X$ and $D_Y$ which learn to distinguish between translated and real examples following a standard adversarial scheme [4] (see Appendix section 5.2 for a detailed description of the discriminator architecture). Additionally, an identity mapping term is introduced in the loss function to encourage $G$ and $F$ to apply the identity transformation when real samples of the target domain are provided as the input to the generator. The identity regularization plays a crucial role in producing realistic mapping functions and avoiding potential hallucinations that may emerge during image translation (see Figure 2 for a visual example of such hallucinations). In the following we describe the terms included in the final loss function used to train the CycleGAN.

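Adversarial loss. The adversarial term encourages the mapping functions to translate from one domain to the other. It is applied to both mapping functions $G: X \rightarrow Y$ and $F: Y \rightarrow X$. For $G$ and its corresponding adversarial discriminator $D_Y$, the objective can be expressed as:

$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$$

where $G$ generates images $G(x)$ that look similar to images from domain $Y$, while $D_Y$ learns to distinguish between translated samples $G(x)$ and real samples $y$. $G$ aims to minimize this objective against its adversary $D_Y$, which tries to maximize it, i.e., $\min_G \max_{D_Y} \mathcal{L}_{GAN}(G, D_Y, X, Y)$. An analogous loss is also introduced for $F$ and its discriminator $D_X$ as follows:

$$\mathcal{L}_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))]$$

where $F$ and $D_X$ behave analogously, i.e., $\min_F \max_{D_X} \mathcal{L}_{GAN}(F, D_X, Y, X)$.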
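As a small numerical illustration of the adversarial objective, the sketch below evaluates the log-likelihood form of the GAN objective from [4, 22] on lists of discriminator outputs. The function name, the `eps` stabilizer and the toy values are assumptions made for illustration; the text does not state which variant of the objective the actual implementation optimizes.

```python
import math


def adversarial_objective(d_real, d_fake, eps=1e-12):
    """Log-likelihood GAN objective for one generator/discriminator pair.

    d_real: discriminator outputs in (0, 1) for real samples of the target domain.
    d_fake: discriminator outputs in (0, 1) for translated (synthetic) samples.
    eps is a small constant (an assumption) that avoids log(0).
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A confident discriminator (real -> ~1, fake -> ~0) pushes the objective towards 0.
confident = adversarial_objective([0.99, 0.98], [0.01, 0.02])
# A fooled discriminator (0.5 everywhere) yields 2 * log(0.5), a much lower value.
fooled = adversarial_objective([0.5, 0.5], [0.5, 0.5])
```

The discriminator tries to raise this value while the generator tries to lower it, which is the min-max game described above.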

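Cycle consistency loss. Adversarial losses alone do not guarantee that images can be converted back and forth from $X$ to $Y$ and vice versa. Intuitively, if an image $x$ is mapped to domain $Y$ by $G$, applying the inverse mapping $F$ should return the exact same image $x$. This behaviour is encouraged by the cycle consistency term, which, following [22], is formulated as:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1]$$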

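Identity mapping loss. Last but not least, the identity mapping term encourages $G$ and $F$ to apply the identity transformation when real samples of the target domain are provided as the input to the generator. Following [22], this behaviour is encoded in the following equation:

$$\mathcal{L}_{identity}(G, F) = \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(y) - y \rVert_1] + \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(x) - x \rVert_1]$$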
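The cycle consistency and identity terms can be illustrated with a minimal sketch. It assumes the L1 formulation of the original CycleGAN paper [22] and represents images as flat lists of intensities; the toy generators `toy_g` and `toy_f` are hypothetical stand-ins for the trained networks.

```python
def mean_abs_error(a, b):
    """Mean L1 distance between two equally sized flat intensity lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(g, f, x_batch, y_batch):
    """F(G(x)) should recover x, and G(F(y)) should recover y."""
    forward = sum(mean_abs_error(f(g(x)), x) for x in x_batch) / len(x_batch)
    backward = sum(mean_abs_error(g(f(y)), y) for y in y_batch) / len(y_batch)
    return forward + backward


def identity_loss(g, f, x_batch, y_batch):
    """G(y) should stay close to y, and F(x) should stay close to x."""
    g_term = sum(mean_abs_error(g(y), y) for y in y_batch) / len(y_batch)
    f_term = sum(mean_abs_error(f(x), x) for x in x_batch) / len(x_batch)
    return g_term + f_term


def toy_g(img):
    # Hypothetical generator X -> Y: adds a constant intensity offset.
    return [v + 1.0 for v in img]


def toy_f(img):
    # Hypothetical generator Y -> X: removes the same offset.
    return [v - 1.0 for v in img]


xs = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
ys = [[1.0, 2.0, 3.0]]
cyc = cycle_loss(toy_g, toy_f, xs, ys)     # exact inverses: no cycle error
idt = identity_loss(toy_g, toy_f, xs, ys)  # each generator shifts by 1: 1 + 1
```

In this toy case the two mappings are perfect inverses, so the cycle term vanishes, while the identity term still penalizes each generator for altering images that already belong to its output domain.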

CycleGAN Training. The final loss function used to train the CycleGAN model is defined as the sum of the adversarial ($\mathcal{L}_{GAN}$), cycle consistency ($\mathcal{L}_{cyc}$) and identity ($\mathcal{L}_{identity}$) losses. The model is trained following an iterative approach, where each step consists of training the discriminators on one real image and one synthetic image, followed by training the generators to translate one instance each. We define an epoch of the whole training process as 1000 of these steps, and the whole training lasts for 200 epochs. We adopt the Adam optimizer with standard parameters and an initial learning rate of 0.0002. We used the original CycleGAN architecture, modified in only two ways: first, we adapted it to process 3D patches by using standard 3D convolutions; second, we replaced transposed convolutions with resize convolutions [13] in order to avoid checkerboard artifacts in the output images (see Figure 3).
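The alternating schedule described above can be outlined as a short skeleton. Only the step counts and the learning rate come from the text; the four callables are hypothetical placeholders for the actual network updates, and the exact order of operations inside a step is a simplification.

```python
STEPS_PER_EPOCH = 1000   # from the text: one epoch is defined as 1000 steps
NUM_EPOCHS = 200         # from the text
LEARNING_RATE = 0.0002   # initial Adam learning rate reported in the text


def train(update_discriminators, update_generators, sample_real, translate,
          epochs=NUM_EPOCHS, steps_per_epoch=STEPS_PER_EPOCH):
    """Alternate discriminator and generator updates; return the step count.

    All four callables are hypothetical placeholders, not the authors' code.
    """
    steps = 0
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            real = sample_real()               # one real image (or patch)
            fake = translate(real)             # one synthetic image
            update_discriminators(real, fake)  # discriminators go first
            update_generators(real)            # then the generators
            steps += 1
    return steps


# Dry run with no-op updates, only to check the schedule arithmetic.
calls = {"d": 0, "g": 0}


def count_d(real, fake):
    calls["d"] += 1


def count_g(real):
    calls["g"] += 1


total = train(count_d, count_g, lambda: [0.0], lambda img: img,
              epochs=2, steps_per_epoch=5)
```

With the values reported in the text, the full schedule amounts to 1000 × 200 = 200,000 steps.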

Figure 2: Effect of the identity mapping regularization term when training CycleGAN. (a) Input image. (b) Hallucinations induced by a CycleGAN trained without identity mapping. (c) Results obtained with CycleGAN trained with identity mapping (note there is no change in morphology, only in the intensities).

Segmentation model. For WMH segmentation, we employ a 3D U-Net architecture with a final softmax layer producing a lesion probability map (see Appendix section 5.3 for a detailed description of the U-Net architecture used in this work). For optimization, we used the Adam optimizer with a learning rate of 0.0002. Patch-based training is performed by constructing balanced mini-batches of image patches. We balance the mini-batches by sampling with equal probability from those patches centered on a voxel with WMH presence and those centered on a healthy voxel.
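The balanced sampling strategy can be sketched as follows. This is a minimal illustration under assumed data structures (lists of voxel coordinates); the function name and the batch size are hypothetical, and patch extraction around each centre is omitted.

```python
import random


def sample_balanced_centers(lesion_voxels, healthy_voxels, batch_size, rng=None):
    """Draw patch centres, picking lesion or healthy voxels with equal probability.

    lesion_voxels / healthy_voxels: lists of (z, y, x) coordinates.
    Returns (centre, label) pairs, with label 1 for lesion and 0 for healthy.
    """
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        if rng.random() < 0.5:
            batch.append((rng.choice(lesion_voxels), 1))
        else:
            batch.append((rng.choice(healthy_voxels), 0))
    return batch


# Lesion voxels are rare in the image, yet make up about half of the batch.
lesion = [(10, 20, 30), (11, 20, 30)]
healthy = [(z, 5, 5) for z in range(100)]
batch = sample_balanced_centers(lesion, healthy, batch_size=1000,
                                rng=random.Random(0))
lesion_fraction = sum(label for _, label in batch) / len(batch)
```

Sampling the class first and the voxel second keeps the batch balanced regardless of how small the lesion volume is.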

Domain Adaptation. For a source domain with ground-truth annotations, we train a segmentation model. Then, given an unseen target domain, we learn a mapping function from the target to the source domain using the CycleGAN framework. In this way, we enable segmentation of images from the target domain by transforming them before segmentation: the final segmentation maps are obtained by applying the source-trained segmentation model to the translated images.
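The resulting inference pipeline is a simple composition, sketched below with toy stand-ins. The thresholding "segmenter" and the constant-offset "translator" are hypothetical and only mimic an intensity shift between domains; they are not the trained U-Net or CycleGAN generator.

```python
def segment_target_image(image, target_to_source, source_segmenter):
    """Translate a target-domain image to the source domain, then segment it."""
    return source_segmenter(target_to_source(image))


def toy_source_segmenter(image):
    # Hypothetical source-trained model: bright voxels are labelled as lesion.
    return [1 if v > 2.0 else 0 for v in image]


def toy_target_to_source(image):
    # Hypothetical learned mapping: the target scanner is brighter by 3 units.
    return [v - 3.0 for v in image]


target_image = [3.5, 4.0, 6.0]
without_adaptation = toy_source_segmenter(target_image)
with_adaptation = segment_target_image(target_image, toy_target_to_source,
                                       toy_source_segmenter)
```

The segmenter itself is never retrained; only the mapping function is specific to the new domain.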

Figure 3: Example of the “checkerboard artifact”. (a) Correctly transformed image (obtained with a generator which uses resize convolution). (b) “Checkerboard artifact” obtained when using transposed convolutions.

3 Experiments

Database. We employ the 2017 WMH Segmentation Challenge dataset [8], which is publicly available and includes multicenter images. This database provides 60 brain magnetic resonance images (T1 and FLAIR sequences) captured in three different medical centers alongside their manual WMH segmentations. The 60 MRIs are divided into 3 groups:

  • University Medical Center, Utrecht: 20 MR images captured with a 3T Philips Achieva machine. It includes T1 (Voxel size: . TR/TE: 7.9/4.5 ms.) and FLAIR (Voxel size: . TR/TE/TI: 11000/125/2800 ms.) images.

  • National University Health System, Singapore: 20 MR images captured with a 3T Siemens TrioTim machine. It includes T1 (Voxel size: . TR/TE/TI: 2300/1.9/900 ms) and FLAIR (Voxel size: . TR/TE/TI: 9000/82/2500 ms) images.

  • Vrije Universiteit, Amsterdam: 20 MR images captured with a 3T GE Signa HDxt machine. It includes T1 (Voxel size: . TR/TE: 9.9/4.6 ms) and FLAIR (Voxel size: . TR/TE/TI: 4800/279/1650 ms) images.

Baseline histogram matching (HM) method. For comparison, we implemented a baseline adaptation method using standard histogram matching [12]. We adopted a pairwise strategy proceeding as follows: given an image from the unseen target domain, we look for the image in the training source domain that is most similar to it in terms of Jensen-Shannon (JS) divergence [9] (see next paragraph). We then transform the histogram of the target image to match that of the selected source image using the SimpleITK [11] histogram matching function.
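To convey the idea behind histogram matching, the sketch below maps every target intensity to the reference intensity found at the same quantile. It is a simplified rank-based variant written for illustration, not the SimpleITK function used in the experiments; the function name and toy values are assumptions.

```python
def match_histogram(target, reference):
    """Map each target intensity to the reference intensity at the same quantile."""
    ref_sorted = sorted(reference)
    # Indices of the target voxels, ordered from darkest to brightest.
    order = sorted(range(len(target)), key=lambda i: target[i])
    matched = [0.0] * len(target)
    for rank, idx in enumerate(order):
        # Position of this voxel in the target's empirical CDF, in [0, 1].
        q = rank / (len(target) - 1) if len(target) > 1 else 0.0
        matched[idx] = ref_sorted[round(q * (len(ref_sorted) - 1))]
    return matched


target = [10.0, 30.0, 20.0, 40.0]
reference = [1.0, 2.0, 3.0, 4.0]
matched = match_histogram(target, reference)
```

The ordering of voxels in the target image is preserved while its intensity distribution takes the shape of the reference one, which is why this kind of method changes contrast but not morphology.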

Jensen-Shannon Divergence. JS divergence is a symmetric measure that quantifies how different two probability distributions are (see [9] for more details about the definition of JS divergence). We interpret the histogram of intensities of every image as a distribution, and use the JS divergence to measure pairwise distances. Note that the histogram only considers the voxel intensities within the head mask (i.e. excluding background). In this study, we employ pairwise JS divergences for two different tasks. On the one hand, as described in the previous paragraph, we use it to choose the closest image for histogram matching.

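On the other hand, we use it as an indicator to quantify co-variate shift between domains. To this end, we define the average inter-domain JS divergence between all possible pairs of images from two domains $A$ and $B$ as:

$$\overline{JS}(A, B) = \frac{1}{N_A N_B} \sum_{i=1}^{N_A} \sum_{j=1}^{N_B} JS(h_{a_i}, h_{b_j})$$

where $h_{a_i}$ and $h_{b_j}$ are the intensity histograms of the $i$-th image of $A$ and the $j$-th image of $B$, and $N_A$ and $N_B$ indicate the number of images in each domain. We also define the average intra-domain JS divergence $\overline{JS}(A, A)$, but of course we exclude comparisons of a given image with itself ($i \neq j$), so that the average is taken over the $N_A (N_A - 1)$ remaining pairs. We employ inter- and intra-domain pairwise JS divergences as an indicator of co-variate domain shift.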
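The pairwise JS computation can be sketched in a few lines. The number of bins, the intensity range and the use of base-2 logarithms (which bounds the divergence in [0, 1]) are assumptions made for illustration; the text does not specify them.

```python
import math


def histogram(values, bins, lo, hi):
    """Normalised intensity histogram over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[max(k, 0)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero probability contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mean_pairwise_js(domain_a, domain_b, same_domain=False):
    """Average JS over all image pairs; skips i == j when same_domain is True."""
    values = [js_divergence(a, b)
              for i, a in enumerate(domain_a)
              for j, b in enumerate(domain_b)
              if not (same_domain and i == j)]
    return sum(values) / len(values)


# Toy two-bin histograms standing in for the head-mask histograms of each image.
a1, a2 = [1.0, 0.0], [0.5, 0.5]
b1 = [0.0, 1.0]
inter = mean_pairwise_js([a1, a2], [b1])
intra = mean_pairwise_js([a1, a2], [a1, a2], same_domain=True)
```

Here `inter` plays the role of the inter-domain average and `intra` that of the intra-domain average, where an image is never compared with itself.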

Figure 4: WMH segmentation results measured in terms of Dice for different source and target domains: (i) no domain adaptation; (ii) adaptation via histogram matching; (iii) adaptation via CycleGAN; (iv) training in the target domain.

Figure 5: Co-variate shift comparison based on the JS divergence between pairs of images in 3 different scenarios: without domain adaptation (in blue), with HM domain adaptation (in yellow) and with CycleGAN domain adaptation (in green), together with the divergence within the same domain (in brown). Lower pairwise JS indicates smaller differences in the intensity distribution. The average divergence is shown with a red diamond.

Experiments and discussion. All images were first pre-processed using z-score normalization to account for large variations in intensity ranges. The models were implemented in Keras. We employ two different approaches to evaluate the effectiveness of CycleGAN on domain adaptation.

The first approach directly quantifies the segmentation performance with and without domain adaptation. Figure 4 shows segmentation results, comparing CycleGAN and HM, but also including an upper bound given by training on images from the target domain. Figure 1 shows some visual results from the same experiment. We use the Dice coefficient [2] to measure segmentation performance. For every experiment we performed 7-fold cross-validation, training 7 times with 17 images (14 for training and 3 for validation) and leaving the other 3 out for testing. The results show that CycleGAN not only improves segmentation performance for all combinations of source and target but also enables segmentation in cases with null Dice before domain adaptation. When compared with HM, we observe that using CycleGAN systematically improves the mean Dice, while HM presents variable performance depending on the domain. Moreover, in all but one scenario CycleGAN outperforms HM in this task.
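For reference, the Dice coefficient between a predicted and a ground-truth binary mask can be computed as in the following sketch. Masks are represented as flat 0/1 lists, and returning 1.0 for two empty masks is a convention assumed here, not one stated in the text.

```python
def dice_coefficient(prediction, ground_truth):
    """Dice overlap between two binary masks given as flat 0/1 lists."""
    intersection = sum(p * g for p, g in zip(prediction, ground_truth))
    total = sum(prediction) + sum(ground_truth)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
partial = dice_coefficient(pred, truth)        # 2 * 2 / (3 + 3)
null_dice = dice_coefficient([1, 0], [0, 1])   # no overlap at all
```

A null Dice, as mentioned above, corresponds to a prediction that shares no lesion voxel with the ground truth.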

The second evaluation approach uses the pairwise JS divergences as a proxy to approximate the co-variate shift. Results are shown in Figure 5, where we compare the JS divergence for multiple domains without domain adaptation, with HM domain adaptation, with CycleGAN domain adaptation and within the same domain. The red diamonds in the boxplot indicate the mean pairwise JS divergence previously defined. It can be observed that, in most cases, CycleGAN significantly reduces the inter-domain JS divergence, outperforming the results obtained with HM.

4 Conclusions

In this work we show, for the first time, that CycleGAN-based domain adaptation improves lesion segmentation in multicenter brain MR images, particularly for WMH lesions. We compared the proposed approach with standard histogram matching, both in terms of segmentation quality improvement and co-variate shift between source and target domains. JS divergence is used as a measure to understand the differences between the domains, and it seems to anti-correlate with segmentation performance. In other words, lower inter-domain JS divergence results in better generalization from source to target.

Our results have important practical implications. First, even if the segmentation performance is not as good as the upper bound given by training with annotated data from the target domain, it improves segmentation in cases which had null Dice before domain adaptation. This could be used to automatically detect the presence of WMH lesions in completely unseen domains without ground-truth. Second, unlike other adversarial domain adaptation techniques [6, 14], which require access to the unannotated images of the target domain while training the segmentation network, our method can be used on completely unseen domains without re-training the segmenter. This makes the method useful in real clinical situations where new MR machines may arrive at a hospital once the segmentation software has been deployed.

In the future, we plan to extend our study to other types of brain lesions (e.g. stroke or brain tumours) which could also benefit from this approach.

The authors gratefully acknowledge the support of UNL (CAID-PIC-50220140100084LI) and ANPCyT (PICT 2018-03907).

5 Appendix

5.1 CycleGAN Generator Architecture

For the generator, we adopted the original architecture from CycleGAN with the difference that layers are converted from 2D to 3D, extending the kernel dimensions accordingly (e.g. convolutions are $3 \times 3 \times 3$ instead of $3 \times 3$, see Table 1). Reflection padding was used to reduce artifacts in the first and last layer. The last layer does not have an activation function, since the model can adapt according to the intensity range of each domain.

| Layer | Type | Kernel | Stride | #Kernels | Activation | Padding |
|---|---|---|---|---|---|---|
| L1 | Conv3D | (f:7,7,7) | (s:1,1,1) | (N:32) | ReLU | (RP: 3,3,3) |
| L2 | Conv3D | (f:3,3,3) | (s:2,2,2) | (N:64) | ReLU | |
| L3 | Conv3D | (f:3,3,3) | (s:2,2,2) | (N:128) | ReLU | |
| L4 … L13 | ResBlock | (f:3,3,3) | (s:1,1,1) | (N:128) | ReLU | |
| | | (f:3,3,3) | (s:1,1,1) | (N:128) | ReLU | |
| L14 | UpSampling | | | | | |
| | Conv3D | (f:3,3,3) | (s:1,1,1) | (N:128) | ReLU | |
| L16 | UpSampling | | | | | |
| | Conv3D | (f:3,3,3) | (s:1,1,1) | (N:64) | ReLU | |
| L17 | UpSampling | | | | | |
| | Conv3D | (f:3,3,3) | (s:1,1,1) | (N:32) | ReLU | |
| L18 | Conv3D | (f:7,7,7) | (s:1,1,1) | (N:2) | None | (RP: 3,3,3) |

Table 1: Detailed description of the CycleGAN generator architecture.

5.2 CycleGAN Discriminator Architecture

The discriminator follows an architecture similar to that of PatchGAN but with 3D convolutions.

| Layer | Type | Kernel | Stride | #Kernels | Activation | Normalization |
|---|---|---|---|---|---|---|
| L1 | Conv3D | (f:4,4,4) | (s:2,2,2) | (N:64) | LeakyReLU | None |
| L2 | Conv3D | (f:3,3,3) | (s:2,2,2) | (N:128) | LeakyReLU | Instance |
| L3 | Conv3D | (f:3,3,3) | (s:2,2,2) | (N:256) | LeakyReLU | Instance |
| L4 | Conv3D | (f:3,3,3) | (s:1,1,1) | (N:512) | ReLU | Instance |
| L5 | Conv3D | (f:4,4,4) | (s:1,1,1) | (N:1) | Sigmoid | None |

Table 2: Detailed description of the CycleGAN discriminator architecture.

5.3 3D U-Net Architecture

We adopted a modified version of the standard U-Net architecture [17]. We replaced the 2D convolutions of the standard U-Net architecture with 3D convolutions. The encoding blocks consist of two convolutional layers with kernels of size $3 \times 3 \times 3$, padding = 1 and ReLU activation, followed by a max-pooling layer. The decoding blocks also have two convolutional layers, but we use upsampling via transposed convolutions before each block. The standard U-Net uses concatenation of feature maps in the skip connections. We replaced concatenation by sum to combine the localized features of the encoding path with the input of the corresponding block from the decoding path. The last layer consists of a convolution with softmax to output voxel-wise probability maps.

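| Layer | Type | Kernel | Stride | #Kernels | Activation |
|---|---|---|---|---|---|
| L1 | EncodingBlock | | | (N:32) | |
| L2 | EncodingBlock | | | (N:64) | |
| L3 | EncodingBlock | | | (N:128) | |
| L4 | EncodingBlock | | | (N:256) | |
| L5 | Conv3D | (f:3,3,3) | (s:1,1,1) | (N:256) | ReLU |
| L6 | Conv3D | (f:3,3,3) | (s:1,1,1) | (N:256) | ReLU |
| L7 | DecodingBlock | | | (N:256) | |
| L8 | DecodingBlock | | | (N:128) | |
| L9 | DecodingBlock | | | (N:64) | |
| L10 | DecodingBlock | | | (N:32) | |
| L11 | Conv3D | (f:1,1,1) | (s:1,1,1) | (N:2) | |

Table 3: U-Net architecture used for WMH segmentation.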


References

  • [1] S. Debette and H. Markus (2010) The clinical importance of white matter hyperintensities on brain magnetic resonance imaging: systematic review and meta-analysis. BMJ 341, pp. c3666.
  • [2] L. R. Dice (1945) Measures of the amount of ecologic association between species. Ecology 26 (3), pp. 297–302.
  • [3] M. Ghafoorian, A. Mehrtash, T. Kapur, N. Karssemeijer, E. Marchiori, M. Pesteie, C. R. Guttmann, F. de Leeuw, C. M. Tempany, B. van Ginneken, et al. (2017) Transfer learning for domain adaptation in MRI: application in brain lesion segmentation. In International conference on medical image computing and computer-assisted intervention, pp. 516–524.
  • [4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680.
  • [5] R. Guerrero, C. Qin, O. Oktay, C. Bowles, L. Chen, R. Joules, R. Wolz, M. d. C. Valdés-Hernández, D. Dickie, J. Wardlaw, et al. (2018) White matter hyperintensity and stroke lesion segmentation and differentiation using convolutional neural networks. NeuroImage: Clinical 17, pp. 918–934.
  • [6] K. Kamnitsas, C. Baumgartner, C. Ledig, V. Newcombe, J. Simpson, A. Kane, D. Menon, A. Nori, A. Criminisi, D. Rueckert, et al. (2017) Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In International conference on information processing in medical imaging, pp. 597–609.
  • [7] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker (2017) Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical image analysis 36, pp. 61–78.
  • [8] H. J. Kuijf, J. M. Biesbroek, J. De Bresser, R. Heinen, S. Andermatt, M. Bento, M. Berseth, M. Belyaev, M. J. Cardoso, A. Casamitjana, et al. (2019) Standardized assessment of automatic segmentation of white matter hyperintensities and results of the WMH segmentation challenge. IEEE transactions on medical imaging 38 (11), pp. 2556–2568.
  • [9] J. Lin (1991) Divergence measures based on the Shannon entropy. IEEE Transactions on Information theory 37 (1), pp. 145–151.
  • [10] Y. Liu, G. R. Kirk, B. M. Nacewicz, M. A. Styner, M. Shen, D. Nie, N. Adluru, B. Yeske, P. A. Ferrazzano, and A. L. Alexander (2019) Harmonization and targeted feature dropout for generalized segmentation: application to multi-site traumatic brain injury images. In Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data, pp. 81–89.
  • [11] B. C. Lowekamp, D. T. Chen, L. Ibáñez, and D. Blezek (2013) The design of SimpleITK. Frontiers in neuroinformatics 7, pp. 45.
  • [12] L. G. Nyúl, J. K. Udupa, and X. Zhang (2000) New variants of a method of MRI scale standardization. IEEE transactions on medical imaging 19 (2), pp. 143–150.
  • [13] A. Odena, V. Dumoulin, and C. Olah (2016) Deconvolution and checkerboard artifacts. Distill.
  • [14] M. Orbes-Arteaga, T. Varsavsky, C. H. Sudre, Z. Eaton-Rosen, L. J. Haddow, L. Sørensen, M. Nielsen, A. Pai, S. Ourselin, M. Modat, et al. (2019) Multi-domain adaptation in brain MRI through paired consistency and adversarial learning. In Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data, pp. 54–62.
  • [15] S. J. Pan and Q. Yang (2009) A survey on transfer learning. IEEE Transactions on knowledge and data engineering 22 (10), pp. 1345–1359.
  • [16] M. F. Rachmadi, M. d. C. Valdés-Hernández, M. L. F. Agan, C. Di Perri, T. Komura, A. D. N. Initiative, et al. (2018) Segmentation of white matter hyperintensities using convolutional neural networks with global spatial information in routine clinical brain MRI with none or mild vascular pathology. Computerized Medical Imaging and Graphics 66, pp. 28–43.
  • [17] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. CoRR abs/1505.04597.
  • [18] N. Roulet, D. F. Slezak, and E. Ferrante (2019) Joint learning of brain lesion and anatomy segmentation from heterogeneous datasets. arXiv preprint arXiv:1903.03445.
  • [19] P. Seeböck, D. Romo-Bucheli, S. Waldstein, H. Bogunovic, J. I. Orlando, B. S. Gerendas, G. Langs, and U. Schmidt-Erfurth (2019) Using CycleGANs for effectively reducing image variability across OCT devices and improving retinal fluid segmentation. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 605–609.
  • [20] M. Shakeri, S. Tsogkas, E. Ferrante, S. Lippe, S. Kadoury, N. Paragios, and I. Kokkinos (2016) Sub-cortical brain structure segmentation using F-CNN’s. In 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), pp. 269–272.
  • [21] A. Storkey (2009) When training and test sets are different: characterizing learning transfer. Dataset shift in machine learning, pp. 3–28.
  • [22] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232.