Learning Multi-Site Harmonization of Magnetic Resonance Images Without Traveling Human Phantoms

09/30/2021 · Siyuan Liu, et al. · University of North Carolina at Chapel Hill

Harmonization improves data consistency and is central to effective integration of diverse imaging data acquired across multiple sites. Recent deep learning techniques for harmonization are predominantly supervised in nature and hence require imaging data of the same human subjects to be acquired at multiple sites. Data collection as such requires the human subjects to travel across sites and is hence challenging, costly, and impractical, more so when sufficient sample size is needed for reliable network training. Here we show how harmonization can be achieved with a deep neural network that does not rely on traveling human phantom data. Our method disentangles site-specific appearance information and site-invariant anatomical information from images acquired at multiple sites and then employs the disentangled information to generate the image of each subject for any target site. We demonstrate with more than 6,000 multi-site T1- and T2-weighted images that our method is remarkably effective in generating images with realistic site-specific appearances without altering anatomical details. Our method allows retrospective harmonization of data in a wide range of existing modern large-scale imaging studies, conducted via different scanners and protocols, without additional data collection.


Introduction

Magnetic resonance imaging (MRI) is a non-invasive and versatile technique that provides good soft tissue contrasts useful for diagnosis, prognosis, and treatment monitoring. Since MRI experiments are costly and time-consuming, modern large-scale MRI studies typically rely on multiple imaging sites to collaboratively collect data with greater sample sizes for more comprehensive coverage of factors that can affect study outcomes, such as age, gender, geography, socioeconomic status, and disease subtypes. Notable examples of multi-site studies include the Adolescent Brain Cognitive Development (ABCD)[1], the Alzheimer’s Disease Neuroimaging Initiative (ADNI)[2], and the Australian Imaging, Biomarkers and Lifestyle Flagship Study of Aging (AIBL)[3].

Multi-site data collection leads inevitably to undesirable elevation of non-biological variability introduced by differences in scanners[4] and imaging protocols[5]. Protocols can be prospectively harmonized by selecting for the individual sites the acquisition parameters that result in images with maximal inter-site consistency. However, prospective harmonization requires extensive data acquisition for parameter tuning, needs to be performed before each study, and does not allow for correction of data collected in studies that have already taken place. Moreover, significant inter-site variability can still occur in data collected with harmonized acquisition protocols simply due to irreconcilable differences between scanners[4]. Retrospective MRI harmonization[6] overcomes these limitations by performing post-acquisition correction and is hence applicable to existing studies for improving the accuracy, reliability, and reproducibility of downstream processing, analyses, and inferences.

Statistics (intensity normalization)
  • Histogram matching[7, 8, 9]: Align distributions of voxel intensity values based on an image template constructed from several control subjects.
  • White stripe[10]: Normalize intensity values based on patches of normal-appearing white matter (NAWM); rescaled intensity values are biologically interpretable as units of NAWM.
  • Multi-site image harmonization by cumulative distribution function alignment (MICA)[11]: Transform voxel intensity values of one scan to align with the intensity cumulative distribution function (CDF) of a second scan.

Statistics (batch effect adjustment)
  • Removal of Artificial Voxel Effect by Linear regression (RAVEL)[12]: Estimate, using a control region of the brain, the latent factors of unwanted variation common to all voxels.
  • Combating batch effects (ComBat)[13, 14]: Identify batch-specific transformations to express all data in a common space.

Learning (machine learning)
  • Regression ensembles with patch learning for image contrast agreement (REPLICA)[15]: Supervised training of a random forest for nonlinear regression of a target contrast.
  • NeuroHarmony[16]: Supervised training of a random forest for nonlinear regression of a target contrast determined based on prescribed regions.

Learning (deep learning)
  • DeepHarmony[17]: Supervised training of a U-Net to produce images with a consistent contrast.
  • Disentangled Latent Space (DLS)[18]: Supervised training of a deep neural network using disentangled anatomical and contrast components.
  • Unlearning of dataset bias[19]: Supervised training of a deep neural network to learn scanner-invariant features and then reducing scanner influence on network predictions in tasks of interest.
  • MURD (proposed): Unsupervised harmonization of images using content and style disentanglement learned jointly from multi-site images.

Table 1: Existing MRI harmonization methods
Figure 1: Overview of MURD. a, Multi-site representation disentanglement. Given image sets from multiple sites, MURD disentangles each image set into a site-invariant content space and a site-specific style space, with the content features aligned in the content space. This is achieved using content-style disentangled cycle translation (CS-DCT). b, Site-specific CS-DCT. An image from the i-th site is encoded in the content space and the style space to obtain content features and style features. Style features of the j-th site (j ≠ i) are generated using a style generator and decoded together with the content features to generate a harmonized image for the j-th site, which is in turn encoded to generate content features for reconstructing the image with the original style features. c, Reference-specific CS-DCT. Unlike site-specific CS-DCT in b, style features are obtained by encoding a reference image in the style space. d, The MURD framework. MURD implements CS-DCT by employing a site-shared content encoder, a site-specific style encoder, a site-shared generator, a site-specific style generator, and a site-specific discriminator for content encoding, style encoding, decoding, style generation, and adversarial learning, respectively. MURD is constrained by four types of losses: consistency losses, an adversarial loss, a content alignment loss, and an identity loss.

Existing retrospective harmonization methods are either statistics-based or learning-based (Table 1). Statistics-based methods align intensity distributions via intensity normalization[7, 8, 9, 10, 11] or batch effect adjustment[12, 13, 14]. However, these methods are typically limited to whole-image, rather than detail-specific, harmonization. Learning-based methods translate images between sites via nonlinear mappings determined using machine learning[15, 16] or deep learning[17, 18, 19], with or without supervision. Machine learning methods predict harmonized images using regression models learned, typically with supervision, from hand-crafted features. In contrast, deep learning techniques automatically extract features pertinent to the harmonization task. Supervised methods typically require training data acquired from traveling human phantoms, which might not always be available for large-scale, multi-site, or longitudinal studies. Unsupervised deep learning techniques[20, 21, 22] determine mappings between sites using unpaired images and therefore avoid the need for traveling human phantom data. These methods are, however, not scalable to large-scale multi-site MRI harmonization as they typically learn pair-wise mappings between sites. For N sites, these methods learn mappings for all N(N-1)/2 site pairs and therefore require a large amount of data for learning the multitude of network parameters. These pair-wise methods are also ineffective because they do not fully and jointly utilize the complementary information available from all sites.

Here, we draw inspiration from recent advancements in multi-domain image-to-image translation[23, 24, 25] and introduce a unified framework for simultaneous multi-site harmonization using only a single deep learning model. Our method, called multi-site unsupervised representation disentangler (MURD), decomposes an image into anatomical content that is invariant across sites and appearance style (e.g., intensity and contrast) that is dependent on each site. Harmonized images are generated by combining the content of the original image with styles specific to the different sites. More specifically, encoding an image in site-invariant and site-specific feature spaces is achieved via two encoders, i.e., a content encoder that captures anatomical structures common across sites and a style encoder that captures style information specific to each site. A site-harmonized image is generated via a generator that combines the extracted content with the style associated with a target site. The target style can be specified by a reference image from a site or by a randomized style code generated by a style generator specific to each site. The latter allows multiple appearances to be generated in relation to natural style variations associated with each site. The network is trained with losses that are designed to promote full representation disentanglement and to maximally preserve structural details. MURD is summarized in Figure 1 and detailed in the Methods section.

Results

Figure 2: Harmonization of T1- and T2-weighted Images. Site-specific harmonization of a, T1-weighted images and b, T2-weighted images. Reference-specific harmonization of c, T1-weighted images and d, T2-weighted images. The original images are shown in the first column and the harmonized images are shown for GE, Philips, and Siemens, respectively, in the second to fourth columns. MURD is remarkably effective in harmonizing contrasts and preserving details.
Figure 3: Numerical Evaluation of Harmonization Outcomes. Quantitative evaluation conducted for the GE, Philips, and Siemens sites using FID and KID as metrics for a, T1-weighted images and b, T2-weighted images from the validation dataset, and c, T1-weighted images and d, T2-weighted images from the generalizability dataset. Bullseyes show the means and error bars show the standard errors on the means. MURD, both site-specific (SS) and reference-specific (RS), yields lower FID and KID values that are closer to the reference values than DRIT++[26] and StarGAN-v2[25]. The FID and KID values for the generalizability dataset are largely consistent with those of the validation dataset.
Figure 4: Validation via the Traveling Human Phantom Dataset. Evaluation of harmonized a, T1-weighted images and b, T2-weighted images from the traveling human phantom dataset for the following harmonization tasks: GE to Philips (GP), Philips to GE (PG), GE to Siemens (GS), Siemens to GE (SG), Philips to Siemens (PS), and Siemens to Philips (SP). Segmentation consistency of c, T1-weighted images and d, T2-weighted images from the traveling human phantom dataset with and without harmonization. The lines show the means and the shaded regions show the standard errors on the means. MURD yields the best performance in terms of MAE, MS-SSIM, and PSNR. The improvement in DSCs indicates that MURD harmonization significantly increases the consistency of tissue segmentation.
Figure 5: Continuous and Incremental Harmonization. Continuous harmonization of a, T1-weighted images and b, T2-weighted images between two sites, demonstrating that MURD smoothly changes image contrast without altering anatomical details, thanks to its ability to disentangle content and style information.

Materials

We demonstrate the effectiveness of MURD on brain T1- and T2-weighted images of children 9 to 10 years of age acquired through the Adolescent Brain Cognitive Development (ABCD) study[1] using scanners from different vendors, including General Electric (GE), Philips, and Siemens. Informed consent was obtained from all participants[1]. Although the imaging protocols were matched across scanners at the different imaging sites[27], inter-scanner variability in appearance is still significant (Figure 2). For simplicity, we group the images and treat them as coming from three virtual sites—the GE, Philips, and Siemens sites. We trained and tested MURD separately for T1-weighted images and T2-weighted images. For each modality, we trained MURD using a modest sample size of 20 images per vendor. Three datasets per modality were used for testing different aspects of MURD:

  1. Validation Dataset – A small but diverse dataset of 10 images per vendor, carefully selected to be structurally different from the images in the training dataset. This dataset is used to evaluate the effectiveness of MURD on data beyond the training dataset.

  2. Generalizability Dataset – A large dataset of 1,000 images randomly selected for each vendor. This dataset is deliberately much larger than the training dataset so that the generalizability of MURD can be assessed.

  3. Human Phantom Dataset – A traveling human phantom dataset of 1 subject scanned on both GE and Philips scanners, 5 subjects scanned on both GE and Siemens scanners, and 2 subjects scanned on both Philips and Siemens scanners. This testing dataset allows numerical evaluation to ensure that anatomical structures are preserved after harmonization. Images for each subject were aligned using Advanced Normalization Tools (ANTS)[28].

Note that the training dataset and the three testing datasets are mutually exclusive, with no overlapping samples. Considering both modalities and all three vendors, more than 6,000 images were used for evaluation. Sixty central axial slices were extracted from each image volume for training and testing.

MURD Harmonizes Contrasts and Preserves Details

We demonstrate the effectiveness of MURD harmonization on T1- and T2-weighted images (Figure 2). Note that MURD allows harmonization with respect to either a site or a reference image. The former amounts to harmonization with respect to an image randomly drawn from a site's image distribution. MURD is remarkably effective in adapting contrast and preserving details when harmonizing images from a source site to a target site (Figures 2a and b). When the source and target sites are identical, MURD retains both contrast and details. When given a reference image, MURD harmonizes the contrast of images from a source site with that of the reference image while preserving the anatomical details of the original images (Figures 2c and d).

MURD Outperforms State-of-the-Art Methods

To further demonstrate the effectiveness and superiority of MURD, we compared it with two closely related state-of-the-art unsupervised methods, i.e., DRIT++[26] and StarGAN-v2[25], which are designed respectively for dual-domain and multi-domain image-to-image translation. DRIT++ and StarGAN-v2 were implemented and trained as described in the original papers[26, 25]. DRIT++ was trained once for every pair of sites. StarGAN-v2 was trained concurrently for multiple sites, similar to MURD. We quantitatively compared the visual quality of the harmonized images using two metrics, i.e., the Fréchet inception distance (FID)[29] and the kernel inception distance (KID)[30], which reflect distribution discrepancy between two sets of images in a manner that correlates well with human judgment[31]. FID and KID are respectively computed based on the Fréchet distance and the maximum mean discrepancy (MMD) of features from the last average pooling layer of an Inception-V3 network[32] trained on ImageNet[33]. FID and KID were computed at the slice level for the harmonized images with respect to the training images of the target sites. For site-specific harmonization, 10 target-site harmonized images were generated for each testing image of the source site with 10 randomly generated style codes. For reference-specific harmonization, each image in the source site was harmonized with respect to 10 reference images randomly selected from the testing images of a target site. The results (Figures 3a and b) indicate that MURD outperforms DRIT++ and StarGAN-v2 for both FID and KID. For comparison, reference FID and KID values were computed between the training and testing images from the same site (denoted as “Reference”). The FID and KID values given by MURD are significantly closer to the reference values than those of DRIT++ and StarGAN-v2. MURD harmonization of the generalizability dataset (Figures 3c and d) yields FID and KID values that are highly consistent with those of the validation dataset and close to the reference values. This indicates that, although trained using a modest dataset, the model is generalizable to a much larger dataset.
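For concreteness, the following is a minimal sketch of how FID and KID can be computed from Inception-V3 pooling features; this is not the authors' evaluation code, and feature extraction with an ImageNet-pretrained Inception-V3 (and the sampling strategy) is assumed to have been done beforehand.

```python
# Minimal sketch (not the authors' evaluation code): FID and KID from
# Inception-V3 pooling features. `feats_real` and `feats_fake` are assumed
# to be (N, 2048) NumPy arrays from the last average pooling layer.
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_fake):
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)   # matrix square root of the covariance product
    if np.iscomplexobj(covmean):            # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean)

def kernel_inception_distance(feats_real, feats_fake, degree=3, coef=1.0):
    # Unbiased polynomial-kernel MMD^2 estimate, the quantity underlying KID.
    d = feats_real.shape[1]
    k = lambda x, y: (x @ y.T / d + coef) ** degree
    m, n = len(feats_real), len(feats_fake)
    k_rr = k(feats_real, feats_real)
    k_ff = k(feats_fake, feats_fake)
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    term_rf = k(feats_real, feats_fake).mean()
    return term_rr + term_ff - 2.0 * term_rf
```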

MURD Efficacy Validated via Traveling Human Phantom Data

The human phantom dataset allows direct quantitative evaluation of the effects of harmonization on consistency of structure and appearance. Based on multiple metrics, including mean absolute error (MAE), multi-scale structural similarity index (MS-SSIM), and peak signal-to-noise ratio (PSNR), MURD significantly outperforms DRIT++ and StarGAN-v2, indicating better harmonization of contrast and preservation of anatomical details (Figures 4a and b).

MURD Improves Tissue Segmentation Consistency

Segmentation of brain tissues is sensitive to variation in image contrast, and under- or over-segmentation can occur owing to differences in acquisition protocols. We applied the Brain Extraction Tool (BET)[34] and FMRIB's Automated Segmentation Tool (FAST)[35] to the T1- and T2-weighted images in the human phantom dataset for brain extraction and tissue segmentation. Tissue segmentation consistency before and after harmonization was measured using the Dice similarity coefficient (DSC), with the tissue segmentation maps from the target site as references. The results (Figures 4c and d) indicate that DSCs are improved remarkably by harmonization using MURD.
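As a minimal illustration, the DSC between two tissue label maps can be computed as follows; the label convention in the usage comment is hypothetical.

```python
# Minimal sketch (assumed, not the authors' code): Dice similarity coefficient
# between two label maps, e.g., FAST outputs for a harmonized image and the
# corresponding target-site image.
import numpy as np

def dice_similarity_coefficient(seg_a, seg_b, label):
    a = (seg_a == label)
    b = (seg_b == label)
    intersection = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Example (hypothetical label convention): DSC for gray matter labeled as 2
# dsc_gm = dice_similarity_coefficient(seg_harmonized, seg_target, label=2)
```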

MURD Supports Continuous and Incremental Harmonization

We further demonstrate that MURD fully disentangles content and style information via continuous and incremental harmonization. We generated images with intermediate between-site appearances to aid visual inspection of how appearance changes gradually between sites and whether unwanted anatomical alterations are introduced in the process. Intermediate style features are calculated from the style features of Site A, s_A, and Site B, s_B, via s = (1 - α) s_A + α s_B, where α ∈ [0, 1]. The intermediate style features and the content features of an image are then used to generate an intermediate image. The results (Figure 5) indicate that MURD gradually and smoothly changes image appearance without altering anatomical details.
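A minimal sketch of this interpolation, assuming trained MURD components are available; the names generator, s_A, and s_B are placeholders rather than the released API.

```python
# Minimal sketch of continuous harmonization via style interpolation; the
# content features and the two style codes are assumed to come from a trained
# MURD model (names such as `generator` are placeholders).
import numpy as np

def interpolate_styles(style_a, style_b, num_steps=5):
    """Linearly interpolate between the style codes of Site A and Site B."""
    alphas = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - a) * style_a + a * style_b for a in alphas]

# intermediate_images = [generator(content, s) for s in interpolate_styles(s_A, s_B)]
```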

Discussion

We have demonstrated that MURD is remarkably effective in harmonizing MR images by removing non-biological site differences and at the same time preserving anatomical details. MURD network training involves only site-labeled images and requires no traveling human phantom data. This flexibility allows data from existing large-scale studies to be harmonized retrospectively without requiring additional data to be collected.

We have shown that MURD yields superior performance over DRIT++[26] and StarGAN-v2[25]. For every pair of sites, DRIT++ embeds images in a site-invariant content space capturing information shared across sites and a site-specific style space. The encoded content features extracted from an image of one site are combined with style features from another site to synthesize the corresponding harmonized image. Learning is unsupervised and hence paired data are not required. However, DRIT++ is not scalable owing to the need to learn mappings for all site pairs. DRIT++ is also ineffective because it cannot fully utilize the entire training data and can only learn from two sites at a time, causing it to miss global features that can be learned from images of all sites. Failure to fully utilize the training data likely limits the quality of the generated images. Unlike DRIT++, StarGAN-v2[25] is scalable and performs image-to-image translations for multiple sites using only a single model. It has been applied to the problem of MRI harmonization[36] with promising results. In addition to greater scalability, StarGAN-v2 generates images of higher visual quality owing to its ability to jointly consider the information offered by images from all sites. StarGAN-v2, however, does not explicitly disentangle images into structural and appearance information. This introduces the possibility of altering anatomical details during harmonization via style transfer. In contrast, MURD enforces explicit disentanglement of content and style features by jointly considering images from all sites, allowing it to produce harmonized images with diverse appearances and significantly better preservation of anatomical details (see Supplementary Figures 1–3). Disentanglement safeguards harmonization against altering anatomical content and allows gradual and controllable harmonization via interpolation of style features.

The harmonization target is specified for DRIT++ and StarGAN-v2 by a reference image. The appearance of the harmonized image is determined by the style features extracted from the reference image. In addition to a reference image, the harmonization target for MURD can be specified by a site label, which determines the output branch of the style generator and the style encoder. A latent code sampled from the standard Gaussian distribution determines an appearance specific to the site.

A recent MRI harmonization method, called CALAMITI[37], relies on intra-site supervised image-to-image translation and unsupervised domain adaptation for multi-site harmonization. This requires training a disentangled representation model with intra-site multi-contrast images (T1- and T2-weighted images) of the same subjects and retraining the model for a new site via domain adaptation[38]. Unlike CALAMITI, MURD requires only images from a single contrast and can learn multi-site harmonization simultaneously without needing fine-tuning or retraining.

Methods

We consider the multi-site harmonization problem as image-to-image translation among multiple sites and propose an end-to-end multi-site unsupervised representation disentangler (MURD, see Figure 1) to learn content-style disentangled cycle translation mappings that translate images forward and backward between any two sites. We employ two encoders to respectively embed each image in a site-invariant content space, which captures anatomical information, and a site-specific style space, which captures appearance information, and a generator to produce a harmonized image using the encoded content features and site-specific style features.

Content-Style Disentangled Cycle Translation (CS-DCT)

Given image sets from multiple distinct sites, MURD utilizes content-style disentangled cycle translation (CS-DCT) to disentangle each image set into a site-invariant content space and a site-specific style space (Figure 1a). CS-DCT, realized with sequential forward and backward translation, can be site-specific (Figure 1b) or reference-specific (Figure 1c). In forward translation, MURD jointly learns site-invariant content features and site-specific style features from an image and utilizes a generator to construct the harmonized image from the content features and target-site style features. The target-site style features are generated by a style generator or extracted from a reference image. In backward translation, MURD extracts the content features from the harmonized image, which are in turn fed with the original style features to the generator to reconstruct the image, which is required to be consistent with the input image.
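To make the translation cycle concrete, the following is a conceptual sketch of one site-specific CS-DCT pass; the module names (E_c, E_s, G, F) and call signatures are placeholders rather than the released implementation.

```python
# Conceptual sketch of one site-specific CS-DCT pass (module names and
# signatures are placeholders; the actual MURD implementation may differ).
import tensorflow as tf

def cs_dct_pass(x_i, site_i, site_j, E_c, E_s, G, F, latent_dim=16):
    """Forward translation to site j, then backward translation to site i."""
    c_i = E_c(x_i)                          # site-invariant content features
    s_i = E_s(x_i, site_i)                  # site-specific style of the input image
    z = tf.random.normal([tf.shape(x_i)[0], latent_dim])
    s_j = F(z, site_j)                      # target-site style from the style generator
    x_ij = G(c_i, s_j)                      # forward translation: harmonized image
    c_ij = E_c(x_ij)                        # content re-extracted from the harmonized image
    x_rec = G(c_ij, s_i)                    # backward translation: reconstruction
    return x_ij, x_rec, (c_i, c_ij), (s_i, s_j)
```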

Multi-site Unsupervised Representation Disentangler (MURD)

MURD implements CS-DCT in an end-to-end manner via five modules (Figure 1d):

  1. A content encoder , shared for all sites, to extract content features from an image in a site-invariant space ;

  2. A style encoder for each site to extract style features from an image in a site-specific style space ;

  3. A generator , shared for all sites, to synthesize images using content and style features;

  4. A style generator for each site to yield style features that reflect the appearance style of images from the site; and

  5. A discriminator for each site to distinguish between real and generated images.

Specifically, given an input image from an image set, the content encoder and style encoder respectively extract content features and style features. For a randomly selected target site, the style generator takes a latent code randomly sampled from a standard normal distribution as input to create style features for that site. With the content features and the target-site style features, the generator constructs a harmonized image, which the discriminator then classifies as being either real or fake using an adversarial loss:

(1)
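The adversarial loss in Eq. (1) is not legible in this copy of the text; as one plausible instantiation (a hedged sketch, not necessarily the exact formulation used by MURD), a standard binary cross-entropy GAN loss for the site-specific discriminator and the generator could be written as follows.

```python
# One plausible instantiation of an adversarial loss (assumption, not
# necessarily Eq. 1): binary cross-entropy on discriminator logits.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(d_real_logits, d_fake_logits):
    # Real images should be classified as real, harmonized images as fake.
    return bce(tf.ones_like(d_real_logits), d_real_logits) + \
           bce(tf.zeros_like(d_fake_logits), d_fake_logits)

def generator_adversarial_loss(d_fake_logits):
    # The generator tries to make harmonized images be classified as real.
    return bce(tf.ones_like(d_fake_logits), d_fake_logits)
```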

Content encoder and style encoder are used to extract content features and style features from . The consistency between and is enforced by a pixel-wise content consistency loss :

(2)

The consistency between and is enforced by a pixel-wise style consistency loss :

(3)

Content site-invariance is enforced by a content alignment loss :

(4)

where D_KL denotes the Kullback–Leibler divergence and I is an identity matrix. Content features are randomly perturbed during the feed-forward step:

(5)
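The exact forms of Eqs. (4) and (5) are not legible here; the sketch below shows one plausible reading, in which content features are encouraged to follow a standard normal distribution (KL divergence to N(0, I)) and are perturbed with additive Gaussian noise during the feed-forward step. Both choices are assumptions made for illustration only.

```python
# Hedged sketch of a content alignment term and content perturbation
# consistent with the description above (assumed forms, not the exact
# published equations).
import tensorflow as tf

def content_alignment_loss(content):
    # Penalize the KL divergence between the empirical Gaussian fit of the
    # content features and a standard normal distribution N(0, I).
    mu = tf.reduce_mean(content, axis=0)
    var = tf.math.reduce_variance(content, axis=0) + 1e-8
    return 0.5 * tf.reduce_sum(var + tf.square(mu) - 1.0 - tf.math.log(var))

def perturb_content(content, stddev=0.1):
    # Randomly perturb content features during the feed-forward step.
    return content + tf.random.normal(tf.shape(content), stddev=stddev)
```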

The reconstructed image of is produced by generator using content features and style features . The consistency between and is ensured by a pixel-wise and gradient-wise cycle consistency loss :

(6)

where is the image gradient function and is the loss weight for the gradient loss term. Furthermore, an identity image can also be constructed by generator using content features and style features , which are identical to when and are completely disentangled. We devise an identity loss to measure the pixel-wise difference between and as

(7)

All modules are optimized with the total loss function:

(8)

where the coefficients are loss weights that control the contributions of the respective terms in the loss function. Training ends when all modules are optimized, such that the discriminator classifies the harmonized images as real or fake with equal probability. See Supplementary Figures 4–7 for the effects of the individual loss functions.
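As a hedged illustration of Eq. (8), the weighted combination of the loss terms might be assembled as below; the term names and example weight values are placeholders, since the actual values are not legible in this copy of the text.

```python
# Illustrative sketch of combining the loss terms with their weights (Eq. 8);
# keys and weight values are assumptions, not the published configuration.
import tensorflow as tf

def total_loss(losses, weights):
    """losses/weights: dicts keyed by, e.g., 'adv', 'cc', 'sc', 'cyc', 'align', 'id'."""
    return tf.add_n([weights[k] * losses[k] for k in losses])

# Example usage with hypothetical weights:
# L = total_loss(
#     {"adv": l_adv, "cc": l_cc, "sc": l_sc, "cyc": l_cyc, "align": l_align, "id": l_id},
#     {"adv": 1.0, "cc": 1.0, "sc": 1.0, "cyc": 10.0, "align": 0.1, "id": 1.0},
# )
```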

During inference, content features of input image are extracted using content encoder , and style features are either generated using style generator or extracted from a reference image . Harmonized image is created by generator using and .

Network Architecture

The architectures of the components of MURD are described next.

Content Encoder. The content encoder is shared among all sites and extracts the content features of an input image through three convolutional blocks and four residual blocks. Each convolutional block is composed of three sequential layers, i.e., convolution, instance normalization (IN)[39], and leaky ReLU (LReLU) activation. Each residual block consists of a nonlinear mapping branch and a shortcut branch. The nonlinear mapping branch is constructed by a series of layers, i.e., convolution, IN, LReLU, convolution, and IN. The shortcut branch is an identity mapping of the block input. We use IN layers instead of batch normalization layers[39] to accelerate model convergence and maintain independence between features. All normalized feature maps are activated by LReLU with a negative slope of 0.2.
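The following Keras sketch illustrates a content encoder of this form (three convolutional blocks followed by four residual blocks with IN and LReLU); filter counts, kernel sizes, and strides are assumptions rather than the published configuration.

```python
# Illustrative sketch of the content encoder (assumed hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers

class InstanceNorm(layers.Layer):
    # Per-sample, per-channel normalization over spatial dimensions.
    def call(self, x):
        mean, var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        return (x - mean) / tf.sqrt(var + 1e-5)

def conv_block(x, filters, stride):
    x = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    x = InstanceNorm()(x)
    return layers.LeakyReLU(0.2)(x)

def residual_block(x, filters):
    shortcut = x                                    # identity shortcut branch
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = InstanceNorm()(y)
    y = layers.LeakyReLU(0.2)(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = InstanceNorm()(y)
    return layers.add([shortcut, y])

def build_content_encoder(input_shape=(256, 256, 3), base_filters=64):
    # Input has three channels because three adjacent slices are stacked.
    inputs = tf.keras.Input(input_shape)
    x = conv_block(inputs, base_filters, stride=1)
    x = conv_block(x, base_filters * 2, stride=2)
    x = conv_block(x, base_filters * 4, stride=2)
    for _ in range(4):
        x = residual_block(x, base_filters * 4)
    return tf.keras.Model(inputs, x, name="content_encoder")
```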

Style Encoder. The style encoder is composed of site-shared and site-specific subnetworks. The site-shared subnetwork is constructed by a convolution layer, four pre-activation residual blocks, and a global average pooling layer. The site-specific subnetwork is composed of fully connected layers corresponding to the individual sites. The pre-activation residual block is constructed by integrating LReLU activation followed by a convolution layer with unit stride into a residual block, where an average pooling layer is adopted to downsample the intermediate features and the shortcut branch is implemented by an average pooling layer and a convolution layer with unit kernel size and stride. Note that we extract style features without IN layers since IN removes feature means and variances, which contain important style information. The output dimension is set to 64, and the style features produced by the style generator have the same dimension.

Generator. The site-shared generator merges content features and style features to create a harmonized image using four residual blocks identical to those of the content encoder, two upsampling blocks, and a convolution layer with hyperbolic tangent (tanh) activation. Each upsampling block consists of deconvolution, IN, and LReLU activation.

Style Generator. The style generator consists of a multilayer perceptron (MLP) with one output branch per site. Six fully connected layers are shared among all sites, followed by a site-specific fully connected layer for each site. We set the dimensions of the latent code, the hidden layers, and the style features to 16, 256, and 64, respectively. The latent code is randomly sampled from the standard Gaussian distribution.
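A sketch of such a style generator in Keras follows, using the stated dimensions (16, 256, 64) and one output branch per site; the activations and other details are assumptions.

```python
# Illustrative sketch of the style generator: six shared fully connected
# layers and one output branch per site (activations are assumptions).
import tensorflow as tf
from tensorflow.keras import layers

def build_style_generator(num_sites, latent_dim=16, hidden_dim=256, style_dim=64):
    z = tf.keras.Input((latent_dim,))
    h = z
    for _ in range(6):                                 # site-shared layers
        h = layers.Dense(hidden_dim)(h)
        h = layers.LeakyReLU(0.2)(h)
    outputs = [layers.Dense(style_dim, name=f"style_site_{k}")(h)
               for k in range(num_sites)]              # one branch per site
    return tf.keras.Model(z, outputs, name="style_generator")

# During training, the branch corresponding to the randomly chosen target
# site would be selected from the list of outputs.
```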

Discriminator. The discriminator consists of site-shared and site-specific subnetworks, similar to the style encoder. Specifically, three convolutional blocks and a global average pooling layer are shared among all sites, followed by a site-specific fully connected layer for real/fake classification for each site.

Implementation Details

MURD was implemented using TensorFlow. Evaluation was based on a machine with an Intel i7-8700K CPU and an NVIDIA GeForce GTX 1080Ti (11 GB) GPU. The Adam optimizer was used to minimize the loss function. The same set of loss weights was used for all datasets. For all datasets, every three adjacent slices in each volume were inserted into three channels. Each channel was then normalized to the range between -1 and 1 and zero-padded to 256 × 256.
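A minimal sketch of this preprocessing follows, assuming slices no larger than 256 × 256 and simple min-max normalization; the actual pipeline (volume loading, slice selection) may differ.

```python
# Minimal sketch of the slice preprocessing described above: stack every
# three adjacent axial slices into channels, normalize each channel to
# [-1, 1], and zero-pad to 256 x 256 (normalization scheme is an assumption).
import numpy as np

def preprocess_triplet(slices):                        # slices: (3, H, W) array
    channels = []
    for s in slices:
        s = s.astype(np.float32)
        rng = s.max() - s.min()
        s = 2.0 * (s - s.min()) / rng - 1.0 if rng > 0 else np.zeros_like(s)
        channels.append(s)
    img = np.stack(channels, axis=-1)                  # (H, W, 3)
    pad_h = max(256 - img.shape[0], 0)                 # assumes H, W <= 256
    pad_w = max(256 - img.shape[1], 0)
    return np.pad(img, ((pad_h // 2, pad_h - pad_h // 2),
                        (pad_w // 2, pad_w - pad_w // 2), (0, 0)))
```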

This work was supported in part by the United States National Institutes of Health (NIH) grants EB006733 and MH125479.

The authors declare no competing interests.

Correspondence and requests for materials should be addressed to Pew-Thian Yap (email: ptyap@med.unc.edu).

The Adolescent Brain Cognitive Development (ABCD) 2.0 data used in this study can be obtained from the National Institute of Mental Health Data Archive (NDA) (http://nda.nih.gov/).

S.L. designed the framework and network architecture, carried out the implementation, performed the experiments, and analyzed the data. S.L. and P.-T.Y. wrote and revised the manuscript. P.-T.Y. conceived the study and was in charge of overall direction and planning.

References

  • [1] Jernigan, T. L., Brown, S. A. & Dowling, G. J. The adolescent brain cognitive development study. Journal of Research on Adolescence 28, 154–156 (2018).
  • [2] Mueller, S. G. et al. The Alzheimer’s disease neuroimaging initiative. Neuroimaging Clinics of North America 15, 869–877 (2005).
  • [3] Ellis, K. A. et al. The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: Methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease. International Psychogeriatrics 21, 672–687 (2009).
  • [4] Shinohara, R. et al. Volumetric analysis from a harmonized multisite brain MRI study of a single subject with multiple sclerosis. American Journal of Neuroradiology 38, 1501–1509 (2017).
  • [5] Pomponio, R. et al. Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan. NeuroImage 208, 116450 (2020).
  • [6] Yu, M. et al. Statistical harmonization corrects site effects in functional connectivity measurements from multi-site fMRI data. Human Brain Mapping 39, 4213–4227 (2018).
  • [7] Shah, M. et al. Evaluating intensity normalization on MRIs of human brain with multiple sclerosis. Medical Image Analysis 15, 267–282 (2011).
  • [8] Nyúl, L. G. & Udupa, J. K. On standardizing the MR image intensity scale. Magnetic Resonance in Medicine 42, 1072–1081 (1999).
  • [9] He, Q., Shiee, N., Reich, D. S., Calabresi, P. A. & Pham, D. L. Intensity standardization of longitudinal images using 4D clustering. In Proceedings of International Symposium on Biomedical Imaging (2013).
  • [10] Shinohara, R. T. et al. Statistical normalization techniques for magnetic resonance imaging. NeuroImage: Clinical 6, 9–19 (2014).
  • [11] Wrobel, J. et al. Intensity warping for multisite MRI harmonization. NeuroImage 223, 117242 (2020).
  • [12] Fortin, J.-P., Sweeney, E. M., Muschelli, J., Crainiceanu, C. M. & Shinohara, R. T. Removing inter-subject technical variability in magnetic resonance imaging studies. NeuroImage 132, 198–212 (2016).
  • [13] Fortin, J.-P. et al. Harmonization of multi-site diffusion tensor imaging data. NeuroImage 161, 149–170 (2017).
  • [14] Fortin, J.-P. et al. Harmonization of cortical thickness measurements across scanners and sites. NeuroImage 167, 104–120 (2018).
  • [15] Jog, A., Carass, A., Roy, S., Pham, D. L. & Prince, J. L. Random forest regression for magnetic resonance image synthesis. Medical Image Analysis 35, 475–488 (2017).
  • [16] Garcia-Dias, R. et al. Neuroharmony: A new tool for harmonizing volumetric MRI data from unseen scanners. NeuroImage 220, 117127 (2020).
  • [17] Dewey, B. E. et al. DeepHarmony: A deep learning approach to contrast harmonization across scanner changes. Magnetic Resonance Imaging 64, 160–170 (2019).
  • [18] Dewey, B. E. et al. A disentangled latent space for cross-site MRI harmonization. In Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 720–729 (Springer International Publishing, 2020).
  • [19] Dinsdale, N. K., Jenkinson, M. & Namburete, A. I. Deep learning-based unlearning of dataset bias for MRI harmonisation and confound removal. NeuroImage 228, 117689 (2021).
  • [20] Liu, M.-Y., Breuel, T. & Kautz, J. Unsupervised image-to-image translation networks. In Proceedings of Neural Information Processing Systems (NeurIPS), 700–708 (2017).
  • [21] Zhu, J.-Y. et al. Toward multimodal image-to-image translation. In Proceedings of Neural Information Processing Systems (NeurIPS), 465–476 (2017).
  • [22] Isola, P., Zhu, J.-Y., Zhou, T. & Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1125–1134 (2017).
  • [23] Anoosheh, A., Agustsson, E., Timofte, R. & Van Gool, L. ComboGAN: Unrestrained scalability for image domain translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2018).
  • [24] Choi, Y. et al. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018).
  • [25] Choi, Y., Uh, Y., Yoo, J. & Ha, J.-W. StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
  • [26] Lee, H.-Y. et al. DRIT++: Diverse image-to-image translation via disentangled representations. International Journal of Computer Vision 128, 2402–2417 (2020).
  • [27] Casey, B. et al. The adolescent brain cognitive development (ABCD) study: Imaging acquisition across 21 sites. Developmental Cognitive Neuroscience 32, 43–54 (2018).
  • [28] Avants, B. B., Tustison, N. & Song, G. Advanced normalization tools (ANTS). Insight Journal (2009).
  • [29] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. & Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of Neural Information Processing Systems (NeurIPS) (2017).
  • [30] Bińkowski, M., Sutherland, D., Arbel, M. & Gretton, A. Demystifying MMD GANs. In Proceedings of International Conference on Learning Representations (ICLR), 1–36 (2018).
  • [31] Salimans, T. et al. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, vol. 29, 2234–2242 (2016).
  • [32] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the Inception architecture for computer vision. In Proceedings of Computer Vision and Pattern Recognition (CVPR) (IEEE, 2016).
  • [33] Deng, J. et al. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition (CVPR) (2009).
  • [34] Smith, S. M. Fast robust automated brain extraction. Human Brain Mapping 17, 143–155 (2002).
  • [35] Zhang, Y., Brady, M. & Smith, S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging 20, 45–57 (2001).
  • [36] Liu, M. et al. Style transfer using generative adversarial networks for multi-site MRI harmonization. bioRxiv (2021).
  • [37] Zuo, L. et al. Information-based disentangled representation learning for unsupervised MR harmonization. In Proceedings of International Conference on Information Processing in Medical Imaging (IPMI) (2021).
  • [38] Varsavsky, T. et al. Test-time unsupervised domain adaptation. In Proceedings of Medical Image Computing and Computer Assisted Intervention (MICCAI), 428–436 (2020).
  • [39] Ulyanov, D., Vedaldi, A. & Lempitsky, V. Improved texture networks: maximizing quality and diversity in feed-forward stylization and texture synthesis. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6924–6932 (2017).