Pooling data from different sites and previous studies is essential for analysis of large populations with sufficient statistical power (Smith and Nichols, 2018). However, due to differences in image acquisition, demographics, disease characteristics and other factors, naive combination of datasets for subsequent large-scale population analysis can be problematic. Here, we conduct a simple, empirical study to illustrate and highlight this problem in the context of machine learning. We are not suggesting a solution, but rather reiterate that multi-center data harmonization is an open research challenge. For some recent attempts to tackle this problem, see, for example, Fortin et al. (2017, 2018).
We construct an age- and sex-matched dataset with T1-weighted brain MRI from individuals, where subjects ( females) are taken each from the Cambridge Centre for Ageing and Neuroscience study (Cam-CAN; http://www.cam-can.org/) (Shafto et al., 2014; Taylor et al., 2017) and UK Biobank imaging study (UKBB; http://www.ukbiobank.ac.uk/) (Sudlow et al., 2015; Miller et al., 2016; Alfaro-Almagro et al., 2018). This is to simulate a somewhat ‘best case scenario’ for multi-site data where the age- and sex-matching intends to remove population bias. We note this is rarely possible in practice, and it is expected that current and previous analyses that pool data from different sites suffer from much larger site-specific biases.
Cam-CAN: All images were collected at a single site (Medical Research Council Cognition and Brain Sciences Unit (MRC-CBSU) in Cambridge, UK) using a 3T Siemens TIM Trio scanner with a 32-channel receive head coil. Imaging parameters are: 3D MPRAGE, TR=2250ms, TE=2.99ms, TI=900ms; FA=9 deg; FOV=256x240x192mm; 1mm isotropic; GRAPPA=2; TA=4mins 32s.
UKBB: All images were collected at the UKBB imaging center using a 3T Siemens Skyra scanner with a 32-channel receive head coil. Imaging parameters are: 3D MPRAGE, R=2, TR=2000ms, TE=385ms, TI=880ms; FOV=208x256x256mm; 1mm isotropic; Duration 4mins 54s.
The acquisition protocols of the two studies are remarkably similar, and possibly much closer than typically found when pooling data from multiple sites. Subjects in both studies are expected to be healthy.
2.1 Pre-Processing Pipeline
We aimed to design a common, state-of-the-art pre-processing pipeline which, in this or similar form, is widely used in neuroimaging studies. In particular, we apply the following sequential steps: 1) Lossless image reorientation by swapping axes using the direction information from the NIfTI image header, such that all scans are in the same radiological orientation of left, posterior, superior; 2) Skull stripping with ROBEX v1.2 (https://www.nitrc.org/projects/robex; Iglesias et al., 2011); 3) Bias field correction with N4ITK (https://itk.org; Tustison et al., 2010); 4) Intensity-based linear registration (rigid and affine) to MNI ICBM 152 2009a Nonlinear Symmetric (http://nist.mni.mcgill.ca/?p=904) using an in-house registration tool with correlation coefficient as the similarity measure and downhill-simplex as the optimizer.
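As an illustration of step 1, the following is a minimal sketch of lossless reorientation by permuting and flipping voxel axes. It is not the tool used in this study. It assumes the header affine maps voxel indices to RAS+ world coordinates (the NIfTI convention), that "left, posterior, superior" means voxel indices increase toward the subject's left, posterior and superior, and that the acquisition is not strongly oblique. The function name `reorient_to_lps` is hypothetical, and the sketch returns only the voxel array without updating the affine.

```python
import numpy as np


def reorient_to_lps(data, affine):
    """Permute and flip voxel axes so they run toward left, posterior, superior.

    Assumes `affine` is a 4x4 voxel-to-world matrix in RAS+ coordinates.
    Only axis swaps and flips are used, so no interpolation takes place.
    """
    rotation = np.asarray(affine, dtype=float)[:3, :3]
    # For each voxel axis (column), find the world axis (row) it is most aligned with.
    world_of_voxel = np.argmax(np.abs(rotation), axis=0)
    if sorted(world_of_voxel.tolist()) != [0, 1, 2]:
        raise ValueError("ambiguous axis assignment (strongly oblique scan?)")
    # voxel_of_world[w] is the voxel axis that runs along world axis w.
    voxel_of_world = np.argsort(world_of_voxel)
    out = np.transpose(data, voxel_of_world)
    # In RAS+ coordinates, left and posterior are the negative x and y directions.
    target_sign = (-1, -1, 1)
    for w in range(3):
        if np.sign(rotation[w, voxel_of_world[w]]) != target_sign[w]:
            out = np.flip(out, axis=w)
    return out
```

For a volume stored in RAS order (identity affine), this flips the first two axes and leaves the third untouched.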
After these steps, we perform intensity normalization within brain regions with simple whitening (zero mean, unit variance). Voxels outside the brain are set to a fixed value. Other techniques such as percentile matching and Nyúl’s histogram standardization (Nyúl et al., 2000) led to similar subsequent observations. We also employ SPM12 (http://www.fil.ion.ucl.ac.uk/spm/software/spm12/; Friston et al., 2007; Ashburner, 2012) and FMRIB’s Automated Segmentation Tool (FAST) v4.0 (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FAST; Zhang et al., 2001) to obtain brain tissue probability maps. SPM is run directly on the raw T1-weighted scans as it has its own built-in pre-processing pipeline, including spatial non-linear normalization to MNI space. FSL-FAST is run on our skull-stripped, bias-field-corrected and rigidly MNI-aligned images.
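The whitening step described above can be sketched as follows. This is an illustrative NumPy version, not the code used in the study. The function name `whiten_in_mask` and the default background value of 0 are assumptions, since the text only states that voxels outside the brain are set to a fixed value.

```python
import numpy as np


def whiten_in_mask(image, brain_mask, background=0.0):
    """Normalize intensities inside the brain mask to zero mean and unit variance.

    Voxels outside the mask are set to `background` (an assumed default).
    """
    image = np.asarray(image, dtype=np.float64)
    mask = np.asarray(brain_mask, dtype=bool)
    if not mask.any():
        raise ValueError("brain mask is empty")
    values = image[mask]
    std = values.std()
    if std == 0:
        raise ValueError("constant intensities inside the brain mask")
    out = np.full(image.shape, background, dtype=np.float64)
    out[mask] = (values - values.mean()) / std
    return out
```

Computing the statistics only inside the mask keeps the large number of background voxels from dominating the mean and variance.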
3 Experiments, Results & Conclusion
We conduct two image classification experiments to illustrate the impact of scanner effects which remain after careful pre-processing and are present even in image-derived tissue probability maps.
Site classification: We train random forest binary classifiers to predict the origin of the imaging data, that is, whether a scan comes from Cam-CAN or UKBB.
Results are summarized in Table 1. We make the following observations: i) classifiers are able to predict data origin with high accuracy; ii) scanner effects remain in derived tissue probability maps; iii) higher degrees of spatial normalization amplify scanner effects (possibly related to interpolation).
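To make the site classification setup concrete, the sketch below trains a random forest with scikit-learn on synthetic feature vectors in which one "site" carries a small additive offset. It does not use the study's data or reproduce its numbers. The definitions of the two summary columns are assumptions, since the text does not define them: "Avg. Entropy" is taken as the mean entropy (in nats) of the predicted class probabilities, and "Avg. Prob." as the mean probability assigned to the predicted class. The function names and the five-fold cross-validation are also illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def simulate_two_sites(n_per_site=100, n_features=20, site_shift=1.0, seed=0):
    """Synthetic stand-in for image features: site 1 has an additive offset."""
    rng = np.random.default_rng(seed)
    site_a = rng.normal(0.0, 1.0, size=(n_per_site, n_features))
    site_b = rng.normal(site_shift, 1.0, size=(n_per_site, n_features))
    X = np.vstack([site_a, site_b])
    y = np.repeat([0, 1], n_per_site)  # 0 = first site, 1 = second site
    return X, y


def site_classification_report(X, y, seed=0):
    """Cross-validated accuracy plus assumed entropy and probability summaries."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
    pred = proba.argmax(axis=1)
    accuracy = float((pred == y).mean())
    p = np.clip(proba, 1e-12, 1.0)  # avoid log(0)
    avg_entropy = float(-(p * np.log(p)).sum(axis=1).mean())
    avg_prob = float(proba[np.arange(len(y)), pred].mean())
    return accuracy, avg_entropy, avg_prob
```

With `site_shift=0.0` the two groups are statistically identical and accuracy should stay near chance, which gives a simple sanity check for this kind of experiment.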
Sex classification: We consider a simple binary classification task of sex classification. We compare results of training random forest classifiers on single-site and multi-site data.
Results for sex classification are summarized in Table 2. We make the following observations: i) age- and sex-matched multi-site data gives realistic estimates of accuracy (similar to single-site data); ii) sex imbalance in multi-site data leads to overly optimistic accuracy; iii) training on one site and testing on the other shows a drop in performance, indicating poor generalization; iv) when discriminative features such as brain size are removed by affine registration, the drop in performance is more severe.
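Observation ii) can be illustrated with a small synthetic example: when sex is perfectly correlated with site, a classifier can score well by picking up the site signal instead of sex. The sketch below is a toy construction with assumed effect sizes and hypothetical function names, not a reanalysis of the study data. It uses one weakly sex-related feature and one strongly site-related feature, trains on a fully confounded arrangement, and evaluates on both confounded and balanced test data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_features(sex, site, rng, sex_effect=0.5, site_effect=3.0):
    """Two synthetic features: a weak sex signal and a strong site signal."""
    sex = np.asarray(sex)
    site = np.asarray(site)
    X = rng.normal(size=(sex.size, 2))
    X[:, 0] += sex_effect * sex    # genuine but weak sex-related feature
    X[:, 1] += site_effect * site  # scanner/site effect, unrelated to sex
    return X


def confound_demo(n=400, seed=0):
    """Train on sex/site-confounded data, then test on confounded and balanced data."""
    rng = np.random.default_rng(seed)
    # Confounded arrangement: site 0 is all female (0), site 1 is all male (1).
    sex_conf = np.repeat([0, 1], n // 2)
    site_conf = sex_conf.copy()
    # Balanced arrangement: both sexes are equally represented at both sites.
    sex_bal = np.tile([0, 1], n // 2)
    site_bal = np.repeat([0, 1], n // 2)

    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(make_features(sex_conf, site_conf, rng), sex_conf)

    # Fresh samples are drawn for each test set, so they are independent of training.
    acc_confounded = clf.score(make_features(sex_conf, site_conf, rng), sex_conf)
    acc_balanced = clf.score(make_features(sex_bal, site_bal, rng), sex_bal)
    return acc_confounded, acc_balanced
```

In this toy setting the accuracy on confounded test data is expected to be much higher than on balanced data, mirroring the overly optimistic estimates reported for the imbalanced arrangements.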
Table 1: Site classification results.
| Stripped | Bias Field | Aligned | Intensities | Accuracy | Avg. Entropy | Avg. Prob. |
| SPM12 – Segment | Accuracy | Avg. Entropy | Avg. Prob. |
| FSL – FAST | Accuracy | Avg. Entropy | Avg. Prob. |
Table 2: Sex classification results.
| Data Arrangement | Aligned | Accuracy | Avg. Entropy | Avg. Prob. |
|---|---|---|---|---|
| Cam-CAN females / UKBB males | rigid | 94.59% | 0.4036 | 0.8311 |
| Cam-CAN 80/20% / UKBB 20/80% | rigid | 85.87% | 0.5038 | 0.7616 |
| Cam-CAN train / UKBB test | rigid | 81.42% | 0.5617 | 0.7124 |
| UKBB train / Cam-CAN test | rigid | 78.04% | 0.5284 | 0.7419 |
| Cam-CAN females / UKBB males | affine | 98.99% | 0.4641 | 0.8013 |
| Cam-CAN 80/20% / UKBB 20/80% | affine | 84.78% | 0.5713 | 0.7125 |
| Cam-CAN train / UKBB test | affine | 73.65% | 0.6462 | 0.6245 |
| UKBB train / Cam-CAN test | affine | 62.16% | 0.6075 | 0.6769 |
This research has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 757173, project MIRA, ERC-2017-STG). UK Biobank data has been accessed under Application Number 12579.
- Alfaro-Almagro et al. (2018). Image processing and quality control for the first 10,000 brain imaging datasets from UK Biobank. NeuroImage 166, pp. 400–424.
- Ashburner (2012). SPM: a history. NeuroImage 62 (2), pp. 791–800.
- Common pitfalls in machine learning applications to multi-center data: tests on the ABIDE I and ABIDE II collections. In Joint Annual Meeting ISMRM-ESMRMB.
- Fortin et al. (2018). Harmonization of cortical thickness measurements across scanners and sites. NeuroImage 167, pp. 104–120.
- Fortin et al. (2017). Harmonization of multi-site diffusion tensor imaging data. NeuroImage 161, pp. 149–170.
- Friston et al. (2007). Statistical parametric mapping: the analysis of functional brain images. Academic Press.
- Iglesias et al. (2011). Robust brain extraction across datasets and comparison with publicly available methods. IEEE Transactions on Medical Imaging 30 (9), pp. 1617–1634.
- Miller et al. (2016). Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience.
- Nyúl et al. (2000). New variants of a method of MRI scale standardization. IEEE Transactions on Medical Imaging 19 (2), pp. 143–150.
- Shafto et al. (2014). The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC Neurology 14 (1), pp. 204.
- Smith and Nichols (2018). Statistical challenges in “big data” human neuroimaging. Neuron 97 (2), pp. 263–268.
- Sudlow et al. (2015). UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Medicine 12 (3).
- Taylor et al. (2017). The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) data repository: structural and functional MRI, MEG, and cognitive data from a cross-sectional adult lifespan sample. NeuroImage 144, pp. 262–269.
- Tustison et al. (2010). N4ITK: improved N3 bias correction. IEEE Transactions on Medical Imaging 29 (6), pp. 1310–1320.
- Quantifying confounding bias in neuroimaging datasets with causal inference. arXiv preprint arXiv:1907.04102.
- Zhang et al. (2001). Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging 20 (1), pp. 45–57.