Relevance Vector Machines for harmonization of MRI brain volumes using image descriptors

11/08/2019 ∙ by Maria Inês Meyer, et al. ∙ 0

With the increased need for multi-center magnetic resonance imaging studies, problems arise related to differences in hardware and software between centers. Namely, current algorithms for brain volume quantification are unreliable for the longitudinal assessment of volume changes in this type of setting. Currently most methods attempt to decrease this issue by regressing the scanner- and/or center-effects from the original data. In this work, we explore a novel approach to harmonize brain volume measurements by using only image descriptors. First, we explore the relationships between volumes and image descriptors. Then, we train a Relevance Vector Machine (RVM) model over a large multi-site dataset of healthy subjects to perform volume harmonization. Finally, we validate the method over two different datasets: i) a subset of unseen healthy controls; and ii) a test-retest dataset of multiple sclerosis (MS) patients. The method decreases scanner and center variability while preserving measurements that did not require correction in MS patient data. We show that image descriptors can be used as input to a machine learning algorithm to improve the reliability of longitudinal volumetric studies.



There are no comments yet.


page 2

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Large scale multi-site studies are of extreme importance in neuroimaging, both for research purposes and in clinical practice. Such studies face several challenges due to hardware- or center-related variability. It is well known that scanner-factors such as manufacturer, magnetic field and gradient non-linearly influence volume measurements [4, 14] obtained from structural Magnetic Resonance Imaging (MRI). At the image level, these factors are coupled with a high variability of intensities across patients and scanners, which can affect tasks like the segmentation of brain structures [16]. This effect is exemplified in Fig. 1, where three T1-weighted MR images from the same patient obtained on different scanners and their corresponding segmentations are represented.

Figure 1: MR images from same patient in different scanners. Left side: T1-weighted images. Right side: white matter segmentations obtained using the same method.

The need to address multi-scanner and -center data harmonization is evidenced in the follow-up of Multiple Sclerosis (MS) patients. These patients exhibit an increased rate of brain atrophy when compared to healthy subjects, which has been linked to impairment [1]

. However, it has been suggested that brain atrophy can only be reliably estimated over periods of at least five years

[2], due to the variability caused by scanner and center factors.

Besides image processing approaches that aim at matching image intensity distributions to provide a more consistent input to the segmentation method [11, 12], recent studies have focused on statistical harmonization of volumetric measurements based on scanner- or center-specific information. This type of methods generally apply regression techniques to correct measurements. Linear mixed-effects models using patient- and scanner-specific information as random effects were explored by [5] and [8]. Recently, [6]

used an algorithm devised for genomics that extends the same type of model to account for site-specific factors. A data-driven approach based on independent component analysis was explored by

[4], where correction was performed by selecting independent components related to scanning parameters. However, since these methods rely on scanner- or acquisition-specific information, they do not generalize and need to be adapted when used in new settings. Additionally, such information can be incomplete, especially in historical data. As such, it would be of interest to use information that is encoded in the images themselves, or that can be extracted from the volume quantification method to build more robust and adaptive techniques.

To address these issues, in this paper we present a novel statistical harmonization approach based on image descriptors and a machine learning algorithm. We first explore the relations between image-extracted properties and brain volume measurements that we could further exploit for harmonization. We then train a machine learning algorithm based on Automatic Relevance Determination on healthy data to perform volumetric corrections. We validate the method on a set of unseen healthy controls, and finally test it on a a test-retest dataset of MS patients.

2 Data

Healthy Subjects

This dataset comprises 1996 T1-weighted (T1w) MRI scans from healthy subjects. The data is a compilation of several public datasets, such as [10, 3], and some proprietary data. The overall set comprises data from several different centers and scanner types from the major vendors (Siemens, Philips, GE). Magnetic field strengths (1.5T or 3T) and T1w sequence types also vary. For most of the data we have information regarding age and sex of the subject, scanner type, magnetic field strength and additional acquisition parameters like echo time (TE) and repetition time (TR). For building and testing our model, we randomly divided the data into training () and test sets ().

Patient Data

To further validate the approach we test it in a dataset containing data from 10 MS patients as detailed in [7]. Each patient was scanned twice in three different 3T scanners: Philips Achieva, Siemens Skyra and GE Discovery MR450w. An example is depicted in Fig. 1. We observed that one of the patients was an extreme case, showing very enlarged ventricles. Given that the volumetric measurements in such a case are prone to errors and are considered unreliable, this patient’s data was discarded from further analysis.

2.1 Data Pre-processing and Feature Extraction

For each image we compute gray matter (GM) and white matter (WM) volumes using the well established atlas-based method described in [7]. Whole brain (WB) volume is then defined as the sum of WM and GM volumes.

We are interested in descriptors related to the T1w images that encode information about errors and bias in brain segmentations. Since the quality of a segmentation depends on a good registration to the atlas and is influenced by the contrast and noise present in an image, it is valuable to explore features that convey such information. We extract a total of 16 features of two main types: i) Alignment information regarding the registration of the T1w image to the MNI atlas space, which includes decomposing the affine transformation (rotation angles, scale and shear factors in three directions), and measuring the similarity between the registered images using Normalized Mutual Information (NMI); and ii) Contrast to Noise Ratio (CNR) between tissue types and between different brain structures (e.g., lobes and cerebellum). CNR is given by:

where represents the mean intensity of some tissue or structure and

is the variance of the image intensities across this structure. We compute

by taking as the tissue or structure with higher average image intensity than .

The brain volumes and some of the computed descriptors (e.g., CNR and NMI) are known to be age dependent [13]. As such, age is used as a feature at training time, but not at test time. For analysis and comparison, we age-detrend the volumes by subtracting an age-matched estimated median value. CNR and NMI are corrected by fitting a linear regressor to the data.

3 The Relevance Vector Machine for data harmonization

To harmonize brain volumes, we subtract correction terms based on estimated variability trends from the original volumes. To determine the variation in the volumetric data that can be explained by the aforementioned image descriptors, we fit a linear model using the extracted features as independent variables. When choosing the model, we take a few important considerations into account: i) we have 16 image-extracted features plus age; ii) some of these are related only to one of the brain volumes; iii) the features are not always unrelated; and iv) we are interested in a probabilistic model, to capture uncertainty in our predictions. Given that standard generalized linear models do not address all these considerations, we investigate using a probabilistic machine learning technique.

The Relevance Vector Machine (RVM) is defined within a fully probabilistic framework and includes a mechanism of automatic relevance determination [9]. As described in [15], the model defines a conditional distribution for real-valued input-target vector pairs , of the type:

which specifies a Gaussian distribution over

with mean and precision (inverse variance) . Here are the set of extracted features and the corresponding volume measurement for each image in a training dataset of size . The function is given by a linear combination of basis functions with weights :

where the basis function is a kernel centered around the

training sample. In order to avoid over-fitting due to the large numbers of parameters in the model, a zero-mean Gaussian prior probability distribution is defined over the weights

. Moreover, a separate hyperparameter

is introduced for each individual weight , representing the precision of the corresponding weight:

where . Using the resulting model, relevance vector learning searches for the hyperparameters and that maximize the marginal likelihood of the training data, where . Defining as the matrix with in row, the learning algorithm proceeds by iteratively updating and as follows [15]:

where and are the posterior covariance and mean for the weights, respectively. In practice, during re-estimation, many ’s tend to infinity, which causes the posterior distributions of the corresponding weights to peak around zero. The basis functions associated with these do not influence the predictions and can be pruned out, resulting in a sparse model. The remaining training samples with non-zero weights are called relevance vectors.

Once the model is trained, we can take a set of descriptors of an unseen image, and try to predict the corresponding volume based on these descriptors alone using the posterior mean : . Finally, we can obtain a corrected volume by subtracting the estimated contribution of the image descriptors from the original volume : .

4 Results

4.1 Verification of observable correlations in data

To verify whether there are correlations between image descriptors and the measured volumes, we built a cross-correlation map between these variables (see Fig. 2). Analysing these correlations reveals that image descriptors like NMI and CNR are related to scanner/acquisition specific features. These same image descriptors are in turn correlated to the brain volumes, as well as scale.

Figure 2: Left: Cross-correlations between age-detrended brain volumes (mL) and extracted features. We represent angle, shear and scale as the result of multiplying the three directions. Right: a zoomed-in view of the relationship between volumes and image descriptors: top WM vs. NMI and bottom GM vs. CNR.

4.2 Harmonization of healthy population data based on RVM

We trained and tested the RVM method described in section 3

for linear regression on the data set of healthy subjects (section

2). We used different kernel types (linear and Gaussian - RBF) and searched for the model that best preserved biological information - namely age - while decreasing the scanner/center-specific variability. Thus, the model should decrease global variance in the data but maintain the original median of the population defined by the training set, given that we build on the assumption that this sample contains enough variability to represent the heterogeneity of scanner and center effects. To evaluate the performance, we produced boxplots to represent the distribution of the measured volumes in each scanner in the test set with at least ten subjects (see Fig. 3

). We first removed the age dependency as estimated from the training set, such that the variability due to age is not accounted for. We compare median and standard deviation, preferring values closer to zero, since they represent a decrease in variability while preserving the global trend. Table

1 presents the median values of these same age-detrended values. After correction the distributions from different scanners become more similar. For both GM and WM the linear kernel produced lower or comparable mean and decreased standard deviation. For WB we compared applying a linear kernel to summing the previously corrected WM and GM volumes and verified that the last option performed better.

Kernel Median STD
Original 10.9 30.4
Linear -3.2 26.9
RBF 11.0 27.6
Median STD
-0.3 32.5
-0.3 29.0
-7.5 33.5
Kernel Median STD
Original 1.2 47.9
Linear -3.4 38.2
WM+GM (linear) -0.8 37.3
Table 1: Median and standard deviation of age-detrended volumes (mL) before and after applying the RVM-based correction with different kernels (linear and RBF). Data pertains to the scanner-wise distribution of the test set (as in Fig. 3).
Figure 3: Distribution of the validation set volumes (mL) before (top) and after (bottom) RVM-based correction using a linear kernel. (N.S.: Not Specified).

4.3 Harmonization of test-retest data

To further validate the method, we applied it to the test-retest dataset of MS patients described in Section 2. The results are summarized in Fig. 4. On the left side, the absolute volumes before and after correction are represented. First we computed for each tissue type the differences between the volumes from images acquired in the same scanner, which provides a measure of the intra-scanner error (Intra-SE). There is no significant difference between the original and corrected volumes for all the tissue types (

, paired t-test). Then we computed the difference between the averaged volumes of each scanner type against all the other scanner types. For WM there is a statistically significant difference (

, paired t-test) between the original and corrected volumes. For GM, the inter-scanner error (Inter-SE) for the original volumes was very small, being comparable to the Intra-SE. After applying the correction, there is no statistical difference between these volumes and the original ones, even if visually there is an increase in the variability which is propagated for the WB.

Figure 4: Difference between same patient brain volumes (mL) acquired in different scanner types. Left: Absolute volumes for each patient before and after correction. Right: Distribution of differences between the Intra-SE and Inter-SE volumes from same patient before and after correction.

5 Conclusions and Future Work

In this work we applied a relevance vector machine approach to find the amount of variability in the data that can be explained by variations in image descriptors. We observe that there is a large dependency of brain volumes with the atlas-registered NMI metric, which was initially not expected. NMI measures the goodness of a non-rigid registration step between the image and an atlas, necessary for the methodology we used. The final volumes depend on the goodness of this registration, and in such a way we are correcting for suboptimal segmentation results that derive from a poor registration step.

We demonstrate that it is possible to achieve a certain degree of harmonization of the data based only on image descriptors. To our knowledge, this is the first approach that does not rely on scanner-specific information to perform harmonization. We expect the current method to perform less efficiently than more tailored methods, but to generalize better. A thorough comparison to such methods still needs to be performed, but it is out of the scope of the current paper. This type of solution is interesting for large scale statistics, and could potentially have a positive impact in longitudinal studies. Moreover, the proposed approach allows dealing with missing scanner/center information, a problem not addressed in previous works and very frequent in practice. Nevertheless, in the test-retest setting inter-scanner error is still high when compared to the measured intra-scanner error, which implies that the method does not provide a completely satisfactory correction for patient specific use, and should be further investigated.

Future steps include exploring more image descriptive features that are independent from the segmentation method used and that can encode the presence of geometrical distortions and artifacts. For controlled environments it could be useful to couple general scanner-dependent information with the image descriptors. Additionally, we aim to extend the method to other brain structures of interest and to compare its performance on a controlled dataset to scanner-specific state-of-the art methods. Finally, it is important to keep in mind that in a cross-sectional setting this type of correction does not replace the need for an improved standardization at the image level.


  • [1] R. A. Bermel and R. Bakshi (2006) The measurement and clinical relevance of brain atrophy in multiple sclerosis. The Lancet Neurology 5 (2), pp. 158 – 170. External Links: ISSN 1474-4422 Cited by: §1.
  • [2] V. Biberacher and et al. (2016-11) Intra- and interscanner variability of magnetic resonance imaging based volumetry in multiple sclerosis. NeuroImage 142, pp. 188–197. External Links: ISSN 10538119, Document Cited by: §1.
  • [3] Biomedical Image Analysis Group, Imperial College London IXI dataset. Note:, Last accessed on 2019-03-18 Cited by: §2.
  • [4] J. Chen and et al. (2014-06) Exploration of scanning effects in multi-site structural MRI studies. Journal of Neuroscience Methods 230, pp. 37–50. External Links: NIHMS150003, ISBN 1872-678X, ISSN 1872678X, Document Cited by: §1, §1.
  • [5] A. S. Chua and et al. (2015) Handling changes in MRI acquisition parameters in modeling whole brain lesion volume and atrophy data in multiple sclerosis subjects: Comparison of linear mixed-effect models. NeuroImage: Clinical 8, pp. 606–610. External Links: ISBN 2213-1582, ISSN 22131582 Cited by: §1.
  • [6] J. Fortin and et al. (2018-02) Harmonization of cortical thickness measurements across scanners and sites. NeuroImage 167, pp. 104–120. External Links: ISSN 10538119, Document Cited by: §1.
  • [7] S. Jain and et al. (2015-01) Automatic segmentation and volumetry of multiple sclerosis brain lesions from MR images. NeuroImage: Clinical 8, pp. 367–375. External Links: Document, ISSN 22131582 Cited by: §2, §2.1.
  • [8] B. C. Jones and et al. (2013) Quantification of multiple-sclerosis-related brain atrophy in two heterogeneous MRI datasets using mixed-effects modeling.. NeuroImage. Clinical, pp. 171–9. External Links: Document, ISSN 2213-1582 Cited by: §1.
  • [9] D. J. C. MacKay (1992)

    Bayesian interpolation

    Neural Computation 4 (3), pp. 415–447. External Links: Document Cited by: §3.
  • [10] D. S. Marcus and et al. (2007) Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience 19 (9), pp. 1498–1507. Cited by: §2.
  • [11] L. G. Nyúl and J. K. Udupa (1999-12) On standardizing the MR image intensity scale. Magnetic Resonance in Medicine 42 (6), pp. 1072–1081. External Links: ISSN 0740-3194 Cited by: §1.
  • [12] N. Robitaille, A. Mouiha, B. Crépeault, F. Valdivia, and S. Duchesne (2012-05) Tissue-based MRI intensity standardization: Application to multicentric datasets. International Journal of Biomedical Imaging 2012, pp. 1–11. External Links: ISBN 1687-4196 (Electronic) 1687-4188 (Linking), ISSN 16874188 Cited by: §1.
  • [13] D.H. Salat and et al. (2009) Age-associated alterations in cortical gray and white matter signal intensity and gray to white matter contrast. NeuroImage 48 (1), pp. 21 – 28. External Links: ISSN 1053-8119 Cited by: §2.1.
  • [14] H. Takao, N. Hayashi, and K. Ohtomo (2011) Effect of scanner in longitudinal studies of brain volume changes. Journal of Magnetic Resonance Imaging 34 (2), pp. 438–444. External Links: Document, ISSN 10531807 Cited by: §1.
  • [15] M. E. Tipping (2001) Sparse Bayesian Learning and the Relevance Vector Machine. Journal of Machine Learning Research 1 (3), pp. 211–244. External Links: ISSN 15324435, Document Cited by: §3.
  • [16] Y. Zhuge and J. K. Udupa (2009-10) Intensity standardization simplifies brain MR image segmentation. Computer Vision and Image Understanding 113 (10), pp. 1095–1103. External Links: ISSN 1077-3142, Document Cited by: §1.