Automated segmentation of MRI scans is a prerequisite for most human neuroimaging studies. Most of the algorithms commonly used for this task rely solely on structural MRI (sMRI) scans, and belong to one of three categories: Bayesian segmentation with a probabilistic atlas (e.g., [1, 2]); multi-atlas segmentation 
; and, more recently, convolutional neural networks (e.g.,). Typically, these techniques segment the brain into tissue types (i.e., gray matter, white matter, and cerebrospinal fluid), or into finer anatomical structures (e.g., hippocampus, ventricle). Bayesian methods drive the primary segmentation modules of the most widespread neuroimaging packages, like FreeSurfer , FSL , or SPM .
The aforementioned approaches rely mostly on T1 contrast to distinguish between gray and white matter. However, some boundaries between structures are nearly invisible in T1 (and other structural MR contrasts) due to insufficient difference in proton density and relaxation times. This is exacerbated by lower contrast-to-noise ratio in deep-brain structures, due to greater distance from the head coil. Two examples from the state-of-the-art Human Connectome Project (HCP) dataset  are shown in Fig. 1. In the first example, the lateral boundary of the thalamus appears very faint (Fig. 1a). In the second, the lateral boundary of the globus pallidus is visible thanks to the contrast with the neighboring putamen, but the medial boundary is not (Fig. 1c).
These issues create a need for fusing data from several MR modalities to better delineate structure boundaries. A natural complement to sMRI is diffusion MRI (dMRI), which may help discriminate between certain tissue types, despite its lower resolution. For example, in Fig. 1b, the lateral boundary of the thalamus is clearly discernible in the principal diffusion direction map obtained from dMRI. The diffusion data also complement the T1 scan in the pallidum, which can be delineated by combining contours obtained from the two modalities (medial from dMRI, lateral from sMRI, see Fig. 1d).
Most prior work on segmentation of dMRI focuses on delineating white matter structures, using tractography [7, 8, 9] or volumetric segmentation [10, 11]. Tractography has also been used to subdivide subcortical structures (e.g., thalamus , amygdala ) based on long-range connections. Surprisingly, the literature on joint modeling of multimodal sMRI/dMRI is sparse. When sMRI and dMRI are used by the same tool, this is most often done serially, e.g., a segmentation derived from sMRI is used to analyze the dMRI (e.g., to derive priors for Bayesian tractography 
). To the best of our knowledge, the only works analyzing sMRI and dMRI simultaneously have been on thalamic nuclei segmentation with random forests[14, 15]. The main concern with such discriminative techniques is their generalization ability to other datasets, which is limited by differences in MRI acquisition. For sMRI segmentation, this problem can be ameliorated with data augmentation and pretraining . However, this is harder to do in dMRI, where acquisition protocols are much less standardized.
The ability to generalize across datasets is critical when software is released publicly and few assumptions can be made on the acquisition. In such scenarios, Bayesian segmentation methods that automatically estimate appearance models from input images remain very popular, as they are agnostic to the MRI contrast of the input scan, and thus robust to acquisition differences. These methods are used for tissue segmentation by major neuroimaging packages (e.g., Unified Segmentation  in SPM, and FAST  in FSL). However, they can be inaccurate when segmenting structures with poor sMRI contrast (see Fig. 1).
Here we propose a sequence-adaptive Bayesian algorithm that uses a probabilistic atlas to segment sMRI and dMRI data simultaneously111Henceforth, we use “Bayesian segmentation” to refer to this specific family of Bayesian methods, using probabilistic atlases and unsupervised appearance modeling.. This is achieved via a novel dMRI likelihood term, which relies on a hierarchical model for the fractional anisogropy (FA) and principal diffusion orientation. Combined with a Gaussian likelihood for sMRI, this model of image intensities is flexible enough to produce accurate segmentations, while keeping dimensionality low. We also propose a novel inference algorithm to automatically segment scans by fitting the model to multi-modal sMRI/dMRI data. Thanks to unsupervised intensity modeling, applicability across a wide range of acquisition protocols is achieved, which is demonstrated experimentally on two considerably different datasets.
2.1 Forward probabilistic model
The graphical model of our framework is shown in Fig. 2a. The observed variables are a bias field corrected (possibly multispectral) sMRI scan defined on voxels, a dMRI scan defined at the same voxel coordinates (which might require resampling), and a probabilistic atlas
, which provides the probabilities of observing one of
neuroanatomical classes at every location across a reference spatial coordinate system. The model is governed by three sets of deterministic hyperparameters specified by the user:, and .
At the top of the generative model we find the atlas , along with a set of related parameters that deform this atlas into the space of the MRI data. These parameters are a sample of a distribution that regularizes the deformation field by penalizing, e.g., its bending energy. The strength of the regularization is controlled by the set of hyperparameterst .
Given the deformed atlas, a labeling (segmentation) , with , is obtained by independently sampling the categorical distribution defined by the deformed atlas at each voxel location. Given , the observed sMRI and dMRI data are assumed to be conditionally independent from each other and across voxels. The sMRI data at follows a distribution (typically a Gaussian) whose parameters depend on the corresponding label
. Any prior knowledge on these parameters is encoded in their priors, which are governed by hyperparameter vectors. Similarly, is also assumed to be a mixture conditioned on the segmentation, described by parameters and hyperparameters , which yields a symmetric likelihood model (Fig. 2
a). The joint probability density function (PDF) of the model is therefore:
where , , , .
2.2 Model instantiation
2.2.1 Probabilistic atlas:
We follow the framework of the thalamic atlas  that we use in the experiments in Section 3, in which the atlas is encoded as a tetrahedral mesh. Deforming the mesh is penalized by a regularizer , weighted by the mesh stiffness . The prior is given by (see further details in ):
where is simply the vector of label probabilities provided by the atlas at voxel when deformed with parameters ; is the categorical distribution; and hyperparameters comprise just , i.e., .
2.2.2 Likelihood of sMRI:
In order to model the sMRI intensities given the segmentation , we follow the Bayesian brain MR segmentation literature (e.g., [1, 16, 17]) and use Gaussian intensity distributions, such that }, i.e., the mean and covariance of the intensities of class
. We place a Normal Inverse Wishart (NIW) distribution on these parameters (i.e., the conjugate prior), with zero degrees of freedom for the covariance, as we found it difficult in practice to inform such parameter a priori. Therefore, we have:, where is the hypermean and is the scale. The sMRI likelihood is thus:
is the Gaussian distribution and
is the identity matrix.
2.2.3 Likelihood of dMRI:
We have two requirements for the likelihood function of the dMRI: low demands on gradient directions and b-values to accommodate legacy data; and low number of parameters to facilitate unsupervised clustering (yet sufficient to separate the classes). To satisfy the first requirement, we adopt the diffusion tensor imaging (DTI) model, which can be fit from virtually all available dMRI data. Rather than modeling the tensors directly (e.g., with a Wishart distribution, which we found in pilot experiments to fade too quickly from its mode), we use a hierarchical model (Fig.2b) that only considers the FA
and the principal eigenvectorat each voxel, i.e., .
At the first level, we model the FA conditioned on the class, with Beta distributions parameterized by. We chose the Beta because it can model location and dispersion of signals defined on the [0,1] interval with two parameters. At the second level, we model the principal eigenvector with the Dimroth-Scheidegger-Watson (DSW) distribution, which is axial (i.e., antipodally symmetric), accommodating the directional invariance of dMRI . This distribution is also rotationally symmetric around a mean direction and its opposite (), with a dispersion around the mean parameterized by a concentration . It has fewer parameters than other axial distributions, such as the (non rotationally symmetric) Bingham. Its PDF is given by :
with domain , and where the partition function is the Kummer function in 3D : . We further assume that the concentration is modulated (multiplied) by the FA. This is a simple yet effective way of modeling the higher directional dispersion in voxels with low FA (e.g., in areas of unrestricted diffusion or fiber crossings), without having to resort to mixtures or additional parameters. The overall model for the dMRI likelihood is thus:
and the set of parameters is thus: , with . We decided not to inform these parameters, such that , and . We note that this likelihood model defines a PDF on the unit ball for vector .
2.3 Segmentation as Bayesian inference
Within our joint generative model of sMRI and dMRI, we pose segmentation as an optimization problem, seeking to maximize the posterior probability of the labeling, given the known hyperparameters and observed input data:
where we have made the standard approximation that the posterior distribution of the parameters is heavily peaked around point estimates , , given by:
Applying Bayes rule to Eq. 7, marginalizing over the hidden segmentation , and considering the structure of the model and our design choices, we obtain:
Expanding and taking logarithm, we obtain the following objective function:
We maximize Eq. 8
with a Generalized Expectation Maximization (GEM) algorithm, iterating between expectation (E) and maximization (M) steps:
2.3.1 E step:
In the E step, we use Jensen’s inequality to build a lower bound for the objective function, which touches it at the current value of the parameters:
where a soft segmentation according to the current parameter estimates:
where is the Beta function.
2.3.2 M step:
In the generalized M step, we seek to improve the lower bound Q in Eq. 9. While optimizing the bound with respect to all parameters simultaneously is difficult, optimizing different subsets each time (coordinate ascent) is feasible.
Optimizing : Fixing all other parameters and switching signs, we obtain:
This is a registration problem combining the regularizer with a data term: the Kullback–Leibler (KL) divergence between the deformed atlas and the current soft segmentation. We solve it numerically with the conjugate gradient method.
Optimizing : Setting derivatives to zero yields a closed-form solution:
Optimizing : Substituting the expression of the Beta distribution into Eq. 9, the problem decouples across classes:
This is a simple 2D optimization problem, which we solve with conjugate gradient. In the first iteration, we use the method of moments for initialization.
Optimizing : This optimization can also be carried out one at the time:
with closed-form solution given by the leading eigenvector of: .
Optimizing : This optimization problem also decouples across classes:
which we solve with conjugate gradient, initializing in the first iteration.
2.3.3 Final Segmentation:
It is straightforward to show that the approximate posterior probability of the segmentation from Eq. 6 factorizes across voxels and is given by where is obtained by evaluating Eq. 10 at the optimal parameter values . Therefore, the optimal segmentation can be computed independently at each location as:
and the expected value of the volume of class is given by: (in voxels).
2.3.4 Implementation details:
Since GEM only requires improving the bound at each iteration, we follow a schedule in which all the model parameters except for are updated once in the M step. Since updating requires solving a more computationally expensive registration problem, we only update in the M step every five GEM iterations. The method is summarized in Algorithm 1.
In practice, we also force some parameters and to be shared across classes, for increased robustness of the algorithm. For the sMRI parameters (), we follow  and force parameter sharing across: cortex, hippocampus and amygdala; reticular nucleus and white matter; mediodorsal and pulvinar nuclei; rest of thalamic nuclei; and contralateral structures. For the FA, parameters are shared across each of the six groups of thalamic nuclei in Table 2 of , and across contralateral structures. The same grouping – but without contralateral constraints – is used for the directional parameters .
3 Experiments and results
We evaluate our method with a recent probabilistic atlas of 25 thalamic nuclei and surrounding regions derived from histology . The thalamus is an excellent target region, due to its faint lateral boundaries in sMRI (as explained in Section 1), and its set of nuclei with different connectivity. We use two considerably different datasets in evaluation: HCP (state of the art) and ADNI (legacy).
HCP: Isotropic T1 and dMRI scans from 100 healthy subjects (age 29.13.3, 44 males), at 0.7 mm (T1) and 1.25 mm resolution (dMRI). We fit the DTI model to the =1000 s/mm shell (180 directions) and 12 scans with =0 (details in ).
ADNI: T1 and dMRI scans from 77 subjects from ADNI2: 39 Alzheimer’s disease (AD) and 38 age-matched controls (74.18.1 years; 40 females total). T1 resolution: 1.211 mm (sagittal); dMRI resolution:1.351.352.7 mm (axial); 5 volumes with b=0, 41 directions (b=1000 s/mm; details at adni-info.org).
3.2 Experimental setup
We evaluate three competing methods: (i) Segmentation of the whole thalamus with FreeSurfer ; (ii) Segmentation of thalamic nuclei using Bayesian segmentation on T1 only ; and (iii) Segmentation of thalamic nuclei with the full model, including dMRI. We compare these approaches in three experiments: (i) Qualitative assessment of segmentation and tractography in HCP; (ii) Correlation between thalamic and total intracranial volume (ICV) in HCP; and (iii)
Ability to discriminate AD and control subjects based on volumes in ADNI. The sMRI and dMRI data are resampled to 0.5 mm isotropic in a bounding box around the thalami (DTI is interpolated in a log-euclidean framework). We set as in , to the median T1 intensity in class according to the main FreeSurfer segmentation, and to the volume of the class in mm.
Figure 3 shows qualitative results on an HCP subject. FreeSurfer almost completely misses the left pallidum (yellow arrow in the figure) and oversegments the thalami. We test the effects of the latter on tractography by reconstructing the full dMRI data with generalized q-sampling , performing whole-brain tractography, and isolating the tracts that intersect the whole thalami, as automatically segmented by the three competing methods. The FreeSurfer thalamus yields many false positive tracts, mostly due to overlap with the internal capsule (red arrow). Aggregating the nuclei produced by Bayesian segmentation on the T1 produces a more accurate boundary, but still oversegments the anterior thalamus (white arrow), and undersegments the pulvinar nucleus (black arrow). Our multi-modal method yields less false positive tracts, and segments thalamic nuclei that are more homogeneous in terms of diffusion orientation and FA.
We also evaluate segmentation performance quantitatively on HCP, in an indirect fashion, by computing the correlation of total thalamic volume obtained by each method (left-right averaged) with the ICV estimated by FreeSurfer; noisy thalamic segmentations are expected to degrade this correlation. Scatter plots and regression lines are shown in Fig. 4. The FreeSurfer volumes are quite large on average, and their correlation with ICV is =0.71. Bayesian segmentation with T1 yields
=0.68 (not significantly different, with p=0.37 on a two-tailed Steiger test). The proposed algorithm produces fewer outliers than the other two, and yields the highest correlation (=0.81), significantly higher than those of FreeSurfer (p=0.006) and T1-only segmentation (p=0.001).
Finally, we evaluate the ability of the segmented volumes to classify the ADNI subjects into AD and controls. We use a simple linear discriminant analysis, whose performance is mostly determined by the quality of the volumes. We project the volumes (left-right averaged, corrected for ICV and age) onto the normal to the discriminant hyperplane in a leave-one-out fashion. We use the projections to compute the area under the ROC curve (AUC), accuracy at its elbow, and sample sizes (). Results are shown in Table 1. Our method yields a fair improvement compared with T1-only Bayesian segmentation (increase of 7 points in accuracy and AUC, and reduction of 6 samples). Compared with FreeSurfer, our algorithm reduces the sample size by 60%.
|Method||FreeSurfer (whole)||Bayes. Seg. T1||Proposed|
|Accuracy at elbow||61.0%||67.5%||74.0%|
We have presented a Bayesian method for joint segmentation of sMRI and dMRI, which is robust to changes in acquisition platform and protocol – as shown with two substantially different datasets. Compared with Bayesian segmentation using sMRI alone, our method produces more accurate boundaries for subcortical structures, and yields smaller sample sizes in an AD classification task.
Future work at the methodological level will follow five main directions: (i) Modeling partial voluming in the dMRI, which may be important for smaller structures; (ii) Exploring other axial PDFs, as well as mixtures; (iii) Placing a prior on the dMRI likelihood parameters, e.g., to utilize prior knowledge on the FA; (iv) Modeling the bias field in the sMRI data, e.g., as in ; and (v) Adding connectivity derived from tractography to the dMRI likelihood, which may be challenging because tractography results depend largely on the MR acquisition.
We also plan to manually trace structures some of the HCP and AD data, with three purposes. First, to include white matter bundles in the atlas, as modeling the whole cerebral white matter with a single Beta-DSW is not realistic (not even within a bounding box). Second, to enable direct evaluation of our segmentations. And third, to help us explain discrepancies in AD classification accuracy between our results and those presented in , which may be due to the their larger dataset, their different ADNI sample, or some other factor.
As high-resolution dMRI becomes more common in neuroimaging, we believe that segmentation techniques that jointly model gray and white matter with sMRI and dMRI – like the one in this paper – will become increasingly important.
Acknowledgement: Supported by Horizon 2020 (ERC Starting Grant 677697, Marie Curie grant 765148), Danish Council for Independent Research (DFF-6111-00291), NIH (R21AG050122, P41EB015902), Wistron Corp., SIP, and AWS.
-  Ashburner, J., Friston, K.: Unified segmentation. Neuroimage 26 (2005) 839–851
-  Fischl, B., Salat, D.H., Busa, E., Albert, M., Dieterich, M., et al.: Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 33(3) (2002) 341–355
-  Iglesias, J.E., Sabuncu, M.R.: Multi-atlas segmentation of biomedical images: a survey. Med. Im. Anal. 24(1) (2015) 205–219
-  Roy, A.G., Conjeti, S., Navab, N., Wachinger, C.: Quicknat: Segmenting MRI neuroanatomy in 20 seconds. NeuroImage 186(1) (2019) 713–727
-  Patenaude, B., Smith, S., Kennedy, D., Jenkinson, M.: A bayesian model of shape and appearance for subcortical brain segmentation. NeuroImage 56 (2011) 907–922
-  Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., et al.: The WU-Minn human connectome project: an overview. NeuroImage 80 (2013) 62–79
-  O’Donnell, L.J., Westin, C.F.: Automatic tractography segmentation using a high-dimensional white matter atlas. IEEE Trans. Med. Im. 26(11) (2007) 1562–1575
-  Wassermann, D., Bloy, L., Kanterakis, E., Verma, R., Deriche, R.: Unsupervised white matter fiber clustering and tract probability map generation: applications of a Gaussian process framework for white matter fibers. NeuroImage 51 (2010) 228
-  Yendiki, A., Panneck, P., Srinivasan, P., Stevens, A., Zöllei, L., et al.: Automated probabilistic reconstruction of white-matter pathways in health and disease using an atlas of the underlying anatomy. Front. Neuroinform. 5 (2011) 23
-  Awate, S.P., Zhang, H., Gee, J.C.: A fuzzy, nonparametric segmentation framework for DTI and MRI analysis: with applications to DTI-tract extraction. IEEE Trans. Med. Im. 26(11) (2007) 1525–1536
-  Hagler, D.J., Ahmadi, M.E., Kuperman, J., Holland, D., McDonald, C.R., et al.: Automated white-matter tractography using a probabilistic diffusion tensor atlas: Application to temporal lobe epilepsy. Hum. Brain Map. 30(5) (2009) 1535–1547
-  Behrens, T.E., Johansen-Berg, H., Woolrich, M., Smith, S., Wheeler-Kingshott, C., et al.: Non-invasive mapping of connections between human thalamus and cortex using diffusion imaging. Nat. Neurosci. 6(7) (2003) 750
-  Saygin, Z.M., Osher, D.E., Augustinack, J., Fischl, B., Gabrieli, J.D.: Connectivity-based segmentation of human amygdala nuclei using probabilistic tractography. Neuroimage 56(3) (2011) 1353–1361
-  Stough, J.V., Glaister, J., Ye, C., Ying, S.H., Prince, J.L., Carass, A.: Automatic method for thalamus parcellation using multi-modal feature classification. Proc. of MICCAI (Pt 3) (2014) 169–176
-  Glaister, J., Carass, A., Stough, J.V., Calabresi, P.A., Prince, J.L.: Thalamus parcellation using multi-modal feature classification and thalamic nuclei priors. Proc SPIE Int Soc Opt Eng 9784 (2016)
-  Zhang, Y., Brady, M., Smith, S.: Segmentation of brain MR images through a hidden markov random field model and the expectation-maximization algorithm. IEEE Trans.Med.Im. 20(1) (2001) 45–57
-  Iglesias, J.E., Insausti, R., Lerma-Usabiaga, G., Bocchetta, M., Van Leemput, K., et al.: A probabilistic atlas of the human thalamic nuclei combining ex vivo MRI and histology. NeuroImage 183 (2018) 314–326
Van Leemput, K.:
Encoding probabilistic brain atlases using bayesian inference.IEEE Trans. Med. Im. 28(6) (2009) 822
-  Zhang, H., Schneider, T., Wheeler-Kingshott, C.A., Alexander, D.C.: NODDI: practical in vivo neurite orientation dispersion and density imaging of the human brain. Neuroimage 61(4) (2012) 1000–1016
-  Mardia, K., Jupp, P.: Directional statistics. Volume 494. John Wiley & Sons (2009)
-  Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B (1977) 1–38
-  Sotiropoulos, S.N., Jbabdi, S., Xu, J., Andersson, J.L., Moeller, S., et al.: Advances in diffusion MRI acquisition and processing in the human connectome project. Neuroimage 80 (2013) 125–143
-  Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Log-euclidean metrics for fast and simple calculus on diffusion tensors. Magn. Reson. Med. 56(2) (2006) 411–421
-  Yeh, F.C., Wedeen, V.J., Tseng, W.Y.I.: Generalized q-sampling imaging. IEEE Trans. Med. Im. 29(9) (2010) 1626–1635