Generative diffeomorphic atlas construction from brain and spinal cord MRI data

by   Claudia Blaiotta, et al.

In this paper we will focus on the potential and on the challenges associated with the development of an integrated brain and spinal cord modelling framework for processing MR neuroimaging data. The aim of the work is to explore how a hierarchical generative model of imaging data, which captures simultaneously the distribution of signal intensities and the variability of anatomical shapes across a large population of subjects, can serve to quantitatively investigate, in vivo, the morphology of the central nervous system (CNS). In fact, the generality of the proposed Bayesian approach, which extends the hierarchical structure of the segmentation method implemented in the SPM software, allows processing simultaneously information relative to different compartments of the CNS, namely the brain and the spinal cord, without having to resort to organ specific solutions (e.g. tools optimised only for the brain, or only for the spinal cord), which are inevitably harder to integrate and generalise.



There are no comments yet.


page 12

page 13


A generative model of realistic brain cells with application to numerical simulation of diffusion-weighted MR signal

In this work, we introduce a novel computational framework that we devel...

A model of brain morphological changes related to aging and Alzheimer's disease from cross-sectional assessments

In this study we propose a deformation-based framework to jointly model ...

MRI Super-Resolution using Multi-Channel Total Variation

This paper presents a generative model for super-resolution in routine c...

Bidirectional Modeling and Analysis of Brain Aging with Normalizing Flows

Brain aging is a widely studied longitudinal process throughout which th...

Brain MR Image Segmentation in Small Dataset with Adversarial Defense and Task Reorganization

Medical image segmentation is challenging especially in dealing with sma...

Factorisation-based Image Labelling

Segmentation of brain magnetic resonance images (MRI) into anatomical re...

Data-driven Probabilistic Atlases Capture Whole-brain Individual Variation

Probabilistic atlases provide essential spatial contextual information f...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The spinal cord is a long and thin cylindrical structure of the central nervous system, which constitutes the main pathway for transmitting information between the brain and the rest of the body. Not only is the spinal cord a major site of traumatic injury (SCI), but it can also be affected by a number of neurodegenerative diseases, such as multiple sclerosis, amyotrophic lateral sclerosis, transverse myelitis and neuromyelitis optica (Rocca et al., 2015). Indeed, the spinal cord is a clinically eloquent structure, since trauma, ischemia and inflammation can affect the cord at any level, thus resulting in impairment of motor, sensory and autonomic functions (Huber et al., 2015; Freund et al., 2013a).

Understanding these degenerative processes represents a crucial step towards the development of effective therapeutic interventions, as well as towards the identification of sensitive and selective diagnostic criteria. In particular, quantification of spinal cord tissue loss (i.e. atrophy) has been regarded over the past two decades as a promising biomarker, which could potentially help in monitoring disease progression, predicting clinical outcome and understanding the mechanisms underlying neurological disability (e.g. demyelination, inflammation, axonal or neuronal loss), in a number of conditions that affect the central nervous system both at the brain and spinal cord level, such as multiple sclerosis (MS) and traumatic spinal cord injury (SCI)

(Miller et al., 2002; Freund et al., 2013b, a; Grossman et al., 2000; Bakshi et al., 2005).

Since the spinal canal is surrounded and protected by a thick vertebral bone layer, neuroimaging techniques, particularly MRI, represent the most effective tools to investigate non-invasively and in vivo the structure and function of the spinal cord, both in physiological and pathological conditions. Unfortunately spinal cord MRI is not immune from technical challenges. Some of them are intrinsic to MR imaging, such as the presence of intensity inhomogeneities, while others arise from the peculiar anatomy of the cord itself, for instance from its small cross-sectional area (Grossman et al., 2000; Wheeler-Kingshott et al., 2014; Stroman et al., 2014). Nevertheless, spinal cord imaging using MR techniques has improved significantly over the past few years, especially with the introduction of phased-array surface coils and fast spin-echo sequences (Stroman et al., 2014).

Further advances in the field of spinal cord MRI are encouraged by the fact that significant correlations between spinal cord atrophy measures, obtained from imaging data, and indicators of neurological impairment, such as motor or sensory function scores, have been shown and reproduced, within multiple spinal cord imaging studies (Losseff and Miller, 1998; Kidd et al., 1993; Filippi et al., 1996; Losseff et al., 1996; Freund et al., 2013b; Grabher et al., 2015).

Within these type of studies, delineating the cord represents the first step for assessing atrophy or detecting any other morphometric change, or difference. This indicates that there is an urgent need not only for automated algorithmic solutions dedicated to spinal cord tissue classification and image registration (Chen et al., 2013; Van Uitert et al., 2005; Fonov et al., 2014; Levy et al., 2015; Taso et al., 2014; De Leener et al., 2017), but also for large, systematic and reproducible validation studies to objectively assess the performance of such tools (Prados et al., 2017).

Not surprisingly, the first methods that appeared in the literature to perform spinal cord image segmentation and the subsequent volumetric analyses were based on semi-automated algorithms. Among these, one of the earliest is described in the work of Coulon et al. (2002), where they introduce an algorithm for fitting a cylindrical cubic B-spline surface to MR spinal cord images. Later on, a few other semi-automated solutions have been presented by Van Uitert et al. (2005) and Horsfield et al. (2010). All of these methods require that the user approximately marks the cord centre, so as to provide a reliable initialisation of the algorithms.

Only very recently have fully automated spinal cord segmentation methods started to be proposed. Chen et al. (2013) introduced a fuzzy c-means algorithm with topological constraints to segment the cervical and thoracic spinal cord from MR images. Their method relies on a statistical atlas of the cord and the surrounding CSF, which is constructed from only five manual segmentations. Instead, De Leener et al. (2014) proposed a fully automated method for delineating the contour of the spinal cord, in T1- and T2-weighted MR images, by warping of a deformable cylindrical model.

The first significant effort to define and introduce a standard anatomical space for spinal cord neuroimaging studies relates to the work of Fonov et al. (2014), who developed a standard stereotactic space for spinal cord imaging data, between the vertebral levels of C1 and T6 (MNI-Poly-AMU template).

Their template is generated using the image registration algorithm presented in Avants et al. (2008)

and includes a T2-weighted average image, together with probabilistic gray and white matter maps. Such tissue probability maps were developed by

Taso et al. (2014), via automated registration of manually labelled MRI scans of 15 subjects.

The work of Fonov et al. (2014); Levy et al. (2015); Taso et al. (2014) constitutes an important step towards the development of robust and reliable tools for analyzing structural spinal cord data. Indeed, having a common anatomical framework can potentially allow the comparison of results obtained by different research groups on different data sets, thus speeding up the progress of spinal cord imaging research.

The work presented here aims to provide researchers with a general and comprehensive modelling framework to interpret large data sets of MRI scans from a Bayesian generative perspective. This is achieved by building on the modelling elements introduced in Ashburner and Friston (2005, 2011); Blaiotta et al. (2016), which are further expanded here and integrated in one single algorithmic framework.

The aim is to demonstrate the validity of such a generative approach, especially for the purpose of performing simultaneous brain and spinal cord morphometric analyses using MRI data sets. In doing so, a strategy is outlined on how to overcome some of the limitations of most currently available image processing tools for neuroimaging, whose performance has been optimised on the brain at the expense of the spinal cord (indeed the spinal cord is frequently neglected tout court by such tools).

2 Methods

Let us consider a population of subjects belonging to a homogeneous group, from an anatomical point of view, and let us assume that image volumes of different contrast are available for each subject.

From a generative perspective, the image intensities , which constitute the observed data, can be thought of as being generated by sampling from

-dimensional Gaussian mixture probability distributions, after non-linear warping of a probabilistic anatomical atlas

(Evans et al., 1994).

Such an atlas carries a priori anatomical knowledge, in the form of average shaped tissue probability maps. From a mathematical modelling point of view, the atlas encodes local (i.e. spatially varying) mixing proportions of the mixture model, with being an index set over the template voxels, as detailed in the following subsection.

2.1 Tissue priors

Each image voxel , for each subject is considered as being drawn from possible tissue classes. The following prior latent variable model defines the probability of finding tissue type , at a specific location (i.e. centre of voxel ), in image , prior to observing the corresponding image intensity signal


or equivalently


Class memberships, for each subject and each voxel, are encoded in the latent variable , which is a

-dimensional binary vector.

are scalar functions of space , common across the entire population, which satisfy the constraint


with being a continuous coordinate vector field. Global weights are introduced to further compensate for individual differences in tissue composition.

In equation (1), denotes a generic spatial transformation, parametrised by , which allows projecting prior anatomical information onto individual data, with being a continuous mapping from the domain of image , into the space of the tissue priors .

Since digital image data is a discrete signal, defined on a tridimensional voxel grid, each mapping needs to be discretised as well, via sampling at the centre of every voxel , to give the discrete mapping that appears in (1).

As opposed to the modelling approach described in Ashburner and Friston (2005), where the tissue priors were considered as fixed and known a priori

quantities, here the tissue probability maps are treated as random variables, whose point estimates or full posteriors can be inferred via model fitting

(Bhatia et al., 2007; Ribbens et al., 2014).

For this purpose, a finite dimensional parametrisation of

needs to be defined. Typically, whenever a continuous function needs to be reconstructed from a finite discrete sequence, it is possible to formulate the problem as an interpolation that makes use of a finite set of coefficients and continuous basis functions. Since the priors

are bounded to take values in the interval on the entire domain (see equation (3)), not all basis functions are well suited here. Linear basis functions, besides being quite a computationally efficient choice, have the convenient property of preserving the values of in the interval , as long as the coefficients are also in the same interval. Such coefficients belong to the discrete set of -dimensional vectors, with


They can be learned directly from the data, as it will be shown in the following section.

Additionally, prior distributions on the parameters can be introduced (Bishop, 2006). Dirichlet priors are the most convenient choice here, since they are conjugate to multinomial forms of the type in (2), and they can be expressed as


where the normalising constant is given by


with being the gamma function and


2.2 Diffeomorphic image registration

As anticipated in the previous sections, the generative interpretation of imaging data that this work relies on involves warping an unknown, average-shaped atlas to match a series of individual scans.

Such a problem, that is to say template matching via non-rigid registration, has been largely explored in medical imaging, mainly for solving image segmentation or structural labeling problems, in an automated fashion (Ashburner and Friston, 2005; Shen and Davatzikos, 2004; Christensen, 1999; Chui et al., 2001; Bajcsy et al., 1983; Iglesias et al., 2012; Pluta et al., 2009; Warfield et al., 1999; Khan et al., 2008; Bowden et al., 1998).

Indeed, the modelling of spatial mappings between different anatomies can be approached in a variety of manners, depending on the adopted model of shape and on the objective function (i.e. similarity metric and regularisation) that the optimisation is based on, thus leading to a variety of algorithms with remarkably different properties (Penney et al., 1998; Denton et al., 1999; Klein et al., 2009).

The work presented here is formulated according to the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework (Younes, 2010), where the transformations mapping between the source images and the target image are assumed to belong to a Riemannian manifold 111A Riemannian manifold, in differential geometry, is a smooth manifold equipped with a Riemannian metric (inner product). In particular, the Riemannian metric on the -dimensional manifold defines, for every point , the scalar product of vectors in the tangent space , in such a way that given two vectors , the inner product depends smoothly on the point . The tangent space represents the nearest approximation of the manifold by a vector space (Warner, 2013). of diffeomorphisms (Ashburner, 2007). A diffeomorphism is a smooth differentiable map (with a smooth differentiable inverse ) defined on a compact, simply connected domain .

One way of constructing transformations belonging to the diffeomorphic group is to solve the following non-stationary transport equation (Joshi and Miller, 2000)


where is a time dependent, smooth velocity vector field, in the Hilbert space222A Hilbert space is a complete inner product space, where an inner product is a map  , which associates each pair of vectors in the space with a scalar quantity. In particular given and

(9) (10) (11)
An inner product naturally induces a norm by , therefore every inner product space is also a normed vector space (Dieudonné, 2013). .

The initial map, at , is equal to the identity transform , while the final map, endpoint of the flow of the velocity field , can be computed by integration on the unitary time interval (Beg et al., 2005).


Following from the theorems of existence and uniqueness of the solution of partial differential equations (

p.d.e.), the solution of (8) is uniquely determined by the velocity field and by the initial condition .

A diffeomorphic path is not only differentiable, but also guaranteed to be a one-to-one mapping. Such a quality is highly desirable for finding morphological and functional correspondences between different anatomies without introducing tears or foldings, which would violate the conditions for topology preservation (Christensen, 1999). Additionally, the diffeomorphic framework provides metrics to quantitatively evaluate distances between anatomies or shapes. It should also be noted that diffeomorphisms are locally analogous to affine transformations (Avants et al., 2006).

In practice, finding an optimal diffeomorphic transformation to align a pair, or a group, of images involves optimising an objective function (e.g. minimising a cost function), in the space of smooth velocity vector fields defined on the domain . The required smoothness is enforced by constructing the norm on the space through a differential operator (Beg et al., 2005), such that a quantitative measure of smoothness can be obtained via


where is a discretised version of .

The form of the cost function will depend on how the observed data is modelled. For the work presented here, groupwise alignment is achieved via maximisation of the following variational objective function


where is the set of latent variables across the entire population, are -dimensional vectors of posterior belonging probabilities, indicates the coefficients used to parametrise the tissue priors and denotes a set of individual tissue weights for rescaling the tissue probability maps. The coordinate mappings are encoded in the parameter set , which consists of vectors of coefficients , containing elements each. Such coefficients can be used to construct continuous initial velocity fields via trilinear, or higher order, interpolation.

A procedure known as geodesic shooting (Miller et al., 2006; Ashburner and Friston, 2011; Allassonnière et al., 2005; Vialard et al., 2012; Beg and Khan, 2006) is applied, within the work presented here, to compute diffeomorphic deformation fields from corresponding initial velocity fields. Such a procedures exploits the principle of conservation of momentum (Younes et al., 2009), which is given by , with being the adjoint of the differential operator , to integrate the dynamical system governed by (8) without having to store an entire time series of velocity fields. The implementation adopted here relies on the work presented in Ashburner and Friston (2011).

The posterior membership probabilities that appear in (14) can be computed by combining the prior latent variable model introduced in 2.1 with a likelihood model of image intensities, which will be described in subsection 2.4

, thus leading to a fully unsupervised learning scheme.

Alternatively, when manual labels are available, binary posterior class probabilities can be derived directly from such categorical annotations, without performing inference from the observed image intensity data. In particular, if all input data has been manually labelled, then the resulting algorithm would implement a fully supervised learning strategy, while, if only some of the data has associated training labels, a hybrid approach can be adopted, which would fall into the category of semisupervised learning

(Chapelle et al., 2006; Filipovych et al., 2011).

Finally, it is also possible to take into account the uncertainty inherent in the process of manual rating. In such a case, the actual posterior probabilities can be computed by making use of the categorical output of manual labelling together with an estimate of the rater sensitivity and with a generative intensity model.

Making use of Bayes rule, this gives


where indicates the set of model parameters, are categorical manual labels assigned to image and indicates the probability of voxel in image belonging to class , given the manual label attributed to the same voxel.

A simple model for this, is


where is the sensitivity of the rater that generated the set of labels for image . The problem of how to evaluate the performance of a manual or automated rater is not addressed here. For instance, a probabilistic scheme, which has been widely used to assess segmentation performance in medical imaging, is presented in Warfield et al. (2004).

2.3 Combining diffeomorphic with affine registration

Anatomical shapes are very high dimensional objects. The diffeomorphic model described in the previous section can account for a significant amount of shape variability in the observed data. Nevertheless, it is still convenient, mainly for computational reasons, to combine such a local, high dimensional shape model with global, lower dimensional transformations, such as rigid body or affine transforms. In fact, by beginning to solve the registration problem from the coarsest deformation components (e.g. rigid body or affine), it is possible to ensure that the subsequent diffeomorphic registration starts from a good initial estimate of image alignment, that is to say closer to the desired global optimum (Lester and Arridge, 1999).

This makes the optimisation problem faster to solve and at the same time it reduces significantly the chance of registration failure (Modersitzki, 2004). Indeed, it is relatively common for non-linear registration algorithms to perform poorly in the presence of a large translational or size mismatch between the reference and the target images (Jenkinson and Smith, 2001).

A possible parametrisation that combines affine and diffeomorphic transformations is


where is the resulting mapping from image of subject into the template space. Such a mapping is obtained by affine transforming the diffeomorphic deformation field . The transformation matrix

encodes nine degrees of freedom (rotation, zooming and shearing) and is computed via an exponential map

with , where is the Lie algebra for the affine group in three dimension and is a vector of nine parameters (Ashburner and Ridgway, 2013). Translations are modelled by the vector . The entire set of affine parameters is denoted as .

2.4 Intensity model

From a general probabilistic perspective, classification of tissue types based on MR signal intensities requires a model of the observed data that is capable of capturing the probability of occurrence of each signal sample value , provided that the true labels are known. In other words, the problem breaks down into defining suitable conditional probabilities , for each and then applying Bayes rule to infer the posterior class probabilities.

In the model adopted here, image intensity distributions are represented as Gaussian mixtures, with the unknown mean and covariance matrix of each Gaussian component , for subject , being governed by Gaussian-Wishart priors (Bishop, 2006; Blaiotta et al., 2016).

Correction of intensity inhomogeneities is also performed within the same modelling framework and it involves multiplying the uncorrected intensities of each image volume by a bias field, which is modelled as the exponential of a weighted sum of discrete cosine transform basis functions (Styner et al., 2000; Ashburner and Friston, 2005). Such an approach is conceptually equivalent to scaling the probability distributions of all Gaussian components by a local scale parameter, which is the bias itself, such that




where denotes the set of bias field parameters and is a -dimensional vector representing the bias for subject at voxel .

2.5 Graphical model

A graphical representation of the model adopted in this paper is depicted in Figure 1, while a legend of the symbols used to indicate the different variables can be found in table 1.

Symbol Meaning
Observed image intensity at voxel for subject .
Vector of latent class membership probabilities.
Tissue priors at voxel .
Mean intensity of class for subject .
Covariance of intensities for class and subject .
Scale matrix of Wishart prior distribution on
Degrees of freedom of Wishart prior distribution on .
Mean of Gaussian prior distribution over

Scaling hyperparameter of Gaussian prior distribution over

Hyperparameter governing the Dirichlet prior on .
Bias field parameters.
Prior mean of bias parameters.
Prior covariance matrix of bias parameters.
Affine transformation parameters.
Prior mean of affine transformation parameters.
Prior covariance matrix of affine transformation parameters.
Weights for rescaling the tissue priors.
Initial velocity at voxel for subject .
Differential operator to compute penalty on .
Table 1:
Figure 1: Graphical representation of the model adopted in this paper. Observed variables are represented by a filled circle. Latent variables as well as model parameters are depicted as unfilled circles. Blue solid dots correspond to fixed hyperparameters. The so called plate notation is adopted to indicated repeated variables. Symbols referring to all variables and parameters are listed in table 1.

Given such a model, it is possible to define the following variational objective function , which constitutes a lower bound on the logarithm of the marginal joint probability , such that




where the expectations indicated as and are computed with respect to variational posterior distributions on the latent variables   and on the Gaussian means and covariances , respectively. Optimisation of , which provides optimal parameter and hyperparameter estimates, will be discussed in the following section.

2.6 Model fitting

The model described in the previous section can be fit to data sets of MR images by combining a variational expectation-maximisation (VBEM) algorithm with gradient based numerical optimisation techniques.

Indeed, the VBEM algorithm described in Blaiotta et al. (2016) is well-suited for addressing the model estimation problem discussed here, since it allows learning posterior distributions on the Gaussian mixture parameters, under the assumption that factorizes as (Bishop, 2006), and at the same time it is able to transfer the information encoded in such posteriors to estimate empirical intensity priors for each tissue type.

Additionally, the algorithm proposed in this paper loops over all subjects in the population and, for each subject, it iterates over estimating the Gaussian posteriors, the bias field, the affine parameters and the initial velocities, which are all treated as conditional optimisations. Subsequently the tissue probability maps and intensity priors are updated and the whole cycle is repeated until convergence.

Estimation of the bias field parameters can be conveniently performed via non-linear optimisation techniques. Here the problem is solved using the Gauss-Newton method (Bertsekas, 1999), so as to maximise the objective function in (21) with respect to . The resulting implementation is very similar to the one described in Ashburner and Friston (2005), therefore further details are omitted here. Optimisation of the affine parameters can also be carried out by means of a Gauss-Newton scheme and a brief description of the required computations can be found A. For the update of the weight parameters we adopt the same strategy outlined in Ashburner and Friston (2005); Blaiotta et al. (2016).

The following sections instead will present in detail the algorithmic scheme used to learn the average shaped tissue templates and to estimate the set of initial velocity fields .

2.6.1 Updating the tissue priors

At each main iteration of the algorithm, the tissue priors need to be updated, given the current estimates of all the other parameters, which are kept fixed for each individual in the population.

Considering only the terms in (21) that depend on gives the following objective function, which has to be maximised with respect to


It should be noted that the parameters that need to be estimated are defined on the domain of the template , rather than on the individual spaces . For this reason equation (22), which is a sum of integrals on the native domains, needs to be mapped to , by inverting the warps , to give


where the determinants of the Jacobian matrices of the deformations are included to preserve volumes after the change of variables.

Finally equation (23) is discretised on a regular voxel grid, whose centres have coordinates , to give




The prior term is given by the following Dirichlet distribution


Maximising equation (24) is a constrained optimisation problem, subject to


A closed form solution could be easily found if the rescaling weights were all equal to one. In such a case


which could be maximised under the constraint (29), by making use of Lagrange multipliers (Falk, 1967), to give


with .

This solution would provide maximum a posteriori point estimates of . However for this problem, it would also be possible to derive a full variational posterior distribution, which, like its prior, would take a Dirichlet form, with parameters  .

When rescaling of the tissue priors is allowed the optimisation problem becomes more complex. The strategy adopted here consists in finding an approximate solution to the unconstrained optimisation problem by setting the derivatives of the objective function in (23) to zero


Solving with respect to , under the simplifying assumption that the term can be treated as a constant, gives


Such a solution is then projected onto the constraining hyperplane, by preserving tissue proportions at each voxel


Experimental testing of this strategy indicated that it gave a constant improvement of the objective function at a relatively cheap computational cost. Alternatively, iterative constrained non-linear optimisation techniques (Powell, 1978) could have been exploited to solve the template update problem.

2.6.2 Computing the deformation fields

Groupwise image alignment is achieved by optimisation of the variational objective function defined in (21), with respect to the parameters used to compute the deformations. This is equivalent to adopting the following image matching or similarity term


Additionally, working on discretised image grids, with associated voxel centres , requires reformulating as




The penalty term for this groupwise image registration problem is given by


with being a dimensional vector of parameters used for representing the initial velocity field of image and encoding affine deformation parameters used to compute the transformation in (17).

For each image in the data set, updating the corresponding initial velocity field, given the current estimates of the templates and all the other model parameters, involves optimising the following objective function


with respect to , under the following deformation model


where is a diffeomorphism computed via geodesic shooting (Ashburner and Friston, 2011) from the corresponding initial velocity field .

Here image registration is solved via Gauss-Newton optimisation, which requires computing both the first and second derivatives of the objective function (Hernandez and Olmos, 2008). Such derivatives can be found in B. This leads to a very high dimensional inverse problem, which unfortunately cannot be solved via numerical matrix inversion, since this would be prohibitively expensive from a computational point of view. The approach adopted in this work consists in treating this optimisation as a partial differential equation problem, which can efficiently be solved using multigrid methods (Modersitzki, 2004). In particular, we adopt the same full multigrid implementation as in Ashburner (2007).

3 Validation and Discussion

In this section we will present results obtained by applying the presented modelling framework to real brain and cervical cord MR scans acquired with different imaging protocols, as well as to synthetic MR head volumes. Both qualitative and quantitative measures will be provided to assess the behaviour of the proposed approach.

3.1 Template construction

As discussed in Section 2, the proposed method can serve to learn prior tissue probability maps from cross-sectional imaging data sets. In this paper we mainly explore the performance of such a framework to address the quest for integrated brain and spinal cord neuromorphometric tools, even if, given the generality of the presented approach, many more applications should in principle be possible.

3.1.1 Data

The input data for training the model was obtained from three different databases, two of which are freely accessible for download, thus ensuring that the results presented here could readily be compared to those produced by competing algorithms for medical image registration or segmentation.

OASIS data set

The first data set consists of thirty five T1-weighted MR scans from the OASIS (Open Access Series of Imaging Studies) database (Marcus et al., 2007). The data is freely available from the web site, where details on the population demographics and acquisition protocols are also reported. Additionally, the selected thirty five subjects are the same ones that were used within the 2012 MICCAI Multi-Atlas Labeling Challenge (Landman and Warfield, 2012).

Balgrist data set

The second data set consists of brain and cervical cord scans of twenty healthy adults, acquired at University Hospital Balgrist with a 3T scanner (Siemens Magnetom Verio). Magnetisation-prepared rapid acquisition gradient echo (MPRAGE) sequences, at 1 isotropic resolution, were used to obtain T1-weighted data, while PD-weighted images of the same subjects were acquired with a multi-echo 3D fast low-angle shot (FLASH) sequence, within a whole-brain multi-parameter mapping protocol (Weiskopf et al., 2013; Helms et al., 2008).

IXI data set

The third and last data set comprises twenty five T1-, T2- and PD-weighted scans of healthy adults from the freely available IXI brain database, which were acquired at Guy’s Hospital, in London, on a 1.5T system (Philips Medical Systems Gyroscan Intera). Additional information regarding the demographics of the population, as well as the acquisition protocols, can be found at

The complete data set therefore consists of eighty multispectral scans of healthy adults, obtained with fairly diverse acquisition protocols and using scanning systems produced by different vendors.

Unfortunately, not all the three modalities of interest (T1-, T2- and PD-weighted) are available for all of the subjects. To circumvent the difficulties arising from the presence of missing imaging modalities, without neglecting any of the available data (indeed deletion of entries with missing data is still, in spite of its crudity, a common statistical practice), the Gaussian mixture modelling approach discussed in

Blaiotta et al. (2016) was generalised by introducing an additional variational posterior distribution over the missing data points.

In practice, the resulting variational EM scheme iterates over first estimating an approximated posterior distribution on the unknown image intensities, secondly updating the sufficient statistics of the complete (observed and missing) data and finally computing variational posteriors on the Gaussian mixture parameters. Additional computational details relative to this strategy are provided in C.

In synthesis, it was possible to fit the generative groupwise model described in this paper to the entire data set, in spite of having different imaging modalities available from the different acquisition sites. This is indeed a very common scenario in real life medical imaging problems, therefore it should be actively addressed by processing or modelling solutions that claim to be applicable to large population data (van Tulder and de Bruijne, 2015).

Manual brain labels are freely available for all images in data set one. Such labels have been generated and made public by Neuromorphometrics, Inc. ( under academic subscription and they provide a fine parcellation of cortical and non cortical structures, for a total of 139 labels across the brain.

Part of this label data was used for training of the model while the remainder was left out for testing and validation. In particular, brain labels of twenty out of the thirty five OASIS subjects were used to create gray and white matter ground truth segmentations, which were provided as training input for semisupervised model fitting.

Similarly, spinal cord manual labels were created for forty subjects (twenty from data set two and twenty from data set three). Such labels were randomly split in half for training and half for subsequent test analyses. Due to the limited resolution of the data it was not possible to manually delineate gray and white matter within the spinal cord. For this reason, each voxel classified as spinal cord in the training data was allowed to be assigned either to the gray or to the white matter tissue classes, based on the fit of its intensity value to the underlying Gaussian mixture model, as outlined in equation (


Analogously, in spite of having defined only one gray matter training label, two distinct gray matter classes were introduced in the mixture model (top two rows in Figure 2), to best capture the corresponding distribution of image intensities, which is poorly represented by a single Gaussian component, as opposed to the distribution of white matter intensities. Also in this case, membership probabilities of the labelled training data were computed based on the corresponding intensity values, by making use of equation (15).

3.1.2 Tissue templates and intensity priors

The tissue probability maps obtained by applying the modelling framework presented in this paper to the data set described above are depicted in Figure 2. The total number of tissue classes used for this experiment is equal to twelve but three classes, representing air in the background, are not shown.

In particular, Figure 2 shows how one of the two gray matter classes (first row) best fits the subcortical nuclei and also includes voxels affected by partial volume effects at the interface between gray and white matter, while the second one (second row) is more representative of cortical structures, with the presence of partial volume effects generated by the juxtaposition of gray matter and CSF. The third row in Figure 2 shows the white matter class, which also includes most of the brainstem and the spinal cord.

The remaining tissue classes were estimated in a purely unsupervised way. Therefore a non ambiguous anatomical interpretation is not straightforward.

Tissue class four (fourth row) mainly contains CSF, even if other tissues are also present, especially in the neck area. This should be attributed to the lack of CSF training labels as well as to a poor multivariate coverage of the cervical region in the available data. In fact, data from the OASIS set is truncated around the first cervical vertebra. The T1-weighted scans of the IXI data set cover up to the C2/C3 vertebral level, but the corresponding T2- and PD-weighted scans do not extend beyond the brainstem. Indeed, only the data from the second database (Balgrist hospital) provides more than one modality covering up to around the fourth cervical vertebra. In this case though, additional difficulties arose from poor inter-modality alignment of the data, a problem that turned out to be particularly severe in the cervical region and that, given its non-linearity, could not be fully compensated for by affine inter-modality coregistration.

Bone tissue is also not easily identifiable from the data available for this experiment, but it could have potentially been much better extracted by incorporating some CT scans into the training data.

Fat and soft tissues are mainly represented in the last two classes (bottom two rows in Figure 2).

Figure 2: Tissue probability maps obtained by applying the presented groupwise generative model to a multispectral data set comprising brain and cervical cord scans of eighty healthy adults, from three different databases.
(a) Gray matter
(b) White matter
Figure 3: Brain and spinal cord tissue probability maps of gray LABEL:sub@gray_tpm and white LABEL:sub@white_tpm matter, constructed as described in this paper.

Figure 3 illustrates orthogonal views of the gray and white matter tissue probability maps, where the gray matter map is obtained by evaluating the sum of the two top classes in Figure 2.

The empirical Bayes learning procedure, introduced in Blaiotta et al. (2016) to estimate suitable prior distributions for the parameters of the Gaussian mixture model, was applied here to the same data used to construct the templates. Some of the results are summarised in figure 4, where the estimated empirical prior distributions on the mean intensity of gray and white matter are depicted, with overlaid contour plots showing some of the individual posteriors (randomly selected across the entire population).

Figure 4: Prior distribution over the mean intensity of gray and white matter, in T1- and PD-weighted data.

Such results indicate that the proposed empirical Bayes learning scheme can serve to capture, not only the variability of mean tissue intensity across subjects for each of the modalities of interest, but also the amount of covariance between such modalities. Information of this sort can potentially be used in a number of different frameworks, for solving problems such as tissue segmentation, pathology detection or image synthesis.

3.1.3 Validity of groupwise registration

The performance of groupwise registration achieved by the presented algorithm was assessed by computing pairwise overlap measures for all possible couples of spatially normalised test images (i.e. images whose ground truth labels were not used for training the model). The Dice score coefficient was chosen as a metric of similarity.

Results are summarised in figure 5, where the accuracy of the algorithm presented here is compared to that achieved by the method described in Avants et al. (2010), whose implementation is publicly available, as part of the Advanced normalisation Tools (ANTs) package, through the web site Indeed, the symmetric diffeomorphic registration framework implemented in ANTs has established itself as the state-of-the-art of medical image nonlinear spatial normalisation (Klein et al., 2009).

A number of options can be customised within the template construction framework distributed with ANTs. The experiments, whose results are reported here, were performed with the settings recommended in the package documentation for brain MR data, which are also reported in table LABEL:tab:_ANTS.

Option Value
Similarity Metric Cross-correlation (CC)
Transformation model Greedy SyN (GR)
Initial rigid body yes
N4 Bias Correction yes
Number of resolution levels 4
Number of iterations
Gradient step 0.2
Number of template updates 4
Table 2: Options selected to perform groupwise registration with ANTs, using the antsMultivariateTemplateConstruction script provided with the ANTS package.

Results of this validation analyses indicate that the method presented here, in spite of not being as accurate as ANTs for aligning some subcortical brain structures (e.g. thalamus, putamen, pallidum and brainstem), provided significantly better overlap when registering cortical regions, as assessed by means of paired t-tests with a significance threshold of 0.05 and without correcting for multiple comparisons. No statistically significant differences were found between the two methods with respect to registration of the spinal cord.

Figure 5: Accuracy of groupwise registration achieved by the presented method, compared to the performance of ANTs, for different neural regions. Stars indicate statistically significant differences between the two methods, assessed by means of paired t-tests without correcting for multiple comparisons.

3.1.4 Accuracy of tissue classification

The accuracy of tissue classification achieved by the method presented in this paper was first evaluated on test data that was used to create the templates but without providing manual labels for training the model during atlas construction. The aim in this case was to determine to which extent the proposed method can capture relevant features of the training data, when manual labels are not provided, by learning from few annotated examples. Dice scores 333The Dice score over two sets and is defined as  . were computed to compare the automated segmentations produced via semisupervised groupwise model fitting, with the ground truth, obtained by merging all the gray and white matter brain structures (labels) into two tissue classes respectively, and by considering the spinal cord as a third separate class.

Figure 6: Brain and spinal cord segmentation accuracy of the presented method.

The probabilistic gray and white matter segmentations of the brain were thresholded at 0.5, in order to obtain binary label maps, directly comparable to the ground truth. To derive binary cord segmentations instead, the sum of gray and white matter posterior belonging probabilities was first computed in a subvolume containing the neck only, and then thresholded at 0.5.

Results are summarised in Figure 6, which shows the distributions of Dice scores obtained for brain gray matter, brain white matter and spinal cord.

Such results were then compared to those produced by the brain segmentation algorithm implemented in SPM12, using the standard tissue probability maps distributed with the SPM software. Results of these analyses, which are summarised in Figure 6, indicate that the population specific atlases constructed with the method presented here enable higher tissue classification accuracy, at least for test data drawn from the same population that the model was trained on but whose labels were not exploited for training. A potential source of bias in the results of this experiment is the fact that the test data was actually employed for constructing the atlases, even if the corresponding labels were not seen by the algorithm. However a more cautious k-fold cross-validation, which would have required constructing multiple templates, was not practical in this case due to the expensive computational cost of groupwise model fitting.

Such results however seem to suggest that the model presented in this paper could potentially be useful to create templates with the purpose of capturing the peculiar anatomical features of those populations that are poorly represented by standard anatomical atlases (Tang et al., 2010; Fillmore et al., 2015), such as young or elderly populations, diseased populations, or individuals belonging to different ethnic groups.

This would not only lead to more accurate segmentation results, but as a direct consequence, also increase the reliability of subsequent data analyses, which build models of the segmented data to infer or predict clinically meaningful information.

3.2 Modelling unseen data

Further validation experiments were performed to quantify the accuracy of the framework described in this paper to model unseen data, that is to say data that was not included in the atlas generation process.

Such experiments were performed on synthetic T1-weighted brain MR scans from the Brainweb database (, generated using a healthy anatomical model.

3.2.1 Accuracy of bias correction

A healthy adult brain MR model was processed by means of the algorithm discussed here, using the head and neck templates previously constructed as tissue priors. Different noise and bias field levels were added to the uncorrupted synthetic data, to test the behaviour of the proposed modelling scheme in different noise (1%, 3%, 7%) and bias conditions (20% and 40%).

The noise in these simulated images has Rayleigh statistics in the background and Rician statistics in the signal regions and its level is computed as a percent standard deviation ratio, relative to the MR signal, for a reference tissue

(Cocosco et al., 1997).

Regarding the bias field instead, 20% bias is modelled as a smooth field in the range [0.9, 1.1] while 40% bias is obtained by rescaling of the 20% field, so as to range between 0.8 and 1.2 .

1% 3% 7%
20% 0.86 0.86 0.70
Bias 40% 0.72 0.72 0.51
Table 3: Pearson’s correlation coefficients between the ground truth bias fields and those estimated by the presented algorithm, for simulated T1-weighted data.


reports the Pearson product-moment correlation coefficients between the ground truth and the estimated bias fields, for the different bias ranges and noise levels. Results indicate that the similarity between the estimated and true bias decreases for more intense non-uniformity fields and higher noise levels.

Indeed this is not surprising, as the penalty term, which enforces smoothness of the bias field, has a greater impact in determining the shape of the estimated bias when the non-uniformity fields have a larger dynamic range. Nevertheless, results reported in the following section will show how this increased mismatch between the estimated and true bias, for higher non-uniformities, does not seem to affect the accuracy of tissue segmentation. On the other hand, the accuracy of bias correction is directly related to the amount of noise corrupting the data, mainly due to how this affects the precision associated with estimation of the Gaussian mixture parameters. For a comparison of these results with the performance of SPM12 bias correction on simulated T1-weighted scans from the Brainweb database see Blaiotta et al. (2016).

3.2.2 Accuracy of tissue classification

For the same data the accuracy of tissue classification was also evaluated, by comparing the similarity between the estimated gray and white matter segmentations and the underlying anatomical model.

Results are reported in Figure 7, which shows the Dice score coefficients obtained under different bias and noise conditions.

Figure 7: Dice scores between the estimated and ground truth segmentations for brain white matter and brain gray matter, under different noise and bias conditions, for synthetic T1-weighted data.

The Brainweb database has been extensively used in the neuroimaging community to validate MR image processing algorithms. Therefore the results reported here should be directly comparable to the performance of many brain segmentation techniques present in the literature.

4 Conclusions

This paper presented a comprehensive generative framework for modelling cross-sectional MR data sets, which is intended to enable simultaneous morphometric analyses of brain and cervical spinal cord data.

From a theoretical perspective, such a framework relies on variational probability density estimation techniques to model the observed data (i.e. MR signal intensities). Additionally, a hierarchical modelling perspective is proposed, where observations from a population of subjects are used to construct empirical intensity priors, which can then serve to inform models of new data.

Shape modelling is performed via groupwise diffeomorphic registration, thus ensuring bijective (i.e. one-to-one) differentiable mappings between anatomical configurations (Miller, 2004). Such an approach enables a rigorous mathematical encoding of anatomical shapes via deformable template matching (Christensen et al., 1996), therefore providing a quantitative framework for the analysis of shape variation and covariation.

Data for training the method was collected from three different databases, two of which are publicly accessible to the research community. Results of validation experiments performed both on training and unseen test data indicate that the presented framework is suitable to perform integrated brain and cervical cord computational morphometrics.

Thus, the proposed algorithm represents a concrete solution to extract volumetric and morphometric information from large structural neuroimaging data sets, in a fully automated manner. At the same time it provides outputs that could be readily interpreted, for instance via statistical hypothesis testing, with the ultimate goal of comparing different populations, treatment effects etc.

(Ashburner and Friston, 2000).

Appendix A Derivatives of the lower bound with respect to the affine parameters

The affine parameters, for each subject , can be estimated (i.e. optimised) in a Gauss-Newton fashion, so as to maximise of the following objective function


with respect to .

The gradients and Hessians, which are useful to solve this problem are reported below. In particular, for the matching term, the following derivatives need to be computed


where is defined as






Gradients and Hessians of the penalty term are instead given by


Appendix B Derivatives of the lower bound with respect to the initial velocities

Optimisation of the initial velocities, for each image , requires maximising the following objective function


with respect to .

Here, we report the first and second derivatives of this objective function, which are useful to solve the registration problem using gradient-based techniques, such as the Gauss-Newton algorithm.

The gradient of the matching term with respect to is given by


which, making use of , can be rewritten as


where is computed, at each voxel , by


and indicates the Jacobian matrix of .

An approximated positive semidefinite Hessian of can instead be computed by discarding the second derivatives of the logarithm of tissue priors


to give