Cortical surface registration using unsupervised learning

04/09/2020 ∙ by Jieyu Cheng, et al. ∙ Harvard University 0

Non-rigid cortical registration is an important and challenging task due to the geometric complexity of the human cortex and the high degree of inter-subject variability. A conventional solution is to use a spherical representation of surface properties and perform registration by aligning cortical folding patterns in that space. This strategy produces accurate spatial alignment but often requires a high computational cost. Recently, convolutional neural networks (CNNs) have demonstrated the potential to dramatically speed up volumetric registration. However, due to distortions introduced by projecting a sphere to a 2D plane, a direct application of recent learning-based methods to surfaces yields poor results. In this study, we present SphereMorph, a diffeomorphic registration framework for cortical surfaces using deep networks that addresses these issues. SphereMorph uses a UNet-style network associated with a spherical kernel to learn the displacement field and warps the sphere using a modified spatial transformer layer. We propose a resampling weight in computing the data fitting loss to account for distortions introduced by polar projection, and demonstrate the performance of our proposed method on two tasks, including cortical parcellation and group-wise functional area alignment. The experiments show that the proposed SphereMorph is capable of modeling the geometric registration problem in a CNN framework and demonstrate superior registration accuracy and computational efficiency.



There are no comments yet.


page 10

page 12

page 13

page 19

page 21

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Non-rigid shape registration is an important area of research in medical imaging, in particular for establishing cross-subject spatial correspondence in the cerebral cortex. This type of spatial alignment has been shown to improve the statistical power of group functional MRI (fMRI) analysis Van Atteveldt et al. (2004); Frost and Goebel (2012) resulting from the improved correspondence of functional areas. Due to the geometric complexity of the cortex and the large variability between individuals, cortical surface registration remains a challenging task. Inter-subject surface alignment is commonly driven by geometric features that describe measures of cortical shape (folding), such as sulcal depth or local curvature Fischl et al. (1999); Yeo et al. (2010); Conroy et al. (2013); Tardif et al. (2015).

A widely used cortical surface registration approach is to map the surface onto the unit sphere in order to perform computations in this canonical domain. Existing efforts are mainly focused on the adaptation of registration algorithms in the Euclidean space Fischl et al. (1999); Yeo et al. (2010); Robinson et al. (2014). These aim to optimize a similarity metric between the target and the deformed source volumes, regularized by various energies Sotiras et al. (2013). FreeSurfer Fischl et al. (1999)

registers an individual surface to a probabilistic atlas computed from a representative set of subjects by minimizing the squared difference between the average convexity across subjects and that of the individual, weighted by the inverse variance of the convexity across subjects. The Multimodal Surface Matching tool 

Robinson et al. (2014)

uses a similarity between the input and reference mesh features in a coarse-to-fine manner. The deformation is driven by aligning local patches around control points, i.e. vertices of a low resolution mesh, and then propagated to the high resolution input mesh via interpolation. Spherical Demons 

Yeo et al. (2010) modifies the classical Demons method Thirion (1998)

using velocity vectors tangent to the sphere. The two-step optimization of classical Demons also holds for the spherical case in which the second step handles the deformation regularization by spherical thin plate spline interpolation. To encourage desirable mathematical properties such as invertibility, diffeomorphic transforms have seen extensive methodological development, yielding state-of-the-art tools 

Ashburner (2007); Zhang et al. (2017). Unfortunately, since these methods solve an optimization problem for each image pair, they often exhibit long execution times.

High computational costs have led to an increase in the popularity of supervised Krebs et al. (2017); Sokooti et al. (2017); Yang et al. (2016) and unsupervised Balakrishnan et al. (2019); Dalca et al. (2018, 2019) learning-based registration algorithms. Considering the difficulty of establishing ground truth spatial correspondences, supervised methods require predictions from existing algorithms Yang et al. (2016), simulations Sokooti et al. (2017), or both Krebs et al. (2017)

. In contrast, unsupervised methods make use of Spatial Transformer Networks (STN) 

Jaderberg et al. (2015) to warp the moving image in a differentiable way, enabling end-to-end training Balakrishnan et al. (2019); Dalca et al. (2019); Jason et al. (2016); Krebs et al. (2019); de Vos et al. (2019). Some unsupervised methods Balakrishnan et al. (2019); Dalca et al. (2019); Krebs et al. (2019) model a stationary velocity field as latent variables representing deformations in a generative probabilistic model. They use a scaling and squaring layer Arsigny et al. (2006) for the Lie group exponentiation of the velocity field to generate diffeomorphic transforms, thus guaranteeing topology preservation. These methods have demonstrated high quality performance in registering various types of medical images. Therefore, we build on these concepts when working with surfaces.

Recent studies have developed geometric convolutional neural networks (CNN) Su and Grauman (2017); Cohen et al. (2018); Coors et al. (2018); Seong et al. (2018); Jiang et al. (2019) that operate on a spherical manifold to solve classification and detection tasks. To address the distortions introduced by projecting signals to a planar image, regular convolutions with increased kernel sizes near polar regions have been utilized Su and Grauman (2017). Spherical CNNs encode rotational instead of translational equivariance into the network in Euclidean space to solve classification problems Cohen et al. (2018). In SphereNet, the convolution kernel on the sphere has been approximated by encoding the vertex neighborhood information on 2D tangent planes, which enables adapting existing CNN architectures to the omnidirectional setup for object detection and classification tasks Coors et al. (2018). Geometric CNNs (gCNN) Seong et al. (2018) also deal with convolution and pooling operations of a CNN on a mesh surface. However, they have so far only been tested on sex classification, using cortical thickness images. A convolutional kernel discretized by an unstructured mesh was recently proposed and evaluated on spherical MNIST classification and 3D object detection tasks Jiang et al. (2019). Most existing work, however, focuses on the construction of spherical convolutional kernels and, to the best of our knowledge, neural networks have not yet been extended to surface registration. Compared with classification or detection tasks, besides convolution and pooling operations on spheres, a learning-based registration method should address local deformations defined on spheres. However, existing spatial transformation networks Esteves et al. (2018); Tai et al. (2019) for spheres only address global deformations, which are not suitable for accommodating nonlinear deformation fields.

In this paper, we propose a diffeomorphic framework combining a generative model for surfaces with CNNs to register individual cortical surfaces to an atlas space. This framework adapts conventional VoxelMorph for registering Euclidean images Dalca et al. (2019)

to spherical manifolds. In order to address the limitations of 2D planar projection, we construct a weighted neighborhood graph defined on 2D grids, which accounts for the non-uniform metric tensor of the spherical representation to encode a stationary velocity field. Considering that the 2D projection operation samples the arc-length for each latitude to the same number of points, we also take sampling distortion into account at different latitudes in the likelihood model. We quantify the performance of our framework through two applications: the generation of cortical parcellations and the alignment of functional activations. The experimental results demonstrate that our framework yields better registration accuracy to state-of-the-art classical methods at a significantly reduced computational cost, and more accurate results compared to current learning-based methods.

The remainder of this paper is organized as follows. We first introduce the cortical registration problem, review the conventional VoxelMorph Dalca et al. (2019) framework, and propose our method and network structure. We then describe two evaluation experiments including cortical parcellation and functional alignment, and show results for these experiments. Finally, we present our discussions and conclusions.

2 Methods

Numerous studies using FreeSurfer have demonstrated its efficacy in spherical-based cortical registration. We build on ideas for our surface representation from the FreeSurfer spherical registration Fischl et al. (1999) and model the unsupervised learning structure for the registration field following VoxelMorph Dalca et al. (2019).

2.1 Registration problem definition

In the FreeSurfer spherical registration pipeline Fischl et al. (1999), surface geometry is encoded as a convexity attribute at each mesh vertex and the representation of the atlas surface is computed from a group of adult subjects. In order to register the surfaces of an individual to the atlas space, first a white matter mesh is generated and mapped to the unit sphere by minimizing metric distortion Fischl et al. (1999). Next, an optimal rotational alignment is computed by global search over two rotation angles on the sphere. Then a 2D canonical warp is computed to align the subject’s convexity pattern with that of the mean pattern encoded in the atlas, by minimizing the mean squared difference, weighted by the inverse of the atlas variance. Our goal is to compute this canonical warp using a CNN framework.

Let be the unit sphere and the corresponding scalar field over the sphere (e.g. sulcal depth or curvature) projected into 2D longitude/latitude parameterization. Let be the atlas mean image as defined in FreeSurfer, and and be the number of image rows and columns. Let be a diagonal matrix where each diagonal element denotes the variability of the corresponding feature at a particular vertex as defined in the FreeSurfer atlas variance image. The goal is to find the spatial transformation given and

that maximizes the a posteriori probability of the transform assuming certain smoothness priors on the warp.

2.2 VoxelMorph

We assume a diffeomorphic deformation based on a stationary velocity field , denoted as , and adapt a generative probabilistic model following VoxelMorph Dalca et al. (2019)

. VoxelMorph uses Maximum a Posteriori (MAP) estimation to obtain the most likely velocity field

at each voxel/pixel given a pair of images. VoxelMorph models the prior probability of

as a zero-mean multivariate normal distribution,

, where is the covariance matrix. An individual image can be estimated by warping the FreeSurfer atlas, thus we model the warped image as , where

is the inverse transformation of the atlas warping to the individual. The aim is to maximize the posterior probability

, where the marginalization over is intractable. In this case, a variational approximation is adopted with parameters , by minimizing its dissimilarity, Kullback–Leibler (KL) divergence, with the true posterior probability. For simplicity, the approximate posterior is restricted to a multivariate normal distribution where and a diagonal are functions estimated with a U-Net core Ronneberger et al. (2015), as shown in Fig. 2.

Using the above assumptions, maximizing the posterior probability can be approximated by minimizing the following loss:


where is short for . The first term describes the reconstruction loss and the second term is a KL divergence term, encouraging the estimated posterior probability to be close to the prior . VoxelMorph encourages the smoothness of the velocity field by setting , where the parameter controls the scale of the velocity field and is the graph Laplacian matrix defined on the Euclidean grid. is computed as , where is the neighborhood adjacency matrix and is the graph degree matrix. Thus, Eq. (1) can be rewritten as:


where is the number of samples used to approximate the expectation. We use . We treat the fixed atlas and the warped individual image as vectors. We denote this naive application of the registration of 2D projected images as the 2D VoxelMorph method and it serves as a benchmark in our experiments.

Unfortunately, the 2D projection step introduces two main problems, as shown in Fig. 1:

  • varying level of distortions with different latitudes (distortion increases from the equator to the poles); and

  • inability to represent the periodic property of and the geometry of the poles (an enclosed spherical surface is projected onto a rectangular image region, introducing discontinuities at the image borders).

Hence, the 2D VoxelMorph method over-weights the alignment for near-pole regions, yielding misalignment in most regions even compared to global rigid registration as shown in Section 3.

Figure 1: Illustration of a spherical mesh and its corresponding 2D rectangular grids by planar projection. and denote the longitude and latitude respectively. Regions with higher latitude, e.g. red dots here, have denser samplings compared to regions with lower latitude (e.g. blue dots), yielding greater distortions. The regions crossing (green line) become non-adjacent due to the cutting and flattening operations.

2.3 Proposed Method: SphereMorph

To address the above issues, we propose SphereMorph. We start by defining the registration problem in the spherical domain. The spherical representation of an individual’s surface is first rotated for a rigid alignment with the atlas, as in FreeSurfer. The spherical surface is parameterized by the longitude and latitude and sampled to an two-dimensional image with a geometric or functional feature, e.g. convexity, assigned as a pixel intensity measure.

Figure 2: The proposed cortical surface registration workflow (SphereMorph). The input spherical representation is first rigidly aligned to the atlas using convexity patterns and then projected onto a polar coordinate system. The Spherical U-Net core takes parameterized input image and atlas mean image as inputs and estimates the distribution, and , of the velocity field , which is then sampled and integrated using scaling and squaring steps to generate the deformation field .

Prior correction. We assume that the displacement field is smooth on the sphere considering the anatomical continuity via a graph Laplacian regularizer. We define a neighbour connectivity graph on the spherical manifold and represent the velocity with respect to Cartesian coordinates as a signal defined on this graph. Let denote the conversion from polar to Cartesian coordinates, that is , then the geodesic velocity at each vertex is given by .

Each vertex on the projected image is considered as a node and each grid edge connecting two adjacent nodes as their edge in . We connect leftmost and rightmost nodes due to the periodicity in longitude. The weight of the connection between vertices in varies with location to account for the horizontal edge distance on the spherical surface which is proportional to . Thus, we define the weight of each grid edge connecting vertices , as:


We construct the corresponding neighborhood adjacency matrix with entries and the degree matrix with diagonal entries . Finally, we denote the Laplacian of this weighted graph as and define the covariance of geodesic velocity as . Intuitively, this formulation can be seen as increasing the regularization near the poles, where the Euclidean distance between mesh nodes is small, and decreasing the weighting near the equator where the Euclidean distances are larger.

Distortion correction. For VoxelMorph, which deals with Euclidean image registration, the sampling of grid points is equally distributed. However, a spherical parameterization leads to denser sampling grids for regions at higher latitudes as shown in Fig. 1. Thus, we assign mesh locations from these regions lower weights in computing the data-fitting term, by introducing a diagonal matrix with each diagonal entry encoding the resampling weight for each vertex and model . The first data-fitting term in Eq. (1) is then modified as:


Loss function. Starting with Eq. (1) and taking into account the spherical geometry, we arrive at the below objective function:


where and . The first term in Eq. (5) is the data-fitting term, which encourages matching surfaces after warping and the second term drives the posterior to approximate the smoothness prior defined on a spherical grid.

Network structure. Figure 2 illustrates the individual stages of our pipeline. For a given vertex at location , we utilize inverse gnomonic projection, which maps points on the tangent plane to the spherical surface as in SphereNet Coors et al. (2018), to obtain the corresponding locations on the projected image for the neighbor vertex on its tangent plane. We implement the convolution and pooling operations in each local tangent patch shown in Figure 1 and build a UNet core Ronneberger et al. (2015), which contains four downsampling and four upsampling layers. Following the sampling layer, seven scaling and squaring operators take the layer output, or velocity field, and return a diffeomorphism . 2D VoxelMorph uses a dense spatial transformer layer on after displacement to retrieve the warped image while SphereMorph warps the image by computing the interpolation grids as

for transformer layer. The model is implemented in Keras with a Tensorflow backend and the ADAM optimizer as part of the VoxelMorph package. We conducted all the experiments on the same workstation with Intel Xeon X5550@2.67GHz and used NVIDIA Tesla P40C for all CNN-based methods.

3 Experimental setup

To demonstrate the accuracy and efficiency of the proposed registration framework, SphereMorph, we used two sets of experiments, cortical parcellation and fMRI group analysis, on two independent test data sets.

3.1 Data

3.1.1 Training data set:

We used the surface convexity maps of the left hemispheres of 800 randomly selected FreeSurfer-processed subjects from the publicly available Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset Mueller et al. (2005) as training data and the spherical atlas from FreeSurfer as the fixed image in our model. The ADNI dataset consists of longitudinal T1-weighted scans from 836 subjects that are divided into four classes: elderly controls (n = 252), early mild cognitive impairment (eMCI, n =215), late MCI (lMCI, n = 176), and AD (n = 193). The subjects were scanned on average 4.8 times (minimum: a single time; maximum: 11 times; 4013 scans in total), with a mean interval between scans equal to 286 days (minimum: 23 days, maximum: 1567 days). The mean age at baseline of the subjects was years. Since the ADNI project spans multiple sites, different scanners were used to acquire the images; further details on the acquisition can be found at

3.1.2 Test data sets:

(1) We used FreeSurfer-processed MRI scans of 39 subjects from a cohort recruited by the Washington University Alzheimer’s Disease Research Center (ADRC) Van Horn et al. (2001). The MRI scans were acquired on a 1.5T Vision system (Siemens, Erlangen Germany). T1-weighted magnetization-prepared rapid gradient echo (MP-RAGE) scans were obtained according to the following protocol: two sagittal acquisitions, FOV = 224, Matrix = , Resolution = , TR = 9.7 ms, TE = 4 ms, Flip angle = 10, TI = 20 ms, TD = 200 ms. Two acquisitions were averaged together to increase the contrast-to-noise ratio. For the cortical parcellation experiment, we separated the data set into two: 9 validation subjects and 30 held-out test subjects. All subjects have 34 cortical areas manually annotated Desikan et al. (2006), making them ideal for evaluating registration accuracy.

(2) Additionally, we used another set of 100 unrelated young and healthy subjects from the Human Connectome Project (HCP) Van Essen et al. (2013) as a second test set. The HCP project used state-of-the-art fMRI hardware and acquisition parameters in a sample of highly educated, healthy subjects. For each subject, seven task fMRI sessions were collected, including working memory, gambling, motor, language, social cognition, relational processing and emotional processing, totaling 48:30 of fMRI data. The acquisition parameters and minimal preprocessing of these data have been described extensively elsewhere Glasser et al. (2013); Barch et al. (2013).

3.2 Baselines

We compared our proposed registration method to rigid registration on the sphere (i.e. two rotations) and four other nonlinear registration methods: a 2D version of VoxelMorph, Multimodal Surface Matching (MSM) Robinson et al. (2014), Spherical Demons (SD) Yeo et al. (2010), and FreeSurfer (FS) spherical registration Fischl et al. (1999)

. We trained 2D VoxelMorph using the same training set as described above and selected the hyperparameter

that yielded the best performance on the validation set. MSM is a surface-based registration approach that offers significant flexibility with regards to the set of features that are used to drive the spatial alignment. MSM drives the deformation via aligning local patches around control points in a multi-resolution fashion. It is implemented on CPU using a fast, multi-resolution, discrete optimisation scheme, offering significant computational speed-up compared to other classical methods. We ran MSM over three resolution levels with five iterations per level. Spherical Demons, a fast diffeomorphic landmark-free surface registration tool implements the regularization for its objective function via iterative Gaussian smoothing. We explored a range of smoothing iteration numbers (5, 10, 15, 20) to optimize performance and used the results of 10 iterations.

3.3 Evaluation

To evaluate registration accuracy, we relied on the resulting spatial transformations to project the atlas parcellation back to individual scan space. For an accurate registration solution, the test subject’s cortical parcellations will resemble the manually outlined versions. In order to quantify how well they match, we computed the Dice overlap coefficient Dice (1945), the overall Mean Minimum Distance (MMD) as well as the individual MMD measures for each anatomical region. The Dice overlap coefficient, , quantifies the surface area overlap () between manual () and automatic method-generated parcels () and the MMD describes the discrepancy between the parcellation boundaries: where denotes the Euclidean distance between a vertex on a manual boundary and its corresponding closest vertex on an automatic method-generated boundary. Additionally, we also tested how consistently the various registration methods aligned anatomical features, such as convexity/sulcal depth and curvature.

In the second set of experiments, we mapped all individuals within the group to the HCP’s 2mm standard grayordinates space Glasser et al. (2013), using the displacement field, then computed group maps of task-evoked activations. To evaluate the quality of the group alignment quantitatively, we computed average correlations between the group-average and the projected individual activation maps across 86 task contrasts derived from the seven fMRI tasks.

4 Results

4.1 Computational Efficiency

Table 1 summarizes registration accuracy and computation time for all methods. It illustrates that the proposed method ran approximately 20 times faster than the conventional registration method in FreeSurfer. On a CPU, the default FreeSurfer pipeline takes around 13 minutes to complete the spherical registration while the total computation time of our proposed framework is approximately 0.65 minutes, including the initial alignment, deep network deployment, and displacement field mapping. Compared with other registration methods including MSM and Spherical Demons (SD), CNN-based methods provide more than an order of magnitude improvement in execution speed.

Figure 3: Cortical parcellation results for two example subjects from the ADRC dataset. For each subject, the upper rows show the cortical parcellation estimated by different registration methods and the lower rows give a closer view of the parcellation comparison in the lateral occipital region (white arrows) with manual parcellation boundary superimposed.
Figure 4: Boxplots of Dice overlap coefficients for all the 34 anatomical structures computed with respect to 2D VoxelMorph, MSM, Spherical Demons, FreeSurfer spherical registration and SphereMorph.
Method Dice overall MMD () CPU time () GPU time ()
Rigid -
2D VoxelMorph
Spherical Demons -
FreeSurfer -
Table 1: Overview of registration accuracy for rigid alignment, 2D VoxelMorph, MSM, Spherical Demons, FreeSurfer spherical registration, and our proposed method SphereMorph. SphereMorph performs in a comparable manner with FreeSurfer, while being roughly 20 times faster.
Figure 5:

Sulcal depth alignment generated by rigid, MSM, Spherical Demons and SphereMorph. The maps show group mean and standard deviation of convexity from 30 testing subjects.

Figure 6: Sulcal depth alignment generated by MSM, Spherical Demons, FreeSurfer and SphereMorph for the 100 unrelated subjects in our second test data set. The top panel shows group-average maps and the bottom panel shows group standard deviations.

4.2 Cortical Parcellation experiment

4.2.1 Parcellation Accuracy

Figure 3 shows representative cortical segmentation results from all the methods for two test subjects from the ADRC dataset. The 2D VoxelMorph-estimated annotation exhibits large differences compared to the ground truth in the lateral occipital regions (marked by white arrows), while FreeSurfer and SphereMorph provide results close to the manual annotations.

Table 1 provides an overview of registration accuracy by comparing manual annotations to parcellations generated by the different registration methods, including rigid alignment, 2D Voxelmorph, MSM, SD, FreeSurfer as well as our proposed method. Our proposed method and FreeSurfer achieved the highest accuracy. SphereMorph yields significantly higher overall Dice coefficients than MSM after performing a one-tail Wilcoxon rank sum test on their respective Dice coefficients (

). The deep learning baseline, 2D VoxelMorph, performed significantly worse. This is due to the fact that this method does not account for the distortions intrinsic to the spherical coordinate system. Figure 

4 compares parcel-wise Dice overlap coefficient values associated with the different registration methods. Our method produced higher mean Dice overlap coefficients than MSM for all structures except the Entorhinal, Paracentral and Middletemporal, and showed significant improvement in regional Dice overlap coefficient compared to MSM in the Temporalpole (), Parahippocampal (), Transversetemporal (), Caudalmiddlefrontal (), Rostralmiddlefrontal () and Lingual () regions. Compared to Spherical Demons, SphereMorph produced higher mean regional Dice coefficients than SD in 26 out of total 34 regions.

4.2.2 Group average sulcal maps

To evaluate the performance on a finer scale, we also computed the group mean sulcal depth maps after registration for the test data. The resulting mean and standard deviation maps are displayed in Figure 5. We assume that a better group alignment leads to a sharper group mean and smaller group variation. As expected, the group mean maps provide more detailed information for all nonlinear registration methods than rigid alignment. Moreover, all these methods show smaller standard deviations, suggesting that they provide better alignment in convexity.

All nonlinear registration methods exhibited similar distributions of sulcal variations across brain regions. Specifically, the pre-central, post-central and insula regions show lower standard deviation, suggesting higher agreement of cross-subject convexity in these regions.

4.2.3 Robustness Analysis

We investigated the impact of pole positioning on the registration accuracy for SphereMorph by projecting the sphere using 9 different north pole locations spanning ,

and registering the corresponding 2D planar images with corresponding atlas data. We generated the deformed sphere and then re-computed all the region- and distance-based metrics. To evaluate the robustness with respect to different projection centers, we conducted analysis of variance (ANOVA) between all the computed evaluation metrics for these 9 groups. None of the ANOVA analyses (overall Dice:

, overall MMD: ) found any significant differences between the 9 groups of cortical parcellations, indicating that the registration accuracy is not sensitive to the arbitrary location of the poles.

Figure 7: Task fMRI alignment resulting from convexity-driven spatial registration using MSM, Spherical Demons, FreeSurfer and SphereMorph. The maps show group activation results from 100 subjects for the Gambling task reward contrasts (with the outline of Freesurfer-based cortical parcellations for easier interpretation). SphereMorph improves the functional alignment across subjects over MSM, particularly in the inferiorparietal and precuneus regions, indicated by the arrows.

Figure 8: Comparison in the correlation coefficients computed between the group-average and individual activation maps after registration across 86 task contrasts. Higher coefficients suggest better agreement in task activations across subjects.

Figure 9: Comparison of agreement in 86 task activations using different features. We take the ‘sulc’ and ‘curv’ atlases from FreeSurfer and compute a ‘T1/T2’ atlas from three rounds of CNN registration within group. Contrasts showing significant improvement () after incorporation of the T1/T2 feature are pointed out by arrows.

Figure 10: Task fMRI alignment driven by using different feature maps (sulcal depth and sulcal depth + T1/T2). The maps show group activation results from 100 subjects for the MOTOR task: RF contrasts, with the outline of Freesurfer-based cortical parcellations for easier interpretation. The arrows highlight the paracentral and superiorfrontal areas, where using the T1/T2 map leads to a larger region with positive response compared to using sulcal depth as the only input.

4.3 Cross-subject alignment of fMRI activation maps

4.3.1 Group average maps

Figure 6 illustrates the quality of group-wise alignment of convexity after MSM, Spherical Demons, FreeSurfer, and our method on our second test data set. Our method yielded smaller variations within the group than MSM (), just as in the case of the cortical parcellation experiments.

While we expect folding patterns to be predictive of functional areas, this relationship is variable and complex. In order to assess how well the various methods align functionally homologous regions across subjects, we computed group average functional activation maps for all 86 tasks using registration driven by sulcal depth information. Figure 7 compares group activation results from the 100 subjects for the Gambling reward contrast. Our method improves the functional alignment across subjects over MSM, particularly in the inferiorparietal and precuneus regions, indicated by the arrows. To evaluate the group alignment performance quantitatively, we computed the average correlations between the group-average and individual activation maps after registration across 86 task contrasts derived from seven tasks. Figure 8 displays average correlations for the different registration methods. For all 86 contrasts, SphereMorph resulted in significantly higher correlation coefficients than MSM (increase of and relative increase of ).

Method correlation coefficients difference with MSM relative difference ()
MSM - -
Spherical Demons
Table 2: Comparison of group functional alignment for MSM, Spherical Demons, FreeSurfer spherical registration, and our proposed method SphereMorph. Correlation coefficients are calculated for the average correlation coefficients of all 86 contrasts.

4.3.2 Alignment performance analysis using multi-modality features

We evaluated the performance for the proposed SphereMorph using curvature and (separately) T1/T2 features to their respective atlases, using our sulcal depth alignment as initialization. We computed a ‘T1/T2’ atlas from three rounds of CNN registration within group. Figure 9 compares the group agreements in task activation for only sulc and two cascade processes using respective curvature and T1/T2 maps. Using T1/T2 as input post sulcal depth-based registration significantly improves the group agreement level () in four MOTOR tasks, including RF, RF-AVG, neg-RF and AVG-RF, as indicated by the arrows. All four tasks are related to right finger tapping. Figure 10 displays the group average activation for RF constrast. Using the T1/T2 map leads to a larger region with positive response in the paracentral and superiorfrontal areas (marked by arrows) compared to using sulcal depth as the only input.

5 Discussions and Conclusions

In this paper we present a learning-based method, SphereMorph, for registering cortical surfaces and investigate its performance using two sets of experiments, by comparing it to rigid alignment, 2D VoxelMorph, MSM, Spherical Demons, and FreeSurfer. The proposed SphereMorph yields results that are comparable or superior to the state-of-the-art methods for alignment of folding patterns, cortical parcellation and functional alignment, while offering approximately a computational speedup, showing the accuracy and efficiency of our method for cortical registration.

SphereMorph takes 2D parameterized images as input, outputs the deformation field in a 2D canonical space, then warps the sphere in the original Cartesian space. We directly address two issues associated with the 2D projection: the effects of substantial distortions introduced by the parameterization and the violation of continuity at the borders of the 2D plane (i.e. the imposition of spherical topology). To account for the distortion introduced by the parameterization, we modify the data likelihood term and construct the deformation velocity on a graph weighted by the metric tensor of the parameterization. We use the same weights as in existing work by Khasanova and Frossard Khasanova and Frossard (2017) to construct our graph as it has been shown to be capable of encoding the geometry of an omnidirectional camera in the final feature representation of an image. In the implementation of network structure, we encode the neighborhood information by leveraging a spherical kernel defined in SphereNet Coors et al. (2018) instead of a conventional 2D kernel for the network convolution and pooling operations. In addition, we modify the spatial transformer layer to represent the periodicity of longitude in spherical coordinates. Combining these adaptations, our model shows a higher agreement with manual parcellations compared to 2D VoxelMorph, which does not account for distortions and topological changes induced by the 2D parameterization. For 2D VoxelMorph, each point on the rectangular grid contributes equally to the registration, resulting in the alignment of regions near the poles affecting the energy functional more than other regions with the same area in the Euclidean embedding space. Thus, the optimized deformation is over-fitted in these regions. Our experimental results confirm this when comparing the automatically generated parcellations to the manual ones. With an excessive weighting of the alignment of these polar regions, 2D VoxelMorph performs even worse than the initial alignment.

Our proposed registration framework is applicable to any signal or feature that can be represented or sampled onto the cortical surface. However, in the cortical parcellation experiment, the ADRC dataset only has structural scans. Hence, we trained our model as well as MSM and SD using convexity values to drive the registrations. Our proposed SphereMorph shows significant smaller group variations and higher overall Dice coefficients when compared to MSM, suggesting SphereMorph achieves better structural alignment than MSM.

In addition to the accuracy of the structural alignment, when using the same feature (i.e. convexity) for registration, both qualitative and quantitative evaluation results for the task-related experiments demonstrate that the proposed method also yields higher agreement of functional regions across subjects than the current registration method in the Human Connectome Project (HCP) pipeline Glasser et al. (2013). SphereMorph improves the within-group correlation coefficients significantly for all tasks compared with MSM. The group agreement in convexity is also higher after SphereMorph registration than with MSM. This implies the relationship between cortical folding patterns and boundaries of functional areas as demonstrated in previous work Hinds et al. (2008). However, using folding-based features alone may not be sufficient to provide an accurate functional alignment across the entire cortex due to regions with highly variable folding patterns across subjects, as well as structure-function variability. Thus, in the functional alignment experiments, we take the sulcal depth registration as initialization and then use cortical T1/T2 maps as input of our network for a multi-step alignment. The cortical T1/T2 maps are computed based on the ratio of T1-weighted to T2-weighted images which correlates with many functionally distinct areas in individual subjects Glasser and Van Essen (2011). The combination of T1/T2 and sulcal depth shows significant improvement in group alignment of MOTOR-related tasks, indicating the correlation between T1/T2 and MOTOR-related regions.

Our proposed model can include other contrasts in addition to the convexity metrics used above to drive the registration. Existing studies have shown the improvement in group registration accuracy relying on features generated from resting-state functional MRI (rfMRI) after conventional convexity-driven registration  Robinson et al. (2014); Tong et al. (2017). In the future, we will investigate the use of these to further improve the accuracy of our technique.

6 Acknowledgement

Support for this research was provided in part by the BRAIN Initiative Cell Census Network grant U01MH117023, the National Institute for Biomedical Imaging and Bioengineering (P41EB015896, 1R01EB023281, R01EB006758, R21EB018907, R01EB019956, 5R03EB022754-02), the National Institute on Aging (1R56AG064027, 5R01AG008122, R01AG016495), the National Institute of Mental Health 5U01MH109589-04, the National Institute of Diabetes and Digestive and Kidney Diseases (1-R21-DK-108277-01), the National Institute for Neurological Disorders and Stroke (R01NS0525851, R21NS072652, R01NS070963, R01NS083534, 5U01NS086625, 5U24NS10059103, R01NS105820), and was made possible by the resources provided by Shared Instrumentation Grants 1S10RR023401, 1S10RR019307, and 1S10RR023043. Additional support was provided by the NIH Blueprint for Neuroscience Research (5U01-MH093765), part of the multi-institutional Human Connectome Project. In addition, BF has a financial interest in CorticoMetrics, a company whose medical pursuits focus on brain imaging and measurement technologies. BF’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.


  • V. Arsigny, O. Commowick, X. Pennec, and N. Ayache (2006) A log-Euclidean framework for statistics on diffeomorphisms. In MICCAI, pp. 924–931. Cited by: §1.
  • J. Ashburner (2007) A fast diffeomorphic image registration algorithm. NeuroImage 38 (1), pp. 95–113. Cited by: §1.
  • G. Balakrishnan, A. Zhao, M. Sabuncu, J. Guttag, and A. V. Dalca (2019) VoxelMorph: a learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging 38, pp. 1788–1800. Cited by: §1.
  • D. M. Barch, G. C. Burgess, M. P. Harms, S. E. Petersen, B. L. Schlaggar, M. Corbetta, M. F. Glasser, S. Curtiss, S. Dixit, C. Feldt, et al. (2013) Function in the human connectome: task-fMRI and individual differences in behavior. NeuroImage 80, pp. 169–189. Cited by: §3.1.2.
  • T. S. Cohen, M. Geiger, J. Köhler, and M. Welling (2018) Spherical cnns. In International Conference on Learning Representations (ICLR), Cited by: §1.
  • B. R. Conroy, B. D. Singer, J. S. Guntupalli, P. J. Ramadge, and J. V. Haxby (2013) Inter-subject alignment of human cortical anatomy using functional connectivity. NeuroImage 81, pp. 400–411. Cited by: §1.
  • B. Coors, A. Paul Condurache, and A. Geiger (2018) SphereNet: learning spherical representations for detection and classification in omnidirectional images. In

    Proceedings of the European Conference on Computer Vision (ECCV)

    pp. 518–533. Cited by: §1, §2.3, §5.
  • A. V. Dalca, G. Balakrishnan, J. Guttag, and M. R. Sabuncu (2018) Unsupervised learning for fast probabilistic diffeomorphic registration. MICCAI 11070, pp. 729–738. Cited by: §1.
  • A. V. Dalca, G. Balakrishnan, J. Guttag, and M. Sabuncu (2019) Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces. Medical Image Analysis 57, pp. 226–236. Cited by: §1, §1, §1, §2.2, §2.
  • B. D. de Vos, F. F. Berendsen, M. A. Viergever, H. Sokooti, M. Staring, and I. Išgum (2019) A deep learning framework for unsupervised affine and deformable image registration. Medical image analysis 52, pp. 128–143. Cited by: §1.
  • R. S. Desikan, F. Ségonne, B. Fischl, B. T. Quinn, B. C. Dickerson, D. Blacker, R. L. Buckner, A. M. Dale, R. P. Maguire, B. T. Hyman, et al. (2006) An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage 31 (3), pp. 968–980. Cited by: §3.1.2.
  • L. R. Dice (1945) Measures of the amount of ecologic association between species. Ecology 26 (3), pp. 297–302. Cited by: §3.3.
  • C. Esteves, C. Allen-Blanchette, X. Zhou, and K. Daniilidis (2018) Polar transformer networks. In International Conference on Learning Representations (ICLR), Cited by: §1.
  • B. Fischl, M. I. Sereno, R. B. Tootell, and A. M. Dale (1999) High-resolution intersubject averaging and a coordinate system for the cortical surface. Human Brain Mapping 8 (4), pp. 272–284. Cited by: §1, §1, §2.1, §2, §3.2.
  • M. A. Frost and R. Goebel (2012) Measuring structural–functional correspondence: spatial variability of specialised brain regions after macro-anatomical alignment. NeuroImage 59 (2), pp. 1369–1381. Cited by: §1.
  • M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, et al. (2013) The minimal preprocessing pipelines for the human connectome project. NeuroImage 80, pp. 105–124. Cited by: §3.1.2, §3.3, §5.
  • M. F. Glasser and D. C. Van Essen (2011) Mapping human cortical areas in vivo based on myelin content as revealed by T1- and T2-weighted MRI. Journal of Neuroscience 31 (32), pp. 11597–11616. Cited by: §5.
  • O. P. Hinds, N. Rajendran, J. R. Polimeni, J. C. Augustinack, G. Wiggins, L. L. Wald, H. D. Rosas, A. Potthast, E. L. Schwartz, and B. Fischl (2008) Accurate prediction of v1 location from cortical folds in a surface coordinate system. NeuroImage 39 (4), pp. 1585–1599. Cited by: §5.
  • M. Jaderberg, K. Simonyan, A. Zisserman, et al. (2015) Spatial transformer networks. In Advances in Neural Information Processing Systems, pp. 2017–2025. Cited by: §1.
  • J. Y. Jason, A. W. Harley, and K. G. Derpanis (2016) Back to basics: unsupervised learning of optical flow via brightness constancy and motion smoothness. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–10. Cited by: §1.
  • C. Jiang, J. Huang, K. Kashinath, P. Marcus, M. Niessner, et al. (2019) Spherical CNNs on unstructured grids. In International Conference on Learning Representations (ICLR), Cited by: §1.
  • R. Khasanova and P. Frossard (2017) Graph-based classification of omnidirectional images. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 869–878. Cited by: §5.
  • J. Krebs, H. Delingette, B. Mailhé, N. Ayache, and T. Mansi (2019) Learning a probabilistic model for diffeomorphic registration. IEEE Transactions on Medical Imaging 38 (9), pp. 2165–2176. Cited by: §1.
  • J. Krebs, T. Mansi, H. Delingette, L. Zhang, F. C. Ghesu, S. Miao, A. K. Maier, N. Ayache, R. Liao, and A. Kamen (2017) Robust non-rigid registration through agent-based action learning. In MICCAI, pp. 344–352. Cited by: §1.
  • S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. R. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, and L. Beckett (2005) Ways toward an early diagnosis in Alzheimer’s disease: the alzheimer’s disease neuroimaging initiative (ADNI). Alzheimer’s & Dementia 1 (1), pp. 55–66. Cited by: §3.1.1.
  • E. C. Robinson, S. Jbabdi, M. F. Glasser, J. Andersson, G. C. Burgess, M. P. Harms, S. M. Smith, D. C. Van Essen, and M. Jenkinson (2014) MSM: a new flexible framework for Multimodal Surface Matching. NeuroImage 100, pp. 414–426. Cited by: §1, §3.2, §5.
  • O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §2.2, §2.3.
  • S. Seong, C. Pae, and H. Park (2018) Geometric convolutional neural network for analyzing surface-based neuroimaging data. Frontiers in Neuroinformatics 12, pp. 42. Cited by: §1.
  • H. Sokooti, B. de Vos, F. Berendsen, B. P. Lelieveldt, I. Išgum, and M. Staring (2017) Nonrigid image registration using multi-scale 3D convolutional neural networks. In MICCAI, pp. 232–239. Cited by: §1.
  • A. Sotiras, C. Davatzikos, and N. Paragios (2013) Deformable medical image registration: a survey. IEEE Transactions on Medical Imaging 32 (7), pp. 1153–1190. Cited by: §1.
  • Y. Su and K. Grauman (2017) Learning spherical convolution for fast features from 360 imagery. In Advances in Neural Information Processing Systems (NIPS), pp. 529–539. Cited by: §1.
  • K. S. Tai, P. Bailis, and G. Valiant (2019) Equivariant transformer networks. In

    International Conference on Machine Learning (ICML)

    Cited by: §1.
  • C. L. Tardif, A. Schäfer, M. Waehnert, J. Dinse, R. Turner, and P. Bazin (2015) Multi-contrast multi-scale surface registration for improved alignment of cortical areas. NeuroImage 111, pp. 107–122. Cited by: §1.
  • J. Thirion (1998) Image matching as a diffusion process: an analogy with maxwell’s demons. Medical Image Analysis 2 (3), pp. 243–260. Cited by: §1.
  • T. Tong, I. Aganj, T. Ge, J. R. Polimeni, and B. Fischl (2017) Functional density and edge maps: characterizing functional architecture in individuals and improving cross-subject registration. NeuroImage 158, pp. 346–355. Cited by: §5.
  • N. Van Atteveldt, E. Formisano, R. Goebel, and L. Blomert (2004) Integration of letters and speech sounds in the human brain. Neuron 43 (2), pp. 271–282. Cited by: §1.
  • D. C. Van Essen, S. M. Smith, D. M. Barch, T. E. Behrens, E. Yacoub, K. Ugurbil, W. H. Consortium, et al. (2013) The WU-Minn human connectome project: an overview. NeuroImage 80, pp. 62–79. Cited by: §3.1.2.
  • J. D. Van Horn, J. S. Grethe, P. Kostelec, J. B. Woodward, J. A. Aslam, D. Rus, D. Rockmore, and M. S. Gazzaniga (2001) The functional magnetic resonance imaging data center (fMRIDC): the challenges and rewards of large–scale databasing of neuroimaging studies. Philosophical Transactions of the Royal Society of London B: Biological Sciences 356 (1412), pp. 1323–1339. Cited by: §3.1.2.
  • X. Yang, R. Kwitt, and M. Niethammer (2016) Fast predictive image registration. In Deep Learning and Data Labeling for Medical Applications, pp. 48–57. Cited by: §1.
  • B. T. Yeo, M. R. Sabuncu, T. Vercauteren, N. Ayache, B. Fischl, and P. Golland (2010) Spherical demons: fast diffeomorphic landmark-free surface registration. IEEE Transactions on Medical Imaging 29 (3), pp. 650–668. Cited by: §1, §1, §3.2.
  • M. Zhang, R. Liao, A. V. Dalca, E. A. Turk, J. Luo, P. E. Grant, and P. Golland (2017) Frequency diffeomorphisms for efficient image registration. In IPMI, pp. 559–570. Cited by: §1.