DeepCSR: A 3D Deep Learning Approach for Cortical Surface Reconstruction

10/22/2020 ∙ by Rodrigo Santa Cruz, et al.

The study of neurodegenerative diseases relies on the reconstruction and analysis of the brain cortex from magnetic resonance imaging (MRI). Traditional frameworks for this task, like FreeSurfer, demand lengthy runtimes, while its accelerated variant FastSurfer still relies on a voxel-wise segmentation whose resolution limits its ability to capture narrow continuous objects such as cortical surfaces. With these limitations in mind, we propose DeepCSR, a 3D deep learning framework for cortical surface reconstruction from MRI. Towards this end, we train a neural network model with hypercolumn features to predict implicit surface representations for points in a brain template space. After training, the cortical surface at a desired level of detail is obtained by evaluating surface representations at specific coordinates, and subsequently applying a topology correction algorithm and an isosurface extraction method. Thanks to the continuous nature of this approach and the efficacy of its hypercolumn feature scheme, DeepCSR efficiently reconstructs cortical surfaces at high resolution, capturing fine details in the cortical folding. Moreover, DeepCSR is as accurate as, more precise than, and faster than the widely used FreeSurfer toolbox and its deep learning powered variant FastSurfer at reconstructing cortical surfaces from MRI, which should facilitate large-scale medical studies and new healthcare applications.




1 Introduction

The study of many neurodegenerative diseases and psychological disorders relies on the analysis of the cerebral cortex using magnetic resonance imaging (MRI) [32, 6, 1]. As shown in Figure 1(a), the cortex can be visually described as the inner volume between two cortical surfaces: the inner surface, which is the interface between the white matter (WM) and the gray matter (GM) tissues, and the outer surface, which is the interface between the gray matter tissue and the cerebrospinal fluid (CSF). Therefore, the goal of cortical surface reconstruction is to estimate reliable, accurate, and topologically correct [30] triangular meshes for the inner and outer cortical surfaces from a given MRI.

Reconstructing cortical surfaces from MRI is a challenging problem. First, cortical surfaces vary significantly across individuals, as exemplified in Figure 1(b) by overlaying axial slices of co-registered MRIs from 3 different patients. Second, voxels near the cortex tend to hold more than one tissue type due to the discrete nature of MRI and the folding patterns of the cortical surfaces, which is known as the partial volume effect (PVE) [2] in medical imaging. Consequently, sub-voxel variations of these surfaces cannot be captured by pure voxel-wise segmentation approaches. This limitation leads to oversmoothed reconstructed cortical surfaces even when manual expert segmentation is employed, as shown in Figure 1(c). Finally, we need to enforce the spherical topology (i.e., homeomorphic to a sphere) of the reconstructed surfaces to allow surface-based analysis [39, 34] and visualizations [7]. Figure 1(d) emphasizes that arbitrarily small defects, known as holes, can drastically change the topology of the reconstructed surface.

Figure 1: Cerebral cortex and its reconstruction challenges. (a) Depicts the brain cortex from an axial view. It also provides schematics for the partial volume effect (PVE) problem and the inaccurate boundary estimation using partial volume mapping. (b) Overlays the cortical surfaces of co-registered MRIs from 3 different individuals, exemplifying that the visual variability across individuals is like a fingerprint. (c) Shows that meshes obtained from manual segmentation provided by experts, e.g. Neuromorphometrics, cannot capture high-frequency details in the cortical surfaces. (d) Presents examples of small surface mesh defects that change the topology of the reconstructed surface.

Despite these challenges, there exist frameworks for cortical reconstruction from MRI [5, 21, 18, 40, 16, 4, 11]. Traditionally, they consist of extensive pipelines of hand-crafted image processing algorithms subject to careful hyper-parameter tuning (e.g., thresholds, iteration numbers, and convergence criteria) and very long runtimes, which prevent them from being employed in healthcare applications where immediate results are critical. For instance, the well-known FreeSurfer V6.0 [5] takes around six hours per scan depending on the quality of the input MRI. Concurrently to our work, Henschel et al. [11] propose FastSurfer, which alleviates this burden by integrating a fast and accurate deep learning based brain segmentation model into the FreeSurfer cortical surface reconstruction pipeline. However, these frameworks still rely on voxel-wise segmentation of the input MRI, which has a limited resolution, or employ partial volume mapping, which also fails to define the tissue boundaries accurately. More specifically, multiple configurations of a tissue boundary have the same partial volume assignment, as illustrated in Figure 1(a).

In this paper, we propose a 3D deep learning framework for cortical surface reconstruction from MR images named DeepCSR. More specifically, we first reformulate this problem as the prediction of an implicit surface representation for points in a continuous coordinate system. Then, the cortical surfaces are extracted using this implicit surface representation, a lightweight topological correction method, and an isosurface mesh extraction technique. Since our framework predicts implicit surface representation for real-valued points instead of voxels, providing a continuous approximation to surfaces, we can reconstruct cortical surfaces at a higher resolution than the input MR image. This continuous formalism allows us to overcome the PVE problem without relying on inaccurate estimations like partial volume mapping. We also develop a neural network architecture with hypercolumn features able to capture local information from the input MR image allowing the reconstruction of fine details describing the cortical folding.

In order to validate our approach, we first provide ablative studies of its main components. Then, we compare DeepCSR to the FreeSurfer V6.0 cross-sectional pipeline and FastSurfer for cortical reconstruction on the Test-Retest [22] and MALC [19, 23] datasets. We conclude that the proposed DeepCSR is able to reconstruct cortical surfaces faster and with higher reliability than the competitors, thus facilitating new healthcare applications.

2 Related Work

Cortical surface reconstruction.

Traditionally, cortical surface reconstruction frameworks involve a lengthy sequence of image processing techniques [5, 21, 18, 40, 16, 4]. For instance, the widely used FreeSurfer [5] performs multiple image normalization routines, linear and non-linear registration techniques, brain tissue labeling, surface fitting models, and topology correction, among other techniques, to reconstruct the cortical surfaces. The other traditional frameworks follow a similar approach, mostly differing in the mechanism used to fit the desired surfaces onto the segmented volumes. As examples, Kim et al. [16] leverage a Laplacian field with stretch and self-proximity regularization terms to prevent mesh self-intersection, and Han et al. [9] introduce a topology-preserving geometric deformable surface model. As a result, these frameworks require considerable processing time and, occasionally, manual intervention by experts to fine-tune the parameters of some of these image processing techniques [14], which limits their range of applications. In contrast, we propose a swift framework able to reconstruct cortical surfaces efficiently by leveraging modern 3D deep learning models.

Concurrently to our work, Henschel et al. [11] propose a faster variant of the FreeSurfer pipeline named FastSurfer. In that work, expensive computations of FreeSurfer are replaced with modern and lightweight alternatives such as a deep learning model for whole-brain segmentation and a spectral mesh processing algorithm for spherical mapping. However, this approach still relies on voxel-wise brain segmentation, which does not approximate continuous surfaces well due to its discrete nature, as exemplified in Figure 1(c). In contrast, we propose a 3D deep learning model that directly predicts implicit surface representations for real coordinates in the brain MRI, providing a continuous approximation to the cortical surfaces. Nonetheless, it is also important to acknowledge that FreeSurfer and FastSurfer are complete brain morphometry tools [41] providing further volumetric and surface-based analysis, which is beyond the scope of this work. Note also that recent approaches suggest that one could bypass the estimation of cortical surfaces by directly predicting averaged morphometric measurements from the MR images [33, 37].

Deep learning models for 3D reconstruction.

This line of work comprises the set of deep learning models that learn to reconstruct 3D geometries, e.g., objects and scenes, from data [12, 38]. These deep learning models can be broadly categorized according to the 3D shape representation they operate on as either mesh-based, voxel-based, or implicit surface models. Mesh-based methods exploit explicit representations like template meshes [31, 8], graphs [45, 47], and parametric 3D models [17, 27] to predict surfaces and object shapes. While these approaches are very convenient, they are also prone to produce noisy meshes with a large number of self-intersections, which is very undesirable for cortical surface reconstruction. Voxel-based methods, on the other hand, reconstruct 3D shapes by predicting 3D voxel-grids of surface representations like discretized signed distances [28] and level-sets [26]. The main limitation of these approaches is that they treat the output 3D space as a discretized grid that may not capture the fine details of the cortical surface folding. Finally, implicit surface models explore continuous representations of the 3D space where the objects are defined as 3D scalar fields [25, 28]. These approaches are memory efficient and can capture the finer details of the target geometries. Therefore, we follow this approach and develop a novel deep learning model to reconstruct cortical surfaces from MR images.

3 Method

Figure 2: DeepCSR’s neural network architecture. It receives as input an MR image and the coordinates of a point in the template coordinate system. Then, it predicts the implicit surface representation for the inner and outer cortical surfaces divided into left and right hemispheres.

In this section, we introduce DeepCSR, a deep learning framework for cortical surface reconstruction from MR images. First, we describe how to implicitly define surfaces by functions on the 3D Euclidean space, and how to learn these functions using a deep learning model. Then, we show how to perform cortical surface reconstruction using this model.

3.1 Learning Implicit Surface Representation

In order to ease the presentation, we focus on the learning of a single target surface, but this framework can be generalized to multiple surfaces by introducing mild modifications. Without loss of generality, one can implicitly define a Lipschitz surface $\mathcal{S}$ by the $k$-level set of a continuous function $f$ as,

$$\mathcal{S} = \{\mathbf{p} \in \mathbb{R}^3 \mid f(\mathbf{p}) = k\},$$

where the function $f: \mathbb{R}^3 \to \mathbb{R}$ maps every point $\mathbf{p}$ in the 3D Euclidean space to a scalar. Furthermore, given $f$, the mesh representation of $\mathcal{S}$ can be obtained by iso-surface extraction algorithms like marching cubes [20] or ray casting [44]. Therefore, reconstructing a specific surface can be thought of as modeling a function $f$ such that its $k$-level set defines the desired geometry.

For a given surface $\mathcal{S}$, the function $f$ can be modeled, for instance, using an occupancy field or a signed distance function. The former is a binary-valued function defined as,

$$O_{\mathcal{S}}(\mathbf{p}) = \mathbb{1}\left(\mathbf{p} \in \operatorname{interior}(\mathcal{S})\right),$$

where $\mathbb{1}(\cdot)$ is the indicator function evaluating to one if its predicate is true, and zero otherwise. In this case, the level set of interest to retrieve the surface is $k = 0.5$. The occupancy field divides the 3D space into the interior and exterior of the oriented surface $\mathcal{S}$. On the other hand, the signed distance function consists of the Euclidean distance between every point $\mathbf{p}$ in the 3D space and its projection, $\mathbf{p}^*$, on the surface $\mathcal{S}$. Mathematically,

$$\mathrm{SDF}_{\mathcal{S}}(\mathbf{p}) = \left(2\, O_{\mathcal{S}}(\mathbf{p}) - 1\right) \lVert \mathbf{p} - \mathbf{p}^* \rVert_2 .$$

Therefore, $\mathrm{SDF}_{\mathcal{S}}$ assigns positive distances to points in the interior of the surface $\mathcal{S}$, zero to points on the surface (i.e., $\mathbf{p} = \mathbf{p}^*$), and negative distances otherwise. Both tools are widely used in surface modeling, so we evaluate both in the context of cortical surface reconstruction.
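To make the two representations concrete, the following minimal sketch evaluates both for an analytic sphere standing in for a cortical surface. The sphere, the function names, and the test points are illustrative assumptions and are not taken from the DeepCSR implementation; the sign convention follows the text (positive inside).

```python
import math

def signed_distance_sphere(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: positive inside, zero on it, negative outside."""
    return radius - math.dist(p, center)

def occupancy_sphere(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Binary occupancy: 1 for points in the interior of the sphere, 0 otherwise."""
    return 1 if math.dist(p, center) < radius else 0

# Hypothetical query points: one inside, one on the surface, one outside.
inside, on_surface, outside = (0.0, 0.0, 0.5), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)
for p in (inside, on_surface, outside):
    print(p, signed_distance_sphere(p), occupancy_sphere(p))
```

The signed distance carries a magnitude at every point, whereas the occupancy only carries a side, which is the distinction the two representations are compared on later in the paper.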

Our final goal is to reconstruct surfaces according to an observed MR image. Rather than conditioning the function on a specific surface, one should condition it on an input 3D MR image. While this formulation seems logical, we notice that MR images may be defined in different coordinate systems, producing an unbounded input space for learning. We overcome this difficulty by co-registering the input MR images to a brain template, constraining the space where the target surfaces exist to the coordinates of the bounding box containing the template brain in the template coordinate system. Therefore, our problem boils down to modeling the function,

$$f : \Omega \times \mathcal{I} \to \mathbb{R},$$

where $\Omega \subset \mathbb{R}^3$ is a closed and bounded subspace where the template brain is defined and $\mathcal{I}$ denotes the space of MR images represented by a 3D grid of voxels.

Using a machine learning approach to model such a function, we first define a dataset $\mathcal{D} = \{(I_i, \mathcal{S}_i)\}_{i=1}^{N}$ of MR images and their corresponding surfaces, which can be obtained using traditional pipelines for automatic cortical surface reconstruction [5, 40, 4]. Then, we parametrize the function $f$ in terms of learnable parameters $\theta$ and minimize the regularized empirical risk on the dataset $\mathcal{D}$. This learning problem can be stated as,

$$\theta^* = \arg\min_{\theta} \sum_{(I, \mathcal{S}) \in \mathcal{D}} \int_{\Omega} \mathcal{L}\left(f_{\theta}(\mathbf{p}, I),\, f_{\mathcal{S}}(\mathbf{p})\right) d\mathbf{p} + R(\theta),$$

where $R(\theta)$ is some regularization function on the parameters and $\mathcal{L}$ is an appropriate cost function measuring discrepancies between the predicted implicit surface representation $f_{\theta}(\mathbf{p}, I)$ and the true implicit surface representation $f_{\mathcal{S}}(\mathbf{p})$ for the given point $\mathbf{p}$, MR image $I$, and its corresponding surface $\mathcal{S}$. In the case of the occupancy field, we minimize the binary cross-entropy classification loss, while for the signed distance function we opt for a norm-based regression loss.
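The data term of this objective can be sketched as an average of a per-point loss over sampled points. The snippet below is only an illustration: the helper names are hypothetical, the regularizer is omitted, and the absolute error used for signed distances is an assumed stand-in, since the exact regression loss is not recoverable from this text.

```python
import math

def binary_cross_entropy(prob, label, eps=1e-12):
    """Per-point classification loss for occupancy predictions in (0, 1)."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def absolute_error(pred, target):
    """Assumed per-point regression loss for signed distances (illustrative choice)."""
    return abs(pred - target)

def empirical_risk(predictions, targets, loss):
    """Average a per-point loss over sampled points (finite-sum stand-in for the integral)."""
    return sum(loss(p, t) for p, t in zip(predictions, targets)) / len(targets)

# Hypothetical predictions at two sampled points.
print(empirical_risk([0.5, 0.5], [1, 0], binary_cross_entropy))
print(empirical_risk([0.1, -0.2], [0.0, 0.0], absolute_error))
```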

In order to use this approach for cortical surface reconstruction, we extend the scalar-valued function $f_{\theta} : \Omega \times \mathcal{I} \to \mathbb{R}$ to a vector-valued function $f_{\theta} : \Omega \times \mathcal{I} \to \mathbb{R}^4$, where the predicted values are the surface representations for the inner and outer cortical surfaces further divided into left and right brain hemispheres. Consequently, the learning problem described above is solved jointly for these four cortical surfaces across all of the MRIs in the dataset $\mathcal{D}$. This formulation is an instance of multi-task learning, which often produces models with good generalization ability [36].

In practice, we approximate the integral in the learning objective by a finite sum over sampled locations in the reference space $\Omega$. We observed that the sampling scheme employed is critical to the quality of the reconstructed surface. More specifically, as shown later in Section 4.1, the proposed framework reconstructs more accurate cortical surfaces when we sample more points near the surface, in addition to sampling uniformly in the template space $\Omega$. In order to sample points near the target surface, we first sample faces proportionally to their area, then we sample points uniformly in these faces using the triangle point picking method [46]. Finally, we perturb these sampled points on the surface by adding Gaussian noise centered at zero with variance $\sigma^2$, obtaining points near the surface. Such a sampling scheme provides global information to correctly align the predicted surface to the brain and a lot of local information to capture the highly folded geometry of the brain cortex.
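The near-surface sampling steps described above can be sketched in plain Python as follows. This is a minimal illustration under assumptions: the function names, the two-triangle test mesh, and the value of `sigma` are hypothetical, and a practical implementation would vectorize the loop.

```python
import math
import random

def triangle_area(a, b, c):
    """Area of a 3D triangle via the cross product of two edges."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))

def sample_near_surface(vertices, faces, n_points, sigma, rng):
    """Sample points on a triangle mesh (area-weighted) and perturb them with Gaussian noise."""
    triangles = [[vertices[i] for i in face] for face in faces]
    areas = [triangle_area(*tri) for tri in triangles]
    samples = []
    for a, b, c in rng.choices(triangles, weights=areas, k=n_points):
        # Triangle point picking: reflect (r1, r2) back into the triangle when r1 + r2 > 1.
        r1, r2 = rng.random(), rng.random()
        if r1 + r2 > 1.0:
            r1, r2 = 1.0 - r1, 1.0 - r2
        point = [a[i] + r1 * (b[i] - a[i]) + r2 * (c[i] - a[i]) for i in range(3)]
        # Zero-mean Gaussian perturbation moves the point slightly off the surface.
        samples.append(tuple(x + rng.gauss(0.0, sigma) for x in point))
    return samples

# Hypothetical mesh: a unit square in the z = 0 plane made of two triangles.
square_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
near = sample_near_surface(square_vertices, square_faces, 5, 0.01, random.Random(0))
```

Setting `sigma` to zero returns points exactly on the mesh, which is a convenient way to check the face sampling in isolation.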

3.2 Neural Network Architecture and Hypercolumn Features

We implement the parameterized function $f_{\theta}$ as an encoder-decoder neural network. The encoder takes as input a point represented by its coordinates in the MNI105 space and a co-registered MR image as a 3D voxel grid. It first processes the input MR image with a sequence of 3D convolutional layers (Conv3D), ReLU activation functions, max pooling layers (Max3D), and fully connected layers (FC), producing multiple feature maps as indicated at the top of Figure 2. The convolutional layers use kernels with stride equal to two and padding equal to one, with the exception of the first convolutional layer, where the stride is equal to one. Our goal is to produce a rich hierarchy of visual features with different levels of detail. Then, inspired by Hariharan et al. [10], we project the input point coordinates into each feature map, linearly interpolate a feature value at these projected locations, and concatenate these interpolated values to form a 512-dimensional hypercolumn feature vector. This hypercolumn vector holds global and local visual cues for predicting the surface representation at the input point, since it collects features from different levels of our hierarchy of feature maps. As shown later in Section 4.1, this architectural design is critical to capture the high-frequency details on the target surfaces.
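The hypercolumn construction can be illustrated with a small pure-Python sketch: the point is projected into feature maps of different resolutions, a value is interpolated in each, and the values are concatenated. The single-channel nested-list volumes, the normalized coordinates in [0, 1]^3, and the function names are simplifying assumptions; the actual network uses multi-channel feature maps.

```python
def trilinear(volume, x, y, z):
    """Trilinearly interpolate volume[x][y][z] at continuous voxel coordinates.

    Assumes every axis has at least two voxels and the point lies inside the grid.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    x0, y0, z0 = (min(int(c), n - 2) for c, n in ((x, nx), (y, ny), (z, nz)))
    dx, dy, dz = x - x0, y - y0, z - z0
    value = 0.0
    for i, wx in ((0, 1 - dx), (1, dx)):
        for j, wy in ((0, 1 - dy), (1, dy)):
            for k, wz in ((0, 1 - dz), (1, dz)):
                value += wx * wy * wz * volume[x0 + i][y0 + j][z0 + k]
    return value

def hypercolumn(feature_maps, point):
    """Concatenate features interpolated at `point` from maps of different resolutions."""
    features = []
    for fmap in feature_maps:
        dims = (len(fmap), len(fmap[0]), len(fmap[0][0]))
        coords = [p * (n - 1) for p, n in zip(point, dims)]  # project into this grid
        features.append(trilinear(fmap, *coords))
    return features

# Hypothetical feature maps: a fine 3x3x3 map and a coarse 2x2x2 map.
fine = [[[float(x) for _ in range(3)] for _ in range(3)] for x in range(3)]
coarse = [[[float(x + y + z) for z in range(2)] for y in range(2)] for x in range(2)]
print(hypercolumn([fine, coarse], (0.5, 0.5, 0.5)))
```

Because the feature maps are computed once per image, only this cheap interpolation step has to be repeated for every query point.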

On the other hand, the decoder receives as input the point coordinates and the corresponding hypercolumn vector generated by the encoder for the input image, as previously described. The decoder processes these inputs through a sequence of fully connected layers, conditional batch normalization layers (CBN) [29], and ReLU activation functions, as also indicated in Figure 2, outputting the predicted implicit surface representations for the inner and outer cortical surfaces divided into left and right brain hemispheres. The decoder network uses skip-connections and follows the method proposed by Mescheder et al. [25]. However, our CBN layers compute a non-linear function conditioned on the hypercolumn vector, which encodes global and local visual cues for the prediction of the desired surface representations.

We train such a model from scratch by optimizing the learning objective of Section 3.1, extended for multiple surfaces, using stochastic gradient descent. More specifically, we use the Adam optimizer with an initial learning rate equal to

and back-propagate the loss computed on mini-batches of 5 MR images and 1024 sampled points per image, drawn from a precomputed pool of 4 million points of which 10% are sampled uniformly in the template space and 90% are sampled near the target surfaces, as explained in Section 3.1.

3.3 Reconstructing Cortical Surfaces

Figure 3: DeepCSR framework for cortical surface reconstruction from MRI.

DeepCSR receives as input an MRI scan and outputs mesh representations for the outer and inner cortical surfaces further divided into the left and right brain hemispheres. As illustrated in Figure 3, we first perform an affine registration of the input MRI scan to the MNI105 brain template [24]. This registration aims to unify the coordinate systems across different MR scans, easing the learning and prediction of implicit surfaces.

Second, we construct a Cartesian grid of points at the desired resolution by dividing the template bounding box into evenly spaced points. Using the trained neural network $f_{\theta}$, we predict implicit surface representations at these points, represented by their continuous coordinates, for the four cortical surfaces according to the input MRI. Unless mentioned otherwise, we use the same grid size throughout this paper, chosen to provide sub-voxel precision. The model only needs to extract the feature maps once and can process the points in parallel, generating surfaces at high resolution efficiently.
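The grid construction in this step amounts to enumerating evenly spaced coordinates inside the template bounding box. A minimal sketch, assuming a hypothetical bounding box and a tiny resolution for readability:

```python
def template_grid(bbox_min, bbox_max, resolution):
    """Evenly spaced 3D points covering an axis-aligned bounding box (endpoints included)."""
    axes = [
        [lo + (hi - lo) * i / (resolution - 1) for i in range(resolution)]
        for lo, hi in zip(bbox_min, bbox_max)
    ]
    return [(x, y, z) for x in axes[0] for y in axes[1] for z in axes[2]]

# Hypothetical bounding box; the real one is the box enclosing the template brain.
grid = template_grid((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0), resolution=5)
print(len(grid), grid[0], grid[-1])
```

Since the query points are continuous coordinates, increasing `resolution` refines the reconstruction without changing the input image or the network.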

Third, since most of the applications of cortical surface reconstruction require surfaces with spherical topology (i.e., genus zero), as discussed in Section 1, we employ a lightweight topology correction algorithm to fix the topological defects caused by wrong predictions of the implicit surface representation. Specifically, we use the method proposed by Bazin and Pham [3], which consists of a spherical level-set evolution that avoids evolving over critical points [30], outputting a 3D implicit surface volume with guaranteed spherical topology. Each target surface’s topology is corrected independently, and these tasks are performed in parallel for efficiency. Note that the proposed framework is agnostic to the topology correction method, allowing the application of other techniques.
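Whether an extracted mesh actually has spherical topology can be checked through its Euler characteristic, since a closed, connected, orientable surface of genus g satisfies V - E + F = 2 - 2g. The sketch below is only such a check, not the correction method of Bazin and Pham [3]; the function names and the tetrahedron are illustrative.

```python
def euler_characteristic(faces):
    """V - E + F for a closed triangle mesh given as vertex-index triples."""
    vertices = {v for face in faces for v in face}
    edges = {frozenset((face[i], face[(i + 1) % 3])) for face in faces for i in range(3)}
    return len(vertices) - len(edges) + len(faces)

def genus(faces):
    """Genus of a closed, connected, orientable surface, from chi = 2 - 2g."""
    return (2 - euler_characteristic(faces)) // 2

# A tetrahedron is homeomorphic to a sphere, so its genus is zero.
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(euler_characteristic(tetrahedron), genus(tetrahedron))
```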

                       Left Outer Surface        Right Outer Surface       Left Inner Surface        Right Inner Surface
Method                 EMD    AD (mm)  HD (mm)   EMD    AD (mm)  HD (mm)   EMD    AD (mm)  HD (mm)   EMD    AD (mm)  HD (mm)
DeepCSR (SDF)          7.084  0.298    0.654     7.083  0.294    0.651     6.440  0.267    0.562     6.401  0.260    0.542
DeepCSR (Occ.)
Uniform Sampling
No hypercolumns
Single Surface Model
Table 1: Results of the ablation study on the proposed DeepCSR framework for cortical surface reconstruction from MRI using the ADNI study data [15].

Finally, in order to obtain the mesh representation of the target surfaces, we apply a topology preserving marching cubes algorithm [20]. The predicted surfaces are in the template brain coordinate system, but these can be mapped back to the native space using the inverse of the transformation obtained during the registration of the input MR scan.
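Mapping the predicted vertices back to native space only requires inverting the affine registration. A minimal sketch, assuming the registration is given as a 3x3 matrix `A` and a translation `t` acting as x -> A x + t (the names and the numeric values are hypothetical):

```python
def invert_affine(A, t):
    """Invert x -> A x + t for a 3x3 matrix A (nested lists) and a translation t."""
    (a, b, c), (d, e, f), (g, h, i) = A
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("affine transform is not invertible")
    inv = [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]
    inv_t = [-sum(inv[r][k] * t[k] for k in range(3)) for r in range(3)]
    return inv, inv_t

def apply_affine(A, t, p):
    """Apply x -> A x + t to a single 3D point."""
    return tuple(sum(A[r][k] * p[k] for k in range(3)) + t[r] for r in range(3))

# Hypothetical native-to-template registration: uniform scaling plus a shift.
A = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
t = [1.0, 0.0, -1.0]
template_vertex = apply_affine(A, t, (1.0, 2.0, 3.0))
A_inv, t_inv = invert_affine(A, t)
print(apply_affine(A_inv, t_inv, template_vertex))
```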

4 Experiments

We now evaluate the performance of our method on the reconstruction of cortical surfaces from MR images. In Section 4.1, we present an ablative study, while in Section 4.2 we compare it to FreeSurfer V6 (cross-sectional pipeline) and FastSurfer.

4.1 Ablation Studies

We now perform experiments with variations of the proposed DeepCSR to measure the importance of its main components. The quantitative results are presented in Table 1, while the qualitative results are shown in Figure 4. Below, we give a brief summary of the dataset and evaluation metrics used, followed by a detailed description of these experiments and a discussion of their results.

Dataset. We use the MRI data provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [15] and their corresponding pseudo-ground truth surfaces generated with FreeSurfer V6.0. The ADNI dataset consists of 3876 MRI images from 820 different subjects collected at different time points. We split this dataset by subjects, obtaining 2353 MRI scans from 492 subjects for training, 375 MRI scans from 82 subjects for validation, and 1148 MRI scans from 246 subjects for testing. We train the models on the training set until their loss plateaus on the validation set and report their performance on the test set. We emphasize that these splits do not have MRIs or subjects in common, for an unbiased evaluation.
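A subject-wise split like the one described above can be sketched as follows. The 60/10/30 fractions match the subject counts reported in the text (492, 82, and 246 out of 820); the function name, the seed, and the toy scan identifiers are assumptions for illustration.

```python
import random

def split_by_subject(scan_subjects, fractions=(0.6, 0.1, 0.3), seed=0):
    """Assign whole subjects to train/val/test so that no subject spans two splits.

    `scan_subjects` maps a scan id to its subject id.
    """
    subjects = sorted(set(scan_subjects.values()))
    random.Random(seed).shuffle(subjects)
    n_train = round(fractions[0] * len(subjects))
    n_val = round(fractions[1] * len(subjects))
    groups = {
        "train": set(subjects[:n_train]),
        "val": set(subjects[n_train:n_train + n_val]),
        "test": set(subjects[n_train + n_val:]),
    }
    return {
        name: sorted(scan for scan, subj in scan_subjects.items() if subj in members)
        for name, members in groups.items()
    }

# Toy data: 10 hypothetical subjects with 3 scans each.
scans = {f"scan{i}": f"subj{i // 3}" for i in range(30)}
splits = split_by_subject(scans)
print({name: len(ids) for name, ids in splits.items()})
```

Splitting scans directly instead of subjects would let follow-up scans of the same person leak across splits, which is what the subject-level grouping avoids.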

Figure 4: Reconstructed outer cortical surfaces at different resolution or without hypercolumns ("No hc.").

Evaluation metrics.

We report the mean and standard deviation of well-known surface comparison metrics like earth mover’s distance (EMD), average absolute distance (AD), and Hausdorff distance (HD) [42, 35, 43]. EMD is the minimum amount of "work" needed to morph one surface into the other (lower is better), while AD and HD are the mean and maximum distance between closest points in two meshes (lower is better), respectively. Since HD is very sensitive to outliers, we use a high percentile of the distances instead of the maximum, as suggested by Huttenlocher et al. [13]. It is also important to mention that AD and HD are computed in a bidirectional way for symmetry and over a sub-sample of 100k points.
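The two distance metrics can be sketched on point sets as below. This is a brute-force illustration under assumptions: the helper names are hypothetical, the bidirectional average pools both directions (one possible symmetrization), the percentile uses the nearest-rank rule, and the percentile value is left as a parameter because it is not recoverable from this text.

```python
import math

def directed_distances(source, target):
    """Distance from each source point to its closest target point (brute force)."""
    return [min(math.dist(p, q) for q in target) for p in source]

def average_distance(a, b):
    """Bidirectional average absolute distance between two point sets."""
    d = directed_distances(a, b) + directed_distances(b, a)
    return sum(d) / len(d)

def percentile_hausdorff(a, b, percentile):
    """Bidirectional Hausdorff distance using a percentile instead of the maximum."""
    def directed(src, dst):
        d = sorted(directed_distances(src, dst))
        rank = max(0, math.ceil(percentile / 100.0 * len(d)) - 1)  # nearest-rank rule
        return d[rank]
    return max(directed(a, b), directed(b, a))

# Toy point sets: the second contains one outlier far from the first.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(average_distance(set_a, set_b))
print(percentile_hausdorff(set_a, set_b, 100), percentile_hausdorff(set_a, set_b, 50))
```

With the percentile at 100 the single outlier determines the result, while a lower percentile ignores it, which is the motivation for the percentile variant.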

Signed distance vs. Occupancy Field. We compare the performance of the DeepCSR framework when using the signed distance function (first row in Table 1) and the occupancy field (second row in Table 1) as the implicit surface representation. The signed distance function provides better results than the occupancy field, since such a representation offers more information than simple binary labels to the learning process and the mesh extraction method. More specifically, it also indicates how far the target surface is from every point in the 3D space, whereas the occupancy field just says whether a given point is inside or outside the target surface.

Surface-based Sampling. Although uniform sampling has shown better results on generic object surface reconstruction, as in Mescheder et al. [25], it is not the best choice for cortical surfaces due to the high-frequency details localized on the surface. To support this claim, we train the DeepCSR framework to regress the signed distance function of points sampled uniformly in the 3D template space and compare it with the proposed sampling scheme described in Section 3.1. The results for the uniform sampling approach are reported in the third row of Table 1, while our surface-based sampling scheme is in the first row of the same table. The surface-based scheme outperforms uniform sampling in all of the evaluation metrics.

Hypercolumn features. The hypercolumn features are the most important component in our model, since we are not able to recover the high-frequency details in the cortical surfaces without them. In order to support this claim, we remove the hypercolumn formation steps from our encoder depicted in Figure 2, reshape the encoder’s last layer to output a 512-dimensional vector, use this vector as the decoder input, and train this model as before. These changes remove the hypercolumn scheme but keep the model at a similar capacity, since no layers or weights are removed. The performance of such a model is reported in the fourth row of Table 1, and it is much worse than the proposed model for all the evaluation metrics. Furthermore, Figure 4 depicts oversmoothed cortical surfaces that were reconstructed without hypercolumn features.

Single Surface Model. As previously observed by the machine learning community [36], multi-task learning tends to provide better generalization. Similarly, we observe that learning the surface representations jointly produces better results than learning multiple models independently. We demonstrate this result by training and evaluating a network for each target surface independently and comparing them to a network trained jointly for all the target surfaces. The results for the single-surface approach are presented in the fifth row of Table 1, while the multi-surface results are presented in the first row of the same table.

Output Resolution. We reconstruct cortical surfaces at different resolutions by predicting implicit surface representations for 3D evenly spaced grids of points of different sizes, as explained in Section 3.3. Figure 4 depicts cortical surfaces generated on four 3D evenly spaced grids of points of increasing size. As the resolution grows, our method reconstructs more details of the cortical surfaces. The quantitative results are presented in the last three rows of Table 1.

4.2 Comparison to FreeSurfer and FastSurfer

We now compare the precision, accuracy, and runtime of the proposed model to FreeSurfer V6 (cross-sectional pipeline) and FastSurfer. More specifically, we use the DeepCSR (SDF) model trained on the ADNI dataset and reconstruct surfaces at the default resolution, as explained in Section 3.3. Table 2 summarizes the results and Figure 5 shows examples of reconstructed surfaces.

           Precision on TRT    Accuracy on MALC    Runtime
Method     AD (mm)             Dice    VS
Table 2: Precision, accuracy and runtime comparison between FreeSurfer, FastSurfer, and DeepCSR on cortical reconstruction from MRI using Test-Retest (TRT) and MALC datasets.

Precision Analysis. We measure the precision, i.e., repeatability, of the evaluated frameworks using the Test-Retest dataset (TRT) [22] and the experimental protocol described by Tosun et al. [43]. More specifically, the TRT dataset provides 120 T1-weighted MRI scans from 3 subjects, each scanned twice in 20 sessions spanning 31 days. Since no morphological changes should occur between successive scans of the same subject in the same session, the reconstructed surfaces should be identical up to variations in the image acquisition and minor physiological changes. Therefore, following the aforementioned protocol, we reconstruct the surfaces from every MRI scan using the evaluated frameworks, align pairs of surfaces from the same subject and session using the ICP algorithm, compute discrepancy measures between these pairs of aligned surfaces, and report group statistics of these metrics. As discrepancy measures, we use the already discussed average absolute distance (AD), in millimeters (mm), between meshes and, as group statistics, the percentage of distances greater than one millimeter (% > 1 mm) and two millimeters (% > 2 mm). The "Precision" column in Table 2 summarizes the results for this experiment. We observe that the proposed DeepCSR has better reproducibility than both FreeSurfer and FastSurfer, which is critical for medical studies.
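The threshold-based group statistics reduce to counting how many distances exceed a cut-off. A minimal sketch, where the function name and the toy distances are hypothetical:

```python
def percent_above(distances_mm, threshold_mm):
    """Percentage of distances strictly greater than a threshold, both in millimeters."""
    return 100.0 * sum(d > threshold_mm for d in distances_mm) / len(distances_mm)

# Toy distances between two aligned surfaces reconstructed from a test-retest pair.
retest_distances = [0.2, 0.5, 1.5, 2.5]
print(percent_above(retest_distances, 1.0), percent_above(retest_distances, 2.0))
```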

Figure 5: Example of outer and inner cortical surfaces reconstructed with DeepCSR. The surfaces are color coded with the absolute distance to the pseudo ground-truth surface.

Accuracy Analysis. This experiment aims to measure how close the reconstructed cortical surfaces are to the true cortical surfaces. Since there is no manually annotated surface data for such a task, we compare the evaluated methods on the segmentation of the brain cortex using the Multi-Atlas Labelling Challenge (MALC) dataset [23, 19]. This dataset consists of 30 brain volumes manually segmented by experts using the Neuromorphometrics labelling scheme for the whole brain, including cortical and subcortical structures. We first select all cortical labels to form the brain cortex ground-truth segmentation. Then, for a given MR image, we reconstruct the inner and outer surfaces using the evaluated methods, discretize these surfaces to a grid of voxels of 1 mm$^3$, and resample these generated voxel-grids to the input MRI resolution and dimensions. Next, we remove the resampled voxel-grid of the inner surface from the interior of the resampled voxel-grid of the outer surface and perform morphological opening and dilation to generate a surface-based brain cortex segmentation. Finally, we report the Dice score (Dice) and volume similarity (VS) [42] between the surface-based generated segmentation and the cortex ground-truth segmentation as an accuracy measure of the evaluated frameworks. The "Accuracy" column in Table 2 summarizes the results for this experiment. We observe that DeepCSR provides brain cortex segmentation with slightly greater overlap and more similar volume to the manually annotated data than the competitors.
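The two overlap measures can be sketched on sets of voxel indices. The cortex mask is formed by removing the inner-surface voxels from the outer-surface voxels, as in the text, while the morphological post-processing is omitted. The toy one-dimensional voxel ids and the function names are assumptions; volume similarity is written in its common form, 1 - ||A| - |B|| / (|A| + |B|).

```python
def dice_score(pred, truth):
    """Dice overlap between two sets of voxel indices."""
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))

def volume_similarity(pred, truth):
    """Volume similarity: 1 - | |pred| - |truth| | / (|pred| + |truth|)."""
    return 1.0 - abs(len(pred) - len(truth)) / (len(pred) + len(truth))

# Toy voxel ids: the cortex is the outer-surface interior minus the inner-surface interior.
outer_voxels = set(range(10))
inner_voxels = {3, 4}
cortex_pred = outer_voxels - inner_voxels
cortex_truth = {0, 1, 2, 5, 6, 7}
print(dice_score(cortex_pred, cortex_truth), volume_similarity(cortex_pred, cortex_truth))
```

Dice penalizes misplaced voxels, whereas volume similarity only compares the two volumes, so the measures can disagree when a segmentation has the right size but the wrong location.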

Runtime Analysis. In order to compare the processing time required by the evaluated frameworks, we report the average elapsed time, in minutes, for these frameworks to reconstruct the cortical surfaces of the MRI scans in the MALC dataset [23, 19]. The "Runtime" column in Table 2 presents the results for this experiment. It is also important to note that, for a fair comparison, the reported FreeSurfer and FastSurfer runtimes only take into account the processing steps necessary to reconstruct the cortical surfaces and ignore any other computations. In summary, using an NVIDIA P100 GPU and an Intel Xeon (E5-2690) CPU, our model is at least thirteen times faster than FreeSurfer using the same hardware and input MRI scans. When compared to FastSurfer, the speed-up is much smaller, but the variance in runtime is drastically reduced. These improvements facilitate large medical studies and new healthcare applications.

Moreover, we profile the four stages of DeepCSR: the initial registration, the implicit surface prediction, the topology correction, and the marching cubes. Since topology correction accounts for a large share of the runtime, further speed-up can be achieved by employing a faster topology correction method.

5 Conclusion

In this paper, we tackle the problem of directly reconstructing the brain cortex from MRI, which is a critical task in clinical studies of neurodegenerative diseases. We formulate this problem as the prediction of implicit surfaces for points in a continuous brain template coordinate system. We also develop an encoder network architecture with hypercolumn features that is able to extract local and global image features from brain MR images. Due to the continuous nature of this formulation and the efficient design of our network, we are able to accurately reconstruct cortical surfaces at high resolution, capturing the brain surface geometry. Compared to the widely used FreeSurfer toolbox and its deep learning powered variant FastSurfer on two standard datasets, the proposed DeepCSR was found to be as accurate, more precise, and faster, which should facilitate large-scale medical studies and new healthcare applications.


References

  • [1] L. G. Apostolova, S. L. Risacher, T. Duran, E. C. Stage, N. Goukasian, J. D. West, T. M. Do, J. Grotts, H. Wilhalme, K. Nho, et al. (2018) Associations of the top 20 Alzheimer disease risk variants with brain amyloidosis. JAMA Neurology 75 (3), pp. 328–341.
  • [2] M. A. G. Ballester, A. P. Zisserman, and M. Brady (2002) Estimation of the partial volume effect in MRI. Medical Image Analysis 6 (4), pp. 389–405.
  • [3] P. Bazin and D. L. Pham (2007) Topology correction of segmented medical images using a fast marching algorithm. Computer Methods and Programs in Biomedicine 88 (2), pp. 182–190.
  • [4] R. Dahnke, R. A. Yotter, and C. Gaser (2013) Cortical thickness and central surface estimation. NeuroImage 65, pp. 336–348.
  • [5] A. M. Dale, B. Fischl, and M. I. Sereno (1999) Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage 9 (2), pp. 179–194.
  • [6] V. Doré, V. L. Villemagne, P. Bourgeat, J. Fripp, O. Acosta, G. Chetélat, L. Zhou, R. Martins, K. A. Ellis, C. L. Masters, et al. (2013) Cross-sectional and longitudinal analysis of the relationship between Aβ deposition, cortical thickness, and memory in cognitively unimpaired individuals and in Alzheimer disease. JAMA Neurology 70 (7), pp. 903–911.
  • [7] B. Fischl, M. I. Sereno, and A. M. Dale (1999) Cortical surface-based analysis: II: Inflation, flattening, and a surface-based coordinate system. NeuroImage 9 (2), pp. 195–207.
  • [8] T. Groueix, M. Fisher, V. G. Kim, B. Russell, and M. Aubry (2018) AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation. In CVPR.
  • [9] X. Han, D. L. Pham, D. Tosun, M. E. Rettmann, C. Xu, and J. L. Prince (2004) CRUISE: cortical reconstruction using implicit surface evolution. NeuroImage 23 (3), pp. 997–1012.
  • [10] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik (2015) Hypercolumns for object segmentation and fine-grained localization. In CVPR, pp. 447–456.
  • [11] L. Henschel, S. Conjeti, S. Estrada, K. Diers, B. Fischl, and M. Reuter (2020) FastSurfer - a fast and accurate deep learning based neuroimaging pipeline. NeuroImage, pp. 117012.
  • [12] D. Hoiem, A. A. Efros, and M. Hebert (2007) Recovering surface layout from an image. IJCV 75 (1), pp. 151–172.
  • [13] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge (1993) Comparing images using the Hausdorff distance. IEEE TPAMI 15 (9), pp. 850–863.
  • [14] Z. Iscan, T. B. Jin, A. Kendrick, B. Szeglin, H. Lu, M. Trivedi, M. Fava, P. J. McGrath, M. Weissman, B. T. Kurian, et al. (2015) Test–retest reliability of FreeSurfer measurements within and between sites: effects of visual approval process. Human Brain Mapping 36 (9), pp. 3472–3485.
  • [15] C. R. Jack Jr, M. A. Bernstein, N. C. Fox, P. Thompson, G. Alexander, D. Harvey, B. Borowski, P. J. Britson, J. L. Whitwell, C. Ward, et al. (2008) The Alzheimer’s Disease Neuroimaging Initiative (ADNI): MRI methods. Journal of Magnetic Resonance Imaging 27 (4), pp. 685–691.
  • [16] J. S. Kim, V. Singh, J. K. Lee, J. Lerch, Y. Ad-Dab’bagh, D. MacDonald, J. M. Lee, S. I. Kim, and A. C. Evans (2005) Automated 3-D extraction and evaluation of the inner and outer cortical surfaces using a Laplacian map and partial volume effect classification. NeuroImage 27 (1), pp. 210–221.
  • [17] C. Kong, C. Lin, and S. Lucey (2017) Using locally corresponding CAD models for dense 3D reconstructions from a single image. In CVPR, pp. 4857–4865.
  • [18] N. Kriegeskorte and R. Goebel (2001) An efficient algorithm for topologically correct segmentation of the cortical sheet in anatomical MR volumes. NeuroImage 14 (2), pp. 329–346.
  • [19] B. Landman and S. Warfield (2012) MICCAI 2012 workshop on multi-atlas labeling. In Medical Image Computing and Computer Assisted Intervention Conference.
  • [20] T. Lewiner, H. Lopes, A. W. Vieira, and G. Tavares (2003) Efficient implementation of marching cubes’ cases with topological guarantees. Journal of Graphics Tools 8 (2), pp. 1–15.
  • [21] D. MacDonald, N. Kabani, D. Avis, and A. C. Evans (2000) Automated 3-D extraction of inner and outer surfaces of cerebral cortex from MRI. NeuroImage 12 (3), pp. 340–356.
  • [22] J. Maclaren, Z. Han, S. B. Vos, N. Fischbein, and R. Bammer (2014) Reliability of brain volume measurements: a test-retest dataset. Scientific Data 1, pp. 140037.
  • [23] D. S. Marcus, T. H. Wang, J. Parker, J. G. Csernansky, J. C. Morris, and R. L. Buckner (2007) Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience 19 (9), pp. 1498–1507.
  • [24] J. C. Mazziotta, A. W. Toga, A. Evans, P. Fox, J. Lancaster, et al. (1995) A probabilistic atlas of the human brain: theory and rationale for its development. NeuroImage 2 (2), pp. 89–101.
  • [25] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger (2019) Occupancy networks: learning 3D reconstruction in function space. In CVPR, pp. 4460–4470.
  • [26] M. Michalkiewicz, J. K. Pontes, D. Jack, M. Baktashmotlagh, and A. Eriksson (2019) Implicit surface representations as layers in neural networks. In ICCV.
  • [27] M. Omran, C. Lassner, G. Pons-Moll, P. Gehler, and B. Schiele (2018) Neural body fitting: unifying deep learning and model based human pose and shape estimation. In 3DV, pp. 484–494.
  • [28] J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove (2019) DeepSDF: learning continuous signed distance functions for shape representation. In CVPR.
  • [29] E. Perez, H. de Vries, F. Strub, V. Dumoulin, and A. Courville (2017) Learning visual reasoning without strong priors. arXiv preprint arXiv:1707.03017.
  • [30] D. L. Pham, P. Bazin, and J. L. Prince (2010) Digital topology in brain imaging. IEEE Signal Processing Magazine 27 (4), pp. 51–59.
  • [31] J. K. Pontes, C. Kong, S. Sridharan, S. Lucey, A. Eriksson, and C. Fookes (2018) Image2Mesh: a learning framework for single image 3D reconstruction. In ACCV, pp. 365–381.
  • [32] O. Querbes, F. Aubry, J. Pariente, J. Lotterie, J. Démonet, V. Duret, M. Puel, I. Berry, J. Fort, P. Celsis, and the Alzheimer’s Disease Neuroimaging Initiative (2009) Early diagnosis of Alzheimer’s disease using cortical thickness: impact of cognitive reserve. Brain 132 (8), pp. 2036–2047.
  • [33] M. Rebsamen, Y. Suter, R. Wiest, M. Reyes, and C. Rummel (2020) Brain morphometry estimation: from hours to seconds using deep learning. Frontiers in Neurology 11, pp. 244.
  • [34] C. E. Rodriguez-Carranza, P. Mukherjee, D. Vigneron, J. Barkovich, and C. Studholme (2008) A framework for in vivo quantification of regional brain folding in premature neonates. NeuroImage 41 (2), pp. 462–478.
  • [35] Y. Rubner, C. Tomasi, and L. J. Guibas (1998) A metric for distributions with applications to image databases. In ICCV, pp. 59–66.
  • [36] S. Ruder (2017) An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098.
  • [37] R. Santa Cruz, L. Lebrat, P. Bourgeat, V. Doré, J. Dowling, J. Fripp, C. Fookes, and O. Salvado (2020) Going deeper with brain morphometry using neural networks. arXiv e-prints.
  • [38] A. Saxena, M. Sun, and A. Y. Ng (2008) Make3D: learning 3D scene structure from a single still image. IEEE TPAMI 31 (5), pp. 824–840.
  • [39] M. Schaer, M. B. Cuadra, L. Tamarit, F. Lazeyras, S. Eliez, and J. Thiran (2008) A surface-based approach to quantify local cortical gyrification. IEEE Transactions on Medical Imaging 27 (2), pp. 161–170.
  • [40] D. W. Shattuck and R. M. Leahy (2002) BrainSuite: an automated cortical surface identification tool. Medical Image Analysis 6 (2), pp. 129–142.
  • [41] G. Spalletta, F. Piras, and T. Gili (2018) Brain morphometry. Springer.
  • [42] A. A. Taha and A. Hanbury (2015) Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Medical Imaging 15 (1), pp. 29.
  • [43] D. Tosun, M. E. Rettmann, D. Q. Naiman, S. M. Resnick, M. A. Kraut, and J. L. Prince (2006) Cortical reconstruction using implicit surface evolution: accuracy and precision analysis. NeuroImage 29 (3), pp. 838–852.
  • [44] I. Wald, H. Friedrich, G. Marmitt, P. Slusallek, and H. Seidel (2005) Faster isosurface ray tracing using implicit kd-trees. IEEE Transactions on Visualization and Computer Graphics 11 (5), pp. 562–572.
  • [45] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y. Jiang (2018) Pixel2Mesh: generating 3D mesh models from single RGB images. In ECCV, pp. 52–67.
  • [46] E. W. Weisstein (1999) Triangle point picking. Wolfram Research, Inc.
  • [47] C. Wen, Y. Zhang, Z. Li, and Y. Fu (2019) Pixel2Mesh++: multi-view 3D mesh generation via deformation. In CVPR, pp. 1042–1051.