FastSurfer – A fast and accurate deep learning based neuroimaging pipeline

10/09/2019
by   Leonie Henschel, et al.

Traditional neuroimage analysis pipelines involve computationally intensive, time-consuming optimization steps, and thus, do not scale well to large cohort studies with thousands or tens of thousands of individuals. In this work we propose a fast and accurate deep learning based neuroimaging pipeline for the automated processing of structural human brain MRI scans, including surface reconstruction and cortical parcellation. To this end, we introduce an advanced deep learning architecture capable of whole brain segmentation into 95 classes in under 1 minute, mimicking FreeSurfer's anatomical segmentation and cortical parcellation. The network architecture incorporates local and global competition via competitive dense blocks and competitive skip pathways, as well as multi-slice information aggregation that specifically tailor network performance towards accurate segmentation of both cortical and sub-cortical structures. Further, we perform fast cortical surface reconstruction and thickness analysis by introducing a spectral spherical embedding and by directly mapping the cortical labels from the image to the surface. This approach provides a full FreeSurfer alternative for volumetric analysis (within 1 minute) and surface-based thickness analysis (within only around 1h run time). For sustainability of this approach we perform extensive validation: we assert high segmentation accuracy on several unseen datasets, measure generalizability and demonstrate increased test-retest reliability, and increased sensitivity to disease effects relative to traditional FreeSurfer.


1 Introduction

The rapid emergence of standardized robust non-invasive imaging methods and infrastructure for big data analysis over the years has promoted the advent of a variety of large-scale neuroimaging studies. Different initiatives aim to understand the variability, development and anatomical layout of the human brain in e.g. neurodegeneration (ADNI ADNI_dataset , OASIS oasis_1_dataset ; oasis_2_dataset ), psychiatric diseases (LA5c LA5c_dataset ), neurodevelopmental disorders (ABIDE abide_dataset , MIRIAD miriad_dataset ) or within populations (Rotterdam Study rotterdam , Human Connectome Project hcp_dataset , UKBiobank ukbiobank , Rhineland Study rhineland ). A core challenge within all neuroimaging studies is the need to process and analyze the continuing stream of data in a timely manner. As Magnetic Resonance Imaging (MRI) is a versatile imaging modality and an integral part of all these studies, efficient tools for identifying clinically relevant imaging biomarkers from MRI are in high demand. In this work we, therefore, develop a fast method for volumetric segmentation, reconstruction of cortical geometry, and morphometric estimation of brain structures including cortical thickness. It is the first work that aims at integrating a novel deep learning method for image segmentation into a complete processing pipeline that includes cortical surface reconstruction and segmentation.

1.1 Neuroimage Analysis

To date, a few well maintained neuroimage processing pipelines such as FreeSurfer fischl2002whole , BrainSuite Shattuck2002 , SPM spm_book2007 , ANTs Avants2009 , or FSL Jenkinson2012 are the only means available to process and evaluate the incoming flow of data. These pipelines usually employ multiple image transformation steps, some of which require careful fine-tuning of parameters such as convergence thresholds, smoothing levels, or iteration numbers. Furthermore, due to extensive numerical optimization, e.g. non-linear registration or Bayesian segmentation, these approaches are computationally expensive and suffer from long run-times. Hence, several hours are required to process a single volume, significantly limiting scalability to large cohort studies with thousands of cases or to clinical workflows where immediate results are essential.

Supervised deep learning approaches are an attractive alternative to replace time-intensive steps within these pipelines such as whole-brain segmentation because of their 2-3 orders of magnitude lower run-time (seconds rather than hours). Fully convolutional neural networks (F-CNN), for example, are able to learn the correct feature representations in an end-to-end fashion from the image itself without requiring lengthy pre-processing steps. These methods can be effectively parallelized on graphical processing units (GPU) resulting in an enormous speed-up. Additionally, these networks often outperform traditional approaches with respect to accuracy and have become increasingly popular for pixel or voxel-wise semantic segmentation tasks in computer vision and biomedical imaging long2015fully ; unet ; noh2015learning ; segnet ; V-net ; densenet . In this work we propose a neuroimaging pipeline based on a novel neural network architecture for whole brain segmentation that induces local and global competition in the dense block and skip-connections.

1.2 Deep Learning for Whole Brain Segmentation

The task of whole-brain segmentation in particular is challenging due to the complex 3D architecture and spatial dependency between slices, the large number of labels, the size of the scanning volumes (memory requirements), and variability across scanners and subjects. While several deep learning based approaches have been proposed for specific tasks, such as tumor segmentation Rani2017 ; Dong2017 ; Arunachalam2017 ; Havaei2017 ; Amin2018 ; Crimi2019 , brain lesion segmentation Kamnistas2017 ; Varghese2016 ; Rezaei2017 ; RoaBarco2017 ; Chen2018 , MR image reconstruction Majumdar2015 ; Jin2017 ; Mardani2017 ; Schlemper2018 ; Yang2018 ; Dedmari2018 or prediction of brain related diseases and their progression Payan2015 ; Qi2016 ; HosseiniAsl2016 ; Khvostikov2018 ; Lee_2019 , full brain segmentation into more than 25 classes has so far only been achieved by a few groups de_Brebisson_2015 ; wachinger2018deepnat ; sdnet ; quicknat ; psacnn .

Early networks for the task of whole brain segmentation like DeepNAT wachinger2018deepnat relied on 3D patches instead of 2D slices and were capable of segmenting an MRI brain scan in around 60 min wachinger2018deepnat ; de2015deep . The SkipDeconv-Net (SD-Net sdnet ) is a whole-brain segmentation F-CNN based on a classic encoder-decoder architecture reminiscent of the U-net unet . In that work, a novel loss function was introduced that addressed the inherent class imbalance problem and alleviated segmentation errors along anatomical boundaries. Subsequently, the network architecture was extended into an F-CNN called Quick segmentation of Neuroanatomy (QuickNAT quicknat ), which allows segmentation of a whole 3D brain volume into 27 structures. In this architecture, short-range skip connections were employed within each encoder-decoder block; these dense blocks were introduced in denseconnections for classification tasks.

Here, we propose FastSurferCNN, a deep learning architecture capable of segmenting a whole brain into 95 classes in merely 1 minute on the GPU (and 14 minutes of sequential processing on the CPU). The basic architecture is inspired by QuickNAT, with three 2D F-CNNs operating on coronal, axial and sagittal views followed by a view aggregation step to infer the final segmentation quicknat . Each F-CNN has the same encoder/decoder-based architecture with skip connections unet , enhanced with unpooling layers noh2015learning and dense connections denseconnections within each block. The main methodological innovations of FastSurferCNN are the introduction of competition within each block (competitive dense blocks) by replacing concatenation with maxout operations maxout , as well as the inclusion of a wider image context within each 2D F-CNN (spatial information aggregation).

Note that voxel-based image segmentation, on its own, is limited with regard to neuroimage analysis and biomarker extraction. Surface-based analysis in particular has proven pivotal, e.g. for correct estimation of thickness, an issue which has so far not been addressed in comparative publications on deep learning. Existing traditional pipelines go far beyond image segmentation and provide utilities such as creation of cortical surface models, estimation of thickness, construction of fiber tracts or functional connectivity graphs, and tools for group comparison, such as registration and statistical frameworks. A major focus of this work is to fill this gap by integrating the developed deep learning framework (FastSurferCNN) into a complete, self-contained imaging pipeline called FastSurfer.

Starting from the accurate 3D whole brain segmentation provided by our deep learning framework, we perform cortical surface reconstruction and fast spherical mapping via a novel spectral approach that quickly maps the cortex using Laplace eigenfunctions. Furthermore, we map cortical labels and include traditional point-wise and ROI thickness analysis, resulting in a full FreeSurfer alternative with approximately 60 min run-time (depending on image quality and process parallelization), of which only 1 min is attributed to the whole brain segmentation. Hence, FastSurfer combines the speed of supervised deep learning approaches with the convenience of the broad spectrum of surface-based features and analysis methodologies provided by traditional neuroimaging pipelines.

We extensively validate the quality of our deep learning based neuroimaging pipeline through assessment of segmentation accuracy, generalizability to unseen datasets and acquisition parameters, test-retest reliability, and sensitivity to group level differences in imaging cohorts in a number of publicly available datasets. In fact, this is the first work within deep learning approaches with such an exhaustive validation. We demonstrate that despite being orders of magnitude faster than traditional approaches, FastSurfer increases reliability and sensitivity to disease effects, making it a dependable tool for future large-scale population analysis tasks. The source code of FastSurfer is available on GitHub: https://github.com/reuter-lab/FastSurfer (will be made public upon acceptance).

2 Methodology

2.1 Datasets

The following 8 publicly available MRI datasets were selected for training, testing, and for extensive validation of the FastSurfer pipeline (see Table A.2 for usage summary).

ABIDE II: The Autism Brain Imaging Data Exchange II abide_dataset contains cross-sectional data and focuses on autism spectrum disorders, covering a wide age range (5-64 years of age). The dataset contains 1044 MRI scans from 19 different institutions. The 3D magnetization prepared rapid gradient echo (MP-RAGE) sequence, or a vendor specific variant, was used to acquire all data. The corresponding sequence parameters vary depending on the site (see Table 2 of the corresponding paper for details abide_dataset ). With the exception of a single collection (IP_1, 1.5 Tesla, Philips Achieva), all MRI data were acquired using 3 Tesla scanners (1 Ingenia and 4 Achieva (Philips), 2 MR750 (GE), 7 TrioTim (Siemens), 3 Allegra (Siemens) and 1 Skyra (Siemens)) at voxel resolutions varying from 1.30 mm to 0.7 mm (majority at 1.0 mm). The data are available at: http://fcon_1000.projects.nitrc.org/indi/abide/abide_II.html. 20 cases from ABIDE II were used for training.

ADNI: The Alzheimer’s Disease Neuroimaging Initiative ADNI_dataset was launched in 2003 as a public-private partnership, led by principal investigator Michael W. Weiner, MD. The dataset contains 1.5T and 3T MRI scans acquired at a resolution of 1.0x1.0x1.2 mm with scanners from the three largest MRI vendors (GE, Philips and Siemens) and includes Alzheimer’s disease patients, subjects with mild cognitive impairment, and elderly controls. Data were acquired with an MP-RAGE sequence whose parameters are optimized for the different vendors (see ADNI_mri for details). The ADNI database has 2000 participants and is available at: http://adni.loni.usc.edu. 40 cases from ADNI were used for training. 180 different cases were used for assessing accuracy and generalizability across disease groups and scanners.

LA5c: The UCLA Consortium for Neuropsychiatric Phenomics LA5c Study LA5c_dataset is a cross-sectional study and includes 142 participants diagnosed with a neuropsychiatric or neurodevelopmental disorder (schizophrenia, bipolar disorder, ADHD) and 130 normal controls (ages 21-50). T1-weighted MP-RAGE images were acquired on a 3T Siemens Trio at a single center with a field of view of 250, a 256x256 matrix, and 176 1.0 mm sagittal partitions. An inversion time (TI) of 1.1 s, echo time (TE) of 3.5-3.3 ms, repetition time (TR) of 2.53 s and flip angle of 7° were used for all scans. The data were obtained from the OpenfMRI database (https://openfmri.org/dataset/ds000030/); the accession number is ds000030. 20 cases from LA5c were used for training.

MIRIAD: The Minimal Interval Resonance Imaging in Alzheimer’s Disease miriad_dataset dataset includes longitudinal scans from 46 elderly individuals (ages 55+) diagnosed with Alzheimer’s disease and 23 elderly normal controls. The 3D MR images were acquired at a single center with a 1.5T Signa MRI scanner (GE Medical Systems), using an IR-FSPGR (inversion recovery prepared fast spoiled gradient recalled) sequence, field of view of 24 cm, 256x256 matrix, 124 1.5 mm coronal partitions (voxel size 0.9x0.9x1.5 mm), TR 15 ms, TE 5.4 ms, flip angle 15°, and TI 650 ms. The data are available at: https://www.ucl.ac.uk/drc/research/methods/minimal-interval-resonance-imaging-alzheimers-disease-miriad . 20 cases from MIRIAD were used for validation. 49 different cases were used to assess accuracy and generalizability to a different T1-weighted acquisition sequence (IR-FSPGR) and scanner (GE).

Oasis-1 oasis_1_dataset and Oasis-2 oasis_2_dataset : The Open Access Series of Imaging Studies 1 and 2 both contain scans from a 1.5T Vision scanner and a TIM Trio 3T MRI scanner (both Siemens) acquired at a single center from non-demented individuals and demented individuals diagnosed with very mild to moderate Alzheimer’s disease. All subjects were scanned in sagittal orientation with a voxel resolution of 1.0x1.0x1.25 mm and an MP-RAGE sequence with the following parameters: TR 9.7 ms, TE 4.0 ms, flip angle 10°, and TI 20 ms. The Oasis-1 set is cross-sectional and contains 416 subjects aged 18 to 96. In addition, it contains a test-retest component consisting of 20 subjects that were scanned no more than 90 days apart (all except 5 less than 30 days). The Oasis-2 set focuses on older adults (age 60+) and contains longitudinal scans from 150 subjects. Both datasets are available at: https://www.oasis-brains.org/ . 40 cases from Oasis-1 and 20 from Oasis-2 were used for training. 20 different cases from Oasis-1 were used to assess test-retest reliability and 370 cases for quantifying sensitivity to group effects.

HCP: The Human Connectome Project Young Adult hcp_dataset is a cross-sectional dataset and contains 1200 healthy participants aged 22 to 35. The 3T MR images were acquired with a single scanner (customized Connectome Skyra, Siemens) using an MP-RAGE sequence with TR 2.4 s, TE 2.14 ms, TI 1 s, and flip angle 8°. Images are 0.7 mm isotropic with a field of view of 224x224, and are de-faced. The data are available at: https://www.humanconnectome.org/study/hcp-young-adult . 45 cases from HCP were used for assessing accuracy and generalizability to inputs with pre-processing (de-facing, downsampling).

MMND: This multi-subject, multi-modal neuroimaging dataset was acquired with multiple functional and structural neuroimaging modalities (structural MRI + functional MRI + MEG + EEG) on the same 16 healthy volunteers MMND . Here, we only use the structural data, which include T1-weighted MPRAGE and Multi-Echo Fast Low Angle Shot (MEF) sequences. The data were collected on a Siemens 3T TIM TRIO with a standard 1 mm isotropic resolution. For each participant, a T1-weighted image was acquired using an MPRAGE sequence (TR 2,250 ms, TE 2.98 ms, TI 900 ms, 190 Hz/pixel; flip angle 9°) as well as two bandwidth-matched MEF sequences (651 Hz/pixel; TR 20 ms) at both 5° and 30° flip angles for each of 7 echo times (TE 1.85 ms; 4.15 ms; 6.45 ms; 8.75 ms; 11.05 ms; 13.35 ms; 15.65 ms). The data were obtained from the OpenNeuro database; the accession number is ds000117 . All MMND cases were used to assess generalizability to MEF sequences.

THP: The Traveling Human Phantom HTP_dataset is a dataset collected for assessing multi-site neuroimaging reliability. The THP includes 3D MP-RAGE MRI scans at 1.0 mm isotropic voxel resolution from 5 healthy subjects acquired at 8 different imaging centers. The sites involved in this study had either a Siemens 3T TIM Trio scanner (five sites: IOWA, UMN, UCL, MGH, CCF) or a Philips 3T Achieva scanner (three sites: JHU, DART, UW). The data is available at: https://openneuro.org/datasets/ds000206 . All THP cases were used to quantify accuracy and generalizability across sites and scanners.

Participants of the individual studies gave informed consent in accordance with the Institutional Review Board at each of the participating sites. Complete ethics statements are available at the respective study webpages.

All datasets were processed using FreeSurfer v6.0. FreeSurfer is an open source neuroimage analysis suite fischl2002whole ; fischl2012freesurfer (http://surfer.nmr.mgh.harvard.edu/). FreeSurfer morphometric procedures have been demonstrated to show good test-retest reliability across scanner manufacturers and across field strengths Han2006 ; reuter:long12 . In this work, FreeSurfer parcellation following the “Desikan–Killiany–Tourville” (DKT) protocol atlas Klein2012 ; Desikan2006 is used for training and evaluation. In order to limit the number of segmentation labels, cortical regions touching each other across the hemispheres are lateralized while all others are combined, thus reducing the total number of labels from 95 (DKT without corpus callosum segmentations, which are added later) to 78 during network training. The affiliation to the left or right hemisphere is restored in the final prediction by estimating the closest white matter centroid (left or right hemisphere) to each label cluster. A list of all segmentation labels is provided in the appendix (see Table A.1). In accordance with FreeSurfer, all MRI brain volumes are conformed to standard slice orientation and resolution (1 mm isotropic) before feeding them to the different deep learning networks. No further image processing is required afterwards (e.g. no skull stripping or intensity normalization).
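The hemisphere restoration step can be illustrated with a minimal sketch. It assumes that a label cluster is represented by the centroid of its voxel coordinates and that the Euclidean distance to the left and right white matter centroids decides the hemisphere; the function names and the tie-breaking rule are hypothetical and not taken from the FastSurfer code.

```python
import math


def centroid(points):
    """Mean coordinate of a list of (x, y, z) voxel positions."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def assign_hemisphere(cluster_voxels, left_wm_voxels, right_wm_voxels):
    """Return 'left' or 'right', whichever white matter centroid is closer.

    Assumption: the cluster is summarised by its own centroid and ties go
    to the left hemisphere.
    """
    c = centroid(cluster_voxels)
    d_left = math.dist(c, centroid(left_wm_voxels))
    d_right = math.dist(c, centroid(right_wm_voxels))
    return "left" if d_left <= d_right else "right"
```

For example, a cluster at x = 8 to 9 with white matter centroids at x = 11 (left) and x = -11 (right) would be assigned to the left hemisphere.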

Figure 2: FastSurfer Network Architecture

2.2 FastSurfer CNN

Here, we introduce the network architecture (FastSurferCNN) for whole brain segmentation into 95 classes (excluding background) in under 1 minute on the GPU (and approximately 14 minutes on the CPU). FastSurferCNN is composed of three F-CNNs operating on coronal, axial and sagittal 2D slices and a final view aggregation stage. The basic architecture of all three F-CNNs follows that of quicknat , namely a sequence of 4 dense encoder and decoder blocks separated by a bottleneck layer as illustrated in Figure 2. Within the FastSurferCNN, we now propose novel architectural elements - competitive dense blocks and spatial information aggregation - targeted to improve information recovery and increase network connectivity. In the following sections, each of these elements will be explained in detail.

Figure 3: The efficient “maxout” operation retains the max value at each position (top), while “concat” appends the two blocks requiring twice the memory.

2.2.1 Competitive Dense Block

With the exception of our preliminary work in Estrada2018 , dense connections within convolutional blocks have been implemented via concatenation of feature maps (see QuickNAT quicknat ), effectively doubling the number of learnable parameters (Figure 3 top) within each encoder and decoder block and thus considerably increasing memory requirements. Here, we employ competitive dense blocks in which concatenations are replaced with maxout activations maxout . The maxout activations induce competition between feature maps and significantly reduce the number of parameters compared to the classical dense blocks, thus creating a lightweight model (Figure 3 bottom). Instead of stacking the output of previous layers on top of each other, only the maximum value at a given position is retained. Assuming $N$ inputs, denoted as $X = \{x_n\}_{n=1}^{N}$, with each $x_n \in \mathbb{R}^{h \times w \times c}$, where $h$ is height, $w$ is width and $c$ is the number of channels for a particular feature map ($x_n$), the output is given by:

$$\big(\mathrm{maxout}(x_1, \dots, x_N)\big)_{i,j,k} = \max\big\{(x_1)_{i,j,k}, \dots, (x_N)_{i,j,k}\big\}, \quad 1 \le i \le h,\ 1 \le j \le w,\ 1 \le k \le c \tag{1}$$

The difference between dense blocks and competitive dense blocks can thus be described as follows, where $x_0$ is the block input, $x_l$ is the output of layer $l$, and $[\cdot]$ denotes concatenation along the channel dimension. For a dense block:

$$x_1 = H_1(x_0) \tag{2}$$
$$x_2 = H_2([x_0, x_1]) \tag{3}$$
$$x_3 = H_3([x_0, x_1, x_2]) \tag{4}$$

and for a competitive dense block:

$$x_1 = H_1(x_0) \tag{5}$$
$$x_2 = H_2(\mathrm{maxout}(x_0, x_1)) \tag{6}$$
$$x_3 = H_3(\mathrm{maxout}(x_0, x_1, x_2)) \tag{7}$$

Here, $H_l$ represents a composite function of three consecutive operations: parametric rectified linear unit (PReLU) followed by convolution and batch normalization (BN), with the exception of the very first encoder block. In the first block, the raw inputs are passed through BN, convolution and another BN before following the previously described architecture (see Figure 2, CDBi vs. CDB block). The sequence of operations in $H_l$ guarantees normalized inputs to the maxout activation, thereby improving convergence normalisation and increasing the exploratory span of the created sub-networks competitive simultaneously. Furthermore, filter co-adaptation is implicitly discouraged by the short-range skip-connections within the dense blocks competitive .

In addition to the competitive dense blocks we also implement competition across long-range skip connections. Instead of concatenating the unpooled information from the decoder arm with the corresponding feature maps from the encoder arm, we perform a maxout operation before further feeding the inputs to the competitive dense decoder blocks. All our competitive dense blocks are designed such that the inputs to this operation are already normalized (unpooling and skip transfer after BN; see Figure 2).
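The contrast between concatenation and maxout can be made concrete with a small sketch. Feature maps are represented here as nested lists in a channels x height x width layout, which is an assumption made for illustration only; the helper names are hypothetical.

```python
def maxout(*feature_maps):
    """Element-wise maximum over equally shaped nested lists (feature maps)."""
    first = feature_maps[0]
    if isinstance(first, list):
        # Recurse into matching sub-lists of every input.
        return [maxout(*parts) for parts in zip(*feature_maps)]
    return max(feature_maps)


def concat(*feature_maps):
    """Channel-wise concatenation: stack the channel lists of all inputs."""
    return [channel for fmap in feature_maps for channel in fmap]


# Two single-channel 2x2 feature maps (channels x height x width).
a = [[[1, 5], [3, 0]]]
b = [[[4, 2], [3, 7]]]

fused = maxout(a, b)    # still one channel
stacked = concat(a, b)  # two channels, so the next layer sees twice the input width
```

With maxout the channel count of the fused map equals that of each input, whereas concatenation grows it with every additional input, which is why the following convolution needs more weights in a classical dense block.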

2.2.2 Spatial information aggregation

Due to memory constraints it is currently not possible to train a 3D deep neural segmentation network with whole brain MRI volumes. Thus, previous brain segmentation networks were trained on extracted 3D patches wachinger2018deepnat or 2D slices quicknat . However, both approaches lose spatial information critical for correct classification of a given structure. In order to recover as much information as possible, we propose a spatial information aggregation step. Here, instead of feeding only one 2D slice to the network, we pass a 7-channel image by stacking the three preceding, the current, and the three succeeding slices for segmenting only the middle slice. Fundamentally, this spatial information aggregation combines the advantages of 3D patches (local neighbourhood) and 2D slices (global view).
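A minimal sketch of how such a 7-channel input could be assembled is given below. The volume is treated as a plain list of 2D slices; the handling of the first and last slices (repeating the border slice) is an assumption, since the text does not specify it, and `slice_stack` is a hypothetical helper name.

```python
def slice_stack(volume, index, half_width=3):
    """Return the 2 * half_width + 1 neighbouring slices centred on `index`.

    Assumption: indices outside the volume are clamped to the nearest
    border slice so that the stack always has the same number of channels.
    """
    last = len(volume) - 1
    return [
        volume[min(max(i, 0), last)]
        for i in range(index - half_width, index + half_width + 1)
    ]
```

The network then predicts labels only for the middle channel (position `half_width` in the returned list), while the six neighbouring slices serve as context.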

2.2.3 View Aggregation

In order to account for the inherent 3D geometry of the brain, one F-CNN per anatomical plane is trained and their outputs combined in a final view aggregation step. Depending on the orientation of the 2D slices, each network therefore learns the anatomical representation of the brain structures within the coronal, axial or sagittal view. The final segmentation is generated by aggregating the probability maps of each model through a weighted average. Combination of the three principal views can boost accuracy for cortical folds and subcortical structures, some of which are better represented in one of the individual planes. In addition, view aggregation acts as a regularizer to reduce erroneous predictions. As it is not possible to differentiate between the left and right hemispheres in the sagittal view, we merge lateral labels, effectively reducing the number of classes from 78 to 50 in the sagittal network. The probability maps of these lateralized classes are finally restored by copying the softmax output of the combined label to both left and right hemispheres. To account for this remapping step, the weight with which the sagittal predictions influence the final segmentation is reduced by one half compared to the other two views.
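The view aggregation described above can be sketched for a single voxel as follows. The three-class label set, the mapping from full labels to merged sagittal labels, and the normalisation by the sum of the weights are assumptions chosen for illustration; only the relative weighting (sagittal at one half) is taken from the text.

```python
def expand_sagittal(probs_sag, sag_index_of):
    """Copy each merged sagittal probability to every full label mapped to it."""
    return [probs_sag[s] for s in sag_index_of]


def aggregate_views(p_cor, p_ax, p_sag_full, weights=(1.0, 1.0, 0.5)):
    """Weighted average of per-class probabilities from the three views."""
    total = sum(weights)
    return [
        (weights[0] * c + weights[1] * a + weights[2] * s) / total
        for c, a, s in zip(p_cor, p_ax, p_sag_full)
    ]


def predict_label(probabilities):
    """Index of the class with the highest aggregated probability."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Hypothetical full labels: 0 background, 1 left structure, 2 right structure.
# Hypothetical sagittal labels: 0 background, 1 merged structure.
sag_index_of = [0, 1, 1]
p_sag_full = expand_sagittal([0.1, 0.9], sag_index_of)
p_final = aggregate_views([0.1, 0.6, 0.3], [0.2, 0.3, 0.5], p_sag_full)
label = predict_label(p_final)
```

In this toy case the sagittal view cannot separate left from right, so it supports both lateral classes equally and the coronal and axial views decide the hemisphere.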

2.2.4 Model Learning

Training Dataset: 140 representative subjects from ABIDE-II, ADNI, LA5c, and OASIS (see section 2.1) were selected for training the F-CNN models and 20 subjects from MIRIAD were used for validation. Empty slices were filtered from the volumes, leaving on average 145 single view planes per subject and a total training size of more than 20k images per network. In addition, we use data augmentation (random translation of maximally 16 mm) to artificially increase the training set size further.
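The slice filtering and the resulting training set size can be illustrated with a short sketch. It assumes that a slice counts as empty when every voxel carries the background label 0; this criterion and the helper name are hypothetical.

```python
def non_empty_slices(label_volume, background=0):
    """Keep only slices that contain at least one non-background label."""
    return [
        s for s in label_volume
        if any(value != background for row in s for value in row)
    ]


# Toy label volume with three 2x2 slices; the middle one is empty.
toy_volume = [
    [[0, 2], [0, 0]],
    [[0, 0], [0, 0]],
    [[5, 5], [0, 1]],
]
kept = non_empty_slices(toy_volume)

# Order-of-magnitude check of the figure quoted in the text:
# 140 training subjects with about 145 retained slices each.
approx_training_images = 140 * 145
```

The product gives 20,300 slices per view, consistent with the stated total of more than 20k images per network.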

The training set is balanced with regard to gender, age, diagnosis, and spans various other parameters (i.e. scanners, field strength, and acquisition parameters); the distribution of the subset is presented in Table 1. Sufficient anatomical and acquisition variety in training images can be expected to improve network robustness, generalizability, and ultimately segmentation accuracy on most unseen scans without the need to fine-tune model weights. We will analyze generalizability to unseen datasets below.

Dataset                     Subjects   Age (± SD)         Women, n (%)   Controls, n (%)
LA5c LA5c_dataset           20         35 (± 6.25)        10 (50)        10 (50)
MIRIAD miriad_dataset       20         68.85 (± 5.60)     10 (50)        10 (50)
OASIS-1 oasis_1_dataset     40         37.48 (± 13.54)    21 (52.5)      40 (100)
OASIS-2 oasis_2_dataset     20         77.70 (± 7.23)     11 (55)        10 (50)
ABIDE II abide_dataset      20         25.94 (± 4.89)     10 (50)        10 (50)
ADNI ADNI_dataset           40         74.65 (± 6.71)     20 (50)        20 (50)
Total                       160        53.97 (± 22.19)    82 (51.25)     100 (62.5)

Table 1: Characteristics of the participants (n=160) showing mean (± standard deviation) for continuous and counts (%) for categorical variables.

F-CNN Implementation:

Independent models for the coronal, axial, and sagittal planes are implemented in PyTorch Paszke2017 and trained for 30 epochs using two NVIDIA Titan Xp GPUs with 12 GB RAM and the following parameters: batch size of 16, constant weight decay of 10

, and an initial learning rate of 0.01 decreased by 95 % every 5 epochs. The networks are trained with Adam optimizer adamOptimizer and a composite loss function of median frequency balanced logistic loss and Dice loss sdnet . This loss function encourages correct segmentation of tissue boundaries and counters class imbalances by up-weighting less frequent classes.
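Two of the training ingredients can be sketched in a few lines. The learning rate function reads "decreased by 95 % every 5 epochs" as a multiplication by 0.05 at each step, which is an interpretation of the text rather than a confirmed setting. The class weights follow a simplified form of median frequency balancing (median class frequency divided by the class frequency, computed over one flat label list); the boundary term and the Dice component of the composite loss are not shown.

```python
from collections import Counter
from statistics import median


def learning_rate(epoch, initial=0.01, decrease=0.95, step=5):
    """Step schedule: every `step` epochs the rate drops by `decrease` (95 %)."""
    return initial * (1.0 - decrease) ** (epoch // step)


def median_frequency_weights(labels):
    """Per-class weights so that rare classes are up-weighted.

    Simplified assumption: frequencies are computed over a single flat
    list of voxel labels instead of per image.
    """
    counts = Counter(labels)
    total = len(labels)
    freqs = {cls: n / total for cls, n in counts.items()}
    med = median(freqs.values())
    return {cls: med / f for cls, f in freqs.items()}


# Toy label list: class 0 is frequent, class 2 is rare.
weights = median_frequency_weights([0] * 6 + [1] * 3 + [2] * 1)
```

In the toy example the frequent class receives a weight below 1 and the rare class a weight above 1, which is the effect the loss relies on to counter class imbalance.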

2.3 FastSurfer Pipeline

Based on FreeSurfer methods and novel contributions, we also introduce a surface processing pipeline that integrates our neural network architecture at its core to provide FreeSurfer volume and surface results, including cortical surfaces, thickness maps, and summary statistics in cortical regions following the DKT protocol atlas Klein2012 ; Desikan2006 .

Traditionally, surfaces are generated via a pipeline of several time-consuming processing steps: First, based on a white matter segmentation, which is patched to remove holes and ensure connectivity, initial surface triangle meshes are created for each hemisphere Dale1999 ; Fischl2001 . Meshes are smoothed, mapped to a sphere to localize topological defects (i.e. holes or handles should not exist as each hemisphere should be topologically equivalent to the sphere) Fischl1999a ; Segonne2007 . Once all defects are fixed, surface placement along the white matter is fine-tuned and a second expanded surface (pial surface) is placed at the outer gray matter (GM) boundary, also providing thickness estimates at every point on the cortex Dale1999 ; Fischl2000 . Then, surfaces are carefully mapped to the sphere a second time (minimizing metric distortions), registered to a spherical atlas Fischl1999b , and segmented into cortical parcellations (DKT atlas) Desikan2006 ; Fischl2004b ; Klein2012 .

Based on the DKT volume segmentation available from the FastSurferCNN we modify the above FreeSurfer pipeline to yield surface results of FreeSurfer (including thickness and cortical ROI measures). A significant speed-up compared to FreeSurfer can be achieved by omitting several steps that have become obsolete, such as skull stripping and non-linear atlas registration, given that a high-quality full brain segmentation has already been achieved. Furthermore, we innovate some traditional approaches by developing novel modules based on spectral mesh processing. Specifically:

  1. We use the full DKT brain volume segmentation to create a brainmask by closure, i.e. dilation and erosion, of the labels (including the ventricle label). This mask covers all labeled areas. Cortical regions are padded by one voxel layer to allow the pial surface to find its final position in some partial volume voxel between GM and CSF. Exceptions are the lateral orbital frontal and pars orbitalis to avoid capture of the optic nerve.

  2. We retrospectively construct a quick bias field corrected brain image and a linear Talairach registration as these results are needed later for some relevant statistics (e.g. intracranial vault volume for head size estimation buckner:04 ). Here we follow FreeSurfer, except that we can initialize the NU correct nu_correct with the already existing brainmask.

  3. We generate initial surfaces by using a marching cubes Lorensen1987 algorithm rather than the traditional approach, aiming at higher mesh quality at a slightly reduced number of vertices.

  4. We develop a fast mapping to the sphere using the eigenfunctions of the Laplace operator to perform a spectral embedding of the original white matter surfaces quickly (for the topology fixer). Precisely, we solve the Laplace-Beltrami eigenvalue problem reuter:cad06 ; reuter:ijcv09 on the original cortical surface mesh to obtain the first three non-constant eigenfunctions with the smallest eigenvalues. After correcting sign flips and swaps, these functions parametrize the surface smoothly in anterior-posterior, superior-inferior and lateral-medial directions. The spherical map can then be quickly obtained by projecting the 3D spectral embedding to the sphere, i.e. by scaling the 3D eigenfunction vector to unit length for each vertex.

  5. After topology fixing and GM surface creation, we map the DKT GM segmentations from the image onto the surface and compute surface ROI statistics, such as mean thickness and curvature averages per region - mimicking FreeSurfer’s surface segmentation pipeline without requiring the non-linear spherical atlas registration and segmentation. Spherical atlas registration can, however, be included if cross-subject correspondence is required, e.g. for local surface-based thickness analysis.
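The closure used in step 1 to build the brainmask (dilation followed by erosion) can be illustrated on a small 2D binary mask. The 4-connected structuring element, the single iteration, and the treatment of out-of-grid pixels as background are assumptions; the actual pipeline operates on 3D label volumes and its settings are not given in the text.

```python
# 4-connected neighbourhood including the centre pixel (assumed structuring element).
OFFSETS = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))


def _hit(mask, y, x):
    """True if (y, x) is inside the grid and foreground; outside counts as background."""
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and mask[y][x] == 1


def dilate(mask):
    """A pixel becomes foreground if any pixel in its neighbourhood is foreground."""
    return [[int(any(_hit(mask, y + dy, x + dx) for dy, dx in OFFSETS))
             for x in range(len(mask[0]))] for y in range(len(mask))]


def erode(mask):
    """A pixel stays foreground only if its whole neighbourhood is foreground."""
    return [[int(all(_hit(mask, y + dy, x + dx) for dy, dx in OFFSETS))
             for x in range(len(mask[0]))] for y in range(len(mask))]


def close_mask(mask):
    """Morphological closure: dilation followed by erosion."""
    return erode(dilate(mask))


# A ring of labels with an unlabeled hole in the middle.
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
closed = close_mask(ring)
```

The closure fills the interior hole while leaving the outer extent of the ring unchanged, which is the behaviour wanted for a mask that should cover all labeled areas.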

Overall, this yields a fast alternative to the FreeSurfer pipeline. We will evaluate the speed-up, reliability, and sensitivity of the full FastSurfer pipeline below.
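The mask closure from step 1 (dilation followed by erosion of the label map) can be illustrated on a small 2D grid. This is a hedged sketch in pure Python: the 4-connected neighborhood, the treatment of out-of-grid pixels as background, and all function names are assumptions made for illustration, and the actual pipeline operates on 3D volumes.

```python
def dilate(mask):
    """Binary dilation with a 4-connected (cross-shaped) neighborhood."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out


def erode(mask):
    """Binary erosion; pixels outside the grid count as background."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            keep = True
            for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]):
                    keep = False
            out[r][c] = 1 if keep else 0
    return out


def close_mask(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask))


# 7x7 toy "segmentation": a 3x3 labeled block with an unlabeled center.
labels = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        labels[r][c] = 1
labels[3][3] = 0  # hole inside the labeled region
brainmask = close_mask(labels)
```

The closing fills the interior hole while leaving the outer boundary of the labeled block unchanged.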

2.4 Statistics

We will thoroughly validate the novel FastSurfer pipeline in terms of accuracy, generalizability, reliability and sensitivity using Dice overlap, intraclass correlation and group analyses on volume and thickness ROIs, as well as thickness maps. In the following sections we explain these statistical methods in detail.

2.4.1 Dice score

The Dice similarity coefficient (DSC) is a metric to evaluate the segmentation performance of the deep learning networks and can mathematically be expressed as

$$\mathrm{DSC}(G, P) = \frac{2\,|G \cap P|}{|G| + |P|} \tag{8}$$

with binary label maps of ground truth G and prediction P (pixels of the given class indicated with 1, all others with 0). Here, the DSC is used in two ways: first, to directly compare the performance of different network architectures against each other, and second, to estimate similarity of the predictions achieved with FastSurferCNN and ground truth (FreeSurfer v6.0) for a number of previously unseen datasets (generalizability). The DSC will be calculated separately for each cortical and sub-cortical structure. Note that neural networks tend to smooth results, e.g., removing segmentation noise such as incorrect protrusions that are encountered in only a few training images and usually in random locations. While this kind of smoothing can improve segmentation accuracy, it can decrease the DSC, which is partially affected by noise in the ground truth data. This is one reason why it is essential to perform additional validations (such as reliability or sensitivity analysis).
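As a minimal sketch of the per-structure evaluation described above, the following computes the DSC as twice the overlap divided by the summed sizes of the two binary maps. The function names, the flat-list representation of the label volumes, and the convention that two empty maps score 1 are assumptions for illustration.

```python
def dice(ground_truth, prediction):
    """Dice similarity coefficient of two flat binary label maps."""
    if len(ground_truth) != len(prediction):
        raise ValueError("label maps must have the same number of voxels")
    overlap = sum(g * p for g, p in zip(ground_truth, prediction))
    total = sum(ground_truth) + sum(prediction)
    # Convention (assumption): two empty maps agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


def dice_per_class(gt_labels, pred_labels, classes):
    """Binarize multi-class label maps per structure and score each one."""
    return {
        k: dice([int(v == k) for v in gt_labels],
                [int(v == k) for v in pred_labels])
        for k in classes
    }


# Toy example with two structures (labels 1 and 2) on six voxels.
gt = [0, 1, 1, 2, 2, 2]
pred = [0, 1, 2, 2, 2, 0]
scores = dice_per_class(gt, pred, classes=[1, 2])
```

Averaging the per-class scores over the subcortical and the cortical label sets gives the two summary values reported per dataset.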

2.4.2 Intraclass correlation coefficient

The intraclass correlation coefficient (ICC) is a widely used metric to assess both the degree of correlation and the agreement between measurements. Thus, it is an ideal metric to judge the reliability of a given method. The ICC ranges from 0 to 1, with values close to 1 representing high reliability. Here, we use the degree of absolute agreement among measurements, also known as criterion-referenced reliability ICC, to compare the FastSurfer pipeline (deep learning segmentation + post-processing) to FreeSurfer. To this end, we calculate the agreement between cortical thickness and subcortical volumes in consecutive scans using the OASIS1 test-retest set. Prior to ICC calculations, volume and thickness estimates of the subcortical and cortical structures are extracted from the segmented brains. After averaging across hemispheres, the ICC as well as the upper and lower bound with level of significance are calculated for each region using the method described in ICC . Additionally, cortical thickness maps are mapped to a common space (fsaverage) and smoothed at 15 FWHM before calculating the ICC separately for each hemisphere. The resulting overlay maps are visualized on the semi-inflated fsaverage surfaces.
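An absolute-agreement ICC for test-retest data can be sketched as below. The sketch assumes the single-measurement, two-way form often written ICC(A,1), computed from the ANOVA mean squares; whether this is exactly the variant used in the paper is an assumption, and the confidence bounds are omitted. The function name `icc_a1` and the example values are hypothetical.

```python
def icc_a1(measurements):
    """ICC(A,1): two-way model, absolute agreement, single measurement.

    `measurements` has one row per subject and one column per scan session.
    """
    n = len(measurements)          # subjects
    k = len(measurements[0])       # repeated scans per subject
    grand = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    col_means = [sum(row[j] for row in measurements) / n for j in range(k)]

    # Sums of squares for subjects (rows), sessions (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in measurements for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical test-retest values for three subjects and two scans.
identical = icc_a1([[4.0, 4.0], [3.0, 3.0], [5.0, 5.0]])
offset = icc_a1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
```

Identical repeat measurements give an ICC of 1, whereas a constant offset between the two sessions lowers the value even though the scans are perfectly correlated. This is the property that distinguishes absolute agreement from pure consistency.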

2.4.3 Group analysis

A segmentation method can potentially reach high ICC while being insensitive to actual effects in the data. Therefore, it is important to validate the sensitivity of a given method with respect to its capability to detect known significant variations in brain morphology between diagnostic groups. Here, we measure the degree to which FastSurfer and FreeSurfer are sensitive to differences in cortical thickness (derived from surfaces based on the deep learning segmentations) of cognitively normal controls (CN) and subjects diagnosed with Alzheimer’s disease (AD) or mild cognitive impairment (MCI). In FreeSurfer, cortical thickness is calculated as the minimal distance between the white matter and pial surface Fischl2000 . Prior to statistical analysis, thickness estimates of each subject were mapped to a common space (fsaverage) and smoothed at 15 FWHM. Using FreeSurfer’s generalized linear model fit mri_glmfit, we analyze the relation between the vertex-wise cortical thickness and the dementia status while correcting for gender, age and, in the case of the volume-based calculations, head size. Significance cut-off is set to p < 0.05 without correction for multiple comparisons as we are only interested in the relative differences between the two methods (FastSurfer and FreeSurfer). The p-value maps are then displayed on the semi-inflated fsaverage surfaces. Furthermore, we calculate and report the p-values of all cortical and sub-cortical structures with . Here, we average the thickness and volume estimates across hemispheres prior to the statistical analysis.

3 Results

3.1 Accuracy

First we evaluate segmentation performance by comparing FastSurfer segmentations with “ground truth” FreeSurfer results. FreeSurfer has been selected to provide ground truth for the following reasons: 1. manual segmentations are extremely labor intensive. 2. Comparability to FreeSurfer should be maintained (training on different segmentation protocols will void any direct comparison). 3. Complex 3D structures such as cortical folds are difficult to segment manually when viewing 2D slices while automated placement of 3D surface models has the potential to outperform manual raters. To evaluate segmentation accuracy, we compute the DSC of each candidate method with FreeSurfer labels on five testsets (ADNI, OASIS1, HCP, MIRIAD and THP). Here, scans from subjects used for network training and validation (i.e. 40 subjects each from OASIS and ADNI, 20 from MIRIAD) are excluded. No subjects from HCP and THP have been included in the training set at any point. Five subjects from OASIS1 were further excluded from the testset due to heavy white matter lesion load and resulting topological surface defects in FreeSurfer and FastSurfer.

We benchmark our proposed network against traditional whole-brain segmentation F-CNNs, namely SDNet sdnet and QuickNAT quicknat . Additionally, we incrementally test the importance of our network modifications. First, we evaluate the effects of competition within the dense blocks and across the long-range skip connections (CDB, see section 2.2.1). Second, we increase the information input to the network by passing the stacked 7-channel image to the network (spatial information aggregation (SPI), see section 2.2.2). Both architecture changes together comprise our final proposed FastSurferCNN. To permit a fair comparison, all benchmark networks were trained on the same data and follow the same architectural design of 4 encoder and decoder blocks separated by a bottleneck block. Each block contains the same convolutional layer architecture as illustrated in Figure 2. The baseline architectures were further suitably adapted by modifying the final classification layer to predict 78 classes as the original implementations do not target cortical parcellations and hence comprise a much lower number of output labels (27 for QuickNAT quicknat and 26 for SDNet sdnet ). Furthermore, all comparative models were implemented with the above-mentioned view aggregation (see section 2.2.3). Care was taken to confirm that the adaptations are acceptable to the first author of the original papers.

Figure 4: Dice score comparison of baselines and the proposed FastSurferCNN on four different datasets (mean ± standard deviation). Network modifications (i) competitive dense blocks (CDB) and (ii) spatial information aggregation (SPI) are incrementally tested. The final FastSurferCNN (dark blue, CDB + SPI) outperforms all other models on both subcortical and cortical structures.

In Figure 4, we report the DSC against 649 previously unseen scans from ADNI (180 subjects), OASIS1 (370 subjects), HCP (45 subjects), MIRIAD (49 subjects) and THP (5 subjects). We calculate the average DSC on 33 subcortical structures and 62 cortical regions (31 per hemisphere after remapping). For a complete list of structures the reader is referred to Appendix Table A.1. Each successive network modification results in an increase of the DSC for all five datasets. Introduction of competition within the network (blue, CDB) already outperforms QuickNAT quicknat (green) and SDNet sdnet (light green) with an up to 0.3 % improvement for both subcortical and cortical structures. On average, competition increases the DSC to 88.74 and 84.55 for the subcortical and cortical structures, respectively. Note that this improvement is achieved while simultaneously reducing the number of trainable parameters by one half (from approx.  to ). The final FastSurferCNN (dark blue) further increases segmentation accuracy on average by 0.6 % on the subcortical and 1.9 % on the cortical structures compared to QuickNAT (final DSC of 89.08 and 85.88). Therefore, increasing the local information content provided to the network via the spatial information aggregation is particularly useful for recognizing cortical folding patterns. The same trend can be observed when analyzing the worst instead of the average DSC (data not shown). Statistical testing further confirmed a highly significant increase in DSC for both improvements (competition and information aggregation) compared to QuickNAT (Wilcoxon signed-rank test, after Bonferroni correction for multiple testing).

FastSurferCNN also outperforms all other models on the challenging THP dataset. This data source contains scans from eight different sites and scanning conditions with strong variations in data quality (e.g. motion artifacts). FastSurferCNN, however, maintains high accuracy for all eight sites (average DSC 89.00 (QuickNAT: 88.37) for subcortical and 86.16 (QuickNAT: 84.43) for cortical structures). Interestingly, MRI data acquired on Philips scanners (DART, JHU and UW) are segmented quite accurately even though the training set predominantly included Siemens scans. Additionally, the difference in accuracy between the imaging sites is far lower for FastSurferCNN than for QuickNAT on the cortical structures (40 % less difference between best (JHU) and worst (IOWA) site). Notably, the algorithm also generalizes well to defaced and downsampled input images (HCP dataset) even though such examples are absent from the training set (DSC of 85.21 for the cortical and 87.36 for the subcortical structures).

Figure 5: Surface dice score per region of the proposed FastSurferCNN for the 31 cortical parcels. FastSurferCNN achieves accurate results across structures with an average DSC of above 92.

Inherently, FreeSurfer segments cortical parcels directly on the surface, which specifically considers the curvature, e.g. to locate region boundaries inside the sulci. Thus, cortical segmentation errors may not be accurately reflected in a volume-based DSC comparison. We, therefore, also calculate the surface-based DSC for OASIS1. To this end, the volumetric segmentations of FastSurferCNN are projected onto the FreeSurfer-generated surface (considered as ground truth). The area-related DSC of the mapped regions is then directly calculated on the surface. As visible in Figure 5, surface segmentation is indeed quite similar to FreeSurfer with an average surface DSC of 92.38 on the right and 92.60 on the left hemisphere. Further, all structures achieve a DSC of above 85.5.
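One plausible reading of an area-related surface DSC is a Dice overlap in which each vertex contributes its associated surface area rather than a count of one. The sketch below follows that reading; the per-vertex area weighting, the function name and the toy values are assumptions and are not taken from the FastSurfer code.

```python
def area_weighted_dice(labels_a, labels_b, areas, region):
    """Dice overlap of one region, weighting each vertex by its surface area."""
    area_a = area_b = area_both = 0.0
    for a, b, area in zip(labels_a, labels_b, areas):
        in_a, in_b = a == region, b == region
        area_a += area * in_a
        area_b += area * in_b
        area_both += area * (in_a and in_b)
    total = area_a + area_b
    # Convention (assumption): a region absent from both maps scores 1.
    return 1.0 if total == 0 else 2.0 * area_both / total


# Four vertices with hypothetical areas and two parcels (labels 1 and 2).
reference = [1, 1, 2, 2]   # e.g. surface-based parcellation
projected = [1, 2, 2, 2]   # e.g. volume labels projected to the surface
vertex_area = [2.0, 1.0, 1.0, 1.0]
dsc_region1 = area_weighted_dice(reference, projected, vertex_area, region=1)
dsc_region2 = area_weighted_dice(reference, projected, vertex_area, region=2)
```

With uniform vertex areas this reduces to the ordinary vertex-count Dice score.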

3.2 Generalizability

High generalizability will ensure that the proposed method can be applied across different sites, vendors, field strengths, and for large multi-center studies. Figure 4 indicates that networks generalize well across these parameters and respective image qualities, as the DSC remains quite stable. For example, the HCP dataset consists of 0.7 mm isotropic images, downsampled to 1 mm and de-faced, which were never encountered during training. MIRIAD was acquired with an IR-FSPGR sequence on a GE scanner. Furthermore, in the THP dataset DSC scores vary only around 1 or 2 % across the 8 sites spanning Siemens and Philips scanners. The five subjects of THP, however, might not be representative, which is why in this section we quantify generalizability by computing the agreement of FastSurferCNN with FreeSurfer across different scanner types (Siemens, Philips and GE) as well as disease states (CN, MCI and AD patients) in a larger dataset. For this purpose we employ an independent testset consisting of 180 scans from ADNI balanced with regard to vendor, disease group, gender and age.

Figure 6: Comparison of the DSC (mean ± standard deviation) across neurodegenerative states (cognitive normal (CN), mild cognitive impaired (MCI), demented (AD); top) and vendor (GE, Philips, Siemens; bottom). FastSurfer achieves high accuracy and low variability across all of them.

In the upper part of Figure 6 the DSC across different disease states is shown. Scans from subjects with later-stage dementia are more difficult to segment, potentially due to increased motion and lower gray-white contrast. Specifically, the increase in ventricle volume, subsequent shrinkage of GM and/or increased white matter lesion load can have a profound effect and are frequently difficult to segment with traditional neuroimaging pipelines such as FreeSurfer. A small decrease in segmentation DSC (i.e. increased deviation from FreeSurfer) can be observed with disease progression (CN to AD) in our proposed method (cortical structures: DSC decreases from 86.8 to 85.8, subcortical structures: DSC decreases from 90.3 to 89.6). However, with FastSurferCNN DSC scores do not decrease more than 1.11 % between diseased (MCI, AD) and control state (CN) for cortical and 0.77 % for subcortical structures. A similar trend is observed with regard to generalizability across vendors (lower part of Figure 6). Even though FastSurferCNN was predominantly trained on MRI scans acquired on Siemens scanners, the segmentations of Philips and GE scans are comparable to FreeSurfer with only a minor decrease in DSC of 1.6 % for the cortical structures (see Figure 6). Here, GE and Philips scans can be segmented similarly well (DSC of 85.50 and 85.57, respectively). Segmentation of subcortical regions is equally consistent on Siemens and GE scans (DSC of 90.30 and 90.23, respectively). Philips scans show more variation with a difference similar to the one observed for the cortical structures (1.6 % difference compared to Siemens; DSC of 88.85). These findings in combination with the high DSC scores achieved on MIRIAD (1.5T GE Scanner) and HCP (downsampled and defaced 0.7 mm 3T Siemens), see Figure 4, show that FastSurferCNN is capable of generalizing stably across disease stages, field strength, vendors and pre-processing.

While very few GE scans were used in training (only 3 cases of ADNI), the network has still seen at least some images spanning the above parameters, which might explain its good generalizability. All datasets so far acquire some kind of MPRAGE sequence, except for MIRIAD with IR-FSPGR (note, MIRIAD was not used in training, only for validation). We now test generalizability across sequences to an unseen MEF sequence (MMND dataset, see Section 2.1). This dataset provides 16 subjects each with MEF and MPRAGE scans. We first confirm, see Figure 7 (orange column), that FastSurferCNN obtains a high DSC in comparison to FreeSurfer on the MPRAGE images, again corroborating good generalizability to Philips scanners. A drop of 6.7 percentage points in DSC can be observed when comparing both segmentation methods on the MEF images (red column). Reduced agreement on MEF images can, of course, be explained by a deviation of either (or both) methods from ground truth. In the absence of real ground truth, we set FreeSurfer’s MPRAGE segmentation as the standard. To test how well FreeSurfer generalizes to MEF, we compare FreeSurfer’s own segmentations across MEF and MPRAGE after robust rigid registration reuter2010robustreg and observe a significantly reduced DSC (82.37 on subcortical and 75.70 on cortical structures) (Figure 7 dark blue). In comparison, FastSurferCNN’s MEF segmentation (using the same registrations) slightly outperforms FreeSurfer’s generalizability as it is actually closer to ground truth (FreeSurfer MPRAGE) than FreeSurfer itself for subcortical structures (DSC of 83.17) and similar for the cortex (DSC of 75.67) (Figure 7 bright blue). These results highlight an excellent generalizability of FastSurfer to the unseen T1-weighted MEF sequence.

Figure 7: Comparison of segmentation accuracy between different image acquisition sequences (MPRAGE versus Multi-echo FLASH (MEF)). FreeSurfer’s MPRAGE segmentations are considered ground truth. FastSurferCNN generalizes well to MEF acquired images and achieves results closer to FreeSurfer’s MPRAGE than FreeSurfer itself on the subcortical structures.

All above DSC comparisons with FreeSurfer assume accurate performance of FreeSurfer. This is of course not certain, as FreeSurfer’s segmentation quality can degrade across scanners, sequences, or advanced neurodegeneration. Whenever “ground truth” cannot be trusted, it is difficult to quantify performance with direct comparisons, as a small DSC can also indicate noisy or erroneous ground truth. Therefore, in the next sections we also perform validations that are independent of ground truth labels and instead rely on the assumptions that (i) anatomy does not change much in small time frames (test-retest reliability) and that (ii) known disease effects should be detected with statistical power by the new method (sensitivity). For these comparisons we run the full FastSurfer pipeline, extending the FastSurferCNN with subsequent surface processing.

3.3 Reliability

Test-retest reliability is assessed as the agreement between the evaluations of two scans in a short time frame. We calculate the intraclass correlation coefficient ICC on the OASIS1 test-retest dataset with 20 participants. Note that the acquisition-related sources of variation (motion, noise etc.) will be identical for different image processing methods. Higher agreement can therefore be taken as an index of method stability and consistency of results.

Figure 8 shows the ICC value for each structure separately including the upper and lower bound at significance level α = 0.05 (black error bar) for FastSurfer (dark blue bars) and FreeSurfer (light blue bars). A higher ICC indicates a better reliability with the maximum being 1. The ICC values for the volume of subcortical structures are higher for FastSurfer on all 13 structures (0.99 on average versus 0.97 with FreeSurfer). On six structures (including Hippocampus, Putamen, Caudate and Thalamus) the agreement between scans is above 0.99. Furthermore, all confidence intervals (lower to upper bound) are smaller for FastSurfer indicating better segmentation consistency across the 20 participants. The ICC values for the thickness of cortical regions show a similar pattern, with FastSurfer achieving higher ICC on both hemispheres in 30 out of 31 regions. On average, an ICC of 0.92 is achieved with FastSurfer (0.87 with FreeSurfer) and 24 structures show a correlation of above 0.9. No cortical structure has an ICC below 0.78 (0.73 for FreeSurfer). Correspondingly, visualization of the ICC directly on the surface (see Figure 9) demonstrates that FastSurfer segmentations yield larger regions on the cortex with high ICC values (light blue) compared to FreeSurfer. Here, it is further apparent that the majority of the cortex reaches ICC values of more than 0.8 (blue areas).

Figure 8: Intraclass correlation coefficient on the Test-Retest OASIS1 dataset for FastSurfer (dark blue) and FreeSurfer (light blue). Error bars indicate upper and lower bound of the calculated ICC (significance level α = 0.05).
Figure 9: Visualization of the intraclass correlation coefficient on the Test-Retest OASIS1 dataset for FreeSurfer (left) and FastSurfer (right). ICC ranges from 0.8 (dark blue) to 1.0 (light blue) are shown.

3.4 Sensitivity to Disease Effects

Sensitivity to disease effects is an essential component of our evaluation. While the accuracy of our approach (as quantified by the DSC) has already been established, we now determine to what extent our results are relevant and useful in an applied research setting. To this end, we analyze whether our proposed method is capable of reproducing well-known empirical results, in particular the effects of AD on brain structure. The sensitivity of the proposed methods to real effects is determined by evaluating their ability to separate diagnostic groups (here: AD versus CN), as indicated by the p-value, a composite measure of the size of an effect and the accuracy of its estimation. The OASIS1 dataset presents a suitable test set for this purpose and was used for the ensuing group analysis. In Figure 10 structures with p-values below are shown (based on both FreeSurfer and FastSurfer). The signed p-values presented in the figure indicate the direction of the effect (values below zero represent atrophy, values above zero volume increase). Consequently, one can directly observe that the ventricle volume (lateral ventricles, inferior lateral ventricles) increases while all other structures atrophy for both FreeSurfer and FastSurfer. In the subcortical domain, a volume reduction is specifically detected for the hippocampus, the amygdala and the thalamus, which is congruent with other research results on AD deJong2008 ; Henneman2009 ; Schuff2009 ; Poulin2011 ; Aggleton2016 ; Pini2016 . FastSurfer reaches lower p-values for all three structures indicating a higher sensitivity to differences between the groups. FastSurfer and FreeSurfer are further capable of detecting significant differences in areas related to disease progression in the cortex (e.g. bilateral frontal, temporal and parietal lobe; Braak1995 ; Baron2001 ; Wenk2003 ; Lerch2005 ; Poulin2011 ). Specifically, parts of the temporal (superiortemporal, middletemporal, inferiortemporal, entorhinal) and parietal lobes (inferiorparietal, supramarginal) are significantly thinner (p detected with FastSurfer for all areas). The overall thickness also correlates with disease progression (MeanThickness p for FastSurfer). Figure 11 depicts the detected differences of cortical thinning in patients with AD compared to CN subjects with the original FreeSurfer stream (left) and our proposed FastSurfer pipeline (right) directly on the surface. The visualization complements the results of Figure 10, clearly indicating the ability of FreeSurfer and FastSurfer to detect thinning effects across hemispheres. Again, the differences between groups are more pronounced with the FastSurfer pipeline (uncorrected p-value smaller, larger yellow regions). The proposed pipeline is thus not only able to separate groups effectively, but is more sensitive to disease effects than the baseline FreeSurfer pipeline and thus holds great promise for future analysis tasks.

Figure 10: Significance (p-value) of cortical thickness changes in disease groups on the OASIS1 dataset for FastSurfer (red) and FreeSurfer (light red). Structures with p-values below are shown. Effect directions are indicated by the sign (atrophy: negative values, enlargement: positive values). Volume of ventricles increases while all other structures show atrophy. FastSurfer and FreeSurfer detect significant changes in areas related to disease progression.
Figure 11: Group Analysis of cortical thickness variations in Alzheimer’s disease compared to controls based on the OASIS1 dataset for FreeSurfer (left) and FastSurfer (right). The color-coded uncorrected p-value map ranges from 0.05 (red) to (yellow). Differences in cortical thinning are more pronounced in the FastSurfer analysis stream.

3.5 Pipeline Innovations

As described in Section 2.3, FastSurfer is a full surface reconstruction pipeline based on FreeSurfer. In addition to skipping or replacing some steps, the two main modifications are (i) reconstructing surfaces with the marching cube algorithm and (ii) implementing a novel, fast spectral mapping to the sphere. Here we compare these two changes with a pipeline that uses the original FreeSurfer modules (mri_tessellate and mris_sphere) in these two steps and is otherwise identical to FastSurfer. We quantify a) the number of topological surface defects, b) the overall processing time for the surfaces, and c) the average quality of the surface triangle meshes, which is defined for each triangle as $Q = 4\sqrt{3}\,A / (e_1^2 + e_2^2 + e_3^2)$, where $A$ is the triangle area and $e_i$ the edge lengths. $Q$ is 1 for the equilateral triangle and close to zero for degenerated triangles.
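The mesh quality index can be sketched in a few lines. The sketch assumes the common normalization Q = 4·sqrt(3)·A / (e1² + e2² + e3²), which satisfies the stated properties (1 for an equilateral triangle, near zero for degenerate ones); the function names and the tuple-based mesh representation are hypothetical.

```python
import math


def triangle_quality(p0, p1, p2):
    """Q = 4*sqrt(3)*A / (e1^2 + e2^2 + e3^2) for one 3D triangle."""
    def sub(u, v):
        return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

    def sq_norm(u):
        return u[0] ** 2 + u[1] ** 2 + u[2] ** 2

    e1, e2, e3 = sub(p1, p0), sub(p2, p1), sub(p0, p2)
    # Triangle area is half the norm of the cross product of two edges.
    cross = (e1[1] * e3[2] - e1[2] * e3[1],
             e1[2] * e3[0] - e1[0] * e3[2],
             e1[0] * e3[1] - e1[1] * e3[0])
    area = 0.5 * math.sqrt(sq_norm(cross))
    return 4.0 * math.sqrt(3.0) * area / (sq_norm(e1) + sq_norm(e2) + sq_norm(e3))


def mean_quality(vertices, triangles):
    """Average Q over all triangles of a mesh."""
    qs = [triangle_quality(*(vertices[i] for i in tri)) for tri in triangles]
    return sum(qs) / len(qs)


equilateral = triangle_quality((0, 0, 0), (1, 0, 0), (0.5, math.sqrt(3) / 2, 0))
sliver = triangle_quality((0, 0, 0), (1, 0, 0), (2, 0, 0))
right_angled = triangle_quality((0, 0, 0), (1, 0, 0), (0, 1, 0))
```

The equilateral triangle scores 1, the collinear sliver scores 0, and the right-angled isosceles triangle falls in between at sqrt(3)/2.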

To evaluate the two new modules, the following statistics are based on the OASIS1 dataset. The average number of defects in the pipeline with FreeSurfer modules (27.2 defects per hemisphere) decreased by 12 % (24.0 defects) when constructing the surfaces with marching cube, and by 15.3 % (23.1 defects) when the proposed pipeline (FastSurfer: marching cube + spectral spherical projection) was used. The average surface processing time also decreased significantly, by 15 minutes per hemisphere, with FastSurfer. Finally, the triangle mesh quality index of FastSurfer (Q=0.902) was significantly higher than for the pipeline with the traditional modules (Q=0.888), due to the marching cube algorithm. Additionally, construction of surfaces with marching cube instead of mri_tessellate slightly reduces the number of vertices on average by 120 per hemisphere (133154 versus 133034 for tessellate and marching cube, respectively). All three tests (defects, time, quality) of the final FastSurfer pipeline with respect to the one with original modules are highly significant: the Wilcoxon signed-rank test reports a p-value below machine precision (p ).

Pipeline    Defects    Quality    Time/hemi (min)
orig        27.2       0.888      41.6
+ mc        24.0       0.902      25.7
+ qspec     23.1       0.902      25.4
Table 2: Average number of topological defects, mesh quality and processing time per hemisphere when using original FreeSurfer modules, marching cube, and spectral spherical projection in the FastSurfer pipeline.

Overall, in addition to these methodological innovations, our pipeline saves time by replacing many FreeSurfer steps, such as skull stripping, spherical segmentation etc., since we can directly build on the high-quality image segmentations provided by the deep neural network. Here, we compare run-times of three approaches: (i) complete regular FreeSurfer processing, (ii) FastSurfer pipeline without spherical registration, and (iii) complete FastSurfer with spherical registration. Note that while spherical registration is not needed to obtain surface segmentations and ROI thickness measures in FastSurfer, it is required to construct cross-subject correspondence, e.g., when performing vertex-wise surface thickness analyses. All pipelines are evaluated on identical subjects (10 representative subjects from OASIS1, balanced with regard to gender (5 male, 5 female), age range (21 to 86 years), and diagnosis (4 AD, 6 CN)) and identical hardware (CPU: Intel Xeon Gold 6154 @ 3 GHz) using both sequential processing and parallel processing (4 threads and simultaneous processing of the two hemispheres).

In our test a complete FreeSurfer run takes approximately 7h (4h parallel) on the CPU, which can vary depending on image quality, disease severity etc. The proposed FastSurfer pipeline achieves the volumetric segmentation in only 1 minute (on the GPU, 14 min on the CPU), surface processing including cortical ROI thickness measures in 1.7h (0.9h parallel), or complete processing including the spherical registration in 3.7h (1.6h parallel) on the CPU.

Pipeline                   sequential        parallel
FreeSurfer                 424 min (± 37)    244 min (± 20)
FastSurfer                 104 min (± 20)    54 min (± 27)
FastSurfer + sphere.reg    223 min (± 53)    97 min (± 31)
Table 3: Runtime comparisons of the different pipelines (recon-all, recon-surf and recon-surf including registration (reg) of the orig surface to the spherical atlas for cross-subject correspondence). Average processing times and standard deviation in minutes. Parallel = parallelization of hemispheres, 4 threads
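The hour figures quoted in the text and the resulting speed-up factors follow directly from the mean runtimes in Table 3. The short script below reproduces that arithmetic; it only uses the table values, and the helper names `speedup` and `hours` are introduced here for illustration.

```python
# Mean runtimes in minutes, copied from Table 3.
runtimes = {
    "FreeSurfer": {"sequential": 424, "parallel": 244},
    "FastSurfer": {"sequential": 104, "parallel": 54},
    "FastSurfer + sphere.reg": {"sequential": 223, "parallel": 97},
}


def speedup(pipeline, mode):
    """Ratio of the FreeSurfer runtime to the given pipeline's runtime."""
    return runtimes["FreeSurfer"][mode] / runtimes[pipeline][mode]


def hours(minutes):
    """Convert minutes to hours, rounded to one decimal place."""
    return round(minutes / 60, 1)


for name in ("FastSurfer", "FastSurfer + sphere.reg"):
    for mode in ("sequential", "parallel"):
        print(f"{name} ({mode}): {hours(runtimes[name][mode])} h, "
              f"{speedup(name, mode):.1f}x faster than FreeSurfer")
```

On these mean values, FastSurfer without spherical registration is roughly four times faster than FreeSurfer in both modes, and about two times faster when the registration is included.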

4 Discussion

In this work we introduce a novel semantic segmentation neural network architecture and a fast pipeline for the processing of neuroanatomical surfaces that outperforms FreeSurfer with respect to runtime, reliability and sensitivity. We contribute a novel deep learning architecture (FastSurferCNN) by introducing competition into the network (within dense blocks and long-range skip connections) and increasing the initial information content provided to the network via spatial information aggregation. Competition significantly reduces the number of network weights resulting in a slimmer architecture with lower memory requirements. We demonstrate its superior performance for the fast and detailed (close to 100 structures) segmentation of whole brain MRI compared to existing deep learning approaches. FastSurferCNN outperforms QuickNAT as well as SDNet in terms of accuracy by a significant margin. Across five different datasets our network achieves the highest DSC on the subcortical as well as cortical structures (89.08 and 85.87 on average). The lower DSC for cortical structures compared to subcortical structures can probably be attributed to the highly folded geometry of the cortex, as well as missing intensity gradients across neighboring cortical regions (GM intensity is relatively uniform). Overall, the processing time for segmenting a 3D 1mm isotropic MRI brain scan is kept below one minute. Fast MRI segmentation opens up multiple avenues of potential applications, ranging from direct feedback or field-of-view localization during image acquisition to fast clinical decision support by quantitative personalized measurements and scalability to very large cohort data sets. Many such applications require no surface models and can terminate after the 1 minute image segmentation step, allowing rapid processing of the incoming data.

One frequently quoted limitation of learning-based approaches is the uncertain generalizability beyond image types encountered during training. This limitation is valid and it remains unclear how far networks generalize, e.g., to different sequences, disease or age groups. Various domain-adaptation approaches have been introduced to accommodate fine-tuning a network to a new type of data, and should be considered when network performance degrades. In this work, we put an emphasis on evaluating generalizability of our method. We first demonstrate good generalizability to different sites, vendors, field strengths, scanner types, and across disease groups. Analysis on HCP further highlights generalizability to down-sampled and de-faced high-res images, which were never encountered during training. Furthermore, we were able to demonstrate good generalizability to a completely unseen multi-echo FLASH sequence, even outperforming FreeSurfer. These results are very promising, yet we recommend that users visually inspect images to ensure good quality for their acquisition setting, as good generalizability to any T1-weighted sequence is certainly not guaranteed.

Extending the image segmentation network, our full FastSurfer pipeline permits the fast analysis of cortical thickness (vertex-wise and region-wise) following the DKT atlas. This is achieved by both optimizing and replacing multiple steps of the FreeSurfer pipeline, e.g. by mapping segmentation results from the image to the surfaces. Processing of a single MRI volume with parallelization can thus be achieved in under 1h including thickness ROI analysis, and 1.6h including surface registration for cross-subject correspondence - a fraction of the time a whole FreeSurfer run needs to complete (4h with parallelization). Some of this speed-up can also be attributed to the reduced number of detected topological defects. Marching cube already seems to reduce the number of defects on the initial surfaces, and the new spectral spherical mapping helps to reduce detected defects further, potentially due to the smooth embedding of the eigenfunctions and resulting reduction of self-folds. Future work will focus on increasing processing speed further, e.g., by including deep-learning based registration procedures. Also note that ongoing activities to parallelize and speed up traditional FreeSurfer code will directly impact multiple components of the FastSurfer pipeline (such as the topology fixer, surface reconstruction, cross subject registration etc.).

Our extensive validation of FastSurfer further includes test-retest reliability and sensitivity studies. FastSurfer exhibits improved test-retest reliability relative to FreeSurfer. This is reflected in higher ICC values across both hemispheres for FastSurfer (on average 0.92 for all cortical and 0.99 for all subcortical structures). Given that increased reliability can be bought by extensive smoothing, potentially at the cost of sensitivity, we evaluate FastSurfer’s sensitivity to known disease effects in AD. Here, we can replicate group differences between CN and AD patients with high sensitivity. Specifically, AD-related significant volume reductions in amygdala and hippocampus, increased ventricle volume, as well as cortical thinning in the temporal and parietal lobes were detected with both FastSurfer and FreeSurfer.
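For readers unfamiliar with the intraclass correlation coefficient, the sketch below shows how a test-retest ICC can be computed from repeated measurements of one structure. It assumes the two-way, absolute-agreement, single-measurement form, ICC(A,1), from McGraw and Wong; this section does not state which ICC variant was used, so that choice, the function name `icc_a1`, and the example volumes are assumptions for illustration only.

```python
import numpy as np


def icc_a1(measurements):
    """ICC(A,1): two-way model, absolute agreement, single measurement.

    measurements: array of shape (n_subjects, k_sessions), e.g. the volume
    of one structure measured in a test and a retest scan per subject.
    """
    x = np.asarray(measurements, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1, keepdims=True)  # per-subject means
    col_means = x.mean(axis=0, keepdims=True)  # per-session means

    # Mean squares of the two-way ANOVA decomposition.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    residual = x - row_means - col_means + grand
    ms_err = np.sum(residual ** 2) / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + (k / n) * (ms_cols - ms_err)
    )


# Hypothetical volumes (arbitrary units) for three subjects, two sessions.
identical_sessions = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
offset_sessions = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]

print(icc_a1(identical_sessions))  # perfect agreement
print(icc_a1(offset_sessions))     # a constant session offset lowers agreement
```

The absolute-agreement form penalizes systematic differences between sessions, which is why the second example scores below the first even though subjects are ranked identically.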

Interestingly, and in spite of using FreeSurfer for training, FastSurfer outperforms FreeSurfer with respect to statistical power in these discrimination tasks, most likely due to its implicit noise reduction: while consistent errors in FreeSurfer segmentations will be learned, random segmentation noise such as local inaccuracies or protrusions is averaged out, which might allow the network to achieve superior results. The inherent training paradigm of FastSurferCNN can be considered another contributing factor. During training, the network has been exposed to various pathological scans with high anatomical and acquisition variability, in contrast to the limited number of cases (40) within the FreeSurfer atlas. The larger corpus likely improves the resulting segmentations and the derived volume and thickness estimates. In fact, it is remarkable that the 140 training cases (plus augmentation) are sufficient to provide these excellent results. This is put into perspective by considering that 20k 2D images (some of them highly correlated, of course) are used for training each view. Still, it can be expected that training with more cases will improve accuracy and generalizability further, leaving space for future exploration.

Finally, one of the major advantages of supervised learning over traditional pipelines is that consistent errors can be removed by manually fixing existing training cases or adding new ones. This is in stark contrast to model-based pipelines, where updates or fixes to the algorithm can only be introduced by a handful of experts and often have unintended consequences. Future work can thus explore training on very large and heterogeneous datasets, as well as the inclusion of manual labels or manually corrected automated labels to improve segmentation quality even further.

Overall, we introduce a fast, stable, reliable and sensitive pipeline for automated neuroimage analysis that scales well to large datasets and enables various new applications where segmentation speed is essential, for example, to localize structures during image acquisition, to provide quantitative measures in clinical workflows, or to process large cohort studies efficiently.

5 Acknowledgment

Support for this research was provided in part by NIH R01NS083534, R01LM012719, and by an NVIDIA Hardware Award as well as the BRAIN Initiative Cell Census Network grant U01MH117023, the National Institute for Biomedical Imaging and Bioengineering (P41EB015896, 1R01EB023281, R01EB006758, R21EB018907, R01EB019956), the National Institute on Aging (1R56AG064027, 5R01AG008122, R01AG016495), the National Institute of Mental Health, the National Institute of Diabetes and Digestive and Kidney Diseases (1-R21-DK-108277-01), the National Institute for Neurological Disorders and Stroke (R01NS0525851, R21NS072652, R01NS070963, R01NS083534, 5U01NS086625, 5U24NS10059103, R01NS105820), and was made possible by the resources provided by Shared Instrumentation Grants 1S10RR023401, 1S10RR019307, and 1S10RR023043. Additional support was provided by the NIH Blueprint for Neuroscience Research (5U01-MH093765), part of the multi-institutional Human Connectome Project. In addition, BF has a financial interest in CorticoMetrics, a company whose medical pursuits focus on brain imaging and measurement technologies. BF’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies. Data used in the preparation of this article were obtained in part from OASIS: Cross-Sectional: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382, and OASIS: Longitudinal: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382. Further, data used in the preparation of this article were obtained from the MIRIAD database. The MIRIAD investigators did not participate in analysis or writing of this report. The MIRIAD dataset is made available through the support of the UK Alzheimer’s Society (Grant RF116).
The original data collection was funded through an unrestricted educational grant from GlaxoSmithKline (Grant 6GKC). Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Data were also provided in part by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.

References

  • [1] S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. R. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, L. Beckett, Ways toward an early diagnosis in alzheimer’s disease: the alzheimer’s disease neuroimaging initiative (adni), Alzheimer’s & Dementia 1 (1) (2005) 55–66.
  • [2] D. S. Marcus, T. H. Wang, J. Parker, J. G. Csernansky, J. C. Morris, R. L. Buckner, Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults, Journal of cognitive neuroscience 19 (9) (2007) 1498–1507.
  • [3] D. S. Marcus, A. F. Fotenos, J. G. Csernansky, J. C. Morris, R. L. Buckner, Open access series of imaging studies: longitudinal mri data in nondemented and demented older adults, Journal of cognitive neuroscience 22 (12) (2010) 2677–2684.
  • [4] R. A. Poldrack, E. Congdon, W. Triplett, K. Gorgolewski, K. Karlsgodt, J. Mumford, F. Sabb, N. Freimer, E. London, T. Cannon, et al., A phenome-wide examination of neural and cognitive function, Scientific data 3 (2016) 160110.
  • [5] A. Di Martino, D. O’connor, B. Chen, K. Alaerts, J. S. Anderson, M. Assaf, J. H. Balsters, L. Baxter, A. Beggiato, S. Bernaerts, et al., Enhancing studies of the connectome in autism using the autism brain imaging data exchange ii, Scientific data 4 (2017) 170010.
  • [6] I. B. Malone, D. Cash, G. R. Ridgway, D. G. MacManus, S. Ourselin, N. C. Fox, J. M. Schott, Miriad—public release of a multiple time point alzheimer’s mr imaging dataset, NeuroImage 70 (2013) 33–36.
  • [7] M. A. Ikram, G. G. Brusselle, S. D. Murad, C. M. van Duijn, O. H. Franco, A. Goedegebure, C. C. Klaver, T. E. Nijsten, R. P. Peeters, B. H. Stricker, et al., The rotterdam study: 2018 update on objectives, design and main results, European journal of epidemiology 32 (9) (2017) 807–850.
  • [8] D. C. Van Essen, K. Ugurbil, E. Auerbach, D. Barch, T. Behrens, R. Bucholz, A. Chang, L. Chen, M. Corbetta, S. W. Curtiss, et al., The human connectome project: a data acquisition perspective, Neuroimage 62 (4) (2012) 2222–2231.
  • [9] C. Sudlow, J. Gallacher, N. Allen, V. Beral, P. Burton, J. Danesh, P. Downey, P. Elliott, J. Green, M. Landray, B. Liu, P. Matthews, G. Ong, J. Pell, A. Silman, A. Young, T. Sprosen, T. Peakman, R. Collins, Uk biobank: An open access resource for identifying the causes of a wide range of complex diseases of middle and old age, PLOS Medicine 12 (3) (2015) 1–10.
  • [10] M. M. Breteler, T. Stöcker, E. Pracht, D. Brenner, R. Stirnberg, Mri in the rhineland study: a novel protocol for population neuroimaging, Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association 10 (4) (2014) P92.
  • [11] B. Fischl, D. H. Salat, E. Busa, M. Albert, M. Dieterich, C. Haselgrove, A. Van Der Kouwe, R. Killiany, D. Kennedy, S. Klaveness, et al., Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain, Neuron 33 (3) (2002) 341–355.
  • [12] D. W. Shattuck, R. M. Leahy, Brainsuite: An automated cortical surface identification tool, Medical Image Analysis 6 (2) (2002) 129 – 142.
  • [13] K. Friston, J. Ashburner, S. Kiebel, T. Nichols, W. Penny, Statistical Parametric Mapping: The Analysis of Functional Brain Images, Academic Press, 2007.
  • [14] B. B. Avants, N. Tustison, G. Song, Advanced normalization tools (ants), Insight Journal 2 (2009) 1–35.
  • [15] M. Jenkinson, C. F. Beckmann, T. E. Behrens, M. W. Woolrich, S. M. Smith, Fsl, NeuroImage 62 (2) (2012) 782 – 790.
  • [16] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440.
  • [17] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp. 234–241.
  • [18] H. Noh, S. Hong, B. Han, Learning deconvolution network for semantic segmentation, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1520–1528.
  • [19] V. Badrinarayanan, A. Kendall, R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE transactions on pattern analysis and machine intelligence 39 (12) (2017) 2481–2495.
  • [20] F. Milletari, N. Navab, S.-A. Ahmadi, V-net: Fully convolutional neural networks for volumetric medical image segmentation, in: 2016 Fourth International Conference on 3D Vision (3DV), IEEE, 2016, pp. 565–571.
  • [21] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, Y. Bengio, The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation, in: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, IEEE, 2017, pp. 1175–1183.
  • [22] N. Rani, S. Vashisth, Brain tumor detection and classification with feed forward back-prop neural network, CoRR abs/1706.06411. arXiv:1706.06411.
  • [23] H. Dong, G. Yang, F. Liu, Y. Mo, Y. Guo, Automatic brain tumor detection and segmentation using u-net based fully convolutional networks, CoRR abs/1705.03820. arXiv:1705.03820.
  • [24] M. Arunachalam, S. R. Savarimuthu, An efficient and automatic glioblastoma brain tumor detection using shift-invariant shearlet transform and neural networks, Int. J. Imaging Systems and Technology 27 (3) (2017) 216–226.
  • [25] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, H. Larochelle, Brain tumor segmentation with deep neural networks, Medical Image Analysis 35 (2017) 18 – 31.
  • [26] J. Amin, M. Sharif, M. Yasmin, S. L. Fernandes, Big data analysis for brain tumor detection: Deep convolutional neural networks, Future Generation Comp. Syst. 87 (2018) 290–297.
  • [27] Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries - 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part I, Vol. 11383 of Lecture Notes in Computer Science, Springer, 2019.
  • [28] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, B. Glocker, Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation, Medical Image Analysis 36 (2017) 61 – 78.
  • [29] A. Varghese, K. Vaidhya, S. Thirunavukkarasu, C. Kesavdas, G. Krishnamurthi, Semi-supervised learning using denoising autoencoders for brain lesion detection and segmentation, CoRR abs/1611.08664. arXiv:1611.08664.
  • [30] M. Rezaei, H. Yang, C. Meinel, Deep neural network with l2-norm unit for brain lesions detection, CoRR abs/1708.05221. arXiv:1708.05221.
  • [31] L. Roa-Barco, O. Serradilla-Casado, M. de Velasco-Vázquez, A. López-Zorrilla, M. Graña, D. Chyzhyk, C. Price, A 2d/3d convolutional neural network for brain white matter lesion detection in multimodal MRI, in: Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017, Polanica Zdroj, Poland, 22-24 May 2017., 2017, pp. 377–385.
  • [32] X. Chen, E. Konukoglu, Unsupervised detection of lesions in brain MRI using constrained adversarial auto-encoders, CoRR abs/1806.04972. arXiv:1806.04972.
  • [33] A. Majumdar, Real-time dynamic MRI reconstruction using stacked denoising autoencoder, CoRR abs/1503.06383. arXiv:1503.06383.
  • [34] K. H. Jin, M. T. McCann, E. Froustey, M. Unser, Deep convolutional neural network for inverse problems in imaging, IEEE Transactions on Image Processing 26 (9) (2017) 4509–4522.
  • [35] M. Mardani, E. Gong, J. Y. Cheng, S. Vasanawala, G. Zaharchuk, M. T. Alley, N. Thakur, S. Han, W. J. Dally, J. M. Pauly, L. Xing, Deep generative adversarial networks for compressed sensing automates MRI, CoRR abs/1706.00051. arXiv:1706.00051.
  • [36] J. Schlemper, J. Caballero, J. V. Hajnal, A. N. Price, D. Rueckert, A deep cascade of convolutional neural networks for dynamic mr image reconstruction, IEEE Transactions on Medical Imaging 37 (2) (2018) 491–503.
  • [37] G. Yang, S. Yu, H. Dong, G. G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. R. Arridge, J. Keegan, Y. Guo, D. N. Firmin, DAGAN: deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction, IEEE Trans. Med. Imaging 37 (6) (2018) 1310–1321.
  • [38] M. A. Dedmari, S. Conjeti, S. Estrada, P. Ehses, T. Stöcker, M. Reuter, Complex fully convolutional neural networks for mr image reconstruction, in: F. Knoll, A. Maier, D. Rueckert (Eds.), Machine Learning for Medical Image Reconstruction, Springer International Publishing, 2018, pp. 30–38.
  • [39] A. Payan, G. Montana, Predicting Alzheimer’s disease - A neuroimaging study with 3d convolutional neural networks, in: ICPRAM 2015 - Proceedings of the International Conference on Pattern Recognition Applications and Methods, Volume 2, Lisbon, Portugal, 10-12 January, 2015., 2015, pp. 355–362.
  • [40] J. Qi, J. Tejedor, Deep multi-view representation learning for multi-modal features of the schizophrenia and schizo-affective disorder, in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 952–956.
  • [41] E. Hosseini-Asl, R. Keynton, A. El-Baz, Alzheimer’s disease diagnostics by adaptation of 3d convolutional network, CoRR abs/1607.00455. arXiv:1607.00455.
  • [42] A. Khvostikov, K. Aderghal, A. S. Krylov, G. Catheline, J. Benois-Pineau, 3d inception-based CNN with sMRI and MD-DTI data fusion for Alzheimer’s disease diagnostics, CoRR abs/1809.03972. arXiv:1809.03972.
  • [43] G. Lee, K. Nho, B. Kang, K.-A. Sohn, D. Kim, Predicting Alzheimer’s disease progression using multi-modal deep learning approach, Scientific Reports 9 (1).
  • [44] A. de Brébisson, G. Montana, Deep neural networks for anatomical brain segmentation, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 20–28.
  • [45] C. Wachinger, M. Reuter, T. Klein, Deepnat: deep convolutional neural network for segmenting neuroanatomy, NeuroImage 170 (2018) 434–445.
  • [46] A. G. Roy, S. Conjeti, D. Sheet, A. Katouzian, N. Navab, C. Wachinger, Error corrective boosting for learning fully convolutional networks with limited data, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2017, pp. 231–239.
  • [47] A. G. Roy, S. Conjeti, N. Navab, C. Wachinger, A. D. N. Initiative, et al., Quicknat: A fully convolutional network for quick and accurate segmentation of neuroanatomy, NeuroImage 186 (2019) 713–727.
  • [48] A. Jog, A. Hoopes, D. N. Greve, K. V. Leemput, B. Fischl, Psacnn: Pulse sequence adaptive fast whole brain segmentation, NeuroImage 199 (2019) 553 – 569.
  • [49] A. de Brebisson, G. Montana, Deep neural networks for anatomical brain segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 20–28.
  • [50] G. Huang, Z. Liu, K. Q. Weinberger, L. van der Maaten, Densely connected convolutional networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, Vol. 1, 2017, p. 3.
  • [51] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, Y. Bengio, Maxout networks, in: Proceedings of the 30th International Conference on International Conference on Machine Learning-Volume 28, JMLR. org, 2013, pp. III–1319.
  • [52] C. Jack, M. A Bernstein, N. C Fox, P. Thompson, G. Alexander, D. Harvey, B. Borowski, P. Britson, J. L. Whitwell, C. Ward, A. Dale, J. Felmlee, J. Gunter, D. Hill, R. Killiany, N. Schuff, S. Fox-Bosetti, C. Lin, C. Studholme, M. Weiner, The alzheimer’s disease neuroimaging initiative (adni): Mri methods, Journal of magnetic resonance imaging : JMRI 27 (2008) 685–91.
  • [53] D. G. Wakeman, R. N. Henson, A multi-subject, multi-modal human neuroimaging dataset, Scientific data 02 (2015) 150001.
  • [54] V. A. Magnotta, J. T. Matsui, D. Liu, H. J. Johnson, J. D. Long, B. D. Bolster Jr, B. A. Mueller, K. Lim, S. Mori, K. G. Helmer, et al., Multicenter reliability of diffusion tensor imaging, Brain connectivity 2 (6) (2012) 345–355.
  • [55] B. Fischl, Freesurfer, Neuroimage 62 (2) (2012) 774–781.
  • [56] X. Han, J. Jovicich, D. Salat, A. van der Kouwe, B. Quinn, S. Czanner, E. Busa, J. Pacheco, M. Albert, R. Killiany, P. Maguire, D. Rosas, N. Makris, A. Dale, B. Dickerson, B. Fischl, Reliability of mri-derived measurements of human cerebral cortical thickness: The effects of field strength, scanner upgrade and manufacturer, NeuroImage 32 (1) (2006) 180–194.
  • [57] M. Reuter, N. J. Schmansky, H. D. Rosas, B. Fischl, Within-subject template estimation for unbiased longitudinal image analysis, NeuroImage 61 (4) (2012) 1402–1418.
  • [58] A. Klein, J. Tourville, 101 labeled brain images and a consistent human cortical labeling protocol, Frontiers in Neuroscience 6 (2012) 171.
  • [59] R. S. Desikan, F. Ségonne, B. Fischl, B. T. Quinn, B. C. Dickerson, D. Blacker, R. L. Buckner, A. M. Dale, R. P. Maguire, B. T. Hyman, M. S. Albert, R. J. Killiany, An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest, NeuroImage 31 (3) (2006) 968 – 980.
  • [60] S. Estrada, S. Conjeti, M. Ahmad, N. Navab, M. Reuter, Competition vs. concatenation in skip connections of fully convolutional networks, in: Machine Learning in Medical Imaging, Springer International Publishing, 2018, pp. 214–222.
  • [61] Z. Liao, G. Carneiro, On the importance of normalisation layers in deep learning with piecewise linear activation units, in: Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on, IEEE, 2016, pp. 1–8.
  • [62] Z. Liao, G. Carneiro, A deep convolutional neural network module that promotes competition of multiple-size filters, Pattern Recognition 71 (2017) 94–105.
  • [63] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in pytorch, in: NIPS Workshop Autodiff, 2017.
  • [64] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
  • [65] A. M. Dale, B. Fischl, M. I. Sereno, Cortical surface-based analysis: I. segmentation and surface reconstruction, NeuroImage 9 (2) (1999) 179 – 194.
  • [66] B. Fischl, A. K. Liu, A. M. Dale, Automated manifold surgery: Constructing geometrically accurate and topologically correct models of the human cerebral cortex, IEEE Trans. Med. Imaging 20 (1) (2001) 70–80.
  • [67] B. Fischl, M. I. Sereno, A. M. Dale, Cortical surface-based analysis: II: Inflation, flattening, and a surface-based coordinate system, NeuroImage 9 (2) (1999) 195 – 207.
  • [68] F. Ségonne, J. Pacheco, B. Fischl, Geometrically accurate topology-correction of cortical surfaces using nonseparating loops, IEEE Trans. Med. Imaging 26 (4) (2007) 518–529.
  • [69] B. Fischl, A. Dale, Measuring the thickness of the human cerebral cortex from magnetic resonance images, Proceedings of the National Academy of Sciences of the United States of America 97 (20) (2000) 11050–11055.
  • [70] B. Fischl, M. I. Sereno, R. Tootell, A. Dale, High-resolution intersubject averaging and a coordinate system for the cortical surface, Human brain mapping 8 (1999) 272–84.
  • [71] B. Fischl, A. van der Kouwe, C. Destrieux, E. Halgren, F. Ségonne, D. Salat, E. Busa, L. Seidman, J. Goldstein, D. Kennedy, V. Caviness, N. Makris, B. Rosen, A. Dale, Automatically parcellating the human cerebral cortex, Cerebral cortex (New York, N.Y. : 1991) 14 (2004) 11–22.
  • [72] R. L. Buckner, D. Head, J. Parker, A. F. Fotenos, D. Marcus, J. C. Morris, A. Z. Snyder, A unified approach for morphometric and functional data analysis in young, old, and demented adults using automated atlas-based head size normalization: reliability and validation against manual measurement of total intracranial volume, Neuroimage 23 (2004) 724–738.
  • [73] J. G. Sled, A. P. Zijdenbos, A. C. Evans, A nonparametric method for automatic correction of intensity nonuniformity in mri data, IEEE Transactions on Medical Imaging 17 (1) (1998) 87–97.
  • [74] W. E. Lorensen, H. E. Cline, Marching cubes: A high resolution 3d surface construction algorithm, SIGGRAPH Comput. Graph. 21 (4) (1987) 163–169.
  • [75] M. Reuter, F.-E. Wolter, N. Peinecke, Laplace-beltrami spectra as “shape-dna” of surfaces and solids, Computer-Aided Design 38 (4) (2006) 342–366.
  • [76] M. Reuter, Hierarchical shape segmentation and registration via topological features of laplace-beltrami eigenfunctions, International Journal of Computer Vision 89 (2) (2010) 287–308.
  • [77] K. O. McGraw, S. Wong, Forming inferences about some intraclass correlation coefficients, Psychological Methods 1 (1) (1996) 30–46.
  • [78] M. Reuter, H. D. Rosas, B. Fischl, Highly accurate inverse consistent registration: a robust approach, Neuroimage 53 (4) (2010) 1181–1196.
  • [79] L. W. de Jong, K. van der Hiele, I. M. Veer, J. J. Houwing, R. G. J. Westendorp, E. L. E. M. Bollen, P. W. de Bruin, H. A. M. Middelkoop, M. A. van Buchem, J. van der Grond, Strongly reduced volumes of putamen and thalamus in Alzheimer’s disease: an MRI study, Brain 131 (12) (2008) 3277–3285.
  • [80] W. Henneman, J. D. Sluimer, J. Barnes, W. M. van der Flier, I. C. Sluimer, N. C. Fox, P. Scheltens, H. Vrenken, F. Barkhof, Hippocampal atrophy rates in alzheimer disease, Neurology 72 (11) (2009) 999–1007.
  • [81] N. Schuff, N. Woerner, L. Boreta, T. Kornfield, L. M. Shaw, J. Q. Trojanowski, P. M. Thompson, C. R. Jack Jr., M. W. Weiner, the Alzheimer’s Disease Neuroimaging Initiative, MRI of hippocampal volume loss in early Alzheimer’s disease in relation to ApoE genotype and biomarkers, Brain 132 (4) (2009) 1067–1077.
  • [82] S. P. Poulin, R. Dautoff, J. C. Morris, L. F. Barrett, B. C. Dickerson, Amygdala atrophy is prominent in early alzheimer’s disease and relates to symptom severity, Psychiatry Research: Neuroimaging 194 (1) (2011) 7 – 13.
  • [83] J. P. Aggleton, A. Pralus, A. J. D. Nelson, M. Hornberger, Thalamic pathology and memory loss in early Alzheimer’s disease: moving the focus from the medial temporal lobe to Papez circuit, Brain 139 (7) (2016) 1877–1890.
  • [84] L. Pini, M. Pievani, M. Bocchetta, D. Altomare, P. Bosco, E. Cavedo, S. Galluzzi, M. Marizzoni, G. B. Frisoni, Brain atrophy in alzheimer’s disease and aging, Ageing Research Reviews 30 (2016) 25 – 48, brain Imaging and Aging.
  • [85] H. Braak, E. Braak, Staging of alzheimer’s disease-related neurofibrillary changes, Neurobiology of Aging 16 (3) (1995) 271 – 278, the Schmitt Symposium: The Cytoskeleton and Alzheimer’s Disease.
  • [86] J. Baron, G. Chételat, B. Desgranges, G. Perchey, B. Landeau, V. de la Sayette, F. Eustache, In vivo mapping of gray matter loss with voxel-based morphometry in mild alzheimer’s disease, NeuroImage 14 (2) (2001) 298 – 309.
  • [87] G. Wenk, Neuropathologic changes in alzheimer’s disease, The Journal of clinical psychiatry 64 Suppl 9 (2003) 7–10.
  • [88] J. P. Lerch, J. C. Pruessner, A. Zijdenbos, H. Hampel, S. J. Teipel, A. C. Evans, Focal Decline of Cortical Thickness in Alzheimer’s Disease Identified by Computational Neuroanatomy, Cerebral Cortex 15 (7) (2004) 995–1001.

Appendix

| Subcortical Structures | Proposed | FS | Cortical Structures | Proposed | FS |
|---|---|---|---|---|---|
| Cortical white matter (lh) | 1 | 2 | caudalanteriorcingulate (lh) | 34 | 1002 |
| Lateral Ventricle (lh) | 2 | 4 | caudalmiddlefrontal (lh, rh) | 35 | 1003, 2003 |
| Inferior Lateral Ventricle (lh) | 3 | 5 | cuneus (lh) | 36 | 1005 |
| Cerebellar White Matter (lh) | 4 | 7 | entorhinal (lh, rh) | 37 | 1006, 2006 |
| Cerebellar Cortex (lh) | 5 | 8 | fusiform (lh, rh) | 38 | 1007, 2007 |
| Thalamus (lh) | 6 | 10 | inferiorparietal (lh, rh) | 39 | 1008, 2008 |
| Caudate (lh) | 7 | 11 | inferiortemporal (lh, rh) | 40 | 1009, 2009 |
| Putamen (lh) | 8 | 12 | isthmuscingulate (lh) | 41 | 1010 |
| Pallidum (lh) | 9 | 13 | lateraloccipital (lh, rh) | 42 | 1011, 2011 |
| 3rd-Ventricle | 10 | 14 | lateralorbitofrontal (lh) | 43 | 1012 |
| 4th-Ventricle | 11 | 15 | lingual (lh) | 44 | 1013 |
| Brain Stem | 12 | 16 | medialorbitofrontal (lh) | 45 | 1014 |
| Hippocampus (lh) | 13 | 17 | middletemporal (lh, rh) | 46 | 1015, 2015 |
| Amygdala (lh) | 14 | 18 | parahippocampal (lh) | 47 | 1016 |
| CSF | 15 | 24 | paracentral (lh) | 48 | 1017 |
| Accumbens (lh) | 16 | 26 | parsopercularis (lh, rh) | 49 | 1018, 2018 |
| Ventral DC (lh) | 17 | 28 | parsorbitalis (lh, rh) | 50 | 1019, 2019 |
| Choroid Plexus (lh) | 18 | 31 | parstriangularis (lh, rh) | 51 | 1020, 2020 |
| Cortical white matter (rh) | 19 | 41 | pericalcarine (lh) | 52 | 1021 |
| Lateral Ventricle (rh) | 20 | 43 | postcentral (lh) | 53 | 1022 |
| Inferior Lateral Ventricle (rh) | 21 | 44 | posteriorcingulate (lh) | 54 | 1023 |
| Cerebellar White Matter (rh) | 22 | 46 | precentral (lh) | 55 | 1024 |
| Cerebellar Cortex (rh) | 23 | 47 | precuneus (lh) | 56 | 1025 |
| Thalamus (rh) | 24 | 49 | rostralanteriorcingulate (lh, rh) | 57 | 1026, 2026 |
| Caudate (rh) | 25 | 50 | rostralmiddlefrontal (lh, rh) | 58 | 1027, 2027 |
| Putamen (rh) | 26 | 51 | superiorfrontal (lh) | 59 | 1028 |
| Pallidum (rh) | 27 | 52 | superiorparietal (lh, rh) | 60 | 1029, 2029 |
| Hippocampus (rh) | 28 | 53 | superiortemporal (lh, rh) | 61 | 1030, 2030 |
| Amygdala (rh) | 29 | 54 | supramarginal (lh, rh) | 62 | 1031, 2031 |
| Accumbens (rh) | 30 | 58 | transversetemporal (lh, rh) | 63 | 1034, 2034 |
| Ventral DC (rh) | 31 | 60 | insula (lh, rh) | 64 | 1035, 2035 |
| Choroid Plexus (rh) | 32 | 63 | caudalanteriorcingulate (rh) | 65 | 2002 |
| WM-hypointensities | 33 | 77 | cuneus (rh) | 66 | 2005 |
| | | | isthmuscingulate (rh) | 67 | 2010 |
| | | | lateralorbitofrontal (rh) | 68 | 2012 |
| | | | lingual (rh) | 69 | 2013 |
| | | | medialorbitofrontal (rh) | 70 | 2014 |
| | | | parahippocampal (rh) | 71 | 2016 |
| | | | paracentral (rh) | 72 | 2017 |
| | | | pericalcarine (rh) | 73 | 2021 |
| | | | postcentral (rh) | 74 | 2022 |
| | | | posteriorcingulate (rh) | 75 | 2023 |
| | | | precentral (rh) | 76 | 2024 |
| | | | precuneus (rh) | 77 | 2025 |
| | | | superiorfrontal (rh) | 78 | 2028 |

Table A.1: FastSurferCNN (proposed) segmentation labels and mapping to FreeSurfer (FS) for subcortical (left) and cortical (right) structures.
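
The mapping in Table A.1 can be applied to a predicted label volume with a simple lookup table. The sketch below is an illustration only, not the FastSurfer implementation: it covers just the first 18 subcortical rows of the table, assumes that label 0 denotes background (the table does not list it), and uses hypothetical names (`FASTSURFER_TO_FS`, `to_freesurfer_ids`). Cortical labels marked "(lh, rh)" correspond to two FreeSurfer IDs and would additionally require hemisphere information, which is not handled here.

```python
import numpy as np

# Excerpt of Table A.1: proposed label -> FreeSurfer (FS) label,
# left-hemisphere and midline subcortical structures only.
FASTSURFER_TO_FS = {
    1: 2, 2: 4, 3: 5, 4: 7, 5: 8, 6: 10, 7: 11, 8: 12, 9: 13,
    10: 14, 11: 15, 12: 16, 13: 17, 14: 18, 15: 24, 16: 26,
    17: 28, 18: 31,
}


def to_freesurfer_ids(pred, mapping=FASTSURFER_TO_FS, background=0):
    """Relabel an integer label array using a dense lookup table."""
    pred = np.asarray(pred)
    max_label = max(mapping)
    if pred.min() < 0 or pred.max() > max_label:
        raise ValueError("label outside the range covered by the mapping")
    # Index i of the lookup table holds the FS ID for proposed label i;
    # unmapped entries (assumed: only 0) fall back to the background value.
    lut = np.full(max_label + 1, background, dtype=np.int32)
    for proposed, fs in mapping.items():
        lut[proposed] = fs
    return lut[pred]


# Hypothetical 2x2 slice of predicted labels.
example = np.array([[0, 1], [13, 18]])
relabelled = to_freesurfer_ids(example)
print(relabelled)
```

Indexing a lookup array with the whole label volume relabels every voxel in one vectorized step, which avoids looping over voxels in Python.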

| Usage | Dataset | Scanner | Field strength | State | Age range | Subjects |
|---|---|---|---|---|---|---|
| Training | ABIDE-II | Philips | 3T | Autism/Normal | 20-39 | 20 |
| | ADNI | Philips, GE, Siemens | 1.5T/3T | AD/MCI/Normal | 56-90 | 40 |
| | LA5C | Siemens | 3T | Neuropsych/Normal | 23-44 | 20 |
| | OASIS1 | Siemens | 1.5T | Normal | 18-60 | 40 |
| | OASIS2 | Siemens | 1.5T | AD/Normal | 66-90 | 20 |
| Validation | MIRIAD | GE | 1.5T | AD/Normal | 60-77 | 20 |
| Accuracy | ADNI | Philips, GE, Siemens | 1.5T/3T | AD/MCI/Normal | 58-85 | 180 |
| | HCP | Siemens | 3T | Normal | 22-35 | 45 |
| | OASIS1 | Siemens | 1.5T | AD/Normal | 18-96 | 370 |
| | MIRIAD | GE | 1.5T | AD/Normal | 55-80 | 49 |
| | THP | Philips, Siemens | 3T | Normal | - | 5 |
| Generalizability | ADNI | Philips, GE, Siemens | 3T | AD/MCI/Normal | 58-85 | 180 |
| | HCP | Siemens | 3T | Normal | 22-35 | 45 |
| | MIRIAD | GE | 1.5T | AD/Normal | 55-80 | 49 |
| | MMND | Siemens | 3T | Normal | 23-31 | 16 |
| | THP | Philips, Siemens | 3T | Normal | - | 5 |
| Reliability | OASIS1 | Siemens | 1.5T | Normal | 19-34 | 20 |
| Sensitivity | OASIS1 | Siemens | 1.5T | AD/Normal | 18-96 | 370 |

Table A.2: Summary of training, validation and testing sets. The table lists the usage, dataset, scanner, field strength, (disease) state, age range and number of used subjects for each dataset.
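
As a quick consistency check on Table A.2, the short sketch below sums the subject counts per usage. The counts are transcribed from the table; the dictionary name and layout are hypothetical. The training rows add up to 140, matching the 140 training cases mentioned in the discussion. Note that these are totals per usage, not unique subjects, since the same datasets appear under several usages.

```python
# Subject counts per usage and dataset, transcribed from Table A.2.
SUBJECTS = {
    "Training": {"ABIDE-II": 20, "ADNI": 40, "LA5C": 20,
                 "OASIS1": 40, "OASIS2": 20},
    "Validation": {"MIRIAD": 20},
    "Accuracy": {"ADNI": 180, "HCP": 45, "OASIS1": 370,
                 "MIRIAD": 49, "THP": 5},
    "Generalizability": {"ADNI": 180, "HCP": 45, "MIRIAD": 49,
                         "MMND": 16, "THP": 5},
    "Reliability": {"OASIS1": 20},
    "Sensitivity": {"OASIS1": 370},
}

# Sum the per-dataset counts within each usage category.
totals = {usage: sum(counts.values()) for usage, counts in SUBJECTS.items()}

for usage, total in totals.items():
    print(f"{usage}: {total}")
```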