Fast Infant MRI Skullstripping with Multiview 2D Convolutional Neural Networks

by Amod Jog, et al.
Harvard University

Skullstripping is defined as the task of segmenting brain tissue from a full head magnetic resonance image (MRI). It is a critical component in neuroimage processing pipelines: downstream deformable registration and whole brain segmentation performance is highly dependent on accurate skullstripping. Skullstripping is an especially challenging task for infant (age range 0--18 months) head MRI images due to the significant size and shape variability of the head and the brain in that age range. Infant brain tissue development also changes the T_1-weighted image contrast over time, making consistent skullstripping difficult. Existing tools for adult brain MRI skullstripping are ill-equipped to handle these variations, and a specialized infant MRI skullstripping algorithm is necessary. In this paper, we describe a supervised skullstripping algorithm that utilizes three trained fully convolutional neural networks (CNNs), each of which segments 2D T_1-weighted slices in the axial, coronal, and sagittal views, respectively. The three probabilistic segmentations in the three views are linearly fused and thresholded to produce a final brain mask. We compared our method to existing adult and infant skullstripping algorithms and showed significant improvement based on the Dice overlap metric (average Dice of 0.97) against a manually labeled ground truth data set. Label fusion experiments on multiple unlabeled data sets show that our method is consistent and has fewer failure modes. In addition, our method is computationally very fast, with a run time of 30 seconds per image on NVidia P40/P100/Quadro 4000 GPUs.






1 Introduction

Skullstripping is a critical image preprocessing step in most neuroimage processing pipelines Fischl et al. (2002). The goal of skullstripping is to take an input whole head magnetic resonance image (MRI) and output a binary mask with a value of one for the brain and zero for all the extra-cerebral tissues such as skin, muscle, fat, and bone. Extra-cerebral tissues vary significantly in size and shape across subjects. Pulse sequences for brain imaging are also typically optimized to provide the best image contrast for brain tissues such as gray matter (GM) and white matter (WM). This results in extra-cerebral tissues with large variation in intensities and imaging features that can prove difficult to model in downstream whole brain segmentation and deformable registration algorithms Ou et al. (2014), potentially reducing their accuracy. Thus, skullstripping becomes a necessary step for most neuroimage processing pipelines to mask out uninteresting background and help increase the accuracy of the complete segmentation and cortical reconstruction pipeline (Fischl et al., 2002).

Skullstripping is all the more important when processing infant brain MRI. Infant heads almost double in size in the first two years of life, as opposed to adult brains, which are of a relatively fixed size. Moreover, tissue contrast in the infant brain changes significantly with different stages of development. In Fig. 1 we show T_1-weighted MPRAGE (magnetization-prepared rapid gradient echo) acquisitions Mugler III and Brookeman (1991) of three different infants in three stages of development: newborn (column 1), 6 months old (column 2), and 18 months old (column 3). The tissue contrast differs significantly across these stages. The small dimensions of the infant head can also result in variations in the field of view and scanner distortions, as is apparent in Fig. 1.

There has been a significant amount of work on the development of skullstripping algorithms for adult head MRI. These include methods using spherical expansion Cox (1996), filtering and edge-detection-based approaches Shattuck et al. (2001), deformable model-based segmentation Smith (2002), variations of watershed algorithms Hahn and Peitgen (2000); Segonne et al. (2004); Carass et al. (2011), a generative-discriminative framework Iglesias et al. (2011), patch-based sparse reconstruction methods for multi-modal data Roy et al. (2017), and multi-atlas registration-based skullstripping Doshi et al. (2013), among many others. These methods tend to perform sub-optimally on infant MRI due to inherent assumptions about skull shape, size, and appearance in adults Mahapatra (2012). Infant MRI skullstripping has received some attention in recent years. Methods have been developed to work simultaneously for pediatric and adult brain MRI Zhuang et al. (2006); Chiverton et al. (2007); Eskildsen et al. (2012). These include model-based level set driven skullstripping Zhuang et al. (2006), a statistical morphology-based tool Chiverton et al. (2007), and non-local patch-based segmentation Eskildsen et al. (2012). A specialized method for infant skullstripping, which learns the prior shape of neonatal brains from a labeled set of atlas images and produces the final segmentation using graph cuts, was described by Mahapatra (2012). A meta-learning algorithm has been described that combines the outputs of the brain extraction tool (BET) Smith (2002) and the brain surface extractor (BSE) Shattuck et al. (2001). A multi-atlas registration-based method focusing on apparent diffusion coefficient (ADC) images of infants has also been developed Ou et al. (2015). Recently, there have been methods that post-process BET outputs specifically for infant data Alansary et al. (2016).
Multi-atlas registration-based methods tend to be computationally expensive due to the many registrations needed to align the atlases to the subject image. The same holds true for meta-learning-based methods, which need to run multiple available skullstripping algorithms and combine their results. Intensity and morphology-based methods can also take up to 10-20 minutes to produce a brain mask. Segmentation accuracy can also degrade in the presence of intensity inhomogeneities, which alter the intensity gradient magnitudes between similar tissues across slices.

Recently, deep learning architectures based on convolutional neural networks (CNNs) have been successfully applied to a variety of medical image segmentation problems Roth et al. (2015); Kamnitsas et al. (2017); Gibson et al. (2018). A 3D deep learning framework for skullstripping was described by Kleesiek et al. (2016) that used 3D patches as input to the network and output patches storing, for each voxel, the probability of belonging to the brain class. A great advantage of deep learning algorithms is the fast inference time: inference using CNNs is very fast, and fast skullstripping can significantly speed up neuroimage processing pipelines that typically take hours to complete for a single subject volume.

In this work, we describe the SkullStripping CNN, or SSCNN, a 2D multi-view CNN-based skullstripping approach. We train three independent 2D CNNs, one each for 2D slices in the coronal, axial, and sagittal views. The predictions of these three networks are linearly combined to produce a final 3D mask. Past work has shown that multi-view approaches can resolve ambiguities better than single-view 2D segmentation Bekker et al. (2016), and 2D multi-view CNNs have also been used for whole brain segmentation Roy et al. (2019). 2D CNNs are preferable for the skullstripping task because, even for a human labeler, skullstripping is a visually intuitive task to perform on 2D slices rather than 3D patches. Errors made in a single view's prediction can potentially be corrected by predictions from the other views. Additionally, skullstripping is usually one of the first preprocessing tasks for brain MRI, so the images can present with uncorrected intensity inhomogeneities. The intensity bias is usually assumed to be a slow-changing multiplicative field. A multi-view 2D CNN segments each slice independently in each of the three views; intensity inhomogeneity may adversely affect segmentation of slices in one view, but the other views can compensate for that loss. In contrast, a 3D CNN segments a 3D patch, which can have varying intensity inside it, making the segmentation less robust. Segmenting the full 3D image at once instead of 3D patches would be ideal, but is typically not possible with CNNs due to GPU memory constraints.

Our paper is organized as follows. Section 2 describes the method, training, and prediction using multi-view 2D CNNs. In Section 3, we describe parameter selection experiments and comparison with other skullstripping algorithms. Finally, in Section 4, we summarize our observations and conclude with possible avenues of future development.


Figure 1: Full head MPRAGE acquisitions and manually labeled brain masks in axial (rows 1 and 2), coronal (rows 3 and 4), and sagittal (rows 5 and 6) views for infants at (a) newborn, (b) 6 months, and (c) 18 months of age.

2 Method

Figure 2: The left half of the figure shows the training of three 2D U-Net segmentation architectures, one for slices oriented in each of the three cardinal orientations: coronal, sagittal, and axial. The U-Net architecture, which is the same for all three, is shown on the right.

Our training data consist of a collection of 3D T_1-weighted images with a paired set of expert manually labeled brain masks, each resampled to a fixed isotropic voxel size. This paired collection is referred to as the training image set. Figure 1 shows three training subjects in three age ranges with their manually labeled brain masks. From the 3D image-label pairs, we extract corresponding 2D slices in the coronal orientation to create a coronal 2D training data set; similarly, we extract slices in the sagittal and axial orientations to generate sagittal and axial 2D training data sets. Figure 2 (left) shows training examples for all three orientations.

2.1 SSCNN: Network Architecture and Training

For each orientation (coronal, sagittal, or axial), we train a 2D fully convolutional CNN that takes as input a T_1-weighted 2D slice in that orientation and predicts the corresponding manually labeled slice. The paired intensity and label slice data serve as the training data for that orientation's CNN. We are interested in semantic voxel-wise segmentation of the input 2D slice and therefore use the well-known U-Net architecture Ronneberger et al. (2015).

The U-Net consists of an encoder block with several levels, followed by a symmetric decoder block. Figure 2 (right) shows an example U-Net architecture. Each level of the encoder block contains a fixed number of convolutional layers (orange in Fig. 2); the number of filters in the convolutional layers of the first level determines the number of filters at every subsequent level. All convolutional layers, except the last layer, have a rectified linear unit (ReLU) activation. The kernel size, the number of levels, the number of filters in the first level, and the number of convolutional layers per level are chosen by cross-validation experiments. Each convolutional layer (conv) is followed by a batch normalization (BN) layer Ioffe and Szegedy (2015) (green in Fig. 2). At the end of each level (except the last one) is a pooling layer (red in Fig. 2) that downsamples the output of that level by a factor of 2 in the image height and width directions. At the deepest level, the encoder block ends and the decoder block begins. In the decoder, the deepest conv+BN layers are followed by an upsampling layer (blue in Fig. 2) that upsamples the output of the previous level by a factor of 2 in the image height and width directions. The upsampled output is followed by the same number of levels as the encoder; the number of filters and the number of conv+BN layers at each decoder level match the corresponding encoder level. The final layer is a convolutional layer with a softmax activation. To summarize, when a network is applied to an input slice in a given orientation, the output has the same in-plane shape and stores, for each voxel, the probability of belonging to the background (class 0) or the brain (class 1).
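To make the encoder/decoder symmetry concrete, the following sketch traces how slice dimensions and channel counts evolve through such a U-Net. The depth of 4, first-level filter count of 32, and 256x256 input are illustrative assumptions, not the paper's cross-validated values, and the sketch assumes the common convention that the filter count doubles at each level:

```python
# Sketch (not the authors' code): trace how slice dimensions and channel
# counts evolve through a U-Net encoder/decoder. The depth, first-level
# filter count, and input size are illustrative assumptions.

def unet_shape_trace(height, width, depth=4, first_filters=32):
    """Return (block, level, h, w, channels) tuples for encoder and decoder,
    assuming filters double per level and pooling/upsampling change the
    spatial size by a factor of 2."""
    trace = []
    h, w, c = height, width, first_filters
    for level in range(depth):              # encoder: conv keeps h,w; pool halves
        trace.append(("enc", level, h, w, c))
        if level < depth - 1:
            h, w, c = h // 2, w // 2, c * 2
    for level in range(depth - 2, -1, -1):  # decoder: upsampling doubles h,w
        h, w, c = h * 2, w * 2, c // 2
        trace.append(("dec", level, h, w, c))
    return trace

for row in unet_shape_trace(256, 256):
    print(row)
```

Note that the decoder restores the input's in-plane shape, which is why the network's output slice matches the input slice voxel-for-voxel.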

All three U-Nets are trained to minimize a soft Dice-based loss averaged over the whole batch. The loss is shown in Eqn. 1, where v indexes the voxels of the ground truth slice y_k and the predicted slice p_k of the k-th training sample in a batch of K samples:

L = 1 - (1/K) Σ_k [ 2 Σ_v p_k(v) y_k(v) ] / [ Σ_v p_k(v)^2 + Σ_v y_k(v)^2 ]    (1)

During training, p_k is a tensor with one channel per label in which each voxel stores the softmax probability of belonging to that label, and y_k is a similarly-shaped one-hot encoding of the label present at each voxel.


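A common formulation of the soft Dice loss referenced in Eqn. 1 (the squared-denominator variant) can be sketched in NumPy. This is an illustration under that assumption, not the authors' implementation; the smoothing constant eps is added here for numerical stability:

```python
import numpy as np

def soft_dice_loss(probs, onehot, eps=1e-7):
    """Soft Dice loss averaged over a batch.

    probs  : (K, H, W, C) softmax probabilities for K slices, C classes
    onehot : (K, H, W, C) one-hot encoded ground-truth labels
    """
    axes = (1, 2, 3)                      # sum over voxels and classes
    intersection = np.sum(probs * onehot, axis=axes)
    denom = np.sum(probs ** 2, axis=axes) + np.sum(onehot ** 2, axis=axes)
    dice = (2.0 * intersection + eps) / (denom + eps)
    return float(np.mean(1.0 - dice))     # average loss over the K samples
```

A perfect prediction yields a loss near 0, while a prediction that places all probability on the wrong class yields a loss near 1.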
We use the Adam optimization algorithm Kingma and Ba (2015) to minimize the loss, with an initial learning rate that is halved if the training loss does not decrease for five consecutive epochs. Each epoch uses 3000 training slice pairs extracted from 23 training subjects. We use 600 slices extracted from a validation dataset that consists of three subjects. We train each of the coronal, sagittal, and axial networks for 30 epochs. The training loss decreases monotonically as the number of epochs increases; however, we select the model with the minimum validation loss as our final trained model. This avoids using networks that overfit the training data and do not generalize well to test data from other sources. All three networks are trained independently of each other.
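The schedule bookkeeping described above (halving the learning rate after five stagnant epochs and keeping the model from the epoch with the lowest validation loss) can be sketched as follows; the per-epoch loss lists passed in are hypothetical placeholders for an actual training run:

```python
# Sketch of the training-loop bookkeeping described above; not the
# authors' code. Losses are supplied as plain lists for illustration.

def run_training(epoch_losses, val_losses, lr0=1e-3, patience=5):
    """Halve the learning rate when the training loss has not improved
    for `patience` consecutive epochs, and select the epoch with the
    minimum validation loss as the final model."""
    lr, best_train, stall = lr0, float("inf"), 0
    lrs = []
    for loss in epoch_losses:
        if loss < best_train:
            best_train, stall = loss, 0
        else:
            stall += 1
            if stall >= patience:        # plateau: halve the learning rate
                lr, stall = lr / 2.0, 0
        lrs.append(lr)
    best_epoch = min(range(len(val_losses)), key=lambda e: val_losses[e])
    return lrs, best_epoch
```

The initial learning rate lr0 here is an arbitrary placeholder, since the paper's value is not reproduced in this text.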

2.2 SSCNN: Prediction

Figure 3: The prediction workflow for SSCNN.

Given a 3D test subject volume, we reorient it in the coronal, sagittal, and axial directions to create three 3D volumes. We apply each trained network slice-by-slice to the volume in its orientation, stacking the 2D probabilistic segmentations to form one 3D probability volume per orientation.

The sagittal and axial probability volumes are then reoriented to the coronal orientation; note that each volume stores the voxel-wise class probabilities. We linearly combine these probability volumes to generate a fused result P in the coronal orientation, as shown in Eqn. 2:

P(v) = w_cor P_cor(v) + w_sag P_sag(v) + w_ax P_ax(v),    (2)

where P_cor, P_sag, and P_ax denote the per-orientation probability volumes after reorientation. The weights are constrained to be nonnegative with w_cor + w_sag + w_ax = 1; we found the optimal weights using cross-validation experiments. Figure 3 illustrates the SSCNN prediction workflow. The brain mask is obtained by assigning each voxel the class with the higher fused probability. To remove any small non-brain blobs, we identify the connected components in the binary labeled image and select the largest connected component as the final labeled brain.
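The fusion and post-processing steps can be sketched as follows. This is an illustrative NumPy version, with equal weights as placeholders for the cross-validated values and a simple breadth-first search standing in for a connected-component routine:

```python
import numpy as np
from collections import deque

def fuse_and_mask(p_cor, p_sag, p_ax, w=(1/3, 1/3, 1/3)):
    """Linearly fuse per-view brain probabilities (as in Eqn. 2), take the
    more probable class per voxel, and keep the largest connected component.
    Inputs are 3D arrays already reoriented to a common orientation; the
    equal weights are placeholders, not the paper's cross-validated values."""
    fused = w[0] * p_cor + w[1] * p_sag + w[2] * p_ax
    mask = fused >= 0.5                      # brain vs. background
    return largest_component(mask)

def largest_component(mask):
    """Largest 6-connected component of a 3D boolean mask (BFS flood fill)."""
    labels = np.zeros(mask.shape, dtype=int)
    best, best_size, cur = None, 0, 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        cur += 1
        size, q = 0, deque([seed])
        labels[seed] = cur
        while q:
            x, y, z = q.popleft()
            size += 1
            for dx, dy, dz in ((1,0,0),(-1,0,0),(0,1,0),(0,-1,0),(0,0,1),(0,0,-1)):
                n = (x + dx, y + dy, z + dz)
                if all(0 <= n[i] < mask.shape[i] for i in range(3)) \
                        and mask[n] and not labels[n]:
                    labels[n] = cur
                    q.append(n)
        if size > best_size:
            best, best_size = cur, size
    return labels == best
```

In practice a library routine such as a connected-component labeler would replace the hand-rolled BFS; it is written out here only to make the post-processing explicit.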

3 Experiments

In this section we describe the training data set (Section 3.1), the SSCNN parameter selection step (Section 3.2), and a set of quantitative experiments demonstrating the high-quality performance of our tool: leave-one-out cross-validation experiments and a comparison with other skullstripping algorithms on the manually labeled training data set (Section 3.3.1), validation on an independent manually labeled test data set (Section 3.3.2), and a comparison on multiple unlabeled data sets (Section 3.3.3).

3.1 Training Data Set

Our training data set was generated using the volumes described in de Macedo Rodrigues et al. (2015). A total of 26 T_1-weighted MRI volumes, from subjects whose ages are approximately uniformly distributed over the 0-18 month range, were used together with their binary brain masks.

3.2 SSCNN Parameter Selection

The 2D U-Nets used in SSCNN have a number of free parameters that need to be carefully selected for optimal performance. For parameter selection, we chose 13 subjects distributed roughly equally in the age range of 0-18 months as our parameter selection training set. We used four subjects to generate a validation data set to prevent overfitting, and tested the trained network on the remaining nine subjects. While selecting the optimal value of a parameter, we iterated over a set of possible values while keeping the other parameters fixed and chose the value with the highest test segmentation accuracy over the nine subjects. The optimal parameters therefore depend on the order in which they were fixed.

The parameters of interest for our proposed pipeline are the following: (1) the kernel size, (2) the depth of the network (number of pooling layers), (3) the number of filters in the first level, and (4) the number of convolutional layers in each level.
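This coordinate-wise strategy (sweep one parameter while the others stay fixed, then freeze it at its best value before moving on) can be sketched generically. The candidate grids and the score function below are hypothetical stand-ins for training a network and measuring test Dice:

```python
# Sketch of the sequential "fix the rest, sweep one" parameter search
# described above; the grids and score function are illustrative only.

def sequential_search(grids, score, defaults):
    """grids: {name: [candidate values]}, swept in insertion order.
    score: callable mapping a full parameter dict to a scalar (higher is
    better). Each parameter is fixed at its best value before the next is
    swept, so the result depends on the sweep order."""
    params = dict(defaults)
    for name, candidates in grids.items():
        best_val, best_score = params[name], float("-inf")
        for v in candidates:
            trial = dict(params, **{name: v})   # sweep one, fix the rest
            s = score(trial)
            if s > best_score:
                best_val, best_score = v, s
        params[name] = best_val                  # freeze before next sweep
    return params
```

Unlike a full grid search, this procedure evaluates only the sum, not the product, of the grid sizes, which is why the chosen optimum can depend on the sweep order.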

3.2.1 Kernel Size 

This experiment selected the optimal size of the 2D kernels used in each of the CNNs. Larger kernels have larger receptive fields that can act on larger image contexts; we therefore expect larger kernel sizes to improve segmentation accuracy, but they also require many more parameters to learn. We varied the kernel size while keeping the number of pooling layers, the number of filters in the first level, and the number of convolutional layers in each level fixed. Increasing the kernel size beyond the largest tested value resulted in out-of-memory errors. Figure 4 shows a boxplot of the Dice overlap coefficients obtained on the nine test subjects when compared with the ground truth segmentation. The median Dice score increases as the kernel size increases, with the largest tested kernel resulting in the highest value; we therefore fixed the filter shape to that size.

Figure 4: Dice coefficient with varying kernel size

3.2.2 Depth of Network

In this experiment we varied the depth of the network, given by the number of pooling layers. It has been observed that deeper networks lead to better performance. We varied the number of pooling layers while keeping the kernel size, the number of filters in the first level, and the number of convolutional layers per level fixed. In Fig. 5 we show boxplots for the four experiments, where we observe that the depth of the network does not change segmentation accuracy as it increases from four to seven. We set the depth to the value that produces marginally higher numbers than the rest, though the difference is not significant.

Figure 5: Dice coefficient with varying number of pooling layers

3.2.3 Number of Filters

In this experiment, we chose the number of filters in the convolutional layers of the first level. As described in Section 2, the number of filters in the subsequent levels is determined by the number of filters in the first level, so the total number of filters in the entire network is a function of this value. The more filters we have, the more image variation can be captured at different network depths, so we expect more filters to lead to higher segmentation accuracy. We varied the number of first-level filters while keeping the kernel size, the number of pooling layers, and the number of convolutional layers per level fixed; increasing the filter count further was not possible due to GPU memory constraints. Figure 6 shows a boxplot of the Dice coefficients obtained on the nine test subjects when compared with the ground truth segmentation. The median Dice increases as the number of filters increases from 8 to 32, with 32 producing the highest median Dice score, so we set the number of first-level filters to 32.

Figure 6: Dice coefficient with varying number of filters in the first level

3.2.4 Depth per Level of Network

In these experiments, we varied the number of convolutional layers at each level of the U-Net. We expect that more convolutional layers per level capture image texture better; however, this also increases the number of parameters in the network, leading to sub-optimal training and possible overfitting. We varied the number of convolutional layers per level while keeping the kernel size, the number of pooling layers, and the number of first-level filters fixed. Figure 7 shows the Dice score boxplots calculated over the nine test subjects. We observe that the median Dice score first increases with the number of convolutional layers per level and then falls; we chose the best-performing value for our optimal network.

Figure 7: Dice coefficient with varying number of convolutional layers per level

3.3 Evaluation of Segmentation Accuracy

3.3.1 Cross-validation on Training Data Set

We chose the best network with the parameters selected in the experiments described in Section 3.2. We ran SSCNN in a leave-one-out cross-validation (LOOCV) framework, with the test subject excluded from the training data, and evaluated the outcomes against the given ground truth segmentations using the Dice and Jaccard overlap metrics. The mean and standard deviation of these overlap measures are reported in the SSCNN row of Table 1; Figure 8 displays the individual Dice score results.
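The two overlap metrics used throughout this evaluation can be computed directly from a pair of binary masks; a minimal NumPy sketch (not the authors' evaluation code):

```python
import numpy as np

def dice_jaccard(pred, truth):
    """Dice and Jaccard overlap between two binary 3D masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2.0 * inter / (pred.sum() + truth.sum())
    jaccard = inter / union
    return dice, jaccard
```

The two metrics are monotonically related (Jaccard = Dice / (2 - Dice)), so they rank methods identically; reporting both simply eases comparison with other papers.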

Figure 8: Dice overlap coefficients on the training data using our new skullstripping tool in a LOOCV framework

In order to put this performance into perspective, we selected a set of five publicly available and widely used skullstripping algorithms and evaluated them on the same data set: ROBEX, BET, BSE, 3dSkullStrip, and Watershed. We also optimized (where appropriate) the key parameters of these tools on the training data set, in order to make the comparison as fair as possible. In the descriptions below, we indicate the parameters, their optimization ranges, and the optimized values.

Skullstripping tools:

The RObust Brain Extraction (ROBEX) tool Iglesias et al. (2011) was primarily designed for adult input images. It deforms a brain surface to the brain boundary found by a brain versus non-brain classifier; the deformed surface is then locally refined by a graph-cut algorithm to obtain the final brain mask. ROBEX performed well among 10+ skullstripping algorithms on multi-site adult brain images Iglesias et al. (2011) and in children as young as newborns Serag et al. (2016). An advantage of ROBEX is that it does not require users to modify any parameters.

The Brain Extraction Tool (BET) Smith (2002) is part of the FSL image analysis pipeline. It was primarily designed for images of adults and evolves a deformable model to fit the brain surface. The performance of BET is often sensitive to parameter variations Lee et al. (2003); Popescu et al. (2012). BET has been used for skullstripping neonatal brain images Shi et al. (2012); Makropoulos et al. (2014); Serag et al. (2012), but only upon careful manual parameter tuning Popescu et al. (2012). We varied the fractional intensity threshold (f) and the vertical gradient in fractional intensity threshold (g): f: [.2:.05:.8], g: [-.3:.05:.3]; optimum (f, g) = (.8, .05).

The Brain Surface Extractor (BSE) Shattuck et al. (2001) was primarily designed for images of adults. It smooths the input image, uses edge detectors to find the brain boundary, and refines the result with morphological operations. BSE was used in Shi et al. (2012); Serag et al. (2016) for neonatal images upon parameter tuning. We varied the diffusion constant (d) and the edge detection constant (s): d: [10:5:60], s: [.42:.04:.82]; optimum (d, s) = (10, .58).

3dSkullStrip is part of the AFNI image analysis pipeline Cox (1996). It replaces BET's deformable model with a spherical surface expansion paradigm and modifies BET in other respects to avoid the eyes and to reduce leakage into the skull. We varied the brain vs. non-brain intensity threshold (shrink_fac) and the speed of expansion (exp_frac): shrink_fac: [.4:.05:.8], exp_frac: [.05:.025:.15]; optimum (shrink_fac, exp_frac) = (.8, .05).

Hybrid Watershed (WATERSHED) Segonne et al. (2004) is part of the FreeSurfer neuroimaging analysis pipeline. It creates an initial brain surface using a watershed algorithm and then evolves the brain surface to refine the result. We varied the preflooding height (h): h: [10:2:40]; optimum h = 10.

The mean and standard deviation of these overlap measures are included in Table 1 and Figure 9 displays the individual Dice score results.

Method | Dice (mean; std. dev.) | Jaccard (mean; std. dev.)
SSCNN | (97.8; 1.4) | (95.7; 2.6)
BET | (76.0; 18.9) | (64.9; 24.6)
BSE | (67.2; 24.1) | (55.6; 29.0)
ROBEX | (93.4; 9.8) | (88.7; 12.6)
AFNI | (86.7; 14.3) | (78.7; 18.2)
Watershed | (84.9; 15.7) | (76.4; 20.7)
Table 1: Mean and standard deviation of the Dice and Jaccard overlap metrics computed on the training data set with SSCNN and five other commonly used skullstripping techniques.
Figure 9: Dice overlap coefficients on the training data using all methods with their optimal flags

3.3.2 Evaluation on an Independent Newborn Data Set

Anonymized brain MRI images of eighteen newborns, from a cohort of 43 non-sedated infants born to 32 heavy drinkers and 11 controls recruited prospectively during pregnancy for a brain imaging study Jacobson et al. (2017), were selected and manually traced to be used as a second data set to quantitatively evaluate the performance of our new skullstripping algorithm. For imaging details, see Table 2. The segmenters were blinded to the newborns' fetal alcohol spectrum disorder diagnoses and prenatal alcohol and drug exposure. The mean and standard deviation of the Dice and Jaccard overlap scores were and , respectively.

Data Set | N | Age-at-scan | Parameters | Scanner | MRI sequence
dHCP | 40 | Term age (37-44 weeks) | T1w: TR = 4795 ms; TI = 1740 ms; TE = 8.7 ms; SENSE factor 2.27 (axial) and 2.66 (sagittal); 0.8x0.8 mm^2 in-plane, 1.6 mm slices overlapped by 0.8 mm | 3T Philips Achieva | inversion recovery T1w multi-slice fast spin-echo
BCH1 | 111+88 | 10-52 and 77-156 days (avg=26.66, std=9.18 and avg=104.8, std=16.18) | TR = 250 ms; TE = 1.74 ms; TI = 1450 ms; flip angle = 7°; PAT = 2; 1 mm^3 voxels; FOV = 160 mm | 3T Siemens Trio | T1-weighted mocoMEMPRAGE
BCH2 | 29+6 | 1-26 and 106-202 days (avg=6.1, std=6.2 and avg=164.5, std=41.1) | TR = 250 ms; TE = 1.74 ms; TI = 1450 ms; flip angle = 7°; PAT = 2; 1 mm^3 voxels; FOV = 160 mm | 3T Siemens Trio | T1-weighted mocoMEMPRAGE
BCH3 | 105 | 2-19 days (avg=9.32, std=3.59) | TR = 2270 ms; TI = 1450 ms; flip angle = 7°; 176 slices; 1 mm^3 voxels; FOV = 220x220 mm^2; GRAPPA = 2 | 3T Siemens | T1-weighted mocoMEMPRAGE
BAN | 54 | 62-97 days (avg=79.79) | TR = 2520 ms; TE = 2.22 ms; 144 sagittal slices; 1 mm^3 voxels; FOV = 192 mm | 3T Siemens Verio | T1-weighted MPRAGE
UCT | 18 | 7-47 days (avg=17.4) | MEF 5°/20°: TR = 20 ms; 8 echoes, TE = 1.46 ms + n×1.68 ms, n = 0,...,7; 144 sagittal slices; 1 mm^3 voxels | 3T Siemens Allegra | multi-echo FLASH (MEF)
Table 2: Test Data Set Summary

3.3.3 Evaluation on Unlabeled Data Sets

In addition to the previously described experiments, we also assembled a collection of anonymized and unlabeled test data sets from five different initiatives in order to demonstrate the performance of our new tool on a variety of input images. Three of these originate from Boston Children's Hospital (BCH), one from Bangladesh, and one from the recently released data set of "The Developing Human Connectome Project" (dHCP) Makropoulos et al. (2018). We compared the performance of SSCNN both qualitatively and quantitatively to the five skullstripping solutions described in Section 3.3.

Below is a description of all the data sets used in these experiments. The participating infants underwent structural MRI imaging, and T1-weighted scans were acquired on either a 3T Siemens or a Philips scanner during natural sleep. Human subject approval was obtained from all respective Institutional Review Boards, and written informed parental consent was obtained for imaging. For details of the imaging protocols, refer to Table 2.

Unlabeled Data Sets:

dHCP: Imaging data of forty newborn subjects was released by the developing Human Connectome Project Makropoulos et al. (2018). Even though the consortium processed the T2w images of these subjects in their initial release, the corresponding T1w images were also made available, and we processed these in our study. The original 0.8 x 0.8 x 0.8 mm data sets were downsampled to 1 mm isotropic resolution for our processing.

BCH1: Healthy, full-term neonates were recruited at Brigham and Women's Hospital (BWH) and Beth Israel Deaconess Medical Center (BIDMC) as part of an ongoing prospective data collection study. The protocol was reviewed and approved by the institutional review boards at Boston Children's Hospital (BCH), BWH, and BIDMC. The participating infants were all singletons with normal Apgar scores and no clinical concerns regarding perinatal brain injury or congenital or metabolic abnormalities. All subjects were full-term infants scanned within their first month of life, and a subset was called back for a second scan at about 4 months of age.

BCH2: Parents of neonates with congenital heart disease (CHD) were approached for consent in the Cardiac ICU at BCH. This prospective study was approved by the institutional review board of BCH and was performed in compliance with the Health Insurance Portability and Accountability Act. Criteria for inclusion were: diagnosis of CHD confirmed by echocardiogram or cardiac MRI, and ability to safely tolerate the brain MRI examination without sedation, prior to and in some cases after surgery. Neonates were excluded if there was evidence of a syndrome or genetic disease.

BCH3: Native English-speaking children with and without a family history of developmental dyslexia (DD) were studied. All children were enrolled in a longitudinal dyslexia study which was approved by the BCH institutional review board. The data set used in this study is from the first timepoint acquisitions.

BEAN: Imaging data were collected from a set of infants in the Bangladesh Early Adversity Neuroimaging (BEAN) study, which investigates the effect of early biological and psychosocial adversity on neurocognitive development among infants and children growing up in Dhaka, Bangladesh Jensen et al. (2019); Storrs (2017).

For this set of unlabeled data sets, we used the optimized version of SSCNN and the five other skullstripping tools. Figures 10 and 11 display outcomes on the dHCP data, where SSCNN clearly outperforms the rest. Given that we did not have access to a ground truth solution, we STAPLEd Warfield et al. (2004) all the outcomes together and then compared the SSCNN solution to the fused result. The expectation was that the more SSCNN outperformed the other solutions, the lower the Dice overlap score between it and the STAPLEd labels, and the higher its standard deviation. Both of these effects are well demonstrated in Figure 12.
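STAPLE estimates a consensus segmentation and per-rater performance parameters via expectation-maximization. The following is a minimal binary sketch of that idea, not the Warfield et al. (2004) implementation; the fixed scalar prior, the initial sensitivity/specificity of 0.9, and the iteration count are simplifying assumptions:

```python
import numpy as np

def staple_binary(decisions, prior=None, iters=30):
    """Minimal STAPLE-style EM for fusing R binary segmentations.

    decisions : (R, N) 0/1 array, one row per rater over N voxels.
    Returns the per-voxel posterior probability of foreground and each
    rater's estimated sensitivity p and specificity q.
    """
    D = decisions.astype(float)
    R, N = D.shape
    f = D.mean() if prior is None else prior   # scalar foreground prior
    p = np.full(R, 0.9)                        # initial sensitivities
    q = np.full(R, 0.9)                        # initial specificities
    eps = 1e-10
    for _ in range(iters):
        # E-step: posterior that each voxel is foreground
        a = f * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - f) * np.prod(np.where(D == 1, 1 - q[:, None], q[:, None]), axis=0)
        w = a / (a + b + eps)
        # M-step: re-estimate rater performance from the posterior
        p = (D * w).sum(axis=1) / (w.sum() + eps)
        q = ((1 - D) * (1 - w)).sum(axis=1) / ((1 - w).sum() + eps)
    return w, p, q
```

Thresholding the posterior w at 0.5 gives the fused mask; a rater that disagrees with the consensus is automatically down-weighted through its estimated sensitivity and specificity.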

Figure 10: Skullstripping solutions on all 40 data sets from the dHCP project using six different tools. The images are aligned in an unbiased affine coordinate space for visualization purposes and the central coronal slice is selected from each of the MRIs. Skullstripping contours are indicated in color: BET (pink), BSE (yellow), robex (white), AFNI (blue), watershed (black) and SSCNN (red)
Figure 11: Skullstripping solutions on three representative data sets from the dHCP project using six different tools. The images are aligned in an unbiased affine coordinate space for visualization purposes and the central coronal slice is selected from each of the MRIs. Skullstripping contours are indicated in color: BET (pink), BSE (yellow), robex (white), AFNI (blue), watershed (black) and SSCNN (red)
Figure 12: Dice overlap coefficients on unlabeled test data between solutions of our tool and the STAPLEd version of all tested automated algorithms

4 Discussion and Conclusion

We have described SSCNN, a 2D multi-view CNN-based skullstripping method for infant MRI. SSCNN trains three independent networks to extract the brain mask from slices in the three cardinal orientations: coronal, axial, and sagittal. The outputs of the three networks are linearly combined to produce a final brain mask. We have demonstrated in Sections 3.3.1 and 3.3.2 that SSCNN is a highly accurate skullstripping algorithm and is significantly better than existing methods.

We ran SSCNN on hundreds of subjects from diverse, multi-site, multi-scanner MRI studies (Section 3.3.3) and showed improved skullstripping performance compared to five other tools. SSCNN did not have a single case of gross skullstripping failure on any of these datasets, demonstrating its generalizability and robustness. Choosing a training dataset encompassing the infant age range ensured that SSCNN was robust to the contrast changes due to brain development.

SSCNN is computationally fast, with a run time of less than 30 seconds on a GPU (NVidia P100, P40, Quadro P6000) and less than 2 minutes on a single CPU thread. It is about 10 times faster than ROBEX, the fastest of the other methods tested. This is an important advantage, as SSCNN could potentially be deployed on the scanner to quickly identify the brain during a scanning session and perform slice prescription for an optimal field of view based on the brain structure of interest. This will also prove useful for motion correction between scans, a step that is sometimes necessary when scanning infant subjects.

Presently, SSCNN is designed to work on T_1-weighted acquisitions. In the next version we plan to update the training with cross-sequence augmentation that will enable it to skullstrip acquisitions with T_2-weighted contrasts as well Jog et al. (2019). The linear combination of the three predictions in coronal, axial, and sagittal orientations was learned from cross-validation experiments. In the future, we will learn the weights of this combination by adding a custom layer that collects the slice predictions in each orientation, reorients them, and combines them. This will ensure an end-to-end learning scheme that combines all three orientations and jointly optimizes their weights.
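The proposed fusion layer could be sketched as one trainable logit per orientation, softmax-normalized so the fused weights stay positive and sum to one. This is a plain-NumPy sketch of the forward pass only; the class name is hypothetical, and in an actual end-to-end setting the logits would be updated by backpropagation together with the three networks:

```python
import numpy as np

class LearnableFusion:
    """Sketch of a learnable fusion layer: one trainable logit per
    orientation, normalized via softmax into fusion weights."""

    def __init__(self, n_views=3):
        # zero logits give equal weights (1/n_views) at initialization
        self.logits = np.zeros(n_views)

    def weights(self):
        # numerically stable softmax over the per-view logits
        e = np.exp(self.logits - self.logits.max())
        return e / e.sum()

    def forward(self, view_probs):
        # view_probs: array of shape (n_views, ...) holding the
        # reoriented per-view probability volumes stacked on axis 0
        w = self.weights()
        return np.tensordot(w, view_probs, axes=(0, 0))
```

Constraining the weights through a softmax keeps the fused output a valid convex combination of the three probability maps, so it remains interpretable as a probability.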

We also need to evaluate SSCNN on infant subjects with pathologies, such as tumors or enlarged ventricles, that can significantly change the brain shape and the boundary characteristics between brain and skull. Our training dataset does not include such subjects, and we would need to augment the existing training dataset or add new training datasets to support this capability in SSCNN.

In summary, we have described SSCNN, a fast, robust infant MRI skullstripping framework. The code will be made available as part of the FreeSurfer development version repository. Further validation and testing will be necessary before incorporating it into a release version.

5 Acknowledgements

Support for this research was provided in part by the BRAIN Initiative Cell Census Network grant U01MH117023, the National Institute for Biomedical Imaging and Bioengineering (P41EB015896, 1R01EB023281,
R01EB006758, R21EB018907, R01EB019956), the National Institute on Aging (5R01AG008122, R01AG016495), the National Institute of Mental Health, the National Institute for Neurological Disorders and Stroke (R01NS0525851, R21NS072652, R01NS070963, R01NS083534, 5U01NS086625,
5U24NS10059103), the Eunice Kennedy Shriver National Institute of Child Health & Human Development (5R01HD065762), the National Institute on Alcohol Abuse and Alcoholism (R21 AA020037) and was made possible by the resources provided by Shared Instrumentation Grants 1S10RR023401, 1S10RR019307, and 1S10RR023043. Additional support was provided by the NIH Blueprint for Neuroscience Research (5U01-MH093765), part of the multi-institutional Human Connectome Project. In addition, BF has a financial interest in CorticoMetrics, a company whose medical pursuits focus on brain imaging and measurement technologies. BF’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.


  • Fischl et al. (2002) B. Fischl, D. H. Salat, et al., Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain, Neuron 33 (2002) 341 – 355.
  • Ou et al. (2014) Y. Ou, H. Akbari, M. Bilello, X. Da, C. Davatzikos, Comparative evaluation of registration algorithms in different brain databases with varying difficulty: Results and insights, IEEE Transactions on Medical Imaging 33 (2014) 2039–2065.
  • Mugler III and Brookeman (1991) J. P. Mugler III, J. R. Brookeman, Rapid three-dimensional T1-weighted MR imaging with the MP-RAGE sequence, Journal of Magnetic Resonance Imaging 1 (1991) 561–567.
  • Cox (1996) R. W. Cox, AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages, Computers and Biomedical Research 29 (1996) 162 – 173.
  • Shattuck et al. (2001) D. W. Shattuck, S. R. Sandor-Leahy, K. A. Schaper, D. A. Rottenberg, R. M. Leahy, Magnetic resonance image tissue classification using a partial volume model, NeuroImage 13 (2001) 856 – 876.
  • Smith (2002) S. M. Smith, Fast robust automated brain extraction, Human Brain Mapping 17 (2002) 143–155.
  • Hahn and Peitgen (2000) H. K. Hahn, H.-O. Peitgen, The skull stripping problem in MRI solved by a single 3D watershed transform, in: Proceedings of the Third International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI ’00, Springer-Verlag, London, UK, UK, 2000, pp. 134–143.
  • Segonne et al. (2004) F. Segonne, A. Dale, E. Busa, M. Glessner, D. Salat, H. Hahn, B. Fischl, A hybrid approach to the skull stripping problem in MRI, NeuroImage 22 (2004) 1060 – 1075.
  • Carass et al. (2011) A. Carass, J. Cuzzocreo, M. B. Wheeler, P.-L. Bazin, S. M. Resnick, J. L. Prince, Simple paradigm for extra-cerebral tissue removal: Algorithm and analysis, NeuroImage 56 (2011) 1982 – 1992.
  • Iglesias et al. (2011) J. E. Iglesias, C. Liu, P. M. Thompson, Z. Tu, Robust brain extraction across datasets and comparison with publicly available methods, IEEE Transactions on Medical Imaging 30 (2011) 1617–1634.
  • Roy et al. (2017) S. Roy, J. A. Butman, D. L. Pham, Robust skull stripping using multiple MR image contrasts insensitive to pathology, NeuroImage 146 (2017) 132 – 147.
  • Doshi et al. (2013) J. Doshi, G. Erus, Y. Ou, B. Gaonkar, C. Davatzikos, Multi-atlas skull-stripping, Academic Radiology 20 (2013) 1566 – 1576.
  • Mahapatra (2012) D. Mahapatra, Skull stripping of neonatal brain MRI: Using prior shape information with graph cuts, Journal of Digital Imaging 25 (2012) 802–814.
  • Zhuang et al. (2006) A. H. Zhuang, D. J. Valentino, A. W. Toga, Skull-stripping magnetic resonance brain images using a model-based level set, NeuroImage 32 (2006) 79 – 92.
  • Chiverton et al. (2007) J. Chiverton, K. Wells, E. Lewis, C. Chen, B. Podda, D. Johnson, Statistical morphological skull stripping of adult and infant MRI data, Computers in Biology and Medicine 37 (2007) 342 – 357.
  • Eskildsen et al. (2012) S. F. Eskildsen, P. Coupé, V. Fonov, J. V. Manjón, K. K. Leung, N. Guizard, S. N. Wassef, L. R. Østergaard, D. L. Collins, BEaST: Brain extraction based on nonlocal segmentation technique, NeuroImage 59 (2012) 2362 – 2373.
  • Ou et al. (2015) Y. Ou, R. L. Gollub, K. Retzepi, N. Reynolds, R. Pienaar, S. Pieper, S. N. Murphy, P. E. Grant, L. Zöllei, Brain extraction in pediatric ADC maps, toward characterizing neuro-development in multi-platform and multi-institution clinical images, Neuroimage 122 (2015) 246–61.
  • Alansary et al. (2016) A. Alansary, M. Ismail, A. Soliman, F. Khalifa, M. Nitzken, A. Elnakib, M. Mostapha, A. Black, K. Stinebruner, M. F. Casanova, J. M. Zurada, A. El-Baz, Infant brain extraction in T1-weighted MR images using BET and refinement using LCDG and MGRF models, IEEE Journal of Biomedical and Health Informatics 20 (2016) 925–935.
  • Roth et al. (2015) H. R. Roth, L. Lu, A. Farag, H.-C. Shin, J. Liu, E. B. Turkbey, R. M. Summers, DeepOrgan: Multi-level deep convolutional networks for automated pancreas segmentation, in: N. Navab, J. Hornegger, W. M. Wells, A. Frangi (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer International Publishing, Cham, 2015, pp. 556–564.
  • Kamnitsas et al. (2017) K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, B. Glocker, Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation, Medical Image Analysis 36 (2017) 61 – 78.
  • Gibson et al. (2018) E. Gibson, W. Li, C. Sudre, L. Fidon, D. Shakir, G. Wang, Z. Eaton-Rosen, R. Gray, T. Doel, Y. Hu, T. Whyntie, P. Nachev, M. Modat, D. C. Barratt, S. Ourselin, M. J. Cardoso, T. Vercauteren, NiftyNet: a deep-learning platform for medical imaging, Computer Methods and Programs in Biomedicine 158 (2018) 113–122.
  • Kleesiek et al. (2016) J. Kleesiek, G. Urban, A. Hubert, D. Schwarz, K. Maier-Hein, M. Bendszus, A. Biller, Deep MRI brain extraction: A 3D convolutional neural network for skull stripping, NeuroImage 129 (2016) 460 – 469.
  • Bekker et al. (2016) A. J. Bekker, M. Shalhon, H. Greenspan, J. Goldberger, Multi-view probabilistic classification of breast microcalcifications, IEEE Transactions on Medical Imaging 35 (2016) 645–653.
  • Roy et al. (2019) A. G. Roy, S. Conjeti, N. Navab, C. Wachinger, QuickNAT: A fully convolutional network for quick and accurate segmentation of neuroanatomy, NeuroImage 186 (2019) 713 – 727.
  • Ronneberger et al. (2015) O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), volume 9351 of LNCS, Springer, 2015, pp. 234–241. (available on arXiv:1505.04597 [cs.CV]).
  • Ioffe and Szegedy (2015) S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: F. Bach, D. Blei (Eds.), Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, PMLR, Lille, France, 2015, pp. 448–456.
  • Kingma and Ba (2015) D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
  • de Macedo Rodrigues et al. (2015) K. de Macedo Rodrigues, E. Ben-Avi, D. D. Sliva, M.-s. Choe, M. Drottar, R. Wang, B. Fischl, P. E. Grant, L. Zollei, A FreeSurfer-compliant consistent manual segmentation of infant brains spanning the 0–2 year age range, Frontiers in Human Neuroscience 9 (2015) 21.
  • Serag et al. (2016) A. Serag, M. Blesa, E. J. Moore, R. Pataky, S. A. Sparrow, A. G. Wilkinson, G. Macnaught, S. I. Semple, J. P. Boardman, Accurate learning with few atlases (ALFA): an algorithm for MRI neonatal brain extraction and comparison with 11 publicly available methods, Scientific Reports 6 (2016).
  • Lee et al. (2003) J.-M. Lee, U. Yoon, S. H. Nam, J.-H. Kim, I.-Y. Kim, S. I. Kim, Evaluation of automated and semi-automated skull-stripping algorithms using similarity index and segmentation error, Computers in Biology and Medicine 33 (2003) 495 – 507.
  • Popescu et al. (2012) V. Popescu, M. Battaglini, W. Hoogstrate, S. Verfaillie, I. Sluimer, R. van Schijndel, B. van Dijk, K. Cover, D. Knol, M. Jenkinson, F. Barkhof, N. de Stefano, H. Vrenken, Optimizing parameter choice for FSL-brain extraction tool (BET) on 3D T1 images in multiple sclerosis, NeuroImage 61 (2012) 1484 – 1494.
  • Shi et al. (2012) F. Shi, L. Wang, Y. Dai, J. H. Gilmore, W. Lin, D. Shen, LABEL: Pediatric brain extraction using learning-based meta-algorithm, NeuroImage 62 (2012) 1975 – 1986.
  • Makropoulos et al. (2014) A. Makropoulos, I. S. Gousias, C. Ledig, P. Aljabar, A. Serag, J. V. Hajnal, A. D. Edwards, S. J. Counsell, D. Rueckert, Automatic whole brain MRI segmentation of the developing neonatal brain, IEEE Transactions on Medical Imaging 33 (2014) 1818–1831.
  • Serag et al. (2012) A. Serag, P. Aljabar, G. Ball, S. J. Counsell, J. P. Boardman, M. A. Rutherford, A. D. Edwards, J. V. Hajnal, D. Rueckert, Construction of a consistent high-definition spatio-temporal atlas of the developing brain using adaptive kernel regression, NeuroImage 59 (2012) 2255 – 2265.
  • Jacobson et al. (2017) S. W. Jacobson, J. L. Jacobson, C. D. Molteno, C. M. Warton, P. Wintermark, H. E. Hoyme, G. De Jong, P. Taylor, F. Warton, N. M. Lindinger, R. C. Carter, N. C. Dodge, E. Grant, S. K. Warfield, L. Zollei, A. J. van der Kouwe, E. M. Meintjes, Heavy prenatal alcohol exposure is related to smaller corpus callosum in newborn MRI scans, Alcoholism: Clinical and Experimental Research 41 (2017) 965–975.
  • Makropoulos et al. (2018) A. Makropoulos, E. C. Robinson, A. Schuh, R. Wright, S. Fitzgibbon, J. Bozek, S. J. Counsell, J. Steinweg, K. Vecchiato, J. Passerat-Palmbach, G. Lenz, F. Mortari, T. Tenev, E. P. Duff, M. Bastiani, L. Cordero-Grande, E. Hughes, N. Tusor, J.-D. Tournier, J. Hutter, A. N. Price, R. P. A. Teixeira, M. Murgasova, S. Victor, C. Kelly, M. A. Rutherford, S. M. Smith, A. D. Edwards, J. V. Hajnal, M. Jenkinson, D. Rueckert, The developing human connectome project: A minimal processing pipeline for neonatal cortical surface reconstruction, NeuroImage 173 (2018) 88 – 112.
  • Jensen et al. (2019) S. K. Jensen, S. Kumar, W. Xie, F. Tofail, R. Haque, W. A. Petri, C. A. Nelson, Neural correlates of early adversity among Bangladeshi infants, Scientific Reports 9 (2019) 3507.
  • Storrs (2017) C. Storrs, How poverty affects the brain, 2017. [Online; posted 12-July-2017].
  • Warfield et al. (2004) S. K. Warfield, K. H. Zou, W. M. Wells, Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation, IEEE Transactions on Medical Imaging 23 (2004) 903–921.
  • Jog et al. (2019) A. Jog, A. Hoopes, D. N. Greve, K. Van Leemput, B. Fischl, PSACNN: Pulse sequence adaptive fast whole brain segmentation, arXiv preprint arXiv:1901.05992 (2019).