Whole brain segmentation is one of the most important tasks in a neuroimage processing pipeline. A segmentation output consists of labels for white matter, cortex, and subcortical structures such as the thalamus, hippocampus, and amygdala. Structure volumes and shape statistics that rely on volumetric segmentation are regularly used to quantify differences between healthy and diseased populations. An ideal segmentation algorithm needs to be highly accurate, but, critically, it also needs to be robust to variations in the input data. Large modern MRI datasets are necessarily multi-center initiatives, undertaken to gain access to a larger pool of subjects. It is very difficult to achieve perfect acquisition harmonization across sites due to variations in scanner manufacturers, field strengths, receive coils, pulse sequences, available contrast, and resolution. Variations in site-specific parameters introduce bias and increase variance in downstream image processing, including segmentation [5, 7]. Low computational load is yet another desirable property of a segmentation algorithm: a fast, memory-light method enables quicker processing of large datasets and wider adoption.
Existing segmentation algorithms can be broadly classified into three types: (a) unsupervised, (b) multi-atlas registration-based, and (c) supervised. Unsupervised algorithms [3, 12] fit the observed intensities of the input image to an underlying atlas-based model and perform maximum a posteriori labeling. They assume a functional form (e.g., Gaussian) for the intensity distributions, and results can degrade if the input distribution differs from this assumption. Efforts have been made to develop a hybrid approach that is robust to input sequences and also leverages manually labeled training data. Unsupervised methods are usually computationally intensive, taking 0.5–4 hours to run. Multi-atlas registration and label fusion (MALF) algorithms achieve state-of-the-art [1, 14]
segmentation accuracy. However, they require multiple computationally expensive registrations followed by label fusion. Registration quality can also suffer if the contrast properties of the test image differ significantly from those of the atlas images. Recently, with the success of deep learning methods in medical imaging, supervised segmentation approaches built on 3D CNNs have produced accurate segmentations with runtimes of a few seconds to minutes [8, 13]. Despite the powerful local and global context these models provide, they are vulnerable to the subtle contrast differences that are inevitably present in multi-scanner MRI studies. With appropriate training that uses test-data-specific augmentation, however, these differences can essentially be removed, as we show in Section 3.
We present PSACNN (Pulse Sequence Augmented Convolutional Neural Network), a CNN-based multi-label segmentation approach that employs an augmentation scheme in which training image patches are generated on the fly so that they appear as if they had been imaged with the pulse sequence of the test data. PSACNN training consists of three major steps: (1) estimating the test data pulse sequence parameters, (2) applying the test data pulse sequence forward model to the training data nuclear magnetic resonance (NMR) parameter maps to create test-data-specific training features, and (3) training a deep CNN on the augmented features to predict the segmentation. We describe each of these steps in detail in Section 2.
Let the atlas (training) image set be a collection of T1-weighted images, each with a corresponding expert manually labeled image. We assume each atlas image was acquired with a known pulse sequence, such as MPRAGE (magnetization-prepared rapid gradient echo), FLASH (fast low angle shot), or SPGR (spoiled gradient echo). In addition, each atlas image has corresponding NMR parameter maps: a map of proton densities (ρ) and the longitudinal (T1) and transverse (T2*) relaxation time maps. Most atlas datasets do not acquire or generate such maps. We generate them for our atlas dataset using image synthesis, based on a previously acquired dataset in which relaxation time maps were estimated from multi-echo FLASH images. We describe how to complement existing atlas sets with these maps in the Supplementary Material.
Step 1: Estimating Test Data Acquisition Parameters:
Consider the test image for which we want to obtain the segmentation, acquired via its own pulse sequence. We assume that we have access to the test image (or test dataset) prior to training, so that we can design test-data-specific augmentations. We would like to estimate the pulse sequence parameters directly from the test image. The intensity at each voxel is given by the imaging equation of the pulse sequence. Equation (1) shows the imaging equation for the FLASH sequence: the signal is a function of the acquisition parameters repetition time (TR), echo time (TE), flip angle (α), and gain (G), and of the tissue NMR parameters (ρ, T1, T2*).
The MPRAGE sequence is more complex to model. In general, it is difficult to derive a closed-form imaging equation for most pulse sequences; given the NMR maps and the acquisition parameter set, it is necessary to run a full Bloch equation simulation to obtain voxel intensities. Additionally, the number of scanner parameters that can affect signal intensity is very large on modern MRI scanners, and for many datasets detailed parameter sets may not be available. Therefore, we would like to estimate these parameters directly from the image intensities. However, it is generally not possible to robustly estimate all pulse sequence parameters purely from the image intensities themselves: even when the imaging equation is well understood, the highly nonlinear dependence of intensities on imaging parameters makes their estimation unstable. We therefore formulate approximations of the MPRAGE and FLASH equations with a smaller number of parameters that can be estimated directly and robustly from the image intensities. Our approximation for the FLASH sequence is shown in Eqns. (2)–(3).
where the three estimated quantities form our parameter set. For the range of T1 values found in the human brain at 1.5 T, our approximation fits the signal dependence on T1 well (see Figs. 1(b) and 1(c)). Equation (4) is our approximation for the MPRAGE imaging equation provided by Wang et al.
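For reference, the standard closed-form FLASH/SPGR steady-state signal (the form that Eq. (1) takes) can be sketched as follows; the function and parameter names are illustrative, and this is the full equation rather than the reduced-parameter approximation of Eqns. (2)–(3):

```python
import numpy as np

def flash_signal(rho, t1, t2s, tr, te, alpha, gain=1.0):
    """Steady-state FLASH/SPGR signal (standard closed form).

    rho: proton density; t1, t2s: longitudinal and effective transverse
    relaxation times (ms); tr, te: repetition and echo times (ms);
    alpha: flip angle (radians); gain: scanner gain factor.
    """
    e1 = np.exp(-tr / t1)
    return (gain * rho * np.sin(alpha)
            * (1.0 - e1) / (1.0 - np.cos(alpha) * e1)
            * np.exp(-te / t2s))
```

The nonlinear, entangled dependence on TR, TE, α, and G visible here is exactly why estimating all acquisition parameters from intensities alone is unstable, motivating the reduced-parameter approximation.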
Given the test image, we estimate the parameter set with a strategy similar to that of Jog et al. Let the mean intensities of cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM) be obtained via a three-class classification scheme that fits a Gaussian mixture model to the intensity distribution. Let the mean NMR values (ρ, T1, T2*) for the CSF, GM, and WM classes be taken from previously acquired data. This yields a system of three linear equations in the three unknown parameters, which can be solved to obtain the estimated parameter set (see Fig. 1(a)).
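The three-class linear solve can be sketched as below. The per-class NMR means, the log-linear form of the signal approximation, and all names are hypothetical stand-ins (the paper's actual approximation is given by Eqns. (2)–(3) and its calibration data come from a separately acquired dataset):

```python
import numpy as np

# Illustrative per-class mean NMR values (rho, T1 in ms, T2* in ms) at
# 1.5 T -- placeholders, not the paper's calibration data.
NMR_MEANS = {
    "CSF": (1.00, 4000.0, 500.0),
    "GM":  (0.86, 1100.0,  90.0),
    "WM":  (0.77,  600.0,  70.0),
}

def estimate_params(class_mean_intensities):
    """Solve a 3x3 linear system for the unknown parameters (t0, t1, t2)
    of a hypothetical log-linear signal approximation:
        log S_c = t0 + t1 / T1_c + t2 / T2*_c + log(rho_c)
    given the GMM-derived mean intensity S_c of each tissue class c."""
    A, b = [], []
    for cls, s in class_mean_intensities.items():
        rho, t1, t2s = NMR_MEANS[cls]
        A.append([1.0, 1.0 / t1, 1.0 / t2s])
        b.append(np.log(s) - np.log(rho))
    return np.linalg.solve(np.array(A), np.array(b))
```

In practice the class mean intensities would come from the three-class Gaussian mixture fit described above.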
Step 2: Feature Extraction and Augmentation:
We use the U-Net CNN architecture as our predictor. GPU memory constraints prevent us from using the whole brain image as an input, so for each atlas image we sample fixed-size intensity patches. We observed that training a purely patch-based U-Net leads to segmentation errors due to incorrect localization. To provide more global context, we use information from the nonlinear warp produced by the FreeSurfer pipeline, which aligns the FreeSurfer atlas to the test image. For each voxel, we extract the FreeSurfer atlas 3D coordinates that warp to it, producing a patch of warped FreeSurfer atlas coordinates of the same size as the intensity patch. The final feature vector is a concatenation of the intensity patch and the coordinate patch (see Fig. 1(a); 2D patches are shown for illustration). Test-data-specific augmented patches are generated by extracting NMR parameter patches from the NMR maps at the same locations and passing them through the approximate imaging equation with the estimated test data parameters.
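This augmentation step, pushing atlas NMR patches through the estimated forward model, can be sketched minimally as follows; the log-linear approximation form and all names are hypothetical, standing in for the paper's Eqns. (2)–(3):

```python
import numpy as np

def augment_patch(rho_p, t1_p, t2s_p, theta):
    """Synthesize a test-contrast intensity patch from atlas NMR patches
    (proton density, T1, T2*) with a hypothetical 3-parameter
    approximation of the imaging equation:
        S = exp(t0) * rho * exp(t1 / T1) * exp(t2 / T2*)
    theta = (t0, t1, t2) is the parameter set estimated from the test
    image in Step 1."""
    t0, t1c, t2c = theta
    return np.exp(t0) * rho_p * np.exp(t1c / t1_p) * np.exp(t2c / t2s_p)
```

Because the same NMR patches can be rendered under any parameter set, one atlas patch yields arbitrarily many contrast-matched training patches.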
In addition, we sample the 3D space of approximate pulse sequence parameters and generate non-test-data-specific augmented patches to increase the breadth of the available training. These augmented patches are also concatenated with the warped FreeSurfer atlas coordinate patches to generate augmented features and added to the training set (Fig. 1(a)).
Original and augmented features, with corresponding manually labeled image patches, are used to train the U-Net, which has three pooling and corresponding upsampling stages (see Fig. 1(a)). Each orange block consists of two 3D convolutional layers followed by a batch normalization layer. The first block has 32 filters, and each subsequent encoding block has twice as many filters as the previous one. The red arrows signify skip connections, where data from the encoding layers is concatenated as input to the decoding layers. All activations are ReLU except for the last layer, which has a softmax activation producing a probability for each label. We use the Adam optimization algorithm to minimize a soft Dice-based loss calculated over all labels. The batch size is 32 and is divided equally between augmented and original training features. The unaugmented training set consists of patches from 16 subjects; validation data was generated from three subjects. During prediction, we extract features from the test image
and apply the trained U-Net to generate overlapping patches of label probabilities. These are combined by averaging in the overlapping regions to produce the final probability map; the label with the highest probability at each voxel is selected in the hard segmentation.
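The patch-averaging step described above can be sketched as follows (function and argument names are illustrative; in the paper's pipeline the per-patch probabilities come from the trained U-Net):

```python
import numpy as np

def stitch_probabilities(patch_probs, corners, vol_shape, n_labels, patch_size):
    """Average overlapping patch-wise label probabilities into a full
    probability map, then take an argmax for the hard segmentation.

    patch_probs: list of (n_labels, p, p, p) probability arrays.
    corners: list of (x, y, z) patch corner coordinates in the volume.
    """
    acc = np.zeros((n_labels,) + vol_shape)
    cnt = np.zeros(vol_shape)
    p = patch_size
    for probs, (x, y, z) in zip(patch_probs, corners):
        acc[:, x:x + p, y:y + p, z:z + p] += probs
        cnt[x:x + p, y:y + p, z:z + p] += 1
    acc /= np.maximum(cnt, 1)  # average where patches overlap
    return acc.argmax(axis=0)  # hard segmentation
```

Averaging before the argmax smooths label boundaries across patch seams, which is why overlapping patches are preferred over a non-overlapping tiling.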
3 Experiments and Results
3.1: Same Scanner Dataset
In this experiment we compare the performance of PSACNN against an unaugmented CNN (CNN), FreeSurfer segmentation (ASEG), SAMSEG, and MALF on test data with the same acquisition protocol as the training data. The complete dataset consists of 39 subjects with 1.5 T Siemens Vision MPRAGE acquisitions (TR=9.7 ms, TE=4 ms, TI=20 ms, voxel size = mm) with expert manual labels drawn according to a previously published protocol. We chose a subset of 16 subjects as training data for CNN and PSACNN and as the atlas set for MALF; we used 3 subjects for validation of the CNNs and 20 for testing. Figure 2 shows the results for all the algorithms and an example segmentation produced by PSACNN. CNN (red) and PSACNN (blue) have significantly higher Dice overlap than the next best method for most structures, as tested using a paired t-test.
3.2: Different Scanner Datasets
In this experiment we compare the accuracy of PSACNN against the other methods on two datasets: (a) a Siemens dataset of 14 subject MPRAGE scans acquired on a 1.5 T Siemens SONATA scanner with the same pulse sequence as the training data, and (b) a GE dataset of 13 subject SPGR (spoiled gradient recalled) scans acquired on a 1.5 T GE Signa scanner (TR=35 ms, TE=5 ms, voxel size = mm). Both datasets have expert manual segmentations generated with the same protocol as the training data. Figure 4 compares the Dice coefficients for all the methods. On the Siemens dataset, CNN and PSACNN are significantly better than the rest (Fig. 4(a)), as is expected given its similarity to the training data. On the GE scans, which present a noticeably different tissue contrast (Fig. 3), all methods show reduced overlap (Fig. 4(b)), and CNN has the worst performance of all, as it is unable to generalize. PSACNN, on the other hand, is robust to the contrast change thanks to the pulse sequence-based augmentation, and produces segmentations comparable in accuracy to state-of-the-art algorithms such as SAMSEG and MALF, with an order of magnitude lower processing time.
3.3: Multi-scanner Consistency
In this experiment we tested the consistency of the segmentations produced by the various methods on four datasets acquired from 15 subjects: MEF, multi-echo FLASH acquired on a Siemens 3 T TRIO scanner; TRIO, MPRAGE acquired on the Siemens TRIO; GE, MPRAGE acquired on a 1.5 T GE Signa; and Siemens, MPRAGE acquired on a 1.5 T Siemens SONATA scanner. The Siemens scan parameters are closest to those of the training dataset, so we calculated the absolute difference in structure volumes obtained by each segmentation method on the MEF, TRIO, and GE datasets with respect to the Siemens dataset. In Table 1 we report only WM volume differences, due to lack of space and because total white matter volume is a good indicator of the accumulated effects of pulse sequence variation. MALF shows the highest consistency for GE and is second best for the rest. PSACNN provides the best performance for TRIO and second best for the GE dataset.
4 Discussion and Conclusions
We have described PSACNN, a CNN-based segmentation algorithm that adapts itself to the test data by generating test-data-specific augmentation using an approximate forward model of MRI image formation. The augmentation can be used to robustly train or fine-tune any underlying predictor. We show state-of-the-art segmentation performance on diverse datasets. Prediction is fast, taking less than a minute on a single process. In the future we intend to use more accurate imaging equations and to simulate other pulse sequences to increase the range of application.
-  Asman, A.J., Landman, B.A.: Formulating spatially varying performance in the statistical fusion framework. IEEE TMI 31(6), 1326–1336 (June 2012)
-  Buckner, R., Head, D., Parker, et al.: A unified approach for morphometric and functional data analysis in young, old, and demented adults using automated atlas-based head size normalization: reliability and validation against manual measurement of total intracranial volume. NeuroImage 23(2), 724 – 738 (2004)
-  Fischl, B., Salat, D.H., et al.: Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain. Neuron 33(3), 341 – 355 (2002)
-  Fischl, B., Salat, D.H., van der Kouwe, A.J., et al.: Sequence-independent segmentation of magnetic resonance images. NeuroImage 23(S1), S69–S84 (2004)
-  Han, X., Jovicich, J., Salat, D., et al.: Reliability of MRI-derived measurements of human cerebral cortical thickness: The effects of field strength, scanner upgrade and manufacturer. NeuroImage 32(1), 180 – 194 (2006)
-  Jog, A., Carass, A., Roy, S., Pham, D.L., Prince, J.L.: MR image synthesis by contrast learning on neighborhood ensembles. MedIA 24(1), 63–76 (2015)
-  Jovicich, J., et al.: Brain morphometry reproducibility in multi-center 3T MRI studies: A comparison of cross-sectional and longitudinal segmentations. NeuroImage 83, 472 – 484 (2013)
-  Li, W., Wang, G., Fidon, L., Ourselin, S., Cardoso, M.J., Vercauteren, T.: On the compactness, efficiency, and representation of 3D convolutional networks: Brain parcellation as a pretext task. In: IPMI (2017)
-  Mugler, J.P., Brookeman, J.R.: Three-dimensional magnetization-prepared rapid gradient-echo imaging (3D MPRAGE). Mag. Reson. Med. 15(1), 152–157 (1990)
-  Puonti, O., Iglesias, J.E., Leemput, K.V.: Fast and sequence-adaptive whole-brain segmentation using parametric Bayesian modeling. NeuroImage 143, 235 – 249 (2016)
-  Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: MICCAI 2015. pp. 234–241 (2015)
-  Shiee, N., et al.: A topology-preserving approach to the segmentation of brain images with Multiple Sclerosis lesions. NeuroImage 49(2), 1524–1535 (2010)
-  Wachinger, C., Reuter, M., Klein, T.: DeepNAT: Deep convolutional neural network for segmenting neuroanatomy. NeuroImage (2017)
-  Wang, H., Suh, J.W., Das, S.R., Pluta, J.B., Craige, C., Yushkevich, P.A.: Multi-atlas segmentation with joint label fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(3), 611–623 (March 2013)
-  Wang, J., He, L., Zheng, H., Lu, Z.L.: Optimizing the magnetization-prepared rapid gradient-echo (MP-RAGE) sequence. PLoS ONE 9(5), e96899 (2014)