Ex vivo MRI of the brain provides remarkable advantages over in vivo MRI for visualizing detailed neuroanatomy and linking macroscopic morphometric measures such as cortical thickness to underlying cytoarchitecture and pathology [mancini2020multimodal]. It helps in characterizing the underlying anatomy at the scale of subcortical layers [augustinack2013medial], such as hippocampal subfields in the medial temporal lobe (MTL) [yushkevich2021three, ravikumar2020building]. Compared to in vivo MRI, ex vivo MRI is not affected by head or respiratory motion artifacts and has much less stringent time and specific absorption rate constraints. Compared to histology, it does not suffer from distortion or tearing of brain tissue, thereby giving flexibility in acquiring ultra-high resolution images. Indeed, ex vivo MRI is often used to provide a 3D reference space onto which to map 2D histological images. Combined analysis of ex vivo MRI and histology makes it possible to link morphological changes in the brain to underlying pathology as well as to generate anatomically correct parcellations of the brain based on cytoarchitecture [schiffer2021convolutional], and pathoarchitecture [augustinack2013medial].
There has been substantial work in brain MRI parcellation, such as FreeSurfer [fischl2012freesurfer] and recent efforts based on deep learning [henschel2020fastsurfer, chen2018voxresnet]. However, these approaches focus on in vivo MRI, and little work has addressed automated segmentation of ex vivo MRI. Existing ex vivo segmentation methods have been region specific. Recent developments include automated deep learning methods for high-resolution cytoarchitectonic mapping of the occipital lobe in 2D histological sections [schiffer2021convolutional]. The work in [iglesias2015computational] developed an atlas to segment the MTL using manual segmentations of ex vivo images. Yet an ex vivo segmentation method applicable to a variety of brain regions has yet to be described. The limited availability of ex vivo 3D MRI segmentation algorithms may be explained by several factors: only a few groups focus on whole-brain ex vivo image analysis, so specimens, scans, and labeled ground-truth segmentations are scarce; scanning protocols are more heterogeneous than for in vivo structural MRI; and ex vivo images have larger dimensions, greater textural complexity, and more pronounced imaging artifacts than in vivo MRI.
In this work, we present a novel dataset of 32 high resolution (0.3 x 0.3 x 0.3 mm) 7 Tesla ex vivo MRI scans of whole brain hemispheres of older adult patients with Alzheimer’s Disease or Related Dementias (ADRD) or cognitively normal adults. We then benchmark nine deep learning architectures for segmenting cortical and subcortical gray matter in whole brain hemispheres, with limited patch-based training data. We measure cortical thickness at several key locations in the cortex and correlate these automated measures with thickness measurements obtained using a user-guided semi-automated protocol. High consistency between these two sets of measures supports the use of deep learning based automated thickness measures for ex vivo brain morphometry. Additionally, we show that networks trained on T2w images acquired at 7 Tesla generalize to ex vivo T2*-weighted (T2*w) gradient echo images acquired at 7 Tesla, and to ex vivo images acquired at the lower field strength of 3 Tesla.
Image Acquisition. We analyze a dataset of 32 ex vivo whole-hemisphere MRI scans. Patients were selected for the study from our ongoing research autopsy program. Data were drawn from 11 females (Age: 64-94) and 21 males (Age: 54-97) with Alzheimer’s Disease or Related Dementias (ADRD) or cognitively normal adults. Human brain specimens were obtained in accordance with Institutional Review Board guidelines. Specimens were scanned after at least a 4-week fixation period. T2w images were acquired using a 3D-encoded T2 SPACE sequence with 0.28 mm isotropic resolution, 3 s repetition time (TR), echo time (TE) of 383 ms, turbo factor 188, echo train duration 951 ms, and bandwidth 348 Hz/px. All data were acquired on a Siemens MAGNETOM Terra 7 Tesla scanner using a custom birdcage transmit/receive coil. Sample slices are shown in Fig. 2.
Patch-level Gray Matter Segmentation. To train the neural networks, we sampled five 3D image patches of size 64 x 64 x 64 around the orbitofrontal, anterior temporal, inferior prefrontal, primary motor, and primary somatosensory cortices from 6 brain hemispheres, resulting in a total of 30 patches. Fig. 1 C shows sample patch images and the corresponding ground truth labels with 3D renderings. Five manual raters, divided into groups of two and three, labeled gray matter as the foreground and the rest of the image as the background, using a combination of manual tracing and the semi-automated segmentation tool in ITK-SNAP software [yushkevich2019user]. Inter-rater reliability scores were computed for these manual segmentations in terms of Dice Coefficient (DSC): Raters 1&2: 95.26 ± 1.37%, Raters 1&3: 94.64 ± 1.64%, Raters 2&3: 94.54 ± 1.20%, Raters 4&5: 92.04 ± 4.26%.
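The inter-rater overlap statistic above can be reproduced with a few lines of NumPy; this is a generic sketch of the DSC computation on binary masks, not the ITK-SNAP tooling itself, and the toy masks are illustrative.

```python
import numpy as np

def dice_coefficient(seg_a, seg_b):
    """Dice similarity coefficient (DSC) between two binary masks."""
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    intersection = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    # Convention: two empty masks agree perfectly
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Toy example: two slightly different raters' masks on a 4x4x4 patch
rater1 = np.zeros((4, 4, 4), dtype=bool)
rater2 = np.zeros((4, 4, 4), dtype=bool)
rater1[1:3, 1:3, 1:3] = True          # 8 foreground voxels
rater2[1:3, 1:3, 1:4] = True          # 12 foreground voxels, 8 shared
print(dice_coefficient(rater1, rater2))  # 2*8 / (8+12) = 0.8
```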
Thickness measurements at key cortical locations. To obtain localized quantitative signatures of cortical morphometry, in each of the 32 hemispheres we identified 13 cortical landmarks (Fig. 1 A): visual, midfrontal, orbitofrontal, motor, anterior and posterior cingulate, superior and ventrolateral temporal, anterior temporal pole, anterior insula, inferior frontal, angular gyrus, and superior parietal. These locations were chosen to gather neuropathology data as part of a separate ongoing research project at our Center. To measure cortical thickness at these locations, we use the pipeline developed in [wisse2021downstream], shown in Fig. 1 B.
We benchmark variants of popular biomedical image segmentation deep learning models: (1.) nnUNet [isensee2021nnu]; four variants of AnatomyNet [zhu2019anatomynet] based on squeeze-and-excitation blocks [rickmann2019project]: (2.) Spatial excitation AnatomyNet (SE), (3.) Channel excitation AnatomyNet (CE), (4.) Projection layer-based [rickmann2019project] AnatomyNet (Project), (5.) Channel-spatial excitation AnatomyNet (CE + SE); (6.) 3D Unet-like network [khandelwal2020domain]; (7.) VoxResNet [chen2018voxresnet]; (8.) VNet [milletari2016v]; and (9.) Attention Unet [oktay2018attention].
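As an illustration of the recalibration mechanism shared by the AnatomyNet variants, a channel squeeze-and-excitation block [rickmann2019project] can be sketched in NumPy as follows; the feature-map size, reduction ratio, and random weights are illustrative assumptions, not the parameters of the benchmarked networks.

```python
import numpy as np

def channel_excitation(x, w1, b1, w2, b2):
    """Channel squeeze-and-excitation on a 3D feature map.

    x: feature map of shape (C, D, H, W)
    w1/b1, w2/b2: weights of two fully connected layers
    mapping C -> C//r -> C, with r the reduction ratio.
    """
    c = x.shape[0]
    # Squeeze: global average pool over the spatial dimensions
    z = x.reshape(c, -1).mean(axis=1)                # (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid
    h = np.maximum(w1 @ z + b1, 0.0)                 # (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))         # (C,)
    # Recalibrate: scale each channel by its excitation weight in (0, 1)
    return x * s[:, None, None, None]

rng = np.random.default_rng(0)
C, r = 8, 2
x = rng.standard_normal((C, 4, 4, 4))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y = channel_excitation(x, w1, b1, w2, b2)
print(y.shape)  # (8, 4, 4, 4)
```

The spatial-excitation (SE) variant is analogous, except the sigmoid gate is computed per voxel from a 1x1x1 convolution over channels rather than per channel from a pooled descriptor.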
We use PyTorch 1.5.1 and Nvidia Quadro RTX 5000 GPUs to train the models on the user-annotated patches described above. Patches were standardized and then normalized between 0 and 1. For all models except the off-the-shelf nnUNet, which has its own internal optimization procedure, we use a batch size of 2 and the Adam optimizer with learning rate 0.001 and weight decay 0.00005, training for 30 epochs until convergence with the Generalized Dice Coefficient as the loss function. We applied the following random data augmentations to each image patch: flipping, rotation, and elastic deformation. Inference was performed on whole-hemisphere images via a patch-wise sliding window, using the model that gave the best validation accuracy across epochs.
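The patch normalization and patch-wise sliding-window inference described above can be sketched as follows. This is a minimal NumPy sketch with a stand-in `model` callable; the stride, the averaging of overlapping predictions, and the 0.5 threshold are assumptions for illustration, not the exact inference settings used.

```python
import numpy as np

def preprocess(patch):
    """Standardize to zero mean / unit variance, then rescale to [0, 1]."""
    p = (patch - patch.mean()) / (patch.std() + 1e-8)
    return (p - p.min()) / (p.max() - p.min() + 1e-8)

def sliding_window_inference(volume, model, window=64, stride=32):
    """Run `model` on overlapping 3D windows and average the probabilities."""
    prob = np.zeros(volume.shape, dtype=np.float64)
    count = np.zeros(volume.shape, dtype=np.float64)
    D, H, W = volume.shape
    zs = range(0, max(D - window, 0) + 1, stride)
    ys = range(0, max(H - window, 0) + 1, stride)
    xs = range(0, max(W - window, 0) + 1, stride)
    for z in zs:
        for y in ys:
            for x in xs:
                patch = volume[z:z+window, y:y+window, x:x+window]
                p = model(preprocess(patch))  # per-voxel foreground probability
                prob[z:z+window, y:y+window, x:x+window] += p
                count[z:z+window, y:y+window, x:x+window] += 1.0
    # Average overlapping windows, then threshold to a binary mask
    return (prob / np.maximum(count, 1.0)) > 0.5

# Toy run with a thresholding stand-in for the trained network
volume = np.random.default_rng(1).random((96, 96, 96))
mask = sliding_window_inference(volume, lambda p: (p > 0.5).astype(float))
print(mask.shape)  # (96, 96, 96)
```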
Evaluation. First, we compare the performance of the deep learning architectures at the patch level by reporting Dice Coefficient (DSC) and the 95th-percentile Hausdorff Distance (HD95) in a five-fold cross-validation setting over the 30 patches. We then employ the best performing model, based on DSC, to segment the cortical mantle in whole hemispheres, and compute the thickness of the cortical mantle around the 13 landmarks as described in Section 2. Finally, we correlate the cortical thickness of manual and automated segmentations via Pearson’s correlation coefficient with a t-distribution as the test statistic, reporting the p-value (at the 0.05 significance level), and the average fixed-raters Intra-class Correlation Coefficient (ICC) for the 13 cortical locations.
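The two agreement statistics can be computed as below; this is a minimal NumPy sketch of Pearson's r with its t test statistic and of the average fixed-raters ICC, i.e. ICC(3,k), using the standard two-way ANOVA formulas. It is not the exact statistical package we used, and the thickness values are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation and its t-distributed test statistic (df = n - 2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]
    n = len(x)
    t = r * np.sqrt((n - 2) / (1 - r**2))  # compare against t(n-2) for the p-value
    return r, t

def icc_3k(data):
    """Average fixed-raters ICC, ICC(3,k): rows = subjects, cols = raters."""
    data = np.asarray(data, float)
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((data - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / ms_rows

manual = [2.1, 2.8, 3.5, 4.2, 4.9]   # hypothetical thickness values (mm)
auto = [2.0, 2.9, 3.4, 4.4, 4.8]
r, t = pearson_r(manual, auto)
icc = icc_3k(np.column_stack([manual, auto]))
print(round(r, 3), round(icc, 3))
```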
4 Results and Discussion
4.1 Deep learning segmentations
Table 1 tabulates the performance of the different neural network architectures across five-fold cross-validation. nnUNet outperforms the rest by a large margin, with a DSC of 98.52 ± 5.84% and an HD95 of 0.28 ± 0.00 mm. Given its superior performance, we use nnUNet to segment the cortical mantle in whole hemispheres across the 32 subjects. We show qualitative results in Fig. 2. The lower-performing models (Fig. 2 B-E) mislabel white matter as gray matter. AnatomyNet and its variants (Fig. 2 F-I) are able to distinguish gray matter from white matter, but fail to segment the low-intensity anterior and posterior regions caused by coil limitations, and also under-segment parts of the cortex (white arrows). Fig. 2 J-L show that the best performing model, nnUNet, clearly demarcates the GM/WM boundary and segments regions with low signal that were not included in the training patches, making its performance even more remarkable. Fig. 2 M shows 3D renderings of the whole cortical segmentation.
| Model | DSC (%) | HD95 (mm) |
|---|---|---|
| nnUNet | 98.52 ± 5.84 | 0.28 ± 0.00 |
| AnatomyNet (SE) | 93.29 ± 6.30 | 0.76 ± 0.66 |
| AnatomyNet (CE) | 93.07 ± 6.59 | 0.83 ± 0.69 |
| AnatomyNet (Project) | 91.54 ± 7.62 | 0.89 ± 0.69 |
| AnatomyNet (CE + SE) | 90.61 ± 8.21 | 0.96 ± 0.72 |
| 3D Unet | 90.32 ± 8.60 | 1.02 ± 0.57 |
| VoxResNet | 87.43 ± 13.90 | 1.34 ± 1.46 |
| VNet | 78.65 ± 16.65 | 2.54 ± 2.01 |
| Attention Unet | 78.53 ± 17.37 | 2.26 ± 1.86 |
4.2 Cortical Thickness Measurements
We correlate thickness (mm) between ground truth and automated nnUNet-based segmentations at the 13 landmarks. Fig. 3 shows good agreement between ground truth and automated thickness, with 8 regions having an r-value greater than 0.6 and 11 regions reaching statistical significance at the 0.05 level; the exceptions are the inferior frontal and superior temporal regions. We also observe high ICC scores, with 8 regions having an ICC greater than 0.8, confirming that the automated segmentations are accurate enough to yield reliable cortical thickness measurements.
4.3 Generalization to other imaging sequences
In Fig. 4, we qualitatively show that our model, trained on 7 Tesla 0.3 x 0.3 x 0.3 mm T2w images, generalizes well to MRI sequences and resolutions unseen during training. Segmentation results on T2*w gradient echo ex vivo images, acquired [tisdall2021joint] at 0.28 mm and 0.16 mm isotropic resolution, are shown in Fig. 4 A and B, respectively. Fig. 4 C and D show that our model is able to segment gray matter in a publicly available ex vivo T2w image acquired at 3 Tesla at a lower resolution of 0.5 x 0.5 x 0.5 mm [mancini2020multimodal].
5 Conclusion and Future Work
Our results show that even using limited patch-level training data from six subjects, nnUNet (and to a lesser extent AnatomyNet) is able to generate high-quality segmentations of cortical and subcortical gray matter in ex vivo MRI of brain hemispheres, generalizing well to areas of low contrast unseen during training, as well as to other MRI protocols, field strengths, and resolutions. Moreover, thickness measures derived from nnUNet segmentations concur with user-supervised thickness measurements, suggesting the feasibility of fully automated cortical thickness analysis in ex vivo MRI analogous to the way FreeSurfer is used for in vivo MRI morphometry. A limitation of our approach is that it does not separate subcortical gray matter from the cortex. In future work, we intend to address this limitation using anatomical priors, and develop techniques for groupwise normalization of ex vivo MRI and correlate cortical thickness with neuropathology.