Brain MRI segmentation is an important task in many clinical applications. Various approaches to brain analysis rely on accurate segmentation of anatomical regions. For example, segmentation is commonly used for measuring and visualizing brain structures, delineating lesions, analysing brain development, and characterizing brain disorders such as Alzheimer's disease, epilepsy, schizophrenia, multiple sclerosis (MS), cancer, and infectious and degenerative diseases. Manual segmentation is the gold standard for in-vivo images. However, it requires a neuroradiologist to outline structures slice by slice, which is highly time-consuming and prone to rater bias. There is therefore a need for automated segmentation approaches that provide accuracy close to that of expert raters with high reproducibility.
Early work on segmentation of normal brain structures focused on white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF), which is important for studying early brain development in infants and for quantitative assessment of brain tissue and intracranial volume in large-scale studies. The classical approaches to brain tissue segmentation are atlas-based approaches [12, 7], which match intensity information between an atlas and target images, and pattern-recognition approaches, which classify tissues based on a set of local intensity features. The MRBrainS Challenge 2013 was held to compare state-of-the-art segmentation algorithms on three brain structures, in conjunction with the 16th International Conference on Medical Image Computing and Computer-Assisted Intervention. Deep-learning-based approaches have shown performance superior to traditional state-of-the-art methods on the segmentation of brain stroke lesions, white matter lesions, and brain tumors [6, 5, 9].
In this paper, we present a deep-learning-based method for segmenting eight brain tissues: cortical gray matter (GM), basal ganglia, WM, white matter lesions/hyperintensities (WMH), CSF, ventricles, cerebellum, and brain stem. A deep dilated residual U-Net was adopted to learn the context and texture information of the different brain tissues. Multi-sequence data, including T1, T1-IR, and FLAIR, which capture complementary information about different brain structures, were used as input. The proposed 2D network is more computationally efficient than a 3D network and than the traditional U-Net. Experimental results show that the proposed method outperforms the traditional U-Net.
2.1 Dataset and Protocols
Thirty MRI scans were acquired on a 3.0 T Philips Achieva MR scanner at the University Medical Center Utrecht (Netherlands). The following sequences were acquired and used for the evaluation framework: 3D T1 (TR: 7.9 ms, TE: 4.5 ms), T1-IR (TR: 4416 ms, TE: 15 ms, TI: 400 ms), and T2-FLAIR (TR: 11000 ms, TE: 125 ms, TI: 2800 ms). The sequences were aligned by rigid registration using Elastix, and bias correction was performed using SPM8. After registration, the voxel size of all provided sequences (T1, T1-IR, and T2-FLAIR) was 0.96 × 0.96 × 3.00 mm³. Seven scans with annotations were released as a public training set, and the remaining twenty-three scans were used as a hidden test set. For details on the ranking procedure, please refer to the challenge website.
2.1.2 Evaluation Metrics
Three types of measures were employed to evaluate the segmentation results. The Dice coefficient determines the spatial overlap and is defined as:

$$DSC(G, P) = \frac{2\,|G \cap P|}{|G| + |P|},$$

where G is the reference standard and P is the segmentation result.
The 95th percentile of the Hausdorff distance determines the distance between the segmentation boundaries. The Hausdorff distance is defined as:

$$H(G, P) = \max\left\{ \sup_{x \in G} \inf_{y \in P} d(x, y),\; \sup_{y \in P} \inf_{x \in G} d(x, y) \right\},$$

where d(x, y) denotes the distance between x and y, sup denotes the supremum, and inf the infimum.
The third measure is volumetric similarity. Let $V_G$ and $V_P$ be the volumes of the segmented regions in G and P, respectively. Then the volumetric similarity (VS) in percentage is defined as:

$$VS = \left(1 - \frac{|V_G - V_P|}{V_G + V_P}\right) \times 100\%.$$
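All three measures can be computed directly from binary masks. The following numpy/scipy sketch (the function names are ours, and Dice and VS are returned as fractions rather than percentages) follows the definitions above; HD95 is computed from pairwise voxel distances, which is feasible for small masks:

```python
import numpy as np
from scipy.spatial.distance import cdist

def dice(g, p):
    """Dice coefficient between binary masks g (reference) and p (prediction)."""
    g, p = g.astype(bool), p.astype(bool)
    denom = g.sum() + p.sum()
    return 2.0 * np.logical_and(g, p).sum() / denom if denom else 1.0

def volumetric_similarity(g, p):
    """VS = 1 - |V_G - V_P| / (V_G + V_P), as a fraction."""
    vg, vp = g.sum(), p.sum()
    return 1.0 - abs(vg - vp) / (vg + vp) if (vg + vp) else 1.0

def hd95(g, p):
    """95th-percentile Hausdorff distance between the voxel sets of two masks."""
    gc, pc = np.argwhere(g), np.argwhere(p)
    d = cdist(gc, pc)                 # all pairwise Euclidean distances
    d_gp = d.min(axis=1)              # each G voxel to its nearest P voxel
    d_pg = d.min(axis=0)              # each P voxel to its nearest G voxel
    return max(np.percentile(d_gp, 95), np.percentile(d_pg, 95))
```

In practice the challenge evaluates boundary (surface) distances with anisotropic voxel spacing taken into account; this sketch ignores spacing for brevity.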
3.1 Image Preprocessing
A patient-wise normalization of image intensities was performed during both training and testing. For each patient's scan, the mean and standard deviation were computed over all voxels, and each image volume was then normalized to zero mean and unit standard deviation. Rotation, shearing, scaling along the horizontal direction (x-scaling), and scaling along the vertical direction (y-scaling) were employed for data augmentation. After augmentation, the training dataset was four times larger.
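The preprocessing above can be sketched as follows; the function names and the choice of `scipy.ndimage.rotate` for the rotation augmentation are our own illustration, not the paper's implementation:

```python
import numpy as np
from scipy.ndimage import rotate

def normalize_patient(volume):
    """Patient-wise z-score normalization over all voxels of one scan."""
    mu = volume.mean()
    sigma = volume.std()
    return (volume - mu) / sigma

def augment_rotate(slice_2d, angle_deg):
    """One of the listed augmentations: in-plane rotation of a 2D slice.
    reshape=False keeps the output the same size as the input."""
    return rotate(slice_2d, angle_deg, reshape=False, order=1, mode='nearest')
```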
3.2 2D Dilated Residual U-Net
We used the Dilated Residual U-Net (DRUNet), which was originally proposed in  for segmenting nerve head tissues in optical coherence tomography images. DRUNet exploits the inherent advantages of U-Net skip connections, residual learning, and dilated convolutions to capture rich context information, and offers robust brain structure segmentation with a minimal number of trainable parameters.
The DRUNet architecture is presented in Fig. 1. The model consists of a downsampling part and an upsampling part, each comprising one standard block and two residual blocks. Corresponding blocks in the downsampling and upsampling parts are connected through skip connections. The convolution layers in both block types have 32 filters of size 3×3. In total, the network has 156,105 trainable parameters.
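The dilated convolutions used in these blocks enlarge the receptive field without adding parameters: a 3×3 kernel at dilation rate r behaves like a sparse kernel covering an effective (2r+1)×(2r+1) window. A small numpy illustration (helper names are ours):

```python
import numpy as np

def dilate_kernel(k, rate):
    """Insert (rate - 1) zeros between kernel taps. A dilated convolution is
    equivalent to an ordinary convolution with this zero-inflated kernel."""
    if rate == 1:
        return k
    kh, kw = k.shape
    out = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1))
    out[::rate, ::rate] = k
    return out

def receptive_field(kernel_size, rate):
    """Effective spatial extent of one dilated convolution layer."""
    return (kernel_size - 1) * rate + 1
```

Stacking layers with increasing dilation rates is what lets DRUNet capture large-scale context with the same small parameter count per layer.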
3.3 Combination of Modalities
Multi-sequence data, including T1-weighted (T1), T1-weighted inversion recovery (T1-IR), and FLAIR sequences, which capture complementary information about different brain structures, were used for training the networks. In clinical practice, the combination of FLAIR and T1 is beneficial for segmenting white matter lesions, while the combination of T1 and T1-IR is helpful for annotating cerebrospinal fluid. We therefore fed different combinations of modalities to multiple networks.
3.4 Ensemble Model
To improve the robustness of our model, an ensemble method was used in the testing stage. Given a new test subject, the subject is segmented based on the probability maps averaged over the ensemble models.
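The ensemble step can be sketched as follows, assuming each model outputs a per-class softmax probability map; the function name and array layout are our own illustration:

```python
import numpy as np

def ensemble_segment(prob_maps):
    """prob_maps: list of arrays of shape (num_classes, H, W), one per model,
    each a softmax output. Average the probabilities across models, then take
    the per-voxel argmax over classes to obtain the label map."""
    avg = np.mean(np.stack(prob_maps), axis=0)
    return np.argmax(avg, axis=0)
```

Averaging probabilities before the argmax (rather than majority-voting hard labels) lets a confident model outvote several uncertain ones.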
3.5 Our Submissions
3.5.1 Submission 1
We used only DRUNet, simultaneously segmenting all ten labels, including infarction and other pathologies, during the training of the network. We generated five DRUNet models with the same architecture but trained with shuffled batches. In the testing stage, each subject was segmented based on the probability maps averaged over the ensemble models.
3.5.2 Submission 2
We used two Dilated Residual U-Nets (DRUNets) and one traditional U-Net to segment different labels. Since not all labels were annotated on the same modalities (white matter lesions were annotated on the FLAIR scan, while the outer border of the CSF was segmented using both the T1-weighted and the T1-weighted inversion recovery scans), we employed a multi-stage, coarse-to-fine approach with different combinations of input modalities. First, a coarse segmentation of the eight brain tissues (the other labels, infarction and pathologies, were set to the background label) was performed on the FLAIR and T1-weighted modalities by DRUNet (model 1). Second, CSF was independently segmented on the T1 and T1-IR modalities by DRUNet (model 2). Third, since segmentation of white matter lesions is a very challenging task, we used the pre-trained model of the winning method of the MICCAI WMH challenge  (model 3) to perform this segmentation independently. Finally, we fused the multi-stage segmentation results. For model 1 and model 2, five DRUNet models with the same architecture were trained with shuffled batches.
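The final fusion step might look like the following sketch. The paper does not specify the label IDs or the overwrite precedence between the specialist models, so both are assumptions for illustration only:

```python
import numpy as np

# Hypothetical label ids for illustration; the challenge's actual numbering
# may differ.
CSF_LABEL = 5
WMH_LABEL = 4

def fuse(coarse, csf_mask, wmh_mask):
    """Overlay the specialist predictions on model 1's coarse tissue map:
    model 2's CSF mask and model 3's WMH mask overwrite the coarse labels.
    WMH is applied last, i.e., it takes precedence (an assumption)."""
    fused = coarse.copy()
    fused[csf_mask.astype(bool)] = CSF_LABEL
    fused[wmh_mask.astype(bool)] = WMH_LABEL
    return fused
```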
4.1 Leave-one-subject-out Evaluation
To test the generalization performance of our systems across subjects, we conducted an experiment on the public training dataset (seven subjects) in a leave-one-subject-out setting. Specifically, we used the subject IDs to split the public training dataset into training and validation sets: in each split, slices from six subjects were used for training and slices from the remaining subject for testing. This procedure was repeated until every subject had been used for testing. The results are shown in Table 1. There is a significant segmentation difference on subject 4. On inspecting the brain structures of subject 4, we found it was a healthy brain scan without WMHs, infarctions, or other lesions. The performance difference could arise because the models in the first submission were trained on 10 labels, including infarctions and other lesions, while the models in the second submission were trained on the 8 main structures, excluding those two labels. When testing on healthy scans, the models trained on the 8 main healthy tissues could be more effective, since the training and testing data distributions are similar.
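The leave-one-subject-out splitting described above can be sketched as follows (the helper name is ours):

```python
def loso_splits(subject_ids):
    """Leave-one-subject-out: yield (train_ids, test_id) pairs so that each
    subject serves as the held-out test subject exactly once."""
    for held_out in subject_ids:
        train = [s for s in subject_ids if s != held_out]
        yield train, held_out
```

Splitting by subject ID, rather than by slice, ensures that no slices from the test subject leak into training.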
[Table 1: evaluation metrics for Subjects 1–7; table values not recoverable from this copy.]
4.2 Comparison with U-Net
We further compared the performance of the proposed method (submission 1) with a traditional U-Net using the state-of-the-art architecture proposed in . As shown in Table 2, our approach generally outperformed the traditional U-Net, especially in the segmentation of WM and CSF, with improvements of 8% and 11% in Dice score, respectively. WM and CSF are both large structures in the brain, and we conclude that dilated convolutions are beneficial for capturing the context information of large targets. Furthermore, our model has far fewer trainable parameters (156,105 vs. 8,748,609), so training the network is computationally efficient. The segmentation results of both DRUNet and U-Net on test case 70 are shown in Fig. 2.
4.3 Results on Hidden Testing Cases
-  Devalla, S.K., Renukanand, P.K., Sreedhar, B.K., Perera, S., Mari, J.M., Chin, K.S., Tun, T.A., Strouthidis, N.G., Aung, T., Thiery, A.H., et al.: Drunet: A dilated-residual u-net deep learning network to digitally stain optic nerve head tissues in optical coherence tomography images. arXiv preprint arXiv:1803.00232 (2018)
-  He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
-  Klein, S., Staring, M., Murphy, K., Viergever, M.A., Pluim, J.P.: Elastix: a toolbox for intensity-based medical image registration. IEEE transactions on medical imaging 29(1), 196–205 (2010)
-  Li, H., Jiang, G., Wang, R., Zhang, J., Wang, Z., Zheng, W.S., Menze, B.: Fully convolutional network ensembles for white matter hyperintensities segmentation in mr images. arXiv preprint arXiv:1802.05203 (2018)
-  Li, H., Zhang, J., Muehlau, M., Kirschke, J., Menze, B.: Multi-scale convolutional-stack aggregation for robust white matter hyperintensities segmentation. arXiv preprint arXiv:1807.05153 (2018)
-  Maier, O., Menze, B.H., von der Gablentz, J., Häni, L., Heinrich, M.P., Liebrand, M., Winzeck, S., Basit, A., Bentley, P., Chen, L., et al.: Isles 2015-a public evaluation benchmark for ischemic stroke lesion segmentation from multispectral mri. Medical image analysis 35, 250–269 (2017)
-  Makropoulos, A., Gousias, I.S., Ledig, C., Aljabar, P., Serag, A., Hajnal, J.V., Edwards, A.D., Counsell, S.J., Rueckert, D.: Automatic whole brain mri segmentation of the developing neonatal brain. IEEE transactions on medical imaging 33(9), 1818–1831 (2014)
-  Mendrik, A.M., Vincken, K.L., Kuijf, H.J., Breeuwer, M., Bouvy, W.H., De Bresser, J., Alansary, A., De Bruijne, M., Carass, A., El-Baz, A., et al.: Mrbrains challenge: online evaluation framework for brain image segmentation in 3t mri scans. Computational intelligence and neuroscience 2015, 1 (2015)
-  Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging 34(10), 1993 (2015)
-  Moeskops, P., Benders, M.J., Chiţǎ, S.M., Kersbergen, K.J., Groenendaal, F., de Vries, L.S., Viergever, M.A., Išgum, I.: Automatic segmentation of mr brain images of preterm infants using supervised classification. NeuroImage 118, 628–641 (2015)
-  Ronneberger, O., P.Fischer, Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI). LNCS, vol. 9351, pp. 234–241. Springer (2015), http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a, (available on arXiv:1505.04597 [cs.CV])
-  Vrooman, H.A., Cocosco, C.A., van der Lijn, F., Stokking, R., Ikram, M.A., Vernooij, M.W., Breteler, M.M., Niessen, W.J.: Multi-spectral brain tissue segmentation using automatically trained k-nearest-neighbor classification. Neuroimage 37(1), 71–81 (2007)
-  Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)