1 Introduction
Primary central nervous system (CNS) tumors refer to a heterogeneous group of tumors arising from cells within the CNS and can be benign or malignant. Malignant primary brain tumors remain among the most difficult cancers to treat, with a 5-year overall survival rate no greater than 35%. The most common malignant primary brain tumors in adults are gliomas. In a patient with a suspected brain tumor, magnetic resonance imaging (MRI) with gadolinium is the investigation tool of choice. Manual segmentation of brain tumors on MR images is a challenging and time-consuming task. Therefore, an automatic and accurate brain tumor segmentation tool benefits radiologists and physicians in both diagnosis and treatment planning.
Deep learning-based methods have achieved state-of-the-art performance in brain tumor segmentation [7, 8, 10, 15]. These works focus on designing new network architectures, loss functions, data augmentation strategies, and training and testing procedures in order to improve brain tumor segmentation performance. Another method, proposed by Kao et al. [11], utilizes an existing brain parcellation to bring location information of the brain into patch-based neural networks, which improves their brain tumor segmentation performance. Inspired by their work, we directly integrate a lesion prior with multimodal MR images and input the fused information to a 3D U-Net. The proposed lesion prior fusion method includes two steps: (i) we first create a volume-of-interest (VOI) map from the ground-truth brain tumor lesions, and (ii) this VOI map is then integrated with the multimodal MR images and input to a 3D U-Net for brain tumor segmentation. The main contribution of this paper is the integration of a lesion prior into a 3D U-Net architecture, which improves the brain tumor segmentation performance of the 3D U-Net.
2 Materials and Methods
2.1 Dataset
The Multimodal Brain Tumor Image Segmentation Benchmark (BraTS) 2017 [2, 3, 4, 14] provides 285 subjects in the training set and 46 subjects in the validation set. Multimodal MR images are provided for each subject, but the ground-truth lesion mask is only available for the training subjects. These MR images include T1-weighted, contrast-enhanced T1-weighted, T2-weighted, and fluid-attenuated inversion recovery scans, and the ground-truth lesion mask comprises the enhancing tumor (ET), edema (ED), and necrotic & non-enhancing tumor (NCR/NET). Each image has a dimension of 240 × 240 × 155 voxels with a voxel resolution of 1 × 1 × 1 mm³. The provided data are intra-subject registered, interpolated to the same resolution, and skull-stripped.
2.2 Volume-of-interest Map
The volume-of-interest (VOI) map is built in the Montreal Neurological Institute (MNI) 1 mm space [6], and each voxel of the VOI map has a label ranging from 0 to 9, which represents a different probability of observing brain tumor lesions. First, we build the heatmaps of the different types of brain tumor lesions in the MNI space; the workflow is shown in Fig. 1.
We apply inter-subject registration, which registers the ground-truth lesions of each BraTS 2017 training subject from the subject space to the MNI space using FLIRT [9] from FSL. We then split the brain lesions of each subject into three binary masks, each containing only one type of lesion. For each type of lesion, we apply element-wise summation over the binary masks of all 285 training subjects to create the heatmap of that lesion type. Fig. 2 shows the heatmaps of the different brain tumor lesions from the BraTS 2017 training subjects in the MNI space.
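The element-wise summation step above can be sketched in numpy. This is a minimal illustration, not the authors' code; it assumes the per-subject lesion masks have already been registered to the MNI space and follow the BraTS label convention (1 = NCR/NET, 2 = ED, 4 = ET).

```python
import numpy as np

def build_lesion_heatmaps(lesion_masks_mni, lesion_labels=(1, 2, 4)):
    """Sum per-subject binary masks (already registered to MNI space)
    into one heatmap per lesion type.

    lesion_masks_mni: list of integer label volumes, one per subject,
    where 1 = NCR/NET, 2 = ED, and 4 = ET (BraTS label convention).
    """
    heatmaps = {}
    for label in lesion_labels:
        heatmap = np.zeros_like(lesion_masks_mni[0], dtype=np.int32)
        for mask in lesion_masks_mni:
            # split into a binary mask for this lesion type, then accumulate
            heatmap += (mask == label).astype(np.int32)
        heatmaps[label] = heatmap
    return heatmaps
```

Each voxel of a heatmap then counts how many of the 285 training subjects exhibit that lesion type at that MNI location.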
The heatmaps of the different brain lesions are then used to create the VOI map. The VOI map construction accounts for the fact that the whole tumor is a superset of ET, NCR/NET, and ED, and that the tumor core includes ET and NCR/NET. In addition, ETs are usually observed in patients with high-grade gliomas, whose survival rate is considerably lower than that of patients with low-grade gliomas. Based on these observations, we create Algorithm 1 to generate the VOI map and to prioritize the order of the VOI labels.
Note that the VOI labels are based on thresholds chosen from the percentiles of the non-zero voxels of the heatmaps. For each lesion type, we sort the frequency counts of the non-zero voxels of its heatmap, and the percentile thresholds are selected from these sorted frequency counts. We then use these percentile thresholds to create the VOI label mapping. Any given voxel location in the VOI map has different probabilities of containing different types of lesion. We examined different thresholds and chose the percentiles that yield the best overall segmentation performance. Fig. 3 shows the VOI map and the distribution of brain tumor lesions occurring in the different labels of the VOI map. This distribution is computed by dividing the total voxel value of the lesions in the heatmaps by the total volume of the corresponding VOI label. It shows that (i) the prior probabilities of different lesions depend on their corresponding labels in the VOI map, and (ii) lesions are more likely to occur in voxels with larger VOI labels.
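Algorithm 1 is referenced but not reproduced here; the following numpy sketch only illustrates the thresholding idea. It makes two assumptions not stated in the text: heatmaps are processed in ascending priority order (e.g. ED, then NCR/NET, then ET, so ET-derived labels overwrite the others), and the percentile values (50, 75, 90) are placeholders rather than the values actually used in the paper.

```python
import numpy as np

def voi_map_from_heatmaps(heatmaps_by_priority, percentiles=(50, 75, 90)):
    """Assign VOI labels 0..9 from lesion heatmaps.

    heatmaps_by_priority: heatmaps ordered from lowest to highest priority,
    so higher-priority lesions overwrite lower ones.
    percentiles: thresholds computed over the non-zero voxels of each
    heatmap; the exact values are chosen empirically.
    """
    voi = np.zeros(heatmaps_by_priority[0].shape, dtype=np.int32)
    label = 1
    for heatmap in heatmaps_by_priority:
        nonzero = heatmap[heatmap > 0]
        for p in percentiles:
            thr = np.percentile(nonzero, p)
            # later (higher-priority, higher-percentile) labels overwrite
            voi[heatmap >= thr] = label
            label += 1
    return voi
```

With three lesion heatmaps and three percentile thresholds each, this yields nine non-background labels, matching the 0-9 label range of the VOI map.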
2.3 3D U-Net
2.3.1 Data pre-processing.
Intensity normalization is the procedure of mapping the intensities of different MR images to a standard scale, and it is an essential step to avoid initial biases and to improve the performance of the network. For each MR image, we first clip the intensities at the [0.2 percentile, 99.8 percentile] of the non-zero voxels to remove outliers, and subsequently normalize every voxel within the brain with respect to the mean and standard deviation of the brain voxels. That is,

$$\hat{v}_i = \frac{v_i - \mu}{\sigma},$$

where $i$ is the index of a voxel inside the brain, $\hat{v}_i$ is the normalized voxel, $v_i$ is the corresponding raw voxel, and $\mu$ and $\sigma$ are the mean and standard deviation of the raw voxels inside the brain, respectively.
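The clipping and z-score normalization described above can be written as a short numpy routine. This is a sketch under the stated definitions, not the authors' implementation; the brain mask is assumed to be given.

```python
import numpy as np

def normalize_mr_image(image, brain_mask, lo_pct=0.2, hi_pct=99.8):
    """Clip outliers at [0.2, 99.8] percentiles of the non-zero voxels,
    then z-score normalize the voxels inside the brain."""
    nonzero = image[image > 0]
    lo, hi = np.percentile(nonzero, [lo_pct, hi_pct])
    clipped = np.clip(image, lo, hi)
    brain = clipped[brain_mask]
    normalized = np.zeros_like(clipped, dtype=np.float64)
    # (v_i - mu) / sigma, computed only over brain voxels
    normalized[brain_mask] = (brain - brain.mean()) / brain.std()
    return normalized
```

After this step, the voxels inside the brain have zero mean and unit standard deviation for every modality, independent of scanner-specific intensity scales.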
2.3.2 Network architecture.
The proposed network architecture, shown in Fig. 4, is based on 3D U-Nets [5, 7]. Different colors of blocks represent different types of layers, and the number of convolutional kernels is indicated within the white boxes. Group normalization [17] is used, with the number of groups set to 4. Trilinear interpolation is used in the upsampling layers.
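For readers unfamiliar with group normalization, the mechanics can be sketched in numpy: channels are split into groups (4 here, as in the proposed network) and each group is normalized by its own statistics. This is an illustration of the operation only, without the learned affine parameters, and not part of the authors' network code.

```python
import numpy as np

def group_norm(x, num_groups=4, eps=1e-5):
    """Group normalization over a (N, C, D, H, W) feature map:
    channels are split into groups and each group is normalized by
    its own mean and variance (no learned scale/shift in this sketch)."""
    n, c, d, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, d, h, w)
    mean = g.mean(axis=(2, 3, 4, 5), keepdims=True)
    var = g.var(axis=(2, 3, 4, 5), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, d, h, w)
```

Unlike batch normalization, the statistics are independent of the batch dimension, which suits the small batch sizes typical of 3D segmentation.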
2.3.3 Training and testing procedure.
The proposed network is trained with randomly cropped patches and a batch size of 2; a larger input patch captures more contextual information of the brain. In every epoch, one cropped patch is randomly extracted from each subject, and the network is trained for a total of 300 epochs. The weights of the network are updated by the Adam algorithm [12] with AMSGrad [16], a decaying learning rate schedule, and L2 weight decay. For the loss function, the standard multi-class cross-entropy loss with hard negative mining is used to address the class imbalance of the dataset: we only back-propagate the positive (lesion) voxels and the negative (background) voxels with the largest losses (hard negatives). In our implementation, the number of selected negative voxels is at most three times the number of positive voxels. Data augmentation is not used for either training or testing. At testing time, we input the entire image of each patient into the trained 3D U-Net to obtain the predicted lesion mask. Training takes approximately 12.5 hours, and testing takes approximately 1.5 seconds per subject on an Nvidia 1080 Ti GPU.
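The hard-negative-mining loss described above can be sketched as follows. This numpy version only illustrates the voxel selection rule (all positives plus at most 3x as many highest-loss negatives); the actual training uses a differentiable framework, and the function and argument names here are illustrative.

```python
import numpy as np

def hard_negative_mining_ce(probs, labels, neg_ratio=3):
    """Multi-class cross-entropy with hard negative mining.

    probs:  (V, C) softmax probabilities per voxel.
    labels: (V,) integer labels, 0 = background (negative), >0 = lesion.
    Only the positive voxels and the neg_ratio * (#positives) negatives
    with the largest loss contribute to the averaged loss.
    """
    voxel_loss = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    pos = labels > 0
    neg_loss = voxel_loss[~pos]
    # cap the negatives at neg_ratio times the number of positives
    k = min(neg_ratio * int(pos.sum()), neg_loss.size)
    hard_neg = np.sort(neg_loss)[::-1][:k]  # largest-loss negatives
    selected = np.concatenate([voxel_loss[pos], hard_neg])
    return selected.mean()
```

Easy background voxels (already classified confidently) are thus excluded from the gradient, so the abundant background class cannot dominate the sparse lesion classes.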
2.4 Integrate the VOI Map and a 3D U-Net
Fig. 5 shows the pipeline for integrating the VOI map and a 3D U-Net for brain tumor segmentation. First, we register the VOI map from the MNI 1 mm space to the subject space using FLIRT [9] from FSL, and this registered VOI map is then split into 9 binary masks, each containing only one VOI label. Afterward, these binary masks are concatenated with the multimodal MR images. In the end, we input this 13-channel (4 image channels + 9 VOI channels) image to a 3D U-Net for both training and testing.
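The split-and-concatenate step can be sketched in numpy. This is a minimal illustration assuming VOI labels 1-9 each become one binary channel (label 0, the background, is implicit) and that the MR modalities and the registered VOI map share the same voxel grid.

```python
import numpy as np

def fuse_voi_with_mri(mr_images, voi_map, num_voi_labels=9):
    """Split the registered VOI map into binary masks and concatenate
    them with the multimodal MR channels, giving a 13-channel input
    (4 MR + 9 VOI) for the 3D U-Net.

    mr_images: (4, D, H, W) array of co-registered MR modalities.
    voi_map:   (D, H, W) integer VOI label map in the subject space.
    """
    voi_channels = np.stack(
        [(voi_map == label).astype(mr_images.dtype)
         for label in range(1, num_voi_labels + 1)], axis=0)
    return np.concatenate([mr_images, voi_channels], axis=0)
```

The network then receives the lesion prior as extra input channels, so no architectural change to the 3D U-Net is needed beyond the first convolution's channel count.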
2.5 Evaluation Metrics
The employed evaluation metrics are (i) the Dice similarity coefficient (DSC) and (ii) the 95th percentile of the Hausdorff distance (H95). The DSC is a quotient of similarity that ranges between 0 and 1 and is defined as

$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|},$$

where $|A|$ and $|B|$ are the numbers of voxels in the ground-truth label $A$ and the predicted label $B$, respectively. The Hausdorff distance measures how far two subsets of a metric space are from each other and is defined as

$$d_H(A, B) = \max\left\{\, \sup_{a \in A} \inf_{b \in B} d(a, b),\; \sup_{b \in B} \inf_{a \in A} d(a, b) \,\right\},$$

where $d(\cdot, \cdot)$ is the Euclidean distance, $\sup$ is the supremum, and $\inf$ is the infimum.
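Both metrics can be computed directly from binary masks. The following sketch (assuming scipy is available) evaluates the Hausdorff terms via Euclidean distance transforms over voxel sets; this is an illustration of the definitions above, not the official BraTS evaluation code.

```python
import numpy as np
from scipy import ndimage

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hausdorff95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th percentile of the symmetric voxel-to-set Euclidean distances
    between two binary masks, via distance transforms."""
    # distance of every voxel to the nearest voxel of the other mask
    dt_b = ndimage.distance_transform_edt(~b, sampling=spacing)
    dt_a = ndimage.distance_transform_edt(~a, sampling=spacing)
    d_ab = dt_b[a]  # inf_b d(a, b) for each voxel a
    d_ba = dt_a[b]  # inf_a d(a, b) for each voxel b
    return np.percentile(np.concatenate([d_ab, d_ba]), 95)
```

Taking the 95th percentile instead of the maximum makes H95 robust to small outlier voxels in the prediction.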
3 Experimental Results and Discussion
First, we examine whether the proposed lesion prior fusion method improves the brain tumor segmentation performance of the proposed 3D U-Net. To this end, we train two identical 3D U-Nets, with and without the additional VOI map, using the 285 subjects of the BraTS 2017 training set. The BraTS 2017 validation set is used to evaluate the performance of these networks, and the quantitative results are shown in Table 1. From the first two rows of Table 1, our proposed lesion prior fusion method improves the performance of the 3D U-Net, particularly the DSC of ET (3.5%) and the H95 of ET (2.56) and the whole tumor (2.39).
Table 1. Quantitative results on the BraTS 2017 validation set: DSC and H95 for the enhancing tumor (ET), whole tumor (WT), and tumor core (TC).

| Method | DSC ET | DSC WT | DSC TC | H95 ET | H95 WT | H95 TC |
|---|---|---|---|---|---|---|
| Single 3D U-Net (baseline) | 0.695 | 0.896 | 0.762 | 6.79 | 6.92 | 11.38 |
| Single 3D U-Net + VOI (proposed) | 0.730 | 0.899 | 0.764 | 4.23 | 4.53 | 10.93 |
| Ensemble of five 3D U-Nets (baseline) | 0.723 | 0.902 | 0.763 | 5.99 | 4.75 | 10.58 |
| Ensemble of five 3D U-Nets + VOI (proposed) | 0.744 | 0.903 | 0.780 | 5.01 | 3.86 | 9.71 |
| Isensee et al. [7] | 0.732 | 0.896 | 0.797 | 4.55 | 6.97 | 9.48 |
| Kamnitsas et al. [10] | 0.738 | 0.901 | 0.797 | 4.50 | 4.23 | 6.56 |
Second, we examine whether the proposed lesion prior fusion method improves the performance of an ensemble of 3D U-Nets. We thus train two identical ensembles, with and without the additional VOI map, using the 285 subjects of the BraTS 2017 training set. Each ensemble has five identical networks with different seed initializations, and the output of the ensemble is the average of the five networks. The BraTS 2017 validation set is used to evaluate the ensembles, and the quantitative results are shown in Table 1. From the middle two rows of Table 1, our proposed lesion prior fusion method also improves the tumor segmentation performance of the ensemble of five 3D U-Nets, particularly the DSC of ET (2.1%) and the tumor core (1.7%). The VOI map yields the greatest improvement on the ET because the percentiles of the ET heatmap are given the highest priority when creating the VOI map. In addition, the proposed VOI map, built directly from the heatmaps of brain lesions, has inhomogeneous labels within neighboring voxels, which carry more precise information about brain tumor lesions to the 3D U-Net.
Finally, we compare the performance of our proposed method with state-of-the-art methods [7, 10]. From Table 1, the baseline model performs worse than the state-of-the-art methods, but it achieves competitive performance once the proposed VOI map is integrated. Note that the ensemble of Kamnitsas et al. [10] contains 7 different types of models, whereas our proposed ensemble consists of only five 3D U-Nets.
4 Conclusion
We have proposed a novel method to integrate prior information about lesion probabilities into a 3D U-Net to improve brain tumor segmentation. Our experimental results demonstrate that the proposed lesion prior fusion approach improves the segmentation performance of the baseline model. Moreover, the proposed lesion prior fusion method can easily be integrated with other network architectures to potentially further enhance their segmentation performance.
This work is partially supported by National Institutes of Health (NIH) grant 5R01NS103774-02.
-  Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv preprint arXiv:1811.02629 (2018)
-  Bakas, S., et al.: Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features. Scientific data 4, 170117 (2017)
-  Bakas, S., et al.: Segmentation labels and radiomic features for the pre-operative scans of the tcga-gbm collection. The Cancer Imaging Archive. (2017). https://doi.org/10.7937/K9/TCIA.2017.KLXWJJ1Q
-  Bakas, S., et al.: Segmentation labels and radiomic features for the pre-operative scans of the tcga-lgg collection. The Cancer Imaging Archive. (2017). https://doi.org/10.7937/K9/TCIA.2017.GJQ7R0EF
-  Çiçek, Ö., et al.: 3d u-net: learning dense volumetric segmentation from sparse annotation. In: International conference on medical image computing and computer-assisted intervention. pp. 424–432. Springer (2016)
-  Grabner, G., et al.: Symmetric atlasing and model based segmentation: an application to the hippocampus in older adults. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 58–66. Springer (2006)
-  Isensee, F., et al.: Brain tumor segmentation and radiomics survival prediction: contribution to the brats 2017 challenge. In: International MICCAI Brainlesion Workshop. pp. 287–297. Springer (2017)
-  Isensee, F., et al.: No new-net. In: International MICCAI Brainlesion Workshop. pp. 234–244. Springer (2018)
-  Jenkinson, M., Smith, S.: A global optimisation method for robust affine registration of brain images. Medical image analysis 5(2), 143–156 (2001)
-  Kamnitsas, K., et al.: Ensembles of multiple models and architectures for robust brain tumour segmentation. In: International MICCAI Brainlesion Workshop. pp. 450–462. Springer (2017)
-  Kao, P.Y., et al.: Brain tumor segmentation and tractographic feature extraction from structural mr images for overall survival prediction. In: International MICCAI Brainlesion Workshop. pp. 128–141. Springer (2018)
-  Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations (2015)
-  Lapointe, S., et al.: Primary brain tumours in adults. The Lancet (2018)
-  Menze, B.H., et al.: The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging 34(10), 1993–2024 (2015)
-  Myronenko, A.: 3d mri brain tumor segmentation using autoencoder regularization. In: International MICCAI Brainlesion Workshop. pp. 311–320. Springer (2018)
-  Reddi, S.J., et al.: On the convergence of adam and beyond. In: International Conference on Learning Representations (2018)
-  Wu, Y., He, K.: Group normalization. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 3–19 (2018)