1 Introduction
Gliomas are the most common family of brain tumors and include some of the deadliest and most economically costly forms of brain cancer [1, 2, 3]. Diagnosis relies heavily on manual segmentation and analysis of multimodal MRI scans by biomedical experts. However, this approach is severely limited by the labor-intensive nature of manual segmentation and by disagreement or mistakes between annotators. Consequently, there is a great need for a fast and robust automated segmentation algorithm. Convolutional neural networks (CNNs) have proven extremely effective for a variety of semantic segmentation tasks [4].
While CNN segmentation algorithms are abundant in biomedical imaging, only very few make use of nested topological prior information. Among the few that do [5, 6, 7, 8, 9, 10, 11], we find three different approaches. First, cascaded algorithms, in which the network consists of successive segmentation networks. Second, the information on the nested classes is incorporated into the loss function, imposing penalties on solutions that do not respect the nested geometric relations. Third, Markov random fields are used to formalize class relationships in the post-processing of the network output. Here, we make use of a new activation function [12] that directly implements the class hierarchy in the network training, and generalize it to 3 nested classes. For the glioma labels we assume that active tumor regions are always contained in the tumor core, which is surrounded by the tumor edema, resulting in a hierarchical three-class model. In sharp contrast with the nested-class method, the softmax-based multi-class method ignores the geometric prior between classes and assumes that the classes are mutually exclusive, meaning that one pixel cannot belong to several classes at the same time. This discards the topological information and sometimes leads to unreasonable segmentation results. We compare the Dice scores of the two methods and find that the nested-class method achieves higher accuracy than the softmax-based method, especially for the inner classes.
In the following, we give a brief overview of the state-of-the-art 3D residual U-Net architecture and of the multi-class nested activation and loss functions. We then propose and evaluate our model architecture for BRATS tumor segmentation. Finally, we compare the two approaches and show that the multi-level activation performs better, especially for the inner classes.
2 Methodology
2.1 Network Architecture
The nested-class relationship between the different labels is shown in Fig. 2. The general network structure shown in Fig. 1 stems from the glioma segmentation network previously used by Isensee [13], and processes large 3D input blocks of 144x144x144 voxels. The original network is inspired by the U-Net [14], which allows the network to intrinsically recombine different scales throughout the entire network. The vertical depth is set to 5, which balances spatial resolution against feature representation. The context module is a pre-activation residual block, and consecutive context modules are connected by 3x3x3 convolutions with input stride 2. The purpose of the localization pathway is to take features from the lower levels of the network and transfer them to a higher spatial resolution by means of simple upscaling. The upsampled features and the context aggregation features of the corresponding level are recombined via concatenation. Furthermore, the localization module, consisting of a 3x3x3 convolution followed by a 1x1x1 convolution, is designed to gather these features.
Deep supervision is introduced in the localization pathway by integrating segmentation layers at different levels of the network and combining them via element-wise summation to form the final network output. The output activation layer is a multi-level sigmoid layer, which converts the multi-class problem into binary ones, instead of the softmax layer of Isensee's network. Intrinsically, the multi-level activation is an assembly of several sigmoid functions that maps directly to a multi-class segmentation incorporating the topological prior. Consequently, it overcomes the shortcoming of the softmax-based method, which is blind to the geometric prior.
2.2 Crop preprocessing
For a 3D network architecture, a larger training patch size contains more continuous context and localization information, which is beneficial for segmentation accuracy. To obtain a larger cubic patch of the 3D image, as much of the valuable information in the MRI as possible is kept while the uninformative background is cropped. After cropping, the maximum cube patch size is selected as [144,144,144].
The crop preprocessing equation is defined as:
(1) 
where $i_{\min}$ and $i_{\max}$ are the minimum and maximum indices of the nonzero voxels of the MRI image along an axis, $l$ is the length of the nonzero extent, and $s$ is the cube patch size, set to 144.
The indices are recorded and used in the image post-processing stage to restore the original shape [155,240,240]. However, a small amount of meaningful information that exceeds the cube patch size of 144 is unavoidably discarded; this has little effect on the segmentation result. In order to compare the softmax-based and the multi-level methods on an equal footing, no data augmentation is used in the image preprocessing stage.
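The cropping and restoring steps described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the exact procedure of Eq. (1): the window is assumed to be centred on the nonzero bounding box along each axis and clipped to the volume, and the function names `crop_bounds`, `crop` and `restore` are hypothetical.

```python
import numpy as np

PATCH = 144  # cube patch size given in the text


def crop_bounds(volume, patch=PATCH):
    """Per-axis (start, stop) of a window of length min(patch, axis size).

    Assumption: the window is centred on the nonzero bounding box and
    shifted if necessary so that it stays inside the volume.
    """
    bounds = []
    for axis, size in enumerate(volume.shape):
        other = tuple(a for a in range(volume.ndim) if a != axis)
        nonzero = np.flatnonzero(np.any(volume != 0, axis=other))
        if nonzero.size:
            lo, hi = nonzero[0], nonzero[-1] + 1
        else:
            lo, hi = 0, size
        length = min(patch, size)
        centre = (lo + hi) // 2
        start = int(np.clip(centre - length // 2, 0, size - length))
        bounds.append((start, start + length))
    return bounds


def crop(volume, bounds):
    """Extract the window described by bounds."""
    return volume[tuple(slice(a, b) for a, b in bounds)]


def restore(patch_pred, bounds, original_shape):
    """Paste a cropped prediction back into a zero volume of the original shape."""
    out = np.zeros(original_shape, dtype=patch_pred.dtype)
    out[tuple(slice(a, b) for a, b in bounds)] = patch_pred
    return out
```

Voxels outside the recorded window are filled with background when the prediction is restored, which corresponds to the small loss of information mentioned above when the nonzero extent exceeds 144.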
2.3 Multilevel method
Here, we use one output channel and a multi-class nested activation function, as first proposed in [12]. The multi-level method is inspired by continuous regression and thereby generalizes logistic regression to hierarchically nested classes. It is shown in Fig. 3 and defined as
(2) 
where $\sigma$ is the sigmoid function, $k$ is the steepness and $h$ is the spacing between consecutive sigmoids. For the 4-class nested label case of the brain tumor segmentation challenge, we have $m+1=4$, and we take $h=0.5$ and $k=10$. The corresponding loss function, called Modified Cross-Entropy (MCE) in [12], is defined as
(3) 
where $w_c$ is the weight of label $c$, which we take as a power law of $N$, the total number of pixels, and $N_c$, the number of pixels in class $c$, and where $y_i^c = 1$ for the ground-truth label $c$ of pixel $i$ and $0$ otherwise. Furthermore, the mapping function is defined as
(4) 
where $H$ is the Heaviside function. The other loss function, called Normalized Cross-Entropy (NCE) in [12], is defined as
(5) 
Furthermore, the mapping function is defined as
(6) 
where $s$ is the softplus function, $s(x) = \log(1 + e^{x})$, and $H$ is the Heaviside function.
The weighted modified and normalized cross-entropy losses combine naturally with the standard cross-entropy loss and mitigate the class imbalance problem. They can also encode any hierarchical and mutually exclusive topological relationship between classes in a network architecture.
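To make the shape of the multi-level activation concrete, the sketch below evaluates a sum of m sigmoids with steepness k whose centres are spaced h apart. The symmetric placement of the centres around x = 0 is an assumption made for illustration (the text only fixes k, h and m), and the name `multilevel_activation` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))


def multilevel_activation(x, m=3, k=10.0, h=0.5):
    """Sum of m sigmoids with steepness k and spacing h between centres.

    Assumption: the centres are placed symmetrically around x = 0.
    The output lies between 0 and m.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(1, m + 1)
    offsets = h * (n - (m + 1) / 2.0)  # [-0.5, 0.0, 0.5] for m = 3, h = 0.5
    return sigmoid(k * (x[..., None] + offsets)).sum(axis=-1)
```

With m = 3, h = 0.5 and k = 10 the output rises in three steps from 0 to 3, one step per boundary between nested classes.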
2.4 Evaluation metrics
In the BRATS task, the numbers of positives and negatives are highly unbalanced. Consequently, four different metrics are used by the organizers to evaluate the performance of the algorithms and to rank the teams.
Given a ground-truth segmentation map $G$ and a segmentation map $P$ of the corresponding class generated by the algorithm, the four evaluation criteria are defined as follows.
Dice similarity coefficient (DSC):
$$\mathrm{DSC}(G, P) = \frac{2\,|G \cap P|}{|G| + |P|} \qquad (7)$$
The Dice similarity coefficient measures the relative overlap between $G$ and $P$.
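As a small worked illustration of the Dice similarity coefficient on binary masks, a NumPy sketch is given below. The function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions of this sketch, not part of the challenge definition.

```python
import numpy as np


def dice(gt, pred):
    """Dice similarity coefficient 2|G ∩ P| / (|G| + |P|) for binary masks."""
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    denom = gt.sum() + pred.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(gt, pred).sum() / denom
```

For example, `dice([1, 1, 0, 0], [1, 0, 1, 0])` has one overlapping pixel and two positive pixels in each mask, giving 2·1 / (2 + 2) = 0.5.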
The Hausdorff distance (95th percentile) is defined as:
$$\mathrm{Haus}(G, P) = \max\left\{ \sup_{x \in G} \inf_{y \in P} d(x, y),\ \sup_{y \in P} \inf_{x \in G} d(x, y) \right\} \qquad (8)$$
where $d(x, y)$ denotes the distance between $x$ and $y$, $\sup$ denotes the supremum and $\inf$ the infimum. This measures how far two subsets of a metric space are from each other. As used in this challenge, it is made more robust by using the 95th percentile instead of the maximum (100th percentile) distance.
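The robust variant can be sketched as follows for small binary masks. This is an illustrative brute-force version under stated assumptions: distances are Euclidean in voxel units, they are computed between all foreground voxels rather than between extracted surfaces, and the 95th percentile is taken in each direction before the maximum. The organizers' implementation may differ in these details, and the name `hausdorff95` is hypothetical.

```python
import numpy as np


def hausdorff95(gt, pred):
    """95th-percentile symmetric Hausdorff distance between two binary masks.

    Brute force over all pairs of foreground voxels, so only suitable
    for small masks.
    """
    g = np.argwhere(np.asarray(gt, dtype=bool)).astype(float)
    p = np.argwhere(np.asarray(pred, dtype=bool)).astype(float)
    if len(g) == 0 or len(p) == 0:
        return float("nan")  # undefined when one of the masks is empty
    d = np.linalg.norm(g[:, None, :] - p[None, :, :], axis=-1)
    g_to_p = d.min(axis=1)  # distance from each voxel of G to the set P
    p_to_g = d.min(axis=0)  # distance from each voxel of P to the set G
    return float(max(np.percentile(g_to_p, 95), np.percentile(p_to_g, 95)))
```

Replacing the percentile by the maximum recovers the classical Hausdorff distance of Eq. (8) on these point sets.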
Sensitivity (also called the true positive rate) measures the proportion of actual positives that are correctly identified. Specificity (also called the true negative rate) measures the proportion of actual negatives that are correctly identified. Let the condition positives be the pixels that truly belong to the lesion and the condition negatives those that do not. The condition positives consist of the true positives ($TP$) and the false negatives ($FN$), and the condition negatives of the true negatives ($TN$) and the false positives ($FP$).
Sensitivity and specificity are then given by:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (9)$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (10)$$
The values of these four metrics were computed independently by the organizers and made available on the validation leaderboard.
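For completeness, sensitivity and specificity can be computed from the four confusion counts as in the sketch below. The function name `sensitivity_specificity` and the choice of returning NaN when a denominator is zero are assumptions of this illustration.

```python
import numpy as np


def sensitivity_specificity(gt, pred):
    """Return (sensitivity, specificity) for two binary masks."""
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = np.sum(gt & pred)    # lesion pixels predicted as lesion
    fn = np.sum(gt & ~pred)   # lesion pixels missed
    tn = np.sum(~gt & ~pred)  # background pixels predicted as background
    fp = np.sum(~gt & pred)   # background pixels predicted as lesion
    sens = float(tp / (tp + fn)) if (tp + fn) > 0 else float("nan")
    spec = float(tn / (tn + fp)) if (tn + fp) > 0 else float("nan")
    return sens, spec
```

Because background voxels vastly outnumber lesion voxels in this task, specificity is typically very close to 1, which is why it is reported alongside the other metrics rather than on its own.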
3 Experiment results
In the BRATS 2018 dataset [15, 16, 17, 18], there are four labels (necrotic core, edema, non-enhancing core and enhancing core) that form the three tumor classes in Fig. 2. The dataset contains four MRI modalities, native T1, post-contrast T1-weighted (T1Gd), T2-weighted (T2) and T2 Fluid Attenuated Inversion Recovery (FLAIR), which are all used as separate input channels. We train the networks using the ADAM optimizer with an initial learning rate of 0.0005. To regularize the network, we use early stopping when the precision on the 20% of the training dataset reserved for validation no longer improves, and dropout (with rate 0.3) in all residual blocks before the multi-class sigmoid function. Some slices of segmentation results containing the whole tumor, tumor core and enhancing core are shown in Fig. 4. We observe that the topology of the labels is constrained to the nested-class relationship, which avoids errors stemming from the lack of a topological prior.
Table 1: Dice score for the different methods and weight schemes.

Method                Enhancing core  Whole tumor  Tumor core  Weight scheme
Multilevel (MCE)      0.719           0.857        0.769       0.4
Multilevel (NCE)      0.676           0.857        0.755       0.4
Multilevel (NCE)      0.633           0.837        0.736       0.5
Multilevel (NCE)      0.655           0.856        0.758       0.3
Softmax-based method  0.691           0.861        0.763       -
The segmentation result is severely affected by the strong class imbalance in the BRATS dataset. As class imbalance in a data set increases, the performance of a neural network trained on that data has been shown to decrease dramatically [19]. To mitigate this issue, many methods [20, 21, 22] modify the loss function. Here, a weighted cross-entropy incorporating the nested-class information is proposed and investigated. We experimented with different power-law weighting exponents (1, 0.5, 0.4, 0.3) and with the two losses (MCE and NCE) proposed in [12]. The best performing combination turned out to be an exponent of 0.4 with the MCE loss function. The segmentation thresholds that determine the boundaries between classes were set to [0.95,1.65,2.2] during validation. For this final configuration, we reached Dice scores of 86% for the complete tumor, 77% for the tumor core and 72% for the enhancing core, as presented in Table 1.
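The effect of the power-law weighting exponent can be illustrated with a short sketch. The text states that the class weights follow a power law in the total pixel count and the per-class pixel count; the specific form (N / N_c) raised to the exponent used below is an assumption, as are the names `class_weights` and `gamma`.

```python
import numpy as np


def class_weights(labels, num_classes, gamma=0.4):
    """Power-law class weights, assumed here to be (N / N_c) ** gamma.

    labels: integer array of class indices in [0, num_classes).
    Classes that do not occur get weight 0.
    """
    counts = np.bincount(np.asarray(labels).ravel(), minlength=num_classes).astype(float)
    total = counts.sum()
    weights = np.zeros(num_classes)
    present = counts > 0
    weights[present] = (total / counts[present]) ** gamma
    return weights
```

Under this assumed form, an exponent of 1 weights classes by their inverse frequency, an exponent of 0 gives uniform weights, and intermediate values such as 0.4 up-weight rare classes more moderately.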
Table 2: Statistics of the Dice score.

Dice score    Enhancing core  Whole tumor  Tumor core
Mean          0.71965         0.85685      0.76906
StdDev        0.28526         0.09802      0.21962
Median        0.84268         0.87823      0.84325
25% quantile  0.6889          0.83379      0.70743
75% quantile  0.8876          0.90895      0.91292
The weighted modified cross-entropy performs considerably better than the normalized cross-entropy, and the weight scheme strongly affects the segmentation result because of the severe class imbalance. Comparing the different weight schemes, the best value is 0.4. We compare against the softmax-based method built on the same network architecture proposed by Isensee, without ensembling, complicated image pre- and post-processing steps or extra training data. The Dice score of the innermost nested class (enhancing core) improves from 0.691 to 0.719, while the Dice scores of the whole tumor and tumor core remain at almost the same level.
Table 3: Mean sensitivity, specificity and Hausdorff95.

Mean         Enhancing core  Whole tumor  Tumor core
Sensitivity  0.74119         0.93916      0.78743
Specificity  0.9974          0.98715      0.99591
Hausdorff95  5.50007         10.84397     9.98557
The statistics (mean, standard deviation, median, 25% and 75% quantiles) of the Dice scores of the enhancing core, whole tumor and tumor core are shown in Table 2. The other evaluation metrics (sensitivity, i.e. the proportion of actual positives correctly identified, specificity, i.e. the proportion of actual negatives correctly identified, and Hausdorff95) are listed in Table 3.
3.1 Threshold scheme definition and analysis
Setting the optimal thresholds is an important component of the multi-class segmentation task, since they directly determine the segmentation boundaries. With the activation function of Fig. 3 (a 4-class nested sigmoid function), the 4-class segmentation problem corresponds to a threshold scheme with 3 parameters [Threshold1, Threshold2, Threshold3]. The threshold scheme is chosen during the validation procedure and then fixed and applied to the test dataset.
To analyze how the thresholds affect the segmentation accuracy, the relationship between the boundary thresholds and the Dice score is illustrated in Fig. 5. One threshold at a time is varied over an interval that is expected to contain the optimum, while the other thresholds are kept fixed at their optimal values. The Dice scores of the three classes are much more sensitive to Threshold3 than to the other two thresholds, and can drop into a valley within the interval [2.2,2.4]. Threshold2 has little impact on the Dice scores of all classes, except for values greater than 1.8. Consequently, it is easier to find an optimal threshold scheme after determining Threshold3 and Threshold2. After experimentation and optimization, the threshold scheme selected for the BRATS challenge is [0.95,1.65,2.2].
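The way a threshold scheme turns the single-channel activation into nested class masks can be sketched as follows. The assignment of levels to regions (level 1 and above for the whole tumor, level 2 and above for the tumor core, level 3 for the enhancing core) follows the nesting assumed in the introduction, but the function names `to_levels` and `nested_masks` and the handling of values exactly on a threshold are assumptions of this sketch.

```python
import numpy as np

THRESHOLDS = (0.95, 1.65, 2.2)  # threshold scheme reported in the text


def to_levels(activation, thresholds=THRESHOLDS):
    """Map activation values to integer levels 0..3 (0 is background)."""
    return np.digitize(activation, thresholds)


def nested_masks(levels):
    """Binary masks of the three nested tumor regions."""
    return {
        "whole tumor": levels >= 1,
        "tumor core": levels >= 2,
        "enhancing core": levels >= 3,
    }
```

Because every mask is obtained by thresholding the same scalar map, the enhancing core is contained in the tumor core, which is contained in the whole tumor, by construction. Raising Threshold3 only shrinks the enhancing core mask and leaves the other two masks unchanged.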
4 Conclusions
In this paper we applied the technique of multi-level activation to the nested-class segmentation of glioma. The results of our experiments indicate that the multi-level activation function and its corresponding loss function are effective compared to a softmax output layer on the same network framework. Using the MCE loss function and a power-law reweighting scheme with exponent 0.4, we obtain Dice scores of 86% for the complete tumor, 77% for the tumor core and 72% for the enhancing core on the validation leaderboard of the 2018 BRATS challenge, demonstrating the applicability of the multi-level activation scheme. Finally, this activation could be combined with other network architectures. Using it with the best performing architecture of the BRATS challenge could lead to further improved results.
References
[1] Davis, M.E.: Glioblastoma: Overview of Disease and Treatment. Clinical Journal of Oncology Nursing 20(5), S2–S8 (2016). 10.1188/16.CJON.S1.2-8
[2] Hanif, F., Muzaffar, K., Perveen, K., Malhi, S.M., Simjee, S.U.: Glioblastoma Multiforme: A Review of its Epidemiology and Pathogenesis through Clinical Presentation and Treatment. Asian Pacific Journal of Cancer Prevention 18, 3–9 (2017)
[3] Birbrair, A., Sattiraju, A., Zhu, D., Zulato, G., Batista, I., Nguyen, V.T., Messi, M.L., Solingapuram Sai, K.K., Marini, F.C., Delbono, O., Mintz, A.: Novel Peripherally Derived Neural-Like Stem Cells as Therapeutic Carriers for Treating Glioblastomas. STEM CELLS Translational Medicine 6, 471–481 (2017)
[4] Gu, J.X., Wang, Z.H., Kuen, J., Ma, L.Y., Shahroudy, A., Shuai, B., Liu, T., Wang, X.X., Wang, L., Wang, G., Cai, J.F., Chen, T.: Recent Advances in Convolutional Neural Networks. Pattern Recognition 77, 354–377 (2018)
[5] Nosrati, M.S., Hamarneh, G.: Local optimization based segmentation of spatially-recurring, multi-region objects with part configuration constraints. IEEE Transactions on Medical Imaging 33, 1845–1859 (2014)
[6] BenTaieb, A., Hamarneh, G.: Topology Aware Fully Convolutional Networks for Histology Gland Segmentation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. Lecture Notes in Computer Science, Cham, Springer International Publishing (2016)
[7] Christ, P.F., Elshaer, M.E.A., Ettlinger, F., Tatavarty, S., Bickel, M., Bilic, P., Rempfler, M., Armbruster, M., Hofmann, F., Anastasi, M.D., Sommer, W.H., Ahmadi, S.A., Menze, B.H.: Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI. Lecture Notes in Computer Science 9900 (2016)
[8] Fidon, L., Li, W.Q., Garcia-Peraza-Herrera, L.C., Ekanayake, J., Kitchen, N., Ourselin, S., Vercauteren, T.: Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks. Oral presentation at the MICCAI 2017 Brain Lesion (BrainLes) Workshop 112 (2014)
[9] Bauer, S., Tessier, J., Krieter, O., Nolte, L.P., Reyes, M.: Integrated spatio-temporal segmentation of longitudinal brain tumor imaging studies. In: Menze, B., Langs, G., Montillo, A., Kelm, M., Müller, H., Tu, Z. (eds.) Medical Computer Vision. Large Data in Medical Imaging, Cham, Springer International Publishing 7183 (2014)
[10] Alberts, E., Charpiat, G., Tarabalka, Y., Huber, T., Weber, M.A., Bauer, J., Zimmer, C., Menze, B.H.: A Nonparametric Growth Model for Brain Tumor Segmentation in Longitudinal MR Sequences. MICCAI Brain Lesion Workshop 6979 (2015)
[11] Liu, Z.W., Li, X.X., Luo, P., Loy, C.C., Tang, X.O.: Deep Learning Markov Random Field for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11, 8828 (2017)
[12] Piraud, M., Sekuboyina, A., Menze, B.H.: Multi-level Activation for Segmentation of Hierarchically-nested Classes. Computer Vision and Pattern Recognition workshop (2018)
[13] Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H.: Brain Tumor Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge. MICCAI BraTS Challenge (2017)
[14] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, LNCS, 234–241 (2015)
[15] Menze, B.H., Jakab, A., Bauer, S., et al.: The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Transactions on Medical Imaging 34(10), 1993–2024 (2015)
[16] Bakas, S., Akbari, H., Sotiras, A., et al.: Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Nature Scientific Data (2017)
[17] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-GBM collection. The Cancer Imaging Archive (2017). 10.7937/K9/TCIA.2017.KLXWJJ1Q
[18] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-LGG collection. The Cancer Imaging Archive (2014). 10.7937/K9/TCIA.2017.GJQ7R0EF
[19] Mazurowski, M.A., Habas, P.A., Zurada, J.M., Lo, J.Y., Baker, J.A., Tourassi, G.D.: Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance. Neural Networks 21(2), 427–436 (2017)
[20] Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Fourth International Conference on 3D Vision 16, 565–571 (2016)
[21] Sudre, C.H., Li, W.Q., Vercauteren, T., Ourselin, S., Cardoso, M.J.: Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Cardoso, M. et al. (eds.) Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. DLMIA 2017, ML-CDS 2017. Lecture Notes in Computer Science 10553 (2017)
[22] Crum, W.R., Camara, O., Hill, D.L.G.: Generalized Overlap Measures for Evaluation and Validation in Medical Image Analysis. IEEE Transactions on Medical Imaging 25(11), 1451–1461 (2006)