Diagnosis relies heavily on manual segmentation and analysis of multi-modal MRI scans by biomedical experts. However, this approach is severely limited by the labor-intensive nature of manual segmentation and by disagreements and errors between manual segmentations. Consequently, there is a great need for fast and robust automated segmentation algorithms. Convolutional neural networks (CNNs) have proven extremely effective for a variety of semantic segmentation tasks.
While CNN segmentation algorithms are abundant in biomedical imaging, very few make use of prior information about nested topology. Among the few that do [5, 6, 7, 8, 9, 10, 11], we find three different approaches. First, cascaded algorithms, in which the network consists of successive segmentation networks. Second, approaches that incorporate information about the nested classes into the loss function, penalizing solutions that do not respect the nested geometric relations. Third, approaches that use Markov random fields to formalize class relationships when post-processing the network output. Here, we use a new activation function that directly implements the class hierarchy during network training, and we generalize it to three nested classes. For the glioma labels, we assume that the active tumor regions are always contained in the tumor core, which is in turn surrounded by the tumor edema, resulting in a hierarchical three-class model. In sharp contrast to the nested-class method, the softmax-based multi-class method ignores the geometric prior between classes and assumes that the classes are mutually exclusive, i.e. a pixel cannot belong to several classes at the same time. This discards the topological information entirely and can lead to implausible segmentation results. We compare the Dice scores of the two methods; the nested-class method achieves higher accuracy than the softmax-based method, especially for the inner classes.
In the following, we give a brief overview of the state-of-the-art 3D residual U-Net architecture and of the multi-class nested activation and loss functions. We then propose and evaluate our model architectures for BraTS tumor segmentation. Finally, we compare the two main approaches and show that the multi-level activation performs better, especially for the inner classes.
2.1 Network Architecture
The nested-class relationship between the different labels is shown in Fig.2. The general network structure, shown in Fig.1, is derived from the glioma segmentation network previously proposed by Isensee et al. and processes large 3D input blocks of 144x144x144 voxels. The original network is inspired by the U-Net, which allows the network to intrinsically recombine different scales throughout the entire network. The network depth is set to 5 levels, which balances spatial resolution against feature representation. Each context module is a pre-activation residual block, and consecutive context modules are connected by 3x3x3 convolutions with stride 2. The localization pathway takes features from the lower levels of the network and transfers them to a high spatial resolution by simple upsampling. The upsampled features are recombined with the context-aggregation features of the corresponding level via concatenation. A localization module, consisting of a 3x3x3 convolution followed by a 1x1x1 convolution, then gathers these features.
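The choice of a 144-voxel cube fits a depth of 5 neatly. The arithmetic below is a small sketch, assuming the first level runs at full resolution and each of the four stride-2 convolutions between the five levels halves every spatial dimension exactly; the helper name `level_shapes` is hypothetical.

```python
def level_shapes(input_size: int, depth: int) -> list:
    """Spatial edge length at each encoder level of a U-Net-style network.

    Assumes the first level is at full resolution and each of the
    (depth - 1) stride-2 convolutions halves the edge length exactly.
    """
    sizes = [input_size]
    for _ in range(depth - 1):
        if sizes[-1] % 2 != 0:
            raise ValueError(f"edge length {sizes[-1]} cannot be halved exactly")
        sizes.append(sizes[-1] // 2)
    return sizes


print(level_shapes(144, 5))  # [144, 72, 36, 18, 9]
```

Since 144 is divisible by 2^4 = 16, the bottom level still has a 9x9x9 grid; an edge length such as 150 would fail after the first halving.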
Deep supervision is introduced in the localization pathway by integrating segmentation layers at different levels of the network and combining them via elementwise summation to form the final network output. Unlike Isensee's network, which uses a softmax output layer, our output activation layer is a multi-level sigmoid layer, which converts the multi-class problem into a series of nested binary ones. Intrinsically, the multi-level activation is a sum of several sigmoid functions whose single output maps directly to a multi-class segmentation that incorporates the topological prior. Consequently, it overcomes the shortcoming of the softmax-based method, which is blind to the geometric prior.
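The elementwise summation used for deep supervision can be illustrated on toy arrays. This is a minimal numpy sketch, not the network code: it assumes one output channel per level and nearest-neighbour 2x upsampling between levels (the original network may interpolate differently), and the function names are hypothetical. The multi-level activation would then be applied to the summed map.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a 3D array (assumed interpolation)."""
    return x.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)


def deep_supervision_sum(seg_maps: list) -> np.ndarray:
    """Sum single-channel segmentation maps from the coarsest to the finest level.

    seg_maps[0] is the coarsest map; every following map has twice the edge
    length of the previous one. The running sum is upsampled before each addition.
    """
    out = seg_maps[0]
    for finer in seg_maps[1:]:
        out = upsample2x(out) + finer
    return out


# Toy maps at three resolution levels; constant values make the sum easy to check.
coarse = np.ones((4, 4, 4))
mid = np.full((8, 8, 8), 2.0)
fine = np.full((16, 16, 16), 3.0)
combined = deep_supervision_sum([coarse, mid, fine])
print(combined.shape, combined[0, 0, 0])  # (16, 16, 16) 6.0
```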
2.2 Crop preprocessing
For a 3D network architecture, larger training patches contain more contextual and localization information, which helps improve segmentation accuracy. To obtain the largest possible cubic patch, we retain as much of the informative content of the MRI as possible and crop away the uninformative background. The maximum cube patch size is set to [144,144,144].
The crop preprocessing equation is defined as:
where the quantities involved are the minimum and maximum indices of the non-zero voxels in the MRI image, the length of the non-zero region, and the cube patch size, which is set to 144.
The crop indices are recorded and used in the post-processing stage to restore the segmentation to the original shape [155,240,240]. A small amount of meaningful information extending beyond the 144-voxel cube is unavoidably discarded, but this has little effect on the segmentation result. To compare the softmax-based and multi-level methods fairly, no data augmentation is used during pre-processing.
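Because the exact crop equation is not reproduced here, the following is only one plausible realization of the procedure described above, written as a numpy sketch: along each axis, a 144-voxel window is centred on the extent of the non-zero voxels and clamped to the image, and the recorded start indices are used to paste a patch back into the original [155,240,240] shape. The centring rule and all function names are assumptions.

```python
import numpy as np


def crop_start(lo: int, hi: int, dim: int, size: int) -> int:
    """Start of a `size`-long window centred on the non-zero range [lo, hi],
    clamped so that the window stays inside an axis of length `dim`."""
    centre = (lo + hi) // 2
    start = centre - size // 2
    return int(min(max(start, 0), dim - size))


def crop_volume(vol: np.ndarray, size: int = 144):
    """Crop a cube of edge `size` around the non-zero (brain) region.

    Returns the patch and the per-axis start indices needed to undo the crop.
    """
    if min(vol.shape) < size:
        raise ValueError("every axis must be at least `size` long")
    nz = np.nonzero(vol)
    if nz[0].size == 0:
        raise ValueError("volume contains no non-zero voxels")
    starts = [crop_start(int(idx.min()), int(idx.max()), dim, size)
              for idx, dim in zip(nz, vol.shape)]
    window = tuple(slice(s, s + size) for s in starts)
    return vol[window], starts


def restore_volume(patch: np.ndarray, starts, shape) -> np.ndarray:
    """Paste a (predicted) patch back into a zero volume of the original shape."""
    out = np.zeros(shape, dtype=patch.dtype)
    window = tuple(slice(s, s + n) for s, n in zip(starts, patch.shape))
    out[window] = patch
    return out


# Usage on a BraTS-shaped volume with a synthetic non-zero "brain" block.
vol = np.zeros((155, 240, 240), dtype=np.uint8)
vol[20:130, 50:190, 40:200] = 1
patch, starts = crop_volume(vol, size=144)
restored = restore_volume(patch, starts, vol.shape)
print(patch.shape, starts, restored.shape)  # (144, 144, 144) [2, 47, 47] (155, 240, 240)
```

In this toy volume the non-zero block is 160 voxels long along the last axis, so 16 of its slices fall outside the window, which mirrors the small information loss mentioned above.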
2.3 Multi-level method
Here, we use a single output channel and a multi-class nested activation function, as first proposed by Piraud et al. The multi-level method is inspired by continuous regression and generalizes logistic regression to hierarchically nested classes. It is shown in Fig.3 and defined as

$$\sigma_{\mathrm{ml}}(x) = \sum_{j=1}^{m} \sigma\big(k\,(x - x_j)\big), \qquad x_{j+1} - x_j = h,$$

where $\sigma$ is the sigmoid function, $k$ is the steepness, $x_j$ are the centres of the $m$ sigmoids and $h$ is the spacing between consecutive sigmoids. For the four nested classes of the Brain Tumor Segmentation challenge, we have $m+1=4$, and we take $h=0.5$ and $k=10$. The corresponding loss function, called Modified Cross-Entropy (MCE) by Piraud et al., is defined as
where $w_c$ is the weight of label $c$, which we take as $w_c = (N/N_c)^{\alpha}$, with $N$ the total number of pixels and $N_c$ the number of pixels in class $c$, and where $y_{i,c}=1$ for the ground-truth label $c$ of pixel $i$ and $y_{i,c}=0$ otherwise. Furthermore, the mapping function is defined as
where $H$ is the Heaviside function. The second loss function, called Normalized Cross-Entropy (NCE) by Piraud et al., is defined as
with the mapping function defined as
where $s$ is the softplus function, $s(x)=\ln(1+e^{x})$, and $H$ is the Heaviside function.
The weighted modified and normalized cross-entropy losses build on the standard cross-entropy loss, and their class weights mitigate the class imbalance problem. They can also encode any hierarchical and mutually exclusive topological relationship between classes in the network architecture.
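To make the shape of the multi-level activation concrete, here is a minimal numpy sketch for the four-class case ($m=3$, $k=10$, $h=0.5$). The text fixes only the spacing $h$, not where the first sigmoid is centred; placing the centres at 0, 0.5 and 1.0 is a hypothetical choice, as are the function names. The output rises from 0 to $m$, with one step per nested class.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def multi_level_activation(x, m: int = 3, k: float = 10.0, h: float = 0.5,
                           first_centre: float = 0.0):
    """Sum of m sigmoids with steepness k, centred at first_centre + j * h.

    first_centre is an assumption; only the spacing h is fixed in the text.
    """
    x = np.asarray(x, dtype=float)
    centres = first_centre + h * np.arange(m)   # x_1, ..., x_m
    z = k * (x[..., None] - centres)            # broadcast over the m sigmoids
    return sigmoid(z).sum(axis=-1)


xs = np.linspace(-1.0, 2.0, 7)
print(np.round(multi_level_activation(xs), 3))
```

Because the sigmoids are summed rather than normalized against each other, any output value that exceeds a higher level necessarily exceeds all lower ones, which is what encodes the nesting.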
2.4 Evaluation metrics
In the BRATS task, the numbers of positives and negatives are highly unbalanced. Consequently, the organizers use four different metrics to evaluate the performance of the algorithms and to rank the teams.
Given a ground-truth segmentation map G and a segmentation map P of one class generated by the algorithm, the four evaluation criteria are defined as follows.
Dice similarity coefficient (DSC):

$$\mathrm{DSC}(G,P) = \frac{2\,|G \cap P|}{|G| + |P|}$$

The Dice similarity coefficient measures the overlap between G and P, expressed as a percentage.
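The Dice coefficient above translates directly into a few lines of numpy on binary masks. This is a sketch; returning 1.0 when both masks are empty is a convention we assume here, and the challenge's own evaluation tool may handle that case differently.

```python
import numpy as np


def dice(g: np.ndarray, p: np.ndarray) -> float:
    """Dice similarity coefficient 2|G n P| / (|G| + |P|) for binary masks."""
    g = np.asarray(g, dtype=bool)
    p = np.asarray(p, dtype=bool)
    denom = g.sum() + p.sum()
    if denom == 0:
        return 1.0  # assumed convention: both masks empty counts as perfect agreement
    return float(2.0 * np.logical_and(g, p).sum() / denom)


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```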
Hausdorff distance (95th percentile) is defined as:

$$d_H(G,P) = \max\Big\{ \sup_{x \in G} \inf_{y \in P} d(x,y),\; \sup_{y \in P} \inf_{x \in G} d(x,y) \Big\}$$

where $d(x,y)$ denotes the distance between x and y, sup denotes the supremum and inf the infimum. It measures how far two subsets of a metric space are from each other. In this challenge, it is robustified by using the 95th percentile instead of the maximum (100th percentile) distance.
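A brute-force numpy sketch of the (percentile) Hausdorff distance between two point sets, e.g. the voxel coordinates obtained with `np.argwhere(mask)`. With `percentile=100` it is the classical definition above; with 95 it replaces each supremum by the 95th percentile of the nearest-neighbour distances. This is one common variant; the official evaluation may, for instance, use only surface voxels or pool the two directions differently, so treat it as an illustration rather than the challenge metric.

```python
import numpy as np


def directed_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """For each point in a, the Euclidean distance to its nearest point in b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def hausdorff(a, b, percentile: float = 100.0) -> float:
    """Symmetric Hausdorff distance; percentile < 100 gives the robust variant."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d_ab = directed_distances(a, b)
    d_ba = directed_distances(b, a)
    return float(max(np.percentile(d_ab, percentile), np.percentile(d_ba, percentile)))


g_points = [[0, 0], [1, 0]]
p_points = [[0, 0], [3, 0]]
print(hausdorff(g_points, p_points))       # 2.0
print(hausdorff(g_points, p_points, 95))
```

The pairwise-difference array grows as |G| x |P|, so for full 3D masks a distance transform or KD-tree would be the practical choice.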
Sensitivity (also called the true positive rate) measures the proportion of actual positives that are correctly identified. Specificity (also called the true negative rate) measures the proportion of actual negatives that are correctly identified. Let TP be the number of lesion pixels correctly predicted as positive and TN the number of non-lesion pixels correctly predicted as negative. The condition positives consist of the true positives TP and the false negatives FN, and the condition negatives consist of the true negatives TN and the false positives FP.
Sensitivity and specificity are then defined as:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}$$
The values of these four metrics were computed independently by the organizers and made available on the validation leaderboard.
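For completeness, sensitivity and specificity can be computed from the four confusion counts defined above. A short numpy sketch on binary masks; it does not guard against masks with no positives or no negatives, where one of the ratios is undefined.

```python
import numpy as np


def sensitivity_specificity(g, p):
    """Return (TP / (TP + FN), TN / (TN + FP)) for binary ground truth g and prediction p."""
    g = np.asarray(g, dtype=bool)
    p = np.asarray(p, dtype=bool)
    tp = np.sum(g & p)
    fn = np.sum(g & ~p)
    tn = np.sum(~g & ~p)
    fp = np.sum(~g & p)
    return float(tp / (tp + fn)), float(tn / (tn + fp))


sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(sens, spec)
```

In BraTS volumes the background vastly outnumbers the lesion, so specificity is typically close to 1 and carries little information on its own, which is one reason several metrics are reported together.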
3 Experimental results
The BRATS 2018 dataset [15, 16, 17, 18] contains four labels, necrotic core, edema, non-enhancing core and enhancing core, which form the three tumor classes shown in Fig.2. It provides four MRI modalities, native (T1), post-contrast T1-weighted (T1Gd), T2-weighted (T2) and T2 Fluid Attenuated Inversion Recovery (FLAIR), all of which are used as separate input channels. We train the networks using the Adam optimizer with an initial learning rate of 0.0005. To regularize the network, we use early stopping, halting training when the precision on the 20% of the training data reserved for validation no longer improves, and dropout with a rate of 0.3 in all residual blocks before the multi-class sigmoid function. Example slices of the segmentation results, showing the whole tumor, tumor core and enhancing core, are given in Fig.4. We observe that the predicted labels respect the nested-class relationship, which avoids errors stemming from the lack of a topological prior.
| Enhancing core | Whole tumor | Tumor core | Weight scheme |
The segmentation result is strongly affected by the severe class imbalance in the BraTS dataset. As class imbalance in a dataset increases, the performance of a neural network trained on that data has been shown to decrease dramatically. Many methods [20, 21, 22] have been proposed that modify the loss function to alleviate this problem. Here, we propose and investigate a weighted cross-entropy that incorporates the nested-class information. We experimented with different weighting exponents (α = 1, 0.5, 0.4, 0.3) and with the two losses (MCE and NCE) proposed by Piraud et al. The best-performing combination turned out to be α = 0.4 with the MCE loss. The segmentation thresholds that determine the boundaries between classes were set to [0.95, 1.65, 2.2] during validation. For this final configuration, we reached Dice scores of 86% for the complete tumor, 77% for the tumor core and 72% for the enhancing core, as presented in Table 1.
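The effect of the power-law exponent on the class weights $w_c = (N/N_c)^{\alpha}$ can be seen on a toy label map. In this numpy sketch the counts are taken over a single hypothetical label array; whether the weights are computed per volume, per batch or over the whole training set is not stated, so that choice is an assumption.

```python
import numpy as np


def class_weights(labels: np.ndarray, num_classes: int, alpha: float) -> np.ndarray:
    """Power-law class weights w_c = (N / N_c) ** alpha."""
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(float)
    if np.any(counts == 0):
        raise ValueError("every class must occur at least once")
    return (labels.size / counts) ** alpha


# Hypothetical, strongly imbalanced toy label map: 90% background, 1% innermost class.
labels = np.repeat(np.arange(4), [90, 6, 3, 1])
for alpha in (1.0, 0.5, 0.4, 0.3):
    w = class_weights(labels, num_classes=4, alpha=alpha)
    print(alpha, np.round(w, 2), "max/min ratio:", round(float(w.max() / w.min()), 1))
```

With α = 1 the rarest class is weighted 90 times more than the background in this example; smaller exponents compress that ratio, trading off rare-class emphasis against training stability.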
| Dice score | Enhancing core | Whole tumor | Tumor core |
The weighted MCE loss performs better than the NCE loss, and because of the extreme class imbalance, the weighting scheme strongly affects the segmentation result. Among the weighting schemes compared, the optimal exponent is 0.4. We compare our method with the softmax-based method using the same network architecture proposed by Isensee et al., without ensembling, complex pre- or post-processing steps, or additional training data. The Dice score of the innermost nested class (enhancing core) improves from 0.691 to 0.719, while the Dice scores of the whole tumor and tumor core remain almost unchanged.
| Mean | Enhancing core | Whole tumor | Tumor core |
The quantitative evaluation of the Dice scores (mean, standard deviation, median, 25% and 75% quantiles) for the enhancing core, whole tumor and tumor core is shown in Table 2. The other evaluation metrics (sensitivity, the proportion of actual positives correctly identified; specificity, the proportion of actual negatives correctly identified; and Hausdorff95) are listed in Table 3.
3.1 Threshold scheme definition and analysis
Setting the optimal thresholds is an important part of the multi-class segmentation task, as they directly determine the segmentation boundaries. For the multi-level sigmoid activation of the four nested classes (Fig.3), the four-class segmentation problem corresponds to a threshold scheme with three parameters [Threshold-1, Threshold-2, Threshold-3]. The threshold scheme is chosen optimally during validation, then fixed and applied to the test dataset.
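The mapping from the single activation output to labels can be sketched with `np.digitize`, which returns the number of (increasing) thresholds a value has reached. The threshold values are those reported in the text; assigning label 1 to the whole-tumor level, 2 to the tumor core and 3 to the enhancing core is our assumption about the label order, and the function names are hypothetical.

```python
import numpy as np

THRESHOLDS = (0.95, 1.65, 2.2)  # threshold scheme reported in the text


def to_labels(y: np.ndarray, thresholds=THRESHOLDS) -> np.ndarray:
    """Map the single-channel activation output to labels 0..3.

    The label is the number of thresholds the output reaches, which is
    exactly what np.digitize computes for increasing thresholds.
    """
    return np.digitize(y, thresholds)


def nested_regions(labels: np.ndarray) -> dict:
    """Binary masks for the three nested tumor regions (assumed label order)."""
    return {
        "whole_tumor": labels >= 1,
        "tumor_core": labels >= 2,
        "enhancing_core": labels >= 3,
    }


y = np.array([0.10, 1.00, 1.70, 2.50, 2.90])
labels = to_labels(y)
regions = nested_regions(labels)
print(labels)  # [0 1 2 3 3]
```

Since each region is defined as "label at least c", the enhancing core is contained in the tumor core, which is contained in the whole tumor, by construction, whatever thresholds are chosen.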
To analyze how the thresholds affect segmentation accuracy, the relationship between each boundary threshold and the Dice score is illustrated in Fig.5. Each threshold is varied over an interval expected to contain the optimal segmentation result, while the other two thresholds are fixed at their optimal values. The Dice scores of the three classes are much more sensitive to threshold-3 than to the other two thresholds, and can drop into a Dice-score valley within the interval [2.2, 2.4]. Threshold-2 has little impact on the Dice scores of all classes unless it exceeds 1.8. Consequently, an optimal threshold scheme is easier to find once threshold-3 and threshold-2 have been determined. After experimentation and optimization, the threshold scheme for the BraTS challenge is selected as [0.95, 1.65, 2.2].
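The one-at-a-time sweep described above can be sketched as follows: one threshold is varied over a candidate interval while the other two stay fixed, and each setting is scored by the mean Dice over the three nested regions. Scoring by the unweighted mean, the label-to-region mapping and the synthetic data are all assumptions for illustration, not the procedure's exact details.

```python
import numpy as np


def dice(g: np.ndarray, p: np.ndarray) -> float:
    denom = g.sum() + p.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(g, p).sum() / denom)


def mean_nested_dice(y, gt_labels, thresholds) -> float:
    """Mean Dice over the three nested regions (label >= 1, >= 2, >= 3)."""
    pred = np.digitize(y, thresholds)
    return float(np.mean([dice(gt_labels >= c, pred >= c) for c in (1, 2, 3)]))


def sweep_threshold(y, gt_labels, thresholds, index, candidates):
    """Vary thresholds[index] over candidates while the others stay fixed."""
    results = []
    for t in candidates:
        trial = list(thresholds)
        trial[index] = t
        if trial != sorted(trial):
            continue  # skip orderings that would break the nested-class mapping
        results.append((t, mean_nested_dice(y, gt_labels, trial)))
    return results


# Synthetic activation outputs and ground-truth labels (hypothetical values).
gt = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y = np.array([0.2, 0.4, 1.2, 1.4, 2.25, 2.3, 2.6, 2.9])
for t, score in sweep_threshold(y, gt, [0.95, 1.65, 2.2], index=2,
                                candidates=[2.0, 2.2, 2.4, 2.7]):
    print(f"threshold-3 = {t:.2f}: mean Dice = {score:.3f}")
```

On this toy data the score peaks at an intermediate value of threshold-3 and falls off on both sides, the same qualitative behaviour that makes threshold-3 the most sensitive parameter to fix first.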
In this paper, we applied multi-level activation to the segmentation of nested glioma classes. Our experiments indicate that the multi-level activation function and its corresponding loss function are more effective than a softmax output layer on the same network framework. Using the MCE loss function and a reweighting scheme with power-law exponent α = 0.4, we obtain Dice scores of 86% for the complete tumor, 77% for the tumor core and 72% for the enhancing core on the validation leaderboard of the 2018 BRATS challenge, demonstrating the applicability of the multi-level activation scheme. Finally, this activation can be combined with other network architectures; using it with the best-performing architecture of the BRATS challenge could further improve the results.
- Davis, M.E.: Glioblastoma: Overview of Disease and Treatment. Clinical Journal of Oncology Nursing 20(5), S2–S8 (2016). 10.1188/16.CJON.S1.2-8
- Hanif, F., Muzaffar, K., Perveen, K., Malhi, S.M., Simjee, S.U.: Glioblastoma Multiforme: A Review of its Epidemiology and Pathogenesis through Clinical Presentation and Treatment. Asian Pacific Journal of Cancer Prevention 18, 3–9 (2017)
- Birbrair, A., Sattiraju, A., Zhu, D., Zulato, G., Batista, I., Nguyen, V.T., Messi, M.L., Solingapuram Sai, K.K., Marini, F.C., Delbono, O., Mintz, A.: Novel Peripherally Derived Neural-Like Stem Cells as Therapeutic Carriers for Treating Glioblastomas. Stem Cells Translational Medicine 6, 471–481 (2017)
- Gu, J.X., Wang, Z.H., Kuen, J., Ma, L.Y., Shahroudy, A., Shuai, B., Liu, T., Wang, X.X., Wang, L., Wang, G., Cai, J.F., Chen, T.: Recent Advances in Convolutional Neural Networks. Pattern Recognition 77, 354–377 (2018)
- Nosrati, M.S., Hamarneh, G.: Local optimization based segmentation of spatially-recurring, multi-region objects with part configuration constraints. IEEE Transactions on Medical Imaging 33, 1845–1859 (2014)
- BenTaieb, A., Hamarneh, G.: Topology Aware Fully Convolutional Networks for Histology Gland Segmentation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. Lecture Notes in Computer Science. Springer International Publishing, Cham (2016)
- Christ, P.F., Elshaer, M.E.A., Ettlinger, F., Tatavarty, S., Bickel, M., Bilic, P., Rempfler, M., Armbruster, M., Hofmann, F., Anastasi, M.D., Sommer, W.H., Ahmadi, S.A., Menze, B.H.: Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI. Lecture Notes in Computer Science 9900 (2016)
- Fidon, L., Li, W.Q., Garcia-Peraza-Herrera, L.C., Ekanayake, J., Kitchen, N., Ourselin, S., Vercauteren, T.: Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks. Oral presentation at the MICCAI 2017 Brain Lesion (BrainLes) Workshop, 1–12 (2017)
- Bauer, S., Tessier, J., Krieter, O., Nolte, L.P., Reyes, M.: Integrated spatio-temporal segmentation of longitudinal brain tumor imaging studies. In: Menze, B., Langs, G., Montillo, A., Kelm, M., Müller, H., Tu, Z. (eds.) Medical Computer Vision. Large Data in Medical Imaging, pp. 71–83. Springer International Publishing, Cham (2014)
- Alberts, E., Charpiat, G., Tarabalka, Y., Huber, T., Weber, M.A., Bauer, J., Zimmer, C., Menze, B.H.: A Nonparametric Growth Model for Brain Tumor Segmentation in Longitudinal MR Sequences. MICCAI Brain Lesion Workshop, 69–79 (2015)
- Liu, Z.W., Li, X.X., Luo, P., Loy, C.C., Tang, X.O.: Deep Learning Markov Random Field for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 1-1, 8828 (2017)
- Piraud, M., Sekuboyina, A., Menze, B.H.: Multi-level Activation for Segmentation of Hierarchically-nested Classes. Computer Vision and Pattern Recognition workshop (2018)
- Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H.: Brain Tumor Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge. MICCAI BraTS Challenge (2017)
- Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), LNCS, pp. 234–241. Springer (2015)
- Menze, B.H., Jakab, A., Bauer, S., et al.: The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Transactions on Medical Imaging 34(10), 1993–2024 (2015)
- Bakas, S., Akbari, H., Sotiras, A., et al.: Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Nature Scientific Data (2017)
- Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-GBM collection. The Cancer Imaging Archive (2017). 10.7937/K9/TCIA.2017.KLXWJJ1Q
- Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-LGG collection. The Cancer Imaging Archive (2017). 10.7937/K9/TCIA.2017.GJQ7R0EF
- Mazurowski, M.A., Habas, P.A., Zurada, J.M., Lo, J.Y., Baker, J.A., Tourassi, G.D.: Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance. Neural Networks 21(2), 427–436 (2008)
- Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Fourth International Conference on 3D Vision 16, 565–571 (2016)
- Sudre, C.H., Li, W.Q., Vercauteren, T., Ourselin, S., Cardoso, M.J.: Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Cardoso, M., et al. (eds.) Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. DLMIA 2017, ML-CDS 2017. Lecture Notes in Computer Science 10553 (2017)
- Crum, W.R., Camara, O., Hill, D.L.G.: Generalized Overlap Measures for Evaluation and Validation in Medical Image Analysis. IEEE Transactions on Medical Imaging 25(11), 1451–1461 (2006)