Gliomas are the most prevalent brain tumors; they occur more frequently in adults and may originate from glial cells. They account for nearly eighty percent of all malignant brain tumors diagnosed in the United States. Gliomas are divided into two types: High-grade Glioma (HGG) and Low-grade Glioma (LGG). HGG tumors are malignant, grow quickly, and usually require surgery; the average survival period reported for affected patients is two years or less. LGG tumors are associated with a life expectancy of a few years, and aggressive treatment is sometimes delayed as long as possible.
To analyze and monitor brain tumors, imaging tools such as Magnetic Resonance Imaging (MRI) are widely used. MRI provides detailed brain images and is commonly used to visualize the extent of tumor regions. However, the manual segmentation of 3D MRI scans requires a significant amount of time and is susceptible to inaccuracies and variability due to the highly complex nature of tumor appearance. Accordingly, automatic brain tumor segmentation of MRI images can substantially improve diagnosis, prediction of growth rates, and treatment planning, particularly where access to an expert radiologist is restricted.
The brain is imaged using different MRI modalities. The most commonly used MRI modalities for brain tumor segmentation are as follows: T1-weighted, post-contrast T1-weighted, T2-weighted, and Flair. Collectively, the complementary information from these modalities enables a more robust segmentation of a brain tumor.
Recently, deep learning methods have been successfully applied in a variety of application domains [26, 13, 33]. One of these applications is brain tumor segmentation, which has gained wide attention in the medical imaging field. In this paper, we investigate some of the best recent methods and techniques adopted for the automatic brain tumor segmentation task.
Havaei et al. introduced multi-path CNNs to segment brain tumor regions over 2D slices of MRI images. Additionally, two training phases were used to tackle the imbalanced classes of the input data. A boundary-aware FCN was developed by Shen et al. to increase the segmentation performance. Subsequently, Kamnitsas et al. proposed a novel 3D network, named DeepMedic, which extracts multi-scale feature maps and then integrates them locally and globally by using a two-path architecture.
Owing to the effectiveness of encoder-decoder architectures for semantic segmentation, networks like UNet have also performed well on brain tumor segmentation. All the winners of the BRAin Tumor Segmentation 2017 (BRATS 2017) challenge also benefited from encoder-decoder networks. Applying anisotropic convolutions, Wang et al. trained three networks, one for each tumor sub-region, in a cascade in which the output of each network was used as the input of the next.
In BRATS 2018, Myronenko, Isensee et al., and McKinley et al. ranked first, second, and third, respectively. Myronenko showed that adding a variational auto-encoder branch that reconstructs the input image can regularize the shared decoder. Isensee et al. used a UNet architecture with only minor modifications. Moreover, the authors utilized extra training data provided by their own institution to improve the overall performance. McKinley et al. developed a shallow network built on dilated convolutions. Furthermore, a generalization of binary Cross-entropy was used to account for label uncertainty. Zhou et al. proposed an ensemble of various networks that incorporates multi-scale contextual information, adds an attention block, and segments the three tumor sub-regions in a cascade. Wang et al. investigated Test Time Augmentation (TTA), in which various augmentation methods are applied at test time. By adopting TTA on several networks, they demonstrated that it can improve the overall performance of brain tumor segmentation.
Several studies [9, 20, 4] have shown that 3D versions of the UNet architecture achieve better results than fully 2D architectures. Although 3D UNet performs well, it has more parameters and higher computational complexity than the 2D version, which is why we use a 2D UNet architecture to reduce the memory requirements of the network. Consequently, we need to extract 2D slices from the 3D MRI volumes, which means the network cannot directly benefit from the 3D contextual information of the input images. To overcome this problem, we employ the Multi-View technique, which enhances the network performance by exploiting the 3D contextual information of the input images. Moreover, most recent methods (especially UNet-based networks) integrate low-level and high-level features in a naive way, i.e., they give equal importance to every feature map, which may confuse the model. To address this problem, we propose an extended version of the UNet architecture in which a channel attention mechanism is applied after the concatenation of low-level and high-level features to weight each channel adaptively.
In summary, our paper makes three contributions:
Since we use a 2D UNet architecture because of its low number of parameters, we employ the Multi-View Fusion technique to exploit the 3D contextual information of the input images and improve the performance.
Integrating multi-level features in a naive way may confuse the model. Thus, we propose an extended version of the UNet architecture in which a channel attention mechanism is applied after the concatenation of multi-level features to weight each channel adaptively.
Although our method is 2D, it performs favorably against the 2017 and 2018 state-of-the-art methods.
The network consists of two main paths: i) the contracting path, which encodes the whole input image, and ii) the expanding path, which recovers the original resolution. The contracting path has three downsampling layers in which, instead of the Max-pooling layers employed in the original UNet, we use convolutional layers with a stride of 2. We also use residual units instead of the plain units of the original UNet to speed up training and convergence. In each residual unit, there are two convolutional layers, each followed by a Batch Normalization layer and a PReLU activation instead of the ReLU activation used in the original UNet. After the contracting path, a residual unit is used in the bottleneck to connect both paths. Similarly, three residual units are used in the expanding path. This path also has three upsampling layers, each of which doubles the size of the feature maps. Moreover, a convolutional layer is adopted after each upsampling layer.
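As a rough illustration of the resolution changes described above, the following sketch traces the side length of the feature maps through three stride-2 downsampling steps and three upsampling steps. The 240-pixel input size, the function name, and the assumption that all convolutions use size-preserving padding are illustrative choices, not details taken from the paper.

```python
def trace_resolution(size, num_levels=3):
    """Return the feature-map side length after each down/upsampling step."""
    sizes = [size]
    # Contracting path: each stride-2 convolution halves the resolution
    # (with size-preserving padding, output = ceil(input / 2)).
    for _ in range(num_levels):
        size = (size + 1) // 2
        sizes.append(size)
    # Expanding path: each upsampling layer doubles the resolution.
    for _ in range(num_levels):
        size = size * 2
        sizes.append(size)
    return sizes


# Hypothetical 240 x 240 input slice.
print(trace_resolution(240))
```

With an input side that is divisible by 8, the expanding path returns exactly to the input resolution, which is what allows the feature maps of the two paths to be concatenated level by level.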
To combine the feature maps in the expanding path with the corresponding feature maps of the contracting path, the original UNet uses direct concatenation. However, directly concatenating these high-level and low-level feature maps without weighting their importance is not the best way to integrate them efficiently. The multi-level features may not be useful for all types of input images, which leads to redundant information. Moreover, inaccurate and ambiguous information at some levels might confuse the network and thus lead to wrong segmentation of tumors. To address these problems, and unlike the original UNet, we apply a channel attention mechanism by adopting a Squeeze and Excitation Block after each concatenation layer to adaptively weight the channels, as illustrated in Fig. 2. This design generates channel weights that re-weight the concatenated feature maps. Finally, at the end of the network, a convolutional layer followed by a Softmax function maps the features of the previous layer to the expected number of classes.
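The channel re-weighting described above can be sketched in a few lines of plain Python. This is a minimal illustration of a Squeeze and Excitation style gate under simplifying assumptions: bias terms are omitted, the weight matrices `w1` and `w2` are hypothetical hand-picked values rather than learned parameters, and the feature maps are tiny nested lists instead of tensors.

```python
import math


def se_reweight(feature_maps, w1, w2):
    """Re-weight the channels of `feature_maps` (a list of C 2D maps)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w1
    ]
    # Excitation, step 2: fully connected layer followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w2
    ]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in fmap]
        for fmap, gate in zip(feature_maps, gates)
    ]
    return scaled, gates


low_level = [[[1.0, 1.0], [1.0, 1.0]]]    # one channel from the contracting path
high_level = [[[2.0, 0.0], [0.0, 2.0]]]   # one channel from the expanding path
concatenated = low_level + high_level     # channel-wise concatenation (C = 2)
w1 = [[0.5, 0.5]]                         # C -> 1, hypothetical weights
w2 = [[1.0], [-1.0]]                      # 1 -> C, hypothetical weights
scaled, gates = se_reweight(concatenated, w1, w2)
print(gates)
```

In this toy setting the first channel receives a gate above 0.5 and the second a gate below 0.5, so the two concatenated channels no longer contribute with equal importance.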
II-B Loss Function
The overall performance of a segmentation model depends not only on the architecture of the network but also on the choice of the loss function, particularly in tasks that suffer from severe class imbalance, where choosing an appropriate loss function becomes more challenging. Because tumor regions are much smaller than non-tumor regions, the brain tumor segmentation task has an innate class imbalance problem. Thus, the widely used segmentation loss functions are not appropriate for training our network: if they are adopted, the network mainly learns the larger classes, which results in poor segmentation performance. To tackle this problem, we use the Generalized Dice loss (GDL), which adaptively weights the classes to balance them, along with the well-known Cross-entropy loss (CE), which speeds up convergence. The two losses are combined into a single training objective, with the weight that balances them set empirically.
The GDL function is a multi-class version of the Dice loss function. GDL assigns an adaptive weight to each class to deal with the imbalance among brain tumor classes, and its formulation also includes a regularization constant.
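For reference, the GDL of the cited work is commonly written in the form below. The symbols are assumptions introduced here: $C$ is the number of classes, $N$ the number of pixels, $g_{cn}$ the one-hot ground truth, $p_{cn}$ the predicted probability, $w_c$ the adaptive class weight, and $\epsilon$ the regularization constant. The placement of $\epsilon$ in the denominator is also an assumption, since implementations differ on this point.

```latex
\mathcal{L}_{GDL} = 1 - 2\,
\frac{\sum_{c=1}^{C} w_c \sum_{n=1}^{N} g_{cn}\, p_{cn}}
     {\sum_{c=1}^{C} w_c \sum_{n=1}^{N} \left( g_{cn} + p_{cn} \right) + \epsilon},
\qquad
w_c = \frac{1}{\left( \sum_{n=1}^{N} g_{cn} \right)^{2}}
```

Because $w_c$ is the inverse of the squared reference volume of class $c$, classes with few pixels receive large weights, which is how the loss counteracts the dominance of the background class.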
The multi-class Cross-entropy loss function is the standard categorical Cross-entropy between the predicted class probabilities and the ground-truth labels.
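A minimal sketch of how the two losses of this section could be computed and combined is given below. It assumes flattened per-class probability lists, a mean over pixels for the Cross-entropy, and a weighted sum as the combination; the parameter `ce_weight` and its default value are hypothetical placeholders, not the value used in the paper.

```python
import math


def generalized_dice_loss(probs, onehot, eps=1e-6):
    """probs / onehot: one flat list of per-pixel values for each class."""
    numerator = 0.0
    denominator = 0.0
    for p_c, g_c in zip(probs, onehot):
        # Adaptive class weight: inverse of the squared reference volume,
        # so that small classes count as much as large ones.
        w_c = 1.0 / (sum(g_c) ** 2 + eps)
        numerator += w_c * sum(p * g for p, g in zip(p_c, g_c))
        denominator += w_c * sum(p + g for p, g in zip(p_c, g_c))
    return 1.0 - 2.0 * numerator / (denominator + eps)


def cross_entropy_loss(probs, onehot, eps=1e-12):
    """Categorical cross-entropy averaged over pixels."""
    num_pixels = len(probs[0])
    total = 0.0
    for p_c, g_c in zip(probs, onehot):
        total -= sum(g * math.log(p + eps) for p, g in zip(p_c, g_c))
    return total / num_pixels


def combined_loss(probs, onehot, ce_weight=1.0):
    # ce_weight is a hypothetical placeholder for the empirically set weight.
    gdl = generalized_dice_loss(probs, onehot)
    ce = cross_entropy_loss(probs, onehot)
    return gdl + ce_weight * ce


# Two classes, four pixels: three background pixels and one tumor pixel.
truth = [[1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
uncertain = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
print(combined_loss(truth, truth), combined_loss(uncertain, truth))
```

A perfect prediction drives both terms close to zero, while the uninformative prediction is penalized by both terms.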
II-C Multi-view Fusion
Since our proposed network is a 2D architecture, we need to extract 2D slices from the 3D MRI volumes. To benefit from the 3D contextual information of the input images, we extract 2D slices from both the Axial and Coronal views and then train a separate network for each view. At test time, we build the 3D output volume of each model by stacking its 2D predicted maps. Finally, we fuse the two views by pixel-wise averaging. The whole procedure is illustrated in Fig. 3. By fusing these two views, our method is able to take the 3D nature of the input images into account. We show the effectiveness of Multi-view Fusion in the experimental results section.
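The fusion step can be sketched as follows, assuming single-class probability maps stored as nested lists, axial slices indexed as [z][y][x], and coronal slices indexed as [y][z][x]. The function names and the axis conventions are assumptions made for this illustration.

```python
def stack_axial(slices):
    """Axial predictions are indexed [z][y][x]; stacking gives the volume."""
    return [[row[:] for row in axial_slice] for axial_slice in slices]


def stack_coronal(slices):
    """Coronal predictions are indexed [y][z][x]; reorder to [z][y][x]."""
    num_y, num_z = len(slices), len(slices[0])
    return [[slices[y][z][:] for y in range(num_y)] for z in range(num_z)]


def fuse_views(volume_a, volume_b):
    """Voxel-wise average of two probability volumes of identical shape."""
    return [
        [
            [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(slice_a, slice_b)
        ]
        for slice_a, slice_b in zip(volume_a, volume_b)
    ]


# Toy 2 x 2 x 2 example with tumor probabilities from each view's network.
axial = [[[0.2, 0.4], [0.6, 0.8]], [[1.0, 0.0], [0.0, 1.0]]]
coronal = [[[0.4, 0.4], [0.0, 0.0]], [[0.2, 0.2], [1.0, 1.0]]]
fused = fuse_views(stack_axial(axial), stack_coronal(coronal))
print(fused)
```

Both volumes must be brought into the same axis order before averaging; otherwise the average mixes probabilities of different voxels.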
III-A Dataset Description
In this paper, we use the BRATS 2017 and 2018 [29, 5] datasets for our experiments. The training sets of these datasets contain 3D MRI volumes of 285 different patients, of which 210 volumes are HGG and 75 volumes are LGG, with dimensions of $240 \times 240 \times 155$. The BRATS 2017 and BRATS 2018 validation sets contain 3D MRI volumes of 46 and 66 patients of unknown grades, respectively. There are four modalities for each individual brain, namely T1, T1c (post-contrast T1), T2, and Flair, which were skull-stripped, resampled, and co-registered. These datasets include four labels, namely enhancing tumor, edema, necrosis, and background. For the purpose of evaluation, the annotations are merged into three binary sub-regions, the Whole Tumor region, the Tumor Core region, and the Enhancing Tumor region, which we denote as WT, TC, and ET, respectively. The ground truth was created by domain experts through manual segmentation. The segmentation labels of the validation sets are not publicly available, and participants must upload the results produced by their networks to the BRATS online evaluation platform in order to obtain quantitative evaluations such as Dice Score and Hausdorff Distance.
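The merging of annotations into the three evaluation sub-regions can be sketched as below. The numeric label encoding (0 for background, 1 for necrosis, 2 for edema, 4 for enhancing tumor) follows the usual BRATS convention and is an assumption here, as are the names `REGIONS` and `to_binary_regions`.

```python
# Assumed BRATS label encoding:
# 0 = background, 1 = necrosis, 2 = edema, 4 = enhancing tumor.
REGIONS = {
    "WT": {1, 2, 4},  # Whole Tumor: every tumor label
    "TC": {1, 4},     # Tumor Core: necrosis and enhancing tumor
    "ET": {4},        # Enhancing Tumor only
}


def to_binary_regions(label_map):
    """Convert a 2D label map into one binary mask per sub-region."""
    return {
        name: [[1 if value in labels else 0 for value in row] for row in label_map]
        for name, labels in REGIONS.items()
    }


masks = to_binary_regions([[0, 1], [2, 4]])
print(masks)
```

The three regions are nested (ET inside TC inside WT), so a single pixel can belong to several masks at once.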
III-B Evaluation Metrics
To evaluate the performance of our method, we use Dice Score and Hausdorff Distance for the ET, WT, and TC sub-regions.
Dice Score calculates the similarity between the sets $X$ and $Y$ as follows:
$$Dice(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}$$
where $|X|$ and $|Y|$ denote the cardinalities of the sets $X$ and $Y$, respectively.
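As a small worked example of the Dice Score, the sketch below represents each segmentation as a set of pixel coordinates. Returning 1.0 when both sets are empty is a convention assumed here, not something stated in the paper.

```python
def dice_score(x, y):
    """Dice similarity between two collections of pixel coordinates."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0  # assumed convention: two empty segmentations agree
    return 2.0 * len(x & y) / (len(x) + len(y))


prediction = {(0, 0), (0, 1)}
ground_truth = {(0, 1), (1, 1)}
print(dice_score(prediction, ground_truth))  # one shared pixel out of 2 + 2
```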
The Hausdorff Distance (HD) is defined as the longest distance between a point in one set and the closest point of the other set:
$$HD(X, Y) = \max\left\{ \sup_{x \in X} \inf_{y \in Y} d(x, y),\ \sup_{y \in Y} \inf_{x \in X} d(x, y) \right\}$$
where $d(x, y)$ is the Euclidean distance between $x$ and $y$. To reduce problems with noisy predictions, the 95th percentile is used instead of the max operation, which we refer to as Hausdorff95.
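The following sketch computes the Hausdorff Distance and a percentile-based variant on small point sets. It is a simplified illustration: it uses all given points rather than extracted surface points, assumes both sets are non-empty, and uses a linear-interpolation percentile, which may differ from the definition used by the BRATS evaluation platform.

```python
import math


def directed_distances(a, b):
    """For every point in a, the distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def percentile(values, q):
    """Percentile with linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def hausdorff(a, b, q=100.0):
    """Symmetric Hausdorff distance; q=95 gives a Hausdorff95-style value."""
    return max(
        percentile(directed_distances(a, b), q),
        percentile(directed_distances(b, a), q),
    )


x_points = [(0, 0), (1, 0)]
y_points = [(0, 0), (4, 0)]
print(hausdorff(x_points, y_points))        # classic Hausdorff distance
print(hausdorff(x_points, y_points, q=95))  # percentile-based variant
```

With q=100 the percentile reduces to the maximum, which recovers the classic definition; lowering q makes the metric less sensitive to a few outlying points.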
III-C Implementation Details
We conduct all of our experiments on the Google Colaboratory service. Our proposed network is developed in Keras, using the TensorFlow backend. For each view, we train our network with 5-fold cross-validation on the 285 cases of the BRATS 2018 training set (228 cases for training and the other 57 cases for validation in each fold). Finally, the evaluation results for each view are obtained with an ensemble in which we average the Softmax outputs of the five networks. The BRATS 2018 and 2017 validation sets are used to evaluate our method, and all reported results were computed by the online evaluation platform. Our final results are available in the leaderboard section of these challenges under the title "IUST_ ICCKE2019".
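The fold sizes and the ensemble averaging described above can be sketched as follows. The split is made over consecutive case indices without shuffling, which is a simplification assumed for this illustration, and both function names are hypothetical.

```python
def k_fold_splits(num_cases, k=5):
    """Return (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(num_cases))
    fold_size = num_cases // k
    splits = []
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder when num_cases % k != 0.
        end = start + fold_size if fold < k - 1 else num_cases
        validation = indices[start:end]
        train = indices[:start] + indices[end:]
        splits.append((train, validation))
    return splits


def average_softmax(predictions):
    """Average the per-class probabilities predicted by several models."""
    num_models = len(predictions)
    num_classes = len(predictions[0])
    return [
        sum(prediction[c] for prediction in predictions) / num_models
        for c in range(num_classes)
    ]


splits = k_fold_splits(285, k=5)
print([(len(train), len(val)) for train, val in splits])
print(average_softmax([[0.6, 0.4], [0.2, 0.8]]))
```

Each of the five networks sees 228 training cases and 57 validation cases, and the ensemble output for a pixel is the mean of the five Softmax vectors.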
For pre-processing the data, firstly, the N4ITK bias field correction algorithm is applied to each MRI modality to correct the inhomogeneity of these images. Secondly, a percentage of the top and bottom intensities is removed, and then each modality is normalized to zero mean and unit variance. To reduce overfitting, two kinds of data augmentation are applied at random: vertical and horizontal flipping. A Gaussian noise layer with a standard deviation of 0.01 is also placed at the input of the architecture in order to tackle the noisy nature of MRI images. To train the networks, we adopt SGD with a momentum of 0.9 and a learning rate of 8e-3.
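The intensity normalization and flipping steps can be sketched as below. Several choices are assumptions made for illustration: extreme intensities are clipped rather than discarded, the `clip_percent` value of 1.0 is a hypothetical placeholder, each flip is applied with probability 0.5, and bias field correction is omitted because it requires a dedicated imaging library.

```python
import math
import random


def clip_and_normalize(values, clip_percent=1.0):
    """Clip extreme intensities, then scale to zero mean and unit variance."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * clip_percent / 100.0)  # number of values cut at each end
    low, high = ordered[k], ordered[n - 1 - k]
    clipped = [min(max(v, low), high) for v in values]
    mean = sum(clipped) / n
    variance = sum((v - mean) ** 2 for v in clipped) / n
    std = math.sqrt(variance) or 1.0  # avoid division by zero on flat input
    return [(v - mean) / std for v in clipped]


def random_flip(image, rng=random):
    """Randomly apply a horizontal and/or a vertical flip to a 2D image."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]  # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1]                   # vertical flip
    return image


intensities = [float(v) for v in range(200)]
normalized = clip_and_normalize(intensities)
print(round(sum(normalized) / len(normalized), 6))
print(random_flip([[1, 2], [3, 4]], random.Random(0)))
```

When a slice is flipped for augmentation, the same flip has to be applied to its label map so that image and annotation stay aligned.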
| Model | Dice ET | Dice WT | Dice TC |
|---|---|---|---|
| UNet + Minor Modifications | 0.768 | 0.883 | 0.807 |
III-D Ablation Study
In this section, we show the effectiveness of the modifications employed in the proposed network. The modifications are divided into two parts: the Minor Modifications (Residual Units, strided convolution, PReLU, and BN) and the attention mechanism, which weights the multi-level features by adopting SE Blocks. For the sake of simplicity, all the models in this section are trained on the first fold and the Axial view of the BRATS 2018 training set and then evaluated on the BRATS 2018 validation set. As seen from Table I, the attention mechanism improves the overall performance significantly, which shows the beneficial effect of weighting multi-level features.
III-E The Proposed Method Results
In this section, the evaluation results of our proposed method are reported. Table II presents the ensemble results of 5-fold cross-validation on the Axial and Coronal views along with the results of the Multi-view Fusion technique. As seen from Table II, Multi-view Fusion improves the overall performance, especially in terms of the Hausdorff distance metric, which shows the beneficial effect of considering the 3D nature of the data. Fig. 4 shows a visual comparison of the networks in Table II for an HGG tumor and an LGG tumor. In this figure, the input slices and the segmentation masks are shown in the Axial and Coronal views. We also provide Dice score and Hausdorff95 box plots in Fig. 5 for the three regions.
III-F Comparison with the Existing Methods
In this section, the proposed method is evaluated on the validation sets of the BRATS 2017 and BRATS 2018 challenges and compared with the best methods of these challenges in terms of Dice score and the Hausdorff distance metric. As previously mentioned, all results are obtained from the online evaluation platform. Table III and Table IV show the evaluation results on the BRATS 2018 and 2017 validation sets, respectively. Although our network is a 2D architecture, it performs favorably against state-of-the-art methods, especially in terms of ET Dice score and Hausdorff95 in all three sub-regions.
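TABLE III: Evaluation results on the BRATS 2018 validation set.

| Method | Type | Dice ET | Dice WT | Dice TC | Hausdorff95 ET | Hausdorff95 WT | Hausdorff95 TC |
|---|---|---|---|---|---|---|---|
| Isensee et al. | 3D | 0.804 | 0.908 | 0.854 | 3.12 | 4.97 | 7.04 |
| McKinley et al. | 3D | 0.796 | 0.903 | 0.847 | 3.55 | 4.17 | 4.93 |
| Zhou et al. | 3D | 0.792 | 0.907 | 0.835 | 2.80 | 4.48 | 7.07 |
| Gholami et al. | 3D | 0.791 | 0.908 | 0.819 | - | - | - |
| Albiol et al. | 3D | 0.773 | 0.881 | 0.777 | - | - | - |
| Chen et al. | 3D | 0.733 | 0.888 | 0.808 | 4.64 | 5.51 | 8.14 |

TABLE IV: Evaluation results on the BRATS 2017 validation set.

| Method | Type | Dice ET | Dice WT | Dice TC | Hausdorff95 ET | Hausdorff95 WT | Hausdorff95 TC |
|---|---|---|---|---|---|---|---|
| Wang et al. | 3D | 0.786 | 0.905 | 0.838 | 3.28 | 3.89 | 6.48 |
| Kamnitsas et al. | 3D | 0.757 | 0.902 | 0.820 | 4.22 | 4.56 | 6.11 |
| Isensee et al. | 3D | 0.732 | 0.896 | 0.797 | 4.55 | 6.97 | 9.48 |
| Feng et al. | 3D | 0.751 | 0.896 | 0.799 | 4.76 | 12.53 | 8.69 |
| Jesson et al. | 3D | 0.713 | 0.899 | 0.751 | 6.98 | 4.16 | 8.65 |
| Andermatt et al. | 3D | 0.711 | 0.893 | 0.734 | 4.19 | 4.61 | 8.19 |
| Jungo et al. | 2D | 0.674 | 0.884 | 0.726 | 6.63 | 7.93 | 10.91 |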
In this paper, we propose an improved version of the 2D UNet architecture for segmenting brain tumors. Although the architecture is 2D and has a low number of parameters, the model can benefit from the 3D contextual information of the input images by using the Multi-View technique. Moreover, since giving equal importance to every feature map after the concatenation of low-level and high-level features may confuse the model, we use an attention mechanism to extract discriminative features. By adopting these techniques, we achieve average Dice scores of 0.813, 0.895, and 0.823 for ET, WT, and TC, respectively, on the BRATS 2018 validation set.
References
[1] TensorFlow: large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
[2] (2018) Extending 2d deep learning architectures to 3d image segmentation problems. In International MICCAI Brainlesion Workshop, pp. 73–82.
[3] Automated segmentation of multiple sclerosis lesions using multi-dimensional gated recurrent units. In International MICCAI Brainlesion Workshop, pp. 31–42.
[4] (2018) Deep learning radiomics algorithm for gliomas (drag) model: a novel approach using 3d unet based deep convolutional neural network for predicting survival in gliomas. In International MICCAI Brainlesion Workshop, pp. 369–379.
[5] (2017) Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features. Scientific data 4, pp. 170117.
[6] Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pp. 177–186.
[7] (2018) S3D-unet: separable 3d u-net for brain tumor segmentation. In International MICCAI Brainlesion Workshop, pp. 358–368.
[8] (2015) Keras. https://keras.io
[9] (2016) 3D u-net: learning dense volumetric segmentation from sparse annotation. In International conference on medical image computing and computer-assisted intervention, pp. 424–432.
[10] (2017) Patch-based 3d u-net for brain tumor segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI).
[11] An inverse problem formulation for parameter estimation of a reaction–diffusion model of low grade gliomas. Journal of mathematical biology 72 (1-2), pp. 409–433.
[12] (2018) A novel domain adaptation framework for medical image segmentation. In International MICCAI Brainlesion Workshop, pp. 289–298.
[13] (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. pp. 580–587.
[14] (2017) Brain tumor segmentation with deep neural networks. Medical image analysis 35, pp. 18–31.
[15] Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034.
[16] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.
[17] (2016) Identity mappings in deep residual networks. In European conference on computer vision, pp. 630–645.
[18] (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141.
[19] (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
[20] (2017) Brain tumor segmentation and radiomics survival prediction: contribution to the brats 2017 challenge. In International MICCAI Brainlesion Workshop, pp. 287–297.
[21] (2018) No new-net. In International MICCAI Brainlesion Workshop, pp. 234–244.
[22] (2017) Brain tumor segmentation using a 3d fcn with multi-scale loss. In International MICCAI Brainlesion Workshop, pp. 392–402.
[23] (2017) Towards uncertainty-assisted brain tumor segmentation and survival prediction. In International MICCAI Brainlesion Workshop, pp. 474–485.
[24] (2017) Ensembles of multiple models and architectures for robust brain tumour segmentation. In International MICCAI Brainlesion Workshop, pp. 450–462.
[25] (2017) Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation. Medical image analysis 36, pp. 61–78.
[26] (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105.
[27] (2007) Targeted delivery of antitumoral therapy to glioma and other malignancies with synthetic chlorotoxin (tm-601). Expert opinion on drug delivery 4 (2), pp. 175–186.
[28] (2018) Ensembles of densely-connected cnns with label-uncertainty for brain tumor segmentation. In International MICCAI Brainlesion Workshop, pp. 456–465.
[29] (2014) The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging 34 (10), pp. 1993–2024.
[30] (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571.
[31] 3D mri brain tumor segmentation using autoencoder regularization. In International MICCAI Brainlesion Workshop, pp. 311–320.
[32] (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241.
[33] Change detection in sar images using deep belief network: a new training approach based on morphological images. IET Image Processing.
[34] (2014) A multilayer grow-or-go model for gbm: effects of invasive cells and anti-angiogenesis on growth. Bulletin of mathematical biology 76 (9), pp. 2306–2333.
[35] (2017) Boundary-aware fully convolutional network for brain tumor segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 433–441.
[36] (2017) Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In Deep learning in medical image analysis and multimodal learning for clinical decision support, pp. 240–248.
[37] (2015) Metrics for evaluating 3d medical image segmentation: analysis, selection, and tool. BMC medical imaging 15 (1), pp. 29.
[38] (2010) N4ITK: improved n3 bias correction. IEEE transactions on medical imaging 29 (6), pp. 1310.
[39] (2017) Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. In International MICCAI Brainlesion Workshop, pp. 178–190.
[40] (2018) Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation. In International MICCAI Brainlesion Workshop, pp. 61–72.
[41] (2018) Learning contextual and attentive information for brain tumor segmentation. In International MICCAI Brainlesion Workshop, pp. 497–507.