Robust Semantic Segmentation of Brain Tumor Regions from 3D MRIs

by Andriy Myronenko, et al.

The multimodal brain tumor segmentation challenge (BraTS) brings together researchers to improve automated methods for 3D MRI brain tumor segmentation. Tumor segmentation is one of the fundamental vision tasks necessary for diagnosis and treatment planning of the disease. Previous years' winning methods were all deep-learning based, thanks to the advent of modern GPUs, which allow fast optimization of deep convolutional neural network architectures. In this work, we explore best practices of 3D semantic segmentation, including a conventional encoder-decoder architecture as well as combined loss functions, in an attempt to further improve the segmentation accuracy. We evaluate the method on the BraTS 2019 challenge.



1 Introduction

Brain tumors are categorized into primary and secondary tumor types. Primary brain tumors originate from brain cells, whereas secondary tumors metastasize into the brain from other organs. The most common type of primary brain tumors are gliomas, which arise from brain glial cells. Gliomas can be of low-grade (LGG) and high-grade (HGG) subtypes. High-grade gliomas are an aggressive type of malignant brain tumor that grow rapidly, usually require surgery and radiotherapy and have poor survival prognosis. Magnetic Resonance Imaging (MRI) is a key diagnostic tool for brain tumor analysis, monitoring and surgery planning. Usually, several complementary 3D MRI modalities are acquired - such as T1, T1 with contrast agent (T1c), T2 and Fluid Attenuated Inversion Recovery (FLAIR) - to emphasize different tissue properties and areas of tumor spread. For example, the contrast agent, usually gadolinium, emphasizes hyperactive tumor subregions in the T1c MRI modality.

Automated segmentation of 3D brain tumors can save physicians time and provide an accurate reproducible solution for further tumor analysis and monitoring. Recently, deep learning based segmentation techniques surpassed traditional computer vision methods for dense semantic segmentation. Convolutional neural networks (CNN) are able to learn from examples and demonstrate state-of-the-art segmentation accuracy both in 2D natural images [5, 7] and in 3D medical image modalities [15].

The Multimodal Brain Tumor Segmentation Challenge (BraTS) aims to evaluate state-of-the-art methods for the segmentation of brain tumors by providing a 3D MRI dataset with ground truth tumor segmentation labels annotated by physicians [4, 14, 3, 1, 2]. This year, the BraTS 2019 training dataset included 335 cases, each with four 3D MRI modalities (T1, T1c, T2 and FLAIR) rigidly aligned, resampled to 1x1x1 mm isotropic resolution and skull-stripped. The input image size is 240x240x155. The data were collected from multiple institutions, using various MRI scanners. Annotations include 3 tumor subregions: the enhancing tumor, the peritumoral edema, and the necrotic and non-enhancing tumor core. The annotations were combined into 3 nested subregions: whole tumor (WT), tumor core (TC) and enhancing tumor (ET), as shown in Figure 1. Two additional datasets without the ground truth labels were provided for validation and testing. These datasets required participants to upload the segmentation masks to the organizers' server for evaluation. The validation dataset (125 cases) allowed multiple submissions and was designed for intermediate evaluations. The testing dataset allowed only a single submission, and is used to calculate the final challenge ranking.
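The combination of the three annotated subregions into nested WT, TC and ET masks can be sketched as follows. This is only an illustration: the integer label values (1, 2, 4) follow the commonly used BraTS convention and are an assumption here, since the text does not state them, and `to_nested_regions` is a hypothetical helper name.

```python
import numpy as np

# Assumed label values (common BraTS convention, not stated in the text):
# 1 = necrotic and non-enhancing tumor core, 2 = peritumoral edema, 4 = enhancing tumor.
NCR_NET, EDEMA, ENHANCING = 1, 2, 4


def to_nested_regions(label_map):
    """Convert an integer label volume into 3 nested binary channels (WT, TC, ET)."""
    wt = np.isin(label_map, (NCR_NET, EDEMA, ENHANCING))  # whole tumor: all labels
    tc = np.isin(label_map, (NCR_NET, ENHANCING))         # tumor core: no edema
    et = label_map == ENHANCING                           # enhancing tumor only
    return np.stack([wt, tc, et]).astype(np.uint8)


# Toy 1x2x2 volume containing background and each of the three labels once.
labels = np.array([[[0, 1], [2, 4]]])
regions = to_nested_regions(labels)
```

Because the regions are nested, every ET voxel is also a TC voxel and every TC voxel is also a WT voxel, which is why the three channels can be predicted independently with a sigmoid rather than with a softmax.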

In this work, we describe our semantic segmentation approach for volumetric 3D brain tumor segmentation from multimodal 3D MRIs and participate in the BraTS 2019 challenge.

2 Related work

In the previous year, the top BraTS 2018 submissions included Myronenko [16], Isensee et al. [11], McKinley et al. [13] and Zhou et al. [19]. In our previous work [16], we explored how an additional decoder for a secondary task can impose additional structure on the network. Isensee et al. [11] demonstrated that a generic U-net architecture with a few minor modifications is enough to achieve competitive performance. McKinley et al. [13] proposed a segmentation CNN in which a DenseNet [9] structure with dilated convolutions was embedded in a U-net-like network. Finally, Zhou et al. [19] proposed to use an ensemble of different networks: taking into account multi-scale context information, segmenting 3 tumor subregions in cascade with shared backbone weights and adding an attention block.

Here, we generally follow the previous year's submission [16], but instead of a secondary task decoder we explore various architecture design choices and complementary loss functions. We also utilize multi-GPU systems for data parallelism to be able to use larger batch sizes.

3 Methods

Our segmentation approach generally follows [16], with an encoder-decoder based CNN architecture.

3.1 Encoder part

The encoder part uses ResNet [8] blocks, where each block consists of two convolutions with normalization and ReLU, followed by an additive identity skip connection. For normalization, we experimented with Group Normalization (GN) [18], Instance Normalization [17] and Batch Normalization [10]. We follow a common CNN approach to progressively downsize image dimensions by 2 and simultaneously increase feature size by 2. For downsizing we use strided convolutions. All convolutions are 3x3x3 with the initial number of filters equal to 32. The encoder part structure is shown in Table 1. The encoder endpoint has size 256x20x24x16, and is 8 times spatially smaller than the input image. We decided against further downsizing to preserve more spatial content.

Name           Ops                                    Repeat  Output size
Input                                                         4x160x192x128
InitConv       Conv                                   1       32x160x192x128
EncoderBlock0  GN, ReLU, Conv, GN, ReLU, Conv, AddId  1       32x160x192x128
EncoderDown1   Conv stride 2                          1       64x80x96x64
EncoderBlock1  GN, ReLU, Conv, GN, ReLU, Conv, AddId  2       64x80x96x64
EncoderDown2   Conv stride 2                          1       128x40x48x32
EncoderBlock2  GN, ReLU, Conv, GN, ReLU, Conv, AddId  2       128x40x48x32
EncoderDown3   Conv stride 2                          1       256x20x24x16
EncoderBlock3  GN, ReLU, Conv, GN, ReLU, Conv, AddId  4       256x20x24x16

Table 1: Encoder structure, where GN stands for group normalization (with group size of 8), Conv - 3x3x3 convolution, AddId - addition of identity/skip connection. The Repeat column shows the number of repetitions of the block. We refer to the final output of the encoder as the encoder endpoint.
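The output sizes in Table 1 follow directly from halving each spatial dimension and doubling the number of features at every downsizing step. A minimal sketch of that arithmetic (the function name `encoder_shapes` is hypothetical):

```python
def encoder_shapes(in_spatial=(160, 192, 128), init_filters=32, num_down=3):
    """Return (channels, D, H, W) after the initial convolution and each downsizing step."""
    channels, spatial = init_filters, tuple(in_spatial)
    shapes = [(channels, *spatial)]
    for _ in range(num_down):
        channels *= 2                              # feature size doubles
        spatial = tuple(s // 2 for s in spatial)   # stride-2 convolution halves each axis
        shapes.append((channels, *spatial))
    return shapes


shapes = encoder_shapes()
for level, shape in enumerate(shapes):
    print(level, "x".join(str(v) for v in shape))
```

With three downsizing steps the last entry is 256x20x24x16, the encoder endpoint, which is 8 times smaller than the 160x192x128 crop along each spatial axis.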

3.2 Decoder part

The decoder structure is similar to the encoder one, but with a single block per spatial level. Each decoder level begins with upsizing: reducing the number of features by a factor of 2 (using 1x1x1 convolutions) and doubling the spatial dimension (using 3D bilinear upsampling), followed by an addition of the encoder output of the equivalent spatial level. The end of the decoder has the same spatial size as the original image, and the number of features equal to the initial input feature size, followed by a 1x1x1 convolution into 3 channels and a sigmoid function. The decoder structure is shown in Table 2.


Name           Ops                                    Repeat  Output size
DecoderUp2     Conv1, UpLinear, +EncoderBlock2        1       128x40x48x32
DecoderBlock2  GN, ReLU, Conv, GN, ReLU, Conv, AddId  1       128x40x48x32
DecoderUp1     Conv1, UpLinear, +EncoderBlock1        1       64x80x96x64
DecoderBlock1  GN, ReLU, Conv, GN, ReLU, Conv, AddId  1       64x80x96x64
DecoderUp0     Conv1, UpLinear, +EncoderBlock0        1       32x160x192x128
DecoderBlock0  GN, ReLU, Conv, GN, ReLU, Conv, AddId  1       32x160x192x128
DecoderEnd     Conv1, Sigmoid                         1       3x160x192x128

Table 2: Decoder structure, where GN stands for group normalization (with group size of 8), Conv - 3x3x3 convolution, Conv1 - 1x1x1 convolution, AddId - addition of identity/skip connection, UpLinear - 3D linear spatial upsampling.
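The upsizing step at the start of each decoder level (a 1x1x1 convolution, spatial upsampling by 2, then addition of the matching encoder output) can be sketched in NumPy. This is a toy illustration under stated assumptions: nearest-neighbour repetition stands in for the linear upsampling named in the text, the tensor sizes are deliberately tiny, and all function names are hypothetical.

```python
import numpy as np


def conv1x1x1(x, weight):
    """Pointwise convolution: mixes channels independently at every voxel.

    x has shape (C_in, D, H, W); weight has shape (C_out, C_in).
    """
    return np.einsum("oc,cdhw->odhw", weight, x)


def upsample2(x):
    """Double each spatial axis by nearest-neighbour repetition.

    This is a simplification standing in for linear upsampling.
    """
    for axis in (1, 2, 3):
        x = np.repeat(x, 2, axis=axis)
    return x


def decoder_up(x, skip, weight):
    """Halve the channels, double the spatial size, then add the encoder output."""
    return upsample2(conv1x1x1(x, weight)) + skip


rng = np.random.default_rng(0)
# Toy sizes (hypothetical): 8 channels on a 2x3x2 grid instead of 256 on 20x24x16.
endpoint = rng.standard_normal((8, 2, 3, 2))
skip = rng.standard_normal((4, 4, 6, 4))
weight = rng.standard_normal((4, 8))
out = decoder_up(endpoint, skip, weight)
```

The addition only works because the upsized tensor and the encoder output of the same level share both the channel count and the spatial size, which is what the matching rows of Tables 1 and 2 express.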

3.3 Loss

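We use a hybrid loss function that consists of the following terms:

$$L = L_{dice} + L_{ACL} + L_{focal}$$

$L_{dice}$ is a soft dice loss [15] applied to the decoder output $p_{pred}$ to match the segmentation mask $p_{true}$:

$$L_{dice} = 1 - \frac{2 \sum p_{true} \, p_{pred}}{\sum p_{true}^2 + \sum p_{pred}^2 + \epsilon}$$

where summation is voxel-wise, and $\epsilon$ is a small constant to avoid zero division. Since the output of the segmentation decoder has 3 channels (predictions for each tumor subregion), we simply add the three dice loss functions together. $L_{ACL}$ is the 3D extension of supervised active contour loss [6] that consists of volumetric and length terms:

$$L_{ACL} = L_{vol} + L_{length}$$

in which:

$$L_{vol} = \left| \sum p_{pred} \, (c_1 - p_{true})^2 \right| + \left| \sum (1 - p_{pred}) \, (c_2 - p_{true})^2 \right|$$

$$L_{length} = \sum \sqrt{ \left| (\nabla_x p_{pred})^2 + (\nabla_y p_{pred})^2 + (\nabla_z p_{pred})^2 \right| + \epsilon }$$

where $c_1$ and $c_2$ represent the energy of the foreground and background. $L_{focal}$ is a focal loss function [12] defined as:

$$L_{focal} = -\frac{1}{N} \sum (1 - p_{pred})^{\gamma} \, p_{true} \log(p_{pred} + \epsilon)$$

where $N$ is the total number of voxels and $\gamma$ is the focusing parameter.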
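The soft dice and focal terms described above can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' implementation: it assumes the usual soft dice form with squared terms in the denominator and a focal loss on the positive class, the values `gamma=2.0` and `eps=1e-8` are placeholder assumptions, and the active contour term is omitted for brevity.

```python
import numpy as np


def soft_dice_loss(p_pred, p_true, eps=1e-8):
    """Soft dice loss, computed per channel and summed; inputs have shape (C, D, H, W)."""
    axes = (1, 2, 3)  # voxel-wise summation within each channel
    intersection = (p_true * p_pred).sum(axis=axes)
    denom = (p_true ** 2).sum(axis=axes) + (p_pred ** 2).sum(axis=axes) + eps
    return float((1.0 - 2.0 * intersection / denom).sum())


def focal_loss(p_pred, p_true, gamma=2.0, eps=1e-8):
    """Focal loss on the positive class, averaged over all elements of the input."""
    per_voxel = -((1.0 - p_pred) ** gamma) * p_true * np.log(p_pred + eps)
    return float(per_voxel.mean())


# Toy 3-channel mask with a small foreground block in every channel.
mask = np.zeros((3, 4, 4, 4))
mask[:, 1:3, 1:3, 1:3] = 1.0

perfect = soft_dice_loss(mask, mask)        # close to 0
worst = soft_dice_loss(1.0 - mask, mask)    # close to 3 (one per channel)
uncertain = focal_loss(np.full_like(mask, 0.5), mask)
```

A perfect prediction drives each per-channel dice term to zero, while a prediction with no overlap contributes 1 per channel, so the summed dice loss over the three subregions ranges from 0 to 3.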

3.4 Optimization

We use the Adam optimizer and progressively decrease the learning rate from its initial value as a function of $e$ and $N_e$, where $e$ is an epoch counter and $N_e$ is the total number of epochs (300 in our case). We draw input images in random order (ensuring that each training image is drawn once per epoch).
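One common way to decrease a learning rate as a function of the epoch counter and the total number of epochs is polynomial decay, sketched below together with a per-epoch shuffle. The decay form, `initial_lr=1e-4` and `power=0.9` are illustrative assumptions and are not taken from the text; only the 300 epochs and the 335 training cases come from the paper.

```python
import random


def poly_lr(epoch, total_epochs=300, initial_lr=1e-4, power=0.9):
    """Polynomially decayed learning rate for a zero-based epoch counter.

    initial_lr and power are placeholder values (assumptions).
    """
    return initial_lr * (1.0 - epoch / total_epochs) ** power


def epoch_order(num_cases, seed):
    """Random visiting order in which every training case appears exactly once."""
    order = list(range(num_cases))
    random.Random(seed).shuffle(order)
    return order


schedule = [poly_lr(e) for e in range(300)]
order = epoch_order(335, seed=0)
```

Shuffling a full index list, rather than sampling with replacement, is what guarantees that each training image is drawn once per epoch.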

3.5 Regularization

We use L2 norm regularization on the convolutional kernel parameters. We also use spatial dropout after the initial encoder convolution.

3.6 Data preprocessing and augmentation

We normalize all input images to have zero mean and unit std (based on non-zero voxels only). We apply a random (per channel) intensity shift (expressed as a fraction of the image std) and a random intensity scale on input image channels. We also apply a random axis mirror flip (for all 3 axes).
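The normalization and augmentation steps above can be sketched in NumPy as follows. The shift range, scale range and flip probability used here (`shift=0.1`, `scale=0.1`, `flip_prob=0.5`) are placeholder assumptions, not values taken from the text, and both function names are hypothetical.

```python
import numpy as np


def normalize_nonzero(image):
    """Per-channel zero-mean, unit-std normalization using non-zero voxels only."""
    out = image.astype(np.float32).copy()
    for c in range(out.shape[0]):
        mask = out[c] != 0
        if mask.any():
            vals = out[c][mask]
            out[c][mask] = (vals - vals.mean()) / (vals.std() + 1e-8)
    return out  # background voxels stay exactly zero


def augment(image, rng, shift=0.1, scale=0.1, flip_prob=0.5):
    """Random per-channel intensity shift and scale, then random mirror flips."""
    out = image.copy()
    for c in range(out.shape[0]):
        std = out[c].std()
        factor = rng.uniform(1.0 - scale, 1.0 + scale)
        offset = rng.uniform(-shift, shift) * std
        out[c] = out[c] * factor + offset
    for axis in (1, 2, 3):  # the three spatial axes
        if rng.random() < flip_prob:
            out = np.flip(out, axis=axis)
    return np.ascontiguousarray(out)
```

In a real training pipeline the same flips would also have to be applied to the label volume so that image and mask stay aligned; the intensity changes apply to the image only.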


4 Results

Figure 1: A typical segmentation example with true and predicted labels overlaid over T1c MRI axial, sagittal and coronal slices. The whole tumor (WT) class includes all visible labels (a union of green, yellow and red labels), the tumor core (TC) class is a union of red and yellow, and the enhancing tumor core (ET) class is shown in yellow (a hyperactive tumor part). The predicted segmentation results match the ground truth well.

We implemented our network in PyTorch and trained it on NVIDIA Tesla V100 32GB GPUs using the BraTS 2019 training dataset (335 cases) without any additional in-house data. During training we used a random crop of size 160x192x128, which ensures that most image content remains within the crop area. We concatenated the 4 available 3D MRI modalities into a 4-channel image as an input. The output of the network is 3 nested tumor subregions (after the sigmoid).
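Stacking the four modalities into a 4-channel volume and taking a random 160x192x128 crop from the 240x240x155 input can be sketched as below. The variable names and the use of zero-filled `uint8` placeholder volumes are assumptions made only to keep the example small.

```python
import numpy as np


def random_crop(image, crop_size=(160, 192, 128), rng=None):
    """Take a random spatial crop from a (C, D, H, W) volume."""
    if rng is None:
        rng = np.random.default_rng()
    starts = [
        int(rng.integers(0, dim - size + 1))  # upper bound is exclusive
        for dim, size in zip(image.shape[1:], crop_size)
    ]
    d, h, w = starts
    cd, ch, cw = crop_size
    return image[:, d:d + cd, h:h + ch, w:w + cw]


# Placeholder volumes standing in for T1, T1c, T2 and FLAIR (zeros, for shape only).
t1, t1c, t2, flair = (np.zeros((240, 240, 155), dtype=np.uint8) for _ in range(4))
image = np.stack([t1, t1c, t2, flair])            # shape (4, 240, 240, 155)
crop = random_crop(image, rng=np.random.default_rng(0))
```

The same crop offsets would need to be applied to the label volume during training.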

We report the results of our approach on BraTS 2019 validation (125 cases). We uploaded our segmentation results to the BraTS 2019 server for evaluation of per class dice, sensitivity, specificity and Hausdorff distances.

The results of our model on the BraTS 2019 data are shown in Table 3 for the validation dataset and in Table 4 for the testing dataset.

                        Dice                 Hausdorff (mm)
Validation dataset      ET     WT     TC     ET     WT     TC
Single Model (batch 8)  0.800  0.894  0.834  3.921  5.89   6.562

Table 3: BraTS 2019 validation dataset results. Mean Dice and Hausdorff measurements of the proposed segmentation method. ET - enhancing tumor core, WT - whole tumor, TC - tumor core.

                        Dice                 Hausdorff (mm)
Testing dataset         ET     WT     TC     ET     WT     TC
Ensemble                0.826  0.882  0.837  2.203  4.713  3.968

Table 4: BraTS 2019 testing dataset results. Mean Dice and Hausdorff measurements of the proposed segmentation method. ET - enhancing tumor core, WT - whole tumor, TC - tumor core.

Time-wise, each training epoch (335 cases) on a single GPU (NVIDIA Tesla V100 32GB) takes approximately 10 min. Training the model for 300 epochs takes approximately 2 days. We trained the model on an NVIDIA DGX-1 server (which includes 8 V100 GPUs interconnected with NVLink); this allowed us to train the model in approximately 8 hours. The inference time is 0.4 sec for a single model on a single V100 GPU.
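The timing figures above can be checked with a little arithmetic. The inputs are the approximate values quoted in the text, so the derived numbers are rough estimates only.

```python
minutes_per_epoch = 10         # approximate single-GPU epoch time from the text
epochs = 300
gpus = 8
reported_multi_gpu_hours = 8   # approximate DGX-1 training time from the text

single_gpu_hours = minutes_per_epoch * epochs / 60     # total single-GPU training time
single_gpu_days = single_gpu_hours / 24
ideal_multi_gpu_hours = single_gpu_hours / gpus        # perfect linear scaling
speedup = single_gpu_hours / reported_multi_gpu_hours
scaling_efficiency = ideal_multi_gpu_hours / reported_multi_gpu_hours

print(f"single GPU: {single_gpu_hours:.1f} h ({single_gpu_days:.2f} days)")
print(f"ideal on {gpus} GPUs: {ideal_multi_gpu_hours:.2f} h")
print(f"speedup: {speedup:.2f}x, efficiency: {scaling_efficiency:.0%}")
```

Under these rounded inputs, 300 epochs at 10 minutes each come to 50 hours, which is consistent with the stated figure of about 2 days, and the 8-hour multi-GPU run corresponds to a speedup of roughly 6x over a single GPU.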

5 Discussion and Conclusion

In this work, we described a semantic segmentation network for brain tumor segmentation from multimodal 3D MRIs for the BraTS 2019 challenge. We have experimented with various normalization functions, and found groupnorm and instancenorm to perform equivalently, whereas batchnorm was always inferior, which could be due to the fact that the largest batch size attempted was only 16. Since instancenorm is simpler to understand and implement, we used it for normalization by default. A multi-GPU system, such as the DGX-1 server, contains 8 GPUs, which allows a data-parallel implementation with a batch size of 8 (where each GPU gets a batch of 1). We found the performance of the multi-GPU system to be equivalent to the single-GPU (batch 1) case, thus we used a batch of 8 by default, since it is almost 8 times faster to train. We have also experimented with more sophisticated data augmentation techniques, including random histogram matching, affine image transforms, rotations and random image filtering, which did not demonstrate any additional improvements. Increasing the network depth further did not improve the performance, but increasing the network width (the number of features/filters) consistently improved the results. Our BraTS 2019 final testing dataset results were 0.826, 0.882 and 0.837 average dice for enhancing tumor core, whole tumor and tumor core, respectively.

References


  • [1] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., John Freymann, K.F., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the tcga-gbm collection. The Cancer Imaging Archive (2017)
  • [2] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., John Freymann, K.F., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the tcga-lgg collection. The Cancer Imaging Archive (2017)
  • [3] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features. Scientific data 4 (9 2017)
  • [4] Bakas, S., Reyes, M., et Int, Menze, B.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. In: arXiv:1811.02629 (2018)
  • [5] Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv:1802.02611 (2018)
  • [6] Chen, X., Williams, B.M., Vallabhaneni, S.R., Czanner, G., Williams, R., Zheng, Y.: Learning active contour models for medical image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 11632–11640 (2019)
  • [7] Hatamizadeh, A., Sengupta, D., Terzopoulos, D.: End-to-end deep convolutional active contours for image segmentation. arXiv preprint arXiv:1909.13359 (2019)
  • [8] He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: European Conference on Computer Vision (ECCV) (2016)
  • [9] Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2261–2269 (2017)
  • [10] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning (ICML). pp. 448–456 (2015)
  • [11] Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H.: No new-net. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2018). Multimodal Brain Tumor Segmentation Challenge (BraTS 2018). BrainLes 2018 workshop. LNCS, Springer (2018)
  • [12] Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision. pp. 2980–2988 (2017)
  • [13] McKinley, R., Meier, R., Wiest, R.: Ensembles of densely-connected cnns with label-uncertainty for brain tumor segmentation. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2018). Multimodal Brain Tumor Segmentation Challenge (BraTS 2018). BrainLes 2018 workshop. LNCS, Springer (2018)
  • [14] Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., Lanczi, L., Gerstner, E.R., Weber, M.A., Arbel, T., Avants, B.B., Ayache, N., Buendia, P., Collins, D.L., Cordier, N., Corso, J.J., Criminisi, A., Das, T., Delingette, H., Demiralp, C., Durst, C.R., Dojat, M., Doyle, S., Festa, J., Forbes, F., Geremia, E., Glocker, B., Golland, P., Guo, X., Hamamci, A., Iftekharuddin, K.M., Jena, R., John, N.M., Konukoglu, E., Lashkari, D., Mariz, J.A., Meier, R., Pereira, S., Precup, D., Price, S.J., Raviv, T.R., Reza, S.M.S., Ryan, M.T., Sarikaya, D., Schwartz, L.H., Shin, H.C., Shotton, J., Silva, C.A., Sousa, N., Subbanna, N.K., Szekely, G., Taylor, T.J., Thomas, O.M., Tustison, N.J., Unal, G.B., Vasseur, F., Wintermark, M., Ye, D.H., Zhao, L., Zhao, B., Zikic, D., Prastawa, M., Reyes, M., Leemput, K.V.: The multimodal brain tumor image segmentation benchmark (brats). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2015)
  • [15] Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: Fourth International Conference on 3D Vision (3DV) (2016)
  • [16] Myronenko, A.: 3D MRI brain tumor segmentation using autoencoder regularization. In: BrainLes, Medical Image Computing and Computer Assisted Intervention (MICCAI). pp. 311–320. LNCS, Springer (2018)
  • [17] Ulyanov, D., Vedaldi, A., Lempitsky, V.S.: Instance normalization: The missing ingredient for fast stylization. In: CVPR (2016)
  • [18] Wu, Y., He, K.: Group normalization. In: European Conference on Computer Vision (ECCV) (2018)
  • [19] Zhou, C., Chen, S., Ding, C., Tao, D.: Learning contextual and attentive information for brain tumor segmentation. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2018). Multimodal Brain Tumor Segmentation Challenge (BraTS 2018). BrainLes 2018 workshop. LNCS, Springer (2018)