Reciprocal Adversarial Learning for Brain Tumor Segmentation: A Solution to BraTS Challenge 2021 Segmentation Task

by Himashi Peiris et al.
Monash University

This paper proposes an adversarial learning-based training approach for the brain tumor segmentation task. In this concept, the 3D segmentation network learns from dual reciprocal adversarial learning approaches. To enhance generalization across segmentation predictions and to make the segmentation network robust, we adopt the Virtual Adversarial Training (VAT) approach, generating additional adversarial examples by adding adversarial noise to the original patient data. By incorporating a critic that acts as a quantitative subjective referee, the segmentation network learns from the uncertainty information associated with segmentation results. We trained and evaluated the network on the RSNA-ASNR-MICCAI BraTS 2021 dataset. On the online validation dataset, our approach achieved Dice Similarity Scores of 81.38%, 90.77% and 85.39% and Hausdorff Distances (95%) of 21.83 mm, 5.37 mm and 8.56 mm for the enhancing tumor, whole tumor and tumor core, respectively. Similarly, on the final test dataset it achieved Dice Similarity Scores of 84.55%, 90.46% and 85.30% and Hausdorff Distances (95%) of 13.48 mm, 6.32 mm and 16.98 mm. Overall, our proposed approach yielded better performance in segmentation accuracy for each tumor sub-region. Our code implementation is publicly available at




1 Introduction

Segmentation accuracy at boundaries is essential in medical image segmentation, as it is crucial for many clinical applications such as treatment planning, disease diagnosis and image-guided intervention. Recent progress of deep learning algorithms in dense pixel-level prediction tasks has drawn attention to automatic brain tumor (glioma) segmentation. Gliomas are considered the most common brain tumor variant in adults. High-Grade Gliomas (HGG) are more malignant, since they usually grow fast and frequently destroy healthy brain tissue, so diagnosing them at an early stage is essential for treatment planning. Low-Grade Gliomas (LGG), on the other hand, are slower-growing tumors that can be cured if diagnosed early. However, segmenting tumor sub-regions from medical imaging modalities (e.g., MRI and CT) is a monotonous, time-consuming and subjective process. Medical image analysis is carried out by radiologists, and this manual process is tedious because the volumes are large and contain heterogeneous, ambiguous sub-regions (i.e., edema, active tumor structures, necrotic components, and non-enhancing gross abnormality).

Medical image segmentation plays a cornerstone role in computer-aided diagnosis. With recent developments in deep learning-based computer vision, there have been many advances in automatic medical image segmentation, and the Multimodal Brain Tumor Segmentation Challenge (BraTS) has been a platform for many of them for several years. During the last decade, variants of Fully Convolutional Networks (FCN) and Convolutional Neural Network (CNN) based architectures have shown convincing performance in previous BraTS and other segmentation challenges. Volumetric segmentation networks such as 3D U-Net [6] and V-Net [14] have been widely used with medical imaging modalities because they produce predictions across all three anatomical planes, i.e., axial (dividing the body into upper and lower parts), coronal (front and back) and sagittal (left and right).

The main limitations of implementing and training these volumetric neural network architectures are out-of-memory (OOM) issues, and extending these architectures is often infeasible due to computational resource constraints. Many researchers have shown that, with carefully crafted pre-processing, training and inference procedures, the segmentation accuracy of 3D U-Net can be improved further. Taking into account factors such as OOM issues, resource limitations and inference time, we propose an approach that addresses these challenges and further improves the segmentation accuracy and training process of the 3D U-Net architecture [6]. In summary, our major contributions are:

  1. Inspired by adversarial learning techniques, we propose two-way (reciprocal) adversarial learning to segment brain tumor sub-regions in multi-modal MR images.

  2. We introduce a volumetric discriminator that explicitly expresses its confidence in the current prediction, imposing a higher-order consistency measure between prediction and ground truth during training.

  3. We introduce Virtual Adversarial Training (VAT) during model training to enhance the model’s robustness to data artefacts.

2 Related Work

2.1 Medical Image Segmentation

The rapid development of deep Convolutional Neural Networks and U-shaped encoder-decoder architectures has led to convincing performance in medical image segmentation. The celebrated U-Net [18] opened a new direction for automatic medical image segmentation, as it exploits both the spatial and contextual information of images, which strongly affects the accuracy of segmentation models. Owing to the simplicity and superior performance of U-Net, many U-shaped variants have emerged, such as Res-UNet [20], H-DenseUNet [11], UNet++ [22] and Attention U-Net [16]. Later, to handle volumetric data, 3D models such as 3D U-Net [6] and V-Net [14] were introduced for 3D medical image segmentation.

2.2 Adversarial Learning

Generative Adversarial Networks (GANs) [8], introduced by Goodfellow et al., have been a major breakthrough in image generation. Inspired by the GAN approach, many GAN-based medical imaging applications have recently been introduced, including medical image segmentation [12], reconstruction [17] and domain adaptation [21]. In the BraTS 2020 challenge, Cirillo et al. proposed a 3D volume-to-volume Generative Adversarial Network for brain tumour segmentation [7], in which the discriminator is built on the PatchGAN [9] architecture. VAT is another adversarial learning approach, which has shown strong performance in semi-supervised learning [15]. VAT is applicable to any parametric model, and it directly regularizes the output distribution through the local sensitivity of the output with respect to the input [15].

Inspired by the above works, we propose a min-max formulation combined with VAT for segmenting brain tumors in multi-modal MR images.

3 Methodology

We begin this section with an overview of the BraTS dataset and the proposed method, shown in Fig. 2. We then detail the structure of each module and the entire training pipeline.

3.1 Dataset

The magnetic resonance images used for model training and evaluation are from the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2021 [2, 13, 5, 3, 4]. The BraTS 2021 training dataset contains 1251 MR volumes. MRI is required to evaluate tumor heterogeneity. The following MRI sequences are conventionally used for glioma detection: T1-weighted (T1), T1-weighted contrast-enhanced using gadolinium contrast agents (T1Gd, also denoted T1CE), T2-weighted (T2), and Fluid-Attenuated Inversion Recovery (FLAIR). From these sequences, four distinct tumor sub-regions can be identified: the Enhancing Tumor (ET), which corresponds to areas of relative hyper-intensity in T1CE with respect to T1; the Non-Enhancing Tumor (NET) and the Necrotic Tumor (NCR), which are both hypo-intense in T1Gd compared to T1; and the Peritumoral Edema (ED), which is hyper-intense in FLAIR. These largely homogeneous sub-regions can be grouped into three nested, semantically meaningful tumor classes: the Enhancing Tumor (ET); the Tumor Core (TC), the union of ET, NET and NCR; and the Whole Tumor (WT), the union of TC and ED. The MRI sequences and a ground-truth map with the three classes are shown in Fig. 1.

Figure 1: Visual analysis of BraTS 2021 training data (panels: FLAIR, T1, T1CE, T2 and GT). In the ground-truth (GT) mask, green, yellow and gray represent the peritumoral edema (ED), enhancing tumor (ET) and non-enhancing tumor/necrotic tumor (NET/NCR), respectively.

3.2 Problem Formulation

Consider a labeled set of samples, where each sample consists of an image and its associated ground-truth segmentation mask. Voxels with values 0, 1, 2 and 4 in the label map represent background/air, necrotic and non-enhancing tumor core (NCR/NET), peritumoral edema (ED) and enhancing tumor (ET), respectively.
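To make the nesting of the three evaluated regions concrete, the following minimal sketch (a hypothetical helper, not taken from the authors' code) converts a flattened BraTS label map with values 0, 1, 2 and 4 into the binary ET, TC and WT masks described in Section 3.1. A real pipeline would operate on 3D arrays; plain lists are used here only to keep the example self-contained.

```python
from typing import Dict, List

# BraTS label values as described in Section 3.2.
NCR_NET, ED, ET = 1, 2, 4


def labels_to_regions(label_map: List[int]) -> Dict[str, List[int]]:
    """Hypothetical helper: map a flattened label map to ET, TC and WT binary masks."""
    return {
        # Enhancing tumor: label 4 only.
        "ET": [int(v == ET) for v in label_map],
        # Tumor core: enhancing tumor plus necrotic / non-enhancing core.
        "TC": [int(v in (ET, NCR_NET)) for v in label_map],
        # Whole tumor: tumor core plus peritumoral edema.
        "WT": [int(v in (ET, NCR_NET, ED)) for v in label_map],
    }


regions = labels_to_regions([0, 1, 2, 4, 0])
print(regions["ET"], regions["TC"], regions["WT"])
```

Because each region is a superset of the previous one, a voxel marked ET is always also marked TC and WT.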

3.3 Network Architecture

Figure 2: Proposed overall network architecture, consisting of the segmentation network and the critic network. The inputs shown are the original patient data, the ground-truth segmentation masks, the perturbation added to the input data and the prediction generated by the segmentation network. The critic discriminates between prediction masks and ground-truth masks to play the min-max game, generating a pixel-wise confidence map. The VAT block improves the robustness of the model against generated adversarial examples by adding perturbations along the virtual adversarial direction.

The proposed network architecture consists of three modules: a segmentation network, a critic network and a Virtual Adversarial Training (VAT) block. The segmentation network is composed of down-sampling and up-sampling layers with skip pathways, making it a U-like network architecture [18]. The critic is constructed as a fully convolutional adversarial network. Both networks use 3D convolutions. The critic encourages the segmentation network to predict segmentation masks that are more similar to the ground-truth masks. It follows the Markovian PatchGAN architecture [10, 9], which in its original form produces patch-wise confidence scores rather than a single score per image. Inspired by this, we adopt a similar approach to provide uncertainty information to the segmentation network. The VAT block generates adversarial examples so that the segmentation network learns to avoid such incorrect predictions on new patient data and on patient data with artefacts.
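One way to see what a fully convolutional, PatchGAN-style critic "looks at" is to compute the receptive field of each output voxel. The sketch below assumes that the four 3x3x3 convolutions reported in Section 4.1 use stride 1 and padding 1 (neither is stated in the text), so the critic emits one confidence value per voxel, each seeing a 9x9x9 neighborhood. A hypothetical stride-2 variant is included for contrast, and the 128-voxel input size is illustrative only.

```python
from typing import List, Tuple

Layer = Tuple[int, int]  # (kernel size, stride) along one axis


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field (voxels per axis) of one output voxel of a conv stack."""
    rf, jump = 1, 1  # jump = product of the strides of all preceding layers
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def output_size(n: int, layers: List[Layer], padding: int = 1) -> int:
    """Spatial size per axis after the conv stack (standard convolution size formula)."""
    for kernel, stride in layers:
        n = (n + 2 * padding - kernel) // stride + 1
    return n


critic = [(3, 1)] * 4   # assumed: four 3x3x3 convs, stride 1, padding 1
strided = [(3, 2)] * 4  # hypothetical strided variant for comparison
print(receptive_field(critic), output_size(128, critic))    # 9 128
print(receptive_field(strided), output_size(128, strided))  # 31 8
```

With stride 1 the confidence map keeps the input resolution, which matches the pixel-wise confidence map described in Fig. 2; a strided critic would instead score coarser patches.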

3.4 Objective Function

The segmentation network and the critic network are each parameterized by their own set of weights. To encourage the segmentation network to yield predictions closer to the real ground-truth masks by deceiving the critic network, we propose optimizing the following min-max problem:


We propose to train the segmentation network by minimizing a total loss function that consists of three terms:


where the three terms denote the supervised Dice loss, the virtual adversarial training (VAT) loss and the critic loss, respectively. Their weighting coefficients are hyper-parameters of the algorithm that control the contribution of each loss term. The supervised Dice loss and the VAT loss depend only on the segmentation network, whereas the critic loss depends on the parameters of the entire model. We observed that the segmentation network is robust and generalizes well as long as these hyper-parameters lie within a reasonable range. In our experiments we set , and .
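For orientation, one generic way to write such an objective in standard GAN notation [8] is sketched below. This is a hedged illustration, not the authors' exact equations: the symbols S (segmentation network), C (critic), x (input volume), y (ground-truth mask), the parameters θ_S and θ_C, and the weights λ_1, λ_2 and λ_3 are hypothetical names introduced here.

```latex
\begin{aligned}
\min_{\theta_S}\,\max_{\theta_C}\;
  &\mathbb{E}_{y}\big[\log C(y;\theta_C)\big]
  + \mathbb{E}_{x}\big[\log\big(1 - C(S(x;\theta_S);\theta_C)\big)\big],\\
\mathcal{L}_{\mathrm{total}}(\theta_S,\theta_C)
  &= \lambda_1\,\mathcal{L}_{\mathrm{dice}}(\theta_S)
   + \lambda_2\,\mathcal{L}_{\mathrm{vat}}(\theta_S)
   + \lambda_3\,\mathcal{L}_{\mathrm{critic}}(\theta_S,\theta_C).
\end{aligned}
```

The argument lists make explicit the dependency stated above: only the critic term involves the critic's parameters.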

As the main loss we use the Dice loss, computed for each class (multi-class loss function):


where the smoothing factor is set to 1 in our experiments.
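A minimal sketch of a multi-class soft Dice loss with smoothing factor 1 follows. It assumes the per-class losses are averaged (the text does not say whether they are summed or averaged) and uses the linear form of the denominator; some implementations, such as V-Net [14], square the terms in the denominator instead. The function name is hypothetical.

```python
from typing import List


def soft_dice_loss(pred: List[List[float]], target: List[List[int]], smooth: float = 1.0) -> float:
    """Multi-class soft Dice loss, averaged over classes.

    pred[c] holds flattened probabilities and target[c] the flattened binary mask
    for class c (e.g. ET, TC, WT).
    """
    losses = []
    for p, g in zip(pred, target):
        intersection = sum(p_i * g_i for p_i, g_i in zip(p, g))
        denominator = sum(p) + sum(g)
        dice = (2.0 * intersection + smooth) / (denominator + smooth)
        losses.append(1.0 - dice)
    return sum(losses) / len(losses)


print(soft_dice_loss([[0.9, 0.1, 0.8]], [[1, 0, 1]]))
```

The smoothing term keeps the ratio defined when both the prediction and the mask of a class are empty, in which case the Dice value is 1 and the loss is 0.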

VAT updates the model using the weighted gradient of a regularization term, which is the second term of our full objective function. This term relies on a non-negative function that measures the divergence between the ground-truth distribution and the perturbed prediction distribution. Inspired by the VAT method of Miyato et al. [15], we define the divergence-based Local Distributional Smoothness (LDS) as:


Minimizing the LDS improves the generalization performance of the model and makes it more robust against adversarial examples generated along the virtual adversarial direction. Instead of heavy data augmentation with images perturbed by regular deformations, we use adversarial perturbations, which can reduce the test error [19].
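The power-iteration step at the heart of VAT [15] can be illustrated on a toy model. The sketch below replaces the 3D U-Net with a hypothetical single-voxel logistic model so that the gradient of the divergence can be written in closed form. It uses the clean prediction as the reference distribution (as in the semi-supervised formulation of [15]), whereas the text above describes the ground-truth distribution; the values of eps and xi are illustrative.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Toy logistic 'segmenter' for a single voxel: p(y = 1 | x)."""
    return sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bern(p) || Bern(q)), the divergence used by VAT in this toy."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def unit(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(v_i * v_i for v_i in v))
    return [v_i / norm for v_i in v] if norm > 0.0 else v


def vat_perturbation(w, b, x, eps=0.1, xi=1e-6, power_iters=1, seed=0) -> Tuple[List[float], float]:
    """Approximate the virtual adversarial perturbation r_adv and its LDS value."""
    rng = random.Random(seed)
    p_clean = predict(w, b, x)  # reference distribution (clean prediction)
    d = unit([rng.gauss(0.0, 1.0) for _ in x])  # random unit direction
    for _ in range(power_iters):
        q = predict(w, b, [x_i + xi * d_i for x_i, d_i in zip(x, d)])
        # Gradient of KL(p_clean || q) w.r.t. the perturbation r at r = xi * d:
        # for a logistic model dKL/dz = q - p_clean and dz/dr = w.
        d = unit([(q - p_clean) * w_i for w_i in w])
    r_adv = [eps * d_i for d_i in d]
    q_adv = predict(w, b, [x_i + r_i for x_i, r_i in zip(x, r_adv)])
    return r_adv, bernoulli_kl(p_clean, q_adv)  # LDS = D(p(x), p(x + r_adv))


w, b, x = [2.0, -1.0, 0.5], 0.1, [0.3, 0.2, -0.4]
r_adv, lds = vat_perturbation(w, b, x)
print(r_adv, lds)
```

For a logistic model the virtual adversarial direction is parallel to the weight vector, which a single power iteration recovers; in a real segmentation network the same gradient would be obtained by back-propagation through the model.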

Denoting the function computed by the critic as its confidence mapping, we define the normalized critic loss for the prediction distribution as:


where the target label takes one value if the sample is generated by the segmentation network and the other if it is drawn from the ground-truth labels. With this adversarial loss, the segmentation network tries to deceive the critic by generating predictions that are holistically more similar to the ground-truth masks.
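Because the target values in the critic loss are not recoverable here, the sketch below assumes the usual GAN convention of 1 for ground-truth masks and 0 for predictions, and treats "normalized" as averaging binary cross-entropy over the voxels of the critic's confidence map. The adversarial term for the segmentation network uses the non-saturating form suggested in [8]; all function names are hypothetical.

```python
import math
from typing import List


def bce(confidences: List[float], target: float, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between a voxel-wise confidence map and a constant target."""
    total = 0.0
    for c in confidences:
        c = min(max(c, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(target * math.log(c) + (1.0 - target) * math.log(1.0 - c))
    return total / len(confidences)  # averaging normalizes by the number of voxels


def critic_loss(conf_on_gt: List[float], conf_on_pred: List[float]) -> float:
    """Loss minimized by the critic (assumed convention: ground truth -> 1, prediction -> 0)."""
    return bce(conf_on_gt, 1.0) + bce(conf_on_pred, 0.0)


def adversarial_loss(conf_on_pred: List[float]) -> float:
    """Term minimized by the segmentation network: make the critic score its prediction as real."""
    return bce(conf_on_pred, 1.0)


print(critic_loss([0.5] * 4, [0.5] * 4))  # equals 2 * log(2) for an undecided critic
```

An undecided critic (confidence 0.5 everywhere) gives a loss of 2 log 2, while a critic that separates ground truth from predictions drives its loss towards zero and the segmentation network's adversarial loss upwards.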

4 Experiments

4.1 Implementation Details

The proposed model is developed in PyTorch and trained from scratch. We use a modified version of 3D U-Net as the segmentation network and a 3D discriminator as the critic network. In the 3D U-Net, the contracting path comprises five levels including the bottleneck, each consisting of two 3x3x3 convolutions with group normalization and ReLU activation. The number of feature maps in the first encoder level is set to 48. Each down-sampling layer is a max-pooling operation with a 2x2x2 kernel and stride 2. Each block of the expansive path performs up-sampling using trilinear interpolation followed by a 3x3x3 convolution. The final layer is a 1x1x1 convolution with 3 output channels followed by a sigmoid activation. Skip connections between the contracting and expansive paths concatenate the corresponding outputs. The 3D discriminator consists of four 3x3x3 convolutions with batch normalization and leaky ReLU activation. The discriminator is inspired by PatchGAN [9], with a cube (patch) size of 1x1x1.
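The feature-map sizes implied by this description can be worked out directly. The sketch assumes a hypothetical 128x128x128 input patch (the actual patch size is not given in this text) and that the number of feature maps doubles after each pooling step, which is common in 3D U-Net variants but not stated above. It also reports the float32 memory of one feature map per level, which helps explain the OOM concerns raised in the introduction.

```python
from typing import List, Tuple


def encoder_shapes(patch: int, base_features: int = 48, levels: int = 5) -> List[Tuple[int, int]]:
    """(spatial size per axis, feature maps) at each level of the contracting path."""
    if patch % 2 ** (levels - 1):
        raise ValueError("patch size must be divisible by 2**(levels - 1)")
    shapes = []
    size, channels = patch, base_features
    for level in range(levels):
        shapes.append((size, channels))
        if level < levels - 1:
            size //= 2     # 2x2x2 max pooling with stride 2 halves each axis
            channels *= 2  # assumed: feature maps double after each pooling
    return shapes


def activation_mib(size: int, channels: int, bytes_per_value: int = 4) -> float:
    """Memory of one float32 feature map of shape (channels, size, size, size), in MiB."""
    return size ** 3 * channels * bytes_per_value / 2 ** 20


for size, channels in encoder_shapes(128):  # 128^3 patch is a hypothetical choice
    print(f"{size}^3 x {channels}: {activation_mib(size, channels):.1f} MiB")
```

Under these assumptions the levels are 128^3 x 48 (384.0 MiB), 64^3 x 96 (96.0 MiB), 32^3 x 192 (24.0 MiB), 16^3 x 384 (6.0 MiB) and 8^3 x 768 (1.5 MiB) per feature map and per sample, before counting the many intermediate activations stored for back-propagation.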

4.1.1 Image Pre-processing

Intensities of MRI volumes are inconsistent due to various factors such as patient motion during the examination, different manufacturers of acquisition devices, and the sequences and parameters used during image acquisition. To standardize all volumes, min-max scaling was performed, followed by clipping of intensity values. Images were then cropped to a fixed patch size by removing unnecessary background voxels.
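The text does not specify how the min-max bounds or the clipping range are chosen. A minimal sketch, assuming the bounds come from hypothetical 1st and 99th percentiles of a volume's intensities (so that clipping to [0, 1] afterwards actually removes outliers), is shown below for a flattened volume; the function names and percentile values are illustrative, not the authors' settings.

```python
from typing import List


def percentile(values: List[float], q: float) -> float:
    """Simple index-rounding percentile (q in [0, 100]) of a flat list of intensities."""
    s = sorted(values)
    idx = round(q * (len(s) - 1) / 100)
    return s[idx]


def normalize_intensities(values: List[float], lower_q: float = 1.0, upper_q: float = 99.0) -> List[float]:
    """Min-max scale using percentile bounds, then clip to [0, 1]."""
    lo, hi = percentile(values, lower_q), percentile(values, upper_q)
    scale = (hi - lo) or 1.0  # guard against constant volumes
    return [min(max((v - lo) / scale, 0.0), 1.0) for v in values]


vals = list(range(101))
out = normalize_intensities(vals)
print(out[0], out[50], out[100])  # 0.0 0.5 1.0
```

Using percentiles rather than the raw minimum and maximum keeps a few extreme voxels from compressing the intensity range of the rest of the brain.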

4.1.2 Training

To train the segmentation network we use the Adam optimizer with a learning rate of 2e-04, and to train the critic network we use the RMSProp optimizer with a learning rate of 5e-05, since momentum-based methods can cause instability [1]. Training was run for 100 epochs with a batch size of 2, after splitting the original training dataset into a training set (80%) and a test set (20%). Therefore, 1000 MR volumes were used to train the model and 251 MR volumes were held out as the test set.
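The arithmetic of the 80/20 split can be checked with a small sketch: int(0.8 * 1251) gives 1000 training volumes, leaving 251 for testing. The case identifiers, the seed and the shuffled assignment below are hypothetical; the text does not say how volumes were assigned to each subset.

```python
import random
from typing import List, Tuple


def split_cases(case_ids: List[str], train_fraction: float = 0.8, seed: int = 42) -> Tuple[List[str], List[str]]:
    """Shuffle case IDs reproducibly and split them into training and test subsets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(train_fraction * len(ids))  # int(0.8 * 1251) == 1000
    return ids[:n_train], ids[n_train:]


cases = [f"case_{i:05d}" for i in range(1251)]  # hypothetical identifiers
train, test = split_cases(cases)
print(len(train), len(test))  # 1000 251
```

Splitting by case rather than by slice or patch keeps all data from one patient on the same side of the split.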

4.1.3 Inference

The BraTS 2021 validation dataset contains 219 MR volumes, and the evaluation is conducted by the Synapse portal. In the inference phase, each original volume is re-scaled using min-max scaling followed by intensity clipping, and cropped before being fed to the saved 3D U-Net model.
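The segmentation network ends in three sigmoid channels, whereas BraTS submissions use the label values 0, 1, 2 and 4. A minimal sketch of one way to convert the channel probabilities back into a label map is below; the 0.5 threshold, the (ET, TC, WT) channel order and the rule of enforcing the nesting ET ⊆ TC ⊆ WT are assumptions, not details given in the text.

```python
from typing import List


def regions_to_labels(et: List[float], tc: List[float], wt: List[float], threshold: float = 0.5) -> List[int]:
    """Hypothetical post-processing: per-voxel ET/TC/WT probabilities -> BraTS labels."""
    labels = []
    for p_et, p_tc, p_wt in zip(et, tc, wt):
        if p_wt < threshold:
            labels.append(0)  # background
        elif p_tc < threshold:
            labels.append(2)  # edema: inside WT but outside TC
        elif p_et < threshold:
            labels.append(1)  # NCR/NET: inside TC but not ET
        else:
            labels.append(4)  # enhancing tumor
    return labels


print(regions_to_labels([0.9, 0.1, 0.1, 0.1], [0.9, 0.8, 0.2, 0.1], [0.9, 0.9, 0.7, 0.2]))  # [4, 1, 2, 0]
```

Checking the outermost region first means a voxel can only receive an inner label if it is also predicted as belonging to every enclosing region.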

4.2 Performance Evaluation

The segmentation accuracy for the three classes (i.e., ET, TC and WT) is evaluated during training and inference. Both qualitative and quantitative analyses are performed to evaluate model accuracy.

Class                 Hausdorff Distance 95% (mm)  Dice Score (%)  Sensitivity (%)  Specificity (%)
Enhancing Tumor (ET)  21.8296                      81.3898         83.3949          99.9695
Tumor Core (TC)       8.5632                       85.3856         85.0726          99.9745
Whole Tumor (WT)      5.3686                       90.7654         92.0858          99.9107
Table 1: Validation Phase Results.
Figure 3: Box-and-whisker plots of the distribution of the segmentation metrics for the validation phase results (panels: Dice Similarity Coefficient, Hausdorff Distance). Each box plot shows the minimum, lower quartile, median, upper quartile and maximum for each tumor class; outliers are shown as individual points.

Figure 4: Validation phase results for sample BraTS202100190 (panels: axial, coronal and sagittal views). Green, yellow and gray represent the whole tumor (WT), enhancing tumor (ET) and tumor core (TC) classes, respectively (Dice (ET) = 97.2585, Dice (TC) = 99.1492, Dice (WT) = 97.5753).

4.2.1 Evaluation Metrics

The learning model is evaluated using four metrics: (1) the Dice-Sørensen coefficient (DSC), (2) the Hausdorff Distance (95th percentile), (3) Sensitivity and (4) Specificity.
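For binary region masks, three of these metrics follow directly from voxel-wise true/false positive and negative counts, as in the minimal sketch below (values are fractions, whereas Tables 1 and 2 report percentages). Treating an empty prediction of an empty region as a perfect score is an assumed convention, and the 95th-percentile Hausdorff distance is omitted because it requires surface-distance computations.

```python
from typing import Dict, List


def region_metrics(pred: List[int], gt: List[int]) -> Dict[str, float]:
    """Dice, sensitivity and specificity for one binary region (e.g. ET, TC or WT)."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    tn = sum(1 for p, g in zip(pred, gt) if not p and not g)
    return {
        # Assumed convention: an empty prediction of an empty region counts as perfect.
        "dice": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 1.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 1.0,
    }


print(region_metrics([1, 1, 0, 0], [1, 0, 1, 0]))  # {'dice': 0.5, 'sensitivity': 0.5, 'specificity': 0.5}
```

Because tumor regions occupy a small fraction of each volume, true negatives dominate, which is why specificity is close to 100% for every class in Tables 1 and 2.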

4.2.2 Validation Phase Experimental Results

The quantitative and qualitative results of the proposed approach in the validation phase are shown in Table 1, Fig. 3 and Fig. 4. These results indicate that the proposed framework produces fine-grained predictions successfully.

4.2.3 Testing Phase Experimental Results

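Our final evaluation results on the testing dataset are shown in Table 2. Compared with the validation phase, the average Dice Similarity Score over the three tumor sub-regions improved in the testing phase (86.77 vs. 85.85), mainly due to the enhancing tumor (84.55 vs. 81.39), while the TC and WT scores changed by less than 0.4 points.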

Class                 Hausdorff Distance 95% (mm)  Dice Score (%)  Sensitivity (%)  Specificity (%)
Enhancing Tumor (ET)  13.4802                      84.5530         88.0258          99.9680
Tumor Core (TC)       16.9814                      85.3010         87.7660          99.9637
Whole Tumor (WT)      6.3239                       90.4583         92.1467          99.9161
Table 2: Testing Phase Results.

5 Conclusion

In this work, we demonstrate a simple and effective way to improve the training of 3D U-Net through reciprocal adversarial learning. Our approach extends the VAT method, making the segmentation network robust to adversarial perturbations by generating adversarial examples, and adopts a min-max approach based on the GAN framework. Our experiments show that virtual adversarial training and uncertainty guidance help improve the performance of the segmentation network.


References

  • [1] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein GAN. arXiv preprint arXiv:1701.07875.
  • [2] U. Baid, S. Ghodasara, M. Bilello, S. Mohan, E. Calabrese, E. Colak, K. Farahani, J. Kalpathy-Cramer, F. C. Kitamura, S. Pati, et al. (2021) The RSNA-ASNR-MICCAI BraTS 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv preprint arXiv:2107.02314.
  • [3] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. Kirby, J. Freymann, K. Farahani, and C. Davatzikos (2017) Segmentation labels and radiomic features for the pre-operative scans of the TCGA-GBM collection. The Cancer Imaging Archive. Nat Sci Data 4, pp. 170117.
  • [4] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. Kirby, J. Freymann, K. Farahani, and C. Davatzikos (2017) Segmentation labels and radiomic features for the pre-operative scans of the TCGA-LGG collection. The Cancer Imaging Archive 286.
  • [5] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B. Freymann, K. Farahani, and C. Davatzikos (2017) Advancing the Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific Data 4 (1), pp. 1–13.
  • [6] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger (2016) 3D U-Net: learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 424–432.
  • [7] M. D. Cirillo, D. Abramian, and A. Eklund (2020) Vox2Vox: 3D-GAN for brain tumour segmentation. arXiv preprint arXiv:2003.13653.
  • [8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
  • [9] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1125–1134.
  • [10] C. Li and M. Wand (2016) Precomputed real-time texture synthesis with Markovian generative adversarial networks. In European Conference on Computer Vision, pp. 702–716.
  • [11] X. Li, H. Chen, X. Qi, Q. Dou, C. Fu, and P. Heng (2018) H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Transactions on Medical Imaging 37 (12), pp. 2663–2674.
  • [12] F. Mahmood, D. Borders, R. Chen, G. N. McKay, K. J. Salimian, A. Baras, and N. J. Durr (2019) Deep adversarial training for multi-organ nuclei segmentation in histopathology images. IEEE Trans. on Medical Imaging.
  • [13] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest, et al. (2014) The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging 34 (10), pp. 1993–2024.
  • [14] F. Milletari, N. Navab, and S. Ahmadi (2016) V-Net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571.
  • [15] T. Miyato, S. Maeda, M. Koyama, and S. Ishii (2018) Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Trans. on Pattern Analysis and Machine Intelligence 41 (8), pp. 1979–1993.
  • [16] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al. (2018) Attention U-Net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999.
  • [17] T. M. Quan, T. Nguyen-Duc, and W. Jeong (2017) Compressed sensing MRI reconstruction with cyclic loss in generative adversarial networks. arXiv preprint arXiv:1709.00753.
  • [18] O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In Proc. Int. Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241.
  • [19] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
  • [20] X. Xiao, S. Lian, Z. Luo, and S. Li (2018) Weighted Res-UNet for high-quality retina vessel segmentation. In 2018 9th International Conference on Information Technology in Medicine and Education (ITME), pp. 327–331.
  • [21] Y. Zhang, S. Miao, T. Mansi, and R. Liao (2018) Task driven generative modeling for unsupervised domain adaptation: application to X-ray image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 599–607.
  • [22] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang (2018) UNet++: a nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 3–11.