Reciprocal Adversarial Learning for Brain Tumor Segmentation: A Solution to BraTS Challenge 2021 Segmentation Task
This paper proposes an adversarial-learning-based training approach for the brain tumor segmentation task. In this concept, the 3D segmentation network learns from dual reciprocal adversarial learning approaches. To enhance generalization across the segmentation predictions and to make the segmentation network robust, we follow the Virtual Adversarial Training approach, generating additional adversarial examples by adding noise to the original patient data. By incorporating a critic that acts as a quantitative subjective referee, the segmentation network learns from the uncertainty information associated with its segmentation results. We trained and evaluated our network architecture on the RSNA-ASNR-MICCAI BraTS 2021 dataset. Our performance on the online validation dataset is as follows: Dice Similarity Scores of 81.38, 90.77 and 85.39, and Hausdorff Distances (95%) of 21.83 mm, 5.37 mm and 8.56 mm for the enhancing tumor, whole tumor and tumor core, respectively. Similarly, our approach achieved Dice Similarity Scores of 84.55, 90.46 and 85.30, and Hausdorff Distances (95%) of 13.48 mm, 6.32 mm and 16.98 mm on the final test dataset. Overall, our proposed approach yielded better segmentation accuracy for each tumor sub-region. Our code implementation is publicly available at https://github.com/himashi92/vizviva_brats_2021
Segmentation accuracy on boundaries is essential in medical image segmentation, as it is crucial for many clinical applications such as treatment planning, disease diagnosis and image-guided intervention, to name a few. Tremendous progress of deep learning algorithms in dense pixel-level prediction tasks has recently drawn attention to automatic segmentation applications for brain tumor/glioma segmentation. Gliomas are the most common brain tumor variant in adults. High-Grade Gliomas (HGG) are the more malignant form, since they usually grow fast and frequently destroy healthy brain tissue, so diagnosing them in their early phases is essential for treatment planning. Low-Grade Gliomas (LGG), on the other hand, are slower-growing tumors which can be cured if diagnosed early. However, segmenting tumor sub-regions from various medical imaging modalities (e.g., MRI and CT) is a monotonous, time-consuming and subjective process. Medical imaging analysis is carried out by radiologists, and this manual process is tedious since the volumes are hefty in size and contain heterogeneous, ambiguous sub-regions (i.e., edema, active tumor structures, necrotic components, and non-enhancing gross abnormality). In particular, medical image segmentation plays a cornerstone role in computer-aided diagnosis. With recent developments in deep-learning-based computer vision algorithms, there have been many advances in automatic medical image segmentation, and the Multi-modal Brain Tumor Segmentation challenge (BraTS) has been a platform for many of these discoveries over the years. During the last decade, variants of Fully Convolutional Network (FCN) and Convolutional Neural Network (CNN) based architectures have shown convincing performance in previous BraTS and other segmentation challenges. Recent volumetric medical image segmentation networks such as 3D-UNet [6] and V-Net [14] have been widely used with medical imaging modalities, since these networks produce predictions for the different anatomical planes, i.e., axial (dividing the body into top and bottom halves), coronal (dividing the body into front and back halves) and sagittal (along the midline of the body). The main limitation of implementing and training these volumetric neural network architectures is out-of-memory (OOM) issues, and extending these architectures is often not feasible due to computational resource constraints. Many researchers have shown that, with carefully crafted pre-processing, training and inference procedures, the segmentation accuracy of 3D-UNet can be improved further. Considering factors such as OOM issues, resource limitations and inference time, we propose an approach that tackles these challenges and further improves the segmentation accuracy and training process of the 3D-UNet architecture
[6]. In summary, our major contributions are:

- Inspired by adversarial learning techniques, we propose a two-way adversarial learning scheme to segment brain tumor sub-regions in multi-modal MR images.
- We introduce a volumetric discriminator model that explicitly expresses its confidence in the current prediction, imposing a higher-order consistency measure between prediction and ground truth during training.
- We introduce Virtual Adversarial Training (VAT) during model training to enhance the model's robustness to data artefacts.
The rapid development of deep Convolutional Neural Networks and U-shaped encoder-decoder architectures has produced convincing performance in medical image segmentation. The celebrated U-Net [18] pointed automatic medical image segmentation in a novel direction, as it exploits both the spatial and contextual information of images, which greatly affects the accuracy of segmentation models. Due to the simplicity and superior performance of U-Net, many variants of U-shaped architectures are constantly emerging, such as Res-UNet [20], H-DenseUNet [11], U-Net++ [22] and Attention-UNet [16]. Later, to handle volumetric data, models such as 3D-UNet [6] and V-Net [14] were introduced into the field of 3D medical image segmentation.
Generative Adversarial Networks (GANs) [8] by Goodfellow et al. were a major breakthrough in the image generation task. Inspired by the GAN approach, many GAN-based medical imaging applications have been introduced recently, including in the areas of medical image segmentation [12], reconstruction [17] and domain adaptation [21]. In the BraTS 2020 challenge, Marco et al. proposed a 3D volume-to-volume Generative Adversarial Network for segmentation of brain tumors [7], where the discriminator is built in the style of the PatchGAN [9] architecture. VAT is another adversarial learning approach, one which has shown tremendous performance in semi-supervised learning [15]. VAT is applicable to any parametric model, and it directly regularizes the output distribution via the local sensitivity of the output with respect to the input [15]. Hence, inspired by the above works, we propose a min-max formulation with VAT for segmenting brain tumors in multi-modal MR images.
We start this section with an overview of the BraTS dataset and the proposed method, as shown in Fig. 2. We then detail the structure of each module and the entire training pipeline.
The Magnetic Resonance images used for model training and evaluation are from the Multi-modal Brain Tumor Segmentation Challenge (BraTS) 2021 [2, 13, 5, 3, 4]. The BraTS 2021 training dataset contains 1251 MR volumes of shape 240x240x155. MRI is required to evaluate tumor heterogeneity, and the following MRI sequences are conventionally used for glioma detection: the T1-weighted sequence (T1), the T1-weighted contrast-enhanced sequence using gadolinium contrast agents (T1CE), the T2-weighted sequence (T2), and the Fluid-Attenuated Inversion Recovery (FLAIR) sequence. From these sequences, distinct tumor sub-regions can be identified in MRI: the Enhancing Tumor (ET), which corresponds to areas of relative hyper-intensity in T1CE with respect to T1; the Non-Enhancing Tumor (NET) and Necrotic Tumor (NCR), which are both hypo-intense in T1CE when compared to T1; and the Peritumoral Edema (ED), which is hyper-intense in the FLAIR sequence. These almost homogeneous sub-regions can be clustered together to compose three semantically meaningful tumor classes: the Enhancing Tumor (ET); the union of ET, NET and NCR, which represents the Tumor Core (TC); and the addition of ED to TC, which represents the Whole Tumor (WT). The MRI sequences and a ground truth map with the three classes are shown in Fig. 1.
Figure 1: The four MRI sequences (FLAIR, T1, T1CE, T2) and the corresponding ground truth (GT) segmentation map.
Let $\mathcal{D} = \{(x_n, y_n)\}_{n=1}^{N}$ be a labeled set with $N$ samples, where each sample consists of an image $x_n$ and its associated ground-truth segmentation mask $y_n$. Voxels with values 0, 1, 2 and 4 in the label map represent background/air, the Necrotic and Non-Enhancing Tumor core (NCR/NET), the Peritumoral Edema (ED) and the Enhancing Tumor (ET), respectively.
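As a concrete illustration of this label convention, the sketch below converts a BraTS label map into the three overlapping region masks (ET, TC, WT) described above. This is a minimal NumPy sketch; the function name and the channel ordering are our own illustrative choices, not part of the released code.

```python
import numpy as np

def brats_labels_to_regions(label_map: np.ndarray) -> np.ndarray:
    """Convert a BraTS label map (values 0, 1, 2, 4) into three
    overlapping binary region masks, stacked as (3, D, H, W).

    Channel 0: Enhancing Tumor (ET) = label 4
    Channel 1: Tumor Core (TC)      = labels 1 and 4 (ET + NCR/NET)
    Channel 2: Whole Tumor (WT)     = labels 1, 2 and 4 (TC + ED)
    """
    et = (label_map == 4)
    tc = et | (label_map == 1)
    wt = tc | (label_map == 2)
    return np.stack([et, tc, wt]).astype(np.float32)
```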
The proposed network architecture consists of three modules, namely a segmentation network, a critic network, and a Virtual Adversarial Training (VAT) block. The segmentation network $S(\cdot)$ is composed of down-sampling and up-sampling layers with skip pathways, making it a U-like network architecture [18]. The critic is constructed as a fully convolutional adversarial network, and both networks consist of 3D convolutions. The critic constructively pushes the segmentation network to predict segmentation masks that are more similar to the ground truth masks. The critic here follows the Markovian PatchGAN architecture [10, 9]. In the original work, the Markovian PatchGAN architecture produces confidence scores for prediction masks; inspired by this, we adopt a similar approach to provide uncertainty information to the segmentation network. The VAT block generates adversarial examples so that the segmentation network can learn to avoid making incorrect predictions on new patient data and on patient data with artefacts.
The parameters of the segmentation network are denoted by $\theta_S$ and those of the critic network by $\theta_C$. To encourage the segmentation network to yield predictions closer to the ground-truth masks by deceiving the critic network, we propose optimizing the following min-max problem:

$$\min_{\theta_S}\; \max_{\theta_C}\; \mathcal{L}_{total}(\theta_S, \theta_C) \qquad (1)$$
We propose to train the segmentation network by minimizing a total loss function that consists of three terms:

$$\mathcal{L}_{total} = \lambda_1\,\mathcal{L}_{dice} + \lambda_2\,\mathcal{L}_{vat} + \lambda_3\,\mathcal{L}_{critic} \qquad (2)$$

where $\mathcal{L}_{dice}$, $\mathcal{L}_{vat}$ and $\mathcal{L}_{critic}$ denote the supervised dice loss, the virtual adversarial training loss and the critic loss, respectively. Furthermore, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are hyper-parameters of the algorithm, controlling the contribution of each loss term. Note that the supervised dice loss and the VAT loss depend only on the segmentation network, while the critic loss is defined over the parameters of the entire model. The segmentation network trains robustly and generalizes well as long as these hyper-parameters are chosen within a reasonable range; in our experiments we fix them empirically.
As the main loss, we use the dice loss, computed per class (a multi-class loss function):
$$\mathcal{L}_{dice} = 1 - \frac{1}{|K|}\sum_{k \in K} \frac{2\sum_{i} p_i^{(k)}\, g_i^{(k)} + \epsilon}{\sum_{i} p_i^{(k)} + \sum_{i} g_i^{(k)} + \epsilon} \qquad (3)$$
where $K = \{\text{ET}, \text{TC}, \text{WT}\}$ is the set of tumor classes, $p_i^{(k)}$ and $g_i^{(k)}$ denote the predicted probability and ground-truth value of voxel $i$ for class $k$, and $\epsilon$ is the smoothing factor (set to 1 in our experiments).
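A minimal PyTorch sketch of this multi-class soft dice loss is shown below. It assumes predictions are sigmoid activations of shape (batch, 3, D, H, W) with one channel per class (ET, TC, WT), matching the network head described later; the function name is our own.

```python
import torch

def soft_dice_loss(pred: torch.Tensor,
                   target: torch.Tensor,
                   smooth: float = 1.0) -> torch.Tensor:
    """Multi-class soft dice loss, Eq. (3).

    pred, target: (B, 3, D, H, W); pred holds sigmoid probabilities,
    target holds binary masks for the ET/TC/WT channels.
    """
    dims = (0, 2, 3, 4)                       # sum over batch and volume
    intersection = (pred * target).sum(dims)
    cardinality = pred.sum(dims) + target.sum(dims)
    dice_per_class = (2.0 * intersection + smooth) / (cardinality + smooth)
    return 1.0 - dice_per_class.mean()        # average over the 3 classes
```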
VAT is an algorithm that updates the model using the gradient of a regularization term, which forms the second loss term of our full objective function. Here, $D[\cdot,\cdot]$ is a non-negative function that measures the divergence between the ground-truth distribution and the perturbed prediction distribution. Inspired by the VAT method of Miyato et al. [15], we define the divergence-based Local Distributional Smoothness (LDS) as:

$$\mathcal{L}_{vat} = D\big[\,p(y \mid x; \hat{\theta}_S),\; p(y \mid x + r_{adv}; \theta_S)\,\big], \qquad r_{adv} = \underset{\|r\|_2 \le \epsilon}{\arg\max}\; D\big[\,p(y \mid x; \hat{\theta}_S),\; p(y \mid x + r; \theta_S)\,\big] \qquad (4)$$
Minimizing $\mathcal{L}_{vat}$ improves the generalization performance of the model and makes it more robust against adversarial examples generated along the virtual adversarial direction. Instead of heavy data augmentation with images perturbed by regular deformations, we use adversarial perturbation, which reduces the test error [19].
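The adversarial direction $r_{adv}$ in Eq. (4) is typically approximated with a single power-iteration step, as in the original VAT paper [15]. The sketch below is a simplified illustration of that procedure for a sigmoid-output segmentation network on (B, C, D, H, W) volumes; the perturbation radius `eps` and step size `xi` are assumed values, and binary cross-entropy stands in for the divergence $D$.

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, eps: float = 1.0, xi: float = 1e-6) -> torch.Tensor:
    """One-step power-iteration approximation of the LDS term in Eq. (4)."""
    with torch.no_grad():
        pred = torch.sigmoid(model(x))        # reference prediction, held fixed

    # Start from a random unit direction.
    d = torch.randn_like(x)
    d = d / (d.flatten(1).norm(dim=1).view(-1, 1, 1, 1, 1) + 1e-8)
    d.requires_grad_(True)

    # One power-iteration step: gradient of the divergence w.r.t. d.
    pred_hat = torch.sigmoid(model(x + xi * d))
    adv_div = F.binary_cross_entropy(pred_hat, pred)
    grad_d = torch.autograd.grad(adv_div, d)[0]

    # Scale the most sensitive direction to the perturbation radius eps.
    r_adv = eps * grad_d / (grad_d.flatten(1).norm(dim=1)
                            .view(-1, 1, 1, 1, 1) + 1e-8)

    pred_adv = torch.sigmoid(model(x + r_adv.detach()))
    return F.binary_cross_entropy(pred_adv, pred)
```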
We denote the functionality of the critic by $C(\cdot)$ and define the normalized loss of the critic for a prediction distribution as:

$$\mathcal{L}_{critic} = -\frac{1}{N}\sum_{n=1}^{N}\Big[\, z_n \log C(\hat{y}_n) + (1 - z_n)\log\big(1 - C(\hat{y}_n)\big) \Big] \qquad (5)$$
where $z_n = 0$ if the sample $\hat{y}_n$ is generated by the segmentation network, and $z_n = 1$ if the sample is drawn from the ground-truth labels. With this adversarial loss, the segmentation network tries to deceive the critic by generating predictions that are holistically more similar to the ground-truth masks.
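Putting Eqs. (2)-(5) together, one training iteration alternates a critic update with a segmentation update, in the usual GAN fashion. The sketch below is our own simplified reading of this min-max game, not the authors' exact procedure: `seg_net`, `critic`, the optimizers and the loss weights are assumed names, the critic is assumed to end with a sigmoid, and `soft_dice_loss` / `vat_loss` refer to the sketches above.

```python
import torch
import torch.nn.functional as F

def train_step(seg_net, critic, opt_seg, opt_critic,
               x, y, lambda_vat=1.0, lambda_adv=1.0):
    """One alternating min-max iteration (weights are illustrative)."""
    pred = torch.sigmoid(seg_net(x))

    # --- Critic step: maximize Eq. (5), i.e. minimize its negation. ---
    real_score = critic(y)                    # ground-truth masks, z = 1
    fake_score = critic(pred.detach())        # generated masks,    z = 0
    loss_c = (F.binary_cross_entropy(real_score, torch.ones_like(real_score))
              + F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score)))
    opt_critic.zero_grad(); loss_c.backward(); opt_critic.step()

    # --- Segmentation step: minimize the total loss of Eq. (2). ---
    fake_score = critic(pred)                 # critic now judges live preds
    loss_adv = F.binary_cross_entropy(fake_score, torch.ones_like(fake_score))
    loss = (soft_dice_loss(pred, y)
            + lambda_vat * vat_loss(seg_net, x)
            + lambda_adv * loss_adv)
    opt_seg.zero_grad(); loss.backward(); opt_seg.step()
    return loss_c.item(), loss.item()
```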
The proposed model is developed in PyTorch and trained from scratch. We use a modified version of 3D UNet as the segmentation network and a 3D discriminator as the critic network. In the 3D UNet, the contracting path comprises five layers (including the bottleneck), each consisting of two 3x3x3 convolutions together with group normalization and ReLU activation. The number of feature maps in the first encoder layer is set to 48. Each down-sampling layer consists of a max pooling operation with a kernel size of 2x2x2 and stride 2. Each block of the expansive path performs up-sampling using trilinear interpolation followed by a 3x3x3 convolution. The final layer consists of a convolution with a 1x1x1 kernel, 3 output channels and a sigmoid activation. Skip connections between the contracting and expansive paths concatenate the corresponding outputs. The 3D discriminator consists of four 3x3x3 convolutions with batch normalization and leaky ReLU activations, and is implemented in the style of PatchGAN [9], with an output patch size of 1x1x1.
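As an illustration of the basic building block just described, the sketch below implements one contracting-path stage (two 3x3x3 convolutions, each followed by group normalization and ReLU); the class name and the number of groups are our own assumptions.

```python
import torch.nn as nn

class ConvBlock3D(nn.Module):
    """One contracting-path stage of the modified 3D UNet:
    two 3x3x3 convolutions, each with GroupNorm and ReLU."""

    def __init__(self, in_ch: int, out_ch: int, groups: int = 8):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.GroupNorm(groups, out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Example: the first encoder stage maps the 4 MRI channels to 48 features,
# followed by 2x2x2 max pooling with stride 2 for down-sampling.
enc1 = ConvBlock3D(4, 48)
pool = nn.MaxPool3d(kernel_size=2, stride=2)
```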
Intensities of MRI volumes are inconsistent due to various factors, such as patient motion during the examination, different manufacturers of acquisition devices, and the sequences and parameters used during image acquisition. To standardize all volumes, min-max scaling was performed, followed by clipping of intensity values. Images were then cropped to a fixed patch size by removing unnecessary background voxels.
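A minimal sketch of this intensity standardization is given below. The clipping percentiles and the clip-then-scale ordering are assumptions for illustration, since the text only states that min-max scaling and intensity clipping were applied.

```python
import numpy as np

def standardize_intensities(volume: np.ndarray,
                            lo_pct: float = 1.0,
                            hi_pct: float = 99.0) -> np.ndarray:
    """Clip outlier intensities at assumed percentile bounds,
    then min-max scale the volume to [0, 1]."""
    lo, hi = np.percentile(volume, [lo_pct, hi_pct])
    volume = np.clip(volume, lo, hi)
    return (volume - lo) / (hi - lo + 1e-8)
```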
For training the segmentation network, we use the Adam optimizer with a learning rate of 2e-04; for training the critic network, we use the RMSProp optimizer with a learning rate of 5e-05, since momentum-based methods cause instability in critic training [1]. Training was performed for 100 epochs with a batch size of 2, after splitting the original training dataset into a training set (80%) and a test set (20%): 1000 MR volumes were used to train the model, while 251 MR volumes were held out as the test set.
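In PyTorch, this optimizer configuration might look like the following; `seg_net` and `critic` are assumed module names.

```python
import torch

# Adam for the segmentation network, as stated above.
opt_seg = torch.optim.Adam(seg_net.parameters(), lr=2e-4)
# RMSProp for the critic: momentum-based updates can destabilize
# critic training [1].
opt_critic = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
```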
The BraTS 2021 validation dataset contains 219 MR volumes, and the Synapse portal conducts the evaluation. In the inference phase, each original volume is re-scaled using min-max scaling, followed by clipping of intensity values, and cropped before being fed to the saved 3D UNet model. The segmentation accuracy of the three classes (i.e., ET, TC and WT) is evaluated during training and inference, and both qualitative and quantitative analyses are performed to evaluate model accuracy.
Table 1: Results on the BraTS 2021 validation dataset.

| Class | Hausdorff Distance (95%, mm) | Dice Score (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| Enhancing Tumor (ET) | 21.8296 | 81.3898 | 83.3949 | 99.9695 |
| Tumor Core (TC) | 8.5632 | 85.3856 | 85.0726 | 99.9745 |
| Whole Tumor (WT) | 5.3686 | 90.7654 | 92.0858 | 99.9107 |
Box-and-whisker plots of the distribution of the segmentation metrics (Dice Similarity Coefficient and Hausdorff Distance) for the validation-phase results. Each box shows the minimum, lower quartile, median, upper quartile and maximum for each tumor class; outliers are shown beyond the lower quartile.
Qualitative segmentation results of the proposed method in the axial, coronal and sagittal views.
The learning model is evaluated using four metrics: (1) the Dice Sørensen Coefficient (DSC), (2) the Hausdorff Distance, (3) Sensitivity and (4) Specificity.
Our final evaluation results on the testing dataset are shown in Table 2. Compared to the validation-phase results, the average Dice Similarity Score over the tumor sub-regions improved in the testing phase.
Table 2: Results on the BraTS 2021 testing dataset.

| Class | Hausdorff Distance (95%, mm) | Dice Score (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| Enhancing Tumor (ET) | 13.4802 | 84.5530 | 88.0258 | 99.9680 |
| Tumor Core (TC) | 16.9814 | 85.3010 | 87.7660 | 99.9637 |
| Whole Tumor (WT) | 6.3239 | 90.4583 | 92.1467 | 99.9161 |
In this work, we demonstrated a simple and effective way to improve the training of 3D U-Net through reciprocal adversarial learning. Our approach extends the VAT method by generating adversarial examples, making the segmentation network robust to adversarial perturbations, and adopts a min-max formulation based on the GAN architecture. Our experiments showed that virtual adversarial training and uncertainty guidance help improve the performance of the segmentation network.
[9] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1125-1134, 2017.
[15] T. Miyato, S. Maeda, M. Koyama, and S. Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), pp. 1979-1993, 2019.