Brain tumors are among the deadliest and most common forms of cancer in both children and adults. Determining the correct type of brain tumor at an early stage is key to further diagnosis and treatment. However, for any radiologist, identification and segmentation of brain tumors from multi-sequence MRI scans for diagnosis, monitoring, and treatment are complex and time-consuming tasks.
Brain tumor segmentation is a challenging task because tumors vary widely in both structure and function. Furthermore, tumor intensity differs significantly from one patient to another. MRI is preferred over other imaging modalities for the diagnosis of brain tumors because it is non-invasive, involves no exposure to ionizing radiation, and offers superior image contrast in soft tissues.
Deep learning has shown advances in various fields with promising performance, especially in the area of biomedical image analysis. Convolutional neural networks (CNNs) are the most widely used models in image processing. CNNs combine convolution, pooling, and activation layers with normalization and regularization operations to extract and learn target-specific features for the desired task (classification, localization, segmentation, etc.). In recent years, various techniques have been proposed for identification (classification and segmentation) of brain tumors from MRI images and have achieved promising results [13, 30]. However, most of these approaches use millions of trainable parameters, which results in slower training and analysis, and they exhibit high variance in results when data samples are limited.
In order to overcome the aforementioned drawbacks, Ronneberger et al. proposed the U-shaped network (U-Net) for biomedical image segmentation. The model follows an encoder-decoder design with a feature extraction phase (contraction path) and a reconstruction phase (expansion path). In addition, skip connections are introduced to propagate the extracted feature maps to the corresponding reconstruction phase to aid in upsampling the feature maps. Finally, the model produces a segmentation mask with the same dimensions as the input, highlighting the target structure (the tumor in our case). Following the state-of-the-art potential of the U-Net model, many U-Net variants have been proposed to further improve segmentation performance. The attention-based U-Net model is one such variant that draws the focus of the model towards target features to achieve better segmentation results. Attention filters are introduced in the skip connections, where each feature is assigned a weight coefficient to highlight its importance towards the target features. Despite achieving promising results, these models have millions of trainable parameters, which can be reduced by optimizing the convolution operation. This can be achieved by incorporating depthwise separable convolutions, which are performed in two stages: depthwise and pointwise convolution. The ratio of the number of parameters and multiplications of a depthwise separable convolution to those of a standard convolution can be represented as $\frac{1}{N} + \frac{1}{D_k^2}$, where $N$ is the depth of the output feature map and $D_k$ is the kernel height or width. The achieved reduction in the number of parameters and multiplications is therefore $1 - \left(\frac{1}{N} + \frac{1}{D_k^2}\right)$. Following this context, an attention guided network is proposed that uses depthwise separable convolutions for real-time segmentation and classification of brain tumors from MRI imaging. The major contributions of the present research work are as follows:
A novel multi-task (segmentation and classification) attention guided network (MAG-Net) is proposed for brain tumor diagnosis.
Optimization of training parameters using depthwise separable convolutions, reducing the trainable parameters of MAG-Net from 26.0M to 5.4M.
MAG-Net achieves significant improvement in classification and segmentation compared to state-of-the-art models despite limited data samples.
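As a rough numeric check of the parameter-reduction ratio discussed above, the following sketch compares a standard convolution against its depthwise separable counterpart; the layer sizes are illustrative, not taken from MAG-Net.

```python
# Numeric check of the reduction from replacing a standard convolution
# with a depthwise separable one (illustrative layer sizes).

def conv_params(k, m, n):
    """Parameters of a standard k x k convolution: k*k*m*n."""
    return k * k * m * n

def separable_params(k, m, n):
    """Depthwise (k*k*m) plus pointwise (m*n) parameters."""
    return k * k * m + m * n

k, m, n = 3, 64, 128          # kernel size, input depth, output depth
standard = conv_params(k, m, n)
separable = separable_params(k, m, n)
ratio = separable / standard

# The ratio matches the closed form 1/N + 1/Dk**2.
assert abs(ratio - (1 / n + 1 / k ** 2)) < 1e-12
print(standard, separable, round(ratio, 3))  # 73728 8768 0.119
```

For a 3 x 3 kernel with 128 output channels, the separable variant needs roughly 12% of the parameters of the standard convolution, consistent with the closed-form ratio.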
The rest of the paper is organized as follows: Section 2 describes the crux of related work on brain tumor segmentation and classification. Section 3 presents the proposed architecture, whereas Section 4 discusses the training and testing environment along with experimental and comparative analysis. Finally, concluding remarks are presented in Section 5.
2 Literature review
Identifying a brain tumor is a challenging task for radiologists. Recently, several deep learning based approaches have been proposed to aid faster diagnosis of diseases. Segmentation of the infected region is the most common and critical practice involved in diagnosis. In addition, the segmented region can be assigned a label (classification) to indicate what type of anomaly or infection is present in the image.
In contrast to traditional approaches, Cheng et al. proposed a brain tumor classification approach that uses an augmented tumor region instead of the original tumor region as the region of interest (RoI). The authors utilized the bag-of-words (BoW) technique to segment and extract local features from the RoI. A dictionary is used to encode the extracted local feature maps, which are then passed to a support vector machine (SVM) classifier. The approach outperformed traditional classification techniques with an accuracy of 91.28%, but its performance is limited by data availability. In similar work, Ismael et al. proposed combining statistical features with neural networks using a filter combination: the discrete wavelet transform (DWT, represented by wavelet coefficients) and Gabor filters (for texture representation). For tumor classification, a three-layered neural network classifier is developed using a multilayer perceptron trained with the statistical features. As with Cheng et al., the authors achieved promising results on the limited data samples, with an overall accuracy of 91.9%.
Recently, the capsule network has shown great performance in many fields, especially biomedical image processing. Afshar et al. proposed a basic CapsNet with three capsules in the last layer representing the three tumor classes. However, due to the varied behavior (background, intensity, structure, etc.) of MRI images, the proposed model failed to extract optimal features representing the tumor structure. The authors achieved tumor classification accuracies of 78% and 86.5% using raw MRI images and tumor-segmented MRI images respectively. In another approach, Pashaei et al. utilized a CNN and a kernel extreme learning machine comprising one hidden layer with 100 neurons to increase the robustness of the model. Over several experimental trials the authors achieved an accuracy of 93.68%, but detected only 1% of the positive pituitary tumor cases. Deepak et al. proposed a transfer learning approach that uses a pre-trained GoogLeNet model to extract features (referred to as deep CNN features) with a softmax classifier in the output layer to classify the three tumor classes. Furthermore, the authors combined the deep CNN features with an SVM model to analyse classification performance. They achieved 97.1% accuracy, but the standalone GoogLeNet performed poorly due to overfitting on the limited training dataset and misclassification of meningioma tumors. In another approach, Díaz-Pernas et al. proposed processing images at three different spatial scales along multiple feature pathways for classification and segmentation of brain tumors. The images are pre-processed with elastic transforms to prevent overfitting. The model analyses the entire image and classifies it pixel by pixel into one of four possible output labels (i.e. 0-healthy, 1-meningioma, 2-glioma, and 3-pituitary tumor). The proposed approach outperformed existing approaches with 97.3% classification accuracy, but with poor segmentation performance. Following this context, in this article a multi-task attention guided network (MAG-Net) is proposed based on the U-Net architectural design that uses parallel depthwise separable convolution layers for multi-level feature extraction along with an attention mechanism to better extract tumor features for brain tumor classification and to generate the corresponding tumor mask.
3 Proposed work
The proposed multi-task attention guided network (MAG-Net) model, shown in Fig. 1, focuses on reducing overall computation, improving feature extraction, and optimizing the number of training parameters. The overall architectural design consists of an encoder, a decoder, and a classification module with 5.4M trainable parameters, and is inspired by the U-Net encoder-decoder style. Due to its state-of-the-art potential, U-Net is the most prominent choice among researchers for biomedical image segmentation.
In MAG-Net, to reduce the number of training parameters without sacrificing performance, standard convolution operations are replaced with depthwise separable convolutions. In addition, the skip connections are equipped with attention filters to better extract the feature maps concerning the tumor regions. The attention mechanism filters irrelevant feature maps in the skip connection by assigning each feature a weight that highlights its importance to the tumor regions. Besides, the encoder block is equipped with parallel separable convolution filters of different sizes, whose extracted feature maps are concatenated for better feature learning. These features are then passed to the corresponding decoder blocks via attention-enabled skip connections to aid feature reconstruction with the help of upsampling operations. The bottleneck layer connects the feature extraction path to the feature reconstruction path; in this layer, filters of different sizes are used along with layer normalization. Furthermore, classification is performed using the feature maps obtained from the final encoder block.
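To make the two-stage operation concrete, the following is a minimal NumPy sketch of a depthwise separable convolution (stride 1, no padding); the shapes and random weights are illustrative, not the trained MAG-Net parameters.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Two-stage separable convolution (stride 1, 'valid' padding).

    x          : (H, W, M) input feature map
    dw_kernels : (k, k, M) one spatial filter per input channel
    pw_weights : (M, N) 1x1 pointwise channel-mixing weights
    """
    H, W, M = x.shape
    k = dw_kernels.shape[0]
    out_h, out_w = H - k + 1, W - k + 1

    # Stage 1: depthwise - each channel is filtered independently.
    dw = np.zeros((out_h, out_w, M))
    for c in range(M):
        for i in range(out_h):
            for j in range(out_w):
                dw[i, j, c] = np.sum(x[i:i + k, j:j + k, c] * dw_kernels[:, :, c])

    # Stage 2: pointwise - a 1x1 convolution mixes the channels.
    return dw @ pw_weights   # (out_h, out_w, N)

rng = np.random.default_rng(0)
y = depthwise_separable_conv(rng.normal(size=(8, 8, 4)),
                             rng.normal(size=(3, 3, 4)),
                             rng.normal(size=(4, 6)))
print(y.shape)  # (6, 6, 6)
```

The depthwise stage needs only one spatial filter per input channel, and all cross-channel mixing is deferred to the cheap 1x1 pointwise stage, which is where the parameter saving comes from.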
To detect brain tumors of varying shape and size, separable convolutions of different kernel sizes are required. Inspired by the concept of the inception neural network, the encoder segment consists of separable convolutions with 1 x 1, 3 x 3, and 5 x 5 kernels, each followed by layer normalization. The extracted feature maps are fused with an add operation and then downsampled by max pooling. Fig. 2 shows the proposed encoder architecture of the MAG-Net model for an input feature map $F \in \mathbb{R}^{W \times H \times D}$, where $W$, $H$, and $D$ are the width, height, and depth of the feature map.
The decoder component follows the encoder block and reconstructs the spatial dimensions to generate an output mask with the same dimensions as the input. It upsamples the feature maps, concatenates them with the attention maps, and applies a separable convolution operation. Long skip connections are used to propagate the attention feature maps from encoder to decoder, recovering spatial information lost during downsampling in the encoder. Using attention in the skip connections helps the model suppress irrelevant features.
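The attention filtering on a skip connection can be sketched with the standard additive attention gate of Oktay et al. (cited above); this is a simplified NumPy version with bias terms omitted and illustrative shapes, not the exact MAG-Net implementation.

```python
import numpy as np

def attention_gate(x, g, w_x, w_g, psi):
    """Additive attention gate on a skip connection (a sketch after
    Oktay et al.; resampling and bias terms omitted for brevity).

    x        : (H, W, C) skip-connection features from the encoder
    g        : (H, W, C) gating features from the decoder path
    w_x, w_g : (C, C_int) linear maps into an intermediate space
    psi      : (C_int, 1) projection giving one attention score per pixel
    """
    q = np.maximum(x @ w_x + g @ w_g, 0.0)      # ReLU(Wx*x + Wg*g)
    alpha = 1.0 / (1.0 + np.exp(-(q @ psi)))    # sigmoid -> (H, W, 1)
    return x * alpha                            # suppress irrelevant features

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 4, 8))
g = rng.normal(size=(4, 4, 8))
out = attention_gate(x, g, rng.normal(size=(8, 4)),
                     rng.normal(size=(8, 4)), rng.normal(size=(4, 1)))
print(out.shape)  # (4, 4, 8)
```

Each pixel's attention coefficient lies in (0, 1), so the gate can only attenuate skip features, never amplify them, which is how irrelevant regions are suppressed before concatenation in the decoder.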
The classification module classifies the brain tumor MRI images into their respective classes, i.e. meningioma, glioma, and pituitary tumor, utilizing the features extracted by the encoder. The encoder block thus acts as a backbone for both classification and segmentation, reducing the overall complexity of the model. In this classification block, the feature maps of the last encoder block act as input and are transformed into a 1D tensor using global average pooling. The pooled feature maps are then processed by multiple fully connected layers. The classification output is generated by a softmax-activated layer that produces a probability distribution over the tumor classes for an image.
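The pooling-then-softmax path can be sketched as follows; a single dense layer stands in for the multiple fully connected layers of MAG-Net, and all shapes and weights are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(feature_maps, w, b):
    """Global average pooling over the last encoder block's feature
    maps, then a dense softmax layer over the three tumor classes
    (one dense layer here for brevity; MAG-Net uses several)."""
    pooled = feature_maps.mean(axis=(0, 1))   # GAP: (H, W, C) -> (C,)
    return softmax(pooled @ w + b)            # class probabilities

rng = np.random.default_rng(2)
feats = rng.normal(size=(16, 16, 32))         # last encoder block output
probs = classify(feats, rng.normal(size=(32, 3)), np.zeros(3))
classes = ["meningioma", "glioma", "pituitary"]
print(probs.shape, round(float(probs.sum()), 6))  # (3,) 1.0
```

Global average pooling collapses each feature map to one scalar, so the classification head adds very few parameters on top of the shared encoder backbone.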
4 Experiment and Results
4.1 Dataset Setup
The present research work utilizes the Figshare dataset, which comprises 2D MRI scans with T1-weighted contrast-enhanced modality acquired from 233 patients, for a total of 3064 MRI scans. The T1 modality highlights distinct features of the brain tumor, with three classes representing the tumor type: meningioma (708 slices), glioma (1426 slices), and pituitary (930 slices), forming 23%, 46.5%, and 30% of the dataset respectively. Sample MRI slices of the different tumor classes are presented in Fig. 3. The dataset is randomly split into an 80% training set and a 20% validation set. The training and testing composition is kept the same throughout the experimental trials for comparative analysis.
4.2 Training and Testing
The MAG-Net model is trained and evaluated on the Figshare dataset. The training phase is accompanied by early stopping to tackle the overfitting problem, with Adam as the learning rate optimiser. Cross-entropy based loss functions are the most popular choices for training and validating segmentation and classification tasks. Following this, binary cross entropy and categorical cross entropy are employed to train the model for binary tumor mask generation and classification respectively. Binary cross entropy (BCE, shown in Eq. 1) applies a sigmoid activation followed by cross-entropy loss, comparing each predicted probability to the actual output. Categorical cross entropy (CE, shown in Eq. 2) applies a softmax activation followed by cross-entropy loss, comparing the output probability over each tumor class for each MRI image.

$$BCE = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right] \quad (1)$$

where $y_i$ represents the actual tumor mask, $\hat{y}_i$ represents the predicted tumor mask, and $N$ is the total number of images.

$$CE = -\sum_{c=1}^{M} y_c\log(p_c) \quad (2)$$

where $M$ is the number of classes, $p_c$ is the predicted probability of occurrence of each class, and $y_c$ is 1 for the true label and 0 for the others.
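The two losses can be sketched directly in NumPy; the mask and probability values below are illustrative, and a small epsilon clip guards the logarithms.

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross entropy averaged over the mask entries (as in Eq. 1)."""
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def cce(y_true, y_pred, eps=1e-7):
    """Categorical cross entropy over the tumor classes (as in Eq. 2)."""
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.sum(y_true * np.log(p))

mask_true = np.array([1.0, 0.0, 1.0, 0.0])    # flattened ground-truth mask
mask_pred = np.array([0.9, 0.1, 0.8, 0.2])    # sigmoid outputs
onehot = np.array([0.0, 1.0, 0.0])            # true class: glioma
class_pred = np.array([0.1, 0.7, 0.2])        # softmax outputs
print(round(float(bce(mask_true, mask_pred)), 4),
      round(float(cce(onehot, class_pred)), 4))  # 0.1643 0.3567
```

Note that CE reduces to the negative log-probability assigned to the true class, which is why confident correct predictions drive the loss toward zero.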
For segmentation, the most popular evaluation metrics are the dice coefficient (shown in Eq. 3) and intersection-over-union (IoU / Jaccard index, shown in Eq. 4), and hence both are utilized to evaluate the trained MAG-Net model. TP denotes correctly classified positive predictions, FP denotes wrongly classified positives, and FN denotes missed objects, for each voxel.

$$Dice = \frac{2TP}{2TP + FP + FN} \quad (3)$$

$$IoU = \frac{TP}{TP + FP + FN} \quad (4)$$
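Both metrics follow directly from the TP/FP/FN counts of the binary masks; the following sketch uses tiny illustrative masks.

```python
import numpy as np

def dice_iou(pred, true):
    """Dice (Eq. 3) and IoU (Eq. 4) from the TP/FP/FN counts of
    binary segmentation masks."""
    tp = int(np.sum((pred == 1) & (true == 1)))
    fp = int(np.sum((pred == 1) & (true == 0)))
    fn = int(np.sum((pred == 0) & (true == 1)))
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou

pred = np.array([[1, 1, 0], [0, 1, 0]])   # predicted tumor mask
true = np.array([[1, 0, 0], [0, 1, 1]])   # ground-truth tumor mask
d, j = dice_iou(pred, true)
print(round(float(d), 3), round(float(j), 3))  # 0.667 0.5
```

Dice weights the overlap twice, so it is always at least as large as IoU; the two are monotonically related and rank models in the same order.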
To evaluate the classification module of MAG-Net, accuracy, precision, recall, F1-score, and micro average metrics are considered for better quantification and visualization of model performance. Precision of a class (shown in Eq. 5) quantifies the positive prediction accuracy of the model. Recall (shown in Eq. 6) is the fraction of true positives that are classified correctly. F1-score (shown in Eq. 7) is the harmonic mean of precision and recall. Support quantifies the true occurrences of the respective class in the specified dataset. The micro average (shown in Eq. 8, Eq. 9, and Eq. 10) is calculated for precision, recall, and F1-score. To compute the micro average, the test dataset is divided into two sub-datasets, on each of which the true positive, false positive, and false negative predictions are identified.

$$Pr = \frac{TP}{TP + FP} \quad (5) \qquad Re = \frac{TP}{TP + FN} \quad (6) \qquad F1 = \frac{2 \cdot Pr \cdot Re}{Pr + Re} \quad (7)$$

$$Pr_{micro} = \frac{TP_1 + TP_2}{TP_1 + TP_2 + FP_1 + FP_2} \quad (8)$$

$$Re_{micro} = \frac{TP_1 + TP_2}{TP_1 + TP_2 + FN_1 + FN_2} \quad (9)$$

$$F1_{micro} = \frac{2 \cdot Pr_{micro} \cdot Re_{micro}}{Pr_{micro} + Re_{micro}} \quad (10)$$

where $TP_1$, $FP_1$, and $FN_1$ belong to the first sub-dataset, $TP_2$, $FP_2$, and $FN_2$ belong to the second, and $F1_{micro}$ is the harmonic mean of $Pr_{micro}$ and $Re_{micro}$.
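The micro-averaging scheme pools the counts from the two sub-datasets before computing the ratios, as the following sketch shows; the counts are illustrative.

```python
def micro_prf(tp, fp, fn):
    """Micro-averaged precision, recall, and F1 (Eqs. 8-10): TP/FP/FN
    counts from the test sub-datasets are pooled before the ratios are
    computed, rather than averaging per-subset scores."""
    TP, FP, FN = sum(tp), sum(fp), sum(fn)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# TP/FP/FN identified on each of the two sub-datasets (illustrative).
p, r, f = micro_prf(tp=[8, 6], fp=[2, 2], fn=[1, 3])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.778 0.778 0.778
```

Pooling the raw counts means each prediction carries equal weight regardless of which subset it came from, which is the defining property of micro averaging.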
The MAG-Net outputs the segmented mask of a given MRI image, with the tumor region corresponding to meningioma, glioma, or pituitary as classified by the model. For randomly chosen MRI slices, Fig. 4 presents the segmentation and classification results of the model. The visual representation confirms that the results are close to the ground truth of the respective tumor classes.
[Table 1: segmentation results, with columns Model, Accuracy, Loss, Dice coefficient, Jaccard index, and Parameters. *Bold quantities indicate the best results.]
Table 1 presents the segmentation results of the proposed work in terms of accuracy, loss, dice coefficient, Jaccard index, and trainable parameters, along with a comparative analysis against other popular approaches. The proposed framework outperforms the other approaches in segmenting tumors, with dice and IoU scores of 0.74 and 0.60 respectively. In contrast to the other models, MAG-Net achieves the best results with minimal trainable parameters. The other popular approaches considered for comparative segmentation analysis are U-Net, U-Net++ [18, 31], and wU-Net.
[Table 2: classification results. *Bold quantities indicate the best results.]
Comparative analysis of MAG-Net with the existing classification models on the test dataset using confusion matrices.
Table 2 and Table 3 present the classification results of the proposed work in terms of accuracy, loss, confusion matrix, and classification report for meningioma, glioma, and pituitary tumors, along with a comparative analysis against other state-of-the-art approaches: VGG-16, VGG-19, and ResNet50. Exhaustive experimental trials show that MAG-Net outperforms the existing approaches by a significant margin across all metrics.
[Table 3: classification report. *Bold quantities indicate the best results.]
It is observed that, unlike other state-of-the-art models, the MAG-Net model achieves promising results due to reduced overall computation, better feature extraction, and optimized training parameters. As shown in Table 1, the raw U-Net displays similar performance but at the cost of a large number of trainable parameters. In the MAG-Net model, the encoder block replaces standard convolution layers with depthwise separable convolutions of various kernel sizes connected in parallel, resulting in better multi-scale feature learning for varying tumor shapes and sizes. To reduce spatial loss during feature reconstruction, an attention mechanism is used in the skip connections. To reduce the overall complexity of the model, the features extracted by the encoder blocks are reused to classify the type of brain tumor.
In this paper, the complex task of brain tumor segmentation and classification is addressed using the multi-task attention guided network (MAG-Net). This is a U-Net based model featuring reduced overall computation, better feature extraction, and optimized training parameters. The proposed architecture achieves significant performance on the Figshare brain tumor dataset by exploiting the state-of-the-art advantages of U-Net, depthwise separable convolution, and the attention mechanism. The MAG-Net model recorded the best classification and segmentation results compared to the existing classification and segmentation approaches. It is believed that this work can also be extended to other domains involving classification and segmentation tasks.
We thank our institute, Indian Institute of Information Technology Allahabad (IIITA), India and Big Data Analytics (BDA) lab for allocating the centralised computing facility and other necessary resources to perform this research. We extend our thanks to our colleagues for their valuable guidance and suggestions.
-  Afshar, P., Mohammadi, A., Plataniotis, K.N.: Brain tumor type classification via capsule networks. In: 2018 25th IEEE International Conference on Image Processing (ICIP). pp. 3129–3133. IEEE (2018)
-  Albawi, S., Mohammed, T.A., Al-Zawi, S.: Understanding of a convolutional neural network. In: 2017 International Conference on Engineering and Technology (ICET). pp. 1–6. IEEE (2017)
-  Brownlee, J.: Use early stopping to halt the training of neural networks at the right time. https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/ (2018), [Online; accessed April 17, 2021]
-  Cancer.Net: Brain tumor: Diagnosis. https://www.cancer.net/cancer-types/brain-tumor/diagnosis (2020), [Online; accessed March 20, 2021]
-  CarryHJR: Nested unet. https://github.com/CarryHJR/Nested-UNet/blob/master/model.py. (2020), [Online; accessed March 11, 2021]
-  Cheng, J.: Brain tumor dataset (Apr 2017)
-  Cheng, J., Huang, W., Cao, S., Yang, R., Yang, W., Yun, Z., Wang, Z., Feng, Q.: Enhanced performance of brain tumor classification via tumor region augmentation and partition. PloS one 10(10), e0140381 (2015)
-  Deepak, S., Ameer, P.: Brain tumor classification using deep cnn features via transfer learning. Computers in biology and medicine 111, 103345 (2019)
-  Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S., Pal, C.: The importance of skip connections in biomedical image segmentation. In: Deep learning and data labeling for medical applications, pp. 179–187. Springer (2016)
-  Díaz-Pernas, F.J., Martínez-Zarzuela, M., Antón-Rodríguez, M., González-Ortega, D.: A deep learning approach for brain tumor classification and segmentation using a multiscale convolutional neural network. Healthcare 9(2), 153 (2021). https://doi.org/10.3390/healthcare9020153, https://app.dimensions.ai/details/publication/pub.1135094000 and https://www.mdpi.com/2227-9032/9/2/153/pdf
-  Hinton, G.E., Sabour, S., Frosst, N.: Matrix capsules with em routing. In: International conference on learning representations (2018)
-  Işın, A., Direkoğlu, C., Şah, M.: Review of mri-based brain tumor image segmentation using deep learning methods. Procedia Computer Science 102, 317–324 (2016)
-  Ismael, M.R., Abdel-Qader, I.: Brain tumor classification via statistical features and back-propagation neural network. In: 2018 IEEE international conference on electro/information technology (EIT). pp. 0252–0257. IEEE (2018)
-  Jain, A.: brain tumor segmentation u-net. https://github.com/adityajn105/brain-tumor-segmentation-unet (2020), [Online; accessed January 08, 2021]
-  Jamel, T.M., Khammas, B.M.: Implementation of a sigmoid activation function for neural network using fpga. In: 13th Scientific Conference of Al-Ma’moon University College. vol. 13 (2012)
-  Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021)
-  MrGiovanni: U-net++ keras. https://github.com/MrGiovanni/UNetPlusPlus (2020), [Online; accessed March 12, 2021]
-  Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., Glocker, B., Rueckert, D.: Attention u-net: Learning where to look for the pancreas (2018)
-  Pashaei, A., Sajedi, H., Jazayeri, N.: Brain tumor classification via convolutional neural network and extreme learning machines. In: 2018 8th International conference on computer and knowledge engineering (ICCKE). pp. 314–319. IEEE (2018)
-  Punn, N.S., Agarwal, S.: Chs-net: A deep learning approach for hierarchical segmentation of covid-19 infected ct images. arXiv preprint arXiv:2012.07079 (2020)
-  Punn, N.S., Agarwal, S.: Inception u-net architecture for semantic segmentation to identify nuclei in microscopy cell images. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 16(1), 1–15 (2020)
-  Punn, N.S., Agarwal, S.: Multi-modality encoded fusion with 3d inception u-net and decoder model for brain tumor segmentation. Multimedia Tools and Applications pp. 1–16 (2020)
-  Punn, N.S., Agarwal, S.: Modality specific u-net variants for biomedical image segmentation: A survey. arXiv preprint arXiv:2107.04537 (2021)
-  raghakot: keras-resnet. https://github.com/raghakot/keras-resnet (2017), [Online; accessed March 18, 2021]
-  Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241. Springer (2015)
-  Ruder, S.: An overview of gradient descent optimization algorithms (2017)
-  Thakur, R.: step by step vgg16 implementation in keras for beginners. https://towardsdatascience.com/step-by-step-vgg16-implementation-in-keras-for-beginners-a833c686ae6c (2019), [Online; accessed March 20, 2021]
-  Zhang, Z., Sabuncu, M.R.: Generalized cross entropy loss for training deep neural networks with noisy labels. arXiv preprint arXiv:1805.07836 (2018)
-  Zhou, T., Ruan, S., Canu, S.: A review: Deep learning for medical image segmentation using multi-modality fusion. Array 3, 100004 (2019)
-  Zhou, Z., Siddiquee, M., Tajbakhsh, N., Liang, J.: Unet++: A nested u-net architecture for medical image segmentation. arXiv preprint arXiv:1807.10165 (2018)