Deep Learning Based Brain Tumor Segmentation: A Survey

07/18/2020 ∙ by Zhihua Liu, et al. ∙ University of Leicester 46

Brain tumor segmentation is a challenging problem in medical image analysis. The goal of brain tumor segmentation is to generate accurate delineations of brain tumor regions with correctly located masks. In recent years, deep learning methods have shown very promising performance in solving various computer vision problems, such as image classification, object detection and semantic segmentation. A number of deep learning based methods have been applied to brain tumor segmentation and achieved impressive performance. Considering the state-of-the-art technologies and their performance, the purpose of this paper is to provide a comprehensive survey of recently developed deep learning based brain tumor segmentation techniques. The works included in this survey extensively cover technical aspects such as the strengths and weaknesses of different approaches, pre- and post-processing frameworks, datasets and evaluation metrics. Finally, we conclude the survey by discussing potential directions for future research.




I Introduction

Information services in computer-assisted intervention have long been considered an important tool in medical imaging applications. These applications are commonly found in basic medical research and clinical treatment, e.g. computer-aided diagnosis [34], medical record data management [68], medical robots [111] and medical image analysis [76]. Medical image analysis can provide precise guidance for medical professionals to understand diseases and investigate clinical challenges in order to improve health-care quality. Among the various tasks in medical image analysis, brain tumor segmentation has attracted much attention in the research community and has been continuously studied (illustrated in Fig. 1 (a)). In spite of researchers' tireless efforts, brain tumor segmentation remains a key unsolved challenge. One of the major reasons is that brain tumors may appear at any location inside the human brain with different shapes and sizes. Low-quality imaging and diffuse boundaries between anomalous and normal tissues also make it difficult to obtain sufficient segmentation accuracy. Given the promising performance of powerful deep learning methods, a number of deep learning based methods have been applied to brain tumor segmentation to extract feature representations automatically, achieving promising system performance as illustrated in Fig. 1 (b).

Fig. 1: Growth of professional attention on deep learning technologies. (a) Keyword frequency map in MICCAI from 2018 to 2019. The size of a keyword is proportional to its frequency. We can observe that 'brain', 'tumor', 'segmentation' and 'deep learning' have drawn huge attention in the community. (b) The number of deep learning based solutions in each year's multimodal brain tumor segmentation challenge (BraTS). We observe that researchers have shifted their interest to deep learning based segmentation methods since 2012 (blue dashed line), due to the powerful feature learning ability and impressive performance of deep learning techniques. Best viewed in color.
Fig. 2: Exemplar input dataset with different MRI modalities and corresponding segmentation outputs. Each column represents a unique MRI modality (from left to right: T1, T1c, T2, FLAIR). The last column is the corresponding manual segmentation output.

Gliomas are among the most common primary brain tumors; they stem from astrocytes, the glial cells that form the structural backbone of the brain, where 'primary' means that the tumors originate in the brain rather than elsewhere. The World Health Organization (WHO) reports [83] that gliomas can be graded from level one to four based on their microscopic images and tumor behaviors. Grades I and II are Low-Grade Gliomas (LGG), which are close to benign and slow-growing. Grades III and IV are High-Grade Gliomas (HGG), which are cancerous and aggressive. Current treatment includes surgical removal followed by radiotherapy and chemotherapy. Image segmentation plays an active role in glioma diagnosis and treatment. For example, information from glioma image segmentation may help surgery planning, and hierarchical features from a glioma segmentation map may help postoperative observation and improve the survival rate. To quantify the outcome of image segmentation, we define brain tumor segmentation as follows: given an image frame from one or multiple MRI sequences, the system aims to automatically segment the tumor area from the surrounding tissues and to classify each voxel or pixel of the input data into a pre-set category; finally, the system returns the segmentation map of the corresponding data. Fig. 2 shows one exemplar HGG and LGG dataset with different MRI sequences and segmentation maps.

Different imaging methods have individual advantages and drawbacks. For example, Computed Tomography (CT) is widely used on an emergency basis due to its fast imaging speed and its strength in revealing bone fractures, bleeding and organ injury. Magnetic Resonance Imaging (MRI) provides tissue details without radiation concerns, but it costs much more and usually takes longer for image reconstruction. In this survey, we focus on brain tumor and lesion segmentation methods, especially their architecture comparison and categorization. We wish to explore how different architectures affect deep neural networks' segmentation performance, and how different learning approaches can be further improved for brain tumor segmentation.

II Background

II-A Research Challenges

Despite significant progress that has been made in brain tumor segmentation, state-of-the-art deep learning based methods still experience unsatisfactory outcomes with several challenges to be solved. The challenges associated with brain tumor segmentation can be categorized as follows:

  • Location Uncertainty Gliomas arise from mutated glial cells, a kind of supportive cell surrounding nerve cells. Due to the wide spatial distribution of these supportive cells, either a High-Grade Glioma (HGG) or a Low-Grade Glioma (LGG) can appear at any location inside the brain.

  • Morphological Uncertainty Unlike a rigid object, the morphology, e.g. shape and size, of brain tumors varies across patients with large uncertainty. The external layer of a brain tumor, the edema tissue, shows varying fluid structures that barely provide any prior information for describing the tumor shape. The sub-regions of a tumor may also vary in shape and size.

  • Diffusion and Low Contrast High-resolution images with multi-modality channels (such as T1 and T2 in MRI) and high contrast are expected to contain more image information [80]. However, due to the image projection and tomography process, the sliced images are commonly of low quality, typically diffuse and of low contrast. The boundaries between biological tissues tend to be blurred and hard to detect, and cells near a boundary are hard to classify. This limits the ability of automated algorithms to extract sufficient information for further processing.

  • Annotation Bias Manual annotation depends heavily on personal experience, which can introduce annotation bias during dataset labeling. As the exemplar cases in Fig. 3 (a) show, some annotators tend to connect all the small regions into one large region, while others tend to label the voxels individually, making the ground-truth annotation sparse. Annotation biases have a huge impact on detection algorithms, which may be confused by them during learning and inference.

  • Imbalance Issue As the examples in Fig. 3 (b) and (c) show, the numbers of voxels in different tumor regions are unbalanced. For example, the NCR/ECT region is much smaller than the other two regions. This imbalance can also affect the learning algorithm, as the extracted features may be dominated by the larger tumor regions.

Fig. 3 illustrates the research challenges faced when applying automated segmentation algorithms to glioma tumors. The figure shows examples and statistics for detecting glioma tumors, and also illustrates other brain anomalies such as TBI lesions and ischemic stroke lesions.
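The imbalance issue described above is commonly mitigated by weighting the training loss per class. Below is a minimal sketch (NumPy; the label values and toy volume are purely illustrative, not taken from any surveyed method) that derives inverse-frequency class weights from an annotated volume, so that rare regions such as the tumor core contribute more to the loss:

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Weight each class by the inverse of its voxel frequency,
    normalized so the weights sum to the number of classes."""
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(float)
    freqs = counts / counts.sum()
    weights = 1.0 / np.maximum(freqs, 1e-8)   # rare classes get large weights
    return weights * num_classes / weights.sum()

# Toy volume: background (0) dominates, edema (1) is smaller, core (2) is rare.
labels = np.zeros((8, 8, 8), dtype=int)
labels[2:6, 2:6, 2:6] = 1   # edema
labels[3:5, 3:5, 3:5] = 2   # tumor core
w = inverse_frequency_weights(labels, 3)
```

These weights would typically multiply a per-voxel cross-entropy term, so the rarest class (here, the core) receives the largest weight.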

Fig. 3: Challenges in segmenting brain glioma tumors. (a) shows glioma tumor exemplars with various sizes and locations inside the brain. (b) and (c) show the statistical information of the training set in the multimodal brain tumor segmentation challenge 2017 (BraTS2017). The left hand side figure of (b) shows the FLAIR and T2 intensity projection, and the right hand side figure shows the T1ce and T1 intensity projection. (c) is the pie chart of the training data with labels, where the top figure shows the HGG labels while the bottom figure shows the LGG labels. There are clear region and label imbalances. Best viewed in colors.

II-B Related Problems

There are a number of problems closely related to brain tumor segmentation. Brain tissue segmentation, or anatomical brain segmentation, aims to label each voxel or pixel with a unique brain tissue class; it assumes that the brain image does not contain any tumor tissue or other anomalies [92, 30]. The goal of white matter lesion segmentation is to segment the white matter lesion region from the normal tissue. The white matter lesion itself does not contain sub-regions such as necrotic tissue and tumor cores, so segmentation may be achieved with binary classification methods. Tumor detection aims to detect abnormal tumor or lesion tissues and report the predicted class of each tissue, returning a bounding box as the detection result and a label as the classification result [42, 38, 37]. It is worth mentioning that some research works on "brain tumor detection" only return the bounding box of the tumor tissue as the detection result, and some methods return a single-label segmentation mask or the center of the tumor core as the point of interest without performing further reasoning or segmentation. In this paper, we focus on glioma tumor segmentation at the sub-region voxel (or pixel) level. Disorder diagnosis extracts pre-defined features from brain scan images and classifies the feature representations into pre-specified disorders such as High-Grade Glioma (HGG) vs Low-Grade Glioma (LGG), Mild Cognitive Impairment (MCI) [110], Alzheimer's Disease (AD) [109] and Schizophrenia [96]. Disorder diagnosis relies on classification algorithms and does not normally need segmentation masks. Survival prediction concerns identifying tumor patterns and activities [121] in order to predict the survival rate of a patient [116]; it normally returns a survival rate as a supplement to the clinical diagnosis and can be regarded as a downstream regression task of tumor segmentation or disease diagnosis.

Fig. 4: A taxonomy of this survey for deep learning based brain tumor segmentation.
Survey Title | Ref | Published | Year | Remarks
State of the art survey on MRI brain tumor segmentation | [45] | MRI | 2013 | Summary of segmentation methods before 2013
A survey of MRI-based brain tumor segmentation methods | [78] | Tsinghua Science and Technology | 2014 | Survey of methods used for brain MRI segmentation before 2014
A survey on deep learning in medical image analysis | [76] | MIA | 2017 | A comprehensive survey of deep learning methods in medical image analysis
Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: a review | [10] | AIM | 2018 | Review of convolutional neural networks used for brain MRI image analysis
Deep learning for brain MRI segmentation: state of the art and future directions | [2] | DI | 2017 | Review of deep learning based brain MRI segmentation methods
A guide to deep learning in health-care | [40] | Nature Medicine | 2019 | A survey on promoting health-care applications with deep learning techniques
Deep learning for generic object detection: A survey | [79] | arXiv | 2018 | A comprehensive review of deep learning based object detection
Deep learning | [70] | Nature | 2015 | An introductory review of deep learning and its applications
Recent advances in convolutional neural networks | [47] | PR | 2015 | A survey of convolutional neural networks and their applications in computer vision, language processing and speech
Probabilistic machine learning and artificial intelligence | [43] | Nature | 2015 | A survey of probabilistic machine learning methods and their applications
Deep Learning Based Brain Tumor Segmentation: A Survey | Ours | - | 2020 | A comprehensive survey of deep learning based brain tumor segmentation
TABLE I: A summary of the existing surveys.

II-C Difference From Previous Surveys

A number of notable medical image analysis surveys have been published in the last few years. We present recent relevant surveys and review papers in Table I. A survey of early state-of-the-art brain tumor segmentation methods was presented in [45]; most of the methodologies covered were proposed before 2013 and use conventional machine learning with hand-crafted features, e.g. region-growing and clustering methods. Liu et al. [78] reported a survey on MRI based brain tumor segmentation in 2014; this survey does not include any deep learning based method. A survey reported in [76] summarised deep learning based techniques for medical image analysis; it is a broad study of medical image processing that mentions several deep learning based brain tumor segmentation methods. Bernal et al. [10] reported a survey focusing on the use of deep convolutional neural networks for brain image analysis; it only highlights deep convolutional neural networks, while other important methods such as deep generative models and recurrent networks are not mentioned. Akkus et al. [2] presented a survey on deep learning for brain MRI segmentation. The methods presented in [2] are mainly based on convolutional neural networks, while other deep learning methods were not introduced; they also did not discuss important aspects such as datasets and data pre/post-processing. Recently, Esteva et al. [40] presented a survey on deep learning for health-care, summarizing how deep learning in computer vision, natural language processing, reinforcement learning and generalized methods promotes health-care applications. For a broader view of object detection and semantic segmentation, a recent survey was published in [79], providing further implications for object detection and semantic segmentation.

Narrowly speaking, the term "deep learning" refers to neural network models with stacked functional layers (usually five or more). Neural networks are able to learn high-dimensional hierarchical features and to approximate any continuous function. Considering the achievements and recent advances of deep neural networks, several surveys have reviewed the developed deep learning techniques, such as [70], [47] and [43].

II-D Data Collection

In this survey, we have collected and summarized the research studies reported in over one hundred scientific papers. Google Scholar and IEEE Xplore were the major search engines. We also searched major journals in the community, including Medical Image Analysis and IEEE Transactions on Medical Imaging. Furthermore, we examined the proceedings of relevant major conferences, such as ISBI, MICCAI, CVPR, ICCV and NIPS, to capture frontier research outcomes. We also examined annual challenges and their related competition entries, such as the Multimodal Brain Tumor Segmentation Challenge (MICCAI BraTS) and the Ischemic Stroke Lesion Segmentation Challenge (ISLES). In addition, pre-printed versions of some established methods on arXiv are also included as a source of information.

II-E Contribution of This Survey

With the breakthrough improvements made by deep learning in recent years, numerous deep learning based methods have been published for brain tumor segmentation and have achieved promising results. This paper provides a comprehensive and critical survey of current deep learning based brain tumor segmentation methods. We anticipate that this survey supplies useful guidelines and coherent technical insights for those who choose deep learning as a development tool for brain tumor segmentation. This survey makes the following contributions: (1) we summarize the current state-of-the-art deep learning based brain tumor segmentation algorithms; (2) we categorize these algorithms according to their structures or pipelines; (3) we discuss the challenges, open issues and potential research perspectives for deep learning based brain tumor segmentation.

The rest of this survey is organised as illustrated by the taxonomy shown in Fig. 4: In Sec. III, we review the methods of brain tumor segmentation. Related data augmentation, pre- and post-processing techniques are also discussed in Sec. IV. In Sec. V, we explore datasets, related challenging tasks and evaluation metrics. We point out several future research directions in Sec. VI and conclude this paper in Sec. VII.

Method Categories | Sub-Categories | Publications
CNN | Single-Path | [129, 113, 12, 55, 98, 93, 82]
CNN | Multi-Path | [49, 58, 123, 115, 26, 102, 16, 126, 28, 73]
FCN | Vanilla FCN | [11, 54, 12, 104, 18, 53, 3, 106, 87]
FCN | U/V-Net Based | [52, 36, 35, 60, 25, 32, 63, 89, 9, 103]
Cascaded CNN | - | [118, 15, 24, 77]
RNN | GRU | [4, 69]
RNN | CRF-RNN | [124]
Generative Model | GAN | [84, 57, 120, 99, 74]
Generative Model | Autoencoders | [114, 8, 22, 94]
Ensemble Models | - | [56, 59, 14, 19, 91, 50]
TABLE II: Overview of relevant papers for deep learning based brain tumor segmentation.
Fig. 5: A sketch comparison between traditional and deep learning based brain tumor segmentation algorithms.

III Deep Learning Based Brain Tumor Segmentation Methods

Researchers consider deep learning a rising subset of machine learning techniques. Rather than using pre-defined hand-crafted features, deep neural networks learn hierarchical features directly from the input images. A sketch comparison between traditional and deep learning based brain tumor segmentation algorithms is shown in Fig. 5. Deep learning methods require large amounts of training data to avoid over-fitting, and large computing resources to accelerate training. Combined with effective weight initialization and optimization strategies, deep learning methods have achieved state-of-the-art performance in various domains such as object detection [79] and natural language processing [33]. Recently, many researchers have also applied deep learning to medical image related tasks, such as chest X-ray image analysis [27] and breast image analysis [64].

There are many well-known deep learning methods such as convolutional neural networks and recurrent neural networks. In this survey, we categorize various deep learning based brain tumor segmentation methods into four classes according to different structures or fundamentals.

  • Convolutional Neural Networks Based Methods Since their first use in hand-written digit recognition [71] and image classification [66], convolutional neural networks have been widely used in computer vision. Convolutional Neural Networks (CNNs) contain stacked convolutional and pooling layers, which can effectively capture the translation invariance of the input signals.

  • Recurrent Neural Networks Based Methods

    Recurrent neural networks (RNNs) can learn representations of time-sequence inputs. RNNs contain a memory function that remembers and reuses previously learned information. Variants such as bidirectional RNNs (Bi-RNNs) and long short-term memory (LSTM) networks have achieved superior performance in applications such as video understanding and visual question answering. Most RNN-based brain tumor segmentation methods treat one dimension of the volumetric MRI or CT data as the time dimension; the slices formed by the other two dimensions become the sequential inputs of the RNN.

  • Deep Generative Model Based Methods

    Deep generative models have shown noticeable advances in data simulation and conditional density estimation. Recently, researchers have applied generative adversarial networks (GANs) and auto-encoders (AEs) to brain tumor segmentation. Generative adversarial networks are normally formed by a generator and a discriminator, trained together to play a minimax game. For brain tumor segmentation, the generator produces the tumor segmentation mask and the discriminator determines whether a given mask was generated or drawn from the ground-truth data [84]. Auto-encoders, on the other hand, aim to reconstruct healthy brain images from the training images; the difference between the healthy reconstruction and the original data yields the segmented tumor [114].

  • Model Ensembling

    Different networks behave very diversely, with different strengths and weaknesses, influenced by their main architectures and training settings. Recent research shows that model ensembling can average out the variances of the sub-models and their configuration-specific behaviors, leading to a less biased, more generic method with robust performance.
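In its simplest form, the ensembling described above averages the per-model class probability maps and takes a per-voxel argmax. A minimal NumPy sketch (the probability maps are toy values, not the outputs of any published ensemble):

```python
import numpy as np

def ensemble_segmentation(prob_maps):
    """Average softmax probability maps from several models and take the
    per-voxel argmax as the ensemble prediction.
    prob_maps: list of arrays of shape (num_classes, H, W)."""
    mean_probs = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return mean_probs.argmax(axis=0)

# Two models disagreeing on the second pixel of a 1x2 image (2 classes).
m1 = np.array([[[0.9, 0.4]], [[0.1, 0.6]]])   # model 1 votes: class 0, class 1
m2 = np.array([[[0.6, 0.8]], [[0.4, 0.2]]])   # model 2 votes: class 0, class 0
seg = ensemble_segmentation([m1, m2])
```

Averaging probabilities (rather than majority-voting hard labels) lets a confident model outvote an uncertain one, which is one reason probability-level ensembling is a common default.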

Fig. 6: A conceptual illustration of a deep learning based brain tumor segmentation system. Colored rectangles represent optional processing modules, while arrow lines represent the flow of data from the multi-modality input to the segmentation output. Not all the modules must be included to deliver the functionality.

Examining the main architectures, we categorize relevant deep learning based brain tumor segmentation methods proposed in recent years in Table II. Some categories contain sub-categories according to the core technologies used in each method; e.g. the CNN category contains two sub-categories, single- and multi-path CNNs. Some categories share technology with others; e.g. most RNN and generative models use a CNN backbone for feature extraction. Here, we pay more attention to the core idea of each method and classify the methods by their core technologies. Note that the existing methods could be categorized in other ways; for example, into 2D and 3D methods based on the input data dimensions. As the goal of this survey is to evaluate how different architecture designs affect segmentation performance, we use the structural concept for categorisation.

Note that a complete brain tumor segmentation pipeline (illustrated in Fig. 6) must include other components. However, in this survey, we focus on the similarities, differences, advantages and disadvantages of the segmentation algorithms. Other components, such as datasets, data pre-processing, data augmentation, post-processing and evaluation metrics, are discussed broadly in the following sections; more details can be found in the literature.

III-A Convolutional Neural Network Based Methods

Convolutional neural networks, often known as CNNs, have a strong ability in processing and learning features from images and videos. A CNN typically contains convolutional layers and pooling layers followed by an activation layer and finally a fully connected layer. CNNs have been widely used as the backbone network in various deep neural network applications. Here, we divide the CNN based methods into four categories: (1) single-path CNN, (2) multi-path CNN, (3) fully convolutional networks (FCNs), and (4) cascaded architecture.

III-A1 Single-Path and Multi-Path CNN

Single-path neural networks contain a single flow of data processing: the network samples the input image with convolutional layers, followed by pooling and non-linear rectifier layers. Many research works, e.g. [95, 129, 113, 12], use single-path networks for computational efficiency. Compared with single-path networks, multi-path convolutional neural networks can extract different features through processing pathways at different scales. The extracted features are then combined or concatenated for further processing. A common interpretation is that a path with a large-scale kernel learns features from a larger receptive field, known as global features, while a path with a small-scale kernel learns features from a smaller receptive field, known as local features. Global features provide global spatial information, such as the global position of the target object, while local features provide more concrete information, such as texture, size and boundary details. Early research work like [90] used a three-pathway CNN to segment brain MRI images: the local pathways contain kernels of size 5×5 and 7×7, while the global pathway uses a kernel of size 9×9; both the local and global pathways contain three convolutional layers and exploit the features simultaneously. Havaei et al. [49] reported a novel two-pathway structure that learns local brain information as well as context. The local pathway uses a 7×7 kernel and the global pathway a 13×13 kernel. To exploit cascaded CNN architectures, the authors designed several variants that concatenate the first CNN's output into different levels of the second CNN, testing three cascaded structures that concatenate the first CNN's output with (1) the input, (2) the local pathway and (3) the output of the second CNN. The vanilla two-pathway structure of [48] is shown in Fig. 7.

Fig. 7: A high level comparison between single-path CNN and two-path CNN.

Kamnitsas et al. [58] presented a dual-pathway CNN with fully connected conditional random fields as post-processing. Instead of using convolutional kernels of different sizes, the proposed network takes inputs with different patch sizes: a normal-resolution patch and a down-sampled low-resolution patch. To make the network deeper, the authors use small convolution kernels in both pathways. From the local pathway, with the normal-resolution segment as input, the network learns detailed local information of the tumor image such as texture and boundary. From the global pathway, with the low-resolution segment as input, the network focuses on learning global spatial information such as the location of the tumor.
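The design choice above, stacking small kernels instead of using one large kernel, can be checked with the standard receptive-field recurrence r ← r + (k − 1)·j, where j is the cumulative stride. A short illustrative sketch (not taken from any particular paper):

```python
def receptive_field(layers):
    """Compute the receptive field of a stack of convolutional layers.
    layers: list of (kernel_size, stride) tuples, applied in order."""
    r, j = 1, 1   # receptive field size and cumulative stride ("jump")
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Three stacked 3x3 stride-1 convolutions see exactly as far as one
# 7x7 kernel, but with more non-linearities and fewer parameters.
deep = receptive_field([(3, 1), (3, 1), (3, 1)])
shallow = receptive_field([(7, 1)])
```

This is why a deeper network of small kernels can match the spatial coverage of a shallow network of large kernels while remaining cheaper to train.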

There are other related systems that use multiple pathways to construct a convolutional neural network for feature learning at different scales. Zhao et al. [123] designed a multi-scale CNN with a large-scale path, a middle-scale path and a small-scale path, each taking a different input size. Sedlar et al. [102] trained two small convolutional neural networks for brain tumor segmentation, one extracting features from a local region and the other from a larger region. Choi et al. [26] presented three different architectures that combine fine and coarse features in order to obtain accurate segmentation for the prognosis of ischemic strokes.

III-A2 Fully Convolutional Based

In the early stage of using CNNs for image classification, the final layer is a fully connected layer whose output can be interpreted as the predicted label with the highest probability. Long et al. [81] introduced fully convolutional networks (FCNs), which replace the final fully connected layer with a deconvolutional layer and predict segmentation masks directly. The deconvolutional layer performs up-sampling, transforming the down-sampled feature map back to the original spatial size. FCNs can therefore be trained in an image-to-segmentation-map fashion and are computationally more efficient than CNN patch classifiers. One well-recognized variant of FCNs, known as U-Net [100], and its related usage in brain tumor segmentation will be discussed later.
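The replacement of fully connected layers with deconvolution can be sanity-checked with standard convolution arithmetic: an ordinary convolution maps spatial size n to ⌊(n − k + 2p)/s⌋ + 1, and a transposed (de)convolution maps n to (n − 1)·s − 2p + k. A quick sketch (the kernel/stride/padding values are illustrative, not from any specific network):

```python
def conv_out(n, k, s, p):
    """Spatial size after an ordinary convolution."""
    return (n - k + 2 * p) // s + 1

def deconv_out(n, k, s, p):
    """Spatial size after a transposed (de)convolution,
    the inverse of the ordinary convolution arithmetic."""
    return (n - 1) * s - 2 * p + k

# A stride-2 convolution halves a 64-pixel axis; the matching
# transposed convolution restores the original spatial size,
# which is what lets an FCN emit a full-resolution segmentation map.
down = conv_out(64, k=4, s=2, p=1)    # down-sampled feature map size
up = deconv_out(down, k=4, s=2, p=1)  # back to the input size
```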

Jesson et al. [54] extended the standard FCN by using a multi-scale loss function. One limitation of FCNs is that they do not explicitly model contexts in the label domain. The multi-scale loss provides different resolutions, and the FCN variant minimizes it by combining higher and lower resolutions to model contexts in both the image and label domains. In [104], researchers proposed a boundary-aware fully convolutional neural network with two up-sampling branches. The boundary-detection branch learns the boundary information of the whole tumor, treating it as a binary classification problem. The region-detection branch learns to detect and classify the sub-region classes of the tumor. The outputs of the two branches are concatenated and fed to a block of two convolutional layers with a softmax classification layer. The total loss function is

$L(W) = \sum_{b} L_b(W) = -\sum_{b} \sum_{i} \sum_{j} \log p\left(y_j^i \mid x_j^i; W\right),$

where $W$ is the set of weight parameters in the boundary-aware FCN, $L_b$ refers to the loss function of branch $b$ (boundary or region), $x_j^i$ is the $j$-th voxel in the $i$-th image used for training, and $p(y_j^i \mid x_j^i; W)$ refers to the predicted probability of voxel $x_j^i$ belonging to its ground-truth class $y_j^i$.
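The two-branch objective can be sketched numerically: each branch contributes a cross-entropy term and the totals are summed. A NumPy illustration with toy probabilities (a hedged sketch, not the authors' implementation):

```python
import numpy as np

def branch_cross_entropy(probs, targets):
    """Average negative log-likelihood of the target class per voxel.
    probs: (num_voxels, num_classes) rows summing to 1; targets: (num_voxels,)."""
    picked = probs[np.arange(len(targets)), targets]
    return -np.mean(np.log(np.maximum(picked, 1e-12)))

def total_loss(boundary_probs, boundary_t, region_probs, region_t):
    """Sum of the boundary-branch and region-branch losses,
    mirroring the boundary-aware FCN objective L = sum_b L_b."""
    return (branch_cross_entropy(boundary_probs, boundary_t)
            + branch_cross_entropy(region_probs, region_t))

# Toy predictions for 2 voxels per branch: boundary is binary,
# the region branch has 3 sub-region classes.
bp = np.array([[0.9, 0.1], [0.2, 0.8]]); bt = np.array([0, 1])
rp = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]); rt = np.array([0, 1])
loss = total_loss(bp, bt, rp, rt)
```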

One important variant of FCNs is U-Net, reported in [100]. U-Net consists of a contracting path that captures context and a roughly symmetric expanding path that enables precise localization; this symmetry leads to a U-shaped architecture. U-Net does not contain any fully connected layer and only uses the valid part of each convolution, so the segmentation map only contains pixels for which full context is available; context is missing when predicting near the border of each image. To supply the missing context, the input image is extrapolated by mirror padding. The vanilla structure of U-Net used for brain tumor segmentation is shown in Fig. 8.

Fig. 8: A high level comparison between different kinds of fully convolutional networks (FCNs).

One advantage of U-Net over traditional FCNs is the skip connections between the contracting and expanding paths. The skip connections pass feature maps from the contracting path to the expanding path and concatenate them directly with the feature maps there. The original image information carried through the skip connections helps the layers in the expanding path recover spatial details. Many research works for brain tumor or lesion segmentation have been based on U-Net. For example, Brosch et al. [12] used fully convolutional networks with skip connections to segment multiple sclerosis lesions. Isensee et al. [52] reported a modified U-Net for brain tumor segmentation, where the authors used a Dice loss function and extensive data augmentation to successfully prevent over-fitting. In [36], the authors used zero padding to keep the output dimension for all the convolutional layers in both the down-sampling and up-sampling paths. Dolz et al. [35] introduced an extended U-Net with multiple input channels and dense concatenation for lesion detection. Other methods such as [60, 25, 32, 63] also extended U-Net for better brain tumor segmentation results. Chang et al. [18] reported a fully convolutional neural network with residual connections; similar to skip connections, the residual connections allow both low-level and high-level feature maps to contribute to the final classification.

In order to extract rich feature information from the original 3D volume data, Milletari et al. [89] introduced a modified 3D version of U-Net, called V-Net, with a customized Dice coefficient loss function. Beers et al. [9] introduced sequential-task 3D U-Nets, which use the entire-tumor ground truth as an auxiliary channel to detect enhancing tumors and tumor cores. In the post-processing stage, the authors employed two additional U-Nets that serve to refine the prediction for better classification outcomes. The input patches consist of seven channels: four anatomical MR modalities and three label maps corresponding to the entire tumor, the enhancing tumor and the tumor core.
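The Dice loss popularized by V-Net follows directly from the overlap definition Dice = 2|X ∩ Y| / (|X| + |Y|); the loss is 1 − Dice. A NumPy sketch on a toy binary example (the V-Net paper uses a soft, differentiable version computed over predicted probabilities; the epsilon term here is a common smoothing convention, not the paper's exact formulation):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    """1 - Dice overlap between a predicted mask (or soft probability
    map) and the ground-truth binary mask. eps avoids division by zero
    when both masks are empty."""
    intersection = np.sum(pred * target)
    dice = (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)
    return 1.0 - dice

# Half-overlapping binary masks: Dice = 2*1 / (2 + 2) = 0.5, loss = 0.5.
pred = np.array([1, 1, 0, 0], dtype=float)
target = np.array([1, 0, 1, 0], dtype=float)
loss = dice_loss(pred, target)
```

Because the Dice loss normalizes by the sizes of both masks, a small tumor region contributes as much as a large background region, which is why it is a popular answer to the imbalance issue discussed in Sec. II-A.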

III-A3 Cascaded CNN

Spatial information, such as spatial inclusion relationships, can be applied as prior auxiliary knowledge for image segmentation; e.g. a common property of biological tissues is that the core of a tumor is surrounded by extended edema tissue. Such properties can be used either at the processing stage to segment sub-regions, or at the refining stage to remove false positive segmentations. Another problem is that anomaly regions occupy only a small percentage of pixels compared with background and normal tissue pixels. The imbalance between positive pixel samples (anomaly regions, e.g., tumors and lesions) and negative pixel samples (background and normal tissue pixels) may introduce prediction biases into the trained models. To reduce this imbalance by exploiting the sub-region hierarchy, various cascaded network structures have been proposed for brain tumor sub-region segmentation. Wang et al. [118] presented cascaded convolutional neural networks using sub-region hierarchical information to decompose a multi-class segmentation problem into three binary segmentation problems. As shown in Fig. 9, the first network takes multi-modal MRI images as input and produces a segmentation mask with a bounding box of the entire tumor. Using the bounding box, the cropped slice is used as the input of the second network to segment the tumor core from the tumor. This procedure repeats again for the third network to segment the enhancing core from the tumor core. For network training, bounding boxes and cropped slices are generated from the ground truth annotations. In the testing stage, a binary segmentation output is inferred and used as the input for the next stage, serving as an anatomical constraint for the segmentation. The authors used a stack of slices as input with a large receptive field in 2D and a relatively small receptive field in the out-of-plane direction that is orthogonal to the 2D slices.
In each network, dilated convolution and residual connections are used, where the dilated convolution kernel enlarges the receptive field and the residual connection allows a functional block to learn residual functions with reference to the input.
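The cropping step that passes one cascade stage's output to the next can be sketched as follows; the margin and image sizes are illustrative assumptions:

```python
import numpy as np

def mask_bounding_box(mask, margin=2):
    """Return the (row, col) bounding box of a binary mask, padded by a margin."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    r0 = max(r0 - margin, 0)
    c0 = max(c0 - margin, 0)
    r1 = min(r1 + margin, mask.shape[0] - 1)
    c1 = min(c1 + margin, mask.shape[1] - 1)
    return r0, r1, c0, c1

# Stage 1 predicts the whole-tumor mask; stage 2 only sees the cropped region.
whole_tumor = np.zeros((240, 240), dtype=bool)
whole_tumor[100:140, 90:150] = True
r0, r1, c0, c1 = mask_bounding_box(whole_tumor)
crop = whole_tumor[r0:r1 + 1, c0:c1 + 1]
print(crop.shape)  # (44, 64)
```

Restricting the second stage to the crop both enforces the spatial inclusion prior and sharply reduces the background/foreground imbalance that the stage sees.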

Method Dimension Structure Pros Cons
[129] 2D Single-Path CNN Pioneering work applying CNNs to brain tumor segmentation. A basic CNN structure with limited performance.
[113] 2D Single-Path CNN Multi-modal sequences as input. Simple CNN structure with a large parameter size.
[12] 3D Single-Path CNN Pioneering work using 3D CNNs for brain tumor segmentation. Needs more parameters to process the additional dimension.
[55] 2D Single-Path CNN Uncertainty representation via Monte Carlo Dropout on the parameters. The original FRRN was designed for high-resolution images and is less suitable for low-resolution medical images.
[93] 2D Stacked residual convolutional encoder-decoder First method using residual convolutional encoder-decoder structures. Decoder layers generate segmentation maps with overly smooth boundaries.
[82] 2D Dilated CNN First work using dilated convolutional kernels. Limited ability to learn tissue deformation.
[123] 2D Multi-Path CNN Multi-scale convolutional kernels. A basic multi-path CNN with limited performance.
[11] 2D Multi-Path CNN Treats each view as an input sequence and combines features from different views. The combined features may introduce additional parameters.
[54] 3D FCN FCN with multi-scale loss on 3D images. Needs more parameters for the additional dimension.
[104] 2D FCN FCN with multiple tasks. Blurry results on sub-region segmentation.
[52] 2D U-Net Extended U-Net for brain tumor segmentation. A basic extension of U-Net.
[118] 2D Cascaded CNN Precise segmentation of sub-regions. Hard to train multiple large CNNs.
TABLE III: Comparison of CNN based methods.

The main advantage brought by cascaded structures is that they enable us to convert multi-class segmentation into sequential binary segmentation tasks with the flow of the cascaded outputs. Casamitjana et al. [15] reported two cascaded V-Nets utilizing a region of interest (ROI) mask, which allows the training procedure to focus on relevant voxels. Chen et al. [24] described a cascaded classifier for multi-class segmentation. Liu et al. [77] used cascaded structures to achieve feature fusion from different standard feature extraction methods.

Fig. 9: The structure of cascaded convolutional networks for brain tumor segmentation, modified from the original structure reported in [118]. WNet, TNet and ENet are used for segmenting the whole tumor, tumor core and enhancing tumor core, respectively.

III-A4 Summary

Convolutional neural network based models have recently become a popular method in image processing. Convolutional kernels can efficiently extract features of low or high dimensions, and the pooling layers used in the networks can learn translation invariance while reducing the number of parameters through feature dimensionality reduction. Various applications have shown the research and practical value of using convolutional neural network based models for brain tumor segmentation. As an extension of single-path feed-forward CNNs, multi-path CNNs aim to extract different features from either a global or a local field, or different features from different modalities. Fully convolutional networks aim to reconstruct an image using deconvolutional layers. FCNs take input patches of any size and reduce the parameter count by replacing fully connected layers with deconvolutional layers. The main shortcoming of FCNs is that the result of the up-sampling operation tends to be blurry and insensitive to small image details, which limits their performance in medical image analysis. Cascaded networks can take sub-regions' spatial relationships into account, although training multiple sub-networks is harder than training a single end-to-end network. A comparison of selected CNN based methods is shown in Table III.

III-B Recurrent Neural Network Based

Fig. 10: Illustration of Gated Recurrent Units (GRU). Left: a C-GRU on a one-dimensional graph. Right-top: the 6 C-GRUs within an MD-GRU. Right-bottom: the network structure reported in [4].
Recurrent Neural Networks (RNNs) were originally designed for sequential processing problems, where the current output depends on the current input and the representation of the previous inputs. One variant of recurrent neural networks is the Long Short-Term Memory (LSTM) network, in which historical information can be stored and processed over long sequences inside the LSTM cell. For a two-dimensional image, LSTM connects hidden units in a grid along four directions, i.e. up, down, left and right. Graves et al. [46] and Byeon et al. [13] reported pioneering work applying Multi-Dimensional LSTMs to image classification and segmentation tasks. The LSTM units recursively gather information from the preceding LSTM units based on adjacent pixels. Rather than processing 3D voxel data as 2D slices, 3D LSTMs can directly process full voxel contexts through 8 adjacent voxels in 8 sweeps (each sweep follows one of the 8 directed volume diagonals).

Gated Recurrent Units (GRU) are another variant of RNNs, which, compared with LSTM, use an update gate to combine the hidden and cell states instead of separate forget and input gates. GRUs require less memory and are therefore more suitable for processing large volumetric data, which allows the network to be deeper and larger for the same volume size. The authors of [4] used Multi-Dimensional Gated Recurrent Units (MD-GRU) for brain tumor segmentation. The cell block within the MD-GRU is a convolutional GRU (C-GRU), shown in Fig. 10. The C-GRU is defined as:

r_t = σ(W_r ∗ x_t + U_r ∗ h_{t−1}),
z_t = σ(W_z ∗ x_t + U_z ∗ h_{t−1}),
h̃_t = φ(W ∗ x_t + U ∗ (r_t ⊙ h_{t−1})),
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t,

where x_t is the input data, r_t is the reset gate and z_t is the update gate. ⊙ is the element-wise multiplication, σ is the logistic activation function and φ is the activation function (e.g. tanh). W and U are the parameters for the current and the previous states' output respectively, and ∗ is the convolution operation. A Multi-Dimensional GRU (MD-GRU) contains one C-GRU per direction of each dimension of the input image or voxel data (6 for 3D volumes). Similar to [108], which uses parallel multi-dimensional LSTMs for brain MR segmentation, each C-GRU can only handle signals along one direction of one dimension. Preprocessing of the tumor data, such as high-pass filtering and intensity normalization, is used. Detailed structures of MD-GRU are shown in Fig. 10.
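A minimal single-channel, one-dimensional version of the C-GRU update can be sketched in NumPy; the kernel size, random initialisation and the use of `tanh` as the activation are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv_gru_step(x, h_prev, params):
    """One C-GRU update on a 1-D, single-channel signal (a minimal sketch).
    `params` holds the six convolution kernels (w_r, u_r, w_z, u_z, w, u)."""
    w_r, u_r, w_z, u_z, w, u = params
    conv = lambda s, kern: np.convolve(s, kern, mode="same")
    r = sigmoid(conv(x, w_r) + conv(h_prev, u_r))        # reset gate
    z = sigmoid(conv(x, w_z) + conv(h_prev, u_z))        # update gate
    h_tilde = np.tanh(conv(x, w) + conv(r * h_prev, u))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde              # new hidden state

rng = np.random.default_rng(0)
params = [rng.standard_normal(3) * 0.1 for _ in range(6)]
h = np.zeros(16)
for _ in range(4):                    # sweep over a short input sequence
    h = conv_gru_step(rng.standard_normal(16), h, params)
print(h.shape)  # (16,)
```

In an actual MD-GRU, one such cell runs per sweep direction per dimension and the resulting hidden states are combined.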

Le et al. [69] combined a recurrent fully convolutional network (RFCN) with variational level sets (VLS) for tumor segmentation. Unlike previous studies using recurrent structures for pixel-wise adjacent information concatenation, in RFCN the deconvolutional layer takes the previous convolutional layer's output as its input feature map; the output of the deconvolutional layer, which up-samples the input feature map with jointly learned parameters, is then used as part of the input to the next convolutional layer via a skip connection.

Method Dimension Structure Pros Cons
[4] 3D 3D GRU First work using GRUs for brain tumor segmentation. Lacks hierarchical feature extraction.
[69] 2D 2D Recurrent Level Sets End-to-end FCN with recurrent level sets. Large parameter size, as the recurrent level sets increase the dimension.
[124] 2D FCNN with CRF-RNN CRF as RNN to refine the results. Requires separate training stages for the FCNN and the CRF-RNN.
TABLE IV: Comparison of RNN based methods.

In order to remove false positives in the refining stage while training the network in an end-to-end fashion, researchers [20, 21] have applied fully connected conditional random fields to segmentation networks. Zheng et al. [125] showed that conditional random fields (CRFs) can be formulated as recurrent neural networks (CRF-RNN): the inference step within a CRF can be regarded as a sequence of recursions, which can be back-propagated through like a recurrent neural network. Zhao et al. [124] first used CRF-RNN for post-processing the segmentation map. The authors use an FCNN to produce segmentation labels for the image pixels; the CRF-RNN then takes the segmentation output and the original image as inputs to produce a refined segmentation map using pixel intensity and position information. The training procedure is divided into three steps: first, image samples are randomly selected to avoid class imbalance when training the FCNN; then the CRF-RNN is trained with the FCNN parameters fixed; finally, the entire network is fine-tuned. By integrating the FCNN with CRF-RNN, this method achieves promising performance on the BRATS2015 dataset with competitive computational efficiency.

III-B1 Summary

Recurrent neural networks are effective at handling sequence processing problems. LSTM and GRU equip neural networks with the ability to filter historical information using memory and forget gates. Existing RNN based models take the third dimension of 3D voxel data as the time-sequence axis, which extends the networks' ability to use spatial information. A comparison of selected RNN based methods is shown in Table IV.

III-C Deep Generative Model Based

Deep neural networks still face several challenges. First, like many supervised methods, deep neural networks rely on the same hypothesis as traditional machine learning techniques: they can only handle testing data drawn from the same distribution as the training data. In real-world datasets, there often exists a discrepancy between the distributions of the testing and training data, which may lead to biased predictions or model over-fitting. Secondly, dataset labeling is labor and time consuming, especially for dense pixel-level segmentation datasets in medical applications. Another drawback is that the majority of deep learning models focus on pixel-level classification whilst ignoring the relations between adjacent pixels; this may lead to high pixel-level classification accuracy but inconsistent segmentation results, due to the variations in size and shape of the targets.

III-C1 Generative Adversarial Networks

As a rising subset of generative models, the generative adversarial network (GAN) [44] was proposed to tackle the above issues. The generator and the discriminator play a min-max game to optimize a model that can produce results close to the ground truth. Lu et al. [84] proposed the pioneering work applying GANs to image semantic segmentation, where the discriminator distinguishes ground truth segmentation masks from generated ones. The experimental results also show that this method reduces the risk of over-fitting.

By holding the hypothesis that different image acquisitions represent different domains, Kamnitsas et al. [57] reported the first work using a domain adaptation method based on adversarial neural networks for brain lesion segmentation, where a 3D convolutional neural network serves as the segmenter and a second 3D convolutional neural network serves as the discriminator to classify the domain of the input. Given an input of arbitrary size, the segmenter minimizes the cross-entropy segmentation loss over a training batch of samples. During segmentation, the feature maps in the segmenter encode a chosen hidden representation h. The domain discriminator intends to classify whether the input comes from the source or the target distribution, which is equivalent to classifying the source domain of h. The classification accuracy shows how source-specific the representation is; with adversarial training, the hidden representation h should become domain-invariant. The joint training aims to maximize the domain classification loss L_adv whilst minimizing the segmentation loss L_seg:

L(θ_seg) = L_seg(θ_seg) − α L_adv(θ_seg),

where θ_seg are the parameters of the segmenter and α weighs the importance of the domain-adaptation task for the segmenter. Previous research on domain adaptation differs considerably; for example, Tzeng et al. [112] only chose the last three fully connected layers to adapt. The authors of [57] instead built a multi-connected architecture between the discriminator and the segmenter network layers. This allows the gradients of the discriminator loss to flow into the segmenter's network layers. This multi-connected mechanism also benefits both the shallow layers' feature extraction and the deep layers' high-level feature learning, resulting in competitive results close to supervised segmentation. The architecture is shown in Fig. 11.
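The joint segmenter objective combines the segmentation loss with a negatively weighted adversarial loss, so that minimizing it pushes the domain classification loss up. A toy numeric sketch follows; all probability values, the batch of three voxels and the value of alpha are hypothetical:

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy of predicted class probabilities vs integer labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

# Hypothetical per-voxel class probabilities from the segmenter
seg_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
seg_labels = np.array([0, 1, 0])
# Hypothetical domain probabilities from the discriminator (source vs target)
dom_probs = np.array([[0.6, 0.4], [0.55, 0.45], [0.5, 0.5]])
dom_labels = np.array([0, 0, 1])

alpha = 0.05  # weight of the domain-adaptation term
l_seg = cross_entropy(seg_probs, seg_labels)
l_adv = cross_entropy(dom_probs, dom_labels)
joint = l_seg - alpha * l_adv  # the segmenter minimizes this, maximizing l_adv
```

A gradient step on `joint` with respect to the segmenter parameters therefore improves segmentation while degrading the discriminator's ability to tell domains apart.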

Fig. 11: Multi-connected adversarial networks proposed in [57]. The segmenter is a 3D convolutional neural network, where the dashed lines represent the low resolution pathway. The discriminator is another 3D convolutional neural network to classify whether the input is from the source or target domain. The adversarial gradients flow through the red lines from the discriminator to the segmenter. Image courtesy of [57].

In spite of the performance of [57] and [84], one main drawback is that the loss function of the discriminator produces a single scalar: the discriminator either reports whether the input segmentation mask is real or generated, or whether the input sample comes from the same domain as the source database. With a single scalar or boolean output, the gradient flow from the discriminator's loss function is insufficient for feature learning in the segmenter network. Xue et al. [120] proposed an adversarial network with a multi-scale loss function, called SegAN. The discriminator (originally called the critic network C) in SegAN extracts hierarchical features from the input image masked by the predicted segmentation map from the segmenter network S and by the ground truth labels, respectively. The critic network aims to maximize the Mean Absolute Error (MAE, i.e. ℓ1 distance) between the two sets of hierarchical features, while the segmenter generates segmentation masks that minimize this error. Overall, the loss function of SegAN can be defined as:

min_{θ_S} max_{θ_C} L(θ_S, θ_C) = ℓ_mae( f_C(x ∘ S(x)), f_C(x ∘ y) ),

ℓ_mae( f_C(x), f_C(x′) ) = (1/L) Σ_{i=1}^{L} ‖ f_C^i(x) − f_C^i(x′) ‖_1,

where θ_S and θ_C are the parameters of the segmenter and the critic network respectively, ∘ denotes masking by pixel-wise multiplication, L is the total number of layers in the critic network, and f_C^i(x) is the extracted feature map of image x at the i-th layer of C. By using the multi-scale feature loss, both the segmenter and the critic network are forced to learn hierarchical features capturing long- and short-range spatial relationships between pixels. The results of SegAN on BRATS13 and BRATS15 are competitive with state-of-the-art supervised deep neural networks.
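The multi-scale MAE term can be sketched as follows; the three feature scales and their shapes are hypothetical:

```python
import numpy as np

def multiscale_mae(feats_pred, feats_gt):
    """Multi-scale L1 loss: mean absolute error between hierarchical critic
    features of the predicted-mask input and the ground-truth-mask input."""
    assert len(feats_pred) == len(feats_gt)
    return np.mean([np.mean(np.abs(fp - fg))
                    for fp, fg in zip(feats_pred, feats_gt)])

# Hypothetical critic feature maps at three scales
rng = np.random.default_rng(1)
feats_gt = [rng.random((8, 32, 32)), rng.random((16, 16, 16)), rng.random((32, 8, 8))]
feats_pred = [f + 0.1 for f in feats_gt]  # predictions slightly off everywhere
loss = multiscale_mae(feats_pred, feats_gt)
```

Because every critic layer contributes a term, the segmenter receives gradient signal at several spatial scales instead of a single scalar verdict.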

III-C2 Auto-Encoders

Another group of unsupervised brain tumor segmentation methods is based on auto-encoders (AEs). Generative adversarial networks optimize a generator that learns to map random samples from a prior distribution to outputs that the optimized discriminator cannot correctly distinguish from real data. Different from GANs, the encoder of an AE encodes the input x into a lower-dimensional latent variable z, and the decoder reconstructs z into a reconstruction x̂. The optimisation can be undertaken by minimising the reconstruction loss between x and x̂, which commonly uses the ℓ2 distance. With this ability, AEs can be used for reconstructing high-resolution images. By holding the hypothesis that the pixel distribution of a tumor image partly differs from that of a healthy reconstruction, tumor segmentation can be achieved simply by comparing the input image with the reconstructed healthy image. Recent research reported in [114] uses stacked 3D denoising autoencoders for glioma detection and segmentation. One drawback of learning low-dimensional representations for reconstruction is that it limits the ability of AEs to reconstruct unseen data distributions, for instance, reconstructing a healthy brain image from a brain tumor image. To address this issue, variational autoencoders (VAEs) combine stochastic inference with the AE framework in order to approximate a healthy image model from the latent representation of the tumor image. Other research methods have been proposed using autoencoders and their variants. [8] compared several sets of AEs and VAEs for brain lesion detection; the structure of the AE reported in [8] is shown in Fig. 12. [22] used adversarial autoencoders to perform tumor segmentation. The authors of [22] pointed out that unsupervised models lack consistency in the latent space when mapping between healthy reconstructions and the input tumor image; by adding a regularisation term to the loss, the auto-encoder constrains similar images to map to nearby points in the latent space. [94] reported a Bayesian convolutional autoencoder whose reconstruction is obtained via Monte Carlo estimation.
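The reconstruction-error segmentation idea can be sketched as below; the threshold and intensity values are illustrative assumptions:

```python
import numpy as np

def anomaly_map(image, reconstruction, threshold=0.3):
    """Segment anomalies as pixels where the healthy reconstruction
    disagrees strongly with the input image."""
    residual = np.abs(image - reconstruction)
    return residual > threshold

# Hypothetical slice: a bright "tumor" the autoencoder fails to reconstruct
image = np.full((16, 16), 0.2)
image[5:9, 5:9] = 0.9
healthy_recon = np.full((16, 16), 0.2)  # AE reproduces only healthy tissue
mask = anomaly_map(image, healthy_recon)
print(mask.sum())  # 16 anomalous pixels
```

The quality of such a mask is bounded by the reconstruction quality, which is exactly the limitation noted for [114] in Table V.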

Method Dimension Structure Pros Cons
[57] 2D GAN A domain adaptation method for the segmentation task. The discriminator returns only a single number.
[120] 2D CNN with adversarial training Multi-scale loss. Hard to train stably when multiple backbone networks are combined.
[99] 2D GAN Conditional GAN for brain tumor segmentation. Hard to train stably; segmentation maps contain noise.
[114] 2D AE Stacked auto-encoders for brain tumor segmentation. The segmentation result relies heavily on the reconstruction quality.
[94] 2D AE Bayesian convolutional AE for brain tumor segmentation. The encoder layers do not contain regularizing constraints.
TABLE V: Comparison of unsupervised methods.
Fig. 12: Auto-encoder based brain anomaly segmentation. The segmentation map is produced by the subtraction between the reconstruction and the original input. Image courtesy of [8].

III-C3 Summary

With the capability of learning latent representations and reconstruction, research has also focused on data augmentation by generating synthetic data with unsupervised models. Although recent unsupervised models have been proposed for image reconstruction and anomaly detection, open issues remain in the unsupervised anomaly detection area. An overview [23] of the current state of unsupervised medical image detection and segmentation models has been carried out. One possible improvement is that, since tumor segmentation is based on the reconstruction error, better pixel-wise probability estimation could directly improve the segmentation. Another issue is that unsupervised models require complicated post-processing, such as noise removal with manually set thresholds, which may not be ideal for an automated framework; an adaptive post-processing method should be developed. A comparison of selected deep generative model based methods is shown in Table V.

III-D Ensemble Models

One main drawback of deep neural networks is that a model's performance and behavior are influenced by architectural choices and training data. Therefore, most proposed deep neural networks only perform well on specific datasets and pre-set tasks, reflecting a limited generalization capability. For example, feed-forward networks with a large kernel size can capture spatial information well, while a small kernel size allows boundary features to be learned. Models with pixel-oriented post-processing can generate continuous tissue segmentation masks, while other models may achieve better accuracy on pixel-level classification. In order to build more robust and more generalized segmentation methods, the outputs of several models with high variance between each other can be aggregated, known as ensemble models. Ensembles of Multiple Models and Architectures (EMMA) [56] is one of the early well-structured works using ensembles of deep neural network models for brain tumor segmentation. The research aim of EMMA is to approximate the process p(y|x) by models p(y|x, m), where x is the data, y is the label and m denotes the parameters coming from the choice of meta-parameters. This process can be trained and optimized by minimizing the distance between the target distribution and the models.

Fig. 13: Different distributions of the true posterior (black), EMMA (red) and other individual models. EMMA was trained with different configuration settings such as loss functions, noise and labels. Despite the suboptimal training of individual models, EMMA as an ensemble can remove their biases and approximate the true data distribution. Image courtesy of [56].

The bias effect of the meta-parameter choice can be marginalized out:

p(y | x) = Σ_m p(y | x, m) p(m) ≈ (1/M) Σ_{m=1}^{M} p(y | x, m),

where M is the number of ensembled models and p(m) is taken to be uniform over them.
Other studies have also focused on ensemble models for tumor segmentation. [59] ensembled 26 neural networks for tumor segmentation and survival prediction, using brain parcellation to produce location prior information for tumor segmentation. [14] reported a 3D U-Net to perform pre-segmentation and refined the pre-segmentation mask using an ensemble of 4 different CNN architectures, where all the sub-nets share the same meta-parameters but have different architectures and weights. The aim of this process is to capture the bias of the networks whilst keeping the same input information.
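EMMA-style ensembling marginalizes over models by averaging their per-voxel class posteriors; the toy sketch below uses hypothetical posteriors from three models over four voxels:

```python
import numpy as np

def ensemble_posterior(model_probs):
    """Marginalize over models: average per-voxel class posteriors
    p(y|x) ~ (1/M) * sum_m p(y|x, m), then take the argmax label."""
    avg = np.mean(model_probs, axis=0)
    return avg, np.argmax(avg, axis=-1)

# Hypothetical posteriors of three models over four voxels, two classes
m1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]])
m2 = np.array([[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.6, 0.4]])
m3 = np.array([[0.7, 0.3], [0.3, 0.7], [0.1, 0.9], [0.8, 0.2]])
avg, labels = ensemble_posterior(np.stack([m1, m2, m3]))
print(labels)  # [0 1 1 0]
```

Note how the second voxel is decided by the majority even though one model disagrees, which is exactly the bias-removal effect illustrated in Fig. 13.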

Instead of using multiple models as different information extraction pipelines, Chen et al. [19] used one set of DeconvNets [91] to generate the primary segmentation probability map and another multi-scale Convolutional Label Evaluation Net to evaluate the previously generated segmentation maps. False positives can be reduced using both the probability map and the original input image. Hu et al. [50] ensembled a 3D cascaded U-Net with a multi-modality fusion structure: the cascaded two-level U-Net outlines the boundary of tumors and the patch-based deep network associates tumor voxels with the predicted labels.

III-D1 Summary

Ensemble models are effective in supporting brain tumor segmentation. In spite of their contribution, there are several drawbacks to using ensemble models. First, their computational costs are very high: to avoid over-fitting, high variance between sub-models should be introduced by configuring and training the sub-models in different ways, which leads to heavy computational costs in the training stage before the models can be adopted. Second, the voting scheme for combining sub-models lacks meaningful interpretation. Ensemble methods usually use an averaging or top-N scheme to calculate the confidence score of each voxel for each class. This may be further improved by optimizing the weight of each sub-model's output through minimizing sub-model-level loss functions.

IV Pre-, Post-processing and Data Augmentation

In this section, we discuss related tasks and methods such as pre- and post-processing and data augmentation for brain tumor segmentation. These related tasks are important but also very challenging, and have gained substantial research attention in the last decade. For example, many end-to-end neural networks have been proposed for brain MRI skull stripping and registration. Apart from the CRF-RNN mentioned before, conditional random fields and Markov random fields, as fully connected networks, have also been proposed for post-processing segmentation results while enabling end-to-end network training. Finally, data augmentation has also been considered a key step in avoiding over-fitting and improving model performance in the presence of possible class imbalance and limited dataset scale.

IV-A Pre-Processing

Raw image data may not be directly usable as input during the training stage, because it contains irrelevant structures or severe noise produced under different physical conditions or with various acquisition devices. This may affect model training and eventually lead to biased predictions. Another issue is that the training class distributions are often imbalanced, which can end up causing over-fitting problems [127].

Skull Stripping Recent research studies have shown that manual skull stripping is time consuming, and the intensity information collected from the skull can confuse segmentation models. In the past few years, automated skull stripping methods have been developed [61]. This benefits downstream tasks in brain image analysis such as 3D brain reconstruction and brain tumor segmentation.

Registration Image co-registration (also referred to as image alignment or simply registration) is the process of transforming different sets of data into the same or a reference coordinate system. In MRI, image registration refers to the alignment and overlay of MRI data from a single subject with the subject's own but separately acquired anatomic images. The anatomic study is usually derived from another MRI series obtained at the same session, but could be from an entirely different imaging modality (e.g. PET or CT). The reference anatomic images are commonly acquired with high-resolution voxels (e.g., a size of 1mm × 1mm × 1mm). These voxels should be isotropic (perfect cubes, equal in each dimension) to allow the data to be rotated, re-sliced, and manipulated. Interpolation techniques are often used for re-sampling in order to retrieve high-resolution voxels, followed by rigid body transformations. A typical optimization protocol seeks to minimize the distance loss function between the current image data and the anatomic image data after each registration iteration. Accurate registration is important for image fusion and augmentation, as intensity or feature overlaps and differences can be influential. Klein et al. [62] evaluated 15 algorithms for brain MRI registration. Recently, researchers from MIT sped up the registration implementation by roughly 1000 times [29].

Normalization Similar to image registration, the purpose of spatial normalization is to align and warp the current image data onto a generic anatomic template [5]. Deep learning approaches usually also normalize image intensities to zero mean and unit variance [97].

IV-B Post-Processing

Various post-processing methods have been proposed for removing false positives and enhancing segmentation results. Conventional post-processing methods, such as threshold- or region-growing based methods, use manually set thresholds to highlight isolated areas or pixels. Conditional random fields (CRF) and Gaussian Markov Random Fields (MRF) have been used for post-processing by inferring over pixel pairs given prior information such as pixel intensity distributions and spatial distance [67, 101]. More recently, researchers have combined CRFs with neural networks in an end-to-end training fashion. CRFs as fully connected networks [65] have been used for image segmentation tasks at the post-processing stage [21]. Kamnitsas et al. [58] extended the standard CRF to a 3D version in order to process multi-modal MRI scans by considering voxels' neighboring information.
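A simple threshold-style post-processing step, removing connected components smaller than a chosen size, can be sketched in pure NumPy; the 4-neighbourhood connectivity and the size threshold are illustrative choices:

```python
import numpy as np
from collections import deque

def remove_small_components(mask, min_size):
    """Drop isolated foreground regions smaller than `min_size` pixels,
    a common false-positive removal step after segmentation."""
    out = np.zeros_like(mask, dtype=bool)
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                comp, queue = [], deque([(i, j)])
                seen[i, j] = True
                while queue:                       # BFS over 4-neighbours
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_size:          # keep only large regions
                    for y, x in comp:
                        out[y, x] = True
    return out

pred = np.zeros((20, 20), dtype=bool)
pred[4:10, 4:10] = True   # plausible tumor region (36 pixels)
pred[15, 15] = True       # isolated false positive
cleaned = remove_small_components(pred, min_size=10)
print(cleaned.sum())  # 36
```

This is precisely the kind of manually thresholded step that CRF-based end-to-end refinement aims to replace with learned, adaptive behavior.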

IV-C Data Augmentation

Over-fitting is a common problem when we train neural networks. A common explanation of over-fitting is that the model is too complicated, so that it performs much better on the training set (very small training errors) than on the testing set (very high testing errors). Several methods can be used to prevent or reduce over-fitting, such as regularisation terms and dropout [107]. Another observation is that learning models usually require enormous amounts of data to avoid over-fitting [105]. Due to various reasons, including labeling costs and patients' privacy, specific brain tumor datasets are of relatively small scale. Data augmentation is a popular means to enlarge the data scale by applying different transformations to the original datasets. Data augmentation methods include flipping, cropping, scaling and shifting. U-Net [100] applied elastic deformation to medical image datasets, which allows the model to learn high invariance without any prior knowledge of these transformations.
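Typical flip-and-shift augmentations can be sketched as follows; the flip probabilities and maximum shift are illustrative choices:

```python
import numpy as np

def random_augment(image, rng):
    """Apply simple label-preserving augmentations: random flips and a
    random shift with zero-padding back to the original size."""
    if rng.random() < 0.5:
        image = image[:, ::-1]           # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :]           # vertical flip
    dy, dx = rng.integers(0, 4, size=2)  # random shift up to 3 pixels
    shifted = np.zeros_like(image)
    h, w = image.shape
    shifted[dy:, dx:] = image[:h - dy, :w - dx]
    return shifted

rng = np.random.default_rng(3)
slice_2d = np.random.rand(64, 64)
aug = random_augment(slice_2d, rng)
print(aug.shape)  # (64, 64)
```

For segmentation the same transform must of course be applied to the label map, so that image and mask stay aligned.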

V Datasets and Evaluation Metrics

In this section, we discuss the relevant brain tumor datasets, which mainly come from the annual BraTS challenge. A detailed summary is given in Table VI. Related challenge tasks and evaluation metrics are also discussed.

Name Training Scan Num Testing Scan Num Training HGG Num Training LGG Num Sequences Resolution
BRATS2012 80 30 45 35 T1,T1c,T2,FLAIR 130 x 170 x 170
BRATS2013 30 25 20 10 T1,T1c,T2,FLAIR 240 x 240 x 155
BRATS2014 166 66 130 33 T1,T1c,T2,FLAIR 240 x 240 x 155
BRATS2015 274 110 220 54 T1,T1c,T2,FLAIR 240 x 240 x 150
BRATS2016 274 191 220 54 T1,T1c,T2,FLAIR 240 x 240 x 150
BRATS2017 285 146 210 75 T1,T1c,T2,FLAIR 240 x 240 x 155
BRATS2018 285 - 210 75 T1,T1c,T2,FLAIR 240 x 240 x 155
TABLE VI: Overview of related BraTS challenge datasets for brain tumor segmentation.

V-A Brain Tumor Imaging Datasets

One of the key contributions accompanying the development of deep learning models is the creation of large scale datasets. Neural networks with deep layers contain enormous numbers of parameters, and large scale datasets help avoid over-fitting. Also, novel and well constructed datasets can push machine learning research forward in various areas. Compared with several recognised image or video datasets and challenges such as ImageNet [31], PASCAL VOC [41] and MS COCO [75], brain tumor segmentation datasets are less comprehensive in both scale and content. The most widely used brain tumor segmentation dataset is the BraTS dataset, along with the MICCAI Multimodal Brain Tumor Segmentation Challenge [88, 7, 6]. Another popular challenge is the Ischemic Stroke Lesion Segmentation (ISLES) challenge [85], which is held jointly with the BrainLes Workshop and the BraTS Challenge.

The BraTS challenge has been held annually since 2012. Each year's challenge provides multimodal MRI scans of glioblastoma (GBM, or HGG for high-grade glioma) and lower grade glioma (LGG). The multimodal data includes 4 MRI sequences: (1) T1, (2) post-contrast T1 weighted (T1c), (3) T2 weighted (T2) and (4) fluid-attenuated inversion recovery (FLAIR). The BraTS challenge 2018 provides 135 patients' pre-operative scans for GBM and 108 patients' pre-operative scans for LGG. All pre-operative scans are manually segmented into various glioma sub-regions by medical experts following standard annotation protocols. The annotation labels and the related sub-regions are shown in Fig. 2. The challenge separates the whole dataset into training, validation and testing folds. The latest BraTS challenge 2018 also provided overall survival data for the survival prediction task.

Starting from 2015, the ISLES challenge has drawn clinical and scientific attention to stroke lesion imaging analysis. From 2015 to 2017, ISLES provided multi-spectral MRI images for sub-acute ischemic stroke lesion segmentation and other related tasks. The ISLES challenge 2018 includes diffusion-weighted MRI (DWI) scans of 103 patients. Different perfusion maps such as cerebral blood volume (CBV), cerebral blood flow (CBF) and time to peak of the residue function (Tmax) serve as additional inputs of the challenge dataset. Also related to brain imaging analysis, the Open Access Series of Imaging Studies (OASIS) [86] provides cross-sectional and longitudinal brain MRI data for normal aging and Alzheimer's disease research.

V-B Segmentation Related Tasks

For the BraTS challenge 2018, the main task is to produce segmentation maps of the glioma sub-regions in pre-operative MRI scans. The sub-regions considered for evaluation are: (1) "enhancing tumor" (ET), (2) "tumor core" (TC), and (3) "whole tumor" (WT). Similar segmentation tasks were also proposed in the ISLES challenge 2018. Participants need to produce a binary segmentation map for each sub-region, from which further predictions can be made.
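The three evaluated sub-regions are nested, so they are typically derived from a single multi-class label map. A minimal sketch of this conversion, assuming the BraTS annotation convention (1 = necrotic/non-enhancing tumor core, 2 = peritumoral edema, 4 = enhancing tumor):

```python
import numpy as np

def brats_subregions(label_map):
    """Convert a BraTS-style multi-class label map into the three binary
    sub-region maps used for evaluation. Assumed label convention:
    1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor."""
    label_map = np.asarray(label_map)
    et = (label_map == 4)               # enhancing tumor
    tc = np.isin(label_map, (1, 4))     # tumor core = necrotic + enhancing
    wt = np.isin(label_map, (1, 2, 4))  # whole tumor = core + edema
    return {"ET": et, "TC": tc, "WT": wt}
```

The same function works on 2D slices or full 3D volumes, since NumPy's boolean operations are shape-agnostic.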

Another task in the BraTS 2018 challenge is to predict patients' overall survival from pre-operative MRI scans. Participants should use the produced segmentation maps in combination with the provided multimodal MRI data to extract image or radiomic features. Participants can also consider other image information such as intensity, morphologic, histogram-based and textural features, spatial information, and glioma diffusion properties extracted from glioma growth models. The fusion of these characteristics can be used as the input to machine learning models to predict patients' overall survival.
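As an illustration of the feature-extraction step described above (a sketch, not a method from any surveyed paper), one might compute sub-region volumes from the segmentation map together with simple intensity statistics from one modality, then feed the resulting vector to a regressor:

```python
import numpy as np

def survival_features(label_map, t1c_volume, voxel_volume_mm3=1.0):
    """Toy hand-crafted feature vector for survival prediction:
    sub-region volumes plus intensity statistics inside the whole tumor.
    A real pipeline would add radiomic, morphologic and texture features."""
    label_map = np.asarray(label_map)
    t1c = np.asarray(t1c_volume, dtype=float)
    wt = np.isin(label_map, (1, 2, 4))  # whole tumor mask
    return {
        "et_volume": (label_map == 4).sum() * voxel_volume_mm3,
        "tc_volume": np.isin(label_map, (1, 4)).sum() * voxel_volume_mm3,
        "wt_volume": wt.sum() * voxel_volume_mm3,
        "wt_mean_intensity": float(t1c[wt].mean()) if wt.any() else 0.0,
        "wt_intensity_std": float(t1c[wt].std()) if wt.any() else 0.0,
    }
```

The voxel volume parameter converts voxel counts to physical volume; for the BraTS data, which is resampled to 1 mm isotropic resolution, the default of 1.0 mm³ applies.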

V-C Evaluation Metrics

Four metrics are mainly used when evaluating the performance of segmentation algorithms: Dice score (Eq. 14), Sensitivity (true positive rate, Eq. 15), Specificity (true negative rate, Eq. 16) and Hausdorff distance (Eq. 17). Given a segmentation result with corresponding ground truth labels, as shown in Fig. 14, the evaluation metrics are calculated as follows:

Fig. 14: Left: an example annotation of the true segmentation and the prediction. Image courtesy of [88]. Right: an explanation of the Hausdorff distance calculation.

Dice score, Sensitivity and Specificity measure the pixel- or voxel-level overlap of the segmented regions, showing how well an algorithm assigns each pixel or voxel to the correct class. Hausdorff distance measures the distance between the segmentation boundaries, as shown in Fig. 14 (right), where the supremum returns the upper bound of the distance. Hausdorff distance therefore reflects performance on both region segmentation and boundary pixel classification.
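A minimal NumPy sketch of the four metrics on binary masks (the Hausdorff distance here is brute force over all foreground voxels; practical implementations restrict it to boundary voxels):

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice score: 2|P ∩ G| / (|P| + |G|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + eps)

def sensitivity(pred, gt, eps=1e-8):
    """True positive rate: TP / (TP + FN)."""
    gt = gt.astype(bool)
    tp = np.logical_and(pred.astype(bool), gt).sum()
    return tp / (gt.sum() + eps)

def specificity(pred, gt, eps=1e-8):
    """True negative rate: TN / (TN + FP)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tn = np.logical_and(~pred, ~gt).sum()
    return tn / ((~gt).sum() + eps)

def hausdorff(pred, gt):
    """Symmetric Hausdorff distance between the two foreground point sets:
    the larger of the two directed nearest-neighbor maxima."""
    a, b = np.argwhere(pred), np.argwhere(gt)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

In practice the BraTS evaluation uses the 95th percentile of the directed distances (HD95) rather than the maximum, to reduce sensitivity to single outlier voxels.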

Vi Future Directions

Although deep learning based methods have significantly improved performance on brain anomaly segmentation, many challenges remain to be solved. In this section, we briefly discuss several open issues and point out potential directions for future work.

Vi-a Transfer Learning for Brain Anomaly Segmentation

Designing and training a deep neural network from scratch for a specific task is difficult and time consuming. Different structures and initialization processes can significantly influence the final performance of neural networks. Assuming that the distance between the target and source domains is close enough, transfer learning can be regarded as a solution: the knowledge collected in the source domain is transferred and then fine-tuned in the target domain to achieve satisfactory performance. A commonly used transfer learning scheme is to initialise from models pre-trained by state-of-the-art algorithms. This provides an efficient way to use information from brain tissue segmentation or other organ anomaly segmentation for brain tumor segmentation. In a recently proposed method [122], the authors measure the relationship between two vision tasks. Similarly, computationally examining the relationships between medical imaging tasks may be a new application of neural networks with limited datasets.
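The fine-tuning step itself is framework-independent: the pre-trained feature extractor is frozen while only the new task-specific head receives gradient updates. A minimal NumPy sketch of this idea (a two-layer toy model, not any surveyed architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(8, 4))   # "pre-trained" encoder weights (frozen)
W_head = rng.normal(size=(4, 2))  # new task-specific head (trainable)

x = rng.normal(size=(16, 8))      # a batch from the target domain
y = rng.normal(size=(16, 2))      # target-domain labels

W_enc_before = W_enc.copy()
losses = []
for _ in range(10):
    h = np.tanh(x @ W_enc)                  # frozen feature extractor
    pred = h @ W_head
    losses.append(float(np.mean((pred - y) ** 2)))
    grad_head = h.T @ (pred - y) / len(x)   # MSE gradient w.r.t. head only
    W_head -= 0.1 * grad_head               # the encoder is never updated
```

After training, the encoder weights are unchanged while the head has adapted to the target domain; in deep learning frameworks the same effect is achieved by excluding the encoder's parameters from the optimizer or disabling their gradients.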

Vi-B Model Interpretation

Considering the application values and related security issues, we believe that understanding how a model learns to segment and what different layers have learned is crucial to the development of robust segmentation methods. A detailed and insightful interpretation may help boost system performance whilst avoiding development failure in practice. A common observation is that the shallow layers learn low dimensional features such as edges and corners of an object while the deep layers learn high dimensional features. Examples are shown in Fig. 15, where neural networks learn to distinguish ventricles, CSF, and white and grey matter [58]. This learning behaviour is beneficial to lesion segmentation and is also in line with earlier research on combining prior image information [117, 128]. Recent work such as [119], which uses attention mechanisms, presents a new interpretation direction: encoding the tumor area with attention and then fine-tuning the attention's latent representation.

Recent research also focuses on how different layers and connections behave in a model. One study [72] revealed how network architectures affect the loss landscape, while [39] evaluated how long and short skip connections affect biomedical image processing tasks. Both studies demonstrate that skip connections can help to smooth the loss landscape and make convergence faster and more stable.

Fig. 15: Activation feature maps shown in [58] (left) and [48] (right) respectively. Left: the first row shows the dataset and DeepMedic's segmentation results, the second row shows a shallow layer's feature maps, and the third row shows a deep layer's feature maps. Right: randomly selected first-layer features of the global and local pathways, showing that the local pathway focuses on edge detection while the global pathway captures larger-scale features.

Vi-C Segmentation with Inference

One main challenge in brain tumor and lesion segmentation research is that anomaly tissues can appear anywhere within the brain with any shape or size. This largely limits the practice of non-rigid organ segmentation compared with the segmentation of rigid objects such as cars or buildings. Biological tissues have their own unique properties, e.g. tumor cores are surrounded by edema. These unique properties can not only help to remove false positives in the segmentation, but also enable inference using spatial information when learning features.
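One simple way to exploit such spatial priors for false positive removal (a sketch, not a method from the surveyed works): since a tumor is usually a single contiguous region, small disconnected predictions are likely false positives and can be discarded by keeping only the largest connected component.

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(binary_mask):
    """Post-process a binary segmentation by keeping only the largest
    connected component, removing isolated false-positive regions."""
    labeled, num = ndimage.label(binary_mask)
    if num == 0:
        return np.zeros_like(binary_mask, dtype=bool)
    sizes = ndimage.sum(binary_mask, labeled, index=range(1, num + 1))
    largest = int(np.argmax(sizes)) + 1  # component labels start at 1
    return labeled == largest
```

The same call works on 2D slices or 3D volumes, since `ndimage.label` adapts its connectivity structure to the input dimensionality.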

Vi-D Efficient and Accurate Segmentation

As a branch of medical image analysis, brain tumor segmentation holds potential value in both academic research and clinical applications. Robust segmentation methods can provide detailed information for patients' surgery planning and treatment. Because clinical applications relate to patients' health and quality of life, they require a high standard for brain tumor segmentation models, and a working, well defined evaluation protocol should be established. Moreover, many segmentation algorithms require powerful GPUs to train and visualize medical images and cannot run on commodity computers. Future segmentation algorithms are therefore expected to be efficient, with good maintainability and extendibility.

Vi-E Dataset Contribution

As we mentioned before, large scale datasets play a crucial role in deep learning research. Sufficiently large datasets with precise labeling lead to robust system performance, and such datasets will influence patients' diagnosis and surgery planning one way or the other. However, even compared against other medical datasets such as [51], current brain tumor segmentation datasets are relatively small and imbalanced. Dense pixel-level labeling requires tireless human expert effort and is very time consuming. Recently, researchers proposed a deep learning assisted interactive polygon annotation framework [17], which [1] extended into Polygon-RNN++ using self-critical training with policy gradients. Both of these interactive polygon annotation frameworks target instance detection and segmentation. In the future, we expect that such semi-automatic or fully automatic interactive annotation frameworks will be proposed for brain tumor segmentation. This task can be challenging because brain tumor MRI datasets are often of low quality, and tumor sub-region labeling makes it more complicated.

Vii Conclusion

Applying deep learning methods to brain tumor segmentation is a valuable and challenging task, and automated segmentation benefits from the powerful feature learning ability of deep learning techniques. In this paper, we have investigated relevant deep learning based brain tumor segmentation methods and presented a comprehensive survey. We structurally categorized and summarized these methods, and discussed several key aspects such as the methods' pros and cons, pre- and post-processing, related datasets and evaluation metrics. We also pointed out potential research directions at the end of this survey.


  • [1] D. Acuna, H. Ling, A. Kar, and S. Fidler (2018) Efficient interactive annotation of segmentation datasets with polygon-rnn++. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 859–868. Cited by: §VI-E.
  • [2] Z. Akkus, A. Galimzianova, A. Hoogi, D. L. Rubin, and B. J. Erickson (2017) Deep learning for brain mri segmentation: state of the art and future directions. Journal of digital imaging 30 (4), pp. 449–459. Cited by: §II-C, TABLE I.
  • [3] V. Alex, M. Safwan, and G. Krishnamurthi (2017) Automatic segmentation and overall survival prediction in gliomas using fully convolutional neural network and texture analysis. In International MICCAI Brainlesion Workshop, pp. 216–225. Cited by: TABLE II.
  • [4] S. Andermatt, S. Pezold, and P. C. Cattin (2017) Automated segmentation of multiple sclerosis lesions using multi-dimensional gated recurrent units. In International MICCAI Brainlesion Workshop, pp. 31–42. Cited by: TABLE II, Fig. 10, §III-B, TABLE IV.
  • [5] J. Ashburner, K. J. Friston, et al. (2003) Spatial normalization using basis functions. Human brain function 2. Cited by: §IV-A.
  • [6] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. Kirby, J. Freymann, K. Farahani, and C. Davatzikos (2017) Segmentation labels and radiomic features for the pre-operative scans of the tcga-gbm collection. The Cancer Imaging Archive 286. Cited by: §V-A.
  • [7] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B. Freymann, K. Farahani, and C. Davatzikos (2017) Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features. Scientific data 4, pp. 170117. Cited by: §V-A.
  • [8] C. Baur, B. Wiestler, S. Albarqouni, and N. Navab (2018) Deep autoencoding models for unsupervised anomaly segmentation in brain mr images. arXiv preprint arXiv:1804.04488. Cited by: TABLE II, Fig. 12, §III-C2.
  • [9] A. Beers, K. Chang, J. Brown, E. Sartor, C. Mammen, E. Gerstner, B. Rosen, and J. Kalpathy-Cramer (2017) Sequential 3d u-nets for biologically-informed brain tumor segmentation. arXiv preprint arXiv:1709.02967. Cited by: TABLE II, §III-A2.
  • [10] J. Bernal, K. Kushibar, D. S. Asfaw, S. Valverde, A. Oliver, R. Martí, and X. Lladó (2018) Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: a review. Artificial intelligence in medicine. Cited by: §II-C, TABLE I.
  • [11] A. Birenbaum and H. Greenspan (2017) Multi-view longitudinal cnn for multiple sclerosis lesion segmentation. Engineering Applications of Artificial Intelligence 65, pp. 111–118. Cited by: TABLE II, TABLE III.
  • [12] T. Brosch, L. Y. Tang, Y. Yoo, D. K. Li, A. Traboulsee, and R. Tam (2016) Deep 3d convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation. IEEE transactions on medical imaging 35 (5), pp. 1229–1239. Cited by: TABLE II, §III-A1, §III-A2, TABLE III.
  • [13] W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki (2015) Scene labeling with lstm recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3547–3555. Cited by: §III-B.
  • [14] M. Cabezas, S. Valverde, S. González-Villà, A. Clérigues, M. Salem, K. Kushibar, J. Bernal, A. Oliver, and X. Lladó (2018) Survival prediction using ensemble tumor segmentation and transfer learning. arXiv preprint arXiv:1810.04274. Cited by: TABLE II, §III-D.
  • [15] A. Casamitjana, M. Catà, I. Sánchez, M. Combalia, and V. Vilaplana (2017) Cascaded v-net using roi masks for brain tumor segmentation. In International MICCAI Brainlesion Workshop, pp. 381–391. Cited by: TABLE II, §III-A3.
  • [16] L. S. Castillo, L. A. Daza, L. C. Rivera, and P. Arbeláez (2017) Brain tumor segmentation and parsing on mris using multiresolution neural networks. In International MICCAI Brainlesion Workshop, pp. 332–343. Cited by: TABLE II.
  • [17] L. Castrejon, K. Kundu, R. Urtasun, and S. Fidler (2017) Annotating object instances with a polygon-rnn. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5230–5238. Cited by: §VI-E.
  • [18] P. D. Chang (2016) Fully convolutional deep residual neural networks for brain tumor segmentation. In International Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pp. 108–118. Cited by: TABLE II, §III-A2.
  • [19] L. Chen, P. Bentley, and D. Rueckert (2017) Fully automatic acute ischemic lesion segmentation in dwi using convolutional neural networks. NeuroImage: Clinical 15, pp. 633–643. Cited by: TABLE II, §III-D.
  • [20] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille (2014) Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062. Cited by: §III-B.
  • [21] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille (2018) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 40 (4), pp. 834–848. Cited by: §III-B, §IV-B.
  • [22] X. Chen and E. Konukoglu (2018) Unsupervised detection of lesions in brain mri using constrained adversarial auto-encoders. arXiv preprint arXiv:1806.04972. Cited by: TABLE II, §III-C2.
  • [23] X. Chen, N. Pawlowski, M. Rajchl, B. Glocker, and E. Konukoglu (2018) Deep generative models in the real-world: an open challenge from medical imaging. arXiv preprint arXiv:1806.05452. Cited by: §III-C3.
  • [24] X. Chen, J. H. Liew, W. Xiong, C. Chui, and S. Ong (2018) Focus, segment and erase: an efficient network for multi-label brain tumor segmentation. In European Conference on Computer Vision, pp. 674–689. Cited by: TABLE II, §III-A3.
  • [25] Y. Chen, Z. Cao, C. Cao, J. Yang, and J. Zhang (2018) A modified u-net for brain mr image segmentation. In International Conference on Cloud Computing and Security, pp. 233–242. Cited by: TABLE II, §III-A2.
  • [26] Y. Choi, Y. Kwon, H. Lee, B. J. Kim, M. C. Paik, and J. Won (2016) Ensemble of deep convolutional neural networks for prognosis of ischemic stroke. In International Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pp. 231–243. Cited by: TABLE II, §III-A1.
  • [27] M. Cicero, A. Bilbily, E. Colak, T. Dowdell, B. Gray, K. Perampaladas, and J. Barfett (2017) Training and validating a deep convolutional neural network for computer-aided detection and classification of abnormalities on frontal chest radiographs. Investigative radiology 52 (5), pp. 281–287. Cited by: §III.
  • [28] R. R. Colmeiro, C. Verrastro, and T. Grosges (2017) Multimodal brain tumor segmentation using 3d convolutional networks. In International MICCAI Brainlesion Workshop, pp. 226–240. Cited by: TABLE II.
  • [29] A. V. Dalca, G. Balakrishnan, J. Guttag, and M. R. Sabuncu (2018) Unsupervised learning for fast probabilistic diffeomorphic registration. arXiv preprint arXiv:1805.04605. Cited by: §IV-A.
  • [30] A. de Brebisson and G. Montana (2015) Deep neural networks for anatomical brain segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 20–28. Cited by: §II-B.
  • [31] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248–255. Cited by: §V-A.
  • [32] Y. Deng, Y. Sun, Y. Zhu, M. Zhu, and K. Yuan (2018) A strategy of mr brain tissue images’ suggestive annotation based on modified u-net. arXiv preprint arXiv:1807.07510. Cited by: TABLE II, §III-A2.
  • [33] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §III.
  • [34] K. Doi (2007) Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Computerized medical imaging and graphics 31 (4-5), pp. 198–211. Cited by: §I.
  • [35] J. Dolz, I. B. Ayed, and C. Desrosiers (2018) Dense multi-path u-net for ischemic stroke lesion segmentation in multiple image modalities. arXiv preprint arXiv:1810.07003. Cited by: TABLE II, §III-A2.
  • [36] H. Dong, G. Yang, F. Liu, Y. Mo, and Y. Guo (2017) Automatic brain tumor detection and segmentation using u-net based fully convolutional networks. In Annual Conference on Medical Image Understanding and Analysis, pp. 506–517. Cited by: TABLE II, §III-A2.
  • [37] Q. Dou, H. Chen, L. Yu, L. Shi, D. Wang, V. C. Mok, and P. A. Heng (2015) Automatic cerebral microbleeds detection from mr images via independent subspace analysis based hierarchical features. In Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, pp. 7933–7936. Cited by: §II-B.
  • [38] Q. Dou, H. Chen, L. Yu, L. Zhao, J. Qin, D. Wang, V. C. Mok, L. Shi, and P. Heng (2016) Automatic detection of cerebral microbleeds from mr images via 3d convolutional neural networks. IEEE transactions on medical imaging 35 (5), pp. 1182–1195. Cited by: §II-B.
  • [39] M. Drozdzal, E. Vorontsov, G. Chartrand, S. Kadoury, and C. Pal (2016) The importance of skip connections in biomedical image segmentation. In Deep Learning and Data Labeling for Medical Applications, pp. 179–187. Cited by: §VI-B.
  • [40] A. Esteva, A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S. Thrun, and J. Dean (2019) A guide to deep learning in healthcare. Nature Medicine 25 (1), pp. 24–29. External Links: ISSN 1546-170X Cited by: §II-C, TABLE I.
  • [41] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman (2010) The pascal visual object classes (voc) challenge. International journal of computer vision 88 (2), pp. 303–338. Cited by: §V-A.
  • [42] M. Ghafoorian, N. Karssemeijer, T. Heskes, M. Bergkamp, J. Wissink, J. Obels, K. Keizer, F. de Leeuw, B. van Ginneken, E. Marchiori, et al. (2017) Deep multi-scale location-aware 3d convolutional neural networks for automated detection of lacunes of presumed vascular origin. NeuroImage: Clinical 14, pp. 391–399. Cited by: §II-B.
  • [43] Z. Ghahramani (2015) Probabilistic machine learning and artificial intelligence. Nature 521 (7553), pp. 452. Cited by: §II-C, TABLE I.
  • [44] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §III-C1.
  • [45] N. Gordillo, E. Montseny, and P. Sobrevilla (2013) State of the art survey on mri brain tumor segmentation. Magnetic resonance imaging 31 (8), pp. 1426–1438. Cited by: §II-C, TABLE I.
  • [46] A. Graves and J. Schmidhuber (2009) Offline handwriting recognition with multidimensional recurrent neural networks. In Advances in neural information processing systems, pp. 545–552. Cited by: §III-B.
  • [47] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, et al. (2018) Recent advances in convolutional neural networks. Pattern Recognition 77, pp. 354–377. Cited by: §II-C, TABLE I.
  • [48] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P. Jodoin, and H. Larochelle (2017) Brain tumor segmentation with deep neural networks. Medical image analysis 35, pp. 18–31. Cited by: §III-A1, Fig. 15.
  • [49] M. Havaei, N. Guizard, N. Chapados, and Y. Bengio (2016) HeMIS: hetero-modal image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 469–477. Cited by: TABLE II, §III-A1.
  • [50] Y. Hu and Y. Xia (2017) 3D deep neural network-based brain tumor segmentation using multimodality magnetic resonance sequences. In International MICCAI Brainlesion Workshop, pp. 423–434. Cited by: TABLE II, §III-D.
  • [51] J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B. Haghgoo, R. Ball, K. Shpanskaya, J. Seekins, D. A. Mong, S. S. Halabi, J. K. Sandberg, R. Jones, D. B. Larson, C. P. Langlotz, B. N. Patel, M. P. Lungren, and A. Y. Ng (2019-01) CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. arXiv e-prints, pp. arXiv:1901.07031. External Links: 1901.07031 Cited by: §VI-E.
  • [52] F. Isensee, P. Kickingereder, W. Wick, M. Bendszus, and K. H. Maier-Hein (2017) Brain tumor segmentation and radiomics survival prediction: contribution to the brats 2017 challenge. In International MICCAI Brainlesion Workshop, pp. 287–297. Cited by: TABLE II, §III-A2, TABLE III.
  • [53] M. Islam and H. Ren (2017) Multi-modal pixelnet for brain tumor segmentation. In International MICCAI Brainlesion Workshop, pp. 298–308. Cited by: TABLE II.
  • [54] A. Jesson and T. Arbel (2017) Brain tumor segmentation using a 3d fcn with multi-scale loss. In International MICCAI Brainlesion Workshop, pp. 392–402. Cited by: TABLE II, §III-A2, TABLE III.
  • [55] A. Jungo, R. McKinley, R. Meier, U. Knecht, L. Vera, J. Pérez-Beteta, D. Molina-García, V. M. Pérez-García, R. Wiest, and M. Reyes (2017) Towards uncertainty-assisted brain tumor segmentation and survival prediction. In International MICCAI Brainlesion Workshop, pp. 474–485. Cited by: TABLE II, TABLE III.
  • [56] K. Kamnitsas, W. Bai, E. Ferrante, S. McDonagh, M. Sinclair, N. Pawlowski, M. Rajchl, M. Lee, B. Kainz, D. Rueckert, et al. (2017) Ensembles of multiple models and architectures for robust brain tumour segmentation. In International MICCAI Brainlesion Workshop, pp. 450–462. Cited by: TABLE II, Fig. 13, §III-D.
  • [57] K. Kamnitsas, C. Baumgartner, C. Ledig, V. Newcombe, J. Simpson, A. Kane, D. Menon, A. Nori, A. Criminisi, D. Rueckert, et al. (2017) Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In International Conference on Information Processing in Medical Imaging, pp. 597–609. Cited by: TABLE II, Fig. 11, §III-C1, §III-C1, §III-C1, TABLE V.
  • [58] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker (2017) Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation. Medical image analysis 36, pp. 61–78. Cited by: TABLE II, §III-A1, §IV-B, Fig. 15, §VI-B.
  • [59] P. Kao, T. Ngo, A. Zhang, J. Chen, and B. Manjunath (2018) Brain tumor segmentation and tractographic feature extraction from structural mr images for overall survival prediction. arXiv preprint arXiv:1807.07716. Cited by: TABLE II, §III-D.
  • [60] G. Kim (2017) Brain tumor segmentation using deep u-net. In MICCAI, Cited by: TABLE II, §III-A2.
  • [61] J. Kleesiek, G. Urban, A. Hubert, D. Schwarz, K. Maier-Hein, M. Bendszus, and A. Biller (2016) Deep mri brain extraction: a 3d convolutional neural network for skull stripping. NeuroImage 129, pp. 460–469. Cited by: §IV-A.
  • [62] A. Klein, J. Andersson, B. A. Ardekani, J. Ashburner, B. Avants, M. Chiang, G. E. Christensen, D. L. Collins, J. Gee, P. Hellier, et al. (2009) Evaluation of 14 nonlinear deformation algorithms applied to human brain mri registration. Neuroimage 46 (3), pp. 786–802. Cited by: §IV-A.
  • [63] X. Kong, G. Sun, Q. Wu, J. Liu, and F. Lin (2018) Hybrid pyramid u-net model for brain tumor segmentation. In International Conference on Intelligent Information Processing, pp. 346–355. Cited by: TABLE II, §III-A2.
  • [64] T. Kooi, B. van Ginneken, N. Karssemeijer, and A. den Heeten (2017) Discriminating solitary cysts from soft tissue lesions in mammography using a pretrained deep convolutional neural network. Medical physics 44 (3), pp. 1017–1027. Cited by: §III.
  • [65] P. Krähenbühl and V. Koltun (2011) Efficient inference in fully connected crfs with gaussian edge potentials. In Advances in neural information processing systems, pp. 109–117. Cited by: §IV-B.
  • [66] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: 1st item.
  • [67] J. Lafferty, A. McCallum, and F. C. Pereira (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. Cited by: §IV-B.
  • [68] M. Lavin and M. Nathan (1998-June 30) System and method for managing patient medical records. Google Patents. Note: US Patent 5,772,585 Cited by: §I.
  • [69] T. H. N. Le, R. Gummadi, and M. Savvides (2018) Deep recurrent level set for segmenting brain tumors. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 646–653. Cited by: TABLE II, §III-B, TABLE IV.
  • [70] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. nature 521 (7553), pp. 436. Cited by: §II-C, TABLE I.
  • [71] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel (1989) Backpropagation applied to handwritten zip code recognition. Neural computation 1 (4), pp. 541–551. Cited by: 1st item.
  • [72] H. Li, Z. Xu, G. Taylor, C. Studer, and T. Goldstein (2018) Visualizing the loss landscape of neural nets. In Advances in Neural Information Processing Systems, pp. 6389–6399. Cited by: §VI-B.
  • [73] Y. Li and L. Shen (2017) Deep learning based multimodal brain tumor diagnosis. In International MICCAI Brainlesion Workshop, pp. 149–158. Cited by: TABLE II.
  • [74] Z. Li, Y. Wang, and J. Yu (2017) Brain tumor segmentation using an adversarial network. In International MICCAI Brainlesion Workshop, pp. 123–132. Cited by: TABLE II.
  • [75] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft coco: common objects in context. In European conference on computer vision, pp. 740–755. Cited by: §V-A.
  • [76] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. Van Ginneken, and C. I. Sánchez (2017) A survey on deep learning in medical image analysis. Medical image analysis 42, pp. 60–88. Cited by: §I, §II-C, TABLE I.
  • [77] J. Liu, F. Chen, C. Pan, M. Zhu, X. Zhang, L. Zhang, and H. Liao (2018) A cascaded deep convolutional neural network for joint segmentation and genotype prediction of brainstem gliomas. IEEE Transactions on Biomedical Engineering. Cited by: TABLE II, §III-A3.
  • [78] J. Liu, M. Li, J. Wang, F. Wu, T. Liu, and Y. Pan (2014) A survey of mri-based brain tumor segmentation methods. Tsinghua Science and Technology 19 (6), pp. 578–595. Cited by: §II-C, TABLE I.
  • [79] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, and M. Pietikäinen (2018) Deep learning for generic object detection: a survey. arXiv preprint arXiv:1809.02165. Cited by: §II-C, TABLE I, §III.
  • [80] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg (2016) Ssd: single shot multibox detector. In European conference on computer vision, pp. 21–37. Cited by: 3rd item.
  • [81] J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. Cited by: §III-A2.
  • [82] M. M. Lopez and J. Ventura (2017) Dilated convolutions for brain tumor segmentation in mri scans. In International MICCAI Brainlesion Workshop, pp. 253–262. Cited by: TABLE II, TABLE III.
  • [83] D. N. Louis, A. Perry, G. Reifenberger, A. Von Deimling, D. Figarella-Branger, W. K. Cavenee, H. Ohgaki, O. D. Wiestler, P. Kleihues, and D. W. Ellison (2016) The 2016 world health organization classification of tumors of the central nervous system: a summary. Acta neuropathologica 131 (6), pp. 803–820. Cited by: §I.
  • [84] P. Luc, C. Couprie, S. Chintala, and J. Verbeek (2016) Semantic segmentation using adversarial networks. In NIPS Workshop on Adversarial Training, Cited by: TABLE II, 3rd item, §III-C1, §III-C1.
  • [85] O. Maier, B. H. Menze, J. von der Gablentz, L. Häni, M. P. Heinrich, M. Liebrand, S. Winzeck, A. Basit, P. Bentley, L. Chen, et al. (2017) ISLES 2015-a public evaluation benchmark for ischemic stroke lesion segmentation from multispectral mri. Medical image analysis 35, pp. 250–269. Cited by: §V-A.
  • [86] D. S. Marcus, A. F. Fotenos, J. G. Csernansky, J. C. Morris, and R. L. Buckner (2010) Open access series of imaging studies: longitudinal mri data in nondemented and demented older adults. Journal of cognitive neuroscience 22 (12), pp. 2677–2684. Cited by: §V-A.
  • [87] R. McKinley, A. Jungo, R. Wiest, and M. Reyes (2017) Pooling-free fully convolutional networks with dense skip connections for semantic segmentation, with application to brain tumor segmentation. In International MICCAI Brainlesion Workshop, pp. 169–177. Cited by: TABLE II.
  • [88] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest, et al. (2015) The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging 34 (10), pp. 1993. Cited by: Fig. 14, §V-A.
  • [89] F. Milletari, N. Navab, and S. Ahmadi (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. In 3D Vision (3DV), 2016 Fourth International Conference on, pp. 565–571. Cited by: TABLE II, §III-A2.
  • [90] P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries, M. J. Benders, and I. Išgum (2016) Automatic segmentation of mr brain images with a convolutional neural network. IEEE transactions on medical imaging 35 (5), pp. 1252–1261. Cited by: §III-A1.
  • [91] H. Noh, S. Hong, and B. Han (2015) Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE international conference on computer vision, pp. 1520–1528. Cited by: TABLE II, §III-D.
  • [92] B. Patenaude, S. M. Smith, D. N. Kennedy, and M. Jenkinson (2011) A bayesian model of shape and appearance for subcortical brain segmentation. Neuroimage 56 (3), pp. 907–922. Cited by: §II-B.
  • [93] K. Pawar, Z. Chen, N. J. Shah, and G. Egan (2017) Residual encoder and convolutional decoder neural network for glioma segmentation. In International MICCAI Brainlesion Workshop, pp. 263–273. Cited by: TABLE II, TABLE III.
  • [94] N. Pawlowski, M. C. Lee, M. Rajchl, S. McDonagh, E. Ferrante, K. Kamnitsas, S. Cooke, S. Stevenson, A. Khetani, T. Newman, et al. (2018) Unsupervised lesion detection in brain ct using bayesian convolutional autoencoders. Cited by: TABLE II, §III-C2, TABLE V.
  • [95] S. Pereira, A. Pinto, V. Alves, and C. A. Silva (2016) Brain tumor segmentation using convolutional neural networks in mri images. IEEE transactions on medical imaging 35 (5), pp. 1240–1251. Cited by: §III-A1.
  • [96] W. H. Pinaya, A. Gadelha, O. M. Doyle, C. Noto, A. Zugman, Q. Cordeiro, A. P. Jackowski, R. A. Bressan, and J. R. Sato (2016) Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia. Scientific reports 6, pp. 38897. Cited by: §II-B.
  • [97] J. T. Pontalba, T. Gwynne, K. Jakate, D. Androutsos, and A. Khademi (2019) Assessing the impact of colour normalization in convolutional neural network-based nuclei segmentation frameworks. Frontiers in Bioengineering and Biotechnology 7, pp. 300. Cited by: §IV-A.
  • [98] R. Pourreza, Y. Zhuge, H. Ning, and R. Miller (2017) Brain tumor segmentation in mri scans using deeply-supervised neural networks. In International MICCAI Brainlesion Workshop, pp. 320–331. Cited by: TABLE II.
  • [99] M. Rezaei, K. Harmuth, W. Gierke, T. Kellermeier, M. Fischer, H. Yang, and C. Meinel (2017) A conditional adversarial network for semantic segmentation of brain tumor. In International MICCAI Brainlesion Workshop, pp. 241–252. Cited by: TABLE II, TABLE V.
  • [100] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: §III-A2, §III-A2, §IV-C.
  • [101] H. Rue and L. Held (2005) Gaussian markov random fields: theory and applications. CRC press. Cited by: §IV-B.
  • [102] S. Sedlar (2017) Brain tumor segmentation using a multi-path cnn based method. In International MICCAI Brainlesion Workshop, pp. 403–422. Cited by: TABLE II, §III-A1.
  • [103] M. Shaikh, G. Anand, G. Acharya, A. Amrutkar, V. Alex, and G. Krishnamurthi (2017) Brain tumor segmentation using dense fully convolutional neural network. In International MICCAI Brainlesion Workshop, pp. 309–319. Cited by: TABLE II.
  • [104] H. Shen, R. Wang, J. Zhang, and S. J. McKenna (2017) Boundary-aware fully convolutional network for brain tumor segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 433–441. Cited by: TABLE II, §III-A2, TABLE III.
  • [105] P. Y. Simard, D. Steinkraus, and J. C. Platt (2003) Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR), pp. 958. Cited by: §IV-C.
  • [106] M. Soltaninejad, L. Zhang, T. Lambrou, G. Yang, N. Allinson, and X. Ye (2017) MRI brain tumor segmentation and patient survival prediction using random forests and fully convolutional networks. In International MICCAI Brainlesion Workshop, pp. 204–215. Cited by: TABLE II.
  • [107] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958. Cited by: §IV-C.
  • [108] M. F. Stollenga, W. Byeon, M. Liwicki, and J. Schmidhuber (2015) Parallel multi-dimensional lstm, with application to fast biomedical volumetric image segmentation. In Advances in neural information processing systems, pp. 2998–3006. Cited by: §III-B.
  • [109] H. Suk and D. Shen (2016) Deep ensemble sparse regression network for alzheimer’s disease diagnosis. In International Workshop on Machine Learning in Medical Imaging, pp. 113–121. Cited by: §II-B.
  • [110] H. Suk, C. Wee, S. Lee, and D. Shen (2016) State-space model with deep learning for functional dynamics estimation in resting-state fmri. NeuroImage 129, pp. 292–307. Cited by: §II-B.
  • [111] R. H. Taylor, A. Menciassi, G. Fichtinger, P. Fiorini, and P. Dario (2016) Medical robotics and computer-integrated surgery. In Springer handbook of robotics, pp. 1657–1684. Cited by: §I.
  • [112] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell (2014) Deep domain confusion: maximizing for domain invariance. arXiv preprint arXiv:1412.3474. Cited by: §III-C1.
  • [113] G. Urban, M. Bendszus, F. Hamprecht, and J. Kleesiek (2014) Multi-modal brain tumor segmentation using deep convolutional neural networks. MICCAI BraTS (Brain Tumor Segmentation) Challenge. Proceedings, winning contribution, pp. 31–35. Cited by: TABLE II, §III-A1, TABLE III.
  • [114] K. Vaidhya, S. Thirunavukkarasu, V. Alex, and G. Krishnamurthi (2015) Multi-modal brain tumor segmentation using stacked denoising autoencoders. In International Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pp. 181–194. Cited by: TABLE II, 3rd item, §III-C2, TABLE V.
  • [115] S. Valverde, M. Cabezas, E. Roura, S. González-Villà, D. Pareto, J. C. Vilanova, L. Ramio-Torrenta, À. Rovira, A. Oliver, and X. Lladó (2017) Improving automated multiple sclerosis lesion segmentation with a cascaded 3d convolutional neural network approach. NeuroImage 155, pp. 159–168. Cited by: TABLE II.
  • [116] H. K. van der Burgh, R. Schmidt, H. Westeneng, M. A. de Reus, L. H. van den Berg, and M. P. van den Heuvel (2017) Deep learning predictions of survival based on mri in amyotrophic lateral sclerosis. NeuroImage: Clinical 13, pp. 361–369. Cited by: §II-B.
  • [117] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens (1999) Automated model-based tissue classification of mr images of the brain. IEEE transactions on medical imaging 18 (10), pp. 897–908. Cited by: §VI-B.
  • [118] G. Wang, W. Li, S. Ourselin, and T. Vercauteren (2017) Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. In International MICCAI Brainlesion Workshop, pp. 178–190. Cited by: TABLE II, Fig. 9, §III-A3, TABLE III.
  • [119] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio (2015) Show, attend and tell: neural image caption generation with visual attention. In International conference on machine learning, pp. 2048–2057. Cited by: §VI-B.
  • [120] Y. Xue, T. Xu, H. Zhang, L. R. Long, and X. Huang (2018) SegAN: adversarial network with multi-scale L1 loss for medical image segmentation. Neuroinformatics, pp. 1–10. Cited by: TABLE II, §III-C1, TABLE V.
  • [121] Y. Yoo, L. W. Tang, T. Brosch, D. K. Li, L. Metz, A. Traboulsee, and R. Tam (2016) Deep learning of brain lesion patterns for predicting future disease activity in patients with early symptoms of multiple sclerosis. In Deep Learning and Data Labeling for Medical Applications, pp. 86–94. Cited by: §II-B.
  • [122] A. R. Zamir, A. Sax, W. Shen, L. Guibas, J. Malik, and S. Savarese (2018) Taskonomy: disentangling task transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3712–3722. Cited by: §VI-A.
  • [123] L. Zhao and K. Jia (2016) Multiscale cnns for brain tumor segmentation and diagnosis. Computational and mathematical methods in medicine 2016. Cited by: TABLE II, §III-A1, TABLE III.
  • [124] X. Zhao, Y. Wu, G. Song, Z. Li, Y. Zhang, and Y. Fan (2018) A deep learning model integrating fcnns and crfs for brain tumor segmentation. Medical image analysis 43, pp. 98–111. Cited by: TABLE II, §III-B, TABLE IV.
  • [125] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr (2015) Conditional random fields as recurrent neural networks. In Proceedings of the IEEE international conference on computer vision, pp. 1529–1537. Cited by: §III-B.
  • [126] F. Zhou, T. Li, H. Li, and H. Zhu (2017) TPCNN: two-phase patch-based convolutional neural network for automatic brain tumor segmentation and survival prediction. In International MICCAI Brainlesion Workshop, pp. 274–286. Cited by: TABLE II.
  • [127] Z. Zhou and X. Liu (2006) Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering 18 (1), pp. 63–77. Cited by: §IV-A.
  • [128] D. Zikic, B. Glocker, E. Konukoglu, A. Criminisi, C. Demiralp, J. Shotton, O. M. Thomas, T. Das, R. Jena, and S. J. Price (2012) Decision forests for tissue-specific segmentation of high-grade gliomas in multi-channel mr. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 369–376. Cited by: §VI-B.
  • [129] D. Zikic, Y. Ioannou, M. Brown, and A. Criminisi (2014) Segmentation of brain tumor tissues with convolutional neural networks. Proceedings MICCAI-BRATS, pp. 36–39. Cited by: TABLE II, §III-A1, TABLE III.