Semi-supervised Learning using Denoising Autoencoders for Brain Lesion Detection and Segmentation

by   Varghese Alex, et al.

This work explores the use of denoising autoencoders (DAE) for brain lesion detection, segmentation and false positive reduction. Stacked denoising autoencoders (SDAE) were pre-trained using a large number of unlabeled patient volumes and fine tuned with patches drawn from a limited number of patients (n=20, 40, 65). The results show negligible loss in performance even when the SDAE was fine tuned using only 20 patients. Low grade glioma (LGG) segmentation was achieved using a transfer learning approach in which a network pre-trained with high grade glioma (HGG) data was fine tuned using LGG image patches. The weakly supervised SDAE (for HGG) and the transfer learning based LGG network were also shown to generalize well and provide good segmentation on unseen BraTS 2013 and BraTS 2015 test data. A unique contribution is a single layer DAE, referred to as a novelty detector (ND). The ND was trained to accurately reconstruct non-lesion patches using a mean squared error loss function. The reconstruction error maps of test data were used to identify regions containing lesions. The error maps were shown to assign unique error distributions to various constituents of the glioma, enabling localization. The ND learns the non-lesion brain accurately, as it was also shown to provide good segmentation performance on ischemic brain lesions in images from a different database.




I Introduction

Gliomas are a type of primary brain tumor that affects the glial cells in the brain. Based on severity, gliomas are further divided into HGG and LGG. Automatic segmentation of gliomas from MRI, a preliminary step for treatment planning and for determining disease progression, is a challenging task due to the heterogeneity of tissue within the lesion, the non-uniform dynamic range of MR images and the diffuse borders of tumors. Furthermore, multiple MRI sequences, namely T1, T2, FLAIR and T1 post contrast (T1c), are required for accurate segmentation. These sequences provide complementary information about the lesion. For example, the T2 weighted and FLAIR sequences help in segmenting the gross tumor, while the T1 post contrast sequence helps to delineate the enhancing tumor and necrotic region from the gross lesion. A fully automated image segmentation pipeline is thus necessary for evaluating a large number of patients across multiple centers.

I-A Literature Survey

In the recent past, various fully automated techniques have been proposed to segment gliomas, and they can be broadly classified as either generative or discriminative techniques. Generative techniques model the joint distribution of the voxel classes and voxel specific features. A typical approach is to register the images onto a probabilistic atlas. An atlas represents a normal healthy brain and comprises white matter, gray matter, ventricles, brain stem, etc. Following registration, various techniques have been developed to classify the tumor as an outlier or additional class. For example, one such method used a covariance determinant estimator to detect outliers, followed by further segmentation using the K-Means algorithm. Since the presence of large tumors or resection cavities alters the structure of the brain, the performance of generative models can be impacted by the registration technique used to align images and by the spatial priors [6]. Overall, as stated in [1], [5]-[8], generative techniques perform well on unseen data. A recent work [6] on a hybrid generative/discriminative model for glioma segmentation shows a boost in performance from combining the two approaches.

Discriminative techniques model the distribution of the class labels conditioned on the image features, e.g. voxel intensities. Discriminative techniques such as Random Forests and Support Vector Machines [17], [18] have been applied to brain tumor segmentation. These techniques are suited for multiclass problems and use hand coded features, such as the mean, median, skewness and symmetry of the brain, to classify voxels. Discriminative techniques tend to misclassify certain voxels as lesion at anatomically and physiologically unlikely locations, since each voxel is modeled as independent of its neighbouring voxels [23]. However, Conditional Random Fields and Markov Random Fields can be used to regularize the segmentation and could lead to improved results. In general, the overall performance of discriminative techniques depends on the quality of the computed features.

In the past decade, deep learning techniques such as Deep Belief Networks, Convolutional Neural Networks and Stacked Denoising Autoencoders have been used in a variety of image classification and segmentation tasks [19]-[22]. Deep learning techniques are capable of learning features such as edges, textures, patterns and various higher order features from raw images. Recently, convolutional neural networks (CNNs) have been used for segmentation of gliomas from MR images [23]-[29] and have outperformed other fully automatic techniques. CNNs can be considered discriminative models since they predict the posterior probability given the image features. Typically, a large amount of labeled data is required to train discriminative models, especially deep learning based ones like CNNs.

Among deep learning models, Restricted Boltzmann Machines (RBM) and Deep Boltzmann Machines [30] can be considered generative models. Convolutional RBMs [31] have been used to extract features to aid semi-automated glioma segmentation, and this approach was judged the best entry in the BraTS 2015 challenge. The focus of this paper is on SDAEs, which learn a compact encoding of the data that can then be used as features for classification. Autoencoders and their variants, which include SDAEs and sparse stacked autoencoders, have been used in various medical image processing applications [32]-[34]. Both RBMs and autoencoders can be pre-trained using unlabeled data. RBMs, being energy based models, are trained using Markov chain techniques like Gibbs sampling, while autoencoders, DAEs and SDAEs have the advantage of being trainable with gradient based backpropagation.

One of the major issues that arises in training deep networks is class imbalance. Class imbalance is particularly acute in medical imaging problems since lesions constitute a minuscule percentage of image voxels. For instance, in gliomas lesion voxels form less than 2% of the total number of image voxels; in such scenarios a novelty/anomaly detection approach would be very effective. The principles of anomaly detection are well studied and typically involve detecting outliers or rare events by measuring a distance metric obtained from a parametric model of the data (excluding the anomalies). Autoencoders and other machine learning techniques [35], [36] have also found applications in novelty detection but have not been explored in the context of brain lesion detection from multi-sequence MR images. The next section outlines the paper's original contributions, which build on unsupervised training and a novelty detection approach for glioma segmentation and ischemic lesion segmentation.

I-B Contribution

This paper describes the application of denoising autoencoders for the detection and segmentation of brain lesions from multi-sequence MR images. Specifically the contributions are:

  • False positive reduction for gliomas and candidate detection for brain lesions (Ischemic lesion) using a novelty detector.

  • Variant of the Novelty Detector, called Cascaded Novelty Detector (CND) which generates unique error distributions for various constituents of glioma.

  • Demonstrating Semi supervised and weakly supervised learning by training SDAE using patches drawn from limited patient volumes (n=20).

  • Transfer learning approach for LGG, which had limited number of labeled training data. The LGG network was obtained by fine tuning the pre trained HGG network.

The manuscript is laid out as follows. Section II describes the data sets used. Section III describes the preprocessing of image data, the training of SDAEs and the post processing using ND. Section IV presents the results and discusses the performance of SDAEs on test data for the brain tumor segmentation task. The paper concludes with a summary and a discussion of future directions in Section V.

II Training Data

The publicly available BraTS-2015 data set [1], [37] was used for training the networks. The data set comprises 220 HGG and 54 LGG patient data. The HGG data set is composed of patients imaged only once (single time point, ST) and patients who were scanned multiple times (longitudinal data, LT). It comprises 123 single time point patients and 97 longitudinal patients, while no longitudinal studies were found in the LGG data set. Each patient data comprises a FLAIR, T2 weighted, T1 weighted and T1 post contrast sequence. Each voxel in the image volumes is classified as one of five classes, namely Normal, Edema, Non Enhancing Tumor, Necrotic Region and Enhancing Tumor, as summarized in Table I. There exists a huge data imbalance among classes in both the HGG and LGG data sets (Table II). Furthermore, certain classes occur more frequently in one grade of glioma than the other: for example, enhancing tumor and necrotic region are more prominent in HGG, while non enhancing tumor is more prominent in LGG.

Type of Lesion                      Class
Healthy/No Lesion and Background    0
Necrotic Region                     1
Edema                               2
Non Enhancing Tumor                 3
Enhancing Tumor                     4
Table I: Labels associated with the various types of lesion in the images

Grade of Glioma   class 0   class 1   class 2   class 3   class 4
HGG               98.74     0.29      0.59      0.0091    0.35
LGG               98.79     0.05      0.8       0.34      0.0013
Table II: Amount of data imbalance in % in HGG and LGG

The ischemic lesion training database made available as part of the ISLES 2015 challenge [38] was used for demonstrating candidate lesion detection using ND. The database consists of 28 patient volumes comprising diffusion weighted images, FLAIR, T1 and T2 weighted sequences.

III Methods

III-A Background

Autoencoders are neural networks that were originally used for dimensionality reduction. They are trained to reconstruct the input data, and dimensionality reduction is achieved by using fewer neurons in the hidden layer than in the input layer. A deep autoencoder is obtained by stacking multiple layers of encoders, with each layer trained independently (pre-training) using an unsupervised learning criterion. A classification layer can be added to the pre-trained encoder and further trained with labeled data (fine tuning). Such an approach, initially outlined in [21], was shown to be an effective way to train deep networks. A denoising autoencoder is a variant in which the hidden layer is pre-trained with artificially corrupted data and the reconstruction error is calculated against the uncorrupted data. DAEs provide robust features, which in turn improve the classification accuracy [22].

III-B Overview

In this work a single layer denoising autoencoder was used as an anomaly/novelty detector by training the network to reconstruct non-lesion patches. The reconstruction error corresponding to lesion and non lesion patches would then be significantly different.

SDAEs were pre-trained layer by layer using a large number of unlabeled patches. The network was fine tuned using labeled patches drawn from a limited subset of patients after adding a classification layer. Voxel wise classification was done on test data volumes by selecting patches centered on every voxel to create a label image. The reconstruction error map was obtained for the entire volume using ND. A binary mask derived from the error map, indicating the lesion regions, was used to reject false positives in the label image.

III-C Preprocessing

III-C1 Histogram Matching

All the volumes in the database were histogram matched [39] to an arbitrarily chosen reference image from the training data. This ensures that the contrast and dynamic range are similar across image volumes (Fig. 4 (a-c)). The same reference image was used for the HGG, LGG and ischemic data sets.
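The idea of histogram matching can be illustrated with a generic CDF-based mapping. This is only a minimal sketch: the method cited as [39] may differ in detail (for example, it may be landmark based), and the function name `match_histogram` is hypothetical.

```python
import numpy as np

def match_histogram(source, reference):
    """Map source intensities so that their empirical CDF follows the reference CDF.

    Generic quantile mapping, assumed here for illustration only.
    """
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Empirical cumulative distributions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(source.shape)
```

The mapping is monotone, so the rank order of voxel intensities within a volume is preserved while its dynamic range is brought in line with the reference.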

III-C2 z-score

Following histogram matching, all sequences corresponding to a patient volume were independently normalized to have zero mean and unit standard deviation.
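A minimal sketch of this per-sequence normalization is given below. It assumes the statistics are computed over the whole volume of each sequence (the text does not say whether background voxels are excluded), and the array layout and function name are hypothetical.

```python
import numpy as np

def zscore_sequences(volume):
    """Normalize each MR sequence independently to zero mean and unit std.

    volume: array of shape (n_sequences, depth, height, width); layout is an assumption.
    """
    out = np.empty_like(volume, dtype=np.float64)
    for s in range(volume.shape[0]):
        seq = volume[s].astype(np.float64)
        std = seq.std()
        # Guard against a constant sequence to avoid division by zero.
        out[s] = (seq - seq.mean()) / (std if std > 0 else 1.0)
    return out
```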

III-D Patch Extraction

Patches of size 21x21 were drawn from all four sequences for pre-training and fine tuning the networks.

III-D1 Patches for SDAE

For pre-training, patches were sampled using a sliding window of 21x21 with a stride of 10 throughout the image volume, ignoring the voxel labels. Patches for fine tuning were extracted from regions around the tumor. This sampling scheme reduces the data imbalance between lesion and non-lesion patches. The patch extraction scheme for the network is shown in Table III.

Since the SDAEs were pre-trained using unlabeled data and fine tuned with limited labeled data, the networks are referred to as Deep Semi-supervised Networks (D-SSN).
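The systematic sampling used for pre-training can be sketched as a sliding window over one slice. The patch size (21) and stride (10) come from the text; the channel-first layout, the handling of image borders (partial windows are simply skipped) and the function name are assumptions.

```python
import numpy as np

def extract_patches(slice_4ch, patch=21, stride=10):
    """Slide a patch x patch window over a (4, H, W) slice.

    Returns an array of shape (n_patches, 4 * patch * patch), i.e. one
    vectorized multi-sequence patch per row (1764 values for 21x21x4).
    """
    _, h, w = slice_4ch.shape
    rows = range(0, h - patch + 1, stride)
    cols = range(0, w - patch + 1, stride)
    patches = [slice_4ch[:, r:r + patch, c:c + patch].ravel()
               for r in rows for c in cols]
    return np.stack(patches)
```

For a 41x41 slice this yields a 3x3 grid of window positions, so 9 vectors of length 1764.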

III-D2 Patches for ND

The patches for training the ND were extracted from FLAIR and T2 images. The non-lesion regions were sampled to obtain the patches.

Pre-training            Fine-Tuning           No. patients
Systematic Sampling,    Vicinity of Tumor,
No class balance        No class balance
Table III: Patch Extraction Scheme for D-SSN

III-E Training

III-E1 Novelty Detector

The novelty detector is a one layer deep DAE (Table IV) with a sigmoid encoding layer and a linear decoder. The novelty detector was trained on 1,110,492 patches (576,636 ST; 533,856 LT) and validated on 438,275 patches (193,955 ST; 244,320 LT) extracted from the same subset of data that was used to fine tune the HGG network. The training data was corrupted by 20% masking noise. The weights and biases of the network were randomly initialized. The network was trained for 200 epochs with an initial learning rate of 0.001. A mean squared error loss function with L2 regularization was optimized with RMSProp.
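The building blocks of such a single layer denoising autoencoder can be sketched as follows. This is an illustrative NumPy sketch rather than the authors' implementation: the L2 coefficient, the function names and the weight shapes are assumptions, and the RMSProp training loop is omitted.

```python
import numpy as np

def masking_noise(x, fraction, rng):
    """Corrupt the input by setting a random fraction of its entries to zero."""
    keep = rng.random(x.shape) >= fraction
    return x * keep

def dae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Sigmoid encoder followed by a linear decoder."""
    hidden = 1.0 / (1.0 + np.exp(-(x @ W_enc + b_enc)))
    return hidden @ W_dec + b_dec

def dae_loss(x_clean, x_recon, weights, l2=1e-4):
    """Mean squared error against the CLEAN input plus an L2 weight penalty.

    The value of l2 is an assumed placeholder; the text does not state it.
    """
    mse = np.mean((x_recon - x_clean) ** 2)
    return mse + l2 * sum(np.sum(w ** 2) for w in weights)
```

During training the network would see `masking_noise(x, 0.2, rng)` as input while the loss compares the reconstruction with the uncorrupted `x`, which is what distinguishes a denoising autoencoder from a plain one.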


III-E2 Stacked Denoising Autoencoders

Separate networks with the same architecture were trained for LGG and HGG segmentation. The network architecture is given in Table IV. Both networks were pre-trained using 130 HGG patients (70 ST and 60 LT data). The HGG network was fine tuned using patches from 10 ST images and 10 LT images. Validation was done using patches from 11 ST and 10 LT images. The data set for fine tuning was a subset of the data set used for pre-training. The LGG network was fine tuned using 20 patient image volumes and validated using patches from 11 patient volumes.
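To make the layer sizes of Table IV concrete, the following sketch builds a fully connected stack with the SDAE dimensions and runs a forward pass. The sigmoid hidden units follow the text; the softmax output layer and the small random initialization are assumptions made for illustration, and all names are hypothetical.

```python
import numpy as np

# Input is a vectorized 21x21 patch from 4 sequences; output has 5 classes (Table IV).
SDAE_SIZES = [21 * 21 * 4, 3500, 2000, 1000, 500, 5]

def n_parameters(sizes):
    """Number of weights and biases in a fully connected stack."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def init_params(sizes, rng):
    """Placeholder initialization (small Gaussian weights, zero biases)."""
    return [(rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def predict_proba(x, params):
    """Sigmoid hidden layers followed by an (assumed) softmax output layer."""
    h = x
    for W, b in params[:-1]:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)
```

With these sizes the network has roughly 15.7 million parameters, most of them in the first two layers, which is one reason the input patch size matters for keeping the model manageable.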

Network   Input Layer   H1     H2     H3     H4    Output Layer
ND        882           3500   -      -      -     882
SDAE      1764          3500   2000   1000   500   5

Table IV: Network Architecture. Hi - No. of neurons in the i-th hidden layer
Figure 4: Histogram Matching (a) Histogram of reference FLAIR sequence. (b) Histogram of test FLAIR sequence. (c) Histogram of test data post histogram matching.

III-E3 HGG Network

The HGG network was pre-trained layer by layer using 941,716 patches with 25% masking noise for 50 epochs. The weights and biases in each layer were initialized using Xavier initialization [40], and RMSProp was used as the optimizer. The networks used a sigmoid encoder and a linear decoder. For fine tuning, the weights and biases connecting the penultimate layer and the decision layer were initialized with zeros. The network was trained using 3,304,035 patches (1,888,020 ST and 1,416,015 LT) and validated on 411,495 patches (235,140 ST; 176,355 LT). The weights of the network were learnt by minimizing the negative log likelihood cost function using Stochastic Gradient Descent with momentum equal to 0.9. The learning rate was initialized to 0.005 and was annealed as a function of the number of epochs (Eq. 1) with a learning rate decay of 0.001. To prevent overfitting, all layers used dropout [42] of 25%.
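The training ingredients named above (Xavier initialization, SGD with momentum, dropout) can be sketched in a few lines. This is an illustration, not the authors' code: the uniform variant of Xavier initialization, the inverted-dropout scaling and all function names are assumptions, and the learning rate annealing of Eq. 1 is not modelled.

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng):
    """Xavier/Glorot uniform initialization: U(-limit, limit), limit = sqrt(6 / (n_in + n_out))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def sgd_momentum_step(w, grad, velocity, lr=0.005, momentum=0.9):
    """One SGD update with classical momentum (lr and momentum as stated in the text)."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def dropout(h, rate, rng):
    """Inverted dropout (assumed variant): zero units with probability `rate`, rescale the rest."""
    if rate == 0.0:
        return h
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)
```

A call such as `dropout(h, 0.25, rng)` would be applied to hidden activations during fine tuning only; at test time the activations are used unchanged.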


III-E4 LGG Network

Due to the limited number of LGG volumes in the data set, the network pre-trained on the HGG data was fine tuned with dropout (35%) using LGG image patches (training: 1,365,450; validation: 181,170). The data augmentation scheme described later increases the training data six fold.

III-F Data augmentation

Lesion classes constitute less than 2% of the image volume, which makes data augmentation unavoidable. Data augmentation was done by rotating image patches through various angles. The angles were chosen such that the fill-in regions are minimized during interpolation. Arbitrary angles are also possible, but the impact of zero filling or zero padding would be difficult to determine. Augmentation is done on the fly, thus minimizing hard disk and RAM usage. Multi-threaded training ensured that patches were augmented and loaded into GPU memory without slowing down training. It was observed that performing label preserving rotations during fine tuning had a significant impact on the classification of less prevalent classes like Non Enhancing Tumor and Necrotic Region.

For the HGG network, patches extracted from single time points were rotated by either 90, -90 or 180 degrees, while patches from longitudinal data were rotated by all three angles. For the LGG network, patches were additionally rotated by -45 & 45 degrees.
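The right-angle rotations can be done without any interpolation, as in the sketch below. It covers only the 90, -90 and 180 degree cases for the HGG network (the 45 degree rotations used for LGG need interpolation and are not shown). Keeping the original patch alongside the rotated copies, the sign convention of the angle and the function names are assumptions.

```python
import numpy as np

def rotate_patch(patch, angle):
    """Rotate a (channels, H, W) patch in-plane by 90, -90 or 180 degrees."""
    k = {90: 1, -90: -1, 180: 2}[angle]
    return np.rot90(patch, k=k, axes=(1, 2))

def augment_hgg(patch, longitudinal, rng):
    """Return the original patch plus its rotated copies.

    Single time point patches get one randomly chosen angle,
    longitudinal patches get all three angles.
    """
    angles = [90, -90, 180]
    if not longitudinal:
        angles = [angles[rng.integers(len(angles))]]
    return [patch] + [rotate_patch(patch, a) for a in angles]
```

Because the centre voxel of an odd-sized patch stays in place under these rotations, the class label of the patch is preserved.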

III-G Hyper-parameter optimization

Hyper-parameters were set by random search in the space of hyper-parameters. The training patch size was included as a hyper-parameter, in addition to the learning rate, optimizer, number of layers, number of neurons per layer and the L1/L2 penalties. Networks trained with various combinations of hyper-parameters were tested on limited test data, and the hyper-parameters corresponding to the best dice scores were identified. Initially, networks were trained with 3D patches, which seemed a natural choice for the problem. However, in order to keep the size of the network at a manageable level, 2D patches were adopted.

III-H Postprocessing using ND

In the test phase for both the networks, vectorized patches (21x21x4) were used as input to classify the centre voxel of the patch.

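Patches from the T2 and FLAIR sequences were used as input to the ND. The reconstruction error map for a slice, $E$, was constructed by assigning to every voxel $(i, j)$ the mean reconstruction error of the patch centered at that voxel (Eq. (2)). This led to a heat map like image with large error regions corresponding to the location of the glioma/lesion. In Eq. (2), $p$ is the size of the patch, i.e. each patch was of size $p \times p$, and $e_{i,j}$ is the patch error. The patch error (Eq. (3)) is the squared error between the FLAIR ($x^{F}_{i,j}$) and T2 ($x^{T2}_{i,j}$) patches and their respective reconstructions $\hat{x}^{F}_{i,j}$ and $\hat{x}^{T2}_{i,j}$. The error map was then binarized using Otsu's thresholding technique [43].

$$E(i, j) = \frac{1}{p^{2}}\, e_{i,j} \qquad (2)$$

$$e_{i,j} = \lVert x^{F}_{i,j} - \hat{x}^{F}_{i,j} \rVert_2^{2} + \lVert x^{T2}_{i,j} - \hat{x}^{T2}_{i,j} \rVert_2^{2} \qquad (3)$$

where $x^{F}_{i,j}$ and $x^{T2}_{i,j}$ are the patches centered at voxel $(i, j)$.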
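The construction of the ND error map and its binarization can be sketched as follows. This is a hedged illustration: `reconstruct` stands for any trained novelty detector (a hypothetical callable mapping a vectorized FLAIR+T2 patch to its reconstruction), zero padding at the slice borders and division by the patch area are assumptions, and the Otsu routine is a generic histogram-based implementation rather than the one used by the authors.

```python
import numpy as np

def nd_error_map(flair, t2, reconstruct, p=21):
    """Assign to each voxel the mean squared reconstruction error of its centred patch."""
    half = p // 2
    h, w = flair.shape
    f = np.pad(flair, half, mode="constant")   # zero padding at borders (assumption)
    t = np.pad(t2, half, mode="constant")
    err = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            x = np.concatenate([f[i:i + p, j:j + p].ravel(),
                                t[i:i + p, j:j + p].ravel()])
            err[i, j] = np.sum((x - reconstruct(x)) ** 2) / p ** 2
    return err

def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold maximizing the between-class variance."""
    hist, edges = np.histogram(values.ravel(), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                 # voxels at or below each bin
    w1 = w0[-1] - w0                     # voxels above each bin
    cum = np.cumsum(hist * centers)
    valid = (w0 > 0) & (w1 > 0)
    m0 = cum[valid] / w0[valid]
    m1 = (cum[-1] - cum[valid]) / w1[valid]
    between = w0[valid] * w1[valid] * (m0 - m1) ** 2
    return centers[valid][np.argmax(between)]
```

A binary lesion mask would then be obtained as `err > otsu_threshold(err)`.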


Following the generation of the binary mask, connected component analysis was carried out on the label image predicted by the HGG and LGG networks. Connected components that had a non-empty intersection with the binary error mask were retained, while the rest were discarded.
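A minimal sketch of this false positive rejection step for a single 2D slice is given below. The use of 4-connectivity, the slice-wise (rather than volumetric) analysis and the function names are assumptions; the text does not specify them.

```python
import numpy as np
from collections import deque

def label_components(mask):
    """4-connected component labelling of a 2D boolean mask (0 = background)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = current
                    queue.append((nr, nc))
    return labels, current

def reject_false_positives(prediction, nd_mask):
    """Keep only predicted components that overlap the binary ND error mask."""
    nd_mask = nd_mask.astype(bool)
    labels, n = label_components(prediction > 0)
    cleaned = prediction.copy()
    for k in range(1, n + 1):
        component = labels == k
        if not np.any(component & nd_mask):
            cleaned[component] = 0
    return cleaned
```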

A variant of the aforementioned novelty detector, called the Cascaded Novelty Detector (CND), was developed. In the CND, the final error value corresponding to a voxel was calculated by maintaining a cumulative error sum over all the image patches containing the voxel. The calculation of the error for each voxel is given in Eq. (4), where $i$ and $j$ are the voxel coordinates of the error map in a given slice, $p$ is the size of the patch and $e_{m,n}$ is the error of the patch centered at $(m, n)$.

$$E_{CND}(i, j) = \sum_{m = i - \lfloor p/2 \rfloor}^{i + \lfloor p/2 \rfloor} \; \sum_{n = j - \lfloor p/2 \rfloor}^{j + \lfloor p/2 \rfloor} e_{m,n} \qquad (4)$$
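The cumulative sum over all patches containing a voxel amounts to a box sum over the map of per-patch errors, which can be computed efficiently with an integral image. The sketch below assumes that a patch error is available for every patch centre in the slice and that patches centred outside the slice contribute zero; the function name is hypothetical.

```python
import numpy as np

def cnd_error_map(patch_error, p=21):
    """Sum, for every voxel, the errors of all p x p patches that contain it.

    patch_error[m, n] is the error of the patch centred at (m, n).
    """
    half = p // 2
    h, w = patch_error.shape
    padded = np.pad(patch_error, half, mode="constant")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Box sum over the p x p window of patch centres around each voxel.
    return (integral[p:p + h, p:p + w] - integral[:h, p:p + w]
            - integral[p:p + h, :w] + integral[:h, :w])
```

Compared with the plain ND map, a voxel near the lesion boundary still accumulates error from the neighbouring patches that overlap the lesion, which is why the CND under-segments less.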


III-I Submission to BraTS 2015 challenge

The authors' submission to the BraTS 2015 challenge is described in [44]. Briefly, two SDAEs with 3 hidden layers (3000-1000-500) each were trained on 3D image patches, one using HGG data and one using a mix of HGG and LGG data. Pre-processing included histogram matching, z-score normalization and intensity clipping. During pre-training, class labels were used to form balanced mini-batches. The predictions from these two networks were combined to obtain the label image. The label image was registered to an anatomic atlas to remove connected components in anatomical regions where the probability of tumor occurrence is generally considered low. The largest connected component was retained.

Figure 21: Performance of Proposed Networks. (a) Ground Truth. (b) Prediction. (c) Ground Truth. (d) Prediction. (e) Ground Truth. (f) Prediction. (g) Ground Truth. (h) Prediction. (i) Raw Prediction. (j) Otsu’s Mask. (k) Prediction after Post Processing. (l) Ground Truth. (m) FLAIR. (n) ND Error Map. (o) Binarized Error Map. (p) Ground Truth. In all images, Orange - Edema, Yellow - Non Enhancing Tumor, Red - Necrotic Region, White - Enhancing Tumor, Green - Otsu’s Mask. In image (o), White - Binarized Error Map.
Figure 42: Performance of ND & CND on BraTS (a-j) and ISLES data set (k-t). (a) ND Error Map. (b) CND Error Map. (c) Ground Truth. (d) CND Error Map. (e) Ground Truth. (f) FLAIR. (g) T2. (h) CND Error Map. (i) Ground Truth. (j) Binarized CND error map. (k) FLAIR. (l) T2. (m) ND Error Map. (n) Binarized ND error map. (o) Ground Truth. (p) FLAIR. (q) T2. (r) CND error map. (s) Binarized Error Map. (t) Ground Truth. In images (c), (e) and (i), Orange - Edema, Yellow - Non Enhancing Tumor, Red - Necrotic Region, White - Enhancing Tumor. In images (n), (o), (s) and (t), Green - Binarized Error Map, Red - Ischemic Lesion Ground Truth.

IV Results and Discussion

IV-A Performance on the Training data

The dice scores on the single time point patients, the longitudinal patients and the entire BraTS 2015 training data set are shown in Table V. The performance of the algorithm on single time point patients is shown in Fig. 21 (a-d).

Fig. 21 (e-h) shows the performance of the network on data from a longitudinal patient at 2 different time points. By including longitudinal patients in the training data, the network attained the capability to capture the tumor region across time points.

Grade   Statistic            WT     TC     AT
All     Median               0.89   0.81   0.84
        Mean                 0.85   0.71   0.75
        Standard Deviation   0.11   0.25   0.22
ST      Median               0.92   0.86   0.87
        Mean                 0.86   0.78   0.80
        Standard Deviation   0.13   0.20   0.18
LT      Median               0.85   0.70   0.79
        Mean                 0.84   0.62   0.69
        Standard Deviation   0.08   0.27   0.25

Table V: D-SSN Performance on HGG data. LT and ST refer to longitudinal and single time point image volumes. WT - Whole Tumor Dice score, TC - Tumor Core Dice score, AT - Active Tumor Dice score.
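For reference, the Dice score and the three evaluation regions can be computed as in the sketch below. The mapping from the class labels of Table I to the regions (WT = labels 1-4, TC = labels 1, 3, 4, AT = label 4) follows the usual BraTS benchmark convention and is an assumption here, as is the choice to return 1.0 when both masks are empty.

```python
import numpy as np

# Region definitions in terms of the class labels of Table I (assumed BraTS convention).
REGIONS = {"WT": (1, 2, 3, 4), "TC": (1, 3, 4), "AT": (4,)}

def dice(pred, truth):
    """Dice score 2|P ∩ T| / (|P| + |T|) between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * np.sum(pred & truth) / denom

def brats_dice(pred_labels, true_labels):
    """Dice score per region for a predicted and a ground truth label image."""
    return {name: dice(np.isin(pred_labels, labels), np.isin(true_labels, labels))
            for name, labels in REGIONS.items()}
```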

IV-B Novelty detector

Fig. 21 (i-l) demonstrates the reduction in false positive voxels using ND. Post processing using the ND mask improved glioma segmentation. The improvement in dice score was of the order of 4% for HGG whole tumor, 1% for HGG tumor core, 1% for HGG active tumor and 3% for LGG whole tumor.

Fig. 21 (n) shows the reconstruction error heat map for the sample slice in Fig. 21 (m). A bi-modal distribution can be inferred on visual inspection of the error map. Minimizing the mean squared reconstruction error corresponds to maximizing the log-likelihood of the training data under a Gaussian assumption. Given this interpretation, any input that does not follow the training data distribution can be expected to give rise to a large mean squared error, enabling lesion patch detection. Cross-validation can be used to determine the ideal threshold, but the threshold would have to be changed depending on the lesion. Otsu’s thresholding technique makes the ND application independent.

A sample ND error map binarized using Otsu’s thresholding is shown in Fig. 21 (o). The mean squared error of a patch is assigned to the center pixel of the patch; consequently, the voxels towards the boundaries of the lesion are assigned a much lower error than the voxels near the center of the lesion. Thus the ND error map underestimates the size of the lesion. For every voxel in the volume, the CND takes into account the reconstruction error from patches centered on its neighbors, so the degree of under-segmentation of the lesion is lower for the CND than for the ND (Fig. 42 (a-c)).

Qualitatively, it was observed that the CND produces unique error distributions for the various constituents of the lesion (Fig. 42 (d-e), (h-i)). From the CND error map, the necrotic region can be easily delineated from edema and enhancing tumor. It is noteworthy that even though the CND was trained on FLAIR and T2 and not T1c, it was able to delineate enhancing tumor regions. However, this was only possible if there existed a corresponding hyper intensity profile in either of its input sequences. Thus a CND trained with FLAIR, T2 and T1 post contrast would be the ideal choice to capture all the constituents of the lesion.

The CND generated error map could be used as initialization point for various generative techniques. The whole lesion could be segmented from the CND error map by setting the threshold to be one standard deviation away from the mean (Fig. 42 (j)).
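The thresholding rule described above can be written in one line. The sketch assumes that "one standard deviation away from the mean" means one standard deviation above the mean and that the statistics are taken over the whole error map; the function name is hypothetical.

```python
import numpy as np

def threshold_mean_plus_std(error_map, n_std=1.0):
    """Binary mask of voxels whose error exceeds mean + n_std * std of the map."""
    return error_map > error_map.mean() + n_std * error_map.std()
```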

The performance of ND on ISLES challenge data is shown in Fig. 42 (k-o). As with glioma segmentation, the error map picks up the location of the ischemic lesions accurately; however, ND misses lesions that constitute less than 1% of the total number of voxels in the volume. The T2 weighted sequences in the ISLES data set were acquired in the sagittal plane, as opposed to the axial plane acquisition in the BraTS data set. Since the ND was trained using the BraTS data set, the resolution mismatch would lead to poor performance in detecting small lesions. On patients with lesions that constitute more than 1% of the total number of voxels, ND achieved a dice score of 0.44 ± 0.21. In contrast, CND achieved a much higher dice score of 0.64 ± 0.17 (Fig. 42 (p-t)). The improved performance is due to the cumulative error sum described above, which reduces under-segmentation. These results imply that the ND can be trained using data from healthy volunteers or other imaging studies comprising the relevant MR sequences.

IV-C Transfer Learning for LGG

The LGG network was trained by fine tuning the pre-trained HGG network. There was a significant improvement in network performance compared to the authors' submission to the BraTS 2015 challenge. A comparison with networks pre-trained with a mix of LGG and HGG data is shown in Table VIII. However, the performance of the LGG network is significantly worse than that of the HGG network due to inherent differences in the abundance of classes found in these grades of glioma. The enhancing (active) tumor region is hardly present in LGG, while the non-enhancing tumor class is rare in HGG. A collection of unlabeled multi-sequence LGG volumes can be expected to improve the prediction accuracy significantly.

IV-D Prediction with missing sequences

The performance of the network upon blocking individual sequences from the input is shown in Table VI. Prediction with a missing sequence was expected to lower the dice scores; however, the magnitude of the decline depended on the sequence dropped. The results were also informative, indicating the relative importance of the sequences. Removing T1 had negligible impact on the whole tumor score, while removing T2 and FLAIR led to the largest decline. The change in dice score for the enhancing (active) tumor was largest when T1c was removed, as can be expected. Based on the decline in performance, one can conclude that FLAIR plays an important role in delineating lesion from normal tissue.

MS    WT                    TC                    AT
      Mean  Std   Median    Mean  Std   Median    Mean  Std   Median
FL    0.35  0.23  0.35      0.55  0.29  0.63      0.58  0.31  0.68
T2    0.79  0.13  0.82      0.61  0.29  0.69      0.70  0.26  0.80
T1    0.80  0.16  0.85      0.61  0.25  0.66      0.58  0.29  0.60
T1c   0.81  0.13  0.86      0.40  0.22  0.38      0.00  0.00  0.00

Table VI: Performance of D-SSN with Missing Sequences (MS): FLAIR (FL), T2, T1, T1 post contrast (T1c). WT - Whole Tumor Dice score, TC - Tumor Core Dice score, AT - Active Tumor Dice score. Std - Standard Deviation.

IV-E Weakly supervised learning

The minimum amount of data required for the network to maintain its level of performance was tested by fine tuning the LGG and HGG networks with data from fewer patients (leading to a decreasing number of patches). The results in Table VII show that, when trained with patches drawn from only 20 patients, the HGG network had a marginal decline in dice scores, which remain comparable to the results obtained when the network was trained on patches drawn from a larger number of patients. It is also notable that if the number of patches extracted from a limited number of patients is increased, the network performance rebounds, as shown in Table VII. The structures in the brain appear similar across different brain MR images. Drawing patches from a limited number of patient volumes, coupled with data augmentation, would therefore still provide enough samples for the network to learn and maintain prediction performance.

N     WT                    TC                    AT
      Mean  Std   Median    Mean  Std   Median    Mean  Std   Median
20    0.84  0.13  0.89      0.72  0.24  0.81      0.74  0.25  0.84
40    0.85  0.13  0.90      0.75  0.23  0.83      0.78  0.23  0.87
65    0.84  0.15  0.89      0.75  0.23  0.83      0.78  0.24  0.86
20M   0.86  0.12  0.90      0.73  0.24  0.83      0.77  0.23  0.86

Table VII: Performance of the network based on the number of training patients used (N); 20M is 20 patients with more patches. WT - Whole Tumor Dice score, TC - Tumor Core Dice score, AT - Active Tumor Dice score. Std - Standard Deviation.

IV-F Performance on Challenge Dataset

The networks were tested on two different challenge test data namely BraTS 2013 challenge test data and BraTS 2015 test data. The performance of networks on BraTS 2013 challenge data and BraTS 2015 test data is given in Table VIII. On the BraTS 2013 leader-board, the network was ranked .

Compared to the authors' previous submission to the 2015 challenge, the current method does significantly better on HGG data for the tumor core as well as the active tumor, while the same level of performance was maintained for the whole tumor dice score. For LGG, the use of a mix of LGG and HGG patches to pre-train and fine tune the network gave better results for the whole tumor. However, the transfer learning approach gives significantly better results for the tumor core. These results indicate that an optimum mix of LGG and HGG data is required for improved segmentation performance on LGG patient volumes.

Year   Nw      G     WT            TC            AT
2013   D-SSN   All   0.85 ± 0.04   0.78 ± 0.15   0.73 ± 0.11
2015   PS      All   0.71 ± 0.24   0.51 ± 0.26   0.58 ± 0.17
               HGG   0.71 ± 0.23   0.57 ± 0.24   0.58 ± 0.17
               LGG   0.73 ± 0.29   0.38 ± 0.28   *
       D-SSN   All   0.73 ± 0.25   0.56 ± 0.28   0.68 ± 0.20
               HGG   0.75 ± 0.19   0.61 ± 0.27   0.68 ± 0.20
               LGG   0.68 ± 0.34   0.46 ± 0.30   *

Table VIII: Performance of D-SSN on challenge data sets compared against the authors' original submission to the BraTS 2015 challenge. Nw - Network, G - Grade of Tumor, WT - Whole Tumor Dice score, TC - Tumor Core Dice score, AT - Active Tumor Dice score. PS - Previous Submission, D-SSN - Current Submission.

V Conclusion

In this paper we propose a completely automated brain tumor segmentation technique with a novel false positive/candidate detection method based on denoising autoencoders.

  • Despite differences in acquisition resolution, ND trained using non-lesion patches (BraTS data) was able to learn the normal brain structure and detect ischemic lesions (ISLES data). A variant of ND (CND), wherein a cumulative error map was calculated for every voxel, was able to significantly improve lesion detection performance on ISLES data. In addition, CND error maps assigned different error distributions to various constituents of glioma, making it an ideal tool to construct tumor atlases. This can also serve as a good initialization for various segmentation techniques.

  • The paper demonstrates the ability of SDAEs to produce good segmentation using data from a minimal number of patients. The redundancy of patches obtained from MR brain images was exploited to train the networks.

  • The results presented are the predictions of a single network with minimal data pre-processing and post-processing. N4 bias correction, a commonly used pre-processing step, was eliminated. Histogram matching to a reference volume was still performed; future work will aim to eliminate it through appropriate data normalization. Skull stripping (the BraTS and ISLES challenge data were already skull stripped) could potentially be eliminated as a separate step by using ND. The aim is to enable prediction on MR images without expensive pre-processing.
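The error-map idea behind ND and CND can be sketched as follows. This is one plausible reading only: the exact definition of the cumulative error map is not given in this section, so the choice to average the mean squared error of every overlapping patch that covers a pixel, the 2D setting, the patch size, the stride, and the constant-valued `reconstruct` stand-in for a trained autoencoder are all assumptions.

```python
import numpy as np

def cumulative_error_map(image, reconstruct, patch=8, stride=4):
    """Per-pixel average of the patch-wise MSE over all covering patches.

    `reconstruct` is a hypothetical callable mapping a patch to its
    reconstruction (a trained novelty detector would go here).
    """
    height, width = image.shape
    err = np.zeros((height, width), dtype=float)
    hits = np.zeros((height, width), dtype=float)
    for r in range(0, height - patch + 1, stride):
        for c in range(0, width - patch + 1, stride):
            p = image[r:r + patch, c:c + patch]
            mse = np.mean((p - reconstruct(p)) ** 2)  # error of this patch
            err[r:r + patch, c:c + patch] += mse       # accumulate per pixel
            hits[r:r + patch, c:c + patch] += 1
    return err / np.maximum(hits, 1)  # pixels never covered keep error 0

# Toy slice: uniform "normal" tissue with a bright square "lesion".
image = np.full((32, 32), 0.5)
image[12:20, 12:20] = 1.0

# Stand-in detector that has only learned the normal intensity.
normal_only = lambda p: np.full_like(p, 0.5)

emap = cumulative_error_map(image, normal_only)
```

In this toy case the map is zero far from the bright square and largest inside it, which is the behaviour the text relies on when it uses error maps to localize lesions.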

In summary, the work presented applies SDAEs to the brain lesion detection and segmentation task using a limited amount of training data. The novelty detector concept allows for efficient elimination of false positives and candidate detection, making it a valuable CAD tool.


  • [1] B. Menze et al. ,“The Multimodal Brain Tumor Image Segmentation Benchmark (BraTS)”, IEEE Transactions on Medical Imaging, vol. 34, no. 10, pp. 1993–2024, 2015.
  • [2] J. Corso et al., “Efficient Multilevel Brain Tumor Segmentation With Integrated Bayesian Model Classification,” IEEE Transactions on Medical Imaging IEEE Trans. Med. Imaging, vol. 27, no. 5, pp. 629–-640, 2008.
  • [3] L. Görlitz et al.

    , “Semi-supervised Tumor Detection in Magnetic Resonance Spectroscopic Images Using Discriminative Random Fields,” Lecture Notes in Computer Science Pattern Recognition, pp. 224–-233.

  • [4] M. Schmidt et al.,“Segmenting Brain Tumors using Alignment-Based Features,” Fourth International Conference on Machine Learning and Applications (ICMLA’05).
  • [5] M. Prastawa et al.

    , “A brain tumor segmentation framework based on outlier detection.” Med Image Anal, vol. 8, pp. 275–283, 2004.

  • [6] B. Menze et al., “A Generative Probabilistic Model and Discriminative Extensions for Brain Lesion Segmentation— With Application to Tumor and Stroke”, IEEE Transactions on Medical Imaging, vol. 35, no. 4, pp. 933-946, 2016.
  • [7] Kilian M.Pohl et al. “A Unifying Approach to Registration, Segmentation, and Intensity Correction.” Lecture Notes in Computer Science Medical Image Computing and Computer-Assisted Intervention – MICCAI 2005, pp. 310-318, 2005.
  • [8] K.Van Leemput et al. “Automated model-based bias field correction of MR images of the brain.”, IEEE Transactions on Medical Imaging pp. 885-896, 1999.
  • [9] R. Meier et al.,“A hybrid model for multimodal brain tumor segmentation,” in Proceedings of NCI-MICCAI BraTS, pp. 31–37, 2013.
  • [10] R.Meier et al., “Appearance-and context-sensitive features for brain tumor segmentation,” in MICCAI Brain Tumor Segmentation Challenge (BraTS), pp. 20-–26, 2014.
  • [11] D. Zikic et al., “Decision Forests for Tissue-Specific Segmentation of High-Grade Gliomas in Multi-channel MR,” Medical Image Computing and Computer-Assisted Intervention – MICCAI 2012 Lecture Notes in Computer Science, pp. 369–-376, 2012.
  • [12] S.Bauer et al., “Segmentation of brain tumor images based on integrated hierarchical classification and regularization,” Proceddings of MICCAI- BRATS, pp. 10–-13, 2012.
  • [13] S. Reza and K. M. Iftekharuddin, “Multi-fractal texture features for brain tumor and edema segmentation,” Medical Imaging 2014: Computer-Aided Diagnosis, 2014.
  • [14] N. J. Tustison et al., “Optimal Symmetric Multimodal Templates and Concatenated Random Forests for Supervised Brain Tumor Segmentation (Simplified) with ANTsR,” Neuroinform Neuroinformatics, vol. 13, no. 2, pp. 209-–225, 2014.
  • [15] E. Geremia et al., “Spatially adaptive random forests,” in 2013 IEEE 10th International Symposium on Biomedical Imaging (ISBI). IEEE, pp. 1344–-1347, 2013.
  • [16] A. Pinto et al., “Brain Tumour Segmentation based on Extremely Randomized Forest with high-level features,” 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015.
  • [17] S. Bauer et al., “Fully Automatic Segmentation of Brain Tumor Images Using Support Vector Machine Classification in Combination with Hierarchical Conditional Random Field Regularization,” Lecture Notes in Computer Science Medical Image Computing and Computer-Assisted Intervention – MICCAI 2011, pp. 354–-361, 2011.
  • [18] C. Lee et al., “Segmenting Brain Tumors Using Pseudo–Conditional Random Fields,” Medical Image Computing and Computer-Assisted Intervention – MICCAI 2008 Lecture Notes in Computer Science, pp. 359–-366, 2008.
  • [19] A. Krizhevsky et al.

    ,“Imagenet classification with deep convolutional neural networks,” in Advances in neural infor- mation processing systems, pp. 1097–-1105, 2012.

  • [20] S. Dieleman et al., “Rotation-invariant convolutional neural networks for galaxy morphology prediction,” Monthly Notices of the Royal Astronomical Society, vol. 450, no. 2, pp. 1441–-1459, 2015.
  • [21] G. E. Hinton and R. Salakhutdinov.“Reducing the Dimensionality of Data with Neural Networks,” Science, vol. 313, no. 5786, pp. 504–-507, 2006.
  • [22] P. Vincent et al., “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. The Journal of Machine Learning Research. 3371–3408, 2010.
  • [23] S. Pereira et al., “Brain Tumor Segmentation using Convolutional Neural Networks in MRI Images,” IEEE Transactions on Medical Imaging IEEE Trans. Med. Imaging, pp. 1–-1, 2016.
  • [24] M. Havaei et al., “Brain tumor segmentation with deep neural networks,” arXiv:1505.03540v1, 2015. [Online]. Available:
  • [25] D. Zikic et al., “Segmentation of brain tumor tissues with convolutional neural networks,” MICCAI Multimodal Brain Tumor Segmentation Chal- lenge (BraTS), pp. 36–-39, 2014.
  • [26] G. Urban et al., “Multi-modal brain tumor segmentation using deep convolutional neural networks,” MICCAI Multimodal Brain Tumor Seg- mentation Challenge (BraTS), pp. 1–-5, 2014.
  • [27] M. Lyksborg et al., “An ensemble of 2d convolutional neural networks for tumor segmentation,” in Image Analysis. Springer, pp. 201–-211, 2015.
  • [28] V. Rao et al., “Brain tumor segmentation with deep learning,” MICCAI Multimodal Brain Tumor Segmentation Challenge (BraTS), pp. 56–-59, 2015.
  • [29] P. Dvor̆ák and B. Menze, “Structured prediction with convolutional neural networks for multimodal brain tumor segmentation,” MICCAI Multimodal Brain Tumor Segmentation Challenge (BraTS), pp. 13–-24, 2015.
  • [30]

    R. Salakhutdinov and G. Hinton, ”An Efficient Learning Procedure for Deep Boltzmann Machines”, Neural Computation, vol. 24, no. 8, pp. 1967-2006, 2012.

  • [31] M.Agn et al., ”Brain Tumor Segmentation Using a Generative Model with an RBM Prior on Tumor Shape”. In International Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries (pp. 168-180). Springer International Publishing.
  • [32] D. Sheet et al., “Deep learning of tissue specific speckle representations in optical coherence tomography and deeper exploration for in situ histology,” 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), 2015.
  • [33] Hoo-Chang Shin et al., “Stacked Autoencoders for Unsupervised Feature Learning and Multiple Organ Detection in a Pilot Study Using 4D Patient Data,” IEEE Transactions on Pattern Analysis and Machine Intelligence IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1930–-1943, 2013.
  • [34] J. Xu et al., ”Stacked Sparse Autoencoder (SSAE) for Nuclei Detection on Breast Cancer Histopathology Images”, IEEE Transactions on Medical Imaging, vol. 35, no. 1, pp. 119-130, 2016.
  • [35] N. Japkowicz et al., “A novelty detection approach to classification”, in IJCAI, pp. 518–523. 1995.
  • [36]

    E. Eskin,”Anomaly detection over noisy data using learned probability distributions”. In Proceedings of the International Conference on Machine Learning.2000.

  • [37] M. Kistler et al., “The Virtual Skeleton Database: An Open Access Repository for Biomedical Research and Collaboration”, Journal of Medical Internet Research, vol. 15, no. 11, p. e245, 2013.
  • [38] ”ISLES: Ischemic Stroke Lesion Segmentation Challenge 2015”,, 2016. [Online]. Available: [Accessed: 24- Nov- 2016].
  • [39] L. G. Nyúl et al., “New variants of a method of mri scale standardization,” IEEE Transactions on Medical Imaging, vol. 19, no. 2, pp. 143–-150, 2000.
  • [40]

    X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks” in International conference on artificial intelligence and statistics, pp. 249–-256, 2010.

  • [41] G. Hinton et al.,“ Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude”, 2012.
  • [42] N. Srivastava et al., “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–-1958, 2014.
  • [43] N. Otsu, “A Threshold Selection Method from Gray-Level Histograms”, IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
  • [44] K. Vaidhya et al., “Multi-modal Brain Tumor Segmentation Using Stacked Denoising Autoencoders,” Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries Lecture Notes in Computer Science, pp. 181–-194, 2016.