3-D Convolutional Neural Networks for Glioblastoma Segmentation

11/14/2016 ∙ by Darvin Yi, et al.

Convolutional Neural Networks (CNNs) have emerged as powerful tools for learning discriminative image features. In this paper, we propose a framework of 3-D fully convolutional CNN models for glioblastoma segmentation from multi-modality MRI data. By generalizing CNN models to true 3-D convolutions for learning 3-D tumor MRI data, the proposed approach uses a unique network architecture to decouple image pixels. Specifically, we design a convolutional layer with pre-defined Difference-of-Gaussian (DoG) filters to perform true 3-D convolution incorporating local neighborhood information at each pixel. We then use three trained convolutional layers that act to decouple voxels from the initial 3-D convolution. The proposed framework allows identification of high-level tumor structures on MRI. We evaluate segmentation performance on the BRATS segmentation dataset with 274 tumor samples. Extensive experimental results demonstrate encouraging performance of the proposed approach compared to state-of-the-art methods. Our data-driven approach achieves a median Dice score accuracy of 89%, demonstrating a generalized low-bias approach to learning from medium-size MRI datasets.




1 Introduction

Glioblastoma (GBM) is a highly malignant brain tumor with a dismal prognosis [1]. Most patients experience disease progression within 7-10 months, and targeted therapies have not increased survival [2, 3]. Accurate brain tumor segmentation is a significant yet challenging task for follow-up computer-aided diagnosis. Semi-automatic segmentation remains a bottleneck in mining medical imaging data because it depends on labor-intensive guidance from human experts. Automated methods such as graph cuts tend to produce high-bias models that have not significantly improved accuracy [8].
While data-driven models such as Convolutional Neural Networks (CNNs) are increasingly prevalent [11, 12, 15, 17], high variance limits their use for medical image analysis, since many medical data sets contain at most hundreds of patient samples. Strategies such as data augmentation and transfer learning can help create more generalizable low-bias models [9, 10], but they do not address the development of network structures that incorporate domain knowledge specific to tumor MRI. In this paper, we identify two key weaknesses of previous approaches that apply CNN models to medium-sized imaging data sets. Canonically, CNNs use layers of 2-D convolutions as filters for feature learning and feed the outputs of these convolutional layers into a fully connected neural network. We propose a 3-D CNN model for brain tumor segmentation that generalizes the conventional 2-D architecture to take full advantage of 3-D multi-modality MRI data. In addition, we propose several important advances that lead to accurate segmentation performance.
First, most prior methods for volumetric image data use either 2-D convolutions or limited 3-D convolutions restricted to the xy, yz, and xz planes. By contrast, we propose a true generalization to 3-D CNNs, made computationally feasible by transformation into Fourier space. This innovation renders our system more robust, with minimal loss of spatial information during convolution.

Second, using CNN models with medium-sized data sets is likely to produce high variance because there is too little training data to learn the network weights. Related algorithms use filters pre-trained on ImageNet [11], but these 2-D filters are optimized for object classification in natural images rather than volumetric medical images [11, 12, 13]. Since texture filters have proven effective for image data analysis, in this study we perform 3-D convolutions using pre-defined Difference-of-Gaussian (DoG) filters, which are rotationally symmetric and act as effective blob detectors [14]. Subsequent CNN layers use 1×1×1 convolutions to decouple pixels, expanding the effective data size from the number of patients to the total number of pixels and significantly reducing variance. Our 3-D CNN leverages the structure of medical imaging data to train a robust and efficient algorithm for learning from 3-D images. We apply this framework to brain tumor segmentation and compare it with previous approaches as well as expert annotations.

2 Methods

2.1 3-D Convolutional Neural Network Architecture

Our CNN architecture uses 5 convolutional layers (Figure 1). Starting from volumetric image data in 4 MRI modalities (channels), the first layer performs 3-D convolution over all input channels with 72 pre-defined filters. We then train 4 convolutional layers of 1×1×1 filters over the channels of the preceding layer (72 for the first and 100 for all subsequent layers). The final output layer of 5 channels represents the predicted class probabilities of each pixel as either non-tumor or one of 4 tumor subregions.

Figure 1: 3-D CNN architecture. Starting with 4 MRI modalities as input samples, we convolve 72 sparse 3-D Difference-of-Gaussian (DoG) filters to form the first convolutional layer. The remaining convolutional layers use 1×1×1 (scalar) filters over all channels to generate the subsequent layer, decoupling all pixels after the initial convolution. The last layer produces 5 channels, each corresponding to the probability of classifying a pixel as non-tumor or one of 4 tumor subregions.

2.2 3-D Convolution

Next, we minimize the potential bias of the CNN architecture by treating the volumetric image data as a 3-D grid of pixels. For a 3-D image I and filter K, the usual 2-D convolutions of CNNs generalize to 3-D convolutions as defined in Eq. 1:

(I ∗ K)(x, y, z) = Σ_{i,j,k} I(x − i, y − j, z − k) K(i, j, k)    (1)

For an n×n×n image and an m×m×m filter, the time complexity of direct 3-D convolution is O(n³m³). Since convolution in space is equivalent to element-wise multiplication in Fourier space, this complexity can be reduced to O(n³ log n).
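The space-domain/Fourier-domain equivalence described above can be sanity-checked numerically. The sketch below compares direct 3-D convolution against FFT-based convolution with SciPy; the volume and filter sizes here are illustrative placeholders, not the dimensions used in the paper.

```python
import numpy as np
from scipy.ndimage import convolve
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
volume = rng.standard_normal((32, 32, 32))   # 3-D image I (placeholder size)
kernel = rng.standard_normal((5, 5, 5))      # 3-D filter K (placeholder size)

# Direct spatial-domain 3-D convolution: O(n^3 m^3) multiply-adds.
direct = convolve(volume, kernel, mode="constant", cval=0.0)

# Fourier-domain convolution: element-wise multiplication after 3-D FFTs,
# reducing the cost to roughly O(n^3 log n).
fast = fftconvolve(volume, kernel, mode="same")

assert fast.shape == volume.shape
assert np.allclose(direct, fast, atol=1e-8)  # both compute the same convolution
```

For odd-sized kernels with zero padding, the two routines produce identical output up to floating-point error, which is why FFT-based convolution can be swapped in transparently when filter sizes make the direct sum too slow.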
Previously, CNNs have repeatedly been found to learn first-layer weights that resemble Gabor-like filters [10]. Thus, to save computational time in training the model, we pre-select the first layer's filters to function as edge detectors. More specifically, we use 3-D Difference-of-Gaussian (DoG) filters, each represented by the difference of two normalized 3-D Gaussians of scales σ₁ and σ₂, as defined in Eq. 2:

DoG_{σ₁,σ₂}(x, y, z) = G_{σ₁}(x, y, z) − G_{σ₂}(x, y, z),  where G_σ(x, y, z) = (2πσ²)^{−3/2} exp(−(x² + y² + z²) / (2σ²))    (2)
We created 8 filters of fixed size spanning a range of scales. Previous algorithms have shown the efficacy of DoG filters in blob detection [14]; in particular, their rotational symmetry enables the CNN to learn a blob profile for each pixel. By contrast, while Gabor texture filters have emerged as a common theme in deep learning on image data, their lack of rotational symmetry requires learning a full feature vector for each pixel at every possible orientation, which greatly increases learning complexity.

Next, we apply the 8 DoG filters to the original input images as well as to the magnitude of their gradients. This yields 18 "feature" images per modality: the original pixel intensities and their 8 filter responses, plus the gradient-magnitude values and their 8 filter responses. Applied to all 4 MRI modalities, this design produces a 72-dimensional feature space for each pixel. Overall, this non-trained convolutional layer amounts to a 3-D convolution of the input data with 72 pre-defined filters.
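The feature construction above can be sketched as follows. By linearity, convolving with a DoG filter equals the difference of two Gaussian smoothings, which is how the sketch computes the responses; the scale pairs and volume sizes are assumed for illustration and are not the values used in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_gradient_magnitude

def dog_response(vol, s1, s2):
    """DoG filter response via linearity: difference of two Gaussian smoothings."""
    return gaussian_filter(vol, s1) - gaussian_filter(vol, s2)

def per_pixel_features(modalities, scale_pairs):
    """Stack 18 feature images per modality: intensities + 8 DoG responses,
    then gradient magnitude + its 8 DoG responses.  With 4 modalities and
    8 scale pairs this gives the 72-channel feature space described above."""
    feats = []
    for vol in modalities:
        grad = gaussian_gradient_magnitude(vol, sigma=1.0)  # assumed smoothing scale
        feats.append(vol)                                   # raw intensities
        feats.extend(dog_response(vol, s1, s2) for s1, s2 in scale_pairs)
        feats.append(grad)                                  # gradient magnitude
        feats.extend(dog_response(grad, s1, s2) for s1, s2 in scale_pairs)
    return np.stack(feats)

rng = np.random.default_rng(0)
mods = [rng.standard_normal((16, 16, 16)) for _ in range(4)]   # 4 toy "modalities"
pairs = [(s, 1.6 * s) for s in (0.5, 0.8, 1.2, 1.8, 2.7, 4.0, 6.0, 9.0)]  # assumed scales
X = per_pixel_features(mods, pairs)
assert X.shape == (72, 16, 16, 16)  # 4 modalities x 18 features
```

Computing each DoG response as a difference of separable Gaussian smoothings is much cheaper than an explicit dense 3-D convolution, which is one practical reason DoG filters are attractive for this layer.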

2.3 Subsequent Convolution Layers as Pixel-wise Neural Network

Each subsequent convolution layer consists of 1×1×1 kernels over all input channels. This choice, which makes training feasible on a CPU cluster, is motivated by two benefits: (1) a drastic decrease in the number of weights to be trained and (2) decoupling of pixels, allowing a fully connected neural-network implementation of the subsequent convolution layers. This decoupling is possible because the convolution layers and the softmax loss function operate independently at each pixel.

Thus, given the 72-dimensional feature vector produced for each pixel by the first convolutional layer, we classify each pixel using a fully connected neural network. Our network architecture consists of 3 hidden layers of 100 neurons each, with rectified linear unit (ReLU) activations and the softmax function as the loss function. The output layer of five neurons predicts the classes non-tumor, necrosis, edema, non-enhancing tumor, and enhancing tumor. The final classification step follows a voting algorithm described previously for pooling expert segmentations from the BRATS dataset [16].
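Because every kernel after the first layer covers a single voxel, the trained layers can be implemented as one fully connected network applied independently at each pixel: flatten the spatial grid, multiply by the weight matrices, and reshape back. The sketch below illustrates that equivalence with random placeholder weights matching the layer sizes above (72 → 100 → 100 → 100 → 5); the weights are not trained values.

```python
import numpy as np

def pixelwise_mlp(features, weights, biases):
    """Apply a stack of 1x1x1 convolution layers to a (C, D, H, W) volume.
    Equivalent to a fully connected ReLU network run at every pixel,
    with a per-pixel softmax over the 5 output classes."""
    c, *spatial = features.shape
    x = features.reshape(c, -1).T                 # (num_pixels, channels)
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)                # ReLU on hidden layers
    x = np.exp(x - x.max(axis=1, keepdims=True))  # numerically stable softmax
    x /= x.sum(axis=1, keepdims=True)
    return x.T.reshape(-1, *spatial)              # (5, D, H, W) class probabilities

rng = np.random.default_rng(0)
sizes = [72, 100, 100, 100, 5]                    # layer widths from the text
Ws = [rng.standard_normal((a, b)) * 0.05 for a, b in zip(sizes, sizes[1:])]
bs = [np.zeros(b) for b in sizes[1:]]
probs = pixelwise_mlp(rng.standard_normal((72, 8, 8, 8)), Ws, bs)
assert probs.shape == (5, 8, 8, 8)
assert np.allclose(probs.sum(axis=0), 1.0)        # probabilities per pixel sum to 1
```

Viewed this way, every voxel becomes an independent training sample, which is exactly the variance-reduction argument made in the text.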

3 Experiments

3.1 Brain Tumor MRI Data

We used the Brain Tumor Image Segmentation Challenge (BRATS) [16] to evaluate the performance of the proposed approach. The 2015 BRATS data set consists of 274 samples: 220 patients with high-grade glioma (HGG) and 54 with low-grade glioma (LGG). Each patient has 4 modalities (T1 post-contrast, T1 pre-contrast, T2-weighted, and FLAIR) and an expert segmentation that we treat as ground truth. The expert segmentation provides pixel-wise labeling into five classes based on the consensus of eleven radiologists: non-tumor, necrosis, edema, non-enhancing tumor, and enhancing tumor. We additionally included the BRATS 2013 data set to compare with prior studies. All images were pre-processed by skull stripping, co-registration, and interpolation to a common isotropic voxel grid (Figure 2).

Figure 2: BRATS image data. (a) The four imaging modalities (upper left: T1-Pre, upper right: T1-Post, lower left: T2W, lower right: FLAIR). (b) The four subregions (yellow: necrosis, green: edema, blue: non-enhancing, purple: enhancing), corresponding to the scans in (a). (c) A visualization of the labels superimposed on each modality.

3.2 Evaluation

We evaluated our algorithm on three clinically relevant segmentations: "whole" or "total," referring to the entire tumor; "core," including all structures except edema; and "active," including only the enhancing subregions unique to HGG [16]. For each of these three regions, accuracy is reported as the Dice coefficient between the predicted segmentation and the expert reference, and compared with previously developed algorithms. This score, given in Eq. 3, is equivalent to the harmonic mean of the precision and the recall.
Dice(P, T) = 2 |P ∩ T| / (|P| + |T|)    (3)
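For reference, a minimal implementation of the Dice coefficient on binary masks follows; the example masks are illustrative. For these toy masks, precision and recall both equal 2/3, and the Dice score matches their harmonic mean.

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient between two binary masks: 2|P ∩ T| / (|P| + |T|)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    denom = pred.sum() + truth.sum()
    # Convention: two empty masks agree perfectly.
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0

pred  = np.array([1, 1, 0, 0, 1], bool)   # 3 predicted positives, 2 true positives
truth = np.array([1, 0, 0, 1, 1], bool)   # 3 reference positives
# precision = 2/3, recall = 2/3; Dice equals their harmonic mean = 2/3
assert abs(dice(pred, truth) - 2/3) < 1e-12
```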
4 Results

The performance of the 3-D CNN for "total," "core," and "active" tumor regions on the 2015 data set is shown in Figure 3, with a median accuracy of 89% on total tumor detection (compared to an inter-radiologist reproducibility of 85%; see Table 1). Slice-level comparisons of our algorithm's labels with the expert segmentation are shown for representative samples with varying Dice scores (Figure 4).

Figure 3: Histogram of Dice scores. The full distribution of Dice scores for all 274 patients is shown for each of the three tumor regions (blue: total, turquoise: core, yellow: active). The box-and-whisker plot in the figure summarizes the distribution of segmentation accuracy.
Figure 4: Representative segmented slices. Results of the algorithm are given for three representative patients with (left) high, (middle) intermediate, and (right) low Dice scores. Each column shows (top) the T1 post-contrast image, (middle) the predicted tumor subregion labeling, and (bottom) the expert segmentation labeling for a slice from that patient. Label colors correspond to those of Figure 2.

Table 1 compares the performance of our algorithm with expert segmentations and competing methods for brain tumor segmentation. Overall, our algorithm is highly competitive with prior approaches. First, compared to individual raters, our results match annotations by individual radiologists and approach the expert segmentation generated by a voting algorithm [16]. Additionally, we evaluated our method on the 2013 BRATS data set against the best combination of programs from the 2013 BRATS challenge [16]. While each individual program from the 2013 challenge performs worse than the combination, our algorithm, trained only on the 2013 data, equals or exceeds the combination in all three categories. Finally, we compared our method with other methods on the 2015 data [18]. Our algorithm achieves Dice scores for whole, core, and active tumor detection of 89%, 76%, and 80%, the highest performance in two of the three clinically used regions. The similar outcomes on the 2015 and 2013 BRATS data reaffirm the performance of the proposed 3-D CNN model, with a notable improvement in classification accuracy for active tumor regions.

Method Description Whole Core Active
Rater v. Rater: Comparison between radiologists using 2013 BRATS challenge data. 85 (88/84) 75 (95/67) 74
Rater v. Fused: Comparison between radiologists and the fused segmentation. 91 (93/92) 86 (96/80) 85
Combination: The best combination of 2013 BRATS challenge programs using Algorithm 1. 88 (89/86) 78 (82/66) 71
This work: 3-D Convolutional Neural Network, using the 2013 data set. 89 (89/88) 78 (79/74) 71
Davy: Deep neural networks. 2014 Workshop. 85 (-/-) 74 (-/-) 68
Goetz: Extremely randomized trees. 2014 Workshop. 83 (-/-) 71 (-/-) 68
Kleesiek: ilastik algorithm. 2014 Workshop. 84 (84/82) 68 (71/61) 72
Kwon: GLISTR algorithm. 2014 Workshop. 88 (-/-) 83 (-/-) 72
Meier: Appearance and context sensitive features. 2014 Workshop. 83 (84/-) 66 (73/-) 68
This work: 3-D Convolutional Neural Network, using the 2015 data set. 89 (89/87) 76 (79/69) 80
Table 1: Performance Comparison (%)

5 Conclusion

We have proposed a 3-D fully convolutional network that generalizes conventional CNNs to learning 3-D tumor MRI data. Specifically, we first use a non-trained convolutional layer with pre-defined DoG filters to perform true 3-D convolution that incorporates local neighborhood information at each pixel of the output. We then use three trained convolutional layers that act to decouple voxels, under the assumption that voxels are coupled only through the information already incorporated in the initial 3-D convolution. This architecture of a fully connected neural network at the pixel level allows us to greatly increase the effective training data size from the number of patient samples to the number of pixels. We show that the use of a modified non-trained convolutional layer greatly reduces variance by increasing the number of training samples. Patient-based samples can in principle support complex features that relate wholly different parts of the brain, but the presented voxel-based training data allows the fully connected feed-forward neural network to learn higher-level features from a much larger training set in pixel space. Overall, our generalization to a 3-D CNN incorporates several key innovations that address problems with existing approaches to deep learning on medium-sized imaging data sets.

6 Acknowledgement

This work was supported by the National Institutes of Health (NIH) under Award Number R01EB020527.

7 References


  • [1] Adamson C, et al. Glioblastoma multiforme: a review of where we have been and where we are going. Expert Opin. Investig. Drugs 18(8) (2009) 1061-1083.
  • [2] Omuro A, DeAngelis LM. Glioblastoma and Other Malignant Gliomas: A Clinical Review. JAMA 310(17) (2013) 1842-1850.
  • [3] Omuro AM, Faivre S, Raymond E. Lessons learned in the development of targeted therapy for malignant gliomas. Mol. Cancer Ther. 6(7) (2007) 1909-1919.
  • [4] Barajas RF, et al. Glioblastoma Multiforme Regional Genetic and Cellular Expression Patterns: Influence on Anatomic and Physiologic MR Imaging. Radiology 254(2) (2010) 564-576.
  • [5] Gutman DA, et al. MR Imaging Predictors of Molecular Profile and Survival: Multi-institutional Study of the TCGA Glioblastoma Data Set. Radiology 267(2) (2013) 560-569.
  • [6] Jain R, et al. Genomic Mapping and Survival Prediction in Glioblastoma: Molecular Subclassification Strengthened by Hemodynamic Imaging Biomarkers. Radiology 267(1) (2013) 212-220.
  • [7] Birkbeck N, et al. An Interactive Graph Cut Method for Brain Tumor Segmentation. IEEE WACV (2009) 1-7.
  • [8] Njeh I, et al. 3D multimodal MRI brain glioma tumor and edema segmentation: A graph cut distribution matching approach. Comput. Med. Imag. Graph. 40 (2015) 108-119.
  • [9] Cui X, Goel V, Kingsbury B. Data Augmentation for Deep Neural Network Acoustic Modeling. IEEE ICASSP (2014) 5582-5586.
  • [10] Yosinski J, et al. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 27 (2014) 3320-3328.
  • [11] Deng J, et al. ImageNet: A Large-Scale Hierarchical Image Database. IEEE Comput. Vision and Pattern Recognit. (2009) 248-255.

  • [12] Krizhevsky A, Sutskever I, Hinton GE. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. (2012) 1-9.
  • [13] Bar Y, et al. Chest Pathology Detection Using Deep Learning with Non-Medical Training. IEEE International Symposium on Biomedical Imaging (ISBI) (2015).
  • [14] Lowe DG. Object recognition from local scale-invariant features. Proc. Int. Conf. Comput. Vision. 2 (1999) 1150-1157.
  • [15] Lawrence S, et al. Face Recognition: A Convolutional Neural-Network Approach. IEEE Trans. on Neural Networks 8(1) (1997) 98-113.
  • [16] Menze BH, et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imag. 34(10) (2015) 1993-2024.
  • [17] Wei S, et al. Multi-scale convolutional neural networks for lung nodule classification. International Conference on Information Processing in Medical Imaging. (2015) 588-599.
  • [18] BraTS Challenge Manuscripts. MICCAI (2014).