Glioma is the most common primary central nervous system tumor and has high morbidity and mortality. For glioma diagnosis, four standard Magnetic Resonance Imaging (MRI) modalities are generally used: T1-weighted MRI (T1), T2-weighted MRI (T2), T1-weighted MRI with gadolinium contrast enhancement (T1ce) and Fluid Attenuated Inversion Recovery (FLAIR). In practice, combining these four modalities to produce a fine segmentation of brain tumors is challenging and time-consuming for doctors.
Ronneberger et al. proposed a U-shaped convolutional network (called U-Net) and introduced skip connections to fuse multi-level features, which helps the network decode more precisely. Many experimental results show that U-Net performs well in various medical image segmentation tasks. Dong et al. applied U-Net to brain tumor segmentation and used the soft Dice loss as the loss function to address the class imbalance in brain MRI data. Although the soft Dice loss may perform better than the cross-entropy loss in some extremely class-imbalanced situations, its gradient is less stable, which may make training unstable or even prevent convergence.
Inspired by the hierarchical structure within the brain tumor, we propose a novel cascaded U-shaped convolutional network to realize a multistage segmentation of brain tumors. To mitigate the gradient vanishing problem caused by the increase in network depth, each basic block is designed as a residual block. Moreover, we add decoding-layer supervision during the training process, which further alleviates gradient vanishing. To reduce the information loss in the deeper layers, we design many skip connections that transmit high-resolution information from the shallow layers to the corresponding deeper layers, so as to obtain more refined segmentation results. To address the class imbalance problem, we present a loss weighted sampling scheme.
The main contributions of this paper can be summarized as follows.
We propose a novel cascaded U-Net with between-net connections for brain tumor segmentation. In particular, each basic block of our cascaded U-Net is designed as a residual block.
We also design many skip connections to help the transmission of high resolution information from shallow layers to deeper layers.
Moreover, we present a loss weighted sampling scheme to address the severe class imbalance problem. The full implementation and the trained networks are available at the authors’ website.
Finally, our experimental results show that our method outperforms state-of-the-art methods by approximately 1.5 percent in Dice score and 2 percent in sensitivity.
2 The Proposed Cascaded U-Net Method
2.1 Our Cascaded U-Net Architecture
Our network is a novel end-to-end architecture. In particular, our cascaded U-Net is mainly composed of two cascaded U-Nets, which are for different tasks, as shown in Fig. 1. Such a cascaded framework is inspired by the underlying hierarchical structure within the brain tumor. The tumor comprises a tumor core, and the tumor core contains an enhancing tumor.
Given the input brain MRI images, we first extract a non-brain mask and prevent the network from learning the masked areas through loss weighted sampling. Then the first-stage U-Net separates the whole tumor from the background and sends the extracted features to the second-stage U-Net, which further segments the tumor substructures. Such a cascade structure is designed to take advantage of the underlying physiological structure within the brain tumor. The cascade structure multiplies the network depth, which on the one hand enhances the ability of the network to extract semantic features, but on the other hand exacerbates the gradient vanishing problem. In our architecture, we design the following three strategies to mitigate this problem and achieve a coarse-to-fine segmentation of the brain tumor.
Firstly, inspired by the residual network, each basic unit in our network is a residual block built from two stacked convolution blocks. Secondly, auxiliary supervision is added. Specifically, each decoding layer in the network expands a branch composed of a deconvolution and a convolution, which up-samples the feature maps to the same resolution as the input and squeezes the output channels. The training labels are then used to supervise these branch outputs (see the thinner orange arrows in Fig. 1). This introduces additional gradients during training and further alleviates the vanishing of gradients. To some extent, it can also be regarded as an additional constraint that helps the network avoid overfitting. Finally, between-net connections are designed. The features from the decoding layers of the first U-Net are transmitted to the corresponding encoding layers in the second U-Net by a concatenation operation. These between-net connections enable the high-resolution information in shallow layers to be preserved and sent to the deeper layers for a fine segmentation of tumor substructures.
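The residual unit and the between-net connection can be illustrated at the level of tensor shapes. The NumPy sketch below is only an illustration under our own assumptions: feature maps are stored as (N, C, H, W) arrays, the helper names `residual_unit` and `between_net_concat` are hypothetical, and the two stacked convolution blocks are replaced by an arbitrary shape-preserving function.

```python
import numpy as np

def residual_unit(x, transform):
    """Return x + transform(x).

    `transform` stands in for the two stacked convolution blocks and is
    assumed to preserve the shape of x.
    """
    out = transform(x)
    if out.shape != x.shape:
        raise ValueError("transform must preserve the feature-map shape")
    return x + out

def between_net_concat(decoder_feat, encoder_feat):
    """Concatenate U-Net1 decoder features with U-Net2 encoder features
    along the channel axis. Both arrays are laid out as (N, C, H, W)."""
    same_batch = decoder_feat.shape[0] == encoder_feat.shape[0]
    same_size = decoder_feat.shape[2:] == encoder_feat.shape[2:]
    if not (same_batch and same_size):
        raise ValueError("batch size and spatial size must match")
    return np.concatenate([decoder_feat, encoder_feat], axis=1)

# Usage with made-up sizes: 16 decoder channels and 8 encoder channels.
dec = np.ones((2, 16, 32, 32))
enc = np.zeros((2, 8, 32, 32))
fused = between_net_concat(dec, enc)                 # channels add up to 24
res = residual_unit(enc + 1.0, lambda t: 0.5 * t)    # identity path plus transform
```

Concatenation keeps the first-stage features intact, so the second U-Net can decide how much of the high-resolution information to reuse.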
2.2 Training with Loss Weighted Sampling
Our proposed network is an end-to-end architecture, in which the two cascaded U-Nets are trained jointly, ensuring the efficiency of the data processing procedure. To address the extreme imbalance between positive and negative samples in the brain tumor dataset, we present a loss weighted sampling scheme and introduce it into the cross-entropy loss function. Specifically, the sampled loss is formulated as follows:

$$L = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W} S_{ij}\, y_{ncij} \log p_{ncij}$$

where $p$ denotes the predicted probability for the one-hot label $y$ after the softmax function. $N$ is the batch size, $C$ is the number of channels, and $H$ and $W$ are the length and width of the image, respectively. The sample matrix $S$ is computed according to the specific task, and $S_{ij}$ denotes the loss weight of the pixel at the spatial location $(i, j)$.
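A cross-entropy loss weighted by a per-pixel sample matrix can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the arrays are assumed to be laid out as (N, C, H, W), and dividing by the batch size is our assumption about the normalization.

```python
import numpy as np

def softmax(logits, axis=1):
    """Numerically stable softmax along the channel axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def sampled_cross_entropy(logits, onehot, sample_matrix, eps=1e-12):
    """Cross entropy in which each pixel is scaled by its sample weight.

    logits, onehot : arrays of shape (N, C, H, W)
    sample_matrix  : array of shape (H, W) or (N, H, W); 0 drops a pixel
    """
    probs = softmax(logits, axis=1)
    per_pixel = -(onehot * np.log(probs + eps)).sum(axis=1)   # (N, H, W)
    weighted = sample_matrix * per_pixel
    return weighted.sum() / logits.shape[0]   # assumed: average over the batch

# Usage: uniform logits over 4 classes give ln(4) per sampled pixel.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=(2, 3, 3))
onehot = np.eye(4)[labels].transpose(0, 3, 1, 2)
logits = np.zeros((2, 4, 3, 3))
loss_all = sampled_cross_entropy(logits, onehot, np.ones((3, 3)))
loss_none = sampled_cross_entropy(logits, onehot, np.zeros((3, 3)))
```

Pixels whose sample weight is zero contribute neither to the loss nor to the gradient, which is how masked regions are kept out of training.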
The brain MRI image is divided into four regions, $R_1$, $R_2$, $R_3$ and $R_4$, which represent the black background, normal brain region, tumor region, and tumor contour region, respectively (see Fig. 2 (d)). Then the sample matrix $S$ can be computed as:

$$S = \sum_{k=1}^{3} f(R_k, p_k) + w\, f(R_4, p_4)$$

where $f(R_k, p_k)$ denotes a binary matrix obtained by random sampling in $R_k$ with probability $p_k$. The hyper-parameter $w$ is greater than or equal to 1; it is introduced to adjust the loss weight of the contour region and is expected to enhance the ability of the network to recognize the tumor contour.
For most of the MRI images, the black background $R_1$, also referred to as the non-brain mask in this paper, contains a large number of pixels but provides little useful information for segmentation of the tumor. According to this prior knowledge, we let the sampling probability $p_1$ of the background be 0, extract the non-brain mask in advance, and merge it with the prediction maps at test time.
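To compute the branch loss and auxiliary loss in U-Net1, we set the sampling probabilities of the tumor and contour regions to $p_3 = p_4 = 1$. Then the sampling probability $p_2$ of the normal brain region is calculated by:

$$p_2 = \alpha\, \frac{n_3 + n_4}{n_2}$$

where $n_k$ denotes the number of pixels in region $R_k$ ($k = 2$ for the normal brain, $3$ for the tumor and $4$ for the tumor contour), and $\alpha$, usually more than 1, adjusts the proportion of positive and negative samples in a training batch, thus eliminating the class imbalance. Because $f$ is a random sampling operation, as long as $p_2 T \geq 1$ is guaranteed, where $T$ is the number of passes of the network over the whole training set, all pixels in the dataset are expected to participate in the calculation of the loss at least once, so that no information from the brain tumor is lost.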
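The sampling step for the first-stage network can be made concrete with a small NumPy sketch. Everything below is an assumption made for illustration rather than the authors' code: the four regions are encoded as disjoint integer codes in a single map, the normal-brain probability is taken as the class-balance factor times the ratio of tumor pixels to normal-brain pixels and capped at 1, and the contour weight multiplies the sampled contour pixels.

```python
import numpy as np

# Hypothetical integer codes for the four (assumed disjoint) regions.
BACKGROUND, NORMAL, TUMOR, CONTOUR = 0, 1, 2, 3

def normal_brain_probability(region_map, alpha):
    """Sampling probability for normal-brain pixels (assumed form)."""
    n_normal = np.count_nonzero(region_map == NORMAL)
    n_tumor = np.count_nonzero((region_map == TUMOR) | (region_map == CONTOUR))
    if n_normal == 0:
        return 0.0
    return min(1.0, alpha * n_tumor / n_normal)   # capped at 1 (our choice)

def sample_matrix(region_map, probs, contour_weight, rng):
    """Draw a loss-weight matrix: each pixel of a region is kept with that
    region's probability; kept contour pixels receive `contour_weight`."""
    draws = rng.random(region_map.shape)          # uniform in [0, 1)
    weights = {BACKGROUND: 1.0, NORMAL: 1.0, TUMOR: 1.0, CONTOUR: contour_weight}
    S = np.zeros(region_map.shape, dtype=np.float64)
    for code, p in probs.items():
        keep = (region_map == code) & (draws < p)
        S[keep] = weights[code]
    return S

# Usage on a toy 20x20 slice: 240 normal, 12 contour and 4 tumor pixels.
region_map = np.zeros((20, 20), dtype=int)
region_map[2:18, 2:18] = NORMAL
region_map[8:12, 8:12] = CONTOUR
region_map[9:11, 9:11] = TUMOR
p2 = normal_brain_probability(region_map, alpha=1.5)
probs = {BACKGROUND: 0.0, NORMAL: p2, TUMOR: 1.0, CONTOUR: 1.0}
S = sample_matrix(region_map, probs, contour_weight=2.0,
                  rng=np.random.default_rng(0))
```

Because a fresh matrix is drawn for every batch, different normal-brain pixels are selected from epoch to epoch while every tumor pixel is always included.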
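For the branch loss and auxiliary loss in U-Net2, we set the sampling probabilities of the background and normal brain regions to zero and those of the tumor and contour regions to one, i.e., $p_1 = p_2 = 0$ and $p_3 = p_4 = 1$, which means that U-Net2 only learns the segmentation of tumor substructures. Thus, the loss function of our network is

$$L_{total} = L_{branch1} + L_{branch2} + \lambda L_{aux} + \beta R(\theta)$$

where $L_{branch1}$ and $L_{branch2}$ are the branch losses of the two U-Nets, $L_{aux}$ is the auxiliary loss, $\lambda$ is the weighted coefficient, and $R(\theta)$ is the regularization term with hyper-parameter $\beta$ for tradeoff with the other terms.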
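The way the individual terms combine into one training objective can be sketched as follows. This is a hedged illustration only: the function name `total_loss` is hypothetical, the auxiliary losses of all decoding layers are assumed to be summed, and the regularization term is assumed to be a squared L2 norm of the parameters.

```python
import numpy as np

def total_loss(branch1, branch2, aux_losses, params, lam, beta):
    """Combine branch losses, weighted auxiliary losses and a penalty.

    branch1, branch2 : scalar branch losses of U-Net1 and U-Net2
    aux_losses       : iterable of scalar auxiliary (deep supervision) losses
    params           : iterable of parameter arrays
    lam, beta        : weights of the auxiliary and regularization terms
    """
    l2_penalty = sum(float(np.sum(p ** 2)) for p in params)   # assumed L2
    return branch1 + branch2 + lam * sum(aux_losses) + beta * l2_penalty

# Usage with made-up numbers.
value = total_loss(1.0, 2.0, [0.5, 0.5], [np.array([1.0, 2.0])],
                   lam=0.1, beta=0.01)
```

A small auxiliary weight keeps the deep supervision from dominating the two branch losses that produce the final outputs.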
For the testing process, we extract the non-brain mask in advance and fuse it with the outputs of branch1 and branch2 to get the final segmentation result.
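One possible fusion rule for the testing stage is sketched below. The paper does not spell out the rule, so every detail here is an assumption: the non-brain mask is taken to be the set of pixels that are exactly zero in all four modalities, the first branch is treated as a binary whole-tumor map, and the substructure label from the second branch is kept only inside that map.

```python
import numpy as np

def non_brain_mask(modalities):
    """modalities: array of shape (4, H, W). True where every modality is
    exactly zero (assumed definition of the black background)."""
    return np.all(modalities == 0, axis=0)

def fuse_predictions(whole_tumor, substructure, mask):
    """Keep substructure labels inside the predicted whole tumor and force
    the non-brain area to label 0."""
    final = np.where(whole_tumor.astype(bool), substructure, 0)
    final[mask] = 0
    return final

# Usage on a 2x2 toy slice whose top-left pixel is background.
modalities = np.ones((4, 2, 2))
modalities[:, 0, 0] = 0
mask = non_brain_mask(modalities)
whole_tumor = np.array([[1, 0], [1, 1]])      # branch1 output (binary)
substructure = np.array([[4, 1], [2, 4]])     # branch2 output (labels)
final = fuse_predictions(whole_tumor, substructure, mask)
```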
3 Experimental Results
3.1 Datasets and Pre-processing
We evaluate our method on the training data of the BraTS 2017 challenge. It consists of 210 cases of high-grade glioma and 75 cases of low-grade glioma. For each case, brain MRI scans in four modalities (T1, T2, T1ce and FLAIR) are provided. The resolution of the MRI scans is $240 \times 240 \times 155$. The pixel-level labels provided by the radiologists are: 1 for necrotic (NCR) and non-enhancing tumor (NET), 2 for edema (ED), 4 for enhancing tumor (ET), and 0 for everything else. In our experiments, the 210 high-grade cases are divided into three subsets at a ratio of 3:1:1, i.e., 126 training cases, 42 validation cases and 42 testing cases. Low-grade cases are not used. Besides, about 30% of the scans, which do not contain any tumor structure, are discarded in the training process. All the input images are processed by N4-ITK bias field correction and intensity normalization. Data augmentation including random rotation and random flip is used in all algorithms.
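The 126/42/42 partition of the 210 high-grade cases corresponds to a 3:1:1 split, which can be reproduced with a short standard-library sketch. The function name, the fixed seed and the case identifiers are hypothetical; the paper does not state how cases were assigned to subsets.

```python
import random

def split_cases(case_ids, ratio=(3, 1, 1), seed=0):
    """Shuffle case identifiers and split them into train/val/test parts
    according to an integer ratio."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    total = sum(ratio)
    n_train = len(ids) * ratio[0] // total
    n_val = len(ids) * ratio[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]              # remainder goes to testing
    return train, val, test

# Usage with placeholder identifiers for the 210 high-grade cases.
train, val, test = split_cases([f"case_{i:03d}" for i in range(210)])
```

Splitting by case rather than by slice keeps all slices of one patient in the same subset.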
3.2 Implementation Details
All the algorithms were implemented with the open-source deep learning framework PyTorch on a computer with an NVIDIA GeForce GTX1060Ti (6 GB) GPU and an Intel Core i5-7300HQ CPU @ 2.5 GHz (8 GB). The contour weight $w$ is set to 2, and the factor $\alpha$ that adjusts the proportion of positive and negative samples is set to 1.5. The extracted tumor contour is about 10 pixels wide. In the training phase, we use stochastic gradient descent (SGD) with momentum to optimize the loss function. The momentum parameter is 0.9, and the learning rate is decreased from its initial value by a factor of 10 every ten epochs until it reaches a minimum threshold. Weight decay is also applied. The models are trained for about 50 iterations until there is an obvious uptrend in the validation loss. The weighted coefficient $\lambda$ of the auxiliary loss is set to 0.1 initially and decreased by a factor of 10 every ten epochs until it reaches a minimum threshold.
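Both the learning rate and the auxiliary-loss coefficient follow the same step schedule: divide by 10 every ten epochs and stop at a floor. A minimal sketch is given below. The function name is hypothetical, and the floor value used in the usage lines is a placeholder chosen for illustration, not a value taken from the paper.

```python
def step_decay(initial, epoch, factor=10.0, every=10, floor=0.0):
    """Value after dividing `initial` by `factor` once per `every` epochs,
    never dropping below `floor`."""
    value = initial / (factor ** (epoch // every))
    return max(value, floor)

# Usage for the auxiliary-loss coefficient, which starts at 0.1.
# The floor of 1e-4 is a made-up placeholder.
coeff_epoch_0 = step_decay(0.1, epoch=0, floor=1e-4)
coeff_epoch_10 = step_decay(0.1, epoch=10, floor=1e-4)
coeff_epoch_45 = step_decay(0.1, epoch=45, floor=1e-4)
```

Decaying the auxiliary coefficient shifts the emphasis from the deeply supervised intermediate outputs to the final branch outputs as training progresses.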
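For segmentation results, we evaluate the following three parts: (1) Whole Tumor (WT); (2) Tumor Core (TC); and (3) Enhancing Tumor (ET). For each part, Dice score, sensitivity and specificity are defined as follows:

$$\mathrm{Dice}(P, T) = \frac{|P_1 \wedge T_1|}{(|P_1| + |T_1|)/2}, \qquad \mathrm{Sensitivity}(P, T) = \frac{|P_1 \wedge T_1|}{|T_1|}, \qquad \mathrm{Specificity}(P, T) = \frac{|P_0 \wedge T_0|}{|T_0|}$$

where $P$ and $T$ denote the segmentation results and labels, and $P_0$, $P_1$, $T_0$ and $T_1$ denote the negatives in $P$, positives in $P$, negatives in $T$ and positives in $T$, respectively.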
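The three metrics can be computed per tumor part with the standard overlap definitions: Dice is twice the overlap divided by the total number of predicted and true positives, sensitivity is the overlap divided by the true positives, and specificity is the shared negatives divided by the true negatives. The sketch below assumes the usual BraTS grouping of labels (WT = {1, 2, 4}, TC = {1, 4}, ET = {4}); the function names are hypothetical, and returning NaN for an empty denominator is our choice.

```python
import numpy as np

# Assumed BraTS grouping of the labels 1 (NCR/NET), 2 (ED) and 4 (ET).
REGIONS = {"WT": (1, 2, 4), "TC": (1, 4), "ET": (4,)}

def _ratio(num, den):
    """Safe division; NaN when the denominator is empty."""
    return float(num) / float(den) if den > 0 else float("nan")

def region_scores(pred_labels, true_labels, region):
    """Return (dice, sensitivity, specificity) for one tumor part."""
    p1 = np.isin(pred_labels, REGIONS[region])   # predicted positives
    t1 = np.isin(true_labels, REGIONS[region])   # true positives
    p0, t0 = ~p1, ~t1                            # negatives
    overlap = np.count_nonzero(p1 & t1)
    shared_neg = np.count_nonzero(p0 & t0)
    dice = _ratio(2 * overlap, np.count_nonzero(p1) + np.count_nonzero(t1))
    sens = _ratio(overlap, np.count_nonzero(t1))
    spec = _ratio(shared_neg, np.count_nonzero(t0))
    return dice, sens, spec

# Usage on a 2x2 toy example.
pred = np.array([[0, 1], [2, 4]])
truth = np.array([[0, 1], [4, 4]])
wt = region_scores(pred, truth, "WT")
tc = region_scores(pred, truth, "TC")
et = region_scores(pred, truth, "ET")
```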
3.3 Results and Analysis
To verify the effectiveness of our proposed cascaded U-Net (CUN) and loss weighted sampling (LWS) scheme, we compare our CUN method with several state-of-the-art deep learning algorithms, including U-Net, BFCN and DRN.
The visual results are shown in Fig. 3. It can be seen that our proposed CUN+LWS has the best segmentation sensitivity among the five methods and is better at segmenting the tiny substructures within a brain tumor. The distributions of the obtained Dice scores and sensitivities are presented in Fig. 4. The quantitative results of the five models on the testing set are listed in Table 1. Our CUN method outperforms the three state-of-the-art methods by approximately 1.5 percent in Dice score and 2 percent in sensitivity. Besides, when LWS is adopted, there is an additional average gain of 1.5 percent in sensitivity, which indicates the effectiveness of LWS.
|Method||Whole Tumor||Tumor Core||Enhance Tumor|
Inspired by the hierarchical structure of brain tumors, we proposed a novel cascaded U-Net for the segmentation of brain tumors. To make the network work more effectively, three strategies were designed. The residual blocks and the auxiliary supervision help gradients flow more smoothly during training and alleviate the gradient vanishing problem caused by increasing network depth. The between-net connections transmit the high-resolution information from the shallow layers to the deeper layers and yield more refined segmentation results. Furthermore, we presented a loss weighted sampling scheme that adjusts the number of samples in different classes to address the severe class imbalance problem. Our experimental results demonstrated the advantages of our network and the effectiveness of the loss weighted sampling scheme.
This work was supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (No. 61621005), the Major Research Plan of the National Natural Science Foundation of China (Nos. 91438201 and 91438103), the National Natural Science Foundation of China (Nos. 61876221, 61876220, 61836009, U1701267, 61871310, 61573267, 61502369 and 61473215), the Program for Cheung Kong Scholars and Innovative Research Team in University (No. IRT_15R53), the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) (No. B07048), and the Science Foundation of Xidian University (Nos. 10251180018 and 10251180019).
- (2017) Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks. In Annual Conference on Medical Image Understanding and Analysis, pp. 506–517.
- (2017) Brain tumor segmentation with deep neural networks. Medical Image Analysis 35, pp. 18–31.
- (2015) A convolutional neural network approach to brain tumor segmentation. In International Workshop on Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pp. 195–208.
- (2017) CNN-based segmentation of medical imaging data. arXiv preprint arXiv:1701.03056.
- (2017) Dilated convolutions for brain tumor segmentation in MRI scans. In International MICCAI Brainlesion Workshop, pp. 253–262.
- (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241.
- (2017) Boundary-aware fully convolutional network for brain tumor segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 433–441.
- (2017) Multi-task fully convolutional network for brain tumour segmentation. In Annual Conference on Medical Image Understanding and Analysis, pp. 239–248.
- (2017) Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. In International MICCAI Brainlesion Workshop, pp. 178–190.