1 Introduction
Recent advances in the treatment of gliomas have increased the demand for magnetic resonance imaging (MRI) techniques in diagnosis, tumor monitoring, and patient outcome prediction. Accurate segmentation of brain tumors is critical for diagnosis and treatment planning. However, automated brain tumor segmentation in multimodal MRI scans is challenging because of the heterogeneous appearance and shape of gliomas [3].
Deep learning has achieved strong results across a range of computer vision tasks, and a flurry of research has leveraged Convolutional Neural Networks (CNNs) for brain tumor segmentation with great success. Havaei et al. [4] present a two-pathway CNN architecture and predict the label for each pixel by taking a local image patch as input in a sliding-window fashion. Ronneberger et al. [5] develop a fully convolutional network (FCN), namely U-Net, to process the entire image for dense prediction. The network follows an encoder-decoder structure and is trained end-to-end to produce a full-resolution segmentation. Although these 2D CNN-based approaches have achieved impressive segmentation performance, they ignore crucial 3D spatial context, even though most clinical imaging data are volumetric, e.g. 3D MR images. To better represent 3D volumes, Çiçek et al. [1] generalize the U-Net from 2D to 3D by using 3D operations, e.g. 3D convolution and 3D max pooling, in the FCN, leading to the 3D U-Net. Similarly, V-Net [2] uses volumetric convolutions to process MRI volumes and yields more accurate segmentation than the 2D approaches.
It has been shown that an effective way of reasoning about volumetric structure is to use 3D convolutions in deep neural network architectures [1, 2, 6]. However, stacking multiple layers of 3D convolutions incurs a high computational cost compared with regular 2D CNNs because of the extra dimension. A few attempts have been made to alleviate this issue with lightweight network architectures. For example, 3D-ESPNet [7] extends ESPNet, a fast and efficient network based on point-wise convolution for 2D semantic segmentation, to 3D medical image data. S3D-UNet [8] takes advantage of the separable 3D convolution, which divides each 3D convolution into three parallel branches, to reduce the number of learnable network parameters. However, the performance of these efficient models is not comparable to the state of the art.
Contribution. In this paper, to bridge the gap between model efficiency and accuracy for 3D MRI brain tumor segmentation, we propose a novel 3D dilated multi-fiber network (DMFNet). It builds upon the multi-fiber unit [9], which uses efficient group convolution, and introduces a weighted 3D dilated convolution operation to gain multi-scale image representation for segmentation. DMFNet has only 3.88M parameters. Moreover, with inference times of 0.019s on one GPU and 20.6s on one CPU for a single 3D volumetric segmentation, it achieves Dice scores of 80.12%, 90.62% and 84.54% for the enhancing tumor, the whole tumor and the tumor core, respectively, on the 2018 BraTS challenge [10, 11].
2 Method
2.1 Dilated Multi-Fiber (DMF) Unit
A 3D convolution kernel normally operates on all channels of the feature maps, which considerably increases the computational complexity in terms of floating-point operations (FLOPs). Group convolution is an effective way to speed up a model and has been explored for efficient network design, e.g. ShuffleNet [12]. Although the grouping strategy reduces the number of parameters, simply replacing the regular convolution with group convolution may impede the information exchange between channels and hurt the learning capacity. The multi-fiber (MF) unit [9] was proposed for video action recognition and facilitates information flow between groups. Inspired by it, we extend the multi-fiber unit design with an adaptive weighted dilated convolution to capture multi-scale features in brain MR images. In the following, we detail the key components of our DMF unit.
Channel Grouping. The idea of channel grouping is to split the convolutional channels into multiple groups, which significantly reduces the connections between feature maps and kernels and thus saves parameters. As shown in Fig. 1 (a) and (b), the regular residual unit is split into $g$ parallel residual units, called fibers. We assume the kernel size is constant, e.g. $3\times3\times3$, and denote by $\mathrm{param}_{(a)}$ and $\mathrm{param}_{(b)}$ the parameter counts of Fig. 1 (a) and (b), respectively. Thus, we have $\mathrm{param}_{(a)} = c_{in} \times c_{mid} + c_{mid} \times c_{out}$, where $c$ is the number of channels. With the multi-fiber grouping strategy, the parameter count becomes $\mathrm{param}_{(b)} = g \times \left(\frac{c_{in}}{g} \times \frac{c_{mid}}{g} + \frac{c_{mid}}{g} \times \frac{c_{out}}{g}\right) = \mathrm{param}_{(a)} / g$, which is $g$ times smaller than $\mathrm{param}_{(a)}$.
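As a rough check of the saving from channel grouping, the sketch below counts the weights of a residual unit made of two stacked 3D convolutions, with and without grouping. The function name `residual_unit_params`, the channel widths, the number of groups and the 27-element kernel volume are illustrative assumptions rather than values taken from the paper, and biases are ignored.

```python
def residual_unit_params(c_in, c_mid, c_out, groups=1, kernel_volume=27):
    """Count weights in two stacked 3D convolutions (c_in -> c_mid -> c_out).

    With `groups` fibers, each fiber only connects 1/groups of the input
    channels to 1/groups of the output channels of every convolution.
    Biases are ignored; kernel_volume=27 assumes a 3x3x3 kernel.
    """
    for c in (c_in, c_mid, c_out):
        if c % groups != 0:
            raise ValueError("channel counts must be divisible by groups")
    per_fiber = (c_in // groups) * (c_mid // groups) \
        + (c_mid // groups) * (c_out // groups)
    return groups * per_fiber * kernel_volume


# Hypothetical widths and group count, chosen only for illustration.
regular = residual_unit_params(64, 64, 64, groups=1)
grouped = residual_unit_params(64, 64, 64, groups=16)
ratio = regular // grouped
print(regular, grouped, ratio)
```

With these assumed sizes the grouped unit needs 16 times fewer weights, i.e. the saving equals the number of groups.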
Multiplexer. To facilitate information exchange between fibers, $1\times1\times1$ convolutions, dubbed the multiplexer, are used to route information among the fibers [9]. The multiplexer comprises two $1\times1\times1$ convolution layers, as illustrated in Fig. 1. The input channel count $c_{in}$ is squeezed to $c_{in}/4$ and then inflated back to $c_{in}$. Two $1\times1\times1$ convolutions ($2 \times c_{in} \times c_{in}/4 = c_{in}^2/2$ parameters) halve the parameter count compared with a single $1\times1\times1$ convolution ($c_{in}^2$ parameters). In addition, the residual shortcuts, which are placed outside the multiplexer and the entire unit, allow information to pass directly from lower to higher levels, enhancing learning capability without additional parameters.
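The halving claimed for the multiplexer can be checked with a few lines of arithmetic. In this sketch the function names and the channel width are hypothetical, and the squeeze factor of 4 is an assumption inferred from the stated 50% saving (two point-wise layers of size c × c/4 together hold half the weights of one c × c layer).

```python
def multiplexer_params(channels, squeeze_ratio=4):
    """Weights of two 1x1x1 convolutions: channels -> channels/ratio -> channels."""
    squeezed = channels // squeeze_ratio
    return channels * squeezed + squeezed * channels


def single_pointwise_params(channels):
    """Weights of one 1x1x1 convolution that keeps the channel count."""
    return channels * channels


c = 64  # hypothetical channel width
two_layer = multiplexer_params(c)
one_layer = single_pointwise_params(c)
print(two_layer, one_layer)
```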
Dilated Fiber. To enlarge the receptive field and capture multi-scale 3D spatial correlations of brain tumor lesions, dilated convolution [13] is employed. As shown in Fig. 1 (d), the dilated fiber comprises three 3D dilated convolution branches with dilation rates of $d = 1$, 2 and 3, respectively. We assign the learnable weights $\omega_1$, $\omega_2$ and $\omega_3$ to the dilated branches and then sum the weighted outputs. This weighted-sum strategy is conducive to automatically selecting the most valuable information from different fields of view. The weight coefficients are one-initialized, which means the branches contribute equally at the beginning of training.
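To make the weighted sum of dilated branches concrete, here is a minimal one-dimensional sketch in plain Python. It is an illustration only: the paper works with 3D convolutions and learns the branch weights during training, whereas this example uses a 1D signal, fixed weights set to one (mirroring the one-initialization), zero padding, and hypothetical helper names.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-size 1D dilated convolution (cross-correlation) with zero padding."""
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def effective_extent(kernel_size, dilation):
    """Span covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def weighted_dilated_sum(signal, kernel, weights, dilations=(1, 2, 3)):
    """Sum the branch outputs, each scaled by its own weight."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(w * b[i] for w, b in zip(weights, branches))
            for i in range(len(signal))]


signal = [0, 0, 0, 0, 1, 0, 0, 0, 0]   # unit impulse
kernel = [1.0, 1.0, 1.0]               # size-3 kernel along one axis
weights = [1.0, 1.0, 1.0]              # one-initialized branch weights
out = weighted_dilated_sum(signal, kernel, weights)
extents = [effective_extent(3, d) for d in (1, 2, 3)]
print(out, extents)
```

Along one axis, a size-3 kernel with dilation 1, 2 and 3 spans 3, 5 and 7 positions, which is why the three branches see different fields of view at the same parameter cost.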
2.2 Dilated Multi-Fiber Network (DMFNet) Architecture
Using the MF and DMF units as the building blocks, the overall encoder-decoder network architecture of DMFNet is shown in Fig. 2. The 4-channel input corresponds to the 4-modal MRI data. The main body of the network is composed of MF/DMF units, excluding the first and last convolution layers. In the feature encoding stage, we apply the DMF unit in the first six encoding units to achieve multi-scale representation, which benefits from the various receptive field sizes of the dilated convolutions. In the decoding stage, the high-resolution features from the encoder are concatenated with the upsampled features, similar to the U-Net. We adopt trilinear interpolation for upsampling the feature maps. In addition, batch normalization and the ReLU function are applied before each convolution operation of the MF/DMF units.
3 Experiments and Results
3.1 Data and evaluation metric
The 3D MRI data, provided by the Brain Tumor Segmentation (BraTS) 2018 challenge [10, 11], consist of four kinds of MR sequences, namely native T1-weighted (T1), post-contrast T1-weighted (T1ce), T2-weighted (T2) and Fluid Attenuated Inversion Recovery (FLAIR). Each of them has a volume of $240\times240\times155$. The labels for tumor segmentation include the background (label 0), necrotic and non-enhancing tumor (label 1), peritumoral edema (label 2) and GD-enhancing tumor (label 4). The dataset consists of 285 patient cases for training and 66 cases for validation. Although the testing set is not currently available, the performance on the validation set, assessed by the online evaluation server, is used to validate the effectiveness of the proposed method.
Formally, the effectiveness is evaluated by the computational complexity and the segmentation accuracy. The complexity is determined by the number of network parameters and FLOPs (i.e. multiplications and additions) [12]. The segmentation accuracy is measured by Dice score metrics: Dice_ET, the Dice score of the enhancing tumor region (i.e. label 4); Dice_WT, the Dice score of the whole tumor region (i.e. labels 1, 2 and 4); and Dice_TC, the Dice score of the tumor core region (i.e. labels 1 and 4).
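The three region-wise Dice scores can be sketched as follows. This is a minimal illustration rather than the evaluation server's code: it follows the label definitions of Section 3.1 (enhancing tumor is label 4), operates on flattened label lists, and the helper names, the toy label vectors and the convention of returning 1.0 when both masks are empty are all assumptions.

```python
# Label sets per evaluated region, following the BraTS label definitions above.
REGIONS = {"ET": {4}, "WT": {1, 2, 4}, "TC": {1, 4}}


def dice(pred_mask, true_mask):
    """Dice score between two boolean masks given as flat lists."""
    inter = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(pred_mask) + sum(true_mask)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def brats_dice(pred_labels, true_labels):
    """Compute Dice_ET, Dice_WT and Dice_TC from flat label lists."""
    scores = {}
    for name, labels in REGIONS.items():
        pred_mask = [v in labels for v in pred_labels]
        true_mask = [v in labels for v in true_labels]
        scores[name] = dice(pred_mask, true_mask)
    return scores


# Toy example with eight voxels (hypothetical values).
truth = [0, 1, 1, 2, 2, 4, 4, 0]
pred = [0, 1, 2, 2, 2, 4, 0, 0]
scores = brats_dice(pred, truth)
print(scores)
```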
3.2 Implementation details
In our experiments, we use a batch size of 12 and train the DMFNet model on 4 parallel Nvidia GeForce 1080Ti GPUs for 500 epochs. We adopt the Adam optimizer with an initial learning rate . To increase the training data, we use the following data augmentation techniques: (i) random cropping of the MRI data from voxels to voxels; (ii) random mirror flipping across the axial, coronal and sagittal planes with a probability of 0.5; (iii) random rotation with the angle between ; (iv) random intensity shift between and scale between . The generalized dice loss (GDL) is employed to train the network. The L2 norm is applied for model regularization with a weight decay rate of .
3.3 Experimental results and analysis
Comparison with the state of the art. We conduct experiments on the BraTS 2018 validation set and compare our method with state-of-the-art approaches. The performance comparison is presented in Table 1. Our proposed DMFNet achieves scores of 80.12%, 90.62% and 84.54% on Dice_ET, Dice_WT and Dice_TC, respectively. Compared with the best scores achieved by NVDLMED [14] (single model), our model has only marginal performance gaps of 0.06% for the whole tumor, 1.61% for the enhancing tumor and 1.48% for the tumor core. However, according to Table 1, our DMFNet has about 10× fewer parameters and more than 50× fewer FLOPs. Therefore, our method is much more efficient while achieving comparable segmentation accuracy. We also show a visual comparison of the brain tumor segmentation results of various methods, including 3D U-Net [1], Kao et al. [15] and our DMFNet, in Fig. 3. DMFNet generates better segmentation, especially at the class boundaries, which we attribute to the multi-scale representation of the dilated convolutions.
Table 1. Comparison of model complexity and Dice scores on the BraTS 2018 validation set.
| Model | Params (M) | FLOPs (G) | Dice_ET (%) | Dice_WT (%) | Dice_TC (%) |
|---|---|---|---|---|---|
| 0.75× MFNet (ours) | 1.81 | 13.36 | 79.34 | 90.22 | 84.25 |
| MFNet (ours) | 3.19 | 20.61 | 79.91 | 90.43 | 84.61 |
| DMFNet (ours) | 3.88 | 27.04 | 80.12 | 90.62 | 84.54 |
| 3D U-Net* [1] | 16.21 | 1669.53 | 75.96 | 88.53 | 71.77 |
| S3D-UNet [8] | 3.32 | 75.20 | 74.93 | 89.35 | 83.09 |
| 3D-ESPNet [7] | 3.63 | 76.51 | 73.70 | 88.30 | 81.40 |
| Kao et al. [15] | 9.45 | 203.96 | 78.75 | 90.47 | 81.35 |
| No New-Net [16] | 10.36 | 202.25 | 81.01 | 90.83 | 85.44 |
| NVDLMED [14] | 40.06 | 1495.53 | 81.73 | 90.68 | 86.02 |
* Original 3D U-Net, where we use trilinear interpolation for upsampling rather than up-convolution.
Model efficiency. It is also evident from Table 1 that our DMFNet significantly outperforms the methods with similar model complexity (number of parameters and FLOPs), i.e. S3D-UNet and 3D-ESPNet. Without the dilated convolution, the 3D MFNet further reduces the model complexity. Moreover, we devise a remarkably lightweight and efficient network (denoted by 0.75× MFNet in Table 1) by reducing the number of channels in MFNet (see Fig. 2) to 75%. As a result, it has only 1.81M parameters and 13.36G FLOPs. Nevertheless, its Dice scores show that the network still has strong learning capability for 3D brain tumor segmentation. In addition, DMFNet obtains an average inference time of 0.019s on one GPU (Nvidia 1080Ti) or 20.6s on one CPU (E5-2690 v3 @ 2.60GHz) for a single 3D MR image segmentation.
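The complexity ratios and Dice gaps between DMFNet and NVDLMED can be re-derived from the Table 1 entries with simple arithmetic. The sketch below copies the two relevant rows from Table 1; the dictionary layout and the `compare` helper are hypothetical names introduced only for this illustration.

```python
# Rows copied from Table 1 (parameters in millions, FLOPs in billions, Dice in %).
TABLE1 = {
    "DMFNet": {"params": 3.88, "flops": 27.04, "ET": 80.12, "WT": 90.62, "TC": 84.54},
    "NVDLMED": {"params": 40.06, "flops": 1495.53, "ET": 81.73, "WT": 90.68, "TC": 86.02},
}


def compare(ours, reference):
    """Return complexity ratios (reference / ours) and Dice gaps (reference - ours)."""
    return {
        "param_ratio": reference["params"] / ours["params"],
        "flops_ratio": reference["flops"] / ours["flops"],
        "gaps": {k: reference[k] - ours[k] for k in ("ET", "WT", "TC")},
    }


result = compare(TABLE1["DMFNet"], TABLE1["NVDLMED"])
print(round(result["param_ratio"], 1), round(result["flops_ratio"], 1))
print({k: round(v, 2) for k, v in result["gaps"].items()})
```

With these numbers the parameter ratio is a little above 10 and the FLOPs ratio is a little above 55, while the Dice gaps stay below two percentage points for every region.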
Ablation study. The performance comparison between MFNet and DMFNet (Table 1) shows that the dilated convolution raises Dice_ET and Dice_WT, while Dice_TC is marginally lower. Since an adaptive weighting strategy is used for the convolutions with different dilation rates (Fig. 1 (d)), its efficacy is evaluated in Table 2 by comparing it with the equal weight scheme ($\omega_1 = \omega_2 = \omega_3$). Thanks to its ability to learn and select multi-scale context information adaptively, this weighting strategy yields more favorable scores, in particular for Dice_ET.
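Table 2. Effect of the weighting scheme for the dilated branches of DMFNet.
| Model | Weighting scheme | Dice_ET (%) | Dice_WT (%) | Dice_TC (%) |
|---|---|---|---|---|
| DMFNet | Equal ($\omega_1 = \omega_2 = \omega_3$) | 78.969 | 90.539 | 84.207 |
| DMFNet | Learnable $\omega_1$, $\omega_2$, $\omega_3$ | 80.12 | 90.62 | 84.54 |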
The weights $\omega_1$, $\omega_2$ and $\omega_3$ during the training process are plotted in Fig. 4. We notice that $\omega_1$ (green line, corresponding to the small receptive field) plays a major role in the first unit, and its effect decreases in the higher layers. We also observe that the network favors the dilated branch with $\omega_3$ (red line, corresponding to the large receptive field), which has the leading influence in DMF units 2-6. A possible reason is that a kernel with a small receptive field cannot capture useful semantic information in the higher layers, whose feature maps have small spatial dimensions.
4 Conclusion
In this work, we have developed a lightweight and efficient Dilated Multi-Fiber network, with only 3.88M parameters and around 27G FLOPs, that can achieve real-time inference for 3D brain tumor segmentation in MRI. To significantly reduce the heavy computational burden of 3D convolution, we explored multi-fiber units in the spirit of group convolution. Meanwhile, we introduced a learnable weighted 3D dilated convolution to gain multi-scale image representation, which enhances the segmentation accuracy. The experimental results on the 2018 BraTS challenge show that our approach achieves comparable Dice scores (80.12%, 90.62% and 84.54% for ET, WT and TC, respectively) with 10× fewer model parameters and 50× fewer FLOPs than state-of-the-art algorithms, e.g. NVDLMED [14]. This makes our method more practical for handling large-scale 3D medical datasets.
References
[1] Özgün Çiçek, Ahmed Abdulkadir, Soeren S Lienkamp, Thomas Brox, and Olaf Ronneberger. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In MICCAI, 2016.
[2] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 3DV, 2016.
[3] Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629, 2018.
[4] Mohammad Havaei, Axel Davy, David Warde-Farley, Antoine Biard, Aaron Courville, et al. Brain tumor segmentation with deep neural networks. Medical Image Analysis, 2017.
[5] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
[6] Qi Dou, Lequan Yu, Hao Chen, Yueming Jin, Xin Yang, Jing Qin, and Pheng-Ann Heng. 3D deeply supervised network for automated segmentation of volumetric medical images. Medical Image Analysis, 2017.
[7] Nicholas Nuechterlein and Sachin Mehta. 3D-ESPNet with Pyramidal Refinement for Volumetric Brain Tumor Image Segmentation. In MICCAI, 2018.
[8] Wei Chen, Boqiang Liu, Suting Peng, Jiawei Sun, and Xu Qiao. S3D-UNet: Separable 3D U-Net for Brain Tumor Segmentation. In MICCAI, 2018.
[9] Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, and Jiashi Feng. Multi-fiber networks for video recognition. In ECCV, 2018.
[10] Spyridon Bakas, Hamed Akbari, Aristeidis Sotiras, Michel Bilello, et al. Advancing the Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific Data, 2017.
[11] Bjoern H Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging, 2015.
[12] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In CVPR, 2018.
[13] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
[14] Andriy Myronenko. 3D MRI brain tumor segmentation using autoencoder regularization. In MICCAI, 2018.
[15] Po-Yu Kao, Thuyen Ngo, Angela Zhang, Jefferson W Chen, and BS Manjunath. Brain tumor segmentation and tractographic feature extraction from structural MR images for overall survival prediction. In MICCAI, 2018.
[16] Fabian Isensee, Philipp Kickingereder, Wolfgang Wick, Martin Bendszus, and Klaus H Maier-Hein. No new-net. In MICCAI, 2018.