1 Introduction
Accurate segmentation of brain anatomy provides the basis for quantitative measurements such as volume, thickness and shape from magnetic resonance images (MRIs). These measurements are widely used in neuroscience research to investigate structural brain changes associated with age and disease. Since manual segmentation of brain MRI scans demands expert knowledge and is time-consuming, a growing number of computational approaches have been proposed for accurate automatic segmentation of subcortical brain structures, which is particularly important for the rapid growth of large-scale brain studies.
Recently, several state-of-the-art methods based on convolutional neural networks (CNNs) have been proposed for fast or accurate segmentation of brain anatomy. These CNN-based networks can be divided into three main categories: multi-task voting models, U-shaped models and downsampling models, where the latter two are based on fully convolutional networks (FCNs)
[1]. Multi-task voting models, such as DeepNAT [2], consist of conventional CNN components such as convolution, max-pooling and linear fully connected layers. Multiple predictions are generated from repeated linear fully connected layers and then reshaped into three-dimensional spatial space. The spatial predictions then add their votes to the segmentation mask step by step, leading to a segmenting time of no less than one hour per scan. With engineered techniques such as a cascaded approach and spectral coordinates, such 3D multi-task models segment subcortical structures with state-of-the-art accuracy, although they are time-demanding. Unlike multi-task voting models, U-shaped models concentrate on dense prediction. Originating from the FCN with upsampling and skip-connection paths
[3], this category takes advantage of fast full-image segmentation with satisfactory but limited accuracy. Later, units built upon ResNet [4] and DenseNet [5] were separately embedded into the U-shaped model, yielding the successive state-of-the-art methods of the voxelwise residual network (VoxResNet) [6] and the fully convolutional DenseNet (FCDenseNet) [7]. Although 3D U-shaped models are capable of fast full-image segmentation in seconds, their complex networks and millions of learnable parameters make fast full-image training infeasible, due to the limitations of GPU memory and annotated data. In fact, U-shaped models are generally trained on patches cropped from the original images, running a time-consuming training process with only small mini-batches available, and their accuracy demands further improvement to match multi-task models. In summary, the absence of lightweight learnable models for fast but accurate segmentation remains a bottleneck in brain anatomy research.

Downsampling models point to a solution to this issue [8, 9]. Similarly based on the FCN [1], this type of method discards the upsampling layers of the FCN and obtains downsampled dense inference directly from the outputs of the bottleneck layer, which enables fast segmentation and eases the computational burden of training. With limited performance resulting from the oversimplified architecture, this branch of FCNs needs further exploration.

In this paper, we propose a 3D end-to-end downsampling model, a fully dense and fully convolutional network (FDFCN), for T1-weighted MRI brain structural segmentation that is both accurate and efficient in the training and segmenting processes. In terms of time consumption, this method retains the fast-segmentation capacity of FCNs while vastly accelerating the training process with less memory occupied than the state-of-the-art U-shaped models, owing to its carefully designed architecture.
In terms of segmenting competence, experiments on the IBSR dataset show that FDFCN produces higher Dice accuracy for 11-structure brain segmentation than the other FCN-based methods. For a convincing comparison, we first apply FCDenseNet, the state-of-the-art method among FCNs, to multi-label subcortical brain segmentation of 3D volumes. The experiments show that FDFCN achieves a 3.66% absolute improvement in Dice accuracy over FCDenseNet (89.81% vs 86.15%). Furthermore, FDFCN inherits the accurate segmentation capability of the multi-task models (89.81% vs 89.76%) while averaging 53 seconds vs 73 minutes per scan. The main contributions of FDFCN are (i) an explicit division of local and global information, endowing FDFCN with better dense inference while alleviating the parameter explosion caused by dense connection, (ii) newly designed dense blocks that enlarge receptive fields without significantly increasing parameters, and (iii) the first incorporation of spectral coordinates into FCNs for spatial context. FDFCN could further exploit its potential for semantic segmentation of brain anatomy by incorporating a fully connected conditional random field (CRF) and a fine-tuning learning strategy.
2 Method
We start by presenting the proposed multi-scale fully dense and fully convolutional architecture of FDFCN, which is at the core of our segmentation method. Section 2.2 introduces the incorporation of dilated convolutions to design new dense blocks with enlarged receptive fields but a negligible parameter increase. Section 2.3 then describes the calculation of spectral coordinates, and finally Section 2.4 gives other details.
2.1 Network Architecture
As mentioned in Section 1, the incorporation of DenseNets leads to outstanding performance among U-shaped FCNs [10]. However, since the training process consumes both time and memory, such an architecture faces difficulties when applied to three-dimensional brain segmentation. As downsampling models provide similar dense inference and are easier to train owing to their naturally lightweight size, we propose to carefully apply DenseNet to the downsampling model for further improvement.
DenseNet [5] designs a sophisticated connectivity pattern that iteratively concatenates all output feature maps in a feedforward fashion. Dense blocks form the basis of DenseNet and are in turn composed of unit layers. The output of each unit layer has k feature maps, where k, hereafter referred to as the growth rate, is typically set to a small value.
Assuming dense blocks contain m densely connected unit layers, the output of a dense block is the concatenation of the outputs of all m layers, with the input concatenated as well. The number of feature maps thus increases by mk after each dense block, growing linearly with depth. With such linear growth of input channels, the number of parameters of the convolutions explodes as depth increases.
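For illustration, this dense connectivity and its linear channel growth can be sketched in PyTorch as follows. This is a minimal sketch with hypothetical channel counts, kernel size and growth rate, not the exact FDFCN configuration:

```python
import torch
import torch.nn as nn

class UnitLayer(nn.Module):
    """BN -> activation -> 3D convolution producing `growth_rate` feature maps."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm3d(in_channels)
        self.act = nn.ReLU()
        self.conv = nn.Conv3d(in_channels, growth_rate, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(self.act(self.bn(x)))

class DenseBlock(nn.Module):
    """m unit layers; each layer sees the concatenation of the block input
    and all previous layer outputs, so channels grow by m * growth_rate."""
    def __init__(self, in_channels, growth_rate, m):
        super().__init__()
        self.layers = nn.ModuleList(
            [UnitLayer(in_channels + i * growth_rate, growth_rate) for i in range(m)]
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)  # in_channels + m * growth_rate maps

block = DenseBlock(in_channels=8, growth_rate=4, m=3)
out = block(torch.randn(1, 8, 16, 16, 16))  # output has 8 + 3 * 4 = 20 channels
```

Because every layer's input width grows with depth, so does the per-layer parameter count, which is exactly the explosion FDFCN is designed to curb.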
As shown in Fig. 1, FDFCN is mainly composed of dense blocks. To alleviate parameter explosion, the inputs of dense blocks are no longer directly concatenated to subsequent layers in the feature extraction (FE) process. This makes a slimmer model, although global context tends to be lost as well. To model both local and global context, FDFCN embeds intermediate-layer outputs in the final prediction, which encourages consistency between features extracted at different scales and embeds fine-grained information directly in the segmenting process. Since this construction fits the model with both a slimmer size and better performance, we call FDFCN a multi-scale network architecture. Apart from the FE dense blocks, dense blocks are, for the first time, also explored in the fully connected (FC) process. Dense connection hierarchically organizes the FC layers and lets local and global context pass through all FC layers, leading to improved performance. Moreover, the FC layers give outputs of lower dimension, so the parameters in the FC process are reduced severalfold, which helps to alleviate overfitting. The remaining components of FDFCN, namely the downsampling and classifying layers, each carry out only one convolution operation. The varying feature resolution through the downsampling layers and the limited number of label classes of the classifying layer prevent further application of dense connectivity to them. Since these layers are few in number compared to the feature extraction and fully connected ones, we call FDFCN a fully dense network architecture.
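The multi-scale embedding of intermediate FE outputs into the final prediction can be sketched as follows. This is a simplified sketch with hypothetical channel counts and plain convolutions standing in for the actual dense blocks, and a hypothetical class count (e.g. 11 structures plus background); it only illustrates the concatenation and center-cropping pattern:

```python
import torch
import torch.nn as nn

class MultiScaleSketch(nn.Module):
    """Concatenate intermediate feature-extraction outputs before the final
    prediction so both local and global context reach the classifier
    (channel counts and layer choices here are illustrative only)."""
    def __init__(self, num_classes=12):
        super().__init__()
        self.act = nn.PReLU()
        self.fe1 = nn.Conv3d(1, 8, kernel_size=3)    # early, fine-grained features
        self.fe2 = nn.Conv3d(8, 16, kernel_size=3)   # intermediate features
        self.fe3 = nn.Conv3d(16, 32, kernel_size=3)  # deep, global features
        # the "fully connected" process realized as 1x1x1 convolutions
        self.fc = nn.Conv3d(8 + 16 + 32, 64, kernel_size=1)
        self.classify = nn.Conv3d(64, num_classes, kernel_size=1)

    @staticmethod
    def center_crop(x, target):
        """Center-crop x spatially to the spatial size of target."""
        d = [(s - t) // 2 for s, t in zip(x.shape[2:], target.shape[2:])]
        t = target.shape[2:]
        return x[:, :, d[0]:d[0] + t[0], d[1]:d[1] + t[1], d[2]:d[2] + t[2]]

    def forward(self, x):
        f1 = self.act(self.fe1(x))
        f2 = self.act(self.fe2(f1))
        f3 = self.act(self.fe3(f2))
        # crop shallower (larger) feature maps to match the deepest scale
        fused = torch.cat([self.center_crop(f1, f3),
                           self.center_crop(f2, f3), f3], dim=1)
        return self.classify(self.act(self.fc(fused)))

net = MultiScaleSketch(num_classes=12)
logits = net(torch.randn(1, 1, 24, 24, 24))  # spatial size shrinks to 18^3
```

The center cropping is what keeps features from different scales spatially aligned before fusion, so fine-grained early features vote directly in the final prediction.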
2.2 New Dense Blocks
Unlike the unit layers in the original blocks of DenseNets, which commonly consist of batch normalization (BN), an activation function and a convolution, hybrid dilated (HD) layers are newly designed to enlarge receptive fields without significantly increasing parameters. The HD layers are built with dilated convolutions and form the basis of the newly designed dense blocks. A dilated convolution is constructed by inserting r - 1 zeros between neighboring voxels in the 3D convolution kernel, where r corresponds to the dilation rate. For a convolution kernel of size k, the size of the resulting dilated filter is k_d = k + (k - 1)(r - 1), where r >= 1. Given r > 1, we have k_d > k, leading to enlarged receptive fields. With the theoretical gridding problem solved by hybrid dilated convolution [11], we are inspired to apply this solution to construct the unit layers of our dense blocks. The new HD layer consists of BN, PReLU and a group of parallel convolutions, where the output is the summation of the outputs of all parallel convolutions (see Fig. 1). These convolutions share the same number of output channels and the same convolution kernel size, and are discriminated by different dilation rates and corresponding padding lengths. The group of dilation rates r_1, ..., r_n should meet two conditions to ensure that the parallel convolutions cover a larger region without any holes or missing edges. Firstly, the dilation rates within a group should not share a common factor. Secondly, defining the maximum distance between two nonzero kernel values as M_i = max[M_{i+1} - 2r_i, M_{i+1} - 2(M_{i+1} - r_i), r_i] with M_n = r_n, there should be M_2 <= k. Here we adopt parallel convolutions with a fixed group of dilation rates, and other combinations could be explored for further improvement.

2.3 Spectral Coordinates
A downside of patch-based FDFCN is the loss of spatial context, which provides valuable information for structures with low tissue contrast. To increase the spatial information, we adopt the spectral coordinates proposed in DeepNAT [2], augmenting the patches with location information in the final prediction. The calculation of spectral coordinates starts with the definition of the adjacency matrix W. The weight w_ij in W between two points i and j is set to 1 if both points are neighbors and within the brain mask, and to 0 otherwise. The Laplacian operator on the volume is then L = D - W, with the node degree matrix D, where d_ii is the sum of the i-th row of W and all off-diagonal entries are zero. We then solve the Laplacian eigenvalue problem L phi = lambda phi, with eigenvalues lambda_i and eigenvectors phi_i.
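This construction can be sketched with NumPy as follows. It is a dense, small-scale sketch under simplifying assumptions: a 6-neighborhood adjacency graph, and eigenvectors taken from the low end of the spectrum while skipping the constant eigenvector of eigenvalue 0; the eigenvector-selection convention may differ from the one described in this paper:

```python
import numpy as np

def spectral_coordinates(mask, n_coords=3):
    """Spectral coordinates for voxels inside a 3D brain mask
    (a dense illustrative sketch of the graph-Laplacian construction)."""
    voxels = np.argwhere(mask)
    idx = {tuple(p): i for i, p in enumerate(voxels)}
    n = len(voxels)
    W = np.zeros((n, n))
    for (z, y, x), i in idx.items():
        for dz, dy, dx in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = idx.get((z + dz, y + dy, x + dx))
            if j is not None:            # neighbor is also inside the mask
                W[i, j] = W[j, i] = 1.0
    D = np.diag(W.sum(axis=1))           # node degree matrix
    L = D - W                            # graph Laplacian, L = D - W
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    coords = np.zeros((n_coords,) + mask.shape)
    for c in range(n_coords):
        # skip the constant eigenvector belonging to eigenvalue 0
        coords[c][mask] = vecs[:, c + 1]
    return coords

mask = np.zeros((6, 6, 6), dtype=bool)
mask[1:5, 1:5, 1:5] = True
coords = spectral_coordinates(mask)      # three coordinate images
```

For real brain volumes the adjacency matrix is very large and sparse, so a sparse eigensolver would be used in practice rather than the dense decomposition above.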
We compute the first three eigenvectors corresponding to the three eigenvalues with the largest real part, where each eigenvector is reshaped into a 3D image and the ensemble forms the spectral brain coordinates. To provide FDFCN with more information, we combine the three spectral coordinates with three Cartesian ones. The Cartesian coordinates are normalized by dividing by the length of the brain volume along each dimension. To the best of our knowledge, this is the first application of such eigenvectors to FCN-based methods.

2.4 Details
In total, FDFCN contains four FE dense blocks, three transition-down convolutions, one FC dense block and one classifying layer (convolution II), in addition to the first feature-extractor layer (convolution I). Convolution I uses padding & stride lengths of 3 & 1. The FE dense blocks consist of HD layers and the FC dense block consists of FC layers (see Fig. 1), each with its own growth rate. The transition-down convolutions use padding & stride lengths of either 1 & 2 (the first) or 0 & 1 (the second and the third), where the included convolutional transition-down (CTD) layers increase the number of channels from input to output. The outputs of three FE dense blocks are concatenated together before the FC dense block, with the outputs of the former two center-cropped to maintain size consistency. In addition, the spectral and Cartesian coordinate patches are also concatenated here, all centered at the same point as the input and output of FDFCN. Finally, the FC and classifying layers share the same convolution kernel size and padding length. Note that PReLU is adopted as the activation function and BN is exploited in the HD, CTD and FC layers. Since FDFCN is an end-to-end approach, the widely used CRF post-processing is not adopted here; an FDFCN version with a CRF could yield better performance if needed.

3 Result
We evaluate FDFCN on the IBSR dataset, which consists of T1-weighted MRI scans of identical size. The dataset contains expert-labelled segmentations of brain structures, among which a subset of 11 important structures is considered (see Fig. 3). In addition, we employ a cross-validation strategy for unbiased estimates of model performance, where each fold is composed of separate training, validation and test examples.

For the data arrangement, we select the input patch size and the corresponding downsampled output patch size for FDFCN as a trade-off between a large enough image region and fast processing speed. In the training process, we randomly sample a fixed maximum number of patches per structure from the skull-stripped MRIs, doubling the number of patches for cerebral cortex and cerebral white matter to account for the higher variability in these classes. In the segmenting process, the output patches are stacked up to form the segmented image, and the corresponding input patches are cropped from the original image, centered at the same locations. Furthermore, we apply intensity normalization to the input patches.
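The stacking of downsampled output patches into a full segmentation can be sketched as follows. The patch sizes are hypothetical (this copy of the paper does not preserve the actual values), and the sketch assumes the volume dimensions are divisible by the output patch size:

```python
import numpy as np

IN_SIZE, OUT_SIZE = 32, 16               # hypothetical input/output patch sizes

def segment_volume(volume, predict_patch):
    """Stack downsampled output patches into a full segmentation image.

    Each input patch is centered at the same location as its output patch,
    so the input is padded to supply context beyond the volume border."""
    pad = (IN_SIZE - OUT_SIZE) // 2
    padded = np.pad(volume, pad, mode="edge")
    seg = np.zeros(volume.shape, dtype=np.int64)
    for z in range(0, volume.shape[0], OUT_SIZE):
        for y in range(0, volume.shape[1], OUT_SIZE):
            for x in range(0, volume.shape[2], OUT_SIZE):
                patch = padded[z:z + IN_SIZE, y:y + IN_SIZE, x:x + IN_SIZE]
                seg[z:z + OUT_SIZE,
                    y:y + OUT_SIZE,
                    x:x + OUT_SIZE] = predict_patch(patch)
    return seg

vol = np.random.rand(32, 32, 32)
# stand-in "network" that thresholds the center crop of its input patch;
# correct stitching must then reproduce a thresholding of the whole volume
seg = segment_volume(vol, lambda p: (p[8:24, 8:24, 8:24] > 0.5).astype(np.int64))
```

The stand-in predictor doubles as a geometry check: because the center crop of each input patch coincides with its output region, the stitched result must equal a voxelwise thresholding of the original volume.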
FDFCN is implemented in the PyTorch framework. The optimization of the network parameters is performed with adaptive moment estimation (Adam) for fast convergence, using cross-entropy as the cost function. The learning rate at each epoch is scheduled from a fixed base learning rate up to a maximum number of epochs. However, we observed that performance stopped improving well before the maximum epoch, allowing us to stop early at that point. In addition, the mini-batch size fills up most of the 12 GB of GPU memory on an NVIDIA Tesla P100 GPU.

In the experiments we horizontally compare the accuracy of FDFCN against two state-of-the-art methods, FCDenseNet and DeepNAT, on the IBSR dataset with a cross-validation strategy. The average Dice coefficients of FDFCN, FCDenseNet and DeepNAT are 89.81%, 86.15% and 89.76% respectively, with average IoU coefficients of 81.93%, 76.25% and 81.83%. The segmenting process of FDFCN consumes on average 53 seconds per image, FCDenseNet 21 seconds and DeepNAT 73 minutes. The 12 GB of GPU memory prevented full-image segmentation with FCDenseNet, let alone full-image training, so we adopted a patch-based FCDenseNet and observed a time increase in its segmenting process. In addition, we incorporated the coordinates into the bottleneck layer of FCDenseNet as a control. Fig. 3 shows the 11-structure comparison of Dice and IoU scores for all three methods, where the Dice and IoU coefficients show the same trend in performance measurement. A visual comparison of the three methods is also illustrated in Fig. 3, where rugged surfaces are accurately segmented by FDFCN without foreign particles. Since DeepNAT is a highly accurate but time-consuming segmentation method, FDFCN inherits its accurate segmentation capability (Dice 89.81% vs 89.76%) with vastly better segmenting efficiency (53 seconds vs 73 minutes). While FCDenseNet stands out among FCN-based methods and enables fast segmentation, FDFCN performs significantly better than FCDenseNet (Dice 89.81% vs 86.15%) with easier training (about 1.5 hours vs 3 days per epoch) and fast segmenting (53 vs 21 seconds). The slight increase in FDFCN's segmenting time comes from its downsampled output patch size compared to that of FCDenseNet.
Since a T1-weighted MRI scan takes on average 10 minutes to acquire on an MRI machine, we believe such an increase in time is negligible and FDFCN comes close to real-time segmentation.
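The Dice and IoU coefficients reported above measure volume overlap between predicted and ground-truth labels; for reference, they can be computed per structure as in the following sketch:

```python
import numpy as np

def dice_and_iou(pred, truth, label):
    """Dice = 2|A∩B| / (|A|+|B|);  IoU = |A∩B| / |A∪B| for one label."""
    a, b = pred == label, truth == label
    inter = np.logical_and(a, b).sum()
    dice = 2.0 * inter / (a.sum() + b.sum())
    iou = inter / np.logical_or(a, b).sum()
    return dice, iou

# toy 1D example: 3 overlapping voxels out of 4 predicted and 5 true
pred = np.array([1, 1, 1, 1, 0, 0])
truth = np.array([0, 1, 1, 1, 1, 1])
dice, iou = dice_and_iou(pred, truth, label=1)
print(round(dice, 3), round(iou, 3))  # 0.667 0.5
```

The two metrics are monotonically related (IoU = Dice / (2 - Dice)), which is why they show the same trend across the compared methods.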
Furthermore, we vertically evaluate the impact of the proposed contributions in FDFCN through control experiments. Introducing the newly designed dense blocks yields a 1.24% improvement in Dice accuracy, and incorporating the spectral and Cartesian coordinates yields a 1.37% Dice improvement. Even apart from these contributions, the multi-scale fully dense and fully convolutional architecture with ordinary dense blocks and no coordinates still outperforms other 3D FCN-based methods.
4 Discussion and Conclusion
We have described FDFCN, an FCN-based multi-scale fully dense network for semantic segmentation of brain anatomy. In the 11-structure segmentation experiment on IBSR, FDFCN achieves the best accuracy compared with two state-of-the-art methods, with fast segmenting and easy training. In the future, larger-scale experiments on extensive datasets will be investigated, and we intend to adopt a CRF and a fine-tuning training strategy to further explore improvements in model competence.
References

[1] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3431-3440.
[2] Wachinger C, Reuter M, Klein T. DeepNAT: deep convolutional neural network for segmenting neuroanatomy[J]. NeuroImage, 2018, 170: 434-445.
[3] Çiçek Ö, Abdulkadir A, Lienkamp S S, et al. 3D U-Net: learning dense volumetric segmentation from sparse annotation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2016: 424-432.
[4] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[5] Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 4700-4708.
[6] Chen H, Dou Q, Yu L, et al. VoxResNet: deep voxelwise residual networks for brain segmentation from 3D MR images[J]. NeuroImage, 2018, 170: 446-455.
[7] Zhang R, Zhao L, Lou W, et al. Automatic segmentation of acute ischemic stroke from DWI using 3D fully convolutional DenseNets[J]. IEEE Transactions on Medical Imaging, 2018, 37(9): 2149-2160.
[8] Kamnitsas K, Ledig C, Newcombe V F J, et al. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation[J]. Medical Image Analysis, 2017, 36: 61-78.
[9] Dolz J, Desrosiers C, Ayed I B. 3D fully convolutional networks for subcortical segmentation in MRI: a large-scale study[J]. NeuroImage, 2018, 170: 456-470.
[10] Jégou S, Drozdzal M, Vazquez D, et al. The one hundred layers tiramisu: fully convolutional DenseNets for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017: 11-19.
[11] Akkus Z, Galimzianova A, Hoogi A, et al. Deep learning for brain MRI segmentation: state of the art and future directions[J]. Journal of Digital Imaging, 2017, 30(4): 449-459.