Hydrocephalus is a condition in which cerebrospinal fluid (CSF) accumulates abnormally in cavities within the patient’s brain. It is often related to the introduction of blood and proteins into the CSF during surgery, trauma, etc., and the resulting change in ventricular size is quite large when comparing coregistered images. It is therefore desirable to identify the incidence of this phenomenon and to correlate its presence or absence with clinical symptoms. To precisely evaluate the damage caused by hydrocephalus, it is important to identify and parcellate the consciousness-related regions-of-interest (ROIs) in brain MR images. However, manual segmentation of these ROIs is generally time-consuming and error-prone due to high inter- and intra-operator variability, especially in hydrocephalus patient data. Therefore, a fully automated method is essential for hydrocephalus patient brain segmentation.
With the success of deep learning in medical imaging, supervised segmentation approaches built on 3D convolutional neural networks (CNNs) have produced accurate segmentation results at high speed. For example, Snehashis et al.  segmented white matter lesions from multi-contrast MR images with a CNN. Pim et al.  used an adversarial training approach to improve CNN-based brain MR image segmentation. Moreover, Mohsen et al.  trained a CNN on legacy brain MR images and evaluated the performance of the domain-adapted network on the same task with images from different domains.
On the other hand, multi-atlas-based segmentation (MABS) is also considered an important and effective method, especially before the era of deep learning. Deformable image registration plays a pivotal role in MABS: it aims to obtain a deformation field that aligns the moving image with the fixed image in a topology-preserving way. Conventionally, many registration approaches [1, 10, 11] estimate the deformation field for each to-be-registered image pair by iterative optimization, which results in very slow computation. Recent deep-learning-based algorithms, such as VoxelMorph , instead use deep neural networks to estimate the deformation fields, and greatly reduce the algorithm runtime with GPU support.
Although the above-mentioned segmentation methods are effective for normal brain images, segmenting the images of hydrocephalus patients remains a challenging task due to the high variation of anatomical structures. Specifically, the pathology-induced deformations vary greatly between individuals, and traditional CNN architectures generalize poorly on distorted samples such as hydrocephalus data. Our goal is therefore to solve the segmentation problem for CNN architectures trained on hydrocephalus patient data, as described below.
In this paper, we propose a novel CNN-based method for automatic brain MR image segmentation. The main contributions are three-fold. First, we deploy a hard-attention module for generating the atlas map, which includes crucial ROIs and resists the influence of large anatomical variations as well as limited training samples; we utilize VoxelMorph to significantly speed up the MABS algorithm and take its segmentation results as our attention maps. Second, inspired by DANet , we develop a multi-level soft-attention module for better generalization on distorted samples, which splits the segmentation into two tasks, i.e., coarse detection and fine segmentation, according to anatomical knowledge; we also adopt a position self-attention mechanism to capture the shape variations between any two regions of the feature maps, easing the poor generalization on distorted samples. Third, the proposed method achieves a large improvement in robustness and accuracy on all 17 ROIs with large variations. To the best of our knowledge, this is the first work that employs deep learning for hydrocephalus patient brain segmentation.
2 Material and Method
In this work, we collected 21 clinical hydrocephalus patients, each with MR images in T1, FA and ADC modalities. Note that all these subjects have hematoma and hydrocephalus, causing large deformations of the brain anatomical structures. Specifically, our aim is to segment the consciousness-related brain regions on 3T MRI scans following , which are Insula-R (IR), Insula-L (IL), Thalamus-R (TR), Thalamus-L (TL), internal Capsule-R-Ant (ICRA), internal Capsule-R-Post (ICRP), internal Capsule-L-Ant (ICLA), internal Capsule-L-Post (ICLP), Cingulate-R-Ant (CRA), Cingulate-R-Post (CRP), Cingulate-L-Ant (CLA), Cingulate-L-Post (CLP), Medial prefrontal cortex-R (MCR), Medial prefrontal cortex-L (MCL), Inferior parietal lobule-L (IPL), Inferior parietal lobule-R (IPR) and Brainstem (B).
Note that we grouped the annotated images into five-fold cross-validation sets. Some preprocessing was also performed in order to feed the network with this data: First, the images were resampled with trilinear interpolation to make them isotropic, with a voxel size of . Then, the images were normalized (dividing by the maximum intensity value in the foreground region) in order to improve convergence. We only utilized random flips in three directions during training, and we did not use any augmentation during testing.
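The preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and the use of `scipy.ndimage.zoom` for trilinear resampling are our assumptions.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(volume, spacing, target_spacing=1.0):
    """Resample a 3D volume to isotropic voxels and normalize by the
    maximum foreground intensity (foreground assumed to be > 0)."""
    # spline order 1 corresponds to trilinear interpolation
    factors = [s / target_spacing for s in spacing]
    iso = zoom(volume, factors, order=1)
    fg = iso[iso > 0]
    if fg.size > 0:
        iso = iso / fg.max()
    return iso

def random_flip(volume, rng):
    """Random flip along each of the three axes (training-time only)."""
    for axis in range(3):
        if rng.random() < 0.5:
            volume = np.flip(volume, axis=axis)
    return volume
```

At test time only `preprocess` would be applied, since no augmentation is used during testing.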
There are two modules designed for our hydrocephalus brain segmentation method: the hard and soft attention modules. The whole pipeline of our method, a hard and soft attention fully convolutional network (FCN), is shown in Fig. 1. The details of the two modules are further illustrated in Sections 2.2 and 2.3, respectively.
2.2 Hard attention module
The hard attention module is designed to generate attention maps as prior knowledge. In our implementation, the coarse segmentation results obtained with MABS are taken as hard attention maps, and we use VoxelMorph for more efficient and robust deformable registration. More specifically, we learn the parameters of a function that generates a deformation field to warp a 3D moving volume to a fixed volume. The loss function measures the negative similarity between the warped image and the fixed image. At test time, given a new image and a fixed (atlas) image, the deformation field between them is obtained by evaluating the learned function, and a candidate segmentation of the new image is produced by warping the label of the fixed image with this field. With multiple fixed images we obtain multiple candidate segmentations, and these transferred labels are further fused into a single consensus segmentation, which is taken as our hard attention map. Note that there is no data leakage in this module.
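The fusion of warped atlas labels into a single consensus map can be sketched with simple majority voting. This is an illustrative sketch under that assumption; the paper does not specify the exact fusion rule, and the function name is hypothetical.

```python
import numpy as np

def fuse_labels(warped_labels, num_classes):
    """Majority-vote fusion of atlas label maps that have been warped
    onto the target image; the result serves as the hard attention map."""
    votes = np.zeros((num_classes,) + warped_labels[0].shape, dtype=np.int32)
    for lab in warped_labels:
        for c in range(num_classes):
            votes[c] += (lab == c)
    # each voxel takes the label with the most votes across atlases
    return votes.argmax(axis=0)
```

With VoxelMorph providing the deformation fields at inference speed, this fusion step is the only remaining per-atlas cost.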
2.3 Soft attention module
The multi-level soft-attention module consists of the following major processes: First, we split all the ROIs shown in Fig. 1 into two groups based on anatomical knowledge. We merge all ROIs into a single foreground class and use the first classifier to localize this foreground, easing the task difficulty. The region of the input images cropped around the first classifier's result is then used as the input of the second classifier for fine-grained segmentation of the 17 ROIs.
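The coarse-to-fine cropping step can be sketched as follows; the margin parameter and function name are our assumptions for illustration.

```python
import numpy as np

def crop_to_foreground(image, coarse_mask, margin=8):
    """Crop the input around the bounding box of the coarse foreground
    prediction; the crop is fed to the second, fine-grained classifier."""
    coords = np.argwhere(coarse_mask > 0)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, image.shape)
    slices = tuple(slice(l, h) for l, h in zip(lo, hi))
    # returning the slices allows mapping fine predictions back to
    # the original volume coordinates
    return image[slices], slices
```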
Second, we design a position attention module in the encoder-decoder FCN. We feed the encoder features into the position attention module and generate new features carrying spatial long-range contextual information through the following three steps:
Generate a spatial attention matrix that models the spatial relationship between any two pixels of the features;
Perform a matrix multiplication between the attention matrix and the original features;
Perform an element-wise sum of the resulting matrix and the original features to obtain the final representations reflecting long-range contexts.
As in Fig. 1, given a local feature map A ∈ R^{C×H×W}, we first feed it into convolution layers to generate two new feature maps B and C, respectively, where {B, C} ∈ R^{C×H×W}. Then, we reshape them to R^{C×N}, where N = H×W is the number of pixels. After that, we perform a matrix multiplication between the transpose of B and C, and apply a softmax layer to calculate the spatial attention map S ∈ R^{N×N}. Meanwhile, we feed feature A into a convolution layer to generate a new feature map D ∈ R^{C×H×W}, and reshape it to R^{C×N}. We then perform a matrix multiplication between D and the transpose of S, and reshape the result to R^{C×H×W}. Finally, we conduct an element-wise sum operation with the features A to obtain the final output E.
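The three steps above can be expressed compactly in numpy. This is a simplified 2D sketch following the DANet formulation; the 1×1 convolutions are stood in for by channel-mixing matrices (`Wb`, `Wc`, `Wd`), and the residual scale `alpha` is a learnable parameter in the actual network.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(A, Wb, Wc, Wd, alpha=1.0):
    """Position attention over a feature map A of shape (C, H, W)."""
    C, H, W = A.shape
    N = H * W
    flat = A.reshape(C, N)
    B, Cm, D = Wb @ flat, Wc @ flat, Wd @ flat   # (C, N) each
    S = softmax(B.T @ Cm, axis=-1)               # (N, N) spatial attention map
    out = D @ S.T                                # attend over all positions
    return (alpha * out + flat).reshape(C, H, W) # residual sum with A
```

Each output position thus aggregates features from every other position, weighted by pairwise similarity, which is what captures the long-range shape variations.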
In this section, we present the results of our proposed automatic brain segmentation method, including the hard- and soft-attention modules, on the hydrocephalus dataset, and compare them with alternative methods. The network is trained and applied on a Titan X GPU using TensorFlow and the NiftyNet platform. Regarding the hyper-parameters, the base learning rate is set to 0.0001 and decreases gradually for multi-task learning. The momentum and weight decay are set to 0.9 and 0.0001, respectively. Note that we only use random flipping in three directions during training.
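For concreteness, a single momentum-SGD update with the hyper-parameters reported above can be sketched as below. This is a generic illustration of the update rule, not the NiftyNet optimizer implementation, and the function name is hypothetical.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=1e-4, momentum=0.9,
                      weight_decay=1e-4):
    """One parameter update: base learning rate 1e-4, momentum 0.9,
    L2 weight decay 1e-4 (added to the gradient)."""
    g = grad + weight_decay * w
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity
```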
Table 1 shows the robustness of the hard attention module. Generally, concatenating the atlas map, i.e., the MABS segmentation result, with the original volumes fed into the FCN model achieves higher accuracy on the hydrocephalus patient dataset ("Base" vs. "Base + Hard"). This indicates that the atlas map provides prior knowledge for the current segmentation task, improving segmentation accuracy and robustness.
To validate our framework with the multi-level soft-attention module, we also compare several different settings and report the results in Table 1. First, we consider the single task of segmentation only; the Dice score of our implementation is . Second, with the network architecture validated in multi-task learning ("Base + Hard" vs. "Base + Hard + Soft"), we further verify the contribution of the proposed soft-attention module. The experimental results in the middle of Table 1 show that it outperforms the other solutions. We therefore conclude that the proposed soft-attention module is beneficial to the segmentation task.
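The Dice score used throughout these comparisons is the standard overlap measure; a minimal per-ROI implementation is sketched below (the both-empty convention is our assumption).

```python
import numpy as np

def dice_score(pred, gt, label):
    """Dice overlap for one ROI label between predicted and
    ground-truth label maps: 2|P ∩ G| / (|P| + |G|)."""
    p = (pred == label)
    g = (gt == label)
    denom = p.sum() + g.sum()
    if denom == 0:
        return 1.0  # both empty: counted as perfect agreement
    return 2.0 * np.logical_and(p, g).sum() / denom
```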
Finally, we compare our proposed method with other state-of-the-art algorithms, including Unet , Vnet , PSPnet , Encnet  and DeeplabV3 , in Table 1. The results show that the proposed method outperforms all compared methods on the five-fold validation set. We also provide a visual comparison of typical segmentation results (Unet vs. the proposed method) with the ground truth in Fig. 2. The labeling of the region inside the yellow box shows that, with the integration of our proposed modules, labeling accuracy and robustness are improved. Four patient cases that Unet fails to locate are successfully captured by our framework.
Furthermore, we compare our method with the state-of-the-art methods for each ROI in Table 2. Our method demonstrates top-tier performance on each of the ROIs compared with the Unet solution, and obtains more accurate results on the hydrocephalus patient dataset.
We have proposed a novel and effective atlas-guided, multi-level soft-attention framework for semantic segmentation of brain MR images of hydrocephalus patients. Specifically, we proposed a hard-attention module that injects prior knowledge from the VoxelMorph-based MABS method to improve the robustness of the network. Moreover, we decomposed the very challenging semantic segmentation task into several sub-tasks following a coarse-to-fine segmentation mechanism. Finally, we added a position attention module to the FCN model to capture long-range contextual information. We have conducted comprehensive experiments on a large, custom medical image dataset; to our knowledge, this is the first work on the hydrocephalus patient brain segmentation task.
-  (1989) Multiresolution elastic matching. Computer Vision, Graphics, and Image Processing 46 (1), pp. 1–21.
-  (2018) An unsupervised learning model for deformable medical image registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9252–9260.
-  (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.
-  (2016) 3D U-Net: learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 424–432.
-  (2019) Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3146–3154.
-  (2017) Transfer learning for domain adaptation in MRI: application in brain lesion segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 516–524.
-  (2016) V-Net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571.
-  (2017) Adversarial training and dilated convolutions for brain MRI segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 56–64.
-  (2018) Multiple sclerosis lesion segmentation from brain MRI via fully convolutional neural networks. arXiv preprint arXiv:1803.09172.
-  (1999) Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging 18 (8), pp. 712–721.
-  (1998) Image matching as a diffusion process: an analogy with Maxwell’s demons. Medical Image Analysis 2 (3), pp. 243–260.
-  (2018) White matter deficits underlying the impaired consciousness level in patients with disorders of consciousness. Neuroscience Bulletin 34 (4), pp. 668–678.
-  (2018) Context encoding for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7151–7160.
-  (2017) Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890.