Robust Brain Magnetic Resonance Image Segmentation for Hydrocephalus Patients: Hard and Soft Attention

by   Xuhua Ren, et al.
Shanghai Jiao Tong University

Brain magnetic resonance (MR) image segmentation for hydrocephalus patients is a challenging task. The variation of brain anatomical structures across individuals is difficult to encode, and the task becomes even harder for image data from hydrocephalus patients, which often show large deformations and differ significantly from those of normal subjects. Here, we propose a novel strategy with hard and soft attention modules to solve the segmentation problem for hydrocephalus MR images. Our main contributions are three-fold: 1) the hard-attention module generates a coarse segmentation map using a multi-atlas-based method and the VoxelMorph tool, which guides the subsequent segmentation process and improves its robustness; 2) the soft-attention module incorporates position attention to capture precise context information, which further improves the segmentation accuracy; 3) we validate our method by segmenting the insula, thalamus and many other regions of interest (ROIs) that are critical for quantifying brain MR images of hydrocephalus patients in a real clinical scenario. The proposed method achieves much improved robustness and accuracy when segmenting all 17 consciousness-related ROIs, which vary highly across subjects. To the best of our knowledge, this is the first work to employ deep learning for solving the brain segmentation problem of hydrocephalus patients.



1 Introduction

Hydrocephalus is a condition in which cerebrospinal fluid (CSF) accumulates abnormally in cavities within the patient’s brain. It is related to the introduction of blood and proteins into the CSF during surgery, accidents, etc., and the resulting change in ventricular size is quite large when coregistered images are compared. Thus, it is desirable to identify the incidence of the phenomenon, and to correlate its presence or absence with clinical symptoms. In order to precisely evaluate the damage caused by hydrocephalus, it is important to identify and parcellate the consciousness-related regions of interest (ROIs) in brain MR images. However, manual segmentation of these ROIs is generally time-consuming and error-prone due to high inter- and intra-operator variability, especially in hydrocephalus patient data. Therefore, a fully automated method is essential for hydrocephalus patient brain segmentation.

Figure 1: Illustration of our proposed method: 1) the hard-attention module, which incorporates the atlas map from VoxelMorph-based MABS to increase the robustness of the model; 2) the soft-attention module, which decomposes the single segmentation task into several sub-tasks including coarse detection and fine segmentation. Moreover, we add a position-attention module to the encoder-decoder architecture to generate spatially long-range contextual information.

With the success of deep learning in medical imaging, supervised segmentation approaches built on 3D convolutional neural networks (CNNs) have produced accurate segmentation results at high speed. For example, Roy et al. [9] segmented white matter lesions from multi-contrast MR images with a CNN. Moeskops et al. [8] used an adversarial training approach to improve CNN-based brain MR image segmentation. Moreover, Ghafoorian et al. [6] trained a CNN on legacy brain MR images, and evaluated the performance of the domain-adapted network on the same task but with images from different domains.

On the other hand, multi-atlas-based segmentation (MABS) is also an important and effective method, especially before the era of deep learning. Deformable image registration plays a pivotal role in MABS; it aims at obtaining a deformation field that aligns the moving image with the fixed image in a topology-preserving way. Conventionally, many registration approaches [1, 10, 11] estimate the deformation field for the to-be-registered image pair by iterative optimization, which results in very slow computation. Recent deep-learning-based algorithms, such as VoxelMorph [2], turn to deep neural networks to estimate the deformation fields, and greatly reduce the algorithm runtime with GPU support.

Although the above-mentioned segmentation methods are effective for normal brain images, segmenting the images of hydrocephalus patients remains a challenging task due to the high variation of anatomical structures. Specifically, the pathology-induced deformations vary considerably between individuals, and traditional CNN architectures generalize poorly on distorted samples such as hydrocephalus data. Therefore, our goal is to solve the segmentation problem for hydrocephalus patient data with a CNN-based model, which is further described as follows.

In this paper, we propose a novel CNN-based method for automatic brain MR image segmentation. The main contributions are three-fold: (1) We deploy a hard-attention module to generate an atlas map covering crucial ROIs, which makes the model resistant to large variations of anatomical structures as well as to limited training samples. We also use VoxelMorph to significantly speed up the MABS algorithm, whose segmentation results serve as our attention maps. (2) Drawing on DANet [5], we develop a multi-level soft-attention module for better generalization on distorted samples, which splits the brain segmentation into two tasks, i.e., coarse detection and fine segmentation, according to anatomical knowledge. We also adopt the position self-attention mechanism to capture the shape variations between any two regions of the feature maps, which mitigates the poor generalization on distorted samples. (3) The proposed method achieves much improved robustness and accuracy on all 17 ROIs with large variations. To the best of our knowledge, this is the first work that employs deep learning for hydrocephalus patient brain segmentation.

2 Material and Method

2.1 Material

In this work, we collected data from 21 clinical hydrocephalus patients, each with MR images in the T1, FA and ADC modalities. Note that all these subjects have hematoma volume and hydrocephalus, which cause large deformations in brain anatomical structures. Specifically, our aim is to segment the consciousness-related brain regions on 3T MRI scans following [12], which are Insula-R (IR), Insula-L (IL), Thalamus-R (TR), Thalamus-L (TL), internal Capsule-R-Ant (ICRA), internal Capsule-R-Post (ICRP), internal Capsule-L-Ant (ICLA), internal Capsule-L-Post (ICLP), Cingulate-R-Ant (CRA), Cingulate-R-Post (CRP), Cingulate-L-Ant (CLA), Cingulate-L-Post (CLP), Medial prefrontal cortex-R (MCR), Medial prefrontal cortex-L (MCL), Inferior parietal lobule-L (IPL), Inferior parietal lobule-R (IPR) and Brainstem (B).

Note that we grouped the annotated images into five-fold cross-validation sets. Some preprocessing was also applied before feeding the data to the neural network. First, the images were resampled to an isotropic voxel size with trilinear interpolation. Then, the images were normalized (by dividing by the maximum intensity value in the foreground region) in order to improve convergence. We only used random flipping in three directions during training and did not use any augmentation during testing.
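The normalization and flipping steps described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' code: the function names, the foreground rule (voxels above a threshold of 0) and the flip probability of 0.5 are assumptions that the text does not specify.

```python
import numpy as np

def normalize_by_foreground_max(volume, foreground_threshold=0.0):
    """Divide a volume by the maximum intensity of its foreground region.

    Assumption: foreground means voxels with intensity > foreground_threshold.
    """
    volume = np.asarray(volume, dtype=np.float32)
    foreground = volume > foreground_threshold
    if not foreground.any():
        return volume.copy()
    return volume / volume[foreground].max()

def random_flip(volume, label, rng, p=0.5):
    """Flip image and label together along each of the three axes with probability p."""
    for axis in range(3):
        if rng.random() < p:
            volume = np.flip(volume, axis=axis)
            label = np.flip(label, axis=axis)
    return volume, label

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 800.0, size=(8, 8, 8)).astype(np.float32)  # toy volume
label = (image > 400.0).astype(np.int64)                            # toy label map

normalized = normalize_by_foreground_max(image)
flipped_image, flipped_label = random_flip(normalized, label, rng)
```

Flipping the image and its label with the same random decisions keeps the two aligned, which is the only property this sketch is meant to show.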

Our hydrocephalus brain segmentation method consists of two modules: the hard and soft attention modules. The whole pipeline of our method, including the proposed hard and soft attention fully convolutional network (FCN), is shown in Fig. 1. The two modules are detailed in Sections 2.2 and 2.3, respectively.

2.2 Hard attention module

The hard attention module is designed to generate attention maps as prior knowledge. In our implementation, the coarse segmentation results obtained with MABS are taken as hard attention maps, and we use VoxelMorph for more efficient and robust deformable registration. More specifically, we learn the parameters $\theta$ of a function $g_\theta(\cdot, \cdot)$ that generates a deformation field $\phi$ to warp a 3D moving volume $m$ to a fixed volume $f$. The loss function is calculated by measuring the negative similarity between the warped image $m \circ \phi$ and the fixed image $f$. During the test stage, given new images $f$ and $m$, the deformation field is obtained by evaluating $\phi = g_\theta(f, m)$, and the segmentation for $f$ is transferred by warping the label of $m$ with $\phi$. We obtain multiple segmentation results for $f$ from different moving (atlas) images $m$, and these transferred labels are then fused into a single consensus segmentation, which is taken as our hard attention map. Note that there is no data leakage in this module.
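The fusion of the transferred labels into one consensus segmentation can be illustrated with a per-voxel majority vote. The text does not state which fusion rule is used, so majority voting is an assumption here, and `majority_vote_fusion` is a hypothetical helper name.

```python
import numpy as np

def majority_vote_fusion(warped_labels, num_classes):
    """Fuse warped atlas label maps into one consensus map by per-voxel majority vote.

    warped_labels: array-like of shape (num_atlases, D, H, W) holding integer labels.
    Ties resolve to the smallest label index (behaviour of argmax).
    """
    warped_labels = np.asarray(warped_labels)
    # votes[c] counts, for every voxel, how many atlases assigned class c
    votes = np.stack(
        [(warped_labels == c).sum(axis=0) for c in range(num_classes)], axis=0
    )
    return votes.argmax(axis=0)

# Three toy "warped atlas" label maps; 18 classes = background + 17 ROIs
atlas_a = np.zeros((2, 2, 2), dtype=np.int64)
atlas_b = np.zeros((2, 2, 2), dtype=np.int64)
atlas_c = np.zeros((2, 2, 2), dtype=np.int64)
atlas_a[0, 0, 0] = 3
atlas_b[0, 0, 0] = 3
atlas_c[0, 0, 0] = 5   # outvoted by the two atlases that say 3
atlas_a[1, 1, 1] = 7   # outvoted by the two atlases that say background

consensus = majority_vote_fusion([atlas_a, atlas_b, atlas_c], num_classes=18)
```

Weighted or probabilistic fusion schemes would follow the same pattern, with the per-class counts replaced by weighted sums.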

Figure 2: Visualization of segmentation results for hydrocephalus patients.

2.3 Soft attention module

The multi-level soft-attention module consists of the following major processes. First, we split the segmentation of the ROIs shown in Fig. 1 into two stages based on anatomical knowledge. We merge all ROIs into a single foreground class and use the first classifier to localize this foreground, which reduces the difficulty of the task. The region of the input images cropped according to the first classifier's result is then used as the input of the second classifier for fine-grained segmentation of the 17 ROIs.
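The hand-off between the coarse and fine stages amounts to cropping the input around the detected foreground. A minimal sketch follows, assuming the crop is an axis-aligned bounding box with an optional safety margin; the margin and the helper name `foreground_bounding_box` are illustrative and not taken from the text.

```python
import numpy as np

def foreground_bounding_box(mask, margin=0):
    """Return slices for the bounding box of nonzero voxels, expanded by `margin`."""
    mask = np.asarray(mask).astype(bool)
    if not mask.any():
        # No foreground detected: fall back to the full volume
        return tuple(slice(0, size) for size in mask.shape)
    coords = np.argwhere(mask)
    lower = np.maximum(coords.min(axis=0) - margin, 0)
    upper = np.minimum(coords.max(axis=0) + 1 + margin, mask.shape)
    return tuple(slice(int(lo), int(hi)) for lo, hi in zip(lower, upper))

image = np.arange(6 * 6 * 6, dtype=np.float32).reshape(6, 6, 6)  # toy volume
coarse_foreground = np.zeros((6, 6, 6), dtype=bool)              # toy stage-one output
coarse_foreground[2:4, 1:5, 3:5] = True

box = foreground_bounding_box(coarse_foreground, margin=1)
cropped = image[box]   # would be passed to the second, fine-grained classifier
```

Keeping the slices rather than only the cropped array makes it straightforward to paste the fine segmentation back into the full volume.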

Second, we design a position attention module in the encoder-decoder FCN. We feed the encoder features into the position attention module, and generate new features carrying spatial long-range contextual information through the following three steps:

  1. Generate a spatial attention matrix which models the spatial relationship between any two pixels of the features;

  2. Perform a matrix multiplication between the attention matrix and the original features;

  3. Perform an element-wise sum of the resulting product and the original features to obtain the final representations reflecting long-range contexts.

Table 1: Comparisons with state-of-the-art methods and ablation studies (Dice, %)
  Ablation studies: Base + Hard; Base + Hard + Soft
  Comparison with the state-of-the-art methods: Unet [4]; Vnet [7]; PSP [14]; ENC [13]; DEEP [3]; Base + Hard + Soft

Table 2: Comparison of the proposed method with state-of-the-art methods

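As illustrated in Fig. 1, given a local feature $\mathbf{A} \in \mathbb{R}^{C \times H \times W}$, we first feed it into convolution layers to generate two new feature maps $\mathbf{B}$ and $\mathbf{C}$, respectively, where $\{\mathbf{B}, \mathbf{C}\} \in \mathbb{R}^{C \times H \times W}$. Then, we reshape them to $\mathbb{R}^{C \times N}$, where $N = H \times W$ is the number of pixels. After that, we perform a matrix multiplication between the transpose of $\mathbf{C}$ and $\mathbf{B}$, and apply a softmax layer to calculate the spatial attention map $\mathbf{S} \in \mathbb{R}^{N \times N}$. Meanwhile, we feed feature $\mathbf{A}$ into a convolution layer to generate a new feature map $\mathbf{D} \in \mathbb{R}^{C \times H \times W}$, and reshape it to $\mathbb{R}^{C \times N}$. We also perform a matrix multiplication between $\mathbf{D}$ and the transpose of $\mathbf{S}$, and reshape the result to $\mathbb{R}^{C \times H \times W}$. Finally, we conduct an element-wise sum operation with the features $\mathbf{A}$ to obtain the final output $\mathbf{E} \in \mathbb{R}^{C \times H \times W}$.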
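The position attention computation described above can be traced in a few lines of NumPy. This is a sketch under stated assumptions, not the authors' network. The convolution layers are stood in for by plain channel-mixing matrices `w_b`, `w_c` and `w_d` (hypothetical names, equivalent to 1x1 convolutions without bias), the feature map is 2D with shape (channels, height, width), and the softmax is taken over the positions being attended to.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feature, w_b, w_c, w_d):
    """feature: (C, H, W); w_b, w_c, w_d: (C, C) stand-ins for 1x1 convolutions."""
    channels, height, width = feature.shape
    flat = feature.reshape(channels, height * width)   # (C, N)
    b = w_b @ flat                                     # (C, N)
    c = w_c @ flat                                     # (C, N)
    d = w_d @ flat                                     # (C, N)
    # Step 1: attention[j, i] scores how strongly position j attends to position i
    attention = softmax(c.T @ b, axis=1)               # (N, N), rows sum to 1
    # Step 2: aggregate features from all positions for each position j
    aggregated = d @ attention.T                       # (C, N)
    # Step 3: element-wise sum with the original features
    output = (aggregated + flat).reshape(channels, height, width)
    return output, attention

rng = np.random.default_rng(0)
feature = rng.normal(size=(4, 3, 5))
w_b, w_c, w_d = (rng.normal(size=(4, 4)) for _ in range(3))
output, attention = position_attention(feature, w_b, w_c, w_d)
```

For 3D feature maps the same steps apply with the number of positions taken over all three spatial axes. Because the attention matrix grows quadratically with the number of positions, this kind of module is usually applied to downsampled encoder features.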

3 Experiments

In this section, we detail the results of our proposed automatic brain segmentation method, including the hard- and soft-attention modules, on the hydrocephalus dataset, and compare it with alternative methods. The network is trained and applied with a Titan X GPU on the TensorFlow and NiftyNet platforms. Regarding the hyper-parameters, the base learning rate is set to 0.0001. For multi-task learning, the learning rate decreases gradually. The momentum and weight decay are set to 0.9 and 0.0001, respectively. Note that we only use random flipping in three directions during training.

Table 1 shows the robustness of the hard attention module in our experiments. Generally, concatenating the atlas map (i.e., the MABS segmentation result) with the original volumes before feeding them to the FCN model achieves higher accuracy on the hydrocephalus patient dataset than the FCN model alone ("Base + Hard" vs. "Base"). According to this result, the atlas map provides prior knowledge for the segmentation task, which improves the segmentation accuracy and robustness.

To validate our framework with the multi-level soft-attention module, we also compare several different settings and report the results in Table 1. First, we consider the single task of segmentation only, whose Dice score is reported in Table 1. Second, with the network architecture validated in multi-task learning ("Base + Hard" vs. "Base + Hard + Soft"), we further verify the contribution of the proposed soft-attention module. The experimental results in the middle of Table 1 show that it outperforms the other settings. Therefore, we conclude that the proposed soft-attention module is beneficial to the segmentation task.

Finally, we compare our proposed method with other state-of-the-art algorithms, including Unet [4], Vnet [7], PSPnet [14], Encnet [13] and DeeplabV3 [3], in Table 1. The results show that the proposed method outperforms all the methods under comparison on the five-fold validation sets. We also provide a visual comparison of typical segmentation results (Unet vs. the proposed method) against the ground truth in Fig. 2. The labeling result of the region inside the yellow box shows that, with the integration of our proposed modules, the labeling accuracy and robustness are improved. Four patient cases in which Unet failed to localize the ROIs are successfully captured by our framework.

Furthermore, we compare our method with the state-of-the-art method for each ROI in Table 2. Our method performs better than the Unet solution in each of the ROIs, and thus obtains more accurate results on the hydrocephalus patient dataset.
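For reference, the per-ROI Dice score used in such comparisons is twice the overlap between the predicted and ground-truth masks, divided by the sum of their sizes. The sketch below shows one way to compute it per label; the function name and the convention of returning NaN when a label is absent from both maps are assumptions, not details given in the text.

```python
import numpy as np

def dice_per_label(prediction, ground_truth, labels):
    """Dice = 2 * |P ∩ G| / (|P| + |G|), computed separately for each label."""
    prediction = np.asarray(prediction)
    ground_truth = np.asarray(ground_truth)
    scores = {}
    for label in labels:
        pred_mask = prediction == label
        true_mask = ground_truth == label
        denominator = pred_mask.sum() + true_mask.sum()
        if denominator == 0:
            scores[label] = float("nan")  # label absent from both maps
        else:
            overlap = np.logical_and(pred_mask, true_mask).sum()
            scores[label] = 2.0 * overlap / denominator
    return scores

# Toy 2D example with two labels
ground_truth = np.array([[0, 1, 1],
                         [2, 2, 0]])
prediction = np.array([[0, 1, 0],
                       [2, 2, 2]])
scores = dice_per_label(prediction, ground_truth, labels=[1, 2])
# label 1: 2*1/(1+2) = 2/3, label 2: 2*2/(3+2) = 0.8
```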

4 Conclusion

We have proposed a novel and effective atlas-guided, multi-level soft-attention framework for semantic segmentation of brain MR images in a hydrocephalus patient dataset. Specifically, we proposed a hard-attention module that provides prior knowledge from the VoxelMorph-based MABS method to improve the robustness of the network. Moreover, we decomposed the very challenging semantic segmentation task into several sub-tasks that follow a coarse-to-fine segmentation mechanism. Finally, we added the position attention module to the FCN model to capture long-range contextual information. We have conducted comprehensive experiments on a large, custom medical image dataset. To the best of our knowledge, this is the first work on the hydrocephalus patient brain segmentation task.


References

  • [1] R. Bajcsy and S. Kovačič (1989) Multiresolution elastic matching. Computer Vision, Graphics, and Image Processing 46 (1), pp. 1–21.
  • [2] G. Balakrishnan, A. Zhao, M. R. Sabuncu, J. Guttag, and A. V. Dalca (2018) An unsupervised learning model for deformable medical image registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9252–9260.
  • [3] L. Chen, G. Papandreou, F. Schroff, and H. Adam (2017) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.
  • [4] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger (2016) 3D U-Net: learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 424–432.
  • [5] J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, and H. Lu (2019) Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3146–3154.
  • [6] M. Ghafoorian, A. Mehrtash, T. Kapur, N. Karssemeijer, E. Marchiori, M. Pesteie, C. R. Guttmann, F. de Leeuw, C. M. Tempany, B. van Ginneken, et al. (2017) Transfer learning for domain adaptation in MRI: application in brain lesion segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 516–524.
  • [7] F. Milletari, N. Navab, and S. Ahmadi (2016) V-Net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571.
  • [8] P. Moeskops, M. Veta, M. W. Lafarge, K. A. Eppenhof, and J. P. Pluim (2017) Adversarial training and dilated convolutions for brain MRI segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 56–64.
  • [9] S. Roy, J. A. Butman, D. S. Reich, P. A. Calabresi, and D. L. Pham (2018) Multiple sclerosis lesion segmentation from brain MRI via fully convolutional neural networks. arXiv preprint arXiv:1803.09172.
  • [10] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. Hill, M. O. Leach, and D. J. Hawkes (1999) Nonrigid registration using free-form deformations: application to breast MR images. IEEE Transactions on Medical Imaging 18 (8), pp. 712–721.
  • [11] J. Thirion (1998) Image matching as a diffusion process: an analogy with Maxwell’s demons. Medical Image Analysis 2 (3), pp. 243–260.
  • [12] X. Wu, J. Zhang, Z. Cui, W. Tang, C. Shao, J. Hu, J. Zhu, Y. Zhao, L. Lu, G. Chen, et al. (2018) White matter deficits underlying the impaired consciousness level in patients with disorders of consciousness. Neuroscience Bulletin 34 (4), pp. 668–678.
  • [13] H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang, A. Tyagi, and A. Agrawal (2018) Context encoding for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7151–7160.
  • [14] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia (2017) Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890.