Skull stripping is an important task in medical image processing: it not only helps protect patients’ brains during intraoperative navigation and radiotherapy but also provides visual information for surgical planning and clinically oriented diagnosis. Furthermore, skull stripping is a preprocessing step used to identify abnormalities such as tumors, lesions, and cancerous cells; the quality of the extracted brain therefore greatly affects the accuracy of abnormal tissue detection [fatima2020state]. However, skull stripping is also challenging, as the brain is considered the most complex organ in the human body. To achieve good skull stripping performance, it is necessary to obtain a stable and smooth brain isosurface.
Currently, brain surface extraction in clinical imaging is mostly conducted by experienced radiologists, who outline the brain boundary in each magnetic resonance imaging (MRI) slice. This manual routine is error-prone and time-consuming; therefore, semi-automatic end-to-end methods have emerged to improve both efficiency and fault tolerance. A considerable number of conventional skull stripping algorithms have been proposed and are widely used in brain MRI analysis, such as the Brain Extraction Tool (BET), Brain Surface Extractor (BSE), FreeSurfer, and the Hybrid Watershed Algorithm (HWA) [rehman2020conventional]. However, conventional skull stripping algorithms such as deformable-surface-based methods require additional steps and extra computation cost [wang2011robust, segonne2004hybrid].
As a result, recent studies have concentrated on applying deep learning methods to the skull stripping problem. Chen et al. proposed VoxResNet, a voxel-wise neural network architecture, to predict the label of each voxel and carefully designed the layers and connections to incorporate low-level image features and high-level semantics [chen2018voxresnet]. Afterwards, Hwang et al. validated that 3D U-Net, the most widely known encoder–decoder semantic segmentation network in the field of medical imaging, can achieve state-of-the-art performance in skull stripping [hwang20193d].
However, Geirhos et al. used quantitative experiments to show that convolutional neural networks place more emphasis on texture information than on shape information [geirhos2018imagenet], whereas brain surfaces have relatively stable positions and shapes. The aforementioned skull stripping methods predict whole brains without first learning shape information, which makes them poorly suited to brain surface reconstruction.
To better exploit the shape information of brain isosurfaces, we introduce the learning mechanism of signed distance fields (SDFs), a concept from geometry modeling, into the skull stripping backbone network for the first time. The magnitude of an SDF value gives the distance between a point and the closest boundary of the whole brain, and its sign indicates whether the point is inside the brain. Several works have validated that, compared with segmentation maps, SDFs contain more information, especially global shape information [ma2020distance, kervadec2019boundary, karimi2019reducing, navarro2019shape, wang2020deep]. However, SDFs have not yet been applied to brain surface reconstruction in the literature. In addition, we introduce a Laplacian loss that exploits the properties of SDFs to obtain a smoother, more continuous brain surface.
The main contributions of our work are as follows: (1) We apply 2D U-Net and 3D U-Net segmentation backbone networks to our brain MRI dataset and show that 2D U-Net performs better than 3D U-Net in the task of large organ surface extraction. (2) We add a regression head to the 2D U-Net backbone network to learn the information of SDFs and train this head jointly with the segmentation head. As a result, our new pipeline achieves better results in terms of the Hausdorff distance (HD) and average symmetric surface distance (ASSD). (3) We introduce a Laplacian loss that uses the properties of SDFs as an additional term in the loss function of the regression head and demonstrate that the new loss helps to reduce HD and to obtain smoother brain isosurfaces in three-dimensional (3D) scenarios.
We propose to add signed distance maps to the segmentation network for whole brain surface reconstruction. We first perform a two-dimensional distance transformation on the ground truth of each slice separately, and then train a 2D-U-Net-based convolutional neural network with two heads to predict binary segmentation maps and scaled regression distance maps. Finally, we use the marching cubes algorithm to generate 3D surfaces from the segmentation maps of each patient. The pipeline is shown in Fig. 1.
2.2 Signed Distance Fields Transform
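Given the ground truth of a brain surface $S$ defined for each brain slice and a point $x$, the mapping between $x$ and the SDF $\phi(x)$ is defined as
$$\phi(x) = \begin{cases} -\inf_{y \in \partial S} \lVert x - y \rVert_2, & x \in S_{in} \\ 0, & x \in \partial S \\ +\inf_{y \in \partial S} \lVert x - y \rVert_2, & x \in S_{out} \end{cases}$$
where $\partial S$ represents the boundary of each brain slice, $\inf_{y \in \partial S} \lVert x - y \rVert_2$ refers to the closest Euclidean distance between the point $x$ and the boundary (infimum), and the sign indicates whether the point is inside ($S_{in}$) or outside ($S_{out}$) of the brain slice. In addition, the zero-level set of points, $\{x : \phi(x) = 0\}$, can be taken as the isosurface of the brain. In our method, we adopted 2D pixel distance fields, which represent the distance to the nearest pixel boundary between inside and outside, to calculate SDFs.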
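As a rough illustration of the per-slice transform described above, the following sketch computes a 2D signed distance map from a binary mask with SciPy's Euclidean distance transform. The function names, the sign convention (negative inside, positive outside), and the handling of slices without any boundary are assumptions made for this example and are not taken from our implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def signed_distance_2d(mask: np.ndarray) -> np.ndarray:
    """Signed distance map of one binary slice.

    Assumed convention: negative inside the brain, positive outside.
    """
    mask = mask.astype(bool)
    if not mask.any() or mask.all():
        # Assumption: a slice without a boundary gets an all-zero map.
        return np.zeros(mask.shape, dtype=np.float32)
    # For outside pixels: distance to the nearest inside pixel.
    dist_outside = distance_transform_edt(~mask)
    # For inside pixels: distance to the nearest outside pixel.
    dist_inside = distance_transform_edt(mask)
    return (dist_outside - dist_inside).astype(np.float32)


def signed_distance_volume(volume: np.ndarray) -> np.ndarray:
    """Apply the 2D transform to every slice along axis 0 independently."""
    return np.stack([signed_distance_2d(s) for s in volume])
```

Because the distances are measured between pixel centers, pixels directly adjacent to the boundary take the values -1 and +1 in this sketch, and no pixel is exactly zero.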
2.3 Network Architecture
The network architecture is based on 2D U-Net, which employs an analysis path and a synthesis path to learn local and global features separately [ronneberger2015u]. To predict the segmentation maps and regression distance maps simultaneously, we added a regression head to the existing architecture and trained the network with the ground-truth masks and SDFs at the same time. Furthermore, we normalized the calculated SDFs to the range $[-1, 1]$ and employed a tanh-activated output layer in the regression head.
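The normalization step can be sketched as follows. This is only one possible scheme, assuming the target range is $[-1, 1]$ (the output range of tanh) and that each slice is divided by its largest absolute distance; the function name `normalize_sdf` and the per-slice scaling are illustrative assumptions.

```python
import numpy as np


def normalize_sdf(sdf, eps: float = 1e-8) -> np.ndarray:
    """Scale a signed distance map into [-1, 1] while preserving the sign.

    Assumption: the map is divided by its largest absolute value, so that
    the targets match the range of a tanh-activated output layer.
    """
    sdf = np.asarray(sdf, dtype=np.float32)
    scale = np.abs(sdf).max()
    # eps keeps the division defined for an all-zero map.
    return sdf / (scale + eps)
```

Dividing by a single positive scale keeps the zero-level set unchanged, so the boundary encoded by the normalized map is the same as that of the original SDF.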
2.4 Loss Function
We adopted different loss functions for segmentation and regression heads.
2.4.1 Segmentation Head
For the segmentation head, we adopted the binary cross-entropy loss and the dice loss. The binary cross-entropy loss is very common in binary classification tasks, but it may miss part of the high-level information because it only considers the loss pixel by pixel. Therefore, we also introduced the dice loss, which measures the overlap of the whole predicted region, when training the network.
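Let $p_i$ denote the predicted probability that the $i$-th point belongs to the whole brain and $y_i$ the corresponding ground truth. The combined loss function of the segmentation head can then be defined as
$$L_{seg} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right] + 1 - \frac{2\sum_{i=1}^{N} p_i y_i}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} y_i + \epsilon}$$
where $N$ is the number of points and $\epsilon$ is a small term introduced to avoid dividing by zero.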
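A minimal NumPy sketch of a combined binary cross-entropy and dice loss is given below for illustration. The equal weighting of the two terms, the soft dice form without squared terms, and the name `bce_dice_loss` are assumptions of this example rather than a description of our training code.

```python
import numpy as np


def bce_dice_loss(p, y, eps: float = 1e-6) -> float:
    """Binary cross-entropy plus soft dice loss for one slice.

    p: predicted foreground probabilities, y: binary ground truth.
    Assumption: both terms are weighted equally.
    """
    # Clip probabilities so that the logarithms stay finite.
    p = np.clip(np.asarray(p, dtype=np.float64).ravel(), eps, 1.0 - eps)
    y = np.asarray(y, dtype=np.float64).ravel()
    bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    # eps in the denominator avoids division by zero for empty slices.
    dice = 2.0 * np.sum(p * y) / (np.sum(p) + np.sum(y) + eps)
    return float(bce + (1.0 - dice))
```

The cross-entropy term is averaged pixel by pixel, whereas the dice term depends on the overlap of the whole predicted region, which is why the two are combined.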
| Model | Volumetric Dice | Surface Dice | HD | HD95 | ASSD |
Quantitative results: mean and standard deviation of the volumetric dice, surface dice, HD, HD95, and ASSD. a) 2D U-Net with, b) 3D U-Net with , c) Our method with , d) Our method with .
2.4.2 Regression Head
For the regression head, we adopted the $L_1$ loss as one part of the loss function, since the $L_1$ loss is robust to outliers [xue2020shape]. Furthermore, based on the property that SDFs should be smooth and continuous everywhere, we introduced a new loss term called the Laplacian loss [li2017laplacian].
Given the predicted SDF results and the SDFs’ ground truth , Laplacian loss can be expressed as
where refers to the discrete Laplacian filter and refers to the number of pixels in each brain slice.
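To make the idea concrete, the sketch below applies a discrete Laplacian filter to the predicted and ground-truth SDFs and averages the difference over the pixels of a slice. The 4-neighbour stencil, the edge padding, and the use of the mean absolute difference are assumptions of this example; the exact norm and filter may differ from those used in our experiments.

```python
import numpy as np


def laplacian(img: np.ndarray) -> np.ndarray:
    """Discrete 4-neighbour Laplacian of a 2D map (edge-replicated border)."""
    padded = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    return (
        padded[:-2, 1:-1]      # neighbour above
        + padded[2:, 1:-1]     # neighbour below
        + padded[1:-1, :-2]    # neighbour to the left
        + padded[1:-1, 2:]     # neighbour to the right
        - 4.0 * padded[1:-1, 1:-1]
    )


def laplacian_loss(pred_sdf: np.ndarray, true_sdf: np.ndarray) -> float:
    """Mean absolute difference between the Laplacians of two SDF maps."""
    diff = laplacian(pred_sdf) - laplacian(true_sdf)
    return float(np.mean(np.abs(diff)))
```

An isolated spike in the prediction produces a large Laplacian response, so this term penalizes exactly the kind of outlier that would otherwise inflate HD.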
2.4.3 Final Loss
When training the segmentation maps and regression distance maps simultaneously, the final loss is defined as
2.5 Post Processing
To generate the brain isosurface, we stacked the predicted segmentation maps of each patient and used marching cubes to complete the mesh reconstruction. We also tried applying marching cubes to the predicted SDFs but found that the resulting brain isosurfaces were slightly fuzzy, so we do not list those results in Section 3.
3 Experiments and Results
3.1 Dataset and Experimental Details
The dataset consisted of brain MRI slices from 111 patients. Medical doctors manually annotated the brain partitions in the slices, and we employed the results as the ground truth (gold standard).
We implemented four different models to validate our method. Details of the models are listed in Table 1.
We chose MRI slices from 66 and 22 patients for the training and validation phases (144–190 slices per patient), respectively, reserving the slices from the remaining 23 patients for the testing phase. All models were trained with the Adam optimizer with a decaying learning rate initialized at . The initial number of channels was 32, and the maximum number of channels after the last downsampling operator was 512. The input size of 2D U-Net was , while that of 3D U-Net was . The batch sizes of 2D U-Net and 3D U-Net were set to 8 and 1, respectively, owing to memory limitations. All models were trained for 50 epochs, and we employed four-fold cross-validation to choose the best model for each of the four experimental settings (highest dice score on the validation set). Moreover, we performed all evaluations on the testing set to derive the quantitative results. All experiments were performed on an NVIDIA GeForce GTX 1080 Ti GPU, and the code was developed in Keras.
3.2 Evaluation Metrics
We employed several evaluation metrics for whole-brain surface extraction in all experiments. Specifically, we used the volumetric dice score, surface dice score, HD, 95% HD (HD95), and ASSD of each patient as the evaluation metrics. The volumetric dice score computes the dice score of the whole brain between the predicted mask and the ground truth, while the surface dice score measures the overlap of two surfaces instead of two volumes. We set the tolerance of the surface dice score to 1. HD, HD95, and ASSD were used to measure the difference between two 3D representations of the same brain isosurface, which can be regarded as a shape-aware comparison.
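For reference, the metrics above can be approximated on voxel grids as in the following sketch. It is a simplified, voxel-based illustration: surfaces are taken as the mask voxels removed by one erosion step, distances are measured in voxel units unless `spacing` is given, and both masks are assumed to be non-empty. The function names are hypothetical, and this is not the evaluation code used for the reported results.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt


def dice_score(a: np.ndarray, b: np.ndarray) -> float:
    """Volumetric dice score of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return float(2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum()))


def surface_voxels(mask: np.ndarray) -> np.ndarray:
    """Mask voxels that disappear after one erosion step."""
    mask = mask.astype(bool)
    return mask & ~binary_erosion(mask)


def surface_distances(a, b, spacing=1.0) -> np.ndarray:
    """Distance from each surface voxel of a to the nearest surface voxel of b."""
    sa, sb = surface_voxels(a), surface_voxels(b)
    dist_to_b = distance_transform_edt(~sb, sampling=spacing)
    return dist_to_b[sa]


def shape_metrics(pred, gt, tol=1.0, spacing=1.0) -> dict:
    """HD, HD95, ASSD, and surface dice (at tolerance tol) of two masks."""
    d_pg = surface_distances(pred, gt, spacing)
    d_gp = surface_distances(gt, pred, spacing)
    total = d_pg.size + d_gp.size
    return {
        "HD": float(max(d_pg.max(), d_gp.max())),
        "HD95": float(max(np.percentile(d_pg, 95), np.percentile(d_gp, 95))),
        "ASSD": float((d_pg.sum() + d_gp.sum()) / total),
        "SurfaceDice": float(((d_pg <= tol).sum() + (d_gp <= tol).sum()) / total),
    }
```

HD takes the maximum over all surface distances and is therefore sensitive to single outliers, whereas HD95 and ASSD summarize the bulk of the distance distribution.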
3.3 Quantitative Results
Table 1 shows that training a U-Net-based convolutional neural network with an extra head learning from SDFs can achieve comparable results in terms of the volumetric and surface dice scores. We attribute this to the whole brain being a large organ that is not difficult to segment, even for the baseline. However, 3D U-Net does not perform well in the brain isosurface extraction task. We assume that this is because 3D U-Net has more trainable parameters and our dataset is not large enough to obtain a well-trained network. Moreover, after four downsampling operations, the intermediate feature maps can easily lose most or even all of the information along the first dimension.
In addition, we compared our proposed methods with the baseline networks (2D and 3D U-Net) on the shape-aware evaluation metrics (HD, HD95, and ASSD). The comparison demonstrates that our method outperforms the baselines on these metrics. The results indicate that adding a regression head to the baseline network is beneficial for learning shape information and that the Laplacian loss we introduce can reduce HD at the 3D level, which makes the method suitable for mesh reconstruction of organs.
3.4 Qualitative Results
In Fig. 2, we show the brain isosurface results of six patients obtained from 2D U-Net, 3D U-Net, and our method. We computed HD between the predicted brain isosurface and the ground truth and visualized it with a red-blue colormap (red: small HD, blue: large HD). We highlight selected regions with white boxes to show where our method improves the quality of the brain isosurfaces. Adding one extra head to learn the information of SDFs helps produce a clearer and more stable brain isosurface than the baseline networks, and our proposed Laplacian loss reduces HD at the 3D level (HD is sensitive to outliers). Moreover, since the ground truth is annotated manually in each slice separately, it may contain some isolated false positives in the 3D view; the last column shows that our method can still produce a good brain isosurface even in this case.
4 Discussion and Conclusion
To address the problem of brain isosurface reconstruction, we propose a shape-aware segmentation network that incorporates the information of SDFs. We observed that 2D U-Net performs better than 3D U-Net in the segmentation of large organs such as the brain. Furthermore, we showed that our method achieves volumetric and surface dice scores comparable with those of 2D U-Net. Moreover, our method decreases HD and ASSD when meshes are reconstructed from the predicted segmentation maps.
However, we are not taking full advantage of SDFs, which have been predicted by the regression head. Our future work will be to evaluate the predicted SDFs and perform the reconstruction directly from the predicted SDFs.
5 Compliance with Ethical Standards
All procedures involving human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki Declaration and its later amendments. Informed consent was obtained from all participants included in the study. This study was approved by the Research Ethics Committee of the University of Tokyo.
This work was supported by JST CREST Grant Number JPMJCR17A1 and AMED under Grant Number JP18he1602001, Japan. The authors would like to thank Zheyuan Cai, Ding Xia and Yang Zhou for their helpful comments.