Deep LOGISMOS: Deep Learning Graph-based 3D Segmentation of Pancreatic Tumors on CT scans

01/25/2018 · by Zhihui Guo, et al. · The University of Iowa; National Institutes of Health

This paper reports the Deep LOGISMOS approach to 3D tumor segmentation, which incorporates boundary information derived from deep contextual learning into LOGISMOS (layered optimal graph image segmentation of multiple objects and surfaces). Accurate and reliable tumor segmentation is essential for tumor growth analysis and treatment selection. A fully convolutional network (FCN), UNet, is first trained using three adjacent 2D patches centered at the tumor, providing a contextual UNet segmentation and probability map for each 2D patch. The UNet segmentation is then refined by a Gaussian Mixture Model (GMM) and morphological operations. The refined UNet segmentation provides the initial shape boundary used to build a segmentation graph. The cost for each node of the graph is determined from the UNet probability maps. Finally, a max-flow algorithm is employed to find the globally optimal solution, yielding the final segmentation. For evaluation, we applied the method to pancreatic tumor segmentation on a dataset of 51 CT scans, of which 30 were used for training and 21 for testing. With Deep LOGISMOS, the DICE Similarity Coefficient (DSC) and Relative Volume Difference (RVD) reached 83.2±7.8% and 18.6±17.4%, respectively, both significantly improved (p<0.05) compared with the contextual UNet and/or LOGISMOS alone.




1 Introduction

Monitoring the growth and spread of tumors at different time points helps physicians differentiate tumor types and plan the proper treatment [1]. To achieve this, accurate and reliable segmentation of tumors is of great importance.

Figure 1: Schematic diagram of Deep LOGISMOS method illustrated on an example of 3D pancreatic tumor segmentation.

A variety of methods have been proposed for medical image segmentation, among which deep learning has recently become prevalent and reached new levels of state-of-the-art accuracy in many tasks [2]. Medical imaging data such as CT and MRI are inherently 3D but can be visualized as stacks of 2D slices. Deep-learning-based segmentation methods can be divided into four categories according to how data are input to the network: convolutional neural networks (CNNs) with 2D convolutions [3]; CNNs with 3D convolutions [4]; combinations of a 2D CNN and a recurrent neural network (RNN) for 3D segmentation [2, 5]; and combinations of a 2D CNN and an optimization algorithm for 2D or 3D segmentation [6]. UNet [3] is a CNN with 2D convolutions that takes only intra-slice context into account, leaving out inter-slice context. 3D UNets [4] apply 3D convolutions to capture 3D spatial context but are computationally expensive. An alternative is a contextual UNet that stacks three adjacent slices in the three RGB channels to leverage inter-slice context. LSTM [7] is a type of RNN designed for sequential data that can be used to leverage spatial context between adjacent slices. Chen et al. [2] combined a modified 2D UNet and LSTM for 3D segmentation of neurons and fungi. Tseng et al. [5] applied a CNN and convolutional LSTM to multi-modality data and achieved 3D segmentation. In [6], an FCN and a graph-based method worked together, with the FCN providing the cost for a 2D graph. Since objects are 3D in nature, 3D spatial context is valuable for 3D segmentation.

LOGISMOS is a graph-based framework that translates geometric constraints of interacting surfaces and objects into graph arcs and likelihood of segmentation surface positioning into graph node/arc costs [8]. With LOGISMOS, the globally optimal N-dimensional solution satisfying defined smoothness constraints is obtained. The node-weighted LOGISMOS has been successfully used in difficult tasks such as 3D knee and brain segmentation [8, 9]. This graph segmentation framework is robust to image noise and weak boundaries but requires a proper initial segmentation as the shape prior to build the graph and to assign proper costs for each node reflecting its likelihood to occur on the desired segmentation surface. Defining the initial segmentation is not trivial and often requires manual intervention. Similarly, the graph costs are frequently derived from hand-crafted task-specific features and may not be generalizable to other problems.

We propose a method that combines UNet and LOGISMOS for 3D tumor segmentation. Specifically, we adopt UNet to integrate intra-slice and adjacent-slice contexts, and regulate the 3D shape by LOGISMOS. Different from our previous FCN+graph method [6] which uses FCN to locate the object center for graph construction and combines FCN-derived cost with hand-crafted costs, our method directly constructs the graph based on the UNet-derived object boundaries and assigns UNet-derived probabilities as costs.

Pancreatic cancer is a major health problem that shows a steady increase in incidence and death rate, with only a slight improvement in survival rates over the past five years [10]. To the best of our knowledge, this is the first approach for automated 3D segmentation of pancreatic tumors. The proposed method can be extended to other tumor segmentation tasks.

2 Methods

We present a method called Deep LOGISMOS that segments tumors in 3D by combining a contextual UNet with the graph-based LOGISMOS framework. The workflow is described in Fig. 1. First, the tumor ROI, defined as a cube (32×32×32 voxels) by a single click on its center point, is cropped from the whole image. For each 2D slice, the contextual UNet takes the slice and its two adjacent 2D slices as an input patch and outputs a probability map and segmentation. We apply a GMM to remove false positives. After that, morphological opening and closing are applied to retain only the largest region in the segmentation. The refined UNet segmentation serves as the initial segmentation for graph construction, and the UNet probability maps provide the costs for the graph nodes. The final segmentation is given by the globally optimal solution via a max-flow algorithm in graph search [11].

2.1 Contextual UNet

With its multi-scale architecture, UNet meets the needs of biomedical image segmentation and has achieved great success in various tasks [2]. We use the UNet described in [3] for end-to-end training in this study, with the modification that the lowest scale of feature maps is removed due to the small 2D image size (32×32). The input is three adjacent 2D slices, leveraging adjacent spatial context; we call this the contextual UNet. To increase the training sample size, data augmentations including translation, rotation, and scaling are applied to each sample. The initial learning rate is set to 1e-6 with a momentum optimizer. We train the UNet for around 30 epochs. We tested several batch sizes (1, 3, 10, 100) during verification; a batch size of 1 gave the best accuracy.
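The three-slice stacking for the contextual UNet input can be sketched as follows (a minimal NumPy sketch; the function name `contextual_input` and the edge handling, which repeats the boundary slice, are our own illustration, not code from the paper):

```python
import numpy as np

def contextual_input(volume, z):
    """Stack slice z with its two axial neighbors into a 3-channel
    patch, mirroring the RGB-style input of the contextual UNet.
    `volume` is a (D, H, W) array; edge slices are padded by
    repeating the boundary slice."""
    d = volume.shape[0]
    below = volume[max(z - 1, 0)]
    above = volume[min(z + 1, d - 1)]
    # channels-last (H, W, 3): previous, current, next slice
    return np.stack([below, volume[z], above], axis=-1)

vol = np.random.rand(32, 32, 32).astype(np.float32)
patch = contextual_input(vol, 0)  # shape (32, 32, 3)
```

Each 2D training sample thus carries inter-slice context at negligible extra cost compared with full 3D convolutions.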

2.2 Refinement

The UNet output needs further refinement for two reasons. First, the intensity distributions of tumors vary greatly across patients and contrast phases. Since the training set is small, the diverse intensity distributions may compromise the performance of the UNet. Second, the goal is to segment the center tumor inside the ROI; however, other tumors in the image may be detected by the UNet and should be excluded. We adopt a GMM with prior information about the relative intensity distributions of tumor and background to subtract background from the UNet segmentation. Afterwards, morphological opening and closing are applied to ensure that only the largest region in the center is retained.

For false-positive reduction, only pixels inside the ROI that are segmented as tumor by the UNet are considered. We fit two Gaussian distributions to all such pixel intensities with a GMM. GMM is a clustering method that applies maximum-likelihood estimation with Gaussian conditional distributions and is solved by the Expectation-Maximization algorithm. The motivation to fit two Gaussian distributions, N(μ_t, σ_t²) for tumor and N(μ_b, σ_b²) for background, is the prior information that pixels inside one tumor have relatively homogeneous intensities, which are higher than intensities in the background. Supposing μ_t is larger than μ_b, false-positive reduction by GMM is applied only when the two fitted components are sufficiently separated; in that case, pixels whose intensities fall under the background component are marked as background and their probabilities are set to 0. Otherwise, no false-positive reduction is applied. Then, two iterations of 3D morphological opening are applied and only the largest region is kept. Afterwards, 3D morphological closing is performed.
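A sketch of this refinement with scikit-learn and SciPy (the per-component background assignment below is our own simplification; the paper's exact separation criterion and threshold are not specified):

```python
import numpy as np
from scipy import ndimage
from sklearn.mixture import GaussianMixture

def refine(image, prob, threshold=0.5):
    """Refine a UNet probability volume: GMM false-positive
    suppression followed by morphological cleanup."""
    seg = prob > threshold
    vals = image[seg].reshape(-1, 1)
    if len(vals) >= 2:
        # fit two Gaussians to the intensities of UNet-positive pixels
        gmm = GaussianMixture(n_components=2, random_state=0).fit(vals)
        tumor = int(np.argmax(gmm.means_.ravel()))  # tumor = brighter component
        labels = gmm.predict(image.reshape(-1, 1)).reshape(image.shape)
        background = labels != tumor
        seg &= ~background
        prob = np.where(background, 0.0, prob)  # zero background probabilities
    # two iterations of 3D opening, keep the largest region, then closing
    seg = ndimage.binary_opening(seg, iterations=2)
    lbl, n = ndimage.label(seg)
    if n > 0:
        sizes = ndimage.sum(seg, lbl, index=range(1, n + 1))
        seg = lbl == (int(np.argmax(sizes)) + 1)
    seg = ndimage.binary_closing(seg)
    return seg, prob
```

The opening removes thin spurious attachments before the largest-component selection, and the closing fills small interior holes afterwards.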

2.3 LOGISMOS

There are two key factors that affect the performance of our graph-based method, namely the initial segmentation and the cost design. We take advantage of UNet to generate a reliable initial segmentation and assign costs from deep features.

2.3.1 Graph construction

The UNet segmentation after refinement can be regarded as a coarse initial segmentation. This initial segmentation contains image-specific shape information about the tumor on an unseen image, making it a better shape prior than a simple shape such as an ellipse or a mean shape model. Based on the boundary of the initial segmentation, a geometric node-weighted graph is established. A stack of graph nodes (called a column) is connected with intra-column arcs that ensure only one cut through the column. In addition, inter-column arcs encode the smoothness constraints. The columns are built starting in the normal directions of points on the boundary, following electric lines of force (ELF) [8] to avoid intersections. The column length is set to 50 nodes with a node spacing of 0.5 mm to cover the potential extent of the tumor.
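Column node placement can be sketched as below. This is a simplified stand-in for the paper's ELF-based columns, which additionally guarantee non-intersection; whether the columns extend symmetrically around the boundary or only outward is our assumption:

```python
import numpy as np

def build_columns(boundary_pts, normals, n_nodes=50, spacing=0.5):
    """Sample graph-column node positions along each boundary point's
    unit normal. boundary_pts and normals have shape (K, 3);
    the result columns[k, j] is the 3D position (in mm) of node j
    on column k, with the column centered on the boundary point."""
    offsets = (np.arange(n_nodes) - n_nodes // 2) * spacing  # mm
    return (boundary_pts[:, None, :]
            + offsets[None, :, None] * normals[:, None, :])
```

With 50 nodes at 0.5 mm spacing, each column spans roughly 25 mm, enough to cover plausible tumor boundary positions inside the 32-voxel ROI.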

2.3.2 Cost design

Contextual UNet outputs a probability map for each 2D slice. The probability is a region-based likelihood ranging from 0 to 1, with higher values indicating a higher chance of the pixel being inside the tumor. LOGISMOS requires the cost to reflect the likelihood of a node not lying on the boundary. To translate the region-based probabilities into a boundary-based cost, Eq. 1 assigns the cost of node j on column k from the summation of the probabilities of the interior nodes on the same column:

c(j,k) = −∑_{i=0}^{j} ( p(i,k) − 0.5 ),   (1)

where p(i,k) is the UNet probability at node i of column k. The −0.5 term corresponds to the probability threshold (0.5) used when generating the UNet segmentation.
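The region-to-boundary cost conversion of Eq. 1 amounts to a cumulative sum along each column (a sketch under our sign convention, where the minimum cost marks the most likely boundary node):

```python
import numpy as np

def column_costs(col_probs):
    """Convert per-node UNet probabilities along each graph column into
    boundary costs via the cumulative sum of (p - 0.5).
    `col_probs` has shape (n_columns, n_nodes), node 0 innermost;
    lower cost means the node is more likely on the boundary."""
    return -np.cumsum(col_probs - 0.5, axis=1)
```

Interior nodes (p > 0.5) keep decreasing the cost, exterior nodes (p < 0.5) increase it again, so the cost is minimized at the transition from tumor to background.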

2.3.3 Segmentation

The constructed graph integrates the shape prior, geometric smoothness constraints, and deep features, and guarantees a globally optimal, true 3D segmentation. The final segmentation is obtained by a max-flow algorithm [11] in polynomial time.
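As a toy illustration of the max-flow/min-cut primitive only (not the paper's full node-weighted graph construction), SciPy's integer-capacity solver can be used:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

# Toy directed s-t graph: node 0 is the source, node 3 the sink.
# scipy's solver requires integer edge capacities.
cap = np.array([
    [0, 3, 2, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
], dtype=np.int32)
result = maximum_flow(csr_matrix(cap), 0, 3)
print(result.flow_value)  # value of the maximum flow = capacity of the minimum cut
```

In LOGISMOS, the minimum cut through the column arcs selects exactly one node per column, yielding the optimal surface.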

3 Experimental Results

Deep LOGISMOS was applied to a dataset of 51 arterial-phase CT scans from 15 patients with pancreatic tumors studied at multiple time points; the patients were participating in a clinical trial. The CT scans have a resolution of 1×1×1 mm after resampling. A pancreatic tumor ROI of 32×32×32 voxels was extracted from each scan by a single click at the approximate tumor center. 30 tumor ROIs from 8 patients were used for training and 21 tumor ROIs from the other 7 patients for testing. We assessed the effect of three main aspects: the adjacent-slice context, the refinement, and the true 3D constraints of LOGISMOS. The evaluation metrics are DSC and RVD (relative volume difference). Statistical significance was estimated using a paired t-test with the significance level set at 0.05. The UNet is implemented on the Caffe platform [12] on an Nvidia TITAN X Pascal GPU with 12 GB of memory.
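The two evaluation metrics can be computed from binary masks as follows (a sketch; the paper does not state whether its RVD is signed, so the absolute value is our assumption):

```python
import numpy as np

def dsc(seg, gt):
    """Dice similarity coefficient between binary masks, in percent."""
    inter = np.logical_and(seg, gt).sum()
    return 200.0 * inter / (seg.sum() + gt.sum())

def rvd(seg, gt):
    """Relative volume difference in percent (absolute value)."""
    return 100.0 * abs(int(seg.sum()) - int(gt.sum())) / gt.sum()
```

DSC rewards spatial overlap, while RVD only compares volumes, so the two metrics capture complementary aspects of segmentation quality.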

3.1 Contextual UNet vs. 2D UNet

Besides training a contextual UNet, we also trained a 2D UNet on the same training set with all parameters identical. The performance of the two networks on the test set is presented in Table 1. The contextual UNet achieved significantly better segmentation accuracy than the 2D UNet in terms of RVD.

Methods           DSC (%)        RVD (%)
2D UNet           72.8 ± 22.0    42.5 ± 32.8
Contextual UNet   75.6 ± 16.6    35.7 ± 29.7
Table 1: Comparison of 2D UNet and contextual UNet.

3.2 Refinement

Next, we compared the refined UNet segmentation with the original UNet segmentation (Table 2). DSC and RVD indices demonstrated that segmentation with refinement was significantly better than that without refinement.

Methods              DSC (%)        RVD (%)
Without refinement   75.6 ± 16.6    35.7 ± 29.7
With refinement      81.6 ± 11.2    26.1 ± 20.9
Table 2: Performance of contextual UNet segmentation.

3.3 Deep LOGISMOS vs. Contextual UNet, LOGISMOS

Segmentation results from contextual UNet after refinement, LOGISMOS and Deep LOGISMOS are presented in Fig. 2.

Figure 2: Qualitative comparison of the three methods. The rows represent three adjacent slices from one tumor. Ground truth is marked as green regions. Yellow, red and blue contours are segmentations from the contextual UNet, deep LOGISMOS, and original LOGISMOS methods, respectively.
Methods           DSC (%)       RVD (%)
Contextual UNet   81.6 ± 11.2   26.1 ± 20.9
LOGISMOS          70.4 ± 27.7   35.4 ± 51.1
Deep LOGISMOS     83.2 ± 7.8    18.6 ± 17.4
Table 3: Comparison of segmentations from contextual UNet after refinement, LOGISMOS and Deep LOGISMOS.
Figure 3: Tumor segmentations in the same 3D view; (a), (b) and (c) represent segmentations from contextual UNet, Deep LOGISMOS and original LOGISMOS respectively. The tumor is the same as in Fig. 2.

The initial segmentation for the original LOGISMOS method was a sphere centered in the ROI with a radius of 8 mm (a quarter of the ROI size); its costs were derived from inverted gradients along the graph columns. Deep LOGISMOS segmentation performance was significantly better than that of the contextual UNet with refinement and of the original LOGISMOS for both the DSC and RVD metrics (Table 3). Note that the original LOGISMOS method completely failed to detect 3 tumors because they were too small; excluding these 3 tumors, LOGISMOS gave an average DSC of 78.7%. The segmentation results in a 3D view are shown in Fig. 3.

4 Conclusion

A hybrid approach combining a fully convolutional network (FCN) with the graph-based LOGISMOS framework was reported, and its performance was evaluated on a 3D pancreatic tumor segmentation task. This study demonstrated that 1) context information from adjacent slices significantly improved the performance of the UNet, and 2) our novel Deep LOGISMOS method achieved significantly better performance than the UNet and/or LOGISMOS methods alone.


  • [1] L. Zhang, L. Lu, R. M. Summers, E. Kebebew, and J. Yao, “Personalized pancreatic tumor growth prediction via group learning,” in MICCAI. Springer, 2017, pp. 422–432.
  • [2] J. Chen, L. Yang, Y. Zhang, M. Alber, and D. Chen, “Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation,” in NIPS, 2016, pp. 3036–3044.
  • [3] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in MICCAI. Springer, 2015, pp. 234–241.
  • [4] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-Net: learning dense volumetric segmentation from sparse annotation,” in MICCAI. Springer, 2016, pp. 424–432.
  • [5] K. Tseng, Y. Lin, W. Hsu, and C. Huang, “Joint sequence learning and cross-modality convolution for 3D biomedical segmentation,” in CVPR, July 2017.
  • [6] L. Zhang, M. Sonka, L. Lu, R. M. Summers, and J. Yao, “Combining fully convolutional networks and graph-based approach for automated segmentation of cervical cell nuclei,” in IEEE ISBI, 2017, pp. 406–409.
  • [7] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
  • [8] Y. Yin, X. Zhang, R. Williams, X. Wu, D. D. Anderson, and M. Sonka, “LOGISMOS-layered optimal graph image segmentation of multiple objects and surfaces: cartilage segmentation in the knee joint,” IEEE TMI, vol. 29, no. 12, pp. 2023–2037, 2010.
  • [9] Z. Guo, S. Kashyap, M. Sonka, and I. Oguz, “Machine learning in a graph framework for subcortical segmentation,” in SPIE Medical Imaging, 2017, pp. 101330H–101330H.
  • [10] R. Siegel, J. Ma, Z. Zou, and A. Jemal, “Cancer statistics, 2014,” CA Cancer J Clin., vol. 64, no. 1, pp. 9–29, 2014.
  • [11] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE TPAMI, vol. 26, no. 9, pp. 1124–1137, 2004.
  • [12] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in ACM Multimedia, 2014, pp. 675–678.