1 Introduction
Monitoring the growth and spread of tumors at different time points helps physicians differentiate tumor types and plan appropriate treatment [1]. Accurate and reliable tumor segmentation is therefore of great importance.
A variety of methods have been proposed for medical image segmentation. Among them, deep learning has recently become prevalent and has reached new levels of state-of-the-art accuracy in many tasks [2]. Medical imaging data such as CT and MRI are inherently 3D but can be visualized as stacks of 2D slices. Deep-learning-based segmentation methods can be divided into four categories according to how data are input to the network: convolutional neural networks (CNNs) with 2D convolutions [3]; CNNs with 3D convolutions [4]; combinations of 2D CNNs and recurrent neural networks (RNNs) for 3D segmentation [2, 5]; and combinations of 2D CNNs and optimization algorithms for 2D or 3D segmentation [6]. U-Net [3] is a CNN with 2D convolutions that takes only intra-slice context into account, leaving out inter-slice context. 3D U-Nets [4] apply 3D convolutions to capture 3D spatial context but are computationally expensive. An alternative, the contextual U-Net, stacks three adjacent slices in the three RGB input channels to leverage inter-slice context. LSTM [7] is a type of RNN designed for sequential data that can be used to leverage spatial context between adjacent slices. Chen et al. [2] combined a modified 2D U-Net and LSTM for 3D segmentation of neurons and fungi. Tseng et al. [5] applied a CNN and convolutional LSTM to multimodality data to achieve 3D segmentation. In [6], an FCN and a graph-based method worked together, with the FCN providing the cost for a 2D graph. Since objects are 3D in nature, 3D spatial context is valuable in 3D segmentation.
LOGISMOS is a graph-based framework that translates geometric constraints of interacting surfaces and objects into graph arcs, and the likelihood of segmentation surface positioning into graph node/arc costs [8]. With LOGISMOS, the globally optimal N-dimensional solution satisfying the defined smoothness constraints is obtained. Node-weighted LOGISMOS has been used successfully in difficult tasks such as 3D knee and brain segmentation [8, 9]. This graph segmentation framework is robust to image noise and weak boundaries, but it requires a proper initial segmentation as a shape prior from which to build the graph, and proper costs for each node reflecting its likelihood of lying on the desired segmentation surface. Defining the initial segmentation is not trivial and often requires manual intervention. Similarly, the graph costs are frequently derived from handcrafted, task-specific features and may not generalize to other problems.
We propose a method that combines U-Net and LOGISMOS for 3D tumor segmentation. Specifically, we adopt a U-Net to integrate intra-slice and adjacent-slice context, and regularize the 3D shape with LOGISMOS. Unlike our previous FCN+graph method [6], which uses an FCN to locate the object center for graph construction and combines FCN-derived costs with handcrafted costs, our method directly constructs the graph from U-Net-derived object boundaries and uses U-Net-derived probabilities as costs.
Pancreatic cancer is a major health problem, with steadily increasing incidence and death rates and only a slight improvement in survival rates over the past 5 years [10]. To the best of our knowledge, this is the first approach for automated 3D segmentation of pancreatic tumors. The proposed method can be extended to other tumor segmentation tasks.
2 Methods
We present a method called Deep LOGISMOS that segments tumors in 3D by combining a contextual U-Net with the graph-based framework LOGISMOS. The workflow is shown in Fig. 1. First, the tumor ROI, a cube of 32×32×32 voxels centered on a point selected with a single click, is cropped from the whole image. For each 2D slice, the contextual U-Net takes the slice together with its two adjacent slices as an input patch and outputs a probability map and a segmentation. A Gaussian mixture model (GMM) is then applied to remove false positives, after which morphological opening and closing are applied and only the largest region of the segmentation is retained. The refined U-Net segmentation serves as the initial segmentation from which the graph is built, and the U-Net probability maps are used to derive the node costs. The final segmentation is the globally optimal solution of the graph search, computed with a max-flow algorithm [11].
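The ROI extraction and the three-slice input described above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code: the function names `crop_roi` and `contextual_patches`, the (z, y, x) axis order, zero-padding at image borders, and replication of the edge slice as its own missing neighbour are assumptions of this example.

```python
import numpy as np

def crop_roi(volume, center, size=32):
    """Crop a size^3 cube around a clicked voxel (z, y, x), zero-padding at image borders."""
    half = size // 2
    padded = np.pad(volume, half, mode="constant")
    # Padding shifts every index by `half`, so the cube that starts at c - half in
    # original coordinates starts at index c in padded coordinates.
    z, y, x = (int(c) for c in center)
    return padded[z:z + size, y:y + size, x:x + size]

def contextual_patches(roi):
    """Stack every axial slice with its two neighbours as a 3-channel input.

    Returns shape (n_slices, 3, H, W). At the first and last slice the missing
    neighbour is replaced by the slice itself (assumption, not from the paper).
    """
    padded = np.pad(roi, ((1, 1), (0, 0), (0, 0)), mode="edge")
    # Channel 0: previous slice, channel 1: current slice, channel 2: next slice.
    return np.stack([padded[:-2], padded[1:-1], padded[2:]], axis=1)
```

Each of the resulting (3, 32, 32) patches would then be fed to the network, which predicts the centre (channel 1) slice.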
2.1 Contextual U-Net
With its multiscale architecture, U-Net meets the needs of biomedical image segmentation and has achieved great success in various tasks [2]. In this study, we use the U-Net described in [3] with end-to-end training, modified by removing the lowest scale of feature maps because of the small 2D image size (32×32). The input consists of three adjacent 2D slices, which leverages adjacent-slice spatial context; we therefore call the network a contextual U-Net. To increase the training sample size, data augmentation including translation, rotation, and scaling is applied to each sample. The initial learning rate is set to 1e-6 with a momentum optimizer, and the U-Net is trained for about 30 epochs. We tested several batch sizes (1, 3, 10, and 100) during validation, and a batch size of 1 gave the best accuracy.
2.2 Refinement
The U-Net output needs further refinement for two reasons. First, the intensity distributions of tumors vary greatly across patients and contrast phases. Since the training set is small, these diverse intensity distributions may compromise the performance of the U-Net. Second, the goal is to segment the tumor at the center of the ROI, but other tumors in the image may also be detected by the U-Net and should be excluded. We therefore use a GMM, together with prior knowledge of the relative intensity distributions of tumor and background, to remove background from the U-Net segmentation. Afterwards, morphological opening and closing are applied to ensure that only the largest region in the center is retained.
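The refinement steps of this subsection (GMM-based false positive reduction, followed by opening, largest-region selection, and closing, as detailed in the next paragraph) can be sketched with NumPy and SciPy. The two-component GMM is fitted here with a plain Expectation-Maximization loop. The paper's exact condition for applying false positive reduction and its intensity threshold are not given here, so the `min_separation` test and the midpoint threshold below are hypothetical placeholders; the function names are also illustrative.

```python
import numpy as np
from scipy import ndimage

def fit_two_gaussians(x, n_iter=200, tol=1e-8):
    """Two-component 1D Gaussian mixture fitted by Expectation-Maximization.

    Returns (means, variances, weights), sorted so that index 0 is the brighter component.
    """
    x = np.asarray(x, dtype=float).ravel()
    mu = np.percentile(x, [5, 95])            # well-separated initial means
    var = np.full(2, x.var() + 1e-6)
    w = np.full(2, 0.5)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each intensity.
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        total = dens.sum(axis=1, keepdims=True) + 1e-300
        resp = dens / total
        ll = np.log(total).sum()
        # M-step: re-estimate mixture weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = np.argsort(mu)[::-1]
    return mu[order], var[order], w[order]

def refine(image, prob, min_separation=2.0):
    """GMM false positive reduction, 2-iteration opening, largest component, closing."""
    mask = prob >= 0.5
    prob = prob.copy()
    if mask.sum() >= 2:
        (mu_t, mu_b), (_, var_b), _ = fit_two_gaussians(image[mask])
        # HYPOTHETICAL condition and threshold (placeholders for the paper's own).
        if mu_t - mu_b > min_separation * np.sqrt(var_b):
            background = mask & (image < 0.5 * (mu_t + mu_b))
            prob[background] = 0.0
            mask &= ~background
    mask = ndimage.binary_opening(mask, iterations=2)
    labels, n = ndimage.label(mask)
    if n > 0:
        sizes = np.bincount(labels.ravel())[1:]   # voxel count of each component
        mask = labels == (np.argmax(sizes) + 1)
    mask = ndimage.binary_closing(mask)
    return mask, prob
```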
For false positive reduction, only pixels inside the ROI that are segmented as tumor by the U-Net are considered. Two Gaussian distributions are fitted to all of their intensities with a GMM, a clustering method that performs maximum likelihood estimation with Gaussian conditional distributions and is solved by the Expectation-Maximization algorithm. The motivation for fitting two Gaussian distributions, $\mathcal{N}(\mu_1, \sigma_1^2)$ for tumor and $\mathcal{N}(\mu_2, \sigma_2^2)$ for background, is the prior knowledge that pixels inside one tumor have relatively homogeneous intensities that are higher than those of the background. Suppose $\mu_1$ is larger than $\mu_2$; the condition for applying the GMM-based false positive reduction is … . If the condition is satisfied, pixels with intensities less than … are marked as background and their probabilities are set to 0. Otherwise, no false positive reduction is applied. Then, two iterations of 3D morphological opening are applied and only the largest region is kept. Afterwards, 3D morphological closing is performed.
2.3 LOGISMOS
Two key factors affect the performance of our graph-based method: the initial segmentation and the cost design. We use the U-Net both to generate a reliable initial segmentation and to derive the costs from deep features.
2.3.1 Graph construction
The refined U-Net segmentation can be regarded as a coarse initial segmentation. It contains image-specific shape information of the tumor in an unseen image, which makes it a better shape prior than a simple shape such as an ellipse or a mean shape model. A geometric node-weighted graph is built from the boundary of this initial segmentation. Each stack of graph nodes (called a column) is connected by intra-column arcs that ensure the segmentation surface cuts the column exactly once, while inter-column arcs encode the smoothness constraints. To avoid intersections, the columns start at boundary points along their normal directions and follow electric lines of force (ELF) [8]. The length of the columns is set to 50 with a node spacing of 0.5 mm to cover the potential extent of the tumor.
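As a simplified illustration of column placement, the sketch below puts nodes on straight columns along the boundary normals. It is not the ELF construction of [8]: straight normals are an assumption of this example, as are reading "length 50" as 50 nodes and centring each column on its boundary point (nodes run from inside to outside when the normals point outward). The name `build_columns` is hypothetical.

```python
import numpy as np

def build_columns(points, normals, n_nodes=50, spacing=0.5):
    """Place graph nodes along straight columns through boundary points.

    points, normals: (K, 3) arrays in mm. Each column is centred on its boundary
    point and follows the normalised normal in both directions.
    Returns node positions with shape (K, n_nodes, 3).
    """
    points = np.asarray(points, dtype=float)
    normals = np.asarray(normals, dtype=float)
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    # Signed offsets along the normal: -12.5 mm ... +12.0 mm for 50 nodes at 0.5 mm.
    offsets = (np.arange(n_nodes) - n_nodes // 2) * spacing
    return points[:, None, :] + offsets[None, :, None] * normals[:, None, :]
```

With straight columns, neighbouring columns can cross where the surface is strongly curved; avoiding such intersections is the reason ELF-based columns are used.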
2.3.2 Cost design
The contextual U-Net outputs a probability map for each 2D slice. This probability is a region-based likelihood ranging from 0 to 1, with higher values indicating a higher chance that the pixel lies inside the tumor. LOGISMOS, however, requires the cost to represent the likelihood of a node not being on the boundary. To translate the region-based probabilities into boundary-based costs, Eq. 1 defines the cost of node j on column k from the sum of the probabilities of the interior nodes on the same column. The 0.5 term corresponds to the probability threshold (0.5) used to generate the U-Net segmentation.
(1) 
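The sketch below shows one cost that is consistent with the description above; it is an assumption and not necessarily the exact form of Eq. 1. Assuming the U-Net probabilities $p_{k,i}$ have already been sampled at the column nodes, ordered from the innermost node ($i = 0$) outwards, it uses $c_{k,j} = \sum_{i=0}^{j} (0.5 - p_{k,i})$: nodes with $p > 0.5$ lower the cost as they are counted as interior, nodes with $p < 0.5$ raise it, so the minimum along a column falls where the probability crosses the 0.5 threshold. The name `column_costs` is hypothetical.

```python
import numpy as np

def column_costs(prob):
    """Boundary costs from region probabilities sampled along the columns.

    prob: (K, J) probabilities, column k, node j ordered from inside to outside.
    The cost of placing the surface at node j accumulates (0.5 - p) over the
    nodes 0..j that would then be labelled interior (assumed form of Eq. 1).
    """
    return np.cumsum(0.5 - np.asarray(prob, dtype=float), axis=1)
```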
2.3.3 Segmentation
The constructed graph integrates the shape prior, geometric smoothness constraints, and deep features, and yields a globally optimal, truly 3D segmentation. The final segmentation is obtained with a max-flow algorithm [11] in polynomial time.
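A full max-flow construction is beyond a short example, but the kind of optimization it performs can be illustrated on a reduced case. As a swapped-in technique, the sketch below uses dynamic programming (not max-flow) to find the exact minimum-cost surface when the columns form an open 1D chain with a hard smoothness constraint `max_jump` between neighbouring columns; the name and the chain topology are assumptions of this example, and closed 3D surfaces still require the graph and min-cut formulation.

```python
import numpy as np

def optimal_chain_surface(cost, max_jump=1):
    """Exact minimum-cost surface for columns arranged in an open chain.

    Minimises sum_k cost[k, j_k] subject to |j_k - j_(k+1)| <= max_jump.
    Returns (node index per column, total cost).
    """
    cost = np.asarray(cost, dtype=float)
    K, J = cost.shape
    acc = cost[0].copy()                       # best total cost ending at each node
    back = np.zeros((K, J), dtype=int)         # best predecessor node per column
    for k in range(1, K):
        new = np.empty(J)
        for j in range(J):
            lo, hi = max(0, j - max_jump), min(J, j + max_jump + 1)
            best = lo + int(np.argmin(acc[lo:hi]))
            back[k, j] = best
            new[j] = acc[best] + cost[k, j]
        acc = new
    # Trace the optimal path back from the cheapest node of the last column.
    path = np.empty(K, dtype=int)
    path[-1] = int(np.argmin(acc))
    for k in range(K - 1, 0, -1):
        path[k - 1] = back[k, path[k]]
    return path, float(acc.min())
```

With a tight `max_jump`, an isolated strong response on one column can only be reached if its neighbours move with it, which is how the smoothness constraint suppresses implausible jagged surfaces.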
3 Experimental Results
Deep LOGISMOS was applied to a dataset of 51 arterial-phase CT scans from 15 patients with pancreatic tumors who participated in a clinical trial and were imaged at multiple time points. After resampling, the CT scans have a resolution of 1×1×1 mm³. A pancreatic tumor ROI of 32×32×32 voxels was extracted from each scan by a single click at the approximate tumor center. Thirty tumor ROIs from 8 patients were used for training, and 21 tumor ROIs from the other 7 patients were used for testing. We assessed the effect of three main aspects: adjacent-slice context, refinement, and the true 3D constraints of LOGISMOS. The evaluation metrics were the Dice similarity coefficient (DSC) and the relative volume difference (RVD). Statistical significance was assessed using a paired t-test with a significance level of 0.05. The U-Net was implemented using the Caffe platform [12] on an NVIDIA TITAN X (Pascal) GPU with 12 GB of memory.
3.1 Contextual U-Net vs. 2D U-Net
In addition to the contextual U-Net, we trained a 2D U-Net on the same training set with identical parameters. The performance of the two networks on the test set is presented in Table 1. The contextual U-Net achieved significantly better segmentation accuracy than the 2D U-Net in terms of RVD.
Table 1. Contextual U-Net vs. 2D U-Net on the test set.
Method              DSC (%)        RVD (%)
2D U-Net            72.8 ± 22.0    42.5 ± 32.8
Contextual U-Net    75.6 ± 16.6    35.7 ± 29.7
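The DSC and RVD values reported in the tables, and the paired t-test used to compare methods, could be computed per tumor as sketched below. The helper names are hypothetical, RVD is assumed to be the absolute difference $|V_{seg} - V_{gt}| / V_{gt}$ (the reported values are all positive), and returning 100% DSC when both volumes are empty is a convention of this sketch.

```python
import numpy as np
from scipy import stats

def dsc(seg, gt):
    """Dice similarity coefficient (%) between two binary volumes."""
    seg, gt = np.asarray(seg, bool), np.asarray(gt, bool)
    denom = seg.sum() + gt.sum()
    # Convention for this sketch: two empty volumes agree perfectly.
    return 100.0 * 2.0 * np.logical_and(seg, gt).sum() / denom if denom else 100.0

def rvd(seg, gt):
    """Relative volume difference (%), assumed to be |V_seg - V_gt| / V_gt."""
    v_seg, v_gt = np.count_nonzero(seg), np.count_nonzero(gt)
    return 100.0 * abs(v_seg - v_gt) / v_gt

def compare(scores_a, scores_b, alpha=0.05):
    """Paired t-test between two methods evaluated on the same test tumors."""
    t, p = stats.ttest_rel(scores_a, scores_b)
    return t, p, p < alpha
```

A paired test is appropriate here because every method is evaluated on the same 21 test ROIs, so per-tumor differences rather than two independent samples are compared.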
3.2 Refinement
Next, we compared the refined U-Net segmentation with the original U-Net segmentation (Table 2). Both DSC and RVD showed that segmentation with refinement was significantly better than segmentation without refinement.
Table 2. Contextual U-Net without and with refinement.
Method                DSC (%)        RVD (%)
Without refinement    75.6 ± 16.6    35.7 ± 29.7
With refinement       81.6 ± 11.2    26.1 ± 20.9
3.3 Deep LOGISMOS vs. Contextual U-Net and LOGISMOS
Segmentation results of the contextual U-Net after refinement, LOGISMOS, and Deep LOGISMOS are presented in Fig. 2.
Table 3. Contextual U-Net (with refinement), LOGISMOS, and Deep LOGISMOS.
Method              DSC (%)        RVD (%)
Contextual U-Net    81.6 ± 11.2    26.1 ± 20.9
LOGISMOS            70.4 ± 27.7    35.4 ± 51.1
Deep LOGISMOS       83.2 ± 7.8     18.6 ± 17.4
For the original LOGISMOS method, the initial segmentation was a sphere of radius 8 mm (a quarter of the ROI size) centered in the ROI, and the costs were derived from inverted gradients along the graph columns. Deep LOGISMOS performed significantly better than both the contextual U-Net with refinement and the original LOGISMOS with respect to DSC and/or RVD (Table 3). Note that the original LOGISMOS method failed to detect three tumors altogether because they were too small. Excluding these three tumors, LOGISMOS achieved an average DSC of 78.7%. 3D views of the segmentation results are shown in Fig. 3.
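The baseline initialization and costs described above might be set up as follows. This is a sketch under stated assumptions: isotropic 1 mm voxels (as after resampling), a sphere centred in the ROI, and "inverted gradient" interpreted as the negated absolute intensity derivative along each column (low cost where the edge is strong). The function names are hypothetical.

```python
import numpy as np

def sphere_mask(size=32, radius=8.0, spacing=1.0):
    """Binary sphere of `radius` mm centred in a size^3 ROI with isotropic voxels."""
    c = (size - 1) / 2.0
    z, y, x = np.ogrid[:size, :size, :size]
    return ((z - c) ** 2 + (y - c) ** 2 + (x - c) ** 2) * spacing ** 2 <= radius ** 2

def inverted_gradient_costs(intensities):
    """Node costs from intensity profiles sampled along the columns.

    intensities: (K, J) profiles ordered from inside to outside. The cost is the
    negated absolute derivative along each column, so strong edges get low cost.
    """
    return -np.abs(np.gradient(np.asarray(intensities, dtype=float), axis=1))
```

Such gradient costs respond to any strong edge along a column, which helps explain why small tumors, whose boundary may lie far from an 8 mm sphere, can be missed entirely.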
4 Conclusion
A hybrid method combining a fully convolutional network (FCN) with the graph-based LOGISMOS approach was reported, and its performance was evaluated on a 3D pancreatic tumor segmentation task. This study demonstrated that 1) context information from adjacent slices significantly improved the performance of a U-Net, and 2) our novel Deep LOGISMOS method achieved significantly better performance than the U-Net or LOGISMOS methods alone.
References
[1] L. Zhang, L. Lu, R. M. Summers, E. Kebebew, and J. Yao, “Personalized pancreatic tumor growth prediction via group learning,” in MICCAI. Springer, 2017, pp. 422–432.
[2] J. Chen, L. Yang, Y. Zhang, M. Alber, and D. Chen, “Combining fully convolutional and recurrent neural networks for 3D biomedical image segmentation,” in NIPS, 2016, pp. 3036–3044.
[3] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in MICCAI. Springer, 2015, pp. 234–241.
[4] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-Net: learning dense volumetric segmentation from sparse annotation,” in MICCAI. Springer, 2016, pp. 424–432.
[5] K. Tseng, Y. Lin, W. Hsu, and C. Huang, “Joint sequence learning and cross-modality convolution for 3D biomedical segmentation,” in CVPR, July 2017.
[6] L. Zhang, M. Sonka, L. Lu, R. M. Summers, and J. Yao, “Combining fully convolutional networks and graph-based approach for automated segmentation of cervical cell nuclei,” in IEEE ISBI, 2017, pp. 406–409.
[7] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[8] Y. Yin, X. Zhang, R. Williams, X. Wu, D. D. Anderson, and M. Sonka, “LOGISMOS-layered optimal graph image segmentation of multiple objects and surfaces: cartilage segmentation in the knee joint,” IEEE TMI, vol. 29, no. 12, pp. 2023–2037, 2010.
[9] Z. Guo, S. Kashyap, M. Sonka, and I. Oguz, “Machine learning in a graph framework for subcortical segmentation,” in SPIE Medical Imaging, 2017, pp. 101330H–101330H.
[10] R. Siegel, J. Ma, Z. Zou, and A. Jemal, “Cancer statistics, 2014,” CA Cancer J. Clin., vol. 64, no. 1, pp. 9–29, 2014.
[11] Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE TPAMI, vol. 26, no. 9, pp. 1124–1137, 2004.
[12] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in ACM Multimedia, 2014, pp. 675–678.