1 Introduction
Basic cues like smooth boundaries and appearance models are often insufficient to regularize complex segmentation problems. This is particularly true in medical applications where objects have weak contrast boundaries and overlapping appearances. Thus, additional priors are needed, e.g. shapepriors [28], volumetric constraints [3], or segments interaction [30, 8]. The latter constraint is the essence of Hierarchicallystructured Interacting Segments (HINTS) model^{1}^{1}1HINTS model was proposed by [8] by was not given a name. [8, 30], which was successfully applied to many segmentation problems, e.g. cells [22], joint cartilage [30], cortical [23] or tubular [20] surfaces.
HINTS model overview:
Any hierarchicallystructured^{2}^{2}2We use hierarchicallystructured and partially ordered interchangeably. segments could be represented as a label tree , see Fig. 1. Tree defines topological relationship between segments as follows; (a) childparent relation means the child segment is inside its parent’s segment, (b) sibling relation means the corresponding segments exclude each other. For example, in Fig. 1 is inside , is inside , while , and exclude each other in . Minmargin is one form of interaction between regions. If region has minmargin then its outside boundary pushes away the outside boundary of its parent and siblings by at least , Fig.1(b).
(a) tree & margins  (b) a feasible segmentation 
ground truth  aexp[7, 8]  QPBO [25, 8]  ours 
Limitations of previous algorithms:
We extend [8], which introduced HINTS for arbitrary trees. In [8] aexpansion (aexp) [7] was used to optimize the multilabel formulation of HINTS, but it often results in bad local minima due to complexities of interaction constraints, e.g. Fig.2. The contribution of [8] is a binary multilayered HINTS formulation. They use highorder data terms, which are not easy to convert into unary and pairwise potentials for arbitrary trees. Their algorithm’s global optimality guarantee depends on the tree at hand. Only trees that do not yield frustrated cycles [25] have this guarantee, but this is not immediately obvious for any given tree. In [8], nonsubmodular binary energy implied by frustrated cycles were addressed by QPBO [25]. In practice, QPBO produces only partial solutions for most trees, see Figs. 2, 15 and 17.
As an alternative to QPBO, [27] formulated HINTS as constraint optimization. They solve the Lagrangian dual of this NPhard problem using an iterative sequence of graph cuts. However, the duality gap may be arbitrarily large and the optimum for HINTS is not guaranteed. Their supergradient optimization of Lagrange multiplier guesses initial solutions and timestep parameters. Also, according to Lemma 5 in [27] their supergradient corresponds to the hard exclusion constraint of HINTS, which is valued. They do not discuss how this affects the algorithm.
In [30] the authors generalize their earlier method [20] for segmenting multiple nested surfaces, i.e. is a chain. In [30] the aim was to segment multiple mutually exclusive objects each with a set of nested surfaces, i.e. is a spider^{3}^{3}3Tree with one node of degree and all others with degree .. But, the proposed approach can handle only a single pair of mutually exclusive objects in a given image region. As such [30]
requires a prior knowledge of the region of interaction for two excluded objects, or computes it using a problem specific trained classifier
[30]. Such prior knowledge is not required for our method. In contrast to our approach, [30] requires a sufficiently close initial segmentation satisfying interaction constraints. In all of our experiments we started from a trivial solution. Unlike our approach, [30] implicitly imposes a star like shape prior [28] and use nonhomogeneous anisotropic polar grids.If interactivity (minmargin) constraints are dropped HINTS degenerates to treemetric labeling. Certain treemetric labeling problems are addressed in [10] using DP to find the global optima if the data terms are also a treemetric. Recently, [1] used convex relaxation to approximate labeling problems where labels are leafs of a DAG. This problem can be reduced^{4}^{4}4Personal communication with the authors of [1]. to general metric labeling [17]. Such labeling problems are significantly different from HINTS due to interactions between segments.
Motivation for PathMoves:
In the context of multilabel HINTS formulation, we propose an effective movemaking algorithm applicable to arbitrary label trees avoiding limitations of the previous optimization methods.
In contrast to aexp [7], our PathMoves are nonbinary: when expanding label any pixel can change its current label to any label along the path connecting its current label and in the tree. Optimization uses our generalization of the wellknown multilayered Ishikawa technique [15] for convex potentials over strictly ordered labels. In essence, PathMoves combine aexp and Ishikawa. In the special case of a chaintree our algorithm reduces to Ishikawalike construction in [8, 15] finding global minimum in one step. On the contrary, when is a singlelevel star our algorithm reduces to aexp. Note that closely related multilabel rangemoves [29] also combine aexp and Ishikawa for nonconvex pairwise potentials over strictly ordered labels (a chain). In contrast to PathMoves, in rangemoves all pixels have the same set of feasible labels.
Our contributions are summarized below:

we show stateoftheart biomedical segmentation results for complex trees.
2 Hierarchicallystructured Interacting Segments
Given pixel set , neighborhood system , and labels (regions) the HINTS model can be formulated as
(1) 
where is a label assigned to and is a labeling of all pixels.
The data and smoothness terms are widely used in segmentation, e.g. [6, 4]. Data term ) is the cost incurred when pixel is assigned to label . Usually, is negative log likelihood of the label’s probabilistic model, which could be fitted using scribbles [4, 24] or known a priori.
The smoothness term regularizes segmentation discontinuities. A discontinuity occurs when two neighboring pixels are assigned to different labels. Parameter weights the importance of the smoothness term. The most commonly used smoothness potential is Potts model [7].We use treemetric smoothness [10, 11] which is more true to the physical structure of the labels in some settings, especially medical segmentation as we explain shortly.
A function is treemetric if there exists a tree with nonnegative edge weights and is equal to the sum of edge weights along the unique path between nodes and in the tree. In our setting the label tree is such a tree and is completely defined by assigning nonnegative weights to every edge in . Thus, for any , in
(2) 
where is the set of ordered labels on the path between and in the undirected tree . The summation in (2) is between pairs of neighboring labels on path
To motivate treemetric smoothness consider in Fig.1(a). This tree implies that regions , and are nested. For example, , and could be background, cell and nucleus, respectively. In the physical world boundaries of nested regions never merge into a single boundary. That is, if in the image we observe a boundary between and this corresponds to two boundaries, namely and in the physical world. Therefore, the boundary cost should be the sum of and boundary costs. The summation property of nested boundaries can be modeled as treemetric smoothness. In contrast, Potts model penalizes multiple nested boundaries as a single boundary.
The interaction term in (1) ensures that the minmargin constraints are satisfied at every pixel, see Fig. 3,
(3) 
where is an infinitely large scalar, are the nodes of the subtree rooted at , is ’s parent in , and is the Iverson bracket. This term guarantees that any labeling that violates minmargin constraint has infinite energy.
(a) tree  (b) minmargin constraint at pixel 
In general, the interaction term could model not only minmargin but also region attraction [8], scene parsing [21, 8]
, or a combination of these constraints. However, the focus of this paper is developing an effective combinatorial optimization move for energy (
1). Thus, for simplicity of exposition we only cover minmargins.We now compare our formulation to that in [8]. Inclusion is an easy constraint to impose in both formulations as it reduces to using treemetric smoothness. In our formulation exclusion is satisfied by definition because we use multilabel formulation and each pixel is assigned to only one label. In contrast, in [8]
the label of a pixel is represented by several binary variables. Therefore,
[8] needs to explicitly enforce exclusion to maintain the validity of these binary variables w.r.t. tree . Often this leads to nonsubmodular terms that are difficult to optimize.(a) tree & margins  (b) current labeling 
(c) largest expansion [7] on  (d) largest PathMove on 
3 Optimization
In Section 3.1 we introduce our PathMove algorithm and in Section 3.2 we show which interaction constraints PathMove could optimize. The authors in [8] showed that HINTS is nonsubmodular for a general tree and they used either QPBO or aexp for optimization. Unfortunately, QPBO does not guarantee to label all pixels and we observed that in our experiments, see Fig. 2. The aexp algorithm [7] is guaranteed to label all pixels but prone to weak local minima, Fig. 2.
We build on aexp algorithm [7]. The algorithm in [7] maintains a valid current labeling and iteratively tries to decrease the energy by switching from the current labeling to a nearby labeling via a binary expansion move. In a binary expansion, a label is chosen randomly and allowed to expand. Each pixel is given a binary choice to either stay as or switch to , i.e. . The algorithm stops when it cannot decrease the energy anymore.
Due to the “binary” nature of the expansion move interaction constraints cause aexp to be highly sensitive to initialization and prone to converge to a weak local minima even for simple trees, see Fig. 4.
Instead of using a binary expansion move [7] in step 4 of the aexp algorithm, we propose a more powerful “multilabel” move, namely, PathMove. Figure 4(d) shows how robust a PathMove is compared to a binary one [7].
3.1 PathMove
In a PathMove on each pixel can choose any label in the ordered set where is the current label of . Thus, the set of feasible labels for is , see examples in Fig. 5.
Given an arbitrary current labeling and label , we now show how to build a graph such that the mincut on this graph corresponds to the optimal PathMove. We use and to denote source and sink nodes of the mincut problem, respectively. Our construction is motivated by [15, 7, 29].
(a) data encoding  (b) smoothness encoding 
Data Term:
For each pixel we generate a chain of nodes whose size is . Let us rename to where and . Note that and depend on but we drop explicit dependence on from notation for clarity. Figure 6(a) illustrates chain and how it is linked to and . The edge weights along the directed path encode the the data terms of while the weights along the opposite direction are . If the edge along the is cut, then pixel is assigned to label . The edges ensure that any mincut severs only one edge on the path as proposed by [15]. Thus, the sum of severed edges on paths for all pixels adds to the data term in (1).
It should be noted that although is used as a weight for nlink^{5}^{5}5An edge between two nodes and neither of those nodes is or . edges it could still be negative. In case of negative a positive constant is added to for all and to ensure that the new data terms are nonnegative for every pixel—equivalent to adding a constant to (1).
Smoothness:
Let and be a pair of neighboring pixels. Note the overlap between and is at least one label, see Fig. 5. Our graph construction treats the sequence of overlapping labels of paths and differently from the nonoverlapping parts. Therefore we rename and to emphasize the overlap. Figure 6(b) shows the newly weighted edges that are added to out constructed graph to encode the smoothness penalty .
The overlapping part forms a linear ordering for which the smoothness cost is encoded as proposed by [15]. The nonoverlapping parts and each forms a linear ordering independent of the other, but extending linear ordering. In this case smoothness penalties are handled by additional edges from , for proof of correctness see Appendix A.
Interaction Constraints:
Let and be within each other. As per energy (3), to impose the margin constraint between and we need to add edges to our graph to ensure that whenever and the corresponding energy is infinite. Thus, we need to eliminate/forbid labels along the path that would violate the constraints if is assigned to a label in To impose such constraint between and expansion paths there are several cases to consider depending on whether each of , or is in or not as follows.
(a) Scenario I, case 1  (b) Scenario I, case 2 
(a)  (b) 
Scenario I when :
Case 1, assume and . Since and by assumption, we can deduce that and are both in see Fig.7 (a), otherwise our assumption that is a tree would be violated. Following the same reasoning we can deduce that and are both in . Thus, there are possible labelings/configurations involving and that violate . In other words, if is assigned to a label in there is a chance of assigning to a label that is not in , since part of expansion path is not entirely in , Fig. 7(a). We forbid those configurations by adding a edge between graph chains and as shown in Fig.8(a). Thus, eliminating the possibility of a min. cut that would simultaneously assign to a label and to a label
Case 2, assume . Since and by assumption, we can deduce that , otherwise our assumption that is a tree would be violated, see Fig. 7 (b). Thus, additional edges are not needed since is guaranteed to be in . Simply, in this case there is no chance of violating the constraints regardless of what label is assigned to , because we could only assign to a label in , i.e. .
Case 3, assume and . Since and by assumption, we can deduce that , otherwise is not a tree. On the one hand, if then no additional edges needed since . On the other hand, the case when is not possible as this would imply that the current labeling violates the . Recall, PathMoves switches from one valid labeling to another.
Scenario II when :
Case 1, assume and . This case follows the same reasoning as scenario I, case 1. Similarly to scenario I, case 1 we handle the forbidden configurations by adding an additional edge as shown in Fig.8(b).
Case 2, assume . Since and are by assumption, we can deduce that . Thus, no new edges are needed, as we are only interested in the case when . Recall that we only forbid labels along chain if is in , which is not possible in this case.
Case 3, assume and . Since and are by assumption then we can deduce that On the one hand, if then we can deduce that while (by assumption), i.e. current labeling violates the constraint and this is not possible. On the other hand, if then we can deduce that otherwise the current labeling would violate . When the construction is as shown in Fig.8(b) except there are no nodes above for .
3.2 Interaction Representability Condition
Unfortunately, not every interaction constraint could be represented as constraints between expansion paths during a PathMove. To be specific, sometimes interactions lead to conflicting constraints between the expansion paths, i.e. the graph construction inadvertently forbids a permissible configuration. We refer to an interaction that could be optimized by PathMove as a PathMove representable constraint, e.g. minmargin is a PathMove representable constraint while strict boxlayout constraints described in [21, 8] for scene parsing is not PathMove representable.
The objective of boxlayout scene parsing is segmenting the scene into 5 regions; left, top, right, bottom, and background, denoted by L, T, R, B, and G, respectively, in Fig. 9(a). The strict boxlayout constraints are illustrated in Fig. 9(bf). For example as shown in (b), when pixel is directly above and then according to the layout in (a) could only be labeled or , i.e. . The rest of the constraints shown in Fig. 9(bf) are derived in the same way from the layout in (a).
(a) strict boxlayout  (b) G restrictions 
(c) L restrictions  (d) T restrictions 
(e) R restrictions  (f) B restrictions 
(a) tree 1  (b) tree 2 
To show a case that leads to conflicting constraints let us consider the tree shown in Fig. 10(a). Also assume that pixel is directly below and that their current labeling is and , respectively. As shown in Fig. 11(Left), when expanding on an edge is add to avoid assigning and to and , respectively, which is a forbidden configuration as per Fig. 9(c). However, the same edge also forbids a permissible labeling that would assign and to and , respectively. As you can see, when using the hierarchical tree in Fig. 10(a) we can not properly represent the strict boxlayout interaction constraints during a PathMove.
In general, an interaction constraint is not PathMove representable if there exists and where and while configuration is prohibited and is permissible, see Fig. 11 (Right). Nonrepresentable PathMove interactions lead to a nonsubmodular PathMove [26]. Nonetheless, this could be avoided either by modifying the tree or relaxing the interaction constraints. For instance, the strict boxlayout becomes PathMove representable when using the alternative hierarchical tree shown in Fig. 10(b). If modifying the hierarchical tree is not an option, then by relaxing the constraints one could always achieve PathMove representability. For example, by relaxing the strict boxlayout constraints shown in Fig. 12 they become PathMove representable for the hierarchical tree shown in Fig. 10(a).
(a) relaxed boxlayout  (b) G restrictions 
(c) L restrictions  (d) T restrictions 
(e) R restrictions  (f) B restrictions 
4 Shape Priors for HINTS
In this section we extend starshape [28], Geodesicstar [12] and Hedgehogs [13] priors to the HINTS model and show how to enforce these priors during a PathMove.
In the context of binary segmentation, starshape prior [28] on label with star center reduces to the following constraint. If pixels and lie on any line originating from with in the middle and is labeled , then must also be labeled , see Fig.13(a). Geodesicstar [12] and Hedgehogs [13] differ from starshape prior in terms of what defines the center and how lines from the center (or geodesic paths) are generated. Furthermore, Hedgehogs [13] allow control over shape constraint tightness, see [13] for details.
For partially ordered segments we generalize the starshape prior constraint as follows. If pixels and lie on any line originating from with in the middle and is in , then must also be in , see Fig.13(b).
The shape prior penalty term is
(4) 
where is the set of all ordered pixel pairs^{6}^{6}6In practice, it is enough to include only consecutive pixel pairs in . along any line containing such that is between and . Using [12] or [13] instead of [28] results in a different .
(a) starshape prior [28]  (b) starshape prior + HINTS 
(a)  (b) 
Let pixels and lie on a line passing through , and is between and . To impose the starshape prior for label during a PathMove, there are multiple cases to consider depending on whether each of , and is in or not as follows.
Scenario I: when :
Case 1, assume and . In this case we can deduce that and are both in and . Thus, there are possible forbidden configurations and to handle them we add an edge as in Fig. 14(a).
Cases 2, assume . We can deduce that . Thus, no additional edges are needed since is guaranteed to be in .
Case 3, assume and . An impossible case as the current labeling would be violating the shapeprior.
GreyMatter  WhiteMatter  CSF  SGM  
Ours  QPBO  aexp  Ours  QPBO  aexp  Ours  QPBO  aexp  Ours  QPBO  aexp  
F Score  
Precision  
Recall 
. The precision and recall were averaged over 15 examples. Our method and QPBO clearly outperformed aexp which was very sensitive to initialization and the order in which labels were expanded on. On average QPBO left
of the pixels unlabeled and in one instance . These values rise significantly when not using the Hedgehog shape prior, see Table 2.GreyMatter  WhiteMatter  CSF  SGM  
Ours  QPBO  aexp  Ours  QPBO  aexp  Ours  QPBO  aexp  Ours  QPBO  aexp  
F Score  
Precision  
Recall 
Scenario II: when :
Case 1, assume and . This case is similar to scenario I, case 1, the added edge is shown in Fig. 14(b).
Cases 2, assume . We can deduce that . Thus, no edge needed since can not be in .
Case 3, assume and , see case 3 above.
5 Experiments
Our 2D medical segmentation experiments focus on comparing PathMoves for optimizing energy (1) or (1)+(4) to QPBO [25, 8] and aexp [7, 8]. In all experiments was set to 1. To define our treemetric, every edge in was assigned a nonnegative weight computed using a nonincreasing function of difference in and intensities similar to [2]. Also, whenever a Hedgehog [13] shape prior was used its tightness parameter was set to .
The experiments evaluate the effectiveness of PathMoves as a combinatorial multilabeling move. As such we assume that the color models are known a priori. One can easily integrate PathMoves in a framework that estimates initial color models using user interaction and iteratively alternates between labeling pixels and reestimating color models in a GrabCut fashion, e.g.
[24, 9, 14].(a) tree and minmargins  
Subject 2 

Subject 3 

Subject 4 

ground truth  aexp [7, 8]  QPBO [25, 8]  ours 
Brain Segmentation:
We combined the labeled regions in dataset [19] (T1W MRI) to create the tree shown in Fig. 15(a). In this setting, the data term is the sum of color model penalty and an shape prior [5] based on an automatically extracted brain mask using [16],
(5) 
where is the intensity at pixel and is the Euclidean Distance Transform of the extracted brain mask. Minmargins are shown in Fig. 15(a). We also added a Hedgehog prior [13] for the subcortical greymatter to help our energy differentiate between greymatter and subcortical greymatter.
In this application our method outperformed QPBO in most cases and aexp in all cases. In fact aexp always converged to a weak local minima in this setting, see Fig. 15. Based on our experience the quality of aexp result depends on various factors, e.g. tree complexity, the number of minmargins introduced, the order in which labels are expanded, and the initial solution. For the subjects that QPBO was able to find the global optimal PathMoves either found the global optimal or a very close solution.
minmargins 
Hedgehog prior 



minmargins 
No Hedgehog prior 


No minmargins 
Hedgehog prior 


No minmargin 
No Hedgehog prior 


aexp [7, 8]  QPBO [25, 8]  ours 
Figure 16 shows the results for Subject 1 with (and without) minmargins and Hedgehog prior. The third row shows the results when not using minmargins. PathMoves converged after two iterations to a lower energy than aexp, which converged after six iterations. In this case aexp local minimum was due to the Hedgehog prior, see last row.
Table 1 compares the precision, recall and F score for each region individually, where . The higher F values correspond to better segmentation. In general, QPBO was unpredictable as in some cases it found the optimal solution and in other cases it left a large number of pixels unlabeled.
Table 2 show the results after dropping the hedgehog prior. In terms of PathMoves, the mainly affected label after dropping the Hedgehog prior is the subcortical gray matter, as it started to grab parts of the gray matter, see Fig.16 second row, last column. Comparing Tables 1 and 2 it is clear that QPBO and aexp benefited the most by introducing the hedgehog prior.
Heart Segmentation:
In this setting we only used color models for the data term and no shape priors. Figure 17(a) shows the labels tree. For aexp to escape its local minimum it needs to first expand the left ventricle and then the left papillary muscles. However, expanding on left ventricle would lead to a higher energy than the current one. PathMoves avoids this local minimum by allowing both labels to expand simultaneously when performing a PathMove on the left papillary muscles.
Abdominal Organ Segmentation:
We used CT dataset and extended the work in [13] which used Hedgehogs to segment liver and kidneys. In contrast to [13], we utilized more detailed structures reaching labels.
Ours  QPBO  aexp  

F Score  
Weighted Precision  
Weighted Recall 
Ours  QPBO  aexp  

F Score  
Weighted Precision  
Weighted Recall 
(a) tree  
(b) ground truth  (c) ours (PathMoves) 
For each test case we computed the weighted precision
where is the ground truth labeling. The weighted recall is defined similarly. As shown in Table 3, all methods performed comparably due to the use of Hedgehog priors and the starlike structure of , which exp is well suited for. See Table 4 for results without using Hedgehog priors. Figure 18 shows the tree and our result for one test case. Interestingly, QPBO labeled all the pixels in all 7 test cases. By comparing Tables 3 and 4 it is easy to see the benefit of using Hedgehog priors. Moreover, PathMoves outperformed QPBO and aexp after dropping the Hedgehog priors.
(a) tree  
(b) ground truth  (c) aexp 
(d) QPBO  (e) ours (PathMoves) 
We pursued a more challenging structure, see Fig.19(a). The objective in this case was to segment the liver into three segments and any tumors inside them separately. Due to the large overlap between color models and the complex structure, having Hedgehog priors was not enough for QPBO or aexp to converge to an adequate solution, see Fig.19(ce). PathMoves was able to achieve good results by avoiding local minima as in Fig.19(c). Furthermore, PathMoves guarantees full labeling in contrast to QPBO, which left of the pixels unlabeled in Fig.19(d).
In conclusion, our results empirically show that for a general tree aexp converges to weak local minima, and QPBO is unpredictable in terms of being able to label all pixels. PathMoves which uses an effective multilabel expansion move, labels all pixels and easily avoids weak local minima that aexp is prone to.
6 Discussion
PathMoves is applicable to treemetrics which could be used to approximate arbitrary metrics [18, 10]. Even in the absence of interactions, PathMoves is a more powerful move making algorithm than aexp [7] because of the multilabel nature of its moves. Thus, PathMoves is a better fit for applications that rely on treemetrics such as [18]. In the presence of interaction constraints the optimality bound of [7] is not valid. The proof in [7] assumes that given any labeling every pixel with ground truth label could switch to via a binary expansion on . This is no longer guaranteed as interaction constraints limit [7] expansion domain, e.g. see Fig.4(c). Our experiments empirically show that PathMoves finds optimal or near optimal solution. In the cases where QPBO found full labeling, i.e. optimal solution, PathMoves either found the same solution or a very close one, see Table 3 and Fig.15 Subject 4.
In terms of space complexity aexp is the most efficient as it requires building a graph with nodes while QPBO requires a significantly larger graph with nodes. A PathMove graph size depends on . When is balanced it requires nodes and in the worse case when is a chain.
There is one limitation when using our PathMoves to optimize (1) compared to [8]. In [8] it is possible to explicitly control the min. exclusion margin between two siblings, say A and B in . In our model the min. exclusion margin is implicit and it is equal to Because siblings such as and are not directly connected in the tree.
Another limitation are interaction constraints that are not PathMove representable. An interaction constraint is not PathMove representable if there exists and where and while configuration is prohibited and is permissible [26]. In general, this could be avoided either by slightly modifying tree or relaxing the interaction constraints.
(a)  (b)  (c)  (d) 
7 Conclusion
The proposed multilabeling move is effective in optimizing models with hierarchicallystructured segments (partially ordered labels) and interaction constraints. In contrast to binary expansion move [7], our move avoids local minima caused by interaction constraints.
Our experiments cover various medical segmentation applications, e.g. brain and heart segmentation. Our results show that PathMoves always perform at least as well as prior methods. Moreover, PathMoves significantly outperform prior methods when using complex trees and/or regions with ambiguous color models.
PathMoves is applicable to arbitrary trees. This is in contrast to [8] which is not easy to generalize for an arbitrary tree as it relies on the cumbersome process of reducing highorder data terms to unary and pairwise potentials.
We generalized starlike shape priors in the context of partially ordered labels. Extending preexisting commonly used priors to partially ordered labels is an interesting idea on its own and we leave this for future work.
Acknowledgments
This work was supported by NIH grants R01EB004640, P50CA174521 and R01CA167632. We thank Drs. S. O’Dorisio and Y. Menda for providing the liver data (NIH grant U01CA140206). This work was also supported by NSERC Discovery and RTI grants (Canada) for Y. Boykov and O. Veksler.
Appendix A Smoothness Encoding Proof Of Correctness
The following proof uses some of the treemetric function properties, mainly symmetry and treemetric condition (2). Let and be a pair of neighbour pixels and assume that we are expanding on label . Recall that our graph construction treats the sequence of overlapping labels of paths and differently from the nonoverlapping parts. Therefore, we rename and to emphasize the overlap.
To show the correctness of our smoothness encoding shown in Fig.6(b) we need to consider the cost of all possible cuts involving pixels and . However, it is enough to only consider the four general cuts shown in Fig. 20, which are representative of all possible cuts.
Case I, and : Let and where . If then the cost of severed edges, shown in Fig. 20(a), will be
(6) 
Using equation (2) we can deduce that the terms in (6) sum to , which is the correct smoothness cost based on our assumption regarding and .
If then based on our construction no smoothness edges will be severed between that the and expansion paths. Thus, according to our construction the cost for the assigning and to will be 0, which is correct since for all by definition.
Finally, the proof when is equivalent to changing the roles of and when .
Case II, and : Let and . The cost of severed edges, shown in Fig.20(b), will be
Using equation (2) we can deduce the following
which is the correct smoothness cost based on our assumption regarding and .
Case III, and : Let and . Using equation (2) we can deduce that the cost of severed edges, shown in Fig.20(c), will be
which is the correct smoothness cost based on our assumption regarding and .
Case IV, and : Let and . Using equation (2) we can deduce that the cost of severed edges, shown in Fig.20(d), will be
which is the correct smoothness cost based on our assumption regarding and .
Recall that for any pixel a nonprohibitively expensive min. cut severs only one edge along its corresponding graph chain , which is due to the data terms exclusion constraints. Thus, this proof showed that our smoothness cost encoding is correct for any feasible cut involving a pair of neighboring pixels.
References

[1]
J. S. H. Baxter, M. Rajchl, A. J. McLeod, J. Yuan, and T. M. Peters.
Directed acyclic graph continuous maxflow image segmentation for
unconstrained label orderings.
International Journal of Computer Vision
, pages 1–20, 2017.  [2] Y. Boykov and G. FunkaLea. Graph cuts and efficient ND image segmentation. International Journal of Computer Vision (IJCV), 70(2):109–131, 2006.
 [3] Y. Boykov, H. Isack, C. Olsson, and I. Ben Ayed. Volumetric bias in segmentation and reconstruction: Secrets and solutions. In The IEEE International Conference on Computer Vision (ICCV), December 2015.
 [4] Y. Boykov and M.P. Jolly. Interactive graph cuts for optimal boundary & region segmentation of objects in ND images. In ICCV, volume I, pages 105–112, July 2001.
 [5] Y. Boykov, V. Kolmogorov, D. Cremers, and A. Delong. An integral solution to surface evolution PDEs via geocuts. In European Conference on Computer Vision (ECCV), Graz, Austria, May 2006.
 [6] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. In International Conference on Computer Vision, volume I, pages 377–384, 1999.
 [7] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE transactions on Pattern Analysis and Machine Intelligence, 23(11):1222–1239, November 2001.
 [8] A. Delong and Y. Boykov. Globally Optimal Segmentation of MultiRegion Objects. In International Conference on Computer Vision (ICCV), 2009.
 [9] A. Delong, A. Osokin, H. Isack, and Y. Boykov. Fast Approximate Energy Minization with Label Costs. International Journal of Computer Vision (IJCV), 96(1):1–27, January 2012.

[10]
P. F. Felzenszwalb, G. Pap, E. Tardos, and R. Zabih.
Globally optimal pixel labeling algorithms for tree metrics.
In
Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
, pages 3153–3160. IEEE, 2010.  [11] I. Gridchyn and V. Kolmogorov. Potts model, parametric maxflow and ksubmodular functions. In Proceedings of the IEEE International Conference on Computer Vision, pages 2320–2327, 2013.
 [12] V. Gulshan, C. Rother, A. Criminisi, A. Blake, and A. Zisserman. Geodesic star convexity for interactive image segmentation. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 3129–3136. IEEE, 2010.
 [13] H. Isack, O. Veksler, M. Sonka, and Y. Boykov. Hedgehog shape priors for multiobject segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
 [14] H. N. Isack and Y. Boykov. Energybased Geometric MultiModel Fitting. International Journal of Computer Vision (IJCV), 97(2):123–147, April 2012.
 [15] H. Ishikawa. Exact optimization for Markov Random Fields with convex priors. IEEE transactions on Pattern Analysis and Machine Intelligence, 25(10):1333–1336, 2003.
 [16] M. Jenkinson, M. Pechaud, and S. Smith. Bet2: Mrbased estimation of brain, skull and scalp surfaces. In Eleventh annual meeting of the organization for human brain mapping, volume 17, page 167. Toronto, ON, 2005.
 [17] J. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and markov random fields. J. ACM, 49(5):616–639, Sept. 2002.

[18]
M. P. Kumar and D. Koller.
Map estimation of semimetric mrfs via hierarchical graph cuts.
In
Proceedings of the TwentyFifth Conference on Uncertainty in Artificial Intelligence
, UAI ’09, pages 313–320, Arlington, Virginia, United States, 2009. AUAI Press.  [19] B. Landman and S. Warfield. Miccai workshop on. In MultiAtlas Labeling, 2012.
 [20] K. Li, X. Wu, D. Z. Chen, and M. Sonka. Optimal surface segmentation in volumetric imagesa graphtheoretic approach. IEEE transactions on pattern analysis and machine intelligence, 28(1):119–134, 2006.
 [21] X. Liu, O. Veksler, and J. Samarabandu. Graph cut with ordering constraints on labels and its applications. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
 [22] A. Lucchi, C. Becker, P. M. Neila, and P. Fua. Exploiting enclosing membranes and contextual cues for mitochondria segmentation. In International Conference on Medical Image Computing and ComputerAssisted Intervention, pages 65–72. Springer, 2014.
 [23] I. Oguz and M. Sonka. Logismosb: layered optimal graph image segmentation of multiple objects and surfaces for the brain. IEEE transactions on medical imaging, 33(6):1220–1235, 2014.
 [24] C. Rother, V. Kolmogorov, and A. Blake. Grabcut  interactive foreground extraction using iterated graph cuts. In ACM transactions on Graphics (SIGGRAPH), August 2004.
 [25] C. Rother, V. Kolmogorov, V. Lempitsky, and M. Szummer. Optimizing binary mrfs via extended roof duality. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.
 [26] D. Schlesinger and B. Flach. Transforming arbitrary minsum problem into a binary one. TU, Fak. Informatik, 2006.
 [27] J. Ulén, P. Strandmark, and F. Kahl. An efficient optimization framework for multiregion segmentation based on lagrangian duality. IEEE transactions on medical imaging, 32(2):178–188, 2013.
 [28] O. Veksler. Star shape prior for graphcut image segmentation. In European Conference on Computer Vision (ECCV), 2008.
 [29] O. Veksler. Multilabel moves for mrfs with truncated convex priors. International journal of computer vision, 98(1):1–14, 2012.
 [30] Y. Yin, X. Zhang, R. Williams, X. Wu, D. D. Anderson, and M. Sonka. Logismos–layered optimal graph image segmentation of multiple objects and surfaces: cartilage segmentation in the knee joint. IEEE transactions on medical imaging, 29(12):2023–2037, 2010.