1 Introduction
We study a general class of complex nonlinear segmentation energies with high-order regional terms. Such energies are often desirable in the computer vision tasks of image segmentation, co-segmentation and stereo [15, 12, 22, 3, 2, 16, 14, 13], and are particularly useful when there is prior knowledge about the appearance or the shape of the object being segmented.
We focus on energies of the following form:
(1) $E(S) = R(S) + Q(\partial S),$
where $S \subset \Omega$ is a binary segmentation, $Q(\partial S)$ is a standard length-based smoothness term, and $R(S)$ is a (generally) nonlinear regional functional discussed below.
Let $I$ be an image defined on a domain $\Omega$. The most basic type of regional term used in segmentation is a linear functional, which can be represented via an arbitrary scalar function $f : \Omega \to \mathbb{R}$:
(2) $R(S) = \int_\Omega f(x)\, \mathbf{1}_S(x)\, dx.$
Usually, $f$ corresponds to an appearance model based on image intensities/colors $I(x)$; e.g., $f$ could be a log-likelihood ratio for the intensity given particular object and background intensity distributions. The integral in (2) can be seen as a dot product between the scalar function $f$ and $\mathbf{1}_S$, the characteristic function of set $S$. We use the notation $\langle f, S \rangle$ to refer to such linear functionals.
More general nonlinear regional functionals can be described as follows. Assume $k$ scalar functions $f_1, \dots, f_k$, each defining a linear functional of type (2), and one differentiable nonlinear function $F : \mathbb{R}^k \to \mathbb{R}$ that combines them:
(3) $R(S) = F\big(\langle f_1, S\rangle, \dots, \langle f_k, S\rangle\big).$
Such general regional terms could enforce nonlinear constraints on the volume or higher-order shape moments of segment $S$. They could also penalize the $L_2$ distance or other distance metrics between the distribution (or non-normalized bin counts) of intensities/colors inside segment $S$ and some given target. For example, the volume of $S$ could be softly constrained to a specific value $V_0$ via the quadratic functional $(\langle 1, S\rangle - V_0)^2$, using $f = 1$ and $F(u) = (u - V_0)^2$, while the Kullback-Leibler (KL) divergence between the intensity distribution in $S$ and a fixed target distribution $\{c_b\}$ could be written as
$$R(S) = \sum_b \frac{\langle f_b, S\rangle}{\langle 1, S\rangle}\, \ln \frac{\langle f_b, S\rangle}{c_b\, \langle 1, S\rangle}.$$
Here the scalar function $f_b$ is an indicator for pixels with intensity $I(x)$ in bin $b$. More examples of nonlinear regional terms are discussed in Section 3.
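These constructions are easy to express in code. The following NumPy sketch (function names and toy data are ours, not from the paper) evaluates a linear functional $\langle f, S\rangle$ and the soft volume constraint built from it as a composition $F(\langle f_1,S\rangle,\dots)$:

```python
import numpy as np

def inner(f, S):
    """Linear regional functional <f, S>: integral of f over segment S."""
    return float(np.sum(f * S))

def nonlinear_term(S, fs, F):
    """General nonlinear regional term R(S) = F(<f1,S>, ..., <fk,S>)."""
    return F(*[inner(f, S) for f in fs])

# Soft volume constraint: k = 1, f1 = 1, F(u) = (u - V0)^2
V0 = 8.0
S = np.array([[1, 1, 0, 0]] * 4, dtype=float)   # toy segment with 8 pixels
ones = np.ones_like(S)
print(nonlinear_term(S, [ones], lambda u: (u - V0) ** 2))   # -> 0.0
```

The same `nonlinear_term` helper covers the distribution-matching examples as well, by passing one indicator function per intensity bin.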
In general, optimization with nonlinear regional terms is NP-hard and cannot be addressed by standard global optimization methods. Some earlier papers developed specialized techniques for particular forms of nonlinear regional functionals. For example, the algorithm in [2] was developed for minimizing the Bhattacharyya distance between distributions, and the dual-decomposition approach in [24] applies to convex piecewise-linear regional functionals (3). These methods are outside the scope of our paper, since we focus on more general techniques. Combinatorial techniques [15, 22] apply to general nonlinear regional functionals, and they can make large moves by globally minimizing some approximating energy. However, as pointed out in [14], these methods are known to converge to solutions that are not even a local minimum of energy (1). Level sets are well established in the literature as a gradient descent framework that can address arbitrary differentiable functionals [19], and are therefore widely used for high-order terms [1, 11, 4, 18, 12].
This paper compares two known optimization methods applicable to energy (1) with any high-order regional term (3): trust region and level sets. To address the nonlinear term $R(S)$, both methods use its first-order functional derivative
(4) $\dfrac{\partial R}{\partial S}(x) = \sum_{i=1}^{k} \dfrac{\partial F}{\partial u_i}\big(\langle f_1,S\rangle,\dots,\langle f_k,S\rangle\big)\, f_i(x),$
either when computing the gradient flow for (1) in level sets, or when approximating energy (1) in trust region. However, despite using the same first derivative (4), level sets and trust region are fairly different approaches to optimizing (1).
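The derivative (4) can be sanity-checked numerically. In this sketch (our own example, using an $L_2$-style combining function $F$), the analytic expression $\sum_i \partial F/\partial u_i \cdot f_i(x)$ is compared against a finite difference of $R$:

```python
import numpy as np

# Nonlinear regional term R(S) = F(<f1,S>, ..., <fk,S>) with the
# L2-style combining function F(u) = sum_i (u_i - c_i)^2 (c_i are targets).
def R(S, fs, c):
    u = np.array([np.sum(f * S) for f in fs])
    return float(np.sum((u - c) ** 2))

def dR_dS(S, fs, c):
    """First-order functional derivative (4): sum_i dF/du_i * f_i(x)."""
    u = np.array([np.sum(f * S) for f in fs])
    dF = 2 * (u - c)                       # gradient of F at u
    return sum(g * f for g, f in zip(dF, fs))

rng = np.random.default_rng(0)
S = (rng.random((8, 8)) > 0.5).astype(float)
fs = [rng.random((8, 8)) for _ in range(3)]
c = np.array([5.0, 2.0, 7.0])

# Perturb S at one pixel and compare to the analytic derivative there
g = dR_dS(S, fs, c)
eps = 1e-6
S2 = S.copy(); S2[3, 4] += eps
fd = (R(S2, fs, c) - R(S, fs, c)) / eps
print(abs(fd - g[3, 4]) < 1e-3)            # -> True
```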
The structure of the paper is as follows. Sections 2.1–2.2 review the two general approaches that we compare: gradient descent implemented with level sets, and trust region implemented with graph cuts. While the general trust region framework [13] can be based on a number of underlying global optimization techniques, we specifically choose a graph cut implementation (versus a continuous convex relaxation approach), since it is more appropriate for our CPU-based evaluations. Our experimental results and comparisons are reported in Section 3, and Section 4 presents the conclusions.
2 Overview of Algorithms
Below we provide general background information on the level sets and trust region methods; see Sections 2.1–2.2. A high-level conceptual comparison of the two frameworks is provided in Section 2.3.
2.1 Standard Level Sets
In the level set framework, minimization of energy $E(S)$ is carried out by solving a partial differential equation (PDE) that governs the evolution of the boundary of $S$. To this end, we derive the Euler-Lagrange equation by embedding segment $S$ in a one-parameter family $S(t)$, $t \geq 0$, and solving a PDE of the general form:
(5) $\dfrac{\partial S}{\partial t} = -\dfrac{\partial E}{\partial S},$
where $t$ is an artificial time parameterizing the descent direction. The basic idea is to describe segment $S$ implicitly via an embedding function $\phi : \Omega \to \mathbb{R}$:
(6) $S = \{x \in \Omega \;|\; \phi(x) \geq 0\},$
and evolve $\phi$ instead of $S$. With the above representation, the terms that appear in energy (1) can be expressed as functions of $\phi$ as follows [9, 19]:
(7) $\langle f, S\rangle = \int_\Omega f(x)\, H(\phi(x))\, dx, \qquad |\partial S| = \int_\Omega \delta(\phi(x))\, |\nabla\phi(x)|\, dx.$
Here, $\delta$ and $H$ denote the Dirac function and the Heaviside function, respectively. Therefore, the evolution equation in (5) can be computed directly by applying the Euler-Lagrange descent equation with respect to $\phi$. This gives the following gradient flow:
(8) $\dfrac{\partial \phi}{\partial t} = \delta(\phi)\left( -\dfrac{\partial R}{\partial S} + \lambda\, \kappa \right),$
with $\kappa = \mathrm{div}\left(\nabla\phi / |\nabla\phi|\right)$ denoting the curvature of $\phi$'s level lines. The first term in (8) is a regional flow minimizing $R(S)$, and the second is a standard curvature flow minimizing the length of the segment's boundary.
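The curvature term in this flow can be discretized with central differences. A small NumPy sketch (our own illustration, verified on a signed distance function to a circle, where the curvature on the contour is $1/r$):

```python
import numpy as np

def curvature(phi):
    """Curvature div(grad(phi)/|grad(phi)|) of the level lines of phi,
    approximated with central differences."""
    fy, fx = np.gradient(phi)                # per-axis derivatives of phi
    norm = np.sqrt(fx**2 + fy**2) + 1e-10    # avoid division by zero
    nyy, _ = np.gradient(fy / norm)          # d/dy of the y-component
    _, nxx = np.gradient(fx / norm)          # d/dx of the x-component
    return nxx + nyy                         # divergence of the unit normal

# Signed distance to a circle of radius 20: curvature ~ 1/20 on the contour
y, x = np.mgrid[-50:51, -50:51].astype(float)
phi = np.sqrt(x**2 + y**2) - 20.0            # zero level set: circle, r = 20
print(curvature(phi)[50, 70])                # close to 1/20 = 0.05
```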
In standard level set implementations, it is numerically mandatory to keep the evolving $\phi$ close to a distance function [17, 20]. This can be done by re-initialization procedures [20], which were intensively used in classical level set methods [8]. Such procedures, however, rely on several ad hoc choices and may result in undesirable side effects [17]. In our implementation, we use an efficient and well-known alternative [17], which adds an internal energy term that penalizes the deviation of $\phi$ from a distance function:
(9) $P(\phi) = \mu \int_\Omega \frac{1}{2}\left(|\nabla\phi(x)| - 1\right)^2 dx.$
In comparison to re-initialization procedures, the implementation in [17] allows larger time steps (and therefore faster curve evolution). Furthermore, it can be implemented via simple finite difference schemes, unlike traditional level set implementations, which require complex upwind schemes [23]. With the distance-function penalty, the gradient flow in (8) becomes:
(10) $\dfrac{\partial \phi}{\partial t} = \mu\left(\Delta\phi - \mathrm{div}\!\left(\dfrac{\nabla\phi}{|\nabla\phi|}\right)\right) + \delta(\phi)\left( -\dfrac{\partial R}{\partial S} + \lambda\, \mathrm{div}\!\left(\dfrac{\nabla\phi}{|\nabla\phi|}\right)\right).$
For all the experiments in this paper, we implemented the flow in (10) using the numerical prescriptions in [17]. For each point of the discrete grid, we update the level set function as
(11) $\phi^{n+1} = \phi^{n} + \Delta t\; L(\phi^{n}),$
where $\Delta t$ is the discrete time step and $n$ is the iteration number. $L(\phi^{n})$ is a numerical approximation of the right-hand side of (10), where the spatial derivatives of $\phi$ are approximated with central differences and the temporal derivative with forward differences. The Dirac function $\delta$ is approximated by the smoothed version $\delta_\varepsilon(z) = \frac{1}{2\varepsilon}\left(1 + \cos\frac{\pi z}{\varepsilon}\right)$ for $|z| \leq \varepsilon$ and $0$ elsewhere. The parameters $\varepsilon$ and $\mu$ are kept fixed in all experiments.
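A minimal NumPy sketch of one update step of the form (11) follows. The parameter values (`dt`, `mu`, `lam`, `eps`) and the simplified periodic boundary handling are our own choices for illustration, not the authors' implementation:

```python
import numpy as np

def dirac(phi, eps=1.5):
    """Smoothed Dirac: (1/(2*eps))*(1 + cos(pi*phi/eps)) for |phi| <= eps."""
    d = (1.0 / (2 * eps)) * (1 + np.cos(np.pi * phi / eps))
    return np.where(np.abs(phi) <= eps, d, 0.0)

def step(phi, region_force, dt=0.1, mu=0.04, lam=1.0):
    """One explicit update phi <- phi + dt * L(phi) for flow (10).
    region_force is -dR/dS evaluated at the current segment."""
    fy, fx = np.gradient(phi)
    norm = np.sqrt(fx**2 + fy**2) + 1e-10
    nyy, _ = np.gradient(fy / norm)
    _, nxx = np.gradient(fx / norm)
    kappa = nxx + nyy                         # curvature div(grad phi/|grad phi|)
    lap = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
           np.roll(phi, 1, 1) + np.roll(phi, -1, 1) - 4 * phi)  # 5-point Laplacian
    dist_penalty = mu * (lap - kappa)         # distance-function term from [17]
    return phi + dt * (dist_penalty + dirac(phi) * (region_force + lam * kappa))

# One step on a signed distance function, with a constant regional force
y, x = np.mgrid[-20:21, -20:21].astype(float)
phi = np.sqrt(x**2 + y**2) - 10.0
phi = step(phi, region_force=np.ones_like(phi))
print(np.isfinite(phi).all())                 # -> True
```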
In the context of level set and PDE methods, it is known that the choice of time steps should follow strict numerical conditions to ensure stability of front propagation, e.g., the standard Courant-Friedrichs-Lewy (CFL) conditions [10]. These conditions require that $\Delta t$ be smaller than a certain value that depends on the choice of discretization. The level set literature generally uses fixed time steps. For instance, classical upwind schemes [23] generally require a small $\Delta t$ for stability, whereas the scheme in [17] allows relatively larger time steps. The optimal time step is not known a priori, and finding a good $\Delta t$ via an adaptive search such as backtracking [5] seems attractive. However, to apply a backtracking scheme, we would have to evaluate the energy at each step. In the case of level sets, this requires a discrete approximation of the original continuous energy. We observed in our experiments that the gradient of such a discrete approximation of the energy does not coincide with the gradient obtained in the numerical updates in (11). Therefore, with a backtracking scheme, level sets get stuck very quickly in a local minimum of the discrete approximation of the energy (see the adaptive level set example in Fig. 2). We believe that this is the main reason why, to the best of our knowledge, backtracking approaches are generally avoided in the level set literature. Therefore, in the following level set experiments, we use a standard scheme based on a fixed time step during each curve evolution, and report the performance at convergence for several values of $\Delta t$.
2.2 Trust Region Framework
Trust region methods are a class of iterative optimization algorithms. In each iteration, an approximate model of the optimization problem is constructed near the current solution. The model is only "trusted" within a small region around the current solution, called the "trust region", since in general, approximations fit the original nonlinear function only locally. The approximate model is then globally optimized within the trust region to obtain a candidate iterate solution. This step is often called the trust region subproblem. The size of the trust region is adjusted in each iteration based on the quality of the current approximation. Variants of the trust region approach differ in the kind of approximate model used, the optimizer for the trust region subproblem, and the protocol for adjusting the trust region size. For a detailed review of trust region methods, see [25].
Below we outline a general version of a trust region algorithm in the context of image segmentation. The goal is to minimize $E(S)$ in Eq. (1). Given the current solution $S_0$ and a distance $d$, the energy is approximated using
(12) $\tilde{E}_{S_0}(S) = \left\langle \dfrac{\partial R}{\partial S}\Big|_{S_0},\; S \right\rangle + Q(\partial S),$
where the first term is the first-order Taylor approximation of the nonlinear term $R(S)$ near $S_0$. The trust region subproblem is then solved by minimizing $\tilde{E}_{S_0}$ within the region given by $d$. Namely,
(13) $S^{*} = \operatorname*{arg\,min}_{\|S - S_0\| \leq d} \tilde{E}_{S_0}(S).$
Once a candidate solution $S^{*}$ is obtained, the quality of the approximation is measured using the ratio between the actual and predicted reduction in energy. The trust region size is then adjusted accordingly.
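This accept/adjust logic can be sketched generically. The following Python skeleton is our own illustration; the 0.25/0.75 thresholds and the doubling/halving protocol are common textbook choices, not necessarily those of [13]:

```python
import numpy as np

def trust_region(E, approx_min, x0, d0=1.0, tol=1e-6, max_iter=100):
    """Generic trust region skeleton. E(x) is the true energy;
    approx_min(x, d) globally minimizes an approximate model of E around x
    within a region of size d, returning (candidate, predicted_reduction)."""
    x, d = x0, d0
    for _ in range(max_iter):
        cand, predicted = approx_min(x, d)
        if predicted <= tol:                 # model predicts no progress: stop
            break
        rho = (E(x) - E(cand)) / predicted   # quality of the approximation
        if rho > 0.25:                       # good step: accept, maybe grow
            x = cand
            if rho > 0.75:
                d *= 2.0
        else:                                # poor step: shrink the region
            d *= 0.5
    return x

# Toy 1-D usage: minimize E(x) = x^2 with a first-order (linear) model,
# whose constrained minimizer is a step of size d against the gradient.
E = lambda x: float(x**2)
def approx_min(x, d):
    g = 2 * x                                # gradient of E at x
    cand = x - d * np.sign(g) if g != 0 else x
    return cand, abs(g) * d                  # predicted reduction of the model
print(trust_region(E, approx_min, x0=3.0))   # -> 0.0
```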
For the purpose of our CPU-based evaluations, we specifically selected the Fast Trust Region (FTR) implementation [13], which includes the following components of the trust region framework. The nonlinear term $R(S)$ is approximated by the first-order Taylor approximation in (12) using the first-order functional derivative (4). The trust region subproblem in (13) is formulated as an unconstrained Lagrangian optimization, which is globally optimized using one graph cut (we use floating point precision in the standard code for graph cuts [7]). Note that, in this case, the length term is approximated using the Cauchy-Crofton formula as in [6]. More details about FTR can be found in [13].
2.3 Conceptual Comparison
Some high-level conceptual differences between the level sets and trust region optimization frameworks are summarized in Figure 1. Standard level set methods use a fixed $\Delta t$ to make steps in the gradient descent direction. The trust region algorithm [13] moves to the solution minimizing the approximating functional $\tilde{E}$ within a circle of given size $d$. The trust region size is adaptively changed from iteration to iteration based on the observed approximation quality. The blue line illustrates the spectrum of trust region moves for all values of $d$. Solution $S^{*}$ is the global minimum of approximation $\tilde{E}$. For example, if $\tilde{E}$ were a 2nd-order Taylor approximation of $E$ at point $S_0$, then $S^{*}$ would correspond to a Newton's step.
3 Experimental Comparison
In this section, we compare the trust region and level sets frameworks in terms of practical efficiency, robustness and optimality. We selected several examples of segmentation energies with nonlinear regional constraints. These include: 1) a quadratic volume constraint, 2) a shape prior in the form of an $L_2$ distance between the target and the observed shape moments, and 3) an appearance prior in the form of either the $L_2$ distance, the Kullback-Leibler divergence, or the Bhattacharyya distance between the target and the observed color distributions.
In all the experiments below, we optimize an energy of the general form $E(S) = R(S) + \lambda|\partial S|$. To compare optimization quality, we plot energy values for the results of both level sets and trust region. Note that the Fast Trust Region (FTR) implementation uses a discrete formulation based on graph cuts, while level sets are a continuous framework. Thus, direct comparison of their corresponding energy values should be done carefully. While numerical evaluation of the regional term $R(S)$ is equivalent in both methods, they use completely different numerical approaches to measuring length $|\partial S|$. In particular, level sets use the approximation of length given in Section 2.1, while the graph cut variant of trust region relies on integral geometry and the Cauchy-Crofton formula popularized by [6].
Since the energy values are therefore not directly comparable, we instead study the robustness of each method independently and compare the resulting segmentations with one another. Note that level sets exhibit much smaller oscillations for small time steps, which is consistent with the CFL conditions.
In each application below, we examine the robustness of both the trust region and level sets methods by varying their running parameters. In the trust region method, we vary the multiplier used to change the size of the trust region from one iteration to the next. For the rest of the parameters, we follow the recommendations of [13]. In our implementation of level sets, we vary the time step $\Delta t$, but keep the parameters $\varepsilon$ and $\mu$ fixed for all the experiments.
The top-left plots in Figures 2–6 report energy as a function of CPU time. At the end of each iteration, both level sets and trust region require energy updates, which can be computationally expensive. For example, appearance-based regional functionals require re-evaluation of color histograms/distributions at each iteration, which is a time-consuming step. Therefore, for completeness of our comparison, the top-middle plots of each figure report energy versus the number of energy evaluations (number of updates) required during the optimization.
3.1 Volume Constraint
First, we perform image segmentation with a volume constraint with respect to a target volume $V_0$, namely,
$$E(S) = \lambda|\partial S| + \left(\langle 1, S\rangle - V_0\right)^2.$$
We chose to optimize this energy on a synthetic image without an appearance term, since the solution to this problem is known to be a circle. Figure 2 shows that both FTR and level sets converge to good solutions (nearly a circle), with FTR being 25 times faster, requiring 150 times fewer energy updates, and exhibiting more robustness to the parameters.
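The volume term and its first-order functional derivative are simple to evaluate on a discrete mask. A small sketch (our own toy example; note the derivative is spatially constant, acting like a ballooning force):

```python
import numpy as np

def volume_energy(S, V0):
    """Quadratic volume constraint (<1,S> - V0)^2."""
    return float((S.sum() - V0) ** 2)

def volume_force(S, V0):
    """Functional derivative 2*(<1,S> - V0): a spatially constant force.
    Negative values mean adding pixels decreases the energy (grow S)."""
    return 2.0 * (S.sum() - V0)

S = np.zeros((16, 16)); S[4:8, 4:8] = 1      # 16-pixel square segment
print(volume_energy(S, V0=25))               # -> 81.0
print(volume_force(S, V0=25))                # -> -18.0
```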
3.2 Shape Prior with Geometric Shape Moments
Next, we perform image segmentation with a shape prior constraint in the form of the $L_2$ distance between the geometric shape moments of the segment and a target. Our energy is defined as $E(S) = \langle f, S\rangle + \lambda|\partial S| + R(S)$, where $\langle f, S\rangle$ is a standard log-likelihood unary term based on intensity histograms. In this case, $R(S)$ is given by
$$R(S) = \sum_{0 < p+q \leq 2} \left( \langle x^p y^q, S\rangle - \hat{m}_{pq} \right)^2,$$
with $\hat{m}_{pq}$ denoting the target geometric moment of order $p+q$. Figure 3 shows an example of liver segmentation with the above shape prior constraint. The target shape moments, as well as the foreground and background appearance models, are computed from the user-provided input ellipse as in [16, 13]. We used moments of up to order 2 (including the center of mass and shape covariance, but excluding the volume). Both trust region and level sets obtain visually pleasing solutions. The trust region method is two orders of magnitude faster and requires two orders of magnitude fewer energy updates (top-left and top-middle plots). Since the level sets method was forced to stop after 10000 iterations, we show the last solution available for each value of the parameter $\Delta t$. Actual convergence for this method would have taken more iterations. In this example, the oscillations of the energy are especially pronounced (top-right plot).
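Geometric moments of a binary mask are themselves linear functionals $\langle x^p y^q, S\rangle$, so they fit directly into the form (3). A small NumPy sketch (names and the toy mask are ours):

```python
import numpy as np

def moments(S, order=2):
    """Geometric moments m_pq = <x^p y^q, S> of a binary mask, 0 < p+q <= order."""
    y, x = np.mgrid[0:S.shape[0], 0:S.shape[1]].astype(float)
    return {(p, q): float(np.sum(x**p * y**q * S))
            for p in range(order + 1) for q in range(order + 1)
            if 0 < p + q <= order}           # exclude m_00 (the volume)

S = np.zeros((9, 9)); S[3:6, 3:6] = 1       # 3x3 square centered at (4, 4)
m = moments(S)
print(m[(1, 0)] / S.sum(), m[(0, 1)] / S.sum())   # center of mass -> 4.0 4.0
```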
3.3 Appearance Prior
In the experiments below, we apply both methods to optimize segmentation energies where the goal is to match a given target appearance distribution, using either the $L_2$ distance between the observed and target color bin counts, or the Kullback-Leibler divergence or Bhattacharyya distance between the observed and target color distributions. Here, our energy is defined as $E(S) = R(S) + \lambda|\partial S|$. We assume $f_b$ is an indicator function of pixels belonging to bin $b$, and $c_b$ is the target count (or probability) for bin $b$. The target appearance distributions for the object and the background were obtained from the ground truth segments. We used 100 bins per color channel. The images in the experiments below are taken from [21].
$L_2$ distance constraint on bin counts:
Figure 4 shows results of segmentation with the $L_2$ distance constraint between the observed and target bin counts, regularized by length. The regional term in this case is
$$R(S) = \sum_b \left( \langle f_b, S\rangle - c_b \right)^2.$$
Since the level sets method was forced to stop after 15000 iterations for the smaller values of $\Delta t$, we show the last solution available; full convergence would have taken more iterations. For higher values of $\Delta t$, we show results at convergence. We observe a two-orders-of-magnitude difference between the trust region and level sets methods in terms of speed and the number of energy updates required.
KullbackLeibler divergence:
Figure 5 shows results of segmentation with the KL divergence constraint between the observed and target color distributions. The regional term in this case is given by
$$R(S) = \sum_b \frac{\langle f_b, S\rangle}{\langle 1, S\rangle}\, \ln \frac{\langle f_b, S\rangle}{c_b\, \langle 1, S\rangle}.$$
The level sets method converged for two of the time step values, but was forced to stop for the other values of the parameter after the iteration limit; we show the last solution available. Full convergence would have required more iterations. The trust region method obtains solutions that are closer to the ground truth, runs two orders of magnitude faster, and requires fewer energy updates.
Bhattacharyya divergence:
Figure 6 shows results of segmentation with the Bhattacharyya distance constraint between the observed and target color distributions. The regional term in this case is given by
$$R(S) = -\ln \sum_b \sqrt{\frac{\langle f_b, S\rangle}{\langle 1, S\rangle}\; c_b}.$$
For this image, since the level sets method had not (yet) converged after 10000 iterations for any setting of the parameters, we show the last solution available. Further increasing the parameter $\Delta t$ would increase the oscillations of the energy (see top-right plot).
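For reference, the three appearance terms used in this section can be evaluated on a discrete mask as follows. This is our own NumPy sketch with toy data; the exact normalization may differ from the authors' implementation:

```python
import numpy as np

def bin_counts(I, S, bins):
    """Counts <f_b, S> of pixels of S falling in each intensity bin b."""
    return np.array([np.sum((I == b) & (S == 1)) for b in range(bins)], float)

def l2_term(I, S, target_counts):
    """L2 distance between observed and target bin counts."""
    c = bin_counts(I, S, len(target_counts))
    return float(np.sum((c - target_counts) ** 2))

def kl_term(I, S, target_dist, eps=1e-12):
    """KL divergence between the normalized distribution in S and the target."""
    q = bin_counts(I, S, len(target_dist)) / max(S.sum(), 1)
    return float(np.sum(q * np.log((q + eps) / (target_dist + eps))))

def bhattacharyya_term(I, S, target_dist):
    """Negative log of the Bhattacharyya coefficient with the target."""
    q = bin_counts(I, S, len(target_dist)) / max(S.sum(), 1)
    return float(-np.log(np.sum(np.sqrt(q * target_dist)) + 1e-12))

I = np.array([[0, 1, 1, 2]] * 4)             # toy image with 3 intensity bins
S = np.ones_like(I)                          # segment = whole image
target = np.array([0.25, 0.50, 0.25])        # matches the image exactly
print(kl_term(I, S, target))                 # ~0.0
print(bhattacharyya_term(I, S, target))      # ~0.0
```

Both divergences vanish here because the observed distribution matches the target exactly; any mismatch makes them positive.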
4 Conclusions
For relatively simple functionals (3) combining a few linear terms (small $k$), such as constraints on volume and low-order shape moments, the quality of the results obtained by both methods is comparable (visually and energy-wise). However, we observe that the number of energy updates required for level sets is two orders of magnitude larger than for trust region. This behavior is consistent with the corresponding CPU running time plots. The segmentation results with shape moments were fairly robust with respect to the parameters (time step $\Delta t$ and trust region multiplier) for both methods. The level sets results for volume constraints varied with the choice of $\Delta t$. In general, larger steps caused significant oscillations of the energy in level sets, thereby affecting the quality of the result at convergence.
When optimizing appearance-based regional functionals with a large number of histogram bins (corresponding to large $k$), level sets proved to be extremely slow. Convergence would require a prohibitive number of iterations (longer than 1 hour on our machine). In some cases, the corresponding results were far from optimal, both visually and energy-wise. This is in contrast to the results of the trust region approach, which consistently converged to plausible solutions with low energy in less than a minute.
We believe that our results will be useful for many practitioners in computer vision and medical imaging when selecting an optimization technique.
References
 [1] A. Adam, R. Kimmel, and E. Rivlin. On scene segmentation and histogramsbased curve evolution. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 31:1708–1714, 2009.

[2] I. Ben Ayed, H. Chen, K. Punithakumar, I. Ross, and S. Li. Graph cut segmentation with a global constraint: Recovering region distribution via a bound of the Bhattacharyya measure. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 3288–3295, 2010.
[3] I. Ben Ayed, S. Li, A. Islam, G. Garvin, and R. Chhem. Area prior constrained level set evolution for medical image segmentation. In Proc. SPIE, Medical Imaging 2008: Image Processing, 2008.
 [4] I. Ben Ayed, S. Li, and I. Ross. A statistical overlap prior for variational image segmentation. International Journal of Computer Vision, 85(1):115–132, 2009.
 [5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
 [6] Y. Boykov and V. Kolmogorov. Computing Geodesics and Minimal Surfaces via Graph Cuts. In IEEE Int. Conf. on Computer Vision (ICCV), 2003.
[7] Y. Boykov and V. Kolmogorov. An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 26(9):1124–1137, 2004.
 [8] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. International Journal of Computer Vision, 22(1):61–79, 1997.
 [9] T. Chan and L. Vese. Active contours without edges. IEEE Transactions on Image Processing (TIP), 10(2):266–277, 2001.
 [10] V. Estellers, D. Zosso, R. Lai, S. Osher, J. P. Thiran, and X. Bresson. Efficient algorithm for level set method preserving distance function. IEEE Transactions on Image Processing (TIP), 21(12):4722–34, 2012.
 [11] A. Foulonneau, P. Charbonnier, and F. Heitz. Multireference shape priors for active contours. International Journal of Computer Vision, 81(1):68–81, 2009.
 [12] D. Freedman and T. Zhang. Active contours for tracking distributions. IEEE Transactions on Image Processing (TIP), 13:518 – 526, April 2004.
 [13] L. Gorelick, F. Schmidt, and Y. Boykov. Fast Trust Region for Segmentation. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2013.
 [14] L. Gorelick, R. Schmidt, Y. Boykov, A. Delong, and A. Ward. Segmentation with NonLinear Regional Constraint via LineSearch cuts. In European Conf. on Computer Vision (ECCV), 2012.
 [15] J. Kim, V. Kolmogorov, and R. Zabih. Visual correspondence using energy minimization and mutual information. In IEEE Int. Conf. on Computer Vision (ICCV), pages 1033 – 1040, 2003.
 [16] M. Klodt and D. Cremers. A convex framework for image segmentation with moment constraints. In IEEE Int. Conf. on Computer Vision (ICCV), 2011.
 [17] C. Li, C. Xu, C. Gui, and M. D. Fox. Level set evolution without reinitialization: A new variational formulation. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 430–436, 2005.
 [18] O. V. Michailovich, Y. Rathi, and A. Tannenbaum. Image segmentation using active contours driven by the bhattacharyya gradient flow. IEEE Transactions on Image Processing (TIP), 16(11):2787–2801, 2007.
 [19] A. Mitiche and I. Ben Ayed. Variational and Level Set Methods in Image Segmentation. Springer, 2011.
 [20] S. Osher and R. Fedkiw. Level Set Methods and Dynamic Implicit Surfaces. SpringerVerlag, 2002.
 [21] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts. In ACM SIGGRAPH, 2004.
 [22] C. Rother, V. Kolmogorov, T. Minka, and A. Blake. Cosegmentation of Image Pairs by Histogram Matching  Incorporating a Global Constraint into MRFs. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 993 – 1000, 2006.
 [23] J. A. Sethian. Level set Methods and Fast Marching Methods. Cambridge University Press, 1999.
 [24] O. J. Woodford, C. Rother, and V. Kolmogorov. A global perspective on MAP inference for lowlevel vision. In IEEE Int. Conf. on Computer Vision (ICCV), pages 2319 – 2326, 2009.
 [25] Y. Yuan. A review of trust region algorithms for optimization. In International Congress on Industrial and Applied Mathematics, 1999.