Energy minimization has become a key tool for various computer vision problems such as image segmentation [1, 3], 3D reconstruction [22, 25], and stereo. It is equivalent to performing maximum a posteriori (MAP) inference in Markov random fields (MRFs), which is NP-hard in general. However, there are subclasses of energy functions, such as submodular functions, that can be minimized in polynomial time [2, 7, 9, 13]. Many approximation algorithms based on belief propagation, tree-reweighted message passing, and graph cuts [4, 14] have also been studied.
Although sophisticated energy minimization algorithms have enabled researchers to compute the exact MAP solution under many probabilistic models, they do not necessarily provide the desired solution of image labelling problems. The reason is that most probabilistic models used in computer vision, such as MRFs, are mis-specified, i.e., the most probable solution under the model is not the ground truth (the desired solution) [23, 18]. This has led researchers to consider more sophisticated models [9, 20], but these too have not been shown to be free from mis-specification.
For many vision problems, the system may have access to certain statistics of the ground truth. For instance, in the case of the foreground/background image segmentation problem, the area and boundary length of the object to be segmented may be known. Similarly, for 3D voxel segmentation, we may know the volume and surface area of the object. In these scenarios, we wish to compute the most probable solution under the model that is consistent with such statistics; in other words, a solution that satisfies certain equality or inequality constraints. Note that inequality constraints allow us, for instance, to find the most probable segmentation that is larger or smaller than a specified size. While imposing such constraints improves the solution of the image labelling problem, the resulting optimization problem becomes NP-hard in general.
The constrained optimization (MAP inference) problem is usually solved using linear programming formulations that relax the integrality constraints [15, 17]. However, this approach suffers from a number of drawbacks. First, the size of the resulting linear program can be very large, making the algorithm very slow. Second, the rounding error introduced while transforming the fractional solution of the LP into a discrete solution can be arbitrarily large and may result in solutions having a high energy value. Lastly, although LP relaxation based methods can deal with linear constraints, they cannot handle powerful second or higher order constraints such as the boundary length constraint described in Section 5.
In this paper, we propose a novel method for solving constrained energy minimization problems that can handle linear and higher order constraints. In contrast to previous methods that employ a continuous relaxation of the problem, we directly exploit its discrete structure. We consider the Lagrangian dual of the primal constrained problem, and directly compute a discrete solution corresponding to the dual maximum. Such a solution is then guaranteed to be the lowest energy solution with its statistics (see Lemma 1). Hence, solutions yielded by our method can be more accurate than those obtained by LP relaxation based methods with rounding. Through extensive experiments on the image segmentation problem, we demonstrate that the proposed method produces solutions with less error, and runs more than
times faster compared with state-of-the-art continuous relaxation based methods. Although we demonstrate the efficacy of our method on the image segmentation problem, it is not limited to this problem and can be applied to various constrained energy minimization problems in computer vision and machine learning.
2 Related Work
Our work is inspired by a number of recent studies on enforcing statistics during inference [6, 12, 16, 26, 27, 8]. These methods can be divided into two broad categories on the basis of how they enforce constraints: methods that enforce statistics softly by incorporating global potentials in probabilistic models, and methods that use statistics as hard constraints during inference.
Many studies have considered the problem of performing inference in probabilistic models that contain global potentials that encourage solutions to have a particular distribution of labels [26, 27, 8]. Werner  and Kohli and Kumar  proposed inference algorithms that can handle a global potential that encourages a specific number of variables to be assigned a particular label. This potential can be used for encouraging foreground segmentations to be of a particular size, i.e., to cover a specific number of pixels. Woodford et al.  considered histogram distribution preserving potentials for image labelling problems such as image denoising and texture synthesis and proposed a number of sophisticated methods for performing inference in models containing such potentials.
The methods discussed above enforce statistics softly, and there is no guarantee that their solutions will have statistics matching the desired ones; in fact, the statistics can be very different. A number of inference algorithms that ensure that solutions have the desired statistics have been proposed. One of the first constraints used for image labelling problems was the silhouette constraint, used for the problem of 3D reconstruction [10, 22]. This constraint ensures that a ray emanating from any silhouette pixel must pass through at least one voxel that belongs to the 'object'. Algorithms that enforce topological constraints, such as connectivity of the object segmentation or the requirement that the boundary of the object segmentation be close to the sides of a bounding box, have also been proposed.
Convex relaxation has been widely used to solve discrete optimization problems. Ravikumar and Lafferty proposed a convex quadratic relaxation for MAP inference, which was shown to be more accurate than LP relaxation and propagation based methods. Another approach closely related to the present work is the method of Klodt and Cremers, which produces impressive results for labelling problems such as image segmentation. This method works by solving a continuous relaxation of the constrained energy minimization problem and rounding its solution to obtain a discrete solution. Although their method can handle multiple constraints, it cannot easily deal with constraints on second order statistics (such as the length of the segmentation boundary) without significantly increasing the computational cost. For a more detailed discussion, please refer to Section 7.
We now discuss the methods most relevant to the present work. Kolmogorov et al. and Lim et al. considered the problem of minimizing a submodular energy function under a 'label count' constraint. For the foreground/background segmentation problem, the label count constraint ensures that a specific number of pixels take the foreground label. Both methods use the parametric mincut algorithm to guide their search for a lowest energy solution that satisfies the constraint, and both have been shown to improve accuracy under the Hamming error with respect to the ground truth [12, 16]. However, both methods suffer from the limitation that they can handle only one constraint at a time. In contrast, our method can handle multiple constraints, which may be equalities or inequalities, linear or non-linear.
3 Setup and Preliminaries
This section provides the notations and definitions used in the paper. It also provides our formulation for the constrained energy minimization problem and explains the optimality certificate associated with the solutions of our method.
3.1 Energy Minimization
Many computer vision problems are formulated using a random field model defined over a graph $G = (V, \mathcal{E})$. The energy of the random field is the negative log of the posterior probability distribution. For image labelling problems with two labels, such as image segmentation and volumetric segmentation, the energy has the following form:
\[ E(\mathbf{x}) = \sum_{c \in C} \psi_c(\mathbf{x}_c), \tag{1} \]
where $\mathbf{x} \in \{0, 1\}^n$, $C$ is the set of cliques in $G$, and $\psi_c$ is called a potential and is defined over a clique $c$. Although minimizing $E$ is NP-hard in general, if $E$ is submodular, it becomes solvable in polynomial time. Moreover, if an energy function is defined over cliques of size up to two as
\[ E(\mathbf{x}) = \sum_{i \in V} \psi_i(x_i) + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j), \tag{2} \]
the problem can be efficiently solved by an st-mincut algorithm. Owing to this property, submodular second order energy functions are widely used in computer vision for problems such as image segmentation.
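For pairwise potentials over binary variables, submodularity reduces to a simple per-edge inequality that can be checked directly. A minimal sketch (the potential tables below are illustrative, not from the paper):

```python
# A pairwise potential psi over binary labels is submodular iff
# psi(0,0) + psi(1,1) <= psi(0,1) + psi(1,0).

def is_submodular(psi):
    # psi is a 2x2 table: psi[a][b] is the cost of the label pair (a, b).
    return psi[0][0] + psi[1][1] <= psi[0][1] + psi[1][0]

potts = [[0.0, 1.0], [1.0, 0.0]]   # Potts smoothness term: submodular
anti = [[1.0, 0.0], [0.0, 1.0]]    # rewards disagreement: not submodular

assert is_submodular(potts)
assert not is_submodular(anti)
```

An energy of form (2) in which every pairwise table passes this check can be minimized exactly by a single st-mincut computation.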
3.2 Constrained Energy Minimization
In this paper, we tackle the problem of minimizing an energy function under certain pre-specified constraints. This problem is formulated as
\[ \min_{\mathbf{x} \in \{0,1\}^n} E(\mathbf{x}) \quad \text{s.t.} \quad \hat{c}^{\,l}_i \le g_i(\mathbf{x}) \le \hat{c}^{\,u}_i, \quad i = 1, \dots, k, \tag{3} \]
where $g_i : \{0,1\}^n \to \mathbb{R}$ and $\hat{c}^{\,l}_i \le \hat{c}^{\,u}_i$. We denote $(g_1(\mathbf{x}), \dots, g_k(\mathbf{x}))$ by $g(\mathbf{x})$. In the context of image segmentation, $g_i(\mathbf{x})$ may correspond to statistics such as the size and boundary length of an object. For instance, $\sum_{i \in V} x_i$ is the number of pixels that are assigned the label 1 (foreground).
Before explaining how to solve the inequality problem, we first examine the equality case:
\[ \min_{\mathbf{x} \in \{0,1\}^n} E(\mathbf{x}) \quad \text{s.t.} \quad g(\mathbf{x}) = \hat{c}, \tag{4} \]
where $\hat{c} = (\hat{c}_1, \dots, \hat{c}_k) \in \mathbb{R}^k$. The Lagrangian dual of (4) is
\[ \max_{\lambda \in \mathbb{R}^k} h(\lambda), \quad \text{where} \quad h(\lambda) = \min_{\mathbf{x} \in \{0,1\}^n} L(\mathbf{x}, \lambda) \tag{5} \]
and
\[ L(\mathbf{x}, \lambda) = E(\mathbf{x}) + \sum_{i=1}^{k} \lambda_i \big( g_i(\mathbf{x}) - \hat{c}_i \big). \tag{6} \]
Note that for any $\lambda$, $h(\lambda)$ gives a lower bound on the minimum of (4), so maximizing $h$ yields the tightest such bound. This leads us to the following known lemma, which, as will become apparent later, provides us with an optimality certificate guaranteeing that our solution is the lowest energy solution with its statistics.
Lemma 1. Let $\mathbf{x}^*$ be such that $\mathbf{x}^* \in \arg\min_{\mathbf{x} \in \{0,1\}^n} L(\mathbf{x}, \lambda)$ for some $\lambda \in \mathbb{R}^k$. Then, $E(\mathbf{x}^*) \le E(\mathbf{x})$ for all $\mathbf{x}$ such that $g(\mathbf{x}) = g(\mathbf{x}^*)$.
Proof. Let $\mathbf{x}$ satisfy $g(\mathbf{x}) = g(\mathbf{x}^*)$. Then, $L(\mathbf{x}, \lambda) \ge L(\mathbf{x}^*, \lambda)$, since $\mathbf{x}^*$ minimizes $L(\cdot, \lambda)$. This implies $E(\mathbf{x}) + \lambda^\top (g(\mathbf{x}) - \hat{c}) \ge E(\mathbf{x}^*) + \lambda^\top (g(\mathbf{x}^*) - \hat{c})$. Thus, from (6) and $g(\mathbf{x}) = g(\mathbf{x}^*)$, we obtain $E(\mathbf{x}) \ge E(\mathbf{x}^*)$. ∎
Note that for a fixed $\mathbf{x}$, $L(\mathbf{x}, \lambda)$ corresponds to a $k$-dimensional hyperplane in $\lambda$ whose slope is $g(\mathbf{x}) - \hat{c}$. Also note that $h$ is defined by the pointwise minimum of a finite number of hyperplanes, that is, $h$ is piecewise linear and concave. From these facts, if a hyperplane with slope $0$ (i.e., with $g(\mathbf{x}) = \hat{c}$) contributes to the shape of the dual $h$, that hyperplane corresponds to a primal optimal solution, and we can find it by maximizing $h$. Even when no such hyperplane of an optimal solution contributes to the shape of $h$, we can still compute a (primal) solution at the dual maximum. Such a (primal) solution has a slope close to $0$, i.e., $g(\mathbf{x}^*) \approx \hat{c}$, and by Lemma 1, it is optimal among all $\mathbf{x}$ such that $g(\mathbf{x}) = g(\mathbf{x}^*)$. Thus, it is a good approximate solution for (4). The algorithm in Section 4 essentially computes this solution.
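These properties can be seen on a toy instance. The sketch below evaluates the dual by brute force over all labellings of a hypothetical 4-variable chain energy with a size statistic $g(\mathbf{x}) = \sum_i x_i$; the paper's oracle is a graph cut, and all numeric values here are illustrative assumptions:

```python
import itertools

# Hypothetical 4-variable chain energy; values are illustrative only.
def energy(x):
    unary = [0.2, -0.5, 0.3, -0.1]
    e = sum(u * xi for u, xi in zip(unary, x))
    e += sum(0.4 * abs(x[i] - x[i + 1]) for i in range(len(x) - 1))
    return e

def g(x):
    return sum(x)  # size statistic

def dual(lam, c_hat, n=4):
    # h(lambda) = min_x E(x) + lambda * (g(x) - c_hat); brute force here,
    # a graph cut oracle in the paper.
    return min(energy(x) + lam * (g(x) - c_hat)
               for x in itertools.product((0, 1), repeat=n))

# h is a pointwise minimum of hyperplanes, hence concave: every midpoint
# value lies on or above the chord between its neighbours.
for lam in (-1.0, -0.5, 0.0, 0.5, 1.0):
    assert dual(lam, 2) >= 0.5 * (dual(lam - 0.1, 2) + dual(lam + 0.1, 2)) - 1e-12
```

At $\lambda = 0$ the dual equals the unconstrained minimum energy; as $\lambda$ moves, the minimizing labelling (and hence the active hyperplane) changes at the breakpoints of the envelope.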
Now we return to the inequality constrained problem. An inequality constraint can be handled as an equality one through the insertion of a slack variable. As a first step, we rewrite problem (3) as follows:
\[ \min_{\mathbf{x} \in \{0,1\}^n} E(\mathbf{x}) \quad \text{s.t.} \quad \tilde{g}_j(\mathbf{x}) \le \tilde{c}_j, \quad j = 1, \dots, 2k, \]
where $\tilde{g} = (g_1, \dots, g_k, -g_1, \dots, -g_k)$ and $\tilde{c} = (\hat{c}^{\,u}_1, \dots, \hat{c}^{\,u}_k, -\hat{c}^{\,l}_1, \dots, -\hat{c}^{\,l}_k)$. The corresponding equality problem then becomes
\[ \min_{\mathbf{x} \in \{0,1\}^n,\, \mathbf{s}} E(\mathbf{x}) \quad \text{s.t.} \quad \tilde{g}_j(\mathbf{x}) + s_j = \tilde{c}_j, \quad s_j \in [0, M_j], \quad j = 1, \dots, 2k, \]
where $\mathbf{s} = (s_1, \dots, s_{2k})$ and each $M_j$ is a sufficiently large upper bound on the slack. The corresponding Lagrangian is
\[ L(\mathbf{x}, \mathbf{s}, \lambda) = E(\mathbf{x}) + \sum_{j=1}^{2k} \lambda_j \big( \tilde{g}_j(\mathbf{x}) + s_j - \tilde{c}_j \big). \]
Let $(\mathbf{x}^*, \mathbf{s}^*)$ be a minimizer of $L(\mathbf{x}, \mathbf{s}, \lambda)$ for some $\lambda$; then $s^*_j = 0$ if $\lambda_j > 0$, $s^*_j = M_j$ if $\lambda_j < 0$, and $s^*_j$ can be any number in $[0, M_j]$ if $\lambda_j = 0$. Hence, the minimization over $\mathbf{s}$ depends only on the sign of each $\lambda_j$, and the dual is as follows:
\[ h(\lambda) = \min_{\mathbf{x} \in \{0,1\}^n} \Big[ E(\mathbf{x}) + \sum_{j=1}^{2k} \lambda_j \big( \tilde{g}_j(\mathbf{x}) - \tilde{c}_j \big) \Big] + \sum_{j : \lambda_j < 0} \lambda_j M_j. \]
This form is the same as (5). Thus, we can handle an inequality constrained problem as the corresponding equality constrained problem. We obtain the following lemma, which shows the optimality of our solution.
Lemma 2. Let $(\mathbf{x}^*, \mathbf{s}^*)$ be such that $(\mathbf{x}^*, \mathbf{s}^*) \in \arg\min_{\mathbf{x}, \mathbf{s}} L(\mathbf{x}, \mathbf{s}, \lambda)$ for some $\lambda$. Then $E(\mathbf{x}^*) \le E(\mathbf{x})$ for all $(\mathbf{x}, \mathbf{s})$ such that $\tilde{g}(\mathbf{x}) + \mathbf{s} = \tilde{g}(\mathbf{x}^*) + \mathbf{s}^*$.
Proof. Let $(\mathbf{x}, \mathbf{s})$ satisfy $\tilde{g}(\mathbf{x}) + \mathbf{s} = \tilde{g}(\mathbf{x}^*) + \mathbf{s}^*$. Then, $L(\mathbf{x}, \mathbf{s}, \lambda) \ge L(\mathbf{x}^*, \mathbf{s}^*, \lambda)$ for any such $(\mathbf{x}, \mathbf{s})$. This implies $E(\mathbf{x}) + \lambda^\top (\tilde{g}(\mathbf{x}) + \mathbf{s} - \tilde{c}) \ge E(\mathbf{x}^*) + \lambda^\top (\tilde{g}(\mathbf{x}^*) + \mathbf{s}^* - \tilde{c})$. The terms involving $\lambda$ then cancel, and finally we get $E(\mathbf{x}) \ge E(\mathbf{x}^*)$ from the definition of $L$. ∎
4 Algorithm to Maximize the Dual
We have seen that an energy minimization problem with inequalities can be solved by considering the corresponding problem with equalities. Thus, in this section we focus on solving the problem with equalities only. To this end, we employ the standard cutting plane algorithm for maximizing the Lagrangian dual. The cutting plane algorithm runs iteratively, computing the hyperplanes that constitute the dual one by one until it finds the maximum. In the following, we assume the existence of a polynomial time oracle that computes $\arg\min_{\mathbf{x}} L(\mathbf{x}, \lambda)$ for any $\lambda \in \Lambda$, where $\Lambda$ is a large enough search set. Note that when (2) is submodular and the constraints are linear, $\arg\min_{\mathbf{x}} L(\mathbf{x}, \lambda)$ can be efficiently computed by the graph cut algorithm for any $\lambda$ within a suitable range. In Section 5, we show that this holds for many useful constraints. We denote the oracle call by the function
\[ \mathbf{x} \leftarrow \mathrm{oracle}(\lambda). \]
As we mentioned, each $\mathbf{x}$ corresponds to a hyperplane $L(\mathbf{x}, \cdot)$. The cutting plane algorithm works by iteratively finding the $\lambda$ that maximizes the current approximation of the dual, at which the next oracle call is made. Let $S_t$ be the set of solutions computed by the oracle up to iteration $t$; we then compute the maximum of the following linear program over the variables $(\lambda, z)$:
\[ \max_{\lambda \in \Lambda,\, z} \; z \quad \text{s.t.} \quad z \le L(\mathbf{x}, \lambda) \quad \text{for all } \mathbf{x} \in S_t. \]
We let $(\lambda_t, z_t)$ be the optimal solution of this LP. We subsequently call the oracle to obtain an optimal solution $\mathbf{x}_t$ for $\lambda_t$, that is, $\mathbf{x}_t \leftarrow \mathrm{oracle}(\lambda_t)$. If $L(\mathbf{x}_t, \lambda_t) = z_t$, the algorithm terminates with the output $\mathbf{x}_t$; otherwise, it updates $S_{t+1} = S_t \cup \{\mathbf{x}_t\}$ and repeats the procedure. Note that as new hyperplanes are added over the iterations, the optimal value of the LP becomes smaller, and the condition $L(\mathbf{x}_t, \lambda_t) = z_t$ guarantees optimality. Algorithm 1 summarizes the whole procedure.
The running time of the algorithm is dominated by the oracle call and solving the LP. Note that the number of variables of the LP, $k + 1$, is small regardless of the size of the primal problem. Also, the number of constraints is at most $T$, where $T$ is the number of iterations. Note that $T$ is bounded by the number of possible constraint values, which is polynomial in $n$ for all constraints explained in Section 5; for example, it is $O(n)$ for the size constraint. Hence, the running time of our algorithm is polynomial in $n$ for all those constraints. We also observed that in practice the number of iterations is small for almost all cases, thus inducing a very fast running time (see Section 6).
5 Constraints for Matching Statistics
In this section, we explain how our method can be applied to labelling problems in vision such as image segmentation. The energy function of the pairwise random field model for image segmentation can be written as
\[ E(\mathbf{x}) = \sum_{i \in V} \psi_i(x_i) + \sum_{(i,j) \in N} \psi_{ij}(x_i, x_j), \tag{14} \]
where $V$ is the set of all pixels, $N$ is the set of all neighbouring pixel pairs, $x_i \in \{0, 1\}$, $\psi_i$ encodes the likelihood that each pixel belongs to the object or the background, and $\psi_{ij}$ encodes the smoothness of the object boundary. Note that (14) is submodular since all the second order terms are submodular. Functions such as (14) can be efficiently minimized by solving an equivalent st-mincut problem. In the following, we introduce some useful constraints for image segmentation, all of which can be handled by our method.
5.1 Size Constraint (Sz) 
The most natural constraint for segmentation is on the size (area for image segmentation, volume for 3D segmentation) of the object being segmented. This can be represented as
\[ \hat{c}^{\,l} \le \sum_{i \in V} x_i \le \hat{c}^{\,u}. \]
5.2 Boundary Length Constraint (Br)
The number of discontinuities in the labelling is a measure of the length of the object boundary in image segmentation, and of the surface area in 3D reconstruction. This constraint can ensure that the segmentation boundary is smooth. Hence, we suggest the following boundary length constraint:
\[ \hat{c}^{\,l} \le \sum_{(i,j) \in N} |x_i - x_j| \le \hat{c}^{\,u}. \]
Note that $|x_i - x_j| = 1$ only when pixels $i$ and $j$ contribute to the boundary of an object. When we use the boundary length constraint, the search space for the cutting plane algorithm may need to be restricted to ensure submodularity: for a sufficiently negative multiplier $\lambda$ associated with this constraint, the Lagrangian is no longer submodular over $\mathbf{x}$. We can ensure submodularity by truncating the search space such that $\lambda$ always stays above a certain negative threshold. This restriction may appear to compromise accuracy, but we observed that the solution for $\lambda = 0$ (the case of no constraint) resulted in a speckled segmentation whose boundary length was much larger than that of the ground truth in all cases. To obtain a shorter boundary length, we need to search over $\lambda \ge 0$. In other words, the maximum of the dual is found within the restricted search range $\lambda \ge 0$. Hence, this restriction does not affect the accuracy in our case.
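The boundary length statistic itself is straightforward to compute; the following sketch counts 4-neighbourhood label discontinuities on small binary masks (illustrative stand-ins for segmentations), and shows why a speckled labelling has a much larger value than a compact one:

```python
# g_Br(x) = sum over neighbouring pixel pairs of |x_i - x_j|, counted
# here with a 4-neighbourhood on a binary mask.

def boundary_length(mask):
    h, w = len(mask), len(mask[0])
    total = 0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:                                  # horizontal pair
                total += abs(mask[r][c] - mask[r][c + 1])
            if r + 1 < h:                                  # vertical pair
                total += abs(mask[r][c] - mask[r + 1][c])
    return total

square = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
assert boundary_length(square) == 8     # compact 2x2 blob: short boundary

speckle = [[1, 0, 1],
           [0, 1, 0],
           [1, 0, 1]]
assert boundary_length(speckle) == 12   # speckled mask: long boundary
```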
5.3 Mean Constraint (Mn) 
The center of the object to be segmented can easily be specified by the user by drawing a circle roughly containing the object. This information can be used to define constraints on the mean horizontal and vertical coordinates of the pixels belonging to the object. Enforcing the mean statistic involves the following constraint:
\[ \mu^{l} \le \frac{\sum_{i \in V} p_i \, x_i}{\sum_{i \in V} x_i} \le \mu^{u}, \]
where $\mu^{l} \le \mu^{u}$, and $p_i$ stands for either the horizontal coordinate $p^{h}_i$ or the vertical coordinate $p^{v}_i$ of a pixel $i$ (the constraint is imposed on each coordinate). Note that (Mn) can be rewritten using a slack variable as follows:
\[ \sum_{i \in V} (p_i - s) \, x_i = 0, \tag{16} \]
where $s \in [\mu^{l}, \mu^{u}]$. Note that if we consider the dual with the mean constraint (16), the optimal slack $s$ is determined only by the sign of the corresponding multiplier $\lambda$: $s = \mu^{u}$ for $\lambda > 0$, $s = \mu^{l}$ for $\lambda < 0$, and $s$ can be any number in $[\mu^{l}, \mu^{u}]$ for $\lambda = 0$. Consequently, (16) can be handled as a linear equality constraint by our method.
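This slack elimination can be checked numerically: the term contributed to the Lagrangian is linear in the slack, so its minimizer over the interval sits at an endpoint chosen by the sign of the multiplier. A sketch with illustrative coordinates, labelling, and bounds:

```python
# The Lagrangian term for the mean constraint, lam * sum_i (p_i - s) * x_i,
# is linear in the slack s; over s in [mu_lo, mu_hi] it is minimized at an
# endpoint determined by the sign of lam (sum_i x_i >= 0).

def optimal_slack(lam, mu_lo, mu_hi):
    if lam > 0:
        return mu_hi
    if lam < 0:
        return mu_lo
    return mu_lo  # any value in [mu_lo, mu_hi] is optimal when lam == 0

def term(lam, s, coords, x):
    return lam * sum((p - s) * xi for p, xi in zip(coords, x))

coords, x = [0, 1, 2, 3], [0, 1, 1, 0]   # illustrative pixel data
for lam in (-2.0, -0.5, 0.7, 3.0):
    s_star = optimal_slack(lam, 1.0, 2.0)
    grid_best = min(term(lam, 1.0 + 0.1 * k, coords, x) for k in range(11))
    assert term(lam, s_star, coords, x) <= grid_best + 1e-9
```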
5.4 Variance Constraint (Vr) 
If we have knowledge of the center of an object, we can also impose a variance constraint on the distances of the object pixels from the center as follows:
\[ \sigma^{l} \le \frac{\sum_{i \in V} \big( (p^{h}_i - \mu_h)^2 + (p^{v}_i - \mu_v)^2 \big) \, x_i}{\sum_{i \in V} x_i} \le \sigma^{u}, \]
where $\sigma^{l} \le \sigma^{u}$, $p^{h}_i$ and $p^{v}_i$ are the horizontal and vertical coordinates of a pixel $i$, and $(\mu_h, \mu_v)$ denotes the coordinate mean of the object. This constraint can also be handled in the same manner as Mn.
5.5 Covariance Constraint (Cv)
Similarly, a covariance constraint can also be enforced as
\[ \rho^{l} \le \frac{\sum_{i \in V} (p^{h}_i - \mu_h)(p^{v}_i - \mu_v) \, x_i}{\sum_{i \in V} x_i} \le \rho^{u}. \]
5.6 Local Size Constraint (Lsz)
This is a generalization of the size constraint (Sz): instead of fixing the size of the object globally, it fixes the size locally. For instance, we divide the image into subimages and impose a size constraint on each subimage. If size information is available locally, it is more powerful than the global size constraint; for instance, we can decompose an object having a complex boundary into parts with simpler boundaries and impose a size constraint on each part. In general, subimages are allowed to overlap and need not cover the whole image. When we enforce the size constraint on $m$ subimages $V_1, \dots, V_m \subseteq V$, the local size constraint can be represented as follows: for $j = 1, \dots, m$,
\[ \hat{c}^{\,l}_j \le \sum_{i \in V_j} x_i \le \hat{c}^{\,u}_j. \]
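Local size statistics are simply per-window sums; a sketch with a hypothetical mask and quadrant windows:

```python
# Each subimage window contributes its own size statistic
# g_j(x) = sum of x_i over the window; windows may overlap.
# The mask and windows below are hypothetical.

def local_sizes(mask, windows):
    # windows: list of (r0, r1, c0, c1) half-open row/column ranges
    return [sum(mask[r][c] for r in range(r0, r1) for c in range(c0, c1))
            for (r0, r1, c0, c1) in windows]

mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
quadrants = [(0, 2, 0, 2), (0, 2, 2, 4), (2, 4, 0, 2), (2, 4, 2, 4)]
assert local_sizes(mask, quadrants) == [1, 1, 1, 1]
# disjoint windows covering the image recover the global size statistic
assert sum(local_sizes(mask, quadrants)) == sum(map(sum, mask))
```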
All the constraints above, except Br, are linear. Note that for a linear constraint, we can adopt any search range in the cutting plane algorithm, since linear constraints do not break submodularity. We can therefore use any combination of these constraints with an appropriate search range. Figure 1 shows a sequence of segmentations produced by our method while maximizing the dual. Figure 2 shows some examples of segmentations improved by imposing constraints.
6 Experiments

We now evaluate the performance of our algorithm on the image segmentation problem using images from the GrabCut dataset. The first experiment checks the accuracy of the segmentation results obtained with different combinations of constraints. We also compare the results of our method with those obtained by the continuous relaxation based methods of [15, 19]. To measure the quality of the segmentations, we compute the percentage of erroneously labelled pixels with respect to the ground truth (GT), denoted ER. The experiments were run on a server with a 2.8 GHz Quad-Core Intel i7 930 and 24 GB of memory.
6.1 Effect of Imposing Constraints
Figure 3 shows segmentation results corresponding to various combinations of constraints. We find that the variance constraint (Vr) is quite effective: results with constraint combinations that include the variance constraint have less error than the other results. Imposing the boundary length constraint (Br) affected the smoothness of the segmentation. For instance, the ER values of segmentations with the boundary length/covariance constraints (Br, Cv) and the variance constraint (Vr) are nearly the same, but the former is much smoother than the latter. Table 1 shows quantitative results of our method. We assumed that statistics of the ground truth such as area, mean, and variance are known, and set inequality gaps in intervals of and from the ground truth values. We considered various combinations of constraints. As expected, imposing any constraint improves the segmentation accuracy on average. From the table, we can see that the size (Sz) and variance (Vr) constraints are especially powerful for image segmentation. Note that the variance constraint (Vr) alone results in a highly accurate segmentation. We speculate that this is because the variance constraint causes the segmentation result to be rounder and reduces noisy fragments. We provide more segmentation results of our method with various constraint combinations in the Appendix.
6.2 Comparison with Continuous Relaxation
We compared the proposed method with two continuous relaxation based methods.
6.2.1 Linear Relaxation (LR) 
The first relaxes the problem to a linear program using additional variables, as follows.
where . This LP relaxation can handle all the constraints described in Section 5, but it cannot deal with a both-sided inequality or an equality boundary length constraint (Br), because each of these breaks the convexity of the feasible region. Moreover, the relaxed value of the boundary length term is not exactly the same as the true number of discontinuities, so it is not reasonable to use it as the value of the boundary length statistic. Hence, the boundary length constraint (Br) cannot be properly handled by LR.
6.2.2 Quadratic Relaxation (QR) 
For this relaxation, we first write our energy function as follows.
Note that $x_i^2 = x_i$ since $x_i \in \{0, 1\}$. The relaxed problem can then be formulated as a convex quadratic program. This quadratic relaxation can handle all the constraints described in Section 5 except the boundary length constraint (Br). As in LR, a both-sided inequality or an equality boundary length constraint (Br) breaks the convexity of the feasible region.
We implemented the LR and QR based methods using CPLEX Optimizer with MATLAB interface, and implemented our method by combining MATLAB and C++ codes. Table 2 summarizes the comparison between the proposed approach and the relaxation based methods outlined above.
For all constraint combinations, our method produces segmentations with a lower percentage pixel error (ER) than those obtained by LR and QR. Further, our method is extremely fast: times faster than QR and times faster than LR. Figure 4 shows some example segmentations from the three methods. We provide more segmentation results comparing our method with LR and QR in the Appendix.
Recall that our method runs by calling the oracle and solving the LP at each iteration. Since we consider the pixel grid graph, the oracle call, which involves minimizing a submodular pseudo-boolean function using the graph cut algorithm, takes very little time. The number of variables constituting the LP for our approach is small. Furthermore, for all constraint combinations in Table 2, the total number of iterations does not exceed (average: , std.: ), making the number of constraints of the LP small. The small size of the LP enables our method to run much faster than LR, which has to solve a relatively large linear programming problem.
The results of our experiments comparing the computational complexity of the three methods (ours, LR, QR) are shown in Figure 6. We used four images, and each image was scaled down to smaller images of size , . Each data point denotes the average running time of the specified method over images of the size given on the x-axis. It can be seen that our method is substantially faster than the state-of-the-art LR and QR based approaches. Figure 5 shows segmentation results yielded by our method for original images of size .
Table 3 shows how well the solutions of the three methods satisfy the given constraints. Here, we used two measures: and . denotes the percentage of constraints that are unsatisfied. For instance, if the size (Sz) and variance (Vr) constraints are considered and one of them is not satisfied, then is counted as . represents the deviation of the constraint values from the inequality gap, thus providing a measure of the degree of satisfaction. For instance, let the inequality gap for Sz be and the size of a computed solution be ; is then counted as if and as if . We observed that the solutions computed by the three methods satisfy the constraints, or are very close to the constraint gap, in general.
7 Discussion and Conclusions
We have proposed a novel approach to constrained energy minimization. Our method is efficient and can handle a very large class of constraints, including those that enforce second order statistics, which cannot be achieved using previously proposed methods. To handle such second order constraints (the boundary length constraint in our paper), we use a restricted search region in which the dual is computable. In our experiments, the dual maximum was always found inside that restricted search region, which means that the restriction does not affect the accuracy of our method. Although our approach can be used with continuous relaxation based methods [15, 19], this would require repeated solutions of the relaxed problem to find the dual maximum and would be much slower than our method, as shown in Section 6. We evaluated our method on image segmentation with the constraints described in Section 5. Our method produced segmentations with less error, in much less time, compared to state-of-the-art continuous relaxation based methods.
-  A. Blake, C. Rother, M. Brown, P. Pérez, and P. Torr. Interactive image segmentation using an adaptive GMMRF model. In ECCV, 2004.
-  E. Boros and P. Hammer. Pseudo-boolean optimization. Discrete Applied Mathematics, 2002.
-  Y. Boykov and M. Jolly. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In ICCV, 2001.
-  Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 2001.
-  M. Guignard. Lagrangean relaxation. TOP, 11, 2003.
-  M. Klodt and D. Cremers. A convex framework for image segmentation with moment constraints. In ICCV, 2011.
-  P. Kohli, M. Kumar, and P. Torr. P3 & beyond: Solving energies with higher order cliques. In CVPR, 2007.
-  P. Kohli and M. P. Kumar. Energy minimization for linear envelope MRFs. In CVPR, pages 1863–1870, 2010.
-  P. Kohli, L. Ladicky, and P. Torr. Robust higher order potentials for enforcing label consistency. In CVPR, 2008.
-  K. Kolev and D. Cremers. Integration of multiview stereo and silhouettes via convex functionals on convex domains. In ECCV, 2008.
-  V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. PAMI, 2006.
-  V. Kolmogorov, Y. Boykov, and C. Rother. Application of parametric maxflow in computer vision. In ICCV, 2007.
-  V. Kolmogorov and R. Zabih. What energy functions can be minimized using graph cuts? In ECCV, 2002.
-  N. Komodakis, N. Paragios, and G. Tziritas. MRF optimization via dual decomposition: message-passing revisited. In ICCV, 2007.
-  V. Lempitsky, P. Kohli, C. Rother, and T. Sharp. Image segmentation with a bounding box prior. In ICCV, 2009.
-  Y. Lim, K. Jung, and P. Kohli. Energy minimization under constraints on label counts. In ECCV, 2010.
-  S. Nowozin and C. Lampert. Global connectivity potentials for random field models. In CVPR, 2009.
-  P. Pletscher, S. Nowozin, P. Kohli, and C. Rother. Putting MAP back on the map. In DAGM-Symposium, pages 111–121, 2011.
-  P. D. Ravikumar and J. D. Lafferty. Quadratic programming relaxations for metric labeling and Markov random field MAP estimation. In ICML, 2006.
-  S. Roth and M. Black. Fields of experts: A framework for learning image priors. In CVPR, 2005.
-  C. Rother, V. Kolmogorov, and A. Blake. "GrabCut": interactive foreground extraction using iterated graph cuts. ACM Trans. Graph., 2004.
-  S. Sinha and M. Pollefeys. Multi-view reconstruction using photo-consistency and exact silhouette constraints: A maximum-flow formulation. In ICCV, 2005.
-  R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields. In ECCV, 2006.
-  S. Vicente, V. Kolmogorov, and C. Rother. Graph cut based image segmentation with connectivity priors. In CVPR, 2008.
-  G. Vogiatzis, P. Torr, and R. Cipolla. Multi-view stereo via volumetric graph-cuts. In CVPR, 2005.
-  T. Werner. High-arity interactions, polyhedral relaxations, and cutting plane algorithm for soft constraint optimisation (MAP-MRF). In CVPR, 2008.
-  O. Woodford, C. Rother, and V. Kolmogorov. A global perspective on MAP inference for low-level vision. In ICCV, 2009.
-  J. Yedidia, W. Freeman, and Y. Weiss. Generalized belief propagation. In NIPS, 2001.
A Segmentation Results of Our Method with Various Constraint Combinations
In the following, segmentation results by our method with various constraint combinations are shown.
B Segmentation Results of Our Method, LR and QR
In the following, segmentation results by our method, LR and QR are shown.