# Multi-dimensional Parametric Mincuts for Constrained MAP Inference

In this paper, we propose novel algorithms for inferring the Maximum a Posteriori (MAP) solution of discrete pairwise random field models under multiple constraints. We show how this constrained discrete optimization problem can be formulated as a multi-dimensional parametric mincut problem via its Lagrangian dual, and prove that our algorithm isolates all constraint instances for which the problem can be solved exactly. These multiple solutions even enable us to deal with 'soft constraints' (higher order penalty functions). Moreover, we propose two practical variants of our algorithm to solve problems with hard constraints. We also show how our method can be applied to solve various constrained discrete optimization problems such as submodular minimization and shortest path computation. Experimental evaluation on the foreground-background image segmentation problem with statistic constraints shows that our method is faster, and its results are closer to the ground truth labellings, than popular continuous relaxation based methods.


## 1 Introduction

A Markov Random Field (MRF) is an undirected graphical model that has been extensively studied and used in various fields, including statistical physics [11] and computer vision [19]. It represents the interdependency of discrete random variables as a graph over which a probability space is defined. Computing the solution with the maximum probability under the random field, known as Maximum a Posteriori (MAP) inference, is NP-hard in general. However, a number of subclasses of MRFs have been isolated for which the problem can be solved in polynomial time [2]. Further, a number of heuristics and approximation algorithms based on belief propagation [34], tree-reweighted message passing [33], and graph cuts [3] have also been proposed for the problem. Such algorithms are widely used for various problems in machine learning and computer vision [38, 17]. Since MAP inference in an MRF is equivalent to minimizing the corresponding energy function (the energy of a labelling is the negative logarithm of its posterior probability), in what follows we describe these problems in terms of energy minimization.

In many real world problems, the values of certain statistics of the desired solution may be available as prior knowledge. For instance, in the case of foreground-background image segmentation, we may know the approximate shape and/or size of the object being segmented, and thus might want to find the most probable segmentation that has a particular area (number of foreground pixels) and boundary length (number of discontinuities). Another example is community detection in a network [8] where we may know the number of nodes belonging to each community. Such scenarios result in constraints in the solution space, and MAP inference becomes a constrained energy minimization problem, which is generally NP-hard even if the unconstrained version is polynomial time solvable.

Energy minimization under the above-mentioned statistics constraints is a challenging optimization problem. However, recent work in computer vision has shown that this problem can be handled efficiently using parametric mincuts [14], which allow simultaneous computation of exact solutions for some constraint instances. Although parametric mincuts provide a general framework for constrained energy minimization, they can handle only one linear equality constraint.

For minimizing energy functions under multiple constraints, a number of continuous relaxation based methods have been proposed in the literature. For instance, linear relaxation approaches were adopted to handle bounding-box and connectivity constraints defined on the labelling [18, 24]. Further, Klodt and Cremers [12] proposed a convex relaxation framework to deal with moment constraints. Continuous relaxation based methods have also been used for constrained discrete optimization, and can handle multiple inequality constraints. All the above-mentioned methods suffer from the following basic limitations: they handle only linear constraints, and the solution is obtained by rounding the solution of the relaxed problem, which may introduce large errors.

### 1.1 Our contribution

In this paper we show how the constrained discrete optimization problem associated with constrained MAP inference can be formulated as a multi-dimensional parametric mincut problem via its Lagrangian dual, and propose an algorithm that isolates all constraint instances for which the problem can be solved exactly. This leads to densely many minimizers, each of which is optimal under a distinct constraint instance. These minimizers can be used to compute good approximate solutions of problems with soft constraints (enforced with a higher order term in the energy).

Our algorithm works by exploiting the Lagrangian dual of the minimization problem, and requires an oracle which can compute values of the Lagrangian dual efficiently. A graph-cut algorithm [3] is a popular example of such an oracle. In fact, our algorithm generalizes the (one-dimensional) parametric mincuts [5, 14] to multiple dimensions. In contrast to the parametric mincuts [5], our algorithm can deal with multiple constraints simultaneously, including some non-linear constraints (as we show in the paper). This extension allows our algorithm to be used as a technique for multi-dimensional sampling, e.g. to obtain different segmentation results for image segmentation as done in [4].

We propose two variants of our algorithm to efficiently deal with the problem of performing MAP inference under hard constraints. The first variant computes the maximum of the dual and outputs its corresponding primal solution as an approximation of the constrained minimization. The primal is computed using selective oracle calls, leading to fast computation time. The other variant combines the first variant with our multi-dimensional parametric mincuts algorithm to deal with problems with soft constraints, which allows us to find a solution closer to a desired one via additional search.

Our method is quite general and can be applied to any constrained discrete optimization problem whose Lagrangian dual value is efficiently computable. Examples include submodular function minimization with constraints, such as the balanced minimum cut problem, and constrained shortest path problems. Further, in contrast to traditional continuous relaxation based methods, our technique can easily handle complicated soft constraints.

In Section 5, we demonstrate on the foreground-background image segmentation problem that our algorithms compute solutions closer to the ground truth than these continuous relaxation based methods.

### 1.2 Related work

A number of methods have been proposed to obtain better labelling solutions by inferring the MAP solution from a restricted domain of solutions which satisfy some constraints. Among them, solutions to image labelling problems which have a particular distribution of labels [36] or satisfy a topological property like connectivity [32] have been widely studied.

More specifically, for the problem of foreground-background image segmentation, most probable segmentations under the label count constraint have been shown to be closer to the ground truth [14, 20]. Another example is the silhouette constraint which has been used for the problem of 3D reconstruction [13, 29]. This constraint ensured that a ray emanating from any silhouette pixel must pass through at least one voxel which belongs to the ‘object’.

Recently, dual decomposition has been proposed for constrained MAP inference [7, 37]. Gupta et al. [7] dealt with cardinality-based clique potentials and developed both exact and approximate algorithms. Also Woodford et al. [37] studied a problem involving marginal statistics such as the area constraint especially with convex penalties, and showed that the proposed method improves quality of solutions for various computer vision problems.

MAP inference under constraints has also been applied to combinatorial optimization problems such as balanced metric labelling. For this problem, Naor and Schwartz [23] obtained an approximation algorithm for the setting where each label is assigned to at most a prescribed number of variables/nodes.

## 2 Setup and preliminaries

### 2.1 Energy minimization

A Markov Random Field (MRF) defined on a graph $G=(V,E)$ is a probability distribution in which every vertex $v\in V$ has a corresponding random variable $x_v$ taking a value from a finite label set $\mathcal{L}$. The probability distribution is defined as $\Pr(x)\propto\exp(-f(x))$, where $x=(x_v)_{v\in V}$, and the corresponding energy function $f$ has the following form:

$$f(x)=\sum_{c\in\mathcal{C}_G}\phi_c(x_c), \tag{1}$$

where $\mathcal{C}_G$ is the set of cliques in $G$ and $\phi_c$ is a potential defined over the clique $c$. The MAP problem is to find an assignment $x$ which has the maximum probability, and is equivalent to minimizing the corresponding energy function $f$. In general it is NP-hard to minimize $f$, but it is known that if $f$ is submodular, it can be minimized in polynomial time. In particular, if $f$ is a pairwise submodular energy function defined on binary variables, which considers only cliques of size up to 2, i.e.

$$f(x)=\sum_{u\in V}\phi_u(x_u)+\sum_{(u,v)\in E}\phi_{uv}(x_u,x_v), \tag{2}$$

it can be efficiently minimized by solving an equivalent st-mincut problem [15]. Such an $f$ is widely used in machine learning and computer vision [9, 20].

### 2.2 Energy minimization with constraints

Energy minimization with constraints is the problem of computing a solution that minimizes an energy function among the $x$'s satisfying given constraints. A typical example is the label count constraint $\sum_{i\in V}x_i=b$, where $b$ is the desired number of variables labelled 1.

In this paper, we consider the following energy minimization with multiple constraints.

$$\min_{x\in\{0,1\}^n}\{f(x) : h_i(x)=b_i,\ 1\le i\le m\}, \tag{3}$$

where $b_i$ is a constant and $h_i$ is a function on $\{0,1\}^n$ for $1\le i\le m$. In (3), each constraint encodes distinct prior knowledge on a desired solution. For convenience, we write $H(x)=(h_1(x),\dots,h_m(x))^T$ and $b=(b_1,\dots,b_m)^T$.

Let us consider the following Lagrangian dual of , which is widely used for discrete optimization [16, 30].

$$g(\lambda)=\min_{x\in\{0,1\}^n}L(x,\lambda), \tag{4}$$

where

$$L(x,\lambda)=f(x)+\lambda^T(H(x)-b). \tag{5}$$

Note that $g$ is defined over a continuous space while $f$ is defined over a discrete space. As in continuous minimization, maximizing $g$ over $\lambda$ provides a lower bound for (3). Now we define the characteristic set, which is the collection of minimizers of (5) over all $\lambda\in\mathbb{R}^m$.
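The lower-bound property can be checked numerically on a tiny instance. The following is a minimal sketch, assuming a hypothetical three-variable chain energy and a single label count constraint; the brute-force enumeration stands in for the graph-cut oracle and is only viable for very small $n$.

```python
from itertools import product

def energy(x):
    # Hypothetical pairwise energy on a 3-node chain; all numbers are illustrative.
    unary = [1.0, -2.0, 0.5]
    pair = 1.5  # cost for each pair of neighbours with different labels
    return (sum(u * xi for u, xi in zip(unary, x))
            + pair * sum(abs(x[i] - x[i + 1]) for i in range(len(x) - 1)))

def H(x):
    # One constraint statistic: the label count h_1(x) = sum_i x_i.
    return (sum(x),)

def lagrangian(x, lam, b):
    # L(x, lam) = f(x) + lam^T (H(x) - b), as in (5).
    return energy(x) + sum(l * (h - bi) for l, h, bi in zip(lam, H(x), b))

def dual(lam, b, n=3):
    # Brute-force stand-in for the oracle: g(lam) = min_x L(x, lam), as in (4).
    return min(lagrangian(x, lam, b) for x in product((0, 1), repeat=n))

def constrained_min(b, n=3):
    # Exact value of (3) by enumeration of the feasible labellings.
    feasible = [x for x in product((0, 1), repeat=n) if H(x) == tuple(b)]
    return min(energy(x) for x in feasible)

b = (2,)
primal = constrained_min(b)
bounds = [dual((lam,), b) for lam in (-3.0, -1.0, 0.0, 1.0, 3.0)]
print(primal, bounds)
```

Every entry of `bounds` is at most `primal`, whichever multiplier is chosen; maximizing over $\lambda$ tightens the bound.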

###### Definition 1 (Characteristic Set).

The Characteristic Set $\chi_g$ is defined by

$$\chi_g=\bigcup_{\lambda\in\mathbb{R}^m}\operatorname*{argmin}_{x\in\{0,1\}^n}L(x,\lambda). \tag{6}$$

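###### Lemma 1.

Let $x^*\in\chi_g$ and $b^*=H(x^*)$. Then $x^*$ is a minimizer of (3) for the constraint instance $b=b^*$ [6].

###### Proof.

Suppose that $x$ satisfies $H(x)=b^*$. It implies $L(x,\lambda)=f(x)$ for any $\lambda$ when $b=b^*$. Since $x^*\in\chi_g$, $x^*\in\operatorname{argmin}_{x}L(x,\lambda^*)$ for some $\lambda^*$. Thus, from (5), $f(x^*)=L(x^*,\lambda^*)\le L(x,\lambda^*)=f(x)$. ∎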

In this paper, we develop a novel algorithm to compute the characteristic set . We will show that if the dual is efficiently computable for any fixed , for example, when is submodular on , our algorithm computes by evaluating for number of . One implication of is

$$g(\lambda)=\min_{x\in\{0,1\}^n}L(x,\lambda)=\min_{x\in\chi_g}L(x,\lambda), \tag{7}$$

meaning that indeed depends on a much smaller set . Note that does not depend on the constraint instance , thus, in the remaining of the paper, we regard unless there is explicit specification. In Section 5, we will show that is polynomially bounded in for many constraints corresponding to useful statistics of the solution. Through experiments, we will show that is densely many among all possible constraint instances by an example of image segmentation.
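To make the characteristic set concrete, the sketch below approximates it on a toy instance by sampling multipliers on a grid and collecting every minimizer of the Lagrangian. The energy, the two statistics, and the grid are hypothetical choices; grid sampling is not the algorithm of this paper and may miss minimizers that are optimal only between grid points. The final check verifies the property proved above: each collected labelling minimizes $f$ among all labellings with the same statistics.

```python
from itertools import product

UNARY = [1.0, -2.0, 0.5]  # hypothetical unary costs of label 1
PAIR = 1.5                # hypothetical cost per disagreeing neighbour pair (chain)
LABELLINGS = list(product((0, 1), repeat=len(UNARY)))

def f(x):
    return (sum(u * xi for u, xi in zip(UNARY, x))
            + PAIR * sum(abs(a - b) for a, b in zip(x, x[1:])))

def H(x):
    # Two statistics: label count and number of label changes along the chain.
    return (sum(x), sum(abs(a - b) for a, b in zip(x, x[1:])))

def L(x, lam):
    # Lagrangian with b = 0; b only shifts L by a constant independent of x.
    return f(x) + sum(l * h for l, h in zip(lam, H(x)))

def sampled_characteristic_set(grid):
    chi = set()
    for lam in product(grid, repeat=2):
        best = min(L(x, lam) for x in LABELLINGS)
        # Keep all minimizers, including ties (values here are exact dyadic floats).
        chi.update(x for x in LABELLINGS if L(x, lam) == best)
    return chi

grid = [k / 4 for k in range(-16, 17)]  # each multiplier in [-4, 4], step 0.25
chi = sampled_characteristic_set(grid)

def constrained_min(b):
    return min(f(x) for x in LABELLINGS if H(x) == b)

lemma_holds = all(f(x) == constrained_min(H(x)) for x in chi)
print(len(chi), lemma_holds)
```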

Note that if we can compute minimizers of (3) for densely many constraint instances $b$, we can obtain a good approximate solution for the following soft-constrained problem with any global penalty function $\rho$:

$$\min_{x\in\{0,1\}^n}\{f(x)+\rho(H(x)-\hat{b})\}. \tag{8}$$

In (8), $\hat{b}$ encodes our prior knowledge on a solution, and examples of $\rho$ include sigmoid functions. This kind of soft-constrained optimization has been widely used in the form of lasso regularization and ridge regression, and also in computer vision [17, 31].
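The way a pool of minimizers yields an approximate solution of (8) can be sketched as follows. This is an illustration under assumptions: the chain energy, the squared penalty `rho`, the target `b_hat`, and the handful of multipliers are all hypothetical, and the pool is gathered by brute force rather than by the algorithm described later.

```python
from itertools import product

UNARY, PAIR = [2, -3, 1, -1], 2  # hypothetical chain energy
LABELLINGS = list(product((0, 1), repeat=len(UNARY)))

def f(x):
    return (sum(u * xi for u, xi in zip(UNARY, x))
            + PAIR * sum(abs(a - b) for a, b in zip(x, x[1:])))

def H(x):
    return sum(x)  # label count statistic

def rho(d):
    return 3 * d * d  # hypothetical squared penalty

def soft_objective(x, b_hat):
    # Objective of (8): f(x) + rho(H(x) - b_hat).
    return f(x) + rho(H(x) - b_hat)

# Pool of Lagrangian minimizers collected at a few multipliers (b = 0).
candidates = {min(LABELLINGS, key=lambda x: f(x) + lam * H(x))
              for lam in (-4, -2, -1, 0, 1, 2, 4)}

b_hat = 3
approx = min(candidates, key=lambda x: soft_objective(x, b_hat))
exact = min(LABELLINGS, key=lambda x: soft_objective(x, b_hat))
print(soft_objective(approx, b_hat), soft_objective(exact, b_hat))
```

The value attained by `approx` is an upper bound on the true optimum of (8); the denser the pool of constraint instances, the smaller the gap can be expected to be.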

### 2.3 Generalization

Although we describe our method for problems involving pseudo-Boolean objective functions (real-valued functions defined over boolean vectors), there is a class of multi-label functions to which our method can be applied. For instance, the results of [26] show transformation of any multi-label submodular functions of order up to to a pairwise submodular one, meaning that it can be solved by the graph-cut algorithm. This enables us to handle the following type of constraints, which is the analogue of linear constraints in binary cases: for each ,

$$h_j(x)=\sum_{i\in V}a_{ij}\,\delta_{x_i,j}=b_j, \tag{9}$$

where $\delta$ is the Kronecker delta function.

Our method is also applicable to any constrained combinatorial optimization problem whose Lagrangian dual $g(\lambda)$ is efficiently computable. We discuss this in more detail in Section 5.2.

## 3 Computing the Characteristic Set

### 3.1 Algorithm description

In this section, we describe our algorithm that computes the characteristic set . We assume that for a given set where for all , there is an oracle to compute the Lagrangian dual efficiently for any . For simplicity of explanation, we assume for some . We denote the oracle call by

$$O(\lambda)=\operatorname*{argmin}_{x\in\{0,1\}^n}L(x,\lambda). \tag{10}$$

Essentially, our algorithm iteratively decides the ’s in for which the oracle will be called. Later we prove that the number of oracle calls in our algorithm to compute is polynomial in .

We first define the following, which has a central role in our algorithm.

###### Definition 2 (Induced dual of g on X).

Let be the Lagrangian dual of , and . The induced dual of is defined by

$$g_X(\lambda)=\min_{x\in X}L(x,\lambda). \tag{11}$$

From the definition of $\chi_g$, note that $g=g_{\chi_g}$. For each $x\in\{0,1\}^n$, we define a hyperplane $P_x$ by

$$P_x=\{(\lambda,z)\in\mathbb{R}^{m+1} : \lambda\in\mathbb{R}^m,\ z=L(x,\lambda)\}. \tag{12}$$

For , we use the notation so that . For convenience, we will denote any by , where is the first coordinates of and is the -th coordinate of . Since is finite and each corresponds to a hyperplane in -dimension, consists of the boundary of the upper polytope of (4). Then corresponds to the collection of -dimensional facets of this polytope.

To compute , we will recursively update a structure called the skeleton of defined below. Intuitively, the skeleton of is the collection of vertices and edges of the polytope corresponding to .

###### Definition 3 (Proper convex combination).

Given , is a proper convex combination of if for some with .

###### Definition 4 (Skeleton of gX over S).

For a given induced dual , let , and for , is the line segment connecting and . The skeleton of is satisfying the followings.

1. of , then .

2. is a proper convex combination of , then .

Our algorithm runs by updating and iteratively. If a new minimizer is computed by the oracle call, it is inserted to and the algorithm computes . Then, the algorithm determines new ’s for which the oracle will be called from the new vertices added to . We prove in Theorem 1 that at the end of the algorithm, .

Initially, the algorithm begins with where is the output of the oracle call for an arbitrary . The initial skeleton is given by where and for ; and . Note that , i.e. the skeleton of . This initialization is denoted by and it returns and .

In each iteration with the skeleton , the algorithm chooses any vertex , and checks whether using the oracle call for . If , we confirm that and . If not, computed from the oracle satisfies . Then, the algorithm computes a new skeleton as explained below.

Let . To compute , geometrically we cut by . This can be done by finding the set of skeleton vertices of strictly above , and finding the set of all intersection points between and . Then, is removed from , and is added to . Lastly, the set of edges of the convex hull of , which is denoted by , is added to (for a given , can be computed, for example, by [1]; in general, for given -dimensional points, a convex hull algorithm outputs a set of dimensional facets of the convex hull, and the edges of the convex hull can then be obtained by recursively applying the algorithm to every computed facet). Then, the updated is . Due to the concavity of , we can compute all the above sets by a depth-first or breadth-first search starting from . Algorithm 1 describes the whole procedure.

##### Example of execution

We explain the running process of DualSearch with a toy example. Let us consider an energy function , and two constraints and defined as follows.

$$h_1(x_1,x_2)=x_1-x_2, \tag{13}$$

$$h_2(x_1,x_2)=2|x_1-x_2|. \tag{14}$$

Here, we set . Initially, the algorithm computes a minimizer for . Then the initial becomes , which is shown in Figure 1. At this point, . Let be chosen in the next iteration, and for that vertex, the new minimizer is found. This updates both and the skeleton as shown in Figure 1. In the following iterations, and are chosen, but for those vertices, there is no new minimizer; that is, for those vertices, a minimizer is either or . The skeleton at this point is shown in Figure 1. Next, is chosen, and the new minimizer is computed so that is updated by . This changes the skeleton as in Figure 1.
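The search loop can be illustrated in one dimension, where the skeleton reduces to the breakpoints of a concave piecewise-linear function on an interval. The sketch below is a simplified analogue and not Algorithm 1 itself: it assumes a single label count constraint, a hypothetical chain energy, a brute-force oracle in place of graph cuts, and it recomputes all breakpoints after each insertion instead of cutting the skeleton incrementally. Exact rational arithmetic (`Fraction`) avoids tolerance issues.

```python
from fractions import Fraction
from itertools import product

UNARY = [2, -3, 1, -1]  # hypothetical unary costs of label 1
PAIR = 2                # hypothetical cost per disagreeing neighbour pair (chain)

def f(x):
    return (sum(u * xi for u, xi in zip(UNARY, x))
            + PAIR * sum(abs(a - b) for a, b in zip(x, x[1:])))

def h(x):
    return sum(x)  # label count statistic

def L(x, lam):
    return f(x) + lam * h(x)  # b is taken as 0

def oracle(lam):
    # Brute-force stand-in for a graph-cut oracle.
    return min(product((0, 1), repeat=len(UNARY)), key=lambda x: L(x, lam))

def g_X(X, lam):
    # Induced dual (11) on the current set X.
    return min(L(x, lam) for x in X)

def vertices(X, M):
    # Breakpoints of the induced dual on [-M, M], plus the two endpoints.
    vs = {Fraction(-M), Fraction(M)}
    for x in X:
        for y in X:
            if h(x) != h(y):
                lam = Fraction(f(y) - f(x), h(x) - h(y))  # where the two lines cross
                if -M <= lam <= M and L(x, lam) == g_X(X, lam):
                    vs.add(lam)
    return sorted(vs)

def dual_search_1d(M):
    X = {oracle(Fraction(0))}
    confirmed, calls = set(), 1
    while True:
        pending = [v for v in vertices(X, M) if v not in confirmed]
        if not pending:
            return X, calls
        lam = pending[0]
        x = oracle(lam)
        calls += 1
        if L(x, lam) < g_X(X, lam):
            X.add(x)            # new minimizer found: the envelope changes
        else:
            confirmed.add(lam)  # this vertex already lies on the true dual

X, calls = dual_search_1d(M=10)
print(sorted(X), calls)
```

On termination the induced dual agrees with the true dual at every breakpoint, and hence on the whole interval by concavity; each oracle call either adds a minimizer or confirms a vertex, mirroring the query analysis in the text.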

### 3.2 Correctness of the algorithm

In what follows, we analyze the correctness and query complexity of DualSearch. All proofs are provided in Section A.

###### Lemma 2.

At the end of each iteration of DualSearch, .

Lemma 2 states that when DualSearch terminates, is the skeleton of an induced dual where is the output of the algorithm. It remains to show that the computed is indeed the characteristic set .

###### Theorem 1.

When DualSearch terminates with , .

From Lemma 2 and Theorem 1, the following holds.

###### Corollary 1.

When DualSearch terminates, .

Now we analyze the query complexity. At each iteration, the algorithm uses exactly one oracle call. Then, either one new is identified if Line of Algorithm 1 is not satisfied, or one new is obtained if Line is satisfied. Using these facts, we prove the following theorem.

###### Theorem 2.

The number of oracle calls in DualSearch is .

Recall that each corresponds to a facet of the -dimensional convex polytope of . Since each vertex is determined as the intersection of at least facets, at the end of our algorithm, is bounded by . Thus, the query complexity becomes .

## 4 Algorithms for a specific constraint instance

In this section, we propose two variants of DualSearch to compute an approximate solution for a specific constraint instance. The first one is called DualMax, and the second one is AdaptSearch which combines DualMax and DualSearch. While DualSearch essentially does not need prior knowledge, DualMax and AdaptSearch explicitly use a given prior knowledge for more efficient computation.

### 4.1 DualMax

Given , this algorithm finds the maximum of the dual , which provides a lower bound of (3). If a corresponding minimizer of (3) is in , this algorithm finds that minimizer efficiently. Even though the corresponding minimizer is not in , the algorithm finds a lower bound of the minimum, which is a good approximate solution as shown in Section 5.1.2.

The main difference of DualMax from DualSearch is the vertex set appended to in Line of Algorithm 1. At each iteration, DualMax calls the oracle for the maximum of the current induced dual. While DualSearch appends all vertices in , DualMax only appends one vertex where for all . Then becomes the maximum of the induced dual for the next iteration. Since the (induced) dual is concave, such a local search on enables us to eventually find the maximum of the dual. The following is the modification of DualSearch to obtain DualMax.

1. The initial vertex set is changed to where for all where is the ordinary initial skeleton vertex.

2. Line 1 of Algorithm 1 is changed to “append to the one vertex such that for all ”.

Then, the following lemma holds; the proof is provided in the Appendix.

###### Lemma 3.

When DualMax terminates, for the last for which the oracle is called, .

Note that DualMax uses far fewer oracle calls than DualSearch, which leads to fast computation of the maximum value of the dual and the corresponding primal solution. The cutting plane method [6] can do the same computation as DualMax, and DualMax can be understood as one implementation of the cutting plane method. While the cutting plane method computes the maximum of the dual by solving a linear program over the hyperplanes computed so far, DualMax computes it by keeping and updating the skeleton of the dual.
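The idea of repeatedly querying the oracle at the maximum of the current induced dual can be sketched in one dimension. This is a simplified illustration under assumptions: one label count constraint with a hypothetical target `B`, a hypothetical chain energy, a brute-force oracle, and a maximizer found by checking all line crossings rather than by maintaining a skeleton.

```python
from fractions import Fraction
from itertools import product

UNARY, PAIR, B = [2, -3, 1, -1], 2, 2  # hypothetical energy and target label count

def f(x):
    return (sum(u * xi for u, xi in zip(UNARY, x))
            + PAIR * sum(abs(a - b) for a, b in zip(x, x[1:])))

def L(x, lam):
    return f(x) + lam * (sum(x) - B)

def oracle(lam):
    # Brute-force stand-in for a graph-cut oracle.
    return min(product((0, 1), repeat=len(UNARY)), key=lambda x: L(x, lam))

def argmax_induced_dual(X, M):
    # A concave piecewise-linear function on [-M, M] peaks at an endpoint
    # or at a crossing of two of its lines, so it suffices to test those points.
    cands = {Fraction(-M), Fraction(M)}
    for x in X:
        for y in X:
            if sum(x) != sum(y):
                lam = Fraction(f(y) - f(x), sum(x) - sum(y))
                if -M <= lam <= M:
                    cands.add(lam)
    return max(cands, key=lambda lam: min(L(x, lam) for x in X))

def dual_max_1d(M=10):
    X = {oracle(Fraction(0))}
    while True:
        lam = argmax_induced_dual(X, M)
        x = oracle(lam)
        if L(x, lam) < min(L(y, lam) for y in X):
            X.add(x)       # the induced dual was too optimistic at lam
        else:
            return lam, x  # induced and true dual agree at the maximizer

lam_star, x_star = dual_max_1d()
print(lam_star, x_star, L(x_star, lam_star))
```

The returned `x_star` is the primal labelling associated with the dual maximum; it need not satisfy the constraint exactly, which is why the text treats it as an approximation.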

Now, we suggest a way for DualMax to deal with inequality constraints by inserting a slack variable. For a given problem with inequality constraints, we first transform the problem to one with equality constraints, and apply the algorithm to the transformed problem. Let us consider the following problem.

$$\min_{x}\{f(x) : \bar{b}-k\le H(x)\le\bar{b}\}, \tag{15}$$

where , and the inequality is the coordinatewise inequality. The inequality gap contains our prior knowledge, i.e. . First we transform the problem to a problem with equality constraints using a slack variable as follows.

$$\min_{x,y}\{\hat{f}(x,y) : H(x)+y=\bar{b}\}, \tag{16}$$

where , and . Let us consider the following Lagrangian.

$$\hat{L}(x,y,\lambda)=\hat{f}(x,y)+\lambda^T(H(x)+y-\bar{b}). \tag{17}$$

For a minimizer of for a fixed , it always holds that for , for , and can be any number in for . Hence, only depends on . Then, the dual of becomes

$$\hat{g}(\lambda)=\min_{x}\{f(x)+\lambda^T(H(x)+y^*-\bar{b})\}. \tag{18}$$

Note that is a lower bound of (15). Since is determined only by , can be computed by the same oracle for . Now, we obtain the following lemma.

###### Lemma 4.

Let be such that for some , and . Then .

###### Proof.

Assume satisfying . It implies that for any . Then, . Finally, , and by the definition of , holds. ∎
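The slack-variable construction can be sketched on a small instance. The code assumes that $\hat{f}(x,y)=f(x)$ and that each slack coordinate ranges over $[0,k_i]$, so that the minimizing slack depends only on the sign of the corresponding multiplier; the energy and the bounds are hypothetical, and enumeration again replaces the oracle.

```python
from itertools import product

UNARY, PAIR = [2, -3, 1, -1], 2  # hypothetical chain energy
LABELLINGS = list(product((0, 1), repeat=len(UNARY)))

def f(x):
    return (sum(u * xi for u, xi in zip(UNARY, x))
            + PAIR * sum(abs(a - b) for a, b in zip(x, x[1:])))

def H(x):
    return (sum(x),)  # one statistic: the label count

def slack_minimiser(lam, k):
    # The Lagrangian is linear in y with coefficient lam_i, so over [0, k_i]:
    # y_i = 0 when lam_i > 0, y_i = k_i when lam_i < 0 (any value works at lam_i = 0).
    return tuple(0 if l > 0 else ki for l, ki in zip(lam, k))

def dual_hat(lam, b_bar, k):
    # Dual (18) with the slack fixed to its minimizing value y*.
    y = slack_minimiser(lam, k)
    return min(f(x) + sum(l * (hx + yi - bi)
                          for l, hx, yi, bi in zip(lam, H(x), y, b_bar))
               for x in LABELLINGS)

b_bar, k = (3,), (1,)  # encodes 2 <= sum(x) <= 3
feasible_min = min(f(x) for x in LABELLINGS
                   if b_bar[0] - k[0] <= H(x)[0] <= b_bar[0])
bounds = [dual_hat((lam,), b_bar, k) for lam in (-2, -1, 0, 1, 2)]
print(feasible_min, bounds)
```

Each value in `bounds` is a lower bound on the optimum of the inequality-constrained problem (15), and evaluating it needs only the same minimization over $x$ as in the equality case.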

Hence, we can solve (15) by the same manner as in the equality case. Inequality constraints make DualMax more widely applicable because we may not know the exact statistics of a desired solution in practice.

DualSearch is a very effective algorithm because it finds minimizers for all . But in general we do not know where good solutions are found, and thus we should use a large search region , which leads to slow running time. On the other hand, while DualMax efficiently finds the maximum of the dual for a specific , it may be difficult to determine for equality constraints in practice. Even if we use inequality constraints to deal with the uncertainty, as the inequality gap gets larger, the accuracy of DualMax gets lower. To overcome these drawbacks, we propose a hybrid algorithm, called AdaptSearch, that combines the advantages of DualMax and DualSearch and runs as follows.

1. Let our prior knowledge be given, and be a large search region.

2. Run DualMax on with inequality constraint for moderately small . Let be the constraint instance for which the dual maximum is computed.

3. Run DualMax again on with equality constraint . Then, we obtain at which DualMax computes the maximum of the dual.

4. Run DualSearch for a small search region where , and let be the output of DualSearch.

5. Output a solution among that minimizes the soft-constrained objective.

Note that in AdaptSearch, we can also use the cutting plane method instead of DualMax. In general, any convex search region is adoptable in Step , but we observed from extensive experiments that small constants are enough to obtain a good solution. We will show in Section 5 that AdaptSearch computes better solutions than DualMax and runs quite fast.

## 5 Applications

### 5.1 Labelling problems in computer vision

In computer vision, a number of problems can be reduced to labelling problems, including image segmentation, 3D reconstruction, and stereo. Our constrained energy minimization algorithms can be applied to those problems, for instance, when we have knowledge of the volume of a reconstructed object for 3D reconstruction or of the number of pixels belonging to an object for image segmentation. In this section, we show how our algorithms are applied to the foreground-background (fg-bg) image segmentation problem.

The fg-bg image segmentation problem is to divide a given image into foreground (object) and background. This can be done by labelling all pixels such that 1 is assigned to foreground pixels and 0 is assigned to background pixels. For this problem, one popular approach is to consider an image as a grid graph in which each node has four neighbours, and minimize an energy function of the form (2), which is submodular. The unary terms of the function encode how likely each pixel is to belong to the foreground or background, while the pairwise terms encode the smoothness of the boundary of the object being segmented. However, in general, a minimizer of (2) is not the ground truth, and it has been shown that imposing statistics on a desired solution can improve segmentation results [12].

Below, we describe some linear constraints that have been successfully used in computer vision.

1. Size: where [20, 35, 36].

2. Mean: where and denotes the vertical and horizontal coordinates of a pixel , respectively [12].

3. Cov.: where and denotes the mean center of the object [12].

We can define the variance constraints for the vertical and horizontal coordinates in a similar way to the covariance constraint.

In many scenarios, researchers are interested in ensuring that the boundary of the object in the segmentation has a particular length. This length can be measured by counting the number of pairs of adjacent variables having different labels and described by where . For this boundary constraint, the search region may be restricted to a subregion of where is the smallest real number ensuring submodular for all . Figure 2 shows improvement of segmentation results by imposing the above constraints.
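For a binary labelling stored as a grid, the size and boundary length statistics can be computed as in the following sketch. The function names and the example mask are illustrative; the boundary length counts 4-neighbour pairs with different labels, as described above.

```python
def size(x):
    # Number of foreground pixels: sum of all labels.
    return sum(sum(row) for row in x)

def boundary_length(x):
    # Number of 4-neighbour pixel pairs with different labels.
    rows, cols = len(x), len(x[0])
    vertical = sum(abs(x[r][c] - x[r + 1][c])
                   for r in range(rows - 1) for c in range(cols))
    horizontal = sum(abs(x[r][c] - x[r][c + 1])
                     for r in range(rows) for c in range(cols - 1))
    return vertical + horizontal

# Hypothetical 4x4 labelling with a 2x2 foreground square in the middle.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(size(mask), boundary_length(mask))  # prints: 4 8
```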

#### 5.1.1 Query complexity of DualSearch

Recall that the query complexity of DualSearch is polynomial in . Note that is upper bounded by the number of all possible constraint instances. For all the constraints above, we can show . For example, for the size constraint, , and for the boundary length constraint, because is a grid graph. Let us consider the mean constraint, and let be obtained from our prior knowledge. Then, the Lagrangian is as follows:

$$L(x,\lambda)=f(x)+\lambda^T\Big(\sum_{i}(c_i-\hat{b})\,x_i\Big), \tag{19}$$

where is bounded by the size of row and column of the image. Hence, the numbers of possible values of and are and , respectively, which leads to . By a similar analysis, we can show for the covariance and variance constraints. If we consider multiple constraints simultaneously, is bounded by multiplication of the upper bound of each constraint. Hence, for any combination of the constraints above.

#### 5.1.2 Experiments

First we did experiments for the size and boundary constraints, and used the following Lagrangian.

$$L(x,\lambda)=f(x)+\lambda_1\sum_{i\in V}x_i+\lambda_2\sum_{(i,j)\in E}|x_i-x_j|. \tag{20}$$

Table 1 reports the summary of results of DualSearch for images with size . DualSearch produces minimizers for a very large number of constraint instances. One implication is that for any given constraint instance, DualMax and AdaptSearch can compute a minimizer whose constraint instance is very close to the original one. Figure 3 shows an example of a skeleton projected onto two-dimensional space that is computed with (20) for a image.

Figure 4 shows experimental results of DualMax and AdaptSearch. For AdaptSearch, we used a soft constraint with a square penalty function for the size and the boundary length constraint, that is, and . We chose and with which segmentation results generally show less error. Also for the first running of DualMax, we used the inequality gap of of , and were obtained from the ground truth. The small search region to apply DualSearch is used with , except for the first image with .

We also compared our algorithms with LP [18] and QP [12] relaxation based methods. Table 2 shows that DualMax is faster and more accurate than both methods. Since LP and QP, unlike our algorithms, cannot handle higher order two-sided inequality constraints, we used the linear constraints introduced previously. Segmentation results are provided in the Appendix.

### 5.2 Combinatorial optimization

##### Submodular minimization

Our method can also be used for constrained submodular function minimization (SFM). SFM is known to be polynomial time solvable and a number of studies have considered SFM under specific constraints such as vertex cover and size constraints [10, 22]. In contrast to previous work, we provide a framework for dealing with multiple general constraints. Our method can not only deal with any linear constraint, but can also handle some higher order constraints which ensure that the dual is computable. For instance, as shown in the previous section, any submodular constraint can be handled with restricted .

##### Shortest path problem

The restricted shortest path problem is a widely studied constrained variant in which each edge has an associated delay in addition to its length. A path is feasible if its total delay is less than some threshold [21]. This is a linear constraint where is the delay of edge . Another natural constraint for the shortest path problem is to drop some nodes among a given set of nodes. For instance, we may want to design a tour that should contain cities among cities. Indeed, this becomes a Hamiltonian path problem when . As in the project selection problem, we may partition cities into groups, and try to visit number of cities from each group where . Note that all constraints above are linear so that our method can be applied.
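For the delay constraint, the dual oracle is an ordinary shortest path computation on combined edge weights. The sketch below is an illustration under assumptions: the graph, the threshold, and the multipliers are hypothetical, the constraint is taken as total delay at most the threshold, and the multiplier is restricted to be non-negative as is standard for an inequality of that direction.

```python
import heapq

# Hypothetical graph: node -> list of (neighbour, length, delay).
EDGES = {
    "s": [("a", 1, 5), ("b", 3, 1), ("t", 10, 0)],
    "a": [("t", 1, 5)],
    "b": [("t", 3, 1)],
    "t": [],
}

def shortest(lam, source="s", target="t"):
    # Dijkstra on the combined weight length + lam * delay (non-negative for lam >= 0).
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length, delay in EDGES[u]:
            nd = d + length + lam * delay
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def dual(lam, threshold):
    # g(lam) = min over paths of (length + lam * (delay - threshold)).
    return shortest(lam) - lam * threshold

bounds = {lam: dual(lam, threshold=4) for lam in (0.0, 0.25, 0.5, 1.0)}
print(bounds)
```

In this toy graph the only feasible paths for a threshold of 4 are s-b-t (length 6) and s-t (length 10), so the constrained optimum is 6 and every dual value is a lower bound on it.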

##### Project selection problem

Given a set of projects, a profit function , and a prerequisite relation , this problem is to find projects maximizing the total profit while satisfying the prerequisite relation. This is also known as the maximal closure problem and can be solved in polynomial time by transforming it to an st-mincut problem [25]. In practice, may be represented by sets that may overlap, and we may want to select projects from for . This can be formulated using linear constraints where is an indicator of which projects belong to . This enables the use of our method to solve the constrained project selection problem.

## 6 Conclusions

This paper proposes novel algorithms for the MAP inference problem under multiple constraints. Our algorithm AdaptSearch is able to generate high-quality candidate solutions in a short time (see Figure 4) and enables handling of problems with very high order potential functions. We believe it can have a significant impact on the solution of many labelling problems encountered in computer vision and machine learning. As future work, we intend to analyze the use of our algorithms for enforcing statistics in problems encountered in various domains of machine learning.

## References

• [1] Barber, C.B., Dobkin, D.P., Huhdanpaa, H.: The quickhull algorithm for convex hulls. ACM Transactions on Mathematical Software (TOMS) 22 (1996)
• [2] Boros, E., Hammer, P.: Pseudo-boolean optimization. Discrete Applied Mathematics (2002)
• [3] Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. PAMI (2001)
• [4] Carreira, J., Sminchisescu, C.: CPMC: Automatic object segmentation using constrained parametric min-cuts. IEEE Trans. Pattern Anal. Mach. Intell. 34, 1312–1328 (2012)
• [5] Gallo, G., Grigoriadis, M., Tarjan, R.: A fast parametric maximum flow algorithm and applications. SIAM J. on Comput. 18, 30–55 (1989)
• [6] Guignard, M.: Lagrangean relaxation. TOP 11 (2003)
• [7] Gupta, R., Diwan, A.A., Sarawagi, S.: Efficient inference with cardinality-based clique potentials. In: Z. Ghahramani (ed.) ICML, ACM International Conference Proceeding Series, vol. 227, pp. 329–336. ACM (2007)
• [8] Hastings, M.B.: Community detection as an inference problem. Phys. Rev. E 74, 035102 (2006)
• [9] Ishikawa, H.: Transformation of general binary MRF minimization to the first-order case. PAMI 33, 1234–1249 (2011)
• [10] Iwata, S., Nagano, K.: Submodular function minimization under covering constraints. In: FOCS (2009)
• [11] Kindermann, R., Snell, J.L.: Markov Random Fields and Their Applications. AMS (1980)
• [12] Klodt, A., Cremers, D.: A convex framework for image segmentation with moment constraints. In: ICCV (2011)
• [13] Kolev, K., Cremers, D.: Integration of multiview stereo and silhouettes via convex functionals on convex domains. In: ECCV (2008)
• [14] Kolmogorov, V., Boykov, Y., Rother, C.: Application of parametric maxflow in computer vision. In: ICCV (2007)
• [15] Kolmogorov, V., Rother, C.: Minimizing non-submodular functions with graph cuts - a review. In: PAMI (2007)
• [16] Komodakis, N., Paragios, N., Tziritas, G.: MRF optimization via dual decomposition: message-passing revisited. In: ICCV (2007)
• [17] Ladicky, L., Russell, C., Kohli, P., Torr, P.: Graph cut based inference with co-occurrence statistics. In: ECCV (2010)
• [18] Lempitsky, V., Kohli, P., Rother, C., Sharp, T.: Image segmentation with a bounding box prior. In: ICCV (2009)
• [19] Li, S.Z.: Markov random field models in computer vision. In: ECCV (1994)
• [20] Lim, Y., Jung, K., Kohli, P.: Energy minimization under constraints on label counts. In: ECCV (2010)
• [21] Lorenz, D.H., Raz, D.: A simple efficient approximation scheme for the restricted shortest path problem. Operations Research Letters 28, 213–219 (1999)
• [22] Nagano, K., Kawahara, Y., Aihara, K.: Size-constrained submodular minimization through minimum norm base. In: ICML (2011)
• [23] Naor, J., Schwartz, R.: Balanced metric labeling. In: STOC (2005)
• [24] Nowozin, S., Lampert, C.: Global connectivity potentials for random field models. In: CVPR (2009)
• [25] Picard, J.C.: Maximal closure of a graph and applications to combinatorial problems. Management Science 22 (1976)
• [26] Ramalingam, S., Kohli, P., Alahari, K., Torr, P.: Exact inference in multi-label CRFs with higher order cliques. In: CVPR (2008)
• [27] Rhemann, C., Rother, C., Rav-Acha, A., Sharp, T.: High resolution matting via interactive trimap segmentation. In: CVPR (2008)
• [28] Rother, C., Kolmogorov, V., Blake, A.: “GrabCut”: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. (2004)
• [29] Sinha, S., Pollefeys, M.: Multi-view reconstruction using photo-consistency and exact silhouette constraints: A maximum-flow formulation. In: ICCV (2005)
• [30] Strandmark, P., Kahl, F.: Parallel and distributed graph cuts by dual decomposition. In: CVPR (2010)
• [31] Toyoda, T., Hasegawa, O.: Random field model for integration of local information and global information. PAMI 30, 1483–1489 (2008)
• [32] Vicente, S., Kolmogorov, V., Rother, C.: Graph cut based image segmentation with connectivity priors. In: CVPR (2008)
• [33] Wainwright, M., Jaakkola, T., Willsky, A.: MAP estimation via agreement on trees: message-passing and linear programming. IEEE Transactions on Information Theory (2005)
• [34] Weiss, Y., Yanover, C., Meltzer, T.: MAP estimation, linear programming and belief propagation with convex free energies. In: UAI (2007)
• [35] Werner, T.: High-arity interactions, polyhedral relaxations, and cutting plane algorithm for soft constraint optimisation (MAP-MRF). In: CVPR (2008)
• [36] Woodford, O., Rother, C., Kolmogorov, V.: A global perspective on MAP inference for low-level vision. In: ICCV (2009)
• [37] Woodford, O.J., Rother, C., Kolmogorov, V.: A global perspective on MAP inference for low-level vision. In: ICCV, pp. 2319–2326. IEEE (2009)
• [38] Yanover, C., Meltzer, T., Weiss, Y.: Linear programming relaxations and belief propagation – an empirical study. Journal of Machine Learning Research 7, 1887–1907 (2006)

## A Proofs

In this section, we provide the proofs omitted in the main body. We use notations and to indicate and at the end of the -th iteration in Algorithm 1, respectively. Also we denote a vertex chosen in Line at the -th iteration by , and by without . Note that initially by the definition of .

### A.1 Proof of Lemma 2

Lemma 2 is proved by the following Lemma 5 and Lemma 6.

###### Lemma 5.

Assume that . Then .

###### Proof.

Let . First assume that is also in . Suppose that . There is and so that is a proper convex combination of . Note that . Thus, is a proper convex combination of over . It is a contradiction to .

Assume that . Then, , implying that and . Suppose that there is and so that is a proper convex combination of . Since and , it is a contradiction to the definition of .

Let . Suppose that but , which means that . Then, , a contradiction to . Suppose that nor . Then, there is and so that is a proper convex combination of . Note that because . Since , at least one of is strictly below , and let be the set of such elements of . Since , at least one of is strictly above , and let be the set of such elements of . Let be the set of intersections of and where and . Suppose that is strictly below , then is a proper convex combination of and , implying a contradiction. So . Suppose that , then is a proper convex combination of , which is a contradiction. Thus, and . Then since is on some edge and , by the algorithm so that is present in , which is a contradiction. ∎

###### Lemma 6.

Assume that . Then .

###### Proof.

Let . Assume that is added by so that is the intersection of and . Suppose that there is , and so that for some , is a proper convex combination of . Since , . Also since and , . It is a contradiction to .

Assume that . Since , no is a proper convex combination of and . Thus, due to by Lemma 5.

Assume that is added by . Suppose that there is and so that for some , is a proper convex combination of . Since