1 Introduction
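This paper focuses on energy minimization, or maximum a posteriori (MAP) inference, for undirected graphical models. This problem is closely related to weighted and valued constraint satisfaction. In the most common pairwise case it amounts to minimizing a partially separable function $E(\theta, x)$ taking real values on a discrete set $\mathcal{X}$ of finite-valued vectors $x$:
$$\min_{x \in \mathcal{X}} E(\theta, x) := \min_{x \in \mathcal{X}} \Big[ \sum_{v \in \mathcal{V}} \theta_v(x_v) + \sum_{uv \in \mathcal{E}} \theta_{uv}(x_u, x_v) \Big]. \tag{1}$$
(Footnote 1: We rigorously define the notation in Section “Preliminaries”.)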
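To make the objective in (1) concrete, the following minimal sketch evaluates a pairwise energy and minimizes it by exhaustive enumeration. The dictionary-based data layout, the function names and the three-node chain are illustrative assumptions and are not taken from the paper; enumeration is exponential in the number of nodes and only serves as a reference for tiny models.

```python
from itertools import product

def energy(unary, pairwise, x):
    """Pairwise energy: sum of unary costs plus sum of edge costs."""
    total = sum(unary[v][x[v]] for v in unary)
    total += sum(cost[x[u]][x[v]] for (u, v), cost in pairwise.items())
    return total

def brute_force_map(unary, pairwise):
    """Enumerate all labelings; exponential, only for tiny illustrative models."""
    nodes = sorted(unary)
    best_x, best_e = None, float("inf")
    for labels in product(*(range(len(unary[v])) for v in nodes)):
        x = dict(zip(nodes, labels))
        e = energy(unary, pairwise, x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Hypothetical 3-node chain with 2 labels per node.
unary = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [1.0, 0.0]}
potts = [[0.0, 0.6], [0.6, 0.0]]  # penalize differing neighboring labels
pairwise = {(0, 1): potts, (1, 2): potts}
x_opt, e_opt = brute_force_map(unary, pairwise)
```

In this toy model the pairwise term acts as a mild smoothing regularizer, so the minimizer follows the unary costs except where a label change would cost more than it saves.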
The problem is known to be NP-hard (e.g. Li, Shekhovtsov, and Huber 2016), and therefore a number of approximate algorithms have been proposed for it. In contrast, our goal is an efficient method able to solve large-scale, but mostly simple, problem instances exactly. Such instances typically arise in computer vision, machine learning and other areas of artificial intelligence. Although approximate methods often provide reasonable solutions, having an exact solver can be critical at the modeling stage, when one has to differentiate between modeling and optimization errors. In this case one usually resorts to either specialized combinatorial solvers (see references in Kappes et al. 2015; Hurley et al. 2016) or off-the-shelf integer linear program (ILP) solvers like CPLEX
[CPLEX, IBM 2014] or Gurobi [Gurobi Optimization 2016]. However, neither specialized nor off-the-shelf solvers scale well as the problem instances get larger. Our method exploits the fact that a linear program (LP) relaxation of the problem is “almost” tight, i.e. the obtained solution is close to the optimal one. It restricts the application of an exact solver to a small fraction of the problem, where the LP relaxation is not tight, and yet obtains a provably optimal solution to the whole problem. This allows us to solve problems for which no efficient solving technique was available.

Related work. LP relaxations are an important building block for a number of algorithms addressing the MAP-inference problem (1). The LP relaxation was probably first considered in (Shlezinger 1976; see Werner 2007 for a recent review), both in its primal and dual form. The notion of reparametrization (also known as equivalent transformations or equivalence preserving transformations) was introduced in the same work as well. Although the bound provided by the LP relaxation is often good, the class of problems where it is tight is limited (see Kolmogorov, Thapper, and Zivny 2015). Practically important problems from this class are mainly those having acyclic structure or submodular costs. Therefore a number of works have been devoted to cutting plane techniques that tighten the relaxation (e.g. Koster, Van Hoesel, and Kolen 1998; Sontag 2007; de Givry and Katsirelos 2017). Sometimes the tightening itself leads to an exact solution; in general, however, it is combined with branch-and-bound or $A^*$ algorithms. The most prominent representatives of the first class are the DAOOPT [Marinescu and Dechter 2005, Otten and Dechter 2010] and Toulbar2 [Cooper et al. 2010] solvers. The latter has recently shown impressive results on a number of benchmarks [Hurley et al. 2016]. In contrast, the $A^*$ algorithm has so far mainly been used in specific applications (e.g. Bergtholdt et al. 2010).

Recently developed LP-relaxation-based partial optimality methods (e.g. Shekhovtsov 2014; Shekhovtsov, Swoboda, and Savchynskyy 2015; Swoboda et al. 2016) can find optimal labels for a significant part of the variables without solving the combinatorial problem (1). Afterwards, a combinatorial solver can be applied to the rest of the variables to obtain a complete solution. These methods work well if the pairwise costs play the role of a “smoothing regularizer” by slightly penalizing differences in the values of neighboring variables $x_u$ and $x_v$. However, they struggle as the pairwise costs get more weight and move towards “hard constraints”, when some pairs of variable values are strongly penalized or even forbidden.
The CombiLP method [Savchynskyy et al. 2013] is the closest to our work. It iteratively splits the problem into a “simple” and a “difficult” part based on consistency of reparametrized unary and pairwise costs, known as (virtual) arc-consistency (see Werner 2007), and checks for agreement of their solutions. The “simple” part is addressed with a dedicated LP solver, whereas the “difficult” one is solved with an ILP method. Although CombiLP has shown promising results on the OpenGM benchmark [Kappes et al. 2015], its usage is beneficial only for sparse graphical models, i.e. when the graph contains far fewer edges than pairs of nodes.
Contribution. Based on CombiLP, we propose a method that is not restricted to sparse models. Similarly to CombiLP, we split the problem into LP and ILP parts based on local consistency properties. Whenever our new consistency criterion is satisfied, it guarantees that the concatenation of the obtained LP and ILP solutions is optimal for the whole problem. When the criterion is not satisfied, we increase the ILP subproblem and correspondingly decrease the LP one, as is done in CombiLP. There are, however, several crucial differences to the CombiLP approach:


Our “difficult” ILP subproblem is kept much more compact, which is critical for densely connected graphs. This leads to substantial computational savings.

Our optimality criterion is stronger than that of CombiLP: satisfaction of CombiLP’s criterion for a given splitting implies satisfaction of ours.
Additionally, we treat the problem of finding an initial reparametrization suitable for the splitting criterion and propose a method that allows arbitrary dual LP solvers to be used within our algorithm, whereas the CombiLP implementation has a fixed dedicated LP solver. This allowed us to choose a more efficient LP solver and to speed up the original CombiLP implementation significantly (up to 18 times).
Finally, our criterion and implementation (footnote 2: code is available at github.com/fgrsnau/combilp) are also able to deal with higher-order models, which intrinsically have a higher connectivity. We show the efficacy of our method on publicly available benchmarks from computer vision, machine learning and bio-imaging.
2 Preliminaries
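Graphical Models and MAP-inference. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be an undirected graph with the set of nodes $\mathcal{V}$ and the set of edges $\mathcal{E}$. The neighborhood of $v \in \mathcal{V}$ is defined as $\mathcal{N}(v) = \{u \in \mathcal{V} : uv \in \mathcal{E}\}$. Each node $v \in \mathcal{V}$ is associated with a finite set of labels $\mathcal{X}_v$. For any subset $\mathcal{A} \subseteq \mathcal{V}$ of graph nodes the Cartesian product $\mathcal{X}_{\mathcal{A}} = \prod_{v \in \mathcal{A}} \mathcal{X}_v$ defines the set of labelings of the subset $\mathcal{A}$, in which each node from $\mathcal{A}$ is assigned a label. This also includes the special cases $\mathcal{A} = \{u, v\}$ and $\mathcal{A} = \mathcal{V}$, denoted as $\mathcal{X}_{uv}$ and $\mathcal{X}$ respectively. We assume that $\arg\min_{x} f(x)$ stands for the set of minimal elements. At the same time, when used with the “$=$” or “$:=$” operators, it returns some element from this set.
Let $\mathcal{I}$ be the set of indices enumerating all labels and label pairs in neighboring graph nodes. For each node and edge the cost functions $\theta_v \colon \mathcal{X}_v \to \mathbb{R}$, $v \in \mathcal{V}$, and $\theta_{uv} \colon \mathcal{X}_{uv} \to \mathbb{R}$, $uv \in \mathcal{E}$, assign a cost to a label or label pair respectively. The vector $\theta \in \mathbb{R}^{\mathcal{I}}$ contains all values of the functions $\theta_v$ and $\theta_{uv}$ as its coordinates.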
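ILP formulation and LP relaxation. One way to address the MAP-inference problem (1) is to consider its ILP formulation (see e.g. Shlezinger 1976; Werner 2007)
$$\begin{aligned} \min_{\mu} \ & \langle \theta, \mu \rangle \\ \text{s.t.} \ & \sum_{x_v \in \mathcal{X}_v} \mu_v(x_v) = 1, && v \in \mathcal{V}, \\ & \sum_{x_v \in \mathcal{X}_v} \mu_{uv}(x_u, x_v) = \mu_u(x_u), && uv \in \mathcal{E},\ x_u \in \mathcal{X}_u, \\ & \sum_{x_u \in \mathcal{X}_u} \mu_{uv}(x_u, x_v) = \mu_v(x_v), && uv \in \mathcal{E},\ x_v \in \mathcal{X}_v, \\ & \mu \ge 0, \end{aligned} \tag{2}$$
$$\mu \in \{0, 1\}^{\mathcal{I}}. \tag{3}$$
A natural LP relaxation is obtained by omitting the integrality constraints (3). The resulting LP (2) is known as the local polytope relaxation [Werner 2007] or simply the LP relaxation of (1). We will call the problem (1) LP-tight if the optimal values of (1) and its LP relaxation (2) coincide. This also implies that there is an integer solution to the relaxed problem (2). We will say that the LP relaxation has an integer solution in a node $v$ if there is $x_v \in \mathcal{X}_v$ such that $\mu_v(x_v) = 1$. Due to the constraints of (2) this implies that $\mu_v(x'_v) = 0$ for $x'_v \neq x_v$.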
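A small example may help to see why dropping integrality can make the relaxation non-tight. The sketch below uses a hypothetical three-node cycle with two labels per node, zero unary costs and a cost of 1 whenever the two ends of an edge agree. It checks by hand that a fractional point satisfies the normalization and marginalization constraints of the local polytope (assuming the standard form of these constraints) and compares its cost with the integer optimum found by enumeration. No LP solver is involved; the names are illustrative only.

```python
from itertools import product

nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]
n_labels = 2
# Hypothetical costs: unary costs are zero, an edge costs 1 if both ends agree.
theta_uv = [[1.0, 0.0], [0.0, 1.0]]

def energy(x):
    return sum(theta_uv[x[u]][x[v]] for u, v in edges)

# Integer optimum by enumeration of all 2^3 labelings.
ilp_value = min(energy(x) for x in product(range(n_labels), repeat=len(nodes)))

# A fractional point: every node splits its mass evenly, every edge puts
# its mass on the two disagreeing label pairs.
mu_v = {v: [0.5, 0.5] for v in nodes}
mu_uv = {e: [[0.0, 0.5], [0.5, 0.0]] for e in edges}

def in_local_polytope(mu_v, mu_uv, tol=1e-9):
    """Check normalization, marginalization and non-negativity constraints."""
    for v in nodes:
        if abs(sum(mu_v[v]) - 1.0) > tol or min(mu_v[v]) < -tol:
            return False
    for (u, v), m in mu_uv.items():
        for s in range(n_labels):
            if abs(sum(m[s]) - mu_v[u][s]) > tol:
                return False
        for t in range(n_labels):
            if abs(sum(m[s][t] for s in range(n_labels)) - mu_v[v][t]) > tol:
                return False
        if min(min(row) for row in m) < -tol:
            return False
    return True

# Cost of the fractional point (unary costs are zero in this example).
lp_value_at_mu = sum(theta_uv[s][t] * mu_uv[e][s][t]
                     for e in edges
                     for s in range(n_labels) for t in range(n_labels))
```

Every labeling of an odd cycle must make at least one edge agree, so the integer optimum is 1, whereas the fractional point is feasible and costs 0. In this instance no node has an integer solution, which is the opposite extreme of the “almost tight” situation targeted by the method.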
Linear programs of the form (2) are as difficult as linear programs in general [Prusa and Werner 2013], and therefore obtaining exact solutions for large-scale instances may require significant time. However, there are fast specialized solvers (e.g. Kolmogorov 2006; Cooper et al. 2008) returning approximate dual solutions of (2).
Partial Optimality Observation. The practical importance of the LP relaxation (2) is based on the fact that often most coordinates of its (approximate) relaxed solution are assigned integer values. The non-integer coordinates can be rounded [Ravikumar, Agarwal, and Wainwright 2010] and the resulting labeling can be used as if it were a solution of the non-relaxed problem. A number of problems have been successfully addressed with this type of method [Kappes et al. 2015]. However, apart from special cases (e.g. Boros and Hammer 2002; Rother et al. 2007) there is no guarantee that the integer coordinates keep their values in an optimal solution of the non-relaxed problem.
Even though there is no guarantee that the rounded integer solution is a sensible approximation of the optimal solution, empirical tests have shown that usually many integer coordinates coincide with those found in the optimal solution. This is a purely practical observation with little theoretical background. Nevertheless, it can be used to address the non-relaxed problem efficiently, and it is the basis of our method. An alternative, the partial optimality approach, was pursued by e.g. Shekhovtsov, Swoboda, and Savchynskyy (2015) and Swoboda et al. (2016). We will provide a corresponding empirical comparison later in the paper.
3 Idea of the Algorithm
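Graph partition. A subgraph $\mathcal{G}' = (\mathcal{V}', \mathcal{E}')$ of the graph $\mathcal{G}$ is called induced by the set of its nodes $\mathcal{V}' \subseteq \mathcal{V}$ if $\mathcal{E}' = \{uv \in \mathcal{E} : u, v \in \mathcal{V}'\}$, i.e. the set of its edges contains all edges from $\mathcal{E}$ connecting nodes from $\mathcal{V}'$.
The subgraphs $\mathcal{A}$ and $\mathcal{B}$ are called a partition of the graph $\mathcal{G}$ if $\mathcal{V}_{\mathcal{A}} \cup \mathcal{V}_{\mathcal{B}} = \mathcal{V}$, $\mathcal{V}_{\mathcal{A}} \cap \mathcal{V}_{\mathcal{B}} = \emptyset$, and $\mathcal{A}$ and $\mathcal{B}$ are induced by $\mathcal{V}_{\mathcal{A}}$ and $\mathcal{V}_{\mathcal{B}}$ respectively. The subgraph $\mathcal{B}$ as the complement to $\mathcal{A}$ will be denoted as $\mathcal{G} \setminus \mathcal{A}$. The other way around, $\mathcal{G} \setminus \mathcal{B}$ stands for $\mathcal{A}$ if $(\mathcal{A}, \mathcal{B})$ is a partition of $\mathcal{G}$. The notation $\mathcal{E}_{\mathcal{A}\mathcal{B}}$ will be used for the set of edges connecting $\mathcal{A}$ and $\mathcal{B}$: $\mathcal{E}_{\mathcal{A}\mathcal{B}} = \{uv \in \mathcal{E} : u \in \mathcal{V}_{\mathcal{A}},\ v \in \mathcal{V}_{\mathcal{B}}\}$.
In the following, we will show how to partition the problem graph $\mathcal{G}$ into (i) an easy part with subgraph $\mathcal{A}$, which can be solved exactly with approximate LP solvers, and (ii) a difficult part with subgraph $\mathcal{B}$, which will require an ILP solver.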
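Lower bound induced by partition. Until the end of this section we will assume that $\mathcal{A}$, $\mathcal{B}$ are a partition of a graph $\mathcal{G}$. For the sake of notation, when considering different subgraphs of $\mathcal{G}$ we will nevertheless use a cost vector $\theta$ corresponding to the master graph $\mathcal{G}$, i.e. $E_{\mathcal{A}}(\theta, x^{\mathcal{A}})$ will stand for the energy of a labeling $x^{\mathcal{A}} \in \mathcal{X}_{\mathcal{V}_{\mathcal{A}}}$ computed from those coordinates of $\theta$ that belong to the nodes and edges of $\mathcal{A}$.
Additionally, for $x^{\mathcal{A}} \in \mathcal{X}_{\mathcal{V}_{\mathcal{A}}}$ and $x^{\mathcal{B}} \in \mathcal{X}_{\mathcal{V}_{\mathcal{B}}}$, their concatenation $x^{\mathcal{A}} \circ x^{\mathcal{B}} \in \mathcal{X}$ will be defined as
$$(x^{\mathcal{A}} \circ x^{\mathcal{B}})_v = \begin{cases} x^{\mathcal{A}}_v, & v \in \mathcal{V}_{\mathcal{A}}, \\ x^{\mathcal{B}}_v, & v \in \mathcal{V}_{\mathcal{B}}. \end{cases} \tag{4}$$
Note that the energy function can be decomposed into subproblems on $\mathcal{A}$ and $\mathcal{B}$, and it holds that
$$E(\theta, x^{\mathcal{A}} \circ x^{\mathcal{B}}) = E_{\mathcal{A}}(\theta, x^{\mathcal{A}}) + E_{\mathcal{B}}(\theta, x^{\mathcal{B}}) + \sum_{uv \in \mathcal{E}_{\mathcal{A}\mathcal{B}}} \theta_{uv}(x^{\mathcal{A}}_u, x^{\mathcal{B}}_v), \tag{5}$$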
and therefore,
(6) 
constitutes a lower bound for the energy function .
Proposition 1 (Sufficient optimality condition).
The lower bound specified in (6) is tight if for all it holds that , where , .
It is trivial to show that the labeling is optimal for if the lower bound (6) is tight.
When considering the set of all possible partitions of $\mathcal{G}$ into $\mathcal{A}$ and $\mathcal{B}$, there is always at least one that leads to a tight lower bound (6). It corresponds to a trivial partition, where either the subgraph $\mathcal{A}$ or $\mathcal{B}$ is empty. The first case corresponds to solving the whole problem with an ILP method, whereas the second one corresponds to the case when the LP relaxation is tight, i.e. all coordinates of an LP solution are integer.
However, as our experimental evaluation shows, there often exist tight non-trivial partitions with a large subgraph $\mathcal{A}$ and a small subgraph $\mathcal{B}$.
Conceptual Algorithm. These partitions can be obtained, for example, by the conceptual Algorithm 1, which assigns all nodes of the graph having an integer solution to $\mathcal{A}$ and all others to $\mathcal{B}$. After solving both subproblems one checks whether the sufficient optimality condition of Proposition 1 is fulfilled. Should the condition hold, the problem is solved. Otherwise one increases the subproblem $\mathcal{B}$ (and respectively decreases $\mathcal{A}$) by including those nodes $u \in \mathcal{V}_{\mathcal{A}}$ where the condition does not hold for at least one $uv \in \mathcal{E}_{\mathcal{A}\mathcal{B}}$, in terms of Proposition 1.
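The loop structure of such a partition-growing scheme can be sketched as follows. This is an illustration under explicit assumptions, not the authors' implementation: both subproblems are solved by brute force (standing in for the LP-based labeling on the easy part and for the ILP solver on the difficult part), and the boundary term is bounded by minimizing each crossing pairwise cost over the label of its endpoint in the easy part. This is one valid lower bound of the partition type and is taken here as an assumption about its exact form. All function names and the example data are hypothetical.

```python
from itertools import product

def energy(unary, pairwise, x):
    return (sum(unary[v][x[v]] for v in unary)
            + sum(c[x[u]][x[v]] for (u, v), c in pairwise.items()))

def brute_force(nodes, n_labels, cost):
    """Exhaustive minimizer over labelings of `nodes` (stand-in for a real solver)."""
    best_x, best_e = None, float("inf")
    for labels in product(*(range(n_labels[v]) for v in nodes)):
        x = dict(zip(nodes, labels))
        e = cost(x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

def solve_by_partition(unary, pairwise, initial_a):
    """Grow the difficult part B until the assumed tightness test succeeds."""
    n_labels = {v: len(c) for v, c in unary.items()}
    va = set(initial_a)
    while True:
        vb = set(unary) - va
        inner_a = [(e, c) for e, c in pairwise.items() if e[0] in va and e[1] in va]
        inner_b = [(e, c) for e, c in pairwise.items() if e[0] in vb and e[1] in vb]
        cross = [(e, c) for e, c in pairwise.items() if (e[0] in va) != (e[1] in va)]

        def best_cross(e, c, xb):
            # Crossing edge cost, minimized over the label of its endpoint in A.
            u, v = e
            if u in va:
                return min(c[s][xb[v]] for s in range(n_labels[u]))
            return min(c[xb[u]][t] for t in range(n_labels[v]))

        def cost_a(x):
            return (sum(unary[v][x[v]] for v in va)
                    + sum(c[x[u]][x[v]] for (u, v), c in inner_a))

        def cost_b(x):
            return (sum(unary[v][x[v]] for v in vb)
                    + sum(c[x[u]][x[v]] for (u, v), c in inner_b)
                    + sum(best_cross(e, c, x) for e, c in cross))

        xa, ea = brute_force(sorted(va), n_labels, cost_a)
        xb, eb = brute_force(sorted(vb), n_labels, cost_b)
        x = {**xa, **xb}
        # Nodes of A whose label does not attain the per-edge minimum.
        violated = {u if u in va else v
                    for (u, v), c in cross
                    if c[x[u]][x[v]] > best_cross((u, v), c, xb)}
        if not violated:
            return x, ea + eb, sorted(vb)  # bound attained, hence x is optimal
        va -= violated

# Hypothetical chain 0-1-2-3; node 3 has tied unary costs and starts in B.
unary = {0: [0, 2], 1: [0, 1], 2: [0, 3], 3: [1, 1]}
potts = [[0, 1], [1, 0]]
pairwise = {(0, 1): potts, (1, 2): potts, (2, 3): potts}
x, e, final_b = solve_by_partition(unary, pairwise, initial_a={0, 1, 2})
```

The loop always terminates, because the easy part shrinks in every unsuccessful iteration and the test holds trivially once it is empty. In the worst case the whole graph ends up in the difficult part, which corresponds to the trivial partition mentioned above.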
Relation to CombiLP. Algorithm 1 differs from CombiLP [Savchynskyy et al. 2013] in one very important aspect. Namely, the subgraphs used in CombiLP are overlapping, whereas ours are not. This substantially improves the performance of the method in cases when the graph has a high connectivity. In later sections of this paper we will give a detailed theoretical and empirical comparison of the methods.
In the following, we will turn the conceptual algorithm into a working one. In order to do so, we will give positive answers to a number of important questions:


Why and when is the subproblem on $\mathcal{A}$ LP-tight? This is critical, since we assume $\mathcal{A}$ to be close to $\mathcal{G}$ in size, and therefore it must be solvable by a (polynomial) LP method.

Can we avoid running an LP solver for $\mathcal{A}$ in each iteration?

Can we use (fast specialized) approximate LP solvers on $\mathcal{A}$ instead of (slow off-the-shelf) exact ones?

How can we encourage the conditions of Proposition 1 to be fulfilled for a possibly small $\mathcal{B}$?
Although our construction mostly follows the one given in [Savchynskyy et al. 2013], we repeat it here to keep the paper self-contained.
4 Theoretical Background
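Reparametrization. Decompositions of the energy function $E(\theta, x)$ into unary and pairwise costs are not unique, that is, there exist other costs $\theta' \neq \theta$ such that $E(\theta', x) = E(\theta, x)$ for all labelings $x \in \mathcal{X}$. It is known (see e.g. Werner 2007) and straightforward to check that such equivalent costs can be obtained with an arbitrary vector $\phi := (\phi_{v,u}(s) \in \mathbb{R} : v \in \mathcal{V},\ u \in \mathcal{N}(v),\ s \in \mathcal{X}_v)$ as follows:
$$\theta^{\phi}_v(s) := \theta_v(s) - \sum_{u \in \mathcal{N}(v)} \phi_{v,u}(s), \qquad \theta^{\phi}_{uv}(s, t) := \theta_{uv}(s, t) + \phi_{u,v}(s) + \phi_{v,u}(t). \tag{7}$$
The costs $\theta^{\phi}$ are called reparametrized and the vector $\phi$ is known as a reparametrization. Costs related by (7) are also called equivalent. In this sense, all vectors $\theta$ can be split into equivalence classes according to (7). Other established terms for reparametrizations are equivalence preserving transformations [Cooper and Schiex 2004] and equivalent transformations [Shlezinger 1976].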
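That a reparametrization leaves the energy of every labeling unchanged can be checked numerically. The sketch below assumes the common sign convention in which the value $\phi_{v,u}(s)$ is subtracted from the unary cost of node $v$ and added to the pairwise cost of the edge between $u$ and $v$; the opposite sign convention works equally well. The data layout, the function names and the random integer values are illustrative assumptions.

```python
import random
from itertools import product

def energy(unary, pairwise, x):
    return (sum(unary[v][x[v]] for v in unary)
            + sum(c[x[u]][x[v]] for (u, v), c in pairwise.items()))

def reparametrize(unary, pairwise, phi):
    """Apply phi[(v, u)][s]: subtract it from the unary cost of v, add it to edge {u, v}."""
    new_unary = {v: list(c) for v, c in unary.items()}
    new_pairwise = {e: [list(row) for row in c] for e, c in pairwise.items()}
    for (u, v), c in new_pairwise.items():
        for s in range(len(unary[u])):
            for t in range(len(unary[v])):
                c[s][t] += phi[(u, v)][s] + phi[(v, u)][t]
        for s in range(len(unary[u])):
            new_unary[u][s] -= phi[(u, v)][s]
        for t in range(len(unary[v])):
            new_unary[v][t] -= phi[(v, u)][t]
    return new_unary, new_pairwise

# Hypothetical 3-node chain with integer costs, so equality is exact.
unary = {0: [0, 2], 1: [1, 0], 2: [3, 1]}
pairwise = {(0, 1): [[0, 1], [1, 0]], (1, 2): [[0, 2], [2, 0]]}

rng = random.Random(0)
phi = {}
for (u, v) in pairwise:
    phi[(u, v)] = [rng.randint(-3, 3) for _ in unary[u]]
    phi[(v, u)] = [rng.randint(-3, 3) for _ in unary[v]]

new_unary, new_pairwise = reparametrize(unary, pairwise, phi)
same_energy = all(energy(unary, pairwise, x) == energy(new_unary, new_pairwise, x)
                  for x in product(range(2), repeat=3))
```

Each value of $\phi$ appears once with a minus sign in a unary cost and once with a plus sign in an incident pairwise cost, so the contributions cancel for every labeling.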
Dual Problem. By swapping the $\min$ and $\sum$ operations in (1) one obtains a lower bound $D(\theta) \le E(\theta, x)$ on the energy (footnote 3: it can be shown that this bound is in general less tight than (6)), which reads as
$$D(\theta) := \sum_{v \in \mathcal{V}} \min_{x_v \in \mathcal{X}_v} \theta_v(x_v) + \sum_{uv \in \mathcal{E}} \min_{(x_u, x_v) \in \mathcal{X}_{uv}} \theta_{uv}(x_u, x_v). \tag{8}$$
Although the energy remains the same for all cost vectors from a given equivalence class ($E(\theta^{\phi}, x) = E(\theta, x)$), the lower bound depends on the reparametrization ($D(\theta^{\phi}) \neq D(\theta)$ in general). Therefore, a natural maximization problem arises as maximization of the lower bound over all equivalent costs: $\max_{\phi} D(\theta^{\phi})$. It is known (e.g. Werner 2007) that this maximization problem is equivalent to the Lagrangian dual of the LP relaxation (2). In turn, this implies that the minimum of (2) coincides with the maximum of $D(\theta^{\phi})$. Therefore, one speaks about optimal reparametrizations as those $\phi$ where the maximum is attained. Apart from its lower bound property, the function $D(\theta^{\phi})$ is important because (i) it is concave w.r.t. $\phi$ as a sum of minima of linear functions; (ii) there exist many scalable and efficient algorithms for its (approximate) maximization, e.g. [Kolmogorov 2006, Cooper et al. 2008].
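The dependence of the bound on the reparametrization can be seen on a two-node toy model. In the sketch below the dual bound is assumed to be the sum of independently minimized unary and pairwise terms; the two cost sets are hand-constructed to be equivalent (the second is obtained by moving both unary costs entirely into the edge), and all names and numbers are illustrative.

```python
from itertools import product

def energy(unary, pairwise, x):
    return (sum(unary[v][x[v]] for v in unary)
            + sum(c[x[u]][x[v]] for (u, v), c in pairwise.items()))

def dual_bound(unary, pairwise):
    """D(theta): minimize every unary and every pairwise term independently."""
    return (sum(min(c) for c in unary.values())
            + sum(min(min(row) for row in c) for c in pairwise.values()))

# Original costs of a hypothetical two-node model.
theta = ({0: [0, 1], 1: [1, 0]}, {(0, 1): [[0, 2], [2, 0]]})
# Equivalent costs: both unary costs moved entirely into the edge.
theta_phi = ({0: [0, 0], 1: [0, 0]}, {(0, 1): [[1, 2], [4, 1]]})

labelings = list(product(range(2), repeat=2))
equivalent = all(energy(*theta, x) == energy(*theta_phi, x) for x in labelings)
optimum = min(energy(*theta, x) for x in labelings)
bound_before = dual_bound(*theta)
bound_after = dual_bound(*theta_phi)
```

Both cost sets define the same energy, but the bound rises from 0 to 1 and meets the optimum, so in this example the second reparametrization is optimal.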
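Strict Arc-Consistency. From a practical point of view it is important how an optimal reparametrization can be translated into a labeling, i.e. into an (approximate) solution of the energy minimization problem (1). The following definition plays a crucial role for this question in general and for our method in particular:
Definition 1 (Strict arc-consistency).
The node $v \in \mathcal{V}$ is called strictly arc-consistent w.r.t. the costs $\theta$ if there exists a label $x_v \in \mathcal{X}_v$ and labels $x_u \in \mathcal{X}_u$ for all $u \in \mathcal{N}(v)$, such that it holds that (i) $\theta_v(x_v) < \theta_v(s)$ for all $s \in \mathcal{X}_v \setminus \{x_v\}$; and (ii) $\theta_{uv}(x_u, x_v) < \theta_{uv}(s, t)$ for all $(s, t) \in \mathcal{X}_{uv} \setminus \{(x_u, x_v)\}$.
The set of strictly arc-consistent nodes is denoted by .
If all nodes are strictly arc-consistent w.r.t. the reparametrized costs $\theta^{\phi}$, then it is straightforward to check that $E(\theta, x) = D(\theta^{\phi})$, where
$$x_v := \arg\min_{s \in \mathcal{X}_v} \theta^{\phi}_v(s), \quad v \in \mathcal{V}. \tag{9}$$
In turn, this implies that $\phi$ is an optimal reparametrization and $x$ is an exact solution of the energy minimization problem (1).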
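The definition can be turned into a direct check. The sketch below assumes the reading that a node is strictly arc-consistent if its unary cost has a unique minimizer and every incident pairwise cost has a unique minimizing label pair that agrees with this minimizer. It then reconstructs a labeling by independent unary minimization and verifies that the energy meets the dual bound on a toy model. The function names and data are illustrative.

```python
def unique_argmin(values):
    """Index of the strict minimum, or None if the minimum is attained more than once."""
    m = min(values)
    hits = [i for i, val in enumerate(values) if val == m]
    return hits[0] if len(hits) == 1 else None

def is_strictly_arc_consistent(v, unary, pairwise):
    xv = unique_argmin(unary[v])
    if xv is None:
        return False
    for (a, b), c in pairwise.items():
        if v not in (a, b):
            continue
        width = len(c[0])
        k = unique_argmin([val for row in c for val in row])
        if k is None:
            return False
        s, t = divmod(k, width)  # unique minimizing label pair of the edge
        if (s if v == a else t) != xv:
            return False
    return True

def labeling_from_unaries(unary):
    """Pick a label with minimal unary cost in every node independently."""
    return {v: min(range(len(c)), key=c.__getitem__) for v, c in unary.items()}

def energy(unary, pairwise, x):
    return (sum(unary[v][x[v]] for v in unary)
            + sum(c[x[u]][x[v]] for (u, v), c in pairwise.items()))

def dual_bound(unary, pairwise):
    return (sum(min(c) for c in unary.values())
            + sum(min(min(row) for row in c) for c in pairwise.values()))

# Hypothetical two-node model in which both nodes are strictly arc-consistent.
unary = {0: [0, 1], 1: [0, 2]}
pairwise = {(0, 1): [[0, 1], [1, 3]]}
all_consistent = all(is_strictly_arc_consistent(v, unary, pairwise) for v in unary)
x = labeling_from_unaries(unary)
gap = energy(unary, pairwise, x) - dual_bound(unary, pairwise)

# A tie in the pairwise minimum destroys strict arc-consistency.
tied_pairwise = {(0, 1): [[0, 0], [1, 3]]}
tied_consistent = is_strictly_arc_consistent(0, unary, tied_pairwise)
```

When every node passes the check, the independently chosen labels attain all unary and pairwise minima simultaneously, which is why the gap between energy and bound vanishes.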
Reconstructing labeling from reparametrization. Although there is no guarantee that the strict arc-consistency property holds for all nodes even with an optimal reparametrization, the rule (9) is still used to obtain an approximate minimizer for (1) with arbitrary, also non-optimal, reparametrizations. (A number of more sophisticated rules have been proposed, but they are based on (9) and reduce to it if strict arc-consistency holds for all nodes, see e.g. Ravikumar, Agarwal, and Wainwright 2010.) Moreover, for an optimal reparametrization $\phi$, the complementary slackness conditions imply (e.g. Werner 2007) that strict arc-consistency of a node $v$ guarantees an integer solution of the LP relaxation in $v$.
From the application point of view, an (approximate) solution (9) is typically considered good if most of the nodes satisfy the strict arc-consistency property. At the same time, unless strict arc-consistency holds for all nodes, there is in general no theoretical guarantee that $x_v$ obtained by (9) coincides with the corresponding coordinate of an optimal solution $x^*$, even if the node $v$ is strictly arc-consistent.
5 Detailed Algorithm Description
Let us consider Algorithm 2. It differs from Algorithm 1 above in several aspects. Instead of solving the relaxed problem (2) in the primal domain, it solves its dual formulation and resorts to the optimally reparametrized costs. Strict arc-consistency is used in place of integrality to form the initial partition, which is justified by the fact that strict arc-consistency is sufficient for integrality.
The reparametrization step in line 2 plays a crucial role for the whole method. Due to this step, solving the energy minimization problem on $\mathcal{A}$ becomes trivial because of its strict arc-consistency: it can be performed by selecting the best label in each node independently, according to (9). Therefore, there is no computational overhead of re-solving the problem on $\mathcal{A}$ in each iteration. Also, as more and more nodes from the initial subgraph $\mathcal{A}$ move over to the subgraph $\mathcal{B}$, their strict arc-consistency encourages the solution on $\mathcal{B}$ to coincide with the locally optimal labels. Moreover, instead of an optimal dual solution any reparametrization, also an approximate, non-optimal one, can be used. According to Proposition 1, this does not affect the correctness of Algorithm 2. Therefore, approximate solvers can be used in line 1 of the algorithm. However, the better the dual solution is, the larger the set of strictly arc-consistent nodes is, and therefore the lower the computational complexity of the ILP phase of the algorithm. Finally, reparametrization of the costs typically speeds up the ILP solver in line 6, as it serves as preprocessing.
6 Analysis of the Method
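Family of Tight Partitions. The proposition below states that if the sufficient optimality criterion (Proposition 1) of Algorithm 2 is fulfilled for a partition $(\mathcal{A}, \mathcal{B})$, then it holds as well for any other partition $(\mathcal{A}', \mathcal{B}')$ such that $\mathcal{V}_{\mathcal{B}} \subseteq \mathcal{V}_{\mathcal{B}'}$: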
Proposition 2.
This property shows that there are potentially many partitions that result in a tight bound, and it allows us to apply a greedy strategy for growing the subgraph $\mathcal{B}$ by adding all inconsistent nodes (violating Proposition 1) at once, as is done in line 11 of Algorithm 2.
Comparison to CombiLP. As mentioned above, the CombiLP method is very similar to ours, but uses a different optimality criterion. Below we show that our criterion is in a certain sense stronger than theirs. To this end, following [Savchynskyy et al. 2013], we introduce the notion of a boundary complement subgraph:
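Definition 2 (Savchynskyy et al. 2013).
Let $\mathcal{A}$ be an induced subgraph of $\mathcal{G}$. A subgraph $\mathcal{B}'$ is called boundary complement to $\mathcal{A}$ w.r.t. $\mathcal{G}$ if it is induced by the set $(\mathcal{V} \setminus \mathcal{V}_{\mathcal{A}}) \cup \partial\mathcal{V}_{\mathcal{A}}$, where $\partial\mathcal{V}_{\mathcal{A}}$ is the set of nodes in $\mathcal{A}$ incident to nodes outside $\mathcal{A}$.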
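The effect of graph density on the boundary complement can be illustrated with a few lines of code. The sketch assumes the reading of the definition in which the boundary complement is induced by all nodes outside the given subgraph together with those of its nodes that have a neighbor outside it. The function names and the two example graphs are hypothetical.

```python
from itertools import combinations

def boundary_nodes(edges, va):
    """Nodes of A that have at least one neighbor outside A."""
    va = set(va)
    return {a for u, v in edges for a, b in ((u, v), (v, u))
            if a in va and b not in va}

def boundary_complement_nodes(nodes, edges, va):
    """Node set inducing the boundary complement of A (assumed reading)."""
    return (set(nodes) - set(va)) | boundary_nodes(edges, va)

# Sparse case: a chain 0-1-2-3-4 with A = {0, 1, 2}.
chain_nodes = range(5)
chain_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
chain_complement = boundary_complement_nodes(chain_nodes, chain_edges, {0, 1, 2})

# Dense case: a complete graph on 4 nodes with A = {0, 1, 2}.
full_nodes = range(4)
full_edges = list(combinations(full_nodes, 2))
full_complement = boundary_complement_nodes(full_nodes, full_edges, {0, 1, 2})
```

On the chain the boundary complement adds a single node of the subgraph to its complement, whereas on the complete graph every node of the subgraph lies on the boundary, so the boundary complement covers the whole graph. A partition, in contrast, keeps the second part at a single node in both cases.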
The optimality criterion used in CombiLP reads:
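Theorem 1 (Savchynskyy et al. 2013).
Let $\mathcal{A}$ be a subgraph of $\mathcal{G}$ and $\mathcal{B}'$ be its boundary complement w.r.t. $\mathcal{G}$. Let $x^{\mathcal{A}}$ and $x^{\mathcal{B}'}$ be labelings minimizing $E_{\mathcal{A}}$ and $E_{\mathcal{B}'}$ respectively, and let all nodes $v \in \mathcal{V}_{\mathcal{A}}$ be strictly arc-consistent. Then from
$$x^{\mathcal{A}}_v = x^{\mathcal{B}'}_v \quad \text{for all } v \in \mathcal{V}_{\mathcal{A}} \cap \mathcal{V}_{\mathcal{B}'} \tag{10}$$
it follows that the labeling $x$ with coordinates
$x_v = x^{\mathcal{A}}_v$ for $v \in \mathcal{V}_{\mathcal{A}}$ and $x_v = x^{\mathcal{B}'}_v$ for $v \in \mathcal{V}_{\mathcal{B}'} \setminus \mathcal{V}_{\mathcal{A}}$ is optimal on $\mathcal{G}$.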
As can be seen from comparing Proposition 1 and Theorem 1, the main difference between the methods is that we use a partition of the graph $\mathcal{G}$, i.e. non-intersecting subgraphs, whereas the subgraphs in CombiLP are boundary complement and therefore intersect.
The following proposition states that the bounds produced by our method are at least as tight as those of CombiLP:
Proposition 3.
Let be a partition of a graph and be boundary complement for and . Let also , be optimal labelings on and . If the condition (10) holds for and , i. e. for all , then Proposition 1 holds for and as well, where is the restriction of to the set . In other words, for the same subgraph fulfillment of Theorem 1 implies fulfillment of Proposition 1.
7 Technical Details
Post-Processing of Reparametrization. The maximum of the dual objective $D$ is typically non-unique. Since $D$ is a concave function, the set of its maxima is convex and therefore contains either a unique element or a continuum. Unfortunately, not all optimal reparametrizations (or suboptimal ones corresponding to the same value of $D$) are equally good for our method. Moreover, different dual algorithms return different reparametrizations, and the fastest algorithm may not return an appropriate one.
Therefore, we developed a post-processing algorithm to turn an arbitrary reparametrization into a suitable one without decreasing the value of $D$. This algorithm consists of two steps: (i) several iterations of a message passing (dual block-coordinate ascent) algorithm, which accumulates weights in unary costs, and (ii) partial redistribution of unary costs between incident pairwise cost functions. This two-step procedure empirically turns most of the nodes where the LP relaxation (2) has an integer solution into strictly arc-consistent ones. The details of both steps are described in the supplement.
Higher Order Extensions. All discussed techniques are easily extended to the higher-order MAP-inference problem
(11) 
where the terms in the decomposition of the energy function may depend on 3, 4 or more nodes. The bound (6) in the higher-order case reads as
(12) 
where similar to in the pairwise case. Proposition 1 for the higher-order case turns into:
Proposition 4.
Let and . The lower bound (12) is tight if for all and it holds that where .
The proof follows the same reasoning as the proof of Proposition 1 and is omitted here.
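Table 1

dataset (avg. $|\mathcal{V}|$) | popt     | clp     | dclp
------------------------------ | -------- | ------- | -------
worms (558)                    | 100%     | 69.30%  | 26.08%
proteinfolding (37)            | 79.22%   | 100%    | 71.03%
colorseg (79k)                 | 12.10%   | 0.16%   | 0.06%
mrfstereo (138k)               | 45.19%   | 33.58%  | 33.49%
                               | (53.30%) | (0.45%) | (0.20%)
OnCallRostering (948)          | —        | 98.80%  | 65.68%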
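Table 2 (for each method: number of solved instances, running time)

dataset (#instances) | density | cpx solved | cpx time | tb2 solved | tb2 time | popttb2 solved | popttb2 time | clporig solved | clporig time | clptb2 solved | clptb2 time | dclptb2 solved | dclptb2 time
-------------------- | ------- | ---------- | -------- | ---------- | -------- | -------------- | ------------ | -------------- | ------------ | ------------- | ----------- | -------------- | ------------
worms (30)           | 10.6 %  | 1          | 54.7     | 13         | 8.3      | 13             | 8.0          | 15             | 14.2         | 17            | 6.9         | 25             | 5.8
                     |         |            |          |            |          |                |              |                |              |               |             | (17)           | (3.1)
proteinfolding (11)  | 100 %   | 2          | 48.5     | 11         | 1.1      | 11             | 1.7          | 10             | 16.8         | 11            | 0.9         | 11             | 0.8
colorseg (19)        | 0.007 % | 5          | 4.9      | 15         | 22.1     | 18             | 0.3          | 18             | 7.6          | 18            | 1.1         | 18             | 1.4
mrfstereo (3)        | 0.003 % | 0          | —        | 0          | —        | 1              | 0.9          | 2              | 46.9         | 2             | 3.2         | 2              | 2.3
OnCallRostering (3)  | 0.9 %   | 2          | 0.9      | 2          | 0.1      | —              | —            | —              | —            | 3             | 2.3         | 3              | 1.1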
8 Experimental Evaluation
Algorithms. In this section we compare our proposed algorithm with other related methods. As baselines we use CPLEX 12.6.2 [CPLEX, IBM 2014] and ToulBar2 0.9.8.0 [Cooper et al. 2010], where the former is a well-known commercial optimizer and the latter is one of the best dedicated branch-and-bound solvers for (1), see the comparison in [Hurley et al. 2016]. We used parameters and settings comparable to those used in [Hurley et al. 2016]. The two baselines are denoted by cpx and tb2 respectively. The original CombiLP [Savchynskyy et al. 2013] implementation is referred to as clporig. For a fair comparison, we modified it to make it compatible with arbitrary LP and ILP solvers, in particular by applying the reparametrization post-processing algorithm described above. The modified method, referred to as clp, is up to an order of magnitude faster than the original clporig (see Table 2). For the experiments with clp and dclp we used both CPLEX and ToulBar2 as ILP solvers. The corresponding variants of clp are denoted as clpcpx and clptb2 respectively, and similarly for dclp. Since the ToulBar2 variants (clptb2 and dclptb2) were superior to the CPLEX variants in all our tests, we will mainly discuss the former here (see the supplement for all results). TRW-S [Kolmogorov 2006] is used as a fast block-coordinate descent LP solver everywhere except for higher-order models. We used a fast implementation of the solver from [Shekhovtsov, Swoboda, and Savchynskyy 2015]. Only for higher-order examples do we resort to SRMP [Kolmogorov 2015], using the minimal or basic LP relaxation (for details see Kolmogorov 2015). We set the maximum number of TRW-S/SRMP iterations to . Furthermore, we tested the performance of a recent partial optimality technique [Shekhovtsov, Swoboda, and Savchynskyy 2015], which is denoted by popt. As this approach does not solve the whole problem, we run ToulBar2 on the reduced model and measure the total running time (popttb2).
We set the maximal running time for all methods to hour.
Datasets. We verify the performance of the algorithms on the following publicly available datasets: worms [Kainmueller et al. 2017], colorseg [Lellmann and Schnörr 2011], mrfstereo [Scharstein and Szeliski 2002], OnCallRostering [Stuckey et al. 2014] and proteinfolding [Yanover, Schueler-Furman, and Weiss 2008]. Each of these datasets is included to highlight specific strengths and weaknesses of the competing methods. The worms dataset (30 instances) serves as a prime example for our algorithm due to its relatively densely connected graph structure and a small duality gap. The mrfstereo (3 instances) and colorseg (19 instances) datasets consist of sparsely connected grid models and are used to compare performance to the CombiLP method clp. The proteinfolding dataset can be split into easy problems (many nodes, sparsely connected) and hard problems (only around 33–40 nodes, fully connected). In the following, we only consider the hard problems (11 instances in total). Last but not least, the dataset OnCallRostering (3 instances) is included as an example of higher-order models; it includes cliques of order four. Unfortunately, we were unable to convert other instances of this dataset from the benchmark [Hurley et al. 2016] because of a memory bottleneck in the conversion process. Apart from OnCallRostering and worms, all problem instances were taken from the OpenGM benchmark [Kappes et al. 2015].
Results. We compare and analyse the performance of our method in the following three settings: (i) targeted dense problems like the worms and proteinfolding datasets; (ii) sparse problems (mrfstereo and colorseg); and (iii) exemplary higher-order problems (OnCallRostering).
(i) dense models: On the worms dataset our method dclptb2 clearly outperforms its competitors, as Table 2 shows: dclptb2 solves 25 instances out of 30, whereas the next competitor, clptb2, solves only 17. Moreover, our solver is also more than times faster than clptb2 on average. This is due to the fact that the resulting ILP subproblem of dclp is much smaller than that of clp; see Figure 1 for a visual comparison. The partial optimality method is unable to reduce the problem (see Table 1) because of the infinite pairwise costs that disallow assigning the same label to different nodes. Figure 2 shows primal and dual bounds as a function of computational time for this dataset.
Although dclptb2 also outperforms all its competitors on the proteinfolding dataset, the improvement over clptb2 and tb2 is not as pronounced as for the worms dataset. This is because the final ILP subproblem of dclp covers a significant part of the whole graph (over on average). To satisfy its optimality criterion, dclp performs up to iterations with smaller ILP subproblems. In contrast, clp considers the whole graph as an ILP subproblem right at the very first iteration. Interestingly, even under these circumstances clptb2 outperforms tb2 and clpcpx outperforms cpx (see the supplement for details). The latter solves only 2 problem instances out of 11, whereas clpcpx is able to cope with . We attribute this to the reparametrization, which is performed by clp prior to passing the problem to cpx or tb2 and plays the role of an efficient presolving step.
(ii) sparse models: The sparse (grid-structured) datasets mrfstereo and colorseg, with about graph nodes each, are well suited for both the clp and dclp methods and are difficult for cpx and tb2. Both clp and dclp are able to solve all the problems except the largest one (teddy from the mrfstereo dataset, with over nodes and labels) in similar time. On colorseg the method clptb2 is somewhat faster, whereas dclptb2 requires less time on mrfstereo. This is due to the fact that dclp consistently produces smaller ILP subproblems (see Table 1 for a comparison), but clp may require fewer iterations because it starts with a larger ILP subproblem. Partial optimality popttb2 is the winner on the colorseg dataset: although its ILP subproblems are larger than those of clp and dclp, it runs an ILP solver only once. However, the results of popttb2 on mrfstereo are useful only to a limited extent: they are sufficient to solve only a single problem from that dataset, the simplest one (tsukuba). In contrast, dclptb2 and clptb2 solve two problem instances each.
(iii) higher-order models: The dataset OnCallRostering is included mainly to show the applicability of our method to higher-order models. Generally, higher-order models pose additional difficulties to solvers because they are intrinsically dense and the size of an ILP formulation of the problem grows exponentially with the problem order; therefore even small problems may not fit into the memory of an ILP solver. The dclp method again shows its advantage over clp, similarly to the case of the worms dataset: since the problems are intrinsically dense, the ILP subproblem for dclp is smaller, which results in a speed-up compared to clp. We also found tb2 and cpx to be quite efficient on this dataset, although they were able to solve only 2 problems out of 3.
9 Conclusions
We presented a new method suitable for efficiently solving large-scale MAP-inference problems. The prerequisite for efficiency is an “almost” tight LP relaxation, i.e. the non-strictly-arc-consistent subset of nodes should constitute only a small portion of the problem. In this case, it is not the size of the problem that is important, but only the size of its non-strictly-arc-consistent subproblem. Compared to previous works, our method is able to further reduce this size, which is especially notable if the underlying graph structure of the model is non-sparse. In the future, we plan to extend the method to a broader class of combinatorial problems.
Acknowledgement
This work was supported by the DFG grant “Exact Relaxation-Based Inference in Graphical Models” (SA 2640/1-1). We thank the Center for Information Services and High Performance Computing (ZIH) at TU Dresden for generous allocations of computer time.
References
 [Bergtholdt et al.2010] Bergtholdt, M.; Kappes, J.; Schmidt, S.; and Schnörr, C. 2010. A study of partsbased object class detection using complete graphs. International journal of computer vision 87(1):93–117.
 [Boros and Hammer2002] Boros, E., and Hammer, P. L. 2002. Pseudoboolean optimization. Discrete applied mathematics 123(1):155–225.
 [Cooper and Schiex2004] Cooper, M., and Schiex, T. 2004. Arc consistency for soft constraints. Artificial Intelligence 154(12):199–227.
 [Cooper et al.2008] Cooper, M. C.; de Givry, S.; Sanchez, M.; Schiex, T.; and Zytnicki, M. 2008. Virtual arc consistency for weighted csp. In AAAI, volume 8, 253–258.
 [Cooper et al.2010] Cooper, M. C.; de Givry, S.; Sánchez, M.; Schiex, T.; Zytnicki, M.; and Werner, T. 2010. Soft arc consistency revisited. Artificial Intelligence 174(7):449–478.
 [CPLEX, IBM2014] CPLEX, IBM. 2014. ILOG CPLEX 12.6 Optimization Studio.
 [de Givry and Katsirelos2017] de Givry, S., and Katsirelos, G. 2017. Clique cuts in weighted constraint satisfaction. In International Conference on Principles and Practice of Constraint Programming, 97–113. Springer.
 [Gurobi Optimization2016] Gurobi Optimization, I. 2016. Gurobi optimizer reference manual.
 [Hurley et al.2016] Hurley, B.; O’Sullivan, B.; Allouche, D.; Katsirelos, G.; Schiex, T.; Zytnicki, M.; and De Givry, S. 2016. Multi-language evaluation of exact solvers in graphical model discrete optimization. Constraints 21(3):413–434.
 [Kainmueller et al.2017] Kainmueller, D.; Jug, F.; Rother, C.; and Meyers, G. 2017. Graph matching problems for annotating C. elegans. http://dx.doi.org/10.15479/AT:ISTA:57. Accessed: 2017-09-10.
 [Kappes et al.2015] Kappes, J. H.; Andres, B.; Hamprecht, F. A.; Schnörr, C.; Nowozin, S.; Batra, D.; Kim, S.; Kausler, B. X.; Kröger, T.; Lellmann, J.; et al. 2015. A comparative study of modern inference techniques for structured discrete energy minimization problems. International Journal of Computer Vision 115(2):155–184.
 [Kolmogorov, Thapper, and Zivny2015] Kolmogorov, V.; Thapper, J.; and Zivny, S. 2015. The power of linear programming for general-valued CSPs. SIAM Journal on Computing 44(1):1–36.
 [Kolmogorov2006] Kolmogorov, V. 2006. Convergent tree-reweighted message passing for energy minimization. Pattern Analysis and Machine Intelligence, IEEE Transactions on 28(10):1568–1583.
 [Kolmogorov2015] Kolmogorov, V. 2015. A new look at reweighted message passing. IEEE transactions on pattern analysis and machine intelligence 37(5):919–930.
 [Koster, Van Hoesel, and Kolen1998] Koster, A. M.; Van Hoesel, S. P.; and Kolen, A. W. 1998. The partial constraint satisfaction problem: Facets and lifting theorems. Operations research letters 23(3):89–97.
 [Lellmann and Schnörr2011] Lellmann, J., and Schnörr, C. 2011. Continuous multiclass labeling approaches and algorithms. SIAM Journal on Imaging Sciences 4(4):1049–1096.
 [Li, Shekhovtsov, and Huber2016] Li, M.; Shekhovtsov, A.; and Huber, D. 2016. Complexity of discrete energy minimization problems. In European Conference on Computer Vision, 834–852. Springer.
 [Marinescu and Dechter2005] Marinescu, R., and Dechter, R. 2005. AND/OR branch-and-bound for graphical models. In IJCAI, 224–229.
 [Otten and Dechter2010] Otten, L., and Dechter, R. 2010. Toward parallel search for optimization in graphical models. In ISAIM.

 [Prusa and Werner2013] Prusa, D., and Werner, T. 2013. Universality of the local marginal polytope. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1738–1743.
 [Ravikumar, Agarwal, and Wainwright2010] Ravikumar, P.; Agarwal, A.; and Wainwright, M. J. 2010. Message-passing for graph-structured linear programs: Proximal methods and rounding schemes. Journal of Machine Learning Research 11(Mar):1043–1080.
 [Rother et al.2007] Rother, C.; Kolmogorov, V.; Lempitsky, V.; and Szummer, M. 2007. Optimizing binary mrfs via extended roof duality. In Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on, 1–8. IEEE.
 [Savchynskyy et al.2013] Savchynskyy, B.; Kappes, J. H.; Swoboda, P.; and Schnörr, C. 2013. Global MAP-optimality by shrinking the combinatorial search area with convex relaxation. In Advances in Neural Information Processing Systems, 1950–1958.
 [Scharstein and Szeliski2002] Scharstein, D., and Szeliski, R. 2002. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision 47(1-3):7–42.
 [Shekhovtsov, Swoboda, and Savchynskyy2015] Shekhovtsov, A.; Swoboda, P.; and Savchynskyy, B. 2015. Maximum persistency via iterative relaxed inference with graphical models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 521–529.
 [Shekhovtsov2014] Shekhovtsov, A. 2014. Maximum persistency in energy minimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1162–1169.
 [Shlezinger1976] Shlezinger, M. 1976. Syntactic analysis of twodimensional visual signals in the presence of noise. Cybernetics and systems analysis 12(4):612–628.
 [Sontag2007] Sontag, D. A. 2007. Cutting plane algorithms for variational inference in graphical models. Ph.D. Dissertation, Massachusetts Institute of Technology.
 [Stuckey et al.2014] Stuckey, P. J.; Feydy, T.; Schutt, A.; Tack, G.; and Fischer, J. 2014. The minizinc challenge 2008–2013. AI Magazine 35(2):55–60.
 [Swoboda et al.2016] Swoboda, P.; Shekhovtsov, A.; Kappes, J.; Schnörr, C.; and Savchynskyy, B. 2016. Partial Optimality by Pruning for MAP-Inference with General Graphical Models. IEEE Trans. Patt. Anal. Mach. Intell. 38(7):1370–1382.
 [Werner2007] Werner, T. 2007. A linear programming approach to max-sum problem: A review. Pattern Analysis and Machine Intelligence, IEEE Transactions on 29(7):1165–1179.
 [Yanover, SchuelerFurman, and Weiss2008] Yanover, C.; Schueler-Furman, O.; and Weiss, Y. 2008. Minimizing and learning energy functions for side-chain prediction. Journal of Computational Biology 15(7):899–911.
10 Supplementary Material for “Exact MAP-Inference by Confining Combinatorial Search with LP Relaxation”
Stefan Haller (University of Heidelberg), Paul Swoboda (IST Austria), Bogdan Savchynskyy (University of Heidelberg)
stefan.haller@iwr.uni-heidelberg.de
Proof of Proposition 1
Proof of Proposition 2
Lemma 1.
From the requirements of Proposition 1 it follows that .
Proof.
Since and , it holds that . From strict arcconsistency we know that and are determined by (9), hence for all .
Proof of Proposition 2:
Proof.
From the requirements of the Proposition we already know that and are the optimal assignment of and respectively. It remains to show that for all .
Applying (5) to results in . Due to if either or . In the first case it follows from , as for the optimal strictly arcconsistent label for it holds that and (Lemma 1). In the second case (Lemma 1) and from fulfillment of Proposition (1) for follows that .
Hence for and all requirements of Proposition (1) are fulfilled for . ∎
Proof of Proposition 3
Proof.
The prerequisites already assure that is optimal for , so it remains to show that is optimal for and that for all . The optimality of for follows trivially from and the fact that is optimal for , as for both and the optimal labeling is determined by (9). From Definition 2 we know that . In other words, all edges are covered by subgraph (note that ). Due to , for all it is true that . From the preconditions we know that for all , hence . ∎
Reparametrization PostProcessing
The details of both steps are described below.
(i) To obtain a fast post-processing algorithm, we modified one of the fastest dual methods, TRW-S of [Kolmogorov2006], which can also be seen as a special case of its higher-order counterpart SRMP [Kolmogorov2015]. For the sake of brevity we refer to the latter method, because its presentation is simpler and because it also works for higher-order models (see below). Our only modification consisted in reassigning the weights defined by expression (14) in [Kolmogorov2015] with the values
(13) 
We also performed the same reassignment of the weights (defined by expression (16) in [Kolmogorov2015]), with in place of . We refer to [Kolmogorov2015] for a detailed description of the notation and the algorithm itself. The only difference between (13) and the original expression (14) from [Kolmogorov2015] is an additional term added to the denominator of the expression in the upper line of (13). A nonzero value of this term leads to a redistribution of the labeling costs between unary cost functions. As a result, non-optimal labels get higher costs than those belonging to an optimal labeling. We empirically found the value to work well in practice.
(ii) It is a property of the (modified) SRMP method that locally optimal pairwise costs are always for any graph edge . The partial redistribution of unary costs between incident pairwise cost functions mentioned above was done as for all and .
Labelwise relative ILP size As the partial optimality techniques work on a labelwise basis, we use a labelwise measure for comparing the size of the final ILP subproblem. For popt we use formula (35) of [Shekhovtsov, Swoboda, and Savchynskyy2015] to compute the relative number of eliminated labels. Subtracting this value from 100% yields the values in Table 1. The final formula for popt reads
(14) 
For clp and dclp we evaluate the following expression to compute a value with the same semantics:
(15) 
As popt is a polynomial-time algorithm, it outputs a reduced model for every instance of the benchmark. In contrast, clp and dclp try to solve the NP-hard MAP-inference problem (1) and therefore do not terminate on every instance. As the maximal ILP subproblem is only defined once Proposition 1 holds, we assume the worst case and use 100% as the ILP subproblem size for unsolved instances.
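The two ways of aggregating ILP sizes over a benchmark can be sketched as follows. This is a minimal illustration and not the evaluation script used for the paper: the function names are hypothetical, `None` is an assumed encoding for an unsolved instance (shown as a dash in the tables), and the numbers are the percentage entries of the dclptb2 column in the first benchmark table of this supplement.

```python
from typing import Optional, Sequence


def mean_over_solved(sizes: Sequence[Optional[float]]) -> float:
    """Average ILP size (in percent) over solved instances only."""
    solved = [s for s in sizes if s is not None]
    if not solved:
        raise ValueError("no solved instance to average over")
    return sum(solved) / len(solved)


def mean_worst_case(sizes: Sequence[Optional[float]],
                    unsolved_size: float = 100.0) -> float:
    """Average ILP size where every unsolved instance counts as 100%."""
    if not sizes:
        raise ValueError("empty benchmark")
    filled = [unsolved_size if s is None else s for s in sizes]
    return sum(filled) / len(filled)


# dclptb2 ILP sizes (percent) of the first benchmark table, in row order;
# None marks an instance that was not solved.
DCLPTB2_SIZES = [
    19.8, 3.9, 5.3, 20.8, None, 5.3, 16.7, 16.8, None, None,
    14.6, 10.0, 13.4, None, 5.2, 20.2, 5.6, 10.4, 13.5, 15.9,
    8.9, 20.1, 0.7, 11.1, 20.1, None, 4.9, 10.7, 1.9, 7.9,
]

if __name__ == "__main__":
    print(f"solved only: {mean_over_solved(DCLPTB2_SIZES):.1f}%")
    print(f"worst case:  {mean_worst_case(DCLPTB2_SIZES):.1f}%")
```

Averaging over the 25 solved instances gives 11.3%, which agrees with the value printed in the average row of that table. Substituting 100% for the five unsolved instances, as described above for the worst-case convention, gives 26.1% for the same column.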
Complete benchmark table For lack of space we omitted some solvers from the experimental evaluation. The following tables show the results for each instance and each solver separately.
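instance  cpx time  tb2 time  popttb2 ILP size  popttb2 time  clporig ILP size  clporig time  clpcpx ILP size  clpcpx time  clptb2 ILP size  clptb2 time  dclpcpx ILP size  dclpcpx time  dclptb2 ILP size  dclptb2 time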

C18G1_2L1_1  —  —  100%  —  —  —  —  —  —  —  19.8%  58.1  19.8%  14.8 
cnd1threeL1_1213061  —  1.3  100%  1.1  34.1%  8.9  34.6%  1.7  34.6%  0.6  3.9%  0.5  3.9%  0.4 
cnd1threeL1_1228061  —  0.6  100%  1.2  36.7%  7.2  35.4%  1.8  35.4%  0.5  5.3%  0.4  5.3%  0.5 
cnd1threeL1_1229061  —  —  100%  —  —  —  —  —  —  —  20.8%  56.4  20.8%  20.5 
cnd1threeL1_1229062  —  —  100%  —  —  —  —  —  —  —  —  —  —  — 
cnd1threeL1_1229063  —  1.7  100%  1.3  31.7%  15.3  26.3%  6.3  26.3%  0.5  5.3%  0.6  5.3%  0.4 
eft3RW10035L1_0125071  —  —  100%  —  —  —  —  —  52.8%  51.2  16.7%  50.1  16.7%  32.4 
eft3RW10035L1_0125072  —  —  100%  —  —  —  —  —  —  —  16.8%  29.0  16.8%  1.9 
eft3RW10035L1_0125073  —  —  100%  —  —  —  —  —  —  —  —  —  —  — 
egl5L1_0606074  —  —  100%  —  —  —  —  —  —  —  —  —  —  — 
elt3L1_0503071  —  51.2  100%  55.5  61.5%  36.5  —  —  60.8%  2.3  14.6%  3.3  14.6%  0.7 
elt3L1_0503072  —  9.3  100%  4.3  53.2%  15.0  49.5%  10.6  49.5%  1.2  10.0%  0.7  10.0%  0.4 
elt3L1_0504073  —  —  100%  —  —  —  —  —  —  —  13.4%  1.7  13.4%  0.9 
hlh1fourL1_0417071  —  —  100%  —  —  —  —  —  —  —  —  —  —  — 
hlh1fourL1_0417075  —  1.8  100%  1.3  44.7%  11.5  45.8%  6.3  45.8%  0.6  5.2%  0.5  5.2%  0.5 
hlh1fourL1_0417076  —  —  100%  —  —  —  —  —  68.4%  52.8  20.2%  25.9  20.2%  13.0 
hlh1fourL1_0417077  —  28.8  100%  30.8  46.7%  7.1  46.7%  3.9  46.7%  0.7  5.6%  0.5  5.6%  0.5 
hlh1fourL1_0417078  —  3.5  100%  2.0  54.1%  17.8  53.0%  30.1  53.0%  1.0  10.4%  1.8  10.4%  0.5 
mir61L1_1228061  —  —  100%  —  —  —  —  —  —  —  13.5%  22.7  13.5%  5.1 
mir61L1_1228062  —  —  100%  —  —  —  —  —  —  —  —  —  15.9%  39.2 
mir61L1_1229062  —  —  100%  —  67.1%  18.0  67.5%  14.3  67.5%  1.1  8.9%  1.3  8.9%  0.6 
pha4A7L1_1213061  —  —  100%  —  —  —  —  —  —  —  20.1%  17.0  20.1%  7.1 
pha4A7L1_1213062  54.7  1.2  100%  1.0  8.4%  8.2  8.4%  0.4  8.4%  0.3  0.7%  0.3  0.7%  0.3 
pha4A7L1_1213064  —  —  100%  —  47.4%  20.9  40.8%  16.8  40.8%  1.4  11.1%  1.2  11.1%  0.6 
pha4B2L1_0125072  —  —  100%  —  —  —  —  —  —  —  20.1%  10.7  20.1%  2.3 
pha4I2L_0408071  —  —  100%  —  —  —  —  —  —  —  —  —  —  — 
pha4I2L_0408072  —  1.9  100%  1.3  42.5%  9.1  43.2%  4.2  43.2%  0.6  4.9%  0.5  4.9%  0.4 
pha4I2L_0408073  —  3.0  100%  1.8  60.2%  11.9  60.5%  7.6  60.5%  1.0  10.7%  1.2  10.7%  0.5 
unc54L1_0123071  —  1.5  100%  1.2  32.3%  10.8  32.3%  1.5  32.3%  0.6  1.9%  0.5  1.9%  0.4 
unc54L1_0123072  —  1.8  100%  1.3  53.9%  12.0  53.9%  4.2  53.9%  0.7  7.9%  0.6  7.9%  0.6 
average  54.7  8.3  100%  8.0  50.0%  14.0  42.7%  7.8  45.8%  6.9  11.2%  11.9  11.3%  5.8 
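
instance  cpx time  tb2 time  popttb2 ILP size  popttb2 time  clporig ILP size  clporig time  clpcpx ILP size  clpcpx time  clptb2 ILP size  clptb2 time  dclpcpx ILP size  dclpcpx time  dclptb2 ILP size  dclptb2 time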

1CKK  —  0.5  73.6%  1.2  100%  13.6  100%  36.0  100%  0.7  76.4%  13.2  76.4%  0.6 
1CM1  —  0.6  70.1%  1.2  100%  7.9  100%  3.1  100%  0.5  42.0%  1.4  42.0%  0.3 
1SY9  38.6  0.5  42.2%  0.8  100%  8.5  100%  6.4  100%  0.6  31.0%  0.7  31.0%  0.3 
2BBN  —  1.2  85.9%  2.7  100%  31.1  100%  40.1  100%  1.2  62.9%  9.9  62.9%  1.0 
2BCX  —  3.1  85.8%  3.6  100%  31.3  —  —  100%  1.6  —  —  94.1%  2.1 
2BE6  58.4  0.5  84.9%  1.1  100%  9.3  100%  13.5  100%  0.4  96.6%  7.9  96.6%  0.7 
2F3Y  —  2.7  86.3%  2.1  100%  —  —  —  100%  1.2  97.7%  50.8  97.7%  1.5 
2FOT  —  1.0  89.0%  1.5  —  18.5  100%  13.1  100%  0.8  69.7%  10.6  69.7%  0.7 
2HQW  —  0.6  82.0%  1.2  100%  10.5  100%  8.4  100%  0.8  68.2%  5.1  68.2%  0.5 
2O60  —  0.8  84.5%  1.9  100%  24.3  100%  7.9  100%  0.8  91.0%  8.1  91.0%  0.9 
3BXL  —  0.7  87.7%  1.6  100%  12.6  100%  5.2  100%  0.7  52.3%  2.4  52.3%  0.5 
average  48.4  1.1  79.2%  1.7  100%  21.0  100%  14.9  100%  0.9  68.8%  11.0  71.1%  0.8 
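
instance  cpx time  tb2 time  popttb2 ILP size  popttb2 time  clporig ILP size  clporig time  clpcpx ILP size  clpcpx time  clptb2 ILP size  clptb2 time  dclpcpx ILP size  dclpcpx time  dclptb2 ILP size  dclptb2 time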

tedgm  —  —  29.0%  —  —  —  —  —  —  —  —  —  —  — 
tsugm  —  —  6.7%  0.9  0.2%  35.0  0.2%  2.6  0.2%  1.5  0.1%  0.8  0.1%  0.7 
vengm  —  —  99.9%  —  —  58.7  0.7%  4.8  0.7%  4.8  0.3%  3.9  0.3%  3.8 
average  —  —  45.2%  0.9  0.2%  46.9  0.5%  3.7  0.5%  3.2  0.2%  2.4  0.2%  2.3 
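
instance  cpx time  tb2 time  popttb2 ILP size  popttb2 time  clporig ILP size  clporig time  clpcpx ILP size  clpcpx time  clptb2 ILP size  clptb2 time  dclpcpx ILP size  dclpcpx time  dclptb2 ILP size  dclptb2 time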

n4/clownfishsmall  —  23.7  8.9%  0.1  0.1%  5.4  0.1%  0.5  0.1%  1.0  0.1%  1.0  0.1%  1.0 
n4/cropssmall  —  31.1  6.8%  0.1  0%  0.7  0.1%  1.0  0.1%  1.0  0.1%  1.0  0.1%  1.0 
n4/fourcolors  2.3  9.5  30.4%  0.0  0.2%  4.8  0.4%  0.7  0.4%  0.7  0.1%  0.7  0.1%  0.7 
n4/lakesmall  —  14.4  8.4%  0.1  0%  0.2  0.0%  1.0  0.0%  1.0  0.0%  1.0  0.0%  1.0 
n4/palmsmall  —  39.3  5.2%  0.1  0%  0.8  0.1%  1.0  0.1%  1.0  0.1%  1.1  0.1%  1.0 
n4/penguinsmall  9.2  8.3  6.3%  0.0  0%  0.3  0.0%  0.8  0.0%  0.8  0.0%  0.8  0.0%  0.4 
n4/pfausmall  —  —  12.7%  2.1  0.7%  10.6  0.5%  1.6  0.5%  0.8  0.4%  1.6  0.4%  1.3 
n4/snail  2.1  0.5  26.7%  0.0  0.1%  2.9  0.1%  0.9  0.1%  0.8  0.1%  0.8  0.1%  0.8 
n4/strawberryglass2small  —  38.9  5.8%  0.1  0%  4.1  0.0%  0.9  0.0%  0.8  0.0%  0.9  0.0%  0.9 
n8/clownfishsmall  —  12.4  8.9%  0.2  0.2%  11.3  0.2%  2.3  0.2%  2.3  0.1%  2.3  0.1%  2.1 
n8/cropssmall  —  54.0  6.8%  0.7  0.2%  11.6  0.4%  2.6  0.4%  2.6  0.2%  2.5  0.2%  2.5 
n8/fourcolors  7.1  15.0  30.4%  0.1  0.5%  5.9  0.6%  1.5  0.6%  1.5  0.1%  1.6  0.1%  1.4 
n8/lakesmall  —  14.0  8.4%  0.1  0.1%  11.8  0.1%  2.0  0.1%  2.1  0.1%  2.0  0.1%  2.1 
n8/palmsmall  —  —  5.3%  0.9  0.3%  27.1  0.3%  3.2  0.3%  3.1  0.2%  3.5  0.2%  2.0 
n8/penguinsmall  —  14.1  6.3%  0.2  0%  3.4  0.0%  1.5  0.0%  1.8  0.0%  1.8  0.0%  1.8 
n8/pfausmall  —  —  8.8%  1.1  0.3%  12.1  0.3%  2.2  0.3%  2.1  0.2%  2.1  0.2%  2.2 
n8/snail  3.9  0.7  26.8%  0.0  0.1%  12.1  0.1%  1.6  0.1%  1.6  0.1%  1.6  0.1%  1.6 
n8/strawberryglass2small  —  55.6  5.9%  0.6  0.4%  12.5  0.4%  2.3  0.4%  2.2  0.2%  2.3  0.2%  2.2 
average  4.9  22.1  12.1%  0.3  0.2%  7.6  0.2%  1.5  0.2%  1.1  0.1%  1.6  0.1%  1.4 
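
instance  cpx time  tb2 time  popttb2 ILP size  popttb2 time  clporig ILP size  clporig time  clpcpx ILP size  clpcpx time  clptb2 ILP size  clptb2 time  dclpcpx ILP size  dclpcpx time  dclptb2 ILP size  dclptb2 time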

10s50d  —  —  —  —  —  —  —  —  99.9%  7.0  —  —  42.8%  3.4 
4s10d  0.0  0.0  —  —  —  —  97.7%  0.1  97.7%  0.0  71.9%  0.1  71.9%  0.0 
4s23d  1.7  0.0  —  —  —  —  99.0%  8.1  99.0%  0.0  82.5%  24.1  82.5%  0.1 
average  0.9  0.1  —  —  —  —  98.9%  4.1  98.9%  2.3  77.2%  12.4  65.7%  1.1 