Optimization models in the general form of minimizing a partially separable function of discrete variables, known as energy minimization, weighted/valued constraint satisfaction or max-sum labeling, have proved useful in many areas. The function has the form of an energy of a graphical model (Wainwright and Jordan, 2008), as used to model a variety of structured statistical recognition problems. When variables take only two values, 0 or 1, the problem is known as pseudo-Boolean optimization or 0-1 polynomial programming. Problems where terms (summands) involve at most two 0-1 variables at a time are called quadratic. We consider the general case, where terms may couple more than two variables at a time (higher order) and variables can take more than two values (multilabel).
One major trend for performing inference in graphical models is represented by graph cut methods. The basic capability is essentially to solve a binary pairwise submodular problem, e.g., image segmentation (Greig et al., 1989), by reduction to a minimum cut / maximum flow problem. For the latter, many efficient algorithms exist and their running time is experimentally near linear for typical vision problems (Boykov and Kolmogorov, 2004). This basic method was extended to submodular multilabel problems (Ishikawa, 2003; Schlesinger and Flach, 2006), to general multilabel problems by solving for an optimized crossover between two candidate solutions at a time (Boykov et al., 1999), to higher-order 0-1 models reducible to a graph cut (Kolmogorov and Zabih, 2004; Freedman and Drineas, 2005) and to combinations of higher order and multilabel (Ladicky et al., 2010; Delong et al., 2012).
Another technique that can nowadays be considered a basic graph cut method is the roof dual relaxation (Boros and Hammer, 2002), known in computer vision as quadratic pseudo-Boolean optimization (QPBO) (Kolmogorov and Rother, 2007). It allows one to find a partial optimal solution to a non-submodular binary problem and reduces to finding a minimum cut in a specially constructed network (Boros et al., 1991). It can be interpreted (Kolmogorov, 2012a) as solving a submodular relaxation of the initial problem. This basic method is again extended to multilabel problems by solving crossover problems (Lempitsky et al., 2008) and to general higher order 0-1 problems by reduction (quadratization) techniques expressing the function as a quadratic function with auxiliary variables (Ishikawa, 2011; Fix et al., 2011; Boros and Gruber, 2012).
Another direction of extending graph cuts to higher order models relies on minimization of more general submodular functions. Several efficient max-flow based algorithms have been proposed (Arora et al., 2012; Kolmogorov, 2012b) for minimization of a sum of submodular functions (SoS). A natural extension of QPBO is represented by submodular and bisubmodular relaxations (Kolmogorov, 2012a; Kahl and Strandmark, 2012).
Arguably, linear programming (LP) is a much more costly tool than computing a minimum cut. Yet, it provides theoretical insight into many methods (Komodakis and Tziritas, 2007; Komodakis et al., 2008) and solvers have been developed that can address (sometimes approximately) large scale problems. Dual decomposition methods (Schlesinger and Giginyak, 2007; Komodakis et al., 2011) or dual block-descent methods, in particular TRW-S (Kolmogorov, 2006), are competitive with graph cut based methods in terms of speed and quality. There are extensions of these specialized LP methods to higher order models (Kolmogorov and Schoenemann, 2012; Kolmogorov, 2014). Smoothing (Savchynskyy et al., 2011) and proximal (Ravikumar et al., 2010) methods are scalable and offer a theoretically guaranteed convergence speed. Cutting plane approaches (Sontag et al., 2012; Werner, 2008) are used to tighten the relaxation adaptively to the problem.
One drawback of relaxation based methods is that the final discrete solution is obtained by so-called rounding schemes and often appears inferior to solutions obtained by graph cut methods, which stay feasible in the discrete space. Even in the case when many of the relaxed variables take integer values in the optimal relaxed solution, a fundamental problem remains that they may not take the same integer values in the optimal discrete solution. Therefore, unless the relaxation is tight, a local rounding technique cannot provide any guarantees for general models. The situation is dramatically different when we consider quadratic pseudo-Boolean functions. There, all variables that are integer in the relaxation correspond to at least one globally optimal discrete solution (Nemhauser and Trotter, 1975; Hammer et al., 1984). This property of the relaxation is called persistency. For general 0-1 polynomial problems persistency was studied by Lu and Williams (1987) and Adams et al. (1998). In their terminology persistency is associated with relaxations and is a property of the relaxed solution as a whole. In this work we call any partial assignment of a subset of discrete variables persistent if it can be provably extended to a globally optimal solution based on the properties of the relaxation or any other sufficient condition. Success of relaxation based exact methods such as (Savchynskyy et al., 2013) on computer vision and machine learning problems suggests that often a large part of the relaxed solution is integral. In this case we are interested in determining the largest subset of such variables that is persistent.
The present work was to a large extent inspired by simple local sufficient conditions proposed in bioinformatics under the name dead end elimination (DEE) (Desmet et al., 1992; Goldstein, 1994). In computer vision, several methods were proposed that identify a persistent assignment directly in the multi-label setting. These are methods by Kovtun (2003, 2011) and Swoboda et al. (2013, 2014a). The method of Swoboda et al. (2014a) is applicable with a general polyhedral relaxation and it maximizes the subset of persistent variables for their sufficient condition (discussed in §4.5).
In the case of 0-1 variables there are several different techniques. Adams et al. (1998) proposed a sufficient condition on dual multipliers to prove persistency of the integral part of the relaxation. Quadratization techniques by Ishikawa (2011); Fix et al. (2011); Boros and Gruber (2012) introduce auxiliary variables in order to reduce the function to a quadratic form and infer persistency from the QPBO method. Lu and Williams (1987) generalized the roof duality approach to higher order by using a higher order linear relaxation. Kolmogorov (2012a) generalized both QPBO and the construction by Lu and Williams (1987) by proposing discrete submodular and bisubmodular relaxations. He argues that the key property of QPBO that needs to be generalized is the existence of a totally half-integral optimal solution to the relaxation, i.e., with values in $\{0, \tfrac{1}{2}, 1\}$. He characterized all totally half-integral relaxations as bisubmodular relaxations. Finding a good (bi)submodular relaxation appears to be a challenging problem. To our knowledge it was only resolved for the special case of mincut-reducible relaxations (Kahl and Strandmark, 2012; Strandmark, 2012), and even in this case it requires solving a series of linear programs. Even though the relaxed problem itself can be efficiently optimized (in particular when it is a sum of submodular functions), having a sound persistency result at a comparable computation cost is an open problem. No theoretical comparison seems to be possible between (bi)submodular relaxations and quadratization techniques (Kolmogorov, 2012a).
In this work we settle the persistency capabilities achievable with a general polyhedral relaxation. The previously known results are in a certain sense unique, relying on a specific sufficient condition or on a specific type of the relaxation. We show that persistency guarantees are not that rare. To any polyhedral relaxation we associate clear sufficient conditions for persistency. We propose a polynomial time method to determine the largest strongly persistent subset of variables according to the sufficient condition. The method sets up a linear program connected to the given relaxation polytope and maximizes the number of strongly persistent variables. In comparison to QPBO-based or submodularity-based techniques, we employ a more costly optimization tool, but gain the following advantages:
the new sufficient condition generalizes a wide variety of existing methods that span across different fields of research and apply different techniques;
it is possible to pose formally and solve (under certain restrictions) the problem of determining the largest subset of persistent variables;
the maximum under the proposed general sufficient condition is guaranteed to be at least as good as that of any of the individual methods or their combinations;
the method is invariant to the permutation of labels and reparametrization of the problem as long as the relaxation is invariant;
persistent assignments form a hierarchy when tightening the relaxation.
The author’s previous work (Shekhovtsov, 2013, 2014a) considered only pairwise models and the standard LP relaxation. This paper generalizes to higher order and arbitrary polyhedral relaxations, gives more complete proofs of some properties and establishes comparisons with a novel multilabel method (Swoboda et al., 2014a) and higher-order 0-1 methods (Kolmogorov, 2012a; Ishikawa, 2011; Fix et al., 2011; Adams et al., 1998).
In §2 we propose a general approach to persistency with a general polyhedral relaxation. This includes the proposed linear program formulation of maximum persistency and general properties of the problem. In §3 we consider standard LP relaxations and specialize the construction for this case, where many properties simplify. In §4 we propose a theoretical comparison between the proposed framework and other approaches. In §5 we validate our theoretical findings experimentally and compare performance on small random problems. §6 contains the conclusion and discussion, and §A contains proofs.
When generalizing to higher order models many statements and proofs simplify if we use a properly defined notion of the empty sum, the empty product and the empty Cartesian product. They are respectively: $0$, $1$ and $\{\emptyset\}$. The inclusion $\subset$ is non-strict and $\subsetneq$ is strict. LHS refers to the left hand side of an equation. $[\![\cdot]\!]$ denotes the Iverson bracket. $\operatorname{argmin}$ is the set of minimizers. Sets $\mathbb{R}$, $\mathbb{R}_+$ denote real and non-negative real numbers and $\mathbb{B} = \{0, 1\}$ is the Boolean domain. A composition of functions is denoted as $f \circ g$. Finally, a polytope is assumed to be convex but may be unbounded and a polyhedron means the same as a polytope.
1.2 Energy Minimization
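A hypergraph $(\mathcal{V}, \mathcal{E})$ is given by the set of nodes $\mathcal{V}$ and the set of hyperedges $\mathcal{E} \subset 2^{\mathcal{V}}$. We assume that $\mathcal{V}$ is totally ordered and each hyperedge $c \in \mathcal{E}$ is identified with the tuple of elements of $c$ ordered according to the total order of $\mathcal{V}$. We will further assume that $\emptyset \in \mathcal{E}$ and $\{s\} \in \mathcal{E}$ for all $s \in \mathcal{V}$. Let $\mathcal{X}_s$ be a finite set of labels associated to a node $s \in \mathcal{V}$. For a subset of nodes $c \subset \mathcal{V}$ the set $\mathcal{X}_c$ denotes the Cartesian product $\prod_{s \in c} \mathcal{X}_s$ in the order defined on $c$, and $\mathcal{X} = \mathcal{X}_{\mathcal{V}}$. The assignment of labels to all nodes, $x \in \mathcal{X}$, is called a labeling. Let $x_c$ denote the restriction of $x$ to $c$ (thus $x_s$ is just a single coordinate) and $x_\emptyset = \emptyset$. Let us define the following functions (terms):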
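$f_c \colon \mathcal{X}_c \to \mathbb{R}, \quad c \in \mathcal{E}$ (general hyperedge term)
The special cases read
$f_\emptyset \in \mathbb{R}$ (constant term), $f_s \colon \mathcal{X}_s \to \mathbb{R}$ (unary term), $f_{st} \colon \mathcal{X}_{st} \to \mathbb{R}$ (pairwise / 1st order term)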
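and so on. The constant term $f_\emptyset$ is nothing but a single number. The energy function $E_f \colon \mathcal{X} \to \mathbb{R}$ is defined by
$$E_f(x) = \sum_{c \in \mathcal{E}} f_c(x_c).$$
It is a partially separable function of discrete variables $x_s$, $s \in \mathcal{V}$. In this paper we will use a graphical notation of the energy explained in Figure 1.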
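As a minimal sketch of the definition above, the following evaluates an energy given as a sum of hyperedge terms and minimizes it exhaustively. The toy model, the dictionary layout and the names `terms`, `energy` and `minimize` are illustrative assumptions, not part of the paper; exhaustive search is only viable for tiny instances.

```python
from itertools import product

# Hypothetical toy model: 3 nodes with 2 labels each. Keys are hyperedges
# (tuples of node indices in increasing order); values map a tuple of labels
# on that hyperedge to a cost.
terms = {
    (): {(): 1.0},                                   # constant term
    (0,): {(0,): 0.0, (1,): 2.0},                    # unary terms
    (1,): {(0,): 1.0, (1,): 0.0},
    (2,): {(0,): 0.0, (1,): 0.5},
    (0, 1): {(a, b): float(a != b) for a, b in product((0, 1), repeat=2)},
    (0, 1, 2): {x: 3.0 * x[0] * x[1] * x[2] for x in product((0, 1), repeat=3)},
}
num_labels = (2, 2, 2)


def energy(terms, x):
    """Sum over hyperedges c of the term evaluated at the restriction of x to c."""
    return sum(table[tuple(x[s] for s in c)] for c, table in terms.items())


def minimize(terms, num_labels):
    """Exhaustive minimization: returns the optimal value and all minimizers."""
    labelings = list(product(*(range(k) for k in num_labels)))
    best = min(energy(terms, x) for x in labelings)
    return best, [x for x in labelings if energy(terms, x) == best]
```

The empty hyperedge `()` carries the constant term, which matches the convention that the empty Cartesian product has exactly one element.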
The general energy minimization problem is NP-hard to approximate (see, e.g., the inapproximability of the traveling salesman problem; Orponen and Mannila, 1990). On the other hand, there are tractable subclasses. Works by Thapper and Živný (2012, 2013) and Kolmogorov (2013) characterized all languages of energy functions with terms from a fixed finite set and unrestricted structure. They showed that there are no tractable languages other than those that can be solved by the basic LP relaxation (defined in §3), which proves that the relaxation is a universal and powerful technique.
1.3 General Polyhedral Relaxation
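In this section we embed the energy minimization problem into the Euclidean space. A labeling $x$ is represented as a 0-1 vector in order to linearize the energy and write it as a scalar product of this vector with the cost vector $f$ consisting of all components $f_c(x_c)$ for $c \in \mathcal{E}$, $x_c \in \mathcal{X}_c$. According to these components let us define the following set of indices: $\mathcal{I} = \{(c, x_c) \mid c \in \mathcal{E},\ x_c \in \mathcal{X}_c\}$. The embedding $\delta \colon \mathcal{X} \to \mathbb{R}^{\mathcal{I}}$ is defined by its components
$$\delta(x)_c(x'_c) = [\![ x_c = x'_c ]\!].$$
The special cases read
$$\delta(x)_\emptyset = 1, \qquad \delta(x)_s(x'_s) = [\![ x_s = x'_s ]\!], \qquad \delta(x)_{st}(x'_{st}) = [\![ x_{st} = x'_{st} ]\!],$$
and so on. Let $\langle \cdot, \cdot \rangle$ denote the scalar product in $\mathbb{R}^{\mathcal{I}}$. We can write the energy using the embedding as a linear function:
$$E_f(x) = \langle f, \delta(x) \rangle.$$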
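The embedding is illustrated in Figure 2. The energy minimization can be expressed as:
$$\min_{x \in \mathcal{X}} E_f(x) = \min_{\mu \in \delta(\mathcal{X})} \langle f, \mu \rangle = \min_{\mu \in \mathcal{M}} \langle f, \mu \rangle,$$
where $\delta(\mathcal{X})$ is the image of the set of labelings, i.e., the set of corresponding points in $\mathbb{R}^{\mathcal{I}}$, and $\mathcal{M} = \operatorname{conv} \delta(\mathcal{X})$ is their convex hull, called the marginal polytope (Wainwright and Jordan, 2008). The second equality follows from the fact that a convex combination of solutions is also a solution. Polytope $\mathcal{M}$ has in general exponentially many facets. A relaxation of the problem is obtained by replacing $\mathcal{M}$ with an outer approximation $\Lambda \supset \mathcal{M}$:
$$\min_{\mu \in \Lambda} \langle f, \mu \rangle. \tag{7}$$
A vector $\mu \in \Lambda$ will be called a relaxed labeling. We will consider polyhedral relaxations of the following general form:
$$\Lambda = \{ \mu \in \mathbb{R}^{\mathcal{I}} \mid A\mu \ge 0,\ \mu_\emptyset = 1 \}, \tag{8}$$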
where we assume that $A$ is such that $\Lambda$ is bounded and $\Lambda \supset \mathcal{M}$. Since $\mathcal{M}$ is non-empty, it follows that $\Lambda$ is non-empty. By these assumptions, relaxation (8) is a feasible and bounded linear program. Note that general inhomogeneous equality and inequality constraints can be represented in this form by utilizing the component $\mu_\emptyset$. The dual problem to (7) and the conical hull of $\Lambda$ are expressed conveniently as follows. Recall that for a convex set its conical hull is the set:
The conical hull of a relaxation polytope (in the form (8), non-empty and bounded) is obtained by dropping the constraint :
The linear program (7) and its dual are expressed as
where vector is the basis vector for the component and the equality between the primal and the dual formulations holds because the primal problem is feasible and bounded. Let us introduce the notation . Later on, when we consider equality constraints of the form , the vector will obtain the meaning of an equivalent problem and for now it is just an abbreviation.
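The embedding of this section can be sketched in a few lines. The code below builds the index set of pairs (hyperedge, local labeling), maps a labeling to its 0-1 indicator vector and checks that the energy equals the scalar product with the cost vector. All names (`index_set`, `embed`, `toy_cost`) and the toy costs are assumptions made for illustration.

```python
from itertools import product


def index_set(hyperedges, num_labels):
    """All pairs (c, x_c): one coordinate per hyperedge and local labeling."""
    return [(c, xc) for c in hyperedges
            for xc in product(*(range(num_labels[s]) for s in c))]


def embed(x, index):
    """0-1 vector whose coordinate (c, x_c') is 1 iff x restricted to c equals x_c'."""
    return [1 if tuple(x[s] for s in c) == xc else 0 for c, xc in index]


def toy_cost(c, xc):
    """Arbitrary illustrative cost for the component (c, x_c)."""
    return len(c) + 0.5 * sum(xc) - (1.0 if len(set(xc)) == 1 else 0.0)


hyperedges = [(), (0,), (1,), (0, 1)]       # constant, two unary, one pairwise
num_labels = (2, 3)
index = index_set(hyperedges, num_labels)
f = [toy_cost(c, xc) for c, xc in index]    # cost vector aligned with the index


def energy_direct(x):
    """Energy as a sum of terms, without the embedding."""
    return sum(toy_cost(c, tuple(x[s] for s in c)) for c in hyperedges)


def energy_linear(x):
    """Energy as the scalar product of the cost vector with the embedded labeling."""
    return sum(fi * di for fi, di in zip(f, embed(x, index)))
```

Each embedded labeling has exactly one nonzero coordinate per hyperedge, so its coordinates sum to the number of hyperedges.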
2 Maximum Persistency
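A partial assignment $y = (y_s \mid s \in \mathcal{A})$, where $\mathcal{A} \subset \mathcal{V}$, is called weakly persistent if there exists an optimal solution $x$ such that $x_{\mathcal{A}} = y$. In other words, $y$ can be extended to a global solution. Partial assignment $y$ is called strongly persistent if $x_{\mathcal{A}} = y$ holds for all optimal solutions $x$.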
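The two notions can be illustrated by brute force on a tiny problem. This is only a sketch of the definitions under assumed names (`toy_energy`, `is_weakly_persistent`, `is_strongly_persistent`); it enumerates all optimal labelings, which is exactly what a practical method must avoid.

```python
from itertools import product


def toy_energy(x):
    """Hypothetical 3-node binary energy with two optimal labelings."""
    return 2 * x[0] + x[2] + (x[0] != x[1]) + (1 - x[1])


def optimal_labelings(energy, num_labels):
    """All global minimizers, found by exhaustive enumeration."""
    labelings = list(product(*(range(k) for k in num_labels)))
    best = min(energy(x) for x in labelings)
    return [x for x in labelings if energy(x) == best]


def agrees(x, partial):
    """True if labeling x coincides with the partial assignment (dict node -> label)."""
    return all(x[s] == label for s, label in partial.items())


def is_weakly_persistent(partial, optima):
    """Some optimal labeling extends the partial assignment."""
    return any(agrees(x, partial) for x in optima)


def is_strongly_persistent(partial, optima):
    """Every optimal labeling extends the partial assignment."""
    return all(agrees(x, partial) for x in optima)


optima = optimal_labelings(toy_energy, (2, 2, 2))
```

Here the optimal labelings differ only in node 1, so fixing nodes 0 and 2 to label 0 is strongly persistent, while fixing node 1 to either label is only weakly persistent.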
It may seem that there are no practical reasons to distinguish strongly and weakly persistent partial assignments as long as they allow us to simplify the problem. However, it will become clear later that they have different theoretical properties leading to polynomially solvable versus NP-hard maximum persistency problems. It turns out that strong persistency is more tractable, whereas proofs are generally easier to obtain in the weak form and most results in the literature deliver weak persistency.
In the case of quadratic pseudo-Boolean functions the roof dual relaxation (Boros and Hammer, 2002) is persistent: for any relaxed solution its integral part defines a partial assignment which is optimal to the discrete problem. Moreover, for any labeling , not necessarily optimal, replacing part of on with , the overwrite operation, denoted in (Boros and Hammer, 2002) by , has the following autarky property:
illustrated in Figure 4. We will generalize this property to the multilabel setting.
2.1 Improving Mapping
The overwrite operation discussed above can be represented by a discrete mapping . The following generalization of autarky to an arbitrary mapping is proposed.
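A mapping $p \colon \mathcal{X} \to \mathcal{X}$ is called (weakly) improving for $f$ if
$$E_f(p(x)) \le E_f(x) \quad \text{for all } x \in \mathcal{X}, \tag{12}$$
and strictly improving if
$$p(x) \ne x \;\Rightarrow\; E_f(p(x)) < E_f(x) \quad \text{for all } x \in \mathcal{X}. \tag{13}$$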
The idea of the improving mapping is illustrated in Figure 4. It easily follows from the definition that if $p$ is improving then there exists an optimal solution $x \in p(\mathcal{X})$ and if $p$ is strictly improving then all optimal solutions are contained in $p(\mathcal{X})$. In this way an improving mapping reduces the search space from $\mathcal{X}$ to $p(\mathcal{X})$.
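The improving property can be checked directly, at exponential cost, by testing every labeling. The sketch below does this for overwrite-style maps on a hypothetical toy energy; `toy_energy`, `overwrite` and `is_improving` are illustrative names, and the exhaustive check is a stand-in for the definition rather than a practical verification procedure.

```python
from itertools import product


def toy_energy(x):
    """Hypothetical 3-node binary energy."""
    return 2 * x[0] + x[2] + (x[0] != x[1]) + (1 - x[1])


def overwrite(node, label):
    """Map that sets one node to a fixed label and leaves the others unchanged."""
    def p(x):
        return tuple(label if s == node else xs for s, xs in enumerate(x))
    return p


def is_improving(energy, p, num_labels, strict=False):
    """Exhaustively test the weak (or strict) improving property of the map p."""
    for x in product(*(range(k) for k in num_labels)):
        px = p(x)
        if strict:
            # Every labeling that is changed must get a strictly lower energy.
            if px != x and not energy(px) < energy(x):
                return False
        elif energy(px) > energy(x):
            return False
    return True
```

On this toy energy, overwriting node 0 with label 0 is strictly improving, overwriting node 1 with label 1 is improving but not strictly (it maps one optimum to another), and overwriting node 0 with label 1 is not improving.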
We will consider node-wise mappings, of the form $p(x)_s = p_s(x_s)$, where $p_s \colon \mathcal{X}_s \to \mathcal{X}_s$. Furthermore, we restrict ourselves to idempotent mappings, i.e., satisfying $p \circ p = p$. This restriction is without loss of generality. Indeed, for an improving node-wise mapping $p$ its compositional power $p^n$ will be idempotent for some $n$ (a suitable power turns all cycles in the map to identity) and provides equally good or better reduction with $p^n(\mathcal{X}) \subset p(\mathcal{X})$. Idempotent maps have the following two properties. Let $\mathcal{X}$ be a set and $p \colon \mathcal{X} \to \mathcal{X}$ idempotent.
If then no is mapped to ;
For the restriction of to is the identity map and there holds ;
It follows that knowing an improving mapping , we can eliminate labels for which and there will remain at least one global minimizer of .
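The passage from an arbitrary label map to an idempotent one can be sketched as follows: compose the map with itself until the result is a fixed point of squaring. Maps on labels 0..K-1 are stored as tuples of images; the names `compose`, `idempotent_power` and `eliminated` are assumptions for illustration. Powers are increased one at a time, since repeated squaring alone could skip the required exponent when the map contains an odd-length cycle.

```python
def compose(q, p):
    """Label map i -> q(p(i)), with maps stored as tuples of images."""
    return tuple(q[j] for j in p)


def idempotent_power(p):
    """Smallest compositional power of p that is idempotent."""
    q = p
    while compose(q, q) != q:
        q = compose(q, p)
    return q


def eliminated(q):
    """Labels moved by the map, i.e. the labels it eliminates."""
    return [i for i, qi in enumerate(q) if qi != i]


# Example with a 2-cycle (1 <-> 2) and a tail 3 -> 0 -> 1.
p = (1, 2, 1, 0)
q = idempotent_power(p)
```

For the example, `q` fixes labels 1 and 2 and eliminates labels 0 and 3; no label is mapped onto an eliminated one, and the image of `q` is contained in the image of `p`.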
Given a mapping , the verification of the improving property (12) is NP-hard since already in the quadratic pseudo-Boolean case the verification of autarky property (11) is NP-hard (Boros et al., 2006). A tractable sufficient condition will be constructed by embedding the mapping into the space and applying the relaxation there.
2.2 Relaxed Improving Mapping
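A linear extension of $p \colon \mathcal{X} \to \mathcal{X}$ is a linear mapping $P \colon \mathbb{R}^{\mathcal{I}} \to \mathbb{R}^{\mathcal{I}}$ that satisfies
$$\delta(p(x)) = P\,\delta(x) \quad \text{for all } x \in \mathcal{X}. \tag{14}$$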
See Figure 5 for illustration. Avoiding the discussion of uniqueness (when a linear extension exists, its restriction to the affine hull of the embedded labelings is unique), we will only use the following linear extension for a node-wise mapping , which will be denoted . The linear extension is defined by
These coefficients should be understood as a “matrix” representation of . To verify that (14) holds true we simply substitute an integer labeling and expand the components as
Using the linear extension of we can write
This allows us to express the condition of improving mapping (12) as
or equivalently, fully in the embedding, as
Taking convex combinations in (20), we obtain an equivalent condition
Thus we have linearized the inequalities necessary for an improving mapping. However, the marginal polytope is not tractable. We introduce a sufficient condition by requiring that the same inequality (21) is satisfied over a larger (tractable) polytope .
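A small numerical check of the linear extension property may help. The sketch assumes one natural choice of extension for a node-wise map: the 0-1 matrix that moves the mass of each local labeling to its image under the map within the same hyperedge. The names `linear_extension`, `matvec` and the toy map are illustrative assumptions.

```python
from itertools import product


def index_set(hyperedges, num_labels):
    """All pairs (c, x_c): one coordinate per hyperedge and local labeling."""
    return [(c, xc) for c in hyperedges
            for xc in product(*(range(num_labels[s]) for s in c))]


def embed(x, index):
    """0-1 indicator vector of the labeling x."""
    return [1 if tuple(x[s] for s in c) == xc else 0 for c, xc in index]


def linear_extension(p, index):
    """Matrix with entry 1 at row (c, x_c), column (c, x_c') iff p maps x_c' to x_c."""
    def image(c, xc):
        return tuple(p[s][label] for s, label in zip(c, xc))
    return [[1 if c == c2 and image(c2, xc2) == xc else 0 for c2, xc2 in index]
            for c, xc in index]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


hyperedges = [(), (0,), (1,), (0, 1)]
num_labels = (2, 3)
p = ((0, 0), (0, 1, 1))          # node 0: everything -> 0; node 1: label 2 -> 1
index = index_set(hyperedges, num_labels)
P = linear_extension(p, index)


def apply_map(x):
    """Apply the node-wise map p to a full labeling."""
    return tuple(p[s][xs] for s, xs in enumerate(x))
```

For every labeling, embedding the mapped labeling gives the same vector as multiplying the embedded labeling by the matrix.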
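A linear mapping $P \colon \mathbb{R}^{\mathcal{I}} \to \mathbb{R}^{\mathcal{I}}$ is (weak) $\Lambda$-improving for $f$ if
$$\langle f, P\mu \rangle \le \langle f, \mu \rangle \quad \text{for all } \mu \in \Lambda, \tag{22}$$
and is strict $\Lambda$-improving for $f$ if
$$P\mu \ne \mu \;\Rightarrow\; \langle f, P\mu \rangle < \langle f, \mu \rangle \quad \text{for all } \mu \in \Lambda. \tag{23}$$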
Let $P$ be a linear extension of $p$ and $\Lambda \supset \mathcal{M}$ a relaxation polytope. If $P$ is $\Lambda$-improving for $f$ then $p$ is improving for $f$.
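Naturally, a strict relaxed improving map is also relaxed improving, i.e., the set of strict $\Lambda$-improving maps is contained in the set of $\Lambda$-improving maps. This is so because for all $\mu$ such that $P\mu = \mu$ the inequality (22) is trivially satisfied.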
Next we show that the verification of (resp. ) for a given can be solved (decided) in polynomial time. The definition (22) of is equivalent to the expression
The optimization problem in (24) will be therefore called the verification LP. As a linear program over a tractable polytope , it can be solved in polynomial time and hence the decision problem is solvable in polynomial time.
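The step from the universally quantified condition to a single linear program can be written out as follows. This is a sketch in assumed notation, with $P$ the linear map, $\Lambda$ the relaxation polytope and $f$ the cost vector; it uses only the identity $\langle f, P\mu \rangle = \langle P^{\mathsf T} f, \mu \rangle$ and the fact that a linear objective attains its minimum over a non-empty bounded polytope.

```latex
\begin{align*}
\langle f, P\mu \rangle \le \langle f, \mu \rangle \quad \forall \mu \in \Lambda
&\;\Longleftrightarrow\;
\langle f - P^{\mathsf T} f,\ \mu \rangle \ge 0 \quad \forall \mu \in \Lambda \\
&\;\Longleftrightarrow\;
\min_{\mu \in \Lambda} \langle (I - P)^{\mathsf T} f,\ \mu \rangle \ge 0 .
\end{align*}
```

Deciding the weak relaxed improving property therefore amounts to solving one LP over $\Lambda$ and inspecting the sign of its optimal value.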
In order to show that the verification of can also be decided in polynomial time we introduce the following equivalent reformulation.
Let . There holds iff
The statement says that a strictly relaxed improving mapping must not change the set of all optimal solutions to the verification LP. This can be further expressed in components of the mapping and of the support set :
Let . There holds iff
We next give necessary conditions for in order that or . They help to narrow down the set of maps to be considered. A relaxed improving map must preserve optimality of solutions to the relaxation and consequently their support set (again in components).
Lemma (Necessary conditions I).
Let be node-wise and . Let and . Then
For there holds
For there holds
Next, we reformulate problems and dually, i.e., not with the universal quantifier as in (22) but with existential quantifiers. This will become important in the formulation of the maximum persistency problem where we optimize over subject to the constraints (resp. ). Recall that the set is defined for the relaxation polytope , where .
Theorem 2.1 (Dual representation of ).
Set can be expressed as
Inequality (30) holds iff the primal problem is bounded, and it is bounded iff the dual is feasible, which is the case iff . ∎
The set is defined via a more complicated quantifier . Fortunately, the following dual reformulation holds for node-wise maps:
Theorem 2.2 (Dual representation of ).
Let be node-wise. Then: (i) there exists such that iff
where is a function such that and iff ; and (ii) for rational inputs (including ) the value of in (i) is a rational number of polynomial bit length.
The constraint can thus be reduced to nearly the same representation as (29), with the addition of an $\varepsilon$ slack term. By construction, this term is zero iff . In practice, taking a larger value of $\varepsilon$ always results in a sufficient condition for and hence does not break correctness. In theory, we want a very small $\varepsilon$, but not so small that it would break polynomiality of the reformulation, which is ensured by part (ii). Note that while the set in the space of all maps was convex but not closed (as seen from definition (23)), the theorem encloses the discrete maps of our interest in a closed (convex) polytope.
Finally we give a necessary condition for . The theorem has a primal and a dual counterpart. The primal counterpart states that when solving the verification LP, because its objective is in the null space of , the constraints of the problem can be projected onto the same subspace, providing a simplification. The dual counterpart states that there always exist dual multipliers such that the improving property holds component-wise for reparametrized costs. This is useful in proofs, providing an alternative reformulation of local inequalities (29).
Theorem 2.3 (Necessary conditions II).
Let be idempotent, and . Then
2.4 Maximum Relaxed Improving Mapping
We showed in §2.2 that weak/strict relaxed-improving property can be verified in polynomial time and have described sets , . Any relaxed-improving map, with the exception of the identity, eliminates some labels as non-optimal. Recall that the label is eliminated by node-wise mapping if . We formulate the following maximum persistency problem:
we directly maximize the number of eliminated labels. The strict variant, with constraint , will be denoted max-si.
The problem may look difficult to solve. Indeed, it optimizes over discrete maps and involves a general polyhedral relaxation in the specification of constraints. Nevertheless, if we place some additional restrictions on the set of maps, it turns out to be solvable in polynomial time in a number of cases summarized in Table 1. One of them is the pseudo-Boolean case, where there are only 3 possible idempotent maps for every node: the identity $x_s \mapsto x_s$ and the two constant maps $x_s \mapsto 0$ and $x_s \mapsto 1$. Problem (max-si) turns out to be solvable in this case. For multilabel problems, node-wise mappings are more diverse. Motivated by the goal to include/generalize existing multilabel methods, the following sets of maps are introduced:
all-to-one maps. The set of maps of the form for all and fixed . This class is a straightforward generalization of the overwrite operation in the autarky (11). A mapping is illustrated in Figure 10(a). There are only two possible choices for every node . The mapping either contracts to a single label or retains unchanged. This class allows us to explain the one-against-all method of Kovtun (2003) and the central part of the method of Swoboda et al. (2014a), as discussed in §4.4 and §4.5.
all-to-one-unknown maps. Set . A mapping has the same form as above, , however the labeling is not fixed now but is a part of the specification of the mapping, see Figure 11. In every node $s$ there are $|\mathcal{X}_s| + 1$ choices for the map: send all labels to a single one (which may be chosen) or change nothing. It is easy to see that in the case of two labels, contains all idempotent node-wise maps. As will be shown later, the (max-si) problem over this class decomposes into sufficient conditions to determine from the integral part of the solution to the relaxation and the (max-si) problem over .
subset-to-one maps. The set of maps is defined as follows. Let – the set of labels in all nodes. Let . Mapping in every node either preserves the label or overwrites it with :
Vector serves as the indicator of the subset of labels in node that stay immovable while all other labels are mapped to , see Figure 6. In a node there are choices for . Clearly, this class generalizes .
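The three classes of node maps can be enumerated for a single node to see how they nest. In this sketch a map on labels 0..K-1 is a tuple of images, and the indicator `keep` marks labels that stay in place; the function names and the indicator convention are assumptions made for illustration and need not match the orientation of the indicator vector used in the text.

```python
from itertools import product


def subset_to_one(keep, y):
    """Label i stays if keep[i] is set (the target y always stays); otherwise i -> y."""
    return tuple(i if (keep[i] or i == y) else y for i in range(len(keep)))


def all_subset_to_one(num, y):
    """Distinct subset-to-one maps at a node with `num` labels and target label y."""
    return sorted({subset_to_one(keep, y) for keep in product((0, 1), repeat=num)})


def all_to_one_unknown(num):
    """The identity plus every constant map (the target label is free)."""
    return [tuple(range(num))] + [tuple([y] * num) for y in range(num)]


def is_idempotent(q):
    return all(q[q[i]] == q[i] for i in range(len(q)))


maps = all_subset_to_one(3, 0)
```

With three labels and target label 0 there are four distinct subset-to-one maps, among them the identity and the all-to-one map onto label 0; every map produced this way is idempotent.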
2.5 Formulation for Subset-to-one Maps
In the following three subsections we gradually show that the (max-wi) problem over class can be written as a mixed integer linear program in which integrality constraints can be relaxed without loss of tightness, and thus we obtain an equivalent LP formulation.
The resulting formulation will involve products of binary variables for all , and . To reach the ILP formulation we are going to replace each such product with a substitute variable . This is achieved with the help of the relaxation of Sherali and Adams (1990).
2.6 Relaxation of Sherali and Adams
The relaxation of Sherali and Adams (1990) is applicable to polynomial programs with binary variables . The relaxation of order performs a simultaneous lifting for all subsets of variables with .
Table 2. Sherali-Adams construction within a hyperedge: monomial, new variable (S1); multilinear polynomial, linearization (S2); new constraint (S4).
Let us focus on a single hyperedge c chosen for generality from the set of hyperedges . The construction and its properties (within hyperedge c) are summarized in Table 2. For every product , , a new variable is introduced (S1). A pseudo-Boolean function is linearized by writing it as a multilinear polynomial and replacing each monomial with the new variable , (S2). From this definition we have linearity properties (S3), in particular:
Lemma (Identity Equality (S3)).
Let be the linearization of . Then for all iff for all .
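The linearization step can be made concrete on a small pseudo-Boolean function. The sketch computes the multilinear polynomial coefficients by Möbius inversion, introduces one lifted variable per subset of variables, and checks that the linearized function agrees with the original on lifted binary points. The names `multilinear_coefficients`, `lift`, `linearized` and the example function `g` are assumptions for illustration.

```python
from itertools import combinations, product


def subsets(n):
    """All subsets of {0, ..., n-1} as frozensets."""
    return [frozenset(b) for r in range(n + 1) for b in combinations(range(n), r)]


def multilinear_coefficients(g, n):
    """Coefficients a_b with g(z) = sum_b a_b * prod_{i in b} z_i on {0,1}^n."""
    coeff = {}
    for b in subsets(n):
        total = 0
        for r in range(len(b) + 1):
            for a in combinations(sorted(b), r):
                z = tuple(1 if i in a else 0 for i in range(n))
                total += (-1) ** (len(b) - len(a)) * g(z)
        coeff[b] = total
    return coeff


def lift(z):
    """Lifted vector: the coordinate for subset b is the product of z_i over i in b."""
    return {b: int(all(z[i] for i in b)) for b in subsets(len(z))}


def linearized(coeff, zeta):
    """Linear function of the lifted variables: each monomial becomes one variable."""
    return sum(a * zeta[b] for b, a in coeff.items())


def g(z):
    """Example pseudo-Boolean function of three variables."""
    return abs(z[0] - z[1]) + 3 * z[0] * z[1] * z[2]


coeff = multilinear_coefficients(g, 3)
```

For this example the polynomial is $z_0 + z_1 - 2 z_0 z_1 + 3 z_0 z_1 z_2$, and the linearized function reproduces `g` at every lifted binary point.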
Next, constraints on new variables are added which correspond to identity inequalities for each . Clearly this inequality holds for all . By expanding this expression one obtains its equivalent multilinear polynomial . Constraints (S4) ensure that the linearization of this expression is non-negative. The set of all such constraints defines the polytope
In fact, polytope is the convex hull of all binary vectors corresponding to configurations :
Lemma (Convex hull).
Polytope equals the convex hull
From the convex hull representation there naturally follows an equivalence of identity inequalities before and after linearization:
Lemma (Identity inequality (S5)).
Let be the linearization of . Then for all iff for all .
In particular, for there holds for , a relation which is rather difficult to prove directly from (37). Finally, for our construction the next two results are necessary.
Theorem 2.4 (Lemma 2 of (Sherali and Adams, 1990)).
If and unary components are integer (i.e., equal to some ) for all , then there holds for all .
For there holds , where the product is component-wise.
When applying the linearization to all hyperedges simultaneously, a variable is introduced only once for (overlapping) hyperedges . All local properties described above continue to hold for each hyperedge individually but of course they need not hold for the whole set .
2.7 Solution via Linear Program Formulation
where are appropriate constants not depending on (detailed in §A.3). Because for there holds irrespectively of (label is always mapped to itself) we may assume that as well as all products involving it.
The relaxation of Sherali and Adams is applied as follows. Let us denote and respectively . We substitute new variables in place of products in (39). For zero products, , for , we let . From now on, let denote the vector of relaxed variables
New variables must satisfy the following constraints, defining a polytope :