 # A Class of Linear Programs Solvable by Coordinate-wise Minimization

Coordinate-wise minimization is a simple and popular method for large-scale optimization. Unfortunately, for general (non-differentiable) convex problems it may not find global minima. We present a class of linear programs that coordinate-wise minimization solves exactly. We show that dual LP relaxations of several well-known combinatorial optimization problems are in this class and that the method finds a global minimum with sufficient accuracy in reasonable runtimes. Moreover, for extensions of these problems that are no longer in this class, the method yields reasonably good suboptima. Though the presented LP relaxations can be solved by more efficient methods (such as max-flow), our results are theoretically non-trivial and can lead to new large-scale optimization algorithms in the future.


## 1 Introduction

Coordinate-wise minimization (or coordinate descent) is an iterative optimization method which in every iteration finds a global minimum of the problem over a single variable while keeping the other variables fixed. (In this paper we consider only exact updates, where in every iteration the global minimum over a variable is found.) For general convex optimization problems, the method need not converge and its fixed points need not be global minima. A simple example is the unconstrained minimization of the function $f(x_1,x_2)=\max\{x_1-2x_2,\,x_2-2x_1\}$, which is unbounded below, but any point with $x_1=x_2$ is a coordinate-wise local minimum. For some classes of objective functions the method is known to converge to a global minimum. It is easy to show that for the unconstrained minimization of a differentiable convex function, any fixed point of the method is a global minimum. Moreover, if the function has unique univariate minima, then any limit point of the method is a global minimum [4, §2.7]. The same properties hold for convex functions whose non-differentiable part is separable [28]. A natural extension of the method is block-coordinate minimization, where every iteration minimizes the objective over a block of variables.
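To make the example concrete, here is a minimal Python sketch of exact coordinate-wise minimization on $f(x_1,x_2)=\max\{x_1-2x_2,\,x_2-2x_1\}$ (the helper names are ours, chosen for illustration). Each univariate restriction is the maximum of an increasing and a decreasing affine piece, so its unique minimizer is where the two pieces intersect. Starting from $(5,1)$, the iterates reach $(1,1)$ after one cycle and never move again, although $f(t,t)=-t$ decreases without bound.

```python
def f(x1, x2):
    # Example function: convex, piecewise affine, unbounded below along x1 == x2
    return max(x1 - 2 * x2, x2 - 2 * x1)


def argmin_max_of_two(c1, d1, c2, d2):
    """Unique minimizer of t -> max(c1*t + d1, c2*t + d2), assuming c1 > 0 > c2.

    The minimum lies where the increasing and the decreasing piece intersect.
    """
    return (d2 - d1) / (c1 - c2)


def coordinate_descent(x1, x2, cycles=10):
    for _ in range(cycles):
        # f restricted to x1 is max(x1 - 2*x2, -2*x1 + x2)
        x1 = argmin_max_of_two(1, -2 * x2, -2, x2)
        # f restricted to x2 is max(x2 - 2*x1, -2*x2 + x1)
        x2 = argmin_max_of_two(1, -2 * x1, -2, x1)
    return x1, x2


x1, x2 = coordinate_descent(5.0, 1.0)
print(x1, x2, f(x1, x2))   # a fixed point with x1 == x2
print(f(100.0, 100.0))     # f is much lower further along the diagonal
```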

Despite missing guarantees of global optimality, (block-)coordinate minimization can be a method of choice for large-scale convex optimization problems. A notable example is the class of convergent message passing methods for solving the dual linear programming (LP) relaxation of maximum a posteriori (MAP) inference in graphical models, which can be seen as various forms of (block-)coordinate minimization applied to various forms of the dual. In the typical case, the dual LP relaxation boils down to the unconstrained minimization of a convex piecewise-affine (hence non-differentiable) function. These methods include max-sum diffusion [20, 26, 30], TRW-S [18], MPLP [11], and SRMP [19]. They do not guarantee global optimality, but for large sparse instances from, e.g., computer vision, the achieved coordinate-wise local optima are very good, and TRW-S is significantly faster than competing methods [27, 15], including popular first-order primal-dual methods such as ADMM [5] or the primal-dual algorithm of Chambolle and Pock [8].

This motivates looking for other classes of convex optimization problems for which (block-)coordinate descent would work well or, alternatively, extending convergent message passing methods to a wider class of convex problems than the dual LP relaxation of MAP inference. A step in this direction is the work [29], where it was observed that if the minimizer of the objective function over the current variable block is not unique, one should choose a minimizer that lies in the relative interior of the set of block-optimizers. It is shown there that any update satisfying this condition is, in a precise sense, not worse than any other exact update.

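To be precise, suppose we minimize a convex function $f\colon\mathbb{R}^n\to\mathbb{R}$ on a closed convex set $X\subseteq\mathbb{R}^n$. We assume that $f$ is bounded from below on $X$. For brevity, we rephrase this as the minimization of the extended-valued function $\bar f\colon\mathbb{R}^n\to\mathbb{R}\cup\{\infty\}$ such that $\bar f(x)=f(x)$ for $x\in X$ and $\bar f(x)=\infty$ for $x\notin X$. One iteration of coordinate minimization with the relative interior rule [29] chooses a variable index $i\in[n]$ and replaces an estimate $x^k=(x^k_1,\dots,x^k_n)\in X$ with a new estimate $x^{k+1}\in X$ such that

$$x^{k+1}_i\in\operatorname{ri}\operatorname*{argmin}_{y\in\mathbb{R}}\bar f(x^k_1,\dots,x^k_{i-1},y,x^k_{i+1},\dots,x^k_n),\qquad x^{k+1}_j=x^k_j\quad\forall j\neq i,$$

where $\operatorname{ri}Y$ denotes the relative interior of a convex set $Y$. (In [29], the iteration is formulated in a more abstract, coordinate-free notation; since we focus only on coordinate-wise minimization here, we use a more concrete notation.) As this is a univariate convex problem, the set $Y=\operatorname*{argmin}_{y\in\mathbb{R}}\bar f(x^k_1,\dots,x^k_{i-1},y,x^k_{i+1},\dots,x^k_n)$ is either a singleton or an interval. In the latter case, the relative interior rule requires that we choose $x^{k+1}_i$ from the interior of this interval. A point $x\in X$ that satisfies

$$x_i\in\operatorname{ri}\operatorname*{argmin}_{y\in\mathbb{R}}\bar f(x_1,\dots,x_{i-1},y,x_{i+1},\dots,x_n)$$

for all $i\in[n]$ is called a (coordinate-wise) interior local minimum of function $f$ on set $X$.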

It is natural to ask for which convex problems interior local minima are global minima. (We neglect convergence issues in this paper and assume that the method converges to an interior local minimum. This is supported by experiments; e.g., max-sum diffusion and TRW-S have this property. More on convergence can be found in [29].) A succinct characterization of this class is elusive. Two subclasses of this class are known, though [18, 26, 30]: the dual LP relaxation of MAP inference with pairwise potential functions and two labels, or with submodular potential functions. In this paper, we restrict ourselves to linear programs (where $f$ is linear and $X$ is a convex polyhedron) and present a different class of linear programs with this property. We show that dual LP relaxations of a number of combinatorial optimization problems belong to this class and that coordinate-wise minimization converges in reasonable time on large practical instances. We must note, however, that there exist more efficient large-scale algorithms for solving these LP relaxations (such as reduction to max-flow), which makes the practical impact of our study limited so far.

## 2 Reformulations of Problems

Before presenting our main result, we make an important remark: while a convex optimization problem can be reformulated in many ways to an ‘equivalent’ problem which has the same global minima, not all of these transformations are equivalent with respect to coordinate-wise minimization, in particular, not all preserve interior local minima.

One example is dualization. If coordinate-wise minimization achieves good local (or even global) minima on a convex problem, it can get stuck in very poor local minima in the dual. Indeed, trying to apply (block-)coordinate minimization to the primal LP relaxation of MAP inference (linear optimization over the local marginal polytope) has been futile so far.

###### Example 1

Consider the linear program

$$\min\{x_1+x_2\mid x_1,x_2\ge0\},\qquad\text{(1)}$$

which has one interior local minimum with respect to individual coordinates, and it coincides with the unique global optimum $x_1=x_2=0$. But if one adds a redundant constraint, namely $x_1=x_2$, then any feasible point becomes an interior local minimum w.r.t. individual coordinates, because the redundant constraint blocks changing either variable without changing the other.

###### Example 2

Consider the linear program

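$$
\begin{aligned}
\min\;&\sum_{j=1}^m z_j&&\text{(2a)}\\
&z_j\ge a_{ij}^Tx+b_{ij}\quad\forall i\in[n],\ j\in[m]&&\text{(2b)}\\
&z\in\mathbb{R}^m,\ x\in\mathbb{R}^p&&\text{(2c)}
\end{aligned}
$$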

which can be also formulated as

$$
\begin{aligned}
\min\;&\sum_{j=1}^m\max_{i\in[n]}\big(a_{ij}^Tx+b_{ij}\big)&&\text{(3a)}\\
&x\in\mathbb{R}^p.&&\text{(3b)}
\end{aligned}
$$

Optimizing over the individual variables by coordinate-wise minimization in (2) does not yield the same interior local optima as in (3). For instance, assume that $n=2$, $m=3$, $p=1$, and the problem (3) is given as

$$\min\big(\max\{x,0\}+\max\{-x,-1\}+\max\{-x,-2\}\big),\qquad\text{(4)}$$

where $x\in\mathbb{R}$. Then, when optimizing directly in form (4), one can see that all the interior local optima are global optimizers.

However, when one introduces the variables $z\in\mathbb{R}^3$ and applies coordinate-wise minimization to the corresponding problem (2), then there are interior local optima that are not global optimizers, for example $x=z_1=z_2=z_3=0$, which is an interior local optimum but not a global optimum.

On the other hand, optimizing over the blocks of variables $\{z_1,\dots,z_m,x_i\}$ for each $i\in[p]$ in formulation (2) is equivalent to optimizing over the individual variables $x_i$ in formulation (3).
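The stuck point of Example 2 can be checked mechanically. The following Python sketch (all names are ours, for illustration only) encodes the three terms of (4) as the affine pieces of formulation (2) with $n=2$, $m=3$, $p=1$. For fixed $x$ each $z_j$ has a unique optimum (the largest piece), and for fixed $z$ the variable $x$ does not enter the objective, so its optimizer set is the whole feasible interval. At $x=z_1=z_2=z_3=0$ that interval collapses to $\{0\}$, so no single coordinate can move, while (4) attains $-1$ on $[1,2]$.

```python
import math

# Problem (4): f(x) = max{x,0} + max{-x,-1} + max{-x,-2}
def f(x):
    return max(x, 0) + max(-x, -1) + max(-x, -2)

# Formulation (2) with p = 1, m = 3, n = 2: term j lists its affine pieces
# (a, b) meaning a*x + b, and z_j >= a*x + b must hold for every piece.
PIECES = [[(1, 0), (0, 0)], [(-1, 0), (0, -1)], [(-1, 0), (0, -2)]]

def best_z(j, x):
    # The objective sum(z) is increasing in z_j, so for fixed x the unique
    # optimal z_j is the smallest feasible value: the largest piece of term j.
    return max(a * x + b for a, b in PIECES[j])

def feasible_x(z):
    # x does not appear in the objective of (2): for fixed z, every feasible
    # x is optimal. Return the feasible interval [lo, hi].
    lo, hi = -math.inf, math.inf
    for zj, term in zip(z, PIECES):
        for a, b in term:
            if a > 0:          # zj >= a*x + b  <=>  x <= (zj - b) / a
                hi = min(hi, (zj - b) / a)
            elif a < 0:        # dividing by a < 0 flips the inequality
                lo = max(lo, (zj - b) / a)
    return lo, hi

print([best_z(j, 0) for j in range(3)], feasible_x((0, 0, 0)))
print(f(1.5))
```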

## 3 Main Result

The optimization problem with which we are going to deal is in its most general form defined as

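$$
\begin{aligned}
\min\;&\Big(\sum_{i=1}^m\max\{w_i-\varphi_i,0\}+a^T\varphi+b^T\lambda+\sum_{j=1}^p\max\{v_j+A_{:j}^T\varphi+B_{:j}^T\lambda,0\}\Big)&&\text{(5a)}\\
&\underline{\varphi}_i\le\varphi_i\le\overline{\varphi}_i\quad\forall i\in[m]&&\text{(5b)}\\
&\underline{\lambda}_i\le\lambda_i\le\overline{\lambda}_i\quad\forall i\in[n],&&\text{(5c)}
\end{aligned}
$$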

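where $A\in\mathbb{R}^{m\times p}$, $B\in\mathbb{R}^{n\times p}$, $a,w\in\mathbb{R}^m$, $b\in\mathbb{R}^n$, $v\in\mathbb{R}^p$, $\underline{\varphi}\in(\mathbb{R}\cup\{-\infty\})^m$, $\overline{\varphi}\in(\mathbb{R}\cup\{\infty\})^m$, $\underline{\lambda}\in(\mathbb{R}\cup\{-\infty\})^n$, $\overline{\lambda}\in(\mathbb{R}\cup\{\infty\})^n$ (assuming $\underline{\varphi}\le\overline{\varphi}$ and $\underline{\lambda}\le\overline{\lambda}$). We optimize over variables $\varphi\in\mathbb{R}^m$ and $\lambda\in\mathbb{R}^n$. $A_{:j}$ and $A_{i:}$ denote the $j$-th column and the $i$-th row of $A$, respectively (and similarly for $B$).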

Applying coordinate-wise minimization with the relative-interior rule to problem (5) corresponds to cyclic updates of variables, where each update amounts to finding the set of optima of a convex piecewise-affine function of one variable on an interval. If the set of optimizers is a singleton, the update is straightforward. If the set of optimizers is a bounded interval $[y_1,y_2]$, the variable is assigned the middle value of this interval, i.e. $(y_1+y_2)/2$. If the set of optima is unbounded, i.e. $[y,\infty)$, then we set the variable to the value $y+\Delta$, where $\Delta>0$ is a fixed constant. In the case of $(-\infty,y]$, the variable is updated to $y-\Delta$. The details of the update in this setting are given in Appendix 0.A.
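The choice of a point from the optimizer set can be written as a small helper. This is a sketch under our own naming (`ri_update`, with `delta` standing for the constant $\Delta$); it assumes the optimizer set has already been computed as an interval with possibly infinite endpoints, which is what Appendix 0.A describes.

```python
import math


def ri_update(lo, hi, delta=1.0):
    """Return a point of the relative interior of the optimizer set [lo, hi].

    lo may be -math.inf and hi may be math.inf; delta plays the role of the
    fixed constant Delta > 0 used for unbounded optimizer sets.
    """
    if lo == hi:                       # singleton: the update is forced
        return lo
    if math.isfinite(lo) and math.isfinite(hi):
        return 0.5 * (lo + hi)         # bounded interval: take its midpoint
    if math.isfinite(lo):              # [lo, +inf)
        return lo + delta
    if math.isfinite(hi):              # (-inf, hi]
        return hi - delta
    raise ValueError("optimizer set is the whole real line; not covered by the rule")


print(ri_update(1.0, 3.0), ri_update(1.0, math.inf, delta=0.5))
```

The midpoint is only one admissible choice; any point strictly inside the interval satisfies the relative interior rule, and the midpoint simply keeps the update deterministic.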

###### Theorem 3.1

Any interior local optimum of (5) w.r.t. individual coordinates is its global optimum if

• matrices $A$, $B$ contain only values from the set $\{-1,0,1\}$ and contain at most two non-zero elements per row

• contains only elements from the set

• vector contains only elements from the set .

In order to prove this claim, we formulate problem (5) as a linear program by introducing additional variables $\alpha\in\mathbb{R}^m$ and $\beta\in\mathbb{R}^p$ and construct its dual. The proof of optimality is then obtained by constructing a dual feasible solution that satisfies complementary slackness.

The primal linear program (with corresponding dual variables and constraints on the same lines) reads

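$$
\begin{array}{llll}
\min\ \sum_{i\in[m]}\alpha_i+\sum_{i\in[p]}\beta_i+a^T\varphi+b^T\lambda & \max\ f(z,y,s,r,q,x) & & \text{(6a)}\\
\beta_j-A_{:j}^T\varphi-B_{:j}^T\lambda\ge v_j & x_j\ge0 & \forall j\in[p] & \text{(6b)}\\
\alpha_i+\varphi_i\ge w_i & s_i\ge0 & \forall i\in[m] & \text{(6c)}\\
\varphi_i\ge\underline{\varphi}_i & y_i\ge0 & \forall i\in[m] & \text{(6d)}\\
\varphi_i\le\overline{\varphi}_i & z_i\le0 & \forall i\in[m] & \text{(6e)}\\
\lambda_i\ge\underline{\lambda}_i & q_i\ge0 & \forall i\in[n] & \text{(6f)}\\
\lambda_i\le\overline{\lambda}_i & r_i\le0 & \forall i\in[n] & \text{(6g)}\\
\varphi_i\in\mathbb{R} & s_i+z_i+y_i-A_{i:}^Tx=a_i & \forall i\in[m] & \text{(6h)}\\
\lambda_i\in\mathbb{R} & r_i+q_i-B_{i:}^Tx=b_i & \forall i\in[n] & \text{(6i)}\\
\beta_j\ge0 & x_j\le1 & \forall j\in[p] & \text{(6j)}\\
\alpha_i\ge0 & s_i\le1 & \forall i\in[m], & \text{(6k)}
\end{array}
$$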

where the dual criterion is

$$f(z,y,s,r,q,x)=\overline{\varphi}^Tz+\underline{\varphi}^Ty+w^Ts+\overline{\lambda}^Tr+\underline{\lambda}^Tq+v^Tx\qquad\text{(7)}$$

and clearly, at optimum of the primal, we have

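$$
\begin{aligned}
\alpha_i&=\max\{w_i-\varphi_i,0\}&&\forall i\in[m]&&\text{(8a)}\\
\beta_j&=\max\{v_j+A_{:j}^T\varphi+B_{:j}^T\lambda,0\}&&\forall j\in[p].&&\text{(8b)}
\end{aligned}
$$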

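The variables $\alpha$, $\beta$ were eliminated from the primal formulation (6) to obtain (5) by reasoning similar to that in Example 2. We also remark that setting $\underline{\varphi}_i=-\infty$ (resp. $\overline{\varphi}_i=\infty$, $\underline{\lambda}_i=-\infty$, $\overline{\lambda}_i=\infty$) results in $y_i=0$ (resp. $z_i=0$, $q_i=0$, $r_i=0$).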

Even though the primal-dual pair (6) might seem overcomplicated, such a general description is in fact necessary: as described in Section 2, equivalent reformulations may not preserve the structure of interior local minima, and we would like to describe as general a class with guaranteed optimality as possible.

###### Example 3

To give the reader better insight into problems (6), we present a simplification based on omitting the matrix $A$ (i.e., $m=0$) and setting $\underline{\lambda}=0$, $\overline{\lambda}=\infty$, which results in $r=0$, and the variables $q$ become slack variables in (6i). The primal-dual pair in this case then simplifies to

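$$
\begin{array}{llll}
\min\ \sum_{i\in[p]}\beta_i+b^T\lambda & \max\ v^Tx & & \text{(9a)}\\
\beta_j-B_{:j}^T\lambda\ge v_j & x_j\ge0 & \forall j\in[p] & \text{(9b)}\\
\beta_j\ge0 & x_j\le1 & \forall j\in[p] & \text{(9c)}\\
\lambda_i\ge0 & -B_{i:}^Tx\le b_i & \forall i\in[n]. & \text{(9d)}
\end{array}
$$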
###### Theorem 3.2

For a problem (5) satisfying the conditions of Theorem 3.1 and a given interior local minimum $(\varphi,\lambda)$, the values

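$$
\begin{aligned}
x_j&=\begin{cases}0&\text{if }A_{:j}^T\varphi+B_{:j}^T\lambda+v_j<0\\ \tfrac12&\text{if }A_{:j}^T\varphi+B_{:j}^T\lambda+v_j=0\\ 1&\text{if }A_{:j}^T\varphi+B_{:j}^T\lambda+v_j>0\end{cases}
&\qquad s_i&=\begin{cases}1&\text{if }w_i>\varphi_i\\ 0&\text{if }w_i<\varphi_i\\ h_{[0,1]}(a_i+A_{i:}^Tx)&\text{if }w_i=\varphi_i\end{cases}\\[6pt]
r_i&=\begin{cases}0&\text{if }\lambda_i<\overline{\lambda}_i\\ h_{\mathbb{R}^-_0}(b_i+B_{i:}^Tx)&\text{if }\lambda_i=\overline{\lambda}_i\end{cases}
&\qquad z_i&=\begin{cases}0&\text{if }\varphi_i<\overline{\varphi}_i\\ h_{\mathbb{R}^-_0}(a_i+A_{i:}^Tx-s_i)&\text{if }\varphi_i=\overline{\varphi}_i\end{cases}\\[6pt]
q_i&=\begin{cases}0&\text{if }\lambda_i>\underline{\lambda}_i\\ h_{\mathbb{R}^+_0}(b_i+B_{i:}^Tx)&\text{if }\lambda_i=\underline{\lambda}_i\end{cases}
&\qquad y_i&=\begin{cases}0&\text{if }\varphi_i>\underline{\varphi}_i\\ h_{\mathbb{R}^+_0}(a_i+A_{i:}^Tx-s_i)&\text{if }\varphi_i=\underline{\varphi}_i\end{cases}
\end{aligned}
\qquad\text{(10)}
$$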

are feasible for the dual (6) and satisfy complementary slackness with primal (6), where the remaining variables of the primal are given by (8).

It can be immediately seen that all the constraints of dual (6) are satisfied except for (6h) and (6i), which require a more involved analysis. The complete proof of Theorem 3.2 is technical (based on verifying many different cases) and given in Appendix 0.B.

## 4 Applications

Here we show that several LP relaxations of combinatorial problems correspond to the form (5) or to the dual (6) and discuss which additional constraints correspond to the assumptions of Theorem 3.1.

### 4.1 Weighted Partial Max-SAT

In weighted partial Max-SAT, one is given two sets of clauses, soft and hard. Each soft clause is assigned a positive weight. The task is to find values of binary variables $x_i\in\{0,1\}$, $i\in[p]$, such that all the hard clauses are satisfied and the sum of weights of the satisfied soft clauses is maximized.

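We organize the $m$ soft clauses into a matrix $S\in\{-1,0,1\}^{m\times p}$ defined as

$$S_{ci}=\begin{cases}1&\text{if literal }x_i\text{ is present in soft clause }c,\\-1&\text{if literal }\neg x_i\text{ is present in soft clause }c,\\0&\text{otherwise.}\end{cases}$$

In addition, we denote by $n^S_c$ the number of negated variables in clause $c$. These numbers are stacked in a vector $n^S\in\mathbb{Z}^m$. The $h$ hard clauses are organized in a matrix $H\in\{-1,0,1\}^{h\times p}$ and a vector $n^H\in\mathbb{Z}^h$ in the same manner.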

The LP relaxation of this problem reads

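$$
\begin{aligned}
\max\;&\sum_{c\in[m]}w_cs_c&&\text{(11a)}\\
&s_c\le S_{c:}^Tx+n^S_c\quad\forall c\in[m]&&\text{(11b)}\\
&H_{c:}^Tx+n^H_c\ge1\quad\forall c\in[h]&&\text{(11c)}\\
&x_i\in[0,1]\quad\forall i\in[p]&&\text{(11d)}\\
&s_c\in[0,1]\quad\forall c\in[m],&&\text{(11e)}
\end{aligned}
$$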

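where $w_c$ are the weights of the soft clauses $c\in[m]$. This is a sub-class of the dual (6), where $A=S$, $B=H$, $a=n^S$, $b=n^H-\mathbf{1}$, $\underline{\varphi}=0$ ($y$ are therefore slack variables for the dual constraint (6h) that correspond to (11b)), $\overline{\varphi}=\infty$ (therefore $z=0$), $\overline{\lambda}=\infty$ (therefore $r=0$), $\underline{\lambda}=0$ ($q$ are slack variables for the dual constraint (6i) that correspond to (11c)), and $v=0$.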

Formulation (11) satisfies the conditions of Theorem 3.1 if each of the clauses has length at most 2. In other words, optimality is guaranteed for weighted partial Max-2SAT.

Also notice that if we omitted the soft clauses (11b) and instead set $v=-\mathbf{1}$, we would obtain an instance of Min-Ones SAT, which could be generalized to weighted Min-Ones SAT. This relaxation would still satisfy the requirements of Theorem 3.1 if all the present hard clauses have length at most 2.
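A small Python sketch of how the rows of $S$ (or $H$) and the counts $n^S$ (or $n^H$) can be built from clause lists. The input convention is our assumption, not something stated above: DIMACS-style signed integers (literal $+i$ is $x_i$, literal $-i$ is $\neg x_i$, variables numbered from 1, each variable at most once per clause). With binary $x$, the right-hand side of (11b) counts the true literals of a clause; at $x_i=\tfrac12$ every two-literal clause already reaches value 1, while a unit clause does not.

```python
def encode_clauses(clauses, num_vars):
    """Rows of S (or H) and the counts n^S (or n^H) used in (11).

    Assumed input convention (DIMACS-style): literal +i stands for x_i and
    -i for (not x_i); variables are numbered from 1 and each variable
    occurs at most once per clause.
    """
    rows, negated = [], []
    for clause in clauses:
        row = [0] * num_vars
        for lit in clause:
            row[abs(lit) - 1] = 1 if lit > 0 else -1
        rows.append(row)
        negated.append(sum(1 for lit in clause if lit < 0))
    return rows, negated


def clause_value(row, n_c, x):
    """S_c:^T x + n_c: for binary x, the number of true literals in clause c."""
    return sum(r * xi for r, xi in zip(row, x)) + n_c


rows, neg = encode_clauses([[1, -2], [-1, -2], [2]], num_vars=2)
print([clause_value(r, n, [1, 1]) for r, n in zip(rows, neg)])      # clause [-1, -2] is false
print([clause_value(r, n, [0.5, 0.5]) for r, n in zip(rows, neg)])  # only the unit clause stays below 1
```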

#### 4.1.1 Results

We tested the method on the 800 smallest instances (smallest in the sense of file size; not all instances could be evaluated due to their size and lengthy evaluation) that appeared in the Max-SAT Evaluations [23] in the years 2017 [2] and 2018 [3]. The results on these instances are divided into groups in Table 1 based on the minimal and maximal length of the present clauses. We have also tested this approach on 60 instances of weighted Max-2SAT from Ke Xu [31]. The highest number of logical variables in an instance was 19034, and the highest overall number of clauses in an instance was 31450. It was important to separate the instances without unit clauses (i.e., clauses of length 1), because in such cases the LP relaxation (11) has a trivial optimal solution with $x_i=\tfrac12$ for all $i\in[p]$.

Coordinate-wise minimization was stopped when the criterion did not improve by at least a small fixed threshold after a whole cycle of updates of all variables. We report the quality of the solution as the median and mean relative difference between the optimal criterion and the criterion reached by coordinate-wise minimization before termination.

Table 1 reports not only instances of weighted partial Max-2SAT but also instances with longer clauses, where optimality is no longer guaranteed. Nevertheless, the relative differences on instances with longer clauses still appear reasonably small, so the resulting values could be usable as bounds in a branch-and-bound scheme.

### 4.2 Weighted Vertex Cover

Dual (6) also subsumes the LP relaxation of weighted vertex cover (it is only necessary to transform minimization into maximization of the negated objective in (12)), which reads

$$\min\Big\{\sum_{i\in V}v_ix_i\;\Big|\;x_i+x_j\ge1\ \forall\{i,j\}\in E,\ x_i\in[0,1]\ \forall i\in V\Big\}\qquad\text{(12)}$$

where $V$ is the set of nodes and $E$ is the set of edges of an undirected graph. This problem also satisfies the conditions of Theorem 3.1, and therefore the corresponding primal (5) has no non-optimal interior local minima.

On the other hand, notice that formulation (12), which corresponds to the dual (6), can have non-optimal interior local minima even with respect to all subsets of variables of a certain size; an example is given in Appendix 0.C.

We reported the experiments on weighted vertex cover in an unpublished text [1], where the optimality was not yet proven. In addition, the update designed ad hoc in [1] becomes just a special case of our general update here.
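As an illustration, here is a Python sketch of coordinate-wise minimization on the primal (5) that corresponds to (12). The instantiation is our own derivation from (6), not spelled out in the text: negating the objective of (12) gives the $v$ of (5) as minus the node weights, $B$ is the edge-node incidence matrix, $b=-\mathbf{1}$, $\underline{\lambda}=0$ and $\overline{\lambda}=\infty$, so (5) becomes $\min_{\lambda\ge0}\sum_{k\in V}\max\{\sum_{e\ni k}\lambda_e-v_k,0\}-\sum_e\lambda_e$. Each single-edge update has a bounded optimizer set, so the midpoint rule always applies. The function names, the fixed number of cycles (instead of the improvement-based stopping rule), and the tolerance `tol` used when reading off $x$ via the first case of (10) are our assumptions.

```python
def vertex_cover_cd(weights, edges, cycles=200):
    """Coordinate-wise minimization of the primal (5) belonging to the LP (12).

    Our own instantiation: one variable lam[e] >= 0 per edge e = (i, j),
        minimize  sum_k max(load_k - weights[k], 0) - sum_e lam[e],
    where load_k is the sum of lam over the edges incident to node k.
    """
    lam = [0.0] * len(edges)
    load = [0.0] * len(weights)
    for _ in range(cycles):
        for e, (i, j) in enumerate(edges):
            load[i] -= lam[e]
            load[j] -= lam[e]
            # Restricted to lam[e] = t, the objective is (up to a constant)
            # max(t - (w_i - load_i), 0) + max(t - (w_j - load_j), 0) - t,
            # whose unconstrained minimizers form the interval [lo, hi].
            lo, hi = sorted((weights[i] - load[i], weights[j] - load[j]))
            lo, hi = max(lo, 0.0), max(hi, 0.0)   # project onto t >= 0
            lam[e] = 0.5 * (lo + hi)              # relative-interior (midpoint) rule
            load[i] += lam[e]
            load[j] += lam[e]
    objective = sum(max(ld - w, 0.0) for ld, w in zip(load, weights)) - sum(lam)
    return lam, objective


def recover_cover(weights, edges, lam, tol=1e-9):
    """First case of (10): x_k in {0, 1/2, 1} from the sign of load_k - w_k."""
    load = [0.0] * len(weights)
    for e, (i, j) in enumerate(edges):
        load[i] += lam[e]
        load[j] += lam[e]
    x = []
    for ld, w in zip(load, weights):
        g = ld - w    # v_k + B_{:k}^T lam with v_k = -w_k
        x.append(1.0 if g > tol else 0.0 if g < -tol else 0.5)
    return x


lam, obj = vertex_cover_cd([1.0, 3.0, 3.0], [(0, 1), (0, 2)])   # a star
print(lam, obj, recover_cover([1.0, 3.0, 3.0], [(0, 1), (0, 2)], lam))
```

On the star, the criterion reaches $-1$, and the recovered $x=(1,0,0)$ is a cover of weight 1, matching the LP value. On a unit-weight triangle, the iterates approach $\lambda_e=\tfrac12$ and the recovered cover is the half-integral $x=(\tfrac12,\tfrac12,\tfrac12)$.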

### 4.3 Minimum st-Cut, Maximum Flow

Recall from [10] the usual formulation of the max-flow problem between nodes $s$ and $t$ on a directed graph with vertex set $V$, edge set $E$ and positive edge weights $w_{ij}$ for each $(i,j)\in E$, which reads

$$
\begin{aligned}
\max\;&\sum_{(s,i)\in E}f_{si}&&\text{(13a)}\\
&0\le f_{ij}\le w_{ij}\quad\forall(i,j)\in E&&\text{(13b)}\\
&\sum_{(u,i)\in E}f_{ui}-\sum_{(j,u)\in E}f_{ju}=0\quad\forall u\in V\setminus\{s,t\}.&&\text{(13c)}
\end{aligned}
$$

Assume that there is no edge $(s,t)$, there are no ingoing edges to $s$ and no outgoing edges from $t$. Then any feasible value of $f$ in (13) is an interior local optimum w.r.t. individual coordinates by the same reasoning as in Example 1, due to the flow conservation constraint (13c), which limits each individual variable to a single value. We are going to propose a formulation which has no non-globally optimal interior local optima.

The dual problem to (13) is the minimum $st$-cut problem, which can be formulated as

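$$
\begin{aligned}
\max\;&\sum_{(i,j)\in E}w_{ij}y_{ij}&&\text{(14a)}\\
&y_{ij}\le1-x_i+x_j\quad\forall(i,j)\in E,\ i\neq s,\ j\neq t&&\text{(14b)}\\
&y_{sj}\le x_j\quad\forall(s,j)\in E&&\text{(14c)}\\
&y_{it}\le1-x_i\quad\forall(i,t)\in E&&\text{(14d)}\\
&y_{ij}\in[0,1]\quad\forall(i,j)\in E&&\text{(14e)}\\
&x_i\in[0,1]\quad\forall i\in V\setminus\{s,t\},&&\text{(14f)}
\end{aligned}
$$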

where $y_{ij}=0$ if edge $(i,j)$ is in the cut and $y_{ij}=1$ if edge $(i,j)$ is not in the cut. The cut should separate $s$ and $t$, so the set of nodes connected to $s$ after the cut will be denoted by $S$, and $T$ is the set of nodes connected to $t$. Using this notation, $x_i=1$ for $i\in S$ and $x_i=0$ for $i\in T$. Formulation (14) differs from the usual formulation by replacing the variables $y_{ij}$ by $1-y_{ij}$; therefore we maximize the weight of the edges that are not cut instead of minimizing the weight of the cut edges, and if the optimal value of (14) is $F$, then the value of the minimum $st$-cut equals $\sum_{(i,j)\in E}w_{ij}-F$.
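The correspondence between cuts and points of (14) can be checked on a toy graph. In this Python sketch (graph, node names and function names are ours), a cut is given by the source-side node set $S$. Fixing $x_s=1$ and $x_t=0$ turns (14c) and (14d) into instances of (14b), which keeps the feasibility check to one line. The total weight minus the value of (14a) recovers the cut weight.

```python
def cut_to_lp(S, w, s="s", t="t"):
    """Map an s-t cut to a point (x, y) of (14).

    S is the set of nodes on the source side (it contains s, not t);
    w maps each directed edge (i, j) to its weight w_ij.
    """
    x = {i: (1.0 if i in S else 0.0) for edge in w for i in edge if i not in (s, t)}
    # y_ij = 0 exactly for the cut edges, i.e. the edges leaving S
    y = {(i, j): (0.0 if i in S and j not in S else 1.0) for (i, j) in w}
    return x, y


def is_feasible(x, y, s="s", t="t"):
    # With x_s = 1 and x_t = 0, constraints (14c) and (14d) become (14b)
    xe = {**x, s: 1.0, t: 0.0}
    return all(0.0 <= y[i, j] <= 1.0 - xe[i] + xe[j] for (i, j) in y)


w = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
x, y = cut_to_lp({"s", "a"}, w)
F = sum(w[e] * y[e] for e in w)     # value of (14a)
cut = sum(w.values()) - F           # weight of the cut edges
print(is_feasible(x, y), F, cut)
```

This only evaluates given cuts; finding the minimum one is what coordinate-wise minimization on the primal (15) is meant to do.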

Formulation (14) is subsumed by the dual (6) by setting $v=0$ and $\underline{\varphi}=0$ (with $\overline{\varphi}=\infty$) and omitting the matrix $B$. Also notice that each variable $y_{ij}$ occurs in exactly one of the constraints (14b), (14c), (14d), and each of these constraints contains at most two variables $x_i$. The problem (14) therefore satisfies the conditions of Theorem 3.1, and the corresponding primal (5) is a formulation of the maximum flow problem in which one can search for the maximum flow by coordinate-wise minimization. The corresponding formulation (5) reads

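$$
\begin{aligned}
\min\;&\Big(\sum_{(i,j)\in E}\max\{w_{ij}-\varphi_{ij},0\}+\sum_{(i,j)\in E,\,i\neq s}\varphi_{ij}+\sum_{i\in V\setminus\{s,t\}}\max\Big\{\sum_{(j,i)\in E}\varphi_{ji}-\sum_{(i,j)\in E}\varphi_{ij},0\Big\}\Big)&&\text{(15a)}\\
&\varphi_{ij}\ge0\quad\forall(i,j)\in E.&&\text{(15b)}
\end{aligned}
$$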
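To connect (15a) with flows, this Python sketch (function name and toy graph are ours) evaluates the criterion for given edge values $\varphi_{ij}$. For a feasible flow (capacities respected, conservation at inner nodes, no edges into $s$), the first sum equals the total capacity minus the total flow and the last sum vanishes, so the criterion equals the total capacity minus the flow value. This is consistent with the value $\sum w_{ij}-F$ of (14) under strong duality.

```python
def maxflow_objective(phi, w, s="s", t="t"):
    """Criterion (15a); phi and w map each directed edge (i, j) to a number."""
    capacity_term = sum(max(w[e] - phi[e], 0.0) for e in w)
    linear_term = sum(phi[i, j] for (i, j) in w if i != s)
    net_inflow = {}
    for (i, j), value in phi.items():
        net_inflow[j] = net_inflow.get(j, 0.0) + value
        net_inflow[i] = net_inflow.get(i, 0.0) - value
    excess_term = sum(max(val, 0.0) for node, val in net_inflow.items() if node not in (s, t))
    return capacity_term + linear_term + excess_term


w = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
flow = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}  # value 5
zero = {e: 0.0 for e in w}
print(maxflow_objective(flow, w), maxflow_objective(zero, w))
```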

#### 4.3.1 Results

We have tested our formulation for coordinate-wise minimization on max-flow instances from computer vision (available at https://vision.cs.uwaterloo.ca/data/maxflow). We report the same statistics as for Max-SAT in Table 2; the instances correspond to stereo problems, multiview reconstruction instances and shape fitting problems.

For multiview reconstruction and shape fitting, we were able to run our algorithm only on small instances, which have approximately between and nodes and between and edges. On these instances, the algorithm terminated with the reported precision in 13 to 34 minutes on a laptop.

### 4.4 MAP Inference with Potts Potentials

Coordinate-wise minimization for the dual LP relaxation of MAP inference has been intensively studied; see, e.g., the review [30]. One of the formulations is

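$$
\begin{aligned}
\min\;&\sum_{i\in V}\max_{k\in K}\theta^\delta_i(k)+\sum_{\{i,j\}\in E}\max_{k,l\in K}\theta^\delta_{ij}(k,l)&&\text{(16a)}\\
&\delta_{ij}(k)\in\mathbb{R}\quad\forall\{i,j\}\in E,\ k\in K,&&\text{(16b)}
\end{aligned}
$$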

where $K$ is the set of labels, $V$ is the set of nodes, $E$ is the set of unoriented edges, $N_i$ denotes the set of neighbors of node $i$, and

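$$
\begin{aligned}
\theta^\delta_i(k)&=\theta_i(k)-\sum_{j\in N_i}\delta_{ij}(k)&&\text{(17a)}\\
\theta^\delta_{ij}(k,l)&=\theta_{ij}(k,l)+\delta_{ij}(k)+\delta_{ji}(l)&&\text{(17b)}
\end{aligned}
$$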

are equivalent transformations of the potentials. Notice that there are $2|E||K|$ variables $\delta$, i.e., for every edge and label, one for each direction of the edge. In [24], it is mentioned that in the case of Potts interactions, which are given as $\theta_{ij}(k,l)=-[k\neq l]$ (Iverson bracket), one can add constraints

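$$
\begin{aligned}
\delta_{ij}(k)+\delta_{ji}(k)&=0&&\forall\{i,j\}\in E,\ k\in K&&\text{(18a)}\\
-\tfrac12\le\delta_{ij}(k)&\le\tfrac12&&\forall\{i,j\}\in E,\ k\in K&&\text{(18b)}
\end{aligned}
$$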

to (16) without changing the optimal objective. One can therefore use constraint (18a) to reduce the number of variables by defining

$$\lambda_{ij}(k)=-\delta_{ij}(k)=\delta_{ji}(k)\qquad\text{(19)}$$

subject to $-\tfrac12\le\lambda_{ij}(k)\le\tfrac12$. The decision of whether $\delta_{ij}$ or $\delta_{ji}$ should have the inverted sign depends on the chosen orientation of the originally undirected edges and is arbitrary. Also, given values satisfying (18), it holds for every edge $\{i,j\}\in E$ and all labels $k,l\in K$ that $\theta^\delta_{ij}(k,l)\le0=\theta^\delta_{ij}(k,k)$, which follows from the properties of the Potts interactions.

Therefore, one can reformulate (16) into

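$$
\begin{aligned}
\min\;&\sum_{i\in V}\max_{k\in K}\theta^\lambda_i(k)&&\text{(20a)}\\
&-\tfrac12\le\lambda_{ij}(k)\le\tfrac12\quad\forall(i,j)\in E',\ k\in K,&&\text{(20b)}
\end{aligned}
$$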

where the equivalent transformation in variables is given by

$$\theta^\lambda_i(k)=\theta_i(k)+\sum_{(i,j)\in E'}\lambda_{ij}(k)-\sum_{(j,i)\in E'}\lambda_{ji}(k)\qquad\text{(21)}$$

and we optimize over the variables $\lambda_{ij}(k)$, $(i,j)\in E'$, $k\in K$; the graph $(V,E')$ is the same as the graph $(V,E)$ except that each edge becomes oriented (in an arbitrary direction). The way of obtaining an optimal solution to (16) from an optimal solution of (20) is given by (19) and depends on the chosen orientation of the edges in $E'$. Also observe that $\theta^\lambda_i(k)=\theta^\delta_i(k)$ for any node $i$ and label $k$, and therefore the optimal values are equal. This reformulation therefore maps global optima of (20) to global optima of (16). However, it does not in general map interior local minima of (20) to interior local minima of (16); an example of such a case is shown in Appendix 0.D.

In problems with two labels ($|K|=2$), problem (20) is subsumed by (5) and satisfies the conditions imposed by Theorem 3.1, because one can rewrite the criterion by observing that

$$\max_{k\in\{1,2\}}\theta^\lambda_i(k)=\max\{\theta^\lambda_i(1)-\theta^\lambda_i(2),0\}+\theta^\lambda_i(2)\qquad\text{(22)}$$

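and each $\lambda_{ij}(k)$ is present only in $\theta^\lambda_i(k)$ and $\theta^\lambda_j(k)$. Thus, $\lambda_{ij}(k)$ has non-zero coefficients in the matrix $B$ only in columns $i$ and $j$. The coefficients of the variables $\lambda$ in the linear part of the criterion are all zero (each $\lambda_{ij}(2)$ enters $\theta^\lambda_i(2)$ and $\theta^\lambda_j(2)$ with opposite signs), and the other conditions are straightforward.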
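A small numerical sketch of (21) and (22) for two labels, in Python. The function names, the three-node instance and the dictionary representation of $\lambda$ (keyed by oriented edges of $E'$) are ours. It computes the transformed unaries and checks that evaluating (20a) directly agrees with the rewritten form based on (22).

```python
def theta_lambda(theta, lam):
    """Equation (21).

    theta[i][k]: unary potentials theta_i(k); lam maps each oriented edge
    (i, j) of E' to the list [lambda_ij(k) for k in K].
    """
    out = [list(row) for row in theta]
    for (i, j), values in lam.items():
        for k, value in enumerate(values):
            out[i][k] += value   # edge (i, j) leaves i: + lambda_ij(k)
            out[j][k] -= value   # edge (i, j) enters j: - lambda_ij(k)
    return out


def criterion_20a(theta, lam):
    return sum(max(row) for row in theta_lambda(theta, lam))


def criterion_via_22(theta, lam):
    # Two labels only: max{t1, t2} = max{t1 - t2, 0} + t2, as in (22)
    return sum(max(t1 - t2, 0.0) + t2 for t1, t2 in theta_lambda(theta, lam))


theta = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
lam = {(0, 1): [0.5, -0.5], (1, 2): [0.25, 0.0]}   # satisfies the box (20b)
print(theta_lambda(theta, lam), criterion_20a(theta, lam), criterion_via_22(theta, lam))
```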

We reported the experiments on the Potts problem in [1], where the optimality was not yet proven. In addition, the update designed ad hoc in [1] becomes just a special case of our general update here.

### 4.5 Binarized Monotone Linear Programs

In [14], integer linear programs with at most two variables per constraint were discussed. It was also allowed to have three variables in some constraints if one of the variables occurred only in this constraint and in the objective function. Although the objective function in [14] was allowed to be more general, we restrict ourselves to a linear criterion function. It was also shown that such problems can be transformed into binarized monotone constraints over binary variables by introducing additional variables whose number is determined by the bounds of the original variables; such an optimization problem reads

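$$
\begin{aligned}
\min\;&w^Tx+e^Tz&&\text{(23a)}\\
&Ax-Iz\le0&&\text{(23b)}\\
&Cx\le0&&\text{(23c)}\\
&x\in\{0,1\}^{n_1}&&\text{(23d)}\\
&z\in\{0,1\}^{n_2},&&\text{(23e)}
\end{aligned}
$$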

where $A$ and $C$ contain exactly one $1$ and exactly one $-1$ per row and all other entries are zero, and $I$ is the identity matrix. We refer the reader to [14] for details, where it is also explained that the LP relaxation of (23) can be solved by a minimum $st$-cut on an associated graph. Note that the LP relaxation of (23) is subsumed by the dual (6), because one can change the minimization into maximization by negating the objective. Also, the relaxation satisfies the conditions given by Theorem 3.1.

The paper [14] lists many problems that are transformable to (23) and are also directly (without any complicated transformation) subsumed by the dual (6) and satisfy Theorem 3.1; for example, minimizing the sum of weighted completion times of precedence-constrained jobs (ISLO formulation in [9]), generalized independent set (forest harvesting problem in [12]), generalized vertex cover, the clique problem, and Min-SAT (introduced in [16], LP formulation in [13]).

For each of these problems, it is easy to verify the conditions of Theorem 3.1, because they contain at most two variables per constraint, if a constraint contains a third variable then it is the only occurrence of this variable, and the coefficients of the variables in the constraints are from the set $\{-1,0,1\}$.

The transformation presented in  can be applied to partial Max-SAT and vertex cover to obtain a problem in the form (23) and solve its LP relaxation. But this step is unnecessary when applying the presented coordinate-wise minimization approach.

## 5 Concluding Remarks

We have presented a new class of linear programs that are exactly solved by coordinate-wise minimization. We have shown that dual LP relaxations of several well-known combinatorial optimization problems (partial Max-2SAT, vertex cover, minimum $st$-cut, MAP inference with Potts potentials and two labels, and other problems) belong, possibly after a reformulation, to this class. We have shown experimentally (in this paper and in [1]) that the resulting methods are reasonably efficient for large-scale instances of these problems. When the assumptions of Theorem 3.1 are relaxed (e.g., general Max-SAT instead of Max-2SAT, or the Potts problem with any number of labels), the method experimentally still provides good local (though not global in general) minima.

We must admit, though, that the practical impact of Theorem 3.1 is limited because the presented dual LP relaxations satisfying its assumptions can be efficiently solved also by other approaches. Thus, max-flow/min-$st$-cut can be solved (besides well-known combinatorial algorithms such as Ford-Fulkerson) by message-passing methods such as TRW-S. Similarly, the Potts problem with two labels is tractable and can be reduced to max-flow. In general, all considered LP relaxations can be reduced to max-flow, as noted in §4.5. Note, however, that this does not make our result trivial because (as noted in §2) equivalent reformulations of problems may not preserve interior local minima, and thus message-passing methods are not equivalent in any obvious way to our method.

It is open whether there are practically interesting classes of linear programs that are solved exactly (or at least with constant approximation ratio) by (block-)coordinate minimization and are not solvable by known combinatorial algorithms such as max-flow. Another interesting question is which reformulations in general preserve interior local minima and which do not.

Our approach can pave the way to new efficient large-scale optimization methods in the future. Certain features of our results give us hope here. For instance, our approach has an important novel feature over message-passing methods: it applies to a constrained convex problem (the box constraints (5b) and (5c)). This can open the way to a new class of applications. Furthermore, updates along large variable blocks (which we have not explored) can speed up algorithms considerably; e.g., TRW-S uses updates along subtrees of a graphical model, while max-sum diffusion uses updates along single variables.

## References

• [1] Anonymous. Relative interior rule in block-coordinate minimization. Submitted to CVPR 2020.
• [2] C. Ansotegui, F. Bacchus, M. Järvisalo, R. Martins, et al. (2017). MaxSAT Evaluation 2017.
• [3] F. Bacchus, M. J. Järvisalo, R. Martins, et al. (2018). MaxSAT Evaluation 2018.
• [4] D. P. Bertsekas (1999). Nonlinear programming. 2nd edition, Athena Scientific, Belmont, MA.
• [5] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein (2011-01). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 (1), pp. 1–122.
• [6] Y. Boykov and V. S. Lempitsky (2006). From photohulls to photoflux optimization. In BMVC, Vol. 3, pp. 27.
• [7] Y. Boykov, O. Veksler, and R. Zabih (1998). Markov random fields with efficient approximations. In Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No. 98CB36231), pp. 648–655.
• [8] A. Chambolle and T. Pock (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. J. of Math. Imaging and Vision 40 (1), pp. 120–145.
• [9] F. A. Chudak and D. S. Hochbaum (1999). A half-integral linear programming relaxation for scheduling precedence-constrained jobs on a single machine. Operations Research Letters 25 (5), pp. 199–204.
• [10] D. Fulkerson and L. Ford (1962). Flows in networks. Princeton University Press.
• [11] A. Globerson and T. Jaakkola (2008). Fixing max-product: convergent message passing algorithms for MAP LP-relaxations. In Neural Information Processing Systems, pp. 553–560.
• [12] D. S. Hochbaum and A. Pathria (1997). Forest harvesting and minimum cuts: a new approach to handling spatial constraints. Forest Science 43 (4), pp. 544–554.
• [13] D. S. Hochbaum and A. Pathria (2000). Approximating a generalization of MAX 2SAT and MIN 2SAT. Discrete Applied Mathematics 107 (1-3), pp. 41–59.
• [14] D. S. Hochbaum (2002). Solving integer programs over monotone inequalities in three variables: a framework for half integrality and good approximations. European Journal of Operational Research 140 (2), pp. 291–321.
• [15] J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother (2015). A comparative study of modern inference techniques for structured discrete energy minimization problems. Intl. J. of Computer Vision 115 (2), pp. 155–184. ISSN 1573-1405.
• [16] R. Kohli, R. Krishnamurti, and P. Mirchandani (1994). The minimum satisfiability problem. SIAM Journal on Discrete Mathematics 7 (2), pp. 275–283.
• [17] V. Kolmogorov and R. Zabih (2001). Computing visual correspondence with occlusions via graph cuts. Technical report, Cornell University.
• [18] V. Kolmogorov (2006). Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Analysis and Machine Intelligence 28 (10), pp. 1568–1583.
• [19] V. Kolmogorov (2015-05). A new look at reweighted message passing. IEEE Trans. on Pattern Analysis and Machine Intelligence 37 (5).
• [20] V. A. Kovalevsky and V. K. Koval (approx. 1975). A diffusion algorithm for decreasing the energy of the max-sum labeling problem. Glushkov Institute of Cybernetics, Kiev, USSR. Unpublished.
• [21] V. Lempitsky, Y. Boykov, and D. Ivanov (2006). Oriented visibility for multiview reconstruction. In European Conference on Computer Vision, pp. 226–238.
• [22] V. Lempitsky and Y. Boykov (2007). Global optimization for shape fitting. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
• [23] (To appear.) MaxSAT Evaluation 2018: new developments and detailed results. Journal on Satisfiability, Boolean Modeling and Computation. Instances available at https://maxsat-evaluations.github.io/.
• [24] D. Průša and T. Werner (2017). LP relaxation of the Potts labeling problem is as hard as any linear program. IEEE Trans. Pattern Anal. Mach. Intell. 39 (7), pp. 1469–1475.
• [25] D. Scharstein and R. Szeliski (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision 47 (1-3), pp. 7–42.
• [26] M. I. Schlesinger and K. Antoniuk (2011). Diffusion algorithms and structural recognition optimization problems. Cybernetics and Systems Analysis 47, pp. 175–192. ISSN 1060-0396.
• [27] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother (2008). A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Trans. on Pattern Analysis and Machine Intelligence 30 (6), pp. 1068–1080.
• [28] P. Tseng (2001-06). Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109 (3), pp. 475–494.
• [29] T. Werner and D. Průša (2019-10). Relative interior rule in block-coordinate minimization. arXiv:1910.09488.
• [30] T. Werner (2007-07). A linear programming approach to max-sum problem: a review. IEEE Trans. Pattern Analysis and Machine Intelligence 29 (7), pp. 1165–1179.
• [31] K. Xu and W. Li (2003). Many hard examples in exact phase transitions with application to generating hard satisfiable instances. arXiv preprint cs/0302001. Instances available at http://sites.nlsde.buaa.edu.cn/~kexu/benchmarks/max-sat-benchmarks.htm.

## Appendix 0.A Details on Coordinate-wise Updates

We now describe coordinate-wise minimization for problem (5) satisfying the relative interior rule. Objective function (5a) restricted to a single variable $\varphi_i$ for a chosen $i\in[m]$ reads (up to a constant)

$$\max\{w_i-\varphi_i,0\}+\sum_{j\in[p],\,A_{ij}\neq0}\max\{A_{ij}\varphi_i+k_{ij},0\}+a_i\varphi_i,\qquad\text{(24)}$$

where

$$k_{ij}=v_j+\sum_{i'\in[m],\,i'\neq i}A_{i'j}\varphi_{i'}+B_{:j}^T\lambda.\qquad\text{(25)}$$

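This is a convex piecewise-affine function of $\varphi_i$. Its breakpoints are $w_i$ and $-k_{ij}/A_{ij}$ for each $j\in[p]$ with $A_{ij}\neq0$. To find its minimum subject to $\underline{\varphi}_i\le\varphi_i\le\overline{\varphi}_i$, it is enough to consider the cases listed below.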

1. If function (24) is strictly decreasing and $\overline{\varphi}_i$ is finite, then $\overline{\varphi}_i$ is the unique minimum.

2. If function (24) is strictly increasing and $\underline{\varphi}_i$ is finite, then $\underline{\varphi}_i$ is the unique minimum.

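3. If function (24) has a (possibly unbounded) interval $[y_1,y_2]$, where $y_1\le y_2$, as its set of minimizers, then the set of minimizers subject to $\underline{\varphi}_i\le\varphi_i\le\overline{\varphi}_i$ is the projection of $[y_1,y_2]$ onto $[\underline{\varphi}_i,\overline{\varphi}_i]$, i.e. the interval $[h_{[\underline{\varphi}_i,\overline{\varphi}_i]}(y_1),\,h_{[\underline{\varphi}_i,\overline{\varphi}_i]}(y_2)]$. Here $h_{[\ell,u]}(y)=\min\{\max\{y,\ell\},u\}$ denotes the projection of $y$ onto the interval $[\ell,u]$; the projections onto the unbounded intervals $[0,\infty)$ and $(-\infty,0]$ are defined similarly and denoted by $h_{\mathbb{R}^+_0}$ and $h_{\mathbb{R}^-_0}$ for brevity.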

In order to perform an update to the relative interior of the optimizers, we can simply set $\varphi_i=\overline{\varphi}_i$ in the first case and $\varphi_i=\underline{\varphi}_i$ in the second case. In the third case, the update to the relative interior corresponds to setting $\varphi_i$ to some value from the relative interior of this interval. In our implementation, we choose the midpoint of this interval if it is bounded. If it is unbounded in some direction, we choose a value at a fixed distance from its finite bound.

To identify which case occurred, one should analyse the slopes of the function between its breakpoints and the region of optima corresponds to the interval where the function (24) is constant. If there is no such interval, then its (unrestricted) minimum is at a breakpoint where the function changes from decreasing to increasing or the function is strictly monotone.

In other cases, function (24) is unbounded and therefore also the original problem (5) is unbounded.
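The case analysis above can be sketched as a single Python routine. This is our own illustration, not the authors' implementation: the restricted criterion is passed as a list of pairs $(c,d)$ meaning $\max\{c\,t+d,0\}$ with $c\neq0$ (the term $\max\{w_i-\varphi_i,0\}$ becomes the pair $(-1,w_i)$) plus a linear coefficient `lin`. The slope is evaluated once inside every affine piece; since the function is convex and each breakpoint raises the slope, there is at most one piece of slope zero. The unconstrained argmin is then projected onto the box, as in case 3, and `delta` is the fixed offset $\Delta$. Exact comparisons of slopes with zero assume integer coefficients such as $A_{ij}\in\{-1,0,1\}$.

```python
import math


def ri_argmin(terms, lin, lo=-math.inf, hi=math.inf, delta=1.0):
    """Point in the relative interior of the minimizers of
        g(t) = sum(max(c*t + d, 0) for (c, d) in terms) + lin*t   on [lo, hi].

    For (24): terms = [(-1, w_i)] + [(A_ij, k_ij) for j with A_ij != 0], lin = a_i.
    For (26): terms = [(B_ij, l_ij) for j with B_ij != 0], lin = b_i.
    """
    def slope(t):  # derivative of g at a point t that is not a breakpoint
        return lin + sum(c for c, d in terms if c * t + d > 0)

    bps = sorted({-d / c for c, d in terms})
    knots = [-math.inf] + bps + [math.inf]   # affine piece k spans [knots[k], knots[k+1]]
    if bps:
        samples = [bps[0] - 1.0] + [(u + v) / 2 for u, v in zip(bps, bps[1:])] + [bps[-1] + 1.0]
    else:
        samples = [0.0]
    slopes = [slope(t) for t in samples]     # nondecreasing because g is convex

    if 0 in slopes:                          # g is constant on one piece: the argmin
        k = slopes.index(0)
        L, U = knots[k], knots[k + 1]
    elif slopes[0] > 0:                      # strictly increasing everywhere
        L = U = -math.inf
    elif slopes[-1] < 0:                     # strictly decreasing everywhere
        L = U = math.inf
    else:                                    # slope changes sign at one breakpoint
        k = next(k for k, sl in enumerate(slopes) if sl > 0)
        L = U = knots[k]

    # Case 3: project the unconstrained argmin onto [lo, hi]
    L, U = min(max(L, lo), hi), min(max(U, lo), hi)
    if L == U:
        if math.isinf(L):
            raise ValueError("g is unbounded below on [lo, hi]")
        return L
    if math.isfinite(L) and math.isfinite(U):
        return 0.5 * (L + U)                 # midpoint of a bounded interval
    if math.isfinite(L):
        return L + delta                     # [L, +inf)
    if math.isfinite(U):
        return U - delta                     # (-inf, U]
    return 0.0                               # g is constant on the whole real line
```

For instance, problem (4) equals $\max\{x,0\}+\max\{-x+1,0\}+\max\{-x+2,0\}-3$, so `ri_argmin([(1, 0), (-1, 1), (-1, 2)], 0)` returns the midpoint 1.5 of its optimizer set $[1,2]$.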

Objective function (5a) restricted to a single variable $\lambda_i$ reads (up to a constant)

$$\sum_{j\in[p],\,B_{ij}\neq0}\max\{B_{ij}\lambda_i+l_{ij},0\}+b_i\lambda_i,\qquad\text{(26)}$$

where

$$l_{ij}=v_j+A_{:j}^T\varphi+\sum_{i'\in[n],\,i'\neq i}B_{i'j}\lambda_{i'}.\qquad\text{(27)}$$

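To find the minimum of this function subject to $\underline{\lambda}_i\le\lambda_i\le\overline{\lambda}_i$, one can apply the same procedure as with $\varphi_i$, except that the breakpoints are only $-l_{ij}/B_{ij}$ for each $j\in[p]$ with $B_{ij}\neq0$.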

## Appendix 0.B Proof of Theorem 3.2

###### Observation 0.B.1

For a given $i\in[m]$, $j\in[p]$ with $A_{ij}\neq0$, fixed value of $k_{ij}$ and the corresponding breakpoint $b=-k_{ij}/A_{ij}$ of the restricted criterion (24), the value $x_j$ determined by (10) satisfies

$$x_j=\begin{cases}\tfrac12(1+A_{ij})&\text{if }\varphi_i>b\\ \tfrac12&\text{if }\varphi_i=b\\ \tfrac12(1-A_{ij})&\text{if }\varphi_i<b\end{cases}$$
###### Observation 0.B.2

For a given $i\in[n]$, $j\in[p]$ with $B_{ij}\neq0$, fixed value of $l_{ij}$ and the corresponding breakpoint