Coordinate-wise minimization (or coordinate descent) is an iterative optimization method which in every iteration finds a global minimum of the problem over a single variable while keeping the other variables fixed. (In this paper we consider only exact updates, where in every iteration the global minimum over a variable is found.) For general convex optimization problems, the method need not converge and its fixed points need not be global minima. A simple example is the unconstrained minimization of a convex function that is unbounded from below and yet has coordinate-wise local minima. For some classes of objective functions the method is known to converge to a global minimum. It is easy to show that for unconstrained minimization of a differentiable convex function, any fixed point of the method is a global minimum. Moreover, if the function has unique univariate minima, then any limit point of the method is a global minimum [4, §2.7]. The same properties hold for convex functions whose non-differentiable part is separable. A natural extension of the method is block-coordinate minimization, where every iteration minimizes the objective over a block of variables.
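The exact cyclic scheme just described can be sketched as follows. The helper `argmin_coord`, which performs the exact univariate minimization, is hypothetical and problem-specific:

```python
def coordinate_minimization(argmin_coord, x, n_cycles=100):
    """Exact cyclic coordinate minimization (a generic sketch).

    argmin_coord -- hypothetical helper: argmin_coord(x, i) returns a
                    global minimizer of the objective over coordinate i,
                    the remaining coordinates of x being held fixed
    x            -- initial point (list of floats), modified in place
    """
    for _ in range(n_cycles):
        for i in range(len(x)):
            x[i] = argmin_coord(x, i)  # exact update of coordinate i
    return x
```

For instance, for the strictly convex quadratic f(x1, x2) = (x1 - 1)^2 + (x2 + 2)^2 + x1·x2, the exact coordinate updates are x1 = 1 - x2/2 and x2 = -2 - x1/2, and the iterates converge to the unique global minimum (8/3, -10/3), illustrating the differentiable convex case mentioned above.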
Despite missing guarantees of global optimality, (block-)coordinate minimization can be a method of choice for large-scale convex optimization problems. A notable example is the class of convergent message-passing methods for solving the dual linear programming (LP) relaxation of maximum a posteriori (MAP) inference in graphical models, which can be seen as various forms of (block-)coordinate minimization applied to various forms of the dual. In the typical case, the dual LP relaxation boils down to the unconstrained minimization of a convex piecewise-affine (hence non-differentiable) function. These methods include max-sum diffusion [20, 26, 30], TRW-S, MPLP, and SRMP. They do not guarantee global optimality, but for large sparse instances from, e.g., computer vision, the achieved coordinate-wise local optima are very good and TRW-S is significantly faster than competing methods [27, 15], including popular first-order primal-dual methods such as ADMM.
This is a motivation to look for other classes of convex optimization problems for which (block-)coordinate descent would work well or, alternatively, to extend convergent message-passing methods to a wider class of convex problems than the dual LP relaxation of MAP inference. A step in this direction is the work where it was observed that if the minimizer of the objective function over the current variable block is not unique, one should choose a minimizer that lies in the relative interior of the set of block-optimizers. It is shown there that any update satisfying this condition is, in a precise sense, not worse than any other exact update.
To be precise, suppose we minimize a convex function on a closed convex set and assume that the function is bounded from below on this set. For brevity of formulation, we rephrase this as the minimization of an extended-valued function that coincides with the original function on the set and equals plus infinity outside it. One iteration of coordinate minimization with the relative interior rule chooses a variable index
and replaces the current estimate with a new estimate such that (in the cited work, the iteration is formulated in a more abstract, coordinate-free notation; since we focus only on coordinate-wise minimization here, we use a more concrete notation)
where ri denotes the relative interior of a convex set. As this is a univariate convex problem, the set of minimizers is either a singleton or an interval. In the latter case, the relative interior rule requires that we choose a point from the interior of this interval. A point that satisfies
this condition for every variable index is called a (coordinate-wise) interior local minimum of the function on the set.
It is natural to ask for which convex problems interior local minima are global minima. (We neglect convergence issues in this paper and assume that the method converges to an interior local minimum. This is supported by experiments; e.g., max-sum diffusion and TRW-S have this property. More on convergence can be found in the cited work.) A succinct characterization of this class is elusive. Two subclasses of this class are known, though [18, 26, 30]: the dual LP relaxation of MAP inference with pairwise potential functions and two labels, or with submodular potential functions. In this paper, we restrict ourselves to linear programs (where the objective is linear and the feasible set is a convex polyhedron) and present a different class of linear programs with this property. We show that dual LP relaxations of a number of combinatorial optimization problems belong to this class and that coordinate-wise minimization converges in reasonable time on large practical instances. We must note, however, that there exist more efficient large-scale algorithms for solving these LP relaxations (such as reduction to max-flow), which makes the practical impact of our study limited so far.
2 Reformulations of Problems
Before presenting our main result, we make an important remark: while a convex optimization problem can be reformulated in many ways into an ‘equivalent’ problem with the same global minima, not all of these transformations are equivalent with respect to coordinate-wise minimization; in particular, not all of them preserve interior local minima.
One example is dualization. Even if coordinate-wise minimization achieves good local (or even global) minima on a convex problem, it can get stuck in very poor local minima on the dual of that problem. Indeed, trying to apply (block-)coordinate minimization to the primal LP relaxation of MAP inference (linear optimization over the local marginal polytope) has been futile so far.
Consider the linear program
which has a single interior local minimum with respect to individual coordinates, and this minimum is also the unique global optimum. But if one adds a redundant constraint, then any feasible point becomes an interior local minimum with respect to individual coordinates, because the redundant constraint prevents changing either variable while the other is kept fixed.
Consider the linear program
which can also be formulated as
Then, when optimizing directly in form (4), one can see that all interior local optima are global optimizers.
However, when one introduces auxiliary variables and applies coordinate-wise minimization to the corresponding problem (2), then there exist interior local optima that are not global optimizers.
3 Main Result
The optimization problem with which we are going to deal is, in its most general form, defined as
where the problem data consist of constraint matrices, cost vectors, and bounds of compatible dimensions, subject to the stated sign assumptions. We optimize over two groups of variables. Subscripted matrices denote the corresponding column and row of the matrix, respectively.
Applying coordinate-wise minimization with the relative-interior rule to problem (5) corresponds to cyclic updates of the variables, where each update amounts to finding the set of minimizers of a convex piecewise-affine function of one variable on an interval. If the set of minimizers is a singleton, the update is straightforward. If the set of minimizers is a bounded interval, the variable is assigned the midpoint of this interval. If the set of minimizers is an interval unbounded from above, the variable is set to the finite endpoint plus a fixed constant; the case of an interval unbounded from below is handled symmetrically, subtracting the constant from the finite endpoint. The details of the update in this setting are given in Appendix 0.A.
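The choice of a point from the relative interior of the interval of minimizers described above can be sketched as follows (the name `K` for the fixed offset used in the unbounded cases is ours):

```python
def relative_interior_point(lo, hi, K=1.0):
    """Pick a point from the relative interior of the interval [lo, hi]
    of univariate minimizers; lo may be -inf and hi may be +inf.
    K is the fixed constant used when the interval is unbounded
    (an assumption of this sketch, matching the rule in the text)."""
    inf = float("inf")
    if lo == -inf and hi == inf:
        return 0.0               # whole line: any point works; pick 0
    if hi == inf:
        return lo + K            # unbounded from above
    if lo == -inf:
        return hi - K            # unbounded from below
    return (lo + hi) / 2.0       # bounded interval: midpoint
```

A singleton interval (lo equal to hi) is covered by the last case, which then returns the unique minimizer.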
In order to prove this claim, we formulate problem (5) as a linear program by introducing two groups of additional variables and construct its dual. The proof of optimality is then obtained by constructing a dual feasible solution that satisfies complementary slackness.
The primal linear program (with corresponding dual variables and constraints on the same lines) reads
where the dual criterion is
and clearly, at the optimum of the primal, we have
Even though the primal-dual pair (6) might seem overcomplicated, such a general description is in fact necessary: as described in Section 2, equivalent reformulations need not preserve the structure of interior local minima, and we would like to describe as general a class for which optimality is guaranteed as possible.
4.1 Weighted Partial Max-SAT
In weighted partial Max-SAT, one is given two sets of clauses: soft and hard. Each soft clause is assigned a positive weight. The task is to find values of binary variables such that all hard clauses are satisfied and the sum of the weights of the satisfied soft clauses is maximized.
We organize the soft clauses into a matrix defined as
In addition, we denote the number of negated variables in each clause; these numbers are stacked in a vector. The hard clauses are organized in a matrix and a vector in the same manner.
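The displayed definition of the clause matrix is elided here, so the following sketch assumes one common encoding consistent with the surrounding text: a +1 entry for a positive literal, a −1 entry for a negated literal, and a count of negations per clause, so that satisfaction of clause i relaxes to the linear constraint sum_j C[i][j]·x[j] + n[i] ≥ z[i]:

```python
def clause_matrix(clauses, n_vars):
    """Build the clause matrix C and negation counts n for a CNF.

    clauses: list of clauses; each clause is a list of nonzero ints,
             +j for literal x_j and -j for its negation (1-based,
             DIMACS-style).
    Returns (C, n): C[i][j-1] is +1 / -1 / 0 and n[i] counts the
    negated variables of clause i. This particular encoding is an
    assumption of this sketch.
    """
    C = [[0] * n_vars for _ in clauses]
    n = [0] * len(clauses)
    for i, clause in enumerate(clauses):
        for lit in clause:
            j = abs(lit) - 1
            C[i][j] = 1 if lit > 0 else -1
            if lit < 0:
                n[i] += 1
    return C, n
```

For example, the two clauses (x1 ∨ ¬x2) and (¬x1 ∨ ¬x3) over three variables yield rows [1, −1, 0] and [−1, 0, −1] with negation counts 1 and 2.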
The LP relaxation of this problem reads
where the criterion coefficients are the weights of the soft clauses. This is a subclass of the dual (6): the slack variables for the dual constraints (6h) correspond to (11b), the slack variables for the dual constraints (6i) correspond to (11c), and the remaining data of (6) are set accordingly.
Also notice that if we omitted the soft clauses (11b) and instead minimized the sum of the variables, we would obtain an instance of Min-Ones SAT, which could be generalized to weighted Min-Ones SAT. This relaxation would still satisfy the requirements of Theorem 3.1 if all present hard clauses have length at most 2.
We tested the method on the 800 smallest instances (smallest in the sense of file size; not all instances could be evaluated due to their size and lengthy evaluation) that appeared in the Max-SAT Evaluations in the years 2017 and 2018. The results on the instances are divided into groups in Table 1 based on the minimal and maximal length of the present clauses. We also tested this approach on 60 instances of weighted Max-2SAT from Ke Xu. The highest number of logical variables in an instance was 19034 and the highest overall number of clauses in an instance was 31450. It was important to separate the instances without unit clauses (i.e., clauses of length 1), because in such cases the LP relaxation (11) has a trivial optimal solution.
Coordinate-wise minimization was stopped when the criterion did not improve by at least a fixed tolerance after a whole cycle of updates over all variables. We report the quality of the solution as the median and mean relative difference between the optimal criterion value and the criterion value reached by coordinate-wise minimization before termination.
Table 1 reports not only instances of weighted partial Max-2SAT but also instances with longer clauses, for which optimality is no longer guaranteed. Nevertheless, the relative differences on instances with longer clauses still seem reasonably small, and the attained values could be usable as bounds in a branch-and-bound scheme.
4.2 Weighted Vertex Cover
where the sums range over the set of nodes and the set of edges of an undirected graph. This problem also satisfies the conditions of Theorem 3.1, and therefore the corresponding primal (5) has no non-optimal interior local minima.
On the other hand, notice that formulation (12), which corresponds to the dual (6), can have non-optimal interior local minima even with respect to larger subsets of variables; an example is given in Appendix 0.C.
4.3 Minimum s-t Cut, Maximum Flow
Recall the usual formulation of the max-flow problem between nodes s and t on a directed graph with positive edge weights, which reads
Assume that there is no edge from s to t, no incoming edges to s, and no outgoing edges from t. Then any feasible flow in (13) is an interior local optimum with respect to individual coordinates, by the same reasoning as in Example 1: the flow conservation constraint (13c) limits each individual variable to a single value. We are going to propose a formulation that has no non-globally-optimal interior local optima.
The dual problem to (13) is the minimum s-t cut problem, which can be formulated as
where each edge variable indicates whether the edge is in the cut or not. The cut should separate s and t, so after the cut the nodes split into the set still connected to s and the set connected to t. Formulation (14) differs from the usual formulation by complementing the cut indicator variables: we therefore maximize the weight of the edges that are not cut instead of minimizing the weight of the cut edges, and the value of the minimum s-t cut equals the total edge weight minus the optimal value of (14).
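The complementary relation between the minimum cut value and the maximum uncut weight can be checked by brute force on a tiny directed graph (the graph and node names below are illustrative):

```python
from itertools import product

def min_cut_and_max_uncut(nodes, edges, s, t):
    """Brute-force over all s-t partitions of a tiny directed graph.

    edges: dict mapping (u, v) -> positive weight.
    Returns (min_cut_value, max_uncut_weight); by construction their
    sum equals the total edge weight, which illustrates the relation
    between the usual min-cut formulation and formulation (14).
    """
    others = [v for v in nodes if v not in (s, t)]
    total = sum(edges.values())
    best_cut = float("inf")
    for bits in product([0, 1], repeat=len(others)):
        side = {s: 1, t: 0}          # 1 = source side, 0 = sink side
        side.update(zip(others, bits))
        # a directed edge is cut iff it goes from source side to sink side
        cut = sum(w for (u, v), w in edges.items()
                  if side[u] == 1 and side[v] == 0)
        best_cut = min(best_cut, cut)
    return best_cut, total - best_cut
```

This exponential enumeration is only a sanity check of the identity, not a substitute for the coordinate-wise method or for combinatorial max-flow algorithms.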
Formulation (14) is subsumed by the dual (6) by an appropriate choice of the data, with one of the constraint matrices omitted. Also notice that each variable occurs in at most one constraint. Problem (14) therefore satisfies the conditions of Theorem 3.1, and the corresponding primal (5) is a formulation of the maximum-flow problem in which one can search for the maximum flow by coordinate-wise minimization. The corresponding formulation (5) reads
We have tested our formulation for coordinate-wise minimization on max-flow instances from computer vision (available at https://vision.cs.uwaterloo.ca/data/maxflow). We report the same statistics as for Max-SAT in Table 2; the instances correspond to stereo problems, multiview reconstruction instances, and shape fitting problems.
For multiview reconstruction and shape fitting, we were able to run our algorithm only on small instances. On these instances, the algorithm terminated with the reported precision in 13 to 34 minutes on a laptop.
|Instance Group or Instance|#inst.|Mean RD|Median RD|
|BVZ-sawtooth|20|||
|BVZ-venus|22|||
|KZ2-sawtooth|20|||
|KZ2-venus|22|||
4.4 MAP Inference with Potts Potentials
Coordinate-wise minimization for the dual LP relaxation of MAP inference has been intensively studied; see, e.g., the cited review. One of the formulations is
where the sums range over a set of labels, a set of nodes, and a set of unoriented edges, and
are equivalent transformations of the potentials. Notice that there are two such variables per edge, one for each direction. It has been observed that in the case of Potts interactions one can add the constraints
subject to appropriate sign constraints. The decision of which of the two variables should have the inverted sign depends on the chosen orientation of the originally undirected edges and is arbitrary. Also, given values satisfying (18), a corresponding identity holds for any edge and pair of labels, which can be seen from the properties of the Potts interactions.
Therefore, one can reformulate (16) into
where the equivalent transformation in variables is given by
and we optimize over the new variables; the new graph is the same as the original graph except that each edge becomes oriented (in an arbitrary direction). The way of obtaining an optimal solution to (16) from an optimal solution of (20) is given by (19) and depends on the chosen orientation of the edges. Also observe that the node criterion values agree for any node and label, and therefore the optimal values of (16) and (20) are equal. This reformulation therefore maps global optima of (20) to global optima of (16). However, it does not map interior local minima of (20) to interior local minima of (16) in general; an example of such a case is shown in Appendix 0.D.
Each such variable is present in only two constraints and thus has non-zero coefficients in the constraint matrix only in two columns. The coefficients of the variables in the criterion and the other conditions are straightforward to verify.
4.5 Binarized Monotone Linear Programs
Integer linear programs with at most two variables per constraint were discussed in the cited work. It was also allowed to have three variables in some constraints if one of the variables occurred only in this constraint and in the objective function. Although the objective function considered there
was allowed to be more general, we restrict ourselves to a linear criterion function. It was also shown that such problems can be transformed into binarized monotone constraints over binary variables by introducing additional variables, whose number is determined by the bounds of the original variables. Such an optimization problem reads
where the constraint matrices contain exactly one +1 per row and exactly one −1 per row and all other entries are zero, and
the remaining block is the identity matrix. We refer the reader to the cited work for details, where it is also explained that the LP relaxation of (23) can be solved by min-cut on an associated graph. Notice that the LP relaxation of (23) is subsumed by the dual (6), because one can change the minimization into maximization by changing signs. Also, the relaxation satisfies the conditions given by Theorem 3.1.
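The binarization of bounded integer variables mentioned above can be illustrated by the standard unary ("thermometer") encoding; whether this matches the cited transformation in every detail is an assumption of this sketch:

```python
def binarize(x, upper):
    """Unary ('thermometer') encoding of an integer 0 <= x <= upper:
    z[k] = 1 iff x >= k + 1, so the number of binary variables equals
    the upper bound of the original variable. Monotonicity
    z[k+1] <= z[k] is exactly a 'binarized monotone' constraint, with
    one +1 and one -1 per row of the constraint matrix."""
    return [1 if x >= k + 1 else 0 for k in range(upper)]

def debinarize(z):
    """Recover the integer value as the number of ones."""
    return sum(z)
```

For example, x = 3 with upper bound 5 is encoded as [1, 1, 1, 0, 0], and any monotone 0/1 vector decodes back to an integer in the original range.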
The cited paper lists many problems that are transformable to (23) and are also directly (without any complicated transformation) subsumed by the dual (6) and satisfy Theorem 3.1, for example: minimizing the sum of weighted completion times of precedence-constrained jobs (the ISLO formulation), generalized independent set (the forest harvesting problem), generalized vertex cover, the clique problem, and Min-SAT.
For each of these problems, it is easy to verify the conditions of Theorem 3.1, because they contain at most two variables per constraint; if a constraint contains a third variable, then this is the only occurrence of that variable, and the coefficients of the variables in the constraints come from a small fixed set.
5 Concluding Remarks
We have presented a new class of linear programs that are exactly solved by coordinate-wise minimization. We have shown that the dual LP relaxations of several well-known combinatorial optimization problems (partial Max-2SAT, vertex cover, minimum s-t cut, MAP inference with Potts potentials and two labels, and other problems) belong, possibly after a reformulation, to this class. We have shown experimentally (in this paper and in the cited report) that the resulting methods are reasonably efficient for large-scale instances of these problems. When the assumptions of Theorem 3.1 are relaxed (e.g., general Max-SAT instead of Max-2SAT, or the Potts problem with any number of labels), the method experimentally still provides good local (though not global in general) minima.
We must admit, though, that the practical impact of Theorem 3.1 is limited, because the presented dual LP relaxations satisfying its assumptions can also be solved efficiently by other approaches. Thus, max-flow/min-cut can be solved (besides well-known combinatorial algorithms such as Ford-Fulkerson) by message-passing methods such as TRW-S. Similarly, the Potts problem with two labels is tractable and can be reduced to max-flow. In general, all considered LP relaxations can be reduced to max-flow, as noted in §4.5. Note, however, that this does not make our result trivial, because (as noted in §2) equivalent reformulations of problems need not preserve interior local minima, and thus message-passing methods are not equivalent in any obvious way to our method.
It is open whether there are practically interesting classes of linear programs that are solved exactly (or at least with constant approximation ratio) by (block-)coordinate minimization and are not solvable by known combinatorial algorithms such as max-flow. Another interesting question is which reformulations in general preserve interior local minima and which do not.
Our approach can pave the way to new efficient large-scale optimization methods in the future. Certain features of our results give us hope here. For instance, our approach has an important novel feature over message-passing methods: it applies to a constrained convex problem (via the box constraints (5b) and (5c)). This can open the way to a new class of applications. Furthermore, updates over large variable blocks (which we have not explored) can speed up the algorithms considerably; e.g., TRW-S uses updates along subtrees of a graphical model, while max-sum diffusion uses updates along single variables.
-  Relative interior rule in block-coordinate minimization. Submitted to CVPR 2020. Cited by: §4.2, §4.4, §5.
-  (2017) MaxSAT Evaluation 2017. Cited by: §4.1.1.
-  (2018) MaxSAT Evaluation 2018. Cited by: §4.1.1.
-  (1999) Nonlinear programming. 2nd edition, Athena Scientific, Belmont, MA. Cited by: §1.
-  (2011-01) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 (1), pp. 1–122. Cited by: §1.
-  (2006) From photohulls to photoflux optimization.. In BMVC, Vol. 3, pp. 27. Cited by: Table 2.
-  (1998) Markov random fields with efficient approximations. In Proceedings of the 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 648–655. Cited by: Table 2.
-  (2011) A first-order primal-dual algorithm for convex problems with applications to imaging.. J. of Math. Imaging and Vision 40 (1), pp. 120–145. Cited by: §1.
-  (1999) A half-integral linear programming relaxation for scheduling precedence-constrained jobs on a single machine. Operations Research Letters 25 (5), pp. 199–204. Cited by: §4.5.
-  (1962) Flows in networks. Princeton University Press. Cited by: §4.3.
-  (2008) Fixing max-product: convergent message passing algorithms for MAP LP-relaxations. In Neural Information Processing Systems, pp. 553–560. Cited by: §1.
-  (1997) Forest harvesting and minimum cuts: a new approach to handling spatial constraints. Forest Science 43 (4), pp. 544–554. Cited by: §4.5.
-  (2000) Approximating a generalization of MAX 2SAT and MIN 2SAT. Discrete Applied Mathematics 107 (1-3), pp. 41–59. Cited by: §4.5.
-  (2002) Solving integer programs over monotone inequalities in three variables: a framework for half integrality and good approximations. European Journal of Operational Research 140 (2), pp. 291–321. Cited by: §4.5, §4.5, §4.5.
-  (2015) A comparative study of modern inference techniques for structured discrete energy minimization problems. Intl. J. of Computer Vision 115 (2), pp. 155–184. External Links: Cited by: §1.
-  (1994) The minimum satisfiability problem. SIAM Journal on Discrete Mathematics 7 (2), pp. 275–283. Cited by: §4.5.
-  (2001) Computing visual correspondence with occlusions via graph cuts. Technical report Cornell University. Cited by: Table 2.
-  (2006) Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Analysis and Machine Intelligence 28 (10), pp. 1568–1583. Cited by: §1, §1.
-  (2015-05) A new look at reweighted message passing. IEEE Trans. on Pattern Analysis and Machine Intelligence 37 (5). Cited by: §1.
-  (approx. 1975) A diffusion algorithm for decreasing the energy of the max-sum labeling problem. Note: Glushkov Institute of Cybernetics, Kiev, USSR. Unpublished Cited by: §1.
-  (2006) Oriented visibility for multiview reconstruction. In European Conference on Computer Vision, pp. 226–238. Cited by: Table 2.
-  (2007) Global optimization for shape fitting. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. Cited by: Table 2.
-  (To Appear.) MaxSAT Evaluation 2018: new developments and detailed results. Journal on Satisfiability, Boolean Modeling and Computation. Note: Instances available at https://maxsat-evaluations.github.io/. Cited by: §4.1.1, Table 1.
-  (2017) LP relaxation of the Potts labeling problem is as hard as any linear program. IEEE Trans. Pattern Anal. Mach. Intell. 39 (7), pp. 1469–1475. Cited by: §4.4.
-  (2002) A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision 47 (1-3), pp. 7–42. Cited by: Table 2.
-  (2011) Diffusion algorithms and structural recognition optimization problems. Cybernetics and Systems Analysis 47, pp. 175–192. External Links: Cited by: §1, §1.
-  (2008) A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Trans. on Pattern Analysis and Machine Intelligence 30 (6), pp. 1068–1080. Cited by: §1.
-  (2001-06) Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109 (3), pp. 475–494. Cited by: §1.
-  (2019-10) Relative interior rule in block-coordinate minimization. External Links: Cited by: §1, §1, footnote 2, footnote 3.
-  (2007-07) A linear programming approach to max-sum problem: a review. IEEE Trans. Pattern Analysis and Machine Intelligence 29 (7), pp. 1165–1179. Cited by: §1, §1, §4.4.
-  Many hard examples in exact phase transitions with application to generating hard satisfiable instances. arXiv preprint cs/0302001. Note: Instances available at http://sites.nlsde.buaa.edu.cn/~kexu/benchmarks/max-sat-benchmarks.htm. Cited by: §4.1.1, Table 1.
Appendix 0.A Details on Coordinate-wise Updates
This is a convex piecewise-affine function of the updated variable. Its breakpoints are determined by the constraints in which the variable occurs. To find its minimum subject to the box constraint, it is enough to consider the cases listed below.
If function (24) is strictly decreasing and the upper bound of the feasible interval is finite, then this upper bound is the unique minimizer.
If function (24) is strictly increasing and the lower bound of the feasible interval is finite, then this lower bound is the unique minimizer.
If function (24) has a (possibly unbounded) interval as its set of minimizers, then the set of minimizers subject to the box constraint is the projection of this interval onto the feasible range, i.e., again an interval. (Projection onto a bounded interval is defined by clipping; projection onto unbounded intervals is defined similarly.)
In order to perform an update to the relative interior of the set of minimizers, we can simply set the variable to the attained bound in the first two cases. In the third case, the update to the relative interior corresponds to setting the variable to some value from the resulting interval. In our implementation, we choose the midpoint of this interval if it is bounded; if it is unbounded in some direction, we choose a value at a fixed distance from its finite bound.
To identify which case occurred, one should analyse the slopes of the function between its breakpoints: the region of minimizers corresponds to the interval where function (24) is constant. If there is no such interval, then its (unrestricted) minimum is at a breakpoint where the slope changes from negative to positive, or the function is strictly monotone.
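The slope analysis described above can be sketched as follows, assuming the restricted function is represented by its sorted breakpoints and the (nondecreasing, by convexity) slopes of its affine pieces; this representation is an assumption of the sketch:

```python
def minimizer_interval(breakpoints, slopes, lo, hi):
    """Interval of minimizers of a convex piecewise-affine function
    over the box [lo, hi] (lo and hi may be -inf / +inf).

    breakpoints -- sorted list b_1 < ... < b_m
    slopes      -- list of m + 1 nondecreasing slopes; slopes[k] is the
                   slope of the piece between b_k and b_{k+1}
    Returns (L, U): the unrestricted minimizers lie where the slope
    changes sign (or on zero-slope pieces), projected onto [lo, hi].
    """
    inf = float("inf")
    L, U = -inf, inf
    m = len(breakpoints)
    for k in range(m + 1):
        right = breakpoints[k] if k < m else inf
        left = breakpoints[k - 1] if k > 0 else -inf
        if slopes[k] < 0:
            L = right            # still decreasing up to this breakpoint
        elif slopes[k] > 0:
            U = left             # first increasing piece ends the flat region
            break

    def clip(t):                 # projection onto the box [lo, hi]
        return min(max(t, lo), hi)

    return clip(L), clip(U)
```

For a strictly monotone function the returned interval collapses to the corresponding finite bound of the box, and for a function with a flat piece it returns that piece clipped to the box; the relative-interior update then picks the midpoint of a bounded interval, or a point at a fixed distance from the finite bound of an unbounded one.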
Objective function (5a) restricted to a single variable reads (up to a constant)
To find the minimum of this function subject to the box constraint, one can apply the same procedure as above, only with an appropriately modified set of breakpoints.
Appendix 0.B Proof of Theorem 3.2
For a given , with , fixed value of and the corresponding breakpoint