1 Introduction
Coordinatewise minimization (or coordinate descent) is an iterative optimization method which in every iteration finds a global minimum of the problem over a single variable, while keeping the other variables fixed (in this paper we consider only exact updates, where in every iteration the global minimum over a variable is found). For general convex optimization problems, the method need not converge and its fixed points need not be global minima. A simple example is the unconstrained minimization of the function , which is unbounded, yet any point with is a coordinatewise local minimum. For some classes of objective functions, however, the method is known to converge to a global minimum. It is trivial to show that for unconstrained minimization of a differentiable convex function, any fixed point of the method is a global minimum. Moreover, if the function has unique univariate minima, then any limit point of the method is a global minimum [4, §2.7]. The same properties hold for convex functions whose nondifferentiable part is separable [28]. A natural extension of the method is block-coordinate minimization, where every iteration minimizes the objective over a block of variables.
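To fix intuition, exact cyclic coordinate minimization can be sketched in a few lines. The quadratic objective below is purely illustrative (not an example from the paper); it is chosen because each exact univariate minimum has a closed form:

```python
import numpy as np

def coordinate_minimize(Q, b, x0, sweeps=100):
    """Exact cyclic coordinate minimization of f(x) = 0.5*x^T Q x + b^T x
    for a symmetric positive-definite Q. Each step minimizes f over one
    coordinate in closed form while the other coordinates stay fixed."""
    x = np.array(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(x)):
            # df/dx_i = Q[i] @ x + b[i] = 0 gives the exact univariate minimum
            x[i] = x[i] - (Q[i] @ x + b[i]) / Q[i, i]
    return x

# For a strictly convex quadratic this converges to the global minimum.
Q = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([-1.0, -2.0])
x = coordinate_minimize(Q, b, [0.0, 0.0])
```

For a strictly convex differentiable function like this one, the iterates converge to the unique global minimum, matching the guarantee cited from [4, §2.7]; the interesting cases discussed in this paper are exactly the nondifferentiable ones where this fails.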
Despite missing guarantees of global optimality, (block-)coordinate minimization can be a method of choice for large-scale convex optimization problems. A notable example is the class of convergent message-passing methods for solving the dual linear programming (LP) relaxation of maximum a posteriori (MAP) inference in graphical models, which can be seen as various forms of (block-)coordinate minimization applied to various forms of the dual. In the typical case, the dual LP relaxation boils down to the unconstrained minimization of a convex piecewise-affine (hence nondifferentiable) function. These methods include max-sum diffusion [20, 26, 30], TRWS [18], MPLP [11], and SRMP [19]. They do not guarantee global optimality, but for large sparse instances from, e.g., computer vision, the achieved coordinatewise local optima are very good and TRWS is significantly faster than competing methods [27, 15], including popular first-order primal-dual methods such as ADMM [5] or [8]. This is a motivation to look for other classes of convex optimization problems for which (block-)coordinate descent would work well or, alternatively, to extend convergent message-passing methods to a wider class of convex problems than the dual LP relaxation of MAP inference. A step in this direction is the work [29], where it was observed that if the minimizer of the objective function over the current variable block is not unique, one should choose a minimizer that lies in the relative interior of the set of block-optimizers. It is shown there that any update satisfying this condition is, in a precise sense, not worse than any other exact update.
To be precise, suppose we minimize a convex function on a closed convex set . We assume that is bounded from below on . For brevity of formulation, we rephrase this as the minimization of the extended-valued function that coincides with the objective on the feasible set and equals elsewhere. One iteration of coordinate minimization with the relative interior rule [29] chooses a variable index
and replaces an estimate
with a new estimate such that (in [29], the iteration is formulated in a more abstract, coordinate-free notation; since we focus only on coordinatewise minimization here, we use a more concrete notation), where denotes the relative interior of a convex set . As this is a univariate convex problem, the set of minimizers is either a singleton or an interval. In the latter case, the relative interior rule requires that we choose a point from the interior of this interval. A point that satisfies
for all is called a (coordinatewise) interior local minimum of function on set .
It is natural to ask for which convex problems interior local minima are global minima (we neglect convergence issues in this paper and assume that the method converges to an interior local minimum; this is supported by experiments, e.g., max-sum diffusion and TRWS have this property, and more on convergence can be found in [29]). A succinct characterization of this class is elusive. Two subclasses of this class are known, though [18, 26, 30]: the dual LP relaxation of MAP inference with pairwise potential functions and two labels, or with submodular potential functions. In this paper, we restrict ourselves to linear programs (where the objective is linear and the feasible set is a convex polyhedron) and present a different class of linear programs with this property. We show that dual LP relaxations of a number of combinatorial optimization problems belong to this class and that coordinatewise minimization converges in reasonable time on large practical instances. We must note, however, that there exist more efficient large-scale algorithms for solving these LP relaxations (such as reduction to max-flow), which makes the practical impact of our study limited so far.
2 Reformulations of Problems
Before presenting our main result, we make an important remark: while a convex optimization problem can be reformulated in many ways to an ‘equivalent’ problem which has the same global minima, not all of these transformations are equivalent with respect to coordinatewise minimization, in particular, not all preserve interior local minima.
One example is dualization. If coordinatewise minimization achieves good local (or even global) minima on a convex problem, it can get stuck in very poor local minima in the dual. Indeed, trying to apply (block)coordinate minimization to the primal LP relaxation of MAP inference (linear optimization over the local marginal polytope) has been futile so far.
Example 1
Consider the linear program
(1) 
which has a single interior local minimum with respect to individual coordinates, which is also the unique global optimum. But if one adds a redundant constraint, namely , then any feasible point becomes an interior local minimum w.r.t. individual coordinates, because the redundant constraint blocks changing the variable without changing , for both .
Example 2
Consider the linear program
(2a)  
(2b)  
(2c) 
which can be also formulated as
(3a)  
(3b) 
Optimizing over the individual variables by coordinatewise minimization in (2) does not yield the same interior local optima as in (3). For instance, assume that , and the problem (3) is given as
(4) 
where . Then, when optimizing directly in form (4), one can see that all the interior local optima are global optimizers.
However, when one introduces the variables and applies coordinatewise minimization to the corresponding problem (2), then there are interior local optima that are not global optimizers, for example , which is an interior local optimum but not a global optimum.
3 Main Result
The optimization problem with which we are going to deal is in its most general form defined as
(5a)  
(5b)  
(5c) 
where , , , , , , , , (assuming and ). We optimize over the variables and ; and denote the th column and th row of , respectively.
Applying coordinatewise minimization with the relative-interior rule to problem (5) corresponds to cyclic updates of the variables, where each update amounts to finding the set of minimizers of a convex piecewise-affine function of one variable on an interval. If the set of minimizers is a singleton, the update is straightforward. If the set of minimizers is a bounded interval , the variable is assigned the middle value of this interval, i.e. . If the set of minimizers is unbounded, i.e. , then we set the variable to the value , where is a fixed constant. In the case of , the variable is updated to . Details of the updates in this setting are given in Appendix 0.A.
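The case analysis above can be sketched as a small helper. The constant `K` (standing in for the fixed constant mentioned in the text) and the convention of returning 0 on a fully unbounded optimizer set are illustrative assumptions, not part of the paper:

```python
import math

def relative_interior_point(lo, hi, K=1.0):
    """Pick a point from the relative interior of the optimizer
    interval [lo, hi] of a univariate update.

    lo and hi may be -inf/+inf. For a bounded interval we take the
    midpoint; for a half-bounded one, a point at fixed distance K from
    the finite endpoint; for (-inf, inf) we return 0.0 (any point is
    valid there). K > 0 is an arbitrary fixed constant."""
    if lo == hi:
        return lo                      # unique minimizer
    if math.isinf(lo) and math.isinf(hi):
        return 0.0
    if math.isinf(hi):
        return lo + K
    if math.isinf(lo):
        return hi - K
    return (lo + hi) / 2.0
```

Any point strictly inside the interval satisfies the relative interior rule; the midpoint is just a convenient deterministic choice.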
Theorem 3.1
In order to prove this claim, we formulate problem (5) as a linear program by introducing additional variables and and construct its dual. The proof of optimality is then obtained by constructing a dual feasible solution that satisfies complementary slackness.
The primal linear program (with corresponding dual variables and constraints on the same lines) reads
(6a)  
(6b)  
(6c)  
(6d)  
(6e)  
(6f)  
(6g)  
(6h)  
(6i)  
(6j)  
(6k) 
where the dual criterion is
(7) 
and clearly, at an optimum of the primal, we have
(8a)  
(8b) 
The variables were eliminated from the primal formulation (6) to obtain (5) by reasoning similar to Example 2. We also remark that setting (resp. , , ) results in (resp. , , ).
Even though the primal-dual pair (6) might seem overcomplicated, such a general description is in fact necessary: as described in Section 2, equivalent reformulations may not preserve the structure of interior local minima, and we would like to describe as general a class with guaranteed optimality as possible.
Example 3
Theorem 3.2
4 Applications
Here we show that several LP relaxations of combinatorial problems correspond to the form (5) or to the dual (6) and discuss which additional constraints correspond to the assumptions of Theorem 3.1.
4.1 Weighted Partial MaxSAT
In weighted partial MaxSAT, one is given two sets of clauses, soft and hard. Each soft clause is assigned a positive weight. The task is to find values of binary variables such that all the hard clauses are satisfied and the sum of the weights of the satisfied soft clauses is maximized. We organize the soft clauses into a matrix defined as
In addition, we denote by the number of negated variables in clause . These numbers are stacked in a vector . The hard clauses are organized in a matrix and a vector in the same manner.
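As a sketch, one common encoding (an assumption here, consistent with the negation counts described above) stores a coefficient +1 per positive literal and -1 per negated literal, so that a clause j is satisfied by a 0-1 vector x exactly when C[j] . x + n[j] >= 1:

```python
def clause_matrix(clauses, n_vars):
    """Encode clauses as a coefficient matrix C and negation counts n.

    Each clause is a list of nonzero ints in DIMACS style: +i means
    variable i appears positively, -i means it appears negated
    (1-indexed). C[j][i-1] is +1 / -1 / 0, and n[j] counts the negated
    variables in clause j, so that clause j is satisfied by a binary
    vector x iff sum(C[j][k]*x[k] for k) + n[j] >= 1."""
    C = [[0] * n_vars for _ in clauses]
    n = [0] * len(clauses)
    for j, clause in enumerate(clauses):
        for lit in clause:
            i = abs(lit) - 1
            C[j][i] = 1 if lit > 0 else -1
            if lit < 0:
                n[j] += 1
    return C, n
```

For example, the clause (x1 or not x2) gives the row [1, -1] with negation count 1; the satisfaction condition x1 - x2 + 1 >= 1 fails only for x1 = 0, x2 = 1, as expected.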
The LP relaxation of this problem reads
(11a)  
(11b)  
(11c)  
(11d)  
(11e) 
where are the weights of the soft clauses . This is a subclass of the dual (6), where , , , , ( are therefore slack variables for the dual constraint (6h) that correspond to (11b)), (therefore ), (therefore ), ( are slack variables for the dual constraint (6i) that correspond to (11c)), .
Formulation (11) satisfies the conditions of Theorem 3.1 if each of the clauses has length at most 2. In other words, optimality is guaranteed for weighted partial Max2SAT.
Also notice that if we omitted the soft clauses (11b) and instead set , we would obtain an instance of MinOnes SAT, which could be generalized to weighted MinOnes SAT. This relaxation would still satisfy the requirements of Theorem 3.1 if all the present hard clauses have length at most 2.
4.1.1 Results
We tested the method on the 800 smallest instances (in the sense of file size; not all instances could be evaluated due to their size and lengthy evaluation) that appeared in MaxSAT Evaluations [23] in the years 2017 [2] and 2018 [3]. The results on the instances are divided into groups in Table 1 based on the minimal and maximal length of the present clauses. We also tested this approach on 60 instances of weighted Max2SAT from Ke Xu [31]. The highest number of logical variables in an instance was 19034 and the highest overall number of clauses in an instance was 31450. It was important to separate the instances without unit clauses (i.e. clauses of length 1), because in such cases the LP relaxation (11) has a trivial optimal solution with for all .
Coordinatewise minimization was stopped when the criterion did not improve by at least after a whole cycle of updates for all variables. We report the quality of the solution as the median and mean relative difference between the optimal criterion and the criterion reached by coordinatewise minimization before termination.
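The stopping rule can be sketched as a wrapper around whole cycles of updates; `f`, `update`, and the tolerance `eps` are placeholder names, not the paper's implementation:

```python
def minimize_cyclic(f, update, x, eps=1e-7, max_cycles=100000):
    """Run whole cycles of coordinatewise updates and stop when the
    objective improves by less than eps over a full cycle.

    `update(x, i)` returns x with coordinate i exactly minimized
    (other coordinates fixed); `f` is the objective being minimized."""
    prev = f(x)
    for _ in range(max_cycles):
        for i in range(len(x)):
            x = update(x, i)
        cur = f(x)
        if prev - cur < eps:
            break
        prev = cur
    return x
```

Note that this criterion only detects that progress has stalled; as discussed in the paper, the resulting point is an interior local minimum, which is globally optimal only under the assumptions of Theorem 3.1.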
Table 1 reports not only instances of weighted partial Max2SAT but also instances with longer clauses, where optimality is no longer guaranteed. Nevertheless, the relative differences on instances with longer clauses are still not too large, and the obtained values could be usable as bounds in a branch-and-bound scheme.
4.2 Weighted Vertex Cover
Dual (6) also subsumes the LP relaxation of weighted vertex cover (it is only necessary to transform the minimization into maximization of the negated objective in (12)), which reads
(12) 
where is the set of nodes and is the set of edges of an undirected graph. This problem also satisfies the conditions of Theorem 3.1, and therefore the corresponding primal (5) has no non-optimal interior local minima.
4.3 Minimum Cut, Maximum Flow
Recall from [10] the usual formulation of the max-flow problem between nodes and on a directed graph with vertex set , edge set , and positive edge weights for each , which reads
(13a)  
(13b)  
(13c) 
Assume that there is no edge , no ingoing edges to , and no outgoing edges from . Then any feasible value of in (13) is an interior local optimum w.r.t. individual coordinates, by the same reasoning as in Example 1: the flow conservation constraint (13c) limits each individual variable to a single value. We now propose a formulation which has no non-globally-optimal interior local optima.
The dual problem to (13) is the minimum cut problem, which can be formulated as
(14a)  
(14b)  
(14c)  
(14d)  
(14e)  
(14f) 
where if edge is in the cut and if edge is not in the cut. The cut should separate and , so the set of nodes connected to after the cut will be denoted by , and is the set of nodes connected to . Using this notation, . Formulation (14) differs from the usual formulation in that the variables are replaced by ; we therefore maximize the weight of the edges that are not cut instead of minimizing the weight of the cut edges. Hence, if the optimal value of (14) is , then the value of the minimum cut equals .
Formulation (14) is subsumed by the dual (6) by setting , and omitting the matrix. Also notice that each variable occurs in at most one constraint. Problem (14) therefore satisfies the conditions of Theorem 3.1, and the corresponding primal (5) is a formulation of the maximum flow problem in which one can search for the maximum flow by coordinatewise minimization. The corresponding formulation (5) reads
(15a)  
(15b) 
4.3.1 Results
We have tested our formulation for coordinatewise minimization on max-flow instances from computer vision (available at https://vision.cs.uwaterloo.ca/data/maxflow). We report the same statistics as for MaxSAT in Table 2; the instances correspond to stereo problems, multi-view reconstruction instances, and shape fitting problems.
For multiview reconstruction and shape fitting, we were able to run our algorithm only on small instances, which have approximately between and nodes and between and edges. On these instances, the algorithm terminated with the reported precision in 13 to 34 minutes on a laptop.
Instance Group or Instance  Results

Name  #inst.  Mean RD  Median RD
BVZ-tsukuba [7]  16
BVZ-sawtooth [25] [7]  20
BVZ-venus [25] [7]  22
KZ2-tsukuba [17]  16
KZ2-sawtooth [25] [17]  20
KZ2-venus [25] [17]  22
BL06-camel-sml [21]  1
BL06-gargoyle-sml [6]  1
LB07-bunny-sml [22]  1
4.4 MAP Inference with Potts Potentials
Coordinatewise minimization for the dual LP relaxation of MAP inference was intensively studied, see e.g. the review [30]. One of the formulations is
(16a)  
(16b) 
where is the set of labels, is the set of nodes and is the set of unoriented edges and
(17a)  
(17b) 
are equivalent transformations of the potentials. Notice that there are variables, i.e. two for each direction of an edge. In [24], it is mentioned that in case of Potts interactions, which are given as , one can add constraints
(18a)  
(18b) 
to (16) without changing the optimal objective. One can therefore use constraint (18a) to reduce the overall number of variables by defining
(19) 
subject to . The decision of whether or should have the inverted sign depends on the chosen orientation of the originally undirected edges and is arbitrary. Also, given values satisfying (18), it holds for any edge and pair of labels that , which can be seen from the properties of the Potts interactions.
Therefore, one can reformulate (16) into
(20a)  
(20b) 
where the equivalent transformation in variables is given by
(21) 
and we optimize over variables ; the graph is the same as graph except that each edge becomes oriented (in an arbitrary direction). The way of obtaining an optimal solution to (16) from an optimal solution of (20) is given by (19) and depends on the chosen orientation of the edges in . Also observe that for any node and label , and therefore the optimal values will be equal. This reformulation therefore maps global optima of (20) to global optima of (16). However, it does not map interior local minima of (20) to interior local minima of (16) when ; an example of such a case is shown in Appendix 0.D.
In problems with two labels (), problem (20) is subsumed by (5) and satisfies the conditions imposed by Theorem 3.1 because one can rewrite the criterion by observing that
(22) 
and each is present only in and . Thus, will have nonzero coefficient in the matrix only on columns and . The coefficients of the variables in the criterion are only and the other conditions are straightforward.
4.5 Binarized Monotone Linear Programs
In [14], integer linear programs with at most two variables per constraint were discussed. Three variables were allowed in a constraint if one of them occurred only in this constraint and in the objective function. Although the objective function in [14] was allowed to be more general, we restrict ourselves to a linear objective function. It was also shown there that such problems can be transformed into binarized monotone constraints over binary variables by introducing additional variables, whose number is determined by the bounds of the original variables. Such an optimization problem reads
(23a)  
(23b)  
(23c)  
(23d)  
(23e) 
where contain exactly one per row and exactly one per row and all other entries are zero,
is the identity matrix. We refer the reader to [14] for details, where it is also explained that the LP relaxation of (23) can be solved by min-cut on an associated graph. Notice that the LP relaxation of (23) is subsumed by the dual (6), because one can change the minimization into maximization by changing the signs in . The relaxation also satisfies the conditions given by Theorem 3.1. The paper [14] lists many problems which are transformable to (23) and are also directly (without any complicated transformation) subsumed by the dual (6) and satisfy Theorem 3.1, for example: minimizing the sum of weighted completion times of precedence-constrained jobs (ISLO formulation in [9]), generalized independent set (forest harvesting problem in [12]), generalized vertex cover [13], the clique problem [13], and MinSAT (introduced in [16], LP formulation in [14]).
For each of these problems, it is easy to verify the conditions of Theorem 3.1: they contain at most two variables per constraint; if a constraint contains a third variable, then this is the only occurrence of that variable; and the coefficients of the variables in the constraints are from the set .
5 Concluding Remarks
We have presented a new class of linear programs that are exactly solved by coordinatewise minimization. We have shown that dual LP relaxations of several well-known combinatorial optimization problems (partial Max2SAT, vertex cover, minimum cut, MAP inference with Potts potentials and two labels, and other problems) belong, possibly after a reformulation, to this class. We have shown experimentally (in this paper and in [1]) that the resulting methods are reasonably efficient for large-scale instances of these problems. When the assumptions of Theorem 3.1 are relaxed (e.g., general MaxSAT instead of Max2SAT, or the Potts problem with any number of labels), the method experimentally still provides good local (though not global in general) minima.
We must admit, though, that the practical impact of Theorem 3.1 is limited because the presented dual LP relaxations satisfying its assumptions can also be efficiently solved by other approaches. Thus, max-flow/min-cut can be solved (besides well-known combinatorial algorithms such as Ford-Fulkerson) by message-passing methods such as TRWS. Similarly, the Potts problem with two labels is tractable and can be reduced to max-flow. In general, all considered LP relaxations can be reduced to max-flow, as noted in §4.5. Note, however, that this does not make our result trivial, because (as noted in §2) equivalent reformulations of problems may not preserve interior local minima, and thus message-passing methods are not equivalent in any obvious way to our method.
It is open whether there are practically interesting classes of linear programs that are solved exactly (or at least with constant approximation ratio) by (block)coordinate minimization and are not solvable by known combinatorial algorithms such as maxflow. Another interesting question is which reformulations in general preserve interior local minima and which do not.
Our approach can pave the way to new efficient large-scale optimization methods in the future. Certain features of our results give us hope here. For instance, our approach has an important novel feature over message-passing methods: it applies to a constrained convex problem (the box constraints (5b) and (5c)). This can open the way to a new class of applications. Furthermore, updates along large variable blocks (which we have not explored) can speed up algorithms considerably; e.g., TRWS uses updates along subtrees of a graphical model, while max-sum diffusion uses updates along single variables.
References
 [1] Relative interior rule in block-coordinate minimization. Submitted to CVPR 2020. Cited by: §4.2, §4.4, §5.
 [2] (2017) MaxSAT Evaluation 2017. Cited by: §4.1.1.
 [3] (2018) MaxSAT Evaluation 2018. Cited by: §4.1.1.
 [4] (1999) Nonlinear programming. 2nd edition, Athena Scientific, Belmont, MA. Cited by: §1.
 [5] (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 (1), pp. 1–122. Cited by: §1.
 [6] (2006) From photohulls to photoflux optimization.. In BMVC, Vol. 3, pp. 27. Cited by: Table 2.

 [7] (1998) Markov random fields with efficient approximations. In Proceedings 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No. 98CB36231), pp. 648–655. Cited by: Table 2.
 [8] (2011) A first-order primal-dual algorithm for convex problems with applications to imaging. J. of Math. Imaging and Vision 40 (1), pp. 120–145. Cited by: §1.
 [9] (1999) A half-integral linear programming relaxation for scheduling precedence-constrained jobs on a single machine. Operations Research Letters 25 (5), pp. 199–204. Cited by: §4.5.
 [10] (1962) Flows in networks. Princeton University Press. Cited by: §4.3.
 [11] (2008) Fixing max-product: convergent message passing algorithms for MAP LP-relaxations. In Neural Information Processing Systems, pp. 553–560. Cited by: §1.
 [12] (1997) Forest harvesting and minimum cuts: a new approach to handling spatial constraints. Forest Science 43 (4), pp. 544–554. Cited by: §4.5.
 [13] (2000) Approximating a generalization of MAX 2SAT and MIN 2SAT. Discrete Applied Mathematics 107 (1–3), pp. 41–59. Cited by: §4.5.
 [14] (2002) Solving integer programs over monotone inequalities in three variables: a framework for half integrality and good approximations. European Journal of Operational Research 140 (2), pp. 291–321. Cited by: §4.5, §4.5, §4.5.
 [15] (2015) A comparative study of modern inference techniques for structured discrete energy minimization problems. Intl. J. of Computer Vision 115 (2), pp. 155–184. External Links: ISSN 1573-1405 Cited by: §1.
 [16] (1994) The minimum satisfiability problem. SIAM Journal on Discrete Mathematics 7 (2), pp. 275–283. Cited by: §4.5.
 [17] (2001) Computing visual correspondence with occlusions via graph cuts. Technical report, Cornell University. Cited by: Table 2.
 [18] (2006) Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Analysis and Machine Intelligence 28 (10), pp. 1568–1583. Cited by: §1, §1.
 [19] (2015) A new look at reweighted message passing. IEEE Trans. on Pattern Analysis and Machine Intelligence 37 (5). Cited by: §1.
 [20] (approx. 1975) A diffusion algorithm for decreasing the energy of the max-sum labeling problem. Note: Glushkov Institute of Cybernetics, Kiev, USSR. Unpublished. Cited by: §1.
 [21] (2006) Oriented visibility for multiview reconstruction. In European Conference on Computer Vision, pp. 226–238. Cited by: Table 2.
 [22] (2007) Global optimization for shape fitting. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. Cited by: Table 2.
 [23] (to appear) MaxSAT Evaluation 2018: new developments and detailed results. Journal on Satisfiability, Boolean Modeling and Computation. Note: Instances available at https://maxsatevaluations.github.io/. Cited by: §4.1.1, Table 1.
 [24] (2017) LP relaxation of the Potts labeling problem is as hard as any linear program. IEEE Trans. Pattern Anal. Mach. Intell. 39 (7), pp. 1469–1475. Cited by: §4.4.
 [25] (2002) A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision 47 (1–3), pp. 7–42. Cited by: Table 2.
 [26] (2011) Diffusion algorithms and structural recognition optimization problems. Cybernetics and Systems Analysis 47, pp. 175–192. External Links: ISSN 1060-0396 Cited by: §1, §1.
 [27] (2008) A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Trans. on Pattern Analysis and Machine Intelligence 30 (6), pp. 1068–1080. Cited by: §1.
 [28] (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109 (3), pp. 475–494. Cited by: §1.
 [29] (2019) Relative interior rule in block-coordinate minimization. External Links: arXiv:1910.09488 Cited by: §1, §1, footnote 2, footnote 3.
 [30] (2007) A linear programming approach to max-sum problem: a review. IEEE Trans. Pattern Analysis and Machine Intelligence 29 (7), pp. 1165–1179. Cited by: §1, §1, §4.4.

 [31] (2003) Many hard examples in exact phase transitions with application to generating hard satisfiable instances. arXiv preprint cs/0302001. Note: Instances available at http://sites.nlsde.buaa.edu.cn/~kexu/benchmarks/maxsatbenchmarks.htm. Cited by: §4.1.1, Table 1.
Appendix 0.A Details on Coordinatewise Updates
We now describe coordinatewise minimization for problem (5), satisfying the relative interior rule. Objective function (5a) restricted to a single variable for chosen reads (up to a constant)
(24) 
where
(25) 
This is a convex piecewiseaffine function of . Its breakpoints are and for each . To find its minimum subject to , it is enough to consider the cases listed below.

If function (24) is strictly decreasing and is finite, then is the unique minimum.

If function (24) is strictly increasing and is finite, then is the unique minimum.

If function (24) has a (possibly unbounded) interval , where , as its set of minimizers, then the set of minimizers subject to is the projection of onto , i.e. an interval (we define to be the projection of onto the interval ; the projection onto the unbounded intervals and is defined similarly and is denoted by and for brevity).
In order to perform an update to the relative interior of the set of optimizers, we simply set in the first case and in the second case. In the third case, the update to the relative interior corresponds to setting to some value from . In our implementation, we choose the midpoint of this interval if it is bounded; if it is unbounded in some direction, we choose a value at a fixed distance from its finite bound.
To identify which case occurred, one should analyse the slopes of the function between its breakpoints; the region of optima corresponds to the interval where function (24) is constant. If there is no such interval, then the (unrestricted) minimum is attained at a breakpoint where the function changes from decreasing to increasing, or the function is strictly monotone.
In all other cases, function (24) is unbounded, and therefore the original problem (5) is unbounded as well.
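Putting the cases together, the whole univariate step can be sketched as follows for a finite interval [lo, hi]. The handling of optimizer sets unbounded on one side (where the text picks a point at a fixed distance from the finite bound) is omitted for brevity; this is an illustrative sketch, not the paper's implementation:

```python
import math

def argmin_pwa(f, breakpoints, lo, hi):
    """Minimize a convex piecewise-affine f over the finite interval
    [lo, hi] and return (a, b, m): the interval [a, b] of minimizers
    and its midpoint m (the relative-interior update).

    Since f is affine between consecutive breakpoints, its minimum on
    [lo, hi] is attained at a knot (a breakpoint or an endpoint), and
    by convexity the minimizing knots form a contiguous block."""
    knots = sorted({lo, hi, *(t for t in breakpoints if lo < t < hi)})
    vals = [f(t) for t in knots]
    best = min(vals)
    idx = [i for i, v in enumerate(vals) if math.isclose(v, best)]
    a, b = knots[idx[0]], knots[idx[-1]]
    return a, b, (a + b) / 2.0
```

For instance, f(t) = |t - 1| + |t - 3| on [0, 10] has the optimizer interval [1, 3], so the relative-interior update returns its midpoint 2; if the minimum is attained at a single endpoint, the interval degenerates to that point.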
Objective function (5a) restricted to a single variable reads (up to a constant)
(26) 
where
(27) 
To find the minimum of this function subject to , one can apply the same procedure as with , except that the breakpoints will be only for each .
Appendix 0.B Proof of Theorem 3.2
Observation 0.B.1
Observation 0.B.2
For a given , with , fixed value of and the corresponding breakpoint