I. Introduction
Markov Random Fields (MRFs) are a popular graphical model for reconstruction and recognition problems in computer vision and robotics, including 2D and 3D semantic segmentation, stereo reconstruction, image restoration and denoising, texture synthesis, object detection, and panorama stitching [1, 2, 3]. An MRF can be understood as a factor graph including only unary and binary factors, and where node variables are discrete labels. The discrete nature of the variables makes maximum a posteriori (MAP) inference in MRFs intractable in general, hence several MRF-based applications remain out of reach for real-time robotics. Our motivating application in this paper is real-time semantic segmentation, which is crucial for a robot to understand the surrounding environment and execute high-level tasks. Therefore, we are interested in developing real-time MRF solvers that can support online operation at scale.
The literature on MRFs (reviewed in Section VI) is vast and includes methods based on graph cuts, message passing techniques, greedy methods, and convex relaxations, to mention a few. These approaches are typically approximation techniques, in the sense that they attempt to compute near-optimal MAP estimates efficiently (the problem is NP-hard in general, hence we do not expect to compute exact solutions in polynomial time). Among those, semidefinite programming (SDP) relaxations have been recognized to produce accurate approximations [4]. On the other hand, the computational cost of general-purpose SDP solvers prevented widespread use of this technique beyond problems with a few hundred variables [5] (semantic segmentation typically involves thousands to millions of variables), and SDPs lost popularity in favor of computationally cheaper alternatives, including move-making algorithms (based on graph cut) and message passing. Move-making methods [6] require specific assumptions on the MRF and their performance typically degrades when these assumptions are not satisfied. Message passing methods [7, 8], on the other hand, may not even converge, even though they are observed to work well in practice.
Contribution. Our first contribution, presented in Section III, is to design a dual-ascent-based method to solve standard SDP relaxations that takes advantage of the geometric structure of the problem to speed up computation. In particular, we show that each dual ascent iteration can be solved using a fast low-rank SDP solver known as the Riemannian Staircase [9]. Upon convergence of the dual ascent iterations, this technique attains the same objective as the standard SDP relaxation while being more scalable. This technique, named Dual Ascent Riemannian Staircase (DARS), is able to solve MRF instances with thousands of variables in seconds, while general-purpose SDP solvers (e.g., cvx [10]) are not able to provide an answer in reasonable time (hours) at that scale.
Our second contribution, presented in Section IV, is an even faster SDP relaxation. Despite being remarkably faster than general-purpose SDP solvers, DARS is currently too slow for real-world robotics applications; hence, we develop a Fast Unconstrained SEmidefinite Solver (FUSES) that can solve large problems in milliseconds. The backbone of this second approach is a novel SDP relaxation combined with the Riemannian Staircase method [9]. The novel formulation uses a more intuitive binary matrix (with entries in {0, 1}), contrarily to related work that parametrizes the problem using a vector with entries in {-1, +1}.

Our third contribution is an extensive experimental evaluation. We test the proposed SDP solvers on semantic image segmentation problems and evaluate the corresponding results in terms of accuracy and runtime. We compare the proposed techniques against several related approaches, including move-making methods (expansion [6]) and message passing (Loopy Belief Propagation [8] and Tree-Reweighted Message Passing [7]). The results show that our MRF solvers retain all the advantages of SDP relaxations (accuracy, no need for an initial guess, no assumptions on the objective function), while being fast and scalable. More specifically, our results show that (i) FUSES and DARS produce near-optimal solutions, attaining an objective within 0.2% of the optimum, (ii) FUSES and DARS are remarkably faster than general-purpose SDP solvers, and FUSES is more than two orders of magnitude faster than DARS while attaining similar solution quality, and (iii) FUSES is faster than local search methods while being a global solver.
II. Preliminaries
This section introduces standard notation for MRFs (Section II-A) and provides necessary background on semidefinite relaxations (Section II-B).
II-A. Markov Random Fields: Models and Inference
A Markov Random Field (MRF) is a graphical model in which nodes are associated with discrete labels we want to estimate, and edges (or potentials) represent given probabilistic constraints relating the labels of a subset of nodes. Formally, for each node i in the node set V = {1, ..., N} (where N is the number of nodes), we need to assign a label x_i ∈ L, where L = {1, ..., K} is the set of K possible labels. If K = 2 (i.e., nodes are classified into two classes) the corresponding model is called a binary MRF. Here we consider K > 2 possible labels, a setup generally referred to as a multi-label MRF.

Maximum a posteriori (MAP) inference. The MAP estimate is the most likely assignment of labels, i.e., the assignment of the node labels that attains the maximum of the posterior distribution of an MRF, or, equivalently, the minimum of the negative log-posterior. MAP estimation can be formulated as a discrete optimization problem over the labels x_1, ..., x_N [1]:
(P0)    min_{x_1,...,x_N ∈ L}  Σ_{i ∈ U} E_i(x_i) + Σ_{(i,j) ∈ B} E_ij(x_i, x_j)
where U is the set of unary potentials (probabilistic constraints involving a single node), B is the set of binary potentials (involving a pair of nodes), and E_i(·) and E_ij(·,·) represent the negative log-distribution for each unary and binary potential, respectively (described below). For instance, in semantic segmentation each node in the MRF corresponds to a pixel (or superpixel) in the image, the unary potentials are dictated by pixel-wise classification from a classifier applied to the image, and the binary potentials enforce smoothness of the resulting segmentation [12]. The binary potentials (often referred to as smoothness priors) are typically enforced between nearby (adjacent) pixels.
MRF Potentials. A typical form for the unary and binary potentials is the Potts model:

(1)    E_i(x_i) = { 0 if x_i = x̄_i, σ_i otherwise },    E_ij(x_i, x_j) = { 0 if x_i = x_j, σ_ij otherwise }

where x̄_i is a data-driven noisy measurement of the label of node i (typically from a classifier), and σ_i and σ_ij are given scalars. Typically, it is assumed σ_i > 0, i.e., choosing a label different from the measured one incurs a cost σ_i in (P0). Similarly, for the binary potentials it is typically assumed σ_ij > 0, i.e., label mismatch (x_i ≠ x_j) incurs a cost of σ_ij in the objective (P0). In this case the binary potentials are called attractive, while they are referred to as repulsive when σ_ij < 0 (i.e., the potentials encourage label mismatches) [13].
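To make the model concrete, the energy in (P0) under the Potts potentials of eq. (1) can be evaluated with a few lines of code. This is an illustrative sketch (not part of the paper's C++ implementation), and the function and variable names are ours:

```python
def mrf_energy(labels, measured, sigma_u, edge_cost):
    """Potts-model MRF energy as in (P0): node i pays sigma_u[i] when its
    label differs from the measured label, and edge (i, j) pays
    edge_cost[(i, j)] when the two endpoint labels disagree."""
    e = sum(sigma_u[i] for i, l in enumerate(labels) if l != measured[i])
    e += sum(c for (i, j), c in edge_cost.items() if labels[i] != labels[j])
    return e

# 3-node chain: node 1 disagrees with its measurement,
# edge (0, 1) agrees, edge (1, 2) disagrees
energy = mrf_energy([0, 0, 1], measured=[0, 1, 1],
                    sigma_u=[1.0, 2.0, 1.5],
                    edge_cost={(0, 1): 0.5, (1, 2): 0.5})
# energy = 2.0 (unary, node 1) + 0.5 (edge (1, 2) mismatch) = 2.5
```

MAP inference amounts to finding the label assignment minimizing this energy over all K^N configurations, which is what makes the problem hard.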
II-B. Standard Semidefinite Relaxation
Semidefinite programming (SDP) relaxation has been shown to provide an effective approach to compute a good approximation of the global minimizer of (P0) [5, 15, 16]. In this section we introduce a standard approach to obtain an SDP relaxation, for which we design a fast solver in Section III.
In order to obtain an SDP relaxation, related works rewrite each node variable x_i as a vector x_i ∈ {-1, +1}^K, such that x_i has a single entry equal to +1 (all the others are -1), and if the k-th entry of x_i is +1, then the corresponding node has label k. Moreover, they stack all vectors x_i, i = 1, ..., N, in a single vector x ∈ {-1, +1}^{NK}. Using this reparametrization, the inference problem (P0) can be written in terms of the vector x as follows (full derivation in Appendix A):

(2)    min_x  x^T H x + g^T x    subject to  x ∘ x = 1_{NK},   A x = (2 - K) 1_N

where H and g are a suitable symmetric matrix and a suitable vector collecting the coefficients of the binary terms and the unary terms in (1), respectively; x ∘ x is the diagonal of the matrix x x^T, and A is the matrix whose i-th row is e_i^T ⊗ 1_K^T, where e_i is an N-dimensional vector which is all zero, except the i-th entry which is one, 1_K is a K-dimensional vector of ones, and ⊗ is the Kronecker product. Intuitively, x ∘ x contains the square of each entry of x, hence the first constraint imposes that every entry of x has norm 1, i.e., it belongs to {-1, +1}; the constraint A x = (2 - K) 1_N writes in compact form 1_K^T x_i = 2 - K for each node i, which enforces each node to have a unique label (i.e., a single entry in x_i can be +1, while all the others are -1).
Before relaxing problem (2), it is convenient to homogenize the objective by reparametrizing the problem in terms of an extended vector y ≐ [x^T 1]^T ∈ {-1, +1}^{NK+1}, where an entry equal to 1 is concatenated to x. We can now rewrite (2) in terms of y:

(P1)    min_y  y^T Q y    subject to  y ∘ y = 1_{NK+1},   Ā y = 0

where Q ≐ [H  g/2 ; g^T/2  0] and Ā ≐ [A  -(2 - K) 1_N]. In (P1), we used the equality y^T Q y = x^T H x + g^T x, and noted that since the last entry of y is equal to 1, then Ā y = A x - (2 - K) 1_N.
So far we have only reparametrized problem (P0), hence (P1) is still a MAP estimator. We can now introduce the SDP relaxation: problem (P1) only includes terms in the form y^T M y, hence we can reparametrize it using a matrix Y ≐ y y^T. Moreover, we note that the set of matrices Y that satisfy Y = y y^T is the set of positive semidefinite (Y ⪰ 0) rank-1 matrices (rank(Y) = 1). Rewriting (P1) using Y and dropping the non-convex rank-1 constraint, we obtain:

(S1)    min_Y  tr(Q Y)    subject to  diag(Y) = 1_{NK+1},   ⟨Ā_i, Y⟩ = 0, i = 1, ..., N,   Y ⪰ 0

where Ā_i ≐ ā_i ā_i^T and ā_i^T denotes the i-th row of Ā; note that for Y = y y^T one has ⟨Ā_i, Y⟩ = (ā_i^T y)^2, hence the second set of constraints is equivalent to Ā y = 0.
which is a (convex) semidefinite program and can be solved globally in polynomial time using interiorpoint methods [17]. While the SDP relaxation (S1) is known to provide nearoptimal approximations of the MAP estimate, interiorpoint methods are typically slow in practice and cannot solve problems with more than few hundred nodes in a reasonable time.
III. DARS: Dual Ascent Riemannian Staircase
This section presents the first contribution of this paper: a dual ascent approach to efficiently solve large instances of the standard SDP relaxation (S1).
III-A. Dual Ascent Approach
The main goal of this section is to design a dual ascent method, where the subproblem to be solved at each iteration has a more favorable geometry, and can be solved quickly using the Riemannian Staircase method introduced in Section III-B. Towards this goal, we rewrite (S1) equivalently as:

(3)    min_Y  f(Y)    subject to  ⟨Ā_i, Y⟩ = 0, i = 1, ..., N

where the objective function is now f(Y) ≐ tr(Q Y) + 𝟙(diag(Y) = 1_{NK+1}) + 𝟙(Y ⪰ 0), where 𝟙(·) is the indicator function which is zero when the constraint inside the parenthesis is satisfied and plus infinity otherwise.
Under constraint qualification (e.g., Slater's condition for convex programs [18, Theorem 3.1]), we can obtain an optimal solution to (3) by computing a saddle point of the Lagrangian function L(Y, λ):

(4)    sup_λ inf_Y  L(Y, λ) ≐ f(Y) + Σ_{i=1}^N λ_i ⟨Ā_i, Y⟩

where λ ∈ R^N is the vector of dual variables and Y is the primal variable.
The basic idea behind dual ascent [19, Section 2.1] is to solve the saddle-point problem (4) by alternating maximization steps with respect to the dual variables λ and minimization steps with respect to the primal variable Y.
Dual Maximization. The maximization over the dual variables is carried out via gradient ascent. In particular, at each iteration t = 1, ..., T (T is the maximum number of iterations), the dual ascent method fixes the primal variable Y^(t) and updates the dual variables as:

(5)    λ^(t+1) = λ^(t) + γ ∇_λ L(Y^(t), λ^(t))

where ∇_λ L(Y^(t), λ^(t)) is the gradient of the Lagrangian with respect to the dual variables, evaluated at the latest estimate of the primal-dual variables (Y^(t), λ^(t)), and γ > 0 is a suitable stepsize. It is straightforward to compute the gradient with respect to the i-th dual variable as ∂L/∂λ_i = ⟨Ā_i, Y^(t)⟩. Intuitively, the second summand in (4) penalizes the violation of the constraint ⟨Ā_i, Y⟩ = 0 (for all i = 1, ..., N). Moreover, since the gradient in (5) grows with the amount of violation ⟨Ā_i, Y^(t)⟩, the dual update (5) increases the penalty for constraints with large violation.
Primal Minimization. The minimization step fixes the dual variables to the latest estimate λ^(t+1) and minimizes (4) with respect to the primal variable Y:

(6)    Y^(t+1) = argmin_Y  L(Y, λ^(t+1))

where we substituted "argmin" for "inf" since the objective cannot drift to minus infinity due to the implicit constraints imposed by the indicator functions in f(Y). Recalling the expression of L(Y, λ), defining Q̃^(t+1) ≐ Q + Σ_{i=1}^N λ_i^(t+1) Ā_i, and moving again the indicator functions to the constraints, we write (6) more explicitly as:

(7)    min_Y  tr(Q̃^(t+1) Y)    subject to  diag(Y) = 1_{NK+1},   Y ⪰ 0

where we dropped the constant terms from the objective since they are irrelevant for the optimization. The minimization step in the dual ascent is again an SDP, but contrarily to the standard SDP (S1), problem (7) can be solved quickly using the Riemannian Staircase, as discussed in the following.
where we dropped the constant terms from the objective since they are irrelevant for the optimization. The minimization step in the dual ascent is again an SDP, but contrarily to the standard SDP (S1), problem (7) can be solved quickly using the Riemannian Staircase, as discussed in the following.
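The alternation (5)-(6) is the standard dual ascent scheme. A minimal numeric sketch on a toy equality-constrained problem with a closed-form primal step (unrelated to the SDP; in DARS the primal step is instead the SDP (7)):

```python
import numpy as np

# Toy dual ascent (cf. (5)-(6)): minimize 0.5*||x||^2  subject to  a^T x = b.
# Primal step: argmin_x L(x, lam) has the closed form x = -lam * a.
# Dual step: gradient ascent on lam with stepsize gamma; the gradient
# a^T x - b is exactly the constraint violation at the current primal iterate.
a = np.array([1.0, 1.0])
b = 2.0
lam, gamma = 0.0, 0.5
for _ in range(100):
    x = -lam * a                  # primal minimization
    lam += gamma * (a @ x - b)    # dual maximization
print(np.round(x, 3))             # -> [1. 1.], the constrained optimum
```

As in DARS, iterations stop when the dual gradient (the constraint violation) vanishes, at which point the primal iterate is feasible and optimal.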
III-B. A Riemannian Staircase for the Dual Ascent Iterations
This section provides a fast solver to compute a solution for the SDP (7), which needs to be solved at each iteration of the dual ascent method of Section III-A.
We use the Burer-Monteiro method [20], which replaces the matrix Y in (7) with a rank-r product V V^T, with V ∈ R^{(NK+1)×r}:

(8)    min_V  tr(Q̃ V V^T)    subject to  diag(V V^T) = 1_{NK+1}

Note that the constraint Y ⪰ 0 in (7) becomes redundant after the substitution, since V V^T is always positive semidefinite, hence it is dropped.
Following Boumal et al. [9], we note that the constraint set in (8) describes a smooth manifold, and in particular a product of Stiefel manifolds. To make this apparent, we recall that the (transposed) Stiefel manifold is defined as [9]:

(9)    St(d, r) ≐ { V ∈ R^{d×r} : V V^T = I_d }

Then, we observe that the constraint diag(V V^T) = 1_{NK+1} can be written as v_i v_i^T = 1, i = 1, ..., NK+1 (where v_i is the i-th row of V), which is equivalent to saying that v_i ∈ St(1, r) for i = 1, ..., NK+1. This observation allows concluding that the matrix V belongs to the product manifold St(1, r)^{NK+1}. Therefore, we can rewrite (8) as an unconstrained optimization on manifold:

(R1)    min_{V ∈ St(1,r)^{NK+1}}  tr(Q̃ V V^T)
The formulation (R1) is nonconvex (the product of Stiefel manifolds describes a nonconvex set), but one can find local minima efficiently using iterative methods [9, 21]. While it might seem that little was gained (we started with an intractable problem and we ended up with another nonconvex problem), the following remarkable result from Boumal et al. [9] ties back local solutions of (R1) to globally optimal solutions of the SDP (7).
Proposition 1 (Optimality of Rank-Deficient Critical Points [9])
If V is a rank-deficient second-order critical point of (R1), then Y ≐ V V^T is a globally optimal solution of (7).

The previous proposition ensures that when local solutions (second-order critical points) of (R1) are rank deficient, they can be mapped back to global solutions of (7), hence providing a way to solve (7) efficiently via (R1).
The catch is that one has to choose the rank r large enough to obtain rank-deficient solutions. Related work [9] therefore proposes the Riemannian staircase method, where one solves (R1) for increasing values of r until a rank-deficient solution is found. Boumal et al. [9] also provide theoretical results ensuring that rank-deficient solutions are found for small r (more details in Section V).
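For intuition, a minimal dense sketch of the low-rank problem (8), solved by Riemannian gradient descent on the product of unit spheres St(1, r)^n: project the Euclidean gradient onto the tangent space of each row's sphere, take a step, then retract by renormalizing the rows. The paper instead uses a truncated-Newton trust-region method on sparse matrices; the function name, stepsize, and iteration count below are illustrative choices of ours:

```python
import numpy as np

def low_rank_sdp(Qt, r, iters=500, step=0.01, seed=0):
    """Minimize tr(Qt @ V @ V.T) over matrices V with unit-norm rows
    (the feasible set of (8)) by Riemannian gradient descent."""
    rng = np.random.default_rng(seed)
    n = Qt.shape[0]
    V = rng.standard_normal((n, r))
    V /= np.linalg.norm(V, axis=1, keepdims=True)       # random feasible start
    for _ in range(iters):
        G = 2 * Qt @ V                                  # Euclidean gradient
        G -= np.sum(G * V, axis=1, keepdims=True) * V   # tangent-space projection
        V -= step * G                                   # gradient step
        V /= np.linalg.norm(V, axis=1, keepdims=True)   # retraction onto spheres
    return V
```

On a tiny instance with Q̃ = [[1, -1], [-1, 1]], the objective 2 - 2⟨v_1, v_2⟩ is minimized when the two unit rows coincide, so the sketch drives the cost toward 0.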
III-C. DARS: Summary, Convergence, and Guarantees
We name DARS (Dual Ascent Riemannian Staircase) the approach resulting from the combination of dual ascent and the Riemannian Staircase. DARS starts with an initial guess for the dual variables (we use λ^(0) = 0), and then alternates two steps: (i) the primal minimization, where a solution for (7) is obtained using the Riemannian Staircase (R1) (in practice this is solved using iterative methods, such as the truncated-Newton method); (ii) the dual maximization, where the dual variables are updated using the gradient ascent update (5).
Rounding. Upon convergence, DARS produces a matrix Ŷ. When deriving the standard SDP relaxation (S1) we dropped the rank-1 constraint, hence Ŷ cannot be written in general as y y^T. The process of computing a feasible solution for the original problem (P1) is called rounding. A standard approach for rounding consists in computing a rank-1 approximation of Ŷ (which can be done via singular value decomposition) and rounding the entries of the resulting vector in {-1, +1}. We refer to the resulting vector ŷ as the rounded estimate and we call f̂_R the objective value attained by ŷ in (P1).

Convergence. While dual ascent is a popular optimization technique, few convergence results are available in the literature. For instance, dual ascent is known to converge when the original objective is strictly convex [22]. Currently, we observe that DARS converges when the stepsize γ in (5) is sufficiently small. We prove the following per-instance performance guarantees.
Proposition 2 (Guarantees in Dars)
If the dual ascent iterations converge to a value λ̂ (i.e., the dual iterations reach a solution where the gradient in (5) is zero), then the following properties hold: (i) the corresponding primal estimate Ŷ is an optimal solution of the standard SDP relaxation (S1); (ii) the optimal objective f* of (P1) satisfies f_S1 ≤ f* ≤ f̂_R, where f_S1 is the relaxed objective attained by Ŷ in (S1) and f̂_R is the objective attained by the rounded estimate ŷ in (P1).
The proof of Proposition 2 is given in Appendix B. The first claim in Proposition 2 ensures that, when the dual ascent method converges, it produces an optimal solution for the standard SDP relaxation (S1). The second claim states that we can compute an upper bound on how far DARS' solution is from optimality (f̂_R - f* ≤ f̂_R - f_S1) using the rounded objective f̂_R and the relaxed objective f_S1.
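The rank-1 rounding described above can be sketched as follows (illustrative; the sign of the singular vector is fixed using the homogenization entry, which must equal +1):

```python
import numpy as np

def round_rank1(Y_hat):
    """Round an SDP solution Y_hat (approximately y y^T) to a feasible
    point of (P1): take the leading singular vector, fix its sign so that
    the last (homogenization) component is positive, then round every
    entry to {-1, +1}."""
    U, s, _ = np.linalg.svd(Y_hat)
    y = np.sqrt(s[0]) * U[:, 0]     # best rank-1 factor of Y_hat
    if y[-1] < 0:                   # the lifted entry must be +1
        y = -y
    return np.sign(y)
```

For an exactly rank-1 input Y = y y^T the procedure recovers y; for a higher-rank Ŷ it returns the rounded estimate whose objective f̂_R upper-bounds the optimum per Proposition 2.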
IV. FUSES: Fast Unconstrained SEmidefinite Solver
In this section we propose a more direct way to obtain a semidefinite relaxation and a remarkably faster solver. While DARS is already able to compute an approximate MAP estimate in seconds for large problems, the approach presented in this section requires two orders of magnitude less time to compute a solution of comparable quality. We first present a binary {0, 1} (rather than {-1, +1}) matrix formulation (Section IV-A) and derive an SDP relaxation (Section IV-B). We then present a Riemannian staircase approach to solve the resulting SDP in real time (Section IV-C) and discuss performance guarantees (Section IV-D).
IV-A. Matrix Formulation
In this section we rewrite the node variables as an N × K binary matrix X ∈ {0, 1}^{N×K}, such that if the entry in position (i, k) is equal to 1, then node i has label k, and the entry is zero otherwise. In other words, the i-th row of X is a binary vector that describes the label of node i and has a single entry equal to 1 in position x_i, where x_i is the label assigned to the node. This is a more intuitive parametrization of the problem and indeed leads to a more elegant matrix formulation, given as follows.
Proposition 3 (Binary Matrix Formulation of MAPMRF)
Let the symmetric matrix H ∈ R^{N×N} and the matrix G ∈ R^{N×K} be defined as follows:

(10)    [H]_ij ≐ -σ_ij / 2  for (i, j) ∈ B (and zero otherwise),    [G]_i ≐ -σ_i u_{x̄_i}^T

where [G]_i is the i-th row of G, [H]_ij is the entry of H in row i and column j, σ_i and σ_ij are the coefficients defining the MRF, cf. eq. (1), and u_{x̄_i} is a vector with a unique nonzero entry equal to 1 in position x̄_i (x̄_i is the measured label for node i). Then the MAP estimator (P0) can be equivalently written (up to a constant offset in the objective) as:

(11)    min_X  tr(H X X^T) + tr(G X^T)    subject to  X ∈ {0, 1}^{N×K},   X 1_K = 1_N
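The matrix parametrization is easy to visualize: the rows of X are one-hot indicator vectors, and (X X^T)_ij = 1 exactly when nodes i and j share a label, which is what allows writing the pairwise Potts terms through X X^T. An illustrative sketch (the helper name is ours):

```python
import numpy as np

def one_hot_matrix(labels, K):
    """N x K binary matrix X with [X]_ik = 1 iff node i has label k."""
    X = np.zeros((len(labels), K))
    X[np.arange(len(labels)), labels] = 1
    return X

labels = [0, 2, 2, 1]
X = one_hot_matrix(labels, K=3)
# each row sums to one: every node gets exactly one label (X 1_K = 1_N)
assert np.array_equal(X @ np.ones(3), np.ones(4))
# (X X^T)_ij = 1 iff nodes i and j agree, so pairwise agreement is linear in X X^T
agree = X @ X.T
assert agree[1, 2] == 1 and agree[0, 1] == 0
```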
IV-B. Novel Semidefinite Relaxation
This section presents a semidefinite relaxation of (11). Towards this goal, we first homogenize the cost by lifting the problem to work on a larger variable:

(12)    X̄ ≐ [X ; I_K] ∈ R^{(N+K)×K}

where I_K is the K × K identity matrix. The reparametrization is given as follows.

Proposition 4 (Homogenized Binary Matrix Formulation)
Define Q̄ ≐ [H  G/2 ; G^T/2  0]. Then the MAP estimator (11) can be equivalently written as:

(P2)    min_X̄  tr(Q̄ X̄ X̄^T)    subject to  X ∈ {0, 1}^{N×K},   X 1_K = 1_N,   X̄ = [X ; I_K]

At this point it is straightforward to derive a semidefinite relaxation, by noting that tr(Q̄ X̄ X̄^T) = tr(Q̄ Z) with Z ≐ X̄ X̄^T, and by observing that Z is a symmetric positive semidefinite matrix of rank K.
Proposition 5 (Semidefinite Relaxation)
The following SDP is a convex relaxation of the MAP estimator (P2):

(S2)    min_Z  tr(Q̄ Z)    subject to  diag(Z_1) = 1_N,   Z_2 = I_K,   Z ⪰ 0

where Z_1 and Z_2 are the top-left N × N block and the bottom-right K × K block of the matrix Z, respectively, and we dropped the rank-K constraint for Z.
IV-C. Accelerated Inference via the Riemannian Staircase
We now present a fast specialized solver to solve the SDP (S2) in real time and for large problem instances. Similarly to Section III-B, we use the Burer-Monteiro method [20], which replaces the matrix Z in (S2) with a rank-r product V V^T:

(13)    min_V  tr(Q̄ V V^T)    subject to  diag(V_1 V_1^T) = 1_N,   V_2 V_2^T = I_K

where V ≐ [V_1 ; V_2] ∈ R^{(N+K)×r} (for a suitable rank r), and where the constraint Z ⪰ 0 in (S2) becomes redundant after the substitution, and is dropped.

Similarly to Section III-B, we note that the constraint set in (13) describes a smooth manifold, and in particular a product of Stiefel manifolds. Specifically, we observe that the constraint diag(V_1 V_1^T) = 1_N can be written as v_i v_i^T = 1, i = 1, ..., N (where v_i is the i-th row of V_1), which is equivalent to saying that v_i ∈ St(1, r) for i = 1, ..., N. Moreover, denoting with V_2 the block matrix including the last K rows of V, the constraint V_2 V_2^T = I_K is equivalent to saying that V_2 ∈ St(K, r). The two observations above allow concluding that the matrix V belongs to the product manifold St(1, r)^N × St(K, r). Therefore, we can rewrite (13) as an unconstrained optimization on manifold:

(R2)    min_{V ∈ St(1,r)^N × St(K,r)}  tr(Q̄ V V^T)
The formulation (R2) is nonconvex but one can find local minima efficiently using iterative methods [9, 21]. We can again adapt the result from Boumal et al. [9] to conclude that rankdeficient local solutions of (R2) can be mapped back to global solutions of the semidefinite relaxation (S2).
IV-D. FUSES: Summary, Convergence, and Guarantees
We name FUSES (Fast Unconstrained SEmidefinite Solver) the approach presented in this section. Contrarily to DARS, FUSES is extremely simple and only requires solving the rank-restricted problem (R2), which can be solved using iterative methods, such as the truncated-Newton method. Besides its simplicity, FUSES is guaranteed to converge to the solution of the SDP (S2) for increasing values of the rank r (Proposition 6).
Rounding. Upon convergence, FUSES produces a matrix Ẑ. Similarly to DARS, we obtain a rounded solution by computing a rank-K approximation of the block of Ẑ corresponding to X and rounding the resulting N × K matrix in {0, 1}^{N×K} (i.e., we set the largest element in each row to 1 and we zero out all the others). We denote with X̂ the resulting estimate and we call f̂_R the objective value attained by X̂ in (11).
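The row-wise rounding used by FUSES can be sketched as follows (illustrative; the function name is ours):

```python
import numpy as np

def round_rows(X_hat):
    """Round a relaxed N x K block to a feasible binary label matrix:
    set the largest entry of each row to 1 and zero out the rest, so that
    every row is one-hot (each node receives exactly one label)."""
    X = np.zeros_like(X_hat)
    X[np.arange(X_hat.shape[0]), np.argmax(X_hat, axis=1)] = 1
    return X

# two nodes, three labels: rows are rounded to their dominant entry
X = round_rows(np.array([[0.7, 0.2, 0.1],
                         [0.1, 0.3, 0.6]]))
# X = [[1, 0, 0], [0, 0, 1]]
```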
Since the SDP (S2) is a relaxation of the MAP estimator (P2), it is straightforward to prove the following proposition.
Proposition 7 (Guarantees in FUSES)
Let f* denote the optimal objective of the MAP estimator (P2), f_S2 the optimal objective of the relaxation (S2), and f̂_R the objective attained by the rounded estimate X̂ in (11); then f_S2 ≤ f* ≤ f̂_R.

Again, we can use Proposition 7 to compute how far the solution computed by FUSES is from the optimal objective attained by the MAP estimator.
V. Experiments
This section evaluates the proposed MRF solvers on semantic segmentation problems, comparing their performance against the state of the art.
V-A. FUSES and DARS: Implementation Details
We implemented FUSES and DARS in C++ using Eigen's sparse matrix manipulation and leveraging the optimization suite developed in [21]. Sparse matrix manipulation is crucial for speed and memory reasons, since the involved matrices are very large: for instance, in DARS the matrix V in (R1) has size (NK+1) × r, where in our tests N is around 1000 (superpixels) and K = 20 (classes). We initialize the rank r of the Riemannian Staircase, for both DARS and FUSES, to the smallest rank for which we expect a rank-deficient solution. The Riemannian optimization problems (R1) and (R2) are solved iteratively using the truncated-Newton trust-region method. We refer the reader to [23] for a description of the implementation of a truncated-Newton trust-region method. As in [23], we use the Lanczos algorithm to check that (R1) and (R2) converged to rank-deficient second-order critical points, which are optimal according to Proposition 1 and Proposition 6, respectively. If the optimality condition is not met, the algorithm proceeds to the next step of the Riemannian staircase, repeating the optimization with the rank increased by 1. In all experiments, FUSES finds an optimal solution in the first iteration of the staircase, while we observed that the rank in DARS sometimes needs to be increased. In DARS, we limit the number of dual ascent iterations to a maximum number T, and we terminate the iterations earlier when the gradient in (5) has a sufficiently small norm. Using a constant stepsize γ ensured convergence in all tests.
V-B. Setup, Compared Techniques, and Performance Metrics
Setup. We evaluate FUSES and DARS using the Cityscapes dataset [24], which contains a large collection of images of urban scenes with pixel-wise semantic annotations. The annotations include 30 semantic classes (e.g., road, sidewalk, person, car, building, vegetation). We first extract superpixels from the images using OpenCV (we obtain around 1000 superpixels per image, unless specified otherwise). Then, the unary terms are obtained using Bonnet [25], which uses a CNN to obtain pixel-wise segmentation (Bonnet only uses 20 classes for classification purposes); the unary potential for each superpixel is set based on the majority of the labels for the corresponding set of pixels. Bonnet returns noisy labels for each (super)pixel and the role of the MRF is to refine the segmentation by encouraging smoothness of nearby labels. In practice, since CNNs are typically inaccurate at the boundary between different objects, we expect the use of superpixels and the MRF to improve the segmentation results. The binary potentials are modeled following [2, Section 7.2] as a contrast-sensitive function of the distance between the average color vectors c̄_i and c̄_j of adjacent superpixels, where c̄_i denotes the sample mean of the pixel colors in superpixel i; the parameters of this function are tuned and kept fixed in all our tests.
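For illustration, a common contrast-sensitive choice for such color-based smoothness weights is a Gaussian kernel on the color distance: smoothing is weaker across strong color edges. The exact functional form and parameter values used in the paper follow [2, Section 7.2]; the names and defaults below (`w0`, `w1`, `sigma`) are placeholders of ours:

```python
import numpy as np

def smoothness_weight(c_i, c_j, w0=1.0, w1=4.0, sigma=10.0):
    """Contrast-sensitive Potts weight between adjacent superpixels:
    a constant term plus a Gaussian kernel on the squared distance
    between the superpixels' average color vectors. Similar colors
    yield a large weight (strong smoothing), dissimilar colors a
    weight close to w0 (weak smoothing)."""
    d2 = np.sum((np.asarray(c_i, float) - np.asarray(c_j, float)) ** 2)
    return w0 + w1 * np.exp(-d2 / (2 * sigma ** 2))
```

With this choice, two superpixels with identical mean colors get weight w0 + w1, while superpixels across a sharp color boundary get a weight approaching w0.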
Compared techniques. We compare the proposed techniques against three state-of-the-art methods: expansion-move [6] (label: exp), Loopy Belief Propagation [8] (label: LBP), and Tree-Reweighted Message Passing [7] (label: TRWS). We use the implementation of these methods available in the recently released OpenGM2 library [26].
Performance metrics. We evaluate the results in terms of suboptimality, accuracy, and CPU time. We measure the suboptimality using three metrics: the percentage of optimal labels, the percentage relaxation gap, and the percentage rounding gap. The optimal labels are those that agree with the optimal solution of (P0). The relaxation gap is the gap between the optimal objective of (P0) and the relaxed objective, attained by (S1) for DARS and by (S2) for FUSES, expressed as a percentage of the optimal objective. The rounding gap is the analogous percentage gap between the rounded objective f̂_R and the optimal objective of (P0). We compute the optimal labels (and the corresponding optimal objective) using a commercial tool for integer programming, CPLEX [27]. The runtime of CPLEX increases exponentially in the problem size, hence we can only use it offline for benchmarking the proposed solvers. We measure the accuracy using the Intersection over Union (IoU) metric [28], and record the CPU time for each compared technique.
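The IoU metric can be computed as follows (a sketch using one common convention, averaging per-class IoU over the classes present in the ground truth):

```python
import numpy as np

def mean_iou(pred, gt, K):
    """Mean Intersection-over-Union between predicted and ground-truth
    label arrays, averaged over the classes that appear in gt."""
    ious = []
    for k in range(K):
        p, g = pred == k, gt == k
        if g.sum() > 0:                      # skip classes absent from gt
            inter = np.logical_and(p, g).sum()
            union = np.logical_or(p, g).sum()
            ious.append(inter / union)
    return float(np.mean(ious))
```

Conventions differ across benchmarks (e.g., whether classes absent from both prediction and ground truth are skipped), so this should be read as one reasonable instantiation rather than the exact evaluation script.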
V-C. Semantic Segmentation Results
Fig. 2 shows a typical execution of the algorithms for a single image in the Cityscapes dataset. Fig. 2(a) shows the convergence of FUSES, reporting the relaxed objective attained by iteratively solving (R2) (FUSES-relaxed), the objective of the corresponding rounded estimate at each iteration (FUSES-rounded), and the optimal cost attained by CPLEX (Exact). The approach converges in a few milliseconds, and the corresponding rounded estimate settles near the optimal objective. Fig. 2(b) shows the convergence of DARS, reporting the relaxed objective attained by (R1) (DARS-relaxed), the objective of the corresponding rounded estimate (DARS-rounded), and the optimal cost from CPLEX (Exact). DARS' relaxed cost does not decrease monotonically. Moreover, its convergence time is around two orders of magnitude slower than FUSES.
Fig. 2(c) shows all the compared techniques, while Fig. 2(d) provides a zoomed-in view restricted to the first 18 ms. We only report the final cost for DARS, whose convergence is much slower than all the other methods. From Fig. 2(c)-(d) we note that exp, LBP, and TRWS perform well in segmentation problems. While not providing any optimality guarantee (LBP and TRWS may not even converge to a local optimum), these techniques return near-optimal solutions in all the tested images. exp and LBP have longer convergence tails but typically obtain a smaller objective value than FUSES and DARS. TRWS also requires more time to terminate but attains a near-optimal objective in a few iterations. FUSES is farther from optimal (see also Tables I-II), but it is the only technique that does not require any initial guess. FUSES attains an objective comparable to the one of DARS, while being much faster.
Table I provides statistics describing the performance of the compared techniques on the Cityscapes Lindau dataset over 59 images (we use approximately 1000 superpixels per image). We show the percentage of optimal labels ("Optimal Labels" column), the relaxation gap ("Relax Gap" column), and the rounding gap ("Round Gap" column). The table shows that FUSES and DARS have comparable suboptimality (typically larger than the other compared techniques). FUSES and DARS produce optimal assignments for most of the nodes in the MRF, and attain a rounded cost within 0.2% of the optimum. The IoU ("Accuracy" column) shows that all the techniques have comparable accuracy. All the compared techniques produce more accurate results than the CNN-based segmentation produced by Bonnet on this dataset. Note that the accuracy depends on the parameters of the MRF (σ_i and σ_ij) besides depending on the solver. FUSES is the fastest MRF solver and can compute a solution in milliseconds, while not relying on any initial guess. Table II shows that even with 2000 superpixels, the advantages of FUSES remain. Fig. 1 shows qualitative segmentation results obtained using the proposed techniques. We also attempted to use a general-purpose SDP solver, cvx [10], for our evaluation: with only 200 superpixels, cvx requires more than 50 minutes to solve the SDP (S1), while for 1000 superpixels it crashes due to excessive memory usage.
Table I. Comparison on the Cityscapes Lindau dataset (approximately 1000 superpixels per image). For each method (FUSES, DARS, exp, LBP, TRWS) we report the percentage of optimal labels, the relaxation gap (%), the rounding gap (%), the accuracy (% IoU), and the runtime (ms).

Table II. Same comparison as Table I with approximately 2000 superpixels per image.
Fig. 3(a) shows the relaxation gap for FUSES and DARS for an increasing number of nodes; we control the number of nodes by controlling how many superpixels each image is divided into. The relaxation gap decreases for an increasing number of nodes, which is a desirable feature since one typically solves large problems (>1000 nodes). The relaxation gap in FUSES is slightly larger: in hindsight, we traded off suboptimality for fast computation. Fig. 3(b) shows the relaxation gap for FUSES and DARS for an increasing number of labels; we artificially reduce the number of labels in Cityscapes for this test. The quality of both relaxations does not degrade significantly for an increasing number of labels.
VI. Related Work
This section reviews inference techniques (Sections VI-A and VI-B) and applications for pairwise MRFs, including work on semantic segmentation. Our presentation is based on [1, 2, 3] but also covers more recent work on MRFs and semantic segmentation.
VI-A. Exact Inference in MRFs
Efficient Algorithms. Inference in MRFs is intractable in general. However, particular instances of the problem are solvable in polynomial time. In particular, the Ising model can be solved exactly in polynomial time via graph cut [29, 30]. Note that graph cut algorithms are exact when the binary potentials are "attractive", i.e., σ_ij > 0 in (1) (the priors encourage nearby nodes to have the same label). MRFs with repulsive potentials (σ_ij < 0) are intractable in general [31]. A more general (necessary and sufficient) condition that ensures optimality of graph cut for binary pairwise MRFs (K = 2 classes, labeled {0, 1}) is the regularity condition:

(14)    E_ij(0, 0) + E_ij(1, 1) ≤ E_ij(0, 1) + E_ij(1, 0)

for any (i, j) ∈ B, see Lemma 3.2 and Theorem 3.1 in [31]. The regularity condition in eq. (14) is a special case of submodularity, and indeed the corresponding potentials are also called submodular [31, 32, 33].
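Condition (14) is straightforward to check given the 2 × 2 cost table of a binary potential (illustrative sketch):

```python
def is_regular(E):
    """Regularity condition (14) for a binary pairwise potential,
    given as a 2x2 cost table E[a][b]: the cost of agreement must
    not exceed the cost of disagreement."""
    return E[0][0] + E[1][1] <= E[0][1] + E[1][0]

assert is_regular([[0, 1], [1, 0]])       # attractive Potts: submodular
assert not is_regular([[1, 0], [0, 1]])   # repulsive Potts: not submodular
```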
For multi-label pairwise MRFs, exact solutions exist for the case when the binary potentials are convex functions of the labels [34, 35, 36] and for the case where the binary potentials are linear and the unary potentials are convex [37]. We remark that these approaches assume a linear ordering of the labels, where the potentials penalize node labels depending on their label distance |x_i - x_j|; this means that choosing x_i = 1 and x_j = 3 incurs a larger penalty than choosing x_i = 1 and x_j = 2; on the other hand, the Potts model in eq. (1) penalizes in the same way any class mismatch x_i ≠ x_j. Assuming a linear ordering is often unrealistic in practice; for instance, in semantic segmentation the classes (e.g., cat, table, car) do not admit a linear order in general. Moreover, convexity is a strong assumption for several MRF applications, such as depth reconstruction, where non-convex costs have the desirable property of being discontinuity-preserving [31], contrarily to convex ones, which tend to smooth out depth discontinuities. Inference in multi-class MRFs based on the Potts model is NP-hard, see [38].
In the special case where the topology of the MRF is a chain (e.g., when the MRF describes a 1D signal or sequence), or more generally a tree, Dynamic Programming provides an optimal MAP estimate in polynomial time, see [33, 39]. Related work [40, 41] also extends dynamic programming to certain families of graphs with cycles and small cliques.
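For a chain with a shared pairwise cost table, the dynamic-programming (Viterbi) recursion can be sketched as follows (illustrative; `chain_map` is our own name):

```python
import numpy as np

def chain_map(unary, pairwise):
    """Exact MAP on a chain MRF by dynamic programming (Viterbi).
    unary: N x K cost table; pairwise: K x K cost table shared by all
    consecutive node pairs. Returns the minimizing label sequence."""
    N, K = unary.shape
    cost = unary[0].copy()                  # best cost ending at node 0
    back = np.zeros((N, K), dtype=int)      # backpointers
    for i in range(1, N):
        # total[a, b]: best cost with node i-1 = a and node i = b
        total = cost[:, None] + pairwise + unary[i][None, :]
        back[i] = np.argmin(total, axis=0)
        cost = np.min(total, axis=0)
    labels = [int(np.argmin(cost))]         # backtrack from the best endpoint
    for i in range(N - 1, 0, -1):
        labels.append(int(back[i][labels[-1]]))
    return labels[::-1]
```

The recursion runs in O(N K^2) time, in contrast to the exponential worst case of general graphs.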
Global Integer Solvers. The energy minimization problem (P0) is a quadratic integer program and can be easily reformulated as a binary optimization problem [42, 43, 44]. Integer programming is NP-hard in general, but one may still resort to state-of-the-art integer solvers (e.g., CPLEX [27]) for moderate-size instances. For quadratic and linear programs, integer solvers based on cutting-plane methods or branch & bound are able to produce solutions for problems with a few hundred variables relatively quickly (i.e., in a few seconds), but become unacceptably slow for larger problems. A Branch-and-Cut approach is proposed in [45]. An evaluation and a broader review of integer programming for MRFs is given in [3].

VI-B. Approximate and Local Inference in MRFs
Iterative Local Solvers and Metaheuristics. Local solvers start at a given initial guess and iteratively try to converge to a local optimum of the cost function. Early work includes the Iterated Conditional Modes (ICM) of Besag [46], which at each iteration greedily changes the label of a node in order to obtain the largest decrease in the cost. ICM is known to be very sensitive to the quality of the initial guess [1]. In order to improve convergence, Geman and Geman [47] use Simulated Annealing to perform inference in MRFs. Simulated Annealing requires exponential time to converge in theory and is notoriously slow in practice [48].

Graph Cuts and Move-Making Algorithms. While graph cut methods are able to compute globally optimal solutions in binary pairwise MRFs with submodular potentials (Section VI-A), they only converge to local minima in non-submodular binary MRFs or in multi-class MRFs. For the binary case, related works [49, 32] develop schemes to approximately solve MRFs with non-submodular potentials. Regarding the multi-class case, popular graph cut methods include the swap-move (αβ-swap) and the expansion-move (α-expansion) algorithms, both proposed in [6]. At each inner iteration, these algorithms solve a binary segmentation problem using graph cut, while the outer loop attempts to reconcile the binary results into a coherent multi-class segmentation. Boykov et al. [6] show that the swap-move algorithm is applicable whenever the smoothness potentials are semi-metric (i.e., $\theta(\alpha,\beta) = \theta(\beta,\alpha) \geq 0$ and $\theta(\alpha,\beta) = 0 \Leftrightarrow \alpha = \beta$), and the expansion-move algorithm is applicable whenever the smoothness potentials are metric (i.e., they are semi-metric and also satisfy the triangle inequality $\theta(\alpha,\beta) \leq \theta(\alpha,\gamma) + \theta(\gamma,\beta)$); note that both the Potts model and the truncated distance are metrics. These conditions are further generalized in [31]. Under these conditions, Boykov et al. [6] show that these graph cut methods produce “strong” local minima, i.e., local minima where no allowed move is able to further reduce the cost.
Moreover, these techniques produce a local solution which is proven to be within a known factor of the global minimum [6]. When these conditions are not satisfied, approximations of the cost function can be used [50, 38]. Komodakis and Tziritas [51] draw connections between move-making algorithms and the dual of linear programming relaxations. Kumar and Koller [52, 53] propose a move-making approach that applies to the semi-metric case and attains the same guarantees as the linear relaxation (see paragraph below) in the metric case. Faster algorithmic variants are proposed by Alahari et al. [54]. Lempitsky et al. [55] provide a low-complexity algorithm (LogCut) that requires an offline learning step. A summary of the MRF formulations that can be solved exactly or within a constant factor of the global minimum via graph cut is given in [31]. When the potentials do not satisfy the conditions for applicability of graph cut methods, approximate versions of these techniques can still be applied [50], but the corresponding performance bounds no longer hold.
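For comparison with the move-making methods above, the greedy coordinate update at the core of ICM can be sketched as follows (a minimal version assuming a single pairwise cost matrix shared by all edges; function name and signature are ours):

```python
import numpy as np

def icm(unary, pairwise, edges, init, max_iters=100):
    """Iterated Conditional Modes: greedily relabel one node at a time.

    unary: (N, L) unary costs; pairwise: (L, L) shared binary costs;
    edges: list of (i, j) pairs; init: (N,) initial labeling.
    """
    labels = init.copy()
    neighbors = {i: [] for i in range(len(unary))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(max_iters):
        changed = False
        for i in range(len(unary)):
            # cost of each candidate label given the current neighbor labels
            local = unary[i] + sum(pairwise[:, labels[j]] for j in neighbors[i])
            best = int(np.argmin(local))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:   # no node changed: a local minimum is reached
            break
    return labels
```

Each update can only decrease the cost, so the method converges, but the fixed point depends heavily on the initial guess, as noted above.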
Message-Passing Techniques. Message passing techniques adjust the MAP estimate at each node in the MRF via local information exchange between neighboring nodes. A popular message passing technique is belief propagation [56], which performs exact inference in graphs without loops, but is also applicable to generic graphs [57, 58] (loopy belief propagation, or LBP in short). LBP is not guaranteed to converge in the presence of cycles, but if convergence is attained LBP returns “strong” local minima [8, 59]. Tree-Reweighted Message Passing [7] (TRW-S) is another popular message-passing algorithm, which is also able to estimate a lower bound on the cost that can be used to assess the quality of the solution. Also in this case the estimate is not guaranteed to converge and may oscillate. Message-passing techniques do not necessarily return integer solutions, hence the resulting estimates need to be rounded, see [3, Section 4.5]. Krähenbühl and Koltun [60] use message passing to perform inference in a mean field approximation of fully-connected Conditional Random Fields (CRFs); CRFs are a special case of MRFs, where the binary terms, rather than being smoothness priors, are data-driven.
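For reference, in the min-sum form used for MAP inference, the message from node $i$ to a neighbor $j$ and the final per-node decision read as follows (a standard formulation, with $\theta_i$, $\theta_{ij}$ denoting the unary and binary potentials and $\mathcal{N}(i)$ the neighbors of $i$):

```latex
m_{i \to j}(x_j) = \min_{x_i} \Big[\, \theta_i(x_i) + \theta_{ij}(x_i, x_j)
    + \sum_{k \in \mathcal{N}(i) \setminus \{j\}} m_{k \to i}(x_i) \Big],
\qquad
\hat{x}_i = \operatorname*{arg\,min}_{x_i} \Big[\, \theta_i(x_i)
    + \sum_{k \in \mathcal{N}(i)} m_{k \to i}(x_i) \Big].
```

On a tree, two sweeps of these updates (leaves to root and back) yield the exact MAP estimate; on loopy graphs the same updates are iterated without convergence guarantees.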
Linear Programming (LP) Relaxations. These techniques relax the optimization to work on continuous labels rather than discrete ones. Early relaxation techniques include the LP relaxation of the local polytope [7], which is typically applicable only to small problem instances [3]. Kleinberg and Tardos [61] provide suboptimality guarantees for LP relaxations with metric potentials. Gupta and Tardos [62] extend these results considering a truncated linear metric. Chekuri et al. [63] and Werner [64] further refine the suboptimality bounds. Komodakis and Tziritas [65] consider the case of semi-metric and non-metric potentials and derive primal-dual methods to efficiently solve the resulting LP relaxations. Sontag and Jaakkola [66] propose a cutting-plane algorithm for optimizing over the marginal polytope. Other specialized solvers to attack larger instances have also been proposed, including block-coordinate ascent [67], subgradient methods based on dual decomposition [68, 69, 70], Alternating Directions Dual Decomposition [71], and others [72, 73, 74]. The performance of these techniques is typically sensitive to the choice of the parameters (e.g., step size) and can only ensure local convergence [3]. For binary pairwise MRFs, the LP relaxation over the local polytope can be solved efficiently by reformulating it as a maximum flow problem, see the roof duality (or QPBO) approach of Rother et al. [75]. LP relaxations typically do not produce an integer solution, therefore the corresponding solutions need to be rounded. Moreover, they are tightly coupled with message-passing algorithms, see [3, Section 4.3]. Kumar et al. [4] provide a comparison between linear, quadratic, and second-order cone programming relaxations, showing that the linear relaxation dominates the others.
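For concreteness, the LP relaxation over the local polytope replaces the discrete labels with pseudo-marginals $\mu_i$ and $\mu_{ij}$ (a standard formulation, using the unary and binary potentials $\theta_i$, $\theta_{ij}$ over nodes $\mathcal{V}$ and edges $\mathcal{E}$):

```latex
\begin{aligned}
\min_{\mu \geq 0} \quad & \sum_{i \in \mathcal{V}} \sum_{x_i} \theta_i(x_i)\, \mu_i(x_i)
  + \sum_{(i,j) \in \mathcal{E}} \sum_{x_i, x_j} \theta_{ij}(x_i, x_j)\, \mu_{ij}(x_i, x_j) \\
\text{s.t.} \quad & \sum_{x_i} \mu_i(x_i) = 1 \;\; \forall i \in \mathcal{V}, \qquad
  \sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i) \;\; \forall (i,j) \in \mathcal{E},\; \forall x_i .
\end{aligned}
```

An integer labeling corresponds to $\mu$ with 0/1 entries; when the LP optimum is fractional, a rounding step recovers a labeling, as discussed above.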
Spectral and Semidefinite Relaxations. These techniques typically rephrase inference over an MRF in terms of a binary quadratic optimization problem [16], which can then be relaxed to a convex program (more details in Section II-B). Shi and Malik [76] propose a spectral relaxation for image segmentation; more recently, spectral segmentation is used by Aksoy et al. [77]. Keuchel et al. [5] introduce SDP relaxations for several computer vision applications and use interior-point methods and randomized hyperplane techniques to obtain integer solutions, leveraging the celebrated result of Goemans and Williamson [78], which bounds the suboptimality of the resulting solutions. SDP relaxations are known to provide better solutions than spectral methods [5, 16]. While early approaches also recognized the accuracy of SDP relaxations with respect to commonly used alternatives (e.g., [4]), the computational cost of general-purpose SDP solvers prevented widespread use of this technique beyond problems with a few hundred variables [5]. Keuchel et al. [15] propose an approach to reduce the dimension of the problem via image preprocessing and superpixel segmentation. Concurrently, Torr [79] proposes the use of SDP relaxations for pixel matching problems. Schellewald and Schnörr [80] suggest a similar SDP relaxation for subgraph matching in the context of object recognition. Heiler et al. [81] propose to add constraints to the SDP relaxation to enforce priors (e.g., constrain the number of pixels in a class, or force a set of pixels to belong to the same class). Olsson et al. [16] develop a spectral subgradient method which is shown to reduce the relaxation gap of spectral relaxations. Huang et al. [82] use the Alternating Direction Method of Multipliers to speed up computation, while Wang et al. [83, 84] develop a specialized dual solver. Frostig et al. [85] resort to non-convex optimization to approximate the SDP solution, while Wang et al. [86] consider fully-connected CRFs and propose fast solvers for the case where the pairwise potentials admit a low-rank decomposition. We remark that the approach to derive the SDP relaxation is common to all papers above and follows the line of Section II-B. Wainwright and Jordan [87] use semidefinite programming to approximately compute the marginal distributions in a graphical model. More generally, semidefinite programming has been a popular way to relax combinatorial integer programming problems [88, 89] and assignment problems [90, 91].

VI-C Applications
Overview. MRFs have been successfully used in several application domains including computer vision, computer graphics, machine learning, and robotics. Popular applications include image denoising, inpainting, and super-resolution [36, 38, 30, 57], image segmentation (reviewed below), stereo reconstruction [92, 36, 38, 93, 35, 94, 95, 96, 97], panorama stitching and digital photomontages [98], image/video/texture synthesis [99], multi-camera scene reconstruction [100], voxel occupancy estimation [101], non-rigid point matching and registration [102, 16], and medical imaging [103, 104]. In stereo reconstruction, the labels are the disparities at each pixel and the binary potentials are functions of the absolute color differences at nearby pixels. Birchfield and Tomasi [48] provide a comparison of graph-cut methods for stereo reconstruction, while Tappen and Freeman [105] compare graph cut and LBP; Kolmogorov and Rother [106] evaluate TRW-S, LBP, and graph cut. Szeliski et al. [1] compare several techniques on stereo reconstruction, photomontage, image segmentation, and image denoising benchmarks. The study concludes that the expansion-move algorithm typically outperforms the swap-move algorithm, while ICM performs poorly in practice. In general, the best approach may depend on the application: for instance, the expansion-move algorithm is the best performer for the photomontage benchmark, while expansion-move and TRW-S perform the best on the depth reconstruction benchmark. A broader evaluation is presented in [3], which also provides a C++ library, OpenGM2 [26], that implements several inference algorithms.

Semantic Segmentation. Semantic segmentation methods assign a semantic label to each “region” in an RGB image (2D segmentation), RGB-D image, or 3D model (3D segmentation). Depending on the approach, labels can be assigned to single pixels/voxels, superpixels, or keypoints [3]. Since semantic segmentation is typically modeled as an MRF, the literature review in Sections VI-A and VI-B already covers several works on segmentation, and indeed segmentation (together with depth reconstruction) is a typical benchmark for inference in MRFs, see [1, 2, 3, 33, 12] and the references therein.
Therefore, the goal of this section is to (i) provide a brief taxonomy of semantic segmentation problems, and (ii) review semantic segmentation techniques that do not directly use MRFs. The corresponding literature is vast, and we refer the reader to the excellent survey of Zhu et al. [12] for a broader review of related work.
Taxonomy. Semantic segmentation is different from clustering, which groups pixels based on similarities without necessarily associating a given semantic label to each group (this is sometimes called non-semantic, unsupervised, or bottom-up segmentation [107, 12]). While semantic segmentation classifies image regions into semantic classes, instance segmentation also attempts to discern multiple objects belonging to the same class. In full analogy with MRFs, segmentation problems can be divided into binary segmentation problems (where only two classes, foreground and background, are segmented) and multi-class segmentation problems, where more than two labels are allowed. We can further divide the literature depending on the type of input data the segmentation operates on, including isolated RGB images (the most common setup in computer vision), stereo images [38], RGB-D images [108, 109], volumetric 3D data (e.g., volumetric X-ray CT images [110], or 3D voxel-based models [111]), or multiple RGB images; the latter setup is typically referred to as co-segmentation [112, 113, 12] (for generic unordered images), or temporal (or video) segmentation [114] (if images are collected over time). Thoma [107] also categorizes segmentation problems into active (where one can influence the data collection mechanism, as happens in robotics), passive (where the input data is given), and interactive (where a human user provides coarse information to the segmentation algorithm).
Other Approaches. Traditional approaches for semantic segmentation work by extracting and classifying features in the input data, and then enforcing consistency of the classification across pixels (e.g., using MRFs or other models). Common features include pixel color, histograms of oriented gradients, SIFT, or textons, to mention a few [107, 115]. Shotton et al. [116, 117] use textons and Random Decision Forests for semantic segmentation. Yang et al. [118] use Support Vector Machines (SVMs), demonstrating competitive performance in the PASCAL segmentation challenge [119]. A latent SVM model is used by Felzenszwalb et al. [120] to detect objects using deformable part models. Winn and Shotton [121] use a CRF-based algorithm, named the Layout Conditional Random Field (LayoutCRF), to detect and segment objects from their parts; the approach is further generalized by Hoiem et al. [122]. Shotton et al. [123] use textons within a CRF model for object segmentation. Kumar et al. [124] use MRFs to detect and segment objects in an image. Bray et al. [125] concurrently segment and estimate the 3D pose of a human body from multiple views. Higher-order MRF formulations are also used for semantic segmentation, see the work by Kohli and coauthors [126, 127, 128] and the review [3]. Approaches for interactive segmentation include intelligent scissors [129], active contour models [130, 131] (based on dynamic programming), and graph cut methods (GrabCut [132]). While most of the work mentioned so far operates on a discrete set of nodes of a graphical model, related work in multi-class segmentation also includes contributions modeling the problem over a continuous domain; examples of such efforts include the variational method of Lellmann et al. [133] and the anisotropic diffusion method of Kim et al. [113]; see the chapter by Cremers et al. [134] for a recent survey. More recently, deep convolutional neural networks have become a popular solution for semantic segmentation, see the recent review of Garcia-Garcia et al. [28].

VII Conclusion
We propose fast optimization techniques to solve two semidefinite relaxations of maximum a posteriori inference in Markov Random Fields (MRFs). The first technique, named DARS (Dual Ascent Riemannian Staircase), provides a scalable solution for the standard SDP relaxation proposed in the literature. The second technique, named FUSES (Fast Unconstrained SEmidefinite Solver), is based on a novel relaxation. We test the proposed approaches on semantic segmentation problems and compare them against state-of-the-art MRF solvers, including move-making and message-passing methods. Our experiments show that (i) FUSES and DARS produce near-optimal solutions, attaining an objective within 0.2% of the optimum, (ii) our approaches are remarkably faster than general-purpose SDP solvers, with FUSES more than two orders of magnitude faster than DARS, and (iii) FUSES is faster than local search methods while being a global solver.
References
 [1] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother, “A Comparative Study of Energy Minimization Methods for Markov Random Fields with SmoothnessBased Priors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 6, pp. 1068–1080, 2008.
 [2] A. Blake, P. Kohli, and C. Rother, Markov Random Fields for Vision and Image Processing. The MIT Press, 2011.
 [3] J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother, “A Comparative Study of Modern Inference Techniques for Structured Discrete Energy Minimization Problems,” Intl. J. of Computer Vision, vol. 115, no. 2, pp. 155–184, 2015.
 [4] P. M. Kumar, V. Kolmogorov, and P. Torr, “An analysis of convex relaxations for MAP estimation,” in Advances in Neural Information Processing Systems (NIPS), 2008, pp. 1041–1048.
 [5] J. Keuchel, C. Schnörr, C. Schellewald, and D. Cremers, “Binary partitioning, perceptual grouping, and restoration with semidefinite programming,” IEEE Trans. Pattern Anal. Machine Intell., vol. 25, pp. 1364–1379, 2003.
 [6] Y. Boykov, O. Veksler, and R. Zabih, “Fast Approximate Energy Minimization via Graph Cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
 [7] M. Wainwright, T. Jaakkola, and A. Willsky, “MAP Estimation Via Agreement on Trees: MessagePassing and Linear Programming,” IEEE Trans. on Information Theory, vol. 51, no. 11, pp. 3697–3717, 2005.
 [8] Y. Weiss and W. T. Freeman, “On the optimality of solutions of the maxproduct beliefpropagation algorithm in arbitrary graphs,” IEEE Trans. on Information Theory, vol. 47, no. 2, pp. 736–744, 2001.
 [9] N. Boumal, V. Voroninski, and A. Bandeira, “The nonconvex Burer–Monteiro approach works on smooth semidefinite programs,” in Advances in Neural Information Processing Systems (NIPS), 2016, pp. 2757–2765.
 [10] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming.” [Online]. Available: http://cvxr.com/cvx
 [11] S. Hu and L. Carlone, “Accelerated inference in Markov Random Fields via smooth Riemannian optimization,” Tech. Rep., 2018, with supplemental material.
 [12] H. Zhu, F. Meng, J. Cai, and S. Lu, “Beyond pixels: A comprehensive survey from bottom-up to semantic image segmentation and co-segmentation,” Journal of Visual Communication and Image Representation, vol. 34, pp. 12–27, 2016.
 [13] A. Gallagher, D. Batra, and D. Parikh, “Inference for order reduction in Markov Random Fields,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1857–1864.
 [14] R. B. Potts and C. Domb, “Some generalized orderdisorder transformations,” Mathematical Proceedings of the Cambridge Philosophical Society, vol. 48, no. 01, p. 106, 1952.
 [15] J. Keuchel, M. Heiler, and C. Schnörr, “Hierarchical Image Segmentation Based on Semidefinite Programming,” DAGM-Symposium, vol. 3175, pp. 120–128, 2004.
 [16] C. Olsson, A. Eriksson, and F. Kahl, “Improved spectral relaxation methods for binary quadratic optimization problems,” Computer Vision and Image Understanding, vol. 112, no. 1, pp. 3–13, 2008.
 [17] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge University Press, 2004.
 [18] L. Vandenberghe and S. Boyd, “Semidefinite programming,” SIAM Rev., vol. 38, no. 1, pp. 49–95, 1996.
 [19] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends, Machine Learning, vol. 3, no. 1, pp. 1–122, 2010.
 [20] S. Burer and R. Monteiro, “A nonlinear programming algorithm for solving semidefinite programs via lowrank factorization,” Mathematical Programming, vol. 95, no. 2, pp. 329–357, 2003.
 [21] D. Rosen, L. Carlone, A. Bandeira, and J. Leonard, “SE-Sync: A certifiably correct algorithm for synchronization over the Special Euclidean group,” in Intl. Workshop on the Algorithmic Foundations of Robotics (WAFR), San Francisco, CA, December 2016. Extended ArXiv preprint: 1611.00128.
 [22] P. Tseng, “Dual ascent methods for problems with strictly convex costs and linear constraints: A unified approach,” SIAM J. Control Optim., vol. 28, no. 1, pp. 214–242, 1990.
 [23] D. Rosen and L. Carlone, “Computational enhancements for certifiably correct SLAM,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2017, workshop on “Introspective Methods for Reliable Autonomy”.
 [24] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The Cityscapes dataset for semantic urban scene understanding,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016.
 [25] A. Milioto and C. Stachniss, “Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs,” ArXiv, 2018.
 [26] B. Andres, T. Beier, and J. Kappes, “OpenGM2,” 2016. [Online]. Available: http://hci.iwr.uni-heidelberg.de/opengm2/
 [27] IBM, “CPLEX: IBM ILOG CPLEX Optimization Studio.” [Online]. Available: https://www.ibm.com/products/ilog-cplex-optimization-studio
 [28] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J. García-Rodríguez, “A review on deep learning techniques applied to semantic segmentation,” ArXiv Preprint: 1704.06857, 2017.
 [29] P. L. Ivănescu, “Some Network Flow Problems Solved with PseudoBoolean Programming,” Operations Research, vol. 13, no. 3, pp. 388–399, 1965.
 [30] D. Greig, B. Porteous, and A. Seheult, “Exact Maximum A Posteriori Estimation for Binary Images,” J. Royal Statistical Soc., vol. 51, no. 2, pp. 271–279, 1989.
 [31] V. Kolmogorov and R. Zabih, “What energy functions can be minimized via graph cuts?” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147–159, 2004.
 [32] S. Jegelka and J. Bilmes, “Submodularity beyond submodular energies: Coupling edges in graph cuts,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). IEEE, 2011, pp. 1897–1904.
 [33] P. Felzenszwalb and R. Zabih, “Dynamic programming and graph algorithms in computer vision,” IEEE Trans. Pattern Anal. Machine Intell., vol. 33, no. 4, pp. 721–740, 2011.
 [34] H. Ishikawa, “Exact optimization for markov random fields with convex priors,” IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 10, pp. 1333–1336, 2003.
 [35] H. Ishikawa and D. Geiger, “Occlusions, discontinuities, and epipolar lines in stereo,” in European Conf. on Computer Vision (ECCV), H. Burkhardt and B. Neumann, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998, pp. 232–248.
 [36] Y. Boykov, O. Veksler, and R. Zabih, “Markov Random Fields with Efficient Approximations,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1998.
 [37] D. S. Hochbaum, “An efficient algorithm for image segmentation, Markov random fields and related problems,” Journal of the ACM, vol. 48, no. 4, pp. 686–701, 2001.
 [38] Y. Boykov, O. Veksler, and R. Zabih, “Fast Approximate Energy Minimization via Graph Cuts,” IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 11, pp. 1222–1239, 2001.
 [39] O. Veksler, “Stereo correspondence by dynamic programming on a tree,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2005, pp. 384–390.
 [40] Y. Amit and A. Kong, “Graphical templates for model registration,” IEEE Trans. Pattern Anal. Machine Intell., vol. 18, no. 3, pp. 225–236, 1996.
 [41] P. Felzenszwalb, “Representation and detection of deformable shapes,” IEEE Trans. Pattern Anal. Machine Intell., vol. 27, no. 2, pp. 208–220, 2005.
 [42] E. Boros and P. L. Hammer, “Pseudoboolean optimization,” Discrete Applied Mathematics, vol. 123, no. 1, pp. 155–225, 2002. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0166218X01003419
 [43] A. Schrijver, Theory of Linear and Integer Programming. New York, NY, USA: John Wiley & Sons, Inc., 1986.
 [44] E. Boros, P. L. Hammer, and G. Tavares, “Preprocessing of unconstrained quadratic binary optimization,” Tech. Rep., 2006.
 [45] P. Wang, C. Shen, A. van den Hengel, and P. H. S. Torr, “Efficient Semidefinite BranchandCut for MAPMRF Inference,” Intl. J. of Computer Vision, vol. 117, no. 3, pp. 269–289, 2015.
 [46] J. Besag, “On the statistical analysis of dirty pictures,” J. Royal Statistical Soc., vol. 48, no. 3, pp. 259–302, 1986.
 [47] S. Geman and D. Geman, “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images,” IEEE Trans. Pattern Anal. Machine Intell., vol. 6, no. 6, pp. 721–741, 1984.
 [48] S. Birchfield and C. Tomasi, “A pixel dissimilarity measure that is insensitive to image sampling,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 4, pp. 401–406, 1998.
 [49] V. Kolmogorov and C. Rother, “Minimizing nonsubmodular functions with graph cuts - a review,” IEEE Trans. Pattern Anal. Machine Intell., vol. 29, no. 7, pp. 1274–1279, 2007.
 [50] C. Rother, S. Kumar, V. Kolmogorov, and A. Blake, “Digital Tapestry,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 589–596.
 [51] N. Komodakis and G. Tziritas, “A new framework for approximate labeling via graph cuts,” in Intl. Conf. on Computer Vision (ICCV), vol. 2, 2005, pp. 1018–1025.
 [52] M. Kumar and D. Koller, “MAP Estimation of Semi-metric MRFs via Hierarchical Graph Cuts,” in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI), 2009, pp. 313–320.
 [53] P. Torr and M. Kumar, “Improved moves for truncated convex models,” in Advances in Neural Information Processing Systems (NIPS), 2009, pp. 889–896.
 [54] K. Alahari, P. Kohli, and P. Torr, “Reduce, reuse, and recycle: Efficiently solving multilabel mrfs,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
 [55] V. Lempitsky, C. Rother, and A. Blake, “LogCut  Efficient Graph Cut Optimization for Markov Random Fields,” in Intl. Conf. on Computer Vision (ICCV), 2007, pp. 1–8.
 [56] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
 [57] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient Belief Propagation for Early Vision,” International Journal of Computer Vision, vol. 70, no. 1, pp. 41–54, 2006.
 [58] W. Freeman and E. Pasztor, “Learning lowlevel vision,” Intl. J. of Computer Vision, vol. 40, pp. 25–47, 1999.
 [59] M. Wainwright, T. Jaakkola, and A. Willsky, “Tree consistency and bounds on the performance of the maxproduct algorithm and its generalizations,” Statistics and Computing, vol. 14, no. 2, pp. 143–166, 2004.
 [60] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected crfs with gaussian edge potentials,” in Advances in Neural Information Processing Systems (NIPS), 2011, pp. 109–117.
 [61] J. Kleinberg and E. Tardos, “Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov Random Fields,” in Proc. of the 40th Annual Symposium on Foundations of Computer Science, 1999.
 [62] A. Gupta and É. Tardos, “A constant factor approximation algorithm for a class of classification problems,” in Proceedings of the 32nd Annual ACM Symposium on the Theory of Computing, 2000.
 [63] C. Chekuri, S. Khanna, J. Naor, and L. Zosin, “Approximation algorithms for the metric labeling problem via a new linear programming formulation,” in Proc. of the Annual ACM-SIAM Symposium on Discrete Algorithms, 2001, pp. 109–118.
 [64] T. Werner, “A linear programming approach to maxsum problem: A review,” IEEE Trans. Pattern Anal. Machine Intell., vol. 29, no. 7, pp. 1165–1179, 2007.
 [65] N. Komodakis and G. Tziritas, “Approximate Labeling via Graph Cuts Based on Linear Programming,” IEEE Trans. Pattern Anal. Machine Intell., vol. 29, no. 8, pp. 1436–1453, 2007.
 [66] D. Sontag and T. Jaakkola, “New outer bounds on the marginal polytope,” in Advances in Neural Information Processing Systems (NIPS), 2008, pp. 1393–1400.
 [67] V. Kolmogorov, “Convergent TreeReweighted Message Passing for Energy Minimization,” IEEE Trans. Pattern Anal. Machine Intell., vol. 28, no. 10, pp. 1568–1583, 2006.
 [68] N. Komodakis, N. Paragios, and G. Tziritas, “MRF Energy Minimization and Beyond via Dual Decomposition,” IEEE Trans. Pattern Anal. Machine Intell., vol. 33, no. 3, pp. 531–552, 2011.
 [69] J. Kappes, M. Speth, G. Reinelt, and C. Schnörr, “Towards efficient and exact MAP-inference for large scale discrete computer vision problems via combinatorial optimization,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2013.
 [70] M. Guignard and S. Kim, “Lagrangean decomposition: A model yielding stronger lagrangean bounds,” Mathematical Programming, vol. 39, no. 2, pp. 215–228, 1987.
 [71] A. Martins, M. Figueiredo, P. Aguiar, N. Smith, and E. Xing, “An augmented Lagrangian approach to constrained MAP inference,” in Intl. Conf. on Machine Learning (ICML), 2011, pp. 169–176.
 [72] B. Savchynskyy, S. Schmidt, J. Kappes, and C. Schnörr, “Efficient mrf energy minimization via adaptive diminishing smoothing,” in Proceedings of the TwentyEighth Conference on Uncertainty in Artificial Intelligence (UAI), 2012, pp. 746–755.
 [73] A. Globerson and T. Jaakkola, “Fixing maxproduct: Convergent message passing algorithms for MAP LPrelaxations,” in Advances in Neural Information Processing Systems (NIPS), 2008, pp. 553–560.
 [74] D. Sontag, D. Choe, and Y. Li, “Efficiently searching for frustrated cycles in map inference,” in Proceedings of the TwentyEighth Conference on Uncertainty in Artificial Intelligence (UAI), 2012.
 [75] C. Rother, V. Kolmogorov, V. Lempitsky, and M. Szummer, “Optimizing binary mrfs via extended roof duality,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2007.
 [76] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 5, 2000.
 [77] Y. Aksoy, T. Oh, S. Paris, M. Pollefeys, and W. Matusik, “Semantic soft segmentation,” SIGGRAPH, vol. 37, no. 4, pp. 72:1–72:13, 2018.
 [78] M. Goemans and D. Williamson, “Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming,” J. ACM, vol. 42, no. 6, pp. 1115–1145, 1995.
 [79] P. H. S. Torr, “Solving markov random fields using semi definite programming,” in International Workshop on Artificial Intelligence and Statistics (AISTATS), 2003.
 [80] C. Schellewald and C. Schnörr, “Probabilistic subgraph matching based on convex relaxation,” in Energy Minimization Methods in Computer Vision and Pattern Recognition. Springer Berlin Heidelberg, 2005, pp. 171–186.
 [81] M. Heiler, J. Keuchel, and C. Schnörr, “Semidefinite Clustering for Image Segmentation with Apriori Knowledge,” in Pattern Recognition. Springer, Berlin, Heidelberg, 2005, pp. 309–317.
 [82] Q. Huang, Y. Chen, and L. Guibas, “Scalable semidefinite relaxation for maximum a posterior estimation,” in Intl. Conf. on Machine Learning (ICML), 2014, pp. II–64–II–72.
 [83] P. Wang, C. Shen, and A. van den Hengel, “A fast semidefinite approach to solving binary quadratic problems,” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1312–1319, 2013.
 [84] P. Wang, C. Shen, A. van den Hengel, and P. Torr, “Large-scale Binary Quadratic Optimization Using Semidefinite Relaxation and Applications,” IEEE Trans. Pattern Anal. Machine Intell., vol. 39, no. 3, pp. 1–18, 2016.
 [85] R. Frostig, S. Wang, P. Liang, and C. Manning, “Simple MAP inference via lowrank relaxations,” in NIPS, 2014.
 [86] P. Wang, C. Shen, and A. van den Hengel, “Efficient SDP inference for fully-connected CRFs based on low-rank decomposition,” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 3222–3231, 2015.
 [87] M. J. Wainwright and M. I. Jordan, “Semidefinite relaxations for approximate inference on graphs with cycles,” in Advances in Neural Information Processing Systems (NIPS), 2003, pp. 369–376.
 [88] F. Alizadeh, “Interior Point Methods in Semidefinite Programming with Applications to Combinatorial Optimization,” SIAM Journal on Optimization, vol. 5, no. 1, pp. 13–51, 1995.
 [89] S. Poljak, F. Rendl, and H. Wolkowicz, “A recipe for semidefinite relaxation for (0,1)quadratic programming,” Journal of Global Optimization, vol. 7, no. 1, pp. 51–73, 1995.
 [90] Q. Zhao, S. Karisch, F. Rendl, and H. H. Wolkowicz, “Semidefinite programming relaxations for the quadratic assignment problem,” Journal of Combinatorial Optimization, vol. 2, no. 1, pp. 71–109, 1998.
 [91] K. Anstreicher and N. Brixius, “A new bound for the quadratic assignment problem based on convex quadratic programming,” Mathematical Programming, vol. 89, no. 3, pp. 341–357, 2001.
 [92] S. Birchfield and C. Tomasi, “Multiway cut for stereo and motion with slanted surfaces,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), vol. 1, 1999, pp. 489–495.
 [93] H. Ishikawa and D. Geiger, “Segmentation by grouping junctions,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1998, pp. 125–131.
 [94] V. Kolmogorov and R. Zabih, “Computing visual correspondence with occlusions using graph cuts,” in Intl. Conf. on Computer Vision (ICCV), vol. 2, 2001, pp. 508–515.
 [95] S. Roy, “Stereo without epipolar lines: A maximumflow formulation,” Intl. J. of Computer Vision, vol. 34, no. 2, pp. 147–161, Aug 1999. [Online]. Available: https://doi.org/10.1023/A:1008192004934
 [96] S. Roy and I. J. Cox, “A Maximum-Flow Formulation of the N-camera Stereo Correspondence Problem,” in Intl. Conf. on Computer Vision (ICCV), 1998, pp. 492–499.
 [97] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Intl. J. of Computer Vision, vol. 47, no. 1, pp. 7–42, 2002.
 [98] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen, “Interactive digital photomontage,” ACM Trans. Graph., vol. 23, no. 3, pp. 294–302, 2004.
 [99] V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick, “Graphcut textures,” ACM Transactions on Graphics, vol. 22, no. 3, 2003.
 [100] V. Kolmogorov and R. Zabih, “Multi-camera scene reconstruction via graph cuts,” in European Conf. on Computer Vision (ECCV), 2002.
 [101] D. Snow, P. Viola, and R. Zabih, “Exact voxel occupancy with graph cuts,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2000, pp. 345–352.
 [102] N. Komodakis and N. Paragios, “Beyond loose LP-relaxations: Optimizing MRFs by repairing cycles,” in European Conf. on Computer Vision (ECCV), 2008.
 [103] Y. Boykov and M.-P. Jolly, “Interactive Organ Segmentation Using Graph Cuts,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2000.
 [104] J. Kim, J. Fisher III, A. Tsai, C. Wible, A. Willsky, and W. Wells III, “Incorporating spatial priors into an information theoretic approach for fMRI data analysis,” in Proc. of the Third International Conference on Medical Image Computing and Computer-Assisted Intervention, ser. MICCAI ’00. London, UK: Springer-Verlag, 2000, pp. 62–71. [Online]. Available: http://dl.acm.org/citation.cfm?id=646923.710388
 [105] M. Tappen and W. Freeman, “Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters,” in Intl. Conf. on Computer Vision (ICCV), 2003, pp. 900–907.
 [106] V. Kolmogorov and C. Rother, “Comparison of Energy Minimization Algorithms for Highly Connected Graphs,” in European Conf. on Computer Vision (ECCV), 2006.
 [107] M. Thoma, “A survey of semantic segmentation,” ArXiv Preprint: 1602.06541, 2017.
 [108] Z. Deng, S. Todorovic, and L. J. Latecki, “Semantic segmentation of RGBD images with mutex constraints,” Intl. Conf. on Computer Vision (ICCV), pp. 1733–1741, 2015.
 [109] S. Gupta, P. Arbeláez, R. Girshick, and J. Malik, “Indoor scene understanding with RGB-D images: Bottom-up segmentation, object detection and semantic segmentation,” Intl. J. of Computer Vision, vol. 112, no. 2, pp. 133–149, 2015.
 [110] S. Hu, E. Hoffman, and J. Reinhardt, “Automatic lung segmentation for accurate quantitation of volumetric X-ray CT images,” IEEE Transactions on Medical Imaging, vol. 20, no. 6, pp. 490–498, 2001.
 [111] A. Kundu, Y. Li, F. Dellaert, F. Li, and J. Rehg, “Joint semantic segmentation and 3D reconstruction from monocular video,” in European Conf. on Computer Vision (ECCV), ser. Lecture Notes in Computer Science, vol. 8694, 2014, pp. 703–718.
 [112] C. Rother, T. Minka, A. Blake, and V. Kolmogorov, “Cosegmentation of image pairs by histogram matching – incorporating a global constraint into MRFs,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2006, pp. 993–1000.
 [113] G. Kim, E. Xing, L. FeiFei, and T. Kanade, “Distributed cosegmentation via submodular optimization on anisotropic diffusion,” in Intl. Conf. on Computer Vision (ICCV), 2011, pp. 169–176.
 [114] A. Chen and J. Corso, “Temporally consistent multi-class video-object segmentation with the video graph-shifts algorithm,” in 2011 IEEE Workshop on Applications of Computer Vision (WACV), 2011, pp. 614–621.
 [115] S.-C. Zhu, C. Guo, Y. Wang, and Z. Xu, “What are Textons?” Intl. J. of Computer Vision, vol. 62, no. 1/2, pp. 121–143, 2005.
 [116] J. Shotton, M. Johnson, and R. Cipolla, “Semantic texton forests for image categorization and segmentation,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.

 [117] F. Schroff, A. Criminisi, and A. Zisserman, “Object class segmentation using random forests,” in British Machine Vision Conf. (BMVC), 2008.
 [118] Y. Yang, S. Hallman, D. Ramanan, and C. Fowlkes, “Layered Object Models for Image Segmentation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 34, no. 9, pp. 1731–1743, 2012.
 [119] M. Everingham, L. V. Gool, C. Williams, J. Winn, and A. Zisserman, “The Pascal Visual Object Classes (VOC) Challenge,” Intl. J. of Computer Vision, vol. 88, no. 2, pp. 303–338, 2009.
 [120] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained partbased models,” IEEE Trans. Pattern Anal. Machine Intell., vol. 32, no. 9, pp. 1627–1645, 2010.
 [121] J. Winn and J. Shotton, “The layout consistent random field for recognizing and segmenting partially occluded objects,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2006.
 [122] D. Hoiem, C. Rother, and J. Winn, “3D LayoutCRF for multiview object class recognition and segmentation,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8.
 [123] J. Shotton, J. Winn, C. Rother, and A. Criminisi, “TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation,” in European Conf. on Computer Vision (ECCV), 2006.
 [124] M. Kumar, P. Torr, and A. Zisserman, “OBJ CUT,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 18–25.
 [125] M. Bray, P. Kohli, and P. Torr, “PoseCut: Simultaneous segmentation and 3d pose estimation of humans using dynamic graphcuts,” in European Conf. on Computer Vision (ECCV), 2006, pp. 642–655.
 [126] P. Kohli, L. Ladický, and P. Torr, “Robust higher order potentials for enforcing label consistency,” Intl. J. of Computer Vision, vol. 82, no. 3, pp. 302–324, 2009.
 [127] P. Kohli, M. Kumar, and P. Torr, “P³ & beyond: solving energies with higher order cliques,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2007.
 [128] P. Kohli, L. Ladicky, and P. Torr, “Robust higher order potentials for enforcing label consistency,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
 [129] E. Mortensen and W. Barrett, “Intelligent scissors for image composition,” in SIGGRAPH, 1995, pp. 191–198.
 [130] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active Contour Models,” Intl. J. of Computer Vision, vol. 1, no. 4, pp. 321–331, 1987.
 [131] A. Amini, T. Weymouth, and R. Jain, “Using dynamic programming for solving variational problems in vision,” IEEE Trans. Pattern Anal. Machine Intell., vol. 12, no. 9, pp. 855–867, 1990.
 [132] C. Rother, V. Kolmogorov, and A. Blake, “GrabCut: Interactive foreground extraction using iterated graph cuts,” in SIGGRAPH, 2004.
 [133] J. Lellmann, F. Becker, and C. Schnörr, “Convex optimization for multiclass image labeling with a novel family of total variation based regularizers,” in Intl. Conf. on Computer Vision (ICCV), 2009, pp. 646–653.
 [134] D. Cremers, T. Pock, K. Kolev, and A. Chambolle, “Convex relaxation techniques for segmentation, stereo and multi-view reconstruction,” in Markov Random Fields for Vision and Image Processing. MIT Press, 2011.
Appendix A: Equivalence between Problems (2) and (P0)
Here we prove that solving Problem (2) is equivalent to solving (P0), in the sense that the solution set of either problem is in one-to-one correspondence with the solution set of the other. Towards this goal, we show that (2) can be obtained as a simple reparametrization of (P0).
We first rewrite each node variable $x_i$ in (P0) as a vector $\bar{x}_i \in \{-1,+1\}^L$, such that $\bar{x}_i$ has a single entry equal to $+1$ (all the others are $-1$), and if the $k$-th entry of $\bar{x}_i$ is $+1$, then the corresponding node has label $k$. Each vector $\bar{x}_i \in \{-1,+1\}^L$ is a valid label assignment as long as there is a unique entry equal to $+1$, or, equivalently, $\mathbf{1}^\mathsf{T} \bar{x}_i = 2 - L$. Using this vector parametrization we rewrite the unary and binary potentials (1) as:

$$\bar{E}_i(\bar{x}_i) = -\frac{\bar{w}_i}{2}\, \bar{e}_i^\mathsf{T} \bar{x}_i - \frac{\bar{w}_i}{2}, \qquad \bar{E}_{ij}(\bar{x}_i, \bar{x}_j) = -\frac{\bar{w}_{ij}}{4}\, \bar{x}_i^\mathsf{T} \bar{x}_j + \frac{(L-4)\,\bar{w}_{ij}}{4} \qquad (15)$$

where $\bar{e}_i \in \mathbb{R}^L$ is a vector of all zeros, except the entry in position $\ell_i$ (the measured class label for node $i$), which is equal to $1$. The reparametrization of the unary potentials in (15) can be seen to be the same as (1) by observing that $\bar{e}_i^\mathsf{T} \bar{x}_i = +1$ if $x_i = \ell_i$, or $\bar{e}_i^\mathsf{T} \bar{x}_i = -1$ otherwise; similarly, the reparametrization of the binary potentials follows from the fact that $\bar{x}_i^\mathsf{T} \bar{x}_j = L$ if $x_i = x_j$, or $\bar{x}_i^\mathsf{T} \bar{x}_j = L - 4$ otherwise.
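As a sanity check, the two inner-product identities behind this parametrization can be verified numerically. The snippet below is an illustrative sketch: the label count, the measured label, and the helper names are ours, not from the paper.

```python
import numpy as np

L = 4  # number of labels (illustrative choice)

def onehot_pm1(k, L):
    """±1 indicator vector: +1 in position k, -1 elsewhere."""
    v = -np.ones(L)
    v[k] = 1.0
    return v

# e_bar: all zeros except a 1 at the measured label l_i
l_i = 2
e_bar = np.zeros(L)
e_bar[l_i] = 1.0

# Unary identity: e_bar^T x_i = +1 iff node i takes the measured label, -1 otherwise
assert e_bar @ onehot_pm1(2, L) == 1.0   # x_i = l_i
assert e_bar @ onehot_pm1(0, L) == -1.0  # x_i != l_i

# Binary identity: x_i^T x_j = L if the labels agree, L - 4 otherwise
assert onehot_pm1(1, L) @ onehot_pm1(1, L) == L
assert onehot_pm1(1, L) @ onehot_pm1(3, L) == L - 4

# Validity: a ±1 vector has exactly one +1 iff its entries sum to 2 - L
assert onehot_pm1(1, L).sum() == 2 - L
```

Running the block raises no assertion, confirming the affine relation between the indicator functions in (1) and the inner products used in (15).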
Using (15), we rewrite Problem (P0) as:

$$\min_{\bar{x}_1, \ldots, \bar{x}_N \in \{-1,+1\}^L} \;\; \sum_{i \in \mathcal{V}} -\frac{\bar{w}_i}{2}\, \bar{e}_i^\mathsf{T} \bar{x}_i \;+\; \sum_{(i,j) \in \mathcal{E}} -\frac{\bar{w}_{ij}}{4}\, \bar{x}_i^\mathsf{T} \bar{x}_j \quad \text{subject to} \quad \mathbf{1}^\mathsf{T} \bar{x}_i = 2 - L, \;\; i = 1, \ldots, N \qquad (16)$$

where we dropped the constant terms from (15) (which are irrelevant for the optimization), and where the constraint $\mathbf{1}^\mathsf{T} \bar{x}_i = 2 - L$ enforces each vector $\bar{x}_i \in \{-1,+1\}^L$ to have exactly one entry equal to $+1$ (i.e., we assign a single label to each node).
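To see that dropping the constants is harmless, one can check on a toy graph that the reparametrized objective differs from the original sum of potentials by the same constant for every labeling, so both have the same minimizers. The snippet below assumes, for illustration, Potts-style potentials ($-\bar{w}_i$ when node $i$ takes its measured label, $-\bar{w}_{ij}$ when two neighbors agree, and $0$ otherwise); the graph, weights, and helper names are ours, not from the paper.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N, L = 3, 3
edges = [(0, 1), (1, 2)]
w_node = rng.uniform(0.5, 2.0, N)              # unary weights (assumed Potts-style model)
w_edge = {e: rng.uniform(0.5, 2.0) for e in edges}
meas = [0, 2, 1]                               # measured labels l_i

def onehot(k):
    v = -np.ones(L)
    v[k] = 1.0
    return v

def energy_p0(labels):
    """Original potentials: -w_i if x_i = l_i, -w_ij if x_i = x_j, else 0."""
    u = sum(-w_node[i] if labels[i] == meas[i] else 0.0 for i in range(N))
    b = sum(-w_edge[(i, j)] if labels[i] == labels[j] else 0.0 for (i, j) in edges)
    return u + b

def energy_16(labels):
    """Reparametrized objective with the constant terms dropped."""
    xs = [onehot(k) for k in labels]
    e_bars = [np.eye(L)[meas[i]] for i in range(N)]
    u = sum(-(w_node[i] / 2) * (e_bars[i] @ xs[i]) for i in range(N))
    b = sum(-(w_edge[(i, j)] / 4) * (xs[i] @ xs[j]) for (i, j) in edges)
    return u + b

# The two objectives differ by the same constant over all L^N labelings,
# hence they share the same set of minimizers.
diffs = [energy_16(lab) - energy_p0(lab) for lab in product(range(L), repeat=N)]
assert np.allclose(diffs, diffs[0])
```

The constant offset per labeling is exactly the sum of the dropped terms, which is independent of the label assignment.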
In order to obtain Problem (2), we adopt a more compact notation by stacking all vectors $\bar{x}_i$, with $i = 1, \ldots, N$, in a single vector $\bar{x} \doteq [\bar{x}_1^\mathsf{T} \; \cdots \; \bar{x}_N^\mathsf{T}]^\mathsf{T} \in \{-1,+1\}^{NL}$, and note that the cost function in (16) is quadratic in the entries of $\bar{x}$. Therefore, we rewrite problem (16) as:

$$\min_{\bar{x} \in \{-1,+1\}^{NL}} \;\; \bar{x}^\mathsf{T} H \bar{x} + h^\mathsf{T} \bar{x} \quad \text{subject to} \quad A \bar{x} = (2 - L)\, \mathbf{1}_N \qquad (17)$$

where $H$ is an $NL \times NL$ symmetric block matrix, $h$ is an $NL$-vector, and $A \doteq I_N \otimes \mathbf{1}_L^\mathsf{T}$; equivalently, the $i$-th row of $A$ is $(e_i \otimes \mathbf{1}_L)^\mathsf{T}$, where $e_i$ is an $N$-vector which is all zero, except the $i$-th entry which is one, $\mathbf{1}_L$ is the $L$-vector of all ones, and $\otimes$ is the Kronecker product. The constraint $A \bar{x} = (2 - L)\, \mathbf{1}_N$ simply rewrites the constraints $\mathbf{1}^\mathsf{T} \bar{x}_i = 2 - L$ in (16). The reader can also verify by inspection that the following choice of $H$ and $h$ ensures that the objective in (17) is the same as (16):
$$[h]_i = -\frac{\bar{w}_i}{2}\, \bar{e}_i, \qquad [H]_{ij} = [H]_{ji} = \begin{cases} -\frac{\bar{w}_{ij}}{8}\, I_L & \text{if } (i,j) \in \mathcal{E} \\ 0_{L \times L} & \text{otherwise} \end{cases} \qquad (18)$$

where $h$ stacks $N$ subvectors of size $L$, $[h]_i$ is the $i$-th subvector of $h$, $[H]_{ij}$ is the $L \times L$ block of $H$ in block row $i$ and block column $j$, and $I_L$ is the identity matrix of size $L$.
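The block structure can be checked numerically: after building $H$, $h$, and $A$ for a toy graph, the quadratic form should reproduce the sum-form objective, and the linear constraint should hold for any stacked one-hot $\pm 1$ encoding. This is an illustrative sketch under the same assumed Potts-style weights as before; the specific graph, weights, and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 3, 3
edges = [(0, 1), (1, 2)]
w_node = rng.uniform(0.5, 2.0, N)              # assumed unary weights
w_edge = {e: rng.uniform(0.5, 2.0) for e in edges}
meas = [0, 2, 1]                               # measured labels l_i

# h stacks the subvectors -(w_i/2) e_bar_i; H has off-diagonal blocks -(w_ij/8) I_L
h = np.concatenate([-(w_node[i] / 2) * np.eye(L)[meas[i]] for i in range(N)])
H = np.zeros((N * L, N * L))
for (i, j), w in w_edge.items():
    blk = -(w / 8) * np.eye(L)
    H[i*L:(i+1)*L, j*L:(j+1)*L] = blk
    H[j*L:(j+1)*L, i*L:(i+1)*L] = blk  # keep H symmetric

# A = I_N kron 1_L^T extracts the per-node entry sums
A = np.kron(np.eye(N), np.ones((1, L)))

def onehot(k):
    v = -np.ones(L)
    v[k] = 1.0
    return v

labels = [2, 2, 0]
x = np.concatenate([onehot(k) for k in labels])

# The quadratic form matches the sum-form objective (symmetry doubles
# each edge block, turning the -w/8 blocks into -w/4 edge terms)
obj_sum = sum(-(w_node[i] / 2) * (np.eye(L)[meas[i]] @ onehot(labels[i])) for i in range(N)) \
        + sum(-(w / 4) * (onehot(labels[i]) @ onehot(labels[j])) for (i, j), w in w_edge.items())
assert np.isclose(x @ H @ x + h @ x, obj_sum)

# The linear constraint A x = (2 - L) 1 encodes one label per node
assert np.allclose(A @ x, (2 - L) * np.ones(N))
```

The factor $-\bar{w}_{ij}/8$ in each block (rather than $-\bar{w}_{ij}/4$) is what makes the symmetric quadratic form count every edge exactly once.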
Now we observe that for a scalar $y$, we can equivalently write $y \in \{-1,+1\}$ as $y^2 = 1$. Moreover, we note that the diagonal of the matrix $\bar{x} \bar{x}^\mathsf{T}$ contains the squares of every entry of $\bar{x}$. Combining these two observations, we rewrite problem (17) equivalently as:

$$\min_{\bar{x} \in \mathbb{R}^{NL}} \;\; \bar{x}^\mathsf{T} H \bar{x} + h^\mathsf{T} \bar{x} \quad \text{subject to} \quad \mathrm{diag}(\bar{x} \bar{x}^\mathsf{T}) = \mathbf{1}_{NL}, \quad A \bar{x} = (2 - L)\, \mathbf{1}_N$$

which matches Problem (2), concluding the proof.
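As a small consistency check of the lifted constraints, one can enumerate all sign vectors satisfying the diagonal condition together with the per-node sum constraint, and verify that they are exactly the $L^N$ one-hot encodings, i.e., the feasible set is in one-to-one correspondence with the labelings of (P0). The tiny problem sizes below are ours, chosen only to keep the enumeration cheap.

```python
import numpy as np
from itertools import product

N, L = 2, 3
A = np.kron(np.eye(N), np.ones((1, L)))  # per-node entry sums

# Enumerate all x with diag(x x^T) = 1 (i.e., x in {-1,+1}^{NL}) and keep
# those that also satisfy A x = (2 - L) 1.
feasible = []
for signs in product([-1.0, 1.0], repeat=N * L):
    x = np.array(signs)
    # diag(x x^T) holds the squares of the entries, so it is all ones here
    assert np.allclose(np.diag(np.outer(x, x)), 1.0)
    if np.allclose(A @ x, (2 - L) * np.ones(N)):
        feasible.append(x)

# Exactly one feasible point per labeling: |feasible| = L^N
assert len(feasible) == L ** N
```

Each feasible point has exactly one $+1$ per node block, so decoding the position of the $+1$ recovers the unique corresponding labeling.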