1 Introduction
Computing the maximum a posteriori (MAP) assignment in a graphical model is a central inference task spanning a wide scope of scenarios (Wainwright & Jordan, 2008), ranging from traditional applications in graph matching, stereo reconstruction, object detection, error-correcting codes, gene mapping, etc., to a more recent application in estimating consistent object orientations from noisy pairwise measurements (Crandall et al., 2011). For general graphs, this problem is well-known to be NP-hard (Shimony, 1994). However, due in part to its importance in practice, a large body of algorithms has been proposed to approximate MAP estimates by solving various convex relaxation formulations.
Among those methods based on convex surrogates, semidefinite relaxation usually strictly dominates formulations based on linear programming or quadratic programming in terms of solution quality. Despite its superiority in obtaining more accurate estimates, however, the most significant challenge that limits the applicability of any semidefinite relaxation paradigm on real problems is efficiency. So far, existing general-purpose SDP solvers can only handle problems of small dimensionality.
In this paper, we propose a novel semidefinite relaxation approach (referred to as SDR) for second-order MAP inference in pairwise undirected graphical models. Our key observation is that the marginalization constraints in a typical linear programming relaxation (cf. Kumar et al., 2009) can be subsumed by combining a semidefinite conic constraint with a small set of linear constraints. As a result, SDR admits a concise set of nicely decoupled constraints, which allows us to develop an accelerated variant (referred to as SDPADLR) of the alternating direction method of multipliers (ADMM) that is scalable to very large-scale problems.
On a standard PC, we have successfully applied SDR to dense problems of dimension up to five thousand, and to large grid-structured problems with dozens of states per node.
Practically, SDPADLR performs remarkably well on a variety of problems. We have evaluated SDPADLR on two collections of benchmark datasets: OPENGM2 (Kappes et al., 2013a) and the Probabilistic Inference Challenge (PIC, 2011). Each benchmark consists of multiple categories of problems derived from various MAP estimation tasks. Experimental results demonstrate that SDPADLR outperforms state-of-the-art algorithms in computational speed, while often obtaining better MAP estimates.
1.1 Background
There is a vast literature concerning MAP estimation over discrete undirected graphical models, and it is beyond the scope of this paper to discuss all existing algorithms. Interested readers are referred to (Wainwright & Jordan, 2008) for an in-depth introduction to this topic. In the following, we focus on methods that involve convex relaxation, which are the most relevant to our approach.
Many prior convex relaxation techniques are derived from the original graph structure underlying the MAP estimation problem, among which linear programming relaxation (LPR) methods (Chekuri et al., 2004; Wainwright et al., 2005) are the most popular. In addition to LPR, researchers have considered alternative convex relaxations, e.g., quadratic relaxation (QPRL) (Ravikumar et al., 2010) and second-order cone relaxation (SOCPMS) (Kumar et al., 2009). In the seminal work of Kumar et al. (2009), the authors evaluate various convex relaxation approaches and assert that LPR dominates QPRL and SOCPMS. However, as will be shown later, LPR is itself dominated by a standard SDP relaxation (Wainwright & Jordan, 2008), which is one of the main foci of this paper.
A recent line of approaches has aimed at obtaining tighter convex relaxations by incrementally adding higher-order interactions to enforce proper marginalization over groups of variables (Sontag et al., 2012; Komodakis & Paragios, 2008; Batra et al., 2011). Despite the practical success of these approaches, it remains an open problem to analyze their behavior, for example, to decide whether a polynomial number of clusters is sufficient.
There have been several attempts at applying semidefinite relaxation to obtain the MAP assignment (Torr, 2003; Olsson et al., 2007; Wang et al., 2013; Peng et al., 2012). However, most of these methods are primarily designed for binary MAP estimation problems. In recent work, Peng et al. (2012) considered a general MAP estimation problem, where each variable has multiple states. The key difference between the proposed formulation and that of Peng et al. (2012) is that we utilize the semidefinite cone constraint to prune redundant linear marginalization constraints. This leads to a concise set of loosely decoupled constraints, which is important for developing effective optimization paradigms.
1.2 Notation
Before proceeding, we introduce notation that will be used throughout the paper. For any linear operator \mathcal{A}, we let \mathcal{A}^* represent its conjugate (adjoint) operator. Denote by \mathbb{R}_+^{m \times n} the set of m \times n matrices with nonnegative entries, and by \Pi_+(\cdot) the projection operator onto it. For any symmetric matrix X, we use [X]_+ to represent the projection of X onto the positive semidefinite cone. Finally, we denote by \|X\|_F the Frobenius norm of a matrix X.
2 MAP Estimation and SDP Relaxation
We start with state configurations x = (x_1, \ldots, x_n) over n discrete random variables. Without loss of generality, assume that each x_i takes values in a discrete state set \mathcal{X}_i. Consider a pairwise Markov random field (MRF) parameterized by the potentials (or sufficient statistics) \theta_i(\cdot) for all vertices i \in V and \theta_{ij}(\cdot,\cdot) for all edges (i,j) \in E. The energy (or log-likelihood) associated with this MRF is given by
E(x) = \sum_{i \in V} \theta_i(x_i) + \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j).    (1)
The goal of MAP estimation is then to compute the configuration of states that maximizes the energy – the most probable state assignment
2.1 Semidefinite Programming Relaxation (SDR)
MAP estimation over discrete sets is an NP-hard combinatorial problem, and can be cast as an integer quadratic program (IQP). Denote by u = [u_1^\top, \ldots, u_n^\top]^\top a stacked binary indicator vector such that u_i(x_i) = 1 if and only if variable i takes state x_i. Then MAP estimation is equivalent to the following integer program:

maximize   u^\top A u + b^\top u
subject to u_i \in \{0,1\}^{|\mathcal{X}_i|}, \quad \textstyle\sum_{x_i} u_i(x_i) = 1, \quad \forall i,    (2)

where A and b encode the corresponding pairwise and unary potentials.
The hardness of the above IQP arises from two aspects: (i) the entries of u are binary-valued, and (ii) the objective function is a quadratic function of these binary variables. This motivates us to relax the constraints in an appropriate manner, leading to our semidefinite relaxation. In the sequel, we present the proposed relaxation in a step-by-step fashion.
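As a concrete illustration of the indicator-vector encoding, the following hedged Python sketch builds stacked one-hot indicators for a two-variable toy model and recovers the MAP assignment by brute-force enumeration. All names (A, b, sizes) are illustrative, not the paper's notation, and enumeration is feasible only for tiny models, which is precisely why a tractable relaxation is needed.

```python
# Toy IQP encoding of MAP: states become stacked one-hot indicator blocks,
# and the energy becomes the quadratic form u^T A u + b^T u.
import itertools
import numpy as np

sizes = [2, 2]                       # two variables, two states each
b = np.array([0.0, 1.0, 0.5, 0.0])   # stacked unary potentials
A = np.zeros((4, 4))
A[0:2, 2:4] = np.array([[0.0, 0.0], [0.0, 2.0]])  # pairwise potential, edge (0, 1)

def to_indicator(x, sizes):
    """Stack one-hot indicator vectors for the assignment x."""
    blocks = []
    for xi, k in zip(x, sizes):
        e = np.zeros(k)
        e[xi] = 1.0
        blocks.append(e)
    return np.concatenate(blocks)

def energy(x):
    u = to_indicator(x, sizes)
    return u @ A @ u + b @ u

# Exhaustive search over all joint states -- exponential in general.
x_star = max(itertools.product(*(range(k) for k in sizes)), key=energy)
print(x_star, energy(x_star))  # (1, 1) with energy 1.0 + 0.0 + 2.0 = 3.0
```

The exponential blow-up of this enumeration over larger models motivates replacing the binary constraints with convex ones, as developed next.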

In the same spirit as existing convex formulations (e.g., Kumar et al., 2009; Peng et al., 2012), we introduce a binary block matrix X = u u^\top to accommodate the quadratic objective terms, which apparently exhibits the following properties:

X = X^\top, \quad X \succeq 0, \quad \mathrm{rank}(X) = 1, \quad \mathrm{diag}(X) = u.    (3)
The nonconvex constraint X = u u^\top is then relaxed and replaced by X \succeq u u^\top, which by the Schur complement condition is equivalent to the following semidefinite conic constraint:

\begin{bmatrix} 1 & u^\top \\ u & X \end{bmatrix} \succeq 0.    (4)
The binary constraints on u and X are replaced by the weaker linear constraints 0 \le u \le 1 and 0 \le X \le 1. Note that the upper-bound constraints u \le 1 and X \le 1 are essentially subsumed by the constraints (2), (3), and (4) taken together. For the sake of numerical efficiency, we further relax the nonnegativity constraint so that it is enforced only on the off-diagonal blocks of X:

X_{ij} \ge 0 \quad \text{(elementwise)}, \quad \forall i \ne j.    (5)

As we will see later, this relaxation is crucial in accelerating SDP solvers for large-scale problems.
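The relaxation above can be sanity-checked numerically on a toy instance. The hedged sketch below (assuming NumPy; the names lift, u, X are illustrative) verifies that the lifted matrix of any integral assignment satisfies the relaxed constraints, so SDR is indeed a relaxation of the IQP.

```python
# For an integral assignment, the lifted matrix [[1, u^T], [u, u u^T]]
# is feasible for SDR: it is PSD, nonnegative, and ties diag(X) to u.
import numpy as np

def lift(u):
    """Build the (n+1) x (n+1) lifted matrix from indicator vector u."""
    n = len(u)
    M = np.empty((n + 1, n + 1))
    M[0, 0] = 1.0
    M[0, 1:] = u
    M[1:, 0] = u
    M[1:, 1:] = np.outer(u, u)
    return M

u = np.array([0.0, 1.0, 0.0, 1.0])  # assignment (1, 1) of the toy model
M = lift(u)
X = M[1:, 1:]
eigs = np.linalg.eigvalsh(M)
print(eigs.min() >= -1e-10)          # semidefinite conic constraint, cf. (4)
print(np.allclose(np.diag(X), u))    # diagonal ties X to u, cf. (3)
print((X >= 0).all())                # nonnegativity, cf. (5)
```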
Remark 1.
The nonnegativity constraints described in (5) are necessary, since otherwise SDR becomes loose even for submodular functions. A counterexample can be built with 2 variables, each having 2 states: the potentials clearly satisfy the submodular property, yet the optimizer of SDR after dropping the nonnegativity constraint contains negative entries in the off-diagonal block of X.
2.2 Comparison with Prior Relaxation Heuristics
2.2.1 Superiority over LP relaxations.
Careful readers will remark that there might exist other convex constraints on u and X that we could enforce to tighten the proposed semidefinite relaxation. One alternative is the following set of marginalization constraints, which have been widely invoked in LP relaxations for MAP estimation:
\textstyle\sum_{x_j} X_{ij}(x_i, x_j) = u_i(x_i), \quad \forall (i,j) \in E, \; \forall x_i.    (10)
Somewhat unexpectedly, these constraints turn out to be redundant, as asserted in the following theorem.
Theorem 1.
Any feasible solution to SDR (i.e., any pair (u, X) obeying the feasibility constraints of SDR) necessarily satisfies

\textstyle\sum_{x_j} X_{ij}(x_i, x_j) = u_i(x_i), \quad \forall (i,j) \in E, \; \forall x_i.    (11)
Proof.
See the supplemental material. ∎
Intuitively, this property arises from intrinsic features of u and of the diagonal blocks of X that are enforced by the equality constraints; these properties are then propagated to all off-diagonal blocks by the semidefinite constraint.
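Theorem 1 can be illustrated numerically on an integral lifted matrix: the rows and columns of each off-diagonal block of X = u u^T sum to the corresponding entries of u, which is exactly the marginalization property. A small hedged sketch (illustrative names only, assuming NumPy):

```python
# Marginalization holds for free on the lifted matrix of an assignment:
# each off-diagonal block of X = u u^T has row sums equal to the indicator
# entries of one variable and column sums equal to those of the other.
import numpy as np

u = np.array([0.0, 1.0, 0.0, 1.0])   # two variables, two states each
X = np.outer(u, u)
block = X[0:2, 2:4]                   # block coupling variables 0 and 1
print(np.allclose(block.sum(axis=1), u[0:2]))  # True
print(np.allclose(block.sum(axis=0), u[2:4]))  # True
```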
2.2.2 Invariance under variable reparameterization.
Pioneered by the celebrated relaxation proposed for the MAXCUT problem (Goemans & Williamson, 1995), many SDP approaches developed for combinatorial problems employ a \{-1, 1\} indicator to parameterize binary variables (e.g., Torr, 2003; Kumar et al., 2009). If one applies matrix lifting and follows a similar relaxation procedure, the resulting semidefinite relaxation (referred to as SDR2) can be derived as follows
subject to  (12)  
where are defined as
In fact, SDR2 is identical to SDR, as formally stated below.
Theorem 2.
(u, X) is the solution to SDR if and only if its transform under the change of variables is the solution to SDR2.
Proof.
See the supplemental material. ∎
Despite the theoretical equivalence between SDR2 and SDR, from a numerical perspective, solving SDR2 is much harder than solving SDR. The difficulty arises from the complicated form of the linear constraints enforced by SDR2 (i.e., (12)). The one advantage of SDR2 is that all diagonal entries of its lifted matrix are equal to 1. Nevertheless, none of the prior SDP algorithms takes full advantage of this property to accelerate computation.
3 Scalable Optimization Algorithm
The curse of dimensionality poses inevitable numerical challenges when applying general-purpose SDP solvers to SDR. Despite their superior accuracy, primal-dual interior point methods (IPM) such as SDPT3 (Toh et al., 1999) are limited to small-scale problems on a regular PC. More scalable solvers such as CSDP (Helmberg & Rendl, 2000) and DSDP (Benson & Ye, 2008) solve the dual problem instead. However, since the nonnegativity constraints produce numerous dual variables, these solvers are still far too restrictive for our program: none of them can solve SDR on a standard PC when the dimension exceeds 1000.

The limited scalability of interior point methods has inspired a flurry of activity in developing first-order methods, among which the alternating direction method of multipliers (ADMM) (Wen et al., 2010; Boyd et al., 2011) proves well suited for large-scale problems. In this section, we propose an efficient variant of ADMM, referred to as SDPADLR (SDP Alternating Direction method for Low-Rank structure), which is tailored to the special structure of SDR (including low rank and sparsity) and enables us to solve problems with very large dimensionality.
3.1 Alternating Direction Augmented Lagrangian Method (ADMM)
For convenience of presentation, we rewrite SDR in operator form:

minimize   \langle C, X \rangle
subject to \mathcal{A}(X) = a, \quad \mathcal{B}(X) \ge 0, \quad X \succeq 0,    (13)

where X encodes u and all the lifted blocks, \mathcal{A} collects the equality constraints, and \mathcal{B} gathers the elementwise nonnegativity constraints. We let y, z, and S represent the dual variables for the respective constraints. In the sequel, we will start by reviewing SDPAD, i.e., the original alternating direction method introduced in (Wen et al., 2010), and then present the key modification underlying the proposed efficient variant SDPADLR.
3.1.1 SDPAD: Procedures and Convergence
SDPAD considers the following augmented Lagrangian:
where the penalty parameter controls the strength of the quadratic term. As suggested by Boyd et al. (2011), we initialize it with a small value and gradually increase it throughout the optimization process.
Let a superscript indicate a variable's value in the corresponding iteration. Each iteration of SDPAD consists of a dual optimization step, followed by a primal update step, given as follows
(14)
Instead of jointly optimizing all dual variables, the key idea of SDPAD is to decouple the dual optimization step into several subproblems or, more specifically, to optimize each dual variable in order with the others fixed. This leads to a closed-form solution for each subproblem. Similar to (Wen et al., 2010), our stopping criterion involves measuring both the primal feasibility and the dual feasibility.
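The dominant cost in the updates above is the projection onto the positive semidefinite cone, which has the well-known closed form of clipping negative eigenvalues. A minimal hedged sketch (the function name project_psd is illustrative, not the paper's notation):

```python
# PSD projection via eigenvalue clipping: the nearest (in Frobenius norm)
# positive semidefinite matrix to a symmetric S. Each ADMM iteration of
# SDPAD applies such a projection to update the primal matrix variable.
import numpy as np

def project_psd(S):
    """Return the projection of symmetric S onto the PSD cone."""
    w, V = np.linalg.eigh(S)                 # full eigendecomposition
    return (V * np.clip(w, 0.0, None)) @ V.T  # zero out negative eigenvalues

S = np.array([[2.0, 0.0], [0.0, -3.0]])
P = project_psd(S)
print(P)  # negative eigendirection clipped: [[2, 0], [0, 0]]
```

The full eigendecomposition here is exactly the bottleneck that the low-rank variant of Section 3.1.2 is designed to avoid.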
Convergence property. In general, convergence of SDPAD is guaranteed only when equality constraints alone are present (Wen et al., 2010). However, the inequality constraints of SDR are special in the following two aspects:

They are elementwise nonnegativity constraints;

They are essentially decoupled from other linear constraints.
Property (ii) arises as all equality constraints are concerned with diagonal blocks of , while all linear inequality constraints are only enforced on its offdiagonal blocks. Such special structure leads to theoretical convergence guarantees for SDPAD, as stated in the following theorem.
Theorem 3.
The SDPAD method presented above converges to the optimizer of SDR.
Proof.
See the supplemental material. ∎
3.1.2 SDPADLR: Accelerated Method
Apparently, the most computationally expensive step of SDPAD is the update of the primal matrix variable, which involves a full eigendecomposition. This limits the applicability of SDPAD to large-scale problems. To bypass this numerical bottleneck, we modify SDPAD and present an efficient heuristic called SDPADLR, which exploits the low-rank structure of the optimal solution.

First, we observe that the primal update can be alternatively expressed in terms of the eigendecomposition of a single residual matrix. This allows us to present SDPAD without explicitly invoking one of the dual variables. The detailed steps of SDPAD can now be summarized as in Algorithm 1.
(15) 
It is straightforward to see that the bottleneck of Algorithm 1 lies in how to compute and store the primal matrix variable. To derive an efficient solver, we make the assumption that the optimal solution is low-rank. This is motivated by the empirical evidence that, for a variety of problems (see the experimental section for details), SDR is exact, meaning its optimizer has rank one. Moreover, in the general case, the rank of the optimal solution is expected to be much smaller than its dimension (e.g., Burer & Monteiro, 2003), scaling with the square root of the number of constraints.^1

^1 Practically, many of the nonnegativity constraints of SDR are redundant.
Based on this assumption, the key idea of SDPADLR is to maintain a low-rank factor with a small number of columns and encode the primal matrix through this factor throughout the iterative process. This allows us to keep all the variables in memory even for large-scale problems.
In this case, the projection in (15) is replaced by a truncated version that keeps only the top eigenvalues and the respective eigenvectors of the matrix in (16).
Although the matrix being decomposed is dense, its top eigenvectors can be efficiently computed using the Lanczos process (Cullum & Willoughby, 2002), whose cost is dictated by the complexity of applying the matrix-vector multiplication operator. As SDR only involves sparse constraints and a low-rank factor, this operator can be applied without ever forming the dense matrix, keeping the cost per multiplication low.
Theoretically, it is extremely challenging to derive an upper bound on the rank parameter that ensures the exactness of the modified algorithm. To address this issue, we design SDPADLR so that it iteratively doubles the rank parameter and reapplies the modified algorithm until it returns the optimal solution. For most of our experiments, we found that a small value is sufficient.
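The low-rank modification can be sketched as follows: keep only the top-k eigenpairs of the matrix being projected and store the primal variable through a thin factor. In this hedged sketch a dense eigendecomposition stands in for the Lanczos process used at scale, and the names (lowrank_psd_update, Y, k) are illustrative.

```python
# SDPADLR's core idea: approximate the PSD projection by the top-k
# nonnegative eigenpairs and store X implicitly as Y Y^T with Y of width k,
# so memory is O(n k) instead of O(n^2).
import numpy as np

def lowrank_psd_update(S, k):
    """Return the factor Y of the rank-k truncated PSD projection of
    symmetric S, i.e., X = Y @ Y.T."""
    w, V = np.linalg.eigh(S)            # ascending eigenvalues
    w_top, V_top = w[-k:], V[:, -k:]    # top-k eigenpairs
    w_top = np.clip(w_top, 0.0, None)   # discard negative directions
    return V_top * np.sqrt(w_top)       # Y, shape (n, k)

rng = np.random.default_rng(0)
B = rng.standard_normal((50, 2))
S = B @ B.T                             # a rank-2 PSD matrix
Y = lowrank_psd_update(S, 2)
print(np.allclose(Y @ Y.T, S))          # exact when rank(S) <= k
print(Y.shape)                          # (50, 2)
```

When the chosen k is too small the truncation is only approximate, which is why the algorithm doubles the rank parameter and retries until optimality is certified.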
The pseudocode of SDPADLR is summarized in Algorithm 2.
3.2 Iterative Rounding
Similar to other ADMM methods (Boyd et al., 2011), SDPADLR converges rapidly to moderate accuracy within the first 400 iterations, and slows down significantly afterwards. Thus, rather than continuing until SDPADLR converges, it is more efficient to shrink the problem size by fixing those variables whose optimal states are likely to have been revealed. Specifically, after each round of SDPADLR, we fix the optimal state of a variable if the largest entry of its relaxed indicator block exceeds a fixed threshold (the same value for all the examples). We then reapply the iterative procedure on the reduced problem. In practice, we find that, due to the tightness of SDR, the reduced problems are significantly smaller than the original problem, and one iterative rounding pass is usually sufficient.
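The rounding rule described above can be sketched as follows; the threshold value 0.9 and all names are illustrative assumptions, not the paper's exact criterion.

```python
# Iterative rounding sketch: after a round of ADMM, fix each variable whose
# relaxed indicator block is already nearly integral; re-solve the rest.
import numpy as np

def fix_confident_variables(u, sizes, tau=0.9):
    """Return (fixed, ambiguous): a dict of variable -> state for blocks
    with a dominant entry >= tau, and the indices still undecided."""
    fixed, ambiguous = {}, []
    offset = 0
    for i, k in enumerate(sizes):
        block = u[offset:offset + k]
        j = int(np.argmax(block))
        if block[j] >= tau:
            fixed[i] = j
        else:
            ambiguous.append(i)
        offset += k
    return fixed, ambiguous

u = np.array([0.02, 0.98, 0.55, 0.45])  # variable 0 confident, variable 1 not
fixed, ambiguous = fix_confident_variables(u, [2, 2])
print(fixed, ambiguous)  # {0: 1} [1]
```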
4 Experimental Results
Table 1. Running time (cpu), duality gap (gap), and maximum primal/dual infeasibility (inf) of each SDP solver on four representative problems.

Method     | deer_0034.K10.F100 (dense)  | file_30markers (sparse)   | folding_2BE6 (dense)        | gm275 (sparse)
           | cpu       gap      inf      | cpu      gap      inf     | cpu        gap      inf     | cpu        gap      inf
SDPADLR    | 4:33      7.2e-4   1.3e-6   | 7:33     2.2e-4   5.3e-6  | 2:44:36    2.3e-4   5.3e-7  | 21:33      5.1e-4   1.3e-6
SDPAD      | 8:29      8.2e-5   4.3e-7   | 10:33    9.4e-5   1.3e-7  | 25:56:37   2.3e-4   3.7e-6  | 41:33:21   1.2e-4   3.1e-6
SDPNAL     | 10:55     8.1e-5   1.3e-6   | 9:42     6.2e-5   2.1e-6  | 18:33:11   5.2e-5   4.7e-7  | 21:34:35   9.7e-5   4.5e-7
IPMNC      | 1:27      2.3e-3   na       | 2:37     4.1e-7   na      | 10:23      4.5e-2   na      | 21:56      3.5e-6   na
MOSEK      | 21:33:10  2.3e-6   1.3e-9   | na       na       na      | na         na       na      | na         na       na
MULUpdate  | 6:13:56   8.1e-3   2.7e-5   | na       na       na      | na         na       na      | na         na       na
In this section, we evaluate SDPADLR on several benchmark data sets and compare its performance against existing SDP solvers and state-of-the-art MAP inference algorithms.
4.1 Benchmark Datasets
Table 2. Benchmark categories: graph structure, number of variables, number of states per variable, number of problems, and average running time of SDPADLR.

category     | structure | variables | states | probs | avg. time
PICObject    | full      | 60        | 11-21  | 37    | 5m32s
PICFolding   | mixed     | 2K        | 2-503  | 21    | 21m42s
PICAlign     | dense     | 30-400    | 20-93  | 19    | 37m63s
GMLabel      | sparse    | 1K        | 7      | 324   | 6m32s
GMChar       | sparse    | 5K-18K    | 2      | 100   | 1h13m
GMMontage    | grid      | 100K      | 5,7    | 3     | 9h32m
GMMatching   | dense     | 19        | 19     | 4     | 2m21s
ORIENT       | sparse    | 1K        | 16     | 10    | 10m21s
We perform experimental evaluation on MAP estimation problems from three popular benchmark data sets (see Table 2): OPENGM2 (Kappes et al., 2013a), PIC (PIC, 2011), and a new data set ORIENT for the task of estimating consistent camera orientations (Crandall et al., 2011). OPENGM2 comprises 19 categories of mostly sparse MAP problems. We choose four representative categories for evaluation: Geometric Surface Labeling (GMLabel), Chinese Characters (GMChar), MRF Photomontage (GMMontage) and Matching (GMMatching). The first three categories, GMLabel, GMChar and GMMontage, are sparse MAP estimation problems of increasing scale. GMMatching is a special category where our convex relaxation is not tight. PIC comprises 10 categories of MAP inference problems of various structures. As we already include sparse MAP inference problems from OPENGM2, we pick 3 representative dense categories from PIC: Object Detection (PICObject), Image Alignment (PICAlign) and Folding (PICFolding).
4.2 SDP Solver Evaluation
Baseline algorithms. We evaluate the proposed SDPADLR against the following existing large-scale SDP solvers.

SDPAD — the original ADMM method presented in (Wen et al., 2010).

SDPNAL — the Newton-CG (conjugate gradient) augmented Lagrangian method proposed in (Zhao et al., 2010).

IPMNC — the nonconvex interior point method, which attempts to solve a direct low-rank reformulation of the MAP inference problem (Burer & Monteiro, 2003) by replacing the semidefinite variable with an explicit low-rank factorization. This method serves as an alternative low-rank heuristic to the proposed SDPADLR. Without loss of generality, we fix the initial values of the factor across all runs.

MOSEK — a cutting-edge interior point method. To apply it to large-scale SDRs, we add the nonnegativity constraints in an incremental fashion, i.e., at each iteration, we detect the 100 smallest negative entries and add them to the constraint set.

MULUpdate — an approximate online SDP solver based on multiplicative weights updates (Arora et al., 2012).
Problem sets. For evaluation, we consider four categories on which most baseline algorithms are applicable: PICObject, PICAlign, PICFolding and GMLabel. For simplicity, we pick a representative problem from each category. These problems span a wide range of dimensions and include both dense and sparse instances (see Table 1).
Table 3. Mean objective value (top row of each block) and percentage of obtaining the best solution (bottom row) for each method on each category.

             SDPADLR    Ficolofo   BRAOBB    expand     TRWSLF2    ogmTRBP    MCBC       Astar
ORIENT       7834.6     na         3059.2    7695.4     7592.4     7553.8     na         na
             100%       na         0%        0%         0%         0%         na         na
PICObject    19316.12   19308.94   19113.87  10106.8    19020.82   18900.81   na         na
             97.3%      91.9%      24.3%     0%         59.5%      32.2%      na         na
PICFolding   5963.68    5963.68    5927.01   5652.76    5905.01    5907.24    na         na
             100%       100%       42.9%     14.2%      38.1%      42.9%      na         na
PICAlign     2285.23    2285.34    2285.34   2285.34    2286.64    2289.12    na         na
             100%       90%        90%       90%        80%        70%        na         na
GMLabel      476.95     na         na        476.95     476.95     486.42     na         na
             100%       na         na        100%       99.67%     40%        na         na
GMChar       59550.67   na         na        na         49519.44   49507.98   49550.10   na
             86.1%      na         na        na         11%        6%         89.1%      na
GMMontage    168298.00  na         na        168220.00  735193.0   235611.00  na         na
             66.3%      na         na        33.3%      0%         0%         na         na
GMMatching   44.19      na         21.22     na         32.38      5.5e10     na         21.22
             0%         na         100%      na         0%         0%         na         100%
Evaluation protocol. Following the standard protocol for assessing convex programs, we evaluate the duality gap and the primal/dual infeasibility of each algorithm. The duality gap (gap) measures the relative difference between the primal and dual objective values, and the infeasibility (inf) measures the maximum violation of the primal and dual constraints. As IPMNC solves a different optimization problem, we instead report the gap between its optimal solutions and the ground-truth optimal solutions.
Analysis of results. We run each algorithm until the duality gap falls below a prescribed tolerance or the maximum number of iterations is reached. Table 1 shows the running time, duality gap, and maximum primal/dual infeasibility of each algorithm on each problem. We can see that SDPADLR generates results that are comparable to SDPAD and SDPNAL. However, SDPADLR turns out to be remarkably more efficient than SDPAD and SDPNAL on large-scale or sparse datasets. This is due to the fact that SDPADLR only requires computing the top eigenvalues, which is both memory- and computationally efficient.
Both interior point methods (i.e., IPMNC and MOSEK) have provable guarantees to generate more accurate results than the other methods. However, MOSEK is not scalable to large data sets, as reported in Table 1. IPMNC is scalable to large-scale problems, as the number of variables involved is small. However, as IPMNC solves a nonconvex optimization problem, it may easily get trapped in local minima (e.g., on deer_0034.K10.F100 and folding_2BE6).
Finally, the multiplicative weights update method MULUpdate turns out to be inefficient at solving SDRs of MAP inference problems. This is due to the fact that MULUpdate is an approximate solver, and it requires many iterations to obtain an accurate solution.
4.3 MAP Inference Evaluation
Experimental setup. We compare SDR with the top-performing algorithms from OPENGM2 (Kappes et al., 2013a). These algorithms include (i) BRAOBB (Otten & Dechter, 2012), which is based on combinatorial search, (ii) expansion (Szeliski et al., 2008), a move-making method, (iii) MCBC (Kappes et al., 2013b), which is based on a highly optimized max-cut solver, (iv) TRWSLF2 (Kolmogorov, 2006), tree-reweighted message passing, (v) ogmTRBP, tree-reweighted belief propagation (Szeliski et al., 2008), and (vi) Ficolofo (Cooper et al., 2010), the top-performing method on the dense problems of PIC.
We use two measures to assess the performance of each method. The first measure evaluates, for each method, the mean objective value of the resulting MAP assignments on each category; for consistency with (Kappes et al., 2013a), we report it so that the smaller the value, the better the algorithm. The second measure reports the percentage of problems on which each method achieves the best solution among all evaluated methods (not necessarily the global optimum). The higher the percentage, the better the algorithm.
Performance. Table 3 summarizes the performance of SDPADLR vs. state-of-the-art MAP inference algorithms on each type of problem. In each block, the top row gives the mean objective value of each method on the category, and the bottom row gives the percentage of obtaining the best solution. We can see that the overall performance of SDPADLR is superior to that of every other individual algorithm. Except on GMMatching, SDPADLR is the top performer on every dataset. In contrast, each existing method either does not apply to, or generates poor results on, one or several datasets. This shows the advantage of solving a strong convex relaxation of the MAP inference problem. Below we break down the performance on each benchmark.

ORIENT. SDPADLR is the leading method on ORIENT. The problems in ORIENT exhibit a specific structure, i.e., the pairwise potentials consist of approximately shifted permutation matrices. Experimentally, we found that SDR is usually tight on these problems, which explains the superior performance of SDPADLR. In contrast, linear programming relaxations are not tight on ORIENT, and thus TRBP and TRWS deliver only moderate performance. Moreover, this structural pattern leads to huge search spaces for combinatorial algorithms (e.g., BRAOBB), which can easily get stuck in local optima.

Dense problems. SDPADLR also outperforms other methods on three dense categories from PIC. It achieves the best mean energy value as well as the highest percentage of obtaining the best solution. This again arises since SDR is tight on these problems.

Sparse problems. SDR yields results comparable to state-of-the-art algorithms on the three sparse categories from OPENGM2. GMLabel consists of problems where the standard LP relaxation is tight. On GMChar, which consists of large-scale binary problems, SDR is comparable to MCBC in the sense that SDR achieves a better mean energy value while MCBC attains a higher percentage of being the best solution. This arises because MCBC is a highly optimized solver designed for binary quadratic problems, while SDPADLR is only an approximate SDP solver which, in some cases, may not converge to the global optimum due to numerical issues.

GMMatching.
SDR only yields moderate results on GMMatching. This occurs because SDR is not tight on GMMatching. In contrast, as GMMatching is a smallscale problem, combinatorial optimization techniques such as BRAOBB and Astar are capable of finding globally optimal solutions.
Running times. The running time of SDPADLR (including the rounding procedure) is of the same scale as that of other convex relaxation techniques. As shown in Table 2, our preliminary Matlab implementation takes less than 10 minutes on small-scale problems (i.e., those in PICObject, GMMatching and GMLabel). On medium-size problems, i.e., those in PICFolding, PICAlign, GMChar and ORIENT, the running time of SDPADLR ranges from 20 minutes to 1 hour. On large-scale problems from GMMontage, SDPADLR takes around 8 hours per problem. However, there is still huge room for improvement. One alternative is to use the eigenvectors computed in the previous iteration to warm-start the eigendecomposition at the current iteration, which is left for future work.
5 Conclusions
In this paper, we have presented a novel semidefinite relaxation for second-order MAP estimation and proposed an efficient ADMM solver. We have extensively compared the proposed SDP solver with various state-of-the-art SDP solvers. Experimental results confirm that our solver is much more scalable than prior approaches when applied to various MAP estimation problems, which enables us to apply SDR to large-scale datasets. Owing to the power of semidefinite relaxation, SDR proves superior to other top-performing MAP inference algorithms on a variety of benchmark datasets.
There are plenty of opportunities for future research. First, we would like to extend SDR to higher-order MAP problems. Moreover, it would be interesting to integrate SDR with combinatorial optimization techniques, which has the potential to boost the power of both. On the theoretical side, support for exact estimation with SDR would be an exciting direction for investigation; it would also offer justification for the presented low-rank heuristic. Finally, as many combinatorial optimization problems can be formulated as MAP inference problems, such exactness conditions could shed light on the original combinatorial problems.
Acknowledgments
This work has been supported in part by NSF grants FODAVA 808515 and CCF 1011228, AFOSR grant FA95501210372, ONR MURI N000141310341, and a Google research award.
References
PIC (2011) Probabilistic inference challenge, 2011. http://www.cs.huji.ac.il/project/PASCAL/index.php.
 Arora et al. (2012) Arora, Sanjeev, Hazan, Elad, and Kale, Satyen. The multiplicative weights update method: a metaalgorithm and applications. Theory of Computing, 8(1):121–164, 2012.
Batra et al. (2011) Batra, D., Nowozin, S., and Kohli, P. Tighter relaxations for MAP-MRF inference: A local primal-dual gap based separation algorithm. AISTATS'11, 15:146–154, 2011.
 Benson & Ye (2008) Benson, S. and Ye, Y. DSDP5: software for semidefinite programming. ACM Trans. Math. Softw., 34(3):16:1–16:20, May 2008.
Boyd et al. (2011) Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
Burer & Monteiro (2003) Burer, Samuel and Monteiro, Renato D. C. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program., 95(2):329–357, 2003.
 Chekuri et al. (2004) Chekuri, C., Khanna, S., Naor, J., and Zosin, L. A linear programming formulation and approximation algorithms for the metric labeling problem. SIAM J. Discrete Math., 18(3):608–625, 2004.
Cooper et al. (2010) Cooper, M. C., de Givry, S., Sanchez, M., Schiex, T., Zytnicki, M., and Werner, T. Soft arc consistency revisited. Artif. Intell., 174(7-8):449–478, 2010.
Crandall et al. (2011) Crandall, D., Owens, A., Snavely, N., and Huttenlocher, D. SfM with MRFs: discrete-continuous optimization for large-scale structure from motion. CVPR'11, pp. 3001–3008, 2011.
 Cullum & Willoughby (2002) Cullum, J. K. and Willoughby, R. A. Lanczos Algorithms for Large Symmetric Eigenvalue Computations. Number 41. SIAM, 2002.
 Goemans & Williamson (1995) Goemans, M. and Williamson, D. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. JACM, 1995.
 Helmberg & Rendl (2000) Helmberg, C. and Rendl, F. A spectral bundle method for semidefinite programming. SIAM Journal on Optimization, 10(3):673–696, 2000.
 Kappes et al. (2013a) Kappes, J. H., Andres, B., Hamprecht, F. A., Schnorr, C., Nowozin, S., Batra, D., Kim, S., Kausler, B. X., Lellmann, J., Komodakis, N., and Rother, C. A comparative study of modern inference techniques for discrete energy minimization problems. In CVPR’13, June 2013a.
Kappes et al. (2013b) Kappes, J. H., Speth, M., Reinelt, G., and Schnörr, C. Towards efficient and exact MAP-inference for large scale discrete computer vision problems via combinatorial optimization. In CVPR, 2013b.
Kolmogorov (2006) Kolmogorov, V. Convergent tree-reweighted message passing for energy minimization. IEEE PAMI, 28:1568–1583, October 2006.
 Komodakis & Paragios (2008) Komodakis, N. and Paragios, N. Beyond loose LPrelaxations: Optimizing MRFs by repairing cycles. In ECCV (3), pp. 806–820, 2008.
 Kumar et al. (2009) Kumar, M., Kolmogorov, V., and Torr, P. An analysis of convex relaxations for MAP estimation of discrete MRFs. JMLR, 10:71–106, 2009.
 Olsson et al. (2007) Olsson, C., Eriksson, A., and Kahl, F. Solving large scale binary quadratic problems: Spectral methods vs. semidefinite programming. In CVPR’07, 2007.
 Otten & Dechter (2012) Otten, Lars and Dechter, Rina. Anytime and/or depthfirst search for combinatorial optimization. AI Commun., 25(3):211–227, 2012.
 Peng et al. (2012) Peng, Jian, Hazan, Tamir, Srebro, Nathan, and Xu, Jinbo. Approximate inference by intersecting semidefinite bound and local polytope. In AISTATS, pp. 868–876, 2012.
Ravikumar et al. (2010) Ravikumar, P., Agarwal, A., and Wainwright, M. J. Message-passing for graph-structured linear programs: Proximal methods and rounding schemes. The Journal of Machine Learning Research, 11:1043–1080, 2010.
 Shimony (1994) Shimony, S. E. Finding MAPs for belief networks is NPhard. Artif. Intell., 68(2):399–410, August 1994.
 Sontag et al. (2012) Sontag, D., Meltzer, T., Globerson, A., Jaakkola, T. S., and Weiss, Y. Tightening LP relaxations for MAP using message passing. arXiv preprint arXiv:1206.3288, 2012.
 Szeliski et al. (2008) Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., and Rother, C. A comparative study of energy minimization methods for Markov random fields with smoothnessbased priors. PAMI, 30(6):1068–1080, June 2008.
 Toh et al. (1999) Toh, K. C., Todd, M. J., and Tutuncu, R. H. SDPT3–a Matlab software package for semidefinite programming. Opt. Methods and Software, 11(12):545–581, 1999.
 Torr (2003) Torr, Philip. Solving Markov random fields using semidefinite programming. In AISTATs’03, 2003.
 Wainwright et al. (2005) Wainwright, M., Jaakkola, T., and Willsky, A. MAP estimation via agreement on trees: messagepassing and linear programming. IEEE Trans Info Theory, 2005.
Wainwright & Jordan (2008) Wainwright, M. J. and Jordan, M. I. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2), 2008.
 Wang et al. (2013) Wang, P., Shen, C., and van den Hengel, A. A fast semidefinite approach to solving binary quadratic problems. In CVPR ’13, pp. 1312–1319, 2013.
Wen et al. (2010) Wen, Z., Goldfarb, D., and Yin, W. Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Prog. Comp., 2(3-4):203–230, 2010.
Zhao et al. (2010) Zhao, Xin-Yuan, Sun, Defeng, and Toh, Kim-Chuan. A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. on Optimization, 20(4):1737–1765, January 2010.