1 Introduction
Given the probability distribution of a graphical model, maximum a posteriori (MAP) inference aims to infer the most probable label configuration. MAP inference can be formulated as an integer linear program (ILP)
variationalinferencebook2008 . However, due to the integer constraints, exactly optimizing the ILP is intractable in many realistic problems. To tackle this, a popular approach is to relax the ILP to a continuous linear program over the local marginal polytope (defined in Section 3), called the linear programming (LP) relaxation. The optimal solution to the LP relaxation is attained at a vertex of the local polytope. It is known variationalinferencebook2008 that all valid integer label configurations are vertices of the local polytope, but not all of its vertices are integer; some are fractional. Since the LP relaxation is likely to give fractional solutions, a rounding step must be adopted to generate integer solutions. To alleviate this issue, intense efforts have been made to design tighter relaxations (e.g., high-order relaxations MAPLPthesis2010 ) on top of the LP relaxation, such that the proportion of fractional vertices can be reduced. However, the possibility of fractional solutions remains, and these tighter relaxations are often much more computationally expensive than the original LP relaxation. Moreover, there are exact inference methods, such as branch-and-bound branchandbound1960 and cutting-plane cuttingplane1960 , that use the LP relaxation as a subroutine, leading to much higher computational cost than approximate methods.
Instead of proposing a new approximation with a tighter relaxation, we propose an exact reformulation of the original MAP inference problem. Specifically, we add a new constraint, called the sphere constraint WUlpboxADMMPAMI2018 , to the original LP relaxation problem; it enforces that the solution lies on a sphere. We prove that the intersection between the sphere constraint and the local polytope is exactly the set of all possible label configurations of the original MAP inference problem, i.e., the constraint space of the ILP problem.
Thus, the proposed formulation, dubbed LSLP, is an equivalent but continuous reformulation of the ILP formulation for MAP inference. Furthermore, inspired by AD3JMLR2015 and WUlpboxADMMPAMI2018 , we adopt the ADMM algorithm ADMMboyd2011 , not only to separate the different constraints, but also to decompose variables so as to allow parallel inference by exploiting the factor graph structure. Although the sphere constraint is nonconvex, we prove that the ADMM algorithm for the LSLP problem with a sufficiently small perturbation globally converges to a KKT point KKT1939 ; KKT2014 of the original LSLP problem. The obvious advantages of the proposed LSLP formulation and the corresponding ADMM algorithm include: 1) compared to other LP relaxation based methods, our method directly gives a valid integer label configuration, without any rounding technique as post-processing; 2) compared to exact methods like branch-and-bound branchandbound1960 and cutting-plane cuttingplane1960 , our method solves one single continuous problem once, rather than multiple times. Experiments on benchmarks from the Probabilistic Inference Challenge (PIC 2011) PIC2011 and OpenGM 2 opengm2ijcv2015 verify the competitive performance of LSLP against state-of-the-art MAP inference methods.
The main contributions of this work are threefold. 1) We propose a continuous but equivalent reformulation of the MAP inference problem. 2) We present the ADMM algorithm for optimizing the perturbed LSLP problem, which is proved to globally converge to a KKT point of the original LSLP problem. An analysis of the convergence rate is also presented. 3) Experiments on benchmark datasets verify the competitive performance of our method compared to state-of-the-art MAP inference methods.
2 Related Work
As our method is closely related to LP relaxation based methods, here we mainly review LP relaxation based MAP inference methods. For other categories of methods, such as message passing and move making, we refer the readers to variationalinferencebook2008 and opengm2ijcv2015 for more details. Although some off-the-shelf LP solvers can be used to optimize the LP relaxation problem, in many real-world applications the problem scale is too large to adopt these solvers. Hence, most methods focus on developing efficient algorithms to optimize the dual LP problem. Block coordinate descent methods MPLPNIPS2007 ; TRWSPAMI2006 are fast, but they may converge to suboptimal solutions. Subgradient based methods PSDDICCV2007 ; DDSG2012 can converge to global solutions, but their convergence is slow. Their common drawback is the non-smoothness of the dual objective function. To handle this difficulty, some smoothing methods have been developed. The Lagrangian relaxation method LagrangianrelaxationMAP2007 uses the smooth log-sum-exp function to approximate the non-smooth max function in the dual objective. Alternatively, a proximal regularization proximalregularizationICML2010 or a regularization term smoothstrongMAP2015 is added to the dual objective. Moreover, the steepest descent methods proposed in epsilondescentnips2012 and FWepsilondescenticml2014 can accelerate the convergence of standard subgradient based methods. Parallel MAP inference methods based on ADMM have also been developed to handle large-scale inference problems. For example, AD3 AD3JMLR2015 ; AD3ICML2011 and Bethe-ADMM BetheADMMUAI2013 optimize the primal LP problem, while ADMM-dual dualADMMECML2011 optimizes the dual LP problem. The common drawback of these methods is that they are likely to produce fractional solutions, since the underlying problem is merely a relaxation of the MAP inference problem.
Another direction is pursuing tighter relaxations, such as high-order consistency MAPLPthesis2010 and SDP relaxation SDPrelaxation2002 . But they are often more computationally expensive than LP relaxations. In contrast, the formulation of the proposed LSLP is an exact reformulation of the original MAP inference problem, and the adopted ADMM algorithm can explicitly produce valid integer label configurations, without any rounding operation. In comparison with other expensive exact MAP inference methods (e.g., branch-and-bound branchandbound1960 and cutting-plane cuttingplane1960 ), LSLP is very efficient owing to the resulting parallel inference, similar to other ADMM based methods.
Another related work is Lp-box ADMM WUlpboxADMMPAMI2018 , a framework for optimizing general integer programs. The proposed LSLP is inspired by this framework, in which the integer constraints are replaced by the intersection of two continuous constraints. However, 1) LSLP is specifically designed for MAP inference, as it replaces only the valid integer configuration space, rather than the whole binary space as done in Lp-box ADMM. 2) LSLP is tightly combined with the LP relaxation, and the ADMM algorithm decomposes the problem into multiple simple subproblems by utilizing the structure of the factor graph, which allows parallel inference for any type of inference problem (e.g., multiple variable states and high-order factors). In contrast, Lp-box ADMM does not assume any special properties of the objective function, and it optimizes all variable nodes in one subproblem. Especially for large-scale models, the subproblem involved in Lp-box ADMM can be very costly. 3) As the LP relaxation is parameterized according to the factor graph, any type of graphical model (e.g., directed models, high-order potentials, asymmetric potentials) can be naturally handled by LSLP. In contrast, Lp-box ADMM needs to transform the inference objective based on MRF graphs into some simple form (e.g., a binary quadratic program (BQP)). However, this transformation is nontrivial in some cases. For example, if there are high-order potentials, the graphical model is difficult to transform into a BQP.
3 Background
3.1 Factor Graph
Denote as a set of random variables in a discrete space , where with being the possible states of . The joint probability of is formulated based on a factor graph kollerpgm2009 ,
(1) 
where with being the node set of variables, being the node set of factors, as well as the edge set linking the variable and factor nodes. A simple example of an MRF and its factor graph is shown in Fig. 1(a,b). We refer the readers to kollerpgm2009 for the detailed definition of the factor graph. indicates the label configuration of the factor , and its state will be determined according to the connected variable nodes . denotes the unary log potential (logPot) function, while indicates the factor logPot function.
3.2 MAP Inference as Linear Program
Given , an important task is to find the most probable label configuration of , referred to as MAP inference,
(2) 
Eq. (2) can be reformulated as the integer linear program (ILP) variationalinferencebook2008 ,
(3)  
where
denotes the log potential (logPot) vector, derived from
and . , where and . indicates the label vector corresponding to : if the state of is , then , while all other entries are . Similarly, indicates the label vector corresponding to . The local marginal polytope is defined as follows,(4)  
with being the probability simplex, and the second constraint ensures the local consistency between and . of the local consistency constraint included in is defined as: the entry of is if , where indicates the state of and the state of the corresponding element in are the same; otherwise, the entry is . For example, consider a binary-state variable node and a pairwise factor node connected to two variable nodes (the variable node is the first). The first entry of indicates the score of choosing state , while the second entry corresponds to that of choosing state . The four entries of indicate the scores of the four label configurations of the two connected variables, i.e., . In this case, .
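The worked example above can be checked with a small sketch; this assumes the configurations of the pairwise factor are ordered (0,0), (0,1), (1,0), (1,1) as in the text, and the function name is illustrative:

```python
# Builds the 0/1 local-consistency matrix for one variable of a pairwise
# factor: row r, column c is 1 iff configuration c assigns state r to
# that variable, so (matrix @ factor_marginal) recovers the variable marginal.
def consistency_matrix(position, n_states=2):
    configs = [(a, b) for a in range(n_states) for b in range(n_states)]
    return [[1 if cfg[position] == state else 0 for cfg in configs]
            for state in range(n_states)]
```

For the first variable of a binary pairwise factor this yields [[1, 1, 0, 0], [0, 0, 1, 1]]: summing the factor marginal over configurations (0,0) and (0,1) gives the probability of state 0, matching the example in the text.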
Moreover, Eq. (2) can also be rewritten as
(5) 
where the marginal polytope is defined as follows,
(6) 
Solving is difficult (NP-hard in general), especially for large-scale problems. Instead, the approximation over is widely adopted, as follows:
(7) 
which is called LP relaxation. Note that here and are continuous variables, and they are considered as local marginals of and , respectively.
According to variationalinferencebook2008 , the characteristics of , and their relationships are briefly summarized in Lemma 1.
Lemma 1
variationalinferencebook2008 The relationship between and , and that between and are as follows.

;

;

All vertices of are integer, while includes both integer and fractional vertices. The set of integer vertices of is the same as the set of vertices of . All non-vertices in and are fractional points.

Since both and are convex polytopes, the global solutions of and will be on the vertices of and , respectively.

The global solution of could be fractional or integer. If it is integer, then it is also the global solution of .
3.3 Kurdyka-Lojasiewicz Inequality
The Kurdyka-Lojasiewicz inequality was first proposed in lojasiewicz1963propriete , and it has been widely used in many recent works attouch2010proximal ; wotaoyinarxiv2015 ; admmnonconvexsiam2015 for the convergence analysis of nonconvex problems. Since it will also be used in the later convergence analysis of our algorithm, it is reproduced here, as shown in Definition 1.
Definition 1
attouch2010proximal A function is said to have the Kurdyka-Lojasiewicz (KL) property at ( indicates subgradient), if the following two conditions hold

there exist a constant , a neighborhood of , and a continuous concave function , with , which is differentiable on with positive derivatives.

for all points satisfying , the Kurdyka-Lojasiewicz inequality holds
(8)
Remark. According to attouch2010proximal ; bolte2007lojasiewicz ; bolte2007clarke , if is semialgebraic, then it satisfies the KL property with , where and are constants. This point will be used in later analysis of convergence.
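For reference, the KL inequality in (8) commonly takes the following standard form in attouch2010proximal ; the symbols f, x̄, U, η and φ below are stand-ins for the notation elided in this copy:

```latex
\varphi'\big(f(x) - f(\bar{x})\big)\,
\operatorname{dist}\big(0,\, \partial f(x)\big) \;\ge\; 1,
\qquad \forall\, x \in U \ \text{with}\ f(\bar{x}) < f(x) < f(\bar{x}) + \eta .
```

Intuitively, after the reparameterization by the concave function φ, the subgradient of f is bounded away from zero near the point, which is what drives the convergence rates in Section 6.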
4 MAP Inference via sphere Linear Program Reformulation
4.1 Equivalent Reformulation
Firstly, we introduce the sphere constraint WUlpboxADMMPAMI2018 ,
(9) 
Note that is defined with respect to the vector , rather than individual scalars . Specifically, the constraint space is tighter than , including nonconvex constraints, while includes only one nonconvex constraint. We propose to add the sphere constraint to the variable nodes . Combining this with the LP relaxation (see Eq. (7)), we obtain a new formulation for MAP inference,
(10) 
Due to the nonconvex constraint , it is no longer a linear program. However, to emphasize its relationship to the LP relaxation, we still refer to it as a sphere-constrained linear program (LSLP) reformulation. More importantly, as shown in Theorem 4.1, LSLP is equivalent to the original MAP inference problem, rather than a relaxation as in LP. Inspired by the constraint separation in Lp-box ADMM WUlpboxADMMPAMI2018 , we introduce the extra variable to reformulate (10) as
(11)  
where is the concatenated vector of all extra variable nodes. The combination of the original factor graph and these extra variable nodes is referred to as the augmented factor graph (AFG). An example of the AFG corresponding to Problem (11) is shown in Figure 1(c). The gray squares correspond to the extra variables , and connections to the purple box indicate that . Note that the AFG does not satisfy the definition of a standard factor graph, as it is not a bipartite graph where connections only exist between variable nodes and factor nodes. However, the AFG provides a clear picture of the structure of LSLP and the node relationships. The proposed LSLP problem is equivalent to the original MAP inference problem, as shown in Theorem 4.1, meaning that the global solutions of these two problems are equivalent.
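The equivalence claim can be sanity-checked numerically; the sketch below assumes the sphere constraint reads ||x - (1/2)·1||² = K/4 for a K-dimensional variable marginal, following the Lp-box ADMM construction (function names are illustrative):

```python
# Membership tests for the two constraint sets intersected in LSLP.
def on_sphere(x, tol=1e-9):
    # assumed sphere constraint: ||x - 0.5*1||^2 = K/4
    return abs(sum((xi - 0.5) ** 2 for xi in x) - len(x) / 4.0) < tol

def on_simplex(x, tol=1e-9):
    return abs(sum(x) - 1.0) < tol and all(xi >= -tol for xi in x)

one_hot = [0.0, 1.0, 0.0]   # a valid integer label configuration
uniform = [1/3, 1/3, 1/3]   # a fractional point of the simplex
```

The one-hot point lies in both sets, while the uniform point lies only in the simplex, illustrating that the intersection keeps exactly the integer configurations.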
Lemma 2
The following constraint spaces are equivalent.
(12) 
Proof
Firstly, we focus on . We have
(13) 
Besides, the following relations hold
(14)  
. The equation in the last relation holds if and only if . Combining with (13), we conclude that holds . Consequently, utilizing the local consistency constraint , we obtain that also holds . Thus, we have . Then, the relation is proved.
Besides, as shown in Lemma 1, the set of integer vertices of is the same as that of , and all non-vertices in and are fractional points. Thus, it follows that . This completes the proof.
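The key algebraic step behind Lemma 2 can be written out explicitly; this is a sketch with stand-in symbols (x in R^K for one variable's marginal), since the notation is elided in this copy:

```latex
% For x with x_i >= 0 and \sum_i x_i = 1 (the simplex constraint):
\Big\| x - \tfrac{1}{2}\mathbf{1} \Big\|_2^2
 \;=\; \|x\|_2^2 \;-\; \sum_{i=1}^{K} x_i \;+\; \tfrac{K}{4}
 \;=\; \|x\|_2^2 - 1 + \tfrac{K}{4} \;\le\; \tfrac{K}{4}.
% The inequality holds since 0 <= x_i <= 1 implies x_i^2 <= x_i, hence
% ||x||_2^2 <= \sum_i x_i = 1. Equality (the sphere constraint) holds iff
% ||x||_2^2 = 1, which forces each x_i into {0, 1}, i.e., x is a one-hot
% vertex of the simplex.
```

So on the simplex, the sphere is touched exactly at the integer vertices, which is the content of the equivalence.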
Theorem 4.1
Utilizing Lemma 2, the aforementioned MAP inference problems have the following relationships,
(15) 
4.2 A General Form and KKT Conditions
For clarity, we first simplify the notation and formulation in (11) to the following general form,
(18) 
where , with . with . , with with , and being the set of neighborhood nodes connected to the th factor. . The constraint matrix with . , with .
Definition 2
The solution of the LSLP problem (18) is said to be the KKT point if the following conditions are satisfied:
(19) 
where denotes the Lagrangian multiplier; indicates the subgradient of , while represents the gradient of . Moreover, is considered as the KKT point if the following conditions hold:
(20) 
5 Perturbed ADMM Algorithm for LSLP
We propose a perturbed ADMM algorithm to optimize the following perturbed augmented Lagrangian function,
(24) 
where with a sufficiently small constant , so that is full row rank. , with and . . Note that both and are full row rank, and the second-order gradient is bounded. These properties will play key roles in our later analysis of convergence.
Following the conventional ADMM algorithm, the solution to the LSLP problem (18) is obtained by iteratively optimizing the following subproblems based on (24). The general structure of the algorithm is summarized in Algorithm 1.
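The iterative structure just described can be sketched at a high level; this is a sketch only, where solve_x, solve_z and residual are hypothetical stand-ins for the subproblem solvers of Sections 5.1-5.3 and the coupling residual:

```python
# Skeleton of a scaled ADMM loop in the style of Algorithm 1.
def admm(x, z, lam, rho, solve_x, solve_z, residual, max_iter=500, tol=1e-8):
    for _ in range(max_iter):
        x = solve_x(z, lam, rho)      # primal block 1 (parallel over nodes)
        z = solve_z(x, lam, rho)      # primal block 2 (extra copy variables)
        r = residual(x, z)            # coupling residual, e.g. x - z
        lam = [l + rho * ri for l, ri in zip(lam, r)]  # dual ascent step
        if max(abs(ri) for ri in r) < tol:
            break
    return x, z, lam
```

On a toy consensus problem, e.g. minimizing (x-3)²/2 + (z-1)²/2 subject to x = z, the loop converges to x = z = 2 with multiplier 1, which is a quick way to check the update signs.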
5.1 SubProblem w.r.t. in LSLP Problem
Given and , can be updated by solving the subproblem (21) (see Algorithm 1). According to the definitions of , this problem can be further separated into the following two independent subproblems, which can be solved in parallel.
Update :
(25) 
It has a closed form solution as follows
(26) 
where with . is the projection onto : . As demonstrated in WUlpboxADMMPAMI2018 , this projected solution is the optimal solution to (25).
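The closed-form projection in (26) can be sketched as follows; this assumes the sphere is {x : ||x - (1/2)·1||₂ = √K/2}, following the Lp-box ADMM construction, and the names are illustrative:

```python
import math

# Closed-form projection of a point z onto the l2-sphere centered at
# 0.5*ones with radius sqrt(K)/2: shift to the center, rescale to the
# radius, and shift back.
def project_sphere(z):
    K = len(z)
    v = [zi - 0.5 for zi in z]
    norm = math.sqrt(sum(vi * vi for vi in v))
    if norm == 0.0:
        # at the center every sphere point is equidistant; pick one
        v, norm = [1.0] + [0.0] * (K - 1), 1.0
    radius = math.sqrt(K) / 2.0
    return [radius * vi / norm + 0.5 for vi in v]
```

For example, projecting the near-integer point [0.9, 0.1] lands exactly on the integer configuration [1.0, 0.0], which illustrates why this update pushes iterates toward valid labelings.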
Update : The subproblems with respect to can be run in parallel,
(27) 
where denotes the index set of variable nodes connected to the factor node . Problem (27) is convex, as is positive semidefinite and is a convex set. Any off-the-shelf QP solver can be adopted to solve (27). In experiments, we adopt the active-set algorithm implemented by a publicly available toolbox called Quadratic Programming in C (QPC, http://sigpromu.org/quadprog/download.php?sid=3wtwk5tb), which is written in C and can be called from MATLAB.
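The paper solves (27) with the QPC active-set solver; purely as an illustration of the subproblem's shape (a convex QP over the probability simplex), here is a projected-gradient sketch with hypothetical names, assuming Q is positive semidefinite:

```python
def project_simplex(v):
    """Euclidean projection onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, 1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t          # largest i with a positive gap wins
    return [max(vi - theta, 0.0) for vi in v]

def qp_simplex(Q, b, x0, lr=0.1, iters=500):
    """Minimize 0.5 x'Qx + b'x over the simplex by projected gradient."""
    x = list(x0)
    for _ in range(iters):
        g = [sum(Q[i][j] * x[j] for j in range(len(x))) + b[i]
             for i in range(len(x))]
        x = project_simplex([xi - lr * gi for xi, gi in zip(x, g)])
    return x
```

An active-set solver like QPC reaches the exact minimizer in finitely many steps, whereas this sketch only converges geometrically; it is shown here just to make the structure of the subproblem concrete.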
5.2 SubProblem w.r.t. in LSLP Problem
5.3 Update in LSLP Problem
5.4 Complexity and Implementation Details
Complexity. In terms of computational complexity, as all other update steps have simple closedform solutions, the main computational cost lies in updating , which is convex quadratic programming with the probability simplex constraint. Its computational complexity is .
As the matrix with the largest size is in LSLP, the space complexity is . Both the computational and space complexities of AD3 are similar to those of LSLP. A more detailed analysis of the computational complexity will be presented in Section 7.5.
Implementation details. In each iteration, we use the same value of for all and . After each iteration, we update using an incremental rate , i.e., . An upper limit of is also set: if is larger than , it is not updated anymore. The perturbation is set to . We use two stopping criteria jointly: 1) the violation of the local consistency constraint, i.e., ; 2) the violation of the equivalence constraint , i.e., . We set the same threshold for both criteria. If these two violations are simultaneously lower than the threshold, the algorithm stops.
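The schedule and stopping test described above can be sketched as follows; the names rho, eta, rho_max and eps are hypothetical stand-ins for the symbols elided in this copy:

```python
# Capped geometric penalty schedule: rho grows by the incremental rate
# eta after each iteration, and stops growing once it reaches rho_max.
def update_rho(rho, eta=1.1, rho_max=1e4):
    return min(rho * eta, rho_max)

# Joint stopping test: both the local-consistency violation and the
# x = z coupling violation must fall below the same threshold eps.
def converged(local_violation, coupling_violation, eps=1e-6):
    return local_violation < eps and coupling_violation < eps
```

Growing the penalty gradually keeps early subproblems well conditioned, while the cap prevents the quadratic terms from dominating the objective late in the run.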
6 Convergence Analysis
The convergence property of the above ADMM algorithm is demonstrated in Theorem 6.1. Due to the space limit, the detailed proof is presented in the Appendix.
Theorem 6.1
Suppose that is set to be larger than a constant; then the variable sequence generated by the perturbed ADMM algorithm globally converges to , where is the KKT point of the LSLP problem (18), as defined in Definition 2.
Furthermore, according to Definition 1, we assume that has the KL property at with the concave function , where . Consequently, we obtain the following convergence results:

If , then the perturbed ADMM algorithm will converge in finite steps.

If , then we will obtain the KKT solution to the LSLP problem in at least steps.

If , then we will obtain the KKT solution to the LSLP problem in at least steps.
Proof
The proof consists of the following six consecutive steps:

The perturbed augmented Lagrangian function (see (24)) is monotonically decreasing along the optimization.

The variable sequence is bounded.

The sequence of variable residuals converges, i.e., , as .

The variable sequence globally converges to the cluster point .

is the KKT point of the LSLP problem (18).

Finally, we analyze the convergence rate, i.e., how many steps are required to reach the KKT point.
The detailed proof is presented in the Appendix.
Table 1: Statistics of the benchmark datasets.

Dataset  #models  avg. #variables  avg. #factors  avg. #edges  avg. factor degree  #states  avg. factor size
Seg2  50  229.14  622.28  1244.56  2  2  4
Seg21  50  229.14  622.28  1244.56  2  21  441
Scene  715  182.56  488.99  977.98  2  8  64
Grids  21  3142.86  6236.19  12472.4  2  2  4
Protein  7  14324.7  21854.7  57680.4  2.64  2  6.56
7 Experiments
7.1 Experimental Settings
7.1.1 Datasets
We evaluate on four benchmark datasets from the Probabilistic Inference Challenge (PIC 2011) PIC2011 and OpenGM 2 opengm2ijcv2015 , including Segmentation PIC2011 , Scene scenedataiccv2009 , Grids PIC2011 , and Protein proteindata2006 , as shown in Table 1. Segmentation consists of Seg2 and Seg21, with different variable states. Protein includes higherorder potentials, while others are pairwise graphs.
7.1.2 Compared Methods
We compare with different categories of MAP inference methods, including: 1) move-making methods, i.e., ICM ICM1986 ; 2) message-passing methods, including belief propagation (BP) BP2001 and TRBP TRBP2005 ; 3) polyhedral methods (including LP relaxation based methods), including dual decomposition using subgradient (DDSG) DDSG2012 , TRWS TRWSPAMI2006 , ADSal ADSal2012 , PSDD PSDDICCV2007 and AD3 AD3ICML2011 AD3JMLR2015 . 4) We also compare with LPLP, which calls the active-set method (implemented by linprog in MATLAB) to optimize the LP relaxation (7). It serves as a baseline to measure the performance of the above methods. 5) The most closely related Box ADMM algorithm WUlpboxADMMPAMI2018 is also compared. However, the algorithm presented in WUlpboxADMMPAMI2018 can only handle MRF models with pairwise potentials, which are formulated as a binary quadratic programming (BQP) problem. Thus, Box ADMM (hereafter called Box for clarity) is not compared on Protein, whose models include high-order potentials. 6) We also compare with two hybrid methods: DAOOPT daoopt2012 DAOOPT2012details (adopting the branch-and-bound method branchandbound1960 as a subroutine) and MPLPC mplpcuai2012 (adopting MPLP MPLPNIPS2007 as a subroutine). 'Hybrid' indicates that the method is a combination of an off-the-shelf single method and some heuristic steps; we refer to the above five types as non-hybrid methods. Both the proposed LSLP and Box are implemented in MATLAB. The following methods are implemented using the author-provided C++ packages: PSDD and AD3 (http://www.cs.cmu.edu/~ark/AD3/), MPLPC (https://github.com/opengm/MPLP), and DAOOPT (https://github.com/lotten/daoopt). All other methods are implemented through the OpenGM 2 software opengm2ijcv2015 , and we add the prefix "ogm" before the method name, such as ogmTRWS. In experiments, we set some upper limits: the maximum number of iterations is 2000 for PSDD and AD3, 500 for Box and LSLP, and 1000 for other methods; for DAOOPT, the memory limit of mini buckets is set to 4000 MB and the upper time limit to 2 hours. The parameter tuning of all compared methods (except Box) is self-contained in their implementations. Both LSLP and Box are ADMM algorithms, and their hyper-parameters are tuned as follows: the hyper-parameters , and (see implementation details in Section 5) are adjusted in the ranges , and , respectively, and those leading to the highest logPot value are used.
7.1.3 Evaluation Metrics
We evaluate the performance of all compared methods using three types of metrics, including the log potential (logPot) values, the solution type, as well as the computational complexity and runtime.
Evaluation using logPot values. The logPot value indicates the objective value of Eq. (7). Given that the constraints in (7) are satisfied, a larger logPot value indicates better inference performance. Since LPLP gives the optimal solution to (7) in , and the constraint space of (7) is a subset of , the logPot value of any valid label configuration cannot be larger than that of LPLP. Note that in the implementation of OpenGM 2 opengm2ijcv2015 , a rounding method is adopted as a post-processing step to produce integer solutions for continuous MAP inference methods. However, the performance of different MAP inference methods may be significantly changed by rounding. Thus, for the methods not implemented by OpenGM 2, we report the logPot values of the original continuous solutions, without any rounding.
Evaluation using solution types. Since LPLP, PSDD, AD3, Box and LSLP may give continuous solutions, a larger logPot value does not always mean a better MAP inference result. Thus, we also define four qualitative measures, namely valid, fractional, approximate and uniform, to intuitively measure the inference quality. Valid (V) means that the solution is integer and satisfies the constraints in ; Fractional (F) indicates that the solution belongs to , but its value is fractional; Approximate (A) means that some constraints in are violated, and the solution is integer or fractional; Uniform (U) denotes that the solution belongs to , but its value is uniform, such as for a variable node with binary states.
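The four measures can be sketched as a simple classifier; this is a sketch under assumptions, where `feasible` stands in for the elided local-polytope feasibility check and the tolerance is illustrative:

```python
# Classify one variable marginal into the four qualitative solution types.
def solution_type(x, feasible, tol=1e-6):
    if not feasible:
        return "A"   # Approximate: some constraints are violated
    if all(min(abs(xi), abs(xi - 1.0)) < tol for xi in x):
        return "V"   # Valid: feasible and integer
    if all(abs(xi - 1.0 / len(x)) < tol for xi in x):
        return "U"   # Uniform: e.g. (1/2, 1/2) for a binary variable
    return "F"       # Fractional: feasible but non-integer
```

For example, a binary marginal (0, 1) is Valid, (1/2, 1/2) is Uniform, and (0.7, 0.3) is Fractional if feasible, or Approximate otherwise.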
Evaluation using the computational complexity and runtime. The computational complexity and the practical runtime are also important performance measures for MAP inference methods, as shown in Section 7.5.
Table 2: Results on Seg2, Seg21 and Scene (mean and std of logPot values). Method types: Baseline (LPLP); Hybrid (DAOOPT, MPLPC); Non-hybrid (ogmICM, ogmBP, ogmTRBP, ogmTRWS, ogmADSal, PSDD, AD3, Box); Proposed (LSLP).

Dataset  LPLP  DAOOPT  MPLPC  ogmICM  ogmBP  ogmTRBP  ogmTRWS  ogmADSal  PSDD  AD3  Box  LSLP
Seg2 (mean)  75.5  75.5  75.5  137.1  79  76.8  75.5  75.5  75.4  75.5  76.5  75.6
Seg2 (std)  19.63  19.63  19.63  70.1  20.24  19.36  19.24  19.24  19.77  19.63  20.3  19.69
Seg21 (mean)  324.89  325.34  324.89  393.37  330.37  328.92  324.89  324.89  325.1  324.89  344.51  324.89
Seg21 (std)  58.12  58.14  58.12  74.47  58.54  58.57  56.97  56.97  58.16  58.12  59.24  58.12
Scene (mean)  866.66  866.66  866.66  864.27  866.49  866.51  866.66  866.66  866.65  866.66  864.11  866.66
Scene (std)  109.34  109.34  109.36  109.64  109.22  109.2  109.19  109.19  109.34  109.34  108.66  109.34
Table 3: logPot values on Grids. Method types: Baseline (LPLP); Hybrid (DAOOPT, MPLPC); Non-hybrid (ogmICM, ogmBP, ogmTRBP, ogmDDSG, ogmTRWS, ogmADSal, Box); Proposed (LSLP).

Model  LPLP  DAOOPT  MPLPC  ogmICM  ogmBP  ogmTRBP  ogmDDSG  ogmTRWS  ogmADSal  Box  LSLP
D1  3736.7  3015.7  3015.7  2708.9  121.3  235.2  1286.3  2524.9  2605.2  2794.8  2931.8
D2  3830.3  3051  3033.6  2567.9  276.4  19.2  1484.7  2674.4  2670.2  2812.4  2936.7
D3  5605.1  4517.3  4517.3  4067.3  332.1  14.02  1889.7  3829.3  3884  4301.1  4408.9
D4  5745.5  4563.2  4563.2  3837.12  924.5  36.7  2023.4  3894.6  4015  4202.4  4446.6
D5  1915.2  1542.7  1542.7  1318.41  481.5  47.8  807.6  1325.5  1323.9  1427.4  1503.2
D6  15601.2  12662.9  12665.7  10753.7  2793.5  2214.3  5051.9  10500.8  11029  11486.2  12336.1
D7  16291.5  13050.7  13054.8  10903.8  1217.1  132.4  4634.8  10665  10870.4  11867.6  12537.2
D8  23401.8  18952.45  18896.8  16154.2  4314.9  5371.1  7160  16014  16276.9  17367.5  18358.7
D9  24437.3  19538  19427.5  16334.2  3560.8  1111  7187.3  16004.3  16508.1  17990  18785.8
D10  3121.2  2689  2688.8  2255.38  1665.3  1582.9  1330.7  2215.7  2369.1  2552.6  2659.8
D11  3231.6  2714.67  2714.52  2258.54  1399.6  42.8  1285.9  2271.5  2370.1  2556.8  2654.9
D12  7800.6  6401.15  6396  5356.28  2033.5  1953.1  2832.5  5282.5  5558.8  5903  6201.2
D13  8078.5  6472.9  6469.7  5425.16  1711.5  381  2814.3  5452.8  5646.1  5923.5  6275.4
D14  62943  –  45813.6  43538.9  5690.9  6426.7  18700.4  42274.2  43292.5  44397.5  48766.1
D15  63993.1  –  47444.4  42855  4287.1  956.4  18811.9  42535  42918.7  44759.5  48657.3
D16  94414.5  –  69408.6  65081.2  4374.2  4656.5  27320.6  63148.1  64401.1  66784.2  72993.8
D17  96243.6  –  71730.8  63768.1  13662.7  529.3  27287.7  63885.1  64487.9  67589.4  73486
D18  12721.3  –  10445.8  9062.03  5198.7  4975.4  4785.5  8793.5  9408.4  10015.1  10580.8
D19  12875.6  –  10674.1  9214.57  5944.6  1213.1  5328.5  8952.4  9385.1  10163.6  10698.4
D20  31809.7  –  22292.5  21527.9  5410.9  4762  9837.3  21546.8  22109.5  22913.3  24834.5
D21  31996.9  –  24032.4  21529.6  4242.3  47.6  10423.8  21195.9  21730.3  22668.7  24532.8

Table 4: logPot values on Protein. Method types: Hybrid (MPLPC); Non-hybrid (ogmICM, ogmBP, ogmTRBP, ogmDDSG, PSDD, AD3); Proposed (LSLP).

Model  MPLPC  ogmICM  ogmBP  ogmTRBP  ogmDDSG  PSDD  AD3  LSLP
D1  30181.3  32409.9  32019.1  31671.6  33381.2  30128.8  30143.6  30165.5
D2  29305.4  32561.3  30966.1  31253.3  33583.6  29307.3  29302.6  29295.4
D4  28952.1  32570  31031.4  31176.6  33747.7  28952.5  28952  28952
D5  269567  256489  382766  357330  553376  66132.3  115562  267814
D6  30070.6  31699.1  30765.2  30772.2  32952.9  30063.6  30062.2  30063.4
D7  30288.3  32562.2  31659.6  31791.1  33620.4  30248.5  30239.8  30266
D8  29336.5  32617.2  31064.7  31219.9  34549.9  29331  29336.1  29334.7

7.2 Results on Segmentation and Scene
The average results on Seg2, Seg21 and Scene are shown in Table 2. LPLP gives valid solutions on all models, i.e., the best solutions. Except for PSDD, all other methods give valid solutions, and their logPot values cannot be higher than those of LPLP. The logPot values of ICM are the lowest, and those of ogmBP and ogmTRBP are slightly lower than the best logPot values, while the other methods achieve the best logPot values on most models. Only PSDD gives approximate solutions (i.e., the local-consistency constraints are not fully satisfied) on some models: 5 models in Seg2, 8 models in Seg21 and 166 models in Scene. ogmDDSG fails to give solutions on some models of these datasets, thus we omit it. Evaluations on these easy models only show the performance ranking ogmICM < ogmBP, ogmTRBP, Box < others.
7.3 Results on Grids
The results on Grids are shown in Table 3. For clarity, we use the model indexes D1 to D21 to indicate the model names to save space in this section. The model names corresponding to D1 to D21 are grid20x20.f10.uai, grid20x20.f10.wrap.uai, grid20x20.f15.uai, grid20x20.f15.wrap.uai, grid20x20.f5.wrap.uai, grid40x40.f10.uai, grid40x40.f10.wrap.uai, grid40x40.f15.uai, grid40x40.f15.wrap.uai, grid40x40.f2.uai, grid40x40.f2.wrap.uai, grid40x40.f5.uai, grid40x40.f5.wrap.uai, grid80x80.f10.uai, grid80x80.f10.wrap.uai, grid80x80.f15.uai, grid80x80.f15.wrap.uai, grid80x80.f2.uai, grid80x80.f2.wrap.uai, grid80x80.f5.uai, grid80x80.f5.wrap.uai, respectively.
The models in Grids are very challenging for LP relaxation based methods, as all models have symmetric pairwise log potentials and very dense cycles in the graph. In this case, many vertices of the local polytope are uniform solutions. Consequently, LP relaxation based methods are likely to produce uniform solutions. This is verified by the fact that LPLP, AD3 and PSDD give uniform solutions on all models in Grids. Thus, we only show the logPot values of LPLP in Table 3, to provide the theoretical upper bound on the logPot of valid solutions from other methods. In contrast, the additional sphere constraint in LSLP excludes the uniform solutions. On the small-scale models D1 to D13, DAOOPT and MPLPC show the highest logPot values, while LSLP gives slightly lower values. On the large-scale models D14 to D21, DAOOPT fails to give any result within 2 hours. LSLP gives the best results, while MPLPC shows slightly lower results. Box performs worse than LSLP, MPLPC and DAOOPT on most models, but better than all other methods, among which ogmBP, ogmTRBP and ogmDDSG perform worst. These results demonstrate that 1) LSLP is comparable to the hybrid methods DAOOPT and MPLPC, but with much lower computational cost (shown in Section 7.5); 2) LSLP performs much better than the other approximate methods.
7.4 Results on Protein
The results on Protein are shown in Table 4. Different from the above three datasets, Protein includes 8 large-scale models with high-order factors. Similarly, we use the model indexes D1 to D8 to indicate the model names to save space in this section. The model names corresponding to D1, D2, D4, D5, D6, D7 and D8 are didNotconverge1.uai, didNotconverge2.uai, didNotconverge4.uai, didNotconverge5.uai, didNotconverge6.uai, didNotconverge7.uai and didNotconverge8.uai, respectively. As D1 and D3 are completely the same, we remove D3 from the experiments. DAOOPT fails to give solutions within 2 hours on all models, and LPLP cannot produce solutions due to the memory limit. ogmTRWS and Box are not evaluated, as they cannot handle high-order factors. LSLP produces valid integer solutions on all models, and gives the highest logPot values on all models except D5. MPLPC gives slightly lower logPot values than LSLP. AD3 produces a fractional solution on D4, and approximate and fractional solutions on the other models, while PSDD gives approximate and fractional solutions on all models. The other methods also show much worse performance than LSLP and MPLPC. One exception is that ogmICM gives the best results on D5, and we find that D5 is the most challenging model for approximate methods.
Methods  Complexities 

MPLP  
MPLPC 