MAP Inference via L2-Sphere Linear Program Reformulation

05/09/2019 · Baoyuan Wu et al. · King Abdullah University of Science and Technology

Maximum a posteriori (MAP) inference is an important task for graphical models. Due to complex dependencies among variables in realistic models, finding an exact solution for MAP inference is often intractable. Thus, many approximation methods have been developed, among which the linear programming (LP) relaxation based methods show promising performance. However, one major drawback of LP relaxation is that it may produce fractional solutions. Instead of presenting a tighter relaxation, in this work we propose a continuous but equivalent reformulation of the original MAP inference problem, called LS-LP. We add the L2-sphere constraint onto the original LP relaxation, leading to an intersection with the local marginal polytope that is equivalent to the space of all valid integer label configurations. Thus, LS-LP is equivalent to the original MAP inference problem. We propose a perturbed alternating direction method of multipliers (ADMM) algorithm to optimize the LS-LP problem, by adding a sufficiently small perturbation epsilon onto the objective function and constraints. We prove that the perturbed ADMM algorithm globally converges to the epsilon-Karush-Kuhn-Tucker (epsilon-KKT) point of the LS-LP problem, and we also analyze the convergence rate. Experiments on several benchmark datasets from the Probabilistic Inference Challenge (PIC 2011) and OpenGM 2 show competitive performance of our proposed method against state-of-the-art MAP inference methods.



1 Introduction

Given the probability distribution of a graphical model, maximum a posteriori (MAP) inference aims to infer the most probable label configuration. MAP inference can be formulated as an integer linear program (ILP) variational-inference-book-2008 . However, due to the integer constraint, exact optimization of the ILP is intractable in many realistic problems. To tackle this, a popular approach is to relax the ILP to a continuous linear program over the local marginal polytope (defined in Section 3), called the linear programming (LP) relaxation. The optimal solution to the LP relaxation is attained at a vertex of the local marginal polytope. It is well known variational-inference-book-2008 that all valid integer label configurations are vertices of the local marginal polytope, but not all of its vertices are integer; some are fractional. Since the LP relaxation is likely to give fractional solutions, a rounding step must be adopted to generate integer solutions. To alleviate this issue, intense efforts have been made to design tighter relaxations (e.g., high-order relaxation MAP-LP-thesis-2010 ) based on the LP relaxation, such that the proportion of fractional vertices can be reduced. However, the possibility of fractional solutions remains, and these tighter relaxations are often much more computationally expensive than the original LP relaxation. Moreover, there are also exact inference methods, such as branch-and-bound branch-and-bound-1960 and cutting-plane cutting-plane-1960 , that utilize the LP relaxation as a sub-routine, leading to much higher computational cost than approximate methods.

Instead of proposing a new approximation with a tighter relaxation, we propose an exact reformulation of the original MAP inference problem. Specifically, we add a new constraint, called the ℓ2-sphere constraint WU-lpbox-ADMM-PAMI-2018 , onto the original LP relaxation problem. It enforces that the solution lies on an ℓ2-sphere. We prove that the intersection between the ℓ2-sphere constraint and the local polytope is equivalent to the set of all possible label configurations of the original MAP inference problem, i.e., the constraint space of the ILP problem. Thus, the proposed formulation, dubbed LS-LP, is an equivalent but continuous reformulation of the ILP formulation for MAP inference. Furthermore, inspired by AD3-JMLR-2015 and WU-lpbox-ADMM-PAMI-2018 , we adopt the ADMM algorithm ADMM-boyd-2011 to not only separate the different constraints, but also decompose variables to allow parallel inference by exploiting the factor graph structure. Although the ℓ2-sphere constraint is non-convex, we prove that the ADMM algorithm for the LS-LP problem with a sufficiently small perturbation globally converges to the ϵ-KKT KKT-1939 ; KKT-2014 point of the original LS-LP problem. The main advantages of the proposed LS-LP formulation and the corresponding ADMM algorithm are: 1) compared to other LP relaxation based methods, our method directly gives a valid integer label configuration, without any rounding as post-processing; 2) compared to exact methods like branch-and-bound branch-and-bound-1960 and cutting-plane cutting-plane-1960 , our method solves one single continuous problem once, rather than multiple times. Experiments on benchmarks from the Probabilistic Inference Challenge (PIC 2011) PIC-2011 and OpenGM 2 opengm2-ijcv-2015 verify the competitive performance of LS-LP against state-of-the-art MAP inference methods.

The main contributions of this work are three-fold. 1) We propose a continuous but equivalent reformulation of the MAP inference problem. 2) We present an ADMM algorithm for optimizing the perturbed LS-LP problem, which is proved to converge globally to the ϵ-KKT point of the original LS-LP problem; an analysis of the convergence rate is also presented. 3) Experiments on benchmark datasets verify the competitive performance of our method compared to state-of-the-art MAP inference methods.

2 Related Work

As our method is closely related to LP relaxation based methods, here we mainly review LP relaxation based MAP inference methods. For other categories of methods, such as message passing and move making, we refer the readers to variational-inference-book-2008 and opengm2-ijcv-2015 for more details. Although some off-the-shelf LP solvers can be used to optimize the LP relaxation problem, in many real-world applications the problem scale is too large to adopt these solvers. Hence, most methods focus on developing efficient algorithms to optimize the dual LP problem. Block coordinate descent methods MPLP-NIPS-2007 ; TRWS-PAMI-2006 are fast, but they may converge to sub-optimal solutions. Sub-gradient based methods PSDD-ICCV-2007 ; DD-SG-2012 can converge to global solutions, but their convergence is slow. Their common difficulty is the non-smoothness of the dual objective function. To handle this difficulty, several smoothing methods have been developed. The Lagrangian relaxation method Lagrangian-relaxation-MAP-2007 uses the smooth log-sum-exp function to approximate the non-smooth max function in the dual objective. Alternatively, a proximal regularization proximal-regularization-ICML-2010 or an ℓ2 regularization term smooth-strong-MAP-2015 is added to the dual objective. Moreover, the steepest ϵ-descent methods proposed in epsilon-descent-nips-2012 and FW-epsilon-descent-icml-2014 can accelerate the convergence of standard sub-gradient based methods. Parallel MAP inference methods based on ADMM have also been developed to handle large-scale inference problems. For example, AD3 AD3-JMLR-2015 ; AD3-ICML-2011 and Bethe-ADMM Bethe-ADMM-UAI-2013 optimize the primal LP problem, while ADMM-dual dual-ADMM-ECML-2011 optimizes the dual LP problem. The common drawback of these methods is that they are likely to produce fractional solutions, since the underlying problem is merely a relaxation of the MAP inference problem.

Another direction is pursuing tighter relaxations, such as high-order consistency MAP-LP-thesis-2010 and SDP relaxation SDP-relaxation-2002 , but these are often more computationally expensive than LP relaxation. In contrast, the proposed LS-LP is an exact reformulation of the original MAP inference problem, and the adopted ADMM algorithm explicitly produces valid integer label configurations, without any rounding operation. Compared with other expensive exact MAP inference methods (e.g., branch-and-bound branch-and-bound-1960 and cutting-plane cutting-plane-1960 ), LS-LP is very efficient owing to the resulting parallel inference, similar to other ADMM based methods.

Another related work is ℓp-box ADMM WU-lpbox-ADMM-PAMI-2018 , a framework to optimize general integer programs, in which the integer constraints are replaced by the intersection of two continuous constraints. The proposed LS-LP is inspired by this framework. However, 1) LS-LP is specifically designed for MAP inference, as it replaces the valid integer configuration space (e.g., {(0,1), (1,0)} for a variable with binary states in one-hot encoding), rather than the whole binary space (e.g., {0,1}^2) as done in ℓp-box ADMM. 2) LS-LP is tightly combined with the LP relaxation, and the ADMM algorithm decomposes the problem into multiple simple sub-problems by utilizing the structure of the factor graph, which allows parallel inference for any type of inference problem (e.g., multiple variable states and high-order factors). In contrast, ℓp-box ADMM does not assume any special properties of the objective function, and it optimizes all variable nodes in one sub-problem; especially for large-scale models, this sub-problem could be very costly. 3) As the LP relaxation is parameterized according to the factor graph, any type of graphical model (e.g., directed models, high-order potentials, asymmetric potentials) can be naturally handled by LS-LP. In contrast, ℓp-box ADMM needs to transform the inference objective based on MRF graphs into simpler forms (e.g., a binary quadratic program (BQP)). However, this transformation is non-trivial in some cases; for example, with high-order potentials, the graphical model is difficult to transform into a BQP.

3 Background

3.1 Factor Graph

Denote x = (x_1, ..., x_n) as a set of random variables in a discrete space, where each x_i takes one of h_i possible states. The joint probability of x is formulated based on a factor graph koller-pgm-2009 ,

(1)

where the factor graph consists of the node set of variables, the node set of factors, and the edge set linking variable and factor nodes. A simple example of an MRF and its factor graph is shown in Fig. 1(a,b). We refer the readers to koller-pgm-2009 for the detailed definition of the factor graph. x_f indicates the label configuration of the factor f, and its state is determined by the connected variable nodes. θ_i denotes the unary log potential (logPot) function, while θ_f indicates the factor logPot function.

3.2 MAP Inference as Linear Program

Given the graphical model, an important task is to find the most probable label configuration of the variables, referred to as MAP inference,

(2)

Eq. (2) can be reformulated as the integer linear program (ILP) variational-inference-book-2008 ,

(3)

where θ denotes the log potential (logPot) vector, derived from the unary and factor logPot functions, and μ concatenates the label vectors of all variable and factor nodes. μ_i indicates the label vector corresponding to x_i: if the state of x_i is k, then the k-th entry of μ_i is 1, while all other entries are 0. Similarly, μ_f indicates the label vector corresponding to x_f. The local marginal polytope is defined as follows,

(4)

with the first constraint requiring each μ_i to lie in the probability simplex, and the second constraint ensuring the local consistency between μ_i and μ_f. The matrix M_{if} of the local consistency constraint is defined as follows: an entry of M_{if} is 1 if the corresponding state of x_i and the state of the corresponding element of x_f are the same; otherwise, the entry is 0. For example, consider a binary-state variable node i and a pairwise factor node f connected to two variable nodes (with i being the first). The first entry of μ_i indicates the score of choosing state 0, while the second entry corresponds to that of choosing state 1. The four entries of μ_f indicate the scores of the four label configurations of the two connected variables, i.e., (0,0), (0,1), (1,0), (1,1). In this case, M_{if} = [1, 1, 0, 0; 0, 0, 1, 1].
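As a concrete check of this example, the consistency matrix can be written down and applied to a one-hot factor labelling (a small illustrative sketch; the variable names are ours):

```python
import numpy as np

# Consistency matrix from the example above: binary variable i is the FIRST
# of the two variables joined by pairwise factor f, and the factor
# configurations are ordered (0,0), (0,1), (1,0), (1,1).
M_if = np.array([[1, 1, 0, 0],   # configurations whose first variable is in state 0
                 [0, 0, 1, 1]])  # configurations whose first variable is in state 1

# One-hot factor labelling for configuration (1,0)
mu_f = np.array([0, 0, 1, 0])

# Local consistency: marginalizing the factor labelling recovers the
# unary labelling of variable i (state 1 here)
mu_i = M_if @ mu_f
print(mu_i)  # [0 1]
```

The same check applies to factors of any order: each variable connected to a factor gets its own consistency matrix, one row per variable state.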

Figure 1: An example of (a) MRF, (b) factor graph corresponding to LP and (c) augmented factor graph corresponding to LS-LP.

Moreover, Eq. (2) can also be rewritten as

(5)

where the marginal polytope is defined as follows,

(6)

Solving the problem over the marginal polytope is difficult (NP-hard in general), especially for large-scale problems. Instead, the approximation over the local marginal polytope is widely adopted, as follows:

(7)

which is called the LP relaxation. Note that here the variable and factor label vectors are continuous, and they are considered as local marginals.

According to variational-inference-book-2008 , the characteristics of the marginal polytope and the local marginal polytope, and their relationship, are briefly summarized in Lemma 1.

Lemma 1

variational-inference-book-2008 The relationships between the marginal polytope, the local marginal polytope, and the corresponding optimization problems are as follows.

  • The marginal polytope is a subset of the local marginal polytope;

  • The ILP problem (3) and the problem (5) over the marginal polytope have the same optimal value;

  • All vertices of the marginal polytope are integer, while the local marginal polytope includes both integer and fractional vertices. The set of integer vertices of the local marginal polytope is the same as the set of vertices of the marginal polytope. All non-vertex points of both polytopes are fractional.

  • Since both polytopes are convex, the global solutions of (5) and (7) lie on the vertices of the marginal polytope and the local marginal polytope, respectively.

  • The global solution of the LP relaxation (7) can be fractional or integer. If it is integer, then it is also the global solution of the ILP (3).

3.3 Kurdyka-Lojasiewicz Inequality

The Kurdyka-Łojasiewicz inequality was first proposed in lojasiewicz1963propriete , and it has been widely used in many recent works attouch2010proximal ; wotao-yin-arxiv-2015 ; admm-nonconvex-siam-2015 for the convergence analysis of non-convex problems. Since it will also be used in the convergence analysis of our algorithm, it is first reproduced here, as shown in Definition 1.

Definition 1

attouch2010proximal A function f is said to have the Kurdyka-Łojasiewicz (KL) property at a point x* in the domain of its sub-gradient ∂f, if the following two conditions hold:

  • there exist a constant η ∈ (0, +∞], a neighborhood U of x*, and a continuous concave function φ: [0, η) → R+ with φ(0) = 0, such that φ is differentiable on (0, η) with positive derivatives;

  • for all x ∈ U satisfying f(x*) < f(x) < f(x*) + η, the Kurdyka-Łojasiewicz inequality holds:

    φ'(f(x) − f(x*)) · dist(0, ∂f(x)) ≥ 1.    (8)

Remark. According to attouch2010proximal ; bolte2007lojasiewicz ; bolte2007clarke , if f is semialgebraic, then it satisfies the KL property with φ(s) = c·s^(1−θ), where c > 0 and θ ∈ [0, 1) are constants. This fact will be used in the later analysis of convergence.

4 MAP Inference via ℓ2-Sphere Linear Program Reformulation

4.1 Equivalent Reformulation

Firstly, we introduce the ℓ2-sphere constraint WU-lpbox-ADMM-PAMI-2018 ,

(9)

Note that the ℓ2-sphere constraint is defined with respect to the whole vector μ_i, rather than its individual scalar entries. The integer constraint space is tighter, comprising one non-convex constraint per entry, while the ℓ2-sphere introduces only one non-convex constraint. We propose to add the ℓ2-sphere constraint onto the variable nodes. Combining this with the LP relaxation (see Eq. (7)), we propose a new formulation for MAP inference,

(10)

Due to the non-convex ℓ2-sphere constraint, it is no longer a linear program. However, to emphasize its relationship to the LP relaxation, we still refer to it as the ℓ2-sphere constrained linear program (LS-LP) reformulation. More importantly, as shown in Theorem 4.1, LS-LP is equivalent to the original MAP inference problem, rather than a relaxation as in LP. Inspired by the constraint separation in ℓp-box ADMM WU-lpbox-ADMM-PAMI-2018 , we introduce extra variables to reformulate (10) as

(11)

where the new vector is the concatenation of all extra variable nodes. The combination of the original factor graph and these extra variable nodes is referred to as the augmented factor graph (AFG). An example of the AFG corresponding to Problem (11) is shown in Figure 1(c). The gray squares correspond to the extra variables, and connections to the purple box indicate the ℓ2-sphere constraint. Note that the AFG does not satisfy the definition of a standard factor graph, as it is not a bipartite graph where connections exist only between variable nodes and factor nodes. However, the AFG provides a clear picture of the structure of LS-LP and the node relationships. The proposed LS-LP problem is equivalent to the original MAP inference problem, as shown in Theorem 4.1; that is, the global solutions of these two problems are equivalent.
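To see why the added constraint removes fractional points, consider a single 3-state variable. The following sketch assumes the sphere takes the ℓp-box form (centered at 0.5·1 with squared radius n/4); it checks that a one-hot labelling lies on both the simplex and the sphere, while a fractional point in the simplex does not:

```python
import numpy as np

# Assumed form of the l2-sphere (following lp-box ADMM):
#   S = { x in R^n : ||x - 0.5*1||_2^2 = n/4 }
# Intersected with the probability simplex, only one-hot vectors survive.
def on_sphere(x, tol=1e-9):
    n = len(x)
    return abs(np.sum((x - 0.5) ** 2) - n / 4.0) < tol

def on_simplex(x, tol=1e-9):
    return bool(np.all(x >= -tol)) and abs(x.sum() - 1.0) < tol

one_hot = np.array([0.0, 1.0, 0.0])
fractional = np.array([0.5, 0.5, 0.0])

print(on_simplex(one_hot) and on_sphere(one_hot))        # True
print(on_simplex(fractional) and on_sphere(fractional))  # False
```

For the one-hot vector the squared distance to the center is 0.25 + 0.25 + 0.25 = 3/4 = n/4, whereas any fractional simplex point is strictly closer to the center.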

Lemma 2

The following constraint spaces are equivalent.

(12)
Proof

Firstly, we focus on the first relation in (12). We have

(13)

Besides, the following relations hold

(14)

The equality in the last relation holds if and only if μ_i is a vertex of the simplex. Combining with (13), we conclude that μ_i is integer. Consequently, utilizing the local consistency constraint, we obtain that μ_f is also integer. Thus, the first relation in (12) is proved.

Besides, as shown in Lemma 1, the set of integer vertices of the local marginal polytope is the same as that of the marginal polytope, and all non-vertex points of both polytopes are fractional. Thus, the second relation in (12) follows, which finishes the proof.

Theorem 4.1

Utilizing Lemma 2, the aforementioned MAP inference problems have the following relationships,

(15)
Proof

According to Lemma 1.3 and Lemma 1.4, the global solution of the problem over the marginal polytope lies on its vertices, i.e., the integer points; hence we have

(16)

Then, utilizing the equivalences between the constraint spaces shown in Lemma 2 (see Eq. (12)), we obtain

(17)

Combining with Lemma 1.2, the proof is finished.

4.2 A General Form and KKT Conditions

For clarity, we first simplify the notations and formulations in (11) into the following general form,

(18)

where μ1 concatenates the label vectors of all variable and factor nodes, μ2 concatenates the extra variables of the ℓ2-sphere constraints, the constraint matrix encodes the local consistency constraints, and the neighborhood of each factor node is the set of variable nodes connected to it.

Definition 2

A solution of the LS-LP problem (18) is said to be a KKT point if the following conditions are satisfied:

(19)

where λ denotes the Lagrangian multiplier; the conditions involve the sub-gradient of the non-smooth term and the gradient of the smooth term. Moreover, a solution is considered an ϵ-KKT point if the following conditions hold:

(20)
1: Initialize the variables, the perturbation ϵ, and the hyper-parameter ρ
2:for  to  do:
3:     Update μ1 as follows (see Section 5.1 for details)
(21)
4:     Update μ2 as follows (see Section 5.2 for details)
(22)
5:     Update λ as follows (see Section 5.3 for details)
(23)
6:     Check stopping criterion, as shown in Section 5.4
7:end for
8:return
Algorithm 1 The perturbed ADMM algorithm

5 Perturbed ADMM Algorithm for LS-LP

We propose a perturbed ADMM algorithm to optimize the following perturbed augmented Lagrangian function,

(24)

where a sufficiently small constant ϵ > 0 perturbs the objective and the constraints. Note that both perturbed constraint matrices are full row rank, and the second-order gradient of the objective is bounded; these properties play key roles in our later analysis of convergence.

Following the conventional ADMM algorithm, the solution to the LS-LP problem (18) can be obtained by iteratively optimizing the following sub-problems based on (24). The general structure of the algorithm is summarized in Algorithm 1.
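The loop of Algorithm 1 can be paraphrased as the following skeleton (our own stub, not the paper's code; the three sub-problem solvers of Sections 5.1-5.3 are passed in as functions, and the dual step assumes the coupling constraint is μ1 = μ2):

```python
import numpy as np

# Skeleton of the perturbed ADMM loop. update_mu1/update_mu2 stand in for the
# sub-problem solvers; residuals returns the constraint violations of Sec. 5.4.
def perturbed_admm(mu1, mu2, lam, rho, n_iters,
                   update_mu1, update_mu2, residuals, eta=1e-6):
    for _ in range(n_iters):
        mu1 = update_mu1(mu2, lam, rho)     # Eq. (21): parallel over variable/factor nodes
        mu2 = update_mu2(mu1, lam, rho)     # Eq. (22): closed form
        lam = lam + rho * (mu1 - mu2)       # Eq. (23): dual ascent on the coupling
        if max(residuals(mu1, mu2)) < eta:  # Sec. 5.4: joint stopping criterion
            break
    return mu1, mu2, lam

# Trivial usage: the two updates agree immediately, so the loop stops early.
m1, m2, lam = perturbed_admm(
    np.array([1.0]), np.array([0.0]), np.array([0.0]), rho=1.0, n_iters=10,
    update_mu1=lambda m2, l, r: m2.copy(),
    update_mu2=lambda m1, l, r: m1.copy(),
    residuals=lambda a, b: [float(np.max(np.abs(a - b)))])
print(np.allclose(m1, m2))  # True
```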

5.1 Sub-Problem w.r.t. μ1 in the LS-LP Problem

Given μ2 and λ, μ1 can be updated by solving the sub-problem (21) (see Algorithm 1). This problem can be further separated into the following two independent sub-problems, which can be solved in parallel.

Update μ_i:

(25)

It has a closed-form solution as follows

(26)

where the closed-form solution is given by the projection onto the ℓ2-sphere. As demonstrated in WU-lpbox-ADMM-PAMI-2018 , this projected solution is the optimal solution to (25).
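Assuming the sphere takes the ℓp-box form ||x − 0.5·1||_2 = sqrt(n)/2, the projection has the following closed form (an illustrative sketch, not the paper's code):

```python
import numpy as np

def project_onto_sphere(v):
    """Euclidean projection onto S = { x : ||x - 0.5*1||_2 = sqrt(n)/2 }.

    Assumed form of the sphere, following lp-box ADMM: shift the point to the
    sphere's center, then rescale the offset to the sphere's radius.
    """
    n = v.size
    d = v - 0.5
    norm = np.linalg.norm(d)
    if norm == 0:                       # center point: every direction projects
        d, norm = np.ones(n), np.sqrt(n)
    return 0.5 + (np.sqrt(n) / 2.0) * d / norm

x = project_onto_sphere(np.array([0.9, 0.2, 0.1]))
print(np.isclose(np.linalg.norm(x - 0.5), np.sqrt(3) / 2))  # True
```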

Update μ_f: The sub-problems with respect to the factor nodes can be run in parallel,

(27)

where the index set denotes the variable nodes connecting to the factor node f. Problem (27) is convex, as its quadratic term is positive semi-definite and the probability simplex is a convex set. Any off-the-shelf QP solver can be adopted to solve (27). In experiments, we adopt the active-set algorithm implemented in a publicly available toolbox called Quadratic Programming in C (QPC)111http://sigpromu.org/quadprog/download.php?sid=3wtwk5tb, which is written in C and can be called from MATLAB.
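As a stand-in for the QPC active-set solver, a small simplex-constrained QP of the shape in (27) can be sketched with SciPy's SLSQP (the matrix Q and vector q below are made-up values, not the paper's data):

```python
import numpy as np
from scipy.optimize import minimize

# Tiny convex QP over the probability simplex:
#   min_x  0.5 * x' Q x + q' x   s.t.  sum(x) = 1,  0 <= x <= 1
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # positive definite, so the problem is convex
q = np.array([-1.0, -0.2])

res = minimize(lambda x: 0.5 * x @ Q @ x + q @ x,
               x0=np.array([0.5, 0.5]),
               constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],
               bounds=[(0.0, 1.0), (0.0, 1.0)])
print(res.x.sum())  # ≈ 1.0, a point in the simplex
```

In the actual algorithm one such QP is solved per factor node, which is what makes the inference parallelizable.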

5.2 Sub-Problem w.r.t. μ2 in the LS-LP Problem

Given μ1 and λ, μ2 can be updated by solving the sub-problem (22) (see Algorithm 1). According to the definition of μ2, this problem can be separated into independent sub-problems, as follows:

(28)
(29)

The closed-form solution of each sub-problem can be easily obtained by setting its gradient to 0.
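For instance, if one such sub-problem takes the generic form min_z −λᵀz + (ρ/2)·||μ1 − z||² (our notation; an assumed shape consistent with an augmented Lagrangian coupling term), zeroing the gradient −λ + ρ(z − μ1) = 0 gives z = μ1 + λ/ρ:

```python
import numpy as np

# Hypothetical closed-form update obtained by zeroing the gradient of
#   g(z) = -lam @ z + (rho / 2) * ||mu1 - z||^2
def update_mu2(mu1, lam, rho):
    return mu1 + lam / rho

mu1, lam, rho = np.array([0.2, 0.8]), np.array([0.1, -0.1]), 2.0
print(update_mu2(mu1, lam, rho))  # [0.25 0.75]
```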

5.3 Update of λ in the LS-LP Problem

Given μ1 and μ2, the multiplier λ is updated using (23) (see Algorithm 1). Similarly, it can be separated into independent sub-problems, as follows

(30)
(31)

5.4 Complexity and Implementation Details

Complexity.   In terms of computational complexity, as all other update steps have simple closed-form solutions, the main computational cost lies in updating the factor marginals, i.e., solving a convex quadratic program with a probability simplex constraint.

As the largest matrix in LS-LP is the local consistency constraint matrix, the space complexity is dominated by its size. Both the computational and space complexity of AD3 are similar to those of LS-LP. A more detailed analysis of the computational complexity is presented in Section 7.5.

Implementation details.   In each iteration, we use the same value of the penalty parameter ρ for all sub-problems. After each iteration, we update ρ using an incremental rate. An upper limit of ρ is also set: once ρ exceeds this limit, it is not updated anymore. The perturbation ϵ is set to a small constant. We utilize two stopping criteria jointly: 1) the violation of the local consistency constraint; 2) the violation of the equivalence constraint between the original and extra variables. We set the same threshold for both criteria; if both violations are simultaneously lower than the threshold, the algorithm stops.
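The ρ schedule and the joint stopping test described above can be sketched as follows (the rate, cap, and threshold values are placeholders of our own, not the paper's settings):

```python
# Placeholder hyper-parameters: growth rate, cap on rho, stopping threshold.
rho, rate, rho_max, eta = 1.0, 1.03, 1e4, 1e-6

def step_rho(rho):
    # Multiplicative increase, frozen once the upper limit is reached.
    return min(rho * rate, rho_max)

def should_stop(local_violation, equality_violation, eta=eta):
    # Stop only when BOTH constraint violations fall below the shared threshold.
    return local_violation < eta and equality_violation < eta

rho = step_rho(rho)
print(rho)                       # 1.03
print(should_stop(1e-7, 5e-7))   # True
```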

6 Convergence Analysis

The convergence property of the above ADMM algorithm is stated in Theorem 6.1. Due to the space limit, the detailed proof is presented in the Appendix.

Theorem 6.1

Suppose the penalty parameter ρ is set larger than a certain constant. Then the variable sequence generated by the perturbed ADMM algorithm globally converges to a limit point, which is the ϵ-KKT point of the LS-LP problem (18), as defined in Definition 2.

Furthermore, according to Definition 1, we assume that the perturbed augmented Lagrangian has the KL property at this limit point with the concave function φ(s) = c·s^(1−θ), where θ ∈ [0, 1). Consequently, we obtain the following convergence rates:

  1. If θ = 0, then the perturbed ADMM algorithm converges in finitely many steps.

  2. If θ ∈ (0, 1/2], then we obtain the ϵ-KKT solution to the LS-LP problem at a linear convergence rate.

  3. If θ ∈ (1/2, 1), then we obtain the ϵ-KKT solution to the LS-LP problem at a sublinear convergence rate.

Proof

The general structure of the proof consists of the following six consecutive steps, as follows:

  1. The perturbed augmented Lagrangian function (see (24)) is monotonically decreasing along the optimization.

  2. The variable sequence is bounded.

  3. The sequence of variable residuals converges to zero as the iteration number goes to infinity.

  4. The variable sequence globally converges to a cluster point.

  5. is the -KKT point of the LS-LP problem (18).

  6. We finally analyze the convergence rate, i.e., how many steps are required to achieve the ϵ-KKT point.

The detailed proof will be presented in Appendix.

Dataset    C1    C2        C3        C4        C5     C6    C7
Seg-2      50    229.14    622.28    1244.56   2      2     4
Seg-21     50    229.14    622.28    1244.56   2      21    441
Scene      715   182.56    488.99    977.98    2      8     64
Grids      21    3142.86   6236.19   12472.4   2      2     4
Protein    7     14324.7   21854.7   57680.4   2.64   2     6.56
Table 1: Benchmark datasets used in the Probabilistic Inference Challenge (PIC 2011) PIC-2011 and OpenGM 2 opengm2-ijcv-2015 . Columns C1 to C7 represent: number of instances, average number of variables, average number of factors, average number of edges, average factor size, average number of variable states, and average number of factor states.

7 Experiments

7.1 Experimental Settings

7.1.1 Datasets

We evaluate on four benchmark datasets from the Probabilistic Inference Challenge (PIC 2011) PIC-2011 and OpenGM 2 opengm2-ijcv-2015 , including Segmentation PIC-2011 , Scene scene-data-iccv-2009 , Grids PIC-2011 , and Protein protein-data-2006 , as shown in Table 1. Segmentation consists of Seg-2 and Seg-21, with different variable states. Protein includes higher-order potentials, while others are pairwise graphs.

7.1.2 Compared Methods

We compare with different categories of MAP inference methods, including: 1) move-making methods, i.e., ICM ICM-1986 ; 2) message-passing methods, including belief propagation (BP) BP-2001 and TRBP TRBP-2005 ; 3) polyhedral methods (including LP relaxation based methods), including dual decomposition using sub-gradient (DD-SG) DD-SG-2012 , TRWS TRWS-PAMI-2006 , ADSal ADSal-2012 , PSDD PSDD-ICCV-2007 and AD3 AD3-ICML-2011 ; AD3-JMLR-2015 . 4) We also compare with LP-LP, which calls the active-set method (implemented by linprog in MATLAB) to optimize the LP relaxation; it serves as a baseline against which the other methods are measured. 5) The most related work, the ℓp-Box ADMM algorithm WU-lpbox-ADMM-PAMI-2018 , is also compared. However, the algorithm presented in WU-lpbox-ADMM-PAMI-2018 can only handle MRF models with pairwise potentials, which are formulated as binary quadratic programming (BQP) problems. Thus, ℓp-Box ADMM (hereafter ℓp-Box for clarity) is not compared on Protein, whose models include high-order potentials. 6) We also compare with two hybrid methods: DAOOPT (adopting branch-and-bound branch-and-bound-1960 as a sub-routine) daoopt-2012 ; DAOOPT-2012-details and MPLP-C mplp-c-uai-2012 (adopting MPLP MPLP-NIPS-2007 as a sub-routine). 'Hybrid' indicates that the method combines an off-the-shelf single method with some heuristic steps; we refer to the above five categories as non-hybrid methods. Both the proposed LS-LP and ℓp-Box are implemented in MATLAB. The following methods are implemented via the authors' C++ packages: PSDD and AD3222http://www.cs.cmu.edu/~ark/AD3/, MPLP-C333https://github.com/opengm/MPLP, and DAOOPT444https://github.com/lotten/daoopt. All other methods are implemented through the OpenGM 2 software opengm2-ijcv-2015 , and we add the prefix "ogm-" before the method name, e.g., ogm-TRWS.

In experiments, we set some upper limits: the maximal number of iterations is 2000 for PSDD and AD3, 500 for ℓp-Box and LS-LP, and 1000 for the other methods; for DAOOPT, the memory limit of mini buckets is set to 4000 MB and the upper time limit to 2 hours. The parameter tuning of all compared methods (except ℓp-Box) is self-included in their implementations. Both LS-LP and ℓp-Box are ADMM algorithms, and their hyper-parameters (see the implementation details in Section 5) are adjusted over ranges of candidate values, and those leading to the highest logPot value are used.

7.1.3 Evaluation Metrics

We evaluate the performance of all compared methods using three types of metrics, including the log potential (logPot) values, the solution type, as well as the computational complexity and runtime.

Evaluation using logPot values.   The logPot value indicates the objective value of Eq. (7). Given that the constraints in (7) are satisfied, a larger logPot value indicates better inference performance. Since LP-LP gives the optimal solution to (7) over the local marginal polytope, and the constraint space of the ILP is a subset of it, the logPot value of any valid integer label configuration cannot be larger than that of LP-LP. Note that in the implementation of OpenGM 2 opengm2-ijcv-2015 , a rounding method is adopted as a post-processing step to produce integer solutions for continuous MAP inference methods. However, the performance of different MAP inference methods may be significantly changed by rounding. Thus, for the methods not implemented in OpenGM 2, we report the logPot values of the original continuous solutions, without any rounding.

Evaluation using solution types.   Since LP-LP, PSDD, AD3, ℓp-Box and LS-LP may give continuous solutions, a larger logPot value does not always mean a better MAP inference result. Thus, we also define four qualitative measures, including valid, fractional, approximate and uniform, to intuitively measure the inference quality. Valid (V) means that the solution is integer and satisfies the local consistency constraints; Fractional (F) indicates that the solution satisfies the constraints, but some of its values are fractional; Approximate (A) means that some constraints are violated, whether the solution is integer or fractional; Uniform (U) denotes that the solution satisfies the constraints, but its values are uniform, such as (0.5, 0.5) for a variable node with binary states.
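The four solution types can be sketched as a simple classifier applied to a single variable's marginal (the thresholds and the ordering of the checks are our own choices):

```python
import numpy as np

def solution_type(mu, tol=1e-6):
    """Classify one variable's marginal as V/F/A/U (illustrative sketch)."""
    n = len(mu)
    # Approximate: simplex constraints violated.
    if np.any(mu < -tol) or abs(mu.sum() - 1.0) > tol:
        return "A"
    # Uniform: all states equally weighted (checked before the integer test).
    if np.allclose(mu, 1.0 / n, atol=tol):
        return "U"
    # Valid: every entry is 0 or 1, i.e., an integer labelling.
    if np.all((np.abs(mu) < tol) | (np.abs(mu - 1.0) < tol)):
        return "V"
    # Otherwise feasible but fractional.
    return "F"

print(solution_type(np.array([0.0, 1.0])))   # V
print(solution_type(np.array([0.5, 0.5])))   # U (binary uniform)
print(solution_type(np.array([0.7, 0.3])))   # F
print(solution_type(np.array([0.7, 0.7])))   # A (sums to 1.4)
```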

Evaluation using the computational complexity and runtime.   The computational complexity and the practical runtime are also important performance measures for MAP inference methods, as shown in Section 7.5.

Method types: Baseline (LP-LP); Hybrid methods (DAOOPT, MPLP-C); Non-hybrid methods (ogm-ICM, ogm-BP, ogm-TRBP, ogm-TRWS, ogm-ADSal, PSDD, AD3, ℓp-Box); Proposed (LS-LP).

Dataset        LP-LP    DAOOPT   MPLP-C   ogm-ICM  ogm-BP   ogm-TRBP  ogm-TRWS  ogm-ADSal  PSDD     AD3      ℓp-Box   LS-LP
Seg-2   mean   -75.5    -75.5    -75.5    -137.1   -79      -76.8     -75.5     -75.5      -75.4    -75.5    -76.5    -75.6
        std    19.63    19.63    19.63    70.1     20.24    19.36     19.24     19.24      19.77    19.63    20.3     19.69
Seg-21  mean   -324.89  -325.34  -324.89  -393.37  -330.37  -328.92   -324.89   -324.89    -325.1   -324.89  -344.51  -324.89
        std    58.12    58.14    58.12    74.47    58.54    58.57     56.97     56.97      58.16    58.12    59.24    58.12
Scene   mean   866.66   866.66   866.66   864.27   866.49   866.51    866.66    866.66     866.65   866.66   864.11   866.66
        std    109.34   109.34   109.36   109.64   109.22   109.2     109.19    109.19     109.34   109.34   108.66   109.34
Table 2: LogPot values of MAP inference solutions on Seg-2, Seg-21 and Scene. Except for PSDD, all other methods give valid solutions. The best result among valid solutions in each row is highlighted in bold. Please refer to Section 7.2 for details.

Method types: Baseline (LP-LP); Hybrid methods (DAOOPT, MPLP-C); Non-hybrid methods (ogm-ICM, ogm-BP, ogm-TRBP, ogm-DD-SG, ogm-TRWS, ogm-ADSal, ℓp-Box); Proposed (LS-LP).

Model  LP-LP    DAOOPT    MPLP-C    ogm-ICM   ogm-BP   ogm-TRBP  ogm-DD-SG  ogm-TRWS  ogm-ADSal  ℓp-Box    LS-LP     Rank
D1     3736.7   3015.7    3015.7    2708.9    121.3    -235.2    1286.3     2524.9    2605.2     2794.8    2931.8    3
D2     3830.3   3051      3033.6    2567.9    276.4    19.2      1484.7     2674.4    2670.2     2812.4    2936.7    3
D3     5605.1   4517.3    4517.3    4067.3    332.1    14.02     1889.7     3829.3    3884       4301.1    4408.9    3
D4     5745.5   4563.2    4563.2    3837.12   924.5    -36.7     2023.4     3894.6    4015       4202.4    4446.6    3
D5     1915.2   1542.7    1542.7    1318.41   481.5    -47.8     807.6      1325.5    1323.9     1427.4    1503.2    3
D6     15601.2  12662.9   12665.7   10753.7   2793.5   2214.3    5051.9     10500.8   11029      11486.2   12336.1   3
D7     16291.5  13050.7   13054.8   10903.8   1217.1   132.4     4634.8     10665     10870.4    11867.6   12537.2   3
D8     23401.8  18952.45  18896.8   16154.2   4314.9   5371.1    7160       16014     16276.9    17367.5   18358.7   3
D9     24437.3  19538     19427.5   16334.2   3560.8   -1111     7187.3     16004.3   16508.1    17990     18785.8   3
D10    3121.2   2689      2688.8    2255.38   1665.3   1582.9    1330.7     2215.7    2369.1     2552.6    2659.8    3
D11    3231.6   2714.67   2714.52   2258.54   1399.6   42.8      1285.9     2271.5    2370.1     2556.8    2654.9    3
D12    7800.6   6401.15   6396      5356.28   2033.5   1953.1    2832.5     5282.5    5558.8     5903      6201.2    3
D13    8078.5   6472.9    6469.7    5425.16   1711.5   381       2814.3     5452.8    5646.1     5923.5    6275.4    3

For models D14 to D21, only ten results per model are available; the first value is LP-LP and the last is LS-LP:
D14: 62943, 45813.6, 43538.9, 5690.9, 6426.7, 18700.4, 42274.2, 43292.5, 44397.5, 48766.1 (LS-LP rank 1)
D15: 63993.1, 47444.4, 42855, 4287.1, 956.4, 18811.9, 42535, 42918.7, 44759.5, 48657.3 (LS-LP rank 1)
D16: 94414.5, 69408.6, 65081.2, 4374.2, 4656.5, 27320.6, 63148.1, 64401.1, 66784.2, 72993.8 (LS-LP rank 1)
D17: 96243.6, 71730.8, 63768.1, 13662.7, -529.3, 27287.7, 63885.1, 64487.9, 67589.4, 73486 (LS-LP rank 1)
D18: 12721.3, 10445.8, 9062.03, 5198.7, 4975.4, 4785.5, 8793.5, 9408.4, 10015.1, 10580.8 (LS-LP rank 1)
D19: 12875.6, 10674.1, 9214.57, 5944.6, 1213.1, 5328.5, 8952.4, 9385.1, 10163.6, 10698.4 (LS-LP rank 1)
D20: 31809.7, 22292.5, 21527.9, 5410.9, 4762, 9837.3, 21546.8, 22109.5, 22913.3, 24834.5 (LS-LP rank 1)
D21: 31996.9, 24032.4, 21529.6, 4242.3, 47.6, 10423.8, 21195.9, 21730.3, 22668.7, 24532.8 (LS-LP rank 1)

Table 3: MAP inference results on the Grids dataset. LP-LP, PSDD and AD3 produce uniform solutions on all models in Grids, while all other methods give integer solutions. Here we only show the logPot of LP-LP as an upper bound for the other methods. The best logPot among integer solutions in each row is highlighted in bold. The Rank entry indicates the performance ranking of LS-LP. Please refer to Section 7.3 for details.

Method type:  Hybrid    Non-hybrid                                                       Proposed
Model         MPLP-C    ogm-ICM   ogm-BP    ogm-TRBP  ogm-DD-SG  PSDD      AD3       LS-LP
D1            -30181.3  -32409.9  -32019.1  -31671.6  -33381.2   -30128.8  -30143.6  -30165.5  ①
D2            -29305.4  -32561.3  -30966.1  -31253.3  -33583.6   -29307.3  -29302.6  -29295.4  ①
D4            -28952.1  -32570    -31031.4  -31176.6  -33747.7   -28952.5  -28952    -28952    ①
D5            -269567   -256489   -382766   -357330   -553376    -66132.3  -115562   -267814   ②
D6            -30070.6  -31699.1  -30765.2  -30772.2  -32952.9   -30063.6  -30062.2  -30063.4  ①
D7            -30288.3  -32562.2  -31659.6  -31791.1  -33620.4   -30248.5  -30239.8  -30266    ①
D8            -29336.5  -32617.2  -31064.7  -31219.9  -34549.9   -29331    -29336.1  -29334.7  ①
Table 4: LogPot values of MAP inference solutions on the Protein dataset. Except for PSDD and AD3, all methods give integer solutions. The solution types of PSDD on D1 to D8 are: . Those of AD3 are: . The best result among valid solutions in each row is highlighted in bold. The circled number indicates the performance ranking of LS-LP. Please refer to Section 7.4 for details.

7.2 Results on Segmentation and Scene

The average results on Seg-2, Seg-21 and Scene are shown in Table 2. LP-LP gives valid solutions on all models, i.e., the best solutions. Except for PSDD, all other methods give valid solutions, and their logPot values cannot exceed those of LP-LP. The logPot values of ogm-ICM are the lowest, and those of ogm-BP and ogm-TRBP are slightly below the best, while the other methods achieve the best logPot values on most models. Only PSDD gives approximate solutions (i.e., the constraints in are not fully satisfied) on some models: 5 models in Seg-2, 8 in Seg-21 and 166 in Scene. ogm-DD-SG fails to give solutions on some models of these datasets, so we omit it. Evaluations on these easy models only establish the performance ranking ogm-ICM < ogm-BP, ogm-TRBP, L2-Box < others.
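The logPot comparisons above simply evaluate the linear objective <theta, mu> at each returned solution; since it accepts fractional marginals, the same score applies to relaxed solutions such as LP-LP, which upper-bound every integer labeling. A minimal sketch of that evaluation (the function name and the toy 2-node model are illustrative assumptions, not code or numbers from the paper):

```python
import numpy as np

def log_pot(theta_unary, theta_pair, mu_unary, mu_pair):
    """LP objective <theta, mu>: the 'logPot' score used to compare methods.
    Works for both integer (one-hot) and fractional marginals."""
    val = sum(float(np.dot(t, m)) for t, m in zip(theta_unary, mu_unary))
    val += sum(float(np.sum(t * m)) for t, m in zip(theta_pair, mu_pair))
    return val

# Toy 2-node model (hypothetical numbers, not from the benchmarks):
theta_u = [np.array([0.0, 1.0]), np.array([0.0, 1.0])]
theta_p = [np.array([[2.0, 0.0], [0.0, 2.0]])]

# Integer labeling (1, 1): unary marginals are one-hot, the edge marginal
# is their outer product.
mu_u = [np.array([0.0, 1.0]), np.array([0.0, 1.0])]
mu_p = [np.outer(mu_u[0], mu_u[1])]
score = log_pot(theta_u, theta_p, mu_u, mu_p)  # 1 + 1 + 2 = 4.0
print(score)
```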

7.3 Results on Grids

The results on Grids are shown in Table 3. For clarity and to save space, we use the indexes D1 to D21 to denote the model names in this section. The names corresponding to D1 to D21 are grid20x20.f10.uai, grid20x20.f10.wrap.uai, grid20x20.f15.uai, grid20x20.f15.wrap.uai, grid20x20.f5.wrap.uai, grid40x40.f10.uai, grid40x40.f10.wrap.uai, grid40x40.f15.uai, grid40x40.f15.wrap.uai, grid40x40.f2.uai, grid40x40.f2.wrap.uai, grid40x40.f5.uai, grid40x40.f5.wrap.uai, grid80x80.f10.uai, grid80x80.f10.wrap.uai, grid80x80.f15.uai, grid80x80.f15.wrap.uai, grid80x80.f2.uai, grid80x80.f2.wrap.uai, grid80x80.f5.uai, grid80x80.f5.wrap.uai, respectively.

The models in Grids are much more challenging for LP relaxation based methods, as all models have symmetric pairwise log-potentials and very dense cycles in the graph. In this case, many vertices of are uniform solutions . Consequently, LP relaxation based methods are likely to produce uniform solutions. This is verified by the observation that LP-LP, AD3 and PSDD give uniform solutions on all models in Grids, i.e., most solutions are . Thus, we only show the logPot values of LP-LP in Table 3, as the theoretical upper bound on the logPot of valid solutions from the other methods. In contrast, the additional L2-sphere constraint in LS-LP excludes the uniform solutions. On the small-scale models D1 to D13, DAOOPT and MPLP-C show the highest logPot values, while LS-LP gives slightly lower values. On the large-scale models D14 to D21, DAOOPT fails to give any result within 2 hours. LS-LP gives the best results, with MPLP-C slightly lower. L2-Box performs worse than LS-LP, MPLP-C and DAOOPT on most models, but better than all remaining methods, among which ogm-BP, ogm-TRBP and ogm-DD-SG perform worst. These results demonstrate that 1) LS-LP is comparable to the hybrid methods DAOOPT and MPLP-C, but with much lower computational cost (shown in Section 7.5); and 2) LS-LP performs much better than other approximate methods.
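The uniform-vertex issue, and why the L2-sphere constraint rules it out, can be seen on a toy example. The numbers below are illustrative assumptions (a single edge with zero unaries and a symmetric pairwise potential), not a benchmark model: the uniform point of the local marginal polytope attains the same LP objective as the best integer labeling, yet it violates the per-node constraint ||mu_i||^2 = 1, which only one-hot vectors satisfy on the simplex.

```python
import numpy as np

# Symmetric pairwise log-potential on one edge, zero unary potentials.
theta_pair = np.array([[1.0, 0.0], [0.0, 1.0]])

# Best integer labeling: both nodes take label 0.
mu0 = np.array([1.0, 0.0])
integer_obj = float(theta_pair[0, 0])  # objective = 1.0

# Uniform point of the local polytope: unaries (0.5, 0.5) with the
# consistent edge marginal diag(0.5, 0.5) (its rows/columns sum to the
# unaries) achieves the SAME LP objective ...
mu_uniform = np.array([0.5, 0.5])
edge_marginal = np.diag([0.5, 0.5])
uniform_obj = float(np.sum(theta_pair * edge_marginal))  # also 1.0

# ... but fails the L2-sphere constraint ||mu_i||^2 = 1 added per node:
sq_integer = float(np.dot(mu0, mu0))            # 1.0  (feasible)
sq_uniform = float(np.dot(mu_uniform, mu_uniform))  # 0.5  (excluded)
print(integer_obj, uniform_obj, sq_integer, sq_uniform)
```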

7.4 Results on Protein

The results on Protein are shown in Table 4. Different from the above three datasets, Protein includes 8 large-scale models with high-order factors. As before, we use the indexes D1 to D8 to denote the model names in this section; the corresponding names are didNotconverge1.uai through didNotconverge8.uai, respectively. As D1 and D3 are identical, we remove D3 from the experiments. DAOOPT fails to give solutions within 2 hours on all models, and LP-LP cannot produce solutions due to the memory limit. ogm-TRWS and L2-Box are not evaluated as they cannot handle high-order factors. LS-LP produces valid integer solutions on all models, and gives the highest logPot values on all models except D5. MPLP-C gives slightly lower logPot values than LS-LP. AD3 produces only a fractional solution on D4, and both approximate and fractional solutions on the other models, while PSDD gives approximate and fractional solutions on all models. The other methods also perform much worse than LS-LP and MPLP-C. One exception is that ogm-ICM gives the best result on D5; we find that D5 is the most challenging model for approximate methods.
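The distinction drawn above between valid integer, fractional, and approximate solutions can be made mechanical: a solution is approximate when the relaxation's constraints are not fully satisfied, otherwise it is integer when every unary marginal is one-hot, and fractional otherwise. A small sketch of such a classifier (the function name, tolerance, and residual argument are our own assumptions, not the paper's code):

```python
import numpy as np

def solution_type(mu_unary, constraint_residual, tol=1e-6):
    """Classify a returned solution, mirroring the categories used in the
    experiments: 'approximate' if the constraints are violated beyond tol,
    else 'integer' when every unary marginal is one-hot, else 'fractional'."""
    if constraint_residual > tol:
        return "approximate"
    if all(np.all(np.minimum(m, 1.0 - m) <= tol) for m in mu_unary):
        return "integer"
    return "fractional"

t_int  = solution_type([np.array([1.0, 0.0])], 0.0)    # "integer"
t_frac = solution_type([np.array([0.7, 0.3])], 0.0)    # "fractional"
t_appr = solution_type([np.array([0.7, 0.3])], 1e-2)   # "approximate"
print(t_int, t_frac, t_appr)
```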

Methods Complexities
MPLP
MPLP-C