1 Introduction
The MAP inference problem in general higher-order graphical models (Bayesian models, Markov random field (MRF) models, and beyond), also called the discrete higher-order multiple-partitioning (or multi-label) problem (HoMPP), can be stated as follows. Given:

a discrete domain of points (sites), assumed without loss of generality to be the integer set , where stands for an integer greater than or equal to ,

a discrete label set, assumed without loss of generality to be the integer set , where stands for an integer greater than or equal to ,

a hypersite set consisting of subsets of with cardinality greater than or equal to ,

a set of real-valued local functions .
Then, the goal is to find a multi-label function of the form:
in a way which either minimizes or maximizes a higher-order cost function , defined for every multi-label function as:
(1) 
where it has been assumed that, , one has .
For the sake of convenience in the remainder, we propose to encode the multi-label function by means of a -dimensional integer vector , while simply bearing in mind that , one has , and we refer throughout to as the multi-label vector (MLV). The problem then amounts to finding an integer vector solution in which either globally solves the following minimization problem:
(2) 
or globally solves the following maximization problem:
(3) 
More generally, one might be interested in finding both modes (i.e., the minimum and the maximum solutions) of , and we propose to denote such a problem by:
(4) 
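For concreteness, the problem statement above can be sketched on a small brute-force example. The instance below, including its hypersite set and its local functions, is a hypothetical illustration and not part of the formal development:

```python
from itertools import product

n, L = 4, 3                                  # 4 sites, labels {0, 1, 2}
C = [(0, 1), (1, 2), (2, 3), (0, 2, 3)]      # hypersites (subsets of sites)
# Hypothetical local functions, one per hypersite.
f_local = {
    (0, 1): lambda a, b: (a - b) ** 2,
    (1, 2): lambda a, b: abs(a - b),
    (2, 3): lambda a, b: a * b,
    (0, 2, 3): lambda a, b, c: (a + b + c) % 3,
}

def cost(x):
    """Higher-order cost of a multi-label vector: sum of its local terms."""
    return sum(f_local[c](*(x[i] for i in c)) for c in C)

labelings = list(product(range(L), repeat=n))
x_min = min(labelings, key=cost)             # a MinMPP solution
x_max = max(labelings, key=cost)             # a MaxMPP solution
print(cost(x_min), cost(x_max))
```

Exhaustive enumeration is of course exponential in the number of sites; it only serves here to make the three problems (minimization, maximization, and both modes) concrete.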
Furthermore, in order to rule out trivial instances of the HoMPP, we make throughout the following mild assumptions:

,

and are non-constant functions.
For the sake of clarity in the remainder, we shall refer to minimization problem (2), maximization problem (3), and mode-finding problem (4) using the acronyms MinMPP, MaxMPP, and ModesMPP, respectively. Furthermore, we want to emphasize that, in practice, such a higher-order function often arises as minus the log-likelihood of an instance of a graphical model (e.g., a Bayesian model, an MRF model, and so forth) given the observed data (up to the minus log of a normalization constant). Then, depending on the application, one may only be interested in a MAP solution of the HoMPP (i.e., one which maximizes the likelihood, equivalently, which minimizes ), or in both modes of the likelihood. In fact, one of the contributions of this paper is to show that both modes of are intimately related to each other (please refer to section 8, especially to the discussion which follows Theorem 11, for more details).
The remainder of this paper is structured as follows. After reviewing some of the existing literature on the MAP inference problem in graphical models, we first reformulate, on stochastic grounds, both MinMPP (2) and MaxMPP (3) as expectation minimization and maximization linear programs (LPs), respectively. After that, we introduce the notion of delta-distribution, and we reformulate ModesMPP (4) as a delta-expectation minimization LP. Next, we introduce the orthomarginal framework as a general discrete function approximation by means of an orthogonal projection in terms of linear combinations of function margins with respect to a given hypersite set, though, as mentioned in the abstract, we rather use the latter for the purpose of modeling local consistency from a global perspective. Then, we proceed in a traditional way for obtaining useful LP relaxations of the HoMPP, by merely enforcing locally the probability and the delta-probability axioms, respectively. With the two mathematical tools above in mind, namely the notion of delta-distribution and the orthomarginal framework, we reformulate the proposed LP relaxations from a global viewpoint, and we show that their optimal solutions coincide with those of their original (hard) versions. Last but not least, since one is only guaranteed to recover a set of optimal marginal distributions (resp. a set of optimal marginal delta-distributions) of the HoMPP, we also develop an algorithm for computing modes of
from a perhaps fractional solution of its LP relaxation. Before moving to the crux of the approach, we want to emphasize that the present paper is self-contained; moreover, all the results presented throughout are shown using rather simple algebraic techniques widely accessible to anyone who is familiar with the basic concepts of linear algebra, linear programming, and probability theory.
2 Related work
Maximum A Posteriori (MAP) estimation in higher-order probabilistic graphical models
[Lauritzen 1991, Bishop 2006, Wainwright & Jordan 2008], also referred to in the operations research and computer vision literatures as the higher-order multiple-partitioning (or multi-label) problem (HoMPP), has been, for many decades, a central topic in the literature of AI and related fields (statistics, machine learning, data mining, natural language processing, computer vision, coding theory, operations research, computational biology, to name a few). In the culture of mathematical programming, the HoMPP is nothing else than unconstrained integer programming
[Nemhauser & Wolsey 1988, Grootschel et al. 1993], whereas in the culture of data science, the HoMPP often arises as an inference (or inverse) problem, in the sense that one is interested in finding the most likely configuration of model parameters which explains the observed data. The choice of a graphical model for a given practical situation may be motivated by the nature of the (random) process which generates the data, but may also be severely constrained by available computing resources and/or real-time considerations. Thus, factorable graphical models have arisen as an almost inescapable AI tool both for modeling and solving a variety of AI problems, above all due to their modularity, flexibility, and ability to model a variety of real-world problems. In this regard, two popular classes of graphical models are the Bayesian graphs (or directed graphical models)
[Pearl 1982, Pearl & Russell 2002], and the Markov random field (MRF) graphs (or undirected graphical models) [Hammersley & Clifford 1971, Kinderman & Snell 1980]. Historically, MRFs had long been known in the field of statistical physics [Ising 1925, Ashkin & Teller 1943, Potts 1952] before they were first introduced in computer science [Besag 1974] and later popularized by many other authors [Geman & Geman 1984, Besag 1986, Geman & Graffigne 1986, Li 1995]. Nowadays, graphical models are a branch in their own right of statistics and probability theory, and their use in AI is ubiquitous. With that being said, exact MAP inference, or even approximate MAP inference, in general graphical models is a hard combinatorial problem [Karp 1972, Cooper 1990, Dagum & Luby 1993, Shimony 1994, Roth 1996, Chickering 1996, Cipra 2000, Megretski 1996, Boykov et al. 2001, Park & Darwiche 2004, Cohen et al. 2006]. As a matter of fact, unless P = NP, one may not even hope to achieve an approximate polynomial-time algorithm for computing the modes of an arbitrary instance of a graphical model. Therefore, except in particular cases which are known to be solvable exactly and in polynomial time [Hammer 1965, Greig et al. 1989, Boykov et al. 1998, Ishikawa 2003, Schlesinger 2007, Osokin et al. 2011]
, the MAP inference problem in graphical models has been mostly dealt with, so far, by using heuristic approaches, which may be ranked in three main categories. First, probability-sampling-based approaches, also called Markov Chain Monte Carlo (MCMC) methods
[Hastings 1970, Green 1995, Gelfand & Smith 1990], were among the first MAP estimation algorithms used in MRF models [Geman & Geman 1984, Besag 1986], and their good practical performance, both in terms of computational efficiency and accuracy, has been extensively reported in the literature [Baddeley & Van Lieshout 1993, Winkler 1995, Descombes 2011]. Graph-theory-based approaches, which are mostly variants of the graph-cut algorithm, have also been extensively used for optimizing a plethora of MRF instances mainly encountered in computer vision [Boykov et al. 2001, Kolmogorov & Zabih 2004, Liu & Veksler 2010, Veksler 2012]. More recently, fostered by the important breakthroughs in linear programming [Chvàtal 1983, Dantzig 1990, Karmarkar 1984, Bertsimas & Tsitsiklis 1997] and, more generally, in convex programming [Ye 1989, Nesterov & Nemirovsky 1994, Nesterov 2004, Nesterov 2009, Lesaja 2009, Beck & Teboulle 2009], as well as by the important recent surge in high-performance computing, such as multiprocessor and parallel computing (GPU) technologies [Bolz et al. 2003, Li et al. 2011], linear and convex programming relaxation approaches, including spatially continuous approaches [Nikolova et al. 2006, Cremers et al. 2011, Lellmann & Schnorr 2011, Nieuwenhuis et al. 2013, Zach et al. 2014] and spatially discrete ones [Schlesinger 1976, Hummel & Zucker 1983, Hammer et al. 1984, Pearl 1988, Sherali & Adams 1990, Koster et al. 1998, Chekuri et al. 2005, Kingsford et al. 2005, Kolmogorov 2006a, Werner 2007, Cooper 2012], have arisen as a promising alternative both to graph-theory-based and MCMC-based MAP estimation approaches in graphical models. Generally speaking, the latter category of approaches may also be seen as an approximate marginal inference approach in graphical models [Wainwright et al. 2005, Wainwright & Jordan 2008], in the sense that one generally attempts to optimize the objective over a relaxation of the marginal polytope constraints, in such a way that an approximate MAP solution may be found by a mere rounding procedure, or by means of a more sophisticated message-passing algorithm [Wainwright et al. 2005, Kolmogorov 2006b, Komodakis et al. 2011, Sontag & Jaakkola 2008]. In fact, the approach described in this paper belongs to the latter category of approaches, yet it may solve the MAP inference problem in an arbitrary graphical model instance.
3 The HoMPP expectation optimization framework
The goal of this section is to transform both MinMPP (2) and MaxMPP (3) into equivalent continuous optimization problems and, eventually, into linear programs by means of the expectation-optimization framework.
To this end, we first consider MinMPP (2), and in order to fix ideas once and for all throughout, we propose to develop from scratch the expectation minimization (EM) approach, allowing any instance of MinMPP (2) to be recast as a linear program (LP). In the introduction, we assumed that the labeling process is purely deterministic, but unknown. In this section, we rather advocate a random multi-label process, consisting of randomly drawing vector samples with a certain probability, then assigning to each site a realization of its random label. Let us stress that randomization serves here only temporarily for developing the EM approach, which is deterministic. Therefore, suppose a random multi-label vector (RMLV) , with value domain , and consider the stochastic (random) version of the objective function of MinMPP (2), expressed as:
Then, one writes the expectation of as:
Please observe that is expressed solely in terms of the marginal distributions of the random vectors . Next, suppose that one is rather given a set (or a family) of candidate probability distributions of RMLV (more formally, one would rather consider a set of independent copies of , each of which is endowed with its own distribution in ), such that:
and the goal is to choose among
the joint distribution of RMLV
which solves the following minimization problem:
(5) 
We refer to minimization problem (5) as EMinMPP, standing for expectation minimization multiple-partitioning problem. Now, in order to see how EMinMPP (5) relates to MinMPP (2), one may write: , where stands for the indicator function of , defined as:
in such a way that, denoting by the set of indicator functions of the integer vector set , one may completely reformulate MinMPP (2) as an instance of EMinMPP (5), with . Furthermore, since writes as some convex combination of the elements of the set , one derives immediately that:
(6) 
which means that EMinMPP (5) is an upper bound for MinMPP (2). Then, Theorem 1 below gives a sufficient condition under which EMinMPP (5) exactly solves MinMPP (2), and shows how one may obtain, accordingly, an optimal vector solution of MinMPP (2) from a perhaps fractional optimal probability solution of EMinMPP (5).
Theorem 1
Proof 1
The assumption that guarantees that strict equality is achieved in formula (6). Moreover, if a distribution of RMLV is optimal for problem (5), then so must be any indicator function which appears with a strictly positive coefficient in the convex combination of in terms of indicator functions of the set ; in other words, any integer vector sample of must also be optimal for MinMPP (2).
Clearly, Theorem 1 is nothing else than the probabilistic counterpart of the well-known convex hull reformulation in integer programming [Sherali & Adams 1990, Grootschel et al. 1993, Bertsimas & Tsitsiklis 1997, Wainwright & Jordan 2008]. Having said that, in the remainder, we will assume that stands for the entire convex set of candidate joint distributions of , which is given by:
(7) 
Clearly, one has , and coincides with the convex hull of . One may accordingly re-express EMinMPP (5) as a linear program (LP) as follows:
(8) 
Equally, one finds that the following LP:
(9) 
completely solves MaxMPP (3). Throughout, we shall refer to LP (8) and LP (9) using the acronyms EMinMLP and EMaxMLP, respectively. We conclude this section by merely noting that both EMinMLP (8) and EMaxMLP (9) are intractable in their current form, and the goal in the remainder of this paper is to develop efficient LP relaxations of them.
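The core of the expectation-minimization reformulation, in miniature: the expectation of the cost under any joint distribution is a convex combination of the cost values, so its minimum over the simplex equals the true minimum and is attained at an indicator distribution (the content of Theorem 1). The toy instance below is hypothetical:

```python
import random
from itertools import product

n, L = 3, 2
labelings = list(product(range(L), repeat=n))
# Arbitrary (hypothetical) cost table on all labelings.
f = {x: (x[0] - x[1]) ** 2 + x[1] * x[2] + x[2] for x in labelings}
f_min = min(f.values())

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in labelings]
    s = sum(w)
    p = [wi / s for wi in w]                  # a random joint distribution
    exp_f = sum(pi * f[x] for pi, x in zip(p, labelings))
    assert exp_f >= f_min - 1e-12             # E_p[f] never beats min f

x_star = min(f, key=f.get)
indicator = [1.0 if x == x_star else 0.0 for x in labelings]
exp_at_indicator = sum(pi * f[x] for pi, x in zip(indicator, labelings))
print(exp_at_indicator == f_min)              # the bound is tight
```

This illustrates why the LP over the full simplex is exact, and also why it is intractable as stated: the number of variables equals the number of labelings.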
4 The HoMPP delta-expectation minimization framework
In this section, we develop the delta-expectation minimization framework for addressing ModesMPP (4), in other words, both MinMPP (2) and MaxMPP (3), within a common minimization framework.
4.1 Joint delta-distribution
Definition 1 (Joint delta-distribution)
We call a joint delta-distribution of RMLV any function which can be written as the difference of two (arbitrary) joint distributions of RMLV as:
where both and stand for (ordinary) joint distributions of RMLV .
Theorem 2 provides a useful alternative definition of a joint delta-distribution of RMLV , without resorting to its ordinary joint distributions.
Theorem 2
A function defines a joint delta-distribution of RMLV , if and only if, satisfies the following two formulas:

,

.
Interestingly, one has thus managed to get rid of the pointwise sign constraint of ordinary distributions of RMLV by means of its joint delta-distributions. One then notes that the decomposition of a joint delta-distribution of RMLV as the difference of two ordinary joint distributions is, in general, non-unique; hence Proposition 1, which fully characterizes the joint delta-distributions of RMLV admitting a unique such decomposition.
Proposition 1
A joint delta-distribution of RMLV admits a unique decomposition of the form , where both and stand for joint distributions of RMLV , if and only if, one has , in which case and are uniquely given by:
Last but not least, Proposition 2 below establishes that any zero-mean function defines, at worst up to a multiplicative scale, a joint delta-distribution of RMLV .
Proposition 2
Suppose a nonzero function , such that, . Then, there exists , such that, , the normalized function defined as:
defines a joint delta-distribution of RMLV .
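A small numerical sketch of Proposition 2, under the assumption (since the formulas are not reproduced here) that Theorem 2's two conditions are "the entries of sum to zero" and "the absolute entries sum to at most 2": a nonzero zero-mean function, rescaled by the largest admissible factor, splits into a difference of two ordinary distributions via its positive and negative parts. All numbers are hypothetical:

```python
g = [0.6, -1.0, 0.1, 0.3]          # hypothetical zero-mean function on 4 points
assert abs(sum(g)) < 1e-12

t = 2.0 / sum(abs(v) for v in g)   # the largest admissible scale
delta = [t * v for v in g]         # candidate delta-distribution

p = [max(v, 0.0) for v in delta]   # positive part
q = [max(-v, 0.0) for v in delta]  # negative part
# Both parts are genuine probability distributions (nonnegative, sum to 1),
# since sum|delta| = 2 together with sum(delta) = 0 forces each part to sum to 1.
print(round(sum(p), 12), round(sum(q), 12))
```
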
4.2 Reformulation of a HoMPP as a delta-expectation minimization problem
We begin by introducing the notion of delta-expectation of a real-valued random function of RMLV .
Definition 2 (Delta-expectation)
Suppose is a joint delta-distribution of RMLV , and suppose a real-valued function . Then, one defines the delta-expectation of the random function as:
(10) 
Next, similarly to the EM framework, one rather assumes a set of candidate joint delta-distributions of RMLV , denoted by , and considers the delta-expectation minimization problem:
(11) 
In the remainder, we take to be the entire (convex) set of joint delta-distributions of RMLV which, according to Theorem 2, is defined as:
(12) 
thus enabling delta-expectation minimization problem (11) to be expressed as an LP as follows:
(13) 
which may also be expanded, using the marginal delta-distributions of , as:
(14) 
In the remainder, we refer to problem (13) using the acronym DEMinMLP. Theorem 3 below may be seen as the delta-distribution analog of Theorem 1.
Theorem 3
Suppose is an optimal solution of DEMinMLP (13). It follows that:

achieves an optimal objective value which is equal to: ,

, ,

, .
Moreover, satisfies , thereby admitting a unique decomposition of the form , where both and stand for two joint distributions of RMLV , which are given by:
and which are optimal for EMinMLP (8) and EMaxMLP (9), respectively.
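Theorem 3's conclusion can be previewed on a toy cost: minimizing the delta-expectation over differences of two joint distributions yields the gap "minimum minus maximum", with the positive part concentrated on a minimizer and the negative part on a maximizer. The instance below is hypothetical:

```python
from itertools import product

n, L = 3, 2
labelings = list(product(range(L), repeat=n))
f = {x: x[0] + 2 * x[1] - x[2] for x in labelings}   # arbitrary toy cost

f_min, f_max = min(f.values()), max(f.values())
x_min = min(f, key=f.get)
x_max = max(f, key=f.get)

# Optimal delta-distribution: indicator of a minimizer minus indicator of a
# maximizer (a difference of two ordinary distributions).
delta = {x: (1.0 if x == x_min else 0.0) - (1.0 if x == x_max else 0.0)
         for x in labelings}
delta_expectation = sum(delta[x] * f[x] for x in labelings)
print(delta_expectation == f_min - f_max)
```

So a single minimization over delta-distributions recovers both modes at once, which is the point of the common framework.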
5 The orthomarginal framework
In this section, we describe an algebraic approach (called the orthomarginal framework) for general discrete function approximation via an orthogonal projection in terms of linear combinations of function margins with respect to a given hypersite set. Nevertheless, the main usefulness of such an approach in the present paper is that it enables one to model any set of locally consistent functions (see Definition 8) in terms of a global (yet non-unique) mother function . Therefore, in order to fix ideas once and for all in the remainder, subsection 5.1 introduces all the definitions useful to the development of the orthomarginal framework, and subsection 5.2 describes its main results.
Beforehand, we want to note that, throughout this section, we assume that is a hypersite set with respect to ; moreover, we assume some order (e.g., a lexicographic order) on the elements of , which means that, whatever , if , then either one has , or one has .
5.1 Definitions
Definition 3 (Maximal hypersite set)
One says that is maximal, if and only if:
or, in plain words, if one may not find in both a hypersite and any of its subsets.
Definition 4 (Frontier hypersite set)
One defines the frontier of , denoted by , as the smallest maximal hypersite set which is contained in . In plain words, is the hypersite set containing all the hypersites in which are not included in any of its other hypersites.
Definition 5 (Ancestor hypersite)
Suppose a hypersite (if any). Then, we call an ancestor hypersite of , any hypersite , such that, .
Definition 6 (Ancestry function)
We call an ancestry function with respect to , any function:
such that, , is an ancestor of in .
Please note that the ancestor of some hypersite may not be unique; hence, the function may not be unique either.
Remark 1
It does not take much effort to see that higher-order function (1) may be rewritten solely in terms of local functions with respect to as:
by simply merging each term of with respect to any with the term corresponding to any of its ancestors in .
Definition 7 (Margin)
Suppose a function , and a hypersite . Then, one defines the margin of with respect to as the function defined as:
(15) 
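Definition 7 can be sketched on a toy function: the margin with respect to a hypersite is obtained by summing the function over the labels of all sites outside the hypersite, leaving a local function of the labels on the hypersite alone. All names below are hypothetical:

```python
from itertools import product

n, L = 3, 2
c = (0, 2)                        # the hypersite we marginalize onto
f = {x: x[0] * 4 + x[1] * 2 + x[2] for x in product(range(L), repeat=n)}

def margin(f, c):
    """Sum f over all labelings of the sites not in c."""
    out = {}
    for x, v in f.items():
        key = tuple(x[i] for i in c)
        out[key] = out.get(key, 0) + v
    return out

mu = margin(f, c)
print(mu)
```

Note that summing a margin further (e.g., over one of its own coordinates) reproduces the margin on the smaller hypersite, which is exactly the local-consistency pattern formalized by Definition 8.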
Definition 8 (Pseudo-marginal)
One says that a set of local functions of the form is a pseudo-marginal set (or a set of locally consistent functions) with respect to , if and only if, it satisfies the following identities:
(16) 
where stands for the hypersite whose sites belong to but do not belong to , and stands for an arbitrary ancestry function with respect to (see Definition 6).
Clearly, any set of actual margins with respect to of an arbitrary function also defines a pseudo-marginal set with respect to .
Convention 1
We abuse notation by denoting by the empty hypersite (i.e., one which does not contain any site), and we adopt the convention, henceforth, that whatever a function , the margin of with respect to , simply denoted by , is the real quantity .
Definition 9 (Frontier-closure of a hypersite set)
One defines the frontier-closure of as the hypersite set with respect to , denoted by , such that:

, ,

, .
5.2 Main results of the orthomarginal framework
First of all, Theorem 4 below establishes that the marginalization of any function with respect to is intimately related to an orthogonal projection of .
Theorem 4
Let stand for an arbitrary real-valued function. Then, may be written as a direct sum of two functions and , as: , such that:

the margins set with respect to of coincides with the one of ,

all the margins of with respect to are identically equal to zero,

the closed-form expression of function is given by:
where , stands for the margin of with respect to , and the integer coefficients are iteratively given by:
(17)
Furthermore, introduce the operator denoted by and defined as:
(18) 
Then, is an orthogonal projection.
Notation 1
We refer in the remainder to the operator as the orthomarginal operator with respect to the hypersite set .
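The orthogonal-projection reading of Theorem 4 can be checked numerically: under the counting inner product, the inner product of a function with the indicator of a local labeling is exactly a margin value, so projecting onto the span of local indicator functions yields a part that reproduces every margin, and a residual whose margins all vanish. A least-squares sketch on a hypothetical instance:

```python
import numpy as np
from itertools import product

n, L = 3, 2
C = [(0, 1), (1, 2)]                       # hypothetical hypersite set
X = list(product(range(L), repeat=n))

# Basis: one indicator column per (hypersite, local labeling) pair.
cols = []
for c in C:
    for y in product(range(L), repeat=len(c)):
        cols.append([1.0 if tuple(x[i] for i in c) == y else 0.0 for x in X])
A = np.array(cols).T                       # |X| x (#basis) design matrix

rng = np.random.default_rng(0)
f = rng.standard_normal(len(X))            # arbitrary function on labelings

coef, *_ = np.linalg.lstsq(A, f, rcond=None)
f_par = A @ coef                           # orthogonal projection of f
f_perp = f - f_par                         # residual

for c in C:                                # margins of the residual vanish
    for y in product(range(L), repeat=len(c)):
        m = sum(f_perp[k] for k, x in enumerate(X)
                if tuple(x[i] for i in c) == y)
        assert abs(m) < 1e-9
print("residual margins are all zero")
```

The basis here is rank-deficient (overlapping hypersites share the all-ones function), which is harmless for the projection itself and mirrors the non-uniqueness of local decompositions noted in the text.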
Theorem 5 below builds on the result of Theorem 4 to establish that any pseudo-marginal set with respect to may be viewed as the actual margins set with respect to of a global, yet non-unique, function .
Theorem 5
Definition 10 (Orthomarginal space)
The orthomarginal space with respect to denoted by , is defined as the linear function space which is given by:
We also denote by the complement space of , defined as:
Remark 2
One notes that any function , reflexively, writes in terms of its margins with respect to as:
where , stands for the margin of with respect to .
Proposition 3
Suppose a real-valued function . Then, one has , if and only if, there exists a set of local functions (not to be confused here with the margins of with respect to ), such that:
Proposition 4
One has:
where , and stand for the margins with respect to of and , respectively.
Proof 2
The proof of Proposition 4 follows immediately from the definition of : if , then both and write as a linear combination of their respective margins with respect to , which must then coincide if , and vice versa.
6 LP relaxation of the HoMPP over the local marginal polytope
In order to fix ideas throughout, this section consists of subsection 6.1, in which we introduce (or, better said, recall) some useful definitions, and subsection 6.2, where we develop the LP relaxation approach to the HoMPP.
6.1 Definitions
Definition 11 (Pseudo-marginal probability set)
Suppose is a pseudo-marginal set which, thus, satisfies identities (16). If, moreover, verifies the following identities:
(20) 
then is called a pseudo-marginal probability set with respect to .
Definition 12 (Pseudo-marginal polytope)
The pseudo (or local) marginal polytope with respect to , denoted by , is defined as the space of all pseudo-marginal probability sets with respect to .
Definition 13 (Pseudo-marginal delta-probability set)
Suppose is a pseudo-marginal set which, thus, satisfies identities (16). If, moreover, verifies the identities:
(21) 
then is called a pseudo-marginal delta-probability set with respect to .
Definition 14 (Pseudo-marginal delta-polytope)
The pseudo-marginal delta-polytope with respect to , denoted by , is defined as the space of all pseudo-marginal delta-probability sets with respect to .
Remark 3
Let us note that the system of identities defining either a pseudo-marginal probability set or a pseudo-marginal delta-probability set necessarily presents many redundancies, thus making it prone to further simplification. For example, by taking into account the arguments developed in section 5, one may see immediately that the identities of the form may be reduced to a single identity of the form ; equally, the identities of the form may be reduced to a single identity of the form , with standing for an arbitrary hypersite in , and so on. Nevertheless, for the sake of simplicity, we will not carry out such simplifications in this paper, though the latter may turn out to be desirable in practice, above all for larger values of .
6.2 Relaxation
One proceeds in a traditional way for obtaining LP relaxations of EMinMLP (8) and EMaxMLP (9), hence of MinMPP (2) and MaxMPP (3), respectively, by just enforcing the probability axioms locally, as follows:
(22) 
and
(23) 
where stands for the pseudo-marginal polytope (see Definition 12).
Equally, one may obtain a useful LP relaxation of DEMinMLP (13), hence of ModesMPP (4), by just enforcing the delta-probability axioms locally, as follows:
(24) 
where stands for the pseudo-marginal delta-polytope (see Definition 14).
In the remainder, we refer to LP (22), LP (23), and LP (24) using the acronyms PseudoEMinMLP, PseudoEMaxMLP, and PseudoDEMinMLP, respectively. One then easily checks that PseudoEMinMLP (22), PseudoEMaxMLP (23), and PseudoDEMinMLP (24) are all bounded; moreover, they constitute a lower bound for EMinMLP (8), an upper bound for EMaxMLP (9), and a lower bound for DEMinMLP (13), respectively.
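The lower-bound claim rests on a simple fact worth making explicit: the margins of any joint distribution satisfy the local consistency and probability constraints and achieve the same objective value as the distribution itself (by linearity of expectation), so minimizing over the larger local polytope can only go lower. A sketch on a hypothetical instance:

```python
from itertools import product

n, L = 3, 2
C = [(0, 1), (1, 2)]
X = list(product(range(L), repeat=n))
f_local = {(0, 1): lambda a, b: (a - b) ** 2, (1, 2): lambda a, b: a + b}

# A joint distribution p (here uniform) and its margins on each hypersite.
p = {x: 1.0 / len(X) for x in X}
margins = {c: {} for c in C}
for c in C:
    for x, px in p.items():
        y = tuple(x[i] for i in c)
        margins[c][y] = margins[c].get(y, 0.0) + px

# Local objective computed from margins == global expected cost E_p[f].
local_obj = sum(margins[c][y] * f_local[c](*y)
                for c in C for y in margins[c])
global_obj = sum(px * sum(f_local[c](*(x[i] for i in c)) for c in C)
                 for x, px in p.items())
print(abs(local_obj - global_obj) < 1e-12)   # True
```

The relaxation replaces the (exponentially large) set of genuine margin sets with the polynomially described local polytope, which is what makes the LPs tractable.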
7 Optimality study of the LP relaxations
This section is divided into two main subsections. First, subsection 7.1 develops equivalent global reformulations of the LP relaxations described in section 6, thereby setting the stage for their optimality study in subsection 7.2.
7.1 Global reformulation of the LP relaxations
The main result in this section, regarding the equivalent global reformulation of PseudoEMinMLP (22), PseudoEMaxMLP (23), and PseudoDEMinMLP (24), is highlighted in Theorem 6 below.
Theorem 6

PseudoEMinMLP (22) is equivalent to the following LP:
(25) 
PseudoEMaxMLP (23) is equivalent to the following LP:
(26) 
PseudoDEMinMLP (24) is equivalent to the following LP:
(27)
in the sense that any of the global LP reformulations above:

achieves the same optimal objective value as its local reformulation counterpart,

the margins set with respect to of any of its feasible solutions is a feasible solution of its local reformulation counterpart,

conversely, whatever a feasible solution of its local counterpart, any function whose margins set with respect to coincides with it is a feasible solution of the global reformulation, and achieves an objective value equal to the one achieved by the former in its local counterpart.
7.2 Main optimality results
One begins by observing an interesting phenomenon. First of all, consider the LP standing for the difference of GlbPseudoEMinMLP (25) and GlbPseudoEMaxMLP (26), in that order:
(28) 
Clearly, solving both GlbPseudoEMinMLP (25) and GlbPseudoEMaxMLP (26) amounts to solving LP (28) once, and vice versa; this is on the one hand. On the other hand, suppose is a feasible solution of GlbPseudoDEMinMLP (27), and and are feasible solutions of GlbPseudoEMinMLP (25) and GlbPseudoEMaxMLP (26), respectively, hence of LP (28) too. It follows, by Proposition 2, that both and are, at worst up to a multiplicative scale (greater than or equal to ), feasible solutions of DEMinMLP (13). But since only their respective orthogonal-projection parts, namely and , are in fact effective in GlbPseudoDEMinMLP (27) and LP (28), respectively, as one may write:
plus, by Theorem 5, the margins with respect to of coincide with the ones of , and the margins with respect to of coincide with the ones of . Then, in light of the result of Theorem 3, one would want to know to what extent at least one of the following two max-min problems:
(29) 
and
(30) 
achieves an optimal objective value which is equal to , as by Theorem 3, this would immediately imply that one may efficiently solve any HoMPP instance by means of its LP relaxation. Furthermore, it is easy to check that max-min problem (29) is an upper bound for max-min problem (30), implying that, if the latter achieves an optimal objective value equal to , then so does the former. Nevertheless, we will establish, hereafter, two separate results, one for each of max-min problems (29) and (30) above, just in order to stress the fact that, for finding the modes of , one actually has the choice between solving two LP instances, namely PseudoEMinMLP (22) and PseudoEMaxMLP (23), or solving a single LP instance, namely PseudoDEMinMLP (24), as both choices turn out to be equivalent.
Theorem 7
The optimal objective value of max-min problem (29) is equal to .
Theorem 8
The optimal objective value of max-min problem (30) is equal to .
In short, Theorem 7 and Theorem 8 establish exactly the claim we have just made above, namely that both feasible sets, of PseudoDEMinMLP (24) and of LP (28), are within the “tolerance interval” allowed by Theorem 3 in order to hope to solve the HoMPP by means of its LP relaxation. Said otherwise, by taking into account the arguments developed above, either the result of Theorem 7 or that of Theorem 8 is enough to guarantee that one may completely solve the HoMPP by means of PseudoEMinMLP (22) (equivalently, by means of PseudoEMaxMLP (23)), or by means of PseudoDEMinMLP (24). We summarize these findings in Theorem 9 and Theorem 10 below.
Theorem 9
8 Computation of a full integral MAP solution of the HoMPP
It might be the case that a HoMPP instance has multiple MAP solutions (i.e., might have multiple minima and/or multiple maxima); thus, the resolution of either PseudoEMinMLP (22) or PseudoEMaxMLP (23) (resp. of PseudoDEMinMLP (24)) might only yield the marginals with respect to of a fractional (i.e., non-binary) optimal distribution (resp. a fractional (i.e., non-signed-binary) delta-distribution), happening to be some convex combination of optimal binary distributions (resp. delta-distributions). In such a case, one moreover needs to join the pieces in order to obtain a full MAP solution of a HoMPP instance. Therefore, the goal in the remainder of this section is to address the latter problem under general assumptions about a HoMPP instance.
8.1 Theory
For the sake of example, consider PseudoEMinMLP (22), whose resolution has yielded an optimal solution denoted by . Then, by Theorem 9, stands for a set of marginal distributions of originating from a joint distribution of RMLV , denoted by , which is thus optimal for EMinMLP (8). Moreover, by Theorem 1, obtaining a full optimal solution of MinMPP (2) amounts to obtaining a sample from ; however, for the sake of computational efficiency, one wants to avoid accessing (which is hard). Therefore, in the remainder of this section, we describe an approach for computing a sample of directly from .
Then, a first naive (yet polynomial-time) algorithm for achieving the aforementioned goal is based on the result of Proposition 5 below.
Proposition 5
Suppose and , such that, . Then, there exists , such that, and .
Proof 3
Such a result of Proposition 5 follows immediately from the identity:
as otherwise, i.e., if for some , such that, , one had , then one would have , which contradicts the assumption that .
Based on the result of Proposition 5, one may proceed as follows. Suppose and , such that, . If one replaced in the value of with its optimal value , solved a new instance of PseudoEMinMLP (22) accordingly, and repeated this procedure with respect to some , then with respect to some , and so on, until all the variables of are exhausted, one would be guaranteed to ultimately obtain, in polynomial time, a full mode of . Obviously, such an algorithm is utterly slow, as it requires solving multiple instances of PseudoEMinMLP (22) successively (yet with fewer variables each time). On the other hand, sampling from a general probability distribution with sole access to its marginals is not a straightforward procedure. Fortunately, as will be shown hereafter, the distributions of RMLV which are candidates for optimality in EMinMLP (8) (equivalently, in EMaxMLP (9)) are not arbitrary (see Proposition 6 below), thereby making it possible to efficiently compute their samples by sole access to their margins sets with respect to .
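The naive extraction procedure can be sketched as follows, with the successive LP re-solves replaced by a brute-force oracle (a hypothetical stand-in, used here only to keep the sketch self-contained): among the optimal marginals, any site/label pair with strictly positive mass can be fixed without losing optimality, so fixing sites one at a time ends at a full integral mode.

```python
from itertools import product

n, L = 3, 3
f = lambda x: (x[0] - x[1]) ** 2 + (x[1] - x[2]) ** 2  # many tied minimizers

fixed = {}
for site in range(n):
    # Labelings consistent with the labels fixed so far.
    live = [x for x in product(range(L), repeat=n)
            if all(x[i] == v for i, v in fixed.items())]
    best = min(f(x) for x in live)
    argmins = [x for x in live if f(x) == best]
    # Marginal of `site` under the uniform optimal distribution; pick any
    # label with positive mass (Proposition 5 guarantees one exists).
    for l in range(L):
        if any(x[site] == l for x in argmins):
            fixed[site] = l
            break

x_full = tuple(fixed[i] for i in range(n))
print(x_full, f(x_full))   # → (0, 0, 0) 0
```

In the actual algorithm, each conditioning step would re-solve PseudoEMinMLP (22) with one more variable pinned, rather than enumerate labelings.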
Let us then begin by introducing the sign function defined as: