Probabilistic logic models (a.k.a. probabilistic or statistic relational models) provide high-level representation languages for probabilistic models of structured data [Breese (1992), Poole (1993), Sato (1995), Ngo et al. (1995), Jaeger (1997), Friedman et al. (1999), Kersting and Raedt (2001), Milch et al. (2005), Vennekens et al. (2006), Taskar et al. (2002), Richardson and Domingos (2006)]. While supporting model specifications at an abstract, first-order logic level, inference is typically performed at the level of concrete ground instances of the models, i.e., at the propositional level. This mismatch between model specification and inference methods has been noted early on [Jaeger (1997)], and has given rise to numerous proposals for inference techniques that operate at the high level of the underlying model specifications [Poole (2003), de Salvo Braz et al. (2005), Milch et al. (2008), Kisyński and Poole (2009), Jha et al. (2010), Gogate and Domingos (2011), Van den Broeck et al. (2011), Van den Broeck (2011), Fierens et al. (2011)]. Inference methods of this nature have collectively become known as “lifted” inference techniques.
The concept of lifted inference is mostly introduced on an informal level:
“…lifted, that is, deals with groups of random variables at a first-order level” [de Salvo Braz et al. (2005)]; “The act of exploiting the high level structure in relational models is called lifted inference” [Apsel and Brafman (2011)]; “The idea behind lifted inference is to carry out as much inference as possible without propositionalizing” [Kisyński and Poole (2009)]; “lifted inference, which deals with groups of indistinguishable variables, rather than individual ground atoms” [Singla et al. (2010)]. Thus, while the term lifted inference emerges as a fairly coherent algorithmic metaphor, it is not immediately obvious what its exact technical meaning should be. Since quite a variety of different algorithmic approaches are collected under the label “lifted”, and since most of them can degenerate for certain models to ground, or propositional, inference, it is difficult to precisely define the class of lifted inference techniques in terms of the specific algorithmic techniques employed.
A more fruitful approach is to make the concept of lifted inference more precise in terms of its objectives. Here one observes that lifted inference techniques are quite consistently evaluated, and compared against each other, by how well inference complexity scales as a function of the size of the domain (or population) for which the general model is instantiated. Thus, empirical evaluations of lifted inference techniques are usually presented in the form of domainsize vs. inference time plots, as shown in Figure 1.
Van den Broeck (2011), therefore, has proposed a formal definition of domain lifted inference in terms of polynomial time complexity in the domainsize parameter. Experimental and theoretical analyses of existing lifted inference techniques then show that they provide domain lifted inference in some cases where basic propositional inference techniques would exhibit exponential complexity (as illustrated in Figure 1). However, until recently, these positive results were mostly limited to examples of individual models, and little was known about the feasibility of lifted inference for well-defined classes of models. First results that show the feasibility of lifted inference for whole classes of models are given by Van den Broeck (2011) and Domingos and Webb (2012).
On the other hand, Jaeger (2000) has shown that under certain assumptions on the expressivity of the modeling language, probabilistic inference is not polynomial in the domainsize, thereby demonstrating some inherent limitations in terms of worst-case complexity for the goals of lifted inference. However, the results of Jaeger (2000) are based on types of probabilistic logic models that are somewhat different from the models that presently receive the most attention: first, they essentially assume a directed
modeling framework, in which the model represents a generative stochastic process for sampling relational structures. The model is defined by specifying marginal and conditional probability distributions for random variables corresponding to ground atoms. Ground instances of the model can then be represented by directed graphical models, i.e., Bayesian networks. While the majority of existing model classes fall into the category of directed models [Breese (1992), Poole (1993), Sato (1995), Ngo et al. (1995), Jaeger (1997), Friedman et al. (1999), Kersting and Raedt (2001), Milch et al. (2005), Vennekens et al. (2006)], there is currently a lot of interest in undirected models that are given by a set of soft constraints on relational structures, specified in the form of potential functions, and in the ground case giving rise to undirected graphical models, i.e., Markov networks. Second, the results of Jaeger (2000) require quite strong assumptions on the expressivity of the probabilistic-logic modeling language, which must allow conditional distributions of atoms to be specified in dependence on unrestricted first-order properties. Much current work, in contrast, is concerned with languages that only incorporate certain weak fragments of first-order logic.
In this paper the general approach of Jaeger (2000) is extended to obtain lower complexity bounds for inference in probabilistic-logic model classes that have emerged as the focus of interest for lifted inference techniques, i.e., undirected models based on quantifier- and function-free fragments of first-order logic.
In sharp contrast with Jaeger (2000), where a “trivial” constant-time approximate inference method was described, we show that our lower complexity bounds also hold for approximate inference. Further sharpening the earlier results, we finally establish that the lower complexity bounds also hold for models not using the equality predicate, which in Jaeger (2000) was conjectured to be the key source of inherent complexity.
A preliminary version of this paper has been published as Jaeger (2012). Its main results were also already included in the survey paper Jaeger and Van den Broeck (2012), which contains a systematic overview of known results and open problems related to the complexity of lifted inference.
In the following section we introduce a general framework in which classes of undirected probabilistic-logic models, and classes of associated inference problems can be defined. Section 3 reviews classic results relating first-order logic models to the complexity class NETIME. Section 4 contains our main results, and Section 5 discusses some notable differences that emerge between the results for directed and for undirected models.
2 Weighted Feature Models
Similarly to Richardson and Domingos (2006), Van den Broeck et al. (2011) and Gogate and Domingos (2011), we assume the following framework: a model, or knowledge base, is given by a set of weighted formulas
$$KB=\{(\phi_1(\boldsymbol{x}_1),w_1),\ldots,(\phi_m(\boldsymbol{x}_m),w_m)\},\qquad(1)$$
where the $\phi_i$ are formulas in first-order predicate logic, the $w_i$ are non-negative weights, and $\boldsymbol{x}_i$ are the free variables of $\phi_i$. The case where $\boldsymbol{x}_i$ is empty, i.e., $\phi_i$ is a sentence without free variables, is also permitted. The $\phi_i$ use a given signature $S$ of relation-, function-, and constant symbols.
An interpretation (or possible world) $\omega$ for $S$ consists of a domain $D$, and an interpretation function that maps the symbols in $S$ to functions, relations, and elements on $D$. For a tuple $\boldsymbol{a}$ of domain elements, the truth value of $\phi_i(\boldsymbol{a})$ then is defined, and we write $\omega\models\phi_i(\boldsymbol{a})$, or, simpler, $\omega\models\phi_i[\boldsymbol{a}]$, if $\phi_i(\boldsymbol{a})$ is true in $\omega$. We use $\Omega_D(S)$ to denote the set of all interpretations for the signature $S$ over the domain $D$.
In this paper we are only concerned with finite domains, and assume without loss of generality that $D=\{1,\ldots,n\}$ for some $n\in\mathbb{N}$. We then abbreviate $\Omega_{\{1,\ldots,n\}}(S)$ as $\Omega_n(S)$.
For $\omega\in\Omega_n(S)$ let $n_i(\omega)$ denote the number of tuples $\boldsymbol{a}$ of elements of $\{1,\ldots,n\}$ for which $\omega\models\phi_i[\boldsymbol{a}]$. The weight of $\omega$ then is
$$W_{KB,n}(\omega)=\prod_{i=1}^{m}w_i^{\,n_i(\omega)},\qquad(2)$$
where we adopt the convention $0^0:=1$. The probability of $\omega$ is
$$P_{KB,n}(\omega)=\frac{1}{Z}\,W_{KB,n}(\omega),\qquad(3)$$
where $Z$ is the normalizing constant (partition function)
$$Z=\sum_{\omega\in\Omega_n(S)}W_{KB,n}(\omega).$$
For a first-order sentence $\psi$ and $n\in\mathbb{N}$, then
$$P_{KB,n}(\psi)=\sum_{\omega\in\Omega_n(S):\,\omega\models\psi}P_{KB,n}(\omega)\qquad(4)$$
is the probability of $\psi$ in $P_{KB,n}$.
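To make the definitions concrete, the following brute-force sketch (our own illustration, not from the paper; all helper names are ours) enumerates every interpretation of a single binary relation $r$ over a small domain and computes a query probability according to the weighted-feature semantics:

```python
from itertools import product

def all_interpretations(n, arity=2):
    """All interpretations of one relation symbol r of the given arity
    over the domain {0, ..., n-1}, each as a frozenset of tuples."""
    tuples = list(product(range(n), repeat=arity))
    for bits in product([0, 1], repeat=len(tuples)):
        yield frozenset(t for t, b in zip(tuples, bits) if b)

def weight(omega, n, kb):
    """W(omega) = prod_i w_i ** n_i(omega), where n_i(omega) is the
    number of tuples a with omega |= phi_i[a]."""
    w = 1.0
    for phi, k, wi in kb:  # (formula, number of free variables, weight)
        w *= wi ** sum(1 for a in product(range(n), repeat=k) if phi(omega, a))
    return w

def probability(query, n, kb):
    """Normalized total weight of the worlds satisfying a closed query."""
    Z = sum(weight(om, n, kb) for om in all_interpretations(n))
    return sum(weight(om, n, kb)
               for om in all_interpretations(n) if query(om)) / Z

# One weighted feature phi(x, y) = r(x, y) with weight 2: every tuple
# in r multiplies the weight of a world by 2.
kb = [(lambda om, a: a in om, 2, 2.0)]
p = probability(lambda om: (0, 0) in om, n=2, kb=kb)  # query: atom r(0, 0)
print(p)  # 2/3: each tuple is in r independently with odds 2 : 1
```

Already at $n=2$ there are $2^{4}=16$ interpretations; this exponential growth of the set of possible worlds is precisely why grounded inference does not scale, and what lifted methods try to avoid.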
We call a knowledge base (1) together with the semantics given by (2) and (4) a weighted feature model, since it associates weights with model features $\phi_i$. Weighted feature models in our sense can be seen as a slight generalization of weighted model counting (wmc) frameworks [Fierens et al. (2011); Gogate and Domingos (2011)], in which non-zero weights are only associated with literals. Knowledge bases of the form (1) can be translated into wmc frameworks via the introduction of new relation symbols $R_{\phi_i}$, hard constraints $R_{\phi_i}(\boldsymbol{x}_i)\leftrightarrow\phi_i(\boldsymbol{x}_i)$, and weighted formulas $(R_{\phi_i}(\boldsymbol{x}_i),w_i)$ [Van den Broeck et al. (2011); Gogate and Domingos (2011)]. Up to an expansion of the signature, thus, weighted feature models and wmc are equally expressive. Markov Logic Networks [Richardson and Domingos (2006)] also are based on knowledge bases of the form (1), allowing arbitrary formulas $\phi_i$. However, the semantics of the model there depends on a transformation of the formulas into conjunctive normal form, and therefore does not exactly correspond to (2) and (4), unless the $\phi_i$ are clauses.
All types of models discussed here, thus, are very similar in nature, and differ only with respect to certain restrictions on what types of logically defined features can be associated with a weight. The general definition of weighted feature models gives us the flexibility of considering a variety of classes of such restrictions.
A probabilistic inference problem for a weighted feature model is given by a knowledge base KB, a domainsize $n$, and two first-order sentences $q$ (the query) and $e$ (the evidence). The solution to the inference problem is the conditional probability $P_{KB,n}(q\mid e)$.
A class of inference problems is defined by allowing the arguments KB, $q$, and $e$ only from some restricted classes $\mathcal{K}$, $\mathcal{Q}$ (the query class), and $\mathcal{E}$ (the evidence class), respectively. We use the notation $(\mathcal{K},\mathcal{Q},\mathcal{E})$ for classes of inference problems.
The results of this paper will be given for the case where $\mathcal{Q}$ consists of all ground atoms, denoted $\mathcal{Q}_{ga}$, and $\mathcal{E}$ is empty. Thus, as far as $\mathcal{Q}$ and $\mathcal{E}$ are concerned, we are considering the most restrictive class of inference problems. Since we are deriving lower complexity bounds, this leads to the strongest possible results, which directly apply also to more general classes $\mathcal{Q}$ and $\mathcal{E}$.
Classes $\mathcal{K}$ are defined by various syntactic restrictions on the formulas in the knowledge base. In this paper, we consider the following fragments of first-order logic (FOL): relational FOL (RFOL), i.e., FOL without function and constant symbols; 0-RFOL, which is the quantifier-free fragment of RFOL; and 0-RFOL⁻, which is 0-RFOL without the equality relation.
An algorithm solves a class $(\mathcal{K},\mathcal{Q},\mathcal{E})$ if it solves all instances in the class. An algorithm $\epsilon$-approximately solves $(\mathcal{K},\mathcal{Q},\mathcal{E})$, if for any instance in the class it returns a number $\hat{p}$ with $|\hat{p}-P_{KB,n}(q\mid e)|\le\epsilon$. An algorithm that solves $(\mathcal{K},\mathcal{Q},\mathcal{E})$ is polynomial in the domainsize, if for fixed KB, $q$, and $e$ the computation of $P_{KB,n}(q\mid e)$ is polynomial in $n$.
3 Spectra and Complexity
The following definition introduces the central concept for our analysis.
Let $\phi$ be a sentence in first-order logic. The spectrum of $\phi$, denoted $\mathrm{spec}(\phi)$, is the set of integers $n$ for which $\phi$ is satisfiable by an interpretation of size $n$.
Let $\phi_{even}=\phi_1\wedge\phi_2$, where
$$\phi_1=\forall x\,\neg r(x,x)\;\wedge\;\forall x\,\forall y\,(r(x,y)\rightarrow r(y,x))$$
$$\phi_2=\forall x\,\exists y\,\big(r(x,y)\wedge\forall z\,(r(x,z)\rightarrow z=y)\big).$$
The sentence $\phi_{even}$ expresses that the binary relation $r$ defines an undirected graph ($\phi_1$) in which every node is connected to exactly one other node ($\phi_2$). Thus, $\phi_{even}$ describes a pairing relation that is satisfiable exactly over domains of even size: $\mathrm{spec}(\phi_{even})=\{2,4,6,\ldots\}$.
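The claim that the pairing sentence is satisfiable exactly over even domain sizes can be checked mechanically for small domains. The following brute-force satisfiability test is our own sketch, not part of the paper:

```python
from itertools import product

def satisfiable_pairing(n):
    """Check whether the pairing sentence -- r irreflexive, symmetric,
    and every element r-related to exactly one element -- has a model
    of size n, by brute force over all binary relations on {0..n-1}."""
    tuples = list(product(range(n), repeat=2))
    for bits in product([0, 1], repeat=len(tuples)):
        r = {t for t, b in zip(tuples, bits) if b}
        irreflexive = all((x, x) not in r for x in range(n))
        symmetric = all((y, x) in r for (x, y) in r)
        exactly_one = all(sum((x, y) in r for y in range(n)) == 1
                          for x in range(n))
        if irreflexive and symmetric and exactly_one:
            return True
    return False

print([n for n in range(1, 5) if satisfiable_pairing(n)])  # [2, 4]
```

Beyond toy sizes this check is hopeless (there are $2^{n^2}$ candidate relations), but it makes the spectrum concept concrete.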
The complexity class ETIME consists of problems solvable in time $2^{cn}$ on inputs of length $n$, for some constant $c$. The corresponding nondeterministic class is NETIME. Note that these classes are distinct from the more commonly studied classes (N)EXPTIME, which are characterized by complexity bounds of the form $2^{n^c}$ [Johnson (1990)]. For $n\in\mathbb{N}$ let $\mathrm{bin}(n)$ denote the binary coding of $n$, and $\mathrm{un}(n)$ the unary coding (i.e., $n$ is represented as a sequence of $n$ 1s). A set $A\subseteq\mathbb{N}$ is in (N)ETIME, iff $\{\mathrm{bin}(n):n\in A\}$ is in (N)ETIME, which also is equivalent to $\{\mathrm{un}(n):n\in A\}$ being in (N)PTIME.
Like Jaeger (2000), we use the following connection between spectra and NETIME as the key tool for our complexity analysis.
(Jones and Selman 1972) A set $A\subseteq\mathbb{N}$ is in NETIME, iff $A$ is the spectrum of a sentence $\phi\in\text{RFOL}$.
If NETIME $\neq$ ETIME, then there exists a first-order sentence $\phi\in\text{RFOL}$, such that $\{\mathrm{un}(n):n\in\mathrm{spec}(\phi)\}$ is not recognized in deterministic polynomial time.
Thus, by reducing instances of the spectrum recognition problem for $\phi$ to probabilistic inference problems $P_{KB,n}(q)$, where KB and $q$ are fixed for the given $\phi$, one establishes that the computation of $P_{KB,n}(q)$ is not polynomial in the domainsize (under the assumption NETIME $\neq$ ETIME).
4 Complexity Results
This section contains our complexity results. We begin with a result for knowledge bases using full RFOL. This is rather straightforward, and (for exact inference) already implied by the results of Jaeger (2000). We then proceed to extend this base result to 0-RFOL and 0-RFOL⁻.
4.1 Base Result: the RFOL Case
If NETIME $\neq$ ETIME, then there does not exist an algorithm that 0.25-approximately solves $(\text{RFOL},\mathcal{Q}_{ga},\emptyset)$ in time polynomial in the domainsize.
The proof of this theorem provides the general pattern also for the subsequent proofs. It is therefore given here in full.
Let $\phi$ be a sentence with a non-polynomial spectrum as given by Corollary 3. Let $S$ be the relational signature of $\phi$. Let $q$ be a new relation symbol of arity zero (i.e., $q$ represents a propositional variable). The first weighted formula in our knowledge base then is
$$(q\wedge\neg\phi,\;0).\qquad(5)$$
We now already have that $P_{KB,n}(q)>0$ iff there exists $\omega$ with $\omega\models q\wedge\phi$, i.e., iff $n\in\mathrm{spec}(\phi)$. This already reduces the decision problem for $\mathrm{spec}(\phi)$ to solving $P_{KB,n}(q)$ exactly. However, from the 0-1 laws of first-order logic [Fagin (1976)] it follows that for our current KB: $\lim_{n\to\infty}P_{KB,n}(q)=0$. Thus, for every $\epsilon>0$ we could define an $\epsilon$-approximate constant-time inference algorithm by returning 0 for all sufficiently large $n$.
In order to obtain our result for approximate inference, we will now ensure that for all $n\in\mathrm{spec}(\phi)$ the probability $P_{KB,n}(q)$ is at least 0.5, while it remains zero for $n\notin\mathrm{spec}(\phi)$. We do this essentially by calibrating the normalization constant $Z$ in (3). For this we introduce another new relation symbol $r$ of arity zero, and add to KB:
$$\Big(r\wedge\bigvee_{s\in S}\exists\boldsymbol{x}\,s(\boldsymbol{x}),\;0\Big).\qquad(6)$$
Thus, for every $n$ there is exactly one interpretation of $S\cup\{r\}$ with nonzero weight in which $r$ is true (the one in which all relations in $S$ have empty interpretations). Finally, we give zero weight to all interpretations except those in which $q$ or $r$ is true:
$$(\neg q\wedge\neg r,\;0).\qquad(7)$$
Let KB consist of (5), (6), (7). Every interpretation $\omega$ of the signature $S\cup\{q,r\}$ then has weight 0 if it satisfies one of the three formulas, and weight 1 otherwise. Consider the case $n\notin\mathrm{spec}(\phi)$. Then, by (5), all interpretations in which $q$ is true have weight 0. By (7) this then means that in all interpretations of nonzero weight $r$ must be true. By (6) there is exactly one such interpretation. Thus, $Z$ in (3) is 1, and $P_{KB,n}(q)=0$.
If $n\in\mathrm{spec}(\phi)$, then the number $N$ of interpretations of $S$ that satisfy $\phi$ is at least 1, and $P_{KB,n}(q)=(N+1)/(N+2)$ (if the interpretation in which all relations are empty also is a model of $\phi$), or $P_{KB,n}(q)=N/(N+1)$ (otherwise). Thus, $P_{KB,n}(q)\ge 1/2$. A 0.25-approximate inference algorithm for $P_{KB,n}(q)$, thus, would decide $\mathrm{spec}(\phi)$.
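Writing $N$ for the number of interpretations of $S$ over $\{1,\ldots,n\}$ that satisfy $\phi$, the two cases of the proof can be summarized in one display (a recap in our notation, under our reading of the construction):

```latex
P_{KB,n}(q) \;=\;
\begin{cases}
0, & n \notin \mathrm{spec}(\phi),\\[6pt]
\dfrac{N}{N+1} \;\text{ or }\; \dfrac{N+1}{N+2} \;\ge\; \dfrac{1}{2}, & n \in \mathrm{spec}(\phi),
\end{cases}
```

so any answer guaranteed to lie within 0.25 of the true probability separates the two cases.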
4.2 The 0-RFOL Case
We now proceed towards our main result, which is going from RFOL to 0-RFOL. If we were willing to allow function and constant symbols in our knowledge base, then one could move to a quantifier-free fragment in a quite straightforward manner using Skolemization. Since satisfiability over a given domain is the same for a formula $\phi$ and its quantifier-free Skolemized version $\phi^{sk}$, the arguments of the proof of Theorem 4.1 would go through with little change. In order to accomplish the same using only the relational fragment 0-RFOL, we define the relational Skolemization of a formula. The idea is to replace function and constant symbols in the Skolemized version of a formula with relational representations. For example, the Skolemized version of $\phi_2$ from Example 3 is
$$\forall x\,\big(r(x,f(x))\wedge\forall z\,(r(x,z)\rightarrow z=f(x))\big)$$
with a new function symbol $f$. Introducing a relational encoding of $f$ leads to
$$\forall x\,\forall y\,\big(R_f(x,y)\rightarrow(r(x,y)\wedge\forall z\,(r(x,z)\rightarrow z=y))\big)$$
with a new binary relation symbol $R_f$ encoding $f$. This translation must be accompanied by axioms that confine the possible interpretations of $R_f$ to relations that encode functions.
Such relational encodings of functions are well established. However, there does not seem to be a standard account of this technique that serves our purpose. The following proposition, therefore, provides the relevant result in a form tailored to our needs.
Let $S=S_r\cup S_f$, where $S_r$ is a set of relation symbols, and $S_f$ a set of function and constant symbols. Let $S_{f\rightarrow r}$ be a set of new relation symbols that for every $k$-ary $f\in S_f$ contains a $(k{+}1)$-ary $R_f$ (constant symbols are treated as 0-ary function symbols). Let Func be the set of sentences that for every $f\in S_f$ contains
$$\forall\boldsymbol{x}\,\forall y\,\forall y'\;\big(R_f(\boldsymbol{x},y)\wedge R_f(\boldsymbol{x},y')\rightarrow y=y'\big)\qquad(8)$$
$$\forall\boldsymbol{x}\,\exists y\;R_f(\boldsymbol{x},y).\qquad(9)$$
Then for every quantifier-free formula $\phi$ over $S$ there exists a formula $\phi^{rs}\in$ 0-RFOL over $S_r\cup S_{f\rightarrow r}$, such that the following are equivalent for every finite domain $D$:
(i) there exists $\omega\in\Omega_D(S)$ with $\omega\models\phi$;
(ii) there exists $\omega'\in\Omega_D(S_r\cup S_{f\rightarrow r})$ with $\omega'\models\phi^{rs}$ and $\omega'\models\text{Func}$.
If $\phi^{sk}$ is the Skolemization of a formula $\phi\in$ RFOL, we then call $(\phi^{sk})^{rs}$ the relational Skolemization of $\phi$, written $\phi^{rsk}$.
Our plan, now, is to prove the analogue of Theorem 4.1 for 0-RFOL by replacing $\phi$ in (5) with $\phi^{rsk}$. However, this is not enough, since we also need to constrain the models of our knowledge base (more precisely: those models in which $q$ is true) to satisfy the axioms (8) and (9). This poses a problem, because (9) contains an existential quantifier, and so we cannot add this axiom directly as a constraint to a knowledge base restricted to 0-RFOL. Indeed, we almost seem to have come full circle, since we are back at knowledge bases in a relational vocabulary with existential quantification! However, we have now reduced arbitrary occurrences of existential quantifiers to occurrences only within the special formulas (9).
Our strategy, now, is to approximate the formulas (9) by weighted formulas of the form
$$\big(R_f(\boldsymbol{x},y),\;w\big)\qquad(10)$$
with $w>1$, which reward models of $q$ in which the existential quantifier of (9) is satisfied for many (ideally all) $\boldsymbol{x}$: in conjunction with the hard constraint (8), formula (10) contributes a factor $w$ for every tuple $\boldsymbol{x}$ that has an $R_f$-successor. We will no longer be able to ensure that $P_{KB,n}(q)=0$ when $n\notin\mathrm{spec}(\phi)$. However, by a suitable choice of $w$, and by a careful calibration of the weight of models of the alternative proposition $r$, we can still ensure a constant gap between the values of $P_{KB,n}(q)$ for $n\in\mathrm{spec}(\phi)$ and for $n\notin\mathrm{spec}(\phi)$. However, the right calibration of the weights of models of $q$ and $r$ will now require that one sets $w$ to a value depending on $n$.
This means that we can no longer reduce the decision problem for $\mathrm{spec}(\phi)$ to the probabilistic inference problems for a fixed knowledge base KB. We only achieve a reduction to the inference problem $P_{KB(\boldsymbol{w}(n)),n}(q)$, where the logical structure of KB is fixed, but a weight parameter depends on $n$. Generally, for a knowledge base KB containing $m$ weighted formulas, we denote with KB$(\boldsymbol{w})$ the knowledge base that contains the same formulas as KB, but with the weights set to the values $\boldsymbol{w}=(w_1,\ldots,w_m)$.
To translate the lower complexity bounds for the original spectrum recognition problem into lower complexity bounds for the resulting inference problem, one now has to be precise about the representation of the inference problem. To this end, we assume that weights are rational numbers, represented by pairs of integers $(a,b)$, so that $w=a/b$. We then define the representation size as $\mathrm{size}(w)=\lceil\log a\rceil+\lceil\log b\rceil$. The total representation size of the weight parameters in a knowledge base is $\mathrm{size}(\boldsymbol{w})=\sum_i\mathrm{size}(w_i)$. An inference algorithm for probabilistic inference problems in $(\mathcal{K},\mathcal{Q},\mathcal{E})$ is polynomial in the domainsize and the representation size of the weight parameters, if for any KB, $q$, $e$, the class of inference problems $\{P_{KB(\boldsymbol{w}),n}(q\mid e)\}$ can be solved in time that is bounded by a polynomial $p(n,\mathrm{size}(\boldsymbol{w}))$. We can now state the following theorem:
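With weights represented as reduced fractions, this definition can be computed directly. The following is our own sketch; the helper `size` and the use of integer bit lengths for the logarithms are our reading of the definition:

```python
from fractions import Fraction

def size(w):
    """Representation size of a rational weight w = a/b: bit length of
    the numerator plus bit length of the denominator."""
    return (max(1, abs(w.numerator).bit_length())
            + max(1, w.denominator.bit_length()))

weights = [Fraction(1, 2), Fraction(3, 7), Fraction(1025)]
print([size(w) for w in weights])              # [3, 5, 12]
print(sum(size(w) for w in weights))           # total size: 20
```

In particular, a weight of magnitude $2^{p(n)}$ for a polynomial $p$ has representation size polynomial in $n$, which is what the reduction behind Theorem 4.2 exploits.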
If NETIME $\neq$ ETIME, then there does not exist an algorithm that 0.2-approximately solves $(\text{0-RFOL},\mathcal{Q}_{ga},\emptyset)$ in time polynomial in the domainsize and the representation size of the weight parameters.
The full proof of the theorem is given in the appendix. It consists of a polynomial-time reduction of the decision problem for $\mathrm{spec}(\phi)$ to probabilistic inference problems $P_{KB(\boldsymbol{w}(n)),n}(q)$, where $\mathrm{size}(\boldsymbol{w}(n))$ is polynomial in $n$. An inference algorithm that can solve these problems in time polynomial in the domainsize and $\mathrm{size}(\boldsymbol{w}(n))$, thus, would yield a polynomial decision procedure for $\mathrm{spec}(\phi)$.
4.3 Polynomiality in $\mathrm{size}(\boldsymbol{w})$
One may wonder how strong or surprising Theorem 4.2 really is, in light of its extra condition of runtime polynomial in $\mathrm{size}(\boldsymbol{w})$. It has previously been emphasized that lifted inference procedures should only be expected to be polynomial in the domainsize, but not in other parameters that characterize the complexity of KB [Jaeger (2000); Van den Broeck (2011)]. These remarks, however, have mostly been motivated by considerations of the logical complexity of KB, e.g., in terms of the number and complexity of its weighted formulas, or the size of the signature. The complexity in terms of numerical parameters, on the other hand, has not received much attention.
To better understand the nature of the condition of being polynomial in the domainsize and $\mathrm{size}(\boldsymbol{w})$, we have to look a little closer at how these parameters affect the complexity of the computation. We consider algorithms that can be described as follows: to compute $P_{KB,n}(q\mid e)$, the algorithm performs a number of steps $T$, where each step $t$ consists either of executing a constant-time operation that does not depend on the numerical model parameters (e.g., a logical operation on formulas), or of a basic operation on numerical parameters.
We consider the executions the algorithm performs on inputs with fixed logical structure KB, and fixed $q$ and $e$, but varying weight parameters $\boldsymbol{w}$ and domainsizes $n$. Let $V_t(\boldsymbol{w},n)$ denote the set of all numerical variables stored by the algorithm before performing step $t$, when it is run on these inputs. Thus, $V_t(\boldsymbol{w},n)$ comprises the original weight parameters of the model, as well as computed intermediate results, etc. We now make two basic assumptions on the algorithm:
(A1) The weight parameters only influence the numerical values of the variables stored in $V_t(\boldsymbol{w},n)$, but not the sequence of execution steps performed by the algorithm. In particular, the number of execution steps performed by the algorithm only depends on $n$: $T=T(n)$.
(A2) The basic operations performed on numerical variables are polynomial time in the size of their arguments, and they produce an output whose size is linear in the size of the inputs. This is the case for the basic arithmetic operations addition and multiplication, for example.
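Assumption (A2) can be illustrated with exact rational arithmetic (our own sketch): under repeated multiplication by a fixed weight, the bit-size of the stored value grows by roughly a constant per step, i.e., linearly in the number of steps, so a polynomially long computation only ever handles polynomially sized numbers:

```python
from fractions import Fraction

w = Fraction(3, 7)   # a fixed rational weight parameter
x = Fraction(1)
sizes = []
for _ in range(5):
    x *= w           # one basic numerical operation per step
    # representation size of the current value, in bits:
    sizes.append(x.numerator.bit_length() + x.denominator.bit_length())
print(sizes)  # [5, 10, 14, 19, 23]: each step adds about size(w) bits
```

An operation like repeated squaring (`x *= x`), by contrast, doubles the representation size per step; (A2) still holds for each single multiplication, which is why the bound on the total size must track how often stored values are combined with each other.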
The total representation size of $V_t(\boldsymbol{w},n)$ then is bounded by $\gamma_{t,n}\,\mathrm{size}(\boldsymbol{w})$, where $\gamma_{t,n}$ is a coefficient not depending on $\boldsymbol{w}$. Also, let $p$ be a polynomial that provides a common complexity bound for the basic numerical operations that can be performed at one step. The total execution time of the algorithm on a given input then is bounded by
$$\sum_{t=1}^{T(n)}p\big(\gamma_{t,n}\,\mathrm{size}(\boldsymbol{w})\big).\qquad(11)$$
If, now, for fixed weight vectors $\boldsymbol{w}$ the algorithm is polynomial in $n$ (equivalently: the algorithm is polynomial in $n$ under a computation model where basic numeric operations take constant time), then $T(n)$ and the coefficients $\gamma_{t,n}$ must be polynomially bounded in $n$. The combined complexity (11) then, in fact, is polynomial both in $n$ and $\mathrm{size}(\boldsymbol{w})$.
In summary, this shows: an algorithm that for fixed $\boldsymbol{w}$ is polynomial in $n$, and that satisfies assumptions (A1) and (A2), actually is polynomial in $n$ and $\mathrm{size}(\boldsymbol{w})$. Thus, for this type of algorithm, the additional restriction of Theorem 4.2 compared to Theorem 4.1 is insignificant.
The remaining question, then, is how restrictive or realistic assumptions (A1) and (A2) actually are. For exact inference algorithms it appears that (A1) and (A2) are satisfied by all existing approaches, with one small qualification: algorithms might give special treatment to special weight parameters, such as $w_i=0$ or $w_i=1$, which can lead to a violation of (A1) in the strict sense. However, our analysis could also be performed based on a weakened form of (A1) that allows certain special weights to influence the computation differently from generic numerical weights. A slightly more elaborate argument would then arrive at essentially the same conclusions.
The situation is less clear for approximate inference algorithms. Here the numerical values stored in $V_t(\boldsymbol{w},n)$ may influence the algorithm in multiple ways: for example, they can be used to test a termination condition, or to decide which computations to perform next in order to improve the approximation bounds derived so far. In all such cases, the model weights can have an impact on the sequence and the total number of execution steps, and (A1) is not satisfied. Thus, even though the theorem also applies to approximate inference, its implications for the construction of approximate inference algorithms may be less severe, since there might be reasonable ways to build approximate inference algorithms that are polynomial in $n$ without also being polynomial in $\mathrm{size}(\boldsymbol{w})$.
4.4 The 0-RFOL⁻ Case
In a final strengthening of our results, we now move on to the fragment 0-RFOL⁻. The availability of the equality predicate for the formulas of KB, so far, has been an important prerequisite for our arguments, because Theorem 3 crucially depends on equality: spectra of formulas without equality are either empty, or of the form $\{n\in\mathbb{N}:n\ge n_0\}$ for some $n_0$, and, thus, decidable in constant time. For this reason it was suggested in Jaeger (2000) that one should focus on logical fragments without equality when looking for model classes for which lifted inference scales polynomially in the domainsize. As our final result shows, however, the elimination of equality may not have such a large impact on complexity after all.
If NETIME $\neq$ ETIME, then there does not exist an algorithm that 0.2-approximately solves $(\text{0-RFOL}^-,\mathcal{Q}_{ga},\emptyset)$ in time polynomial both in the domainsize and the representation size of the weight parameters.
This theorem is a generalization of Theorem 4.2, and, strictly speaking, makes Theorem 4.2 redundant. It is only for expository purposes, and for greater transparency of the proof arguments, that we develop these results here in two steps.
The proof of Theorem 4.4 is a refinement of the proof of Theorem 4.2. In addition to approximating Skolem functions with relations $R_f$, we now also approximate the equality predicate with a binary relation $E$. Just as we could not impose hard constraints in 0-RFOL that ensure that $R_f$ encodes a function, we also cannot constrain models to always interpret $E$ as the equality relation. However, just as with (8) and (10) we rewarded interpretations with functional $R_f$, we can penalize interpretations in which $E$ is not true equality by means of two weighted formulas
where $w$ is a large weight.
5 Approximate Inference, Convergence, and Evidence
There are some notable differences with respect to approximate inference between the results we obtained here for weighted model counting, and the results of Jaeger (2000). In Jaeger (2000) it was shown that due to the convergence of query probabilities as $n\to\infty$, in theory a trivial constant-time approximation algorithm exists: perform exact inference for all input domains up to a size $n_0$, and output the limit probability for all domains of size $n>n_0$. This “algorithm”, however, has no practical use, since for a desired accuracy value $\epsilon$ one first would have to determine a threshold value $n_0$ high enough to make the output indeed an $\epsilon$-approximation.
Nevertheless, the difference between the existence of an impractical approximation algorithm on the one hand, and the non-existence of any approximation algorithm on the other, is just one consequence of a more fundamental difference: while in the models considered in Jaeger (2000) query probabilities converge to a limit, this is not necessarily the case for knowledge bases of weighted formulas – at least when full RFOL is allowed: in the proof of Theorem 4.1 we have constructed knowledge bases KB such that $P_{KB,n}(q)$ oscillates between zero and values $\ge 1/2$ as $n$ oscillates between $\mathrm{spec}(\phi)$ and its complement. The construction of knowledge bases with this behavior does not require formulas with a non-polynomial spectrum as in Corollary 3, and is not contingent on NETIME $\neq$ ETIME. Already a knowledge base as constructed in the proof of Theorem 4.1 with $\phi$ replaced by $\phi_{even}$ of Example 3 will show this behavior.
The reason behind these different convergence properties lies in the somewhat different role that conditioning on evidence plays in directed and undirected models: in the former, a conditional probability $P(q\mid e)$ defined by a model can, in general, not be expressed as an unconditional probability $P'(q)$ in a modified model. As a result, the convergence guarantees and – theoretical – approximability for certain classes of unconditional queries do not carry over to conditional queries.
For weighted feature knowledge bases KB, on the other hand, there is no fundamental difference between unconditional and conditional queries $P_{KB,n}(q)$ and $P_{KB,n}(q\mid e)$, respectively. To reduce conditional to unconditional queries, one can just add to KB the hard constraint $(\neg e,\,0)$ to obtain KB′ with $P_{KB',n}(q)=P_{KB,n}(q\mid e)$. This means that as long as the evidence class $\mathcal{E}$ is not more expressive than the formulas allowed in $\mathcal{K}$, the problem classes $(\mathcal{K},\mathcal{Q},\emptyset)$ and $(\mathcal{K},\mathcal{Q},\mathcal{E})$ have the same characteristics in terms of complexity as a function of the domainsize. Note, though, that this is only true when we consider the complexity of $P_{KB,n}(q\mid e)$ strictly as a function of $n$ for fixed $q$ and $e$. If the evidence is allowed to change with the domainsize, i.e., $e=e(n)$, then even in cases where restrictions on $\mathcal{K}$ make $(\mathcal{K},\mathcal{Q},\emptyset)$ polynomial in $n$, one can define sequences of inference problems $P_{KB,n}(q\mid e(n))$ that are no longer polynomial in $n$ [Van den Broeck and Davis (2012)].
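On a toy weighted feature model, the reduction from conditional to unconditional queries can be checked directly (our own brute-force sketch with hypothetical helper names): adding the hard constraint $(\neg e,\,0)$ to the knowledge base reproduces the conditional probability.

```python
from itertools import product

TUPLES = [(x, y) for x in range(2) for y in range(2)]
WORLDS = [frozenset(t for t, b in zip(TUPLES, bits) if b)
          for bits in product([0, 1], repeat=len(TUPLES))]

def weight(r, kb):
    # W(r) = prod_i w_i ** n_i(r); every formula here has two free variables.
    w = 1.0
    for phi, wi in kb:
        w *= wi ** sum(1 for a in TUPLES if phi(r, a))
    return w

def p(event, kb):
    Z = sum(weight(r, kb) for r in WORLDS)
    return sum(weight(r, kb) for r in WORLDS if event(r)) / Z

# The second feature couples r(x, y) and r(y, x), so evidence about
# r(0, 1) is informative about the query r(1, 0).
kb = [(lambda r, a: a in r, 2.0),
      (lambda r, a: a in r and (a[1], a[0]) in r, 3.0)]
q = lambda r: (1, 0) in r          # query: ground atom r(1, 0)
e = lambda r: (0, 1) in r          # evidence: ground atom r(0, 1)

# Conditioning by definition ...
conditional = p(lambda r: q(r) and e(r), kb) / p(e, kb)
# ... versus an unconditional query after adding the hard constraint
# (not e, 0), which zeroes out every world violating the evidence.
hard = kb + [(lambda r, a: not e(r), 0.0)]
unconditional = p(q, hard)
print(conditional, unconditional)  # the two values agree
```

Note that the hard constraint works through the convention $0^0=1$: worlds satisfying the evidence collect no factor from it, while all others get weight zero.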
We have shown that for the currently quite popular probabilistic-logic models consisting of collections of weighted, quantifier- and function-free formulas there is likely to be no general polynomial lifted inference method (contingent on NETIME $\neq$ ETIME). Somewhat surprisingly, this even holds for approximate inference. Between this negative result, and the positive result of Van den Broeck (2011), there still could be a lot of room for identifying tractable fragments by restricting 0-RFOL further via limits on the number of variables, or the richness of the signature.
Appendix A Proofs
Proof of Proposition 4.2
We begin by defining the term depth of a term $t$ in the signature $S$ as the maximal nesting depth of function symbols in $t$. Precisely, we define inductively: if $t$ is a variable, then $t$ has term depth 0. If $t=c$ (a constant), or $t=f(x_1,\ldots,x_k)$ (a function term with only variables as arguments), then $t$ has term depth 1. If $t=f(t_1,\ldots,t_k)$, then the term depth of $t$ is one plus the maximal term depth of the $t_i$.
The term depth of a formula is the maximal term depth of the terms it contains.
We now show that every formula $\phi$ of term depth $d\ge 1$ can be transformed into a formula $\phi'$ of term depth $d-1$ in 0-FOL, such that the statement of the proposition holds for $\phi$ and $\phi'$ (but with $\phi'$ instead of $\phi^{rs}$ in (ii)). The proposition then follows by defining $\phi^{rs}$ as the result of iteratively applying such transformations to $\phi$. Since the term depth of the resulting formula is zero, it then actually is in 0-RFOL.
Let $t_1,\ldots,t_l$ be the distinct terms (including sub-terms) of depth 1 appearing in $\phi$. Let $y_1,\ldots,y_l$ be new variables. Define $\phi'$ as
To show that (i) implies (ii), let $\omega\in\Omega_D(S)$ with $\omega\models\phi$ be given. Define $\omega'$ as the expansion of $\omega$ in which each $R_f$ is interpreted as the relational representation of $f$, i.e., $R_f(\boldsymbol{a},b)$ holds in $\omega'$ iff $f^{\omega}(\boldsymbol{a})=b$. Clearly, $\omega'\models\text{Func}$. Furthermore, the following are equivalent:
For the converse direction, let $\omega'$ as in (ii) be given. Since $\omega'\models\text{Func}$, we can turn $\omega'$ into an interpretation $\omega$ for $S$ by defining $f^{\omega}(\boldsymbol{a})$ as the unique $b$ for which $R_f(\boldsymbol{a},b)$ holds in $\omega'$. Then, by the same equivalences as above, $\omega'\models\phi'$ implies $\omega\models\phi$.
Proof of Theorem 4.2. Let $\phi\in$ RFOL be as given by Corollary 3, and $\phi^{rsk}$ its relational Skolemization. Let $S$ be the original signature of $\phi$, and $S_{f\rightarrow r}$ the relation symbols introduced in the relational Skolemization. Furthermore, for each $(k{+}1)$-ary $R_f\in S_{f\rightarrow r}$ we introduce a new $k$-ary relation $G_f$. These new symbols will be used to calibrate the weight of models for the reference proposition $r$. Note that the arity of symbols in $S_{f\rightarrow r}$ is at least 1, and the arity of $G_f$, thus, is well-defined, but the set of new symbols may contain relations of arity 0. We denote with $\tilde S$ the collection of all the introduced symbols. We now reduce the spectrum recognition problem for $\phi$ to probabilistic inference from a knowledge base in the signature $S\cup\tilde S$.
The first formula in our knowledge base is
We now approximately axiomatize the functional nature of the symbols $R_f$. The sentence (8) can be directly encoded as a weighted formula:
Next, we would like to enforce (9) by means of a weighted formula. However, (9) encodes the essence of the existential quantifiers we are about to eliminate, and, thus, it is not surprising that it cannot be enforced strictly. However, we can reward models in which the existential quantification of (9) is satisfied via the weighted formulas
where $w$ is a weight whose exact value is to be defined later.
We now proceed with constraining the models of the reference proposition $r$. First, all symbols in $S\cup S_{f\rightarrow r}$ shall have empty interpretations in models of $r$:
In order to allow $r$-models to gain some weight, we use the extra symbols $G_f$:
where $w$ is the same weight as in (16). To further limit the possible interpretations in $r$-models, we also stipulate:
The extra symbols $G_f$ must have empty interpretations in $q$-models:
Finally, we add:
We now determine (approximately) $Z$ and $P_{KB,n}(q)$ for the cases $n\in\mathrm{spec}(\phi)$ and $n\notin\mathrm{spec}(\phi)$.
First, consider the $r$-models: for any $n$, there exists exactly one interpretation with nonzero weight in which $r$ is true. This is the interpretation in which all relations in $S\cup S_{f\rightarrow r}$ are empty ((17),(18)), all relations $G_f$ are maximal (20), and, in consequence of the latter, because of (21), $q$ is false.
Assume that $S_{f\rightarrow r}=\{R_{f_1},\ldots,R_{f_l}\}$, where $R_{f_j}$ has arity $k_j+1$. Then $G_{f_j}$ contributes via (19) a factor of $w^{n^{k_j}}$ to the weight of this interpretation, and the total weight is:
using for abbreviation .
We next turn to $w_n(q)$ in the case $n \in \operatorname{spec}(\phi)$. Then there exists at least one interpretation over $S \cup S_E$ in which the Skolemized sentence is true, and in which the relations from $S_E$ have a functional interpretation. We can expand this interpretation to an interpretation in the full signature by giving all relations in $\tilde{S}_E$ an empty interpretation, and setting $q$ to true and $e$ to false. Then this interpretation does not violate any hard constraint in KB, and collects from (16) a total weight of $\alpha^{N}$. Thus $w_n(q) \geq \alpha^{N} = w_n(e)$,
and therefore $P_n(q) \geq 1/2$ when $n \in \operatorname{spec}(\phi)$.
Finally, we have to consider $w_n(q)$ in the case $n \notin \operatorname{spec}(\phi)$. For any interpretation with nonzero weight in which $q$ is true, because of (14), also the Skolemized sentence must be true. This, now, is only possible when some $E_i$ is not a functional relation, which, because of (15), can only mean that for some tuple $\bar{a}$ there exists no $b$ with $E_i(\bar{a}, b)$. The total weight accrued from (16) then is at most $\alpha^{N-1}$. Because of (21), such an interpretation cannot obtain any additional weight from (19), so that its weight is at most $\alpha^{N-1}$.
The total number of interpretations over a domain of size $n$ is $2^{p(n)}$ for a polynomial $p$. Thus $w_n(q) \leq 2^{p(n)} \alpha^{N-1}$.
We now obtain for the case $n \notin \operatorname{spec}(\phi)$: $P_n(q) \leq w_n(q)/w_n(e) \leq 2^{p(n)}/\alpha$.
Setting $\alpha := 2^{p(n)+2}$, we thus have $P_n(q) \leq 1/4$ if $n \notin \operatorname{spec}(\phi)$. The representation size of $\alpha$ is polynomial in $n$. Thus, an algorithm that computes $P_n(q)$ up to an accuracy of $1/8$ in time polynomial in $n$ and the representation size of KB would give a polynomial time decision procedure for $\operatorname{spec}(\phi)$.
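Writing $w_n(e)$ and $w_n(q)$ for the total weights of the $e$- and $q$-models over a domain of size $n$, $N$ for the calibration exponent, and $p$ for the polynomial bounding the number of interpretations (all notation reconstructed, not quoted), the weight gap exploited by the reduction can be summarized as:

```latex
\begin{align*}
w_n(e) &= \alpha^{N},\\
n \in \operatorname{spec}(\phi)
  &\;\Rightarrow\; w_n(q) \geq \alpha^{N}
  \;\Rightarrow\; P_n(q) \geq \tfrac{1}{2},\\
n \notin \operatorname{spec}(\phi)
  &\;\Rightarrow\; w_n(q) \leq 2^{p(n)}\,\alpha^{N-1}
  \;\Rightarrow\; P_n(q) \leq \frac{2^{p(n)}}{\alpha},\\
\alpha := 2^{p(n)+2}
  &\;\Rightarrow\; P_n(q) \leq \tfrac{1}{4}
  \quad\text{if } n \notin \operatorname{spec}(\phi).
\end{align*}
```

Any estimate of $P_n(q)$ with additive error below $1/8$ thus separates the two cases.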
Let $\approx$ be a new binary relation symbol. We replace the equalities in (14) and (15) with $\approx$. To (approximately) axiomatize $\approx$ as the identity relation in models of the knowledge base, we add to the knowledge base consisting of (14)-(22) the weighted formulas (28) and (29), where $\alpha$ is the same weight as in (16) and (19), and whose exact value is to be determined later. To calibrate the weight of $e$-models, we introduce in analogy to the relations $\tilde{E}_i$ a unary relation $\tilde{\approx}$, and in analogy to (19) - (21) add to the knowledge base the corresponding formulas.
We now obtain for all $n$: $w_n(e) = \alpha^{N+n}$.
If $n \in \operatorname{spec}(\phi)$, then there exists an interpretation in which the (modified) Skolemized sentence is true, the $E_i$ have a functional interpretation, and the interpretation of $\approx$ is the identity relation. We can thus lower-bound the weight of $q$ by the weight of that interpretation: $w_n(q) \geq \alpha^{N+n}$.
As in (24), one then obtains $P_n(q) \geq 1/2$.
We now turn to the case $n \notin \operatorname{spec}(\phi)$. Consider any interpretation in which $q$ is true, and that has nonzero weight. This, now, is only possible when in $S_E$ there is an $E_i$ which is not a functional relation, or when $\approx$ is not the identity relation in the interpretation (or both). In all cases, the weight of the interpretation coming from (16) and (29) is at most $\alpha^{N+n-1}$. The total number of interpretations over a domain of size $n$ is $2^{p'(n)}$ for a polynomial $p'$. Thus $w_n(q) \leq 2^{p'(n)} \alpha^{N+n-1}$,
from which, as in (27), $P_n(q) \leq 2^{p'(n)}/\alpha$. Now setting $\alpha := 2^{p'(n)+2}$ again yields the bound $P_n(q) \leq 1/4$.
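The decision procedure obtained from either reduction has a simple schematic shape: any inference routine that estimates the query probability to within the gap between the two cases decides spectrum membership. A hypothetical sketch (the oracle `mock` is invented purely for illustration):

```python
def decide_spectrum(n, approx_inference, eps=1/8):
    """Schematic decision procedure extracted from the reduction:
    the 'yes' case guarantees P_n(q) >= 1/2, the 'no' case P_n(q) <= 1/4,
    so any estimate within eps < 1/8 separates the two cases."""
    estimate = approx_inference(n)        # P_n(q) +/- eps
    return estimate > 3 / 8               # midpoint of the [1/4, 1/2] gap

# mock oracle for illustration: pretend the even n are in the spectrum
mock = lambda n: 0.55 if n % 2 == 0 else 0.2
print([n for n in range(1, 7) if decide_spectrum(n, mock)])  # [2, 4, 6]
```

Since the weights have polynomial representation size, a polynomial-time approximate inference routine of this accuracy would yield a polynomial-time procedure for spectrum recognition.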
- Apsel and Brafman (2011) Apsel, U. and Brafman, R. I. 2011. Extended lifted inference with joint formulas. In Proceedings of the Twenty-Seventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-11). AUAI Press, 11–18.
- Breese (1992) Breese, J. S. 1992. Construction of belief and decision networks. Computational Intelligence 8, 4, 624–647.
- de Salvo Braz et al. (2005) de Salvo Braz, R., Amir, E., and Roth, D. 2005. Lifted first-order probabilistic inference. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05). 1319–1325.
- Domingos and Webb (2012) Domingos, P. and Webb, W. A. 2012. A tractable first-order probabilistic logic. In Proc. of AAAI-12. To appear.
- Fagin (1976) Fagin, R. 1976. Probabilities on finite models. Journal of Symbolic Logic 41, 1, 50–58.
- Fierens et al. (2011) Fierens, D., Van den Broeck, G., Thon, I., Gutmann, B., and Raedt, L. D. 2011. Inference in probabilistic logic programs using weighted CNFs. In Proc. of UAI 2011.
- Friedman et al. (1999) Friedman, N., Getoor, L., Koller, D., and Pfeffer, A. 1999. Learning probabilistic relational models. In Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99).
- Gogate and Domingos (2011) Gogate, V. and Domingos, P. 2011. Probabilistic theorem proving. In Proceedings of the 27th Conference of Uncertainty in Artificial Intelligence (UAI-11).
- Jaeger (1997) Jaeger, M. 1997. Relational Bayesian networks. In Proceedings of the 13th Conference of Uncertainty in Artificial Intelligence (UAI-97), D. Geiger and P. P. Shenoy, Eds. Morgan Kaufmann, Providence, USA, 266–273.
- Jaeger (2000) Jaeger, M. 2000. On the complexity of inference about probabilistic relational models. Artificial Intelligence 117, 297–308.
- Jaeger (2012) Jaeger, M. 2012. Lower complexity bounds for lifted inference. http://arxiv.org/abs/1204.3255.
- Jaeger and Van den Broeck (2012) Jaeger, M. and Van den Broeck, G. 2012. Liftability of probabilistic inference: Upper and lower bounds. In Proceedings of the 2nd International Workshop on Statistical Relational AI.
- Jha et al. (2010) Jha, A., Gogate, V., Meliou, A., and Suciu, D. 2010. Lifted inference seen from the other side: The tractable features. In Proc. of NIPS.
- Johnson (1990) Johnson, D. S. 1990. A catalog of complexity classes. In Handbook of Theoretical Computer Science, J. van Leeuwen, Ed. Vol. 1. Elsevier, Amsterdam, 67–161.
- Jones and Selman (1972) Jones, N. D. and Selman, A. L. 1972. Turing machines and the spectra of first-order formulas with equality. In Proceedings of the Fourth ACM Symposium on Theory of Computing. 157–167.
- Kersting and Raedt (2001) Kersting, K. and Raedt, L. D. 2001. Towards combining inductive logic programming with Bayesian networks. In Proceedings of the 11th International Conference on Inductive Logic Programming (ILP-01). LNAI, vol. 2157. 118–131.
- Kisyński and Poole (2009) Kisyński, J. and Poole, D. 2009. Lifted aggregation in directed first-order probabilistic models. In Proc. of IJCAI 2009.
- Milch et al. (2005) Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D., and Kolobov, A. 2005. Blog: Probabilistic logic with unknown objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05). 1352–1359.
- Milch et al. (2008) Milch, B., Zettlemoyer, L. S., Kersting, K., Haimes, M., and Kaelbling, L. P. 2008. Lifted probabilistic inference with counting formulas. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08).
- Ngo et al. (1995) Ngo, L., Haddawy, P., and Helwig, J. 1995. A theoretical framework for context-sensitive temporal probability model construction with application to plan projection. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. 419–426.
- Poole (1993) Poole, D. 1993. Probabilistic horn abduction and Bayesian networks. Artificial Intelligence 64, 81–129.
- Poole (2003) Poole, D. 2003. First-order probabilistic inference. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03).
- Richardson and Domingos (2006) Richardson, M. and Domingos, P. 2006. Markov logic networks. Machine Learning 62, 1-2, 107 – 136.
- Sato (1995) Sato, T. 1995. A statistical learning method for logic programs with distribution semantics. In Proceedings of the 12th International Conference on Logic Programming (ICLP’95). 715–729.
- Singla et al. (2010) Singla, P., Nath, A., and Domingos, P. 2010. Approximate lifted belief propagation. In Proc. of AAAI-10 Workshop on Statistical Relational AI.
- Taskar et al. (2002) Taskar, B., Abbeel, P., and Koller, D. 2002. Discriminative probabilistic models for relational data. In Proc. of UAI 2002.
- Van den Broeck (2011) Van den Broeck, G. 2011. On the completeness of first-order knowledge compilation for lifted probabilistic inference. In Proc. of the 25th Annual Conf. on Neural Information Processing Systems (NIPS).
- Van den Broeck and Davis (2012) Van den Broeck, G. and Davis, J. 2012. Conditioning in first-order knowledge compilation and lifted probabilistic inference. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. 1961–1967.
- Van den Broeck et al. (2011) Van den Broeck, G., Taghipour, N., Meert, W., Davis, J., and Raedt, L. D. 2011. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11).
- Vennekens et al. (2006) Vennekens, J., Denecker, M., and Bruynooghe, M. 2006. Representing causal information about a probabilistic process. In Logics in Artificial Intelligence, 10th European Conference, JELIA 2006, Proceedings. Lecture Notes in Computer Science, vol. 4160. Springer, 452–464.