# Lower Complexity Bounds for Lifted Inference

One of the big challenges in the development of probabilistic relational (or probabilistic logical) modeling and learning frameworks is the design of inference techniques that operate on the level of the abstract model representation language, rather than on the level of ground, propositional instances of the model. Numerous approaches for such "lifted inference" techniques have been proposed. While it has been demonstrated that these techniques will lead to significantly more efficient inference on some specific models, there are only very recent and still quite restricted results that show the feasibility of lifted inference on certain syntactically defined classes of models. Lower complexity bounds that imply some limitations for the feasibility of lifted inference on more expressive model classes were established early on in (Jaeger 2000). However, it is not immediate that these results also apply to the type of modeling languages that currently receive the most attention, i.e., weighted, quantifier-free formulas. In this paper we extend these earlier results, and show that under the assumption that NETIME ≠ ETIME, there is no polynomial lifted inference algorithm for knowledge bases of weighted, quantifier- and function-free formulas. Further strengthening earlier results, this is also shown to hold for approximate inference, and for knowledge bases not containing the equality predicate.


## 1 Introduction

Probabilistic logic models (a.k.a. probabilistic or statistical relational models) provide high-level representation languages for probabilistic models of structured data [Breese (1992), Poole (1993), Sato (1995), Ngo et al. (1995), Jaeger (1997), Friedman et al. (1999), Kersting and Raedt (2001), Milch et al. (2005), Vennekens et al. (2006), Taskar et al. (2002), Richardson and Domingos (2006)]. While supporting model specifications at an abstract, first-order logic level, inference is typically performed at the level of concrete ground instances of the models, i.e., at the propositional level. This mismatch between model specification and inference methods has been noted early on [Jaeger (1997)], and has given rise to numerous proposals for inference techniques that operate at the high level of the underlying model specifications [Poole (2003), de Salvo Braz et al. (2005), Milch et al. (2008), Kisyński and Poole (2009), Jha et al. (2010), Gogate and Domingos (2011), Van den Broeck et al. (2011), Van den Broeck (2011), Fierens et al. (2011)]. Inference methods of this nature have collectively become known as "lifted" inference techniques.

The concept of lifted inference is mostly introduced on an informal level: "…lifted, that is, deals with groups of random variables at a first-order level" [de Salvo Braz et al. (2005)]; "The act of exploiting the high level structure in relational models is called lifted inference" [Apsel and Brafman (2011)]; "The idea behind lifted inference is to carry out as much inference as possible without propositionalizing" [Kisyński and Poole (2009)]; "lifted inference, which deals with groups of indistinguishable variables, rather than individual ground atoms" [Singla et al. (2010)]. While, thus, the term lifted inference emerges as a quite coherent algorithmic metaphor, it is not immediately obvious what its exact technical meaning should be. Since quite a variety of different algorithmic approaches are collected under the label "lifted", and since most of them can degenerate for certain models to ground, or propositional, inference, it is difficult to precisely define the class of lifted inference techniques in terms of specific algorithmic techniques employed.

A more fruitful approach is to make the concept of lifted inference more precise in terms of its objectives. Here one observes that lifted inference techniques are quite consistently evaluated, and compared against each other, by how well inference complexity scales as a function of the domain (or population) for which the general model is instantiated. Thus, empirical evaluations of lifted inference techniques are usually presented in the form of domainsize vs. inference time plots as shown in Figure 1.

Van den Broeck (2011), therefore, has proposed a formal definition of domain lifted inference in terms of polynomial time complexity in the domainsize parameter. Experimental and theoretical analyses of existing lifted inference techniques then show that they provide domain lifted inference in some cases where basic propositional inference techniques would exhibit exponential complexity (as illustrated in Figure 1). However, until recently, these positive results were mostly limited to examples of individual models, and little was known about the feasibility of lifted inference for certain well-defined classes of models. First results that show the feasibility of lifted inference for whole classes of models are given by Van den Broeck (2011) and Domingos and Webb (2012).

On the other hand, Jaeger (2000) has shown that under certain assumptions on the expressivity of the modeling language, probabilistic inference is not polynomial in the domainsize, thereby demonstrating some inherent limitations in terms of worst-case complexity for the goals of lifted inference. However, the results of Jaeger (2000) are based on types of probabilistic logic models that are somewhat different from the models that presently receive the most attention: first, they essentially assume a directed modeling framework, in which the model represents a generative stochastic process for sampling relational structures. The model is defined by specifying marginal and conditional probability distributions for random variables corresponding to ground atoms. Ground instances of the model, then, can be represented by directed graphical models, i.e., Bayesian networks. While the majority of existing model classes fall into the category of directed models [Breese (1992); Poole (1993); Sato (1995); Ngo et al. (1995); Jaeger (1997); Friedman et al. (1999); Kersting and Raedt (2001); Milch et al. (2005); Vennekens et al. (2006)], there is currently a lot of interest in undirected models that are given by a set of soft constraints on relational structures, specified in the form of potential functions, and in the ground case giving rise to undirected graphical models, i.e., Markov networks. Secondly, the results of Jaeger (2000) require quite strong assumptions on the expressivity of the probabilistic-logic modeling language, which is required to allow that conditional distributions of atoms can be specified dependent on unrestricted first-order properties. Much current work, in contrast, is concerned with languages that only incorporate certain weak fragments of first-order logic.

In this paper the general approach of Jaeger (2000) is extended to obtain lower complexity bounds for inference in probabilistic-logic model classes that have emerged as the focus of interest for lifted inference techniques, i.e., undirected models based on quantifier- and function-free fragments of first-order logic.

In sharp contrast to Jaeger (2000), where a "trivial" constant-time approximate inference method was described, we show that our lower complexity bounds also hold for approximate inference. Further sharpening earlier results, we finally establish that the lower complexity bounds also hold for models not using the equality predicate, which in Jaeger (2000) was conjectured to be the key source of inherent complexity.

A preliminary version of this paper has been published as Jaeger (2012). Its main results were also already included in the survey paper Jaeger and Van den Broeck (2012), which contains a systematic overview of known results and open problems related to the complexity of lifted inference.

In the following section we introduce a general framework in which classes of undirected probabilistic-logic models, and classes of associated inference problems can be defined. Section 3 reviews classic results relating first-order logic models to the complexity class NETIME. Section 4 contains our main results, and Section 5 discusses some notable differences that emerge between the results for directed and for undirected models.

## 2 Weighted Feature Models

As in Richardson and Domingos (2006), Van den Broeck et al. (2011), and Gogate and Domingos (2011), we assume the following framework: a model, or knowledge base, is given by a set of weighted formulas:

$$\mathit{KB}: \quad \phi_1(\mathbf{v}_1): w_1,\;\; \phi_2(\mathbf{v}_2): w_2,\;\; \ldots,\;\; \phi_N(\mathbf{v}_N): w_N \tag{1}$$

where the $\phi_i$ are formulas in first-order predicate logic, the $w_i \geq 0$ are non-negative weights, and the $\mathbf{v}_i$ are the free variables of $\phi_i$. The case $\mathbf{v}_i = \emptyset$, i.e., $\phi_i$ a sentence without free variables, is also permitted. The $\phi_i$ use a given signature $S$ of relation-, function-, and constant symbols.

An interpretation (or possible world) $I$ for $S$ consists of a domain $D$, and an interpretation function that maps the symbols in $S$ to functions, relations and elements on $D$. For a tuple $\mathbf{d}$ of domain elements the truth value of $\phi_i(\mathbf{d})$ is then defined, and we write $I \models \phi_i(\mathbf{d})$, or simply $\phi_i(\mathbf{d})$, if $\phi_i(\mathbf{d})$ is true in $I$. We use $\mathcal{I}(D,S)$ to denote the set of all interpretations for the signature $S$ over the domain $D$.

In this paper we are only concerned with finite domains, and assume without loss of generality that $D = D_n := \{1,\ldots,n\}$ for some $n \in \mathbb{N}$.

For $i = 1,\ldots,N$ let $\#(i,I)$ denote the number of tuples $\mathbf{d}$ of elements of $D_n$ for which $I \models \phi_i(\mathbf{d})$. The weight of $I$ then is

$$W_{\mathit{KB},n}(I) := \prod_{i=1}^{N} w_i^{\#(i,I)}, \tag{2}$$

where we use the convention $0^0 := 1$. The probability of $I$ is

$$P_{\mathit{KB},n}(I) = W_{\mathit{KB},n}(I)/Z,$$

where $Z$ is the normalizing constant (partition function)

$$Z = \sum_{I \in \mathcal{I}(D_n,S)} W_{\mathit{KB},n}(I). \tag{3}$$

For a first-order sentence $\phi$ and $n \in \mathbb{N}$, then

$$P_{\mathit{KB},n}(\phi) := P_{\mathit{KB},n}(\{I \in \mathcal{I}(D_n,S) \mid I \models \phi\}) \tag{4}$$

is the probability of $\phi$ under $\mathit{KB}$ at domainsize $n$.
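
For concreteness, the semantics (2)-(4) can be evaluated by brute-force enumeration of all interpretations, which is exponential in $n$ but useful for checking small cases. The following sketch is our own illustration (all function names are hypothetical, not part of the formal framework), restricted to a signature with a single relation symbol for brevity:

```python
from itertools import product

def interpretations(n, arity):
    """All interpretations of one relation of the given arity over the
    domain {0, ..., n-1}: each is a frozenset of its true tuples."""
    tuples = list(product(range(n), repeat=arity))
    for bits in product([0, 1], repeat=len(tuples)):
        yield frozenset(t for t, b in zip(tuples, bits) if b)

def weight(kb, rel, n):
    """Weight (2): product over weighted formulas of w_i ** #(i, I), where
    #(i, I) counts the tuples of domain elements satisfying formula phi_i.
    Each phi_i is a Python predicate taking (rel, *free_variables)."""
    w = 1.0
    for phi, wi in kb:
        arity = phi.__code__.co_argcount - 1  # number of free variables
        count = sum(1 for d in product(range(n), repeat=arity) if phi(rel, *d))
        w *= wi ** count
    return w

def probability(kb, query, n, arity):
    """P_{KB,n}(query) as in (4), by summation over all possible worlds."""
    z = num = 0.0
    for rel in interpretations(n, arity):
        w = weight(kb, rel, n)
        z += w
        if query(rel):
            num += w
    return num / z

# Example: a single weighted feature u(x, x) : 2 over one binary relation u.
kb = [(lambda rel, x: (x, x) in rel, 2.0)]
print(probability(kb, lambda rel: (0, 0) in rel, 1, 2))  # 2/3: worlds weigh 1 and 2
```

For $n = 1$ there are only two worlds, with weights $2^0 = 1$ and $2^1 = 2$, so $Z = 3$ and the query probability is $2/3$, matching (2)-(4) directly.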

We call a knowledge base (1) together with the semantics given by (2) and (4) a weighted feature model, since it associates weights $w_i$ with model features $\phi_i$. Weighted feature models in our sense can be seen as a slight generalization of weighted model counting (wmc) frameworks [Fierens et al. (2011); Gogate and Domingos (2011)] in which non-zero weights are only associated with literals. Knowledge bases of the form (1) can be translated into wmc frameworks via the introduction of new relation symbols $R_i$, hard constraints $R_i(\mathbf{v}_i) \leftrightarrow \phi_i(\mathbf{v}_i)$, and weighted literals $R_i(\mathbf{v}_i): w_i$ [Van den Broeck et al. (2011); Gogate and Domingos (2011)]. Up to an expansion of the signature, thus, weighted feature models and wmc are equally expressive. Markov Logic Networks [Richardson and Domingos (2006)] also are based on knowledge bases of the form (1), allowing arbitrary formulas $\phi_i$. However, the semantics of the model there depends on a transformation of the formulas into conjunctive normal form, and therefore does not exactly correspond to (2) and (4), unless the $\phi_i$ are clauses.

All types of models discussed here, thus, are very similar in nature, and only differ with respect to certain restrictions on which types of logically defined features can be associated with a weight. The general definition of weighted feature models gives us the flexibility of considering a variety of classes of such restrictions.

A probabilistic inference problem $\mathit{PI}(\mathit{KB},n,\chi,\eta)$ for a weighted feature model is given by a knowledge base KB, a domainsize $n$, and two first-order sentences $\chi$ (the query) and $\eta$ (the evidence). The solution to the inference problem is the conditional probability $P_{\mathit{KB},n}(\chi \mid \eta)$.

A class of inference problems is defined by allowing arguments KB, $\chi$, and $\eta$ only from some restricted classes $\mathcal{K}$, $\mathcal{Q}$ (the query class), and $\mathcal{E}$ (the evidence class), respectively. We use the notation

$$\mathit{PI}(\mathcal{K},\mathcal{Q},\mathcal{E}) := \{\mathit{PI}(\mathit{KB},n,\chi,\eta) \mid \mathit{KB} \in \mathcal{K},\, n \in \mathbb{N},\, \chi \in \mathcal{Q},\, \eta \in \mathcal{E}\}$$

for classes of inference problems.

The results of this paper will be given for the case where $\mathcal{Q}$ consists of all ground atoms, and $\mathcal{E}$ is empty (i.e., queries are unconditional). Thus, as far as $\mathcal{Q}$ and $\mathcal{E}$ are concerned, we are considering the most restrictive class of inference problems. Since we are deriving lower complexity bounds, this leads to the strongest possible results, which directly apply also to more general classes $\mathcal{Q}$ and $\mathcal{E}$.

Classes $\mathcal{K}$ are defined by various syntactic restrictions on the formulas $\phi_i$ in the knowledge base. In this paper, we consider the following fragments of first-order logic (FOL): relational FOL (RFOL), i.e., FOL without function and constant symbols; 0-RFOL, which is the quantifier-free fragment of RFOL; and 0-RFOL≠, which is 0-RFOL without the equality relation.

An algorithm solves a class $\mathit{PI}(\mathcal{K},\mathcal{Q},\mathcal{E})$ if it solves all instances $\mathit{PI}(\mathit{KB},n,\chi,\eta)$ in the class. An algorithm $\varepsilon$-approximately solves $\mathit{PI}(\mathcal{K},\mathcal{Q},\mathcal{E})$ if for any instance in the class it returns a number $\tilde{p}$ with $|\tilde{p} - P_{\mathit{KB},n}(\chi \mid \eta)| \leq \varepsilon$. An algorithm that solves $\mathit{PI}(\mathcal{K},\mathcal{Q},\mathcal{E})$ is polynomial in the domainsize if for fixed KB, $\chi$, $\eta$ the computation of $P_{\mathit{KB},n}(\chi \mid \eta)$ is polynomial in $n$.

## 3 Spectra and Complexity

The following definition introduces the central concept for our analysis.

Let $\phi$ be a sentence in first-order logic. The spectrum of $\phi$, denoted $\mathrm{spectrum}(\phi)$, is the set of integers $n$ for which $\phi$ is satisfiable by an interpretation of size $n$.

**Example 3.** Let $\psi \equiv \psi_1 \wedge \psi_2 \wedge \psi_3$, where

$$\psi_1 \equiv \forall x,y\; (u(x,y) \Leftrightarrow u(y,x))$$
$$\psi_2 \equiv \forall x\, \exists y\; (y \neq x \wedge u(x,y))$$
$$\psi_3 \equiv \forall x,y,y'\; (u(x,y) \wedge u(x,y') \Rightarrow y = y')$$

Here $\psi_1$ expresses that the binary relation $u$ defines an undirected graph, in which every node is connected to exactly one other node ($\psi_2$, $\psi_3$). Thus, $\psi$ describes a pairing relation that is satisfiable exactly over domains of even size: $\mathrm{spectrum}(\psi) = \{2,4,6,\ldots\}$.
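
The spectrum of the pairing sentence can be verified by exhaustive model search over small domain sizes. The following sketch is our own illustration (function names are hypothetical); it enumerates every interpretation of $u$ and checks $\psi_1 \wedge \psi_2 \wedge \psi_3$:

```python
from itertools import product

def models_psi(u, n):
    """Check psi_1 (symmetry), psi_2 (every x has a partner y != x), and
    psi_3 (at most one partner) for relation u over the domain {0,...,n-1}."""
    sym = all(((y, x) in u) == ((x, y) in u) for x in range(n) for y in range(n))
    total = all(any(y != x and (x, y) in u for y in range(n)) for x in range(n))
    unique = all(sum(1 for y in range(n) if (x, y) in u) <= 1 for x in range(n))
    return sym and total and unique

def in_spectrum(n):
    """n is in spectrum(psi) iff some interpretation of u over n elements models psi."""
    pairs = list(product(range(n), repeat=2))
    return any(models_psi({p for p, b in zip(pairs, bits) if b}, n)
               for bits in product([0, 1], repeat=len(pairs)))

# Only the even domain sizes admit a pairing:
print([n for n in range(1, 5) if in_spectrum(n)])  # expect [2, 4]
```

The search is doubly exponential in $n$ and only feasible for tiny domains, but it makes the even-size spectrum of $\psi$ directly checkable.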

The complexity class ETIME consists of problems solvable in time $2^{cn}$, for some constant $c$. The corresponding nondeterministic class is NETIME. Note that these classes are distinct from the more commonly studied classes (N)EXPTIME, which are characterized by complexity bounds $2^{n^c}$ [Johnson (1990)]. For $n \in \mathbb{N}$ let $\mathit{bin}(n)$ denote the binary coding of $n$, and $\mathit{un}(n)$ the unary coding (i.e., $n$ is represented as a sequence of $n$ 1s). A set $A \subseteq \mathbb{N}$ is in (N)ETIME iff $\{\mathit{bin}(n) \mid n \in A\}$ is in (N)ETIME, which also is equivalent to $\{\mathit{un}(n) \mid n \in A\}$ being in (N)PTIME.

Like Jaeger (2000), we use the following connection between spectra and NETIME as the key tool for our complexity analysis.

**Theorem 3** (Jones and Selman 1972). A set $A \subseteq \mathbb{N}$ is in NETIME iff $A$ is the spectrum of a sentence $\phi \in$ RFOL.

**Corollary 3.** If NETIME ≠ ETIME, then there exists a first-order sentence $\phi \in$ RFOL such that $\mathrm{spectrum}(\phi)$ (with the input $n$ given in unary coding) is not recognized in deterministic polynomial time.

Thus, by reducing instances $n$ of the spectrum recognition problem for $\phi$ to probabilistic inference problems $\mathit{PI}(\mathit{KB},n,\chi,\eta)$, where $\mathit{KB}$, $\chi$, $\eta$ are fixed for the given $\phi$, one establishes that $\mathit{PI}(\mathit{KB},n,\chi,\eta)$ is not polynomial in the domainsize (under the assumption NETIME ≠ ETIME).

## 4 Complexity Results

This section contains our complexity results. We begin with a result for knowledge bases using full RFOL. This is rather straightforward, and (for exact inference) already implied by the results of Jaeger (2000). We then proceed to extend this base result to 0-RFOL and 0-RFOL≠.

### 4.1 Base Result: the RFOL Case

**Theorem 4.1.** If NETIME ≠ ETIME, then there does not exist an algorithm that 0.25-approximately solves $\mathit{PI}(\mathcal{K},\mathcal{Q},\mathcal{E})$, with $\mathcal{K}$ the class of RFOL knowledge bases, $\mathcal{Q}$ the ground atoms, and $\mathcal{E}$ empty, in time polynomial in the domainsize.

The proof of this theorem provides the general pattern also for subsequent proofs. It is therefore here given in full.

Proof. Let $\phi$ be a sentence with a non-polynomial spectrum as given by Corollary 3. Let $S$ be the relational signature of $\phi$. Let $a$ be a new relation symbol of arity zero (i.e., $a$ represents a propositional variable). The first weighted formula in our knowledge base then is

$$\neg(\phi \leftrightarrow a()) : 0 \tag{5}$$

We now already have that $P_{\mathit{KB},n}(a()) > 0$ iff there exists $I \in \mathcal{I}(D_n, S \cup \{a\})$ with $I \models \phi$, i.e., iff $n \in \mathrm{spectrum}(\phi)$. This already reduces the decision problem for $\mathrm{spectrum}(\phi)$ to solving the inference problem for the query $a()$ exactly. However, from the 0-1 laws of first-order logic [Fagin (1976)], it follows that for our current KB the probabilities $P_{\mathit{KB},n}(a())$ converge to 0 or to 1 as $n \to \infty$. Thus, for every $\varepsilon > 0$ we could define an $\varepsilon$-approximate constant-time inference algorithm by returning this limit value for all sufficiently large $n$.

In order to obtain our result for approximate inference, we will now ensure that for all $n \in \mathrm{spectrum}(\phi)$ the probability $P_{\mathit{KB},n}(a())$ is at least 0.5, while it remains zero for $n \notin \mathrm{spectrum}(\phi)$. We do this essentially by calibrating the normalization constant $Z$ in (3). For this we introduce another new zero-ary relation $b$, and add to KB:

$$b() \wedge r(x_1,\ldots,x_k) : 0 \qquad \text{for every relation symbol } r \in S \cup \{a\} \text{ of arity } k \tag{6}$$

Thus, for every $n$ there is exactly one interpretation $I \in \mathcal{I}(D_n, S \cup \{a,b\})$ with nonzero weight in which $b()$ is true (the one in which all relations have empty interpretations). Finally, we give zero weight to all interpretations except those in which $a()$ or $b()$ is true:

$$\neg(a() \vee b()) : 0 \tag{7}$$

Let KB consist of (5), (6), (7). Every $I \in \mathcal{I}(D_n, S \cup \{a,b\})$ then has weight 0 if it satisfies an instantiation of one of the three formulas, and weight 1 otherwise. Consider the case $n \notin \mathrm{spectrum}(\phi)$. Then, by (5), $a()$ must be false in all interpretations of nonzero weight. By (7) this then means that in all interpretations of nonzero weight $b()$ must be true. By (6) there is exactly one such interpretation. Thus, $Z$ in (3) is 1, and $P_{\mathit{KB},n}(a()) = 0$.

If $n \in \mathrm{spectrum}(\phi)$, then there are $m \geq 1$ interpretations of nonzero weight in which $a()$ is true, and $Z = m$ (if the interpretation in which all relations are empty also is a model of $\phi$), or $Z = m + 1$ (otherwise). Thus, $P_{\mathit{KB},n}(a()) \geq m/(m+1) \geq 1/2$. A 0.25-approximate inference algorithm for the query $a()$, thus, would decide $n \in \mathrm{spectrum}(\phi)$.
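
The construction in this proof can be replayed mechanically for small $n$ by instantiating $\phi$ with the pairing sentence $\psi$ of Example 3 (this choice is ours for illustration; the proof uses a sentence from Corollary 3). The sketch below brute-forces $P_{\mathit{KB},n}(a())$ for the knowledge base (5)-(7) and exhibits the two cases of the proof, $P = 0$ off the spectrum and $P \geq 1/2$ on it:

```python
from itertools import product

def phi_pairing(u, n):
    """The sentence psi of Example 3: u is a perfect pairing of the domain."""
    return (all(((y, x) in u) == ((x, y) in u) for x in range(n) for y in range(n))
            and all(any(y != x and (x, y) in u for y in range(n)) for x in range(n))
            and all(sum(1 for y in range(n) if (x, y) in u) <= 1 for x in range(n)))

def p_a(n):
    """P_{KB,n}(a()) for KB = {(5), (6), (7)}: every world has weight 1
    unless one of the zero-weighted formulas fires."""
    pairs = list(product(range(n), repeat=2))
    z = num = 0
    for bits in product([0, 1], repeat=len(pairs)):
        u = {p for p, b in zip(pairs, bits) if b}
        for a, b in product([False, True], repeat=2):
            if phi_pairing(u, n) != a:  # (5): not(phi <-> a()) zero-weights the world
                continue
            if b and (u or a):          # (6): b() together with any other true atom
                continue
            if not (a or b):            # (7): neither a() nor b() is true
                continue
            z += 1
            num += int(a)
    return num / z

# Off the spectrum (odd n): 0; on it: >= 1/2 (n=2: m=1, n=4: m=3 matchings).
print([p_a(n) for n in range(1, 5)])  # expect [0.0, 0.5, 0.0, 0.75]
```

For $n = 4$ there are $m = 3$ perfect pairings plus the single $b()$-world, giving $P = 3/4$, in line with the bound $m/(m+1) \geq 1/2$.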

### 4.2 The 0-RFOL Case

We now proceed towards our main result, which is going from RFOL to 0-RFOL. If we wanted to allow function and constant symbols in our knowledge base, then we could move to a quantifier-free fragment in a quite straightforward manner using Skolemization. Since satisfiability over a given domain is the same for a formula $\phi$ and its quantifier-free Skolemized version $\phi^{\mathit{Skol}}$, the arguments of the proof of Theorem 4.1 would go through with little change. In order to accomplish the same using only the relational fragment 0-RFOL, we define the relational Skolemization of a formula. The idea is to replace function and constant symbols in the Skolemized version of a formula with relational representations. For example, the Skolemized version of $\psi_2$ from Example 3 is

$$\psi_2^{\mathit{Skol}} \equiv \forall x\; (f(x) \neq x \wedge u(x, f(x)))$$

with a new function symbol $f$. Introducing a relational encoding of $f$ leads to

$$\psi_2^{R\text{-}\mathit{Skol}} \equiv \forall x,y\; \big(R_f(x,y) \rightarrow (y \neq x \wedge u(x,y))\big)$$

with $R_f$ a new binary relation symbol encoding $f$. This translation must be accompanied by axioms that confine the possible interpretations of $R_f$ to relations that encode functions.

Such relational encodings of functions are well established. However, there does not seem to be a standard account of this technique that serves our purpose. The following proposition, therefore, provides the relevant result in a form tailored for our needs.

**Proposition 4.2.** Let $S = S_r \cup S_f$, where $S_r$ is a set of relation symbols, and $S_f$ a set of function and constant symbols. Let $S_R$ be a set of new relation symbols that for every $k$-ary $f \in S_f$ contains a $(k+1)$-ary $R_f$ (constant symbols are treated as 0-ary function symbols). Let Func be the set of sentences that for every $f \in S_f$ contains

$$\forall \mathbf{x}, y, y'\; \big(R_f(\mathbf{x},y) \wedge R_f(\mathbf{x},y') \rightarrow y = y'\big) \tag{8}$$
$$\forall \mathbf{x}\, \exists y\; R_f(\mathbf{x},y). \tag{9}$$

Then for every quantifier-free formula $\phi(\mathbf{x})$ over $S$ there exists a formula $\phi^R(\mathbf{x},\mathbf{z}) \in$ 0-RFOL over $S_r \cup S_R$, such that the following are equivalent for all $n$:

(i) there exists $I \in \mathcal{I}(D_n, S)$ with $I \models \forall \mathbf{x}\, \phi(\mathbf{x})$;

(ii) there exists $I \in \mathcal{I}(D_n, S_r \cup S_R)$ with $I \models \mathit{Func} \cup \{\forall \mathbf{x}, \mathbf{z}\; \phi^R(\mathbf{x},\mathbf{z})\}$.

If $\phi^{\mathit{Skol}}$ is the Skolemization of a formula $\phi \in$ RFOL, we then call $(\phi^{\mathit{Skol}})^R$ the relational Skolemization of $\phi$, written $\phi^{R\text{-}\mathit{Skol}}$.
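
The equisatisfiability stated by the proposition can be checked exhaustively for the running example $\psi_2$ over small domains. This is a brute-force sketch under our own naming conventions (the helper names are hypothetical); it compares satisfiability of $\psi_2$ against that of $\mathit{Func} \wedge \psi_2^{R\text{-}\mathit{Skol}}$ for $n = 1, 2, 3$:

```python
from itertools import product

def sat_psi2(n):
    """Is psi_2 = forall x exists y (y != x and u(x,y)) satisfiable over n elements?"""
    pairs = list(product(range(n), repeat=2))
    for bits in product([0, 1], repeat=len(pairs)):
        u = {p for p, b in zip(pairs, bits) if b}
        if all(any(y != x and (x, y) in u for y in range(n)) for x in range(n)):
            return True
    return False

def sat_rel_skolem(n):
    """Is Func and forall x,y (R_f(x,y) -> (y != x and u(x,y))) satisfiable?"""
    pairs = list(product(range(n), repeat=2))
    subsets = [frozenset(p for p, b in zip(pairs, bits) if b)
               for bits in product([0, 1], repeat=len(pairs))]
    for u in subsets:
        for rf in subsets:
            # (8) and (9) together: R_f relates every x to exactly one y
            func = all(sum(1 for y in range(n) if (x, y) in rf) == 1
                       for x in range(n))
            # body of psi_2^{R-Skol}
            body = all(y != x and (x, y) in u for (x, y) in rf)
            if func and body:
                return True
    return False

# Equisatisfiability over each domain size, as asserted by the proposition:
print([(n, sat_psi2(n), sat_rel_skolem(n)) for n in (1, 2, 3)])
# expect [(1, False, False), (2, True, True), (3, True, True)]
```

The two satisfiability tests agree on every domain size, which is exactly the (i) iff (ii) claim specialized to $\psi_2$.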

Our plan, now, is to prove the analogue of Theorem 4.1 for 0-RFOL by replacing $\phi$ in (5) with $\phi^{R\text{-}\mathit{Skol}}$. However, this is not enough, since we also need to constrain the models of our knowledge base (more precisely: those models in which $a()$ is true) to satisfy the axioms (8) and (9). This poses a problem, because (9) contains an existential quantifier, and so we cannot add this axiom directly as a constraint to a knowledge base restricted to 0-RFOL. Indeed, we almost seem to have gone full circle, since we are back at knowledge bases in a relational vocabulary with existential quantification! However, we have now reduced arbitrary occurrences of existential quantifiers to occurrences only within the special formulas (9).

Our strategy, now, is to approximate the formulas (9) with weighted formulas of the form

$$a() \wedge R_f(\mathbf{x},y) : w \tag{10}$$

that reward models of $a()$ in which the existential quantifier of (9) is satisfied for many (ideally all) $\mathbf{x}$. We will no longer be able to ensure that $P_{\mathit{KB},n}(a()) = 0$ when $n \notin \mathrm{spectrum}(\phi)$. However, by a suitable choice of $w$, and by a careful calibration of the weight of models of the alternative proposition $b()$, we still can ensure that $P_{\mathit{KB},n}(a())$ is small when $n \notin \mathrm{spectrum}(\phi)$ and large when $n \in \mathrm{spectrum}(\phi)$, with a gap of more than 0.4 between the two cases (so that a 0.2-approximation still separates them). However, the right calibration of the weights of models of $a()$ and $b()$ will now require that one sets $w$ to a value $w(n)$ depending on $n$.

This means that we can no longer reduce the decision problem $n \in \mathrm{spectrum}(\phi)$ to a probabilistic inference problem for a fixed knowledge base KB. We only achieve a reduction to an inference problem where the logical structure of KB is fixed, but a weight parameter $w(n)$ depends on $n$. Generally, for a knowledge base KB containing $N$ weighted formulas, we denote with $\mathit{KB}(\mathbf{w})$ the knowledge base that contains the same formulas as KB, but with the weights set to the values $\mathbf{w} = (w_1,\ldots,w_N)$.

To translate the lower complexity bounds of the original spectrum recognition problem into lower complexity bounds for the resulting inference problem, one now has to be precise about the representation of the inference problem. To this end, we assume that weights $w$ are rational numbers, represented by pairs $(p,q)$ of integers, so that $w = p/q$. We then define the representation size as $l(w) := \log(p) + \log(q)$. The total representation size of the weight parameters $\mathbf{w} = (w_1,\ldots,w_N)$ in a knowledge base is $l(\mathbf{w}) := \sum_{i=1}^N l(w_i)$. An inference algorithm for probabilistic inference problems is polynomial in the domainsize and the representation size of the weight parameters, if for any KB, $\chi$, $\eta$ the class of inference problems $\{\mathit{PI}(\mathit{KB}(\mathbf{w}),n,\chi,\eta) \mid n \in \mathbb{N},\, \mathbf{w}\}$ can be solved in time bounded by a polynomial in $n$ and $l(\mathbf{w})$. We can now state the following theorem:
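
The definition of $l(\mathbf{w})$ can be made concrete as follows. We read $\log(p) + \log(q)$ as bit lengths, which is one possible convention and an assumption on our part (constant offsets are irrelevant for polynomiality). The point of the sketch is that a weight such as $1/2^n$, which appears numerically extreme, still has representation size only linear in $n$:

```python
from fractions import Fraction

def l(w: Fraction) -> int:
    """Representation size of a rational weight w = p/q, read as the bit
    lengths of numerator and denominator (minimum one bit each)."""
    return (max(1, abs(w.numerator).bit_length())
            + max(1, w.denominator.bit_length()))

def l_vec(ws) -> int:
    """Total representation size l(w) of a weight vector."""
    return sum(l(w) for w in ws)

# The calibrated weight 1/2^10 needs only 1 + 11 = 12 bits:
print(l(Fraction(1, 2 ** 10)))  # 12
```

This is why the reduction behind Theorem 4.2, which uses weights $w(n)$ whose representation size is polynomial in $n$, remains a polynomial-time reduction.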

**Theorem 4.2.** If NETIME ≠ ETIME, then there does not exist an algorithm that 0.2-approximately solves probabilistic inference for knowledge bases of 0-RFOL formulas, with ground atom queries and empty evidence, in time polynomial in the domainsize and the representation size of the weight parameters.

The full proof of the theorem is given in the appendix. It consists of a polynomial-time reduction of the $n \in \mathrm{spectrum}(\phi)$ decision problem to a probabilistic inference problem for the query $a()$ with weights $\mathbf{w}(n)$, where $l(\mathbf{w}(n))$ is polynomial in $n$. An inference algorithm that can solve this problem in time polynomial in the domainsize and $l(\mathbf{w}(n))$, thus, would yield a polynomial decision procedure for $\mathrm{spectrum}(\phi)$.

### 4.3 Polynomiality in $l(\mathbf{w})$

One may wonder how strong or surprising Theorem 4.2 really is in light of its extra "polynomial in $l(\mathbf{w})$" runtime condition. It has previously been emphasized that lifted inference procedures should only be expected to be polynomial in the domainsize, but not in other parameters that characterize the complexity of KB [Jaeger (2000); Van den Broeck (2011)]. These remarks, however, have mostly been motivated by considerations of the logical complexity of KB, e.g., in terms of the number and complexity of its weighted formulas, or the size of the signature. The complexity in terms of numerical parameters, on the other hand, has not received much attention.

To better understand the nature of the condition of being polynomial in the domainsize and $l(\mathbf{w})$, we have to look a little closer at how the parameters affect the complexity of the computation. We consider algorithms that can be described as follows: to compute $P_{\mathit{KB}(\mathbf{w}),n}(\chi \mid \eta)$ the algorithm performs a number of steps $L$, where step $i$ consists either of executing a constant-time operation that does not depend on the numerical model parameters (e.g., a logical operation on formulas), or of a basic operation on numerical parameters.

We consider the executions the algorithm performs on inputs with fixed logical structure KB, and fixed $\chi$, $\eta$, but varying weight parameters $\mathbf{w}$ and domainsizes $n$. Let $V(i,n,\mathbf{w})$ denote the set of all numerical variables stored by the algorithm before performing step $i$, when it is run on inputs $\mathit{KB}(\mathbf{w}), n, \chi, \eta$. Thus, $V(i,n,\mathbf{w})$ comprises the original weight parameters of the model, as well as computed intermediate results, etc. We now make two basic assumptions on the algorithm:

(A1)

The weight parameters $\mathbf{w}$ only influence the numerical values of the variables stored in $V(i,n,\mathbf{w})$, but not the sequence of execution steps performed by the algorithm. In particular, the number of execution steps performed by the algorithm only depends on $n$: $L = L(n)$.

(A2)

The basic operations performed on numerical variables are polynomial time in the size of their arguments, and they produce an output whose size is linear in the size of the inputs. This is the case for the basic arithmetic operations addition and multiplication, for example.

The total representation size of $V(i,n,\mathbf{w})$ then is bounded by $c_n(i) \cdot l(\mathbf{w})$, where $c_n(i)$ is a coefficient not depending on $\mathbf{w}$. Also, let $q$ be a polynomial that provides a common complexity bound for the basic numerical operations that can be performed at one step. The total execution time of the algorithm on input $\mathit{KB}(\mathbf{w}), n, \chi, \eta$ then is bounded by

$$\sum_{i=1}^{L(n)} q\big(c_n(i)\, l(\mathbf{w})\big). \tag{11}$$

If, now, for fixed weight vectors $\mathbf{w}$ the algorithm is polynomial in $n$ (equivalently: the algorithm is polynomial in $n$ under a computation model where basic numeric operations are constant time), then $L(n)$ and the $c_n(i)$ must be polynomially bounded in $n$. The combined complexity (11) then, in fact, is polynomial both in $n$ and $l(\mathbf{w})$.

In summary, this shows: an algorithm that for fixed $\mathbf{w}$ is polynomial in $n$, and that satisfies assumptions (A1) and (A2), actually is polynomial in $n$ and $l(\mathbf{w})$. Thus, for this type of algorithm, the additional restriction of Theorem 4.2 compared to Theorem 4.1 is insignificant.

The remaining question, then, is how restrictive or realistic assumptions (A1) and (A2) actually are. For exact inference algorithms it appears that (A1) and (A2) are satisfied by all existing approaches, with a small qualification: algorithms might give special treatment to special weight parameters, such as $w_i = 0$ or $w_i = 1$, which then can lead to a violation of (A1) in the strict sense. However, our analysis could also be performed based on a weakened form of (A1) that allows certain special weights to influence the computation differently from generic numerical weights. A slightly more elaborate argument would then arrive at essentially the same conclusions.

The situation is less clear for approximate inference algorithms. Here the numerical values stored in $V(i,n,\mathbf{w})$ may influence the algorithm in multiple ways: for example, they can be used to test a termination condition, or to decide which computations to perform next in order to improve the approximation bounds derived so far. In all such cases, the model weights $\mathbf{w}$ can have an impact on the sequence and the total number of execution steps, and (A1) is not satisfied. Thus, even though the theorem also applies to approximate inference, its implications for the construction of approximate inference algorithms may be less severe, since there might be reasonable ways to build approximate inference algorithms that are polynomial in $n$, without also being polynomial in $l(\mathbf{w})$.

### 4.4 The 0-RFOL≠ Case

In a final strengthening of our results, we now move on to the fragment 0-RFOL≠. The availability of the equality predicate in the formulas of KB, so far, has been an important prerequisite for our arguments, because Theorem 3 crucially depends on equality: spectra of first-order formulas without equality are always of the form $\{n \mid n \geq n_0\}$ for some $n_0$, and, thus, decidable in constant time. For this reason it was suggested in Jaeger (2000) that one should focus on logical fragments without equality when looking for model classes for which lifted inference scales polynomially in the domainsize. As our final result shows, however, the elimination of equality may not have such a large impact on complexity, after all.

**Theorem 4.4.** If NETIME ≠ ETIME, then there does not exist an algorithm that 0.2-approximately solves probabilistic inference for knowledge bases of 0-RFOL≠ formulas, with ground atom queries and empty evidence, in time polynomial both in the domainsize and the representation size of the weight parameters.

This theorem is a generalization of Theorem 4.2, and, strictly speaking, makes Theorem 4.2 redundant. It is only for expository purposes, and for greater transparency of the proof arguments, that we develop these results here in two steps.

The proof of Theorem 4.4 is a refinement of the proof of Theorem 4.2. In addition to approximating Skolem functions $f$ with relations $R_f$, we now also approximate the equality predicate with a new binary relation $E$. Just as we could not impose in 0-RFOL hard constraints ensuring that $R_f$ encodes a function, we also cannot constrain models to always interpret $E$ as the equality relation. However, just as with (8) and (10) we rewarded interpretations with functional $R_f$, we can penalize interpretations in which $E$ is not true equality by means of the two weighted formulas

$$a() \wedge \neg E(x,x) : 0 \tag{12}$$
$$a() \wedge E(x,y) : 1/w \tag{13}$$

where $w$ is a large weight.

## 5 Approximate Inference, Convergence, and Evidence

There are some notable differences with respect to approximate inference between the results we obtained here for weighted model counting, and the results of Jaeger (2000). In Jaeger (2000) it was shown that due to the convergence of query probabilities as $n \to \infty$, in theory a trivial constant-time approximation algorithm exists: perform exact inference for all input domains up to a size $n_0$, and output the limit probability for all domains of size $n > n_0$. This "algorithm", however, has no practical use, since for a desired accuracy value $\varepsilon$ one first would have to determine a sufficiently high threshold value $n_0$ to make the output indeed be an $\varepsilon$-approximation.

Nevertheless, the difference between the existence of an impractical approximation algorithm on the one hand, and the non-existence of any approximation algorithm on the other hand, is just one consequence of a more fundamental difference: while in the models considered in Jaeger (2000) query probabilities converge to a limit, this is not necessarily the case for knowledge bases of weighted formulas, at least when full RFOL is allowed: in the proof of Theorem 4.1 we have constructed knowledge bases KB such that $P_{\mathit{KB},n}(a())$ oscillates between zero and values $\geq 1/2$ as $n$ oscillates between $\mathrm{spectrum}(\phi)$ and its complement. The construction of knowledge bases with this behavior does not require formulas $\phi$ with a non-polynomial spectrum as in Corollary 3, and is not contingent on NETIME ≠ ETIME. Already a knowledge base as constructed in the proof of Theorem 4.1, with $\phi$ replaced by the pairing sentence $\psi$ of Example 3, will show this behavior.

The reason behind these different convergence properties lies in the somewhat different role that conditioning on evidence plays in directed and undirected models: in the former, a conditional probability $P(\chi \mid \eta)$ defined by a model can, in general, not be expressed as an unconditional probability $P'(\chi)$ in a modified model. As a result, the convergence guarantees and (theoretical) approximability for certain classes of unconditional queries do not carry over to conditional queries.

For weighted feature knowledge bases KB, on the other hand, there is no fundamental difference between unconditional and conditional queries $P_{\mathit{KB},n}(\chi)$ and $P_{\mathit{KB},n}(\chi \mid \eta)$, respectively. To reduce conditional to unconditional queries, one can just add to KB the hard constraint $\neg\eta : 0$ to obtain $\mathit{KB}'$ with $P_{\mathit{KB},n}(\chi \mid \eta) = P_{\mathit{KB}',n}(\chi)$. This means that as long as $\mathcal{E}$ is not more expressive than the formula class underlying $\mathcal{K}$, the problem classes $\mathit{PI}(\mathcal{K},\mathcal{Q},\emptyset)$ and $\mathit{PI}(\mathcal{K},\mathcal{Q},\mathcal{E})$ have the same characteristics in terms of complexity as a function of the domainsize. Note, though, that this is only true when we consider complexity strictly as a function of $n$ for fixed evidence $\eta$. If the evidence is allowed to change with the domainsize, i.e., $\eta = \eta(n)$, then even in cases where restrictions on $\mathcal{K}$ make inference polynomial in $n$, one can define sequences of inference problems $\mathit{PI}(\mathit{KB},n,\chi,\eta(n))$ that are no longer polynomial in $n$ [Van den Broeck and Davis (2012)].
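
The reduction of conditional to unconditional queries via the hard constraint $\neg\eta : 0$ can be checked numerically on a toy weighted feature model. The knowledge base and all names below are our own illustrative choices (one unary relation $r$, one weighted feature $(r(x) \vee r(y)) : 2$, query $r(0)$, evidence $r(1)$):

```python
from itertools import product

def worlds(n):
    """All interpretations of a single unary relation r over {0,...,n-1}."""
    for bits in product([0, 1], repeat=n):
        yield {x for x, b in enumerate(bits) if b}

def w_kb(r, n):
    """Weight under the toy KB { (r(x) or r(y)) : 2 }: 2 ** #(satisfying pairs)."""
    return 2.0 ** sum(1 for x, y in product(range(n), repeat=2) if x in r or y in r)

n = 2

# Conditional query P(r(0) | r(1)) under KB, computed directly:
z = sum(w_kb(r, n) for r in worlds(n) if 1 in r)
p_cond = sum(w_kb(r, n) for r in worlds(n) if 0 in r and 1 in r) / z

# KB' = KB + hard constraint (not r(1)) : 0 absorbs the evidence:
def w_kb_prime(r, n):
    return w_kb(r, n) if 1 in r else 0.0

zp = sum(w_kb_prime(r, n) for r in worlds(n))
p_uncond = sum(w_kb_prime(r, n) for r in worlds(n) if 0 in r) / zp

print(p_cond, p_uncond)  # both 2/3
```

The hard constraint simply zeroes out all worlds violating the evidence, so the renormalized unconditional probability coincides with the conditional one; this works for any weighted feature KB, which is exactly why the complexity characteristics transfer.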

## 6 Conclusion

We have shown that for the currently quite popular probabilistic-logic models consisting of collections of weighted, quantifier- and function-free formulas there is likely to be no general polynomial lifted inference method (contingent on ). Somewhat surprisingly, this even holds for approximate inference. Between this negative result and the positive result of Van den Broeck (2011), there could still be a lot of room for identifying tractable fragments by restricting 0-RFOL further via limits on the number of variables, or the richness of the signature .

## Appendix A Proofs

Proof of Proposition 4.2

We begin by defining the term depth of a term  in the signature  as the maximal nesting depth of function symbols in . Precisely, we define inductively: if  is a variable, then  has term depth 0. If  (a constant), or  (a function term with only variables as arguments), then  has term depth 1. If , then the term depth of  is one plus the maximal term depth of the arguments .

The term depth of a formula  is the maximal term depth of the terms it contains.
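The inductive definition above can be sketched in a few lines of code; the tuple encoding of terms is an assumption made only for this illustration:

```python
# Terms are modeled as strings (variables) or tuples
# (function_symbol, arg1, ..., argk); a constant is a function
# symbol with no arguments.
def term_depth(term):
    if isinstance(term, str):      # a variable has term depth 0
        return 0
    _, *args = term
    if not args:                   # a constant has term depth 1
        return 1
    # a function term: one plus the maximal term depth of the arguments
    return 1 + max(term_depth(t) for t in args)

def formula_term_depth(terms):
    """Term depth of a formula: the maximum over the terms it contains."""
    return max(map(term_depth, terms), default=0)

# f(g(x), y) has term depth 2; x has depth 0; the constant c has depth 1.
assert term_depth(("f", ("g", "x"), "y")) == 2
assert term_depth("x") == 0
assert term_depth(("c",)) == 1
```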

We now show that every formula  of term depth  can be transformed into a formula  of term depth  in 0-FOL, such that the statement for  of the proposition holds for  (but with  instead of  in ii). The proposition then follows by defining  as the result of iteratively applying  such transformations to . Since the term depth of the resulting  is zero, it actually lies in 0-RFOL.

Let  be the set of all distinct terms (including sub-terms) of depth 1 appearing in . Let  be new variables. Define  as

$$\bigwedge_{i=1}^{r} R_{f_i}(\mathbf{x}_i, z_i) \rightarrow \phi(\mathbf{x})[z_1/f_1(\mathbf{x}_1), \ldots, z_r/f_r(\mathbf{x}_r)]$$

To now show ii, let  with . Define  as the expansion of  in which each  is interpreted as the relational representation of , i.e.,  iff . Clearly, . Furthermore, the following are equivalent:

$$\begin{aligned}
& I \models \forall \mathbf{x}\; \phi(\mathbf{x}) \\
\Leftrightarrow\;\; & I \models \forall \mathbf{x}\,\mathbf{z}\; \textstyle\bigwedge_{i=1}^{r} f_i(\mathbf{x}_i) = z_i \rightarrow \phi(\mathbf{x})[z_1/f_1(\mathbf{x}_1), \ldots, z_r/f_r(\mathbf{x}_r)] \\
\Leftrightarrow\;\; & I^{+} \models \forall \mathbf{x}\,\mathbf{z}\; \textstyle\bigwedge_{i=1}^{r} R_{f_i}(\mathbf{x}_i, z_i) \rightarrow \phi(\mathbf{x})[z_1/f_1(\mathbf{x}_1), \ldots, z_r/f_r(\mathbf{x}_r)]
\end{aligned}$$

For iii let  as in ii be given. Since , we can turn  into an interpretation for  by defining  as the unique  for which  holds in . Then, by the same equivalences as above,  implies .
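The equivalence underlying this transformation can be checked concretely on a tiny semantic example; the two-element domain, the function, and the property below are assumptions chosen only for illustration:

```python
from itertools import product

# A unary function f on a two-element domain, its relational
# representation R_f(x, z) <=> f(x) = z, and a unary property phi.
domain = [0, 1]
f = {0: 1, 1: 1}                   # an arbitrary function interpretation
R_f = {(x, z): f[x] == z for x in domain for z in domain}
phi = {0: False, 1: True}          # phi holds exactly on element 1

# I |= forall x: phi(f(x))
original = all(phi[f[x]] for x in domain)

# I+ |= forall x, z: R_f(x, z) -> phi(z)
flattened = all((not R_f[x, z]) or phi[z]
                for x, z in product(domain, domain))

# Because R_f is the (functional) relational representation of f,
# the flattened formula holds exactly when the original does.
assert original == flattened
```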

Proof of Theorem 4.2. Let  RFOL be as given by Corollary 3, and  its relational Skolemization. Let  be the original signature of , and  the relation symbols introduced in the relational Skolemization. Furthermore, for each -ary  we introduce a new -ary relation . These new symbols will be used to calibrate the weight of models for the reference proposition . Note that the arity of symbols in  is at least 1, so that  is well-defined, but may contain relations of arity 0. We denote with  the collection of all the introduced  symbols. We now reduce the spectrum recognition problem for  to probabilistic inference from a knowledge base in the signature .

The first formula in our knowledge base is

$$a() \wedge \neg\phi_{\textit{R-Skol}}(\mathbf{x}) : 0 \qquad (14)$$

We now approximately axiomatize the functional nature of the symbols . The sentence (8) can be directly encoded as a weighted formula:

 (15)

Next, we would like to enforce (9) by means of a weighted formula. However, (9) encodes the essence of the existential quantifiers we are about to eliminate, and it is thus not surprising that it cannot be enforced strictly. We can, however, reward models in which the existential quantification of (9) is satisfied via the weighted formulas

 (16)

where  is a weight whose exact value is to be defined later.

We now proceed with constraining models of the reference proposition . First, all symbols in  shall have empty interpretations in models of :

$$b() \wedge R(\mathbf{x}) : 0 \qquad (R \in S) \qquad (17)$$
 (18)

In order to allow -models to gain some weight, we use the extra symbols in :

$$b() \wedge R^{++}(\mathbf{x}) : w \qquad (R^{++} \in S^{++}) \qquad (19)$$

where  is the same weight as in (16). To further limit the possible interpretations of -models, we also stipulate:

 (20)

The extra symbols  must have empty interpretations in -models:

$$a() \wedge R^{++}(\mathbf{x}) : 0 \qquad (R^{++} \in S^{++}) \qquad (21)$$

$$\neg(a() \vee b()) : 0 \qquad (22)$$

We now determine (approximately)  and  for the cases  and .

First, consider : for any , there exists exactly one interpretation  with nonzero weight in which  is true. This is the interpretation in which all relations in  are empty ((17),(18)), all relations in  are maximal (20), and, in consequence of the latter, because of (21),  is false.

Assume that , where  has arity . Then  contributes via (19) a factor of  to , and the total weight is:

$$W_{\mathit{KB},n}(I_{b()}) = W_{\mathit{KB},n}(b()) = w^{n^{k_1} + \cdots + n^{k_m}} = w^{K(n)}, \qquad (23)$$

using the abbreviation $K(n) := n^{k_1} + \cdots + n^{k_m}$.
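As a quick numeric sketch of this calibration (the arities below are hypothetical), the exponent $K(n)$ simply counts the ground tuples of the relations in $S$, each of which contributes one factor $w$ via (19):

```python
# K(n) = n^{k_1} + ... + n^{k_m} for relations of arities k_1, ..., k_m.
def K(n, arities):
    return sum(n ** k for k in arities)

# Example (assumed signature): one unary and one binary relation.
arities = [1, 2]
n, w = 3, 2.0

# Weight of the unique nonzero-weight b()-model: one factor w per
# ground tuple of each R^{++} relation, i.e. w^{K(n)}.
weight_b = w ** K(n, arities)

assert K(3, [1, 2]) == 3 + 9
assert weight_b == 2.0 ** 12
```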

We next turn to  in the case . Then there exists at least one interpretation  in which  is true, and in which the relations from  have a functional interpretation. We can expand this interpretation to an interpretation in  by giving all relations in  an empty interpretation, and setting  to true and  to false. Then  does not violate any hard constraint in KB, and collects from (16) a total weight of . Thus

$$W_{\mathit{KB},n}(a()) \geq w^{K(n)},$$

and therefore, when

$$P_{\mathit{KB},n}(a()) \geq W_{\mathit{KB},n}(a()) \,/\, \bigl(W_{\mathit{KB},n}(a()) + W_{\mathit{KB},n}(b())\bigr) \geq 1/2. \qquad (24)$$

Finally, we have to consider  in the case . For any  with nonzero weight in which  is true,  must, because of (14), also be true. This is now only possible when some  is not a functional relation, which, because of (15), can only mean that for some  there exists no  with . The total weight of  accrued from (16) is then at most . Because of (21),  cannot obtain any additional weight from (19), so that

$$W_{\mathit{KB},n}(I) \leq w^{K(n)-1}. \qquad (25)$$

The total number of interpretations in  is  for a polynomial . Thus

$$W_{\mathit{KB},n}(a()) \leq 2^{L(n)} w^{K(n)-1}. \qquad (26)$$

We now obtain for the case

$$P_{\mathit{KB},n}(a()) \leq W_{\mathit{KB},n}(a()) \,/\, W_{\mathit{KB},n}(b()) \leq 2^{L(n)} w^{K(n)-1} / w^{K(n)} = 2^{L(n)}/w. \qquad (27)$$

Setting , we thus have  if . The representation size of  is polynomial in . Thus, an algorithm that computes  up to an accuracy of  in time polynomial in  and the representation size of  would give a polynomial time decision procedure for .
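The separation underlying this decision procedure can be sketched numerically. The particular choice $w = 2^{L(n)+2}$ below is an assumption for illustration (the text leaves the exact value open); any choice that pushes the bound $2^{L(n)}/w$ from (27) a constant below $1/2$ works, since (24) guarantees $P \geq 1/2$ when $n$ is in the spectrum:

```python
from fractions import Fraction

# Upper bound 2^L(n) / w from (27) for the case that n is NOT in the
# spectrum; exact rational arithmetic avoids floating-point noise.
def upper_bound(L_n, w):
    return Fraction(2 ** L_n, w)

L_n = 10                      # a hypothetical polynomial bound L(n)
w = 2 ** (L_n + 2)            # assumed weight choice for this sketch

gap_low = upper_bound(L_n, w)   # bound when n is not in the spectrum
gap_high = Fraction(1, 2)       # lower bound (24) when n is in the spectrum

assert gap_low == Fraction(1, 4)
# A constant gap of 1/4 separates the two cases, so an approximation
# of P_KB,n(a()) with accuracy better than 1/8 decides membership.
assert gap_high - gap_low >= Fraction(1, 4)
```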

Proof of Theorem 4.4. The proof is an extension of the proof of Theorem 4.2, and we give here only the necessary modifications.

Let  be a new binary relation symbol. We replace equalities  in (14) and (15) with . To (approximately) axiomatize  as the identity relation in models of , we add to the knowledge base consisting of (14)-(22) the weighted formulas

$$a() \wedge \neg E(x,x) : 0 \qquad (28)$$
$$a() \wedge E(x,y) : 1/w \qquad (29)$$

where  is the same weight as in (16) and (19), and whose exact value is to be determined later. To calibrate the weight of -models, we introduce in analogy to the  relations a unary relation , and in analogy to (19) - (21) add to the knowledge base

$$b() \wedge E^{++}(x) : 1/w \qquad (30)$$
$$b() \wedge \neg E^{++}(x) : 0 \qquad (31)$$
$$a() \wedge E^{++}(x) : 0 \qquad (32)$$

We now obtain for all

$$W_{\mathit{KB},n}(b()) = w^{K(n)} (1/w)^{n} = w^{K(n)-n}. \qquad (33)$$

If , then there exists an interpretation in which  is true, the  have a functional interpretation, and the interpretation of  is the identity relation. We can thus lower-bound the weight of  by the weight of that interpretation:

$$W_{\mathit{KB},n}(a()) \geq w^{K(n)} (1/w)^{n} = w^{K(n)-n}. \qquad (34)$$

As in (24), one then obtains .

We now turn to the case . Consider any  in which  is true, and that has nonzero weight. This is now only possible when in  there is an  which is not a functional relation, or when  is not the identity relation in  (or both). In all cases, the weight of  coming from (16) and (29) is at most . The total number of interpretations in  is  for a polynomial . Thus

$$W_{\mathit{KB},n}(a()) \leq 2^{M(n)} w^{K(n)-n-1}, \qquad (35)$$

from which, as in (27), then . Now setting  again yields the bound .

## References

• Apsel and Brafman (2011) Apsel, U. and Brafman, R. I. 2011. Extended lifted inference with joint formulas. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI-11). AUAI Press, 11–18.
• Breese (1992) Breese, J. S. 1992. Construction of belief and decision networks. Computational Intelligence 8, 4, 624–647.
• de Salvo Braz et al. (2005) de Salvo Braz, R., Amir, E., and Roth, D. 2005. Lifted first-order probabilistic inference. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05). 1319–1325.
• Domingos and Webb (2012) Domingos, P. and Webb, W. A. 2012. A tractable first-order probabilistic logic. In Proc. of AAAI-12. To appear.
• Fagin (1976) Fagin, R. 1976. Probabilities on finite models. Journal of Symbolic Logic 41, 1, 50–58.
• Fierens et al. (2011) Fierens, D., Van den Broeck, G., Thon, I., Gutmann, B., and De Raedt, L. 2011. Inference in probabilistic logic programs using weighted CNFs. In Proc. of UAI 2011.
• Friedman et al. (1999) Friedman, N., Getoor, L., Koller, D., and Pfeffer, A. 1999. Learning probabilistic relational models. In Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99).
• Gogate and Domingos (2011) Gogate, V. and Domingos, P. 2011. Probabilistic theorem proving. In Proceedings of the 27th Conference of Uncertainty in Artificial Intelligence (UAI-11).
• Jaeger (1997) Jaeger, M. 1997. Relational Bayesian networks. In Proceedings of the 13th Conference of Uncertainty in Artificial Intelligence (UAI-97), D. Geiger and P. P. Shenoy, Eds. Morgan Kaufmann, Providence, USA, 266–273.
• Jaeger (2000) Jaeger, M. 2000. On the complexity of inference about probabilistic relational models. Artificial Intelligence 117, 297–308.
• Jaeger (2012) Jaeger, M. 2012. Lower complexity bounds for lifted inference. http://arxiv.org/abs/1204.3255.
• Jaeger and Van den Broeck (2012) Jaeger, M. and Van den Broeck, G. 2012. Liftability of probabilistic inference: Upper and lower bounds. In Proceedings of the 2nd International Workshop on Statistical Relational AI.
• Jha et al. (2010) Jha, A., Gogate, V., Meliou, A., and Suciu, D. 2010. Lifted inference seen from the other side: The tractable features. In Proc. of NIPS.
• Johnson (1990) Johnson, D. S. 1990. A catalog of complexity classes. In Handbook of Theoretical Computer Science, J. van Leeuwen, Ed. Vol. 1. Elsevier, Amsterdam, 67–161.
• Jones and Selman (1972) Jones, N. D. and Selman, A. L. 1972. Turing machines and the spectra of first-order formulas with equality. In Proceedings of the Fourth ACM Symposium on Theory of Computing. 157–167.
• Kersting and Raedt (2001) Kersting, K. and Raedt, L. D. 2001. Towards combining inductive logic programming with Bayesian networks. In Proceedings of the 11th International Conference on Inductive Logic Programming (ILP-01). LNAI, vol. 2157. 118–131.
• Kisyński and Poole (2009) Kisyński, J. and Poole, D. 2009. Lifted aggregation in directed first-order probabilistic models. In Proc. of IJCAI 2009.
• Milch et al. (2005) Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D., and Kolobov, A. 2005. Blog: Probabilistic logic with unknown objects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05). 1352–1359.
• Milch et al. (2008) Milch, B., Zettlemoyer, L. S., Kersting, K., Haimes, M., and Kaelbling, L. P. 2008. Lifted probabilistic inference with counting formulas. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-08).
• Ngo et al. (1995) Ngo, L., Haddawy, P., and Helwig, J. 1995. A theoretical framework for context-sensitive temporal probability model construction with application to plan projection. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. 419–426.
• Poole (1993) Poole, D. 1993. Probabilistic horn abduction and Bayesian networks. Artificial Intelligence 64, 81–129.
• Poole (2003) Poole, D. 2003. First-order probabilistic inference. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03).
• Richardson and Domingos (2006) Richardson, M. and Domingos, P. 2006. Markov logic networks. Machine Learning 62, 1-2, 107–136.
• Sato (1995) Sato, T. 1995. A statistical learning method for logic programs with distribution semantics. In Proceedings of the 12th International Conference on Logic Programming (ICLP’95). 715–729.
• Singla et al. (2010) Singla, P., Nath, A., and Domingos, P. 2010. Approximate lifted belief propagation. In Proc. of AAAI-10 Workshop on Statistical Relational AI.
• Taskar et al. (2002) Taskar, B., Abbeel, P., and Koller, D. 2002. Discriminative probabilistic models for relational data. In Proc. of UAI 2002.
• Van den Broeck (2011) Van den Broeck, G. 2011. On the completeness of first-order knowledge compilation for lifted probabilistic inference. In Proc. of the 25th Annual Conf. on Neural Information Processing Systems (NIPS).
• Van den Broeck and Davis (2012) Van den Broeck, G. and Davis, J. 2012. Conditioning in first-order knowledge compilation and lifted probabilistic inference. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence. 1961–1967.
• Van den Broeck et al. (2011) Van den Broeck, G., Taghipour, N., Meert, W., Davis, J., and Raedt, L. D. 2011. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11).
• Vennekens et al. (2006) Vennekens, J., Denecker, M., and Bruynooghe, M. 2006. Representing causal information about a probabilistic process. In Logics in Artificial Intelligence, 10th European Conference, JELIA 2006, Proceedings. Lecture Notes in Computer Science, vol. 4160. Springer, 452–464.