Tractability through Exchangeability: A New Perspective on Efficient Probabilistic Inference

by Mathias Niepert, et al.
University of Washington

Exchangeability is a central notion in statistics and probability theory. The assumption that an infinite sequence of data points is exchangeable is at the core of Bayesian statistics. However, finite exchangeability as a statistical property that renders probabilistic inference tractable is less well-understood. We develop a theory of finite exchangeability and its relation to tractable probabilistic inference. The theory is complementary to that of independence and conditional independence. We show that tractable inference in probabilistic models with high treewidth and millions of variables can be understood using the notion of finite (partial) exchangeability. We also show that existing lifted inference algorithms implicitly utilize a combination of conditional independence and partial exchangeability.




1 Introduction

Probabilistic graphical models such as Bayesian and Markov networks explicitly represent conditional independencies of a probability distribution with their structure [Pearl1988, Lauritzen1996, Koller and Friedman2009, Darwiche2009]. Their wide-spread use in research and industry can largely be attributed to this structural property and their declarative nature, separating representation and inference algorithms. Conditional independencies often lead to a more concise representation and facilitate efficient algorithms for parameter estimation and probabilistic inference. It is well-known, for instance, that probabilistic graphical models with a tree structure admit efficient inference. In addition to conditional independencies, modern inference algorithms exploit contextual independencies [Boutilier et al.1996] to speed up probabilistic inference.

The time complexity of classical probabilistic inference algorithms is exponential in the treewidth [Robertson and Seymour1986] of the graphical model. Independence and its various manifestations often reduce treewidth, and treewidth has been used in the literature as the decisive factor for assessing the tractability of probabilistic inference (cf. [Koller and Friedman2009, Darwiche2009]). However, recent algorithmic developments have shown that inference in probabilistic graphical models can be highly tractable, even in high-treewidth models without any conditional independencies. For instance, lifted probabilistic inference algorithms [Poole2003, Kersting2012] often perform efficient inference in densely connected graphical models with millions of random variables. With the success of lifted inference, understanding these algorithms and their tractability on a more fundamental level has become a central challenge. The most pressing question concerns the underlying statistical principle that allows inference to be tractable in the absence of independence.

The present paper contributes to a deeper understanding of the statistical properties that render inference tractable. We consider an inference problem tractable when it is solved by an efficient algorithm, running in time polynomial in the number of random variables. The crucial contribution is a comprehensive theory that relates the notion of finite partial exchangeability [Diaconis and Freedman1980a] to tractability. One instance is full exchangeability where the distribution is invariant under variable permutations. We develop a theory of exchangeable decompositions that results in novel tractability conditions. Similar to conditional independence, partial exchangeability decomposes a probabilistic model so as to facilitate efficient inference. Most importantly, the notions of conditional independence and partial exchangeability are complementary, and when combined, define a much larger class of tractable models than the class of models rendered tractable by conditional independence alone.

Conditional and contextual independence are such powerful concepts because they are statistical properties of the distribution, regardless of the representation used. Partial exchangeability is also such a statistical property: it is independent of any representation, be it a joint probability table, a Bayesian network, or a statistical relational model. We introduce novel forms of exchangeability, discuss their sufficient statistics, and present efficient inference algorithms. The resulting exchangeability framework allows us to state known liftability results as corollaries, providing a first statistical characterization of exact lifted inference. As an additional contribution, we connect the semantic notion of exchangeability to syntactic notions of tractability by showing that liftable statistical relational models have the required exchangeability properties due to their syntactic symmetries. We thereby unify notions of lifting from the exact and approximate inference community into a common framework.

2 A Case Study: Markov Logic

The analysis of exchangeability and tractable inference is developed in the context of arbitrary discrete probability distributions, independent of a particular representational formalism. Nevertheless, for the sake of accessibility, we will provide examples and intuitions for Markov logic, a well-known statistical relational language that exhibits several forms of exchangeability. Hence, after the derivation of the theoretical results in each section, we apply the theory to the problem of inference in Markov logic networks. This also allows us to link the theory to existing results from the lifted probabilistic inference literature.

Markov Logic Networks

We first introduce some standard concepts from function-free first-order logic. An atom $P(t_1, \dots, t_n)$ consists of a predicate $P$ of arity $n$ followed by $n$ argument terms $t_i$, which are either constants, such as $A$ and $B$, or logical variables, such as $x$ and $y$. A unary atom has one argument and a binary atom has two. A formula combines atoms with connectives (e.g., $\land$, $\Rightarrow$). A formula is ground if it contains no logical variables. The groundings of a formula are obtained by instantiating the variables with particular constants.

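Many statistical relational languages have been proposed in recent years [Getoor and Taskar2007, De Raedt et al.2008]. We will work with one such language, called Markov logic networks (MLN) [Richardson and Domingos2006]. An MLN is a set of tuples $(w, f)$, where $w$ is a real number representing a weight and $f$ is a formula in first-order logic. Let us, for example, consider the following MLN with positive weights $w_1$ and $w_2$

$$\begin{array}{lll} w_1 & \mathrm{Smokes}(x) \Rightarrow \mathrm{Cancer}(x) & (1) \\ w_2 & \mathrm{Smokes}(x) \land \mathrm{Friends}(x, y) \Rightarrow \mathrm{Smokes}(y) & (2) \end{array}$$

which states that smokers are more likely to (1) get cancer and (2) be friends with other smokers.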

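Given a domain of constants $D$, a first-order MLN $\Delta$ induces a grounding, which is the MLN obtained by replacing each formula in $\Delta$ with all its groundings (using the same weight). Take for example the domain $D = \{A, B\}$ (e.g., two people, Alice and Bob). The above first-order MLN then represents the following grounding.

$$\begin{array}{ll} w_1 & \mathrm{Smokes}(A) \Rightarrow \mathrm{Cancer}(A) \\ w_1 & \mathrm{Smokes}(B) \Rightarrow \mathrm{Cancer}(B) \\ w_2 & \mathrm{Smokes}(A) \land \mathrm{Friends}(A, A) \Rightarrow \mathrm{Smokes}(A) \\ w_2 & \mathrm{Smokes}(A) \land \mathrm{Friends}(A, B) \Rightarrow \mathrm{Smokes}(B) \\ w_2 & \mathrm{Smokes}(B) \land \mathrm{Friends}(B, A) \Rightarrow \mathrm{Smokes}(A) \\ w_2 & \mathrm{Smokes}(B) \land \mathrm{Friends}(B, B) \Rightarrow \mathrm{Smokes}(B) \end{array}$$

This ground MLN contains eight different random variables, which correspond to all groundings of atoms $\mathrm{Smokes}(x)$, $\mathrm{Cancer}(x)$, and $\mathrm{Friends}(x, y)$. This leads to a distribution over $2^8$ possible worlds. The weight of each world is the product of the expressions $\exp(w)$, where $(w, f)$ is a ground MLN formula and $f$ is satisfied by the world. The probabilities of worlds are obtained by normalizing their weights. Without loss of generality [Jha et al.2010], we assume that first-order formulas contain no constants.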
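The grounding semantics above can be made concrete with a brute-force enumeration. The following is a minimal sketch, assuming the smokers example with two constants. The numeric weights `W1` and `W2` and all function names are hypothetical choices for illustration, and the enumeration is exponential in the number of ground atoms, so it only serves to check small cases.

```python
import itertools
import math

# Hypothetical weights for Formulas 1 and 2; any positive values would do.
W1, W2 = 1.3, 1.5

def ground_atoms(domain):
    """All groundings of Smokes(x), Cancer(x), and Friends(x, y)."""
    atoms = [("Smokes", a) for a in domain]
    atoms += [("Cancer", a) for a in domain]
    atoms += [("Friends", a, b) for a in domain for b in domain]
    return atoms

def world_weight(world, domain):
    """Product of exp(w) over all ground formulas satisfied by the world."""
    log_weight = 0.0
    for a in domain:
        # Formula 1: Smokes(a) => Cancer(a)
        if (not world[("Smokes", a)]) or world[("Cancer", a)]:
            log_weight += W1
        for b in domain:
            # Formula 2: Smokes(a) and Friends(a, b) => Smokes(b)
            body = world[("Smokes", a)] and world[("Friends", a, b)]
            if (not body) or world[("Smokes", b)]:
                log_weight += W2
    return math.exp(log_weight)

def world_distribution(domain):
    """Enumerate all worlds and normalize their weights into probabilities."""
    atoms = ground_atoms(domain)
    worlds = [dict(zip(atoms, values))
              for values in itertools.product([False, True], repeat=len(atoms))]
    weights = [world_weight(w, domain) for w in worlds]
    z = sum(weights)
    return worlds, [wt / z for wt in weights]
```

Calling `world_distribution(["A", "B"])` enumerates the worlds over the eight ground atoms and returns their normalized probabilities.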

Lifted Probabilistic Inference

The advent of statistical relational languages such as Markov logic has motivated a new class of lifted inference algorithms [Poole2003]. These algorithms exploit the high-level structure and symmetries of the first-order logic formulas to speed up inference [Kersting2012]. Surprisingly, they perform tractable inference even in the absence of conditional independencies. For example, when interpreting the above MLN as an (undirected) probabilistic graphical model, all pairs of random variables in $\{\mathrm{Smokes}(A), \mathrm{Smokes}(B), \dots\}$ are connected by an edge due to the groundings of Formula 2. The model has no conditional or contextual independencies between the $\mathrm{Smokes}$ variables. Nevertheless, lifted inference algorithms exactly compute its single marginal probabilities in time linear in the size of the corresponding graphical model [Van den Broeck et al.2011], scaling up to millions of random variables.

As lifted inference research makes algorithmic progress, the quest for the source of tractability and its theoretical properties becomes increasingly important. For exact lifted inference, most theoretical results are based on the notion of domain-lifted inference [Van den Broeck2011].

Definition 1 (Domain-lifted).

Domain-lifted inference algorithms run in time polynomial in the domain size $|D|$.

Note that domain-lifted algorithms can be exponential in other parameters, such as the number of formulas and predicates. Our current understanding of exact lifted inference is that syntactic properties of MLN formulas permit domain-lifted inference [Van den Broeck2011, Jaeger and Van den Broeck2012, Taghipour et al.2013]. We will review these results where relevant. Moreover, the (fractional) automorphisms of the graphical model representation have been related to lifted inference [Niepert2012b, Bui, Huynh, and Riedel2012, Noessner, Niepert, and Stuckenschmidt2013, Mladenov and Kersting2013]. While there are deep connections between automorphisms and exchangeability [Niepert2012b, Niepert2013, Bui, Huynh, and Riedel2012, Bui, Huynh, and de Salvo Braz2012], we defer these to future work.

Figure 1: An undirected graphical model with finitely exchangeable Bernoulli variables. There are no (conditional) independencies that hold among the variables.

3 Finite Exchangeability

This section provides some background on the concept of finite partial exchangeability. We proceed by showing that particular forms of finite exchangeability permit tractable inference. For the sake of simplicity and to provide links to statistical relational models such as MLNs, we present the theory for finite sets of (upper-case) binary random variables $\mathbf{X} = \{X_1, \dots, X_n\}$. However, the theory applies to all distributions over finite-valued discrete random variables. Lower-case $\mathbf{x}$ denotes an assignment to $\mathbf{X}$.

We begin with the most basic form of exchangeability.

Definition 2 (Full Exchangeability).

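A set of variables $\mathbf{X} = \{X_1, \dots, X_n\}$ is fully exchangeable if and only if $\Pr(X_1 = x_1, \dots, X_n = x_n) = \Pr(X_1 = x_{\pi(1)}, \dots, X_n = x_{\pi(n)})$ for all permutations $\pi$ of $\{1, \dots, n\}$.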

Full exchangeability is best understood in the context of a finite sequence of binary random variables such as a number of coin tosses. Here, exchangeability means that it is only the number of heads that matters and not their particular order. Figure 1 depicts an undirected graphical model with finitely exchangeable dependent Bernoulli variables.

Finite Partial Exchangeability

The assumption that all variables of a probabilistic model are exchangeable is often too strong. Fortunately, exchangeability can be generalized to the concept of partial exchangeability using the notion of a sufficient statistic [Diaconis and Freedman1980b, Lauritzen et al.1984, Lauritzen1988]. Particular instances of exchangeability such as full finite exchangeability correspond to particular statistics.

Definition 3 (Partial Exchangeability).

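Let $\mathcal{D}_i$ be the domain of $X_i$, and let $\mathcal{T}$ be a finite set. A set of random variables $\mathbf{X}$ is partially exchangeable with respect to the statistic $T : \mathcal{D}_1 \times \dots \times \mathcal{D}_n \to \mathcal{T}$ if and only if

$$T(\mathbf{x}) = T(\mathbf{x}') \quad \text{implies} \quad \Pr(\mathbf{x}) = \Pr(\mathbf{x}').$$

The following theorem states that the joint distribution of a set of random variables that is partially exchangeable with a statistic $T$ is a unique mixture of uniform distributions.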

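Theorem 1 ([Diaconis and Freedman1980a]).

Let $\mathcal{T}$ be a finite set and let $T : \{0, 1\}^n \to \mathcal{T}$ be a statistic of a partially exchangeable set $\mathbf{X}$. Moreover, let $S_t = \{\mathbf{x} \in \{0, 1\}^n \mid T(\mathbf{x}) = t\}$, let $U_t$ be the uniform distribution over $S_t$, and let $w_t = \Pr(S_t)$. Then,

$$\Pr(\mathbf{X}) = \sum_{t \in \mathcal{T}} w_t \, U_t(\mathbf{X}).$$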

Hence, a distribution that is partially exchangeable with respect to a statistic can be parameterized as a unique mixture of uniform distributions. We will see that several instances of partial exchangeability render probabilistic inference tractable. Indeed, the major theme of the present paper can be summarized as finding methods for constructing the above representation and exploiting it for tractable probabilistic inference for a given probabilistic model.

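Let $[\![\cdot]\!]$ be the indicator function. The uniform distribution of each equivalence class $S_t$ is $U_t(\mathbf{x}) = [\![T(\mathbf{x}) = t]\!] / |S_t|$, and the probability of $S_t$ is $w_t = \Pr(\mathbf{x}) \cdot |S_t|$ for every $\mathbf{x} \in S_t$. Hence, every value of the statistic corresponds to one equivalence class $S_t$ of joint assignments with identical probability. We will refer to these equivalence classes as orbits. We write $\mathbf{x} \sim \mathbf{e}$ when assignments $\mathbf{x}$ and $\mathbf{e}$ agree on the values of their shared variables [Darwiche2009]. The suborbit $S_{t,\mathbf{e}} \subseteq S_t$ for some evidence state $\mathbf{e}$ is the set of those states in $S_t$ that are compatible with $\mathbf{e}$, that is, $S_{t,\mathbf{e}} = \{\mathbf{x} \mid T(\mathbf{x}) = t \text{ and } \mathbf{x} \sim \mathbf{e}\}$.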
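The mixture-of-uniforms representation can be checked numerically on small examples. The sketch below groups all assignments into orbits, recovers the mixture weights as the orbit probabilities, and evaluates the mixture. The function names and the three-variable example distribution are assumptions made for illustration. The enumeration is exponential and is not meant as an inference algorithm.

```python
import itertools
from collections import defaultdict

def orbits(n, statistic):
    """Group all assignments in {0,1}^n by the value of the statistic."""
    groups = defaultdict(list)
    for x in itertools.product([0, 1], repeat=n):
        groups[statistic(x)].append(x)
    return dict(groups)

def mixture_parameters(prob, n, statistic):
    """Return {t: Pr(S_t)}; raise if prob is not constant on some orbit."""
    params = {}
    for t, states in orbits(n, statistic).items():
        values = [prob(x) for x in states]
        if max(values) - min(values) > 1e-12:
            raise ValueError(f"not partially exchangeable on orbit {t}")
        params[t] = values[0] * len(states)
    return params

def mixture_prob(x, params, n, statistic):
    """Evaluate the mixture at x; only the orbit containing x contributes."""
    t = statistic(x)
    return params[t] / len(orbits(n, statistic)[t])

def example_prob(x):
    """Hypothetical fully exchangeable distribution over three binary variables."""
    per_state = {0: 0.1, 1: 0.1, 2: 0.1, 3: 0.3}  # keyed by the number of ones
    return per_state[sum(x)]
```

With `statistic=sum`, the statistic is the number of ones, which is the statistic of full exchangeability.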

Partial Exchangeability and Probabilistic Inference

We are now in the position to relate finite partial exchangeability to tractable probabilistic inference, using notions from Theorem 1. The inference tasks we consider are

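  • MPE inference, i.e., finding $\arg\max_{\mathbf{y}} \Pr(\mathbf{y}, \mathbf{e})$ for any given assignment $\mathbf{e}$ to variables $\mathbf{E} \subseteq \mathbf{X}$, and

  • marginal inference, i.e., computing $\Pr(\mathbf{e})$ for any given $\mathbf{e}$.

For a set of variables $\mathbf{X}$, we say that $\Pr(\mathbf{x})$ can be computed efficiently iff it can be computed in time polynomial in $|\mathbf{X}|$. We make the following complexity claims.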

Theorem 2.

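Let $\mathbf{X}$ be partially exchangeable with statistic $T$. If we can efficiently

  • for all $\mathbf{x}$, evaluate $\Pr(\mathbf{x})$, and

  • for all $\mathbf{e}$ and $t \in \mathcal{T}$, decide whether there exists an $\mathbf{x} \in S_{t,\mathbf{e}}$, and if so, construct it,

then the complexity of MPE inference is polynomial in $|\mathcal{T}|$. If we can additionally compute $|S_{t,\mathbf{e}}|$ efficiently, then the complexity of marginal inference is also polynomial in $|\mathcal{T}|$.

Proof. For MPE inference, we construct an $\mathbf{x} \in S_{t,\mathbf{e}}$ for each $t \in \mathcal{T}$, and return the one maximizing $\Pr(\mathbf{x})$. For marginal inference, we return $\sum_{t \in \mathcal{T}} \Pr(\mathbf{x}_t) \, |S_{t,\mathbf{e}}|$, where $\mathbf{x}_t \in S_{t,\mathbf{e}}$ and the sum ranges over the non-empty suborbits. ∎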
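The two procedures in the proof can be sketched for the simplest case, full exchangeability, where the statistic counts the ones and takes $n + 1$ values. This is an illustration under stated assumptions, not the paper's implementation: the model `make_pair_model`, its default weight, and all function names are hypothetical, and evidence is represented as a dictionary from variable index to observed value.

```python
import itertools
from math import comb, exp

def make_pair_model(n, w=0.5):
    """Hypothetical model: Pr(x) proportional to exp(w * m * (m - 1)), m = number of ones."""
    z = sum(comb(n, m) * exp(w * m * (m - 1)) for m in range(n + 1))
    return lambda x: exp(w * sum(x) * (sum(x) - 1)) / z

def representative(n, t, evidence):
    """Construct one x in the suborbit for count t, or None if it is empty."""
    free = [i for i in range(n) if i not in evidence]
    need = t - sum(evidence.values())
    if need < 0 or need > len(free):
        return None
    x = [evidence.get(i, 0) for i in range(n)]
    for i in free[:need]:
        x[i] = 1
    return tuple(x)

def suborbit_size(n, t, evidence):
    """Number of ways to place the remaining ones on the unobserved variables."""
    free = n - len(evidence)
    need = t - sum(evidence.values())
    return comb(free, need) if 0 <= need <= free else 0

def marginal(prob, n, evidence):
    """Pr(e) as a sum over the n + 1 values of the statistic."""
    total = 0.0
    for t in range(n + 1):
        x = representative(n, t, evidence)
        if x is not None:
            total += prob(x) * suborbit_size(n, t, evidence)
    return total

def mpe(prob, n, evidence):
    """Most probable completion of e, chosen among one representative per orbit."""
    candidates = (representative(n, t, evidence) for t in range(n + 1))
    return max((x for x in candidates if x is not None), key=prob)

def brute_force_marginal(prob, n, evidence):
    """Exponential-time reference: sum Pr(x) over all x compatible with e."""
    return sum(prob(x) for x in itertools.product([0, 1], repeat=n)
               if all(x[i] == v for i, v in evidence.items()))
```

Both `marginal` and `mpe` evaluate the distribution at only $n + 1$ states, whereas `brute_force_marginal` touches all $2^n$ of them and is included solely for comparison.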

If the above conditions for tractable inference are fulfilled, we say that a distribution is tractably partially exchangeable for MPE or marginal inference. We will present notions of exchangeability and related statistics that make distributions tractably partially exchangeable. Please note that Theorem 2 generalizes to situations in which we can only efficiently compute $\Pr(\mathbf{x})$ up to a constant factor $Z$, as is often the case in undirected probabilistic graphical models.

Markov Logic Case Study

Exchangeability and independence are not mutually exclusive. Independent and identically distributed (iid) random variables are also exchangeable. Take for example the MLN

$$w \quad \mathrm{Smokes}(x)$$

The random variables $\mathrm{Smokes}(A), \mathrm{Smokes}(B), \dots$ are independent. Hence, we can compute their marginal probabilities independently as

$$\Pr(\mathrm{Smokes}(A)) = \frac{\exp(w)}{\exp(w) + 1}.$$

The variables are also finitely exchangeable. For example, the probability that $A$ smokes and $B$ does not is equal to the probability that $B$ smokes and $A$ does not. The sufficient statistic $T(\mathbf{x})$ counts how many people smoke in the state $\mathbf{x}$, and the probability of a state in which $n$ out of $N$ people smoke is $\exp(w n) / (\exp(w) + 1)^N$.

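Exchangeability can occur without independence, as in the following Markov logic network, whose single formula is grounded for all ordered pairs of distinct constants $x \neq y$

$$w \quad \mathrm{Smokes}(x) \land \mathrm{Smokes}(y)$$

This distribution has neither independencies nor conditional independencies. However, its variables are finitely exchangeable and the probability of a state $\mathbf{x}$ is only a function of the sufficient statistic $T(\mathbf{x})$ counting the number of smokers in $\mathbf{x}$. The probability of a state now increases by a factor of $\exp(w)$ with every pair of smokers. When $n$ people smoke there are $n(n-1)$ ordered pairs and, hence, $\Pr(\mathbf{x}) = \exp(w \, n(n-1)) / Z$, where $Z$ is a normalization constant. Let $\mathbf{Y}$ consist of all $\mathrm{Smokes}$ variables except for $\mathrm{Smokes}(A)$, and let $\mathbf{y}$ be an assignment to $\mathbf{Y}$ in which $m$ people smoke. The probability that $A$ smokes given $\mathbf{y}$ is

$$\Pr(\mathrm{Smokes}(A) \mid \mathbf{y}) = \frac{\exp(w \, m(m+1))}{\exp(w \, m(m+1)) + \exp(w \, m(m-1))},$$

which clearly depends on the number of smokers in $\mathbf{y}$. Hence, $\mathrm{Smokes}(A)$ is not independent of $\mathbf{Y}$, but the random variables are exchangeable with sufficient statistic $T$. Figure 1 depicts the graphical representation of the corresponding ground Markov logic network.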
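A short numerical check makes the dependence on the number of smokers explicit. The sketch below assumes that a state with $n$ smokers has unnormalized weight $\exp(w \, n(n-1))$, one factor per ordered pair of distinct smokers. The function names are hypothetical.

```python
from math import exp

def state_weight(num_smokers, w):
    """Unnormalized weight of a state with the given number of smokers."""
    return exp(w * num_smokers * (num_smokers - 1))

def prob_smokes_given_others(m, w):
    """Pr(Smokes(A) = 1 | exactly m of the other people smoke)."""
    yes = state_weight(m + 1, w)  # A smokes: m + 1 smokers in total
    no = state_weight(m, w)       # A does not smoke: m smokers in total
    return yes / (yes + no)
```

Because $m(m+1) - m(m-1) = 2m$, the conditional probability simplifies to $1 / (1 + \exp(-2wm))$, so for positive $w$ it grows with every additional smoker among the others.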

4 Exchangeable Decompositions

We now present novel instances of partial exchangeability that render probabilistic inference tractable. These instances generalize exchangeability of single variables to exchangeability of sets of variables. We describe the notion of an exchangeable decomposition and prove that it fulfills the tractability requirements of Theorem 2. We proceed by demonstrating that these forms commonly occur in MLNs.

Variable Decompositions

The notions of independent and exchangeable decompositions are at the core of the developed theoretical results.

Definition 4 (Variable Decomposition).

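A variable decomposition $\mathcal{X} = \{\mathbf{X}_1, \dots, \mathbf{X}_k\}$ partitions $\mathbf{X}$ into $k$ subsets $\mathbf{X}_i$. We call $w = \max_i |\mathbf{X}_i|$ the width of the decomposition.

Definition 5 (Independent Decomposition).

A variable decomposition is independent if and only if $\Pr$ factorizes as

$$\Pr(\mathbf{X}) = \prod_{i=1}^{k} Q_i(\mathbf{X}_i).$$

Definition 6 (Exchangeable Decomposition).

A variable decomposition is exchangeable iff for all permutations $\pi$ of $\{1, \dots, k\}$,

$$\Pr(\mathbf{X}_1 = \mathbf{x}_1, \dots, \mathbf{X}_k = \mathbf{x}_k) = \Pr(\mathbf{X}_1 = \mathbf{x}_{\pi(1)}, \dots, \mathbf{X}_k = \mathbf{x}_{\pi(k)}).$$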

Figure 2: An exchangeable decomposition of binary random variables (the boxes) into components of equal width (the rows). The statistic $T$ counts the occurrences of each unique binary vector.

Figure 2 depicts an example distribution and a decomposition of its random variables into subsets of width $w$. The joint distribution is invariant under permutations of the sequences. The corresponding sufficient statistic counts the number of occurrences of each binary vector of length $w$ and returns a tuple of $2^w$ counts.

Please note that the definition of full finite exchangeability (Definition 2) is the special case in which the exchangeable decomposition has width $1$. Also note that the size of every subset in an exchangeable decomposition equals the width.

Tractable Variable Decompositions

The core observation of the present work is that variable decompositions that are exchangeable and/or independent result in tractable probabilistic models. For independent decompositions, the following tractability guarantee is used in most existing inference algorithms.

Proposition 3.

Given an independent decomposition of $\mathbf{X}$ with bounded width, and a corresponding factorized representation of the distribution (cf. Definition 5), the complexity of MPE and marginal inference is polynomial in $|\mathbf{X}|$.

While the decomposition into independent components is a well-understood concept, the combination with finite exchangeability has not been previously investigated as a statistical property that facilitates tractable probabilistic inference. We can now prove the following result.

Theorem 4.

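Suppose we can compute $\Pr(\mathbf{x})$ in time polynomial in $|\mathbf{X}|$. Then, given an exchangeable decomposition of $\mathbf{X}$ with bounded width, the complexity of MPE and marginal inference is polynomial in $|\mathbf{X}|$.

Proof. Following Theorem 2, we have to show that there exists a statistic $T$ so that (a) $|\mathcal{T}|$ is polynomial in $|\mathbf{X}|$; (b) we can efficiently decide whether an $\mathbf{x} \in S_{t,\mathbf{e}}$ exists and, if so, construct it; and (c) we can efficiently compute $|S_{t,\mathbf{e}}|$ for all $\mathbf{e}$ and $t \in \mathcal{T}$. Statements (b) and (c) ensure that the assumptions of Theorem 2 hold for exchangeable decompositions, which combined with (a) proves the theorem.

To prove (a), let us first construct a sufficient statistic for exchangeable decompositions. A full joint assignment $\mathbf{X} = \mathbf{x}$ decomposes into assignments $\mathbf{X}_1 = \mathbf{x}_1, \dots, \mathbf{X}_k = \mathbf{x}_k$ in accordance with the given variable decomposition. Each $\mathbf{x}_j$ is a bit string in $\{0, 1\}^w$. Consider a statistic $T(\mathbf{x}) = (c_1, \dots, c_{2^w})$, where each $c_i$ has a corresponding unique bit string $b_i \in \{0, 1\}^w$ and $c_i = \sum_{j=1}^{k} [\![\mathbf{x}_j = b_i]\!]$. The value $c_i$ of the statistic thus represents the number of components in the decomposition that are assigned bit string $b_i$. Hence, we have $\sum_i c_i = k$, and we prove (a) by observing that

$$|\mathcal{T}| = \binom{k + 2^w - 1}{2^w - 1} \le (k + 1)^{2^w - 1},$$

which is polynomial in $k \le |\mathbf{X}|$ for bounded $w$.

To prove statements (b) and (c) we have to find, for each partial assignment $\mathbf{E} = \mathbf{e}$, an algorithm that generates an $\mathbf{x} \in S_{t,\mathbf{e}}$ and that computes $|S_{t,\mathbf{e}}|$ in time polynomial in $|\mathbf{X}|$. To hint at the proof strategy, we give the formula for the orbit size without evidence for $t = (c_1, \dots, c_{2^w})$:

$$|S_t| = \binom{k}{c_1, \dots, c_{2^w}} = \frac{k!}{\prod_{i=1}^{2^w} c_i!}.$$

The proof is very technical and deferred to the appendix. ∎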
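The counting argument in part (a) and the evidence-free orbit size can be verified by enumeration for small decompositions. The sketch below assumes that the components of the decomposition are consecutive blocks of a flat assignment tuple whose length is a multiple of the width. That layout and the function names are choices made for illustration only.

```python
import itertools
from collections import Counter
from math import comb, factorial

def decomposition_statistic(x, width):
    """Count how often each bit string of the given width occurs among the blocks of x."""
    blocks = [tuple(x[i:i + width]) for i in range(0, len(x), width)]
    counts = Counter(blocks)
    # One count per bit string, in lexicographic order of the bit strings.
    return tuple(counts[b] for b in itertools.product([0, 1], repeat=width))

def num_statistic_values(k, width):
    """Number of count tuples with 2**width non-negative entries summing to k."""
    m = 2 ** width
    return comb(k + m - 1, m - 1)

def orbit_size(t):
    """Multinomial coefficient k! / (c_1! * ... * c_m!) for t = (c_1, ..., c_m)."""
    size = factorial(sum(t))
    for c in t:
        size //= factorial(c)
    return size
```

For three components of width two, for example, the $2^6 = 64$ joint assignments fall into `num_statistic_values(3, 2)` orbits, far fewer than the number of states once the number of components grows.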

Markov Logic Case Study

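Let us consider the following MLN, in which the second formula is grounded for all ordered pairs of distinct constants $x \neq y$

$$\begin{array}{ll} w_1 & \mathrm{Smokes}(x) \Rightarrow \mathrm{Cancer}(x) \\ w_2 & \mathrm{Smokes}(x) \land \mathrm{Smokes}(y) \end{array}$$

It models a distribution in which every non-smoker or smoker with cancer, that is, every $x$ satisfying the first formula, increases the probability by a factor of $\exp(w_1)$. Each pair of smokers increases the probability by a factor of $\exp(w_2)$. This model is not fully exchangeable: swapping $\mathrm{Smokes}(A)$ and $\mathrm{Cancer}(A)$ in a state yields a different probability. There are no (conditional) independencies between the $\mathrm{Smokes}$ atoms.

The variables in this MLN do have an exchangeable decomposition whose width is two, namely

$$\mathcal{X} = \{\{\mathrm{Smokes}(A), \mathrm{Cancer}(A)\}, \{\mathrm{Smokes}(B), \mathrm{Cancer}(B)\}, \dots\}.$$

The sufficient statistic of this decomposition counts the number of people in each of four groups, depending on whether they smoke and whether they have cancer. The probability of a state only depends on the number of people in each group, and swapping people between groups does not change the probability of a state. For example,

$$\Pr(\mathrm{Smokes}(A) = 0, \mathrm{Cancer}(A) = 1, \mathrm{Smokes}(B) = 1, \mathrm{Cancer}(B) = 0) = \Pr(\mathrm{Smokes}(A) = 1, \mathrm{Cancer}(A) = 0, \mathrm{Smokes}(B) = 0, \mathrm{Cancer}(B) = 1).$$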
Theorem 4 says that this MLN permits tractable inference.
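The claims of this case study can be checked by brute force on a small domain. In the sketch below a state is a tuple of (smokes, cancer) pairs, one per person. The numeric weights are hypothetical, and the second formula is assumed to contribute one factor per ordered pair of distinct smokers.

```python
import itertools
from math import exp

W1, W2 = 1.3, 1.5  # hypothetical weights of the two formulas

def weight(state):
    """Unnormalized weight of a state given as (smokes, cancer) pairs."""
    satisfied_first = sum(1 for s, c in state if (not s) or c)
    smokers = sum(s for s, _ in state)
    return exp(W1 * satisfied_first + W2 * smokers * (smokers - 1))

def group_counts(state):
    """Sufficient statistic: number of people per (smokes, cancer) combination."""
    groups = itertools.product([0, 1], repeat=2)
    return tuple(sum(1 for person in state if person == g) for g in groups)
```

Enumerating all states for a handful of people confirms that `weight` is a function of `group_counts` alone, while swapping the two values within one person's pair, as in `((1, 0), (0, 0))` versus `((0, 1), (0, 0))`, changes the weight.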

The fact that this MLN has an exchangeable decomposition is not a coincidence. In general, we can show this for MLNs of unary atoms, which are called monadic MLNs.

Theorem 5.

The variables in a monadic MLN have an exchangeable decomposition. The width of this decomposition is equal to the number of predicates.

The proof builds on syntactic symmetries of the MLN, called renaming automorphisms [Bui, Huynh, and Riedel2012, Niepert2012a]. Please see the appendix for further details.

It now follows as a corollary of Theorems 4 and 5 that MPE and marginal inference in monadic MLNs is polynomial in $|\mathbf{X}|$, and therefore also in the domain size $|D|$.

Corollary 6.

Inference in monadic MLNs is domain-lifted.

5 Marginal and Conditional Exchangeability

Many distributions are not decomposable into independent or exchangeable decompositions. Similar to conditional independence, the notion of exchangeability can be extended to conditional exchangeability. We generalize exchangeability to conditional distributions, and state the corresponding tractability guarantees.

Marginal and Conditional Decomposition

Tractability results for exchangeable decompositions on all variables under consideration also extend to subsets.

Definition 7 (Marginal Exchangeability).

When a subset $\mathbf{Y}$ of the variables under consideration has an exchangeable decomposition $\mathcal{Y}$, we say that $\mathbf{Y}$ is marginally exchangeable.

This means that $\mathcal{Y}$ is still an exchangeable decomposition when considering the marginal distribution $\Pr(\mathbf{Y})$.

Theorem 7.

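Suppose we are given a marginally exchangeable decomposition of $\mathbf{Y}$ with bounded width and let $\mathbf{Z} = \mathbf{X} \setminus \mathbf{Y}$. If computing $\Pr(\mathbf{y}) = \sum_{\mathbf{z}} \Pr(\mathbf{y}, \mathbf{z})$ is polynomial in $|\mathbf{X}|$ for all $\mathbf{y}$, then the complexity of MPE and marginal inference over variables $\mathbf{Y}$ is polynomial in $|\mathbf{X}|$.

Proof. Let $T$ be the statistic associated with the given decomposition, and let $\mathbf{e}$ be evidence given on $\mathbf{E} \subseteq \mathbf{Y}$. Then,

$$\Pr(\mathbf{e}) = \sum_{t \in \mathcal{T}} |S_{t,\mathbf{e}}| \, \Pr(\mathbf{y}_t), \quad \text{where } \mathbf{y}_t \in S_{t,\mathbf{e}}.$$

By the assumption that $\mathbf{Y}$ is marginally exchangeable and the proof of Theorem 4, we can compute $|S_{t,\mathbf{e}}|$ and a $\mathbf{y}_t \in S_{t,\mathbf{e}}$ in time polynomial in $|\mathbf{X}|$. An analogous argument holds for MPE inference on $\mathbf{Y}$. ∎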

We now need to identify distributions $\Pr(\mathbf{Y}, \mathbf{Z})$ for which we can compute $\Pr(\mathbf{y})$ efficiently. This implies tractable probabilistic inference over $\mathbf{Y}$. Given a particular $\mathbf{y}$, we have already seen sufficient conditions: when $\mathbf{Z}$ decomposes exchangeably or independently conditioned on $\mathbf{y}$, Proposition 3 and Theorem 4 guarantee that computing $\Pr(\mathbf{y})$ is tractable. This suggests the following general notion.

Definition 8 (Conditional Decomposability).

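Let $\mathbf{X}$ be a set of variables with $\mathbf{Y} \subseteq \mathbf{X}$ and $\mathbf{Z} = \mathbf{X} \setminus \mathbf{Y}$. We say that $\mathbf{Z}$ is exchangeably (independently) decomposable given $\mathbf{Y}$ if and only if for each assignment $\mathbf{y}$ to $\mathbf{Y}$, there exists an exchangeable (independent) decomposition $\mathcal{Z}_{\mathbf{y}}$ of $\Pr(\mathbf{Z} \mid \mathbf{y})$.

Furthermore, we say that $\mathbf{Z}$ is decomposable with bounded width iff the width of each $\mathcal{Z}_{\mathbf{y}}$ is bounded. When the decomposition can be computed in time polynomial in $|\mathbf{X}|$ for all $\mathbf{y}$, we say that $\mathbf{Z}$ is efficiently decomposable.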

Theorem 8.

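Let $\mathbf{X}$ be a set of variables with $\mathbf{Y} \subseteq \mathbf{X}$ and $\mathbf{Z} = \mathbf{X} \setminus \mathbf{Y}$. Suppose we are given a marginally exchangeable decomposition of $\mathbf{Y}$ with bounded width. Suppose further that $\mathbf{Z}$ is efficiently (exchangeably or independently) decomposable given $\mathbf{Y}$ with bounded width. If we can compute $\Pr(\mathbf{x})$ efficiently, then the complexity of MPE and marginal inference over variables $\mathbf{Y}$ is polynomial in $|\mathbf{X}|$.

Proof. Following Theorem 7, we only need to show that we can compute $\Pr(\mathbf{y}) = \sum_{\mathbf{z}} \Pr(\mathbf{y}, \mathbf{z})$ in time polynomial in $|\mathbf{X}|$ for all $\mathbf{y}$. When $\mathbf{Z}$ is exchangeably decomposable given $\mathbf{Y}$, this follows from constructing $\mathcal{Z}_{\mathbf{y}}$ and employing the arguments made in the proof of Theorem 4. The case when $\mathbf{Z}$ is independently decomposable is analogous. ∎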

Theorems 7 and 8 are powerful results and allow us to identify numerous probabilistic models for which inference is tractable. For instance, we will prove liftability results for Markov logic networks. However, we are only at the beginning of leveraging these tractability results to their fullest extent. Theorem 7 in particular is widely applicable because the computation of $\Pr(\mathbf{y})$ can be tractable for many reasons. For instance, conditioned on the variables $\mathbf{Y}$, the distributions $\Pr(\mathbf{Z} \mid \mathbf{y})$ could be bounded treewidth graphical models, such as tree-structured Markov networks. Tractable inference over $\mathbf{Y}$ then follows immediately from Theorem 7.

Since Theorem 7 only speaks to the tractability of querying variables in $\mathbf{Y}$, there is the question of when we can also efficiently query the variables $\mathbf{Z}$. Results from the lifted inference literature may provide a solution by bounding or approximating queries and evidence that includes $\mathbf{Z}$ to maintain marginal exchangeability [Van den Broeck and Darwiche2013]. The next section shows that certain restricted situations permit tractable inference on the variables in $\mathbf{Z}$.

Markov Logic Case Study

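Let us again consider the MLN

$$\begin{array}{ll} w_1 & \mathrm{Smokes}(x) \Rightarrow \mathrm{Cancer}(x) \\ w_2 & \mathrm{Smokes}(x) \land \mathrm{Friends}(x, y) \Rightarrow \mathrm{Smokes}(y) \end{array}$$

having the marginally exchangeable decomposition

$$\mathcal{Y} = \{\{\mathrm{Smokes}(A)\}, \{\mathrm{Smokes}(B)\}, \{\mathrm{Smokes}(C)\}, \dots\}$$

whose width is one. To intuitively see why this decomposition is marginally exchangeable, let us consider two states $\mathbf{y}$ and $\mathbf{y}'$ of the $\mathrm{Smokes}$ atoms in which only the values of two atoms, for example $\mathrm{Smokes}(A)$ and $\mathrm{Smokes}(B)$, are swapped. There is a symmetry of the MLN's joint distribution that swaps these atoms: the renaming automorphism [Bui, Huynh, and Riedel2012, Niepert2012a] that swaps constants $A$ and $B$ in all atoms. For marginal exchangeability, we need that $\sum_{\mathbf{z}} \Pr(\mathbf{y}, \mathbf{z}) = \sum_{\mathbf{z}} \Pr(\mathbf{y}', \mathbf{z})$. But this holds since the renaming automorphism is an automorphism of the set of states: for every $\mathbf{y}$, $\mathbf{y}'$, and $\mathbf{z}$ there exists an automorphism that maps $(\mathbf{y}, \mathbf{z})$ to $(\mathbf{y}', \mathbf{z}')$ with $\Pr(\mathbf{y}, \mathbf{z}) = \Pr(\mathbf{y}', \mathbf{z}')$.

The given MLN has several marginally exchangeable decompositions, with the most general one being

$$\mathcal{Y} = \{\{\mathrm{Smokes}(A), \mathrm{Cancer}(A), \mathrm{Friends}(A, A)\}, \{\mathrm{Smokes}(B), \mathrm{Cancer}(B), \mathrm{Friends}(B, B)\}, \dots\}.$$

For that decomposition, the remaining variables

$$\mathbf{Z} = \{\mathrm{Friends}(A, B), \mathrm{Friends}(B, A), \mathrm{Friends}(A, C), \dots\}$$

are independently decomposable given $\mathbf{Y}$. The $\mathbf{Z}$ variables appear at most once in any formula. In a probabilistic graphical model representation, evidence on the variables $\mathbf{Y}$ would therefore decompose the graph of $\mathbf{Z}$ into independent components. Thus, it follows from Theorem 8 that we can efficiently answer any query over the variables in $\mathbf{Y}$.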
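The effect of an independent decomposition given evidence can be illustrated on a simplified split of the variables. The sketch below is an assumption-laden illustration rather than the construction used in the text: it takes $\mathbf{Y}$ to be only the Smokes atoms (the width-one decomposition) and treats all Cancer and Friends atoms as the remaining variables. The numeric weights and function names are hypothetical. Given the Smokes values, every remaining atom occurs in exactly one ground formula, so summing it out contributes one independent factor.

```python
import itertools
from math import exp, isclose

W1, W2 = 1.3, 1.5  # hypothetical weights for Formulas 1 and 2

def smokes_weight_factored(smokes):
    """Unnormalized Pr(y) for a Smokes assignment, summing out one atom at a time."""
    n = len(smokes)
    total = 1.0
    for a in range(n):
        # Cancer(a): Formula 1 holds for both values unless a smokes.
        total *= (1.0 + exp(W1)) if smokes[a] else 2.0 * exp(W1)
        for b in range(n):
            # Friends(a, b): Formula 2 fails only if a smokes, b does not, and they are friends.
            violable = smokes[a] and not smokes[b]
            total *= (1.0 + exp(W2)) if violable else 2.0 * exp(W2)
    return total

def smokes_weight_brute(smokes):
    """Reference: enumerate every joint assignment to the Cancer and Friends atoms."""
    n = len(smokes)
    total = 0.0
    for cancer in itertools.product([0, 1], repeat=n):
        for friends in itertools.product([0, 1], repeat=n * n):
            log_w = 0.0
            for a in range(n):
                if (not smokes[a]) or cancer[a]:
                    log_w += W1
                for b in range(n):
                    if not (smokes[a] and friends[a * n + b]) or smokes[b]:
                        log_w += W2
            total += exp(log_w)
    return total
```

The factored version needs a number of multiplications quadratic in the number of people, whereas the brute-force reference is exponential in it. The factored weight also depends only on how many people smoke, in line with the marginal exchangeability of the Smokes atoms.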

This insight generalizes to a large class of MLNs, called the two-variable fragment. It consists of all MLNs whose formulas contain at most two logical variables.

Theorem 9.

In a two-variable fragment MLN, let $\mathbf{Y}$ and $\mathbf{Z}$ be the ground atoms with one and two distinct arguments respectively. Then there exists a marginally exchangeable decomposition of $\mathbf{Y}$, and $\mathbf{Z}$ is efficiently independently decomposable given $\mathbf{Y}$. Each decomposition’s width is at most twice the number of predicates.

The proof of Theorem 9 is rather technical and we refer the reader to the appendix for the details. It now follows from Theorems 8 and 9 that the complexity of inference over the unary atoms in the two-variable fragment is polynomial in the domain size $|D|$.

What happens if our query involves variables from $\mathbf{Z}$, the binary atoms? It is known in the lifted inference literature that we cannot expect efficient inference of general queries that involve the binary atoms. Assignments to the $\mathbf{Z}$ variables break symmetries and therefore break marginal exchangeability. This causes inference to become #P-hard as a function of the query [Van den Broeck and Davis2012]. Nevertheless, if we bound the number of binary atoms involved in the query, we can use the developed theory to show a general liftability result.

Theorem 10.

For any MLN in the two-variable fragment, MPE and marginal inference over the unary atoms and a bounded number of binary atoms is domain-lifted.

This corresponds to one of the strongest known theoretical results in the lifted inference literature [Jaeger and Van den Broeck2012]. We refer the interested reader to the appendix for the proof. A consequence of Theorem 10 is that we can efficiently compute all single marginals in the two-variable fragment, given arbitrary evidence on the unary atoms.

6 Discussion and Conclusion

We conjecture that the concept of (partial) exchangeability has potential to contribute to a deeper understanding of tractable probabilistic models. The important role conditional independence plays in the research field of graphical models is evidence for this hypothesis. Similar to conditional independence, it might be possible to develop a theory of exchangeability that mirrors that of independence. For instance, there might be (graphical) structural representations of particular types of partial exchangeability and corresponding logical axiomatizations [Pearl1988]. Moreover, it would be interesting to develop graphical models with exchangeability and independence, and notions like d-separation to detect marginal exchangeability and conditional decomposability from a structural representation. The first author has taken steps in this direction by introducing exchangeable variable models, a class of (non-relational) probabilistic models based on finite partial exchangeability [Niepert and Domingos2014].

Recently, there has been considerable interest in computing and exploiting the automorphisms of graphical models [Niepert2012b, Bui, Huynh, and Riedel2012]. There are several interesting connections between automorphisms, exchangeability, and lifted inference [Niepert2012a]. Moreover, there are several group theoretical algorithms that one could apply to the automorphism groups to discover the structure of exchangeable variable decompositions from the structure of the graphical models. Since we presently only exploit renaming automorphisms, there is a potential for tractable inference in MLNs that goes beyond what is known in the lifted inference literature.

Partial exchangeability is related to collective graphical models (CGMs) [Sheldon and Dietterich2011] and cardinality-based potentials [Gupta, Diwan, and Sarawagi2007], as these models also operate on sufficient statistics. However, probabilistic inference for CGMs is not tractable in general, and there are no theoretical results that identify tractable CGMs. The presented work may help to identify such situations. The presented theory also generalizes the statistics of cardinality-based potentials.

Lifted Inference and Exchangeability

Our case studies identified a deep connection between lifted probabilistic inference and the concepts of partial, marginal and conditional exchangeability. In this new context, it appears that exact lifted inference algorithms [de Salvo Braz, Amir, and Roth2005, Milch et al.2008, Jha et al.2010, Van den Broeck et al.2011, Gogate and Domingos2011, Taghipour et al.2012] can all be understood as performing essentially three steps: (i) construct a sufficient statistic $T$, (ii) generate all possible values of the sufficient statistic, and (iii) count suborbit sizes for a given statistic. For an example of (i), we can show that a compiled first-order circuit [Van den Broeck et al.2011] or the trace of probabilistic theorem proving [Gogate and Domingos2011] encodes a sufficient statistic in its existential quantifier and splitting nodes. Steps (ii) and (iii) are manifested in all these algorithms through summations and binomial coefficients.

Between Corollary 6 and Theorem 10, we have re-proven almost the entire range of liftability results from the lifted inference literature [Jaeger and Van den Broeck2012] within the exchangeability framework, and extended these to MPE inference. There is an essential difference though: liftability results make assumptions about the syntax (e.g., MLNs), whereas our exchangeability theorems apply to all distributions. We expect Theorem 8 to be used to show liftability, and more general tractability results for many other representation languages, including but not limited to the large number of statistical relational languages that have been proposed [Getoor and Taskar2007, De Raedt et al.2008].


This work was partially supported by ONR grants #N00014-12-1-0423, #N00014-13-1-0720, and #N00014-12-1-0312; NSF grants #IIS-1118122 and #IIS-0916161; ARO grant #W911NF-08-1-0242; AFRL contract #FA8750-13-2-0019; the Research Foundation-Flanders (FWO-Vlaanderen); and a Google research award to MN.

Appendix A Continued Proof of Theorem 4


To prove statements (b) and (c), we need to represent partial assignments with . The partial assignments decompose into partial assignments in accordance with . Each corresponds to a string where characters and encode assignments to variables in and encodes an unassigned variable in . In this case, we say that is of type . Please note that there are distinct and distinct . We say that agrees with , denoted by , if and only if their shared variables have identical assignments.

A completion of to is a bijection such that implies . Every completion corresponds to a unique way to assign elements in to unassigned variables so as to turn the partial assignment into the full assignment .

Let , let , and let . Moreover, let for each . Consider the set of matrices

Every represents a set of completions from to for which . The value indicates that each completion represented by maps elements in of type to elements in of type . We write for the set of completions represented by .

We have to prove the following statements:

  1. For every and every there exists an with and is a completion of to ;

  2. For every with and every completion of to there exists an such that ;

  3. For all with we have that ;

  4. For every , we can efficiently compute , the size of the set of completions represented by .

To prove statement (1), let and . By the definition of , maps elements in of type to elements in of type . By the conditions and of the definition of we have that is a bijection. By the condition of the definition of we have that implies and, therefore, . Hence, is a completion. Moreover, completes to an with by the definition of . Hence, .

To prove statement (2), let with and let be a completion of to . We construct an with as follows. Since is a completion we have that implies and, hence, we set if . For all other entries in we set . Since is surjective, we have that and since is injective, we have that . Hence, .

To prove statement (3), let with . Since we have that there exist such that, without loss of generality, . Hence, every maps fewer elements of type to elements of type than every . Hence, for every and every .

To prove statement (4), let . Every maps elements in of type to elements in of type . Hence, the size of the set of completions represented by is, for each , the number of different ways to place balls of color , , into urns. Hence,

From the statements (1)-(4) we can conclude that

This allows us to prove (b) and (c). We can construct in time polynomial in as follows. There are entries in a matrix and each entry has at most different values. Hence, we can enumerate all possible matrices . We simply select those for which the conditions in the definition of hold. For one we can efficiently construct one and the that it completes to. This proves (b). Finally, we compute . This proves (c). ∎

Appendix B Proof of Theorem 5


Let be the unary predicates of a given MLN and let be the domain. After grounding, there are ground atoms per predicate. We write to denote the ground atom that resulted from instantiating predicate with domain element . Let be a decomposition of the set of ground atoms with for every . A renaming automorphism [Bui, Huynh, and Riedel2012, Niepert2012a] is a permutation of the ground atoms that results from a permutation of the domain elements. The joint distribution over all ground atoms remains invariant under these permutations. Consider the permutation of ground atoms that results from swapping two domain elements . This permutation acting on the set of ground atoms permutes the components and and leaves all other components invariant. Since this is possible for each pair it follows that the decomposition is exchangeable. ∎
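The invariance argument can be checked numerically on a small example. The Python sketch below assumes a hypothetical MLN with two unary predicates (`Smokes`, `Cancer`), a three-element domain, and made-up formula weights; none of these come from the paper. It verifies that the unnormalized weight of every world is unchanged when two domain elements are swapped, which is the renaming automorphism used in the proof.

```python
from itertools import product
from math import exp, isclose

DOMAIN = ("a", "b", "c")           # hypothetical domain elements
PREDICATES = ("Smokes", "Cancer")  # hypothetical unary predicates


def mln_weight(world: dict) -> float:
    """Unnormalized weight under two assumed weighted formulas:
    1.5 : Smokes(x) => Cancer(x)    and    -0.5 : Smokes(x)."""
    total = 0.0
    for d in DOMAIN:
        if (not world[("Smokes", d)]) or world[("Cancer", d)]:
            total += 1.5
        if world[("Smokes", d)]:
            total -= 0.5
    return exp(total)


def swap(d1: str, d2: str) -> dict:
    """Permutation of the domain that exchanges d1 and d2."""
    perm = {d: d for d in DOMAIN}
    perm[d1], perm[d2] = d2, d1
    return perm


def rename(world: dict, perm: dict) -> dict:
    """Permute ground atoms by renaming their domain element."""
    return {(p, perm[d]): value for (p, d), value in world.items()}


def all_worlds():
    """Enumerate every truth assignment to the ground atoms."""
    atoms = [(p, d) for p in PREDICATES for d in DOMAIN]
    for values in product((False, True), repeat=len(atoms)):
        yield dict(zip(atoms, values))


def is_invariant_under_swaps(weight_fn) -> bool:
    """True if weight_fn is unchanged by every pairwise swap of domain elements."""
    pairs = [(d1, d2) for i, d1 in enumerate(DOMAIN) for d2 in DOMAIN[i + 1:]]
    return all(
        isclose(weight_fn(w), weight_fn(rename(w, swap(d1, d2))))
        for w in all_worlds()
        for d1, d2 in pairs
    )
```

Each swap exchanges the two components that collect the atoms of the swapped domain elements and leaves the other components fixed. A weight function that singles out one constant, for example one that rewards `Smokes("a")` only, would fail this check.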

Appendix C Proof of Theorem 9


Let be the unary predicates and let be the binary predicates of a given MLN and let be the domain. After grounding, there are ground atoms per unary and ground atoms per binary predicate. We write to denote the ground atom that resulted from instantiating unary predicate with domain element and to denote the ground atom that resulted from instantiating binary predicate with domain elements and .
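As a small illustration of the grounding step, the following Python sketch enumerates ground atoms for hypothetical predicate names and a hypothetical four-element domain. It assumes that a binary predicate is grounded over all ordered pairs of domain elements, including pairs that repeat an element, so a domain of size n yields n ground atoms per unary predicate and n^2 per binary predicate.

```python
from itertools import product


def ground_atoms(unary, binary, domain):
    """Return the ground atoms of the given unary and binary predicates."""
    # One ground atom per unary predicate and domain element.
    unary_atoms = [(p, d) for p in unary for d in domain]
    # One ground atom per binary predicate and ordered pair of domain elements.
    binary_atoms = [
        (p, d1, d2) for p in binary for d1, d2 in product(domain, repeat=2)
    ]
    return unary_atoms, binary_atoms


# Hypothetical example: two unary predicates, one binary predicate, four constants.
unary_atoms, binary_atoms = ground_atoms(
    unary=("Smokes", "Cancer"),
    binary=("Friends",),
    domain=("a", "b", "c", "d"),
)
```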

Let be the set of all ground atoms, let and let . Moreover, let with . We can make the same arguments as in the proof of Theorem 5 to show that is exchangeable.

Now, we prove that the variables are independently decomposable given . Let for all . Now, let be any ground formula and let be the set of ground atoms occurring in both and . Then, either or since every formula in the MLN has at most two variables. Hence, is a decomposition of with components, width , and factorizes as

By the properties of MLNs, we have that the are computable in time exponential in the width of the decomposition but polynomial in . ∎

Appendix D Proof of Theorem 10


Suppose that the query contains a bounded number of binary atoms whose arguments are constants from the set . Consider the set of variables consisting of all unary atoms whose argument comes from , and all binary atoms whose arguments both come from . The unary atoms in are no longer marginally exchangeable, because their arguments can appear asymmetrically in . We can now answer the query by simply enumerating all states of and performing inference in each separately, where all variables have again become marginally exchangeable, and all variables have become independently decomposable given . The construction of and is similar to the proof of Theorem 9, except that some additional binary atoms are now in instead of . These atoms have one argument in , and one not in , and are treated as unary. When we bound the number of binary atoms in the query, the size of will not be a function of the domain size, and enumerating over all states is domain-lifted. ∎


  • [Boutilier et al.1996] Boutilier, C.; Friedman, N.; Goldszmidt, M.; and Koller, D. 1996. Context-specific independence in Bayesian networks. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence (UAI), 115–123.
  • [Bui, Huynh, and de Salvo Braz2012] Bui, H.; Huynh, T.; and de Salvo Braz, R. 2012. Exact lifted inference with distinct soft evidence on every object. In Proceedings of the 26th Conference on Artificial Intelligence (AAAI).
  • [Bui, Huynh, and Riedel2012] Bui, H.; Huynh, T.; and Riedel, S. 2012. Automorphism groups of graphical models and lifted variational inference.
  • [Darwiche2009] Darwiche, A. 2009. Modeling and reasoning with Bayesian networks. Cambridge University Press.
  • [De Raedt et al.2008] De Raedt, L.; Frasconi, P.; Kersting, K.; and Muggleton, S., eds. 2008. Probabilistic inductive logic programming: theory and applications. Berlin, Heidelberg: Springer-Verlag.
  • [de Salvo Braz, Amir, and Roth2005] de Salvo Braz, R.; Amir, E.; and Roth, D. 2005. Lifted first-order probabilistic inference. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1319–1325.
  • [Diaconis and Freedman1980a] Diaconis, P., and Freedman, D. 1980a. De Finetti’s generalizations of exchangeability. In Studies in Inductive Logic and Probability, volume II.
  • [Diaconis and Freedman1980b] Diaconis, P., and Freedman, D. 1980b. Finite exchangeable sequences. The Annals of Probability 8(4):745–764.
  • [Getoor and Taskar2007] Getoor, L., and Taskar, B. 2007. Introduction to Statistical Relational Learning. The MIT Press.
  • [Gogate and Domingos2011] Gogate, V., and Domingos, P. 2011. Probabilistic theorem proving. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), 256–265.
  • [Gupta, Diwan, and Sarawagi2007] Gupta, R.; Diwan, A. A.; and Sarawagi, S. 2007. Efficient inference with cardinality-based clique potentials. In Proceedings of the 24th International Conference on Machine Learning (ICML), 329–336.
  • [Jaeger and Van den Broeck2012] Jaeger, M., and Van den Broeck, G. 2012. Liftability of probabilistic inference: Upper and lower bounds. In Proceedings of the 2nd International Workshop on Statistical Relational AI.
  • [Jha et al.2010] Jha, A.; Gogate, V.; Meliou, A.; and Suciu, D. 2010. Lifted inference seen from the other side: The tractable features. In Proceedings of the 24th Conference on Neural Information Processing Systems (NIPS).
  • [Kersting2012] Kersting, K. 2012. Lifted probabilistic inference. In Proceedings of European Conference on Artificial Intelligence (ECAI).
  • [Koller and Friedman2009] Koller, D., and Friedman, N. 2009. Probabilistic Graphical Models. The MIT Press.
  • [Lauritzen et al.1984] Lauritzen, S. L.; Barndorff-Nielsen, O. E.; Dawid, A. P.; Diaconis, P.; and Johansen, S. 1984. Extreme point models in statistics. Scandinavian Journal of Statistics 11(2).
  • [Lauritzen1988] Lauritzen, S. L. 1988. Extremal families and systems of sufficient statistics. Lecture notes in statistics. Springer-Verlag.
  • [Lauritzen1996] Lauritzen, S. L. 1996. Graphical Models. Oxford University Press.
  • [Milch et al.2008] Milch, B.; Zettlemoyer, L.; Kersting, K.; Haimes, M.; and Kaelbling, L. 2008. Lifted probabilistic inference with counting formulas. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence, 1062–1068.
  • [Mladenov and Kersting2013] Mladenov, M., and Kersting, K. 2013. Lifted inference via k-locality. In Proceedings of the 3rd International Workshop on Statistical Relational AI.
  • [Niepert and Domingos2014] Niepert, M., and Domingos, P. 2014. Exchangeable variable models. In Proceedings of the International Conference on Machine Learning (ICML).
  • [Niepert2012a] Niepert, M. 2012a. Lifted probabilistic inference: An MCMC perspective. In Proceedings of StaRAI.
  • [Niepert2012b] Niepert, M. 2012b. Markov chains on orbits of permutation groups. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI).
  • [Niepert2013] Niepert, M. 2013. Symmetry-aware marginal density estimation. In Proceedings of the 27th Conference on Artificial Intelligence (AAAI).
  • [Noessner, Niepert, and Stuckenschmidt2013] Noessner, J.; Niepert, M.; and Stuckenschmidt, H. 2013. RockIt: Exploiting Parallelism and Symmetry for MAP Inference in Statistical Relational Models. In Proceedings of the 27th Conference on Artificial Intelligence (AAAI).
  • [Pearl1988] Pearl, J. 1988. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann.
  • [Poole2003] Poole, D. 2003. First-order probabilistic inference. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 985–991.
  • [Richardson and Domingos2006] Richardson, M., and Domingos, P. 2006. Markov logic networks. Machine learning 62(1-2):107–136.
  • [Robertson and Seymour1986] Robertson, N., and Seymour, P. 1986. Graph minors. II. Algorithmic aspects of tree-width. Journal of algorithms 7(3):309–322.
  • [Sheldon and Dietterich2011] Sheldon, D., and Dietterich, T. 2011. Collective graphical models. In Advances in Neural Information Processing Systems (NIPS), 1161–1169.
  • [Taghipour et al.2012] Taghipour, N.; Fierens, D.; Davis, J.; and Blockeel, H. 2012. Lifted variable elimination with arbitrary constraints. In Proceedings of the fifteenth international conference on Artificial Intelligence and Statistics, volume 22, 1194–1202.
  • [Taghipour et al.2013] Taghipour, N.; Fierens, D.; Van den Broeck, G.; Davis, J.; and Blockeel, H. 2013. Completeness results for lifted variable elimination. In Proceedings of the 16th Conference on Artificial Intelligence and Statistics, 572–580.
  • [Van den Broeck and Darwiche2013] Van den Broeck, G., and Darwiche, A. 2013. On the complexity and approximation of binary evidence in lifted inference. In Advances in Neural Information Processing Systems 26 (NIPS).
  • [Van den Broeck and Davis2012] Van den Broeck, G., and Davis, J. 2012. Conditioning in first-order knowledge compilation and lifted probabilistic inference. In Proceedings of the 26th Conference on Artificial Intelligence (AAAI).
  • [Van den Broeck et al.2011] Van den Broeck, G.; Taghipour, N.; Meert, W.; Davis, J.; and De Raedt, L. 2011. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), 2178–2185.
  • [Van den Broeck2011] Van den Broeck, G. 2011. On the completeness of first-order knowledge compilation for lifted probabilistic inference. In Advances in Neural Information Processing Systems 24 (NIPS), 1386–1394.