# First-Order Bayesian Network Specifications Capture the Complexity Class PP

The point of this note is to prove that a language is in the complexity class PP if and only if the strings of the language encode valid inferences in a Bayesian network defined using function-free first-order logic with equality.

12/04/2016

## 1 Introduction

The point of this note is to prove that a language is in the complexity class $\mathsf{PP}$ if and only if the strings of the language encode valid inferences in a Bayesian network defined using function-free first-order logic with equality. Before this statement can be made precise, a number of definitions are needed. Section 2 summarizes the necessary background and Section 3 defines first-order Bayesian network specifications and the complexity class $\mathsf{PP}$. Section 4 states and proves that the former captures the latter.

## 2 Background

We collect a number of definitions here [1, 2], so as to fix our terminology and notation.

We consider input strings in the alphabet $\{0,1\}$; that is, a string is a sequence of $0$s and $1$s. A language is a set of strings; a complexity class is a set of languages. A language is decided by a Turing machine if the machine accepts each string in the language, and rejects each string not in the language. The complexity class $\mathsf{NP}$ contains each language that can be decided by a nondeterministic Turing machine with a polynomial time bound.

We focus on function-free first-order logic with equality (denoted by $\mathsf{FFFO}$). That is, all formulas we contemplate are well-formed formulas of first-order logic with equality but without functions, containing predicates, negation ($\neg$), conjunction ($\wedge$), disjunction ($\vee$), implication ($\Rightarrow$), equivalence ($\Leftrightarrow$), existential quantification ($\exists$) and universal quantification ($\forall$). The set of predicates is the vocabulary.

A formula in existential function-free second-order logic (denoted by $\mathsf{ESO}$) is a formula of the form $\exists r_1 \dots \exists r_m\, \phi$, where $\phi$ is a sentence of $\mathsf{FFFO}$ containing predicates $r_1,\dots,r_m$. Such a sentence allows existential quantification over the predicates themselves. Note that again we have equality in the language (that is, the built-in predicate $=$ is always available).

For a given vocabulary, a structure $\mathfrak{A}$ is a pair consisting of a domain and an interpretation. A domain is simply a set. An interpretation is a truth assignment for every grounding of every predicate that is not existentially quantified. As an example, consider the following formula of $\mathsf{ESO}$ as discussed by Grädel [1]:

$$
\exists\, \mathsf{partition} : \forall x : \forall y : \bigl(\mathsf{edge}(x,y) \Rightarrow (\mathsf{partition}(x) \Leftrightarrow \neg\, \mathsf{partition}(y))\bigr).
$$

A domain is then a set that can be taken as the set of nodes of an input graph. An interpretation is a truth assignment for the predicate $\mathsf{edge}$ and can be taken as the set of edges of the input graph. The formula is satisfied if and only if it is possible to partition the nodes into two subsets such that the two endpoints of every edge lie in different subsets. That is, the formula is satisfied if and only if the input graph is bipartite.
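As a minimal sketch of how such a sentence can be evaluated, the following Python snippet checks the formula by brute force: it tries every interpretation of the quantified predicate and tests the first-order part on each one. The function name `satisfies_bipartite_sentence` and the two toy graphs are illustrative assumptions, not part of the cited references.

```python
from itertools import product


def satisfies_bipartite_sentence(domain, edge):
    """Brute-force check of: exists partition, forall x, y:
    edge(x, y) => (partition(x) <=> not partition(y)).

    `domain` is a list of nodes; `edge` is a set of ordered pairs.
    """
    for bits in product([False, True], repeat=len(domain)):
        # One candidate interpretation of the quantified predicate.
        partition = dict(zip(domain, bits))
        # The implication only constrains pairs that are edges.
        if all(partition[x] != partition[y] for (x, y) in edge):
            return True
    return False


# Hypothetical inputs: a 4-cycle is bipartite, a triangle is not.
square = {(1, 2), (2, 3), (3, 4), (4, 1)}
triangle = {(1, 2), (2, 3), (3, 1)}
print(satisfies_bipartite_sentence([1, 2, 3, 4], square))  # True
print(satisfies_bipartite_sentence([1, 2, 3], triangle))   # False
```

The loop over all candidate interpretations plays the role of the existential second-order quantifier; it is exponential in the domain size and is meant only to illustrate the semantics.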

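We only consider finite vocabularies and finite domains in this note. If a formula $\phi(\vec{x})$ has free logical variables $\vec{x}$, then denote by $\mathfrak{A} \models \phi(\vec{a})$ the fact that formula $\phi(\vec{x})$ is true in structure $\mathfrak{A}$ when the logical variables $\vec{x}$ are replaced by elements $\vec{a}$ of the domain. In this case say that $\mathfrak{A}$ is a model of $\phi(\vec{a})$.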

Note that if $\phi$ is a formula in $\mathsf{ESO}$ as in the previous paragraphs, then its interpretations run over the groundings of the non-quantified predicates; that is, if $\phi$ contains predicates $r_1,\dots,r_m$ and $s_1,\dots,s_M$, but $r_1,\dots,r_m$ are all existentially quantified, then a model for $\phi$ contains an interpretation for $s_1,\dots,s_M$.

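There is an isomorphism between structures $\mathfrak{A}_1$ and $\mathfrak{A}_2$ when there is a bijective mapping $g$ between the domains such that if $r(a_1,\dots,a_k)$ is true in $\mathfrak{A}_1$, then $r(g(a_1),\dots,g(a_k))$ is true in $\mathfrak{A}_2$, and moreover if $r(a_1,\dots,a_k)$ is true in $\mathfrak{A}_2$, then $r(g^{-1}(a_1),\dots,g^{-1}(a_k))$ is true in $\mathfrak{A}_1$ (where $g^{-1}$ denotes the inverse of $g$). A set of structures is isomorphism-closed if whenever a structure is in the set, all structures that are isomorphic to it are also in the set.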

We assume that every structure is given as a string, encoded as follows for a fixed vocabulary [2, Section 6.1]. First, if the domain contains elements $a_1,\dots,a_n$, then the string begins with $n$ symbols $0$ followed by $1$. The vocabulary is fixed, so we take some order for the predicates, $r_1,\dots,r_m$. We then append, in this order, the encoding of the interpretation of each predicate. Focus on predicate $r_i$ of arity $k$. To encode it with respect to a domain, we need to order the elements of the domain, say $a_1 < a_2 < \dots < a_n$. This total ordering is assumed for now to be always available; it will be important later to check that the ordering itself can be defined. In any case, with a total ordering we can enumerate lexicographically all $k$-tuples over the domain. Now suppose $\vec{a}_j$ is the $j$th tuple in this enumeration; then the $j$th bit of the encoding of $r_i$ is $1$ if $r_i(\vec{a}_j)$ is true in the given interpretation, and $0$ otherwise. Thus the encoding of $r_i$ is a string containing $n^k$ symbols (either $0$ or $1$).
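The encoding can be sketched in a few lines of Python. This is an illustration under stated assumptions: the domain size is written in unary as `n` zeros followed by a one, the order of the `domain` list is taken as the total order, and the helper name `encode_structure` and the toy graph are hypothetical.

```python
from itertools import product


def encode_structure(domain, predicates):
    """Encode a finite structure as a string of 0s and 1s.

    `domain` is a list whose order is the assumed total order.
    `predicates` is a list of (arity, relation) pairs in the fixed
    vocabulary order; each relation is a set of tuples.
    """
    n = len(domain)
    parts = ["0" * n + "1"]  # domain size in unary, then a separator
    for arity, relation in predicates:
        # itertools.product enumerates the k-tuples lexicographically
        # with respect to the order of `domain`.
        bits = ("1" if t in relation else "0"
                for t in product(domain, repeat=arity))
        parts.append("".join(bits))
    return "".join(parts)


# Hypothetical input: a two-node graph with an edge in both directions.
edge = {("a", "b"), ("b", "a")}
print(encode_structure(["a", "b"], [(2, edge)]))  # 0010110
```

For a binary predicate over two elements the predicate contributes 2^2 = 4 bits, one per tuple in the order (a,a), (a,b), (b,a), (b,b).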

We can now state Fagin’s theorem:

###### Theorem 1.

Let $\mathcal{S}$ be an isomorphism-closed set of finite structures of some non-empty finite vocabulary. Then $\mathcal{S}$ is in $\mathsf{NP}$ if and only if $\mathcal{S}$ is the class of finite models of a sentence in existential function-free second-order logic.

Consider the problem of deciding whether an input structure is a model of a fixed existential function-free second-order sentence. Fagin’s theorem means first that this problem is in $\mathsf{NP}$ (this is the easy part of the theorem). Second, the theorem means that every language that can be decided by a polynomial-time nondeterministic Turing machine can be exactly encoded as the set of models of a sentence in existential second-order logic (this is the surprising part of the theorem). This implies that the problem is $\mathsf{NP}$-hard, but the theorem is much more elegant, because it says that there is no need for any polynomial-time processing outside of the specification provided by existential second-order logic.

The significance of Fagin’s theorem is that it offers a definition of $\mathsf{NP}$ that is not tied to any computational model; rather, it is tied to the expressivity of the language that is used to specify problems. Any language that can be decided by a polynomial-time nondeterministic Turing machine can equivalently be decided using first-order logic with some added quantification over predicates.

## 3 First-order Bayesian network specifications and the complexity class PP

We start by defining our two main characters: on one side we have Bayesian networks that are specified using $\mathsf{FFFO}$; on the other side we have the complexity class $\mathsf{PP}$.

It will now be convenient to view each grounded predicate $r(\vec{a})$ as a random variable once we have a fixed vocabulary and domain. So, given a domain $\mathcal{D}$, we understand $r(\vec{a})$ as a function over all possible interpretations of the vocabulary, so that $r(\vec{a})(\mathbb{I})$ yields $1$ if $r(\vec{a})$ is true in interpretation $\mathbb{I}$, and $0$ otherwise.

### 3.1 First-order Bayesian network specifications

A first-order Bayesian network specification is a directed graph where each node is a predicate, and where each root node $r$ is associated with a probabilistic assessment

$$
P(r(\vec{x}) = 1) = \alpha,
$$

while each non-root node $s$ is associated with a formula (called the definition of $s$)

$$
s(\vec{x}) \Leftrightarrow \phi(\vec{x}),
$$

where $\phi(\vec{x})$ is a formula in $\mathsf{FFFO}$ with free variables $\vec{x}$.

Given a domain, a first-order Bayesian network specification can be grounded into a unique Bayesian network. This is done:

1. by producing every grounding of the predicates;

2. by associating with each grounding $r(\vec{a})$ of a root predicate the grounded assessment $P(r(\vec{a}) = 1) = \alpha$;

3. by associating with each grounding $s(\vec{a})$ of a non-root predicate the grounded definition $s(\vec{a}) \Leftrightarrow \phi(\vec{a})$;

4. finally, by drawing a graph where each node is a grounded predicate and where there is an edge into each grounded non-root predicate $s(\vec{a})$ from each grounding of a predicate that appears in the grounded definition of $s(\vec{a})$.

Consider, as an example, the following model of asymmetric friendship, where an individual is always a friend of herself, and where two individuals are friends if they are both fans (of some writer, say) or if there is some “other” reason for it:

$$
\begin{aligned}
P(\mathsf{fan}(x) = 1) &= 0.2, \\
\mathsf{friends}(x,y) &\Leftrightarrow (x = y) \vee (\mathsf{fan}(x) \wedge \mathsf{fan}(y)) \vee \mathsf{other}(x,y), \\
P(\mathsf{other}(x,y) = 1) &= 0.1.
\end{aligned}
\tag{3.1}
$$

Suppose we have a finite domain $\mathcal{D}$. Figure 1 depicts the Bayesian network generated by $\mathcal{D}$ and Expression (3.1).
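To make the grounding concrete, here is a minimal Python sketch that grounds the friendship specification and computes a marginal probability by enumerating every interpretation of the root predicates. The two-element domain `["a", "b"]` and the helper names `friends` and `prob_friends` are assumptions made for illustration; brute-force enumeration stands in for a real inference algorithm.

```python
from itertools import product

domain = ["a", "b"]  # hypothetical two-element domain

# Groundings of the root predicates with their probabilities.
roots = {("fan", (x,)): 0.2 for x in domain}
roots.update({("other", (x, y)): 0.1 for x in domain for y in domain})


def friends(world, x, y):
    """Grounded definition of friends(x, y) in a given world."""
    return (x == y
            or (world[("fan", (x,))] and world[("fan", (y,))])
            or world[("other", (x, y))])


def prob_friends(x, y):
    """Add up the probability of each root interpretation in which
    friends(x, y) holds."""
    atoms = list(roots)
    total = 0.0
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        weight = 1.0
        for atom in atoms:
            weight *= roots[atom] if world[atom] else 1.0 - roots[atom]
        if friends(world, x, y):
            total += weight
    return total


print(round(prob_friends("a", "b"), 6))  # 0.136
print(round(prob_friends("a", "a"), 6))  # 1.0
```

The first value agrees with the direct calculation 1 - (1 - 0.2 * 0.2)(1 - 0.1) = 0.136, and the second reflects that an individual is always a friend of herself.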

For a given Bayesian network specification $\tau$ and a domain $\mathcal{D}$, denote by $\mathbb{B}(\tau,\mathcal{D})$ the Bayesian network obtained by grounding $\tau$ with respect to $\mathcal{D}$. The set of all first-order Bayesian network specifications is denoted by $\mathbb{B}(\mathsf{FFFO})$.

### 3.2 Probabilistic Turing machines and the complexity class PP

If a Turing machine is such that, whenever its transition function maps to a non-singleton set, the transition is selected with uniform probability within that set, then the Turing machine is a probabilistic Turing machine. The complexity class $\mathsf{PP}$ is the set of languages that are decided by a probabilistic Turing machine in polynomial time, with an error probability strictly less than $1/2$ for all input strings.

Intuitively, $\mathsf{PP}$ represents the complexity of computing probabilities for a phenomenon that can be simulated by a polynomial-time probabilistic Turing machine.

This complexity class can be equivalently defined as follows: a language is in $\mathsf{PP}$ if and only if there is a polynomial-time nondeterministic Turing machine such that a string is in the language if and only if more than half of the computation paths of the machine end in the accepting state when the string is the input. We can imagine that there is a special class of nondeterministic Turing machines that, given an input, not only accept it or not, but actually write on a special tape whether that input is accepted in the majority of computation paths. Such a special machine could then be used directly to decide a language in $\mathsf{PP}$.
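The majority-of-paths definition can be illustrated with a small Python sketch. It simulates a nondeterministic machine by enumerating every string of guessed bits (one string per computation path) and accepts when strictly more than half of the paths accept. The names `majority_accepts` and `satisfies`, and the choice of a CNF-satisfaction verifier as the toy machine, are assumptions for illustration only; the enumeration takes exponential time and is not how a probabilistic machine would run.

```python
from itertools import product


def majority_accepts(verifier, instance, num_guess_bits):
    """Return True iff strictly more than half of the computation
    paths (one per string of guessed bits) accept `instance`."""
    accepting = sum(
        1 for guess in product([0, 1], repeat=num_guess_bits)
        if verifier(instance, guess)
    )
    return 2 * accepting > 2 ** num_guess_bits


def satisfies(clauses, guess):
    """Toy verifier: does the guessed assignment satisfy a CNF formula?
    A literal +i or -i refers to variable i (1-indexed)."""
    return all(
        any((guess[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )


# (x1 or x2) holds in 3 of 4 assignments; (x1 and x2) in only 1 of 4.
print(majority_accepts(satisfies, [[1, 2]], 2))    # True
print(majority_accepts(satisfies, [[1], [2]], 2))  # False
```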

## 4 B(FFFO) captures PP

Given a first-order Bayesian network specification and a domain, an evidence piece is a partial interpretation; that is, an evidence piece assigns a truth value for some groundings of predicates.

We encode a pair domain/evidence using the same strategy used before to encode a structure; however, we must take into account the fact that a particular grounding of a predicate can be either assigned true or false or be left without assignment. So we use a pair of symbols in $\{0,1\}$ to encode each grounding; we assume that $00$ means “false” and $11$ means “true”, while say $01$ means lack of assignment.
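A possible rendering of this encoding in Python is sketched below. The concrete codes in `CODES` (two equal symbols for an assigned truth value, a mixed pair for a missing assignment), the unary domain prefix, and the helper name `encode_evidence` are assumptions chosen for illustration.

```python
from itertools import product

# Assumed two-symbol codes; the code for "unassigned" is one possible choice.
CODES = {False: "00", True: "11", None: "01"}


def encode_evidence(domain, predicates, evidence):
    """Encode a domain/evidence pair as a string of 0s and 1s.

    `predicates` lists (name, arity) in vocabulary order; `evidence`
    maps (name, tuple) to True or False, and groundings that are
    missing from the mapping are treated as unassigned.
    """
    parts = ["0" * len(domain) + "1"]  # domain size in unary
    for name, arity in predicates:
        for t in product(domain, repeat=arity):
            parts.append(CODES[evidence.get((name, t))])
    return "".join(parts)


# Hypothetical evidence: fan(a) is observed true, fan(b) is unobserved.
evidence = {("fan", ("a",)): True}
print(encode_evidence(["a", "b"], [("fan", 1)], evidence))  # 0011101
```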

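Say there is an isomorphism between pairs $(\mathcal{D}_1,\mathbf{E}_1)$ and $(\mathcal{D}_2,\mathbf{E}_2)$ when there is a bijective mapping $g$ between the domains such that if $r(a_1,\dots,a_k)$ is true in $\mathbf{E}_1$, then $r(g(a_1),\dots,g(a_k))$ is true in $\mathbf{E}_2$, and moreover if $r(a_1,\dots,a_k)$ is true in $\mathbf{E}_2$, then $r(g^{-1}(a_1),\dots,g^{-1}(a_k))$ is true in $\mathbf{E}_1$ (where again $g^{-1}$ denotes the inverse of $g$). A set of pairs domain/evidence is isomorphism-closed if whenever a pair is in the set, all pairs that are isomorphic to it are also in the set.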

Suppose a set of pairs domain/evidence is given with respect to a fixed vocabulary $\sigma$. Once encoded, these pairs form a language that can for instance belong to $\mathsf{NP}$ or to $\mathsf{PP}$. One can imagine building a Bayesian network specification $\tau$ on an extended vocabulary consisting of $\sigma$ plus some additional predicates, so as to decide this language of domain/evidence pairs. For a given input pair $(\mathcal{D},\mathbf{E})$, the Bayesian network specification $\tau$ and the domain $\mathcal{D}$ lead to a Bayesian network $\mathbb{B}(\tau,\mathcal{D})$; this network can be used to compute the probability of some groundings, and that probability in turn can be used to accept/reject the input. This is the sort of strategy we pursue.

The point is that we must determine some prescription by which, given a Bayesian network and an evidence piece, one can generate an actual decision so as to accept/reject the input pair domain/evidence. Suppose we take the following strategy. Assume that in the extended vocabulary of $\tau$ there are two sets of distinguished auxiliary predicates $\mathbf{A}$ and $\mathbf{B}$ that are not in the input vocabulary $\sigma$. We can use the Bayesian network $\mathbb{B}(\tau,\mathcal{D})$ to compute the probability $P(A \mid B, \mathbf{E})$, where $A$ and $B$ are interpretations of $\mathbf{A}$ and $\mathbf{B}$ respectively. And then we might accept/reject the input on the basis of $P(A \mid B, \mathbf{E})$. However, we cannot specify particular interpretations $A$ and $B$ as part of the input, as the related predicates are not in the vocabulary $\sigma$. Thus the sensible strategy is to fix attention on some selected pair of interpretations; we simply take the interpretations that assign true to every grounding.

In short: use the Bayesian network $\mathbb{B}(\tau,\mathcal{D})$ to determine whether or not $P(A \mid B, \mathbf{E}) > 1/2$, where $A$ assigns true to every grounding of $\mathbf{A}$, and $B$ assigns true to every grounding of $\mathbf{B}$. If this inequality is satisfied, the input pair is accepted; if not, the input pair is rejected.
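The acceptance rule can be sketched in Python by brute-force enumeration over a grounded network. Everything named here is a hypothetical illustration: the function `accepts`, the string names of the atoms, and the toy specification in which three root atoms have probability 1/2, the conditioned atom `A` holds when at least two roots are true, and the conditioning atom `B` holds when `r(b)` or `r(c)` is true. Treating a zero-probability condition as a rejection is also an assumption of this sketch.

```python
from itertools import product


def accepts(root_probs, definitions, conditioned, conditioning, evidence):
    """Decide whether P(conditioned true | conditioning true, evidence) > 1/2.

    root_probs: dict atom -> P(atom = 1) for grounded root predicates.
    definitions: dict atom -> function(world) -> bool for grounded
        non-root predicates, in an order where each definition only
        uses atoms that are already in the world.
    conditioned, conditioning: lists of atoms required to be true.
    evidence: dict atom -> bool (a partial interpretation).
    """
    atoms = list(root_probs)
    joint = 0.0      # accumulates P(A, B, E)
    condition = 0.0  # accumulates P(B, E)
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        weight = 1.0
        for atom in atoms:
            weight *= root_probs[atom] if world[atom] else 1.0 - root_probs[atom]
        for atom, definition in definitions.items():
            world[atom] = definition(world)
        if any(world[a] != v for a, v in evidence.items()):
            continue
        if not all(world[b] for b in conditioning):
            continue
        condition += weight
        if all(world[a] for a in conditioned):
            joint += weight
    return condition > 0 and joint / condition > 0.5


root_probs = {"r(a)": 0.5, "r(b)": 0.5, "r(c)": 0.5}
definitions = {
    "A": lambda w: w["r(a)"] + w["r(b)"] + w["r(c)"] >= 2,
    "B": lambda w: w["r(b)"] or w["r(c)"],
}
print(accepts(root_probs, definitions, ["A"], ["B"], {"r(a)": True}))   # True
print(accepts(root_probs, definitions, ["A"], ["B"], {"r(a)": False}))  # False
```

With `r(a)` observed true, `A` follows from `B`, so the conditional probability is 1; with `r(a)` observed false, `A` holds in one of the three equally likely worlds satisfying `B`, giving 1/3.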

We refer to $\mathbf{A}$ as the conditioned predicates, and to $\mathbf{B}$ as the conditioning predicates.

Here is the main result:

###### Theorem 2.

Let $\mathcal{S}$ be an isomorphism-closed set of pairs domain/evidence of some non-empty finite vocabulary, where all domains are finite. Then $\mathcal{S}$ is in $\mathsf{PP}$ if and only if $\mathcal{S}$ is the class of domain/evidence pairs that are accepted by a fixed first-order Bayesian network specification with fixed conditioned and conditioning predicates.

###### Proof.

First, if $\mathcal{S}$ is a class of domain/evidence pairs that are accepted by a fixed first-order Bayesian network specification, then $\mathcal{S}$ can be decided by a polynomial-time probabilistic Turing machine. To see that, note that we can build a nondeterministic Turing machine that guesses the truth value of all groundings that do not appear in the query (that is, groundings not fixed by $\mathbf{A}$, $\mathbf{B}$ or $\mathbf{E}$), and then verifies whether the resulting complete interpretation is a model of the first-order Bayesian network specification (model checking of a fixed first-order sentence can be carried out in polynomial time [2]).

To prove the other direction, we must adapt the proof of Fagin’s theorem as described by Grädel [1], along the same lines as the proof of Theorem 1 by Saluja et al. [3]. So, suppose that $\mathcal{L}$ is a language decided by some probabilistic Turing machine. Equivalently, there is a nondeterministic Turing machine that determines whether the majority of its computation paths accept an input, and accepts/rejects the input accordingly. By the mentioned proof of Fagin’s theorem, there is a first-order sentence $\phi'$ with vocabulary consisting of the vocabulary of the input plus additional auxiliary predicates, such that each interpretation of this joint vocabulary is a model of the sentence if it encodes a computation path of the Turing machine, as long as there is an available additional predicate that is guaranteed to be a linear order on the domain. Denote by $A$ the zero-arity predicate with associated definition $A \Leftrightarrow \phi'$. Suppose a linear order is indeed available; then by creating a first-order Bayesian network specification where all groundings of root predicates are associated with probability $1/2$, and where a non-root node is associated with the sentence in the proof of Fagin’s theorem, we have that the probability of the query is larger than $1/2$ if and only if the majority of computation paths accept. The challenge is to encode a linear order. To do so, introduce a new predicate $<$ and the first-order sentence $\phi''$ that forces $<$ to be a total order, and a zero-arity predicate $B$ that is associated with definition $B \Leftrightarrow \phi''$. Now an input domain/evidence pair $(\mathcal{D},\mathbf{E})$ is accepted by the majority of computation paths in the Turing machine if and only if we have $P(A \mid B, \mathbf{E}) > 1/2$. Note that for a domain with $n$ elements there are actually $n!$ linear orders that satisfy $\phi''$, but for each one of these linear orders we have the same assignments for all other predicates, hence the ratio between accepting computations and all computations is as desired. ∎

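We might picture this as follows. There is always a Turing machine $\mathrm{TM}$ and a corresponding triple $(\tau,\mathbf{A},\mathbf{B})$ such that for any pair $(\mathcal{D},\mathbf{E})$, we have

$$
(\mathcal{D},\mathbf{E}) \text{ as input to } \mathrm{TM} \text{ with output given by } P(\mathrm{TM} \text{ accepts } (\mathcal{D},\mathbf{E})) > 1/2,
$$

if and only if

$$
(\mathcal{D},\mathbf{E}) \text{ as “input” to } (\tau,\mathbf{A},\mathbf{B}) \text{ with “output” given by } P_{\tau,\mathcal{D}}(A \mid B, \mathbf{E}) > 1/2,
$$

where $P_{\tau,\mathcal{D}}$ denotes probability with respect to the Bayesian network obtained by grounding $\tau$ with respect to $\mathcal{D}$. (Of course, there is no need to use only zero-arity predicates $A$ and $B$, as Theorem 2 allows for sets of predicates.)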
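The counting remark at the end of the proof of Theorem 2, that a domain with n elements admits n! linear orders, can be checked on a small domain with the following Python sketch. It enumerates every binary relation over the domain and keeps those that satisfy the axioms of a strict total order. The names `is_strict_total_order`, `count_total_orders` and `less` are hypothetical, and the choice of a strict rather than reflexive order is an assumption of the sketch.

```python
from itertools import product
from math import factorial


def is_strict_total_order(domain, less):
    """Check irreflexivity, transitivity and totality of `less`."""
    irreflexive = all((x, x) not in less for x in domain)
    transitive = all(
        (x, z) in less
        for (x, y) in less for (w, z) in less if y == w
    )
    total = all(
        x == y or (x, y) in less or (y, x) in less
        for x in domain for y in domain
    )
    return irreflexive and transitive and total


def count_total_orders(domain):
    """Count the interpretations of a binary predicate that are
    strict total orders on `domain`."""
    pairs = list(product(domain, repeat=2))
    count = 0
    for bits in product([False, True], repeat=len(pairs)):
        less = {p for p, b in zip(pairs, bits) if b}
        if is_strict_total_order(domain, less):
            count += 1
    return count


print(count_total_orders(["a", "b", "c"]), factorial(3))  # 6 6
```

Conditioning on the order predicate being a total order therefore multiplies the accepting and the total number of interpretations by the same factor, which leaves their ratio unchanged.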

Note that the same result could be proved if every evidence piece was taken to be a complete interpretation for the input vocabulary. In that case we could directly speak of structures as inputs, and then the result would more closely mirror Fagin’s theorem. However, it is very appropriate, and entirely in line with practical use, to take the inputs to a Bayesian network as the groundings of a partially observed interpretation. Hence we have preferred to present our main result as stated in Theorem 2.

## References

• [1] Erich Grädel. Finite model theory and descriptive complexity. In Grädel, E., Kolaitis, P. G., Libkin, L., Marx, M., Spencer, J., Vardi, M. Y., Venema, Y., Weinstein, S., editors, Finite Model Theory and Its Applications, pp. 125–230, Springer, 2007.
• [2] Leonid Libkin. Elements of Finite Model Theory, Springer, 2012.
• [3] Sanjeev Saluja, K. V. Subrahmanyam and Madhukar N. Thakur. Descriptive complexity of functions. Journal of Computer and System Sciences, 50:493–505, 1995.