Constraint-Based Inference in Probabilistic Logic Programs

04/26/2018
by   Arun Nampally, et al.
Stony Brook University

Probabilistic Logic Programs (PLPs) generalize traditional logic programs and allow the encoding of models combining logical structure and uncertainty. In PLP, inference is performed by summarizing the possible worlds which entail the query in a suitable data structure, and using it to compute the answer probability. Systems such as ProbLog, PITA, etc., use propositional data structures like explanation graphs, BDDs, SDDs, etc., to represent the possible worlds. While this approach saves inference time due to substructure sharing, there are a number of problems where a more compact data structure is possible. We propose a data structure called Ordered Symbolic Derivation Diagram (OSDD) which captures the possible worlds by means of constraint formulas. We describe a program transformation technique to construct OSDDs via query evaluation, and give procedures to perform exact and approximate inference over OSDDs. Our approach has two key properties. Firstly, the exact inference procedure is a generalization of traditional inference, and results in speedup over the latter in certain settings. Secondly, the approximate technique is a generalization of likelihood weighting in Bayesian Networks, and allows us to perform sampling-based inference with lower rejection rate and variance. We evaluate the effectiveness of the proposed techniques through experiments on several problems. This paper is under consideration for acceptance in TPLP.


1 Introduction

A wide variety of models that combine logical and statistical knowledge can be expressed succinctly in the Probabilistic Logic Programming (PLP) paradigm. The expressive power of PLP goes beyond that of traditional probabilistic graphical models (e.g., Bayesian Networks (BNs) and Markov Networks (MNs)), as can be seen in the examples in Figs. 1 and 2. These examples are written in PRISM, a pioneering PLP language [Sato and Kameya, 1997]. While the example in Fig. 1 encodes the probability distribution of a palindrome having a specific number of occurrences of a given character, the example in Fig. 2 encodes the probability that at least two persons in a given set have the same birthday. Examples such as these, and other models such as reachability over graphs with probabilistic links, illustrate how logical clauses can be used to specify models that go beyond what is possible in traditional probabilistic graphical models.

% Generate a list of N random variables.
genlist(0, []).
genlist(N, L) :-
    N > 0,
    msw(flip, N, X),
    L = [X|L1],
    N1 is N-1,
    genlist(N1, L1).
% Evidence: list is a palindrome.
evidence(N) :-
    genlist(N, L), palindrome(L).
% Query: string has K 'a's
query(N, K) :-
    genlist(N, L), count_as(L, K).
% Check if a given list is a palindrome
palindrome(L) :- phrase(palindrome, L).
palindrome --> [].
palindrome --> [_X].
palindrome --> [X], palindrome, [X].
% Query condition:
count_as([], 0).
count_as([X|Xs], K) :-
    count_as(Xs, L),
    (X=a -> K is L+1; K=L).
% Domains:
values(flip, [a,b]).
% Distribution parameters:
set_sw(flip, [0.5, 0.5]).
Figure 1: Palindrome PLP

The Driving Problem.

The expressiveness of PLP comes at a cost. Since PLP is an extension of traditional logic programming, inference in PLP is undecidable in general. Inference is intractable even under strong finiteness assumptions. For instance, consider the PRISM program in Fig. 1. In that program, genlist/2 defines a list of the outcomes of N identically distributed random variables ranging over {a,b} (through msw/3 predicates). Predicate palindrome/1 tests, using a definite clause grammar definition, if a given list is a palindrome; and count_as/2 tests if a given list contains K (not necessarily consecutive) “a”s. Using these predicates, consider the inference of the conditional probability of query(N, K) given evidence(N): i.e., the probability that an N-element palindrome has K “a”s.

% Two from a population of
% size N share a birthday.
same_birthday(N) :-
    person(N, P1),
    msw(b, P1, D),
    person(N, P2),
    P1 < P2,
    msw(b, P2, D).
% Bind P, backtracking
% through 1..N
person(N, P) :-
    basics:for(P, 1, N).
% Distribution parameters:
:- set_sw(b(_),
    uniform(1,365)).
Figure 2: Birthday PLP

The conditional probability is well-defined according to PRISM’s distribution semantics [Sato and Kameya, 2001]. However, PRISM itself will be unable to correctly compute the conditional query’s probability, since the conditional query, as encoded above, will violate the PRISM system’s assumptions of independence among random variables used in an explanation. Moreover, while the probability of goal evidence(N) can be computed in linear time (using explanation graphs), the conditional query itself is intractable, since the computation is dominated by the binomial coefficient $\binom{N}{K}$. This is not surprising since probabilistic inference is intractable over even simple statistical models such as Bayesian networks. Consequently, exact inference techniques used in PLP systems such as PRISM, ProbLog [De Raedt et al., 2007] and PITA [Riguzzi and Swift, 2011] have exponential time complexity when used on such programs.
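To make the cost concrete, the following is a minimal Python sketch of the conditional query computed by brute-force enumeration of possible worlds. It is not how any of the cited systems work; the function name `conditional_by_enumeration` and the parameter `p_a` are illustrative assumptions, and the loop visits all 2^n strings, which is exactly the blow-up the text describes.

```python
from fractions import Fraction
from itertools import product


def conditional_by_enumeration(n, k, p_a=Fraction(1, 2)):
    """P(query(n, k) | evidence(n)) by summing over all 2**n strings."""
    p_evidence = Fraction(0)
    p_joint = Fraction(0)
    for s in product("ab", repeat=n):
        # Probability of this world: independent draws of msw(flip, i, X).
        prob = Fraction(1)
        for ch in s:
            prob *= p_a if ch == "a" else 1 - p_a
        if s == s[::-1]:               # evidence: the string is a palindrome
            p_evidence += prob
            if s.count("a") == k:      # query: exactly k occurrences of 'a'
                p_joint += prob
    return p_joint / p_evidence


print(conditional_by_enumeration(6, 2))  # 3/8
```

With n = 6 there are 8 palindromes, 3 of which contain exactly two “a”s, giving 3/8.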

Approximate inference based on rejection sampling also performs poorly, rejecting a vast number of generated samples, since the likelihood of a string being a palindrome decreases exponentially in N. Alternatives such as Metropolis-Hastings-based Markov Chain Monte Carlo (MCMC) techniques [e.g., Hastings, 1970] do not behave much better: the chains exhibit poor convergence (mixing), since most transitions lead to strings inconsistent with evidence. Gibbs-sampling-based MCMC [Geman and Geman, 1984] cannot be readily applied since the dependencies between random variables are hidden in the program and not explicit in the model.
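The exponential decay behind the rejection rate can be written out explicitly. The derivation below assumes the uniform parameters of Fig. 1; other parameter values change the constants but not the exponential trend. A palindrome of length N is fixed by its first ⌈N/2⌉ characters, so:

```latex
P(\texttt{evidence}(N)) \;=\; \frac{2^{\lceil N/2 \rceil}}{2^{N}} \;=\; 2^{-\lfloor N/2 \rfloor},
\qquad \text{e.g. } P(\texttt{evidence}(6)) = 2^{-3} = 0.125 .
```

A rejection sampler therefore keeps, on average, only one generated string in every 2^{⌊N/2⌋}.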

Our Approach.

In this paper, we use PRISM’s syntax and distribution semantics, but without the requirements imposed by the PRISM system, namely, that distinct explanations of an answer are pairwise mutually exclusive and all random variables within an explanation are independent. We, however, retain the assumption that distinct random variable instances are independent. Thus we consider PRISM programs with their intended model-theoretic semantics, rather than that computed by the PRISM system.

We propose a data structure called Ordered Symbolic Derivation Diagram (OSDD) which represents the set of possible explanations for a goal symbolically. The key characteristic of OSDDs is the use of constraints on random variables. This data structure is constructed through tabled evaluation on a transformed input program. For example, the OSDD for “same_birthday(3)” from the example in Fig. 2 is shown in Fig. 3(b). This data structure can be used for performing exact inference in polynomial time, as will be described later. In cases where exact inference is intractable, OSDDs can be used to perform sampling-based inference. For example, the OSDD for “evidence(6)” from the example in Fig. 1 is shown in Fig. 3(a) and can be used for performing likelihood weighted sampling [Fung and Chang, 1990; Shachter and Peot, 1990].

(a) OSDD for evidence(6), with nodes msw(flip, 6, X1), msw(flip, 5, X2), msw(flip, 4, X3), msw(flip, 3, X4), msw(flip, 2, X5), msw(flip, 1, X6). (b) OSDD for same_birthday(3), with nodes msw(b,1,X1), msw(b,2,X2), msw(b,3,X3) and edge measures 1, 364, 1, 1, 363.
Figure 3: OSDDs for introductory examples

The rest of the paper is organized as follows. In Section 2 we formally define OSDDs and the operations on OSDDs. Next we give the procedure for construction of OSDDs. This procedure relies on a program transformation which is explained in Section 3. Next we give the exact and approximate inference algorithms using OSDDs in Section 4. We present the experimental results in Section 5, related work in Section 6, and concluding remarks in Section 7.

2 Ordered Symbolic Derivation Diagrams

Notation:

We assume familiarity with common logic programming terminology such as variables, terms, substitutions, predicates and clauses. We use Prolog’s convention, using identifiers beginning with a lower case letter to denote atomic constants or function symbols, and those beginning with an upper case letter to denote variables. We often use dedicated symbols to denote ground terms and integer indices. We assume an arbitrary but fixed total ordering among variables and ground terms.

A type is a finite collection of ground terms. In this paper, types represent the space of outcomes of switches or random processes. For example, in the Palindrome example of Fig. 1, the set of values {a, b} is a type. A variable referring to the outcome of a switch is a typed variable; its type is deemed to be the same as the space of outcomes of that switch. The type of a ground term can be any of the sets it is an element of.

Definition 1 (Atomic Constraint).

An atomic constraint is of the form $X = T$ or $X \neq T$, where $X$ is a variable and $T$ is a variable or a ground term of the same type as $X$. When $T$ is a ground term, we can assert that $T$ belongs to the type of $X$.

A set of atomic constraints representing their conjunction is called a constraint formula. Constraint formulas are denoted by symbols and . Note that atomic constraints are closed under negation, while constraint formulas are not closed under negation.

Definition 2 (Constraint Graph).

The constraint graph for a constraint formula $C$ is a labeled undirected graph whose nodes are the variables and ground terms in $C$. Edges are labeled with “=” and “≠” such that

  • $T_1$ and $T_2$ are connected by an edge labeled “=” if, and only if, $T_1 = T_2$ is entailed by $C$.

  • $T_1$ and $T_2$ are connected by an edge labeled “≠” if, and only if, $T_1 \neq T_2$ is entailed by $C$, and at least one of $T_1, T_2$ is a variable.

Note that a constraint graph may have edges between two terms even when there is no explicit constraint on the two terms in $C$.

Definition 3 (Ordering).

Based on an (arbitrary but fixed) ordering among ground terms, we define an ordering on ground switch instance pairs and as follows:

  • If then for all ground terms and .

  • If then .

  • If then .

Canonical representation of constraint formulas.

A constraint formula is represented by its constraint graph, which in turn can be represented as a set of triples, each of which represents an edge in the constraint graph. Recall that variables and ground terms can be compared using the assumed total order. Assuming an order between the two edge labels “=” and “≠”, we can define a lexicographic order over each edge triple, and consequently order the elements of the edge set. The sequence of edge triples following the above order is a canonical representation of a constraint formula. Using this representation, we can define a total order, called the canonical order, among constraint formulas themselves, given by the lexicographic ordering defined over the edge sequences.
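The ordering of edge triples can be sketched in a few lines of Python. The sketch makes several assumptions that are not fixed by the text: terms are compared as strings in place of the arbitrary but fixed total order, “=” is placed before “≠” (written `"!="`), and only the edges passed in are normalized, so any entailed edges of the constraint graph must already be present in the input.

```python
LABEL_ORDER = {"=": 0, "!=": 1}   # assumed order between the two edge labels


def canonical(edges):
    """Return the sorted tuple of edge triples for a set of constraint-graph edges.

    Each edge is (term1, label, term2). Edges are undirected, so the two
    endpoints are first put in term order; terms are compared as strings,
    which stands in for the arbitrary but fixed total order on terms.
    """
    normalized = set()
    for t1, label, t2 in edges:
        lo, hi = sorted((t1, t2))
        normalized.add((lo, LABEL_ORDER[label], hi))
    return tuple(sorted(normalized))


c1 = canonical([("X2", "!=", "X1"), ("X3", "=", "X1")])
c2 = canonical([("X1", "=", "X3"), ("X1", "!=", "X2")])
print(c1 == c2)   # True: same formula, same canonical sequence
```

Because the canonical form is a tuple of tuples, Python's built-in tuple comparison gives a lexicographic order among formulas, playing the role of the canonical order.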

Given any constraint formula , its negation is given by a set of constraint formulas . The above defines negation of a formula to be a set of constraint formulas which are pairwise mutually exclusive and together represent the negation.

We refer to the set of solutions of a constraint formula and to the projection of that set onto a variable. A constraint formula is unsatisfiable if its set of solutions is empty, and satisfiable otherwise. Note that substitutions can also be viewed as constraint formulas.
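Since types are finite, solutions, projections and satisfiability can all be checked by enumeration. The Python sketch below assumes atomic constraints are equalities and disequalities encoded as `(op, X, T)` triples with `op` in `{"=", "!="}`; the encoding and the function names `solutions` and `projection` are illustrative, not taken from the paper.

```python
from itertools import product


def solutions(formula, types):
    """Enumerate ground substitutions (dicts) satisfying a constraint formula.

    formula: iterable of atomic constraints (op, X, T) with op in {"=", "!="},
             X a variable name and T a variable name or a ground term.
    types:   dict mapping each variable name to its finite type (a list).
    """
    names = sorted(types)
    result = []
    for values in product(*(types[v] for v in names)):
        env = dict(zip(names, values))
        ok = True
        for op, x, t in formula:
            left = env[x]
            right = env.get(t, t)      # t is a variable if typed, else a constant
            if (left == right) != (op == "="):
                ok = False
                break
        if ok:
            result.append(env)
    return result


def projection(formula, types, var):
    """Values that `var` takes in at least one solution."""
    return {env[var] for env in solutions(formula, types)}


types = {"X": ["a", "b"], "Y": ["a", "b"]}
formula = [("!=", "X", "Y"), ("=", "Y", "a")]
print(solutions(formula, types))          # [{'X': 'b', 'Y': 'a'}]
print(projection(formula, types, "X"))    # {'b'}
```

The example also shows an implicit constraint: the formula never mentions X = b, yet it is entailed.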

PRISM.

The following is a high-level overview of PRISM. PRISM programs have Prolog-like syntax (see Fig. 4). In a PRISM program the msw relation (“multi-valued switch”) has a special meaning: msw(X,I,V) says that V is the outcome of the I-th instance from a family X of random processes. For a given random process X, the outcomes V of its instances are i.i.d. The distribution parameters of the random variables are specified separately. The program in Fig. 4 encodes a Hidden Markov Model (HMM) in PRISM.

Figure 4: PRISM program for an HMM

The set of observations is encoded as facts of predicate obs, where obs(I,V) means that value V was observed at time I. In the figure, the clause defining hmm says that T is the N-th state if we traverse the HMM starting at an initial state S (itself the outcome of the random process init). In hmm_part(I, N, S, T), S is the I-th state, T is the N-th state. The first clause of hmm_part defines the conditions under which we go from the I-th state S to the I+1-th state NextS. Random processes trans(S) and emit(S) give the distributions of transitions and emissions, respectively, from state S.

The meaning of a PRISM program is given in terms of a distribution semantics [Sato and Kameya, 1997, 2001]. A PRISM program is treated as a non-probabilistic logic program over a set of probabilistic facts, the msw relation. An instance of the msw relation defines one choice of values of all random variables. A PRISM program is associated with a set of least models, one for each msw relation instance. A probability distribution is then defined over the set of models, based on the probability distribution of the msw relation instances. This distribution is the semantics of a PRISM program. Note that the distribution semantics is declarative. For a subclass of programs, PRISM has an efficient procedure for computing this semantics based on OLDT resolution [Tamaki and Sato, 1986].

Inference in PRISM proceeds as follows. When the goal selected at a step is of the form msw(X,I,Y), then Y is bound to a possible outcome of a random process X. Thus in PRISM, derivations are constructed by enumerating the possible outcomes of each random variable. The derivation step is associated with the probability of this outcome. If all random processes encountered in a derivation are independent, then the probability of the derivation is the product of probabilities of each step in the derivation. If a set of derivations are pairwise mutually exclusive, the probability of the set is the sum of probabilities of each derivation in the set. PRISM’s evaluation procedure is defined only when the independence and exclusiveness assumptions hold. Finally, the probability of an answer is the probability of the set of derivations of that answer.
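The sum-of-products computation described above can be sketched in Python as follows. The encoding of a derivation as a list of `(switch, instance, outcome)` triples and the function names are illustrative assumptions; the sketch is only correct when the independence and exclusiveness assumptions hold, which is the case for the two derivations of evidence(2) used as the example.

```python
from math import prod

# Switch parameters: outcome probabilities per switch family (from Fig. 1).
params = {"flip": {"a": 0.5, "b": 0.5}}


def derivation_probability(derivation):
    """Product of step probabilities; valid only if the steps are independent."""
    return prod(params[switch][outcome] for switch, _instance, outcome in derivation)


def answer_probability(derivations):
    """Sum over derivations; valid only if they are pairwise mutually exclusive."""
    return sum(derivation_probability(d) for d in derivations)


# The two derivations of evidence(2): both list elements agree.
evidence_2 = [
    [("flip", 2, "a"), ("flip", 1, "a")],
    [("flip", 2, "b"), ("flip", 1, "b")],
]
print(answer_probability(evidence_2))  # 0.5
```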

OSDD.

We use OSDDs to materialize derivations in a compact manner. OSDDs share a number of features with Binary Decision Diagrams (BDDs) [Bryant, 1992] and Multivalued Decision Diagrams (MDDs) [Kam et al., 1998]. BDDs are directed acyclic graphs representing propositional boolean formulas, with leaves labeled from {0, 1} and internal nodes labeled with propositions. In a BDD, each node has two outgoing edges labeled 1 and 0, representing a true and false valuation, respectively, for the node’s proposition. An MDD generalizes a BDD: internal nodes are labeled with finite-domain variables, and the outgoing edges are labeled with the possible valuations of that variable. In an OSDD, internal nodes represent switches and the outgoing edges are labeled with constraints representing the possible outcomes of that node’s switch.

Definition 4 (Ordered Symbolic Derivation Diagram).

An ordered symbolic derivation diagram over a set of typed variables is a tree, where leaves are labeled from the set and internal nodes are labeled by triples of the form , where and are switch and instance respectively and . We call the output variable of the node. The edges are labeled by constraint formulas over . We represent OSDDs by textual patterns where is the label of the root and each sub-OSDD is connected to the root by an edge labeled . OSDDs satisfy the following conditions:

  1. Ordering: For internal nodes and , if is the parent of , then . The edges are ordered by using the canonical ordering of the constraint formulas labeling them.

  2. Mutual Exclusion: The constraints labeling the outgoing edges from an internal node are pairwise mutually exclusive (i.e., for each and , ).

  3. Completeness: Let be a sub-OSDD and let be any substitution that satisfies all constraints on the path from root to the given sub-OSDD such that . Then there is a such that satisfies .

  4. Urgency: Let be the set of output variables in the path from the root to node (including ). Then for every constraint formula labeling an outgoing edge from , it holds that and for every ancestor of , .

  5. Explicit constraints: If constraint formula out of a node entails an implicit atomic constraint on variables in , then occurs explicitly in the path from root to . A consequence of this condition is that the conjunction of constraint formulas labeling edges in a path will be satisfiable.

A tree which satisfies all conditions of an OSDD except conditions 4 and 5 is called an improper OSDD.

Example 1 (OSDD properties).

We illustrate the definition by using the OSDD shown in Fig. 3(b). The OSDD is represented by the textual pattern where is the sub-tree rooted at the node labeled , which in turn can be represented by the textual pattern where is the sub-tree rooted at the node and so on. The internal nodes satisfy the total ordering based on the instance numbers. All outgoing edges from an internal node are pairwise mutually exclusive. For instance for the outgoing edges of the node labeled , is mutually exclusive w.r.t. and . Similarly, is mutually exclusive to . Consider the sub-OSDD rooted at any substitution that grounds satisfies the (empty) constraints on the path from the root to that sub-tree. Further any such substitution will satisfy exactly one of the edge constraints or . It is obvious from the example that urgency is satisfied. Finally consider the constraint formula . This entails the implicit constraint . However, this constraint is explicitly found in the path from the root to that sub-tree. Therefore the OSDD in Fig. 3(b) is a proper OSDD.

OSDDs can be viewed as representing a set of explanations or derivations where a node of the form binds . This observation leads to the definition of bound and free variables:

Definition 5 (Bound and Free variables).

Given an OSDD the bound variables of , denoted , are the output variables in . The free variables of , denoted , are those variables which are not bound.

Each OSDD corresponds to an MDD which can be constructed as follows.

Definition 6 (Grounding).

Given an OSDD , the MDD corresponding to it is denoted and is recursively defined as where and such that is satisfiable.

Example 2 (Grounding).

As an example consider a smaller version of the OSDD shown in Fig. 3(a) as shown in Fig. 5(a). In the first step of the grounding, all values satisfy the (empty) constraint of the outgoing edge. Therefore we get two subtrees which are identical except the substitution that is applied to the variable . In the next step (which we omit here), the subtrees get ground. Consider the left subtree, the value “a” satisfies the left branch, while the value “b” satisfies the right branch. Therefore in effect those edges get relabelled by “a” and “b” and similarly for the right subtree.

(a) OSDD, with nodes msw(flip, 1, X1) and msw(flip, 2, X2). (b) Grounding, with root msw(flip, 1, X1) and two msw(flip, 2, X2) subtrees.
Figure 5: Example of grounding OSDD
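The grounding step of Example 2 can be sketched in Python under simplifying assumptions. The figure is not reproduced here, so the edge labels are assumed to be X1 = X2 (leading to leaf 1) and X1 ≠ X2 (leading to leaf 0). The tuple encoding of nodes and the helper names are illustrative, and the sketch only handles OSDDs for ground queries, where every variable in an edge constraint is bound by the time the edge is reached.

```python
def holds(constraint, env):
    """True if every atomic constraint (op, l, r) holds under env."""
    for op, l, r in constraint:
        if (env.get(l, l) == env.get(r, r)) != (op == "="):
            return False
    return True


def ground(osdd, env=None):
    """Expand an OSDD into an MDD whose edges are labeled by concrete values."""
    env = env or {}
    if isinstance(osdd, int):          # leaf: 0 or 1
        return osdd
    var, values, edges = osdd
    mdd = {}
    for v in values:
        env_v = {**env, var: v}
        # Mutual exclusion and completeness: exactly one edge is satisfied.
        child = next(c for constraint, c in edges if holds(constraint, env_v))
        mdd[v] = ground(child, env_v)
    return (var, mdd)


# Assumed shape of the OSDD in Fig. 5(a).
x2 = ("X2", ["a", "b"], [([("=", "X1", "X2")], 1), ([("!=", "X1", "X2")], 0)])
x1 = ("X1", ["a", "b"], [([], x2)])
print(ground(x1))
```

The result has two X2 subtrees that differ only in which value leads to leaf 1, matching the description of the left and right subtrees in the example.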

Canonical OSDD representation.

Given a total order among variables and terms, the order of nodes in an OSDD is fixed. We can further order the outgoing edges uniquely based on the total order of constraints labeling them. This yields a canonical representation of OSDDs. In the rest of the paper we assume that OSDDs are in canonical form.

Definition 7 (Equivalence).

All OSDD leaves which have the same node label are equivalent. Two OSDDs and are equivalent if .

We now define common operations over OSDDs which can create OSDDs from primitives.

Definition 8 (Conjunction/Disjunction).

Given OSDDs and , let stand for either or operation. Then is defined as follows.

  • If , then

  • If then

  • If , first we apply the substitution to the second OSDD. Then .

Example 3 (Conjunction/Disjunction).

Consider the input OSDDs in Fig. 6(a). Their disjunction is shown in Fig. 6(b). Disjunction of this OSDD with a third OSDD involving switch instance pairs results in the OSDD shown in Fig. 3(b).

(a) Input OSDDs: one over msw(b,1,X1) and msw(b,2,X2), another over msw(b,1,X1) and msw(b,3,X3). (b) Disjunction, over msw(b,1,X1), msw(b,2,X2) and msw(b,3,X3).
Figure 6: Disjunction of OSDDs

Although OSDDs have been defined as trees, we can turn them into DAGs by combining equivalent subtrees. It is easy to generalize the above operations to work directly over DAGs. The above operations may produce improper OSDDs, which can be readily transformed into proper OSDDs as described next.

Transformation from improper to proper OSDDs.

(a) Improper OSDD over nodes X, Y and Z. (b) Proper OSDD over nodes X, Y and two copies of Z.
Figure 7: Transformation example

When performing and/or operations over proper OSDDs, the resulting OSDD may be improper. For instance, consider one OSDD with variables , and another with variables . Constraints between and , and those between and may imply constraints between and that may not be explicitly present in the resulting OSDD, thereby violating the condition of explicit constraints (Def.  4).

Fig. 7(a) shows an improper OSDD that violates the explicit constraints condition. For example, the edge , leading to the sub-OSDD , has edge constraints which imply while the edges leading to and imply .

An improper OSDD can be converted into a proper one by rewriting it using a sequence of steps as follows. First, we identify an implicit constraint and where it may be explicitly added without violating the urgency property. In the example, we identify as implicit, and attempt to introduce it on the outgoing edges of node labeled . This introduction splits the edge from to into two: one labeled , and another labeled , the negation of the identified constraint. The original child rooted at is now replicated due to this split. We process each child, eliminating edges and corresponding sub-OSDDs whenever the constraints are unsatisfiable. In the example in Fig. 7(b) we see that the edges and have been removed from since their constraint formula are inconsistent with . We repeat this procedure until no implicit constraints exist in .

Constraint application.

An OSDD may be specialized for a context which constrains its variables, as follows.

Definition 9 (Constraint application).

Given an OSDD and an atomic constraint , the application of to results in a new OSDD as follows:

  • Application of to yields respectively.

  • If where is the root of , then

  • Else where each results from the application of to .

Properties.

OSDDs and the operations defined above have a number of properties necessary for their use in representing explanations for query evaluation in PLPs.

Proposition 1 (Closure).

OSDDs are closed under conjunction and disjunction operations.

The following shows that conjunction and disjunction operations over OSDDs lift the meaning of these operations over ground MDDs.

Proposition 2 (Compatibility with Grounding).

Let and be two OSDDs, then

3 Construction

Given a definite PLP program and a ground query, we construct an OSDD as the first step of the inference process. The construction is done via constraint-based tabled evaluation over a transformed program. At a high level, each n-ary predicate in the original PLP program is transformed into an (n+2)-ary predicate, with one of the new arguments representing an OSDD at the time of the call to the predicate, and the other representing OSDDs for answers to the call.

For simplicity, although the transformed program represents OSDDs as Prolog terms, we reuse the notation from Section 2 to describe the transformation.

Transformation.

We use to represent tuples of arguments. Clauses in a definite program may be of one of two forms:

  • Fact : is transformed to another fact of the form , denoting that a fact may bind its arguments but does not modify a given OSDD.

  • Clause : without loss of generality, we assume that the body is binary: i.e., clauses are of the form . Such clauses are transformed into

For each user-defined predicate in the input program, we add the following directive for the transformed predicate

:- table lattice(or/3)).

which invokes answer subsumption [Swift and Warren, 2010] to group all answers by their first arguments, and combine the -nd argument in each group using or/3, which implements the disjunction operation over OSDDs.

Constraint-Based Evaluation.

An important aspect of OSDD construction is constraint processing. Our transformation assumes that constraints are associated with variables using their attributes [Holzbaur, 1992]. We assume the existence of the following two predicates:

  • inspect(X, C), which, given a variable X, returns the constraint formula associated with X; and

  • store(C), which, given a constraint formula C, annotates all variables in C with their respective atomic constraints.

For tabled evaluation, we assume that each table has a local constraint store [Sarna-Starosta and Ramakrishnan, 2007]. Such a constraint store can be implemented using the above two predicates.

OSDD Builtins.

The construction of the transformed program is completed by defining predicates to handle the two constructs that set PLPs apart:

  • msw(S, K, X, , ): Note that msw’s in the body of a clause would have been transformed to a 5-ary predicate. This predicate is defined as:

    where and/3 implements conjunction operation over OSDDs.

  • Constraint handling: constraints in the input program will be processed using:

    where applyConstraint is an implementation of Defn. 9

To compute the OSDD for a ground atom in the original program, we evaluate to obtain the required OSDD as .

4 Inference

Exact Inference.

Given an OSDD , let be the Cartesian product of the types of each . We define a function which maps an OSDD and a substitution to . Its definition for leaves is as follows

Next for an OSDD and an arbitrary substitution , define to be

where constraint formulas and are said to be compatible if their conjunction is satisfiable (denoted ). For internal nodes we define as

where is the probability that -th instance of has outcome .

Given a ground query whose OSDD is we return the answer probability as .

Proposition 3 (Complexity).

The time complexity of probability computation of OSDD is where is the maximum cardinality of all types, is the number of nodes in , and is the size of the largest set of free variables among all internal nodes of .
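A naive version of the probability computation can be sketched in Python for the special case of ground queries, where every variable in an edge constraint is bound when the edge is reached. This is a top-down enumeration that shares no subcomputations, so it should be read as an illustration of the semantics rather than as the procedure whose complexity is stated above. The tuple encoding of nodes, the biased parameters and the function names are assumptions made for the example.

```python
from fractions import Fraction


def holds(constraint, env):
    """True if every atomic constraint (op, l, r) holds under env."""
    return all((env.get(l, l) == env.get(r, r)) == (op == "=")
               for op, l, r in constraint)


def probability(osdd, dist, env=None):
    """Sum, over outcomes of each switch, of outcome probability times child probability."""
    env = env or {}
    if isinstance(osdd, int):          # leaf: 0 or 1
        return Fraction(osdd)
    var, switch, edges = osdd
    total = Fraction(0)
    for value, p in dist[switch].items():
        env_v = {**env, var: value}
        for constraint, child in edges:
            if holds(constraint, env_v):   # edges are mutually exclusive
                total += p * probability(child, dist, env_v)
                break
    return total


# Two flips that must agree, with a biased (hypothetical) distribution.
x2 = ("X2", "flip", [([("=", "X1", "X2")], 1), ([("!=", "X1", "X2")], 0)])
x1 = ("X1", "flip", [([], x2)])
dist = {"flip": {"a": Fraction(7, 10), "b": Fraction(3, 10)}}
print(probability(x1, dist))  # 29/50
```

The result is 0.7 × 0.7 + 0.3 × 0.3 = 0.58, the probability that two independent flips agree.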

Under certain conditions we can avoid the exponential complexity of the naive probabilistic inference algorithm. By exploiting the regular structure of the solution space to a constraint formula we avoid the explicit summation . In this case we say that is measurable. We formally define measurability and a necessary and sufficient condition for measurability.

Definition 10 (Measurability).

A satisfiable constraint formula is said to be measurable w.r.t if for all ground substitutions on which satisfy , is equal to a unique value called the measure of in .

Definition 11 (Saturation).

A constraint formula is said to be saturated if its constraint graph satisfies the following condition: For every , let be the set of nodes connected to with a edge. Then there exists an edge ( or ) between each pair of nodes in (except when both nodes in the pair represent constants).

Proposition 4 (Condition for Measurability).

A satisfiable constraint formula is measurable w.r.t. all of its variables if and only if it is saturated.
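Measurability can be checked by brute force on small types, which gives a concrete reading of Definitions 10 and 11. The Python sketch below assumes atomic constraints encoded as `(op, l, r)` triples and a hypothetical three-value type; the function name `measure` is illustrative. It groups solutions by the values of the other variables and checks that each group leaves the same number of choices for the chosen variable.

```python
from itertools import product


def satisfies(formula, env):
    """True if every atomic constraint (op, l, r) holds under env."""
    return all((env.get(l, l) == env.get(r, r)) == (op == "=")
               for op, l, r in formula)


def measure(formula, types, var):
    """Return the measure of `formula` w.r.t. `var`, or None if not measurable."""
    others = sorted(v for v in types if v != var)
    counts = set()
    for values in product(*(types[v] for v in others)):
        env = dict(zip(others, values))
        n = sum(satisfies(formula, {**env, var: x}) for x in types[var])
        if n > 0:                      # keep only substitutions with a solution
            counts.add(n)
    return counts.pop() if len(counts) == 1 else None


days = [1, 2, 3]
types = {"X": days, "Y": days, "Z": days}
unsaturated = [("!=", "X", "Y"), ("!=", "X", "Z")]
saturated = unsaturated + [("!=", "Y", "Z")]
print(measure(unsaturated, types, "X"))  # None
print(measure(saturated, types, "X"))    # 1
```

In the first formula Y and Z are both connected to X by disequality edges but not to each other, so the number of values left for X depends on whether Y and Z coincide. Adding the edge between Y and Z saturates the formula and fixes the count.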

Definition 12 (Measurability of OSDDs).

An OSDD is said to be measurable, if for each internal node and outgoing edge labeled , the constraint formula obtained by taking the conjunction of with the constraint formula on the path from root to is measurable w.r.t the output variable in node .

Example 4 (Measurability).

As an example, consider the OSDD for the birthday problem. It satisfies the measurability condition. The measures for the constraint formulas labeling the edges are shown in Fig. 3(b).

When an OSDD is measurable and all distributions are uniform, the probability computation gets specialized as follows:

where is an arbitrary value in and .

Proposition 5 (Complexity for Measurable OSDDs).

If an OSDD is measurable and all switches have uniform distribution, the complexity of computing probability of is where is the maximum cardinality of all types and is the number of nodes in .
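For a measurable OSDD over uniform switches, each edge contributes its measure divided by the type size, so no enumeration of values is needed. The Python sketch below applies this to the birthday example. The tree shape is an assumed reading of Fig. 3(b) based on the edge measures 1, 364, 1, 1, 363, with the root edge taken to be unconstrained; the list-of-edges encoding is illustrative.

```python
from fractions import Fraction


def probability(osdd, size):
    """Probability of a measurable OSDD whose switches are uniform over `size` values.

    A leaf is 0 or 1; an internal node is a list of (measure, child) edges.
    """
    if isinstance(osdd, int):
        return Fraction(osdd)
    return sum((Fraction(m, size) * probability(child, size) for m, child in osdd),
               Fraction(0))


# Assumed shape of the OSDD for same_birthday(3).
x3 = [(1, 1), (1, 1), (363, 0)]   # X3 = X1, X3 = X2, or X3 differs from both
x2 = [(1, 1), (364, x3)]          # X2 = X1, or X2 differs from X1
x1 = [(365, x2)]                  # X1 is unconstrained
p = probability(x1, 365)
print(p)  # 1093/133225
```

The value agrees with the textbook computation 1 − (364 × 363) / 365², and each node is visited once.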

Likelihood Weighted Sampling.

Likelihood weighting is a popular sampling-based inference technique in BNs. The technique can be described as follows: sample all variables except evidence variables in topological order. The values of evidence variables are fixed, and each one of them contributes a likelihood weight, which is the probability of its value given the values of its parents. The likelihood weight of the entire sample is the product of the likelihood weights of all evidence variables. This technique has been shown to produce sample estimates with lower variance than independent sampling [Fung and Chang, 1990; Shachter and Peot, 1990].

Likelihood weighted sampling can be generalized to PLPs as follows: Given an OSDD for evidence, generate a sample as follows:

  • Construct where .

  • If sample from the distribution of leaving likelihood weight of the sample unchanged.

  • Otherwise, sample uniformly from and multiply the likelihood weight of the sample by . Let for some . Then continue construction of the sample by recursively sampling from the OSDD

For PLPs encoding BNs and MNs, the simple nature of evidence allows us to generate only consistent samples. However, for general queries in PLP, it is possible to reach a node whose edge constraints to non- children are unsatisfiable. In that case, we reject the current sample and restart. Thus, we generalize traditional LW sampling.

To compute conditional probabilities, samples generated for evidence are extended by evaluating queries. The conditional probability of the query given evidence is computed as the sum of the likelihood weights of the samples satisfying the query and evidence divided by the sum of the likelihood weights of the samples which satisfy evidence. To compute unconditional probability of a query, we simply compute the average likelihood weight of the samples satisfying the query.

Example 5 (Likelihood weighting).

Consider the OSDD shown in Fig. 3(a). To generate a likelihood weighted sample we start from the root and sample the random variables. The first three nodes do not have any constraints on their outgoing edges, therefore we can sample those random variables from their distributions. Assume that we get the sequence “aba”. The likelihood weight of the sample remains 1 at this stage. When sampling the random variables at the next three nodes, each variable gets restricted to a single value. Since the distributions are all uniform, the likelihood weight of the entire sample becomes $0.5^3 = 0.125$. All samples would have the same likelihood weight, and therefore the probability of “evidence(6)” is $0.125$.
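The sampling walk in this example can be sketched in Python. Rather than traversing an explicit OSDD, the sketch hard-codes the palindrome structure: positions in the first half are free, and each later position is forced to mirror an earlier one and multiplies the weight by the probability of the forced value. The function names, the seed handling and this simplification are assumptions for illustration, not the prototype described in Section 5.

```python
import random

FLIP = {"a": 0.5, "b": 0.5}   # parameters of switch flip (Fig. 1)


def sample_evidence(n, rng):
    """Draw one likelihood-weighted sample of an n-element palindrome.

    Returns (string, weight). Position i is free in the first half and forced
    to mirror position n-1-i in the second half.
    """
    chars, weight = [], 1.0
    for i in range(n):
        mirror = n - 1 - i
        if mirror >= i:
            # No constraint yet: sample from the switch distribution.
            value = rng.choices(list(FLIP), weights=list(FLIP.values()))[0]
        else:
            # Constrained to one value: take it and weight by its probability.
            value = chars[mirror]
            weight *= FLIP[value]
        chars.append(value)
    return "".join(chars), weight


def conditional(n, k, samples, seed=0):
    """Estimate P(query(n, k) | evidence(n)) as a ratio of weight sums."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(samples):
        s, w = sample_evidence(n, rng)
        den += w
        if s.count("a") == k:
            num += w
    return num / den


print(sample_evidence(6, random.Random(0)))
print(conditional(6, 2, 20000))
```

Every sample is a palindrome with weight 0.125, so no sample is rejected, and the conditional estimate should land near the exact value 3/8.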

5 Experimental Evaluation

We present the results of experiments using a prototype implementation of a likelihood weighted sampler based on OSDDs. The prototype uses XSB Prolog to construct OSDDs, and a few modules written in C for maintaining the sampler’s state and dealing with random variable distributions. We used the following examples in the experiments.

  • Palindrome, which is shown in Fig. 1, with evidence limited to strings of length , and query checking a”s. While the computation of the evidence probability is easy, the computation of the probability of the conjunction of query and evidence is not. This is because the query searches over all possible combinations of ‘K’ positions. Therefore, for large problem sizes, likelihood weighting has to be used to get approximate answers.

  • Birthday, shown in Fig. 2, with a population size of 6, i.e., the query same_birthday(6). While this query can be evaluated by exact inference due to measurability, we use it to test the performance of the likelihood weighted sampler.

Figure 8: Experimental results. Panels: (a) Palindrome, (b) Birthday.
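To give a feel for why the Birthday query admits exact computation by counting, here is a small Python sketch. It assumes that same_birthday(n) asks whether at least two of n people share a birthday and that birthdays are uniform over 365 days; both are assumptions, since Fig. 2 is not reproduced here. The sketch is the textbook counting argument, not the OSDD-based procedure, and the function names are hypothetical.

```python
from itertools import product


def prob_shared_birthday(population, days=365):
    """P(at least two of `population` people share a birthday), uniform days."""
    if population > days:
        return 1.0
    all_distinct = 1.0
    for already_fixed in range(population):
        # The next birthday must differ from `already_fixed` earlier ones,
        # so it has (days - already_fixed) admissible values out of `days`.
        all_distinct *= (days - already_fixed) / days
    return 1.0 - all_distinct


def prob_shared_birthday_bruteforce(population, days):
    """Same quantity by enumerating every assignment (small inputs only)."""
    hits = sum(1 for birthdays in product(range(days), repeat=population)
               if len(set(birthdays)) < population)
    return hits / days ** population
```

The closed form needs one multiplication per person, whereas the enumeration visits every combination of birthdays. That gap is what symbolic treatment of values is meant to avoid.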

The results of the experiments are shown in Fig. 8. Each subfigure plots the estimated probability and the variance of the estimate (on a log scale) for two samplers: the LW method described in this paper, and a simple independent sampler (with rejection sampling for conditional queries). Note that the LW sampler’s results show significantly lower variance in both examples.

For the Palindrome example, the LW sampler quickly converges to the actual probability, while the independent sampler fails to converge even after a million samples. The unusual pattern of variance for the independent sampler in the initial iterations is due to its inability to generate consistent samples, which leaves it without an estimate for the answer probability. In the Birthday example, all consistent samples have the same likelihood weight, and that weight is quite low. For this reason, likelihood weighting does not perform much better than independent sampling. Interestingly, independent sampling using the OSDD structure was significantly faster (up to ) than using the program directly. This is because the program’s non-deterministic evaluation has been replaced by a deterministic traversal through the OSDD.
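For contrast, an independent sampler with rejection for the palindrome evidence can be sketched as below. The two-symbol uniform alphabet and all names are assumptions made for illustration; this is not the code used in the experiments.

```python
import random

ALPHABET = ("a", "b")  # assumption: two symbols, uniformly distributed


def rejection_sample_palindromes(n, attempts, seed=0):
    """Sample `attempts` strings independently and keep only the palindromes."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(attempts):
        candidate = "".join(rng.choice(ALPHABET) for _ in range(n))
        if candidate == candidate[::-1]:  # evidence check; otherwise reject
            accepted.append(candidate)
    return accepted


def acceptance_probability(n, alphabet_size=len(ALPHABET)):
    """Exact chance that an independently sampled string is a palindrome."""
    free_positions = (n + 1) // 2  # the first half (and the middle) determine the rest
    return alphabet_size ** free_positions / alphabet_size ** n
```

Under these assumptions the acceptance probability halves with every additional constrained position, so the share of wasted attempts grows quickly with the string length.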

Comparison with PITA and ProbLog samplers

We compared the performance of the LW sampler with the sampling-based inference of ProbLog and PITA (footnote 1: we used a Core i5 machine with 16 GB memory running macOS 10.13.4; footnote 2: ProbLog version 2.1.0.19, and PITA as available with SWI-Prolog 7.6.4). ProbLog provides independent sampling along with an option to propagate evidence. We did not find these to be effective, and ProbLog failed to generate consistent samples in reasonable time (5 minutes). PITA sampling [Riguzzi, 2011], on the other hand, provided better performance (see Table 1). Due to the lack of constraint processing, the time required per sample is higher for PITA, although the convergence behavior is similar. It should be noted that PITA counts only consistent samples, and there can be many attempts at generating a consistent sample. In contrast, our constraint processing techniques allow us to generate a consistent sample at every attempt (for the specific examples considered). The plots showing the estimated answer probability and the variance of the estimates for the two examples for PITA are shown in Fig. 9.

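| Problem    | OSDD gen. | LW      | ProbLog | ProbLog-pe | Mcint.-rej. | Mcint.-MH |
|------------|-----------|---------|---------|------------|-------------|-----------|
| palindrome | 0.019     | 0.00017 | 188     | na         | 0.41        | 7.7       |
| birthday   | 0.025     | 1.7e-5  | 19      | 26         | 0.004       | 0.003     |

Table 1: Time for OSDD computation and per consistent sample (seconds)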
Figure 9: Experimental results for PITA samplers. Panels: PITA palindrome, PITA birthday.

PITA and ProbLog exact inference

We evaluated the exact inference procedures of PITA and ProbLog on the same examples. We used a timeout of 15 minutes for both systems. ProbLog’s inference does not scale beyond small problem sizes for these two examples. PITA could successfully compute the conditional probabilities for Palindrome with . PITA’s inference completed for the Birthday example with population size , but ran out of memory for larger population sizes. Table 2 shows the time required to construct the OSDDs for the Birthday example and the Palindrome example. The leftmost column gives the population size or the length of the string. While the Birthday problem has a single OSDD, the Palindrome example requires two OSDDs: one for the evidence and one for the conjunction of query and evidence. It turns out that all of them satisfy the measurability property. However, we note that the OSDD for the conjunction of query and evidence in the Palindrome example is intractable for large problem sizes. The evidence OSDD for Palindrome, on the other hand, is very simple and scales well. The columns titled “M. prob” show the time required to compute the probability from the OSDDs by exploiting measurability.

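| Size | Birthday: osdd | Birthday: M. prob. | Palindrome: evid. osdd | Palindrome: qe osdd | Palindrome: M. prob |
|------|----------------|--------------------|------------------------|---------------------|---------------------|
| 6    | 0.025          | 0.005              | 0.01                   | 0.043               | 0.003               |
| 8    | 0.079          | 0.008              | 0.01                   | 0.208               | 0.006               |
| 10   | 0.215          | 0.014              | 0.0109                 | 0.7189              | 0.008               |
| 12   | 0.795          | 0.024              | 0.0129                 | 1.877               | 0.008               |
| 14   | 5.49           | 0.035              | 0.014                  | 4.94                | 0.033               |
| 16   | 52.83          | 0.056              | 0.017                  | 16.03               | 0.076               |

Table 2: Time for OSDD generation and probability computation (seconds)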

6 Related Work

Symbolic inference based on OSDDs was first proposed in [Nampally and Ramakrishnan, 2015]. The present work expands on it in two significant ways. Firstly, the construction of OSDDs is driven by tabled evaluation of transformed programs instead of by abstract resolution. Secondly, we give an exact inference procedure for probability computation using OSDDs which generalizes the exact inference procedure with ground explanation graphs.

Probabilistic Constraint Logic Programming [Michels et al., 2013] extends PLP with constraint logic programming (CLP). It allows the specification of models with imprecise probabilities. Whereas a world in PLP denotes a specific assignment of values to random variables, a world in PCLP can define constraints on random variables, rather than specific values. Lower and upper bounds on the probability of a query are obtained by summing the probabilities of worlds where the query follows and of worlds where the query is possibly true, respectively. While the way in which “proof constraints” of a PCLP query are obtained is similar to the way in which symbolic derivations are obtained (i.e., through constraint-based evaluation), the inference techniques employed are completely different, with PCLP employing satisfiability modulo theories (SMT) solvers.

cProbLog extends ProbLog with first-order constraints [Fierens et al., 2012]. This gives the ability to express complex evidence in a succinct form. The semantics and inference are based on ProbLog. In contrast, our work makes the underlying constraints in a query explicit and uses the OSDDs to drive inference.

CLP(BN) [Costa et al., 2002] extends logic programming with constraints which encode conditional probability tables. A CLP(BN) program defines a joint distribution on the ground Skolem terms. Queries are answered by performing inference over a corresponding BN.

There has been significant interest in the area of lifted inference, as exemplified by the work of [Poole, 2003; Braz et al., 2005; Milch et al., 2008]. The main idea of lifted inference is to treat indistinguishable instances of random variables as one unit and perform inference at the population level. Lifted inference in the context of PLP has been performed by converting the problem to parfactor representation [Bellodi et al., 2014] or weighted first-order model counting [Van den Broeck et al., 2011]. Lifted explanation graphs [Nampally and Ramakrishnan, 2016] are a generalization of ground explanation graphs, which treat instances of random processes in a symbolic fashion. In contrast, exact inference using OSDDs treats values of random variables symbolically, thereby computing probabilities without grounding the random variables. Consequently, the method in this paper can be used when instance-based lifting is inapplicable. Its relationship to more recent liftable classes [Kazemi et al., 2016] remains to be explored.

The use of sampling methods for inference in PLPs has been widespread. The evidence has generally been handled by heuristics to reduce the number of rejected samples [Cussens, 2000; Moldovan et al., 2013]. More recently, [Nitti et al., 2016] present an algorithm that generalizes the applicability of LW samples by recognizing when valuation of a random variable will lead to query failure. Our technique propagates constraints imposed by evidence. With a rich constraint language and a propagation algorithm of sufficient power, the sampler can generate consistent samples without any rejections.

Adaptive sequential rejection sampling [Mansinghka et al., 2009] is an algorithm that adapts its proposal distributions to avoid generating samples which are likely to be rejected. However, it requires a decomposition of the target distribution, which may not be available in PLPs. Further, in our work the distribution from which samples are generated is not adapted. It is an interesting direction of research to combine adaptivity with the proposed sampling algorithm.

7 Conclusion

In this work we introduced OSDDs as an alternative data structure for PLP. OSDDs enable efficient inference over programs whose random variables range over large finite domains. We also showed the effectiveness of using OSDDs for likelihood weighted sampling.

OSDDs may provide asymptotic improvements for inference over many classes of first-order probabilistic graphical models. An example of such models is the Logical hidden Markov model (LOHMM) which lifts the representational structure of hidden Markov models (HMMs) to a first-order domain [Kersting et al., 2006]. LOHMMs have proved to be useful for applications in computational biology and sequential behavior modeling. LOHMMs encode first-order relations using abstract transitions of the form where . Any of may be partially ground, and there may be logical variables which are shared between any of the atoms in an abstract transition. Abstract explanations that are obtained by inference over such models that avoids grounding of variables whenever possible can be naturally captured by OSDDs.

References

  • Bellodi et al. [2014] Bellodi, E., Lamma, E., Riguzzi, F., Costa, V. S., and Zese, R. 2014. Lifted variable elimination for probabilistic logic programming. TPLP 14, 4-5, 681–695.
  • Braz et al. [2005] Braz, R. D. S., Amir, E., and Roth, D. 2005. Lifted first-order probabilistic inference. In 19th Intl. Joint Conf. on Artificial Intelligence. 1319–1325.
  • Bryant [1992] Bryant, R. E. 1992. Symbolic boolean manipulation with ordered binary-decision diagrams. ACM Computing Surveys 24, 3, 293–318.
  • Costa et al. [2002] Costa, V. S., Page, D., Qazi, M., and Cussens, J. 2002. CLP(BN): Constraint logic programming for probabilistic knowledge. In 19th Conf. on Uncertainty in Artificial Intelligence. 517–524.
  • Cussens [2000] Cussens, J. 2000. Stochastic logic programs: Sampling, inference and applications. In 16th Conf. on Uncertainty in Artificial Intelligence. Morgan Kaufmann, 115–122.
  • De Raedt et al. [2007] De Raedt, L., Kimmig, A., and Toivonen, H. 2007. ProbLog: A probabilistic Prolog and its application in link discovery. In 20th Intl. Joint Conf. on Artificial Intelligence. 2462–2467.
  • Fierens et al. [2012] Fierens, D., Van den Broeck, G., Bruynooghe, M., and De Raedt, L. 2012. Constraints for probabilistic logic programming. In NIPS Probabilistic Programming Workshop. 1–4.
  • Fung and Chang [1990] Fung, R. M. and Chang, K.-C. 1990. Weighing and integrating evidence for stochastic simulation in bayesian networks. In 5th Annual Conf. on Uncertainty in Artificial Intelligence. 209–220.
  • Geman and Geman [1984] Geman, S. and Geman, D. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 6, 721–741.
  • Hastings [1970] Hastings, W. 1970. Monte carlo sampling methods using markov chains and their applications. Biometrika 57, 1, 97–109.
  • Holzbaur [1992] Holzbaur, C. 1992. Metastructures versus attributed variables in the context of extensible unification. In 4th Intl. Symp. on Programming Language Implementation and Logic Programming. 260–268.
  • Kam et al. [1998] Kam, T., Villa, T., Brayton, R., and Sangiovanni-Vincentelli, A. 1998. Multivalued decision diagrams: Theory and applications. Intl. J. Multiple-Valued Logic 4, 9–62.
  • Kazemi et al. [2016] Kazemi, S. M., Kimmig, A., Broeck, G. V. d., and Poole, D. 2016. New liftable classes for first-order probabilistic inference. In NIPS. 3125–3133.
  • Kersting et al. [2006] Kersting, K., Raedt, L. D., and Raiko, T. 2006. Logical hidden Markov models. Journal of Artificial Intelligence Research 25, 2006.
  • Mansinghka et al. [2009] Mansinghka, V. K., Roy, D. M., Jonas, E., and Tenenbaum, J. B. 2009. Exact and approximate sampling by systematic stochastic search. In 12th Intl. Conf. on A. I. and Statistics. 400–407.
  • Michels et al. [2013] Michels, S., Hommersom, A., Lucas, P. J., Velikova, M., and Koopman, P. 2013. Inference for a new probabilistic constraint logic. In 23rd Intl. Joint Conf. on Artificial Intelligence. 2540–2546.
  • Milch et al. [2008] Milch, B., Zettlemoyer, L. S., Kersting, K., Haimes, M., and Kaelbling, L. P. 2008. Lifted probabilistic inference with counting formulas. In 23rd AAAI Conf. on Artificial Intelligence. 1062–1068.
  • Moldovan et al. [2013] Moldovan, B., Thon, I., Davis, J., and Raedt, L. D. 2013. MCMC estimation of conditional probabilities in probabilistic programming languages. In ECSQARU. 436–448.
  • Nampally and Ramakrishnan [2015] Nampally, A. and Ramakrishnan, C. R. 2015. Constraint-based inference in probabilistic logic programs. In 2nd Intl. Workshop on Probabilistic Logic Programming. 46–56.
  • Nampally and Ramakrishnan [2016] Nampally, A. and Ramakrishnan, C. R. 2016. Inference in probabilistic logic programs using lifted explanations. In 32nd Intl. Conf. on Logic Programming, ICLP 2016 TCs. 15:1–15:15.
  • Nitti et al. [2016] Nitti, D., De Laet, T., and De Raedt, L. 2016. Probabilistic logic programming for hybrid relational domains. Machine Learning 103, 3, 407–449.
  • Poole [2003] Poole, D. 2003. First-order probabilistic inference. In 18th Intl. Joint Conf. on A. I. 985–991.
  • Riguzzi [2011] Riguzzi, F. 2011. MCINTYRE: A Monte Carlo algorithm for probabilistic logic programming. In CILC. 25–39.
  • Riguzzi and Swift [2011] Riguzzi, F. and Swift, T. 2011. The PITA system: Tabling and answer subsumption for reasoning under uncertainty. Theory and Practice of Logic Programming (TPLP) 11, 4-5, 433–449.
  • Sarna-Starosta and Ramakrishnan [2007] Sarna-Starosta, B. and Ramakrishnan, C. R. 2007. Compiling constraint handling rules for efficient tabled evaluation. In Practical Aspects of Declarative Languages (PADL). 170–184.
  • Sato and Kameya [1997] Sato, T. and Kameya, Y. 1997. PRISM: A language for symbolic-statistical modeling. In 15th Intl. Joint Conf. on Artificial Intelligence. 1330–1339.
  • Sato and Kameya [2001] Sato, T. and Kameya, Y. 2001. Parameter learning of logic programs for symbolic-statistical modeling. Journal of Artificial Intelligence Research 15, 391–454.
  • Shachter and Peot [1990] Shachter, R. D. and Peot, M. A. 1990. Simulation approaches to general probabilistic inference on belief networks. In 5th Annual Conf. on Uncertainty in Artificial Intelligence. 221–234.
  • Swift and Warren [2010] Swift, T. and Warren, D. S. 2010. Tabling with answer subsumption: Implementation, applications and performance. In Logics in Artificial Intelligence: 12th European Conference (JELIA). 300–312.
  • Tamaki and Sato [1986] Tamaki, H. and Sato, T. 1986. OLD resolution with tabulation. In Proceedings of the Third international conference on logic programming. Springer, 84–98.
  • Van den Broeck et al. [2011] Van den Broeck, G., Taghipour, N., Meert, W., Davis, J., and De Raedt, L. 2011. Lifted probabilistic inference by first-order knowledge compilation. In Proceedings of the Twenty-Second international joint conference on Artificial Intelligence. 2178–2185.

Appendix A Proofs

Proposition (Closure properties).

OSDDs are closed under conjunction and disjunction operations.

Proof.

Let and be two OSDDs.

Let denote either or , then by the definition of ordering is preserved. Depending on the ordering of the OSDDs, has three cases. If (resp. then is constructed by leaving the root and edge labels intact at (resp. . In this case urgency, mutual exclusion, and completeness are all preserved since (resp. ) is an OSDD and the root and its edge labels are unchanged.

If urgency is preserved since are the constructed edges of and individually these and satisfied urgency. If we take two distinct edge constraints and it is the case that since either or and both and . Let be the grounding substitution of that is compatible with constraint formula labeling the path to the node . To prove completeness, we note that . Therefore, . ∎

Proposition.

Let and be two OSDDs, then

Proof.

When , then . But .

Thus, we consider the case where . Both ground explanation graphs have the same root, therefore the ground explanations in are obtained by combining subtrees connected which have the same edge label. Given grounding substitution on that is compatible with the constraint formula labeling the path from root to the node under consideration, if some value is such that it satisfies and for specific , then in , , therefore the same subtrees are combined. ∎

Proposition (Condition for Measurability).

A satisfiable constraint formula is measurable w.r.t. all of its variables if and only if it is saturated.

Proof.

First we prove that saturation is a sufficient condition for measurability.

The proof is by induction on the number of variables in . When the proposition holds since the only satisfiable constraint formulas with a single variable are for some or formulas of the form for some distinct set of values . Clearly the formulas are measurable w.r.t .

Assume that the proposition holds for saturated constraint formulas with variables. Now consider a satisfiable constraint formula with variables which is saturated. Let . Consider the graph obtained by removing and all edges incident on from the constraint graph of . It represents a saturated constraint formula with variables. This is because for any three variables distinct from , if then are connected by an “” edge. Similarly, if , then are connected by an “” edge. Further, for any variable other than , if is the set of variables connected to by “” edges, then there exist edges between each pair of these nodes. This is due to the definition of saturation which is satisfied by .

But, by inductive hypothesis is measurable w.r.t each of its variables. Now consider computing the measure of in . If is connected to any node with an “” edge, then measure of is 1. If is not connected to any node with an “” edge, then it is either disconnected from other nodes or connected to them by only “” edges. In either case is computed by subtracting the number of nodes connected to by “” edges from the domain.

To prove that saturation is a necessary condition we use proof by contradiction. Assume there exists a measurable constraint formula which is not saturated. Then there exists a variable and a set which is the set of nodes connected to the node for by “” edge and for some pair of elements , there is no edge between them. Since we take closure of “” edges, we can assume that . So there must exist two substitutions where and respectively. The number of solutions of under these two substitutions is clearly different, which is a contradiction. ∎
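The necessity argument can be checked by brute force on a tiny instance. The Python sketch below assumes that constraint formulas are built from equality and disequality constraints (the edge labels were lost in this copy of the text). It also assumes that measurability w.r.t. X means that X has the same number of solutions under every substitution of the other variables that is compatible with the formula. The variable names X, Y, Z and all function names are hypothetical.

```python
from itertools import product


def solution_counts(domain, other_vars, constraint):
    """Distinct numbers of solutions for X over all compatible substitutions."""
    counts = set()
    for values in product(domain, repeat=len(other_vars)):
        env = dict(zip(other_vars, values))
        n = sum(1 for x in domain if constraint(x, env))
        if n > 0:  # skip substitutions that are incompatible with the formula
            counts.add(n)
    return counts


def is_measurable(domain, other_vars, constraint):
    """True when X has a single solution count across compatible substitutions."""
    return len(solution_counts(domain, other_vars, constraint)) == 1


DOMAIN = range(4)


def unsaturated(x, env):
    # X differs from Y and from Z, but nothing relates Y and Z to each other.
    return x != env["Y"] and x != env["Z"]


def saturated(x, env):
    # Same formula with the missing disequality between Y and Z added.
    return x != env["Y"] and x != env["Z"] and env["Y"] != env["Z"]
```

With a domain of four values, the unsaturated formula yields three solutions for X when Y and Z coincide and two when they differ, so no single measure exists. Adding the disequality between Y and Z leaves only the second case.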