Fusing First-order Knowledge Compilation and the Lifted Junction Tree Algorithm

07/02/2018
by Tanya Braun, et al.
Universität Lübeck

Standard approaches for inference in probabilistic formalisms with first-order constructs include lifted variable elimination (LVE) for single queries as well as first-order knowledge compilation (FOKC) based on weighted model counting. To handle multiple queries efficiently, the lifted junction tree algorithm (LJT) uses a first-order cluster representation of a model and LVE as a subroutine in its computations. For certain inputs, the implementations of LVE and, as a result, LJT ground parts of a model where FOKC has a lifted run. The purpose of this paper is to prepare LJT as a backbone for lifted inference that can use any exact inference algorithm as a subroutine. Using FOKC in LJT allows us to compute answers faster than LJT, LVE, and FOKC alone for certain inputs.



Preliminaries

This section introduces notation and recaps LJT. We specify a version of the smokers example (e.g., [van den Broeck et al.2011]), where two friends are more likely to both smoke and smokers are more likely to have cancer or asthma. Parameters allow for representing whole groups of people, avoiding explicit randvars for each individual.

Parameterised Models

To compactly represent models with first-order constructs, parameterised models use logical variables (logvars) to parameterise random variables (randvars), yielding parameterised randvars (PRVs). Parameterised models are based on work by Poole [Poole2003].

Definition 1.

Let L, Φ, and R be sets of logvar, factor, and randvar names, respectively. A PRV A = P(X^1, ..., X^n), n ≥ 0, is a syntactical construct with a randvar name P ∈ R and logvars X^1, ..., X^n ∈ L to represent a set of randvars. For a PRV A, the term range(A) denotes its possible values. A logvar X has a domain D(X). A constraint is a tuple (X, C_X) with a sequence of logvars X = (X^1, ..., X^n) and a set C_X ⊆ D(X^1) × ... × D(X^n) restricting the logvars to given values. The symbol ⊤ marks that no restrictions apply and may then be omitted. For some P, the term lv(P) refers to its logvars, rv(P) to its PRVs with constraints, and gr(P) to all instances of P grounded w.r.t. its constraints.

For the smoker example, let L = {X, Y} and R = {Smokes, Friends} to build the boolean PRVs Smokes(X), Smokes(Y), and Friends(X, Y). We abbreviate Smokes by S and Friends by F. Both logvar domains are the same set of people, say {alice, bob, eve}. An inequality X ≠ Y yields a constraint C = ((X, Y), {(x, y) ∈ D(X) × D(Y) | x ≠ y}). gr(F(X, Y)|C) refers to all propositional randvars that result from replacing (X, Y) with the tuples in C_(X,Y). Parametric factors (parfactors) combine PRVs as arguments. A parfactor describes a function, identical for all argument groundings, that maps argument values to the reals (potentials), of which at least one is non-zero.

Definition 2.

Let X ⊆ L be a set of logvars, A = (A^1, ..., A^n) a sequence of PRVs, each built from R and possibly X, φ : ×_{i=1}^n range(A^i) → ℝ⁺ a function mapping argument values to potentials, and C a constraint on X. We denote a parfactor g by φ(A^1, ..., A^n)|C. We omit |C if C = ⊤. A set of parfactors forms a model G := {g_i}_{i=1}^n.

We define a model G_ex for the smoker example, adding the binary PRVs Cancer(X) and Asthma(X), short C(X) and A(X), to the ones above. The model reads G_ex = {g_i}_{i=0}^5 with

g_0 = φ_0(F(X, Y), S(X), S(Y))|C,
g_1 = φ_1(S(X)), g_2 = φ_2(C(X)), g_3 = φ_3(A(X)),
g_4 = φ_4(S(X), C(X)), g_5 = φ_5(S(X), A(X)).

φ_0 has eight, φ_1 to φ_3 have two, and φ_4 and φ_5 four input-output pairs (omitted here). Constraint C refers to the inequality constraint given above. The other constraints are ⊤. Figure 1 depicts G_ex as a graph with five variable nodes and six factor nodes for the PRVs and parfactors, with edges to arguments.

The semantics of a model G is given by grounding and building a full joint distribution. With Z as the normalisation constant, G represents the full joint probability distribution

P_G = (1/Z) Π_{f ∈ gr(G)} f.

The QA problem asks for a likelihood of an event, a marginal distribution of some randvars, or a conditional distribution given events, all queries boiling down to computing marginals w.r.t. a model's joint distribution. Formally, P(Q | E) denotes a (conjunctive) query with Q a set of grounded PRVs and E a set of events (grounded PRVs with range values). If E ≠ ∅, the query is for a conditional distribution. An example query for G_ex is P(Smokes(eve) | Friends(eve, bob) = true). If Q contains a single term, we call the query a singleton query. Lifted QA algorithms seek to avoid grounding and building a full joint distribution. Before looking at lifted QA, we introduce FO jtrees.
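To make the grounding semantics concrete, the following sketch computes a marginal by brute force on a toy grounded smokers model. It is purely illustrative: the potentials, helper names, and the two-parfactor model are assumptions for this example, not the model G_ex or the authors' code.

```python
from itertools import product

people = ["alice", "bob", "eve"]

def phi_friends(f, s1, s2):
    # made-up potential for a grounded instance of phi(F(X,Y), S(X), S(Y))
    return 2.0 if f and s1 == s2 else 1.0

def phi_cancer(s, c):
    # made-up potential for a grounded instance of phi(S(X), C(X))
    return 3.0 if s and c else 1.0

# Ground randvars: Smokes(p), Cancer(p), and Friends(p1, p2) for p1 != p2.
rvs = ([("S", p) for p in people] + [("C", p) for p in people]
       + [("F", p1, p2) for p1 in people for p2 in people if p1 != p2])

def joint(a):
    # Unnormalised product over all instances of the grounded parfactors.
    w = 1.0
    for p1 in people:
        for p2 in people:
            if p1 != p2:
                w *= phi_friends(a[("F", p1, p2)], a[("S", p1)], a[("S", p2)])
    for p in people:
        w *= phi_cancer(a[("S", p)], a[("C", p)])
    return w

# P(Smokes(eve)) by summing the joint over all 2^12 worlds, then normalising.
totals = {True: 0.0, False: 0.0}
for values in product([False, True], repeat=len(rvs)):
    a = dict(zip(rvs, values))
    totals[a[("S", "eve")]] += joint(a)
Z = totals[True] + totals[False]
print("P(Smokes(eve) = true) =", totals[True] / Z)
```

Lifted algorithms compute the same answer while touching only one representative per group of isomorphic instances, which is what makes the domain sizes above tractable at scale.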

Figure 1: Parfactor graph for G_ex

Figure 2: FO jtree for G_ex (local models in grey)

First-order Junction Trees

LJT builds an FO jtree to cluster a model into submodels that, after information propagation, contain all information needed for a query. An FO jtree, defined as follows, constitutes a lifted version of a jtree. Its nodes are parameterised clusters (parclusters), i.e., sets of PRVs connected by parfactors.

Definition 3.

Let X be a set of logvars, A a set of PRVs with lv(A) ⊆ X, and C a constraint on X. Then, A|C denotes a parcluster. We omit |C if C = ⊤. An FO jtree for a model G is a cycle-free graph J = (V, E), where V is the set of nodes (parclusters) and E the set of edges. J must satisfy three properties: (i) Each parcluster C_i ∈ V is a set of PRVs from G. (ii) For each parfactor g ∈ G, there is a parcluster C_i ∈ V s.t. rv(g) ⊆ C_i. (iii) If a PRV A appears in two parclusters C_i and C_j, then A appears in every parcluster on the path between C_i and C_j. The parameterised set S_{ij}, called separator of edge {i, j} ∈ E, is defined by C_i ∩ C_j. The term nbs(i) refers to the neighbours of node i. Each parcluster C_i has a local model G_i with rv(g) ⊆ C_i for each g ∈ G_i. The G_i's partition G.

Figure 2 shows an FO jtree for G_ex with the following parclusters:

C_1 = {S(X), A(X)},  C_2 = {F(X, Y), S(X), S(Y)}|C,  C_3 = {S(X), C(X)}.

Separators are S_{12} = S_{23} = {S(X)}. As S(X) and S(Y) model the same randvars, the separator names only one. Parfactor g_1 appears in one of the local models but could be in any of them, as rv(g_1) = {S(X)} is contained in every parcluster. We do not consider building FO jtrees here (cf. [Braun and Möller2016] for details).

procedure LJT(Model G, Queries {Q_i}_{i=1}^m, Evidence E)
     Construct FO jtree J for G
     Enter E into J
     Pass messages on J
     for each query Q_i do
          Find subtree J' for Q_i
          Extract submodel G' from J'
          Answer Q_i on G'
Algorithm 1 Outline of the Lifted Junction Tree Algorithm

Lifted Junction Tree Algorithm

LJT answers a set of queries efficiently by answering each query on a smaller submodel. Algorithm 1 outlines LJT for a set of queries (cf. [Braun and Möller2016] for details). LJT starts with constructing an FO jtree. It enters evidence for a local model to absorb whenever the evidence randvars appear in a parcluster. Message passing propagates local information through the FO jtree in two passes: LJT sends messages from the periphery towards the centre and then back. A message is a set of parfactors over separator PRVs. For a message from node i to neighbour j, LJT eliminates all PRVs not in separator S_{ij} from the local model G_i and the messages from all other neighbours using LVE. Afterwards, each parcluster holds all information of the model in its local model and received messages. LJT answers a query by finding a subtree whose parclusters cover the query randvars, extracting a submodel of local models and outside messages, and answering the query on the submodel. The original LJT eliminates randvars for messages and queries using LVE.
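The two-pass scheme can be summarised in a short skeleton. The following sketch assumes an eliminate(parfactors, separator) subroutine (LVE in the original LJT); the jtree encoding and the toy demo values are invented for illustration only.

```python
# Schematic two-pass message passing over a jtree; not the authors' code.
def pass_messages(neighbours, local_models, separator, eliminate, root):
    messages = {}  # (sender, receiver) -> message (a set of parfactors)

    def collect(node, parent):  # inward pass: periphery towards the centre
        for nb in neighbours[node]:
            if nb != parent:
                collect(nb, node)
        if parent is not None:
            inputs = set(local_models[node])
            for nb in neighbours[node]:  # messages from all other neighbours
                if nb != parent:
                    inputs |= messages[(nb, node)]
            messages[(node, parent)] = eliminate(inputs, separator(node, parent))

    def distribute(node, parent):  # outward pass: centre towards periphery
        for nb in neighbours[node]:
            if nb != parent:
                inputs = set(local_models[node])
                for other in neighbours[node]:
                    if other != nb:
                        inputs |= messages[(other, node)]
                messages[(node, nb)] = eliminate(inputs, separator(node, nb))
                distribute(nb, node)

    collect(root, None)
    distribute(root, None)
    return messages

# Toy demo on the chain C1 - C2 - C3 from Fig. 2 (placeholder parfactor names):
neighbours = {1: [2], 2: [1, 3], 3: [2]}
local_models = {1: {"g_asthma"}, 2: {"g_friends"}, 3: {"g_cancer"}}
msgs = pass_messages(neighbours, local_models,
                     separator=lambda i, j: {"S(X)"},
                     eliminate=lambda fs, sep: {f"msg({sorted(fs)})"},
                     root=2)
```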

LJT as a Backbone for Lifted Inference

LJT provides general steps for efficient QA given a set of queries. It constructs an FO jtree and uses a subroutine to propagate information and answer queries. To ensure a lifted algorithm run without groundings, evidence entering and message passing impose some requirements on the algorithm used as a subroutine. After presenting those requirements, we analyse how LVE matches them and to what extent FOKC can provide the same service.

Requirements

LJT has a domain-lifted complexity: if a model allows for computing a solution without grounding part of the model, LJT computes the solution without groundings, i.e., with a complexity linear in the domain sizes of the logvars. To maintain this property, given a model that allows for computing solutions without grounding, the subroutine must be able to handle message passing and query answering without grounding.

Evidence displays symmetries if the same value is observed for a set of instances of a PRV [Taghipour et al.2013]. Thus, for evidence handling, the algorithm needs to be able to absorb a set of observations for some instances of a single PRV in a lifted way. Calculating messages entails that the algorithm can answer a form of parameterised, conjunctive query over the PRVs in the separator. In summary, LJT requires the following:

  1. Given evidence in the form of a set of observations for some instances of a single PRV, the subroutine must be able to absorb the evidence independent of the number of instances in the set.

  2. Given a parcluster with its local model, messages, and a separator, the subroutine must be able to eliminate all PRVs in the parcluster that do not appear in the separator in a domain-lifted way.

The subroutine also establishes which kinds of queries LJT can answer: the expressiveness of LJT's query language follows from that of the inference algorithm used. If an algorithm answers queries for single randvars, LJT answers this type of query. If an algorithm answers maximum a posteriori (MAP) queries, i.e., queries for the most likely assignment to a set of randvars, LJT answers MAP queries. Next, we look at how LVE fits into LJT.

Lifted Variable Elimination

First, we take a closer look at LVE before analysing it w.r.t. the requirements of LJT. To answer a query, LVE eliminates all non-query randvars. In the process, it computes VE for one representative case and exponentiates the result for isomorphic instances (lifted summing out). Taghipour implements LVE through an operator suite (see [Taghipour et al.2013] for details). Algorithm 2 shows an outline. All operators have pre- and postconditions to ensure that a computation is equivalent to one on the grounded model gr(G). The main operator sum-out realises lifted summing out. An operator absorb handles evidence in a lifted way. The remaining operators (count-convert, split, expand, count-normalise, multiply, ground-logvar) aim at enabling lifted summing out by transforming part of a model.
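As a toy illustration of the exponentiation idea behind lifted summing out (not the operator suite itself), consider eliminating a PRV Sick(X) from a hypothetical parfactor φ(Epid, Sick(X)): VE is computed for one representative instance and the result is raised to the power n = |D(X)|, since all n groundings are isomorphic.

```python
# Lifted summing out, toy version; PRVs and potentials are made up.
n = 1000  # |D(X)|, the number of isomorphic instances of Sick(X)
phi = {   # (epid, sick) -> potential
    (False, False): 4.0, (False, True): 1.0,
    (True,  False): 2.0, (True,  True): 3.0,
}
# Sum out Sick(X) once per value of Epid, then exponentiate for n instances.
phi_epid = {e: sum(phi[(e, s)] for s in (False, True)) ** n
            for e in (False, True)}
# Propositional VE would instead handle all n groundings of Sick(X)
# explicitly, multiplying n summed-out factors per value of Epid.
```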

LVE as a subroutine provides lifted absorption for evidence handling. Lifted absorption splits a parfactor into one part for which evidence exists and one part without evidence. The part with evidence absorbs the evidence once and exponentiates the result for all isomorphic instances. For messages, a relaxed QA routine computes answers to parameterised queries without making all instances of the query logvars explicit. LVE answers queries for a likelihood of an event, a marginal distribution of a set of randvars, and a conditional distribution of a set of randvars given events. LJT with LVE as a subroutine answers the same queries. Extensions to LJT or LVE enable even more query types, such as queries for a most probable explanation or MAP assignment [Braun and Möller2018].
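A similarly small sketch shows lifted absorption under the assumption that the same value is observed for m of the n instances of a PRV; the split and a single exponentiation replace m individual absorptions. All numbers are illustrative.

```python
# Lifted absorption, toy version; potentials and counts are made up.
n, m = 1000, 300                       # |D(X)| and observed instances
phi = {(False,): 2.0, (True,): 5.0}    # phi(Smokes(X))

# Split off the m instances with evidence Smokes(x) = true ...
absorbed_once = phi[(True,)]           # absorb the observation once
evidence_part = absorbed_once ** m     # ... and exponentiate for all m
# The remaining n - m instances keep the unrestricted parfactor phi.
```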

function LVE(Model G, Query Q, Evidence E)
     Absorb E in G
     while G has non-query PRVs do
          if a PRV A fulfils the sum-out preconditions then
               Eliminate A using sum-out
          else
               Apply a transformator
     return Multiply parfactors in G, normalise
procedure FOKC(Model G, Queries {Q_i}_{i=1}^m, Evidence E)
     Reduce G to a WFOMC problem with theory Δ, weights w_T, w_F
     Compile a circuit c_e for Δ and E
     for each query Q_i do
          Compile a circuit c_{q,e} for Δ, Q_i, and E
          Compute P(Q_i | E) through WFOMCs in c_{q,e} and c_e
Algorithm 2 Outlines of Lifted QA Algorithms

First-order Knowledge Compilation

FOKC aims at solving a WFOMC problem by building FO d-DNNF circuits for a given query and evidence and computing WFOMCs on the circuits. Different compilation flavours exist, e.g., compiling into a low-level language [Kazemi and Poole2016], but we focus on the basic version of FOKC, for which an implementation is available. We briefly look at WFOMC problems, FO d-DNNF circuits, and QA with FOKC before analysing FOKC w.r.t. the LJT requirements. See [van den Broeck et al.2011] for details.

Let Δ be a theory of constrained clauses, w_T a positive, and w_F a negative weight function. Clauses follow standard notations of (function-free) first-order logic. A constraint expresses, e.g., an (in)equality of two logvars. w_T and w_F assign weights to the predicates in Δ. A WFOMC problem consists of computing

WFOMC(Δ, w_T, w_F) = Σ_{I ⊨ Δ} Π_{a ∈ I} w_T(pred(a)) · Π_{a ∈ HB(Δ)∖I} w_F(pred(a)),

where I is an interpretation that satisfies Δ, HB(Δ) is the Herbrand base, and pred maps atoms to their predicate. See [van den Broeck2013] for a description of how to transform parfactor models into WFOMC problems.
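The definition can be read off directly from a brute-force enumeration over interpretations. The following sketch does exactly that for a tiny hand-made propositional theory (real FOKC never enumerates; it exploits the circuit structure instead); theory, atoms, and weights are assumptions for this example.

```python
from itertools import product

# Toy Herbrand base and weights; everything here is made up.
atoms = ["smokes_eve", "cancer_eve"]
pred = {"smokes_eve": "Smokes", "cancer_eve": "Cancer"}
w_T = {"Smokes": 2.0, "Cancer": 3.0}   # positive weight function
w_F = {"Smokes": 1.0, "Cancer": 1.0}   # negative weight function

def satisfies(interp):                 # toy theory: cancer_eve => smokes_eve
    return interp["smokes_eve"] or not interp["cancer_eve"]

wfomc = 0.0
for values in product([False, True], repeat=len(atoms)):
    interp = dict(zip(atoms, values))
    if satisfies(interp):              # sum over interpretations I |= Delta
        w = 1.0
        for a in atoms:                # weight true and false atoms
            w *= w_T[pred[a]] if interp[a] else w_F[pred[a]]
        wfomc += w
print(wfomc)  # 1 + 2 + 6 = 9 for this toy theory
```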

FOKC converts Δ into FO d-DNNF, where all conjunctions are decomposable (all pairs of conjuncts independent) and all disjunctions are deterministic (only one disjunct true at a time). This normal form allows for efficient reasoning: the probability of a conjunction decomposes into the product of the probabilities of its conjuncts, and the probability of a disjunction is the sum of the probabilities of its disjuncts. An FO d-DNNF circuit represents such a theory as a directed acyclic graph with inner nodes labelled ∧ and ∨. Additionally, set-disjunction and set-conjunction nodes represent isomorphic parts in Δ. Leaf nodes contain atoms from Δ. The process of forming a circuit is called compilation.
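The payoff of the normal form is that weighted counts propagate through the circuit with sums and products only. Below is a minimal evaluator over a hand-built propositional d-DNNF; the circuit encoding and weights are invented for illustration, not output of the FOKC compiler.

```python
# WMC on a d-DNNF: product at decomposable AND, sum at deterministic OR.
def wmc(node, weight):
    kind = node[0]
    if kind == "leaf":                     # ("leaf", atom, positive?)
        _, atom, positive = node
        w_true, w_false = weight[atom]
        return w_true if positive else w_false
    vals = [wmc(child, weight) for child in node[1]]
    if kind == "and":                      # conjuncts share no atoms
        result = 1.0
        for v in vals:
            result *= v
        return result
    return sum(vals)                       # "or": disjuncts mutually exclusive

weight = {"s": (2.0, 1.0), "c": (3.0, 1.0)}  # atom -> (w_true, w_false)
# Circuit for (s and c) or (not s): deterministic in s, decomposable ANDs.
circuit = ("or", [
    ("and", [("leaf", "s", True), ("leaf", "c", True)]),
    ("and", [("leaf", "s", False),
             ("or", [("leaf", "c", True), ("leaf", "c", False)])]),
])
print(wmc(circuit, weight))  # 2*3 + 1*(3+1) = 10
```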

Now, we look at how FOKC answers queries. Algorithm 2 shows an outline with input model G, a set of query randvars {Q_i}_{i=1}^m, and evidence E. FOKC starts with transforming G into a WFOMC problem with theory Δ and weight functions w_T and w_F. It compiles a circuit c_e for Δ including E. For each query Q_i, FOKC compiles a circuit c_{q,e} for Δ including Q_i and E. It then computes

P(Q_i | E) = WFOMC(c_{q,e}, w_T, w_F) / WFOMC(c_e, w_T, w_F)    (1)

by propagating WFOMCs in c_{q,e} and c_e based on w_T and w_F. FOKC can reuse the denominator WFOMC for all Q_i.

Regarding the potential of FOKC as a subroutine for LJT, FOKC does not fulfil all requirements. FOKC can handle evidence through conditioning [van den Broeck and Davis2012], but lifted message passing is not possible in a domain-lifted and exact way without restrictions. FOKC answers queries for a likelihood of an event, a marginal distribution of a single randvar, and a conditional distribution of a single randvar given events. Inherently, conjunctive queries are only possible if the conjuncts are probabilistically independent [Darwiche and Marquis2002], which is rarely the case for separators; otherwise, FOKC has to invest more effort to account for the overlapping probabilities. This restricted query language means that LJT cannot use FOKC for message calculations in general. Given an FO jtree with singleton separators, message passing with FOKC as a subroutine may be possible. FOKC as such takes ground queries as input or computes answers for random groundings, so FOKC would need an extension to handle parameterised queries for message passing. While FOKC does not fulfil all requirements, we may combine LJT, LVE, and FOKC into one algorithm to answer queries for models where LJT with LVE as a subroutine struggles.

Fusing LJT, LVE, and FOKC

We now use LJT as a backbone and LVE and FOKC as subroutines, fusing all three algorithms. Algorithm 3 shows an outline of the fused algorithm, named LJTKC. Inputs are a model G, a set of queries {Q_j}_{j=1}^m, and evidence E. Each query consists of a single query term, in contrast to a set of randvars in LVE and LJT. This restriction stems from FOKC and ensures a correct result. Thus, LJTKC has the same expressiveness regarding the query language as FOKC.

The first three steps of LJTKC coincide with LJT as specified in Alg. 1: LJTKC builds an FO jtree J for G, enters E into J, and passes messages in J using LVE for message calculations. During evidence entering, each local model covering evidence randvars absorbs the evidence. LJTKC calculates messages based on local models with absorbed evidence, spreading the evidence information along with other local information. After message passing, each parcluster contains in its local model and received messages all information from G and E. This information is sufficient to answer queries for randvars contained in the parcluster and remains valid as long as G and E do not change. At this point, FOKC starts to interleave with the original LJT procedure.

LJTKC continues its preprocessing. For each parcluster C_i, LJTKC extracts a submodel G_i' consisting of the local model G_i and all messages received at C_i and reduces G_i' to a WFOMC problem with theory Δ_i and weight functions w_{T,i} and w_{F,i}. It does not need to incorporate E as the information from E is contained in G_i' through evidence entering and message passing. LJTKC compiles an FO d-DNNF circuit c_i for Δ_i and computes a WFOMC wfomc_i on c_i. In precomputing a WFOMC for each parcluster, LJTKC uses that the denominator of Eq. 1 is identical for varying queries on the same model and evidence: for each query handled at C_i, the submodel consists of G_i', resulting in the same circuit c_i and WFOMC wfomc_i.

procedure LJTKC(Model G, Queries {Q_j}_{j=1}^m, Evidence E)
     Construct FO jtree J for G
     Enter E into J
     Pass messages on J (LVE as subroutine)
     for each parcluster C_i of J with local model G_i do
          Form submodel G_i' from G_i and received messages
          Reduce G_i' to WFOMC problem with theory Δ_i, weights w_{T,i}, w_{F,i}
          Compile a circuit c_i for Δ_i
          Compute wfomc_i = WFOMC(c_i, w_{T,i}, w_{F,i})
     for each query Q_j do
          Find parcluster C_i where Q_j ∈ C_i
          Compile a circuit c_{i,j} for Δ_i and Q_j
          Compute wfomc_{i,j} = WFOMC(c_{i,j}, w_{T,i}, w_{F,i})
          Compute P(Q_j | E) = wfomc_{i,j} / wfomc_i
Algorithm 3 Outline of LJTKC

To answer a query Q_j, LJTKC finds a parcluster C_i that covers Q_j and compiles an FO d-DNNF circuit c_{i,j} for Δ_i and Q_j. It computes a WFOMC on c_{i,j} and determines the answer to Q_j by dividing this WFOMC by the precomputed WFOMC of the parcluster. LJTKC reuses Δ_i, w_{T,i}, w_{F,i}, and wfomc_i from preprocessing.
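Schematically, the query step amounts to one compilation and one division per query. The sketch below stubs the FOKC calls (compile_circuit and wfomc_of are hypothetical names) so only the control flow of LJTKC's answer step is shown.

```python
from dataclasses import dataclass

@dataclass
class Parcluster:
    prvs: set       # PRVs covered by the parcluster
    theory: str     # WFOMC theory Delta_i of the submodel
    weights: dict   # weight functions for Delta_i
    wfomc: float    # precomputed denominator WFOMC

def compile_circuit(theory, query):   # stub for FOKC compilation
    return (theory, query)

def wfomc_of(circuit, weights):       # stub for WFOMC propagation
    return 1.0                        # placeholder value

def answer(query_prv, query_event, parclusters):
    for pc in parclusters:            # find a parcluster covering the query
        if query_prv in pc.prvs:
            c_q = compile_circuit(pc.theory, query_event)
            return wfomc_of(c_q, pc.weights) / pc.wfomc   # Eq. (1)
    raise ValueError("query not covered by any parcluster")
```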

Example Run

For G_ex, LJTKC builds an FO jtree as depicted in Fig. 2. Without evidence, message passing commences. LJTKC sends messages from parclusters C_1 and C_3 to parcluster C_2 and back. For the message m_{12} from C_1 to C_2, LJTKC eliminates A(X) from G_1 using LVE. For the message m_{32} from C_3 to C_2, LJTKC eliminates C(X) from G_3 using LVE. For the messages back, LJTKC eliminates F(X, Y) each time, for message m_{21} to C_1 from G_2 and m_{32} and for message m_{23} to C_3 from G_2 and m_{12}. Each parcluster then holds all model information encoded in its local model and received messages, which form the submodels for the compilation steps. At C_1, the submodel contains G_1 and m_{21}. At C_2, the submodel contains G_2, m_{12}, and m_{32}. At C_3, the submodel contains G_3 and m_{23}.

For each parcluster, LJTKC reduces the submodel to a WFOMC problem, compiles a circuit for the problem specification, and computes a parcluster WFOMC. Given, e.g., the query randvar Smokes(eve), LJTKC takes a parcluster that contains the query randvar, e.g., C_1. It compiles a circuit for the query and Δ_1, computes a query WFOMC, and divides it by wfomc_1 to determine P(Smokes(eve)). Next, we argue why QA with LJTKC is sound.

Theorem 1.

LJTKC is sound, i.e., it computes a correct result for a query Q given a model G and evidence E.

Proof sketch.

We assume that LJT is correct, yielding an FO jtree J for model G that fulfils the three junction tree properties, which allows for local computations based on [Shenoy and Shafer1990]. Further, we assume that LVE is correct, ensuring correct computations for evidence entering and message passing, and that FOKC is correct, computing correct answers for single-term queries.

LJTKC starts with the first three steps of LJT. It constructs an FO jtree J for G, allowing for local computations. Then, LJTKC enters E and calculates messages using LVE, which produces correct results given that LVE is correct. After message passing, each parcluster holds all information from G and E in its local model and received messages, which allows for answering queries for the randvars the parcluster contains. At this point, the FOKC part takes over, taking all information present at a parcluster, compiling a circuit, and computing a WFOMC, which produces correct results given that FOKC is correct. The same holds for the compilation and computations done for a query Q. Thus, LJTKC computes a correct result for Q given G and E. ∎

Figure 3: Runtimes [ms] for the smaller model with increasing domain sizes; both axes log-scaled; points connected for readability
Figure 4: Runtimes [ms] for the larger model with increasing domain sizes; both axes log-scaled; points connected for readability

Theoretical Discussion

We discuss space and runtime performance of LJT, LVE, FOKC, and LJTKC in comparison with each other.

LJT requires space for its FO jtree as well as for storing the messages at each parcluster, while FOKC takes up space for storing its circuits. As a combination of LJT and FOKC, LJTKC stores the preprocessing information produced by both: next to the FO jtree structure and messages, LJTKC stores a WFOMC problem specification and a circuit for each parcluster. Since the LVE implementation grounds for certain inputs, the space requirements of LVE (and LJT) during QA increase with rising domain sizes. Since LJTKC avoids these groundings using FOKC, its space requirements during QA are smaller than those of LJT alone. W.r.t. circuits, LJTKC stores more circuits than FOKC, but the individual circuits are smaller and do not require conditioning, which would lead to a significant blow-up of the circuits.

LJTKC speeds up QA for certain challenging inputs by fusing LJT, LVE, and FOKC. The fused algorithm has a faster runtime than LJT, LVE, and FOKC alone as it precomputes reusable parts and provides smaller models for answering a specific query through the underlying FO jtree with its messages and parcluster compilation. In comparison with FOKC, LJTKC speeds up runtimes as it answers queries on smaller models. In comparison with LJT and LVE, LJTKC is faster when it avoids groundings in LVE. Instead of precompiling each parcluster, which adds to the overhead before query answering starts, LJTKC could compile on demand. On-demand compilation means less runtime and space required in advance but more time per initial query at a parcluster. One could further optimise LJTKC by speeding up internal computations in LVE or FOKC (e.g., caching for message calculations or pruning circuits using context-specific information).

In terms of complexity, LVE and FOKC have a time complexity linear in the domain sizes of the model logvars for models that allow for a lifted solution. LJT with LVE as a subroutine also has a time complexity linear in the domain sizes for query answering. For message passing, a factor of n, the number of parclusters, multiplies into the complexity, which is basically the same time complexity as answering a single query with LVE. LJTKC has the same time complexity as LJT for message passing since the algorithms coincide. For query answering, the complexity is determined by the FOKC complexity, which is linear in the domain sizes. Therefore, LJTKC has a time complexity linear in the domain sizes. Even though the original LVE and LJT implementations exhibit a practical problem of translating the theory into an efficient program, the worst-case complexity for liftable models is linear in the domain sizes.

The next section presents an empirical evaluation, showing how LJTKC speeds up QA compared to FOKC and LJT for challenging inputs.

Figure 5: Runtimes [ms] for the smaller model without inequalities; both axes log-scaled; points connected for readability
Figure 6: Runtimes [ms] for the larger model without inequalities; both axes log-scaled; points connected for readability

Empirical Evaluation

This evaluation demonstrates the speedup we can achieve for certain inputs when using LJT and FOKC in conjunction. We have implemented a prototype of LJT, named ljt here. Taghipour provides an implementation of LVE including its operators (available at https://dtai.cs.kuleuven.be/software/gcfove), named lve. [van den Broeck2013] provides an implementation of FOKC (available at https://dtai.cs.kuleuven.be/software/wfomc), named fokc. For this paper, we integrated fokc into ljt to compute marginals at parclusters, named ljtkc. Unfortunately, the FOKC implementation does not handle evidence in a lifted manner as described in [van den Broeck and Davis2012]. Therefore, we do not consider evidence, as fokc runtimes explode otherwise. We have also implemented the propositional junction tree algorithm, named jt.

This evaluation has two parts: First, we test two input models with inequalities to highlight (i) how runtimes of LVE and, subsequently, LJT explode, (ii) how FOKC handles the inputs without the blow-up in runtime, and (iii) how LJTKC provides a speedup for those inputs. Second, we test two inputs without inequalities to highlight (i) how runtimes of LVE and LJT compare to FOKC without inequalities and (ii) how LJT enables fast and stable reasoning. We compare overall runtimes without input parsing, averaged over five runs, with a working memory of 16 GB. lve eliminates all non-query randvars from its input model for each query, grounding in the process. ljt builds an FO jtree for its input model, passes messages, and then answers queries on submodels. fokc forms a WFOMC problem for its input model, compiles a model circuit, compiles a query circuit for each query, and computes the marginals of all PRVs in the input model with random groundings. ljtkc starts like ljt for its input model until answering queries; it then calls fokc at each parcluster to compute marginals of parcluster PRVs with random groundings. jt receives the grounded input models and otherwise proceeds like ljt.

Inputs with Inequalities

For the first part of this evaluation, we test two input models: G_ex and a slightly larger model that extends G_ex. The larger model has two more logvars, each with its own domain, and eight additional PRVs with one or two parameters. The PRVs are arguments to twenty parfactors, each parfactor with one to three inputs. The FO jtree for the larger model has six parclusters, the largest one containing five PRVs. We vary the domain sizes, with the grounded model sizes increasing accordingly. We query each PRV with random groundings, leading to four and twelve queries, respectively. For G_ex, the queries could be

  • P(Smokes(x_1)),

  • P(Friends(x_1, y_1)),

  • P(Cancer(x_1)), and

  • P(Asthma(x_1)),

where x_1 stands for a domain value of X and y_1 for a domain value of Y with x_1 ≠ y_1. Figures 3 and 4 show runtimes in milliseconds [ms] for the smaller and the larger model, respectively, with increasing domain sizes on log-scaled axes, marked as follows:

  • fokc: circle, orange,

  • jt: star, turquoise,

  • ljt: filled square, turquoise,

  • ljtkc: hollow square, light turquoise, and

  • lve: triangle, dark orange.

In Fig. 3, we compare runtimes on the smaller model with its four queries. For the first two settings, jt is the fastest and fokc the slowest. After the fourth setting, the jt runtime explodes and memory errors occur. lve and ljt have shorter runtimes than fokc and ljtkc for the first three settings as well, with ljt being faster than lve due to the smaller submodels for QA. But runtimes of lve and ljt steadily increase as the groundings become more severe with larger domain sizes. With the seventh setting, both programs run into memory errors. fokc and ljtkc show runtimes that increase linearly with the domain sizes. On this small model, ljtkc has minimally faster runtimes than fokc.

For the larger model, the runtime behaviour is similar, as shown in Fig. 4. Due to the larger model, the jt runtimes are already much longer at the first setting than the other runtimes. Again, up to the third setting, lve and ljt perform better than fokc, with ljt being faster than lve, and from the seventh setting on, memory errors occur. ljtkc performs best from the third setting onwards. ljtkc and fokc show the same steady increase in runtimes as before. Compared to fokc, ljtkc saves around one order of magnitude up to medium domain sizes.

For small domain sizes, ljtkc and fokc perform worst; with increasing domain sizes, they outperform the other programs. While not part of this evaluation, experiments showed that with an increasing number of parfactors, ljtkc promises to outperform fokc even more, especially for smaller domain sizes.

Inputs without Inequalities

For the second part of this evaluation, we test two input models that are the models from the first part but with Y receiving its own domain, as large as that of X, making the inequality superfluous. Domain sizes vary as before, with the grounded model sizes increasing accordingly. Each PRV is again queried with random groundings. Figures 5 and 6 show runtimes in milliseconds [ms] for the smaller and the larger model, respectively, with increasing domain sizes, marked as before. Both axes are log-scaled. Points are connected for readability.

Figures 5 and 6 show that lve and ljt do not exhibit the runtime explosion without inequalities. ljtkc does not perform best as the overhead introduced by FOKC does not pay off as much. In fact, ljt performs best in almost all cases. In both figures, jt is the fastest for the first setting. In the following settings, jt runs into memory problems while its runtimes explode. lve has a steadily increasing runtime for most parts, though a few settings lead to shorter runtimes with higher domain sizes; we could not find an explanation for the decrease in runtime for this handful of settings. Overall, lve runtimes rise more than the other runtimes apart from jt. ljtkc exhibits an unsteady runtime performance on the smaller model, though again, we could not find an explanation for the jumps between various sizes. On the larger model, ljtkc shows a steadier performance that is better than that of fokc by a roughly constant factor. fokc and ljt runtimes steadily increase with rising domain sizes, with ljt gaining over an order of magnitude compared to fokc over all domain sizes.

In summary, without inequalities, ljt performs best on our input models, being faster by over an order of magnitude compared to fokc. Though ljtkc does not perform worst, ljt performs better and more steadily. With inequalities, ljtkc shows promise in speeding up performance.

Conclusion

We present a combination of FOKC and LJT to speed up inference. For certain inputs, LJT (with LVE as a subroutine) and FOKC start to struggle due to either model structure or size. LJT provides a means to cluster a model into submodels, on which any exact lifted inference algorithm can answer queries, given that the algorithm can handle evidence and messages in a lifted way. FOKC fused with LJT and LVE can handle larger models more easily. In turn, FOKC boosts LJT by avoiding groundings in certain cases. The fused algorithm enables us to compute answers faster than LJT with LVE for certain inputs and faster than LVE and FOKC alone.

We currently work on incorporating FOKC into message passing for cases where a problematic elimination occurs during message calculation, which includes adapting an FO jtree accordingly. We also work on learning lifted models to use as inputs for LJT. Moreover, we look into constraint handling, possibly realising it with answer-set programming. Other interesting algorithm features include parallelisation and caching as a means to speed up runtime.

References

  • [Ahmadi et al.2013] Ahmadi, B.; Kersting, K.; Mladenov, M.; and Natarajan, S. 2013. Exploiting Symmetries for Scaling Loopy Belief Propagation and Relational Training. Machine Learning 92(1):91–132.
  • [Apsel and Brafman2011] Apsel, U., and Brafman, R. I. 2011. Extended Lifted Inference with Joint Formulas. In UAI-11 Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence.
  • [Bellodi et al.2014] Bellodi, E.; Lamma, E.; Riguzzi, F.; Costa, V. S.; and Zese, R. 2014. Lifted Variable Elimination for Probabilistic Logic Programming. Theory and Practice of Logic Programming 14(4–5):681–695.
  • [Braun and Möller2016] Braun, T., and Möller, R. 2016. Lifted Junction Tree Algorithm. In Proceedings of KI 2016: Advances in Artificial Intelligence, 30–42. Springer.
  • [Braun and Möller2018] Braun, T., and Möller, R. 2018. Lifted Most Probable Explanation. In Proceedings of the International Conference on Conceptual Structures, 39–54. Springer.
  • [Chavira and Darwiche2007] Chavira, M., and Darwiche, A. 2007. Compiling Bayesian Networks Using Variable Elimination. In IJCAI-07 Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2443–2449.
  • [Chavira and Darwiche2008] Chavira, M., and Darwiche, A. 2008. On Probabilistic Inference by Weighted Model Counting. Artificial Intelligence 172(6-7):772–799.
  • [Choi, Amir, and Hill2010] Choi, J.; Amir, E.; and Hill, D. J. 2010. Lifted Inference for Relational Continuous Models. In UAI-10 Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, 13–18.
  • [Darwiche and Marquis2002] Darwiche, A., and Marquis, P. 2002. A Knowledge Compilation Map. Journal of Artificial Intelligence Research 17(1):229–264.
  • [Das et al.2016] Das, M.; Wu, Y.; Khot, T.; Kersting, K.; and Natarajan, S. 2016. Scaling Lifted Probabilistic Inference and Learning Via Graph Databases. In Proceedings of the SIAM International Conference on Data Mining, 738–746.
  • [de Salvo Braz2007] de Salvo Braz, R. 2007. Lifted First-order Probabilistic Inference. Ph.D. Dissertation, University of Illinois at Urbana-Champaign.
  • [Gogate and Domingos2010] Gogate, V., and Domingos, P. 2010. Exploiting Logical Structure in Lifted Probabilistic Inference. In Working Note of the Workshop on Statistical Relational Artificial Intelligence at the 24th Conference on Artificial Intelligence, 19–25.
  • [Gogate and Domingos2011] Gogate, V., and Domingos, P. 2011. Probabilistic Theorem Proving. In UAI-11 Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, 256–265.
  • [Kazemi and Poole2016] Kazemi, S. M., and Poole, D. 2016. Why is Compiling Lifted Inference into a Low-Level Language so Effective? In IJCAI-16 Statistical Relational AI Workshop.
  • [Lauritzen and Spiegelhalter1988] Lauritzen, S. L., and Spiegelhalter, D. J. 1988. Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society. Series B: Methodological 50:157–224.
  • [Milch et al.2008] Milch, B.; Zettlemoyer, L. S.; Kersting, K.; Haimes, M.; and Kaelbling, L. P. 2008. Lifted Probabilistic Inference with Counting Formulas. In AAAI-08 Proceedings of the 23rd Conference on Artificial Intelligence, 1062–1068.
  • [Poole and Zhang2003] Poole, D., and Zhang, N. L. 2003. Exploiting Contextual Independence in Probabilistic Inference. Journal of Artificial Intelligence Research 18:263–313.
  • [Poole2003] Poole, D. 2003. First-order Probabilistic Inference. In IJCAI-03 Proceedings of the 18th International Joint Conference on Artificial Intelligence.
  • [Shenoy and Shafer1990] Shenoy, P. P., and Shafer, G. R. 1990. Axioms for Probability and Belief-Function Propagation. Uncertainty in Artificial Intelligence 4 9:169–198.
  • [Singla and Domingos2008] Singla, P., and Domingos, P. 2008. Lifted First-order Belief Propagation. In AAAI-08 Proceedings of the 23rd Conference on Artificial Intelligence, 1094–1099.
  • [Taghipour and Davis2012] Taghipour, N., and Davis, J. 2012. Generalized Counting for Lifted Variable Elimination. In Proceedings of the 2nd International Workshop on Statistical Relational AI, 1–8.
  • [Taghipour et al.2013] Taghipour, N.; Fierens, D.; Davis, J.; and Blockeel, H. 2013. Lifted Variable Elimination: Decoupling the Operators from the Constraint Language. Journal of Artificial Intelligence Research 47(1):393–439.
  • [van den Broeck and Davis2012] van den Broeck, G., and Davis, J. 2012. Conditioning in First-Order Knowledge Compilation and Lifted Probabilistic Inference. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, 1961–1967.
  • [van den Broeck and Niepert2015] van den Broeck, G., and Niepert, M. 2015. Lifted Probabilistic Inference for Asymmetric Graphical Models. In AAAI-15 Proceedings of the 29th Conference on Artificial Intelligence, 3599–3605.
  • [van den Broeck et al.2011] van den Broeck, G.; Taghipour, N.; Meert, W.; Davis, J.; and Raedt, L. D. 2011. Lifted Probabilistic Inference by First-order Knowledge Compilation. In IJCAI-11 Proceedings of the 22nd International Joint Conference on Artificial Intelligence.
  • [van den Broeck2013] van den Broeck, G. 2013. Lifted Inference and Learning in Statistical Relational Models. Ph.D. Dissertation, KU Leuven.
  • [Vlasselaer et al.2016] Vlasselaer, J.; Meert, W.; van den Broeck, G.; and Raedt, L. D. 2016. Exploiting Local and Repeated Structure in Dynamic Bayesian Networks. Artificial Intelligence 232:43–53.
  • [Zhang and Poole1994] Zhang, N. L., and Poole, D. 1994. A Simple Approach to Bayesian Network Computations. In Proceedings of the 10th Canadian Conference on Artificial Intelligence, 171–178.