Updating Probabilistic Knowledge on Condition/Event Nets using Bayesian Networks

The paper extends Bayesian networks (BNs) by a mechanism for dynamic changes to the probability distributions represented by BNs. One application scenario is the process of knowledge acquisition of an observer interacting with a system. In particular, the paper considers condition/event nets where the observer's knowledge about the current marking is a probability distribution over markings. The observer can interact with the net to deduce information about the marking by requesting certain transitions to fire and observing their success or failure. Aiming for an efficient implementation of dynamic changes to probability distributions of BNs, we consider a modular form of networks that form the arrows of a free PROP with a commutative comonoid structure, also known as term graphs. The algebraic structure of such PROPs supplies us with a compositional semantics that functorially maps BNs to their underlying probability distribution and, in particular, it provides a convenient means to describe structural updates of networks.




1 Introduction

Representing uncertain knowledge by probability distributions is the core idea of Bayesian learning. We model the potential of an agent (the observer) interacting with a concurrent system with hidden or uncertain state to gain knowledge through “experimenting” with the system, focussing on the problem of keeping track of knowledge updates correctly and efficiently. Knowledge about states is represented by a probability distribution. Our system models are condition/event nets, where states (or possible worlds) are markings and transitions describe which updates are allowed.

Figure 1: Example: Social network account with location privacy

In order to clarify our intentions we consider an application scenario from social media: preventing inadvertent disclosure, which is the concern of location privacy [7]. Consider the example of a social network account, modelled as a condition/event net, allowing a user to update and share their location (see Figure 1). We consider two users. User 1 does not allow location updates to be posted to the social network, they are only recorded on their device. In the net this is represented by places and modelling the user at corresponding locations, and transitions and for moving between them. We assume that only User 1 can fire or observe these transitions. User 2 has a similar structure for locations and movements, but allows the network to track their location. The user can decide to make their location public or hide it by firing transition or . Any observer can attempt to fire or to query the current location of User 2. If  is marked, this will allow the observer to infer the correct location. Otherwise the observer is left uncertain as to why the query fails, i.e. due to the wrong location being tested or the lack of permission, unless they test both locations. While our net captures the totality of possible behaviours, we identify different observers, the two users, the social network, and an unrelated observer. For each of these we define which transitions they can access. We then focus on one observer and only allow transitions they are authorised for. In our example, if we want to analyse the unrelated observer, we fix the users’ locations and privacy choices before it is the observer’s turn to query the system.

The observer may have prior knowledge about the dependencies between the locations of Users 1 and 2, for example due to photos with location information published by User 2, in which both users may be identifiable. The prior knowledge is represented in the initial probability distribution, updated according to the observations.

We also draw inspiration from probabilistic databases [27, 1] where the values of attributes or the presence of records are only known probabilistically. However, an update to the database might make it necessary to revise the probabilities. Think for instance of a database where the gender of a person (male or female) is unknown and we assume with probability that they are male. Now a record is entered, stating that the person has married a male. Does it now become more probable that the person is female?

Despite its simplicity, our system model based on condition/event nets allows us to capture databases: the content of a database can be represented as a (hyper-)graph (where each record is a (hyper-)edge). If the nodes of the graph are fixed, updates can be represented by the transitions of a net, where each potential record is represented by a place.

Given a net, the observer does not know the initial marking, but has a prior belief, given by a probability distribution over markings. The observer can try to fire transitions and observe whether the firing is successful or fails. Then the probability distribution is updated accordingly. While the update mechanism is rather straightforward, the problem lies in the huge number of potential states: we have $2^n$ markings if $n$ is the number of places.

To mitigate this state space explosion, we propose to represent the observer’s knowledge using Bayesian networks (BNs) [21, 23], i.e., graphical models that record conditional dependencies of random variables in a compact form. However, we encounter a new problem as updating the observer’s knowledge becomes non-trivial. To do this correctly and efficiently, we develop a compositional approach to BNs based on symmetric monoidal theories and PROPs [19]. In particular, we consider modular Bayesian networks as arrows of a freely generated PROP and (sub-)stochastic matrices as another PROP with a functor from the former to the latter. In this way, we make Bayesian networks compositional and we obtain a graphical formalism [26] that we use to modify Bayesian networks: in particular, we can replace entire subgraphs of Bayesian networks by equivalent ones, i.e., graphs that evaluate to the same matrix. The compositional approach allows us to specify complex updates in Bayesian networks by a sequence of simpler updates using a small number of primitives.

We furthermore describe an implementation and report promising runtime results.

The proofs of all results can be found in Appendix A.

2 Knowledge Update in Condition/Event Nets

We will formalise knowledge updates by means of an extension of Petri nets with probabilistic knowledge on markings. Our starting point is the class of condition/event nets [25].

Definition 1 (Condition/event net).

A condition/event net (CN) is a five-tuple consisting of a finite set of places , a finite set of transitions with pre-conditions , post-conditions , and an initial marking. A marking is any subset of places . We assume that for any , .

A transition can fire for a marking , denoted , if and . Then marking is transformed into , written . We write to indicate that there exists some with .

We will use two different notations to indicate that a transition cannot fire, the first referring to the fact that the pre-condition is not sufficiently marked, the second stating that there are tokens in the post-condition: whenever and whenever . We denote the set of all markings by .
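The firing rule and the two kinds of failure described above can be illustrated with a minimal sketch. It assumes the usual condition/event semantics (a transition is enabled when its pre-condition is fully marked and its post-condition is empty, and firing moves the tokens from the pre-condition to the post-condition). The names `Transition`, `failure_reasons` and `fire` are hypothetical and chosen only for this illustration.

```python
from typing import FrozenSet, NamedTuple, Optional, Set

Marking = FrozenSet[str]  # a marking is a set of marked places


class Transition(NamedTuple):
    pre: FrozenSet[str]   # pre-condition: places that must hold a token
    post: FrozenSet[str]  # post-condition: places that must be empty


def failure_reasons(m: Marking, t: Transition) -> Set[str]:
    """Return the subset of {"pre", "post"} explaining why t cannot fire in m."""
    reasons = set()
    if not t.pre <= m:
        reasons.add("pre")   # some pre-condition place holds no token
    if t.post & m:
        reasons.add("post")  # some post-condition place already holds a token
    return reasons


def fire(m: Marking, t: Transition) -> Optional[Marking]:
    """Return the successor marking, or None if t is not enabled in m."""
    if failure_reasons(m, t):
        return None
    # Assumed update rule: remove tokens from pre, add tokens to post.
    return (m - t.pre) | t.post


t = Transition(pre=frozenset({"s1"}), post=frozenset({"s2"}))
enabled_result = fire(frozenset({"s1", "s3"}), t)
blocked_result = fire(frozenset({"s2"}), t)
```

In the second call both failure reasons apply at once, which is the situation where the observer only learns one of them.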

For simplicity we assume that for . Then, a marking can be characterized by a boolean vector , i.e., . Using the vector notation we write for if all places in are marked in .

To capture the probabilistic observer we augment CNs by a probability distribution over markings modelling uncertainty about the hidden initial or current marking.

Definition 2 (Condition/Event net with Uncertainty).

A Condition/Event Net with Uncertainty (CNU) is a six-tuple where is a net as in Definition 1. Additionally, is a function with that assigns a probability mass to each possible marking. This gives rise to a probability space with defined by .

We assume that , i.e. the initial marking is possible according to .

We model the knowledge gained by observers when firing transitions and observing their outcomes. Firing a transition can either result in success (all places of its pre-condition are marked and no place in its post-condition is marked) or in failure (at least one place of the pre-condition is empty or at least one place in the post-condition is marked). Thus, there are two kinds of failure: the absence of tokens in the pre-condition or the presence of tokens in the post-condition. If a transition fails for both reasons, the observer will learn only one of them. To model the knowledge gained we define the following operations on distributions.

Definition 3 (Operations on CNUs).

Given a CNU the following operations update the mass function and as a result the probability distribution .

  • To assert that certain places all contain a token () or that none contains a token () we define the operation assert

  • To state that at least one place of a set does (resp. does not) contain a token we define operation negative assert

  • Modifying a set of places such that all places contain a token () or none contains a token () requires the following operation

  • A successful firing of a transition leads to an assert () and of the pre-conditions and the post-conditions . A failed firing translates to a negative assert () of the pre- or post-condition and nothing is set. Thus we define for a transition

Operations are partial, defined whenever the sum in the denominator of their first clause is greater than zero. That means the observer only fires transitions whose pre- and post-conditions have a probability greater than zero, i.e., where according to their knowledge about the state it is possible that these transitions are enabled. By Definition 2 the initial marking is possible, and this property is maintained as markings and distributions are updated. If this assumption is not satisfied, the operations in Definition 3 are undefined.

The and operations result from conditioning the input distribution on (not) having tokens at (compare Proposition 4). Also, and for can be performed iteratively, i.e., and . For a single place we have .
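The following sketch shows one way to read the operations of Definition 3 on an explicitly stored distribution. It is an illustration under assumptions, not the paper's definition: markings are bit-tuples, a distribution is a dictionary defined on all markings, assert and negative assert are implemented as conditioning, and "set" overwrites the chosen places while summing the mass of all markings that agree elsewhere. All function names (`ass`, `nas`, `set_places`, `success`, `fail_pre`, `fail_post`) are hypothetical.

```python
from fractions import Fraction
from itertools import product


def condition(p, keep):
    """Condition the distribution p on the event given by the predicate keep."""
    total = sum(q for m, q in p.items() if keep(m))
    if total == 0:
        raise ValueError("operation undefined: the event has probability zero")
    return {m: (q / total if keep(m) else Fraction(0)) for m, q in p.items()}


def ass(p, places, b):
    """Assert: every place in `places` has value b."""
    return condition(p, lambda m: all(m[i] == b for i in places))


def nas(p, places, b):
    """Negative assert: at least one place in `places` does not have value b."""
    return condition(p, lambda m: any(m[i] != b for i in places))


def set_places(p, places, b):
    """Overwrite `places` with b; marginals of all other places are unchanged."""
    out = {m: Fraction(0) for m in p}
    for m, q in p.items():
        target = tuple(b if i in places else x for i, x in enumerate(m))
        out[target] += q
    return out


def success(p, pre, post):
    """Observed success: assert enabledness, then move the tokens."""
    p = ass(ass(p, pre, 1), post, 0)
    return set_places(set_places(p, pre, 0), post, 1)


def fail_pre(p, pre):
    return nas(p, pre, 1)   # at least one pre-condition place is empty


def fail_post(p, post):
    return nas(p, post, 0)  # at least one post-condition place is marked


# Uniform prior over the 2^3 markings of three places (indices 0, 1, 2).
prior = {m: Fraction(1, 8) for m in product((0, 1), repeat=3)}
after = success(prior, pre={0}, post={1})
print(after[(0, 1, 0)], after[(0, 1, 1)])  # prints: 1/2 1/2
```

Storing the dictionary explicitly needs one entry per marking, which is exactly the exponential cost that motivates the Bayesian network representation.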

Figure 2 gives an example for a Petri net with uncertainty and explains how the observer can update their knowledge by interacting with the net.

Figure 2: Example of operations on a net with uncertainty. We set and assume the observer first fires (and succeeds) and then tries to fire (and fails). Columns in the table represent updated distributions on the markings after each operation (ordered from left to right). For this example, in the end the observer knows that the final configuration is with probability .

We can now show that our operations correctly update the probability assumptions according to the observations of the net.

Proposition 4.

Let be a CNU where is the corresponding probability distribution. For given and let , , and . Then, provided that , respectively are non-empty, it holds for that


We shall refer to the joint distribution (over all places) by . Note that it is infeasible to store it explicitly if the number of places is large. To mitigate this problem we use a Bayesian network with a random variable for each place, recording dependencies between the presence of tokens in different places. If such dependencies are local, the BN is often moderate in size and thus provides a compact symbolic representation. However, updating the joint distribution of BNs is non-trivial. To address this problem, we propose a propagation procedure based on a term-based, modular representation of BNs.

3 Modular Bayesian Networks and Sub-Stochastic Matrices

Bayesian networks (BNs) are a graphical formalism to reason about probability distributions. They are visualized as directed acyclic graphs whose nodes are random variables and whose edges are dependencies between them. This is sufficient for static BNs, whose most common operation is the inference of (marginalized or conditional) distributions of the underlying joint distribution.

For a rewriting calculus on dynamic BNs, we consider a modular representation of networks that do not only encode a single probability vector, but a matrix, with several input and output ports. The first aim is compositionality: larger nets can be composed from smaller ones via sequential and parallel composition, which correspond to matrix multiplication and Kronecker product of the encoded matrices. This means, we can implement the operations of Section 2 in a modular way.

PROPs with Commutative Comonoid Structure

We now describe the underlying compositional structure of (modular) BNs and (sub-)stochastic matrices, which facilitates a compositional computation of the underlying probability distribution of (modular) BNs. The mathematical structures are PROPs [19] (see also [12, Chapter 5.2]), i.e., strict symmetric monoidal categories whose objects are (in bijection with) the natural numbers, with monoidal product as (essentially) addition, with unit $0$. The compositional structure of PROPs can be intuitively represented using string diagrams with wires and boxes (see Figure 3). Symmetries serve for the reordering of wires.

Figure 3: String diagrammatic composition (resp. tensor) of two arrows , (resp. , ) of a PROP

A paradigmatic example is the PROP of -dimensional Euclidean spaces and linear maps, equipped with the tensor product: the tensor product of - and -dimensional spaces is -dimensional, composition of linear maps amounts to matrix multiplication, and the tensor product is also known as Kronecker product (as detailed below). We refer to the natural numbers of the domain and codomain of arrows in a PROP as their type; thus, a linear map from - to -dimensional Euclidean space has type .

We shall have the additional structure on symmetric monoidal categories that was dubbed graph substitution in work on term graphs [6], which amounts to a commutative comonoid structure on PROPs.

Definition 5 (PROPs with commutative comonoid structure).

A CC-structured PROP is a tuple where is a PROP and the last two components are arrows and , which are subject to Equations 4 (cf. Figure 4).



Table 1: Axioms for CC-structured PROPs and definition of operators of higher arity
Figure 4: Comultiplication and counit arrows and the equations of commutative comonoids

To give another, more direct definition, the arrows of a freely generated CC-structured PROP can be represented as terms over some set of generators and constants , , , , combined with the operators sequential composition () and tensor () and quotiented by the axioms in Table 1 (see [29]). This table also lists the definition of operators of higher arity. We often refer to the comultiplication  and its counit  as duplicator and terminator, resp. (cf. Figure 4). Roughly, adding the commutative comonoid structure amounts to the possibility to have several or no connections to each one of the output port of gates and input ports. In other words, outputs can be shared.

(Sub-)Stochastic Matrices

We now consider (sub-)stochastic matrices as an instance of a CC-structured PROP. A matrix of type $n \to m$ is a matrix of dimension $2^m \times 2^n$ with entries taken from the closed interval $[0,1]$. We restrict attention to sub-stochastic matrices, i.e., column sums will be at most $1$; if we require equality, we obtain stochastic matrices.


We index matrices over , i.e., for , the corresponding entry is denoted by . We use this notation to evoke the idea of conditional probability (the probability that the first index is equal to , whenever the second index is equal to .) When we write as a matrix, the rows/columns are ordered according to a descending sequence of binary numbers ( first, last).

Sequential composition is matrix multiplication, i.e., given , we define , which is a -matrix. The tensor is given by the Kronecker product, i.e., given , we define as where , .
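As a small illustration of the two operations, the sketch below implements sequential composition and tensor on matrices stored as lists of rows. It assumes that composing P with Q means "first P, then Q", so the underlying matrix product is Q times P, and that rows and columns follow the descending bit order mentioned above (index 0 corresponds to bit 1). The helper names are hypothetical.

```python
from fractions import Fraction


def compose(P, Q):
    """Sequential composition: apply P first, then Q (the matrix product Q * P)."""
    if len(Q[0]) != len(P):
        raise ValueError("types do not match: outputs of P must feed inputs of Q")
    return [[sum(Q[i][k] * P[k][j] for k in range(len(P)))
             for j in range(len(P[0]))]
            for i in range(len(Q))]


def tensor(P, Q):
    """Tensor product, computed as the Kronecker product of P and Q."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]


def column_sums(M):
    """Column sums; all equal to 1 for a stochastic matrix."""
    return [sum(row[j] for row in M) for j in range(len(M[0]))]


# A distribution on one bit is a matrix with a single column.
prior = [[Fraction(3, 10)],   # probability of bit 1
         [Fraction(7, 10)]]   # probability of bit 0
# A noisy channel on one bit (one input, one output).
noisy = [[Fraction(9, 10), Fraction(2, 10)],
         [Fraction(1, 10), Fraction(8, 10)]]

pushed = compose(prior, noisy)  # distribution after passing through the channel
joint = tensor(prior, prior)    # two independent copies, ordered 11, 10, 01, 00
```

Both operations preserve (sub-)stochasticity, which can be checked on examples with `column_sums`.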

The constants are defined as follows:

In more detail, the constant matrices can be spelled out as follows.

  • is the unique stochastic -matrix, i.e., .

  • is the identity matrix, i.e., iff (otherwise ).

  • iff (otherwise ).

  • iff and (otherwise ).

  • for every .
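The constant matrices can be made concrete with a short sketch. The reading of each constant below (identity, duplicator copying its input bit to both outputs, symmetry swapping two bits, terminator discarding its input) is an assumption based on the descriptions above, and the names `ID`, `DUP`, `SWAP`, `TERM` are hypothetical. The checks at the end exercise the commutative comonoid equations on these matrices.

```python
from itertools import product


def bits(n):
    """All bit-vectors of length n in descending binary order (1...1 first)."""
    return sorted(product((0, 1), repeat=n), reverse=True)


def matrix(n, m, entry):
    """Build the 0/1 matrix with n inputs and m outputs from entry(output, input)."""
    return [[1 if entry(x, y) else 0 for y in bits(n)] for x in bits(m)]


def compose(P, Q):
    """Apply P first, then Q (the matrix product Q * P)."""
    return [[sum(Q[i][k] * P[k][j] for k in range(len(P)))
             for j in range(len(P[0]))]
            for i in range(len(Q))]


def tensor(P, Q):
    """Kronecker product."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]


ID = matrix(1, 1, lambda x, y: x == y)                 # identity on one wire
DUP = matrix(1, 2, lambda x, y: x == y + y)            # both outputs copy the input
SWAP = matrix(2, 2, lambda x, y: x == (y[1], y[0]))    # exchange two wires
TERM = matrix(1, 0, lambda x, y: True)                 # discard the input

counit_law = compose(DUP, tensor(TERM, ID)) == ID
commutativity = compose(DUP, SWAP) == DUP
coassociativity = (compose(DUP, tensor(DUP, ID))
                   == compose(DUP, tensor(ID, DUP)))
```

All three flags evaluate to `True` for these matrices, matching the intuition that copying and then discarding one copy does nothing, and that the order of the copies is irrelevant.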

Proposition 6 ([11]).

(Sub-)stochastic matrices form a CC-structured PROP.

Proof sketch.

It is straightforward to check that (sub-)stochastic matrices satisfy all the axioms in Table 1. On the other hand the result also follows from [11], which interprets Bayesian networks over stochastic maps, a generalization of stochastic matrices in terms of measure theory. ∎

Causality Graphs

We next introduce causality graphs, a variant of term graphs [6], to provide a modular representation of Bayesian networks. Nodes play the role of gates of string diagrams; the main difference to port graphs [12, Chapter 5] is the branching structure at output ports, which corresponds to (freely) added commutative comonoid structure. We fix a set of generators  (a.k.a. signature), elements of which can be thought of as blueprints of gates of a certain type; all generators will be of type , which means that each node can be identified with its single output port while it has a certain number of input ports.

Definition 7 (Causality Graph (CG)).

A causality graph (CG) of type is a tuple where

  • is a set of nodes,

  • is a labelling function that assigns a generator to each node ,

  • where is the source function that assigns a sequence of wires to each node such that if ,

  • is the output function that assigns each output port to a wire.

Moreover, the corresponding directed graph (defined by ) has to be acyclic.


By we denote the set of input ports and by the set of output ports. By and we denote the direct predecessors and successors of a node, i.e. and , respectively. By we denote the set of indirect predecessors, using transitive closure. Furthermore denotes the set of all nodes which lie on paths from to .

A wire originates from a single input port or node and each node can feed into several successor nodes and/or output ports. Note that input and output are not symmetric in the context of causality graphs. This is a consequence of the absence of a monoid structure.

We equip CGs with operations of composition and tensor product, identities, and a commutative comonoid structure. We require that the node sets of Bayesian nets are disjoint. (Footnote 1: The case of non-disjoint sets can be handled by a suitable choice of coproducts.)


Whenever , we define with , , , where is defined as follows and extended to sequences: if and if .


Disjoint union is parallel composition, i.e., with , , , where and are defined as follows: if and if . Furthermore if and if .


Finally the constants and generators are as follows. (Footnote 2: A function , where is finite, is denoted by . We denote a function with empty domain by .)

, whenever with type

Finally, all these operations lift to isomorphism classes of CGs.

Proposition 8 ([6]).

CGs quotiented by isomorphism form the freely generated CC-structured PROP over the set of generators , where two causality graphs , , are isomorphic if there is a bijective mapping such that and hold for all and holds for all . (Footnote 3: We apply to a sequence of wires, by applying pointwise and assuming that for .)

Proof sketch.

This follows from the fact that CC-structured PROPs correspond to the gs-monoidal categories (with natural numbers as objects) of [6]. Furthermore CGs are in essence term graphs, where the input ports are called empty nodes. Since [6] shows that term graphs are one-to-one with the arrows of the free gs-monoidal category, our result follows. ∎

In the following, we often decompose a CG into a subgraph and its “context”.

Lemma 9 (Decompositionality of CGs).

Let be a causality graph. Let be a subset of nodes closed with respect to paths, i.e. for all . Then there exist and with for such that , and for all .

Thus, given a set of nodes in a BN that contains all nodes on paths between them, we have the induced subnet of the node set and a suitable “context” such that the whole net can be seen as the result of substitution of the subnet into the “context”.

Modular Bayesian Networks

We will now equip the nodes of causality graphs with matrices, assigning an interpretation to each generator. This fully determines the corresponding matrix of the BN. Note that Bayesian networks as PROPs have earlier been studied in [11, 15, 16].

Definition 10 (Modular Bayesian network (MBN)).

A modular Bayesian network (MBN) is a tuple where is a causality graph and an evaluation function that assigns to every generator with a -matrix . An MBN is called an ordinary Bayesian network (OBN) whenever has no inputs (i.e. ), is a bijection, and every node is associated with a stochastic matrix.

Figure 5: The initial distribution of the CNU from Figure 2 as an MBN.

In an OBN every node corresponds to a random variable and it represents a probability distribution on . OBNs are exactly the Bayesian networks considered in [13].

Example 11.
Figure 5 gives an example of a BN where and . It encodes exactly the probability distribution from Figure 2. Its term representation is where and .

Definition 12 (MBN semantics).

Let be an MBN where the network is of type . The MBN semantics is the matrix with

with where is applied pointwise to sequences.

Intuitively the function assigns boolean values to wires, in a way that is consistent with the input/output values (). For each such assignment, the corresponding entries in the matrices are multiplied. Finally, we sum over all possible assignments.
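The intuition above (assign a boolean value to every wire, multiply the matching matrix entries, sum over all assignments) can be sketched as a brute-force evaluator. The data layout is a hypothetical choice made for this illustration: a wire is either `("in", i)` or `("node", name)`, each node stores its list of source wires together with a table indexed by `(output_bit, input_bits)`, and `outputs` lists the wire attached to each output port. The loop is exponential in the number of wires and only meant to make the definition concrete.

```python
from fractions import Fraction
from itertools import product


def semantics(n_inputs, nodes, outputs):
    """Return {(y, x): value} for all output vectors y and input vectors x."""
    wires = [("in", i) for i in range(n_inputs)] + [("node", v) for v in nodes]
    result = {(y, x): Fraction(0)
              for y in product((0, 1), repeat=len(outputs))
              for x in product((0, 1), repeat=n_inputs)}
    for values in product((0, 1), repeat=len(wires)):
        b = dict(zip(wires, values))  # one boolean assignment to all wires
        x = tuple(b[("in", i)] for i in range(n_inputs))
        y = tuple(b[w] for w in outputs)
        weight = Fraction(1)
        for v, (sources, table) in nodes.items():
            # Entry of the node's matrix selected by this assignment.
            weight *= table[(b[("node", v)], tuple(b[s] for s in sources))]
        result[(y, x)] += weight
    return result


# A two-node network without inputs: node "a" feeds node "b".
nodes = {
    "a": ([], {(1, ()): Fraction(1, 3), (0, ()): Fraction(2, 3)}),
    "b": ([("node", "a")], {(1, (1,)): Fraction(3, 4), (0, (1,)): Fraction(1, 4),
                            (1, (0,)): Fraction(1, 2), (0, (0,)): Fraction(1, 2)}),
}
joint = semantics(0, nodes, outputs=[("node", "a"), ("node", "b")])
print(joint[((1, 1), ())])  # prints: 1/4
```

Because every node here carries a stochastic table and each node is exported exactly once, the result is a probability distribution over the two output bits.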

The semantics is compositional. It is the canonical (i.e., free) extension of the evaluation map from single nodes to the causality graph of an MBN . Here, we rely on two different findings from the literature, namely, the CC-PROP structure of (sub-)stochastic matrices [11] and the characterization of term graphs as the free symmetric monoidal category with graph substitution [6]. The formal details can be found in the appendix, see Lemma 25.

4 Updating Bayesian Networks

We have introduced MBNs as a compact and compositional representation of distributions on markings of a CNU. Coming back to the scenario of knowledge update, we now describe how success and failure of operations requested by the observer affect the MBN. We will first describe how the operations can be formulated as matrix operations that tell us which nodes have to be added to the MBN. We shall see that updated MBNs are in general not OBNs, which makes it harder to interpret and retrieve the encoded distribution. However, we shall show that MBNs can efficiently be reduced to OBNs.

Notation: In this section we will use the following notation: first, we will use variants of the operators/matrices , which have a higher arity (see the definitions in Table 1). Furthermore, we will write for and for . By we denote the zero matrix and set . We also introduce as a notation for the matrix  if (respectively if ).

With we denote a square matrix with entries on the diagonal and zero elsewhere. In particular, we will need the sub-stochastic matrices where and .

Given a bit-vector , we will write respectively to denote the -th entry respectively the sub-sequence from position to position . If we define .

CNU Operations on MBNs

In this section we characterize the operations of Definition 3 as stochastic matrices that can be multiplied with the distribution to perform the update. We describe them as compositions of smaller matrices that can easily be interpreted as changes to an MBN. In the following lemmas, is always a stochastic matrix representing the distribution of markings of a CNU. Furthermore, is a set of places and w.l.o.g. we assume that for some (as otherwise we can use permutations that precede and follow the operations and switch wires as needed).

Starting with the operation (3) recall that it is defined in a way so that the marginal distributions of non-affected places stay the same while the marginals of every single place in report with probability one. The following lemma shows how the matrix for a set operation can be constructed (see Figure 6).

Lemma 13.

It holds that where is if , and otherwise. Moreover, is stochastic.
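One possible reading of this construction is a Kronecker product with one 2x2 factor per place: an identity for untouched places and a "forget the old value, emit the new bit" matrix for places being set. The sketch below follows that reading. The concrete 2x2 matrices, the convention that place 0 is the leftmost bit, and the descending ordering (index 0 is bit 1) are assumptions made for this example, and the names are hypothetical.

```python
from fractions import Fraction
from functools import reduce

ID = [[1, 0], [0, 1]]
SET = {1: [[1, 1], [0, 0]],   # whatever the old value, the place now holds a token
       0: [[0, 0], [1, 1]]}   # whatever the old value, the place is now empty


def tensor(P, Q):
    """Kronecker product."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]


def apply(M, vec):
    """Multiply the matrix M with a distribution stored as a flat vector."""
    return [sum(a * v for a, v in zip(row, vec)) for row in M]


def set_matrix(n, places, b):
    """Stochastic matrix that sets all places in `places` to b among n places."""
    return reduce(tensor, [SET[b] if i in places else ID for i in range(n)])


# Distribution over two places, ordered 11, 10, 01, 00.
P = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
M = set_matrix(2, {0}, 0)   # empty place 0, leave place 1 alone
updated = apply(M, P)       # mass moves to the markings 01 and 00
```

In this example the marginal probability that place 1 holds a token is 5/8 both before and after the update, while place 0 is empty with probability one.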

Figure 6: String diagrams of the updated distributions after , , operations were applied to an initial distribution .

Next, we deal with the operation. Applying it to a distribution is simply a conditioning of on non-emptiness of all places . Intuitively, this means that we keep only entries of for which the condition is satisfied and set all other entries to zero. However, in order to keep the updated a probability distribution, we have to renormalize, which already shows that modelling this operation introduces sub-stochastic matrices to the computation. In the next lemma normalization involves the costly computation of a marginal (the probability that all places in are set to ), however omitting the normalization factor will give us a sub-stochastic matrix and we will later show how such sub-stochastic matrices can be removed, in many cases avoiding the full costs of a marginal computation.

Lemma 14.

It holds that with where is if , and otherwise. We require that . Furthermore if and otherwise.
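The interplay between the sub-stochastic matrix and the normalization factor can be seen in a small sketch. It assumes that asserting a value on a place corresponds to a 2x2 diagonal matrix that keeps the asserted value and zeroes the other one, combined with identities by Kronecker product, with place 0 as the leftmost bit and descending ordering (index 0 is bit 1). These choices and all names are hypothetical.

```python
from fractions import Fraction
from functools import reduce

ID = [[1, 0], [0, 1]]
KEEP = {1: [[1, 0], [0, 0]],   # keep only the entries where the place is marked
        0: [[0, 0], [0, 1]]}   # keep only the entries where the place is empty


def tensor(P, Q):
    """Kronecker product."""
    return [[p * q for p in prow for q in qrow] for prow in P for qrow in Q]


def apply(M, vec):
    """Multiply the matrix M with a distribution stored as a flat vector."""
    return [sum(a * v for a, v in zip(row, vec)) for row in M]


def assert_matrix(n, places, b):
    """Sub-stochastic diagonal matrix selecting markings with value b on `places`."""
    return reduce(tensor, [KEEP[b] if i in places else ID for i in range(n)])


# Distribution over two places, ordered 11, 10, 01, 00.
P = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
unnormalised = apply(assert_matrix(2, {0}, 1), P)
mass = sum(unnormalised)                        # marginal that place 0 is marked
posterior = [x / mass for x in unnormalised]    # renormalized distribution
```

The remaining mass (3/4 here) is exactly the marginal whose computation the text describes as costly, so postponing the division keeps the update local at the price of working with sub-stochastic matrices.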

In contrast to and , the operation couples all involved places in . Asserting that at least one place has no token means that once the observer learns that e.g. one particular place definitely has a token it affects all the other ones. Thus for updating the distribution we have to pass the wires of places through another matrix that removes the possibility of all places containing a token and renormalizes.

Lemma 15.

The following characterization holds: with ( is defined as in Lemma 14). We require that .


An analogous result holds for by using .

The previous lemmas determine how to update an MBN to incorporate the changes to the encoded distribution stemming from the operations on the CNU. We denote the updated MBN by with .

For the operation Lemma 13 shows that we have to add a new node and a new generator for each . We set and , and . Similarly, this holds for the operation with the only difference that the associated matrix for each is (cf. Figure 6).

For the operation Lemma 15 defines a usually larger matrix that intuitively couples the random variables for all places in . We cannot simply add a node to the MBN which evaluates to since nodes in the MBN always have to be of type . However, one can show (see Lemma 18) that for each -matrix, there exists an MBN such that . This can then be appended to which has the same effect as appending a single node with the -matrix.

Simplifying MBNs to OBNs

The characterisations of operations above ensure that updated MBNs correctly evaluate to the updated probability distributions. However, rather than OBNs we obtain MBNs where the complexity of updates is hidden in newly added nodes. Evaluating such MBNs is computationally more expensive because of the additional nodes. Below we show how to simplify the MBN, minimising the number of nodes either after each update or (in a lazy mode) after several updates.

As a first step we provide a lemma that will feature in all following simplifications. It states that every matrix can be expressed by the composition of two matrices.

Lemma 16 (Decomposition of matrices).

Given a matrix of type and a set of outputs – without loss of generality we pick – there exist two matrices and such that


which is visualized in Figure 7. Moreover, the matrices can be chosen so that is stochastic and sub-stochastic. If is stochastic can be chosen to be stochastic as well.

We can now deduce the known special case of arc reversal in OBNs, stated e.g. in [3].

Figure 7: Schematic string diagram depiction of the decomposition of matrices.
Corollary 17 (Arc reversal in OBNs).

Let be an OBN with and two nodes , where is a direct predecessor of , i.e. . Then there exists an OBN with , evaluating to the same probability distribution, where , if and and