Representing uncertain knowledge by probability distributions is the core idea of Bayesian learning. We model the potential of an agent—the observer—interacting with a concurrent system with hidden or uncertain state to gain knowledge through “experimenting” with the system, focussing on the problem of keeping track of knowledge updates correctly and efficiently. Knowledge about states is represented by a probability distribution. Our system models are condition/event nets where states or possible worlds are markings and transitions describe which updates are allowed.
In order to clarify our intentions we consider an application scenario from social media: preventing inadvertent disclosure, which is the concern of location privacy. Consider the example of a social network account, modelled as a condition/event net, allowing a user to update and share their location (see Figure 1). We consider two users. User 1 does not allow location updates to be posted to the social network; they are only recorded on their device. In the net this is represented by places modelling the user at the corresponding locations and by transitions for moving between them. We assume that only User 1 can fire or observe these transitions. User 2 has a similar structure for locations and movements, but allows the network to track their location. The user can decide to make their location public or hide it by firing a corresponding transition. Any observer can attempt to fire query transitions to determine the current location of User 2. If the permission place is marked, this will allow the observer to infer the correct location. Otherwise the observer is left uncertain as to why the query fails, i.e. whether the wrong location was tested or the permission is lacking, unless they test both locations. While our net captures the totality of possible behaviours, we identify different observers: the two users, the social network, and an unrelated observer. For each of these we define which transitions they can access. We then focus on one observer and only allow transitions they are authorised for. In our example, if we want to analyse the unrelated observer, we fix the users’ locations and privacy choices before it is the observer’s turn to query the system.
The observer may have prior knowledge about the dependencies between the locations of Users 1 and 2, for example due to photos with location information published by User 2, in which both users may be identifiable. The prior knowledge is represented in the initial probability distribution, updated according to the observations.
We also draw inspiration from probabilistic databases [27, 1] where the values of attributes or the presence of records are only known probabilistically. However, an update to the database might make it necessary to revise the probabilities. Think for instance of a database where the gender of a person (male or female) is unknown and we assume with a certain probability that they are male. Now a record is entered, stating that the person has married a male. Does it now become more probable that the person is female?
Despite its simplicity, our system model based on condition/event nets allows us to capture databases: the content of a database can be represented as a (hyper-)graph (where each record is a (hyper-)edge). If the nodes of the graph are fixed, updates can be represented by the transitions of a net, where each potential record is represented by a place.
Given a net, the observer does not know the initial marking, but has a prior belief, given by a probability distribution over markings. The observer can try to fire transitions and observe whether the firing is successful or fails. Then the probability distribution is updated accordingly. While the update mechanism is rather straightforward, the problem lies in the huge number of potential states: we have $2^n$ markings if $n$ is the number of places.
To mitigate this state space explosion we use Bayesian networks (BNs), i.e., graphical models that record conditional dependencies of random variables in a compact form. However, we encounter a new problem as updating the observer’s knowledge becomes non-trivial. To do this correctly and efficiently, we develop a compositional approach to BNs based on symmetric monoidal theories and PROPs. In particular, we consider modular Bayesian networks as arrows of a freely generated PROP and (sub-)stochastic matrices as another PROP with a functor from the former to the latter. In this way, we make Bayesian networks compositional and we obtain a graphical formalism that we use to modify Bayesian networks: in particular, we can replace entire subgraphs of Bayesian networks by equivalent ones, i.e., graphs that evaluate to the same matrix. The compositional approach allows us to specify complex updates in Bayesian networks by a sequence of simpler updates using a small number of primitives.
We furthermore describe an implementation and report promising runtime results.
The proofs of all results can be found in Appendix A.
2 Knowledge Update in Condition/Event Nets
We will formalise knowledge updates by means of an extension of Petri nets with probabilistic knowledge on markings. The starting point is the notion of condition/event nets.
Definition 1 (Condition/event net).
A condition/event net (CN) is a five-tuple $N = (S, T, {}^\bullet(\:), (\:)^\bullet, m_0)$ consisting of a finite set of places $S$, a finite set of transitions $T$ with pre-conditions ${}^\bullet t \subseteq S$ and post-conditions $t^\bullet \subseteq S$ for every $t \in T$, and an initial marking $m_0 \subseteq S$. A marking is any subset of places $m \subseteq S$. We assume that ${}^\bullet t \cap t^\bullet = \emptyset$ for any $t \in T$.
A transition $t$ can fire for a marking $m$, denoted $m \xrightarrow{t}$, if ${}^\bullet t \subseteq m$ and $t^\bullet \cap m = \emptyset$. Then marking $m$ is transformed into $m' = (m \setminus {}^\bullet t) \cup t^\bullet$, written $m \xrightarrow{t} m'$. We write $m \xrightarrow{t}$ to indicate that there exists some $m'$ with $m \xrightarrow{t} m'$.
We will use two different notations to indicate that a transition cannot fire, the first referring to the fact that the pre-condition is not sufficiently marked, the second stating that there are tokens in the post-condition: the first applies whenever ${}^\bullet t \not\subseteq m$, the second whenever $t^\bullet \cap m \neq \emptyset$. We denote the set of all markings by $\mathcal{M}$.
For simplicity we assume that $S = \{1, \dots, n\}$ for $n = |S|$. Then, a marking $m$ can be characterized by a boolean vector, i.e., $m \in \{0,1\}^n$. Using the vector notation we write $m[X] = 1$ for $X \subseteq S$ if all places in $X$ are marked in $m$.
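The firing rule of Definition 1 can be sketched in a few lines of Python; this is a minimal illustration with class and function names of our own choosing:

```python
from dataclasses import dataclass

# A marking is a frozenset of places; a transition has pre- and post-sets.
@dataclass(frozen=True)
class Transition:
    pre: frozenset
    post: frozenset

def enabled(m, t):
    # t can fire in m iff all pre-places are marked and no post-place is marked
    return t.pre <= m and not (t.post & m)

def fire(m, t):
    # firing empties the pre-condition and fills the post-condition
    assert enabled(m, t)
    return (m - t.pre) | t.post

t = Transition(pre=frozenset({"a"}), post=frozenset({"b"}))
m0 = frozenset({"a"})
m1 = fire(m0, t)   # the token moves from place "a" to place "b"
```

The `frozen=True` dataclass makes transitions hashable, so they can later serve as dictionary keys when several observers are assigned their authorised transitions.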
To capture the probabilistic observer we augment CNs by a probability distribution over markings modelling uncertainty about the hidden initial or current marking.
Definition 2 (Condition/Event net with Uncertainty).
A Condition/Event Net with Uncertainty (CNU) is a six-tuple $N = (S, T, {}^\bullet(\:), (\:)^\bullet, m_0, p)$ where $(S, T, {}^\bullet(\:), (\:)^\bullet, m_0)$ is a net as in Definition 1. Additionally, $p \colon \mathcal{M} \to [0,1]$ is a function with $\sum_{m \in \mathcal{M}} p(m) = 1$ that assigns a probability mass to each possible marking. This gives rise to a probability space with $P$ defined by $P(M) = \sum_{m \in M} p(m)$ for $M \subseteq \mathcal{M}$.
We assume that $p(m_0) > 0$, i.e. the initial marking is possible according to $p$.
We model the knowledge gained by observers when firing transitions and observing their outcomes. Firing a transition $t$ can either result in success (all places of ${}^\bullet t$ are marked and no place in $t^\bullet$ is marked) or in failure (at least one place of ${}^\bullet t$ is empty or one place in $t^\bullet$ is marked). Thus, there are two kinds of failure, the absence of tokens in the pre-condition or the presence of tokens in the post-condition. If a transition fails for both reasons, the observer will learn only one of them.
To model the knowledge gained we define the following operations on distributions.
Definition 3 (Operations on CNUs).
Given a CNU, the following operations update the mass function $p$ and as a result the probability distribution $P$.
To assert that certain places all contain a token (for $b = 1$) or that none contains a token (for $b = 0$) we define the operation assert.
To state that at least one place of a set does (resp. does not) contain a token we define operation negative assert
Modifying a set of places such that all places contain a token ($b = 1$) or none contains a token ($b = 0$) requires the following operation set.
A successful firing of a transition leads to an assert of the pre-conditions and the post-conditions, followed by a set that empties the pre-condition and fills the post-condition. A failed firing translates to a negative assert of the pre- or post-condition and nothing is set. Thus we define both cases for a transition $t$.
Operations are partial, defined whenever the sum in the denominator of their first clause is greater than $0$. That means, the observer only fires transitions whose pre- and post-conditions have a probability greater than zero, i.e., where according to their knowledge about the state it is possible that these transitions are enabled. By Definition 2 the initial marking is possible, and this property is maintained as markings and distributions are updated. If this assumption is not satisfied, the operations in Definition 3 are undefined.
The assert and negative assert operations result from conditioning the input distribution on (not) having tokens at the respective places (compare Proposition 4). Also, assert and set can be performed iteratively, i.e., place by place. For a single place, negative assert coincides with assert for the complemented value $b$.
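On explicitly stored distributions (feasible only for small nets), the operations of Definition 3 and their combination into an observed successful firing can be sketched as follows; the function names loosely mirror the operations and are our own:

```python
# Distributions are dicts mapping markings (tuples of 0/1 per place) to probabilities.

def normalize(p):
    z = sum(p.values())
    assert z > 0, "operation undefined: conditioning on a zero-probability event"
    return {m: q / z for m, q in p.items()}

def assert_all(p, places, b):
    """assert: condition on every place in `places` having value b."""
    return normalize({m: (q if all(m[i] == b for i in places) else 0.0)
                      for m, q in p.items()})

def assert_some(p, places, b):
    """negative assert: condition on at least one place in `places` having value b."""
    return normalize({m: (q if any(m[i] == b for i in places) else 0.0)
                      for m, q in p.items()})

def set_places(p, places, b):
    """set: force every place in `places` to value b, accumulating mass."""
    out = {}
    for m, q in p.items():
        m2 = tuple(b if i in places else v for i, v in enumerate(m))
        out[m2] = out.get(m2, 0.0) + q
    return out

def observe_success(p, pre, post):
    """A successful firing: assert enabledness, then move the tokens."""
    p = assert_all(p, pre, 1)      # all pre-places marked
    p = assert_all(p, post, 0)     # no post-place marked
    p = set_places(p, pre, 0)      # empty the pre-condition
    return set_places(p, post, 1)  # fill the post-condition

p = {(1, 0): 0.5, (0, 1): 0.5}            # two places, token position unknown
q = observe_success(p, pre={0}, post={1})  # the token is now certainly on place 1
```

Storing the distribution explicitly takes $2^n$ entries, which is exactly the blow-up that motivates the Bayesian-network representation of the following sections.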
Figure 2 gives an example for a Petri net with uncertainty and explains how the observer can update their knowledge by interacting with the net.
We can now show that our operations correctly update the probability assumptions according to the observations of the net.
Proposition 4 (Correctness of the operations). Let $N$ be a CNU where $P$ is the corresponding probability distribution. For given $X \subseteq S$ and $b \in \{0,1\}$, provided that the respective conditioning events are non-empty, the operations of Definition 3 compute the correct updates: assert yields the distribution conditioned on all places of $X$ having value $b$, and negative assert the distribution conditioned on at least one place of $X$ having value $b$.
We shall refer to the joint distribution (over all places) by $P$. Note that it is infeasible to explicitly store it if the number of places is large. To mitigate this problem we use a Bayesian network with a random variable for each place, recording dependencies between the presence of tokens in different places. If such dependencies are local, the BN is often moderate in size and thus provides a compact symbolic representation. However, updating the joint distribution of BNs is non-trivial. To address this problem, we propose a propagation procedure based on a term-based, modular representation of BNs.
3 Modular Bayesian Networks and Sub-Stochastic Matrices
Bayesian networks (BNs) are a graphical formalism to reason about probability distributions. They are visualized as directed acyclic graphs whose nodes are random variables and whose edges represent dependencies between them.
Such a representation is sufficient for static BNs, whose most common operation is the inference of (marginalized or conditional) distributions of the underlying joint distribution.
For a rewriting calculus on dynamic BNs, we consider a modular representation of networks that encode not only a single probability vector but a matrix, with several input and output ports. The first aim is compositionality: larger nets can be composed from smaller ones via sequential and parallel composition, which correspond to matrix multiplication and the Kronecker product of the encoded matrices. This means we can implement the operations of Section 2 in a modular way.
PROPs with Commutative Comonoid Structure
We now describe the underlying compositional structure of (modular) BNs and (sub-)stochastic matrices, which facilitates a compositional computation of the underlying probability distribution of (modular) BNs. The mathematical structure is that of PROPs (see also [12, Chapter 5.2]), i.e., strict symmetric monoidal categories whose objects are (in bijection with) the natural numbers, with monoidal product as (essentially) addition, with unit $0$. The compositional structure of PROPs can be intuitively represented using string diagrams with wires and boxes (see Figure 3). Symmetries serve for the reordering of wires.
Figure 3: String diagrammatic composition (resp. tensor) of two arrows of a PROP.
A paradigmatic example is the PROP of finite-dimensional Euclidean spaces and linear maps, equipped with the tensor product: the tensor product of $n$- and $m$-dimensional spaces is $(n \cdot m)$-dimensional, composition of linear maps amounts to matrix multiplication, and the tensor product is also known as the Kronecker product (as detailed below). We refer to the natural numbers of the domain and codomain of arrows in a PROP as their type; thus, a linear map from $n$- to $m$-dimensional Euclidean space has type $(n, m)$.
We shall have the additional structure on symmetric monoidal categories that was dubbed graph substitution in work on term graphs, which amounts to a commutative comonoid structure on PROPs.
Definition 5 (PROPs with commutative comonoid structure).
To give another, more direct definition, the arrows of a freely generated CC-structured PROP can be represented as terms over some set of generators $\Sigma$ and the constants identity, symmetry, duplicator and terminator, combined with the operators sequential composition ($\circ$) and tensor ($\otimes$) and quotiented by the axioms in Table 1. This table also lists the definition of operators of higher arity. We often refer to the comultiplication and its counit as duplicator and terminator, respectively (cf. Figure 4). Roughly, adding the commutative comonoid structure amounts to the possibility of having several or no connections to each output port of a gate and to each input port. In other words, outputs can be shared.
We now consider (sub-)stochastic matrices as an instance of a CC-structured PROP. A matrix of type $(m, n)$ is a matrix of dimension $2^n \times 2^m$ with entries taken from the closed interval $[0, 1]$. We restrict attention to sub-stochastic matrices, i.e., column sums will be at most $1$; if we require equality, we obtain stochastic matrices.
We index matrices over bit-vectors, i.e., for $M$ of type $(m, n)$, $x \in \{0,1\}^m$ and $y \in \{0,1\}^n$, the corresponding entry is denoted by $M(y \mid x)$. We use this notation to evoke the idea of conditional probability (the probability that the first index is equal to $y$ whenever the second index is equal to $x$). When we write $M$ as a matrix, the rows/columns are ordered according to a descending sequence of binary numbers ($1\cdots1$ first, $0\cdots0$ last).
Sequential composition is matrix multiplication, i.e., given $M_1$ of type $(m, n)$ and $M_2$ of type $(n, k)$, we define $M_2 \circ M_1 = M_2 \cdot M_1$, which is a $2^k \times 2^m$-matrix. The tensor is given by the Kronecker product, i.e., given $M_i$ of type $(m_i, n_i)$ for $i \in \{1, 2\}$, we define $M_1 \otimes M_2$ of type $(m_1 + m_2, n_1 + n_2)$ as $(M_1 \otimes M_2)(y_1 y_2 \mid x_1 x_2) = M_1(y_1 \mid x_1) \cdot M_2(y_2 \mid x_2)$ where $x_i \in \{0,1\}^{m_i}$, $y_i \in \{0,1\}^{n_i}$.
The constants are defined as follows; in more detail, the constant matrices can be spelled out thus:
the terminator $!$ is the unique stochastic $1 \times 2$-matrix, i.e., $! = (1 \;\, 1)$;
the identity $\mathsf{id}$ is the $2 \times 2$ identity matrix, i.e., $\mathsf{id}(y \mid x) = 1$ iff $y = x$ (otherwise $0$);
the symmetry $\sigma$ satisfies $\sigma(y_1 y_2 \mid x_1 x_2) = 1$ iff $y_1 y_2 = x_2 x_1$ (otherwise $0$);
the duplicator $\Delta$ satisfies $\Delta(y_1 y_2 \mid x) = 1$ iff $y_1 = x$ and $y_2 = x$ (otherwise $0$).
Higher-arity versions are obtained as in Table 1 for every arity $n$.
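These constants and the comonoid laws they satisfy can be checked mechanically. The following numpy sketch uses the descending binary ordering of the text ($1\cdots1$ first); the matrix names `ID`, `TERM`, `DUP`, `SYM` are our own:

```python
import numpy as np

ID = np.eye(2)                           # identity, type (1, 1)
TERM = np.array([[1.0, 1.0]])            # terminator, the unique stochastic (1, 0)-matrix
DUP = np.array([[1.0, 0.0], [0.0, 0.0],
                [0.0, 0.0], [0.0, 1.0]]) # duplicator: input 1 -> output 11, input 0 -> 00
SYM = np.eye(4)[[0, 2, 1, 3]]            # symmetry: swaps the two wires (10 <-> 01)

def seq(a, b):
    """Sequential composition 'a then b' is matrix multiplication."""
    return b @ a

def tensor(a, b):
    """Parallel composition is the Kronecker product."""
    return np.kron(a, b)

# Commutative comonoid laws:
assert np.allclose(seq(DUP, tensor(TERM, ID)), ID)   # counit law
assert np.allclose(seq(DUP, SYM), DUP)               # cocommutativity
assert np.allclose(seq(DUP, tensor(DUP, ID)),
                   seq(DUP, tensor(ID, DUP)))        # coassociativity
```

Note that the Kronecker product respects the chosen descending row/column order, so the equations hold on the nose, not just up to permutation.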
Proposition 6.
(Sub-)stochastic matrices form a CC-structured PROP.
We next introduce causality graphs, a variant of term graphs, to provide a modular representation of Bayesian networks. Nodes play the role of gates of string diagrams; the main difference to port graphs [12, Chapter 5] is the branching structure at output ports, which corresponds to the (freely) added commutative comonoid structure. We fix a set of generators $\Sigma$ (a.k.a. signature), elements of which can be thought of as blueprints of gates of a certain type; all generators will be of type $(k, 1)$, which means that each node can be identified with its single output port while it has a certain number of input ports.
Definition 7 (Causality Graph (CG)).
A causality graph (CG) of type $(m, n)$ is a tuple $G = (V, \mathit{lab}, \mathit{src}, \mathit{out})$ where
$V$ is a set of nodes,
$\mathit{lab} \colon V \to \Sigma$ is a labelling function that assigns a generator to each node $v \in V$,
$\mathit{src}$ is the source function that assigns a sequence of wires (input ports or nodes) to each node $v$ such that the length of $\mathit{src}(v)$ is $k$ if $\mathit{lab}(v)$ has type $(k, 1)$,
$\mathit{out}$ is the output function that assigns each of the $n$ output ports to a wire.
Moreover, the corresponding directed graph (with an edge from each wire in $\mathit{src}(v)$ to $v$) has to be acyclic.
By $\mathit{in}(G)$ we denote the set of input ports and by $\mathit{out}(G)$ the set of output ports. By $\mathit{pre}(v)$ and $\mathit{suc}(v)$ we denote the direct predecessors and successors of a node, i.e. the nodes occurring in $\mathit{src}(v)$ and the nodes $w$ with $v$ occurring in $\mathit{src}(w)$, respectively. By $\mathit{Pre}(v)$ we denote the set of indirect predecessors, using transitive closure. Furthermore $\mathit{Path}(u, v)$ denotes the set of all nodes which lie on paths from $u$ to $v$.
A wire originates from a single input port or node and each node can feed into several successor nodes and/or output ports. Note that input and output are not symmetric in the context of causality graphs. This is a consequence of the absence of a monoid structure.
We equip CGs with operations of composition and tensor product, identities, and a commutative comonoid structure. We require that the node sets of the composed causality graphs are disjoint. (The case of non-disjoint sets can be handled by a suitable choice of coproducts.)
Whenever the output type of $G_1$ matches the input type of $G_2$, we define the sequential composition $G_2 \circ G_1$: its node set is the disjoint union $V_1 \uplus V_2$ and labels and source functions are inherited, except that every input port of $G_2$ occurring as a wire is replaced by the wire that the output function of $G_1$ assigns to the corresponding output port; this replacement is extended pointwise to sequences. The output function is that of $G_2$, with input ports replaced in the same way.
Disjoint union is parallel composition, i.e., for $G_1 \otimes G_2$ the node set is $V_1 \uplus V_2$ and labels and source functions are inherited, where the input ports of $G_2$ are shifted past those of $G_1$. Likewise, the output function assigns the output ports stemming from $G_1$ according to the output function of $G_1$ and those stemming from $G_2$ according to the (shifted) output function of $G_2$.
Finally, the constants (identities, symmetries, duplicators and terminators) are the evident causality graphs without nodes, and every generator $g \in \Sigma$ of type $(k, 1)$ yields a one-node graph: its single node is labelled $g$, its source sequence consists of the $k$ input ports, and the node itself is the wire assigned to the unique output port.
Finally, all these operations lift to isomorphism classes of CGs.
Proposition 8.
CGs quotiented by isomorphism form the freely generated CC-structured PROP over the set of generators $\Sigma$, where two causality graphs $G_1$, $G_2$ are isomorphic if there is a bijective mapping $\varphi \colon V_1 \to V_2$ such that $\mathit{lab}_2(\varphi(v)) = \mathit{lab}_1(v)$ and $\mathit{src}_2(\varphi(v)) = \varphi(\mathit{src}_1(v))$ hold for all $v \in V_1$, and $\mathit{out}_2(o) = \varphi(\mathit{out}_1(o))$ holds for all output ports $o$. (We apply $\varphi$ to a sequence of wires by applying it pointwise and assuming that $\varphi$ is the identity on input ports.)
This follows from the fact that CC-structured PROPs correspond to the gs-monoidal categories (with natural numbers as objects). Furthermore, CGs are in essence term graphs, where the input ports are called empty nodes. Since term graphs are known to be one-to-one with the arrows of the free gs-monoidal category, our result follows. ∎
In the following, we often decompose a CG into a subgraph and its “context”.
Lemma 9 (Decompositionality of CGs).
Let $G$ be a causality graph and let $U \subseteq V$ be a subset of nodes closed with respect to paths, i.e. all nodes on paths between nodes of $U$ are themselves in $U$. Then there exist causality graphs $C$ (the “context”) and $G_U$ (the subgraph induced by $U$) such that $G$ arises by substituting $G_U$ into $C$, the nodes of $G_U$ are exactly those in $U$, and the nodes of $C$ are exactly those in $V \setminus U$.
Thus, given a set of nodes in a BN that contains all nodes on paths between them, we have the induced subnet of the node set and a suitable “context” such that the whole net can be seen as the result of substitution of the subnet into the “context”.
Modular Bayesian Networks
We will now equip the nodes of causality graphs with matrices, assigning an interpretation to each generator. This fully determines the corresponding matrix of the BN. Note that Bayesian networks as PROPs have earlier been studied in [11, 15, 16].
Definition 10 (Modular Bayesian network (MBN)).
A modular Bayesian network (MBN) is a tuple $M = (G, \mathit{ev})$ where $G$ is a causality graph and $\mathit{ev}$ an evaluation function that assigns to every generator $g \in \Sigma$ with type $(k, 1)$ a $2 \times 2^k$-matrix $\mathit{ev}(g)$. An MBN is called an ordinary Bayesian network (OBN) whenever $G$ has no inputs (i.e. it is of type $(0, n)$), the output function is a bijection, and every node is associated with a stochastic matrix.
In an OBN every node corresponds to a random variable and the network represents a probability distribution on $\{0,1\}^V$. OBNs are exactly the Bayesian networks considered in the literature.
Definition 12 (MBN semantics).
Let $M = (G, \mathit{ev})$ be an MBN where the network $G$ is of type $(m, n)$. The MBN semantics is the matrix $\llbracket M \rrbracket$ of type $(m, n)$ with
$$\llbracket M \rrbracket(y \mid x) \;=\; \sum_{\Phi} \, \prod_{v \in V} \mathit{ev}(\mathit{lab}(v))\big(\Phi(v) \mid \Phi(\mathit{src}(v))\big),$$
where the sum ranges over all assignments $\Phi$ of boolean values to wires consistent with the input values $x$ and the output values $y$, and where $\Phi$ is applied pointwise to sequences.
Intuitively the function $\Phi$ assigns boolean values to wires, in a way that is consistent with the input/output values. For each such assignment, the corresponding entries in the matrices are multiplied. Finally, we sum over all possible assignments.
The semantics is compositional: it is the canonical (i.e., free) extension of the evaluation map from single nodes to the causality graph of an MBN. Here, we rely on two different findings from the literature, namely, the CC-PROP structure of (sub-)stochastic matrices and the characterization of term graphs as the free symmetric monoidal category with graph substitution. The formal details can be found in the appendix, see Lemma 25.
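The sum-product semantics of Definition 12 can be prototyped by enumerating all boolean assignments to wires. The following brute-force sketch is exponential and serves only to make the semantics concrete; the encoding conventions (dict-based graphs, ascending binary indices) are our own:

```python
import itertools
import numpy as np

def evaluate(nodes, ev, outputs, n_in):
    """Brute-force MBN semantics: sum over wire assignments, product over nodes.
    Wires are node ids or input ports encoded as ("in", i)."""
    sem = np.zeros((2 ** len(outputs), 2 ** n_in))
    ids = list(nodes)
    for x in itertools.product([0, 1], repeat=n_in):
        xcol = sum(b * 2 ** i for i, b in enumerate(x))
        for vals in itertools.product([0, 1], repeat=len(ids)):
            phi = dict(zip(ids, vals))
            val = lambda w: x[w[1]] if isinstance(w, tuple) else phi[w]
            weight = 1.0
            for v, (lab, srcs) in nodes.items():
                col = sum(val(s) * 2 ** i for i, s in enumerate(srcs))
                weight *= ev[lab][phi[v], col]
            row = sum(val(w) * 2 ** i for i, w in enumerate(outputs))
            sem[row, xcol] += weight
    return sem

# Two-node ordinary BN: A has a prior, B depends on A; both are outputs.
ev = {"prior": np.array([[0.6], [0.4]]),            # P(A=0)=0.6, P(A=1)=0.4
      "cond":  np.array([[0.9, 0.2], [0.1, 0.8]])}  # columns: A=0, A=1; rows: B=0, B=1
nodes = {"A": ("prior", []), "B": ("cond", ["A"])}
joint = evaluate(nodes, ev, outputs=["A", "B"], n_in=0)
# joint[a + 2*b, 0] = P(A=a, B=b), e.g. P(A=1, B=1) = 0.4 * 0.8 = 0.32
```

Efficient BN inference avoids this enumeration by exploiting the factorized structure, which is precisely what the compositional treatment of the following section enables.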
4 Updating Bayesian Networks
We have introduced MBNs as a compact and compositional representation of distributions on markings of a CNU. Coming back to the scenario of knowledge update, we now describe how success and failure of operations requested by the observer affect the MBN. We will first describe how the operations can be formulated as matrix operations that tell us which nodes have to be added to the MBN. We shall see that updated MBNs are in general not OBNs, which makes it harder to interpret and retrieve the encoded distribution. However, we shall show that MBNs can efficiently be reduced to OBNs.
Notation: In this section we will use higher-arity variants of the operators/matrices introduced above (see the definitions in Table 1), as well as the zero matrix and diagonal matrices selecting the cases $b = 1$ respectively $b = 0$. With $\mathit{diag}(d)$ we denote a square matrix with the entries of $d$ on the diagonal and zero elsewhere; in particular, we will need sub-stochastic diagonal matrices whose diagonal entries are taken from $\{0, 1\}$. Given a bit-vector $x$, we will write $x_i$ respectively $x_{i..j}$ to denote the $i$-th entry respectively the sub-sequence from position $i$ to position $j$; if $i > j$ we define $x_{i..j}$ to be the empty sequence.
CNU Operations on MBNs
In this section we characterize the operations of Definition 3 as stochastic matrices that can be multiplied with the distribution to perform the update. We describe them as compositions of smaller matrices that can easily be interpreted as changes to an MBN. In the following lemmas, $P$ is always a stochastic matrix representing the distribution of markings of a CNU. Furthermore, $X \subseteq S$ is a set of places and w.l.o.g. we assume that $X = \{1, \dots, k\}$ for some $k$ (as otherwise we can use permutations that precede and follow the operations and switch wires as needed).
Starting with the set operation, recall that it is defined in a way so that the marginal distributions of non-affected places stay the same, while the marginal of every single place in $X$ reports the value $b$ with probability one. The following lemma shows how the matrix for a set operation can be constructed (see Figure 6).
It holds that the set operation is given by a tensor product of single-place matrices, where the matrix for place $i$ forces the value $b$ if $i \in X$, and is the identity otherwise. Moreover, the resulting matrix is stochastic.
Next, we deal with the assert operation. Applying it to a distribution is simply a conditioning of the distribution on non-emptiness of all places in $X$. Intuitively, this means that we keep only entries for which the condition is satisfied and set all other entries to zero. However, in order to keep the updated distribution a probability distribution, we have to renormalize, which already shows that modelling this operation introduces sub-stochastic matrices into the computation. In the next lemma normalization involves the costly computation of a marginal (the probability that all places in $X$ are set to $b$); however, omitting the normalization factor will give us a sub-stochastic matrix, and we will later show how such sub-stochastic matrices can be removed, in many cases avoiding the full cost of a marginal computation.
It holds that the unnormalized assert is given by a tensor product of single-place matrices, where the matrix for place $i$ selects the value $b$ if $i \in X$, and is the identity otherwise. We require that the probability of the conditioning event is greater than $0$; dividing by it yields the normalized assert.
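The effect described above can be made concrete with a small numpy sketch (our own encoding, using the descending binary order of Section 3): conditioning without renormalization is multiplication by a 0/1 diagonal matrix, which is sub-stochastic, and renormalization requires the marginal probability of the observed event.

```python
import numpy as np

def keep_token(n, i):
    """0/1 diagonal matrix keeping only markings of an n-place net in which
    place i (0-based) carries a token; rows/columns in descending binary order."""
    d = np.zeros((2 ** n, 2 ** n))
    for r in range(2 ** n):
        v = 2 ** n - 1 - r               # marking encoded by row r (row 0 is 1...1)
        if (v >> (n - 1 - i)) & 1:       # bit of place i
            d[r, r] = 1.0
    return d

p = np.array([0.2, 0.3, 0.1, 0.4])       # distribution over markings 11, 10, 01, 00
q = keep_token(2, 0) @ p                  # unnormalized conditioning on "place 0 marked"
marginal = q.sum()                        # probability of the event, here 0.5
cond = q / marginal                       # the renormalized (assert) distribution
```

The matrix returned by `keep_token` has column sums at most $1$, so it is sub-stochastic but in general not stochastic, which illustrates why the calculus must accommodate sub-stochastic nodes.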
In contrast to set and assert, the negative assert operation couples all involved places in $X$. Asserting that at least one place has no token means that once the observer learns that, e.g., one particular place definitely has a token, this affects all the other ones. Thus for updating the distribution we have to pass the wires of the places through another matrix that removes the possibility of all places containing a token and renormalizes.
The following characterization holds: the negative assert is obtained by composing the wires of the places in $X$ with a matrix that zeroes out the single excluded assignment and renormalizes ($D$ is defined as in Lemma 14). We require that the probability of the conditioning event is greater than $0$.
An analogous result holds for the dual negative assert by exchanging the roles of $0$ and $1$.
The previous lemmas determine how to update an MBN to incorporate the changes to the encoded distribution stemming from the operations on the CNU; the updated MBN is obtained by appending the corresponding matrices to the network.
For the set operation Lemma 13 shows that we have to add a new node and a new generator for each place in $X$, labelled with the corresponding single-place matrix. Similarly, this holds for the assert operation, with the only difference that the associated matrix for each place is sub-stochastic (cf. Figure 6).
For the negative assert operation Lemma 15 defines a usually larger matrix that intuitively couples the random variables for all places in $X$. We cannot simply add a node to the MBN which evaluates to this matrix, since nodes in the MBN always have to be of type $(k, 1)$. However, one can show (see Lemma 18) that for each such matrix there exists an MBN evaluating to it. This can then be appended to the network, which has the same effect as appending a single node with the matrix.
Simplifying MBNs to OBNs
The characterisations of operations above ensure that updated MBNs correctly evaluate to the updated probability distributions. However, rather than OBNs we obtain MBNs where the complexity of updates is hidden in newly added nodes. Evaluating such MBNs is computationally more expensive because of the additional nodes. Below we show how to simplify the MBN, minimising the number of nodes either after each update or (in a lazy mode) after several updates.
As a first step we provide a lemma that will feature in all following simplifications. It states that every matrix can be expressed by the composition of two matrices.
Lemma 16 (Decomposition of matrices).
Given a matrix $M$ of type $(m, n)$ and a set of outputs – without loss of generality we pick the first $k$ outputs – there exist two matrices $A$ and $B$ such that $M$ factors through them, which is visualized in Figure 7. Moreover, the matrices can be chosen so that $A$ is stochastic and $B$ sub-stochastic. If $M$ is stochastic, $B$ can be chosen to be stochastic as well.
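One standard construction achieving such a factorization is the chain rule for conditional probabilities: marginalize out one group of outputs to obtain $B$, and take the corresponding conditionals as $A$. The lemma's proof may differ in details; the sketch below handles the small case of one input and two outputs, with ascending binary indices for readability:

```python
import numpy as np

def factor(M):
    """Split M(y1 y2 | x), a 4x2 matrix with row index 2*y1 + y2, into
    A(y1 | y2 x) stochastic and B(y2 | x) sub-stochastic such that
    M(y1 y2 | x) = A(y1 | y2 x) * B(y2 | x)."""
    A = np.zeros((2, 4))
    B = np.zeros((2, 2))
    for x in range(2):
        for y2 in range(2):
            mass = M[y2, x] + M[2 + y2, x]     # marginalize out y1
            B[y2, x] = mass
            for y1 in range(2):
                # conditional of y1 given (y2, x); arbitrary (uniform) if mass is 0
                A[y1, 2 * y2 + x] = M[2 * y1 + y2, x] / mass if mass > 0 else 0.5
    return A, B

M = np.array([[0.1, 0.25], [0.2, 0.25], [0.3, 0.25], [0.4, 0.25]])
A, B = factor(M)
```

Since $A$ consists of conditional distributions it is stochastic by construction, while $B$ inherits the column sums of $M$ and is therefore sub-stochastic in general and stochastic whenever $M$ is, matching the lemma's claim.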
We can now deduce the known special case of arc reversal in OBNs, stated e.g. in the literature on Bayesian networks.
Corollary 17 (Arc reversal in OBNs).
Let $B$ be an OBN with two nodes $v_1, v_2$, where $v_1$ is a direct predecessor of $v_2$. Then there exists an OBN evaluating to the same probability distribution in which the arc between $v_1$ and $v_2$ is reversed, i.e., $v_2$ becomes a direct predecessor of $v_1$.