Real-world systems often contain asymmetric state spaces and do not admit a natural product structure. To model such systems with traditional graphical models such as Bayesian Networks (BNs), including their various static and dynamic variants, a set of distinct measurement variables must first be elicited. For many asymmetric processes, this is not always easy, natural or defensible. Forcing a description based on random variables onto such systems results in a model that stores many redundant structural zeros in its conditional probability tables (CPTs). For instance, a variable describing the post-operative health condition of a patient does not make sense if the patient died during the operation.
Additionally, asymmetric systems often exhibit context-specific conditional independences, where the independence relationship depends on the realisations of the conditioning set. Modifications of the BN have been proposed to embody context-specific independences (e.g., Geiger and Heckerman (1996); Boutilier et al. (1996); Poole and Zhang (2003)), often using tree-like adaptations of the graph or of the CPTs.
Chain Event Graphs (CEGs) (Collazo et al., 2018) address both of these shortcomings of BNs for asymmetric processes. CEGs are built from event trees, which provide an excellent and natural framework for describing the evolution of such processes (Shafer, 1996). Through a series of transformations involving the colouring of its vertices, an event tree is transformed into the more concise graph of a CEG. The topology of the CEG describes the partial or complete symmetries among the different ways in which a process might evolve. This allows conditional independences to be read from the graph and supports causal exploration; moreover, the graphical properties of the CEG can be drawn directly from natural language descriptions provided by domain experts before the tree is populated with probabilities.
Example 1: The hypothetical event tree in Figure 1 shows the different infection strains, treatment alternatives and outcomes for an individual (at vertex ) who shows the first symptoms of a particular infection.
In this paper, we look at the continuous time DCEG (CT-DCEG), which is inspired by the flexibility offered by semi-Markov processes (SMPs) in modelling non-exponentially distributed holding times at the various states. First introduced as the extended DCEG in Barclay et al. (2015), this class was developed through a special case called the reduced DCEG, which was applied to modelling public health interventions (Shenvi and Smith, 2019) and criminal investigations (Bunnin and Smith, 2019).
CT-DCEGs differ from dynamic BNs (DBNs) (Nicholson and Brady, 1994) for the reasons stated above and also because DBNs model discrete time processes. A continuous time analogue of BNs is given by continuous time BNs (CTBNs) (Nodelman et al., 2002). CTBNs were inspired by Markov processes and represent the dynamics of structured multi-component processes. The holding times at their vertices are restricted to exponential distributions (except in Nodelman et al. (2005b)). The major difference between CT-DCEGs and CTBNs is that the former describes the possible life-histories of a unit within a temporal process, while the latter describes a temporal process through the evolution of the components that make up the process. Hence a CTBN may have co-evolving variables, whereas in a CT-DCEG a unit can occupy only one state at a time and the model describes its evolution from that state. Secondly, exact inference in CTBNs is exponential in the number of components, and approximate inference techniques are either (a) variational (Nodelman et al., 2005a; Saria et al., 2007; Cohn et al., 2009), typically using expectation propagation, or (b) stochastic approximations (Fan and Shelton, 2008; El-Hay et al., 2008; Fan et al., 2010); in both cases the approximations exploit the exponential holding time constraint of such models. Lastly, under plausible assumptions, the transition probabilities and holding times can be modelled independently in a CT-DCEG.
Interestingly, a visionary but currently underdeveloped class of graphical models called the Temporal Nodes BN (TNBN) (Arroyo-Figueroa and Sucar, 1999) has several similarities with the CT-DCEG. Although some properties of propagation in a TNBN are intuitively similar to those in a CT-DCEG, they are presented informally, with no formal justification. Besides, point temporal evidence cannot be propagated through a TNBN (Galán and Díez, 2002). Another important difference between TNBNs and CT-DCEGs is that the former, because it uses BN semantics, has to assume that each effect is caused by only one of its parents. The CT-DCEG circumvents this problem through its event tree construction and the colourings of its vertices.
In Section 2 we review the terminology and framework of the CT-DCEG class. Developing the new inference scheme for CT-DCEGs proposed in this paper revealed a gap in the existing literature, which we fill in Section 3 by presenting, for the first time, a static continuous time CEG (CT-CEG) and proving an extension of the standard propagation algorithm (Thwaites et al., 2008) for this class. Our inference scheme for propagating evidence in CT-DCEGs, inspired by the scheme in Kjærulff (1992) for DBNs, is presented in Section 4. It splits the CT-DCEG into three sub-models, two of which have linear time complexity for propagation. Although we apply it here in a continuous time setting, the scheme works in exactly the same way for DCEGs with discrete time domains. In Section 5 we explore what we believe to be a highly applicable novel class of models, the mixed CEGs, whose vertices are partitioned into a set for which holding times are meaningful and a set for which they are not. We briefly discuss how the methods developed in this paper extend to this new class.
2 The Continuous Time DCEG
2.1 Framework and Terminology
A CT-DCEG consists of a graph describing the possible developments of a process. It exploits the symmetries within these developments and so provides a concise representation. It also facilitates the exploration of the underlying dependences and associates a well-defined probability model with the graph. Similar to a CEG, a CT-DCEG is constructed from an event tree, albeit one which has a continuous time domain. For clarity we have chosen to use notation as close as possible to that of Thwaites et al. (2008).
An event tree is a directed graph representing the evolution of a process in continuous time with a (possibly infinite) vertex set and edge set . The set of non-leaf vertices, called situations, is represented by . Without loss of generality, we assume that the time spent by a unit in a situation may depend on the situation it visits next, say . Hence, this conditional holding time can be associated with the edge which goes from to . We denote this by the variable . The variable indicates the unconditional holding time in situation .
Symmetries within the event tree are expressed through vertex and edge colourings associated with partitions called stages and clusters. Two situations are in the same stage when they can be hypothesised to share the same conditional transition distribution. For the purposes of this paper we also require that, for situations in the same stage, edges with the same estimated probability share the same edge label; however, this is not essential. In fact, the edges may be retrospectively labelled to explain the symmetries observed, which is one of the key features of this family of models. Similarly, two edges are in the same cluster when they share the same holding time distribution. The vertices and edges of the event tree are coloured according to their stage and cluster memberships respectively. Such a coloured tree is referred to as a hued tree. So the stages indicate equivalences in where a unit goes, and the clusters indicate equivalences in the speed of these transitions. Table 1 gives the non-trivial stages and clusters for the finite event tree in Example 1.
An additional partition called positions defines the vertex set of the resultant CT-DCEG. Two situations in the hued tree are in the same position if and only if the subtrees rooted at these situations are isomorphic to each other where the isomorphism preserves structure and colouring. We denote the set of positions in by .
From a hued tree representation, we obtain a CT-DCEG by coalescing situations in the same position into a single vertex and by collapsing all its leaves into a single sink vertex . Hence . Note that . Only the nodes which are in the same stage but not the same position retain their colouring in the CT-DCEG. Here we focus on CT-DCEGs with a finite representation. However, this is not necessary for our proposed scheme as the graph is first “unrolled” (see Section 2.2) to propagate evidence. This gives our scheme the flexibility of adapting easily if and when the structure of the graph needs to be partially or fully changed at a later time.
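To make the notion of a position concrete, the following Python sketch groups the situations of a small, finite hued tree by computing a canonical signature of each coloured subtree: two situations receive the same signature exactly when their subtrees match edge label by edge label, with the same stage colours, cluster colours and shape. The tree encoding (a dict from situation to `(edge_label, child)` pairs), the colour dictionaries and the function name `positions` are hypothetical choices made for illustration; matching edges by label relies on the labelling convention for stages adopted above.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a finite hued tree: situation -> [(edge_label, child)].
# Vertices that never appear as keys are leaves.
Tree = Dict[str, List[Tuple[str, str]]]


def positions(tree: Tree, stage: Dict[str, str],
              cluster: Dict[Tuple[str, str], str]) -> Dict[str, int]:
    """Map each situation to a position id; equal ids mean the subtrees rooted
    at the two situations are isomorphic, preserving structure and colouring."""
    signature_ids: Dict[tuple, int] = {}
    memo: Dict[str, int] = {}

    def signature(v: str) -> int:
        if v not in memo:
            children = tree.get(v, [])
            if not children:
                sig: tuple = ("leaf",)  # all leaves collapse into the sink
            else:
                # Edges are matched by label, so sorting by label makes the
                # signature independent of the order in which edges are listed.
                sig = (stage[v],) + tuple(sorted(
                    (label, cluster[(v, child)], signature(child))
                    for label, child in children))
            memo[v] = signature_ids.setdefault(sig, len(signature_ids))
        return memo[v]

    return {v: signature(v) for v in tree}
```

Coalescing each class of equal ids into one vertex, and sending every leaf to a single sink, gives the vertex set of the graph. The sketch only handles finite trees; for an infinite tree the repetition has to be specified directly rather than discovered by recursion.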
Example 2: Suppose we consider reinfection, by the same or a different strain, for individuals who were not hospitalised. Assume that recovery from a particular strain does not confer any added or reduced resistance to that strain or to the other strains. This gives an infinite event tree, fragments of which are represented in Figure 1. The repetition in structure and probabilities results in the CT-DCEG in Figure 2, where returns are represented by backward arrows labelled “recovered”.
These backward edges, which represent a repetition of structure in the underlying hued tree, are called cyclic edges. Additionally, we define passage-slices, which play the role of time-slices in this continuous time setting. The first passage-slice is a subgraph of the CT-DCEG starting at its root and following all possible developments of a unit until it either arrives at or reaches the vertex from which it traverses a cyclic edge (e.g. in Figure 2). The subsequent passage-slices are collections of subgraphs of the CT-DCEG such that each subgraph is rooted at a vertex into which a cyclic edge from the preceding passage-slice enters, . The termination of each subgraph is determined as described for above. Thus, the cyclic edges connect the passage-slices. In practice, the time interval of a passage-slice can be arbitrarily defined. In our example, all the passage-slices are identical.
The event tree notation introduced earlier extends in an obvious way to CT-DCEGs. Transition probabilities in a CT-DCEG can be written as the probability of an event defined using the set of its root-to-sink paths, . The probability of reaching a position is given by where is the union of all paths in passing through . Similarly, the probability of passing through is given by where is the union of all paths in passing through the edge . The holding time density of staying at position for time before transitioning along edge is denoted by . Let and .
2.2 Unrolling a CT-DCEG
Denote by the set of cyclic edges connecting passage-slice to , . Any DCEG can be “unrolled” into an infinitely large CEG (Collazo and Smith, 2018; Shenvi and Smith, 2019) by connecting its passage-slices with the corresponding cyclic edges. This is analogous to unrolling a DBN.
Denote by a CT-DCEG which has been unrolled from passage-slices to , for . All the edges in are collected into a sink node . The unrolling process may result in multiple sink nodes, which are then merged into a single sink node. Figure 3 shows a pictorial description of this process.
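The unrolling step can be sketched in Python for a finite DCEG given as an edge list with a designated set of cyclic edges. This is a minimal illustration under our own assumptions: every passage-slice is treated as a copy of the non-cyclic part of the graph (as in Example 2, where all passage-slices are identical), each vertex copy is tagged with its slice index, only copies reachable from the root are kept, and the function name `unroll` and the merged-sink tag `(sink, -1)` are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]          # (source, target, label)
Node = Tuple[str, int]               # (vertex, passage-slice index)


def unroll(edges: List[Edge], cyclic: Set[Edge], root: str, sink: str,
           k: int) -> List[Tuple[Node, Node, str]]:
    """Unroll a finite DCEG into passage-slices 0, ..., k - 1."""
    merged_sink: Node = (sink, -1)   # every copy of the sink is merged here
    out = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)

    unrolled = []
    seen = {(root, 0)}
    queue = deque([(root, 0)])
    while queue:
        v, s = queue.popleft()
        for e in out.get(v, []):
            _, dst, label = e
            if e in cyclic:
                # Cyclic edges enter the next slice, or the sink after the last one.
                target = merged_sink if s == k - 1 else (dst, s + 1)
            else:
                target = merged_sink if dst == sink else (dst, s)
            unrolled.append(((v, s), target, label))
            if target != merged_sink and target not in seen:
                seen.add(target)
                queue.append(target)
    return unrolled
```

For instance, a two-vertex loop `w0 -> w1 -> w0` (the return edge marked cyclic) with an extra edge from `w1` to the sink unrolls, for `k = 2`, into six edges, with both the cyclic edge and the sink edge leaving the second copy of `w1` ending in the single merged sink.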
2.3 Semi-Markov Representation
A CT-DCEG can be represented as an SMP (see Section 5 and the relevant appendices of Shenvi and Smith (2019)). Informally, an SMP is a stochastic process where the next state occupied by a unit in state is determined by the transition probabilities of its embedded Markov chain, and the distribution governing the time spent in state is determined by the choice of . The positions of a CT-DCEG can be regarded as states in its corresponding SMP. Furthermore, depending on the evidence, only a small subset of nodes of the original CT-DCEG might be relevant. Such a CT-DCEG can be represented by a condensed SMP. The state-transition diagram of the SMP for the CT-DCEG in Example 2 is identical to Figure 2, except that the two edges from position to are merged into a single edge with a holding time distribution that is a mixture of their individual holding time distributions.
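The merging of parallel edges mentioned above can be written down directly. If edges from the same position into the same target have transition probabilities p_1, ..., p_m and holding time densities f_1, ..., f_m, the single SMP edge gets probability p_1 + ... + p_m and the density (p_1 f_1 + ... + p_m f_m) / (p_1 + ... + p_m), which is the holding time density given that the unit moves to that target. The Python sketch below assumes densities are plain callables; `merge_parallel_edges` and `exponential` are hypothetical helpers.

```python
import math
from typing import Callable, List, Tuple

Density = Callable[[float], float]


def exponential(rate: float) -> Density:
    """Exponential holding time density, used only for the example."""
    return lambda h: rate * math.exp(-rate * h)


def merge_parallel_edges(edges: List[Tuple[float, Density]]) -> Tuple[float, Density]:
    """Merge parallel CT-DCEG edges (same source and target) into one SMP edge.

    Each entry is (transition_probability, holding_time_density). The merged
    edge carries the summed probability and the probability-weighted mixture
    of densities, i.e. the holding time density given that this target is next.
    """
    edges = list(edges)
    total = sum(p for p, _ in edges)

    def mixture(h: float) -> float:
        return sum(p * f(h) for p, f in edges) / total

    return total, mixture
```

For example, merging edges with probabilities 0.2 and 0.3 and exponential densities of rates 1 and 3 gives an edge of probability 0.5 whose density at zero is (0.2 + 0.9) / 0.5 = 2.2.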
3 The Continuous Time CEG
A CT-CEG is a static variant of the CT-DCEG or, equivalently, a continuous time analogue of the discrete time CEG. A CT-CEG is an acyclic event-based graphical model with a total ordering (arising from its event tree construction) whose vertices evolve at possibly varying time granularities, that is, a semi-Markovian approach. It has one sink node that collects its leaves. For simplicity, here we consider time-homogeneous CT-CEGs.
Say that a CT-CEG is completely specified when are specified. The joint distribution of any events measurable with respect to the σ-algebra of can be obtained when is completely specified. Let be a path, that is, a sequence of edges, and let be the sequence of times at which transitions are made along the edges of . We can write this as a sequence of triples of vertex, edge and time spent at the vertex before going along the edge, for example . The joint probability of and can be specified as
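As an illustration of the form this joint probability takes, write the path as the triples (w_{k-1}, e_k, h_k), k = 1, ..., n, with w_0 the root, π(e | w) the transition probability of edge e leaving w, and f_e(h) the holding time density on e; these symbols are introduced here only for this sketch and need not match the paper's notation. Under the assumption of Section 2.1 that the holding time at a vertex may depend on the edge taken next, together with the semi-Markov property and time homogeneity, we would expect a factorisation of the form

```latex
p\bigl(\lambda, h_1, \dots, h_n\bigr)
  = \prod_{k=1}^{n} \pi\bigl(e_k \mid w_{k-1}\bigr)\, f_{e_k}\bigl(h_k\bigr),
\qquad
\lambda = \bigl((w_0, e_1, h_1), \dots, (w_{n-1}, e_n, h_n)\bigr),
```

which is a probability mass in the path and a density in the holding times.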
3.1 Compatible Evidence and Events
Evidence in a BN typically takes the form of instantiations of a subset of its variables. In a CEG, evidence takes the form of intrinsic events occurring (Thwaites et al., 2008). Consider an event given by in a CT-CEG . Call an intrinsic event if the root-to-sink paths of the subgraph of , say , induced by the root-to-sink paths of are exactly the root-to-sink paths contained in . With respect to temporal evidence, we consider only point evidence here. Call the temporal evidence temporally compatible if we know all the transition times for the unit starting from the root of up to a certain depth. If evidence defines an intrinsic event and is temporally compatible, call it compatible evidence. In Section 3.2.1, we consider temporal evidence where we only know the transition time for some specific, non-root vertex or vertices.
3.2 A Propagation Algorithm
For a process described by a CT-CEG , compatible evidence about the temporal evolution of a unit makes the retrospective transition probabilities depend on the corresponding holding time densities. Assume that the temporal information in gives the transition times at all vertices from the root to the sink of the CT-CEG. While we may not know the exact vertices visited by the unit, we know the time at which the unit made its th transition. Observing typically sets to one the probability of not visiting several of the root-to-sink paths, either partially or entirely. These paths, or parts thereof, can be deleted from the graph of to obtain a condensed representation given by the adapted CT-CEG subgraph called the transporter CT-CEG. Note that where represents the set of paths implied by the compatible evidence . The original staging structure within may be destroyed by the deletion of vertices and edges to obtain . In this sense, only preserves the conditional independences that are still valid after observing . A CT-CEG is called minimal when it contains no two vertices in that have isomorphic subtrees preserving structure and colourings. While this is not essential, from now on we assume that our transporter CT-CEG is minimal.
We describe below a two-pass backward-forward message-passing algorithm which has two main steps: a backward step to calculate the potentials and emphases, and a forward step which revises the transition probabilities. Note that refers to probabilities in and to the updated probabilities in .
Denote by all the edges emanating from vertex . Let . The algorithm proceeds as follows.
For each edge , , set the t-potential and h-potential as
if and zero otherwise. The holding time at is indicated by . Set the t-emphasis and h-emphasis as follows
Now we say that the sink and all the positions in are accommodated.
For an edge , such that all of ’s children are accommodated, set the t-potential and h-potential as
if and zero otherwise. Set the emphases as
Position is accommodated when the potentials and emphases are calculated for all .
For all , the revised conditional transition probabilities are given by
Note that for the edges in the transporter , the holding time densities are invariant under the compatible evidence and are simply imported from the relevant edges in .
A proof of this result is presented in the appendix. Let denote the vertices that have edges terminating in and denote the edges terminating in . The pseudo-code for the above algorithm is given in Algorithm 1. Here, the possible arrival time at position is denoted by . Note that this algorithm is also applicable to the discrete time setting where the holding time densities are replaced by the corresponding probability mass functions.
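The following Python sketch illustrates the backward-forward idea for complete point temporal evidence on a small transporter CT-CEG encoded as a hypothetical adjacency dict. It is not a transcription of Algorithm 1: instead of keeping separate t- and h-potentials and emphases, it multiplies each edge probability by the holding time density evaluated at the observed holding time into a single combined potential and, by Bayes' rule, renormalises these to obtain posterior transition probabilities; we expect, but do not show here, that this matches the two-potential scheme. Because a position may be reachable after different numbers of transitions (echoing the possible arrival times in Algorithm 1), everything is indexed by the pair (position, number of transitions made so far). The names `propagate`, `holding` and `exponential` are illustrative.

```python
import math
from functools import lru_cache
from typing import Callable, Dict, List, Tuple

Density = Callable[[float], float]
# Hypothetical encoding of a transporter CT-CEG:
# position -> [(edge_id, child, transition_probability, holding_time_density)]
Graph = Dict[str, List[Tuple[str, str, float, Density]]]


def exponential(rate: float) -> Density:
    """Exponential holding time density, used only for the example."""
    return lambda h: rate * math.exp(-rate * h)


def propagate(graph: Graph, root: str, sink: str,
              holding: List[float]) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Revised transition probabilities given complete point temporal evidence.

    holding[k] is the observed time spent before the (k+1)-th transition, so
    every compatible path makes exactly len(holding) transitions. Results
    are keyed by (position, number of transitions already made).
    """
    n = len(holding)

    @lru_cache(maxsize=None)
    def likelihood(w: str, k: int) -> float:
        # Backward pass: combined potential of the evidence still to come,
        # given arrival at w after k transitions.
        if w == sink:
            return 1.0 if k == n else 0.0
        if k == n:
            return 0.0  # no observed holding time left for another transition
        return sum(p * f(holding[k]) * likelihood(child, k + 1)
                   for _, child, p, f in graph[w])

    # Forward pass: renormalise the edge potentials at every reachable state.
    revised: Dict[Tuple[str, int], Dict[str, float]] = {}
    frontier = {(root, 0)}
    while frontier:
        next_frontier = set()
        for w, k in frontier:
            denom = likelihood(w, k)
            if w == sink or denom == 0.0:
                continue
            revised[(w, k)] = {}
            for edge, child, p, f in graph[w]:
                q = p * f(holding[k]) * likelihood(child, k + 1) / denom
                revised[(w, k)][edge] = q
                if q > 0.0:
                    next_frontier.add((child, k + 1))
        frontier = next_frontier
    return revised
```

For instance, with two root edges of prior probabilities 0.6 and 0.4, exponential holding times of rates 1 and 2, and downstream vertices whose likelihoods agree, an observed first holding time of 0.5 revises the root probabilities to 0.6e^{-0.5} / (0.6e^{-0.5} + 0.8e^{-1}) ≈ 0.55 and ≈ 0.45. Replacing the densities by probability mass functions gives the discrete time analogue, as noted above.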
3.2.1 Incomplete Temporal Evidence
So far we considered evidence where we knew all the arrival times from the root up to the sink. However, it may happen that the evidence contains the arrival times only up to some non-sink vertex . In this scenario, the t-potentials and both the emphases are as defined in Section 3.2. The h-potential is set to one for all vertices for which the transition time from is unknown. For the other vertices, define the h-potential as in Section 3.2. Thus when the transition time for a certain vertex is not known, we revert to the standard CEG propagation algorithm for that vertex. Observe that both these types of temporal evidence automatically remove the possibility of having visited any paths that contain fewer or more vertices than we have arrival times.
In several applications, there may be directed paths of varying lengths from the root to some vertex ; denote them by . Evidence might only indicate that the unit has arrived at at some time , following the convention that the arrival time at the root is . In such cases, the interest is often in the probabilities associated with the different paths in . Examples of domains where this may be the case are medicine, law and criminal justice.
To obtain these path probabilities, we first construct the transporter CT-CEG and calculate the revised transition probabilities, denoted by , only using the non-temporal evidence in and the CEG propagation algorithm. For each path , let the random variable indicate the time it takes to get from vertex to in , when the unit goes along path . Then is a convolution of the holding time densities along the edges in . The probability that the unit travelled to from along path is given by
where . The holding time densities are invariant under any evidence observed and the path probability is obtained as
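The path-probability calculation above can be sketched numerically. Assuming that holding times along a path are independent given the path, the density of the total travel time is the convolution of the edge densities. The sketch approximates each convolution integral with the trapezoidal rule on a uniform grid starting at zero, weights each path's probability (already revised using the non-temporal evidence) by this density at the observed arrival time, and renormalises. The grid, the function names `path_density` and `path_posteriors`, and the exponential densities in the example are illustrative assumptions; for some families, such as exponential holding times with a common rate (whose sum is gamma distributed), the convolution is available in closed form and no grid is needed.

```python
import numpy as np
from typing import Callable, Dict, Sequence, Tuple

Density = Callable[[np.ndarray], np.ndarray]


def path_density(densities: Sequence[Density], grid: np.ndarray) -> np.ndarray:
    """Density of a sum of independent holding times on a uniform grid from 0."""
    dt = grid[1] - grid[0]
    total = densities[0](grid)
    for f in densities[1:]:
        g = f(grid)
        full = np.convolve(total, g)[: len(grid)]
        # Halve the two endpoint terms: trapezoidal rule instead of a Riemann sum.
        total = dt * (full - 0.5 * (total[0] * g + g[0] * total))
    return total


def path_posteriors(paths: Dict[str, Tuple[float, Sequence[Density]]],
                    arrival_time: float, grid: np.ndarray) -> Dict[str, float]:
    """Probability of each path given arrival at the target at arrival_time.

    paths maps a path name to (revised path probability, edge densities).
    """
    dt = grid[1] - grid[0]
    idx = int(round(arrival_time / dt))  # grid point nearest the observed time
    weights = {name: prob * path_density(dens, grid)[idx]
               for name, (prob, dens) in paths.items()}
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}
```

For example, if a one-edge path (exponential rate 1) and a two-edge path (two exponentials of rate 2) each have revised probability 0.5, arrival at time 1 favours the longer path, since its travel time density 4te^{-2t} exceeds e^{-t} at t = 1.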
Alternatively, revised transition probabilities could be obtained by iteratively computing the holding time density on edges upstream of given the temporal evidence in . However, this requires integrating over the possible times the unit could have arrived at the intermediate vertices which is a non-trivial task.
4 Dynamic Propagation
For a given CT-DCEG , suppose the evidence pertains to a set of positions contained in passage-slices to , . Assume that is compatible with respect to the model . Then can be split into three models: present, past and future. The construction and propagation scheme within each of these models is described below.
Present model: The present model is given by . Propagating the compatible evidence in this model proceeds exactly as in a static CT-CEG (see Section 3). Note that situations in the event tree of a CT-DCEG that have a directed path between them could be in the same position, since the infinitely large subtrees rooted at them could be isomorphic. However, when a CT-DCEG is unrolled and we look at a finite number of passage-slices, they can no longer be in the same position. This is necessary for writing (2) as (3) in the proof in the appendix.
Past model: The unrolled model gives the past model. Any evidence in the present model also affects the past model. However, this need not be propagated to the past model unless we need to re-estimate the past probability distributions or make inferences about the positions within the past passage-slices.
Passage-slices from the past model can be moved to the present model in a straightforward way to re-estimate the probabilities therein. For instance, for inference on evidence concerning positions in passage-slice , , can be incorporated into the present model as follows. First, the revised past model is given by . Next, the vertices and edges that, conditioned on evidence and , are visited with probability zero are deleted from . Denote this by . For an edge , if , then connect it to the relevant vertex in the present model. This gives us the revised present model. Propagation continues backwards from the root vertices of the original present model to the root vertices of the revised present model.
Future model: The graph of a finite CT-DCEG as it applies to passage-slices , is first adapted by deleting all the edges and vertices that, given the evidence , will be visited in future passage-slices with probability zero. Call this . The conditional transition probabilities at each position are revised as
Recall that the holding time distributions defined along the extant edges in are invariant under observing evidence and can be imported directly from . The adapted CT-DCEG can now be represented by the state-transition diagram of a, possibly condensed, SMP (see Section 2.3). Forecasts concerning probabilities of future events are calculated using the transition matrix of its embedded Markov chain. Additionally, all inferences that can be typically made from a semi-Markov process or a CT-DCEG can still be made in the standard way (Shenvi and Smith, 2019).
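Forecasting with the embedded Markov chain can be illustrated with a small hypothetical chain loosely modelled on Example 2: state 0 stands for an uninfected individual, state 1 for an infection episode, and state 2 for an absorbing state such as death. The transition probabilities below are invented for illustration and are not estimates from the paper. The sketch computes the state distribution after n jumps and, via the fundamental matrix (I - Q)^{-1} of the transient block Q, the expected number of visits to each transient state before absorption. These quantities concern the sequence of states only; statements about elapsed time also need the holding time distributions of the SMP.

```python
import numpy as np


def n_step_distribution(P: np.ndarray, start: int, n: int) -> np.ndarray:
    """Distribution over states after n jumps of the embedded Markov chain."""
    initial = np.zeros(P.shape[0])
    initial[start] = 1.0
    return initial @ np.linalg.matrix_power(P, n)


def expected_visits(P: np.ndarray, transient: list) -> np.ndarray:
    """Fundamental matrix N = (I - Q)^{-1}: N[i, j] is the expected number of
    visits to transient state j when starting from transient state i."""
    Q = P[np.ix_(transient, transient)]
    identity = np.eye(len(transient))
    return np.linalg.solve(identity - Q, identity)


# Hypothetical embedded chain: 0 = uninfected, 1 = infected, 2 = dead (absorbing).
P = np.array([[0.0, 1.0, 0.0],
              [0.9, 0.0, 0.1],
              [0.0, 0.0, 1.0]])

print(n_step_distribution(P, start=0, n=4))  # state distribution after 4 jumps
print(expected_visits(P, transient=[0, 1]))  # expected visits before absorption
```

In this toy chain, an individual starting uninfected is expected to experience 1 / 0.1 = 10 infection episodes before absorption, which is the entry N[0, 1] of the fundamental matrix.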
The above scheme, although simple, is capable of making a wide range of inferences. While it is analogous to the dynamic inference scheme for DBNs in Kjærulff (1992), movement between the three models is much easier for the CT-DCEG. This is because we do not need to reconstruct a junction tree and propagation is carried out directly on the vertices and edges of the adapted graphs of the concerned model. More importantly, the complexity of propagation in the past and present models is linear in the number of vertices they contain, whereas it is exponential in the maximal clique size in a junction tree. Another key advantage of the CT-DCEG is that additional intrinsic events observed always lead to a simplification of the graph and thereby to efficiency gains.
5.1 A Simple Application
We now revisit Example 2. Suppose that for an individual who has had the infection twice in the past, we observe that they had the infection again, were treated for it and recovered from it for the third time. Suppose we also observe that the individual had the following transition times recorded as the number of days since they recovered last time . For this evidence, the present model is simply given by Figure 8 and propagation in this model requires only 32 operations: 8 t-potentials, 8 h-potentials, 5 t-emphases, 5 h-emphases and 6 revised edge probabilities.
The future model is represented by an SMP whose state-transition diagram is given in Figure 5. The two edges from to in Figure 8 are replaced by a single edge in the SMP. As reinfection from any of the strains is possible, the CT-DCEG for the future model is identical to the original CT-DCEG in Figure 2. Observe that acts as an absorbing state in the SMP. This example is explored in greater detail in the supplementary material.
It is instructive to compare this problem representation with alternative dynamic models: the DBN and the CTBN. A DBN for Example 2 could be represented by the two time-slice DBN in Figure 6(a), where the variables , and represent the strain of the infection, the treatment type and the outcome respectively. Given the significant asymmetries in the example, this DBN is an approximation and hides structural zeros within its CPTs. It also does not graphically represent the lack of treatment options for strain 3 of the infection.
Figure 6(b) shows a CTBN for Example 2. Due to structural zeroes, some of the conditional intensity matrices are null matrices (e.g. for treatment given the third strain of infection). Additionally, as seen from Table 2, the process contains non-exponential holding times which CTBNs were not designed to represent: CTBN propagation algorithms rely on exploiting the exponential nature of the holding times.
5.2 A Mixed CEG Application
For certain vertices, it is not natural to define a holding time. For instance, a vertex categorising an individual into different risk categories may not be meaningfully associated with a holding time. Call a CEG a mixed CEG if its vertex set can be partitioned into two mutually exclusive subsets and such that transitions from vertices in are associated with holding times and those from vertices in are not. However, using a mixed CEG is a modelling choice: the modeller may instead associate such a vertex with, for example, the time it takes a health practitioner to make the categorisation. Additionally, may contain vertices with holding time distributions in discrete as well as continuous time domains. In our experience, such mixed systems are more common in the real world than homogeneously defined ones. In fact, the public health applications considered in Shenvi and Smith (2019) are mixed DCEGs, although they were not recognised as such. For illustrative purposes, we consider one of these applications to emphasise the usefulness of mixed (D)CEGs.
Example 3: Consider the mixed DCEG in Figure 7 based on a real-world falls intervention (Eldridge et al., 2005). The transitions from , describe categorical events that are not naturally associated with holding times. However, all transitions from the remaining vertices are best described in conjunction with how long it took for such transitions to occur and such descriptions are of clinical importance.
For propagation in a mixed CEG, the h-potentials for edges emanating from vertices in are set to one. The remaining potentials and emphases are defined as in Section 3.2. Thus our methodology can be adapted in a straightforward way to the wide range of applications for which mixed CEGs are appropriate.
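The rule above, an h-potential of one for edges leaving vertices without holding times, can be sketched in Python. The sketch uses a single combined potential per edge (transition probability times h-potential) rather than the separate t- and h-quantities of Section 3.2, and it makes one further assumption of our own, not stated in the text: the observed holding times are matched, in order, to the timed vertices along a path, so an untimed vertex consumes no entry of `holding`. The graph encoding and the names `mixed_revised_probabilities` and `exponential` are hypothetical.

```python
import math
from functools import lru_cache
from typing import Callable, Dict, List, Optional, Set, Tuple

Density = Callable[[float], float]
# (edge_id, child, transition_probability, holding_time_density or None)
Edge = Tuple[str, str, float, Optional[Density]]


def exponential(rate: float) -> Density:
    """Exponential holding time density, used only for the example."""
    return lambda h: rate * math.exp(-rate * h)


def mixed_revised_probabilities(graph: Dict[str, List[Edge]], untimed: Set[str],
                                sink: str, holding: List[float],
                                w: str, k: int) -> Dict[str, float]:
    """Revised probabilities of the edges leaving w, given that w is reached
    after k of the observed holding times have been used up."""
    n = len(holding)

    def factor(v: str, j: int, f: Optional[Density]) -> Tuple[float, int]:
        # h-potential of an edge leaving v, and the next unused holding time.
        if v in untimed:
            return 1.0, j          # no holding time: h-potential is one
        if j == n:
            return 0.0, j          # a timed transition with no observed time left
        return f(holding[j]), j + 1

    @lru_cache(maxsize=None)
    def likelihood(v: str, j: int) -> float:
        # Backward pass: combined potential of the remaining evidence from v.
        if v == sink:
            return 1.0 if j == n else 0.0
        total = 0.0
        for _, child, p, f in graph[v]:
            h, j_next = factor(v, j, f)
            if h > 0.0:
                total += p * h * likelihood(child, j_next)
        return total

    weights = {}
    for edge, child, p, f in graph[w]:
        h, j_next = factor(w, k, f)
        weights[edge] = p * h * likelihood(child, j_next) if h > 0.0 else 0.0
    z = sum(weights.values())  # zero only if the evidence is incompatible
    return {edge: x / z for edge, x in weights.items()}
```

For example, if an untimed risk-categorisation root has edges "high" and "low" with prior probabilities 0.3 and 0.7, leading to timed vertices with exponential holding time rates 2 and 0.5, a single observed holding time of 1.0 revises the probability of "high" to 0.6e^{-2} / (0.6e^{-2} + 0.35e^{-0.5}) ≈ 0.28.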
We presented an evidence propagation scheme for the CT-DCEG, a temporal, dynamic, event-based graphical model, within a statistically grounded framework. This scheme is also applicable to all other DCEGs. We also filled the technical gaps in the literature that such a scheme needs in order to work, by providing a propagation algorithm for a static CT-CEG, and briefly explored the novel class of mixed CEGs.
We have demonstrated how the CT-DCEG gives a better representation of a process when it contains significant asymmetries and when the evolution of the process can be described under some total ordering of the events. For dynamic processes with fewer asymmetries, where co-evolution of components is important and where holding times are known to be exponential, the CTBN should be preferred. On the other hand, the DBN is the ideal and most well-developed modelling tool available when the process meets all the conditions for the CTBN and all the components can be assumed to evolve at the same discrete time granularity.
Finally, in this paper we looked only at certain types of point temporal evidence. Methods to incorporate interval temporal evidence in the CT-(D)CEG are yet to be devised. Additionally, certain types of point and interval temporal evidence can lead to confounding by making all the past subprocesses highly dependent on each other. One example of this is addressed in a very simple way in Section 3.2.1. For such evidence, we suggest (a) deferring its inclusion until further evidence is obtained, or (b) using approximate inference schemes. The latter remains an open problem for this class of models.
The CT-CEG propagation algorithm states that for we have
where the potentials and emphases are as defined in Section 3.2 and is some compatible evidence.
Given that a unit is in some position , the probability of transitioning along an edge emanating from is independent of the holding time at . This gives us
Let denote the event tree underlying the CT-CEG , and let , a subtree of , denote the event tree underlying the transporter CT-CEG . By the definition of a position, each corresponds to a set of vertices in . Then, for , we can split this set into two mutually exclusive subsets: representing the vertices of in and representing the vertices of not in . Additionally, the paths in and in are unions of the paths in and in , respectively, for . Every represents . Thus, for some represented by , we have
This allows us to write (1), for , as
to be evaluated on the tree . There is no directed path from and for as their subtrees are isomorphic in . Hence we have that . So we can write (2) as
For we have that . Also, notice that writing as a probability over paths of is equivalent to writing it over paths of , for . So we can write (3) as
for any .
The proofs for where , , and follow exactly as given in Thwaites et al. (2008). Additionally, we have for ,
by definition of and by the invariance of the holding time density given any compatible evidence . We use induction to prove that where , .
Step 1: Consider the positions . We have
for any and .
Step 2: Now consider any such that all the vertices into which terminate have . Then we have
However, in a tree, we have that . So we can write as
This completes the proof.
A Propagation in the present model
In this section, we continue our analysis of Example 2 given the evidence in Section 5.1. The potentials and emphases for the present model are given in Tables 3 and 4 respectively. Here refers to the edge associated with strain of the infection, .
The revised transition probabilities are shown in Figure 8. Let paths be given by the sequences of edges