The Reduced Dynamic Chain Event Graph

11/21/2018 · by Aditi Shenvi, et al.

In this paper we introduce a new class of probabilistic graphical models called the Reduced Dynamic Chain Event Graph (RDCEG), which is a novel mixture of a Chain Event Graph (CEG) and a semi-Markov process (SMP). It has been demonstrated that many real-world scenarios, particularly in the domain of public health and security, can be modelled as an unfolding of events in the life histories of individuals. Our interest lies not only in the future trajectories of an individual with a specified history and set of characteristics but also in the timescale associated with these developments. Such information is critical in developing suitable interventions and informs the prioritisation of policy decisions. The RDCEG was born out of the need for such a model. It is a coloured graph which inherits useful properties from the family of probabilistic graphical models, such as fast conjugate model selection, conditional independence interrogations and support for causal interventions. Its novelty lies in its underlying semi-Markov structure, which offers the flexibility of the holding time at each state following any arbitrary distribution. We demonstrate this new decision support system with a simulated intervention to reduce falls in the elderly.

1 Introduction

Chain Event Graphs and their dynamic analogues have now been applied to a number of applications, especially in public health and safety [smithassault2018, barclay2015dynamic, collazo2018ntime]. The CEG is a graphical model which has the useful property, like its cousin the Bayesian Network (BN), that it admits conjugate estimation and closed-form scoring under complete sampling, so that model selection across this otherwise enormous class is nevertheless feasible, at least for moderately sized problems [silander2013dynamic, freeman2011bayesian, cowell2014causal]. Furthermore, it has been shown that, as for the BN, various conditional independence statements can be read from the topology of the graph of a CEG alone, without any reference to its embellishing edge probabilities. This enables any elicitation or explanation of the model class to be translated to and from natural language, a vital property for efficient decision support. This makes the CEG a useful tool for feeding conclusions back to the client and also for empowering them to be involved in the modelling process.

In this paper we define a related class of dynamic models called the Reduced Dynamic Chain Event Graph (RDCEG). This has been designed specifically for a dynamic setting where the corresponding stochastic finite state semi-Markov process contains an absorbing state which can be transitioned to from most of the other states. This dynamic version of the CEG is proving to be particularly useful when we are modelling processes over an open population, for example the members of the public within a certain demography and geographical region who might suffer a progressive health condition. Whatever their state of health or treatment, each unit may be removed from the population because of care needs, death, or because they move out of the region of study.

The RDCEG is closely related to the Extended DCEG (Ex-DCEG) class introduced in [barclay2015dynamic]. Like the Ex-DCEG, the RDCEG is associated with a continuous time semi-Markov process with a finite state space. While the Ex-DCEG need not contain any absorbing states, the RDCEG necessarily contains at least one absorbing state, as described above, called the immune state. Apart from death, the other reasons an individual might enter the immune state correspond to right-censoring in medical statistics. In clinical trials, the immune state would correspond to the reasons associated with loss to follow-up. In this paper, we assume the censoring into the immune state to be uninformative. The main distinguishing factor between the RDCEG and the Ex-DCEG is in the interpretation of what these models represent and the meaning of the conditional independence statements that can be read from them. Note that the development of the Ex-DCEG itself is in its infancy. The methods developed in this paper for the RDCEG class are equally applicable to the Ex-DCEG.

The other dynamic variants of the CEG developed thus far are the vanilla Dynamic CEG (DCEG) [barclay2015dynamic] and the N Time-Slice Dynamic CEG (NT-DCEG) [collazo2018ntime]. Both of these classes correspond to a discrete time Markov process with a finite state space. The implicit assumption is that the amount of time an individual spends in any state is geometrically distributed. While in several applications this may be a safe assumption, in domains such as public health and security the amount of time an individual spends in any state is not necessarily geometrically distributed (in the discrete case) or exponentially distributed (in the continuous case). For instance, individuals who are ill may wait different amounts of time to see a doctor, and then again different amounts of time to recover after receiving treatment. In the case of security, individuals who possess the willingness, ability and motive to carry out an attack are likely to wait for different amounts of time before they act on it [smithassault2018]. In cases such as these, we lose information when measurements are carried out at regular time intervals [nodelman2002learning]. The RDCEG and Ex-DCEG address these issues by allowing the time spent in different states to be governed by more flexible distributions, such as the Weibull or hypoexponential distributions, and by recording measurements or observations only when an individual makes a transition out of their current state. The absence of regular time measurements means there is no naturally occurring time-slice. We address this by developing an analogous concept for continuous time models.

In the next section of this paper, we briefly describe the preliminaries, extend the existing semantics and introduce new concepts to enable a smooth explanation of the development of the RDCEG in later sections. In Section 3, we formally construct the infinite event tree and define a probability measure over the entire structure; then, building on the more general classes of DCEGs and Ex-DCEGs, we introduce the RDCEG and describe its construction in detail. In Section 5, we demonstrate how, given the coloured graph of any RDCEG, we can read conditional independence statements from it. In Section 4, we describe how to embellish this structure so that it supports a full Bayesian analysis: how to set prior probabilities on its vectors of edge probabilities and the holding time distributions at each stage, along with how to set a prior on the entire model search space. We then demonstrate how conjugate estimation and closed-form model selection can take place by adapting technologies already developed for the vanilla DCEG. We end the paper with a short description of future challenges in the use of such methodologies.

All the methods and techniques developed in this paper are demonstrated using a running example, based on [eldridge2005modelling], of an intervention to prevent falls in the elderly. We show how this class can be used to provide decision support for social care and health policy makers concerned with the safety of elderly people in danger of falling. We show how current knowledge can be embedded into these Bayesian models and how the new graphical model can be used as part of a decision support tool to examine the efficacy of various policies that might be adopted in the light of available data.

2 Preliminaries

In this section we briefly describe the necessary graph theoretic notions and other preliminary concepts originally developed for CEGs and DCEGs, appropriately adapted for the RDCEG. We further define new concepts crucial to the development of the RDCEG. The CEG family of models was introduced in [smith2008conditional] and further developed in [thwaites2015separation, thwaites2010causal, freeman2011bayesian, cowell2014causal]. Dynamic variants of the CEG have been developed in [barclay2015dynamic, collazo2018ntime]. Additionally, for detailed development of event trees see [shafer1996art] and for a discrete multivariate time-series analysis of a dynamic variant of coloured event trees see [freeman2011dynamic].

2.1 Background

Definition 1 (Event Tree)

A graph $\mathcal{T} = (V(\mathcal{T}), E(\mathcal{T}))$ is called an event tree if it is an acyclic, connected, directed graph. The vertices represent events that a unit may traverse as it passes through the system. There exists a single vertex with no parents; this vertex is the root vertex $s_0$. All other vertices have exactly one parent. The vertices with no children are called leaves. Let $L(\mathcal{T})$ be the set of leaves. The non-leaf vertices are called situations. Let $S(\mathcal{T})$ denote the set of situations, where $S(\mathcal{T}) = V(\mathcal{T}) \setminus L(\mathcal{T})$. A tree is said to be infinite if $V(\mathcal{T})$ and $E(\mathcal{T})$ are infinite.

The notation $\mathcal{T}$ does not distinguish between finite and infinite trees. Unless stated otherwise, all the concepts introduced in this section hold for both a finite and an infinite $\mathcal{T}$.

A path from vertex $v$ to vertex $v'$ is a sequence of directed edges from $v$ to $v'$. For an event tree $\mathcal{T}$, let $\mu(v, v')$ be the path in $\mathcal{T}$ from $v$ to $v'$, if such a path exists. Let $\Lambda(v)$ be the set of all possible infinite trajectories starting from $v$ in $\mathcal{T}$. Let $ch(s)$ be the set of children of $s$ in $\mathcal{T}$. The associated random variable $X(s)$ has a sample space of $ch(s)$ and its realisation gives the situation occupied by the unit after leaving $s$. A situation $s$ is called an ancestor of $s'$ if the path $\mu(s, s')$ exists. Similarly, $s'$ is called a descendant of $s$ if $s'$ lies on some path $\mu(s, \cdot)$. We write $s \prec s'$ if situation $s'$ lies downstream of $s$. A floret of situation $s$ is defined as $\mathcal{F}(s) = (V(\mathcal{F}(s)), E(\mathcal{F}(s)))$, where $V(\mathcal{F}(s)) = \{s\} \cup ch(s)$ and $E(\mathcal{F}(s))$ is the set of edges induced by $V(\mathcal{F}(s))$ in $\mathcal{T}$.

Example 1

Falls-related injuries and fatalities are a serious problem among the elderly. According to NICE guidelines [Nice_2013], 30% of people older than 65 and 50% of people older than 80 fall at least once a year. Suppose that a group of researchers are designing an intervention to identify those who are susceptible to falling and to refer them for treatment at a falls clinic. Individuals are assessed and classified as high-risk or low-risk based on the Falls Risk Assessment Tool (FRAT) [nandy2004development]. High-risk individuals are either referred for treatment, opt to get treatment at their own expense, or else don't get any treatment. For every individual it is recorded whether they've had a fall after assessment and, in some cases, treatment. The finite event tree describing this intervention is given in Figure 1.

In Figure 1, $\mu(s_0, s_5)$ is the path depicting the high-risk individuals who are not treated. High-risk individuals (vertex $s_1$) are either "referred & treated", "not referred & treated" or "not treated" (vertices $s_3$, $s_4$ and $s_5$ respectively). Thus $ch(s_1) = \{s_3, s_4, s_5\}$. Vertex $s_1$ is an ancestor of vertex $s_5$ and $s_5$ is its descendant. The vertices $s_1$, $s_3$, $s_4$ and $s_5$ along with the three edges between them make up the floret $\mathcal{F}(s_1)$.

Figure 1: Event tree for the intervention in example 1.
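To make these definitions concrete, the following minimal Python sketch builds the event tree of Figure 1 as a plain rooted structure. The class, the helper and the vertex names $s_0, \ldots, s_5$ are illustrative, chosen here to mirror the example; they are not taken from any accompanying implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Situation:
    """A vertex of an event tree; a leaf is a Situation with no children."""
    name: str
    children: dict = field(default_factory=dict)  # edge label -> child vertex

    def is_leaf(self) -> bool:
        return not self.children

def add_edge(parent: Situation, label: str, child: Situation) -> Situation:
    """Attach `child` to `parent` along an edge carrying the given label."""
    parent.children[label] = child
    return child

# Event tree of Example 1 (vertex names are illustrative).
s0 = Situation("s0")                                   # risk assessment
s1 = add_edge(s0, "High-risk", Situation("s1"))
s2 = add_edge(s0, "Low-risk", Situation("s2"))
s3 = add_edge(s1, "Referred & treated", Situation("s3"))
s4 = add_edge(s1, "Not referred & treated", Situation("s4"))
s5 = add_edge(s1, "Not treated", Situation("s5"))
for s in (s2, s3, s4, s5):                             # fall outcome recorded
    add_edge(s, "Fall", Situation(s.name + "-fall"))
    add_edge(s, "Don't fall", Situation(s.name + "-nofall"))

# The floret of s1: s1, its children and the three edges between them.
floret_s1 = (s1, list(s1.children.items()))
```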
Definition 2 (Labelled path)

For situations $s$ and $s'$ in $\mathcal{T}$, where $s \prec s'$, the ordered sequence of edge labels from $s$ to $s'$ in $\mathcal{T}$ is called a labelled path, and it is denoted by $\lambda(s, s')$. The labelled path from the root $s_0$ to $s'$ may also be written as $\lambda(s')$.

The labelled path in the event tree in Figure 1 from the root $s_0$ to the leaf $l$ reached by high-risk, referred-and-treated individuals who don't fall, denoted by $\lambda(s_0, l)$ or simply $\lambda(l)$, is given by the ordered sequence ("High-risk", "Referred & treated", "Don't fall").
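Continuing the sketch above, the labelled path of Definition 2 can be read off by a depth-first search for the target vertex; again, this helper is illustrative rather than part of any published implementation.

```python
def labelled_path(root, target):
    """Return the ordered edge labels from root to target, or None if no path."""
    if root is target:
        return []
    for label, child in root.children.items():
        rest = labelled_path(child, target)
        if rest is not None:
            return [label] + rest
    return None

# The labelled path of Definition 2 for the tree sketched above:
leaf = s3.children["Don't fall"]
print(labelled_path(s0, leaf))  # ['High-risk', 'Referred & treated', "Don't fall"]
```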

2.2 Additional definitions

Definition 3 (Immune state)

Units passing through the system may leave it, with non-zero probability and at any time, from most if not all of the situations of $\mathcal{T}$, for a variety of reasons. The collection of leaf vertices depicting such an exit from the system is denoted by the immune state $s_I$.

Figure 2: Event tree for example 1 explicitly depicting the immune state.
Example 1 (continued)

Our group of researchers have observed that in such interventions there is often a non-zero probability that individuals may leave the population, for reasons such as leaving the country or area of study, becoming bedridden due to health issues, choosing not to be a part of the study anymore or, in some cases, dying before the completion of the study. The model of the intervention should therefore represent this information to enable the construction of a decision support system that is representative of the real world. We assume that individuals can drop out of the population from any situation except for the situation representing risk assessment. Figure 2 shows the event tree for our intervention with edges emanating into the immune state. The leaf vertices added in this way together represent the immune state $s_I$.

The event tree described in Definition 1 is based on discrete time. We now present the continuous time analogue of the event tree, called the modified event tree. While the immune state contributes to the inferential and causal analysis exercises, the ability to access this state from most of the other states makes its explicit presence in the graph redundant and leads to cluttering. The implications of the immune state are taken into account while learning parameters, answering queries and carrying out interventions or manipulations. For compactness, the immune state and the edges emanating into it are deleted in the construction of the modified event tree. The edge transition probabilities in each floret therefore may not sum to one.

Definition 4 (Modified Event Tree)

The modified event tree is the continuous time analogue of the discrete time event tree $\mathcal{T}$, where the vertices not representing a change in the current state of a unit are deleted along with the edges emanating into them. The immune state $s_I$ and the edges emanating into it in $\mathcal{T}$ are also deleted in the modified event tree.

Example 1 (continued)

The tree in figure 2 would enable us to assess the short-term efficacy of the falls intervention. However, given that falling could be a recurrent event with varying consequences, we’re interested in how this intervention affects the long-term health status of individuals. The researchers define the intervention in more detail to enable such an analysis. They believe there are typically no serious consequences of a fall for a low-risk individual as they are likely to be in a better state of health. The best strategy to ensure that the health of low-risk individuals does not deteriorate after a fall is to reassess their risk after every fall. The consequences of a fall for a high-risk individual could be more serious. It could result in severe complications in which case our intervention is ill-equipped to help them and such individuals are considered to have left the population. Note that they will not be entering the immune state but would be entering an alternative leaf situation instead. Or they could be all right after the fall and would need to be reassessed for treatment.

The infinite event tree in figure 3 represents this intervention completely if it were based on observations being recorded at regular time intervals. As the researchers believe it is more natural to record when an individual has fallen rather than recording for instance, every month that they haven’t had a fall, the underlying structure will be semi-Markovian. We now convert the infinite event tree in figure 3 to the corresponding modified event tree given in figure 4. Note that in both figures, we use the object representation of a subtree of the tree to denote the repetition in structure from some of the nodes.

The modified event tree only records observations when a transition occurs into a new state, rather than observations being made at regular time intervals. As observations are only recorded when a change of state occurs, we need to introduce another concept which tells us how long a unit spends in each state that it traverses. This is called the holding time. Let the random variable associated with the holding time in situation $s$ be given by $H_s$. This can be further extended to $H_{si}$, $i = 1, \ldots, k_s$, to define the holding time at situation $s$ before the unit moves along its $i$th emanating edge, where $k_s$ is the number of edges emanating from $s$. In this paper, we assume that the holding times are time homogeneous.
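As an illustration of how a unit evolves through such a semi-Markov structure, the sketch below simulates a single trajectory: at each situation it draws the next edge from that situation's transition probabilities and then a Weibull holding time for the chosen edge. All states, probabilities and Weibull parameters here are invented for illustration; only the mechanism, transition plus holding time, reflects the text.

```python
import random

# Hypothetical embellished modified event tree: for each situation, a list of
# (edge label / next state, transition probability, Weibull shape, Weibull scale).
dynamics = {
    "Assess": [("High-risk", 0.3, 1.5, 2.0), ("Low-risk", 0.7, 1.5, 2.0)],
    "High-risk": [("Treated", 0.55, 2.5, 6.0), ("Not treated", 0.35, 0.8, 3.0)],
    "Treated": [("Fall", 0.2, 1.8, 12.0), ("Reassess", 0.7, 1.2, 9.0)],
}

def simulate(state="Assess", horizon=24.0):
    """Simulate one unit until the horizon, an absorbing state, or exit."""
    t, history = 0.0, []
    while state in dynamics and t < horizon:
        labels, probs, shapes, scales = zip(*dynamics[state])
        # Probabilities may sum to less than one: the remaining mass is the
        # (undrawn) exit into the immune state, per Definition 3.
        choice = random.choices(
            list(labels) + ["immune"],
            weights=list(probs) + [1.0 - sum(probs)],
        )[0]
        if choice == "immune":
            history.append((round(t, 2), "left the population"))
            break
        i = labels.index(choice)
        t += random.weibullvariate(scales[i], shapes[i])  # holding time H_si
        history.append((round(t, 2), choice))
        state = choice
    return history

print(simulate())
```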

Definition 5 (Stage)

Two situations $s$ and $s'$ in $\mathcal{T}$ are said to be in the same stage $u$ if and only if there exists a bijection $\phi: E(s) \rightarrow E(s')$ under which

  1. $X(s)$ and $X(s')$ have the same distribution, so that $\theta_{si} = \theta_{s'i}$,

  2. the edges $e_i \in E(s)$ and $\phi(e_i) \in E(s')$ have the same edge labels,

where $i$ indexes the edges emanating from situations $s$ and $s'$.

Figure 3: Infinite event tree for the falls intervention denoting the infinite trajectories that could develop from the process. This tree has an underlying Markov structure and explicitly depicts the immune state.
Figure 4: The corresponding modified event tree for the infinite event tree in figure 3. This tree has an underlying semi-Markov structure.

The second condition ensures that the bijection is meaningful in terms of the application concerned. For instance, suppose that $s$ and $s'$ both have two emanating edges with labels "High" and "Low". If $\phi$ maps the "High" edge from $s$ to the "Low" edge from $s'$, then it does not make sense to merge these two situations into the same stage even though they have the same distribution under $\phi$. Note that, if it is appropriate for the application, the second condition can be relaxed to allow the mapping of edges whose labels convey similar, if not identical, meanings. Let the set of stages be given by $\mathbb{U}$. Each stage is identified by a unique colour. An event tree is transformed into a staged tree by colouring the situations in the tree according to their stage membership. Due to the homogeneity condition for the holding times, given that a unit has reached stage $u$, we need no further information to predict their holding time in that stage.
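Under the degenerate staging considered in this paper, a direct reading of Definition 5 with the identity bijection on edge labels gives a simple membership test. The sketch below is illustrative, and exact floating-point comparison stands in for the Bayesian model selection of Section 4, which is how stage membership would actually be decided from data.

```python
import math

def same_stage(cpv_a: dict, cpv_b: dict, tol: float = 1e-9) -> bool:
    """cpv_a and cpv_b map edge labels to transition probabilities for two
    situations. With distinct labels, condition 2 of Definition 5 forces the
    bijection to match labels, so we compare label-wise."""
    if set(cpv_a) != set(cpv_b):   # condition 2: same edge labels
        return False
    # condition 1: same transition distribution under the bijection
    return all(math.isclose(cpv_a[lbl], cpv_b[lbl], abs_tol=tol) for lbl in cpv_a)

# Hypothesis from Example 1: low-risk and referred-and-treated high-risk
# individuals share the same probability of falling (numbers invented).
low_risk = {"Fall": 0.3, "Don't fall": 0.6}   # remaining 0.1 exits to immune
referred = {"Fall": 0.3, "Don't fall": 0.6}
print(same_stage(low_risk, referred))          # True
```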

The staging defined above clusters situations based on the equivalence of the probability distributions governing the transitions out of these situations. Additionally, we could define another staging structure based on the equivalence of the holding time distributions, such that two situations $s$ and $s'$ are in the same stage if and only if there exists a bijection $\phi: E(s) \rightarrow E(s')$ under which $H_{si}$ and $H_{s'i}$ have the same distribution, where $i$ indexes the edges emanating from these situations. In most real-world scenarios, we would expect that the staging associated with the transition probabilities does not equal the staging associated with the holding times. Depending on the purpose of modelling and the queries we want to answer, we will consider one staging or the other. For instance, no reference to holding times is made when reading conditional independence statements, as seen in Section 5.3, and so we would consider the staging structure associated with the transition probabilities for this purpose. If we were instead interested in the time to absorption as described in Section 5.4, we would consider the staging of situations associated with the holding time distributions. In this paper, we consider the degenerate case where the staging structures associated with the transition probabilities and the holding times are equal.

Definition 6 (Position)

Two situations $s$ and $s'$ in $\mathcal{T}$ are said to be in the same position $w$ if and only if the staged trees rooted at $s$ and $s'$ respectively are isomorphic in the sense of isomorphism between coloured graphs.

Trivially, situations in the same position are also in the same stage. However, the converse is not true. Let the set of positions be given by $\mathbb{W}$. The vertex set of a CEG or a DCEG consists of the set of positions of its corresponding event tree, along with a sink node $w_\infty$ if the process has any terminating trajectories that do not end in the immune state. The paths a unit passing through the system might traverse are denoted by directed edges between the positions. When two distinct positions $w$ and $w'$ contain situations which belong to the same stage $u$, this is denoted by an undirected edge between $w$ and $w'$. Positions inherit the colouring of the stage their situations belong to. We omit the colouring when a stage contains a single position.

In Example 1, a certain researcher may hypothesise that the probability of falling for low-risk individuals is the same as that for high-risk individuals who have been referred and treated. The corresponding situations in Figure 4 would then be in the same stage but not in the same position, as the infinite subtrees rooted at them are not isomorphic. The researcher may further believe that the situation at the end of the edge labelled "Not referred & treated", call this situation $s^*$, belongs to the same stage as these two situations. From Figure 4, we can see that the infinite subtrees rooted at $s^*$ and at the situation for referred-and-treated individuals are isomorphic, and hence these two would also be in the same position.
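Position membership adds the requirement that the coloured subtrees be isomorphic. For a finite (or suitably truncated) staged tree this can be checked by comparing recursive canonical signatures, as in the sketch below, which reuses the Situation class from Section 2.1. This is an illustration of the idea, not the paper's algorithm; for the infinite trees above one would exploit periodicity and work one process-structure at a time.

```python
def position_signature(situation, stage_of):
    """Canonical signature of the coloured subtree rooted at `situation`.
    `stage_of` maps a situation to its stage colour. Two situations with equal
    signatures root isomorphic coloured subtrees, i.e. share a position."""
    return (
        stage_of(situation),
        tuple(sorted(
            (label, position_signature(child, stage_of))
            for label, child in situation.children.items()
        )),
    )
```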

3 The Reduced Dynamic Chain Event Graph (RDCEG)

In section 1 the motivation behind the development of the RDCEG class of models was explored. In this section, we formally define the RDCEG class of models, set up conditions for an RDCEG with a finite structure and describe in detail the construction of an RDCEG.

3.1 Finite RDCEG

Definition 7 (Reduced Dynamic Chain Event Graph (RDCEG))

A reduced dynamic chain event graph (RDCEG) $\mathcal{D} = (V(\mathcal{D}), E(\mathcal{D}))$ is a continuous time finite state DCEG, a function of its underlying modified event tree, with an underlying semi-Markov structure with at least one absorbing state (i.e. the immune state) which can be accessed from most of the other positions.

As an RDCEG has an underlying semi-Markov structure, it allows us to harness the technologies already developed for semi-Markov processes while exploiting the flexibility and expressiveness provided by the RDCEG. The advantage of the RDCEG over a plain semi-Markov process comes through the colouring of its nodes, from which we can learn critical conditional independence information as described in Section 5. The semi-Markov chain associated with an RDCEG is not irreducible, as it has at least two communicating classes due to the existence of the absorbing state.

In this paper we only consider RDCEGs with a finite representation, which we simply call a finite RDCEG. A finite RDCEG, like any other finite DCEG variant, is supported by a corresponding infinite event tree which is periodic [collazo2018ntime]. We introduce three concepts of the invariant subtree, repeating subtrees and periodic infinite tree to describe the structure of the infinite event tree or equivalently, the modified event tree associated with the finite RDCEG. Similar concepts can be found in [collazo2018ntime] focusing on the development of an N Time-Slice DCEG where periodicity exists after the (n-1)st time-slice. These concepts are introduced here again in a different way to adapt to the development of a DCEG variant which does not have an explicit time-slice structure. The same idea of periodicity is implied here through the topology of the subtrees which make up the infinite tree without the need to first embellish the tree with information about the time at which transitions occur.

Definition 8 (Invariant subtree)

The invariant subtree $\mathcal{T}_0$ of an infinite tree $\mathcal{T}$ is either empty or a finite subtree rooted at $s_0$, the root of $\mathcal{T}$. No subset of the vertices $V(\mathcal{T}) \setminus V(\mathcal{T}_0)$ induces a subtree isomorphic to $\mathcal{T}_0$.

Definition 9 (Repeating subtree)

A repeating subtree of an infinite tree $\mathcal{T}$ is a non-empty finite subtree $\mathcal{T}'$ such that some subset of the vertices $V(\mathcal{T}) \setminus V(\mathcal{T}')$ induces a subtree isomorphic to $\mathcal{T}'$. An infinite tree can have multiple repeating subtrees. We denote the set of repeating subtrees by $\mathbb{T}_r$.

Definition 10 (Periodic infinite tree)

An infinite tree $\mathcal{T}$ is periodic if and only if it can be constructed from a possibly empty invariant subtree $\mathcal{T}_0$ and a finite number of repeating subtrees contained in the set of repeating subtrees $\mathbb{T}_r$, such that for each leaf $l$ of every such subtree one of the following conditions is satisfied:

  1. $l \in L(\mathcal{T})$, i.e. $l$ is also a leaf in $\mathcal{T}$, and any situation in some subtree that corresponds to $l$ under the relevant isomorphism is also a leaf in $\mathcal{T}$,

  2. a repeating subtree $\mathcal{T}' \in \mathbb{T}_r$ is rooted at $l$, and at any situation in some subtree that corresponds to $l$ under the relevant isomorphism, the same repeating subtree $\mathcal{T}'$ is rooted,

where $D(s)$ denotes the set of descendants of $s$ and the correspondences are taken over these descendant sets.

Using the concepts developed above, we can construct a probability measure on any infinite event tree as described in appendix A.1. For the modified event tree of example 1, the invariant subtree and repeating subtrees are given in figure 5.

Figure 5: The repeating subtrees corresponding to the modified event tree of Figure 4, together with its invariant subtree $\mathcal{T}_0$.
Theorem 1

A finite RDCEG is supported by a periodic infinite modified event tree.

Proof. See theorem 1 in [collazo2018ntime].   

Because the RDCEG is built on a continuous time semi-Markov process, it does not have a naturally occurring time-slice structure. We therefore introduce the concepts of process-structures and process-cycles for a tree and an RDCEG.

Definition 11 (Process-structure)

The first process-structure $\mathcal{P}_1$ is a tree made up of the invariant subtree $\mathcal{T}_0$, if it is non-empty, and the first repeating subtree structures rooted at the leaves of $\mathcal{T}_0$. If $\mathcal{T}_0$ is empty, then $\mathcal{P}_1$ is the repeating subtree rooted at the root of $\mathcal{T}$. The subsequent process-structures $\mathcal{P}_k$, $k \geq 2$, are forests consisting of the first repeating subtree structures rooted at the leaves of the previous process-structure $\mathcal{P}_{k-1}$. A process-structure in an RDCEG is given by the partial CEG(s) of the associated $\mathcal{P}_k$ of its underlying modified event tree. (Partial because not all leaves of these CEGs will be collected into the sink node of the RDCEG: only those leaves which represent terminated paths in the underlying modified event tree will be collected into the sink node.)

Note that if $\mathcal{T}_0$ is non-empty, all the process-structures after $\mathcal{P}_1$ will be identical; otherwise all process-structures are identical.

Definition 12 (Process-cycle)

A process-cycle gives the labelled paths of all units passing through the process-structure $\mathcal{P}_k$, such that these paths necessarily end at one of the leaves of $\mathcal{P}_k$. A process-cycle in an RDCEG gives the labelled paths of all units passing through the associated $\mathcal{P}_k$ of its underlying modified event tree.

Theorem 2

An RDCEG supported by a periodic infinite modified event tree is periodic if and only if the stage structure associated with every process-structure is the same.

Proof. See theorem 2 in [collazo2018ntime].   

3.2 Construction of RDCEGs

As with the CEG and DCEG, the process of constructing an RDCEG begins by eliciting an event tree from the domain expert or from available literature, which can later be corroborated by the domain expert. Shafer [shafer1996art] demonstrated that the topology of a tree is a powerful tool for representing the natural language description of an expert's view of the evolution of a process. In case the pool of potential variables or situations is very large, standard feature selection methods used in machine learning can be employed to reduce dimension and to retain interpretability, among other things [guyon2003introduction]. The purpose of modelling would also influence the variable selection process. For instance, logic models might be routinely developed for monitoring and evaluating outcomes in a public health intervention. These models inform the primary and secondary outcomes of the intervention and describe the factors hypothesised to be key for the success of the intervention. Such information could be used to select variables which would aid the evaluation of the intervention along similar lines to the logic model.

The choice of modelling with a finite RDCEG implies that the expert opinion is that the probabilistic structure of the process being modelled is periodic. Once the event tree has been constructed, we need to elicit for each situation a probability measure for the holding time in that situation and for the transitions to adjacent situations. Since we believe in the probabilistic periodicity of the structure, we only need to elicit the following information from the domain expert for the situations in the first process-structure $\mathcal{P}_1$ to embellish $\mathcal{T}$ into a probability tree:

  • Since the time spent at each situation may not be the same for each individual, a probability distribution describing the holding time or sojourn time at each situation must be elicited. The holding time distribution gives a probability distribution for the time a unit stays in situation $s$ before traversing along one of its emanating edges.

  • Given that a unit arrives in situation $s$, we need the expert to elicit the conditional probability vector (cpv) $\boldsymbol{\theta}_s = (\theta_{s1}, \ldots, \theta_{sk_s})$, where $\theta_{sj}$ denotes the probability of a unit transitioning from situation $s$ along its $j$th edge, $j = 1, \ldots, k_s$.

If no data are available, we can begin to colour this tree as per the conditions of staging described in Section 2. If we have data, the expert opinion will inform the priors we set on the holding time and transition probability distributions for each situation in $\mathcal{P}_1$. The data can then be used to update the prior to a posterior by independently learning the holding time and transition probability distributions through conjugate analysis, and the stage structure of $\mathcal{P}_1$ can be learned using a greedy model selection algorithm as described in Section 4. Next we transform the coloured $\mathcal{T}$ into the associated modified event tree by deleting the edges and vertices which do not represent a transition into a different state, as well as the immune state and all the edges directed towards it. The situations in the same position are collapsed into a single vertex, and undirected edges are added between situations in the same stage but not in the same position. All paths which terminate in the underlying modified event tree are collected into a single sink vertex $w_\infty$. The vertex set of the graph thus formed represents the position set of the RDCEG. As a position may contain several situations, a representative situation is chosen from each position; if there exists an edge from the representative situation of position $w$ to a situation in another position $w'$, a directed edge is drawn from $w$ to $w'$. Besides the edges described thus far, the RDCEG also contains edges which give rise to the cyclic nature of its structure. Such an edge emanating from some position $w$ would be directed into some position $w'$ such that, in the underlying modified event tree, there exists an edge from a situation in $w$ belonging to one process-structure to a situation in $w'$ belonging to the next. This leads to the RDCEG having loops. We call these cyclic edges. This completes the construction of the finite RDCEG.

Example 1 (continued)

Suppose that the researchers would like to observe an RDCEG model based on the symmetries they believe to exist in the trajectories of individuals in the falls intervention population. From domain literature, they deduce that the future trajectories of a high-risk individual falling after receiving treatment through referral are identical to the trajectories of a high-risk individual who pays for their own treatment. This says that the situation reached via "Referred & treated" and the situation with labelled path ("Not referred & treated") in Figure 4 are in the same position, and they are assigned colour 1. Also, the future trajectories of a high-risk individual who has fallen do not depend on whether or not they had been treated. The situations with labelled paths ("Referred & treated", "Fall"), ("Not referred & treated", "Fall") and ("Not treated", "Fall") are also in the same position and are given colour 2. We first transform this coloured infinite event tree into a coloured modified event tree. Collapsing the vertices in the same position in the first process-structure, adding a position $w_\infty$ to collect terminating paths (those not terminating in the immune state), and adding the two cyclic edges, one for "Reassessing for treatment" and one for "Reassessing for risk", gives us the RDCEG in Figure 6.

Figure 6: The RDCEG for Example 1 based solely on the opinions of domain experts.

3.3 Modified infinite CEG

In [collazo2018ntime], the relationship between the NT-DCEG and the CEG class of models is described. Further, it is shown how a dynamic NT-DCEG can be written as an infinite CEG model. This construction enables us to answer queries about how events stretching across consecutive process-structures influence each other. In this section, we will describe how a finite RDCEG can be written as an infinite CEG. The RDCEG being built from its associated modified event tree does not depict transitions into the immune state and vertices not representing a change of state. Similarly, an infinite CEG constructed from an RDCEG will also not depict these transitions. Hence a CEG obtained from an RDCEG is called a modified infinite CEG.

The construction of a modified infinite CEG from an RDCEG is straightforward, and the periodicity of the finite RDCEG makes this task easier. Let the modified infinite CEG rolled out up to $k$ process-structures be given by $\mathcal{C}_k$. Begin by constructing the CEG of the first process-structure. Paths which terminate in the sink vertex $w_\infty$ of the RDCEG also terminate in a corresponding sink vertex in this CEG. The cyclic edges in the RDCEG do not loop back to upstream vertices in the corresponding CEG. Instead, they link to the CEG of their associated repeating subtree in the forest making up the second process-structure of the RDCEG. For a particular cyclic edge in the RDCEG, this link takes the form of an edge between the corresponding position in the first process-structure and the root of the CEG of the associated repeating subtree, with this root corresponding to the position the cyclic edge enters. The construction of the CEGs corresponding to the subsequent process-structures follows the same procedure, with the cyclic edges of the RDCEG linking each process-structure to the next. There is only one sink vertex for the entire CEG, and it collects the paths which also terminate in the RDCEG. Additionally, all the paths of the final, $k$th, process-structure are collected in this sink vertex.

Figure 7 shows the infinite modified CEG for figure 6 rolled out up to two process-structures.

Figure 7: CEG associated with the RDCEG for Example 1 given in Figure 6, rolled out up to two process-structures.
4 Learning in RDCEGs

In this section we describe the process of parameter estimation and structure learning for an RDCEG. We demonstrate our methods within a Bayesian conjugate learning and estimation framework.

4.1 Parameter estimation

Each stage of an RDCEG has two types of parameters of interest. One of these is associated with the transition probability distributions, and the other with the distributions of the holding time in the current stage, which is dependent on the stage that is occupied next. For each stage $u \in \mathbb{U}$ with $k_u$ emanating edges, where $\mathbb{U}$ is the set of stages in the RDCEG, let the transition probability distribution parameters be given by $\boldsymbol{\theta}_u = (\theta_{u1}, \ldots, \theta_{uk_u})$ and the holding time distribution be parametrised by $\boldsymbol{\eta}_u$. Assuming that all the transition probability and holding time parameters are a priori independent and under complete random sampling, the likelihoods of $\boldsymbol{\theta}$ and $\boldsymbol{\eta}$ separate, as shown in [barclay2015dynamic]. For an RDCEG we can write this as

$$L(\boldsymbol{\theta}, \boldsymbol{\eta} \mid \mathbf{N}, \mathbf{H}) = L(\boldsymbol{\theta} \mid \mathbf{N})\, L(\boldsymbol{\eta} \mid \mathbf{H}), \qquad (1)$$

where $\mathbf{N}$ is the vector denoting the number of times each edge has been traversed and $\mathbf{H}$ denotes the times spent in each stage by the units in the sample. Similarly, the posterior joint density separates too. This shows that we can learn and update the parameters independently.

The transitions out of each stage $u$ are assumed to be Multinomially distributed with probability vector $\boldsymbol{\theta}_u$. Under local and global parameter independence [freeman2011bayesian], $\boldsymbol{\theta}_u$ has a Dirichlet prior, which belongs to the conjugate family of distributions. Thus we have

$$\boldsymbol{\theta}_u \sim \text{Dirichlet}(\alpha_{u1}, \ldots, \alpha_{uk_u}), \qquad (2)$$

$$\boldsymbol{\theta}_u \mid \mathbf{N} \sim \text{Dirichlet}(\alpha_{u1} + N_{u1}, \ldots, \alpha_{uk_u} + N_{uk_u}), \qquad (3)$$

where $N_{ui}$ denotes the number of transitions made from stage $u$ along its $i$th edge in the sample and $\alpha_{u1}, \ldots, \alpha_{uk_u}$ are the hyperparameters, $i = 1, \ldots, k_u$.
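Equations (2) and (3) amount to adding the observed edge counts to the prior pseudo-counts. A minimal sketch, using the counts that appear for one situation in Table 1:

```python
def dirichlet_posterior(alpha, counts):
    """Update Dirichlet hyperparameters with edge-traversal counts N_ui,
    as in equations (2)-(3)."""
    return [a + n for a, n in zip(alpha, counts)]

def posterior_mean(alpha):
    """Posterior mean of the transition probability vector."""
    total = sum(alpha)
    return [a / total for a in alpha]

# A stage with three emanating edges and a weakly informative prior:
alpha_post = dirichlet_posterior([1/3, 1/3, 1/3], [953, 3663, 241])
print([round(p, 3) for p in posterior_mean(alpha_post)])
```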

In survival analysis, survival times are commonly modelled by two-parameter Weibull distributions [cox1984analysis, collett2015modelling]. These distributions are also routinely used to model the lifetime of a product under a semi-Markov framework in reliability engineering [limnios2012reliability]. A two-parameter Weibull distribution has a scale parameter and a shape parameter. When the shape parameter is known, the scale parameter has an Inverse-Gamma conjugate prior [fink1997compendium]. The random variable governing the holding time at stage $u$ before transitioning along its $i$th edge is given by $H_{ui}$, parametrised by its shape $\kappa_{ui}$ and scale $\beta_{ui}$. Assuming that the holding time is Weibull distributed with unknown scale parameter $\beta_{ui}$, known shape parameter $\kappa_{ui}$, and that all the holding time parameters are a priori independent, the scale parameter $\beta_{ui}$ follows an Inverse-Gamma distribution. Here we have

$$\beta_{ui} \sim \text{IG}(a_{ui}, b_{ui}), \qquad (4)$$

$$\beta_{ui} \mid \mathbf{H} \sim \text{IG}\Big(a_{ui} + N_{ui},\; b_{ui} + \sum_{l=1}^{N_{ui}} h_{uil}^{\kappa_{ui}}\Big), \qquad (5)$$

where $h_{uil}$, $l = 1, \ldots, N_{ui}$, gives the holding time for each of the $N_{ui}$ units that traverse from stage $u$ along its $i$th edge, and $a_{ui}$ and $b_{ui}$ are the hyperparameters.

The effect of the hazard rate (known as the failure rate in reliability engineering) is characterised by the value of the shape parameter, which can be interpreted as described in [jiang2011study]. If $\kappa_{ui} < 1$, the hazard rate decreases over time, i.e. most transitions along the $i$th edge occur soon after entering stage $u$. Similarly, if $\kappa_{ui} > 1$, the hazard rate increases over time, and if $\kappa_{ui} = 1$, the hazard rate is constant. A Weibull distribution with $\kappa_{ui} = 1$ and mean $\beta_{ui}$ is an exponential distribution with mean $\beta_{ui}$.
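Equations (4) and (5) are equally direct once the shape is fixed. The sketch below assumes the parametrisation in which the Weibull density is $f(h) = (\kappa/\beta)\, h^{\kappa - 1} \exp(-h^{\kappa}/\beta)$, under which the Inverse-Gamma prior on $\beta$ is conjugate; the numbers are invented.

```python
def inverse_gamma_posterior(a, b, kappa, holding_times):
    """Conjugate update IG(a, b) -> IG(a + N, b + sum(h**kappa)) for the
    Weibull scale beta with known shape kappa, cf. equations (4)-(5)."""
    return a + len(holding_times), b + sum(h ** kappa for h in holding_times)

a_post, b_post = inverse_gamma_posterior(0.25, 1.0, 2.5, [2.1, 3.4, 1.7])
# Posterior mean of beta, b/(a - 1), defined only when a > 1:
print(b_post / (a_post - 1))
```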

4.2 Setting priors

The RDCEG has an underlying infinite event tree $\mathcal{T}$. We assume that the structure is Markovian in the sense that the staging of a process-structure $\mathcal{P}_k$ is only dependent on the staging of the previous process-structure $\mathcal{P}_{k-1}$, $k \geq 2$. We also assume structural and parametric stationarity of the process. This implies that the staging is fixed and independent of $k$. Under these assumptions, we only need to learn the staging of the first process-structure $\mathcal{P}_1$ to learn the structure of the RDCEG. The placement of the cyclic edges does not need to be learned, as it is evident from the underlying tree structure.

As described in Section 3.2, to construct an RDCEG we first need to colour its underlying infinite event tree to represent stage memberships, and we then transform it into its associated modified event tree before coalescing vertices in the same position to obtain the RDCEG. As stated above, we only need to learn the staging of the first process-structure of the infinite event tree to infer the structure of the entire RDCEG. A periodic tree can be sequentially built up from an invariant subtree and a finite collection of repeating subtrees. If the invariant subtree $\mathcal{T}_0$ is non-empty, the RDCEG will necessarily have a part of its structure representing the partial CEG of the modified invariant subtree. All units pass through this part of the RDCEG only once, whereas the part of the RDCEG associated with the partial CEGs of the repeating subtrees will have units passing through it at least once. This poses a problem when we do not have enough domain knowledge to set up informative priors on the transition probability distributions at each vertex of $\mathcal{P}_1$. As discussed in [barclay2015dynamic], we cannot assign an equal probability to each of the paths in the infinite tree to determine the prior at each vertex using the method described in [freeman2011bayesian], unless all paths eventually terminate in one of the absorbing states, i.e. one of the sink nodes or the immune state.

Since we only need to learn the staging of the first process-structure of the infinite event tree, we can learn the stage memberships using the data corresponding to its associated first process-cycle. What about the rest of the data? Observe that the first-order Markov property and parameter stationarity imply that if and when a unit in process-structure $\mathcal{P}_k$ enters the next process-structure $\mathcal{P}_{k+1}$, the system refreshes its memory and the unit has the same probability of traversing different paths in this process-structure as it did in the previous one. Essentially, the data corresponding to each process-cycle can be treated as observations belonging to a distinct, independent set of individuals and can be used to obtain a better estimate of the transition probabilities of the situations appearing in the repeating subtrees of $\mathcal{P}_1$. Setting the transition probability priors for $\mathcal{P}_1$ then becomes a straightforward exercise.

One way is to use the property of mass conservation as described in [collazo2018chain]. This concept interprets the Dirichlet hyperparameters as phantom counts. We start with an equivalent sample size $\bar{\alpha}$ denoting the total number of phantom units starting at the root. The number of phantom units entering a situation equals the number leaving it. Starting at the root, these phantom units are distributed throughout the tree based on how many units out of the equivalent sample size we expect to see at different branches of the tree. In the absence of sufficient prior information, we may choose to distribute the phantom counts equally across the tree. For each situation $s_i$ of $\mathcal{P}_1$, let the probability vector $\boldsymbol{\theta}_i$ have hyperparameter vector $\boldsymbol{\alpha}_i = (\alpha_{i1}, \ldots, \alpha_{ik_i})$, where $i$ indexes the situation. Let $\bar{\alpha}_i$ be the number of phantom counts that make it to situation $s_i$. For an equal distribution of $\bar{\alpha}_i$ across the emanating edges, $\alpha_{ij} = \bar{\alpha}_i / k_i$ for $j = 1, \ldots, k_i$.
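A sketch of one equal-splitting scheme under mass conservation, reusing the tree from Section 2.1: the phantom mass entering a situation is divided equally among its emanating edges and passed on to its children, so mass in equals mass out at every vertex. This is an illustration of the phantom-count idea, not the exact allocation used for the trial data.

```python
def distribute_phantom_counts(situation, mass, alpha=None):
    """Spread `mass` phantom units down the tree, splitting equally at each
    situation. Returns {situation name: per-edge hyperparameter list}."""
    if alpha is None:
        alpha = {}
    if situation.is_leaf():
        return alpha
    k = len(situation.children)
    alpha[situation.name] = [mass / k] * k      # alpha_ij = (mass at s_i) / k_i
    for child in situation.children.values():
        distribute_phantom_counts(child, mass / k, alpha)
    return alpha

# Equivalent sample size of 4, as in the falls trial:
print(distribute_phantom_counts(s0, 4.0))
```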

We now need to set priors for the holding time distributions. As we can see from Equation 5, the shape hyperparameter of the Inverse-Gamma distribution is updated using the edge-specific data counts. For each situation $s_i$ of $\mathcal{P}_1$ and each edge $j$ emanating from it, we set an Inverse-Gamma prior whose shape parameter equals $\alpha_{ij}$, the phantom count associated with the $j$th edge emanating from situation $s_i$. The scale parameter of the prior Inverse-Gamma distribution is set using the holding time at the situation before moving along its $j$th edge, as estimated by the experts, or is set to a small value in the absence of sufficient information.

In [barclay2015dynamic], a method of setting transition probability and holding time priors for a pre-specified structure is demonstrated. They use the concept of equilibrium of the transition probabilities of the underlying embedded Markov chain to determine the values of the hyperparameters for each stage. For the Inverse-Gamma priors on the parameters of the holding time distributions of a stage, the shape parameter is set equal to the corresponding equilibrium phantom count, and the mean, given by the ratio of the scale parameter to the shape parameter minus one, which is defined only when the shape parameter exceeds one, is set to one. This method can be employed when we have a set of potential RDCEG structures that we wish to compare. It can also be used when we believe one RDCEG structure to be most probable. Let this be called the prior RDCEG. The priors on the transition probabilities and the holding times can be determined for the prior RDCEG using the above method. These can then be translated into corresponding priors for the situations of the first process-structure $\mathcal{P}_1$: a situation in $\mathcal{P}_1$ will inherit the priors of the stage to which it belongs in the prior RDCEG.

Example 1 (continued)

The existing domain literature leads the researchers to believe that there is a greater proportion of high-risk individuals living in communal establishments than in the community. Communal establishments include care homes, nursing homes and hospitals. They believe this is due to several of those who live in communal establishments being unable to live independently in the community because of factors such as their age or frailty. They also observe that an individual living in the community may move to a communal establishment if their health deteriorates to an extent where they are unable to look after themselves or else don’t feel confident in doing so. All high-risk individuals living in communal establishments who receive treatment do so by being referred. To embed this information into the model, we introduce a new variable describing the residence type of the individuals as “Community” or “Communal establishment”. High-risk individuals living in the community who suffer from a fall may now move to a communal establishment as a result of deterioration of their health caused by the fall. We do not consider the movement of individuals from the community to a communal establishment for other reasons. We assume that individuals living in communal establishments cannot go back to living in the community. Apart from the differences mentioned above, the referral and treatment pathways under the intervention are the same for individuals irrespective of their residence type.

The researchers wish to analyse the data they received from the trial for this intervention. Following the advice in [gelman2008weakly], in the absence of a fully informative prior they use a weakly informative prior that is readily modified with specific prior information once it becomes available. They set an equivalent sample size of 4 to correspond with the maximum number of categories in a situation, as recommended by [neapolitan2004learning]. Figure 8 shows a subtree of the first process-structure which is rooted at the situation representing community-dwelling individuals. Table 1 gives the transition probability prior and posterior for each situation in this subtree. Table 2 gives the prior and posterior on the scale parameter of the Weibull distributions representing the holding time at a situation for each of its emanating edges, together with the known shape parameter of these Weibull distributions.

Figure 8: Subtree of the first process-structure, rooted at the situation representing community-dwelling individuals.
Situation | Prior | Posterior
$s_1$ | Dir(1, 1) | Dir(928, 4858)
$s_2$ | Dir(1/4, 1/4, 1/4, 1/4) | Dir(634, 259, 265, 53)
$s_3$ | Dir(1/3, 1/3, 1/3) | Dir(953, 3663, 241)
$s_4$ | Dir(1/12, 1/12, 1/12) | Dir(291, 301, 42)
$s_5$ | Dir(1/12, 1/12, 1/12) | Dir(124, 118, 17)
$s_6$ | Dir(1/12, 1/12, 1/12) | Dir(203, 51, 11)
$s_7$ | Dir(1/6, 1/6) | Dir(866, 87)
$s_8$ | Dir(1/48, 1/48, 1/48, 1/48) | Dir(177, 26, 52, 36)
$s_9$ | Dir(1/48, 1/48, 1/48, 1/48) | Dir(81, 9, 22, 12)
$s_{10}$ | Dir(1/48, 1/48, 1/48, 1/48) | Dir(115, 20, 43, 25)

Table 1: Priors and posteriors for the transition probability parameters of the situations in the subtree given in Figure 8.
Edge | Shape parameter | Prior | Posterior
1st | 2.5 | IG(1/4, ·) | IG(634, 65086315.91)
2nd | 1.8 | IG(1/4, ·) | IG(259, 1021178.93)
3rd | 0.5 | IG(1/4, ·) | IG(265, 735.04)

Table 2: The shape parameter of the Weibull distributions governing the holding times in the situation concerned, together with the prior and posterior distributions on the scale parameter of these Weibull distributions. The holding time is dependent on the stage currently occupied and the one that will be occupied next; hence each row corresponds to one edge emanating from the situation.

For the falls example, if we were to determine the scale parameters of the Inverse-Gamma distributions using the method described in [barclay2015dynamic], we would have to ensure that all the phantom counts were greater than one so that the mean exists. This would result in a very large equivalent sample size, and the transition probability priors would no longer be weakly informative. We instead use the alternative method described above and set the prior scale using the known shape parameter of the Weibull distribution governing the holding times at each stage before moving along its $i$th edge.

4.3 Structure learning

Agglomerative Hierarchical Clustering (AHC) [freeman2011bayesian] and dynamic programming [cowell2014causal, silander2013dynamic] algorithms have been employed to learn CEG structure from data. The only model selection algorithm for a dynamic variant of a CEG is presented in [collazo2018ntime] for the NT-DCEG. In this section, we discuss model selection for the RDCEG. As with a semi-Markov process, the RDCEG models a stochastic process that has two types of sub-processes evolving simultaneously. One of these sub-processes describes the states occupied as the process evolves, and the other describes the time evolution of the process in terms of the time spent at each state. The construction of CEGs and their dynamic variants can be viewed as a methodology for clustering the different pathways that evolve out of a process whose evolution can be described in terms of events. This clustering occurs in the form of stage creation. As noted in Section 2.2, stages can be created to cluster situations based on the equivalence of their transition probability distributions or on the equivalence of the holding time distributions for each edge emanating from these situations. The staging for the transition probabilities and the holding times need not be the same. In this case, the type of analysis will determine the staging that needs to be considered. For instance, it would seem sensible to use the staging associated with the transition probabilities when reading conditional independence statements, and to use the staging associated with the holding times to estimate first passage times by using the underlying semi-Markov process.

Recall from Equation 1 that the likelihood separates even in the case of the staging for the transition probabilities and the holding times being different. Assume that the set $\mathbb{U}$ gives the staging for the transition probabilities and the set $\mathbb{V}$ gives the staging for the holding times. Assuming all the RDCEG structures are a priori equally likely, the marginal likelihood for an RDCEG $\mathcal{D}$ is given by

$$q(\mathcal{D}) = \prod_{u \in \mathbb{U}} q(u) \prod_{v \in \mathbb{V}} q(v),$$

where $q(u)$ is the marginal likelihood of the transition counts at stage $u$ and $q(v)$ is the marginal likelihood of the holding times at stage $v$. Using logs of the marginal likelihoods simplifies the use of the Bayes Factor for comparing potential structures. The log Bayes Factor for comparing RDCEGs $\mathcal{D}_1$ and $\mathcal{D}_2$ is then given by

$$\log \text{BF}(\mathcal{D}_1, \mathcal{D}_2) = \log q(\mathcal{D}_1) - \log q(\mathcal{D}_2).$$
In Example 1, we assume that the staging of situations according to the transition probability distributions corresponds exactly with the staging according to the holding times. In this case, we can learn the RDCEG structure by employing the AHC algorithm on the transition probability distributions: it sequentially merges the pair of stages whose combination gives the greatest local improvement in score, thus improving the overall log marginal score of the RDCEG. The RDCEG thus learned for Example 1 is given in Figure 9.
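The score driving the AHC search is the log marginal likelihood of the staging, which for the Dirichlet component has a closed form; merging two stages pools their counts and their phantom counts. The sketch below shows the score and one greedy step. It is a compact rendering of the algorithm in [freeman2011bayesian], restricted to the transition probability component; the gain it computes is exactly a log Bayes Factor between the merged and unmerged stagings.

```python
from math import lgamma
from itertools import combinations

def log_marginal(alpha, counts):
    """Closed-form log marginal likelihood of one stage under Dirichlet(alpha)."""
    score = lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(counts))
    return score + sum(lgamma(a + n) - lgamma(a) for a, n in zip(alpha, counts))

def ahc_step(stages):
    """One greedy AHC step. `stages` maps a stage name to (alpha, counts),
    with edge orderings already aligned. Returns the best merge, or None."""
    best, best_gain = None, 0.0
    for u, v in combinations(stages, 2):
        (a_u, n_u), (a_v, n_v) = stages[u], stages[v]
        merged = ([x + y for x, y in zip(a_u, a_v)],
                  [x + y for x, y in zip(n_u, n_v)])
        gain = (log_marginal(*merged)
                - log_marginal(a_u, n_u) - log_marginal(a_v, n_v))
        if gain > best_gain:                 # merge only if the score improves
            best, best_gain = (u, v), gain
    return best
```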

Figure 9: RDCEG structure learned from the data by using the AHC algorithm to obtain staging of the situations based on their transition probability distributions. Positions with the same colour (except for black) are in the same stage.

5 Reasoning in RDCEGs

The popularity of Bayesian Networks (BNs) [korb2010bayesian, neapolitan2004learning, cowell2006probabilistic] can be largely attributed to the availability of techniques to read conditional independence relations from them. Much of this work was due to the d-separation theorem developed by Pearl, Geiger and Verma [verma1990causal, geiger1990d]. However, BNs suffer from two major shortcomings. They are unable to graphically represent asymmetries introduced by structural zeroes and context-specific conditional independencies.

Context-specific information can be incorporated into a BN in different ways, as demonstrated by [poole2003exploiting, boutilier1996context], but these approaches are non-graphical and absorb such information into the conditional probability tables. An exception is the modification suggested in [geiger1996knowledge], known as Bayesian Multinets, which, although graphical, does not provide a unified graphical representation and becomes impractical when the amount of context-specific information is large. Graphical models have been a powerful tool for effectively feeding back conclusions to clients. When essential information gets absorbed into the statistical tools but cannot be expressed graphically, the client is being disempowered. Chain Event Graphs were developed to represent and analyse such asymmetric information and address both the shortcomings mentioned above. In [smith2008conditional], it is shown that the CEG provides a much richer and more flexible framework to model information that cannot be represented by a BN, and that it contains all discrete BNs as a special case. With variables described using the concepts of cuts and fine cuts, the authors demonstrated how to read independence relations from a CEG. In [thwaites2015separation], a separation theorem is proved for a class of CEGs called simple CEGs or sCEGs; it is analogous to the d-separation theorem for BNs.

5.1 Comparison with dynamic variants of BNs

Although techniques for reading conditional independence relationships have been developed and demonstrated for the CEG, analogous techniques for its dynamic variants have largely been unaddressed. The exception to this is the NT-DCEG [collazo2018ntime], to which some of these methods have been extended. In this paper, we develop similar techniques to read conditional independence statements from an RDCEG. More importantly, we discuss the interpretation of such statements for an RDCEG. The probability measure does not need to be added to the graph at this point. It can be substituted by hypotheses made by experts concerning the symmetries in the trajectories of individuals. We only need a stage structure, or a hypothesised stage structure, to be in place to enable us to read conditional independencies. Besides reading conditional independencies, harnessing the properties of the underlying semi-Markov structure of the RDCEG also allows us to answer queries that are not typically posed to graphical models. This vastly extends the scope of this class of models by merging two well-developed technologies to learn more about the system being modelled. We will first discuss why we are able to read independence statements from the RDCEG and other dynamic variants of the CEG while being unable to do so with the dynamic variants of the BN. This discussion highlights the far-reaching effects of describing the evolution of a process through events rather than random variables.

The dynamic variants of BNs, such as Dynamic Bayesian Networks (DBNs) [murphy2002dynamic, ghahramani1998learning] and Continuous Time Bayesian Networks (CTBNs) [nodelman2002learning, nodelman2002continuous], graphically represent the relationships between random variables for processes developing on a longitudinal time scale. DBNs consist of several component Bayesian Networks over the same variable set, and these component BNs are connected by temporal edges to demonstrate the dependence carried forward from one component BN to the next as the process develops over time. DBNs were initially developed to unify the developments made in BN technologies and traditional time-series analysis [dagum1992dynamic]. CTBNs represent the evolution of a process on a continuous time scale. They were developed to provide a framework to express processes where time evolution is an essential part of the information we want to learn and where the continuous time evolution of variables might lead to cyclic relations [nodelman2002continuous]. The shift to the continuous time domain prevents the loss incurred by discretisation into time-steps of uniform granularity in scenarios where this loss of information is thought to be particularly significant [nodelman2002learning].

Because the BN and its dynamic variants are centred around the description of process evolution through random variables, while we are able to make conditional independence statements for the BN, doing so becomes contrived in the dynamic case. While dealing with random variables, we would like to make conditional independence statements such as $X$ is independent of $Y$ given $Z$. Unless the process decomposes into non-communicating sub-processes, even if we can say $X_t \perp Y_t \mid Z_t$ at time $t$ (where $\perp$ denotes probabilistic independence), we will have that $X$ at time $t$ communicates with $Y$ at some later time $t' > t$ given the trajectory of $Z$ from time $t$ to time $t'$. For a convincing graphical explanation, see [boyen1998tractable], where the authors use a 2 time-slice BN to demonstrate the concept of temporal entanglement. A similar problem is faced by CTBNs [nodelman2002continuous]. We note that when expressing Kalman Filter models (KFMs) or Hidden Markov models (HMMs) as DBNs, it is common practice to say that the DBN captures all the conditional independence assumptions made by the corresponding KFM or HMM [murphy1999modelling, ghahramani1998learning, murphy2002dynamic]. In this context, the conditional independence assumptions refer to the conditional independence relations between these variables at different time-slices. For instance, one such statement could be $X_{t+1} \perp X_{t-1} \mid X_t$. While this is a valid form of a conditional independence statement, we cannot make any global statements about the full trajectories of the variables, which are the descriptive tools of a DBN, unless the process devolves into independent sub-processes.

The RDCEG and other types of DCEGs do not face the problem of temporal entanglement because the evolution of the process is described through events. The events can be defined to suit the query that needs to be answered, enabling us to read the appropriate conditional independence statements from the DCEG. DCEGs also inherit from CEGs the property of graphically expressing context-specific conditional independence statements. Note that events in the CEG family are more than simply realisations of the random variables used in the corresponding BN family, as they play a pivotal role in providing a real-world description of the trajectories an individual’s life can take as they pass through the system. For instance, in Example 1, an event could correspond to falls among individuals who do not receive treatment, which would be given by the union of the relevant root-to-sink paths.

5.2 Defining filtrations

In Appendix A.1, we defined a probability measure over the entire infinite tree by sequentially constructing the tree from the invariant subtree and the repeating subtrees, and by appealing to Kolmogorov’s extension theorem. This construction gives us a corresponding probability measure over the modified event tree, which enables us to define random variables to read conditional independence relations from the topology of any Dynamic CEG. Analogous to the fine cut and cut concepts representing the filtration process of a CEG [smith2008conditional], we define these concepts for an RDCEG. Let $\mathcal{C}^k(\mathcal{D})$ refer to the associated modified infinite CEG of an RDCEG $\mathcal{D}$ rolled out up to $k$ process-structures.

Definition 13 (Fine cut)

A collection of positions $W$, a subset of the set of positions in the modified infinite CEG $\mathcal{C}^k(\mathcal{D})$ associated with an RDCEG $\mathcal{D}$, is called a fine cut if all terminating and non-terminating paths of the RDCEG explicitly shown in the topology of the CEG pass through at least one $w \in W$.

Notice that while the fine cut concerns all root-to-leaf paths in a vanilla CEG, the analogous concepts for an RDCEG, described through its associated modified infinite CEG rolled out up to $k$ process-structures, concern the terminating and non-terminating paths of the underlying RDCEG that are explicitly shown in the graph. We are thus excluding the paths that terminate in the immune state and those that terminate in some position $w$ by never transitioning out of $w$. This is because we are interested in the evolution of individuals while they are still alive and present in the system. Units that leave the system by entering the immune state, and units that enter some non-absorbing position but never transition out of it, are not considered to be evolving and hence do not directly inform the development of the process.

Definition 14 (Cut)

A collection of stages $U$, a subset of the set of stages in the modified infinite CEG $\mathcal{C}^k(\mathcal{D})$ associated with an RDCEG $\mathcal{D}$, is called a cut if all terminating and non-terminating paths of the RDCEG explicitly shown in the topology of the CEG pass through at least one position $w \in u$ for some $u \in U$.

Let a fine cut and a cut be orthogonal if all the positions and stages comprising the fine cut and the cut respectively correspond to the same random variable within that process-structure. For instance, all the stages in a particular cut might correspond to the random variable of Treatment within a particular process-structure.

5.3 Reading conditional independence statements

In this subsection, we demonstrate how to construct random variables using a cut. These methods can be used in a similar way for fine cuts. The random variables of interest to us are those that are measurable with respect to the $\sigma$-algebra of the modified tree, as they partition its atoms into events. We define three random variables using cuts that help us construct conditional independence statements. Let $U$ be a cut in a modified infinite CEG $\mathcal{C}^k(\mathcal{D})$ of an RDCEG $\mathcal{D}$ lying in the $i$th process-structure, where $i < k$. This cut need not be orthogonal. Let $X_U$ be the random variable associated with the parents of each $u \in U$; the state space of $X_U$ is the set $U$, so the realisation of $X_U$ tells us which stage is occupied by the unit. Let $Z_U$ be the random variable associated with the paths from the root of $\mathcal{C}^k(\mathcal{D})$ into the positions constituting the stages of $U$; its realisation records how a unit arrives at the cut. Finally, let $Y_U$ be the random variable associated with all the paths out of each $w \in u$, for every $u \in U$, in the modified infinite CEG $\mathcal{C}^k(\mathcal{D})$. The modified infinite CEG is rolled out up to $k$ process-structures, and thus the future trajectories of a unit occupying a stage in the $i$th process-structure, given by $Y_U$, describe the evolution of that unit up to $k - i$ process-structures ahead. This definition also includes the possible trajectory in which the unit transitions to some position downstream of the cut and stays there indefinitely. Thus $Y_U$ is the random variable denoting all the finite-length future trajectories from $U$ up to the $k$th process-structure which do not terminate in the absorbing immune state. This gives us the following conditional independence statement

$Y_U \perp\!\!\!\perp Z_U \mid X_U \qquad (6)$

where this conditioning statement implicitly assumes that the unit did not enter the immune state and did not stay indefinitely in any position upstream of the positions forming the stages of $U$. That is, the unit has reached some position $w \in u$ for some $u \in U$ and transitions out of $w$ to some position $w'$ such that $w' \neq w$ and $w'$ is not the immune state. While we also assume that the future trajectory of a unit after reaching some $w \in u$, given by the random variable $Y_U$, does not include the possibility of it entering the immune state, the independence statement would hold even if this possibility were included in the definition of $Y_U$. Thus this independence statement says that once we know the stage occupied by a unit along a cut $U$, additional information about how it reached that stage provides no further insight for predicting what may happen to the unit up to a finite number of steps ahead (precisely, up to $k - i$ process-structures ahead of $U$) once it leaves $U$.
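As a concrete illustration of statement (6), the following minimal Python sketch simulates units arriving at a single stage via two different histories and checks that the empirical distribution of the next development is the same for both routes. The history labels and arrival frequencies are hypothetical; the stage's transition probabilities are borrowed from the conditional mean vector of the first stage in Table 3.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Units arrive at the same stage u via one of two histories; the
# arrival frequencies here are hypothetical.
route = rng.choice(["history A", "history B"], size=n, p=[0.4, 0.6])

# Every unit in stage u shares the stage's transition distribution
# (conditional means of the first stage in Table 3), so the next
# development is sampled without reference to the route taken.
events = np.array(["Referred & treated", "Not referred & treated", "Not treated"])
next_event = rng.choice(events, size=n, p=[0.55, 0.22, 0.23])

# The empirical conditional distributions agree up to Monte Carlo
# error, mirroring Y_U independent of Z_U given X_U in statement (6).
for r in ["history A", "history B"]:
    mask = route == r
    print(r, np.round([np.mean(next_event[mask] == e) for e in events], 3))
```

The agreement is of course built in by construction, which is exactly the point: placing two histories in the same stage is the modelling assertion that the past is screened off from the future.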

In the dynamic setting, if a process develops into a number of independent sub-processes such that the positions constituting these sub-processes do not communicate with one another after splitting from some common ancestor $w$, then we can read additional conditional independence statements as described below. This corresponds to being able to read conditional independence relations from a DBN or CTBN only when the process devolves into non-communicating sub-processes.

Suppose for an RDCEG $\mathcal{D}$ the underlying process develops into two distinct non-communicating sub-processes $\mathcal{P}_1$ and $\mathcal{P}_2$ after splitting from the most recent common ancestor $w$. Let $X_w$ be the random variable associated with position $w$ such that its state space is given by $ch(w)$, the set of children of $w$ in $\mathcal{D}$. As $w$ splits into two non-communicating sub-processes, $|ch(w)| = 2$, without counting the suppressed edge going into the immune state and possibly a suppressed self-loop. Let $Y_1$ be a random variable whose state space consists of all the finite and infinite paths rooted at the child $w_1$ of $w$ at which $\mathcal{P}_1$ is rooted. Similarly, define the random variable $Y_2$ for the child $w_2$ at which $\mathcal{P}_2$ is rooted. Let $Z_w$ denote the random variable associated with all the paths from the root position to position $w$ of the RDCEG $\mathcal{D}$. As we did earlier, care must be taken when interpreting the conditional independence statements by accounting for the immune state and the suppressed self-loops. Thus, given that a unit reached position $w$ without entering the immune state (which is implied, as the immune state is absorbing by construction), and given where the unit transitions to from position $w$, assuming that this transition is not into the immune state, we can say that the future evolution of the path traversed by such a unit is independent of the path it took to reach position $w$. This conditional independence statement is written as

$Y_i \perp\!\!\!\perp Z_w \mid X_w, \quad i = 1, 2 \qquad (7)$

This is analogous to the independence statement described in statement (6). As sub-processes $\mathcal{P}_1$ and $\mathcal{P}_2$ develop independently after separating from position $w$, we only need to know the position occupied by a unit after transitioning from position $w$ to make certain statements about its future evolution. Thus, given that a unit which has reached position $w$ transitions to one of its children in the RDCEG $\mathcal{D}$, i.e. does not transition into the immune state, the events attributable to the evolution of the sub-process $\mathcal{P}_1$ after leaving position $w$ and those attributable to the evolution of the sub-process $\mathcal{P}_2$ after leaving position $w$ are independent of each other. This is given by

$Y_1 \perp\!\!\!\perp Y_2 \mid X_w \qquad (8)$

Finally, we briefly describe the distinction between the two broad categories that conditional independence statements fall under when they are read from the topology of the RDCEG $\mathcal{D}$ itself rather than from its rolled-out modified infinite CEG $\mathcal{C}^k(\mathcal{D})$. Recall that while the RDCEG is finite, the modified event tree supporting it is infinite and periodic. This implies that we can construct its underlying modified event tree from an invariant subtree and a finite number of repeating subtrees. Each repeating subtree can be coloured to form a staged tree and then transformed into a CEG by collapsing nodes in the same position and adding an additional position which collects all the leaves of the subtree. Conditional independence statements can be read from this CEG as described in [smith2008conditional]. Irrespective of how a unit reaches a position corresponding to a situation in such a repeating subtree, the same conditional independence relations that can be read from the CEG of that subtree apply to the unit in the RDCEG $\mathcal{D}$. This forms the first category of conditional independence statements. The other category refers to conditional independence statements that involve the entire topology of the RDCEG; statements (7) and (8) belong to this second category.

5.4 Answering queries with reference to the underlying semi-Markov structure

Graphical models (such as Bayesian Networks, chain graphs and Markov random fields) and stochastic processes (such as Markov processes, renewal processes and semi-Markov processes) have for the most part been developed separately. While the former are typically concerned with independence structures and causal pathways, the latter describe the evolution of a stochastic process over time. The RDCEG is a probabilistic graphical model and inherits several useful properties of CEGs and DCEGs. It also has an underlying semi-Markov structure, and this enables us to use technologies developed for semi-Markov processes to answer queries about the RDCEG. Theorem 1 in [barclay2015dynamic] proves that an Ex-DCEG in which the set of positions and the set of stages are identical, and in which no vertex has two edges emanating from it into the same child, is a semi-Markov process. While this is an interesting result, it is more useful to show how any RDCEG (or Ex-DCEG) can be written as a semi-Markov process whose state space contains only the vertices that help us answer the queries of interest. This also provides a compact representation of the RDCEG.

In [collazo2018ntime], it is demonstrated how to write an NT-DCEG as a Markov chain with a state space given by the roots of the repeating subtrees of the process-structure forest. This representation is particularly useful when we want to answer probabilistic queries about the stage of the RDCEG occupied by an individual in the future. A semi-Markov process has two simultaneously evolving sub-processes: one concerning the state occupied by the individual, and the other concerning the time spent by the individual in each state. The semi-Markov representation helps us answer similar queries about which stage the individual will occupy in the future, but we can additionally answer queries about how long it takes to reach a particular stage of the RDCEG. We will explore this in the form of a first-passage query for Example 1. First we formally introduce the semi-Markov process.

Definition 15 (Semi-Markov process)

Consider a stochastic process $Z = \{Z(t) : t \geq 0\}$ on a discrete state space $S$. The position of the process at the $n$th transition is given by $X = \{X_n : n \in \mathbb{N}_0\}$, the jump times by $T = \{T_n : n \in \mathbb{N}_0\}$, and the holding time in $X_n$ before the transition to $X_{n+1}$ by $H = \{H_{n+1} : n \in \mathbb{N}_0\}$, where $H_{n+1} = T_{n+1} - T_n$. We call $Z$ a semi-Markov process if the following condition holds

$P(X_{n+1} = j, H_{n+1} \leq t \mid X_0, \ldots, X_n, T_0, \ldots, T_n) = P(X_{n+1} = j, H_{n+1} \leq t \mid X_n) \qquad (9)$

where $j \in S$ and $t \geq 0$. The sequence defined by the tuples $\{(X_n, T_n) : n \in \mathbb{N}_0\}$ is called a Markov renewal process.

From Definition 15, a semi-Markov process is constructed from a Markov renewal process, which is defined by its renewal kernel $Q(t) = [Q_{ij}(t)]_{i,j \in S}$ and its initial distribution $p$, where $p_i = P(X_0 = i)$ [grabski2014semi, grabski2016concept]. The $(i,j)$th entry of the renewal kernel is given by

$Q_{ij}(t) = P(X_{n+1} = j, H_{n+1} \leq t \mid X_n = i). \qquad (10)$

The process is semi-Markov because the jump process described by $X$ is Markovian, while the holding times $H$ depend on the current state as well as on the state occupied next. Note that the holding time can instead be defined to depend on the current state and the previous state if that is more appropriate for the process being modelled. Another important point about semi-Markov processes follows from Equation (9): the state next occupied by the process depends only on the current state and is not influenced by the time spent in that or any other state. This may be inappropriate for some situations. For instance, if we are modelling the post-surgery recovery of individuals using a semi-Markov process, the length of time spent in the intensive care unit (ICU) before being moved to the general ward is likely to influence the health state an individual occupies next. However, this type of modelling scenario can be accommodated within the RDCEG model. In the post-surgery recovery example, individuals can be clustered based on the covariates considered to have the greatest influence on their recovery after moving out of the ICU. Each cluster then forms a stage, and each stage has its own holding time distributions and transition probabilities. This shortcoming of semi-Markov processes can thus be effectively addressed within the RDCEG through its stages and positions, which enhances its desirability as a modelling tool.
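To make Definition 15 and the renewal kernel (10) concrete, here is a minimal simulation sketch in Python: the kernel is factorised into an embedded transition matrix and edge-specific Weibull holding times, echoing the RDCEG's flexibility in the choice of holding time distributions. All state names, transition probabilities and Weibull parameters below are hypothetical placeholders, not values estimated from the falls study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical state space and embedded Markov chain: P[i, j] is the
# probability that the next state is j given the current state is i.
states = ["low-risk", "high-risk", "communal establishment"]
P = np.array([[0.0, 0.8, 0.2],
              [0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0]])      # final state treated as absorbing

# Edge-specific Weibull holding times, so H_{n+1} depends on the current
# state AND the state occupied next (hypothetical shapes and scales).
shape = np.array([[1.0, 1.5, 1.2],
                  [0.9, 1.0, 1.3],
                  [1.0, 1.0, 1.0]])
scale = np.array([[1.0, 2.0, 4.0],
                  [1.5, 1.0, 3.0],
                  [1.0, 1.0, 1.0]])

def simulate(i0=0, t_max=50.0):
    """Return one trajectory [(X_n, T_n)] of the semi-Markov process."""
    i, t, traj = i0, 0.0, [(i0, 0.0)]
    while t < t_max and P[i, i] < 1.0:              # stop once absorbed
        j = rng.choice(len(states), p=P[i])         # Markovian jump, Eq. (9)
        h = scale[i, j] * rng.weibull(shape[i, j])  # holding time H_{n+1}
        t += h
        traj.append((j, t))
        i = j
    return traj

for x, t in simulate():
    print(f"t = {t:6.2f}: in state '{states[x]}'")
```

In an RDCEG application, each Weibull would be replaced by whatever edge-specific holding time family was elicited or estimated; this is exactly the flexibility the semi-Markov structure affords.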

Example 1 (continued)

Suppose that we are given that a community-dwelling individual is currently at a high risk of falling. The researchers would like to know the probability that this individual moves to a communal establishment, and the expected time until that move, if it happens at all. They do not want to consider the probability of the individual staying in any of the stages indefinitely (e.g. never having a fall) or of the individual moving to the immune state. To answer these questions under these constraints, we must first construct the state transition diagram of the underlying semi-Markov process and embellish it with the necessary information, which can be extracted from the RDCEG.

At each stage, the RDCEG provides transition probabilities out of that stage conditional on entering it. We refine these further to be conditional on the process transitioning into a new stage and on that stage not being the immune state. Figure 10 shows the state transition diagram of the semi-Markov process required to answer the above queries. Table 3 gives the posterior Dirichlet distribution at each state of this semi-Markov process, along with its mean vector and conditional mean vector. Table 4 gives the mean holding time for each edge of the semi-Markov process.

Figure 10: The state transition diagram of the semi-Markov process required to answer the queries presented in Example 1.
Stage   Posterior distribution      Mean vector                 Conditional mean vector
1       Dir(634, 259, 265, 53)      (0.52, 0.21, 0.22, 0.05)    (0.55, 0.22, 0.23)
2       Dir(415, 419, 59)           (0.46, 0.47, 0.07)          (1)
3       Dir(203, 51, 11)            (0.77, 0.19, 0.04)          (1)
4       Dir(373, 55, 117, 73)       (0.60, 0.09, 0.19, 0.12)    (0.68, 0.10, 0.22)
Table 3: The conditional mean probability vector gives the necessary probabilities for Figure 10. The mean probability vector and the conditional mean probability vector are both conditional on arriving at the position they correspond to.

To arrive at the parameter vector of the posterior Dirichlet distribution for a stage, we simply perform element-wise addition of the parameter vectors of the Dirichlet distributions of the situations that form that stage. For instance, the first state in Figure 10 is a stage made up of two situations in Figure 8; its vector of Dirichlet parameters is obtained by element-wise addition of the Dirichlet parameter vectors of those two situations, as can be seen in Tables 1 and 3. Similarly, the parameters of the Inverse-Gamma distribution on an edge emanating from a stage are obtained by element-wise addition of the parameters of the Inverse-Gamma distributions on the corresponding edges of the situations contained in that stage. To calculate the conditional mean vector, we need to condition on our constraints, which in this case are the individual not entering the immune state and not remaining indefinitely in the current state. For the first stage, the mean probability vector is (0.52, 0.21, 0.22, 0.05) for transitions along the edges representing “Referred & treated”, “Not referred & treated”, “Not treated” and “Immune state” respectively. Recall that the edge for the “Immune state” is suppressed in the graphical representation of the RDCEG. Let the positions at the ends of the edges emanating from this stage be numerically indexed in the order given above, and let $X$ denote the transition variable out of the stage, so that $X = 4$ represents a transition into the immune state. Writing $\bar{\theta}_k$ for the $k$th element of the mean probability vector, the conditional mean vector value at element $k$, for $k \in \{1, 2, 3\}$, is obtained as

$\bar{\theta}^c_k = P(X = k \mid X \neq 4) = \bar{\theta}_k / (1 - \bar{\theta}_4).$
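The calculations above, and the first-passage query posed by the researchers, can be sketched in a few lines of Python. The Dirichlet counts are those of the first stage in Table 3; the transition matrix P and mean holding time matrix M are hypothetical stand-ins for the values that Figure 10 and Table 4 would supply. Expected first-passage times $m_i$ solve the linear system $m_i = \sum_j P_{ij}(M_{ij} + m_j)$ over the transient states.

```python
import numpy as np

# Conditional mean vector for the first stage of Table 3: condition on
# leaving the stage and the destination not being the immune state
# (the fourth, suppressed component).
alpha = np.array([634.0, 259.0, 265.0, 53.0])
print(np.round(alpha[:3] / alpha[:3].sum(), 2))      # -> [0.55 0.22 0.23]

# First-passage sketch for a three-state semi-Markov chain. P and M are
# hypothetical: in practice they come from the conditional mean vectors
# (Figure 10) and the mean holding times of Table 4.
# State order: 0 = high-risk, 1 = treated, 2 = communal establishment.
P = np.array([[0.0, 0.7, 0.3],
              [0.4, 0.0, 0.6],
              [0.0, 0.0, 1.0]])       # target state treated as absorbing
M = np.array([[0.0, 2.0, 6.0],        # mean holding time on each edge
              [3.0, 0.0, 4.0],
              [0.0, 0.0, 0.0]])

target, transient = 2, [0, 1]
A = np.eye(2) - P[np.ix_(transient, transient)]
h = np.linalg.solve(A, P[transient, target])   # hitting probabilities
r = (P * M).sum(axis=1)[transient]             # mean time spent per jump
m = np.linalg.solve(A, r)                      # expected first-passage times
# With this illustrative P the target is reached with probability 1
# (h = 1), so m is an unconditional expectation; with leakage to other
# absorbing states one would first condition the chain on hitting.
print("P(reach communal | high-risk)   =", round(h[0], 3))
print("E[time to communal | high-risk] =", round(m[0], 2))
```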

We now shift our focus to estimating the holding time distribution on each edge of the semi-Markov representation of the RDCEG. Consider an edge of the semi-Markov process such that it is the only edge between its two endpoint states. We believe the holding time variable on this edge is governed by a Weibull distribution with an unknown scale parameter and a known shape parameter, where the scale parameter is itself distributed according to an Inverse-Gamma distribution with given shape and scale parameters. In effect, we are interested in the compounding distribution, which has pdf given by