Many real-world processes contain non-symmetric sample space structures. Examples of such processes can frequently be found in public health, medical diagnosis and treatment, risk analysis and policing (see collazo2018chain). Such asymmetries may arise due to the existence of structural zeros in the sample space of a variable conditional on the realisation of some other variable(s). By a structural zero we mean a conditional probability which by the very nature of the application must be zero - for example a logical impossibility. Thus, even in an infinitely large population, such an outcome would never be observed. It is easy to see how such asymmetries may give rise to context-specific conditional independences, which are independence relationships of the form $X \perp\!\!\!\perp Y \mid Z = z_1$ but $X \not\!\perp\!\!\!\perp Y \mid Z = z_2$, where $\perp\!\!\!\perp$ stands for probabilistic independence and the vertical bar shows the conditioning variables on the right. However, context-specific independences regularly arise naturally in many applications (zhang1999role).
Graphical models such as Bayesian Networks (BNs) are unable to fully describe asymmetric processes. They are primarily stymied in this respect as they force the process description onto a set of variables that are defined a priori. Indeed, in order to scale up BN methodologies so that they can be applied to large problems, good BN software contains functions that copy parts of one conditional probability table to another, thereby implicitly acknowledging and embedding context-specific phenomena. Thus, although BNs can implicitly embed context-specific independences through probability assignments within their conditional probability tables, this structural information is never explicitly represented in their topologies. Uncovering these independences requires serious modifications (typically involving trees in some form) to their standard representation and/or inferential process (boutilier1996context; zhang1999role; jabbari2018instance). Additionally, structural zeros are also hidden away in their conditional probability tables.
Chain Event Graphs (CEGs) are a family of probabilistic graphical models whose structural representation makes such asymmetries - whether structural zeros or context-specific conditional independences - explicit (collazo2018chain; smith2008conditional). This class embeds the class of finite discrete BNs as a special case (smith2008conditional). CEGs are constructed from event trees, which provide a natural and intuitive framework for describing the unfolding of any process through a sequence of events (shafer1996art). Although the size of an event tree increases linearly with the number of events involved in the evolution of the process, and may therefore become unwieldy for large complex processes, event trees are nonetheless easy for the statistician to elicit transparently from the natural language descriptions of a domain expert. Embedding structural zeros within an event tree is a matter of simply not drawing the corresponding branch in the tree (shenvi2018modelling). However, a more compact representation of an event tree that retains its properties and transparency is desirable. A CEG provides such a compact representation.
To obtain a CEG, we first transform an event tree into a staged tree by colouring its vertices to represent symmetries within its structure. The vertices of the staged tree are then merged to provide a more concise representation of these symmetries in the form of the graph of a CEG. Such a transformation results in a much simpler graph, often with an order of magnitude fewer vertices and edges than the generating tree. Like an event tree, a CEG also describes a process through a sequence of events. Thus it inherits the ability to graphically represent structural zeros from its underlying event tree. The CEG representation is especially useful because various implicit conditional independences, including those of a context-specific nature, hidden within the patterns of colouring of the tree can be read directly from its topology using sets of events called cuts and fine cuts (see smith2008conditional).
Several fast learning algorithms now exist for the CEG (freeman2011bayesian; silander2013dynamic; cowell2014causal; collazo2016new). The output of these algorithms is a staged tree. A staged tree typically must go through a sequence of non-trivial transformations before it represents the graph of a CEG. In fact, a CEG is uniquely defined by its staged tree, and we show that the staged tree can be recovered from the graph of the CEG alone.
In silander2013dynamic, the authors present an algorithm to transform a stratified staged tree into a stratified CEG (SCEG). A stratified staged tree (CEG) is one in which events broadly corresponding to the same variable are at the same distance from a leaf (the sink). Intuitively this corresponds to there being no events which become redundant conditional on the past events that have occurred. SCEGs have been studied extensively because any problem that can be represented by a finite discrete BN can also be represented within this wider class. In particular, the advantages of the CEG over a BN can be demonstrated (barclay2013refining; silander2013dynamic). However, we are increasingly finding many applications both in forensic science and in public health where the CEG representation is not stratified (shenvi2018modelling; collazo2018chain). So it is timely that automatic algorithms are available to make this transformation for any staged tree.
The contribution of our paper is threefold. First, we provide an algorithm that can transform any staged tree into a CEG and provide an optimal stopping time for this algorithm. Secondly, we prove that the transformation of a staged tree into a CEG, while making the reading of conditional independences easier, does not lead to the loss of any information. We do this by showing that the map from a staged tree to a CEG is bijective. Lastly, we provide Python code that obtains a staged tree using an Agglomerative Hierarchical Clustering (AHC) algorithm and then transforms it into a CEG using our algorithm. Unlike the existing ‘ceg’ R package (ceg_package), our code is not restricted to SCEGs and it also allows manual addition of edges with sampling zeros.
This paper is organised as follows. In Section 2 we review the notation and the preliminary concepts. In Section 3 we present our simple recursive backward algorithm - easily coded within supporting software - that can construct a CEG from any staged tree. Here we also prove some properties of the algorithm and of the transformation itself. We then demonstrate in Section 4 how the algorithm for compacting a stratified staged tree into an SCEG presented in silander2013dynamic can be adapted for the general case, and compare it to our algorithm with the optimal stopping time. We conclude the paper with a short discussion in Section 5.
2 Notation and Preliminaries
A CEG construction begins by eliciting an event tree of a process either from a domain expert or from existing literature. Alternatively, it can be constructed directly from data. Below we outline the transformations an event tree goes through to obtain the graph of a CEG:
Vertices in the event tree whose one step ahead evolutions, i.e. conditional transition probabilities, are equivalent are assigned the same colour to indicate this symmetry;
Vertices whose rooted subtrees (the subtree formed by considering that vertex as the root) are isomorphic - in the structure and colour preserving sense - are merged into a single vertex which retains the colouring of its merged vertices;
All the leaves of the tree are merged into a single vertex called the sink.
Here we consider a topical example. The staged tree in Figure 1 shows a hypothesised example of testing for a certain disease available to individuals exhibiting symptoms in three different settings: hospitals, care homes and in the general community. For simplicity, we assume here that the test is 100% sensitive and specific, and that we are only interested in the outcomes related to the disease. By “recovery” we collectively refer to those who recover and those who never had the disease. We further assume that death can only be caused by the disease in the time period considered. The coloured vertices represent equivalence of their conditional transition probabilities. For instance, the probability of dying is the same for individuals in hospitals and care homes who exhibit symptoms but do not get a test. The CEG for this staged tree is shown in Figure 2. It is not hard to see how this tree can be refined to be more realistic.
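Let $\mathcal{T}$ denote an event tree with a finite vertex set $V(\mathcal{T})$ and an edge set $E(\mathcal{T})$. An edge $e$ from vertex $v_i$ to vertex $v_j$ with edge label $l$ is an ordered triple given by $e = (v_i, v_j, l)$. Denote by $L(\mathcal{T})$ the set of leaves in $\mathcal{T}$. The non-leaf vertices in $\mathcal{T}$ are called situations and their set is denoted by $S(\mathcal{T}) = V(\mathcal{T}) \setminus L(\mathcal{T})$. The set of children of a vertex $v$ is denoted by $ch(v)$. Let $\Phi_{\mathcal{T}} = \{\theta_v \mid v \in S(\mathcal{T})\}$ where $\theta_v = (\theta(e) \mid e = (v, v', l) \in E(\mathcal{T}),\ v' \in ch(v))$ denotes the parameters for each vertex $v \in S(\mathcal{T})$.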
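As an illustration of this notation, the following minimal Python sketch stores an event tree as a list of ordered triples and derives the leaves, the situations and the children of each vertex from it. The toy tree, the vertex names (`s0`, `l1`, ...) and the function names are hypothetical and chosen only for this example; they are not taken from our code.

```python
from collections import defaultdict

# Hypothetical toy event tree: each edge is an ordered triple (parent, child, label).
edges = [
    ("s0", "s1", "hospital"),
    ("s0", "s2", "community"),
    ("s1", "l1", "recovery"),
    ("s1", "l2", "death"),
    ("s2", "l3", "recovery"),
    ("s2", "l4", "death"),
]


def vertices(edges):
    """Vertex set: every vertex that appears as an endpoint of some edge."""
    return {v for parent, child, _ in edges for v in (parent, child)}


def leaves(edges):
    """Leaves are the vertices with no emanating edge."""
    parents = {parent for parent, _, _ in edges}
    return vertices(edges) - parents


def situations(edges):
    """Situations are the non-leaf vertices."""
    return vertices(edges) - leaves(edges)


def children_map(edges):
    """Map each situation to the (child, label) pairs of its emanating edges."""
    ch = defaultdict(list)
    for parent, child, label in edges:
        ch[parent].append((child, label))
    return dict(ch)
```

Storing edges as triples keeps the edge label attached to the edge itself, which is what the definitions of stages and positions rely on.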
Two situations $v$ and $v'$ are said to be in the same stage whenever $\theta_v = \theta_{v'}$ and, if $\theta(e) = \theta(e')$, then $e = (v, \cdot, l)$ and $e' = (v', \cdot, l)$ for edge $e$ emanating from $v$ and $e'$ emanating from $v'$. The latter condition states that the edges emanating from situations in the same stage which have the same estimated conditional transition probability must also share the same edge label. Note that when edge labels are not fixed, this condition is relaxed. In this case, edges of vertices in the same stage are coloured to represent which edges share the same conditional transition probabilities. This allows the statistician and domain expert to retrospectively assign labels to events which have the same meaning but which could have initially been assigned different labels.
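One reading of the stage condition for fixed edge labels can be sketched in Python as follows. Here each situation is summarised by a dictionary from edge label to estimated conditional transition probability; this representation, the function name `same_stage` and the tolerance are assumptions made for the illustration only.

```python
import math


def same_stage(theta_v, theta_w, tol=1e-9):
    """Return True if two situations could share a stage.

    theta_v, theta_w: dict mapping edge label -> conditional transition
    probability for the edges emanating from each situation (assumed format).
    The situations match when they have the same edge labels and edges with
    the same label carry the same probability.
    """
    if theta_v.keys() != theta_w.keys():
        return False
    return all(
        math.isclose(theta_v[label], theta_w[label], abs_tol=tol)
        for label in theta_v
    )


# Hypothetical estimates for three situations.
hospital_no_test = {"recovery": 0.9, "death": 0.1}
care_home_no_test = {"death": 0.1, "recovery": 0.9}
community_no_test = {"recovery": 0.97, "death": 0.03}
relabelled = {"recovered": 0.9, "death": 0.1}
```

With fixed labels, `relabelled` does not match `hospital_no_test` even though the probabilities agree; under the relaxed condition a domain expert could first declare the two labels equivalent and then rerun the check.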
(Continued) The domain expert may decide that the edge labels “recovery” and “recovery can be treated as equivalent. Then situations and would be in the same position.
The collection of stages $\mathbb{U}$ partitions $S(\mathcal{T})$. Each stage $u \in \mathbb{U}$ is a set of situations in $S(\mathcal{T})$ that belong to the stage $u$. Stage memberships are represented by colouring the situations of $\mathcal{T}$ such that each non-trivial stage is represented by a unique colour. An event tree whose situations are coloured according to their stage memberships is called a staged tree and is denoted by $\mathcal{S}$. Situations in the staged tree whose rooted subtrees are isomorphic (in this paper, isomorphism is always in a structure and colouring preserving sense) have equivalent sets of parameters. That is, for two isomorphic subtrees $\mathcal{S}_{v_i}$ and $\mathcal{S}_{v_j}$ rooted at $v_i$ and $v_j$, $\Phi_{\mathcal{S}_{v_i}} = \Phi_{\mathcal{S}_{v_j}}$. In a non-technical sense, this implies that $v_i$ and $v_j$ have identical future evolutions. Situations whose rooted subtrees are isomorphic belong to the same position. The collection of positions $\mathbb{W}$ is a finer partition of $S(\mathcal{S})$ and each position $w \in \mathbb{W}$ is a set of situations of $S(\mathcal{S})$ that belong to the position $w$. Merging the situations in $\mathcal{S}$ which are in the same position and collecting all the leaves in $L(\mathcal{S})$ into a sink node denoted by $w_\infty$ result in the graph of a CEG $\mathcal{C}$ for the process being modelled. Thus a CEG is uniquely defined by its staged tree, or in other words, it is uniquely defined by the pair $(\mathcal{T}, \mathbb{U})$ where $\mathcal{T}$ is its underlying event tree and $\mathbb{U}$ is the set of stages. Notice that $\mathbb{U}$ and $\mathbb{W}$ are sets of sets and to disambiguate, we refer to sets of sets as collections in this paper.
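To make the distinction between stages and positions concrete, the sketch below groups situations into positions by computing a canonical signature of each rooted subtree: the stage of the root together with the labelled signatures of its children. This direct, top-down check is only an illustration of the definition and is not the backward recursion of Section 3. The input format, the toy tree and all names are hypothetical, and the sketch assumes that edge labels are distinct within a floret and that stage identifiers are strings.

```python
def positions(children, stage):
    """Group situations whose rooted subtrees are isomorphic.

    children: dict situation -> list of (child, label); leaves are not keys.
    stage:    dict situation -> stage identifier (its colour).
    Returns a list of sets, one set of situations per position.
    """
    cache = {}

    def signature(v):
        if v not in children:  # every leaf has the same trivial signature
            return ("leaf",)
        if v not in cache:
            labelled = sorted((label, signature(c)) for c, label in children[v])
            cache[v] = (stage[v], tuple(labelled))
        return cache[v]

    groups = {}
    for v in children:
        groups.setdefault(signature(v), set()).add(v)
    return list(groups.values())


# Hypothetical staged tree: s1, s2 and s3 share a stage, but the subtree
# below s3 is coloured differently, so s3 is in a different position.
children = {
    "s0": [("s1", "hospital"), ("s2", "care home"), ("s3", "community")],
    "s1": [("s4", "test"), ("l1", "no test")],
    "s2": [("s5", "test"), ("l2", "no test")],
    "s3": [("s6", "test"), ("l3", "no test")],
    "s4": [("l4", "recovery"), ("l5", "death")],
    "s5": [("l6", "recovery"), ("l7", "death")],
    "s6": [("l8", "recovery"), ("l9", "death")],
}
stage = {"s0": "u0", "s1": "u1", "s2": "u1", "s3": "u1",
         "s4": "u2", "s5": "u2", "s6": "u3"}

result = sorted(sorted(p) for p in positions(children, stage))
```

In this toy example the stage `u1` splits into the two positions `{s1, s2}` and `{s3}`, which illustrates why the collection of positions is a finer partition than the collection of stages.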
[Chain Event Graph] A Chain Event Graph (CEG) of a process represented by a staged tree with set of parameters is a directed acyclic graph with where is a set constructed by choosing a representative situation from each set in the collection . The edges in are constructed as follows: For a , create an edge for every edge , with where belongs to a set in which is represented by in . Additionally, retains the colouring of .
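The construction in this definition can be sketched as follows, assuming the positions are already known. Each position is represented by one of its situations, only representatives keep their emanating edges, every edge is redirected to the representative of its endpoint, and edges into leaves are redirected to the sink. The mapping `position_of`, the sink name `w_inf` and the toy tree are hypothetical choices for this example.

```python
def build_ceg(edges, position_of, sink="w_inf"):
    """Build the edge set of a CEG from a staged tree and its positions.

    edges:       list of (parent, child, label) triples of the staged tree.
    position_of: dict situation -> representative situation of its position.
    Returns a set of (source, target, label) triples.
    """
    situations = {parent for parent, _, _ in edges}
    ceg_edges = set()
    for parent, child, label in edges:
        if position_of[parent] != parent:
            continue  # edges of non-representatives are already covered
        target = position_of[child] if child in situations else sink
        ceg_edges.add((parent, target, label))
    return ceg_edges


# Hypothetical staged tree in which s1 and s2 are in the same position.
tree_edges = [
    ("s0", "s1", "hospital"),
    ("s0", "s2", "care home"),
    ("s1", "l1", "recovery"),
    ("s1", "l2", "death"),
    ("s2", "l3", "recovery"),
    ("s2", "l4", "death"),
]
position_of = {"s0": "s0", "s1": "s1", "s2": "s1"}
ceg = build_ceg(tree_edges, position_of)
```

Keeping the label inside each triple allows two edges with different labels to connect the same pair of vertices, as happens between `s0` and `s1` here.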
A floret of a vertex in any of these directed graphs is denoted by where and is the set of edges induced by in the graph. Denote the set of root-to-sink (root-to-leaf) paths in a CEG (event tree /staged tree ) by ( / ) where a path is a sequence of tuples of the form (‘vertex colour’, ‘edge label’) from the root vertex to the sink following the directed edges. Say that an event tree, staged tree or a CEG is stratified whenever the vertices representing the same type of event (e.g. severity of illness) have the same number of edges between them and the root vertex along any path connecting them, and otherwise say it is non-stratified. Non-stratified CEGs provide a truer representation of a wide range of processes containing structural zeroes (see e.g. shenvi2018modelling; shenvi2019bayesian).
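The path representation used here can be illustrated with a short sketch that enumerates root-to-leaf (or root-to-sink) paths as sequences of (‘vertex colour’, ‘edge label’) tuples. The graph format, the colour names and the toy example are assumptions for this illustration. The same function is applied to a small staged tree and to the CEG obtained by merging its two coloured situations, so that their path sets can be compared.

```python
def root_to_leaf_paths(children, colour, root):
    """List all directed paths from the root to a leaf or sink.

    children: dict vertex -> list of (child, label); leaves/sink are not keys.
    colour:   dict vertex -> colour (stage identifier) of that vertex.
    Each path is a tuple of (colour of source vertex, edge label) tuples.
    """
    paths = []

    def walk(v, prefix):
        if v not in children:  # reached a leaf or the sink
            paths.append(tuple(prefix))
            return
        for child, label in children[v]:
            walk(child, prefix + [(colour[v], label)])

    walk(root, [])
    return paths


colour = {"s0": "u0", "s1": "u1", "s2": "u1"}

# Hypothetical staged tree.
tree = {
    "s0": [("s1", "hospital"), ("s2", "care home")],
    "s1": [("l1", "recovery"), ("l2", "death")],
    "s2": [("l3", "recovery"), ("l4", "death")],
}

# Its CEG: s1 and s2 merged into s1, all leaves merged into the sink w_inf.
ceg = {
    "s0": [("s1", "hospital"), ("s1", "care home")],
    "s1": [("w_inf", "recovery"), ("w_inf", "death")],
}

tree_paths = root_to_leaf_paths(tree, colour, "s0")
ceg_paths = root_to_leaf_paths(ceg, colour, "s0")
```

In this example the two graphs produce the same four paths, each of length two.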
2.1 Why not just Staged Trees?
Staged trees are themselves a graphical representation of a parametric statistical model and they encapsulate within their colouring conditional independence information about the events describing the process (gorgen2016differential; gorgen2018equivalence). So why do we need CEGs when staged trees are powerful tools in themselves?
While we show that the staged tree and CEG representations are equivalent, the graph of a CEG is more compact and simpler. Typically, a CEG contains far fewer vertices and edges than its corresponding staged tree representation. As the number of ways in which a process could evolve increases, the staged tree becomes larger and larger, and thus becomes harder to visualise and evaluate. In fact, for dynamic processes, the staged tree is infinite but the corresponding CEG might be finite (barclay2015dynamic; shenvi2019bayesian). More importantly, technologies to read conditional independences from CEG graphs are becoming increasingly sophisticated while they are yet to be developed for staged trees (smith2008conditional; thwaites2015separation).
Note that there exists an interesting framework called conditional independence trees (CITs) (su2005representing; zhang2004conditional) which decompose decision trees into smaller subtrees by exploiting the conditional independence relationships (including those of the context-specific nature) exhibited by the problem. While CITs support exploration of conditional independences, their development is still very nascent and they were primarily designed for improving prediction on classification problems. As yet, they do not provide any formal method of causal manipulation within these models and it is hard to see how these models can be scaled to a dynamic variant. Most importantly, in our opinion, the representation they provide is too fragmented for a structured, unified understanding of the problem.
3 A Recursive Algorithm to Construct a CEG
In this section we present a simple recursive backward algorithm for constructing the graph of a CEG from any staged tree irrespective of whether it is stratified. We note that a variety of model selection techniques exist for this family (freeman2011bayesian; silander2013dynamic; cowell2014causal; collazo2016new). We do not discuss these in this paper. The outcome of any model selection algorithm for the CEG family is a collection of stages for its underlying event tree. This allows us to colour the situations of the event tree according to the stage memberships and thus, we can obtain its corresponding staged tree. Here we assume that we are only given the staged tree - obtained either as an output of a model selection algorithm or elicited by domain experts - from which we can deduce the collection of stages . The collection and the topology of the staged tree are then used to iteratively identify the collection of positions. The following recursion takes no more steps than steps where is the depth of the tree, such that
where denotes the cardinality of a sequence or set. Thus is the number of tuples on the longest root-to-leaf path in . The recursion progressively melds situations together according to the position structure incrementally more distant from the leaves of the staged tree. The iteration produces a sequence of coloured graphs where and is the graph of the CEG. Each of the graphs constructed in this recursion has the same root-to-leaf/sink paths, that is . Additionally, the following relationship holds
We specify our construction by writing the vertex and edge sets of each graph as a function of the vertex and edge sets of the graph . Note that the vertices in retain their colouring from the graph . Henceforth, we will say , , whenever the two graphs and are isomorphic. Say that a vertex is at a distance from the sink vertex (or equivalently, a leaf in a tree) if the shortest directed path from to the sink (or a leaf) contains tuples. Let be the set of vertices in a given graph such that every is at a distance of from the sink vertex (or a leaf) of the graph. We describe our iterative algorithm below.
Step 1: Initialisation. We first set where is the staged tree. The iteration begins by melding all the leaves of into a sink vertex . Define the following sets
where . Now we can write the vertex and edge sets of as
Step 2: Generalisation of the iterative process. Suppose that we have graph , . To construct the vertex and edge sets of from the vertex and edge sets of proceed as follows:
We first identify the situations in that belong to the same stage. Create a sub-collection informed by the collection of stages such that each situation belongs to only one set for some , and two situations belong to the same set if and only if there exists a stage such that . Thus, the collection gives us the stage structure for the vertices in .
We now construct a finer partition of the collection - call this collection - such that it partitions the situations in into positions. Each is replaced in by the sets , . Each situation belongs to only one set for some , and two situations belong to the same set if and only if there exists an edge for every edge . Thus, we have that , , , and .
Define the following terms for each , , ,
We now define the following terms to enable us to construct the vertex and edge sets of ,
where in which for , . With these, we construct the vertex and edge sets of as
We now prove that the above construction of actually results in a collection of positions of the vertices in . The associated theorem is stated below with a proof in Appendix A.1.
Given graph in the sequence of graphs transforming a staged tree to a CEG , two situations are in the same position if and only if they belong to the same stage and whenever their emanating edges share the same edge label, these edges enter the same downstream position in .
The recursion stops when the defined recursion step would imply , that is when graphs so that, in the notation introduced earlier we can let . We prove that this is indeed the optimal stopping time for the recursion in the theorem below. The proof can be found in Appendix A.2.
[Optimal stopping time] The recursion described above can be stopped either when recursive steps have taken place after constructing or when is isomorphic to for some where is the depth of the staged tree .
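The backward recursion with early stopping can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than our released implementation: the function name, the input format and the sink name `w_inf` are hypothetical. Note also that the sketch orders situations by the length of the longest directed path to a leaf, an assumption made here so that every child of a situation has already been processed before the situation itself is inspected. At each level, situations are merged when they share a stage and their equally labelled edges enter the same downstream vertex; the loop stops at the first level where nothing is merged.

```python
def staged_tree_to_ceg(edges, stage, sink="w_inf"):
    """Compact a staged tree into a CEG by a backward pass over levels.

    edges: list of (parent, child, label) triples of the staged tree.
    stage: dict situation -> stage identifier (its colour).
    Returns (ceg_edges, levels_inspected).
    """
    children = {}
    for parent, child, label in edges:
        children.setdefault(parent, []).append((child, label))

    height = {}  # longest directed path (in edges) from a vertex to a leaf

    def h(v):
        if v not in children:
            return 0
        if v not in height:
            height[v] = 1 + max(h(c) for c, _ in children[v])
        return height[v]

    for v in children:
        h(v)

    rep = {}  # merged situation -> representative of its position

    def find(v):
        return sink if v not in children else rep.get(v, v)

    levels = 0
    for k in range(1, max(height.values()) + 1):
        levels += 1
        groups = {}
        for v in children:
            if height[v] == k:
                key = (stage[v],
                       frozenset((l, find(c)) for c, l in children[v]))
                groups.setdefault(key, []).append(v)
        merged = False
        for members in groups.values():
            for v in members[1:]:
                rep[v] = members[0]
                merged = True
        if not merged:  # early stopping: no non-trivial position at level k
            break

    ceg_edges = {(p, find(c), l) for p, c, l in edges if find(p) == p}
    return ceg_edges, levels


# Hypothetical non-stratified staged tree.
edges = [
    ("s0", "s1", "a"), ("s0", "s2", "b"),
    ("s1", "l1", "x"), ("s1", "s3", "y"),
    ("s2", "l2", "x"), ("s2", "s4", "y"),
    ("s3", "l3", "p"), ("s3", "l4", "q"),
    ("s4", "l5", "p"), ("s4", "l6", "q"),
]
stage = {"s0": "u0", "s1": "u1", "s2": "u1", "s3": "u2", "s4": "u2"}
ceg_edges, levels = staged_tree_to_ceg(edges, stage)

# Same tree, but s3 and s4 are now in different stages: nothing merges at the
# first level, so the pass stops immediately.
stage_alt = dict(stage, s4="u3")
ceg_alt, levels_alt = staged_tree_to_ceg(edges, stage_alt)
```

In the first call, `s3` and `s4` are merged at the first level, which in turn allows `s1` and `s2` to be merged at the second. In the second call the pass inspects a single level only, although `s1` and `s2` still share a stage.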
3.1 Preservation of Information
Recall that the set of root-to-sink paths in a CEG is equivalent to the root-to-leaf paths in its underlying staged tree , i.e. . Here we show that we can reconstruct the staged tree uniquely from a given CEG. To construct a staged tree from a given CEG , proceed as follows:
Sort the paths in in ascending order of the length of the paths, where length is the number of tuples in the path.
Draw a root vertex of the staged tree . For each path of length where is a colour and is an edge label, draw an edge from and label it . Assign colour to .
Proceed to construct the staged tree in ascending order of the length of the paths. In general, for any path of length given by , there necessarily exists a path ending in a vertex, say in the staged tree constructed so far. To add the th tuple to this path, colour by , add a vertex and draw a directed edge from to with edge label .
This process results in a unique staged tree for a given CEG . This shows that no information is lost between the transformation from a staged tree to a CEG.
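The reconstruction above can be sketched as a prefix-tree insertion. Each root-to-sink path of the CEG, given as a sequence of (colour, label) tuples, is added tuple by tuple, reusing the part of the tree that an earlier path has already created. The integer vertex identifiers (root `0`), the function name and the toy paths are hypothetical choices for this illustration.

```python
def staged_tree_from_paths(paths):
    """Rebuild a staged tree from the root-to-sink paths of a CEG.

    paths: iterable of paths, each a tuple of (colour, edge label) tuples.
    Returns (edges, colour): edges are (parent, child, label) triples over
    integer vertex ids with root 0; colour maps each situation to its colour.
    """
    edges = []
    colour = {}
    child_of = {}  # (vertex, label) -> child vertex already drawn
    next_id = 1
    for path in sorted(paths, key=len):  # ascending order of path length
        v = 0
        for col, label in path:
            colour[v] = col
            if (v, label) not in child_of:
                child_of[(v, label)] = next_id
                edges.append((v, next_id, label))
                next_id += 1
            v = child_of[(v, label)]
    return edges, colour


# Hypothetical root-to-sink paths of a small CEG.
paths = [
    (("u0", "hospital"), ("u1", "recovery")),
    (("u0", "hospital"), ("u1", "death")),
    (("u0", "care home"), ("u1", "recovery")),
    (("u0", "care home"), ("u1", "death")),
]
tree_edges, tree_colour = staged_tree_from_paths(paths)
```

The merged vertex of the CEG reappears as two separate situations of the same colour in the rebuilt tree, one below each edge leaving the root.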
A model selection algorithm for SCEGs - although the stratified terminology was not used - based on dynamic programming was presented in silander2013dynamic. Additionally, the authors presented an algorithm to transform a staged tree to an SCEG. We show how, by adapting this algorithm, we arrive at our algorithm albeit with no optimal early stopping criterion, and so it searches the entire staged tree to identify the positions.
silander2013dynamic define the structure of a CEG for -dimensional data as a “layered directed acyclic graph with layers”. They assumed that the vertices in layer correspond to the same variable, say . They also assume that from each vertex in layer , there are exactly emanating edges, all entering vertices in layer . The staged tree to SCEG transformation algorithm states a weaker form of Theorem 3 without a proof and carries out a backward iteration from one layer to the previous one, all the way to the root, by merging situations which satisfy Theorem 3. However, it is easy to see that using their definition of layers, this algorithm fails for non-stratified CEGs where events do not necessarily satisfy a symmetric product space structure.
We adapt their algorithm so that layer in their algorithm corresponds to what we defined as set in Section 3. The main differences between the adapted version of their algorithm and ours are that (1) we provide an optimal stopping time, which saves the computational effort of searching the entire staged tree, and (2) we provide all the necessary proofs for our algorithm. For convenience, call their adapted algorithm the baseline algorithm and ours the optimal time algorithm. From Table 1 we can see that the optimal time algorithm takes less time than the baseline algorithm while arriving at the same CEG. This makes our algorithm more effective as the staged tree gets larger and larger.
The first four datasets are from the UCI repository (Dua:2019). The missing values were removed and sampling zeros were treated as structural. The fifth dataset is from the Christchurch Health and Development Study (CHDS) conducted at the University of Otago, New Zealand (see fergusson1986social) and the last two are simulated datasets which have structural zeros. It has been shown that the last three datasets also exhibit context-specific conditional independences (collazo2018chain; shenvi2018modelling; shenvi2019bayesian). Table 1 gives for each dataset the number of situations in the staged tree output by the AHC algorithm, the maximum depth of the staged tree, the time taken (in milliseconds) by the two compacting algorithms and the number of positions in the resulting CEG found by each of them. These experiments were carried out using our Python code (https://github.com/ashenvi10/Chain-Event-Graphs) on a 2.9 GHz MacBook Pro with 32GB memory. It is clear from this table that our algorithm is faster, as it can stop as soon as it reaches a level of the graph, that is, a set of situations at a common distance from the sink, in which no two situations are in the same position.
In this paper we have provided a simple iterative backward algorithm for transforming a staged tree - stratified or non-stratified - into a CEG. Research in CEGs and their applications has been an increasingly active field in recent years. However, such a general algorithm and proofs of the validity of the staged tree to CEG transformation have been missing in the literature so far. We know through personal correspondence that an as yet unpublished d-separation theorem for the family of CEGs has been developed. In fact, we can use this theorem to show the stronger property that no conditional independence statements are lost in representing a staged tree as a CEG. We shall present this result in a future report.
We would like to thank John Horwood and the CHDS research group for providing one of the datasets. Jim Q. Smith was supported by the Alan Turing Institute and funded by the EPSRC [grant number EP/K039628/1].
Appendix A Proofs
A.1 Proof for Theorem 1
We have a graph belonging to the sequence of graphs converting a staged tree into a CEG . This implies that all the vertices in , in represent positions.
Given that two situations are in the same position. We show that (1) and belong to the same stage; (2) whenever their emanating edges share the same edge label, these edges enter the same downstream position in .
If and are in the same position, it is trivially true that they are also in the same stage. Additionally, by the definition of a position, the subtrees rooted at and , call them and in the staged tree are isomorphic. Thus also, for every subtree rooted at a child of in , there exists an isomorphic subtree rooted at a child of in . In fact, due to the requirement in the definition of stages that edges with the same estimated conditional transition probability must also have the same edge label, the subtree rooted at a child of after traversing from along an edge with label will be isomorphic exactly to the subtree rooted at a child of which lies at the end of the edge labelled emanating from .
Notice that and belong to the set in . Since their rooted subtrees in are isomorphic, they belong to the same position and are represented by a single vertex, say in . The edges in and in are represented by edges and in . Similarly, the remaining edges emanating from and in will go into the same downstream positions in whenever they share the same edge label.
Given that in belong to the same stage and whenever their emanating edges share the same edge label, these edges enter the same downstream position in . We need to show that and are in the same position.
Recall that two situations are in the same position when the subtrees rooted at these vertices in are isomorphic in the structure and colour preserving sense. Since and are in the same stage, they have the same number of emanating edges and additionally, the edges from and which share the same edge label have the same estimated conditional transition probability. Consider edge emanating from and edge emanating from in where is the common downstream position into which these edges with label enter. By the definition of a tree, each vertex has at most one parent. So in the staged tree , the position would be represented by two separate vertices, call them and in the subtrees rooted at and respectively. Thus, the edge would be replaced by an edge in the subtree rooted at , call this in . Similarly, the edge would be replaced by an edge in which is the subtree rooted at in . Since and are in the same position in , they have isomorphic subtrees in and .
Similarly, the subtrees rooted at the children of and in and respectively are isomorphic whenever the edges from and to their respective children share the same edge label. Since and are in the same stage, the florets in and in are also isomorphic. Thus we have that and are isomorphic and hence, they belong to the same position.
A.2 Proof for Theorem 2
For a staged tree of depth , if recursions have taken place after constructing , all the situations in the staged tree have been inspected to identify whether they belong to a non-trivial position. Thus, all the positions in have been identified (see Theorem 3) and have been merged together. Thus , and the graph of the CEG has been constructed from the staged tree.
On the other hand, given that recursions have taken place and , then we need to show that . As the graph of the CEG is the most parsimonious representation of the event tree describing a process, showing that is equivalent to showing that where is the collection of positions. In Theorem 3 we showed from , we correctly identify the set of positions among the situations in in the staged tree . So we can frame the problem as showing that if there are no non-trivial positions in then there are no non-trivial positions in any of , . We prove this by contradiction as follows.
Given that there are no non-trivial positions in , suppose that two situations are in the same position and hence, the same stage. Thus the subtrees of rooted at and given by and respectively are isomorphic preserving structure and colouring. Let be a child of along the directed edge and let be the subtree rooted at . By the definition of a stage, there exists an edge in with rooted subtree . The subtrees and are isomorphic as and are isomorphic. By the definition of a position, and are in the same position. As , we have that . This contradicts that there are no non-trivial positions in . A similar argument can be made for any , . Thus, the recursions can be stopped when we have that . Thus .