Identifiability and Transportability in Dynamic Causal Networks

10/18/2016 ∙ by Gilles Blondel, et al. ∙ Universitat Politècnica de Catalunya

In this paper we propose a causal analog to the purely observational Dynamic Bayesian Networks, which we call Dynamic Causal Networks. We provide a sound and complete algorithm for identification of Dynamic Causal Networks, namely, for computing the effect of an intervention or experiment, based on passive observations only, whenever possible. We note the existence of two types of confounder variables that affect in substantially different ways the identification procedures, a distinction with no analog in either Dynamic Bayesian Networks or standard causal graphs. We further propose a procedure for the transportability of causal effects in Dynamic Causal Network settings, where the result of causal experiments in a source domain may be used for the identification of causal effects in a target domain.

1 Introduction

Bayesian Networks (BN) are a canonical formalism for representing probability distributions over sets of variables and reasoning about them. A useful extension for modeling phenomena with recurrent temporal behavior is the Dynamic Bayesian Network (DBN). While regular BN are directed acyclic graphs, DBN may contain cycles, with some edges indicating dependence of a variable at time $t+1$ on another variable at time $t$. The cyclic graph in fact compactly represents an infinite acyclic graph formed by infinitely many replicas of the cyclic net, with some of the edges linking nodes in the same replica, and others linking nodes in consecutive replicas.

BN and DBN model conditional (in)dependences, so they are restricted to observational, non-interventional data or, equivalently, model association, not causality. Pearl’s causal graphical models and do-calculus pearl1994probabilistic are a leading approach to modeling causal relations. They are formally similar to BN, as they are directed acyclic graphs with variables as nodes, but edges represent causality. A new notion is that of a confounder, an unobserved variable $X$ that causally influences two variables $Y$ and $Z$ so that the association between $Y$ and $Z$ may erroneously be taken for causal influence. Confounders are unnecessary in BNs since the association between $Y$ and $Z$ represents their correlation, with no causality implied. Causal graphical models allow us to consider the effect of interventions or experiments, that is, externally forcing the values of some variables regardless of the variables that causally affect them, and studying the results.

The do-calculus is an algebraic framework for reasoning about such experiments: an expression $P(Y \mid do(X))$ indicates the probability distribution of a set of variables $Y$ upon performing an experiment on another set $X$. In some cases, the effect of such an experiment can be obtained from observational data only; this is convenient as some experiments may be impossible, expensive, or unethical to perform. When the expression $P(Y \mid do(X))$, for a given causal network, can be rewritten as an expression containing only observational probabilities, without a do operator, we say that it is identifiable. shpitser2006identification ; huang2006identifiability showed that a do-expression is identifiable if and only if it can be rewritten in this way with a finite number of applications of the three rules of do-calculus, and shpitser2006identification proposed the ID algorithm, which performs this transformation if at all possible, or else returns fail, indicating non-identifiability.

In this paper we use a causal analog of DBNs to model phenomena where a finite set of variables evolves over time, with some variables causally influencing others at the same time $t$ but also others at time $t+1$. The infinite DAG representing these causal relations can be folded, if regular enough, into a directed graph, with some edges indicating intra-replica causal effects and others indicating effects on variables in the next replica. Central to this representation is of course the intuitive fact that causal relations are directed towards the future, and never towards the past.

Existing work on dynamic causal models focuses on the discovery of causal models from data and on causal reasoning given a causal model. Regarding the discovery of causal models in dynamic systems, iwasaki1989causality and dash2008note propose an algorithm to establish an ordering of the variables corresponding to the temporal order of propagation of causal effects. Methods for the discovery of cyclic causal graphs from data have been proposed using independent component analysis lacerda2012discovering and using local d-separation criteria meek2014toward . Existing algorithms for causal discovery from static data have been extended to the dynamic setting by moneta2006graphical and chicharro2015algorithms . dahlhaus2003causality ; white2010granger ; white2011linking discuss the discovery of causal graphs from time series by including Granger causality concepts in their causal models. Our paper does not address causal discovery from data. Given the formal description of a dynamic system under a set of assumptions, our paper proposes algorithms that identify the modified trajectory of the system over time, after an intervention.

Dynamic causal systems are often modeled with sets of differential equations. However, dashfundamental ; dash2001caveats ; dash2005restructuring show the caveats of causal discovery of dynamic models based on differential equations which pass through equilibrium states, and how causal reasoning based on such models may fail. voortman2012learning propose an algorithm for discovery of causal relations based on differential equations while ensuring that those caveats due to system equilibrium states are taken into account. The time scale and sampling rate at which we observe a dynamic system play a crucial role in how well the obtained data may represent the causal relations in the system. aalen2014can discuss the difficulties of representing a dynamic system with a DAG built from discrete observations, and gong2015discovering argue that under some conditions the discovery of temporal causal relations is feasible from data sampled at a lower rate than the system dynamics. Our paper assumes that the observation time-scale is sufficiently small compared to the system dynamics, and that causal models include the non-equilibrium causal relations and not only those under equilibrium states. We assume that a stable set of causal dependencies exists which generates the system evolution along time. Our proposed algorithms take such models (under these assumptions) as input and predict the system evolution upon intervention on the system.

Regarding causal reasoning given a dynamic causal model, one line of research is based on time series and Granger causality concepts eichler2010granger ; eichler2012causal ; eichler2012causal2 . queen2009intervention use multivariate time series for identification of causal effects in traffic flow models. lauritzen2002chain discuss intervention in dynamic systems in equilibrium, for several types of time-discrete and time-continuous generating processes with feedback. didelezcausal uses local independence graphs to represent time-continuous dynamic systems and identify the effect of interventions by re-weighting the processes involved.

Existing work on causal models does not thoroughly address causal reasoning in dynamic systems using do-calculus. eichler2010granger ; eichler2012causal ; eichler2012causal2 discuss back-door and front-door criteria in time series but do not extend to the full power of do-calculus as a complete logic for causal identification. One of the advantages of do-calculus is its non-parametric approach, which leaves the type of functional relation between variables undefined. Our paper extends the use of do-calculus to time series while requiring fewer restrictions than parametric causal analysis. Parametric approaches may need to differentiate the impact of an intervention depending on the system state, non-equilibrium or equilibrium, while our non-parametric approach is generic across system states.

What remains to be done is to define precisely the notion and semantics of do-calculus and unobserved confounders in the dynamic setting, and to investigate whether and how existing do-calculus algorithms for identifiability of causal effects can be applied to the dynamic case.

As a running example (more for motivation than for its accurate modeling of reality), let us consider two roads joining the same two cities, where drivers choose every day to use one or the other road. The average travel delay between the two cities on any given day depends on the traffic distribution among the two roads. Drivers choose one road or the other depending on recent experience, in particular how congested a road was the last time they used it. Figure 1 indicates these relations: the weather ($w$) has an effect on traffic conditions on a given day ($tr1$, $tr2$), which affect the travel delay on that same day ($d$). Driver experience influences the road choice the next day, impacting $tr1$ and $tr2$. To simplify, we assume that drivers have short memory, being influenced by the conditions on the previous day only. This infinite network can be folded into a finite representation as shown in Figure 2, where $+1$ indicates an edge linking two consecutive replicas of the DAG. Additionally, if one assumes the weather to be an unobserved variable then it becomes a confounder, as it causally affects two observed variables, as shown in Figure 3. We call the confounders with causal effect over variables in the same time slice static confounders, and confounders with causal effect over variables at different time slices dynamic confounders. Our models allow for causal identification with both types of confounders, as will be discussed in Section 4.

This setting enables the resolution of causal effect identification problems where causal relations are recurrent over time. These problems are not solvable in the context of classic DBNs, as causal interventions are not defined in such models. For this we use causal networks and do-calculus. However, time dependencies cannot be modeled with static causal networks. As we want to predict the trajectory of the system over time after an intervention, we must use a dynamic causal network. In our example, in order to reduce travel delay, traffic controllers could consider actions such as limiting the number of vehicles admitted to one of the two roads. We would like to predict the effect of such an action on the travel delay a few days later, e.g. $P(d_{t+\alpha} \mid do(tr1_t))$.

Our contributions in this paper are:

  • We introduce Dynamic Causal Networks (DCN) as an analog of Dynamic Bayesian Networks for causal reasoning in domains that evolve over time. We show how to transfer the machinery of Pearl’s do-calculus pearl1994probabilistic to DCN.

  • We extend causal identification algorithms tianphd ; shpitser2006identification ; shpitser2012efficient to the identifiability of causal effects in DCN settings. Given the expression $P(Y_{t+\alpha} \mid do(X_t))$, the algorithms either compute an equivalent do-free formula or conclude that such a formula does not exist. In the first case, the new formula provides the distribution of variables $Y$ at time $t+\alpha$ given that a certain experiment was performed on variables $X$ at time $t$. For clarity, we present first an algorithm that is sound but not complete (Section 4), then give a complete one that is more involved to describe and justify (Section 5).

  • Unobserved confounder variables are central to the formalism of do-calculus. We observe a subtle difference between two types of unobserved confounder variables in DCN (which we call static and dynamic). This distinction is genuinely new to DCN, as it appears neither in DBN nor in standard causal graphs, yet the presence or absence of unobserved dynamic confounders has crucial impacts on the post-intervention evolution of the system over time and on the computational cost of the algorithms.

  • Finally, we extend from standard Causal Graphs to DCN the results by pearl2011transportability on transportability, namely on whether causal effects obtained from experiments in one domain can be transferred to another domain with similar causal structure. This opens the way to studying relational knowledge transfer learning pan2010survey of causal information in domains with a time component.

Figure 1: A dynamic causal network. The weather $w$ has an effect on traffic flows $tr1$, $tr2$, which in turn have an impact on the average travel delay $d$. Based on the travel delay, car drivers may choose a different road next time, having a causal effect on the traffic flows.

2 Previous Definitions and Results

In this section we review the definitions and basic results on the three existing notions that are the basis of our work: DBN, causal networks, and do-calculus. New definitions introduced in this paper are left for Section 3.

All formalisms in this paper model joint probability distributions over a set of variables. For static models (regular BN and Causal Networks) the set of variables is fixed. For dynamic models (DBN and DCN), there is a finite set of “metavariables”, meaning variables that evolve over time. For a metavariable $X$ and an integer $t$, $X_t$ is the variable denoting the value of $X$ at time $t$.

Let $V$ be the set of metavariables for a dynamic model. We say that a probability distribution $P$ is time-invariant if $P(V_{t+1} \mid V_t)$ is the same for every $t$. Note that this does not mean that $P(V_t) = P(V_{t+1})$ for every $t$, but rather that the laws governing the evolution of the variables do not change over time. For example, planets do change their positions around the Sun, but the Kepler-Newton laws that govern their movement do not change over time. Even if we performed an intervention (say, pushing the Earth away from the Sun for a while), these laws would immediately kick in again when we stopped pushing. The system would not be time-invariant if e.g. the gravitational constant changed over time.

2.1 Dynamic Bayesian Networks

Dynamic Bayesian Networks (DBN) are graphical models that generalize Bayesian Networks (BN) in order to model time-evolving phenomena. We rephrase them as follows.

Definition 1

A DBN is a directed graph over a set of nodes that represent time-evolving metavariables. Some of the arcs in the graph have no label, and others are labeled “+1”. It is required that the sub-graph formed by the nodes and the unlabeled edges be acyclic, therefore forming a Directed Acyclic Graph (DAG). Unlabeled arcs denote dependence relations between metavariables within the same time step, and arcs labeled “+1” denote dependence between a variable at one time and another variable at the next time step.

Definition 2

A DBN with graph $G$ represents an infinite Bayesian Network $\hat{G}$ as follows. Timestamps $t$ are the integer numbers; $\hat{G}$ will thus be a biinfinite graph. For each metavariable $X$ in $G$ and each time step $t$ there is a variable $X_t$ in $\hat{G}$. The set of variables indexed by the same $t$ is denoted $G_t$ and called “the slice at time $t$”. There is an edge from $X_t$ to $Y_t$ iff there is an unlabeled edge from $X$ to $Y$ in $G$, and there is an edge from $X_t$ to $Y_{t+1}$ iff there is an edge labeled “+1” from $X$ to $Y$ in $G$. Note that $\hat{G}$ is acyclic.

The set of metavariables in $G$ is denoted $V(G)$, or simply $V$ when $G$ is clear from the context. Similarly $V_t(G)$ or $V_t$ denote the variables in the $t$-th slice of $G$.
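The unfolding described in Definition 2 can be sketched in a few lines. This is only an illustration under assumptions: the folded graph is given as hypothetical `(source, target, label)` triples with label `0` for unlabeled edges and `1` for “+1” edges, the unrolling is truncated to a finite window of slices, and the node names loosely follow the traffic example.

```python
from typing import Dict, List, Tuple

# Folded graph: (source, target, label); label 0 = unlabeled edge,
# label 1 = edge labeled "+1". This encoding is an assumption of the sketch.
FoldedEdge = Tuple[str, str, int]
Node = Tuple[str, int]  # (metavariable name, time step)


def unroll(edges: List[FoldedEdge], t_start: int, t_end: int) -> List[Tuple[Node, Node]]:
    """Edges of the unrolled graph restricted to slices t_start..t_end."""
    unrolled = []
    for t in range(t_start, t_end + 1):
        for src, dst, label in edges:
            if t + label <= t_end:  # drop edges that leave the window
                unrolled.append(((src, t), (dst, t + label)))
    return unrolled


def is_acyclic(edges: List[Tuple[Node, Node]]) -> bool:
    """Kahn's algorithm: acyclic iff every node can be peeled off."""
    indegree: Dict[Node, int] = {}
    children: Dict[Node, List[Node]] = {}
    for a, b in edges:
        indegree.setdefault(a, 0)
        indegree[b] = indegree.get(b, 0) + 1
        children.setdefault(a, []).append(b)
    frontier = [n for n, deg in indegree.items() if deg == 0]
    removed = 0
    while frontier:
        node = frontier.pop()
        removed += 1
        for child in children.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                frontier.append(child)
    return removed == len(indegree)


# Hypothetical folded graph: weather -> flows -> delay, delay -> next-day flows.
folded = [("w", "tr1", 0), ("w", "tr2", 0), ("tr1", "d", 0),
          ("tr2", "d", 0), ("d", "tr1", 1), ("d", "tr2", 1)]
window = unroll(folded, 0, 2)
```

Read without its labels, the folded graph has the cycle tr1 → d → tr1, while any finite window of the unrolled graph passes the acyclicity check, which is the point of the final remark in the definition.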

In this paper we will also use transition matrices to model probability distributions. Rows and columns are indexed by tuples assigning values to each variable, and the $(v, w)$ entry of the matrix represents the probability $P(V_{t+1} = w \mid V_t = v)$. Let $T_t$ denote this transition matrix. Then we have, in matrix notation, $P(V_{t+1}) = T_t \, P(V_t)$ and, more generally, $P(V_{t+\alpha}) = \left(\prod_{i=t}^{t+\alpha-1} T_i\right) P(V_t)$. In the case of time-invariant distributions, all $T_t$ matrices are the same matrix $T$, so $P(V_{t+\alpha}) = T^{\alpha} P(V_t)$.
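As a small numerical illustration of the matrix notation, the sketch below propagates a distribution over a two-state system for three steps. The numbers are invented, and the storage convention is an assumption of the sketch: the matrix is stored so that `T[w][v]` holds the probability of moving from state `v` to state `w`, which lets it act on a column vector.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for a distribution stored as a column vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def propagate(transitions: List[Matrix], p0: Vector) -> Vector:
    """Apply the transition matrices in time order, starting from p0."""
    p = p0
    for t_matrix in transitions:
        p = mat_vec(t_matrix, p)
    return p


# Assumed convention: T[w][v] = P(next state = w | current state = v),
# so every column sums to 1. Values are made up for illustration.
T = [[0.9, 0.2],
     [0.1, 0.8]]
p_t = [1.0, 0.0]

# Time-invariant case: three steps with the same matrix, i.e. T^3 applied to p_t.
p_t3 = propagate([T, T, T], p_t)
```

Passing a list of different matrices to `propagate` covers the general, non time-invariant case.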

2.2 Causality and Do-Calculus

The notation used in our paper is based on causal models and do-calculus pearl1994probabilistic ; pearl2000causality .

Definition 3 (Causal Model)

A causal model over a set of variables $V$ is a tuple $M = \langle U, V, F, P(U) \rangle$, where $U$ is a set of random variables that are determined outside the model (“exogenous” or “unobserved” variables) but that can influence the rest of the model, $V = \{v_1, v_2, \ldots, v_n\}$ is a set of $n$ variables that are determined by the model (“endogenous” or “observed” variables), $F$ is a set of $n$ functions such that $v_k = f_k(pa(v_k), U_k, \theta_k)$, $pa(v_k)$ are the parents of $v_k$ in $M$, $\theta_k$ are a set of constant parameters and $P(U)$ is a joint probability distribution over the variables in $U$.

In a causal model the value of each variable $v_k$ is assigned by a function $f_k$ which is determined by constant parameters $\theta_k$, a subset of $V$ called the “parents” of $v_k$ ($pa(v_k)$) and a subset of $U$ ($U_k$).

A causal model has an associated graphical representation (also called the “induced graph of the causal model”) in which each observed variable $v_k$ corresponds to a vertex, there is one edge pointing to $v_k$ from each of its parents, i.e. from the set of vertices $pa(v_k)$, and there is a doubly-pointed edge between the vertices influenced by a common unobserved variable in $U$ (see Figure 3). In this paper we call the unobserved variables in $U$ “unobserved confounders” or “confounders” for simplicity.

Causal graphs encode the causal relations between variables in a model. The primary purpose of causal graphs is to help estimate the joint probability of some of the variables in the model upon controlling some other variables by forcing them to specific values; this is called an action, experiment or intervention. Graphically this is represented by removing all the incoming edges (which represent the causes) of the variables in the graph that we control in the experiment. Mathematically the $do()$ operator represents this experiment on the variables. Given a causal graph where $X$ and $Y$ are sets of variables, the expression $P(Y \mid do(X))$ is the joint probability of $Y$ upon doing an experiment on the controlled set $X$.
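The graphical side of an intervention, removing every edge that points into a controlled variable, can be sketched as follows. The data layout (a parent map plus a set of bidirected confounder edges) and the node names are assumptions made for this illustration only.

```python
from typing import Dict, FrozenSet, Set, Tuple

Parents = Dict[str, Set[str]]
Bidirected = Set[FrozenSet[str]]


def intervene(parents: Parents, bidirected: Bidirected,
              controlled: Set[str]) -> Tuple[Parents, Bidirected]:
    """Return a copy of the graph with all edges into `controlled` removed.

    Directed edges into a controlled node are dropped, and so are
    bidirected (confounder) edges touching it, since they also point
    into that node. Outgoing edges are kept.
    """
    new_parents = {node: (set() if node in controlled else set(pa))
                   for node, pa in parents.items()}
    new_bidirected = {edge for edge in bidirected if not (edge & controlled)}
    return new_parents, new_bidirected


# Hypothetical single-slice graph: yesterday's delay d0 drives both flows,
# the flows drive today's delay d, and the flows share a confounder.
parents = {"d0": set(), "tr1": {"d0"}, "tr2": {"d0"}, "d": {"tr1", "tr2"}}
bidirected = {frozenset({"tr1", "tr2"})}

cut_parents, cut_bidirected = intervene(parents, bidirected, {"tr1"})
```

After the call, `tr1` has no causes left in the graph but still influences `d`, which matches the description of an experiment that forces its value from outside.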

Figure 2: Compact representation of a dynamic causal network where $+1$ indicates an edge linking a variable in $G_t$ with a variable in $G_{t+1}$.

A causal relation represented by $P(Y \mid do(X))$ is said to be identifiable if it can be uniquely computed from an observed, non-interventional, distribution of the variables in the model. In many real world scenarios it is impossible, impractical, unethical or too expensive to perform an experiment, thus the interest in evaluating its effects without actually having to perform the experiment.

The three rules of do-calculus pearl1994probabilistic allow us to transform expressions with $do()$ operators into other equivalent expressions, based on the causal relations present in the causal graph.

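For any disjoint sets of variables $X$, $Y$, $Z$ and $W$:

  1. $P(Y \mid Z, W, do(X)) = P(Y \mid W, do(X))$ if $(Y \perp Z \mid X, W)_{G_{\overline{X}}}$

  2. $P(Y \mid W, do(X), do(Z)) = P(Y \mid Z, W, do(X))$ if $(Y \perp Z \mid X, W)_{G_{\overline{X}\underline{Z}}}$

  3. $P(Y \mid W, do(X), do(Z)) = P(Y \mid W, do(X))$ if $(Y \perp Z \mid X, W)_{G_{\overline{X}\,\overline{Z(W)}}}$

$G_{\overline{X}}$ is the graph $G$ where all edges incoming to $X$ are removed. $G_{\underline{Z}}$ is the graph $G$ where all edges outgoing from $Z$ are removed. $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-nodes in $G_{\overline{X}}$.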
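As a minimal worked instance (not taken from the paper), consider the two-node graph $X \to Y$ with no confounder. Applying rule 2 with an empty intervention set and an empty conditioning set, and with $X$ playing the role of $Z$, the condition to check is whether $X$ and $Y$ are separated once the edges leaving $X$ are removed. Removing $X \to Y$ leaves no path at all, so the action can be exchanged for an observation:

```latex
\begin{align*}
(Y \perp X)_{G_{\underline{X}}}
\quad\Longrightarrow\quad
P(Y \mid do(X)) = P(Y \mid X).
\end{align*}
```

If a bidirected edge $X \leftrightarrow Y$ is added, that edge survives in $G_{\underline{X}}$, the separation condition fails, and rule 2 can no longer be applied in this way.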

Do-calculus was proven to be complete shpitser2006identification ; huang2006identifiability in the sense that if an expression cannot be converted into a do-free one by iterative application of the three do-calculus rules, then it is not identifiable.

2.3 The ID Algorithm

The ID algorithm shpitser2006identification , and earlier versions by tianpearl2002 ; tian2004identifying , implement an iterative application of do-calculus rules to transform a causal expression into an equivalent expression without any $do()$ terms in semi-Markovian causal graphs (with confounders). This enables the identification of interventional distributions from non-interventional data in such graphs.

The ID algorithm is sound and complete shpitser2006identification in the sense that if a do-free equivalent expression exists it will be found by the algorithm, and if it does not exist the algorithm will exit and provide an error.

The algorithm specifications are as follows. Inputs: causal graph $G$, variable sets $X$ and $Y$, and a probability distribution $P$ over the observed variables in $G$; Output: an expression for $P(Y \mid do(X))$ without any $do()$ terms, or fail.

Remark: In our algorithms of Sections 4 and 5, we may invoke the ID algorithm with a slightly more complex input: $P(Y \mid Z, do(X))$ (note the “extra” $Z$ to the right of the conditioning bar). In this case, we can solve the identification problem for the more complex expression with two calls to the ID algorithm using the following identity (definition of conditional probability):

$$P(Y \mid Z, do(X)) = \frac{P(Y, Z \mid do(X))}{P(Z \mid do(X))}$$

The expression $P(Y \mid Z, do(X))$ is thus identifiable if and only if both $P(Y, Z \mid do(X))$ and $P(Z \mid do(X))$ are shpitser2006identification .

Another algorithm for the identification of causal effects is given in shpitser2012efficient .

The algorithms we propose in this paper show how to apply existing causal identification algorithms to the dynamic setting. In this paper we will refer to any existing causal identification algorithm as the “ID algorithm”.

3 Dynamic Causal Networks and Do-Calculus

In this section we introduce the main definitions of this paper and state several lemmas based on the application of do-calculus rules to DCNs.

In Definition 3 of a causal model, the functions $f_k$ are left unspecified and can take any suitable form that best describes the causal dependencies between variables in the model. In natural phenomena some variables may be time independent while others may evolve over time. However, Pearl rarely treats the case of dynamic variables specifically.

The definition of Dynamic Causal Network is an extension of Pearl’s causal model in Definition 3, by specifying that the variables are sampled over time, as in valdes2011effective .

Definition 4 (Dynamic Causal Network)

A dynamic causal network $D$ is a causal model in which the set $F$ of functions is such that $v_{k,t} = f_k(pa(v_{k,t}), u_{k,t-\alpha}, \theta_k)$; where $v_{k,t}$ is the variable associated with the time sampling $t$ of the observed process $v_k$; $u_{k,t-\alpha}$ is the variable associated with the time sampling $t-\alpha$ of the unobserved process $u_k$; $t$ and $\alpha$ are discrete values of time.

Note that $pa(v_{k,t})$ may include variables in any time sampling previous to $t$ up to and including $t$, depending on the delays of the direct causal dependencies between processes in comparison with the sampling rate. $u_{k,t-\alpha}$ may be generated by a noise process or by an unobserved confounder. In the case of noise, we assume that all noise processes are independent of each other, and that their influence on the observed variables happens without delay, so that $\alpha = 0$. In the case of unobserved confounders, we assume $\alpha \geq 0$, as causes precede their effects.

To represent unobserved confounders in DCN, we extend to the dynamic context the framework developed in pearl1991theory on causal model equivalence and latent structure projections. Let’s consider the projection algorithm verma1993graphical , which takes a causal model with unobserved variables and finds an equivalent model (with the same set of causal dependencies), called a ”dependency-equivalent projection”, but with no links between unobserved variables and where every unobserved variable is a parent of exactly two observed variables.

The projection algorithm in DCN works as follows. For each pair $(v_i, v_j)$ of observed processes, if there is a directed path from $v_i$ to $v_j$ through unobserved processes then we assign a directed edge from $v_i$ to $v_j$; however, if there is a divergent path between them through unobserved processes then we assign a bidirected edge, representing an unobserved confounder.
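The two cases of the projection can be sketched as a pair of reachability searches. This is an illustrative reading of the paragraph above rather than the exact procedure of verma1993graphical : the graph encoding, the function names and the example nodes are all assumptions, and direct edges between observed nodes are treated as paths with no intermediate node.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[str, List[str]]  # node -> list of direct effects (children)


def observed_reach(children: Graph, latent: Set[str], start: str) -> Set[str]:
    """Observed nodes reachable from `start` along directed paths whose
    intermediate nodes are all unobserved."""
    found: Set[str] = set()
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child in latent:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
            else:
                found.add(child)  # stop at the first observed node
    return found


def project(children: Graph, latent: Set[str]
            ) -> Tuple[Set[Tuple[str, str]], Set[FrozenSet[str]]]:
    """Return (directed edges, bidirected edges) over observed nodes only."""
    nodes = set(children) | {c for cs in children.values() for c in cs}
    directed = {(a, b) for a in nodes - latent
                for b in observed_reach(children, latent, a) if a != b}
    bidirected: Set[FrozenSet[str]] = set()
    for u in nodes & latent:
        # A latent node reaching two observed nodes is a divergent path.
        for a, b in combinations(sorted(observed_reach(children, latent, u)), 2):
            bidirected.add(frozenset((a, b)))
    return directed, bidirected


# Hypothetical example: unobserved weather w drives both traffic flows.
directed, bidirected = project(
    {"w": ["tr1", "tr2"], "tr1": ["d"], "tr2": ["d"]}, latent={"w"})
# A latent node in the middle of a chain yields a directed edge instead.
chain_directed, chain_bidirected = project({"a": ["u"], "u": ["b"]}, latent={"u"})
```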

In this paper we represent all DCN by their dependency-equivalent projection. Also we assume the sampling rate to be adjusted to the dynamics of the observed processes. However, both the directed edges and the unobserved confounder paths may be crossing several time steps depending on the delay of the direct causal dependencies in comparison with the sampling rate. We now introduce the concept of static and dynamic confounder.

Definition 5 (Static Confounder)

Let $D$ be a DCN. Let $\beta$ be the maximal number of time steps crossed by any of the directed edges in $D$. Let $\alpha$ be the maximal number of time steps crossed by an unobserved confounder path. If $\alpha \leq \beta$ then the unobserved confounder is called Static.

Definition 6 (Dynamic Confounder)

Let $D$, $\beta$ and $\alpha$ be as in Definition 5. If $\alpha > \beta$ then the unobserved confounder is called Dynamic. More specifically, if $\beta < \alpha \leq 2\beta$ we call it a “first order” Dynamic Confounder; if $\alpha > 2\beta$ we call it a “higher order” Dynamic Confounder.
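The case distinction in Definitions 5 and 6 can be written as a small helper. The thresholds used here are an assumption of this sketch: static when the confounder path crosses at most as many time steps as the longest directed edge, first order dynamic when it crosses at most twice as many, and higher order dynamic beyond that. The function name and the requirement that the longest directed edge crosses at least one step are likewise illustrative.

```python
def classify_confounder(alpha: int, beta: int) -> str:
    """Classify an unobserved confounder.

    alpha: maximal number of time steps crossed by the confounder path.
    beta:  maximal number of time steps crossed by any directed edge.
    Thresholds (alpha <= beta, beta < alpha <= 2*beta, alpha > 2*beta)
    are assumed for this sketch.
    """
    if beta < 1 or alpha < 0:
        raise ValueError("expected beta >= 1 and alpha >= 0")
    if alpha <= beta:
        return "static"
    if alpha <= 2 * beta:
        return "first order dynamic"
    return "higher order dynamic"


kinds = [classify_confounder(a, 2) for a in (0, 2, 3, 4, 5)]
```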

In this paper, we consider three scenarios with regard to DCN and their time-invariance properties. If a DCN $D$ contains only static confounders we can construct a first order Markov process in discrete time, by taking $\beta$ (per Definition 5) consecutive time samples of the observed processes $v_k$ in $D$. This does not mean the DCN generating functions $f_k$ in Definition 4 are time-invariant, but that a first order Markov chain can be built over the observed variables when marginalizing the static confounders over $\beta$ time samples.

In a second scenario, we consider DCN with first order dynamic confounders. We can still construct a first order Markov process in discrete time, by taking $\beta$ consecutive time samples. However, we will see in later sections how the effect of interventions on this type of DCN has a different impact than on DCN with static confounders.

Finally, we consider DCN with higher order dynamic confounders, in which case we may construct a first order Markov process in discrete time by taking a multiple of $\beta$ consecutive time samples.

As we will see in later sections, the difference between these three types of DCN is crucial in the context of identifiability. Dynamic confounders cause a time-invariant transition matrix to become dynamic after an intervention, i.e. the post-intervention transition matrix will change over time. However, if we perform an intervention on a DCN with static confounders, the network will return to its previous time-invariant behavior after a transient period. These differences have a great impact on the complexity of the causal identification algorithms that we present.

Considering that causes precede their effects, the associated graphical representation of a DCN is a DAG. All DCN can be represented as a biinfinite DAG with vertices $v_{k,t}$; edges from $pa(v_{k,t})$ to $v_{k,t}$; and confounders (bi-directed edges). DCN with static confounders and DCN with first order dynamic confounders can be compactly represented as $\beta$ time samples (a multiple of $\beta$ time samples for higher order dynamic confounders) of the observed processes $v_{k,t}$; their corresponding edges and confounders; and some of the directed and bi-directed edges marked with a “+1” label representing the dependencies with the next time slice of the DCN.

Definition 7 (Dynamic Causal Network identification)

Let $D$ be a DCN, and $t_x$, $t_y$ be two time slices of $D$. Let $X$ be a subset of $V_{t_x}$ and $Y$ be a subset of $V_{t_y}$. The DCN identification problem consists of computing the probability distribution $P(Y \mid do(X))$ from the observed probability distributions in $D$, i.e. computing an expression for the distribution containing no $do()$ operators.

In the definition above we always assume that $X$ and $Y$ are disjoint. In this version we only consider the case in which all intervened variables are in the same time sample. It is not difficult to extend our algorithm to the general case.

The following lemma is based on the application of do-calculus to DCN. Intuitively, future actions have no impact on the past.

Lemma 1 (Future actions)

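Let $D$ be a DCN. Take any sets $X \subseteq V_t$ and $Y \subseteq V_{t-\alpha}$, with $\alpha > 0$. Then for any set $Z$ the following equalities hold:

  1. $P(Y \mid do(X), do(Z)) = P(Y \mid do(Z))$

  2. $P(Y \mid do(X)) = P(Y)$

  3. $P(Y \mid Z, do(X)) = P(Y \mid Z)$ whenever $Z \subseteq V_{t-\beta}$ with $\beta > 0$.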

Proof

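The first equality derives from rule 3 and the proof in shpitser2006identification that interventions on variables which are not ancestors of $Y$ in $D$ have no effect on $Y$. The second is the special case $Z = \emptyset$. We can transform the third expression using the equivalence $P(Y \mid Z, do(X)) = P(Y, Z \mid do(X)) / P(Z \mid do(X))$; since $Y$ and $Z$ precede $X$ in $D$, by rule 3 $P(Y, Z \mid do(X)) = P(Y, Z)$ and $P(Z \mid do(X)) = P(Z)$, and then the above equals $P(Y, Z) / P(Z) = P(Y \mid Z)$. ∎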

In words, traffic control mechanisms applied next week have no causal effect on the traffic flow this week.
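The graphical fact behind the lemma is that a node in a later slice is never an ancestor of a node in an earlier slice. A minimal sketch of that check on two unrolled slices of a hypothetical traffic graph follows; the node names and the parent-map encoding are assumptions of the example.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]  # (metavariable name, time step)


def ancestors(parents: Dict[Node, List[Node]], targets: Set[Node]) -> Set[Node]:
    """All nodes with a directed path into some node of `targets`."""
    seen: Set[Node] = set()
    stack = list(targets)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Two slices: flows cause the delay on the same day, and the delay
# influences the flows on the next day.
parents = {
    ("d", 0): [("tr1", 0), ("tr2", 0)],
    ("tr1", 1): [("d", 0)],
    ("tr2", 1): [("d", 0)],
    ("d", 1): [("tr1", 1), ("tr2", 1)],
}

y = {("d", 0)}    # outcome today
x = {("tr1", 1)}  # intervention tomorrow

# True when no intervened node is an ancestor of the outcome.
future_action_is_irrelevant = x.isdisjoint(ancestors(parents, y))
```

The reverse direction does not hold: today's delay is an ancestor of tomorrow's delay, so an action today can matter tomorrow.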

The following lemma limits the size of the graph to be used for the identification of DCNs.

Lemma 2

Let $D$ be a DCN. Let $G_{xy}$ be the sub-graph of $D$ consisting of all time slices in between (and including) $G_{t_x}$ and $G_{t_y}$. Let $G_{lx}$ be the sub-graph $G_{xy}$ augmented with the time slice $G_{t_x - 1}$ preceding it. If $P(Y \mid do(X))$ is identifiable in $D$ then it is identifiable in $G_{lx}$ and the identification provides the same result on both graphs.

Proof

(sketch) By C-component factorization tianphd , we decompose the problem as that of identification of each C-component in $D$ and (if all C-components are identifiable) multiplying all identified quantities to obtain $P(Y \mid do(X))$. C-components are sets of variables linked by confounder edges in the graph $D$. An identifiable C-component is computed as the product of $P(v_i \mid v^{(i-1)})$ for each variable $v_i$ in the C-component, where $v^{(i-1)}$ is the set of all variables preceding $v_i$ in some topological ordering shpitser2006identification ; tianphd . The C-component factorization involving all the variables preceding the set $X$ leads to the joint distribution of these variables, and can be computed using the joint distribution of the time slice preceding $X$ alone. Also, non-ancestors of $Y$ can be ignored from the graph, by application of do-calculus rule 3, so time slices succeeding $Y$ can be discarded. Therefore the identification problem can be computed in the limited graph $G_{lx}$. ∎

This result is crucial to reduce the complexity of identification algorithms in dynamic settings. In order to describe the evolution of a dynamic system over time, after an intervention, we can run a causal identification algorithm over a limited number of time slices of the DCN, instead of the entire DCN.

4 Identifiability in Dynamic Causal Networks

In this section we analyze the identifiability of causal effects in the DCN setting. We first study DCNs with static confounders and propose a method for identification of causal effects in DCNs using transition matrices. Then we extend the analysis and identification method to DCNs with dynamic confounders. As discussed in Section 3, both the DCNs with static confounders and with dynamic confounders can be represented as a Markov chain. For graphical and notational simplicity, we represent these DCN graphically as recurrent time slices as opposed to the shorter time samples, on the basis that one time slice contains as many time samples as the maximal delay of direct causal influence among the processes. Also for notational simplicity we assume the transition matrix from one time slice to the next to be time-invariant; however removing this restriction would not make any of the lemmas, theorems or algorithms invalid, as they are the result of graphical non-parametric reasoning.

Consider a DCN $D$ under the above assumptions, and let $T$ be its time-invariant transition matrix from any time slice $V_t$ to $V_{t+1}$. We assume that there is some time $t_0$ such that the distribution $P(V_{t_0})$ is known. Fix now $t_x > t_0$ and a set $X \subseteq V_{t_x}$. We will now see how performing an intervention on $X$ affects the distributions in $D$.

We begin by stating a series of lemmas that apply to DCNs in general.

Lemma 3

Let $t$ be such that $t_0 \leq t < t_x$, with $X \subseteq V_{t_x}$. Then $P(V_t \mid do(X)) = T^{\,t - t_0} P(V_{t_0})$. Namely, transition probabilities are not affected by an intervention in the future.

Proof

By Lemma 1, (2), $P(V_t \mid do(X)) = P(V_t)$ for all such $t$. By definition of $T$, this equals $T \, P(V_{t-1})$. Then induct on $t$ with $P(V_{t_0}) = T^0 P(V_{t_0})$ as base. ∎

Lemma 4

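Assume that an expression $P(V_{t+\alpha} \mid V_t, do(X))$ is identifiable for some $\alpha > 0$. Let $A$ be the matrix whose entries $A_{ij}$ correspond to the probabilities $P(V_{t+\alpha} = v_j \mid V_t = v_i, do(X))$. Then $P(V_{t+\alpha} \mid do(X)) = A \, P(V_t \mid do(X))$.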

Proof

Case by case evaluation of $A$’s entries. ∎

4.1 DCNs with Static Confounders

Static confounders impact sets of variables within one time slice only, and there are no confounders between variables at different time slices (see Figure 3).

The following three lemmas are based on the application of do-calculus to DCNs with static confounders. Intuitively, conditioning on the variables that cause time dependent effects d-separates entire parts (future from past) of the DCN (Lemmas 5, 6, 7).

Lemma 5 (Past observations and actions)

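Let $D$ be a DCN with static confounders. Take any set $X$. Let $C \subseteq V_t$ be the set of variables in $G_t$ that are direct causes of variables in $G_{t+1}$. Let $Y \subseteq V_{t+\alpha}$ and $Z \subseteq V_{t-\beta}$, with $\alpha > 0$ and $\beta > 0$ (positive natural numbers). The following distributions are identical:

  1. $P(Y \mid do(X), Z, C)$

  2. $P(Y \mid do(X), do(Z), C)$

  3. $P(Y \mid do(X), C)$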

Proof

By the graphical structure of a DCN with static confounders, conditioning on $C$ d-separates $Y$ from $Z$. The three rules of do-calculus apply, and (1) equals (3) by rule 1, (1) equals (2) by rule 2, and also (2) equals (3) by rule 3. ∎

In our example, we want to predict the traffic flow $Y$ in two days caused by traffic control mechanisms $X$ applied tomorrow, and conditioned on the traffic delay $C$ today. Any traffic controls $Z$ applied before today are irrelevant, because their impact is already accounted for in $C$.

Lemma 6 (Future observations)

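Let $D$, $X$ and $C$ be as in Lemma 5. Let $Y \subseteq V_{t+\alpha}$ and $Z \subseteq V_{t-\beta}$, with $\alpha > 0$ and $\beta > 0$. Then $P(Z \mid do(X), Y, C) = P(Z \mid do(X), C)$.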

Proof

By the graphical structure of a DCN with static confounders, conditioning on $C$ d-separates $Z$ from $Y$, and the expression is valid by rule 1 of do-calculus. ∎

In our example, observing the travel delay today makes observing the future traffic flow irrelevant for evaluating yesterday's traffic flow.

Lemma 7

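Let $t > t_x + 1$. Then $P(V_t \mid do(X)) = T\,P(V_{t-1} \mid do(X))$. Namely, transition probabilities are not affected by intervention more than one time unit in the past.

Proof

$P(V_t \mid do(X)) = T'\,P(V_{t-1} \mid do(X))$, where the elements of $T'$ are $P(V_t \mid V_{t-1}, do(X))$. As $V_{t-1}$ includes all variables in $G_{t-1}$ that are direct causes of variables in $G_t$, conditioning on $V_{t-1}$ d-separates $X$ from $V_t$. By Lemma 5 we exchange the action $do(X)$ by the observation $X$ and so $P(V_t \mid V_{t-1}, do(X)) = P(V_t \mid V_{t-1}, X)$. Moreover, $V_{t-1}$ d-separates $X$ from $V_t$, so they are statistically independent given $V_{t-1}$. Therefore, $P(V_t \mid V_{t-1}, do(X)) = P(V_t \mid V_{t-1}, X) = P(V_t \mid V_{t-1})$, which are the elements of matrix $T$ as required. ∎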

Theorem 4.1

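Let $D$ be a DCN with static confounders, and transition matrix $T$. Let $X \subseteq V_{t_x}$ and $Y \subseteq V_{t_y}$ for two time points $t_x < t_y$. If the expression $P(V_{t_x+1} \mid V_{t_x-1}, do(X))$ is identifiable with corresponding transition matrix $A$, then $P(Y \mid do(X))$ is identifiable and

$$P(Y \mid do(X)) = \sum_{V_{t_y} \setminus Y} T^{\,t_y-(t_x+1)}\, A\, T^{\,t_x-1-t_0}\, P(V_{t_0}).$$

Proof

Applying Lemma 3, we obtain that $P(V_{t_x-1} \mid do(X)) = T^{\,t_x-1-t_0} P(V_{t_0})$. We assumed that $P(V_{t_x+1} \mid V_{t_x-1}, do(X))$ is identifiable, therefore Lemma 4 guarantees that

$$P(V_{t_x+1} \mid do(X)) = A\,P(V_{t_x-1} \mid do(X)) = A\,T^{\,t_x-1-t_0} P(V_{t_0}).$$

Finally, $P(V_{t_y} \mid do(X)) = T^{\,t_y-(t_x+1)} P(V_{t_x+1} \mid do(X))$ by repeatedly applying Lemma 7. $P(Y \mid do(X))$ is obtained by marginalizing variables in $V_{t_y} \setminus Y$ in the resulting expression for $P(V_{t_y} \mid do(X))$. ∎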

As a consequence of Theorem 4.1, causal identification of $P(Y \mid do(X))$ reduces to the problem of identifying the expression $P(V_{t_x+1} \mid V_{t_x-1}, do(X))$. The ID algorithm can be used to check whether this expression is identifiable and, if it is, compute its joint probability from observed data.

Note that Theorem 4.1 holds without the assumption of transition matrix time-invariance by replacing powers of $T$ with products of matrices $T_t$.
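The proof of Theorem 4.1 chains three stages: Lemma 3 up to the slice before the intervention, Lemma 4 across the intervention, and Lemma 7 afterwards. A minimal sketch of that composition follows. It assumes the post-intervention matrix `A` has already been obtained from the ID algorithm (which is not implemented here), that all matrices are column-stochastic, and that a joint state vector can be reshaped into one axis per variable. The function names and the parameters `n_before`, `n_after`, `shape` and `keep` are hypothetical.

```python
import numpy as np

def effect_distribution(T, A, p0, n_before, n_after):
    """Joint distribution of the target slice after the intervention.

    n_before: observational steps from the initial slice to the slice
              on which A conditions (Lemma 3).
    n_after:  observational steps from the slice A produces to the
              target slice (Lemma 7).
    """
    before = np.linalg.matrix_power(T, n_before) @ p0   # Lemma 3
    across = A @ before                                 # Lemma 4
    return np.linalg.matrix_power(T, n_after) @ across  # Lemma 7

def marginalize(p, shape, keep):
    """Sum out every variable axis whose index is not in `keep`."""
    joint = p.reshape(shape)
    drop = tuple(ax for ax in range(len(shape)) if ax not in keep)
    return joint.sum(axis=drop)
```

Replacing the two matrix powers by explicit products of per-step matrices would give the time-varying version mentioned in the remark above.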

Figure 3: Dynamic Causal Network where and have a common unobserved cause, a confounder. Since both variables are in the same time slice, we call it a static confounder.

4.1.1 DCN-ID Algorithm for DCNs with Static Confounders

The DCN-ID algorithm for DCNs with static confounders is given in Figure 4. Its soundness is immediate from Theorem 4.1, the soundness of the ID algorithm shpitser2006identification , and Lemma 2.

Theorem 4.2 (Soundness)

Whenever DCN-ID returns a distribution for $P(Y \mid do(X))$, it is correct. ∎

Observe that line 2 of the algorithm calls ID with a graph of size . By the remark of Section 2.3, this means two calls but notice that in this case we can spare the call for the “denominator” because Lemma 1 guarantees . Computing transition matrix A on line 3 has complexity , where is the number of variables in one time slice and the number of bits encoding each variable. The formula on line 4 is the multiplication of by matrices, which has complexity . To solve the same problem with the ID algorithm would require running it on the entire graph of size and evaluating the resulting joint probability with complexity compared to with DCN-ID.

If the problem we want to solve is evaluating the trajectory of the system over time

after an intervention at time slice , with ID we would need to run ID times and evaluate the outputs with overall complexity . Doing the same with DCN-ID requires running ID one time to identify , evaluating the output and applying successive transition matrix multiplications to obtain the joint probability of the time slices thereafter, with resulting complexity .
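The trajectory computation described above can be sketched as one application of the post-intervention matrix followed by repeated observational steps. This is only an illustration for the static-confounder case: `A` is assumed to come from a single successful ID call, both matrices are assumed column-stochastic, and the function and parameter names are hypothetical.

```python
import numpy as np

def trajectory(T, A, p_before, steps):
    """Distributions of successive slices after the intervention.

    p_before: distribution of the slice on which A conditions.
    Returns a list with steps + 1 vectors: the first slice produced by
    A, then one further slice per observational step with T.
    """
    p = A @ p_before
    out = [p]
    for _ in range(steps):
        p = T @ p          # one matrix-vector product per extra slice
        out.append(p)
    return out
```

Each additional slice costs a single matrix-vector product, which is why the whole trajectory needs only one identification step in this setting.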

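 Function DCN-ID($Y$, $t_y$, $X$, $t_x$, $G$, $C$, $T$, $P_0$)

INPUT:

  • DCN defined by a causal graph $G$ on a set of variables $V$ and a set $C \subseteq V \times V$ describing causal relations from $V_t$ to $V_{t+1}$ for every $t$

  • transition matrix $T$ for $G$ derived from observational data

  • a set $Y$ included in $V_{t_y}$

  • a set $X$ included in $V_{t_x}$

  • distribution at the initial state, $P_0 = P(V_{t_0})$

OUTPUT: The distribution $P(Y \mid do(X))$, or else FAIL

  1. let $G'$ be the acyclic graph formed by joining $G_{t_x-2}$, $G_{t_x-1}$, $G_{t_x}$, and $G_{t_x+1}$ by the causal relations given by $C$;

  2. run the standard ID algorithm for expression $P(V_{t_x+1} \mid V_{t_x-1}, do(X))$ on $G'$; if it returns FAIL, return FAIL;

  3. else, use the resulting distribution to compute the transition matrix $A$, where $A_{ij} = P(V_{t_x+1} = v_i \mid V_{t_x-1} = v_j, do(X))$;

  4. return $\sum_{V_{t_y} \setminus Y} T^{\,t_y-(t_x+1)}\, A\, T^{\,t_x-1-t_0}\, P(V_{t_0})$;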

Figure 4: The DCN-ID algorithm for DCNs with static confounders

 

4.2 DCNs with Dynamic Confounders

We now discuss the case of DCNs with dynamic confounders, that is, with confounders that influence variables in consecutive time slices.

The presence of dynamic confounders d-connects time slices, and we will see in the following lemmas how this may be an obstacle for the identifiability of the DCN.

In the presence of dynamic confounders, Lemma 7 no longer holds, since d-separation is no longer guaranteed. As a consequence, we cannot guarantee that the DCN will recover its “natural” (non-interventional) transition probabilities from one cycle to the next after the intervention is performed.

Our statement of the identifiability theorem for DCNs with dynamic confounders is weaker and includes in its assumptions those conditions that can no longer be guaranteed.

Theorem 4.3

Let be a DCN with dynamic confounders. Let be its transition matrix under no interventions. We further assume that:

  1. is identifiable by matrix

  2. For all , is identifiable by matrix

Then is identifiable and computed by

Proof

Similar to the proof of Theorem 4.1. By Lemma 3, we can compute the distribution up to time as

Using the first assumption in the statement of the theorem, by Lemma 4 we obtain

Then, we compute the final using the matrices from the statement of the theorem that allows us to compute probabilities for subsequent time-slices. Namely,

and so on until we find

Finally, the do-free expression of is obtained by marginalization over variables of not in . ∎

Again, note that Theorem 4.3 holds without the assumption of transition matrix time-invariance by replacing powers of $T$ with products of matrices $T_t$.
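With dynamic confounders, the steps after the intervention no longer reuse the observational matrix; each one uses its own identified matrix. A minimal sketch of that computation follows. It assumes all matrices are column-stochastic and have already been identified (the ID calls themselves are not implemented), and that the post-intervention matrices are supplied in chronological order. The names `post_matrices` and `n_before` are hypothetical.

```python
import numpy as np

def effect_distribution_dynamic(T, A, post_matrices, p0, n_before):
    """Joint distribution of the target slice with dynamic confounders.

    T:             observational transition matrix (used only before
                   the intervention, as in Lemma 3).
    A:             matrix bridging the intervention (Lemma 4).
    post_matrices: one identified matrix per later step, in time order.
    """
    p = np.linalg.matrix_power(T, n_before) @ p0
    p = A @ p
    for M in post_matrices:   # time-dependent transitions afterwards
        p = M @ p
    return p
```

If every matrix in `post_matrices` equals `T`, the function reduces to the static-confounder computation of Theorem 4.1.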

4.2.1 DCN-ID Algorithm for DCNs with Dynamic Confounders

 Function DCN-ID(,, ,, ,,,,)

INPUT:

  • DCN defined by a causal graph on a set of variables and a set describing causal relations from to for every , and a set describing confounder relations from to for every

  • transition matrix for derived from observational data

  • a set included in

  • a set included in

  • distribution at the initial state,

OUTPUT: The distribution , or else FAIL

  1. let be the acyclic graph formed by joining , , , and by the causal relations given by and confounders given by ;

  2. run the standard ID algorithm for expression on ; if it returns FAIL, return FAIL;

  3. else, use the resulting distribution to compute the transition matrix , where ;

  4. for each from up to :

    1. let be the causal graph composed of time slices , , …,

    2. run the standard ID algorithm on for the expression ; if it returns FAIL, return FAIL;

    3. else, use the resulting distribution to compute the transition matrix , where ;

  5. return ;

Figure 5: The DCN-ID algorithm for DCNs with dynamic confounders

 

The DCN-ID algorithm for DCNs with dynamic confounders is given in Figure 5.

Its soundness is immediate from Theorem 4.3, the soundness of the ID algorithm shpitser2006identification , and Lemma 2.

Theorem 4.4 (Soundness)

Whenever DCN-ID returns a distribution for $P(Y \mid do(X))$, it is correct. ∎

Notice that this algorithm is more expensive than the DCN-ID algorithm for DCNs with static confounders. In particular, it requires calls to the ID algorithm with increasingly larger chunks of the DCN. To identify a single future effect it may be simpler to invoke Lemma 2 and do a unique call to the ID algorithm for the expression restricted to the causal graph formed by time-slices , …, . However, to predict the trajectory of the system over time after an intervention, the DCN-ID algorithm for dynamic confounders directly identifies the post-intervention transition matrix and its evolution. A system characterized by a time-invariant transition matrix before the intervention will be characterized by a time dependent transition matrix, given by the DCN-ID algorithm, after the intervention. This dynamic view offers opportunities for the analysis of the time evolution of the system, and conditions for convergence to a steady state.

5 Complete DCN Identifiability

In this section we show that the identification algorithms formulated in the previous sections are not complete, and we develop complete algorithms for the identification of DCNs. To prove completeness we use previous results shpitser2006identification , where it is shown that the absence of a structure called a “hedge” in the graph is a necessary and sufficient condition for identifiability. We first define some graphical structures that lead to the definition of a hedge in the context of DCNs.

Definition 8 (C-component)

Let $D$ be a DCN. Any maximal subset of variables of $D$ connected by bidirected edges (confounders) is called a C-component.
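Computing C-components is a connected-components problem over the bidirected edges alone. The sketch below uses a plain depth-first search. The representation (a list of node names plus a list of unordered pairs) is an assumption made for illustration, as is the convention that a variable with no confounder forms a singleton component.

```python
from collections import defaultdict

def c_components(nodes, bidirected):
    """Partition `nodes` into maximal sets linked by bidirected edges."""
    adj = defaultdict(set)
    for a, b in bidirected:
        adj[a].add(b)
        adj[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                      # depth-first search
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        components.append(comp)
    return components
```

Directed (causal) edges are deliberately ignored here, since only confounders determine C-components.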

Definition 9 (C-forest)

Let $D$ be a DCN and $C$ a C-component of $D$. If all variables in $C$ have at most one child, then $C$ is called a C-forest. The set $R$ of variables in $C$ that have no descendants is called the C-forest root, and the C-forest is called $R$-rooted.

Definition 10 (Hedge)

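Let $X$ and $Y$ be sets of variables in $D$. Let $F$ and $F'$ be two $R$-rooted C-forests such that $F' \subset F$, $F \cap X \neq \emptyset$, $F' \cap X = \emptyset$, $R \subset An(Y)_{D_{\bar{X}}}$. Then $F$ and $F'$ form a Hedge for $P(Y \mid do(X))$ in $D$.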

Notice that $An(Y)_{D_{\bar{X}}}$ refers to those variables that are ancestors of $Y$ in the causal network $D$ where incoming edges to $X$ have been removed. We may drop the subscript, as in $An(Y)$, in which case we are referring to the ancestors of $Y$ in the unmodified network (the network we refer to should be clear from the context). Moreover, we overload the definition of the ancestor function and use $An(Z, Y)$ to refer to the ancestors of the union of sets $Z$ and $Y$, that is, $An(Z, Y) = An(Z \cup Y)$.
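The ancestor set in a network with the incoming edges of the intervened variables removed can be computed by a backward graph search that skips those edges. The following sketch assumes directed edges are given as `(parent, child)` pairs and follows the convention, assumed here, that a set counts among its own ancestors. The function and parameter names are hypothetical.

```python
def ancestors(edges, targets, removed_incoming=frozenset()):
    """Ancestors of `targets` after deleting edges into `removed_incoming`.

    edges:            iterable of (parent, child) pairs.
    removed_incoming: nodes whose incoming edges are cut (the intervened set).
    """
    parents = {}
    for parent, child in edges:
        if child in removed_incoming:
            continue                      # edge deleted by the intervention
        parents.setdefault(child, set()).add(parent)

    result, stack = set(targets), list(targets)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result
```

Passing the union of two sets as `targets` gives the overloaded two-argument form of the ancestor function described above.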

The presence of a hedge prevents the identifiability of causal graphs shpitser2006identification . Conversely, any non-identifiable graph necessarily contains a hedge. These results, applied to DCNs, lead to the following lemma.

Lemma 8 (DCN complete identification)

Let $D$ be a DCN with confounders. Let $X$ and $Y$ be sets of variables in $D$. $P(Y \mid do(X))$ is identifiable if and only if there is no hedge in $D$ for $P(Y \mid do(X))$.

We can show that the algorithms presented in the previous section in some cases introduce hedges in the sub-networks they analyze, even if no hedges existed in the original expanded network.

Lemma 9

The DCN-ID algorithms for DCNs with static confounders (Section 4.1) and dynamic confounders (Section 4.2) are not complete.

Proof

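Let $D$ be a DCN. Let $X$ be such that $D$ contains two $R$-rooted C-forests $F$ and $F'$, $F' \subset F$, $F \cap X \neq \emptyset$, $F' \cap X = \emptyset$. Let $Y$ be such that $R \not\subset An(Y)_{D_{\bar{X}}}$. The condition for $Y$ implies that $D$ does not contain a hedge, and $P(Y \mid do(X))$ is therefore identifiable by Lemma 8. Let the set of variables at time slice $t_x+1$ of $D$, $V_{t_x+1}$, be such that $R \subset An(V_{t_x+1})_{D_{\bar{X}}}$. By Definition 10, $D$ contains a hedge for $P(V_{t_x+1} \mid V_{t_x-1}, do(X))$. The identification of $P(Y \mid do(X))$ requires the DCN-ID algorithm to identify $P(V_{t_x+1} \mid V_{t_x-1}, do(X))$, which fails. ∎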

Figure 6: Identifiable Dynamic Causal Network which the DCN-ID algorithm fails to identify. $F$ and $F'$ are $R$-rooted C-forests, but since $R$ is not an ancestor of $Y$ there is no hedge for $P(Y \mid do(X))$. However, $R$ is an ancestor of $V_{t_x+1}$, and DCN-ID fails when finding the hedge for $P(V_{t_x+1} \mid V_{t_x-1}, do(X))$.

Figure 6 shows an identifiable DCN that DCN-ID fails to identify.

The proof of Lemma 9 provides the framework to build a complete algorithm for identification of DCNs.

5.1 Complete DCN identification algorithm with Static Confounders

The DCN-ID algorithm can be modified so that no hedges are introduced if none existed in the original network. This comes at the cost of more complicated notation, because the fragments of the network to be analyzed no longer correspond to natural time slices. More delicate surgery is needed.

Lemma 10

Let be a DCN with static confounders. Let and for two time slices . If there is a hedge for in then .

Proof

By definition of hedge, and are connected by confounders to . As has only static confounders , and must be within . ∎

Lemma 11

Let be a DCN with static confounders. Let and for two time slices . is identifiable if and only if is identifiable.

Proof

(if) By Lemma 8, if

is identifiable then there is no hedge for this expression in . By Lemma 10 if has static confounders, a hedge must be within time slice . If time slice does not contain two -rooted C-forests and such that , , , then there is no hedge for any set so there is no hedge for the expression which makes it identifiable. Now let’s assume time slice contains two -rooted C-forests and such that , , , then . As is in time slice , this implies and so there is no hedge for the expression which makes it identifiable.

(only if) By Lemma 8, if is identifiable then there is no hedge for in . By Lemma 10 if has static confounders, a hedge must be within time slice . If time slice does not contain two -rooted C-forests and such that , , , then there is no hedge for any set so there is no hedge for the expression

which makes it identifiable. Now let’s assume time slice contains two -rooted C-forests and such that , , , then (if would contain a hedge by definition). As is in time slice , implies and so there is no hedge for which makes this expression identifiable. ∎

Lemma 12

Assume that an expression is identifiable for some and . Let be the matrix whose entries correspond to the probabilities . Then .

Proof

Case by case evaluation of ’s entries. ∎

 Function cDCN-ID(,, ,, ,,,)

INPUT:

  • DCN defined by a causal graph on a set of variables and a set describing causal relations from to for every

  • transition matrix representing the probabilities derived from observational data

  • a set included in

  • a set included in

  • distribution at the initial state,

OUTPUT: The distribution if it is identifiable, or else FAIL

  1. let be the acyclic graph formed by joining , , , and by the causal relations given by ;

  2. run the standard ID algorithm for expression on ; if it returns FAIL, return FAIL;

  3. else, use the resulting distribution to compute the transition matrix , where ;

  4. let be the matrix marginalized as