Causal Calculus in the Presence of Cycles, Latent Confounders and Selection Bias

01/02/2019 ∙ by Patrick Forré, et al. ∙ University of Amsterdam

We prove the main rules of causal calculus (also called do-calculus) for interventional structural causal models (iSCMs), a generalization of a recently proposed general class of non-/linear structural causal models that allow for cycles, latent confounders and arbitrary probability distributions. We also generalize adjustment criteria and formulas from the acyclic setting to the general one (i.e. iSCMs). Such criteria then allow one to estimate (conditional) causal effects from observational data that was (partially) gathered under selection bias and in the presence of cycles. This generalizes the backdoor criterion, the selection-backdoor criterion and extensions of these to arbitrary iSCMs. Together, our results thus enable causal reasoning in the presence of cycles, latent confounders and selection bias.




1 Introduction

Statistical models are governed by the rules of probability (e.g. sum and product rule), which link joint distributions with the corresponding (conditional) marginal ones. Causal models follow additional rules, which relate the observational distributions with the interventional ones. In contrast to the rules of probability theory, which directly follow from their axioms, the rules of causal calculus need to be proven when based on the definition of structural causal models (SCMs). As SCMs depend, among other things, on the underlying graphical structure (e.g. with or without cycles or bidirected edges), the function classes used (e.g. linear or non-linear) and the allowed probability distributions (e.g. discrete, continuous, singular or mixtures), this endeavour is not immediate.

Such a framework of causal calculus contains rules about when one can (1) insert/delete observations, (2) exchange action/observation, (3) insert/delete actions; and about when and how to recover from interventions and/or selection bias (backdoor and selection-backdoor criterion), etc. (see [18, 19, 20, 30, 29, 12, 21, 23, 28, 1, 24, 3, 4]). While these rules have been extensively studied for acyclic causal models, e.g. (semi-)Markovian models, which are attached to directed acyclic graphs (DAGs) or acyclic directed mixed graphs (ADMGs), the case of causal models with cycles has remained largely unexplored.

To deal with cycles and latent confounders at the same time, in this paper we introduce the class of interventional structural causal models (iSCMs), a “conditional” version of the recently proposed class of modular structural causal models (mSCMs) (see [8, 9]) that also includes external nodes, which can play the role of parameter/action/intervention nodes. They have several desirable properties: iSCMs allow for arbitrary probability distributions, non-/linear functional relations, latent confounders and cycles. They can also model non-/probabilistic external and probabilistic internal nodes in one framework. Furthermore, the class of iSCMs is closed under arbitrary marginalisations and interventions. All causal models that are based on acyclic graphs like DAGs, ADMGs or mDAGs (see [25, 7]) can be interpreted as special acyclic iSCMs. Thus iSCMs generalize all these classes of causal models in one framework, but also allow for cycles and external non-/probabilistic nodes. Also, the generalized directed global Markov property for mSCMs (see [8, 9]) generalizes to iSCMs, i.e. iSCMs entail the conditional independence relations that follow from the $\sigma$-separation criterion in the underlying graph, where $\sigma$-separation generalizes the usual d-separation (also called m- or m*-separation, see [17, 33, 21, 7, 25]) from acyclic graphs to directed mixed graphs (DMGs) (and even HEDGes and $\sigma$-CGs) with or without cycles in a non-naive way.

This paper now aims at proving the mentioned main rules of causal calculus for iSCMs and at deriving adjustment criteria with corresponding adjustment formulas like generalized (selection-)backdoor adjustments.

The paper is structured as follows: We will first give the precise definition and main constructions of interventional structural causal models (iSCMs), closely mirroring mSCMs from [8, 9]. We will then review the definition of $\sigma$-separation and generalize its criterion from mSCMs (see [8, 9]) to iSCMs. As a preparation for the causal calculus, which relates observational and interventional distributions, we will then show how one can extend a given iSCM to one that also incorporates additional interventional variables indicating the regime of interventions onto the observed nodes. We will then show how the rules of causal calculus follow directly from the existence of such an extended iSCM and the $\sigma$-separation criterion applied to it. Finally, we will derive the mentioned general adjustment criteria with corresponding adjustment formulas.

2 Interventional Structural Causal Models

In this section we will define interventional structural causal models (iSCMs), which can be seen as a “conditional” version of modular structural causal models (mSCMs) defined in [8, 9]. We will then construct marginalized iSCMs and intervened iSCMs. To allow for cycles we first need to introduce the notion of a loop of a graph and of its strongly connected components.

Definition 2.1 (Loops).

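Let $G = (V, E)$ be a directed graph (with or without cycles).

  1. A set of nodes $S \subseteq V$ is called a loop of $G$ if for every two nodes $v_1, v_2 \in S$ there are two directed paths $v_1 \to \cdots \to v_2$ and $v_2 \to \cdots \to v_1$ in $G$ such that all the intermediate nodes are also in $S$ (if any). The single-element sets $\{v\}$, $v \in V$, are also considered as loops.

  2. The set of loops of $G$ is written as $\mathcal{L}(G)$.

  3. The strongly connected component of $v$ in $G$ is defined to be:

     $$\mathrm{Sc}^{G}(v) := \mathrm{Anc}^{G}(v) \cap \mathrm{Desc}^{G}(v).$$

Remark 2.2.

Let $G = (V, E)$ be a directed graph.

  1. We always have $v \in \mathrm{Sc}^{G}(v)$ and $\mathrm{Sc}^{G}(v) \in \mathcal{L}(G)$.

  2. If $G$ is acyclic then: $\mathcal{L}(G) = \{\{v\} \mid v \in V\}$.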
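
As a small illustration of the notions of loop and strongly connected component, the following Python sketch computes the strongly connected component of a node as the intersection of its ancestors and descendants and tests the loop condition by brute force. The example graph, the function names and the convention that a node counts as its own ancestor and descendant are assumptions made for illustration only; they are not taken from the paper.

```python
from itertools import combinations


def reachable(edges, start):
    """Nodes reachable from `start` along directed edges (including `start`)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def strongly_connected_component(edges, v):
    """Sc(v) = Anc(v) ∩ Desc(v), with v counted as its own ancestor/descendant."""
    descendants = reachable(edges, v)
    ancestors = reachable({(b, a) for a, b in edges}, v)
    return ancestors & descendants


def is_loop(edges, nodes):
    """True for a single node, or if `nodes` is strongly connected within itself."""
    nodes = set(nodes)
    if not nodes:
        return False
    if len(nodes) == 1:
        return True
    # Only edges inside `nodes` may be used, so intermediate nodes stay in the set.
    induced = {(a, b) for a, b in edges if a in nodes and b in nodes}
    return all(nodes <= reachable(induced, v) for v in nodes)


def loops(edges, all_nodes):
    """Enumerate all loops by brute force (only sensible for tiny graphs)."""
    found = []
    for size in range(1, len(all_nodes) + 1):
        for subset in combinations(sorted(all_nodes), size):
            if is_loop(edges, subset):
                found.append(frozenset(subset))
    return found


# Hypothetical example graph: 1 -> 2 -> 3 -> 2 and 3 -> 4.
EDGES = {(1, 2), (2, 3), (3, 2), (3, 4)}
NODES = {1, 2, 3, 4}
```

In this example the only non-trivial loop is the set containing nodes 2 and 3, which is also the strongly connected component of both nodes; removing the edge from 3 to 2 would make the graph acyclic and leave only the single-node loops.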

In the following, all spaces are meant to be equipped with $\sigma$-algebras, forming standard measurable spaces, and all maps to be measurable.

Definition 2.3 (Interventional Structural Causal Model).

An interventional structural causal model (iSCM) by definition consists of:

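  1. a set of nodes $V^+ = V \,\dot\cup\, U \,\dot\cup\, J$, where elements of $V$ correspond to observed variables, elements of $U$ to latent variables and elements of $J$ to intervention variables,

  2. an observation/latent/action space $\mathcal{X}_v$ for every $v \in V^+$, $\mathcal{X} := \prod_{v \in V^+} \mathcal{X}_v$,

  3. a product probability measure $P_U = \bigotimes_{u \in U} P_u$ on the latent space $\mathcal{X}_U := \prod_{u \in U} \mathcal{X}_u$,

  4. a directed graph structure $G^+ = (V^+, E^+)$ with the properties: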

    1. ,

    2. ,

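    where $\mathrm{Ch}^{G^+}$ and $\mathrm{Pa}^{G^+}$ stand for children and parents in $G^+$, resp. (Footnote 1: To have a “reduced” form of the latent space one can in addition impose the condition: $\mathrm{Ch}^{G^+}(u_1) \not\subseteq \mathrm{Ch}^{G^+}(u_2)$ for every two distinct $u_1, u_2 \in U$. This can always be achieved by gathering latent nodes together if $\mathrm{Ch}^{G^+}(u_1) \subseteq \mathrm{Ch}^{G^+}(u_2)$.)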

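  5. a system of structural equations $g = (g_S)_{S \in \mathcal{L}(G^+),\, S \subseteq V}$:

     $$g_S : \prod_{v \in \mathrm{Pa}^{G^+}(S) \setminus S} \mathcal{X}_v \to \prod_{v \in S} \mathcal{X}_v,$$

    that satisfy the following global compatibility conditions: For every nested pair of loops $S' \subseteq S \subseteq V$ of $G^+$ and every element $x_{\mathrm{Pa}^{G^+}(S) \cup S} \in \prod_{v \in \mathrm{Pa}^{G^+}(S) \cup S} \mathcal{X}_v$ we have the implication:

     $$g_S\big(x_{\mathrm{Pa}^{G^+}(S) \setminus S}\big) = x_S \;\Longrightarrow\; g_{S'}\big(x_{\mathrm{Pa}^{G^+}(S') \setminus S'}\big) = x_{S'},$$

    where $x_{\mathrm{Pa}^{G^+}(S') \setminus S'}$ and $x_{S'}$ denote the corresponding components of $x_{\mathrm{Pa}^{G^+}(S) \cup S}$.

The iSCM will be denoted by $M = (G^+, \mathcal{X}, P_U, g)$.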
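
To make the global compatibility condition more concrete, here is a minimal Python sketch for a hypothetical linear two-node cycle. The model (observed nodes 1 and 2 with latent parents u1 and u2, edges 1 → 2 and 2 → 1), the coefficients `A` and `B` and all function names are assumptions chosen for illustration. The loops inside the observed nodes are {1}, {2} and {1, 2}; the function for the big loop is the unique solution of the two single-node equations, so whatever it outputs must also satisfy each single-node equation.

```python
from fractions import Fraction

# Hypothetical coefficients of the linear cycle (assumed, not from the paper).
A = Fraction(1, 2)   # weight of x2 in the equation for x1
B = Fraction(1, 4)   # weight of x1 in the equation for x2


def g_1(x2, u1):
    """Structural equation for the loop {1}: parents outside the loop are 2 and u1."""
    return A * x2 + u1


def g_2(x1, u2):
    """Structural equation for the loop {2}: parents outside the loop are 1 and u2."""
    return B * x1 + u2


def g_12(u1, u2):
    """Structural equation for the loop {1, 2}: only the latent parents remain.

    Obtained by solving x1 = A*x2 + u1 and x2 = B*x1 + u2 simultaneously,
    which is possible because A*B != 1.
    """
    det = 1 - A * B
    return (u1 + A * u2) / det, (u2 + B * u1) / det


def is_compatible(u1, u2):
    """Check the implication for the nested pairs {1} ⊆ {1, 2} and {2} ⊆ {1, 2}."""
    x1, x2 = g_12(u1, u2)
    return g_1(x2, u1) == x1 and g_2(x1, u2) == x2
```

Exact rational arithmetic is used so that the equalities can be checked without floating-point tolerance. The check only covers the inputs it is called with; it is a sanity test, not a proof of compatibility.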

Definition 2.4 (Modular structural causal model, see [8, 9]).

A modular structural causal model (mSCM) is an iSCM without intervention nodes, i.e. $J = \emptyset$.

Remark 2.5 (Relation between iSCMs and mSCMs).

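Given an iSCM $M$ with graph $G^+$ we can construct a well-defined mSCM by specifying a product distribution $P_J = \bigotimes_{j \in J} P_j$ on $\mathcal{X}_J := \prod_{j \in J} \mathcal{X}_j$. For every node $j \in J$ we can decide to change $j$ either to a latent node ($j \in U$) or to an observed node ($j \in V$). In the latter case we then formally need to add a latent node $u_j$ to $U$ and an edge $u_j \to j$ to $G^+$, put $\mathcal{X}_{u_j} := \mathcal{X}_j$ and $g_{\{j\}}(x_{u_j}) := x_{u_j}$, and consider $P_j$ to live on the latent space $\mathcal{X}_{u_j}$ (corresponding to $u_j$ rather than to $j$ directly).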

The actual joint distributions on the observed space $\mathcal{X}_V$, and thus the random variables attached to any iSCM, will be defined in the following.

Definition 2.6.

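Let $M$ be an iSCM with $G^+ = (V \,\dot\cup\, U \,\dot\cup\, J, E^+)$. We fix a value $x_J \in \mathcal{X}_J$. The following constructions will depend on the choice of $x_J$.

  1. The latent variables are given by $(X_u)_{u \in U} \sim P_U$, i.e. by the canonical projections $X_u : \mathcal{X}_U \to \mathcal{X}_u$, which are jointly $P_U$-independent. These are still independent of $x_J$, but we put $X_u^{do(x_J)} := X_u$.

  2. For $j \in J$ we put $X_j^{do(x_J)} := x_j$, the constant variable given by the $j$-component of $x_J$.

  3. The observed variables $(X_v^{do(x_J)})_{v \in V}$ are inductively defined by:

     $$X_v^{do(x_J)} := g_{S,v}\Big( \big(X_w^{do(x_J)}\big)_{w \in \mathrm{Pa}^{G^+}(S) \setminus S} \Big),$$

    where $S := \mathrm{Sc}^{G^+}(v)$ and where the second index $v$ refers to the $v$-component of $g_S$. The induction is taken over any topological order of the strongly connected components of $G^+$, which always exists (see [8]).

  4. By the compatibility condition for $g$ we then have that for every $S \in \mathcal{L}(G^+)$ with $S \subseteq V$ the following equality holds:

     $$X_S^{do(x_J)} = g_S\Big( X_{\mathrm{Pa}^{G^+}(S) \setminus S}^{do(x_J)} \Big),$$

    where we put $\mathcal{X}_A := \prod_{v \in A} \mathcal{X}_v$ and $X_A^{do(x_J)} := (X_v^{do(x_J)})_{v \in A}$ for subsets $A \subseteq V^+$.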

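  5. We define the family of conditional distributions:

     $$P_U\big(X_A \mid X_B, do(X_J = x_J)\big) := P_U\big(X_A^{do(x_J)} \mid X_B^{do(x_J)}\big),$$

    for $A, B \subseteq V^+$ and $x_J \in \mathcal{X}_J$. Note that in the following we will use the $do$ and the $do$-free notation (only) for the $J$-variables interchangeably.

  6. If we, furthermore, specify a product distribution $P_J = \bigotimes_{j \in J} P_j$ on $\mathcal{X}_J$, then we get a joint distribution $P$ on $\mathcal{X}$ by setting:

     $$P(X_V, X_U, X_J) := P_U\big(X_V, X_U \mid do(X_J)\big) \otimes P_J(X_J).$$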
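
The inductive construction of the observed variables can be sketched in Python as follows: order the strongly connected components topologically, then evaluate one solved mechanism per observed component. The example model (an intervention node `j`, latent nodes `u1`, `u2`, `u3`, a cycle between `x1` and `x2`, and a downstream node `x3`), its coefficients and all function names are hypothetical choices for illustration. In an actual simulation the latent values would be drawn from the product measure on the latent space; here they are passed in explicitly so that the result is deterministic.

```python
from fractions import Fraction


def reachable(edges, start):
    """Nodes reachable from `start` along directed edges (including `start`)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def components_in_topological_order(nodes, edges):
    """Strongly connected components, ordered so that parents come first."""
    reversed_edges = {(b, a) for a, b in edges}
    component = {v: frozenset(reachable(edges, v) & reachable(reversed_edges, v))
                 for v in nodes}
    remaining, order = set(component.values()), []
    while remaining:
        placed = set().union(*order)
        # A component is ready once all parents outside it are already placed.
        ready = [c for c in remaining
                 if all(a in c or a in placed for a, b in edges if b in c)]
        nxt = min(ready, key=min)  # deterministic tie-break on node labels
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical model: x1 = x2/2 + u1 + j,  x2 = x1/4 + u2,  x3 = x2 + u3.
EDGES = {("j", "x1"), ("u1", "x1"), ("u2", "x2"), ("u3", "x3"),
         ("x1", "x2"), ("x2", "x1"), ("x2", "x3")}
OBSERVED = {"x1", "x2", "x3"}


def g_x1x2(val):
    """Solved mechanism of the component {x1, x2} (depends on u1, u2, j only)."""
    det = 1 - Fraction(1, 2) * Fraction(1, 4)
    shift = val["u1"] + val["j"]
    return {"x1": (shift + Fraction(1, 2) * val["u2"]) / det,
            "x2": (val["u2"] + Fraction(1, 4) * shift) / det}


def g_x3(val):
    """Mechanism of the component {x3}."""
    return {"x3": val["x2"] + val["u3"]}


MECHANISMS = {frozenset({"x1", "x2"}): g_x1x2, frozenset({"x3"}): g_x3}


def evaluate(x_j, x_u):
    """Compute all variables for fixed intervention values and latent values."""
    values = {**x_j, **x_u}
    nodes = {n for edge in EDGES for n in edge}
    for comp in components_in_topological_order(nodes, EDGES):
        if comp <= OBSERVED:
            values.update(MECHANISMS[comp](values))
    return values
```

Calling `evaluate({"j": Fraction(1)}, {"u1": 0, "u2": 0, "u3": 0})` fixes the intervention value and the latent values and returns a dictionary with one entry per node; repeating the call with fresh latent draws would give samples from the corresponding interventional distribution of this toy model.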

Remark 2.7.

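Let $M$ be an iSCM with $G^+ = (V \,\dot\cup\, U \,\dot\cup\, J, E^+)$. For every subset $A \subseteq V$ we get a well-defined map $g_A : \prod_{v \in \mathrm{Pa}^{G^+}(A) \setminus A} \mathcal{X}_v \to \prod_{v \in A} \mathcal{X}_v$ by recursively plugging the $g_S$ into each other for the biggest occurring loops $S \subseteq A$, by the same arguments as before. These maps $(g_A)_{A \subseteq V}$ then are all globally compatible by construction and satisfy:

$$X_A^{do(x_J)} = g_A\Big( X_{\mathrm{Pa}^{G^+}(A) \setminus A}^{do(x_J)} \Big).$$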

Similar to mSCMs (see [8, 9]) we can define the marginalisation of an iSCM.

Definition 2.8 (Marginalisation of iSCMs).

Let be an iSCM with and a subset. The marginalised iSCM w.r.t.  can be defined by plugging the functions related to into each other. For example, when marginalizing out we can define (for the non-trivial case ):

where is the marginalised graph of , is any loop of and the corresponding induced loop in .

Similar to mSCMs (see [8, 9]) we now define what it means to intervene on observed nodes in an iSCM.

Definition 2.9 (Perfect interventions on iSCMs).

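Let $M$ be an iSCM with $G^+ = (V \,\dot\cup\, U \,\dot\cup\, J, E^+)$. Let $W \subseteq V$ be a subset. We then define the post-interventional iSCM $M_{do(W)}$ w.r.t. $W$:

  1. Define the graph $G^+_{do(W)}$ by removing all the edges $v \to w$ for all nodes $w \in W$ and $v \in V^+$.

  2. Put $V_{do(W)} := V \setminus W$ and $J_{do(W)} := J \,\dot\cup\, W$.

  3. Remove the functions $g_S$ for loops $S$ with $S \cap W \neq \emptyset$.

The remaining functions $g_{do(W)}$ then are clearly globally compatible and we get a well-defined iSCM $M_{do(W)}$.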
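
The three steps of a perfect intervention amount to simple graph surgery, which the following Python sketch illustrates. The function names, the representation of an iSCM's graph as node sets plus an edge set, and the example graph are assumptions for illustration; the structural equations themselves are represented only through the loops they belong to.

```python
def intervene(observed, latent, action, edges, targets):
    """Graph surgery for a perfect intervention on the observed nodes `targets`.

    Returns the new observed, latent and intervention node sets and the new edges.
    """
    targets = set(targets)
    if not targets <= set(observed):
        raise ValueError("perfect interventions are defined on observed nodes only")
    # Step 1: remove every edge pointing into an intervened node.
    new_edges = {(a, b) for a, b in edges if b not in targets}
    # Step 2: intervened nodes move from the observed to the intervention nodes.
    return set(observed) - targets, set(latent), set(action) | targets, new_edges


def surviving_loops(loops, targets):
    """Step 3: keep only the loops (and hence functions) that avoid `targets`."""
    targets = set(targets)
    return [s for s in loops if not set(s) & targets]


# Hypothetical example: cycle x1 <-> x2, x2 -> x3, with exogenous parents.
EDGES = {("j", "x1"), ("u1", "x1"), ("u2", "x2"), ("u3", "x3"),
         ("x1", "x2"), ("x2", "x1"), ("x2", "x3")}
LOOPS = [frozenset({"x1"}), frozenset({"x2"}), frozenset({"x3"}),
         frozenset({"x1", "x2"})]
```

Intervening on `x2` in this example cuts the edges into `x2`, which breaks the cycle between `x1` and `x2`, while the outgoing edges from `x2` to `x1` and `x3` are kept.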

3 Conditional Independence

Here we shortly generalize conditional independence for structured families of distributions. The main application will be the distributions coming from iSCMs, but the following definition might be of more general importance.

Definition 3.1 (Conditional independence).

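Let $\mathcal{X}_V := \prod_{v \in V} \mathcal{X}_v$ and $\mathcal{X}_J := \prod_{j \in J} \mathcal{X}_j$ be product spaces and

$$P_V = \big( P_V(X_V \mid x_J) \big)_{x_J \in \mathcal{X}_J}$$

a family of distributions on $\mathcal{X}_V$ (measurably; see Footnote 3) parametrized by $\mathcal{X}_J$. (Footnote 3: We require that for every measurable $F \subseteq \mathcal{X}_V$ the map $\mathcal{X}_J \to [0, 1]$ given by $x_J \mapsto P_V(X_V \in F \mid x_J)$ is measurable. Such families of distributions are also called channels or (stochastic) Markov (transition) kernels (see [14]).) For subsets $A, B, C \subseteq V \,\dot\cup\, J$ we write:

$$X_A \perp\!\!\!\perp_{P_V} X_B \mid X_C$$

if and only if for every product distribution $P_J = \bigotimes_{j \in J} P_j$ on $\mathcal{X}_J$ we have:

$$X_A \perp\!\!\!\perp_{P_V \otimes P_J} X_B \mid X_C,$$

where $P_V \otimes P_J$ denotes the distribution given by $X_J \sim P_J$ and then $X_V \sim P_V(\,\cdot \mid X_J)$.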

Lemma 3.2.

Let the situation be like in 3.1. If are pairwise disjoint, and then we have the equivalence:

  1. , if and only if

  2. is only a function of (for every setting of ).


Since the latter does not depend on the choice of it clearly implies the former. Now assume the former. For every two values and every put . Since the former holds for every of product form we get:

with given by using from above. The claim follows:

Remark 3.3.
  1. Lem. 3.2 shows that definition 3.1 generalizes the one from [26].

  2. Thm. 4.4 in [2] shows that definition 3.1 also generalizes the one from [2].

  3. In contrast with [5, 2, 26], definition 3.1 can accommodate any variable from $V$ or $J$ at any spot of the conditional independence statement.

  4. satisfies the semi-graphoid/separoid axioms (see [5, 22, 11] or see rules 1-5 in Lem. 4.5 for ) as these rules hold for any distribution and are preserved under conjunction.

4 $\sigma$-Separation

In this section we will define $\sigma$-separation on directed mixed graphs (DMGs) and present the generalized directed global Markov property, stating that every iSCM will entail the conditional independencies that come from $\sigma$-separation in its induced DMG. We again will closely follow the work in [9].

Definition 4.1 (Directed mixed graph (DMG)).

A directed mixed graph (DMG) $G$ consists of a set of nodes $V$ together with a set of directed edges ($\to$) and bidirected edges ($\leftrightarrow$). In case $G$ contains no directed cycles it is called an acyclic directed mixed graph (ADMG).
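
A DMG can be represented directly as a node set with two edge sets. The following Python sketch is one possible representation, with a check for directed cycles that distinguishes ADMGs from general DMGs. The class name, field names and the use of Kahn's algorithm for the cycle check are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass, field


@dataclass
class DMG:
    """Directed mixed graph: nodes, directed edges and bidirected edges."""
    nodes: set
    directed: set = field(default_factory=set)     # pairs (a, b) meaning a -> b
    bidirected: set = field(default_factory=set)   # frozensets {a, b} meaning a <-> b

    def has_directed_cycle(self):
        """Kahn's algorithm: peel off nodes without incoming directed edges.

        Bidirected edges are ignored, since only directed cycles matter here.
        """
        indegree = {v: 0 for v in self.nodes}
        for _, b in self.directed:
            indegree[b] += 1
        queue = [v for v, d in indegree.items() if d == 0]
        removed = 0
        while queue:
            v = queue.pop()
            removed += 1
            for a, b in self.directed:
                if a == v:
                    indegree[b] -= 1
                    if indegree[b] == 0:
                        queue.append(b)
        # Nodes that could not be peeled off lie on or behind a directed cycle.
        return removed < len(self.nodes)

    def is_admg(self):
        """An ADMG is a DMG without directed cycles."""
        return not self.has_directed_cycle()
```

For example, `DMG({1, 2, 3}, {(1, 2), (2, 3)}, {frozenset({1, 3})})` is an ADMG, whereas adding the directed edge `(3, 1)` creates a directed cycle and leaves a general DMG.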

Definition 4.2 ($\sigma$-Open path in a DMG).

Let be a DMG with set of nodes and a subset. Consider a path in with nodes: