1 Introduction
Difficult causal questions, such as ‘does eating meat cause cancer?’ or ‘would increasing the minimum wage lead to a fall in employment?’, are fundamental to decisions about how our society is structured and to our understanding of the world. The development of Causal Graphical Models (CGMs) and the do-calculus Pearl (1995, 2009) has given us an extremely rich and powerful framework with which to formalise and approach such questions. This framework is presented as fundamentally extra-statistical: Pearl has argued forcefully that (Bayesian) probability theory alone is not sufficient for solving causal problems Pearl (2001). The notion that causality fundamentally requires new mathematics, and that causal questions cannot be solved within existing paradigms for probabilistic inference, has led to extensive controversy and debate, e.g. Gelman (2009, 2019). This debate has been particularly intense between proponents of causal modelling and Bayesian modellers. That is perhaps not surprising, since the Bayesian approach to combining assumptions with data is typically presented as sufficiently general to tackle any probabilistic inference problem (although computational constraints may make it impractical).
In this paper, we demonstrate how the assumptions encoded by causal graphical models can be represented with a probabilistic graphical model (PGM). The advantage of doing so is mostly conceptual: it allows Bayesian practitioners to represent and reason about the modelling assumptions required for causal inference in a framework with which they are familiar. However, there may also be practical benefits in cases where causal queries are not identifiable via the do-calculus. In such cases, without additional assumptions, it is fundamentally impossible to infer the exact outcome of an intervention, even given infinite pre-interventional data. Modelling such problems within a standard Bayesian inference setting allows us to leverage a vast body of existing research on combining assumptions with data to obtain finite-sample estimates for distributions of interest. While the posterior distribution will always remain sensitive to the prior (unless we add assumptions about the functional form of the relationships between variables), we may still obtain useful bounds. The disadvantage of modelling causal questions explicitly as a single PGM is that it is more cumbersome and computationally expensive (unless we use the machinery of the do-calculus to identify appropriate reparameterisations).
1.1 Representing a Causal Problem with a Probabilistic Graphical Model
In the following sections we show how a causal query can be represented with a PGM and how to do causal inference via this approach. For the necessary background on probabilistic and causal graphical models, we refer readers to the appendix.
To represent an intervention with an ordinary probabilistic graphical model, we must explicitly model the pre- and post-intervention systems and the relationship between them. Algorithm 1 constructs a probabilistic graphical model for a specific intervention in a causal graphical model.
Algorithm 1: CausalBayesConstruct
Input: Causal graph $\mathcal{G}$ and intervention $do(X = x^*)$.
Output: Probabilistic graphical model representing this intervention.

1. Draw the original causal graph inside a plate indexed from $i = 1$ to $N$ to represent the data generating process.

2. For each variable $V_j$, parameterize $P(V_j \mid \mathrm{pa}(V_j))$, the distribution of $V_j$ given its parents, by adding a parameter $\theta_{V_j}$ with a link into $V_j$.

3. Draw the graph after the intervention by setting $X = x^*$ and removing all links into it. Rename each of the variables to distinguish them from the variables in the original graph, e.g. $V_j$ becomes $V_j^*$.

4. Connect the two graphs by linking $\theta_{V_j}$ to the corresponding variable $V_j^*$ in the post-interventional graph, for each $V_j$ excluding $X$.
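The four steps can be sketched as a pure graph transformation. This is a minimal illustration, not the authors' implementation: it assumes the causal graph is given as a Python dict mapping each node to its list of parents, and the node-naming scheme (`V_i` for the copy inside the plate, `V*` for the post-interventional copy, `theta_V` for parameters) is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def causal_bayes_construct(
    parents: Dict[str, List[str]], x: str
) -> Tuple[List[str], List[Edge]]:
    """Build the joint pre/post-intervention PGM for do(x = x*).

    Hypothetical naming scheme: "V_i" is V inside the plate over i = 1..N,
    "V*" is the post-interventional copy, "theta_V" is the parameter of V.
    """
    nodes: List[str] = []
    edges: List[Edge] = []
    # Step 1: the original causal graph, drawn inside the plate.
    for v, pas in parents.items():
        nodes.append(f"{v}_i")
        edges.extend((f"{p}_i", f"{v}_i") for p in pas)
    # Step 2: a parameter node for P(V | pa(V)), linked into V.
    for v in parents:
        nodes.append(f"theta_{v}")
        edges.append((f"theta_{v}", f"{v}_i"))
    # Step 3: the post-interventional graph; X* is fixed to x*, so it keeps no incoming links.
    for v, pas in parents.items():
        nodes.append(f"{v}*")
        if v != x:
            edges.extend((f"{p}*", f"{v}*") for p in pas)
    # Step 4: share every parameter with the post-interventional copy, except theta_X.
    for v in parents:
        if v != x:
            edges.append((f"theta_{v}", f"{v}*"))
    return nodes, edges


if __name__ == "__main__":
    graph = {"Z": [], "X": ["Z"], "Y": ["X", "Z"]}
    nodes, edges = causal_bayes_construct(graph, "X")
    for edge in edges:
        print(" -> ".join(edge))
```

For the confounded graph $Z \rightarrow X \rightarrow Y$, $Z \rightarrow Y$, the output contains `theta_Y -> Y*` and `theta_Z -> Z*` but neither `Z* -> X*` nor `theta_X -> X*`, which is exactly the invariance assumption encoded by the construction.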
A PGM constructed with Algorithm 1 represents exactly the same assumptions about a specific intervention as the corresponding CGM; see Figures 1 and 2 for an example. We have simply created an explicit joint model over the system pre- and post-intervention, which allows the direct application of standard statistical inference, rather than requiring additional notation and operations that map from one to the other, as the do-calculus does. The Bayesian model is specified by the parameterization of the conditional distribution of each variable given its parents, and priors may be placed on the parameters $\theta$. Sharing the parameters $\theta_{V_j}$ between each pair of variables $(V_j, V_j^*)$, excluding $X$, captures the assumption that the only thing changed by the intervention is the way $X$ takes its value: the conditional distributions of all other variables given their parents are invariant.
1.2 Causal Inference with Probabilistic Graphical Models
The result of Algorithm 1 is a probabilistic graphical model on which we can do inference with standard probability theory rather than the do-calculus, and which supports operations such as arrow reversal (via Bayes rule). To infer causal effects, we compute a predictive distribution for the quantity of interest in the post-intervention graph using Bayes rule, integrating out all parameters, latent variables and any observed variables that are not of interest, for each setting of the treatment $x^*$.
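To make this procedure clearer, let $V$ be the set of variables in the original causal graph $\mathcal{G}$, excluding the variable we intervene on, $X$, and let $V^*$ be the corresponding variables in the post-interventional graph. We have:
$\theta$: the set of model parameters,
$\mathbf{V}$: a matrix of the $N$ observations of variables $V$, collected pre-intervention,
$\mathbf{x} = (x_1, \ldots, x_N)$: a vector of the observed values of the treatment variable $X$,
$V^*$: the variables of the system post-intervention,
$x^*$: the value that the intervened-on variable is set to,
$Y^* \in V^*$: the variable of interest post-intervention.

The goal is to infer the unobserved post-interventional distribution over $V^*$, given the observed data $\mathbf{V}$ and $\mathbf{x}$ and a selected treatment $x^*$. By construction, conditional on the parameters $\theta$, the post-interventional variables $V^*$ are independent of the data collected pre-intervention, $(\mathbf{V}, \mathbf{x})$. The value of the intervention $x^*$ is set exogenously, so it is independent of both $\theta$ and $(\mathbf{V}, \mathbf{x})$ (indeed, $x^*$ has no marginal distribution: it is a constant set by the intervention). This ensures that the joint distribution factorizes into three terms: a prior over the parameters $P(\theta)$, the likelihood for the original system $P(\mathbf{V}, \mathbf{x} \mid \theta)$, and a predictive distribution for the post-interventional variables given parameters and intervention, $P(V^* \mid \theta, x^*)$:

$$P(V^*, \mathbf{V}, \mathbf{x}, \theta \mid x^*) = P(\theta)\, P(\mathbf{V}, \mathbf{x} \mid \theta)\, P(V^* \mid \theta, x^*).$$

We then marginalize out $\theta$,

$$P(V^*, \mathbf{V}, \mathbf{x} \mid x^*) = \int P(\theta)\, P(\mathbf{V}, \mathbf{x} \mid \theta)\, P(V^* \mid \theta, x^*)\, d\theta, \tag{1}$$

and condition on the observed data $(\mathbf{V}, \mathbf{x})$,

$$P(V^* \mid \mathbf{V}, \mathbf{x}, x^*) = \frac{\int P(\theta)\, P(\mathbf{V}, \mathbf{x} \mid \theta)\, P(V^* \mid \theta, x^*)\, d\theta}{\int P(\theta)\, P(\mathbf{V}, \mathbf{x} \mid \theta)\, d\theta} = \int P(V^* \mid \theta, x^*)\, P(\theta \mid \mathbf{V}, \mathbf{x})\, d\theta. \tag{2}$$

Finally, if the goal is to infer mean treatment effects on a specific variable $Y^*$ post-intervention, we can marginalize out the remaining variables in $V^*$ (a sum for discrete variables). We could also compute conditional treatment effects by first conditioning on selected variables in $V^*$.

$$P(Y^* \mid \mathbf{V}, \mathbf{x}, x^*) = \int P(V^* \mid \mathbf{V}, \mathbf{x}, x^*)\, d\big(V^* \setminus \{Y^*\}\big). \tag{3}$$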
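Equations (2) and (3) can be approximated by Monte Carlo: draw parameters from their posterior, evaluate the post-interventional predictive for each draw, and average. The sketch below does this for a hypothetical binary example with graph $Z \rightarrow X \rightarrow Y$, $Z \rightarrow Y$. The ground-truth mechanism in `simulate` and the independent Beta(1, 1) priors on every Bernoulli parameter are our assumptions for illustration; conjugacy makes posterior sampling exact.

```python
import random
from typing import Dict, List, Tuple

Row = Tuple[int, int, int]  # one pre-interventional observation (z, x, y)


def simulate(n: int, rng: random.Random) -> List[Row]:
    """Draw n rows from a hypothetical ground truth for Z -> X -> Y, Z -> Y."""
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        x = int(rng.random() < (0.8 if z else 0.2))
        y = int(rng.random() < 0.2 + 0.3 * x + 0.4 * z)
        rows.append((z, x, y))
    return rows


def beta_posterior(outcomes: List[int]) -> Tuple[int, int]:
    """Beta(1, 1) prior updated with binary outcomes: Beta(1 + ones, 1 + zeros)."""
    ones = sum(outcomes)
    return 1 + ones, 1 + len(outcomes) - ones


def posterior_predictive(
    rows: List[Row], x_star: int, draws: int, rng: random.Random
) -> float:
    """Monte Carlo estimate of P(Y* = 1 | data, x*) as in Equations (2)-(3)."""
    post_z = beta_posterior([z for z, _, _ in rows])
    # Only rows with X = x* inform P(Y = 1 | X = x*, Z = z).
    post_y: Dict[int, Tuple[int, int]] = {
        z: beta_posterior([y for zz, xx, y in rows if zz == z and xx == x_star])
        for z in (0, 1)
    }
    total = 0.0
    for _ in range(draws):
        # theta ~ P(theta | data). theta_X is never drawn: Algorithm 1 does
        # not link it into the post-interventional graph.
        theta_z = rng.betavariate(*post_z)
        theta_y = {z: rng.betavariate(*post_y[z]) for z in (0, 1)}
        # P(Y* = 1 | theta, x*) = sum_z P(Z* = z | theta) P(Y* = 1 | x*, z, theta)
        total += (1 - theta_z) * theta_y[0] + theta_z * theta_y[1]
    return total / draws


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(20_000, rng)
    for x_star in (0, 1):
        print(x_star, posterior_predictive(data, x_star, 2_000, rng))
```

Under this ground truth, $P(Y = 1 \mid do(X = 1)) = 0.5 \cdot 0.5 + 0.5 \cdot 0.9 = 0.7$, whereas the observational $P(Y = 1 \mid X = 1) = 0.82$ because $Z$ confounds $X$ and $Y$. With many observations, the estimate should therefore settle near 0.7, not 0.82.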
If there are no latent variables in $V$, then, assuming positive density over the domain of $(V, X)$ and a well-defined prior $P(\theta)$, the likelihood will dominate, and the posterior over the parameters will become independent of the prior in the infinite-data limit. The term $P(V^* \mid \theta, x^*)$ can be expanded into a product of terms of the form $P(V_j^* \mid \mathrm{pa}(V_j^*), \theta_{V_j})$, one for each variable given its parents, following the factorization implied by the post-interventional graph. By steps 3 and 4 of Algorithm 1, each of these terms is equal to the corresponding pre-interventional term $P(V_j \mid \mathrm{pa}(V_j), \theta_{V_j})$, with $X$ fixed to $x^*$ wherever it appears as a parent, giving results equivalent to Pearl’s truncated product formula Pearl (2009). Authors (2019) demonstrate the equivalence of this approach with the do-calculus on a number of worked examples.
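For discrete variables, the truncated product formula can be evaluated directly by enumeration, giving a reference value that the posterior predictive should approach in the infinite-data limit. A minimal sketch, assuming binary variables and hypothetical conditional probability tables supplied as Python functions: $X$ is clamped to $x^*$ and its own factor is dropped.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, int]
# cpts[v](a) returns P(V = a[v] | pa(V)), reading the parent values from a.
CPT = Callable[[Assignment], float]


def bern(p_one: float, value: int) -> float:
    """Probability of a binary value under a Bernoulli(p_one) distribution."""
    return p_one if value == 1 else 1.0 - p_one


def truncated_product(
    variables: List[str], cpts: Dict[str, CPT], x: str, x_star: int
) -> Dict[Tuple[int, ...], float]:
    """P(v | do(X = x_star)) = prod over V_j != X of P(v_j | pa_j), with X = x_star.

    Keys are value tuples for the non-intervened variables, in `variables` order.
    """
    others = [v for v in variables if v != x]
    joint = {}
    for values in product((0, 1), repeat=len(others)):
        a = dict(zip(others, values))
        a[x] = x_star  # X is clamped; its own factor P(x | pa(X)) is dropped
        p = 1.0
        for v in others:
            p *= cpts[v](a)
        joint[values] = p
    return joint


# Hypothetical CPTs for Z -> X -> Y, Z -> Y.
CPTS: Dict[str, CPT] = {
    "Z": lambda a: bern(0.5, a["Z"]),
    "X": lambda a: bern(0.8 if a["Z"] else 0.2, a["X"]),
    "Y": lambda a: bern(0.2 + 0.3 * a["X"] + 0.4 * a["Z"], a["Y"]),
}

if __name__ == "__main__":
    post = truncated_product(["Z", "X", "Y"], CPTS, "X", 1)
    # Keys are (z, y); marginalize Z to get P(Y = 1 | do(X = 1)).
    print(sum(p for (z, y), p in post.items() if y == 1))
```

The table for `X` is never consulted, mirroring the missing link from $\theta_X$ to $X^*$ in the PGM.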
2 Conclusion
The paper shows that it is possible to arrive at the same solution for causal problems using either the do-calculus or Bayesian theory. The key insight required for the Bayesian formulation is that the probabilistic graphical model must jointly model both the pre-intervention and post-intervention worlds. Our conclusion is similar to that of Lindley et al. (1981); however, we provide an explicit mechanism by which we can encode the assumptions implied by a causal graphical model, formalising the notion of exchangeability in this context.
References
 Authors (2019). Replacing the do-calculus with Bayes rule. arXiv preprint arXiv:1906.07125.
 Dawid, A. P. (2015). Statistical causality from a decision-theoretic perspective. Annual Review of Statistics and Its Application, 2:273–303.
 Gelman, A. (2009). Resolving disputes between J. Pearl and D. Rubin on causal inference. https://statmodeling.stat.columbia.edu/2009/07/05/disputes_about/.
 Gelman, A. (2019). “The Book of Why” by Pearl and Mackenzie. https://statmodeling.stat.columbia.edu/2019/01/08/bookpearlmackenzie/.
 Jordan, M. I. (2004). Graphical models. Statistical Science, 19(1):140–155.
 Lindley, D. V., Novick, M. R., et al. (1981). The role of exchangeability in inference. The Annals of Statistics, 9(1):45–58.
 Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4):669–688.
 Pearl, J. (2001). Bayesianism and causality, or, why I am only a half-Bayesian. In Foundations of Bayesianism, pages 19–36. Springer.
 Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press, New York.
 Peters, J., Janzing, D., and Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.
3 Appendix
3.1 Background on Probabilistic and Causal Graphical Models
Probabilistic graphical models (PGMs) combine graph theory with probability theory in order to develop new algorithms and to present models in an intuitive framework Jordan (2004). A (directed) probabilistic graphical model is a directed acyclic graph over variables, which represents how the joint distribution over these variables may be factorized. In particular, any missing edge in the graph must correspond to a conditional independence relation in the joint distribution. There are multiple valid probabilistic graphical model representations for a given joint distribution. For example, any joint distribution over two variables $X$ and $Y$ may be represented by either $X \rightarrow Y$ or $Y \rightarrow X$.
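A small numerical check of the two-variable example, assuming binary $X$ and $Y$ and an arbitrary, strictly positive joint table: the parameters of $X \rightarrow Y$, namely $P(X)$ and $P(Y \mid X)$, and those of $Y \rightarrow X$, namely $P(Y)$ and $P(X \mid Y)$, both reproduce the same joint, so observational data alone cannot distinguish the two graphs.

```python
from typing import Dict, Tuple

Joint = Dict[Tuple[int, int], float]  # (x, y) -> P(X = x, Y = y)
Marginal = Dict[int, float]


def factorizations(joint: Joint) -> Tuple[Marginal, Joint, Marginal, Joint]:
    """Return the parameters of both graphs X -> Y and Y -> X for one joint.

    Assumes every marginal probability is strictly positive.
    """
    p_x = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
    p_y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
    p_y_given_x = {(x, y): joint[(x, y)] / p_x[x] for (x, y) in joint}  # X -> Y
    p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for (x, y) in joint}  # Y -> X
    return p_x, p_y_given_x, p_y, p_x_given_y


if __name__ == "__main__":
    joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}
    p_x, p_y_given_x, p_y, p_x_given_y = factorizations(joint)
    for (x, y), p in joint.items():
        # Both factorizations reproduce the same joint probability.
        print((x, y), p, p_x[x] * p_y_given_x[(x, y)], p_y[y] * p_x_given_y[(x, y)])
```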
A causal graphical model (CGM) is a probabilistic graphical model with the additional assumption that a link $X \rightarrow Y$ means $X$ causes $Y$. Think of the data generating process for a CGM as first sampling values for the exogenous variables (those with no parents in the graph), and then sampling a value for each remaining node once all of its parents have been sampled. An atomic intervention in such a system that sets the value of a specific variable $X$ to a fixed constant corresponds to removing all links into $X$, as it is now set exogenously rather than determined by its previous causes. It is assumed that everything else in the system remains unchanged, in particular the functions or conditional distributions that determine the value of a variable given its parents in the graph. In this way, a CGM encodes more than the factorization (or conditional independence structure) of the joint distribution over its variables; it additionally specifies how the system responds to atomic interventions.
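The sampling view of a CGM and of an atomic intervention can be sketched as follows, for a hypothetical binary graph $Z \rightarrow X \rightarrow Y$, $Z \rightarrow Y$ with made-up mechanisms. Intervening replaces only the mechanism for $X$; the mechanisms for $Z$ and $Y$ are reused unchanged, which is the invariance assumption described above.

```python
import random
from typing import Dict, List, Optional


def sample(rng: random.Random, do_x: Optional[int] = None) -> Dict[str, int]:
    """One ancestral sample from the hypothetical CGM Z -> X -> Y, Z -> Y.

    With do_x set, the atomic intervention do(X = do_x) cuts the link Z -> X;
    the mechanisms for Z and Y are reused unchanged.
    """
    z = int(rng.random() < 0.5)  # exogenous: no parents
    if do_x is None:
        x = int(rng.random() < (0.8 if z else 0.2))  # X determined by its cause Z
    else:
        x = do_x  # X set exogenously by the intervention
    y = int(rng.random() < 0.2 + 0.3 * x + 0.4 * z)  # same mechanism in both regimes
    return {"Z": z, "X": x, "Y": y}


def frequency_y1(samples: List[Dict[str, int]], condition_x: Optional[int] = None) -> float:
    """Empirical P(Y = 1), optionally among samples with X = condition_x."""
    kept = [s for s in samples if condition_x is None or s["X"] == condition_x]
    return sum(s["Y"] for s in kept) / len(kept)


if __name__ == "__main__":
    rng = random.Random(1)
    observational = [sample(rng) for _ in range(50_000)]
    interventional = [sample(rng, do_x=1) for _ in range(50_000)]
    print("P(Y=1 | X=1)     ~", frequency_y1(observational, condition_x=1))
    print("P(Y=1 | do(X=1)) ~", frequency_y1(interventional))
```

Analytically, these mechanisms give $P(Y = 1 \mid X = 1) = 0.82$ but $P(Y = 1 \mid do(X = 1)) = 0.7$, so the two printed frequencies should be close to those values; conditioning and intervening differ because $Z$ confounds $X$ and $Y$.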
A CGM describes how the structure of a system is modified by an intervention. However, answering causal queries such as "what would the distribution of cancer look like if we were able to prevent smoking?" requires inference about the distributions of variables in the post-interventional system. The do-notation, e.g. $P(y \mid do(x))$, is a shorthand for describing the distribution of variables post-intervention, and the do-calculus is a set of rules for identifying which (conditional) distributions are equivalent pre- and post-intervention. If it is possible to derive an expression for the desired post-interventional distribution purely in terms of the joint distribution over the original system via the do-calculus, then the causal query is identifiable, meaning that, assuming positive density and infinite data, we obtain a point estimate for it.
Here we present the do-calculus in a simplified form that applies to interventions on single variables Pearl (1995, 2009); Peters et al. (2017).
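The do-calculus
Let $\mathcal{G}$ be a CGM, let $\mathcal{G}_{\overline{X}}$ represent $\mathcal{G}$ post-intervention (i.e. with all links into $X$ removed) and let $\mathcal{G}_{\underline{X}}$ represent $\mathcal{G}$ with all links out of $X$ removed. Let $do(x)$ represent intervening to set a single variable $X$ to $x$, let $Y$, $Z$ and $W$ be disjoint sets of variables not containing $X$, with values $y$, $z$ and $w$, and write $A \perp\!\!\!\perp B \mid C$ for d-separation of $A$ and $B$ given $C$.
Rule 1: $P(y \mid do(x), z, w) = P(y \mid do(x), w)$ if $Y \perp\!\!\!\perp Z \mid X, W$ in $\mathcal{G}_{\overline{X}}$.
Rule 2: $P(y \mid do(x), w) = P(y \mid x, w)$ if $Y \perp\!\!\!\perp X \mid W$ in $\mathcal{G}_{\underline{X}}$.
Rule 3: $P(y \mid do(x), w) = P(y \mid w)$ if $Y \perp\!\!\!\perp X \mid W$ in $\mathcal{G}_{\overline{X}}$, and no variable in $W$ is a descendant of $X$.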
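As a worked illustration (our own, assuming the hypothetical confounded graph $Z \rightarrow X \rightarrow Y$, $Z \rightarrow Y$ with $Z$ observed), Rules 2 and 3 suffice to identify $P(y \mid do(x))$. Rule 2 applies with $W = \{Z\}$ because the only path between $X$ and $Y$ in $\mathcal{G}_{\underline{X}}$ is $X \leftarrow Z \rightarrow Y$, which is blocked by conditioning on $Z$. Rule 3 applies to $P(z \mid do(x))$ with $W$ empty because in $\mathcal{G}_{\overline{X}}$ the only path between $Z$ and $X$ is $Z \rightarrow Y \leftarrow X$, which is blocked at the unconditioned collider $Y$.

```latex
\begin{align*}
P(y \mid do(x)) &= \sum_z P(y \mid do(x), z)\, P(z \mid do(x)) && \text{(total probability)} \\
                &= \sum_z P(y \mid x, z)\, P(z \mid do(x))      && \text{(Rule 2, } W = \{Z\}) \\
                &= \sum_z P(y \mid x, z)\, P(z)                 && \text{(Rule 3, } W = \emptyset)
\end{align*}
```

The final expression is the back-door adjustment, and it coincides with the truncated product formula $P(z)\,P(y \mid x, z)$ marginalized over $Z$.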