Bayesian formulations of causal inference enable practitioners to reason explicitly about uncertainty when answering structural questions (e.g., “What is the probability that one variable causes another?”) as well as questions about the effects of a specific intervention (e.g., “How much will intervening to set one variable to a particular value increase the probability of a particular outcome?”). Bayesian formulations have been developed for both the potential outcomes framework McCandless et al. (2009) and the causal graphical models framework Friedman and Koller (2000); Griffiths and Tenenbaum (2009); Heckerman et al. (1995). In principle, Bayesian approaches make it possible to incorporate prior knowledge and to make efficient use of limited data Mansinghka et al. (2006); Murphy (2012).
In this paper, we explore a new approach to implementing Bayesian causal inference based on probabilistic programming, inspired by Bayesian synthesis Saad et al. (2019). Probabilistic programming languages enable users to specify probabilistic models compactly in code. Some languages, like Stan Carpenter et al. (2017), have syntax that closely resembles the statistical notation often used in the literature to define probabilistic models: a list of sampling statements, each declaring that a variable is drawn from a distribution whose parameters may depend on other variables. Others, like Gen Cusumano-Towner et al. (2019), allow users to include arbitrary program control flow in their models; a model is represented by a program that simulates stochastically from a distribution. In this paper, we represent hypothesized causal models explaining some phenomenon as programs in MiniStan, a simple probabilistic programming language designed to resemble Stan (Figure 1). We then use a more expressive probabilistic programming language, Gen, to encode a prior and likelihood over MiniStan programs and to perform inference. The Gen model (i) stochastically generates MiniStan programs to encode a prior distribution over causal model structures and parameters, (ii) programmatically edits the generated MiniStan programs to reflect interventions and experimental conditions, and then (iii) interprets the MiniStan programs to generate observational and experimental data. We can then use Gen’s inference programming and conditioning features to condition the entire process on actual observational and experimental data, and to obtain posterior samples of the MiniStan code defining the original observational model (that is, to perform both structure learning and parameter estimation).
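To make the generate, edit, and interpret stages more concrete, the following minimal Python sketch shows only a program representation and its interpreter. It is not the authors' Gen or MiniStan implementation: we assume a hypothetical MiniStan-like representation in which a program is an ordered mapping from variable names to Gaussian sampling statements with linear means, and the names `Normal`, `render`, and `simulate` are our own.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Normal:
    """Hypothetical MiniStan-like statement:
    var ~ normal(intercept + sum(coef * parent), sd)."""
    coeffs: dict = field(default_factory=dict)  # parent name -> coefficient
    sd: float = 1.0
    intercept: float = 0.0


def render(program):
    """Pretty-print a program as Stan-style source, one statement per line."""
    lines = []
    for name, s in program.items():
        terms = [f"{c:g} * {p}" for p, c in s.coeffs.items()]
        if s.intercept or not terms:
            terms.insert(0, f"{s.intercept:g}")
        lines.append(f"{name} ~ normal({' + '.join(terms)}, {s.sd:g});")
    return "\n".join(lines)


def simulate(program, rng):
    """Interpret a program: sample each variable in order, given its parents."""
    values = {}
    for name, s in program.items():
        mean = s.intercept + sum(c * values[p] for p, c in s.coeffs.items())
        values[name] = rng.gauss(mean, s.sd)
    return values


# A hypothetical observational model for the student example.
program = {
    "skill": Normal(),
    "belief": Normal({"skill": 0.8}, sd=0.5),
    "outcome": Normal({"skill": 1.0, "belief": 0.5}),
}
print(render(program))
# skill ~ normal(0, 1);
# belief ~ normal(0.8 * skill, 0.5);
# outcome ~ normal(1 * skill + 0.5 * belief, 1);
print(simulate(program, random.Random(0)))
```

Because programs are plain data here, the editing stage can be written as ordinary functions from programs to programs, while `simulate` plays the role of the interpreter.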
Causal models are typically structured as a set of autonomous components Aldrich (1989); Haavelmo (1944); Pearl (2000), so that an intervention on the system can be represented in the model as an alteration of a small number of components, while all other components (and the causal relationships among them) remain unchanged. In the formalism of causal graphical models, interventions are typically expressed using the do-operator Pearl (2000), which fixes the value of one random variable and removes the influence of its parents. However, many realistic interventions are not accurately represented by this particular kind of model alteration Eberhardt and Scheines (2007); Korb et al. (2004); Sherman and Shpitser (2019). For example, a realistic intervention might be best represented by altering the functional form of a particular dependence, enabling or disabling specific causes, or some combination of these changes. This paper represents interventions as modifications of probabilistic program source code and shows how this representation enables the Bayesian synthesis approach to handle a broad class of experimental data.
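The difference between the do-operator and other kinds of model alteration is easy to see when interventions are functions from programs to programs. In this hypothetical Python sketch (our own representation, not MiniStan's), `do` replaces a variable's statement with a point mass and so cuts every parent, whereas `disable_cause` removes a single cause and leaves the rest of the mechanism, including its noise, intact.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Normal:
    coeffs: dict = field(default_factory=dict)  # parent name -> coefficient
    sd: float = 1.0
    intercept: float = 0.0


def simulate(program, rng):
    values = {}
    for name, s in program.items():
        mean = s.intercept + sum(c * values[p] for p, c in s.coeffs.items())
        values[name] = rng.gauss(mean, s.sd)
    return values


def do(program, var, value):
    """Atomic intervention: remove all parents of `var` and fix it to `value`."""
    edited = copy.deepcopy(program)
    edited[var] = Normal({}, sd=0.0, intercept=value)  # sd = 0 acts as a point mass
    return edited


def disable_cause(program, var, parent):
    """Non-atomic edit: drop one cause of `var`; other causes and noise are kept."""
    edited = copy.deepcopy(program)
    edited[var].coeffs.pop(parent)
    return edited


program = {
    "skill": Normal(),
    "belief": Normal({"skill": 0.8}, sd=0.5),
    "outcome": Normal({"skill": 1.0, "belief": 0.5}),
}
pinned = do(program, "belief", 2.0)                          # belief no longer depends on skill
without_edge = disable_cause(program, "outcome", "belief")   # outcome keeps its skill term
print(simulate(pinned, random.Random(0))["belief"])  # 2.0
```

Both edits return new programs and leave the observational program untouched, which is what lets one prior over programs serve many experiments.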
1.1 A conceptual example
Consider the task of inferring whether a student’s belief in her ability is a cause of her success on a research project. Observational data on student belief and student success alone are insufficient to answer this question, because of the confounding effect of skill (see Figures 2a & 2b).
We can imagine multiple types of experiments that would enable effective causal inference despite the confounding effect of skill. For example, an advisor could encourage a student, shifting her belief in her ability without increasing her skill. An advisor could also administer an assessment of the key skills needed for the project before the student attempts it, and look at the results. Unfortunately, although this might reveal the student’s true skill level to the advisor, it might also change the student’s belief in her own ability to succeed. Hypothetically, one can imagine a miracle pill that sets one’s confidence to a fixed value without changing anything else. Each of these experiments corresponds to a different modification of the source code in Figures 2c and 2d; examples of these modifications are shown in Figures 3a–f.
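The confounding problem can be checked by simulation. The sketch below assumes hypothetical linear-Gaussian mechanisms (the coefficients and noise scales are our own, not those of Figures 2c and 2d) and draws data from a model with no belief-to-outcome edge. Belief and outcome are still correlated observationally, because skill drives both, but under a pill-like atomic intervention on belief, the mean outcome no longer depends on the value chosen.

```python
import random


def student(rng, edge, lambda_bo=1.0, do_belief=None):
    """One draw of (skill, belief, outcome); `do_belief` mimics the miracle pill."""
    skill = rng.gauss(0.0, 1.0)
    belief = rng.gauss(skill, 0.5) if do_belief is None else do_belief
    outcome = rng.gauss(skill + (lambda_bo * belief if edge else 0.0), 1.0)
    return skill, belief, outcome


def covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(0)
n = 20_000

# Observational data from a model WITHOUT the belief -> outcome edge.
obs = [student(rng, edge=False) for _ in range(n)]
print(covariance([b for _, b, _ in obs], [o for _, _, o in obs]))  # close to Var(skill) = 1

# Pill experiment: set belief to +1 or -1 and compare mean outcomes.
high = [student(rng, edge=False, do_belief=1.0)[2] for _ in range(n)]
low = [student(rng, edge=False, do_belief=-1.0)[2] for _ in range(n)]
print(sum(high) / n - sum(low) / n)  # close to 0: no causal effect of belief
```

Rerunning with `edge=True` makes the interventional difference approach `2 * lambda_bo`, which is the kind of signal the experiments are meant to expose.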
This paper shows how to formalize this example, using probabilistic programs that generate, edit, and interpret the source code of causal models. It also presents results from an implementation in the Gen probabilistic programming language, demonstrating the utility of incorporating diverse sources of experimental data.
2 Priors on Causal Models
To compute the posterior distribution over the two candidate causal models, we first specify a prior distribution over a set of global latent variables. One of these variables, a binary edge indicator, determines whether the student’s belief influences her outcome.
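A prior over the two candidate structures can be written as a generative procedure that first draws the global latent variables and then emits program source. The sketch below is hypothetical: the Bernoulli(0.5) edge prior, the standard-normal priors on the coefficients (`lambda_sb`, `lambda_so`, and `lambda_bo`), and the `Normal` statement type are assumptions for illustration. What it shows is that `lambda_bo` appears in the generated program only when the edge indicator is true.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Normal:
    coeffs: dict = field(default_factory=dict)  # parent name -> coefficient
    sd: float = 1.0
    intercept: float = 0.0


def sample_prior(rng, p_edge=0.5):
    """Draw the global latent variables, then emit the matching causal program."""
    edge = rng.random() < p_edge         # does belief influence outcome?
    lambda_sb = rng.gauss(0.0, 1.0)      # effect of skill on belief
    lambda_so = rng.gauss(0.0, 1.0)      # effect of skill on outcome
    outcome_parents = {"skill": lambda_so}
    if edge:
        # lambda_bo (effect of belief on outcome) exists only in the edge model.
        outcome_parents["belief"] = rng.gauss(0.0, 1.0)
    program = {
        "skill": Normal(),
        "belief": Normal({"skill": lambda_sb}),
        "outcome": Normal(outcome_parents),
    }
    return edge, program


rng = random.Random(0)
draws = [sample_prior(rng) for _ in range(2000)]
print(sum(edge for edge, _ in draws) / len(draws))  # roughly 0.5
```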
3 Likelihoods for Experiments
To incorporate experimental evidence of various forms, the Bayesian synthesis approach requires an intervention library: a set of code-editing functions that modify causal model programs in the domain-specific language. For the conceptual example, our intervention library contains three interventions: (i) an atomic intervention, which applies the do-operator; (ii) a shift intervention, which changes the mean of a distribution by a fixed increment; and (iii) a variance-scaling intervention, which modifies the variance of a random variable assumed to be drawn from a normal distribution. In principle, an intervention library could contain arbitrary rules for modifying causal model source code, including changing the underlying distribution of a random variable or adding variables (latent or observed) that do not exist in the observational model.
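The three interventions in the library can each be written as a pure function that returns an edited copy of a program. This is a hypothetical Python rendering (the `Normal` statement type, the function names, and the `INTERVENTION_LIBRARY` dictionary are ours). The variance-scaling intervention multiplies the variance by `factor`, so the standard deviation is scaled by the square root of `factor`.

```python
import copy
import math
from dataclasses import dataclass, field


@dataclass
class Normal:
    coeffs: dict = field(default_factory=dict)  # parent name -> coefficient
    sd: float = 1.0
    intercept: float = 0.0


def atomic(program, var, value):
    """do(var = value): drop all parents and fix the variable (zero variance)."""
    edited = copy.deepcopy(program)
    edited[var] = Normal({}, sd=0.0, intercept=value)
    return edited


def shift(program, var, delta):
    """Shift intervention: add a fixed increment to the mean of `var`."""
    edited = copy.deepcopy(program)
    edited[var].intercept += delta
    return edited


def scale_variance(program, var, factor):
    """Variance scaling: Var[var] becomes factor * Var[var] (normal variables only)."""
    edited = copy.deepcopy(program)
    edited[var].sd *= math.sqrt(factor)
    return edited


INTERVENTION_LIBRARY = {"atomic": atomic, "shift": shift, "scale_variance": scale_variance}

program = {
    "skill": Normal(),
    "belief": Normal({"skill": 0.8}, sd=0.5),
    "outcome": Normal({"skill": 1.0, "belief": 0.5}),
}
# Encouragement: shift belief upward without touching skill.
encouraged = INTERVENTION_LIBRARY["shift"](program, "belief", 1.0)
```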
These interventions can be freely composed to represent a diverse set of experimental scenarios. We demonstrate this compositionality in the “assessment” experiment, which is composed of a shift intervention (a student’s skill may improve if she has to take a test) and a variance-scaling intervention (a student’s belief in her ability has less noise after taking a test).
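Because each intervention maps a program to a program, composing interventions is ordinary function composition. In the hypothetical sketch below, the “assessment” experiment chains a shift of skill with a variance reduction on belief; the magnitudes (+0.3 and a factor of 0.25) are illustrative assumptions, and the helpers are minimal re-definitions so that the block runs on its own.

```python
import copy
import math
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class Normal:
    coeffs: dict = field(default_factory=dict)  # parent name -> coefficient
    sd: float = 1.0
    intercept: float = 0.0


def shift(program, var, delta):
    edited = copy.deepcopy(program)
    edited[var].intercept += delta
    return edited


def scale_variance(program, var, factor):
    edited = copy.deepcopy(program)
    edited[var].sd *= math.sqrt(factor)
    return edited


def compose(*edits):
    """Apply program edits left to right."""
    return lambda program: reduce(lambda prog, edit: edit(prog), edits, program)


assessment = compose(
    lambda p: shift(p, "skill", 0.3),             # taking a test may improve skill
    lambda p: scale_variance(p, "belief", 0.25),  # belief is less noisy afterwards
)

program = {
    "skill": Normal(),
    "belief": Normal({"skill": 0.8}, sd=0.5),
    "outcome": Normal({"skill": 1.0, "belief": 0.5}),
}
assessed = assessment(program)
```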
When interpreted, a causal program in MiniStan defines a likelihood function over observational data. To compute the likelihood of experimental data, we first modify the causal program using the intervention library and then interpret the modified program.
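For fully observed rows, interpreting a linear-Gaussian program as a likelihood amounts to summing one normal log-density per statement, and the likelihood of experimental data is the same computation applied to the edited program. The sketch below assumes that every variable is observed (the inference described in this paper also handles per-observation latent variables, which this sketch omits) and uses our own hypothetical `Normal` representation and helper names.

```python
import copy
import math
from dataclasses import dataclass, field


@dataclass
class Normal:
    coeffs: dict = field(default_factory=dict)  # parent name -> coefficient
    sd: float = 1.0
    intercept: float = 0.0


def log_normal(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_likelihood(program, rows):
    """Interpret `program` as a density over fully observed rows."""
    total = 0.0
    for row in rows:
        for name, s in program.items():
            mean = s.intercept + sum(c * row[p] for p, c in s.coeffs.items())
            total += log_normal(row[name], mean, s.sd)
    return total


def shift(program, var, delta):
    edited = copy.deepcopy(program)
    edited[var].intercept += delta
    return edited


def experiment_log_likelihood(program, edit, rows):
    """Edit first, then interpret: the likelihood of experimental data."""
    return log_likelihood(edit(program), rows)


program = {
    "skill": Normal(),
    "belief": Normal({"skill": 0.8}, sd=0.5),
    "outcome": Normal({"skill": 1.0, "belief": 0.5}),
}
rows = [{"skill": 1.0, "belief": 0.9, "outcome": 1.4}]
obs_ll = log_likelihood(program, rows)
exp_ll = experiment_log_likelihood(program, lambda p: shift(p, "skill", 1.0), rows)
print(exp_ll - obs_ll)  # approximately 0.5: skill = 1 is likelier once its mean is 1
```

Only the edited statement's contribution changes; every other term is shared, which mirrors the autonomy assumption behind causal models.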
We demonstrate the utility of this approach by performing approximate posterior inference over synthesized causal model programs from our conceptual example. In this example we (i) generate a MiniStan program from the prior, (ii) generate a set of observational and experimental data from the interpreted MiniStan program, and (iii) perform approximate posterior inference over synthesized causal models using sequential Monte Carlo Doucet et al. (2000) with Metropolis–Hastings rejuvenation. We generated skill, belief, and outcome values for ten individuals in each of the four observational and experimental settings, all from a single causal model with fixed ground-truth values for the edge indicator and for lambda_bo.
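The inference loop can be sketched as sequential Monte Carlo over data batches. Particles are drawn from the prior, reweighted by each new observational or experimental batch, resampled, and then rejuvenated with Metropolis–Hastings moves that target the posterior given all data seen so far. This is a self-contained Python sketch, not the authors' Gen program, and it rests on several assumptions of our own: linear-Gaussian mechanisms with unit noise; standard-normal priors on the coefficients `a` (skill to belief), `c` (skill to outcome), and `lam` (belief to outcome); a Bernoulli(0.5) edge prior; latent skill integrated out analytically; and arbitrary proposal scales. Keeping `lam` in the state even when the edge is off (where the likelihood ignores it) lets the structure flip be an ordinary symmetric MH move.

```python
import math
import random


def log_normal(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def simulate(rng, n, do_value=None, edge=True, a=1.0, c=1.0, lam=1.0):
    """skill -> belief (a), skill -> outcome (c), belief -> outcome (lam, if edge)."""
    rows = []
    for _ in range(n):
        skill = rng.gauss(0.0, 1.0)
        b = rng.gauss(a * skill, 1.0) if do_value is None else do_value
        o = rng.gauss(c * skill + (lam * b if edge else 0.0), 1.0)
        rows.append(("obs" if do_value is None else "do", b, o))
    return rows


def loglik(theta, rows):
    """log p(rows | theta), with the latent skill integrated out analytically."""
    edge, a, c, lam = theta
    lam = lam if edge else 0.0
    total = 0.0
    for kind, b, o in rows:
        if kind == "obs":
            vb = a * a + 1.0  # Var(belief)
            total += log_normal(b, 0.0, vb)
            total += log_normal(o, (c * a / vb + lam) * b, c * c / vb + 1.0)
        else:  # belief was fixed to b by do(belief = b)
            total += log_normal(o, lam * b, c * c + 1.0)
    return total


def sample_prior(rng):
    return (rng.random() < 0.5, rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))


def log_prior(theta):
    _, a, c, lam = theta  # the Bernoulli(0.5) edge prior only adds a constant
    return -0.5 * (a * a + c * c + lam * lam)


def rejuvenate(theta, rows, rng, steps):
    """Metropolis-Hastings moves targeting p(theta | rows)."""
    cur = log_prior(theta) + loglik(theta, rows)
    for _ in range(steps):
        edge, a, c, lam = theta
        if rng.random() < 0.2:
            prop = (not edge, a, c, lam)  # symmetric structure flip
        else:
            prop = (edge, a + rng.gauss(0, 0.1), c + rng.gauss(0, 0.1), lam + rng.gauss(0, 0.1))
        new = log_prior(prop) + loglik(prop, rows)
        if math.log(1.0 - rng.random()) < new - cur:
            theta, cur = prop, new
    return theta


def smc(batches, n_particles, rng, mh_steps=10):
    particles = [sample_prior(rng) for _ in range(n_particles)]
    seen = []
    for batch in batches:  # e.g. observational data, then each experiment
        logw = [loglik(p, batch) for p in particles]
        top = max(logw)
        weights = [math.exp(lw - top) for lw in logw]
        particles = rng.choices(particles, weights=weights, k=n_particles)
        seen += batch
        particles = [rejuvenate(p, seen, rng, mh_steps) for p in particles]
    return particles


rng = random.Random(0)
batches = [simulate(rng, 10), simulate(rng, 10, do_value=1.0), simulate(rng, 10, do_value=-1.0)]
posterior = smc(batches, n_particles=200, rng=rng)
print(sum(p[0] for p in posterior) / len(posterior))  # estimated P(edge | data)
```

Under these assumptions, the fraction of particles with `edge == True` estimates the posterior probability of the edge, and the `lam` values of those particles approximate the posterior over the belief-to-outcome effect.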
Using only observational data, the posterior probability of the edge variable is low. This may be because the data can be explained by appealing to skill alone, and this simpler model can have a higher marginal likelihood than one that introduces a new parameter (lambda_bo). (This phenomenon is sometimes called “Bayesian Ockham’s Razor”.) However, as we incorporate additional experimental evidence, the posterior probability of the edge increases. Similarly, the posterior distribution over lambda_bo, the effect of belief on outcome, concentrates around the true value as we leverage experimental evidence.
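The Bayesian Ockham’s Razor effect can be seen in a stripped-down conjugate example, which is our own illustration rather than the paper's model. Compare M0, in which y_i ~ N(0, 1), with M1, in which y_i ~ N(lam * x_i, 1) and lam ~ N(0, tau^2); here x loosely stands in for belief after skill has been accounted for. Integrating out lam gives y ~ N(0, I + tau^2 x x^T), whose log-determinant and quadratic form have closed forms, so the log Bayes factor of M1 over M0 is -1/2 log(1 + tau^2 Σx_i²) + tau^2 (Σx_i y_i)² / (2 (1 + tau^2 Σx_i²)). When x carries no information about y, only the first (negative) term remains, and the extra parameter is penalized.

```python
import math


def log_marginal_null(y):
    """M0: y_i ~ N(0, 1); nothing to integrate out."""
    return -0.5 * len(y) * math.log(2 * math.pi) - 0.5 * sum(v * v for v in y)


def log_marginal_edge(x, y, tau=1.0):
    """M1: lam ~ N(0, tau^2), y_i ~ N(lam * x_i, 1), with lam integrated out.

    Marginally y ~ N(0, I + tau^2 x x^T). By the matrix determinant lemma,
    det = 1 + tau^2 * sum(x^2); by Sherman-Morrison, the quadratic form is
    y.y - tau^2 * (x.y)^2 / det.
    """
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = 1.0 + tau * tau * sxx
    quad = sum(v * v for v in y) - tau * tau * sxy * sxy / det
    return -0.5 * len(y) * math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def log_bayes_factor(x, y, tau=1.0):
    """log p(y | M1) - log p(y | M0); negative values favour the simpler model."""
    return log_marginal_edge(x, y, tau) - log_marginal_null(y)


x = [1.0, -1.0, 1.0, -1.0]
unrelated = [0.5, 0.5, -0.5, -0.5]  # x.y = 0: x explains nothing
related = [1.0, -1.0, 1.0, -1.0]    # y tracks x exactly
print(log_bayes_factor(x, unrelated))  # -0.5 * log(5), about -0.80
print(log_bayes_factor(x, related))    # 1.6 - 0.5 * log(5), about 0.80
```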
The Bayesian synthesis approach we have outlined in this paper provides several advantages over alternative approaches to structure discovery and parameter estimation in causal modeling: (i) an explicit characterization of uncertainty over model structures; (ii) a principled way to model diverse interventions; and (iii) a formalization that can be reused in diverse problems, with varying degrees of prior knowledge, without requiring practitioners to design custom inference procedures for each use case.
Although this example uses parametric causal models, it is conceptually straightforward to use Gaussian processes and/or Dirichlet process mixture models for the functional forms of causal relationships Saad et al. (2019). It may thus be fruitful to develop Bayesian variants of existing non-parametric techniques for causal inference Imbens (2004); Louizos et al. (2017).
The results reported here were obtained using vanilla sequential Monte Carlo over the joint space of model structure, parameters, and the latent variables in each observation or experiment. In order for this approach to scale to complex models, hierarchical priors over models, and large datasets, we expect more powerful techniques will be necessary. However, the Gen platform provides programmable inference constructs Cusumano-Towner et al. (2019), including hybrids of Hamiltonian Monte Carlo Duane et al. (1987) and Metropolis-Adjusted Langevin Roberts et al. (1996) approaches with sequential Monte Carlo Doucet et al. (2000), that could potentially address some of these scaling challenges.
6 Related Work
Probabilistic programs are often used to represent causal processes Goodman et al. (2012). Some languages, such as Omega Tavares et al. (2019), make this causal interpretation explicit, including a semantics for interventional and counterfactual reasoning. It would be interesting to consider whether the framework we present here, which considers interventions to be arbitrary code-editing procedures, could also be usefully applied to counterfactual reasoning problems.
Incorporating experimental evidence for structure learning and parameter estimation can be thought of as the inner loop of an optimal experimental design procedure. Probabilistic programs have been used to automate this search over experiments Ouyang et al. (2016), seeking to maximize the expected information gain about some query given new evidence. In that work, experiments are modeled as arguments to a probabilistic program. Our approach instead describes an experiment as a modification of MiniStan programs, enabling a clean separation between the specification of causal models (or distributions over causal models) and the interventions that modify those models.
Improving methodology for combining observational and experimental evidence has far-reaching implications for a wide variety of scientific disciplines, and has received significant attention in the graph-based causal inference literature. For example, extensions of the do-calculus have been developed to incorporate experiments expressed as atomic interventions given a known causal graphical model structure Lee et al. (2019). Existing graph-based structure discovery algorithms have recently been extended to incorporate atomic interventions Wang et al. (2017) and imperfect interventions Yang et al. (2018). Our work proposes characterizing imperfect interventions as code-editing procedures acting on probabilistic programs; this representation enables us to perform posterior inference (with uncertainty estimates) over both structure and model parameters.
We thank Javier Burroni, Dan Garant, Zenna Tavares, and Reilly Grant for thoughtful discussion.
- Aldrich (1989). Autonomy. Oxford Economic Papers 41 (1), pp. 15–34.
- Carpenter et al. (2017). Stan: a probabilistic programming language. Journal of Statistical Software 76 (1).
- Cusumano-Towner et al. (2019). Gen: a general-purpose probabilistic programming system with programmable inference. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, New York, NY, USA, pp. 221–236.
- Doucet et al. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 10 (3), pp. 197–208.
- Duane et al. (1987). Hybrid Monte Carlo. Physics Letters B 195 (2), pp. 216–222.
- Eberhardt and Scheines (2007). Interventions and causal inference. Philosophy of Science 74 (5), pp. 981–995.
- Friedman and Koller (2000). Being Bayesian about network structure. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 201–210.
- Goodman et al. (2012). Church: a language for generative models. arXiv preprint arXiv:1206.3255.
- Griffiths and Tenenbaum (2009). Theory-based causal induction. Psychological Review 116 (4), pp. 661.
- Haavelmo (1944). The probability approach in econometrics. Econometrica: Journal of the Econometric Society, pp. iii–115.
- Heckerman et al. (1995). Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning 20 (3), pp. 197–243.
- Imbens (2004). Nonparametric estimation of average treatment effects under exogeneity: a review. Review of Economics and Statistics 86 (1), pp. 4–29.
- Korb et al. (2004). Varieties of causal intervention. In Pacific Rim International Conference on Artificial Intelligence, pp. 322–331.
- Lee et al. (2019). General identifiability with arbitrary surrogate experiments.
- Louizos et al. (2017). Causal effect inference with deep latent-variable models. In Advances in Neural Information Processing Systems, pp. 6446–6456.
- Mansinghka et al. (2006). Structured priors for structure learning. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI 2006).
- McCandless et al. (2009). Bayesian propensity score analysis for observational data. Statistics in Medicine 28 (1), pp. 94–112.
- Murphy (2012). Machine learning: a probabilistic perspective. MIT Press.
- Ouyang et al. (2016). Practical optimal experiment design with probabilistic programs. arXiv preprint arXiv:1608.05046.
- Pearl (2000). Causality: models, reasoning and inference. Vol. 29, Springer.
- Roberts et al. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 (4), pp. 341–363.
- Saad et al. (2019). Bayesian synthesis of probabilistic programs for automatic data modeling. Proceedings of the ACM on Programming Languages 3 (POPL), pp. 37.
- Sherman and Shpitser (2019). Intervening on network ties. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence.
- Tavares et al. (2019). Soft constraints for inference with declarative knowledge.
- Wang et al. (2017). Permutation-based causal inference algorithms with interventions. In Advances in Neural Information Processing Systems, pp. 5822–5831.
- Yang et al. (2018). Characterizing and learning equivalence classes of causal DAGs under interventions. arXiv preprint arXiv:1802.06310.