Is Causal Reasoning Harder than Probabilistic Reasoning?

by Milan Mosse, et al.

Many tasks in statistical and causal inference can be construed as problems of entailment in a suitable formal language. We ask whether those problems are more difficult, from a computational perspective, for causal probabilistic languages than for pure probabilistic (or "associational") languages. Despite several senses in which causal reasoning is indeed more complex – both expressively and inferentially – we show that causal entailment (or satisfiability) problems can be systematically and robustly reduced to purely probabilistic problems. Thus there is no jump in computational complexity. Along the way we answer several open problems concerning the complexity of well known probability logics, in particular demonstrating the ∃ℝ-completeness of a polynomial probability calculus, as well as a seemingly much simpler system, the logic of comparative conditional probability.




1 Motivation and Preview

There is an uncontroversial sense in which causal reasoning is more difficult than purely probabilistic or statistical reasoning. The latter seems hard enough: estimating probabilities, predicting future events from past observations, determining statistical significance, and adjudicating between statistical hypotheses are already formidable tasks, long mired in controversy. No free lunch theorems (Shalev-Shwartz and Ben-David, 2014; Belot, 2020) show that strong assumptions are necessary to gain any inductive purchase on such problems, and there is considerable disagreement about what kinds of assumptions are reasonable in different epistemic and practical circumstances (Efron, 1978). Problems of causal inference only seem to make our tasks harder. Inferring causal effects, predicting the outcomes of interventions, determining causal direction, and learning a causal model typically demand statistical reasoning, but they also demand more on the part of the investigator. They may require that we actively interrogate the world through deliberate experimentation rather than passive observation, or that we antecedently accept strong assumptions sufficient to justify the causal conclusions we want to reach, or (very often) both. Indeed, statistical indistinguishability is the norm in causal inference, even with substantive assumptions (Spirtes et al., 2000). As formalized in the causal hierarchy theorem of Bareinboim et al. (2020) (see also Ibeling and Icard 2021), it is not only impossible to infer causal information from purely correlational (or “observational”) data, but also generically impossible to infer counterfactual or explanatory information from purely experimental (or “interventional”) data. From an inferential perspective, probabilistic information vastly underdetermines causal information.

A feature common to both statistical inference and causal inference is that the most prominent approaches to each can be understood, at least in part, as attempts to turn an inductive problem into a deductive one. This is famously true of frequentist methods in the tradition associated with Neyman and Pearson (see Neyman 1977), but is arguably true of Bayesian approaches as well. As Gelman and Shalizi (2013) suggest, “Statistical models are tools that let us draw inductive inferences on a deductive background,” rendering statistical inferences “deductively guaranteed by probabilistic assumptions” (p. 27). Indeed, one of the benefits of specifying a Bayesian probability model is that it provides an answer to virtually any question about the probability of a hypothesis conditional on data. Given the model and the data, this answer follows as a matter of logic.

Causal underdetermination is likewise confronted with methods for formulating precise inductive assumptions, sometimes allowing answers to causal questions to be derived by mere calculation.

Example 1.1 (Do-calculus).

As one prominent example, the do-calculus of Pearl and collaborators (see Pearl 1995 and Ch. 3 of Pearl 2009) establishes systematic correspondences between qualitative (“graphical”) properties of a causal scenario and certain conditional independence statements involving causal quantities. A typical causal quantity of interest is the (average) causal effect, e.g., how likely is $Y$ to take on value $y$ given an intervention setting $X$ to $x$. In a formal language introduced in the sequel as $\mathcal{L}_{\text{causal}}$, we write this as $\mathbf{P}([X = x]\, Y = y)$, or more briefly, $\mathbf{P}([x]y)$.

Absent assumptions, it is never possible to infer the value of $\mathbf{P}([x]y)$ from observational data (Bareinboim et al., 2020). Suppose, however, that we could assume the causal structure has something like the following shape (known in the literature as the front door graph):

[Figure: the front door graph, with edges $X \to Z$, $Z \to Y$, $U \to X$, and $U \to Y$, where $U$ is unobserved.]

For a standard example, we might assume that any causal effect of smoking ($X$) on cancer ($Y$) will be mediated by tar deposited in the lungs ($Z$), and moreover that any unknown sources of variation ($U$) on $X$ or on $Y$ (or on both), such as a person’s genotype, do not directly influence $Z$. Under these circumstances, the do-calculus licenses several substantive causal assumptions, which may be rendered precisely in $\mathcal{L}_{\text{causal}}$. Consider the set of equality statements below:

  1. $\mathbf{P}([x]z) = \mathbf{P}(z \mid x)$

For instance, 1 says that the causal effect of $X$ on $Z$ simply coincides with the conditional probability $\mathbf{P}(z \mid x)$. Appealing to a combination of laws of probability and distinctively causal laws involving “causal-conditional” statements like $[x]y$, it is possible to show that the following equality is in fact entailed by statements 1-4:

$$\mathbf{P}([x]y) = \sum_{z} \mathbf{P}(z \mid x) \sum_{x'} \mathbf{P}(y \mid x', z)\, \mathbf{P}(x') \qquad (1)$$

In other words, (1) shows that the causal effect of $X$ on $Y$ can simply be calculated from suitable observational data involving the variables $X$, $Y$, $Z$.
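As a numerical sanity check on the front-door identity, here is a minimal Python sketch. The particular model (the noise probabilities and the mechanisms in `run`) is hypothetical and chosen only for illustration; it is not taken from the paper. The sketch enumerates a small model compatible with the front door graph, evaluates the front-door expression using observational probabilities only, and compares the result with the effect obtained by intervening directly.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical front-door model (all numbers are illustrative assumptions):
# U -> X, X -> Z, Z -> Y, U -> Y, with independent binary noise terms.
P_U = {0: F(1, 2), 1: F(1, 2)}      # unobserved confounder
P_NX = {0: F(3, 4), 1: F(1, 4)}     # noise flipping X
P_NZ = {0: F(4, 5), 1: F(1, 5)}     # noise flipping Z
P_NY = {0: F(9, 10), 1: F(1, 10)}   # noise flipping Y

def run(u, nx, nz, ny, do_x=None):
    """Solve the structural equations, optionally under the intervention X := do_x."""
    x = (u ^ nx) if do_x is None else do_x
    z = x ^ nz            # Z depends on X only (no direct influence of U)
    y = (z | u) ^ ny      # Y depends on Z and U, not directly on X
    return x, z, y

def joint(do_x=None):
    """Distribution over (x, z, y), obtained by enumerating exogenous settings."""
    dist = {}
    for u, nx, nz, ny in product((0, 1), repeat=4):
        w = P_U[u] * P_NX[nx] * P_NZ[nz] * P_NY[ny]
        key = run(u, nx, nz, ny, do_x)
        dist[key] = dist.get(key, 0) + w
    return dist

def prob(dist, **fixed):
    """Marginal probability that the named variables take the given values."""
    idx = {"x": 0, "z": 1, "y": 2}
    return sum(p for k, p in dist.items()
               if all(k[idx[n]] == v for n, v in fixed.items()))

def front_door(obs, x, y):
    """sum_z P(z|x) * sum_x' P(y|x',z) P(x'), from observational data only."""
    total = F(0)
    for z in (0, 1):
        p_z_given_x = prob(obs, x=x, z=z) / prob(obs, x=x)
        inner = sum(prob(obs, x=xp, z=z, y=y) / prob(obs, x=xp, z=z) * prob(obs, x=xp)
                    for xp in (0, 1))
        total += p_z_given_x * inner
    return total

obs = joint()
for x in (0, 1):
    causal = prob(joint(do_x=x), y=1)   # P([x] y) by direct intervention
    estimate = front_door(obs, x, 1)    # the same quantity from observational data
    print(x, causal, estimate)
```

Exact rational arithmetic is used so that the two quantities can be compared for equality rather than up to rounding error.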

More broadly, a number of different approaches to inductive inference, both statistical and causal, can be assimilated to a regimen something like this:

$$\text{inductive assumptions} + \text{data} \;\models\; \text{conclusion} \qquad (2)$$

In Example 1.1, statements 1-4 are the inductive assumptions, the data would be information about the joint distribution of $X$, $Y$, $Z$, and the conclusion would be an estimate of the causal effect of $X$ on $Y$. In a standard Bayesian analysis, the inductive assumption might be a prior probability model for some latent variables (e.g., parameters for a class of probability measures), while the data would be values of some observable variables, and the conclusion might be the posterior values for the hidden variables, or perhaps posterior predictive values for some yet-to-be-observed variables. A critical job of the statistician or data scientist is to identify suitable inductive assumptions that a relevant party judges reasonable (or, ideally if feasible, which are themselves empirically verifiable) and that are sufficiently strong to license meaningful conclusions from the types of data available.

From this vantage point our titular question takes on a new significance. Rather than asking about the difficulty of an inference task in terms of the strength of assumptions needed to justify the inference, we could instead ask how difficult it is in general, computationally speaking, to reason from inductive assumptions (together with data) to an inferential conclusion, in the strong sense of (2). In other words, we ask how difficult questions like (2) could be across different logical languages for describing relevant assumptions, data, and conclusions.

The contrast of interest in this article is between languages $\mathcal{L}_{\text{prob}}$, suitable for probabilistic reasoning, and languages $\mathcal{L}_{\text{causal}}$, which extend the corresponding probabilistic languages to encompass causal reasoning in addition. In short, $\mathcal{L}_{\text{prob}}$ encompasses “pure” probabilistic reasoning about some set of random variables. In $\mathcal{L}_{\text{causal}}$ we also reason about the probabilities of causal conditionals, the causal effect being a simple example. Such mixed reasoning is crucial for applications like the do-calculus, where causal conclusions depend on distinctively causal assumptions (such as 1-4 in Example 1.1). Some of the emblematic principles of $\mathcal{L}_{\text{causal}}$ reveal a subtle interplay between the probabilistic and causal-conditional components. For example, the formula


emerges as an instance of a more general scheme in a complete axiomatization of (see Ibeling and Icard 2020), implying that and cannot each causally affect the other.

In light of the considerable empirical (and expressive) gulf between these two kinds of languages, we might expect to see a parallel jump in computational complexity when moving from $\mathcal{L}_{\text{prob}}$ to $\mathcal{L}_{\text{causal}}$. In a certain respect, $\mathcal{L}_{\text{causal}}$ can be seen as a combination of logics, embedding one modal system (a conditional logic) inside another (a probability logic), with non-trivial interactions between the two (such as (3)). It is common wisdom that such combinations may in general drive up complexity, in some cases even resulting in undecidability (see, e.g., Kurucz 2007). The present work introduces two main results, which show that this does not happen here: causal reasoning and probabilistic reasoning are, in a precise and robust sense, equally difficult.

The distinction between $\mathcal{L}_{\text{prob}}$ and $\mathcal{L}_{\text{causal}}$ is orthogonal to another distinction, namely how much arithmetic we admit in our formal language of probability over a set of probability terms $\mathbf{P}(\epsilon)$. A wide range of probability logics have been studied in the literature, from pure qualitative comparisons between probability terms (e.g., de Finetti 1937) to richer fragments capable of reasoning about polynomials over such terms (e.g., Scott and Krauss 1966). For any such choice of probabilistic language $\mathcal{L}_{\text{prob}}$ we can consider the extension $\mathcal{L}_{\text{causal}}$ that allows not only probability terms, but also causal-probability terms like those introduced above. A strength of our analysis is that we provide a complexity-reflecting reduction from $\mathcal{L}_{\text{causal}}$ to $\mathcal{L}_{\text{prob}}$ in a way that is independent of our choice of probabilistic primitives. Thus, across the landscape of probability logics, we see no increase in complexity. Summarizing, our main result states:

Theorem 1 (Informal).

Causal reasoning is no harder than probabilistic reasoning. In particular:

  1. Reasoning about (causal or non-causal) probabilities is as hard as reasoning about sums of (causal or non-causal) probabilities; both are as hard as reasoning about Boolean formulas, or about sums of real numbers.

  2. Reasoning about (causal or non-causal) conditional probabilities is as hard as reasoning about arbitrary polynomials in (causal or non-causal) probabilities; both are as hard as reasoning about arbitrary polynomials in real numbers.

While the relationship between probabilistic and causal languages is our main focus, it is worth pointing out that some of our results are of interest beyond the connection with causality. We consider two very weak probabilistic languages, which one might expect to have relatively low complexity: one for reasoning about qualitative comparisons between conditional probability terms, and another that combines equality statements between probability terms with simple independence statements. Remarkably, both of these systems are at least as hard as the full existential first-order theory of real numbers (∃ℝ), and the logic of comparative conditional probability we show to be ∃ℝ-complete, thus establishing another notable example of a problem complete for this class. It is also noteworthy that these weak probabilistic languages are, from a computational perspective, as complex as the most expressive causal languages we consider in the paper (namely, $\mathcal{L}^{\text{poly}}_{\text{causal}}$).

Relation to previous work

There is a long line of work on probability logic, including a host of results about complexity (Fagin et al., 1990; Abadi and Halpern, 1994; Speranski, 2012; Ognjanović et al., 2016). As just mentioned, our contribution advances this literature. Concerning causal reasoning, there have been a number of complexity studies for various non-probabilistic causal notions (Eiter and Lukasiewicz, 2002; Aleksandrowicz et al., 2017). Most germane to the present study is Halpern’s (2000) analysis of the satisfiability problem for deterministic reasoning about causal models, which he shows to be NP-complete (the same as propositional logical reasoning). Eiter and Lukasiewicz (2002) studied numerous model-checking queries in a probabilistic setting, including the problem of determining the probability of a specific causal query. They show that this problem is complete for the class #P, the “counting analogue” to NP, which also characterizes the problem of determining (approximations for) probabilities of (even very simple) propositional expressions (Roth, 1996).

Our interest in the present contribution is the complexity of reasoning—viz. testing for satisfiability, validity, or entailment, as portrayed in (2)—for probabilistic and causal languages. While this angle has not yet been explored thoroughly in the literature, our study is indebted to, and draws upon, much of this previous work. Theorem 1 synthesizes as well as greatly extends a heretofore piecemeal line of results (Fagin et al., 1990; Ibeling, 2018; Ibeling and Icard, 2020). Moreover, the results just mentioned by Halpern (2000) and by Eiter and Lukasiewicz (2002) could be said to lend further support to the claim that causal reasoning is no more difficult (in the sense of computational complexity) than purely probabilistic reasoning.

Overview of the paper

In the next two sections (§2 and §3), we introduce the languages and the notions from computational complexity needed to state Theorem 1 more formally. The proof of this main result appears in §4. Finally, in §5 we zoom out to consider what our results show about the relationship between probabilistic and causal reasoning, as well as consider a number of outstanding problems in this domain. In our presentation we assume no prior knowledge of causal modeling, complexity theory, or probability logic. Only elementary logic and probability are presupposed.

2 Introducing Causal and Probabilistic Languages

In this section, we introduce the syntax and semantics for a series of probabilistic and causal languages. With a precise syntax and semantics in hand, we rehearse mostly known examples and arguments that illustrate that these languages form a strict hierarchy, along two distinct dimensions.

2.1 Syntax

Let $\mathbf{V}$ be a (possibly infinite) collection, representing the (endogenous) random variables under consideration. Informally, these are the variables that we may want to observe, change, query, or otherwise reason about explicitly.

For each variable $V \in \mathbf{V}$, let $\mathrm{Val}(V)$ denote the finite signature (range) of $V$. For example, for two binary variables we have $\mathbf{V} = \{X, Y\}$ with $\mathrm{Val}(X) = \mathrm{Val}(Y) = \{0, 1\}$. We introduce the following deterministic languages:

Choose either $\mathcal{L}_{\text{prop}}$ or $\mathcal{L}_{\text{full}}$ as the base language $\mathcal{L}$. The former is essentially a propositional language with extended ranges, while the latter is a causal conditional language. The semantics of these formulas will be introduced in §2.2, but intuitively we can interpret a formula of $\mathcal{L}_{\text{full}}$, such as $[X = x]\, Y = y$, as expressing a subjunctive conditional: were $X$ to take on value $x$, then $Y$ would come to have value $y$. We understand the conditional causally, in a sense to be made precise below.

So-called terms over the base language are the main ingredient of our probabilistic languages. The most basic term is for , representing the probability of . By varying the composite terms admitted, we can define polynomial, conditional, linear, and comparative languages. Where are formulas of :

We define for each the causal and purely probabilistic languages:

Several of these probabilistic languages have appeared in the literature. For instance, $\mathcal{L}^{\text{poly}}_{\text{prob}}$ appeared already in early work by Scott and Krauss (1966), while $\mathcal{L}^{\text{lin}}_{\text{prob}}$ was introduced explicitly by Fagin et al. (1990). The language $\mathcal{L}^{\text{poly}}_{\text{causal}}$ was introduced and studied recently in Ibeling and Icard (2020) (see also Bareinboim et al. 2020 and Eiter and Lukasiewicz 2002). Many of these languages, however, have not yet received explicit treatment.
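For orientation, one way the grammars described in this subsection could be written out is sketched below. The symbols $\mathcal{L}_{\text{int}}$, $T^{*}(\mathcal{L})$, $\epsilon$, $\delta$, $t_1$, $t_2$ and the exact choice of productions are assumptions made for illustration; they follow the prose descriptions above (a propositional language, a causal conditional language, and comparative, linear, conditional, and polynomial terms) but are not guaranteed to match the authors' definitions symbol for symbol.

```latex
\begin{align*}
\mathcal{L}_{\text{int}}  &: \quad \top \mid X = x \mid \mathcal{L}_{\text{int}} \wedge \mathcal{L}_{\text{int}} \\
\mathcal{L}_{\text{prop}} &: \quad X = x \mid \neg \mathcal{L}_{\text{prop}} \mid \mathcal{L}_{\text{prop}} \wedge \mathcal{L}_{\text{prop}} \\
\mathcal{L}_{\text{full}} &: \quad [\mathcal{L}_{\text{int}}]\, \mathcal{L}_{\text{prop}} \mid \neg \mathcal{L}_{\text{full}} \mid \mathcal{L}_{\text{full}} \wedge \mathcal{L}_{\text{full}} \\[4pt]
T^{\text{comp}}(\mathcal{L}) &: \quad \mathbf{P}(\epsilon) \\
T^{\text{lin}}(\mathcal{L})  &: \quad \mathbf{P}(\epsilon) \mid t_1 + t_2 \\
T^{\text{cond}}(\mathcal{L}) &: \quad \mathbf{P}(\epsilon \mid \delta) \\
T^{\text{poly}}(\mathcal{L}) &: \quad \mathbf{P}(\epsilon) \mid t_1 + t_2 \mid t_1 \cdot t_2 \\[4pt]
\mathcal{L}^{*}_{\text{prob}}   &: \quad t_1 \geq t_2 \mid \neg \mathcal{L}^{*}_{\text{prob}} \mid \mathcal{L}^{*}_{\text{prob}} \wedge \mathcal{L}^{*}_{\text{prob}},
  \qquad t_1, t_2 \in T^{*}(\mathcal{L}_{\text{prop}}) \\
\mathcal{L}^{*}_{\text{causal}} &: \quad t_1 \geq t_2 \mid \neg \mathcal{L}^{*}_{\text{causal}} \mid \mathcal{L}^{*}_{\text{causal}} \wedge \mathcal{L}^{*}_{\text{causal}},
  \qquad t_1, t_2 \in T^{*}(\mathcal{L}_{\text{full}})
\end{align*}
```

On this reading, the only difference between a probabilistic language and its causal counterpart is the base language over which probability terms are formed.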

2.2 Semantics

2.2.1 Structural Causal Models

The semantics for all of these languages will be defined relative to structural causal models, which can be understood as a very general framework for encoding data-generating processes. In addition to the endogenous variables $\mathbf{V}$, structural causal models also employ exogenous variables $\mathbf{U}$ as a source of random variation among endogenous settings. For extended introductions, see, e.g., Pearl (2009); Bareinboim et al. (2020).

Definition 2.1.

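A structural causal model (SCM) is a tuple $\mathcal{M} = (\mathcal{F}, \mathbb{P}, \mathbf{U}, \mathbf{V})$, with:

  1. a set $\mathbf{V}$ of endogenous variables, with each $V \in \mathbf{V}$ taking on possible values $\mathrm{Val}(V)$,

  2. a set $\mathbf{U}$ of exogenous variables, with each $U \in \mathbf{U}$ taking on possible values $\mathrm{Val}(U)$,

  3. a probability measure $\mathbb{P}$ on a $\sigma$-algebra on the joint value space $\mathrm{Val}(\mathbf{U}) = \prod_{U \in \mathbf{U}} \mathrm{Val}(U)$, and

  4. a set $\mathcal{F} = \{f_V\}_{V \in \mathbf{V}}$ of structural functions, which determine the value of each $V$ given the values of the exogenous variables $\mathbf{U}$ and those of the other endogenous variables.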

Here we will assume for convenience that and are all finite.

In addition, we adopt the common assumption that our SCMs are recursive:

Definition 2.2.

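An SCM $\mathcal{M}$ is recursive if there is a well-order $\prec$ on $\mathbf{V}$ such that $\mathcal{F}$ respects $\prec$ in the following sense: for any $V \in \mathbf{V}$, whenever two assignments to the endogenous variables agree on every $V'$ with $V' \prec V$, we are guaranteed that $f_V$ returns the same value on both (given the same exogenous values).

Intuitively, $\mathcal{M}$ is recursive if for all $V \in \mathbf{V}$, the function $f_V$ ensures that the value of $V$ is determined only by the exogenous random variables $\mathbf{U}$ and the endogenous random variables $V'$ for which $V' \prec V$. Thus in a recursive model $\mathcal{M}$, the probability measure $\mathbb{P}$ on $\mathrm{Val}(\mathbf{U})$ induces a joint probability distribution over values of the variables $V \in \mathbf{V}$.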

Causal interventions represent the result of a manipulation to the causal system, and are defined in the standard way (e.g., Spirtes et al. 2000; Pearl 2009):

Definition 2.3.

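An intervention is a partial function $\alpha : \mathbf{V} \rightharpoonup \bigcup_{V \in \mathbf{V}} \mathrm{Val}(V)$ with $\alpha(V) \in \mathrm{Val}(V)$ wherever it is defined. It specifies variables to be held fixed and the values to which they are fixed. An intervention induces a mapping, also denoted $\alpha$, of systems of equations $\mathcal{F} = \{f_V\}_{V \in \mathbf{V}}$, such that $\alpha(\mathcal{F})$ is identical to $\mathcal{F}$, but with $f_X$ replaced by the constant function $f_X(\cdot) = \alpha(X)$ for each $X \in \mathrm{dom}(\alpha)$. Similarly, where $\mathcal{M}$ is a model with equations $\mathcal{F}$, we write $\alpha(\mathcal{M})$ for the model which is identical to $\mathcal{M}$ but with the equations $\alpha(\mathcal{F})$ in place of $\mathcal{F}$.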
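To make the definitions of a recursive SCM and of an intervention concrete, the following Python sketch represents a finite recursive SCM as a dictionary of structural functions listed in an order compatible with the well-order, and implements an intervention as replacement of the targeted functions by constants. The class name `SCM`, its methods, the assumption that exogenous variables are mutually independent, and the two-variable model at the end are all hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

class SCM:
    """A finite recursive SCM; functions are listed in an order compatible with the well-order."""

    def __init__(self, exo_dists, functions):
        self.exo_dists = exo_dists    # {U: {value: probability}}; assumed independent
        self.functions = functions    # {V: f(u, v)}; u and v are dicts of already-set values

    def intervene(self, alpha):
        """Return the model with f_X replaced by the constant alpha[X] for each X in alpha."""
        new = dict(self.functions)    # preserves the variable order
        for var, val in alpha.items():
            new[var] = lambda u, v, val=val: val
        return SCM(self.exo_dists, new)

    def solve(self, u):
        """Values of all endogenous variables given an exogenous assignment u."""
        v = {}
        for var, f in self.functions.items():
            v[var] = f(u, v)
        return v

    def prob(self, event):
        """Probability that the endogenous values satisfy the predicate `event`."""
        names = list(self.exo_dists)
        total = Fraction(0)
        for values in product(*(self.exo_dists[n] for n in names)):
            u = dict(zip(names, values))
            weight = Fraction(1)
            for n in names:
                weight *= self.exo_dists[n][u[n]]
            if event(self.solve(u)):
                total += weight
        return total

# Hypothetical two-variable model: X copies U1, and Y = X AND U2.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
m = SCM({"U1": coin, "U2": coin},
        {"X": lambda u, v: u["U1"],
         "Y": lambda u, v: v["X"] & u["U2"]})

print(m.prob(lambda v: v["Y"] == 1))                      # P(Y = 1)
print(m.intervene({"X": 1}).prob(lambda v: v["Y"] == 1))  # P([X = 1] Y = 1)
```

Because the functions are evaluated in an order compatible with the well-order, each exogenous assignment determines a unique value for every endogenous variable, both before and after an intervention.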

In order to guarantee that interventions lead to a well-defined semantics, we work with structural causal models which are measurable:

Definition 2.4.

We say that $\mathcal{M}$ is measurable if under every finite intervention $\alpha$, the joint distribution over $\mathbf{V}$ associated with the model $\alpha(\mathcal{M})$ is well-defined.

For measurable models, one can define a notion of causal influence:

Definition 2.5.

A model induces the influence relation when there exist values and interventions differing only in the value they impose upon for which

Footnote 1: The truth definition is introduced formally below in §2.2.2.

Given an enumeration of variables compatible with a well-order , the model is compatible with when it induces no instance with .

To illustrate the preceding definitions, we return to the front door graph shown in Example 1.1, and demonstrate an example of a SCM that is compatible with this graph:

Example 2.6.

Consider the SCM , with the exogenous , each of which has probability of being 1 and probability of being 0, and with three endogenous variables . The equations are given by

We observe that is measurable and recursive with the ordering given by . Further, and , so that indeed realizes the front door graph and is compatible with .

2.2.2 Interpretations of Terms and Truth Definitions

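It suffices to give the semantics for $\mathcal{L}^{\text{poly}}_{\text{causal}}$, since this language includes all of the other languages introduced above. A model is a recursive and measurable SCM $\mathcal{M}$. For each assignment $\mathbf{u}$ of values to exogenous variables, each $V \in \mathbf{V}$, and each $v \in \mathrm{Val}(V)$, we define $\mathcal{F}, \mathbf{u} \models V = v$ if the equations $\mathcal{F}$ together with the assignment $\mathbf{u}$ assign the value $v$ to $V$. Conjunction and negation are defined in the usual way, giving semantics for $\mathcal{F}, \mathbf{u} \models \varphi$ for any propositional formula $\varphi$. If $\mathcal{F}, \mathbf{u} \models \varphi$ holds for all $\mathbf{u}$, then we simply write $\mathcal{F} \models \varphi$. When the relation $\mathcal{F}, \mathbf{u} \models \varphi$ does not depend on $\mathbf{u}$ at all (that is, we have $\mathcal{F}, \mathbf{u} \models \varphi$ iff $\mathcal{F}, \mathbf{u}' \models \varphi$ for all $\mathbf{u}, \mathbf{u}'$ and all formulas $\varphi$), we say that the equations $\mathcal{F}$ are deterministic. For formulas $\varphi, \psi$, we write $\varphi \models \psi$ when $\mathcal{F} \models \varphi \rightarrow \psi$ for all $\mathcal{F}$, where material implication is defined in the usual way.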

For each intervention and each , we define iff , where is the intervention which effects the assignments described by . We also allow that may be the trivial intervention , in which case we simply write instead of . We define

For conditional probability terms we define when and using the above definition and the usual ratio definition otherwise. For two terms , we define iff . The semantics for negation and conjunction are defined in the usual way, giving a semantics for for any .

With this semantics, probability behaves as expected. For example, we have the following validity for any :

Causal interventions behave as expected as well. Indeed, fix any model with equations , any variable , and any assignment u of values to the exogenous variables. Then takes on at least and at most one value upon the intervention : this is trivial if intervenes on , and it otherwise follows immediately from the fact that once u is fixed, the values of all variables are determined by the equations . In other words, in the language for any , we have the validity for all and u:

More generally, for each , the indexed box can be thought of as a normal, functional modal operator.

Having introduced the syntax and semantics for several languages and pointed to some basic validities, we recall in the next subsection various results and examples that illustrate the expressive relationships between these languages.

2.3 A Two-Dimensional Hierarchy

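The languages $\mathcal{L}^{*}_{\text{prob}}$ and $\mathcal{L}^{*}_{\text{causal}}$, for $* \in \{\text{comp}, \text{lin}, \text{cond}, \text{poly}\}$, form hierarchies along two axes. First, the purely probabilistic language $\mathcal{L}^{*}_{\text{prob}}$ is always less expressive than the corresponding causal language $\mathcal{L}^{*}_{\text{causal}}$. Second, $\mathcal{L}^{\text{comp}}$ is less expressive than both $\mathcal{L}^{\text{lin}}$ and $\mathcal{L}^{\text{cond}}$, both of which are less expressive than the language $\mathcal{L}^{\text{poly}}$. When we say that one language is less expressive than another, we mean that there are two models that no statement in the less expressive language distinguishes, but that some statement in the more expressive language does distinguish.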

Drawing arrows from less expressive languages to more expressive ones, the hierarchy can be shown graphically:

The arrow in the center of these squares is meant to indicate that $\mathcal{L}^{*}_{\text{prob}}$ is less expressive than $\mathcal{L}^{*}_{\text{causal}}$ for any choice of $*$.

In this section, mostly rehearsing familiar results and examples, we illustrate that the expressivity of the languages does indeed vary along these axes; each arrow indicates a strict increase in expressivity.

2.3.1 First Axis: From Probabilistic to Causal

To illustrate the expressivity of causal as opposed to purely probabilistic languages, we recall a variation by Bareinboim et al. (2020) on an example due to Pearl (2009):

Example 2.7 (Causation without correlation).

Let , where U contains two binary variables such that , and V contains two variables such that and . Then and are independent. Having observed this, one could not conclude that has no causal effect on ; indeed, consider the model , which is like , except with the mechanisms:

Here is the indicator function for statement , equal to if holds and otherwise. In this case , so that the models are indistinguishable in any of the probabilistic languages . However, the models are distinguishable in , and so in all of the other causal languages. Indeed, note that while . Then, for instance, the following statement

belongs to and distinguishes from .
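The phenomenon in Example 2.7 can be checked by brute force. The sketch below uses one concrete pair of models in the spirit of the example, with two independent fair coins as exogenous variables; the specific mechanisms of the second model (`model_b`) are an assumption chosen for illustration and need not match those displayed in the original example.

```python
from fractions import Fraction
from itertools import product

QUARTER = Fraction(1, 4)  # U1, U2 are independent fair coins

def model_a(u1, u2, do_x=None):
    """X copies U1 and Y copies U2, so Y ignores X entirely."""
    x = u1 if do_x is None else do_x
    y = u2
    return x, y

def model_b(u1, u2, do_x=None):
    """Y responds to X, but only in an exogenous setting where X = 1 never occurs naturally."""
    x = int(u1 == u2) if do_x is None else do_x
    y = u1 + int(x == 1 and u1 == 0 and u2 == 1)
    return x, y

def distribution(model, do_x=None):
    """Joint distribution over (x, y), optionally under the intervention X := do_x."""
    dist = {}
    for u1, u2 in product((0, 1), repeat=2):
        outcome = model(u1, u2, do_x)
        dist[outcome] = dist.get(outcome, 0) + QUARTER
    return dist

def p_y1(dist):
    return sum(p for (x, y), p in dist.items() if y == 1)

# Same observational distribution, different response to the intervention X := 1.
print(distribution(model_a) == distribution(model_b))
print(p_y1(distribution(model_a, do_x=1)), p_y1(distribution(model_b, do_x=1)))
```

No purely probabilistic statement about $X$ and $Y$ can separate the two models, whereas a single comparison involving the causal term $\mathbf{P}([X = 1]\, Y = 1)$ does.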

As shown in Bareinboim et al. (2020) (cf. also Suppes and Zanotti 1981), the pattern in Example 2.7 is universal: for any model $\mathcal{M}$ it is always possible to find some $\mathcal{M}'$ that agrees with $\mathcal{M}$ on all of $\mathcal{L}_{\text{prob}}$ but disagrees on $\mathcal{L}_{\text{causal}}$.

Footnote 2: The Causal Hierarchy Theorem of Bareinboim et al. (2020) (refer to Ibeling and Icard 2021 for a topological version, enabling the relevant generalization to infinite $\mathbf{V}$) involves an intermediate language between $\mathcal{L}_{\text{prob}}$ and $\mathcal{L}_{\text{causal}}$, capturing the type of causal information revealed by controlled experiments. Even this three-tiered hierarchy is strict, and in fact one can go further to obtain an infinite hierarchy of increasingly expressive causal languages between $\mathcal{L}_{\text{prob}}$ and $\mathcal{L}_{\text{causal}}$. Because we are showing that there is a complexity collapse even from the most expressive to the least expressive systems, we are not concerned in the present work with these intermediate languages.

Theorem 2.

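$\mathcal{L}_{\text{causal}}$ is more expressive than $\mathcal{L}_{\text{prob}}$. What is stronger, no $\mathcal{L}_{\text{prob}}$-theory (i.e., maximally consistent set in this language) uniquely determines an $\mathcal{L}_{\text{causal}}$-theory.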

2.3.2 Second Axis: From Qualitative to Quantitative

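Focusing just on probabilistic languages, we will show that $\mathcal{L}^{\text{comp}}_{\text{prob}}$ is less expressive than both $\mathcal{L}^{\text{lin}}_{\text{prob}}$ and $\mathcal{L}^{\text{cond}}_{\text{prob}}$, and that both of these are less expressive than the language $\mathcal{L}^{\text{poly}}_{\text{prob}}$. In each case, it suffices to give two measures $\mathbb{P}_1$ and $\mathbb{P}_2$ which are indistinguishable in the less expressive language but which can be distinguished by some statement in the more expressive one.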

Comparative probability.

First, we claim that is less expressive than . Suppose we have just a single binary variable , abbreviating by and by . Then let so that , and let so that . The qualitative order on the four events is the same, but, for instance, , while .

Next, we recall an example due to Luce (1968), which shows that is less expressive than . Let each be events corresponding to the three possible values taken by a random variable. Consider the measures and . Then the two orders are the same, because for

However, the conditional probabilities differ: , while . In other words, the measures are indistinguishable in but distinguishable in .
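The two separations just described can be verified mechanically. In the sketch below, the specific measures `p1`, `p2`, `q1`, `q2` and the two test statements are hypothetical stand-ins chosen for illustration, not the numbers used in the examples above. The function `same_order` checks that two measures agree on every comparison between events, which is all that a purely comparative language can express.

```python
from fractions import Fraction as F
from itertools import combinations

def events(atoms):
    """All subsets of the set of atoms, as frozensets."""
    return [frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]

def prob(measure, event):
    return sum((measure[a] for a in event), F(0))

def same_order(m1, m2):
    """True if m1 and m2 agree on every comparison P(E1) >= P(E2) between events."""
    evs = events(list(m1))
    return all((prob(m1, e1) >= prob(m1, e2)) == (prob(m2, e1) >= prob(m2, e2))
               for e1 in evs for e2 in evs)

# Hypothetical measures on a single binary variable (atoms "a" and "b").
p1 = {"a": F(2, 3), "b": F(1, 3)}
p2 = {"a": F(3, 4), "b": F(1, 4)}

def linear_stmt(m):
    """P(a) >= P(b) + P(b) + P(b), a comparison between sums of probabilities."""
    return m["a"] >= m["b"] + m["b"] + m["b"]

# Hypothetical measures on a three-valued variable (atoms "a", "b", "c").
q1 = {"a": F(1, 2), "b": F(3, 10), "c": F(1, 5)}
q2 = {"a": F(1, 2), "b": F(7, 20), "c": F(3, 20)}

def cond_stmt(m):
    """P(b | b or c) >= P(a or c), a comparison involving a conditional probability."""
    return m["b"] / (m["b"] + m["c"]) >= m["a"] + m["c"]

print(same_order(p1, p2), linear_stmt(p1), linear_stmt(p2))
print(same_order(q1, q2), cond_stmt(q1), cond_stmt(q2))
```

In each pair the comparative order on events is identical, while the richer statement holds for one measure and fails for the other.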

Polynomials in probabilities.

In fact, the measures in Luce’s example are not even distinguishable in , though they are distinguishable in , and so in . This shows that is more expressive than . Finally, we give an example to show that is less expressive than . Incidentally, the same example shows that distinguishes models that cannot.

As above, let be events corresponding to possible values taken by a random variable. Define , while . One can verify by exhaustion that all comparisons of conditional probabilities agree between and , thus they are indistinguishable in . At the same time, there are statements in in which the models differ. For example, , whereas . This shows that is less expressive than . Further, we observe that can be distinguished in : for but not for , and this statement is equivalent to the statement in that

This observation, together with the earlier remark that Luce’s example is not distinguishable in , shows that and are incomparable in expressivity.

Summarizing the results of this section:

Theorem 3.

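$\mathcal{L}^{\text{lin}}_{\text{prob}}$ and $\mathcal{L}^{\text{cond}}_{\text{prob}}$ are incomparable in expressive power, and both are strictly more expressive than $\mathcal{L}^{\text{comp}}_{\text{prob}}$. Meanwhile, $\mathcal{L}^{\text{poly}}_{\text{prob}}$ is strictly more expressive than any of the other three languages.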

3 Introducing Computational Complexity

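In this section, we introduce the ideas from complexity theory needed to state our main results. We denote by $\mathrm{SAT}^{*}_{\text{prob}}$ and $\mathrm{SAT}^{*}_{\text{causal}}$ the satisfiability problems for $\mathcal{L}^{*}_{\text{prob}}$ and $\mathcal{L}^{*}_{\text{causal}}$, respectively, where $* \in \{\text{comp}, \text{lin}, \text{cond}, \text{poly}\}$. There are two key definitions: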

Definition 3.1.

Say that a map $f$ from formulas of one language to formulas of another preserves and reflects satisfiability when $\varphi$ is satisfiable if and only if $f(\varphi)$ is satisfiable. Such a map is called a many-one reduction of the satisfiability problem for the first language to that for the second. Such a map is said to run in polynomial time if it is computable by a Turing machine in a number of time steps that is a polynomial function of the length $|\varphi|$ of the input formula; when the Turing machine is non-deterministic, the map is said to be non-deterministic as well.

Definition 3.2.

A decision problem maps an input, represented as a binary string, to an output “yes” or “no.” For example, $\mathrm{SAT}^{*}_{\text{prob}}$ maps a standard encoding of a formula $\varphi \in \mathcal{L}^{*}_{\text{prob}}$ to “yes” if it is satisfiable and to “no” otherwise. When each member of a collection $\mathcal{C}$ of decision problems can be reduced via some deterministic, polynomial-time map to a particular decision problem $A \in \mathcal{C}$, one says that the problem $A$ is $\mathcal{C}$-complete. The class $\mathcal{C}$ of decision problems is called a complexity class.

When a problem $A$ is complete for some complexity class $\mathcal{C}$, this means that the complexity class fully characterizes the difficulty of the problem: the problem is at least as “hard” as any of the problems in $\mathcal{C}$, and it is itself in $\mathcal{C}$. Thus any two problems which are complete for a complexity class are equally hard, since each can be reduced in deterministic polynomial time to the other. Complete problems facilitate results relating complexity classes: to show that a class $\mathcal{C}$ is contained in another $\mathcal{C}'$, it suffices to give a deterministic, polynomial-time, many-one reduction from a problem which is complete for $\mathcal{C}$ to any problem $B \in \mathcal{C}'$.

Fagin et al. (1990) showed that $\mathrm{SAT}^{\text{lin}}_{\text{prob}}$ is complete for the complexity class NP. That $\mathrm{SAT}^{\text{comp}}_{\text{prob}}$ is also NP-complete follows quickly from this result and the Cook-Levin theorem (Cook, 1971), which says that Boolean satisfiability is NP-complete as well. For clarity, we include these known results in the statement of our main result, which gives completeness results for all of the other probabilistic and causal languages defined above:

Theorem 1.

We characterize two sets of tasks:

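  1. $\mathrm{SAT}^{\text{comp}}_{\text{prob}}$, $\mathrm{SAT}^{\text{lin}}_{\text{prob}}$, $\mathrm{SAT}^{\text{comp}}_{\text{causal}}$, and $\mathrm{SAT}^{\text{lin}}_{\text{causal}}$ are NP-complete.

  2. $\mathrm{SAT}^{\text{cond}}_{\text{prob}}$, $\mathrm{SAT}^{\text{poly}}_{\text{prob}}$, $\mathrm{SAT}^{\text{cond}}_{\text{causal}}$, and $\mathrm{SAT}^{\text{poly}}_{\text{causal}}$ are ∃ℝ-complete.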

Since problems that are complete for a class are all equally hard, our main results imply that causal and probabilistic reasoning in these languages do not differ in complexity. In the remainder of this section, we introduce the complexity classes NP and ∃ℝ. We note from the start that the inclusions NP ⊆ ∃ℝ ⊆ PSPACE are known, where PSPACE is the set of problems solvable using polynomial space; it is an open problem whether either inclusion is strict. Further, ten Cate et al. (2013) show that ∃ℝ is closed under many-one NP-reductions; NP and PSPACE are also closed under many-one NP-reductions. By this we mean that if one wishes to show that a satisfiability problem is in one of these classes, it suffices to find a polynomial-time, non-deterministic satisfiability-preserving map that reduces the problem to one that is known to be in that class.

3.1 The Class NP

The class NP contains any problem that can be solved by a non-deterministic Turing machine in a number of steps that grows polynomially in the input size. Equivalently, it contains any problem solvable by a polynomial-time deterministic Turing machine, when the machine is provided with a polynomial-size certificate, which we think of as providing the solution to the problem, or “lucky guesses.” In this case we think of the deterministic Turing machine as a verifier, tasked with ensuring that the certificate communicates a valid solution to the problem.
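The verifier view can be illustrated with Boolean satisfiability. The following is a minimal sketch; the function name `verify_sat` and the encoding of clauses as lists of signed integers are assumptions made for the example. The certificate is a truth assignment, and checking it takes time linear in the size of the formula, even though finding such an assignment may be hard.

```python
def verify_sat(clauses, assignment):
    """Check that `assignment` satisfies a CNF formula.

    clauses: list of clauses, each a list of nonzero ints
             (k stands for variable k, -k for its negation).
    assignment: dict mapping each variable index to a bool (the certificate).
    """
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
print(verify_sat(formula, {1: True, 2: True, 3: False}))
print(verify_sat(formula, {1: True, 2: False, 3: True}))
```

The verifier accepts the first certificate and rejects the second; the formula is satisfiable because at least one accepted certificate exists.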

Hundreds of problems are known to be NP-complete. Among them are Boolean satisfiability and the decision problems associated with several natural graph properties, for example possession of a clique of a given size or possession of a Hamiltonian path. See Ruiz-Vanoye et al. (2011) for a survey of such problems and their relations.

3.2 The Class

The Existential Theory of the Reals (ETR) contains all true sentences of the form

where is a system of equalities and inequalities of arbitrary polynomials in the variables . For example, one can state in ETR the existence of the golden ratio, which is the only root of the polynomial greater than one, by “there exists satisfying .” The decision problem of saying whether a given formula ETR is complete (by definition) for the complexity class .

The class ∃ℝ is the real analogue of NP, in two senses. Firstly, the satisfiability problem that is complete for ∃ℝ features real-valued variables, while the satisfiability problems that are complete for NP typically feature integer- or Boolean-valued variables. Secondly, and more strikingly, Erickson et al. (2020) recently showed that while NP is the class of decision problems with answers that can be verified in polynomial time by machines with access to unlimited integer-valued memory, ∃ℝ is the class of decision problems with answers that can be verified in polynomial time by machines with access to unlimited real-valued memory.

As with NP, a myriad of problems are known to be ∃ℝ-complete. We include some examples that illustrate the diversity of such problems:

  • In geometry, there is the ∃ℝ-complete “art gallery” problem of finding the smallest number of points from which all points of a given polygon are visible (Abrahamsen et al., 2018).

  • In graph theory, there is the ∃ℝ-complete problem of deciding whether a given graph can be realized by a straight line drawing (Schaefer, 2013).

  • In game theory, there is the ∃ℝ-complete problem of deciding whether an (at least) three-player game has a Nash equilibrium with no probability exceeding a fixed threshold (Bilò and Mavronicolas, 2017).

4 Our results

In this section, we prove our main result, Theorem 1. To do this, we first establish that one can reduce satisfiability problems for causal languages to corresponding problems for purely probabilistic languages.

4.1 Reduction

Definition 4.1.

Fix a set of operations on , and for a given placeholder set , let be the set of terms generated by application of operations in to members of . Define

The semantics for these languages are restricted to recursive SEMs.

Proposition 4.2 (Reduction).

There exists a many-one reduction from to .

We first give a prose overview of the main ideas underlying the reduction. Fix . The key observation is that the reduction is straightforward when every with mentioned in is a complete state description, where a complete state description says, for each possible intervention and each variable, what value that variable takes upon that intervention. Indeed, complete state descriptions have three nice properties:

  1. Polynomial-time comparison to ordering. One can easily check whether a complete state description implies influence relations conflicting with a given order on the variables appearing in it. Indeed, one simply reads which variables influence which variables off of the intervention statements appearing in .

  2. Existence of model matching probabilities. If a collection of complete state descriptions does not conflict with an order , then any probability distribution on the descriptions has a recursive model that induces it; briefly, one can simply take a distribution over deterministic models for the mutually unsatisfiable descriptions .

  3. Small model property. At most complete state descriptions are mentioned in , and so at most that many receive positive probability in any model satisfying .

These properties will allow a reduction to go through. Indeed, fix . Given that is satisfiable, one can request as an NP certificate an ordering and (relying on #3) the small set of complete state descriptions receiving positive probability. One then checks (relying on #1) that these descriptions do not conflict with . Since is satisfiable only if there exists a measure satisfying its inequalities, one can safely translate those inequalities into the probabilistic language, giving a probabilistic formula . If the probabilistic formula is satisfiable via some measure, one can (relying on #2) infer a corresponding recursive model for the causal formula . Thus the map preserves and reflects satisfiability.

As it turns out, the same reduction goes through in the general case, when the for which is mentioned in need not be complete state descriptions. Roughly, the strategy is to simply replace every such that is mentioned in with an equivalent disjunction of complete state descriptions. The primary complication with this strategy is that there are too many possible interventions, variables, and values those variables could take on; truly complete state descriptions are exponentially long, making the reduction computationally intractable. To address this issue, we work with a restricted class of state descriptions, which feature only the interventions, variables, and values appearing in the input formula :

Definition 4.3.

Fix a formula . Let contain all interventions appearing in and let denote all variables appearing in . For each variable , let contain whenever or appears in , and let it also contain one assignment not satisfying either of these conditions. Let contain all possible interventions paired with all possible assignments, where the possibilities are restricted to :

Call the results of the intervention , and the result for of the intervention . We write when as shorthand for . We write when contains some assignment .

The following three lemmas confirm that even working with this restricted class of state descriptions, (versions of) the three nice properties outlined above are retained.

Definition 4.4.

Fix a formula and . Fix a well-order on . Enumerate the variables in in a way consistent with . The formula is compatible with when there exists a model that assigns positive probability to and that is compatible with . Define to contain all compatible with .

Lemma 4.5 (Polytime Comparison to Ordering).

Fix . Given a set , one can check that and that each is compatible with in time polynomial in .

This lemma shows that given some statement and a set of formulas , one can efficiently (i.e. in polynomial time) check that the formulas satisfy two conditions. The first condition is that the formulas describe, in the fullest terms possible, the ways that could be true (i.e. ). The second is that the formulas do not rule out the causal influence relations specified by the order , for example the relations induced by the model of smoking’s effect on lung cancer discussed in Example 1.1 and Example 2.6.


Checking that is fast, since one can simply scan to make sure that mentions precisely interventions mentioned in all ; that mentions precisely the variables appearing in the results of every intervention in ; and that for each such variable , at most one of its assignments in does not appear as an assignment or a negated assignment in .

We now give an algorithm to check whether is compatible with . We first give prose and formal descriptions of the algorithm and then consider its runtime and correctness.

Order the variables in in a way consistent with the well-order . For each variable with , do the following. First, for each intervention in that mentions , confirm that the intervention leads to satisfiable results: if says that upon the intervention which sets , the variable takes a value , we reject , which necessarily has probability 0. Next, for each pair of interventions in which do not intervene on the value assigned to , check whether both interventions result in the same assignments to variables for all ; we say that such interventions have agreement on all for . If this is the case, and yet says that these two interventions result in different values for , reject ; since can depend only on the values of for , when these values are constant, must be constant as well. Here is a formal description of the algorithm. We will write to denote that the variable appears (or is mentioned) in the intervention , i.e., that is a conjunct in for some value .

Order the variables in according to
for i in 1,…,n do
       for intervention in with appearing in  do
             if  with appears in the conjunction of assignments following  then
                   return is unsatisfiable, and so incompatible with
             end if
       end for
      for interventions in agreeing on all for , and such that and  do
             if  results in and results in with  then
                   return is incompatible with
             end if
       end for
end for
return is compatible with
Algorithm 1 Check that is compatible with
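The two checks in Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: a state description is encoded as a dictionary mapping each intervention (a `frozenset` of `(variable, value)` pairs, with the empty set standing for the empty intervention) to the dictionary of resulting values for every variable, and the order is a list of variable names. The name `is_compatible` and this encoding are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, List, Tuple

Intervention = FrozenSet[Tuple[str, Hashable]]
StateDescription = Dict[Intervention, Dict[str, Hashable]]


def is_compatible(delta: StateDescription, order: List[str]) -> bool:
    """Return True when `delta` passes both checks for the given variable order.

    Assumes every result dictionary assigns a value to every variable in `order`.
    """
    for i, var in enumerate(order):
        # Check 1: an intervention that sets `var` must result in that same value.
        for intervention, result in delta.items():
            forced = dict(intervention)
            if var in forced and result[var] != forced[var]:
                return False  # unsatisfiable, hence incompatible

        # Check 2: interventions that do not set `var` and whose results agree
        # on all earlier variables must also agree on `var`.
        earlier = order[:i]
        free = [res for itv, res in delta.items() if var not in dict(itv)]
        for res_a, res_b in combinations(free, 2):
            agree = all(res_a[u] == res_b[u] for u in earlier)
            if agree and res_a[var] != res_b[var]:
                return False  # `var` would depend on a later variable
    return True


# Hypothetical example: with no intervention X = 0 and Y = 0; setting X = 1 yields Y = 1.
example: StateDescription = {
    frozenset(): {"X": 0, "Y": 0},
    frozenset({("X", 1)}): {"X": 1, "Y": 1},
}
```

In this example the description passes with the order X, Y but fails with the order Y, X, because Y changes in response to an intervention on X. The outer loop visits each variable once and the inner loop each pair of interventions once, so the sketch runs in time polynomial in the size of its input.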

Below, we show that the above algorithm indeed runs in time and is correct, but for clarity, let us step through its execution on some examples. Consider the input . Then, by the first “if” clause in the algorithm, is rejected as unsatisfiable, since the intervention leads to impossible results. For another example, let be the formula

Then in the second “if” clause on the third iteration, is rejected as incompatible with , because the interventions and do not intervene on , result in the same values for and , and yet do not result in the same value for , contradicting the fact that ’s value must depend only on those assigned to and .

It is helpful in considering these examples and the runtime of the algorithm to consider the following table of values:

Results of all interventions in the input formula