# Causal Modeling with Probabilistic Simulation Models

Recent authors have proposed analyzing conditional reasoning through a notion of intervention on a simulation program, and have found a sound and complete axiomatization of the logic of conditionals in this setting. Here we extend this setting to the case of probabilistic simulation models. We give a natural definition of probability on formulas of the conditional language, allowing for the expression of counterfactuals, and prove foundational results about this definition. We also find an axiomatization for reasoning about linear inequalities involving probabilities in this setting. We prove soundness, completeness, and NP-completeness of the satisfiability problem for this logic.

## 1 Introduction

Accounts of subjunctive conditionals based on internal causal models offer an alternative to approaches based on ranking possible worlds by similarity [9]. One might, e.g., employ structural equation models (SEMs), i.e. systems of equations connecting the values of relevant variables, as the causal model; the semantics of conditionals are then based on a precise notion of intervention on the SEM [11]. Recently, some authors [8, 10, 4, 3, 1] have proposed using arbitrary programs, rather than systems of equations, as causal models. This approach emphasizes the procedural nature of many internal causal simulations over the purely declarative SEMs.

It is possible to define precisely this idea of programs as causal models and to generalize the idea of intervention from SEMs to programs [8]. It is also possible to give a sound and complete logic of conditionals in this setting [6]. However, these preliminary results have not fully explored the very important case—from, e.g., the Bayesian Logic modeling language [10] and implicit in the use of probabilistic programs as cognitive models [3]—of conditionals in a probabilistic setting, using stochastic programs as the underlying causal model. In the present contribution we will establish foundational definitions and logical results for this setting, thus extending the causal simulation framework to probabilistic simulation programs. Probabilities over a causal modeling language are defined and results showing that they may actually be interpreted as probabilities are given. The probabilities are used to give the semantics of a language for probabilistic reasoning, for which an axiomatization is given. The language and axiomatization are extensions of an analogous probabilistic language considered for the purely propositional case by [2]. Soundness and completeness of the axiom system are proven, and the satisfiability problem is found to be NP-complete.

## 2 Probabilistic Simulation Models and the Logical Language

### 2.1 Simulation Models

We work toward the definition of a language for expressing probabilities involving probabilistic simulation models. Probabilistic simulation models extend the non-probabilistic[^1] causal simulation models of [8, 6]. Formally, a non-probabilistic simulation model is a deterministic Turing machine,[^2] and a probabilistic simulation model is a probabilistic Turing machine, i.e., a deterministic Turing machine (that of course still has a read-write memory tape) given read access to a random bit tape whose squares represent the results of independent fair coin flips. The use of Turing machines is meant to allow for complete generality and encompasses, e.g., both logic programming and imperative programming. We sometimes use intuitive pseudocode in describing simulation models; such pseudocode is readily convertible to Turing machine code.

We suppose that simulation models are run initially from an empty tape.[^3]

[^1]: The use of "non-probabilistic" rather than "deterministic" is intended to prevent confusion of the probabilistic/non-probabilistic distinction with the deterministic Turing machine/non-deterministic Turing machine distinction. The former distinction is about the presence of a source of randomness while the latter is about the number of possible halting executions.

[^2]: [6] does not require these machines to be deterministic, and isolates an additional logical principle that is valid when the machines are deterministic. Here, however, "non-probabilistic simulation model" always refers to one whose Turing machine is deterministic. This definition is more useful for comparison with the probabilistic case, in which all underlying machines are deterministic.

[^3]: [6] also includes an initial input tape in the definition of the model. This difference is inconsequential.

As a simulation model runs, it reads and writes the values of binary variables on its tape squares. Eventually, the model either halts with some resultant tape, or does not halt, depending on the results of the coin flips the model performs in the course of its simulation. Every probabilistic simulation model thus induces a distribution on these possible outcomes. We are interested not only in these outcomes, but also in the dynamics and counterfactual information embodied in the model. That is, we are interested in what would happen were we to hold the values of the tape square variables fixed in a particular way that counterfactually differs from the actual values the squares take on—in the distribution over outcomes that results under a particular intervention:

###### Definition 1 (Intervention [8])

Let x be a specification of binary values for a finite number of tape squares: xᵢ ∈ {0, 1} for i in a finite index set I. Then the intervention do(x) is a computable function from Turing machines to Turing machines specified in the following way. Given a machine M, the intervened machine do(x)(M) does the same thing as M but holds the variables in I to their fixed values specified by x throughout the run. That is, do(x)(M) first writes xᵢ to square i for all i ∈ I, then runs M while ignoring any writes to any of the squares whose indices are in I.

Suppose one fixes the entire random bit tape to some particular sequence ω ∈ {0,1}^∞. Then the counterfactual, as well as actual, behavior of a probabilistic simulation model is completely non-probabilistic. We define first a basic language that allows us to express facts about such behavior. Then we will define the probability that a given probabilistic simulation model satisfies a formula of this basic language. Our final language uses these probabilities—it thus expresses facts about the probabilities that counterfactual properties hold. In all logical expressions we help ourselves to these standard notational conventions: φ → ψ abbreviates ¬φ ∨ ψ, and φ ↔ ψ denotes (φ → ψ) ∧ (ψ → φ).

### 2.2 The Basic Language

#### 2.2.1 Syntax

The basic, non-probabilistic language is a propositional language over conditionals. Formally:

###### Definition 2

Let X₁, X₂, … be a set of atoms representing the values of the memory tape variables and let ℒ_prop be the propositional language formed by closing off these atoms under conjunction, disjunction, and negation.

Let the intervention specification language ℒ_int be the language of purely conjunctive, ordered formulas of unique literals,[^4] i.e., formulas of the form l₁ ∧ ⋯ ∧ lₖ for some k ≥ 0, where the literals are ordered by the indices of their atoms and each lⱼ is either Xᵢ or ¬Xᵢ for some i. ⊤ abbreviates the "empty intervention" formula with k = 0. Let ℒ_cond be the conditional language of formulas of the form [α]φ for α ∈ ℒ_int and φ ∈ ℒ_prop.

The overall basic language ℒ_base is the language formed by closing off the formulas of ℒ_cond[^5] under conjunction, disjunction, and negation.

Every formula α ∈ ℒ_int specifies an intervention by giving a list of variables to fix and which values they are to be fixed to. Given a subjunctive conditional formula [α]φ, we call α the antecedent and φ the consequent. We use ⟨α⟩ for the dual of [α], i.e., ⟨α⟩φ abbreviates ¬[α]¬φ. Note that [⊤]φ holds in a program if the unmodified program halts with a tape making φ true.

[^4]: The point being that such formulas are in one-to-one correspondence with specifications of interventions, i.e., finite lists of variables along with the values each is to be held fixed to.

[^5]: Unlike [6], we do not admit the basic atoms Xᵢ as atoms of ℒ_base. There is no difficulty extending the semantics to such atoms, but allowing them would needlessly complicate the proof of Theorem 3.1.

#### 2.2.2 Semantics

The semantics of the basic language are defined from considering a subjunctive conditional to be true in a simulation model when the program so intervened upon as to make its antecedent hold halts with such values of the tape variables as make its consequent hold. For example, consider a simple model that checks if the first memory tape square X₁ is 1 and if so writes a 1 into the second tape square X₂, and otherwise simply halts. Run from an empty tape, this program satisfies the formulas [⊤]¬X₁ and [⊤]¬X₂, but also the counterfactual formula [X₁]X₂: holding the first memory square fixed to 1 causes a write of the value 1 into the second tape square, thus satisfying the consequent X₂. Formally:
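
The example model and intervention semantics can be sketched in Python. The dict-based tape and the `intervene` and `example` helpers are illustrative stand-ins for the Turing-machine formalism of Definition 1, not the paper's formal construction:

```python
# A toy sketch of interventions: tape squares are a dict, and an
# intervention pins some squares so that writes to them are ignored.

def intervene(program, fixed):
    """do(fixed): pin the given squares to fixed values for the whole run."""
    def intervened():
        tape = {i: 0 for i in (1, 2)}      # run from an empty (all-zero) tape
        tape.update(fixed)                 # first write the fixed values

        def write(square, value):
            if square not in fixed:        # ignore writes to pinned squares
                tape[square] = value

        program(tape, write)
        return tape
    return intervened

def example(tape, write):
    """If square 1 holds 1, write 1 into square 2; otherwise just halt."""
    if tape[1] == 1:
        write(2, 1)

actual = intervene(example, {})()              # empty intervention [⊤]
counterfactual = intervene(example, {1: 1})()  # intervention [X1]

print(actual)          # {1: 0, 2: 0}: satisfies [⊤]¬X1 and [⊤]¬X2
print(counterfactual)  # {1: 1, 2: 1}: satisfies [X1]X2
```

Note that under the intervention the pinned square survives with its fixed value regardless of what the program tries to write to it, mirroring Definition 1.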

###### Definition 3

Let M be a non-probabilistic simulation model. Define M ⊨ [α]φ iff do(α)(M) halts with a memory tape whose variable assignment satisfies φ. Now suppose M is probabilistic, and fix values for all squares on the random bit tape to some sequence ω ∈ {0,1}^∞. Define M, ω ⊨ [α]φ iff do(α)(M), when run with its random bit tape fixed to ω, halts with a resultant memory tape satisfying φ. Define (in both cases) satisfaction of arbitrary formulas of ℒ_base in the familiar way by recursion.

In a sense, the validities of the non-probabilistic setting carry over to this setting, as we will now show. For ψ ∈ ℒ_base, write ⊨non-prob ψ if ψ is valid in the class of all non-probabilistic simulation models. We will see that all such formulas are still valid for probabilistic simulation models, under Definition 3, once one fixes the random bit tape to a particular sequence.

###### Lemma 1

⊨non-prob ψ if and only if, for all probabilistic simulation models M and all ω ∈ {0,1}^∞, we have that M, ω ⊨ ψ.

###### Proof

Suppose ⊨non-prob ψ. Consider some probabilistic simulation model M and sequence ω. ψ is composed of ℒ_cond-atoms, of the form [α]φ. What is the behavior of do(α)(M) run on ω? Either do(α)(M) reads only a finite portion of ω or it reads an unbounded portion of ω (in the latter case, it also does not halt). If only a finite portion is read, let n_α be the maximal random bit tape square reached in the run of do(α)(M). Let n be the maximum of the n_α for all atoms [α]φ in ψ, clearly existent as ψ has finite length. Construct a Turing machine M_ω from M that embeds the contents of ω up to index n into its code, replacing any read from the random bit tape with its value. This is possible in a finite amount of code as we only have to include values up to index n.

What if do(α)(M) ends up reading an unbounded portion of ω? We note that it is possible to write code in M_ω to check if the machine is being run under an α-fixing intervention—i.e., conditional code that runs under do(α) and no other intervention.[^6] Add such code to M_ω, including an infinite loop conditional on an α-intervention for each case where do(α)(M) reads an unbounded portion of ω. Now, for all atoms [α]φ, we have M, ω ⊨ [α]φ iff M_ω ⊨ [α]φ. As this holds for any atom of ψ, and ⊨non-prob ψ, we have that M, ω ⊨ ψ as desired.

Now, suppose that M, ω ⊨ ψ for all probabilistic simulation models M and all ω. We want to see that ⊨non-prob ψ. Given a non-probabilistic M, convert M to a probabilistic TM M′ that never reads from its random tape, and take any random tape ω. Then M′, ω ⊨ ψ, so that M ⊨ ψ. ∎

[^6]: For the precise details of this construction, see [6]. Briefly, if one wants to check if some Xᵢ is being held fixed by an intervention, one can try to toggle Xᵢ; this attempt will be successful iff Xᵢ is not currently being fixed by an intervention.
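
The derandomization step in the proof—embedding a finite prefix of ω into the machine's code so that its behavior becomes fully non-probabilistic—can be illustrated with a small sketch. The `make_reader` helper and the toy program are hypothetical, chosen only to make the idea concrete:

```python
# Sketch of the M_ω construction: a probabilistic program that reads random
# bits is "derandomized" up to index n by embedding a finite prefix of the
# fixed bit tape ω directly into its code.

def make_reader(prefix):
    """Replace reads from the random tape with the embedded prefix values."""
    state = {"pos": 0}
    def read_bit():
        i = state["pos"]
        state["pos"] += 1
        if i < len(prefix):
            return prefix[i]          # embedded bit: no randomness consumed
        raise RuntimeError("read past the embedded prefix")
    return read_bit

def parity_of_three(read_bit):
    """Toy probabilistic program: halts with the parity of three coin flips."""
    return (read_bit() + read_bit() + read_bit()) % 2

omega_prefix = [1, 0, 1]              # a finite portion of some fixed ω
result = parity_of_three(make_reader(omega_prefix))
assert result == 0                    # the run is now fully determined
```

Only finitely many bits need to be embedded because a halting run reads only a finite portion of the random tape, which is exactly the observation the proof relies on.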

### 2.3 The Probabilistic Language

#### 2.3.1 Syntax

The probabilistic language ℒ_prob is the language of linear inequalities over the probabilities that formulas of ℒ_base hold. More precisely:

###### Definition 4

Let ℒ_ineq be the language of formulas of the form

 a1P(φ1)+⋯+anP(φn)≤c (1)

for some n ≥ 1, integers a1, …, an, c, and φ1, …, φn ∈ ℒ_base. Then ℒ_prob is the language of propositional formulas formed by closing off ℒ_ineq under conjunction, disjunction, and negation.

We sometimes write inequalities of a different form from (1) with the understanding that they can be readily converted into some ℒ_prob-formula. For example, an inequality with a > sign is a negation of an ℒ_ineq-formula.

#### 2.3.2 Semantics

Let M be a probabilistic simulation model. We will shortly define a probability P_M(φ) for each φ ∈ ℒ_base. Now suppose a given f ∈ ℒ_ineq has the form (1). Then M ⊨ f iff the inequality (1) holds when each factor P(φi) takes the value P_M(φi). Satisfaction for arbitrary f ∈ ℒ_prob is then defined familiarly by recursion. Given φ ∈ ℒ_base, the probability P_M(φ) is simply the (standard) measure of the set of infinite bit sequences ω for which M, ω ⊨ φ. More formally: let ℱ be the σ-algebra on {0,1}^∞ generated by the cylinder sets and μ be the standard measure defined on ℱ.[^7] Now let S(φ) = {ω ∈ {0,1}^∞ : M, ω ⊨ φ}. Then we define P_M(φ) = μ(S(φ)). The following Lemma ensures that S(φ) is always measurable, so that this definition is valid.

[^7]: That is, as the product of fair-coin measures on {0,1}, as defined in, e.g., [3].

###### Lemma 2

For any φ ∈ ℒ_base, we have S(φ) ∈ ℱ.

###### Proof

Proof by induction on the structure of φ. If φ = ¬ψ, then S(φ) is the complement of a set in ℱ and hence is in ℱ. The case of a conjunction or disjunction is similar since ℱ is closed under intersection and union. The base case is that of the atoms. Consider an atom of the form [α]φ′. If do(α)(M) halts when run with its random bit tape fixed to ω, then it does so reading only a finite portion of ω. Thus S([α]φ′) is the union of cylinder sets extending finite strings on which do(α)(M) halts with a result satisfying φ′, and hence is in ℱ. ∎
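
When every relevant run halts after reading at most n random bits, the measure in question reduces to a finite sum over cylinder sets, which the following sketch computes exactly. The helper names are illustrative; the general case requires the full measure-theoretic definition above:

```python
from fractions import Fraction
from itertools import product

# Sketch: if all runs read at most n random bits, S(φ) is a union of
# cylinder sets over length-n prefixes, so P(φ) is computed exactly by
# enumerating the 2^n prefixes and weighting each by 2^-n.

def probability(satisfies, n):
    """Sum the fair-coin measure of the length-n prefixes whose runs satisfy φ."""
    total = Fraction(0)
    for bits in product([0, 1], repeat=n):
        if satisfies(bits):
            total += Fraction(1, 2 ** n)
    return total

# Toy model: halts with X1 = 1 iff at least one of its two coin flips is 1.
def sat(bits):
    return bits[0] == 1 or bits[1] == 1

assert probability(sat, 2) == Fraction(3, 4)
```

Using exact rationals rather than floats mirrors the fact that such probabilities are rational whenever the model is of this bounded-reading form.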

This probability is coherent in the sense that it plays well with the logic of the basic language:

###### Proposition 1

For any probabilistic M we have,

1. P_M(φ) = 1 if ⊨non-prob φ, for φ ∈ ℒ_base

2. P_M(φ) ≤ P_M(ψ) whenever ⊨non-prob φ → ψ, for φ, ψ ∈ ℒ_base

3. P_M(φ∧ψ) + P_M(φ∧¬ψ) = P_M(φ) for all φ, ψ ∈ ℒ_base

###### Proof

(1) holds since in this case, by Lemma 1, S(φ) = {0,1}^∞. (2) holds since in this case, S(φ) ⊆ S(ψ). Finally (3) holds by noting ⊨non-prob φ ↔ ((φ∧ψ) ∨ (φ∧¬ψ)), applying (2) in both directions, and noting that S(φ∧ψ) and S(φ∧¬ψ) are disjoint. ∎

A corollary of part (2) is that logical equivalents under ⊨non-prob preserve probability.

### 2.4 The Case of Almost-Surely Halting Simulations

An interesting special case is that of the simulation models that halt almost-surely, i.e., with probability 1 under every intervention. Call this class AS. Following the urging of [7] we have not restricted the definition of probabilistic simulation model to such models. We will see that from a logical point of view, this case is a natural probabilistic analogue of the class ℋ of non-probabilistic simulation models that halt under every intervention. By this we mean that we may prove an analogue to Lemma 1. Write ⊨halt ψ if ψ is valid in ℋ. Note that Lemma 1 does not hold if one merely changes all the preconditions to be halting/almost-surely halting: consider a probabilistic simulation model M that repeatedly reads random bits and halts at the first 1 it discovers; this program is almost-surely halting. But if ω is the infinite sequence of 0s, then M, ω ⊭ [⊤]⊤, even though ⊨halt [⊤]⊤. Crucially, we must move to the perspective of probability and measure to see the analogy:
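
The counterexample model can be checked numerically; `halts_within` and `prob_halt_within` are hypothetical helpers illustrating why the model is almost-surely halting and yet fails to halt on the all-zeros tape:

```python
from fractions import Fraction

# The model halts at the first 1 it reads. It halts within n bits unless
# the first n bits are all 0, so P(halt within n bits) = 1 - 2^-n → 1:
# the model is almost-surely halting, yet on ω = 000... it never halts.

def halts_within(omega_prefix):
    """Does the 'halt at the first 1' model halt on this finite prefix?"""
    return 1 in omega_prefix

def prob_halt_within(n):
    # only the all-zeros prefix of length n fails to halt
    return 1 - Fraction(1, 2 ** n)

assert not halts_within([0] * 100)                # all-zeros: still running
assert prob_halt_within(10) == Fraction(1023, 1024)
```

The non-halting set {000…} is a single point of measure 0, which is exactly why the lemma below must be stated up to a measure-0 exception.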

###### Lemma 3

⊨halt ψ if and only if, for all M ∈ AS, we have M, ω ⊨ ψ for all ω except on a set of measure 0.

###### Proof

Suppose ⊨halt ψ. We claim that for all M ∈ AS we have M, ω ⊨ ψ for all ω except on a set of measure 0. Again, consider an atom [α]φ appearing in ψ. The set of ω for which do(α)(M) does not halt has measure 0, given that M ∈ AS. On each such ω, the run of do(α)(M) must read infinitely many bits of ω: otherwise, the intervened machine would have a nonzero probability of not halting. Thus, excluding such ω, it is possible to repeat the construction of M_ω from the proof of Lemma 1, and in doing this construction we are already ignoring all cases where an unbounded portion of ω is read. This means that we do not have to include any infinite loops in M_ω, and M_ω will be always-halting. If we exclude all the ω thus arising from all antecedents of atoms of ψ, then we only exclude a set of measure 0 since there are finitely many atoms. Except for such ω, the construction works, and M_ω has, as before, the same behavior as M run on ω. But since M_ω ∈ ℋ and ⊨halt ψ, we have that M, ω ⊨ ψ except on the excluded set of measure 0.

For the opposite direction, let M ∈ ℋ. We wish to show that M ⊨ ψ. Convert M to an identical probabilistic simulation program M′ that never reads from its random tape. We have M′, ω ⊨ ψ for all ω but on a set of measure 0; in particular, M′, ω ⊨ ψ for at least one ω. This implies M ⊨ ψ. ∎

## 3 Axiomatic Systems

We will now give an axiomatic system for reasoning in ℒ_prob and prove that it is sound and complete with respect to probabilistic simulation models: it proves all (completeness) and only (soundness) the formulas of ℒ_prob that hold for all probabilistic simulation models. We will give an additional system that is sound and complete for the validities with respect to the almost-surely halting simulation models AS.

###### Definition 5

Let AX be a set of rules and axioms formed by combining the following three modules.

1. PC: propositional reasoning (tautologies and modus ponens) over the inequality atoms of ℒ_prob.

2. Prob: the following axioms:

 NonNeg. P(φ)≥0
 Norm. P(⊤)=1
 Add. P(φ∧ψ)+P(φ∧¬ψ)=P(φ)
 Dist. P(φ)=P(ψ) whenever ⊨non-prob φ↔ψ
3. Ineq, an axiomatization (see [2]) for reasoning about linear inequalities:

 Zero. (a1P(φ1)+⋯+anP(φn)≤c) ⇔ (a1P(φ1)+⋯+anP(φn)+0P(φn+1)≤c)
 Permutation. (a1P(φ1)+⋯+anP(φn)≤c) ⇔ (aj1P(φj1)+⋯+ajnP(φjn)≤c) when j1,…,jn are a permutation of 1,…,n
 AddIneq. (a1P(φ1)+⋯+anP(φn)≤c) ∧ (a′1P(φ1)+⋯+a′nP(φn)≤c′) ⇒ ((a1+a′1)P(φ1)+⋯+(an+a′n)P(φn)≤(c+c′))
 Mult. (a1P(φ1)+⋯+anP(φn)≤c) ⇒ (ba1P(φ1)+⋯+banP(φn)≤bc) for any b>0
 Dichotomy. (a1P(φ1)+⋯+anP(φn)≤c) ∨ (a1P(φ1)+⋯+anP(φn)≥c)
 Mono. (a1P(φ1)+⋯+anP(φn)≤c) ⇒ (a1P(φ1)+⋯+anP(φn)<d) for any d>c

Additionally, let AX_AS be the system formed in exactly the same way, but replacing ⊨non-prob with ⊨halt.

Note that the non-probabilistic validities ⊨non-prob and ⊨halt, appearing in Dist, have been completely axiomatized in [6]. The main result is:

###### Theorem 3.1

AX (respectively, AX_AS) is sound and complete for the validities of ℒ_prob with respect to the class of all probabilistic simulation models (respectively, AS).

###### Proof

Soundness (of Prob) follows from Lemma 1, Proposition 1, and, for the almost-surely halting case, Lemma 3. For completeness, consider the general case of AX first. As usual, it suffices to show that any consistent f ∈ ℒ_prob is satisfiable by some probabilistic simulation model. We put f into a normal form from which we construct a canonical model. By PC we may suppose f is in disjunctive normal form. We may further suppose that it is a conjunction of ℒ_ineq-literals, as at least one (conjunctive) clause in the disjunctive normal form must be consistent. Let [α1]φ1, …, [αn]φn be the atoms that appear inside any probability in f, and let the 2^n formulas δ1, …, δ2n represent all the formulas of the form ±[α1]φ1 ∧ ⋯ ∧ ±[αn]φn that can be obtained by setting each ± to either nothing or ¬. We then have the following, which is a kind of normal form result:

###### Lemma 4 (Lemma 2.3, [2])

f is provably-in-AX equivalent to a conjunction

 (P(δ1)≥0)∧⋯∧(P(δ2n)≥0)
 ∧ (P(δ1)+⋯+P(δ2n)=1)
 ∧ (a1,1P(δ1)+⋯+a1,2nP(δ2n)≤c1) ∧ … ∧ (am,1P(δ1)+⋯+am,2nP(δ2n)≤cm)
 ∧ (a′1,1P(δ1)+⋯+a′1,2nP(δ2n)>c′1) ∧ … ∧ (a′m′,1P(δ1)+⋯+a′m′,2nP(δ2n)>c′m′) (2)

for some integer coefficients ai,j, a′i,j and bounds ci, c′i.

###### Proof

Let φ be any of the formulas appearing inside of a probability in f. Note that P(φ) = P(φ∧[α1]φ1) + P(φ∧¬[α1]φ1) by Add. Moving on to [α2]φ2, we have, provably, P(φ∧[α1]φ1) = P(φ∧[α1]φ1∧[α2]φ2) + P(φ∧[α1]φ1∧¬[α2]φ2), and we may rewrite P(φ∧¬[α1]φ1) similarly. Applying this process successively, we have P(φ) = P(φ∧δ1)+⋯+P(φ∧δ2n). For any term P(φ∧δi) in the right-hand side of this equality, if ⊨non-prob δi → φ, propositional reasoning by Dist allows us to replace the term by P(δi), and if not, by 0. Thus we always have that P(φ) = b1P(δ1)+⋯+b2nP(δ2n) for some coefficients bi ∈ {0, 1}. Applying this process to each P-term in f and using Ineq to rewrite the left-hand sides of the inequalities, and conjoining the (clearly provable) clauses that P(δi) ≥ 0 for all i, and P(δ1)+⋯+P(δ2n) = 1, we obtain (2). ∎
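
The rewriting step can be mirrored concretely: treating each δ as a truth-value assignment to the atoms, the coefficients bi are 1 exactly on the δ's that entail φ. This is a toy sketch of the bookkeeping, not the formal proof:

```python
from itertools import product

# Over atoms p1,...,pn, every formula φ is classically equivalent to the
# disjunction of the complete conjunctions δ that entail it, so
# P(φ) = Σ {P(δ) : δ entails φ}. Here a δ is a tuple of 0/1 truth values.

def deltas(n):
    """All 2^n complete conjunctions over n atoms, as tuples of 0/1."""
    return list(product([0, 1], repeat=n))

def rewrite(phi, n):
    """Coefficients b_i ∈ {0,1} with P(φ) = Σ b_i P(δ_i), as in the proof."""
    return [1 if phi(d) else 0 for d in deltas(n)]

# Toy example with n = 2 atoms: φ = p1 ∨ p2.
phi = lambda d: d[0] == 1 or d[1] == 1
assert rewrite(phi, 2) == [0, 1, 1, 1]
```

Summing the resulting 0/1 coefficient vectors, weighted by the original coefficients of f, yields exactly the integer coefficients ai,j appearing in (2).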

The conjunction (2) can be seen as a system of simultaneous inequalities over 2^n unknowns, the P(δi). Ineq is actually sound and complete for such systems (we refer the reader to Section 4 of [2] for the proof of this fact). So if f is consistent with AX—which includes Ineq—this system must have a solution. Thus there are values p1, …, p2n solving (2). We will now construct a probabilistic simulation model having precisely these probabilities of satisfying each δi. Note that for any δi with ⊨non-prob ¬δi it is provable that P(δi) = 0, and we may conjoin this to (2); thus we may suppose pi > 0 only when δi is satisfiable. Note also that δi ∧ δj is unsatisfiable for any i ≠ j. Given these two observations, the following Lemma implies the result.

###### Lemma 5

For any collection δ1, …, δm of satisfiable ℒ_base-formulas no two of which are jointly satisfiable, and any rational probabilities p1, …, pm > 0 such that p1+⋯+pm = 1, there is a probabilistic simulation model M such that for all i, P_M(δi) = pi.

###### Proof

Since the δi are satisfiable, there are non-probabilistic simulation models M1, …, Mm such that for all i, we have Mi ⊨ δi. Further, we may suppose the machines so constructed use only a bounded number of memory tape squares.[^8] Thus let the maximum index of a tape square used by any of the Mi be s. We now describe M informally. Suppose without loss of generality that pi = qi/q for all i, for some common denominator q. Let M draw a random number r from 1 up to q uniformly, and ensure that M does any auxiliary computations it might need only on squares with indices greater than s. Check whether r ≤ q1, and if so, let M branch into the code of M1. If not, check if r ≤ q1 + q2 and if so, branch into M2. Repeat the process for M3, …, Mm. It's clear that the probability of branching into each block i is exactly pi, and the same is true under any relevant (i.e., involving only memory tape variables that appear in one of the δi) intervention on M: we may suppose any auxiliary computations Mi might require use only memory tape squares with indices past s. After branching into the ith block, the behavior of M is exactly the same as that of Mi, meaning that any random bit tape fixings ω that end up causing a branch into this block will belong to S(δi). Another random bit tape fixing that causes a branch into another block, say the jth, cannot belong to S(δi) since δi, δj are jointly unsatisfiable. Thus, P_M(δi) = pi for all i. ∎

[^8]: Why? Since δ1, …, δm are satisfiable, they are consistent with the axiomatization for non-probabilistic simulation models given by [6], and hence are satisfied by the canonical models given in [6]. These models use only boundedly many tape squares.
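
The branching construction can be sketched as follows; `mixture` and the toy blocks are hypothetical stand-ins for the canonical programs of [6]:

```python
import random

# Sketch of the Lemma 5 construction: draw r uniformly from 1..q and branch
# into the i-th block with probability q_i/q.

def mixture(blocks, weights, q, rng=random):
    """Run one block, chosen with the given integer weights q_i (Σ q_i = q)."""
    r = rng.randint(1, q)             # uniform draw from 1..q
    acc = 0
    for block, qi in zip(blocks, weights):
        acc += qi
        if r <= acc:                  # branch into this block
            return block()
    raise AssertionError("weights must sum to q")

# Toy blocks writing pairwise distinguishable tapes (jointly unsatisfiable).
blocks = [lambda: {1: 0, 2: 0}, lambda: {1: 1, 2: 0}, lambda: {1: 1, 2: 1}]
weights = [1, 1, 2]                   # probabilities 1/4, 1/4, 1/2

counts = {0: 0, 1: 0, 2: 0}
for _ in range(10000):
    tape = mixture(blocks, weights, 4)
    counts[tape[1] + tape[2]] += 1    # identifies which block ran

print({k: v / 10000 for k, v in counts.items()})  # ≈ {0: 0.25, 1: 0.25, 2: 0.5}
```

Because the blocks are jointly unsatisfiable, the branch taken is recoverable from the resultant tape, which is the key to reading off P_M(δi) = pi.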

Finally, we must see that this model lies in AS if the original formula f is consistent with AX_AS. [6] has shown that ⊨halt is also completely axiomatized, with canonical models that halt under every intervention. Then in the proof of Lemma 5, we may suppose that each block contains only always-halting code,[^9] and hence that M contains no non-halting code apart from the initial uniform draw: thus it almost-surely halts. ∎

[^9]: Since the canonical programs of [6] for ℋ contain only such code.

## 4 Computational Complexity

Call the problem of deciding if a formula f ∈ ℒ_prob is satisfiable by some probabilistic simulation model the satisfiability problem. Theorem 4.1 shows that solving this problem is no more complex than propositional satisfiability.

###### Theorem 4.1

The satisfiability problem is NP-complete in the length |f| of the formula (where this length is computed standardly).

###### Proof

It’s -hard since, given any propositional , the formula is satisfiable iff is satisfiable (consider a machine that does nothing but write a satisfying memory tape assignment out). In order to show that the satisfiability problem is in , we give the following nondeterministic satisfiability algorithm: guess a program from a class of programs (that we will define shortly) that includes the program constructed in Lemma 5 —call this canonical program —and check (in polynomial time) if it satisfies . This algorithm decides satisfiability since, by soundness, a satisfiable formula must be consistent, and hence has a canonical model of the form constructed in Lemma 5. For the remainder of the proof, by the “length of a number,” we just mean the length of its computer (binary) representation. The “length of a rational” is the sum of the lengths of its numerator and its denominator.

What is the class of probabilistic simulation models that we may limit our guesses to? For some fixed constants c1, c2, we will define a class 𝒞(f, c1, c2). We will then show that there exist c1, c2 such that the canonical program M_f of Lemma 5 belongs to 𝒞(f, c1, c2) for all consistent f. Let 𝒞(f, c1, c2) be the fragment of probabilistic simulation models whose code consists of the following:

1. Code to draw a random number r uniformly between 1 and some q, such that q has length at most c1·|f|^c1.

2. At most c2|f| branches, that is, copies of: an if-statement with condition li ≤ r ≤ ui, whose body is a canonical program for some ℒ_base-formula, of the same form as the non-probabilistic canonical models (i.e., in the class defined in the proof of Theorem 2 from [6]).

Letting li, ui be the bounds for the ith copy in (2), we also require that l1 = 1, that li+1 = ui + 1 for all i, and that the final ui = q. The following fact from linear algebra (we refer the reader to [2] for the proof) helps us to show that for all consistent f, the canonical program M_f belongs to 𝒞(f, c1, c2) for some c1, c2.

###### Lemma 6

A system of r linear inequalities with integer coefficients of length at most l that has a nonnegative solution has a nonnegative solution with at most r variables nonzero, and where the variables are rationals of length polynomial in r and l. ∎

Apply this lemma to (2). Each inequality in (2) originally came from f, so there are O(|f|) of them. Further, recall that each integer coefficient in (2) came from summing up a subset of coefficients originally from f, where the subset ranges over the atoms appearing anywhere inside P expressions in f. As there are thus O(|f|) summands—and hence the summation adds only O(log |f|) to the length—and each original coefficient is also O(|f|) in length, each coefficient is O(|f|) in length as well. Thus Lemma 6 shows that without loss of generality, we may suppose that the solutions pi of (2) have length polynomial in |f|. The common denominator q of these rationals hence has length polynomial in |f| (lengths of products add). The construction of Lemma 5 has one branch for each of the nonzero pi, and hence O(|f|) branches. This shows the existence of a c1 for part (1) of the definition of 𝒞(f, c1, c2) and the existence of a c2 for part (2). We will abbreviate 𝒞(f) = 𝒞(f, c1, c2) for some choice of c1, c2 thus guaranteed.

It remains to show that given any program M ∈ 𝒞(f), we can check if M ⊨ f in polynomial time. It suffices to show that computing P_M(φ) for φ ∈ ℒ_base is polynomial time: if we know P_M(φ) for every φ that f is built out of, we can decide in linear time if M ⊨ f. Thus suppose we are given such a φ. [6] shows that one may check whether the canonical programs in part (2) of the definition of 𝒞(f) satisfy any formula of the basic language in polynomial time. Then we can easily compute P_M(φ) as simply the sum of the probabilities of each branch that satisfies φ. Doing the arithmetic to check if M ⊨ f is then certainly polynomial time, so we have our result. ∎

## 5 Conclusion and Future Work

We have defined and obtained foundational results concerning a very natural extension of counterfactual intervention on simulation models to the probabilistic case.

One critical operation in probability is conditioning, or updating probabilities given that some event is known to have occurred (on the subjective interpretation, updating a belief in light of new information). One may already define conditional probabilities in the usual way in the current framework, and our framework (without interventions) covers the conditional simulation approach to certain aspects of common-sense reasoning of [3]. In this approach, one limits oneself to the runs satisfying a certain query; the framework considered here would be equivalent for any queries expressible as formulas of ℒ_base. [2] also give a logic for reasoning about conditional probabilities. Future work would involve extending this system to probabilistic simulation models and studying the complexity of reasoning in that setting.

As [8, 6] note, the simulation model approach invalidates many important logical principles that are valid in other approaches [5, 11, 9], such as cautious monotonicity: ([α]φ ∧ [α]ψ) → [α∧φ]ψ, where α∧φ ∈ ℒ_int. However the approach is otherwise quite general, and an important future direction would be to identify and characterize subclasses of simulation models that validate this and other similar logical principles. We have begun investigating this extension. An interesting consequence it has is on the comparison of conditional probability with the probabilities of subjunctive conditionals: while these two probabilities are not in general equal, whether in the class of all probabilistic simulation models or in AS, they are equal in certain restricted classes.

A final direction we want to mention concerns “open-world” reasoning including first-order reasoning about models with some domain, where counterfactual antecedents might alter how many individuals are being considered or which individuals fall under a property or bear certain relations to each other. Recursion and the tools of logic programming [4, 10] make this very natural for the simulation model approach, and we would like to understand the first- and higher-order conditional logics that result in this approach, in both the non-probabilistic and probabilistic cases. We have also begun exploring this direction.

## References

• [1] Chater, N., Oaksford, M.: Programs as causal models: Speculations on mental programs and mental representation. Cognitive Science 37(6), 1171–1191 (2013)
• [2] Fagin, R., Halpern, J.Y., Megiddo, N.: A logic for reasoning about probabilities. Information and Computation 87, 78–128 (1990)
• [3] Freer, C.E., Roy, D.M., Tenenbaum, J.B.: Towards common-sense reasoning via conditional simulation: legacies of Turing in artificial intelligence. In: Downey, R. (ed.) Turing's Legacy: Developments from Turing's Ideas in Logic, Lecture Notes in Logic, vol. 42, pp. 195–252. Cambridge University Press (2014)
• [4] Goodman, N.D., Tenenbaum, J.B., Gerstenberg, T.: Concepts in a probabilistic language of thought. In: Margolis, E., Laurence, S. (eds.) The Conceptual Mind: New Directions in the Study of Concepts. MIT Press (2015)
• [5] Halpern, J.Y.: Axiomatizing causal reasoning. Journal of AI Research 12, 317–337 (2000)
• [6] Ibeling, D., Icard, T.: On the conditional logic of simulation models. Proc. 27th IJCAI (2018)
• [7] Icard, T.: Beyond almost-sure termination. Proc. 39th CogSci (2017)
• [8] Icard, T.F.: From programs to causal models. In: Cremers, A., van Gessel, T., Roelofsen, F. (eds.) Proceedings of the 21st Amsterdam Colloquium. pp. 35–44 (2017)
• [9] Lewis, D.: Counterfactuals. Harvard University Press (1973)
• [10] Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D.L., Kolobov, A.: BLOG: Probabilistic models with unknown objects. In: Proc. 19th IJCAI. pp. 1352–1359 (2005)
• [11] Pearl, J.: Causality. CUP (2009)