1 Introduction
A hallmark of intelligence is the ability to reason about cause and effect. Indeed, some form of causal representation is essential for robust reasoning under uncertainty, including flexible planning, delivering useful explanations, and the capacity to transfer knowledge from one domain to another. How should causal knowledge be represented? Extant formalisms vary widely both in terms of their representational primitives and in the reasoning principles they engender; witness frameworks based on similarity orderings (Lewis, 1973; Ginsberg, 1986)
, Bayesian networks, and structural equation models
(Spirtes et al., 2000; Pearl, 2009), among others. While distinct and sometimes even incompatible (see, e.g., Halpern 2013; Zhang 2013 on similarity orderings versus structural equation models, and Pearl 2009; Bottou et al. 2013 on Bayes nets versus structural equation models), each of these frameworks captures important insights about causal representation and reasoning. A second hallmark of uncertain reasoning is the ability to deal with “open-universe” or “first-order” domains in a way that does not assume a fixed, finite set of variables. Milch et al. (2005) present an illustrative example of aircraft producing identical blips on a radar screen. The radar blips give a noisy picture of the positions and velocities of some antecedently unknown number of aircraft. Reasoning about evidence in this setting introduces distinct challenges, chiefly how to encode evidence about objects one did not know existed. Observing blips, for example, is still consistent with any number of latent aircraft. The field of statistical relational modeling has developed sophisticated methods for learning and reasoning in such domains (Poole, 2003; Milch et al., 2005; Carbonetto et al., 2005; Richardson and Domingos, 2006; Srivastava et al., 2014; Gogate and Domingos, 2016).
Research on these two aspects of uncertain reasoning has proceeded mostly independently. Whereas similarity-based causal representations in principle extend to first-order settings (e.g., Friedman et al. 2000), and there have been numerous extensions of directed graphical models to handle unbounded numbers of variables (e.g., Friedman et al. 1998; Pfeffer and Koller 2000), models with a perspicuous and useful causal interpretation typically assume a fixed, finite set of variables. That is, although in the course of learning one may always infer the existence of new hidden variables, no inferred causal model would typically employ an unbounded set of variables. Theoretical work on causal representation nearly always assumes this restriction (Galles and Pearl, 1998; Halpern, 2000; Spirtes et al., 2000; Zhang, 2013; Pearl, 2009).
Such a restriction is undoubtedly appropriate for many applications. But it may also be limiting. Returning to the radar blip example, for instance, we should be able to assess such claims as, “if there were more than 100 aircraft, at least one would be missed by the radar,” or “given the observed blips, had there been five more aircraft, at least two would have been dangerously close to one another,” and so on. These are essentially first-order queries that depend on complex causal facts about an unknown number of entities and their properties and relations. Such causal queries ought to be answerable from a single underlying model (possibly under different observations), without having to specify a distinct set of variables for each use.
One promising approach to this problem comes from a tradition broadly within statistical relational modeling, whereby knowledge is encoded in the form of generative probabilistic programs. Probabilistic programs can be used as stochastic simulators and inverted to perform conditional inference given observational data, capturing an important variety of probabilistic learning (Milch et al., 2005; Goodman et al., 2008; de Raedt and Kimmig, 2015; Tran et al., 2017; Bingham et al., 2019). Encoding a generative process implicitly as a program facilitates succinct representation of complex dependencies among unbounded sets of variables.
Similarly, within research on neural networks, so-called
implicit generative models—tools such as variational autoencoders and generative adversarial networks—also represent probability distributions implicitly by means of stochastic data simulators
(Mohamed and Lakshminarayanan, 2017). Though these models are typically incapable of dealing with fully open-universe domains, extensions may in principle be expressive enough to do so (see, e.g., Li et al. 2017). Related ideas involving “mental simulation” of environmental dynamics are being increasingly explored within neural approaches to reinforcement learning
(Hamrick, 2019). Probabilistic generative models are often associated with an intuitive causal interpretation, although this interpretation is not always appropriate (Peters et al., 2017). For example, there should be a reason for thinking that parameters of the model correspond appropriately to aspects of the true underlying data-generating process (Besserve et al., 2018). Our interest here is in a more abstract question: what structural features would suffice for an open-universe model even to be a candidate causal model?
Our aim in this paper is to establish formal (axiomatic) foundations for genuinely open-universe causal models. Specifically, we want to understand and assess the subjunctive conditional claims they encode. Subjunctive conditionals—“what if?” statements about what would occur under counterfactual or hypothetical conditions—are of foundational importance for causal reasoning, arguably definitional of the subject matter (Spirtes et al., 2000; Pearl, 2009; Bottou et al., 2013; Peters et al., 2017). Such conditional statements are typically formalized by appeal to a notion of causal intervention (Meek and Glymour, 1994; Spirtes et al., 2000; Pearl, 2009), and axioms for conditionals play crucially into prominent algorithms for inferring counterfactual and interventional probabilities (“causal effects”) from observational data (Shpitser and Pearl, 2008; Pearl and Bareinboim, 2012; Hyttinen et al., 2015).
Open-universe models can be specified either implicitly via a (data-generating) program or process, or explicitly via a set of equations relating the variables. In general these two kinds of models—together with natural corresponding concepts of hypothetical/counterfactual intervention—are quite different, both conceptually and axiomatically (Ibeling and Icard, 2018). For instance, they handle loops and feedback in incompatible ways (cf. also Lauritzen and Richardson 2002). Nonetheless, we show that under suitable restrictions, well-motivated from a causal perspective, the two are expressively equivalent. Thus, our first contribution is to define two natural classes of simulation programs and structural equation models, respectively, and show equivalence of these two classes with respect to the conditional claims they entail. The characterizations appeal to an implicit (discrete) temporal structure, and can be seen as “open-universe data-generating processes” generalizing the idea of a “recursive” causal model in the sense of Pearl (2009), viewed either procedurally or declaratively. We offer this as a formalization of those generative models that could plausibly be interpreted as causal models.
Following this we establish a series of axiomatization results with respect to natural systems for reasoning about subjunctive conditionals. Our results build on previous axiomatic work on causal conditionals that assumed a fixed, finite set of variables (Galles and Pearl, 1998; Halpern, 1998, 2000), extending this work to the open-universe setting. Dealing with an unbounded set of variables presents unique challenges. Nevertheless, we show that an axiomatic system including quintessential causal principles (including those used in the aforementioned identifiability algorithms) is sound and complete for both the procedural and the declarative classes of models, thereby substantiating the causal interpretation of both classes.
Bringing out the open-universe aspect of these models, we consider augmenting the conditional language with a causal influence relation ⇝, whereby X ⇝ Y expresses that there is a context in which a change in X would lead to a change in Y (Woodward, 2003; Pearl, 2009). Whereas this relation is easily definable in the finite setting (Halpern, 1998), in our setting ⇝ is an essentially higher-order notion, quantifying over all possible interventions. Reasoning about this relation in any particular model—most notably in rich probabilistic programs—may in general be undecidable; however, the satisfiability problem for all the systems we consider remains NP-complete, showing that abstract causal reasoning (e.g., as featured in the do-calculus) in the open-universe setting is no more complex than in the finite setting.
Finally, as an additional application of this study, we consider the special case where all causal influence is local, in the sense that whenever X ⇝ Y, this influence is mediated by variables temporally in between X and Y. As we explain, this semantic assumption is quite natural in the open-universe setting, and axiomatically it corresponds exactly to the claim that ⇝ is a transitive relation.
The broader goal of this work is to establish foundations suitable for understanding and assessing highly expressive (open-universe) representational systems that encode causal structure, even if only implicitly. Such models feature not only in recent AI research, but also centrally in models of human cognition (see, e.g., Freer et al. 2012; Lake et al. 2017). Indeed, the open-universe nature of human reasoning under uncertainty is well-established (Kemp, 2012). We may thus expect that, however causal knowledge is represented in humans, it accommodates flexible, open-universe reasoning. At a high level, the present work brings together research on causal representation and reasoning with ideas and tools from statistical relational modeling and higher-order representation.
2 Two Classes of Models
In this section we introduce two kinds of models, one based on systems of equations (§2.1), the other based on algorithms (§2.2). They respectively emphasize declarative and procedural styles of modeling. Owing in part to this difference, the most general versions of the two can be distinguished even at a very abstract, axiomatic level (Ibeling and Icard, 2018). However, by restricting to appropriate (causally motivated) subclasses, we can demonstrate their equivalence (Thm. 1).
Throughout this section we assume a signature specifying a countably infinite set 𝒱 of variables that take on values from a set 𝒪. For example, we might have a variable in 𝒱 representing “the position of the 77th aircraft,” which could take on values in some numerical range (or a special value “undefined” if there is no 77th aircraft). Even if we restrict each variable to take only finitely many possible values, the fact that 𝒱 is infinite allows encoding arbitrarily complex structures.
2.1 Structural Equation Models
The most general class of structural equation models (see, e.g., Pearl 2009) allows arbitrary equations among arbitrary sets of variables. Because our primary interest here is in (open-universe, probabilistic) generative models, we restrict attention to a special subclass of possible sets of equations. First, we assume the infinite set of variables 𝒱. Second, similarly to so-called recursive models (Pearl, 2009), we will assume that variables can be given a causal (intuitively, temporal) order. Although there may well be adequate causal interpretations of non-recursive models, e.g., as descriptions of equilibrium behavior of some underlying process (Strotz and Wold, 1960), these interpretations rely on the declarative character of equational modeling and cannot always be adequately captured by an appropriate data-generating process (Lauritzen and Richardson, 2002).
Third, unlike in dynamic Bayesian networks and related extended graphical models (Dean and Kanazawa, 1989; Friedman et al., 1998), we do not require each variable to depend on only finitely many others; a variable may depend on an unbounded number of variables preceding it in a temporal order. Fourth, we require all equations to be uniformly computable (Defn. 4).
Note, finally, that we make no use here of the distinction between exogenous and endogenous variables (though see the discussion in §2.3 below).
Definition 1 (Structural Equation Model).
A structural equation model (SEM) is a collection {f_X}_{X∈𝒱} of partial functions f_X : 𝒪^𝒱 ⇀ 𝒪, with one f_X for each X ∈ 𝒱, and a time map t : 𝒱 → ℕ. Each f_X is a function only of preceding variables: if v, v′ ∈ 𝒪^𝒱 and v(Y) = v′(Y) for all Y such that t(Y) < t(X), then f_X(v) = f_X(v′).
The functions in an SEM ℱ specify structural equations that are to be simultaneously satisfied: an SEM has solution v if v(X) = f_X(v) for all X ∈ 𝒱. We write ℱ ⊨ v in this case. Since the order induced by t is well-founded and acyclic, v may be built up iteratively, and is thus unique. Intervention on an SEM is defined standardly (Meek and Glymour, 1994; Pearl, 2009):
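To fix intuitions, the iterative construction of the unique solution can be sketched in a few lines of Python (a toy rendering under assumed names; `SEM` and `solve` are illustrative, not part of the formalism, and only finitely many variables appear here for the sake of the example):

```python
# Minimal sketch of Defn. 1: each variable carries a time stamp, and its
# structural function may read only the values of temporally earlier
# variables. Evaluating in time order then assembles the unique solution.
class SEM:
    def __init__(self, time, functions):
        self.time = time            # variable -> time step (well-founded order)
        self.functions = functions  # variable -> function of a partial solution

    def solve(self):
        # Because f_X reads only variables with strictly smaller time,
        # iterating in time order yields the unique simultaneous solution.
        solution = {}
        for v in sorted(self.functions, key=self.time.get):
            solution[v] = self.functions[v](solution)
        return solution

# Example equations: X := 1, Y := X + 1, Z := X + Y.
sem = SEM(
    time={"X": 0, "Y": 1, "Z": 2},
    functions={
        "X": lambda s: 1,
        "Y": lambda s: s["X"] + 1,
        "Z": lambda s: s["X"] + s["Y"],
    },
)
print(sem.solve())  # {'X': 1, 'Y': 2, 'Z': 3}
```

The well-foundedness of the time map is exactly what licenses the single forward pass.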
Definition 2 (Intervention on an SEM).
An intervention ι is a partial function ι : 𝒱 ⇀ 𝒪. It specifies variables to be held fixed, and the values to which they are fixed. Intervention induces a mapping of SEMs, also denoted ι, as follows. Abbreviate ι(ℱ) as ℱ^ι. Then ℱ^ι is identical to ℱ, but with f_X replaced by the constant function ι(X) for each X ∈ dom(ι).
Given interventions ι, ι′ we say ι′ is a restriction of ι if dom(ι′) ⊆ dom(ι) and ι′(X) = ι(X) for all X ∈ dom(ι′). Note an intervention may be a total function 𝒱 → 𝒪. We define a natural relation of causal influence in SEMs as follows (after Woodward 2003; Pearl 2009).
Definition 3 (Causal influence).
Let X, Y ∈ 𝒱 with X ≠ Y. Then X ⇝ Y (read X influences Y) if there is an intervention ι (the context) and values x, x′ ∈ 𝒪 so that, for z ∈ {x, x′}, letting ι_z be the intervention extending ι by fixing X to z and v_z be the solution of ℱ^{ι_z}, we have v_x(Y) ≠ v_{x′}(Y).
If X ⇝ Y then t(X) < t(Y); this is easy to see again from the fact that the temporal order implies that a solution may be assembled iteratively.
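Defns. 2 and 3 can likewise be made concrete (again a sketch with hypothetical helper names; the intervention is a partial map of variables to values, and influence is witnessed by toggling the intervened value in some context):

```python
# Intervention (Defn. 2) replaces intervened functions by constants; causal
# influence (Defn. 3) holds when, in some context, toggling X changes Y.
def solve(functions, time, intervention=None):
    intervention = intervention or {}
    solution = {}
    for v in sorted(functions, key=time.get):
        if v in intervention:
            solution[v] = intervention[v]  # constant function at the fixed value
        else:
            solution[v] = functions[v](solution)
    return solution

time = {"X": 0, "Y": 1, "Z": 2}
functions = {
    "X": lambda s: 0,
    "Y": lambda s: s["X"],
    "Z": lambda s: s["Y"],
}

# X influences Z: in the empty context, setting X to 0 vs. 1 changes Z.
z0 = solve(functions, time, {"X": 0})["Z"]
z1 = solve(functions, time, {"X": 1})["Z"]
print(z0, z1)  # 0 1

# In the context that holds Y fixed, toggling X no longer changes Z,
# illustrating that influence is witnessed relative to a context.
z0c = solve(functions, time, {"X": 0, "Y": 5})["Z"]
z1c = solve(functions, time, {"X": 1, "Y": 5})["Z"]
print(z0c == z1c)  # True
```

Note that the chain here also illustrates the remark above: influence runs only forward along the time map.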
For comparison with simulation models, it is helpful to identify those SEMs whose (counterfactual) solutions are computable. Similar restrictions have been explored in the literature, e.g., on approaches to causal inference based on so-called algorithmic independence (Janzing and Schölkopf, 2010; Peters et al., 2017). This is the subclass in which the functions f_X are uniformly computable and t is also computable (both presuming an encoding of 𝒱):
Definition 4 (Computable Structural Equation Model).
Let ℱ be an SEM with time map t, and let C be the set of computable functions v : 𝒱 → 𝒪. We say ℱ is computable if t is computable, f_X(v) is defined for all X ∈ 𝒱 and v ∈ C, and the function F defined by F(X, v) = f_X(v) is a computable function.
To clarify the sense in which F is computable, let us associate each variable in 𝒱 with a square of an infinite Turing machine tape with alphabet 𝒪. We call such a tape a variable tape. A variable tape (no square of which is blank) stores a function v : 𝒱 → 𝒪. That F is computable means that it is computed by a machine M with two input tapes, the second of which is a variable tape. When M is run with the encoding of some X ∈ 𝒱 stored on the first tape and a computable function v stored on the second (variable) tape, M halts outputting the value f_X(v). M need not halt given uncomputable input v. Let us write 𝒞 for the class of all computable SEMs; clearly 𝒞 is closed under computable intervention and every SEM in 𝒞 has a unique solution.
Let 𝒞_loc be the subclass of the computable SEMs 𝒞 in which every f_X depends only on immediately preceding variables: if v(Y) = v′(Y) for all Y such that t(Y) = t(X) − 1, then f_X(v) = f_X(v′). Causal influence is not transitive in SEMs generally, but it is transitive in 𝒞_loc (Pearl, 2009). Furthermore, we can prove the following “interpolation” result:
Proposition 1.
Let ℱ ∈ 𝒞_loc and X, Z ∈ 𝒱. If X ⇝ Z and t(Z) > t(X) + 1, there is a variable Y with t(X) < t(Y) < t(Z) such that X ⇝ Y and Y ⇝ Z.
Proof.
There is a context and for such that and (Defn. 3); thus (Defn. 4). Let be the set of variables whose squares the computation of accesses before outputting a value for and halting; crucially is finite. Form intervention as the restriction of to and let be the solution . Since , . Let be those variables in for which , and consider a finite sequence where , and for , with the sole exception that . Let be the solutions . Since , we have ; there is thus some , for which . Then witnesses that and witnesses that as . ∎
Taking the temporal interpretation of t seriously, Prop. 1 reflects the intuition that causal influence is always mediated through time, essentially enforcing a kind of Markov assumption.
2.2 Monotone Simulation Models
Our second class of models similarly deals with an infinite class of variables, but is intended to capture the procedural character of a simulation program, e.g., as defined in an expressive programming language. Such models have been studied in relative generality by means of (probabilistic) Turing machines (see, e.g., Freer et al. 2012 for an overview), and at this level of generality they can be shown to validate only a very weak set of conditional axioms (Ibeling and Icard, 2018), far weaker than those commonly required of causal conditionals (cf. §3.2 below). Here we restrict attention to machines in which the variables can be given a causal order, again analogously to recursive graphical models. As simulation programs are intended to capture an underlying data-generating process, this causal order can be well justified (cf. Lauritzen and Richardson 2002).
Definition 5 (Simulation model).
A simulation model (or just a simulation) is a deterministic Turing machine 𝒮 with two tapes: a work tape and a variable tape (see discussion following Defn. 4). The variable tape is write-only: no variable head transitions in which a symbol of 𝒪 is erased or rewritten are allowed.
Thanks to the write-only restriction, any variable is written at most once. Given a simulation 𝒮, if every X ∈ 𝒱 is eventually written with the value v(X), then v is the solution of 𝒮 and we write 𝒮 ⊨ v. Intervention is defined via blocking rewrites (Icard, 2017; Ibeling and Icard, 2018):
Definition 6 (Intervention).
Given intervention ι and an oracle for ι, the simulation 𝒮^ι emulates 𝒮 but acts as if the square for any X ∈ dom(ι) is fixed to the value ι(X);¹ it dovetails this emulation with a procedure that writes ι(X) to the square for X, for all X ∈ dom(ι). (¹Formally, suppose at some point in the execution the next variable tape transition of 𝒮 writes a symbol o to the square for X. Before simulating this transition, 𝒮^ι calls the oracle on X. If X ∉ dom(ι) then the simulation proceeds unimpeded: o is written to the square and the machine transitions as specified. If X ∈ dom(ι) then instead, the simulation performs the same action with o replaced by ι(X).)
In other words, an intervention on a simulation program 𝒮 is an operation on the code of 𝒮 that holds some set of variables to fixed values, an operation that will typically have side effects on other variables.
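The write-once tape and the blocking of rewrites can be sketched as follows (a toy rendering under assumed names; `Tape` and `run` are illustrative, not part of the formalism):

```python
# Sketch of Defns. 5-6: each variable is written at most once, and an
# intervention blocks writes to intervened variables, whose squares are
# pre-filled with the fixed values; downstream side effects follow
# automatically because later writes read the tape.
class Tape:
    def __init__(self, intervention=None):
        self.cells = dict(intervention or {})  # intervened squares pre-filled
        self.blocked = set(self.cells)         # writes to these are ignored

    def write(self, var, value):
        if var in self.blocked:
            return                             # blocked rewrite (Defn. 6)
        assert var not in self.cells, "write-only tape: variable written twice"
        self.cells[var] = value

def run(intervention=None):
    # The "program": X := 1; Y := X; Z := Y. Reads go through the tape, so
    # intervening on Y propagates to Z but leaves X untouched.
    t = Tape(intervention)
    t.write("X", 1)
    t.write("Y", t.cells["X"])
    t.write("Z", t.cells["Y"])
    return t.cells

print(run() == {"X": 1, "Y": 1, "Z": 1})          # True
print(run({"Y": 7}) == {"X": 1, "Y": 7, "Z": 7})  # True
```

The side effect mentioned above is visible in the second call: fixing Y changes Z even though Z itself was not intervened on.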
A computable intervention is one whose oracle can be effectively implemented; simulations are thus closed under computable intervention. The primary phenomena of interest are the behaviors of a simulation under possible interventions. This motivates a notion of equivalence:
Definition 7.
Simulations 𝒮₁, 𝒮₂ are equivalent, 𝒮₁ ≡ 𝒮₂, if for any computable intervention ι and any v, we have 𝒮₁^ι ⊨ v iff 𝒮₂^ι ⊨ v. They are weakly equivalent if this property holds for the empty intervention ι = ∅.
It is immediate that interventions compose: for any interventions ι₁, ι₂, there is an intervention γ such that (𝒮^{ι₁})^{ι₂} ≡ 𝒮^γ for all 𝒮. If ι₁, ι₂ are computable then so is γ.
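Viewing finite interventions as partial maps, composition amounts to merging them, with the later intervention taking precedence where the domains overlap (a sketch; the tie-breaking convention is an assumption here, since the text's composition is stated abstractly):

```python
# Composition of interventions as a merge of partial maps: applying iota1
# and then iota2 is the same as applying a single intervention gamma, where
# values fixed by the later iota2 win on any overlap.
def compose(iota1, iota2):
    gamma = dict(iota1)
    gamma.update(iota2)
    return gamma

print(compose({"X": 0}, {"Y": 1}))  # {'X': 0, 'Y': 1}
print(compose({"X": 0}, {"X": 2}))  # {'X': 2}
```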
The mere fact that a variable has been intervened on may affect a simulation, even if it has been held fixed to its actual value (Ibeling and Icard, 2018). This behavior differs starkly from that of SEMs, so to compare the models we must consider only the class of simulations in which non-counterfactual interventions are idempotent:
Definition 8.
We say 𝒮 is (strongly) functional if for any ι with 𝒮^ι ⊨ v and any restriction ι′ of v (regarded as a total intervention), 𝒮^ι and (𝒮^ι)^{ι′} are weakly equivalent.
We define the causal influence relation ⇝ in simulations analogously to Defn. 3. We call a simulation monotone if it is well-ordered with respect to causal influence:
Definition 9.
A simulation 𝒮 is monotone if there is a computable time map t : 𝒱 → ℕ preserving ⇝: whenever X ⇝ Y, we have t(X) < t(Y).
It follows from compositionality of interventions that if 𝒮 is monotone under t, then so is 𝒮^ι for any ι. Here we consider only simulations that have a solution under any computable intervention. Let ℳ be the class of such simulations that are also functional and monotone.
2.3 Relation to Existing Work
Thm. 1 below shows that the class of computable SEMs can essentially be seen as declarative versions of the class of generative programs. Both classes of models naturally accommodate probability. For SEMs, we simply identify a subset of the variables as exogenous and associate those variables with an appropriate probability distribution, which in turn induces (conditional and interventional) distributions on the remaining (“endogenous”) variables (Pearl, 2009). For simulation programs, we simply add an additional read-only random bit tape whose distribution induces random behavior in the machine, which in turn implicitly defines (conditional and interventional) distributions on the variables (cf. Freer et al. 2012). The resulting models encompass a wide range of familiar and powerful formalisms.
For instance, the class ℳ of functional, monotone simulations clearly incorporates all computable (Defn. 4) and recursive (in the sense of Pearl 2009) SEMs. Likewise, ℳ certainly includes Bayesian networks as well as dynamic Bayesian networks (Dean and Kanazawa, 1989), in addition to standard dynamic programming algorithms and feedforward neural networks. But crucially, ℳ also encompasses programs in which variables may depend on infinitely many others, such as those defined in common probabilistic programming languages (Milch et al., 2005; Goodman et al., 2008; de Raedt and Kimmig, 2015; Tran et al., 2017; Bingham et al., 2019). This includes, e.g., the BLOG program formalizing the radar blip example, among many others.
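To illustrate the open-universe character of such programs, here is a toy generative model in the spirit of the radar example (a sketch only; the actual BLOG program is richer, and all names here are illustrative). The point is that the program implicitly defines variables position(i) and blip(i) for every i, unboundedly many a priori:

```python
# A toy open-universe generative program: the latent number of aircraft is
# sampled first, and each aircraft then yields a position and (noisily) a
# radar blip. Observing the blips leaves the number of aircraft uncertain.
import random

def radar_model(seed=0):
    rng = random.Random(seed)
    n = rng.randint(0, 10)                  # latent number of aircraft
    world = {"num_aircraft": n}
    for i in range(n):
        world[f"position({i})"] = rng.uniform(0.0, 100.0)
        # Detection is noisy: each aircraft produces a blip with prob. 0.9.
        world[f"blip({i})"] = rng.random() < 0.9
    return world

world = radar_model(seed=1)
blips = sum(1 for k, v in world.items() if k.startswith("blip(") and v)
# The observed blip count is always a lower bound on the number of aircraft.
print(world["num_aircraft"], blips)
```

Conditioning such a program on the observed blips, and intervening on it, is precisely the kind of causal use of generative programs at issue in this paper.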
2.4 The Equivalence of Monotone Simulations and Computable SEMs
Analogously to Defn. 7, we can assess equivalence between any two causal models, whether simulations or SEMs. It turns out that the class ℳ of functional, monotone simulations and the class 𝒞 of computable SEMs are equivalent in the following sense.
Theorem 1.
For any 𝒮 ∈ ℳ there is an ℱ ∈ 𝒞 such that 𝒮 ≡ ℱ, and vice versa.
Proof.
Let ℱ ∈ 𝒞. Consider the simulation 𝒮 with this pseudocode: 𝒮 consists of an infinite loop calling the following subroutine on every X ∈ 𝒱. The subroutine returns immediately if the variable tape square for X isn’t blank. Otherwise it emulates a computation of f_X (Defn. 4) and writes the result to the square for X. The emulated input variable tape (which holds the second argument of F) behaves as follows on access to the square for some variable Y. If the square for Y in the original output variable tape (which 𝒮 writes to) is not blank, then the emulated tape holds the value from the original tape. If it is blank, then the simulation computes t(X) and t(Y); if t(Y) < t(X) then the subroutine is recursively called on Y so that the square is no longer blank and one can proceed as above. Otherwise, the emulated tape contains some arbitrary value in its square.
Note is functional since the square for any can be written only by a call to . Next let us show that , which immediately implies that is monotone. Let be an intervention, and suppose . We show by induction that . The base case: a variable with . If then for any input ; thus writes . is trivial. Now consider an with . By induction and construction of the emulator, the value output by is at some such that for all such that . By Defn. 1, so we are done. An analogous induction shows that if then .
Now let . Define by giving the following pseudocode to compute : emulate the run of , where is the restriction of to all variables but , until reaching a write to the emulated square, and halt outputting this value. This is computable in the sense of Defn. 4 since the oracle (Defn. 6) for on can simply check that and then look on the input variable tape to find the required value . If , when computing any the above is a restriction of , so by functionality (Defn. 8). Consider Lem. 1 below for the converse.
Lemma 1.
Let , be an intervention, and . Suppose is the restriction of to those such that , with and . Then for all with .
Proof.
Such an is written to at some point in executing both so we have finite restrictions of such that write the same value to as do resp. Transforming into stepwise (cf. proof of Prop. 1) by adding one more variable to the domain in each step, if then some step yields a such that but . ∎
Suppose ; we show by induction. The base case is again a variable with . If , apply Lem. 1 choosing as ; the restriction is empty, so the value output for by agrees with that output by , which in turn agrees with that output by since and . For the inductive case with , choosing in Lem. 1 and applying functionality and composition gives that writes to its square. ∎
3 The Conditional Logic of OpenUniverse Causal Models
Thm. 1 establishes a strong equivalence between simulation models and (computable) SEMs, showing that they encode exactly the same underlying conditional theories. In this section we offer an axiomatic characterization of these conditional theories.
We introduce two formal languages of subjunctive conditionals interpreted in either 𝒞 or ℳ under their respective semantics of intervention. We then find sound and complete axiomatizations of their validities. For reasons of simplicity and elegance, but not necessity, we assume a binary alphabet 𝒪 = {0, 1} in this section.
3.1 Syntax and Semantics
We admit the variables of 𝒱 as atoms and let ℒ_base be the language of propositional formulas over these atoms. Let ℒ_int be the language of purely conjunctive, ordered formulas of literals formed from distinct atoms. To be precise, let us call X or ¬X an X-literal. Then ℒ_int consists of formulas of the form α = l₁ ∧ ⋯ ∧ l_n, where each l_i is an X_i-literal for some distinct X_i. Each α specifies a finite intervention by giving fixed values for a fixed list of variables, and we also write α for this intervention. ℒ_int includes the empty conjunction ⊤ with the empty intervention.
Let ℒ_cond be the language of formulas of the form [α]φ for α ∈ ℒ_int and φ ∈ ℒ_base. We call such a formula a subjunctive conditional, and call α the antecedent and φ the consequent. The overall conditional language ℒ is the language of propositional formulas over atoms in ℒ_cond. For φ ∈ ℒ_base, φ abbreviates [⊤]φ, and ⟨α⟩φ means ¬[α]¬φ.
To give the semantics of ℒ, we first define a satisfaction relation between solutions v and formulas of ℒ_base. For a variable X, write v ⊨ X iff v(X) = 1. For arbitrary φ ∈ ℒ_base, v ⊨ φ is defined familiarly by recursion. Now we may define satisfaction for ℒ_cond. Let M be a model in either 𝒞 or ℳ. Write M ⊨ [α]φ if M^α has solution v and v ⊨ φ. Finally, satisfaction for the entire language ℒ is defined by recursion.
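The conditional semantics is easy to operationalize in a toy setting (a sketch under assumed names; here a consequent is modeled simply as a predicate on solutions rather than a parsed formula):

```python
# [alpha]phi holds in a model if the unique solution under the intervention
# alpha satisfies phi; with no intervention we recover plain satisfaction.
def solve(functions, time, alpha):
    s = {}
    for v in sorted(functions, key=time.get):
        s[v] = alpha[v] if v in alpha else functions[v](s)
    return s

def holds(functions, time, alpha, phi):
    """[alpha]phi: evaluate phi on the unique solution under alpha."""
    return phi(solve(functions, time, alpha))

time = {"X": 0, "Y": 1}
functions = {"X": lambda s: 0, "Y": lambda s: 1 - s["X"]}  # Y := NOT X

# The conditional [X=1](Y=0) holds, while the unintervened claim Y=0 fails.
print(holds(functions, time, {"X": 1}, lambda s: s["Y"] == 0))  # True
print(holds(functions, time, {}, lambda s: s["Y"] == 0))        # False
```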
We also consider an augmented language ℒ^⇝, defined by extending ℒ to include additional atoms X ⇝ Y for X, Y ∈ 𝒱. This is interpreted as causal influence: X ⇝ Y holds in model M if X influences Y in M (recall Defn. 3).
Thm. 1 implies that any φ is satisfiable in 𝒞 iff it is satisfiable in ℳ. Hence satisfiability is oblivious to the class in which we interpret formulas, and we henceforth omit mention of the class.
A first observation is that our language and interpretations do not enjoy the property of compactness (the property that every unsatisfiable set of formulas has a finite unsatisfiable subset), and our axiomatizations are hence only complete relative to finite sets of assumptions:
Proposition 2.
The languages ℒ and ℒ^⇝ are not compact.
Proof.
Enumerate the variables as X₀, X₁, X₂, …. Consider the set Γ = {[X_{i+1}]X_i ∧ [¬X_{i+1}]¬X_i : i ∈ ℕ}. Every finite subset of Γ is easily seen to be satisfiable. But each formula implies that t(X_{i+1}) < t(X_i), contradicting that the temporal order is well-founded; thus Γ is unsatisfiable. ∎
3.2 Axiomatizations
We now give an axiomatization AX of the validities of ℒ. Start with the base system from Ibeling and Icard (2018), obtained by the following axioms and rules:
Propositional calculus  
The causal axiom of effectiveness (Galles and Pearl, 1998; Halpern, 2000) is encoded by RW and R, while F/D expresses the fact that solutions always exist uniquely. The base system without F/D—a system significantly weaker than any existing logic of causal conditionals in the literature—has been shown sound and complete with respect to the general class of simulation models, while F/D can be added to axiomatize those programs that are deterministic and/or always halt (Ibeling and Icard, 2018). In the present setting we are assuming determinism (but allowing a natural incorporation of probability; recall §2.3).
In order to capture the logic of our restricted classes of SEMs and simulation programs, we add two further axioms to obtain the system AX. Axiom C is known variously as cautious monotonicity (Kraus et al., 1990) or composition (Galles and Pearl, 1998; Pearl, 2009), and schema Rec is known as recursiveness (Halpern, 1998, 2000).
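For readers unfamiliar with these schemas, their standard forms in the literature are roughly the following (a paraphrase in interventionist notation after Galles and Pearl (1998) and Halpern (2000), not necessarily this paper's exact formulation):

```latex
% Standard forms of the causal schemas named in the text, where [\alpha]\varphi
% reads "after intervention \alpha, \varphi holds". This is a paraphrase of the
% usual presentations, not a verbatim reproduction of the paper's schemas.
\begin{align*}
\text{(Effectiveness)}\quad
  & [\alpha \wedge X{=}x]\,(X = x) \\
\text{(Composition, C)}\quad
  & \bigl([\alpha]\varphi \wedge [\alpha](X = x)\bigr)
    \rightarrow [\alpha \wedge X{=}x]\,\varphi \\
\text{(Recursiveness, Rec)}\quad
  & \neg\bigl((X_0 \rightsquigarrow X_1) \wedge \cdots \wedge
      (X_{n-1} \rightsquigarrow X_n) \wedge (X_n \rightsquigarrow X_0)\bigr)
\end{align*}
```

Composition says that anything true under α, including the actual value of an unintervened X, remains true when X is additionally held fixed at that value; recursiveness rules out cycles of influence.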
In Rec, each l_i is an X_i-literal. Note that the axiom commonly known as reversibility is easily derivable in AX (Galles and Pearl, 1998).
For ℒ^⇝, consider the additional axioms Wit and Trans, where in Wit, l is an X-literal. Let AX^⇝ be AX together with Wit.
We are now in a position to present our three axiomatization results.
Theorem 2.
AX is sound and complete with respect to the validities of ℒ.
Proof.
By Thm. 1 it suffices to carry out the proof for 𝒞 only. Soundness is straightforward. As for completeness, we show that any φ consistent with AX is satisfiable. Let V be the set of variables that appear as atoms in φ, let ℒ_V be the fragment of ℒ using only atoms from V, and extend φ to a maximal consistent set Φ. We construct an SEM satisfying all of Φ, and, in particular, φ.
Let us first give . Define an irreflexive relation on by if any instance of the schema is in , where is an literal. By consistency with Rec there is a total order consistent with . Assign the least element to , and iteratively remove least elements to obtain a injective on .
Now we may give the structural equations. Note that for every and there is exactly one formula of the form , where is an literal, in : the forward direction of F/D
shows there is at most one, and the backward direction shows there is one. Thus form a vector
by if are in respectively, and if . Suppose includes exactly the variables for which . Let represent an extension of to all of ; then define on any such extension as . For all , let and .It remains to prove that satisfies every formula of , for which it suffices to show that iff . To this effect we prove that , where , by induction on the time . If then by we have ; thus suppose . Consider an that includes the value , where the equality holds inductively, for each such that . Thus by K we have that . Suppose that differ at , e.g. but . Then so by C, . By C and Rec we must have also. But this term specifies , and hence this contradicts that . ∎
Theorem 3.
AX^⇝ is sound and complete with respect to the validities of ℒ^⇝.
Proof.
Soundness is again straightforward; we show a consistent is satisfiable. Extend to a maximal consistent set using only atoms. Define a relation by if . By consistency with we have that there is a total order consistent with . Now, reproduce the construction from the proof of Thm. 2 on . By consistency with Wit, is consistent with , so we may extend them to the same total order, from which we obtain . Let be the model thus constructed.
Now, obtain a new model by extending as follows. For every pair , introduce a new variable with and . It is possible to add such variables since is infinite. Now, modify the structural functions for each as follows. Labelling the variables for which as :
This modification is admissible because is consistent with . We claim that satisfies all of . Any is clearly satisfied: is satisfied by and in we have for all , so the structural equations are unmodified from those of . Thus, suppose that an extended atom . Consider an intervention that holds to , and holds all other to . Then under , we have the modified structural equation , so .
Now suppose . Suppose toward a contradiction that . Then there is an intervention under which toggling changes . If does not set any variables, then the structural equations are the same in and we have that , where is a literal, contradicting consistency with Wit. If sets any to for , then the value of is fixed to . The only case remaining is when only for some . If then there is a contradiction since the modification was only made if . Thus . The structural equation for then becomes ; further, is fixed to since , so this is impossible. ∎
Theorem 4.
AX^⇝ together with Trans is sound and complete with respect to the validities of ℒ^⇝ over the class 𝒞_loc of local models (§2.1).
Proof.
Soundness follows since causal influence in is transitive, as we remarked before. To show completeness, as before, extend a given consistent to a maximal consistent set , and define by if . By consistency with , is a strict partial order. Again reproduce the construction from the proof of Thm. 2; by consistency with Wit, is consistent with and we extend them to the same total order, obtaining and .
First convert to a model . In , the structural function for a variable may be written as , where are precisely the variables preceding . To form , for each that does not satisfy , add a chain of variables such that and with structural equations for all , and change the structural function for to . It is clear that iff for any .
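The chain construction can be made concrete. The following is a minimal Python sketch, not the paper's formalism (the names `solve`, `X`, `M1`, `M2`, `Y` are illustrative, and the sketch omits the time labels the proof assigns to chain variables): a finite recursive SEM is a dict of structural functions evaluated in causal order, and a two-variable chain is inserted to mediate a direct dependence while preserving behavior under interventions.

```python
# Illustrative sketch: a finite recursive SEM as a dict of structural
# functions, evaluated in causal order; intervened variables are clamped.
# All names are hypothetical, not from the paper.

def solve(equations, order, intervention=None):
    """Evaluate a recursive SEM by substituting values in causal order."""
    intervention = intervention or {}
    values = {}
    for v in order:
        if v in intervention:
            values[v] = intervention[v]       # clamped by the intervention
        else:
            values[v] = equations[v](values)  # structural function
    return values

# Original model: Y depends directly on X.
eqs = {"X": lambda vals: 1, "Y": lambda vals: vals["X"]}
order = ["X", "Y"]

# As in the construction, insert a chain mediating X -> Y:
# X -> M1 -> M2 -> Y, routing the dependence through the chain.
chained = {
    "X": lambda vals: 1,
    "M1": lambda vals: vals["X"],
    "M2": lambda vals: vals["M1"],
    "Y": lambda vals: vals["M2"],
}
chain_order = ["X", "M1", "M2", "Y"]

# The chained model agrees with the original on Y, with and without
# an intervention on X.
assert solve(eqs, order)["Y"] == solve(chained, chain_order)["Y"]
assert (solve(eqs, order, {"X": 0})["Y"]
        == solve(chained, chain_order, {"X": 0})["Y"])
```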
Now modify to obtain a model that we will show to satisfy all of . To form , for every pair such that , add a chain and a single variable . These have the times , and ; and the structural equations , for , and . Modify the structural equation for as follows, labelling the variables for which as :
The modification is admissible because all variables involved are assigned to . We claim that satisfies all of . Again, this holds for any literal, so suppose . Under an intervention that holds the to , we have that so that .
Now suppose that . The intervention witnessing this fixes either none, or one of the to . If it fixes none, then taking the values of all the variables into an intervention gives a witness that , so by Wit, . Thus suppose it fixes just one to . This gives that , with . Now ; we thus repeat the argument obtaining a sequence . Since is transitive (from Trans) we finally obtain . ∎
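For intuition about the role of Trans, here is an illustrative brute-force check (not from the paper) of one transitivity instance in a small boolean SEM: X influences Z if, under some intervention to a subset of the remaining variables, toggling X changes Z. In the chain X → Y → Z below, influence is transitive, as the axiom requires for this class of models.

```python
from itertools import combinations, product

def solve(equations, order, intervention):
    """Evaluate a recursive SEM in causal order; intervened variables are clamped."""
    vals = {}
    for v in order:
        vals[v] = intervention[v] if v in intervention else equations[v](vals)
    return vals

def influences(equations, order, x, z):
    """True if toggling x changes z under some intervention to other variables."""
    others = [v for v in order if v not in (x, z)]
    # Consider every intervention to a subset of the remaining variables
    # (including the empty intervention, which leaves mediated paths open).
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            for bits in product([0, 1], repeat=r):
                held = dict(zip(subset, bits))
                if (solve(equations, order, {**held, x: 0})[z]
                        != solve(equations, order, {**held, x: 1})[z]):
                    return True
    return False

# Chain X -> Y -> Z: X influences Y, Y influences Z, and (transitively)
# X influences Z via the unmediated (empty) intervention.
eqs = {"X": lambda v: 0, "Y": lambda v: v["X"], "Z": lambda v: v["Y"]}
order = ["X", "Y", "Z"]
assert influences(eqs, order, "X", "Y")
assert influences(eqs, order, "Y", "Z")
assert influences(eqs, order, "X", "Z")
```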
Proposition 3.
Proof Sketch.
In each case the proof of completeness produces a small model, usable as a certificate: variables have to be considered in the first, second, and third cases respectively. ∎
3.3 Relation to Existing Work
Thm. 2 shows that the class of open-universe generative programs defined in §2.2 satisfies a natural set of axioms. In fact, the system AX encompasses all of the principles about counterfactual conditionals used in the complete identification algorithm of Shpitser and Pearl (2008). This, together with Thm. 1, lends at least some credence to the idea that these “merely implicit” causal models can be understood on a par with more familiar explicit causal representations such as SEMs.
The previous literature on causal conditionals has explored classes of axioms and models that are quite different from those considered here. For instance, in addition to studying the (finite) recursive structural equation models, Halpern (1998, 2000) axiomatizes the class of SEMs that have a unique solution (but may not be recursive), as well as the class of all SEMs built of arbitrary equations, which may in general lack solutions. Similarly, Zhang (2013) considers classes of SEMs with desirable sets of solutions. Some of the central axioms for these classes make reference to all variables of the signature and thus cannot be translated into the open-universe setting. More fundamentally, it is not evident which of these models have an adequate procedural interpretation.
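The contrast with non-recursive SEMs can be seen in a two-variable example. The sketch below (illustrative, not from the cited works) brute-forces the solutions of a boolean structural equation system: the pair X = ¬Y, Y = X has no solution, while X = ¬Y, Y = ¬X has two.

```python
from itertools import product

# Illustrative sketch: non-recursive boolean equation systems may lack
# solutions. `solutions` is a hypothetical brute-force checker.

def solutions(equations, variables):
    """Enumerate total 0/1 assignments satisfying every structural equation."""
    sols = []
    for bits in product([0, 1], repeat=len(variables)):
        vals = dict(zip(variables, bits))
        if all(vals[v] == f(vals) for v, f in equations.items()):
            sols.append(vals)
    return sols

# X = not Y together with Y = X forces X = not X: no solution.
no_solution = {"X": lambda v: 1 - v["Y"], "Y": lambda v: v["X"]}
# X = not Y together with Y = not X: two solutions, (0,1) and (1,0).
two_solutions = {"X": lambda v: 1 - v["Y"], "Y": lambda v: 1 - v["X"]}

assert solutions(no_solution, ["X", "Y"]) == []
assert len(solutions(two_solutions, ["X", "Y"])) == 2
```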
An analogous generalization in our setting might be to consider simulation models that “crash” under certain interventions and fail to have a solution. If we allow (Defn. 4) to be only partial computable on , the construction from Thm. 1 shows that every partial SEM has an equivalent simulation model in this wider class; but simple counterexamples show the reverse direction does not hold. Logically, only the forward direction of axiom F/D remains sound, and we leave the question of axiomatizing this wider class for future work.
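A toy illustration of such a “crashing” simulation model, under assumed names (`program`, `run_bounded`) that are not the paper's Defn. 4: the program is written as a generator so an external runner can bound its steps; under the intervention X := 0 it diverges and yields no output.

```python
# Illustrative sketch of a simulation model that is only partial:
# it halts under most interventions but diverges under X := 0.
# All names are hypothetical.

def program(intervention):
    """Toy generative program as a generator; yields its output if it halts."""
    x = intervention.get("X", 1)
    while x == 0:      # diverges under the intervention X := 0
        yield None     # one "step" of computation, no output
    yield {"X": x, "Y": x + 1}

def run_bounded(prog, intervention, max_steps=100):
    """Run at most max_steps steps; return the output, or None on timeout."""
    for i, step in enumerate(prog(intervention)):
        if step is not None:
            return step
        if i >= max_steps:
            return None
    return None

assert run_bounded(program, {}) == {"X": 1, "Y": 2}
assert run_bounded(program, {"X": 0}) is None  # no output under X := 0
```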
4 Conclusion
We have identified two equivalent classes of models—one declarative, one procedural—formalizing the notion of an open-universe causal model. Both classes validate an intuitive and familiar set of principles about subjunctive conditionals and the relation of causal influence. This highlights an important class of implicit generative models that can plausibly be treated as genuine causal models, on a par with (an infinitary generalization of computable, recursive) structural equation models. More detailed work is of course needed to identify concrete cases in which components of learned generative models support legitimate causal counterfactuals (see, e.g., Besserve et al. 2018 for progress on this question).
From an axiomatic perspective, it would be desirable to extend the present treatment to the full probabilistic setting, since, as remarked in §2.3, both classes of models can be augmented with a natural probabilistic source. Axioms for probabilistic formal systems are well studied (e.g., Fagin et al. 1990). In the direction of a fully formalized do-calculus (Pearl, 2009), one would like to embed an axiom system like AX into an appropriate probability calculus, and combine these with a logic of direct causal influence (a direct and probabilistic version of ), so that the do-calculus rules could be expressed and studied in a precise formal system. While the basic definitions would be clear for SEMs (and this could of course already be investigated for finite SEMs), the extension to simulation programs is less clear. Existing identifiability algorithms require specific assumptions about exogenous noise variables, e.g., that each has only two endogenous children (Shpitser and Pearl, 2008). Some work would need to be done to ensure that probabilistic programs (or Turing machines) satisfy analogous restrictions.
Finally, other extensions to the languages considered here would also be natural to investigate. We studied one higher-order relation, namely , which involves quantification over an infinite domain (the space of interventions). One of the advantages of open-universe models is precisely that they enable reasoning beyond the propositional level. Thus, e.g., systems for reasoning about causal and counterfactual statements involving explicit quantification (as in the examples from §1) are easily motivated, and ought to be understood.
References
 Besserve et al. (2018) Besserve, M., Shajarisales, N., Schölkopf, B., and Janzing, D. (2018). Group invariance principles for causal generative models. In Proceedings of the Twenty-first International Conference on Artificial Intelligence and Statistics (AISTATS).

Bingham et al. (2019) Bingham, E., Chen, J. P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P., Horsfall, P., and Goodman, N. D. (2019). Pyro: Deep universal probabilistic programming. Journal of Machine Learning Research, 20(28):1–6.
 Bottou et al. (2013) Bottou, L., Peters, J., Quiñonero-Candela, J., Charles, D. X., Chickering, D. M., Portugaly, E., Ray, D., Simard, P., and Snelson, E. (2013). Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14:3207–3260.
 Carbonetto et al. (2005) Carbonetto, P., Kisyński, J., de Freitas, N., and Poole, D. (2005). Nonparametric Bayesian logic. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI).

 de Raedt and Kimmig (2015) de Raedt, L. and Kimmig, A. (2015). Probabilistic (logic) programming concepts. Machine Learning, 100(1):5–47.
 Dean and Kanazawa (1989) Dean, T. and Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5(2):142–150.
 Fagin et al. (1990) Fagin, R., Halpern, J. Y., and Megiddo, N. (1990). A logic for reasoning about probabilities. Information and Computation, 87:78–128.
 Freer et al. (2012) Freer, C. E., Roy, D. M., and Tenenbaum, J. B. (2012). Towards commonsense reasoning via conditional simulation: Legacies of Turing in artificial intelligence. In Downey, R., editor, Turing’s Legacy. ASL Lecture Notes in Logic.
 Friedman et al. (2000) Friedman, N., Halpern, J. Y., and Koller, D. (2000). First-order conditional logic for default reasoning revisited. ACM Transactions on Computational Logic, 1(2):175–207.
 Friedman et al. (1998) Friedman, N., Murphy, K., and Russell, S. (1998). Learning the structure of dynamic probabilistic networks. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI).
 Galles and Pearl (1998) Galles, D. and Pearl, J. (1998). An axiomatic characterization of causal counterfactuals. Foundations of Science, 3(1):151–182.
 Ginsberg (1986) Ginsberg, M. L. (1986). Counterfactuals. Artificial Intelligence, 30:35–79.
 Gogate and Domingos (2016) Gogate, V. and Domingos, P. (2016). Probabilistic theorem proving. Communications of the ACM, 59(7):107–115.
 Goodman et al. (2008) Goodman, N. D., Mansinghka, V. K., Roy, D., Bonawitz, K., and Tenenbaum, J. B. (2008). Church: a language for generative models. In Proceedings of the Twenty-fourth Conference on Uncertainty in Artificial Intelligence (UAI).
 Halpern (1998) Halpern, J. Y. (1998). Axiomatizing causal reasoning. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI).
 Halpern (2000) Halpern, J. Y. (2000). Axiomatizing causal reasoning. Journal of AI Research, 12:317–337.
 Halpern (2013) Halpern, J. Y. (2013). From causal models to counterfactual structures. Review of Symbolic Logic, 6(2):305–322.

Hamrick (2019) Hamrick, J. B. (2019). Analogies of mental simulation and imagination in deep learning. Current Opinion in Behavioral Sciences, 29:8–16.
 Hyttinen et al. (2015) Hyttinen, A., Eberhardt, F., and Järvisalo, M. (2015). Do-calculus when the true graph is unknown. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence (UAI).
 Ibeling and Icard (2018) Ibeling, D. and Icard, T. (2018). On the conditional logic of simulation models. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018).
 Icard (2017) Icard, T. F. (2017). From programs to causal models. In Cremers, A., van Gessel, T., and Roelofsen, F., editors, Proceedings of the 21st Amsterdam Colloquium, pages 35–44.
 Janzing and Schölkopf (2010) Janzing, D. and Schölkopf, B. (2010). Causal inference using the algorithmic Markov condition. IEEE Transactions on Information Theory, 56(10):5168–5194.
 Kemp (2012) Kemp, C. (2012). Exploring the conceptual universe. Psychological Review, 119(4):685–722.
 Kraus et al. (1990) Kraus, S., Lehmann, D., and Magidor, M. (1990). Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence, 44(2):167–207.
 Lake et al. (2017) Lake, B. M., Ullman, T. D., Tenenbaum, J. B., and Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40.
 Lauritzen and Richardson (2002) Lauritzen, S. L. and Richardson, T. S. (2002). Chain graphical models and their causal interpretations. Journal of the Royal Statistical Society, B, 64(3):321–361.
 Lewis (1973) Lewis, D. (1973). Counterfactuals. Harvard University Press.
 Li et al. (2017) Li, J., Xu, K., Chaudhuri, S., Yumer, E., Zhang, H., and Guibas, L. (2017). GRASS: Generative recursive autoencoders for shape structures. ACM Transactions on Graphics, 36(4):1–14.
 Meek and Glymour (1994) Meek, C. and Glymour, C. (1994). Conditioning and intervening. The British Journal for the Philosophy of Science, 45:1001–1021.
 Milch et al. (2005) Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2005). BLOG: Probabilistic models with unknown objects. In Proc. 19th IJCAI, pages 1352–1359.
 Mohamed and Lakshminarayanan (2017) Mohamed, S. and Lakshminarayanan, B. (2017). Learning in implicit generative models. arXiv:1610.03483v4.
 Pearl (2009) Pearl, J. (2009). Causality. Cambridge University Press.
 Pearl and Bareinboim (2012) Pearl, J. and Bareinboim, E. (2012). External validity: From do-calculus to transportability across populations. Statistical Science, 29(4):579–595.
 Peters et al. (2017) Peters, J., Janzing, D., and Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.
 Pfeffer and Koller (2000) Pfeffer, A. and Koller, D. (2000). Semantics and inference for recursive probability models. In Proc. 17th AAAI, pages 538–544.
 Poole (2003) Poole, D. (2003). First-order probabilistic inference. In Proc. 18th IJCAI.
 Richardson and Domingos (2006) Richardson, M. and Domingos, P. (2006). Markov logic networks. Machine Learning, 62(1–2):107–136.
 Shpitser and Pearl (2008) Shpitser, I. and Pearl, J. (2008). Complete identification methods for the causal hierarchy. Journal of Machine Learning Research, 9:1941–1979.
 Spirtes et al. (2000) Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search. MIT Press.
 Srivastava et al. (2014) Srivastava, S., Russell, S., Ruan, P., and Cheng, X. (2014). First-order open-universe POMDPs. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI).
 Strotz and Wold (1960) Strotz, R. H. and Wold, H. O. A. (1960). Recursive vs. nonrecursive systems: an attempt at synthesis. Econometrica, 28(2):417–427.
 Tran et al. (2017) Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., and Blei, D. M. (2017). Deep probabilistic programming. In International Conference on Learning Representations (ICLR).
 Woodward (2003) Woodward, J. (2003). Making Things Happen: A Theory of Causal Explanation. Oxford University Press.
 Zhang (2013) Zhang, J. (2013). A Lewisian logic of causal counterfactuals. Minds and Machines, 23:77–93.