1 Introduction and Motivation
Much of intelligent action and reasoning involves assessing what would occur (or would have occurred) under various non-actual conditions. Such hypothetical and counterfactual (broadly, subjunctive) conditionals are bound up with central topics in artificial intelligence, including prediction, explanation, causal reasoning, and decision making. It is thus for good reason that AI researchers have focused a great deal of attention on conditional reasoning (see, e.g., [Ginsberg1986, Delgrande1998, Friedman et al.2000, Pearl2009, Bottou et al.2013], among many others).
Two broad approaches to subjunctive conditionals have been especially salient in the literature. The first, originating in philosophy [Stalnaker1968, Lewis1973], takes as basic a “similarity” or “normality” ordering on possibilities, and evaluates a claim ‘if $A$ then $B$’ by asking whether $B$ is true in (e.g., all) the most normal $A$-possibilities. The second approach, associated with the work of Judea Pearl, takes as basic a causal “structural equation” model (SEM), and evaluates conditionals according to a defined notion of intervention on the model. These two approaches are in some technical and conceptual respects compatible [Pearl2009], though they can also be shown to conflict on some basic logical matters [Halpern2013]. Both capture important intuitions about conditional reasoning, and both have enjoyed successful applications in AI research.
In this article we propose a third approach to conditionals, which captures a different intuition, and which can already be seen as implicit in a growing body of work in AI, as well as in cognitive science. This approach takes as basic the notion of a simulation model, that is, a program for simulating the transformation from one state of the world to another, or for building up or generating a world from a partial description of it. Simulation models have been of interest since the earliest days of AI [Newell and Simon1961]. A recent tradition, coming out of work on statistical relational models, has proposed building complex generative models using rich and expressive programming languages, typically also incorporating probability (e.g., [Pfeffer and Koller2000, Milch et al.2005, Goodman et al.2008, de Raedt and Kimmig2015]). Such languages have also been used for modeling human reasoning, including with counterfactuals [Goodman et al.2015].
Simulation models have an obvious causal (and more general dependence) structure, and it is natural to link conditionals with this very structure. We can assess a claim ‘if $A$ then $B$’ by intervening on the program to ensure that $A$ holds true throughout the simulation, and asking whether $B$ holds upon termination. This is conceptually different from the role of intervention in structural equation models, where the post-intervention operation is to find solutions to the manipulated system of equations. As we shall see, this conceptual difference has fundamental logical ramifications.
This more procedural way of thinking about subjunctive conditionals enjoys various advantages. First, there is empirical evidence suggesting that human causal and conditional reasoning is closely tied to mental simulation [Sloman2005]. Second, there are many independent reasons to build generative models in AI (e.g., minimizing prediction error in classification; see [Liang and Jordan2008]), making them a common tool. Thus, opportunistically, we can expect to have such models readily available (perhaps unlike normality orderings or even structural equation models).
Related to this second point, many of the generative models that are currently being built using deep neural networks fit neatly into our approach, even though we can often only use them as black boxes (see, e.g., [Mirza and Osindero2014, Kocaoglu et al.2017]). We know how to intervene on these programs (i.e., by controlling input), and how to read off a result or prediction (that is, we can observe what conditional claims they embody), even though we may not understand all the causal details of the learned model. Some authors have recently argued that certain kinds of counterfactual analysis in particular establish an appropriate standard for interpretability for these models [Wachter et al.2018].
Our contribution in this article is threefold: (1) we propose a general semantic analysis of conditional claims in terms of program executions, subsuming all the aforementioned application areas; (2) we establish completeness theorems for a propositional conditional language with respect to (four different classes of) programs, allowing a comparison with alternative approaches at a fundamental logical level; (3) we establish NP-completeness of the satisfiability problem for these logical systems. Before turning to these details, we explain informally what is distinctive about the resulting logic.
2 Conditional Logics
The literature on conditional logic is extensive. We focus here on the most notable differences between the systems below and more familiar systems based on either world-orderings or SEMs. We will be using a notation inspired by dynamic logic (also used by [Halpern2000]), whereby $[A]B$ can loosely be read as, ‘if $A$ were true, then $B$ would be true.’ Understanding the complete logic of a given interpretation can be of both theoretical and practical interest. In the causal setting, for instance, a complete set of axioms may give the exact conditions under which some counterfactual quantity is (not) identifiable from statistical data [Pearl2009].
One of the bedrock principles of conditional reasoning is called Cautious Monotonicity [Kraus et al.1990], or sometimes the Composition rule [Pearl2009]. This says that from $[A](B \wedge C)$ we may always infer $[A \wedge B]C$. While there are known counterexamples to it in the literature (it fails for some probabilistic and possibilistic interpretations [Dubois and Prade1991] and in standard versions of default logic [Makinson1994]), the principle is foundational to both world-ordering models and SEMs. By contrast, in our setting, holding $B$ fixed during the simulation may interrupt the sequence of steps leading to $C$ being made true. Here is a simple example (taken from [Icard2017]):
Example 1.
If Alf were ever in trouble ($A$), the neighbors Bea and Cam would both like to help ($B$ and $C$, respectively). But neither wants to help if the other is already helping. Imagine the following scenario: upon finding out that Alf is in trouble, each looks to see if the other is already there to help. If not, then each begins to prepare to help, eventually making their way to Alf but never stopping again to see if the other is doing the same. If instead, e.g., Cam initially sees Bea already going to help, Cam will not go. One might then argue that the following both truly describe the situation: ‘If Alf were in trouble, Bea and Cam would both go to help’ and ‘If Alf were in trouble and Bea were going to help, Cam would not go to help’.
The example trades on a temporal ambiguity about when Bea is going to help, and it can be blocked simply by time-indexing variables. However, following a common stance in the literature [Halpern2000, Pearl2009], we maintain that requiring temporal information always be made explicit is excessively stringent. Furthermore, in line with our earlier remarks about black box models, we may often be in a situation where we simply do not understand the internal temporal and causal structure of the program. To take a simple example, asking a generative image model to produce a cityscape might result in images with clouds and blue skies, even though a request to produce a cityscape with a blue sky might not result in any clouds. We would like a framework that can accommodate conditional theories embodied in artifacts like these.
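The Alf, Bea, and Cam scenario can be replayed as a small program. The following is a minimal sketch, not the authors' formalization: the class `ClampedTape`, the functions `neighbors` and `run_under`, and the variable names `"A"`, `"B"`, `"C"` (for "Alf is in trouble", "Bea goes", "Cam goes") are all assumptions made for illustration. Intervention is modeled by a tape that silently ignores writes to held-fixed variables.

```python
class ClampedTape(dict):
    """Hypothetical tape: variables named in `clamped` ignore every write."""

    def __init__(self, values, clamped):
        super().__init__(values)
        super().update(clamped)          # first set the intervened values
        self._clamped = frozenset(clamped)

    def __setitem__(self, key, value):
        if key not in self._clamped:     # writes to clamped variables are dropped
            super().__setitem__(key, value)


def neighbors(tape):
    """Each neighbor looks once, then acts without looking again."""
    if tape["A"]:
        bea_sees_cam = tape["C"]
        cam_sees_bea = tape["B"]
        if not bea_sees_cam:
            tape["B"] = 1
        if not cam_sees_bea:
            tape["C"] = 1


def run_under(intervention):
    """Run the scenario from the all-zero tape under the given intervention."""
    tape = ClampedTape({"A": 0, "B": 0, "C": 0}, intervention)
    neighbors(tape)
    return dict(tape)


alf_in_trouble = run_under({"A": 1})
alf_in_trouble_bea_going = run_under({"A": 1, "B": 1})
print(alf_in_trouble)             # both neighbors go
print(alf_in_trouble_bea_going)   # Cam stays home
```

Under the first intervention both neighbors end up helping, while additionally holding Bea's variable at 1 prevents Cam from going, which is the pattern that conflicts with Cautious Monotonicity.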
Our completeness results below (Thm. 1) show that the logic of conditional simulation is strictly weaker than any logic of structural equation models (as established in [Halpern2000]) or of normality orderings (as, e.g., in [Lewis1973]). The conditional logic of all programs is very weak indeed. At the same time, some of the axioms in these frameworks can be recovered by restricting the class of programs (e.g., the principle of Conditional Excluded Middle, valid on structural equation models and on some world-ordering models [Stalnaker1968], follows from optional axiom F below). We view this additional flexibility as a feature. However, even for a reader who is not convinced of this, we submit that understanding the logic of this increasingly popular way of thinking about conditional information is valuable.
The notion of intervention introduced below (Defn. 1) is different from, but inspired by, the corresponding notion in SEMs [Meek and Glymour1994, Pearl2009]. The logical language we study in this paper, restricting antecedents to conjunctive clauses but closing off under Boolean connectives, follows [Halpern2000].
Interestingly, prior to any of this work, [Balkenius and Gärdenfors1991] studied conditionals interpreted specifically over certain classes of neural networks, using a definition of “clamping a node” similar to our notion of intervention. They also observed that some of the core principles of non-monotonic logic fail for that setting. (See in addition [Leitgeb2004] for further development of related ideas.)
Let be a set of atoms and let be the language of propositional formulas over atoms in closed under disjunction, conjunction, and negation. Let be the language of purely conjunctive, ordered formulas of unique literals, i.e., formulas of the form , where and each is either or . Each formula in will specify an intervention by giving fixed values for a fixed list of variables. We also include the “empty” intervention in . Given , is the -equivalent of if is a propositionally consistent, purely conjunctive formula over literals and results from a reordering of literals and deletion of repeated literals in . For example, the -equivalent of is . Let be the language of formulas of the form for . We call such a formula a subjunctive conditional, and call the antecedent and the consequent. The overall causal simulation language is the language of propositional formulas over atoms in closed under disjunction, conjunction, and negation. For , abbreviates , and denotes . We use for the dual of , i.e., abbreviates .
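The normalization of an antecedent described above (reorder the literals, delete repetitions, reject inconsistent conjunctions) can be sketched as follows. The representation of a literal as an `(index, polarity)` pair and the function name `canonical_antecedent` are assumptions for illustration only.

```python
from typing import Iterable, Optional, Tuple

Literal = Tuple[int, bool]  # (atom index, True for positive / False for negated)


def canonical_antecedent(literals: Iterable[Literal]) -> Optional[Tuple[Literal, ...]]:
    """Return the ordered, duplicate-free form of a conjunction of literals.

    Returns None when the conjunction is propositionally inconsistent,
    i.e. when some atom occurs both positively and negatively.
    """
    polarity = {}
    for index, positive in literals:
        # setdefault returns the previously recorded polarity, if any
        if polarity.setdefault(index, positive) != positive:
            return None
    return tuple(sorted(polarity.items()))


ordered = canonical_antecedent([(3, True), (1, False), (3, True)])
clash = canonical_antecedent([(2, True), (2, False)])
empty = canonical_antecedent([])
print(ordered, clash, empty)
```

The empty tuple plays the role of the "empty" intervention, and two conjunctions specify the same intervention exactly when their canonical forms coincide.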
We now define the semantics of over causal simulation models. A causal simulation model is a pair of a Turing machine and tape contents represented by a state description , which specifies binary values for all tape variables, only finitely many of which can be nonzero. (The present setting can be easily generalized to the arbitrary discrete setting, indeed without changing the logic; see [Icard2017].) Running on input yields a new state description as output, provided the execution halts. We say iff in . Satisfaction of is then defined in the familiar way by recursion. For -atoms we define iff . Toward a definition of satisfaction for subjunctive conditionals, we now define an intervention (in the same way as in [Icard2017]):
Definition 1 (Intervention).
An intervention is a computable function mapping a machine to a new machine by taking a set of values , a finite index set, and holding fixed the value of each to throughout the execution of . That is, first sets each to , then runs while ignoring any write to any .
Any uniquely specifies an intervention, which we denote as : each literal in gives a tape variable to hold fixed, and the literal’s polarity tells us to which value it is to be fixed. Now we define iff for all halting executions of on , the resulting tape satisfies . Note that for deterministic machines, this means either does not halt on , or the unique resulting tape satisfies . The definition also implies that iff there exists a halting execution of on whose result satisfies . Having now defined for atoms , for complex is defined by recursion.
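The satisfaction clauses for the box and its dual can be illustrated with a deliberately simplified stand-in for a Turing machine. In this sketch, which rests on assumptions not in the text, a "machine" is just a list of its halting executions, each a sequence of unconditional writes `(variable, value)`; a machine with no halting executions is the empty list. The names `intervene`, `box`, and `diamond` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Tape = Dict[str, int]
Execution = List[Tuple[str, int]]   # one halting run, as a sequence of writes


def intervene(executions: List[Execution], fixed: Tape) -> Callable[[Tape], List[Tape]]:
    """Return a runner that sets the fixed values, then ignores writes to them."""
    def run(tape: Tape) -> List[Tape]:
        start = {**tape, **fixed}
        outputs = []
        for writes in executions:
            current = dict(start)
            for var, value in writes:
                if var not in fixed:        # writes to held-fixed variables are ignored
                    current[var] = value
            outputs.append(current)
        return outputs
    return run


def box(run, tape: Tape, phi: Callable[[Tape], bool]) -> bool:
    """True iff every halting execution ends in a tape satisfying phi."""
    return all(phi(out) for out in run(tape))


def diamond(run, tape: Tape, phi: Callable[[Tape], bool]) -> bool:
    """True iff some halting execution ends in a tape satisfying phi."""
    return any(phi(out) for out in run(tape))


tape = {"X": 0, "Y": 0}
coin = [[("Y", 1)], [("Y", 0)]]     # non-deterministic: two halting executions
diverge: List[Execution] = []       # no halting executions at all

y_is_one = lambda t: t["Y"] == 1
coin_box = box(intervene(coin, {"X": 1}), tape, y_is_one)
coin_diamond = diamond(intervene(coin, {"X": 1}), tape, y_is_one)
clamped_box = box(intervene(coin, {"Y": 1}), tape, y_is_one)
diverge_box = box(intervene(diverge, {"X": 1}), tape, y_is_one)
diverge_diamond = diamond(intervene(diverge, {"X": 1}), tape, y_is_one)
```

The last two values show the vacuous case: with no halting execution every box-formula holds and every diamond-formula fails, matching the remark about machines that do not halt.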
Proposition 1.
If is propositionally consistent, then it is undecidable whether .
Proof. Under a suitable encoding of natural numbers on the variable tape, the class , where is the class of all machines, gives an enumerable list of all the partial recursive functions, with computably recoverable from . Moreover, halts on input with output is extensional and , so by the Rice-Myhill-Shapiro Theorem it is undecidable. If we could decide whether , this would allow us to decide whether . ∎
A second limitative result is that we cannot have strong completeness (that is, completeness relative to arbitrary sets of assumptions), since by Prop. 2 we do not have compactness. On the other hand, our axiom systems (Defn. 3) are weakly complete (complete relative to finite assumption sets).
Proposition 2.
The language interpreted over causal simulation models is not compact.
Proof. Let be any uncomputable total function such that for all and consider with . If satisfies every , we could compute by intervening to set to 1, and checking which other variable is set to 1. As is total and , we could always find such . So is unsatisfiable. But it is easily seen that every finite subset of is satisfiable. ∎
5 Axiomatic Systems
We will now identify axiomatic systems (Defn. 3) that are sound and complete with respect to salient classes (Defn. 2) of causal simulation models, by which we mean that they prove all (completeness) and only (soundness) the generally valid principles with respect to those classes.
Definition 2.
Let be the class of all causal simulation models , where may be non-deterministic. Let be the class of models with deterministic , and let be the class of models with non-deterministic that halt on all input tapes and interventions. Also let .
Definition 3.
Below are two rules and four axioms. (We use the standard names from modal and non-monotonic logic. The Left Equivalence rule [Kraus et al.1990], namely, infer from , is not needed: since antecedents belong to , they are never distinguished beyond equivalence.)
|Propositional calculus (over the atoms of )|
AX denotes the system containing axioms R and K and closed under PC and RW. is AX in addition to axiom F, is AX in addition to axiom D, and is the system combining all of these axioms and rules.
For the remainder of this article, fix to be one of the classes , , , or , and let be the respective deductive system of Defn. 3. Then:
Theorem 1.
 is sound and complete for validities with respect to the class .
Proof. The soundness of PC, RW, R, and K is straightforward. If is (or ), any has at most one halting execution, so a property holding of one execution holds of all and F is sound. If is (or ), then any has at least one halting execution, so a property holding of all holds of one, and D is sound.
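Read against the soundness argument just given, the two optional axioms plausibly take the following shapes. This is a reconstruction from the proof text only ("at most one halting execution" for F, "at least one halting execution" for D), with $\alpha$ an antecedent and $\beta$ a consequent as assumed metavariables; it is not a quotation of the axiom table.

```latex
\begin{align*}
\mathsf{F}:\quad & \langle\alpha\rangle\beta \rightarrow [\alpha]\beta
  && \text{(what holds of some halting execution holds of all)} \\
\mathsf{D}:\quad & [\alpha]\beta \rightarrow \langle\alpha\rangle\beta
  && \text{(what holds of all halting executions holds of some)}
\end{align*}
% With \langle\alpha\rangle\beta abbreviating \neg[\alpha]\neg\beta,
% F unfolds to \neg[\alpha]\neg\beta \rightarrow [\alpha]\beta,
% i.e. [\alpha]\beta \vee [\alpha]\neg\beta (Conditional Excluded Middle).
```

On this reading, the earlier remark that Conditional Excluded Middle follows from axiom F is a one-step unfolding of the dual.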
As for completeness, it suffices to show that any -consistent has a canonical model satisfying it. Working toward the construction of , we prove a normal form result (Lem. 1) that elucidates what is required in order to satisfy (Lem. 3). We then define simple programming languages (Defn. 4)—easily seen to be translatable into Turing machine code—that we employ to construct a program for that meets exactly these requirements.
Lemma 1.
Any is provably-in-AX (and -) equivalent to a disjunction of conjunctive clauses, where each clause is of the form
and while for all for all and for all . We may assume without loss of generality that for distinct .
Proof. Note that provably in AX, and . Use these equivalences and PC and RW to rewrite and get the result. ∎
Given a clause as in (1), let be the set of -antecedents appearing in . Each gives rise to a selection function (cf. [Stalnaker1968]), obtained (not uniquely) as follows. To give the value of , suppose that for some . If for some , then is consistent: otherwise, implies which is - (and ) inconsistent. Thus for some , is also consistent. In general may be for multiple . For each such , we find such a . We then set to the set of -equivalents of the , and set to the set of -equivalents of the , if for any . The remaining case is that but for any ; in this case, set .
Lemma 2.
If is or we can assume is a singleton (or possibly empty in the case of ). If is or we can assume that .
Proof. In , if and , then because and , and thus , we have . In it is always possible to assume that for each there is some such that appears as a conjunct. So no such will be sent to . ∎
Lemma 3.
Let be a disjunct as in (1). Let . Suppose that , and for all that for each , that , and that whenever . Then .
Proof. We show that satisfies every conjunct in (1); satisfaction of is given. For conjuncts , for , suppose first that , for any . Then such that . If then for all such . Thus suppose for some . Again by the construction of , we have such that for some where each . Then implies for each such . To see that for each such that , by the assumption, we have for some . Generalizing the disjunction to and distributing it through shows that . Finally, for conjuncts where for any , we have so that . But then for any whatsoever, so that such conjuncts are satisfied. ∎
Definition 4.
Let be a programming language whose programs are the instances of <prog> in the following grammar:
<const> ::= ‘0’ | ‘1’
<var> ::= (one alternative for each tape variable)
<cond> ::= <var> ‘=’ <const> | <var> ‘=’ <var> | <var> ‘!=’ <var> | <cond> ‘&’ <cond>
<assign> ::= <var> ‘:=’ <const> | <var> ‘:=’ <var> | <var> ‘:= !’ <var>
<branches> ::= <prog> | <branches> ‘or’ <branches>
<prog> ::= ‘’ | <assign> | <prog> ‘;’ <prog> | ‘loop’ | ‘if’ <cond> ‘then’ <prog> ‘else’ <prog> ‘end’ | ‘choose’ <branches> ‘end’
 will denote the same language except that excludes choose-statements, is identical but for excluding loop-statements, and is identical but for excluding both choose- and loop-statements.
A program in any of these languages may be “compiled” to the right type of Turing machine in an obvious way (loop represents an unconditional infinite loop). For the remainder of the article, fix to be the programming language of Defn. 4 corresponding to the choice of .
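To make the operational reading of these statements concrete, here is a minimal interpreter sketch for a fragment of the language. It is an illustration under stated assumptions rather than the compilation to Turing machines mentioned above: programs are nested tuples, conditions are Python predicates instead of parsed `<cond>` expressions, only constant assignments are supported, and the function name `run` and the tuple tags are hypothetical. The interpreter returns the tapes of all halting executions, so `loop` contributes none.

```python
from typing import Dict, FrozenSet, List

Tape = Dict[str, int]


def run(prog: tuple, tape: Tape, fixed: FrozenSet[str] = frozenset()) -> List[Tape]:
    """Return the final tapes of all halting executions of `prog` on `tape`.

    `fixed` names the variables held by an intervention; writes to them are ignored.
    """
    kind = prog[0]
    if kind == "skip":                       # the empty program
        return [tape]
    if kind == "assign":                     # ("assign", var, const)
        _, var, value = prog
        return [tape if var in fixed else {**tape, var: value}]
    if kind == "seq":                        # ("seq", first, second)
        _, first, second = prog
        return [out
                for middle in run(first, tape, fixed)
                for out in run(second, middle, fixed)]
    if kind == "if":                         # ("if", predicate, then, else)
        _, cond, then_branch, else_branch = prog
        return run(then_branch if cond(tape) else else_branch, tape, fixed)
    if kind == "choose":                     # ("choose", [branch, ...])
        return [out for branch in prog[1] for out in run(branch, tape, fixed)]
    if kind == "loop":                       # unconditional infinite loop: never halts
        return []
    raise ValueError(f"unknown statement kind: {kind!r}")


program = ("seq",
           ("assign", "X1", 1),
           ("choose", [("assign", "X2", 1), ("assign", "X2", 0), ("loop",)]))

free_runs = run(program, {"X1": 0, "X2": 0})
held_runs = run(program, {"X1": 0, "X2": 0}, fixed=frozenset({"X2"}))
```

The non-halting branch simply drops out of the result, and holding `X2` fixed makes the two halting branches indistinguishable in their output.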
With the normal form result and suitable languages in hand, we proceed to construct the canonical model for . need only satisfy a consistent clause as in (1). Intuitively, will satisfy -atoms in via a suitable tape state (existent as and a fortiori is consistent), and will satisfy each -atom by dint of a branch in , conditional on the antecedent, in which the consequent is made to hold. We now write the -code of such a .
Suppose we are given , and that for each we have code defining a condition that is met iff the program is currently being run under an intervention that fixes to be true. Then consider a -program that contains one if-statement for each , each executing if is met. In the body of the if-statement for , has a choose-statement with one branch for each . The branch for each consists of a sequence of assignment statements guaranteed to make hold, call this , clearly existent since each is satisfiable. If is a singleton, this body contains only ; if , then this body consists of a single loop-statement. If is the machine corresponding to , and is a tape state satisfying , then for each , as the program has a halting branch with ; also, as there are no other halting executions. If , then , since under an -fixing intervention the program reaches a loop-statement and has no halting executions. So by Lem. 3, we have that satisfies . And thus . To see that , apply Lem. 2: in , so we have no loops in and . In , we have no choose-statements, so ; in , we have neither loop- nor choose-statements, and .
But how do we know it is possible to write code by which the program can tell whether it is being run under an -fixing intervention? For any tape variable, we may try to toggle it. If the attempt succeeds, then the variable is not presently fixed by an intervention. If not, then the present execution is under an intervention fixing the variable. Thus, we first try to toggle each relevant variable. Let be the maximum index of any atom appearing in . Listing 1—call it — performs the toggle check for and records the result in . It uses as a temporary variable and ultimately leaves the value of unchanged.
If has already been run for all , simply checks that exactly those variables appearing in have been marked as intervened on, and that these have the correct values. If is the -equivalent of , code for is given in Listing 2.
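The toggle check can be sketched directly. This is a hedged illustration of the idea in the text, not a transcription of Listing 1: the class `ClampedTape`, the function `toggle_check`, and the variable names `"x"`, `"flag"`, and `"tmp"` are assumptions, and the sketch presumes that the flag and temporary variables are not themselves under intervention.

```python
class ClampedTape(dict):
    """Hypothetical tape: variables named in `clamped` ignore every write."""

    def __init__(self, values, clamped=()):
        super().__init__(values)
        self._clamped = frozenset(clamped)

    def __setitem__(self, key, value):
        if key not in self._clamped:
            super().__setitem__(key, value)


def toggle_check(tape, var, flag, tmp):
    """Record in `flag` whether `var` is held fixed; leave `var` unchanged."""
    tape[tmp] = tape[var]                             # remember the original value
    tape[var] = 1 - tape[var]                         # attempt to toggle
    tape[flag] = 1 if tape[var] == tape[tmp] else 0   # no change means it is clamped
    tape[var] = tape[tmp]                             # restore (a no-op if clamped)


free = ClampedTape({"x": 1, "flag": 0, "tmp": 0})
toggle_check(free, "x", "flag", "tmp")

held = ClampedTape({"x": 1, "flag": 0, "tmp": 0}, clamped={"x"})
toggle_check(held, "x", "flag", "tmp")
```

After the check, the flag is 0 for the free variable and 1 for the clamped one, and in both cases the checked variable has its original value.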
Completing the description of the code of adumbrated earlier, consists of, in order:
1. One copy of for each .
2. For each , an if-statement with condition , whose body is:
   (a) a choose-statement with a branch for each , with body , if ;
   (b) a -snippet, if ;
   (c) or a single loop-statement if .
Note that never reads or writes a variable for , and the relevant -operations may be implemented with bounded space, so that we have the following Corollary:
Let be the class of finite state machine restrictions of , i.e. those where uses only boundedly many tape variables, for any input and intervention. Then Thm. 1 holds also for . ∎
6 Computational Complexity
In this section we consider the problem of deciding whether a given is satisfiable in . Although by Prop. 1, it is in general undecidable whether a given particular simulation model satisfies a formula, we show here that it is decidable whether a given formula is satisfied by any model. In fact, reasoning in this framework is no harder than reasoning in propositional logic:
Theorem 2.
 is NP-complete in (where is defined standardly).
Proof. We clearly have NP-hardness as propositional satisfiability can be embedded directly into -satisfiability. To see that satisfiability is in NP, we guess a and check whether . is infinite, and the checking step is undecidable by Prop. 1. So how could such an algorithm work? The crucial insight is that we may limit our search to a finite class of models that are similar to the canonical (Lem. 5). Moreover, a nice property of the canonical is that it wears its causal structure on its sleeve: one can read off the effect of any intervention from the code of , and has polynomial size in (implied by Lem. 4). Models in will share this property, guaranteeing that the checking step can be done in polynomial time. We will now make precise and prove these claims. Let denote the set of -antecedents appearing in . For , define as the fragment of programs whose code consists of:
1. One copy of (Listing 1), for each , followed by
2. at most one copy of an if-statement with condition (Listing 2) for each , whose body is one and only one of the following options, (a)–(c):
   (a) a choose-statement with at most branches, each of which has a body consisting of a single sequence of assignments, which may only be to variables for ;
   (b) a single sequence of assignment statements, only to variables for ;
   (c) a single loop-statement.
However, if , (a) is not allowed; if , (c) is not allowed; and if , neither (a) nor (c) is allowed.
Lemma 4.
The maximum length (defined standardly) of a program in is polynomial in , and there is a such that for all , we have , assuming exists.
Proof. is , so part 1 of a program is in length. There are at most if-statements in part 2; consider the body of each one. In case (a) it has branches, each of which involves assignment to at most variables, and thus has length . In case (b) its length is ; in case (c) its length is . Since is , the total length of part 2 is , so that both parts combined are . To show the existence of , it suffices to prove: any choose-statement in the body of an if-statement in has branches. Now, the number of branches in the if-statement for is , for some consistent as in (1). But (1) is a clause of the disjunctive normal form of and contains no more -literals than does , which is of course . Since each element of arises from the selection of a literal in (1), the number of branches is . ∎
Henceforth let denote for some guaranteed by Lem. 4, and call the set of where only tape variables with indices are possibly nonzero . Let be the class of models where comes from a -program and . is finite, and the following Lemma guarantees that we may restrict the search to :
Lemma 5.
 is satisfiable with respect to iff it is satisfiable with respect to .
Now with Lem. 5 our algorithm will guess a program and a tape , and verify whether the guessed model satisfies . We just need to show that the verification step is decidable in polynomial time. Suppose that all negations in appear only before -atoms, since any formula may be converted to such a form in linear time. Further, rewrite literals of the form to . Then it suffices to show that we can decide in polynomial time whether satisfies a given literal in : there are linearly many of these and the truth-value of may be evaluated from their values in linear time. For an -literal or , we simply output whether or not . For -literals with antecedent , simulate execution of on . Because and such programs trigger at most one if-statement when run under an intervention, we may perform this simulation by checking if there is any if-statement for in . If so, do one of the following, depending on what its body contains:
If a choose-statement, simulate the result of running each branch. Output true iff: either the literal was and every resulting tape satisfies , or the literal was and at least one resulting tape satisfies .
If an assignment sequence, simulate running it on the current tape, and output true iff the resulting tape satisfies .
If a loop, output true iff the literal is of the form.
This algorithm is correct since we thereby capture all halting executions, given that -programs conform to the fixed structure above. That it runs in polynomial time follows from the polynomial-length bound of Lem. 4. ∎
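The verification step for a single conditional literal can be sketched as below. This is a minimal illustration under assumptions that go beyond the text: a structured program is represented as a dictionary from antecedents (tuples of `(variable, value)` pairs) to one of three body shapes, the names `halting_outputs` and `check_literal` are hypothetical, and the toggle-check prelude is taken for granted rather than simulated.

```python
from typing import Callable, Dict, List, Tuple

Tape = Dict[str, int]
Antecedent = Tuple[Tuple[str, int], ...]


def halting_outputs(program: dict, tape: Tape, alpha: Antecedent) -> List[Tape]:
    """Tapes of all halting executions under the intervention `alpha`."""
    fixed = dict(alpha)
    start = {**tape, **fixed}
    body = program.get(alpha)
    if body is None:                 # no if-statement is triggered
        return [start]
    if body[0] == "loop":            # case (c): no halting execution
        return []
    # case (a) carries a list of branches, case (b) a single assignment sequence
    branches = body[1] if body[0] == "choose" else [body[1]]
    outputs = []
    for assignments in branches:
        out = dict(start)
        for var, value in assignments.items():
            if var not in fixed:     # writes to intervened variables are ignored
                out[var] = value
        outputs.append(out)
    return outputs


def check_literal(program: dict, tape: Tape, alpha: Antecedent,
                  phi: Callable[[Tape], bool], boxed: bool) -> bool:
    """Box-literals quantify over all halting outputs, diamond-literals over some."""
    outputs = halting_outputs(program, tape, alpha)
    return all(map(phi, outputs)) if boxed else any(map(phi, outputs))


a1: Antecedent = (("X1", 1),)
a2: Antecedent = (("X2", 1),)
program = {a1: ("choose", [{"X2": 1}, {"X2": 0}]), a2: ("loop",)}
tape = {"X1": 0, "X2": 0}
x2_set = lambda t: t["X2"] == 1

box_on_choose = check_literal(program, tape, a1, x2_set, boxed=True)
diamond_on_choose = check_literal(program, tape, a1, x2_set, boxed=False)
box_on_loop = check_literal(program, tape, a2, x2_set, boxed=True)
diamond_on_loop = check_literal(program, tape, a2, x2_set, boxed=False)
```

Each call inspects at most one body and at most as many assignment sequences as that body has branches, which is where the polynomial bound on program length does its work.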
7 Conclusion and Future Work
A very natural way to assess a claim, ‘if $A$ were true, then $B$ would be true,’ is to run a simulation in which $A$ is assumed to hold and determine whether $B$ would then follow. Simulations can be built using any number of tools: (probabilistic) programming languages designed specifically for generative models, generative neural networks, and many others. Our formulation of intervention on a simulation program is intended to subsume all such applications where conditional reasoning seems especially useful. We have shown that this general way of interpreting conditional claims has its own distinctive, and quite weak, logic. Due to the generality of the approach, we can validate further familiar axioms by restricting attention to smaller classes of programs (deterministic, always-halting). We believe this work represents an important initial step in providing a foundation for conditional reasoning in these increasingly common contexts.
To close, we would like to mention several notable future directions. Perhaps the most obvious next step is to extend our treatment to richer languages, and in particular to the first order setting. This is pressing for several reasons. First, much of the motivation for many of the generative frameworks mentioned earlier was to go beyond the propositional setting characteristic of traditional graphical models, for example, to be able to handle unknown (numbers of) objects (see [Poole2003, Milch et al.2005]).
Second, much of the work in conditional logic in AI has dealt adequately with the first order setting by using frameworks based on normality orderings [Delgrande1998, Friedman et al.2000]. It is perhaps a strike against the structural equation approach that no one has shown how to extend it adequately to first order languages (though see [Halpern2000] for partial suggestions). In the present setting, just as we have used a tape data structure to encode a propositional valuation, we could also use such data structures to encode first order models. The difficult question then becomes how to understand complex (i.e., arbitrary first-order) interventions. We have begun exploring this important extension.
Given the centrality of probabilistic reasoning for many of the aforementioned tools, it is important to consider the probabilistic setting. Adding explicit probability operators in the style of [Fagin et al.1990] results in a very natural extension of the system [Ibeling2018]. One could also use probability thresholds (see, e.g., [Hawthorne and Makinson2007]): we might say just when results in output satisfying with at least some threshold probability.
Finally, another direction is to consider additional subclasses of programs, even for the basic propositional setting we have studied here. For example, in some contexts it makes sense to assume that variables are time-indexed and that no variable depends on any variable at a later point in time (as in dynamic Bayesian networks [Dean and Kanazawa1989]). In this setting there are no cyclic dependencies, which means we do not have programs like that in Example 1. Understanding the logic of such classes would be worthwhile, especially for further comparison with central classes of structural equation models (such as the “recursive” models of [Pearl2009]).
Duligur Ibeling is supported by the Sudhakar and Sumithra Ravi Family Graduate Fellowship in the School of Engineering at Stanford University.
- [Balkenius and Gärdenfors1991] Christian Balkenius and Peter Gärdenfors. Nonmonotonic inferences in neural networks. In Proceedings of KR, 1991.
- [Bottou et al.2013] Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14:3207–3260, 2013.
- [de Raedt and Kimmig2015] Luc de Raedt and Angelika Kimmig. Probabilistic (logic) programming concepts. Machine Learning, 100(1):5–47, 2015.
- [Dean and Kanazawa1989] Thomas Dean and Keiji Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(2):142–150, 1989.
- [Delgrande1998] James P. Delgrande. On first-order conditional logics. Artificial Intelligence, 105:105–137, 1998.
- [Dubois and Prade1991] Didier Dubois and Henri Prade. Fuzzy sets in approximate reasoning, Part 1: Inference with possibility distributions. Fuzzy Sets and Systems, 40(1):182–224, 1991.
- [Fagin et al.1990] Ronald Fagin, Joseph Y. Halpern, and Nimrod Megiddo. A logic for reasoning about probabilities. Information and Computation, 87:78–128, 1990.
- [Friedman et al.2000] Nir Friedman, Joseph Y. Halpern, and Daphne Koller. First-order conditional logic for default reasoning revisited. ACM Transactions on Computational Logic, 1(2):175–207, 2000.
- [Ginsberg1986] Matthew L. Ginsberg. Counterfactuals. Artificial Intelligence, 30:35–79, 1986.
- [Goodman et al.2008] Noah D. Goodman, Vikash K. Mansinghka, Daniel Roy, Keith Bonawitz, and Joshua B. Tenenbaum. Church: a language for generative models. In Proc. 24th UAI, 2008.
- [Goodman et al.2015] Noah D. Goodman, Joshua B. Tenenbaum, and Tobias Gerstenberg. Concepts in a probabilistic language of thought. In Eric Margolis and Stephan Laurence, editors, The Conceptual Mind: New Directions in the Study of Concepts. MIT Press, 2015.
- [Halpern2000] Joseph Y. Halpern. Axiomatizing causal reasoning. Journal of AI Research, 12:317–337, 2000.
- [Halpern2013] Joseph Y. Halpern. From causal models to counterfactual structures. Review of Symbolic Logic, 6(2):305–322, 2013.
- [Hawthorne and Makinson2007] James Hawthorne and David Makinson. The quantitative/qualitative watershed for rules of uncertain inference. Studia Logica, 86(2):247–297, 2007.
- [Ibeling2018] Duligur Ibeling. Causal modeling with probabilistic simulation models. Manuscript, 2018.
- [Icard2017] Thomas F. Icard. From programs to causal models. In Alexandre Cremers, Thom van Gessel, and Floris Roelofsen, editors, Proceedings of the 21st Amsterdam Colloquium, pages 35–44, 2017.
- [Kocaoglu et al.2017] Murat Kocaoglu, Christopher Snyder, Alexandros G. Dimakis, and Sriram Vishwanath. CausalGAN: Learning causal implicit generative models with adversarial training. Unpublished manuscript: https://arxiv.org/abs/1709.02023, 2017.
- [Kraus et al.1990] Sarit Kraus, Daniel Lehmann, and Menachem Magidor. Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence, 44(2):167–207, 1990.
- [Leitgeb2004] Hannes Leitgeb. Inference on the Low Level: An Investigation Into Deduction, Nonmonotonic Reasoning, and the Philosophy of Cognition. Kluwer, 2004.
- [Lewis1973] David Lewis. Counterfactuals. Harvard University Press, 1973.
- [Liang and Jordan2008] Percy Liang and Michael I. Jordan. An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators. In 25th ICML, 2008.
- [Makinson1994] David Makinson. General patterns in nonmonotonic reasoning. In D. Gabbay et al., editor, Handbook of Logic in Artificial Intelligence and Logic Programming, volume III, pages 35–110. OUP, 1994.
- [Meek and Glymour1994] Christopher Meek and Clark Glymour. Conditioning and intervening. The British Journal for the Philosophy of Science, 45:1001–1021, 1994.
- [Milch et al.2005] Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. BLOG: Probabilistic models with unknown objects. In Proc. 19th IJCAI, pages 1352–1359, 2005.
- [Mirza and Osindero2014] Mehdi Mirza and Simon Osindero. Conditional generative adversarial networks. Manuscript: https://arxiv.org/abs/1411.1784, 2014.
- [Newell and Simon1961] Allen Newell and Herbert A. Simon. Computer simulation of human thinking. Science, 134(3495):2011–2017, 1961.
- [Pearl2009] Judea Pearl. Causality. CUP, 2009.
- [Pfeffer and Koller2000] Avi Pfeffer and Daphne Koller. Semantics and inference for recursive probability models. In Proc. 7th AAAI, pages 538–544, 2000.
- [Poole2003] David Poole. First-order probabilistic inference. In Proc. 18th IJCAI, 2003.
- [Sloman2005] Steven A. Sloman. Causal Models: How We Think About the World and its Alternatives. OUP, 2005.
- [Stalnaker1968] Robert Stalnaker. A theory of conditionals. American Philosophical Quarterly, pages 98–112, 1968.
- [Wachter et al.2018] Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law and Technology, 2018.