The Causal Frame Problem: An Algorithmic Perspective

01/26/2017
by Ardavan Salehi Nobandegani, et al.
McGill University

The Frame Problem (FP) is a puzzle in philosophy of mind and epistemology, articulated by the Stanford Encyclopedia of Philosophy as follows: "How do we account for our apparent ability to make decisions on the basis only of what is relevant to an ongoing situation without having explicitly to consider all that is not relevant?" In this work, we focus on the causal variant of the FP, the Causal Frame Problem (CFP). Assuming that a reasoner's mental causal model can be (implicitly) represented by a causal Bayes net, we first introduce a notion called Potential Level (PL). PL, in essence, encodes the relative position of a node with respect to its neighbors in a causal Bayes net. Drawing on the psychological literature on causal judgment, we substantiate the claim that PL may bear on how time is encoded in the mind. Using PL, we propose an inference framework, called the PL-based Inference Framework (PLIF), which permits a boundedly-rational approach to the CFP to be formally articulated at Marr's algorithmic level of analysis. We show that our proposed framework, PLIF, is consistent with a wide range of findings in causal judgment literature, and that PL and PLIF make a number of predictions, some of which are already supported by existing findings.


1 Introduction

At the core of any decision-making or reasoning task, resides an innocent-looking yet challenging question: Given an inconceivably large body of knowledge available to the reasoner, what constitutes the relevant for the task and what the irrelevant? The question, as it is posed, echoes the well-known Frame Problem (FP) in epistemology and philosophy of mind, articulated by Glymour (1987) as follows: “Given an enormous amount of stuff, and some task to be done using some of the stuff, what is the relevant stuff for the task?” Fodor (1987) comments: “The frame problem goes very deep; it goes as deep as the analysis of rationality.”

The question posed above perfectly captures what is really at the core of the FP, yet, it may suggest an unsatisfying approach to the FP at the algorithmic level of analysis (Marr, 1982). Indeed, the question may suggest the following two-step methodology: In the first step, out of all the body of knowledge available to the reasoner (termed, the model), she has to identify what is relevant to the task (termed, the relevant submodel); it is only then that she advances to the second step by performing reasoning or inference on the identified submodel. There is something fundamentally wrong with this methodology (which we term, sequential approach to reasoning) which bears on the following understanding: The relevant submodel, i.e., the portion of the reasoner’s knowledge deemed relevant to the task, oftentimes is so enormous (or even infinitely large) that the reasoner—inevitably bounded in time and computational resources—would never get to the second step, had she adhered to such a methodology. In other words, in line with the notion of bounded rationality (Simon, 1957), a boundedly-rational reasoner must have the option, if need be, to merely consult a fraction of the potentially large—if not infinitely so—relevant submodel.

Recent work by icard2015 elegantly promotes this insight when they write: “Somehow the mind must focus in on some “submodel” of the “full” model (including all possibly relevant variables) that suffices for the task at hand and is not too costly to use.”111

In an informative example on Hidden Markov Models (HMMs), Icard & Goodman (2015) present a setting wherein the relevant submodel is infinitely large—an example which makes it pronounced what is wrong with the sequential approach stated earlier.

They then ask the following question: “what kind of simpler model should a reasoner consult for a given task?” This is an inspiring question hinting to an interesting line of inquiry as to how to formally articulate a boundedly-rational approach to the FP at Marr’s algorithmic level of analysis (1982).

In this work, we focus on the causal variant of the FP, the Causal Frame Problem (CFP), stated as follows: Upon being presented with a causal query, how does the reasoner manage to attend to her causal knowledge relevant to the derivation of the query while rightfully dismissing the irrelevant? We adopt Causal Bayesian Networks (CBNs) (Pearl, 1988; Gopnik et al., 2004,

inter alia) as a normative model to represent how the reasoner’s internal causal model of the world is structured (i.e., reasoner’s mental model). First, we introduce the notion of Potential Level (PL). PL, in essence, encodes the relative position of a node (representing a propositional variable or a concept) with respect to its neighbors in a CBN. Drawing on the psychological literature on causal judgment, we substantiate the claim that PL may bear on how time is encoded in the mind. Equipped with PL, we embark on investigating the CFP at Marr’s algorithmic level of analysis. We propose an inference framework, termed PL-based Inference Framework (PLIF), which aims at empowering the boundedly-rational reasoner to consult (or retrieve222The terms “consult” and “retrieve” will be used interchangeably. We elaborate on the rationale behind that in Sec. 5, where we connect our work to Long Term Memory and Working Memory.) parts of the underlying CBN deemed relevant for the derivation of the posed query (the relevant submodel) in a local, bottom-up fashion until the submodel is fully retrieved. PLIF allows the reasoner to carry out inference at intermediate stages of the retrieval process over the thus-far retrieved parts, thereby obtaining lower and upper bounds on the posed causal query. We show, in the Discussion section, that our proposed framework, PLIF, is consistent with a wide range of findings in causal judgment literature, and that PL and PLIF make a number of predictions, some of which are already supported by the findings in the psychology literature.

In their work, Icard and Goodman (2015) articulate a boundedly-rational approach to the CFP at Marr’s computational level of analysis, which, as they point out, is from a “god’s eye” point of view. In sharp contrast, our proposed framework PLIF is not from a “god’s eye” point of view and hence could be regarded, potentially, as a psychologically plausible proposal at Marr’s algorithmic level of analysis as to how the mind both retrieves and, at the same time, carries out inference over the retrieved submodel to derive bounds on a causal query. We term this concurrent approach to reasoning, as opposed to the flawed sequential approach stated earlier.333We elaborate more on this in the Discussion section. The retrieval process progresses in a local, bottom-up fashion, hence the submodel is retrieved incrementally, in a nested manner.444The term “nested” implies that the thus-far retrieved submodel is subsumed by every later submodel (should the reasoner proceeds with the retrieval process). Our analysis (Sec. 4.1) confirms Icard and Goodman’s insight (2015) that even in the extreme case of having an infinitely large relevant submodel, the portion of which the reasoner has to consult so as to obtain a “sufficiently good” answer to a query could indeed be very small.

2 Potential Level and Time

Before proceeding further, let us introduce some preliminary notations. Random Variables (RVs) are denoted by lower-case bold-faced letters, e.g.,

x, and their realizations by non-bold lower-case letters, e.g., . Likewise, sets of RVs are denoted by upper-case bold-faced letters, e.g., X, and their corresponding realizations by upper-case non-bold letters, e.g., .

denotes the set of possible values a random quantity can take on. Random quantities are assumed to be discrete unless stated otherwise. The joint probability distribution over

is denoted by . We will use the notation to denote the sequence of RVs , hence . The terms “node” and “variable” will be used interchangeably throughout. To simplify presentation, we adopt the following notation: We denote the probability by for some RV x and its realization . For conditional probabilities, we will use the notation instead of . Likewise, for and . A generic conditional independence relationship is denoted by where , and C represent three mutually disjoint sets of variables belonging to a CBN. Furthermore, throughout the paper, we assume that is some negligibly small positive real-valued quantity. Whenever we subtract from a quantity, we simply imply a quantity less than but arbitrarily close to the original quantity. The rationale behind adopting such a notation will become clearer in Sec. 4.

Before formally introducing the notion of PL (unavoidably, with some mathematical jargon), we articulate in simple terms what the idea behind PL is. PL simply induces a chronological order on the nodes of a CBN, allowing the reasoner to encode the timing between cause and effect.555More precisely, PL induces a topological order on the nodes of a CBN, with temporal interpretations suggested in Def. 1. As we will see, PL plays an important role in guiding the retrieval process used in our proposed framework. Next, PL is formally defined, followed by two clarifying examples.

Def. 1. (Potential Level (PL)) Let and denote, respectively, the sets of parents (i.e., immediate causes) and children (i.e., immediate effects) of x. Also let . The PL of x, denoted by , is defined as follows: (i) If , , and (ii) If , is a real-valued quantity selected from the interval such that indicates the amount of time which elapses between intervening simultaneously on all the RVs in (i.e., ) and x taking its value in accord with the distribution . If , substitute the upper bound of the given interval by .

Parameter symbolizes the origin of time, as perceived by the reasoner. is a natural choice, unless the reasoner believes that time continues indefinitely into the past, in which case . The next two examples further clarify the idea behind PL. In both examples we assume .

Figure 1: Relation between PL and time: Example.

For the first example, let us consider the CBN depicted in Fig. 1(a) containing the RVs and z with and . According to Def. 1, the given PLs can be construed in terms of the relative time between the occurrence of cause and effect as articulated next. Upon intervening on x (i.e., ), after the elapse of units of time, the RV y takes its value in accord with the distribution . Likewise, upon intervening on y (i.e., ), after the elapse of units of time, z takes its value according to .

For the second example, consider the CBN depicted in Fig. 1(b) containing the RVs , and t with , and . Upon intervening on x (i.e., ) the following happens: (i) after the elapse of units of time, y takes its value according to , and (ii) after the elapse of unit of time, z takes its value according to . Also, upon intervening simultaneously on RVs (i.e., ), after the elapse of units of time, t takes its value according to .

In sum, the notion of PL bears on the underlying time-grid upon which a CBN is constructed, and adheres to Hume’s principle of temporal precedence of cause to effect Hume (1748/1975). A growing body of work in psychology literature corroborates Hume’s centuries-old insight, suggesting that the timing and temporal order between events strongly influences how humans induce causal structure over them Bramley . (2014); Lagnado  Sloman (2006). The introduced notion of PL is based on the following hypothesis:

When learning the underlying causal structure of a domain, humans may as well encode the temporal patterns (or some estimates thereof) on which they rely to infer the causal structure.

This hypothesis is supported by recent findings suggesting that people have expectations about the delay length between cause and effect Greville  Buehner (2010); Buehner  May (2004); Schlottmann (1999). It is worth noting that we could have defined PL in terms of relative expected time between cause and effect, rather than relative absolute time. Under such an interpretation, the time which elapses between the intervention on a cause and the occurrence of its effect would be modeled by a probability distribution, and PL would be defined in terms of the expected value of that distribution. Our proposed framework, PLIF, is indifferent as to whether PL should be construed in terms of absolute or expected time. greville2010temporal show that causal relations with fixed temporal intervals are consistently judged as stronger compared to those with variable temporal intervals. This finding, therefore, seems to suggest that people expect, to a greater extent, fixed temporal intervals between cause and effect, rather than variable ones—an interpretation which, at least to a first approximation, favors construing PL in terms of relative absolute time (see Def. 1).666There are cases, however, that, despite the precedence of cause to effect, quantifying the amount of time between their occurrences may bear no meaning, e.g., when dealing with hypothetical constructs. In such cases, PL should be simply construed as a topological ordering. From a purely computational perspective, PL is a generalization of topological sorting in computer science.

3 Informative Example

To develop our intuition, and before formally articulating our proposed framework, let us present a simple yet informative example which demonstrates: (i) how the retrieval process can be carried out in a local, bottom-up fashion, allowing for retrieving the relevant submodel incrementally, and (ii) how adopting PL allows the reasoner to obtain bounds on a given causal query at intermediate stages of the retrieval process.

Let us assume that the posed causal query is $P(x|y)$, where $\mathbf{x}$ and $\mathbf{y}$ are two RVs in the CBN depicted in Fig. 2(a) with the PLs shown therein, and let $t_0 = 0$. The relevant information for the derivation of the posed query (i.e., the relevant submodel) is depicted in Fig. 2(e).

Figure 2: Example. Query variables are shown in orange.

Starting from the target RV $\mathbf{x}$ in the original CBN (Fig. 2(a)) and moving one step backwards,[7] the set $\mathbf{S}_1 = Pa(\mathbf{x})$ is reached (Fig. 2(b)). Since $PL(\mathbf{y}) < PL(\mathbf{s})$ for every $\mathbf{s} \in \mathbf{S}_1$, $\mathbf{y}$ must be a non-descendant of $\mathbf{S}_1$, and therefore, of $\mathbf{x}$. Hence, conditioning on $\mathbf{S}_1$ d-separates $\mathbf{x}$ from $\mathbf{y}$ (Pearl, 1988), yielding $P(x|y, S_1) = P(x|S_1)$, thus implying: $\min_{S_1} P(x|S_1) \leq P(x|y) \leq \max_{S_1} P(x|S_1)$. It is crucial to note that the given bounds can be computed using the information thus-far retrieved, i.e., the information encoded in the submodel shown in Fig. 2(b). Taking a step backwards from $\mathbf{S}_1$, the set $\mathbf{S}_2$ is reached (Fig. 2(c)). Using a similar line of reasoning to the one presented for $\mathbf{S}_1$, having $PL(\mathbf{y}) < PL(\mathbf{s})$ for every $\mathbf{s} \in \mathbf{S}_2$ ensures $(\mathbf{x} \perp \mathbf{y} \,|\, \mathbf{S}_2)$. Therefore, the following bounds on the posed query can be derived, which, crucially, can be computed using the information thus-far retrieved: $\min_{S_2} P(x|S_2) \leq P(x|y) \leq \max_{S_2} P(x|S_2)$. It is straightforward to show that the bounds derived in terms of $\mathbf{S}_2$ are tighter than the bounds derived in terms of $\mathbf{S}_1$.[8] Finally, taking one step backward from $\mathbf{S}_2$, $\mathbf{y}$ is reached (Fig. 2(d)) and the exact value of $P(x|y)$ can be derived, again using only the submodel thus-far retrieved (Fig. 2(d)).

[7] Taking one step backwards from a variable amounts to retrieving all the parents of that variable.
[8] Here we are implicitly assuming that the CPDs involved in the parameterization of the underlying CBN are non-degenerate. Dropping this assumption yields the following result: the bounds derived in terms of $\mathbf{S}_2$ are equally tight or tighter than the bounds derived in terms of $\mathbf{S}_1$.
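The following sketch replays this incremental retrieval numerically on a simple chain $\mathbf{y} \to \mathbf{w} \to \mathbf{v} \to \mathbf{x}$. The CPDs are hypothetical stand-ins (Fig. 2's parameterization is not reproduced here), but the nesting of the bounds is exactly the phenomenon described above.

```python
# Bottom-up retrieval on the chain y -> w -> v -> x, querying P(x=1 | y=1).
# All variables are binary; p_*[a][b] stores P(child = b | parent = a).
p_x_v = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # P(x | v), assumed
p_v_w = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}   # P(v | w), assumed
p_w_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}   # P(w | y), assumed

# Stage 1: retrieve Pa(x) = {v}; conditioning on v d-separates x from y.
stage1 = [p_x_v[v][1] for v in (0, 1)]
print(min(stage1), max(stage1))        # 0.1 <= P(x=1|y=1) <= 0.8

# Stage 2: one step further back retrieves w; the bounds tighten.
p_x_w = {w: sum(p_x_v[v][1] * p_v_w[w][v] for v in (0, 1)) for w in (0, 1)}
print(min(p_x_w.values()), max(p_x_w.values()))   # 0.31 <= query <= 0.52

# Stage 3: y itself is reached; the query becomes exact.
print(sum(p_x_w[w] * p_w_y[1][w] for w in (0, 1)))  # 0.499
```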

We are now well-positioned to present our proposed framework, PLIF.

4 PL-based Inference Framework (PLIF)

In this section, we elaborate on how, equipped with the notion of PL, a generic causal query of the form $P(O|E)$ can be derived,[9] where $\mathbf{O}$ and $\mathbf{E}$ denote, respectively, the disjoint sets of target (or objective) and observed (or evidence) variables. In other words, we intend to formalize how inference should be carried out over a CBN whose nodes are endowed with PL as an attribute. Before we present the main result, a few definitions are in order.

[9] We do not consider interventions in this work. However, with some modifications, the presented analysis and results can be extended to handle a generic causal query of the form $P(O|E, do(Z))$, where $\mathbf{Z}$ denotes the set of intervened variables.

Def. 2. (Critical Potential Level (CPL)) The target variable with the least PL is denoted by $\mathbf{o}^*$ and its PL is referred to as the CPL. More formally, $\mathbf{o}^* = \arg\min_{\mathbf{o} \in \mathbf{O}} PL(\mathbf{o})$ and $CPL = PL(\mathbf{o}^*)$. E.g., for the setting given in Fig. 2(a), $\mathbf{O} = \{\mathbf{x}\}$, hence $\mathbf{o}^* = \mathbf{x}$ and $CPL = PL(\mathbf{x})$. Viewed through the lens of time, $\mathbf{o}^*$ is the furthest target variable into the past, with PL equal to the CPL.

There are two possibilities: (a) $CPL > t_0$, or (b) $CPL = t_0$, with $t_0$ denoting the origin of time; cf. Sec. 2. In the sequel we assume that (a) holds. For a discussion of the special case (b), the reader is referred to the Supplementary Information.

Def. 3. (Inference Threshold (IT) and IT Root Set (IT-RS)) To any real-valued quantity $t$ corresponds a unique set $\mathcal{S}_t$, obtained as follows: start at every variable with PL at least $t$ and backtrack along all paths terminating at that variable. Backtracking along each path stops as soon as a node with PL less than $t$ is encountered. Such nodes, together, compose the set $\mathcal{S}_t$. It follows from the definition that $PL(\mathbf{s}) < t$ for all $\mathbf{s} \in \mathcal{S}_t$. $t$ and $\mathcal{S}_t$ are termed, respectively, the Inference Threshold (IT) and the IT Root Set (IT-RS).

For example, the sets of variables circled at the stages depicted in Figs. 2(b-d) are the IT-RSs for the ITs $t_1$, $t_2$, and $t_3$, respectively. Note that any IT lying strictly between the same two consecutive PLs induces the same IT-RS; instead of, say, $t = PL(\mathbf{s}) - \epsilon$ for some node $\mathbf{s}$, we could therefore have specified a whole interval of admissible ITs. However, expressing ITs in terms of $\epsilon$ liberates us from having to express them in terms of intervals, thereby simplifying the exposition in the sole hope that the reader finds it easier to follow the work. We would like to emphasize that the adopted notation should not be construed as implying that the assignment of values to ITs is such a sensitive task that everything would have collapsed, had an IT not been chosen in such a fine-tuned manner. To recap, in simple terms, the IT $t$ bears on how far into the past a reasoner is consulting her mental model in the process of answering a query, and $\mathcal{S}_t$ characterizes the furthest-into-the-past concepts entertained by the reasoner in that process.
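A minimal sketch of Def. 3's backtracking procedure, under the same assumed graph representation as the earlier sketches (node names, PLs, and the threshold below are illustrative):

```python
# Compute the IT Root Set S_t: the first nodes with PL < t encountered
# while backtracking from every node whose PL is at least t.
def it_root_set(parents: dict[str, list[str]], pl: dict[str, float],
                t: float) -> set[str]:
    roots: set[str] = set()
    stack = [v for v in parents if pl[v] >= t]
    seen = set(stack)
    while stack:
        v = stack.pop()
        for p in parents[v]:
            if pl[p] < t:
                roots.add(p)           # backtracking stops at p
            elif p not in seen:
                seen.add(p)
                stack.append(p)
    return roots

parents = {"x": [], "y": ["x"], "z": ["x"], "t": ["y", "z"]}
pl = {"x": 0.0, "y": 1.0, "z": 2.0, "t": 3.0}
print(it_root_set(parents, pl, 1.5))   # {'x', 'y'}
```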

Next, we formally present the main idea behind PLIF, followed by its interpretation in simple terms.

Lemma 1. For any chosen IT $t < CPL$ and its corresponding IT-RS $\mathcal{S}_t$, define $\mathbf{S} = \mathcal{S}_t \setminus \mathbf{E}$. Then the following holds:

$\min_{S \in val(\mathbf{S})} P(O|S, E) \;\leq\; P(O|E) \;\leq\; \max_{S \in val(\mathbf{S})} P(O|S, E). \qquad (1)$

Crucially, the provided bounds can be computed using the information encoded in the submodel retrieved in the very process of obtaining the IT-RS $\mathcal{S}_t$.

For a formal proof of Lemma 1, the reader is referred to the Supplementary Information. Mathematical jargon aside, the message of Lemma 1 is quite simple: for any chosen inference threshold $t$ which is further into the past than $\mathbf{o}^*$ (i.e., $t < CPL$), Lemma 1 ensures that the reasoner can condition on $\mathbf{S}$ and obtain the reported lower and upper bounds on the query by using only the information encoded in the retrieved submodel.
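Computationally, Lemma 1 amounts to enumerating the realizations of $\mathbf{S}$ and taking the extremes of the resulting conditional probabilities. A minimal sketch, assuming a routine cond that evaluates $P(O = o \,|\, S, E)$ on the thus-far retrieved submodel (by (S3) in the Supplementary Information, the retrieved portion suffices for this evaluation):

```python
# Lemma 1 bounds: P(O|E) is a convex combination of the terms
# P(O | S, E) over realizations S, hence lies between their extremes.
from itertools import product

def lemma1_bounds(s_vars, values, cond):
    probs = [cond(dict(zip(s_vars, combo)))
             for combo in product(*(values[v] for v in s_vars))]
    return min(probs), max(probs)

# Reusing the chain sketch of Sec. 3 at stage 1, where S = {v}:
#   lemma1_bounds(["v"], {"v": [0, 1]}, lambda a: p_x_v[a["v"]][1])
#   -> (0.1, 0.8)
```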

It is natural to ask under what conditions the exact value of the posed query can be derived using the thus-far retrieved submodel (i.e., the submodel obtained during the identification of $\mathcal{S}_t$). The following remark bears on that.

Remark 1. If, for IT $t$, the corresponding $\mathcal{S}_t$ satisfies either: (i) $\mathcal{S}_t \subseteq \mathbf{E}$, or (ii) $Pa(\mathbf{s}) = \emptyset$ for all $\mathbf{s} \in \mathcal{S}_t$, or (iii) the lower and upper bounds given in (1) are identical, then the exact value of the posed query can be derived using the submodel retrieved in the process of obtaining $\mathcal{S}_t$. Fig. 2(d) shows a setting wherein conditions (i) and (iii) are both met.

The rationale behind Remark 1 is provided in the Supplementary Information.

4.1 Case Study

Next, we intend to cast the Hidden Markov Model (HMM) studied in (Icard & Goodman, 2015, p. 2) into our framework.

Figure 3: Left: The infinite-sized HMM discussed in (Icard & Goodman, 2015), with the parameterization adopted therein. Right: Applying PLIF to the HMM shown on the left. Vertical and horizontal axes denote, respectively, the value of the posed query and the adopted IT $t$. The vertical bars depict the intervals within which the query lies due to Lemma 1. The dotted curves, which connect the lower and upper bounds of the intervals, show how the intervals shrink as the IT decreases.

The setting is shown in Fig. 3(left). We adhere to the same parametrization and query adopted therein. All RVs in this section are binary, taking on values from the set $\{0, 1\}$; $x$ indicates the event wherein $\mathbf{x}$ takes the value 1, and $\bar{x}$ the event wherein $\mathbf{x}$ takes the value 0. We assume $t_0 = -\infty$.[10] The query of interest is the posterior probability of the target hidden variable taking the value 1, conditioned on the observed variables $\mathbf{E}$ all taking the value 1. Notice that after performing three steps of the sort discussed in the example presented in Sec. 3, the lower bound on the posed query exceeds 0.5 (shown by the red dashed line in Fig. 3(right)). This observation has the following intriguing implication. Assume, for the sake of argument, that we were presented with the following Maximum A-Posteriori (MAP) inference problem: upon observing all the variables in $\mathbf{E}$ taking on the value 1, what would be the most likely state for the target variable? Interestingly, we would be able to answer this MAP inference problem simply after three backward moves. In Fig. 3(right), the intervals within which the posed query falls (due to Lemma 1) are depicted in terms of the adopted IT.

[10] Note that the trend of the upper- and lower-bound curves, as well as the size of the intervals shown in Fig. 3(right), are insensitive to the particular choice of PLs for the variables; the assignment of PLs does not affect the presented results in any way.

Our analysis confirms Icard and Goodman's (2015) insight that even in the extreme case of having an infinite-sized relevant submodel (Fig. 3(left)), the portion of it the reasoner has to consult so as to obtain a "sufficiently good" answer to the posed query could happen to be very small (Fig. 3(right)).
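The sketch below reproduces the qualitative behavior of Fig. 3(right) under an assumed parameterization (the exact CPDs of Icard and Goodman's HMM are not reproduced here): conditioning on the hidden state k slices into the past and enumerating its two realizations yields nested intervals that tighten as the IT recedes further into the past.

```python
# Hidden chain ... -> x_k -> ... -> x_1 -> x_0, each x_i emitting y_i;
# all observed y's equal 1; query: P(x_0 = 1 | evidence). CPDs assumed.
T = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # P(x_{i-1} = b | x_i = a)
E = {0: 0.4, 1: 0.9}                             # P(y_i = 1 | x_i)

def bounds_at_depth(k: int) -> tuple[float, float]:
    """Bounds on the query once the IT-RS is the single node x_k."""
    vals = []
    for s in (0, 1):                   # enumerate realizations of x_k
        alpha = {0: float(s == 0), 1: float(s == 1)}
        for _ in range(k):             # forward pass toward x_0,
            alpha = {b: sum(alpha[a] * T[a][b] for a in (0, 1)) * E[b]
                     for b in (0, 1)}  # absorbing each evidence y_i = 1
        vals.append(alpha[1] / (alpha[0] + alpha[1]))
    return min(vals), max(vals)

for k in (1, 2, 3, 5, 8):
    lo, hi = bounds_at_depth(k)
    print(f"depth {k}: {lo:.4f} <= P(x_0=1 | evidence) <= {hi:.4f}")
# The nested intervals shrink as the IT recedes; once the lower bound
# exceeds 0.5, the MAP question above is already settled.
```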

5 Discussion

To our knowledge, PLIF is the first proposed inference framework that capitalizes on time to constrain the scope of causal reasoning over CBNs, where the term scope refers to the portion of a CBN over which inference is carried out. PLIF does not restrict itself to any particular inference scheme. The claim of PLIF is that inference should be confined within, and carried out over, retrieved submodels of the kind suggested by Lemma 1, so as to obtain the bounds reported therein. In this light, PLIF can accommodate all sorts of inference schemes, including Belief Propagation (BP) and sample-based inference methods using Markov Chain Monte Carlo (MCMC), two prominent classes of inference schemes proposed in the literature.[11]

For example, to cast BP into PLIF amounts to restricting BP's message-passing to within submodels of the kind suggested by Lemma 1. In other words, assuming that BP is adopted as the inference scheme, upon being presented with a causal query, an IT is selected (at the meta-level) by the reasoner, the corresponding submodel suggested by Lemma 1 is retrieved, and inference is carried out over it using BP. This yields lower and upper bounds on the query, as reported in Lemma 1. If time permits, the reasoner builds incrementally on the thus-far retrieved submodel so as to obtain tighter bounds on the query.[12] MCMC-based inference methods can be cast into PLIF in a similar fashion.

[11] MCMC-based methods have been successful in simulating important aspects of a wide range of cognitive phenomena and in giving accounts of many cognitive biases; cf. Sanborn & Chater (2016). Also, work in theoretical neuroscience has suggested mechanisms for how BP and MCMC-based methods could be realized in neural circuits; cf. Gershman & Beck (2016); Lochmann & Deneve (2011).
[12] The very property that the submodel gets constructed incrementally, in a nested fashion, guarantees that the obtained lower and upper bounds get tighter as the reasoner adopts smaller ITs; cf. Fig. 3(right).
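Schematically, the concurrent retrieve-and-infer loop just described looks as follows; choose of the IT schedule, retrieve_submodel, and run_inference are hypothetical placeholders for the meta-level IT selection, the bottom-up retrieval, and whatever inference scheme (BP, MCMC, etc.) the reasoner employs.

```python
# Anytime skeleton of PLIF: retrieval and inference proceed in tandem,
# with bounds tightening as the IT schedule recedes into the past.
def plif_anytime(query, it_schedule, retrieve_submodel, run_inference):
    submodel, bounds = None, (0.0, 1.0)
    for t in it_schedule:                          # decreasing ITs
        submodel = retrieve_submodel(submodel, t)  # nested, bottom-up extension
        bounds = run_inference(submodel, query)    # Lemma 1 bounds
        if bounds[0] == bounds[1]:                 # exact value reached
            break                                  # (cf. Remark 1)
    return bounds
```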

The problem of what parts of a CBN are relevant and what parts are irrelevant for a given query was, according to Geiger, Verma, and Pearl (1989), first addressed by Shachter (1988). The approaches proposed for identifying the relevant submodel for a given query fall into two broad categories (cf. Mahoney & Laskey (1998) and references therein): (i) top-down approaches, and (ii) bottom-up approaches. Top-down approaches start with full knowledge of the underlying CBN and, depending on the posed query, gradually prune the irrelevant parts of the CBN. In this respect, top-down approaches are inevitably from a "god's eye" point of view, a characteristic which undermines their cognitive plausibility. Bottom-up approaches, on the other hand, start at the variables involved in the posed query and move backwards until the boundaries of the underlying CBN are finally reached; only then do they start to prune the parts of the constructed submodel (if any) which can be safely removed without jeopardizing the exact computation of the posed query. It is important to note that bottom-up approaches cannot stop at intermediate steps during the backward move and run inference on the thus-far constructed submodel without running the risk of compromising some of the (in)dependence relations structurally encoded in the CBN, which would yield erroneous inferences. This is because there exists no local signal revealing how the thus-far retrieved nodes are positioned relative to each other and to the to-be-retrieved nodes, a shortcoming circumvented in PLIF by the introduction of PL. Another pitfall shared by both top-down and bottom-up approaches is their sequential methodology toward the task of inference, according to which the relevant submodel for the posed query is first constructed, and only then is inference carried out to compute the posed query.[13] Nonetheless, what both top-down and bottom-up approaches agree on is that the relevant submodel is to be identified first, should the reasoner intend to compute the posed query exactly or approximately. On the contrary, PLIF subscribes to what we call the concurrent approach to reasoning, whereby retrieval and inference take place in tandem. The HMM example analyzed in Sec. 4.1 shows the efficacy of the concurrent approach.

[13] The computation can be carried out to obtain either the exact value of the query or simply an approximation to it.

Work on causal judgment provides support for so-called alternative neglect, according to which subjects tend to neglect alternative causes to a much greater extent in predictive reasoning than in diagnostic reasoning (Fernbach & Rehder, 2013; Fernbach et al., 2011). Alternative neglect, therefore, implies that subjects would tend to ignore parts of the relevant submodel while constructing it. Recent findings, however, seem to cast doubt on alternative neglect (Cummins, 2014; Meder et al., 2014). Experiment 1 of Meder et al. (2014) demonstrates that subjects appropriately take into account alternative causes in predictive reasoning. Also, Cummins (2014) substantiates a two-part explanation of alternative neglect, according to which: (i) subjects interpret predictive queries as requests to estimate the probability of the effect when only the focal cause is present, an interpretation which renders alternative causes irrelevant, and (ii) the influence of inhibitory causes (i.e., disablers) on predictive judgment is underestimated, and this underestimation is incorrectly interpreted as neglect of alternative causes. Experiment 2 of Cummins (2014) shows that when predictive inference is queried in a manner that more accurately expresses the meaning of a noisy-OR Bayes net (i.e., the normative model adopted by Fernbach et al. (2011)), likelihood estimates approach normative estimates. Experiment 4 of Cummins (2014) shows that the impact of disablers on predictive judgments is far greater than that of alternative causes, while having little impact on diagnostic judgments. PLIF commits to the retrieval of enablers as well as disablers. As mentioned earlier, PLIF abstracts away from the inference algorithm operating on the retrieved submodel, and hence leaves it to the inference algorithm to decide how the retrieved enablers and disablers should be integrated. In this light, PLIF is consistent with the results of Experiment 4.

In an attempt to explain violations of screening-off reported in the literature, Park and Sloman (2013) find strong support for the contradiction hypothesis, followed by the mediating-mechanism hypothesis, and finally conclude that people do conform to screening-off once the causal structure they are using is correctly specified. PLIF is consistent with these findings, as it adheres to the assumption that reasoners carry out inference on their internal causal model (including all possible mediating variables and disablers), not the potentially incomplete one presented in the cover story; see also Rehder & Waldmann (2015); Sloman & Lagnado (2015).

Experiment 5 of Cummins (2014), consistent with Fernbach & Rehder (2013), shows that causal judgments are strongly influenced by memory retrieval/activation processes, and that both the number of disablers and the order of disabler retrieval matter in causal judgments. These findings suggest that the CFP and memory retrieval/activation are intimately linked. In that light, we next elaborate on the rationale behind adopting the term "retrieve" and using it interchangeably with the term "consult" throughout the paper; this is where we relate PLIF to the concepts of Long Term Memory (LTM) and Working Memory (WM) in psychology and neurophysiology. Specifically, we elaborate on how PLIF could be interpreted through the lenses of two influential models of WM, namely, Baddeley and Hitch's (1974) multi-component model of WM (M-WM) and Ericsson and Kintsch's (1995) Long-Term Working Memory (LTWM) model. The M-WM postulates that "long-term information is downloaded into a separate temporary store, rather than simply activated in LTM", a mechanism which permits WM to "manipulate and create new representations, rather than simply activating old memories" (Baddeley, 2003). Interpreting PLIF through the lens of the M-WM model amounts to the value of the IT being chosen (and, if time permits, updated so as to obtain tighter bounds) by the central executive, and the submodel being incrementally "retrieved" from LTM into the M-WM's episodic buffer. Interpreting PLIF through the lens of the LTWM model amounts to having no retrieval from LTM into WM; the submodel suggested by Lemma 1 is merely "activated in LTM" and, in that sense, simply "consulted" in LTM. In sum, PLIF is compatible with both of the narratives provided by the M-WM and LTWM models.

A number of predictions follow from PL and PLIF. For instance, PLIF makes the following prediction: prompted with a predictive or a diagnostic query (i.e., $P(e|c)$ or $P(c|e)$, respectively, for cause $\mathbf{c}$ and effect $\mathbf{e}$), subjects should not retrieve any of the effects of $\mathbf{e}$. Introspectively, this prediction seems plausible, and it can be tested, using an approach similar to Cummins (2014) and De Neys et al. (2003), by asking subjects to "think aloud" while engaging in predictive or diagnostic reasoning. Also, PL yields the following prediction: upon intervening on cause $\mathbf{c}$, subjects should be sensitive to when effect $\mathbf{e}$ will occur, even in settings where they are not particularly instructed to attend to such temporal patterns. This prediction is supported by recent findings suggesting that people do have expectations about the delay length between cause and effect (Greville & Buehner, 2010; Buehner & May, 2004).

There is a growing acknowledgment in the literature that not only are time and causality intimately linked, but that they mutually constrain each other in human cognition (Buehner, 2014). In line with this view, we see our work also as an attempt to formally articulate how time could guide and constrain causal reasoning in cognition. While many questions remain open, we hope to have made some progress towards a better understanding of the CFP at the algorithmic level.

Acknowledgments

We are grateful to Thomas Icard for valuable discussions. We would also like to thank Marcel Montrey and Peter Helfer for helpful comments on an earlier draft of this work. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under grant RGPIN 262017.

References

  • Baddeley, A. (2003). Working memory: Looking back and looking forward. Nature Reviews Neuroscience, 4(10), 829–839.
  • Baddeley, A. D., & Hitch, G. (1974). Working memory. Psychology of Learning and Motivation, 8, 47–89.
  • Bramley, N. R., Gerstenberg, T., & Lagnado, D. A. (2014). The order of things: Inferring causal structure from temporal patterns. In Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 236–241).
  • Buehner, M. J. (2014). Time and causality: Editorial. Frontiers in Psychology, 5, 228.
  • Buehner, M. J., & May, J. (2004). Abolishing the effect of reinforcement delay on human causal learning. Quarterly Journal of Experimental Psychology Section B, 57(2), 179–191.
  • Cummins, D. D. (2014). The impact of disablers on predictive inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(6), 1638.
  • De Neys, W., Schaeken, W., & d'Ydewalle, G. (2003). Inference suppression and semantic memory retrieval: Every counterexample counts. Memory & Cognition, 31(4), 581–595.
  • Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review, 102(2), 211.
  • Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). Asymmetries in predictive and diagnostic reasoning. Journal of Experimental Psychology: General, 140(2), 168–185.
  • Fernbach, P. M., & Rehder, B. (2013). Cognitive shortcuts in causal inference. Argument & Computation, 4(1), 64–88.
  • Fodor, J. A. (1987). Modules, frames, fridgeons, sleeping dogs, and the music of the spheres. In Z. Pylyshyn (Ed.), The Robot's Dilemma: The Frame Problem in Artificial Intelligence.
  • Geiger, D., Verma, T., & Pearl, J. (1989). d-separation: From theorems to algorithms. In Fifth Workshop on Uncertainty in Artificial Intelligence (pp. 118–125).
  • Gershman, S. J., & Beck, J. M. (2016). Complex probabilistic inference: From cognition to neural computation. In A. Moustafa (Ed.), Computational Models of Brain and Behavior.
  • Glymour, C. (1987). Android epistemology and the frame problem. In Z. Pylyshyn (Ed.), The Robot's Dilemma: The Frame Problem in Artificial Intelligence (pp. 65–75).
  • Gopnik, A., Glymour, C., Sobel, D. M., Schulz, L. E., Kushnir, T., & Danks, D. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111(1), 3–32.
  • Greville, W. J., & Buehner, M. J. (2010). Temporal predictability facilitates causal learning. Journal of Experimental Psychology: General, 139(4), 756.
  • Hume, D. (1748/1975). An Inquiry Concerning Human Understanding. Oxford University Press.
  • Icard, T. F., & Goodman, N. D. (2015). A resource-rational approach to the causal frame problem. In Proceedings of the 37th Annual Meeting of the Cognitive Science Society.
  • Lagnado, D. A., & Sloman, S. A. (2006). Time as a guide to cause. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(3), 451–460.
  • Lochmann, T., & Deneve, S. (2011). Neural processing as causal inference. Current Opinion in Neurobiology, 21(5), 774–781.
  • Mahoney, S. M., & Laskey, K. B. (1998). Constructing situation specific belief networks. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (pp. 370–378).
  • Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W. H. Freeman.
  • Meder, B., Mayrhofer, R., & Waldmann, M. R. (2014). Structure induction in diagnostic causal reasoning. Psychological Review, 121(3), 277.
  • Park, J., & Sloman, S. A. (2013). Mechanistic beliefs determine adherence to the Markov property in causal reasoning. Cognitive Psychology, 67(4), 186–216.
  • Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
  • Rehder, B., & Waldmann, M. R. (2015). Failures of explaining away and screening off in described versus experienced causal learning scenarios. Submitted for publication.
  • Sanborn, A. N., & Chater, N. (2016). Bayesian brains without probabilities. Trends in Cognitive Sciences, 20(12), 883–893.
  • Schlottmann, A. (1999). Seeing it happen and knowing how it works: How children understand the relation between perceptual causality and underlying mechanism. Developmental Psychology, 35(1), 303.
  • Shachter, R. D. (1988). Probabilistic inference and influence diagrams. Operations Research, 36(4), 589–604.
  • Simon, H. A. (1957). Models of Man. Wiley.
  • Sloman, S. A., & Lagnado, D. (2015). Causality in thought. Annual Review of Psychology, 66, 223–247.

Supplementary Information

S-I    Proof of Lemma 1:

Simple use of the total probability lemma yields:

$P(O|E) \;=\; \sum_{S \in val(\mathbf{S})} P(O|S, E)\, P(S|E). \qquad (S1)$

Equation (S1) immediately reveals a simple fact, namely, that $P(O|E)$ is a linear (indeed, convex) combination of the members of the set $\{P(O|S, E)\}_{S \in val(\mathbf{S})}$, an observation which grants the validity of the expression given in (1) in the main text.

The key point which is left to be shown is the following: (Q.1) why can the bounds given in (1) be computed using the submodel retrieved in the process of obtaining the IT-RS $\mathcal{S}_t$ for the adopted IT $t$? This is where the notion of PL comes into play. To articulate the intended line of reasoning, let us first introduce some notation. According to Def. 3, any chosen IT $t$ induces an IT-RS $\mathcal{S}_t$. Let us partition the set of evidence variables $\mathbf{E}$ into three mutually disjoint sets $\mathbf{E}_{\mathcal{S}}$, $\mathbf{E}_{>}$, and $\mathbf{E}_{<}$, where $\mathbf{E}_{\mathcal{S}}$ denotes the set of variables in $\mathbf{E}$ which belong to the IT-RS (i.e., $\mathbf{E}_{\mathcal{S}} = \mathbf{E} \cap \mathcal{S}_t$), $\mathbf{E}_{>}$ denotes the set of variables in $\mathbf{E}$ with PLs at least $t$, and finally, $\mathbf{E}_{<}$ denotes the set of variables in $\mathbf{E}$ which are neither in $\mathbf{E}_{\mathcal{S}}$ nor in $\mathbf{E}_{>}$ (i.e., $\mathbf{E}_{<} = \mathbf{E} \setminus (\mathbf{E}_{\mathcal{S}} \cup \mathbf{E}_{>})$). Note that, by construction, the PLs of the variables in $\mathbf{E}_{<}$ are less than the adopted IT $t$, hence the adopted notation. For example, for the setting depicted in Fig. 2(b) (corresponding to the IT $t_1$), $\mathbf{E}_{\mathcal{S}} = \mathbf{E}_{>} = \emptyset$ and $\mathbf{E}_{<} = \{\mathbf{y}\}$. Also, for the setting depicted in Fig. 2(d) (corresponding to the IT $t_3$), $\mathbf{E}_{\mathcal{S}} = \{\mathbf{y}\}$ and $\mathbf{E}_{>} = \mathbf{E}_{<} = \emptyset$. Next, we present a key result as a lemma.

Lemma S.1. Let $P(O|E)$ denote the posed causal query. For any chosen IT $t$ and its corresponding IT-RS $\mathcal{S}_t$, the following conditional independence relation holds:

$(\mathbf{O} \perp \mathbf{E}_{<} \;|\; \mathcal{S}_t, \mathbf{E}_{>}). \qquad (S2)$

Proof. The relations between the PLs of the variables involved in the statement (S2) ensure that, according to the d-separation criterion (Pearl, 1988), conditioning on the variables in $\mathcal{S}_t \cup \mathbf{E}_{>}$ blocks all the paths between the variables in $\mathbf{O}$ and $\mathbf{E}_{<}$, hence follows (S2).

The following two-part argument responds to the question posed in (Q.1) in the affirmative. First, notice that:

$P(O|S, E) \;=\; P(O|S, E_{\mathcal{S}}, E_{>}, E_{<}) \;=\; P(O|S, E_{\mathcal{S}}, E_{>}), \qquad (S3)$

where the second equality follows from Lemma S.1, upon noting that the realizations $S$ and $E_{\mathcal{S}}$ together instantiate the whole of $\mathcal{S}_t$. Second, note that the process of obtaining $\mathcal{S}_t$, namely, moving backwards from the variables in $\mathbf{O} \cup \mathbf{E}_{>}$ until $\mathcal{S}_t$ is reached, ensures that the submodel retrieved in this process suffices for the derivation of $P(O|S, E_{\mathcal{S}}, E_{>})$. Using the approach introduced in Geiger et al. (1989) for identifying the relevant information for the derivation of a query in a Bayesian network, this follows from the following fact: conditioned on $\mathcal{S}_t \cup \mathbf{E}_{>}$, the set $\mathbf{O}$ is d-separated from all the nodes in the ancestral graph of $\mathbf{O} \cup \mathbf{E}_{>}$ whose PLs are less than the adopted IT $t$. This completes the proof.

S-II    The Rationale behind Remark 1:

Cases (i) and (iii) immediately follow from Lemma 1 in the main text. Case (ii) implies that all the ancestors of the variables involved in the query are retrieved, hence the sufficiency of the retrieved submodel for the exact derivation of the query; see also Sec. S-III.

S-III    On the Special Case of Having $CPL = t_0$:

In such circumstances, to derive $P(O|E)$, the set of all the ancestors of the variables in $\mathbf{O} \cup \mathbf{E}$ should be retrieved, and inference should then be carried out on the retrieved submodel.