External Validity: From Do-Calculus to Transportability Across Populations

03/05/2015 · Judea Pearl, et al.

The generalizability of empirical findings to new environments, settings or populations, often called "external validity," is essential in most scientific explorations. This paper treats a particular problem of generalizability, called "transportability," defined as a license to transfer causal effects learned in experimental studies to a new population, in which only observational studies can be conducted. We introduce a formal representation called "selection diagrams" for expressing knowledge about differences and commonalities between populations of interest and, using this representation, we reduce questions of transportability to symbolic derivations in the do-calculus. This reduction yields graph-based procedures for deciding, prior to observing any data, whether causal effects in the target population can be inferred from experimental findings in the study population. When the answer is affirmative, the procedures identify what experimental and observational findings need be obtained from the two populations, and how they can be combined to ensure bias-free transport.


1 Introduction: Threats vs. Assumptions

Science is about generalization, and generalization requires that conclusions obtained in the laboratory be transported and applied elsewhere, in an environment that differs in many aspects from that of the laboratory.

Clearly, if the target environment is arbitrary, or drastically different from the study environment, nothing can be transferred and scientific progress will come to a standstill. However, the fact that most studies are conducted with the intention of applying the results elsewhere means that we usually deem the target environment sufficiently similar to the study environment to justify the transport of experimental results or their ramifications.

Remarkably, the conditions that permit such transport have not received systematic formal treatment. In statistical practice, problems related to combining and generalizing from diverse studies are handled by methods of meta-analysis (Glass (1976); Hedges and Olkin (1985); Owen (2009)) or hierarchical models (Gelman and Hill (2007)), in which results of diverse studies are pooled together by standard statistical procedures (e.g., inverse-variance reweighting in meta-analysis, partial pooling in hierarchical modeling). These methods rarely make an explicit distinction between experimental and observational regimes, and their performance is evaluated primarily by simulation.
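As a concrete illustration of the pooling step, here is a minimal sketch of fixed-effect, inverse-variance weighting; the study estimates and variances below are made-up numbers for illustration, and the snippet is ours, not part of the original methods.

```python
import numpy as np

# Fixed-effect meta-analysis: study i reports an effect estimate b[i]
# with sampling variance v[i]; the pooled estimate weights each study
# by the inverse of its variance.
b = np.array([0.42, 0.55, 0.38, 0.61])   # hypothetical study estimates
v = np.array([0.01, 0.04, 0.02, 0.05])   # their sampling variances

w = 1.0 / v                              # inverse-variance weights
pooled = (w * b).sum() / w.sum()
pooled_se = (1.0 / w.sum()) ** 0.5       # standard error of the pooled estimate
print(pooled, pooled_se)
```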

To supplement these methodologies, our paper provides theoretical guidance in the form of limits on what can be achieved in practice, what problems are likely to be encountered when populations differ significantly from each other, what population differences can be circumvented by clever design and what differences constitute theoretical impediments, prohibiting generalization by any means whatsoever.

On the theoretical front, the standard literature on this topic, falling under rubrics such as "external validity" (Campbell and Stanley (1963); Manski (2007)), "heterogeneity" (Höfler, Gloster and Hoyer (2010)) and "quasi-experiments" (Shadish, Cook and Campbell (2002), Chapter 3; Adelman (1991)),[1] consists primarily of "threats," namely, explanations of what may go wrong when we try to transport results from one study to another while ignoring their differences. Rarely do we find an analysis of "licensing assumptions," namely, formal conditions under which the transport of results across differing environments or populations is licensed from first principles.[2]

[1] Manski (2007) defines "external validity" as follows: "An experiment is said to have 'external validity' if the distribution of outcomes realized by a treatment group is the same as the distribution of outcomes that would be realized in an actual program." Campbell and Stanley (1963), page 5, take a slightly broader view: "'External validity' asks the question of generalizability: to what populations, settings, treatment variables, and measurement variables can this effect be generalized?"

[2] Hernán and VanderWeele (2011) studied such conditions in the context of compound treatments, where we seek to predict the effect of one version of a treatment from experiments with a different version. Their analysis is a special case of the theory developed in this paper (Petersen (2011)). A related application is reported in Robins, Orellana and Rotnitzky (2008), where a treatment strategy is extrapolated between two biologically similar populations under different observational regimes.

The reasons for this asymmetry are several. First, threats are safer to cite than assumptions. He who cites “threats” appears prudent, cautious and thoughtful, whereas he who seeks licensing assumptions risks suspicions of attempting to endorse those assumptions.

Second, assumptions are self-destructive in their honesty. The more explicit the assumption, the more criticism it invites, for it tends to trigger a richer space of alternative scenarios in which the assumption may fail. Researchers prefer therefore to declare threats in public and make assumptions in private.

Third, whereas threats can be communicated in plain English, supported by anecdotal pointers to familiar experiences, assumptions require a formal language within which the notion “environment” (or “population”) is given precise characterization, and differences among environments can be encoded and analyzed.

The advent of causal diagrams (Wright (1921); Heise (1975); Davis (1984); Verma and Pearl (1988); Spirtes, Glymour and Scheines (1993); Pearl (1995)) together with models of interventions (Haavelmo (1943); Strotz and Wold (1960)) and counterfactuals (Neyman (1923); Rubin (1974); Robins (1986); Balke and Pearl (1995)) provides such a language and renders the formalization of transportability possible.

Armed with this language, this paper departs from the tradition of communicating “threats” and embarks instead on the task of formulating “licenses to transport,” namely, assumptions that, if they held true, would permit us to transport results across studies.

In addition, the paper uses the inferential machinery of the do-calculus (Pearl (1995); Koller and Friedman (2009); Huang and Valtorta (2006); Shpitser and Pearl (2006)) to derive algorithms for deciding whether transportability is feasible and how experimental and observational findings can be combined to yield unbiased estimates of causal effects in the target population.

The paper is organized as follows. In Section 2, we review the foundations of structural equations modeling (SEM), the question of identifiability and the do-calculus that emerges from these foundations. (This section can be skipped by readers familiar with these concepts and tools.) In Section 3, we motivate the question of transportability through simple examples, and illustrate how the solution depends on the causal story behind the problem. In Section 4, we formally define the notion of transportability and reduce it to a problem of symbolic transformations in do-calculus. In Section 5, we provide a graphical criterion for deciding transportability and estimating transported causal effects. We conclude in Section 6 with brief discussions of related problems of external validity, including statistical transportability and meta-analysis.

2 Preliminaries: The Logical Foundations of Causal Inference

The tools presented in this paper were developed in the context of nonparametric Structural Equation Models (SEM), which is one among several approaches to causal inference and goes back to Haavelmo (1943) and Strotz and Wold (1960). Other approaches include, for example, potential outcomes (Rubin (1974)), Structured Tree Graphs (Robins (1986)), the decision-analytic approach (Dawid (2002)), Causal Bayesian Networks (Spirtes, Glymour and Scheines (2000); Pearl (2000), Chapter 1; Bareinboim, Brito and Pearl (2012)) and Settable Systems (White and Chalak (2009)). We will first describe the generic features common to all such approaches, and then summarize how these features are represented in SEM.[3] While comparisons of the various approaches lie beyond the scope of this paper, we nevertheless propose that their merits be judged by the extent to which each facilitates the functions described below.

[3] We use the acronym SEM for both parametric and nonparametric representations though, historically, SEM practitioners preferred the former (Bollen and Pearl (2013)). Pearl (2011) has used the term Structural Causal Models (SCM) to eliminate this confusion.

2.1 Causal Models as Inference Engines

From a logical viewpoint, causal analysis relies on causal assumptions that cannot be deduced from (nonexperimental) data. Thus, every approach to causal inference must provide a systematic way of encoding, testing and combining these assumptions with data. Accordingly, we view causal modeling as an inference engine that takes three inputs and produces three outputs. The inputs are:

  I-1. A set A of qualitative causal assumptions which the investigator is prepared to defend on scientific grounds, and a model M_A that encodes these assumptions mathematically. (In SEM, M_A takes the form of a diagram or a set of unspecified functions. A typical assumption is that no direct effect exists between a pair of variables (known as an exclusion restriction), or that an omitted factor, represented by an error term, is independent of other such factors, observed or unobserved, known as well as unknown.)

  I-2. A set Q of queries concerning causal or counterfactual relationships among variables of interest. In linear SEM, Q concerned the magnitudes of structural coefficients but, in general, Q may address causal relations directly, for example:

    Q1: What is the effect of treatment X on outcome Y?

    Q2: Is this employer practicing gender discrimination?

    In principle, each query Q_i in Q should be "well defined," that is, computable from any fully specified model M compatible with A. (See Definition 2.2 for a formal characterization of a model, and also Section 2.4 for the problem of identification in partially specified models.)

  I-3. A set D of experimental or nonexperimental data, governed by a joint probability distribution P presumably consistent with A.

Figure 1: Causal analysis depicted as an inference engine converting assumptions (A), queries (Q) and data (D) into logical implications (A*), conditional claims (C) and data-fitness indices (g(T)).

The outputs are:

  O-1. A set A* of statements which are the logical implications of A, separate from the data at hand. For example, that X has no effect on Y if we hold Z constant, or that Z is an instrument relative to {X, Y}.

  O-2. A set C of data-dependent claims concerning the magnitudes or likelihoods of the target queries in Q, each contingent on A. C may contain, for example, the estimated mean and variance of a given structural parameter, or the expected effect of a given intervention. Auxiliary to C, a causal model should also yield an estimand Q_i(P) for each query in Q, or a determination that Q_i is not identifiable from P (Definition 2.4).

  O-3. A list T of testable statistical implications of A (which may or may not be part of O-2), and the degree g(T_i), T_i in T, to which the data agrees with each of those implications. A typical implication would be a conditional independence assertion, or an equality constraint between two probabilistic expressions. Testable constraints should be read from the model M_A (see Definition 2.4), and used to confirm or disconfirm the model against the data.

The structure of this inferential exercise is shown schematically in Figure 1. For a comprehensive review on methodological issues, see Pearl (2009a, 2012a).

2.2 Assumptions in Nonparametric Models

A structural equation model (SEM) is defined as follows.

Definition 2.2 (Structural equation model (Pearl (2000), page 203)). A structural equation model M consists of:

  1. A set U of background or exogenous variables, representing factors outside the model, which nevertheless affect relationships within the model.

  2. A set V = {V_1, ..., V_n} of endogenous variables, assumed to be observable. Each of these variables is functionally dependent on some subset of U ∪ V.

  3. A set F of functions {f_1, ..., f_n} such that each f_i determines the value of V_i ∈ V, v_i = f_i(v, u).

  4. A joint probability distribution P(u) over U.

Figure 2: The diagrams associated with (a) the structural model of equation (1) and (b) the modified model of equation (2), representing the intervention do(X = x_0).

A simple SEM is depicted in Figure 2(a), which represents the following three functions:

  z = f_Z(u_Z),
  x = f_X(z, u_X),   (1)
  y = f_Y(x, u_Y),

where, in this particular example, U_Z, U_X and U_Y are assumed to be jointly independent but otherwise arbitrarily distributed. Whenever dependence exists between any two exogenous variables, a bidirected arrow will be added to the diagram to represent this dependence (e.g., Figure 4).[4] Each of these functions represents a causal process (or mechanism) that determines the value of the left variable (output) from the values of the right variables (inputs), and is assumed to be invariant unless explicitly intervened on. The absence of a variable from the right-hand side of an equation encodes the assumption that nature ignores that variable in the process of determining the value of the output variable. For example, the absence of variable Z from the arguments of f_Y conveys the empirical claim that variations in Z will leave Y unchanged, as long as variables U_Y and X remain constant.

[4] More precisely, the absence of bidirected arrows implies marginal independences of the respective exogenous variables. In other words, the set of all bidirected edges constitutes an i-map of P(u) (Richardson (2003)).

It is important to distinguish between a fully specified model, in which P(u) and the collection of functions F are specified, and a partially specified model, usually in the form of a diagram G. The former entails one and only one observational distribution P(v); the latter entails a set of observational distributions P(v) that are compatible with the graph (those that can be generated by specifying ⟨F, P(u)⟩).

2.3 Representing Interventions, Counterfactuals and Causal Effects

This feature of invariance permits us to derive powerful claims about causal effects and counterfactuals, even in nonparametric models, where all functions and distributions remain unknown. This is done through a mathematical operator called do(x), which simulates physical interventions by deleting certain functions from the model, replacing them with a constant X = x, while keeping the rest of the model unchanged (Haavelmo (1943); Strotz and Wold (1960); Pearl (2014)). For example, to emulate an intervention do(x_0) that sets X to a constant x_0 in the model M of Figure 2(a), the equation for x in equation (1) is replaced by x = x_0, and we obtain a new model, M_{x_0},

  z = f_Z(u_Z),
  x = x_0,   (2)
  y = f_Y(x, u_Y),

the graphical description of which is shown in Figure 2(b).

The joint distribution associated with this modified model, denoted P(z, y | do(x_0)), describes the post-intervention distribution of variables Y and Z (also called the "controlled" or "experimental" distribution), to be distinguished from the preintervention distribution, P(x, y, z), associated with the original model of equation (1). For example, if X represents a treatment variable, Y a response variable and Z some covariate that affects the amount of treatment received, then the distribution P(z, y | do(x_0)) gives the proportion of individuals that would attain response level Y = y and covariate level Z = z under the hypothetical situation in which treatment X = x_0 is administered uniformly to the population.[5]

[5] Equivalently, P(z, y | do(x_0)) can be interpreted as the joint probability of (Z = z, Y = y) under a randomized experiment among units receiving treatment level X = x_0. Readers versed in potential-outcome notation may interpret P(y | do(x)) as the probability P(Y_x = y), where Y_x is the potential outcome under treatment X = x.

In general, we can formally define the postintervention distribution by the equation

  P_M(y | do(x)) = P_{M_x}(y).   (3)

In words, in the framework of model M, the postintervention distribution of outcome Y is defined as the probability that the modified model M_x assigns to each outcome level Y = y. From this distribution, which is readily computed from any fully specified model M, we are able to assess treatment efficacy by comparing aspects of this distribution at different levels of x_0.[6]

[6] Counterfactuals are defined similarly through the equation Y_x(u) = Y_{M_x}(u) (see Pearl (2009b), Chapter 7), but will not be needed for the discussions in this paper.
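To make the surgical definition concrete, the following sketch (our own toy parameterization of Figure 2(a), not taken from the paper) simulates one fully specified model M and its modified version M_{x_0}. It illustrates that the intervention leaves P(z) untouched, P(z | do(x_0)) = P(z), whereas conditioning on X = x_0 does change the law of Z, since Z is a cause of X.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

def run(do_x=None):
    """Sample the model of Figure 2(a): z = f_Z(u_Z), x = f_X(z, u_X),
    y = f_Y(x, u_Y). Passing do_x replaces f_X with a constant,
    mimicking the surgical construction of M_{x0} in equation (2)."""
    u_z = rng.random(n) < 0.5
    u_x = rng.random(n) < 0.2
    u_y = rng.random(n) < 0.1
    z = u_z
    x = (z ^ u_x) if do_x is None else np.full(n, do_x, dtype=bool)
    y = x ^ u_y
    return z, x, y

z, x, y = run()              # preintervention (observational) world
z1, _, _ = run(do_x=True)    # postintervention world, x0 = 1

# P(z | do(x0)) equals P(z): the intervention leaves f_Z untouched ...
print(z1.mean(), z.mean())   # both close to 0.5

# ... whereas conditioning on X = x0 alters the law of Z, since Z
# is a cause of X.
print(z[x].mean())           # close to 0.8, not 0.5
```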

2.4 Identification, d-Separation and Causal Calculus

A central question in causal analysis is the question of identification of causal queries (e.g., the effect of intervention do(X = x)) from a combination of data and a partially specified model, for example, when only the graph G is given and neither the functions F nor the distribution of U. In linear parametric settings, the question of identification reduces to asking whether some model parameter, β, has a unique solution in terms of the parameters of P (say, the population covariance matrix). In the nonparametric formulation, the notion of "has a unique solution" does not directly apply, since quantities such as Q = P(y | do(x)) have no parametric signature and are defined procedurally by simulating an intervention in a causal model M, as in equation (3). The following definition captures the requirement that Q be estimable from the data:

Definition 2.4 (Identifiability). A causal query Q(M) is identifiable, given a set of assumptions A, if for any two (fully specified) models, M_1 and M_2, that satisfy A, we have[7]

  P(M_1) = P(M_2) implies Q(M_1) = Q(M_2).   (4)

In words, the functional details of M_1 and M_2 do not matter; what matters is that the assumptions in A (e.g., those encoded in the diagram) would constrain the variability of those details in such a way that equality of P's would entail equality of Q's. When this happens, Q depends on P only, and should therefore be expressible in terms of the parameters of P.

[7] An implication similar to (4) is used in the standard statistical definition of parameter identification, where it conveys the uniqueness of a parameter set θ given a distribution P_θ (Lehmann and Casella (1998)). To see the connection, one should think about the query Q as a function Q(θ), where θ is the pair ⟨F, P(u)⟩ that characterizes a fully specified model M.

When a query Q is given in the form of a do-expression, for example, Q = P(y | do(x), z), its identifiability can be decided systematically using an algebraic procedure known as the do-calculus (Pearl (1995)). It consists of three inference rules that permit us to map interventional and observational distributions whenever certain conditions hold in the causal diagram G.

The conditions that permit the application of these inference rules can be read off the diagrams using a graphical criterion known as d-separation (Pearl (1988)).

Definition (d-separation). A set S of nodes is said to block a path p if either

  1. p contains at least one arrow-emitting node that is in S, or

  2. p contains at least one collision node that is outside S and has no descendant in S.

If S blocks all paths from set X to set Y, it is said to "d-separate X and Y," and then, it can be shown that variables X and Y are independent given S, written (X ⊥⊥ Y | S).[8]

[8] See Hayduk et al. (2003), Glymour and Greenland (2008) and Pearl (2009b), page 335, for a gentle introduction to d-separation.

D-separation reflects conditional independencies that hold in any distribution that is compatible with the causal assumptions embedded in the diagram. To illustrate, the path U_Z → Z → X → Y in Figure 2(a) is blocked by S = {Z} and by S = {X}, since each emits an arrow along that path. Consequently, we can infer that the conditional independencies (U_Z ⊥⊥ Y | Z) and (U_Z ⊥⊥ Y | X) will be satisfied in any probability function that this model can generate, regardless of how we parameterize the arrows. Likewise, the path U_Z → Z → X ← U_X is blocked by the null set {∅}, but it is not blocked by S = {Y}, since Y is a descendant of the collision node X. Consequently, the marginal independence (U_Z ⊥⊥ U_X) will hold in the distribution, but (U_Z ⊥⊥ U_X | Y) may or may not hold.[9]

[9] This special handling of collision nodes (or colliders, e.g., Z → X ← U_X) reflects a general phenomenon known as Berkson's paradox (Berkson (1946)), whereby observations on a common consequence of two independent causes render those causes dependent. For example, the outcomes of two independent coins are rendered dependent by the testimony that at least one of them is a tail.
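The collider phenomenon of footnote 9 is easy to reproduce numerically. In the toy model below (our own construction), two independent binary causes feed a common consequence; the causes are uncorrelated marginally but become dependent once we select on their consequence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two jointly independent binary causes.
u_z = rng.integers(0, 2, n)
u_x = rng.integers(0, 2, n)

# Their common consequence (a collider): "at least one cause is on".
x = u_z | u_x

# Marginally, the causes are independent ...
print(np.corrcoef(u_z, u_x)[0, 1])            # close to 0

# ... but selecting on the collider induces dependence (Berkson's
# paradox): among units with x == 1, learning u_z == 0 forces u_x == 1.
sel = x == 1
print(np.corrcoef(u_z[sel], u_x[sel])[0, 1])  # clearly negative
```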

2.5 The Rules of do-Calculus

Let X, Y, Z and W be arbitrary disjoint sets of nodes in a causal DAG G. We denote by G_X̄ the graph obtained by deleting from G all arrows pointing to nodes in X. Likewise, we denote by G_X̲ the graph obtained by deleting from G all arrows emerging from nodes in X. To represent the deletion of both incoming and outgoing arrows, we use the notation G_X̄Z̲.

Figure 3: Causal diagrams depicting Examples 1–3. In (a), Z represents "age." In (b), Z represents "linguistic skills" while age (in hollow circle) is unmeasured. In (c), Z represents a biological marker situated between the treatment X and a disease Y.

The following three rules are valid for every interventional distribution compatible with G:

Rule 1 (Insertion/deletion of observations):

  P(y | do(x), z, w) = P(y | do(x), w) if (Y ⊥⊥ Z | X, W) holds in G_X̄.   (5)

Rule 2 (Action/observation exchange):

  P(y | do(x), do(z), w) = P(y | do(x), z, w) if (Y ⊥⊥ Z | X, W) holds in G_X̄Z̲.   (6)

Rule 3 (Insertion/deletion of actions):

  P(y | do(x), do(z), w) = P(y | do(x), w) if (Y ⊥⊥ Z | X, W) holds in G_X̄Z̄(W),   (7)

where Z(W) is the set of Z-nodes that are not ancestors of any W-node in G_X̄.

To establish identifiability of a query Q, one needs to repeatedly apply the rules of do-calculus to Q, until the final expression no longer contains a do-operator;[10] this renders it estimable from nonexperimental data. The do-calculus was proven to be complete for the identifiability of causal effects in the form Q = P(y | do(x), z) (Shpitser and Pearl (2006); Huang and Valtorta (2006)), which means that if Q cannot be expressed in terms of the probability of observables by repeated application of these three rules, such an expression does not exist. In other words, the query is not estimable from observational studies without making further assumptions, for example, linearity, monotonicity, additivity, absence of interactions, etc.

[10] Such derivations are illustrated in graphical detail in Pearl (2009b), page 87.
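As an illustration of what such a reduction buys, consider the confounded diagram Z → X → Y with Z → Y, for which the do-calculus reduces P(y | do(x)) to the do-free "back-door" expression Σ_z P(y | x, z)P(z). The simulation below (a made-up parameterization of ours) verifies the reduction and contrasts it with naive conditioning.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# A confounded model: Z -> X, Z -> Y and X -> Y.
z = rng.random(n) < 0.3
x = rng.random(n) < np.where(z, 0.8, 0.2)      # f_X(z, u_X)
y = rng.random(n) < 0.1 + 0.5 * x + 0.3 * z    # f_Y(x, z, u_Y)

# Ground truth by surgery: set X = 1 for everyone, keep f_Z, f_Y intact.
truth = (rng.random(n) < 0.1 + 0.5 + 0.3 * z).mean()

# The do-free expression produced by the do-calculus (back-door):
# P(y | do(x)) = sum_z P(y | x, z) P(z).
est = sum(y[x & (z == v)].mean() * (z == v).mean() for v in (True, False))

naive = y[x].mean()            # plain conditioning, biased upward here
print(truth, est, naive)
```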

We shall see that, to establish transportability, the goal will be different; instead of eliminating do-operators from the query expression, we will need to separate them from a set of variables S that represent disparities between populations.

3 Inference Across Populations: Motivating Examples

To motivate the treatment of Section 4, we first demonstrate some of the subtle questions that transportability entails through three simple examples, informally depicted in Figure 3.

Example 1. Consider the graph in Figure 3(a), which represents cause-effect relationships in the pretreatment population in Los Angeles (LA). We conduct a randomized trial in LA and estimate the causal effect of exposure X on outcome Y for every age group Z = z.[11],[12] We now wish to generalize the results to the population of New York City (NYC), but data alert us to the fact that the study distribution P(x, y, z) in LA is significantly different from the one in NYC (call the latter P*(x, y, z)). In particular, we notice that the average age in NYC is significantly higher than that in LA. How are we to estimate the causal effect of X on Y in NYC, denoted P*(y | do(x))?

[11] Throughout the paper, each graph represents the causal structure of the population prior to the treatment, hence X stands for the level of treatment taken by an individual out of free choice.

[12] The arrow from Z to X represents the tendency of older people to seek treatment more often than younger people, and the arrow from Z to Y represents the effect of age on the outcome.

Our natural inclination would be to assume that age-specific effects are invariant across cities, so that if the LA study provides us with (estimates of) the age-specific causal effects P(y | do(x), z), the overall causal effect in NYC should be

  P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z).   (8)

This transport formula combines experimental results obtained in LA, P(y | do(x), z), with observational aspects of the NYC population, P*(z), to obtain an experimental claim P*(y | do(x)) about NYC.[13]

[13] At first glance, equation (8) may be regarded as a routine application of "standardization" or "recalibration"—a statistical extrapolation method that can be traced back to a century-old tradition in demography and political arithmetic (Westergaard (1916); Yule (1934); Lane and Nelder (1982)). On second thought, it raises the deeper question of why we consider age-specific effects to be invariant across populations. See the discussion following Example 2.
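A quick simulation (with made-up parameters of ours, and assuming the invariance just described) shows equation (8) at work: age-specific effects estimated in a simulated LA trial, reweighted by the NYC age distribution, reproduce the result of a hypothetical trial run in NYC itself.

```python
import numpy as np

rng = np.random.default_rng(3)

def population(p_old, n=1_000_000, do_x=None):
    """Toy version of Example 1: age (z) drives both treatment choice
    and outcome, and the age-specific response is the same everywhere;
    only the age distribution p_old differs across cities."""
    z = rng.random(n) < p_old                       # z = 1: "old"
    x = (rng.random(n) < np.where(z, 0.7, 0.3)
         if do_x is None else np.full(n, do_x, dtype=bool))
    y = rng.random(n) < 0.2 + 0.4 * x + 0.2 * z     # invariant mechanism
    return z, x, y

# Randomized trial in LA (P(z = old) = 0.3): age-specific effects.
z, x, y = population(0.3, do_x=True)
p_y_do = {v: y[z == v].mean() for v in (False, True)}

# Target population NYC is older: P*(z = old) = 0.6.
p_star_old = 0.6
transport = p_y_do[True] * p_star_old + p_y_do[False] * (1 - p_star_old)

# Ground truth: a (hypothetical) trial run in NYC itself.
_, _, y_nyc = population(p_star_old, do_x=True)
print(transport, y_nyc.mean())   # the two agree closely
```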

Our first task in this paper will be to explicate the assumptions that render this extrapolation valid. We ask, for example, what must we assume about other confounding variables besides age, both latent and observed, for equation (8) to be valid? Would the same transport formula hold if Z were not age, but some proxy for age, say, language proficiency? More intricate yet, what if Z stood for an exposure-dependent variable, say hypertension level, that stands between X and Y?

Let us examine the proxy issue first.

Example 2. Let the variable Z in Example 1 stand for subjects' language proficiency, and let us assume that Z does not affect exposure X or outcome Y, yet it correlates with both, being a proxy for age, which is not measured in either study [see Figure 3(b)]. Given the observed disparity P(z) ≠ P*(z), how are we to estimate the causal effect P*(y | do(x)) for the target population of NYC from the z-specific causal effects P(y | do(x), z) estimated in the study population of LA?

The inequality P(z) ≠ P*(z) in this example may reflect either age differences or differences in the way that Z correlates with age. If the two cities enjoy identical age distributions and NYC residents acquire linguistic skills at a younger age, then, since Z has no effect whatsoever on X and Y, the inequality can be ignored and, intuitively, the proper transport formula would be

  P*(y | do(x)) = P(y | do(x)).   (9)

If, on the other hand, the conditional probabilities P(z | age) and P*(z | age) are the same in both cities, and the inequality P(z) ≠ P*(z) reflects genuine age differences, equation (9) is no longer valid, since the age difference may be a critical factor in determining how people react to X. We see, therefore, that the choice of the proper transport formula depends on the causal context in which population differences are embedded.

This example also demonstrates why the invariance of z-specific causal effects should not be taken for granted. While justified in Example 1, with Z = age, it fails in Example 2, in which Z was equated with "language skills." Indeed, using Figure 3(b) for guidance, the z-specific effect of X on Y in NYC is given by

  P*(y | do(x), z) = Σ_age P*(y | do(x), z, age) P*(age | do(x), z)
                   = Σ_age P(y | do(x), age) P*(age | z).

Thus, if the two populations differ in the relation between age and skill, that is,

  P(age | z) ≠ P*(age | z),

the skill-specific causal effect would differ as well.

The intuition is clear. A NYC person at skill level Z = z is likely to be in a totally different age group from his skill-equals in Los Angeles and, since it is age, not skill, that shapes the way individuals respond to treatment, it is only reasonable that Los Angeles residents would respond differently to treatment than their NYC counterparts at the very same skill level.

The essential difference between Examples 1 and 2 is that age is normally taken to be an exogenous variable (not assigned by other factors in the model), while skills may be indicative of earlier factors (age, education, ethnicity) capable of modifying the causal effect. Therefore, conditional on skill, the effect may be different in the two populations.
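The contrast between Examples 1 and 2 can be checked numerically as well. In the sketch below (our own parameterization of Figure 3(b)), the two populations share the same age distribution but differ in how skill depends on age; the overall effect then transports directly, as in equation (9), while the skill-specific effects do not.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000_000

def sample(star, do_x=None):
    """Toy version of Figure 3(b): age is unmeasured and drives both
    treatment choice and outcome; skill z is a pure proxy for age.
    The populations share P(age) but differ in P(z | age)."""
    age = rng.random(n) < 0.4                        # age = 1: "old"
    z = rng.random(n) < np.where(age, 0.9, 0.5 if star else 0.2)
    x = (rng.random(n) < np.where(age, 0.7, 0.3)
         if do_x is None else np.full(n, do_x, dtype=bool))
    y = rng.random(n) < 0.2 + 0.3 * x + 0.3 * age
    return z, y

# Overall effects under do(X = 1) coincide: equation (9) is exact here.
z_la, y_la = sample(star=False, do_x=True)
z_ny, y_ny = sample(star=True, do_x=True)
print(y_la.mean(), y_ny.mean())               # equal up to sampling noise

# Skill-specific effects do NOT transport, since P(age | z) differs.
print(y_la[z_la].mean(), y_ny[z_ny].mean())   # visibly different
```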

Example 3. Examine the case where Z is an X-dependent variable, say a disease bio-marker, standing on the causal pathway between X and Y as shown in Figure 3(c). Assume further that the disparity P(z) ≠ P*(z) is discovered and that, again, both the average and the z-specific causal effects P(y | do(x), z) are estimated in the LA experiment, for all levels of X and Z. Can we, based on the information given, estimate the average (or z-specific) causal effect in the target population of NYC?

Here, equation (8) is wrong because the overall causal effect (in both LA and NYC) is no longer a simple average of the z-specific causal effects. The correct weighting rule is

  P(y | do(x)) = Σ_z P(y | do(x), z) P(z | do(x)),   (10)

which reduces to (8) only in the special case where Z is unaffected by X. Equation (9) is also wrong because we can no longer argue, as we did in Example 2, that Z does not affect Y and hence can be ignored. Here, Z lies on the causal pathway between X and Y, so, clearly, it affects their relationship. What then is the correct transport formula for this scenario?

To cast this example in a more realistic setting, let us assume that we wish to use Z as a "surrogate endpoint" to predict the efficacy of treatment X on outcome Y, where Y is too difficult and/or expensive to measure routinely (Prentice (1989); Ellenberg and Hamilton (1989)). Thus, instead of considering experimental and observational studies conducted at two different locations, we consider two such studies taking place at the same location, but at different times. In the first study, we measure Y and discover that Z is a good surrogate, namely, knowing the effect of treatment on Z allows prediction of the effect of treatment on the more clinically relevant outcome Y (Joffe and Greene (2009)). Once Z is proclaimed a "surrogate endpoint," it invites efforts to find direct means of controlling Z. For example, if cholesterol level is found to be a predictor of heart disease in a long-run trial, drug manufacturers would rush to offer cholesterol-reducing substances for public consumption. As a result, both the prior probability P(z) and the treatment-dependent probability P(z | do(x)) would undergo a change, resulting in P*(z) and P*(z | do(x)), respectively.

We now wish to reassess the effect of the drug, P*(y | do(x)), in the new population and do it in the cheapest possible way, namely, by conducting an observational study to estimate P*(z, y | x), acknowledging that confounding exists between X and Y and that the drug affects Y both directly and through Z, as shown in Figure 3(c).

Using a graphical representation to encode the assumptions articulated thus far, and further assuming that the disparity observed stems only from a difference in the way Z responds to X (and not from a change in some unobservable confounder), we will prove in Section 5 that the correct transport formula should be

  P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z | x),   (11)

which is different from both (8) and (9). It calls instead for the z-specific effects to be reweighted by the conditional probability P*(z | x), estimated in the target population.[14]

[14] Quite often the possibility of running a second randomized experiment to estimate P*(z | do(x)) is also available to investigators, though at a higher cost. In such cases, a transport formula would be derivable under more relaxed assumptions, for example, allowing for X and Z to be confounded.
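Equation (11) can likewise be verified by simulation. The sketch below builds a made-up, fully specified pair of models consistent with Figure 4(c): a latent u confounds X and Y, the populations differ only in how Z responds to X, and the z-specific experimental effects from the source population, reweighted by P*(z | x) estimated observationally in the target, recover the target effect P*(y | do(x)).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000_000

def sample(star, do_x=None):
    """Toy version of Figure 4(c): a latent u confounds X and Y, X
    affects Y directly and through Z, and the populations (star=False:
    source, star=True: target) differ only in how Z responds to X."""
    u = rng.random(n) < 0.5                          # latent confounder
    x = (rng.random(n) < np.where(u, 0.8, 0.2)
         if do_x is None else np.full(n, do_x, dtype=bool))
    p_z = 0.9 if star else 0.5                       # the S -> Z discrepancy
    z = rng.random(n) < np.where(x, p_z, 0.1)
    y = rng.random(n) < 0.1 + 0.3 * x + 0.3 * z + 0.2 * u
    return x, z, y

# Experimental study in the source population: P(y | do(x), z).
_, z, y = sample(star=False, do_x=True)
p_y_do_z = {v: y[z == v].mean() for v in (False, True)}

# Observational study in the target population: P*(z | x).
xs, zs, _ = sample(star=True)
p_z_x = zs[xs].mean()                                # P*(z = 1 | x = 1)

transport = p_y_do_z[True] * p_z_x + p_y_do_z[False] * (1 - p_z_x)

# Ground truth: a hypothetical experiment in the target itself.
_, _, yt = sample(star=True, do_x=True)
print(transport, yt.mean())   # the two agree closely
```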

To see how the transportability problem fits into the general scheme of causal analysis discussed in Section 2.1 (Figure 1), we note that, in our case, the data come from two sources, experimental (from the study population) and nonexperimental (from the target population), the assumptions A are encoded in the form of selection diagrams, and the query stands for the causal effect of interest (e.g., R = P*(y | do(x))). Although this paper does not discuss the goodness-of-fit problem, standard methods are available for testing the compatibility of the selection diagram with the data available.

4 Formalizing Transportability

4.1 Selection Diagrams and Selection Variables

The pattern that emerges from the examples discussed in Section 3 indicates that transportability is a causal, not statistical, notion. In other words, the conditions that license transport, as well as the formulas through which results are transported, depend on the causal relations between the variables in the domain, not merely on their statistics. For instance, it was important in Example 3 to ascertain that the change in P(z | x) was due to a change in the way Z is affected by X, but not due to a change in the confounding conditions between the two. This cannot be determined solely by comparing P(z | x) and P*(z | x). If X and Z are confounded [e.g., Figure 6(e)], it is quite possible for the inequality P(z | x) ≠ P*(z | x) to hold, reflecting differences in confounding, while the way that Z is affected by X (i.e., P(z | do(x))) is the same in the two populations—a different transport formula will then emerge for this case.

Figure 4: Selection diagrams depicting specific versions of Examples 1–3. In (a), the two populations differ in age distributions. In (b), the populations differ in how Z depends on age (an unmeasured variable, represented by the hollow circle) and the age distributions are the same. In (c), the populations differ in how Z depends on X. In all diagrams, dashed arcs (e.g., X ↔ Y) represent the presence of latent variables affecting both X and Y.

Consequently, licensing transportability requires knowledge of the mechanisms, or processes, through which population differences come about; different localizations of these mechanisms yield different transport formulas. This can be seen most vividly in Example 2 [Figure 3(b)], where we reasoned that no reweighting is necessary if the disparity P(z) ≠ P*(z) originates with the way language proficiency depends on age, while the age distribution itself remains the same. Yet, because age is not measured, this condition cannot be detected in the probability distribution P, and cannot be distinguished from an alternative condition,

  P(age) ≠ P*(age) and P(z | age) = P*(z | age),

one that may require reweighting according to equation (8). In other words, every probability distribution P(x, y, z) that is compatible with the process of Figure 3(b) is also compatible with that of Figure 3(a) and, yet, the two processes dictate different transport formulas.

Based on these observations, it is clear that if we are to represent formally the differences between populations (similarly, between experimental settings or environments), we must resort to a representation in which the causal mechanisms are explicitly encoded and in which differences in populations are represented as local modifications of those mechanisms.

To this end, we will use causal diagrams augmented with a set, S, of "selection variables," where each member of S corresponds to a mechanism by which the two populations differ, and switching between the two populations will be represented by conditioning on different values of these variables.[15]

[15] Disparities among populations or subpopulations can also arise from differences in design; for example, if two samples are drawn by different criteria from a given population. The problem of generalizing between two such subpopulations is usually called sampling selection bias (Heckman (1979); Hernán, Hernández-Díaz and Robins (2004); Cole and Stuart (2010); Pearl (2013); Bareinboim, Tian and Pearl (2014)). In this paper, we deal only with nature-induced, not man-made, disparities.

Intuitively, if P(v | do(x)) stands for the distribution of a set V of variables in the experimental study (with X randomized), then we designate by P*(v | do(x)) the distribution of V if we were to conduct the study in population Π* instead of Π. We now attribute the difference between the two to the action of a set S of selection variables, and write[16],[17]

  P*(v | do(x)) = P(v | do(x), s*).

[16] Alternatively, one can represent the two populations' distributions by P(v | do(x), s) and P(v | do(x), s*), respectively. The results, however, will be the same, since only the location of S enters the analysis.

[17] Pearl (1993, 2009b, page 71), Spirtes, Glymour and Scheines (1993) and Dawid (2002), for example, use conditioning on auxiliary variables to switch between experimental and observational studies. Dawid (2002) further uses such variables to represent changes in parameters of probability distributions.

The selection variables in S may represent all factors by which populations may differ or that may "threaten" the transport of conclusions between populations. For example, in Figure 4(a) the age disparity P(z) ≠ P*(z) discussed in Example 1 will be represented by the inequality

  P(z) ≠ P(z | s),

where S stands for all factors responsible for drawing subjects at age Z = z to NYC rather than LA.

Of equal importance is the absence of an S variable pointing to Y in Figure 4(a), which encodes the assumption that age-specific effects P(y | do(x), z) are invariant across the two populations.

This graphical representation, which we will call "selection diagrams," is defined as follows:[18]

[18] The assumption that there are no structural changes between domains can be relaxed starting with D = G and adding S-nodes following the same procedure as in Definition 4.1, while enforcing acyclicity. In extreme cases in which the two domains differ in causal directionality (Spirtes, Glymour and Scheines (2000), pages 298–299), acyclicity cannot be maintained. This complication, as well as one created when G is an edge-superset of G*, requires a more elaborate graphical representation and lies beyond the scope of this paper.

Definition 4.1 (Selection diagram). Let ⟨M, M*⟩ be a pair of structural causal models (Definition 2.2) relative to domains ⟨Π, Π*⟩, sharing a causal diagram G. The pair ⟨M, M*⟩ is said to induce a selection diagram D if D is constructed as follows:

  1. Every edge in G is also an edge in D.

  2. D contains an extra edge S_i → V_i whenever there might exist a discrepancy f_i ≠ f_i* or P(U_i) ≠ P*(U_i) between M and M*.

In summary, the S-variables locate the mechanisms where structural discrepancies between the two populations are suspected to take place. Alternatively, the absence of a selection node pointing to a variable represents the assumption that the mechanism responsible for assigning value to that variable is the same in the two populations. In the extreme case, we could add selection nodes to all variables, which means that we have no reason to believe that the populations share any mechanism in common, and this, of course, would inhibit any exchange of information among the populations. The invariance assumptions between populations, as we will see, will open the door for the transport of some experimental findings.

For clarity, we will represent the S variables by squares, as in Figure 4, which uses selection diagrams to encode the three examples discussed in Section 3. (Besides the S variables, these graphs also include additional latent variables, represented by bidirected edges, which make the examples more realistic.) In particular, Figures 4(a) and 4(b) represent, respectively, two different mechanisms responsible for the observed disparity P(z) ≠ P*(z). The first [Figure 4(a)] dictates transport formula (8), while the second [Figure 4(b)] calls for direct, unadjusted transport (9). This difference stems from the location of the S variables in the two diagrams. In Figure 4(a), the S variable represents unspecified factors that cause age differences between the two populations, while in Figure 4(b), S represents factors that cause differences in reading skills (Z) while the age distribution itself (unobserved) remains the same.

In this paper, we will address the issue of transportability assuming that scientific knowledge about the invariance of certain mechanisms is available and encoded in the selection diagram through the S nodes. Such knowledge is, admittedly, more demanding than that which shapes the structure of each causal diagram in isolation. It is, however, a prerequisite for any attempt to justify transfer of findings across populations, which makes selection diagrams a mathematical object worthy of analysis.

4.2 Transportability: Definitions and Examples

Using selection diagrams as the basic representational language, and harnessing the concepts of intervention, do-calculus, and identifiability (Section 2), we can now give the notion of transportability a formal definition.

Definition 4.2 (Transportability). Let D be a selection diagram relative to domains ⟨Π, Π*⟩. Let ⟨P, I⟩ be the pair of observational and interventional distributions of Π, and P* be the observational distribution of Π*. The causal relation R(Π*) = P*(y | do(x), z) is said to be transportable from Π to Π* in D if R(Π*) is uniquely computable from P, I, P* in any model that induces D.

Two interesting connections between identifiability and transportability are worth noting. First, note that all identifiable causal relations in D are also transportable, because they can be computed directly from P* and require no experimental information from Π. Second, note that given a causal diagram G, one can produce a selection diagram D such that identifiability in G is equivalent to transportability in D: first set D = G, and then add selection nodes pointing to all variables in D, which represents that the target domain does not share any mechanism with its counterpart. This is equivalent to the problem of identifiability because the only way to achieve transportability is to identify R from scratch in the target population.

While the problems of identifiability and transportability are related, proofs of nontransportability are more involved than those of nonidentifiability, for they require one to demonstrate the existence of two competing models compatible with D, agreeing on ⟨P, I, P*⟩, and disagreeing on R.

Definition 4.2 is declarative, and does not offer an effective method of demonstrating transportability even in simple models. Theorem 1 offers such a method using a sequence of derivations in do-calculus.

Theorem 1

Let D be the selection diagram characterizing two populations, Π and Π*, and S a set of selection variables in D. The relation R = P*(y | do(x), z) is transportable from Π to Π* if the expression P(y | do(x), z, s) is reducible, using the rules of do-calculus, to an expression in which S appears only as a conditioning variable in do-free terms.

Every relation satisfying the condition of Theorem 1 can be written as an algebraic combination of two kinds of terms, those that involve S and those that do not. The former can be written as P*-terms and are estimable, therefore, from observations on Π*, as required by Definition 4.2. All other terms, especially those involving do-operators, do not contain S; they are therefore experimentally identifiable in Π. This criterion was proven to be both sufficient and necessary for causal effects, namely R = P*(y | do(x)) (Bareinboim and Pearl (2012)). Theorem 1, though procedural, does not specify the sequence of rules leading to the needed reduction when such a sequence exists. Bareinboim and Pearl (2013b) derived a complete procedural solution for this, based on graphical methods developed in Tian and Pearl (2002) and Shpitser and Pearl (2006). Despite its completeness, however, the procedural solution is not trivial, and we take here an alternative route to establish a simple and transparent procedure for confirming transportability, guided by two recognizable subgoals.

Definition (Trivial transportability). A causal relation R is said to be trivially transportable from Π to Π* if R(Π*) is identifiable from (G*, P*).

This criterion amounts to an ordinary test of identifiability of causal relations using graphs, as given by Definition 2.4. It permits us to estimate R directly from observational studies on Π*, unaided by causal information from Π.

Example 4. Let R be the causal effect P*(y | do(x)) and let the selection diagram of Π and Π* be given by X → Y ← S; then R is trivially transportable, since R = P*(y | x).

Another special case of transportability occurs when a causal relation has identical form in both domains—no recalibration is needed.

Definition (Direct transportability). A causal relation R is said to be directly transportable from Π to Π* if R(Π*) = R(Π).

A graphical test for direct transportability of R = P*(y | do(x), z) follows from the do-calculus and reads: (S ⊥⊥ Y | X, Z) in D_X̄; in words, {X, Z} blocks all paths from S to Y once we remove all arrows pointing to X. As a concrete example, this test is satisfied in Figure 4(a); therefore, the z-specific effect P(y | do(x), z) is the same in both populations: it is directly transportable.

The notion of "external validity" as defined by Manski (2007) (footnote 1) corresponds to direct transportability, for it requires that R retain its validity without adjustment, as in equation (9). Such conditions preclude the use of information from Π* to recalibrate R.

Example 5. Let R be the causal effect of X on Y, and let D have a single S node pointing to X; then R is directly transportable, because causal effects are independent of the selection mechanism (see Pearl (2009b), pages 72 and 73).

Example 6. Let R be the z-specific causal effect of X on Y, P*(y | do(x), z), where Z is a set of variables, and P and P* differ only in the conditional probabilities P(z | pa(Z)) and P*(z | pa(Z)), such that P(z | pa(Z)) ≠ P*(z | pa(Z)), as shown in Figure 4(b). Under these conditions, R is not directly transportable. However, the pa(Z)-specific causal effects P*(y | do(x), pa(Z)) are directly transportable, and so is P*(y | do(x)). Note that, due to the confounding arcs, none of these quantities is identifiable.

5 Transportability of causal effects—A graphical criterion

We now state and prove two theorems that permit us to decide algorithmically, given a selection diagram, whether a relation is transportable between two populations, and what the transport formula should be.

Theorem 2

Let D be the selection diagram characterizing two populations, Π and Π*, and S the set of selection variables in D. The strata-specific causal effect P*(y | do(x), z) is transportable from Π to Π* if Z d-separates Y from S in the X-manipulated version of D, that is, if Z satisfies (Y ⊥⊥ S | Z, X) in D_X̄.

Proof. From Rule 1 of the do-calculus we have P(y | do(x), z, s) = P(y | do(x), z) whenever Z satisfies (Y ⊥⊥ S | Z, X) in D_X̄. This proves Theorem 2.

Definition (S-admissibility). A set Z of variables satisfying (Y ⊥⊥ S | Z, X) in D_X̄ will be called S-admissible (with respect to the causal effect of X on Y).

Figure 5: Selection diagrams illustrating S-admissibility. (a) has no S-admissible set, while in (b), W is S-admissible.
Corollary 1

The average causal effect P*(y | do(x)) is transportable from Π to Π* if there exists a set Z of observed pretreatment covariates that is S-admissible. Moreover, the transport formula is given by the weighting of equation (8).

The causal effect P*(y | do(x)) is transportable in Figure 4(a), since Z is S-admissible, and in Figure 4(b), where the empty set is S-admissible. It is also transportable by the same criterion in Figure 5(b), where W is S-admissible, but not in Figure 5(a), where no S-admissible set exists.
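Since S-admissibility is a purely graphical condition, it can be checked mechanically with any d-separation routine. The sketch below implements the classic moralization criterion and applies it to a simplified version of Figure 4(a) (edges S → Z, Z → X, Z → Y, X → Y; the bidirected arcs are omitted for brevity, though the same routine applies once latent variables are made explicit as nodes). It confirms that (Y ⊥⊥ S | Z, X) holds in D_X̄, that is, that Z is S-admissible.

```python
def d_separated(parents, xs, ys, zs):
    """Test whether xs and ys are d-separated by zs in a DAG given as a
    {node: set-of-parents} map, via the moralization criterion
    (Lauritzen et al., 1990): restrict to ancestors of xs|ys|zs,
    marry co-parents, drop directions, delete zs, test reachability."""
    relevant, stack = set(), list(xs | ys | zs)
    while stack:                                  # ancestral subgraph
        n = stack.pop()
        if n not in relevant:
            relevant.add(n)
            stack.extend(parents.get(n, ()))
    nbrs = {n: set() for n in relevant}
    for n in relevant:                            # moralize
        ps = [p for p in parents.get(n, ()) if p in relevant]
        for p in ps:
            nbrs[n].add(p); nbrs[p].add(n)
        for i, p in enumerate(ps):                # marry co-parents
            for q in ps[i + 1:]:
                nbrs[p].add(q); nbrs[q].add(p)
    seen, stack = set(xs), list(xs)               # reachability avoiding zs
    while stack:
        n = stack.pop()
        for m in nbrs[n] - zs - seen:
            if m in ys:
                return False
            seen.add(m)
            stack.append(m)
    return True

# D_X-bar for a simplified Figure 4(a): the arrow Z -> X is deleted.
parents = {'S': set(), 'Z': {'S'}, 'X': set(), 'Y': {'Z', 'X'}}
print(d_separated(parents, {'S'}, {'Y'}, {'Z', 'X'}))  # True: Z is S-admissible
print(d_separated(parents, {'S'}, {'Y'}, set()))       # False: unconditioned, S reaches Y
```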

Corollary 2

Any S variable that points directly into X, as in Figure 6(a), or that is d-separated from Y in D_X̄, can be ignored.

This follows from the fact that the empty set is S-admissible relative to any such variable. Conceptually, the corollary reflects the understanding that differences in the propensity to receive treatment do not hinder the transportability of treatment effects; the randomization used in the experimental study washes away such differences.

Figure 6: Selection diagrams illustrating transportability. The causal effect P*(y | do(x)) is (trivially) transportable in (c) but not in (b) and (f). It is transportable in (a), (d) and (e) (see Corollary 2).

We now generalize Theorem 2 to cases involving treatment-dependent variables, as in Figure 4(c).

Theorem 3

The average causal effect P*(y | do(x)) is transportable from Π to Π* if either one of the following conditions holds:

  1. P*(y | do(x)) is trivially transportable.

  2. There exists a set of covariates Z (possibly affected by X) such that Z is S-admissible and for which P*(z | do(x)) is transportable.

  3. There exists a set of covariates W that satisfies (X ⊥⊥ Y | W, S) in D_X̲ and for which P*(w | do(x)) is transportable.

Proof.

1. Condition 1 entails transportability by definition.

2. If condition 2 holds, it implies

  P*(y | do(x)) = Σ_z P*(y | do(x), z) P*(z | do(x))   (12)
                = Σ_z P(y | do(x), z, s) P*(z | do(x))   (13)
                = Σ_z P(y | do(x), z) P*(z | do(x)).   (14)

We now note that the transportability of P(z | do(x)) should reduce P*(z | do(x)) to a star-free expression and would render P*(y | do(x)) transportable.

3. If condition 3 holds, it implies

  P*(y | do(x)) = P(y | do(x), s)   (15)
                = Σ_w P(y | do(x), w, s) P(w | do(x), s)   (16)
                = Σ_w P(y | x, w, s) P(w | do(x), s)   (17)
                = Σ_w P*(y | x, w) P*(w | do(x)),   (18)

where (17) follows from Rule 2, licensed by the condition (X ⊥⊥ Y | W, S) in D_X̲. We similarly note that the transportability of P(w | do(x)) should reduce P*(w | do(x)) to a star-free expression and would render P*(y | do(x)) transportable. This proves Theorem 3.

To illustrate the application of Theorem 3, let us apply it to Figure 4(c), which corresponds to the surrogate endpoint problem discussed in Section 3 (Example 3). Our goal is to estimate P*(y | do(x))—the effect of X on Y in the new population created by changes in how Z responds to X. The structure of the problem permits us to satisfy condition 2 of Theorem 3, since Z is S-admissible and P*(z | do(x)) is trivially transportable. The former can be seen from (Y ⊥⊥ S | Z, X) in D_X̄, hence P*(y | do(x), z) = P(y | do(x), z); the latter can be seen from the fact that X and Z are unconfounded, hence P*(z | do(x)) = P*(z | x). Putting the two together, we get

  P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z | x),   (19)

which proves equation (11).

The test entailed by Theorem 3 is recursive, since the transportability of one causal effect depends on that of another. However, given that the diagram is finite and acyclic, the sets Z and W needed in conditions 2 and 3 of Theorem 3 would become closer and closer to Y, and the iterative process is guaranteed to terminate after a finite number of steps. This occurs because the causal effect P*(z | do(x)) (likewise, P*(w | do(x))) is trivially transportable and equals P*(z) for any node Z that is not a descendant of X. Thus, the need for reiteration applies only to those members of Z that lie on the causal pathways from X to Y. Note further that the analyst need not terminate the procedure upon satisfying the conditions of Theorem 3. If one wishes to reduce the number of experiments, one can continue until no further reduction is feasible.

Figure 6(d) requires that we invoke both conditions of Theorem 3, iteratively. To satisfy condition 2, we note that Z is S-admissible, and we need to prove the transportability of P*(z | do(x)). To do that, we invoke condition 3 and note that W d-separates X from Z in D_X̲. There remains to confirm the transportability of P*(w | do(x)), but this is guaranteed by the fact that the empty set is S-admissible relative to W, since (W ⊥⊥ S | X) holds in D_X̄. Hence, by Theorem 2 (replacing Y with W), P*(w | do(x)) is transportable, which bestows transportability on P*(z | do(x)). Thus, the final transport formula (derived formally in the Appendix) is

  P*(y | do(x)) = Σ_z P(y | do(x), z) Σ_w P(w | do(x)) P*(z | w, x).   (20)

The first two factors of the expression are estimable in the experimental study, and the third through observational studies on the target population. Note that the joint effect P(y, z, w | do(x)) need not be estimated in the experiment; this decomposition results in a decrease of measurement cost and sampling variability.

A similar analysis proves the transportability of the causal effect in Figure 6(e) (see Pearl and Bareinboim (2011)). The model of Figure 6(f), however, does not allow for the transportability of P*(y | do(x)), as witnessed by the absence of an S-admissible set in the diagram and the inapplicability of condition 3 of Theorem 3.

Figure 7: Selection diagram in which the causal effect P*(y | do(x)) is shown to be transportable in multiple iterations of Theorem 3 (see the Appendix).

To illustrate the power of Theorem 3 in discerning transportability and deriving transport formulas, Figure 7 presents a more intricate selection diagram whose analysis requires several iterations of Theorem 3. The transport formula for this diagram (derived formally in the Appendix) is given by

(21)

The main power of this formula is to guide investigators in deciding what measurements need be taken in both the experimental study and the target population. It asserts, for example, that certain variables need not be measured at all, that the z-specific causal effects need not be estimated in the experimental study, and that only a few conditional probabilities need be estimated in the target population. The derivation of this formula is given in the Appendix.

Despite its power, Theorem 3 is not complete, namely, it is not guaranteed to approve all transportable relations or to disapprove all nontransportable ones. An example of the former is contrived in Bareinboim and Pearl (2012), where an alternative, necessary and sufficient condition is established in both graphical and algorithmic form. Theorem 3 provides, nevertheless, a simple and powerful method for establishing transportability in practice.

6 Conclusions

Given judgements of how target populations may differ from those under study, the paper offers a formal representational language for making these assessments precise and for deciding whether causal relations in the target population can be inferred from those obtained in an experimental study. When such inference is possible, the criteria provided by Theorems 2 and 3 yield transport formulae, namely, principled ways of calibrating the transported relations so as to properly account for differences in the populations. These transport formulae enable the investigator to select the essential measurements in both the experimental and observational studies, and thus minimize measurement costs and sample variability.

The inferences licensed by Theorems 2 and 3 represent worst-case analysis, since we have assumed, in the tradition of nonparametric modeling, that every variable may potentially be an effect-modifier (or moderator). If one is willing to assume that certain relationships are noninteractive, or monotonic, as is the case in additive models, then additional transport licenses may be issued, beyond those sanctioned by Theorems 2 and 3.

While the results of this paper concern the transfer of causal information from experimental to observational studies, the method can also be used to transport statistical findings from one observational study to another (Pearl and Bareinboim (2011)). The rationale for such transfer is twofold. First, information from the first study may enable researchers to avoid repeated measurement of certain variables in the target population. Second, by pooling data from both populations, we increase the precision with which their commonalities are estimated and, indirectly, also increase the precision with which the target relationship is transported. Substantial reduction in sampling variability can thus be achieved through this decomposition (Pearl (2012b)).

Clearly, the same data-sharing philosophy can be used to guide meta-analysis (Glass (1976); Hedges and Olkin (1985); Rosenthal (1995); Owen (2009)), where one attempts to combine results from many experimental and observational studies, each conducted on a different population and under a different set of conditions, so as to construct an aggregate measure of effect size that is "better," in some formal sense, than any one study in isolation. While traditional approaches aim to average out differences between studies, our theory exploits the commonalities among the populations studied and the target population. By pooling together commonalities and discarding areas of disparity, we gain maximum use of the available samples (Bareinboim and Pearl (2013c)).

To be of immediate use, our method relies on the assumption that the analyst is in possession of sufficient background knowledge to determine, at least qualitatively, where two populations may differ from one another. This knowledge is not vastly different from that required in any principled approach to causation in observational studies, since judgement about possible effects of omitted factors is crucial in any such analysis. Whereas such knowledge may only be partially available, the analysis presented in this paper is nevertheless essential for understanding what knowledge is needed for the task to succeed and how sensitive conclusions are to knowledge that we do not possess.

Real-life situations will be marred, of course, by additional complications that were not addressed directly in this paper, for example, measurement errors, selection bias, finite-sample variability, uncertainty about the graph structure and the possible existence of unmeasured confounders between any two nodes in the diagram. Such issues are not unique to transportability; they plague any problem in causal analysis, regardless of whether they are represented formally or ignored by avoiding formalism. The methods offered in this paper are representative of what theory permits us to do in ideal situations, and the graphical representation presented here makes the assumptions explicit and transparent. Transparency is essential for reaching tentative consensus among researchers and for facilitating discussions that distinguish that which is deemed plausible and important from that which is negligible or implausible.

Finally, it is important to mention two recent extensions of the results reported in this article. Bareinboim and Pearl (2013a) have addressed the problem of transportability in cases where only a limited set of experiments can be conducted at the source environment. Subsequently, the results were generalized to the problem of “meta-transportability,” that is, pooling experimental results from multiple and disparate sources to synthesize a consistent estimate of a causal relation at yet another environment, potentially different from each of the former (Bareinboim and Pearl (2013c)). It is shown that such synthesis may be feasible from multiple sources even in cases where it is not feasible from any one source in isolation.

Appendix

Derivation of the transport formula for the causal effect in the model of Figure 6(d) [equation (20)]:

(22)

Derivation of the transport formula for the causal effect in the model of Figure 7 [equation (21)]:

(23)

Acknowledgments

This paper benefited from discussions with Onyebuchi Arah, Stuart Baker, Sander Greenland, Michael Hoefler, Marshall Joffe, William Shadish, Ian Shrier and Dylan Small. We are grateful to two anonymous referees for thorough reviews of this manuscript and for suggesting a simplification in the transport formula of Example 5. This research was supported in part by NIH Grant #1R01 LM009961-01, NSF Grant #IIS-0914211 and ONR Grant #N000-14-09-1-0665.

References

  • Adelman, L. (1991). Experiments, quasi-experiments, and case studies: A review of empirical methods for evaluating decision support systems. IEEE Transactions on Systems, Man and Cybernetics 21 293–301.
  • Balke, A. and Pearl, J. (1995). Counterfactuals and policy analysis in structural models. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (P. Besnard and S. Hanks, eds.) 11–18. Morgan Kaufmann, San Francisco, CA.
  • Bareinboim, E., Brito, C. and Pearl, J. (2012). Local characterizations of causal Bayesian networks. In Graph Structures for Knowledge Representation and Reasoning. Lecture Notes in Artificial Intelligence 7205 1–17. Springer, Berlin.
  • Bareinboim, E. and Pearl, J. (2012). Transportability of causal effects: Completeness results. In Proceedings of the Twenty-Sixth National Conference on Artificial Intelligence 698–704. AAAI Press, Menlo Park, CA.
  • Bareinboim, E. and Pearl, J. (2013a). Causal transportability with limited experiments. In Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence 95–101. AAAI Press, Menlo Park, CA.
  • Bareinboim, E. and Pearl, J. (2013b). A general algorithm for deciding transportability of experimental results. J. Causal Inference 1 107–134.
  • Bareinboim, E. and Pearl, J. (2013c). Meta-transportability of causal effects: A formal approach. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2013). J. Mach. Learn. Res. 31 135–143.
  • Bareinboim, E., Tian, J. and Pearl, J. (2014). Recovering from selection bias in causal and statistical inference. In Proceedings of the Twenty-Eighth Conference on Artificial Intelligence (C. E. Brodley and P. Stone, eds.). AAAI Press, Menlo Park, CA. To appear.
  • Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital data. Biometrics 2 47–53.
  • Bollen, K. A. and Pearl, J. (2013). Eight myths about causality and structural equation models. In Handbook of Causal Analysis for Social Research (S. L. Morgan, ed.) Chapter 15. Springer, New York.
  • Campbell, D. and Stanley, J. (1963). Experimental and Quasi-Experimental Designs for Research. Wadsworth, Chicago.
  • Cole, S. R. and Stuart, E. A. (2010). Generalizing evidence from randomized clinical trials to target populations: The ACTG 320 trial. Am. J. Epidemiol. 172 107–115.
  • Davis, J. A. (1984). Extending Rosenberg’s technique for standardizing percentage tables. Social Forces 62 679–708.
  • Dawid, A. P. (2002). Influence diagrams for causal modelling and inference. Internat. Statist. Rev. 70 161–189.
  • Ellenberg, S. S. and Hamilton, J. M. (1989). Surrogate endpoints in clinical trials: Cancer. Stat. Med. 8 405–413.
  • Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge Univ. Press, New York.
  • Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher 5 3–8.
  • Glymour, M. M. and Greenland, S. (2008). Causal diagrams. In Modern Epidemiology, 3rd ed. (K. J. Rothman, S. Greenland and T. L. Lash, eds.) 183–209. Lippincott Williams & Wilkins, Philadelphia, PA.
  • Haavelmo, T. (1943). The statistical implications of a system of simultaneous equations. Econometrica 11 1–12.
  • Hayduk, L., Cummings, G., Stratkotter, R., Nimmo, M., Grygoryev, K., Dosman, D., Gillespie, M., Pazderka-Robinson, H. and Boadu, K. (2003). Pearl’s d-separation: One more step into causal thinking. Struct. Equ. Model. 10 289–311.
  • Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica 47 153–161.
  • Hedges, L. V. and Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press, Orlando, FL.
  • Heise, D. R. (1975). Causal Analysis. Wiley, New York.
  • Hernán, M. A., Hernández-Díaz, S. and Robins, J. M. (2004). A structural approach to selection bias. Epidemiology 15 615–625.
  • Hernán, M. A. and VanderWeele, T. J. (2011). Compound treatments and transportability of causal inference. Epidemiology 22 368–377.
  • Höfler, M., Gloster, A. T. and Hoyer, J. (2010). Causal effects in psychotherapy: Counterfactuals counteract overgeneralization. Psychotherapy Research 20 668–679.
  • Huang, Y. and Valtorta, M. (2006). Pearl’s calculus of intervention is complete. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (R. Dechter and T. S. Richardson, eds.) 217–224. AUAI Press, Corvallis, OR.
  • Joffe, M. M. and Greene, T. (2009). Related causal frameworks for surrogate outcomes. Biometrics 65 530–538.
  • Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA.
  • Lane, P. W. and Nelder, J. A. (1982). Analysis of covariance and standardization as instances of prediction. Biometrics 38 613–621.
  • Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer, New York.
  • Manski, C. (2007). Identification for Prediction and Decision. Harvard Univ. Press, Cambridge, MA.
  • Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. English translation of excerpts by D. Dabrowska and T. Speed in Statist. Sci. 5 (1990) 463–472.
  • Owen, A. B. (2009). Karl Pearson’s meta-analysis revisited. Ann. Statist. 37 3867–3892.
  • Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA.
  • Pearl, J. (1993). Graphical models, causality, and intervention. Statist. Sci. 8 266–273.
  • Pearl, J. (1995). Causal diagrams for empirical research. Biometrika 82 669–710.
  • Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge Univ. Press, Cambridge.
  • Pearl, J. (2009a). Causal inference in statistics: An overview. Stat. Surv. 3 96–146.
  • Pearl, J. (2009b). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge.
  • Pearl, J. (2011). The structural theory of causation. In Causality in the Sciences (P. McKay Illari, F. Russo and J. Williamson, eds.) 697–727. Clarendon Press, Oxford.
  • Pearl, J. (2012a). The causal foundations of structural equation modeling. In Handbook of Structural Equation Modeling (R. H. Hoyle, ed.). Guilford Press, New York.
  • Pearl, J. (2012b). Some thoughts concerning transfer learning, with applications to meta-analysis and data-sharing estimation. Technical Report R-387, Cognitive Systems Laboratory, Dept. Computer Science, UCLA.
  • Pearl, J. (2013). Linear models: A useful “microscope” for causal analysis. J. Causal Inference 1 155–170.
  • Pearl, J. (2014). Trygve Haavelmo and the emergence of causal calculus. Econometric Theory, Special Issue on Haavelmo Centennial. Published online 10 June 2014. DOI:10.1017/S0266466614000231.
  • Pearl, J. and Bareinboim, E. (2011). Transportability across studies: A formal approach. Technical Report R-372, Cognitive Systems Laboratory, Dept. Computer Science, UCLA.
  • Petersen, M. L. (2011). Compound treatments, transportability, and the structural causal model: The power and simplicity of causal graphs. Epidemiology 22 378–381.
  • Prentice, R. L. (1989). Surrogate endpoints in clinical trials: Definition and operational criteria. Stat. Med. 8 431–440.
  • Richardson, T. (2003). Markov properties for acyclic directed mixed graphs. Scand. J. Stat. 30 145–157.
  • Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period—Application to control of the healthy worker survivor effect. Math. Modelling 7 1393–1512.
  • Robins, J., Orellana, L. and Rotnitzky, A. (2008). Estimation and extrapolation of optimal treatment and testing strategies. Stat. Med. 27 4678–4721.
  • Rosenthal, R. (1995). Writing meta-analytic reviews. Psychological Bulletin 118 183–192.
  • Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educational Psychology 66 688–701.
  • Shadish, W. R., Cook, T. D. and Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference, 2nd ed. Houghton Mifflin, Boston.
  • Shpitser, I. and Pearl, J. (2006). Identification of conditional interventional distributions. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (R. Dechter and T. S. Richardson, eds.) 437–444. AUAI Press, Corvallis, OR.
  • Spirtes, P., Glymour, C. and Scheines, R. (1993). Causation, Prediction, and Search. Lecture Notes in Statistics 81. Springer, New York.
  • Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction, and Search, 2nd ed. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA.
  • Strotz, R. H. and Wold, H. O. A. (1960). Recursive vs. nonrecursive systems: An attempt at synthesis. Econometrica 28 417–427.
  • Tian, J. and Pearl, J. (2002). A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence 567–573. AAAI Press/The MIT Press, Menlo Park, CA.
  • Verma, T. and Pearl, J. (1988). Causal networks: Semantics and expressiveness. In Proceedings of the Fourth Workshop on Uncertainty in Artificial Intelligence 352–359. Mountain View, CA. Also in Uncertainty in AI 4 (1990) (R. Shachter, T. S. Levitt, L. N. Kanal and J. F. Lemmer, eds.) 69–76. North-Holland, Amsterdam.
  • Westergaard, H. (1916). Scope and method of statistics. Publ. Amer. Statist. Assoc. 15 229–276.
  • White, H. and Chalak, K. (2009). Settable systems: An extension of Pearl’s causal model with optimization, equilibrium, and learning. J. Mach. Learn. Res. 10 1759–1799.
  • Wright, S. (1921). Correlation and causation. J. Agricultural Research 20 557–585.
  • Yule, G. U. (1934). On some points relating to vital statistics, more especially statistics of occupational mortality. J. Roy. Statist. Soc. 97 1–84.