1 Introduction
Counterfactuals provide the basis for notions pervasive throughout human affairs, such as credit assignment, blame and responsibility, and regret. One of the most powerful constructs in human reasoning, the “what if?” question, evokes hypothetical conditions that usually contradict the factual evidence. Judgment and understanding of critical situations, from medicine to psychology to business, involve counterfactual reasoning, e.g.: “Joe received the treatment and died; would he be alive had he not received it?”, “Had the candidate been male instead of female, would the decision from the admissions committee be more favorable?”, or “Would the profit this quarter remain within 5% of its value had we increased the price by 2%?”. By and large, counterfactuals are key ingredients that go into the construction of explanations about why things happened as they did (pearl:2k; pearl:mackenzie2018).
The structural interpretation of causality provides proper semantics for representing counterfactuals (pearl:2k, Ch. 7). Specifically, each structural causal model (SCM) induces a collection of distributions related to the activities of seeing (called observational), doing (interventional), and imagining (counterfactual), which together are called the ladder of causation (pearl:mackenzie2018; bar:etal2020). The ladder is a containment hierarchy; each type of distribution can be placed in increasingly refined layers: observational content goes in layer 1, experimental in layer 2, and counterfactual in layer 3 (Fig. 0(a)).
It is understood that even with all the information in the world about layer 1, there are still questions about layers 2 and 3 that are unanswerable, or technically underdetermined; further, even with data from layers 1 and 2, there are still questions about layer 3 that are underdetermined (pearl:mackenzie2018; pearl:2k; bar:etal2020).
The inferential challenge in these settings arises because the generating model is not fully observed, and data from all of the layers are not necessarily available, perhaps due to the cost or the infeasibility of performing certain interventions. One common task found in the literature is to determine the effect of an intervention on a variable $X$ on an outcome $Y$, say $P(Y \mid do(X))$ (layer 2), using data from observations $P(\mathbf{V})$ (layer 1), where $\mathbf{V}$ is the set of observed variables, and possibly other interventions, e.g., $P(\mathbf{V} \mid do(Z))$. Also, qualitative assumptions about the system are usually articulated in the form of a causal diagram $\mathcal{G}$. This setting has been studied in the literature under the rubric of nonparametric identification from a combination of observations and experiments. Multiple solutions exist, including Pearl’s celebrated do-calculus (pearl:95a), and other increasingly refined solutions that are computationally efficient, sufficient, and necessary (Spirtes2001; galles:pea95; pearl:rob95; tian:pea02generalid; shpitser:pea06a; huang:val06identifiability; bareinboim:pea12zid; lee:etal19).
There is growing literature on identification with cross-layer inferences from data in layers 1 and 2 to quantities in layer 3. For example, a data scientist may be interested in evaluating the effect of a treatment on the group of subjects that receive it instead of those randomly assigned to treatment. This measure is known as the effect of treatment on the treated (heckman:92; pearl:2k), and there exists a graphical condition for mapping it to a (layer 2) causal effect (shpitser:pea09). Further, there are also results on the identification of path-specific effects, which correspond to counterfactuals that isolate specific paths in the graph (pearl:01). In particular, shpitser:she18 provides a complete algorithm for identification from observational data, and zhang:bar18a gives identification conditions from observational and experimental data in specific canonical models. Moreover, shpitser:pea07 studied the identification of arbitrary (non-nested) counterfactuals under the assumption that data from experiments on every variable is available. Yet, the problem of identifying such quantities from a subset of the space of all experiments remains open.
For concreteness, consider a counterfactual called direct effect in the context of the causal diagram in Fig. 0(b). This quantity measures the sensitivity of a variable to changes in another variable while all other factors in the analysis remain fixed. Suppose $X$ is level of exercise, $Z$ cholesterol levels, and $Y$ cardiovascular disease. Exercising can improve cholesterol levels, which in turn affect the chances of developing cardiovascular disease. An interesting question is how much exercise prevents the disease by means other than regulating cholesterol. In counterfactual notation, this is to compare $Y_{x Z_{x}}$ and $Y_{x' Z_{x}}$, where $x$ and $x'$ are different values. The first quantity represents the value of $Y$ when $X = x$ and $Z$ varies accordingly. The second expression is the value $Y$ attains if $X$ is held constant at $x'$ while $Z$ still follows $X = x$. The difference $E[Y_{x' Z_{x}} - Y_{x Z_{x}}]$, known as the natural direct effect (NDE), is nonzero if there is some direct effect of $X$ on $Y$. In this instance, this nested counterfactual is identifiable only if observational data and experiments on $X$ are available.
To date, there is no general identification method for this particular counterfactual family (which also includes indirect and spurious effects) or, more broadly, for other arbitrary nested counterfactuals that are well-defined in layer 3. Our goal is to understand the nonparametric identification of arbitrary nested and conditional counterfactuals when the input consists of any combination of observational and interventional distributions, whatever is available to the data scientist. More specifically, our contributions are as follows.

We look at nested counterfactuals from an SCM perspective and introduce machinery that supports counterfactual reasoning. In particular, we prove the counterfactual unnesting theorem (CUT), which allows one to map any nested counterfactual to an unnested one (Section 2).

Building on this new machinery, we derive sufficient and necessary graphical conditions and an algorithm to determine the identifiability of marginal nested counterfactuals from an arbitrary combination of observational and experimental distributions (Section 3).

We give a reduction from conditional counterfactuals to marginal ones, and use it to derive a complete algorithm for their identification (Section 4).
See the supplemental material for full proofs of the results in the paper.
1.1 Preliminaries
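We denote variables by capital letters, $X$, and values by small letters, $x$. Bold letters, $\mathbf{X}$, represent sets of variables and $\mathbf{x}$ sets of values. The domain of a variable $X$ is denoted by $\mathfrak{X}_X$. Two values $\mathbf{x}$ and $\mathbf{z}$ are said to be consistent if they share the common values for $\mathbf{X} \cap \mathbf{Z}$. We also denote by $\mathbf{x} \setminus \mathbf{Z}$ the value of $\mathbf{X} \setminus \mathbf{Z}$ consistent with $\mathbf{x}$ and by $\mathbf{x} \cap \mathbf{Z}$ the subset of $\mathbf{x}$ corresponding to variables in $\mathbf{Z}$. We assume the domain of every variable is finite.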
Our analysis relies on causal graphs, which we often assign a calligraphic letter, e.g., $\mathcal{G}$, $\mathcal{H}$, etc. We denote by $\mathbf{V}(\mathcal{H})$ the set of vertices (i.e., variables) in a graph $\mathcal{H}$. Given a graph $\mathcal{G}$, $\mathcal{G}_{\overline{\mathbf{X}}\underline{\mathbf{Z}}}$ is the result of removing edges coming into variables in $\mathbf{X}$ and going out from variables in $\mathbf{Z}$. $\mathcal{G}[\mathbf{W}]$ denotes a vertex-induced subgraph, which includes $\mathbf{W}$ and the edges among its elements. We use kinship notation for graphical relationships such as parents, children, descendants, and ancestors of a set of variables. For example, the set of parents of $\mathbf{X}$ in $\mathcal{G}$ is denoted by $Pa(\mathbf{X})_{\mathcal{G}}$. Similarly, we define $Ch()$, $De()$, and $An()$.
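The graph operations described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Diagram` container and the helpers `make_diagram`, `mutilate`, `parents`, and `ancestors` are hypothetical names. Two conventions are assumed here: a bidirected edge counts as an edge coming into both of its endpoints, and `ancestors` includes the query set itself while `parents` does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagram:
    """Causal diagram: directed edges (a, b) mean a -> b;
    bidirected edges are stored as frozensets {a, b}."""
    nodes: frozenset
    directed: frozenset
    bidirected: frozenset


def make_diagram(nodes, directed, bidirected=()):
    """Build a Diagram from plain iterables (hypothetical helper)."""
    return Diagram(frozenset(nodes), frozenset(directed),
                   frozenset(frozenset(e) for e in bidirected))


def mutilate(g, cut_incoming=(), cut_outgoing=()):
    """Remove edges coming into `cut_incoming` and going out of `cut_outgoing`."""
    x, z = set(cut_incoming), set(cut_outgoing)
    directed = frozenset((a, b) for a, b in g.directed
                         if b not in x and a not in z)
    # Assumption: a bidirected edge has an arrowhead at both endpoints,
    # so it is removed whenever either endpoint is in `cut_incoming`.
    bidirected = frozenset(e for e in g.bidirected if not (e & x))
    return Diagram(g.nodes, directed, bidirected)


def parents(g, vs):
    """Variables with a directed edge into some member of `vs`."""
    vs = set(vs)
    return {a for a, b in g.directed if b in vs}


def ancestors(g, vs):
    """Ancestors of `vs` along directed edges, including `vs` itself."""
    result, frontier = set(vs), set(vs)
    while frontier:
        frontier = parents(g, frontier) - result
        result |= frontier
    return result
```

For instance, on a three-variable diagram with edges X -> Z, Z -> Y, X -> Y and a bidirected edge between X and Y, `ancestors(mutilate(g, cut_outgoing={"X"}), {"Y"})` no longer contains X, because every edge leaving X has been removed.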
To articulate and formalize counterfactual questions, we require a framework that allows us to reason about events corresponding to different alternative worlds simultaneously. Accordingly, we employ the Structural Causal Model (SCM) paradigm (pearl:2k, Ch. 7). An SCM $\mathcal{M}$ is a 4-tuple $\langle \mathbf{U}, \mathbf{V}, \mathcal{F}, P(\mathbf{U}) \rangle$, where $\mathbf{U}$ is a set of exogenous (latent) variables; $\mathbf{V}$ is a set of endogenous (observable) variables; and $\mathcal{F}$ is a collection of functions such that each variable $V_i \in \mathbf{V}$ is determined by a function $f_i \in \mathcal{F}$. Each $f_i$ is a mapping from a set of exogenous variables $\mathbf{U}_i \subseteq \mathbf{U}$ and a set of endogenous variables $\mathbf{Pa}_i \subseteq \mathbf{V} \setminus \{V_i\}$ to the domain of $V_i$. The uncertainty is encoded through a probability distribution over the exogenous variables, $P(\mathbf{U})$. An SCM $\mathcal{M}$ induces a causal diagram $\mathcal{G}$ where $\mathbf{V}$ is the set of vertices, there is a directed edge $V_j \to V_i$ for every $V_i \in \mathbf{V}$ and $V_j \in \mathbf{Pa}_i$, and a bidirected edge $V_i \leftrightarrow V_j$ for every pair $V_i, V_j \in \mathbf{V}$ such that $\mathbf{U}_i \cap \mathbf{U}_j \neq \emptyset$ ($V_i$ and $V_j$ have a common exogenous parent). We assume that the underlying model is recursive, that is, there are no cyclic dependencies among the variables. Equivalently, the corresponding causal diagram is acyclic.
The set $\mathbf{V}$ can be partitioned into subsets called c-components (tian:pea02testableimplications) according to a diagram $\mathcal{G}$ such that two variables belong to the same c-component if they are connected in $\mathcal{G}$ by a path made entirely of bidirected edges.
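As a hedged illustration of the two constructions just described, the sketch below derives the directed and bidirected edges of a diagram from the argument lists of the structural functions, and then partitions the variables into c-components. The input format (a dictionary from each endogenous variable to its endogenous and exogenous arguments) and the function names `induced_diagram` and `c_components` are assumptions made for this example only.

```python
from itertools import combinations


def induced_diagram(arguments):
    """`arguments` maps each endogenous variable to a pair
    (endogenous_args, exogenous_args) of its structural function.
    Returns the directed and bidirected edge sets of the induced diagram."""
    directed = {(p, v) for v, (endo, _) in arguments.items() for p in endo}
    # A bidirected edge joins two variables sharing an exogenous argument.
    bidirected = {frozenset((a, b)) for a, b in combinations(arguments, 2)
                  if set(arguments[a][1]) & set(arguments[b][1])}
    return directed, bidirected


def c_components(nodes, bidirected):
    """Partition `nodes` into sets connected by bidirected paths."""
    remaining = set(nodes)
    components = []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:                      # depth-first search over bidirected edges
            v = stack.pop()
            for edge in bidirected:
                if v in edge:
                    for w in edge - {v}:
                        if w not in comp:
                            comp.add(w)
                            stack.append(w)
        remaining -= comp
        components.append(frozenset(comp))
    return components


# Hypothetical model: X and Y share the exogenous variable U1.
ARGS = {
    "X": ((), ("U1",)),
    "Z": (("X",), ("U2",)),
    "Y": (("X", "Z"), ("U1", "U3")),
}
DIRECTED, BIDIRECTED = induced_diagram(ARGS)
COMPONENTS = c_components(ARGS, BIDIRECTED)
```

In this hypothetical model the partition has two blocks: X and Y fall in the same c-component because they share U1, while Z forms a c-component on its own.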
2 SCMs and Nested Counterfactuals
Intervening on a system represented by an SCM results in a new model differing from only in the mechanisms associated with the intervened variables (pearl:94a; dawid:02; dawid:15). If the intervention consists of fixing the value of a variable to a constant , it induces a submodel, denoted as (pearl:2k, Def. 7.1.2). To formally study nested counterfactuals, we extend this notion to account for models derived from interventions that replace functions from the original SCM with other, not necessarily constant, functions.
[Derived Model] Let be an SCM, , , and a function. Then, , called the derived model of according to , is identical to , except that the function is replaced with a function identical to .
This definition is easily extendable to models derived from an intervention on a set instead of a singleton. When is a collection of functions , the derived model is obtained by replacing each with for . Next, we discuss the concept of potential response (pearl:2k, Def. 7.4.1) with respect to derived models.
[Potential Response] Let be subsets of observable variables, let be a unit, and let be a set of functions from , for where . Then, (or , for short) is called the potential response of to , and is defined as the solution of , for a particular , in the derived model .
A potential response describes the value that variable would attain for a unit (or individual) if the intervention is performed. This concept is tightly related to that of potential outcome, but the former explicitly allows for interventions that do not necessarily fix the variables in to a constant value. Averaging over the space of , a potential response induces a random variable that we will denote simply as . If the intervention replaces a function with a potential response of in , we say the intervention is natural.
When variables are enumerated as , we may add square brackets around the part of the subscript denoting interventions. We use to denote sets of arbitrary counterfactual variables. Let represent a set of counterfactual variables such that and for . Define , that is, the set of observables that appear in . Let represent a vector of values, one for each variable in and define as the subset of corresponding to for any . When all of the variables in the expression have the same subscript, that is, they belong to the same submodel, we will often denote it as .
For most real-world scenarios, having access to a fully specified SCM of the underlying system is infeasible. Our analysis does not rely on such privileged access, but only on the aspects of the model captured by the causal graph and on data samples generated by the unobserved model.
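To make the notions of derived model and potential response from this section concrete, the following sketch evaluates them on a small, fully specified SCM. This is for illustration only, since the analysis itself never assumes access to the SCM. Everything in the block is hypothetical: the three binary variables, their mechanisms, the exogenous probabilities, and the helper names `derived_model`, `potential_response`, `constant`, and `distribution`.

```python
from itertools import product

# Hypothetical toy SCM over binary variables: X -> Z -> Y and X -> Y.
ORDER = ("X", "Z", "Y")                      # a topological order
FUNCTIONS = {
    "X": lambda v, u: u["Ux"],
    "Z": lambda v, u: v["X"] ^ u["Uz"],
    "Y": lambda v, u: (v["X"] & v["Z"]) | u["Uy"],
}
P_U = {"Ux": 0.5, "Uz": 0.2, "Uy": 0.1}      # P(U = 1), independent by assumption


def units():
    """Enumerate every unit u together with its probability P(u)."""
    names = list(P_U)
    for bits in product((0, 1), repeat=len(names)):
        u = dict(zip(names, bits))
        prob = 1.0
        for n in names:
            prob *= P_U[n] if u[n] else 1 - P_U[n]
        yield u, prob


def derived_model(functions, replacements):
    """Copy of the model in which some mechanisms are replaced."""
    return {**functions, **replacements}


def potential_response(target, replacements, u):
    """Solve the derived model for unit u and read off `target`."""
    functions = derived_model(FUNCTIONS, replacements)
    v = {}
    for name in ORDER:
        v[name] = functions[name](v, u)
    return v[target]


def constant(value):
    """Mechanism that fixes a variable to a constant (an atomic intervention)."""
    return lambda v, u: value


def distribution(target, replacements):
    """Distribution of the random variable induced by averaging over units."""
    dist = {}
    for u, p in units():
        val = potential_response(target, replacements, u)
        dist[val] = dist.get(val, 0.0) + p
    return dist


# A natural intervention: Z is replaced by its own potential response under X = 0.
Z_UNDER_X0 = lambda v, u: potential_response("Z", {"X": constant(0)}, u)
```

With these definitions, `potential_response("Y", {"X": constant(1), "Z": Z_UNDER_X0}, u)` evaluates a nested counterfactual for a single unit, and `distribution` turns any such response into a probability table.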
2.1 Nested Counterfactuals
Potential responses can be compounded based on natural interventions. For instance, the counterfactual () can be seen as the potential response of to an intervention that makes equal to . Notice that is in itself a potential response, but from a different (nested) model. Hence we call a nested counterfactual.
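Recall the causal diagram in Fig. 0(b) and consider once again the NDE as
$$\mathrm{NDE}_{x \to x'}(Y) = E[Y_{x' Z_{x}}] - E[Y_{x}]. \qquad (2)$$
The second term is also equal to $E[Y_{x Z_{x}}]$ as $Z_{x}$ is consistent with $X = x$, so it is the value $Z$ listens to in $\mathcal{M}_{x}$. Meanwhile, the first one is indeed related to $P(Y_{x' Z_{x}})$, the probability of a nested counterfactual.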
The following result shows how nested counterfactuals can be written in terms of non-nested ones.
[Counterfactual Unnesting Theorem (CUT)] Let be any natural interventions on disjoint sets . Then for disjoint from and , we have
(3) 
Proof outline.
Based on Eq. 1, can be seen as a sum of the probabilities for the that induce the event . Such set of can be partitioned based on the values they induce, which are the same that induce the event . Then, the sum over for each subset is equal to the value of the original nested counterfactual. ∎
For instance, for the model in Fig. 0(b) we can write
(4) 
As Section 2.1 allows us to rewrite any nested counterfactual in terms of non-nested counterfactuals, we focus on the latter and assume that any given counterfactual is already unnested.
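The unnesting step can be checked numerically on a toy mediation model (X -> Z -> Y and X -> Y). The sketch below compares the probability of a nested counterfactual, computed directly by enumerating units, with the unnested sum $\sum_z P(Y_{x'z} = y, Z_x = z)$, which is the familiar single-mediator instance of unnesting. The mechanisms `f_z` and `f_y`, the exogenous probabilities, and all function names are hypothetical choices made for this example; agreement on one model is a sanity check, not a proof.

```python
from itertools import product

# P(U = 1) for independent binary exogenous variables (hypothetical numbers).
P_U = {"Uz": 0.3, "Uy": 0.2}


def f_z(x, u):
    """Hypothetical mechanism for the mediator Z."""
    return x ^ u["Uz"]


def f_y(x, z, u):
    """Hypothetical mechanism for the outcome Y."""
    return (x & z) | u["Uy"]


def units():
    """Enumerate every unit u together with its probability P(u)."""
    names = list(P_U)
    for bits in product((0, 1), repeat=len(names)):
        u = dict(zip(names, bits))
        prob = 1.0
        for n in names:
            prob *= P_U[n] if u[n] else 1 - P_U[n]
        yield u, prob


def prob(event):
    """Probability of the set of units satisfying `event`."""
    return sum(p for u, p in units() if event(u))


def nested(x_outer, x_inner, y):
    """P(Y_{x_outer, Z_{x_inner}} = y), evaluated unit by unit."""
    return prob(lambda u: f_y(x_outer, f_z(x_inner, u), u) == y)


def unnested(x_outer, x_inner, y):
    """sum_z P(Y_{x_outer, z} = y, Z_{x_inner} = z)."""
    return sum(
        prob(lambda u, z=z: f_y(x_outer, z, u) == y and f_z(x_inner, u) == z)
        for z in (0, 1)
    )


def nde(x, x_prime):
    """E[Y_{x', Z_x}] - E[Y_x] for binary Y, so expectations are P(Y = 1)."""
    return nested(x_prime, x, 1) - nested(x, x, 1)
```

In this model `nested(1, 0, 1)` and `unnested(1, 0, 1)` agree (both are approximately 0.44), and the two functions agree for every other choice of arguments as well.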
2.2 Tools for Counterfactual Reasoning
Before characterizing the identification of counterfactuals from observational and experimental data, we develop from first principles a canonical representation of any such query. First, we extend the notion of ancestors for counterfactual variables, which subsumes the usual one described before.
[Ancestors, of a counterfactual] Let be such that . Then, the set of (counterfactual) ancestors of , denoted , consists of each , such that (which includes itself), and . For a set of variables , we define as the union of the ancestors of each variable in the set. That is, . For instance, in Fig. 1(a), , and (depicted in Fig. 1(b)). In Fig. 1(c) and (represented in Fig. 1(d)).
Given a counterfactual variable , it could be the case that some values in become causally irrelevant to after the rest of has been fixed. Formally, [] Let where and is consistent with . Then, . Moreover, such simplification may reveal counterfactual expressions with equivalent or contradicting events. In Fig. 1(c), which has probability if , or that is simply . Similarly, the probabilities of counterfactual events of the form , , and are trivially and respectively.
For a set of counterfactual variables let . Notice that each variable in the ancestral set is “interventionally minimal” in the sense of Fig. 2.
Probabilistic and causal inference with graphical models exploits local structure among variables, specifically parent-child relationships, to infer and even estimate probabilities. In particular, Tian (tian:pea02testableimplications) introduced c-factors, which have proven instrumental in solving many problems in causal inference. We naturally generalize this notion to the counterfactual setting with the following definition.
[Counterfactual Factor (ctf-factor)] A counterfactual factor is a distribution of the form
(5) 
where each and there could be for some .
For example, for Fig. 1(c) , are ctf-factors but is not. Using the notion of ancestrality introduced in Section 2.2, we can factorize counterfactual probabilities as ctf-factors.
[Ancestral set factorization] Let be an ancestral set, that is, , and let be a vector with a value for each variable in . Then,
(6) 
where each is taken from and is determined for each as follows:

(i) the values for variables in are the same as in , and
(ii) the values for variables in are taken from corresponding to the parents of .
Proof outline.
Following a reverse topological order in , look at each . Since any parent of not in must appear in , the composition axiom (pearl:2k, 7.3.1) licenses adding them to the subscript. Then, by exclusion restrictions pearl:95a, any intervention not involving can be removed to obtain the form in Eq. 6. ∎
For example, consider the diagram in Fig. 1(c) and the counterfactual known as the effect of the treatment on the treated (ETT) heckman:92; pearl:2k. First note that and that , then
(7) 
Then, by Fig. 2 we can write
(8) 
Moreover, the following result describes a factorization of ctf-factors based on the c-component structure of the graph, which will prove instrumental in the next section.
[Counterfactual factorization] Let be a ctf-factor, let be a topological order over the variables in , and let be the c-components of the same graph. Define and as the values in corresponding to , then decomposes as
(9) 
Furthermore, each factor can be computed from as
(10) 
Armed with these results, we consider the identification problem in the next section.
3 Counterfactual Identification from Observations and Experiments
In this section, we consider the identification of a counterfactual probability from a collection of observational and experimental distributions. This task can be seen as a generalization of that in lee:etal19 where the available data is the same, but the query is a causal effect . Let , and assume that all of are available. Notice that is a valid choice corresponding to the observational (non-interventional) distribution.
[Counterfactual Identification] A query is said to be identifiable from in , if is uniquely computable from the distributions in any causal model which induces .
Given an arbitrary query , we could express it in terms of ctf-factors by writing where and then using Fig. 2 to write as a ctf-factor. For instance, the ancestral set with in Eq. 8 can be written in terms of ctf-factors as
(11) 
The following lemma characterizes the relationship between the identifiability of and .
[] Let be a ctf-factor and let be such that . Then, is identifiable from if and only if is identifiable from .
Once the query of interest is in ctf-factor form, the identification task reduces to identifying smaller ctf-factors according to the c-components of . In this respect, Eq. 8 implies the following. Let be a ctf-factor and be a c-component of . Then, if is not identifiable, is also not identifiable.
Proof.
Assume for the sake of contradiction that is not identifiable but is. Then, by Eq. 8, the former is identifiable from the latter, a contradiction. ∎
Let us consider the causal diagrams in Fig. 3 and the counterfactual , with , used to define quantities for fairness analysis in zhang:bar18a (e.g., ):
Unnesting  (12)  
Complete ancestral set  (13)  
Write in ctf-factor form  (14)
Due to the particular c-component structure of each model, we can factorize according to each model as:
(15)  
(16)  
(17) 
The question then becomes whether ctf-factors corresponding to individual c-components can be identified from the available input. In this example, all factors in Eq. 15 and Eq. 16 are identifiable from . For Eq. 16 in particular, they are given by
(18) 
In contrast, the factor in Eq. 17 (model Fig. 2(c)) is only identifiable if . The following definition and theorem characterize the factors that can be identified from and .
[Inconsistent ctf-factor] is an inconsistent ctf-factor if it is a ctf-factor, has a single c-component, and one of the following situations holds:
(i) there exist such that and , or
(ii) there exists and such that and .
[Ctf-factor identifiability] A ctf-factor is identifiable from if and only if it is consistent. If consistent, let and ; then is equal to where and are consistent with .
Consider the in Fig. 3(a), we can write
(19) 
While the factor is identifiable from as , the second factor is identifiable only if experimental data on is available, as .
We can also verify that the factor in Fig. 3(b) is inconsistent. As another example, consider the ETT-like expression in Fig. 3(c), for which we have
(20)  
(21)  
(22) 
where the factor is inconsistent.
Using the results in this section, we propose the algorithm ctfID (Algorithm 1), which, given a set of counterfactual variables , corresponding values , a collection of observational and experimental distributions , and a causal diagram , outputs an expression for in terms of the specified distributions or Fail if the query is not identifiable from such input in . Line 1 removes irrelevant subscripts from the query by virtue of Fig. 2. Then, lines 1 and 1 look for inconsistent events and redundant events, respectively. Line 2 finds the relevant ctf-factors consisting of a single c-component, as licensed by Fig. 2 and Eq. 8. As long as the factors are consistent, and allowed by Fig. 4, lines 4-9 carry out identification of the causal effect from the available distributions employing the algorithm Identify (tian:pea02generalid) as a subroutine. (For a running example of ctfID and details on how to use Identify, see Appendix E.) The procedure fails if any of the factors is inconsistent or not identifiable from . Otherwise, it returns the corresponding expression.
[ctfID completeness] A counterfactual probability is identifiable from and if and only if ctfID returns an expression for it.
4 Identification of Conditional Counterfactuals
In this section we consider counterfactual quantities of the form . It is immediate to write such a query as with , and try to identify it using ctfID. Nevertheless, depending on the graphical structure, the original query may be identifiable even if the latter is not. To illustrate, consider the causal diagram in Fig. 4(a) and the counterfactual , which can be written as . Following the strategy explained so far, the numerator is equal to , where the second ctf-factor is inconsistent, and therefore not identifiable from . Nevertheless, the conditional query is identifiable as
(23) 