1 Fundamental Limitations of the Deconfounder Approach
I will begin by summarizing the argument in D’Amour (2019) critiquing the “informal” message about the deconfounder approach (stated most explicitly in the informal statement of Theorem 6 and in Section 3.4). Specifically, this message asserts that, under the “no unobserved single-cause confounders” assumption, any well-fitting latent variable model will yield the correct potential outcome distribution $P(Y(a))$ via the adjustment formula (1). This informal story is motivated by strong intuition. Lemmas 1–3 establish that multi-cause confounding leaves an observable “imprint” of dependence between the causes $A$. Thus, it seems natural that we might be able to gain some information about, and even adjust for, an unobserved multi-cause confounder by modeling the dependence between the causes $A$.
Unfortunately, this intuition can only be carried so far: while a factor model for the causes can recover information about multi-cause confounders from observed data, the potential outcome distributions are not non-parametrically identified, except in cases where all confounding is observed. Thus, without additional unverifiable assumptions, no method can recover the distributions when there is unobserved confounding. In this section, I briefly demonstrate why this is the case. For a more in-depth argument about lack of identification in this setting with concrete examples, see D’Amour (2019).
As I show formally below, the key difficulty is that the causes $A$ cannot be used simultaneously as measurements of the unobserved confounder $Z$ and as treatments whose effects are being estimated. If the event $A = a$ provides only a noisy measurement of $Z$, there is ambiguity in how the outcome model should align the variability in the residual distributions $P(Y \mid A = a)$ and $P(Z \mid A = a)$: many specifications of the residual dependence between $Y$ and $Z$ are compatible with the observed data. This is a classic problem that arises when confounders are measured with error (see, e.g., Ogburn and Vanderweele, 2012). On the other hand, if the event $A = a$ provides a perfect measurement of $Z$, so that there is some function $\hat z$ with $Z = \hat z(A)$, then the overlap condition fails. In this case, $P(Y \mid A = a, Z = z)$ is only identified when $z = \hat z(a)$, because the event $\{A = a, Z \neq \hat z(a)\}$ has zero probability in the observed data.
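Let us now make this argument formal. To do this, we will account for how the two deconfounder assumptions of (a) good model fit and (b) “no unobserved single-cause confounders” constrain the factor model $P(A, Z)$ and its implications about the potential outcomes $P(Y(a))$. This accounting is convenient if we rewrite the joint distribution using copula densities, which characterize the dependence between random variables independently of their marginal distributions:

$$P(y, a, z) = P(y, a) \cdot P(z \mid a) \cdot c_{Y,Z\mid A}(y, z \mid a), \qquad (2)$$

where $c_{Y,Z\mid A}(y, z \mid a)$ denotes the copula density of $Y$ and $Z$ given $A = a$, evaluated at the conditional CDFs $F_{Y\mid A}(y \mid a)$ and $F_{Z\mid A}(z \mid a)$.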
Each factor in this decomposition corresponds to a different assumption. The requirement for good model fit constrains only the first term, which specifies the distribution of observable quantities. The “no unobserved single-cause confounders” assumption constrains the second term by requiring the causes to be conditionally independent given $Z$ (Lemma 2).¹ This leaves the outcome-confounder copula density $c_{Y,Z\mid A}$ unconstrained. This copula specifies the residual dependence between $Y$ and $Z$ after conditioning on the causes $A$, and plays a key role in specifying the outcome model $P(Y \mid A, Z)$.

¹ The “no unobserved single-cause confounders” assumption does not by itself uniquely identify the factor model. Some structure also needs to be put on the latent variable, and even then, the factor model may not be identified. See D’Amour (2019) for an example where the factor model is itself not identified.
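To complete the argument, note that the potential outcome distributions implied by the latent variable model are sensitive to the specification of this copula. Specifically, because the outcome model factors as $P(y \mid a, z) = P(y \mid a)\, c_{Y,Z\mid A}(y, z \mid a)$ in terms of the outcome-confounder copula density, the estimand in (1) can be written as

$$P(Y(a) = y) = \int P(Y = y \mid A = a)\, c_{Y,Z\mid A}(y, z \mid a)\, P(Z = z)\, dz.$$

Plugging in different specifications of the copula here yields different conclusions about $P(Y(a))$. Whenever $P(Y(a)) \neq P(Y \mid A = a)$, there are multiple specifications of the copula that yield different conclusions about the potential outcomes.² Thus, $P(Y(a))$ is not identified unless there is no confounding and $P(Y(a)) = P(Y \mid A = a)$.

² To see this, note that the independence copula $c_{Y,Z\mid A} \equiv 1$ implies that $P(Y \mid A, Z) = P(Y \mid A)$. Thus, because $P(Y \mid A = a) \neq P(Y(a))$, this copula and the true copula yield different conclusions about $P(Y(a))$.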
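The non-identification argument can be made concrete with a small numerical sketch. It assumes a linear-Gaussian model chosen only for illustration, not taken from the paper: $Z \sim N(0, 1)$, causes $A_j = Z + \varepsilon_j$, and outcome $Y = \beta^\top A + \gamma Z + \varepsilon_Y$. Two parameter settings are built to share the same factor model $P(A, Z)$ and the same observed distribution $P(Y, A)$. They differ only in the partial covariance of $Y$ and $Z$ given $A$, which for jointly Gaussian variables pins down the outcome-confounder copula, and therefore also in the causal coefficients $\beta$. The helper names `joint_cov` and `cond_cov_zy_given_a` are hypothetical.

```python
import numpy as np

def joint_cov(beta, gamma, sigma2_y, sigma2_a):
    """Covariance of (A_1, ..., A_m, Z, Y) in the assumed linear-Gaussian model:
    Z ~ N(0, 1), A_j = Z + e_j with Var(e_j) = sigma2_a, Y = beta.A + gamma*Z + e_Y."""
    m = len(beta)
    ones = np.ones(m)
    cov_a = np.outer(ones, ones) + sigma2_a * np.eye(m)
    cov_ay = cov_a @ beta + gamma * ones          # Cov(A, Y)
    cov_zy = beta @ ones + gamma                  # Cov(Z, Y), since Cov(A_j, Z) = 1
    var_y = beta @ cov_a @ beta + 2 * gamma * (beta @ ones) + gamma**2 + sigma2_y
    S = np.empty((m + 2, m + 2))
    S[:m, :m] = cov_a
    S[:m, m] = S[m, :m] = ones                    # Cov(A, Z)
    S[:m, m + 1] = S[m + 1, :m] = cov_ay
    S[m, m] = 1.0                                 # Var(Z)
    S[m, m + 1] = S[m + 1, m] = cov_zy
    S[m + 1, m + 1] = var_y
    return S

def cond_cov_zy_given_a(S, m):
    """Covariance matrix of (Z, Y) given A, via the Schur complement."""
    a, zy = np.arange(m), np.array([m, m + 1])
    S_zya = S[np.ix_(zy, a)]
    return S[np.ix_(zy, zy)] - S_zya @ np.linalg.solve(S[np.ix_(a, a)], S_zya.T)

m, sigma2_a = 3, 1.0
ones = np.ones(m)
w = np.linalg.solve(np.outer(ones, ones) + sigma2_a * np.eye(m), ones)  # E[Z | A] = w.A

# Model 1: no residual Y-Z dependence given A (independence copula).
beta1, gamma1, sigma2_y1 = np.array([1.0, 0.5, -0.5]), 0.0, 1.0
# Model 2: add a direct Z effect and shift beta so the observed moments are unchanged.
gamma2 = 0.5
beta2 = beta1 - gamma2 * w
sigma2_y2 = sigma2_y1 - gamma2**2 * (1 - ones @ w)

S1 = joint_cov(beta1, gamma1, sigma2_y1, sigma2_a)
S2 = joint_cov(beta2, gamma2, sigma2_y2, sigma2_a)
obs = np.r_[np.arange(m), m + 1]                  # indices of the observed (A, Y)

print(np.allclose(S1[np.ix_(obs, obs)], S2[np.ix_(obs, obs)]))  # same P(Y, A)
print(np.allclose(S1[: m + 1, : m + 1], S2[: m + 1, : m + 1]))  # same factor model P(A, Z)
print(cond_cov_zy_given_a(S1, m)[0, 1], cond_cov_zy_given_a(S2, m)[0, 1])
print(beta1, beta2)                               # different causal coefficients
```

Because $E[Y(a)] = \beta^\top a$ in this model, the two settings disagree about interventional means, for example at $a = (1, 0, 0)$. Data on $(A, Y)$ alone cannot tell them apart.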
We can now revisit the tension between the roles of the causes as measurements of $Z$ and as treatments. In cases where $Z$ can only be inferred inexactly (i.e., $P(Z \mid A)$ is non-degenerate), the marginals $P(Y \mid A)$ and $P(Z \mid A)$ put some constraints on the outcome model $P(Y \mid A, Z)$, but the ambiguity in the copula implies that this model is not identified for any value of $a$. In cases where $Z$ can be reconstructed deterministically from the causes by some function $\hat z(A)$ (i.e., $P(Z \mid A)$ is degenerate), the outcome model is identified when $z = \hat z(a)$, but the copula is undefined whenever $z \neq \hat z(a)$, because this event has zero probability.
The upshot of this argument is that neither the deconfounder nor any other estimation method can adjust for unobserved confounding when estimating $P(Y(a))$ under the “no unobserved single-cause confounders” assumption alone. This conclusion holds no matter how much information we can glean about an unobserved confounder from the causes $A$. Although this assumption does put some non-trivial structure on the latent variable model, it is not enough for causal estimation.
This lack of identification leaves practitioners looking to apply the deconfounder with two options: either make additional assumptions about the latent variable model so that $P(Y(a))$ is identified, or seek out causal comparisons where all of the confounding is effectively observed. In the Theory section of the paper, the authors consider both of these paths. I will discuss each of these options in turn.
2 Parametric Identification, If You Must
I now turn to the subject of parametric identification of causal parameters, and offer some cautions about employing this strategy. Parametric identification is a natural strategy to employ when the causal parameters of interest are not non-parametrically identified. One obtains parametric identification by adding parametric assumptions to the working model that constrain the implied potential outcome distributions to be unique. The authors employ this parametric identification strategy in the experimental demonstrations of the deconfounder, as well as in the formal result in Theorem 6. In Theorem 6, the copula is restricted by assuming that there is no interaction between the causes and the latent variable in the outcome model (i.e., that they combine linearly), and by assuming that the confounder is a piecewise constant function of the causes $A$. In the paper’s experiments, the authors assume a parametric factor model (e.g., a quadratic factor model for the genome-wide association study simulation) and a true linear outcome model. In the cases of Theorem 6 and the GWAS simulation study, the authors prove that these parametric assumptions are sufficient for identification.
Parametric identification can be a risky strategy to employ in practice. Specifically, the fact that the parametric assumptions are necessary to identify causal parameters implies that some aspects of these assumptions are not testable in the observed data. The decomposition in (2) makes this clear: given that the observed data are insufficient to identify the causal parameters, the parametric assumptions must restrict some of the unidentified portions of the latent variable model (in particular, the outcome-confounder copula). Thus, to have confidence in this approach, one needs to have confidence in the parametric model used to identify causal effects as a true model of the world, not merely as an acceptable description of the observed data. This is because the identifying parametric assumptions specify not only a descriptive model of the observed data, but also a structural model for unobserved counterfactual outcomes. Relying on parametric identification may be feasible in cases where one has strong prior knowledge (e.g., about the quantity represented by the unmeasured confounder, or the specific distributions of measurement errors), but such knowledge is often unavailable.
In addition, uncertainty estimates that are based directly on the parametric specification, e.g., Bayesian credible sets, do not capture the full extent of the uncertainty about causal effects that remains given the data. Specifically, these uncertainty estimates only quantify uncertainty within the specified model, and do not include the fundamental uncertainty associated with the lack of non-parametric identification of the potential outcome distributions $P(Y(a))$. As a result, unless the prior information used to specify the parametric assumptions is very strong, these uncertainty estimates will understate the degree of uncertainty about a causal parameter estimate. This is a standard critique of parametric uncertainty quantification, but it carries extra weight in contexts where conclusions depend on untestable aspects of the parametric model. For example, for the parametrically identified latent variable model in the GWAS example, as the sample size grows, the posterior for the causal parameter will concentrate around a single value. This happens even though there is a range of outcome models, corresponding to different copulas, that are equally compatible with the observed data but would concentrate on different causal parameters. In fact, even small, seemingly benign parametric choices can mask alternative causal explanations. Lessons from latent variable models in the missing data and causal inference literatures can be instructive here. For example, analyses of the widely used Heckman selection model (Heckman, 1979) have noted that the tail thickness of priors on latent variables can induce starkly different conclusions, which are hidden when the Gaussian default is used (Little and Rubin, 2015; Ding, 2014). See also discussions in Robins et al. (2000) and Linero and Daniels (2017) for other examples.
Here, sensitivity analysis can be a useful tool to account for the fundamental uncertainty due to non-identification of the causal estimand. When performed with parametric models, sensitivity analyses perturb the parametric assumptions made with the estimating model in order to understand what other causal conclusions could be obtained under different parametric specifications. Performing sensitivity analyses on deconfounder estimates is straightforward: a number of sensitivity analysis approaches employ a working model with the same latent variable structure (e.g., Rosenbaum and Rubin, 1983; Imbens, 2003; Dorie et al., 2016; Cinelli and Hazlett, 2018). However, sensitivity analyses can also fall victim to spurious parametric identification if the perturbations are not appropriately parameterized (Gustafson et al., 2018). To avoid this issue, it can be useful to employ sensitivity analysis strategies that cleanly separate the portions of the model that are identified by the observed data from those that are identified by parametric assumptions (Franks et al., 2019; Robins et al., 2000; Linero and Daniels, 2017). In the context of the deconfounder, the decomposition in (2) is a promising place to start, and is the subject of current work.
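The following minimal sketch shows a sensitivity analysis that keeps the identified and unidentified pieces separate. It uses a linear-Gaussian working model as a stand-in; this is an assumption and not a method taken from the cited papers. The observed second moments of $(A, Y)$ and a fitted factor model's $\operatorname{Cov}(A, Z)$ fix the regressions of $Y$ and of $Z$ on $A$. The only free quantity is then the partial correlation $\rho$ of $Y$ and $Z$ given $A$, which indexes the Gaussian outcome-confounder copula. Sweeping $\rho$ over $[-1, 1]$ traces out causal coefficients that are all equally compatible with the data. The function `sensitivity_effects` and the numeric inputs are hypothetical.

```python
import numpy as np

def sensitivity_effects(cov_a, cov_ay, var_y, cov_az, rho_grid):
    """Causal coefficients beta(rho) under Y = beta.A + gamma*Z + e_Y with Var(Z) = 1.

    cov_a, cov_ay, var_y are identified from observed data; cov_az is taken from a
    fitted factor model. rho, the partial correlation of Y and Z given A, is the
    copula parameter that the observed data leave unconstrained."""
    b = np.linalg.solve(cov_a, cov_ay)     # regression of Y on A (identified)
    w = np.linalg.solve(cov_a, cov_az)     # regression of Z on A (from the factor model)
    var_y_a = var_y - cov_ay @ b           # Var(Y | A)
    var_z_a = 1.0 - cov_az @ w             # Var(Z | A); must be > 0 (non-degenerate P(Z | A))
    gammas = np.asarray(rho_grid) * np.sqrt(var_y_a / var_z_a)
    return b[None, :] - gammas[:, None] * w[None, :]

# Hypothetical population moments generated with beta = (1, 0.5, -0.5), gamma = 0.5.
m = 3
ones = np.ones(m)
cov_a = np.outer(ones, ones) + np.eye(m)
beta_true, gamma_true, sigma2_y = np.array([1.0, 0.5, -0.5]), 0.5, 1.0
cov_ay = cov_a @ beta_true + gamma_true * ones
var_y = beta_true @ cov_a @ beta_true + 2 * gamma_true * beta_true.sum() + gamma_true**2 + sigma2_y

rho_grid = np.linspace(-1.0, 1.0, 5)
effects = sensitivity_effects(cov_a, cov_ay, var_y, ones, rho_grid)
for rho, beta in zip(rho_grid, effects):
    print(f"rho = {rho:+.2f}  coefficient on A_1 = {beta[0]:.3f}")
```

Setting $\rho = 0$ reproduces the naive regression of $Y$ on $A$, and the data alone give no reason to prefer it over any other value in the grid. The parameterization also breaks down when $Z$ is a deterministic function of $A$ (`var_z_a` becomes zero), which mirrors the overlap failure discussed in Section 1.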
3 Toward a More Selective Deconfounder Workflow
A more cautious alternative to pursuing parametric identification is to seek out causal questions that have definitive answers under the “no unobserved single-cause confounders” assumption. The authors take this path in Theorems 7 and 8, in a setting where the latent confounder $Z$ can be deterministically reconstructed as a function of the causes $A$. Here, however, the factor model seems less interesting as a tool for calculating causal effects, and more interesting as a tool for establishing empirically when no unobserved confounding is present. In my opinion, this is the more interesting thread to follow.
To review, in Theorem 7 the authors consider partitioning the causes into a set of focal causes $A_F$, whose effects will be estimated, and a set of auxiliary causes $A_X$, which will serve as measurements of the latent confounder $Z$. The theorem then states that if the latent confounder can be written as a function of the auxiliary causes alone, $Z = \hat z(A_X)$,³ then the distributions of potential outcomes $P(Y(a_F))$ defined with respect to the focal causes are identifiable, subject to an overlap condition. Meanwhile, Theorem 8 states that certain counterfactual potential outcome distributions of the form $P(Y(a) \mid A = a')$ are identifiable as long as the causes $a$ and $a'$ map to the same value of the latent confounder, i.e., $\hat z(a) = \hat z(a')$, where $Z = \hat z(A)$.

³ This is not how the theorem is stated, but this functional restriction is implied by the subsequent overlap condition.

In these results, the authors focus on the role of the factor model in the identification of causal estimands under the “no unobserved single-cause confounders” assumption. However, the factor model is not essential for this point. Note that Theorems 7 and 8 both imply that the causal parameters can be identified in terms of the causes alone, because it is assumed that the confounder can be written as a function of $A$. Written with slightly more generality, the identification result in Theorem 7 implies:

$$P(Y(a_F)) = \int P(Y \mid A_F = a_F, A_X = a_X)\, P(A_X = a_X)\, da_X, \qquad (3)$$

while the identification result in Theorem 8 implies:

$$P(Y(a) \mid A = a') = P(Y \mid A = a) \quad \text{whenever } \hat z(a) = \hat z(a'). \qquad (4)$$
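A toy simulation can illustrate the Theorem 8 identity, that $P(Y(a) \mid A = a')$ equals $P(Y \mid A = a)$ whenever $\hat z(a) = \hat z(a')$, at the level of means. The model is a hypothetical construction. A binary confounder $Z$ places every cause in $\{0, 1\}$ or in $\{2, 3\}$, so the causes are mutually independent given $Z$ and $Z = \hat z(A)$ can be read off the first cause. The outcome function is also made up. Because the simulation knows the structural model, it can compute the counterfactual mean directly and compare it with the observed conditional mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200_000, 3

def zhat(a):
    """Hypothetical reconstruction map: in this toy model Z = 1[a_1 >= 2]."""
    return int(np.asarray(a)[0] >= 2)

def outcome_mean(a, z):
    """Hypothetical structural outcome mean, including a cause-confounder interaction."""
    a = np.asarray(a, dtype=float)
    return 0.5 * a.sum(axis=-1) + 2.0 * z + 0.3 * z * a[..., 0]

# Each cause is uniform on {0, 1} when Z = 0 and uniform on {2, 3} when Z = 1.
Z = rng.integers(0, 2, size=n)
A = rng.integers(0, 2, size=(n, m)) + 2 * Z[:, None]
noise = rng.normal(size=n)
Y = outcome_mean(A, Z) + noise

def observed_mean(a):
    """Estimate of E[Y | A = a] from the simulated observational data."""
    return Y[np.all(A == np.asarray(a), axis=1)].mean()

def counterfactual_mean(a, a_prime):
    """Simulated E[Y(a) | A = a']: set A = a for units observed at A = a', keeping their Z and noise."""
    mask = np.all(A == np.asarray(a_prime), axis=1)
    a_set = np.broadcast_to(np.asarray(a), (mask.sum(), m))
    return (outcome_mean(a_set, Z[mask]) + noise[mask]).mean()

a, a_same, a_diff = (2, 3, 2), (3, 2, 2), (0, 1, 1)
print(zhat(a), zhat(a_same), zhat(a_diff))               # 1 1 0
print(observed_mean(a), counterfactual_mean(a, a_same))  # close: same zhat
print(counterfactual_mean(a, a_diff))                    # far off: these units have Z = 0
```

In this toy model the identity is nearly a tautology. Given $A = a'$, the confounder is fixed at $\hat z(a')$, so setting $A = a$ with $\hat z(a) = \hat z(a')$ lands exactly in the confounder stratum of the observed units with $A = a$.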
To me, the more interesting point is that the factor model can be used in some cases to determine empirically whether some of the assumptions of the theorems are met. For example, the setting of Theorem 7 can be framed as a problem where the unobserved confounder $Z$ is measured with proxies, namely the auxiliary causes. It is well understood that in the limit where $Z$ is perfectly recovered by the proxies, the potential outcome distribution for the focal causes is identified (Ogburn and Vanderweele, 2012). However, in single-cause problems, one cannot determine whether this condition has been met. Similarly, Theorem 8 can be framed as a setting where one is imputing a set of counterfactual outcomes within a subpopulation where there is no confounding because, within this subpopulation, the confounder is fixed. Here, too, in single-cause problems, one cannot definitively identify such subpopulations from observed data. Interestingly, the theory of multi-cause confounding presented in the paper suggests that these assumptions can be empirically validated under some restrictions on the causal DAG relating $Z$ to $A$, together with the “no unobserved single-cause confounders” assumption. For example, this theory supports the following proposition.
Suppose there are no single-cause confounders, and the structural relationships between causes $A$, latent confounder $Z$, and observed outcomes $Y$ can be represented in the DAG in Figure 1. Suppose that in addition to the causes $A$, we also have auxiliary covariates $X$, which are conditionally independent of the causes $A$ given the multi-cause confounder $Z$. Then for any function $\hat z$ such that the causes $A$ are mutually independent conditional on $\hat z$, the conditional independence $Y(a) \perp A \mid \hat z$ also holds for each $a$.
Theorems 7 and 8 can be written as consequences of this proposition. This proposition is potentially useful because it shows that absence of certain confounding structures has observable implications. This insight is closely related to the literature on negative controls (see, e.g., Lipsitch et al., 2010).
This result suggests that one can use a workflow similar to the deconfounder's to determine, at least in principle, whether identification statements like (3) or (4) are valid in a given setting. Specifically, one can obtain a function $\hat z$ (perhaps by fitting a factor model), then test whether the causes $A$ appear to be mutually independent conditional on $\hat z$. If one is satisfied that this is true, (3) or (4) can be applied. Importantly, this procedure is truly agnostic to the parametric specification of the model used to obtain $\hat z$: all of the conditions are only functions of observables.
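This check can be sketched under strong simplifying assumptions. The causes are discrete, $\hat z$ takes a few discrete values, and mutual independence is probed only through pairwise chi-square tests within each stratum of $\hat z$. Pairwise independence is a necessary, not a sufficient, condition for mutual independence. The toy data, the helper names, and the hand-coded $\hat z$ (standing in for a fitted factor model) are hypothetical. The tests themselves use `scipy.stats.chi2_contingency`.

```python
import itertools
import numpy as np
from scipy.stats import chi2_contingency

def contingency(x, y):
    """Cross-tabulate two discrete arrays over the levels that actually occur."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1), dtype=int)
    np.add.at(table, (xi, yi), 1)
    return table

def min_pairwise_pvalue(A, zhat):
    """Smallest chi-square p-value over all pairs of causes, within each stratum of zhat."""
    pvals = []
    for level in np.unique(zhat):
        stratum = A[zhat == level]
        for j, k in itertools.combinations(range(A.shape[1]), 2):
            table = contingency(stratum[:, j], stratum[:, k])
            if min(table.shape) < 2:
                continue  # a cause that is constant within the stratum has nothing to test
            _, p, _, _ = chi2_contingency(table)
            pvals.append(p)
    return min(pvals)

rng = np.random.default_rng(0)
n, m = 4000, 4
Z = rng.integers(0, 2, size=n)                        # hypothetical binary confounder
A = rng.integers(0, 2, size=(n, m)) + 2 * Z[:, None]  # causes in {0,1} if Z=0, in {2,3} if Z=1
zhat_good = (A[:, 0] >= 2).astype(int)                # stand-in for a fitted factor model
zhat_bad = np.zeros(n, dtype=int)                     # ignores the confounder entirely

print(min_pairwise_pvalue(A, zhat_good))  # no evidence against independence expected
print(min_pairwise_pvalue(A, zhat_bad))   # essentially zero: strong residual dependence
```

A tiny p-value for the bad $\hat z$ flags residual dependence. A large p-value for the good $\hat z$, however, is only a failure to reject, which is exactly the power problem discussed in the next paragraphs.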
While the workflow in this procedure is similar to the deconfounder, it has a different use case. Instead of enabling causal inference in a wide range of cases, this procedure would be used to determine whether one can proceed with unconfounded inference at all, and can potentially give “no” as an answer. Still, this sort of procedure can prove useful in complex data contexts, where it can be valuable to surface causal questions that can be adequately answered with the available data. In a specific example of this approach, Sharma et al. (2018) propose a similar testing procedure to uncover unconfounded comparisons, and use it to evaluate the causal effect of a recommender system on purchasing rates for certain products.
In outlining this procedure, I have belabored the point that it is a workflow “in principle” because it could prove tricky to implement. The observable implication that needs to be tested is a complex conditional independence statement, and these are notoriously difficult to test in practice (Shah and Peters, 2018). In particular, one would receive the “green light” to estimate a causal parameter by failing to reject the null of conditional independence, which can only be reliably depended upon if the test has acceptably high power, but designing such tests is difficult, and in some settings, impossible.
Here, it can again be helpful to turn back to sensitivity analysis. Instead of attempting to rule out all possible forms of dependence between the causes conditional on $\hat z$, a sensitivity analysis approach could explore a number of candidate models for the residual dependence between the causes, and relate these models to the confounding induced by the unobserved confounder $Z$. For example, one could examine the range of causal effects that would be compatible with the following assumption: conditional on $\hat z$, the causes $A$ are no more predictive of a potential outcome $Y(a)$ than any leave-one-out set of the causes $A_{-j}$ is of a held-out cause $A_j$. This sort of calibration argument is common in more standard sensitivity analyses (Imbens, 2003; Dorie et al., 2016; Franks et al., 2019; Cinelli and Hazlett, 2018). In cases where dependence between the causes can be ruled out conclusively, this approach would yield a sensitivity region that collapses to a point. In the more likely case where many forms of dependence cannot be ruled out, this approach would represent this uncertainty with a wider sensitivity region. It should be noted that constructing a plausible sensitivity analysis of this type would require deep domain knowledge to justify the analogy between different dependences between variables. Negative control methods and related identification strategies (Lipsitch et al., 2010; Miao et al., 2018) could be framed as particularly successful executions of this type of argument.
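The calibration benchmark in this argument can be sketched numerically. The sketch assumes linear relationships and measures "predictiveness" by partial $R^2$; both choices are assumptions, not prescriptions from the text. For each cause $A_j$, it computes how much of the variance of $A_j$ given $\hat z$ is explained by the leave-one-out set $A_{-j}$. The one-factor data-generating process and the noisy proxy used as $\hat z$ are hypothetical. Only the benchmark is computed, not the full sensitivity region.

```python
import numpy as np

def loo_partial_r2(A, zhat):
    """Partial R^2 of each cause A_j on the other causes A_{-j}, given zhat (OLS with intercept)."""
    n, m = A.shape
    base = np.column_stack([np.ones(n), zhat])
    r2 = np.empty(m)
    for j in range(m):
        y = A[:, j]
        full = np.column_stack([base, np.delete(A, j, axis=1)])
        rss_base = np.sum((y - base @ np.linalg.lstsq(base, y, rcond=None)[0]) ** 2)
        rss_full = np.sum((y - full @ np.linalg.lstsq(full, y, rcond=None)[0]) ** 2)
        r2[j] = 1.0 - rss_full / rss_base
    return r2

rng = np.random.default_rng(1)
n, m = 5000, 5
Z = rng.normal(size=n)
A = Z[:, None] + rng.normal(size=(n, m))   # hypothetical one-factor model with unit loadings
zhat_exact = Z                             # oracle: confounder recovered exactly
zhat_noisy = Z + rng.normal(size=n)        # hypothetical noisy proxy for Z

print(loo_partial_r2(A, zhat_exact).round(3))  # near 0: no residual dependence among causes
print(loo_partial_r2(A, zhat_noisy).round(3))  # clearly positive: residual dependence remains
```

The largest of these values is one candidate benchmark for how strongly, given $\hat z$, the causes might plausibly predict a potential outcome. Turning that bound into a range of causal effects would require a further working model for $Y$.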
In writing this paper, the authors have drawn attention to a problem that is simultaneously scientifically important, methodologically interesting, and conceptually subtle. Although I have taken on the role of critic in our conversations, I believe their contribution here is important. I remain skeptical about the deconfounder as a method for causal point estimation, but believe that the authors’ characterization of multi-cause confounding could yield fruitful developments in sensitivity analysis, and in potentially obtaining identification results in more complex settings. This work has certainly inspired me to pay more attention to this problem, and to consider how new methods and tools can be developed to help practitioners draw principled causal conclusions in this setting.
- Cinelli and Hazlett (2018) Carlos Cinelli and Chad Hazlett. Making sense of sensitivity: Extending omitted variable bias. Technical report, Working Paper, 2018.
- Ding (2014) Peng Ding. Bayesian robust inference of sample selection using selection-t models. Journal of Multivariate Analysis, 124:451–464, 2014.
- Dorie et al. (2016) Vincent Dorie, Masataka Harada, Nicole Bohme Carnegie, and Jennifer Hill. A flexible, interpretable framework for assessing sensitivity to unmeasured confounding. Statistics in medicine, 35(20):3453–3470, 2016.
- D’Amour (2019) Alexander D’Amour. On multi-cause causal inference with unobserved confounding: Counterexamples, impossibility, and alternatives. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 3478–3486, 2019.
- Franks et al. (2019) Alex Franks, Alex D’Amour, and Avi Feller. Flexible sensitivity analysis for observational studies without observable implications. Journal of the American Statistical Association, (just-accepted):1–38, 2019.
- Gustafson et al. (2018) Paul Gustafson, Lawrence C McCandless, et al. When is a sensitivity parameter exactly that? Statistical Science, 33(1):86–95, 2018.
- Heckman (1979) James J Heckman. Sample selection bias as a specification error. Econometrica, 47(1):153–161, 1979.
- Imbens (2003) Guido W Imbens. Sensitivity to exogeneity assumptions in program evaluation. American Economic Review, 93(2):126–132, 2003.
- Linero and Daniels (2017) Antonio R Linero and Michael J Daniels. Bayesian approaches for missing not at random outcome data: The role of identifying restrictions. 2017.
- Lipsitch et al. (2010) Marc Lipsitch, Eric Tchetgen Tchetgen, and Ted Cohen. Negative controls: a tool for detecting confounding and bias in observational studies. Epidemiology (Cambridge, Mass.), 21(3):383, 2010.
- Little and Rubin (2015) Roderick JA Little and Donald B Rubin. Statistical analysis with missing data. John Wiley & Sons, 2015.
- Miao et al. (2018) Wang Miao, Zhi Geng, and Eric J Tchetgen Tchetgen. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika, 105(4):987–993, 2018.
- Ogburn and Vanderweele (2012) Elizabeth L Ogburn and Tyler J Vanderweele. Bias attenuation results for nondifferentially mismeasured ordinal and coarsened confounders. Biometrika, 100(1):241–248, 2012.
- Robins et al. (2000) James M Robins, Andrea Rotnitzky, and Daniel O Scharfstein. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical models in epidemiology, the environment, and clinical trials, pages 1–94. Springer, 2000.
- Rosenbaum and Rubin (1983) Paul R Rosenbaum and Donald B Rubin. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society: Series B (Methodological), 45(2):212–218, 1983.
- Shah and Peters (2018) Rajen D Shah and Jonas Peters. The hardness of conditional independence testing and the generalised covariance measure. arXiv preprint arXiv:1804.07203, 2018.
- Sharma et al. (2018) Amit Sharma, Jake M Hofman, Duncan J Watts, et al. Split-door criterion: Identification of causal effects through auxiliary outcomes. The Annals of Applied Statistics, 12(4):2699–2733, 2018.
- Wang and Blei (2019) Yixin Wang and David M Blei. Multiple causes: A causal graphical view. arXiv preprint arXiv:1905.12793, 2019.