Hölder Bounds for Sensitivity Analysis in Causal Reasoning

07/09/2021
by Serge Assaad, et al.

We examine interval estimation of the effect of a treatment T on an outcome Y given the existence of an unobserved confounder U. Using Hölder's inequality, we derive a set of bounds on the confounding bias |E[Y|T=t]-E[Y|do(T=t)]| based on the degree of unmeasured confounding (i.e., the strength of the connection U->T, and the strength of U->Y). These bounds are tight either when U is independent of T or when U is independent of Y given T (when there is no unobserved confounding). We focus on a special case of this bound depending on the total variation distance between the distributions p(U) and p(U|T=t), as well as the maximum (over all possible values of U) deviation of the conditional expected outcome E[Y|U=u,T=t] from the average expected outcome E[Y|T=t]. We discuss possible calibration strategies for this bound to get interval estimates for treatment effects, and experimentally validate the bound using synthetic and semi-synthetic datasets.


1 Introduction

A typical assumption made in the treatment effect estimation literature is ignorability, i.e., that there are no unobserved confounders. This is a useful assumption since it (along with other assumptions) enables point-identification of treatment effects from observed data (Imbens and Wooldridge, 2009; Pearl, 2009). Ignorability may be more plausible for datasets where we collect an exhaustive set of covariates (e.g., electronic health record data (Johnson et al., 2018; Jensen et al., 2012)), but this assumption is untestable based on observed data.

Let $Y$ be the outcome of interest and $T$ be the treatment. Relaxing the ignorability assumption, we may assume the existence of an unobserved confounder $U$, and make assumptions about the strength of $U \to T$ and $U \to Y$ to get an interval estimate of the treatment effect; this is known as sensitivity analysis (Cornfield et al., 1959; Rosenbaum, 2010; Robins et al., 2000). Some proposals for sensitivity analysis proceed with additional modeling or distributional assumptions about the unmeasured confounder (Bross, 1966; Schlesselman, 1978; Rosenbaum and Rubin, 1983), which introduce additional untestable assumptions. To avoid this, we make the following contributions in this work:

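  1. Making minimal assumptions about $U$, we bound the confounding bias¹ $|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]|$ between the observational expectation $\mathbb{E}[Y \mid T=t]$ and the interventional expectation $\mathbb{E}[Y \mid do(T=t)]$ (Pearl, 2009) by the product of a treatment sensitivity parameter (quantifying the strength of the direct connection $U \to T$) and an outcome sensitivity parameter (quantifying the strength of the direct connection $U \to Y$). These bounds are tight when either $U \perp T$ or $U \perp Y \mid T$ (i.e., when ignorability is satisfied).

     ¹ In this work, we use “confounding bias” to refer to $|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]|$, though previous work has used it to mean, e.g., the difference between a naive (confounded) effect estimate and the true causal effect (Zheng et al., 2021; VanderWeele and Arah, 2011).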

  2. We examine a special case of these bounds that is relatively easy to calibrate, and apply it to obtain interval estimates of treatment effects for any two treatments $t, t'$. Our results also apply to conditional average treatment effect (CATE) estimation (conditioned on observed covariates $X$).

  3. We discuss possible calibration strategies for the bound, allowing us to find reasonable sensitivity parameter values.

2 Methods

2.1 Basic setup

Suppose we observe, for each unit $i \in \{1, \dots, N\}$ (of $N$ units), a treatment $t_i \in \mathcal{T}$, an outcome $y_i \in \mathcal{Y}$, and observed covariates $x_i \in \mathcal{X}$. Hence, our observed dataset is $\mathcal{D} = \{(x_i, t_i, y_i)\}_{i=1}^{N}$. Additionally, we assume the existence of an unobserved confounder $u_i \in \mathcal{U}$. We use the capital letters $X, T, Y, U$ to denote the random variables for the covariates, the treatment, the outcome and the unobserved confounder, respectively. The assumed causal graph (Pearl, 2009) relating these is shown in Figure 1.

Figure 1: Assumed causal graph between unobserved confounder $U$, observed confounders $X$, treatment $T$, and outcome $Y$. Gray indicates that $U$ is unobserved, and the dotted double-sided arrow indicates a possible correlation.

We make the following assumptions:

Assumption 1 (Latent ignorability).

The set $\{U, X\}$ blocks all backdoor paths between $T$ and $Y$.

Assumption 2 (Positivity).

$p(t \mid u, x) > 0$ for all $t \in \mathcal{T}$, $u \in \mathcal{U}$, $x \in \mathcal{X}$.

For brevity, we omit the condition $X = x$ from all conditional statements/probabilities and leave it implicit. We also use, e.g., $p(y \mid t)$ as shorthand for $p(Y = y \mid T = t)$ and, e.g., $p(y \mid do(t))$ as shorthand for the density of $Y$ given $do(T = t)$. Under the above assumptions, we can write the interventional distribution as:

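$$p(y \mid do(t)) = \int_{\mathcal{U}} p(y \mid do(t), u)\, p(u \mid do(t))\, du \tag{1}$$
$$\overset{(a)}{=} \int_{\mathcal{U}} p(y \mid t, u)\, p(u)\, du \tag{2}$$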

where (a) holds from Assumption 1 (Pearl, 2009; D’Amour, 2019), together with $p(u \mid do(t)) = p(u)$ since $U$ is not a descendant of $T$. We can also write the observational distribution as:

$$p(y \mid t) = \int_{\mathcal{U}} p(y \mid t, u)\, p(u \mid t)\, du \tag{3}$$

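The distribution $p(y \mid do(t))$ is of interest but not estimable from observed data, whereas $p(y \mid t)$ is estimable but not of direct interest. In general $p(y \mid do(t)) \neq p(y \mid t)$ (the key difference is that we integrate over $p(u)$ in (2) vs. $p(u \mid t)$ in (3)); however, there are two special cases (no unobserved confounding) where $p(y \mid do(t)) = p(y \mid t)$: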

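  1. $U \perp T$: then $p(u \mid t) = p(u)$, so (2) and (3) coincide.

  2. $U \perp Y \mid T$: then $p(y \mid t, u) = p(y \mid t)$, so both (2) and (3) reduce to $p(y \mid t)$.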
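As a concrete check of (2) and (3), the following minimal sketch assumes binary $U$, $T$, $Y$ stored as a joint table `p[u, t, y]` (the array layout and the helper names are our own illustrative choices, not from the paper). It weights $\mathbb{E}[Y \mid U=u, T=t]$ by $p(u \mid t)$ for the observational mean and by $p(u)$ for the interventional mean. In the confounded example the two differ (about 0.62 vs. 0.5), while in each of the two special cases above they coincide.

```python
import numpy as np

def make_joint(p_u, p_t1_given_u, p_y1_given_ut):
    """Build a joint table p[u, t, y] for binary U, T, Y from p(u), p(T=1|u), p(Y=1|u,t)."""
    p = np.zeros((2, 2, 2))
    for u in range(2):
        for t in range(2):
            p_t = p_t1_given_u[u] if t == 1 else 1.0 - p_t1_given_u[u]
            p_y1 = p_y1_given_ut[u][t]
            p[u, t, 1] = p_u[u] * p_t * p_y1
            p[u, t, 0] = p_u[u] * p_t * (1.0 - p_y1)
    return p

def observational_mean(p, t):
    """E[Y | T=t]: average E[Y | U=u, T=t] over p(u | t), as in (3)."""
    p_ut = p.sum(axis=2)                       # p(u, t)
    mu_ut = p[:, t, 1] / p_ut[:, t]            # E[Y | U=u, T=t] for binary Y
    p_u_given_t = p_ut[:, t] / p_ut[:, t].sum()
    return float(mu_ut @ p_u_given_t)

def interventional_mean(p, t):
    """E[Y | do(T=t)]: average E[Y | U=u, T=t] over p(u), as in (2)."""
    p_ut = p.sum(axis=2)
    mu_ut = p[:, t, 1] / p_ut[:, t]
    p_u = p_ut.sum(axis=1)                     # p(u)
    return float(mu_ut @ p_u)

# Confounded example: U = 1 raises both P(T=1) and P(Y=1).
p_y = [[0.1, 0.3], [0.5, 0.7]]                 # rows: u, columns: t
confounded = make_joint([0.5, 0.5], [0.2, 0.8], p_y)
print(observational_mean(confounded, 1), interventional_mean(confounded, 1))

# Special case 1 (U independent of T) and special case 2 (U independent of Y given T).
no_u_to_t = make_joint([0.5, 0.5], [0.5, 0.5], p_y)
no_u_to_y = make_joint([0.5, 0.5], [0.2, 0.8], [[0.3, 0.6], [0.3, 0.6]])
```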

Next, we extrapolate from the above two scenarios; specifically, we provide a bound on $|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]|$ that vanishes when $U \perp T$ or $U \perp Y \mid T$.

2.2 Hölder bounds on the confounding bias

Here, we state our main result on bounding the confounding bias $|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]|$.

Theorem 1.

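Assuming $\mathbb{E}[|Y| \mid T=t] < \infty$ and $\mathbb{E}[|Y| \mid do(T=t)] < \infty$, the confounding bias is bounded, for any $a, b \in [1, \infty]$ s.t. $\frac{1}{a} + \frac{1}{b} = 1$, by: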

$$|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]| \le \Gamma_t^{(a)}\, \delta_t^{(b)} \tag{4}$$

where $\Gamma_t^{(a)}$ and $\delta_t^{(b)}$ are the treatment and outcome sensitivity parameters (respectively), defined by:

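$$\Gamma_t^{(a)} = \left( \int_{\mathcal{U}} \left| \frac{p(u \mid t)}{p(u)} - 1 \right|^{a} p(u)\, du \right)^{1/a} \tag{5}$$
$$\delta_t^{(b)} = \left( \int_{\mathcal{U}} \big| \mathbb{E}[Y \mid U=u, T=t] - \mathbb{E}[Y \mid T=t] \big|^{b} p(u)\, du \right)^{1/b} \tag{6}$$

with the usual convention that an exponent of $\infty$ denotes the essential supremum over $u$ under $p(u)$.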

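All proofs are provided in the Supplementary Material (SM, Section A.1). Intuitively, $\Gamma_t^{(a)}$ quantifies the strength of the connection $U \to T$, and it is easy to see that $\Gamma_t^{(a)} = 0$ when $U \perp T$ (since then $p(u \mid t) = p(u)$). Similarly, $\delta_t^{(b)}$ quantifies the strength of the connection $U \to Y$, and $\delta_t^{(b)} = 0$ when $U \perp Y \mid T$ (since then $\mathbb{E}[Y \mid U=u, T=t] = \mathbb{E}[Y \mid T=t]$). Hence, under no unobserved confounding, the bound in (4) vanishes.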
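A quick numerical sanity check of Theorem 1 can be sketched for a discrete $U$ with $K$ levels (the value of $K$, the random model, and the function name are assumptions made for illustration). The sketch evaluates (5) and (6) as weighted $L^a$/$L^b$ norms under $p(u)$ and verifies the bound (4) for several conjugate pairs $(a, b)$.

```python
import numpy as np

def sensitivity_params(p_u, p_u_given_t, mu_ut, mu_t, a, b):
    """Gamma_t^(a) and delta_t^(b) from (5)-(6) for a discrete U with p(u) > 0 for every level."""
    def weighted_norm(v, q):
        # (sum_u p(u) |v(u)|^q)^(1/q), rescaled by max|v| to avoid under/overflow;
        # for q = inf this is the maximum over u (every level has positive mass).
        m = float(np.max(v))
        if np.isinf(q) or m == 0.0:
            return m
        return m * float(p_u @ (v / m) ** q) ** (1.0 / q)

    gamma = weighted_norm(np.abs(p_u_given_t / p_u - 1.0), a)   # strength of U -> T
    delta = weighted_norm(np.abs(mu_ut - mu_t), b)              # strength of U -> Y
    return gamma, delta

rng = np.random.default_rng(0)
K = 5                                                    # number of confounder levels
p_u = rng.dirichlet(np.ones(K))                          # p(u)
p_t_given_u = rng.uniform(0.05, 0.95, size=K)            # p(T=t | U=u) for one fixed t
p_u_given_t = p_t_given_u * p_u / (p_u @ p_t_given_u)    # Bayes' rule
mu_ut = rng.normal(size=K)                               # E[Y | U=u, T=t]
mu_t = p_u_given_t @ mu_ut                               # E[Y | T=t], from (3)
bias = abs(mu_t - p_u @ mu_ut)                           # |E[Y|T=t] - E[Y|do(T=t)]|

for a, b in [(1, np.inf), (2, 2), (3, 1.5), (np.inf, 1)]:
    gamma, delta = sensitivity_params(p_u, p_u_given_t, mu_ut, mu_t, a, b)
    print(f"a={a}, b={b}: bias={bias:.4f} <= bound={gamma * delta:.4f}")
```

The rescaling inside `weighted_norm` is a numerical convenience only; mathematically it returns the same weighted norm.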

2.3 Special cases

There are infinitely many bounds we could obtain from Theorem 1, parametrized by the choice of $(a, b)$; we focus on only one of them here, since it is relatively easy to interpret.

Corollary 1.

Setting $a = 1$ and $b = \infty$ in Theorem 1, we get:

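$$|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]| \le 2\, \mathrm{TV}\big(p(u), p(u \mid t)\big) \cdot \sup_{u \in \mathcal{U}} \big| \mathbb{E}[Y \mid U=u, T=t] - \mathbb{E}[Y \mid T=t] \big| \tag{7}$$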

where $\mathrm{TV}(p, q) = \frac{1}{2} \int_{\mathcal{U}} |p(u) - q(u)|\, du$ is the total variation distance.

Remark: additional assumptions are required to guarantee that the RHS of (7) is finite; it is sufficient to assume that $Y$ is bounded, which guarantees that the outcome sensitivity parameter $\delta_t^{(\infty)}$ is finite.

Corollary 1 bounds the confounding bias by the total variation distance between $p(u)$ and $p(u \mid t)$ and the largest absolute difference between the conditional expected outcome $\mathbb{E}[Y \mid U=u, T=t]$ and the average expected outcome $\mathbb{E}[Y \mid T=t]$. We argue that this is an interpretable version of Theorem 1 that is relatively easy to calibrate; we elaborate on this in Section 2.5.
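To see where the factor 2 and the two terms in (7) come from, plug $a = 1$ and $b = \infty$ into (5) and (6), using the convention $\mathrm{TV}(p, q) = \frac{1}{2} \int |p(u) - q(u)|\, du$ (the supremum is understood as an essential supremum under $p(u)$); substituting both into (4) gives (7):

```latex
\Gamma_t^{(1)} = \int_{\mathcal{U}} \left| \frac{p(u \mid t)}{p(u)} - 1 \right| p(u)\, du
             = \int_{\mathcal{U}} \big| p(u \mid t) - p(u) \big|\, du
             = 2\, \mathrm{TV}\big(p(u), p(u \mid t)\big),
\qquad
\delta_t^{(\infty)} = \sup_{u \in \mathcal{U}} \big| \mathbb{E}[Y \mid U=u, T=t] - \mathbb{E}[Y \mid T=t] \big| .
```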

Finally, we can of course write the tightest bound from the class of bounds in Theorem 1:

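$$|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]| \le \inf_{a, b \in [1, \infty]:\ \frac{1}{a} + \frac{1}{b} = 1} \Gamma_t^{(a)}\, \delta_t^{(b)} \tag{8}$$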

This is an interesting optimization problem for future work.
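One simple way to approximate (8) numerically, for a discrete $U$ with $p(u) > 0$ everywhere, is a grid search over $a$ with $b = a/(a-1)$. This is only a brute-force sketch (the grid, the random test model, and the helper names are our own choices), not a solution to the optimization problem.

```python
import numpy as np

def holder_product(p_u, p_u_given_t, mu_ut, mu_t, a):
    """Gamma_t^(a) * delta_t^(b) for a in [1, inf], with b the Hölder conjugate of a."""
    if a == 1:
        b = np.inf
    elif np.isinf(a):
        b = 1.0
    else:
        b = a / (a - 1.0)

    def weighted_norm(v, q):
        # (sum_u p(u) |v(u)|^q)^(1/q), computed stably by factoring out max|v|.
        m = float(np.max(v))
        if np.isinf(q) or m == 0.0:
            return m
        return m * float(p_u @ (v / m) ** q) ** (1.0 / q)

    gamma = weighted_norm(np.abs(p_u_given_t / p_u - 1.0), a)
    delta = weighted_norm(np.abs(mu_ut - mu_t), b)
    return gamma * delta

def tightest_holder_bound(p_u, p_u_given_t, mu_ut, mu_t, n_grid=400, a_max=50.0):
    """Grid-search approximation of the infimum in (8); returns (bound, argmin a)."""
    grid = np.concatenate(([1.0], np.geomspace(1.001, a_max, n_grid), [np.inf]))
    values = np.array([holder_product(p_u, p_u_given_t, mu_ut, mu_t, a) for a in grid])
    k = int(np.argmin(values))
    return float(values[k]), float(grid[k])

rng = np.random.default_rng(1)
K = 5
p_u = rng.dirichlet(np.ones(K))
p_t_given_u = rng.uniform(0.05, 0.95, size=K)
p_u_given_t = p_t_given_u * p_u / (p_u @ p_t_given_u)
mu_ut = rng.normal(size=K)
mu_t = p_u_given_t @ mu_ut
best, a_star = tightest_holder_bound(p_u, p_u_given_t, mu_ut, mu_t)
print(f"tightest grid bound ~ {best:.4f} at a ~ {a_star:.3f}")
```

Because the grid contains both endpoints $a = 1$ and $a = \infty$, the result is never looser than either of those two special cases.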

2.4 Treatment effect bounds

For any two treatments $t, t' \in \mathcal{T}$, we define the average treatment effect $\tau_{t,t'}$ and the ignorable treatment effect estimate $\hat{\tau}_{t,t'}$ as:

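$$\tau_{t,t'} = \mathbb{E}[Y \mid do(T=t)] - \mathbb{E}[Y \mid do(T=t')] \tag{9}$$
$$\hat{\tau}_{t,t'} = \mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid T=t'] \tag{10}$$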

Below, we use the result in Corollary 1 to bound the average treatment effect.

Corollary 2.

For any $t, t' \in \mathcal{T}$, we have:

$$\hat{\tau}_{t,t'} - h_{t,t'} \le \tau_{t,t'} \le \hat{\tau}_{t,t'} + h_{t,t'} \tag{11}$$

where the half-width $h_{t,t'}$ is defined by:

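$$h_{t,t'} = 2\, \mathrm{TV}\big(p(u), p(u \mid t)\big)\, \delta_t^{(\infty)} + 2\, \mathrm{TV}\big(p(u), p(u \mid t')\big)\, \delta_{t'}^{(\infty)} \tag{12}$$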
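Given values (or calibrated estimates) of the sensitivity parameters, the interval in (11)-(12) is straightforward to compute; the sketch below is a direct transcription, with function and argument names of our own choosing.

```python
def ate_interval(tau_hat, tv_t, delta_t, tv_tprime, delta_tprime):
    """Interval (11) around the ignorable estimate tau_hat, with half-width (12).

    tv_t, tv_tprime:       TV(p(u), p(u|t)) and TV(p(u), p(u|t'))
    delta_t, delta_tprime: outcome sensitivity parameters delta_t^(inf), delta_t'^(inf)
    """
    half_width = 2.0 * tv_t * delta_t + 2.0 * tv_tprime * delta_tprime
    return tau_hat - half_width, tau_hat + half_width

low, high = ate_interval(tau_hat=1.0, tv_t=0.1, delta_t=0.5, tv_tprime=0.2, delta_tprime=0.25)
print(low, high, "excludes 0:", low > 0 or high < 0)
```

In this toy call the half-width is 0.2, so the interval is roughly [0.8, 1.2] and excludes 0; with both TV terms equal to 0 it collapses to the point estimate.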

2.5 Calibration strategies

As in any sensitivity analysis, we need to either (a) justifiably set or (b) calibrate (from observed data) the values of the sensitivity parameters: in our case, we need a strategy to calibrate the treatment sensitivity parameter $\mathrm{TV}(p(u), p(u \mid t))$ as well as the outcome sensitivity parameter $\delta_t^{(\infty)}$.

2.5.1 Calibration for ATEs

Outcome sensitivity parameter

To set the outcome sensitivity parameter, with the additional assumption that $\mathbb{E}[Y \mid T=t] \neq 0$, we can rewrite Corollary 1 as:

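$$\frac{|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]|}{|\mathbb{E}[Y \mid T=t]|} \le 2\, \mathrm{TV}\big(p(u), p(u \mid t)\big) \cdot \sup_{u \in \mathcal{U}} \frac{|\mathbb{E}[Y \mid U=u, T=t] - \mathbb{E}[Y \mid T=t]|}{|\mathbb{E}[Y \mid T=t]|} \tag{13}$$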

Here, the outcome sensitivity parameter $\sup_{u} |\mathbb{E}[Y \mid U=u, T=t] - \mathbb{E}[Y \mid T=t]| \,/\, |\mathbb{E}[Y \mid T=t]|$ is the maximum relative (percent) difference between the expected outcome for an individual/unit and the overall expected outcome; we argue that this can be set by a subject-matter expert. The LHS also has a natural interpretation as the relative (percent) deviation of the observational expectation from the interventional expectation. Alternatively, we can compute the analogous quantity with the observed covariates $X$ in place of $U$, and make the calibration assumption that

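$$\delta_t^{(\infty)} \approx \sup_{x \in \mathcal{X}} \big| \mathbb{E}[Y \mid X=x, T=t] - \mathbb{E}[Y \mid T=t] \big| \tag{14}$$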

This calibration assumption is untestable, but it is in the same vein as assumptions made in Franks et al. (2020); Zheng et al. (2021); Cinelli and Hazlett (2020).
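A plug-in version of the right-hand side of (14) can be sketched as follows, assuming an already-fitted outcome model whose predictions of $\mathbb{E}[Y \mid X=x_i, T=t]$ are passed in as an array. The fitting step, the function name, and the replacement of the supremum over $x$ by a maximum over the observed $x_i$ are our assumptions. Setting `relative=True` gives the percent-scale parameter of (13) instead.

```python
import numpy as np

def calibrated_outcome_sensitivity(mu_t_pred, y, t_obs, t, relative=False):
    """Plug-in estimate of sup_x |E[Y|X=x,T=t] - E[Y|T=t]| from (14).

    mu_t_pred: outcome-model predictions of E[Y | X=x_i, T=t] for all N units.
    E[Y | T=t] is estimated by the sample mean of y over units with t_i == t,
    and the supremum over x is replaced by a maximum over the observed x_i.
    """
    mu_t_pred = np.asarray(mu_t_pred, dtype=float)
    y, t_obs = np.asarray(y, dtype=float), np.asarray(t_obs)
    mean_t = y[t_obs == t].mean()
    delta = float(np.max(np.abs(mu_t_pred - mean_t)))
    return delta / abs(mean_t) if relative else delta

# Toy inputs (hypothetical predictions, not from any dataset in this paper).
delta_1 = calibrated_outcome_sensitivity([1.0, 2.0, 3.0, 4.0], y=[1.0, 2.0, 3.0, 10.0],
                                         t_obs=[1, 1, 1, 0], t=1)
print(delta_1)
```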

Treatment sensitivity parameter

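We can make another calibration assumption: that $\mathrm{TV}(p(u), p(u \mid t)) \approx \mathrm{TV}(p(x), p(x \mid t))$. The latter can be approximated from samples as:

$$\mathrm{TV}\big(p(x), p(x \mid t)\big) \approx \frac{1}{2N} \sum_{i=1}^{N} \left| \frac{p(t \mid x_i)}{p(t)} - 1 \right| \tag{15}$$

where $p(t \mid x)$ can be estimated using a propensity model (e.g., logistic regression), and $p(t) \approx \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[t_i = t]$. A derivation of (15) is provided in the SM (Section A.2).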
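Equation (15) translates directly into code once propensity predictions are available. The sketch below assumes a fitted propensity model whose predictions of $p(T=t \mid x_i)$ are passed in as an array (how they are produced, e.g. by logistic regression, is left open, and the function name is hypothetical); $p(t)$ is estimated by the empirical treatment frequency.

```python
import numpy as np

def tv_from_propensity(prop_t, t_obs, t):
    """Monte Carlo estimate (15) of TV(p(x), p(x|t)) = 1/2 E_{x~p(x)} |p(t|x)/p(t) - 1|.

    prop_t: predicted p(T=t | x_i) for every unit i (the x_i act as draws from p(x)).
    t_obs:  observed treatments, used to estimate p(t) by its empirical frequency.
    """
    prop_t = np.asarray(prop_t, dtype=float)
    p_t = np.mean(np.asarray(t_obs) == t)
    return 0.5 * float(np.mean(np.abs(prop_t / p_t - 1.0)))

# Toy check with a binary covariate: p(x=1) = 0.5 and p(T=1|x) = 0.2 or 0.8, so p(T=1) = 0.5,
# p(x=1|T=1) = 0.8, and the exact TV(p(x), p(x|T=1)) is |0.5 - 0.8| = 0.3.
x = np.repeat([0, 1], 50)
prop_1 = np.where(x == 1, 0.8, 0.2)
t_obs = np.concatenate([np.zeros(40), np.ones(10), np.zeros(10), np.ones(40)])
print(tv_from_propensity(prop_1, t_obs, t=1))
```

Using the same sample both to average over $p(x)$ and to estimate $p(t)$ keeps the estimator simple; in practice the propensity predictions would come from a model fitted to the same $x_i$.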

2.5.2 Calibration for CATEs

For convenience, we rewrite the bound in (7) conditioned on observed covariates $X = x$:

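$$|\mathbb{E}[Y \mid T=t, X=x] - \mathbb{E}[Y \mid do(T=t), X=x]| \le 2\, \mathrm{TV}_t(x)\, \delta_t(x) \tag{16}$$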

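where

$$\mathrm{TV}_t(x) = \mathrm{TV}\big(p(u \mid x), p(u \mid t, x)\big) \tag{17}$$
$$\delta_t(x) = \sup_{u \in \mathcal{U}} \big| \mathbb{E}[Y \mid U=u, T=t, X=x] - \mathbb{E}[Y \mid T=t, X=x] \big| \tag{18}$$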

The calibration strategies discussed in Section 2.5.1 work for the average treatment effect, but more careful treatment is required to calibrate bounds for the CATE at a specific covariate value $x$. For this purpose, we borrow ideas from Zheng et al. (2021) and Cinelli and Hazlett (2020).

Outcome sensitivity parameter

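To calibrate the outcome sensitivity parameter, we can “hide” the $j$-th observed confounder dimension $X_j$ (writing $X_{-j}$ for the remaining dimensions). We can compute the maximum absolute difference between the “complete” expectation $\mathbb{E}[Y \mid T=t, X=x]$ and the “incomplete” expectation $\mathbb{E}[Y \mid T=t, X_{-j}=x_{-j}]$ (akin to equation (18)):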

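$$\delta_t^{(j)}(x) = \sup_{x_j'} \big| \mathbb{E}[Y \mid T=t, X_j=x_j', X_{-j}=x_{-j}] - \mathbb{E}[Y \mid T=t, X_{-j}=x_{-j}] \big| \tag{19}$$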

where $\mathbb{E}[Y \mid T=t, X=x]$ and $\mathbb{E}[Y \mid T=t, X_{-j}=x_{-j}]$ can be estimated via regression. Finally, we can make the calibration assumption:

$$\delta_t(x) \approx \max_{j} \delta_t^{(j)}(x) \tag{20}$$

where the max is taken over all dimensions $j$ of the observed covariates.
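The hiding procedure in (19)-(20) can be sketched as below. It assumes two kinds of fitted regression models, passed in as plain Python callables: one for $\mathbb{E}[Y \mid T=t, X=x]$ on all covariates and, for each $j$, one for $\mathbb{E}[Y \mid T=t, X_{-j}=x_{-j}]$; these callables and the function names are hypothetical. Following the practical suggestion later in this section, the supremum over $x_j$ is replaced by a maximum over candidate values such as the unique values of $x_j$ in the data.

```python
import numpy as np

def hidden_dim_delta(x, j, mu_full, mu_minus_j, xj_values):
    """delta_t^{(j)}(x) from (19): max over x_j' of |E[Y|t, x_j', x_{-j}] - E[Y|t, x_{-j}]|."""
    x = np.asarray(x, dtype=float)
    baseline = mu_minus_j(np.delete(x, j))      # "incomplete" expectation, x_j hidden
    deviations = []
    for v in xj_values:
        x_mod = x.copy()
        x_mod[j] = v                            # vary x_j while keeping x_{-j} fixed
        deviations.append(abs(mu_full(x_mod) - baseline))
    return max(deviations)

def calibrated_delta(x, mu_full, mu_minus, xj_values):
    """Calibration assumption (20): max over covariate dimensions j of delta_t^{(j)}(x).

    mu_minus[j] and xj_values[j] are the reduced model and candidate values for dimension j.
    """
    return max(hidden_dim_delta(x, j, mu_full, mu_minus[j], xj_values[j])
               for j in range(len(x)))

# Toy example: two independent Bernoulli(0.5) covariates with E[Y|t,x] = 2*x0 + x1, so that
# E[Y|t,x1] = 1 + x1 (x0 hidden) and E[Y|t,x0] = 2*x0 + 0.5 (x1 hidden).
mu_full = lambda x: 2.0 * x[0] + x[1]
mu_minus = [lambda x_rest: 1.0 + x_rest[0], lambda x_rest: 2.0 * x_rest[0] + 0.5]
print(calibrated_delta([1.0, 0.0], mu_full, mu_minus, xj_values=[[0.0, 1.0], [0.0, 1.0]]))
```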

Treatment sensitivity parameter

For the treatment sensitivity parameter, we can approximate $\mathrm{TV}\big(p(x_j \mid x_{-j}), p(x_j \mid t, x_{-j})\big)$ (akin to eq. (17)) via:

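$$\mathrm{TV}\big(p(x_j \mid x_{-j}), p(x_j \mid t, x_{-j})\big) \approx \frac{1}{2M} \sum_{m=1}^{M} \left| \frac{p(t \mid x_j^{(m)}, x_{-j})}{p(t \mid x_{-j})} - 1 \right|, \qquad x_j^{(m)} \sim p(x_j \mid x_{-j}) \tag{21}$$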

where $x_j$ represents a “hidden” covariate dimension and $p(t \mid x)$, $p(t \mid x_{-j})$ are estimated via logistic regression. A derivation of the above approximation is provided in the SM (Section A.2). Finally, we make the following calibration assumption:

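$$\mathrm{TV}_t(x) \approx \max_{j} \mathrm{TV}\big(p(x_j \mid x_{-j}), p(x_j \mid t, x_{-j})\big) \tag{22}$$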
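Similarly, (21)-(22) can be sketched with two kinds of propensity models, again passed in as hypothetical callables: $\hat{p}(t \mid x)$ on all covariates and, for each $j$, $\hat{p}(t \mid x_{-j})$. The Monte Carlo average in (21) needs draws $x_j^{(m)} \sim p(x_j \mid x_{-j})$; how to obtain them (e.g., from a conditional density model) is an assumption left to the user.

```python
import numpy as np

def hidden_dim_tv(x, j, prop_full, prop_minus_j, xj_draws):
    """MC approximation (21) of TV(p(x_j | x_{-j}), p(x_j | t, x_{-j})).

    Uses TV = 1/2 E_{x_j ~ p(x_j|x_{-j})} |p(t | x_j, x_{-j}) / p(t | x_{-j}) - 1|.
    """
    x = np.asarray(x, dtype=float)
    denom = prop_minus_j(np.delete(x, j))       # p(t | x_{-j})
    terms = []
    for v in xj_draws:
        x_mod = x.copy()
        x_mod[j] = v                            # plug in a draw of the hidden dimension
        terms.append(abs(prop_full(x_mod) / denom - 1.0))
    return 0.5 * float(np.mean(terms))

def calibrated_tv(x, prop_full, prop_minus, xj_draws):
    """Calibration assumption (22): max over covariate dimensions j."""
    return max(hidden_dim_tv(x, j, prop_full, prop_minus[j], xj_draws[j])
               for j in range(len(x)))

# Toy example: independent Bernoulli(0.5) covariates, p(T=1|x) = 0.8 if x0 = 1 else 0.2 (x1 unused),
# so p(T=1|x1) = 0.5 and p(T=1|x0) = p(T=1|x); draws [0, 1] stand in for x_j ~ Bernoulli(0.5).
prop_full = lambda x: 0.8 if x[0] == 1 else 0.2
prop_minus = [lambda x_rest: 0.5, lambda x_rest: 0.8 if x_rest[0] == 1 else 0.2]
print(calibrated_tv([1.0, 0.0], prop_full, prop_minus, xj_draws=[[0.0, 1.0], [0.0, 1.0]]))
```

In this toy case hiding $x_0$ gives a TV of 0.3 (matching the exact value $|0.5 - 0.8|$), while hiding the irrelevant $x_1$ gives 0.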

There are practical concerns with the above proposals for CATE interval calibration:

  1. Taking a max over all covariate dimensions (as in (20) and (22)) is computationally costly, particularly for high-dimensional covariates, as it requires training separate regression/propensity models for each $j$.

  2. Computing $\delta_t^{(j)}(x)$ for a single $x$ requires maximization over $x_j$ while fixing $x_{-j}$ in the expectation $\mathbb{E}[Y \mid T=t, X=x]$ (approximated via a regression model over all observed covariates). In practice, we can find all the unique values of $x_j$ in the dataset, then take the max over those unique values.

3 Related Work

There is an extensive body of literature on sensitivity analysis to the ignorability assumption (Robins et al., 2000; McCandless et al., 2007; VanderWeele and Arah, 2011; Lee, 2011). Most proposals, similar to ours, assume a “strength” of $U \to T$ and $U \to Y$ (under different definitions of “strength”) and examine the deviation of a causal estimand of interest from a “naive” estimate (i.e., one that assumes ignorability) based on the assumed strength parameters. Ding and Vanderweele (2016) provide a lower bound on the true risk ratio based on two ratio-scale sensitivity parameters (one treatment sensitivity parameter and one outcome sensitivity parameter); they also provide a lower bound on the risk difference based on these same parameters. Franks et al. (2020) propose a framework for flexible modeling of the observed outcomes and relate the observed and unobserved potential outcome distributions via Tukey’s factorization. Zheng et al. (2021) use a copula parametrization to relate the interventional distribution to the observational distribution and show that, under some assumptions about the data-generating process and in the multi-cause setting, the treatment sensitivity parameter can be identified up to a “causal equivalence class”. Kallus et al. (2019) and Jesson et al. (2021) use an odds ratio between the complete propensity $p(t \mid x, u)$ and the nominal propensity $p(t \mid x)$ to quantify the strength of unobserved confounding, and make assumptions about its magnitude to bound treatment effect estimates. Closely related to our work are the confounding bias formulas of VanderWeele and Arah (2011), who provide formulas for the difference between “naive” effect estimates and true estimates. While this bias is an exact difference (not a bound), it is difficult to calibrate against observed data, since one has to make assumptions about the distribution of $U$. In contrast, this work bounds the bias (rather than giving an exact formula) by the product of only two scalars, each of which can be calibrated against observed data.

4 Experiments

4.1 Binary/Categorical

Let $U$, $T$, and $Y$ be binary. We perform the following experiment:

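  • Draw 30,000 joint distributions $p(u, t, y)$ from a Dirichlet distribution.

  • For each drawn $p(u, t, y)$ and for all $t \in \{0, 1\}$, compute the bias $|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]|$, the outcome sensitivity parameter $\delta_t^{(\infty)}$, and the treatment sensitivity parameter $\mathrm{TV}(p(u), p(u \mid t))$.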

Figure 2 shows the bound from Corollary 1 and the confounding bias for all sampled distributions $p(u, t, y)$. For every bias value, we can find a joint distribution for which the bound is close to the true bias (in the binary case); we will explore bound tightness more thoroughly in future work.

Figure 3 is a contour plot of the confounding bias vs. the treatment and outcome sensitivity parameters; it shows that the confounding bias tends to increase with both sensitivity parameters, suggesting that both are important in bounding the bias. We perform the same experiment for categorical $U$; the results are shown in the SM (Section A.3).

Figure 2: Confounding bias and Hölder bound vs. the index of the sampled distribution $p(u, t, y)$ (sorted by the confounding bias value). WLOG, we plot the bias for a single treatment value $t$.
Figure 3: Confounding bias vs. the treatment sensitivity parameter $\mathrm{TV}(p(u), p(u \mid t))$ and the outcome sensitivity parameter $\delta_t^{(\infty)}$. WLOG, we plot the bias for a single treatment value $t$.
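The binary experiment can be reproduced in outline with the sketch below. It assumes a flat Dirichlet with all eight concentration parameters equal to 1 over the cells of $p(u, t, y)$ (the concentration actually used is not stated here) and reports, for $t = 1$, the bias and the Corollary 1 bound for each draw; the random seed and the function name are arbitrary choices.

```python
import numpy as np

def bias_and_bound(p, t):
    """Confounding bias and Corollary 1 bound for a joint table p[u, t, y] with binary Y."""
    p_ut = p.sum(axis=2)                          # p(u, t)
    p_u = p_ut.sum(axis=1)                        # p(u)
    p_u_given_t = p_ut[:, t] / p_ut[:, t].sum()   # p(u | t)
    mu_ut = p[:, t, 1] / p_ut[:, t]               # E[Y | U=u, T=t]
    mu_t = p_u_given_t @ mu_ut                    # E[Y | T=t]
    bias = abs(mu_t - p_u @ mu_ut)                # |E[Y|T=t] - E[Y|do(T=t)]|
    tv = 0.5 * np.abs(p_u_given_t - p_u).sum()    # TV(p(u), p(u|t))
    delta = np.max(np.abs(mu_ut - mu_t))          # max_u |E[Y|u,t] - E[Y|t]|
    return bias, 2.0 * tv * delta, tv, delta

rng = np.random.default_rng(0)
draws = rng.dirichlet(np.ones(8), size=30_000).reshape(-1, 2, 2, 2)   # p[u, t, y] per draw
results = np.array([bias_and_bound(p, t=1) for p in draws])
bias, bound = results[:, 0], results[:, 1]
order = np.argsort(bias)                          # sort draws by bias, as in Figure 2
print("bound holds for every draw:", bool(np.all(bias <= bound + 1e-12)))
```

Plotting `bias[order]` and `bound[order]` against the draw index gives a figure of the same kind as Figure 2, and a scatter or contour plot of `results[:, 2]`, `results[:, 3]` colored by `bias` gives one like Figure 3.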

4.2 IHDP dataset

We perform experiments on the Infant Health and Development Program (IHDP) dataset (Hill, 2011), which is semi-simulated (i.e., measured covariates but synthetic outcomes) and measures the effect of trained provider visits on children’s test scores. There are 100 datasets within IHDP², each with an index $k \in \{1, \dots, 100\}$. Similar to Jesson et al. (2021), we induce hidden confounding by hiding one of the covariates (specifically, $x_9$).

² Downloaded from https://www.fredjo.com.

4.2.1 ATE interval estimation

For ATE estimation on the IHDP dataset, we first compute the naïve/ignorable ATE estimate:

$$\hat{\tau} = \hat{\mathbb{E}}[Y \mid T=1] - \hat{\mathbb{E}}[Y \mid T=0] \tag{23}$$

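where $\hat{\mathbb{E}}[Y \mid T=t] = \frac{1}{N_t} \sum_{i:\, t_i = t} y_i$ and $N_t = \sum_{i=1}^{N} \mathbb{1}[t_i = t]$. Next, we compute the calibrated treatment and outcome sensitivity parameters via: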

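$$\widehat{\mathrm{TV}}_t = \frac{1}{2N} \sum_{i=1}^{N} \left| \frac{\hat{p}(t \mid x_i)}{\hat{p}(t)} - 1 \right| \tag{24}$$
$$\hat{\delta}_t = \max_{i \in \{1, \dots, N\}} \left| \hat{\mu}_t(x_i) - \hat{\mathbb{E}}[Y \mid T=t] \right| \tag{25}$$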

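where $\hat{p}(t)$ is the empirical frequency of treatment $t$, $\hat{\mathbb{E}}[Y \mid T=t]$ is the sample mean outcome among units with $t_i = t$, $\hat{p}(t \mid x)$ is estimated via logistic regression, and $\hat{\mu}_t(x) \approx \mathbb{E}[Y \mid X=x, T=t]$ is estimated using a TARNet (Shalit et al., 2017) regression model. For details on hyperparameter settings, see the SM (Section A.4). Finally, from Corollary 2, we compute the calibrated interval as: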

$$\hat{\mathcal{I}} = \big[\hat{\tau} - \hat{h},\ \hat{\tau} + \hat{h}\big] \tag{26}$$

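where $\hat{h} = 2\, \widehat{\mathrm{TV}}_1\, \hat{\delta}_1 + 2\, \widehat{\mathrm{TV}}_0\, \hat{\delta}_0$.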

We can generalize the interval in (26) to scalar multiples $\lambda \ge 0$ of the calibrated half-width $\hat{h}$, as:

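$$\hat{\mathcal{I}}(\lambda) = \big[\hat{\tau} - \lambda \hat{h},\ \hat{\tau} + \lambda \hat{h}\big] \tag{27}$$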

This interval becomes $\hat{\mathcal{I}}$ when $\lambda = 1$ and degenerates to the point estimate $\hat{\tau}$ when $\lambda = 0$.

For the IHDP dataset, we compute:

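  • the ATE inclusion rate (IR), i.e., the percentage of datasets (out of 100 repetitions) for which the computed interval includes the true ATE. Formally, this is:

    $$\mathrm{IR}(\lambda) = \frac{1}{100} \sum_{k=1}^{100} \mathbb{1}\big[\tau^{(k)} \in \hat{\mathcal{I}}^{(k)}(\lambda)\big] \tag{28}$$

    where $\mathbb{1}[\cdot]$ is an indicator function, $\tau^{(k)}$ is the true ATE for the $k$-th dataset, and $\hat{\mathcal{I}}^{(k)}(\lambda)$ is the ATE interval for the $k$-th dataset.

  • the ATE interval zero-crossing rate (ZCR), i.e., the percentage of datasets for which the computed interval contains 0:

    $$\mathrm{ZCR}(\lambda) = \frac{1}{100} \sum_{k=1}^{100} \mathbb{1}\big[0 \in \hat{\mathcal{I}}^{(k)}(\lambda)\big] \tag{29}$$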
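Both metrics are simple averages of indicators. A minimal sketch, assuming per-dataset arrays of true ATEs, ignorable estimates, and calibrated half-widths (the function name and the toy numbers are ours, not IHDP results), is shown below; sweeping $\lambda$ traces the trade-off curve. The same function applies to the CATE metrics (34)-(35) if the arrays are indexed by sample instead of by dataset.

```python
import numpy as np

def inclusion_and_zero_crossing(tau_true, tau_hat, half_width, lam):
    """Inclusion rate (28) and zero-crossing rate (29) for the lambda-scaled intervals (27).

    Each array has one entry per dataset (e.g., per IHDP repetition).
    """
    tau_true, tau_hat, h = (np.asarray(a, dtype=float) for a in (tau_true, tau_hat, half_width))
    low, high = tau_hat - lam * h, tau_hat + lam * h
    inclusion = np.mean((low <= tau_true) & (tau_true <= high))
    zero_crossing = np.mean((low <= 0.0) & (0.0 <= high))
    return float(inclusion), float(zero_crossing)

# Toy arrays for three hypothetical datasets.
tau_true = [1.0, 2.0, 0.5]
tau_hat = [1.2, 1.5, 0.4]
half_width = [0.1, 1.0, 0.5]
for lam in np.linspace(0.0, 2.0, 5):
    ir, zcr = inclusion_and_zero_crossing(tau_true, tau_hat, half_width, lam)
    print(f"lambda={lam:.1f}: IR={ir:.2f}, ZCR={zcr:.2f}")
```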

A useful estimated ATE interval should do two things: (i) include the true ATE and (ii) exclude 0. Desideratum (ii) matters because it lets us recommend which treatment is better on average, even under unobserved confounding. Scaling the interval (by the scalar $\lambda$) trades off the “correctness” of the interval (measured by IR) against its “usefulness” (measured by ZCR). We plot IR vs. ZCR for different $\lambda$ values in Figure 4; the red point marks $\lambda = 1$, i.e., our proposed calibrated ATE interval, and shows the ATE inclusion rate and zero-crossing rate it achieves over the 100 repetitions of IHDP (with the 9th covariate hidden).

Figure 4: ATE IR vs. ATE ZCR. The points highlight the cases $\lambda = 0$ (ignorable point estimate $\hat{\tau}$), $\lambda = 1$ (proposed interval $\hat{\mathcal{I}}$), and a conservative interval with $\lambda > 1$. The shaded area is the standard error (over 100 datasets).

4.2.2 CATE interval estimation

We perform a similar experiment for CATE estimation on IHDP, this time focusing only on the first dataset ($k = 1$):

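  • First, we train an ignorable model (specifically, a TARNet) to predict the expected outcomes $\hat{\mu}_1(x) \approx \mathbb{E}[Y \mid X=x, T=1]$ and $\hat{\mu}_0(x) \approx \mathbb{E}[Y \mid X=x, T=0]$; we define the naïve CATE estimate as $\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$. We also train a logistic propensity model $\hat{p}(t \mid x)$ to approximate $p(t \mid x)$.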

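  • Next, we compute calibrated sensitivity parameters following (19)-(22):

    $$\widehat{\mathrm{TV}}_t(x) = \max_{j} \widehat{\mathrm{TV}}\big(p(x_j \mid x_{-j}), p(x_j \mid t, x_{-j})\big) \tag{30}$$
    $$\hat{\delta}_t(x) = \max_{j} \hat{\delta}_t^{(j)}(x) \tag{31}$$

    where hats denote the plug-in estimates of (21) and (19), respectively.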
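  • Finally, we compute the calibrated intervals with half-width multiplier $\lambda$:

    $$\hat{\mathcal{I}}(x; \lambda) = \big[\hat{\tau}(x) - \lambda \hat{h}(x),\ \hat{\tau}(x) + \lambda \hat{h}(x)\big] \tag{32}$$

    where the half-width is:

    $$\hat{h}(x) = 2\, \widehat{\mathrm{TV}}_1(x)\, \hat{\delta}_1(x) + 2\, \widehat{\mathrm{TV}}_0(x)\, \hat{\delta}_0(x) \tag{33}$$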

Similar to the previous section, we compute the following metrics:

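  • The CATE inclusion rate, i.e., the percentage of samples for which the computed interval includes the true CATE $\tau(x_i)$:

    $$\mathrm{IR}(\lambda) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\big[\tau(x_i) \in \hat{\mathcal{I}}(x_i; \lambda)\big] \tag{34}$$
  • The CATE interval zero-crossing rate, i.e., the percentage of samples for which the computed interval crosses 0:

    $$\mathrm{ZCR}(\lambda) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\big[0 \in \hat{\mathcal{I}}(x_i; \lambda)\big] \tag{35}$$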

Figure 5 shows our results for CATE estimation on the first repetition of IHDP; the red point marks $\lambda = 1$, our proposed calibrated interval, and shows the CATE inclusion rate and zero-crossing rate it achieves.

Figure 5: CATE IR vs. CATE ZCR. The points highlight the cases $\lambda = 0$ (ignorable point estimate $\hat{\tau}(x)$), $\lambda = 1$ (proposed interval $\hat{\mathcal{I}}(x; 1)$), and a conservative interval with $\lambda > 1$. The shaded area is the standard error (over all samples).

5 Conclusions

We have developed a bound on the confounding bias based on Hölder’s inequality, and used it to compute bounds on both average and conditional average treatment effects. We discussed possibilities to calibrate the sensitivity parameters in the bound, enabling practical sensitivity analysis. Finally, we performed experiments on synthetic and semi-synthetic data, showcasing empirical properties of our bound and how it can be used in practice.

This work leaves several gaps and open research directions, which we aim to explore in future work:

  1. What conditions on $p(u, x, t, y)$ make our calibration assumptions (im)plausible? Empirically, we could check this by adding observed covariates to the experiment in Section 4.1, but the question also warrants a theoretical analysis. Also, the IHDP dataset is not an ideal “stress test” for our calibration assumptions, since we artificially induce hidden confounding by hiding $x_9$.

  2. Can we find more computationally efficient calibration strategies for CATEs (in particular, ones that do not require fitting many outcome models for different covariate dimensions $j$)?

  3. Can we modify the bounds to make them work with a linear-Gaussian model, as in Zheng et al. (2021)? In their current form, the bounds we provide are vacuous (infinite) for the linear-Gaussian model.

  4. How might we characterize the bounds’ tightness? How do the bounds in this work quantitatively compare to other treatment effect bounds in the literature?

References

  • I. D. Bross (1966) Spurious effects from an extraneous variable. Journal of Chronic Diseases 19 (6), pp. 637–647.
  • C. Cinelli and C. Hazlett (2020) Making sense of sensitivity: extending omitted variable bias. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (1), pp. 39–67. https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/rssb.12348
  • J. Cornfield, W. Haenszel, E. C. Hammond, A. M. Lilienfeld, M. B. Shimkin, and E. L. Wynder (1959) Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22 (1), pp. 173–203.
  • A. D’Amour (2019) On multi-cause approaches to causal inference with unobserved counfounding: two cautionary failure cases and a promising alternative. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, K. Chaudhuri and M. Sugiyama (Eds.), Proceedings of Machine Learning Research, Vol. 89, pp. 3478–3486.
  • P. Ding and T. Vanderweele (2016) Sensitivity analysis without assumptions. Epidemiology 27 (3), pp. 368–377.
  • A. M. Franks, A. D’Amour, and A. Feller (2020) Flexible sensitivity analysis for observational studies without observable implications. Journal of the American Statistical Association 115 (532), pp. 1730–1746. https://doi.org/10.1080/01621459.2019.1604369
  • J. L. Hill (2011) Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20 (1), pp. 217–240.
  • G. W. Imbens and J. M. Wooldridge (2009) Recent developments in the econometrics of program evaluation. Journal of Economic Literature 47 (1), pp. 5–86.
  • P. B. Jensen, L. J. Jensen, and S. Brunak (2012) Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics 13 (6), pp. 395–405.
  • A. Jesson, S. Mindermann, Y. Gal, and U. Shalit (2021) Quantifying ignorance in individual-level causal-effect estimates under hidden confounding. arXiv:2103.04850.
  • K. W. Johnson, B. S. Glicksberg, R. Hodos, K. Shameer, and J. T. Dudley (2018) Causal inference on electronic health records to assess blood pressure treatment targets: an application of the parametric g formula. In PSB, pp. 180–191.
  • N. Kallus, X. Mao, and A. Zhou (2019) Interval estimation of individual-level causal effects under unobserved confounding. K. Chaudhuri and M. Sugiyama (Eds.), Proceedings of Machine Learning Research, Vol. 89, pp. 2281–2290.
  • W. Lee (2011) Bounding the bias of unmeasured factors with confounding and effect-modifying potentials. Statistics in Medicine 30 (9), pp. 1007–1017.
  • L. C. McCandless, P. Gustafson, and A. Levy (2007) Bayesian sensitivity analysis for unmeasured confounding in observational studies. Statistics in Medicine 26 (11), pp. 2331–2347.
  • J. Pearl (2009) Causality: models, reasoning and inference. 2nd edition, Cambridge University Press, USA.
  • J. M. Robins, A. Rotnitzky, and D. O. Scharfstein (2000) Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical models in epidemiology, the environment, and clinical trials, pp. 1–94.
  • P. R. Rosenbaum and D. B. Rubin (1983) Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society: Series B (Methodological) 45 (2), pp. 212–218.
  • P. R. Rosenbaum (2010) Design sensitivity and efficiency in observational studies. Journal of the American Statistical Association 105 (490), pp. 692–702.
  • J. J. Schlesselman (1978) Assessing effects of confounding variables. American Journal of Epidemiology 108 (1), pp. 3–8.
  • U. Shalit, F. D. Johansson, and D. Sontag (2017) Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, pp. 3076–3085.
  • T. J. VanderWeele and O. A. Arah (2011) Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology 22 (1), pp. 42–52.
  • J. Zheng, A. D’Amour, and A. Franks (2021) Copula-based sensitivity analysis for multi-treatment causal inference with unobserved confounding. arXiv:2102.09412.

Appendix A Supplementary Material

A.1 Proofs

Theorem 1 (restated).

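Assuming $\mathbb{E}[|Y| \mid T=t] < \infty$ and $\mathbb{E}[|Y| \mid do(T=t)] < \infty$, the confounding bias is bounded, for any $a, b \in [1, \infty]$ s.t. $\frac{1}{a} + \frac{1}{b} = 1$, by:

$$|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]| \le \Gamma_t^{(a)}\, \delta_t^{(b)} \tag{36}$$

where $\Gamma_t^{(a)}$ and $\delta_t^{(b)}$ are the treatment and outcome sensitivity parameters (respectively), defined by:

$$\Gamma_t^{(a)} = \left( \int_{\mathcal{U}} \left| \frac{p(u \mid t)}{p(u)} - 1 \right|^{a} p(u)\, du \right)^{1/a} \tag{37}$$
$$\delta_t^{(b)} = \left( \int_{\mathcal{U}} \big| \mathbb{E}[Y \mid U=u, T=t] - \mathbb{E}[Y \mid T=t] \big|^{b} p(u)\, du \right)^{1/b} \tag{38}$$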
Proof.

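We can write the (signed) confounding bias, using (2) and (3), as:

$$\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)] = \int_{\mathcal{Y}} y\, \big( p(y \mid t) - p(y \mid do(t)) \big)\, dy \tag{39}$$
$$= \int_{\mathcal{Y}} \int_{\mathcal{U}} y\, p(y \mid t, u)\, \big( p(u \mid t) - p(u) \big)\, du\, dy \tag{40}$$

Noting that $\int_{\mathcal{U}} \big( p(u \mid t) - p(u) \big)\, du = 0$, we have:

$$\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)] \overset{(a)}{=} \int_{\mathcal{U}} \mathbb{E}[Y \mid U=u, T=t]\, \big( p(u \mid t) - p(u) \big)\, du \tag{41}$$
$$= \int_{\mathcal{U}} \big( \mathbb{E}[Y \mid U=u, T=t] - \mathbb{E}[Y \mid T=t] \big) \big( p(u \mid t) - p(u) \big)\, du \tag{42}$$
$$= \int_{\mathcal{U}} \big( \mathbb{E}[Y \mid U=u, T=t] - \mathbb{E}[Y \mid T=t] \big) \left( \frac{p(u \mid t)}{p(u)} - 1 \right) p(u)\, du \tag{43}$$
$$= \mathbb{E}_{U \sim p(u)} \left[ \big( \mathbb{E}[Y \mid U, T=t] - \mathbb{E}[Y \mid T=t] \big) \left( \frac{p(U \mid t)}{p(U)} - 1 \right) \right] \tag{44}$$

where (a) holds by Fubini’s theorem (since $\int_{\mathcal{Y}} \int_{\mathcal{U}} |y|\, p(y \mid t, u)\, |p(u \mid t) - p(u)|\, du\, dy \le \mathbb{E}[|Y| \mid T=t] + \mathbb{E}[|Y| \mid do(T=t)]$ is finite by assumption). By Hölder’s inequality, we have:

$$|\mathbb{E}[Y \mid T=t] - \mathbb{E}[Y \mid do(T=t)]| \le \left( \mathbb{E}_{U \sim p(u)} \left| \frac{p(U \mid t)}{p(U)} - 1 \right|^{a} \right)^{1/a} \left( \mathbb{E}_{U \sim p(u)} \big| \mathbb{E}[Y \mid U, T=t] - \mathbb{E}[Y \mid T=t] \big|^{b} \right)^{1/b} = \Gamma_t^{(a)}\, \delta_t^{(b)} \tag{45}$$

for any $a, b \in [1, \infty]$ s.t. $\frac{1}{a} + \frac{1}{b} = 1$. ∎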

Corollary 2 (restated).

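For any $t, t' \in \mathcal{T}$, we have:

$$\hat{\tau}_{t,t'} - h_{t,t'} \le \tau_{t,t'} \le \hat{\tau}_{t,t'} + h_{t,t'} \tag{46}$$

where the half-width $h_{t,t'}$ is defined by:

$$h_{t,t'} = 2\, \mathrm{TV}\big(p(u), p(u \mid t)\big)\, \delta_t^{(\infty)} + 2\, \mathrm{TV}\big(p(u), p(u \mid t')\big)\, \delta_{t'}^{(\infty)} \tag{47}$$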
Proof.
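$$\tau_{t,t'} = \mathbb{E}[Y \mid do(T=t)] - \mathbb{E}[Y \mid do(T=t')] \tag{48}$$
$$= \hat{\tau}_{t,t'} + \big( \mathbb{E}[Y \mid do(T=t)] - \mathbb{E}[Y \mid T=t] \big) - \big( \mathbb{E}[Y \mid do(T=t')] - \mathbb{E}[Y \mid T=t'] \big) \tag{49}$$
$$\le \hat{\tau}_{t,t'} + \big| \mathbb{E}[Y \mid do(T=t)] - \mathbb{E}[Y \mid T=t] \big| + \big| \mathbb{E}[Y \mid do(T=t')] - \mathbb{E}[Y \mid T=t'] \big| \tag{50}$$
$$\overset{(a)}{\le} \hat{\tau}_{t,t'} + h_{t,t'} \tag{51}$$

where (a) holds from Corollary 1. Similarly, starting from equation (49):

$$\tau_{t,t'} \ge \hat{\tau}_{t,t'} - \big| \mathbb{E}[Y \mid do(T=t)] - \mathbb{E}[Y \mid T=t] \big| - \big| \mathbb{E}[Y \mid do(T=t')] - \mathbb{E}[Y \mid T=t'] \big| \tag{52}$$
$$\overset{(b)}{\ge} \hat{\tau}_{t,t'} - h_{t,t'} \tag{53}$$

where (b) also holds from Corollary 1. ∎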

A.2 Total Variation Distance Approximations

We derive the approximation for the TV presented in equation (15):

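$$\mathrm{TV}\big(p(x), p(x \mid t)\big) = \frac{1}{2} \int_{\mathcal{X}} \big| p(x) - p(x \mid t) \big|\, dx \tag{54}$$
$$= \frac{1}{2} \int_{\mathcal{X}} p(x) \left| 1 - \frac{p(x \mid t)}{p(x)} \right| dx \tag{55}$$
$$= \frac{1}{2} \int_{\mathcal{X}} p(x) \left| 1 - \frac{p(t \mid x)}{p(t)} \right| dx \tag{56}$$

where the last step uses Bayes’ rule. Taking $N$ MC samples $x_1, \dots, x_N$ from $p(x)$, we get:

$$\mathrm{TV}\big(p(x), p(x \mid t)\big) \approx \frac{1}{2N} \sum_{i=1}^{N} \left| 1 - \frac{p(t \mid x_i)}{p(t)} \right| \tag{57}$$

Finally, we derive the approximation for the TV presented in (21) in the same way, conditioning on $x_{-j}$ throughout and using $p(x_j \mid t, x_{-j}) / p(x_j \mid x_{-j}) = p(t \mid x_j, x_{-j}) / p(t \mid x_{-j})$:

$$\mathrm{TV}\big(p(x_j \mid x_{-j}), p(x_j \mid t, x_{-j})\big) = \frac{1}{2} \int p(x_j \mid x_{-j}) \left| 1 - \frac{p(t \mid x_j, x_{-j})}{p(t \mid x_{-j})} \right| dx_j \approx \frac{1}{2M} \sum_{m=1}^{M} \left| 1 - \frac{p(t \mid x_j^{(m)}, x_{-j})}{p(t \mid x_{-j})} \right| \tag{58}$$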