1 Introduction
A typical assumption made in the treatment effect estimation literature is ignorability – i.e., that there are no unobserved confounders. This is a useful assumption since it (along with other assumptions) enables point identification of treatment effects from observed data (Imbens and Wooldridge, 2009; Pearl, 2009). Ignorability may be more plausible for datasets where we collect an exhaustive number of covariates (e.g., electronic health record data (Johnson et al., 2018; Jensen et al., 2012)), but this assumption is untestable based on observed data.
Let $Y$ be the outcome of interest and $T$ be the treatment. Relaxing the ignorability assumption, we may assume the existence of an unobserved confounder $U$, and make assumptions about the strength of the connections $U \to T$ and $U \to Y$ to get an interval estimate of the treatment effect – this is known as sensitivity analysis (Cornfield et al., 1959; Rosenbaum, 2010; Robins et al., 2000). Some proposals for sensitivity analysis proceed with additional modeling or distributional assumptions about the unmeasured confounder (Bross, 1966; Schlesselman, 1978; Rosenbaum and Rubin, 1983), which induce additional untestable assumptions. To overcome this, we make the following contributions in this work:

Making minimal assumptions about $U$, we bound the confounding bias$^{1}$ between the observational expectation $\mathbb{E}[Y \mid t]$ and the interventional expectation $\mathbb{E}[Y \mid \mathrm{do}(t)]$ (Pearl, 2009) by the product of a treatment sensitivity parameter (quantifying the strength of the direct connection $U \to T$) and an outcome sensitivity parameter (quantifying the strength of the direct connection $U \to Y$). These bounds are tight when either sensitivity parameter is zero (i.e., when ignorability is satisfied). ($^{1}$In this work, we use “confounding bias” to refer to $\mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)]$, though previous work has used the term for related quantities (Zheng et al., 2021; VanderWeele and Arah, 2011).)

We examine a special case of these bounds that is relatively easy to calibrate, and apply it to obtain interval estimates of the treatment effect for any two treatments $t_1, t_2$. Our results are also applicable to conditional average treatment effect (CATE) estimation (conditioned on observed covariates $x$).

We discuss possible calibration strategies for the bound, allowing us to find reasonable sensitivity parameter values.
2 Methods
2.1 Basic setup
Suppose we observe, for each unit $i$ (of $n$ units), a treatment $t_i$, an outcome $y_i$, and observed covariates $x_i$. Hence, our observed dataset is $\mathcal{D} = \{(x_i, t_i, y_i)\}_{i=1}^{n}$. Additionally, we assume the existence of an unobserved confounder $u$. We use the capital letters $X$, $T$, $Y$, $U$ to denote the random variables for the covariates, the treatment, the outcome and the unobserved confounder, respectively. The assumed causal graph (Pearl, 2009) relating these is shown in Figure 1. We make the following assumptions:
Assumption 1 (Latent ignorability).
The set $\{X, U\}$ blocks all backdoor paths between $T$ and $Y$.
Assumption 2 (Positivity).
$p(t \mid x, u) > 0$ for all $t$, $x$, and $u$.
For brevity, we omit the condition $X = x$ from all conditional statements/probabilities and leave it as implicit. We also use e.g., $\mathbb{E}[Y \mid t]$ as shorthand for $\mathbb{E}[Y \mid T = t]$ and e.g., $p(u \mid t)$ as shorthand for the density of $U$ given $T = t$. Under the above assumptions, we can write the interventional distribution as:

(1)  $p(y \mid \mathrm{do}(t)) = \int p(y \mid \mathrm{do}(t), u)\, p(u \mid \mathrm{do}(t))\, du$

(2)  $\phantom{p(y \mid \mathrm{do}(t))} \overset{(a)}{=} \int p(y \mid t, u)\, p(u)\, du$

where $(a)$ holds from Assumption 1 (Pearl, 2009; D’Amour, 2019). We can also write the observational distribution as:

(3)  $p(y \mid t) = \int p(y \mid t, u)\, p(u \mid t)\, du$
The distribution $p(y \mid \mathrm{do}(t))$ is of interest but inestimable (from observed data), while $p(y \mid t)$ is estimable but uninteresting. In general, $p(y \mid \mathrm{do}(t)) \neq p(y \mid t)$ (the key difference is that we integrate over $p(u)$ in (2) vs. $p(u \mid t)$ in (3)) – however, there are two special cases (no unobserved confounding) where $p(y \mid \mathrm{do}(t)) = p(y \mid t)$:

$U \perp T$: then $p(u \mid t) = p(u)$, so the integrals in (2) and (3) coincide.

$Y \perp U \mid T$: then $p(y \mid t, u) = p(y \mid t)$, so both integrals reduce to $p(y \mid t)$.
Next, we extrapolate from the above two scenarios – specifically, we provide a bound on the confounding bias $\mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)]$ that vanishes when $U \perp T$ or $Y \perp U \mid T$.
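To make the gap between the observational and interventional expectations concrete, here is a minimal numerical sketch with binary $T$ and $U$; all probabilities below are illustrative values, not taken from the paper:

```python
import numpy as np

# Illustrative joint: p(u), p(t | u), and E[Y | t, u] for binary T and U
p_u = np.array([0.5, 0.5])                     # p(u)
p_t_given_u = np.array([[0.8, 0.2],            # p(t | u); rows index u, cols index t
                        [0.3, 0.7]])
E_y_given_tu = np.array([[1.0, 2.0],           # E[Y | t, u]; rows index t, cols index u
                         [3.0, 5.0]])

p_t = p_u @ p_t_given_u                        # p(t) = sum_u p(t | u) p(u)
p_u_given_t = (p_t_given_u * p_u[:, None]).T / p_t[:, None]   # Bayes' rule; rows index t

t = 0
obs = E_y_given_tu[t] @ p_u_given_t[t]         # E[Y | t]:     integrates against p(u | t)
intv = E_y_given_tu[t] @ p_u                   # E[Y | do(t)]: integrates against p(u)
print(obs, intv)                               # the two expectations differ
```

With these numbers $U$ and $T$ are dependent, so integrating $\mathbb{E}[Y \mid t, u]$ against $p(u \mid t)$ versus $p(u)$ gives different answers; the difference is exactly the confounding bias bounded in the next section.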
2.2 Hölder bounds on the confounding bias
Here, we state our main result on bounding the confounding bias $\mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)]$.
Theorem 1.
Assuming $\mathbb{E}[\,|Y| \mid t, u] < \infty$ for all $u$, the confounding bias is bounded, for any $\alpha, \beta \in [1, \infty]$ s.t. $\frac{1}{\alpha} + \frac{1}{\beta} = 1$, by:

(4)  $\left| \mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right| \le \Gamma_\alpha(t)\, \Lambda_\beta(t)$

where $\Gamma_\alpha(t)$ and $\Lambda_\beta(t)$ are the treatment and outcome sensitivity parameters (respectively), defined by:

(5)  $\Gamma_\alpha(t) = \left( \int \left| \tfrac{p(u \mid t)}{p(u)} - 1 \right|^{\alpha} p(u)\, du \right)^{1/\alpha}$

(6)  $\Lambda_\beta(t) = \left( \int \left| \mathbb{E}[Y \mid t, u] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right|^{\beta} p(u)\, du \right)^{1/\beta}$
All proofs are provided in the Supplementary Material (SM, Section A.1). Intuitively, $\Gamma_\alpha(t)$ quantifies the strength of the connection $U \to T$, and it is easy to see that $\Gamma_\alpha(t) = 0$ when $U \perp T$. Similarly, $\Lambda_\beta(t)$ quantifies the strength of the connection $U \to Y$, and it is easy to see that $\Lambda_\beta(t) = 0$ when $Y \perp U \mid T$. Hence, under no unobserved confounding, the bound in (4) vanishes.
2.3 Special cases
There are infinitely many bounds we could obtain from Theorem 1, parametrized by the choice of $(\alpha, \beta)$ – we focus on only one of them here, since it is relatively easy to interpret.
Corollary 1.
Setting $\alpha = 1$ and $\beta = \infty$ in Theorem 1, the confounding bias is bounded by:

(7)  $\left| \mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right| \le 2\, \mathrm{TV}\!\left( p(u \mid t),\, p(u) \right) \cdot \sup_u \left| \mathbb{E}[Y \mid t, u] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right|$

where $\mathrm{TV}(\cdot, \cdot)$ denotes the total variation distance.
Remark: Note that additional assumptions are required to guarantee that the RHS of (7) is finite – it is sufficient to assume $\mathbb{E}[Y \mid t, u]$ is bounded (as a function of $u$), which guarantees that the outcome sensitivity parameter $\Lambda_\infty(t)$ is finite.
Corollary 1 bounds the confounding bias by the product of the total variation distance between $p(u \mid t)$ and $p(u)$ and the largest absolute difference between the conditional expected outcome $\mathbb{E}[Y \mid t, u]$ and the average expected outcome $\mathbb{E}[Y \mid \mathrm{do}(t)]$. We argue that this constitutes an interpretable version of Theorem 1 that is relatively easy to calibrate – we elaborate on this in Section 2.5.
Finally, we can of course write the tightest bound from the class of bounds in Theorem 1:

(8)  $\left| \mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right| \le \inf_{\alpha, \beta \ge 1:\ \frac{1}{\alpha} + \frac{1}{\beta} = 1} \Gamma_\alpha(t)\, \Lambda_\beta(t)$
This is an interesting optimization problem for future work.
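As a sanity check, the bound and both sensitivity parameters can be computed in closed form for a discrete $U$. The sketch below uses illustrative numbers and assumes the parameters take the weighted $\alpha$/$\beta$-norm form of Theorem 1 (with $\alpha = 1$ recovering the $2\,\mathrm{TV}$ endpoint of Corollary 1); the grid scan crudely approximates the tightest bound over conjugate pairs:

```python
import numpy as np

# Illustrative discrete distributions over a binary U, for one fixed t
p_u = np.array([0.6, 0.4])            # p(u)
p_u_t = np.array([0.8, 0.2])          # p(u | t)
E_y_tu = np.array([1.0, 3.0])         # E[Y | t, u]

E_y_do = E_y_tu @ p_u                 # E[Y | do(t)]
bias = E_y_tu @ p_u_t - E_y_do        # E[Y | t] - E[Y | do(t)]

def gamma(alpha):                     # treatment sensitivity: alpha-norm of density-ratio deviation
    return (np.abs(p_u_t / p_u - 1) ** alpha @ p_u) ** (1 / alpha)

def lam(beta):                        # outcome sensitivity: beta-norm of centered E[Y | t, u]
    return (np.abs(E_y_tu - E_y_do) ** beta @ p_u) ** (1 / beta)

# Corollary 1 endpoint: 2*TV times the largest centered outcome deviation
tv_bound = np.abs(p_u_t - p_u).sum() * np.abs(E_y_tu - E_y_do).max()

# Crude grid scan over conjugate (alpha, beta) pairs, approximating the infimum
alphas = np.linspace(1.01, 20.0, 500)
bounds = [gamma(a) * lam(a / (a - 1)) for a in alphas]
print(abs(bias), tv_bound, min(bounds))   # |bias| never exceeds any of these bounds
```

In this particular example the scan's minimum (attained near $\alpha = \beta = 2$) is noticeably tighter than the total variation endpoint, illustrating why the infimum is worth studying.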
2.4 Treatment effect bounds
For any two treatments $t_1, t_2$, we define the average treatment effect $\tau(t_1, t_2)$ and the ignorable treatment effect estimate $\tilde{\tau}(t_1, t_2)$ as:

(9)  $\tau(t_1, t_2) = \mathbb{E}[Y \mid \mathrm{do}(t_1)] - \mathbb{E}[Y \mid \mathrm{do}(t_2)]$

(10)  $\tilde{\tau}(t_1, t_2) = \mathbb{E}[Y \mid t_1] - \mathbb{E}[Y \mid t_2]$
Below, we use the result in Corollary 1 to bound the average treatment effect.
Corollary 2.
For any $t_1, t_2$, we have:

(11)  $\tau(t_1, t_2) \in \left[ \tilde{\tau}(t_1, t_2) - w(t_1, t_2),\ \tilde{\tau}(t_1, t_2) + w(t_1, t_2) \right]$

where the half-width $w(t_1, t_2)$ is defined by:

(12)  $w(t_1, t_2) = \Gamma_1(t_1)\, \Lambda_\infty(t_1) + \Gamma_1(t_2)\, \Lambda_\infty(t_2)$
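For concreteness, here is a minimal sketch of the interval construction in Corollary 2, starting from already-calibrated sensitivity parameters; every number below is hypothetical:

```python
# Hypothetical per-arm sensitivity parameters (Corollary 1 quantities)
gamma = {"t1": 0.4, "t2": 0.25}   # treatment sensitivity per treatment arm
lam   = {"t1": 1.2, "t2": 0.9}    # outcome sensitivity per treatment arm
tau_naive = 0.7                   # ignorable estimate of E[Y|t1] - E[Y|t2] (made up)

# The half-width adds the per-arm confounding-bias bounds
half_width = gamma["t1"] * lam["t1"] + gamma["t2"] * lam["t2"]
interval = (tau_naive - half_width, tau_naive + half_width)
print(interval)
```

The true ATE is guaranteed to lie in this interval whenever the assumed sensitivity parameters really do upper-bound the unknown ones.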
2.5 Calibration strategies
As for any sensitivity analysis, we need to either (a) justifiably set or (b) calibrate (from observed data) the values of the sensitivity parameters: in our case, we need a strategy to calibrate the treatment sensitivity parameter $\Gamma_1(t)$ as well as the outcome sensitivity parameter $\Lambda_\infty(t)$.
2.5.1 Calibration for ATEs
Outcome sensitivity parameter
In order to set the outcome sensitivity parameter, with the additional assumption that $\mathbb{E}[Y \mid \mathrm{do}(t)] > 0$, we can rewrite Corollary 1 as:

(13)  $\left| \tfrac{\mathbb{E}[Y \mid t]}{\mathbb{E}[Y \mid \mathrm{do}(t)]} - 1 \right| \le 2\, \mathrm{TV}\!\left( p(u \mid t),\, p(u) \right) \cdot \sup_u \left| \tfrac{\mathbb{E}[Y \mid t, u]}{\mathbb{E}[Y \mid \mathrm{do}(t)]} - 1 \right|$

Here, the (rescaled) outcome sensitivity parameter is the maximum percent difference between the expected outcome for an individual/unit and the overall expected outcome – we argue that this can be set by a subject-matter expert. The LHS also has a nice interpretation as the percent deviation of the observational expectation from the interventional expectation. Alternatively, we can compute the analogous maximum percent difference over an observed covariate in place of $u$ and make a calibration assumption that:

(14)  $\sup_u \left| \tfrac{\mathbb{E}[Y \mid t, u]}{\mathbb{E}[Y \mid \mathrm{do}(t)]} - 1 \right| \le \max_x \left| \tfrac{\mathbb{E}[Y \mid t, x]}{\mathbb{E}[Y \mid t]} - 1 \right|$
This calibration assumption is untestable, but it is in the same vein as assumptions made in Franks et al. (2020); Zheng et al. (2021); Cinelli and Hazlett (2020).
Treatment sensitivity parameter
We can make another calibration assumption: that $\Gamma_1(t) \le \hat{\Gamma}_1(t)$, where $\hat{\Gamma}_1(t)$ can be approximated from samples as:

(15)  $\hat{\Gamma}_1(t) = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{p}(t \mid x_i)}{\hat{p}(t)} - 1 \right|$

where $\hat{p}(t \mid x_i)$ can be estimated using a propensity model (e.g., logistic regression), and $\hat{p}(t) = \frac{1}{n} \sum_{i=1}^{n} \hat{p}(t \mid x_i)$. A derivation of (15) is provided in the SM (Section A.2).
2.5.2 Calibration for CATEs
For convenience, we rewrite the bound in (7) conditioned on observed covariates $x$:

(16)  $\left| \mathbb{E}[Y \mid t, x] - \mathbb{E}[Y \mid \mathrm{do}(t), x] \right| \le \Gamma(t, x)\, \Lambda(t, x)$

where

(17)  $\Gamma(t, x) = 2\, \mathrm{TV}\!\left( p(u \mid t, x),\, p(u \mid x) \right)$

(18)  $\Lambda(t, x) = \sup_u \left| \mathbb{E}[Y \mid t, x, u] - \mathbb{E}[Y \mid \mathrm{do}(t), x] \right|$
The calibration strategies discussed in Section 2.5.1 work for the average (ATE) case, but more careful treatment is required to calibrate bounds at a specific covariate value $x$. For this purpose, we borrow ideas from Zheng et al. (2021); Cinelli and Hazlett (2020).
Outcome sensitivity parameter
To calibrate the outcome sensitivity parameter, we can “hide” the $j$th observed confounder dimension. We can compute the maximum absolute difference between the “complete” expectation $\mathbb{E}[Y \mid t, x]$ and the “incomplete” expectation $\mathbb{E}[Y \mid t, x_{-j}]$ (akin to equation (18)):

(19)  $\hat{\Lambda}^{(j)}(t, x) = \max_{x_j} \left| \mathbb{E}[Y \mid t, x] - \mathbb{E}[Y \mid t, x_{-j}] \right|$

where $x_{-j}$ denotes the covariates with dimension $j$ removed, and $\mathbb{E}[Y \mid t, x]$ and $\mathbb{E}[Y \mid t, x_{-j}]$ can be estimated via regression. Finally, we can make a calibration assumption:

(20)  $\Lambda(t, x) \le \max_j \hat{\Lambda}^{(j)}(t, x)$

where the max is taken over all dimensions $j$ of the observed covariates.
Treatment sensitivity parameter
For the treatment sensitivity parameter, we can approximate (akin to eq. (17)):

(21)  $\hat{\Gamma}^{(j)}(t, x) = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{p}(t \mid x_{i,j}, x_{-j})}{\hat{p}(t \mid x_{-j})} - 1 \right|$

where $x_j$ represents a “hidden” covariate dimension and $\hat{p}(t \mid x_j, x_{-j})$, $\hat{p}(t \mid x_{-j})$ are estimated via logistic regression. A derivation of the above approximation is provided in the SM (Section A.2). Finally, we make the following calibration assumption:

(22)  $\Gamma(t, x) \le \max_j \hat{\Gamma}^{(j)}(t, x)$
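Both the marginal plug-in (15) and its conditional analogue above reduce to averaging a propensity-ratio deviation over samples. The sketch below simulates this for the marginal case; for clarity it uses the true simulated propensity where, in practice, a fitted logistic-regression propensity model would take its place, and all quantities are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: one covariate x standing in for the (hidden) confounder
n = 5000
x = rng.normal(size=n)                             # observed covariate samples
e_x = 1.0 / (1.0 + np.exp(-(0.8 * x - 0.2)))       # "complete" propensity p(t=1 | x)
p_t1 = e_x.mean()                                  # marginal propensity estimate

# Plug-in treatment sensitivity: average |p(t|x)/p(t) - 1| over samples
gamma_hat = np.mean(np.abs(e_x / p_t1 - 1.0))
print(gamma_hat)  # grows with the covariate's influence on treatment assignment
```

Setting the logistic coefficient on `x` to zero makes the propensity constant and drives the plug-in to zero, matching the intuition that the parameter measures dependence between treatment and the hidden variable.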
There are practical concerns with the above proposals for CATE interval calibration:

Computing $\hat{\Lambda}^{(j)}(t, x)$ for a single $x$ requires maximization over $x_j$ while fixing $x_{-j}$ in the expectation (approximated via a regression model over all observed covariates). We can find all the unique values of $x_j$ in the dataset, then take the max over those unique values.
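The unique-values trick above can be sketched as follows; `mu` is a hypothetical stand-in for a fitted outcome regression, and the covariate matrix is simulated:

```python
import numpy as np

rng = np.random.default_rng(1)

X = rng.normal(size=(200, 3))                  # illustrative covariate matrix

def mu(t, x):                                  # hypothetical fitted model for E[Y | t, x]
    return 2.0 * t + x @ np.array([1.0, -0.5, 0.3])

def lambda_hat_j(t, x, j, X_data):
    """Max deviation of mu when dimension j sweeps over its observed values."""
    base = mu(t, x)
    best = 0.0
    for v in np.unique(X_data[:, j]):          # all unique observed values of dim j
        x_swap = x.copy()
        x_swap[j] = v                          # hold x_{-j} fixed, swap dimension j
        best = max(best, abs(mu(t, x_swap) - base))
    return best

x0 = X[0]
lam_max = max(lambda_hat_j(1, x0, j, X) for j in range(X.shape[1]))
print(lam_max)                                 # outer max over covariate dimensions
```

With continuous covariates every value is unique, so this costs one model evaluation per sample per dimension, which is the computational concern raised above.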
3 Related Work
There is an extensive body of literature on sensitivity analysis for the ignorability assumption (Robins et al., 2000; McCandless et al., 2007; VanderWeele and Arah, 2011; Lee, 2011). Most proposals, similar to ours, assume a “strength” of the connections $U \to T$ and $U \to Y$ (under different definitions of “strength”) and examine the deviation of a causal estimand of interest from a “naive” estimate (i.e., one that assumes ignorability) based on the assumed strength parameters. Ding and VanderWeele (2016) provide a lower bound on the true risk ratio based on two ratio-scale sensitivity parameters (one treatment sensitivity parameter, and one outcome sensitivity parameter) – they also provide a lower bound on the risk difference based on these same parameters. Franks et al. (2020) propose a framework for flexible modeling of the observed outcomes and relate the observed and unobserved potential outcome distributions via Tukey’s factorization. Zheng et al. (2021) use a copula parametrization to relate the interventional distribution to the observational distribution and show that, under some assumptions about the data-generating process and in the multi-cause setting, we can identify the treatment sensitivity parameter up to a “causal equivalence class”. Kallus et al. (2019) and Jesson et al. (2021) use an odds ratio between the complete propensity and the nominal propensity to quantify the strength of unobserved confounding, and make assumptions about its magnitude to bound treatment effect estimates. Closely related to our work are the confounding bias formulas in VanderWeele and Arah (2011), where the authors provide formulas for the difference between “naive” effect estimates and true estimates. While this bias is an exact difference (and not a bound), it is difficult to calibrate against observed data, since one has to make assumptions about the distribution of $U$ – in contrast, this work bounds the bias (rather than giving an exact bias formula) by the product of only two scalars, each of which can be calibrated against observed data.
4 Experiments
4.1 Binary/Categorical
Let $T$, $Y$, and $U$ be binary. We perform the following experiment:

Sample random joint distributions $p(t, y, u)$; for each drawn $p(t, y, u)$ and for all $t$, compute the: bias $\mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)]$, outcome sensitivity parameter $\Lambda_\infty(t)$, and treatment sensitivity parameter $\Gamma_1(t)$.
Figure 2 shows the bound from Corollary 1 and the confounding bias for all sampled distributions $p(t, y, u)$. We see that, for every bias value, we can find a joint distribution for which the bound is close to the true bias (in the binary case) – we will more thoroughly explore bound tightness in future work.
Figure 3 is a contour plot of the confounding bias vs. the treatment and outcome sensitivity parameters – it shows that the confounding bias has an increasing trend with both sensitivity parameters, suggesting they are of equal importance in bounding the bias. We perform the same experiment for categorical variables – the results are shown in the SM (Section A.3).
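A replication sketch of the binary experiment: draw random joints $p(t, y, u)$ from a flat Dirichlet and check that the Corollary 1 bound dominates the bias on every draw (seed and draw count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

for _ in range(1000):
    p = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)   # random joint p[t, y, u]
    p_u = p.sum(axis=(0, 1))                         # p(u)
    p_tu = p.sum(axis=1)                             # p(t, u)
    t = 0
    p_u_t = p_tu[t] / p_tu[t].sum()                  # p(u | t)
    E_y_tu = p[t, 1] / p_tu[t]                       # E[Y | t, u] = p(y=1 | t, u)
    E_y_do = E_y_tu @ p_u                            # E[Y | do(t)]
    bias = E_y_tu @ p_u_t - E_y_do                   # E[Y | t] - E[Y | do(t)]
    bound = np.abs(p_u_t - p_u).sum() * np.abs(E_y_tu - E_y_do).max()
    assert abs(bias) <= bound + 1e-9                 # the bound holds on every draw
print("bound held for all 1000 sampled distributions")
```

Plotting `abs(bias)` against `bound` over the draws reproduces the qualitative picture described for Figure 2.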
4.2 IHDP dataset
We perform experiments on the Infant Health and Development Program (IHDP) dataset (Hill, 2011), which is semi-simulated (i.e., measured covariates but synthetic outcomes) and measures the effect of trained provider visits on children’s test scores. There are 100 datasets within IHDP (downloaded from https://www.fredjo.com), each with an index $d \in \{1, \dots, 100\}$. Similar to Jesson et al. (2021), we induce hidden confounding by hiding one of the covariates (specifically, the 9th covariate).
4.2.1 ATE interval estimation
For ATE estimation on the IHDP dataset, we first compute the naive/ignorable ATE estimate:

(23)  $\tilde{\tau}(t_1, t_2) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{\mu}(t_1, x_i) - \hat{\mu}(t_2, x_i) \right)$

where $\hat{\mu}(t, x) \approx \mathbb{E}[Y \mid t, x]$ and $t_1 = 1$, $t_2 = 0$. Next, we compute the calibrated treatment and outcome sensitivity parameters via:

(24)  $\hat{\Gamma}(t) = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{p}(t \mid x_i)}{\hat{p}(t)} - 1 \right|$

(25)  $\hat{\Lambda}(t) = \max_i \left| \hat{\mu}(t, x_i) - \frac{1}{n} \sum_{k=1}^{n} \hat{\mu}(t, x_k) \right|$

where $\hat{p}(t) = \frac{1}{n} \sum_{i} \hat{p}(t \mid x_i)$, $\hat{p}(t \mid x)$ is estimated via logistic regression, and $\hat{\mu}(t, x)$ is estimated using a TARNet (Shalit et al., 2017) regression model. For details on hyperparameter settings, see the SM (Section A.4). Finally, from Corollary 2, we compute the calibrated interval as:

(26)  $I_1 = \left[ \tilde{\tau} - \hat{w},\ \tilde{\tau} + \hat{w} \right]$

where $\hat{w} = \hat{\Gamma}(t_1)\, \hat{\Lambda}(t_1) + \hat{\Gamma}(t_2)\, \hat{\Lambda}(t_2)$.
We can generalize the interval in (26) to scalar multiples of the calibration half-width $\hat{w}$, as:

(27)  $I_\lambda = \left[ \tilde{\tau} - \lambda \hat{w},\ \tilde{\tau} + \lambda \hat{w} \right]$

This interval becomes the calibrated interval (26) when $\lambda = 1$ and degenerates to the point estimate $\tilde{\tau}$ when $\lambda = 0$.
For the IHDP dataset, we compute:

the ATE inclusion rate – i.e., the percentage of datasets (out of 100 repetitions) for which the computed interval includes the true ATE. Formally, this is:

(28)  $\mathrm{IR}(\lambda) = \frac{1}{100} \sum_{d=1}^{100} \mathbb{1}\!\left[ \tau^{(d)} \in I_\lambda^{(d)} \right]$

where $\mathbb{1}[\cdot]$ is an indicator function, $\tau^{(d)}$ is the true ATE for the $d$th dataset, and $I_\lambda^{(d)}$ is the ATE interval for the $d$th dataset.

the ATE interval zero-crossing rate – i.e., the percentage of datasets for which the computed interval contains 0:

(29)  $\mathrm{ZR}(\lambda) = \frac{1}{100} \sum_{d=1}^{100} \mathbb{1}\!\left[ 0 \in I_\lambda^{(d)} \right]$
A useful estimated ATE interval should do two things: (a) include the true ATE and (b) exclude 0. Desideratum (b) is desirable because, when it holds, we can make a recommendation about which treatment is better on average, even under unobserved confounding. Scaling the interval (by the scalar $\lambda$) trades off the “correctness” of the interval (measured by $\mathrm{IR}(\lambda)$) with its “usefulness” (measured by $\mathrm{ZR}(\lambda)$). We plot $\mathrm{IR}(\lambda)$ vs. $\mathrm{ZR}(\lambda)$ for different $\lambda$ values in Figure 4 – the red point shows $\lambda = 1$, i.e., our proposed calibrated ATE interval; its ATE inclusion rate and zero-crossing rate over the 100 repetitions of IHDP (with the 9th covariate hidden) can be read off the plot.
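The two metrics for the scaled intervals can be sketched as follows; the per-dataset true ATEs, naive estimates, and half-widths below are made-up illustrative values:

```python
import numpy as np

tau_true  = np.array([0.9, 1.1, 0.4, 1.3])    # true ATE per dataset (illustrative)
tau_naive = np.array([1.0, 0.8, 0.6, 1.2])    # ignorable estimate per dataset
half_w    = np.array([0.3, 0.4, 0.5, 0.2])    # calibrated half-width per dataset

def rates(scale):
    lo = tau_naive - scale * half_w
    hi = tau_naive + scale * half_w
    ir = np.mean((lo <= tau_true) & (tau_true <= hi))  # interval includes true ATE
    zr = np.mean((lo <= 0.0) & (0.0 <= hi))            # interval contains zero
    return ir, zr

for scale in (0.0, 1.0, 2.0):
    print(scale, rates(scale))   # widening raises inclusion but also zero-crossing
```

Sweeping the scale and plotting the two rates against each other reproduces the trade-off curve described for Figure 4.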
4.2.2 CATE interval estimation
We perform a similar experiment for CATE estimation on IHDP, this time focusing only on the first dataset ($d = 1$):

First, we train an ignorable model (specifically, a TARNet) to predict the expected outcomes $\hat{\mu}(t_1, x) \approx \mathbb{E}[Y \mid t_1, x]$ and $\hat{\mu}(t_2, x) \approx \mathbb{E}[Y \mid t_2, x]$ (respectively) – we define the naive CATE estimate as $\tilde{\tau}(x) = \hat{\mu}(t_1, x) - \hat{\mu}(t_2, x)$. We also train a logistic propensity model to approximate $p(t \mid x)$.

Next, we compute calibrated sensitivity parameters (as in Section 2.5.2):

(30)  $\hat{\Gamma}(t, x) = \max_j \hat{\Gamma}^{(j)}(t, x)$

(31)  $\hat{\Lambda}(t, x) = \max_j \hat{\Lambda}^{(j)}(t, x)$
Finally, we compute the calibrated intervals with half-width multiplier $\lambda$:

(32)  $I_\lambda(x) = \left[ \tilde{\tau}(x) - \lambda \hat{w}(x),\ \tilde{\tau}(x) + \lambda \hat{w}(x) \right]$

where the half-width $\hat{w}(x)$ is:

(33)  $\hat{w}(x) = \hat{\Gamma}(t_1, x)\, \hat{\Lambda}(t_1, x) + \hat{\Gamma}(t_2, x)\, \hat{\Lambda}(t_2, x)$
Similar to the previous section, we compute the following metrics:

The CATE inclusion rate – i.e., the percentage of samples for which the computed interval includes the true CATE $\tau(x_i)$:

(34)  $\mathrm{IR}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\!\left[ \tau(x_i) \in I_\lambda(x_i) \right]$

The CATE interval zero-crossing rate – i.e., the percentage of samples for which the computed interval crosses 0:

(35)  $\mathrm{ZR}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\!\left[ 0 \in I_\lambda(x_i) \right]$
Figure 5 shows our results for CATE estimation on the first repetition of IHDP – the red point shows $\lambda = 1$, our proposed calibrated interval; its CATE inclusion rate and zero-crossing rate can be read off the plot.
5 Conclusions
We have developed a bound on the confounding bias based on Hölder’s inequality, and used it to compute bounds on both average and conditional average treatment effects. We discussed possibilities for calibrating the sensitivity parameters in the bound, enabling practical sensitivity analysis. Finally, we performed experiments on synthetic and semi-synthetic data, showcasing empirical properties of our bound and how it can be used in practice.
This work leaves several gaps and open research directions, which we aim to explore in future work:

What conditions on the joint distribution $p(x, u, t, y)$ make our calibration assumptions (im)plausible? Empirically, we could check this by adding observed covariates to the experiment in Section 4.1, but the question also warrants a theoretical analysis. Also, using the IHDP dataset is not an ideal “stress test” for our calibration assumptions, since we artificially induce hidden confounding by hiding an observed covariate.

Can we find more computationally efficient calibration strategies for CATEs (in particular, ones that don’t require fitting many outcome models for different covariate dimensions $j$)?

Can we modify the bounds to make them work with a linear-Gaussian model, as in Zheng et al. (2021)? In their current form, the bounds we provide are vacuous (infinite) for the linear-Gaussian model.

How might we characterize the bounds’ tightness? How do the bounds in this work quantitatively compare to other treatment effect bounds in the literature?
References
Bross, I. D. J. (1966). Spurious effects from an extraneous variable. Journal of Chronic Diseases 19(6), pp. 637–647.
Cinelli, C. and Hazlett, C. (2020). Making sense of sensitivity: extending omitted variable bias. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82(1), pp. 39–67.
Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., and Wynder, E. L. (1959). Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22(1), pp. 173–203.
D’Amour, A. (2019). On multi-cause approaches to causal inference with unobserved confounding: two cautionary failure cases and a promising alternative. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89, pp. 3478–3486.
Ding, P. and VanderWeele, T. J. (2016). Sensitivity analysis without assumptions. Epidemiology 27(3), pp. 368–377.
Franks, A., D’Amour, A., and Feller, A. (2020). Flexible sensitivity analysis for observational studies without observable implications. Journal of the American Statistical Association 115(532), pp. 1730–1746.
Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20(1), pp. 217–240.
Imbens, G. W. and Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature 47(1), pp. 5–86.
Jensen, P. B., Jensen, L. J., and Brunak, S. (2012). Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics 13(6), pp. 395–405.
Jesson, A., Mindermann, S., Gal, Y., and Shalit, U. (2021). Quantifying ignorance in individual-level causal-effect estimates under hidden confounding. arXiv:2103.04850.
Johnson et al. (2018). Causal inference on electronic health records to assess blood pressure treatment targets: an application of the parametric g formula. In PSB, pp. 180–191.
Kallus, N., Mao, X., and Zhou, A. (2019). Interval estimation of individual-level causal effects under unobserved confounding. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89, pp. 2281–2290.
Lee, W.-C. (2011). Bounding the bias of unmeasured factors with confounding and effect-modifying potentials. Statistics in Medicine 30(9), pp. 1007–1017.
McCandless, L. C., Gustafson, P., and Levy, A. (2007). Bayesian sensitivity analysis for unmeasured confounding in observational studies. Statistics in Medicine 26(11), pp. 2331–2347.
Pearl, J. (2009). Causality: models, reasoning and inference. 2nd edition, Cambridge University Press, USA.
Robins, J. M., Rotnitzky, A., and Scharfstein, D. O. (2000). Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical Models in Epidemiology, the Environment, and Clinical Trials, pp. 1–94.
Rosenbaum, P. R. and Rubin, D. B. (1983). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society: Series B (Methodological) 45(2), pp. 212–218.
Rosenbaum, P. R. (2010). Design sensitivity and efficiency in observational studies. Journal of the American Statistical Association 105(490), pp. 692–702.
Schlesselman, J. J. (1978). Assessing effects of confounding variables. American Journal of Epidemiology 108(1), pp. 3–8.
Shalit, U., Johansson, F. D., and Sontag, D. (2017). Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning, PMLR 70, pp. 3076–3085.
VanderWeele, T. J. and Arah, O. A. (2011). Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology 22(1), pp. 42–52.
Zheng, J., D’Amour, A., and Franks, A. (2021). Copula-based sensitivity analysis for multi-treatment causal inference with unobserved confounding. arXiv:2102.09412.
Appendix A Supplementary Material
A.1 Proofs
Theorem 1 (restated).
Assuming $\mathbb{E}[\,|Y| \mid t, u] < \infty$ for all $u$, the confounding bias is bounded, for any $\alpha, \beta \in [1, \infty]$ s.t. $\frac{1}{\alpha} + \frac{1}{\beta} = 1$, by:

(36)  $\left| \mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right| \le \Gamma_\alpha(t)\, \Lambda_\beta(t)$

where $\Gamma_\alpha(t)$ and $\Lambda_\beta(t)$ are the treatment and outcome sensitivity parameters (respectively), defined by:

(37)  $\Gamma_\alpha(t) = \left( \int \left| \tfrac{p(u \mid t)}{p(u)} - 1 \right|^{\alpha} p(u)\, du \right)^{1/\alpha}$

(38)  $\Lambda_\beta(t) = \left( \int \left| \mathbb{E}[Y \mid t, u] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right|^{\beta} p(u)\, du \right)^{1/\beta}$
Proof.
We can write the confounding bias as:

(39)  $\mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)] \overset{(a)}{=} \int \mathbb{E}[Y \mid t, u]\, p(u \mid t)\, du - \int \mathbb{E}[Y \mid t, u]\, p(u)\, du$

(40)  $= \int \mathbb{E}[Y \mid t, u] \left( p(u \mid t) - p(u) \right) du$

Noting that $\int \left( p(u \mid t) - p(u) \right) du = 0$, we have:

(41)  $\mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)] = \int \mathbb{E}[Y \mid t, u] \left( p(u \mid t) - p(u) \right) du - \mathbb{E}[Y \mid \mathrm{do}(t)] \int \left( p(u \mid t) - p(u) \right) du$

(42)  $= \int \left( \mathbb{E}[Y \mid t, u] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right) \left( p(u \mid t) - p(u) \right) du$

(43)  $= \int \left( \mathbb{E}[Y \mid t, u] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right) \left( \tfrac{p(u \mid t)}{p(u)} - 1 \right) p(u)\, du$

(44)  $\left| \mathbb{E}[Y \mid t] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right| \le \int \left| \mathbb{E}[Y \mid t, u] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right| \left| \tfrac{p(u \mid t)}{p(u)} - 1 \right| p(u)\, du$

where $(a)$ holds by Fubini’s theorem (since $\mathbb{E}[\,|Y| \mid t, u]$ is finite by assumption). By Hölder’s inequality, we have:

(45)  $\int \left| \mathbb{E}[Y \mid t, u] - \mathbb{E}[Y \mid \mathrm{do}(t)] \right| \left| \tfrac{p(u \mid t)}{p(u)} - 1 \right| p(u)\, du \le \Gamma_\alpha(t)\, \Lambda_\beta(t)$

for any $\alpha, \beta \in [1, \infty]$ s.t. $\frac{1}{\alpha} + \frac{1}{\beta} = 1$. ∎
Corollary 2 (restated).
For any $t_1, t_2$, we have:

(46)  $\tau(t_1, t_2) \in \left[ \tilde{\tau}(t_1, t_2) - w(t_1, t_2),\ \tilde{\tau}(t_1, t_2) + w(t_1, t_2) \right]$

where the half-width $w(t_1, t_2)$ is defined by:

(47)  $w(t_1, t_2) = \Gamma_1(t_1)\, \Lambda_\infty(t_1) + \Gamma_1(t_2)\, \Lambda_\infty(t_2)$