Causal inference taking into account unobserved confounding

12/01/2017 ∙ by Minna Genbäck, et al. ∙ Umeå universitet

Causal inference with observational data can be performed under an assumption of no unobserved confounders (unconfoundedness assumption). There is, however, seldom clear subject-matter or empirical evidence for such an assumption. We therefore develop uncertainty intervals for average causal effects based on outcome regression estimators and doubly robust estimators, which provide inference taking into account both sampling variability and uncertainty due to unobserved confounders. In contrast with sampling variation, uncertainty due to unobserved confounding does not decrease with increasing sample size. The intervals introduced are obtained by deriving the bias of the estimators due to unobserved confounders. We are thus also able to contrast the size of the bias due to violation of the unconfoundedness assumption with the bias due to misspecification of the models used to explain potential outcomes. This is illustrated through numerical experiments where bias due to moderate unobserved confounding dominates misspecification bias for typical situations in terms of sample size and modeling assumptions. We also study the empirical coverage of the uncertainty intervals introduced and apply the results to a study of the effect of regular food intake on health. An R-package implementing the inference proposed is available.




1 Introduction

In observational studies, a causal effect of a treatment can be identified given the assumption that all variables confounding the effect on the outcome of interest are observed. This unconfoundedness assumption (also called ignorability of the treatment assignment mechanism) is regarded as the Achilles heel of non-experimental studies (Liu et al., 2013) and is not testable without further information (e.g., de Luna and Johansson, 2006, 2014). We therefore develop in this paper uncertainty intervals (Vansteelandt et al., 2006) for average causal effects based on outcome regression estimators and doubly robust estimators, which provide inference taking into account sampling variability and uncertainty due to unobserved confounders. The intervals are obtained by deriving the bias of the estimators due to unobserved confounders as a function of a parameter (called bias parameter in the sequel) quantifying the amount of unobserved confounding. Using the bias expressions, we deduce bounds on the average causal effects (an identification set in contrast to point identification available under unconfoundedness). Combining these bounds with sampling variability yields uncertainty intervals that cover the parameter of interest with probability at least as high as an a priori chosen level (say 95%). The bounds obtained are useful when information on the bias parameter is available since they can then be made tighter, in contrast with worst case scenario bounds (e.g., Manski, 2003, Horowitz and Manski, 2006).

The approach taken here is directly related to the quickly expanding literature on methods to perform a sensitivity analysis to the unconfoundedness assumption (Rosenbaum, 2010, Chap. 14). And, indeed, the uncertainty intervals proposed may be used to perform such a sensitivity analysis whereby the maximum value of the bias parameter is presented for which the uncertainty interval covers zero (no causal effect). Among existing methods to perform sensitivity analyses, many are based on specifying parametric models on how a potential confounder affects the outcome and treatment assignment given the observed covariates, thereby introducing bias parameters (one for the effect of the confounder on the observed outcome and the other for the effect of the confounder on the treatment). Then, typically, using some distributional assumptions for the hypothetical unobserved confounder, the latter is integrated out in order to obtain the bias of an estimator as a function of the bias parameters; see, e.g., Rosenbaum (2010); Lin et al. (1998); Robins et al. (2000); Gastwirth et al. (1998); Imbens (2003); VanderWeele and Arah (2011) using a frequentist approach, and Greenland (2005); de Luna and Lundin (2014) using a Bayesian framework. The confounder and outcome are often assumed binary, but some approaches allow for a continuous confounder and/or outcome (e.g., VanderWeele and Arah, 2011). A directly related literature deals with sensitivity analyses to departures from the ignorability assumption of a missing outcome data mechanism (missing at random assumption); see, e.g., Copas and Eguchi (2001, 2005); Scharfstein et al. (2003); Daniels and Hogan (2008). In fact, by using the potential outcome framework (Rubin, 1974), the estimation of a causal effect can be cast into a problem of missing outcome (unobserved potential outcomes) and sensitivity analyses for the missing at random assumption can readily be used to study deviations from the unconfoundedness assumption. An alternative to parametrising the relation between a potential unobserved confounder and the outcome and treatment is to define the bias due to non-ignorability (treatment assignment/missingness mechanism) as the bias parameter, and, e.g., put a prior on this bias within a Bayesian framework (Daniels and Hogan, 2008; Josefsson et al., 2016). Finally, an approach we find appealing from an interpretation point of view is to consider as the bias parameter the correlation (induced by unobserved confounders) between the treatment assignment and the potential outcomes given the observed covariates; see Copas and Li (1997) and Genbäck et al. (2015) within a missing outcome context, and Imai et al. (2010) within a parametric mediation analysis context. This approach has the advantage of introducing only one bias parameter for each missingness mechanism.

In this paper, we build upon the latter alternative to perform inference on a causal parameter that takes into account uncertainty due to unobserved confounding and sampling variability. When estimating the average causal effect, two data missingness mechanisms must be considered (one for the outcome under treatment and one for the outcome when no treatment is assigned), implying the need for two ignorability assumptions for point identification, i.e. two bias parameters. On the other hand, if the interest lies solely in an average treatment effect on the treated (or the non-treated), then only one missingness mechanism has to be dealt with, and thereby only one bias parameter. We obtain bounds on the causal effect of interest by deducing the bias of the estimators as a function of the bias parameter(s). Thus, we are also able to contribute by contrasting the size of the bias of the outcome regression estimator due to (i) violation of the unconfoundedness assumption and (ii) misspecification of the models used to explain the outcome. Indeed, three types of uncertainty can be distinguished: sampling variation, model misspecification and unobserved confounding. Sampling variation decreases as sample size increases. Model misspecification bias may be tackled with doubly robust estimation, and in any case this bias can in principle be made arbitrarily small with larger samples, e.g. under sparsity assumptions, by increasing the flexibility of the models used. On the other hand, bias due to unobserved confounding does not disappear with increasing sample size as long as unobserved confounders are omitted, and is therefore essential to take into account in observational studies.

The paper is organized as follows. First, a framework for deducing bounds based on a parametrised model is introduced in Section 2. In Section 3 we focus on outcome regression and doubly robust estimators of average causal effects. We deduce their bias under confounding and show that confounding bias and model misspecification bias are separable. From confounding bias expressions we obtain bounds, and their corresponding uncertainty intervals for the parameter of interest. The R-package ui implements the methods proposed. In Section 4 simulated experiments are conducted to study the relative size of the biases due to confounding and model misspecification, as well as to investigate empirical coverage of the uncertainty intervals proposed. In Section 5 we perform a sensitivity analysis in a real data example. The paper is concluded in Section 6.

2 Identification and sampling variation

2.1 Model for point identification

Let $y(1)$ and $y(0)$ be two potential outcomes, where $y(1)$ is the outcome when treatment is assigned ($z = 1$), and $y(0)$ is the outcome when treatment is not assigned ($z = 0$). The two potential outcomes are defined for each individual in the study although only one is observed ($y(1)$ is observed when $z = 1$, and $y(0)$ is observed when $z = 0$). We further assume that a set of covariates $x$ is observed for all individuals. To allow for a more compact notation in the following sections we let the first element of $x$ represent the intercept. The evaluation of a treatment effect on the outcome may be done by considering average effects. In this paper we focus on both $\tau = E\{y(1) - y(0)\}$, the average causal effect, and $\tau^{t} = E\{y(1) - y(0) \mid z = 1\}$, the average causal effect on the treated.

Without loss of generality, let us write


where , are functions of and . Let further


where is an indicator function, is not observed, is a function of and .

We can now give sufficient conditions for point identification of and .

Assumption 1a.


Assumption 1b.

, where is the support of

Assumption 2a.


Assumption 2b.

, where is the support of

Assumptions 1a and 2a are often called unconfoundedness or ignorability assumptions. The average causal effect on the treated and the average causal effect are point identified under Assumption 1 and under Assumptions 1 and 2, respectively (Rosenbaum and Rubin, 1983).

2.2 Bias parameters and uncertainty intervals

The ignorability assumptions of the treatment assignment mechanism cannot be tested with the observed data unless extra information is available, e.g., instrumental variables; see de Luna and Johansson (2014). Thus, unless Assumptions 1a and 2a are true by design of the study, uncertainty about their validity should be taken into account in the inference. For this purpose, it is useful to parametrise deviations from Assumptions 1a and 2a. The following parametrisation models realistic deviations and is easy to communicate to potential users.

Assumption 3.

Consider model (1-2) with , for an unknown parameter , and, for , let , , , , and .

Here we have introduced the bias parameters and . This model is such that Assumption 1a holds when and not otherwise, and Assumption 2a holds when and not otherwise. Hence, these parameters describe departures from ignorability of the treatment assignment mechanism. We call them bias parameters since they tune the bias that will result from assuming ignorability. The normality of corresponds to a choice of link function for (2) which is convenient mathematically in the sequel, but is not otherwise essential in the model. Note, moreover, that normality may be relaxed to a more general class of distributions (Genbäck et al., 2015, Sec. 3.2).

If we have unmeasured confounders, one way to interpret is to rewrite the error terms , and from equations (1) and (2) as the sum of the error that can and cannot be explained by the unmeasured confounder. For instance, if we believe that unmeasured confounder(s) explain of the variation in and of the variation in , and that the unmeasured confounders affect treatment assignment negatively and positively, then .
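The interpretation above can be made concrete with a small simulation. The sketch below assumes a single hypothetical unmeasured confounder that accounts for a chosen share of the variance of an outcome error and of the treatment assignment error, with the remaining parts of the two errors independent. Under that assumption the induced correlation has magnitude equal to the square root of the product of the two shares, and its sign is the product of the signs of the two effects. The shares used (5% and 10%), the function names and the decomposition itself are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def implied_rho(share_outcome, share_assignment, sign_outcome=1, sign_assignment=1):
    """Correlation between the outcome error and the assignment error implied
    when one shared confounder explains the given variance shares and the
    remaining parts of the two errors are independent (an assumption)."""
    return sign_outcome * sign_assignment * math.sqrt(share_outcome * share_assignment)


def simulated_rho(share_outcome, share_assignment, sign_outcome=1,
                  sign_assignment=1, n=200_000, seed=1):
    """Monte Carlo check: build both unit-variance errors from a shared
    confounder plus independent noise and return their sample correlation."""
    rng = random.Random(seed)
    eps, eta = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # hypothetical unmeasured confounder
        eps.append(sign_outcome * math.sqrt(share_outcome) * u
                   + math.sqrt(1.0 - share_outcome) * rng.gauss(0.0, 1.0))
        eta.append(sign_assignment * math.sqrt(share_assignment) * u
                   + math.sqrt(1.0 - share_assignment) * rng.gauss(0.0, 1.0))
    mean_eps, mean_eta = sum(eps) / n, sum(eta) / n
    cov = sum((a - mean_eps) * (b - mean_eta) for a, b in zip(eps, eta))
    var_eps = sum((a - mean_eps) ** 2 for a in eps)
    var_eta = sum((b - mean_eta) ** 2 for b in eta)
    return cov / math.sqrt(var_eps * var_eta)


# Illustrative shares only: 5% of the outcome error and 10% of the assignment
# error, with a positive effect on the outcome and a negative one on treatment.
rho_formula = implied_rho(0.05, 0.10, sign_outcome=1, sign_assignment=-1)
rho_simulated = simulated_rho(0.05, 0.10, sign_outcome=1, sign_assignment=-1)
```

With these illustrative shares the closed form gives a correlation of about -0.07, and the simulated value should agree up to Monte Carlo error.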

The approach proposed here is to deduce an identification interval for the parameter of interest by using an estimator which is unbiased for it under unconfoundedness. Then, the bias of the estimator is computed as a function of the bias parameter and of a nuisance parameter vector. Finally, this bias expression together with out-of-data information (if any) on the bias parameter(s), in the form of an interval, yields an identification interval for the parameter of interest:


where is the true value of , and is the expectation taken over the observed data law, i.e. corresponding to the true but unknown values for and . Note that is the no out-of-data information case. In some applications, however, one may have out-of-data information, for instance that the treatment assignment is not negatively correlated with the outcome, , for some . Another instance arises when a rich and relevant set of covariates is available, in which case one may believe that for small.

In situations where a consistent and asymptotically normal estimator of the parameter of interest is available, with corresponding standard errors, an uncertainty interval containing the parameter with probability at least the chosen level (Gorbach and de Luna, 2017) is given by:



where the critical value is a percentile of the standard normal distribution.
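As a rough illustration of the construction described in this section, the sketch below builds an identification interval by subtracting a confounding bias from a point estimate over a grid of bias parameter values, and then widens it by a normal critical value times the standard error. Several things here are assumptions rather than the paper's exact definition: the bias function is a hypothetical callable, the standard error is treated as constant over the bias parameter, the critical value is taken as the 1 - alpha/2 normal percentile, and the grid search stands in for an analytic minimisation.

```python
from statistics import NormalDist


def uncertainty_interval(estimate, std_error, bias, rho_lower, rho_upper,
                         level=0.95, grid_size=201):
    """Sketch of an uncertainty interval for a causal parameter.

    bias is a callable mapping a bias parameter value to the (estimated)
    confounding bias of the estimator; [rho_lower, rho_upper] is the
    interval assumed a priori for the bias parameter.
    """
    step = (rho_upper - rho_lower) / (grid_size - 1)
    rhos = [rho_lower + step * k for k in range(grid_size)]
    # Bias-corrected estimates over the assumed range of the bias parameter.
    corrected = [estimate - bias(rho) for rho in rhos]
    ident_lower, ident_upper = min(corrected), max(corrected)
    # Assumed critical value: the 1 - alpha/2 standard normal percentile.
    critical_value = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return (ident_lower - critical_value * std_error,
            ident_upper + critical_value * std_error)


# With no assumed confounding the interval reduces to a classical
# confidence interval; with a hypothetical linear bias it is wider.
example_ci = uncertainty_interval(1.0, 0.5, lambda rho: 0.0, 0.0, 0.0)
example_ui = uncertainty_interval(1.0, 0.5, lambda rho: 2.0 * rho, -0.1, 0.1)
```

The extra width of `example_ui` over `example_ci` is exactly the length of the identification interval, which does not shrink as the standard error goes to zero.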

3 Outcome regression and doubly robust estimators: bias and inference

We now consider two families of estimators and apply the approach described above to deduce their bias and the resulting uncertainty intervals. We will use the following assumption of correctly specified regression models.

Assumption 4a.


Assumption 4b.


where and are parameter vectors, and the first element of is 1. Assumption 4 can be made very general by replacing by where includes basis functions of the space spanned by , e.g. cubic splines.

3.1 Estimators of average causal effects

Let us assume that we have a random sample of size of which are treated and are controls (not treated), and let be the indexes for the treated and be the indexes for the controls. We denote , for , where is a matrix of size containing the elements , is the number of covariates and is a vector with elements . We consider the following outcome regression estimators for the average causal effect and average causal effect on the treated (e.g., Tan, 2007):


The outcome regression estimator is an unbiased estimate of (or ) under Assumption 1 (and 2), and Assumption 4a (and 4b). The doubly robust estimator consists of the outcome regression estimators above with a correction term for the potential misspecification of . For and , the doubly robust estimators are (e.g., Scharfstein et al., 1999, Lunceford and Davidian, 2004 and Rothe and Firpo, 2013):


where is an estimate of the propensity score . The doubly robust estimators are unbiased under Assumption 1 (and 2), and Assumption 4a (and 4b) and/or a correctly specified propensity score model, see below.
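To illustrate the two families of estimators described above, the following sketch computes outcome regression estimates of the average causal effect and of the effect on the treated, together with a doubly robust estimate of the average causal effect. It uses the common textbook forms (regression imputation, and regression imputation plus an inverse probability weighted residual correction), which may differ in detail from the displays in the paper. The single covariate, the simulated data, the probit assignment model and the use of the true propensity score in place of a fitted one are all simplifying assumptions.

```python
import random
from statistics import NormalDist, fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def causal_estimates(x, z, y, e_hat):
    """Return (OR average effect, OR effect on the treated, DR average effect).

    x: covariate values, z: 0/1 treatment, y: observed outcomes,
    e_hat: propensity scores (here supplied by the caller).
    """
    treated = [i for i, zi in enumerate(z) if zi == 1]
    controls = [i for i, zi in enumerate(z) if zi == 0]
    # Separate linear outcome models fitted on treated and on controls.
    a1, b1 = fit_line([x[i] for i in treated], [y[i] for i in treated])
    a0, b0 = fit_line([x[i] for i in controls], [y[i] for i in controls])
    m1 = [a1 + b1 * xi for xi in x]
    m0 = [a0 + b0 * xi for xi in x]
    or_average = fmean([p1 - p0 for p1, p0 in zip(m1, m0)])
    or_treated = fmean([y[i] - m0[i] for i in treated])
    # Weighted residual correction that turns OR into a DR estimator.
    correction = fmean([
        zi * (yi - p1) / ei - (1 - zi) * (yi - p0) / (1.0 - ei)
        for zi, yi, p1, p0, ei in zip(z, y, m1, m0, e_hat)
    ])
    return or_average, or_treated, or_average + correction


def simulate(n=5000, seed=7, effect=2.0):
    """Hypothetical data with no unobserved confounding and a constant effect."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    x, z, y, e = [], [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        ei = std_normal.cdf(0.5 * xi)  # probit assignment model
        zi = 1 if rng.random() < ei else 0
        x.append(xi)
        z.append(zi)
        e.append(ei)
        y.append(1.0 + xi + effect * zi + rng.gauss(0.0, 1.0))
    return x, z, y, e


x, z, y, e = simulate()
or_average, or_treated, dr_average = causal_estimates(x, z, y, e)
```

Because the simulated data satisfy unconfoundedness and the linear outcome models are correct, all three estimates should be close to the simulated effect of 2.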

3.2 Bias expressions

For the sake of simplicity we denote the (total) bias of an estimator by bias. We investigate two sources of bias, the bias due to model misspecification (Assumption 4 not fulfilled), bias, and the bias due to unobserved confounding (non-ignorability of the treatment assignment mechanism, Assumption 3), bias, as summarized in Table 1. All proofs are given in Appendix B.

Proposition 1.

Bias of OR estimators under correctly specified models.
Under Assumptions 1b, 3, 4a and Regularity Assumption 1 in Appendix A,

and under Assumptions 1b, 2b, 3 and 4,

where is a vector with all elements 1 of length , and is a vector of length containing the elements , is the inverse Mills ratio and , and and are the normal pdf and cdf.

Let us further investigate model misspecification bias in combination with non-ignorability of treatment.

Proposition 2.

Bias of OR estimators under model misspecification.
Under Assumptions 1b, 3 and Regularity Assumption 1 in Appendix A,

and under Assumptions 1b, 2b and 3,

where is a vector of length containing the elements , .

Proposition 3.

Bias of DR estimators.
Under Assumptions 1b and 3 and Regularity Assumptions 1 and 2 in Appendix A,

Under Assumptions 1b, 2b, 3 and Regularity Assumption 2 in Appendix A,

Assumption 4
Estimator fulfilled not fulfilled
Table 1: The total bias of the outcome regression and double robust estimators decomposed into bias due to model misspecification (bias) and bias due to confounding (bias), see Proposition 1-3 for details.

Since the doubly robust estimator is the outcome regression estimator with a correction term, there is a link between Proposition 2 and 3:


Since for is often close to linear in (Puhani, 2000) the expectation of the difference between and the linear projection of is small. Hence the difference between the confounding bias of doubly robust and outcome regression estimators is also typically small.
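The near-linearity of the inverse Mills ratio referred to above can be checked numerically. The sketch below uses the convention in which the ratio is the standard normal pdf divided by the standard normal cdf (an assumption about the exact form used in the propositions) and reports the R-squared of a straight-line fit over a central range of the argument. The range [-1.5, 1.5] and the helper names are illustrative choices.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def inverse_mills(v):
    """Inverse Mills ratio under the pdf / cdf convention."""
    return _STD_NORMAL.pdf(v) / _STD_NORMAL.cdf(v)


def linear_fit_r2(lower, upper, points=301):
    """R-squared of the least squares line through the inverse Mills ratio
    evaluated on an equally spaced grid over [lower, upper]."""
    grid = [lower + (upper - lower) * k / (points - 1) for k in range(points)]
    values = [inverse_mills(v) for v in grid]
    g_bar = sum(grid) / points
    v_bar = sum(values) / points
    sxx = sum((g - g_bar) ** 2 for g in grid)
    syy = sum((v - v_bar) ** 2 for v in values)
    sxy = sum((g - g_bar) * (v - v_bar) for g, v in zip(grid, values))
    return sxy ** 2 / (sxx * syy)


r2_central = linear_fit_r2(-1.5, 1.5)
```

A value of `r2_central` close to one indicates that little is lost by replacing the ratio with its linear projection on this range; the fit deteriorates when the range is extended far into the tails.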

3.3 Uncertainty intervals

From Proposition 1-3 identification intervals and uncertainty intervals as defined in Section 2.2 can be derived for and . The estimation of all elements of (Proposition 1-3) is straightforward with the exception of , . A consistent estimator of is given by:



is the residual sample variance from the OLS fit of the potential outcome (i.e. against ) and is a vector of length containing the elements ; a proof of this result for is found in Gorbach and de Luna (2017, page 9).

From Proposition 1-3, identification intervals, assuming , can be derived by replacing with from (9). For instance, from Proposition 3, an estimated identification interval for is given by:

Ignoring the sampling variability from , and noting that is asymptotically normally distributed (Tsiatis, 2006), the lower and upper bound of the uncertainty interval, are respectively (see (4)):




where is the percentile of the standard normal distribution. Estimated uncertainty intervals for and the outcome regression estimators are obtained similarly. Standard errors for the outcome regression and doubly robust estimators are given in Appendix C. Note that ignoring the sampling variability from in (10) and (11) should have no serious consequences because it is of lower asymptotic order. This is confirmed by the simulation study in Section 4.

4 Simulation study

The purpose of the simulation experiments is to illustrate the relative sizes of the biases due to model misspecification and confounding, as well as to study the empirical coverage of the proposed uncertainty intervals. The data is generated using four different designs, linear or non-linear, using one or five covariates. For each design, we use two different treatment assignments with different amounts of imbalance in the propensity scores between the treated and the non-treated: one with low imbalance (high overlap), L1 and , and one with high imbalance (low overlap), L1 and . L1 is the area which is not overlaid in a graph with two density histograms of the propensity scores for the treated and untreated, see Iacus et al. (2011, equation (5)). This measure varies with the bin size. We use the default of the function hist in R statistical software on all the propensity scores (both from treated and untreated) to select the bin size. In all designs we have about treated, and use a linear model for the treatment assignment mechanism, i.e. in (2). For all four designs we use sample sizes of 250 and 500, with 10 000 replications, and compute uncertainty intervals based on Propositions 1 and 3, i.e. using the outcome regression estimators adjusting for but not and using the doubly robust estimators adjusting for . Finally, we let the bias parameters take the values 0, 0.05, 0.1, 0.3, and 0.5.
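The L1 imbalance measure described above can be sketched as follows. The code bins the propensity scores of treated and untreated units on a common grid and returns half the summed absolute difference of the relative frequencies, which is how we read the definition in Iacus et al. (2011); the result is 0 for identical histograms and 1 for non-overlapping ones. The fixed number of equal-width bins on [0, 1] is a simplifying assumption and does not reproduce the breakpoints chosen by hist in R.

```python
def l1_imbalance(ps_treated, ps_control, n_bins=10):
    """Histogram-based L1 imbalance between two sets of propensity scores.

    Uses n_bins equal-width bins on [0, 1] (an assumption; the paper selects
    bins with the default of R's hist on the pooled propensity scores).
    """
    def relative_frequencies(scores):
        counts = [0] * n_bins
        for p in scores:
            # A score of exactly 1 is placed in the last bin.
            counts[min(int(p * n_bins), n_bins - 1)] += 1
        return [c / len(scores) for c in counts]

    f = relative_frequencies(ps_treated)
    g = relative_frequencies(ps_control)
    return 0.5 * sum(abs(a - b) for a, b in zip(f, g))


# Toy inputs: identical, fully separated and partly overlapping scores.
same = l1_imbalance([0.1, 0.2], [0.1, 0.2])
separated = l1_imbalance([0.05, 0.15], [0.85, 0.95])
partial = l1_imbalance([0.1, 0.3], [0.1, 0.9])
```

As the text notes, the value depends on the bin size, so comparisons are only meaningful when the same binning is used throughout.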

In Design A and B we use and , to generate low and high L1 respectively, and . In Design A (linear) we use the outcome equations: , and . In Design B (non-linear) we use and , where:


These choices were made to make polynomial approximation difficult.

In Designs C and D we use and , to generate low and high L1 respectively. The covariates are simulated such that , and are Bernoulli distributed with probability and , where is uniformly distributed in , and where . In Design C (linear) we use the outcome equations: , and . In Design D (non-linear) we use , .

For all designs we fit a correctly specified propensity score model, , and and with linear models in , i.e. Assumption 4 is fulfilled in Designs A and C but not in Designs B and D. We compare width and coverage of 95% confidence intervals for with and (the corresponding confidence intervals and uncertainty intervals are used for ).

4.1 Results

Figure 1 displays the magnitude of the empirical bias of the outcome regression estimator (both model misspecification and confounding bias), defined in Section 3.2, for the two non-linear designs with correctly specified propensity score models. We can see that bias is larger when the propensity scores of the treated and untreated are more separated (L1 high). However, even when L1 is high, bias and bias are of approximately the same magnitude when , and for dominates . Note also that in Design D and have different signs, implying that the total bias is smaller than the confounding bias. Hence, the outcome regression estimator has a smaller total bias than the doubly robust estimator in such a case, since the confounding bias of the two estimators is almost the same.

Figure 1: The magnitude of , and with varying for the two non-linear designs, with low and high imbalance in the propensity scores.

The uncertainty intervals are wider than the confidence intervals by definition, which is confirmed in Figures 2-3 and Appendix D. In particular, the uncertainty intervals derived under the assumption that and/or are around twice as wide as the corresponding confidence intervals. The empirical coverage of the 95% uncertainty intervals is, as expected, generally high if the assumption on and/or is met ( and/or is covered by the pre-specified interval from which the uncertainty interval is derived); see Figures 2-3 and Appendix D. However, if the assumption is not met the empirical coverage is less than 95%. When using the outcome regression estimator in Design B, we do not necessarily expect 95% coverage of the UIs, even if the assumption on is met, because the outcome regression model is misspecified, and has the same sign as . However, the empirical coverage is at least 95% for three reasons: first, the uncertainty intervals have higher coverage than 95% if the models are correctly specified; second, is overestimated due to model misspecification; and third, bias has the same size or smaller than bias. Note finally that the empirical coverage of the 95% confidence intervals assuming no unobserved confounding is too low even for small , see Figures 2-3 and Appendix D.

(a) Treatment assignment with low L1.
(b) Treatment assignment with high L1.
Figure 2: Boxplot of the width of two 95% uncertainty intervals (assuming , green, and , blue, ) and the 95% confidence interval, red, for the doubly robust (DR) and outcome regression (OR) estimator of under design A-D for and , with sample size 250. The empirical coverage of each interval is written below each boxplot and the number of outliers that lie outside the window is written at the top of the window above each boxplot.

(a) Treatment assignment with low L1.
(b) Treatment assignment with high L1.
Figure 3: Boxplot of the width of two 95% uncertainty intervals (assuming , green, and , blue, ) and the 95% confidence interval, red, for the doubly robust (DR) and outcome regression (OR) estimator of under design A-D for and , with sample size 500. The empirical coverage of each interval is written below each boxplot and the number of outliers that lie outside the window is written at the top of the window above each boxplot.

5 Effect of regular food intake on health

SHARE is a longitudinal survey on health, socio-economic status and social networks of individuals aged 50 years or older from several European countries (Börsch-Supan et al., 2013). The sampling in SHARE is on a household level where all residents in the household (almost exclusively one individual or one man and one woman) are interviewed. We focus this study on women in the 13 countries that participate in both wave 4 and 5 of SHARE, which were collected in 2011 (baseline) and 2013 (follow-up). The observed sample consists of 12 842 individuals. We are interested in investigating the causal effect of regular food intake on health. We define regular food intake as eating at least 3 full meals a day at baseline. A full meal is defined as eating more than 2 items or dishes when you sit down to eat. For example, eating potatoes, vegetables, and meat; or eating an egg, bread, and fruit are both considered full meals. The health outcome used is change in maximum grip strength (in kg, maximum of 4 measures using a dynamometer) from 2011 to 2013. Grip strength is associated with health-related quality of life, disability and mortality, see e.g. Sayer et al. (2006) and Gale et al. (2007).

In order to estimate the causal effect of interest we control for covariates measured at baseline. These covariates include health, cognition, lifestyle, and socioeconomic variables as well as other background characteristics. The health variables include self reported health (excellent; very good; good; fair; or poor), number of problems with mobility (such as walking; lifting small objects; lifting heavy objects; etc., maximum 10), number of chronic diseases (such as diabetes; cancer; asthma; etc., maximum 15), depression (number of symptoms of depression, maximum 12, using the EURO-D scale), body mass index () and limitations in daily life due to health problems (yes; no). We measure cognition with the number of animals the subject was able to state during 1 minute. The lifestyle variables consist of high alcohol use (drinking at least one glass of alcohol for women and two glasses for men at least 5 days a week), smoking (smoker; stopped smoking; non-smoker), physical inactivity (if respondents engaged in moderate to vigorous physical activity at most 1 to 3 times a month) and having a social network (have someone to discuss important things with, talk at least several times a week). The socioeconomic variables include education level (level 0-1; 2; 3; 4; or 5-6, using ISCED-97 scale) and whether or not the subject is living in an apartment or freestanding building. Finally, demographic characteristics consist of age, sex and country of residence (Austria; Germany; Sweden; Netherlands; Spain; Italy; France; Denmark; Switzerland; Belgium; Czech republic; Slovenia; or Estonia).

We estimated the causal effects by controlling for all main effects in the two potential outcome models. We used two different treatment assignment models, one with all main effects and one more flexible. The flexible treatment assignment model was fitted with a LASSO to select terms from all main effects together with interactions and quadratic terms. More specifically we use the R package glmnet and choose the largest value of the tuning parameter such that the mean cross validated error is within one standard error of the minimum, see Friedman et al. (2010) for details. The models including the selected terms are then refitted using maximum likelihood. The balance in the propensity scores is fairly similar between the main terms and LASSO based treatment assignment model, see Figure 4.
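The tuning parameter rule described above (the largest penalty whose mean cross-validated error is within one standard error of the minimum) can be written in a few lines. The sketch below operates on plain lists of candidate penalties, cross-validated means and their standard errors; the function name and the toy numbers are hypothetical, and in practice these quantities would come from the cross-validation output of glmnet.

```python
def lambda_one_se(lambdas, cv_mean, cv_se):
    """Largest penalty whose mean cross-validated error is within one
    standard error of the minimum mean cross-validated error."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)


# Toy cross-validation path (made-up numbers for illustration only).
toy_lambdas = [1.0, 0.5, 0.1, 0.01]
toy_cv_mean = [2.0, 1.15, 1.0, 1.05]
toy_cv_se = [0.2, 0.2, 0.2, 0.2]
chosen = lambda_one_se(toy_lambdas, toy_cv_mean, toy_cv_se)
```

In the toy path the minimum error is attained at a penalty of 0.1, but the rule selects 0.5, which gives a sparser model whose error is statistically indistinguishable from the minimum.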

Figure 4: Overlaid histograms showing the amount of imbalance in the propensity scores between treated and untreated for the two different sets of covariates, main effects (left) and LASSO (right). This bin size is default from the function hist in R statistical software and this bin size was also used to derive L1 (0.23 for main effects and 0.25 for LASSO).
                coef   CI             UI

Main effects
                0.28   (0.07, 0.48)   (-0.11, 0.66)
                0.26   (0.05, 0.46)   (-0.12, 0.64)
                0.27   (0.08, 0.46)   (-0.09, 0.63)
                0.26   (0.06, 0.45)   (-0.11, 0.62)

LASSO
                0.28   (0.07, 0.48)   (-0.10, 0.65)
                0.24   (0.03, 0.45)   (-0.15, 0.63)
                0.27   (0.08, 0.46)   (-0.09, 0.63)
                0.25   (0.05, 0.44)   (-0.12, 0.62)
Table 2: Estimates of the effect of regular food intake on change in grip strength, using doubly robust and outcome regression estimators. The confidence intervals are derived assuming ignorability of treatment assignment. The uncertainty intervals are derived assuming that . The estimates are derived with two different treatment assignment models, including main effects (top) and using LASSO as selection method for main effects, interactions and quadratic terms (bottom).

In Table 2 we can see that the estimates of and assuming ignorability of treatment assignment, are significant (95% CI do not cover zero) for all estimators and estimated to between and , which can be compared to , the average decrease in maximum grip strength of the total study sample. All estimates obtained are fairly similar, in particular when compared to the extra variation introduced by the uncertainty in unobserved confounding (compare UIs with CIs). Indeed, the uncertainty intervals assuming contain 0. The bounds correspond to unobserved confounding explaining, e.g., 2% of the unexplained variation in the outcome models and the treatment assignment models (see the interpretation of given in Section 2.2 above). We have no reason to believe that such unobserved confounding is unreasonable. Thus, here, taking into account uncertainty in unobserved confounding yields inconclusive results, i.e. the data does not give us evidence for a positive effect, in contrast with the naive conclusion that would typically be drawn by only considering sampling uncertainty through classical confidence intervals.

Note, finally, that this analysis has been performed assuming dropout at follow-up to be ignorable. Non-ignorable dropout, if suspected, could be dealt with similarly by introducing a new bias parameter (Genbäck et al., 2015), thereby further increasing the uncertainty around the estimates obtained.

6 Discussion

Causal inference from observational data is often based on the assumption of no unobserved confounding variables. This identifying assumption is typically not empirically testable without further assumptions and/or information such as, e.g., the known existence of instrumental variables (de Luna and Johansson, 2014). This paper proposes an inferential approach for outcome regression and doubly robust estimators that takes into account uncertainty on the possible existence of unobserved confounding. The method proposed is computationally fast and easy to apply (the R-package ui is available).

Outcome regression and double robust estimators make model assumptions which, if mistaken, also imply bias. On the other hand, model misspecification can in principle be empirically investigated. In the simulated settings, even though the model misspecification was quite severe for the outcome regression estimator, bias due to unobserved confounding dominated model misspecification bias when and even more so when propensity scores were not too close to zero or one. More generally, while model misspecification can under some assumptions be made arbitrarily small asymptotically (by increasing model complexity), bias/uncertainty due to unobserved confounding remains unchanged and therefore more relevant when increasing sample size.

We have focused on misspecification of the outcome models instead of the treatment assignment model. The latter is not only more challenging theoretically, but more importantly, one can argue that model building is less difficult for the treatment assignment model than for outcome models since for the former all data is available and no extrapolation is performed, while outcome models are fitted only on one sub-sample at a time (e.g., the controls) and are used to extrapolate on the other sub-sample (e.g., the treated). Extrapolations are thus done on a part of the sample space which is sparsely populated and, hence, where the model specification is difficult to check. Yet, it has been shown that, for doubly robust estimators, mild misspecification of both models (for treatment assignment and outcome) may lead to large bias in specific situations, in which case outcome regression estimation may be preferable (Kang and Schafer, 2007), or improved versions of the classic doubly robust estimator used here; see Rotnitzky and Vansteelandt (2015) for a review.

The proposed uncertainty intervals can be used to perform a sensitivity analysis. For example, for all the estimators presented in Table 2, the UIs would approximately be bounded below by zero if constructed using . Thus, the 5% significance conclusion is here sensitive to unobserved confounding of magnitude . However, our experience is that sensitivity analyses are difficult to communicate to the layman, for whom statistical hypothesis testing may already be a difficult concept. We therefore advocate here the more intuitive interval estimation approach, i.e., providing a UI for the effect of interest given some a priori assumption on unobserved confounding and a desired coverage level.
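The link between interval estimation and sensitivity analysis can be sketched in code. The example below is an illustrative construction, not the implementation in the R-package: it assumes that the bias is a known function of a sensitivity parameter `rho`, and forms the interval as the union of bias-corrected confidence intervals over an a priori range of `rho`. The function `uncertainty_interval`, the linear bias function `linear_bias` and all numerical values are hypothetical.

```python
from statistics import NormalDist


def uncertainty_interval(estimate, std_error, bias_fn, rho_bounds,
                         level=0.95, grid=201):
    """Union of bias-corrected confidence intervals over a range of rho.

    bias_fn(rho) is an assumed, user-supplied bias function; the standard
    error is treated as constant in rho, which is a simplification.
    """
    rho_lo, rho_hi = rho_bounds
    rhos = [rho_lo + (rho_hi - rho_lo) * i / (grid - 1) for i in range(grid)]
    corrected = [estimate - bias_fn(rho) for rho in rhos]
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return min(corrected) - z * std_error, max(corrected) + z * std_error


def linear_bias(rho):
    """Hypothetical bias function, linear in rho (for illustration only)."""
    return 2.0 * rho


estimate, std_error = 1.0, 0.2                  # made-up point estimate and s.e.
for bound in (0.0, 0.1, 0.3, 0.5):
    lower, upper = uncertainty_interval(
        estimate, std_error, linear_bias, (-bound, bound)
    )
    print(f"|rho| <= {bound:.1f}: UI = ({lower:.3f}, {upper:.3f})")
```

With `bound = 0` the interval reduces to the usual confidence interval. Widening the assumed range of `rho` until the lower bound reaches zero gives the magnitude of confounding to which the significance conclusion is sensitive, which is how an interval of this kind doubles as a sensitivity analysis.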


We are grateful to Elena Stanghellini, Arvid Sjölander, Anders Lundquist and Anita Lindmark for helpful comments. This work was supported by the Swedish Research Council for Health, Working Life and Welfare and the Marianne and Markus Wallenberg Foundation.


  • Börsch-Supan et al. (2013) Börsch-Supan, A., Brandt, M., Hunkler, C., Kneip, T., Korbmacher, J., Malter, F., Schaan, B., Stuck, S., and Zuber, S. (2013). Data resource profile: The survey of health, ageing and retirement in Europe (SHARE). International Journal of Epidemiology 42, 992–1002.
  • Copas and Eguchi (2001) Copas, J. and Eguchi, S. (2001). Local sensitivity approximations for selectivity bias. Journal of the Royal Statistical Society: Series B 63, 871–895.
  • Copas and Eguchi (2005) Copas, J. and Eguchi, S. (2005). Local model uncertainty and incomplete data bias. Journal of the Royal Statistical Society: Series B 67, 459–513.
  • Copas and Li (1997) Copas, J. and Li, H. G. (1997). Inference for non-random samples. Journal of the Royal Statistical Society: Series B 59, 55–95.
  • Daniels and Hogan (2008) Daniels, M. J. and Hogan, J. W. (2008). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman and Hall/CRC, Boca Raton.
  • de Luna and Johansson (2006) de Luna, X. and Johansson, P. (2006). Exogeneity in structural equation models. Journal of Econometrics 132, 527–543.
  • de Luna and Johansson (2014) de Luna, X. and Johansson, P. (2014). Testing for the unconfoundedness assumption using an instrumental assumption. Journal of Causal Inference 2, 187–199.
  • de Luna and Lundin (2014) de Luna, X. and Lundin, M. (2014). Sensitivity analysis of the unconfoundedness assumption with an application to an evaluation of college choice effects on earnings. Journal of Applied Statistics 41, 1767–1784.
  • Friedman et al. (2010) Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1–22.
  • Gale et al. (2007) Gale, C. R., Martyn, C. N., Cooper, C., and Sayer, A. A. (2007). Grip strength, body composition, and mortality. International Journal of Epidemiology 36, 228–235.
  • Gastwirth et al. (1998) Gastwirth, J. L., Krieger, A. M., and Rosenbaum, P. R. (1998). Dual and simultaneous sensitivity analysis for matched pairs. Biometrika 85, 907–920.
  • Genbäck et al. (2015) Genbäck, M., Stanghellini, E., and de Luna, X. (2015). Uncertainty intervals for regression parameters with non-ignorable missingness in the outcome. Statistical Papers 56, 829–847.
  • Gorbach and de Luna (2017) Gorbach, T. and de Luna, X. (2017). Inference for partial correlation when data are missing not at random. arXiv e-prints.
  • Greenland (2005) Greenland, S. (2005). Multiple‐bias modelling for analysis of observational data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 168, 267–306.
  • Horowitz and Manski (2006) Horowitz, J. L. and Manski, C. F. (2006). Identification and estimation of statistical functionals using incomplete data. Journal of Econometrics 132, 445–459.
  • Iacus et al. (2011) Iacus, S. M., King, G., and Porro, G. (2011). Multivariate matching methods that are monotonic imbalance bounding. Journal of the American Statistical Association 106, 345–361.
  • Imai et al. (2010) Imai, K., Keele, L., and Tingley, D. (2010). A general approach to causal mediation analysis. Psychological Methods 15, 309–334.
  • Imbens (2003) Imbens, G. W. (2003). Sensitivity to exogeneity assumptions in program evaluation. The American Economic Review 93, 126–132.
  • Josefsson et al. (2016) Josefsson, M., de Luna, X., Daniels, M. J., and Nyberg, L. (2016). Causal inference with longitudinal outcomes and non-ignorable dropout: Estimating the effect of living alone on cognitive decline. Journal of the Royal Statistical Society: Series C (Applied Statistics) 65, 131–144.
  • Kang and Schafer (2007) Kang, J. D. Y. and Schafer, J. L. (2007). Demystifying double-robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science 22, 523–539.
  • Lin et al. (1998) Lin, D. Y., Psaty, B. M., and Kronmal, R. A. (1998). Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54, 948–963.
  • Liu et al. (2013) Liu, W., Kuramoto, S. J., and Stuart, E. A. (2013). An introduction to sensitivity analysis for unobserved confounding in nonexperimental prevention research. Prevention Science 14, 570–580.
  • Lunceford and Davidian (2004) Lunceford, J. K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine 23, 2937–2960.
  • Manski (2003) Manski, C. F. (2003). Partial Identification of Probability Distributions. Springer, New York.
  • Puhani (2000) Puhani, P. (2000). The Heckman correction for sample selection and its critique. Journal of Economic Surveys 14, 53–68.
  • Robins et al. (2000) Robins, J. M., Rotnitzky, A., and Scharfstein, D. O. (2000). Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models, pages 1–94. Springer, New York.
  • Rosenbaum (2010) Rosenbaum, P. R. (2010). Design of Observational Studies. Springer, New York.
  • Rosenbaum and Rubin (1983) Rosenbaum, P. R. and Rubin, D. B. (1983). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society. Series B (Methodological) 45, 212–218.
  • Rosenthal (2006) Rosenthal, J. S. (2006). A First Look at Rigorous Probability Theory. World Scientific, London, 2nd edition.
  • Rothe and Firpo (2013) Rothe, C. and Firpo, S. (2013). Semiparametric estimation and inference using doubly robust moment conditions. IZA Discussion Paper No. 7564.
  • Rotnitzky and Vansteelandt (2015) Rotnitzky, A. and Vansteelandt, S. (2015). Double-robust methods. In Molenberghs, G., Fitzmaurice, G., Kenward, M. G., Tsiatis, A., and Verbeke, G., editors, Handbook of Missing Data Methodology, chapter 9, pages 185–212. Chapman and Hall/CRC, London.
  • Rubin (1974) Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688–701.
  • Sayer et al. (2006) Sayer, A. A., Syddall, H. E., Martin, H. J., Dennison, E. M., Roberts, H. C., and Cooper, C. (2006). Is grip strength associated with health-related quality of life? Findings from the Hertfordshire Cohort Study. Age and Ageing 35, 409–415.
  • Scharfstein et al. (2003) Scharfstein, D. O., Daniels, M. J., and Robins, J. M. (2003). Incorporating prior beliefs about selection bias into the analysis of randomized trials with missing outcomes. Biostatistics 4, 495–512.
  • Scharfstein et al. (1999) Scharfstein, D. O., Rotnitzky, A., and Robins, J. M. (1999). Adjusting for non-ignorable drop-out using semiparametric non-response models (with discussion). Journal of the American Statistical Association 94, 1096–1146.
  • Stefanski and Boos (2002) Stefanski, L. A. and Boos, D. D. (2002). The calculus of M-estimation. The American Statistician 56, 29–38.
  • Tan (2007) Tan, Z. (2007). Comment: Understanding OR, PS and DR. Statistical Science 22, 560–568.
  • Tsiatis (2006) Tsiatis, A. (2006). Semiparametric Theory and Missing Data. Springer, New York.
  • VanderWeele and Arah (2011) VanderWeele, T. J. and Arah, O. A. (2011). Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology 22, 42–52.
  • Vansteelandt et al. (2006) Vansteelandt, S., Goetghebeur, E., Kenward, M. G., and Molenberghs, G. (2006). Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica 16, 953–979.

Appendix A

Lemma 1.

Under Assumptions 1b, 2b, 3 and 3, the bias of the ordinary least squares estimate of , given is:

for ; where , , . The proof follows from and can be found for in Genbäck et al. (2015); the proof when is similar.

Regularity Assumption 1a.

There exists a constant such that and

Regularity Assumption 1b.

and .

Lemma 2.

Under Regularity Assumption 1a:

for any function of and such that .


The first equality follows from:

if there exists a constant such that then:

The second equality follows from: