# On Associative Confounder Bias

Conditioning on a set of confounders that causally affect both the treatment and the outcome can be sufficient to eliminate the bias they introduce when estimating the causal effect of the treatment on the outcome from observational data. In the potential outcome framework for causal inference this is done by including them in the propensity score model, whereas in the causal graphical modeling framework it is done by conditioning on them directly. However, in the former framework it is confusing when the modeler finds a variable that is non-causally associated with both the treatment and the outcome. Some argue that such variables should also be included in the analysis to remove bias. Others argue that they introduce no bias and should therefore be excluded, since conditioning on them introduces spurious dependence between the treatment and the outcome and thus extra bias in the estimate. We show that both arguments can be wrong in different contexts. When such a variable is found, neither action may give the correct causal effect estimate, so one action must be selected over the other in order to be less wrong. We discuss how to select the better action.

## 1 Introduction

For making causal inferences from observational data (see [1] and [2]), it is important to find, ideally, all the potential pretreatment confounders of the given causal relation between the cause (treatment variable) and the effect (outcome variable), in order to obtain an unbiased estimate of the causal effect of the former on the latter. Let $Z$ denote the treatment received by subjects, taking values from the set $\{0, 1\}$, and let $Y$ denote the outcome, taking values from the set $\{0, 1\}$, where $0$ denotes failure and $1$ denotes success. The potential outcome causal model [1] accepts the existence of a pair of potential outcomes $(Y_0, Y_1)$ for each subject, where $Y_i$ is the outcome that would have been observed had the treatment been $Z = i$, for $i = 0, 1$. The pair $(Y_0, Y_1)$ is assumed to be independent of the treatment assignment, written as $(Y_0, Y_1) \perp Z$, when the treatment assignments are randomized, as in a randomized experiment. However, in observational studies the treatment assignments are not randomized. A useful assumption for causal inference is then that the potential outcomes are conditionally independent of the treatment assignment given the pretreatment covariates, say, the multivariate $X$. Ideally, $X$ denotes "all" the potential pretreatment confounders of $Z$ and $Y$, and then it is written as $(Y_0, Y_1) \perp Z \mid X$. That is, to estimate the causal effect of $Z$ on $Y$, we need to condition on (control for) $X$. However, it is not necessary to consider all the pretreatment confounders; any "sufficient" subset of them will do. Finding such a sufficient set of confounders is somewhat problematic, and the potential outcome framework offers no clear way to do it. The causal graphical modeling framework [2], however, offers one way, called the "back-door criterion". It shows how to choose a subset of covariates in order to identify the causal effect (to estimate it without bias). When a causal graphical model is identified on $Z$, $Y$ and all their causal factors, the criterion can find a sufficient subset; such a set is called an "admissible" or "deconfounding" set in the literature.
Considering some covariates as confounders by ignoring such a criterion can sometimes introduce further bias (p. 351 of [2]). However, the back-door criterion is not complete [2]; there exist causal graphical models where the criterion fails for some sets of covariates though adjusting for them results in valid causal effect estimates.

So, the problem of confounder selection is important in causal inference. In the potential outcome causal model, once the analyst has found all the confounders, he or she uses them either directly or indirectly (in so-called propensity score models [3], [4]) to remove the bias they induce. Any factor that causes both the treatment and the outcome can be identified relatively easily as a pretreatment confounder using subject domain knowledge; for that, it is important to decide the causal directions among the variables. But there may be other factors, such as those that are non-causally related to both the treatment and the outcome, e.g., those related only through associations. It seems that some researchers tend to condition on them too, e.g., by including them in the propensity score model on the assumption that this removes the bias due to them. Causal graphical modelers, however, generally do not consider them confounders. Recently there was a debate (see [5], [6], [7] and [8]) on this issue: whether it is necessary to condition on a variable that is not causally related to both the treatment and the outcome but is associated with both. In the debate, Rubin argues for conditioning, while Pearl and his colleagues argue against it, saying that it will only introduce extra bias. Our goal here is to analyze these arguments more deeply and to understand when we should condition on such variables. We use the graphical modeling framework to estimate causal effects; therefore, we begin by giving some details of it. We argue that in some cases it is desirable to condition, whereas in others it is not. Mostly, the decision should be made by considering the strengths of the associations of the potential confounder with the treatment and the outcome.

## 2 Covariate Selection for Adjustment of Confounding

We use the concept of intervention in causal graphical models (also called do-calculus), described in [2] and [9], for causal effect estimation. This approach is equivalent to the potential outcome model (see Ch. 7 of [2] and [10]). To remind the reader of this calculus, we first define the probability distribution of a random variable under conditioning by intervention (action) on another variable. For an observed random data sample on a vector of random variables, say, $X = (X_1, \ldots, X_n)$, we can find their joint probability distribution, say, $p(x)$. Using some conditional independence assumptions within $X$, we can factorize $p(x)$ as $p(x) = \prod_{k=1}^{n} p(x_k \mid pa_k)$, where $pa_k$ denotes the values of the parents of $X_k$ (the empty set for a variable without parents). Note that here we denote random variables (or sets of them) by uppercase letters or expressions and their values by the corresponding lowercase expressions. For a causal structure on $X$ one can use, e.g., the time order of occurrence to index the variables such that cause variables have higher indices than those of effect variables. For any $X_i$, the probability distribution of the vector of random variables without $X_i$, say $X_{-i}$, when $X_i$ is set by intervention to a particular value $x_i$, written as $do(X_i = x_i)$, is denoted by $p(x_{-i} \mid do(X_i = x_i))$ and is defined as follows:

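$$
p(x_{-i} \mid do(X_i = x_i)) = \frac{p(x)}{p(x_i \mid pa_i)} = \prod_{k=1,\, k \neq i}^{n} p(x_k \mid pa_k) \;\neq\; \frac{p(x)}{p(x_i)} = \frac{1}{p(x_i)} \prod_{k=1}^{n} p(x_k \mid pa_k) = p(x_{-i} \mid x_i)
$$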

where the last expression is the corresponding conditional probability distribution when we have observed $X_i = x_i$. That is, in general the two probability distributions differ.

Now, consider two different causal relationships between $X$, $Z$ and $Y$. In the first, $X$ is a cause of both $Z$ and $Y$, and $Z$ is a cause of $Y$; this is represented by the causal network model in the left-hand diagram of Figure 1. In the second, $Z$ and $X$ are causes of $Y$; this is represented by the causal network model in the right-hand diagram of Figure 1. If we intervene on $Z$ as $do(Z = z)$ for $z = 0, 1$, then the marginal intervention distribution of $Y$ for the first causal model is $p(y \mid do(z)) = \sum_x p(y \mid z, x)\, p(x)$, whereas that for the second causal model is $p(y \mid do(z)) = p(y \mid z)$, since $X \perp Z$ in the latter case. The causal effect of the treatment option $Z = 1$ compared to the control option $Z = 0$ is defined as $p(y \mid do(Z = 1)) - p(y \mid do(Z = 0))$. It is identifiable if $p(y \mid do(z))$ can be written as a functional of the observed distribution for $z = 0, 1$. We then see that the estimates for the two cases are different.
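The difference between intervening and conditioning in the two models can be checked numerically. The following is a minimal sketch that assumes binary variables and purely illustrative probability tables (none of the numbers come from the paper); the names `joint`, `p_y1_given_z` and `p_y1_do_z` are hypothetical helpers.

```python
from itertools import product

# Illustrative (assumed) probability tables for binary X, Z, Y.
P_X = {0: 0.6, 1: 0.4}                      # p(x)
P_Z1_GIVEN_X = {0: 0.2, 1: 0.7}             # p(Z=1 | x), first model only
P_Z1_MARGINAL = 0.4                         # p(Z=1), second model only
P_Y1_GIVEN_ZX = {(0, 0): 0.1, (0, 1): 0.5,  # p(Y=1 | z, x), both models
                 (1, 0): 0.3, (1, 1): 0.8}


def bern(p_one, value):
    """Probability that a binary variable with p(1) = p_one equals value."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x, z, y, confounded):
    """Joint p(x, z, y); X causes Z only when confounded is True."""
    p_z1 = P_Z1_GIVEN_X[x] if confounded else P_Z1_MARGINAL
    return P_X[x] * bern(p_z1, z) * bern(P_Y1_GIVEN_ZX[(z, x)], y)


def p_y1_given_z(z, confounded):
    """Observational p(Y=1 | Z=z), obtained by summing the joint."""
    num = sum(joint(x, z, 1, confounded) for x in (0, 1))
    den = sum(joint(x, z, y, confounded)
              for x, y in product((0, 1), repeat=2))
    return num / den


def p_y1_do_z(z):
    """Interventional p(Y=1 | do(Z=z)) = sum_x p(x) p(Y=1 | z, x)."""
    return sum(P_X[x] * P_Y1_GIVEN_ZX[(z, x)] for x in (0, 1))


if __name__ == "__main__":
    for confounded in (True, False):
        print(confounded, p_y1_given_z(1, confounded), p_y1_do_z(1))
```

With these assumed tables, conditioning and intervening agree in the second model, where $X$ does not cause $Z$, and disagree in the first.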

The above observation can be shown for a more general causal model. Let be according to time order and represent a set variables that causally affect but we are not sure about chronological order of the elements of with for . Let parents (causes) of in X be and that in be , so for and . Then, the joint probability distribution of is and the intervention (on ) distribution is

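$$
\begin{aligned}
p(x_p \mid do(X_i = x_i)) &= \sum_{x_0, x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{p-1}, x_{p+1}, \ldots, x_n} p(x_0) \prod_{j=1,\, j \neq i}^{n} p(x_j \mid pa_j^{+}) \\
&= \sum_{x_0, x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{p-1}} p(x_0) \prod_{j=1,\, j \neq i}^{p-1} p(x_j \mid D_j)\; p(x_p \mid D_p, x_0) \\
&= \sum_{x_0, x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{p-1}} p(x_0)\, p(x_1, \ldots, x_{i-1})\, p(x_{i+1}, \ldots, x_p \mid D_{i+1}, x_0) \\
&= \sum_{x_0,\, pa_p} p(x_0)\, p(pa_p \setminus \{x_i\})\, p(x_p \mid pa_p, x_0) \\
&= \sum_{x_0,\, pa_p} p(x_p \mid x_i, pa_p \setminus \{x_i\}, x_0)\, p(pa_p \setminus \{x_i\}, x_0)
\end{aligned}
$$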

where such that and . This is of the form of where and affects directly and so is on . If we assume that some of the variables in are associated with some of the variables in the vector or, causally related or associated with variables in then the above result holds.

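$$
\begin{aligned}
p(x_p \mid do(X_i = x_i)) &= \sum_{x_0,\, pa_p} \frac{p(x_p, x_i, pa_p \setminus \{x_i\}, x_0)}{p(x_i, pa_p \setminus \{x_i\}, x_0) / p(pa_p \setminus \{x_i\}, x_0)} \\
&= \sum_{x_0,\, pa_p} \frac{p(x_p, x_i, pa_p \setminus \{x_i\}, x_0)}{p(x_i \mid pa_p \setminus \{x_i\}, x_0)} = \sum_{x_0,\, pa_p} \frac{p(x_p, x_i, pa_p \setminus \{x_i\}, x_0)}{p(x_i \mid pa_p^{-})} \\
&= \sum_{x_0,\, pa_p^{-}} \frac{p(x_p, x_i, pa_p^{-})}{p(x_i \mid pa_p^{-})} = \sum_{x_0,\, pa_p^{-}} p(x_p \mid x_i, pa_p^{-})\, p(pa_p^{-})
\end{aligned}
$$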

where . Again, this is in the form of where represents all the direct causal variables common to both and . And above simplifications show that we can select the confounding variable set as follows.

###### Proposition 1

Let denote the set of all potential causal variables of except for and let where is the smallest subset of in some sense. Then the smallest subset of in a similar sense such that is a sufficient set of confounders for estimating .

Here the smallest subset of can be a set of variables whose sum of their configurations is the smallest. This rule gives a simple way to select covariates for removing confounding bias. We avoid the proof of this rule but it is clear from the above discussion. Recall that the back-door criterion is known to be incomplete (see Ch. 11 of [2]) meaning that the criterion fails for some sets of covariates but adjusting for them is sufficient for removing confounding bias. Above rule avoids inclusion of covariates such as instrumental variables, especially for building propensity score models. In fact, in literature sufficient confounder set is selected such that, firstly each confounder in it is a cause of the treatment, and then it is a cause of the outcome [11]. However, it should be done in the other way round; a confounder should be predictive of the outcome first and then it should also predictive of the treatment. Following this order we do not miss any important confounders, since any confounder should be related to the outcome at the first place. For e.g., consider a causal model for estimating causal effect of teacher’s instructional practice () on student’s reading comprehension achievement () as discussed in [12]. It is assumed that the teacher’s reading knowledge is a causal confounder such that it affects directly both and . Furthermore, it is assumed that the teacher’s professional development in reading affects directly to and and the teacher’s general knowledge () affects directly to and . The causal diagram is shown as Model 1 in Figure 2. Then it is easy to see that . And if we believe that and are dependent, for e.g., through a common cause, then we get that when , that is reasonable to assume.

## 3 Associative Confounders

There is a controversy in the research community about the kinds of variables that should be considered as confounders for inclusion, especially in propensity score models in the potential outcome causal model, since causal diagrams showing the causal structure are often not used there. In fact, the propensity score concept was initially introduced to describe the treatment allocation process [4], [3]. In current practice, some authors argue that all the variables related to the outcome should be included in the propensity score model [13] (there can then be some redundancy), whereas others argue that all the variables related to both the treatment and the outcome should be included [14]. However, problems occur when one finds variables that have non-causal (associative) relationships with the treatment or the outcome. Researchers usually replace any such association between two variables with a causal fork using the so-called common cause principle, which replaces an association with causal relations [7]. Simply put, the principle says that a non-causal association between two variables can be replaced by a third variable that causally affects both of them. For example, such an association between two variables can be replaced by a model in which a third variable has arrows, indicating causal relations, pointing to each of the two; the third variable is then said to be a common cause of the two. Note that we omit the possibility of feedback causal relations.

Now, suppose we observe a covariate $X$ that is non-causally associated with both $Z$ and $Y$, which is the topic of the Rubin and Pearl debate. It can be assumed that the two non-causal associations, between $Z$ and $X$ and between $X$ and $Y$, are embedded in the context, and therefore a causal fork can be applied to each of them separately. In fact, the argument of [7] and [6] is based on applying two causal forks, one for each non-causal association, thus making a so-called M-collider [12]. Their model of discussion is Model A in Figure 3, but the argument is based on the model that is called an M-structure due to its shape. Here the two introduced common causes, $U$ and $W$, are taken to be independent. An example of this model is given in [15]: measuring the causal effect of low education ($Z$) on later diabetes risk ($Y$), where it is assumed that the mother's previous diabetes status ($X$) is an associative covariate. A medical opinion is that family income during childhood ($U$) is a cause of $Z$ and $X$, and the mother's genetic risk of diabetes ($W$) is a cause of $X$ and $Y$.

Though $U$ and $W$ can be independent, it is appropriate to regard this as a special case; in general, there is some dependence between them. One could simply assume such dependence, but here we investigate how and when such cases arise and discuss which actions are then appropriate. For Model B of Figure 3, we can write the joint probability distribution of all the variables as $p(u, w, x, z, y) = p(u)\, p(w)\, p(x \mid u, w)\, p(z \mid u)\, p(y \mid z, w)$, so with the intervention $do(Z = z)$ we get $p(u, w, x, y \mid do(z)) = p(u)\, p(w)\, p(x \mid u, w)\, p(y \mid z, w)$. Then,

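$$
\begin{aligned}
p(y \mid do(z)) &= \sum_{u, w, x} p(u)\, p(w)\, p(x \mid u, w)\, p(y \mid z, w) = \sum_{w} p(w)\, p(y \mid z, w) = \sum_{w} p(w \mid z)\, p(y \mid z, w) \\
&= p(y \mid z) = \sum_{x} p(y \mid z, x)\, p(x \mid z) \;\neq\; \sum_{x} p(y \mid z, x)\, p(x)
\end{aligned}
$$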

if and, since . Note that we have whenever . That is, the true probability of when is intervened is different from that obtained by conditioning on . And ignoring gives the true intervened probability. So, when assuming , conditioning on may result in a biased causal effect estimate; above inequality shows that the biasness may have caused due to the dependence between and , since , i.e., when and are weakly dependent the biasness is small. Note that if an error is occurred in the estimate of for then it may not necessarily result in an error of same magnitude in causal effect estimate that is , i.e., two errors may result in a different error. Resultant error (bias) due to conditioning on is . For simplicity, we concentrate on errors that can occur in estimation of for . Note that, above discussion is valid when some of other confounders, say, are present where . And in the above analysis we made a strong assumption that , but this may not sometimes be true in reality. In the following section we show this possibility.
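The M-structure argument above can be illustrated by exact enumeration. This is a sketch under assumed, illustrative probability tables for binary variables with the structure $U \to Z$, $U \to X$, $W \to X$, $W \to Y$, $Z \to Y$ and independent hidden causes $U$ and $W$; the helper names (`prob`, `p_y1_adjust_x`, and so on) are hypothetical.

```python
from itertools import product

# Assumed tables for a binary M-structure (numbers are illustrative only).
P_U1, P_W1 = 0.5, 0.5
P_Z1_GIVEN_U = {0: 0.2, 1: 0.8}
P_X1_GIVEN_UW = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.6, (1, 1): 0.9}
P_Y1_GIVEN_ZW = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.8}
NAMES = ("u", "w", "x", "z", "y")


def bern(p_one, value):
    return p_one if value == 1 else 1.0 - p_one


def joint(u, w, x, z, y):
    """p(u) p(w) p(x | u, w) p(z | u) p(y | z, w)."""
    return (bern(P_U1, u) * bern(P_W1, w)
            * bern(P_X1_GIVEN_UW[(u, w)], x)
            * bern(P_Z1_GIVEN_U[u], z)
            * bern(P_Y1_GIVEN_ZW[(z, w)], y))


def prob(**fixed):
    """Probability that the named variables take the given values."""
    total = 0.0
    for values in product((0, 1), repeat=len(NAMES)):
        state = dict(zip(NAMES, values))
        if all(state[name] == val for name, val in fixed.items()):
            total += joint(*values)
    return total


def p_y1_do_z(z):
    """True interventional probability: sum_w p(w) p(Y=1 | z, w)."""
    return sum(bern(P_W1, w) * P_Y1_GIVEN_ZW[(z, w)] for w in (0, 1))


def p_y1_given_z(z):
    """Estimate that ignores X."""
    return prob(y=1, z=z) / prob(z=z)


def p_y1_adjust_x(z):
    """Estimate that conditions on X: sum_x p(Y=1 | z, x) p(x)."""
    return sum(prob(y=1, z=z, x=x) / prob(z=z, x=x) * prob(x=x)
               for x in (0, 1))


if __name__ == "__main__":
    for z in (0, 1):
        print(z, p_y1_do_z(z), p_y1_given_z(z), p_y1_adjust_x(z))
```

Under these assumptions, ignoring $X$ reproduces the interventional probability exactly, while adjusting for $X$ shifts the estimate away from it.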

### 3.1 Dependence of X with Z and Y

It is natural to consider the cases when then it may be that even if . However, since is hidden it is unclear how to consider this case. In fact, for Model B in Figure 3 we have [16]. Let us assume the case that and are strongly dependent. We use a geometric figure that is used to visualize the Simpson’s paradox [17] to explore this possibility. Let the association between and be such that . Then, for some we have that . Note that there are infinitely many such but they can be artificial unless given some meaningful interpretation, ideally to few of them. Now consider the case of . It is important to note that the value dissects positive length according to ratio ;

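$$
\begin{aligned}
\{p(t \mid y) + p(t' \mid y)\}\, p(x \mid y) &= p(t' \mid y)\, p(x \mid y, t') + p(t \mid y)\, p(x \mid y, t) \\
p(t \mid y)\, \{p(x \mid y, t) - p(x \mid y)\} &= p(t' \mid y)\, \{p(x \mid y) - p(x \mid y, t')\} \\
\frac{p(x \mid y) - p(x \mid y, t')}{p(x \mid y, t) - p(x \mid y)} &= \frac{p(t \mid y)}{p(t' \mid y)}
\end{aligned}
$$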
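The dissection identity above can be checked with a few numbers. The sketch below assumes a binary $T$ with values $t$ and $t'$ and uses arbitrary illustrative probabilities; `dissection_ratio` is a hypothetical helper name.

```python
def dissection_ratio(p_t_given_y, p_x_given_y_t, p_x_given_y_tprime):
    """Return p(x | y) and both sides of the ratio identity.

    p(x | y) is the mixture of p(x | y, t) and p(x | y, t') with weights
    p(t | y) and p(t' | y) = 1 - p(t | y), so it splits the interval
    between the two conditional probabilities in the ratio
    p(t | y) : p(t' | y).
    """
    p_tprime_given_y = 1.0 - p_t_given_y
    p_x_given_y = (p_t_given_y * p_x_given_y_t
                   + p_tprime_given_y * p_x_given_y_tprime)
    lhs = ((p_x_given_y - p_x_given_y_tprime)
           / (p_x_given_y_t - p_x_given_y))
    rhs = p_t_given_y / p_tprime_given_y
    return p_x_given_y, lhs, rhs


if __name__ == "__main__":
    # Assumed values: p(t | y) = 0.3, p(x | y, t) = 0.9, p(x | y, t') = 0.2.
    print(dissection_ratio(0.3, 0.9, 0.2))
```

The function requires $p(x \mid y, t) \neq p(x \mid y, t')$ and $0 < p(t \mid y) < 1$; otherwise the ratio is undefined.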

Now if is a common cause of and association then we should have and . Therefore, in Figure 4 the conditional probabilities in the former equality are vertically aligned, and so are those in the latter. Then we have and dissects positive length according to ratio and similarly for and . In the Figure 4 those ratios are marked with braces. Since the selection of is restricted by the strength of the dependence between and , for a higher value of it, we can have a higher dependence between and . And if the strength of the dependence between and is characterized by then that between and should be also higher. And the other case is similar, i.e., taking .

If is in our causal model in Figure 3, a common cause for the association between and , then the dependences between and , and and can be strong given that the dependence between and is strong. Similarly, a strong association between and implies that those between and , and and can be strong. With similar arguments, these imply that and can be dependent. An alternative way to see that and are not independent when the associations between and , and and are strong is to use correlations. In [18] it is shown that for any three random variables, say, and the correlation coefficients among them satisfy the relationship . If, for e.g., when and then we cannot have and such that . Therefore, when the dependences between and , and and are strong it may be that the introduced two common causes for those associations are dependent. Furthermore, there can be another possibility for these two associations; both associations may be due one cause, i.e., both and refer to the same hidden variable ( in the Model C in Figure 5).
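The correlation argument can be made concrete with the standard constraint that a 3x3 correlation matrix must be positive semidefinite. This is a sketch of that general constraint, which may differ in form from the relationship stated in [18]; the variable names `A`, `B`, `C` and the function `third_correlation_bounds` are hypothetical.

```python
import math


def third_correlation_bounds(r_ab, r_bc):
    """Range of corr(A, C) compatible with corr(A, B) and corr(B, C).

    Follows from requiring the 3x3 correlation matrix of (A, B, C) to be
    positive semidefinite:
        r_ab * r_bc - s <= r_ac <= r_ab * r_bc + s,
    where s = sqrt((1 - r_ab**2) * (1 - r_bc**2)).
    """
    slack = math.sqrt((1.0 - r_ab ** 2) * (1.0 - r_bc ** 2))
    centre = r_ab * r_bc
    return centre - slack, centre + slack


if __name__ == "__main__":
    # Two strong correlations with B force A and C to be correlated.
    print(third_correlation_bounds(0.8, 0.8))
    # Two moderate correlations still allow corr(A, C) = 0.
    print(third_correlation_bounds(0.5, 0.5))
```

Read with $B$ as the observed covariate and $A$, $C$ as the two introduced common causes, the lower bound becomes positive once both correlations with $B$ are strong enough, so the two causes cannot then be uncorrelated.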

However, current studies are often done without considering these possibilities. Some researchers have shown that conditioning on associated covariates introduces only a small bias; their claims may be due to such contexts. It is sometimes advised [8] to control for all the pretreatment covariates, but graphical causal model researchers reject this idea. Therefore, in the next section we look at the different possibilities for associative covariates and try to understand when the bias can be amplified.

### 3.2 Deciding on Conditioning

Consider the case of two dependent hidden causes, i.e., $U \not\perp W$ (such as Model D in Figure 5), where the dependence is causal or non-causal. Then,

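$$
\begin{aligned}
p(y \mid do(z)) &= \sum_{u, w, x} p(u, w)\, p(x \mid u, w)\, p(y \mid z, w) = \sum_{w} p(w)\, p(y \mid z, w) = \sum_{w, x} p(x)\, p(w \mid x)\, p(y \mid w, z, x) \\
&\neq \sum_{w, x} p(x)\, p(w \mid z, x)\, p(y \mid w, z, x) = \sum_{x} p(y \mid z, x)\, p(x)
\end{aligned}
$$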

if i.e., , conditioning on does not give the correct probability estimate that is . And ignoring also does not give the correct estimate, since then we get , i.e., we need to assume in order to have the correct probability for the case but we know that , especially when associations between and , and and are strong. That is, to condition on we should have and to ignore we should have . So, the question remains is that which statement should be accepted in order to be more correct against the other; either or . Accepting the former (rejecting the latter) is to condition on and vice versa. But none of the conditions can be tested, since they involve unobservable .

However, with some subject domain knowledge if one can assume meaningful and and then recognize their dependences with (based on those between and , and and ) it may be possible to decide which option can be better. For e.g., if those dependences are not strong and causation of and on is mostly based on explaining away phenomenon [19], then it may not be desirable to condition of . Note that the explaining away phenomenon is that when we see then observing makes lower and vice versa. If conditioned on in this case, then comparative strata of data sample in the causal effect calculation may have imbalances in the causal variables and . This can cause biased causal effect estimates. And when the dependences of and with is assumed to be high then it is less likely that there is an explaining away phenomenon, i.e, most probably the causation is monotonic (when we see then observing makes higher and vice versa) then conditioning on can be beneficial because it results in balances in the causal variables and . Though one can reason about the actions to be taken as done above, it requires extensive simulation studies to confirm them.
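The trade-off between conditioning on the covariate and ignoring it can be explored by exact enumeration rather than simulation. The sketch below assumes binary variables, the structure $U \to Z$, $U \to X$, $W \to X$, $W \to Y$, $Z \to Y$ with dependent hidden causes $U$ and $W$, and a hypothetical parametrisation: `rho` controls the dependence between $U$ and $W$, and `strength` controls how strongly $X$ depends on them. None of the numbers come from the paper.

```python
from itertools import product

NAMES = ("u", "w", "x", "z", "y")


def bern(p_one, value):
    return p_one if value == 1 else 1.0 - p_one


def model_d_estimates(rho, strength):
    """Return (truth, unadjusted, adjusted) for p(Y=1 | do(Z=1)).

    rho in [0, 1): dependence between the hidden causes U and W.
    strength in [0, 1): dependence of X on U and W (monotone).
    """
    def p_uw(u, w):
        return (1.0 + rho) / 4.0 if u == w else (1.0 - rho) / 4.0

    def p_x1(u, w):
        return 0.5 + strength * (u + w - 1) / 2.0

    p_z1 = {0: 0.2, 1: 0.8}                      # p(Z=1 | u)

    def p_y1(z, w):
        return 0.2 + 0.2 * z + 0.4 * w           # p(Y=1 | z, w)

    def joint(u, w, x, z, y):
        return (p_uw(u, w) * bern(p_x1(u, w), x)
                * bern(p_z1[u], z) * bern(p_y1(z, w), y))

    def prob(**fixed):
        total = 0.0
        for values in product((0, 1), repeat=len(NAMES)):
            state = dict(zip(NAMES, values))
            if all(state[k] == v for k, v in fixed.items()):
                total += joint(*values)
        return total

    truth = sum(prob(w=w) * p_y1(1, w) for w in (0, 1))
    unadjusted = prob(y=1, z=1) / prob(z=1)
    adjusted = sum(prob(y=1, z=1, x=x) / prob(z=1, x=x) * prob(x=x)
                   for x in (0, 1))
    return truth, unadjusted, adjusted


if __name__ == "__main__":
    for rho in (0.0, 0.4, 0.8):
        print(rho, model_d_estimates(rho, strength=0.8))
```

In this particular parametrisation, ignoring $X$ is exact when the hidden causes are independent, while with strongly dependent hidden causes and a monotone, strongly dependent $X$, conditioning on $X$ gives the smaller error. Other parametrisations, such as ones with explaining-away behaviour, may behave differently.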

Now consider the case of a single hidden cause, say, $V$ (Model C in Figure 5). Then

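$$
\begin{aligned}
p(y \mid do(z)) &= \sum_{x, v} p(v)\, p(x \mid v)\, p(y \mid z, v) = \sum_{v} p(y \mid z, v)\, p(v) = \sum_{x, v} p(x)\, p(v \mid x)\, p(y \mid z, v, x) \\
&\neq \sum_{x, v} p(x)\, p(v \mid z, x)\, p(y \mid z, v, x) = \sum_{x} p(y \mid z, x)\, p(x).
\end{aligned}
$$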

Therefore, here also conditioning on does not gives the correct probability estimate that is if , i.e., . But ignoring also does not give the correct estimate as in this case. Since , ignoring means assuming , but we know that and should be dependent. So, similar to the above case, the question remains is that which should be accepted against the other in order to be more correct; either or or . Accepting the former is to condition on and vice versa. But similar to the above case where the dependences are higher, assuming can be better than assuming , therefore conditioning on . If the subject domain knowledge shows that there is a single common cause then it is beneficial to condition on .
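The single hidden cause case can be sketched in the same way. The code below assumes binary variables with the structure $V \to Z$, $V \to X$, $V \to Y$, $Z \to Y$, and a hypothetical parameter `accuracy` equal to the probability that the observed $X$ agrees with the hidden $V$; all numbers are illustrative assumptions.

```python
from itertools import product

NAMES = ("v", "x", "z", "y")


def bern(p_one, value):
    return p_one if value == 1 else 1.0 - p_one


def model_c_estimates(accuracy):
    """Return (truth, unadjusted, adjusted) for p(Y=1 | do(Z=1)).

    accuracy = p(X = V): 0.5 means X carries no information about V,
    1.0 means X reveals V exactly.
    """
    p_v1 = 0.5
    p_z1 = {0: 0.2, 1: 0.8}                      # p(Z=1 | v)

    def p_y1(z, v):
        return 0.2 + 0.2 * z + 0.4 * v           # p(Y=1 | z, v)

    def joint(v, x, z, y):
        p_x = accuracy if x == v else 1.0 - accuracy
        return (bern(p_v1, v) * p_x
                * bern(p_z1[v], z) * bern(p_y1(z, v), y))

    def prob(**fixed):
        total = 0.0
        for values in product((0, 1), repeat=len(NAMES)):
            state = dict(zip(NAMES, values))
            if all(state[k] == val for k, val in fixed.items()):
                total += joint(*values)
        return total

    truth = sum(bern(p_v1, v) * p_y1(1, v) for v in (0, 1))
    unadjusted = prob(y=1, z=1) / prob(z=1)
    adjusted = sum(prob(y=1, z=1, x=x) / prob(z=1, x=x) * prob(x=x)
                   for x in (0, 1))
    return truth, unadjusted, adjusted


if __name__ == "__main__":
    for accuracy in (0.5, 0.7, 0.9, 1.0):
        print(accuracy, model_c_estimates(accuracy))
```

Under these assumptions neither estimate is exact unless $X$ reveals $V$ completely, but the error from conditioning on $X$ shrinks as the dependence between $X$ and $V$ grows, while the error from ignoring $X$ stays fixed.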

## 4 Conclusion

Estimating causal effects from observational data requires considering the confounders of the causal relation of interest in order to control for (condition on) them. However, it is not necessary to consider all of them; a "sufficient" subset is enough. The current practice is often to select them according to their ability to predict first the treatment and then the outcome. It should be done the other way round: they should first be predictive of the outcome and then of the treatment. We also show how to handle associative confounders (those that do not cause both the treatment and the outcome but are associated with them), for which there is currently no clear consensus. It is often beneficial to condition on associative confounders when they are strongly dependent on both the treatment and the outcome, whereas it is not so for weakly dependent ones.

## References

• [1] D. Rubin. Causal Inference Using Potential Outcomes: Design, Modeling, Decisions. Journal of the American Statistical Association 100(469) (2005), 322–331.
• [2] J. Pearl, Causality: Models, Reasoning, and Inference, Cambridge University Press, New York, 2009.
• [3] P. R. Rosenbaum and D. B. Rubin, The central role of the propensity score in observational studies for causal effects. Biometrika 70(1) (1983), 41–55.
• [4] P. R. Rosenbaum and D. B. Rubin, Reducing Bias in Observational Studies Using Subclassification on the Propensity Score. Journal of the American Statistical Association 79(387) (1984), 516–524.
• [5] I. Shrier, Letter to the Editor, Statistics in Medicine 28 (2009), 1315–1318.
• [6] J. Pearl, Letter to the Editor, Statistics in Medicine 28 (2009), 1415–1416.
• [7] A. Sjölander, Letter to the Editor, Statistics in Medicine 28 (2009), 1416–1420.
• [8] D. Rubin, Author’s Reply, Statistics in Medicine 28 (2009), 1420–1423.
• [9] S. L. Lauritzen and T. S. Richardson, Chain graph models and their causal interpretations. Journal of Royal Statistical Society, Series B 64(3) (2002), 321–361.
• [10] P. Wijayatunga, Causal Effect Estimation Methods. Journal of Statistical and Econometric Methods 3(2) (2014), 153–170.
• [11] T. J. VanderWeele and I. Shpitser, A New Criterion for Confounder Selection. Biometrics 67 (2011), 1406–1413.
• [12] B. Kelcey and J. Carlisle, The Threshold of Embedded M Collider Bias and Confounding Bias. Society for Research on Educational Effectiveness Conference
• [13] D. B. Rubin, Matching using estimated propensity scores: related theory and practice. Biometrics 52 (1996), 249–264.
• [14] S. M. Perkins, W. Tu, M. G. Underhill, X. H. Zhou and M. D. Murray, The use of propensity scores in pharmacoepidemiologic research. Pharmacoepidemiol Drug Safety 9 (2000), 93–101.
• [15] L. Dallolio, R. Bellocco, L. Richiardi and M. P. Fantini, Using directed acyclic graphs to understand confounding in observational studies. Biomedical Statistics and Clinical Epidemiology 3(2) (2010), 89–96.
• [16] S. L. Lauritzen, A. P. Dawid, B. N. Larsen and H.-G. Leimer, Independence properties of directed Markov fields. Networks 20 (1990), 491–505.
• [17] P. Wijayatunga, Viewing Simpson’s paradox. Statistica & Applicazioni, XII(2) (2014), 225–235.
• [18] E. Langford, N. Schwertman and M. Owens, Is the property of being positively correlated transitive? The American Statistician 55(4) (2001), 322–324.
• [19] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Second Edition), Morgan Kaufmann, San Mateo, CA., 1988.