On the Distinction Between "Conditional Average Treatment Effects" (CATE) and "Individual Treatment Effects" (ITE) Under Ignorability Assumptions

by Brian G. Vegetabile, et al.

Recent years have seen a swell in methods that focus on estimating "individual treatment effects". These methods are often focused on the estimation of heterogeneous treatment effects under ignorability assumptions. This paper hopes to draw attention to the fact that there is nothing necessarily "individual" about such effects under ignorability assumptions and isolating individual effects may require additional assumptions. Such individual effects, more often than not, are more precisely described as "conditional average treatment effects" and confusion between the two has the potential to hinder advances in personalized and individualized effect estimation.




1 Introduction

Achieving personalized, or individualized, effect estimation is an ambitious goal throughout science. Further, the capability to estimate individual effects within populations that generalize to new settings would dramatically alter how we approach medicine (and potentially many other disciplines). To that end, there has been a growing body of literature within the field of causal inference on the estimation of heterogeneous treatment effects (Hill, 2011; Kennedy, 2020) and on optimal learning of individualized treatment policies (Murphy, 2003; Kallus and Zhou, 2021). Similarly, experimental designs such as case-crossover designs (Maclure, 1991; Marshall and Jackson, 1993) have long been used in an attempt to estimate “individual” effects and limit the effects of within-individual variability through experimental design.

Recently though, there has been growing usage of the term “individual treatment effects” (ITE) to describe methods that focus on exploiting heterogeneity among effects within a population (Shalit et al., 2017; Lu et al., 2018) and to nonparametrically estimate effects for individuals conditioned on observed covariates. While these methods all estimate effects conditioned on covariates and thus can be useful for personalizing medicine, the strong assumptions that they employ do not necessarily imply that these are the effects for the individual. In fact, the true effect for an individual may actually differ in both magnitude and direction from those estimated using the approaches described. The goal of this paper is to make clear the distinctions between individual treatment effects (ITE) and conditional average treatment effects (CATE) when strong ignorability assumptions are made.

The arguments made here all relate to the fact that the strong ignorability assumptions (Imbens and Rubin, 2015) employed only guarantee that, given certain covariates, it is possible to ignore other covariates as if they were randomized. In Section 2, notation and general assumptions are described, including the distinction between “individual effect” estimands and “conditional effect” estimands. Section 3 provides heuristic arguments on the importance of the ignorability assumptions and demonstrates how unobserved (but potentially important) covariates can be ignored in an analysis provided the assumptions are met. Section 4 provides a simple example where, in the presence of interactions between an exposure and an unobserved covariate, individual effects are not guaranteed to have even the same sign as conditional average effects. Finally, Section 5 discusses the strong ignorability assumption in the context of randomized controlled trials and implications for CATE estimation in this setting.

The hope is that this paper illustrates some of the difficulties in estimating individual effects and how ignorability assumptions and the estimation of conditional effects are not sufficient for individual inference.

2 Notation & Assumptions: Causal Inference Framework


Let $T \in \{0, 1\}$ be a binary random variable representing treatment assignment and let $Y$ be a random variable that represents the observed outcome. Let $(X, U)$ be a vector of pre-exposure covariates with length $p + q$, where $X$ is a vector of length $p$ of observed covariates (i.e., collected and available in a data set) and $U$ is a vector of length $q$ of unobserved covariates (i.e., $U$ is not measured and $q$ may be large). Note: this notation is meant to align with the literature and to convey the distinction that $X$ is the set of covariates that provides ignorability, while $U$ collects the hidden/unobserved variables.

The variables will be assumed to loosely follow the graph in Figure 1, where the set $(X, U)$ temporally occurs before $T$ (and $X$ and $U$ can be correlated, or exhibit some other stronger causal structure) and within $(X, U)$ only $X$ is used to assign treatment levels. We note that while $U$ is unobserved, based on the representation in this graph, this set of covariates does not necessarily represent hidden confounding, in that, given $X$, the distribution of $U$ among the treated ($T = 1$) is equivalent to the distribution of $U$ among the untreated ($T = 0$).

Figure 1: Notional Graph Describing the Relationships Among the Variables. Only $X$ is used to assign exposure levels, and $U$ is unobserved but related to the outcome variable $Y$.

Throughout we will focus on defining effects in the Neyman/Rubin Potential Outcome Framework (see Imbens and Rubin, 2015, for an introduction), where the potential outcome for each unit $i$ under each treatment level $t \in \{0, 1\}$ is defined as $Y_i(t)$. Under this framework it is common to define an individual’s treatment effect as a contrast between potential outcomes, e.g.,

$$\tau_i = Y_i(1) - Y_i(0).$$

Due to the Fundamental Problem of Causal Inference (Holland, 1986) we can only observe one potential outcome for each individual, and thus the framework typically considers how average effects can be identified, such as the following estimands,

$$\tau(x) = E[Y(1) - Y(0) \mid X = x], \qquad \tau = E[Y(1) - Y(0)] = E_X[\tau(X)],$$

where $\tau(x)$ is the conditional average treatment effect among individuals with the same vector of covariates $X = x$ and $\tau$ is the average treatment effect over a population represented by the distribution of $X$ (Li et al., 2018). The above should make clear the distinction that $\tau_i$ does not necessarily equal $\tau(x_i)$, where the first is an individual’s effect and the second is an average among the population.

Identification of the CATE and ATE within the Neyman/Rubin Potential Outcome Framework (in its simplest form) generally requires the following sets of assumptions: 1) “Strong Ignorability” given a set of covariates (in this case $X$), and 2) the Stable Unit Treatment Value Assumption (SUTVA). These are listed below:

Assumptions 1a & 1b: Strong Ignorability Given a Set of Covariates (Rosenbaum and Rubin, 1983) -

We say that a treatment assignment mechanism is strongly ignorable given a set of covariates $X$ if $(Y(0), Y(1)) \perp T \mid X$ and $0 < P(T = 1 \mid X = x) < 1$ for any $x$ of interest.

The first part requires conditional independence between the treatment and the potential outcomes given the set of covariates, and the second part of the assumption is often referred to as the positivity assumption. The positivity assumption is typically evaluated using the function $e(x) = P(T = 1 \mid X = x)$, referred to as the propensity score (Rosenbaum and Rubin, 1983).

Assumption 2a & 2b: SUTVA - (Rubin, 1980; Imbens and Rubin, 2015)

The treatment assignment of one unit does not affect the potential outcomes of another unit (i.e., no interference among units) and no hidden variability in treatment levels (e.g., each 100mg aspirin tablet is equivalent).

Assumptions 2a and 2b are generally used to imply consistency between the potential outcomes and the observed outcomes, such that we can assert that $Y = T\,Y(1) + (1 - T)\,Y(0)$, enabling estimation of effects from the observed outcomes.

There are many results that demonstrate that if SUTVA and strong ignorability given $X$ hold, i.e., $(Y(0), Y(1)) \perp T \mid X$ with positivity, then both the CATE and ATE are identified and estimable (Imbens and Rubin, 2015; Pearl et al., 2016).
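The identification argument can be sketched in two steps. The display below is a standard textbook argument, not a reproduction of any cited proof. The symbols $T$, $Y$, $X$, $Y(t)$, and $\tau(x)$ are assumed here to denote the treatment, observed outcome, observed covariates, potential outcomes, and CATE.

```latex
\begin{align*}
E[Y(t) \mid X = x]
  &= E[Y(t) \mid T = t, X = x]
     && \text{(conditional independence, Assumption 1a)} \\
  &= E[Y \mid T = t, X = x]
     && \text{(consistency, implied by SUTVA)} \\[4pt]
\tau(x) &= E[Y \mid T = 1, X = x] - E[Y \mid T = 0, X = x],
  \qquad \tau = E_X[\tau(X)].
\end{align*}
```

Positivity (Assumption 1b) is what makes both conditional expectations on the right-hand side well defined at every $x$ of interest. Note that every quantity identified this way is an average over units sharing $X = x$; nothing in the argument isolates $Y_i(1) - Y_i(0)$ for a single unit.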

3 On the Ignorability Assumption and Bias

To understand the distinction between an ITE and a CATE, it is important to first understand the role of the ignorability assumption in reducing bias in studies. In this section we provide a heuristic discussion of the ignorability assumption and how it enables estimation but does not guarantee that estimation is individualistic. What follows is largely similar to the arguments of Cochran and Rubin (1973).

Consider a simple relationship in which the outcome $Y$ depends on the treatment $T$, the observed covariates $X$, and the unobserved covariates $U$.

The true effect in the population (i.e., the ATE) is $\tau$, but naive estimation of $\tau$ from a model of $Y$ given $T$ alone may be biased based on the structure present in Figure 1.

Specifically, bias could enter estimation because of potential differences in the covariate distributions between the different treated subpopulations, e.g., differences between the distribution of $(X, U)$ given $T = 1$ and its distribution given $T = 0$. That is, the naive contrast between treatment groups combines the true effect with terms that depend on these covariate differences.

Bias could be removed from such estimates in two ways: 1) conditioning on all variables important to the outcome, estimating the response function directly, and using this to estimate the difference, or 2) finding a set of variables $X$ such that, once conditioned upon, it follows that $U \perp T \mid X$ and we can ignore the previous differences between the groups in $U$ and integrate over it (in this example the remaining differences, i.e., those between the distribution of $U$ given $T = 1, X$ and given $T = 0, X$, will all become zero). This does not mean the variables in $U$ are unimportant in the outcome model regression, just that in the contrast we are estimating they can ultimately be ignored.

For example, if we assume ignorability and condition on $X$ and $T$, the contrast between treatment groups at a fixed value of $X$ no longer involves differences in the distribution of $U$, and here bias is no longer a function of the unobserved covariates. Additionally, we can now integrate this contrast over the distribution of $X$ to get the population treatment effect.

Thus, one of the largest benefits of the ignorability assumption is that it would allow researchers to potentially remove bias in estimated treatment effects that could arise from variables that are important for predicting the outcome, but are not possible to control for.
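The argument of this section can be made concrete under an assumed linear outcome model. The model and its coefficients $\beta$ and $\gamma$ are hypothetical and chosen only for illustration; $T$, $X$, $U$, and $Y$ denote the treatment, observed covariates, unobserved covariates, and outcome.

```latex
\begin{align*}
\text{Assume } Y &= \tau T + \beta^\top X + \gamma^\top U + \varepsilon,
  \qquad E[\varepsilon \mid T, X, U] = 0. \\[4pt]
E[Y \mid T = 1] - E[Y \mid T = 0]
  &= \tau
   + \beta^\top \bigl( E[X \mid T = 1] - E[X \mid T = 0] \bigr)
   + \gamma^\top \bigl( E[U \mid T = 1] - E[U \mid T = 0] \bigr), \\[4pt]
E[Y \mid T = 1, X = x] - E[Y \mid T = 0, X = x]
  &= \tau
   + \gamma^\top \bigl( E[U \mid T = 1, X = x] - E[U \mid T = 0, X = x] \bigr) \\
  &= \tau \qquad \text{when } U \perp T \mid X .
\end{align*}
```

In this sketch $\gamma$ can be arbitrarily large, so $U$ may matter a great deal for predicting $Y$, yet it drops out of the conditional contrast once its distribution is balanced across treatment groups given $X$.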

4 A Demonstration With Linear Models

One clear example where a CATE will diverge from an ITE is in a linear model when there is an interaction between the exposure $T$ and the unobserved variable $U$. To demonstrate the implications of the ignorability assumption and our ability to estimate the CATE and ITE, a simple simulation based on a model with such interactions is used to further illustrate the distinction between the two estimands.

The focus again is illustrating how ignorability provides consistent estimation for the CATE while providing an ability to ignore important variables in the outcome relationship, but there may remain important differences between the estimated CATE and the true ITE.


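Let $X$ and $U$ be jointly Gaussian with correlation $\rho$, let treatment $T$ be assigned with a probability that depends only on $X$, and let the error terms be independent of the other variables. Let the observed outcome variable $Y$ be defined by a linear model that includes an interaction between $T$ and $U$.

In this process, it is assumed that $U$ is unobserved and unavailable in the analysis. But, if $X$ is observed and conditioned on, then it is possible to satisfy the strong ignorability assumptions required to estimate the CATE, $\tau(x)$.

Under consistency assumptions, where $Y = T\,Y(1) + (1 - T)\,Y(0)$, we can analytically derive both the ITE and the CATE. The individual-specific effect, $\tau_i = Y_i(1) - Y_i(0)$, depends on both $X_i$ and $U_i$, whereas the conditional average effect in the population, $\tau(x) = E[Y(1) - Y(0) \mid X = x]$, depends on $x$ alone, due to the fact that $E[U \mid X = x]$ is a linear function of $x$ under properties of Gaussian distributions.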

Note that in this example we have satisfied strong ignorability, but the conditional average effect is not equal to the individual effect, i.e., $\tau(x_i) \neq \tau_i$ in general. Further, for a particular value of the correlation $\rho$ between $X$ and $U$, it is possible to have a situation where there is a constant CATE (i.e., $\tau(x)$ does not vary with $x$), but the individual effect will still vary and could be negative for many individuals due to the large variability in $U$. In the appendix, it is shown for a general form of this model and data-generating process that, while it is possible that there is no correlation between the CATE and ITE, the ITE and CATE should generally have positive correlation.

We can consistently estimate the CATE by specifying a regression model for $Y$ given $T$ and $X$ of the appropriate form, where $U$ is not included because it is not observed. Under this model, it would follow that an estimate for the CATE would be $\hat{\tau}(x) = \hat{E}[Y \mid T = 1, X = x] - \hat{E}[Y \mid T = 0, X = x]$. Strong ignorability implies that we will have consistent estimation for each $x$.
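The following is a minimal simulation sketch of the ideas in this section. The coefficients, the logistic assignment rule, and the function names (`simulate`, `fit_cate`, `summarize`) are all hypothetical choices made for illustration and are not the specification used for the results reported below. In this assumed model the ITE is $1 + bX + cU$ and, because $E[U \mid X = x] = \rho x$ for standardized Gaussian variables, the CATE is $1 + (b + c\rho)x$.

```python
import numpy as np


def simulate(n, rho, b=1.0, c=2.0, seed=0):
    """Draw one data set from a hypothetical linear model with a T-by-U interaction."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)  # observed covariate
    # Unobserved covariate with unit variance and Corr(x, u) = rho.
    u = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # Assumed assignment rule: the propensity depends on x only (ignorable given x).
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))
    ite = 1.0 + b * x + c * u  # true individual effect Y(1) - Y(0)
    y = 1.0 + x + u + t * ite + rng.standard_normal(n)
    return x, t, y, ite


def fit_cate(x, t, y):
    """OLS of y on (1, t, x, t*x); u is omitted because it is unobserved."""
    design = np.column_stack([np.ones_like(x), t, x, t * x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Estimated CATE: coefficient on t plus coefficient on t*x times x.
    return lambda x_new: beta[1] + beta[3] * np.asarray(x_new)


def summarize(n=100_000, rho=-0.5, b=1.0, c=2.0, seed=0):
    """Compare the estimated CATE with the true CATE and the true ITE."""
    x, t, y, ite = simulate(n, rho, b, c, seed)
    cate_hat = fit_cate(x, t, y)(x)
    cate_true = 1.0 + (b + c * rho) * x  # uses E[U | X = x] = rho * x
    return {
        "cate_mse": float(np.mean((cate_hat - cate_true) ** 2)),
        "cate_hat_sd": float(np.std(cate_hat)),
        "ite_sd": float(np.std(ite)),
        "share_ite_negative": float(np.mean(ite < 0)),
    }


if __name__ == "__main__":
    print(summarize(rho=0.9))   # CATE varies strongly with x
    print(summarize(rho=-0.5))  # b + c * rho = 0, so the CATE is constant
```

With `b = 1`, `c = 2`, and `rho = -0.5`, the slope `b + c * rho` is zero, so the estimated CATE is nearly flat even though the simulated ITE keeps a large spread and is negative for a sizable share of units.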

4.1 Simulation Results

To illustrate the point that consistent estimation is possible even in the presence of $U$ in the causal structure, the data-generating process of the previous section was simulated repeatedly, with every replication containing the same number of observations. We present results for two settings of the correlation $\rho$ between $X$ and $U$.

The average MSE of the CATE was 0.013 and 0.020, respectively, indicating good performance in estimating the CATE under ignorability assumptions. Figure 2 illustrates the point that an individual’s true effect may differ in both magnitude and direction from an estimated CATE. In the setting where the correlation between $X$ and $U$ is high, the estimates of the CATE can be closer to the ITE. In the other setting, however, the estimated CATE is almost a constant while there is large variability in the true ITE.

Figure 2: Comparing the estimated CATE to the true ITE for two different data-generating processes described in Section 4. The diagonal represents the 45° line; estimates along this line would occur when the CATE is equal to the ITE. The blue dots represent the case where the correlation between $X$ and $U$ is high. The green dots represent a case where the correlation between $X$ and $U$ completely removes the heterogeneity in the estimated CATE, while large variability remains in the ITE. The red sections represent areas where there is disagreement between the CATE and ITE.

The case in which the correlation removes the heterogeneity in the CATE is compelling in that there can be no heterogeneity in the CATE, but large variability in individual effects.

4.2 What if the Unobserved Variable is Not Ignorable?

To demonstrate how the effect of $U$ differs between cases when it is ignorable and when it confounds relationships, this section presents simulation results when $U$ is directly related to $T$ in the assignment mechanism, i.e., the propensity score is a function of $U$ as well as $X$. To induce unobserved confounding, we change the assignment mechanism so that it depends on $U$ and repeat the simulation with the first of the two settings of $\rho$ used above.

The results show, as should be expected, that estimates are now biased. The average MSE of the CATE across replications rises to 5.699 (as compared with 0.013 earlier), and Figure 3 demonstrates the extent of the bias. This demonstrates that even knowing the correct form for the CATE, it is possible to have biased estimation if the ignorability assumption is violated (i.e., there is unobserved confounding).
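The effect of violating ignorability can be sketched with a small simulation. The outcome model, the assignment rule `x + 2 * u`, and the function name `cate_bias_at_zero` are hypothetical choices for illustration and are not the settings that produced the MSE values reported above. In this assumed model the true CATE at $x = 0$ equals 1, so the function returns the error of the regression-based estimate at that point.

```python
import numpy as np


def cate_bias_at_zero(n=100_000, rho=0.0, confounded=True, seed=0):
    """Return (estimated CATE at x = 0) minus (true CATE at x = 0)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)  # observed covariate
    u = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)  # unobserved
    # Hypothetical assignment rule: u enters the propensity only when confounded=True.
    logit = x + (2.0 * u if confounded else 0.0)
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    ite = 1.0 + x + 2.0 * u  # true individual effect
    y = 1.0 + x + u + t * ite + rng.standard_normal(n)
    # Regression with the same form as the true CATE (linear in x), omitting u.
    design = np.column_stack([np.ones(n), t, x, t * x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # The true CATE at x = 0 is 1 because E[U | X = 0] = 0.
    return float(beta[1] - 1.0)


if __name__ == "__main__":
    print("ignorable given x:", cate_bias_at_zero(confounded=False))
    print("unobserved confounding:", cate_bias_at_zero(confounded=True))
```

When `confounded=False` the error should shrink toward zero as `n` grows. When `confounded=True` it does not, because treated units have systematically larger values of `u` than untreated units with the same `x`.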

Figure 3: Comparing estimates of the CATE across simulations with the true conditional average effects when the ignorability assumption is violated. The blue region and black lines represent the mean estimates and 95% error band on a grid of points. The dotted line represents the 45° line; an error band containing the dotted line would indicate agreement between estimates and truth.

5 RCTs, Ignorability, and CATE estimation

The previous sections clearly demonstrate the distinctions between the CATE and ITE in observational studies, but a more nuanced distinction about the CATE is that under a completely randomized experiment there are many different estimable CATEs (with potentially many contradictory trends).

For example, consider a completely randomized experiment where the probability of assignment to either treatment is a constant. Further consider a partition of the observed covariates, $X = (X_1, X_2)$.

Under this design, the strong ignorability assumption is satisfied given either set, $X_1$ or $X_2$, and thus we are able to identify and consistently estimate both

$$\tau_1(x_1) = E[Y(1) - Y(0) \mid X_1 = x_1] \quad \text{and} \quad \tau_2(x_2) = E[Y(1) - Y(0) \mid X_2 = x_2].$$

More specifically, for any subset $S$ of the observed covariates we would also have identification for

$$\tau_S(s) = E[Y(1) - Y(0) \mid S = s],$$

illustrating that there are a multitude of potential CATEs that might be of interest to study.

To further this point, consider an outcome function in which $T$ is binary and assigned completely at random and in which the effect of treatment depends on two covariates, $X_1$ and $X_2$. It follows that the individual effect is a function of both covariates, while one version of a CATE that is identified conditions on $X_1$ alone and another conditions on $X_2$ alone, demonstrating that it is possible to have two distinct CATEs, both with complex contradictory relationships (one a positive quadratic and the other a negative quadratic). Finally, if one of these covariates were not observed, even in a randomized controlled trial using only an ignorability assumption, again it would not be possible to estimate the ITE.
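A worked example of two contradictory CATEs is sketched below. The outcome function and the standard normal covariates are hypothetical choices made for illustration only. Here $X_1$ and $X_2$ are assumed independent, and $T$ is assumed to be assigned by a fair coin flip independent of everything else.

```latex
\begin{align*}
\text{Assume } Y &= X_1 + X_2 + T \bigl( X_1^2 - X_2^2 \bigr) + \varepsilon,
  \qquad X_1, X_2 \overset{\text{iid}}{\sim} N(0, 1),
  \quad T \sim \text{Bernoulli}(1/2). \\[4pt]
\tau_i &= Y_i(1) - Y_i(0) = X_{1i}^2 - X_{2i}^2, \\
\tau_1(x_1) &= E[\tau_i \mid X_1 = x_1] = x_1^2 - E[X_2^2] = x_1^2 - 1, \\
\tau_2(x_2) &= E[\tau_i \mid X_2 = x_2] = E[X_1^2] - x_2^2 = 1 - x_2^2, \\
\tau &= E[\tau_i] = 0 .
\end{align*}
```

Both conditional effects are identified under randomization, yet one is an upward-opening quadratic and the other a downward-opening quadratic, and neither equals $\tau_i$ for a unit unless both covariates are observed and the functional form is correct.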

Estimation of an ITE is complicated both by the need to get the functional form correct and by the need to collect the right set of variables in the data.

6 Implications

The discussion here does not diminish the importance of individual treatment effect estimation; it only implies that ignorability alone is not sufficient for the estimation of individual effects. Further, the point should be made that the CATE is not an ITE, though the two may be correlated. Both are important estimands, and in many cases the scientific question of interest may actually call for precise knowledge of the CATE. Further work should investigate the ability to analytically bound the ITE given an estimate of the CATE, or develop sensitivity analyses that help in understanding the range of reasonable values for the ITE given an estimated CATE.

Fully establishing identification for the estimation of individual effects would often require additional assumptions (potentially very strong assumptions) such as those in case-crossover designs or experiments that contain within-individual repeated observations (Murphy, 2003). In many cases, individual estimation requires a willingness to make these strong assumptions about the suitability for observations to serve as counterfactuals.

It is important to note that while the best way to satisfy ignorability is through design, simple designs alone are not enough to imply that we have the power to precisely estimate individual effects. Finally, when ignorability is asserted in observational studies, it should be asserted judiciously and with caveats that accurately capture where inference could be applicable.


Acknowledgments

The author would like to thank the many researchers who helped to think critically about these ideas, as well as the workshop reviewers for their feedback on this draft. In particular, the author thanks Matthew Cefalu, Daniel McCaffrey, Beth Ann Griffin, and Daniel Gillen for their advice and support in preparing this work.

Research reported in this manuscript was supported by NIDA of the National Institutes of Health under award number R01DA045049. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


References

  • W. G. Cochran and D. B. Rubin (1973) Controlling bias in observational studies: a review. Sankhya: The Indian Journal of Statistics, Series A 35 (4), pp. 417–446.
  • J. L. Hill (2011) Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20 (1), pp. 217–240. http://dx.doi.org/10.1198/jcgs.2010.08162
  • P. W. Holland (1986) Statistics and causal inference. Journal of the American Statistical Association 81 (396), pp. 945–960.
  • G. W. Imbens and D. B. Rubin (2015) Causal inference for statistics, social, and biomedical sciences: an introduction. Cambridge University Press.
  • N. Kallus and A. Zhou (2021) Minimax-optimal policy learning under unobserved confounding. Management Science 67 (5), pp. 2870–2890.
  • E. H. Kennedy (2020) Optimal doubly robust estimation of heterogeneous causal effects. arXiv preprint arXiv:2004.14497.
  • F. Li, K. L. Morgan, and A. M. Zaslavsky (2018) Balancing covariates via propensity score weighting. Journal of the American Statistical Association 113 (521), pp. 390–400.
  • M. Lu, S. Sadiq, D. J. Feaster, and H. Ishwaran (2018) Estimating individual treatment effect in observational data using random forest methods. Journal of Computational and Graphical Statistics 27 (1), pp. 209–219.
  • M. Maclure (1991) The case-crossover design: a method for studying transient effects on the risk of acute events. American Journal of Epidemiology 133 (2), pp. 144–153.
  • R. J. Marshall and R. T. Jackson (1993) Analysis of case-crossover designs. Statistics in Medicine 12 (24), pp. 2333–2341.
  • S. A. Murphy (2003) Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65 (2), pp. 331–355.
  • J. Pearl, M. Glymour, and N. P. Jewell (2016) Causal inference in statistics: a primer. John Wiley & Sons.
  • P. R. Rosenbaum and D. B. Rubin (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70 (1), pp. 41–55.
  • D. B. Rubin (1980) Comment on ‘Randomization analysis of experimental data: the Fisher randomization test’ by D. Basu. Journal of the American Statistical Association 75 (371), pp. 591–593. http://dx.doi.org/10.1080/01621459.1980.10477517
  • U. Shalit, F. D. Johansson, and D. Sontag (2017) Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, pp. 3076–3085.

Appendix A Supplement to Section 4

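Assume the graph in Figure 1, let $X$ and $U$ each be Gaussian, and assume a correlation $\rho$ between $X$ and $U$. Further assume a linear model for the data-generating process of the outcome variable for each individual $i$, including interactions of the treatment $T$ with both $X$ and $U$.

Under this model, the ITE for an individual with covariate values $x_i$ and $u_i$ would be a linear function of both $x_i$ and $u_i$, and the CATE would be a linear function of $x$ alone.

To further investigate the relationship between the ITE and the CATE, we can investigate their covariance and correlation. In this population, it follows that the covariance between the CATE and ITE is nonnegative, and combining it with the variances of the CATE and of the ITE gives their correlation.

Therefore, under this outcome model and data-generating process it follows that

$$\mathrm{Corr}\bigl(\tau(X_i), \tau_i\bigr) \geq 0,$$

with equality when the coefficient on $x$ in the CATE is equal to zero. That is, when there is no heterogeneity with respect to $X$ in the CATE.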
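The nonnegativity of this correlation can be sketched with a short derivation. The first part holds for any ITE $\tau_i$ with finite, nonzero variance and $\tau(X) = E[\tau_i \mid X]$. The second part specializes to a linear model whose coefficient names ($\beta_T$, $\beta_{TX}$, $\beta_{TU}$) and zero-mean covariates with standard deviations $\sigma_X$ and $\sigma_U$ are assumptions introduced here for illustration.

```latex
\begin{align*}
\mathrm{Cov}\bigl(\tau_i, \tau(X)\bigr)
  &= E\bigl[\tau(X)\, E[\tau_i \mid X]\bigr] - E[\tau_i]\, E[\tau(X)]
   = E\bigl[\tau(X)^2\bigr] - \bigl(E[\tau(X)]\bigr)^2
   = \mathrm{Var}\bigl(\tau(X)\bigr) \geq 0, \\[4pt]
\mathrm{Corr}\bigl(\tau_i, \tau(X)\bigr)
  &= \sqrt{\frac{\mathrm{Var}\bigl(\tau(X)\bigr)}{\mathrm{Var}(\tau_i)}} \in [0, 1]. \\[8pt]
\text{Assume } \tau_i &= \beta_T + \beta_{TX} X_i + \beta_{TU} U_i,
  \qquad E[U \mid X = x] = \rho \frac{\sigma_U}{\sigma_X} x, \\
\tau(x) &= \beta_T + \Bigl( \beta_{TX} + \beta_{TU}\, \rho \frac{\sigma_U}{\sigma_X} \Bigr) x,
  \qquad
\mathrm{Var}\bigl(\tau(X)\bigr)
   = \Bigl( \beta_{TX} + \beta_{TU}\, \rho \frac{\sigma_U}{\sigma_X} \Bigr)^{2} \sigma_X^2 .
\end{align*}
```

In this sketch the correlation is zero exactly when $\beta_{TX} + \beta_{TU}\,\rho\,\sigma_U / \sigma_X = 0$, which is the case of a CATE that does not vary with $x$.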