Entropy Balancing for Generalizing Causal Estimation with Summary-level Information

by Rui Chen, et al.

In this paper, we focus on estimating the average treatment effect (ATE) of a target population when individual-level data from a source population and summary-level data (e.g., first or second moments of certain covariates) from the target population are available. In the presence of heterogeneous treatment effects, the ATE of the target population can differ from that of the source population when the distributions of treatment effect modifiers are dissimilar in the two populations, a phenomenon also known as covariate shift. Many methods have been developed to adjust for covariate shift, but most require individual-level covariates from the target population. We develop a weighting approach based on summary-level information from the target population to adjust for possible covariate shift in effect modifiers. In particular, the weights of the treated and control groups within the source population are calibrated to the summary-level information of the target population. Our approach also seeks additional covariate balance between the treated and control groups in the source population. We study the asymptotic behavior of the corresponding weighted estimator for the target population ATE under a wide range of conditions. The theoretical implications are confirmed in simulation studies and a real data application.




1 Introduction

It is often of interest to apply causal findings, such as the average treatment effect (ATE), from a medical study in one population to another population based on observed characteristics (Colnet et al., 2020; Degtiar and Rose, 2021). This problem is termed generalizability (Cole and Stuart, 2010; Tipton, 2013; Buchanan et al., 2018), external validity (Rothwell, 2005), or transportability (Rudolph and van der Laan, 2017; Dahabreh and Hernán, 2019). It is well known that such generalization may be problematic when the treatment effect is heterogeneous and there is covariate shift in treatment effect modifiers (Sugiyama et al., 2007). Covariate shift refers to a difference in the distribution of a covariate between the two populations.

For example, ATE from a properly planned and conducted randomized trial may not be generalizable when the treatment effect depends on certain covariates and these covariates can have different distributions between the study population and the target population for generalization. In other words, we could obtain an unbiased estimate of ATE for the trial population, but the estimate may not equal the ATE of the target population if the study participants do not represent the target population well with respect to treatment effect modifiers.

Over the past decade, a common setup for such causal generalization research has assumed that treatment assignment, outcome and covariates are fully observed for the source population, while only covariates are available for the target population (Cole and Stuart, 2010; Tipton, 2013; Rudolph and van der Laan, 2017; Buchanan et al., 2018; Dahabreh et al., 2020; Lu et al., 2021). Under this scenario, most existing methods rely on modeling the trial participation probability, which reflects the similarity between subjects in the two populations. The probability is modeled using individual-level data, and the estimated probability is then used in the subsequent analysis for reweighting (Cole and Stuart, 2010; Buchanan et al., 2018) or post-stratification (Cole and Stuart, 2010). In addition, the methods typically require the source sample to be representative of the target population.

Some existing methods also incorporate outcome modeling to improve estimation efficiency (Rudolph and van der Laan, 2017; Dahabreh et al., 2020; Yang et al., 2020). Similarly, although the outcome models can be estimated with the source population data, the fitted models need to be applied on individual-level target population data.

Nevertheless, detailed individual-level information is not always available due to many practical reasons such as restricted data sharing, storage limitation, and privacy concerns (Degtiar and Rose, 2021). In contrast, summary-level information of a target population is more easily accessible. Such information can be collected from population-based census data, disease registries and health care databases, and published literature.

In this paper, we develop a weighting strategy that can incorporate summary-level data from the target population (Hartman et al., 2015; Westreich et al., 2017). We propose to enhance the entropy balancing weighting framework (Hainmueller, 2012) with additional treatment-control balancing constraints. Since these constraints only concern covariate balance between the treated and control groups, they allow a much more flexible choice of covariate functions and thus make better use of the source population data. Our theoretical results show that the proposed method not only achieves consistent estimation under a substantially broader range of situations, but also yields higher estimation efficiency. The proposed method is particularly appealing when the treatment assignment mechanism in the source population data or the potential outcome models are complex, but the effect modification structure is relatively simple.

This paper is organized as follows. Section 2 introduces the notation and framework. In Section 3, we develop our weighting approach by extending the entropy balancing method. In Section 4, we characterize the theoretical properties of the proposed method, including the limiting function of the weights, and the consistency conditions and asymptotic variance of the corresponding estimator. Sections 5 and 6 compare our method to other weighting methods through simulation studies and a real data application, respectively. We conclude the paper with a discussion in Section 7.

2 Notations and Framework

Suppose we have collected data from subjects, , from a source population .

denotes a vector of pre-treatment covariates which contain confounding factors and treatment effect modifiers,

is a binary treatment indicator and is the outcome of interest. Additionally, suppose we have access to first moments of a set of linearly independent covariate functions from a target population. We assume the first moments are computed on a representative sample consisting of subjects from the target population. But the individual data from is not available for analysis. Namely, we only have the following information from the target population:


We use the potential outcome framework (Rubin, 1974; Rosenbaum and Rubin, 1983) to formulate the causal problem. Under the Stable Unit Treatment Value Assumption (SUTVA), which posits no interference between different subjects and no hidden variation of treatments, each subject has two potential outcomes and , the values of the outcome that would be observed if were to receive control or treatment, respectively. Then the observed outcome in the source sample is . For subjects in the target sample, neither of the potential outcomes is observed.

We associate each subject, either in the source or target sample, with a “full” random vector , where is a population indicator such that for and for . For , can take arbitrary value and will not affect the following analysis. The total sample size is . These random variates across

are assumed to be i.i.d. draws from a joint distribution of

. All the probability calculations and expectations below are taken with respect to this distribution. Specifically, the ATE of the target population can be expressed as

which is the estimand of interest in this paper.

We assume that the treatment assignment mechanism in the source sample is determined by a propensity score (Rosenbaum and Rubin, 1983), which is potentially unknown. We further denote and refer to this as participation probability (Dahabreh et al., 2020). In addition to the SUTVA, we impose the following identifiability assumptions throughout the paper.

Assumption 1.

(Unconfoundedness of treatment assignment) In the source population, are conditionally independent of given : .

Assumption 2.

(Positivity of propensity score) The propensity score of the source population is bounded away from 0 and 1: for some , almost surely.

Assumption 3.

(Mean exchangeability across populations) The conditional means of the potential outcomes given the covariates are equal between the two populations: almost surely for .

Assumption 4.

(Positivity of participation probability) The participation probability is bounded away from 0: almost surely for some .

The first two are common assumptions in causal inference, and together with the SUTVA enable identification of causal quantities with respect to the source population from the observed data. The last two assumptions are adopted from Rudolph and van der Laan (2017) and Dahabreh et al. (2020) and allow us to generalize causal estimates to the target population (Colnet et al., 2020).

In what follows, we use to denote the index set of the subjects in the control group of the source sample, namely, ; is similarly defined for the treated group. We assume the potential outcomes have finite second moments given the covariates, and denote the conditional mean and variance of the potential outcomes in the source population by

Under Assumption 3, . The conditional average treatment effect (CATE) function is defined as .

3 Methodology

Dahabreh et al. (2020) discussed commonly used estimators for , the ATE on the target population, including the outcome regression estimator and the doubly robust estimator. However, these methods involve applying fitted outcome models on the target data, which is generally infeasible in the current setting.

We restrict our focus to weighting estimators for . Under Assumptions 1-4, we can express in terms of the observable:


where the weighting function is given by

Therefore given a set of weights on the source population data , a possible estimator can take the following form:


As depends on (and ), directly modeling is also infeasible. The idea is to use the target summary information (1) to calibrate the weights.
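As a minimal sketch of how a weighted estimator of this kind can be computed, the function below takes the difference between the weighted outcome means of the treated and control units. It assumes, in line with the normalization used in the simulation section, that the weights are rescaled to sum to one within each treatment arm. The function name `weighted_ate` and the argument names are illustrative and do not come from the paper.

```python
import numpy as np

def weighted_ate(y, a, w):
    """Difference of weighted outcome means between treated and control units.

    y: outcomes; a: 0/1 treatment indicators; w: non-negative weights.
    Assumption: weights are renormalized to sum to one within each arm.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    treated = np.asarray(a).astype(bool)
    mean_treated = np.sum(w[treated] * y[treated]) / np.sum(w[treated])
    mean_control = np.sum(w[~treated] * y[~treated]) / np.sum(w[~treated])
    return mean_treated - mean_control
```

Because of the within-arm renormalization, the result does not change if all weights in one arm are multiplied by a common constant.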

This calibration idea for weighted estimators has been extensively studied for causal inference (Hainmueller, 2012; Imai and Ratkovic, 2014; Chan et al., 2016). In the usual setting without the need for causal generalization, Hainmueller (2012) proposed to construct the weights for the control units as the solution of the following optimization problem (for the treated units, the weights are taken as ):

subject to

To normalize the weights so that , we include as one of the constraints. The balancing constraints equalize the sample means of the covariate functions between the treated group and the weighted control group; meanwhile, the optimization objective keeps the dispersion metric, the negative of the entropy, at a minimum so that the weights are as close to uniform as possible. Zhao and Percival (2016) investigated the theoretical properties of the weights given by (4) and the corresponding ATT estimator. They showed that the estimator enjoys the so-called double robustness property: if either the potential outcome models are linear in the covariate functions being balanced, or the logit of the propensity score is linear in these functions, then the estimator is consistent. Further, if both conditions hold, then the semiparametric efficiency bound in Hahn (1998) is achieved.

3.1 Existing weighting approaches for causal generalization

In this section, we review existing entropy balancing and model-based weighting approaches for causal generalization that are directly related to our setting. Our method, which extends entropy balancing weights, will be introduced in the next section.

Dong et al. (2020) adapted the entropy balancing weights approach for generalizing ATE estimation from a randomized controlled trial (RCT) to a given target population. Since in an RCT the covariates of the treated and control groups are well-balanced by design, they did not distinguish these two groups in their weighting strategy and proposed to construct weights on the entire source sample by

subject to

So the weights calibrate the sample averages of the covariate functions on the source sample to those on the target sample. By a duality argument, they showed that the solution to (5) admits the form of exponential tilting: for some such that the balancing constraints are satisfied. These weights take the same form as the exponential tilting adjustment method considered in Signorovitch et al. (2010). In fact, the entropy balancing method based on (5) is equivalent to calibrating the covariate shift with an exponential tilting because the estimating equations that Signorovitch et al. (2010) utilized to estimate their exponential tilting parameters are exactly the balancing constraints in (5). In an application generalizing the ATE of Rosuvastatin from an RCT to a target population, Hong et al. (2019) showed that the exponential tilting adjustment approach yielded estimates close to those obtained using individual-level target sample data for covariate shift adjustment.
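The exponential tilting form suggests a simple way to compute such weights numerically. The sketch below is an illustration rather than the implementation used in the cited works: it assumes the source covariate functions are stored in an array `G` (one row per subject, no constant column) and the target sample means in `target_means`, both hypothetical names. It minimizes the log-sum-exp dual of the entropy problem by a damped Newton method, so that the gradient of the dual is exactly the gap between the weighted source means and the target means.

```python
import numpy as np

def exponential_tilting_weights(G, target_means, max_iter=100, tol=1e-10):
    """Weights proportional to exp(lam' g(X_i)) whose weighted column means
    of G match target_means. Names are illustrative, not the paper's.

    G: (n, K) covariate functions on the source sample, no constant column.
    target_means: (K,) summary-level means from the target sample.
    """
    Gc = np.asarray(G, dtype=float) - np.asarray(target_means, dtype=float)
    lam = np.zeros(Gc.shape[1])

    def dual(l):
        # Numerically stable log-sum-exp of the centred linear predictor.
        eta = Gc @ l
        return eta.max() + np.log(np.exp(eta - eta.max()).sum())

    for _ in range(max_iter):
        eta = Gc @ lam
        w = np.exp(eta - eta.max())
        w /= w.sum()
        grad = Gc.T @ w  # weighted source mean minus target mean
        if np.max(np.abs(grad)) < tol:
            break
        # Hessian of the dual: weighted covariance of the centred functions.
        hess = Gc.T @ (Gc * w[:, None]) - np.outer(grad, grad)
        step = np.linalg.solve(hess, grad)
        t = 1.0  # backtracking line search on the dual objective
        while t > 1e-12 and dual(lam - t * step) > dual(lam) - 1e-4 * t * (grad @ step):
            t *= 0.5
        lam = lam - t * step

    eta = Gc @ lam
    w = np.exp(eta - eta.max())
    return w / w.sum(), lam
```

The target means must lie strictly inside the range spanned by the source sample for a solution to exist; the sketch does not check this.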

Josey et al. (2020) then extended the entropy balancing weights approach to the setting when the source sample is from observational studies. To alleviate possible covariate imbalance or confounding between the treated and control groups in observational studies, they proposed a two-step procedure to adjust for covariate shift and confounding separately. They first computed by solving (5), and then used a subsequent step to further adjust for the treatment-control imbalance:

subject to

So the constraints in (6) calibrate the treated and control groups to the target population separately. They showed the resulting weights can also be written as exponential tilting: , . Both Dong et al. (2020) and Josey et al. (2020) established double robustness properties of the weighting estimators similar to those in Zhao and Percival (2016).

Alternatively, one could use a model-based weighting method that fully utilizes the individual-level source sample. Specifically, we can use the source sample to estimate the propensity score first, and then set the weights as for the treated units and for the control units. Since (5) is equivalent to modeling with exponential tilting, the weights constructed in this way constitute estimates of . However, this approach is vulnerable to estimation error in the estimated propensity score and can produce extreme weights, as can happen with conventional inverse propensity score weights (Kang et al., 2007; Chattopadhyay et al., 2020). In Section 5, we systematically compare this method with the proposed weights (8), and the results suggest that our entropy balancing weighting approach leads to more favorable performance.

3.2 Enhance entropy balancing with additional balancing constraints

We first show that the weights produced by the two-step procedure of Josey et al. (2020) can be consolidated into a one-step procedure. The simplified procedure simply takes as 1 in (6). Note that in this case, the right-hand sides of the constraints in (6) become . Hence, the simplified procedure computes the weights by

subject to

The equivalence between the two-step procedure of Josey et al. (2020) and (7) is based on the dual representations of and in (6). To see this, first note that the solutions from (6) take the exponential tilting form, for and that the weights given by (7) also take the exponential tilting form . So these two sets of weights have the same parametric form. Further, they satisfy the same set of constraints. When is linearly independent, we must have . Therefore, (7) produces exactly the same weights as (6) without first computing .

From (7) we see that Josey et al. (2020) try to overcome covariate imbalance between the treatment and control groups in the source population by calibrating based on the covariate functions . However, these functions are available only because they summarize the target sample; they may be neither flexible enough nor designed to address confounding of treatment assignment in the source population. In other words, balancing only on may not provide sufficient calibration of covariate balance between treatment and control groups in the source population.

Therefore we propose to add further balancing requirements between the treated and control groups in the source population to (7) by utilizing an additional set of functions . This gives rise to the following:

subject to

Without loss of generality, we assume the union set of covariate functions is linearly independent. To choose possible , one can use those covariate functions that are deemed important for confounding but not covered in . Our theoretical results in Section 4 give more guidance on the selection of . It is worth noting that (7) can be viewed as a special case of (8) with .

To better understand the role of the weights given by (8) and facilitate theoretical investigation, the following proposition characterizes the dual problem. The proof of this proposition as well as other results in Section 4 are relegated to the Supplementary Materials.

Proposition 1.

Let and . The solution of (8) takes the following form:

where is the solution to the dual problem:


where .

Similar to the previous entropy balancing approaches, the weights given by (8) also admit the form of exponential tilting, which is parameterized by a dual vector of length , the number of balancing constraints.

Proposition 1 also reveals an efficient way to solve (8). Since the dual problem is an unconstrained convex optimization and has a closed-form derivative, it can be efficiently solved by common convex optimization algorithms such as the Newton-Raphson method. We note that the first-order optimality conditions of the dual problem (9) are exactly the balancing constraints in (8).

Compared to the weights given by (7), the extended version (8) leads to better versatility as it allows the weights to depend on , which can be a much broader set of covariate functions than . However, unlike the association between the weights and where coefficients on for the treated group can be completely different from those for the control group, the association between the weights and is governed by a special structure: the coefficients for the two groups must be opposite of each other.
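To make the dual approach and the sign structure concrete, the following sketch solves an extended entropy balancing problem of the kind described for (8) by a damped Newton method on an unconstrained convex dual. It rests on several assumptions that are not taken from the paper: the array names `G`, `H`, `a` and `g_target` are hypothetical, `G` is assumed to carry a constant column of ones first (with the matching entry of `g_target` equal to one, which normalizes the weights within each arm), and the dual is derived from a plain Lagrangian of the objective sum of w log w, so its scaling may differ from the form stated in Proposition 1. The design matrix `Z` encodes the structure discussed above: separate coefficients on the target-calibrated functions for each arm, and a shared coefficient on the additional functions that enters with opposite signs.

```python
import numpy as np

def extended_entropy_balancing(G, H, a, g_target, max_iter=200, tol=1e-9):
    """Sketch of entropy balancing with extra treated-control constraints.

    G: (n, K) functions with known target means; G[:, 0] must be all ones.
    H: (n, L) extra functions balanced between the arms only.
    a: (n,) 0/1 treatment indicators.
    g_target: (K,) target means of G, with g_target[0] == 1.
    """
    G = np.asarray(G, dtype=float)
    H = np.asarray(H, dtype=float)
    g_target = np.asarray(g_target, dtype=float)
    treated = np.asarray(a).astype(bool)
    n, K = G.shape
    L = H.shape[1]

    # Treated rows use [G, 0, +H]; control rows use [0, G, -H].
    Z = np.zeros((n, 2 * K + L))
    Z[treated, :K] = G[treated]
    Z[~treated, K:2 * K] = G[~treated]
    Z[:, 2 * K:] = np.where(treated[:, None], H, -H)
    b = np.concatenate([g_target, g_target, np.zeros(L)])

    # Start from uniform weights that sum to one within each arm.
    theta = np.zeros(2 * K + L)
    theta[0] = 1.0 - np.log(treated.sum())
    theta[K] = 1.0 - np.log((~treated).sum())

    def dual(th):
        with np.errstate(over="ignore"):
            return np.exp(Z @ th - 1.0).sum() - b @ th

    for _ in range(max_iter):
        w = np.exp(Z @ theta - 1.0)
        grad = Z.T @ w - b  # zero exactly when all balancing constraints hold
        if np.max(np.abs(grad)) < tol:
            break
        hess = Z.T @ (Z * w[:, None])
        step = np.linalg.solve(hess, grad)
        t = 1.0  # backtracking line search on the dual objective
        while t > 1e-12 and dual(theta - t * step) > dual(theta) - 1e-4 * t * (grad @ step):
            t *= 0.5
        theta = theta - t * step

    return np.exp(Z @ theta - 1.0), theta
```

Setting the gradient of this dual to zero reproduces the balancing constraints, which is the property noted above for the first-order optimality conditions. The linear solve requires the columns of `G` and `H` to be jointly linearly independent, matching the assumption made on the union set of covariate functions.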

4 Theoretical properties

In this section, we study the theoretical properties of using weights given by the extended entropy balancing method (8). We assume the following regularity conditions:

  1. , where is a compact set and stands for the interior of a set.

  2. There exists a constant such that and .

We first introduce the following lemma to characterize convergence of the weights.

Lemma 1.

Suppose for some . Let be the dual parameters in Proposition 1. Then for some satisfying where


It follows from Proposition 1 and Lemma 1 that the probability limits of the weights for the treated units can be obtained by the following function:

Similarly, the limits of weights for the control units can be obtained from

. In other words, when the propensity score follows the usual logistic regression w.r.t. the covariate functions

, the weights converge to the inverse of the propensity score multiplied by . Note that is a positive function and satisfies , so defines a density ratio over the source population. This means the weights asymptotically calibrate the observed source population data to a hypothetical population whose density ratio against the source population is .

However, this hypothetical population is not necessarily the target population. Since the only available information from the target population is the sample means on , the covariate distribution of the target population is unidentifiable in general. Intuitively, our weighting approach essentially utilizes a parametric family of distributions to match these sample means. With this insight, the following theorem identifies conditions under which the resulting weighting estimator is consistent for .

Theorem 1 (Consistency).

Suppose is the solution of (8). If any one of Conditions (a)-(c) below holds, is a consistent estimator of :

  Condition (a). , .

  Condition (b). , and for some the density ratio between the target population and the source population can be written as where satisfy .

  Condition (c). and .

Note that Theorem 1 requires only one of Conditions (a)-(c). Conditions (a) and (b) are similar to the requirements for the double robustness property established in previous entropy balancing works (Zhao and Percival, 2016; Dong et al., 2020; Josey et al., 2020). Condition (c) identifies another situation in which consistent estimation can be achieved. This condition has an important implication for our generalization setting because the first part of the condition is likely to hold with some careful choice of and the second part only requires the sample means of the effect modifiers from the target population contained in . Intuitively, since the difference in ATEs between the source and the target populations arises from their differences in effect modifier distributions, calibrating the estimate can only be successful when there is sufficient information about the effect modifiers from the target population. In this sense, the second part of Condition (c) is a minimal requirement for consistency.

As (7) is a special case of (8) with being a null set, the theorem also applies to the estimates from Josey et al. (2020). In this case, Conditions (b) and (c) require to be linear in . These conditions become more stringent as may be conveniently chosen and may not contain all the important confounders. On the contrary, the additional balancing constraints in (8) create the possibility of adjusting for confounding on an additional set of covariates that can be carefully chosen.

Theorem 2 (Asymptotic variance).

Suppose is the solution of (8). Assume that either Condition (b) or Condition (c) in Theorem 1 holds. Then the asymptotic variance of with given by (8) is


Here , is defined as in (10), and . denotes projection on w.r.t. covariate distribution . Specifically,

and are defined in a similar manner. denotes projection on , where .

The first term of the asymptotic variance (11) is equal to

when Condition (b) holds. If Condition (c) holds, obviously . Therefore, when Conditions (b) and (c) both hold, the sum of first two terms of (11) becomes

This is exactly the semiparametric efficiency bound for estimating if individual-level target sample data is available. Note that the third term of (11) is always non-negative, so when Conditions (b) and (c) hold, this term quantifies how much the asymptotic variance exceeds the efficiency bound. If Condition (a) also holds, it is easy to check that the third term vanishes and thus the efficiency bound is achieved by the proposed method.

The asymptotic variance also characterizes the efficiency gain from the additional balancing constraints between the treated and control groups on . When Condition (c) holds, . Then by we have . Hence,

where the projection on . Similarly, implies . Since we conclude that (8) results in a more efficient estimation than (7) even when is not related to the propensity score.

Such a result is not surprising: it has been well noted in the causal inference literature that including covariates that affect the outcomes but are not necessarily related to treatment assignment can lead to efficiency gains (Brookhart et al., 2006; Shortreed and Ertefaie, 2017). This result also suggests that the proposed method can improve the precision of generalizing effect estimates from an RCT to a target population, especially when the outcomes are largely affected by some covariates. Although randomization ensures that these covariates follow the same distribution across the treated and control groups, covariate imbalance often occurs in finite samples due to sampling error (Li and Ding, 2020). The proposed method provides a strategy to tackle such imbalance without resorting to regression adjustment.

5 Simulation studies

In this section, we conduct simulation studies to evaluate the performance of the proposed method in finite sample settings. In our simulation setup, we generate covariates

from a uniform distribution on

. The participation probability is set as , where , so the covariates have different distributions across the source population and the target population. We suppose the only available information from the target population is the sample means of and ; in other words, . We consider balancing on the first moments of all covariates, so .

To test the performance under various scenarios, we consider three propensity score models for treatment assignment:

  (P1) ; (linear, only related to )

  (P2) ; (linear, related to and )

  (P3) . (nonlinear, related to and )

In the first scenario, the logit of the propensity score is linear in , so covariate balancing on suffices to account for confounding. In the last two scenarios, the propensity score is also related to . Furthermore, the third setting contains a nonlinear term.

The observed outcomes in the source sample are generated as . Here , and the other functions are designed as follows. The CATE function is set as

Thus, all the effect modifiers are contained in . As for , we consider the following two settings:

  (M1) ,

  (M2) .

Under the first setting of , all the potential outcome models are linear in the covariates, so Condition (a) in Theorem 1 is satisfied. Under the second setting, however, this condition no longer holds.

For comparison, we consider other methods for constructing weights, including both modeling and balancing approaches. First, we consider the conventional inverse probability weight (IPW) based on modeling the propensity score. We use the source sample to fit a logistic regression for against all the covariates, including and . Then the weights are computed as for the treated units and for the control units. This method is labeled as IPW. Note that the IPW approach does not incorporate the sample means of the target sample to adjust for covariate shift. As discussed in Section 3.1, the covariate shift adjustment approach of Dong et al. (2020) is equivalent to modeling with exponential tilting, so we can combine it with the IPW method. Specifically, we compute by solving (5), and then set the weights as for the treated units and for the control units. This method is labeled as IPW+ET, where “ET” stands for exponential tilting. The last comparator method, labeled as EBAL, uses the weights proposed by Josey et al. (2020), which are given by (7). Each set of weights is normalized so that the sums of the weights within the treated and control groups are both equal to . ATEs are estimated from (3). The total sample size is set as .

The performance is measured in terms of estimation error with respect to . Figure 1 plots the estimation errors over 400 independent runs in boxplots. The conventional IPW gives biased results under all scenarios because it does not account for covariate shift. Although has different distribution across the source population and the target population, only the covariate shift on the effect modifiers, i.e., and , needs to be adjusted. The entropy balancing weights given by (7) can serve this end and adjust for confounding simultaneously when all the confounders are contained in (P1). However, if misses some confounders (P2 or P3), EBAL no longer gives consistent estimates. In contrast, our extended entropy balancing strategy is able to retain consistency under a wider range of situations. Moreover, when the consistency condition for (7) holds, the proposed method can achieve higher efficiency. The efficiency improvement comes from balancing on covariates that relate to the outcome models even though they are not in the propensity score ( and in this case).

Although IPW+ET, like the proposed method, can achieve consistent estimation in many scenarios, it is less efficient. In a number of simulation runs, this method produces rather unstable estimates, even when the propensity score models are perfectly specified. Such instability arises because inverting the probability estimates can inflate the estimation error when the estimated probability is close to 0 or 1. In contrast, the balancing-based weighting approaches generally lead to more stable estimation.

Figure 1: Boxplots of estimation errors under various simulation scenarios.

6 Application: TTEC and mortality in sepsis

In this section, we illustrate the proposed method by evaluating the treatment effect of transthoracic echocardiography (TTEC) for intensive care unit (ICU) patients with sepsis. We use the same dataset as Feng et al. (2018), which is derived from the MIMIC-III database (Johnson et al., 2016). This is an observational dataset consisting of 6361 ICU patients. Among the patients, 51.3% had TTEC performed during, or within 24 hours before, their ICU admission. We use 28-day survival as the outcome. The dataset contains 17 baseline variables: age (range 18 - 91), gender and weight; severity at admission, measured by the simplified acute physiology score (SAPS), the sequential organ failure assessment (SOFA) score and the Elixhauser comorbidity score; comorbidity indicators, including congestive heart failure, atrial fibrillation, respiratory failure and malignant tumor; vital signs, including mean arterial pressure, heart rate and temperature; and laboratory results, including platelet count, partial pressure of oxygen, lactate, and blood urea nitrogen. The distributions of the lab results are right-skewed, so we apply log transformations to these variables. All the continuous variables are then standardized for further analysis. We impute any missing values using MissForest (Stekhoven and Bühlmann, 2012), which is a flexible non-parametric missing value imputation approach.

We assume only the average values of the demographic covariates (age, gender, weight) and comorbidity indicators (congestive heart failure, atrial fibrillation, respiratory failure, malignant tumor) are available from the target population.

To evaluate the generalization performance under a wide range of settings, we use the following sampling design to induce different levels of covariate shift and confounding while preserving the covariate-outcome relationship in the real data. First, we sample 40% of the data to construct the source sample, and the remaining data are randomly split in equal parts into a target sample and a test sample. In this way, the test sample follows the same distribution as the target sample. The probability of being selected into the source sample is proportional to

where and is the standard normal cumulative distribution function. Here cmb’s represent the comorbidity indicators and is a parameter to induce different levels of covariate shift. We set for small covariate shift and for large covariate shift. Under this sampling design, the target population is older and more likely to have comorbidities than the source population. Among the sampled 40% of the data, we randomly select 50% of the TTEC patients and 50% of the non-TTEC patients to form the source sample. The treated units are selected with probability proportional to while the control units are selected with probability proportional to , where is set as

We consider two choices of : (a) , so all the patients in this step are sampled with equal probability; (b) , which induces additional confounding determined by a linear combination of the severity scores.
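The two-stage sampling design can be sketched as below. This is a hedged illustration only: the selection scores are passed in as plain arrays because their exact form is not reproduced here, the function names `split_source_target_test` and `subsample_arm` are hypothetical, and drawing without replacement with unequal probabilities is used as one simple way to realize "probability proportional to" a score.

```python
import numpy as np

def split_source_target_test(scores, source_frac=0.4, seed=0):
    """Draw a source pool with probability proportional to `scores`,
    then split the remaining units evenly into target and test samples."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    n_source = int(round(source_frac * n))
    pool = rng.choice(n, size=n_source, replace=False, p=scores / scores.sum())
    rest = rng.permutation(np.setdiff1d(np.arange(n), pool))
    half = rest.size // 2
    return pool, rest[:half], rest[half:]

def subsample_arm(idx, scores, frac=0.5, seed=0):
    """Within one treatment arm of the pool, keep a fraction `frac` of the
    units, drawn with probability proportional to `scores[idx]`."""
    rng = np.random.default_rng(seed)
    idx = np.asarray(idx)
    p = np.asarray(scores, dtype=float)[idx]
    k = int(round(frac * idx.size))
    return rng.choice(idx, size=k, replace=False, p=p / p.sum())
```

Passing constant scores to `subsample_arm` corresponds to equal-probability sampling within an arm, while scores that depend on the severity measures induce additional confounding.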

Method Setting 1 Setting 2 Setting 3 Setting 4
  • Setting 1: small covariate shift, no extra confounding;
  • Setting 2: small covariate shift, extra confounding;
  • Setting 3: large covariate shift, no extra confounding;
  • Setting 4: large covariate shift, extra confounding.

Table 1: Data analysis results (the biases and RMSEs are multiplied by 100).

In total we consider four settings; under each one, we run 800 replications of the sampling procedure. For each replication, we apply the proposed method and the comparator methods from Section 5 to estimate the ATE for the target population. To obtain a benchmark for these results, we compute an oracle estimate by constructing the conventional IPW estimator using the test data. This estimate should be closer to the actual ATE of the target population because it uses individual-level data drawn from the same distribution as the target sample. The estimation error is recorded as the difference between each generalization estimate and the oracle estimate. Table 1 reports the mean of the estimation errors as the bias and their root mean square as the RMSE. The proposed method usually achieves the smallest bias and RMSE among all the weighting methods, especially when there is a substantial difference between the source and target populations.
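The bias and RMSE summaries can be computed from the replication-level estimates as in the sketch below. The function name `summarize_errors` and its arguments are hypothetical; the factor of 100 follows the scaling stated in the caption of Table 1.

```python
import numpy as np

def summarize_errors(estimates, oracle, scale=100.0):
    """Bias and RMSE of generalization estimates relative to oracle estimates.

    `estimates` and `oracle` hold one value per replication; `oracle` may also
    be a single number shared by all replications.
    """
    err = np.asarray(estimates, dtype=float) - np.asarray(oracle, dtype=float)
    return {
        "bias": scale * err.mean(),                    # mean estimation error
        "rmse": scale * np.sqrt(np.mean(err ** 2)),    # root mean square error
    }
```

For example, `summarize_errors(est[method], oracle)` would be called once per method and setting, where `est` is a hypothetical dictionary of per-replication estimates.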

7 Discussion

In this paper, we have proposed a covariate balancing weighting approach for estimating the ATE of a given target population when only summary-level data from the target population are available. The proposed method is motivated by the recently developed entropy balancing methods (Hainmueller, 2012; Dong et al., 2020; Josey et al., 2020), but is extended with additional treatment-control balancing terms over a broader set of covariate functions. We characterize the consistency conditions and asymptotic variance of the corresponding weighting estimator. There are three main benefits of the proposed method. First, the only required inputs from the target population are the sample means of some covariate functions rather than individual-level data, which allows the method to be applied in many practical settings because such summary-level data are usually easier to collect. Second, compared to other covariate balancing weighting methods, our method allows more flexibility in the selection of covariate functions to balance, which are not limited to the information available from the target population. As a result, our method can achieve not only consistent estimation under broader scenarios but also better estimation efficiency. Third, compared to weighting methods based on probability modeling, our method constructs the weights directly and thus attenuates the instability of inverse weighting due to modeling error. The numerical results confirm the advantages of our extended entropy balancing method.
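To make the calibration idea concrete, the sketch below implements basic entropy balancing against target-population means, solved through its dual problem. It is an illustration under stated assumptions and not the extended method proposed in this paper: it omits the additional treatment-control balancing terms, the names `entropy_balance` and `weighted_ate` are hypothetical, and it assumes the target means lie inside the convex hull of the source covariate functions so that exact balance is feasible.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, softmax

def entropy_balance(C, target_means):
    """Weights minimizing sum_i w_i log w_i subject to sum_i w_i = 1 and
    sum_i w_i C[i] = target_means (C holds covariate functions, one row per unit)."""
    Z = np.asarray(C, dtype=float) - np.asarray(target_means, dtype=float)
    # Dual objective: log-sum-exp of the centered linear scores.
    dual = lambda lam: logsumexp(Z @ lam)
    # Its gradient is the weighted imbalance, which vanishes at the optimum.
    grad = lambda lam: softmax(Z @ lam) @ Z
    res = minimize(dual, np.zeros(Z.shape[1]), jac=grad, method="BFGS")
    return softmax(Z @ res.x)

def weighted_ate(C, T, Y, target_means):
    """Calibrate treated and control units separately to the target means,
    then take the difference of the weighted outcome means."""
    C, T, Y = np.asarray(C, dtype=float), np.asarray(T), np.asarray(Y, dtype=float)
    w1 = entropy_balance(C[T == 1], target_means)
    w0 = entropy_balance(C[T == 0], target_means)
    return w1 @ Y[T == 1] - w0 @ Y[T == 0]
```

Only `target_means` is needed from the target population, which is the sense in which summary-level information suffices. The resulting weights have the exponential-tilting form, so they are strictly positive and no propensity or sampling model is inverted.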

Throughout this paper we have focused on seeking covariate balancing on two fixed sets of functions and . In practice, while the choice of is determined by what information is available from the target population, the specification of can be a subtle issue. If contains too few functions, it may not be rich enough to account for confounding; on the other hand, if contains too many functions, the linear constraints in (8) may not admit a feasible solution. There are a few extensions that we can consider to this end. First, we may allow the dimension of to grow with the source sample size, for example, using the method of sieves, similar to Chan et al. (2016). Also, to ensure the feasibility of the balancing constraints, we can seek approximate balance instead of exact balance. Wang and Zubizarreta (2020) showed that by allowing small imbalance, one could incorporate more covariate functions into the balancing constraints and lead to possibly preferable results. Alternatively, we can utilize the kernel balancing idea in Wong and Chan (2018) and minimize the treatment-control imbalance over a non-parametric function class. These are all interesting future directions to explore.

Supporting Information

Proofs of the theorems in Section 4 are included in the Supplementary Materials.


Research reported in this work was funded through a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-2018C2-13180). The views in this work are solely the responsibility of the authors and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute (PCORI), its Board of Governors or Methodology Committee.
Conflict of Interest: None declared.


  • Brookhart et al. (2006) Brookhart, M. A., Schneeweiss, S., Rothman, K. J., Glynn, R. J., Avorn, J., and Stürmer, T. (2006). Variable selection for propensity score models. American Journal of Epidemiology 163, 1149–1156.
  • Buchanan et al. (2018) Buchanan, A. L., Hudgens, M. G., Cole, S. R., Mollan, K. R., Sax, P. E., Daar, E. S., Adimora, A. A., Eron, J. J., and Mugavero, M. J. (2018). Generalizing evidence from randomized trials using inverse probability of sampling weights. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181, 1193–1209.
  • Chan et al. (2016) Chan, K. C. G., Yam, S. C. P., and Zhang, Z. (2016). Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78, 673–700.
  • Chattopadhyay et al. (2020) Chattopadhyay, A., Hase, C. H., and Zubizarreta, J. R. (2020). Balancing vs modeling approaches to weighting in practice. Statistics in Medicine 39, 3227–3254.
  • Cole and Stuart (2010) Cole, S. R. and Stuart, E. A. (2010). Generalizing Evidence From Randomized Clinical Trials to Target Populations: The ACTG 320 Trial. American Journal of Epidemiology 172, 107–115.
  • Colnet et al. (2020) Colnet, B., Mayer, I., Chen, G., Dieng, A., Li, R., Varoquaux, G., Vert, J.-P., Josse, J., and Yang, S. (2020). Causal inference methods for combining randomized trials and observational studies: a review. arXiv preprint arXiv:2011.08047 .
  • Dahabreh and Hernán (2019) Dahabreh, I. J. and Hernán, M. A. (2019). Extending inferences from a randomized trial to a target population. European Journal of Epidemiology 34, 719–722.
  • Dahabreh et al. (2020) Dahabreh, I. J., Robertson, S. E., Steingrimsson, J. A., Stuart, E. A., and Hernan, M. A. (2020). Extending inferences from a randomized trial to a new target population. Statistics in Medicine 39, 1999–2014.
  • Degtiar and Rose (2021) Degtiar, I. and Rose, S. (2021). A review of generalizability and transportability. arXiv preprint arXiv:2102.11904 .
  • Dong et al. (2020) Dong, L., Yang, S., Wang, X., Zeng, D., and Cai, J. (2020). Integrative analysis of randomized clinical trials with real world evidence studies. arXiv preprint arXiv:2003.01242 .
  • Feng et al. (2018) Feng, M., McSparron, J. I., Kien, D. T., Stone, D. J., Roberts, D. H., Schwartzstein, R. M., Vieillard-Baron, A., and Celi, L. A. (2018). Transthoracic echocardiography and mortality in sepsis: analysis of the MIMIC-III database. Intensive Care Medicine 44, 884–892.
  • Hahn (1998) Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica pages 315–331.
  • Hainmueller (2012) Hainmueller, J. (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis 20, 25–46.
  • Hartman et al. (2015) Hartman, E., Grieve, R., Ramsahai, R., and Sekhon, J. S. (2015). From sample average treatment effect to population average treatment effect on the treated: combining experimental with observational studies to estimate population treatment effects. Journal of the Royal Statistical Society. Series A (Statistics in Society) pages 757–778.
  • Hong et al. (2019) Hong, J.-L., Webster-Clark, M., Jonsson Funk, M., Stürmer, T., Dempster, S. E., Cole, S. R., Herr, I., and LoCasale, R. (2019). Comparison of methods to generalize randomized clinical trial results without individual-level data for the target population. American Journal of Epidemiology 188, 426–437.
  • Imai and Ratkovic (2014) Imai, K. and Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76, 243–263.
  • Johnson et al. (2016) Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L.-W. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., and Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data 3, 1–9.
  • Josey et al. (2020) Josey, K. P., Yang, F., Ghosh, D., and Raghavan, S. (2020). A calibration approach to transportability with observational data. arXiv preprint arXiv:2008.06615 .
  • Kang et al. (2007) Kang, J. D., Schafer, J. L., et al. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science 22, 523–539.
  • Li and Ding (2020) Li, X. and Ding, P. (2020). Rerandomization and regression adjustment. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82, 241–268.
  • Lu et al. (2021) Lu, B., Ben-Michael, E., Feller, A., and Miratrix, L. (2021). Is it who you are or where you are? accounting for compositional differences in cross-site treatment variation. arXiv preprint arXiv:2103.14765 .
  • Rosenbaum and Rubin (1983) Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.
  • Rothwell (2005) Rothwell, P. M. (2005). External validity of randomised controlled trials: “to whom do the results of this trial apply?”. The Lancet 365, 82 – 93.
  • Rubin (1974) Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688.
  • Rudolph and van der Laan (2017) Rudolph, K. E. and van der Laan, M. J. (2017). Robust estimation of encouragement-design intervention effects transported across sites. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 1509.
  • Shortreed and Ertefaie (2017) Shortreed, S. M. and Ertefaie, A. (2017). Outcome-adaptive lasso: Variable selection for causal inference. Biometrics 73, 1111–1122.
  • Signorovitch et al. (2010) Signorovitch, J. E., Wu, E. Q., Yu, A. P., Gerrits, C. M., Kantor, E., Bao, Y., Gupta, S. R., and Mulani, P. M. (2010). Comparative effectiveness without head-to-head trials. Pharmacoeconomics 28, 935–945.
  • Stekhoven and Bühlmann (2012) Stekhoven, D. J. and Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28, 112–118.
  • Sugiyama et al. (2007) Sugiyama, M., Krauledat, M., and Müller, K.-R. (2007). Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research 8, 985–1005.
  • Tipton (2013) Tipton, E. (2013). Improving generalizations from experiments using propensity score subclassification: Assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics 38, 239–266.
  • Wang and Zubizarreta (2020) Wang, Y. and Zubizarreta, J. R. (2020). Minimal dispersion approximately balancing weights: asymptotic properties and practical considerations. Biometrika 107, 93–105.
  • Westreich et al. (2017) Westreich, D., Edwards, J. K., Lesko, C. R., Stuart, E., and Cole, S. R. (2017). Transportability of trial results using inverse odds of sampling weights. American Journal of Epidemiology 186, 1010–1014.
  • Wong and Chan (2018) Wong, R. K. and Chan, K. C. G. (2018). Kernel-based covariate functional balancing for observational studies. Biometrika 105, 199–213.
  • Yang et al. (2020) Yang, S., Kim, J. K., and Song, R. (2020). Doubly robust inference when combining probability and non-probability samples with high dimensional data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82, 445–465.
  • Zhao and Percival (2016) Zhao, Q. and Percival, D. (2016). Entropy balancing is doubly robust. Journal of Causal Inference 5.