Augmented Synthetic Control Method
The synthetic control method (SCM) is a popular approach for estimating the impact of a treatment on a single unit in panel data settings. The "synthetic control" is a weighted average of control units that balances the treated unit's pre-treatment outcomes as closely as possible. The curse of dimensionality, however, means that SCM does not generally achieve exact balance, which can bias the SCM estimate. We propose an extension, Augmented SCM, which uses an outcome model to estimate the bias due to covariate imbalance and then de-biases the original SCM estimate, analogous to bias correction for inexact matching. We motivate this approach by showing that SCM is a (regularized) inverse propensity score weighting estimator, with pre-treatment outcomes as covariates and a ridge penalty on the propensity score coefficients. We give theoretical guarantees for specific cases and propose a new inference procedure. We demonstrate gains from Augmented SCM with extensive simulation studies and apply this framework to canonical SCM examples. We implement the proposed method in the new augsynth R package.
The synthetic control method (SCM) is a popular approach for estimating the impact of a treatment on a single unit in settings with a modest number of control units and with many pre-treatment outcomes for all units (Abadie and Gardeazabal, 2003; Abadie et al., 2010, 2015). The idea is to construct a weighted average of control units, known as a synthetic control, that matches the treated unit’s pre-treatment outcomes. The estimated impact is then the difference in post-treatment outcomes between the treated unit and the synthetic control. SCM has been widely applied — the main SCM papers have over 4,000 citations — and has been called “arguably the most important innovation in the policy evaluation literature in the last 15 years” (Athey and Imbens, 2017).
An important limitation of this approach is that, while SCM minimizes imbalance in pre-treatment outcomes, it generally fails to achieve exact balance due to the curse of dimensionality (Ferman and Pinto, 2018). The resulting imbalance can lead to bias.
To address this, we propose the augmented synthetic control method (ASCM). Analogous to bias correction for inexact matching (Abadie and Imbens, 2011), ASCM uses an outcome model to estimate the bias due to covariate imbalance and then de-biases the original SCM estimate. If the estimated bias is small, then the SCM and ASCM estimates will be similar.
We relate our proposal to similar estimators by demonstrating that SCM is a regularized inverse propensity score weighting (IPW) estimator, using pre-treatment outcomes as covariates and penalizing the propensity score coefficients via a ridge penalty. This equivalence, which has not previously been noted, relies on a dual representation of the SCM constrained optimization problem. From this perspective, augmented SCM is analogous to augmented IPW in cross-sectional settings (Robins et al., 1994). The IPW perspective also allows us to draw upon the large propensity score literature to better understand SCM and especially to clarify ambiguity about inference and testing.
We make several additional contributions. First, we characterize the finite sample bias for SCM and highlight the role of covariate imbalance. In particular, Abadie et al. (2010) show that SCM is (asymptotically) unbiased under an assumption that SCM weights yield exact balance on the lagged outcomes. This is a strong assumption, however: such weights exist if and only if the treated unit’s pre-treatment time series is inside the convex hull of the control units’ time series. Similar to other matching and weighting estimators (Robins and Ritov, 1997; Abadie and Imbens, 2011), SCM is therefore subject to the curse of dimensionality: the probability that exact balancing weights exist vanishes as the number of time periods grows (Ferman and Pinto, 2018). Recognizing this, we bound the bias due to covariate imbalance for general weighting estimators under a linear factor model; the results in Abadie et al. (2010) are a special case of this bound. We also propose estimating this bias directly using an outcome model. While we advocate using this to de-bias standard SCM, the estimated bias itself is a useful diagnostic for researchers using SCM without augmentation.
Second, we show that, unlike SCM, ASCM extrapolates outside the convex hull of the control units, possibly leading to negative weights. This ensures much closer balance, reducing bias, but rests more heavily on modeling assumptions, such as linearity. We give theoretical results when the outcome model is ridge regression, and show that covariate imbalance and the corresponding bias will generally be lower for ridge-augmented SCM than for either SCM or ridge regression alone. We also show that the ASCM framework can incorporate flexible outcome models, including panel data methods like matrix completion (Athey et al., 2017) and the generalized synthetic control method (Xu, 2017), and off-the-shelf machine learning models like random forests and neural networks. In addition, ASCM can accommodate auxiliary, time-invariant covariates to further reduce bias. Despite these advantages, we recommend that, as with any model-based estimator, users devote extra effort to checking model specification, especially in settings where the ASCM and SCM weights yield different estimates.
Third, we draw on the IPW connection to clarify inference and testing for SCM. Abadie et al. (2010, 2015) propose a widely adopted testing procedure for SCM based on a uniform permutation approach. Firpo and Possebom (2017) interpret this test as a Fisher randomization test, though Abadie et al. (2015) interpret it as a placebo test that does not require randomization for validity. The connection between SCM and IPW suggests that the randomization-based interpretation is a natural one, though other interpretations are possible. From this perspective, however, a uniform permutation test will be invalid — a valid randomization test should weight permutations by the propensity score (Branson and Bind, 2018). Unfortunately, a weighted permutation test is not practical in most SCM settings. We instead propose a model-based inference procedure that is generally conservative, and which performs well in simulations.
Fourth, we contribute to the growing literature on approximate balancing weights in high dimensions (Athey et al., 2018; Tan, 2018; Wang and Zubizarreta, 2018). This literature has focused on settings with sparsity. We extend results to settings with a latent factor model, which motivates using $L^2$ penalties and ridge regression, rather than $L^1$ penalties and Lasso as in existing approaches. Our development will therefore be useful beyond SCM. Finally, we implement the proposed methodology in the augsynth package for R, available at github.com/ebenmichael/augsynth.
The paper proceeds as follows. Section 2 introduces notation and the SCM estimator. Section 3 introduces Augmented SCM and characterizes covariate balance for the special case of ridge ASCM. Section 4 bounds the bias under a linear factor model, the standard setting for SCM, showing that ASCM will generally lead to lower bias than SCM alone. Section 5 demonstrates the equivalence of SCM and IPW. Section 6 discusses testing and inference. Section 7 extends the ASCM framework to incorporate auxiliary covariates. Section 8 reports on numerical illustrations as well as extensive simulation studies. Finally, Section 9 discusses some outstanding issues and possible directions for further research. The appendix includes all of the proofs, as well as additional derivations and technical discussion.
SCM was introduced by Abadie and Gardeazabal (2003) and Abadie et al. (2010, 2015) and is the subject of an extensive methodological literature. We briefly highlight three relevant strands of research.
The first strand assesses the performance of the original SCM estimator under different settings. Kreif et al. (2016), Gobillon and Magnac (2016), Wan et al. (2018), among others, assess the general performance of SCM methods. Botosaru and Ferman (2017) and Kaul et al. (2018) explore the role of auxiliary covariates in SCM and note several pathologies. Ferman and Pinto (2018) consider the behavior of SCM when the weights fail to exactly balance the lagged outcomes, showing that the resulting SCM weights do not converge to oracle weights when the number of time periods grows. Powell (2018) similarly explores variants of SCM without exact balance.
The second strand focuses on inference and testing, much of it centered on the permutation procedure proposed by Abadie et al. (2010, 2015). Firpo and Possebom (2017) interpret this approach as a randomization-based test of the sharp null hypothesis of no impact and derive conditions under which this is valid from a finite sample perspective. Toulis and Shaikh (2018) also take a randomization-based perspective, concluding that the proposed permutation procedure cannot be interpreted as a valid randomization test. Alternatively, Hahn and Shi (2017) assess the same procedure from the perspective of a placebo test, arguing that the approach fails to control Type I error. Ando and Sävje (2013) similarly evaluate this approach under an exchangeability assumption, which they argue is unlikely to be satisfied in practice. Chernozhukov et al. (2017) consider alternative permutations that instead exploit the time series structure of the problem. Finally, several papers consider a sampling approach to inference (Doudchenko and Imbens, 2017; Robbins et al., 2017; Imai et al., 2018).
The third strand extends SCM to allow for more robust estimation and for new data structures. Building on a suggestion in Abadie et al. (2015), several papers have connected SCM to penalized regression (Doudchenko and Imbens, 2017; Abadie and L’Hour, 2018; Minard and Waddell, 2018). Hazlett and Xu (2018) instead outline a promising approach for improving SCM estimation by first using a kernel approach to transform the raw lagged outcomes. There have also been several proposals to use outcome modeling rather than SCM-style weighting. These include the matrix completion method in Athey et al. (2017), the generalized synthetic control method in Xu (2017), and the combined approaches in Hsiao et al. (2018). Relatedly, Doudchenko and Imbens (2017) relax the SCM restriction that control unit weights be non-negative, arguing that there are many settings in which negative weights would be desirable.
Finally, our work also builds on the recent literature on balancing weights, also known as calibrated propensity scores. This literature modifies the traditional propensity score estimator and instead estimates weights that directly balance covariate means between treated and control units, rather than balancing them indirectly by first estimating the propensity score coefficients. Examples include Hainmueller (2011), Graham et al. (2012), Imai and Ratkovic (2013), Zubizarreta (2015), Tan (2017), and Wang and Zubizarreta (2018). Several also combine weighting and outcome modeling, including Athey et al. (2018), Tan (2018), and Hirshberg and Wager (2018). We similarly build on recent results that highlight the connections between covariate balancing approaches and the implied propensity score model. Examples include Robins et al. (2007), Imai and Ratkovic (2013), Zhao and Percival (2017), Zhao (2018), Tan (2017), and Wang and Zubizarreta (2018).
We consider the canonical SCM panel data setting with $i = 1, \ldots, N$ units observed for $t = 1, \ldots, T$ time periods. Let $W_i$ be an indicator that unit $i$ is treated. We assume that all treated units receive the treatment at a common time $T_0 < T$; units with $W_i = 0$ never receive the treatment. There are a total of $N_1 = \sum_i W_i$ treated units and $N_0 = N - N_1$ control units, often referred to as donor units in the SCM context. The outcome is $Y_{it}$ and is typically continuous. [Footnote 1: The source of randomness varies across the SCM literature. As we discuss in Section 4, we focus on the setting where the units are fixed and uncertainty comes from noisy realizations of the latent factor model. This is in contrast to randomization inference, in which treatment assignment is the only source of randomness.]
We adopt the potential outcomes framework (Neyman, 1923; Rubin, 1974) and invoke SUTVA, which assumes a well-defined treatment and excludes interference between units (Rubin, 1980). The potential outcomes for unit $i$ in period $t$ under control and treatment are $Y_{it}(0)$ and $Y_{it}(1)$, respectively. [Footnote 2: To simplify the discussion, we assume that potential outcomes under both treatment and control exist for all units in all time periods. With some additional technical caveats, we could relax this assumption to only require that $Y_{it}(0)$ be well-defined for all units, without also requiring control units to have well-defined potential outcomes under treatment.] Observed outcomes are:
$$Y_{it} = \begin{cases} Y_{it}(0) & \text{if } W_i = 0 \text{ or } t \le T_0, \\ Y_{it}(1) & \text{if } W_i = 1 \text{ and } t > T_0. \end{cases} \tag{1}$$
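To keep notation simple, we assume that there is only one post-treatment observation, $T = T_0 + 1$, though our results are easily extended to larger $T$. We therefore use $Y_i$ to represent the single post-treatment observation for unit $i$, dropping $t$ from the subscript. We use $X_{it}$, for $t = 1, \ldots, T_0$, to represent pre-treatment outcomes, which emphasizes that pre-treatment outcomes serve as covariates in SCM. With some abuse of notation, we use $X_0$ to represent the $N_0$-by-$T_0$ matrix of control unit pre-treatment outcomes and $Y_0$ for the $N_0$-vector of control unit post-treatment outcomes. For most of our discussion, we will restrict attention to the case where only a single unit, $i = 1$, is treated; that is, $W_1 = 1$ and $N_1 = 1$. Thus, $Y_1$ is a scalar, and $X_1$ is a $T_0$-dimensional row vector of treated unit pre-treatment outcomes. [Footnote 3: With multiple treated units treated at the same time, we can overload $X_1$ and $Y_1$ to denote averages across treated units. The implicit estimand is then the (Sample) Average Treatment Effect on the Treated. We return to this point in the discussion in Section 9.] The data structure is then:
$$\begin{pmatrix} Y_{11} & \cdots & Y_{1T_0} & Y_{1T} \\ Y_{21} & \cdots & Y_{2T_0} & Y_{2T} \\ \vdots & & \vdots & \vdots \\ Y_{N1} & \cdots & Y_{NT_0} & Y_{NT} \end{pmatrix} \equiv \begin{pmatrix} X_{11} & \cdots & X_{1T_0} & Y_1 \\ X_{21} & \cdots & X_{2T_0} & Y_2 \\ \vdots & & \vdots & \vdots \\ X_{N1} & \cdots & X_{NT_0} & Y_N \end{pmatrix} \equiv \begin{pmatrix} X_1 & Y_1 \\ X_0 & Y_0 \end{pmatrix} \tag{2}$$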
Finally, the treatment effect of interest is $\tau = Y_1(1) - Y_1(0)$.
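To make this data layout concrete, here is a minimal Python sketch. It assumes NumPy, a single treated unit, and exactly one post-treatment period. It slices an $N$-by-$T$ outcome panel into the blocks $X_1$, $Y_1$, $X_0$, $Y_0$ and forms the effect estimate $Y_1 - \sum_{W_i=0}\gamma_i Y_i$ for given weights. The helpers `split_panel` and `effect_estimate` are hypothetical illustrations, not functions from augsynth.

```python
import numpy as np


def split_panel(Y, treated, T0):
    """Split an N-by-T outcome panel into (X1, Y1, X0, Y0).

    Assumes one treated unit (row `treated`), columns ordered in time, and a
    single post-treatment period, so that T == T0 + 1.
    """
    Y = np.asarray(Y, dtype=float)
    N, T = Y.shape
    if T != T0 + 1:
        raise ValueError("expected exactly one post-treatment period (T == T0 + 1)")
    control = np.arange(N) != treated
    X1 = Y[treated, :T0]   # treated unit's pre-treatment outcomes, length T0
    Y1 = Y[treated, T0]    # treated unit's post-treatment outcome, a scalar
    X0 = Y[control, :T0]   # N0-by-T0 matrix of control pre-treatment outcomes
    Y0 = Y[control, T0]    # length-N0 vector of control post-treatment outcomes
    return X1, Y1, X0, Y0


def effect_estimate(Y1, Y0, gamma):
    """Estimate tau = Y1(1) - Y1(0), imputing Y1(0) by a weighted control average."""
    return Y1 - np.dot(gamma, Y0)


# Toy 3-unit, 4-period panel in which unit 0 is treated after period 3.
panel = np.arange(12).reshape(3, 4)
X1, Y1, X0, Y0 = split_panel(panel, treated=0, T0=3)
print(X1, Y1, X0.shape, Y0)
print(effect_estimate(Y1, Y0, [0.5, 0.5]))
```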
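The Synthetic Control Method imputes the missing potential outcome for the treated unit, $Y_1(0)$, as a weighted average of the control outcomes, $\sum_{W_i=0}\gamma_i Y_i$. Abadie and Gardeazabal (2003) and Abadie et al. (2010, 2015) propose choosing weights $\gamma$ as a solution to the constrained optimization problem:
$$\min_{\gamma} \; \|X_1 - X_0'\gamma\|_2^2 \quad \text{subject to} \quad \sum_{W_i=0}\gamma_i = 1, \quad \gamma_i \ge 0 \text{ for all } i \text{ with } W_i = 0, \tag{3}$$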
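where $\|\cdot\|_2^2$ is the squared 2-norm on $\mathbb{R}^{T_0}$, $\|v\|_2^2 = \sum_{t=1}^{T_0} v_t^2$, and the constraints limit $\gamma$ to the simplex, $\Delta^{N_0} = \{\gamma \in \mathbb{R}^{N_0} : \gamma_i \ge 0, \sum_i \gamma_i = 1\}$.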
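Problem (3) is a least-squares problem over the simplex. The sketch below shows one simple way to solve it numerically, using only NumPy: projected gradient descent combined with the sort-based Euclidean projection onto the simplex. The solver and its names (`project_to_simplex`, `scm_weights`) are our own illustration. This is not the algorithm used by augsynth or the original SCM software, and in practice a quadratic-programming solver would be the more usual choice.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                       # entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)     # shift that makes the result sum to one
    return np.maximum(v - theta, 0.0)


def scm_weights(X1, X0, n_iter=20000):
    """Approximately solve min ||X1 - X0' gamma||_2^2 over the simplex (Eq. 3)."""
    X1 = np.asarray(X1, dtype=float)
    X0 = np.asarray(X0, dtype=float)
    N0 = X0.shape[0]
    gamma = np.full(N0, 1.0 / N0)              # start from uniform weights
    # The gradient is -2 X0 (X1 - X0' gamma); its Lipschitz constant is 2 ||X0||_2^2.
    step = 1.0 / (2.0 * np.linalg.norm(X0, 2) ** 2)
    for _ in range(n_iter):
        resid = X1 - X0.T @ gamma              # current pre-treatment imbalance
        gamma = project_to_simplex(gamma + step * 2.0 * (X0 @ resid))
    return gamma
```

When $N_0 > T_0$ the objective is convex but not strongly convex, so the minimizing weights need not be unique. The iteration is guaranteed to drive down the imbalance, but it will not necessarily recover any particular set of weights.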
Equation (3), known as the constrained regression formulation of SCM, follows the recent methodological literature and focuses solely on balancing the lagged outcomes (see Doudchenko and Imbens, 2017; Ferman and Pinto, 2018; Powell, 2018). By contrast, the original SCM formulation also includes auxiliary covariates and a weighted norm. As Kaul et al. (2018) and others have shown, (3) is identical to the original SCM proposal in a range of practical settings. We discuss additional complications that arise in the original SCM formulation in Section 7.
The SCM weights in Equation (3) minimize the imbalance of pre-treatment outcomes between the treated unit and synthetic control. Abadie et al. (2010) show that the resulting estimator is unbiased (asymptotically in $T_0$) under an assumption that exact balance can be achieved, that is, $X_{1t} = \sum_{W_i=0}\gamma_i X_{it}$ for all $t = 1, \ldots, T_0$. This is a strong assumption, however. Weights that yield exact balance exist if and only if the treated unit is inside the convex hull of the control units.
Achieving exact balance is therefore subject to the curse of dimensionality (Robins and Ritov, 1997; Abadie and Imbens, 2011; D’Amour et al., 2017). Informally, the probability that exact balancing weights exist vanishes as the dimension of the covariates, $T_0$, grows large. For example, under a population model, Zhao and Percival (2017) show that the probability that the treated unit is in the convex hull of the control units decreases exponentially in the number of covariates. For exact balancing weights to exist (with high probability), the number of control units must therefore be exponentially larger than $T_0$, a far cry from the typical SCM setting with $N_0 \approx T_0$. Ferman and Pinto (2018) investigate this question in the context of SCM under a linear factor model, arguing that the SCM weights fail to exactly balance the underlying factors with infinite $T_0$. [Footnote 4: We confirm this intuition with simulation. In one simple simulation, presented in Figure 1, the probability of achieving exact balance drops from 47 percent to 1 percent as $T_0$ increases. In “calibrated” simulations, presented in Section 8, there is not a single Monte Carlo draw in which SCM achieves exact balance.]
As with other matching or weighting estimators, failing to balance covariates can introduce bias. [Footnote 5: One approach to mitigating bias is to proceed with the analysis only if covariate imbalance is small. Specifically, Abadie et al. (2015) recommend against using SCM when “the pre-treatment fit is poor or the number of pre-treatment periods is small.” There is little guidance about what constitutes poor fit, however, and common practice is fairly ad hoc. An estimate of the bias based on a prognostic score is a natural summary for assessing quality of fit. However, while this pre-screening might be an effective approach for a specific application, it could lead to a file drawer problem. In our simulations in Section 8, we find that a range of methods give excellent performance when we condition on cases with good covariate balance, but that this conditioning substantially limits the utility of the methods. While SCM does poorly when covariate balance is poor, many alternatives (including ASCM) are more robust to imbalance.] Heuristically, let $Y_i(0) = m(X_i) + \varepsilon_i$, where $m(\cdot)$ is some function of the lagged outcomes $X_i$, and $\varepsilon_i$ is mean-zero noise conditionally independent of treatment assignment. Then the bias for a weighting estimator is $m(X_1) - \sum_{W_i=0}\gamma_i m(X_i)$. This is zero if $m(\cdot)$ is linear in $X$ and the weights achieve exact balance, but may be non-zero if either condition fails. For instance, as we show in Section 4, under a linear factor model $m(\cdot)$ is approximately linear for large $T_0$. At the same time, achieving exact balance is particularly challenging for large $T_0$, suggesting that SCM will be biased. In the next section, we propose to estimate this bias directly and then use this estimate to de-bias the original SCM estimate.
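The heuristic can be checked on a toy example. The sketch below assumes NumPy. It uses three hypothetical control units, a treated unit at $(0.5, 0.5)$, and two illustrative choices of $m$, and it evaluates the bias $m(X_1) - \sum_{W_i=0}\gamma_i m(X_i)$ of the effect estimate. The data and the function `weighting_bias` are made up for illustration.

```python
import numpy as np


def weighting_bias(m, X1, X0, gamma):
    """Bias m(X1) - sum_i gamma_i m(X_i) of a weighting estimator when
    Y_i(0) = m(X_i) + eps_i, with eps_i mean-zero and independent of treatment."""
    return m(X1) - sum(g * m(x) for g, x in zip(gamma, X0))


def linear(x):
    """An outcome model that is linear in the lagged outcomes."""
    return 1.0 + 3.0 * x[0] - x[1]


def quadratic(x):
    """An outcome model that is non-linear in the lagged outcomes."""
    return float(x @ x)


X0 = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
X1 = np.array([0.5, 0.5])
balanced = np.array([0.5, 0.25, 0.25])   # X0' gamma equals X1 exactly
unbalanced = np.array([1.0, 0.0, 0.0])   # leaves pre-treatment imbalance

print(weighting_bias(linear, X1, X0, balanced))     # linear m, exact balance
print(weighting_bias(quadratic, X1, X0, balanced))  # non-linear m, exact balance
print(weighting_bias(linear, X1, X0, unbalanced))   # linear m, imbalance
```

The bias is zero only in the first case, where both conditions hold. With a linear $m$ and imbalanced weights, the bias equals the slope vector times the imbalance, $(3, -1) \cdot (0.5, 0.5) = 1$.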
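Our main contribution is to propose an Augmented SCM (ASCM) estimator that combines SCM with outcome modeling. Specifically, let $\hat m(X_i)$ be an estimated outcome model under control. Then $\hat m(X_1) - \sum_{W_i=0}\hat\gamma_i^{scm}\hat m(X_i)$ is an estimate of the SCM bias. We propose the following bias-reducing estimator for $Y_1(0)$:
$$\hat Y_1^{aug}(0) = \sum_{W_i=0}\hat\gamma_i^{scm}Y_i + \Big(\hat m(X_1) - \sum_{W_i=0}\hat\gamma_i^{scm}\hat m(X_i)\Big) \tag{4}$$
$$\hat Y_1^{aug}(0) = \hat m(X_1) + \sum_{W_i=0}\hat\gamma_i^{scm}\big(Y_i - \hat m(X_i)\big), \tag{5}$$
where $\hat\gamma^{scm}$ are SCM weights. This specializes to standard SCM when we set $\hat m(\cdot)$ to be a constant.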
Equations (4) and (5), while equivalent, highlight two distinct motivations for ASCM. Equation (4) directly corrects the SCM estimate, $\sum_{W_i=0}\hat\gamma_i^{scm}Y_i$, by the estimated bias, $\hat m(X_1) - \sum_{W_i=0}\hat\gamma_i^{scm}\hat m(X_i)$. This is analogous to bias correction for inexact matching (Abadie and Imbens, 2011). If the estimated bias is small, then the SCM and ASCM estimates will be similar. Equation (5) is analogous to Augmented IPW (Robins et al., 1994), which begins with the outcome model but uses SCM to re-weight the residuals. This is comparable in form to the generalized regression estimator in survey sampling (Cassel et al., 1976; Breidt and Opsomer, 2017), which has been adapted to the causal inference setting by, among others, Athey et al. (2018) and Tan (2018). We develop the IPW analogy further in Section 5.
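The two forms can be computed side by side. The following minimal NumPy sketch takes any weights and any fitted outcome model (a callable) and returns both versions of the ASCM imputation together with the estimated bias. The names `ascm_estimate` and `linear_outcome_model` are hypothetical, and the ordinary least-squares outcome model is only an illustrative stand-in for whatever $\hat m$ one would actually use.

```python
import numpy as np


def ascm_estimate(X1, X0, Y0, gamma, m_hat):
    """Augmented SCM imputation of Y1(0), computed in both forms (Eqs. 4 and 5)."""
    m1 = m_hat(X1)
    m0 = np.array([m_hat(x) for x in X0])
    bias_hat = m1 - gamma @ m0                # estimated bias of the SCM estimate
    form4 = gamma @ Y0 + bias_hat             # SCM estimate corrected by the estimated bias
    form5 = m1 + gamma @ (Y0 - m0)            # outcome model plus SCM-weighted residuals
    return form4, form5, bias_hat


def linear_outcome_model(X0, Y0):
    """Least-squares fit of Y0 on [1, X0] over control units; returns a predictor."""
    design = np.column_stack([np.ones(len(X0)), X0])
    coef, *_ = np.linalg.lstsq(design, Y0, rcond=None)
    return lambda x: coef[0] + np.asarray(x) @ coef[1:]
```

The two forms agree algebraically for any outcome model and any weights. When the outcome model is a constant and the weights sum to one, the correction term vanishes and the estimate reduces to plain SCM.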
The ASCM framework can incorporate any choice of outcome model. For example, we can use recent proposals for outcome modeling in the SCM setting, including matrix completion (MCP; Athey et al., 2017), the generalized synthetic control method (gsynth; Xu, 2017), and Bayesian structural time series modeling (causalImpact; Brodersen et al., 2015). Alternatively, we can use generic supervised learning methods, such as random forests and neural networks. The ASCM framework also nests recent proposals from Doudchenko and Imbens (2017) and Ferman and Pinto (2018) for “de-meaned SCM,” or SCM with an intercept shift. These correspond to a simple unit fixed effects outcome model, $\hat m(X_i) = \frac{1}{T_0}\sum_{t=1}^{T_0} X_{it}$. Ferman and Pinto (2018) show that this estimator dominates both standard difference-in-differences and SCM, asymptotically in $T_0$. We explore these options via simulation in Section 8. Finally, in Section 7, we generalize ASCM to include additional covariates beyond lagged outcomes.
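For the fixed effects outcome model, which takes each unit's pre-treatment mean as its prediction, the ASCM estimate becomes the treated unit's pre-period mean plus the weighted average of the controls' post-period outcomes net of their own pre-period means. The sketch below assumes NumPy and a noiseless additive panel $Y_{it} = a_i + b_t$ with made-up values. It illustrates why the intercept shift helps when the treated unit's level lies outside the controls' range. The function name `demeaned_scm_estimate` is hypothetical, and the weights are arbitrary simplex weights rather than fitted SCM weights.

```python
import numpy as np


def demeaned_scm_estimate(X1, X0, Y0, gamma):
    """ASCM with the unit fixed-effects outcome model m_hat(X_i) = mean_t X_it."""
    X1 = np.asarray(X1, dtype=float)
    X0 = np.asarray(X0, dtype=float)
    # Treated pre-period mean plus weighted control outcomes net of their own pre-period means.
    return X1.mean() + gamma @ (np.asarray(Y0, dtype=float) - X0.mean(axis=1))


a = np.array([5.0, 1.0, 2.0, 3.0])     # unit effects; unit 0 is treated
b = np.array([0.0, 1.0, 0.5, 2.0])     # time effects; the last period is post-treatment
panel = a[:, None] + b[None, :]        # noiseless additive fixed effects panel
X1, Y1 = panel[0, :3], panel[0, 3]
X0, Y0 = panel[1:, :3], panel[1:, 3]
gamma = np.array([0.2, 0.3, 0.5])      # arbitrary weights on the simplex

print(demeaned_scm_estimate(X1, X0, Y0, gamma), Y1)  # the intercept shift recovers Y1(0)
print(gamma @ Y0)                                     # plain SCM misses the unit-effect gap
```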
We now explore the special case of ASCM where the outcome model $\hat m(X_i)$ is fit with (linear) ridge regression. Using results from survey sampling, we first show that we can write this estimator as a single weighting estimator (possibly with negative weights) that adjusts the SCM weights to allow for better balance. Using this equivalence, we then show that ridge-augmented SCM generally has better covariate balance than either SCM or ridge regression alone, though it may have higher variance. While we restrict our theoretical results to ridge-augmented SCM, we anticipate that these can be extended to more general, non-linear outcome models via the model calibration weights framework of Wu and Sitter (2001).
Let $\hat m(X_i) = \hat\eta_0 + X_i\hat\eta$, where $\hat\eta_0$ and $\hat\eta$ are the coefficients of a ridge regression of control post-treatment outcomes $Y_0$ on centered pre-treatment outcomes $X_0$ with penalty hyper-parameter $\lambda$. Ridge regression, like OLS, has a closed-form expression as a weighting estimator (see Appendix A.1). Thus, the augmented SCM estimator merely combines two different weighting estimators, SCM and ridge regression.
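The weighting representation of ridge regression can be written down directly. Assume the intercept is unpenalized and the control pre-treatment outcomes are centered at their means, so that the fitted intercept is the mean of $Y_0$. Then the ridge prediction at $X_1$ is linear in $Y_0$, and its coefficients form a set of weights that sum to one but are not constrained to be non-negative. The sketch below (NumPy; the name `ridge_weights` is hypothetical) constructs those weights. It is not taken from Appendix A.1, although it follows from the same closed form.

```python
import numpy as np


def ridge_weights(X1, X0, lam):
    """Weights w (summing to one) such that w @ Y0 equals the ridge prediction at X1.

    Ridge is fit on control units with pre-treatment outcomes centered at the control
    means and an unpenalized intercept, so the fitted intercept is mean(Y0).
    """
    X1 = np.asarray(X1, dtype=float)
    X0 = np.asarray(X0, dtype=float)
    N0, T0 = X0.shape
    xbar = X0.mean(axis=0)
    Xc = X0 - xbar                                   # centered control covariates
    A = Xc.T @ Xc + lam * np.eye(T0)                 # penalized Gram matrix
    # prediction = mean(Y0) + (X1 - xbar) A^{-1} Xc' Y0, which is linear in Y0
    return np.full(N0, 1.0 / N0) + Xc @ np.linalg.solve(A, X1 - xbar)
```

Because $\mathbf{1}'\check X_0 = 0$ after centering, the adjustment term sums to zero. The weights therefore always sum to one, while individual weights can be negative.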
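The ridge-augmented SCM estimator, the specialization of (5), is:
$$\hat Y_1^{aug}(0) = \sum_{W_i=0}\hat\gamma_i^{scm}Y_i + \Big(X_1 - \sum_{W_i=0}\hat\gamma_i^{scm}X_i\Big)\hat\eta = \sum_{W_i=0}\hat\gamma_i^{aug}Y_i, \tag{6}$$
where, writing $\check X_i$ for control pre-treatment outcomes centered at the control means and $\check X_0$ for the corresponding $N_0$-by-$T_0$ matrix,
$$\hat\gamma_i^{aug} = \hat\gamma_i^{scm} + \Big(X_1 - \sum_{W_j=0}\hat\gamma_j^{scm}X_j\Big)\big(\check X_0'\check X_0 + \lambda I_{T_0}\big)^{-1}\check X_i'.$$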
Lemma 1 shows that ASCM weights, which unlike SCM weights can be negative, adjust the raw SCM weights to achieve better covariate balance. When SCM weights exactly balance the lagged outcomes, ridge ASCM and SCM weights are equivalent. When SCM yields good balance or the tuning parameter $\lambda$ is large, the estimated bias is small and the two sets of weights are close to each other. Conversely, when SCM has poor balance and $\lambda$ is small, the adjustment will be large and the weights will be far apart. [Footnote 6: In practice we choose $\lambda$ using $K$-fold cross-validation for the outcome ridge regression.] As a result, ridge-augmented SCM weights will generally achieve better pre-treatment fit than weights from SCM alone, although at the cost of higher variance.
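The ridge-augmented weights can be computed in closed form from the SCM weights. A minimal sketch, assuming NumPy and controls centered at their means, implements $\hat\gamma^{aug}_i = \hat\gamma^{scm}_i + (X_1 - X_0'\hat\gamma^{scm})'(\check X_0'\check X_0 + \lambda I)^{-1}\check X_i$. The function name `ridge_ascm_weights` is hypothetical. In the accompanying check, uniform weights stand in for fitted SCM weights. The identity with the ridge-augmented estimate holds for any weights, so this substitution does not affect the check.

```python
import numpy as np


def ridge_ascm_weights(X1, X0, gamma_scm, lam):
    """Ridge-augmented SCM weights: the SCM weights plus a ridge adjustment for
    the remaining pre-treatment imbalance (controls centered at their means)."""
    X1 = np.asarray(X1, dtype=float)
    X0 = np.asarray(X0, dtype=float)
    T0 = X0.shape[1]
    Xc = X0 - X0.mean(axis=0)                         # centered control covariates
    imbalance = X1 - X0.T @ gamma_scm                 # SCM pre-treatment imbalance
    adjustment = Xc @ np.linalg.solve(Xc.T @ Xc + lam * np.eye(T0), imbalance)
    return gamma_scm + adjustment                     # entries can be negative
```

As $\lambda$ grows, the adjustment shrinks toward zero and the weights return to the SCM weights. This matches the behavior described in the text.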
Lemma 2. Let $\sigma^2_{\min}$ be the minimum eigenvalue of the sample covariance matrix of the control units' pre-treatment outcomes. Then the ridge-augmented SCM weights satisfy:
Since the factor multiplying the SCM imbalance in Lemma 2 is less than one, ridge-augmented SCM will have strictly better covariate balance than SCM alone, except in the special case of exact balance. At the same time, ridge-augmented SCM weights will generally have larger variance than pure SCM weights, with larger discrepancies for worse SCM imbalance. Intuitively, this larger variance arises because, unlike SCM weights, ASCM weights can be negative, which can increase the spread of the weights.
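The balance improvement can be verified directly from the ridge ASCM weight formula. What follows is our own short derivation, not the paper's proof of Lemma 2. It makes four assumptions: control pre-treatment outcomes are centered at their means, with $\check X_0$ the centered $N_0$-by-$T_0$ matrix; the SCM weights sum to one; pre-treatment vectors are treated as columns; and $\sigma^2_{\min}$ denotes the smallest eigenvalue of $\frac{1}{N_0}\check X_0'\check X_0$.

```latex
% Setup: d is the SCM imbalance; the adjustment uses the centered controls.
d \equiv X_1 - X_0'\hat\gamma^{scm}, \qquad
\hat\gamma^{aug} = \hat\gamma^{scm} + \check X_0\,(\check X_0'\check X_0 + \lambda I_{T_0})^{-1} d

% Centering: X_0 = \check X_0 + \mathbf{1}\bar X_0' and \mathbf{1}'\check X_0 = 0, hence
X_0'\check X_0 = \check X_0'\check X_0, \qquad
\mathbf{1}'\hat\gamma^{aug} = \mathbf{1}'\hat\gamma^{scm} = 1

% Remaining imbalance after augmentation
X_1 - X_0'\hat\gamma^{aug}
  = d - \check X_0'\check X_0\,(\check X_0'\check X_0 + \lambda I_{T_0})^{-1} d
  = \lambda\,(\check X_0'\check X_0 + \lambda I_{T_0})^{-1} d

% Each eigenvalue of \lambda(\check X_0'\check X_0 + \lambda I)^{-1} is at most \lambda/(N_0\sigma^2_{\min} + \lambda)
\|X_1 - X_0'\hat\gamma^{aug}\|_2 \le \frac{\lambda}{N_0\,\sigma^2_{\min} + \lambda}\,\|X_1 - X_0'\hat\gamma^{scm}\|_2
```

The factor is strictly below one whenever $\sigma^2_{\min} > 0$, which requires $N_0 > T_0$ after centering. If $\sigma^2_{\min} = 0$, the bound is trivially one, although the imbalance still shrinks along directions with positive eigenvalues.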
Allowing for negative weights is an important departure from standard SCM: Abadie et al. (2010, 2015) argue that negative weights are undesirable because they are difficult to interpret and allow for extrapolation. [Footnote 7: The question of interpreting negative weights has been heavily debated in the surveys literature, where negative weights can arise with (generalized) regression estimates (Fuller, 2002). The main drawback is that, from a design-based perspective, negative survey weights can no longer be interpreted as sampling weights (see Lohr, 2007). From a model-based perspective, however, requiring non-negative weights is technically arbitrary.] The SCM non-negativity constraint implies choosing a synthetic control within the convex hull created by the control units, even when the treated unit lies outside this hull, and the distance from the treated unit to the convex hull creates the potential for bias. Thus, even with moderate dimensional covariates, an estimator that constrains weights to be non-negative will be biased in practice. ASCM, by contrast, uses negative weights to extrapolate outside of the convex hull. This ensures much closer balance but rests more heavily on the assumption that the expected value of $Y_i(0)$ is (approximately) linear in the control outcomes. We confirm this intuition with simulations in Section 8. In cases where covariate balance is excellent, there is little penalty to restricting weights to be non-negative; otherwise, this constraint can lead to severe bias, at least relative to ASCM. Doudchenko and Imbens (2017) make a similar point with an analogy to bias correction for matching, arguing that negative weights play an important role in reducing bias when exact matches are infeasible.
Finally, in the appendix we compare ridge-augmented SCM to ridge regression alone, again relying on a representation of ridge regression as a weighting estimator. As we show, the ridge regression weights correspond to a special case of the elastic-net synthetic controls estimator proposed by Doudchenko and Imbens (2017), with the elastic-net parameter set to zero, and to a penalized version of the Oaxaca-Blinder weights considered by Kline (2011). Ridge regression achieves worse balance than ridge ASCM, but yields lower sampling variance. In the special case with no regularization, ridge regression weights reduce to the standard regression weights discussed in Abadie et al. (2015), and both standard regression and ridge ASCM will yield perfect balance.
We now characterize the bias under the linear factor model considered in Abadie et al. (2010), with treatment assignment that is ignorable given unobserved factor loadings. We begin with bias for a general weighting estimator and then turn to SCM, showing that the results in Abadie et al. (2010) are a special case of our bound. Finally, we show that ridge ASCM will generally have a tighter bias bound than SCM alone. We confirm these results with simulations in Section 8.
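Following the setup in Abadie et al. (2010), we assume that there are $J$ latent time-varying factors $\phi_t = (\phi_{1t}, \ldots, \phi_{Jt})'$, $t = 1, \ldots, T$, with $\max_{j,t}|\phi_{jt}| \le M$, where $J$ will typically be small relative to $N$. Each unit has a vector of factor loadings $\mu_i \in \mathbb{R}^J$. Control potential outcomes are weighted averages of these factors plus additive noise $\varepsilon_{it}$:
$$Y_{it}(0) = \sum_{j=1}^{J}\phi_{jt}\mu_{ij} + \varepsilon_{it} = \phi_t'\mu_i + \varepsilon_{it},$$
where the only random quantities are the noise terms $\varepsilon_{it}$. Slightly abusing notation, we collect the pre-intervention factors into a matrix $\Phi \in \mathbb{R}^{T_0 \times J}$, where the $t$th row of $\Phi$ contains the factor values at time $t$, $\phi_t'$.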
where the expectation is taken with respect to $\varepsilon$. We further assume that the error terms $\varepsilon_{it}$ are independent (across units and over time) sub-Gaussian random variables with scale parameter $\sigma$.
Under the linear factor model and ignorability given $\mu$, an estimator that balances $\mu$ will yield an unbiased estimate of $Y_1(0)$, and if exact balance is not achieved, the bias will be proportional to the level of imbalance in $\mu$. However, ignorability given the latent factor loadings does not generally imply ignorability given the observable $X$, and ensuring balance in $X$ will not necessarily ensure balance in $\mu$. We show in the appendix that the bias of any weighting estimator with weights $\gamma$ can be expressed as
The first term is the imbalance of observed lagged outcomes and the second term is an approximation error arising from the latent factor structure. Using our assumption that the noise is sub-Gaussian, we can bound this bias using the triangle inequality.
Theorem 1. Under the linear factor model (10), for any weights $\gamma$ such that $\sum_{W_i=0}\gamma_i = 1$,
with probability at least , where and .
This result holds for any weights that sum to one, and does not require non-negativity. The approximation error is generally non-zero since the outcome is linear in $\mu$ rather than linear in $X$; however, with large $T_0$ the approximation error will be small. Interestingly, this term depends on the norm of the weights, which does not generally enter into bias bounds in simpler settings. Similar results exist for bias in other panel data settings. For instance, the approximation error in Equation (12) is analogous to the so-called “Nickell (1981) bias” that arises in short panels.
To apply Theorem 1 to SCM, we consider two cases: the special case with exact balance and the more general case with approximate balance. In the exact balance case, the first term of (12) is zero and the second term goes to zero, with high probability, as $T_0 \to \infty$. This is the basis of the Abadie et al. (2010) claim that SCM is asymptotically (nearly) unbiased in a factor model with exact balance. Intuitively, the lagged outcome for unit $i$ at time $t$, $X_{it}$, is a noisy proxy for the index $\phi_t'\mu_i$. Thus, as we observe more $X_{it}$, and can exactly balance each one, we are better able to match on this index and, as a result, on the underlying factor loadings.
If we do not assume exact balance, the bias bound contains two competing terms: the approximation error, which is decreasing in $T_0$, and the imbalance in $X$, which is non-decreasing in $T_0$. Figure 1 illustrates this tension via simulation. [Footnote 8: We perform 1000 simulations of a simple fixed effects model (which is a special case of the linear factor model), varying $T_0$, where fixed effects are drawn from a mixture of two Gaussians. The assumed means are 0 and 0.2, with common variance 0.1, mixing proportion 0.8, and additive noise $\varepsilon_{it}$. We use a logistic selection model with the (unobserved) fixed effects, and normalize the probabilities so that a single unit is treated. With this setup the treated unit’s fixed effect is typically in the convex hull of the control fixed effects.] As $T_0$ increases, the average covariate imbalance (in terms of RMSE in $X$) also increases, eventually leveling off. The bias decreases initially as the approximation error falls, but levels off to a positive value as imbalance comes to dominate.
While Theorem 1 only provides an upper bound on the bias, rather than an expression for the bias itself, we nonetheless argue that SCM will still be biased with large $T_0$. Ferman and Pinto (2018) show that, even if weights exist that exactly balance the latent factor loadings, as $T_0 \to \infty$ the SCM weights will converge to a solution that does not exactly balance the latent factors, leading to bias. Theorem 1 complements their asymptotic analysis with a finite sample bound that holds for all weighting estimators and explicitly includes the level of imbalance in the lagged outcomes. In Appendix A.3 we discuss further connections to their results and the duality between SCM and IPW presented in the following section. Our simulation evidence, both in Figure 1 and in Section 8, is consistent with the conclusion that SCM is biased in a range of scenarios.
Lemma 2 shows that ridge ASCM will generally have better covariate balance than SCM. We now combine this with Theorem 1 to show that ridge ASCM will also have a tighter bias bound than SCM alone. While our theoretical discussion is again limited to bounds on the bias, simulations in Section 8 confirm that the realized bias is consistently smaller for ridge ASCM than SCM alone, at least in the settings we consider.
As in the general case in Theorem 1, the level of imbalance and the complexity of the weights play possibly competing roles. Specifically, the regularization parameter $\lambda$ (or the corresponding transformation) controls how much to prioritize balance. In the extreme case with $\lambda \to \infty$, ridge ASCM is equivalent to SCM. As we reduce $\lambda$, the imbalance in $X$ decreases but the norm of the weights increases, which increases the approximation error. Note that if the centered matrix of control pre-treatment outcomes has full column rank (requiring $N_0 > T_0$), ridge ASCM weights exist with $\lambda = 0$ and ridge ASCM exactly balances the lagged outcomes, even where non-augmented SCM does not.
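The tension can be traced numerically along a grid of penalties. The sketch below assumes NumPy, centered controls, and uniform weights standing in for fitted SCM weights. It uses a hypothetical treated unit shifted away from the controls. For each $\lambda$ it reports the remaining imbalance, the size of the ridge adjustment, and the norm of the augmented weights. The function name `ridge_ascm_path` and the data are illustrative only.

```python
import numpy as np


def ridge_ascm_path(X1, X0, gamma_scm, lams):
    """For each penalty, return (lam, imbalance, adjustment norm, weight norm)."""
    X1 = np.asarray(X1, dtype=float)
    X0 = np.asarray(X0, dtype=float)
    T0 = X0.shape[1]
    Xc = X0 - X0.mean(axis=0)                  # centered control covariates
    d = X1 - X0.T @ gamma_scm                  # SCM imbalance being corrected
    path = []
    for lam in lams:
        adj = Xc @ np.linalg.solve(Xc.T @ Xc + lam * np.eye(T0), d)
        g = gamma_scm + adj                    # ridge-augmented weights
        path.append((lam, np.linalg.norm(X1 - X0.T @ g),
                     np.linalg.norm(adj), np.linalg.norm(g)))
    return path


rng = np.random.default_rng(4)
X0 = rng.normal(size=(20, 6))
X1 = X0.mean(axis=0) + 1.5                     # treated unit shifted away from the controls
gamma = np.full(20, 1 / 20)                    # stand-in for SCM weights
for lam, imb, adj, gnorm in ridge_ascm_path(X1, X0, gamma, [1e-8, 0.1, 1.0, 10.0, 100.0, 1e6]):
    print(f"lambda={lam:g}  imbalance={imb:.3f}  ||adjustment||={adj:.3f}  ||gamma||={gnorm:.3f}")
```

Shrinking $\lambda$ never increases the imbalance and never decreases the size of the adjustment. Here $N_0 > T_0$, so the imbalance is essentially zero at the smallest penalty.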
Finally, while the constant terms in Theorem 1 and Corollary 1 are relatively loose, it is instructive to consider the bias bounds for SCM and ridge ASCM when the number of control units is large relative to the number of latent factors. In this case, the approximation error is small and it is bias-optimal to minimize imbalance in the lagged outcomes, setting $\lambda$ to be small. Thus, the bias bound for ridge ASCM will be lower than the corresponding bound for SCM. As we show through simulation in Section 8, this intuition holds even when the number of control units is only slightly larger than the number of latent factors, and even in this case the reduction in bias outweighs any increase in variance.
We now connect SCM to other balancing estimators, which builds intuition, helps to motivate the Augmented SCM estimator proposed in Section 3, and gives additional clarity for inference and testing. First, we notice that a form of SCM, which we call penalized SCM, is a special case of an approximate covariate balancing weights estimator. Second, we extend existing results on the duality between balancing weights and inverse propensity score weights to show that SCM is indeed a form of inverse propensity score weighting.
The original SCM problem in Equation (3) may not have a unique solution if multiple sets of weights achieve exact balance on the lagged outcomes. Following Abadie et al. (2015) and Doudchenko and Imbens (2017), we modify the original SCM procedure to penalize the dispersion of the weights with a strongly convex dispersion function, $f(\gamma)$. For a sufficiently small penalty, penalized SCM will be nearly identical to standard SCM in cases where the latter has a unique solution. However, the penalized SCM problem is guaranteed to have a unique solution for any positive penalty, which is analytically convenient.
To fix ideas, we consider the entropy penalty, $f(\gamma) = \sum_{W_i = 0} \gamma_i \log \gamma_i$, used in Robbins et al. (2017). [Footnote 9: Many other dispersion functions are possible, including an elastic net penalty (Doudchenko and Imbens, 2017), a measure of pairwise distance (Abadie and L’Hour, 2018), and a measure of outcome variance (Minard and Waddell, 2018); we discuss alternative dispersion penalties in Section 5.3.] An additional motivation for penalized SCM is that the standard SCM weights can often be unstable, with very few (often just three or four) donor units receiving positive weights. While this may minimize bias, it has high variance; in practice, researchers might want to accept some bias for lower variance. Specifically, the entropy-penalized SCM weights solve:

$$\min_{\gamma}\ \frac{1}{2}\left\|X_{1\cdot} - X_{0\cdot}'\gamma\right\|_2^2 + \zeta \sum_{W_i=0} \gamma_i \log \gamma_i \quad \text{subject to} \quad \sum_{W_i=0} \gamma_i = 1. \qquad (14)$$
The hyperparameter $\zeta$ sets the relative priority of minimizing the entropy term, which penalizes very large and very small weights, versus covariate imbalance. A larger $\zeta$ means greater covariate imbalance, and thus more bias, but lower variance due to the more dispersed weight vector. As with other regularized estimators, if a unique solution exists to the unregularized problem (3), we can find an arbitrarily close solution to (14) by setting $\zeta$ sufficiently small.
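The role of $\zeta$ can be checked directly. This is a minimal sketch, not the authors' solver: it minimizes the objective in (14) with entropic mirror descent, a standard first-order method for simplex-constrained problems whose update also absorbs the entropy penalty in closed form. The function names, step size, iteration count, and simulated data are assumptions made for illustration.

```python
import numpy as np


def entropy_scm(X0, x1, zeta, n_iter=5000):
    """Approximately solve
        min_gamma 0.5 * ||x1 - X0' gamma||^2 + zeta * sum_i gamma_i log gamma_i
    over the probability simplex via entropic (proximal) mirror descent."""
    N0 = X0.shape[0]
    step = 1.0 / np.abs(X0 @ X0.T).max()     # safe step for the quadratic imbalance term
    log_g = np.full(N0, -np.log(N0))         # start from uniform weights, in log space
    for _ in range(n_iter):
        g = np.exp(log_g)
        grad = X0 @ (X0.T @ g - x1)          # gradient of the imbalance term
        # Closed-form mirror step that treats the entropy penalty exactly
        log_g = (log_g - step * grad) / (1.0 + step * zeta)
        log_g -= np.logaddexp.reduce(log_g)  # renormalize so the weights sum to one
    return np.exp(log_g)


def entropy(g):
    return -np.sum(g * np.log(np.maximum(g, 1e-300)))


rng = np.random.default_rng(1)
N0, T0 = 20, 8
X0 = rng.normal(size=(N0, T0))
x1 = 0.5 * X0[0] + 0.3 * X0[1] + 0.2 * X0[2]   # inside the controls' convex hull

for zeta in [0.01, 0.1, 1.0, 10.0]:
    g = entropy_scm(X0, x1, zeta)
    print(f"zeta={zeta:<5} imbalance={np.linalg.norm(x1 - X0.T @ g):.3f} "
          f"entropy={entropy(g):.3f} max weight={g.max():.3f}")
```

Larger $\zeta$ should push the weights toward uniform (entropy near $\log N_0$) at the cost of more imbalance, which is the bias-variance trade-off described above.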
With this setup, we can immediately see that entropy balancing (Hainmueller, 2011) is a special case of Equation (14) in which the weights yield exact balance. Thus, entropy-penalized SCM can also be motivated as entropy balancing with approximate, rather than exact, balance. More generally, SCM is one example of a broad class of covariate balancing weights that have recently become more prominent. In addition to entropy balancing, examples include stable balancing weights (Zubizarreta, 2015), approximate residual balancing (Athey et al., 2018), and several others (Tan, 2017; Wang and Zubizarreta, 2018; Zhao, 2018). See Graham et al. (2012), Chan et al. (2016), and Li et al. (2017) for related examples with slightly different forms. We return to the general case in Section 5.3.
Once we recognize that SCM yields covariate balancing weights, we can leverage recent results connecting balancing and propensity score weights to show that SCM is a form of inverse propensity score weighting. First, we show the duality between entropy-penalized SCM with exact balance and unregularized IPW. This result follows Zhao and Percival (2017), though our argument is more direct. Second, we relax the restriction that the weights yield exact balance. In this case, SCM is equivalent to regularized IPW. We extend results from Wang and Zubizarreta (2018) to the case of an $L^2$ norm and show that entropy-penalized SCM maps to a ridge penalty on the propensity score coefficients. In the appendix we derive a general duality between balancing weights and propensity score estimation that encompasses many estimators in the literature, including extensions to other link functions.
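We consider the penalized SCM estimator in Equation (14). Enforcing exact balance, this estimator can be rewritten as:

$$\min_{\gamma}\ \sum_{W_i=0} \gamma_i \log \gamma_i \quad \text{subject to} \quad \sum_{W_i=0} \gamma_i X_i = X_{1\cdot}, \quad \sum_{W_i=0} \gamma_i = 1.$$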
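Note that the entropy dispersion penalty implicitly enforces the SCM non-negativity constraint. In the appendix, we show that this problem has an equivalent Lagrangian dual:

$$\min_{\alpha_0,\, \alpha}\ \sum_{W_i=0} \exp\left(\alpha_0 + X_i'\alpha\right) - \left(\alpha_0 + X_{1\cdot}'\alpha\right),$$

with donor weights $\hat\gamma_i = \exp(\hat\alpha_0 + X_i'\hat\alpha)$.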
Thus, the donor weights have the form of IPW weights with a logistic link function, where the propensity score is

$$\pi(x) = \frac{\exp(\alpha_0 + x'\alpha)}{1 + \exp(\alpha_0 + x'\alpha)},$$
and the odds of treatment are $\pi(x)/(1 - \pi(x)) = \exp(\alpha_0 + x'\alpha)$. Importantly, the parameters in Equation (17) are fit via calibrated estimation rather than the two-step, maximum likelihood-based approach that is standard for IPW but is impractical in typical SCM settings. [Footnote 10: In traditional IPW, the propensity score model is first estimated via maximum likelihood, then the estimated propensity scores are used to form IPW weights. McCullagh and Nelder (1989) and King and Zeng (2001) show that the MLE for logistic regression can be badly biased when the response is a rare event. Since the estimated propensity score enters the denominator of the weights, the re-weighting step can amplify this bias. Calibrated propensity score estimation also yields consistent estimates of the true propensity score under appropriate conditions; see, for example, Zhao and Percival (2017) and Tan (2017).] Specifically, the Lagrangian dual (17) fits the propensity score coefficients so that the implied odds of treatment, $\exp(\hat\alpha_0 + X_i'\hat\alpha)$, lie on the simplex and the weighted mean of the control units’ covariates is exactly equal to the value of the treated unit, $\sum_{W_i=0} \hat\gamma_i X_i = X_{1\cdot}$. See Tan (2017) and Wang and Zubizarreta (2018) for additional discussion.
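The exact-balance dual can be solved with a few Newton steps, which makes the calibration property concrete: at the optimum, the implied odds sum to one and reproduce the treated unit's lagged outcomes exactly. This sketch assumes the treated unit lies strictly inside the convex hull of the controls (otherwise exact balance is infeasible and the dual objective is unbounded below); `calibrated_weights` and the simulated data are hypothetical.

```python
import numpy as np


def calibrated_weights(X0, x1, tol=1e-10, max_iter=100):
    """Solve the exact-balance dual
        min_{a0, a} sum_i exp(a0 + X_i' a) - (a0 + x1' a)
    by damped Newton; the donor weights are the implied odds exp(a0 + X_i' a)."""
    Z = np.column_stack([np.ones(len(X0)), X0])   # intercept plus lagged outcomes
    z1 = np.concatenate([[1.0], x1])
    theta = np.zeros(Z.shape[1])
    theta[0] = -np.log(len(X0))                    # start from uniform weights

    def objective(th):
        return np.exp(Z @ th).sum() - z1 @ th

    for _ in range(max_iter):
        odds = np.exp(Z @ theta)
        grad = Z.T @ odds - z1        # zero iff weights sum to one and balance x1 exactly
        if np.abs(grad).max() < tol:
            break
        hess = Z.T @ (odds[:, None] * Z)
        step = np.linalg.solve(hess, grad)
        t = 1.0                        # backtracking line search keeps Newton stable
        while objective(theta - t * step) > objective(theta) - 0.25 * t * grad @ step and t > 1e-12:
            t *= 0.5
        theta = theta - t * step
    return np.exp(Z @ theta), theta


rng = np.random.default_rng(2)
X0 = rng.normal(size=(20, 8))
x1 = X0.T @ rng.dirichlet(np.ones(20))   # strictly inside the hull: exact balance is feasible
gamma, theta = calibrated_weights(X0, x1)
pscore = gamma / (1.0 + gamma)            # propensity score implied by the odds
print("sum of weights:", gamma.sum())
print("max abs imbalance:", np.abs(x1 - X0.T @ gamma).max())
```

The coefficients are chosen to satisfy the balance conditions directly rather than by maximizing a likelihood, which is the sense in which the estimation is calibrated.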
We now relax the unrealistic constraint that SCM yield exact balance. In the dual perspective of balancing weights as IPW, allowing for approximate balance is equivalent to regularizing the propensity score model. Specifically, analogous to the argument under exact balance, the Lagrangian dual to the entropy-penalized SCM problem (14) yields a propensity score model that now includes a ridge penalty on the propensity score coefficients:

$$\min_{\alpha_0,\, \alpha}\ \sum_{W_i=0} \exp\left(\alpha_0 + X_i'\alpha\right) - \left(\alpha_0 + X_{1\cdot}'\alpha\right) + \frac{\zeta}{2}\|\alpha\|_2^2.$$
In this form it is clear that $\zeta$ controls the level of regularization of the propensity score parameters $\alpha$, which maps back to the weights $\hat\gamma$. When $\zeta$ is large, the parameter estimates $\hat\alpha$ will be near zero, implying that the weights will be near uniform. Conversely, when $\zeta$ is small, $\hat\alpha$ may be large in magnitude, allowing for extreme weights that prioritize lower bias at the price of higher variance. In practice, SCM implicitly chooses $\zeta$ to be as small as possible, so the weights are extreme; it is common for only three or four units to receive positive weights. See Doudchenko and Imbens (2017) for a discussion of choosing $\zeta$ by cross validation.
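With the ridge penalty, the intercept can be profiled out, leaving softmax weights and the objective $\log\sum_{W_i=0} \exp(X_i'\alpha) - X_{1\cdot}'\alpha + \tfrac{\zeta}{2}\|\alpha\|_2^2$. The sketch below solves this form and checks the stationarity condition $\zeta\hat\alpha = X_{1\cdot} - \sum_{W_i=0}\hat\gamma_i X_i$, which ties the size of the propensity coefficients to the remaining imbalance. The names, tuning values, and data are illustrative assumptions.

```python
import numpy as np


def softmax(s):
    return np.exp(s - np.logaddexp.reduce(s))


def ridge_dual_weights(X0, x1, zeta, tol=1e-10, max_iter=100):
    """Solve min_a logsumexp(X0 a) - x1' a + (zeta / 2) ||a||^2 by damped Newton.
    With the intercept profiled out, the implied odds become softmax weights."""
    T0 = X0.shape[1]
    alpha = np.zeros(T0)

    def objective(a):
        return np.logaddexp.reduce(X0 @ a) - x1 @ a + 0.5 * zeta * a @ a

    for _ in range(max_iter):
        gamma = softmax(X0 @ alpha)
        grad = X0.T @ gamma - x1 + zeta * alpha   # zero at the optimum
        if np.abs(grad).max() < tol:
            break
        xbar = X0.T @ gamma
        hess = X0.T @ (gamma[:, None] * X0) - np.outer(xbar, xbar) + zeta * np.eye(T0)
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while objective(alpha - t * step) > objective(alpha) - 0.25 * t * grad @ step and t > 1e-12:
            t *= 0.5
        alpha = alpha - t * step
    return softmax(X0 @ alpha), alpha


rng = np.random.default_rng(2)
N0 = 20
X0 = rng.normal(size=(N0, 8))
x1 = X0.T @ rng.dirichlet(np.ones(N0))
for zeta in [1e3, 1.0, 0.01]:
    gamma, alpha = ridge_dual_weights(X0, x1, zeta)
    print(f"zeta={zeta:g}: max weight={gamma.max():.3f}, "
          f"imbalance={np.linalg.norm(x1 - X0.T @ gamma):.4f}, |alpha|={np.linalg.norm(alpha):.3f}")
```

Heavy regularization keeps $\hat\alpha$ near zero and the weights near $1/N_0$; light regularization lets $\hat\alpha$ grow and the weights concentrate.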
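Finally, we briefly characterize the entropy-penalized SCM problem as a special case of a broader class of problems. Specifically, we can rewrite the penalized SCM problem in Equation (14) in a more general form (see Ben-Michael et al., 2018, for additional discussion):

$$\min_{\gamma}\ h_\zeta\left(X_{1\cdot} - X_{0\cdot}'\gamma\right) + \sum_{W_i=0} f(\gamma_i) \quad \text{subject to} \quad \sum_{W_i=0} \gamma_i = 1. \qquad (19)$$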
This general form has two key components: $f(\cdot)$ is a strongly convex measure of dispersion (e.g., entropy), which guarantees uniqueness, and $h_\zeta(\cdot)$ is a measure of distance (e.g., the $L^2$ norm). This formulation covers several estimators, including penalized SCM, entropy balancing (Hainmueller, 2011), Oaxaca-Blinder weights (Kline, 2011), and minimal approximately balancing weights (Wang and Zubizarreta, 2018). In the appendix we extend the above arguments to derive the Lagrangian dual of this general balancing weights problem (19):

$$\min_{\alpha_0,\, \alpha}\ \sum_{W_i=0} f^*\left(\alpha_0 + X_i'\alpha\right) - \left(\alpha_0 + X_{1\cdot}'\alpha\right) + h_\zeta^*(\alpha), \qquad (20)$$
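where a convex, differentiable function $f$ has convex conjugate $f^*(y) = \sup_x \{xy - f(x)\}$. The solutions to the primal problem (19) are $\hat\gamma_i = f^{*\prime}(\hat\alpha_0 + X_i'\hat\alpha)$, where $f^{*\prime}$ is the first derivative of the convex conjugate.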
The two components of the primal problem (19) control the propensity score model and how it is regularized. The dispersion measure $f(\cdot)$ determines the link function of the propensity score model, where the odds of treatment are $f^{*\prime}(\alpha_0 + x'\alpha)$. Note that un-penalized SCM, which can yield multiple solutions, does not have a well-defined link function. The balance criterion $h_\zeta(\cdot)$ determines the type of regularization through its conjugate $h_\zeta^*(\cdot)$. This formulation recovers the duality between entropy balancing and a logistic link (Zhao and Percival, 2017), Oaxaca-Blinder weights and a log-logistic link (Kline, 2011), and $L^\infty$ balance and $L^1$ regularization (Wang and Zubizarreta, 2018). This more general formulation also suggests natural extensions of both SCM and ASCM beyond the $L^2$ setting to other forms, especially $L^1$ regularization.
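To illustrate how the dispersion measure fixes the link, the sketch below swaps entropy for the quadratic dispersion $f(\gamma_i) = \gamma_i^2/2$ with no sign constraint and imposes exact balance (both assumptions chosen for illustration). Then $f^*(u) = u^2/2$ and $f^{*\prime}(u) = u$, so the dual's first-order condition is a linear system and the implied odds are linear in $X_i$. The resulting minimum-norm weights reproduce the OLS prediction at $X_{1\cdot}$, which is how we read the Oaxaca-Blinder connection above; the helper name and data are hypothetical.

```python
import numpy as np


def quadratic_dual_weights(X0, x1):
    """Exact-balance weights for the dispersion f(g) = g^2 / 2 (no sign constraint).
    Since f*(u) = u^2 / 2, the dual first-order condition is linear and
    gamma_i = f*'(a0 + X_i' a) = a0 + X_i' a, a linear 'link' for the odds."""
    Z = np.column_stack([np.ones(len(X0)), X0])
    z1 = np.concatenate([[1.0], x1])
    theta = np.linalg.solve(Z.T @ Z, z1)   # dual first-order condition: Z' Z theta = z1
    return Z @ theta


rng = np.random.default_rng(4)
N0, T0 = 25, 5
X0 = rng.normal(size=(N0, T0))
x1 = rng.normal(size=T0) + 2.0            # away from the controls' bulk: weights can go negative
y0 = X0 @ rng.normal(size=T0) + rng.normal(size=N0)

gamma = quadratic_dual_weights(X0, x1)
Z = np.column_stack([np.ones(N0), X0])
beta, *_ = np.linalg.lstsq(Z, y0, rcond=None)
print("weighted control mean:", gamma @ y0)
print("OLS prediction at x1: ", np.concatenate([[1.0], x1]) @ beta)
print("negative weights:", int((gamma < 0).sum()))
```

Unlike the entropy case, nothing keeps these weights non-negative, so they extrapolate whenever $X_{1\cdot}$ sits outside the controls' convex hull.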
Abadie et al. (2010, 2015) propose a widely adopted testing procedure for SCM based on a uniform permutation test. Interpretation of this test, however, is unsettled. Abadie et al. (2015) justify it as a placebo test that does not require randomization for validity. [Footnote 11: Hahn and Shi (2017) argue that the placebo test interpretation only holds under strong assumptions on both the model and outcome distribution. Ando and Sävje (2013) show that a similar motivation, which requires exchangeability of the test statistic, is unlikely to hold, since units on the border of the convex hull of the factor loadings are systematically different from the treated unit, which is typically assumed to be inside the convex hull.] Firpo and Possebom (2017) instead justify it as a Fisher randomization test with uniform treatment assignment and then assess sensitivity to this assumption. [Footnote 12: Chernozhukov et al. (2017) propose an alternative permutation approach that permutes the time periods rather than the treatment assignment. The key assumption for the validity of this approach is unbiasedness of the estimator, which, as we argue, is likely violated in practice for SCM. Simulation results in the appendix confirm that this approach is invalid in the settings we consider.]
The IPW equivalence suggests that the randomization-based perspective is a natural interpretation. A valid randomization-based test for an IPW estimator, however, requires weighting the permutation distribution by the propensity score. In the appendix, we describe valid testing with known, varying propensity scores (see Branson and Bind, 2018), and show that the uniform permutation test differs from this ideal case. Thus, we argue that uniform permutation testing will generally be invalid from the randomization-based perspective, though other perspectives are possible.
A weighted permutation test using the estimated propensity score is promising in principle. Toulis and Shaikh (2018) give conditions under which the corresponding $p$-values will be valid asymptotically. In practice, however, there is little reason to expect those conditions to hold in typical, finite-sample SCM settings, with a heavily regularized propensity score based on a single treated unit. Simulation evidence presented in the appendix confirms this pessimistic outlook. Moreover, even if this approach were valid, it would be impractical in most SCM settings: since SCM weights are generally sparse, few permutations will have non-zero weight, and the $p$-value distribution under both the null and alternative will be pathological. For instance, it is quite common for SCM weights to be positive for only three or four donor units. If four units each get equal weight, the permutation distribution has only a handful of support points, and the lowest possible $p$-value is far above conventional significance levels.
As an alternative to permutation testing, we propose a model-based approach to inference based on the placebo distribution and show that the resulting inference is conservative under some assumptions on the error distribution. Specifically, we consider a generic outcome model with independent, sub-Gaussian noise $\varepsilon_{it}$, $Y_{it}(0) = m_{it} + \varepsilon_{it}$. To simplify exposition and notation, we initially assume that the noise terms are homoskedastic at each time $t$, with common variance $\sigma_t^2$. Under this model, the variance of the SCM or ASCM treatment effect estimate is:

$$\text{Var}(\hat\tau) = \sigma_T^2\left(1 + \|\hat\gamma\|_2^2\right), \qquad (21)$$
where $\sigma_T^2$ is the residual variance at post-treatment time $T$ and where the variance conditions on the observed lagged outcomes and treatment assignment. With uniform weights, $\hat\gamma_i = 1/N_0$, Equation (21) reduces to the usual variance of the difference in means under homoskedasticity, $\sigma_T^2(1 + 1/N_0)$.
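We propose to estimate the noise variance via the average squared placebo gap:

$$\hat\sigma_T^2 = \frac{1}{N_0}\sum_{W_i=0}\left(Y_{iT} - \hat Y_{iT}^{(-i)}\right)^2, \qquad (22)$$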
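where $\hat Y_{iT}^{(-i)} = \sum_{j \neq i,\, W_j = 0} \hat\gamma_j^{(-i)} Y_{jT}$ is the leave-one-out SCM estimate of $Y_{iT}$ for control unit $i$. Importantly, $\hat\sigma_T^2$ is a conservative estimator for $\sigma_T^2$, in the sense of having a positive bias. To see this, note that the variance of the placebo gaps is strictly larger than the variance of the noise: $\text{Var}(Y_{iT} - \hat Y_{iT}^{(-i)}) = \sigma_T^2\left(1 + \|\hat\gamma^{(-i)}\|_2^2\right) > \sigma_T^2$. We then estimate the sampling variance of $\hat\tau$ as

$$\hat V = \hat\sigma_T^2\left(1 + \|\hat\gamma\|_2^2\right). \qquad (23)$$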
Proposition 1 shows that this is a conservative estimate for $\text{Var}(\hat\tau)$.
Let . Under the linear factor model (10), is conservative:
Proposition 1 shows that the upward bias of $\hat\sigma_T^2$ depends on the bias of the placebo estimates and the $L^2$-norm of the placebo weights. [Footnote 13: Doudchenko and Imbens (2017) use a different justification for using $\hat\sigma_T^2$ as the estimate for $\sigma_T^2$.] Proposition 1 also shows that in some cases the variance estimate in (23) may be conservative enough to give valid confidence intervals for $\tau$; these intervals may undercover in other situations, however, especially with more than one treated unit. Building on results from Section 4, these quantities will be small when the control units are similar to each other and tightly packed together, which will limit both the bias and the spread of the weights. Conversely, the upward bias of $\hat\sigma_T^2$ will be large if there are extreme outliers in the control group.
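The direction of the bias can be seen in a short calculation. This sketch assumes, as in the model above, that post-treatment noise is independent across units with mean zero and variance $\sigma_T^2$, and that each leave-one-out weight vector $\hat\gamma^{(-i)}$ depends only on the lagged outcomes, so it is fixed once we condition on them; $B_i$ is our notation for the conditional bias of the $i$th placebo estimate. It illustrates the mechanism rather than restating Proposition 1 exactly.

$$Y_{iT} - \hat Y_{iT}^{(-i)} = B_i + \varepsilon_{iT} - \sum_{j \neq i,\, W_j = 0} \hat\gamma_j^{(-i)} \varepsilon_{jT},$$

$$\mathbb{E}\left[\left(Y_{iT} - \hat Y_{iT}^{(-i)}\right)^2 \,\middle|\, X, W\right] = B_i^2 + \sigma_T^2\left(1 + \|\hat\gamma^{(-i)}\|_2^2\right),$$

$$\mathbb{E}\left[\hat\sigma_T^2 \mid X, W\right] = \sigma_T^2 + \frac{1}{N_0}\sum_{W_i = 0}\left(B_i^2 + \sigma_T^2\,\|\hat\gamma^{(-i)}\|_2^2\right) \;\geq\; \sigma_T^2.$$

Both excess terms grow when a control unit is far from the others: its placebo fit is biased, and its leave-one-out weights concentrate on few donors.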
Many modifications to the simple variance estimate in (23) are possible. For example, we can divide the $i$th squared placebo gap by its contribution to the bias. We can similarly extend to the fully heteroskedastic case by weighting the mean squared placebo gap. In addition, we can extend this variance estimate to the setting with multiple post-treatment time periods by separately estimating $\sigma_t^2$ for each post-treatment time period and plugging each estimate into (23).
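The placebo-based estimator in (22) and (23), with the per-period extension just described, can be sketched as follows. The weighting rule is passed in as a function; `ridge_weights` (uniform weights plus a ridge correction) is a hypothetical stand-in for SCM or ASCM weights, and the simulated factor-model data are illustrative only.

```python
import numpy as np


def ridge_weights(X0, x1, lam=1.0):
    """Hypothetical stand-in weighting rule: uniform weights plus a ridge correction."""
    mu = X0.mean(axis=0)
    X0c = X0 - mu
    A = X0c.T @ X0c + lam * np.eye(X0.shape[1])
    return np.full(len(X0), 1.0 / len(X0)) + X0c @ np.linalg.solve(A, x1 - mu)


def placebo_variance(X0, Y0_post, x1, weight_fn):
    """Placebo-based variance estimate, computed separately for each post-period.

    X0: (N0, T0) control lagged outcomes; Y0_post: (N0, S) control post-period outcomes;
    x1: (T0,) treated lagged outcomes; weight_fn(X, x) returns weights for target x.
    Returns sigma2_hat (S,), V_hat (S,), and the treated unit's weights (N0,)."""
    N0 = X0.shape[0]
    gaps = np.empty(Y0_post.shape)
    for i in range(N0):
        keep = np.arange(N0) != i
        g_i = weight_fn(X0[keep], X0[i])            # leave-one-out weights for placebo unit i
        gaps[i] = Y0_post[i] - g_i @ Y0_post[keep]  # placebo gap in every post-period
    sigma2_hat = np.mean(gaps ** 2, axis=0)         # average squared placebo gap, per period
    gamma = weight_fn(X0, x1)
    V_hat = sigma2_hat * (1.0 + gamma @ gamma)      # sigma^2 (1 + ||gamma||^2), period by period
    return sigma2_hat, V_hat, gamma


rng = np.random.default_rng(5)
N0, T0, S, r = 40, 15, 3, 2
loadings = rng.normal(size=(N0 + 1, r))
factors = rng.normal(size=(T0 + S, r))
Y = loadings @ factors.T + rng.normal(size=(N0 + 1, T0 + S))   # unit noise variance
x1, X0, Y0_post = Y[0, :T0], Y[1:, :T0], Y[1:, T0:]

sigma2_hat, V_hat, gamma = placebo_variance(X0, Y0_post, x1, ridge_weights)
print("sigma^2 estimates by period:", np.round(sigma2_hat, 2), "(true value 1)")
print("standard errors by period:  ", np.round(np.sqrt(V_hat), 2))
```

Because the weighting rule is a parameter, the same function applies to any estimator that can be written with weights, including the covariate-adjusted variant discussed in Section 7.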
While we advocate the model-based inference approach, there are many possible alternatives. First, the bootstrap is an attractive approach for inference for IPW estimators in cross-sectional settings (Funk et al., 2011). The SCM setting, however, falls between traditional IPW, where the standard case-resampling bootstrap works well, and matching-style estimators, where it can fail (Abadie and Imbens, 2008). The simulations in Section 8 include a simple non-parametric bootstrap and show that it has poor coverage. Second, Robbins et al. (2017) propose estimating standard errors following a survey sampling approach. This method, however, is restricted to estimators with non-negative weights. Finally, we could pursue a design-based approach to inference, which follows naturally from both the IPW perspective and the randomization-based testing approach for SCM (e.g., Doudchenko and Imbens, 2017). Our initial simulations, however, found that both the design-based and survey sampling approaches performed poorly in settings with a single treated unit. We leave a thorough investigation to future work.
So far, we have focused on a simplified version of SCM that uses only pre-treatment outcomes as covariates. The original SCM formulation, however, also includes auxiliary covariates $Z_i$, which are typically time invariant and which can also include functions of the lagged outcomes, such as the pre-treatment average. We assume these covariates are centered by the control unit averages, and for now assume that the matrix of control unit covariates, $Z_{0\cdot}$, is full rank. Abadie et al. (2010) propose a variant of Equation (3) that minimizes a weighted imbalance measure in $Z$, with weights chosen to minimize imbalance of the lagged outcomes $X$. Using our IPW perspective, we show in Appendix A.4.1 that this approach fits a (calibrated) propensity score model using $Z$ as covariates and a weighted ridge penalty, with weights chosen to minimize the imbalance in $X$. [Footnote 14: This two-step method induces certain pathologies. For instance, Kaul et al. (2018) point out that SCM applications commonly include all lagged outcomes in $Z$. In this case, covariates in $Z$ other than the lagged outcomes receive zero weight, and the two-step estimator reduces to the simpler version in Equation (3).]
Our results suggest three transparent alternatives for incorporating auxiliary covariates. First, we can expand ASCM to include $Z_i$ alongside $X_i$, both in the SCM balance criterion and in the outcome model, $m(X_i, Z_i)$. This is natural from the (augmented) IPW perspective; including $Z_i$ in the balance criterion is equivalent to including it in the propensity score equation. See Appendix A.4.2 for additional details, including the possibility of giving different weights to $X_i$ and $Z_i$ in the SCM balance criterion and the outcome model regularization.
Second, we can partition the variables in the ASCM framework, using SCM to balance the lagged outcomes and the outcome model to adjust for possible bias due to imbalance in the auxiliary covariates. Specifically, this approach uses our ASCM proposal, Equation (5), but expresses the outcome model solely in terms of $Z_i$. Because we assume that $Z_{0\cdot}$ is full rank, we can fit this model by OLS, yielding $\hat m(Z_i) = \hat\eta_0 + \hat\eta' Z_i$, where $\hat\eta = (Z_{0\cdot}'Z_{0\cdot})^{-1} Z_{0\cdot}' Y_{0\cdot}$.
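To understand this approach, consider linear projections of $Y_i$ and $X_i$ onto $Z_i$, using coefficients estimated from control observations. We can write the projections as $\hat Y_i = \hat\eta_0 + \hat\eta' Z_i$ and $\hat X_i = \hat B_0 + \hat B' Z_i$, where $\hat B = (Z_{0\cdot}'Z_{0\cdot})^{-1} Z_{0\cdot}' X_{0\cdot}$, and write the residuals from these projections as $\check Y_i = Y_i - \hat Y_i$ and $\check X_i = X_i - \hat X_i$, respectively. Lemma 3 indicates that ASCM with OLS on $Z_i$ as the outcome model is a weighting estimator that exactly balances the covariates $Z_i$ and thus the projections $\hat X_i$. The remaining imbalance comes solely from the residual component $\check X_i$.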
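For any weight vector $\gamma$ that sums to one, the ASCM estimator from Equation (5) that uses $\hat m(Z_i)$ as the outcome model is a weighting estimator,

$$\hat Y_{1T}(0) = \sum_{W_i=0} \hat\gamma_i^{\text{cov}} Y_{iT}, \qquad \hat\gamma_i^{\text{cov}} = \gamma_i + \left(Z_{1\cdot} - Z_{0\cdot}'\gamma\right)'\left(Z_{0\cdot}'Z_{0\cdot}\right)^{-1} Z_i,$$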
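and the imbalance in the lagged outcomes and the auxiliary covariates are

$$X_{1\cdot} - \sum_{W_i=0} \hat\gamma_i^{\text{cov}} X_i = \check X_{1\cdot} - \sum_{W_i=0} \gamma_i \check X_i, \qquad Z_{1\cdot} - \sum_{W_i=0} \hat\gamma_i^{\text{cov}} Z_i = 0.$$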
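Lemma 3 can be verified numerically. The sketch below uses arbitrary weights on the simplex in place of SCM weights and simulated data (both assumptions), writes the OLS-augmented estimate as a weighting estimator using the weight formula for centered covariates, and checks exact covariate balance, the residual form of the lagged-outcome imbalance, and the residualized form of the estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
N0, T0, K = 25, 6, 3
Z0, z1 = rng.normal(size=(N0, K)), rng.normal(size=K)
mix = rng.normal(size=(K, T0))
X0 = Z0 @ mix + rng.normal(size=(N0, T0))
x1 = z1 @ mix + rng.normal(size=T0)
coef_y = np.array([1.0, -1.0, 0.5])
Y0 = Z0 @ coef_y + rng.normal(size=N0)
y1 = z1 @ coef_y + rng.normal()
gamma = rng.dirichlet(np.ones(N0))          # any weights summing to one (stand-in for SCM)

# Center covariates at the control averages, as assumed in the text
zbar = Z0.mean(axis=0)
Z0c, z1c = Z0 - zbar, z1 - zbar

# ASCM with an OLS outcome model in Z (intercept included)
D0 = np.column_stack([np.ones(N0), Z0c])
eta, *_ = np.linalg.lstsq(D0, Y0, rcond=None)
m_hat = lambda zc: eta[0] + zc @ eta[1:]
ascm = gamma @ Y0 + m_hat(z1c) - gamma @ m_hat(Z0c)

# The same estimate written as a weighting estimator
gamma_cov = gamma + Z0c @ np.linalg.solve(Z0c.T @ Z0c, z1c - Z0c.T @ gamma)

# Residualize X and Y on Z using control units only
BX, *_ = np.linalg.lstsq(D0, X0, rcond=None)
X0r, x1r = X0 - D0 @ BX, x1 - np.concatenate([[1.0], z1c]) @ BX
Y0r, y1r = Y0 - D0 @ eta, y1 - m_hat(z1c)

print("ASCM estimate vs weighted form:", ascm, gamma_cov @ Y0)
print("max covariate imbalance:", np.abs(z1 - Z0.T @ gamma_cov).max())
```

The implied weights always sum to one and balance $Z$ exactly, whatever base weights are used; only the residualized lagged outcomes remain unbalanced.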
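This has several implications. First, the approach outlined above, using $\hat m(Z_i)$ as the outcome model in ASCM, is equivalent to first residualizing $Y_{iT}$ against $Z_i$, then estimating the treatment effect as $\check Y_{1T} - \sum_{W_i=0} \hat\gamma_i^{\text{scm}} \check Y_{iT}$, where $\hat\gamma^{\text{scm}}$ are the SCM weights that minimize imbalance in $X_i$.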
Second, this suggests an improvement on this approach that replaces $\hat\gamma^{\text{scm}}$ with SCM- or ASCM-style weights that minimize imbalance in $\check X_i$ instead of $X_i$. This can be implemented in the form of a partitioned regression: first residualize $Y_i$ and $X_i$ against $Z_i$, then apply SCM or ASCM to the residuals $\check Y_i$ and $\check X_i$. By Lemma 3, this perfectly balances $Z_i$. Furthermore, by achieving better balance on the residuals $\check X_i$, this approach achieves better balance on $X_i$ than is obtained by separately balancing the raw lagged outcomes and fitting OLS on the auxiliary covariates. [Footnote 15: In addition, under the linear factor model with covariates considered by Abadie et al. (2010) and Xu (2017), the bias bound in Theorem 1 will still hold because $\hat\gamma^{\text{cov}}$ perfectly balances the covariates.] Note that while this partitioned ASCM approach always exactly balances the auxiliary covariates, whether it achieves better balance on the original $X_i$ than SCM alone, without auxiliary covariates, is data dependent. We thus propose this approach as a sensible default and, as with SCM and ASCM, encourage careful balance checking. In addition, because we can represent this approach as a weighting estimator, we can compute standard errors with the procedure in Section 6.2.
Finally, the partitioned regression procedure as outlined requires $Z_{0\cdot}$ to be full rank. If it is not (e.g., if there are more auxiliary covariates than control units), we can still follow the partitioned regression procedure, though the residuals will only be approximately orthogonal to the auxiliary covariates, and Equation (27) will not hold. In practice we suggest following the dimension reduction approaches in the following section, as we do in the numerical illustration in Section 8.
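A compact sketch of the partitioned procedure: residualize on the auxiliary covariates with control-only OLS, fit SCM to the residualized lagged outcomes, and recover the implied weights on the original data. Here SCM is taken to be simplex-constrained least squares with equal predictor weights, solved with an accelerated projected gradient method; that solver, the helper names, and the simulated data are our assumptions, not the augsynth implementation.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    return np.maximum(v - (css[rho] - 1.0) / (rho + 1.0), 0.0)


def scm_weights(X0, x1, n_iter=3000):
    """Simplex-constrained least squares min ||x1 - X0' g||^2 via accelerated projected gradient."""
    step = 1.0 / np.linalg.norm(X0, 2) ** 2     # 1 / Lipschitz constant of the gradient
    g = y = np.full(len(X0), 1.0 / len(X0))
    t = 1.0
    for _ in range(n_iter):
        g_next = project_simplex(y - step * X0 @ (X0.T @ y - x1))
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = g_next + ((t - 1.0) / t_next) * (g_next - g)   # momentum step
        g, t = g_next, t_next
    return g


def residualize(Z0, M0, z1, m1):
    """Residuals from an OLS projection on [1, Z] fit on control units only."""
    D0 = np.column_stack([np.ones(len(Z0)), Z0])
    B, *_ = np.linalg.lstsq(D0, M0, rcond=None)
    return M0 - D0 @ B, m1 - np.concatenate([[1.0], z1]) @ B


rng = np.random.default_rng(6)
N0, T0, K = 30, 8, 2
Z0, z1 = rng.normal(size=(N0, K)), rng.normal(size=K) + 1.0
mix = rng.normal(size=(K, T0))
X0 = Z0 @ mix + rng.normal(size=(N0, T0))
x1 = z1 @ mix + rng.normal(size=T0)
Y0 = Z0 @ np.array([2.0, -1.0]) + X0[:, -1] + rng.normal(size=N0)
y1 = z1 @ np.array([2.0, -1.0]) + x1[-1] + rng.normal()

# Partitioned approach: residualize on Z, then balance the residualized lagged outcomes
X0r, x1r = residualize(Z0, X0, z1, x1)
Y0r, y1r = residualize(Z0, Y0, z1, y1)
g_part = scm_weights(X0r, x1r)
tau_part = y1r - g_part @ Y0r

# Implied weights on the original data (Lemma 3 form, with Z centered at control means)
Z0c, z1c = Z0 - Z0.mean(axis=0), z1 - Z0.mean(axis=0)
gamma_cov = g_part + Z0c @ np.linalg.solve(Z0c.T @ Z0c, z1c - Z0c.T @ g_part)

# Comparison: SCM on raw lagged outcomes plus OLS on Z leaves residual imbalance x1r - X0r' g_raw
g_raw = scm_weights(X0, x1)
print("estimate:", tau_part, "=", y1 - gamma_cov @ Y0)
print("covariate imbalance:", np.abs(z1 - Z0.T @ gamma_cov).max())
print("lagged-outcome imbalance, partitioned:", np.linalg.norm(x1 - X0.T @ gamma_cov))
print("lagged-outcome imbalance, raw SCM + OLS:", np.linalg.norm(x1r - X0r.T @ g_raw))
```

The last two lines compare the two strategies on the same footing: by Lemma 3 each one's lagged-outcome imbalance equals its imbalance in the residualized outcomes, which the partitioned weights minimize directly.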
While Abadie et al. (2010) suggest including (functions of) the lagged outcomes in $Z_i$, there is little guidance on how to do so in practice. Consistent with the literature on balancing covariates in high dimensions (e.g., Ning et al., 2017), we argue that a principled approach is to choose functions that best approximate the conditional expectation of the post-treatment outcome given the lagged outcomes, $m(X_i) = \mathbb{E}[Y_{iT}(0) \mid X_i]$.
This can take several forms. First, we can manually select summary statistics based on an assumed functional form for $m(\cdot)$. Abadie et al. (2010) argue for this under an autoregressive model; if $Y_{it}(0)$ follows an AR($p$) model, then exactly balancing the most recent $p$ entries of $X_i$ is sufficient for unbiasedness. Doudchenko and Imbens (2017) consider the case where $m(\cdot)$ includes unit-specific intercepts; here, balancing the pre-treatment average outcome is sufficient to capture the available information about the underlying factors. [Footnote 16: Doudchenko and Imbens (2017) and Ferman and Pinto (2018) propose applying SCM to de-meaned data $X_{it} - \bar X_i$. This can be seen as a special case of our partitioned regression proposal.]
Richer models would point to additional summary statistics, such as the pre-treatment trend or, in a general factor model, estimates of the factor loadings from a singular value decomposition; see Gobillon and Magnac (2016) for similar suggestions.
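The summary statistics mentioned above are straightforward to compute. This is a hypothetical helper written for illustration: the SVD of the full panel (treated unit included) is only a rough proxy for factor-loading estimates, and the choice of $p$, $k$, and which columns to keep should follow the model assumed for $m(\cdot)$.

```python
import numpy as np


def summary_features(X, p=3, k=2):
    """Candidate low-dimensional summaries of the lagged outcomes (one row per unit).

    Columns: pre-period mean, the last p lagged outcomes, an OLS linear-trend slope,
    and k SVD-based scores (a crude stand-in for estimated factor loadings)."""
    N, T0 = X.shape
    t = np.arange(T0) - (T0 - 1) / 2.0          # centered time index
    mean = X.mean(axis=1)
    slope = (X - mean[:, None]) @ t / (t @ t)   # per-unit least-squares trend
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    loadings = U[:, :k] * s[:k]                  # unit scores on the top-k components
    return np.column_stack([mean, X[:, -p:], slope, loadings])


rng = np.random.default_rng(7)
X = rng.normal(size=(39, 19)).cumsum(axis=1)    # hypothetical panel: 39 units, 19 pre-periods
F = summary_features(X)
print(F.shape)   # (39, 7)
```

The resulting columns can then serve as $Z_i$ in the partitioned procedure or be balanced directly by SCM.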
Alternatively, we can directly model the prognostic score (Hansen, 2008). For example, we could use the ridge regression model in Section 3.2 to estimate $\hat m(X_i)$ for each unit, then fit SCM using this as the only covariate. Or, analogous to the $V$ matrix in Abadie et al. (2010), we could weight the covariates by their relative predictive power for $Y_{iT}$. [Footnote 17: This is the analog of the common “covariate screening” approach for high-dimensional covariates. For example, Ning et al. (2017) use a Lasso regression of $Y_i$ on $X_i$ as a pre-processing step to select important covariates and then balance that selected subset. See also Belloni et al. (2014).] Hazlett and Xu (2018) also suggest a promising kernel-based approach in the context of SCM, first proposing a kernel representation of the lagged outcomes and then balancing a low-rank approximation to the kernel.
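A prognostic score along these lines can be computed with a closed-form ridge fit on the control units. This sketch assumes lagged outcomes centered at the control means, an unpenalized intercept, and an arbitrary penalty; `prognostic_score` is a hypothetical helper, and the resulting score would then be the single covariate passed to SCM.

```python
import numpy as np


def prognostic_score(X0, y0, X_all, lam=1.0):
    """Ridge estimate of m(X) = E[Y_T(0) | X], fit on control units and
    evaluated for every unit (sketch; lam is a hypothetical tuning value)."""
    mu, ybar = X0.mean(axis=0), y0.mean()
    X0c = X0 - mu
    A = X0c.T @ X0c + lam * np.eye(X0.shape[1])
    eta = np.linalg.solve(A, X0c.T @ (y0 - ybar))   # ridge slope, unpenalized intercept
    return ybar + (X_all - mu) @ eta


rng = np.random.default_rng(8)
X = rng.normal(size=(39, 19))      # row 0 plays the treated unit
b = rng.normal(size=19)
y = X @ b + 3.0 + 0.1 * rng.normal(size=39)
score = prognostic_score(X[1:], y[1:], X, lam=1.0)
print("treated unit's estimated prognostic score:", score[0])
```

Only control outcomes enter the fit, so the treated unit's post-treatment outcome never influences its own score.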
We evaluate specific implementations of dimension reduction in our simulations in Section 8. Simulation evidence indicates that SCM using a well-chosen dimension reduction can improve over SCM using the full vector of pre-treatment outcomes, but that ASCM using the full covariate vector and a flexible outcome model generally improves on either.
We now turn to a numerical illustration and simulations. We focus on two prominent SCM examples: the impact of Proposition 99 on cigarette consumption in California (Abadie et al., 2010) and the impact of terrorism on the Basque economy (Abadie and Gardeazabal, 2003). We describe the Prop. 99 setting below and describe the Basque example in the appendix. We generate simulation studies designed to mimic each application. Overall, we find that SCM is badly biased across these simulations and that there can be substantial gains from ASCM and related approaches. We also use these simulations to show that standard uniform permutation tests are generally invalid and to evaluate coverage of the variance estimator in Section 6.2.
In their seminal paper, Abadie et al. (2010) apply SCM to the impact of California Proposition 99 on per-capita cigarette sales. Enacted in 1989, Prop. 99 increased cigarette taxes in addition to launching several anti-tobacco projects in the state. The authors find that, for the 1989–2000 period, Prop. 99 reduced annual cigarette sales by roughly 25 percent. A small cottage industry has emerged around re-analyzing this example (e.g., Doudchenko and Imbens, 2017; Hazlett and Xu, 2018; Hsiao et al., 2018).
Figures 2 and 3 show the results of our re-analysis of the California data. Figure 2 shows gap plots of the ATT estimates, plus or minus two standard errors calculated using our model-based proposal in Section 6.2. We consider three estimators: (1) SCM alone, [Footnote 18: Note that the SCM estimator used here balances all the lagged outcomes. The original Abadie et al. (2010) study used four auxiliary covariates (cigarette prices, per capita income, the share of the population aged 15-24, and per capita beer consumption) and three lagged outcomes (1975, 1980, and 1988) in $Z$ (see Section 7). Perhaps surprisingly, we obtain nearly identical estimates using all the lagged outcomes and no auxiliary covariates.] (2) ridge-augmented SCM, and (3) the partitioned strategy described in Section 7, using an OLS regression on four auxiliary covariates and the pre-treatment average as the outcome model and balancing the residuals with SCM. Figure 2(a) shows the balance across auxiliary covariates and, in the last row, across pre-treatment average outcomes. Recall that by Lemma 3, the third estimator achieves exact balance on the auxiliary covariates. Figure 2(b) shows the donor unit weights for SCM and ridge ASCM.
The different estimators lead to somewhat similar stories. Our SCM estimate shows relatively weak evidence for a large, negative effect of Prop. 99 on cigarette sales, of around 26 packs per capita in 1997, close to the estimate in Abadie et al. (2010) of 24 packs per capita. By contrast, the ridge ASCM estimate shows a weaker effect of 20 packs per capita, with wider standard errors. The covariate-adjusted ASCM has similar-sized standard errors to SCM but an estimated effect of around 13 packs per capita, roughly half the size of the SCM estimate but close to the estimated 12 packs per capita in the original Prop. 99 study (Glantz and Balbach, 2000).
We can use the tools we have developed to understand these differences. The balance in the lagged outcomes and auxiliary covariates, while good, is not exact for SCM, which suggests that the estimates are still likely biased. The augmented estimators substantially improve balance: the baseline ridge augmented estimator achieves nearly perfect balance of the lagged outcomes, but leaves some imbalance of other covariates, and the covariate-adjusted ASCM estimator achieves perfect balance on the auxiliary covariates and roughly half as much imbalance in pre-treatment outcomes as SCM. This suggests that bias will be smaller for the augmented estimators, especially the covariate-adjusted estimator, than for SCM alone.
Finally, as anticipated, the weights for ridge ASCM are more variable than the sparse, non-negative standard SCM weights (the $L^2$ norm of the ridge ASCM weights is roughly 12 percent larger than that of the SCM weights), and many of the weights are negative. Thus, ridge ASCM is extrapolating outside of the convex hull in order to achieve nearly perfect balance on the pre-treatment outcomes. As a consequence, the estimate relies more heavily on global linearity than SCM alone. In addition, these more variable weights increase sampling variability. We also note that the standard error estimates assume homoskedasticity across units, but not across time periods, and so the standard errors widen and tighten at different time points.
We now turn to simulation studies in which the true data generating process is known. To make our simulation studies as realistic as possible, we conduct “calibrated” simulation studies (Kern et al., 2016), based on estimates of linear factor models fit to the California (, ) and Basque (, ) examples. We use the Generalized Synthetic Control Method (Xu, 2017) to estimate factor models with four latent factors for each application. We then simulate outcomes using the distribution of estimated parameters. We model selection int