
Synthetically Controlled Bandits

02/14/2022
by   Vivek Farias, et al.

This paper presents a new dynamic approach to experiment design in settings where, due to interference or other concerns, experimental units are coarse. ‘Region-split’ experiments on online platforms are one example of such a setting. The cost, or regret, of experimentation is a natural concern here. Our new design, dubbed Synthetically Controlled Thompson Sampling (SCTS), minimizes the regret associated with experimentation at no practically meaningful loss to inferential ability. We provide theoretical guarantees characterizing the near-optimal regret of our approach, and the error rates achieved by the corresponding treatment effect estimator. Experiments on synthetic and real-world data highlight the merits of our approach relative to both fixed and ‘switchback’ designs common to such experimental settings.


1. Introduction

Experimentation is a crucial tool deployed in the data-driven improvement of modern commerce platforms. On such platforms, it is often the case that a new product feature or algorithmic tweak is broadly rolled out only after its prospective benefit is understood via an appropriately designed experiment. In some cases, an appropriate unit of experimentation is simply an end user. In such cases, experiment design and inference, while not entirely trivial, are relatively well understood. On the other hand, it is often the case that the intervention whose effect the experiment seeks to characterize induces interactions among individual users of the platform. Often referred to as ‘interference’, such interactions effectively violate the Stable Unit Treatment Value Assumption (SUTVA) that most designs assume and that is necessary for correct inference. There is an emergent and exciting literature focused on experiment design and inference in the presence of interference.

It remains unclear how to robustly characterize the bias induced by interference. As such, a common strategy used in practice to obviate interference concerns is simply to pick a unit of experimentation that is sufficiently coarse. For instance, a ride-hailing platform experimenting with a new payment feature would simply choose the unit of experimentation to be a region or city. Since the pool of such coarse experimental units is by definition smaller, picking the appropriate controls is no longer a simple matter. In addition, the counterfactual value of the quantity being measured for any unit is likely to have temporal effects, and may even be non-stationary, so that even the natural controls for so-called ‘switchback’ designs are insufficient. These very challenges arise in program evaluation, a common task in empirical economics. There, the synthetic control methodology is seen as ‘arguably the most important innovation … in the last 15 years’ (Athey and Imbens, 2017). This approach seeks to construct a ‘synthetic’ control via a linear combination of non-treatment units that best approximates the treatment unit prior to the treatment period. While originally intended primarily for inference given observational data, the synthetic control approach has become a go-to approach for experiment design and inference on platforms in settings where the unit of experimentation is coarse. The only drawback to this overall scheme is the cost of experimentation: any ‘regret’ from an undesirable intervention is now borne at the level of a city or region (as opposed to a substantially smaller group of users) over the period of the experiment.

This paper proposes a new approach to learning in settings such as the ‘region-split’ experiments described above enabled by a novel device: the synthetically controlled bandit. This approach (a) yields near-optimal sub-linear regret and (b) recovers the treatment effect at the same rate as a traditional experiment with synthetic controls on the event that the treatment effect is positive. On the event that the treatment effect is negative, the approach simply learns that this is the case but does not recover a precise estimate of the treatment effect. As such, the synthetically controlled bandit largely eliminates the cost of experimentation at the expense of being able to learn the treatment effect precisely only on the event where this treatment effect is positive. Since in practical applications, a precise estimate of the treatment effect is only of value when this treatment effect is positive (so as to facilitate, for instance, cost-benefit analyses for a roll-out of the intervention), this new approach makes possible an attractive tradeoff.

1.1. The Synthetically Controlled Bandit

Minimizing the Cost of Exploration:

Consider a setting where the decision whether or not to treat the treated unit (city, region, etc.) in any given epoch is a dynamic one. Over some experimentation horizon, a natural goal aligned with minimizing the cost of experimentation would be to minimize regret. That is, over the experimentation horizon, we effectively minimize the expected number of times a sub-optimal treatment option was chosen for the treated unit. It turns out that in the synthetic control setting, this problem is equivalent to a linear contextual bandit, wherein the context at each period is a low-dimensional latent vector (the so-called ‘unobserved common factors’ in the corresponding synthetic control model). This bandit problem has several salient features worth noting:

  1. Since observations across all units are made contemporaneously, the context vector is unavailable at the time of decision making.

  2. We never observe historical context vectors directly; instead we only ever observe an unknown, noisy, linear transformation of these vectors.

  3. This unknown linear transformation can never be recovered exactly even with infinite data: instead, we can only hope to recover it up to a rotation.

Put succinctly, the underlying contextual bandit is one where the context is not available at the time of decision making, and can, post facto, only be recovered up to some unknown rotation and noise. Our primary technical contribution is an algorithm that, despite the challenges above, achieves a regret that scales like $\tilde{O}(r\sqrt{T})$. Here $r$ is the dimension of the latent context and $T$ is the experimentation horizon. We dub our approach Synthetically Controlled Thompson Sampling (SCTS).

SCTS consists of a Thompson sampling routine with carefully designed ‘exploration noise’. Contexts are recovered via principal components analysis (PCA) on historical observations. Importantly, our sampler is robust to the errors in context recovery due to noise and to the inability to recover rotations. The careful design of our sampler also allows for a linear dependence on the latent context dimension $r$, as opposed to the $r^{3/2}$ dependence achieved by state-of-the-art Thompson sampling approaches; this is a result of independent interest.

Inference:

The vanilla synthetic control estimator is not asymptotically normal, and as such, inference is either via high-probability confidence intervals, or else via general-purpose non-parametric approaches (such as the bootstrap or permutation tests). We propose a treatment effect estimator and establish high-probability confidence intervals for this estimator that (a) on the event that the true treatment effect is positive, line up precisely with the vanilla synthetic control confidence intervals and (b) on the event that the true treatment effect is negative, will simply be the negative half-line. In other words, we provide the same quality of inference as in a synthetic control experiment on the event that the treatment effect is positive, but when the effect is negative, we are only able to detect that this is the case. Since quantification of the treatment effect is typically only relevant when the effect is positive (so as to facilitate, for instance, cost-benefit analyses for a roll-out of the intervention), it is unclear that the loss in inferential capabilities relative to synthetic control has practical consequences. With an eye to practice, we propose the use of re-randomization based hypothesis tests and confidence intervals derived from inverting these tests. We see, on real-world data, that these tests are highly powered even for small treatment effects and that the confidence intervals derived from them provide near-ideal coverage.



Computational Experience: We present experimental work on both a synthetic data setup (wherein the synthetic control model holds by construction), as well as on real world data, where this setup is, at best, an approximation. In both cases, we see that SCTS employs a sub-optimal intervention for a negligible fraction (typically a single digit percentage) of epochs over the experimentation horizon. Compared with both a fixed and switchback design, SCTS thus materially reduces the cost of exploration. Despite this, we see that our treatment effect estimator correctly identifies whether or not the treatment effect is positive in every single one of our instances. Importantly, on the instances where the treatment effect is positive, the relative RMSE of our estimator is comparable to state-of-the-art estimators for both the switchback and fixed designs, and in fact outperforms these incumbents in the real-data setting. Finally, as discussed above, re-randomization based hypothesis tests and confidence intervals provide near-ideal coverage, even on real-world data, allowing for effective inference.

1.2. Related Literature

Synthetic Control and Inference: The notion of synthetic control was introduced initially in the context of program evaluation: (Abadie and Gardeazabal, 2003; Abadie et al., 2010) are seminal papers that propose to recover the counterfactual in an observational setting by creating a “synthetic control”. Specifically, they proposed constructing a convex combination of control units that matches the treated unit in pre-treatment periods. A series of follow-on studies proposed distinct estimators by employing different constraints and regularizers (e.g., (Hsiao et al., 2012; Doudchenko and Imbens, 2016; Li and Bell, 2017; Arkhangelsky et al., 2019; Ben-Michael et al., 2021)). See (Abadie, 2019) for a review of this vibrant literature. Since the underlying generative model justifying the synthetic control framework is in fact a factor model, it is natural to consider using PCA-like techniques in the recovery of a synthetic control; the present work leverages such techniques in a dynamic context. This approach is especially relevant in the setting where the size of the ‘donor pool’ is large. (Athey et al., 2021; Xu, 2017; Amjad et al., 2018; Bai and Ng, 2019; Amjad et al., 2019; Agarwal et al., 2021; Farias et al., 2021) are all papers in this vein. (Farias et al., 2021) in particular compute a min-max optimal estimator for a generalization of the synthetic control problem. It is indeed possible to compute limiting distributions for synthetic control estimators in various special cases. For instance, if one were willing to make probabilistic assumptions on the data, it is possible to compute a limiting distribution for the synthetic control estimator (roughly, this distribution is a projection of the OLS limiting distribution onto a convex set); see (Li, 2020). Several of the references above also compute limiting distributions in special cases of the problem. In practice, however, inference for average treatment effects in synthetic control is done via non-parametric methods such as permutation tests; see (Chernozhukov et al., 2021).

Synthetic Control in Experiment Design for Commerce: Not surprisingly, synthetic control approaches have gained traction in modern commerce settings; as a relatively early example, (Brodersen et al., 2015) describes an approach and corresponding software used by Google in the context of marketing attribution. Going further, however, synthetic control has come to be viewed as an important tool in experiment design as well, as opposed to simply in observational settings. For instance, (Chen et al., 2020b; Jones and Barrows, 2019) describe practical designs, at Lyft and Uber respectively, that assume a synthetic control model holds across units. In a theoretical direction, (Doudchenko et al., 2019) and (Abadie and Zhao, 2021) consider the problem of how best to select an experimental unit assuming the synthetic control model holds, motivated by problems at Facebook and by the sorts of ‘region-split’ experiments common to ride-sharing platforms, respectively. Like these works, the present paper actively uses the synthetic control model in experiment design; in our case, however, these designs are ‘dynamic’.

Contextual Bandits and Inference: The dynamic design that this paper constructs is, in a certain idealized sense, a linear contextual bandit. This sort of bandit is classical, studied at least as early as (Auer, 2002). The practical constraints around our design necessitate a sampling methodology that draws from recent work on Thompson sampling for such bandits; see (Agrawal and Goyal, 2013; Abeille and Lazaric, 2017; Kveton et al., 2021). As noted elsewhere, the fundamental challenge we must address is that we never observe contexts directly. There is some limited work discussing the use of dynamic bandit-based designs in clinical trials (Villar et al., 2015; Berry, 2012).

Turning to inference, it is well known that naive sample estimates of arm means in bandits are biased (see e.g. (Villar et al., 2015; Nie et al., 2018)). A recent line of work considers ‘post-contextual bandit’ inference, where certain importance-weighted estimators are shown to be unbiased and asymptotically normal; see (Hadad et al., 2021; Bibaut et al., 2021) and also (Deshpande et al., 2018) for a different approach to the problem under a linear reward model. Extending these to our setting is an exciting direction for future work. In addition to the unobservability of the contexts, such an extension will also need to address the general problem that this line of work requires a type of forced exploration of arms which may not be consistent with our bandit algorithm. The present paper simply constructs high-probability confidence intervals via the usual self-normalized martingale concentration bounds. While loose in practice, they already illustrate that we can expect rates that are essentially on par with what is possible for the vanilla synthetic control estimator. In our experimental work, we complement these with bootstrapped confidence intervals and permutation tests for significance, which we show work adequately.

2. Model

We measure some quantity of interest for an experimental unit (e.g., a region, in a region-split experiment) over a pre-treatment period of length $T_0$, and a subsequent treatment period of length $T$. We denote the measurement made on this experimental unit at any epoch $t$ by $y_t$. We assume that the pre-treatment period consists of epochs in $\{-T_0+1, \ldots, 0\}$ and that the treatment period consists of epochs in $\{1, \ldots, T\}$. We denote by $x_t \in \{0, 1\}$ the indicator of whether or not the experimental unit is treated at time $t$, so that $x_t = 0$ for all epochs in the pre-treatment period. We assume that $y_t$ is determined by the structural equation

$$y_t = \theta x_t + u^\top F_t + \epsilon_t, \qquad (1)$$

where $\theta \in \mathbb{R}$ is an unknown treatment effect, $F_t \in \mathbb{R}^r$ is an (unknown) set of ‘shared common factors’ and $u \in \mathbb{R}^r$ is a set of (unknown) ‘factor loadings’ specific to the experimental unit. The noise $\epsilon_t$ is assumed to be independent Gaussian with mean zero and standard deviation $\sigma$. The synthetic control paradigm assumes a generative setting where a weighted combination of observations in the pool of donor units closely approximates counterfactual observations on the experimental unit. Specifically, we assume a pool of $N$ donor units, where for the $i$th such unit

$$y_{i,t} = u_i^\top F_t + \epsilon_{i,t}. \qquad (2)$$

Here $F_t$ is the same set of shared common factors and $u_i \in \mathbb{R}^r$ is a set of factor loadings specific to the $i$th donor unit. As before, $\epsilon_{i,t} \sim \mathcal{N}(0, \sigma^2)$.
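To make the generative model concrete, here is a minimal NumPy sketch that simulates observations from the structural equations for the treated unit and the donor pool. Several choices are our own assumptions rather than the paper's: factors and loadings are i.i.d. standard normal, the treated unit's loading is a convex combination of donor loadings (so that a synthetic control exists), epochs are stored in array order with the pre-treatment period first, and the treated unit is treated in every treatment epoch (a fixed design). The function name `simulate_factor_model` is hypothetical.

```python
import numpy as np

def simulate_factor_model(N=50, r=3, T0=100, T=200, theta=0.5, sigma=1.0, seed=0):
    """Simulate treated-unit and donor observations under the factor model.

    Columns/entries are epochs in array order: the first T0 are pre-treatment,
    the last T are the treatment period (treated in every epoch here).
    """
    rng = np.random.default_rng(seed)
    F = rng.normal(size=(T0 + T, r))           # shared common factors F_t (one row per epoch)
    U = rng.normal(size=(N, r))                # donor factor loadings u_i (one row per donor)
    u = U.T @ rng.dirichlet(np.ones(N))        # treated loading: convex combination of donor loadings
    x = np.r_[np.zeros(T0), np.ones(T)]        # treatment indicator x_t (fixed design)
    Y_donor = U @ F.T + sigma * rng.normal(size=(N, T0 + T))   # donor equation (2)
    y = theta * x + F @ u + sigma * rng.normal(size=T0 + T)    # treated-unit equation (1)
    return y, x, Y_donor, F, U, u
```

Setting `sigma=0.0` recovers the noiseless model, which is convenient for checking that the donor matrix has rank $r$.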

2.1. Dynamic Design and Estimator

The typical synthetic control design is simply to set $x_t = 1$ for all epochs $t$ in the treatment period. Now define the cost of experimentation incurred at time $t$ by the ‘regret’ incurred in that epoch,

$$\rho_t = \theta^+ - \theta x_t, \qquad \text{where } \theta^+ = \max(\theta, 0).$$

This definition captures both the negative impact of a sub-optimal treatment and the opportunity cost of not using the treatment should it be optimal. The total cost incurred by the typical synthetic control design, $T(\theta^+ - \theta)$, may then scale linearly with the length of the treatment period, $T$.
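As a small illustration, assuming the per-epoch regret takes the form $\max(\theta, 0) - \theta x_t$ (zero when the better action is chosen, $|\theta|$ otherwise), the sketch below compares total costs for the fixed design that treats in every epoch, with a harmful and a beneficial treatment, and for a design that never treats. The helper name `per_epoch_regret` is hypothetical.

```python
import numpy as np

def per_epoch_regret(theta, x):
    """Per-epoch regret max(theta, 0) - theta * x_t for decisions x_t in {0, 1}."""
    x = np.asarray(x, dtype=float)
    return max(theta, 0.0) - theta * x

T = 1000
fixed_design = np.ones(T)  # treat in every epoch of the treatment period

cost_harmful = per_epoch_regret(-0.2, fixed_design).sum()     # grows like 0.2 * T
cost_beneficial = per_epoch_regret(0.3, fixed_design).sum()   # zero: treating is optimal
cost_never_treat = per_epoch_regret(0.3, np.zeros(T)).sum()   # opportunity cost, 0.3 * T
```

The fixed design pays nothing when $\theta \ge 0$ but accrues $|\theta|$ in every epoch when $\theta < 0$, which is the linear-in-$T$ cost a dynamic design aims to avoid.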

In the interest of minimizing the total cost of experimentation, we allow $x_t$ to be dynamic. Specifically, we require $x_t$ to be selected according to a randomized policy that is adapted to $\mathcal{F}_{t-1}$, the filtration generated by all treatment decisions taken, and observations made in the treatment and donor units, up to time $t-1$. The total expected cost of experimentation, or total expected regret incurred over the course of the experiment, is then simply $R(T) = \mathbb{E}\left[\sum_{t=1}^{T} \rho_t\right]$, where $\rho_t$ is the per-epoch regret defined above and the expectation is over the noise in observations and randomization in the design. Finally, an estimator of the treatment effect, $\hat{\theta}$, is simply an $\mathcal{F}_T$-measurable random variable that ideally provides a good approximation to $\theta$. With this setup, we are now able to state the problems we wish to address:

  • First, we would like to produce a dynamic design, i.e., a process $\{x_t\}$, that minimizes the total cost of experimentation, $R(T)$.

  • Second, on the inferential side, we would like to design an estimator $\hat{\theta}$ for which $|\hat{\theta} - \theta|$ is ‘small’ with high probability, particularly when $\theta \ge 0$, since quantification of the treatment effect is most important in this case.

In what follows, we describe our main results, making precise the trade-off we achieve between controlling $R(T)$ and the quality of our estimator.

2.2. Results

We propose a dynamic design, which we dub Synthetically Controlled Thompson Sampling (SCTS), and show that it achieves near-optimal experimentation cost:

Theorem 2.1 (Informal).

Assume that the number of donor units, $N$, is sufficiently large. Then, under mild assumptions on the shared common factors, SCTS incurs a cost of experimentation (total expected regret) of $\tilde{O}(r\sqrt{T})$.

In a nutshell, SCTS essentially eliminates the cost of experimentation, which, as we noted earlier, can scale linearly with $T$ in the traditional fixed synthetic control design. It is also worth placing the precise regret guarantee in context. To that end, note that we have no information pertaining to $F_t$ (which is essentially arbitrary in the synthetic control model) at the time we decide on $x_t$. Imagine for a moment, however, that at time $t$ we observed $F_s$ for all $s \le t$. Treating this as a two-armed linear contextual bandit, Thompson sampling applied to this setup is then known to achieve regret $\tilde{O}(r^{3/2}\sqrt{T})$ (Abeille and Lazaric, 2017). [Footnote 1: The notation $\tilde{O}(\cdot)$ suppresses dependence on logarithmic factors and on certain problem-dependent constants.] Of course, even the history of the shared common factors is not available; rather, these must be inferred from our observations over the donor units. As a further complication, even with noiseless observations of $y_{i,t}$ on the donor units, we would only succeed in recovering the common factors up to a rotation. In light of these salient problem features, it is notable that our regret guarantee depends linearly on $r$, which is typically much smaller than the ambient number of donor units, $N$. This guarantee is our main theoretical result.

We turn next to inference. There, we know that the nominal (Abadie et al., 2010) synthetic control estimator, $\hat{\theta}^{\mathrm{SC}}$, satisfies a high-probability bound on its error $|\hat{\theta}^{\mathrm{SC}} - \theta|$, holding with probability $1-\delta$.

The regularization implicit in this estimator achieves a rate that is largely independent of the number of donor units, $N$. [Footnote 2: We write $a \lesssim b$ if $a \le Cb$ for some absolute constant $C$. In the bound, one constant depends on $\delta$, whereas another depends on the size of the shared common factors.] Asymptotic distributions for this estimator without further distributional assumptions on the common factor process are unknown.

Our own estimator, $\hat{\theta}$, works as follows. We compute $\hat{\theta}^{\mathrm{SC}}$ (i.e., the vanilla synthetic control estimator) using only observations in the pre-treatment period and those experimental epochs over which $x_t = 1$. We set $\hat{\theta} = \hat{\theta}^{\mathrm{SC}}$ on the event that the intervention was used over at least a threshold number of epochs; otherwise, we conclude that the treatment effect is negative. We are then able to show that when $\theta \ge 0$, with probability $1-\delta$, $\hat{\theta}$ enjoys the same error bound as $\hat{\theta}^{\mathrm{SC}}$.

On the other hand, when $\theta < 0$, with probability $1-\delta$, $\hat{\theta}$ correctly indicates that the treatment effect is negative. Contrasting this with the high-probability confidence intervals for $\hat{\theta}^{\mathrm{SC}}$, we see that on the event that $\theta \ge 0$, we get the same intervals as the synthetic control estimator. On the event that $\theta < 0$, all we learn is that the treatment effect is negative, which is in essence the price we pay for controlling the cost of experimentation. The result follows from a simple idea expanded on in Section 5. In our computational experiments, we see that re-randomization tests for p-values and the corresponding inverted hypothesis tests (Fisher, 1966) for confidence intervals provide adequate power and coverage.

In their totality, these results show that we can largely eliminate the cost of experimentation at a modest cost to inference: when the treatment effect is negative we only learn that this is the case with high probability, as opposed to getting a precise estimate of the effect. Since in practical settings a precise estimate of the treatment effect is typically only needed when the treatment effect is positive (so as to ascertain whether the cost of implementing the intervention is justified), this is perhaps a modest price to pay.

3. Synthetically Controlled Thompson Sampling

We introduce SCTS, which adapts Thompson sampling (TS) to the problem of dynamically selecting interventions so as to minimize expected regret in the setting of the previous section. The algorithm is conceptually simple: at the start of each epoch $t$, we compute a distribution $\pi_t$ over ‘plausible’ values of $\theta$. This distribution may be thought of informally as an approximation to a posterior over $\theta$ under a non-informative prior, given the information available up to and including time $t-1$. We then sample $\tilde{\theta}_t$ from this distribution and pick $x_t = 1$ if and only if the sampled value $\tilde{\theta}_t$ is non-negative. To construct $\pi_t$, we proceed in three steps:

  1. First, we estimate the (unobserved) shared common factors $F_s$ for $s < t$. We will accomplish this via PCA.

  2. Plugging the estimates obtained for $F_s$ above into the structural equation (1), we compute an estimate of $\theta$ and $u$ via ridge regression.

  3. We use the estimate of $\theta$ and the precision matrix obtained from the regression in the previous step to construct our approximation to the posterior on $\theta$, $\pi_t$.

Next, we make precise each of these steps, assuming, simply for notational convenience, that

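Estimating Shared Common Factors: Recall from (2) that for each donor unit $i$ and epoch $s$, we observe $y_{i,s} = u_i^\top F_s + \epsilon_{i,s}$. Define by $Y_t$ the matrix with $(i,s)$ entry $y_{i,s}$, where $i$ ranges over donor units and $s$ over epochs prior to $t$, and similarly define by $E_t$ the noise matrix with $(i,s)$ entry $\epsilon_{i,s}$. Now, let $U$ be the factor loadings matrix with $i$th row $u_i^\top$, and denote by $\mathbf{F}_t$ the common factors matrix with $s$th row $F_s^\top$. By (2), we then observe at time $t$:

$$Y_t = U \mathbf{F}_t^\top + E_t. \qquad (3)$$

We estimate $\mathbf{F}_t$ at time $t$ by solving

$$\min_{\hat{U} \in \mathbb{R}^{N \times r},\ \hat{\mathbf{F}}} \ \left\| Y_t - \hat{U} \hat{\mathbf{F}}^\top \right\|_F^2. \qquad (4)$$

We fix a specific solution to the above optimization problem via PCA. Specifically, let $Y_t = P \Sigma Q^\top$ be any singular value decomposition (SVD) of $Y_t$. Denote by $P_r$ and $Q_r$ the matrices obtained from the first $r$ columns of $P$ and $Q$, respectively. Finally, let $\Sigma_r$ be the sub-matrix of $\Sigma$ formed by its first $r$ rows and columns. By the Eckart–Young theorem, an optimal solution to (4), $(\hat{U}_t, \hat{\mathbf{F}}_t)$, can be obtained by setting $\hat{U}_t = P_r \Sigma_r$ and $\hat{\mathbf{F}}_t = Q_r$.

We recognize $\hat{\mathbf{F}}_t$ as precisely the usual ‘PCA loadings’; its $s$th row, $\hat{F}_{t,s}$, will serve as our approximation to $F_s$.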
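The PCA step can be sketched in a few lines of NumPy, assuming $Y_t$ is stored with one row per donor unit and one column per past epoch. The sketch returns $\hat U_t = P_r\Sigma_r$ and $\hat{\mathbf F}_t = Q_r$ from a thin SVD (`numpy.linalg.svd` returns singular values in descending order, so the first $r$ components are the leading ones). The function name `estimate_factors` is our own.

```python
import numpy as np

def estimate_factors(Y, r):
    """Rank-r fit Y ~ U_hat @ F_hat.T via a truncated SVD.

    Y: (N, n) donor observations (rows: donor units, columns: past epochs).
    Returns U_hat = P_r Sigma_r with shape (N, r) and F_hat = Q_r with shape (n, r);
    row s of F_hat estimates the common factors at epoch s, up to a rotation.
    """
    P, s, Qt = np.linalg.svd(Y, full_matrices=False)
    U_hat = P[:, :r] * s[:r]     # P_r Sigma_r: scale each of the first r columns
    F_hat = Qt[:r, :].T          # Q_r: first r right singular vectors
    return U_hat, F_hat
```

Any split of the singular values between the two factors gives the same rank-$r$ product, so this is one optimal solution to (4) among many; that non-uniqueness is the source of the ambiguity discussed in Section 3.1.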

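Ridge Regression: Recall that in our synthetic control model, we have $y_s = \theta x_s + u^\top F_s + \epsilon_s$ for the treatment unit at each epoch $s$. At time $t$, we employ this structural equation to estimate $(\theta, u)$ via least squares, using $\hat{F}_{t,s}$, the $s$th row of $\hat{\mathbf{F}}_t$, as a plug-in estimator for $F_s$. [Footnote 3: While the subscript $t$ in $\hat{F}_{t,s}$ makes precise that this is our estimate of $F_s$ at time $t$, we will sometimes drop the subscript when clear from context.] Our estimate of $(\theta, u)$ at time $t$, $(\hat{\theta}_t, \hat{u}_t)$, is obtained as the solution to the regularized least squares problem:

$$(\hat{\theta}_t, \hat{u}_t) = \arg\min_{\theta', u'} \ \sum_{s < t} \left( y_s - \theta' x_s - u'^\top \hat{F}_{t,s} \right)^2 + \lambda \left( \theta'^2 + \|u'\|^2 \right). \qquad (5)$$

Here, $\lambda > 0$ is a regularization penalty. [Footnote 4: We fix a value of $\lambda$ throughout the paper.]

We find it convenient to define the ‘precision matrix’ of the estimator $(\hat{\theta}_t, \hat{u}_t)$. Specifically, if we denote $\hat{z}_{t,s} = (x_s, \hat{F}_{t,s}) \in \mathbb{R}^{r+1}$, then $V_t = \lambda I + \sum_{s<t} \hat{z}_{t,s} \hat{z}_{t,s}^\top$. The ‘variance’ [Footnote 5: While for expositional purposes we use the terminology ‘precision matrix’ and ‘variance’, these quantities are of course not a precision matrix or variance, since the design of the regression problem is not fixed.] of our estimator $\hat{\theta}_t$ is simply $(V_t^{-1})_{11}$.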
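A minimal sketch of the ridge step and of the ‘variance’ term follows, assuming the design rows are $(x_s, \hat F_{t,s})$ and that the penalty $\lambda$ applies to the full coefficient vector $(\theta, u)$, so that the precision matrix is $\lambda I$ plus the sum of outer products of the design rows. The default `lam=1.0` is illustrative, and the function name `ridge_theta` is hypothetical.

```python
import numpy as np

def ridge_theta(y, x, F_hat, lam=1.0):
    """Ridge regression of y_s on z_s = (x_s, F_hat_s), penalizing the full coefficient vector.

    Returns (theta_hat, var_hat), where var_hat = (V^{-1})_{11} and
    V = lam * I + sum_s z_s z_s^T is the 'precision matrix'.
    """
    Z = np.column_stack([x, F_hat])            # rows are z_s, shape (n, r + 1)
    V = lam * np.eye(Z.shape[1]) + Z.T @ Z     # precision matrix
    coef = np.linalg.solve(V, Z.T @ y)         # (theta_hat, u_hat_1, ..., u_hat_r)
    var_hat = np.linalg.inv(V)[0, 0]           # 'variance' of theta_hat
    return coef[0], var_hat
```

Solving with `np.linalg.solve` avoids forming the inverse for the point estimate; the inverse is only needed for its top-left entry.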

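Approximate Posterior: For our approximation to the posterior on $\theta$ at time $t$, we take $\pi_t$ to be the uniform distribution

$$\pi_t = \mathrm{Unif}\left[\hat{\theta}_t - \beta_t \sqrt{(V_t^{-1})_{11}},\ \hat{\theta}_t + \beta_t \sqrt{(V_t^{-1})_{11}}\right].$$

Here $\beta_t$ is a time-dependent ‘expansion’ factor we make precise later; for now, we may simply consider $\beta_t$ to be an increasing sequence.

As discussed earlier, SCTS draws a sample $\tilde{\theta}_t$ from $\pi_t$ at time $t$. Then, SCTS sets $x_t = 1$ if and only if $\tilde{\theta}_t \ge 0$.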
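Putting the three steps together, the following is a minimal end-to-end sketch of the SCTS loop in NumPy, under several assumptions of our own: donor observations are pre-generated (they do not depend on the design), the treated unit's untreated outcomes `y0` are supplied so the simulator can reveal $y_{0,t} + \theta x_t$ after each decision, the factor estimates are $Q_r$ as in the PCA step, and the expansion factor $\beta_k = \sqrt{2\log(k+1)}$ and the penalty `lam=1e-2` are illustrative choices, not the paper's. The loop refits the factors and the regression from scratch every epoch, mirroring the inconsistent design discussed in Section 3.1.

```python
import numpy as np

def scts(y0, theta, Y_donor, T0, r, lam=1e-2, seed=0):
    """Simulated run of Synthetically Controlled Thompson Sampling (sketch).

    y0:      (T0 + T,) treated-unit outcomes absent treatment (known only to the simulator)
    theta:   true treatment effect, added to the outcome whenever x_t = 1
    Y_donor: (N, T0 + T) donor observations; columns are epochs, pre-treatment first
    T0:      number of pre-treatment epochs (needs T0 >= r)
    Returns the decisions x_t for the T treatment epochs.
    """
    rng = np.random.default_rng(seed)
    n_total = y0.shape[0]
    x = np.zeros(n_total)        # x_t = 0 throughout the pre-treatment period
    y = y0.copy()                # observed treated-unit outcomes
    for t in range(T0, n_total):
        k = t - T0 + 1                                   # treatment epoch index 1..T
        # Step 1: PCA on donor data from epochs strictly before t; rows of F_hat estimate F_s.
        _, _, Qt = np.linalg.svd(Y_donor[:, :t], full_matrices=False)
        F_hat = Qt[:r, :].T
        # Step 2: ridge regression of y_s on (x_s, F_hat_s) over s < t.
        Z = np.column_stack([x[:t], F_hat])
        V_inv = np.linalg.inv(lam * np.eye(r + 1) + Z.T @ Z)
        theta_hat = (V_inv @ Z.T @ y[:t])[0]
        # Step 3: sample from the uniform approximate posterior; treat iff the sample is >= 0.
        beta = np.sqrt(2.0 * np.log(k + 1.0))            # illustrative expansion factor (assumption)
        half_width = beta * np.sqrt(V_inv[0, 0])
        theta_tilde = rng.uniform(theta_hat - half_width, theta_hat + half_width)
        x[t] = 1.0 if theta_tilde >= 0 else 0.0
        y[t] = y0[t] + theta * x[t]                      # outcome revealed after the decision
    return x[T0:]
```

On simulated data one would expect this loop to treat in most epochs when the effect is clearly positive and only rarely when it is clearly negative, which is the behaviour the regret analysis formalizes.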

3.1. Discussion: Inconsistent Designs, the Failure of Optimism

Inconsistent Designs: Notice that the design employed in the regression (5) is inconsistent from period to period, in the sense that the estimate $\hat{F}_{t,s}$ of any fixed context $F_s$ in the design matrix changes from period to period. Part of this is simply due to noise: as time goes on, we hope to compute a more accurate estimate of $F_s$ for any fixed $s$. However, even in the absence of noise (i.e., if $E_t$ were identically zero), we would still not expect consistency in the design, since even in that case we would only ever be able to recover the contexts up to a rotation. A priori, it is unclear whether this inconsistency will allow for effective recovery of the treatment effect, and as such it is unclear whether we can expect the algorithm we have described to achieve low regret.
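The rotation ambiguity need not affect the quantities that drive decisions. Assuming the ridge penalty in (5) is isotropic (it penalizes $(\theta, u)$ through $\lambda I$), replacing the factor estimates $\hat{\mathbf F}_t$ by $\hat{\mathbf F}_t Q$ for an orthogonal $Q$ transforms every design row by the orthogonal matrix $\mathrm{diag}(1, Q^\top)$, which leaves both $\hat\theta_t$ and $(V_t^{-1})_{11}$ unchanged. The numerical check below illustrates this; it is a sanity check under that assumption, not the paper's proof, and the helper name `ridge_theta` is hypothetical.

```python
import numpy as np

def ridge_theta(y, x, F_hat, lam=1.0):
    """Ridge estimate of theta and its 'variance' (V^{-1})_{11} from design rows (x_s, F_hat_s)."""
    Z = np.column_stack([x, F_hat])
    V_inv = np.linalg.inv(lam * np.eye(Z.shape[1]) + Z.T @ Z)
    return (V_inv @ Z.T @ y)[0], V_inv[0, 0]

rng = np.random.default_rng(3)
n, r = 200, 4
F_hat = rng.normal(size=(n, r))                 # stand-in for estimated factors
x = (rng.random(n) < 0.5).astype(float)         # arbitrary treatment decisions
y = rng.normal(size=n)                          # arbitrary outcomes
Q, _ = np.linalg.qr(rng.normal(size=(r, r)))    # random orthogonal r x r matrix

original = ridge_theta(y, x, F_hat)
rotated = ridge_theta(y, x, F_hat @ Q)          # every estimated context rotated by Q
print(np.allclose(original, rotated))           # True
```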

Optimistic Algorithms: A natural upper confidence bound (UCB) style alternative to the algorithm we have described might proceed by defining the upper confidence bound $\mathrm{UCB}_t = \hat{\theta}_t + \beta_t \sqrt{(V_t^{-1})_{11}}$, and then setting $x_t = 1$ if and only if $\mathrm{UCB}_t \ge 0$. Perhaps surprisingly, this algorithm would incur linear regret in general; see Appendix A, which provides a simple and decidedly non-pathological example of this phenomenon. It is thus interesting that the ‘sampling’ aspect of the algorithm above plays a crucial role in achieving sub-linear regret.

4. Regret Analysis

This section provides a regret analysis for SCTS. We begin by restating Theorem 2.1 formally. In order to do so, we must first state our assumptions, which concern the expected value of the observation matrix on the donor units, i.e., $M = U\mathbf{F}^\top$, a rank-$r$ matrix, where $\mathbf{F}$ stacks the common factors over all epochs. Specifically, we make assumptions on the decomposition $M = U\mathbf{F}^\top$. To do so, we first note that in our model it is possible to assume, without loss, a canonical version of this decomposition (the selection of $U$ and $\mathbf{F}$ is not unique due to the free choice of a rotation). In particular, letting $M = P\Sigma Q^\top$ be an SVD of $M$, we may assume without loss that $U$ and $\mathbf{F}$ take a canonical form determined by this SVD; see Appendix B for details. Given this canonical decomposition, we assume:

Assumption 4.1 ().

For all , is upper bounded by a constant, , and .

We can now state our main regret bound for SCTS.

Theorem 4.2 (SCTS Regret).

Let the number of donor units, $N$, be sufficiently large. Then, under Assumption 4.1, SCTS achieves expected regret $\tilde{O}(r\sqrt{T})$.

The $\tilde{O}$ notation in the above regret bound ignores terms that depend polynomially on certain problem-dependent constants. As stated earlier, the linear dependence on $r$ above is of note. We also observe that, beyond its rank, our guarantee remarkably has no further dependence on the spectrum of $M$.

4.1. Proof Architecture for Theorem 4.2

The proof of Theorem 4.2 follows a familiar architecture that decomposes regret over time. We lay out this architecture here and make precise two key results (Propositions 4.4 and 4.5) that enable the proof. Establishing these propositions is the core challenge in establishing a useful regret guarantee. In what follows, we find it convenient to define the ‘true context’ vector $z_t = (x_t, F_t)$ and the associated precision matrix $\bar{V}_t = \lambda I + \sum_{s<t} z_s z_s^\top$. The Elliptical Potential Lemma (Abbasi-Yadkori et al., 2011) then states

Lemma 4.3 (Elliptical Potential Lemma).

Under Assumption 4.1, it holds that
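For intuition, the standard form of this lemma (Abbasi-Yadkori et al., 2011) states that for contexts $z_t \in \mathbb{R}^d$ with $\|z_t\| \le L$ and $\bar V_t = \lambda I + \sum_{s<t} z_s z_s^\top$, one has $\sum_{t=1}^{T} \min\left(1, \|z_t\|^2_{\bar V_t^{-1}}\right) \le 2d\log\left(1 + TL^2/(d\lambda)\right)$; in our setting $d = r + 1$. The version used in this paper may differ in constants. The sketch below checks the standard inequality numerically on random contexts with $L = 1$; all names are our own.

```python
import numpy as np

def elliptical_potential(Z, lam=1.0):
    """Sum over t of min(1, ||z_t||^2 in the V^{-1} norm), V = lam*I + sum_{s<t} z_s z_s^T."""
    d = Z.shape[1]
    V = lam * np.eye(d)
    total = 0.0
    for z in Z:
        total += min(1.0, z @ np.linalg.solve(V, z))   # potential of z_t before updating V
        V += np.outer(z, z)                            # rank-one update with z_t
    return total

rng = np.random.default_rng(0)
T, d, lam = 2000, 4, 1.0
Z = rng.normal(size=(T, d))
Z /= np.maximum(1.0, np.linalg.norm(Z, axis=1, keepdims=True))   # enforce ||z_t|| <= L = 1
bound = 2 * d * np.log(1 + T / (d * lam))                         # standard bound with L = 1
print(elliptical_potential(Z, lam) <= bound)                      # True
```

The sum grows only logarithmically in $T$, which is what turns per-step regret bounds into a $\sqrt{T}$-type total regret via Cauchy–Schwarz.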

Now, we must control the error in our estimates of the context vectors, , and the consequent error in our estimation of . Specifically, define the event that the error in recovering is small, , according to

Here is the set of -dimensional rotations, and for some universal constant . Observe that we only control this error up to a rotation. We define the event that the error in estimating is small, , according to

We will control single-step regret on the ‘clean’ event that both these errors are controlled, ; this is a high probability event:

Proposition 4.4 (clean event).

For all , under Assumption 4.1,

This result is proved in Appendix D. The result relies on an analysis generalizing the Davis–Kahan theorem (to control the error in recovering the common factors) and the usual self-normalized martingale concentration bounds (to control the error in estimating $\theta$). A key additional ingredient is needed: in controlling the latter error, we must deal with the issue of inconsistent designs in the regression (5). As discussed in Section 3.1, one issue driving this inconsistency is the fact that the common factors can only be recovered up to a rotation. The proof of Proposition 4.4 overcomes this challenge by showing that the actions selected under distinct rotations are in fact equal in distribution, so that we can assume a canonical rotation without loss of generality. We now state our bound on single-step regret; this is the key result that enables our regret analysis and will be proved later in this section:

Proposition 4.5 (Single-step regret).

For some universal constant , we have for all ,

Finally, Theorem 4.2 follows from summing single-step regret and applying the Elliptical Potential Lemma,

The second inequality above relies on the fact that the clean event occurs with high probability (Proposition 4.4). The third inequality relies on our bound on the expected single-step regret, Proposition 4.5, and the final inequality is simply the elliptical potential lemma.

4.2. Bounding the single-step regret

Proposition 4.5 is a critical enabler of our regret guarantee. We now proceed with that proof. We will begin with stating three lemmas key to the proof. To that end, let , where if and otherwise; and recall that . Then we have:

Lemma 4.6 ().

On , .

This lemma is crucial to connecting single-step regret with an appropriate weighted norm of the context, so as to eventually facilitate the use of the elliptical potential lemma, and will be proved later in this section. It is interesting to note that (Abeille and Lazaric, 2017) prove a version of this result which, in our setting, would eventually yield regret that scales like $\tilde{O}(r^{3/2}\sqrt{T})$, as opposed to the $\tilde{O}(r\sqrt{T})$ accomplished here; further, the present proof is short. We also note that there is a norm mismatch in the lemma above, since to apply the elliptical potential lemma we ideally want to measure the true contexts in the norm induced by their own precision matrix. To relate these two norms, we note that the following is true on the event that the common factors are well approximated:

Lemma 4.7 ().

On , we have for all .

The proof of this lemma is provided in Appendix D.3. The proof crucially uses the fact that the actions selected under distinct rotations of the estimated common factors are in fact equal in distribution, so that we can assume a canonical rotation without loss. Finally, we observe that the probability that the optimal action is selected is lower bounded by a constant:

Lemma 4.8 ().

On , the optimal action is selected with at least constant probability,

So equipped, we have

where the first inequality is Lemma 4.6, the second inequality is via Lemma 4.7, the third is simply by the law of total expectation, and the final inequality is via Lemma 4.8. Taking expectations conditioned on now yields the result of the proposition. In the remainder of this Section, we prove Lemmas 4.6 and 4.8.

Proof of Lemma 4.6: First observe that if , then . On the other hand, if , then . Consequently, if , then . Further,

completing the proof.

Proof of Lemma 4.8: Suppose Then whenever . Note that

When , the bound holds by symmetry.
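Lemma 4.8 is an anti-concentration statement of the kind used in analyses of linear Thompson sampling (e.g., Agrawal and Goyal, 2013; Abeille and Lazaric, 2017). As a hedged illustration only (a generic Gaussian argument, not necessarily the paper's exact proof), suppose a hypothetical one-dimensional posterior sample is τ̃ ~ N(τ̂, σ²), the true effect is τ > 0 (so the intervention is optimal), and |τ̂ − τ| ≤ σ. Then P(τ̃ > 0) = Φ(τ̂/σ) ≥ Φ(−1) ≈ 0.159, because τ̂ ≥ τ − σ > −σ; the case τ < 0 follows by symmetry. The sketch evaluates this probability over a grid of admissible (τ, τ̂) pairs; the value of σ and the grid are arbitrary choices.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_select_treatment(tau_hat, sigma):
    """P(tau_tilde > 0) for a hypothetical Gaussian sample tau_tilde ~ N(tau_hat, sigma^2)."""
    return normal_cdf(tau_hat / sigma)

sigma = 0.5
lower = normal_cdf(-1.0)  # the claimed constant, roughly 0.1587
worst = min(
    prob_select_treatment(tau + delta * sigma, sigma)
    for tau in [0.001, 0.01, 0.1, 1.0, 5.0]        # true effects, all positive
    for delta in [i / 10 for i in range(-10, 11)]  # keeps |tau_hat - tau| <= sigma
)
print(f"worst-case selection probability {worst:.4f} >= {lower:.4f}")
```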

5. Inference

Having run SCTS up to time , we must produce an estimate of the treatment effect, . To this end, we propose the use of one of two estimators. The first is simply to set if the intervention was used over at least epochs and to set otherwise. A distinct alternative considered in this section is to compute the usual synthetic control estimator (Abadie et al., 2010) and to set to this value on the event that the intervention was used over at least epochs (and to otherwise). This section will show that such an estimator

  • enjoys identical confidence intervals to the vanilla SC estimator on the event that the treatment effect is non-negative;

  • is with high probability when the treatment effect is negative.

As such, we make precise the promise, set out earlier in the paper, of allowing for precise estimates of the treatment effect when they matter (i.e., when the treatment effect is non-negative, so as to permit a cost-benefit analysis of implementation, say), while continuing to conclude that the treatment is ineffective when it is not.[6]

[6] It may be feasible, under additional distributional assumptions on the setup, to construct estimators with known limiting distributions, as in (Li, 2020; Deshpande et al., 2018), or to extend such results to more complicated synthetic control estimators (Doudchenko and Imbens, 2016; Li and Bell, 2017; Arkhangelsky et al., 2019). We leave this for future work.

5.1. Vanilla Synthetic Control

Before describing our estimator we make precise what can be accomplished with synthetic control and a fixed design (Abadie et al., 2010). In particular, SC seeks to find a linear combination of donor units to match the experimental unit based on the observations from the pre-treatment period. SC assumes the existence of weights such that for all times in the pre-treatment period . These weights are required to be non-negative, and must sum to one. This requirement that the synthetic control be constructed as a convex (as opposed to affine) combination of donor units serves effectively as a regularization mechanism. SC estimates the treatment effect by averaging over the differences between the synthetic control so constructed and the observed over the treatment period, i.e.,

It is difficult to calculate a limiting distribution for absent further distributional assumptions. That said, the analysis of (Abadie et al., 2010) allows for the following high-probability confidence intervals as a corollary (see Appendix B in (Abadie et al., 2010)):

Proposition 5.1.

With probability ,

The constant is a measure of how well-conditioned the subspace spanned by the common factors over the pre-treatment period is; and . In the typical case with and , we have

which is optimal up to logarithmic terms.
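The vanilla SC estimator described at the start of Section 5.1 can be sketched in a few lines. This is a minimal outcome-only version under stated assumptions: it matches on pre-treatment outcomes alone (no covariates), and it computes the convex weights by projected gradient descent with a sort-based Euclidean projection onto the simplex, an illustrative solver choice rather than one prescribed by (Abadie et al., 2010). The toy panel, in which the experimental unit is an exact convex combination of i.i.d. Gaussian donors plus a constant effect, is purely illustrative.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > css - 1.0)[0][-1]  # last index kept in the support
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def sc_weights(Y0_pre, y1_pre, n_iter=5000):
    """Convex donor weights minimizing ||y1_pre - Y0_pre @ w||^2 by projected gradient descent.

    Y0_pre: (T0, n_donors) pre-treatment donor outcomes; y1_pre: (T0,) experimental unit.
    """
    n = Y0_pre.shape[1]
    w = np.full(n, 1.0 / n)
    step = 1.0 / np.linalg.norm(Y0_pre, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = Y0_pre.T @ (Y0_pre @ w - y1_pre)
        w = project_to_simplex(w - step * grad)
    return w

def sc_estimate(Y0_post, y1_post, w):
    """Average gap between the experimental unit and its synthetic control over the treatment period."""
    return float(np.mean(y1_post - Y0_post @ w))

# Toy panel (illustrative assumptions): exact convex mix of donors, constant effect tau after T0.
rng = np.random.default_rng(2)
T0, T1, n, tau = 60, 40, 8, 1.5
Y0 = rng.normal(size=(T0 + T1, n))
w_true = project_to_simplex(rng.normal(size=n))
y1 = Y0 @ w_true + np.r_[np.zeros(T0), np.full(T1, tau)]
w_hat = sc_weights(Y0[:T0], y1[:T0])
print(round(sc_estimate(Y0[T0:], y1[T0:], w_hat), 3))
```

Because the weights are fit on pre-treatment data only, any post-treatment drift in the donor-to-unit relationship shows up directly as estimation error.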

5.2. Using the Synthetic Control Estimator with an SCTS Design

We now describe one potential estimator for the setting where the experiment design is determined by SCTS. Let be the same weights used by the vanilla SC estimator; note that these are computed using data available exclusively over the pre-treatment period. Now let be the epochs over which the intervention is applied under our dynamic SCTS design, and denote by the average difference between observed outcomes and the synthetic control over those epochs,

We then propose the following estimator for the treatment effect,

This estimator then enjoys high-probability confidence intervals analogous to the vanilla synthetic control setting:

Proposition 5.2.

Let be fixed. Then,

  • When , with probability ,

  • When , with probability at least ,

The appendix proves a stronger version of this result that makes precise the dependence of the rate on , and allows for meaningful confidence intervals provided . As outlined at the outset, the result above shows high-probability confidence intervals analogous to the vanilla synthetic control setting on the event that the treatment effect is non-negative. On the event where the treatment effect is negative, we learn only that this is the case, without recovering a precise estimate of the effect. As argued earlier, in the practical settings we care about, a precise estimate of the treatment effect is typically not as important when the effect is negative. As a result, the regret gains made possible by SCTS likely constitute a beneficial tradeoff relative to the inference possible under .
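Given vanilla SC weights computed on the pre-treatment period, the estimator of Section 5.2 averages the synthetic-control gap only over the epochs on which SCTS applied the intervention, and reports zero if there are too few of them (the fallback used for SCTS in the experiments of Section 6.1). In the sketch below the weights are taken as given; the function name, the threshold name `min_epochs` and the toy numbers are hypothetical.

```python
import numpy as np

def scts_sc_estimate(y1_post, Y0_post, w, actions, min_epochs):
    """Average synthetic-control gap over treated epochs; 0.0 if fewer than min_epochs were treated.

    y1_post: (T,) experimental-unit outcomes; Y0_post: (T, n_donors) donor outcomes;
    w: (n_donors,) pre-treatment SC weights; actions: (T,) 0/1 SCTS decisions.
    """
    treated = np.asarray(actions) == 1
    if treated.sum() < min_epochs:
        return 0.0  # too little treated data: report no effect
    gaps = y1_post[treated] - Y0_post[treated] @ w
    return float(gaps.mean())

# Toy usage: two donors weighted equally, and an effect of 2.0 on treated epochs.
Y0_post = np.array([[1.0, 3.0], [2.0, 2.0], [0.0, 4.0], [5.0, 1.0]])
w = np.array([0.5, 0.5])
actions = np.array([1, 0, 1, 1])
y1_post = Y0_post @ w + 2.0 * actions
print(scts_sc_estimate(y1_post, Y0_post, w, actions, min_epochs=2))  # 2.0
print(scts_sc_estimate(y1_post, Y0_post, w, actions, min_epochs=4))  # 0.0
```

Untreated epochs never enter the average, so the estimate is only as precise as the number of epochs on which the intervention was actually used.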

As discussed earlier, the high-probability confidence intervals described in this section are typically quite conservative in practice. As such, in our experiments, we will explore the use of hypothesis tests based on a certain re-randomization of the data and confidence intervals derived from inverting these tests.

6. Experiments

This section undertakes an experimental evaluation of SCTS using both synthetic and real-world datasets. For the real-world data, it is unclear that the synthetic control model holds (i.e., it is unclear that the observed data can be explained by a low-rank factor model). We compare SCTS against both the standard fixed design (where over the entire treatment period) and a switchback design (where is set to with probability independently for each epoch in the treatment period) (Brandt, 1938). For these incumbent designs, we estimate the treatment effect using state-of-the-art estimators drawn from recent advances in ‘robust’ synthetic control and panel data regression. Our experiments will illustrate the following salient features of SCTS:

  1. The fraction of time SCTS chooses a sub-optimal action is small. In contrast, by definition, the switchback design picks a sub-optimal action half of the time, while the fixed design picks the sub-optimal action all of the time when the treatment effect is negative. Despite the material reduction in regret, our estimator of the treatment effect under SCTS achieves relative error comparable to the competing designs when the treatment effect is positive. This same estimator correctly identified that the treatment effect was negative in all experiments where this was the case.

  2. In the case of real world data where a low-rank factor model provides at best a ‘rough’ fit to the observations, we observe the same relative merits alluded to above. In that case, however, we observe an additional merit for SCTS, where the relative error in estimating the treatment effect is actually substantially lower than that for the fixed design (and comparable to that of the switchback design). The reason is that in the SCTS setting, our estimator can take advantage of data collected over the treatment period in estimating factor loadings and this appears to be particularly valuable when the common factor process is non-stationary.

  3. Re-randomization tests (Fisher, 1966) provide a means to construct well-powered hypothesis tests and confidence intervals for SCTS, despite the inferential challenges introduced by adaptive treatment assignment.

6.1. Low Regret and Estimation Error on Synthetic Data

Our first set of experiments seeks to establish that SCTS incurs low regret while recovering the treatment effect accurately. We first consider this on a synthetic set of problems that we describe next.

Experimental setup: We experiment with a synthetically generated dataset. We generate the latent factors and loadings with entries distributed i.i.d. as , with and . Similarly, we generate with i.i.d. entries. We experiment with a signal-to-noise ratio of ; i.e., , and vary the sign of .
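The synthetic setup can be sketched as follows. Since model equations (1) and (2) are not restated here, this assumes a standard rank-r factor model: donor outcomes equal factors times loadings plus Gaussian noise, and the experimental unit's outcome at epoch t is its own loading times the factors, plus τ times the action, plus noise, with standard Gaussian factors and loadings. The function name `make_panel`, the dimensions, the rank and the noise level are hypothetical choices, not the paper's.

```python
import numpy as np

def make_panel(n_donors, T, r, tau, noise_sd, rng):
    """Hypothetical rank-r factor-model panel.

    Returns donor outcomes Y0 of shape (T, n_donors) and a function giving the
    experimental unit's outcome at epoch t under action a in {0, 1}.
    """
    V = rng.normal(size=(T, r))           # common factors, one row per epoch
    U = rng.normal(size=(n_donors, r))    # donor loadings
    u1 = rng.normal(size=r)               # experimental-unit loading
    Y0 = V @ U.T + noise_sd * rng.normal(size=(T, n_donors))
    eps1 = noise_sd * rng.normal(size=T)  # experimental-unit noise, drawn once per epoch

    def outcome(t, a):
        return float(V[t] @ u1 + tau * a + eps1[t])

    return Y0, outcome

rng = np.random.default_rng(3)
Y0, outcome = make_panel(n_donors=20, T=100, r=3, tau=-0.5, noise_sd=1.0, rng=rng)
print(Y0.shape, outcome(0, 1) - outcome(0, 0))  # the gap equals tau up to rounding
```

Flipping the sign of `tau` switches which action is optimal, which is how the negative- and positive-effect rows of the results table are generated under this sketch's assumptions.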

Estimation: There are a variety of treatment effect estimators we could use for any given experimental design. For SC, we report the best performance over several estimators from the literature; the best performing estimator on our synthetic examples is the robust synthetic control estimator proposed in (Amjad et al., 2018). For the switchback design, we use our ridge regression estimator . For SCTS, we set when , and 0 otherwise.

Results: Table 1 shows mean regret and estimation error for each algorithm averaged over problem instances. The results comport favorably with the salient features we outlined for SCTS at the outset. Specifically:

  1. SCTS picks a sub-optimal action over at most of the available testing epochs, thereby mitigating the cost of experimentation. By construction, this fraction is one half for the switchback design, and one (every epoch) for the fixed design when the treatment effect is negative.

  2. Despite the above gain, we continue to recover the treatment effect accurately with the SCTS design. Specifically, the relative RMSE is comparable to the switchback and fixed design cases when the treatment effect is positive. The SCTS estimator correctly identified that the treatment effect was negative in all instances where this was the case.

  3. From the plots in Figure 1, we see that the relative merits of SCTS alluded to above are robust to the choice of experimentation horizon .
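The regret baselines for the incumbent designs follow directly from their definitions (cf. the Switchback and SC columns of Table 1). The sketch below assumes, as the tables suggest, that normalized regret is the fraction of treatment-period epochs on which the sub-optimal action was played; with a constant effect τ this equals cumulative regret divided by T|τ|. The function names are hypothetical.

```python
import numpy as np

def normalized_regret(actions, tau):
    """Fraction of epochs where the sub-optimal action was played, for a constant effect tau != 0."""
    optimal = 1 if tau > 0 else 0
    return float(np.mean(np.asarray(actions) != optimal))

def fixed_design(T):
    """Fixed design: apply the intervention in every treatment-period epoch."""
    return np.ones(T, dtype=int)

def switchback_design(T, rng):
    """Switchback design: treat each epoch independently with probability 1/2."""
    return rng.integers(0, 2, size=T)

rng = np.random.default_rng(4)
T = 10_000
for tau in (-1.0, 1.0):
    print(tau, normalized_regret(fixed_design(T), tau),
          round(normalized_regret(switchback_design(T, rng), tau), 2))
```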

                          SCTS   Switchback   SC
Regret (negative effect)  0.02   0.50         1.00
Regret (positive effect)  0.01   0.50         0.00
RMSE (positive effect)    0.06   0.08         0.06
Table 1. Regret and relative RMSE, averaged over 50 random synthetic instances. Regret is normalized between 0 and 1. RMSE is normalized by . SCTS virtually eliminates the cost of experimentation, while providing estimates of of the same quality as more costly alternatives when .
Figure 1. Regret (normalized by ) and RMSE (normalized by ) over time, for the synthetic dataset. Unlike SC and switchback, SCTS exhibits regret vanishing over time in addition to a small RMSE. These qualities are robust to the experimentation horizon .

6.2. Real-world data

Our second set of experiments serves the same purpose as the earlier set, except that this time we consider real world data. As such, there is no true low-rank factor model describing the data; at best we may hope that such a model provides a good approximation to the observed data. As before, our goal will be to measure regret for SCTS as well as estimation error.

Experimental Setup: We adapt the Rossman Store Sales dataset (https://www.kaggle.com/c/rossmann-store-sales/), which contains daily sales data for drug stores over days. We take . Letting be the matrix of observations in the dataset, we generate an ensemble of 50 instances as follows. For each instance, we select a random store to be the experimental unit, with outcomes . The remaining stores constitute the control units, with observations . Viewing the rank now as an algorithmic hyper-parameter, we use in our experiments; this choice was made via bi-cross-validation on the pre-treatment period, as in (Owen and Perry, 2009). As before, we experiment with two values of : and , where is estimated as the mean squared error of relative to its best rank approximation.
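The noise-level estimate and the instance construction above can be sketched with a truncated SVD, which gives the best rank-r approximation in Frobenius norm (Eckart-Young). Here `Y` is a stand-in for the stores-by-days observation matrix and `r` for the chosen rank; the toy matrix is illustrative, not the Rossmann data, and the helper names are hypothetical.

```python
import numpy as np

def best_rank_r(Y, r):
    """Best rank-r approximation of Y in Frobenius norm, via truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def noise_variance_estimate(Y, r):
    """Mean squared residual of Y relative to its best rank-r approximation."""
    return float(np.mean((Y - best_rank_r(Y, r)) ** 2))

rng = np.random.default_rng(5)
low_rank = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 200))
noise = 0.3 * rng.normal(size=low_rank.shape)
Y = low_rank + noise
print(noise_variance_estimate(low_rank, 4))  # ~0: an exactly rank-4 matrix is its own best rank-4 fit
print(noise_variance_estimate(Y, 4))         # at most mean(noise**2), since low_rank is one feasible rank-4 fit

# One instance: a random store plays the experimental unit, the remaining stores are controls.
i = rng.integers(Y.shape[0])
y1, Y0 = Y[i], np.delete(Y, i, axis=0)
```

Because the SVD also absorbs part of the noise, this residual-based estimate tends to understate the true noise level slightly.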

Estimation: We use the same set of estimators here as in the previous set of experiments with synthetic data.

Results: At the outset, we note that the model equations (1)–(2) on which any of our designs or estimation approaches are predicated do not hold exactly in this setup. In particular, approximation error essentially precludes the ‘noise’ in our rank model from being Gaussian or i.i.d. Referring to Table 2, we observe:

  1. SCTS picks a sub-optimal action over at most of the available testing epochs, mitigating the cost of experimentation.

  2. Despite the above gain, we continue to recover the treatment effect accurately with the SCTS design. Specifically, the relative RMSE is comparable to the switchback design. The SCTS estimator correctly identified that the treatment effect was negative in all instances where this was the case.

  3. Especially interesting is that the relative RMSE is substantially lower than that for the fixed design. We attribute this to the fact that the SCTS estimator (like the switchback estimator) is effectively able to use data obtained over the test period to continually refine the factor loadings defining the synthetic control; this appears to be particularly valuable in settings such as this dataset, where the common factor process is not stationary.

  4. As an aside, for the switchback design we also compute the naive difference-in-means estimator (the difference between the average reward over periods when and the average reward over periods when ), which does not make use of the control observations. Doing so results in a much higher relative RMSE, illustrating the value of a synthetic control in estimating the treatment effect with that design.

  5. Finally, from the plots in Figure 2, we see that the relative merits of SCTS alluded to above are robust to the choice of experimentation horizon .
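The contrast in item 4 can be illustrated with a toy simulation. The sketch compares the naive switchback difference in means with the same contrast applied to the gap between the outcome and a control prediction. The SC-adjusted variant, the random-walk trend, and the assumption that the control tracks the trend exactly are all hypothetical simplifications; this is not the ridge regression estimator used for the switchback design in these experiments.

```python
import numpy as np

def diff_in_means(y, actions):
    """Naive estimator: mean outcome on treated epochs minus mean outcome on untreated epochs."""
    a = np.asarray(actions) == 1
    return float(y[a].mean() - y[~a].mean())

def sc_adjusted_diff_in_means(y, control_pred, actions):
    """Same contrast applied to the gap between the outcome and a synthetic-control prediction."""
    return diff_in_means(y - control_pred, actions)

def rmse(estimates, truth):
    return float(np.sqrt(np.mean((np.asarray(estimates) - truth) ** 2)))

rng = np.random.default_rng(6)
T, tau = 200, 1.0
trend = np.cumsum(rng.normal(scale=2.0, size=T))  # non-stationary common component
control_pred = trend                              # assume the synthetic control tracks it exactly
naive, adjusted = [], []
for _ in range(500):
    a = rng.integers(0, 2, size=T)                # switchback assignment
    y = trend + tau * a + rng.normal(size=T)
    naive.append(diff_in_means(y, a))
    adjusted.append(sc_adjusted_diff_in_means(y, control_pred, a))
print(f"naive RMSE {rmse(naive, tau):.3f}, SC-adjusted RMSE {rmse(adjusted, tau):.3f}")
```

Under a trending common component, the naive contrast inherits the variance of the trend across randomly assigned epochs, while the adjusted contrast only sees the idiosyncratic noise.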

                          SCTS   Switchback    SC
Regret (negative effect)  0.13   0.50          1.00
Regret (positive effect)  0.09   0.50          0.00
RMSE                      0.12   0.07, 0.91*   0.57
Table 2. Regret and relative estimation error, averaged over random instances generated from the Rossman dataset. Regret is normalized between 0 and 1. RMSE is normalized by . In addition to low regret, SCTS even produces better quality estimates of in this setting, compared to SC. For Switchback, we report RMSE for two estimators: (RMSE=0.07), and a simple difference in means with no synthetic controls (RMSE=0.91), highlighting the importance of synthetic controls even with switchback designs.
Figure 2. Regret (normalized by ) and RMSE (normalized by ) over time, averaged over instances from the Rossman dataset. Unlike SC and Switchback, SCTS exhibits regret vanishing over time in addition to a small RMSE. In particular SCTS and Switchback estimators both display much lower RMSE than SC. These qualities hold essentially for all in the horizon.

6.3. A Non-Parametric Approach to Inference

Finally, we explore a re-randomization approach to inference for SCTS. In particular, while the high-probability concentration bounds of Lemma 4.4 can be used to provide confidence intervals for , they assume that the structural model (1) is realizable, and tend to be conservative in practice. On the other hand, post-bandit inference techniques such as those in (Deshpande et al., 2018; Bibaut et al., 2021) could possibly be adapted to our setting, but do not apply immediately (see Section 1.2 for discussion). Adapting them remains an exciting direction for future work.

Here we propose a re-randomization test similar to that of (Bojinov et al., 2020) to construct a hypothesis test for a sharp null that the treatment effect is some specific value. We then obtain confidence intervals by inverting this hypothesis test, as described in (Imbens and Rubin, 2015). The overall conclusion in this section is that (a) our hypothesis test is highly powered even at relatively low values of SNR, where the treatment effect is dominated by the noise, and (b) our confidence intervals attain nearly ideal coverage even for very low SNR. All of these experiments are run on the real-data setup described in the preceding section.

A Re-Randomized Hypothesis Test and Confidence Intervals:

We test the sharp null hypothesis that the treatment effect is some constant , for all . To implement such a test, suppose that we have run SCTS for time steps, obtaining an estimator . We take this estimator to be our test statistic. We can then construct an approximate hypothesis test, at significance level , as follows:

  1. We are given an observed trajectory of interventions under SCTS, and the corresponding observations on the experimental unit .

  2. We next draw samples of the test statistic under the null hypothesis. We do so by re-running SCTS, but assuming that we observe the sequence of outcomes

  3. We can then approximate the p-value of the test statistic as one minus the proportion of the samples which are less than .

  4. We reject if the p-value is less than the significance level .

We may now construct confidence intervals by ‘inverting’ the above re-randomized hypothesis test, as described in (Imbens and Rubin, 2015). Precisely, for every null for some , we can implement a re-randomization test and decide whether to reject . The confidence set is then the set of values for which we do not reject the corresponding null .
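The test and its inversion can be sketched as follows. SCTS itself is not reproduced here, so the sketch substitutes a hypothetical toy design (`run_design`: Gaussian Thompson sampling on the mean synthetic-control gap, with the control predictions taken as given); all function names and parameters are illustrative. The null draws assume an additive effect: under the sharp null that the effect equals τ0 in every epoch, the untreated outcome at epoch t is y_t − τ0·a_t, so re-running the design with fresh randomness against the imputed outcomes y_t − τ0·a_t + τ0·a'_t yields draws of the test statistic under the null. As in step 3, the p-value is one-sided, so the inverted confidence set is one-sided as well.

```python
import numpy as np

def run_design(outcome_fn, control, rng, prior_var=1.0, noise_var=1.0):
    """Hypothetical stand-in for SCTS: Gaussian Thompson sampling on the mean gap y_t - control[t].

    outcome_fn(t, a) returns the experimental unit's outcome at epoch t under action a in {0, 1}.
    """
    T = len(control)
    actions, outcomes = np.zeros(T, dtype=int), np.zeros(T)
    gap_sum, n_treated = 0.0, 0
    for t in range(T):
        post_var = 1.0 / (1.0 / prior_var + n_treated / noise_var)  # conjugate Gaussian posterior
        post_mean = post_var * gap_sum / noise_var
        actions[t] = int(rng.normal(post_mean, np.sqrt(post_var)) > 0)  # treat if sampled effect > 0
        outcomes[t] = outcome_fn(t, actions[t])
        if actions[t] == 1:
            gap_sum += outcomes[t] - control[t]
            n_treated += 1
    return actions, outcomes

def estimate(actions, outcomes, control, min_epochs):
    """Test statistic: mean treated gap, or 0.0 if fewer than min_epochs epochs were treated."""
    treated = actions == 1
    if treated.sum() < min_epochs:
        return 0.0
    return float(np.mean(outcomes[treated] - control[treated]))

def null_outcome_fn(y_obs, a_obs, tau0):
    """Outcomes implied by the sharp null that the effect equals tau0 in every epoch."""
    y_untreated = y_obs - tau0 * a_obs  # impute untreated outcomes
    return lambda t, a: y_untreated[t] + tau0 * a

def rerandomization_pvalue(y_obs, a_obs, control, tau0, n_samples, min_epochs, rng):
    """One minus the fraction of null draws of the statistic that fall below the observed value."""
    stat_obs = estimate(a_obs, y_obs, control, min_epochs)
    outcome_fn = null_outcome_fn(y_obs, a_obs, tau0)
    draws = np.array([estimate(*run_design(outcome_fn, control, rng), control, min_epochs)
                      for _ in range(n_samples)])
    return float(1.0 - np.mean(draws < stat_obs))

def confidence_set(y_obs, a_obs, control, grid, alpha, n_samples, min_epochs, rng):
    """Invert the test: keep every tau0 on the grid whose null is not rejected at level alpha."""
    return [tau0 for tau0 in grid
            if rerandomization_pvalue(y_obs, a_obs, control, tau0, n_samples, min_epochs, rng) >= alpha]

# Toy usage with illustrative numbers (not the Rossmann experiments).
rng = np.random.default_rng(7)
T, tau_true = 100, 0.8
control = rng.normal(size=T)                     # synthetic-control predictions, taken as given
y_untreated_true = control + rng.normal(size=T)
a_obs, y_obs = run_design(lambda t, a: y_untreated_true[t] + tau_true * a, control, rng)
grid = np.round(np.arange(-1.0, 2.01, 0.25), 2)
print(confidence_set(y_obs, a_obs, control, grid, alpha=0.1, n_samples=100, min_epochs=5, rng=rng))
```

Re-running the full adaptive design for every null draw is what accounts for the dependence between past outcomes and future assignments; permuting the observed assignments instead would ignore it.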
Results: We assess our re-randomization test on 100 problem instances generated from the Rossman sales dataset, as above. We draw samples of the test statistic for each instance and choose the significance level to be . The results in Table 3 show that this test is highly powered even when the treatment effect is dominated by the noise (i.e., at an SNR of ). Power is already nearly at an SNR of . Further, we see that the coverage of the confidence intervals is close to ideal (given the significance level of , ideal here is ) over a broad range of SNRs from to . In summary, we conclude that the re-randomization tests and corresponding confidence intervals reported here are adequate for inference even when SNR is low.

0.02 0.1 0.2 1
Coverage 0.89 0.87 0.87 0.90 0.89
Power 0.14 0.17 0.51 0.81 0.98
Table 3. Performance of the re-randomization test and confidence intervals on Rossman problem instances, as a function of the effect size (normalized as SNR). As expected, coverage attains roughly the nominal level where , while power increases quickly as we increase the effect size.

7. Limitations and Open Directions

We motivated our dynamic design by the real-world setting where, in order to deal with issues of interference, we must select treatment units that are ‘coarse’. We discussed at length that these coarse units often necessitate synthetic controls to enable inference. This is true even in switchback designs, where the ‘untreated’ history of the experimental unit is not a reliable control, especially when the data have non-stationary components. At the same time, because the units are coarse, the cost of experimentation, as measured by the number of epochs over which a potentially sub-optimal action was taken, can no longer be ignored.

The SCTS approach is an attempt to provide a new dynamic design that addresses these issues. In practical settings, however, one must contend with real-world issues not addressed by the model studied here. We outline these issues here and present directions for future work that might serve to address them:

Spill-over Effects: In dynamic designs (the switchback design is a simple example; SCTS is another), one often cares about ‘spill-over’ effects, wherein a treatment applied in one epoch might influence outcomes in subsequent epochs. The typical fix for this issue is to allow a ‘burn-in’ period that ignores epochs impacted by such spill-overs. These burn-in periods typically precede and follow a switch from one type of treatment to another. While we have not posited a formal model, it is reasonable to conjecture that one could employ a similar strategy in our setting. Since the number of switches in SCTS is small (i.e., ), the added regret from such burn-in periods would scale sub-linearly with the horizon.

Interventions over Consecutive Epochs: For some interventions, it may be practically necessary (for instance, from a consumer experience standpoint) that any intervention be maintained over a certain minimum number of consecutive epochs. We believe this to be an important area for future work, closely related to notions of switching costs and batching in the bandit literature. A number of flavors of this problem have been considered in recent years, including incorporating switching costs (Dekel et al., 2014), and batching that makes the decision to stop using a potential intervention irrevocable (Perchet et al., 2016). Very recently, (Esfandiari et al., 2021; Han et al., 2020) have extended the batched bandit formalism to linear contextual bandits. Although none of these models precisely addresses the modeling need above, they provide a very reasonable foundation for a potential extension to SCTS that incorporates the constraint that any intervention must be pursued for a certain minimal number of consecutive epochs.

Post-Bandit Inference: This is an issue we have discussed earlier. Specifically, while we can establish high-probability confidence intervals (that are conservative) and re-randomization tests (that appear to work well in practice), we would ideally like to construct estimators with limiting distributions that permit powerful inference. The growing post-bandit inference literature (Bibaut et al., 2021; Hadad et al., 2021; Deshpande et al., 2018) provides an approach to accomplishing this goal. The primary roadblock here is that existing proposals require a lower bound on the rate at which exploration decays, and it is not clear that the current proposal meets such a lower bound.

References

  • Abadie (2019) Abadie, A. (2019). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature.
  • Abadie et al. (2010) Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California’s tobacco control program. Journal of the American Statistical Association, 105(490):493–505.
  • Abadie and Gardeazabal (2003) Abadie, A. and Gardeazabal, J. (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review, 93(1):113–132.
  • Abadie and Zhao (2021) Abadie, A. and Zhao, J. (2021). Synthetic controls for experimental design. arXiv preprint arXiv:2108.02196.
  • Abbasi-Yadkori et al. (2011) Abbasi-Yadkori, Y., Pál, D., and Szepesvári, C. (2011). Improved Algorithms for Linear Stochastic Bandits.
  • Abeille and Lazaric (2017) Abeille, M. and Lazaric, A. (2017). Linear Thompson Sampling Revisited. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, pages 176–184.
  • Agarwal et al. (2021) Agarwal, A., Shah, D., Shen, D., and Song, D. (2021). On robustness of principal component regression. Journal of the American Statistical Association, (just-accepted):1–34.
  • Agrawal and Goyal (2013) Agrawal, S. and Goyal, N. (2013). Thompson Sampling for Contextual Bandits with Linear Payoffs. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pages 127–135.
  • Amjad et al. (2019) Amjad, M., Misra, V., Shah, D., and Shen, D. (2019). mrsc: Multi-dimensional robust synthetic control. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(2):1–27.
  • Amjad et al. (2018) Amjad, M., Shah, D., and Shen, D. (2018). Robust synthetic control. The Journal of Machine Learning Research, 19(1):802–852.
  • Arkhangelsky et al. (2019) Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., and Wager, S. (2019). Synthetic difference in differences. Technical report, National Bureau of Economic Research.
  • Athey et al. (2021) Athey, S., Bayati, M., Doudchenko, N., Imbens, G., and Khosravi, K. (2021). Matrix completion methods for causal panel data models. Journal of the American Statistical Association, pages 1–41.
  • Athey and Imbens (2017) Athey, S. and Imbens, G. W. (2017). The state of applied econometrics: Causality and policy evaluation. Journal of Economic Perspectives, 31(2):3–32.
  • Auer (2002) Auer, P. (2002). Using Confidence Bounds for Exploitation-Exploration Trade-offs. Journal of Machine Learning Research, 3(Nov):397–422.
  • Bai and Ng (2019) Bai, J. and Ng, S. (2019). Matrix completion, counterfactuals, and factor analysis of missing data. arXiv preprint arXiv:1910.06677.
  • Ben-Michael et al. (2021) Ben-Michael, E., Feller, A., and Rothstein, J. (2021). The augmented synthetic control method. Journal of the American Statistical Association, (just-accepted):1–34.
  • Berry (2012) Berry, D. A. (2012). Adaptive clinical trials in oncology. Nature Reviews Clinical Oncology, 9(4):199–207.
  • Bibaut et al. (2021) Bibaut, A., Chambaz, A., Dimakopoulou, M., Kallus, N., and van der Laan, M. J. (2021). Post-Contextual-Bandit Inference. CoRR, abs/2106.00418.
  • Bojinov et al. (2020) Bojinov, I., Simchi-Levi, D., and Zhao, J. (2020). Design and Analysis of Switchback Experiments. Available at SSRN 3684168.
  • Brandt (1938) Brandt, A. (1938). Tests of significance in reversal or switchback trials. 234.
  • Brodersen et al. (2015) Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., and Scott, S. L. (2015). Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9(1):247–274.
  • Chen et al. (2020a) Chen, Y., Chi, Y., Fan, J., and Ma, C. (2020a). Spectral methods for data science: A statistical perspective. arXiv preprint arXiv:2012.08496.
  • Chen et al. (2020b) Chen, Y., Loncaric, M., Moallemi, B., and Taylor, S. J. (2020b). Synthetic control estimators in practice. Technical report, Lyft.
  • Chernozhukov et al. (2021) Chernozhukov, V., Wüthrich, K., and Zhu, Y. (2021). An exact and robust conformal inference method for counterfactual and synthetic controls. Journal of the American Statistical Association, (just-accepted):1–44.
  • Dekel et al. (2014) Dekel, O., Ding, J., Koren, T., and Peres, Y. (2014). Bandits with switching costs: T^{2/3} regret. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’14, pages 459–467, New York, NY, USA. Association for Computing Machinery.
  • Deshpande et al. (2018) Deshpande, Y., Mackey, L., Syrgkanis, V., and Taddy, M. (2018). Accurate inference for adaptive linear models. In International Conference on Machine Learning, pages 1194–1203. PMLR.
  • Doudchenko et al. (2019) Doudchenko, N., Gilinson, D., Taylor, S., and Wernerfelt, N. (2019). Designing experiments with synthetic controls. Technical report, Working paper.
  • Doudchenko and Imbens (2016) Doudchenko, N. and Imbens, G. W. (2016). Balancing, regression, difference-in-differences and synthetic control methods: A synthesis. Technical report, National Bureau of Economic Research.
  • Esfandiari et al. (2021) Esfandiari, H., Karbasi, A., Mehrabian, A., and Mirrokni, V. (2021). Regret Bounds for Batched Bandits. Proceedings of the AAAI Conference on Artificial Intelligence, 35(8):7340–7348.
  • Farias et al. (2021) Farias, V. F., Li, A. A., and Peng, T. (2021). Learning treatment effects in panels with general intervention patterns. arXiv preprint arXiv:2106.02780.
  • Fisher (1966) Fisher, R. (1966). Design of Experiments. Hafner of Edinburgh.
  • Hadad et al. (2021) Hadad, V., Hirshberg, D. A., Zhan, R., Wager, S., and Athey, S. (2021). Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences, 118(15).
  • Han et al. (2020) Han, Y., Zhou, Z., Zhou, Z., Blanchet, J., Glynn, P. W., and Ye, Y. (2020). Sequential Batch Learning in Finite-Action Linear Contextual Bandits.
  • Hsiao et al. (2012) Hsiao, C., Steve Ching, H., and Ki Wan, S. (2012). A panel data approach for program evaluation: measuring the benefits of political and economic integration of Hong Kong with Mainland China. Journal of Applied Econometrics, 27(5):705–740.
  • Imbens and Rubin (2015) Imbens, G. W. and Rubin, D. B. (2015). Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
  • Jones and Barrows (2019) Jones, N. and Barrows (2019). Uber’s synthetic control. https://www.youtube.com/watch?v=j5DoJV5S2Ao.
  • Kveton et al. (2021) Kveton, B., Konobeev, M., Zaheer, M., Hsu, C.-w., Mladenov, M., Boutilier, C., and Szepesvari, C. (2021). Meta-Thompson Sampling. arXiv preprint arXiv:2102.06129.
  • Lattimore and Szepesvári (2020) Lattimore, T. and Szepesvári, C. (2020). Bandit Algorithms.
  • Li (2020) Li, K. T. (2020). Statistical inference for average treatment effects estimated by synthetic control methods. Journal of the American Statistical Association, 115(532):2068–2083.
  • Li and Bell (2017) Li, K. T. and Bell, D. R. (2017). Estimation of average treatment effects with panel data: Asymptotic theory and implementation. Journal of Econometrics, 197(1):65–75.
  • Nie et al. (2018) Nie, X., Tian, X., Taylor, J., and Zou, J. (2018). Why adaptively collected data have negative bias and how to correct for it. In International Conference on Artificial Intelligence and Statistics, pages 1261–1269. PMLR.
  • Owen and Perry (2009) Owen, A. B. and Perry, P. O. (2009). Bi-Cross-Validation of the SVD and the Nonnegative Matrix Factorization. The Annals of Applied Statistics, 3(2):564–594.
  • Perchet et al. (2016) Perchet, V., Rigollet, P., Chassang, S., and Snowberg, E. (2016). Batched bandit problems. The Annals of Statistics, 44(2):660–681.
  • Shamir (2011) Shamir, O. (2011). A variant of Azuma’s inequality for martingales with subgaussian tails. arXiv preprint arXiv:1110.2392.
  • Villar et al. (2015) Villar, S. S., Bowden, J., and Wason, J. (2015). Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges. Statistical Science, 30(2):199–215.
  • Xu (2017) Xu, Y. (2017). Generalized synthetic control method: Causal inference with interactive fixed effects models. Political Analysis, 25(1):57–76.

Appendix A Failure of UCB

Here, we construct a class of problem instances to show that the UCB algorithm incurs linear regret.

To begin, consider the case when and (i.e., the noiseless scenario). For any , let and for Further, let Then the observation is given by

and

Recall that UCB chooses actions by the following procedure: (i) compute by estimating the common factors through SVD; (ii) solve the ridge regression problem to obtain and its ‘variance’ estimate ; (iii) play if , and otherwise. Next, we show that this algorithm always chooses , for any ridge regularizer and any sequence of , thereby incurring linear regret.
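Steps (ii) and (iii) can be sketched generically as below. The design matrix (estimated factors stacked with the treatment indicator), the regularizer `lam`, the confidence multiplier `beta` and the function name are hypothetical stand-ins for the quantities in the paper's regression; the point is only the decision rule, namely to treat if the upper confidence bound on the treatment coefficient is positive.

```python
import numpy as np

def ridge_ucb_decision(factors, actions, outcomes, lam=1.0, beta=1.0):
    """Hypothetical sketch: ridge-regress outcomes on [factors, action]; treat iff the UCB on tau is > 0."""
    X = np.column_stack([factors, actions])                   # estimated factors + treatment indicator
    V = X.T @ X + lam * np.eye(X.shape[1])                    # regularized Gram matrix
    theta = np.linalg.solve(V, X.T @ outcomes)                # ridge estimate (loadings, then tau)
    tau_hat = float(theta[-1])
    width = beta * float(np.sqrt(np.linalg.inv(V)[-1, -1]))  # 'variance'-based confidence width
    return int(tau_hat + width > 0), tau_hat, width

rng = np.random.default_rng(8)
T, r = 200, 3
F = rng.normal(size=(T, r))
a = rng.integers(0, 2, size=T)
y = F @ np.array([1.0, -0.5, 0.2]) - 1.0 * a + 0.5 * rng.normal(size=T)
print(ridge_ucb_decision(F, a, y, lam=1.0, beta=2.0))
```

If the treatment has never been played, its column is all zeros, the estimate is exactly shrunk to zero, and the width is beta divided by the square root of lam; the decision then hinges entirely on the prior width, which is the kind of degeneracy a counterexample can exploit.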
Estimating shared common factors. By the SVD of , one has

with and Then the estimator , i.e., in the noiseless setting,
Ridge regression. Recall that we will solve the following (regularized) least squares problem at time step .

Under the constructed setting and the assumption that for , the problem is equivalent to

which has the closed-form solution

Action decision. Since , we have and hence . This implies the UCB algorithm will play for all , thereby incurring regret .

Appendix B Canonical Decomposition

Consider the structural model given by parameters

It is easy to see that the selections of are not unique due to the free rotations.
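The non-uniqueness is easy to check numerically: for any orthogonal r×r matrix R, the loadings U R and factors V R produce exactly the same low-rank matrix U Vᵀ, so no amount of data can distinguish them. The sketch uses generic names `U`, `V`, `R` as stand-ins for the paper's parameters, with arbitrary sizes, and also shows that the singular values of the product coincide for both factorizations, a rotation-free summary of the kind a canonical representation can be built on.

```python
import numpy as np

rng = np.random.default_rng(9)
n, T, r = 30, 80, 3
U = rng.normal(size=(n, r))                   # unit loadings
V = rng.normal(size=(T, r))                   # common factors
R, _ = np.linalg.qr(rng.normal(size=(r, r)))  # random orthogonal r x r matrix

M1 = U @ V.T
M2 = (U @ R) @ (V @ R).T                      # rotated loadings and factors
print(np.allclose(M1, M2))                    # True: the data cannot tell them apart
print(np.allclose(U, U @ R))                  # False for a generic rotation: the parameters differ

# A rotation-free summary: the leading singular values of the product are identical.
s1 = np.linalg.svd(M1, compute_uv=False)[:r]
s2 = np.linalg.svd(M2, compute_uv=False)[:r]
print(np.allclose(s1, s2))                    # True
```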

Let be an SVD of and let . We aim to show that can constitute a canonical representation. In particular, it is sufficient to show that there exist unique and such that

Since is exactly rank , the column spaces of and must be the same. Therefore there must exist an invertible matrix such that