1 Introduction
Suppose we observe a population of $n$ experimental units. Denote by $Y_i(\mathbf{d})$ the potential outcome of unit $i$ in the counterfactual world in which the population is assigned treatments according to the vector $\mathbf{d} = (d_1, \dots, d_n) \in \{0,1\}^n$, where $d_i = 1$ ($d_i = 0$) means unit $i$ is assigned to treatment (control). The fact that $Y_i(\mathbf{d})$ is a function of $d_j$ for $j \neq i$ represents interference, the possibility that treatments assigned to alters can influence the ego. An important estimand of practical interest in this context is the global average treatment effect, defined as
$\tau_n = \frac{1}{n} \sum_{i=1}^{n} \big( Y_i(\mathbf{1}_n) - Y_i(\mathbf{0}_n) \big),$
where $\mathbf{1}_n$ ($\mathbf{0}_n$) is the $n$-dimensional vector of ones (zeros). This compares average outcomes in the counterfactual world in which all units are treated to those in the world in which none are treated. Each average can only be directly observed in the data under an extreme design that assigns all units to the same treatment arm, but any such design would necessarily preclude observation of the other counterfactual. Common designs studied in the literature, including those studied here, assign different units to different treatment arms, so neither average is directly observed in the data. Nonetheless, we show that asymptotic inference on the global average treatment effect is possible for a class of cluster-randomized designs when the degree of interference diminishes with spatial distance.
Much of the existing literature on interference instead assumes that interference is sufficiently restricted that it can be summarized by a low-dimensional exposure mapping and that units are individually randomized into treatment or control, either via Bernoulli or complete randomization (e.g. Aronow and Samii, 2017; Basse et al., 2019; Forastiere et al., 2021; Manski, 2013; Toulis and Kao, 2013). Jagadeesan et al. (2020) and Ugander et al. (2013) also utilize exposure mappings but depart from unit-level randomization. They propose new designs that introduce cluster dependence in unit-level assignments in order to improve the precision of certain Horvitz-Thompson estimators.
We build on this literature by (1) studying rate-optimal choices of both cluster-randomized designs and Horvitz-Thompson estimators, (2) avoiding exposure mapping restrictions on interference, which can be quite strong (Eckles et al., 2017), and (3) developing a distributional theory for the estimator and a variance estimator for practical inference. Regarding (2), most exposure mappings used in the literature imply that only units within a small, known distance from the ego can interfere with the ego's outcome. We instead study a weaker restriction on interference, similar to Leung (2021a), which states that the degree of interference decreases with distance but does not necessarily zero out at any given distance.
Regarding (1), we study cluster-randomized designs in which units are partitioned into spatial clusters, clusters are independently randomized into treatment and control, and the assignment of a unit is dictated by the assignment of its cluster. By introducing correlation in assignments in this fashion, depending on the estimator used, these designs can reduce overlap problems common under Bernoulli randomization, which improves the rate of convergence. For analytical tractability, we focus on designs in which clusters are equally sized squares, each design distinguished by the number of such squares. We pair each design with a Horvitz-Thompson estimator that compares the average outcomes of units with all neighbors treated to those with none, where the neighborhood radius is of the same order as the cluster size dictated by the design. See Figure 1 for a depiction of a hypothetical design and the neighborhoods used to construct the estimator.
Our results inform how the analyst should choose the number of clusters (and hence the cluster size and neighborhood radius of the estimator) to optimize the estimator's rate of convergence. Notably, unlike existing work on cluster randomization with interference, the sizes of clusters must generally grow with the population size due to the relatively weak restriction on interference imposed. We also discuss practical implementation of the designs by partitioning space using clustering algorithms and explore the performance of spectral clustering in simulations.
Finally, regarding (3), we show that the estimator is asymptotically normal and provide a variance estimator. These results appear to be novel, as no existing central limit theorems seem to apply to our setup, in which treatments exhibit cluster dependence, clusters can be large, and units in different clusters are spatially dependent due to interference. We argue that the variance estimator is typically conservatively biased and that the bias reflects heterogeneity in unit-level treatment effects, which extends the well-known result on conservative variance estimation
(Neyman, 1990) to our setting with spatial interference. Of course, rate-optimality results do not determine the choice of non-asymptotic constants that are often important in practice under smaller sample sizes. Still, they provide an important first step toward designing practical procedures. Due to the generality of the setting, which imposes quite minimal assumptions on interference, it seems reasonable to first study rate-optimality, as finite-sample optimality appears to require substantial additional structure on the problem. We note that existing results on graph cluster randomization, which require even stronger restrictions on interference than this paper, are nonetheless limited to results on rates, and it has been an open question how to "best" construct clusters in practice.
Most of the literature supposes interference is mediated by a network. Studying optimal design in this setting is difficult because network clusters can be highly heterogeneous, and their graphtheoretic properties can closely depend on the generative model of the network (Leung, 2021b). To make the optimal design problem more analytically tractable, we focus on spatial interference and a class of designs that partitions space into equally sized squares while exploring in simulations the performance of more realistic designs.
Spatial interference is of empirical interest in its own right. A rideshare company may wish to compare the performance of two different pricing algorithms, but the algorithms may induce behavior that generates spatial externalities, such as congestion, resulting in interference. Many phenomena diffuse primarily through physical interaction, for example, infectious diseases, crime (Verbitsky-Savitz and Raudenbush, 2012), neighborhood effects more broadly (Sobel, 2006), and pollution (Giffin et al., 2020).
Related Literature. There is relatively little work on optimal experimental design under interference. Viviano (2020) studies variance-minimizing two-wave experiments under network interference. Baird et al. (2018) study the power of randomized saturation designs under partial interference.
A growing literature studies designs for interference that depart from unit-level randomization. A key paper motivating our work is Ugander et al. (2013), who propose graph cluster randomization designs under network interference. Ugander and Yin (2020) suggest a new variant of these designs, and Harshaw et al. (2021)
study related designs for bipartite experiments. These papers assume interference is summarized by certain 1-neighborhood exposure mappings, which enables the construction of unbiased estimators and the use of designs in which clusters are small (asymptotically bounded in size). We focus on a spatial setting but impose weaker restrictions on interference. As we show in
Theorem 2, the latter requires the use of designs in which cluster sizes are large in order for the bias to be small. This creates a bias-variance tradeoff, and we discuss rate-optimal designs that choose the cluster size to minimize asymptotic mean-squared error. Eckles et al. (2017) study how graph cluster randomization can reduce the bias of common estimators for the global average treatment effect in the absence of correctly specified exposure mappings. Pouget-Abadie et al. (2018) propose two-stage cluster-randomized designs to minimize bias under a monotonicity restriction on interference. Basse and Airoldi (2018), Jagadeesan et al. (2020), and Sussman and Airoldi (2017) focus on linear potential outcome models and propose designs targeting the direct average treatment effect, rather than the global average treatment effect. Under a normal-sum model, Basse and Airoldi (2018) compute the mean-squared error of the difference-in-means estimator, which they use to suggest model-assisted designs.
The work on cluster randomization above targets global effects such as the global average treatment effect (also see Chin, 2019; Choi, 2017). However, much of the literature on interference instead considers estimands defined by exposure mappings. These and related estimands can often be functions of assignment probabilities, so their interpretation may be somewhat specific to the experiment run (Sävje et al., 2021). Hu et al. (2021, §5) view this as "largely unavoidable" in nonparametric settings with interference. Our results show that inference on the global average treatment effect, which avoids this issue, is possible under restrictions on interference weaker than those typically used in the literature.
There is a large literature on cluster-randomized trials (e.g. Hayes and Moulton, 2017; Park and Kang, 2021). This literature predominantly requires partial interference, meaning that units are divided into clusters, and those in distinct clusters do not interfere with one another. That is, the clusters themselves impose restrictions on interference. In our setting, clusters are determined by the design and do not restrict interference.
Finally, Aronow et al. (2020) and Pollmann (2020) study spatial interference in a different experimental setting in which treatments are assigned to distinct spatial locations, rather than to units directly. This shares some similarities with spatial cluster randomization, where different spatial regions are randomized into treatment, so some of the ideas here may be applicable to their setting.
Outline. The next section defines the model of spatial interference and the class of designs and estimators studied. In § 3, we derive the estimator’s rate of convergence and discuss rateoptimal designs. In § 4, we prove that the estimator is asymptotically normal, propose a variance estimator, and characterize its asymptotic properties. We report results from a simulation study in § 5, exploring the use of spectral clustering to implement the designs. Finally, § 6 concludes.
2 Setup
Let $\mathcal{N}_n$ be a set of $n$ units. We study experiments in which units are cluster-randomized into treatment and control, postponing to § 2.2 the specifics of the design. For each unit $i$, let $D_i$ be a binary random variable, where $D_i = 1$ indicates that $i$ is assigned to treatment and $D_i = 0$ indicates assignment to control. Let $\mathbf{D} = (D_i)_{i \in \mathcal{N}_n}$ be the vector of realized treatments and $\mathbf{d} \in \{0,1\}^n$ denote a non-random vector of counterfactual treatments. Recall from § 1 that $Y_i(\mathbf{d})$ is the potential outcome of unit $i$ in the counterfactual world in which all units are assigned treatments according to $\mathbf{d}$. Formally, for any $i$ and $n$, $Y_i(\cdot)$ is a non-random function from $\{0,1\}^n$ to $\mathbb{R}$. We denote $i$'s factual, or observed, outcome by $Y_i = Y_i(\mathbf{D})$. We maintain the standard assumption that potential outcomes are uniformly bounded.
Assumption 1 (Bounded Outcomes).
There exists $C < \infty$ such that $|Y_i(\mathbf{d})| \le C$ for all units $i$, all $\mathbf{d} \in \{0,1\}^n$, and all $n$.
2.1 Spatial Interference
Thus far, the model allows for unrestricted interference in the sense that $Y_i(\mathbf{d})$ may vary essentially arbitrarily in any component of $\mathbf{d}$. That is, the treatment assigned to any unit may arbitrarily affect any other unit's outcome. To obtain a positive result on asymptotic inference, it is necessary to restrict interference so as to induce some form of weak dependence across unit outcomes. The existing literature on interference primarily focuses on restrictions captured by neighborhood exposure mappings, which imply that one unit's treatment can interfere with another's outcome only if the two units are sufficiently close. We will discuss how this assumption is potentially restrictive and establish results under weaker conditions.
We assume each unit is located in $\mathbb{R}^2$ and label each unit by its location, so that the population is a finite subset of $\mathbb{R}^2$. Equip this space with the sup metric $\rho(i, j) = \max\{|i_1 - j_1|, |i_2 - j_2|\}$ for $i = (i_1, i_2)$ and $j = (j_1, j_2)$. Let $B(i, s)$ denote the ball of radius $s$ centered at $i$. Under the sup metric, balls are squares, and the radius is half the side length of the square. We consider a sequence of population regions $B(0, R_n)$ such that $R_n \to \infty$, where $0$ denotes the origin. That is, units are located in the ball with growing radius $R_n$. Combined with the next increasing-domain assumption, the number of units in the region is of order $R_n^2$, but we will simply assume the number is exactly $n$.
Assumption 2 (Increasing Domain).
There exists $\underline{\rho} > 0$ such that, for any $n$ and any pair of distinct units $i, j$, $\rho(i, j) \ge \underline{\rho}$.
This assumption is standard in the spatial literature, requiring that units are minimally separated in space (e.g. Jenish and Prucha, 2009).
Let $\mathbb{R}_+$ denote the set of nonnegative reals and $\mathcal{N}(i, s) = \{j \colon \rho(i, j) \le s\}$ the $s$-neighborhood of unit $i$. We study the following model of interference, similar to that proposed by Leung (2021a).
Assumption 3 (Interference).
There exists a nonincreasing function $\psi \colon \mathbb{R}_+ \to \mathbb{R}_+$, decaying sufficiently fast in the sense discussed below, such that, for all units $i$ and all $s \ge 0$,
$\max\big\{ |Y_i(\mathbf{d}) - Y_i(\mathbf{d}')| \colon \mathbf{d}, \mathbf{d}' \in \{0,1\}^n, \; d_j = d_j' \;\,\forall j \in \mathcal{N}(i, s) \big\} \le \psi(s).$
To interpret this, observe that the left-hand side maximizes over pairs of treatment assignment vectors that fix the assignments of units in $i$'s $s$-neighborhood but allow assignments to freely vary outside of this neighborhood. It therefore measures the maximum deviation in $i$'s potential outcome caused by manipulating treatments assigned to units "distant" from $i$ in the sense that they lie outside the $s$-neighborhood. The assumption requires the maximum deviation to vanish with the neighborhood radius $s$, so that treatment assignments of more distant alters interfere less with the ego.
The rate at which the maximum deviation vanishes with the radius is controlled by $\psi$. In particular, $\psi$ is required to decay at a rate faster than a benchmark familiar from the literature on spatial dependence. Assumption 3(b) of Jenish and Prucha (2009) and Assumption 4(c) of Jenish and Prucha (2012) impose the same rate of decay on various measures of spatial dependence (mixing or near-epoch dependence coefficients) to establish central limit theorems.
We next discuss two prominent examples of models of interference permitted by Assumption 3. The first is the standard modeling approach of specifying a neighborhood exposure mapping. Such a mapping is given by $\mathbf{d} \mapsto T_i(\mathbf{d})$, whose range, in particular its dimension, does not depend on $n$. The approach is to assume that the low-dimensional exposure mapping summarizes interference by reparameterizing potential outcomes as
$Y_i(\mathbf{d}) = \tilde{Y}_i(T_i(\mathbf{d}))$ for all $\mathbf{d} \in \{0,1\}^n$. (1)
That is, once we fix $i$'s exposure $T_i(\mathbf{d})$, her potential outcome is fully determined. No less important, it is also typically assumed that exposure mappings are restricted to a unit's $K$-neighborhood, where $K$ is small, meaning fixed with respect to $n$. This means that $T_i(\mathbf{d}) = T_i(\mathbf{d}')$ for any $\mathbf{d}, \mathbf{d}'$ such that $d_j = d_j'$ for all $j \in \mathcal{N}(i, K)$, which implies that the treatment assigned to a unit $j$ only interferes with $i$ if $\rho(i, j) \le K$. In practice, choices with small $K$ are most common, for example taking the exposure to be the pair consisting of $i$'s own treatment and the number (or fraction) of treated units in the neighborhood. In these examples, the first component of the exposure captures the direct effect of the treatment, and the second component captures interference from units near $i$.
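To make this concrete, here is a minimal sketch (the function name, data, and radius are ours, not from the paper) of a K-neighborhood exposure mapping of the kind just described: each unit's exposure is its own treatment together with the number of treated neighbors within sup-metric distance K.

```python
import numpy as np

def exposure_map(locs, d, K):
    """Hypothetical K-neighborhood exposure mapping: for each unit i,
    return (own treatment, number of treated units within sup-metric
    distance K of i, excluding i itself)."""
    n = len(d)
    # pairwise sup (Chebyshev) distances between unit locations
    dist = np.max(np.abs(locs[:, None, :] - locs[None, :, :]), axis=2)
    T = np.zeros((n, 2))
    for i in range(n):
        nbrs = (dist[i] <= K) & (np.arange(n) != i)
        T[i] = (d[i], d[nbrs].sum())
    return T

# toy data: three units on a line, units 0 and 2 treated
locs = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
d = np.array([1, 0, 1])
T = exposure_map(locs, d, K=1.5)
# unit 0 is treated, and its only neighbor within distance 1.5 (unit 1) is untreated
```

Under (1), two units with the same exposure value would then have the same (reparameterized) potential outcome function evaluated at that value.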
Crucially, $T_i$ and $K$ must be known to the analyst in this approach, which is often a strong requirement. In contrast, Assumption 3 enables the analyst to impose (1) while requiring neither to be known. Indeed, if there exists a $K$-neighborhood exposure mapping such that (1) holds, then Assumption 3 holds with $\psi(s) = C \cdot \mathbf{1}\{s < K\}$ for some $C$ sufficiently large.
Furthermore, Assumption 3 allows for more complex forms of interference ruled out by (1) in which interference decays with distance without being abruptly truncated at some distance
. An example is the Cliff-Ord autoregressive model
(Cliff and Ord, 1973, 1981). This is a workhorse model of spatial autocorrelation used in a variety of fields, including geography (Getis, 2008), ecology (Valcu and Kempenaers, 2010), and economics (Anselin, 2001). A typical formulation of the model is
$Y_i = \lambda \sum_{j} w_{ij} Y_j + \beta D_i + \varepsilon_i,$
where we assume $\varepsilon_i$ is uniformly bounded (to satisfy Assumption 1). Let $\mathbf{W}$ be the spatial weight matrix whose $(i,j)$th entry is $w_{ij}$. These weights typically decay with distance in a sense to be made precise below. While this model is highly stylized, the important aspect it captures is autocorrelation through the spatial autoregressive parameter $\lambda$. If this is nonzero, then there is no neighborhood exposure mapping for which (1) holds, a point previously noted by Eckles et al. (2017) in the context of network interference.
To see this, first note that coherency of the model requires nonsingularity of $\mathbf{I} - \lambda \mathbf{W}$, where $\mathbf{I}$ is the identity matrix. Let $\mathbf{M} = (\mathbf{I} - \lambda \mathbf{W})^{-1}$ be the inverse of this matrix and $m_{ij}$ its $(i,j)$th entry. Then the reduced form of the model is
$Y_i = \sum_{j} m_{ij} (\beta D_j + \varepsilon_j),$ (2)
a spatial "moving average" model with spatial weight matrix $\mathbf{M}$. (See § 5 for some examples of $\mathbf{W}$.) Noticeably, $Y_i$ can potentially depend on $D_j$ for any $j$, which is ruled out if one imposes a neighborhood exposure mapping.
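The reduced form is easy to simulate. The sketch below uses hypothetical parameter values and a hypothetical row-normalized contiguity weight matrix; it computes potential outcomes under the all-treated and all-control assignment vectors and hence the global average treatment effect implied by the model (the error terms cancel in the difference).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
locs = rng.uniform(0, 25, size=(n, 2))                      # unit locations
dist = np.max(np.abs(locs[:, None, :] - locs[None, :, :]), axis=2)  # sup metric

# Row-normalized 0/1 contiguity weights: neighbors within distance 2
# (a hypothetical specification of the spatial weight matrix).
W = ((dist <= 2.0) & (dist > 0)).astype(float)
W = W / np.maximum(W.sum(axis=1, keepdims=True), 1.0)

lam, beta = 0.5, 1.0                                        # hypothetical parameters
eps = rng.normal(size=n)   # error draws (the text assumes bounded errors)
M = np.linalg.inv(np.eye(n) - lam * W)                      # reduced-form weights

def potential_outcomes(d):
    # reduced form: Y(d) = (I - lam*W)^{-1} (beta*d + eps)
    return M @ (beta * d + eps)

# global average treatment effect implied by the model
tau = np.mean(potential_outcomes(np.ones(n)) - potential_outcomes(np.zeros(n)))
```

Because the weight matrix is row-normalized and the autoregressive parameter is 1/2, the implied unit-level effects here lie between 1 and 2, reflecting spillovers amplified through the autoregressive structure.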
Jenish and Prucha (2011) show that outcomes satisfying (2) are near-epoch dependent, a notion of weak spatial dependence, when the weights decay with spatial distance in the following sense:
(3) 
for some (see Proposition 5 and eq. (13) of Jenish and Prucha, 2011). The next result shows that this condition is sufficient for verifying Assumption 3 if .
Proposition 1.
Suppose potential outcomes are given by (2) and spatial weights satisfy (3) for . Then Assumption 3 holds with for some that does not depend on .
Proof. For , fix any such that for all . For , . For ,
Define and take . Then by (3), , and uniform boundedness of .
This result shows that, unlike the standard approach of imposing a neighborhood exposure mapping, Assumption 3 can allow for richer forms of interference in which alters arbitrarily distant from the ego can interfere with the ego’s response.
2.2 Design and Estimator
Much of the literature on interference considers designs in which units are individually randomized into treatment and control, either via Bernoulli or complete randomization. A common problem faced by such designs is limited overlap, meaning that, for many exposure mappings of interest, the realized exposure is unlikely to take certain values. For example, suppose that (1) holds with the exposure mapping given by the number of treated units in a unit's neighborhood. Then for larger values of this count, the probability of realizing that exposure is small, tending to zero with $n$, possibly exponentially, in a Bernoulli design. This is problematic for a Horvitz-Thompson estimator that inverse-weights outcomes by exposure probabilities, since the variance grows rapidly as those probabilities vanish. To reduce this problem, Ugander et al. (2013) propose cluster-randomized designs, which deliberately introduce dependence in treatment assignments across certain units.
We consider the following class of such designs. We assign units to mutually exclusive clusters by partitioning the population region into equally sized squares, assuming for simplicity that . That is, to obtain increasingly more clusters, we first divide the population region into four squares, then divide each of these squares into four squares, and so on, as in Figure 2. Label the squares , and call
cluster . Then the number of units in each cluster is uniformly , and the radius of each cluster is
which we assume is greater than 1. For simplicity, we assume there are no units on the common boundaries of different squares, so that the squares partition .
We assume treatments are realized according to a cluster-randomized design, which independently assigns each cluster to treatment and control with some probability $p$ that is fixed with respect to $n$, for example $p = 1/2$. Within a treated (control) cluster, all units are assigned to treatment (control). Note that the number of clusters is indexed by $n$, as we clearly need a growing number of clusters for the variance of the estimator to concentrate. If the number of clusters is of order $n$, then clusters are asymptotically bounded in size, the usual case studied in the literature, which includes unit-level Bernoulli randomization as a special case. However, if the number of clusters is of smaller order, then cluster sizes grow with $n$.
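A minimal sketch of this design class, under the assumption that the population region is the square [-R, R]^2 (function and variable names are ours): partition the region into 4^m equal subsquares and draw independent cluster-level assignments.

```python
import numpy as np

def cluster_design(locs, R, m, p=0.5, rng=None):
    """Partition [-R, R]^2 into 4**m equal subsquares and independently
    assign each subsquare (cluster) to treatment with probability p.
    Returns unit-level treatments and cluster labels (a sketch of the
    design class described in the text)."""
    rng = np.random.default_rng(rng)
    k = 2 ** m                                   # k-by-k grid of squares
    side = 2 * R / k
    # grid coordinates of the square containing each unit
    ix = np.clip(((locs[:, 0] + R) // side).astype(int), 0, k - 1)
    iy = np.clip(((locs[:, 1] + R) // side).astype(int), 0, k - 1)
    labels = ix * k + iy
    cluster_treat = rng.binomial(1, p, size=k * k)  # independent cluster draws
    return cluster_treat[labels], labels

# toy example: four units, one in each quadrant of [-4, 4]^2, with m = 1
locs = np.array([[-3.0, -3.0], [3.0, 3.0], [-3.0, 3.0], [3.0, -3.0]])
D, labels = cluster_design(locs, R=4, m=1, rng=0)
# the four units fall in four distinct quadrant clusters
```

Increasing `m` by one splits every square into four, mirroring the recursive subdivision in Figure 2.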
Remark 1 (Practical Implementation).
Since units are likely to be irregularly distributed in practice, division into equally sized squares may be inefficient. An alternative is to use a clustering algorithm. In § 5, we explore in simulations the performance of spectral clustering when choosing the number of clusters according to the theory in § 3.
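As a sketch of the alternative mentioned in Remark 1, the following numpy-only implementation of spectral clustering (Gaussian affinity, normalized-Laplacian embedding, then a simple k-means loop; in practice a library implementation would normally be used) groups units by spatial proximity.

```python
import numpy as np

def spectral_clusters(locs, n_clusters, gamma=1.0, iters=50):
    """Minimal spectral clustering sketch: Gaussian affinity, symmetric
    normalized Laplacian embedding, then Lloyd's k-means with
    farthest-point initialization."""
    d2 = ((locs[:, None, :] - locs[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-gamma * d2)                      # Gaussian affinity matrix
    deg = A.sum(axis=1)
    L = np.eye(len(locs)) - A / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    X = vecs[:, :n_clusters]                     # bottom eigenvectors as embedding
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # farthest-point initialization, then Lloyd iterations
    centers = [X[0]]
    for _ in range(1, n_clusters):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(axis=2), axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels

# two well-separated spatial blobs should be recovered as two clusters
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels = spectral_clusters(pts, 2)
```

The number of clusters passed in would be chosen according to the theory in § 3.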
To construct the estimator, define the neighborhood exposure indicators
$T_i^1 = \mathbf{1}\{D_j = 1 \text{ for all } j \in \mathcal{N}(i, r_n)\}$ and $T_i^0 = \mathbf{1}\{D_j = 0 \text{ for all } j \in \mathcal{N}(i, r_n)\}$,
which indicate whether $i$'s neighborhood of radius $r_n$ is entirely treated ($T_i^1 = 1$) or entirely untreated ($T_i^0 = 1$). We study the Horvitz-Thompson estimator
$\hat{\tau} = \frac{1}{n} \sum_{i=1}^{n} Y_i \Big( \frac{T_i^1}{\mathrm{P}(T_i^1 = 1)} - \frac{T_i^0}{\mathrm{P}(T_i^0 = 1)} \Big).$
This is a natural estimator for the global average treatment effect since it estimates the all-treated average (for example) using the outcomes of units surrounded by treated neighbors, for a potentially growing neighborhood radius. Note that the radius depends on the design through the cluster size, so the estimator is a function of the number of clusters dictated by the design. Figure 1 depicts the relationship between the clusters and the neighborhoods that determine exposure.
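A sketch of this estimator (the function name and toy data are ours): units whose neighborhood is entirely treated are inverse-weighted by their exposure probability, and symmetrically for entirely untreated neighborhoods, where the exposure probability is computed from the number of clusters the neighborhood touches under independent cluster assignment.

```python
import numpy as np

def ht_estimate(Y, D, locs, labels, p, r):
    """Horvitz-Thompson estimator sketch: a unit with an entirely treated
    radius-r (sup-metric) neighborhood enters with weight 1/p**J, and one
    with an entirely untreated neighborhood with weight -1/(1-p)**J, where
    J is the number of clusters containing units of the neighborhood, so
    p**J is the exact exposure probability under independent clusters."""
    n = len(Y)
    dist = np.max(np.abs(locs[:, None, :] - locs[None, :, :]), axis=2)
    est = 0.0
    for i in range(n):
        nbhd = dist[i] <= r                  # i's neighborhood, including i
        J = len(set(labels[nbhd].tolist()))  # clusters the neighborhood touches
        if D[nbhd].all():                    # entirely treated neighborhood
            est += Y[i] / p ** J
        elif not D[nbhd].any():              # entirely untreated neighborhood
            est -= Y[i] / (1 - p) ** J
    return est / n

# toy example: two far-apart clusters, one treated and one not
locs = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
D = np.array([1, 1, 0, 0])
Y = np.array([1.0, 1.0, 0.0, 0.0])
tau_hat = ht_estimate(Y, D, locs, labels, p=0.5, r=2.0)
```

Units whose neighborhoods are neither entirely treated nor entirely untreated contribute zero, which is why overlap of the two exposure events matters for the variance.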
Remark 2 (Overlap).
Under Bernoulli randomization, overlap needs to be imposed as a separate assumption for asymptotic inference. By overlap we mean that the exposure probabilities are uniformly bounded away from zero and one, which imposes potentially strong restrictions on the types of exposure mappings the analyst can use, as previously illustrated. In our setup, however, overlap is already satisfied: the probability that a unit's neighborhood is entirely treated (untreated) equals $p$ (respectively $1-p$) raised to the power of the number of clusters whose units lie in the neighborhood, and our choice of neighborhood radius implies this number is uniformly bounded, so overlap holds.
Let be the centroid of cluster . The choice of ensures that, for any unit in the “interior” of a cluster in the sense that , ’s neighborhood also lies within that cluster, in which case the exposure probabilities are simply given by the cluster assignment probability: and . If we instead had chosen, say, , then this would be true only for the centroid, while for the remaining units, and could be as small as , which means less overlap and a more variable estimate. For the purposes of the asymptotic theory, the main requirement is that is of the same order as . If were of smaller order, then, as we will see in the next section, the bias of could be nonnegligible, whereas if were of larger order, then in Remark 2 would grow with , overlap would be limited, and may be nonnegligible.
3 Rate-Optimal Designs
In this section, we derive the rate of convergence of as a function of , , and and use this to obtain rate-optimal choices of . Recall that designs are parameterized by , which determines the number and sizes of clusters, and also that depends on through , so we will be optimizing over both the design and the neighborhood radius that determines the estimator.
Our first result provides asymptotic upper bounds on the bias and variance of . For two sequences and , we write to mean and to mean .
Proof. First, we bound the bias. If , then all units in are treated, so by Assumption 3,
The same argument applies to , so combining these results, we obtain the rate for the squared bias.
Next, we bound the variance. Let
(4) 
This is the set of units whose neighborhoods intersect a cluster that also intersects ’s neighborhood. We have
(5) 
Note that contains units from at most 16 clusters (the worst case is when intersects four clusters), and clusters contain uniformly units by Lemma A.1 of Jenish and Prucha (2009). By Assumption 1 and Remark 2, is uniformly bounded, so
The difficult part of the argument is obtaining a tight enough rate for . Lemma A.1 in § A proves that . Therefore, , which is at most since .
Our second result provides asymptotic lower bounds.
Theorem 2.
Proof. See § A.
The restrictions on hold for and for and .
The result shows that we can construct potential outcomes satisfying the assumptions of Theorem 1 under which the bias is at least order and the variance at least . As discussed in § 1, existing work on cluster randomization under interference assumes clusters have asymptotically bounded size, which, in our setting, implies . Theorem 2 shows that the bias of the Horvitz-Thompson estimator can then be bounded away from zero, indicating that existing results rely crucially on the exposure mapping assumption to obtain unbiased estimates. In the absence of this assumption, it is necessary to consider designs in which cluster sizes are large for the bias to vanish with .
Designs. Theorem 1 implies the mean-squared error is at most of order , and Theorem 2 provides a similar asymptotic lower bound. Under either bound, the bias increases with while the variance decreases, so the results show a bias-variance tradeoff in the choice of design. We next derive rates for that minimize or nearly minimize the upper bound, for different assumptions on .

Oracle design. Suppose is known to decay with at some rate . Then since , the rate-optimal design chooses to minimize .

Worst-case bias. More commonly, we may have little prior knowledge about . Nonetheless, Assumption 3 provides an upper bound on the rate: for some . As previously discussed, this is a standard rate for spatial dependence used to establish central limit theorems. If decays more slowly than this rate, then a general positive result on asymptotic inference appears unattainable.
Since is order , under this worst-case rate, the bias is order . Without knowledge of , we can settle for a "nearly" rate-optimal design under the worst-case bias by setting . Then we would choose to minimize , which yields and an rate of convergence.

Exponential decay. Common specifications of the spatial weight matrix in the Cliff-Ord model imply that decays exponentially with , for example, the row-normalized matrix
(6)
If is known to decay at some exponential rate but the exponent is unknown, then we may choose for any small for a nearly rate-optimal design, which yields a rate of convergence of , close to an rate.

Exposure mappings. If we assume (1) holds for some neighborhood exposure mapping, then for all . If is unknown, then for a nearly rate-optimal design, we can choose to grow slightly slower than , say , which ensures that grows at a slow rate and therefore eventually exceeds . Hence, for large enough , the bias is zero, and the rate of convergence is . On the other hand, if is known, then by choosing , we have and zero bias. In this case, clusters are asymptotically bounded in size, the estimator converges at an rate, and both the design and estimator qualitatively coincide with those of Ugander et al. (2013).
The third design is interesting because it shows we can attain rates close to in the absence of exposure mapping assumptions, despite targeting the global average treatment effect. However, in practice, the analyst likely has minimal knowledge of , in which case our default conservative choice is
Under this class of designs, cluster sizes grow at the rate .
In all four designs above, the squared bias is , which is of smaller order than the variance. This makes the bias negligible from an asymptotic standpoint, but under the worst-case bias, it would be useful to develop bias-reduction methods. We also reiterate that, while this analysis only provides rates, at this level of generality, this is apparently by necessity. A finite-sample optimal design seems to require substantially more knowledge of the functional form of .
4 Inference
This section provides results for asymptotic inference on . Define .
Assumption 4 (Nondegeneracy).
.
This is a standard condition, and the proof of Theorem 2 constructs potential outcomes for which this is satisfied.
Proof. See § A.
The result centers at its expectation, not the estimand . However, in all of the designs discussed at the end of § 3, the bias is , in which case we can replace with in the result. Also note that implies that cluster sizes grow with . If instead were order , then , so by Theorem 2, we would additionally need to assume that there exists a neighborhood exposure mapping in the sense of (1) in order to guarantee that the bias vanishes at all. But in this case, it is straightforward to establish a normal approximation using existing results in the literature.
We next sketch the proof of Theorem 3. To our knowledge, there is no off-the-shelf central limit theorem that we can apply to . Under Assumption 3, the outcomes appear to be near-epoch dependent on the input , but the treatments are cluster-dependent with growing cluster sizes, rather than mixing, as required by Jenish and Prucha (2012).
Rewrite the estimator as
(7) 
We first show that the last term is relatively small, to be precise, which means that, on average, is primarily determined by the neighborhood exposure indicators . The proof of this claim is somewhat complicated, but it is similar to the argument showing in (5). To then establish a central limit theorem for , observe that the dependence between “observations” is characterized by the following dependency graph , which, roughly speaking, links two units only if they are dependent. Recalling the definition of from (4), we link two units in if and only if (or equivalently ). Then is indeed a dependency graph because implies that the cluster assignments that determine are distinct from those that determine , so unlinked observations in are independent by cluster randomization. We can then apply a central limit theorem for dependency graphs to prove the result.
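The dependency graph described in this argument can be constructed directly from the locations and cluster labels; a sketch (names ours):

```python
import numpy as np

def dependency_graph(locs, labels, r):
    """Construct the dependency graph sketched above: each unit's radius-r
    (sup-metric) neighborhood touches a set of clusters, and two units are
    linked iff these sets intersect. Unlinked pairs are then determined by
    disjoint, independently assigned clusters, hence independent."""
    n = len(locs)
    dist = np.max(np.abs(locs[:, None, :] - locs[None, :, :]), axis=2)
    touched = [set(labels[dist[i] <= r].tolist()) for i in range(n)]
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if touched[i] & touched[j]:          # shared cluster assignment
                A[i, j] = A[j, i] = 1
    return A

# toy example: units 0 and 1 share cluster 0; unit 2 sits in its own cluster
locs = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 1])
A = dependency_graph(locs, labels, r=2.0)
```

A central limit theorem for dependency graphs then requires, roughly, that the degrees of this graph grow slowly relative to the sample size.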
This proof demonstrates that there are two sources of dependence. The first-order source is the first term on the right-hand side of (7). Dependence in this average is due to cluster randomization, which induces correlation in the neighborhood exposure indicators. The second-order source is the second term on the right-hand side of (7). Dependence in this average is due to interference, which decays with distance due to Assumption 3. The previous arguments show that the second-order source of dependence is small relative to the first-order source when cluster sizes are large.
Therefore, to estimate , it suffices to account only for dependence induced by the neighborhood exposure indicators. Define , and note that and . Let , which is equivalent to . Our proposed variance estimator is
Proof. See § A.
The result shows that is generally not consistent because of the remainder , which is typically nonzero due to the unit-level heterogeneity. In the no-interference setting, it is well-known that the usual estimator of the variance of the difference-in-means estimator is biased for the same reason and that consistent estimation of the variance is impossible. Formally, bias arises because does not approach zero asymptotically, except in the special case of homogeneous treatment effects where does not vary across . Leung (2021a) proves a similar result for a different variance estimator under network interference.
It is useful to compare our result to the standard setting with no interference. In this case, , and we replace with and with to estimate the usual average treatment effect. Furthermore, we set for all because units are independent and set since there is no longer a need to cluster units. With these changes, , and
(8) 
for and . This is the wellknown expression for the bias in the absence of interference (e.g. Imbens and Rubin, 2015, Theorem 6.2).
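For concreteness, the classical no-interference bias term referenced here can be written as follows (our rendering of the cited result, with $\tau_i = Y_i(1) - Y_i(0)$ denoting the unit-level effect and $\bar{\tau}$ its average):

```latex
\mathbb{E}\big[\widehat{V}\big] - \operatorname{Var}(\hat{\tau})
  \;=\; \frac{S_{\tau}^{2}}{n},
\qquad
S_{\tau}^{2} \;=\; \frac{1}{n-1}\sum_{i=1}^{n}\big(\tau_{i}-\bar{\tau}\big)^{2},
\qquad
\bar{\tau} \;=\; \frac{1}{n}\sum_{i=1}^{n}\tau_{i},
```

which vanishes only when the unit-level effects are homogeneous, matching the discussion above.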
In our setting, we have additional "covariance" terms included in due to the nonzero off-diagonals of the dependency graph . By visual inspection, is completely analogous in form to and therefore may be viewed as an estimate of the "variance" of the approximate unit-level treatment effects (by Theorem 1), in the same way that (8) is interpreted as the variance of the unit-level treatment effects. Under distributional assumptions on the superpopulation from which potential outcomes are drawn, we expect that converges in probability to a nonnegative constant, in which case is asymptotically conservative, as in the no-interference setting. This can be seen in our simulations in § 5.
As previously discussed, the bias of is for the rate-optimal designs discussed at the end of § 3. Thus, for such designs, the preceding discussion implies that
(9) 
is an asymptotically conservative 95-percent confidence interval for the global average treatment effect.
5 Monte Carlo
We next present results from a simulation study illustrating the quality of the normal approximation in Theorem 3 and coverage of the confidence interval (9) when constructing clusters using spectral clustering. To generate spatial locations, we first draw . Unit locations in are given by for with . We let where is the Euclidean norm.
We set the number of clusters at , rounded to the nearest integer, which corresponds to the near-optimal design under the worst-case bias discussed at the end of § 3. To construct clusters, we apply spectral clustering to with the standard Gaussian affinity matrix whose th entry is . Clusters are randomized into treatment with probability . Figure 3 displays the clusters and treatment assignments for a typical simulation draw. We generate outcomes from three different models. Let be drawn independently of the other primitives. The first model is Cliff-Ord:
with and spatial weight matrix given by the row-normalized adjacency matrix (6). As discussed in § 3, this model features exponentially decaying , in fact of order (Leung, 2021a, Proposition 1).
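The simulation design just described can be sketched as follows. The scale of the locations, the adjacency threshold, and the Cliff-Ord coefficients are hypothetical stand-ins for the paper's constants, and a bare-bones numpy spectral clustering (normalized-Laplacian embedding plus Lloyd's k-means) substitutes for an off-the-shelf implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Locations: n points uniform on a square whose area grows with n, so that
# spatial density stays constant (scale is a hypothetical stand-in).
n, k = 250, 40                                   # sample size, number of clusters
X = rng.uniform(0.0, np.sqrt(n), size=(n, 2))

# Gaussian affinity matrix: (i, j)th entry exp(-|x_i - x_j|^2).
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-D2)

# Basic spectral clustering: embed with the k smallest eigenvectors of the
# normalized Laplacian, then run a naive k-means (stand-in for a library call).
deg = A.sum(1)
L = np.eye(n) - A / np.sqrt(np.outer(deg, deg))
_, V = np.linalg.eigh(L)                         # eigenvalues in ascending order
emb = V[:, :k]
emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
cent = emb[rng.choice(n, k, replace=False)]
for _ in range(50):                              # Lloyd iterations
    lab = ((emb[:, None, :] - cent[None, :, :]) ** 2).sum(-1).argmin(1)
    for j in range(k):
        if (lab == j).any():
            cent[j] = emb[lab == j].mean(0)

# Cluster-level randomization with probability 1/2.
Dvec = (rng.random(k) < 0.5)[lab].astype(float)

# Cliff-Ord outcomes: Y = (I - gamma*W)^{-1} (alpha + beta*D + eps),
# with a row-normalized distance-based adjacency matrix W.
# All coefficients and the adjacency threshold are hypothetical.
W = ((D2 < 2.0) & ~np.eye(n, dtype=bool)).astype(float)
W = W / np.maximum(W.sum(1, keepdims=True), 1.0)
alpha, beta, gamma = 1.0, 1.0, 0.5
eps = rng.normal(size=n)
Y = np.linalg.solve(np.eye(n) - gamma * W, alpha + beta * Dvec + eps)
```

In practice one would use a library routine such as scikit-learn's SpectralClustering with a precomputed affinity; the point here is only the pipeline: affinity, embedding, clustering, cluster-level randomization, then outcomes.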
We select the second and third models to explore how our methods break down when Assumption 3 is violated or close to being violated. For this purpose, we use the “moving average” model (2) with and with for the two respective models. Then decays at a polynomial rate. Notably, the first choice implies that the rate of decay is slow enough that Assumption 3 can fail to hold. This is because
for some by Lemma A.1(iii) of Jenish and Prucha (2009). The right-hand side does not converge for the first choice, as required by Proposition 1. On the other hand, the second choice is large enough for Assumption 3 to be satisfied, since we now replace the 3 on the right-hand side of the previous display with 4. However, in smaller samples, the two rates of decay may not be substantially different, so our methods may still break down from the assumption being “close to” violated.
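Under one reading of the display above (with the exponents 3 and 4 playing the role described in the text), the comparison is between series of the form Σ_r r²·r^(−γ), which diverges at γ = 3 (the harmonic series) and converges at γ = 4. These exponents are assumed stand-ins for the ones in the display; a quick numerical check of the partial sums:

```python
# Partial sums of sum_r r^2 * r^(-gamma): gamma = 3 gives the (divergent)
# harmonic series, while gamma = 4 converges to pi^2/6 ~ 1.6449.
def partial_sum(gamma, R):
    return sum(r ** (2 - gamma) for r in range(1, R + 1))

for R in (10**2, 10**3, 10**4, 10**5):
    print(R, partial_sum(3, R), partial_sum(4, R))
```

The gamma = 4 column stabilizes quickly, while the gamma = 3 column keeps growing logarithmically, mirroring the knife-edge nature of the assumption.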
Table 1 displays the results with 5000 simulation draws. Row “Bias” displays the bias of the estimator, estimated by taking the average over the draws, while “Variance” is the variance of the estimator across the draws. The next rows display the coverage of three different confidence intervals. “Coverage” corresponds to (9). “Naive Cvg” corresponds to (9) but with the standard error replaced by the i.i.d. standard error, so the extent to which its coverage deviates from 95 percent illustrates the degree of spatial dependence. “Oracle Cvg” corresponds to (9) but with the standard error replaced by the standard deviation of the estimator across the draws. “SE” displays our standard error.

              Moving Avg             Moving Avg             Cliff-Ord
n             250    500    1000     250    500    1000     250    500    1000
# clusters    40     63     100      40     63     100      40     63     100
Coverage      0.909  0.925  0.924    0.922  0.937  0.940    0.979  0.983  0.982
Naive Cvg     0.530  0.489  0.447    0.575  0.539  0.494    0.918  0.913  0.916
Oracle Cvg    0.943  0.943  0.936    0.950  0.952  0.949    0.983  0.982  0.982
Bias          0.108  0.093  0.083    0.033  0.027  0.024    0.009  0.004  0.003
Variance      0.143  0.087  0.054    0.108  0.065  0.039    0.341  0.177  0.087
SE            0.364  0.292  0.232    0.317  0.252  0.199    0.601  0.432  0.309
              1.432  1.467  1.492    1.289  1.307  1.319    5.804  5.822  5.851

Table 1: 5000 simulation draws. “Coverage” and “Cvg” rows show the empirical coverage of 95% CIs. “Naive” and “Oracle” respectively correspond to i.i.d. and true standard errors. The two “Moving Avg” panels correspond to the two decay parameters discussed above.
For all three confidence intervals, we compute the fraction of times the interval covers the estimand. Note that the oracle SE approximates the unconditional standard deviation because the variance is taken over the randomness of the design as well as of the potential outcomes, so the oracle CI can be conservative.
Note that there are at most 100 clusters in all designs and that the rate of convergence is quite slow for our choice of design. Nonetheless, across all designs, the coverage of the oracle CI is close to 95 percent, which illustrates the quality of the normal approximation. For the Cliff-Ord model, our CI attains at least 95 percent coverage even for small sample sizes, despite the design being chosen suboptimally for the worst-case bias. For the moving average model whose decay rate satisfies Assumption 3, we see some undercoverage in smaller samples due to the larger bias, which is unsurprising given the above discussion, but coverage is close to the nominal level in larger samples. The results for the model deliberately constructed to violate our main assumption are, as expected, worse. Once again, our CI exhibits undercoverage due to the larger bias, but coverage improves as the sample grows and the bias decreases.
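The three coverage rows can be mimicked in a much simpler toy model. The sketch below is not the paper's spatial design: it uses i.i.d. clusters with a shared random effect, solely to show how “Coverage” (a dependence-aware SE), “Naive Cvg” (an i.i.d. SE), and “Oracle Cvg” (the simulation standard deviation) are computed, and why the naive interval undercovers under cluster dependence:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cluster-dependent data, NOT the paper's spatial design: G clusters of
# size m share a random effect; the estimand is the population mean (zero).
G, m, R = 50, 10, 2000
n = G * m

est = np.empty(R)
naive_se = np.empty(R)
clust_se = np.empty(R)
for r in range(R):
    y = np.repeat(rng.normal(size=G), m) + rng.normal(size=n)
    est[r] = y.mean()
    naive_se[r] = y.std(ddof=1) / np.sqrt(n)      # i.i.d. SE ("Naive")
    cm = y.reshape(G, m).mean(1)                  # cluster means
    clust_se[r] = cm.std(ddof=1) / np.sqrt(G)     # dependence-aware SE

oracle_se = est.std(ddof=1)                       # sd across draws ("Oracle")

def coverage(se):
    # Fraction of draws in which [est - 1.96 se, est + 1.96 se] covers 0.
    return np.mean(np.abs(est) <= 1.96 * se)

naive_cov = coverage(naive_se)
clust_cov = coverage(clust_se)
oracle_cov = coverage(oracle_se)
print(naive_cov, clust_cov, oracle_cov)
```

As in Table 1, the naive interval undercovers substantially while the dependence-aware and oracle intervals sit near the nominal level.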
6 Conclusion
This paper studies the design of cluster-randomized experiments for targeting the global average treatment effect under spatial interference. Each design is characterized by a parameter that determines the number and sizes of clusters. For each such design, we consider a Horvitz-Thompson estimator that compares units with different neighborhood exposures to treatment, where the neighborhood radius is of the same order as the clusters' sizes. We derive the estimator's rate of convergence as a function of and the degree of interference to inform rate-optimal choices of . In the worst case, where the degree of interference is substantial, the estimator has an rate of convergence under a nearly rate-optimal design, whereas in the best case, where interference is characterized by a neighborhood exposure mapping, the rate is under a rate-optimal design. For inference, we derive the asymptotic distribution of the estimator and provide an estimate of the variance. In practice, we suggest using a clustering algorithm to partition space into clusters, where is chosen according to our theory, and we explore in simulations the performance of spectral clustering.
Important areas for future research include data-driven choices of the design parameters and methods to reduce the bias of the estimator. However, a rigorous theory appears to require more substantive restrictions on interference than those we impose.
Appendix A Proofs
Definitions. The proofs use the following definitions. Let be the centroid of cluster . For , define
(A.1) 
For , this is the “boundary” of , and as we increase , moves through contour sets within that are increasingly further from the boundary. Also, for any two sets , let .
The proofs make use of the following facts, which are a consequence of Lemma A.1 of Jenish and Prucha (2009). Given that has radius , and for some universal constant that does not depend on or . Also, since is just the boundary of a ball of radius , .
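These counting facts can be checked numerically; the snippet below takes d = 2 and the integer lattice as a concrete (hypothetical) instance, where a ball of radius R holds on the order of R² points and a unit-width annulus at its boundary holds on the order of R points:

```python
import numpy as np

# Lattice points in Z^2: a ball of radius R holds ~ pi R^2 points, while the
# unit-width annulus at its boundary holds ~ 2 pi R points, matching the
# O(R^2) and O(R) counts used in the proofs (d = 2 chosen for concreteness).
for R in (10, 20, 40, 80):
    g = np.arange(-R - 2, R + 3)
    xx, yy = np.meshgrid(g, g)
    dist = np.hypot(xx, yy)
    ball = int((dist <= R).sum())
    shell = int(((dist >= R - 1) & (dist <= R)).sum())
    print(R, ball / R**2, shell / R)
```

The printed ratios stabilize near pi and 2·pi respectively, so both counts are bounded by constants times R² and R, uniformly in R.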
Proof. Step 1. We first establish covariance bounds. Fix such that , the latter defined in (4), and set . Trivially,
First consider the case . Let be the cluster containing unit , , and . As a preliminary result, we bound the discrepancy between and .
Let , the distance between and the nearest unit in the boundary of . By Assumption 3, for any ,
because the conditioning event fixes at their realized values. Similarly,
, so by the law of total probability and