DeepAI

# Rate-Optimal Cluster-Randomized Designs for Spatial Interference

We consider a potential outcomes model in which interference may be present between any two units but the extent of interference diminishes with spatial distance. The causal estimand is the global average treatment effect, which compares counterfactual outcomes when all units are treated to outcomes when none are. We study a class of designs in which space is partitioned into clusters that are randomized into treatment and control. For each design, we estimate the treatment effect using a Horovitz-Thompson estimator that compares the average outcomes of units with all neighbors treated to units with no neighbors treated, where the neighborhood radius is of the same order as the cluster size dictated by the design. We derive the estimator's rate of convergence as a function of the design and degree of interference and use this to obtain estimator-design pairs in this class that achieve near-optimal rates of convergence under relatively minimal assumptions on interference. We prove that the estimators are asymptotically normal and provide a variance estimator. Finally, we discuss practical implementation of the designs by partitioning space using clustering algorithms.

08/10/2022

### Exploiting neighborhood interference with low order interactions under unit randomized design

Network interference, where the outcome of an individual is affected by ...
10/28/2022

### Cluster Randomized Designs for One-Sided Bipartite Experiments

The conclusions of randomized controlled trials may be biased when the o...
11/09/2019

### Optimal Experimental Design for Staggered Rollouts

Experimentation has become an increasingly prevalent tool for guiding po...
04/09/2018

### Central limit theorems via Stein's method for randomized experiments under interference

Controlling for interference through design and analysis can consume bot...
12/07/2022

### Neighborhood Adaptive Estimators for Causal Inference under Network Interference

Estimating causal effects has become an integral part of most applied fi...
04/25/2020

### Limiting Bias from Test-Control Interference in Online Marketplace Experiments

In an A/B test, the typical objective is to measure the total average tr...
03/07/2018

### Optimizing cluster-based randomized experiments under a monotonicity assumption

Cluster-based randomized experiments are popular designs for mitigating ...

## 1 Introduction

Suppose we observe a population of experimental units. Denote by the potential outcome of unit

in the counterfactual world in which the population is assigned treatments according to the vector

, where () means unit is assigned to treatment (control). The fact that is a function of for represents interference, the possibility that treatments assigned to alters can influence the ego.

An important estimand of practical interest in this context is the global average treatment effect, defined as

 θn≡1n∑i∈Nn(Yi(1n)−Yi(0n))

where () is the -dimensional vector of ones (zeros). This compares average outcomes in the counterfactual world in which all units are treated to those in the world in which none are treated. Each average can only be directly observed in the data under an extreme design that assigns all units to the same treatment arm, but any such design would necessarily preclude observation of the other counterfactual. Common designs studied in the literature, including those studied here, assign different units to different treatment arms, so neither average is directly observed in the data. Nonetheless, we show that asymptotic inference on is possible for a class of cluster-randomized designs when the degree of interference diminishes with spatial distance.

Much of the existing literature on interference instead assumes that interference is sufficiently restricted that it can be summarized by a low-dimensional exposure mapping and that units are individually randomized into treatment or control either via Bernoulli or complete randomization (e.g. Aronow and Samii, 2017; Basse et al., 2019; Forastiere et al., 2021; Manski, 2013; Toulis and Kao, 2013). Jagadeesan et al. (2020) and Ugander et al. (2013) also utilize exposure mappings but depart from unit-level randomization. They propose new designs that introduce cluster-dependence in unit-level assignments in order to improve the precision of certain Horovitz-Thompson estimators.

We build on this literature by (1) studying rate-optimal choices of both cluster-randomized designs and Horovitz-Thompson estimators, (2) avoiding exposure mapping restrictions on interference, which can be quite strong (Eckles et al., 2017), and (3) developing a distributional theory for the estimator and a variance estimator for practical inference. Regarding (2), most exposure mappings used in the literature imply that only units within a small, known distance from the ego can interfere with the ego’s outcome. We instead study a weaker restriction on interference similar to Leung (2021a), which states that the degree of interference decreases with distance but does not necessarily zero out at any given distance.

Regarding (1), we study cluster-randomized designs in which units are partitioned into spatial clusters, clusters are independently randomized into treatment and control, and the assignment of a unit is dictated by the assignment of its cluster. By introducing correlation in assignments in this fashion, depending on the estimator used, these designs can reduce overlap problems common under Bernoulli randomization, which improves the rate of convergence. For analytical tractability, we focus on designs in which clusters are equally sized squares, each design distinguished by the number of such squares. We pair each design with a Horovitz-Thompson estimator that compares the average outcomes of units with all neighbors treated to those with none, where the neighborhood radius is of the same order as the cluster size dictated by the design. See Figure 1 for a depiction of a hypothetical design and neighborhoods used to construct the estimator.

Our results inform how the analyst should choose the number of clusters (and hence, the cluster size and neighborhood radius of the estimator) to minimize the rate of convergence of the estimator. Notably, unlike existing work on cluster-randomization with interference, the sizes of clusters must generally grow with the population size due to the relatively weak restriction on interference imposed. We also discuss practical implementation of the designs by partitioning space using clustering algorithms and explore the performance of spectral clustering in simulations.

Finally, regarding (3), we show that the estimator is asymptotically normal and provide a variance estimator. These results appear to be novel, as no existing central limit theorems seem to apply to our setup in which treatments exhibit cluster dependence, clusters can be large, and units in different clusters are spatially dependent due to interference. We argue that the variance estimator is typically conservatively biased and that the bias reflects heterogeneity in unit-level treatment effects, which extends the well-known result on conservative variance estimation

(Neyman, 1990) to our setting with spatial interference.

Of course, rate-optimality results do not determine the choice of nonasymptotic constants that are often important in practice under smaller sample sizes. Still, they provide an important first step toward designing practical procedures. Due to the generality of the setting, which imposes quite minimal assumptions on interference, it seems reasonable to first study rate-optimality, as finite-sample optimality appears to require substantial additional structure on the problem. We note that existing results on graph cluster randomization, which require even stronger restrictions on interference than this paper, are nonetheless limited to rates, and it has been an open question how to “best” construct clusters in practice.

Most of the literature supposes interference is mediated by a network. Studying optimal design in this setting is difficult because network clusters can be highly heterogeneous, and their graph-theoretic properties can closely depend on the generative model of the network (Leung, 2021b). To make the optimal design problem more analytically tractable, we focus on spatial interference and a class of designs that partitions space into equally sized squares while exploring in simulations the performance of more realistic designs.

Spatial interference is of empirical interest in its own right. A rideshare company may wish to compare the performance of two different pricing algorithms, but the algorithms may induce behavior that generates spatial externalities, such as congestion, resulting in interference. Many phenomena diffuse primarily through physical interaction, for example, infectious diseases, crime (Verbitsky-Savitz and Raudenbush, 2012), neighborhood effects more broadly (Sobel, 2006), and pollution (Giffin et al., 2020).

Related Literature. There is relatively little work on optimal experimental design under interference. Viviano (2020) studies variance-minimizing two-wave experiments under network interference. Baird et al. (2018) study the power of randomized saturation designs under partial interference.

A growing literature studies designs for interference that depart from unit-level randomization. A key paper motivating our work is Ugander et al. (2013), who propose graph cluster randomization designs under network interference. Ugander and Yin (2020) suggest a new variant of these designs, and Harshaw et al. (2021)

study related designs for bipartite experiments. These papers assume interference is summarized by certain 1-neighborhood exposure mappings, which enables the construction of unbiased estimators and use of designs in which clusters are small (asymptotically bounded in size). We focus on a spatial setting but impose weaker restrictions on interference. As we show in

Theorem 2, the latter requires the use of designs in which cluster sizes are large in order for the bias to be small. This creates a bias-variance trade-off, and we discuss rate-optimal designs to best choose cluster size to minimize asymptotic mean-squared error.

Eckles et al. (2017) study how graph cluster randomization can reduce the bias of common estimators for in the absence of correctly specified exposure mappings. Pouget-Abadie et al. (2018) propose two-stage cluster-randomized designs to minimize bias under a monotonicity restriction on interference. Basse and Airoldi (2018), Jagadeesan et al. (2020), and Sussman and Airoldi (2017) focus on linear potential outcome models and propose designs targeting the direct average treatment effect, rather than . Under a normal-sum model, Basse and Airoldi (2018) compute the mean-squared error of the difference-in-means estimator, which they use to suggest model-assisted designs.

The work on cluster randomization above targets global effects such as (also see Chin, 2019; Choi, 2017)

. However, much of the literature on interference instead considers estimands defined by exposure mappings. These and related estimands can often be functions of assignment probabilities, so that their interpretation may be somewhat specific to the experiment run

(Sävje et al., 2021). Hu et al. (2021) (§5) views this as “largely unavoidable” in nonparametric settings with interference. Our results show that inference on , which avoids this issue, is possible under restrictions on interference weaker than those typically used in the literature.

There is a large literature on cluster-randomized trials (e.g. Hayes and Moulton, 2017; Park and Kang, 2021). This literature predominantly requires partial interference, meaning that units are divided into clusters, and those in distinct clusters do not interfere with one another. That is, the clusters themselves impose restrictions on interference. In our setting, clusters are determined by the design and do not restrict interference.

Finally, Aronow et al. (2020) and Pollmann (2020) study spatial interference in a different experimental setting in which treatments are assigned to distinct spatial locations, rather than to units directly. This shares some similarities with spatial cluster randomization, where different spatial regions are randomized into treatment, so some of the ideas here may be applicable to their setting.

Outline. The next section defines the model of spatial interference and the class of designs and estimators studied. In § 3, we derive the estimator’s rate of convergence and discuss rate-optimal designs. In § 4, we prove that the estimator is asymptotically normal, propose a variance estimator, and characterize its asymptotic properties. We report results from a simulation study in § 5, exploring the use of spectral clustering to implement the designs. Finally, § 6 concludes.

## 2 Setup

Let be a set of units. We study experiments in which units are cluster-randomized into treatment and control, postponing to § 2.2 the specifics of design. For each , let

be a binary random variable where

indicates that is assigned to treatment and indicates assignment to control. Let be the vector of realized treatments and denote a non-random vector of counterfactual treatments. Recall from § 1 that is the potential outcome of unit in the counterfactual world in which all units are assigned treatments according to . Formally, for any and , is a non-random function from to . We denote ’s factual, or observed, outcome by .

We maintain the standard assumption that potential outcomes are uniformly bounded.

.

### 2.1 Spatial Interference

Thus far, the model allows for unrestricted interference in the sense that may vary essentially arbitrarily in any component of . That is, the treatment assigned to any unit may arbitrarily affect any other unit’s outcome. In order to obtain a positive result on asymptotic inference, it is necessary to impose restrictions on interference in order to obtain some form of weak dependence across unit outcomes. The existing literature on interference primarily focuses on restrictions captured by -neighborhood exposure mappings, which imply that can only interfere with if . We will discuss how this assumption is potentially restrictive and establish results under weaker conditions.

We assume each unit is located in . Label each unit by its location, so that . Equip this space with the sup metric for , , and . Let , the ball of radius centered at . Under the sup metric, balls are squares, and the radius is half the side length of the square. We consider a sequence of population regions such that

 Nn⊂B(0,Rn),whereRnn−1/2→c∈(0,∞)

and is the origin. That is, units are located in the ball with growing radius . Combined with the next increasing domain assumption, the number of units in the region is , but we will simply assume the number is exactly .

###### Assumption 2 (Increasing Domain).

There exists such that for any and , .

This assumption is standard in the spatial literature, requiring that units are minimally separated in space (e.g. Jenish and Prucha, 2009).

Let be the set of non-negative reals and

 N(i,K)=B(i,K)∩Nn,

the -neighborhood of . We study the following model of interference similar to that proposed by Leung (2021a).

###### Assumption 3 (Interference).

There exists a non-increasing function such that , , and, for all ,

 supn∈Nmaxi∈Nnmax{|Yi(d)−Yi(d′)|:d,d′∈{0,1}n,dj=d′j∀j∈N(i,s)}≤ψ(s).

To interpret this, observe that maximizes over pairs of treatment assignment vectors that fix the assignments of units in ’s -neighborhood but allow assignments to freely vary outside of this neighborhood. It therefore measures the maximum deviation in ’s potential outcome caused by manipulating treatments assigned to units “distant” from in the sense that . The assumption requires the maximum deviation to vanish with the neighborhood radius , so that treatment assignments of more distant alters interfere less with the ego.

The rate at which the maximum deviation vanishes with is controlled by . In particular, is required to decay at a rate faster than , which is a familiar rate in the literature on spatial dependence. Assumption 3(b) of Jenish and Prucha (2009) and Assumption 4(c) of Jenish and Prucha (2012)

impose the same rate of decay on various measures of spatial dependence (mixing or near-epoch dependence coefficients) to establish central limit theorems.

We next discuss two prominent examples of models of interference permitted by Assumption 3. The first is the standard modeling approach of specifying a -neighborhood exposure mapping. Such a mapping is given by , where , in particular its dimension, does not depend on , unlike . The approach is to assume that the low-dimensional exposure mapping summarizes interference by reparameterizing potential outcomes as

 Yi(D)=~Yi(Ti). (1)

That is, once we fix ’s exposure mapping , her potential outcome is fully determined. No less important, it is also typically assumed that exposure mappings are restricted to a unit’s -neighborhood, where is small, meaning fixed with respect to . This means that for any such that for all , which implies that the treatment assigned to a unit only interferes with if . In practice, choices with are most common, for example for or for . In these examples, the first component of captures the direct effect of the treatment, and the second component, , captures interference from units near .

Crucially, and must be known to the analyst in this approach, which is often a strong requirement. In contrast, Assumption 3 enables the analyst to impose (1) while requiring neither to be known. Indeed, if there exists a -neighborhood exposure mapping such that (1) holds, then Assumption 3 holds with for some sufficiently large.

Furthermore, Assumption 3 allows for more complex forms of interference ruled out by (1) in which interference decays with distance without being abruptly truncated at some distance

. An example is the Cliff-Ord autoregressive model

(Cliff and Ord, 1973, 1981). This is a workhorse model of spatial autocorrelation used in a variety of fields, including geography (Getis, 2008), ecology (Valcu and Kempenaers, 2010), and economics (Anselin, 2001). A typical formulation of the model is

 Yi=α+λ∑j∈NnWijYj+Diβ+εi,

where we assume is uniformly bounded (to satisfy Assumption 1). Let be the spatial weight matrix whose th entry is . These weights typically decay with distance in a sense to be made precise below. While this model is highly stylized, the important aspect it captures is autocorrelation through the spatial autoregressive parameter . If this is nonzero, then there is no -neighborhood exposure mapping for which (1) holds, a point previously noted by Eckles et al. (2017) in the context of network interference.

To see this, first note that coherency of the model requires nonsingularity of , where is the identity matrix. Let be the inverse of this matrix and its th entry. Then the reduced form of the model is

 Yi(D)≡Yi=∑j∈NnVij(α+Djβ+εj), (2)

a spatial “moving average” model with spatial weight matrix . (See § 5 for some examples of .) Noticeably, can potentially depend on for any , which is ruled out if one imposes a -neighborhood exposure mapping.

Jenish and Prucha (2011) show that outcomes satisfying (2) are near-epoch dependent, a notion of weak spatial dependence, when the weights decay with spatial distance in the following sense:

 supn∈Nmaxi∈Nn∑j∈Nn|Vij|ρ(i,j)γ<∞ (3)

for some (see Proposition 5 and eq. (13) of Jenish and Prucha, 2011). The next result shows that this condition is sufficient for verifying Assumption 3 if .

###### Proposition 1.

Suppose potential outcomes are given by (2) and spatial weights satisfy (3) for . Then Assumption 3 holds with for some that does not depend on .

Proof. For , fix any such that for all . For , . For ,

 |Yi(d)−Yi(d′)|≤∑j∈Nn|Vij||dj−d′j|1{ρ(i,j)>s}≤s−γ∑j∈Nn|Vij|ρ(i,j)γ.

Define and take . Then by (3), , and uniform boundedness of .

This result shows that, unlike the standard approach of imposing a -neighborhood exposure mapping, Assumption 3 can allow for richer forms of interference in which alters arbitrarily distant from the ego can interfere with the ego’s response.

### 2.2 Design and Estimator

Much of the literature on interference considers designs in which units are individually randomized into treatment and control, either via Bernoulli or complete randomization. A common problem faced by such designs is limited overlap, meaning that, for many exposure mappings of interest, is unlikely to take certain values. For example, suppose that (1) holds with exposure mapping , the number of treated units in ’s -neighborhood. Then for larger values of , is small, tending to zero with possibly exponentially in a Bernoulli design. This is problematic for a Horovitz-Thompson estimator such as where . Due to , the variance grows rapidly with if either or is nonzero. To reduce this problem, Ugander et al. (2013) propose cluster-randomized designs, which deliberately introduce dependence in treatment assignments across certain units.

We consider the following class of such designs. We assign units to mutually exclusive clusters by partitioning the population region into equally sized squares, assuming for simplicity that . That is, to obtain increasingly more clusters, we first divide the population region into four squares, then divide each of these squares into four squares, and so on, as in Figure 2. Label the squares , and call

 Ck=Qk∩Nn

cluster . Then the number of units in each cluster is uniformly , and the radius of each cluster is

 rn≡Rn/√mn=√n/mn+o(1),

which we assume is greater than 1. For simplicity, we assume there are no units on the common boundaries of different squares, so that the squares partition .

We assume is realized according to a cluster-randomized design, which independently assigns each cluster to treatment and control with some probability

 p∈(0,1)

that is fixed with respect to , for example . Within a treated (control) cluster , all are assigned (). Note that is indexed by , as we clearly need a growing number of clusters for the variance of the estimator to concentrate. If is order , then , so clusters are asymptotically bounded in size, the usual case studied in the literature, which includes unit-level Bernoulli randomization as a special case. However, if is of smaller order, then clusters sizes grow with .

###### Remark 1 (Practical Implementation).

Since units are likely to be irregularly distributed in practice, division into equally sized squares may be inefficient. An alternative is to use a clustering algorithm. In § 5, we explore in simulations the performance of spectral clustering when choosing the number of clusters according to the theory in § 3.

To construct the estimator, define the neighborhood exposure indicator

 Tti=∏j∈N(i,Kn)1{Dj=t}fort∈{0,1},Kn≡rn/2,

an indicator for whether ’s -neighborhood is entirely treated () or untreated (). Let . We study the Horovitz-Thompson estimator

 ^θ=1n∑i∈NnZiforZi=(T1ip1i−T0ip0i)Yi.

This is a natural estimator for since it estimates (for example) using the outcomes of units surrounded by treated neighbors, for a potentially growing neighborhood radius . Note that the radius depends on through , so is a function of the number of clusters dictated by the design. Figure 1 depicts the relationship between the clusters and the -neighborhoods that determine exposure.

###### Remark 2 (Overlap).

Under Bernoulli randomization, overlap needs to be imposed as a separate assumption for asymptotic inference. By overlap we mean the probability weights and are uniformly bounded away from zero and one, which imposes potentially strong restrictions on the types of exposure mappings the analyst can use, as previously illustrated. In our setup, however, overlap is already satisfied because and , where is the number of clusters whose units lie in . Our choice of implies for all , so overlap holds.

Let be the centroid of cluster . The choice of ensures that, for any unit in the “interior” of a cluster in the sense that , ’s -neighborhood also lies within that cluster, in which case the exposure probabilities are simply given by the cluster assignment probability: and . If we instead had chosen, say, , then this would be true only for the centroid, while for the remaining units, and could be as small as , which means less overlap and a more variable estimate. For the purposes of the asymptotic theory, the main requirement is that is of the same order as . If were of smaller order, then, as we will see in the next section, the bias of could be non-negligible, whereas if were of larger order, then in Remark 2 would grow with , overlap would be limited, and may be non-negligible.

## 3 Rate-Optimal Designs

In this section, we derive the rate of convergence of as a function of , , and and use this to obtain rate-optimal choices of . Recall that designs are parameterized by , which determines the number and sizes of clusters, and also that depends on through , so we will be optimizing over both the design and neighborhood radius that determines the estimator.

Our first result provides asymptotic upper bounds on the bias and variance of . For two sequences and , we write to mean and to mean .

###### Theorem 1.

Under Assumptions 13, and .

Proof. First, we bound the bias. If , then all units in are treated, so by Assumption 3,

 |E[YiT1ip−11i]−Yi(1n)|=|E[Yi(D)∣T1i]−Yi(1n)|≤ψ(Kn).

The same argument applies to , so combining these results, we obtain the rate for the squared bias.

Next, we bound the variance. Let

 Λi={j∈Nn:∃k s.t. Ck∩N(i,Kn)≠∅,Ck∩N(j,Kn)≠∅}. (4)

This is the set of units whose -neighborhoods intersect a cluster that also intersects ’s -neighborhood. We have

 Var⎛⎝1n∑i∈NnZi⎞⎠=1n2∑i∈Nn∑j∈ΛiCov(Zi,Zj)+1n2∑i∈Nn∑j∉ΛiCov(Zi,Zj)≡[A]+[B]. (5)

Note that contains units from at most 16 clusters (the worst case is when intersects four clusters), and clusters contain uniformly units by Lemma A.1 of Jenish and Prucha (2009). By Assumption 1 and Remark 2, is uniformly bounded, so

 [A]≲1n2⋅n⋅nmn=1mn.

The difficult part of the argument is obtaining a tight enough rate for . Lemma A.1 in § A proves that . Therefore, , which is at most since .

Our second result provides asymptotic lower bounds.

###### Theorem 2.

Consider any non-increasing function such that , , and is bounded and differentiable on with derivative satisfying and for all . There exists a sequence of units and potential outcomes satisfying Assumptions 1 and 3 such that and .

Proof. See § A.

The restrictions on hold for and for and .

The result shows that we can construct potential outcomes satisfying the assumptions of Theorem 1 under which the bias is at least order and the variance at least . As discussed in § 1, existing work on cluster randomization under interference assumes clusters have asymptotically bounded size, which, in our setting, implies . Theorem 2 shows that the bias of the Horovitz-Thompson estimator can then be bounded away from zero, showing that existing results rely crucially on the exposure mapping assumption to obtain unbiased estimates. In the absence of this assumption, it is necessary to consider designs in which cluster sizes are large for the bias to vanish with .

Designs. Theorem 1 implies the mean-squared error is at most of order , and Theorem 2 provides a similar asymptotic lower bound. Under either bound, the bias increases with while the variance decreases, so the results show a bias-variance trade-off in the choice of design. We next derive rates for that minimize or nearly minimize the upper bound, for different assumptions on .

1. Oracle design. Suppose is known to decay with at some rate . Then since , the rate-optimal design chooses to minimize .

2. Worst-case bias. More commonly, we may have little prior knowledge about . Nonetheless, Assumption 3 provides an upper bound on the rate: for some . As previously discussed, this is a standard rate for spatial dependence used to establish central limit theorems. If decays slower than this rate, then a general positive result on asymptotic inference appears unattainable.

Since is order , under this worst-case rate, the bias is order . Without knowledge of , we can settle for a “nearly” rate-optimal design under the worst-case bias by setting . Then we would choose to minimize , which yields and an -rate of convergence.

3. Exponential decay. Common specifications of the spatial weight matrix in the Cliff-Ord model imply that decays exponentially with , for example, the row-normalized matrix

 Wij=1{ρ(i,j)≤1}∑k∈Nn1{ρ(i,k)≤1}. (6)

If is known to decay at some exponential rate but the exponent is unknown, then we may choose for any small for a nearly rate-optimal design, which yields a rate of convergence of , close to an -rate.

4. Exposure mappings. If we assume (1) holds for some -neighborhood exposure mapping, then for all . If is unknown, then for a nearly rate-optimal design, we can choose to grow slightly slower than , say , which ensures that grows at a slow rate and therefore eventually exceeds . Hence, for large enough , the bias is zero, and the rate of convergence is . On the other hand, if is known, then by choosing , we have and zero bias. In this case, clusters are asymptotically bounded in size, the estimator converges at an -rate, and both the design and estimator qualitatively coincide with those of Ugander et al. (2013).

The third design is interesting because it shows we can attain rates close to in the absence of exposure mapping assumptions, despite targeting the global average treatment effect. However, in practice, the analyst likely has minimal knowledge of , in which case our default conservative choice is

 mn=n2/3.

Under this class of designs, cluster sizes grow at the rate .

In all four designs above, the squared bias is , which is of smaller order than the variance. This makes the bias negligible from an asymptotic standpoint, but under the worst-case bias, it would be useful to develop bias-reduction methods. We also reiterate that, while this analysis only provides rates, at this level of generality, this is apparently by necessity. A finite-sample optimal design seems to require substantially more knowledge of the functional form of .

## 4 Inference

This section provides results for asymptotic inference on . Define .

###### Assumption 4 (Non-degeneracy).

.

This is a standard condition, and the proof of Theorem 2 constructs potential outcomes for which this is satisfied.

###### Theorem 3.

Suppose and . Under Assumptions 14,

 σ−1n√mn(^θ−E[^θ])d⟶N(0,1).

Proof. See § A.

The result centers at its expectation, not the estimand . However, in all of the designs discussed at the end of § 3, the bias is , in which case we can replace with in the result. Also note that implies that clusters sizes grow with . If instead were order , then , so by Theorem 2, we would additionally need to assume that there exists a -neighborhood exposure mapping in the sense of (1) in order to guarantee that the bias vanishes at all. But in this case, it is straightforward to establish a normal approximation using existing results in the literature.

We next sketch the proof of Theorem 3. To our knowledge, there is no off-the-shelf central limit theorem that we can apply to . Under Assumption 3, the outcomes appear to be near-epoch dependent on the input , but the treatments are cluster-dependent with growing cluster sizes, rather than -mixing, as required by Jenish and Prucha (2012).

Rewrite the estimator as

 (7)

We first show that the last term is relatively small, to be precise, which means that, on average, is primarily determined by the neighborhood exposure indicators . The proof of this claim is somewhat complicated, but it is similar to the argument showing in (5). To then establish a central limit theorem for , observe that the dependence between “observations” is characterized by the following dependency graph , which, roughly speaking, links two units only if they are dependent. Recalling the definition of from (4), we link two units in if and only if (or equivalently ). Then is indeed a dependency graph because implies that the cluster assignments that determine are distinct from those that determine , so unlinked observations in are independent by cluster randomization. We can then apply a central limit theorem for dependency graphs to prove the result.

This proof demonstrates that there are two sources of dependence. The first-order source is the first term on the right-hand side of (7). Dependence in this average is due to cluster randomization, which induces correlation in the neighborhood exposure indicators . The second-order source is the second term on the right-hand side of (7). Dependence in this average is due to interference, which decays with distance due to Assumption 3. The previous arguments show that the second-order source of dependence is small relative to the first-order source when cluster sizes are large.

Therefore, to estimate , it suffices to account only for dependence induced by the neighborhood exposure indicators. Define , and note that and . Let , which is equivalent to . Our proposed variance estimator is

 ^σ2=mnn2∑i∈Nn∑j∈Nn(Zi−¯Z)(Zj−¯Z)Aij.
###### Theorem 4.

Suppose and . Under Assumptions 14, for

 Rn=mnn2∑i∈Nn∑j∈Nn(E[Zi]−E[¯Z])(E[Zj]−E[¯Z])Aij.

Proof. See § A.

The result shows that is generally not consistent because of the remainder , which is typically nonzero due to the unit-level heterogeneity. In the no-interference setting, it is well-known that the variance of the difference-in-means estimator is biased for the same reason and that consistent estimation of the variance is impossible. Formally, bias arises because does not approach zero asymptotically, except in the special case of homogeneous treatment effects where does not vary across . Leung (2021a) proves a similar result for a different variance estimator under network interference.

It is useful to compare our result to the standard setting with no interference. In this case, , and we replace with and with to estimate the usual average treatment effect. Furthermore, we set for all because units are independent and set since there is no longer need to cluster units. With these changes, , and

 Rn=1n∑i∈Nn(τi−¯τ)2 (8)

for and . This is the well-known expression for the bias in the absence of interference (e.g. Imbens and Rubin, 2015, Theorem 6.2).

In our setting, we have additional “covariance” terms included in due to the non-zero off-diagonals of the dependency graph . By visual inspection, is completely analogous in form to and therefore may be viewed as an estimate of the “variance” of the approximate unit-level treatment effects (by Theorem 1), in the same way that (8) is interpreted as the variance of the unit-level treatment effects. Under distributional assumptions on the superpopulation from which potential outcomes are drawn, we expect that converges in probability to a non-negative constant, in which case is asymptotically conservative, as in the no-interference setting. This can be seen in our simulations in § 5.

As previously discussed, the bias of is for the rate-optimal designs discussed at the end of § 3. Thus, for such designs, the preceding discussion implies that

 ^θ±1.96⋅^σm−1/2n (9)

is an asymptotically conservative 95-percent confidence interval for

.

## 5 Monte Carlo

We next present results from a simulation study illustrating the quality of the normal approximation in Theorem 3 and coverage of the confidence interval (9) when constructing clusters using spectral clustering. To generate spatial locations, we first draw . Unit locations in are given by for with . We let where is the Euclidean norm.

We set the number of clusters at , rounded to the nearest integer, which corresponds to the near-optimal design under the worst-case bias discussed at the end of § 3. To construct clusters, we apply spectral clustering to

with the standard Gaussian affinity matrix whose

th entry is . Clusters are randomized into treatment with probability . Figure 3 displays the clusters and treatment assignments for a typical simulation draw.

We generate outcomes from three different models. Let be drawn independently of the other primitives. The first model is Cliff-Ord:

 Yi=α+λ∑j∈NnWijYj+δ∑j∈NnWijDj+Diβ+εi

with and spatial weight matrix given by the row-normalized adjacency matrix (6). As discussed in § 3, this model features exponentially decaying , in fact of order (Leung, 2021a, Proposition 1).

We select the second and third models to explore how our methods break down when Assumption 3 are violated or close to violated. For this purpose, we use the “moving average” model (2) with and for for the two respective models. Then decays at a polynomial rate. Notably, the choice of implies that the rate of decay is slow enough that Assumption 3 can fail to hold. This is because

 (???)=∞∑s=0∑j∈Nn1{ρ(i,j)∈[s,s+1)}ρ(i,j)γ−4≤c∞∑s=0sγ−3

for some by Lemma A.1(iii) of Jenish and Prucha (2009). The right-hand side does not converge for , as required by Proposition 1. On the other hand, the choice of is large enough for Assumption 3 to be satisfied since we now replace the 3 on the right-hand side of the previous display with 4. However, in smaller samples, or 5 may not be substantially different, so our methods may still break down from the assumption being “close to” violated.

Table 1 displays the results with 5000 simulation draws. Row “Bias” displays , estimated by taking the average over the draws, while “Variance” is the variance of across the draws. The next rows display the coverage of three different confidence intervals. “Coverage” corresponds to (9). “Naive Cvg” corresponds to (9) but replacing

with the i.i.d. standard error, so the extent to which its coverage deviates from 95-percent illustrates the degree of spatial dependence. “Oracle Cvg” corresponds to (

9) but replacing

with the standard deviation of

across the draws. “SE” displays our standard error .

For all three confidence intervals, we compute the fraction of times the interval covers . Note that the oracle SE, , approximates because the variance is taken over the randomness of the design as well as of the potential outcomes, so the oracle CI can be conservative.

Note that there are at most 100 clusters in all designs, and the rate of convergence is quite slow at for our choice of . Nonetheless, across all designs, the coverage of the oracle CI is close to 95-percent, which illustrates the quality of the normal approximation. For the Cliff-Ord model, our CI attains at least 95-percent coverage even for small sample sizes, despite being chosen suboptimally for the worst-case bias. For the moving average model with , we see some under-coverage in smaller samples due to the larger bias, which is unsurprising from the above discussion, but coverage is close to the nominal level for larger . The results for , as expected, are worse since it is deliberately constructed to violate our main assumption. Once again, our CI exhibits under-coverage due to the larger bias, but coverage seems to improve as grows and the bias decreases.

## 6 Conclusion

This paper studies the design of cluster-randomized experiments for targeting the global average treatment effect under spatial interference. Each design is characterized by a parameter that determines the number and sizes of clusters. For each such design, we consider a Horovitz-Thompson estimator that compares units with different neighborhood exposures to treatment, where the neighborhood radius is of the same order as clusters’ sizes. We derive the estimator’s rate of convergence as a function of and the degree of interference to inform rate-optimal choices of . In the worst case where the degree of interference is substantial, the estimator has an -rate of convergence under a nearly rate-optimal design, whereas in the best case where interference is characterized by a -neighborhood exposure mapping, the rate is under a rate-optimal design. For inference, we derive the asymptotic distribution of the estimator and provide an estimate of the variance. In practice, we suggest using a clustering algorithm to partition space into clusters, where is chosen according to our theory, and explore in simulations the performance of spectral clustering.

Important areas for future research include data-driven choices of and and methods to reduce the bias of the estimator. However, a rigorous theory appears to require more substantive restrictions on interference than what we impose.

## Appendix A Proofs

Definitions. The proofs use the following definitions. Let be the centroid of cluster . For , define

 J(s,Ck)={j∈Ck:ρ(c(Ck),j)∈[rn−s−1,rn−s)}. (A.1)

For , this is the “boundary” of , and as we increase , moves through contour sets within that are increasingly further from the boundary. Also, for any two sets , let .

The proofs make use of the following facts, which are a consequence of Lemma A.1 of Jenish and Prucha (2009). Given that has radius , and for some univeral constant that does not depend on or . Also, since is just the boundary of a ball of radius , .

###### Lemma A.1.

Recall the definition of from (5). Under the assumptions of Theorem 1, .

Proof. Step 1. We first establish covariance bounds. Fix such that , the latter defined in (4), and set . Trivially,

 |Cov(Zi,Zj)|=|Cov(Zi,Zj)|1{s≤4rn}+|Cov(Zi,Zj)|1{s>4rn}.

First consider the case . Let be the cluster containing unit , , and . As a preliminary result, we bound the discrepancy between and .

Let , the distance between and the nearest unit in the boundary of . By Assumption 3, for any ,

 E[|Zi−Xsi|q∣T1i=1]=p−q1iE[|Yi(D)−E[Yi(D)∣Fi(s)]|q∣T1i=1]≤p−q1iψ(max{t,s})q

because the conditioning event fixes at their realized values. Similarly,

, so by the law of total probability and