1 Introduction
The past decade has witnessed a surge of interest in causal analyses in the context of social networks, social media platforms and online advertising (e.g., Christakis and Fowler, 2007; Aral et al., 2009; Bakshy et al., 2011, 2012; Bond et al., 2012; Gui et al., 2015; Phan and Airoldi, 2015). The challenging aspect of these applications, generally, is the presence of connections, or network data, observed pre-intervention, possibly with uncertainty, and often missing. For instance, we might be interested in identifying which users should be offered a social coupon that enables them to play a new online multiplayer game for free with their Facebook friends. Here, the units of analysis are users and the connections are given by the observed friendship links among users on Facebook. Or we might be interested in identifying which advertisers should be invited to bid in an auction to place ads on a banner on the CNN front page. Here, the units are auctions and a connection amounts to having a number of advertisers participating in a pair of auctions.
While there is a well-developed literature on several aspects of the statistical analysis of network data (e.g., Wasserman, 1994; Kolaczyk, 2009; Goldenberg et al., 2010; Bickel and Chen, 2009), the literature about statistical methods and theory for experimentation in the social and information sciences that leverages observed connections is at a budding stage (e.g., Rosenbaum, 2007; Hudgens and Halloran, 2008; Parker, 2011; Aronow and Samii, 2013; Ogburn and VanderWeele, 2014). Moreover, even when considering the average treatment effect, or average exposure effects, as the inferential targets of interest, multiple conceptualizations and assumptions are possible, which require different methodological approaches to achieve valid inference when analyzing the same experiment (Imbens and Rubin, 2015; Airoldi and Rubin, 2015).
Here, we consider the problem of how to assign treatment in a causal experiment, when the correlation among the outcomes is informed by a network, available pre-intervention.
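The sensitivity of an experiment to such correlation can be previewed numerically. The sketch below uses hypothetical numbers (not the paper's Figure 1 example): four units whose outcomes are correlated in pairs, and two balanced assignments. The variance of the difference-in-means estimator is far smaller under the assignment that splits each correlated pair across arms.

```python
import numpy as np

# Four units; units (0, 1) and (2, 3) have correlated outcomes (rho = 0.9).
rho = 0.9
Sigma = np.array([[1.0, rho, 0.0, 0.0],
                  [rho, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, rho],
                  [0.0, 0.0, rho, 1.0]])

def estimator_variance(Z, Sigma):
    """Variance of the difference-in-means estimator w'Y, with w the contrast vector."""
    Z = np.asarray(Z, dtype=float)
    w = Z / Z.sum() - (1 - Z) / (1 - Z).sum()
    return w @ Sigma @ w

var_paired = estimator_variance([1, 1, 0, 0], Sigma)  # treats each correlated pair together
var_split  = estimator_variance([1, 0, 1, 0], Sigma)  # splits each correlated pair
```

Here var_paired = 1.9 while var_split = 0.1, a nineteen-fold difference in this toy example; the factor depends on the correlation strength and the network.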
To illustrate the problem, consider the scenario in Figure 1, where each outcome has the same variance $\sigma^2$ and an edge between two nodes indicates a correlation $\rho$ between their outcomes. Suppose that the two possible assignments correspond to the estimators $\hat\tau_{\mathrm{left}}$ and $\hat\tau_{\mathrm{right}}$. Looking only at the variance for now, one can check that $\mathrm{var}(\hat\tau_{\mathrm{left}}) = 10\,\mathrm{var}(\hat\tau_{\mathrm{right}})$, which illustrates two important points: first, the variance of the estimator depends significantly (by a factor of 10 here) on the assignment strategy; and second, the assignment that leverages the correlation structure (right panel) performs better than the assignment that mimics the independent case (left panel). The rest of this section sets up the problem and places our contributions in the context of related work. The proposed methods are presented in Section 2 and illustrated with extensive simulations in Section 3. We discuss the results in Section 4.

1.1 Causal inference setup and estimands
Let $G$ be an undirected graph, and let $N$ be the number of nodes. Working within the potential outcomes framework (Rubin, 1974; Holland, 1986; Imbens and Rubin, 2015), we denote by $Z = (Z_1, \dots, Z_N)$ the assignment vector, where $Z_i = 1$ if unit $i$ is assigned to treatment. Accepting the stable unit treatment value assumption (SUTVA henceforth; Rubin, 1980), the potential outcome of unit $i$ depends only on its own treatment assignment $Z_i$; therefore, we can write its potential outcomes as $Y_i(Z_i)$. We can then define the vectors $Y(0) = \{Y_i(0)\}$ and $Y(1) = \{Y_i(1)\}$ for notational convenience.

Causal inference methodology for analyzing randomized experiments, using potential outcomes, can be categorized into two general approaches: model-free and model-based (Imbens and Rubin, 2015). The model-free approach, including the classical methodology by Neyman and Fisher (Fisher, 1922, 1954; Neyman, 1990; Speed, 1990; Rubin, 1990), posits that the potential outcomes are constants, and that the randomness comes from the assignment mechanism. The model-based approach posits that the potential outcome vectors $Y(0)$ and $Y(1)$ are random variables (Rubin, 1978, 2005). This implies a joint distribution for $\{Y(0), Y(1)\}$. Estimands of interest, denoted $\tau$, are also random variables in this approach.

In this paper we consider a setup in between. Specifically, we consider a class of degenerate joint distributions for the potential outcome vectors, in which the estimand is a constant,
\[
Y(0) \sim F(\,\cdot\,; \theta), \qquad \mathrm{cov}\{Y(0)\} = \Sigma(\theta),
\]
\[
Y(1) = Y(0) + \tau,
\]
where $\Sigma(\theta)$ is a variance-covariance matrix, not necessarily diagonal, which depends on the network structure, as implied by the model introduced in Equations 7-9 and fully worked out in Equation 18, and $\theta$ denotes the parameters of the model for $Y(0)$.
1.2 Related work
The problem of design and analysis of causal experiments in situations where social connections (i.e., a social network) are available pre-intervention has recently received considerable attention. Two issues may arise from the availability of social connections: (i) social interference, that is, a situation where the potential outcomes for unit $i$ are a function of the treatment assignments of other units that are related to unit $i$ through the network, or are a function of the observed outcomes of these units; (ii) network-correlated outcomes, an alternative setting where the network informs the correlation among the potential outcomes, because the potential outcomes for unit $i$ are a function of its covariates and those of other units.
Much of the current literature deals with social interference, as it is relevant to the estimation of peer effects, or peer influence (e.g., see Parker, 2011; Airoldi et al., 2012; Manski, 2013; Toulis and Kao, 2013; Aronow and Samii, 2013; Ugander et al., 2013; Eckles et al., 2014) on social networks and social media platforms (Aral et al., 2009; Bakshy et al., 2012; Bond et al., 2012; Ogburn and VanderWeele, 2014). When the inferential target is some causal measure of peer influence, SUTVA is an untenable assumption, since this assumption implies the causal effects of interference are zero by definition. Here, we focus on the case of networkcorrelated outcomes, assuming SUTVA holds. The scenario we consider subsumes a recent conceptualization of homophily (McPherson et al., 2001) as a special case. For a more precise discussion of the technical implications and differences between these settings, we refer to Section 4.3.
A number of ideas developed in the classical literature on design of experiments can be adapted to the setting of designing experiments in the presence of network-correlated outcomes. Of particular interest to us is the line of work pioneered by Box and Lucas (1959), Draper and Hunter (1967), and Chaloner and Larntz (1989) on the design of experiments with nonlinear response surfaces, and on strategies for dealing with the technical problems that arise in those situations. As we will show in Section 2.2, the optimal design in our setting also depends on the values of some unknown parameters. Similarly relevant to our work are the issues raised in Parker (2011).
1.3 Contributions
Here, we offer three main contributions. First, we formalize the problem of optimal experimental design in the presence of network-correlated outcomes, and propose a general methodology for dealing with nuisance parameters at the design stage. Second, we introduce a class of models for network-correlated outcomes, for which we can compute the mean squared error (MSE henceforth) in closed form, and we show empirically that minimizing the mean squared error implied by these models leads to significantly better treatment assignments than those that are currently considered optimal, but that ignore the network structure available at the design stage. Third, for a specific instance of our model, we show that the mean squared error decomposes into a sum of terms that have a clear interpretation. This decomposition offers valuable insights into desirable and undesirable unit assignment strategies for experiments on networks.
2 Optimal design with network-correlated outcomes
We are interested in estimating the average treatment effect, under SUTVA, defined as
\[
\tau = \frac{1}{N} \sum_{i=1}^{N} \{ Y_i(1) - Y_i(0) \}, \tag{1}
\]
or, equivalently, $\tau = Y_i(1) - Y_i(0)$ for any unit $i$, as a consequence of the additivity assumption in Section 1.1.
2.1 An estimator for the average treatment effect and its mean squared error
In order to estimate the average treatment effect, we propose the following estimator:
\[
\hat\tau = \frac{1}{N_1} \sum_{i : Z_i = 1} Y_i^{\mathrm{obs}} - \frac{1}{N_0} \sum_{i : Z_i = 0} Y_i^{\mathrm{obs}}, \tag{2}
\]
where $N_0$ and $N_1$ are the number of units assigned to the control and the treatment groups, respectively. If we measure the quality of our estimator using its mean squared error, then a natural goal when designing a randomized experiment is to choose the treatment assignment vector $Z$ that minimizes the mean squared error,
\[
Z^{\mathrm{opt}} = \arg\min_{Z} \, \mathrm{MSE}(\hat\tau; Z). \tag{3}
\]
The mean squared error can be written more explicitly as
\[
\mathrm{MSE}(Z; \theta) = E_\theta\{ (\hat\tau - \tau)^2 \} = (w^\top \mu_\theta)^2 + w^\top \Sigma_\theta\, w, \tag{4}
\]
where $\theta$ parameterizes the joint model for the potential outcomes, $\mu_\theta$ and $\Sigma_\theta$ are the implied mean vector and covariance matrix of $Y(0)$, and $w$ is a vector of contrasts, with entries $w_i = Z_i / N_1 - (1 - Z_i) / N_0$ for $i = 1, \dots, N$.
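Under this reading of Equation 4, the objective is cheap to evaluate for any candidate assignment. A minimal sketch (the squared-bias-plus-variance decomposition is our reconstruction, assuming additivity):

```python
import numpy as np

def contrast_vector(Z):
    """Entries w_i = Z_i/N1 - (1 - Z_i)/N0, so that tau_hat = w @ Y_obs."""
    Z = np.asarray(Z, dtype=float)
    return Z / Z.sum() - (1 - Z) / (1 - Z).sum()

def mse(Z, mu, Sigma):
    """MSE(Z; theta) = (w' mu)^2 + w' Sigma w, where mu and Sigma are the mean
    vector and covariance matrix of Y(0) implied by theta."""
    w = contrast_vector(Z)
    return (w @ mu) ** 2 + w @ Sigma @ w
```

For instance, with four units, mean vector (2, 2, 1, 1) and identity covariance, the assignment Z = (1, 1, 0, 0) has squared bias 1 and variance 1, so the MSE is 2.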
If we knew the parameter vector $\theta$, the design problem would thus reduce to an optimization problem with the mean squared error as the objective function. Such an optimization could be carried out using a stochastic search algorithm over the possible treatment assignments.
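A minimal version of such a stochastic search is sketched below; the random-swap local search over balanced assignments is our illustrative choice, not an algorithm prescribed by the text.

```python
import numpy as np

def mse(Z, mu, Sigma):
    Z = np.asarray(Z, dtype=float)
    w = Z / Z.sum() - (1 - Z) / (1 - Z).sum()
    return (w @ mu) ** 2 + w @ Sigma @ w

def stochastic_search(mu, Sigma, n_iter=500, seed=0):
    """Start from a random balanced assignment; propose treated/control swaps,
    keeping a swap whenever it does not increase the MSE."""
    rng = np.random.default_rng(seed)
    N = len(mu)
    Z = np.array([1] * (N // 2) + [0] * (N - N // 2))
    rng.shuffle(Z)
    best = mse(Z, mu, Sigma)
    for _ in range(n_iter):
        i = rng.choice(np.flatnonzero(Z == 1))
        j = rng.choice(np.flatnonzero(Z == 0))
        Z[i], Z[j] = 0, 1
        cand = mse(Z, mu, Sigma)
        if cand <= best:
            best = cand
        else:
            Z[i], Z[j] = 1, 0  # revert the swap
    return Z, best
```

On a toy covariance with two correlated pairs of units, the search quickly finds an assignment that splits each pair across arms, which minimizes the variance.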
2.2 Nuisance parameters and integrated mean squared error
In general, the parameter vector is an unknown nuisance parameter, so we cannot evaluate the expression in Equation (4) directly. The presence of nuisance parameters in the objective function is a common issue in optimal design problems with nonlinear response surfaces. A popular way to address it is to posit a prior distribution for the nuisance parameters and integrate the mean squared error over the prior support (e.g., see Draper and Hunter, 1967; Chaloner and Verdinelli, 1995). Similarly, we modify our goal for designing an optimal experiment; we will seek the treatment assignment vector Z that minimizes the integrated mean squared error, also known as Bayes risk in the optimal Bayesian design literature,
\[
\mathrm{iMSE}(Z) = \int \mathrm{MSE}(Z; \theta)\, p(\theta)\, d\theta. \tag{5}
\]
However, it is difficult to obtain a closed form expression for the integrated MSE. Instead, we aim to minimize an estimator of the iMSE, obtained via Monte Carlo approximation,
\[
\widehat{\mathrm{iMSE}}(Z) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{MSE}(Z; \theta^{(k)}), \tag{6}
\]
where $\theta^{(k)} \sim p(\theta)$, independently, for $k = 1, \dots, K$. Conditions for the existence of the integrated mean squared error for the class of models considered in Section 2.3 are given in the Appendix. We study the robustness of the proposed method to variation in the estimated iMSE in Section 3.4.
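The Monte Carlo approximation in Equation 6 is a plain average of the closed-form MSE over prior draws. A sketch, where `prior_sampler` is a hypothetical helper of ours that returns one draw of the implied mean and covariance of Y(0):

```python
import numpy as np

def mse(Z, mu, Sigma):
    Z = np.asarray(Z, dtype=float)
    w = Z / Z.sum() - (1 - Z) / (1 - Z).sum()
    return (w @ mu) ** 2 + w @ Sigma @ w

def imse_hat(Z, prior_sampler, K=500, seed=0):
    """Estimate iMSE(Z) by averaging MSE(Z; theta_k) over K prior draws theta_k."""
    rng = np.random.default_rng(seed)
    return float(np.mean([mse(Z, *prior_sampler(rng)) for _ in range(K)]))
```

For example, under a toy prior with a random common mean level and identity covariance, the contrast vector sums to zero, so every draw contributes exactly 1/N1 + 1/N0 and the estimate has no Monte Carlo noise.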
Remark.
A special case of this approach is to choose a point mass as the prior distribution (Box and Lucas, 1959). This approach is interesting to consider in our analysis, since for the proposed models the mean squared error is available in closed form. We leverage this observation to study the impact of different priors on the type of designs considered optimal in Section 4.1.
2.3 Modeling network-correlated outcomes
In our exposition so far, we have abstracted away the network, simply assuming the existence of a correlation structure between outcomes. In the context of statistical network analysis (Kolaczyk, 2009; Goldenberg et al., 2010), many scientists have noticed that nodes sharing edges are more likely to share some similarity than unconnected nodes. This phenomenon, often referred to in the literature as homophily (McPherson et al., 2001; Hoff et al., 2001), motivates the notion that the network structure might inform the correlation structure among the observed outcomes, when units in the population are connected through a social network. We introduce a simple class of models capturing this idea. A key motivation for the proposed models is that their simplicity leads to a closed form expression for the mean squared error of the estimator for the average treatment effect, which can be easily interpreted as discussed in Section 4.1.
Let $G$ be a network with $N$ nodes (i.e., units), and let $d_i$ be the size of the closed neighborhood of node $i$ (i.e., the number of neighbors of node $i$, including node $i$ itself). Define the adjacency matrix $A$ (Kolaczyk, 2009) associated with the network, and $A^* = A + I$, where $I$ is the identity matrix. We propose the following class of models, which can be subsumed into the framework of Section 1.1,
\[
Y_i(0) \mid \mu_i \sim \mathrm{NEF}\{\mu_i, \sigma^2 V(\mu_i)\}, \tag{7}
\]
\[
\mu_i = \sum_{j : A^*_{ij} = 1} \alpha_j, \tag{8}
\]
\[
\alpha_j \sim H(\mu_\alpha, \gamma^2), \quad \text{independently, for } j = 1, \dots, N, \tag{9}
\]
where $\mu_\alpha$ and $\gamma$ are positive scalars, the independence of the $\alpha_j$ implies a diagonal specification for $\mathrm{cov}(\alpha)$, $\sigma$ is a positive scalar, and $\mathrm{NEF}\{m, \sigma^2 V(m)\}$ denotes the natural exponential family distribution (Morris, 1982) with mean $m$ and variance function $\sigma^2 V(m)$. The natural exponential family includes many popular distributions, making the proposed specifications flexible enough to model both continuous and discrete outcomes (Morris and Lock, 2009). Importantly, for these models, we can write the mean squared error of the estimator of the average treatment effect as
\[
\mathrm{MSE}(Z; \theta) = \mu_\alpha^2\, \Delta(Z)^2 + \sigma^2 \sum_{i=1}^{N} w_i^2\, E\{V(\mu_i)\} + \gamma^2\, w^\top A^* A^{*\top} w, \tag{10}
\]
where the scalar
\[
\Delta(Z) = \frac{1}{N_1} \sum_{i : Z_i = 1} d_i - \frac{1}{N_0} \sum_{i : Z_i = 0} d_i
\]
measures the difference in the average size of the neighborhoods for the units in the treatment and control groups. See the Appendix for details.
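Under our reconstruction of Equations 10-11, the normal instance of the model gives an MSE computable directly from the adjacency matrix; the closed form below is that reconstruction, not a verbatim transcription.

```python
import numpy as np

def normal_model_mse(Z, A, mu_alpha, gamma2, sigma2):
    """Squared bias mu_alpha^2 * Delta(Z)^2, plus sigma^2 (1/N1 + 1/N0),
    plus gamma^2 * w' A* A*' w, with A* = A + I the closed-neighborhood matrix."""
    Z = np.asarray(Z, dtype=float)
    N1, N0 = Z.sum(), (1 - Z).sum()
    Astar = np.asarray(A, dtype=float) + np.eye(len(Z))
    d = Astar.sum(axis=1)                          # closed-neighborhood sizes d_i
    delta = d[Z == 1].mean() - d[Z == 0].mean()    # Delta(Z): neighborhood-size imbalance
    w = Z / N1 - (1 - Z) / N0
    return (mu_alpha ** 2) * delta ** 2 \
        + sigma2 * (1 / N1 + 1 / N0) \
        + gamma2 * (w @ Astar @ Astar.T @ w)
```

On an empty graph (A = 0) the bias vanishes and the expression reduces to (sigma2 + gamma2) * (1/N1 + 1/N0), matching a classical independent-outcomes design problem.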
The following two examples illustrate the flexibility of the class of models we introduced.
Example 1 (Normal-Normal model).
Consider the following model,
\[
Y_i(0) \mid \mu_i \sim N(\mu_i, \sigma^2), \qquad \mu_i = \sum_{j : A^*_{ij} = 1} \alpha_j, \qquad \alpha_j \sim N(\mu_\alpha, \gamma^2).
\]
This is a continuous instance of the general model introduced above, where we set the NEF to be Normal, $V(\mu_i) = 1$, and $H$ to be Normal. We can then write the mean squared error as,
\[
\mathrm{MSE}(Z; \theta) = \mu_\alpha^2\, \Delta(Z)^2 + \sigma^2 \left( \frac{1}{N_1} + \frac{1}{N_0} \right) + \gamma^2\, w^\top A^* A^{*\top} w. \tag{11}
\]
Example 2 (Poisson-Gamma model).
Consider the following model,
\[
Y_i(0) \mid \mu_i \sim \mathrm{Poisson}(\mu_i), \qquad \mu_i = \sum_{j : A^*_{ij} = 1} \alpha_j, \qquad \alpha_j \sim \mathrm{Gamma}(\mu_\alpha, \gamma^2).
\]
This is a discrete instance of the general model introduced above, where we set the NEF to be Poisson, $\sigma^2 V(\mu_i) = \mu_i$ (i.e., $\sigma^2 = 1$ and $V(\mu) = \mu$), and $H$ to be a Gamma distribution with mean $\mu_\alpha$ and variance $\gamma^2$. We can then write the mean squared error as,
\[
\mathrm{MSE}(Z; \theta) = \mu_\alpha^2\, \Delta(Z)^2 + \mu_\alpha \sum_{i=1}^{N} w_i^2\, d_i + \gamma^2\, w^\top A^* A^{*\top} w. \tag{12}
\]
3 Comparative performance analysis
Here, we investigate the performance of the proposed methodology empirically, and compare the MSE achieved by the optimal treatment assignment with the MSE of other design strategies, which either ignore the availability of social connections at the design stage, or use them in a naïve way, by clustering the network and using the clusters as strata.
3.1 Setup
We generated 100 networks from each of four different network models: Erdős–Rényi, small-world, power-law degree distribution, and stochastic blockmodel (Goldenberg et al., 2010). For each network, we generated outcomes following the normal model in Section 2.3, with priors on the model parameters $(\mu_\alpha, \gamma, \sigma)$. We then evaluated the proposed optimal treatment assignment, as well as the assignments obtained by alternative strategies, using the integrated mean squared error evaluated under the true model. In Section 3.2, we consider the ideal case, in which we know the complete data generating process, and we use an estimate of the true integrated mean squared error as the objective function. In Section 3.3, we investigate the behavior of the proposed optimal treatment allocation strategy in a realistic situation, in which we have no knowledge of the hyperparameters and we specify them incorrectly.
3.2 Performance evaluation under correct prior specifications
Here, we investigate the performance of the proposed method when we know the full data generating process, and compare it with two other strategies. The first strategy ignores the network structure, and yields balanced randomized designs. The second strategy leverages the network structure by first partitioning the network into clusters, and then performing balanced randomizations within clusters, thus mimicking a stratified design. We evaluate the designs obtained from each method by comparing their iMSE computed under the true prior.
Figure 2 shows a histogram of the iMSE of the proposed method relative to the iMSE of the naïve allocation strategy, which completely ignores the network structure, on the X axis. Figure 3 shows a similar histogram of the iMSE of the proposed method relative to the iMSE of the stratified randomized allocation, where strata were obtained using spectral clustering. The different panels of both figures report the relative iMSE results for the four network models separately, as indicated at the top of each panel. The results clearly indicate that the proposed method reduces the iMSE by at least 25% (the histogram bars fall below 0.75) in the majority of the simulations when compared to the naïve allocation strategy, independently of the network model, and by at least 20% (the histogram bars fall below 0.80) in the majority of the simulations when compared to the stratified allocation strategy, independently of the network model.

3.3 Robustness under prior misspecification
Here, we investigate the behavior of our method when we use the proposed generative model, but specify the wrong hyperparameters. In more detail, we explored the comparative performance of the proposed methods over a range of ten priors (i.e., we minimize the iMSE corresponding to the wrong priors), by letting the corresponding pair of hyperparameters take the following values: (1, 0.5), (2, 0.7), (5, 1), (7, 1.2), (10, 1.5), (15, 2), (20, 2.5), (30, 3), (40, 4), and (50, 5). The first prior corresponds to the prior used to generate the outcome data, and the last one corresponds to a prior that is very far from the correct distribution. Note that the integrated mean squared error is most sensitive to the misspecified parameter in the data generating process, so we are exploring a serious case of prior misspecification.
Figure 4 summarizes the performance of the proposed method when we use the wrong prior, over different network models. The results suggest that, although the performance degrades as the prior used to find the optimal design moves further from the truth, as expected, the iMSE does not degrade substantially, and the proposed assignment significantly outperforms, on average, the random balanced designs, provided for comparison in the rightmost column. Another attractive feature of the proposed design strategy is that it reduces the variation in the tails of the achieved iMSE; that is, we do not observe the kind of long tails we see for the random balanced designs. We offer some intuition behind the empirical robustness of the proposed strategy in Section 4.1.
3.4 Quantifying sources of variation in the integrated mean squared error
Here, we carry out an analysis of variance on the collection of comparative performance results presented above. The goal is to provide quantitative evidence of the robustness of the proposed optimal treatment assignment strategy to prior misspecification, and to compare the contributions of the sources of variation we considered in our simulation study (namely, treatment allocation strategy, network model, replication, prior specification/misspecification, and variability of the estimated iMSE objective) to the overall integrated mean squared error.
The choice of treatment allocation strategy (completely randomized, stratified/spectral, minimum iMSE) explains most of the variance observed in the simulations (MSS = 11,673), while the choice of prior hyperparameters and the particular network model regime have little impact (MSS = 10.9 and MSS = 14.2, respectively). As described in Section 3.1, the minimization of the iMSE is performed using a stochastic search algorithm over the space of treatment allocation vectors, which means that even though the parameters underlying the optimization problem are the same, the optimal allocation vectors obtained for each replication might differ. A measure of the variation in the integrated mean squared error due to the stochastic nature of this minimization is given by the residuals of the analysis of variance, which are the smallest source of variation in the iMSE (MSS = 3.0) when compared to the other factors. This is reassuring.
As we discuss in Section 2.2, we do not have a closed form formula for the integrated mean squared error. Instead, we use a Monte Carlo estimate of this objective for finding the optimal treatment assignment. Given the stochastic nature of the iMSE estimation routine, which is separate from the stochastic nature of the stochastic search that finds the optimal treatment assignment, it is difficult to analytically assess the impact of this approximation. On the one hand, its magnitude is also captured by the residual sum of squares reported above (MSS = 3.0). On the other hand, we can offer additional empirical evidence that the estimation of the iMSE objective is well behaved. We considered pairs of Monte Carlo estimates, $\widehat{\mathrm{iMSE}}_1$ and $\widehat{\mathrm{iMSE}}_2$, of the same objective function, iMSE. If $Z$ and $Z'$ are any two designs such that $\widehat{\mathrm{iMSE}}_1(Z) < \widehat{\mathrm{iMSE}}_1(Z')$, then we observe empirically that $\widehat{\mathrm{iMSE}}_2(Z) < \widehat{\mathrm{iMSE}}_2(Z')$ and $\mathrm{iMSE}(Z) < \mathrm{iMSE}(Z')$ in the vast majority of simulations. In other words, different estimates of the objective function are unlikely to alter the ranking of treatment assignments in terms of their iMSE. So if an assignment is better than another under one estimate of the objective function, it is also likely to be better under the true iMSE.
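This ranking-stability behavior is easy to reproduce in miniature. The sketch below (hypothetical prior and numbers, ours for illustration) computes two independent Monte Carlo estimates of the iMSE for two designs; both estimates order the designs the same way.

```python
import numpy as np

def mse(Z, mu, Sigma):
    Z = np.asarray(Z, dtype=float)
    w = Z / Z.sum() - (1 - Z) / (1 - Z).sum()
    return (w @ mu) ** 2 + w @ Sigma @ w

def imse_hat(Z, K, seed):
    """One Monte Carlo iMSE estimate under a toy prior: two correlated pairs of
    units, with random within-pair correlation rho ~ Uniform(0, 0.9)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(K):
        rho = rng.uniform(0.0, 0.9)
        Sigma = np.eye(4)
        Sigma[0, 1] = Sigma[1, 0] = Sigma[2, 3] = Sigma[3, 2] = rho
        total += mse(Z, np.ones(4), Sigma)
    return total / K

Z_split, Z_paired = [1, 0, 1, 0], [1, 1, 0, 0]
# Two independent estimates of the same objective, from different seeds:
rank_1 = imse_hat(Z_split, 200, seed=1) < imse_hat(Z_paired, 200, seed=1)
rank_2 = imse_hat(Z_split, 200, seed=2) < imse_hat(Z_paired, 200, seed=2)
```

Here rank_1 and rank_2 are both True: in this toy example the per-draw MSE of the split design is below that of the paired design for every positive rho, so independent estimates agree on the ranking.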
The only unexplored source of variation at this point is given by misspecifications of the model for the outcomes itself. We expect this to be qualitatively smaller than the variation due to the treatment allocation strategy (the main source of variation in the analysis above) because of the flexibility of the proposed model in explaining variation in the data, which can be inferred from its similarities to other models of network data that encode homophily, and to other exchangeable network models (e.g., see Wasserman, 1994; Kolaczyk, 2009; Goldenberg et al., 2010).
4 Discussion
4.1 Decomposition of the mean squared error and interpretation
The simulations in Section 3.3 show that the proposed method to find optimal treatment allocations is fairly robust to prior misspecification. Robustness to prior misspecification was also observed in simulation experiments, not reported here, when optimizing the mean squared error using point priors (Box and Lucas, 1959). Here, we investigate this feature in more detail, by studying the analytical form of the MSE under the normal model introduced in Section 2.3,
\[
\mathrm{MSE}(Z; \theta) = \underbrace{\mu_\alpha^2\, \Delta(Z)^2}_{\text{squared bias}} + \underbrace{\sigma^2 \left( \tfrac{1}{N_1} + \tfrac{1}{N_0} \right)}_{\text{first variance term}} + \underbrace{\gamma^2\, w^\top A^* A^{*\top} w}_{\text{second variance term}}.
\]
Below, we expand each of the terms and offer some intuition about what they capture.
Interpretation of the bias term. The bias term can be expanded as follows,
\[
\mathrm{bias} = \mu_\alpha\, \Delta(Z) = \mu_\alpha \left( \frac{1}{N_1} \sum_{i : Z_i = 1} d_i - \frac{1}{N_0} \sum_{i : Z_i = 0} d_i \right). \tag{13}
\]
The bias is proportional to the difference in the average size of the neighborhoods for the units in the treatment and control groups. Intuitively, this difference measures a lack of balance between the two groups in terms of their network characteristics, specifically, the average degree. Larger values of the mean $\mu_\alpha$ amplify this average imbalance in neighborhood sizes. Since the experimenter does not have control over $\mu_\alpha$, desirable treatment assignments minimize the bias by balancing the units assigned to treatment and control in terms of neighborhood sizes.
Interpretation of the variance terms. The first variance term,
\[
\sigma^2 \left( \frac{1}{N_1} + \frac{1}{N_0} \right), \tag{14}
\]
is minimized when $N_1 = N_0 = N/2$. Intuitively, this term penalizes lack of balance between the sizes of the treatment and control groups. Larger values of the parameter $\sigma^2$ amplify this imbalance in group sizes. This is consistent with classical results on the optimality of balanced randomizations for estimating the average treatment effect (Fisher, 1954; Imbens and Rubin, 2015).
The second variance term involves features of the network,
\[
\gamma^2\, w^\top A^* A^{*\top} w \;=\; \frac{\gamma^2}{N_1^2} \sum_{i, j :\, Z_i = Z_j = 1} (A^* A^{*\top})_{ij} \tag{15}
\]
\[
\phantom{\gamma^2\, w^\top A^* A^{*\top} w \;=\;} + \; \frac{\gamma^2}{N_0^2} \sum_{i, j :\, Z_i = Z_j = 0} (A^* A^{*\top})_{ij} \tag{16}
\]
\[
\phantom{\gamma^2\, w^\top A^* A^{*\top} w \;=\;} - \; \frac{2 \gamma^2}{N_1 N_0} \sum_{i, j :\, Z_i = 1,\, Z_j = 0} (A^* A^{*\top})_{ij}, \tag{17}
\]
where $(A^* A^{*\top})_{ij}$ counts the closed neighbors that units $i$ and $j$ have in common. The term in Equation 15 is proportional to the average number of neighbors in common between pairs of units assigned to the treatment group. The term in Equation 16 is proportional to the average number of neighbors in common between pairs of units assigned to the control group. The term in Equation 17 is proportional to the average number of neighbors in common between pairs of units, one assigned to the treatment group and one to the control group. Intuitively, looking at the signs in front of these three factors, the second variance term is minimized by assigning units with shared neighbors to different groups, and by avoiding assigning treatment or control to entire clusters of units that are densely connected.
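The sign pattern can be checked directly on a tiny graph. The sketch below (a hypothetical toy graph of ours) computes the second variance term for two assignments; splitting each connected pair across arms drives the term to zero.

```python
import numpy as np

def second_variance_term(Z, A, gamma2=1.0):
    """gamma^2 * w' A* A*' w, where (A* A*')_{ij} counts the closed neighbors
    that units i and j have in common."""
    Z = np.asarray(Z, dtype=float)
    w = Z / Z.sum() - (1 - Z) / (1 - Z).sum()
    Astar = np.asarray(A, dtype=float) + np.eye(len(Z))
    return gamma2 * (w @ Astar @ Astar.T @ w)

# A toy graph with two disjoint edges: 0-1 and 2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
together = second_variance_term([1, 1, 0, 0], A)  # connected units in the same arm
apart    = second_variance_term([1, 0, 1, 0], A)  # connected units in different arms
```

Here the term drops from 4 * gamma2 (connected pairs kept together) to 0 (connected pairs split), consistent with the interpretation of Equations 15-17.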
The interpretation of the factors in the decomposition of the mean squared error presented above suggests an explanation for the empirical robustness to prior misspecifications (of the proposed methodology for finding optimal designs) reported in Section 3.3. Prior misspecifications can be expected to have negligible effects on the mean squared error, as long as the treatment allocation respects a few key measures of balance. Namely, balance in the sizes of the treatment and control groups, balance in the average size of the neighborhoods for units in the treatment and control groups, balance in the assignment of units with shared neighbors to different groups, balance in assignment of treatment to dense clusters. These quantities are not a function of the prior hyperparameters, and thus are unaffected by prior misspecifications.
4.2 An alternative strategy to find good designs based on point priors
An arguably simpler approach to optimal design, which builds on the notion of point priors (Box and Lucas, 1959) noted in Section 2.2, and which also requires a prior distribution on the parameter space, is the following three-step procedure. First, consider multiple sets of parameters $\theta_1, \dots, \theta_K$, and the associated mean squared error functions $\mathrm{MSE}(\cdot\,; \theta_k)$, and obtain optimal designs $Z_1, \dots, Z_K$ by minimizing each of the MSE functions separately. Second, evaluate each design under all the MSE functions; let us denote these evaluations $\mathrm{MSE}(Z_j; \theta_k)$. Third, build a loss function for each candidate design $Z_j$ by averaging the mean squared error achieved by $Z_j$ across the sets of parameters, with weights equal to the density of the corresponding parameter sets under the prior distribution. That is, for each $Z_j$ we define the following loss,
\[
L(Z_j) = \sum_{k=1}^{K} p(\theta_k)\, \mathrm{MSE}(Z_j; \theta_k).
\]
We then select the design with the smallest loss as the best among the candidates. This method has the benefit of not relying on a numerical approximation of the integrated mean squared error, but it has several limitations in practice. It requires running a stochastic optimization algorithm for each set of parameters. The choice of the sets of parameters is somewhat arbitrary, and quickly runs into issues if the parameter space is large and/or multidimensional. In simulations (not reported here) this simpler strategy did not perform as well as the proposed approach.
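A sketch of steps two and three of this procedure; the function name and toy parameter sets are ours, and in practice the candidate designs would come from one stochastic search per parameter set.

```python
import numpy as np

def mse(Z, theta):
    """Closed-form MSE under theta = (mu, Sigma)."""
    mu, Sigma = theta
    Z = np.asarray(Z, dtype=float)
    w = Z / Z.sum() - (1 - Z) / (1 - Z).sum()
    return (w @ mu) ** 2 + w @ Sigma @ w

def select_point_prior_design(candidates, thetas, prior_weights):
    """Cross-evaluate every candidate design under every MSE function, average
    with the prior weights, and keep the design with the smallest loss."""
    losses = [sum(p * mse(Z, th) for p, th in zip(prior_weights, thetas))
              for Z in candidates]
    return candidates[int(np.argmin(losses))]

# Toy example: two parameter sets that differ in the correlation of units 0 and 1.
paired_cov = np.eye(4)
paired_cov[0, 1] = paired_cov[1, 0] = 0.8
thetas = [(np.ones(4), paired_cov), (np.ones(4), np.eye(4))]
best = select_point_prior_design([[1, 1, 0, 0], [1, 0, 1, 0]], thetas, [0.5, 0.5])
```

Here the design that splits the correlated pair wins under the weighted loss.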
4.3 Alternative conceptualizations to leverage social connections in causal analyses
In Section 1.2 we hinted at how the notions of social interference and network-correlated outcomes relate to and differ from each other. The distinction is subtle, since both settings involve correlation structure among the outcomes. To appreciate the differences, let us consider how the correlation structure arises in typical causal settings where models for the outcomes are considered (Rubin, 1978). Let $\Sigma_0$ be a diagonal covariance matrix, and let $\Sigma_1, \Sigma_2$ be two full covariance matrices, both functions of the network $G$. Then we can write, schematically,
\[
Y(z) \mid Z \sim \{\mu(z),\ \Sigma_0\} \qquad \text{no interference,}
\]
\[
Y(z) \mid Z \sim \{\mu(z),\ \Sigma_1\} \qquad \text{network-correlated outcomes (our setup),}
\]
\[
Y(z) \mid Z \sim \{\mu(Z),\ \Sigma_0\}, \quad Y(z) \sim \{\bar\mu,\ \Sigma_2\} \qquad \text{interference.}
\]
In the absence of interference, the potential outcomes are independent, both conditionally and marginally. In our setup, there is correlation among the potential outcomes conditionally on the assignment, as well as marginally. In the case of interference, the potential outcomes are independent conditionally on the assignment, but not marginally: the correlation arises when integrating out the assignment mechanism. The effects of interference and network-correlated outcomes cannot be identified in observational studies, in the absence of any assumptions (Shalizi and Thomas, 2011). Carefully designed experiments are necessary. We offered minimal assumptions and a strategy to design optimal experiments in the presence of network-correlated outcomes.
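The conditional-versus-marginal distinction in the interference case can be checked by simulation. A two-unit sketch, with all numbers hypothetical: each unit's mean depends on the other unit's treatment, the outcomes are conditionally independent given Z, yet correlation appears once the balanced randomization is integrated out.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, sigma = 20000, 2.0, 0.1

# Balanced randomization over two units: Z is (1, 0) or (0, 1) with equal probability.
flip = rng.integers(0, 2, size=n)
Z1, Z2 = flip, 1 - flip

# Interference: unit i's mean depends on the OTHER unit's treatment; noise is independent.
Y1 = beta * Z2 + rng.normal(0.0, sigma, n)
Y2 = beta * Z1 + rng.normal(0.0, sigma, n)

marginal_cov = np.cov(Y1, Y2)[0, 1]                           # approx -beta^2/4 = -1
conditional_cov = np.cov(Y1[flip == 1], Y2[flip == 1])[0, 1]  # approx 0, given Z = (1, 0)
```

Conditionally on Z the covariance is near zero, matching the diagonal covariance in the interference row; marginally it is clearly nonzero, reflecting the full covariance that arises once Z is integrated out.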
4.4 Optimal design and the role of randomization
There is a lively debate in the optimal experimental design literature between those supporting a set of optimal treatment assignments to enable randomization in practice (Harville, 1975; Kempthorne, 1977), mostly in the sciences and healthcare, and those favoring a single optimal treatment assignment (Kiefer, 1959), mostly in industrial engineering. In practice, randomization balances, on average, the effect of unobserved and potentially relevant covariates, so-called confounders. More generally, randomization improves the robustness of inference to model misspecification. In addition, when working within the Bayesian framework, randomization is an ignorable assignment mechanism that greatly simplifies inference (Rubin, 1978).
The proposed methodology yields a trivially ignorable assignment mechanism, thus subsequent analyses are not any more complicated than if we had considered randomization instead. We are not leveraging covariates for balance, although such checks are simple to add. However, the proposed strategy is ultimately informed by the model specification, although we have illustrated its robustness to prior misspecification. There is an important place for randomization in designing optimal experiments in the presence of network-correlated outcomes. We have recently obtained results in this direction in the case of interference, and we plan to leverage some of those insights, elsewhere, in the setting we considered here.
References
 Airoldi and Rubin (2015) Airoldi, E. M. and Rubin, D. B. (2015). Principles for leveraging information about a social network when designing and analyzing experiments. Proceedings of the National Academy of Sciences. Forthcoming.
 Airoldi et al. (2012) Airoldi, E. M., Toulis, P., Kao, E., and Rubin, D. B. (2012). Causal estimation of peer influence effects. In NIPS Workshop on Social Network and Social Media Analysis.
 Aral et al. (2009) Aral, S., Muchnik, L., and Sundararajan, A. (2009). Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51):21544–21549.
 Aronow and Samii (2013) Aronow, P. M. and Samii, C. (2013). Estimating average causal effects under interference between units. arXiv preprint arXiv:1305.6156.
 Bakshy et al. (2011) Bakshy, E., Hofman, J. M., Mason, W. A., and Watts, D. J. (2011). Everyone’s an influencer: Quantifying influence on Twitter. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pages 65–74. ACM.
 Bakshy et al. (2012) Bakshy, E., Rosenn, I., Marlow, C., and Adamic, L. (2012). The role of social networks in information diffusion. In Proceedings of the 21st international conference on World Wide Web, pages 519–528. ACM.
 Bickel and Chen (2009) Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50):21068–21073.
 Bond et al. (2012) Bond, R. M., Fariss, C. J., Jones, J. J., Kramer, A. D., Marlow, C., Settle, J. E., and Fowler, J. H. (2012). A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415):295–298.
 Box and Lucas (1959) Box, G. E. and Lucas, H. (1959). Design of experiments in nonlinear situations. Biometrika, pages 77–90.
 Chaloner and Larntz (1989) Chaloner, K. and Larntz, K. (1989). Optimal Bayesian design applied to logistic regression experiments. Journal of Statistical Planning and Inference, 21(2):191–208.
 Chaloner and Verdinelli (1995) Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, pages 273–304.
 Christakis and Fowler (2007) Christakis, N. A. and Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357(4):370–379.
 Draper and Hunter (1967) Draper, N. R. and Hunter, W. G. (1967). The use of prior distributions in the design of experiments for parameter estimation in nonlinear situations. Biometrika, 54(1-2):147–153.
 Eckles et al. (2014) Eckles, D., Karrer, B., and Ugander, J. (2014). Design and analysis of experiments in networks: Reducing bias from interference. arXiv preprint arXiv:1404.7530.
 Fisher (1922) Fisher, R. A. (1922). On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1):87–94.
 Fisher (1954) Fisher, R. A. (1954). Statistical Methods for Research Workers. Oliver and Boyd.
 Goldenberg et al. (2010) Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning, 2(2):129–233.
 Gui et al. (2015) Gui, H., Xu, Y., Bhasin, A., and Han, J. (2015). Network A/B testing: From sampling to estimation. In International World Wide Web Conference (WWW), pages 399–409, Florence, Italy.
 Harville (1975) Harville, D. A. (1975). Experimental randomization: who needs it? The American Statistician, 29(1):27–31.
 Hoff et al. (2002) Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098.
 Holland (1986) Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960.
 Hudgens and Halloran (2008) Hudgens, M. G. and Halloran, M. E. (2008). Toward causal inference with interference. Journal of the American Statistical Association, 103(482):832–842.
 Imbens and Rubin (2015) Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.
 Kempthorne (1977) Kempthorne, O. (1977). Why randomize? Journal of Statistical Planning and Inference, 1(1):1–25.
 Kiefer (1959) Kiefer, J. (1959). Optimum experimental designs. Journal of the Royal Statistical Society. Series B (Methodological), pages 272–319.
 Kolaczyk (2009) Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models. Springer Science & Business Media.
 Manski (2013) Manski, C. F. (2013). Identification of treatment response with social interactions. The Econometrics Journal, 16(1):S1–S23.
 McPherson et al. (2001) McPherson, M., Smith-Lovin, L., and Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, pages 415–444.
 Morris (1982) Morris, C. N. (1982). Natural exponential families with quadratic variance functions. The Annals of Statistics, pages 65–80.
 Morris and Lock (2009) Morris, C. N. and Lock, K. F. (2009). Unifying the named natural exponential families and their relatives. The American Statistician, 63(3):247–253.
 Neyman (1990) Neyman, J. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4):465–472. Translated and edited by D. M. Dabrowska and T. P. Speed from the Polish original, which appeared in Roczniki Nauk Rolniczych Tom X (1923) 1–51 (Annals of Agricultural Sciences).
 Ogburn and VanderWeele (2014) Ogburn, E. L. and VanderWeele, T. J. (2014). Vaccines, contagion, and social networks. ArXiv e-prints.
 Parker (2011) Parker, B. M. (2011). Design of networked experiments. Presented at the Isaac Newton Institute, Cambridge University, UK.
 Phan and Airoldi (2015) Phan, T. Q. and Airoldi, E. M. (2015). A natural experiment of social network formation and dynamics. Proceedings of the National Academy of Sciences, 112(21):6595–6600.
 Rosenbaum (2007) Rosenbaum, P. R. (2007). Interference between units in randomized experiments. Journal of the American Statistical Association, 102(477):191–200.
 Rubin (1974) Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688–701.
 Rubin (1978) Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, pages 34–58.
 Rubin (1980) Rubin, D. B. (1980). Comment. Journal of the American Statistical Association, 75(371):591–593.
 Rubin (1990) Rubin, D. B. (1990). Comment: Neyman (1923) and causal inference in experiments and observational studies. Statistical Science, 5(4):472–480.
 Rubin (2005) Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331.
 Shalizi and Thomas (2011) Shalizi, C. R. and Thomas, A. C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research, 40(2):211–239.
 Speed (1990) Speed, T. P. (1990). Introductory remarks on Neyman (1923). Statistical Science, 5(4):463–464.
 Toulis and Kao (2013) Toulis, P. and Kao, E. (2013). Estimation of causal peer influence effects. Journal of Machine Learning Research, W&CP, 28(3):1489–1497.
 Ugander et al. (2013) Ugander, J., Karrer, B., Backstrom, L., and Kleinberg, J. (2013). Graph cluster randomization: Network exposure to multiple universes. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 329–337. ACM.
 Wasserman (1994) Wasserman, S. (1994). Social Network Analysis: Methods and Applications, volume 8. Cambridge University Press.
Appendix A Derivation of mean squared error for the proposed models
Appendix B Integrability of the mean squared error
In the general model in Equations 7–9, we distinguish terms that depend on the model parameters from terms that depend only on the network and on the assignment vector Z. The latter terms are unproblematic to integrate: since the network is discrete and finite, and we assume it fixed and available pre-intervention, they can be bounded by constants. The former terms, however, all depend on the parameters and might raise integrability issues. Because these terms enter the mean squared error linearly, it suffices to check that each is integrable individually. For the Normal-Normal model in Example 1 with the prior specifications in Section 3.1, the required expectations can be computed in closed form and are finite.
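The boundedness-plus-linearity argument can be made explicit schematically. The display below is a sketch, not the derivation itself: the generic coefficients c_k and parameter factors f_k stand in for the specific quantities appearing in Equations 7–9.

```latex
% Schematic decomposition of the mean squared error: the coefficients
% c_k depend only on the assignment Z and the fixed network G, and are
% therefore bounded; the factors f_k carry all parameter dependence.
\[
\mathrm{MSE}(Z) \;=\; \sum_{k} c_k(Z, G)\, f_k(\theta),
\qquad \bigl|c_k(Z, G)\bigr| \;\le\; C_k \;<\; \infty .
\]
% Integrating against the prior p(theta) and using the triangle
% inequality reduces the problem to one factor at a time:
\[
\int \bigl|\mathrm{MSE}(Z)\bigr|\, p(\theta)\, d\theta
\;\le\; \sum_{k} C_k \int \bigl|f_k(\theta)\bigr|\, p(\theta)\, d\theta ,
\]
% so it suffices that each E_p |f_k(theta)| be finite.
```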
For the Poisson-Gamma model in Example 2, with a Gamma prior and an Inverse-Gamma prior on the corresponding parameters, similar derivations confirm the integrability of these quantities as well.
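For reference, the moment identities that typically underlie such checks are standard; they are stated here for a generic shape α and rate/scale β, since the specific exponents required depend on the terms in Equations 7–9.

```latex
% Moments under a Gamma prior with shape alpha and rate beta:
\[
\theta \sim \mathrm{Gamma}(\alpha, \beta):\qquad
\mathbb{E}\bigl[\theta^{k}\bigr]
= \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)\,\beta^{k}} < \infty
\quad \text{for } k > -\alpha ,
\]
% and under an Inverse-Gamma prior with shape alpha and scale beta:
\[
\sigma^{2} \sim \mathrm{Inv\text{-}Gamma}(\alpha, \beta):\qquad
\mathbb{E}\bigl[(\sigma^{2})^{k}\bigr]
= \frac{\beta^{k}\,\Gamma(\alpha - k)}{\Gamma(\alpha)} < \infty
\quad \text{for } k < \alpha .
\]
```

In particular, positive moments of a Gamma-distributed parameter are always finite, whereas moments of an Inverse-Gamma-distributed parameter are finite only up to order α, so the shape hyperparameter must be chosen accordingly.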