1 Introduction
A common assumption in causal inference is the Stable Unit Treatment Value Assumption (SUTVA) Rubin (1980), which combines consistency with the absence of interference: the individual treatment response is consistently defined and unaffected by variations in other individuals. However, this assumption is problematic in a social network setting, since peers are not independent; as the poet John Donne wrote, “no man is an island.”
Interference occurs when the treatment response of an individual is influenced through the exposure to its social contacts’ treatments or affected by its social neighbors’ outcomes through peer effects Bowers et al. (2013); Toulis and Kao (2013). For instance, the treatment effect of an individual under a vaccination against an infectious disease might influence the health conditions of its surrounding individuals; or a personalized online advertisement might affect other individuals’ purchase of the advertised item through opinion propagation on social networks. Separating individual treatment effect and peer effect in causal inference becomes an intractable problem under interference since, in randomized experiments or observational studies, one can only observe the superposition of both effects. The issue of how to estimate causal responses and make optimal policies on the network is studied in this work.
One of the main objectives of treatment effect estimation is to make better treatment decision rules for individuals according to their characteristics. Population-averaged utility functions have been studied in Manski (2009); Athey and Wager (2017); Kallus (2018); Kallus and Zhou (2018). In those publications, a policy learner can adapt and improve its decision rules through the utility function. However, interactions among units are ignored in these works. Moreover, a policy learner usually faces a capacity or budget constraint, as studied in Kitagawa and Tetenov (2017). Therefore, in this work, we develop a new type of utility function defined on interconnected units and investigate provable policy improvement with budget constraints.
1.1 Related Work
Causal inference with interference was studied in Hudgens and Halloran (2008); Tchetgen and VanderWeele (2012); Liu and Hudgens (2014). However, the assumption of group-level interference, having partial interference within the groups and independence across different groups, is often invalid. Hence, several works focus on unit-level causal effects under cross-unit interference and arbitrary treatment assignments, such as Aronow et al. (2017); Forastiere et al. (2016); Ogburn et al. (2017a, b); Viviano (2019). Other approaches for estimating causal effects on networks use graphical models, which are studied in Arbour et al. (2016); Tchetgen et al. (2017); Ogburn et al. (2018); Sherman and Shpitser (2018); Bhattacharya et al. (2019).
1.2 Notations and Previous Approaches
Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ denote a directed or undirected graph with a node set $\mathcal{V}$ of size $N$, an edge set $\mathcal{E}$, and an adjacency matrix $A$. For a node, or unit, $i$, let $\mathcal{N}_i$ indicate the set of its neighboring nodes, excluding the node itself, and let $X_i$ denote the covariate vector of node $i$, which is defined in the space $\mathcal{X}$. We focus here on the Neyman–Rubin causal inference model Rubin (1974); Splawa-Neyman et al. (1990). Let $T_i$ be a binary variable with $T_i = 1$ indicating that node $i$ is in the treatment group, and $T_i = 0$ if $i$ is in the control group. Moreover, let $Y_i$ be the outcome variable, with $Y_i(T_i = 1)$ indicating the potential outcome of $i$ under treatment $T_i = 1$ and $Y_i(T_i = 0)$ the potential outcome under control $T_i = 0$. We use $T_{\mathcal{N}_i}$ and $Y_{\mathcal{N}_i}$ to represent the treatment assignments and potential outcomes of the neighboring nodes $\mathcal{N}_i$, and $\mathbf{T}$ to represent the entire treatment assignment vector.
Under SUTVA, the individual treatment effect on node $i$ is defined as the difference between outcomes under treatment and under control, i.e., $\tau(X_i) := \mathbb{E}[Y_i(T_i = 1) - Y_i(T_i = 0) \mid X_i]$. To estimate treatment effects under network interference, an exposure variable $G_i$ is proposed in Toulis and Kao (2013); Bowers et al. (2013); Aronow et al. (2017). The exposure variable $G_i$ is a function of the neighboring treatments $T_{\mathcal{N}_i}$. For instance, $G_i$ can be a variable indicating the level of exposure to the treated neighbors, i.e., $G_i := \frac{\sum_{j \in \mathcal{N}_i} T_j}{|\mathcal{N}_i|}$.
Under the assumption that the outcome only depends on the individual treatment and neighborhood treatments, Forastiere et al. (2016) defines an individual treatment effect under the exposure $G_i = g$ as
$$\tau(G_i = g) := \mathbb{E}[Y_i(T_i = 1, G_i = g) - Y_i(T_i = 0, G_i = g)]. \qquad (1)$$
Moreover, the spillover effect under the treatment $T_i = t$ and the exposure $G_i = g$ is defined as $\delta(T_i = t, G_i = g) := \mathbb{E}[Y_i(T_i = t, G_i = g) - Y_i(T_i = t, G_i = 0)]$. Treatment and spillover effects are then estimated using generalized propensity score (GPS) weighted estimators.
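The exposure variable described above, taken as the fraction of treated neighbors, can be computed directly from an adjacency list. The following is a minimal sketch; the function name `exposure`, the toy path graph, and the convention that isolated nodes receive zero exposure are illustrative assumptions, not part of the original method.

```python
from typing import Dict, List


def exposure(neighbors: Dict[int, List[int]], treatment: Dict[int, int]) -> Dict[int, float]:
    """Fraction of treated neighbors for every node."""
    result = {}
    for node, nbrs in neighbors.items():
        if not nbrs:
            # Assumption: a node without neighbors has zero exposure.
            result[node] = 0.0
        else:
            result[node] = sum(treatment[j] for j in nbrs) / len(nbrs)
    return result


# Hypothetical toy data: path graph 0 - 1 - 2 - 3 with nodes 0 and 1 treated.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
treatment = {0: 1, 1: 1, 2: 0, 3: 0}
G = exposure(neighbors, treatment)
print(G)
```

In this toy graph, the middle nodes each have one treated and one untreated neighbor, so their exposure is one half regardless of their own treatment status.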
In general, the outcome model can be more complicated, depending on network topology and covariates of neighboring units. Ogburn et al. (2017a) investigates more general causal structural equations under a dimension-reducing assumption, and the potential outcome reads , where and are summary functions of neighborhood covariates and treatment, e.g., they could be the summation or average of neighboring treatment assignments and covariates, respectively. Motivated by the above causal structural equation model, we incorporate GNN-based causal estimators with appropriate covariates and treatment aggregation functions as inputs.
Contributions. This work has four major contributions. First, we propose GNN-based causal estimators for causal effect prediction and to recover the direct treatment effect under interference (Section 2). Second, we define a novel utility function for policy optimization on a network and derive a graph-dependent policy regret bound (Section 3). Third, we provide an error bound for the GNN-based causal estimators (Section 3 and Appendix). Last, we conduct extensive experiments to verify the superiority of GNN-based causal estimators and show that the accuracy of a causal estimator is crucial for finding the optimal policy (Section 4).
2 GNN-based Causal Estimators
In this section, we introduce our GNN-based causal effect estimators under general network interference.
2.1 Structural Equation Model
Given the graph , the covariates of all units in the graph , and the entire treatment assignments vector , the structural equation model describing the considered data generation process is given as follows
(2) 
for units . This structural equation model encodes both the observational studies and the randomized experiments setting. In observational studies, e.g., on the Amazon dataset (see Section 4.1), the treatment depends on the covariate and the unknown specification of , or even on the neighboring units under network interference. In the setting of the randomized experiment, e.g., experiments on Wave1 and Pokec datasets, the treatment assignment function is specified as , where represents a predefined treatment probability. Function characterizes the causal response, which depends on, in addition to and , the graph and neighboring covariates and treatment assignments. If only influences from first-order neighbors are considered, the response generation can be specified as . When the graph structure is given and fixed, we leave out in the notation.
2.2 Distribution Discrepancy Penalty
Even without network interference, a covariate shift problem of counterfactual inference is commonly observed, namely the factual distribution differs from the counterfactual distribution . To avoid biased inference, Johansson et al. (2016); Shalit et al. (2017) propose balanced counterfactual inference using domain-adapted representation learning. Covariate vectors are first mapped to a feature space via a feature map . In the feature space, treated and control populations are balanced by penalizing the distribution discrepancy between and using the Integral Probability Metric. This approach is equivalent to finding a feature space such that the treatment assignment and representation become approximately disentangled, namely . We use the Hilbert-Schmidt Independence Criterion (HSIC) as the dependence test in the feature space. The empirical HSIC using a Gaussian RBF kernel is written as (the expression for is relegated to Appendix A). Note that incorporating the feature map and the representation balancing penalty is essential to tackle the imbalanced assignments in observational studies, e.g., on the Amazon dataset (see Section 4.1).
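To make the dependence penalty concrete, the sketch below computes an empirical HSIC between learned representations and treatment assignments with Gaussian RBF kernels. It assumes the common biased estimator $\mathrm{tr}(KHLH)/(n-1)^2$ with centering matrix $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$; the expression used in the paper is given in its Appendix A and may differ in normalization. The function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_kernel(points: Sequence[Sequence[float]], sigma: float = 1.0) -> List[List[float]]:
    """Gaussian RBF kernel matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2)) for b in points] for a in points]


def center(K: List[List[float]]) -> List[List[float]]:
    """Double centering: returns H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
            for i in range(n)]


def hsic(features: Sequence[Sequence[float]], treatments: Sequence[int], sigma: float = 1.0) -> float:
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2 (assumed normalization)."""
    n = len(features)
    K_centered = center(rbf_kernel(features, sigma))
    L = rbf_kernel([[float(t)] for t in treatments], sigma)
    # tr(K H L H) = tr((H K H) L) by cyclicity of the trace.
    trace = sum(K_centered[i][j] * L[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Hypothetical one-dimensional representations forming two clusters.
phi = [[0.0], [0.1], [2.0], [2.1]]
hsic_dependent = hsic(phi, [0, 0, 1, 1])    # treatment aligned with the clusters
hsic_independent = hsic(phi, [0, 1, 0, 1])  # treatment unrelated to the clusters
hsic_constant = hsic(phi, [1, 1, 1, 1])     # no variation in treatment
print(hsic_dependent, hsic_independent, hsic_constant)
```

A balancing penalty of this kind is small when the representation carries little information about the treatment assignment, which is the behavior the toy comparison illustrates.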
2.3 Graph Neural Networks
Graph neural networks can learn and aggregate feature information from distant neighbors, which makes them suitable candidates for capturing the spillover effect from neighboring units. Different GNNs are employed and compared in our model, and we briefly review them below.
Graph Convolutional Network (GCN) Kipf and Welling (2016). The graph convolutional layer in GCN is defined as $H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\right)$, where $H^{(l)}$ is the hidden output from the $l$th layer with $H^{(0)}$ being the input features matrix, $W^{(l)}$ is a trainable weight matrix, and $\sigma$ is the activation function, e.g., ReLU. The modified adjacency $\tilde{A}$ with inserted self-connections is defined as $\tilde{A} = A + I$, and $\tilde{D}$ denotes the node degree matrix of $\tilde{A}$.
GraphSAGE. GraphSAGE Hamilton et al. (2017) is an inductive framework for calculating node embeddings and aggregating neighbor information. The mean aggregation operator in GraphSAGE reads . Traditional GCN algorithms perform spectral convolution via eigendecomposition of the full graph Laplacian. In contrast, GraphSAGE computes a localized convolution by aggregating the neighborhood around a node, which resembles the simulation protocol of linear treatment response with spillover effect for semi-synthetic experiments (see Section 4.1). Due to this resemblance, a better causal estimator is expected when using GraphSAGE as the aggregation function (see the beginning of Appendix G.3 for more heuristic motivations).
GNN. GNN Morris et al. (2018) is a variation of GraphSAGE, which performs separate transformations of node features and aggregated neighborhood features. Since the features of the considered unit and its neighbors contribute differently to the superimposed outcome, it is expected that the GNN is more expressive than GraphSAGE. The convolutional operator of GNN has the form .
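The difference between the aggregation schemes reviewed above can be sketched with plain matrix arithmetic. The code below implements the symmetric GCN normalization of Kipf and Welling (2016), a mean aggregation over neighbors, and a layer that applies separate weights to a node's own features and to its aggregated neighborhood. The separate-transform layer with mean aggregation is an assumed illustrative form, not necessarily the exact operator used in the paper, and all function names and toy inputs are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def relu(M: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in M]


def gcn_norm(adj: Matrix) -> Matrix:
    """Symmetric normalization D~^{-1/2} (A + I) D~^{-1/2}."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, H: Matrix, W: Matrix) -> Matrix:
    """One GCN layer with ReLU activation."""
    return relu(matmul(matmul(gcn_norm(adj), H), W))


def mean_neighbors(adj: Matrix, H: Matrix) -> Matrix:
    """Average of neighbor features for each node (zeros for isolated nodes)."""
    out = []
    for row in adj:
        deg = sum(row)
        out.append([
            sum(row[j] * H[j][k] for j in range(len(H))) / deg if deg else 0.0
            for k in range(len(H[0]))
        ])
    return out


def separate_transform_layer(adj: Matrix, H: Matrix, W_self: Matrix, W_nbr: Matrix) -> Matrix:
    """Assumed form: ReLU(H W_self + mean_neighbors(H) W_nbr)."""
    self_part = matmul(H, W_self)
    nbr_part = matmul(mean_neighbors(adj, H), W_nbr)
    return relu([[s + m for s, m in zip(rs, rm)] for rs, rm in zip(self_part, nbr_part)])


# Hypothetical path graph 0 - 1 - 2 with scalar node features.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
H0 = [[1.0], [2.0], [3.0]]
norm = gcn_norm(adj)
nbr_mean = mean_neighbors(adj, H0)
out = separate_transform_layer(adj, H0, [[1.0]], [[0.5]])
print(norm[0][0], nbr_mean, out)
```

Keeping `W_self` and `W_nbr` separate lets the layer weight a unit's own covariates differently from those of its neighbors, which is the property motivating the more expressive variant.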
2.4 GNN-based Causal Estimators
We use the percentage of treated neighboring nodes, i.e., the random variable , as the treatment summary function, and the output of GNNs as the covariate aggregation function. The concatenation of node is then fed into the outcome prediction network or , depending on , where and are neural networks with a scalar output. Note that indicates that the treatment vector is also an input of the GNNs. In the implementation, the treatment assignment vector masks the covariates, and GNN models use the masked covariates , for , as inputs. In summary, given and graph , the loss function for GNN-based estimators is defined as
where and are tunable hyperparameters. Our model is illustrated in Fig. 1. In the implementation, we incorporate two types of empirical representation balancing: balancing the outputs of the representation network to tackle imbalanced assignments, denoted as , and balancing the outputs of the GNN representations to tackle imbalanced spillover exposure, denoted as .
At this point, it is necessary to emphasize that only the causal responses of a part of the units in are relevant to the models. The GNN-based models use this part of the causal responses, the network structure , and covariates as input, and can predict the superimposed causal effects of the remaining units. Note that for GNN-based nonparametric models, the identifiability of the causal response is guaranteed under reasonable assumptions similar to those given in Section 3.2 of Ogburn et al. (2017a). The proof is relegated to Appendix B.
Notice that the outcome prediction networks and are trained to estimate the superposition of individual treatment effect and spillover effect. Still, after fitting the observed outcomes, we expect to extract the non-interfered individual treatment effect from the causal estimators by assuming that the considered unit is isolated. An individual treatment effect estimator can be defined similarly to Eq. 1. To be more specific, the individual treatment effect of unit is expected to be extracted from GNN-based estimators by setting its exposure to and its neighbors’ covariates to (the spillover effect can be extracted similarly), namely
(3) 
3 Intervention Policy on Graph
After obtaining the treatment effect estimator, we develop an algorithm for learning intervention assignments to maximize the utility on the entire graph, and the learned rule for assignment is called a policy. As suggested in Athey and Wager (2017), without interference a utility function is defined as . An optimal policy is obtained by maximizing the sample empirical utility function given the individual treatment response estimator , i.e., , where indicates the policy function class. Notably, tends to assign treatment to units with positive treatment effect and control to units with negative responses.
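As an illustration of the interference-free case, the sketch below evaluates an empirical utility of the form $\frac{1}{n}\sum_i (2\pi(X_i) - 1)\hat{\tau}(X_i)$, which is the form used by Athey and Wager (2017) and is assumed here; the exact expression intended in this section may differ. The effect estimates and the helper name `empirical_utility` are hypothetical.

```python
from typing import Sequence


def empirical_utility(policy: Sequence[int], tau_hat: Sequence[float]) -> float:
    """Assumed utility: mean of (2 * pi_i - 1) * tau_hat_i over all units."""
    return sum((2 * p - 1) * t for p, t in zip(policy, tau_hat)) / len(tau_hat)


# Hypothetical estimated individual treatment effects.
tau_hat = [0.8, -0.3, 0.1, -0.6]

# Treat exactly the units with a positive estimated effect.
sign_policy = [1 if t > 0 else 0 for t in tau_hat]
treat_all = [1] * len(tau_hat)

u_sign = empirical_utility(sign_policy, tau_hat)
u_all = empirical_utility(treat_all, tau_hat)
print(u_sign, u_all)
```

Under this utility, no assignment can do better than treating units with positive estimated effects and withholding treatment otherwise, which matches the behavior described in the text.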
Now, consider the outcome variable under network interference. For notational simplicity and clarity of the later proof, we assume first-order interference from nearest neighboring units, hence the outcome variable can be written as . Inspired by the definition of , the utility function of a policy under interference is defined as
(4) 
where with an empty graph represents the individual outcome under control without any network influence (hence and are omitted in the expression). After some manipulations, equals the sum of individual treatment effect and spillover effect, i.e., , where
To be more specific, is the conventional individual treatment effect, while represents the spillover effect under the policy and when . Due to the network dependency in the spillover effect, an optimal policy will not merely treat units with positive responses but also adjust its intervention on the entire graph to maximize the spillover effects.
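The role of the spillover term can be illustrated on a toy graph. The sketch below assumes a utility that averages a direct effect $\pi_i \tau_i$ and a spillover term that is linear in the fraction of treated neighbors; this linear spillover model, the star graph, and all numeric values are hypothetical choices made only for illustration. It then finds the best assignment by exhaustive search.

```python
from itertools import product
from typing import Dict, List, Sequence


def network_utility(assign: Sequence[int], tau: Sequence[float],
                    neighbors: Dict[int, List[int]], spill: Sequence[float]) -> float:
    """Mean of direct effect assign_i * tau_i plus spill_i * (fraction of treated neighbors)."""
    total = 0.0
    for i in range(len(assign)):
        nbrs = neighbors[i]
        exposure = sum(assign[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        total += assign[i] * tau[i] + spill[i] * exposure
    return total / len(assign)


# Hypothetical star graph: hub 0 connected to leaves 1, 2, 3.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
tau = [-0.2, 0.5, 0.5, 0.5]   # the hub responds negatively to its own treatment
spill = [0.0, 0.4, 0.4, 0.4]  # leaves benefit when the hub is treated

sign_policy = (0, 1, 1, 1)    # treat only units with positive direct effect
best_policy = max(product([0, 1], repeat=4),
                  key=lambda a: network_utility(a, tau, neighbors, spill))

u_sign = network_utility(sign_policy, tau, neighbors, spill)
u_best = network_utility(best_policy, tau, neighbors, spill)
print(best_policy, u_sign, u_best)
```

In this example, the hub is treated despite its negative direct effect, because the spillover it generates for the leaves outweighs its own loss.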
Next, we establish guarantees for the regret of learned intervention policy. Let and denote the estimator of and , respectively. Given the true models and , let be the empirical analogue of , and let be the empirical utility with estimators plugged in. Using learned causal estimators, an optimal intervention policy from the empirical utility perspective can be obtained from . Moreover, the best possible intervention policy from the functional class with respect to the utility is written as , and the policy regret between and is defined as . Throughout the estimation of policy regret, we maintain the following assumptions.
Assumption 1.
(BO) Bounded treatment and spillover effects: There exist such that the individual treatment effect satisfies and the spillover effect satisfies .
(WI) Weak independence assumption: For any node indices and , the weak independence assumption assumes that .
(LIP) Lipschitz continuity of the spillover effect w.r.t. policy: Given two treatment policies and , for any node the spillover effect satisfies , where the Lipschitz constant satisfies and .
(ES) Uniform consistency: After fitting experimental or observational data on , the individual treatment effect estimator satisfies , and the spillover estimator satisfies , , where and are scaling factors that characterize the errors of the estimators.
Notice that the (ES) assumption requires consistent estimators of the individual treatment effect and the spillover effect, which is the fundamental problem of causal inference with interference. In our GNN-based model, these empirical errors are particularly difficult to estimate due to the lack of proper theoretical tools for understanding GNNs. To grasp how these GNN-based causal estimators are influenced by the network structure and network effect, in Appendix G.3 we study a particular class of GNNs, inspired by the surrogate model of nonlinear graph neural networks, and obtain the following claim.
Claim 1.
GNN-based causal estimators restricted to a particular class for predicting the superimposed causal effects have an error bound , where and is the maximal node degree in the graph.
The above claim indicates that an accurate and consistent causal estimator is difficult to obtain under large network effects. In the worst case, the convergence rate in the (ES) assumption becomes unreachable when depends on the number of units. The exact convergence rate of causal estimators is impossible to derive since it depends on the topology of the network, and it is beyond the theoretical scope of this work.
Besides, (LIP) assumes that the change of the received spillover effect is bounded after modifying the treatment assignments of one unit’s neighbors. We will use hypergraph techniques, instead of chromatic number arguments, to give a tighter bound on the policy regret. Another advantage is that the weak independence (WI) assumption can be relaxed to support longer dependencies on the network. However, by relaxing (WI), the power of in Theorems 1 and 2 needs to be modified correspondingly. For example, if we assume a next-nearest neighbors dependency of covariates, i.e., for , then the term in Theorems 1 and 2 needs to be modified to .
Under Assumption 1, we can have the following bound.
Theorem 1.
By Assumption 1, for any small , the policy regret is bounded by with probability at least , where indicates the covering number on the functional class with radius (the covering number characterizes the capacity of a functional class; its definition is provided in Appendix G), and is the maximal node degree in the graph .
Proof.
Under (WI) and (BO), we can use concentration inequalities of networked random variables defined on a hypergraph, which is derived from the graph , to bound the convergence rate. Moreover, the Lipschitz assumption (LIP) allows an estimation of the covering number of the policy functional class . More discussion of the plausibility of Assumption 1 and the full proof are relegated to Appendix G. ∎
Suppose that the policy functional class is finite and its capacity is bounded by . According to Theorem 1, with probability at least , the policy regret is bounded by . This indicates that optimal policies are more difficult to find in a dense graph, even under weak interactions between neighboring nodes.
In a real-world setting, treatments can be expensive, so the policymaker usually encounters a budget or capacity constraint, e.g., the proportion of patients receiving treatment is limited. Deciding who should be treated under such constraints is a challenging problem Kitagawa and Tetenov (2017). Through the interference-free welfare function , a policy is trained to make treatment choices using only each individual’s features. In contrast, under interference, a smart policy should maximize the utility function Eq. (4) by deciding whether to treat an individual or to expose it to neighboring treatment effects such that a required constraint can be satisfied. Therefore, in the second part of the experiments, after fitting causal estimators, we investigate policy networks that maximize the utility function on the graph and satisfy a treatment proportion constraint.
To be more specific, we consider the constraint where only percentage of the population can be assigned to treatment (note that here differs from the treatment probability from the causal structural equations in the randomized experiment setting). The corresponding sample-averaged loss function for a policy network under capacity constraint is defined as , where is a hyperparameter for the constraint. The optimal policy under capacity constraint is obtained by . A capacity-constrained policy regret bound is provided in Theorem 2, which is proved in Appendix G.2. It indicates that if in the constraint is small, then the optimal capacity-constrained policy will be challenging to find. Increasing the treatment probability cannot guarantee an improvement of the group’s interest due to the nonlinear network effect. Therefore, finding the balance between optimal treatment probability, treatment assignment, and the group’s welfare is a provocative question in social science.
Theorem 2.
By Assumption 1, for any small , the policy regret under the capacity constraint is bounded by with probability at least , where indicates the covering number on the functional class with radius , and is the maximal node degree in the graph .
4 Experiments
4.1 Datasets
The difficulties of evaluating the performance of the proposed estimators lie in the broad set of missing outcomes under counterfactual inference. Therefore, we conduct randomized experiments on two semi-synthetic datasets with ground-truth response generation functions, and observational studies on one real dataset with unknown treatment assignment and response generation functions. Notably, in the randomized experiment setting, we consider a linear response generation function inspired by Eq. 5 of Toulis and Kao (2013), , where is the outcome under control and without network interference, and represents Gaussian noise. and represent individual treatment effect and spillover effect, respectively, whose forms are dataset-dependent and discussed below.
To further investigate the superiority of the GNN-based causal estimators on nonlinear causal responses, we consider the following data generation function inspired by Section 4.2 of Toulis and Kao (2013), , where characterizes the strength of nonlinear effects. In addition, a more complicated nonlinear response generation function is considered, where the quadratic terms signify the spillover effect depending on the individual treatment effect.
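A semi-synthetic randomized experiment with a linear response can be sketched as follows. The code assumes an additive response, baseline outcome plus own treatment effect plus spillover plus Gaussian noise, where the spillover is a decayed average of the treatment effects of treated first-order neighbors. This particular spillover form, the parameter names `p`, `decay`, and `noise_std`, and the toy inputs are assumptions for illustration; the dataset-specific generating processes are described in the appendices of the paper.

```python
import random
from typing import Dict, List, Sequence, Tuple


def simulate_outcomes(neighbors: Dict[int, List[int]], y0: Sequence[float],
                      tau: Sequence[float], p: float = 0.1, decay: float = 0.5,
                      noise_std: float = 0.0, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Randomly assign treatment with probability p and generate linear responses."""
    rng = random.Random(seed)
    n = len(y0)
    treat = [1 if rng.random() < p else 0 for _ in range(n)]
    outcomes = []
    for i in range(n):
        nbrs = neighbors[i]
        # Assumed spillover: decayed mean of treated neighbors' effects.
        spill = decay * sum(treat[j] * tau[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        noise = rng.gauss(0.0, noise_std)
        outcomes.append(y0[i] + treat[i] * tau[i] + spill + noise)
    return treat, outcomes


# Hypothetical path graph 0 - 1 - 2 with noise switched off for readability.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
y0 = [1.0, 1.0, 1.0]
tau = [1.0, 2.0, 3.0]

treat_all, y_all = simulate_outcomes(neighbors, y0, tau, p=1.0)
treat_none, y_none = simulate_outcomes(neighbors, y0, tau, p=0.0)
print(treat_all, y_all)
print(treat_none, y_none)
```

Because the generating functions are known, the individual and spillover components of each simulated outcome are available as ground truth for evaluating an estimator.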
Table (columns: Wave1, Pokec; rows: DA GB, DA RF, DR GB, DR EN, GPS, GCN +, GraphSAGE +, GNN +, Improve)
Wave1
Wave1 is an in-school questionnaire dataset collected through the National Longitudinal Study of Adolescent Health project Chantala and Tabor (1999). The questionnaire contains questions on age, grade, health insurance, etc. Due to the anonymity of Wave1, we use the symmetrized $k$-NN graph derived from the questionnaire data as the friendship network. In our experiments, we choose , and the resulting friendship network has nodes and links. We assume a randomized experiment conducted on the friendship network which describes students’ improvements of performance through assignment to a tutoring program or through the peer effect. Hence represents the overall performance of student before assignment to a tutoring program and before being exposed to peer influences, the simulated performance difference after an assignment, and the synthetic peer effect. Exact forms of and depend nonlinearly on the features of each student. Moreover, the first-order peer effect is simulated as , where the decay parameter characterizes the decay of influence. In randomized experiments reported in the main text, we randomly assign of the population to the treatment. Details of the generating process and more experiment results with different settings are relegated to Appendix C and F.
Table (rows: DA GB, DA RF, DR GB, DR EN, GPS, GCN, GCN +, GCN +, GraphSAGE, GraphSAGE +, GraphSAGE +, GNN, GNN +, GNN +, Improve)
Pokec. The friendship network derived from the Wave1 questionnaire data may violate the power-law degree distribution of real networks. Hence, we further conduct experiments on the real social network Pokec Takac and Zabovsky (2012) with generated responses. Pokec is an online social network in Slovakia with profile data, including age, gender, education, etc. We consider randomized experiments on the Pokec social network, in which personalized advertisements of a new health medicine are pushed to some users. We assume that the response of exposed users to the advertisement only depends on a few properties, such as age, weight, smoking status, etc. We keep profiles with complete information on these properties, and the resulting Pokec social network contains nodes and links. Let represent the purchase of this new health medicine without external influence on the decision, the purchase difference after seeing the advertisement, and the purchase difference due to social influences. For randomized experiments on the Pokec social network, we also consider peer effects from next-nearest neighbors by defining , where the decay parameter characterizes the decay of influence. Details and more experimental results with different hyperparameter settings are given in Appendix D and F.
Amazon. The co-purchase dataset from Amazon contains product details, review information, and a list of similar products. Therefore, there is a directed network of products that describes whether a substitutable or complementary product is co-purchased with another product Leskovec et al. (2007). To study the causal effect of reviews on the sales of products, Rakesh et al. (2018) generates a dataset containing products with only positive reviews from the Amazon co-purchase dataset, named pos Amazon, or Amazon for short. In this dataset, all items have positive reviews, i.e., the average rating is larger than , and an item is considered to be treated if there are more than three reviews under this item; otherwise, the item is in the control group. In this setting, pos Amazon is an over-treated dataset with more than of products being in the treatment group. The Word2vec embedding of an item’s review serves as the feature vector of this item. Moreover, the individual treatment effect of an item is approximated by matching it to other items having similar features and under minimal exposure to neighboring nodes’ treatments.
Table (columns: Wave1, Pokec; rows: DA GB, DA RF, DR GB, DR EN, GPS, GCN, GraphSAGE, GNN, Improve)
4.2 Results of Causal Estimators
Evaluation Metrics
One evaluation metric is the square root of the MSE for the prediction of the observed outcomes on the test dataset , which is defined as , where denotes the output of the outcome prediction network (see and in Fig. 1). This metric reflects how well an estimator can predict the superimposed individual treatment and spillover effects on a network. Another evaluation metric, which quantifies the quality of the extracted individual treatment effect, is the Precision in Estimation of Heterogeneous Effect studied in Hill (2011), which is defined as , where is defined in Eq. (3).
Baselines
Baseline models are the domain adaptation method Künzel et al. (2019) with gradient boosting regression (DA GB) and with random forest regression (DA RF), and the doubly-robust estimator Funk et al. (2011) with gradient boosting regression (DR GB) and with elastic net regression (DR EN). They are implemented via EconML Research (2019) with grid-searched hyperparameters. These baselines incorporate the feature vectors as inputs and the exposure as the control variable into the model. For randomized experiments on Wave1 and Pokec, the predefined treatment probability is provided, while for the observational studies on the Amazon dataset, the covariate-dependent treatment probability is estimated. Moreover, the generalized propensity score (GPS) method is reproduced and enhanced for a fair comparison, equipped with the same feature map function. More details of the baselines, the sketch of the training procedure, and hyperparameters are relegated to Appendix F.
Experiments
We use partial outcomes, both in the randomized experiments and observational settings, to train the GNN-based causal estimators. We investigate the effect of penalizing representation imbalance in the observational studies on the Amazon dataset. The data points are randomly divided into training (), validation (), and test () sets. Note that the entire network and the covariates of all units are given during training and test, while only the causal responses of units in the training set are provided in the training phase. For the randomized experiments using the Wave1 and Pokec datasets, we repeat the experiments times and use different random parameters in the response generation process each time.
Experimental results on the Wave1 and Pokec data generated via the linear model are presented in Table 1. Both representation balancing and are deployed in the GNN-based estimators when searching for the best performance. GNN-based estimators, especially the GNN estimator, are superior for superimposed causal effects prediction. One can observe a improvement of the metric on the Wave1 dataset when comparing the GNN estimator with the enhanced GPS method and a improvement on the Pokec dataset. The covariates of neighboring units in the Pokec dataset have strong cosine similarity, hence the improvement on the Pokec dataset is not significant, and the network effect can be approximately captured from the exposure variable. Table 2 shows the experimental results on the pos Amazon dataset in the observational study. In particular, we compare estimators without a representation penalty and with different penalties. It shows that representation penalties can significantly improve the individual treatment effect recovery, serving as a regularization to avoid overfitting the network interference. Furthermore, GNN-based estimators using penalty are slightly better than those using penalty, however, at the cost of the metric .
Table (columns: Wave1, Pokec; rows: DA GB, DA RF, DR GB, DR EN, GPS, GCN, GraphSAGE, GNN)
Table 3 reports the performance of GNN-based causal estimators on nonlinear response models. Nonlinear responses are generated via and under . For the metric, GNN-based estimators outperform the best baseline GPS dramatically, showing the effectiveness of predicting nonlinear causal responses. Moreover, a and performance improvement on the metric with the Wave1 dataset shows that setting an empty graph, i.e., , in the GNN-based estimators is an appropriate approach for extracting the individual causal effect. Results of nonlinear responses with a larger strength parameter are reported in Appendix C and D.
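The two evaluation metrics used in this section can be sketched as follows. Both are written as root-mean-squared errors, one over predicted versus observed outcomes and one over estimated versus true individual treatment effects. Using the square root for the PEHE is an assumption here, since some works report the squared quantity; the function names and numbers are hypothetical.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-squared error of predicted outcomes on a test set."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def pehe(tau_hat: Sequence[float], tau_true: Sequence[float]) -> float:
    """Precision in Estimation of Heterogeneous Effect (assumed root-mean-squared form)."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(tau_hat, tau_true)) / len(tau_true))


# Hypothetical predictions and ground truth.
outcome_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
effect_error = pehe([1.0, 1.0], [0.0, 2.0])
print(outcome_error, effect_error)
```

The first metric can be computed on any dataset with observed outcomes, whereas the second requires ground-truth individual effects and is therefore only available for simulated or matched data.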
4.3 Results on Improved Intervention Policy
Experiment Settings
Table (columns: DA GB, DA RF, GPS, GCN, GraphSAGE, GNN)
After obtaining the optimal causal effect estimators and feature map (see Fig. 1), we subsequently optimize the intervention policy on the same graph. A simple 2-layer neural network, with ReLU activation between hidden layers and sigmoid activation at the end, is employed as the policy network. The output of the policy network lies in , and it is interpreted as the probability of treating a node. The actual intervention choice is then sampled from this probability via the Gumbel-softmax trick Jang et al. (2016) such that gradients can be backpropagated. Sampled treatment choices along with the corresponding node features are then fed into the feature map and subsequent causal estimators to evaluate the utility function under network interference defined in Eq. (4). Each experiment setting is repeated times until convergence. The hyperparameter in is tuned such that the constraint for the percentage is satisfied within the tolerance . More details of experiment settings and hyperparameters are relegated to Appendix D and E.
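The sampling step can be illustrated with a two-class Gumbel-softmax relaxation. The sketch below takes a treatment probability, perturbs the log-probabilities of "treat" and "do not treat" with Gumbel noise, and returns a soft treatment indicator in (0, 1). It uses only the standard library and therefore does not show gradient flow; the function name, the temperature value, and the clipping constant are assumptions for illustration.

```python
import math
import random


def gumbel_softmax_treatment(prob: float, temperature: float = 0.5,
                             rng: random.Random = random.Random(0)) -> float:
    """Relaxed sample of a binary treatment from the probability prob."""
    eps = 1e-12
    logits = [math.log(prob + eps), math.log(1.0 - prob + eps)]
    scores = []
    for logit in logits:
        # Clip the uniform draw away from 0 and 1 to keep both logarithms finite.
        u = min(max(rng.random(), eps), 1.0 - eps)
        gumbel = -math.log(-math.log(u))
        scores.append((logit + gumbel) / temperature)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return exps[0] / sum(exps)  # soft weight on the "treat" class


rng = random.Random(0)
samples = [gumbel_softmax_treatment(0.9, temperature=0.5, rng=rng) for _ in range(2000)]
treated_fraction = sum(1 for s in samples if s > 0.5) / len(samples)
print(treated_fraction)
```

Thresholding the relaxed sample at one half recovers a hard decision whose treatment frequency matches the input probability, while lower temperatures push the soft values closer to 0 or 1.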
To quantify the optimized policy , we evaluate the difference , where represents a randomized intervention under the same capacity constraint. The difference indicates how much a learned policy can outperform a randomized policy with the same constraint, evaluated via learned causal effect estimators. However, by its definition, there is a concern that the policy improvement may be strongly biased, such that any “expected improvement” may come from inaccurate causal estimators. Hence, for the Wave1 and Pokec datasets, knowing the generating process of treatment and spillover effects, we also compare the actual utility difference .
Table 4 displays policy optimization results on the undertreated Wave1 and Pokec simulation datasets, where initially only of nodes are randomly assigned to treatment. It shows that an optimized policy network cannot even outperform a randomized policy in ground truth when the causal estimators perform poorly. Hence, policy networks learned from the utility function with plugged-in doubly-robust or domain adaptation estimators are not reliable. By contrast, the small difference between genuine utility improvement and estimated improvement for the GNN-based causal estimators indicates the reliability of the optimized policy. Moreover, comparing the ground-truth utility improvement on the GPS and GCN-based estimators shows that the policy network relies sensitively on the accuracy of the employed causal estimator. Furthermore, one might argue that through baseline estimators, a simple policy network cannot adjust its treatment choice according to neighboring nodes’ features and responses, unlike through GNN-based estimators. For a fair comparison, in Appendix D, we also provide experimental results using a GNN-based policy network. However, we still cannot observe genuine utility improvements on when using baseline models as causal estimators.
Next, we conduct experiments for intervention policy learning on the over-treated pos Amazon dataset under a treatment capacity constraint. Since we do not have access to the ground truth of the pos Amazon dataset, Table 5 shows the utility difference under the treatment capacity constraint with evaluated only from learned causal estimators. Although the optimized utility improvement achieves the best result via the GPS causal estimator, this result might be unreliable when compared to the ground truth. By contrast, the policy improvement obtained via a GNN-based causal estimator is comparable in size and is expected to be reliable.
5 Conclusion
In this work, we first introduced the task of causal inference under general network interference and proposed causal effect estimators using GNNs of various types. We also defined a novel utility function for policy optimization on interconnected nodes, for which a graph-dependent policy regret bound can be derived theoretically. We conducted experiments on semi-synthetic simulation and real datasets. The experimental results show that GNN-based causal effect estimators, especially GraphSAGE and GNN, with an HSIC distribution discrepancy penalty are superior in predicting superimposed causal effects, and that the individual treatment effect can be recovered reasonably well. Subsequent experiments on intervention policy optimization under a capacity constraint further confirm the importance of employing an optimal and reliable causal estimator for policy improvement. In future work, we will consider the scenario in which the network structure is only partially observed or dynamic.
References
 Inferring network effects from observational data. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 715–724. Cited by: §1.1.
 Estimating average causal effects under general interference, with application to a social network experiment. The Annals of Applied Statistics 11 (4), pp. 1912–1947. Cited by: §1.1, §1.2.
 Efficient policy learning. arXiv preprint arXiv:1702.02896. Cited by: §1, §3.
 Causal inference under interference and network uncertainty. arXiv preprint arXiv:1907.00221. Cited by: §1.1.
 Reasoning about interference between units: A general framework. Political Analysis 21, pp. 97–124. External Links: Document Cited by: §1.2, §1.
 National longitudinal study of adolescent health: strategies to perform a design-based analysis using the Add Health data. Cited by: §4.1.
 Identification and estimation of treatment and interference effects in observational studies on networks. arXiv preprint arXiv:1609.06245. Cited by: §1.1, §1.2.
 Doubly robust estimation of causal effects. American journal of epidemiology 173 (7), pp. 761–767. Cited by: §4.2.
 Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034. Cited by: §2.3.
 Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20 (1), pp. 217–240. Cited by: §4.2.
 Toward causal inference with interference. Journal of the American Statistical Association 103 (482). Cited by: §1.1.
 Categorical reparameterization with Gumbel-softmax. arXiv preprint arXiv:1611.01144. Cited by: §4.3.

 Learning representations for counterfactual inference. In International Conference on Machine Learning, pp. 3020–3029. Cited by: §2.2.
 Confounding-robust policy improvement. In Advances in Neural Information Processing Systems, pp. 9269–9279. Cited by: §1.
 Balanced policy evaluation and learning. In Advances in Neural Information Processing Systems, pp. 8895–8906. Cited by: §1.
 Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §2.3.
 Who should be treated? empirical welfare maximization methods for treatment choice. Technical report Cemmap working paper. Cited by: §1, §3.
 Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences 116 (10), pp. 4156–4165. Cited by: §4.2.
 The dynamics of viral marketing. ACM Transactions on the Web (TWEB) 1 (1), pp. 5. Cited by: §4.1.
 Large sample randomization inference of causal effects in the presence of interference. Journal of the american statistical association 109 (505), pp. 288–301. Cited by: §1.1.
 Identification for prediction and decision. Harvard University Press. Cited by: §1.
 Weisfeiler and Leman go neural: higher-order graph neural networks. arXiv preprint arXiv:1810.02244. Cited by: §2.3.
 Causal inference, social networks, and chain graphs. arXiv preprint arXiv:1812.04990. Cited by: §1.1.
 Causal inference for social network data. arXiv preprint arXiv:1705.08527. Cited by: §1.1, §1.2, §2.4.
 Vaccines, contagion, and social networks. The Annals of Applied Statistics 11 (2), pp. 919–948. Cited by: §1.1.

 Linked causal variational autoencoder for inferring paired spillover effects. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1679–1682. Cited by: §4.1.
 EconML: A Python Package for ML-Based Heterogeneous Treatment Effects Estimation. Note: https://github.com/microsoft/EconML. Version 0.x. Cited by: §4.2.
 Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66 (5), pp. 688. Cited by: §1.2.
 Randomization analysis of experimental data: the fisher randomization test comment. Journal of the American Statistical Association 75 (371), pp. 591–593. Cited by: §1.
 Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3076–3085. Cited by: §2.2.
 Identification and estimation of causal effects from dependent data. In Advances in neural information processing systems, pp. 9424–9435. Cited by: §1.1.

 On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, pp. 465–472. Cited by: §1.2.
 Data analysis in public social networks. In International Scientific Conference and International Workshop Present Day Trends of Innovations, Vol. 1. Cited by: §4.1.
 Auto-g-computation of causal effects on a network. arXiv preprint arXiv:1709.01577. Cited by: §1.1.
 On causal inference in the presence of interference. Statistical methods in medical research 21 (1), pp. 55–75. Cited by: §1.1.
 Estimation of causal peer influence effects. In International conference on machine learning, pp. 1489–1497. Cited by: §1.2, §1, §4.1, §4.1.
 Policy targeting under network interference. arXiv preprint arXiv:1906.10258. Cited by: §1.1.