Estimating spillovers using imprecisely measured networks

03/30/2019 · Morgan Hardy et al. · University of Washington

In many experimental contexts, whether and how network interactions impact the outcome of interest for both treated and untreated individuals are key concerns. Network data are often assumed to perfectly represent these possible interactions. This paper considers the problem of estimating treatment effects when measured connections are, instead, a noisy representation of the true spillover pathways. We show that existing methods, using the potential outcomes framework, yield biased estimators in the presence of this mismeasurement. We develop a new method, using a class of mixture models, that can account for missing connections and discuss its estimation via the Expectation-Maximization algorithm. We check our method's performance by simulating experiments on real network data from 43 villages in India. Finally, we use data from a previously published study to show that estimates using our method are more robust to the choice of network measure.


1 Introduction

Interactions between peers are of interest in many economic settings, such as health (Oster and Thornton, 2011; Godlonton and Thornton, 2012), education (Angelucci et al., 2010; Duflo et al., 2011), job search (Magruder, 2010; Wang, 2013; Heath, 2018), personal finance (Bursztyn et al., 2014), agriculture (Cai et al., 2015; BenYishay and Mobarak, 2018; Beaman et al., 2018; Vasilaky and Leonard, 2018), and microenterprises (Hardy and McCasland, 2016). Moreover, even when spillovers to non-treated peers are not of direct interest, the possibility of treatment spillovers to the control group violates the stable unit treatment value assumption (SUTVA) needed to identify causal treatment effects (Rubin, 1974). In both cases, knowing the group of peers who are potentially affected by a treatment allows researchers to accurately estimate peer effects and assess potential SUTVA violations.

However, measuring social networks is challenging. It is expensive to collect data on an individual's entire social network (Breza et al., 2017), leading researchers to use proxies for networks such as geography (Foster and Rosenzweig, 1995; Miguel and Kremer, 2004; Bayer et al., 2008; Godlonton and Thornton, 2012), or sharing a common nationality (Beaman, 2011), language (Bandiera et al., 2009, 2010), ethnic group (Fafchamps, 2003), religion (Munshi and Myaux, 2006), or caste (Munshi and Rosenzweig, 2006, 2016). Even if researchers collect network data,[1] the set of individuals potentially affected by a given treatment may not entirely correspond to the elicited network if networks are truncated due to concerns about survey fatigue,[2] or if the experiment changes the network itself (Comola and Prina; Banerjee et al., 2018; Stein, 2018). It is also difficult for respondents to specify the precise set of individuals potentially affected by a given treatment, whether asked about past interactions or hypothetical future interactions (Hardy and McCasland, 2016).

[1] There is a substantial empirical literature that addresses the reliability of network data elicited through surveys, with emphasis on the type and salience of relationships being surveyed, temporal dependence, and how links are elicited. See Bell et al. (2007) and Marsden (2016) for reviews.
[2] Griffith (2017a) shows how limiting the number of peers a subject can report in the data from the National Adolescent Health Project can bias estimates in the linear-in-means model.

In this paper, we focus on the setting where the observed network represents a corrupted version of the true network of treatment interference, allowing for both unreported spillover pathways and misreported links over which no spillovers could occur. We use a local network exposure approach (Aronow and Samii, 2017; Ugander et al., 2013) that defines a set of individuals whose treatment status can potentially affect each subject. We first show missing links and misreported links in the network can cause mismeasured treatment exposure conditions and hence biased estimators. We develop a class of mixture models that can model the distribution of the latent true exposure conditions and discuss parameter estimation using the Expectation-Maximization (EM) algorithm. These models rely on parametric assumptions about the distribution of missing links conditional on the observed network data, as well as parametric assumptions on the behavior of outcomes within each treatment exposure condition. Under a linear regression model for the latter, we prove the mixture model is identifiable and the maximum likelihood estimator from the EM algorithm is consistent.

We evaluate our method with both simulations and a replication of an existing study. We simulate experiments on real networks of Indian households from 47 villages. We are able to recover accurate estimates of direct and indirect treatment effects when state-of-the-art methods fail. Finally, we implement our method using network data from a randomized evaluation of insurance information sessions with rural farmers in China (Cai et al., 2015). We find that our method produces more consistent estimates of direct and indirect treatment effects across various choices of network measures.

Our results are relevant to many experimental contexts where subjects' behavior or outcomes may be influenced by other subjects' treatment assignment in addition to their own. A common approach in these cases is to randomize treatment at a geographic or organizational level that plausibly contains each treated individual's network of potential spillovers, such as a village (in isolated, rural settings), and then compare treated individuals to "pure controls" in non-treated units. However, even if this is possible, comparing treated to control subjects still confounds treatment effects and spillovers on these treated subjects.[3] Treatment effects in such contexts are also not particularly informative about the results from scaling up a treatment to an entire population. Moreover, in other cases, a pure control is not feasible, because the experiment must be implemented within a single firm (Bandiera et al., 2009; Bloom et al., 2014; Adhvaryu et al., 2016) or market (Conlon and Mortimer, 2013), or it is not possible to leave a large enough buffer between treated and control areas to render spillovers unimportant. Potential SUTVA violations could then introduce both upward and downward bias in direct treatment effect estimates.[4]

[3] An exception would be if the treated individuals are a small enough fraction of treated units that control individuals are unlikely to know treated subjects. Comparing treatment to control individuals would then identify the average direct treatment effect by construction. However, this would likely require a large enough number of units to be impractical or prohibitively expensive in many settings.
[4] The reduction in exposure to disease from directly treated school children in Kenya may indirectly improve the health outcomes of school children who did not directly receive the treatment, biasing downward naively estimated benefits of deworming pills (Miguel and Kremer, 2004). In contrast, increased police patrolling on the streets of Bogota, Colombia, may merely push crime "around the corner", biasing upward the estimated impact on crime rates (Blattman et al., 2017).

The local network exposure approach (Aronow and Samii, 2017; Ugander et al., 2013) that we use contrasts with linear-in-means models (Manski, 1993; Bramoullé et al., 2009) in its assumed avenues of treatment interference. Local network exposure models assume the avenues of interference for each subject are limited to the treatment assignments of other subjects in their local neighborhood. Discrete treatment "exposure conditions" are defined based on a subject's and their connections' treatment assignments, and are used within Rubin's potential outcomes framework as a set of potential treatment conditions (as opposed to using the levels of treatment). Linear-in-means models, on the other hand, hypothesize that indirect treatment effects manifest as a linear relationship between a subject's outcome and the average treatment and average outcome of that subject's peers. The dependence between outcomes of connected subjects allows a subject's outcome to be influenced by any other subject to which they are directly or indirectly connected in the social network, with the amount of influence modulated by their distance in the network. While we use a local network exposure model that allows us to focus on the effect of exposure to at least one treated subject, in Section 2.3 we explain how our approach can allow for dynamics similar to a linear-in-means model if we increase the number of indirect treatment bins assumed to influence an individual.

Our approach is related to a growing literature in economics, political science, sociology, and statistics on network sampling and mismeasurement. Fundamentally, we draw a distinction between the observed network data and the true latent network. That is, we view the observed graph as a corrupted version of a true, unobserved, network. As in Handcock and Gile (2010) and Newman (2018), we relate the two via a probabilistic model and, given a set of model parameters, construct a distribution over the true network conditional on the observed graph. Existing methodologies consider cases of sampled networks without measurement error, where some of the potential links are intentionally not elicited. In these cases, estimating model parameters relating the observed graph to the true latent network can be relatively straightforward.[5] In effect, the sampled graph can act as a training set from which to learn a probabilistic model that characterizes the full network. In contrast, our paper considers the case where all potential links are measured, but may contain some error. In this case, even a full graph cannot be used to train probabilistic models, because of the potential for error on every link (and non-link). This creates an inability to learn the parameters of the corruption process. For example, the observed network data does not inform the proportion of true links missing from the observed graph.

[5] For example, Chandrasekhar and Lewis (2011) show how egocentrically sampled network data can be used to predict the "full" network in a process they term graphical reconstruction. See Williams (2016) and Griffith (2017b) for sample applications.

This paper proceeds as follows. In Section 2 we discuss existing methods for estimating direct and indirect treatment effects, with a focus on a model from the potential outcomes framework. We characterize this model's limitations in the presence of missing links. In Section 3 we propose a mixture model to estimate treatment effects that can account for latent ties between subjects. We discuss when this model is identified, how to estimate model parameters and treatment effects, and examine model performance using simulations. We apply our methodology to an experiment in Section 4, and conclude in Section 5 with a discussion.

2 Measuring network spillovers in experiments

2.1 Notation

In this section we introduce basic notation that we will use throughout the rest of the paper. Let $i = 1, \dots, n$ index the subjects in the study, with corresponding observed outcomes $Y_i$, which we vectorize as $Y = (Y_1, \dots, Y_n)$. For simplicity, suppose treatment is binary with levels "treatment" (1) and "control" (0), and the treatment assignment mechanism is random and explicitly known. Denote the vector of treatment assignments by $T$, in which the treatment of individual $i$ is $T_i$. Suppose the true influence network is directed and binary, with the edge $(j, i)$, representing individual $j$'s influence on individual $i$, encoded by $G_{ji} = 1$. Let $G_{\cdot i}$ denote the $i$th column of $G$, indicating the influencers of individual $i$, so $d_i = \sum_j G_{ji}$ is the number of influencers, or in-degree, of $i$. For now, let us assume $G$ is observed. Finally, let $\tilde{G}_{\cdot i}$ denote the $i$th column of $G$ normalized to sum to 1 ($\tilde{G}_{ji} = G_{ji}/d_i$) unless $d_i = 0$, in which case $\tilde{G}_{\cdot i} = 0$.
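To fix ideas, the following sketch sets up this notation in code. The dimensions, treatment probability, network density, and seed are arbitrary illustrations, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500                          # number of subjects
p = 0.3                          # treatment probability
T = rng.binomial(1, p, size=n)   # treatment assignment vector

# Directed binary influence network: G[j, i] = 1 means j influences i.
G = (rng.random((n, n)) < 0.01).astype(int)
np.fill_diagonal(G, 0)

d = G.sum(axis=0)                # in-degree of each subject (column sums)

# Column-normalized network, with zero columns left as zero.
G_tilde = G / np.maximum(d, 1)
```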

2.2 Local network exposure model

Aronow and Samii (2017) and Ugander et al. (2013) propose estimators for average direct and indirect treatment effects by building on the Rubin causal model (Rubin, 1974). In the context of experiments, each subject has a set of “potential outcomes” corresponding to the possible outcomes under each treatment. The inference task is to estimate the average treatment effect, defined to be the difference between the average outcome of the population if the entire population was treated and the average outcome if the entire population was in the control:

\tau = \frac{1}{n} \sum_{i=1}^{n} \left[ Y_i(1) - Y_i(0) \right]    (1)

This quantity is not observed since we cannot observe the full set of potential outcomes for each subject, but, assuming completely random assignment, it can be estimated by the difference in sample means:

\hat{\tau} = \frac{1}{n_1} \sum_{i : T_i = 1} Y_i - \frac{1}{n_0} \sum_{i : T_i = 0} Y_i    (2)

where $n_t$ is the number of subjects in treatment $t$. A crucial assumption in the Rubin causal model is SUTVA, which states that a subject's potential outcomes are unaffected by the treatments of other subjects. In experiments on networks, SUTVA is violated if the treatments of peers influence the outcomes for an individual.

Aronow and Samii (2017) consider a violation of SUTVA by allowing individuals to be systematically affected by the treatment assignments of their peers. By making assumptions that restrict the nature of these influences, they induce mappings of the treatment vector to distinct "exposure conditions", or what Manski (2013) terms "effective treatments." In a simple instance of their framework, which we borrow for our model in Section 3, individuals are affected by whether or not any of their influencers in $G$ are treated, inducing a random assignment into one of four exposure conditions, corresponding to levels of direct and indirect exposure to treatment:

C_i = \begin{cases} c_{11} & \text{if } T_i = 1 \text{ and } G_{\cdot i}^{\top} T > 0 & \text{(full exposure)} \\ c_{10} & \text{if } T_i = 1 \text{ and } G_{\cdot i}^{\top} T = 0 & \text{(direct exposure)} \\ c_{01} & \text{if } T_i = 0 \text{ and } G_{\cdot i}^{\top} T > 0 & \text{(indirect exposure)} \\ c_{00} & \text{if } T_i = 0 \text{ and } G_{\cdot i}^{\top} T = 0 & \text{(no exposure)} \end{cases}    (3)

In this model, both direct and indirect effects are taken to be binary, with an individual being indirectly exposed to treatment if one or more of their influencers are (directly) treated. Each subject would have four potential outcomes $Y_i(c_{00})$, $Y_i(c_{01})$, $Y_i(c_{10})$, and $Y_i(c_{11})$, one for each exposure condition. Note this setup assumes that an individual can only be influenced by a first-order connection in the network and that the number of connections treated does not have an effect beyond the presence or absence of at least one. The choice of indirect exposure can be related to diffusion models of information and disease in which "contagion" can occur given a single source of exposure (Centola and Macy, 2007). Both of these assumptions could be relaxed, however, simply by adding additional potential outcomes.
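As a concrete illustration, a minimal function mapping a network and treatment vector into the four exposure conditions of (3) might look as follows (continuing the notation sketch above; the string labels are our own shorthand, not the paper's code):

```python
import numpy as np

def exposure_conditions(G, T):
    """Assign each subject one of the four exposure conditions in (3):
    'c11' full, 'c10' direct, 'c01' indirect, 'c00' no exposure.
    A subject is indirectly exposed when at least one influencer is treated."""
    treated_influencers = G.T @ T              # treated influencers per subject
    indirect = (treated_influencers > 0).astype(int)
    labels = np.array([["c00", "c01"], ["c10", "c11"]])
    return labels[T, indirect]
```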

The primary quantities of interest would then be given by average treatment effects akin to equation (1):

\tau(c_k, c_l) = \frac{1}{n} \sum_{i=1}^{n} \left[ Y_i(c_k) - Y_i(c_l) \right]    (4)

The average direct treatment effect would be given by $\tau(c_{10}, c_{00})$, while the average indirect treatment effect when not directly treated would be $\tau(c_{01}, c_{00})$. Estimating these quantities is equivalent to estimating the mean outcomes of the entire population under each exposure condition:

\mu(c_k) = \frac{1}{n} \sum_{i=1}^{n} Y_i(c_k)    (5)

so we focus on the latter for this section and the next, with the additional assumption that each subject is assigned to treatment with some constant and known probability $p$, independently of other subjects. Note if certain subjects have zero probability of being placed in certain exposure conditions, e.g. when a subject has no influencers, estimation must be restricted to the sub-population of individuals with non-zero probability of being placed in every condition. In contrast to the case when the SUTVA assumption is satisfied, we cannot estimate these means using just their sample counterparts. Variability in the in-degrees of individuals causes variation in the probabilities of assignment to each exposure condition. Namely, individuals with high in-degree are more likely to be indirectly exposed to the treatment since they have more influencers who potentially may be treated. This selection bias could affect the mean estimates if there is heterogeneity in the outcomes within exposure conditions associated with in-degree. Horvitz-Thompson estimators use inverse probability weighting to take varying exposure probabilities into account and produce unbiased estimators of these mean outcomes:

\hat{\mu}(c_k) = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbb{1}\{C_i = c_k\}\, Y_i}{\pi_i(c_k)}    (6)

where $\pi_i(c_k) = P(C_i = c_k)$ is the probability that subject $i$ is assigned to exposure condition $c_k$. Note this estimator is equivalent to the sample mean if the probability of assignment to an exposure condition is constant among subjects. These estimators are unbiased regardless of the form of the heterogeneity between the outcomes and network degrees, but can have high variance when the exposure conditions are highly imbalanced on in-degree. This would arise when the probabilities $\pi_i(c_k)$ are small for some $i$, yielding large weights $1/\pi_i(c_k)$.
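Under independent Bernoulli($p$) assignment, the exposure probabilities in (6) have simple closed forms, e.g. $\pi_i(c_{00}) = (1-p)(1-p)^{d_i}$. A sketch of the resulting Horvitz-Thompson estimates follows (reusing exposure_conditions from above; an illustration under our stated assumptions, not the authors' code):

```python
import numpy as np

def ht_estimates(Y, G, T, p):
    """Horvitz-Thompson estimates (6) of the mean outcome under each exposure
    condition, assuming i.i.d. Bernoulli(p) treatment assignment."""
    d = G.sum(axis=0)
    keep = d > 0            # subjects with non-zero probability of every condition
    C = exposure_conditions(G, T)
    none_treated = (1 - p) ** d          # P(no influencer treated)
    pi = {"c11": p * (1 - none_treated), "c10": p * none_treated,
          "c01": (1 - p) * (1 - none_treated), "c00": (1 - p) * none_treated}
    return {k: np.sum(Y[(C == k) & keep] / prob[(C == k) & keep]) / keep.sum()
            for k, prob in pi.items()}
```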

Explicitly modeling the relationship between potential outcomes and network degrees can result in more stable estimators at the cost of additional assumptions about the validity of these relationships. For example, suppose we believe that for each exposure condition $c_k$, the relationship between the in-degree and the potential outcome can be modeled with

Y_i(c_k) \sim f_k(y \mid d_i;\, \theta_k)    (7)

where $d_i$ is the in-degree of individual $i$ and $\theta_k$ are model parameters. Assuming this model accurately characterizes the relationship between the potential outcomes and in-degrees, the distribution of potential outcomes is conditionally independent of the exposure assignment (induced by the treatment assignment) vector given the in-degrees of the subjects, such that the exposure assignment mechanism can be "ignored" during inference of the means (Rubin, 1974). The estimate of the mean outcome under exposure condition $c_k$ would then be given by

\hat{\mu}(c_k) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[ Y_i(c_k) \mid d_i;\, \hat{\theta}_k \right]    (8)

provided an estimate $\hat{\theta}_k$ of the model parameters. Parametric models $f_k(y \mid d; \theta_k)$ of the outcomes under condition $c_k$ and in-degree $d$ are necessary for likelihood-based approaches to estimation and are used in the model we propose in Section 3. A common model familiar to many economists is

Y_i(c_k) = \alpha_k + \beta_k d_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2)    (9)

which corresponds to a linear model with different intercepts and slopes for each exposure condition (but common variance). In this case, the estimates of mean outcome would be given by $\hat{\mu}(c_k) = \hat{\alpha}_k + \hat{\beta}_k \bar{d}$, where $\bar{d}$ is the average in-degree in the population.
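In code, fitting (9) separately within each observed exposure condition and averaging the fitted line over the in-degree distribution gives the plug-in means. A rough sketch (our own illustration; per-condition OLS yields the same point estimates as the joint common-variance model):

```python
import numpy as np

def regression_means(Y, C, d, conditions=("c00", "c01", "c10", "c11")):
    """Fit the linear model (9) within each exposure condition by OLS and
    report the plug-in mean outcome alpha_k + beta_k * mean(d)."""
    d_bar = d.mean()
    out = {}
    for k in conditions:
        mask = C == k
        X = np.column_stack([np.ones(mask.sum()), d[mask]])
        alpha_k, beta_k = np.linalg.lstsq(X, Y[mask], rcond=None)[0]
        out[k] = alpha_k + beta_k * d_bar
    return out
```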

2.3 Relation to the linear-in-means model

Although we focus on the local network exposure model in the subsequent sections, let us first compare it to another popular approach for accounting for and measuring treatment spillovers: Manski linear-in-means models (Manski, 1993; Bramoullé et al., 2009). In the context of experiments without additional covariates, a basic form of these models would be

Y_i = \alpha + \beta\, \tilde{G}_{\cdot i}^{\top} Y + \gamma\, T_i + \delta\, \tilde{G}_{\cdot i}^{\top} T + \epsilon_i    (10)

In addition to allowing for "endogenous effects" ($\beta$), which would account for second- and higher-order indirect treatment effects (e.g. how an individual is affected by their peers' peers' outcomes), the linear-in-means model differs from the described local network exposure model by placing a different set of assumptions on the mechanism of indirect treatment effects. Rather than partitioning indirect effects into varying magnitudes based on indirect exposure conditions, the linear-in-means model assumes the size of indirect effects varies linearly with the proportion of peers treated. Note a similar assumption could be used in the local network exposure setting by characterizing indirect exposure with proportions of peers treated instead of the presence of any peers treated, albeit these proportions would have to be arranged into discrete bins. Similarly, we could introduce a non-linear dependence on the number of treated peers in the linear-in-means framework by adding multiple indicator variables to Equation (10). In both cases, introducing more discrete potential outcome categories will likely lead to small cell counts in practice. This observation highlights the importance of the linearity assumption in the linear-in-means model.
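For comparison, the regressors of (10) are straightforward to build from the column-normalized network. The sketch below constructs them; note that with $\beta \neq 0$ the peer-outcome term is endogenous and requires instruments (e.g. peers-of-peers treatment, as in Bramoullé et al., 2009) rather than plain OLS. Names are illustrative.

```python
import numpy as np

def linear_in_means_regressors(Y, G_tilde, T):
    """Peer-average regressors for equation (10)."""
    peer_T = G_tilde.T @ T   # average treatment among influencers (exogenous effect)
    peer_Y = G_tilde.T @ Y   # average outcome among influencers (endogenous effect)
    return peer_T, peer_Y
```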

2.4 Local network exposure model under mismeasurement

In this section we consider the potential for bias in the Horvitz-Thompson estimators (6) when a corrupted network $O$ is used to estimate exposure conditions instead of the true network $G$. We allow $O$ to be corrupted such that there are either links present in $O$ that are not in $G$ or vice-versa. Suppose our treatment assignment mechanism is constructed such that each subject has positive probability of being placed in treatment and positive probability of being placed in control. We can break the impact of using $O$ in estimation into two distinct factors. First, recall that the Horvitz-Thompson estimator can only be used for subjects with non-zero probability of being placed in each exposure condition. Namely, subjects with zero in-degree must be excluded, reflecting the idea that a potential outcome under indirect exposure only makes sense if the subject could be indirectly exposed to treatment under some hypothetical treatment assignment. When we observe a corrupted version of the network, we may not be able to accurately identify which subjects should be excluded. Certain individuals who have positive in-degree in $G$ may be observed to have zero in-degree in $O$ and thus would be incorrectly excluded from estimation. At the same time, certain individuals with zero in-degree in $G$ may be observed to have positive in-degree and thus be included during estimation. If either of these situations arose, our estimated average outcomes would then represent a different subpopulation than the true population of subjects with non-zero in-degree.

Second, even if we are able to accurately identify all subjects with non-zero in-degree, bias in mean estimates may be induced by distorted observed exposure conditions. Subjects who are in truth indirectly exposed to treatment would not be observed to be indirectly exposed if all connections to influencers who are treated are unobserved (and no false links to other treated individuals are observed). Similarly, subjects not indirectly exposed to treatment may be falsely observed to be indirectly exposed. The corrupted exposure conditions are able to correctly identify the level of direct treatment for each subject but not necessarily the level of indirect treatment. Mathematically, observing exposure condition $\tilde{C}_i = c_{k1}$ (or $c_{k0}$) for any $i$ may correspond to either $C_i = c_{k1}$ or $C_i = c_{k0}$. Recall that in this notation the first subscript denotes the direct treatment condition (whether $i$ is directly treated or not) and the second subscript denotes the indirect treatment (whether at least one member of $i$'s network was treated). The Horvitz-Thompson estimators for each treatment exposure condition under the corrupted network are given by

\hat{\mu}_O(c_k) = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathbb{1}\{\tilde{C}_i = c_k\}\, Y_i}{P(\tilde{C}_i = c_k)}    (11)

where the observed exposure condition $\tilde{C}_i$ is dependent on the true exposure condition $C_i$ and the probabilities are taken over the possible treatment assignments $T$. Holding the observed and true networks fixed and taking the expectation of the estimators over the possible treatment assignments, we have:

\mathbb{E}\left[\hat{\mu}_O(c_k)\right] = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[ \frac{\mathbb{1}\{\tilde{C}_i = c_k\}\, Y_i}{P(\tilde{C}_i = c_k)} \right]    (12)
= \frac{1}{n} \sum_{i=1}^{n} \sum_{c_l} \frac{P(\tilde{C}_i = c_k,\, C_i = c_l)}{P(\tilde{C}_i = c_k)}\, Y_i(c_l)    (13)
= \frac{1}{n} \sum_{i=1}^{n} \sum_{c_l} P(C_i = c_l \mid \tilde{C}_i = c_k)\, Y_i(c_l)    (14)

We find the mean estimate of the outcome conditioned on $\tilde{C}_i = c_k$ under the corrupted network tends to lie between the mean outcomes under the two exposure conditions corresponding to the same level of direct treatment: $\mu(c_{k0})$ and $\mu(c_{k1})$. As an example, if we assume the probability of observing an incorrect exposure condition is a constant $\lambda$ for all $i$, then

\mathbb{E}\left[\hat{\mu}_O(c_{k0})\right] = (1 - \lambda)\, \mu(c_{k0}) + \lambda\, \mu(c_{k1})    (15)
\mathbb{E}\left[\hat{\mu}_O(c_{k1})\right] = (1 - \lambda)\, \mu(c_{k1}) + \lambda\, \mu(c_{k0})    (16)

for some $\lambda \in (0, 1)$, and our estimate is a linear combination of these two mean outcomes.

In Figure 1, we illustrate the bias of the Horvitz-Thompson estimators in our simulated experiments on the networks of 75 Indian villages using data from Banerjee et al. (2013). Our simulation scenarios vary the proportion of true links which are unreported ($q$) and the proportion of falsely observed links[6] ($r$), where the links being added/dropped are completely at random (an assumption we relax later on). Further details of the simulation are provided in Section 3.4, but we show results here as an empirical counterpart to the finding above.

[6] More specifically, the probability of observing a link which should not be observed, scaled by the density of the true network.

Figure 1: Bias of the Horvitz-Thompson estimates for the mean outcome under each exposure condition, for various mismeasurement scenarios (indexed by $q$ and $r$). Intensity of color is used to denote the magnitude of these biases, with upward biases colored red and downward biases colored blue.

First, consider the behavior of our estimators when we fix $r$ and vary the proportion of unreported true edges $q$. As $q$ increases, more links to treated subjects are dropped, causing a larger proportion of indirectly treated subjects to be falsely classified as not indirectly treated. The bias in the Horvitz-Thompson estimators for no exposure ($c_{00}$) and direct exposure ($c_{10}$) increases upwards, as these estimators trend towards the mean outcomes under the indirect and full exposure conditions respectively (in these simulations, $\mu(c_{01}) > \mu(c_{00})$ and $\mu(c_{11}) > \mu(c_{10})$). The bias in the HT estimators for indirect exposure and full exposure also increases upwards, but to a smaller magnitude. Since true links are dropped independently regardless of the treatment status of the subjects involved, the reduced number of subjects observed to be indirectly treated is accounted for by a decrease in the probabilities in the denominator of Equation (11). Rather, the additional bias for these estimators arises solely from low-degree subjects being removed from the estimation procedure (in these simulations, higher degree subjects tend to have higher outcomes). This effect is more pronounced at low $r$, when fewer (if any) false edges are being added and we are more likely to observe zero-degree subjects.

Similar patterns of behavior emerge when we fix $q$ and vary $r$. As $r$ increases, more subjects with no true connections to treated individuals are falsely observed to be indirectly treated. As a result, the HT estimators for indirect exposure ($c_{01}$) and full exposure ($c_{11}$) bias further downwards, towards the mean outcomes for no exposure and direct exposure respectively. Additionally, individuals with zero degree are more likely to be included in the HT estimation procedure, biasing all four estimates downwards. Note this effect is not attenuated at higher $q$, since zero-degree individuals have no true links to drop to begin with.

3 Latent Variable Model for Network Spillovers

In this section we propose a latent variable approach to estimating average treatment effects when the observed network is a noisy representation of the true network of interest. We assume that each true edge is not observed with probability $q$, each non-edge is falsely observed as an edge with probability $r$, and edges are observed/not observed independently of one another. These corruption mechanisms assume the observed edges are a random subset of the true edges and the false edges are a random subset of the non-edges, and notably ignore any differences in saliency between edges. We have relaxed this assumption to include mechanisms which allow the adding/subtracting of edges to depend on observed covariates, though we do not include these results in this draft.
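A minimal sketch of this corruption mechanism, treating $q$ and $r$ as given (our own illustration):

```python
import numpy as np

def corrupt_network(G, q, r, rng):
    """Observed network O from true network G: each true edge is dropped
    independently with probability q; each non-edge appears as a false
    edge independently with probability r."""
    n = G.shape[0]
    keep_true = rng.random((n, n)) >= q    # true edges that survive
    add_false = rng.random((n, n)) < r     # non-edges that appear
    O = np.where(G == 1, keep_true, add_false).astype(int)
    np.fill_diagonal(O, 0)
    return O
```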

3.1 Latent Variable Model

Suppose the true network of interest $G$ is unobserved and we only observe a corrupted network $O$. Furthermore, assume the effects of treatment can be characterized with the four exposure conditions defined in (3). For individual $i$, we observe corrupted exposure condition $\tilde{C}_i$ and in-degree $\tilde{d}_i$. The statistical problem is then to model the relationship between these corrupted statistics and their true, latent, counterparts. Given a distribution over the true exposure condition $C_i$ and in-degree $d_i$, we can use models like those in equations (7) and (8) to estimate mean outcomes for each exposure category. For notational simplicity, let $\tilde{C}$ represent the vector of corrupted exposure conditions, $\tilde{d}$ the vector of corrupted in-degrees, and $C$ and $d$ the corresponding latent terms.

Consider subject $i$, who has exposure condition $C_i$, degree $d_i$, and $t_i$ connections with treated subjects, but for whom we observe exposure condition $\tilde{C}_i$, degree $\tilde{d}_i$, and $\tilde{t}_i$ connections with treated individuals instead. Holding the treatment assignments fixed, we can separately model the number of connections to treated subjects $t_i$ and the number of connections to not-treated subjects $u_i$, from which we can derive the induced exposure conditions. Note this procedure works for any indirect exposure conditions entirely characterized by the number of treated connections and the number of total connections (e.g. the ratio of treated connections) and not just (3). Following Bayes' rule and noting we observe $\tilde{t}_i$ treated connections when $j$ of the actual $t_i$ treated connections are dropped and another $\tilde{t}_i - t_i + j$ false connections to treated individuals are observed,

P(\tilde{t}_i \mid t_i) = \sum_{j=0}^{t_i} B(j;\, t_i,\, q)\; B(\tilde{t}_i - t_i + j;\, n_T - t_i,\, r)    (17)
P(t_i \mid \tilde{t}_i) \propto P(\tilde{t}_i \mid t_i)\, P(t_i)    (18)

where $B(k; m, p)$ is the probability of $k$ successes from a binomial distribution with $m$ attempts and success probability $p$, and $n_T$ is the number of treated subjects (excluding $i$ if $i$ is treated). Similarly, for connections to non-treated subjects,

P(u_i \mid \tilde{u}_i) \propto \left[ \sum_{j=0}^{u_i} B(j;\, u_i,\, q)\; B(\tilde{u}_i - u_i + j;\, n_U - u_i,\, r) \right] P(u_i)    (19)

where $n_U$ is the corresponding number of non-treated subjects.

Both sets of equations require a (prior) model over the number of true connections to treated and un-treated subjects. Assuming no additional information about the structure of the true network, one of the simplest models would be to model the true graph as an Erdos-Renyi graph. Under this model, the number of connections to treated/un-treated subjects could be modeled with binomial distributions. However, in many real-world networks the degree distribution demonstrates extra-binomial variation, and thus in the following sections we prefer to use a beta-binomial model. With a beta-binomial distribution, we can think of each degree as being sampled from a binomial distribution with success probability $p_i$, where $p_i$ is independently sampled from a Beta$(a, b)$ distribution for each draw. Rather than $a$ and $b$, we find it helpful to characterize beta-binomial distributions in terms of an average probability of success $\bar{p} = a/(a+b)$ and an overdispersion parameter $\rho = 1/(a+b+1)$. The variance of a beta-binomial with $m$ attempts would be given by $m\bar{p}(1-\bar{p})\left[1 + (m-1)\rho\right]$, compared to $m\bar{p}(1-\bar{p})$ for a binomial distribution with parameter $\bar{p}$. We leave these parameters to be chosen on an application-by-application basis, noting that the choice of these parameters is more influential when there is high corruption in the network and hence higher uncertainty over the true degrees.[7]

[7] Via simulations, we find that a good choice of $\bar{p}$, which governs the overall density of the true network, is more important for our model to recover unbiased estimates.
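Putting the reconstructed forms of (17)-(18) together with the beta-binomial prior, the posterior over a subject's true number of treated connections can be computed by direct enumeration. A sketch, reparameterizing the Beta$(a,b)$ prior by $(\bar{p}, \rho)$ as above (illustrative names, not the authors' implementation):

```python
import numpy as np
from scipy.stats import binom, betabinom

def posterior_true_treated(t_obs, n_T, q, r, p_bar, rho):
    """Posterior P(t | t_obs) over the true number of treated connections,
    combining the corruption likelihood (17) with a beta-binomial prior (18)."""
    ab = 1.0 / rho - 1.0                    # a + b implied by overdispersion rho
    a, b = p_bar * ab, (1.0 - p_bar) * ab
    prior = betabinom(n_T, a, b).pmf(np.arange(n_T + 1))
    post = np.zeros(n_T + 1)
    for t in range(n_T + 1):
        # j of the t true treated connections are dropped, and
        # t_obs - (t - j) false treated connections are added.
        js = np.arange(t + 1)
        adds = t_obs - t + js
        ok = (adds >= 0) & (adds <= n_T - t)
        lik = np.sum(binom.pmf(js[ok], t, q) * binom.pmf(adds[ok], n_T - t, r))
        post[t] = lik * prior[t]
    return post / post.sum()
```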

Using the above equations, we can express the relationship between the true exposure condition and degree and their observed counterparts:

d_i = t_i + u_i    (20)
C_i = c_{T_i, \mathbb{1}\{t_i > 0\}}    (21)
P(C_i, d_i \mid \tilde{t}_i, \tilde{u}_i) = \sum_{(t_i, u_i) \mapsto (C_i, d_i)} P(t_i \mid \tilde{t}_i)\, P(u_i \mid \tilde{u}_i)    (22)

where the sum in (22) is over the pairs $(t_i, u_i)$ that induce the exposure condition $C_i$ and degree $d_i$.

Equation (22) defines a distribution over the true, unobserved exposure condition and in-degree, conditional on the treatment vector and the number of observed treated and non-treated connections for individual $i$. When coupled with a parametric model (see (7)) for the potential outcomes under each (true) exposure condition $c_k$ and in-degree $d$, we can model the observed outcome $Y_i$ as arising from a mixture of the $f_k(y \mid d; \theta_k)$ with weights corresponding to the probabilities over the unobserved quantities.[8] The log-likelihood of the parameters $\theta$ given the observed data is

\ell(\theta;\, Y) = \sum_{i=1}^{n} \log \left[ \sum_{c_k} \sum_{d} P(C_i = c_k,\, d_i = d \mid \tilde{t}_i, \tilde{u}_i)\; f_k(Y_i \mid d;\, \theta_k) \right]    (23)

Estimates of the parameters are obtained using maximum likelihood estimation via the Expectation-Maximization algorithm, details of which are provided in Section 3.3. Note that likelihood estimation is only justified if the observed outcomes are representative of the potential outcomes under each exposure condition, conditional on the true in-degrees. In general, it is sufficient to have a random treatment mechanism, such as randomly choosing $m$ out of $n$ subjects to treat.[9]

[8] One downside of the Horvitz-Thompson estimator (6) is that it does not model individual potential outcomes and thus is less amenable to likelihood-based approaches.
[9] Stratified sampling based on known covariates could also be addressed by directly introducing these covariates into the model.

Provided an estimate $\hat{\theta}$ of the model parameters, estimating the mean outcome under exposure condition $c_k$ (recall equation (5)) is straightforward and given by the expectation of the potential outcome under exposure $c_k$ for each subject, averaged across the population. We estimate $\mu(c_k)$ with the following plug-in estimator:

\hat{\mu}(c_k) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[ Y_i(c_k) \mid \tilde{t}_i, \tilde{u}_i;\, \hat{\theta}_k \right]    (24)
= \frac{1}{n} \sum_{i=1}^{n} \sum_{d} P(d_i = d \mid \tilde{t}_i, \tilde{u}_i)\; \mathbb{E}\left[ Y_i(c_k) \mid d_i = d;\, \hat{\theta}_k \right]    (25)
= \frac{1}{n} \sum_{i=1}^{n} \left( \hat{\alpha}_k + \hat{\beta}_k\, \mathbb{E}\left[ d_i \mid \tilde{t}_i, \tilde{u}_i \right] \right)    (26)

where the final equality holds under the linear model (9).
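Under the linear model (9), equation (26) only needs the posterior expected true in-degree of each subject. A small sketch (hypothetical names):

```python
import numpy as np

def plugin_means(alpha_hat, beta_hat, expected_degree):
    """Plug-in estimates (24)-(26): average the fitted line for each exposure
    condition over the posterior expected true in-degree E[d_i | observed]."""
    mean_d = np.mean(expected_degree)
    return {k: alpha_hat[k] + beta_hat[k] * mean_d for k in alpha_hat}
```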

3.2 Identification

Before we discuss estimation strategies for our mixture model (23), we will (partially) characterize the conditions under which this model is identifiable. Without model identifiability, estimation may be unstable and parameter estimates uninterpretable. In this section, we assume the $f_k$ arise from a common univariate family of distributions parameterized by $\theta_k$.

In general, mixture models are trivially unidentifiable since relabeling components yields different parameterizations of a model with the same marginal distribution (see Chapter 1.5 of McLachlan and Basford (1988)). This identifiability issue is of particular concern in our setting, where the labeling of the components is inherently meaningful; for example, being unable to disentangle clusters corresponding to no treatment and indirect treatment would leave us unable to estimate the direction of any indirect treatment effect. We are able to leverage the structure from our corruption model and the linear relationships between mixture components with the same exposure condition to prevent such relabeling from occurring.

Following Frühwirth-Schnatter (2006), we use "generic identifiability" to refer to identifiability problems not solved by permuting component labels. Generic identifiability holds for mixtures of Gaussians and many other univariate continuous distributions, with the major exceptions being the binomial and uniform distributions. For the binomial distribution, generic identifiability only holds under a lower bound on the number of trials/observations per subject, dependent on the number of components. See Frühwirth-Schnatter (2006) for a review of generic identifiability issues.

Unfortunately, generic identifiability of the mixture model (23) does not directly follow from the generic identifiability of the family $f_k$, as Hennig (2000) showed in the case of mixtures of linear regression models. For example, in a mixture of simple linear regressions with two distinct covariate values and common variance $\sigma^2$, an equal mixture of two regression lines yields the same model as the equal mixture of the two lines obtained by swapping the component means at one of the covariate values. Observations from a third covariate value would yield generic identifiability. While not immediately applicable to our class of models since in-degree (our covariate) is also latent, Hennig (2000) and Grün and Leisch (2007) define conditions under which mixtures of linear and generalized linear models are generically identifiable.

Next, we explicitly prove the identifiability of our mixture model under the regression model (9) for $f_k$. Results are readily generalizable to other $f_k$ that arise from generically identifiable families, provided that distinct values of the in-degree $d$ would allow for the identification of our model parameters from the component distribution parameters.

Proposition 1.

Let $f_k$ be defined as in (9) and $P(C_i, d_i \mid \tilde{t}_i, \tilde{u}_i)$ as in (22). Assume $q < 1$ and $r < 1$,[10] and that indirect exposure has some effect (i.e. $(\alpha_{00}, \beta_{00}) \neq (\alpha_{01}, \beta_{01})$ and $(\alpha_{10}, \beta_{10}) \neq (\alpha_{11}, \beta_{11})$). Then

\sum_{c_k} \sum_{d} P(C_i = c_k, d_i = d \mid \tilde{t}_i, \tilde{u}_i;\, q, r)\, f_k(y \mid d;\, \theta_k) = \sum_{c_k} \sum_{d} P(C_i = c_k, d_i = d \mid \tilde{t}_i, \tilde{u}_i;\, q', r')\, f_k(y \mid d;\, \theta'_k)    (27)

for all $y$ and all observed $(\tilde{t}_i, \tilde{u}_i)$ implies $(\theta, q, r) = (\theta', q', r')$, as long as there exist two distinct observed degrees $\tilde{d}$ such that we have subjects under each direct treatment status with observed degree $\tilde{d}$ and, of these subjects, some have treated connections while others do not.

[10] Both of these edge cases are relatively uninteresting, as $q = 1$ means all true edges are unobserved and $r = 1$ means all non-edges are falsely observed.

Proof.

It suffices to show identifiability of the parameters within each direct treatment arm, since we assume direct treatment status can always be accurately ascertained. The exposure conditions $c_{00}$ and $c_{01}$ are only corrupted with one another, as are $c_{10}$ and $c_{11}$.

Let us begin with the most general case, when both $q, r > 0$. In this situation, the probabilities $P(C_i, d_i \mid \tilde{t}_i, \tilde{u}_i)$ are positive over all feasible true exposure conditions and degrees, regardless of the pair of observed degrees $(\tilde{t}_i, \tilde{u}_i)$. The only restrictions on the support of these probabilities are that, under no indirect treatment, the degree cannot be larger than the number of non-treated subjects (otherwise there would have to exist a connection to a treated subject), and the degree must be at least one for an individual to be indirectly treated.

At the other extreme, when there is no corruption ($q = 0$ and $r = 0$), the true exposure condition and degree match their observed counterparts exactly: the support is limited to $d_i = \tilde{d}_i$, with indirect exposure if $\tilde{t}_i \geq 1$ and no indirect exposure if $\tilde{t}_i = 0$. When exactly one kind of corruption exists, the support of these probabilities is limited, but to a lesser extent than when neither type of corruption exists. When $q > 0$ but $r = 0$, true edges can be dropped but all observed edges also exist in the true network. Namely, any observed connection to a treated subject must exist in the true graph. For subjects with at least one of these connections ($\tilde{t}_i \geq 1$), the support is limited to true degrees $d_i \geq \tilde{d}_i$ under indirect exposure. If instead we have $\tilde{t}_i = 0$, the support is positive for no indirect exposure when $d_i \geq \tilde{d}_i$ and for indirect exposure when $d_i > \tilde{d}_i$. Lastly, when $q = 0$ and $r > 0$, the observed connections are a superset of the links in the true graph. Thus, when we observe no connections to treated subjects ($\tilde{t}_i = 0$), the support is only positive for no indirect exposure and $d_i \leq \tilde{d}_i$. When such a connection is observed, the support is positive for no indirect exposure and $d_i \leq \tilde{u}_i$, or indirect exposure and $d_i \leq \tilde{d}_i$.

Case 1: $q > 0$, $r > 0$, and $\beta_{01} \neq 0$

For any pair of observed degrees $(\tilde{t}, \tilde{u})$, the LHS is a mixture of normal distributions that includes distinct components with means $\alpha_{01} + \beta_{01} d$ and variance $\sigma^2$ for every feasible true degree $d$ under indirect exposure, along with at most a smaller number of other mixture components corresponding to the $c_{00}$ terms. Following the generic identifiability of finite normal mixtures, the same component normals must exist on the RHS, with the same weights. For there to be at least as many distinct components on the RHS for every pair of observed degrees, we must have $q' > 0$ and $r' > 0$. On the LHS, we have components which are evenly spaced $\beta_{01}$ apart, while on the RHS we have components evenly spaced $\beta'_{01}$ apart. Since there are fewer other components on either side, these evenly spaced components must match. This leads to two possibilities: we must have either $\alpha_{01} = \alpha'_{01}$ and $\beta_{01} = \beta'_{01}$, or a match that shifts the degree indexing. The latter cannot occur due to would-be inconsistencies in the weights. For example, consider the weights for the component with mean $\alpha_{01} + \beta_{01} d$ under this scenario. On the LHS, the weight would correspond to the probability of true degree $d$ under indirect exposure, while on the RHS, the weight would correspond to the probability of a much larger true degree. The former quantity changes with $\tilde{t}$ when holding the total observed degree fixed, since the observed treated degree affects the probability of a true treated connection, but the latter does not, since for very large true degree we will always have a treated connection. Thus, we have $\alpha_{01} = \alpha'_{01}$ and $\beta_{01} = \beta'_{01}$.

We can then use our identification of the $c_{01}$ components to isolate the remaining, unexplained components, which must correspond to $c_{00}$. If $\beta_{00} \neq 0$, the LHS has multiple remaining components, while if $\beta_{00} = 0$, the LHS has one remaining component. The same holds for $\beta'_{00}$ and the RHS. Thus, when $\beta_{00} = 0$, we must also have $\beta'_{00} = 0$ and $\alpha_{00} = \alpha'_{00}$. On the other hand, if $\beta_{00} \neq 0$, both sides consist of multiple components, spaced $\beta_{00}$ and $\beta'_{00}$ apart respectively. We must have either $\alpha_{00} = \alpha'_{00}$ and $\beta_{00} = \beta'_{00}$, or a match that shifts the degree indexing. Following similar logic as above for the $c_{01}$ components, we can use would-be inconsistencies in the weights to eliminate the second scenario. Namely, consider the weights for the zero-degree component, which corresponds to the probability of true degree zero for the LHS and of a positive true degree for the RHS. For fixed total observed degree, the former is unaffected by varying $\tilde{t}$, as all observed connections regardless of treatment status must be falsely observed, while the treatment status of the observed connections will affect the probability of having a treated connection given a positive number of true connections.

Case 2: $q > 0$, $r > 0$, and $\beta_{01} = 0$

Next, let us consider the scenario when we have $\beta_{00} \neq 0$ but $\beta_{01} = 0$. For any pair of observed degrees $(\tilde{t}, \tilde{u})$, the LHS is a normal mixture including components with means $\alpha_{00} + \beta_{00} d$ and $\alpha_{01}$ and variance $\sigma^2$ for every feasible $d$. Since the number of components does not change for any pair of observed degrees, we have $q' > 0$, $r' > 0$, and $\beta'_{01} = 0$. Following the same logic used in case 1 but reversing the order in which we consider the $c_{00}$ and $c_{01}$ components, we can show $\theta = \theta'$.

The alternate scenario involves the case $\beta_{00} = 0$ and $\beta_{01} = 0$. Since we assume there is a non-zero indirect treatment effect ($\alpha_{00} \neq \alpha_{01}$), the LHS consists of a mixture of two normals with means $\alpha_{00}$ and $\alpha_{01}$. Following the generic identifiability of normal mixtures, the RHS must consist of two normals with the same means. In order for the RHS to have two mixture components regardless of observed degree, we must have $\beta'_{00} = 0$ and $\beta'_{01} = 0$ as well as non-zero corruption in both $q'$ and $r'$. For $q' = 0$, $\tilde{t} = 0$ would yield just one mixture component, and similarly with $\tilde{t} \geq 1$ for $r' = 0$. Thus, either $\alpha_{00} = \alpha'_{00}$ and $\alpha_{01} = \alpha'_{01}$, or $\alpha_{00} = \alpha'_{01}$ and $\alpha_{01} = \alpha'_{00}$. If the latter is the case, the weight of the $\alpha_{00}$ component is the probability of no indirect treatment on the LHS and the probability of indirect treatment on the RHS. These weights must be the same for any pair of $(\tilde{t}, \tilde{u})$. However, when holding $\tilde{u}$ fixed and increasing the number of observed connections to treated individuals $\tilde{t}$, the weight on the LHS decreases while the weight on the RHS increases. Thus, we must have $\alpha_{00} = \alpha'_{00}$ and $\alpha_{01} = \alpha'_{01}$.

Case 3: $q > 0$ and $r = 0$

First, consider an observation with at least one observed connection to a treated subject ($\tilde{t} \geq 1$). The mixture on the LHS consists of components with means corresponding to $\alpha_{01} + \beta_{01} d$ for any true degree $d$ satisfying $d \geq \tilde{d}$. If $\beta_{01} = 0$, the LHS will be just one component, while if $\beta_{01} \neq 0$, the LHS will have a distinct component for each such degree.

Suppose for now the latter is true. Then increasing the total observed degree decreases the number of components on the LHS. Changing the total observed degree has no effect on the number of distinct components when $q' > 0$ and $r' > 0$ or when $\beta'_{01} = 0$, while the case $q' = 0$ and $r' > 0$ would imply an increase in the number of distinct components. Thus, to match the behavior on the RHS, we must have $q' > 0$ and $r' = 0$. For the components on both sides to have the same set of means, we must have either $\alpha_{01} = \alpha'_{01}$ and $\beta_{01} = \beta'_{01}$, or a match that shifts the degree indexing. We can again invalidate the second case by examining would-be inconsistencies in the weights, but in this case we can also simply note that the latter scenario cannot be simultaneously valid across multiple choices of $\tilde{d}$. Having established $\alpha_{01} = \alpha'_{01}$ and $\beta_{01} = \beta'_{01}$, we can consider observations with $\tilde{t} = 0$ and isolate the remaining components on the LHS, of which there would be either one (if $\beta_{00} = 0$) or several (if $\beta_{00} \neq 0$). Matching these components on the RHS across multiple values of $\tilde{d}$ rules out the potential case where shifted parameters would yield the same mixture.

Let us now return to the case where $\beta_{01} = 0$. While we could still find $q' > 0$ and $r' = 0$, examining the number of components when $\tilde{t} \geq 1$ is not sufficient to imply $\alpha_{01} = \alpha'_{01}$ and $\beta_{01} = \beta'_{01}$. However, we can attempt to ascertain whether or not this must be the scenario by examining observations with $\tilde{t} = 0$. If $\beta_{00} \neq 0$, there would be a distinct component on the LHS for each feasible degree. A decreasing number of components for these observations as the observed degree increases is only consistent with $q' > 0$ and $r' = 0$. From here, we can use the equal spacing of these components as well as the structure imposed by the weights to show $\theta = \theta'$.

Lastly, when both $\beta_{00} = 0$ and $\beta_{01} = 0$, we observe one mixture component with mean $\alpha_{01}$ when $\tilde{t} \geq 1$ and two mixture components with means $\alpha_{00}$ and $\alpha_{01}$ when $\tilde{t} = 0$. Returning to the logic used in the counterpart scenario in case 2, the LHS can only be matched when $q' > 0$ and $r' = 0$. Then the RHS will have one mixture component when $\tilde{t} \geq 1$ and two components when $\tilde{t} = 0$, and we will have $\theta = \theta'$.

Case 4: $q = 0$ and $r > 0$

This case follows identical logic as case 3, but switching the roles of the $c_{00}$ and $c_{01}$ components. Namely, observations with $\tilde{t} = 0$ will isolate the $c_{00}$ components, which in turn can be used to inform observations with $\tilde{t} \geq 1$ to match the $c_{01}$ components.

Case 5: $q = 0$ and $r = 0$

For any pair of observed degrees $(\tilde{t}, \tilde{u})$, the LHS will consist of a single normal distribution. If $q' > 0$ or $r' > 0$, this behavior could only arise if $\beta'_{00} = \beta'_{01} = 0$ and $\alpha'_{00} = \alpha'_{01}$. However, we require a non-zero indirect treatment effect, so we must have $q' = 0$ and $r' = 0$. Observations from two distinct values of $\tilde{d}$ for each of $\tilde{t} = 0$ and $\tilde{t} \geq 1$ will uniquely identify the model parameters $(\alpha_{00}, \beta_{00})$ and $(\alpha_{01}, \beta_{01})$ respectively.

3.3 Estimation

Maximizing the log-likelihood (23) with respect to the parameters $\theta$ cannot be done in closed form due to the summations inside the logarithmic terms. However, if we had directly observed the latent variables $(C_i, d_i)$, the log-likelihood of the parameters given $Y$, $C$, and $d$ would be given by

\ell(\theta;\, Y, C, d) = \sum_{i=1}^{n} \left[ \log P(C_i, d_i \mid \tilde{t}_i, \tilde{u}_i) + \log f_{C_i}(Y_i \mid d_i;\, \theta_{C_i}) \right]    (28)

This would be substantially easier to work with, due to the lack of summation inside the logarithmic terms. The EM algorithm (Dempster et al., 1977) is a well-established technique for maximum likelihood estimation in the presence of latent variables that leverages this disparity between the two log-likelihood expressions. Given some set of initial parameter values $\theta^{(0)}$, the algorithm alternates between estimating the posterior distribution of the latent variables for each subject given the current parameter values (E-step) and updating the parameter values given these posterior probabilities (M-step). Explicitly working with the latent variables in the M-step yields simpler maximization problems. Each iteration of the algorithm increases the log-likelihood, leading to a local optimum, and the algorithm is run from multiple initialization values in order to achieve a higher quality final estimate $\hat{\theta}$.

Suppose at iteration $s$ we have parameter estimates $\theta^{(s)}$. In the E-step, we compute the posterior probabilities over the latent variables using the current parameter estimates. These probabilities, or "responsibilities," are given by

\gamma_i^{(s)}(c_k, d) = P(C_i = c_k,\, d_i = d \mid Y_i, \tilde{t}_i, \tilde{u}_i;\, \theta^{(s)})    (29)
= \frac{P(C_i = c_k,\, d_i = d \mid \tilde{t}_i, \tilde{u}_i)\; f_k(Y_i \mid d;\, \theta_k^{(s)})}{\sum_{c_l} \sum_{d'} P(C_i = c_l,\, d_i = d' \mid \tilde{t}_i, \tilde{u}_i)\; f_l(Y_i \mid d';\, \theta_l^{(s)})}    (30)

In the M-step, we use these responsibilities to maximize the expectation of the complete-data log-likelihood (28) under these posterior probabilities:

\theta^{(s+1)} = \arg\max_{\theta} \sum_{i=1}^{n} \sum_{c_k} \sum_{d} \gamma_i^{(s)}(c_k, d)\, \log f_k(Y_i \mid d;\, \theta_k)    (31)
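A compact sketch of the full EM loop under the linear model (9) follows. Here the mixture components are enumerated as (condition, degree) pairs and W holds the corruption-model weights from (22); all names and the fixed iteration count are our own illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def em_fit(Y, W, comp_cond, comp_deg, n_cond=4, n_iter=200):
    """EM for the mixture (23) under the linear model (9).

    W[i, m] = P(C_i = comp_cond[m], d_i = comp_deg[m] | observed data),
    i.e. the corruption-model weights from (22), with mixture components
    m enumerated as (condition, degree) pairs.
    """
    n, M = W.shape
    alpha = np.zeros(n_cond)
    beta = np.zeros(n_cond)
    sigma2 = Y.var()
    for _ in range(n_iter):
        # E-step: responsibilities (29)-(30).
        mu = alpha[comp_cond] + beta[comp_cond] * comp_deg          # (M,)
        dens = norm.pdf(Y[:, None], mu[None, :], np.sqrt(sigma2))   # (n, M)
        R = W * dens
        R /= R.sum(axis=1, keepdims=True)
        # M-step (31): weighted least squares of Y on degree per condition,
        # with a shared residual variance.
        resid_ss = 0.0
        for k in range(n_cond):
            m = comp_cond == k
            w, x = R[:, m], comp_deg[m]
            S0, S1, S2 = w.sum(), (w * x).sum(), (w * x**2).sum()
            T0, T1 = (w * Y[:, None]).sum(), (w * x * Y[:, None]).sum()
            alpha[k], beta[k] = np.linalg.solve([[S0, S1], [S1, S2]], [T0, T1])
            resid_ss += (w * (Y[:, None] - alpha[k] - beta[k] * x) ** 2).sum()
        sigma2 = resid_ss / n
    return alpha, beta, sigma2
```

Multiple restarts from different initial values, as described above, would be layered on top of this loop; each restart keeps the parameters with the highest achieved log-likelihood.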