Estimating Causal Effects of Non-Randomized HIV Prevention Interventions with Interference in Network-based Studies among People who Inject Drugs

Evaluating causal effects in the presence of interference is challenging in network-based studies of hard-to-reach populations. Like many such populations, people who inject drugs (PWID) are embedded in social networks and often exert influence on others in their network. In our setting, the study design is observational with a non-randomized network-based HIV prevention intervention. Information is available on each participant and on their connections, which confer possible shared HIV risk through injection and sexual behaviors. We consider two inverse probability weighted (IPW) estimators to quantify the population-level effects of non-randomized interventions on subsequent health outcomes. We demonstrate that these two IPW estimators are consistent and asymptotically normal, and we derive a closed-form estimator of the asymptotic variance that allows for overlapping interference sets (groups of individuals within which interference is assumed possible). A simulation study was conducted to evaluate the finite-sample performance of the estimators. We analyzed data from the Transmission Reduction Intervention Project, which ascertained a network of PWID and their contacts in Athens, Greece, from 2013 to 2015. We evaluated the effects of community alerts on HIV risk behavior in this observed network, where the links between participants were defined by using substances or having unprotected sex together. In the study, community alerts were distributed to inform people of recent HIV infections among individuals in close proximity in the observed network. The estimates of the risk differences for both IPW estimators demonstrated a protective effect. The results suggest that HIV risk behavior can be mitigated by exposure to a community alert when an increased risk of HIV is detected in the network.




1 Introduction

The objective of this work is to evaluate causal effects in the presence of dissemination (also known as interference or spillover) where usual assumptions, such as partial or clustered interference, may no longer hold. This proves to be a challenging problem in network-based studies of hidden or hard-to-reach populations, such as people who inject drugs (PWID), where participants are frequently recruited through contact tracing. Worldwide in 2011, an estimated 10% of new HIV infections occurred because of injection drug use, and this proportion rose to 30% outside Africa [prejean2011estimated, lansky2014estimating, mathers2008global]. In Greece through 2010, there were only a few sporadic cases of HIV transmission among PWID; the HIV epidemic was traditionally concentrated among men having sex with men. From 2002 to 2010, only between 11 and 19 HIV cases were reported annually among PWID, representing 2% to 4% of newly diagnosed HIV infections per year. In 2011, the number of reported HIV cases among PWID increased 16-fold from the number reported in 2010, to a total of 260 cases. The emergence of the HIV outbreak among PWID in Athens coincided with an economic recession, and this temporal ordering suggests that the recession may have played a causal role in the outbreak [econrecession2013]. Investigation of the outbreak demonstrated that clustered HIV transmission among PWID was rare until 2009. Starting in 2010, a large proportion of HIV sequences from newly diagnosed PWID could be grouped into PWID-specific phylogenetic clusters, indicating that parenteral transmission with contaminated syringes or other injecting equipment was now occurring in this population. Prior to 2011, prevention and harm reduction services, such as medication for opioid use disorder and syringe exchange–distribution programs, were available; however, access to these services remained low among PWID.
Most of the newly diagnosed PWID (about 70%) in 2011 were residents of Athens, suggesting that the outbreak was also geographically localized [aristotle, nikolopoulous2015bigevent].

Effective interventions were urgently needed to prevent further transmission in Athens. The Transmission Reduction Intervention Project (TRIP) was a successful attempt to recruit and intervene in this population by contact tracing the injection and sexual networks of recently infected PWID. The program then referred recently infected people to HIV treatment and care, both to protect their own health and to reduce onward transmission of HIV to others, particularly during the early stage of HIV infection, which is known to be acutely infectious. This network design can also be used to investigate the connections or ties among people who are infected and uninfected, and thus to address questions about why certain groups of uninfected people remain that way despite having risk network ties to people who both have high viral loads and engage in risky behavior [williams2018pockets]. The TRIP recruitment strategy found more recently infected PWID than other strategies, such as respondent-driven sampling or venue-based recruitment. The findings suggested that the strategic use of network-based approaches can accelerate seeking, testing, and treating recently infected PWID. Moreover, reducing viral loads as early as possible is likely to decrease the expected number of transmissions in a community [nikolopoulos2016network].

Public health interventions in these populations often have disseminated effects, also known as indirect or spillover effects [spillover2017, diffusion2020]. Recent work demonstrated that there are disseminated effects of HIV behavioral interventions, suggesting that intervening among highly connected individuals may maximize spillover benefits to others [Rewleye033759]. Akin to other populations, PWID are embedded in social networks and communities (e.g., injection drug, non-injection drug, and sexual risk networks) in which they possibly exert influence on other members [hayes2000design, ghosh2017social]. This influence can be measured as a disseminated effect among individuals who were not exposed themselves but possibly receive intervention benefits from their social connections' exposure to the intervention. In PWID networks, interventions (e.g., educational training about HIV risk reduction, medical interventions such as pre-exposure prophylaxis, or treatment as prevention) may have disseminated effects, and intervention effects frequently depend on the network structure and intervention coverage levels. Disseminated effects may be larger in magnitude than direct effects (i.e., the effect of receiving the intervention while holding the exposure of other individuals fixed), suggesting that an intervention has substantial effects in the network beyond those exposed themselves [buchanan2018assessing].

The current methodologies used to evaluate direct and disseminated effects among members of hidden or hard-to-reach populations remain limited. In particular, relatively few methodological approaches for observational network-based studies have been developed, and methods that incorporate the observed connections in the underlying network structure are needed to ensure that full intervention impacts are better understood. Recent methodological developments have relaxed the no-interference assumption and allowed for interference within clusters, known as partial interference [sobel2006randomized, hong2006evaluating, hudgens2008toward, tchetgen2012causal, liu2014large]. In partial interference approaches, a clustering of observations (e.g., study clusters, provider practices, or geographic locations) is used to define the interference sets, allowing for interference within but not across clusters; however, the information on shared connections or ties (i.e., edges) within a particular cluster is typically not measured or utilized in the analytical approach [aronow2017samii]. Another approach defines interference by spatial proximity or network ties [liu2016inverse, forastiere2016identification], allowing for overlapping interference sets. In [liu2016inverse], an IPW estimator was proposed for a generalized interference set that allowed for overlap between interference sets; however, the asymptotic variance was estimated under the assumption of partial interference defined by larger groupings or clusters of participants in the study. In [forastiere2016identification], a subclassification estimator and a generalized propensity score (GPS) were used to quantify effects, and a bootstrapping procedure with resampling at the individual level or the cluster level was used to quantify the variance.
However, these approaches either rely on partial interference defined by larger clusters to derive estimators of the variance or resort to bootstrapping with resampling at the unit level or the cluster level. In practice, ignoring the overlapping interference sets while estimating the variance can lead to inaccurate inference, and resampling approaches, particularly in a network setting, can be computationally intensive.

While previous work allows for overlapping interference sets for the point estimators, the asymptotic variances were estimated under the assumption of partial interference or using bootstrapping techniques. Our paper addresses an important gap by developing inverse probability weighted estimators and deriving a closed-form variance estimator that allows for overlapping interference sets, often leading to a more statistically efficient estimator in network-based studies because it uses additional information on connections between individuals. We propose two inverse probability weighted (IPW) estimators where the interference set is defined as the set of the individual's nearest neighbors within a sociometric network; that is, every individual with whom the participant shares a direct connection in the network. The first IPW estimator is a novel extension, to a sociometric network-based study setting, of the estimator in [liu2016inverse], which was originally developed for clustered observational studies. Specifically, the partial interference assumption is relaxed such that interference sets are uniquely defined by nearest neighbors for each person. The second IPW estimator uses a generalized propensity score developed by [forastiere2016identification]; however, we propose a weighted estimator instead of a stratified estimator for comparison to the first IPW estimator. For both estimators, we use nearest neighbors as the interference sets and use this structure to calculate a novel closed-form variance estimator by applying M-estimation.

The rest of the paper is structured as follows. In Section 2, we introduce the TRIP study design. In Sections 3 and 4, we define the notation and assumptions for the nearest-neighbor setting and the estimands of interest. In Section 5, we provide detailed definitions of the two IPW estimators, demonstrate that each estimator is asymptotically normal, and obtain a closed-form estimator of each variance. In Section 6, a simulation study demonstrates the finite-sample performance of both estimators. The methods are then used to assess the direct and disseminated effects of community alerts on HIV risk behavior in the Transmission Reduction Intervention Project (TRIP), a sociometric network study of PWID and their contacts conducted from 2013 to 2015 in Athens, Greece. We discuss limitations of this approach and next steps for methodological work to quantify causal effects in network-based studies in Section 7.

2 TRIP Study Design

The Transmission Reduction Intervention Project (TRIP) recruited adults who were recently diagnosed with HIV and their potential HIV risk partners through sexual and injection routes of transmission. TRIP used contact network tracing and venue recruitment methods to locate those who were at risk for recent infection with HIV based on their proximity of connection to other recently infected individuals. PWID who were participants in the ARISTOTLE study at HIV testing centers in Athens were initially recruited into the TRIP study if they were found to be recently infected with HIV [aristotle]. Each recently diagnosed individual was asked to identify their recent sexual and drug use partners. These direct contacts were then recruited and asked to identify their sexual and drug use partners, who were also recruited and linked back to other individuals in the study. If any of these partners were identified as recently infected with HIV, then their contacts and the contacts of their contacts (i.e., two waves of contact tracing) were recruited as well and linked back to the network. This information was used to create a final observed network in which each recruited individual is connected to all other individuals who named them as a contact or were named as a contact by them, regardless of recruitment order. Participants were interviewed using a questionnaire covering demographics, sexual and injection behaviors and partners in the prior 6 months, drug treatment, and antiretroviral treatment. This resulted in a network consisting of individuals recently diagnosed with HIV and their potential risk partners [nikolopoulos2016network]. In addition to HIV testing, the study provided access to treatment as prevention (TasP) and referrals for medical care, and distributed community alerts to inform members of the community about temporary increases in the risk of HIV acquisition.
These alerts were paper flyers provided to participants and posted in locations frequented by members of the local PWID community. Approximately 6 months later, participants were followed up to ascertain demographics, risk behaviors, and substance use through interviews, and HIV serostatus, timing of HIV infection, and HIV disease markers, including HIV viral load, through phylogenetic techniques. For this study, we used data from the Athens, Greece site, which were collected from 2013 to 2015 during the HIV outbreak that began following the economic recession in 2008 [nikolopoulous2015bigevent, williams2018pockets]. Figure 1 represents the TRIP network with 25 of 217 participants exposed to the community alerts. The network characteristics and distribution of attributes are summarized in Table 1. The TRIP network has average degree 3.35 (SD=2.75) and density 0.0155. The transitivity, which measures the tendency of the nodes to cluster together, is 0.25. High transitivity means that the network contains communities or groups of nodes that are densely connected internally. The assortativity, which quantifies the extent to which connected nodes share similar properties, is 0.24.

Figure 1: The TRIP network. Black nodes represent the participants who were exposed to community alert and gray nodes represent participants who were not exposed.
Network characteristics
  Nodes: 217
  Edges: 363
  Average degree (SD): 3.35 (2.75)
  Density: 0.0155
  Transitivity: 0.25
  Assortativity: 0.24
Community alert
  Exposed: 25 (11.5%)
  Not exposed: 192 (88.5%)
HIV status
  Positive: 113 (52.1%)
  Negative: 104 (47.9%)
Date of first interview
  Before ARISTOTLE ended: 105 (48.4%)
  After ARISTOTLE ended: 112 (51.6%)
Education
  Primary school or less: 63 (29%)
  High school (first 3 years): 69 (32%)
  High school (last 3 years): 53 (24%)
  Post high school: 32 (15%)
Employment status
  Employed: 33 (15.2%)
  Unemployed; looking for work: 54 (24.9%)
  Can't work; health reason: 102 (47%)
  Other: 28 (12.9%)
Shared injection equipment in last 6 months
  Yes: 58 (26.7%)
  No: 159 (73.3%)
Outcome: sharing injection equipment at the 6-month visit
  Yes: 126 (58%)
  No: 91 (42%)

Note: The transitivity measures the density of triads in a network.
The assortativity quantifies the extent to which connected nodes share similar properties.

Table 1: Descriptive statistics of TRIP network characteristics and attribute variables, n (%), after excluding isolates and 60 participants (21%) who were lost to follow-up before their six-month visit.
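The average degree and density in Table 1 follow directly from the reported node and edge counts. The short check below is only an arithmetic illustration; it assumes an undirected network with no self-ties or duplicate edges, and the variable names are ours.

```python
# Arithmetic check of two Table 1 entries from the reported counts.
# Assumption: undirected network without self-ties or duplicate edges.
n_nodes = 217
n_edges = 363

# Each undirected edge adds one to the degree of two nodes.
average_degree = 2 * n_edges / n_nodes

# Density is the share of the n(n-1)/2 possible ties that are present.
density = n_edges / (n_nodes * (n_nodes - 1) / 2)

print(round(average_degree, 2), round(density, 4))
```

Transitivity and assortativity cannot be recovered from the counts alone because they depend on the full edge list.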

3 Notation and Assumptions

We employ a potential outcomes framework for causal inference and assume the sufficient conditions for valid estimation of causal effects, which have been well described previously [ogburn2014causal, liu2016inverse, forastiere2016identification]. However, we relax the no-dissemination (no-interference) assumption [rubin1980]. In our setting, we evaluate the effect of a non-randomized intervention on a subsequent outcome in an observed network, where information is available on the nodes (i.e., each participant) and their edges (i.e., HIV risk connections through sexual or injection behavior). We focus on assessing the effect of being exposed to community alerts on HIV risk behavior (sharing injection equipment) at the 6-month follow-up. Because the network-based study design of TRIP recruited at least one wave of contacts for each participant, we anticipate that there could be dissemination or spillover between two individuals connected by an edge (i.e., possible influence of a neighbor's exposure on another individual's outcome). Based on these connections, we assume that smaller groupings or neighborhoods for each individual can be identified in the data. Following [forastiere2016identification] and [hudgens2008toward], we make the neighborhood interference assumption (NIA). The NIA is a network analog to the partial interference assumption for clusters [sobel2006randomized, hudgens2008toward]; however, partial interference does not assume a unique interference set for each participant; instead, the set is the same for all participants in a cluster. The NIA applies to neighborhoods defined uniquely for each participant in the study, so the connections between individuals that link neighborhoods can now be explicitly considered.
This implies that the potential outcomes of a participant depend only on their own exposure and that of their nearest neighbors and not on the exposures of others outside the neighborhood in the network, positing that the disseminated effects go as far as the nearest neighbors. In other words, if the exposures of an individual and its neighbors are held fixed, then changing the exposures of others outside the neighborhood does not change the outcome for the individual. We assume that the neighborhoods are fixed and known and that the observed study network is complete and the baseline covariates are independent between individuals.

Consider a finite population of $n$ individuals where each individual self-selects their exposure to a study intervention. Let $i = 1, \ldots, n$ denote each participant in the study and let $A_i$ be the binary exposure of participant $i$, with $A_i = 1$ if exposed and $A_i = 0$ otherwise. Let $Z_i$ denote the vector of covariates for participant $i$. These participants are connected through an observed network that can be represented by a binary adjacency matrix $E = [e_{ij}]_{i,j=1}^{n}$, with $e_{ij} = 1$ if participants $i$ and $j$ share an edge or connection, and $e_{ij} = 0$ otherwise. We assume $e_{ii} = 0$. Denote the nearest neighbors of participant $i$ by $\mathcal{N}_i = \{j : e_{ij} = 1\}$. The degree of individual $i$ (or number of nearest neighbors) is denoted as $d_i = \sum_{j=1}^{n} e_{ij}$. We denote the vector of intervention exposures for the nearest neighbors of participant $i$ as $A_{\mathcal{N}_i}$. In this setting, the outcome of participant $i$ depends not only on their own exposure, but also on the vector of their neighbors' exposures (NIA). In other words, we let $\mathcal{N}_i$ be the interference set of individual $i$, in which the neighbors' exposures may affect the outcome of individual $i$. We also denote the vector of pre-exposure covariates for the nearest neighbors of participant $i$ as $Z_{\mathcal{N}_i}$. Denote realizations of the exposures $A_i$ by $a_i$ and $A_{\mathcal{N}_i}$ by $a_{\mathcal{N}_i}$. Similarly, denote realizations of the covariates $Z_i$ by $z_i$ and $Z_{\mathcal{N}_i}$ by $z_{\mathcal{N}_i}$.
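To make the notation concrete, the following minimal sketch builds the adjacency matrix, nearest-neighbor sets, degrees, and a neighbor exposure vector for a small network. The four-node edge list and the exposure values are hypothetical and serve only as an illustration.

```python
# Toy illustration of the network notation; the edge list is hypothetical.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Binary, symmetric adjacency matrix with a zero diagonal (no self-ties).
adjacency = [[0] * n for _ in range(n)]
for i, j in edges:
    adjacency[i][j] = 1
    adjacency[j][i] = 1

# Nearest neighbors (the interference set) and degree of each participant.
neighbors = {i: [j for j in range(n) if adjacency[i][j] == 1] for i in range(n)}
degree = {i: len(neighbors[i]) for i in range(n)}

# Hypothetical exposures for all participants, and the exposure vector
# of the nearest neighbors of participant 2.
exposure = [1, 0, 0, 1]
neighbor_exposure_of_2 = [exposure[j] for j in neighbors[2]]
```

Participant 2 here has three nearest neighbors, so under the NIA its outcome may depend on its own exposure and on those three exposures only.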

Let $y_i(a_i, a_{\mathcal{N}_i})$ denote the potential outcome of individual $i$ if they received intervention $a_i$ and their nearest neighbors received the vector of interventions denoted by $a_{\mathcal{N}_i}$. Let $Y_i = y_i(A_i, A_{\mathcal{N}_i})$ denote the observed outcome, which holds by consistency. The potential outcomes are assumed to be deterministic functions and the observed outcomes are assumed to be random variables. In our study setting, $A_i$ represents an indicator for whether participant $i$ is exposed to community alerts, and the pre-exposure covariates $Z_i$ include HIV status, date of first interview, education, employment status, and report of shared drug use equipment (needles, syringes) in the last 6 months. The observed outcome $Y_i$ is the status of sharing injection equipment at the 6-month visit.

In this paper, we define average potential outcomes using a Bernoulli allocation strategy [tchetgen2012causal], where $\alpha$ represents the counterfactual scenario in which participants in $\mathcal{N}_i$ receive the exposure with probability $\alpha$; we refer to this parameter as the intervention coverage in the neighborhood. Note that we are not assuming that $A_1, \ldots, A_n$ are independent Bernoulli random variables; rather, this distribution of exposure is used to define the counterfactuals. This is essentially like standardizing the observed exposure vectors to a study population in which the exposure assignment mechanism follows a Bernoulli distribution with probability $\alpha$. This allows stochasticity in the intervention assignment for individuals who are possibly members of more than one neighborhood. Recall that in the observed data, we are assuming information was collected on a sociometric network with non-randomized exposure to the intervention. Let $\pi(a_{\mathcal{N}_i}; \alpha) = \alpha^{\sum_{j \in \mathcal{N}_i} a_j} (1 - \alpha)^{d_i - \sum_{j \in \mathcal{N}_i} a_j}$ denote the probability of the neighborhood of individual $i$ receiving exposure $a_{\mathcal{N}_i}$ under allocation strategy $\alpha$. Let $\pi(a_i; \alpha) = \alpha^{a_i} (1 - \alpha)^{1 - a_i}$ denote the probability of individual $i$ receiving exposure $a_i$, and let $\pi(a_i, a_{\mathcal{N}_i}; \alpha) = \pi(a_i; \alpha)\, \pi(a_{\mathcal{N}_i}; \alpha)$ denote the probability of individual $i$ together with their nearest neighbors receiving the set of exposures $(a_i, a_{\mathcal{N}_i})$.
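The allocation probabilities described in this paragraph can be sketched as below. The function names are ours, and the product form of the joint probability is the one implied by independent Bernoulli draws with a common probability; it is an illustration rather than the authors' code.

```python
from itertools import product


def neighborhood_allocation_prob(neighbor_exposures, alpha):
    """Probability of a neighbor exposure vector when each neighbor is
    exposed independently with probability alpha (Bernoulli allocation)."""
    k = sum(neighbor_exposures)   # number of exposed neighbors
    d = len(neighbor_exposures)   # degree of the individual
    return alpha ** k * (1 - alpha) ** (d - k)


def individual_allocation_prob(a, alpha):
    """Probability that the individual receives exposure a (0 or 1)."""
    return alpha if a == 1 else 1 - alpha


def joint_allocation_prob(a, neighbor_exposures, alpha):
    """Probability of the individual's exposure together with the
    neighbors' exposure vector under the same allocation strategy."""
    return individual_allocation_prob(a, alpha) * neighborhood_allocation_prob(
        neighbor_exposures, alpha
    )


# Sanity check: over all 2^3 exposure vectors of three neighbors,
# the allocation probabilities sum to one.
total = sum(
    neighborhood_allocation_prob(v, 0.25) for v in product([0, 1], repeat=3)
)
```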

4 Estimands

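We follow the notation of [liu2016inverse] to define the estimands. Define $\bar{y}_i(a, \alpha) = \sum_{a_{\mathcal{N}_i}} y_i(a_i = a, a_{\mathcal{N}_i})\, \pi(a_{\mathcal{N}_i}; \alpha)$ to be the average potential outcome for individual $i$ under allocation strategy $\alpha$ and treatment $a$, where the summation is over all possible values of $a_{\mathcal{N}_i}$. Averaging over all individuals, we define the population average potential outcome as $\bar{y}(a, \alpha) = n^{-1} \sum_{i=1}^{n} \bar{y}_i(a, \alpha)$. We also define the marginal average potential outcome for individual $i$ under allocation strategy $\alpha$ by $\bar{y}_i(\alpha) = \sum_{a_i, a_{\mathcal{N}_i}} y_i(a_i, a_{\mathcal{N}_i})\, \pi(a_i, a_{\mathcal{N}_i}; \alpha)$ and define the marginal population average potential outcome as $\bar{y}(\alpha) = n^{-1} \sum_{i=1}^{n} \bar{y}_i(\alpha)$.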

We consider different contrasts of these average causal effects often of interest in network-based studies, where disseminated effects are of primary interest. We define these on the difference scale; analogous effects can be defined on the ratio scale. The direct effect is defined as $DE(\alpha) = \bar{y}(1, \alpha) - \bar{y}(0, \alpha)$, which compares those exposed to those not exposed in a neighborhood with coverage level $\alpha$. For example, in the TRIP study, the direct effect compares the reported risk behaviors (e.g., shared injection equipment) of those exposed to community alerts to those of individuals who were not exposed, in a neighborhood where $100\alpha\%$ of individuals received alerts. The disseminated (i.e., indirect or spillover) effect is $IE(\alpha, \alpha') = \bar{y}(0, \alpha) - \bar{y}(0, \alpha')$, which compares those not exposed themselves in neighborhoods with a coverage level of $\alpha$ to those not exposed in neighborhoods with a coverage level of $\alpha'$. The composite or total effect is defined as $TE(\alpha, \alpha') = \bar{y}(1, \alpha) - \bar{y}(0, \alpha')$, which is a function of both the direct and disseminated effects and is a measure of the maximal intervention effect (assuming that $\alpha' < \alpha$), comparing those exposed in neighborhoods with coverage $\alpha$ to those not exposed in neighborhoods with coverage $\alpha'$. Lastly, the overall effect is $OE(\alpha, \alpha') = \bar{y}(\alpha) - \bar{y}(\alpha')$, which is the difference in average potential outcomes under one coverage compared to another coverage.
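As an illustration of these estimands, the sketch below evaluates the average potential outcomes and the four contrasts for a single individual with two neighbors. The potential outcome function `y_toy` and the coverage values are hypothetical; population-level quantities would average these individual-level quantities over all participants.

```python
from itertools import product


def avg_potential_outcome(y, a, d, alpha):
    """Average of y(a, a_N) over all neighbor exposure vectors of length d,
    weighted by their Bernoulli(alpha) allocation probabilities."""
    total = 0.0
    for a_n in product([0, 1], repeat=d):
        k = sum(a_n)
        total += y(a, a_n) * alpha ** k * (1 - alpha) ** (d - k)
    return total


def marginal_avg_potential_outcome(y, d, alpha):
    """Average over the individual's own exposure as well."""
    return alpha * avg_potential_outcome(y, 1, d, alpha) + (
        1 - alpha
    ) * avg_potential_outcome(y, 0, d, alpha)


def y_toy(a, a_n):
    # Hypothetical risk of sharing equipment: own exposure and each
    # exposed neighbor lower the risk.
    return 0.6 - 0.2 * a - 0.1 * sum(a_n)


alpha, alpha_prime, d = 0.5, 0.25, 2
direct = avg_potential_outcome(y_toy, 1, d, alpha) - avg_potential_outcome(y_toy, 0, d, alpha)
indirect = avg_potential_outcome(y_toy, 0, d, alpha) - avg_potential_outcome(y_toy, 0, d, alpha_prime)
total = avg_potential_outcome(y_toy, 1, d, alpha) - avg_potential_outcome(y_toy, 0, d, alpha_prime)
overall = marginal_avg_potential_outcome(y_toy, d, alpha) - marginal_avg_potential_outcome(y_toy, d, alpha_prime)
```

In this toy case the total effect equals the sum of the direct and disseminated effects, which holds in general because the term for the unexposed under the first coverage level cancels.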

5 IPW Estimators and Modeling Assumptions

In an observational study of a network, interventions are typically not randomized at either the network or the individual level; rather, individuals and their nearest neighbors typically self-select their own exposures. Therefore, identification of causal effects does not benefit from the exchangeability achieved by randomization, and adjustment for a sufficient set of pre-exposure covariates at both the individual and network level is needed to quantify causal effects. In this section, we extend two IPW estimators [liu2016inverse, forastiere2016identification] to a setting with neighborhoods and subnetworks in observed networks. We assume that the observed network can be expressed as the union of connected subnetworks (components). We quantify the variance accounting for correlation within these connected subnetworks of the full observed network. Importantly, we now incorporate the nearest neighbor structure in the estimating equations used to calculate the closed-form variances because this better reflects the underlying structure through which dissemination operates in the observed network. Individuals who share a connection or edge are more likely to influence each other than individuals who are merely clustered together, possibly in a large grouping, with no information on their connections or network distance; a generalized interference set nonetheless assumes that all individuals within such a grouping could influence one another.

5.1 Estimators

Assume that conditional on pre-exposure covariate vector , the treatment allocation for participant is independent of all potential outcomes and other covariates, that is


We also make the stratified interference assumption, under which the outcome of an individual depends on the exposures of other individuals only through the number of exposed individuals among their neighbors [hudgens2008toward]. In addition, we make the standard causal inference assumptions, including positivity and treatment variation irrelevance [ogburn2014causal].

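The first IPW estimator is an extension of the one proposed by [liu2016inverse], but now we define the interference sets by the nearest neighbors for each individual in the network, and then use this nearest neighbor structure when calculating the closed-form variance. Define the IPW estimator for treatment $a$ with coverage $\alpha$ as

$$\hat{Y}^{IPW_1}(a, \alpha) = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i(A_i, A_{\mathcal{N}_i})\, I(A_i = a)\, \pi(A_{\mathcal{N}_i}; \alpha)}{f_1(A_i, A_{\mathcal{N}_i} \mid Z_i, Z_{\mathcal{N}_i})},$$

where $f_1(A_i, A_{\mathcal{N}_i} \mid Z_i, Z_{\mathcal{N}_i})$ is the neighborhood-level propensity score, the probability of treatment assignment conditional on observed baseline covariates, given by

$$f_1(A_i, A_{\mathcal{N}_i} \mid Z_i, Z_{\mathcal{N}_i}) = \int \prod_{j \in \mathcal{N}_i^*} p_j^{A_j} (1 - p_j)^{1 - A_j}\, \phi(b_\nu; 0, \psi)\, db_\nu,$$

where $\mathcal{N}_i^* = \mathcal{N}_i \cup \{i\}$, $p_j = \Pr(A_j = 1 \mid Z_j, b_\nu) = \text{logit}^{-1}(Z_j \cdot \gamma + b_\nu)$, and $\phi(\cdot; 0, \psi)$ is the density of $b_\nu \sim N(0, \psi)$. Here, $b_\nu$ is the component-level random effect for possible correlation of exposures within components.

The marginal population-level average potential outcome estimator is

$$\hat{Y}^{IPW_1}(\alpha) = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i(A_i, A_{\mathcal{N}_i})\, \pi(A_i, A_{\mathcal{N}_i}; \alpha)}{f_1(A_i, A_{\mathcal{N}_i} \mid Z_i, Z_{\mathcal{N}_i})}.$$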
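The second IPW estimator uses an individual and neighborhood propensity score as defined in [forastiere2016identification]. We now assume that the potential outcomes of individual $i$ depend on the total number of exposed neighbors, $s_i = \sum_{j \in \mathcal{N}_i} a_j$. In particular, $y_i(a_i, a_{\mathcal{N}_i}) = y_i(a_i, s_i)$.

The IPW estimator for treatment $a$ with coverage $\alpha$ is defined as

$$\hat{Y}^{IPW_2}(a, \alpha) = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i(A_i, S_i)\, I(A_i = a)\, \pi(S_i; \alpha)}{f_2(A_i, S_i \mid Z_i, Z_{\mathcal{N}_i})},$$

and the IPW marginal estimator as

$$\hat{Y}^{IPW_2}(\alpha) = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i(A_i, S_i)\, \pi(A_i, S_i; \alpha)}{f_2(A_i, S_i \mid Z_i, Z_{\mathcal{N}_i})},$$

where $S_i = \sum_{j \in \mathcal{N}_i} A_j$. Let $\pi(S_i; \alpha) = \binom{d_i}{S_i} \alpha^{S_i} (1 - \alpha)^{d_i - S_i}$ be the probability that individual $i$ has $S_i$ treated neighbors and let $\pi(A_i, S_i; \alpha) = \alpha^{A_i} (1 - \alpha)^{1 - A_i}\, \pi(S_i; \alpha)$ denote the probability of the individual's exposure together with $S_i$ treated neighbors.

The propensity score $f_2(A_i, S_i \mid Z_i, Z_{\mathcal{N}_i})$ is the joint probability distribution of individual treatment and neighborhood treatment given the covariates $Z_i$ and $Z_{\mathcal{N}_i}$. Here, we express this as a product of the individual propensity score, $f(A_i \mid Z_i)$, and the neighborhood propensity score, $f(S_i \mid Z_{\mathcal{N}_i})$.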

We assume that the individual treatment follows a Bernoulli distribution, $A_i \sim \text{Bernoulli}(p_i)$, with probability $p_i$ being the individual propensity score, modeled as a function of the covariate vector $Z_i$ using a logit link: $\text{logit}(p_i) = \gamma_0 + Z_i^\top \gamma_Z$.

Furthermore, we assume that the total number of treated neighbors follows a binomial distribution, $S_i \sim \text{Binomial}(d_i, q_i)$, with probability $q_i$ modeled as a function of the neighborhood covariate vector $Z_{\mathcal{N}_i}$ using a logit link: $\text{logit}(q_i) = \delta_0 + h(Z_{\mathcal{N}_i})^\top \delta_Z$,

where $h(\cdot)$ is an aggregate function of the vector $Z_{\mathcal{N}_i}$, for instance the proportion of females or males in the neighborhood or the average age of all neighborhood members.
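A minimal sketch of this two-part propensity score follows. It assumes linear predictors with an intercept and takes the coefficient values as given; the function names, the coefficient vectors `gamma` and `delta`, and the aggregated covariate input `z_agg` are illustrative assumptions, not the authors' implementation. In practice the coefficients would be estimated, for example by logistic and binomial regression.

```python
from math import comb, exp


def expit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + exp(-x))


def individual_propensity(a, z, gamma):
    """Bernoulli probability of own exposure a, with a logit-linear
    model in the covariates z; gamma[0] is the intercept."""
    p = expit(gamma[0] + sum(g * v for g, v in zip(gamma[1:], z)))
    return p if a == 1 else 1.0 - p


def neighborhood_propensity(s, d, z_agg, delta):
    """Binomial probability of s exposed neighbors out of d, with a
    logit-linear success probability in aggregated neighborhood
    covariates z_agg; delta[0] is the intercept."""
    q = expit(delta[0] + sum(c * v for c, v in zip(delta[1:], z_agg)))
    return comb(d, s) * q ** s * (1.0 - q) ** (d - s)


def joint_propensity(a, s, d, z, z_agg, gamma, delta):
    """Product of the individual and neighborhood propensity scores."""
    return individual_propensity(a, z, gamma) * neighborhood_propensity(
        s, d, z_agg, delta
    )
```

Because each factor is a proper probability distribution, the joint score sums to one over the two values of the own exposure and the possible counts of exposed neighbors.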

Here, we are making the assumption that, conditional on the neighborhood covariates and the participant's exposure, the exposures of nearest neighbors are independent and identically distributed. In other words, the dependency between neighbors' exposures is captured by the correlation with person $i$'s exposure and the covariates of others in the neighborhood (see Footnote 1). Therefore, the propensity score can be factored into the two distributions $f(A_i \mid Z_i)$ and $f(S_i \mid Z_{\mathcal{N}_i})$ as follows: $f_2(A_i, S_i \mid Z_i, Z_{\mathcal{N}_i}) = f(A_i \mid Z_i)\, f(S_i \mid Z_{\mathcal{N}_i})$.

Footnote 1: Given the individual propensity score, in principle, we could compute the neighborhood propensity score as a sum of products of the individual propensity scores for all neighbors over all treatment combinations such that $\sum_{j \in \mathcal{N}_i} A_j = S_i$, under the assumption of independence of the $A_j$ in a neighborhood given component-level random effects and individual covariates. This would be one correct way of computing the neighborhood propensity score. Instead, in this estimator, we use an alternative solution where the neighborhood propensity score is estimated assuming a binomial model conditional on summary statistics of the neighborhood covariates. This approach, while approximate, is more straightforward and works when the dependency among neighbors' exposures cannot be attributed to a latent factor shared by all units belonging to the same component in the network.

Assume that for all and , and . Under allocation strategy , and , we consider the following risk difference estimators of the direct, disseminated (indirect), composite (total), and overall effects:

where corresponds to the two IPW estimators that we defined above.

Proposition 1

If the propensity scores and are known or estimated under the correct model specification, then and and .

Proof of Proposition 1 is shown in Appendix A. Using these unbiased estimators, the estimation of the causal effects will also be unbiased because the causal effects are contrasts of these marginal quantities.
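The weighting idea behind the second estimator can be sketched as follows: each observed outcome with own exposure $a$ is weighted by the allocation probability of its observed number of exposed neighbors divided by its joint propensity score. This is an illustrative sketch under the assumption that the propensity scores are already available; the function name and the toy data are hypothetical, and no variance calculation is attempted.

```python
from math import comb


def ipw_average_outcome(a, alpha, outcomes, exposures, n_exposed_nbrs,
                        degrees, propensities):
    """Inverse probability weighted mean outcome for own exposure a under
    coverage alpha. Weight = Binomial(d_i, alpha) probability of the
    observed number of exposed neighbors / joint propensity score."""
    n = len(outcomes)
    total = 0.0
    for y, a_i, s, d, f in zip(outcomes, exposures, n_exposed_nbrs,
                               degrees, propensities):
        if a_i == a:
            pi_s = comb(d, s) * alpha ** s * (1 - alpha) ** (d - s)
            total += y * pi_s / f
    return total / n


# Hypothetical data for three participants, each with two neighbors.
outcomes = [1, 0, 1]
exposures = [1, 1, 0]
n_exposed_nbrs = [1, 0, 2]
degrees = [2, 2, 2]
propensities = [0.25, 0.5, 0.2]   # assumed known joint propensity scores

y1 = ipw_average_outcome(1, 0.5, outcomes, exposures, n_exposed_nbrs,
                         degrees, propensities)
y0 = ipw_average_outcome(0, 0.5, outcomes, exposures, n_exposed_nbrs,
                         degrees, propensities)
direct_effect_estimate = y1 - y0
```

The disseminated, composite, and overall effect estimates are formed in the same way, as differences of the weighted means at the relevant exposure and coverage levels.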

5.2 Large sample properties of the inverse probability of sampling weighted estimator

The large sample variance estimators can be derived using M-estimation theory [stefanski2002boos]. We assume that the observed network can be expressed as the union of connected subnetworks, referred to as components of the network. Consider a social network with participants and components . Let where is the set of coefficients in the propensity score and

We assume that the components are a random sample from the infinite super-population of groups such that the random variables are independent and identically distributed. To perform inference, we use independent components (i.e., subnetworks), while we preserve the underlying connections comprising the network structure of each component. That is, by extending [liu2016inverse], every individual is now assigned their own propensity score based on the observed network structure defined by their nearest neighbors. To simplify the notations, we write as here. Let where , , , and be the dimension of the parameter , and

where the average component size is ,


such that . Note that implies that under suitable regularity conditions that as , the asymptotic normality and a consistent sandwich type estimator of the variance can be established for the two IPW estimators (see Proposition 2 below). Let and .

Proposition 2

converges in distribution to as where the variance matrix is given by

The proof of Proposition 2 is shown in Appendix B.

6 Simulation

A simulation study was conducted to evaluate the performance of the two IPW estimators and their corresponding closed-form variance estimators. We focused on the finite-sample bias and the coverage of the corresponding 95% Wald-type confidence intervals. The network characteristics (number of components, number of nodes in each component) and the parameters of the potential outcome model were motivated by empirical estimates from the TRIP data. In this simulation study, we considered regular networks, in which each node has the same number of neighbors. We first generated $m$ network components as regular networks of degree four. The number of nodes in each component was sampled from a Poisson distribution with mean 10. We conducted several simulations in which the number of components, $m$, ranged from 10 to 200. Given a generated network, a total of 1,000 data sets were simulated in the following steps.

  • A baseline covariate was randomly generated as . We then generated all possible potential outcomes

  • Assign the component-level random effect to each component in the network to account for the correlation between the outcomes within components. The exposure was generated as

  • We then obtained the corresponding observed outcomes from the potential outcomes generated in Step 1 as our simulated observed data.
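
The network-generation step that precedes these data-generating steps might be sketched as follows. This is a minimal illustration, not the authors' code: the function name `simulate_regular_components`, the redrawing of component sizes too small to support a degree-four regular graph, and the resampling until each component is connected are all assumptions made here.

```python
import networkx as nx
import numpy as np


def simulate_regular_components(m, degree=4, mean_size=10, seed=0):
    """Generate m connected regular components and return their disjoint union."""
    rng = np.random.default_rng(seed)
    components = []
    for _ in range(m):
        # Assumption: redraw sizes too small to support a regular graph of this degree.
        n = int(rng.poisson(mean_size))
        while n <= degree:
            n = int(rng.poisson(mean_size))
        # A regular graph needs degree * n to be even; with degree 4 this always holds.
        g = nx.random_regular_graph(degree, n, seed=int(rng.integers(2**31 - 1)))
        # Assumption: resample until connected so each draw forms a single component.
        while not nx.is_connected(g):
            g = nx.random_regular_graph(degree, n, seed=int(rng.integers(2**31 - 1)))
        components.append(g)
    # Nodes are relabelled consecutively so components do not share labels.
    return nx.disjoint_union_all(components)


network = simulate_regular_components(m=100)
```

The exposure and outcome models are not reproduced here because their exact forms are given by the equations in the steps above.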

For each simulated data set, the , , , and were computed for and

. The true parameters were calculated by averaging the potential outcomes generated in Step 1. The estimated standard errors were derived from the appropriate entries of the variance matrix in Proposition 2 and then averaged across simulations to obtain the average standard error (ASE). The empirical standard error (ESE) was the standard deviation of the estimated means across all simulated data sets. The empirical coverage probability (ECP) was the proportion of the 1,000 simulations in which the Wald-type 95% confidence interval based on the estimated standard error contained the true parameter; with 1,000 simulations, the margin of error for the ECP is 0.014. The simulation results are summarized in Appendix C.
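
The performance measures described above can be computed along the following lines. This is a hedged sketch: the function names and argument layout are hypothetical, and the margin of error is assumed to be the usual normal-approximation Monte Carlo margin for a proportion at the nominal 0.95 level, which reproduces the reported value of 0.014.

```python
import numpy as np


def summarize_simulation(estimates, std_errors, truth, z=1.96):
    """Bias, ESE, ASE, and ECP for one estimand across simulated data sets."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - truth        # average estimate minus true value
    ese = estimates.std(ddof=1)            # empirical standard error
    ase = std_errors.mean()                # average estimated standard error
    lower = estimates - z * std_errors     # Wald-type interval limits
    upper = estimates + z * std_errors
    ecp = np.mean((lower <= truth) & (truth <= upper))
    return {"bias": bias, "ESE": ese, "ASE": ase, "ECP": ecp}


def monte_carlo_margin(nominal=0.95, n_sims=1000, z=1.96):
    """Assumed margin of error for an ECP estimated from n_sims simulations."""
    return z * np.sqrt(nominal * (1 - nominal) / n_sims)
```

Under this assumption, an ECP within 0.014 of 0.95 is compatible with nominal coverage given 1,000 simulated data sets.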

Figure 2: Absolute bias (left) of IPW (top) and IPW (bottom) estimator and corresponding Wald 95% confidence intervals empirical coverage probability (right) for different number of components in the network

Figure 2 shows that the bias approaches zero and the ECPs approach the nominal 0.95 level as the number of components increases from 10 to 200. In Table A5, the ECPs of the estimator IPW under all allocation strategies were close to the nominal level, and the ECPs of IPW approached the nominal level when the allocation strategies had a coverage level around 50% in the observed data. To compare against estimators of the asymptotic variance that assume partial interference [liu2016inverse], we used the observed components in the network as partial interference sets. The partial interference assumption for variance estimation resulted in higher ASE and ECP compared to the asymptotic variance from Proposition 2 (see Figure 3).

Figure 3: Given a network with 100 components, comparison of the average ESE, the average ASE for proposed estimator, IPW, and average ASE based on Liu’s variance estimator of the average potential outcomes under allocation strategies 25%, 50%, and 75%.

In our main scenario, we simulated networks with an average component size of 10 and increased the number of components to evaluate the performance of IPW and IPW in estimating the average potential outcomes. In addition to the simulation scenarios above, we used a regular network of degree 4 with 100 components to compare scenarios with different exposure generating mechanisms, including one without random effects, and a scenario in which the stratified interference assumption is violated. In addition to this simulated regular network, we used the TRIP network structure to investigate the performance when community detection is used to further divide the network into a larger number of components. We considered the following four scenarios:

  • We used the exposure generating mechanism without random effects

    Comparing Table 2 and Table A3, the results suggested that the ECPs of IPW were below the nominal level when the exposure mechanism is misspecified, while the performance of IPW remained largely similar to settings with a correctly specified exposure mechanism.

  • We used a different exposure generating model given by:

    Unlike the previous exposure generating model, this model results in more individuals who have none or 25% of their neighbors exposed in the observed data (see Figure 4). The simulation results in Table A6 suggested that both IPW estimators have higher ECPs at allocation strategy 25% (IPW: 94%, IPW: 97%) and lower ECPs at allocation strategy 75% (IPW: 68%, IPW: 71%) in this scenario, suggesting that the performance of both estimators in terms of point estimates and ASEs is better under allocation strategies where there are more individuals with of their neighbors exposed in the observed data.

  • In this additional scenario, we considered an outcome model where the stratified interference assumption is violated. We used the potential outcome model, , given by:

    The simulation results in Table 3 showed that neither estimator performed well with respect to the point estimates, as the magnitude of the absolute bias was larger. The ECPs of IPW were all greater than 95%, which suggested over-coverage. The ECPs of IPW were either above (100%) or slightly below the nominal level ().

  • Lastly, we considered the network structure from our motivating study, TRIP. The TRIP network consists of 10 disconnected components with 217 total nodes and 542 edges. Based on the previous simulation results, such a small number of components may result in poor finite-sample performance of the variance estimators. To increase the number of components for estimation of the asymptotic variance of the estimated causal effects, we employed an efficient modularity-based (e.g., fast greedy) approach to detect communities and further divide some large connected components of the TRIP network into a total of 20 smaller components. Modularity takes large values when there are more substantial connections among some individuals than expected if the connections were randomly assigned [PhysRevE.70.066111]. As a result, each component comprises a unique set of participants, and there are more edges between participants within components than across components in the TRIP network. By ignoring edges across components, we treat the obtained communities as independent units to improve the estimation of the variance. Importantly, we still define the interference sets using the nearest neighbors for point estimation of the causal effects. The simulation results on the TRIP network with and without community detection (see Table 4) demonstrated that the ECPs of both IPW and IPW were above the nominal level (97%-100%) when using the TRIP network with only 10 components. After further dividing the network using community detection, the ECPs were slightly below the nominal level in some cases.
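
The community-detection step in the last scenario might look like the sketch below. It assumes the `networkx` implementation of greedy modularity maximization as a stand-in for the fast greedy approach; the helper name `split_by_communities` and the toy graph are hypothetical, and the default call does not force a particular number of communities such as the 20 used for TRIP.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def split_by_communities(graph):
    """Return a copy of graph with edges between detected communities removed."""
    communities = greedy_modularity_communities(graph)
    membership = {node: k for k, nodes in enumerate(communities) for node in nodes}
    pruned = graph.copy()
    # Edges whose endpoints fall in different communities are ignored for
    # variance estimation only; the original graph is left untouched.
    cross_edges = [(u, v) for u, v in graph.edges() if membership[u] != membership[v]]
    pruned.remove_edges_from(cross_edges)
    return pruned, membership


# Toy example: two 5-node cliques joined by a single bridging edge.
toy = nx.barbell_graph(5, 0)
pruned, membership = split_by_communities(toy)
```

Keeping the original graph alongside the pruned copy mirrors the design described above: nearest-neighbor interference sets for point estimation come from the full network, while the pruned components serve as the independent units for the variance.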

Bias ESE ASE ECP Bias ESE ASE ECP
0.0013 0.048 0.046 0.91 0.0149 0.039 0.036 0.86
-0.0003 0.027 0.028 0.96 0.0016 0.022 0.024 0.96
-0.0032 0.041 0.041 0.93 0.0093 0.035 0.033 0.88
-0.0051 0.039 0.040 0.94 0.0093 0.033 0.031 0.90
-0.0018 0.028 0.029 0.95 0.0018 0.023 0.025 0.97
0.0004 0.053 0.051 0.92 0.0195 0.045 0.042 0.83
-0.0035 0.032 0.032 0.94 0.0107 0.027 0.026 0.88
-0.0010 0.021 0.021 0.97 0.0017 0.016 0.018 0.97
-0.0023 0.033 0.030 0.93 0.0118 0.029 0.028 0.86
Table 2: Results from 1,000 simulated data sets on a network with 100 components for IPW (left) and IPW (right) for treated (), not treated (), and marginal estimators under allocation strategies 25%, 50%, and 75% using exposure generating model
Figure 4: The frequency of the proportion of exposed neighbors when using the exposure generating models (left) and (right)
True Bias ESE ASE ECP Bias ESE ASE ECP
0.9965 0.0039 0.068 0.074 0.96 0.0345 0.09 0.074 0.84
0.9885 0.0041 0.033 0.047 0.99 0.0051 0.006 0.048 1.00
0.9724 -0.0134 0.067 0.077 0.96 -0.154 0.115 0.148 0.93
0.9943 -0.0179 0.066 0.078 0.97 -0.1682 0.121 0.154 0.92
0.9821 0.0015 0.034 0.048 0.99 0.0032 0.008 0.049 1.00
0.9583 0.0060 0.069 0.074 0.95 0.0398 0.085 0.070 0.81
0.9949 -0.0125 0.050 0.063 0.98 -0.1175 0.100 0.122 0.92
0.9853 0.0028 0.022 0.040 1.00 0.0042 0.005 0.035 1.00
0.9689 -0.0086 0.050 0.062 0.97 -0.1056 0.094 0.117 0.94
Table 3: Results from 1,000 simulated data sets on a network with 100 components for IPW (left) and IPW (right) for treated (), not treated (), and marginal estimators under allocation strategies 25%, 50%, and 75% using an outcome model where the stratified interference assumption is violated.
10 components 20 components
True Bias ECP Bias ECP Bias ECP Bias ECP
0.2473 0.0098 0.986 0.0167 0.988 0.0098 0.849 0.0036 0.890
0.2265 0.0021 0.998 0.0112 0.986 0.0021 0.946 0.0064 0.920
0.2058 0.0020 0.987 0.0126 0.997 0.0020 0.894 0.0057 0.943
0.2304 0.0001 0.996 0.0046 0.999 0.0001 0.920 0.0021 0.968
0.2778 0.0010 1.000 0.0029 1.000 0.0010 0.954 0.0017 0.974
0.3275 0.0073 0.996 0.0038 1.000 0.0073 0.896 0.0019 0.992
0.2346 0.0025 0.999 0.0133 0.971 0.0025 0.943 0.0061 0.915
0.2521 0.0015 1.000 0.0121 0.993 0.0015 0.982 0.0001 0.917
0.2362 0.0004 1.000 0.0130 0.998 0.0004 0.937 0.0046 0.940
Table 4: Results from 1,000 simulated data sets on the TRIP network with 10 components (left) and after using community detection to further divide the network into 20 components (right) for treated (), not treated (), and marginal estimators under allocation strategies 25%, 50%, and 75%.

7 Evaluation of dissemination effects of community alerts in the Transmission Reduction Intervention Project (TRIP)

We applied the estimators proposed in Section 4 to estimate the causal effects of community alerts at baseline on reported risk behavior at the six-month visit. The community alert intervention status of the index individual and their neighbors was defined with respect to the baseline visit date for the index person. After excluding isolates and 60 participants (21%) who were lost to follow-up before their six-month visit, the TRIP network had 10 connected components (i.e., observed subnetworks) with 217 participants and 363 shared connections (average degree 3.35). Among the 217 participants, 25 (11.5%) received a community alert from the study team about the increased risk of HIV acquisition in close proximity in their network. We are interested in evaluating whether that information was shared with their nearest neighbors and, ultimately, whether this resulted in a reduction in risk behavior beyond those who were contacted by study staff (see Figure 1). Among participants with complete information on the questions related to sharing injection equipment, we considered the report of sharing injection equipment (or not) at the six-month visit as the binary outcome. The following baseline covariates were included in the adjusted models: HIV status, shared drug equipment (needles, syringes) in the last six months, the calendar date at first interview, education (primary school, high school, and post high school), and employment status (employed, unemployed/looking for work, unable to work for health reasons, and other). HIV status was ascertained from a blood sample from each participant collected by a health program physician [aristotle]. Given that the study population included PWID in one geographic location, it may be plausible to assume that social desirability leading to possible reporting bias is comparable between the two exposure groups. Under this assumption, the reporting bias would effectively be removed when estimating contrasts between exposure groups. The point estimates and 95% CIs using both estimators under allocation strategies 25%, 50%, and 75% are summarized in Appendix D. The normality of the random effects in IPW was tested using a diagnostic test for mixed effects models [normality]. Based on the results of the simulation scenario using the TRIP network, we used community detection to further divide the TRIP network into 20 components to improve the finite-sample performance of the variance estimators.

In addition to including the full set of covariates to adjust for measured confounding in the weight models, we conducted sensitivity analyses to evaluate the impact of different sets of covariates on the model results. We first considered univariate models; that is, adjustment for only one covariate at a time. Second, we estimated the effects using the full set of covariates, dropping one covariate at a time. Lastly, we estimated the effects without adjustment for any covariates. The results were largely robust to the set of measured covariates used to adjust for confounding. All models demonstrated an estimated protective disseminated effect, except for the scenario with the full set of covariates, excluding the calendar date at first interview, when estimated using IPW; however, this did not achieve statistical significance.

Direct, disseminated, total, and overall effect estimates and the Wald 95% confidence intervals of both estimators for different allocation strategies, adjusting for all five confounding variables, are shown in Figure 5. All estimates of the risk differences for both estimators IPW and IPW were protective. These results suggested that the likelihood of reported HIV risk behavior was reduced not only by an individual’s receipt of an alert, but also by increasing the proportion of an individual’s neighborhood members exposed to community alerts from the study team. Specifically, the estimated direct effect is (95% CI: ) when estimated using IPW and (95% CI: ) when estimated using IPW; that is, there were 23 per 100 fewer individuals reporting risk behavior among those who received the alerts compared to those who did not, in a neighborhood where 75% of individuals received alerts when estimated using IPW (23 per 100 fewer using IPW). The disseminated effect is (95% CI:) when estimated using IPW under allocation strategies 25% versus 75% and (95% CI:) when estimated using IPW; in other words, there were 13 per 100 fewer individuals reporting risk behavior who did not receive an alert themselves, but were in neighborhoods with 75% of other individuals receiving alerts, compared to those not exposed in neighborhoods with just 25% when estimated using IPW (11 per 100 fewer using IPW). The total effect is (95% CI:) when estimated using IPW and (95% CI:) when estimated using IPW. There were 36 per 100 fewer individuals reporting risk behavior when estimated using IPW (35 per 100 fewer using IPW) if an individual both received an alert and had 75% of neighborhood members also receiving an alert, compared to if an individual did not receive an alert and only 25% of their neighborhood members received an alert. The overall effect is (95% CI:) using IPW and (95% CI:) using IPW. There were 28 per 100 fewer individuals reporting risk behavior using IPW (28 per 100 fewer using IPW) if 75% of neighborhood members received alerts compared to if only 25% of neighborhood members received alerts. Based on the simulation study, IPW may be preferred over IPW for counterfactual allocation strategies 25% and 50%, given the small number of components in the TRIP network and that most individuals in the TRIP study have 25% or 50% of their neighbors exposed to community alerts. In addition, the effects under allocation strategy 75% are likely better quantified using IPW.
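
The four contrasts interpreted above can be formed from estimated average potential outcomes as in the following sketch. The function name, the dictionary layout, and the numeric values are hypothetical and purely illustrative (they are not TRIP estimates); the sign convention, with negative risk differences indicating a protective effect, is also an assumption.

```python
def causal_contrasts(y1, y0, y_marg, alpha0, alpha1):
    """Risk-difference contrasts between allocation strategies alpha0 and alpha1.

    y1, y0, y_marg map an allocation strategy to the estimated average potential
    outcome for treated, not treated, and marginal, respectively.
    """
    return {
        # Received an alert vs. not, both under allocation strategy alpha1.
        "direct": y1[alpha1] - y0[alpha1],
        # Not treated under alpha1 vs. not treated under alpha0.
        "disseminated": y0[alpha1] - y0[alpha0],
        # Treated under alpha1 vs. not treated under alpha0.
        "total": y1[alpha1] - y0[alpha0],
        # Marginal outcome under alpha1 vs. under alpha0.
        "overall": y_marg[alpha1] - y_marg[alpha0],
    }


# Hypothetical inputs for illustration only.
y1 = {0.25: 0.30, 0.75: 0.10}
y0 = {0.25: 0.45, 0.75: 0.35}
y_marg = {0.25: 0.41, 0.75: 0.16}
effects = causal_contrasts(y1, y0, y_marg, alpha0=0.25, alpha1=0.75)
```

With these definitions the total effect equals the direct effect under the higher allocation strategy plus the disseminated effect, which agrees with the reported 23 + 13 = 36 per 100 fewer individuals reporting risk behavior.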

Figure 5: Point estimates of the risk differences and Wald 95% CIs for the effects of community alerts on HIV risk behavior in TRIP using IPW and IPW.

8 Discussion

In this paper, methods for evaluating disseminated effects were developed for the setting of network-based studies by leveraging a nearest neighbor interference set. The proposed approach uses connections (i.e., edges) between individuals in a network and allows for overlapping interference sets within each component of the network. The two proposed estimators of were shown to be consistent and asymptotically normal. Importantly, a consistent, closed-form estimator of the asymptotic variance was derived. The simulation study demonstrated that the two IPW estimators had reasonable finite-sample performance in terms of bias and empirical coverage for a large number () of components in the observed network. When the exposure mechanism is misspecified, IPW had coverage below the nominal level while IPW still maintained proper coverage levels. Lastly, in the additional simulation scenario 4, the ECPs were above the nominal level (97%-100%) when using the TRIP network with only 10 components. This may be a result of the uncertainty from the imbalanced component sizes observed in TRIP, where there are 217 nodes in total and the largest component has size 186. After using community detection to further divide the network into a larger number of components (20), the coverage level decreased to an average of 93%.

With these methods, we now have an approach to quantify the social and biological influence on the determinants of risk and HIV transmission in HIV risk networks of PWID [friedmanSocialNet2001] when evaluating the impact of interventions, such as treatment as prevention (TasP), or how interventions permeate a risk network [nikolopoulos2016network, friedman2014socially]. These new methodologies will improve the identification of best preventive practices for PWID and provide evidence to expedite policy changes to improve access to HIV treatment and risk reduction interventions in subpopulations of high-risk drug users. In the TRIP study, these methods allowed for quantification of the extent to which the community alerts intervention reduced onward transmission to others in the community by tracking incident infections in the risk networks; here, we measured that through the proxy of self-reported HIV risk behaviors. Correctly conducted and analyzed studies among PWID will improve existing interventions, inform new interventions, and have the potential to reduce incident HIV infections in this subpopulation.

Studies of network effects among PWID are rich with future methodological problems. The simulation study indicated that the asymptotic variance estimators of the IPW estimators performed poorly when the number of components in the network is small (50). A finite-sample correction for the asymptotic variance estimators is needed when the network has a small number of components. Furthermore, the outcome of interest might be missing due to participant loss to follow-up in some intervention-based studies where outcomes are observed a certain amount of time after the intervention; for example, 21% of TRIP participants were lost to follow-up by the six-month visit. Future work should include development of censoring methods to evaluate the IPW estimators in the presence of missing outcomes. With regard to real data applications, the impact of unmeasured confounding is an important consideration in causal inference studies; however, sensitivity analyses for unmeasured confounding in the presence of interference currently exist only for two-stage randomized trials with clustering features [tylersensitivity2014]. Designing sensitivity analyses to assess the bias due to unmeasured confounding in network-based studies should be included in future research. With these improved inferential methods, investigators will be able to answer questions they were previously unable to address in network-based studies, leading to more effective intervention implementation and far-reaching policy change to prevent HIV infection, reduce risk behavior, and ultimately improve HIV treatment and care among PWID. In addition to studying HIV transmission among PWID, this method can also be applied in a wider context to study sexually transmitted infections such as genital herpes and trichomoniasis among adolescents and young adults, men who have sex with men, or pregnant women.


These findings are presented on behalf of the Transmission Reduction Intervention Project (TRIP). We would like to thank all of the TRIP investigators, data management teams, and participants who contributed to this project. The project described was supported by grant 1DP2DA046856-01 from the Avenir Award Program for Research on Substance Abuse and HIV/AIDS (DP2) of the National Institute on Drug Abuse of the National Institutes of Health; the National Institute on Drug Abuse of the National Institutes of Health award number DP1 DA034989, which funded Preventing HIV Transmission by Recently-Infected Drug Users; the National Institute on Drug Abuse of the National Institutes of Health award number P30DA011041, which supported the Center for Drug Use and HIV Research; and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health award number R01 AI085073, Causal Inference in Infectious Disease Prevention Studies. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


Appendix A Proof of Proposition 1

To show is unbiased, see

The unbiasedness of the marginal inverse probability weighted estimator, , can be proved similarly.

Under the assumption that , IPW is also unbiased