Randomization Inference for Peer Effects

07/04/2018 ∙ by Xinran Li, et al. ∙ Tsinghua University ∙ Harvard University ∙ Peking University ∙ berkeley college

Many previous causal inference studies require no interference among units, that is, the potential outcomes of a unit do not depend on the treatments of other units. This no-interference assumption, however, becomes unreasonable when units are partitioned into groups and they interact with other units within groups. In a motivating application from Peking University, students are admitted through either the college entrance exam (also known as Gaokao) or recommendation (often based on Olympiads in various subjects). Right after entering college, students are randomly assigned to different dorms, each of which hosts four students. Because students within the same dorm live together and interact with each other extensively, it is very likely that peer effects exist and the no-interference assumption is violated. More importantly, understanding peer effects among students gives useful guidance for future roommate assignment to improve the overall performance of students. Methodologically, we define peer effects in terms of potential outcomes, and propose a randomization-based inference framework to study peer effects in general settings with arbitrary numbers of peers and arbitrary numbers of peer types. Our inferential procedure does not require any parametric modeling assumptions on the outcome distributions. Additionally, our analysis of the data set from Peking University gives useful practical guidance for policy makers.

1 Introduction

1.1 Causal inference, interference, and peer effects

The classical potential outcomes framework (Neyman 1923) assumes no interference among experimental units (Cox 1958), i.e., the potential outcomes of a unit are functions of its own treatment but not others’ treatments. This constitutes an important part of Rubin (1980)’s Stable Unit Treatment Value Assumption (SUTVA). In some experiments, interference is a nuisance, and researchers try to avoid it by isolating units. Interference, however, is unavoidable in many studies where units interact with each other. Examples include vaccine trials for infectious diseases in epidemiology (Halloran and Struchiner 1991, 1995; Perez-Heydrich et al. 2014), group-randomized trials in education (Hong and Raudenbush 2006; VanderWeele et al. 2013), and interventions on networks in sociology (An 2011; VanderWeele and An 2013), political science (Nickerson 2008; Ichino and Schündeln 2012; Bowers et al. 2013) and economics (Manski 1993; Sacerdote 2001; Miguel and Kremer 2004; Graham et al. 2010; Goldsmith-Pinkham and Imbens 2013; Arpino and Mattei 2016). Ogburn and VanderWeele (2014) discussed different types of interference. Forastiere et al. (2016) showed that ignoring interference can lead to biased inferences. In some applications, it is important to study the pattern of interference, because it is of scientific interest and useful for decision making. For example, Sacerdote (2001) found significant peer effects in student outcomes (e.g., GPA and fraternity membership) among students living in the same dorm of Dartmouth College. Based on this, Bhattacharya (2009) discussed the optimal peer assignment.

1.2 Motivating application in education

Our motivation comes from a data set of a university in China. It contains a rich set of variables of the students: family background, the ways they were admitted, roommates’ information, GPAs, etc.

The university admits students through two primary channels: the college entrance exam (also known as Gaokao) and recommendation. Gaokao is an annual test in China to assess students’ knowledge in various subjects. Every university has its own minimal test score threshold to admit students. Students from Gaokao study all subjects and often have broader knowledge. Students from recommendation do not need to take Gaokao. They win awards in national or international Olympiads in mathematics, physics, chemistry, biology, or informatics. They concentrate on a certain subject for the corresponding Olympiad. They may even take some college courses on that subject during their high school years. Most of them choose majors related to the subject they focused on in high schools. Students admitted through these two channels have different training and thus different attributes. Students from recommendation generally perform better in GPAs than students from Gaokao.

After entering the university, students usually live in four-person rooms for four years. They often study together and interact with each other. We know that two types of students, from Gaokao and recommendation, have different training in high schools. It is then natural to ask the following questions. Is it beneficial for students from Gaokao to live with students from recommendation, or vice versa? Is there an optimal combination of roommate types for the performance of a certain student? Is there an optimal roommate assignment to maximize the performance of all students? These questions are all about peer effects among students.

1.3 Literature review and contribution

With interference, the potential outcomes of a unit can depend on its own treatment and others’ treatments in various ways. Therefore, causal inference with interference has different mathematical forms. Like many other causal inference problems, there are at least two inferential frameworks for causal inference with interference: the Fisherian and Neymanian perspectives. Under the Fisherian view, Rosenbaum (2007), Luo et al. (2012), Aronow (2012), Bowers et al. (2013), Rigdon and Hudgens (2015), Athey et al. (2018) and Basse et al. (2017) proposed exact randomization tests for detecting causal effects with interference, and constructed confidence intervals for certain causal parameters by inverting tests. Choi (2017) discussed a related approach under the monotone treatment effect assumption. Under the Neymanian view, Hudgens and Halloran (2008) discussed point and interval estimation for several causal estimands with interference under two-stage randomized experiments on both the group and individual levels. Liu and Hudgens (2014) then established the large sample theory for these estimators. Aronow and Samii (2017), Basse and Feller (2018), and Sävje et al. (2017) extended the discussion to other general contexts. The Fisherian and Neymanian views are both randomization-based in the sense that the uncertainty in testing or estimation comes solely from the treatment assignment mechanism, and all the potential outcomes are fixed constants. When two-stage randomization is infeasible, we need certain unconfoundedness assumptions. Tchetgen and VanderWeele (2012) proposed an inverse probability weighting estimator. Perez-Heydrich et al. (2014) applied this methodology to assess effects of cholera vaccination. Liu et al. (2016) studied the theoretical properties. Other studies (Sacerdote 2001; Toulis and Kao 2013) relied on parametric modeling assumptions.

Our framework for peer effects furthers the literature in several ways. First, we define peer effects using potential outcomes. Unlike in previous work (e.g., Sobel 2006; Hudgens and Halloran 2008), our estimands do not involve averages over the treatment assignment. We separate the causal estimands from the treatment assignment. As Rubin (2005) argued, the former are functions of the potential outcomes, and the latter induces randomness and governs the statistical inference. Second, previous works discussed external interventions with known networks, clusters, or groups. Our hypothetical intervention is the roommate assignment in the motivating application. It forms a “network” among units, which further causes interference and peer effects. We explain the distinction between the two types of interference in detail in Section 2.5. Our setting is similar to Sacerdote (2001)’s. However, we formalize the problem using potential outcomes instead of linear models and allow for causal interpretations without imposing model assumptions. Third, we propose randomization-based point estimators, prove their asymptotic Normalities, and construct confidence intervals. We further derive the optimal roommate assignment to maximize the performance of students. The inferential framework is Neymanian, similar to those of Hudgens and Halloran (2008) and Aronow and Samii (2017). Fourth, we apply the new method to the data set from a university in China and find important policy implications. We relegate all the technical details to the Supplementary Material.

2 Notation and framework for peer effects

2.1 Potential outcomes with peers

We consider an experiment with $n = mK$ units, where $m$ is the number of groups and $K$ is the size of each group. Each unit has $K-1$ peers in the same group. The group and peers correspond to room and roommates in our motivating application, where $K-1 = 3$ is the number of roommates for each student. Let $Z_i$ be the treatment assignment for unit $i$, which is a set consisting of the identity numbers of his/her peers, i.e., $Z_i \subset \{1, \ldots, n\} \setminus \{i\}$ with $|Z_i| = K-1$. In the motivating application, $Z_i$ is a set consisting of the three roommates of unit $i$. Let $Z = (Z_1, \ldots, Z_n)$ be the treatment assignment for all units, and $\mathcal{Z}$ be the set of all possible values of the assignment $Z$. Let $Y_i(z)$ be the potential outcome of unit $i$ under treatment assignment $z \in \mathcal{Z}$. This potential outcome depends on treatment assignments of all other units. Let $A_i \in \{1, \ldots, H\}$ be the attribute or type of unit $i$. In the motivating application, $H = 2$, and $A_i = 1$ if unit $i$ is from Gaokao, and $A_i = 2$ if unit $i$ is from recommendation. Under treatment assignment $z$, let $R_i(z) = \{A_j : j \in z_i\}$ be the set consisting of the attributes of unit $i$’s peers, and $G_i(z) = \{A_i\} \cup R_i(z)$ be the set consisting of the attributes of all units in the group that unit $i$ belongs to. We call $R_i(z)$ and $G_i(z)$ the peer attribute set and group attribute set. Both of them contain unordered but replicable elements. Therefore, $|R_i(z)| = K-1$ and $|G_i(z)| = K$, where $|\cdot|$ denotes the cardinality of a set. In the motivating application, if unit $i$ is from recommendation and has two roommates from Gaokao and one from recommendation, then $R_i(z) = \{1, 1, 2\} \equiv 112$ and $G_i(z) = \{1, 1, 2, 2\} \equiv 1122$, where we use 112 and 1122 for notational simplicity. In this case, $R_i(z)$ or $G_i(z)$ has a one-to-one mapping to the number of students from Gaokao within the room of unit $i$.

Let $I(\cdot)$ be the indicator function. For unit $i$, $Y_i = Y_i(Z) = \sum_{z \in \mathcal{Z}} I(Z = z) Y_i(z)$ is the observed outcome, and $R_i = R_i(Z) = \sum_{z \in \mathcal{Z}} I(Z = z) R_i(z)$ is the observed peer attribute set. These summations are over all possible values of the treatment assignments for all units.
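To make the notation concrete, the following minimal sketch computes the peer attribute set and the group attribute set of every unit from a given grouping. The function name `attribute_sets`, the dictionary layout, and the encoding of multisets as sorted tuples are illustrative assumptions, not part of the paper.

```python
def attribute_sets(groups, attrs):
    """Return (R, G): peer and group attribute sets for every unit.

    groups: list of lists of unit ids (one list per group/room).
    attrs:  dict mapping unit id -> attribute (e.g. 1 = Gaokao, 2 = recommendation).
    Multisets are encoded as sorted tuples, e.g. (1, 1, 2) stands for "112".
    """
    R, G = {}, {}
    for group in groups:
        group_set = tuple(sorted(attrs[j] for j in group))
        for i in group:
            G[i] = group_set
            # Peers are all members of the group except unit i itself.
            R[i] = tuple(sorted(attrs[j] for j in group if j != i))
    return R, G


# Unit 0 is from recommendation and lives with two Gaokao students
# and one recommendation student.
R, G = attribute_sets([[0, 1, 2, 3]], {0: 2, 1: 1, 2: 1, 3: 2})
print(R[0], G[0])  # (1, 1, 2) (1, 1, 2, 2)
```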

2.2 Group-level SUTVA and exclusion-restriction-type assumptions

Without further assumptions, the potential outcome $Y_i(z)$ depends on the treatments of all units. This makes statistical inference intractable. We invoke the following two assumptions to reduce the number of potential outcomes.

Assumption 1.

If $z_i = z_i'$, then $Y_i(z) = Y_i(z')$, for any two treatment assignments $z, z' \in \mathcal{Z}$ and any unit $i$.

Assumption 1 states that if a unit’s peers do not change, then its potential outcome will not change. This assumption requires no interference between groups but allows for interference within groups. Under Assumption 1, each unit’s potential outcomes depend only on its peers in the same group. Therefore, we can write $Y_i(z)$ as $Y_i(z_i)$, a function of the peers of unit $i$. Assumption 1 is a group-level SUTVA, which is similar to the “partial interference” assumption (Sobel 2006; Hudgens and Halloran 2008).

Assumption 2.

If $R_i(z) = R_i(z')$, then $Y_i(z) = Y_i(z')$, for any two treatment assignments $z, z' \in \mathcal{Z}$ and any unit $i$.

Assumption 2 states that if the treatment assignment does not affect the attributes of the peers of unit $i$, then it does not affect the outcome of unit $i$. Therefore, the potential outcomes of each unit depend only on its peers’ attributes instead of its peers’ identities. Assumption 2 is similar to “anonymous interaction” (Manski 2013). Assumption 2 implies that the peer attribute set of a unit is the ultimate treatment of interest. We are inferring the treatment effects of the peer attribute set. Previous works often invoked Assumption 2, or a slightly weaker form, for inferring peer effects among college roommates. For example, in Sacerdote (2001)’s study from Dartmouth College, the ultimate treatment was peers’ academic indices created by the admission office, and in Langenskiöld and Rubin (2008)’s study from Harvard College, the ultimate treatment was peers’ smoking behaviors.

Both Assumptions 1 and 2 are untestable based on the observed data from a single experiment. They are strong identifying assumptions. We will relax them in Section 7.

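Under Assumptions 1 and 2, $Y_i(z)$ simplifies to $Y_i(R_i(z))$, a function of the peer attribute set of unit $i$. Recall that $R_i(z)$ contains unordered but replicable elements from $\{1, \ldots, H\}$. Let $\mathcal{R}$ be the set consisting of all possible values of $R_i(z)$. The potential outcome of unit $i$, $Y_i(z)$, simplifies to $Y_i(r)$ for some $r \in \mathcal{R}$. Then the potential outcome is $Y_i(z) = \sum_{r \in \mathcal{R}} I\{R_i(z) = r\} Y_i(r)$, and the observed outcome is $Y_i = \sum_{r \in \mathcal{R}} I(R_i = r) Y_i(r)$. Therefore, we can view the elements in $\mathcal{R}$ as hypothetical treatments, with $|\mathcal{R}| = \binom{H+K-2}{K-1}$ possible values. In our motivating application, $|\mathcal{R}| = 4$ and $\mathcal{R} = \{111, 112, 122, 222\}$.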
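The set of hypothetical treatments can be enumerated directly, since a peer attribute set is a multiset of size K-1 drawn from the H attributes. The sketch below is illustrative only: the function name `peer_attribute_sets` and the string labels are assumptions made for this example.

```python
from itertools import combinations_with_replacement
from math import comb


def peer_attribute_sets(H, K):
    """All multisets of size K-1 with elements in 1..H, as sorted tuples."""
    return list(combinations_with_replacement(range(1, H + 1), K - 1))


# Motivating application: two attributes, four-person rooms.
treatments = peer_attribute_sets(H=2, K=4)
labels = ["".join(map(str, r)) for r in treatments]
print(labels)  # ['111', '112', '122', '222']

# The count agrees with the multiset formula C(H+K-2, K-1).
assert len(treatments) == comb(2 + 4 - 2, 4 - 1)
```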

As a side note, motivated by the example of the university in China, we consider the case with equal group sizes $K$. When groups have different sizes, we need to modify Assumption 2. For example, we can assume that the potential outcomes of a unit depend on the proportions of his/her peers’ attributes. The plausibility of this assumption depends on the context of the application, and we leave it to future work.

2.3 Causal estimands for peer effects

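For units with attribute $a$, let $n_{[a]} = \sum_{i=1}^n I(A_i = a)$ and $w_{[a]} = n_{[a]}/n$ be the number and proportion, and $\bar{Y}_{[a]}(r) = n_{[a]}^{-1} \sum_{i: A_i = a} Y_i(r)$ be the subgroup average potential outcome under treatment $r$. Let $\bar{Y}(r) = n^{-1} \sum_{i=1}^n Y_i(r)$ be the average potential outcome for all units under treatment $r$. Therefore, $\bar{Y}(r) = \sum_{a=1}^H w_{[a]} \bar{Y}_{[a]}(r)$ is a weighted average of the $\bar{Y}_{[a]}(r)$’s. Comparing treatments $r, r' \in \mathcal{R}$, we define $\tau_i(r, r') = Y_i(r) - Y_i(r')$ as the individual peer effect,

$$\tau_{[a]}(r, r') = \frac{1}{n_{[a]}} \sum_{i: A_i = a} \tau_i(r, r') = \bar{Y}_{[a]}(r) - \bar{Y}_{[a]}(r') \quad (1)$$

as the subgroup average peer effect for units with attribute $a$, and

$$\tau(r, r') = \frac{1}{n} \sum_{i=1}^n \tau_i(r, r') = \sum_{a=1}^H w_{[a]} \tau_{[a]}(r, r') \quad (2)$$

as the average peer effect for all units. We are interested in estimating the average peer effects $\tau_{[a]}(r, r')$ and $\tau(r, r')$. They are functions of the fixed potential outcomes and do not depend on the treatment assignment mechanism.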
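The estimands are simple averages over a full table of potential outcomes, which is never observed in practice. The sketch below evaluates them on a small hypothetical table. The data, the function names, and the representation of potential outcomes as one dictionary per unit are all assumptions made for illustration.

```python
from statistics import mean


def subgroup_effect(Y, attr, a, r, r2):
    """Average of Y_i(r) - Y_i(r2) over units with attribute a."""
    return mean(Y[i][r] - Y[i][r2] for i in range(len(attr)) if attr[i] == a)


def average_effect(Y, attr, r, r2):
    """Average peer effect, written as a proportion-weighted sum of subgroup effects."""
    n = len(attr)
    return sum(
        (attr.count(a) / n) * subgroup_effect(Y, attr, a, r, r2)
        for a in set(attr)
    )


# Hypothetical potential outcomes for three units under treatments "112" and "122".
attr = [1, 1, 2]
Y = [{"112": 3, "122": 1}, {"112": 5, "122": 1}, {"112": 2, "122": 2}]
print(subgroup_effect(Y, attr, 1, "112", "122"))  # 3
print(average_effect(Y, attr, "112", "122"))
```

The weighted sum agrees with the plain average of the individual effects, here (2 + 4 + 0) / 3.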

For ease of reading, we summarize the key notation in Table 1.

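notation | definition | meaning, properties or possible values
$Z_i$ | peer assignment of unit $i$ | a set of the identity numbers of his/her peers
$A_i$ | unit $i$’s attribute | $A_i \in \{1, \ldots, H\}$
$n_{[a]}$ | number of units with attribute $a$ | $n_{[a]} = \sum_{i=1}^n I(A_i = a)$
$w_{[a]}$ | proportion of units with attribute $a$ | $w_{[a]} = n_{[a]}/n$ and $\sum_{a=1}^H w_{[a]} = 1$
$R_i$ | unit $i$’s peer attribute set | a set of the attributes of unit $i$’s peers
$\mathcal{R}$ | a set of all possible values of $R_i$ |
$G_i$ | unit $i$’s group attribute set | a set of attributes of all units in unit $i$’s group
$\mathcal{G}$ | a set of all possible values of $G_i$ | $\mathcal{G} = \{g_1, \ldots, g_T\}$ with $T = |\mathcal{G}|$
$Y_i(z)$ | unit $i$’s potential outcome under original treatment $z$ |
$Y_i(r)$ | unit $i$’s potential outcome under ultimate treatment $r$ |
Table 1: Notation and explanations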

2.4 Treatment assignment mechanism

The treatment assignment mechanism is important for identifying and estimating peer effects. We consider treatment assignment mechanisms satisfying some symmetry conditions. First, units with the same attribute must have the same probability to receive all treatments. Second, pairs of units with the same pair of attributes must have the same probability to receive all pairs of treatments. Formally, we require that the treatment assignment mechanism satisfies the following two conditions.

Assumption 3.

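For any $r, r' \in \mathcal{R}$,

  • $\mathrm{pr}(R_i = r) = \mathrm{pr}(R_j = r)$ if $A_i = A_j$;

  • $\mathrm{pr}(R_i = r, R_j = r') = \mathrm{pr}(R_k = r, R_l = r')$ if $A_i = A_k$ and $A_j = A_l$, for $i \neq j$ and $k \neq l$.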

We will give two examples of treatment assignment mechanisms satisfying Assumption 3.

2.4.1 Random partitioning

Under random partitioning, we randomly assign $n$ units to $m$ groups of size $K$, and all possible partitions of units have equal probability. To be more specific, if a treatment assignment $z$ is compatible with a partition of units into $m$ groups of size $K$, then $\mathrm{pr}(Z = z) = m! (K!)^m / (mK)!$; otherwise, $\mathrm{pr}(Z = z) = 0$. This formula follows from counting all possible random partitions. To generate a random partition, we can randomly permute units and divide them into groups of equal size sequentially.
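The permute-and-divide procedure can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `random_partition`, `peer_assignment` and `num_partitions` are hypothetical, and the count assumes unlabeled groups of equal size.

```python
import random
from math import factorial


def random_partition(units, K, rng=random):
    """Randomly permute the units, then cut the sequence into groups of size K."""
    units = list(units)
    if len(units) % K != 0:
        raise ValueError("number of units must be a multiple of K")
    rng.shuffle(units)
    return [units[i:i + K] for i in range(0, len(units), K)]


def peer_assignment(groups):
    """Map each unit to the set of its peers (its treatment under the original assignment)."""
    return {i: frozenset(g) - {i} for g in groups for i in g}


def num_partitions(m, K):
    """Number of partitions of m*K units into m unlabeled groups of size K."""
    return factorial(m * K) // (factorial(m) * factorial(K) ** m)


groups = random_partition(range(8), K=4, rng=random.Random(0))
Z = peer_assignment(groups)
print(num_partitions(2, 4))  # 35, so each partition has probability 1/35
```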

Random partitioning, however, can result in unlucky realizations of the randomization. We may have too few units with attributes and treatments of interest. For illustration, we consider the motivating education example with $n = 8$ students, 5 from Gaokao and 3 from recommendation. Assume that we are interested in $\tau_{[1]}(112, 122)$, the treatment effect of 112 versus 122 for students from Gaokao. Under random partitioning, it is possible that no student from Gaokao receives treatment 112 or 122. In that case, it is impossible to estimate $\tau_{[1]}(112, 122)$ precisely. An example of such a realization is that 4 students from Gaokao live in one room and the remaining 1 student from Gaokao and 3 students from recommendation live in the other room.
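How often such an unlucky realization occurs in the eight-student example can be checked by brute-force enumeration. The sketch below is illustrative: the function name `prob_unlucky` is hypothetical, attributes are coded 1 for Gaokao and 2 for recommendation, and the code handles only the two-room case.

```python
from fractions import Fraction
from itertools import combinations


def prob_unlucky(attrs, K, a, targets):
    """Probability, under random partitioning into two rooms of size K, that no
    unit with attribute a receives any peer attribute set listed in targets."""
    units = sorted(attrs)
    if len(units) != 2 * K:
        raise ValueError("this sketch handles exactly two groups")
    bad = total = 0
    for room in combinations(units, K):
        if units[0] not in room:
            continue  # fix the first unit's room so each partition is counted once
        other = tuple(u for u in units if u not in room)
        total += 1
        received = {
            tuple(sorted(attrs[j] for j in g if j != i))
            for g in (room, other)
            for i in g
            if attrs[i] == a
        }
        if not received & set(targets):
            bad += 1
    return Fraction(bad, total)


attrs = {i: 1 for i in range(5)}           # five students from Gaokao
attrs.update({i: 2 for i in range(5, 8)})  # three students from recommendation
print(prob_unlucky(attrs, 4, a=1, targets=[(1, 1, 2), (1, 2, 2)]))  # 1/7
```

Under these assumptions, 5 of the 35 equally likely partitions put four Gaokao students in one room, so the comparison of 112 versus 122 for Gaokao students is unavailable with probability 1/7.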

2.4.2 Complete randomization

We propose another treatment assignment mechanism to avoid the drawback of random partitioning. It requires a predetermined number of units for each attribute receiving each treatment. We achieve this goal by fixing the numbers of groups with each group attribute set. Recall that the group attribute set $G_i(z)$ contains unordered but replicable elements from $\{1, \ldots, H\}$. Consider the same education example with 5 students from Gaokao and 3 students from recommendation. Under random partitioning, we may hope that one room has group attribute set 1112 and thus the other room has group attribute set 1122. This results in 3 and 2 students from Gaokao receiving treatments 112 and 122, respectively. Therefore, this avoids other assignments with no students from Gaokao receiving these treatments of interest.

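We need additional symbols to describe complete randomization. Let $\mathcal{G} = \{g_1, \ldots, g_T\}$ be the set consisting of all possible group attribute sets, with cardinality $T = |\mathcal{G}| = \binom{H+K-1}{K}$. In our motivating application, $\mathcal{G} = \{1111, 1112, 1122, 1222, 2222\}$ with $T = 5$. Under treatment assignment $z$, the number of groups with attribute set $g \in \mathcal{G}$ is

$$L_g(z) = \frac{1}{K} \sum_{i=1}^n I\{G_i(z) = g\},$$

where the divisor $K$ appears because all units in the same group must have the same group attribute set. Let $L(z) = (L_{g_1}(z), \ldots, L_{g_T}(z))$ be the vector of numbers of groups corresponding to group attribute sets $(g_1, \ldots, g_T)$ under assignment $z$.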

Under complete randomization, the assignment $z$ must satisfy $L(z) = l$ for a predetermined constant vector $l = (l_1, \ldots, l_T)$, and all such assignments must have equal probability. For any $g \in \mathcal{G}$ and $a \in \{1, \ldots, H\}$, let $g(a)$ be the number of elements in set $g$ that are equal to $a$. If $z$ is compatible with a partition of units into $m$ groups and $L(z) = l$, then

$$\mathrm{pr}(Z = z) = \frac{\prod_{t=1}^{T} l_t! \, \prod_{a=1}^{H} \prod_{t=1}^{T} \{g_t(a)!\}^{l_t}}{\prod_{a=1}^{H} n_{[a]}!}; \quad (3)$$

otherwise, $\mathrm{pr}(Z = z) = 0$. The above formula (3) follows from counting all possible complete randomizations. To generate a complete randomization, we can first randomly partition the $n_{[a]}$ units with attribute $a$ into $m$ groups, where each of the first $l_1$ groups has $g_1(a)$ units, each of the next $l_2$ groups has $g_2(a)$ units, $\ldots$, each of the last $l_T$ groups has $g_T(a)$ units. The partitions for units with different attributes are mutually independent. Finally, the first $l_1$ groups will have group attribute set $g_1$, $\ldots$, and the last $l_T$ groups will have group attribute set $g_T$, satisfying the requirement $L(Z) = l$.

We revisit the education example with 5 students from Gaokao and 3 students from recommendation. The complete randomization has predetermined vector $l = (0, 1, 1, 0, 0)$. Thus, one group has attribute set 1112 and the other group has attribute set 1122. We need to randomly assign 3 students from Gaokao and 1 student from recommendation to group 1112, and assign the remaining students to group 1122. Equivalently, for the 5 students from Gaokao, we randomly assign 3 of them to group 1112 and the remaining 2 to group 1122; for the 3 students from recommendation, we randomly assign 1 of them to group 1112 and the remaining 2 to group 1122, independently of the group assignments for students from Gaokao.
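The attribute-by-attribute generation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `complete_randomization`, the encoding of group attribute sets as sorted tuples, and the argument layout are hypothetical choices, not the authors' code.

```python
import random
from collections import Counter


def complete_randomization(attrs, group_sets, counts, rng=random):
    """Draw groups so that counts[t] groups have attribute set group_sets[t].

    attrs:      dict unit id -> attribute.
    group_sets: list of group attribute sets, each a tuple such as (1, 1, 1, 2).
    counts:     predetermined number of groups for each entry of group_sets.
    """
    # Target composition of each of the m groups, in a fixed order.
    targets = [g for g, l in zip(group_sets, counts) for _ in range(l)]
    groups = [[] for _ in targets]
    attributes = set(attrs.values()) | {a for g in targets for a in g}
    for a in sorted(attributes):
        pool = [i for i, ai in attrs.items() if ai == a]
        need = [Counter(g)[a] for g in targets]  # units of attribute a per group
        if sum(need) != len(pool):
            raise ValueError(f"counts are incompatible with attribute {a}")
        rng.shuffle(pool)  # independent shuffle for each attribute
        start = 0
        for group, k in zip(groups, need):
            group.extend(pool[start:start + k])
            start += k
    return groups


attrs = {i: 1 for i in range(5)}
attrs.update({i: 2 for i in range(5, 8)})
G_SETS = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 2), (1, 2, 2, 2), (2, 2, 2, 2)]
groups = complete_randomization(attrs, G_SETS, [0, 1, 1, 0, 0], random.Random(1))
print([sorted(attrs[i] for i in g) for g in groups])  # [[1, 1, 1, 2], [1, 1, 2, 2]]
```

Shuffling each attribute's units separately mirrors the statement that the partitions for different attributes are mutually independent.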

For $a \in \{1, \ldots, H\}$ and $r \in \mathcal{R}$, let $n_{[a]r}$ be the number of units with attribute $a$ receiving treatment $r$. First, the units with attribute $a$ receiving treatment $r$ must have group attribute set $\{a\} \cup r$, which equals $g_t$ for some $t$. Second, each group with attribute set $g_t$ contains $g_t(a)$ units with attribute $a$. Third, all of these units receive the same treatment $r$. These facts imply that

$$n_{[a]r} = \sum_{t=1}^{T} l_t \, g_t(a) \, I(g_t = \{a\} \cup r) \quad (4)$$

depends only on the vector $l$. Thus, the $n_{[a]r}$’s are constants under complete randomization. In the previous education example with $n = 8$ students, consider complete randomization with predetermined vector $l = (0, 1, 1, 0, 0)$. The numbers of units from Gaokao receiving treatments 112 and 122 are constants $n_{[1]112} = 3$ and $n_{[1]122} = 2$. Therefore, complete randomization can guarantee that at least some students from Gaokao receive the treatments of interest.
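The three facts above translate into a short counting routine. The sketch below is illustrative: the function name `treated_counts` and the encoding of attribute sets as sorted tuples are assumptions made for this example.

```python
from collections import Counter


def treated_counts(group_sets, counts):
    """Number of units with attribute a receiving peer attribute set r,
    returned as a dict keyed by (a, r), given the numbers of groups per attribute set."""
    n = Counter()
    for g, l in zip(group_sets, counts):
        if l == 0:
            continue
        for a, g_a in Counter(g).items():
            peers = list(g)
            peers.remove(a)  # drop one copy of a: what a unit with attribute a sees
            n[(a, tuple(peers))] += l * g_a
    return dict(n)


G_SETS = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 2), (1, 2, 2, 2), (2, 2, 2, 2)]
n_ar = treated_counts(G_SETS, [0, 1, 1, 0, 0])
print(n_ar[(1, (1, 1, 2))], n_ar[(1, (1, 2, 2))])  # 3 2
```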

Moreover, under random partitioning, if we conduct inference conditional on $L(Z)$, then the treatment assignment mechanism becomes complete randomization with $l$ fixed at the observed vector $L(Z)$. Therefore, even under random partitioning, we can still conduct inference under complete randomization if we condition on $L(Z)$.

2.5 Connection and distinction between existing literature and our paper

We comment on the difference between the majority of the existing literature and our paper. We compare two types of interference.

Figure 1(a) illustrates the first type. The grey or white color of each unit denotes the external treatment (e.g., receiving vaccine or not). Each unit’s outcome depends not only on its own treatment but also on treatments of other units in its circle. Thus, units interfere with each other in the same dashed circle. Importantly, the network structure is fixed.

Figure 1(b) illustrates the second type. The grey or white color denotes the units’ attributes (e.g., from Gaokao or recommendation in the motivating application). The outcome of each unit depends on the attributes of other units in its circle. Thus, units interfere with each other in the same dashed circle. Unlike the first type, the units’ attributes are fixed but the network structure is random.

(a) The first type of interference with a fixed network and random external interventions. (a1) and (a2) are two possible realizations of random external interventions (colors of the units).
(b) The second type of interference with fixed attributes of all units and a random network. (b1) and (b2) are two possible realizations of random networks (dashed circles).
Figure 1: Two types of interference with dashed circles indicating networks.

A main difference between these two types comes from the source of randomness. For the first type, the colors are random and the dashed circles are fixed. For the second type, the colors are fixed and the dashed circles are randomly formed. The recent causal inference literature focused on the first type (Hudgens and Halloran 2008; Aronow 2012; Liu and Hudgens 2014; Athey et al. 2018). In this paper, we formalize the second type and propose inferential procedures based on the treatment assignment mechanism.

3 Inference for peer effects under general treatment assignment

3.1 Point estimators for peer effects

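Throughout the paper, we invoke, unless otherwise stated, Assumptions 1–3. For $a \in \{1, \ldots, H\}$ and $r \in \mathcal{R}$, let $\pi_{[a]}(r) = \mathrm{pr}(R_i = r)$ be the probability that a unit with attribute $a$ receives treatment $r$. Define

$$\hat{Y}_{[a]}(r) = \frac{1}{n_{[a]}} \sum_{i: A_i = a} \frac{I(R_i = r) Y_i}{\pi_{[a]}(r)}, \quad (5)$$

$$\hat{\tau}_{[a]}(r, r') = \hat{Y}_{[a]}(r) - \hat{Y}_{[a]}(r'), \quad \hat{\tau}(r, r') = \sum_{a=1}^{H} w_{[a]} \hat{\tau}_{[a]}(r, r'). \quad (6)$$

Proposition 1.

For $a \in \{1, \ldots, H\}$ and $r, r' \in \mathcal{R}$, the estimators $\hat{\tau}_{[a]}(r, r')$ and $\hat{\tau}(r, r')$ are unbiased for $\tau_{[a]}(r, r')$ and $\tau(r, r')$, respectively.

The unbiasedness of $\hat{Y}_{[a]}(r)$ follows from the Horvitz–Thompson-type inverse probability weighting, and the unbiasedness of $\hat{\tau}_{[a]}(r, r')$ and $\hat{\tau}(r, r')$ then follows directly from the linearity of expectation.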
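A Horvitz–Thompson-type estimator of this kind can be sketched as below. The function names, the toy data, and the use of exact fractions are assumptions made for illustration. The assignment probabilities are taken as known inputs, here set to 3/5 and 2/5 as they would be under the complete randomization of the eight-student example.

```python
from fractions import Fraction


def ht_mean(y, attr, peers, a, r, pi_ar):
    """Inverse-probability-weighted estimate of the subgroup average potential
    outcome under treatment r, for units with attribute a."""
    n_a = sum(1 for ai in attr if ai == a)
    total = sum(yi for yi, ai, ri in zip(y, attr, peers) if ai == a and ri == r)
    return total / (n_a * pi_ar)


def ht_subgroup_effect(y, attr, peers, a, r, r2, pi):
    """Difference of two weighted means; pi maps (attribute, treatment) to a probability."""
    return (ht_mean(y, attr, peers, a, r, pi[(a, r)])
            - ht_mean(y, attr, peers, a, r2, pi[(a, r2)]))


# Hypothetical data: rooms {0, 1, 2, 5} and {3, 4, 6, 7}; units 0-4 are from Gaokao.
attr = [1, 1, 1, 1, 1, 2, 2, 2]
peers = ["112", "112", "112", "122", "122", "111", "112", "112"]
y = [3, 4, 5, 1, 2, 9, 9, 9]
pi = {(1, "112"): Fraction(3, 5), (1, "122"): Fraction(2, 5)}
print(ht_subgroup_effect(y, attr, peers, 1, "112", "122", pi))  # 5/2
```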

3.2 Sampling variances of the peer effect estimators

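For units with attribute $a$ and $r, r' \in \mathcal{R}$, define

$$S^2_{[a]}(r) = \frac{1}{n_{[a]} - 1} \sum_{i: A_i = a} \{Y_i(r) - \bar{Y}_{[a]}(r)\}^2, \quad S^2_{[a]}(r\text{-}r') = \frac{1}{n_{[a]} - 1} \sum_{i: A_i = a} \{\tau_i(r, r') - \tau_{[a]}(r, r')\}^2$$

as the finite population variances of the potential outcomes and individual peer effects, and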

as the average of the products of the potential outcomes for pairs of units with attribute .

For and if are two units with attributes and , then is the joint treatment assignment probability, and

(7)

measures the dependence between the two events and . We further need a few known constants depending only on the treatment assignment mechanism. For and define

(8)

and

(9)

These constants are useful for expressing the sampling variances of the estimators.

Theorem 1.

Under Assumptions 1–3, for treatments $r, r' \in \mathcal{R}$, the sampling variance of the subgroup average peer effect estimator is

(10)

and the sampling variance of the average peer effect estimator is

(11)

From Theorem 1, the sampling variances of the peer effect estimators depend on the finite population variances of potential outcomes and individual peer effects, the products of two subgroup average potential outcomes, and the product averages. In contrast to the product of two subgroup averages, the product average excludes the product of two potential outcomes of the same unit. Note that we cannot unbiasedly estimate quantities involving the product of two potential outcomes of the same unit in general, because we cannot jointly observe the potential outcomes, $Y_i(r)$ and $Y_i(r')$, for any unit $i$ and any treatments $r \neq r'$.

Moreover, the sampling variance of the average peer effect estimator is a weighted summation of the sampling variances of the subgroup estimators, corresponding to the first two terms in (11), and the sampling covariances between the subgroup estimators for different attributes, corresponding to the last double summation in (11).

3.3 Estimating the sampling variances

From Theorem 1, to estimate the sampling variances, we need to estimate the population quantities in (10) and (11). Define

(12)
Theorem 2.

Under Assumptions 1–3, the estimators defined in (12) are unbiased for the corresponding finite population quantities.

The estimators in Theorem 2 correspond to the sample analogues of these finite population quantities, with carefully chosen coefficients to ensure unbiasedness. Theorem 2 guarantees that we have unbiased estimators for all terms in the sampling variances of the subgroup and overall peer effect estimators except the finite population variance of the individual peer effects. We cannot unbiasedly estimate this variance from the observed data. This is analogous to other finite population causal inference (Neyman 1923). Because the coefficients of this variance in the variance formulas (10) and (11) are both negative, we can ignore the terms involving it and conservatively estimate the sampling variances by simply plugging in the estimators in Theorem 2. Note that the variance of the individual peer effects equals zero under additivity defined below.

Definition 1.

The individual peer effects for units with attribute $a$ are additive if and only if $\tau_i(r, r')$ is constant for each unit $i$ with attribute $a$, or, equivalently, the finite population variance of the individual peer effects among these units is zero.

Therefore, the final estimator for the sampling variance of the subgroup estimator is unbiased under additivity for attribute $a$, and the final estimator for the sampling variance of the overall estimator is unbiased under additivity for all attributes.

4 Inference for peer effects under complete randomization

Under random partitioning, the formulas of the treatment assignment probabilities and the associated constants in Section 3.2 are complicated, and so are the sampling variances of peer effect estimators. We relegate them to the Supplementary Material. Fortunately, they have much simpler forms under complete randomization. In this section, we will focus on the inference under complete randomization.

4.1 Treatment assignment under complete randomization

The randomness in the peer effect estimators comes solely from the treatment assignments for all units, $(R_1, \ldots, R_n)$. Therefore, we need to first characterize the distribution of the treatments under complete randomization. Intuitively, the symmetry of complete randomization suggests that $(R_1, \ldots, R_n)$ has the same distribution as the treatment of a stratified randomized experiment. The following proposition states this equivalence formally.

Proposition 2.

Under Assumptions 1 and 2, the complete randomization defined in Section 2.4.2 induces a stratified randomized experiment, in the sense that (1) for each $a \in \{1, \ldots, H\}$, in the stratum consisting of units with attribute $a$, $n_{[a]r}$ units receive treatment $r$ for any $r \in \mathcal{R}$, and any realization of treatments for these units has the same probability; and (2) the treatments of units are independent across strata.

Proposition 2 follows from the numerical implementation of the complete randomization described in Section 2.4.2. It implies the formulas of the treatment assignment probabilities and the associated constants in Section 3.2. We give a formal proof in the Supplementary Material. The group assignment for units with the same attribute $a$ induces a completely randomized experiment, with $n_{[a]r}$ units receiving treatment $r$. Moreover, the group assignments for units with different attributes are mutually independent.

4.2 Point estimators for peer effects

Proposition 2 characterizes the treatment assignment of complete randomization, which allows us to express the peer effect estimators in simpler forms.

Corollary 1.

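Under Assumptions 1 and 2, and under the complete randomization defined in Section 2.4.2, for $a \in \{1, \ldots, H\}$ and $r, r' \in \mathcal{R}$,

$$\hat{Y}_{[a]}(r) = \frac{1}{n_{[a]r}} \sum_{i: A_i = a, R_i = r} Y_i, \quad \hat{\tau}_{[a]}(r, r') = \hat{Y}_{[a]}(r) - \hat{Y}_{[a]}(r'). \quad (13)$$

Therefore, under complete randomization, the unbiased estimator of the subgroup average peer effect, $\hat{\tau}_{[a]}(r, r')$, is the observed difference in outcome means under treatments $r$ and $r'$ for units with attribute $a$.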
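A short derivation may help explain why the weighted estimator collapses to a sample mean. It assumes, as the stratified-experiment characterization suggests, that under complete randomization a unit with attribute $a$ receives treatment $r$ with probability $n_{[a]r}/n_{[a]}$. The symbols $\pi_{[a]}(r)$ and $\hat{Y}_{[a]}(r)$ are notation assumed here for that probability and for the inverse-probability-weighted estimator of Section 3.1.

```latex
\begin{align*}
\hat{Y}_{[a]}(r)
  &= \frac{1}{n_{[a]}} \sum_{i: A_i = a} \frac{I(R_i = r)\, Y_i}{\pi_{[a]}(r)}
   = \frac{1}{n_{[a]}} \cdot \frac{n_{[a]}}{n_{[a]r}} \sum_{i: A_i = a} I(R_i = r)\, Y_i \\
  &= \frac{1}{n_{[a]r}} \sum_{i: A_i = a,\, R_i = r} Y_i .
\end{align*}
```

In the eight-student example with three Gaokao students receiving 112, the weight is $(1/5) \cdot (5/3) = 1/3$, so the estimator is the plain average of their three observed outcomes.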

4.3 Sampling variances of the peer effect estimators

The sampling variances also have simpler forms under complete randomization.

Corollary 2.

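Under Assumptions 1 and 2, and under the complete randomization defined in Section 2.4.2, for $a \in \{1, \ldots, H\}$ and $r, r' \in \mathcal{R}$,

$$\mathrm{Var}\{\hat{\tau}_{[a]}(r, r')\} = \frac{S^2_{[a]}(r)}{n_{[a]r}} + \frac{S^2_{[a]}(r')}{n_{[a]r'}} - \frac{S^2_{[a]}(r\text{-}r')}{n_{[a]}}, \quad \mathrm{Var}\{\hat{\tau}(r, r')\} = \sum_{a=1}^{H} w_{[a]}^2 \, \mathrm{Var}\{\hat{\tau}_{[a]}(r, r')\},$$

where $S^2_{[a]}(r)$ and $S^2_{[a]}(r\text{-}r')$ are the finite population variances of the potential outcomes and individual peer effects defined in Section 3.2.

From Corollary 2, the variance formula of the subgroup average peer effect estimator under complete randomization is the same as that for classical completely randomized experiments with multiple treatments (Neyman 1923). This follows from the equivalence relationship in Proposition 2. Corollary 2 also implies that the subgroup estimators for different attributes are uncorrelated. This follows from the mutual independence of the subgroup estimators $\hat{\tau}_{[1]}(r, r'), \ldots, \hat{\tau}_{[H]}(r, r')$ in an experiment stratified on attributes.

From Corollary 2, the $n_{[a]r}$’s are the effective sample sizes. On the one hand, this is intuitive because they are the sample sizes of the stratified experiment described in Proposition 2. On the other hand, this is counterintuitive because units in the same group have correlated observed outcomes. However, this correlation does not diminish the effective sample sizes, in contrast to the correlation in standard group-randomized experiments. Units in the same group could potentially be in a different group under a different realization of the treatment assignment. The probability that two given units are in the same group decreases as $n$ increases, and so does the correlation between their observed outcomes.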

4.4 Estimating the sampling variances

From Proposition 2, reduces to the sample mean, and