Connected but Segregated: Social Networks in Rural Villages

by Felipe Montes, et al.
Harvard University

There is an increased appreciation for, and utilization of, social networks to disseminate various kinds of interventions in a target population. Homophily, the tendency of people to be similar to those they interact with, can create within-group cohesion but at the same time can also lead to societal segregation. In public health, social segregation can form barriers to the spread of health interventions from one group to another. We analyzed the structure of social networks in 75 villages in Karnataka, India, both at the level of individuals and network communities. We found all villages to be strongly segregated at the community level, especially along the lines of caste and sex, whereas other socioeconomic variables, such as age and education, were only weakly associated with these groups in the network. While the studied networks are densely connected, our results indicate that the villages are highly segregated.



1 Introduction

The study of social network structure has enabled the identification of social relations as conduits for the spread of health-related behaviors in both randomized and observational studies [1, 2, 3]. Several studies have now demonstrated the utility of social networks for identifying initial spreaders within networks and how they may be harnessed to increase the efficiency of public health and development interventions [4, 5, 6, 7]. It has been shown that the community structure of networks, where a community refers to a set of densely interconnected nodes, is important for targeting public health interventions, especially interventions that may have spillover effects, i.e., when the effect of an intervention may spread from one person to another [8, 9]. An additional consideration is the role of homophily, the tendency for individuals to be connected to others like them, which can result in overly optimistic estimates of the effectiveness of different seeding strategies of interventions if not properly taken into account [10].

Karnataka is a southern state of India with approximately 55 million inhabitants, and it agglomerates individuals from heterogeneous castes under a common language and religion. Prior research has studied the Karnataka networks at the household level with the goal of identifying injection points for a microfinance program [4]. The network structure within these villages would be expected to be influenced by the ancient caste system of social stratification that generates hierarchies, restricts dietary and social interactions, and creates physical and educational separation between people from different castes [11]. Our goal in this paper is to carry out a detailed investigation into the structural properties of these networks, their community structure, and the role that nodal covariates play in giving rise to social segregation. A deeper understanding of these networks could make it possible to develop more effective intervention strategies that overcome the existing social and cultural barriers in similar villages in resource-poor settings throughout the world.

2 Results

2.1 Network Characteristics

We used data, now available in the public domain, collected in 75 villages in Karnataka, India, in a study conducted by the Abdul Latif Jameel Poverty Action Lab in 2006 [4]. As in the Diffusion of Microfinance study, for every village we constructed undirected single-layered networks, where a node represents an individual and an undirected tie connects two nodes if one of the individuals reported at least one of the 12 types of relationships with the other [4] (see Methods). We also make use of the demographic and social network data collected from surveyed individuals. Nodal attributes or covariates include sex, age, religion (Hinduism, Islam, Christianity), caste (Scheduled Caste, Scheduled Tribe, Other Backward Class and General), level of education (years), an indicator for whether the individual worked during the week preceding the survey, and an indicator for whether the individual had a bank account. Network sizes across villages vary from 354 to 1773 nodes with a median value of 869 (see Supplementary Materials).

                                 Networks                 Largest Connected Components
                                 Min     Median  Max      Min     Median  Max
Nodes                            354     869     1773     346     850     1729
Edges                            1540    3750    7854     1519    3703    7818
Mean degree                      6.8     8.4     10.4     7.0     8.6     10.6
Mean clustering coefficient      0.6     0.6     0.7      0.6     0.6     0.7
Components                       1       7       25       1       1       1
Share of network nodes in LCC    -       -       -        89%     98%     100%
Share of network edges in LCC    -       -       -        95%     99%     100%
Table 1: Characteristics of the 75 networks and of their largest connected components (LCCs): the number of nodes, the number of edges, the mean degree, the mean (local) clustering coefficient, the number of connected components, and the share of network nodes and edges contained in the LCC.

For our analyses we use the largest connected components (LCCs) of the networks. All networks, with one exception, have more than one connected component and the LCCs of these networks contain a median value of 98% of network nodes and 99% of network ties (Table 1). Moreover, the average degree and mean clustering coefficient of the LCCs are within 3% of those computed for the full networks.
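As an illustration of the comparison described above, the following minimal sketch extracts the largest connected component of an undirected edge list and reports the share of nodes and edges it retains. The function names and the toy network are hypothetical and are not part of the original study's code.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return a list of node sets, one per connected component (BFS)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def lcc_summary(nodes, edges):
    """Summarize how much of the network the largest component retains."""
    comps = connected_components(nodes, edges)
    lcc = max(comps, key=len)
    lcc_edges = [(u, v) for u, v in edges if u in lcc and v in lcc]
    return {
        "n_components": len(comps),
        "node_share": len(lcc) / len(nodes),
        "edge_share": len(lcc_edges) / len(edges),
        "mean_degree": 2 * len(lcc_edges) / len(lcc),
    }

# Toy network: a 4-node component, a 2-node component, and one isolate.
toy_nodes = [1, 2, 3, 4, 5, 6, 7]
toy_edges = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 6)]
summary = lcc_summary(toy_nodes, toy_edges)
```

In the toy network the LCC keeps 4 of 7 nodes and 4 of 5 edges; in the village data the corresponding medians are 98% and 99%.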

2.2 Dyad-level assortativity

Social segregation might be attributed to dyad-level assortativity, which quantifies the extent to which pairs of connected nodes share the value of an attribute of interest [12, 13]. By using a logistic regression model, which assumes dyadic independence, we modeled the existence of a tie between a pair of nodes based on the sex, age, caste, religion, education, employment and savings of the individuals. We found that, at the dyadic level, 98% of the networks had significant assortativity based on the caste attribute (Figure 1). Across the villages, individuals belonging to the same caste are 1.10 to 18.56 times more likely to form a tie than individuals belonging to different castes. Assortativity based on age, savings, work and education attributes was significant in 80%, 44%, 41%, and 60% of the villages, respectively, and the odds ratios for these attributes ranged from 0.96 to 1.54 for age, 0.81 to 2.13 for savings, 0.91 to 1.60 for work flag and 0.85 to 1.38 for education (Table 2). These results show that connected node pairs are much more similar in terms of their nodal attributes than unconnected node pairs. Nevertheless, these associations remain weaker than that associated with the caste attributes of individuals. In addition, assortativity for these attributes is similar and varies in a small range across the villages. In the 75 LCCs, 95.5% of individuals belong to the same religion, and consequently assortativity associated with religion was significant only in 38% of the networks with odds ratios lower than 1.5 except for one village where the odds ratio is close to 2. By contrast, individuals with the same sex are 1.1 to 3.4 times more likely to form a tie than individuals of the opposite sex (odds-ratio cumulative distributions are available in the Supplementary Materials).

Figure 1: Dyad-level assortativity odds ratios for different nodal attributes in each of the 75 LCCs. The error bars indicate 95% confidence intervals for the odds ratios. Villages are ordered along the x-axis in increasing order of odds ratios (point estimates), not according to village indices. The ordering of villages is consequently different in each panel.
Attribute    Number of     LCCs with significant    Dyad-level odds ratio       Mutual information
             categories    dyad-level               Min     Median  Max         coefficient
                           assortativity (%)
Caste        4             99                       4.31    5.06    18.56       0.39
Sex          2             97                       1.10    1.56    3.40        0.01
Age          6             80                       0.96    1.22    1.54        0.08
Workflag     2             41                       0.91    1.15    1.60        0.04
Education    6             60                       0.85    1.11    1.38        0.10
Savings      2             44                       0.81    1.09    2.13        0.05
Religion     4             38                       0.49    1.05    48.52       0.08
Table 2: Dyad-level assortativity odds ratios and community-based assortativity mutual information coefficient for different nodal attributes in each of the 75 LCCs.

The logistic model results for dyad-level assortativity by sex do not reveal whether the effect is attributable to male-male or female-female relations. To investigate further, we complemented the logistic model with a mean-degree-constrained null model and tested whether male-male, female-female and female-male relations were associated with preferential tie formation among individuals. This statistical test keeps the structure of the network fixed and reassigns the sex attribute of the nodes by a random permutation. According to the results, male-male ties occur more frequently than expected by chance in 72 (96.0%) villages (Table 3). Male-female ties were less likely to occur than expected by chance in 73 (97.3%) villages, that is, in all but 2 networks. Female-female ties were more common than expected by chance in just 8 (10.7%) of the villages; in the other 67 (89.3%) villages the test results were not statistically significant. As a sensitivity analysis, we also report the results when the mean degrees of males and females are allowed to deviate by as much as 20% from their empirically observed counterparts. Under this relaxation, the percentage of networks with male assortativity decreases to 76%, the percentage of networks with female assortativity increases to 16%, and the percentage of networks where female-male ties are less likely to occur than by chance remains at 97.3%. This shows that with relaxed restrictions on the permutation there is still male-male preference in most cases and female-female preference in some cases. The results for male-female ties are unaffected by relaxation of the null model.

Degree       Relationship     Percentage of      Percentage of
tolerance    type             villages with      villages with
                              assortativity      dissortativity
5%           Male-Male        96.0%              0.0%
             Male-Female      0.0%               97.3%
             Female-Female    10.7%              0.0%
20%          Male-Male        76%                0.0%
             Male-Female      0.0%               97.3%
             Female-Female    16.0%              0.0%
Table 3: Results for dyad-level assortativity and dissortativity based on sex for 75 villages (networks) in Karnataka.

2.3 Community-level Assortativity

While dyadic analysis permits us to quantify assortativity at the level of node pairs, it does not provide evidence of assortativity at the level of groups consisting of more than two nodes. We assess community-level assortativity by first detecting network communities [14, 15, 16] and then investigating the extent to which network communities share nodal attribute values. We apply modularity maximization [12, 17] to detect communities in the LCC of each network using the so-called Louvain heuristic for maximizing modularity [18] (for information on the number of communities per LCC and the community size distribution, see Supplementary Materials). In Figure 2 we provide intuition about which attributes might be associated with the detected network communities. We visualized the village networks with nodes colored by network community assignment, sex, age, religion, caste, education, employment, and savings. We show these visualizations for one of the villages (village 52) and observe that caste appears most strongly associated with network communities.

Figure 2: Visualization of a village social network colored by (a) community assignment (obtained via modularity maximization), (b) age (purple: 18-30 years, red: 31-40 years, green: 41-50 years, light green: 51-64 years, blue: 65 years or older), (c) sex (red: male, blue: female), (d) caste (purple: scheduled caste, green: scheduled tribe, red: OBC, cyan: general), (e) religion (red: Hinduism), (f) work flag (orange: worked last week, purple: did not work last week), (g) savings (purple: does not have a bank account, green: has a bank account), (h) education (pink: 1-9 years, blue: 10-13 years, cyan: 14-15 years, red: no education). For all panels, nodes with missing covariates are not visualized.

We calculate the normalized mutual information (NMI) coefficient across all nodes in each village between attribute values and community assignments (Table 2). Caste has a greater mean (SD) NMI value than the other attributes (caste 0.39 (0.10), sex 0.01 (0.01), age 0.08 (0.02), religion 0.08 (0.12), education 0.10 (0.03), savings 0.05 (0.03), employment 0.04 (0.02)). Across all networks, the mean (SD) NMI values for the other attributes are close to 0, showing a weak association with the community assignment (Figure 3). This supports the notion that caste is a predictor of network-wide segregation. We also observe that although the networks exhibit dyad-level assortativity for the sex attribute, as discussed above, the NMI between sex and community assignment is low, suggesting that network communities are more strongly associated with caste than with sex.

Figure 3: Normalized mutual information coefficient between community assignment of nodes and different node covariates for the 75 Karnataka networks. The attributes in the plot, corresponding to different rows, are sorted by the mean normalized mutual information coefficient across the 75 networks.

2.4 Assortativity among communities

To visualize assortativity among communities, we constructed an undirected network of communities. In this community-level network, each node corresponds to a community in the individual-level network detected using modularity maximization, and each edge is weighted by the number of ties in the individual-level network that exist between members of the two communities. To simplify visualization, we only included communities that contained at least 5% of the nodes in the individual-level network, and we only included edges between two communities when at least 5% of the possible connections among the individuals belonging to those communities were present. The resulting community networks contained on average 77% of nodes and 26% of ties present in the underlying individual-level networks.
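The construction of the community-level network can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and it assumes that the "possible connections" between two communities are counted as the product of their sizes.

```python
from collections import Counter

def community_network(edges, community, min_share=0.05, min_density=0.05):
    """Collapse an individual-level network into a community-level network.

    edges: iterable of (u, v) undirected ties
    community: dict mapping node -> community label
    Returns the kept communities and a dict of weighted community edges.
    """
    size = Counter(community.values())
    n = len(community)
    # Keep communities holding at least `min_share` of all nodes.
    kept = {c for c, s in size.items() if s / n >= min_share}
    # Count individual-level ties running between kept communities.
    between = Counter()
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu != cv and cu in kept and cv in kept:
            between[frozenset((cu, cv))] += 1
    # Keep a community edge only if enough of the possible ties exist
    # (assumption: possible ties = product of the two community sizes).
    weighted = {}
    for pair, ties in between.items():
        a, b = tuple(pair)
        if ties / (size[a] * size[b]) >= min_density:
            weighted[pair] = ties
    return kept, weighted

# Toy example: two triangles joined by one tie, plus a singleton community.
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4), (6, 7)]
toy_community = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 2}
kept, weighted = community_network(toy_edges, toy_community, min_share=0.2)
```

With `min_share=0.2` the singleton community is dropped, and the two triangles remain connected by a single community-level edge of weight 1.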

We represent the caste distribution for each community as a pie chart embedded in the node (Figure 4). Graphically, we observe that communities are mainly composed of people belonging to a single caste. This is consistent with the NMI values, which show that the network community structure can be attributed mainly to the caste of the individuals. In addition, we observe that for some villages, the communities appear to be connected to other communities with similar caste composition (Figures 4a, b).

Figure 4: Examples of two community-level networks. Each node represents a community detected in the individual-level network using modularity maximization, and each edge represents the existence of one or more ties among individuals in different communities. The community caste distribution is shown as an embedded pie chart within each node (brown: scheduled caste, red: scheduled tribe, orange: OBC, yellow: general).

We adapted the method of modularity maximization [12] to assess if caste-based segregation is also present at the level of communities and not only at the individual level, and we measured caste-based assortativity within and between communities with normalized forms of modularity (see Methods). Normalized modularity within communities is positive and higher than 0.3, with a mean (SD) of 0.37 (0.04), in 73 (97.3%) villages. This result shows that there is strong assortativity based on caste among individuals of the same communities in the majority of villages. On the other hand, normalized modularity between communities is positive for 72 (96.0%) villages, with a mean (SD) of 0.21 (0.12). In addition, for 61 (81.3%) villages the between-community modularity is positive and higher than 0.2, showing a tendency of communities to cluster in groups according to their predominant caste. For 11 (14.7%) of the villages it is close to 0, meaning that there is no segregation between communities driven by caste, and for 3 (4.0%) of the villages it is negative, showing a tendency of communities to be related to communities with a different predominant caste. The within-community and between-community modularity values for each network are available in the Supplementary Materials.

In summary, we found that, at the dyadic level, same-caste individuals are about 4 times more likely to form a tie with one another than different-caste individuals. In addition, we found that there is an association between the network community partition and the individuals' caste. In fact, our results show that same-caste individuals are more likely to form a tie within communities, and that there is a tendency for same-caste communities to connect.

3 Discussion

We studied the structure of individual-level and community-level networks in the villages of Karnataka, India. Our main finding is that while every village is relatively densely connected at the individual level, the networks are segregated at both the individual and community levels. At the dyadic level, we found strong evidence for sex-based assortativity, and in all but one village ties among males occurred more frequently than ties among females, or mixed ties involving males and females. Even as Indian society is witnessing a shift from being male-dominated [19], sex-based segregation is still evident in the social networks of the Karnataka villages. This result is consistent with recent findings on emergent inequality in social capital in India, where resources and benefits accumulate among different communities and groups of people according to sex and caste [20]. Caste continues to play a particular (gender-specific) role in shaping schooling choices of parents, increasing the mismatch in education choices and even occupational outcomes between boys and girls in the same caste [21]. From a public health perspective, however, sex-based segregation at the dyadic level does not necessarily exclude one group from the benefits of an intervention targeted at the other. In fact, it has been shown that while men and women report same-sex friendships much more frequently than mixed-sex friendships, mixed-sex ties play an important role in the spread of public health interventions in resource-poor settings [22].

We found caste to have a greater effect on segregation than any other attribute in the study, and segregation by caste occurred in 59 of the 75 villages (78.7%). This finding supports the notion that caste remains a dominant factor in the discourse on social exclusion in India [23]. For all villages, caste was associated with segregation at both the dyadic and community levels. In contrast, sex and other demographics, such as age and employment status, were not associated with network communities. Tie formation and dissolution are often correlated across dyads [24], and here we observed that those behaviors appear to occur at the group level. The effect of segregation depends on the village, and even villages that are geographically nearby can exhibit different levels of caste-based segregation. These differences across villages should be taken into account when planning regional interventions, as it has been shown that failure to consider homophily can lead to significant overestimates of the effectiveness of seeding strategies for interventions [10]. Rural villages need increased attention from public health practitioners given their isolation, vulnerability, low income, and limited access to services. Social markers of inequality are expected to be present virtually everywhere. In India, the caste of a person is an attribute that is both observable and immutable. In general, homophily may be due to multiple mechanisms, social selection and social influence being the two prominent candidate mechanisms. In our study, however, the immutability of caste excludes social influence as a possible mechanism and, furthermore, the observability of caste makes it a plausible target for social selection. These aspects of the caste system make India a compelling country for our study. Even so, the methods presented in this study could support similar analyses in contexts where it is not clear which attributes are leading to social segregation. This could help raise awareness of network structure and network effects among policymakers and practitioners, enhancing the effectiveness of public health interventions.

This study has some limitations. First, we did not take into account isolated nodes in the analysis, which represented on average only 2% of the network nodes. These nodes could be useful for learning about more extreme effects of segregation; however, a statistical analysis of the network structure showed that structural properties, such as the degree distribution and clustering coefficient, were not statistically different when the isolated nodes were removed (see Supplementary Materials). Second, attribute data were available for 16983 (25.2%) nodes and 76440 (26.2%) of connected node pairs. Because imputation techniques for network data are still in their infancy, we used complete case analysis, i.e., we only included node pairs that had no missing attributes in our models. Consequently, interpretation of model results is subject to this limitation. Finally, the original dataset only included undirected relations and six nodal attributes. It is possible that other attributes that were not measured during the original study, such as political affiliation, are associated with the observed community structure of the networks. Since essentially all network studies, ours certainly included, are observational studies rather than randomized experiments, it is not possible to identify the causes of the observed network structures. More specifically, here it is not possible to identify the causes of the observed network community structure. This is because, using the language of causal inference, all observational studies are subject to unmeasured confounders, where confounding is a bias that arises when the treatment and the outcome share a cause. We therefore stress that the observed community structure is associated with, rather than caused by, certain individual-level attributes.

The statistical model used here is a simple, scalable, and easily interpretable model that belongs to the exponential family. One could alternatively employ different types of models, such as latent space models [25], stochastic block models [26], or exponential random graph models (ERGM) [27]. We tried fitting a more complicated ERGM to our data, but unfortunately the model failed to converge. Future studies might employ a multiplex network approach using different similarity measures among layers and hence compressing part of the information. This could be useful for detecting differences in the assortativity effects given the different types of relations reported by individuals. Moreover, future studies could benefit from collecting longitudinal data on social networks in the villages and analyzing factions and disagreements from a dynamical perspective. This could add more evidence on the causes of segregation among individuals. In fact, new classes of connectivity-informed designs for cluster randomized trials for infectious diseases have been recently proposed, and the designs appear to be able to simultaneously improve public health impact and detect intervention effects [28]. Adoption of similar designs could improve social and behavioral interventions.

4 Methods

We used data, now available in the public domain, collected in 75 villages in Karnataka, India, in a study conducted by the Abdul Latif Jameel Poverty Action Lab in 2006 for the Diffusion of Microfinance study published by Banerjee et al. in 2013 [4]. The villages were chosen by Bharatha Swamukti Samsthe (BSS), an organization that operates a conventional group-based microcredit program in India. BSS provided the authors with a list of 75 villages in which they were planning to start operations. Prior to BSS’s entry, these villages had almost no exposure to microfinance institutions and limited access to any type of formal credit. The data contain information about household-level attributes (e.g., roofing material, type of latrine, quality of access to electric power) and individual-level attributes (e.g., age, sex, religion). The individual-level data were collected in a survey administered only to households that had at least one female aged 18–50 living in the household (about 46% of the households did). The survey was administered to the head of the household, the spouse of the household head, and to other adult women and their spouses if these women were available for the survey. For non-Hindu households, the survey was administered only if the group represented a minority group in the village, whereas for the Hindu households the survey was randomly administered to 50% of the households [4].

We make use of the demographic and social network data collected from surveyed individuals. Demographics included data on sex, age, religion (Hinduism, Islam, Christianity), caste (Scheduled Caste, Scheduled Tribe, Other Backward Class and General), level of education (years), an indicator for whether the individual worked during the week preceding the survey, and an indicator for whether the individual had a bank account. The social network data included the names of people (1) who visit the respondent’s home, (2) whose homes the respondent visits, and (3) who are the respondent’s kin in the village; it also had the names of people (4) who are relatives with whom the respondent socializes, (5) from whom the respondent receives medical advice, (6) from whom the respondent would borrow money, (7) those to whom the respondent would lend money, (8) those from whom the respondent would borrow material goods (kerosene, rice, etc.), (9) those to whom the respondent would lend material goods, (10) those from whom the respondent gets advice, (11) those to whom the respondent gives advice, and (12) those with whom the respondent goes to pray (at a temple, church, or mosque). Respondents could nominate individuals who did not answer the survey and, consequently, only 16983 (25.2%) individuals and 76440 (26.2%) of connected node pairs have demographic information available.

4.1 Dyad-level Assortativity

We constructed a simple statistical model to predict the existence of a tie between a pair of nodes based on the sex, age, caste, religion, education, and indicator variables for employment and savings of the individuals. We modeled the binary status of each dyad (tie exists vs. tie does not exist) using logistic regression where similarities of nodal attributes across the dyad were used as predictors. In other words, we considered all node pairs (dyads) in the network, connected or not, and we regressed the binary status of each dyad (0 = not connected, 1 = connected) on the similarity of the corresponding nodal attributes. For continuous attributes, the similarity of two attributes was defined as the difference in their values; for discrete attributes, regardless of the number of categories, the similarity was taken to be 0 if the attributes did not match and 1 if they matched. This model is specified as

$$\log \frac{P(A_{ij} = 1 \mid X)}{1 - P(A_{ij} = 1 \mid X)} = \beta_0 + \sum_{a \in \mathcal{A}} \beta_a X^{a}_{ij} \tag{1}$$

where $A_{ij}$ is 1 if there is a tie between nodes $i$ and $j$, otherwise it is 0. We denote with $\mathcal{A}$ the collection of all attributes, and for a given attribute $a$, such as age, $X^{a}_{ij}$ denotes the value of the similarity predictor (the binary match indicator in the case of discrete attributes) for nodes $i$ and $j$. We use $X$ to denote all model predictors. We estimate this model separately for the LCC of each network, obtaining an estimate and standard error of the regression coefficient $\beta_a$ for attribute $a$ in each village. Exponentiated coefficients can be interpreted as odds ratios such that a unit difference in predictor $X^{a}_{ij}$ corresponds to a multiplicative change of $e^{\beta_a}$ in the odds.
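To make the odds-ratio interpretation concrete, the sketch below builds the dyad table for a single binary "same attribute" predictor. For a logistic regression with one binary predictor and an intercept, the maximum-likelihood odds ratio equals the cross-product ratio of the 2x2 table of (match, tie) counts, so no model fitting is needed here. This is a simplified illustration with hypothetical names and toy data, not the multi-predictor model used in the paper.

```python
from itertools import combinations

def same_attribute_odds_ratio(nodes, edges, attr):
    """Odds ratio of a tie for matching vs. non-matching dyads.

    Considers every node pair, connected or not. Assumes all four cells
    of the 2x2 table are non-empty (otherwise a division by zero occurs).
    """
    edge_set = {frozenset(e) for e in edges}
    # counts[match][tie] = number of dyads in that cell
    counts = {True: {True: 0, False: 0}, False: {True: 0, False: 0}}
    for i, j in combinations(nodes, 2):
        match = attr[i] == attr[j]
        tie = frozenset((i, j)) in edge_set
        counts[match][tie] += 1
    odds_match = counts[True][True] / counts[True][False]
    odds_differ = counts[False][True] / counts[False][False]
    return odds_match / odds_differ

# Toy village: two castes of three people, one cross-caste tie.
toy_nodes = [1, 2, 3, 4, 5, 6]
toy_caste = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
toy_edges = [(1, 2), (2, 3), (4, 5), (5, 6), (3, 4)]
odds_ratio = same_attribute_odds_ratio(toy_nodes, toy_edges, toy_caste)
```

In the toy data, 4 of the 6 same-caste dyads and 1 of the 9 different-caste dyads are connected, giving an odds ratio of (4/2) / (1/8) = 16.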

Since sex-based homophily is common and the logistic model results for dyad-level assortativity by sex do not reveal whether the effect is attributable to male-male or female-female relations, we constructed a mean-degree-constrained null model for determining whether male-male, female-female and female-male relations were associated with preferential tie formation among individuals. We built the degree-constrained null model by keeping the structure of the network fixed and reassigning the sex attribute of the nodes by a random permutation. We repeated this resampling process 1000 times. Ideally, one would like the null model to preserve the correlation between local network properties, like degree, and the value of the nodal attribute. The reason this is important is that the number of, say, male-male ties expected under the null model should clearly depend on the number of ties males have as a group. If males have more ties than females, say, an unconstrained permutation of nodal attributes (male, female) will not preserve this observed feature of the data and will lead to a biased comparison between observed and expected tie counts. Here we have chosen to preserve, to a reasonable extent, the mean degree of males and females. We accept any given realization of the null model if the mean degrees of males and females in the permuted network are within 5% of their values in the empirical network. For each such valid simulated network, we count the number of male-male, male-female and female-female ties. Then, we calculate the ratio of observed tie counts to simulated tie counts. After completing 1000 successful simulations, we perform a two-sided test with a Fisher p-value for non-symmetrical distributions under the null hypothesis of no sex-based homophily. A p-value lower than the chosen significance level is interpreted as greater-than-by-chance assortativity.
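The permutation procedure can be sketched as follows. The network is held fixed, sex labels are shuffled, and a permutation is accepted only if the mean degrees of males and females stay within a tolerance of their observed values. The function names, the `max_tries` safeguard, and the toy network are hypothetical; the final line uses a simple one-sided empirical proportion as a stand-in, not the two-sided Fisher p-value described above.

```python
import random

def tie_counts(edges, sex):
    """Count male-male, male-female, and female-female ties."""
    counts = {"MM": 0, "MF": 0, "FF": 0}
    for u, v in edges:
        pair = sex[u] + sex[v]
        counts["MF" if pair in ("MF", "FM") else pair] += 1
    return counts

def mean_degree_by_sex(degree, sex):
    """Mean degree of male and female nodes under a given labelling."""
    means = {}
    for s in ("M", "F"):
        degs = [degree[v] for v in degree if sex[v] == s]
        means[s] = sum(degs) / len(degs)
    return means

def constrained_null(edges, sex, n_sims=1000, tol=0.05, seed=0,
                     max_tries=100_000):
    """Tie counts from accepted permutations of the sex labels."""
    rng = random.Random(seed)
    nodes = sorted(sex)
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    target = mean_degree_by_sex(degree, sex)
    labels = [sex[v] for v in nodes]
    sims, tries = [], 0
    while len(sims) < n_sims and tries < max_tries:
        tries += 1
        rng.shuffle(labels)  # permute labels, network stays fixed
        permuted = dict(zip(nodes, labels))
        means = mean_degree_by_sex(degree, permuted)
        # Accept only if both mean degrees are within `tol` of observed.
        if all(abs(means[s] - target[s]) <= tol * target[s]
               for s in ("M", "F")):
            sims.append(tie_counts(edges, permuted))
    return sims

# Toy network: a cube graph (every node has degree 3), so every
# permutation satisfies the mean-degree constraint.
cube_edges = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6), (6, 7), (7, 8),
              (8, 5), (1, 5), (2, 6), (3, 7), (4, 8)]
cube_sex = {1: "M", 2: "M", 3: "M", 4: "M", 5: "F", 6: "F", 7: "F", 8: "F"}
observed = tie_counts(cube_edges, cube_sex)
sims = constrained_null(cube_edges, cube_sex, n_sims=200)
# Share of simulations with at least as many male-male ties as observed.
p_mm = sum(s["MM"] >= observed["MM"] for s in sims) / len(sims)
```

On real village networks degrees are heterogeneous, so many permutations are rejected; the `max_tries` cap only prevents an endless loop when the tolerance is too strict.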

Demographics are not available for all individuals in the LCCs of the networks, thus we only consider the individuals in the LCC that have data. In the simulated network, males and females may not conserve the structural properties of the original LCC subgraph if we permute the sex attributes among all the LCC nodes (including those with missing attributes). The extent to which this happens can be assessed by comparing the mean degrees of nodes having observed sex attributes with nodes having missing sex attributes. By applying Student’s t-test, we observed that structural properties of nodes with missing sex attributes are statistically different from the nodes with observed sex attributes (see Supplementary Materials).

4.2 Community-level Assortativity

We assess community-level assortativity by first detecting network communities [14, 15, 16] and then investigating the extent to which network communities share nodal attribute values. We apply modularity maximization [12, 17] to detect communities in the LCC of each network using the so-called Louvain heuristic for maximizing modularity [18]. To quantify the association between community membership of nodes ($C$) and nodal attributes ($X$), we compute the normalized mutual information (NMI) [29], which rescales the mutual information $I(C; X)$ (a non-linear measure of association) between $C$ and $X$ using the entropy $H(C)$ of $C$ and the entropy $H(X)$ of $X$, where $H(X) = -\sum_{x} p(x) \log p(x)$. NMI is a continuous measure ranging from 0 to 1: a value of 0 means that the distribution of the nodal attribute carries no information about the community memberships of nodes, whereas a value of 1 means that the node attribute can be mapped to community memberships (Supplementary Materials). Finally, we measure the variability of NMI across the villages in order to see to what extent this association varies across the villages.
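A small sketch of the NMI computation follows. Several normalizations of mutual information are in common use; this example assumes the form 2 I(C; X) / (H(C) + H(X)), which may differ from the one used in [29], and the function names are hypothetical.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy of a list of labels (natural logarithm)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two equally long label lists."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def nmi(a, b):
    """NMI under the assumed normalization 2 I / (H(a) + H(b))."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 0.0  # both labelings constant: undefined, treated as 0 here
    return 2 * mutual_information(a, b) / (ha + hb)

# Community labels versus two toy attributes for four nodes.
communities = [0, 0, 1, 1]
caste = ["A", "A", "B", "B"]  # perfectly aligned with communities
sex = ["M", "F", "M", "F"]    # independent of communities
nmi_caste = nmi(communities, caste)
nmi_sex = nmi(communities, sex)
```

The aligned attribute yields an NMI of 1 and the independent attribute an NMI of 0, matching the two extremes described in the text.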

4.2.1 Assortativity within and between Communities

For a given village, we consider the undirected LCC with $n$ nodes and $m$ edges, where each node has been assigned to a single community. Modularity enables us to quantify assortativity, i.e., the extent to which edges connect nodes of the same type (same attribute value) compared to what we would expect by chance [12]:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(x_i, x_j). \tag{3}$$

Here $A$ is the network adjacency matrix, $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$, $x_i$ and $x_j$ are the attribute values of nodes $i$ and $j$, respectively, and $\delta(x_i, x_j)$ is the Kronecker delta, which is equal to 1 if $x_i = x_j$ and 0 otherwise. The modularity takes on positive values if there are more edges between nodes sharing an attribute value than would be expected by chance, and negative values if there are fewer such edges [17]. Given that modularity maximization assigns every node to a single community, the number of edges $m$ in a given network can be decomposed into two parts, the number of edges between communities, $m_b$, and the number of edges within communities, $m_w$, yielding $m = m_w + m_b$. An analogous decomposition can be made for node degrees: the degree of node $i$ is the sum of the number of edges connecting it to nodes of the same community, $k_i^{w}$, and the number of edges connecting it to nodes in other communities, $k_i^{b}$, so that $k_i = k_i^{w} + k_i^{b}$. To assess the contributions of edges within communities and edges between communities to modularity, we decompose the modularity measure in Eq. 3 into two measures, $Q_w$ and $Q_b$, by including the community assignment of nodes in Eq. 3.
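The attribute-based modularity of Eq. 3 can be computed group by group rather than pair by pair: the observed term reduces to the fraction of edges joining same-attribute nodes, and the expected term to the sum over attribute groups of the squared share of total degree held by each group. The sketch below assumes a simple undirected graph given as an edge list; the function name and the toy network are hypothetical.

```python
from collections import defaultdict

def attribute_modularity(edges, attr):
    """Modularity of a nodal attribute on an undirected simple graph."""
    m = len(edges)
    group_degree = defaultdict(int)   # total degree held by each attribute value
    same = 0                          # edges whose endpoints share the attribute
    for u, v in edges:
        group_degree[attr[u]] += 1
        group_degree[attr[v]] += 1
        same += attr[u] == attr[v]
    expected = sum((k / (2 * m)) ** 2 for k in group_degree.values())
    return same / m - expected

# Hypothetical toy network: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
caste = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

q = attribute_modularity(edges, caste)
```

Here six of the seven edges join nodes of the same caste and each caste holds half of the total degree, so the result is 6/7 minus 1/2.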

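Let $c_i$ represent the community assignment of node $i$, taking values in the integers $1, \dots, n_c$, where $n_c$ is the number of communities detected in the given network. We define the modularity measure for assessing assortativity based on an attribute of nodes within communities as:

$$Q_w = \frac{1}{2 m_w} \sum_{i,j} \left[ A_{ij} - \frac{k_i^{w} k_j^{w}}{2 m_w} \right] \delta(x_i, x_j)\, \delta(c_i, c_j), \tag{4}$$

where $m_w$ is the number of edges within communities, $k_i^{w}$ is the within-community degree of node $i$, and $x_i$ is its attribute value. A positive value of $Q_w$ for a given network means that there are more edges between nodes belonging to the same community and having the same attribute value than we expect by chance.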

Another way to assess assortativity is to investigate whether there are more edges between nodes belonging to different communities and having the same attribute value than we expect by chance. This measure is similar to Eq. 4, but instead of the number of edges that connect nodes within communities, $m_w$, and the within-community degree $k_i^{w}$, we use the number of edges that connect nodes across communities, $m_b$, and the outside-community degree $k_i^{b}$, which counts the edges connecting node $i$ to nodes in other communities. The modularity between communities is given by:

$$Q_b = \frac{1}{2 m_b} \sum_{i,j} \left[ A_{ij} - \frac{k_i^{b} k_j^{b}}{2 m_b} \right] \delta(x_i, x_j) \left( 1 - \delta(c_i, c_j) \right), \tag{5}$$

where the last factor of Eq. 4 is replaced by the 1-complement of the Kronecker delta so that only edges connecting nodes in different communities are taken into account. A positive value of $Q_b$ for a given network means that there are more edges between nodes belonging to different communities and having the same attribute value than we would expect by chance.
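The within- and between-community measures can be sketched as a direct double sum over node pairs. The version below assumes that each measure is scaled by its own edge count (within-community or between-community edges, respectively) and uses the corresponding within- or outside-community degrees in the chance term, as described in the text. The function name and the toy network are hypothetical, and the quadratic loop is written for clarity rather than speed.

```python
def split_modularity(edges, attr, comm):
    """Return (within, between) community attribute modularity."""
    nodes = list(attr)
    adjacent = set()
    k_w = dict.fromkeys(nodes, 0)   # within-community degree
    k_b = dict.fromkeys(nodes, 0)   # outside-community degree
    m_w = m_b = 0
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
        if comm[u] == comm[v]:
            m_w += 1
            k_w[u] += 1
            k_w[v] += 1
        else:
            m_b += 1
            k_b[u] += 1
            k_b[v] += 1
    if m_w == 0 or m_b == 0:
        raise ValueError("need at least one within and one between edge")
    q_w = q_b = 0.0
    for i in nodes:
        for j in nodes:
            if attr[i] != attr[j]:
                continue                      # Kronecker delta on the attribute
            a_ij = 1 if (i, j) in adjacent else 0
            if comm[i] == comm[j]:            # same community: within term
                q_w += a_ij - k_w[i] * k_w[j] / (2 * m_w)
            else:                             # different communities: between term
                q_b += a_ij - k_b[i] * k_b[j] / (2 * m_b)
    return q_w / (2 * m_w), q_b / (2 * m_b)

# Hypothetical toy network: two triangles (the communities) joined by edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
comm = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
caste = {0: "a", 1: "a", 2: "b", 3: "b", 4: "a", 5: "a"}

q_within, q_between = split_modularity(edges, caste, comm)
```

In this example the only between-community edge joins two nodes of caste "b", so the between-community value is clearly positive, while the within-community value is close to zero.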

Across the networks, the maximum value that modularity can attain depends on the sizes of the groups and the degrees of the nodes in each network. To obtain comparable results for the different villages, we normalized the within- and between-community modularities by dividing $Q_w$ and $Q_b$ by the values they would take in a perfectly assortative network [30].
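For the standard attribute modularity, the value attained by a perfectly assortative network is one minus the chance term, and dividing by it gives the assortativity coefficient described in [30]. The sketch below illustrates this normalization for the standard case only; applying the analogous normalization to the within- and between-community measures is an assumption here, since the corresponding formulas are not reproduced in this section. The function name and toy networks are hypothetical.

```python
from collections import defaultdict

def normalized_modularity(edges, attr):
    """Attribute modularity divided by its value under perfect assortativity."""
    m = len(edges)
    group_degree = defaultdict(int)
    same = 0
    for u, v in edges:
        group_degree[attr[u]] += 1
        group_degree[attr[v]] += 1
        same += attr[u] == attr[v]
    expected = sum((k / (2 * m)) ** 2 for k in group_degree.values())
    q = same / m - expected
    q_max = 1 - expected              # every edge joins same-attribute nodes
    if q_max == 0:
        raise ValueError("normalization undefined when all nodes share one value")
    return q / q_max

# Hypothetical toy networks: two triangles, with and without a bridging edge.
caste = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
bridged = triangles + [(2, 3)]

r_perfect = normalized_modularity(triangles, caste)
r_bridged = normalized_modularity(bridged, caste)
```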



We are grateful to R. Zarama, O. L. Sarmiento, and the Onnela Lab members for their help at various stages. We acknowledge the Abdul Latif Jameel Poverty Action Lab, which has generously placed the data for this research in the public domain.


  • [1] Damon Centola. The spread of behavior in an online social network experiment. Science, 329(5996):1194–1197, 2010.
  • [2] Nicholas A Christakis and James H Fowler. The spread of obesity in a large social network over 32 years. The New England Journal of Medicine, 357(4):370–379, 2007.
  • [3] Thomas W Valente. Social networks and health: Models, methods, and applications. Oxford University Press, 2010.
  • [4] Abhijit Banerjee, Arun G Chandrasekhar, Esther Duflo, and Matthew O Jackson. The diffusion of microfinance. Science, 341(6144):1236498, 2013.
  • [5] Nicholas A Christakis and James H Fowler. Social network sensors for early detection of contagious outbreaks. PLoS One, 5(9):e12948, 2010.
  • [6] Ruth F Hunter, Helen McAneney, Michael Davis, Mark A Tully, Thomas W Valente, and Frank Kee. “Hidden” social networks in behavior change interventions. American Journal of Public Health, 105(3):513–516, 2015.
  • [7] David A Kim, Alison R Hwong, Derek Stafford, D Alex Hughes, A James O’Malley, James H Fowler, and Nicholas A Christakis. Social network targeting to maximise population behaviour change: a cluster randomised controlled trial. The Lancet, 2015.
  • [8] Damon Centola. An experimental study of homophily in the adoption of health behavior. Science, 334(6060):1269–1272, 2011.
  • [9] Javier Borge-Holthoefer, Raquel A. Baños, Sandra González-Bailón, and Yamir Moreno. Cascading behaviour in complex socio-technical networks. Journal of Complex Networks, 1(1):3–24, 2013.
  • [10] Sinan Aral, Lev Muchnik, and Arun Sundararajan. Engineering social contagions: Optimal network seeding in the presence of homophily. Network Science, 1(02):125–153, 2013.
  • [11] G D Berreman. Race, caste, and other invidious distinctions in social stratification. Race & Class, 13(4):385–414, 1972.
  • [12] M E J Newman. Mixing patterns in networks. Physical Review E, 67(2):026126, 2003.
  • [13] Rogier Noldus and Piet Van Mieghem. Assortativity in complex networks. Journal of Complex Networks, 3(4):507–542, 2014.
  • [14] J M Kumpula, J P Onnela, J Saramaki, K Kaski, and J Kertesz. Emergence of communities in weighted networks. Physical Review Letters, 99:228701, 2007.
  • [15] S Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010.
  • [16] Mason A. Porter, Jukka-Pekka Onnela, and Peter J. Mucha. Communities in networks. Notices of the AMS, 56(9):1082–1097, 2009.
  • [17] M Girvan and M E J Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.
  • [18] V D Blondel, J L Guillaume, R Lambiotte, and E Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008.
  • [19] Gurvinder Kalra and Dinesh Bhugra. Sexual violence against women: Understanding cross-cultural intersections. Indian Journal of Psychiatry, 55(3):244–249, 2013.
  • [20] Paromita Sanyal. Group-based Microcredit & Emergent Inequality in Social Capital: Why Socio-religious Composition Matters. Qualitative Sociology, 38(2):103–137, 2015.
  • [21] Kaivan Munshi and Mark Rosenzweig. Traditional institutions meet the modern world: Caste, gender, and schooling choice in a globalizing economy. American Economic Review, 96(4):1225–1252, 2006.
  • [22] Alison R Hwong, Jukka-Pekka Onnela, David A Kim, Derek Stafford, D Alex Hughes, and Nicholas A Christakis. Social ties and health: an analysis of patient-doctor trust and network-based public health interventions through randomized experiments and simulations. Doctoral Dissertation, Harvard University, 2016.
  • [23] Rajan R Patil. Caste-, work-, and descent-based discrimination as a determinant of health in social epidemiology. Social Work in Public Health, 29(4):342–349, 2014.
  • [24] Adam Douglas Henry, Pawel Pralat, and Cun-Quan Zhang. Emergence of segregation in evolving social networks. Proceedings of the National Academy of Sciences, 108(21):8605–8610, 2011.
  • [25] Peter D. Hoff, Adrian E. Raftery, and Mark S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098, 12 2002.
  • [26] Y. J. Wang and G. Y. Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82:8–19, 1987.
  • [27] Garry Robins, Pip Pattison, Yuval Kalish, and Dean Lusher. An introduction to exponential random graph (p*) models for social networks. Social Networks, 29(2):173 – 191, 2007. Special Section: Advances in Exponential Random Graph (p*) Models.
  • [28] Guy Harling, Rui Wang, Jukka-Pekka Onnela, and Victor DeGruttola. Leveraging contact network structure in the design of cluster randomized trials. Harvard University Biostatistics Working Paper Series, 2016.
  • [29] A F McDaid, D Greene, and N Hurley. Normalized mutual information to evaluate overlapping community finding algorithms. arXiv preprint arXiv:1110.2515, 2011.
  • [30] Mark Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010.