1 Introduction
Networks are an intuitive and a powerful way to describe interactions among individuals in many fields of application. In social sciences, for example, network structures describe concisely the observed relationships among people, tribes, social media accounts and so forth. A recent review about statistical methods and models used in this research area can be found in Kolaczyk (2009). Most of the literature on modelling network data can be grouped into three main branches, with some natural overlapping between the categories: stochastic block models, exponential random graph models, latent space models. Stochastic block models (SBMs) date back to the work of Holland, Laskey and Leinhardt (1983), where the idea of modeling partitions of the network, called blocks or communities, was first introduced. Since then, numerous extensions, such as mixed memberships and dynamic networks, have been proposed (Wang and Wong, 1987; Nowicki and Snijders, 2001; Airoldi et al., 2008; Xing et al., 2010). Another way to summarize a network structure is to model the amount of substructures, in a graphical and topological sense, comprising the network itself. This approach has been formulated as the exponential random graph model in the early work of Frank and Strauss (1986); see also Wasserman and Pattison (1996) and Robins et al. (2007)
for a review of some recent developments. Finally, the last framework deals with individuals in the network and their relations by projecting them into a latent space, where the probability of interaction between units is modeled based on their distance in this nonobservable representation
(Hoff, Raftery and Handcock, 2002). Recent extensions of this model allow incorporating more complex features of the data, such as dynamic evolution (Handcock, Raftery and Tantrum, 2007; Raftery et al., 2012; Durante and Dunson, 2014; Sewell et al., 2017).The approaches mentioned above can be used on network data where all nodes, or actors, are of the same nature. Some network data, however, are provided in the form of attendances of individuals, actors, to events. These data are also called twomode networks, bipartite graphs, or affiliation networks (Wasserman and Faust, 1994, Chapter 8). Examples of these networks include: people visiting movies, nations belonging to alliances and cosponsorships of legislative bills; see Doreian, Batagelj and Ferligoj (2004) for references. A wellknown twomode network collected by Davis et al. (1941) concerns a group of 18 women in the Southern US attending sociopolitical events, which has been subject of a metaanalysis by Freeman (2002) comparing 21 different methods.
In the literature, there are only a few models that deal directly with this actorevent organization of the network, such as Aitkin, Vu and Francis (2017), who proposed a Rasch model approach. In most cases transformation procedures are used to change actorevent data to actoractor data. A recent example is Signorelli and Wit (2017), who provide a penalized approach for network data representing cosponsorships of legislative bills in the Italian Parliament. Transforming the data has the inherent drawbacks of information loss (Neal, 2014). In addition, in many situations, it is of prime interest to identify clusters, communities, of individuals within the network according to their preferences to attend specific events instead of being based on how they interact with each other. Parallel to SBMs for actoractor data, there is then the need of a clustering model for actorevent data, whereby an actor (unit) is allocated into a community (cluster) based on the probability of attendance to the various events. Differently to SMBs, however, we expect these communities to potentially overlap with each other. Thus, this paper proposes a mixture model formulation that accommodates this assumption and that can be applied directly to actorevent data.
2 Motivating example: Noordin Top terrorist network
In this paper we consider the Noordin Top terrorist network dataset, which contains information about 79 terrorists and their activities in Indonesia and nearby areas, covering the period from 20012010 (Everton, 2012; Aitkin, Vu and Francis, 2017). The network revolved around Noordin Mohammad Top, also known as ‘Moneyman’, his main collaborator Azahari Husin, and their affiliates. Data were periodically collected by the International Crisis Group (2009) in an exhaustive qualitative format. Information was later summarized by Everton (2012) into relationships between terrorists, attendances to events and individual data on each terrorist, such as level of education, nationality, etc. The twomode actorevent network model focuses on the recorded attendances of the 79 terrorists to the 45 events. These could be meetings of the Noordin Top network, actual bombings and attacks, or trainings and other operational situations. In particular, we have: eight organizational meeting (ORG), five operations, i.e. bombings (OPER), eleven training events (TRAIN), two financial meetings (FIN), seven logistics meetings (LOGST) and twelve events generically categorized as ‘meetings’ (MEET).
One salient feature of the network is its sparsity, as can be seen in Figure 1a. Figure 1b shows how there are some terrorists and events capitalizing most of the connections. It is believed that a network of terrorists often operates by communities within the networks itself, whereby the individual terrorists are organized according to their role and contribution to the different activities of the whole group. More importantly, it is likely individuals do not belong to a single community, but to more substructures in the network. The aim of this paper is to develop a model which can identify how terrorists organize to form such structures.
3 Model formulation
The driving idea is to use a modelbased clustering approach to identify clusters of terrorists (actors) within the network, based on their attendances to events of different nature (bombings, trainings, financial meetings and so forth), by allowing for these communities to be potentially overlapped. We name the proposed model multiple allocation model for network data (manet).
3.1 Traditional modelbased clustering with finite mixture model
Data are organized in an matrix of observations , pertaining to individuals and their attendances to events. Each element
is a binary random variable, with
if subject attends event . We assume there exist subpopulations of individuals with cluster proportions. In the traditional setting, where clusters are mutually exclusive, this vector satisfies the conditions (i)
, for each , and (ii) (Aitkin, Vu and Francis, 2017). The task is to group together units sharing the same preferential attendance to theevents. Given the binary nature of response variables
and assuming independence, the marginal density of an observed attendance profile can be represented by , with the attendance profile of the th individual to the events and cluster specific parameters for the probability of attendance, , collected in . A hierarchical representation is available after introducing a unitspecific latent variable : if unit belongs to cluster , the vector is full of zeros except for the th element , so and , leading to the equivalent hierarchical conditional representationFor each individual , the model assumes the attendances to events and to be independent from one another, for all and .
3.2 Multiple allocation model for network data (manet)
In many cases, one is interested in groups that are not mutually exclusive, allowing an actor to be allocated simultaneously to potentially more than a single cluster of the mixture model. This problem has been addressed in the statistical literature by mixture models with overlapping clusters (Ranciati, Viroli and Wit, 2017). In order to cluster actorevent data by allowing possible overlaps, we relax conditions (ii) on the proportions and the condition regarding the allocation vector, for each . Each individual will be allowed to belong to any number of the classes. The number of all possible group membership configurations is equal to . Instead of working with the latent variables , we define a new dimensional allocation vector that satisfies . We can establish a 1to1 correspondence between and . In general, we introduce a binary matrix , with , with denoting the th row of .
For example, when individual may be assigned to the first cluster, , the second cluster , both of them or none and we have
We can now switch from a mixture model with overlapping parent clusters to a finite mixture of nonoverlapping heir clusters. Given our new assumptions on the proportions of the parent mixture model, the model formulation changes to
where now and are the attendance probabilities for the events for units whose distribution function is given by the nonoverlapping cluster . We specify a conjugate Dirichlet distribution for the proportions , that is . From we can always compute back the overlapping proportions with .
In order for the overlapping mixture model to have any use and purpose, the original parent cluster parameters should affect the heir cluster parameters. In particular, the probability for heir cluster of attending event should depend on the parameters of the parent clusters involved in the formation of heir cluster . This can be done in a number of ways, which is described more in detail in the next paragraph.
Linking parent and heir cluster parameters
We define the probability to attend event when belonging to heir cluster through a function , so that we can compute by looking at which parent clusters originated , through the vector , and combining their corresponding probabilities . By changing the definition of one can give different interpretations of the probabilities of attending events when belonging to a multiple allocation cluster.
In this paper we consider the minimum function, whereby we set the probability of attendance for the empty cluster to zero,
For the simple case that , an individual belonging to both clusters, , deciding whether to attend an event or not, will do so by following the lowest ‘preference’ for that specific event, that is . From a Venn diagram perspective, we are implying multiple allocation heir clusters to be intersections of the parent clusters originating them, intersections that are however ‘smaller’ than the parent clusters themselves. In addition, under this combining function , individuals attending few events will tend to be allocated into multiple allocation heir clusters.
It is worth noting that other choices are possible. For example, setting will tend to allocate units with a high number of attendances into multiple allocation clusters. In this case, the set intersections defined by will usually be ‘bigger’ than the parent clusters originating them. Finally, while we pay the price of increasing the number of proportions from to , these new quantities are not additional parameters and they can be computed from the parent parameters without increasing the parameter space’s dimensionality.
3.3 Bayesian inference
The updated hierarchical formulation of nonoverlapping mixture is given by
Following this structure, the joint complete data likelihood of the nonoverlapping clusters model is
where and the product involves only units allocated to cluster . The second term, , is a function of the parameters through the computed quantities . In order to devise a Gibbs sampler for , we consider the equivalent representation for the overlappingclusters mixture, as a function of the original parent parameters, that is . The first term is equivalent in both parametrization thanks to the 1to1 correspondence between and , and the computability of from . We focus now on the second term of the factorization, , as it is not immediately straightforward to define an equivalence. We introduce a new quantity , whereby if . Whereas, if and if we use the minimum operator, i.e. , then is a dimensional vector of zeros except for , with denoting the cluster with the lowest value among all the parameters for a fixed event . In other words: if a unit belongs to only one cluster (let’s say, ) it will fully contribute to the posterior of the corresponding ; but, if the unit is allocated into more than one group its contribution will be given only to the lowest parameter among all the relevant attendance probabilities for that th event. This definition is compatible with the minimum operator . For other operators, one needs to consider other solutions.
This leads to a convenient factorization of the complete data likelihood of the mixture in the space:
A sketch of our sampling scheme is the following. For each unit and heir cluster
, we compute the posterior probabilities of allocation according to
and we sample new latent allocation values for . The proportions are updated through the corresponding full conditional distribution, . Thanks to the priorlikelihood conjugacy, each of the are updated via a Gibbs sampler with
We implement all the samplers in an MCMC algorithm. The latter is also part of the R package manet, available on CRAN.
3.4 Selecting the number of clusters and criterion to allocate units
We select the Deviance Information Criterion (Spiegelhalter et al., 2002, DIC) as the model selection criterion. In the DIC, two quantities are balanced, namely the goodnessoffit and the complexity of the model. In this paper, we rely on the version DIC proposed in Celeux et al. (2006), as the original version does not deal properly with latent variables:
where both terms can be computed starting from the values sampled at each iteration of the MCMC algorithm. In particular,
and
In a set of competing models, differing from one another only by , we select the one with the lowest associated DIC value.
After the choice of and, implicitly, , units are allocated into clusters according to their average posterior probabilities and the maximumaposteriori (MAP) rule: that is, individual will be assigned to cluster showing the highest value for , computed after initial burnin window. This postprocessing of the posterior helps the interpretation of the results.
3.5 Quantifying clustering uncertainty
As a measure of uncertainty about the clustering provided by the algorithm, we define a quantity called posterior confusion matrix (PCM), whose entry PCM
stands for the average number of actor with maximum posterior allocation that will be allocated to . The PCM is a nonsymmetrical matrix computed as follows. For each MCMC iteration and summed across all units , we do the following steps:
Order the posterior probabilities from highest to lowest, and collect them in a vector ;

Define as the vector of cluster labels associated to , so that is the label of the cluster with highest posterior probability (which is ) for unit at iteration among all the possible ones;

Add posterior probability to the PCM at position , so that the diagonal element of the matrix account for the first choice of allocation of unit at iteration ;

While keeping row fixed as a pivotal quantity of this step, add the remaining probabilities to the corresponding positions in the PCM matrix .
To average the cumulative sums at each position of the matrix, we divide the PCM by the total number of MCMC iterations
. The nonrescaled version of the matrix has row sums equal to the number of units in each corresponding cluster. When rescaled by these row sums, the benchmark matrix for comparison is the identity matrix of order
, corresponding to a situation with no uncertainty in the classification.4 Simulation study
In our simulation study we compare the following algorithms: (i) the proposed model, manet
, which uses a finite mixture of Bernoulli distributions with overlapping components (as implemented in the package
manet); (ii) a finite mixture model of Bernoulli distribution with nonoverlapping components, named mixtbern, (iii) a variational method implementing the MixNet model of Daudin, Picard and Robin (2008), implemented in the R package mixer, which is a special case of the binary SBM proposed by Nowicki and Snijders (2001) and (iv) blockmodels, proposed by Leger (2015).4.1 Data generating process
We generate data according to our model with varying values for the number of actors , the number of events and the number of clusters . We consider (i.e. ) and set the components weights to be . We set the probabilities of attendances for the first event equal to and we define the remaining vectors to be all the possible permutations of the values in , by stacking the same values a number of times depending on the value of chosen.
Since blockmodels and mixer only work on actoractor data, for these two methods we transform the data to this structure by calculating the number of events attended by any two actors. This is sufficient for blockmodels, which accounts for weighted edges. Since mixer requires a binary input, we further dichotomize the network by setting a cutoff on the number of events. We select the threshold hat leads to the best results for each of the methods.
4.2 Classification performance
First we evaluate the performance of the methods in terms of classification ability of the actors in the 8 heir clusters. For this simulation, we set and consider three possible values for the number of events, namely . For each of the three values of , we generate 25 independent datasets. For this simulation, we present the true number of clusters to the algorithms, i.e. for our model or
for the competitors. To measure the performance of the four models we apply the MAP rule to the estimated probabilities of allocation and we cluster units accordingly. After the classification is performed, we compute the average misclassification error rate and the adjusted Rand index
(Rand, 1971) for each of the four models across the 25 replicates. The misclassification error rate measures the fraction of units wrongly allocated with respect to the true allocations used to generated the data, whereas the adjusted Rand index (ARI) is a measure between 0 and 1 representing similarity between two different clustering, where we take one of the two to be the true allocation pattern in the data.Table 1 reports the results of this simulation. In each subgroup defined by the value of , our model achieves simultaneously lower (better) average misclassification error rate and higher (better) average adjusted Rand index with respect to the other competitors. The closest in terms of performance is mixtbern, which however exhibits less stability. It is worth noticing that as the number of events, , increases so does the performance improvement in the classification task: this is true for all the models except mixer. The loss of performance for models blockmodels and mixer is partially expected due to the loss of information after transformation of the data into a onemode network.
Misclassification error rate (in %)  

Num. of events  actoractor  actorevent  
blockmodels  mixer  mixtbern  manet  
55.49 (3.11)  52.16 (2.23)  42.67 (5.96)  35.05 (3.99)  
43.07 (4.49)  46.89 (5.87)  20.89 (2.97)  15.33 (2.42)  
30.28 (4.76)  54.32 (7.32)  13.67 (4.14)  6.91 (1.53)  
Adjusted Rand index ()  
Num. of events  actoractor  actorevent  
blockmodels  mixer  mixtbern  manet  
0.22 (0.04)  0.15 (0.03)  0.34 (0.08)  0.45 (0.06)  
0.40 (0.06)  0.31 (0.08)  0.73 (0.05)  0.79 (0.04)  
0.60 (0.06)  0.27 (0.08)  0.85 (0.05)  0.93 (0.02) 
and four competing models; standard errors are reported between brackets. Models are categorized on the type of structure they analyze (
actoractor or actorevent); best results highlighted in bold.4.3 Convergence of parameters’ posterior distributions
In this section we focus on the convergence behavior of the posterior distributions of the attendance probabilities to the true values that produced the data. In particular, we use a fixed setting with , , letting the sample size vary as . We set the true values for the as described in Section 4.1. For each sample size, we simulate 25 replicated datasets and we collect all posterior samples (after burnin) of the same from each MCMC into one single chain. While this inevitably introduces some additional Monte Carlo error, the increased amount of available information should dampen this aggregation effect. Results are visualized in Figure 2. Rows of the plot grid correspond to events (specifically, we are reporting ) and columns are the attendance probabilities of those event for the three different primary clusters. As expected, increasing sample size (from , red curve, to , blue curve) the posterior distribution exhibits less variability, contracting around the true value, i.e., the vertical dashed line, used for the simulations. The same behavior is observed for the posterior distributions of the other and the posterior distribution of , the proportions of the mixture model (not shown).
4.4 Accuracy of model selection criterion
To show the behaviour of the DIC selection criterion discussed in Section 3.4, we simulate 25 replicated datasets with the following configuration: , , increasing sample sizes . For each dataset, we run the algorithm and provide three different values of . We compute the corresponding DIC values and select achieving the lowest one. When , we select in 80% of the replicated datasets; for the rest of the sample sizes , the DIC achieves its lowest value with in all the datasets.
5 Noordin Top terrorist network analysis
We analyze the terrorist dataset with information pertaining to terrorists (actors) and their attendance behavior to events of various nature, such as trainings, operations, bombings, financial and logistics meetings, together with their affiliations to a number of organizations, associated with the leader of the Indonesian terrorist network Noordin Top (Everton, 2012). Rather than leaving out the five lone wolf terrorists, we include them into the analysis.
We run our manet algorithm for 30,000 iterations with a generous burnin window of 15,000, to ensure convergence and in order to compute posterior quantities on samples not affected by labelswitching. The lowest computed DIC value for three possible values of corresponds to , and we therefore select parent clusters, corresponding to heir clusters.
The results are reported in Table 2. The first heir cluster, identifying units belonging to no parent cluster, contains 5 units who are the ‘lone wolves’ attending no event and discarded in Aitkin, Vu and Francis (2017) from their analysis. Only two units are allocated into the second heir cluster: these two individuals are Noordin Top and Azhari Husin, the leader and his main collaborator of the terrorists network, respectively. They form a separate cluster because of their peculiar behavior of participating to most of the 45 events, having the highest raw number of attendances, respectively 23 and 17, and being involved in many of the logistic, financial, and decisionmaking meetings. The third heir cluster is formed by 6 individuals sharing the same pattern of attendances and, in particular, being terrorists affiliated to a specific subgroup called ‘KOMPAK’. Finally, in the fourth heir cluster we find the rest of the terrorists such as trainees, henchmen, and religious leaders, who attend the 45 events with a pattern that is an overlap between the two parent clusters.
For comparison, we explore results from our direct competitor mixtbern. In both models with and , only three clusters are nonempty and the partitioning of the units into these mutually exclusive groups is as follows: 77 units into cluster 1, one unit (Noordin Top) into cluster 2, and one unit (Azhari Husin) into cluster 3. This suggests that the potential overlapping of the terrorists groups and patterns in attending events allowed by manet helps in identifying better the subgroups in the network. In addition, we can find similarities and differences with the analysis in Aitkin, Vu and Francis (2017). Firstly, in both analyses, aside from the ‘lone wolves’, data seem to point towards a 3groups structure. Secondly, while the ‘lone wolves’ are removed in the analysis of Aitkin, Vu and Francis (2017), we are able to naturally account for terrorists belonging to the network but showing no attendances to the events considered. Finally, Azhari Husin and Noordin Top are allocated together into a twounits group in both analyses, but terrorists’ memberships to the other two remaining clusters are more confused in Aitkin, Vu and Francis (2017) than with our model in terms of posterior allocations (see Figure 10 of their manuscript).
Clusters  N. of individuals  

parent cluster  heir cluster  
5  
2  
6  
66  
79 
5.1 Visualizing results and performances
Figure 4 visualizes the twomode (actorevent) Noordin Top network: red square vertexes are the events, with corresponding labeling; round vertexes are the terrorists, with a color scheme representation based on the clustering obtained with manet, and labelled with progressive numbers. Figure 3 provides a graphical representation of the posterior probabilities averaged across the MCMC iterations (after burnin). Each dot represents one of the 79 terrorists (the ‘lone wolves’ are removed for visualization purposes): lower – from left to right – axis of the ternary plot depicts the posterior probability to be allocated into a multiple allocation cluster ; similarly, the other two axes (left and right) measure the posterior probability to be allocated into cluster – top to bottom – or cluster – bottom to top. We can see almost all units bear no uncertainty about their membership to the clusters, except for two terrorists, row 25 and 55 of the matrix. In order to report the uncertainty of the classification for all the groups, we provide the (PCM) in Table 3. As we see from the table, the results are close to a situation with no confusion in the classification except for cluster . This is partially expected because the data matrix is very sparse and units in the multiple allocation cluster attend very few events. This means that the attendance profile, and the clusterspecific vector of event probabilities , for cluster and are indeed very similar, pushing the algorithm to distinguish less the two groups. However, as we saw in Table 2
, the ‘lone wolves’ are classified into cluster
, without any additional unit attending a low number of events.Rescaled PCM with ()  

Cluster  
0.66  0.00  0.00  0.34  
0.00  1.00  0.00  0.00  
0.00  0.00  0.94  0.06  
0.01  0.00  0.01  0.98 
6 Conclusions
In this paper, we have presented a novel finite mixture model and have shown its applicability to the clustering of actorevent data. We formulate the model in such as way that the actorevent data can be modeled directly without transforming it to the more traditional actoractor network data, with the inherent loss of information. The general formulation with potentially overlapping clusters allows for actors to belong to multiple communities. In particular, we have found out that the Noordin Top terrorist network, besides 5 lone wolf suicide bombers, consisted of a large group of 74 terrorists with within two distinct subgroups: the first consisting of 6 members of the KOMPAK terrorist organization and the second consisting of the 2 leaders, namely Top and Husin. This view of the terrorist network gives a more layered understanding of the mode of operation and allegiances within the organization.
We selected a Bayesian inference procedure for calculating the posterior distribution of the parameters in the model. By selecting appropriate conjugate prior distributions the MCMC sampler is efficient and convergence is typically fast. The proposed model is currently implemented in the
R package manet, available on CRAN. This Bayesian formulation easily allows extending the model with individual level covariates for group membership or event attendance probabilities.Acknowledgements
The authors would like to acknowledge the contribution of the COST Action CA15109, called COSTNET, which funded a visit of the first author to Brunel University London.
References

Airoldi et al. (2008)
[author] Airoldi, Edoardo ME. M., Blei, David MD. M., Fienberg, Stephen ES. E. Xing, Eric PE. P. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9 1981–2014.
 Aitkin, Vu and Francis (2017) [author] Aitkin, MurrayM., Vu, DuyD. Francis, BrianB. (2017). Statistical modelling of a terrorist network. Journal of the Royal Statistical Society: Series A (Statistics in Society) 180 751–768.
 Celeux et al. (2006) [author] Celeux, GillesG., Forbes, FlorenceF., Robert, Christian PC. P., Titterington, D MichaelD. M. et al. (2006). Deviance information criteria for missing data models. Bayesian analysis 1 651–673.
 Daudin, Picard and Robin (2008) [author] Daudin, JJJ.J., Picard, FranckF. Robin, StéphaneS. (2008). A mixture model for random graphs. Statistics and computing 18 173–183.
 Davis et al. (1941) [author] Davis, AllisonA., Gardner, Burleigh BB. B., Gardner, Mary RM. R. Warner, W LloydW. L. (1941). Deep South: A Sociological Anthropological Study of Caste and Class. University of Chicago Press.
 Doreian, Batagelj and Ferligoj (2004) [author] Doreian, PatrickP., Batagelj, VladimirV. Ferligoj, AnuškaA. (2004). Generalized blockmodeling of twomode network data. Social networks 26 29–53.
 Durante and Dunson (2014) [author] Durante, DanieleD. Dunson, David B.D. B. (2014). Nonparametric Bayes dynamic modelling of relational data. Biometrika 101 883. 10.1093/biomet/asu040
 Everton (2012) [author] Everton, Sean FS. F. (2012). Disrupting dark networks 34. Cambridge University Press.
 Frank and Strauss (1986) [author] Frank, OveO. Strauss, DavidD. (1986). Markov graphs. Journal of the American Statistical Association 81 832–842.
 Freeman (2002) [author] Freeman, Linton C.L. C. (2002). Finding social groups: A metaanalysis of the southern women data. In In Breiger, R., Carley, C., and Pattison, P.Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers (pp 3997), Washington D.C.: National Research Council, The National Academies. Press.
 International Crisis Group (2009) [author] International Crisis Group (2009). Indonesia: Noordin Top’s Support Base. Asia Briefing N 95, (available at: http://www.refworld.org/docid/4a968a982.html).
 Handcock, Raftery and Tantrum (2007) [author] Handcock, Mark SM. S., Raftery, Adrian EA. E. Tantrum, Jeremy MJ. M. (2007). Modelbased clustering for social networks. Journal of the Royal Statistical Society: Series A (Statistics in Society) 170 301–354.
 Hoff, Raftery and Handcock (2002) [author] Hoff, Peter DP. D., Raftery, Adrian EA. E. Handcock, Mark SM. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association 97 1090–1098.
 Holland, Laskey and Leinhardt (1983) [author] Holland, Paul WP. W., Laskey, Kathryn BlackmondK. B. Leinhardt, SamuelS. (1983). Stochastic blockmodels: First steps. Social networks 5 109–137.
 Kolaczyk (2009) [author] Kolaczyk, Eric D.E. D. (2009). Statistical Analysis of Network Data: Methods and Models. Springer, New York.
 Leger (2015) [author] Leger, JBJ. (2015). Blockmodels: Latent and Stochastic Block Model Estimation by a ?VEM?Algorithm. R package version 1.
 Neal (2014) [author] Neal, ZacharyZ. (2014). The backbone of bipartite projections: Inferring relationships from coauthorship, cosponsorship, coattendance and other cobehaviors. Social Networks 39 84–97.
 Nowicki and Snijders (2001) [author] Nowicki, KrzysztofK. Snijders, Tom A BT. A. B. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association 96 1077–1087.
 Raftery et al. (2012) [author] Raftery, Adrian EA. E., Niu, XiaoyueX., Hoff, Peter DP. D. Yeung, Ka YeeK. Y. (2012). Fast inference for the latent space network model using a casecontrol approximate likelihood. Journal of Computational and Graphical Statistics 21 901–919.
 Ranciati, Viroli and Wit (2017) [author] Ranciati, SaverioS., Viroli, CinziaC. Wit, Ernst C.E. C. (2017). Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIPSeq data. Biometrical Journal doi:10.1002/bimj.201600131. http://onlinelibrary.wiley.com/doi/10.1002/bimj.201600131/full.
 Rand (1971) [author] Rand, William MW. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association 66 846–850.
 Robins et al. (2007) [author] Robins, GarryG., Snijders, TomT., Wang, PengP., Handcock, MarkM. Pattison, PhilippaP. (2007). Recent developments in exponential random graph (p*) models for social networks. Social networks 29 192–215.
 Sewell et al. (2017) [author] Sewell, Daniel KD. K., Chen, YuguoY. et al. (2017). Latent Space Approaches to Community Detection in Dynamic Networks. Bayesian Analysis 12 351–377.
 Signorelli and Wit (2017) [author] Signorelli, MirkoM. Wit, Ernst C.E. C. (2017). A penalized inference approach to stochastic block modelling of community structure in the Italian Parliament. Journal of the Royal Statistical Society, Series C (Applied Statistics) doi:10.1111/rssc.12234. http://onlinelibrary.wiley.com/doi/10.1111/rssc.12234/full.
 Spiegelhalter et al. (2002) [author] Spiegelhalter, David JD. J., Best, Nicola GN. G., Carlin, Bradley PB. P. Van Der Linde, AngelikaA. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 583–639.
 Wang and Wong (1987) [author] Wang, Yuchung JY. J. Wong, George YG. Y. (1987). Stochastic blockmodels for directed graphs. Journal of the American Statistical Association 82 8–19.
 Wasserman and Faust (1994) [author] Wasserman, StanleyS. Faust, KatherineK. (1994). Social network analysis: Methods and applications 8. Cambridge university press.

Wasserman and
Pattison (1996)
[author] Wasserman, StanleyS. Pattison, PhilippaP. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika 61 401–425.
 Xing et al. (2010) [author] Xing, Eric PE. P., Fu, WenjieW., Song, LeL. et al. (2010). A statespace mixed membership blockmodel for dynamic network tomography. The Annals of Applied Statistics 4 535–566.