1 Introduction
Community detection and link prediction are two important problems in network analysis. A vast number of community detection algorithms based on various useful heuristics, such as modularity maximization
(Newman and Girvan, 2004) and clique percolation (Palla et al., 2005), have been proposed. See Fortunato (2010) for a comprehensive review. These algorithms, however, are not based on generative models and hence usually cannot be used to generate networks and predict missing edges (links). Moreover, how to set the number of communities is a critical issue that they have not addressed well. In this paper, we fit unweighted undirected relational networks using nonparametric Bayesian generative models, which can be used to simulate random networks, detect latent overlapping communities and community-community interactions, and predict missing edges, with the number of communities automatically inferred from the data.
For a relational network, a community can be considered as a subset of nodes (vertices) that are densely connected to each other but sparsely connected to the others, such as those in a social network, or it can be considered as a subset of nodes that are sparsely connected to each other but densely connected to the nodes belonging to another community, such as those in a network consisting of carnivores and herbivores: tigers and bears both hunt deer but rarely prey on each other. The former phenomenon is usually described as assortativity or homophily, while the latter is known as disassortativity or stochastic equivalence (Hoff, 2008). As a relational network may exhibit both homophily and stochastic equivalence, an algorithm capable of modeling both phenomena would usually be preferred if no prior information on assortativity is available. If analyzing assortative networks with dense intra-community connections is the main goal, then one may consider an assortative algorithm that models homophily but not necessarily stochastic equivalence.
The stochastic blockmodel (SBM) is a popular latent class model to detect latent communities (Holland et al., 1983; Nowicki and Snijders, 2001). It partitions the nodes into disjoint communities, and models the probability for an edge to exist between two nodes solely based on which two communities they belong to. It is simple and scalable, and models both homophily and stochastic equivalence. In addition, the infinite relational model, a nonparametric Bayesian extension of the SBM based on the Chinese restaurant process (Aldous, 1985), allows the number of communities to be automatically inferred from the data (Kemp et al., 2006). Despite these attractive properties, the SBM is restrictive in that communities are not allowed to overlap. In practice, however, it is common for a node to belong to multiple communities, motivating the development of more advanced latent class models, such as the mixed-membership stochastic blockmodel (MMSB) of Airoldi et al. (2008) and its various extensions (Gopalan et al., 2012; Kim et al., 2013). The MMSB generalizes the SBM to allow a node to participate in multiple communities, yet since it has to infer two community indicators for each pair of nodes, regardless of whether an edge exists in that pair, its computation grows quadratically as a function of the number of nodes $N$. Moreover, the number of communities in the MMSB is a model parameter that needs to be carefully selected.
In this paper, instead of clustering nodes, as in the SBM, or clustering all possible edges, as in the MMSB, we propose the edge partition model (EPM) to partition only the observed edges, which readily leads to the partition of nodes: if the edges linked to a node are partitioned into multiple communities, then the node is naturally affiliated with all these communities, and could be hard assigned to the single community that has the strongest presence in its edges. In contrast to the SBM, the EPM allows communities to overlap; and in contrast to the MMSB that spends $O(N^2)$ computation clustering all possible edges, the EPM spends $O(N\bar{d})$ computation partitioning only observed edges, where $\bar{d}$ is the average degree (number of edges) per node, leading to notable computational savings as $\bar{d}$ is often much smaller than $N$ in a big sparse network commonly observed in practice.
To support a potentially infinite number of communities and to model both homophily and stochastic equivalence in an unweighted undirected relational network, we propose a hierarchical gamma process (HGP) EPM, which links each observed edge to a latent count using a Bernoulli-Poisson link, and then factorizes the latent random count matrix. The HGP allows the EPM to have an infinite-dimensional feature vector for each node to describe its affiliations with communities, and an infinite-dimensional square rate matrix, whose diagonal and off-diagonal elements describe the intra- and inter-community interactions, respectively. We also propose a gamma process EPM as a simplified version of the HGP-EPM, which omits inter-community interactions to gain simpler inference and faster computation at the expense of reduced ability to model stochastic equivalence.
Conceptually, our idea of directly partitioning edges and implicitly partitioning nodes into communities is related to the one in Ahn et al. (2010) and Evans and Lambiotte (2009). In terms of construction, our EPMs are related to the Poisson factor models of Ball et al. (2011) and the Eigenmodel of Hoff (2008). In terms of supporting an infinite number of features, our EPMs are related to the models in Miller et al. (2009) and Morup et al. (2011) that use the Indian buffet process of Griffiths and Ghahramani (2005) to support an infinite binary feature matrix. The proposed models depart from existing ones with several distinctions: 1) a Bernoulli-Poisson link connects each edge to a latent count that is further partitioned; 2) a hierarchical gamma process is constructed to support an infinite number of communities and an infinite-dimensional square matrix to describe community-community interactions; 3) two nonparametric Bayesian EPMs are constructed to factorize the binary adjacency matrix under the Bernoulli-Poisson link, supporting a nonnegative feature matrix with an unbounded number of columns, and at the same time assigning each edge and hence each node to one or multiple latent communities; and 4) efficient and scalable Bayesian inference via Gibbs sampling is provided.
2 Factor Analysis and Bernoulli-Poisson Link
Our basic idea is to factorize the BINARY network adjacency matrix using tools developed for COUNT data analysis, and to discover overlapping communities and their interactions by examining how the latent count for each edge is partitioned. This section will primarily discuss individual model components and their properties, with hierarchical Bayesian models presented later.
2.1 Poisson Factor Analysis
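We propose a Poisson factor model for a weighted undirected relational network as
(1) $m_{ij} \sim \mathrm{Po}\left(\sum_{k_1=1}^{K}\sum_{k_2=1}^{K}\phi_{ik_1}\lambda_{k_1k_2}\phi_{jk_2}\right)$
where $m_{ij}$ is the integer-valued weight (observed or latent) that links nodes $i$ and $j$, $\phi_i=(\phi_{i1},\ldots,\phi_{iK})^T$ is the positive feature vector for node $i$, $\lambda_{k_1k_2}$ is a positive rate, and the symbol ":=" denotes “equal by definition.” This model is conceptually simple: with $\phi_{ik}$ measuring how strongly node $i$ is affiliated with community $k$ and $\lambda_{k_1k_2}$ measuring how strongly communities $k_1$ and $k_2$ interact with each other, the product $\phi_{ik_1}\lambda_{k_1k_2}\phi_{jk_2}$ measures how strongly nodes $i$ and $j$ are connected due to their affiliations with communities $k_1$ and $k_2$, respectively, and a weighted combination of all intra-community weights $\{\phi_{ik}\phi_{jk}\}_{k}$ and inter-community ones $\{\phi_{ik_1}\phi_{jk_2}\}_{k_1\neq k_2}$ is the expected value of $m_{ij}$.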
The factor model in (1) makes intuitive sense. For example, suppose persons $i$ and $j$ are both residents of City Avatar and active members of the Avatar anglers Meetup group that organizes fishing trips regularly. In addition, person $i$ is an active member of the Avatar artificial intelligence (AI) Meetup group while person $j$ is an active member of the Avatar statistics Meetup group. Denoting $m_{ij}$ as the number of times that $i$ and $j$ attend the same group meeting in 2015, then due to their strong affiliations with the anglers Meetup group, $m_{ij}$ would have a large expected value, which is likely to be further increased if the AI and statistics Meetup groups hold joint events regularly.
To model an assortative relational network exhibiting homophily but not necessarily stochastic equivalence, we may omit the inter-community interactions by letting $\lambda_{k_1k_2}=0$ for $k_1\neq k_2$ and simplify (1) as
(2) $m_{ij} \sim \mathrm{Po}\left(\sum_{k=1}^{K} r_k\phi_{ik}\phi_{jk}\right)$
where $r_k := \lambda_{kk}$ indicates the prevalence of community $k$, and two nodes with similar latent features are encouraged to be linked by an edge with a large weight.
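As a rough illustration of how the Poisson rates in (1) and (2) differ, the following sketch evaluates both for a made-up version of the Meetup example. The feature values, the rate matrix, and the function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical illustration of the Poisson rates in (1) and (2).
# Communities: 0 = anglers, 1 = AI, 2 = statistics. All numbers are made up.

def rate_full(phi_i, phi_j, lam):
    """Rate in (1): sum over (k1, k2) of phi_i[k1] * lam[k1][k2] * phi_j[k2]."""
    K = len(phi_i)
    return sum(phi_i[k1] * lam[k1][k2] * phi_j[k2]
               for k1 in range(K) for k2 in range(K))

def rate_assortative(phi_i, phi_j, r):
    """Rate in (2): sum over k of r[k] * phi_i[k] * phi_j[k]."""
    return sum(rk * a * b for rk, a, b in zip(r, phi_i, phi_j))

phi_i = [2.0, 1.5, 0.0]  # person i: anglers and AI
phi_j = [2.0, 0.0, 1.0]  # person j: anglers and statistics

# Symmetric rate matrix; the 0.4 entries encode joint AI/statistics events.
lam = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.4],
       [0.0, 0.4, 1.0]]
r = [lam[k][k] for k in range(3)]  # keep only intra-community rates

full = rate_full(phi_i, phi_j, lam)        # shared anglers term plus AI-statistics term
assort = rate_assortative(phi_i, phi_j, r)  # shared anglers term only
```

In this toy setting the inter-community term raises the expected weight from about 4.0 under (2) to about 4.6 under (1), which mirrors the joint-events remark in the example.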
We note that Ball et al. (2011) examined a model related to (2) and briefly mentioned a model related to (1). However, they used a heuristic approach to model binary data under the Poisson distribution, did not provide a principled way to set the number of communities $K$, and had to create possibly nonexistent self-edges in order to derive tractable expectation-maximization (EM) inference. This paper addresses all these issues rigorously, in a nonparametric Bayesian manner, carefully examines the models in both (2) and (1), and provides efficient Bayesian inference.
2.2 Bernoulli-Poisson Link
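To use the Poisson factor models in (1) and (2) for an unweighted network with a binary adjacency matrix, we introduce a Bernoulli-Poisson (BerPo) link function that thresholds a random count at one to obtain a random variable in $\{0,1\}$ as
(3) $b = \mathbf{1}(m\geq 1),\quad m\sim\mathrm{Po}(\lambda)$
where $b=1$ if $m\geq 1$ and $b=0$ if $m=0$. The intuition is that two nodes are connected if they interact at least once. The mathematical motivation is that after transforming a binary-modeling problem into a count-modeling one, one is readily equipped with a rich set of statistical tools developed for count data analysis using the Poisson and negative binomial distributions.
If $m$ is marginalized out from (3), then given $\lambda$, one obtains a Bernoulli random variable as
$b\sim\mathrm{Bernoulli}\left(1-e^{-\lambda}\right)$.
The conditional posterior of $m$ can be expressed as
$(m\mid b,\lambda)\sim b\cdot\mathrm{Po}_{+}(\lambda)$,
where $x\sim\mathrm{Po}_{+}(\lambda)$ follows a truncated Poisson distribution, with $P(x=k)=(1-e^{-\lambda})^{-1}\lambda^{k}e^{-\lambda}/k!$ for $k=1,2,\ldots$. Thus if $b=0$, then $m=0$ almost surely (a.s.), and if $b=1$, then $m\sim\mathrm{Po}_{+}(\lambda)$, which can be simulated with rejection sampling: if $\lambda\geq 1$, we draw $m\sim\mathrm{Po}(\lambda)$ till $m\geq 1$; and if $\lambda<1$, we draw both $y\sim\mathrm{Po}(\lambda)$ and $u\sim\mathrm{Unif}(0,1)$ till $u<1/(y+1)$, and then let $m=y+1$. The acceptance rate is $1-e^{-\lambda}$ if $\lambda\geq 1$ and $\lambda^{-1}(1-e^{-\lambda})$ if $\lambda<1$, and reaches its minimum, 63.2%, when $\lambda=1$.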
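The rejection sampler described above can be sketched as follows. This is a minimal standard-library sketch, not the author's Matlab code: the function names are hypothetical, and the plain Poisson draw uses Knuth's multiplication method, which is a stand-in choice suitable only for moderate rates.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw from Po(lam) with Knuth's multiplication method (moderate lam only)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_truncated_poisson(lam, rng):
    """Draw m from Po(lam) conditioned on m >= 1, by rejection sampling."""
    if lam >= 1.0:
        # Propose m ~ Po(lam) and keep the first draw with m >= 1.
        while True:
            m = sample_poisson(lam, rng)
            if m >= 1:
                return m
    # lam < 1: propose y + 1 with y ~ Po(lam), accept with probability 1 / (y + 1).
    while True:
        y = sample_poisson(lam, rng)
        if rng.random() < 1.0 / (y + 1):
            return y + 1

def sample_latent_count(b, lam, rng):
    """Conditional draw of the latent count: zero for a non-edge."""
    return sample_truncated_poisson(lam, rng) if b == 1 else 0

def edge_probability(lam):
    """P(b = 1) after marginalizing the count: 1 - exp(-lam)."""
    return -math.expm1(-lam)
```

A non-edge never triggers a random draw, which is the property the scalability argument later relies on.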
The BerPo link shares some similarities with the probit link that thresholds a normal random variable at zero, and the logit link that lets $b\sim\mathrm{Bernoulli}\left(e^{x}/(1+e^{x})\right)$. We advocate the BerPo link as an alternative to the probit and logit links since if $b=0$, then $m=0$ a.s., which could lead to significant computational savings if a considerable proportion of the data are equal to zero. In addition, the additive property of the Poisson allows us to model the link strength between any two nodes by aggregating the contributions of all possible intra- and inter-community interactions, and the conjugacy between the Poisson and gamma distributions makes it convenient to construct hierarchical Bayesian models amenable to posterior simulation.
2.3 Overlapping Community Structures
Note that (1) can be augmented as
(4) $m_{ij}=\sum_{k_1=1}^{K}\sum_{k_2=1}^{K}m_{ik_1k_2j},\quad m_{ik_1k_2j}\sim\mathrm{Po}\left(\phi_{ik_1}\lambda_{k_1k_2}\phi_{jk_2}\right)$
where $m_{ik_1k_2j}$ represents how often nodes $i$ and $j$ interact due to their affiliations with communities $k_1$ and $k_2$, respectively. We may consider that the model is partitioning the count $m_{ij}$ into $\{m_{ik_1k_2j}\}_{k_1,k_2}$, and hence we call the Poisson factor model in (1) together with the BerPo link in (3) an edge partition model (EPM), in which each edge is partitioned according to all possible community-community interactions, and how strongly node $i$ is affiliated with community $k$ can be measured with $\omega_{ik}$, where
(5) $\omega_{ik} := \sum_{j\neq i}\sum_{k_2=1}^{K}\phi_{ik}\lambda_{kk_2}\phi_{jk_2}$
represents how strongly node $i$ would interact with all the other nodes through its affiliation with community $k$. We further introduce the latent count
(6) $m_{ik\cdot\cdot} := \sum_{j\neq i}\sum_{k_2=1}^{K}m_{ikk_2j}$
to represent how often node $i$ is connected to the other nodes due to its affiliation with community $k$. We can then assign node $i$ to multiple communities in $\{k: m_{ik\cdot\cdot}\geq 1\}$, or (hard) assign it to a single community using either $\arg\max_k\omega_{ik}$ or $\arg\max_k m_{ik\cdot\cdot}$. Similar analysis applies to a simpler EPM built on (2). By hard assigning each node to a single community and ordering the nodes from the same community to be adjacent to each other, we expect the ordered adjacency matrix to exhibit a block structure, where the blocks along and off the diagonal represent the intra- and inter-community connections, respectively.
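The edge-partition step can be sketched as below. It relies on the standard fact that independent Poisson counts, conditioned on their sum, are multinomial with probabilities proportional to their rates. The function names and the dictionary layout for per-edge tables are hypothetical conveniences, not the paper's implementation.

```python
import random

def partition_count(m_ij, phi_i, phi_j, lam, rng):
    """Split the count m_ij over community pairs (k1, k2) with a multinomial draw
    whose weights are phi_i[k1] * lam[k1][k2] * phi_j[k2]."""
    K = len(phi_i)
    pairs = [(k1, k2) for k1 in range(K) for k2 in range(K)]
    weights = [phi_i[k1] * lam[k1][k2] * phi_j[k2] for k1, k2 in pairs]
    table = [[0] * K for _ in range(K)]
    if m_ij > 0:
        for k1, k2 in rng.choices(pairs, weights=weights, k=m_ij):
            table[k1][k2] += 1
    return table

def node_community_counts(edge_tables, n_nodes, K):
    """Aggregate per-edge tables into per-node, per-community counts.
    edge_tables maps an undirected pair (i, j), i < j, to its K x K table."""
    m = [[0] * K for _ in range(n_nodes)]
    for (i, j), table in edge_tables.items():
        for k1 in range(K):
            for k2 in range(K):
                m[i][k1] += table[k1][k2]  # node i takes part through k1
                m[j][k2] += table[k1][k2]  # node j takes part through k2
    return m

def hard_assign(m_row):
    """Pick the community with the largest latent count for one node."""
    return max(range(len(m_row)), key=lambda k: m_row[k])
```

Overlapping membership falls out of the same counts: node `i` belongs to every community `k` with `m[i][k] >= 1`.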
2.4 Scalability for Big Sparse Networks
We are motivated to construct the EPMs because they not only allow each edge and hence each node to participate in multiple communities, but also readily scale to a big sparse network whose average degree per node $\bar{d}$ is much smaller than the number of nodes $N$. A key observation for scalable computation is that (1) can be augmented as (4), where $m_{ik_1k_2j}=0$ a.s. for any $k_1$ and $k_2$ if no edge exists between nodes $i$ and $j$ (i.e., $b_{ij}=0$). On a sparse network, where the edges constitute only a small portion of all possible edges, this property makes the EPMs computationally appealing. By contrast, conceptually related models, including the MMSB of Airoldi et al. (2008), Eigenmodel of Hoff (2008) and latent feature relational model of Miller et al. (2009), spend computation indiscriminately on all pairs of nodes no matter whether an edge exists between nodes $i$ and $j$, and hence they have $O(N^2)$ computation and do not scale well as $N$ increases.
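To make the sparsity argument concrete, the short calculation below compares the number of node pairs with the number of observed edges, using the Yeast network size reported in Section 4.3 (2361 nodes, 6646 edges). The helper name is hypothetical.

```python
def num_pairs(n_nodes):
    """Number of unordered node pairs without self-pairs."""
    return n_nodes * (n_nodes - 1) // 2

yeast_nodes, yeast_edges = 2361, 6646  # sizes quoted in Section 4.3

pairs = num_pairs(yeast_nodes)          # pairs an all-pairs model must visit
edge_fraction = yeast_edges / pairs     # share of pairs that carry an edge
average_degree = 2 * yeast_edges / yeast_nodes
```

Only about a quarter of one percent of the pairs carry an edge here, so a sampler that touches observed edges alone visits far fewer terms than one that loops over all pairs.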
3 Edge Partition Models
3.1 Hierarchical Gamma Process
The EPM takes a weighted combination of all possible intra- and inter-community weights to explain each pair of nodes; however, the number of communities $K$ is still a model parameter that needs to be set appropriately. To allow $K$ to be inferred from the data and potentially grow to infinity, we need to introduce a stochastic process that can generate a countably infinite number of atoms $\{\phi_k\}_{1,\infty}$, where $\phi_k=(\phi_{1k},\ldots,\phi_{Nk})^T$ measures how strongly the nodes are affiliated with community $k$, and an infinite-dimensional square matrix $\{\lambda_{k_1k_2}\}_{k_1,k_2=1}^{\infty}$, where $\lambda_{k_1k_2}$ measures how strongly communities $k_1$ and $k_2$ interact with each other. Moreover, we need to ensure $\sum_{k_1}\sum_{k_2}\lambda_{k_1k_2}$ to be finite a.s. and we may wish to impose some structural regularization on the infinite square matrix.
To satisfy all these needs, we first define
(7) $G\sim\Gamma\mathrm{P}(G_0,1/c_0)$
as a gamma process on a product space $\mathbb{R}_+\times\Omega$, where $\mathbb{R}_+=\{x:x>0\}$, $\Omega$ is a complete separable metric space, $c_0$ is a positive scale parameter, and $G_0$ is a finite and continuous base measure, such that $G(A)\sim\mathrm{Gamma}(G_0(A),1/c_0)$ for each Borel set $A\subset\Omega$ (Ferguson, 1973; Kingman, 1993). The Lévy measure of the gamma process can be expressed as $\nu(dr\,d\phi)=r^{-1}e^{-c_0r}\,dr\,G_0(d\phi)$, and a draw from the gamma process, consisting of countably infinite atoms, can be expressed as $G=\sum_{k=1}^{\infty}r_k\delta_{\phi_k}$, where $r_k\in\mathbb{R}_+$, $\phi_k\sim g_0$, $g_0(d\phi)=G_0(d\phi)/\gamma_0$ is the base distribution, and $\gamma_0=G_0(\Omega)$ is the mass parameter. A gamma process based model has an inherent shrinkage mechanism, as in the prior the number of atoms with $r_k$ greater than $\tau\in\mathbb{R}_+$ follows $\mathrm{Po}\left(\gamma_0\int_{\tau}^{\infty}r^{-1}e^{-c_0r}\,dr\right)$, whose Poisson rate decreases as $\tau$ increases.
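The shrinkage property can be checked numerically with a small sketch. It assumes the standard finite approximation of a gamma process, in which $K$ weights are drawn independently from a gamma distribution with shape $\gamma_0/K$ and scale $1/c_0$, and it compares the average number of weights above a threshold with the Poisson rate $\gamma_0\int_\tau^\infty r^{-1}e^{-c_0 r}dr$ evaluated by a simple midpoint rule. Function names, the quadrature settings, and the parameter values are hypothetical.

```python
import math
import random

def expected_atoms_above(tau, gamma0, c0, upper=40.0, steps=100000):
    """Poisson rate gamma0 * integral_{tau}^{inf} r^{-1} exp(-c0 r) dr.
    Midpoint rule on [tau, upper / c0]; the tail beyond that is negligible."""
    hi = upper / c0
    h = (hi - tau) / steps
    total = 0.0
    for s in range(steps):
        r = tau + (s + 0.5) * h
        total += math.exp(-c0 * r) / r
    return gamma0 * total * h

def count_atoms_above(tau, gamma0, c0, K, rng):
    """Count weights above tau in one finite (K-atom) approximate draw."""
    return sum(rng.gammavariate(gamma0 / K, 1.0 / c0) > tau for _ in range(K))

def mean_atoms_above(tau, gamma0, c0, K, trials, rng):
    """Monte Carlo average of count_atoms_above over repeated draws."""
    return sum(count_atoms_above(tau, gamma0, c0, K, rng)
               for _ in range(trials)) / trials
```

Raising `tau` lowers both the quadrature value and the simulated count, which is the sense in which most atoms carry negligible weight under the prior.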
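Given a gamma process draw $G=\sum_{k=1}^{\infty}r_k\delta_{\phi_k}$, we further define a relational gamma process ($\mathrm{r}\Gamma\mathrm{P}$) as
(8) $\Lambda\mid G\sim\mathrm{r}\Gamma\mathrm{P}(G,\xi,1/\beta)$
a draw from which is defined as
$\Lambda=\sum_{k_1=1}^{\infty}\sum_{k_2=1}^{\infty}\lambda_{k_1k_2}\delta_{(\phi_{k_1},\phi_{k_2})}$
where $\phi_{k_1}$ and $\phi_{k_2}$ are both in $\Omega$, $\lambda_{k_1k_2}=\lambda_{k_2k_1}$, and
$\lambda_{k_1k_2}\sim\mathrm{Gamma}(\xi r_{k_1},1/\beta)$ if $k_1=k_2$, and $\lambda_{k_1k_2}\sim\mathrm{Gamma}(r_{k_1}r_{k_2},1/\beta)$ if $k_1<k_2$.
Given a relational gamma process draw $\Lambda$, with each atom $\phi_k=(\phi_{1k},\ldots,\phi_{Nk})^T$, we generate a binary adjacency matrix as
(9) $b_{ij}=\mathbf{1}(m_{ij}\geq 1),\quad m_{ij}\sim\mathrm{Po}\left(\sum_{k_1=1}^{\infty}\sum_{k_2=1}^{\infty}\phi_{ik_1}\lambda_{k_1k_2}\phi_{jk_2}\right)$
Equations (9), (8) and (7) constitute an HGP-EPM that supports countably infinite atoms and a countably infinite square matrix, the total sum of whose elements has a finite expectation, as shown in the following Lemma, with proof provided in the Appendix.
Lemma 1.
The expectation of $\sum_{k_1=1}^{\infty}\sum_{k_2=1}^{\infty}\lambda_{k_1k_2}$ is finite and can be expressed as
$E\left[\sum_{k_1=1}^{\infty}\sum_{k_2=1}^{\infty}\lambda_{k_1k_2}\right]=\frac{\xi\gamma_0}{\beta c_0}+\frac{\gamma_0^2}{\beta c_0^2}$,
where $\gamma_0$ and $c_0$ are the mass and scale parameters of the gamma process in (7).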
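A finite-expectation statement of this kind can be sanity-checked by simulation. The sketch below works under stated assumptions rather than the paper's exact construction: a $K$-atom truncation with weights $r_k$ drawn from a gamma distribution with shape $\gamma_0/K$ and scale $1/c_0$, diagonal rates with shape $\xi r_k$, off-diagonal rates with shape $r_{k_1}r_{k_2}$, and a common scale $1/\beta$. Under those assumptions the conditional mean of the rate sum is available in closed form, so only the weights need to be simulated. All names and parameter values are hypothetical.

```python
import random

def expected_rate_sum_given_r(r, xi, beta):
    """E[sum over (k1, k2) of lambda_{k1k2} | r] under the assumed gamma shapes:
    xi * r_k on the diagonal and r_{k1} * r_{k2} off the diagonal, scale 1/beta."""
    total = sum(r)
    sum_sq = sum(x * x for x in r)
    diagonal = xi * total / beta
    off_diagonal = (total * total - sum_sq) / beta  # ordered pairs with k1 != k2
    return diagonal + off_diagonal

def infinite_limit(gamma0, c0, xi, beta):
    """Value of the expectation as the truncation level K grows without bound."""
    return xi * gamma0 / (beta * c0) + gamma0 ** 2 / (beta * c0 ** 2)

def truncated_value(gamma0, c0, xi, beta, K):
    """Exact expectation for a K-atom truncation with shape gamma0 / K per weight."""
    return (xi * gamma0 / (beta * c0)
            + (1.0 - 1.0 / K) * gamma0 ** 2 / (beta * c0 ** 2))

def monte_carlo_rate_sum(gamma0, c0, xi, beta, K, trials, rng):
    """Average the conditional expectation over simulated weight vectors."""
    acc = 0.0
    for _ in range(trials):
        r = [rng.gammavariate(gamma0 / K, 1.0 / c0) for _ in range(K)]
        acc += expected_rate_sum_given_r(r, xi, beta)
    return acc / trials
```

The truncated value differs from the limit only by the factor $(1-1/K)$ on the second term, so a moderately large truncation level already reproduces the limiting expectation closely.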
The usual scenario to consider an HGP construction is when one models grouped data and wishes to share statistical strength across groups. For example, the gamma-negative binomial process of Zhou and Carin (2012), related to the hierarchical Dirichlet process of Teh et al. (2006), is considered for topic modeling, where each document is associated with a gamma process, and these gamma processes are coupled by sharing a lower-level (i.e., further from the data) gamma process as their atomic base measure. The proposed HGP is distinct in that the product of the weights of any two atoms of the lower-level gamma process is used to parameterize the shape parameter of a gamma random variable higher in the hierarchy.
The proposed HGP also helps express our prior belief that an atom with a small weight tends to represent a small community, which also tends to interact with the others less frequently. Note that if we let , then the expectation of the matrix given has a rank of one. We use instead of as the shape parameter of to allow to be inferred with Gibbs sampling and to prevent overly shrinking for small communities. Note that Palla et al. (2014)
proposed a reversible infinite hidden Markov model using a related HGP infinite square rate matrix, each row of which is normalized to represent a state transition probability vector. Our HGP serves a distinct modeling purpose; no normalization is required for the infinite square rate matrix, and our model allows exploiting unique data augmentation techniques to infer both and with closed-form Gibbs sampling update equations, as discussed in Section 3.4 and the Appendix.
3.2 Hierarchical Gamma Process EPM
We choose the base distribution of the gamma process as . For implementation convenience, we consider a discrete base measure as , where $K$ is a truncation level that is set large enough to ensure a good approximation to the truly infinite model. We express the (truncated) HGP-EPM as
(10) 
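One use of a truncated generative model of this kind is to simulate random networks. The sketch below is an illustration under assumptions, not a transcription of (10): it takes a truncation level `K`, draws community weights from a gamma distribution with shape `gamma0 / K`, draws node features from a fixed gamma distribution with hypothetical shape `a` and rate `c`, draws symmetric rates with shape `xi * r[k]` on the diagonal and `r[k1] * r[k2]` off the diagonal, and then generates each edge with probability one minus the exponential of the negative Poisson rate. All default values and names are placeholders.

```python
import math
import random

def simulate_network(n_nodes, K, rng, gamma0=1.0, c0=1.0, xi=1.0, beta=1.0,
                     a=0.5, c=1.0):
    """Simulate a symmetric binary adjacency matrix from an assumed truncated
    hierarchical gamma construction (hyperparameter values are placeholders)."""
    # Community weights and node features.
    r = [rng.gammavariate(gamma0 / K, 1.0 / c0) for _ in range(K)]
    phi = [[rng.gammavariate(a, 1.0 / c) for _ in range(K)]
           for _ in range(n_nodes)]
    # Symmetric rate matrix; guard against shapes that underflow to zero.
    lam = [[0.0] * K for _ in range(K)]
    for k1 in range(K):
        for k2 in range(k1, K):
            shape = xi * r[k1] if k1 == k2 else r[k1] * r[k2]
            value = rng.gammavariate(shape, 1.0 / beta) if shape > 0.0 else 0.0
            lam[k1][k2] = lam[k2][k1] = value
    # Edges: b_ij = 1 with probability 1 - exp(-rate), no self-edges.
    b = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            rate = sum(phi[i][k1] * lam[k1][k2] * phi[j][k2]
                       for k1 in range(K) for k2 in range(K))
            if rng.random() < -math.expm1(-rate):
                b[i][j] = b[j][i] = 1
    return b, phi, lam, r
```

Because the edge indicator only depends on whether the latent count is positive, the sketch samples the indicator directly and skips drawing the count itself.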
3.3 Gamma Process EPM
If we omit inter-community interactions by letting $\lambda_{k_1k_2}=0$ for $k_1\neq k_2$ and $\lambda_{kk}=r_k$, then the HGP-EPM reduces to a gamma process EPM (GP-EPM), which is likely to fit assortative networks well but not necessarily disassortative ones. We notice an interesting connection to the community-affiliation graph model (AGM) of Yang and Leskovec (2012, 2014): the GP-EPM generates an edge with probability
(12) $P(b_{ij}=1)=1-\prod_{k}\exp\left(-r_k\phi_{ik}\phi_{jk}\right)$;
if we define $p_k := 1-e^{-r_k}$ and further impose the restriction that $\phi_{ik}\in\{0,1\}$, then (12) reduces to
(13) $P(b_{ij}=1)=1-\prod_{k\in C_{ij}}(1-p_k)$
where $C_{ij}$ is the set of communities that nodes $i$ and $j$ share; note that (13) is almost the same as the AGM of Yang and Leskovec (2012, 2014). In fact, one may consider the GP-EPM with where and , as a nonparametric Bayesian AGM. Similarly, we also notice that (11) of the HGP-EPM is related to the model of Morup et al. (2011) if we restrict $\phi_{ik}\in\{0,1\}$.
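The reduction to an AGM-style edge probability under binary features can be verified numerically. In the sketch below, the first function evaluates one minus the exponential of the negative assortative Poisson rate, and the second evaluates the shared-community product form with community edge probabilities defined as one minus the exponential of the negative community weight. Function names and the example weights are hypothetical.

```python
import math

def gp_epm_edge_prob(phi_i, phi_j, r):
    """Edge probability 1 - exp(-sum_k r_k * phi_ik * phi_jk)."""
    rate = sum(rk * a * b for rk, a, b in zip(r, phi_i, phi_j))
    return 1.0 - math.exp(-rate)

def agm_edge_prob(member_i, member_j, p):
    """AGM-style probability 1 - prod over shared communities of (1 - p_k)."""
    prob_no_edge = 1.0
    for mi, mj, pk in zip(member_i, member_j, p):
        if mi and mj:
            prob_no_edge *= 1.0 - pk
    return 1.0 - prob_no_edge

r = [0.5, 1.2, 2.0]                      # made-up community weights
p = [1.0 - math.exp(-rk) for rk in r]    # implied per-community edge probabilities

node_a = [1, 1, 0]
node_b = [1, 1, 1]
gap = abs(gp_epm_edge_prob(node_a, node_b, r) - agm_edge_prob(node_a, node_b, p))
```

The two expressions agree up to floating-point error whenever the features are restricted to zero or one; with real-valued features only the first form applies.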
Yang and Leskovec (2012, 2014) argue that all previous community detection methods, including clique percolation and MMSB, would fail to detect communities with dense overlaps, because they all had a hidden assumption that a community’s overlapping parts are less densely connected than its non-overlapping ones. Like the AGM, neither the GP-EPM nor the HGP-EPM makes such a restrictive assumption, and both allow overlaps of communities to be denser than the communities themselves. Beyond the AGM, we do not restrict $\phi_{ik}$ to be either zero or one, and our generative models are built under a rigorous nonparametric Bayesian framework with efficient Bayesian inference, as presented below.
3.4 MCMC Inference
In this paper, we consider an unweighted undirected network, where and self-links are not defined. Thus we only consider for in (10). Let be defined as in (6) and as
where if and otherwise. Using (5) and the Poisson additive property, we have
(14)  
(15) 
where represents how strongly the nodes interact through communities and . Marginalizing out from (14) and from (15), with and , we have
(16)  
(17) 
Using the BerPo link, the gamma-Poisson conjugacy, and the augment-and-conquer techniques to infer the negative binomial dispersion parameters (Zhou and Carin, 2012, 2015), we exploit (14)-(17) to derive closed-form Gibbs sampling update equations for all model parameters except , and construct an excellent proposal distribution to sample using an independence chain Metropolis-Hastings algorithm. We present in the Appendix the details of MCMC inference for the HGP-EPM, and the hierarchical model and closed-form Gibbs sampling update equations for the GP-EPM. The inference of the nonparametric Bayesian AGM would be almost the same as that of the GP-EPM, with the only difference that the $\phi_{ik}$ would be sampled from Bernoulli distributions.
4 Experimental Results
For comparison, we consider the infinite relational model (IRM) of Kemp et al. (2006), the Eigenmodel of Hoff (2008), the infinite latent attribute (ILA) model of Palla et al. (2012), the AGM of Yang and Leskovec (2012, 2014), and our GP-EPM and HGP-EPM. We use the R package provided for the Eigenmodel. We use the ILA code (http://mlg.eng.cam.ac.uk/konstantina/ILA/ILAcode(v1).tar.gz) provided for Palla et al. (2012), in which it is shown that the ILA outperforms the related nonparametric latent feature relational model of Miller et al. (2009). We implement a nonparametric Bayesian version of the AGM as a special case of the GP-EPM, as discussed in Section 3.3. Matlab code for the EPMs is available on the author’s website.
For the Eigenmodel, we find the best in . For the ILA, we use its default parameter setting (the default training/testing partition of the ILA code sends self-edges into the testing set, whereas in this paper we do not intend to predict self-edges and hence do not allow them to appear in the testing set). For the IRM, we choose as the prior for each latent block and as the prior for the Chinese restaurant process concentration parameter; for the nonparametric Bayesian AGM, we let and ; these parameters are found to consistently provide good performance. For our models’ hyperparameters, we choose and let , , and be all drawn from .
We consider 3000 MCMC iterations and collect the last 1500 samples, unless otherwise stated. We consider two small-scale benchmark networks, for which we test all algorithms and set the truncation level as for our algorithms, and another two networks with more than 2000 nodes, for which we set .
To test a model’s ability to predict missing edges of an unweighted undirected relational network, we randomly hold out 20% of the pairs of nodes (if removing an edge would disconnect a node from all the others, then the edge is kept in the training set) and use the remaining 80% to predict the probability for an edge to exist in each of these held-out pairs. Letting if is held out and otherwise, we only need to slightly modify the inference by only considering in the likelihood. For example, in (5) would be redefined as . We consider exactly the same five random training-testing partitions for all algorithms and report the average area under the curve (AUC) of both the receiver operating characteristic (ROC) and precision-recall (PR) curves (Davis and Goadrich, 2006). For link prediction, the AUC-PR is more sensitive to the percentage of true edges among the top ranked ones. Note that in addition to link prediction, the HGP-EPM, GP-EPM, AGM and IRM all have easily interpretable latent representations that will be used to detect overlapping/disjoint communities.
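The two evaluation metrics can be sketched in a few lines. This is an assumption-laden illustration: the AUC-ROC is computed as the fraction of (edge, non-edge) pairs ranked correctly, and the AUC-PR is approximated by average precision, which is one common estimator and not necessarily the interpolation used for the reported numbers. Function names are hypothetical.

```python
def auc_roc(scores, labels):
    """Probability that a held-out edge is scored above a held-out non-edge
    (ties count one half). Quadratic in the number of pairs; fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true edge,
    used here as a simple stand-in for the area under the PR curve."""
    order = sorted(range(len(scores)), key=lambda t: -scores[t])
    hits, total = 0, 0.0
    for rank, t in enumerate(order, start=1):
        if labels[t] == 1:
            hits += 1
            total += hits / rank
    return total / hits

scores = [0.9, 0.8, 0.3, 0.1]  # predicted edge probabilities on held-out pairs
labels = [1, 0, 1, 0]          # 1 marks a true edge
roc = auc_roc(scores, labels)
ap = average_precision(scores, labels)
```

On this toy input the average precision rewards the true edge placed at the top of the ranking more heavily than the AUC-ROC does, which matches the remark that the PR-based measure is more sensitive to the quality of the top-ranked pairs.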
4.1 Protein230 Network
We first consider the Protein230 dataset of Butland et al. (2005) that describes the interactions between 230 proteins, with 595 edges. This is a small-scale benchmark network that exhibits both homophily and stochastic equivalence, as shown in Hoff (2008) and also tested in Lloyd et al. (2012). We are able to run 3000 MCMC iterations quickly enough for all algorithms except for the ILA on this network.
As shown in Tab. 1, the HGPEPM has the best overall performance. The Eigenmodel is the second best with
and the IRM is the third best. The AGM is not competitive as it restricts its features to be binary. In this and all future tables, we highlight in bold both the best result and the ones that are less than one standard error away from the best. Below we analyze why the HGPEPM performs the best while the simpler GPEPM is not that competitive on this dataset.
Table 1: Link prediction results on the Protein230 network.
Model        AUC-ROC            AUC-PR
IRM          0.9338 ± 0.0128    0.5026 ± 0.0676
Eigenmodel   0.9314 ± 0.0188    0.5468 ± 0.0500
ILA          0.8971 ± 0.0297    0.3693 ± 0.0234
AGM          0.9145 ± 0.0160    0.3339 ± 0.0359
GP-EPM       0.9335 ± 0.0110    0.4011 ± 0.0452
HGP-EPM      0.9519 ± 0.0100    0.5655 ± 0.0505
As shown in Figs. 1 (b)-(d), the HGP-EPM captures both homophily and stochastic equivalence by accurately modeling both diagonal and off-diagonal dense regions of the adjacency matrix; the GP-EPM captures homophily by accurately modeling diagonal dense regions that represent intra-community interactions, but at the expense of creating nonexistent blocks in order to fit dense off-diagonal regions that represent strong inter-community interactions; and the IRM captures these large dense blocks, but produces a cartoonish estimation, which overlooks small communities that represent fine details along the diagonal.
Fig. 2 shows how the HGP-EPM works. First, each feature vector shown in Fig. 2 (a) clearly describes how strongly the nodes are affiliated with the community it represents, and each node may have large weights on multiple communities. Second, about 30 latent feature vectors are inferred and the remaining ones are essentially drawn from the prior. Third, the inter- and intra-community interaction strengths in Fig. 2 (b) can be matched to the corresponding communities (subsets of nodes) in Figs. 1 (a) and (b). For example, Fig. 2 (a) suggests that the first and second largest communities have 24 and 22 nodes, respectively, and Fig. 2 (b) suggests that the first and second communities have sparse and dense intra-community connections, respectively, and have denser connections between them, as confirmed by examining the block structures within the top-left corner of both Figs. 1 (a) and (b).
4.2 NIPS234 Coauthor Network
We consider the small-scale NIPS234 network, which consists of the top 234 authors in the NIPS 1-17 conferences (http://chechiklab.biu.ac.il/gal/data.html) in terms of the number of publications, as studied in Miller et al. (2009). There are 598 edges. As shown in Tab. 2, the GP-EPM and HGP-EPM have the best overall performance, followed by the IRM. Compared with the simpler GP-EPM, the extra flexibility to model stochastic equivalence does not bring the HGP-EPM additional advantages on this dataset, which is not surprising as Fig. 3 suggests that this coauthor network mainly exhibits homophily. Note that the IRM performs well as measured by the AUC-ROC, but its AUC-PR is clearly worse than those of the EPMs. This may again be explained by its overly smoothed cartoonish estimation that overlooks small communities, as clearly shown in Fig. 3 (d).
Table 2: Link prediction results on the NIPS234 network.
Model        AUC-ROC            AUC-PR
IRM          0.9476 ± 0.0114    0.6677 ± 0.0201
Eigenmodel   0.9269 ± 0.0177    0.6784 ± 0.0364
ILA          0.9171 ± 0.0222    0.6793 ± 0.0295
AGM          0.8906 ± 0.0164    0.5842 ± 0.0357
GP-EPM       0.9501 ± 0.0123    0.7415 ± 0.0319
HGP-EPM      0.9469 ± 0.0163    0.7289 ± 0.0540
4.3 Yeast and NIPS12 Networks
We also consider the Yeast (http://vlado.fmf.unilj.si/pub/networks/data/bio/Yeast/Yeast.htm) protein interaction network of Bu et al. (2003), with 2361 nodes and 6646 non-self edges, and the NIPS12 coauthor network (http://www.cs.nyu.edu/roweis/data.html) that includes all the 2037 authors in NIPS papers vols. 0-12, with 3134 edges. These two medium-size networks are already too large for the Eigenmodel and ILA to produce reasonable results given our computational resources. The results in Tabs. 3 and 4 show that the HGP-EPM performs the best on the Yeast protein-protein interaction network, which is found to clearly exhibit stochastic equivalence by examining the plots corresponding to the ones in Figs. 1 and 3 (not shown for brevity), and the HGP-EPM and GP-EPM both perform well on the NIPS12 coauthor network, which is found to mainly exhibit homophily by examining related plots (not shown for brevity).
As discussed before, the HGP-EPM, GP-EPM, AGM and IRM can all be used to assign nodes to disjoint communities. In Fig. 4 we plot the size of an inferred latent community as a function of its rank (smaller ranks indicate larger sizes) on the log10 scale, for the four scalable algorithms on the four tested real networks. It is clear that in contrast to the other three latent factor models, the IRM, a latent class model, infers a smaller number of communities, with more larger-size and fewer smaller-size ones. Examining the details, we find that the IRM tends to place all the low-degree nodes into one or several large-size communities, whereas the other models are able to better preserve fine details involving small-size communities.
Table 3: Link prediction results on the Yeast network.
Model        AUC-ROC            AUC-PR
IRM          0.9093 ± 0.0059    0.1878 ± 0.0142
AGM          0.9009 ± 0.0025    0.1225 ± 0.0129
GP-EPM       0.9331 ± 0.0014    0.2486 ± 0.0149
HGP-EPM      0.9367 ± 0.0012    0.2628 ± 0.0184
Table 4: Link prediction results on the NIPS12 network.
Model        AUC-ROC            AUC-PR
IRM          0.9427 ± 0.0121    0.2066 ± 0.0331
AGM          0.9328 ± 0.0049    0.2350 ± 0.0177
GP-EPM       0.9768 ± 0.0079    0.4705 ± 0.0362
HGP-EPM      0.9762 ± 0.0081    0.4493 ± 0.0229
We mention that the HGP-EPM, GP-EPM and AGM have computation, whereas the Eigenmodel and ILA have at least computation, where is the number of latent features. With unoptimized Matlab on a 2.7 GHz CPU, for 1000 MCMC iterations, the HGP-EPM (GP-EPM) takes about 80 (20) seconds on Protein230, about 85 (28) seconds on NIPS234, about 50 (18) minutes on Yeast, and about 32 (12) minutes on NIPS12. The Eigenmodel with takes about 200 seconds on NIPS234 to run 1000 MCMC iterations. For the ILA on NIPS234, we considered 1000 MCMC iterations that took over 18 hours to run; for Protein230, the ILA inferred about twice as many features as it did on NIPS234, and we considered 500 MCMC iterations that took over 21 hours to run.
5 Conclusions
To model unweighted undirected relational networks characterized by both homophily and stochastic equivalence, we propose a hierarchical gamma process edge partition model (EPM) that supports an infinite number of communities and an infinite square rate matrix to describe community-community interactions. The EPM exploits a Bernoulli-Poisson link to assign a latent count to each binary edge, and further partitions that count according to the edge’s affiliations with all pairs of communities, which naturally leads to the partition of each node into overlapping communities. We also provide a simpler gamma process EPM that omits inter-community interactions, which is found to perform well on assortative networks. Efficient MCMC inference with closed-form update equations is provided. Experimental results on four real networks illustrate the EPMs’ working mechanisms and properties, as well as their state-of-the-art performance and interpretable latent representations. While previous latent feature relational models and their nonparametric Bayesian versions are often not scalable, our infinite EPMs are readily scalable to networks with thousands of nodes. It would be interesting to investigate strategies to make them scalable to relational networks with millions of nodes and edges.
References
 Ahn et al. (2010) Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann. Link communities reveal multiscale complexity in networks. Nature, pages 761–764, 2010.
 Airoldi et al. (2008) E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, pages 1981–2014, 2008.
 Aldous (1985) D. Aldous. Exchangeability and related topics. École d’été de probabilités de Saint-Flour XIII-1983, pages 1–198, 1985.
 Ball et al. (2011) B. Ball, B. Karrer, and M. E. J. Newman. Efficient and principled method for detecting communities in networks. Physical Review E, 2011.
 Bu et al. (2003) D. Bu, Y. Zhao, L. Cai, H. Xue, X. Zhu, H. Lu, J. Zhang, S. Sun, L. Ling, N. Zhang, G. Li, and R. Chen. Topological structure analysis of the protein–protein interaction network in budding yeast. Nucleic acids research, pages 2443–2450, 2003.
 Butland et al. (2005) G. Butland, J. M. Peregrín-Alvarez, J. Li, W. Yang, X. Yang, V. Canadien, A. Starostine, D. Richards, B. Beattie, N. Krogan, M. Davey, J. Parkinson, J. Greenblatt, and A. Emili. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature, pages 531–537, 2005.
 Davis and Goadrich (2006) J. Davis and M. Goadrich. The relationship between Precision-Recall and ROC curves. In ICML, 2006.
 Evans and Lambiotte (2009) T. S. Evans and R. Lambiotte. Line graphs, link partitions, and overlapping communities. Physical Review E, page 016105, 2009.
 Ferguson (1973) T. S. Ferguson. A Bayesian analysis of some nonparametric problems. Ann. Statist., 1973.
 Fortunato (2010) S. Fortunato. Community detection in graphs. Physics Reports, pages 75–174, 2010.
 Gopalan et al. (2012) P. Gopalan, S. Gerrish, M. Freedman, D. M. Blei, and D. M. Mimno. Scalable inference of overlapping communities. In NIPS, pages 2249–2257, 2012.
 Griffiths and Ghahramani (2005) T. L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In NIPS, 2005.
 Hoff (2008) P. Hoff. Modeling homophily and stochastic equivalence in symmetric relational data. In NIPS, 2008.
 Holland et al. (1983) P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social networks, pages 109–137, 1983.
 Hoover (1982) D. N. Hoover. Row-column exchangeability and a general model for exchangeability. In G. Koch and F. Spizzichino, editors, Exchangeability in Probability and Statistics. 1982.
 Kemp et al. (2006) C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an infinite relational model. In AAAI, 2006.
 Kim et al. (2013) D. I. Kim, P. Gopalan, D. M. Blei, and E. Sudderth. Efficient online inference for Bayesian nonparametric relational models. In NIPS, pages 962–970, 2013.
 Kingman (1993) J. F. C. Kingman. Poisson Processes. Oxford University Press, 1993.
 Lloyd et al. (2012) J. Lloyd, P. Orbanz, Z. Ghahramani, and D. Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In NIPS, 2012.
 Miller et al. (2009) K. Miller, M. I. Jordan, and T. L. Griffiths. Nonparametric latent feature models for link prediction. In NIPS, 2009.
 Morup et al. (2011) M. Morup, M. N. Schmidt, and L. K. Hansen. Infinite multiple membership relational modeling for complex networks. In IEEE MLSP, 2011.
 Newman and Girvan (2004) M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical review E, page 026113, 2004.
 Nowicki and Snijders (2001) K. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. JASA, pages 1077–1087, 2001.
 Palla et al. (2005) G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, pages 814–818, 2005.
 Palla et al. (2014) K. Palla, D. Knowles, and Z. Ghahramani. A reversible infinite HMM using normalised random measures. In ICML, 2014.
 Palla et al. (2012) K. Palla, D. A. Knowles, and Z. Ghahramani. An infinite latent attribute model for network data. In ICML, 2012.
 Teh et al. (2006) Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. JASA, 2006.
 Yang and Leskovec (2012) J. Yang and J. Leskovec. Community-affiliation graph model for overlapping network community detection. In ICDM, 2012.
 Yang and Leskovec (2014) J. Yang and J. Leskovec. Structure and overlaps of ground-truth communities in networks. ACM Trans. Intell. Syst. Technol., pages 26:1–26:35, 2014.
 Zhou and Carin (2012) M. Zhou and L. Carin. Augment-and-conquer negative binomial processes. In NIPS, 2012.
 Zhou and Carin (2015) M. Zhou and L. Carin. Negative binomial process count and mixture modeling. IEEE Trans. Pattern Analysis and Machine Intelligence, 2015.
 Zhou et al. (2012) M. Zhou, L. Hannah, D. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. In AISTATS, 2012.
Infinite Edge Partition Models for Overlapping Community Detection and Link Prediction: Appendix
Appendix A Proof for Lemma 1
Using the law of total expectation, we have
Using Campbell’s theorem (Kingman, 1993), we have
The proof is completed by further using and . ∎
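For reference, the standard form of Campbell's theorem and its application to the weights of a gamma process can be written out as follows. This is a sketch under assumed notation, not a reconstruction of the elided displays above: the symbols \Pi, \mu, f, r_k, \gamma_0, G_0, and c are introduced here for illustration and may differ from those used in Lemma 1.

```latex
% Campbell's theorem for a Poisson process \Pi on a space S with mean measure \mu
% (valid whenever the integrals converge):
\mathbb{E}\Big[\sum_{X \in \Pi} f(X)\Big] = \int_S f(x)\,\mu(dx),
\qquad
\operatorname{Var}\Big[\sum_{X \in \Pi} f(X)\Big] = \int_S f(x)^2\,\mu(dx).

% Applied to a gamma process G = \sum_k r_k \delta_{\omega_k} with Levy measure
% \nu(dr\,d\omega) = r^{-1} e^{-c r}\,dr\,G_0(d\omega) and total mass \gamma_0 = G_0(\Omega):
\mathbb{E}\Big[\sum_k r_k\Big] = \gamma_0 \int_0^\infty e^{-c r}\,dr = \frac{\gamma_0}{c},
\qquad
\mathbb{E}\Big[\sum_k r_k^2\Big] = \gamma_0 \int_0^\infty r\,e^{-c r}\,dr = \frac{\gamma_0}{c^2}.
```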
Appendix B MCMC Inference for HGP-EPM
Sample . As in Section 2.2, we sample a latent count for each as
(18) 
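Under a Bernoulli-Poisson link, an observed edge corresponds to a latent Poisson count that is at least one, so the conditional draw is a zero-truncated Poisson, while a non-edge forces the count to zero. A minimal Python sketch of that draw is given below, using inverse-CDF sampling. The function names and the scalar argument `rate` (standing in for the total Poisson rate of one edge) are hypothetical and not taken from the paper.

```python
import math
import random


def sample_truncated_poisson(rate, rng=random):
    """Draw m >= 1 from Poisson(rate) conditioned on m >= 1 (inverse-CDF method)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # P(m = 1 | m >= 1) = rate * exp(-rate) / (1 - exp(-rate)) = rate / expm1(rate).
    # Note: math.expm1 overflows for very large rates (roughly rate > 700).
    pmf = rate / math.expm1(rate)
    cdf = pmf
    u = rng.random()
    m = 1
    # Walk up the support using P(m + 1) / P(m) = rate / (m + 1).
    while u > cdf and pmf > 0.0:
        m += 1
        pmf *= rate / m
        cdf += pmf
    return m


def sample_latent_count(b, rate, rng=random):
    """Latent count for one binary edge b: 0 for a non-edge, truncated Poisson for an edge."""
    return 0 if b == 0 else sample_truncated_poisson(rate, rng)
```

Because the draw is only needed for observed edges, the cost of this step scales with the number of edges rather than the number of node pairs.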
Sample . Using the relationships between the Poisson and multinomial distributions, similar to the derivation in Zhou et al. (2012), we partition the latent count as
(19) 
Note that in each MCMC iteration we store and but not necessarily in the memory.
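The Poisson-multinomial relationship used above says that independent Poisson counts, conditioned on their sum, are multinomial with probabilities proportional to their rates. The sketch below partitions one latent count across community pairs in that way. The dictionary `rates`, keyed by hypothetical pairs `(k1, k2)`, is an assumption made for illustration; how each rate is computed from the model parameters is left to the caller.

```python
import random
from collections import Counter


def partition_count(m, rates, rng=random):
    """Split a total count m across cells with probability proportional to rates.

    rates maps a cell label, e.g. a community pair (k1, k2), to a
    non-negative Poisson rate; at least one rate must be positive.
    """
    cells = list(rates.keys())
    weights = [rates[c] for c in cells]
    # m independent categorical draws are equivalent to one multinomial draw.
    picks = rng.choices(cells, weights=weights, k=m)
    counts = Counter(picks)
    return {c: counts.get(c, 0) for c in cells}
```

For example, `partition_count(5, {(0, 0): 1.0, (0, 1): 3.0})` returns a dictionary whose two values sum to 5.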
Sample .
Using (16) and the data augmentation technique developed in Zhou and Carin (2012, 2015) for the negative binomial distribution,
we sample as
(20) 
where with a slight abuse of notation, but for added conciseness, we use to represent .
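The negative binomial data augmentation of Zhou and Carin (2012, 2015) is built around a Chinese restaurant table (CRT) auxiliary count. Assuming that this is the kind of auxiliary variable drawn in this step, a CRT draw can be sketched as a sum of independent Bernoulli variables. The names `n` (the observed count) and `r` (the gamma shape or concentration parameter) are hypothetical placeholders.

```python
import random


def sample_crt(n, r, rng=random):
    """Draw l ~ CRT(n, r) as a sum of Bernoulli(r / (r + t)) for t = 0, ..., n - 1."""
    if r <= 0:
        raise ValueError("r must be positive")
    # The t = 0 term has success probability 1, so l >= 1 whenever n >= 1.
    return sum(1 for t in range(n) if rng.random() < r / (r + t))
```

The draw costs O(n) per count; since it is zero whenever `n` is zero, only nonzero counts need to be visited.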
Sample .
Using (14) and the gamma-Poisson conjugacy,
we have
(21) 
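Gamma-Poisson conjugacy in its generic form can be sketched as follows: if a parameter has a Gamma(shape, rate) prior and each count is Poisson with mean equal to the parameter times a known exposure, the posterior is again gamma, with the counts added to the shape and the exposures added to the rate. The argument names below are hypothetical; which counts and exposures enter each update depends on the model equations and is not reproduced here.

```python
import random


def sample_gamma_posterior(shape, rate, counts, exposures, rng=random):
    """Draw theta from its posterior under
    theta ~ Gamma(shape, rate), counts[i] ~ Poisson(theta * exposures[i])."""
    post_shape = shape + sum(counts)
    post_rate = rate + sum(exposures)
    # random.gammavariate takes a shape and a SCALE, so the rate is inverted.
    return rng.gammavariate(post_shape, 1.0 / post_rate)
```

The same helper covers any update of this form, since only the sufficient statistics (the summed counts and summed exposures) change between parameters.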
Sample . Similar to the inference of , using (17), we sample as
(22) 
Sample . We resample the auxiliary variables using the updated and then sample as
(23) 
Sample . Using (15) and the gamma-Poisson conjugacy, we have
(24) 
Sample , and . They can be sampled from gamma distributions using the conjugacy between gamma distributions, omitted here for brevity.
Sample . As shown in Lemma 1, the mass parameter plays an important role in determining the total sum of the infinite rate matrix . Our experiments show that it can be used as a tuning parameter to impose one's prior preference on the number of active communities to be discovered. In this paper, we impose a gamma prior as to let the data infer the posterior of .
We employ an independence chain Metropolis-Hastings algorithm to sample , with the proposal distribution constructed as
(25) 
where and
We accept with probability , where is
which is usually greater than 50% for the networks considered in this paper.
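An independence chain Metropolis-Hastings step proposes a new value from a distribution that does not depend on the current state and accepts it with probability min(1, [p(new) q(old)] / [p(old) q(new)]), where p is the target and q the proposal. The sketch below illustrates the mechanism with a gamma proposal chosen purely for illustration; it is not the proposal of (25), and `log_target`, `prop_shape`, and `prop_rate` are hypothetical names.

```python
import math
import random


def gamma_logpdf(x, shape, rate):
    """Log density of Gamma(shape, rate) evaluated at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def independence_mh_step(current, log_target, prop_shape, prop_rate, rng=random):
    """One independence-chain MH step on the positive reals.

    log_target may be unnormalized. Returns (new_state, accepted).
    """
    # The proposal ignores the current state (gammavariate takes a scale).
    proposal = rng.gammavariate(prop_shape, 1.0 / prop_rate)
    log_ratio = (log_target(proposal) - log_target(current)
                 + gamma_logpdf(current, prop_shape, prop_rate)
                 - gamma_logpdf(proposal, prop_shape, prop_rate))
    accept_prob = math.exp(min(0.0, log_ratio))
    if rng.random() < accept_prob:
        return proposal, True
    return current, False
```

When the proposal closely matches the target, the ratio stays near one and most proposals are accepted, which is the regime a high acceptance rate indicates.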
Appendix C Gamma Process EPM
The gamma process EPM differs from the HGP-EPM in that it omits intercommunity interactions, which leads to a simpler hierarchical model and faster computation at the expense of reduced ability to model stochastic equivalence. It is found to have good performance on assortative networks but not necessarily on disassortative ones.
C.1 Hierarchical Model
The (truncated) gamma process EPM is expressed as
(26) 
where the prior is also imposed on and . As , we recover the (exact) gamma process with a finite and continuous base measure. We usually set to be large enough to ensure a good approximation to the truly infinite model.
Note that if we marginalize out both and , then we have
C.2 Gibbs Sampling
Let the latent counts and be defined as
Using the Poisson additive property, we have
(27)  
(28) 
Appendix D Gamma Process AGM
Closely related to the gamma process EPM, the hierarchical model for the (truncated) gamma process AGM can be expressed as