1 Introduction
Simple graphs are composed of a set of vertices with undirected connections between them. The graph may represent a set of friendship relationships between individuals, a physical infrastructure network, or a protein-protein interaction network. Defining flexible and realistic statistical graph models is of great importance for link prediction and for uncovering interpretable latent structure, and has been the subject of a large body of work in recent years; see e.g. (Newman, 2009; Kolaczyk, 2009; Goldenberg et al., 2010).
Our objective is to develop a class of models with interpretable parameters and realistic asymptotic properties. Of particular interest for this paper are the notions of sparsity and scale-freeness. A sequence of graphs is said to be sparse if the number of edges scales subquadratically with the number of nodes. The degree of a node is the number of connections of that node. The sequence of graphs is said to be scale-free if the proportion of nodes of degree $k$ is approximately proportional to $k^{-\gamma}$ when the number of nodes is large, where the exponent $\gamma$ is greater than 1. That is, for large $k$, the degree distribution behaves like a power-law. These notions of sparsity and scale-freeness have received a lot of attention in the network literature in the past years (Barabási and Albert, 1999; Newman, 2009; Orbanz and Roy, 2015; Barabási, 2016; Caron and Fox, 2017); some authors argued that they are desirable properties of random graph models, and that many networks exhibit this scale-free behavior, usually with an exponent . Other authors have recently challenged the scale-free assumption, showing that a power-law distribution with exponential cutoff provides a good fit to many real-world networks (Newman, 2009; Broido and Clauset, 2018); see the appendix for more discussion about testing for network scale-freeness. Besides these global asymptotic properties, we are also interested in capturing some latent structure in graphs. Individuals may belong to some latent communities, and their levels of affiliation to the communities define the probability that two nodes connect.
We propose a class of sparse graph models with overlapping community structure and well-specified asymptotic degree distributions. The graph can either be scale-free with exponent $\gamma > 2$, or non-scale-free, with asymptotic degree distribution being a power-law distribution with exponential cutoff. The construction builds on inhomogeneous random graphs, a class of models exhibiting degree heterogeneity. This class of models has been studied extensively in the applied probability literature (Aldous, 1997; Chung and Lu, 2002a; Bollobás et al., 2007; van der Hofstad, 2016), but has been left unexplored for the statistical analysis of real-world networks. In Section 2 we provide a formal description of sparsity and scale-freeness for sequences of graphs. In Section 3 we describe the rank-1 inhomogeneous random graphs, and present their sparsity property and asymptotic degree distribution. The model is then extended in Section 4 in order to accommodate a latent community structure. Posterior inference is discussed in Section 5. In Section 6 we discuss the relative merits and drawbacks of our approach compared to other random graph models. Section 7 provides an illustration of the approach on several real-world networks, showing that the model can provide a good fit to the empirical degree distribution and recover the latent community structure.
Notations. Throughout the article, denotes convergence in probability, and indicates .
2 Sparse and scale-free networks
We first provide a formal definition of sparsity and scale-freeness, as there is no general agreement on the definition of a scale-free network and these notions are core to the results of this paper.
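Let $(G_n)_{n \geq 1}$ be a sequence of simple random graphs of size $n$ where $G_n = (V_n, E_n)$, $V_n$ is the set of vertices and $E_n$ the set of edges. Denote by $|E_n|$ the number of edges. The graph is said to be sparse if $|E_n| = o(n^2)$. Let $N_k^{(n)}$ be the number of nodes of degree $k$ in $G_n$. We now formally give the definition of a scale-free network informally introduced in Section 1.
Definition 2.1.
A random graph sequence $(G_n)_{n \geq 1}$ is said to be scale-free with exponent $\gamma > 1$ iff there exists a slowly varying function $\ell$ and $(\pi_k)_{k \geq 1}$ such that, for each $k$,
(1) $\frac{N_k^{(n)}}{n} \overset{p}{\to} \pi_k$
as $n$ tends to infinity, where
(2) $\pi_k \sim \ell(k)\, k^{-\gamma}$ as $k \to \infty$.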
Background definitions and properties of slowly and regularly varying functions are given in the appendix. Intuitively, slowly varying functions are functions that vary more slowly than any power of . The term scale-free comes from the fact that the asymptotic degree distribution satisfies some (asymptotic) scale-invariance. For any integer ,
The most classical case is when is constant. In this case, the asymptotic degree distribution behaves as a pure power-law for large. More generally, the scale-invariance property defined above will be satisfied for any slowly varying function , which can be e.g. a logarithm or an iterated logarithm. Definition 2.1 is slightly more restrictive than the definition of a scale-free graph sequence in (van der Hofstad, 2016, Definition 1.4), which is implied by Definition 2.1 through properties of regularly varying functions (see appendix).
3 Rank-1 inhomogeneous random graphs
3.1 Definition
Let $(G_n)_{n \geq 1}$ be a sequence of simple random graphs of size $n$ defined as follows. The probability that two nodes $i$ and $j$ are connected in the graph $G_n$ is given by
(3) $p_{ij}^{(n)} = 1 - \exp\left(-\frac{w_i w_j}{s^{(n)}}\right)$
where $s^{(n)} = \sum_{i=1}^{n} w_i$ and the positive weights $(w_1, w_2, \ldots)$ are independently and identically distributed (iid) from some distribution $F$ with $\mathbb{E}[w_1] < \infty$. The model (3) is known as the Norros-Reittu (NR) inhomogeneous random graph model (Norros and Reittu, 2006). This model has been the subject of a lot of interest in the applied probability and graph theory literature (Bollobás et al., 2007; Bhamidi et al., 2012; van der Hofstad, 2013, 2016; Broutin et al., 2018). The parameter $w_i$ accounts for degree heterogeneity in the graph and can be interpreted as a sociability parameter of node $i$. The larger this parameter, the more likely node $i$ is to connect to other nodes.
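As an illustration of the construction above, the following minimal sketch samples a graph given a vector of sociability weights. It assumes the usual Norros-Reittu link probability, $1 - \exp(-w_i w_j / s)$ with $s$ the sum of the weights; the function name `sample_nr_graph`, the inverse gamma weights and their parameter values are illustrative assumptions, not taken from the paper's code. The double loop is quadratic in the number of nodes and is only meant for small examples.

```python
import math
import random


def sample_nr_graph(weights, rng):
    """Sample the edge list of a simple graph under the assumed NR link function.

    Assumption: p_ij = 1 - exp(-w_i * w_j / s), where s = sum(weights).
    """
    n = len(weights)
    s = sum(weights)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = 1.0 - math.exp(-weights[i] * weights[j] / s)
            if rng.random() < p:
                edges.append((i, j))
    return edges


rng = random.Random(0)
# Illustrative weights: inverse gamma draws, obtained as reciprocals of
# Gamma(shape, rate=scale_ig) draws (gammavariate takes a scale, hence 1/scale_ig).
shape, scale_ig = 2.5, 1.5
weights = [1.0 / rng.gammavariate(shape, 1.0 / scale_ig) for _ in range(500)]
edges = sample_nr_graph(weights, rng)

# Degree of each node, computed from the edge list.
degrees = [0] * len(weights)
for i, j in edges:
    degrees[i] += 1
    degrees[j] += 1
print(len(edges), max(degrees))
```

Nodes with larger weights tend to receive more edges, which is the degree heterogeneity that the sociability parameters are meant to capture.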
3.2 Sparsity and scale-free properties
The random graph sequence defined by Equation (3) satisfies a number of remarkable asymptotic properties. The first result, which follows from Bollobás et al. (2007) (see details in the appendix), shows that the resulting graphs are sparse.
Theorem 3.1 (Bollobás et al. (2007)).
Let denote the number of edges in the graph . Then
(4) 
The following result is a corollary of Theorem 3.13, Remark 2.4 and the discussion in Section 16.4 in (Bollobás et al., 2007). It states that the asymptotic degree distribution is a mixture of Poisson distributions, with mixing distribution $F$.
Theorem 3.2 (Bollobás et al. (2007)).
Let be the number of vertices of degree in the graph of size and link probability given by Equation (3). Then, for each , as tends to infinity, where
(5) 
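The mixed Poisson form of the limiting degree distribution can be explored numerically. The sketch below estimates the probabilities of a Poisson mixture by Monte Carlo, averaging the Poisson probability mass function over draws from the mixing distribution. The mixing distribution used here (an inverse gamma with arbitrary parameters) and the helper names are assumptions chosen for illustration only.

```python
import math
import random


def poisson_pmf(k, rate):
    """Poisson probability of k events, computed in log space for stability."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def mixed_poisson_pmf(k, weight_draws):
    """Monte Carlo estimate of E[Poisson(k; w)] over draws w of the mixing law."""
    return sum(poisson_pmf(k, w) for w in weight_draws) / len(weight_draws)


rng = random.Random(1)
# Illustrative mixing distribution: inverse gamma with shape 2.5 and scale 1.5.
draws = [1.0 / rng.gammavariate(2.5, 1.0 / 1.5) for _ in range(5000)]
pmf = [mixed_poisson_pmf(k, draws) for k in range(40)]
print(pmf[:5])
```

Plotting `pmf` on a log-log scale gives a quick visual check of the tail behaviour discussed in the remainder of this section.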
Our analysis on the asymptotic degree distribution is based on the following theorem for the asymptotic behavior of mixed Poisson distributions.
Theorem 3.3.
(Willmot, 1990) Suppose that
where is a locally bounded function on which varies slowly at infinity, , and (with when ). For , define the probabilities of the mixed Poisson distribution as
Then,
The following result is a corollary of Theorem 3.2 and Theorem 3.3. It states that if the random variables are regularly varying (see definition in the appendix), then the sequence of random graphs is scale-free.
Corollary 3.1.
Let be the number of vertices of degree in the graph of size and link probability given by Equation (3). Assume that the distribution F is absolutely continuous with pdf verifying as tends to infinity, for some locally bounded slowly varying function and . Then, for each , as tends to infinity, where
(6) 
3.3 Particular examples
We now consider two special cases. The first case yields scale-free graphs with asymptotic power-law degree distributions with exponent . The second yields non-scale-free graphs, where the asymptotic degree distribution is power-law with exponential cutoff.
3.3.1 Scale-free graph with power-law degree distribution
For , let where denotes the inverse gamma distribution with parameters and , whose probability density function (pdf) is given by
Here, the constraint is required for the condition . By Theorem 3.2, the asymptotic degree distribution is a mixed Poisson-inverse-gamma distribution with probability mass function
where is the modified Bessel function of the second kind. Using Corollary 3.1, we obtain
The resulting asymptotic degree distribution is a power-law and the graph is scale-free with arbitrary index . The two hyperparameters of the inverse gamma prior play an important role in determining the asymptotic properties of the graphs. The shape parameter tunes the index of the power-law, and is also related to the sparsity of the graphs. The scale parameter is also related to the sparsity of the graphs. Fig. 1 shows the empirical degree distributions and number of edges of graphs generated from the inverse gamma NR model.
3.3.2 Non-scale-free graph with power-law degree distribution with exponential cutoff
Now we consider another model with generalized inverse Gaussian (GIG) prior. Let where the density of the GIG distribution with parameter , and is given by
Note that by taking , one obtains the pdf of an inverse gamma distribution as a limiting case. By Theorem 3.2, the asymptotic degree distribution is
This distribution is sometimes called the Sichel distribution, after Herbert Sichel (Sichel, 1974). Note that as hence, by Theorem 3.3,
In this case, the asymptotic degree distribution is not of the form of Equation (2), and the graph sequence is therefore not scale-free. However, the asymptotic degree distribution has the form of a power-law distribution with exponential cutoff. This class of probability distributions has been shown to provide a good fit to the degree distributions of a wide range of real-world networks (Clauset et al., 2009). As for the inverse gamma NR model, the hyperparameters tune the asymptotic properties: determines the power-law index of the degree distribution, is related to the exponential cutoff and sparsity, and is related to the sparsity. Fig. 1 shows the empirical degree distributions and the number of edges of graphs generated from the GIG NR model.
4 Extension to Latent Overlapping Communities
4.1 Definition
The inhomogeneous random graphs considered so far only account for degree heterogeneity. However, the connections in real-world networks are often due to some latent interactions between the vertices. Recently, several models that combine a degree correction together with a latent structure to define edge probabilities were proposed (Zhou, 2015; Todeschini et al., 2016; Herlau et al., 2016; Lee et al., 2017). In this section, we propose an extension of the NR model that includes some latent overlapping structure, and study the sparsity, scale-freeness properties and asymptotic degree distribution of this model. Let the edge probability between the vertices and be given by
where are iid random variables with distribution with and are iid with for all and . We call this model with communities the rank model. As in the rank-1 model, the parameter can be interpreted as an overall sociability parameter of node , or degree-correction. The parameter can be interpreted as the level of affiliation of individual to community . Similar models, in a different asymptotic framework, have been used in (Yang and Leskovec, 2013; Zhou, 2015; Todeschini et al., 2016).
Theorem 4.1.
Let denote the number of edges in the graph defined with link probability (4.1). Then,
Recall that is the number of vertices of degree in the graph of size . Then, for each , as tends to infinity, where
(7) 
where is the distribution of the random variable . If additionally is absolutely continuous with pdf verifying as for some locally bounded slowly varying function and and for some , then
The proof of Theorem 4.1 is given in the appendix. In this paper, we consider in particular
where denotes the standard Dirichlet distribution with parameter , where for .
5 Posterior inference
5.1 Posterior inference for the rank-1 NR
Let be an (upper triangular part of) adjacency matrix of a graph and . The joint density is written as
Following Caron and Fox (2017) and Zhou (2015), we introduce a set of auxiliary truncated Poisson random variables for the pairs with .
The log joint density is then given as
Note that the terms for the pairs without edges are collapsed into a single summation, and hence the overall computations of the log joint density and its gradient take time. This is a major advantage of the link function of the NR model, whereas other link functions for rank-1 inhomogeneous random graphs (Britton et al., 2006; Chung and Lu, 2002a, b, 2003) suffer from computing times.
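The collapsing of the non-edge terms can be illustrated on the simpler marginal log-likelihood without auxiliary variables. The sketch below assumes the NR link probability $1 - \exp(-w_i w_j / s)$ with $s$ the sum of the weights, so that each non-adjacent pair contributes $-w_i w_j / s$. The sum of $w_i w_j$ over all pairs has the closed form $((\sum_i w_i)^2 - \sum_i w_i^2)/2$, so only the observed edges need to be visited. The function names are hypothetical, and this is not the auxiliary-variable density used in the paper.

```python
import math


def nr_loglik_collapsed(weights, edges):
    """Log-likelihood in time linear in the number of nodes plus edges."""
    s = sum(weights)
    sum_sq = sum(w * w for w in weights)
    # Sum over all pairs i < j of w_i * w_j / s, in closed form.
    all_pairs = (s * s - sum_sq) / (2.0 * s)
    loglik = -all_pairs
    for i, j in edges:
        r = weights[i] * weights[j] / s
        # Add r back (this pair is not a non-edge) and add log(1 - exp(-r)).
        loglik += r + math.log(-math.expm1(-r))
    return loglik


def nr_loglik_bruteforce(weights, edges):
    """Reference implementation looping over all pairs (quadratic time)."""
    s = sum(weights)
    edge_set = set(edges)
    loglik = 0.0
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            r = weights[i] * weights[j] / s
            if (i, j) in edge_set:
                loglik += math.log(1.0 - math.exp(-r))
            else:
                loglik -= r
    return loglik


example_weights = [0.5, 1.2, 2.0, 0.8]
example_edges = [(0, 1), (1, 2), (2, 3)]
print(nr_loglik_collapsed(example_weights, example_edges))
print(nr_loglik_bruteforce(example_weights, example_edges))
```

Both functions return the same value up to floating-point error; only the first avoids the loop over all pairs of nodes.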
For the posterior inference, we use a Markov chain Monte Carlo (MCMC) algorithm. At each step, given the gradient of the log joint density, we update via Hamiltonian Monte Carlo (HMC; Duane et al., 1987; Neal, 2011). Then we resample the auxiliary variables from a truncated Poisson distribution, and update the hyperparameters for using a Metropolis-Hastings step. Details can be found in the appendix.
5.2 Posterior inference for the rank NR
The posterior inference for the rank model is similar to that of the rank-1 model. Following Todeschini et al. (2016), for tractable inference, we introduce a set of multivariate truncated Poisson random variables ,
where and . The log joint density is
where is the density of the Dirichlet distribution with parameters . As for the rank-1 model, we can efficiently compute this log joint density and its gradient w.r.t. and with time. At each step of MCMC, we first sample and via HMC, resample from the multivariate truncated Poisson distribution, and update the hyperparameters via Metropolis-Hastings. The detailed procedure can be found in the appendix.
6 Discussion
The models described in this paper can capture sparsity, scale-freeness with exponent and latent community structure. One drawback of the construction is that the model lacks projectivity, due to normalisation by in the link probability (3). While this is an undesirable feature of the approach, we stress that there does not exist any projective class of random graphs that can capture all those properties, as we explain below. A popular class of models is the graphon-based or vertex-exchangeable graphs, which include as special cases stochastic blockmodels, latent factor models and their extensions; see (Orbanz and Roy, 2015) for a review. While these models have been successfully applied in a wide range of applications, they produce dense graphs with probability one, as stressed by Orbanz and Roy (2015). Alternative models have been proposed, either based on exchangeable point processes (Caron and Fox, 2017; Veitch and Roy, 2015; Borgs et al., 2016), or on the notion of edge-exchangeability (Crane and Dempsey, 2015, 2017; Cai et al., 2016). Caron and Rousseau (2017) showed that using exchangeable point processes, one can obtain scale-free graphs with exponent , but not above. While no results exist for the scale-freeness of edge-exchangeable random graphs in the sense of Definition 2.1 (see (Janson, 2017, Problem 9.8)), it is likely that a similar range is achieved for this class of models. Another family of models are non-exchangeable models based on preferential attachment (Barabási and Albert, 1999). The generated graphs are scale-free with exponent . However, the generative process makes it difficult to consider more general constructions that take into account community structure. Additionally, the non-exchangeability implies that the ordering of nodes must be known or inferred for inference, which limits its applicability.
By contrast, our model is finitely exchangeable for each , and so the ordering of the nodes need not be known in order to make inference. As a consequence, no other projective class of model can give scale-free networks with exponent , interpretable parameters capturing community structure, and scalable inference, as described in this paper. While the model has a number of attractive properties, it also has some limitations. The mean number of triangles in inhomogeneous random graphs converges to a constant as tends to infinity (van der Hofstad, 2018). Although the latent community structure introduced may mitigate this effect for reasonable , this property appears undesirable for real-world networks.
7 Experiments
7.1 Experiments with the rank1 models
In this section, we test our inverse-gamma NR model (IGNR) and generalized inverse Gaussian NR model (GIGNR) on synthetic and real-world graphs. For all experiments, we ran three MCMC chains for 10,000 iterations for our algorithms, and collected every 10th sample after 5,000 burn-in samples. The prior distributions for the hyperparameters of the different models are given in the appendix. The code for our experiments is available at https://github.com/OxCSMLBayesNP/BNRG.
Experiments with synthetic graphs.
We first fitted the basic models with inverse-gamma prior (IG) and generalized inverse Gaussian prior (GIG) on synthetic graphs generated from the IGNR model and the GIGNR model. For IG, we generated a graph with nodes with parameters and . For GIG, we generated a graph with 5,000 nodes with parameters . As summarized in Fig. 2, the posterior distribution recovers the hyperparameter values used to generate the graphs, and the posterior predictive distribution provides a good fit to the empirical degree distribution.
Experiments with real-world graphs.
Now we evaluate our models on three real-world networks:
condmat (https://toreopsahl.com/datasets/#newman2001): co-authorship network based on arXiv preprints for condensed matter, 16,264 nodes and 47,594 edges.
Enron (https://snap.stanford.edu/data/emailEnron.html): Enron collaboration email network, 36,692 nodes and 183,831 edges.
internet (https://www.cise.ufl.edu/research/sparse/matrices/Pajek/internet.html): network of internet routers, 124,651 nodes and 193,620 edges.
To evaluate the goodness-of-fit in terms of degree distributions, as suggested in Clauset et al. (2009), we sampled graphs from the posterior predictive distribution based on the posterior samples, and computed the reweighted Kolmogorov-Smirnov (KS) statistic:
where is the CDF of observed degrees, is the CDF of degrees of graphs sampled from the predictive distribution, and is the minimum value among the observed and predictive degrees. We compare our model to the random graph model with generalized gamma process prior (GGP, (Caron and Fox, 2017)), whose asymptotic degree distribution is a power-law with exponent in $(1, 2)$. We ran MCMC for the GGP model with 40,000 iterations and three chains. Posterior predictive degree distributions are reported in Fig. 3. Credible intervals of the hyperparameters and KS statistics for the different models are given in Table 1. Both IG and GIG provide a good fit to the degree distribution, with an exponent greater than 2, while the GGP model fails to capture the shape of the degree distribution.
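A reweighted KS statistic of the kind described above can be sketched as follows. The code assumes the form given in Clauset et al. (2009), namely the maximum over degrees of the absolute difference between the two CDFs divided by $\sqrt{S(x)(1 - S(x))}$, with $S$ the predictive CDF; the exact expression used in the experiments is not reproduced here, so this form, the function names and the choice to skip points where the denominator vanishes are all assumptions.

```python
import math


def empirical_cdf(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def reweighted_ks(observed, predicted):
    """Reweighted KS distance between two samples of integer degrees.

    Assumed form: max over x >= x_min of |F(x) - S(x)| / sqrt(S(x) (1 - S(x))),
    where F is the observed CDF and S the predictive CDF.
    """
    x_min = min(min(observed), min(predicted))
    x_max = max(max(observed), max(predicted))
    best = 0.0
    for x in range(x_min, x_max + 1):
        f = empirical_cdf(observed, x)
        s = empirical_cdf(predicted, x)
        denom = math.sqrt(s * (1.0 - s))
        if denom > 0.0:  # skip points where the weight is undefined
            best = max(best, abs(f - s) / denom)
    return best


print(reweighted_ks([1, 1, 2, 3], [1, 2, 2, 4]))
```

The reweighting gives more importance to discrepancies in the tails of the distribution than the standard KS statistic does, which matters when comparing heavy-tailed degree distributions.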
Table 1: Average reweighted KS statistic of predictive degree distributions and 95% credible intervals of estimated hyperparameters for IG, GIG and GGP models.
Model  condmat KS  Enron KS  internet KS
IG  0.07±0.01  0.13±0.05  0.19±0.00
GIG  0.07±0.01  0.12±0.01  0.19±0.00
GGP  0.15±0.06  0.18±0.02  0.40±0.10
7.2 Experiments with latent overlapping communities
Finally, we tested our models with latent overlapping communities on two real-world graphs with ground-truth communities.
polblogs (http://www.cise.ufl.edu/research/sparse/matrices/Newman/polblogs): the network of American political blogs, with 1,224 nodes, 16,715 edges and two true communities (left or right).
DBLP (https://snap.stanford.edu/data/comDBLP.html): co-authorship network of the DBLP computer science bibliography. The original network has 317,080 nodes. Based on the ground-truth communities extracted in Yang and Leskovec (2012), we took the three largest communities and subsampled 1,990 nodes among them. The subsampled graph contains 4,413 edges.
We compared our two models, IGNR and GIGNR, to the random graph model based on the compound generalized gamma process (CGGP, (Todeschini et al., 2016)) and to the mixed membership stochastic blockmodel (MMSB, Airoldi et al. (2009)). CGGP can capture the latent overlapping communities and has an asymptotic power-law degree distribution with exponent in . MMSB can capture the latent communities, but does not include a degree correction term. For all models, we set the number of communities to two for polblogs and three for DBLP. The CGGP was run for 200,000 iterations after 10,000 initial iterations where was initialized by running the model without communities (GGP). Each iteration of the sampler for MMSB scales quadratically with the number of nodes, and the sampler was therefore run for a smaller number of iterations (5,000) for fair comparison. We found that running it for longer did not improve performance. All methods were run with three MCMC chains. For the CGGP and MMSB methods, point estimates of the parameters measuring the level of affiliation of each individual were obtained using the Bayesian estimator described in Todeschini et al. (2016). For IGNR and GIGNR, we simply took the maximum a posteriori estimate of . To compare to the ground-truth communities, nodes are then assigned to the community where they have the strongest affiliation. The learned communities are shown in the appendix. Posterior predictive degree distributions for the different models are given in Fig. 4, and the KS statistic in Table 2. Both GIGNR and CGGP exhibit a good fit to the polblogs dataset, where there does not seem to be evidence for a power-law exponent greater than 2. For DBLP, both IGNR and GIGNR provide a good fit, while CGGP fails to adequately capture the degree distribution. The classification accuracy is also reported in Table 2. The classification accuracy is similar for IGNR, GIGNR and CGGP on polblogs. IGNR and GIGNR outperform the other methods on the DBLP network. MMSB failed to capture both degree distributions and community structures, due to the large degree heterogeneity, a limitation already reported in previous articles (Karrer and Newman, 2011; Gopalan et al., 2013).
Table 2: KS statistic and classification accuracy (Acc, %) for the different models on polblogs and DBLP.
Model  polblogs KS  polblogs Acc (%)  DBLP KS  DBLP Acc (%)
IG  0.71±0.50  94.28  0.08±0.03  72.46
GIG  0.14±0.03  93.79  0.09±0.03  76.58
CGGP  0.12±0.03  94.12  0.33±0.02  57.49
MMSB  3.74±1.18  52.12  0.37±0.07  39.94
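The assignment of nodes to their strongest community and the resulting classification accuracy can be sketched as follows. The affiliation values and ground-truth labels are made up for illustration, and matching learned communities to true labels through the best permutation is an assumption of this sketch; the text above does not specify how labels were matched.

```python
from itertools import permutations


def assign_communities(affiliations):
    """Assign each node to the community with its largest affiliation."""
    return [max(range(len(row)), key=row.__getitem__) for row in affiliations]


def best_permutation_accuracy(predicted, truth, n_communities):
    """Accuracy under the relabelling of communities that matches truth best."""
    best = 0.0
    for perm in permutations(range(n_communities)):
        hits = sum(1 for p, t in zip(predicted, truth) if perm[p] == t)
        best = max(best, hits / len(truth))
    return best


# Hypothetical point estimates of affiliations for four nodes, two communities.
affiliations = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
truth = [1, 0, 1, 1]
predicted = assign_communities(affiliations)
accuracy = best_permutation_accuracy(predicted, truth, 2)
print(predicted, accuracy)
```

Enumerating permutations is only practical for a handful of communities, which is the setting of the two datasets considered here.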
References
 Airoldi et al. [2009] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems 22, 2009.
 Aldous [1997] D. Aldous. Brownian excursions, critical random graphs and the multiplicative coalescent. The Annals of Probability, pages 812–854, 1997.
 Barabási [2016] A.-L. Barabási. Network science, chapter 4. Cambridge University Press, 2016.
 Barabási and Albert [1999] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
 Barabasi [2018] L. Barabasi. All you need is love, 2018. URL https://www.barabasilab.com/post/loveisallyouneed. Blog post.
 Betancourt [2010] Michael Betancourt. Cruising the simplex: Hamiltonian Monte Carlo and the Dirichlet distribution. arXiv:1010.3436, 2010.
 Bhamidi et al. [2012] S. Bhamidi, R. van der Hofstad, and J. van Leeuwaarden. Novel scaling limits for critical inhomogeneous random graphs. The Annals of Probability, 40(6):2299–2361, 2012.
 Bingham et al. [1989] N. H. Bingham, C. M. Goldie, and J. L. Teugels. Regular variation, volume 27. Cambridge University Press, 1989.
 Bollobás et al. [2007] B. Bollobás, S. Janson, and O. Riordan. The phase transition in inhomogeneous random graphs. Random Structures & Algorithms, 31:3–122, 2007.
 Borgs et al. [2016] C. Borgs, J. T. Chayes, H. Cohn, and N. Holden. Sparse exchangeable graphs and their limits via graphon processes. arXiv preprint arXiv:1601.07134, 2016.
 Britton et al. [2006] T. Britton, M. Deijfen, and A. Martin-Löf. Generating simple random graphs with prescribed degree distribution. Journal of Statistical Physics, 124(6):1377–1397, 2006.
 Broido and Clauset [2018] A. D. Broido and A. Clauset. Scale-free networks are rare. arXiv preprint arXiv:1801.03400, 2018.
 Broutin et al. [2018] N. Broutin, T. Duquesne, and M. Wang. Limits of multiplicative inhomogeneous random graphs and Lévy trees. arXiv preprint arXiv:1804.05871, 2018.
 Cai et al. [2016] D. Cai, T. Campbell, and T. Broderick. Edge-exchangeable graphs and sparsity. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 4249–4257. Curran Associates, Inc., 2016.
 Caron and Fox [2017] F. Caron and E. B. Fox. Sparse graphs using exchangeable random measures. Journal of the Royal Statistical Society B (discussion paper), 79:1295–1366, 2017.
 Caron and Rousseau [2017] F. Caron and J. Rousseau. On sparsity and power-law properties of graphs based on exchangeable point processes. arXiv preprint arXiv:1708.03120, 2017.
 Chung and Lu [2002a] F. Chung and L. Lu. Connected components in random graphs with given expected degree sequences. Annals of Combinatorics, 6:125–145, 2002a.
 Chung and Lu [2002b] F. Chung and L. Lu. The average distances in random graphs with given expected degrees. Proceedings of the National Academy of Sciences of the United States of America, 99(25):15879–15882, 2002b.
 Chung and Lu [2003] F. Chung and L. Lu. The average distance in a random graph with given expected degrees. Internet Mathematics, 1(1):91–113, 2003.
 Clauset et al. [2009] A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51:661–703, 2009.
 Crane and Dempsey [2015] H. Crane and W. Dempsey. A framework for statistical network modeling. arXiv preprint arXiv:1509.08185, 2015.
 Crane and Dempsey [2017] H. Crane and W. Dempsey. Edge exchangeable models for interaction networks. Journal of the American Statistical Association, 2017.
 Duane et al. [1987] S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195(2):216–222, 1987.

 Goldenberg et al. [2010] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi. A survey of statistical network models. Foundations and Trends in Machine Learning, 2(2):129–233, 2010.
 Gopalan et al. [2013] Prem Gopalan, Chong Wang, and David M. Blei. Modeling overlapping communities with node popularities. In Advances in Neural Information Processing Systems 26, 2013.
 Herlau et al. [2016] T. Herlau, M. N. Schmidt, and M. Mørup. Completely random measures for modelling block-structured sparse networks. In Advances in Neural Information Processing Systems 29, 2016.
 Janson [2017] S. Janson. On edge exchangeable random graphs. Journal of Statistical Physics, pages 1–37, 2017.
 Karrer and Newman [2011] Brian Karrer and Mark E. J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1), 2011.
 Kolaczyk [2009] E. D. Kolaczyk. Statistical analysis of network data: methods and models. Springer Science & Business Media, 2009.
 Lee et al. [2017] J. Lee, C. Heaukulani, Z. Ghahramani, L. F. James, and S. Choi. Bayesian inference on random simple graphs with power-law degree distributions. In Proceedings of the 34th International Conference on Machine Learning, 2017.
 Mikosch [1999] Thomas Mikosch. Regular variation, subexponentiality and their applications in probability theory. Eindhoven University of Technology, 1999.
 Neal [2011] R. M. Neal. MCMC using Hamiltonian Monte Carlo, volume 2. Chapman & Hall / CRC Press, 2011.
 Newman [2009] M. Newman. Networks: an introduction. OUP Oxford, 2009.
 Norros and Reittu [2006] I. Norros and H. Reittu. On a conditionally Poissonian graph process. Advances in Applied Probability, 38(1):59–75, 2006.
 Orbanz and Roy [2015] P. Orbanz and D. M. Roy. Bayesian models of graphs, arrays and other exchangeable random structures. IEEE transactions on pattern analysis and machine intelligence, 37(2):437–461, 2015.
 Resnick [2007] S. I. Resnick. Heavy-tail phenomena: probabilistic and statistical modeling. Springer Science & Business Media, 2007.
 Sichel [1974] H. S. Sichel. On a distribution representing sentence-length in written prose. Journal of the Royal Statistical Society. Series A (General), pages 25–34, 1974.
 Todeschini et al. [2016] A. Todeschini, X. Miscouridou, and F. Caron. Exchangeable random measures for sparse and modular graphs with overlapping communities. arXiv:1602.0211, 2016.
 van der Hofstad [2013] R. van der Hofstad. Critical behavior in inhomogeneous random graphs. Random Structures & Algorithms, 42(4):480–508, 2013.
 van der Hofstad [2016] R. van der Hofstad. Random graphs and complex networks: volume 1. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2016.
 van der Hofstad [2018] R. van der Hofstad. Random graphs and complex networks: volume 2. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.
 Veitch and Roy [2015] V. Veitch and D. M. Roy. The class of random graphs arising from exchangeable random measures. arXiv preprint arXiv:1512.03099, 2015.
 Willmot [1990] G. E. Willmot. Asymptotic tail behaviour of Poisson mixtures with applications. Advances in Applied Probability, 22(1):147–159, 1990.
 Yang and Leskovec [2012] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In IEEE 12th International Conference on Data Mining, 2012.
 Yang and Leskovec [2013] J. Yang and J. Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 587–596. ACM, 2013.

 Zhou [2015] M. Zhou. Infinite edge partition models for overlapping community detection and link prediction. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, pages 1135–1143, 2015.
Appendix A On scale-free networks
A recent article by Broido and Clauset [2018] (BC) raised concerns about the claim that many real-world networks are scale-free. BC performed a statistical analysis on a large number of networks to test whether the degree distribution follows a power-law distribution or some alternative distribution. One of the alternative degree distributions considered is a power-law distribution with an exponential cutoff, which was shown to provide a better fit for a majority of the datasets considered. BC conclude in their article that scale-free networks are rare.
While we agree with the authors that there is a need for rigorous statistical testing of the scale-free hypothesis, and that scale-free networks may indeed be rarer than originally thought, we do not think that the conclusion of the authors is supported by their experiments, except if one considers a very narrow definition of a scale-free network. As pointed out by Barabasi [2018] in a blog post discussing their article, scale-freeness is an asymptotic property: as the sample size goes to infinity, the degree distribution converges to a power-law (up to a slowly varying function, see Definition 2.1). Degree distributions of finite-size graphs may still depart significantly from a pure power-law distribution.
A salient example is given by the class of networks introduced by Caron and Fox [2017], which are known to be scale-free with exponent between 1 and 2 for some values of the parameters. As shown in Figure 5, while the degree distribution is asymptotically power-law, any finite-size graph exhibits an exponential cutoff, which shifts to the right as the sample size increases. Therefore, any statistical test on a fixed-$n$ graph is likely to reject the pure power-law hypothesis although the network model is indeed scale-free.
For reference, empirical degree distributions for the IGNR and GIGNR are also plotted in Figure 5. The IGNR is scale-free, and each finite distribution is additionally close to a pure power-law distribution. The GIGNR is not scale-free, and the asymptotic degree distribution is a power-law distribution with exponential cutoff.
Appendix B Background material on regular variation
In this section we give some definitions and properties of slowly and regularly varying functions, see the books of Bingham et al. [1989], Mikosch [1999] and Resnick [2007] for reference.
Definition B.1.
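A positive function $f$ is regularly varying at infinity with index $\rho \in \mathbb{R}$ if
$$\lim_{x \to \infty} \frac{f(cx)}{f(x)} = c^{\rho}$$
for all $c > 0$.
If $\rho = 0$, the function is said to be slowly varying. Examples of slowly varying functions include constant functions, functions converging to a positive constant, logarithms, etc. If a function $f$ is regularly varying with index $\rho$, then there exists a slowly varying function $\ell$ such that
$$f(x) \sim x^{\rho}\, \ell(x)$$
as $x \to \infty$.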
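As a quick numerical illustration of this definition, the sketch below checks that the ratio $f(cx)/f(x)$ approaches $c^{\rho}$ for a power function multiplied by a logarithm. The index `rho = -1.5`, the scaling `c = 3` and the test function are illustrative choices, not quantities taken from the paper.

```python
import math

def ratio(f, c, x):
    """Return f(c*x) / f(x), the quantity whose limit defines regular variation."""
    return f(c * x) / f(x)

rho = -1.5  # illustrative index (assumption, not from the paper)

def f(x):
    # x**rho times the slowly varying factor log(x)
    return x ** rho * math.log(x)

c = 3.0
target = c ** rho
# Evaluate the ratio at increasingly large x; it should drift towards c**rho.
ratios = [ratio(f, c, 10.0 ** p) for p in (2, 4, 8, 16)]
for p, r in zip((2, 4, 8, 16), ratios):
    print(f"x = 1e{p}: ratio / c**rho = {r / target:.4f}")
```

The convergence is slow because the logarithmic factor contributes a relative error of order $\log c / \log x$, which is typical of slowly varying corrections.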
Definition B.2.
A nonnegative random variable with cdf is said to be regularly varying with exponent if
as tends to infinity, where is a slowly varying function at infinity. If and is absolutely continuous with density , where is ultimately monotone, then
as .
Proposition B.1.
Let be a regularly varying random variable with exponent , and be a positive random variable, independent of , with for some . Then is regularly varying with exponent and
as tends to infinity. If is absolutely continuous with ultimately monotone density , this implies
as tends to infinity.
Proposition B.2.
If is a regularly varying function with index , then
Appendix C Background on inhomogeneous random graph models
In this section, we review the general framework of inhomogeneous random graphs (IRG) presented in Bollobás et al. [2007]. We start by introducing a vertex space and a kernel to define IRGs.
Definition C.1.
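A vertex space $\mathcal{V}$ is a triplet $(\mathcal{S}, \mu, (x_1, \dots, x_n))$, where $\mathcal{S}$ is a separable metric space, $\mu$ is a Borel probability measure on $\mathcal{S}$, and $(x_1, \dots, x_n)$ is a random sequence of points in $\mathcal{S}$ such that for each $\mu$-continuity set $A \subseteq \mathcal{S}$,
$$\frac{1}{n}\, \#\{i : x_i \in A\} \overset{p}{\to} \mu(A),$$
where $\overset{p}{\to}$ denotes the convergence in probability. The pair $(\mathcal{S}, \mu)$ is called a ground space. (Footnote 6: To be precise, we should write $x_i^{(n)}$, but we omit the superscript for simplicity.)
Definition C.2.
A kernel $\kappa$ on a ground space $(\mathcal{S}, \mu)$ is a symmetric nonnegative Borel measurable function on $\mathcal{S} \times \mathcal{S}$.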
Roughly speaking, a vertex space is a space of values assigned to vertices in a graph, such as vertex weights or popularities. Each vertex $i$ is associated with a point $x_i$ in $\mathcal{S}$, and these points are used to construct edge probabilities between vertices through the kernel $\kappa$. Kernels must be further restricted to a class of functions satisfying some conditions, which we explain shortly. Given a vertex space and a kernel, an IRG is defined with link function (edge probabilities)
$$p_{ij} = \min\{\kappa(x_i, x_j)/n,\ 1\}.$$
All the following arguments are stated with this choice of link function, but everything still holds with the following alternative choices of link functions [Bollobás et al., 2007, Remark 2.4]:
$$p_{ij} = 1 - \exp\big(-\kappa(x_i, x_j)/n\big), \qquad p_{ij} = \frac{\kappa(x_i, x_j)}{n + \kappa(x_i, x_j)}.$$
All three link functions are related to existing work on IRGs. The first is a generic version of Chung and Lu [2002a], the second corresponds to Norros and Reittu [2006], and the third to Britton et al. [2006]. We chose the second for our model because of its computational efficiency in posterior inference.
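To see why the choice of link function does not affect the asymptotics, it can help to compare the three candidates numerically. The sketch below assumes the forms usually associated with Bollobás et al. [2007, Remark 2.4], namely $\min\{\kappa/n, 1\}$, $1 - e^{-\kappa/n}$ and $\kappa/(n + \kappa)$; the function names and the kernel value `kappa = 2.5` are hypothetical.

```python
import math

def link_min(kappa, n):
    """Chung-Lu style link: min(kappa / n, 1)."""
    return min(kappa / n, 1.0)

def link_norros_reittu(kappa, n):
    """Norros-Reittu style link: 1 - exp(-kappa / n), computed stably."""
    return -math.expm1(-kappa / n)

def link_britton(kappa, n):
    """Britton et al. style link: kappa / (n + kappa)."""
    return kappa / (n + kappa)

kappa = 2.5  # illustrative kernel value (assumption)
for n in (10, 1_000, 1_000_000):
    probs = (link_min(kappa, n), link_norros_reittu(kappa, n), link_britton(kappa, n))
    # n * p_ij should approach kappa for all three links as n grows
    print(n, [round(n * p, 6) for p in probs])
```

For any fixed kernel value the three probabilities differ only at order $(\kappa/n)^2$, which is why sparse-graph limits such as the degree distribution coincide.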
Let $G_n$ be a graph generated from the IRG described above with a vertex space $\mathcal{V}$ and a kernel $\kappa$. The kernel is assumed to be graphical, which is defined as follows.
Definition C.3.
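A kernel $\kappa$ on a vertex space $\mathcal{V}$ is graphical if the following hold:
1. $\kappa$ is continuous almost everywhere on $\mathcal{S} \times \mathcal{S}$.
2. $\kappa \in L^1(\mathcal{S} \times \mathcal{S}, \mu \times \mu)$.
3. Let $E(G_n)$ be the set of edges in $G_n$. Then, as $n \to \infty$,
$$\frac{1}{n}\, \mathbb{E}\big[|E(G_n)|\big] \to \frac{1}{2} \iint_{\mathcal{S}^2} \kappa(x, y)\, d\mu(x)\, d\mu(y).$$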
The first and second conditions are natural technical requirements. The third condition is related to the density of graphs: it requires the kernel $\kappa$ to measure the density of the edges [Bollobás et al., 2007].
The following theorem characterizes the asymptotic degree distribution of IRGs.
Theorem C.1.
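([Bollobás et al., 2007, Theorem 3.13]) Let $\kappa$ be a graphical kernel on a vertex space $\mathcal{V}$. For any fixed $k \geq 0$,
$$\frac{N_k}{n} \overset{p}{\to} \int_{\mathcal{S}} \frac{\lambda(x)^k}{k!}\, e^{-\lambda(x)}\, d\mu(x),$$
where $N_k$ is the number of vertices in $G_n$ with degree $k$, and
$$\lambda(x) = \int_{\mathcal{S}} \kappa(x, y)\, d\mu(y).$$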
Hence, one can easily compute the asymptotic degree distribution of any IRG that fits into the framework once the corresponding vertex space and kernel are specified. This is what we do in the next two sections.
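The mixed Poisson limit of Bollobás et al. [2007, Theorem 3.13] can be checked on a toy example. The sketch below is an illustration under stated assumptions, not the paper's experiment: it uses a rank-1 kernel $\kappa(x, y) = xy / \mathbb{E}[W]$, for which $\lambda(x) = \int \kappa(x, y)\, d\mu(y) = x$, a two-point weight distribution, the Norros-Reittu style link $1 - e^{-\kappa/n}$, and hypothetical helper names.

```python
import math
import random

def mixed_poisson_pmf(k, atoms):
    """P(D = k) when D | lam ~ Poisson(lam) and lam takes the values in `atoms`,
    a list of (lam, probability) pairs."""
    return sum(p * math.exp(-lam) * lam ** k / math.factorial(k) for lam, p in atoms)

def simulate_degrees(weights, rng):
    """Sample one graph with edge probability 1 - exp(-w_i * w_j / (n * mean_w))
    and return the list of vertex degrees."""
    n = len(weights)
    mean_w = sum(weights) / n
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            kappa = weights[i] * weights[j] / mean_w
            if rng.random() < -math.expm1(-kappa / n):
                deg[i] += 1
                deg[j] += 1
    return deg

n = 400
weights = [1.0] * (n // 2) + [4.0] * (n // 2)  # toy weights (assumption)
atoms = [(1.0, 0.5), (4.0, 0.5)]               # law of lambda(x) = x under mu

degrees = simulate_degrees(weights, random.Random(0))
empirical = [degrees.count(k) / n for k in range(10)]
predicted = [mixed_poisson_pmf(k, atoms) for k in range(10)]
for k in range(10):
    print(k, round(empirical[k], 3), round(predicted[k], 3))
```

With only 400 vertices the empirical proportions fluctuate around the predicted ones; the agreement tightens as $n$ grows.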
Appendix D Proof of creftype 3.1 and creftype 3.2
creftype 3.1 and creftype 3.2 in the main paper are directly obtained by showing that the Norros–Reittu IRG (NRIRG) fits into the general framework discussed in Appendix C of this supplementary material. Actually, the NRIRG has been discussed as an example of rank-1 IRGs, see [Bollobás et al., 2007, Section 16.4]. More precisely, define a vertex space with
where denotes the law of , and define a kernel
To see if this kernel is graphical, note that
Hence, combined with Bollobás et al. [2007, Lemma 8.1], we get
and the kernel is therefore graphical. The second part of creftype 3.1 then follows from Bollobás et al. [2007, Proposition 8.9], and creftype 3.2 follows from Theorem C.1 with
Appendix E Proof of creftype 4.1
creftype 4.1 also follows by showing that the rank model fits into the general framework discussed in Appendix C. Define a vertex space with
where and denote the laws of and , and
Define a kernel on this space
where denotes the th component of . To see if this kernel is graphical, note that
Now note that
Plugging this into the above equation yields
Hence, by Bollobás et al. [2007, Lemma 8.1], we get
The second part of creftype 3.1 then follows from Bollobás et al. [2007, Proposition 8.9], and creftype 3.2 follows from Theorem C.1 with
Appendix F Details on posterior inferences
F.1 Posterior inference for the rank-1 model
The posterior inference for the rank-1 model is summarized in three steps.

Sample via HMC (we use the transformation and update ).

Sample from truncated Poisson distribution,

Sample hyperparameters for via MetropolisHastings.
We used the step size and the number of leapfrog steps for all experiments.
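The truncated Poisson step above can be sketched as follows, assuming that the truncation is at zero, i.e. that a latent count attached to an observed edge is a Poisson variable conditioned to be at least 1. The source does not spell out the rate, so `rate` and the function name below are placeholders.

```python
import math
import random

def sample_zero_truncated_poisson(rate, rng):
    """Draw K ~ Poisson(rate) conditioned on K >= 1 by CDF inversion.
    Assumes 0 < rate < ~700 so that expm1(rate) does not overflow."""
    u = rng.random()
    k = 1
    p = rate / math.expm1(rate)  # P(K = 1) = rate / (e^rate - 1)
    cdf = p
    # Walk up the support; the p > 0 guard stops the loop if rounding
    # keeps the accumulated CDF marginally below u.
    while u > cdf and p > 0.0:
        k += 1
        p *= rate / k            # P(K = k) = P(K = k - 1) * rate / k
        cdf += p
    return k

rng = random.Random(42)
rate = 0.5  # illustrative rate (assumption)
draws = [sample_zero_truncated_poisson(rate, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
exact_mean = rate / -math.expm1(-rate)  # rate / (1 - e^{-rate})
print(round(sample_mean, 3), round(exact_mean, 3))
```

Inversion is convenient here because rates on individual edges are typically small, so the loop usually terminates after one or two iterations.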
Sampling hyperparameters for IG.
In IG we have two hyperparameters and . We place a lognormal prior on and .
Then we updated and via MetropolisHastings with proposal distribution and . We found that the initialization of and was important to capture degree distributions. We initialized and using the asymptotic relation
set
Sampling hyperparameters for GIG.
We have three parameters , , and (we restricted to get the positive powerlaw exponent). We placed lognormal priors on and .
We updated and via MetropolisHastings with proposal distributions and . We initialized and . was initialized by solving
using a numerical root-finding algorithm.
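The Metropolis-Hastings updates for positive hyperparameters with log-normal priors can be sketched as below. The proposal used in the paper is not recoverable from this text, so the sketch assumes a log-normal random walk (a Gaussian step on the log scale); the toy likelihood, the step size and all names are hypothetical.

```python
import math
import random

def log_lognormal_pdf(x, mu=0.0, sigma=1.0):
    """Log density of a LogNormal(mu, sigma^2) distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x) - math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

def mh_update(theta, log_lik, rng, step=0.1):
    """One MH step for a positive parameter with a LogNormal(0, 1) prior.
    Proposal (assumption): log(theta') = log(theta) + step * N(0, 1)."""
    proposal = theta * math.exp(step * rng.gauss(0.0, 1.0))
    log_ratio = (log_lik(proposal) + log_lognormal_pdf(proposal)
                 - log_lik(theta) - log_lognormal_pdf(theta)
                 # Hastings correction for the multiplicative proposal
                 + math.log(proposal) - math.log(theta))
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return theta

# Toy target (assumption): 200 exponential observations summing to 100,
# so the likelihood is theta^200 * exp(-100 * theta), peaked near theta = 2.
n_obs, sum_obs = 200, 100.0

def toy_log_lik(theta):
    return n_obs * math.log(theta) - theta * sum_obs

rng = random.Random(1)
theta, chain = 1.0, []
for it in range(6000):
    theta = mh_update(theta, toy_log_lik, rng)
    if it >= 1000:  # discard burn-in
        chain.append(theta)
posterior_mean = sum(chain) / len(chain)
print(round(posterior_mean, 3))
```

Proposing on the log scale keeps the parameter positive without rejection at the boundary, at the cost of the extra `log(proposal) - log(theta)` term in the acceptance ratio.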
F.2 Posterior inference for the rank model
The posterior inference for the rank model is quite similar to that of the rank1 model.

Sample via HMC (we use the transformation and update ).

Sample via HMC (see below).

Sample from multivariate truncated Poisson distribution,
where .

Sample hyperparameters of and .
Details on HMC for .
Each vector
is a Dirichlet random variable such that , so transforming it to an unconstrained vector is quite tricky. We adapt the trick presented in Betancourt [2010]. Let . Define i.i.d. beta random variables, where
Then, if we take a transform
we have
. The advantage of this transformation is that the Jacobian can be computed efficiently. By the chain rule, we have
We take another logistic transform on to make it completely unconstrained.
Hence, the gradient for the unconstrained variable is computed as
In our algorithm, HMC for is done on the unconstrained variables .
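The general idea of mapping an unconstrained vector to the simplex with a cheaply computable Jacobian can be illustrated with a generic stick-breaking construction followed by a logistic transform. This is a minimal sketch of that idea and not necessarily the exact parameterization of Betancourt [2010] used in the paper; the function names are hypothetical.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def to_simplex(y):
    """Map unconstrained y in R^(K-1) to a point x on the simplex in R^K.
    Each sigmoid(y_k) is the fraction of the remaining stick given to x_k."""
    remaining = 1.0
    x = []
    for yk in y:
        z = sigmoid(yk)
        x.append(remaining * z)
        remaining *= 1.0 - z
    x.append(remaining)  # the last coordinate takes what is left
    return x

def log_abs_det_jacobian(y):
    """log |det d(x_1..x_{K-1}) / d(y_1..y_{K-1})|.
    The Jacobian is lower triangular, so only its diagonal is needed:
    d x_k / d y_k = remaining_k * z_k * (1 - z_k)."""
    total = 0.0
    log_remaining = 0.0
    for yk in y:
        z = sigmoid(yk)
        total += log_remaining + math.log(z) + math.log(1.0 - z)
        log_remaining += math.log(1.0 - z)
    return total

y = [0.3, -1.2]
x = to_simplex(y)
print([round(v, 4) for v in x], round(log_abs_det_jacobian(y), 4))
```

Because the Jacobian is triangular, its log-determinant costs $O(K)$ rather than $O(K^3)$, which is what makes gradient-based samplers such as HMC practical on the unconstrained variables.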
Initialization and step sizes.
Unlike Todeschini et al. [2016] where the model is initialized by running MCMC for the simplified model without communities to initialize , we initialize the chain by running MCMC only for while holding fixed as . We found this helpful for the algorithm to discover better community structures. For this initialization, we ran HMC for with and . After initialization, we ran HMC for with and , and ran HMC for with and . We decayed for to after burn-in.
Sampling hyperparameters for .
We assume lognormal prior distributions on the hyperparameters .
Then we updated via MetropolisHastings with proposal distribution . We initialized .
Appendix G Additional Figures
G.1 Empirical degree distributions and number of edges for rank model
We first demonstrate the empirical degree distributions and number of edges of graphs generated from rank model with . The results are presented in Fig. 6. As predicted from creftype 4.1, the degree distribution and sparsity are not affected by the introduction of the community affiliation factors .
G.2 Discovered community structures
We present the community structures discovered by IG, GIG, CGGP and MMSB in Fig. 7. IG, GIG and CGGP discovered reasonable communities where the edge densities within communities are much higher than the edge densities between communities. However, MMSB completely failed to discover the communities. The fact that models without degree heterogeneity fail to capture community structures has been reported in various works [Karrer and Newman, 2011, Gopalan et al., 2013, Todeschini et al., 2016], and our results confirm it.