1 Introduction
Random graph models provide statistical tools for network analysis and can be used as prior distributions in a Bayesian framework. Such models aim to capture various properties of real-world networks, such as power-law degree distributions (Albert and Barabási, 2002; Dangalchev, 2004; Bloem-Reddy and Orbanz, 2016), small-world properties (Watts and Strogatz, 1998), or latent community structure (Holland et al., 1983; Airoldi et al., 2008; Karrer and Newman, 2011).
One statistic of interest is the density of a graph, defined as the number of edges over the number of possible edges for a binary, undirected graph. This can be extended to distributions over graphs: we think of a random graph as being dense if the number of edges grows quadratically with the number of vertices, and sparse if it grows subquadratically with the number of vertices (Nešetřil and Ossona de Mendez, 2012). Many commonly-used random graphs, such as Erdös-Rényi graphs (Gilbert, 1959; Erdös and Rényi, 1960) or stochastic blockmodels and their variants (Holland et al., 1983; Airoldi et al., 2008; Karrer and Newman, 2011), concentrate on dense graphs (see Lloyd et al., 2012, for a discussion). This behavior is commonly seen in small, closed communities, where every vertex has the option to interact with all other vertices and a fully connected graph is at least conceivable. However, it is not generally seen in larger networks, such as international online social networks, where a given vertex will not have the opportunity to interact with more than a small subset of the other vertices.
Recently, a number of models have been proposed that can generate sparse graphs. These models share certain similarities in construction: a measure is placed on the space of potential edges, and the observed edges are iid given this measure. For example, Caron and Fox (2017) construct networks as a Poisson process using a discrete, countably infinite base measure on the space of pairs of potential vertices. Cai et al. (2016), Williamson (2016) and Crane and Dempsey (2017) use discrete, countably infinite probability measures on either the space of potential vertices or the space of vertex pairs, and sample sequences of edges from this distribution. Appropriate random measure choices can yield sparsity and power-law degree distributions.
While models such as these offer the ability to model sparse graphs, they do not capture certain important network behaviors. Although large-scale graphs are usually sparse, locally they are often dense: if two vertices $i$ and $j$ are both connected to vertex $k$, a connection between vertices $i$ and $j$ is more likely than a connection between $i$ and a randomly selected vertex. Such graphs will typically exhibit a high average local clustering coefficient, indicating that most vertices' neighborhoods are close to being cliques. This type of behavior is a key component of "small world" graphs (Watts and Strogatz, 1998).
In this paper we explore distributions over graphs that, while locally dense, exhibit global sparsity. We achieve this by explicitly modeling graphs as collections of cliques, or fully connected subsets of vertices. We present a framework for random clique edge covers, with cliques selected using a nonparametric feature selection model (in this paper, the stable beta Indian buffet process (Teh and Görür, 2009), but other choices are possible). The resulting graphs exhibit local density, with clique sizes controlled primarily by a single parameter. On a global scale, the graphs are sparse, with a flexible degree of sparsity that is primarily controlled by a separate parameter. We observe power-law distributions over both the number of cliques a vertex belongs to, and its degree.
2 Random clique covers
A graph $G$ is an ordered pair $(V, E)$ of vertices and edges (which are pairs of vertices). $G$ can alternatively be described in terms of its edge clique cover (Orlin, 1977). An edge clique cover (or intersection graph) is a set of cliques, i.e. fully connected subgraphs, such that two vertices share an edge iff they have at least one clique in common.
This formulation suggests a method for generating random graphs by placing a distribution over cliques. Concretely, let a random clique be a random, finite subset of a (possibly uncountable) set of potential vertices. Given a distribution over sequences of such subsets, we can generate a sequence of cliques $C_1, C_2, \dots, C_K$ that can be translated into a graph by adding undirected edges between all vertices with at least one clique in common, so that
$$E = \{\{u, v\} : u \neq v \text{ and } \{u, v\} \subseteq C_k \text{ for some } k\}.$$
If we number the vertices in our graph, and represent our cliques in terms of a binary matrix $Z$ where $z_{ki} = 1$ iff clique $k$ includes vertex $i$, then we can represent the graph's adjacency matrix as
$$A = \min\left(Z^\top Z - \mathrm{diag}(Z^\top Z),\, 1\right),$$
where the min is taken as an elementwise operation.
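The translation from a clique membership matrix to an adjacency matrix can be illustrated with a short Python sketch. The function name `adjacency_from_cliques` and the list-of-rows layout (one row per clique, one column per vertex) are assumptions made for this example; the diagonal is left at zero on the assumption that the graph has no self-loops.

```python
def adjacency_from_cliques(Z):
    """Build a 0/1 adjacency matrix from a clique membership matrix.

    Z is a list of rows, one per clique; Z[k][i] == 1 iff clique k
    contains vertex i. Two distinct vertices are joined iff they share
    at least one clique (self-loops are excluded by assumption).
    """
    n = len(Z[0]) if Z else 0
    A = [[0] * n for _ in range(n)]
    for row in Z:
        members = [i for i, z in enumerate(row) if z]
        for a in members:
            for b in members:
                if a != b:
                    A[a][b] = 1
    return A


# Two overlapping cliques {0, 1, 2} and {2, 3}; vertex 4 is in no clique.
Z = [
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
]
A = adjacency_from_cliques(Z)
for row in A:
    print(row)
```

Looping over cliques rather than forming the full matrix product keeps the cost proportional to the sum of squared clique sizes, which is convenient when most vertices belong to few cliques.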
2.1 Cliques generated using exchangeable random measures
In order to specify our distribution over graphs, we must specify both a distribution over the number of cliques, and a distribution over the vertices appearing in those cliques. For the former, we choose to let , although other choices are possible. Reasonable desiderata for the random clique selection mechanism might be that the total number of vertices (and edges) is unbounded; that the size of each clique is random; and that cliques overlap with finite probability (avoiding the trivial solution where our graph is made up of a series of disconnected subgraphs).
One choice that meets these criteria is the stable beta process (Teh and Görür, 2009). The stable beta process is a completely random measure (Kingman, 1967) whose atoms lie in [0,1], parametrized by scalar parameters $\alpha > 0$, $\sigma \in [0, 1)$ and $c > -\sigma$, and a smooth, sigma-finite measure on the space of potential vertices. If $\sigma = 0$, this reduces to the (homogeneous) beta process (Hjort, 1990; Thibaux and Jordan, 2007). If $c = 1 - \sigma$, the stable beta process corresponds to a stable process with all atoms of size larger than one removed.
We can use the stable beta process to construct a distribution over cliques, which we can represent in terms of a binary matrix. If $B = \sum_{i} \mu_i \delta_{\theta_i}$ is a draw from the stable beta process, where each location $\theta_i$ corresponds to a vertex, we can sample an exchangeable sequence of cliques by including the $i$th vertex in the $k$th clique with probability $\mu_i$. We can represent the resulting clique allocation as a binary matrix $Z$ with infinitely many columns, where $z_{ki} = 1$ iff vertex $i$ is in clique $k$.
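If we marginalize out the stable beta process, we obtain a predictive distribution for the clique matrix $Z$ that is known as the stable-beta Indian buffet process (SBIBP, Teh and Görür, 2009). Write $\alpha$, $\sigma$ and $c$ for the scalar parameters of the stable beta process. If $m_i$ is the number of times vertex $i$ has been previously selected, then the $n$th clique will include that feature (i.e. $z_{ni} = 1$) with probability
$$\frac{m_i - \sigma}{n - 1 + c}.$$
In addition to features that have previously appeared, the $n$th clique will also select $K_n^{+}$ new vertices, where $K_n^{+} \sim \mbox{Poisson}\left(\alpha\, \frac{\Gamma(1+c)\,\Gamma(n-1+c+\sigma)}{\Gamma(n+c)\,\Gamma(c+\sigma)}\right)$.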
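The sequential scheme described above can be sketched in Python using only the standard library. This is a minimal illustration, assuming the predictive rule of Teh and Görür (2009) with mass `alpha`, stability `sigma` and concentration `c`; the function names and the simple Poisson sampler are choices made for this example rather than the authors' implementation.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def new_vertex_rate(n, alpha, sigma, c):
    """Assumed Poisson rate for vertices first introduced by the n-th clique (n >= 1)."""
    log_rate = (math.log(alpha) + math.lgamma(1 + c)
                + math.lgamma(n - 1 + c + sigma)
                - math.lgamma(n + c) - math.lgamma(c + sigma))
    return math.exp(log_rate)


def sample_sbibp(num_cliques, alpha, sigma, c, seed=0):
    """Return a list of cliques, each a list of integer vertex labels."""
    rng = random.Random(seed)
    counts = []   # counts[i] = number of earlier cliques containing vertex i
    cliques = []
    for n in range(1, num_cliques + 1):
        # Revisit existing vertices with probability (m_i - sigma) / (n - 1 + c).
        members = [i for i, m in enumerate(counts)
                   if rng.random() < (m - sigma) / (n - 1 + c)]
        # Introduce a Poisson number of previously unseen vertices.
        num_new = poisson(new_vertex_rate(n, alpha, sigma, c), rng)
        members.extend(range(len(counts), len(counts) + num_new))
        counts.extend([0] * num_new)
        for i in members:
            counts[i] += 1
        cliques.append(members)
    return cliques


cliques = sample_sbibp(num_cliques=50, alpha=2.0, sigma=0.5, c=1.0)
print(len(cliques), "cliques over", 1 + max(max(cl) for cl in cliques if cl), "vertices")
```

Working with log-gamma values avoids overflow in the rate when the number of cliques is large.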
The SBIBP exhibits a number of power-law behaviors (Teh and Görür, 2009; Broderick et al., 2012; Heaukulani and Roy, 2015) that directly translate into desirable graph properties. The total number of nonzero columns of $Z$ (in our case, observed vertices) in $n$ rows (cliques) grows, as $n \to \infty$, as $O(n^{\sigma})$, where $\sigma$ is the stability parameter of the stable beta process.
In addition to having power-law behavior in the number of observed vertices, the number of nonzero entries per column (in our case, the number of cliques each vertex appears in) follows Zipf's law. Let $K_{n,j}$ be the number of features that appear exactly $j$ times in $n$ rows. Then, following Broderick et al. (2012),
$$K_{n,j} \sim \frac{\alpha\,\Gamma(1+c)}{\Gamma(c+\sigma)}\, \frac{\Gamma(j-\sigma)}{j!\,\Gamma(1-\sigma)}\, n^{\sigma}. \qquad (1)$$
As we will see in Section 3, when applied to edge clique covers, the SBIBP yields many interesting graphspecific properties.
2.2 A model for partially observed cliques
In a modeling context, we may wish to consider the cliques as latent communities (Holland et al., 1983; Airoldi et al., 2008; Karrer and Newman, 2011). For example, in a social network, a clique might represent a shared interest or hobby. In this context, we would not necessarily expect all vertices in the community to be connected. Instead, we can think of the community as being a noisy instantiation of a fully connected clique.
To model this in the current context, start with the distribution over random edge clique covers described in Section 2.1. If we associate each latent clique $k$ with a probability $\pi_k$, we can form a graph by including an edge between vertices $i$ and $j$ with probability
$$1 - \prod_{k} (1 - \pi_k)^{z_{ki} z_{kj}},$$
where $z_{ki} = 1$ iff vertex $i$ belongs to clique $k$.
If we use a single probability $\pi$ across all cliques, this corresponds to including edges according to a noisy-OR likelihood: if two vertices have $m$ latent cliques in common, they share an edge with probability $1 - (1 - \pi)^{m}$.
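The noisy-OR construction can be made concrete with a small Python sketch. It assumes a single shared edge probability, called `pi` here, and that each latent clique independently proposes each of its internal edges; the function names are hypothetical and chosen for illustration only.

```python
import random


def edge_probability(shared_cliques, pi):
    """Noisy-OR probability of an edge between two vertices sharing m cliques."""
    return 1.0 - (1.0 - pi) ** shared_cliques


def sample_partially_observed(cliques, pi, seed=0):
    """Sample an edge set from latent cliques.

    cliques is an iterable of collections of vertex labels. Every pair
    inside a clique is proposed once per clique and kept with
    probability pi, so a pair sharing m cliques ends up connected with
    probability 1 - (1 - pi) ** m.
    """
    rng = random.Random(seed)
    edges = set()
    for clique in cliques:
        members = sorted(set(clique))
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                if rng.random() < pi:
                    edges.add((members[a], members[b]))
    return edges


latent = [[0, 1, 2], [1, 2, 3]]
print(edge_probability(2, 0.5))
print(sorted(sample_partially_observed(latent, pi=0.5)))
```

Setting `pi` to one recovers the fully observed clique cover, while smaller values thin each clique into a sparser subgraph.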
One way of thinking of this model is as a superposition of locally-defined Erdös-Rényi graphs. A $G(n, p)$ graph builds a graph with $n$ vertices, by including each of the $\binom{n}{2}$ potential edges with probability $p$. In the partially observed clique model, each row of the SBIBP selects a clique of $n_k$ vertices, and builds a subgraph on those vertices according to a $G(n_k, \pi_k)$ model, where $\pi_k$ is the edge probability associated with clique $k$. Within the clique, we have a dense subgraph almost surely.
3 Graph properties
In this section, we explore the statistical properties of graphs constructed using an SBIBP distribution over cliques. In particular, we show that we can obtain sparse graphs that exhibit densely connected subgraphs. We focus on the fully observed setting described in Section 2.1, noting that the partially observed extension described in Section 2.2 will inherit similar properties.
3.1 Sparsity
Let $|E|$ be the number of edges in a graph, and $|V|$ be the number of vertices. We say our graph is dense if $|E|$ grows quadratically with $|V|$, and sparse if it grows subquadratically. To explore the sparsity of the graph, we will look at how $|E|$ and $|V|$ behave as the number of cliques grows.
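Let $K_n^{+}$ be the number of vertices first introduced in the $n$th clique, and $K_n^{-}$ be the number of previously introduced vertices included in the $n$th clique. We know (from the predictive distribution of the SBIBP) that $K_n^{+} \sim \mbox{Poisson}\left(\alpha\, \frac{\Gamma(1+c)\,\Gamma(n-1+c+\sigma)}{\Gamma(n+c)\,\Gamma(c+\sigma)}\right)$ and, marginally, $K_n^{+} + K_n^{-} \sim \mbox{Poisson}(\alpha)$. So, the expected number of vertices after $N$ cliques (Broderick et al., 2012; Heaukulani and Roy, 2015) is
$$E[|V|] = \sum_{n=1}^{N} \alpha\, \frac{\Gamma(1+c)\,\Gamma(n-1+c+\sigma)}{\Gamma(n+c)\,\Gamma(c+\sigma)}.$$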
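Conditioned on the probabilities $\mu_i$ and $\mu_j$ assigned by the stable beta process to vertices $i$ and $j$, the probability of an edge between $i$ and $j$ given $N$ cliques is $1 - (1 - \mu_i \mu_j)^{N}$. By linearity of expectation, the total expected number of edges, given the atom sizes $\mu = (\mu_1, \mu_2, \dots)$, is therefore $\sum_{i<j} \left(1 - (1 - \mu_i \mu_j)^{N}\right)$. Therefore, we can obtain the expected total number of edges in the graph as
$$E[|E|] = E_{\mu}\left[\sum_{i<j} \left(1 - (1 - \mu_i \mu_j)^{N}\right)\right]. \qquad (2)$$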
In a related context, Cai et al. (2016) derive the growth rate of this quantity via a Poissonization approach (for results for alternative choices of completely random measure, see Cai et al., 2016). The resulting growth in the expected number of edges is always subquadratic in the number of vertices. As the stability parameter $\sigma$ of the stable beta process approaches one, we obtain the trivially sparse network where each clique contains a Poisson-distributed number of vertices, and there are no edges connecting cliques.
We can validate this limiting behavior using simulations from the network. Figure 1 shows how $|E|$ and $|V|$ covary in random graphs, for different values of the stability parameter $\sigma$ and the concentration parameter $c$. The scatter plots show $(|V|, |E|)$ pairs for ten network simulations, evaluated after each of 100 cliques is added. For high values of $\sigma$ we have a near-linear relationship between $|E|$ and $|V|$, indicating an extremely sparse graph. As $\sigma$ decreases the exponent increases and the level of sparsity decreases (although the graph never becomes dense). The concentration parameter $c$ controls the variance.
3.2 Local density and clique distributions
The sparsity of a graph is related to the proportion of the total number of cliques that a given vertex belongs to. Due to overlap, the random cliques represented by the clique membership matrix $Z$ do not represent all cliques in the resulting graph. Therefore, we distinguish between generating cliques and maximal cliques. By generating cliques, we mean the cliques represented in $Z$. By maximal cliques, we refer to the set of all cliques in the resulting graph that cannot be enlarged by including another vertex. While some cliques may be both generating and maximal, the two will not generally coincide. Since generating cliques are explicitly represented in our model, we can derive their statistical properties directly. The maximal cliques are only indirectly related to the generating model, so we explore their properties empirically.
We first consider the distribution over the number of generating cliques a vertex belongs to, conditioned on the fact that it appears in the graph and therefore belongs to at least one clique. As the stability parameter $\sigma \to 1$, the probability of a vertex appearing in more than one clique vanishes, and we end up with a graph with unconnected cliques. For $\sigma < 1$, the number of cliques a vertex appears in exhibits power-law behavior for large numbers of cliques (Teh and Görür, 2009; Broderick et al., 2012), indicating that as $\sigma$ increases, the expected proportion of cliques a vertex belongs to decreases and we see increasingly heavy-tailed distributions over the number of cliques a vertex belongs to. We can see this empirically in Figure 4, which shows box plots of the number of cliques that each vertex belongs to.
While the number of generating cliques each vertex belongs to controls the global sparsity, the size and degree of overlap of those generating cliques controls the local density. The size of the generating cliques is directly controlled by the mass parameter $\alpha$ and is marginally $\mbox{Poisson}(\alpha)$. The size of the largest maximal clique to which a vertex belongs is a function of both the size of the underlying generating cliques, and the extent to which these cliques overlap (which is controlled by the stability parameter $\sigma$). As $\sigma \to 1$, the generating cliques and the maximal cliques coincide; Figure 5 shows how, as $\sigma$ decreases, the average largest maximal clique for each vertex increases.
3.3 Degree distribution
The expected size of each generating clique is $\alpha$, and the number of cliques a vertex belongs to follows a power-law distribution. This leads to a power-law distribution over the degree, with the expected degree decreasing with the stability parameter $\sigma$ and increasing with the mass parameter $\alpha$. In Figure 6, we can see empirically that the expected degree decreases with $\sigma$, approaching $\alpha$ as $\sigma \to 1$.
The top row of Figure 7 shows the degree distribution (in the form of a boxplot) for different values of and and different numbers of cliques . We see increasingly heavy tails as increases. Since the number of vertices also increases with , the maximum degree is higher for larger . The bottom row of Figure 7 shows the degree divided by the total number of vertices, i.e. the proportion of the other vertices to which a given vertex is connected. Here, it is much easier to compare different values of , and we see that the average proportion decreases and the distribution grows heavier tailed as increases, for all values of and .
3.4 Density of the intersection graph of generating cliques
The clique graph of a graph is the intersection graph of its maximal cliques. As discussed above, our construction does not explicitly generate maximal cliques, so instead we will consider the generating-clique graph, i.e. the intersection graph of the generating cliques specified in the clique matrix $Z$. Perhaps surprisingly given the sparsity of the overall graph, this intersection graph is dense.
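To show this, consider the expected number of vertices by which two cliques overlap. This is given by $E\left[\sum_i \mu_i^2\right]$, where $\mu_i$ is the stable beta process atom size associated with vertex $i$. Campbell's theorem tells us that $E\left[\sum_{x \in \Pi} f(x)\right] = \int_S f(x)\, \nu(dx)$, where $\Pi$ is a Poisson process on $S$ with rate measure $\nu$ and $f$ is a measurable function. In our case, $\nu$ is the Lévy measure of the stable beta process,
$$\nu(d\mu) = \alpha\, \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\,\Gamma(c+\sigma)}\, \mu^{-\sigma-1} (1-\mu)^{c+\sigma-1}\, d\mu.$$
Therefore the expected overlap between two cliques is
$$\int_0^1 \mu^2\, \nu(d\mu) = \alpha\, \frac{1-\sigma}{1+c}.$$
This indicates that two cliques overlap with positive probability, meaning that the resulting intersection graph is dense and that there is a path between any two vertices with positive probability, even if they do not belong to the same clique.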
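The expected overlap can be checked numerically with a short Python sketch. It assumes the Teh and Görür (2009) parametrization of the stable beta process Lévy density, with mass `alpha`, stability `sigma` and concentration `c`, under which the integral of the squared atom size is a beta integral; both function names are hypothetical.

```python
import math


def levy_second_moment(alpha, sigma, c):
    """Integral of mu^2 against the assumed stable beta Levy density.

    The density is taken to be
        alpha * Gamma(1+c) / (Gamma(1-sigma) * Gamma(c+sigma))
              * mu^(-sigma-1) * (1-mu)^(c+sigma-1),
    so the integral reduces to the beta function B(2 - sigma, c + sigma).
    """
    log_const = (math.log(alpha) + math.lgamma(1 + c)
                 - math.lgamma(1 - sigma) - math.lgamma(c + sigma))
    log_beta = (math.lgamma(2 - sigma) + math.lgamma(c + sigma)
                - math.lgamma(2 + c))
    return math.exp(log_const + log_beta)


def closed_form_overlap(alpha, sigma, c):
    """Simplified form: the first clique has alpha vertices in expectation and
    the second revisits each with probability (1 - sigma) / (1 + c)."""
    return alpha * (1 - sigma) / (1 + c)


for params in [(2.0, 0.0, 1.0), (1.5, 0.5, 1.0), (3.0, 0.8, 0.3)]:
    print(params, levy_second_moment(*params), closed_form_overlap(*params))
```

The two expressions agree because the ratio of gamma functions collapses under the recurrence for the gamma function, and the result is strictly positive whenever the stability parameter is below one.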
4 Related work
The idea of a random edge clique cover was explored by Barber (2008), under the name Clique Matrices. As in this paper, a binary matrix is used to represent cliques within the network; however, the number of vertices is fixed and the entries are i.i.d. Bernoulli random variables. Such a network is a special case of the model proposed here, but it lacks the sparsity properties and unbounded number of vertices that are discussed in Section 3. Barber (2008) also proposes a noisily-observed Clique Matrix model, where two vertices are connected with a probability defined through the sigmoid function. Unlike the noisy-OR formulation for partially observed cliques proposed in Section 2.2, this model precludes sparse graphs, since it allows edges between vertices with no cliques in common. A variational inference procedure is proposed.
In recent years, a number of Bayesian nonparametric models for sparse graphs have been developed. Caron and Fox (2017) propose sampling multigraphs according to a Poisson process on the space of potential edges. The base measure of this Poisson process is the product measure of a generalized gamma process with itself. The nonparametric nature of the generalized gamma process means the total number of edges is unbounded, and its power-law behaviors yield sparse networks. This work has been generalized by Veitch and Roy (2015) and Borgs et al. (2018) to include both sparse and dense graphs.
A related class of models are edge-exchangeable networks. Most models in this class build multigraphs by repeatedly sampling a single edge from a probability distribution on the space of vertex pairs that can be decomposed into either the product measure of a single nonparametric probability measure on the space of vertices (Cai et al., 2016; Crane and Dempsey, 2017) or the product measure of two hierarchically coupled probability measures (Williamson, 2016). In the former setting, if the probability measure is a normalized generalized gamma process, the resulting multigraph corresponds to the conditional distribution of the Caron and Fox model, given the total number of edges.
Like the models proposed in this paper, these edge-exchangeable models can be interpreted as generating a random edge clique cover. Multigraphs are constructed by repeatedly sampling cliques consisting of one or two vertices. By contrast, the models proposed in this paper will generate cliques of random size. While this does not mean that all maximal cliques in an edge-exchangeable graph will have size 2, the smaller building-block cliques in the edge-exchangeable models will tend to give graphs with smaller cliques than those described in this paper.
One variant of the edge-exchangeable graphs that is particularly relevant to our work is the multiple-edges-per-step edge-exchangeable network proposed by Cai et al. (2016). Here, rather than specify a probability distribution on the space of vertices, the authors specify an arbitrary random measure on that space, with atom sizes $\mu_i$, and use this to parametrize a distribution over edges. At each step a random number of edges are added, with edge $(i, j)$ selected with probability $\mu_i \mu_j$. If, as is given as an example in Cai et al. (2016), the $\mu_i$ are the atom sizes of a stable beta process, then the expected total number of edges and vertices, and the expected number of edges between two vertices conditioned on their associated atom sizes, coincide with the model described in this paper, due to linearity of expectation. The difference arises because the multiple-edge exchangeable model generates edges independently at each step. In our model, within each clique, the edges are dependent, giving rise to dense subgraphs. Despite having similar expected sparsity, the multi-step edge-exchangeable model does not have the same local density properties as our model.
Further, by representing edge probabilities using a product of stable beta processes, this edge-exchangeable model lacks the conjugacy that is present when we represent clique inclusion probabilities using a single stable beta process. As a result, posterior inference in the multiple-edge stable beta model poses a significant challenge, and has not yet been addressed in the literature.
5 Posterior inference
We infer the clique matrix underlying a random graph using a reversible jump MCMC algorithm that proposes either splitting or merging cliques. In the partially observed setting, we augment this split/merge procedure with Gibbs steps that help learn the fine-grained structure of the clique cover; we found that adding Gibbs steps had little benefit in the fully observed setting. We can either optimize the hyperparameters, or place a prior on them and infer their posterior distribution using Metropolis-Hastings proposals.
Initialization
In the fully observed graph setting, the likelihood is a step function that is one iff the adjacency matrix implied by the clique matrix $Z$ equals the observed adjacency matrix $A$, so a sampler can only make moves that do not change the sparsity pattern of the implied adjacency matrix. Because of this, the clique matrix must be initialized to values compatible with $A$. One way of doing this is to initialize $Z$ to the set of 2-cliques corresponding to the edges of the graph. An alternative approach is to use an edge clique cover algorithm; however, this may be infeasible for large graphs since finding a minimal cover is NP-hard. For the partially observed graph, it suffices to choose an initialization so that the clique cover implied by $Z$ covers all edges in the graph.
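The 2-clique initialization can be sketched in a few lines of Python. The function name `init_two_cliques` and the dense list-of-rows representation are assumptions made for this example; a practical sampler would likely use a sparse structure.

```python
def init_two_cliques(edges, num_vertices):
    """Return a clique matrix with one 2-clique (row) per distinct edge.

    edges is an iterable of (i, j) vertex-index pairs. Duplicate and
    reversed pairs are merged, and self-loops are ignored.
    """
    distinct = sorted({tuple(sorted(e)) for e in edges if e[0] != e[1]})
    rows = []
    for i, j in distinct:
        row = [0] * num_vertices
        row[i] = row[j] = 1
        rows.append(row)
    return rows


Z0 = init_two_cliques([(0, 1), (1, 2), (1, 0)], num_vertices=3)
for row in Z0:
    print(row)
```

By construction, every pair of vertices that shares a row is an observed edge and every observed edge has its own row, so the starting state reproduces the observed graph exactly.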
Split/merge sampling for $Z$
To explore the space of clique covers, we propose split/merge moves that can either split a large clique into two potentially overlapping cliques, or merge two cliques into a single clique. We select an edge $(i, j)$ of the graph uniformly at random, and then for each vertex associated with that edge, we select a clique uniformly from the set of cliques it belongs to; call these cliques $k_i$ and $k_j$. If $k_i = k_j$, we propose replacing the clique with two cliques. We first fix the entries of the two new cliques for the selected vertices $i$ and $j$. Then, for each remaining nonzero entry in the original clique, we consider the possible settings of the corresponding entries in the two new cliques. We exclude any settings that are incompatible with the network structure, and select uniformly between the other options.
If $k_i \neq k_j$, we propose replacing the cliques $k_i$ and $k_j$ with their union. We note that the resulting proposal could have a likelihood of zero, due to introducing edges that do not appear in the network.
We accept or reject the split or merge by calculating the appropriate reversible jump MCMC acceptance probability.
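In the fully observed setting, a merge proposal has zero likelihood whenever the union of the two cliques contains a pair of vertices that is not an observed edge. A minimal Python sketch of that check is given below; the function name `merge_is_compatible` is hypothetical, and the acceptance ratio itself is not reproduced here.

```python
def merge_is_compatible(clique_a, clique_b, edges):
    """Return True iff every pair in the union of two cliques is an observed edge.

    clique_a and clique_b are collections of vertex labels; edges is an
    iterable of undirected (u, v) pairs from the observed graph.
    """
    edge_set = {frozenset(e) for e in edges}
    union = sorted(set(clique_a) | set(clique_b))
    return all(frozenset((u, v)) in edge_set
               for idx, u in enumerate(union)
               for v in union[idx + 1:])


observed = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(merge_is_compatible([0, 1], [1, 2], observed))
print(merge_is_compatible([0, 1], [2, 3], observed))
```

Running this check before evaluating the full acceptance probability lets a sampler discard incompatible merges cheaply.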
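Gibbs sampling step for $Z$
For vertices $i$ such that $m_i^{-k} > 0$, where $m_i^{-k}$ is the number of cliques other than $k$ that contain vertex $i$, we can augment the split/merge proposals with Gibbs steps for $z_{ki}$ that sample from the conditional distribution
$$P(z_{ki} = 1 \mid Z_{-ki}, A) \propto \frac{m_i^{-k} - \sigma}{K - 1 + c}\; p(A \mid Z),$$
where $K$ is the number of cliques, $\sigma$ and $c$ are the stability and concentration parameters of the stable beta process, and $p(A \mid Z)$ is the likelihood of the graph given the proposed clique matrix.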
We can then sample the number of all-zero cliques using a Metropolis-Hastings proposal, proposing a new number of all-zero cliques from an appropriate distribution and calculating the corresponding acceptance probability. We found adding Gibbs sampling steps did not improve mixing in the fully observed setting (where the set of compatible clique matrices is more constrained), but was important in the partially observed setting.
6 Experimental evaluation
We explore properties of our distribution over graphs in both the fully observed and the partially observed setting.
6.1 Fully observed graph setting
We begin by ascertaining how well the fully observed model can capture statistical properties of real-world graphs. We consider the following graph statistics: ratio of edges to vertices; density (defined as $2|E| / (|V|(|V| - 1))$, the number of edges over the number of possible edges); degree distribution; average local clustering coefficient (Watts and Strogatz, 1998); and the distribution over the largest maximal clique a vertex belongs to. We use the NetworkX Python package (Hagberg et al., 2008) to calculate clustering coefficients and maximal cliques.
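For concreteness, the simpler statistics above can be computed directly from an edge list, as in the following dependency-free Python sketch. The function name `graph_statistics` is hypothetical. The corresponding NetworkX routines are `nx.density`, `nx.average_clustering` and `nx.find_cliques`, the last of which enumerates maximal cliques and is not reimplemented here.

```python
def graph_statistics(edges):
    """Edge/vertex ratio, density and average local clustering coefficient.

    edges is an iterable of undirected (u, v) pairs. Only vertices with
    at least one edge are counted, and self-loops are ignored.
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    num_vertices = len(neighbors)
    num_edges = sum(len(nb) for nb in neighbors.values()) // 2
    if num_vertices > 1:
        density = 2 * num_edges / (num_vertices * (num_vertices - 1))
    else:
        density = 0.0
    coefficients = []
    for v, nb in neighbors.items():
        k = len(nb)
        if k < 2:
            coefficients.append(0.0)  # convention: no triangles are possible
            continue
        # Each edge among the neighbors of v is seen twice in this double loop.
        links = sum(1 for a in nb for b in neighbors[a] if b in nb) // 2
        coefficients.append(2 * links / (k * (k - 1)))
    return {
        "edges_per_vertex": num_edges / num_vertices if num_vertices else 0.0,
        "density": density,
        "avg_clustering": sum(coefficients) / len(coefficients) if coefficients else 0.0,
    }


# A triangle on vertices 0, 1, 2 with a pendant vertex 3 attached to vertex 2.
stats = graph_statistics([(0, 1), (0, 2), (1, 2), (2, 3)])
print(stats)
```

Vertices of degree zero or one are assigned a clustering coefficient of zero, which is an assumption of this sketch.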








Table 2: Members of the eight cliques to which Michael Jordan belongs.
Clique  Mean dist.  Authors
1  2.02  Wainwright(1), Ihler(3), Steck(3), Freeman(2), Jaakkola(2), Tappen(3), Weiss(1), Adelson(3), Willsky(2), Sudderth(2)
2  3.57  Moore(8), Pelleg(9), Liu(9), Neill(9), Gray A.(9), Gray A. G.(7)
3  1.76  Russell(1), Griffiths(1), Tenenbaum(1), Ng(1), Nguyen(1), Kim(1), Blei(1), Xing(1), Sanjana(2), Sastry(1), Karp(1)
4  4.73  Pickup(2), Fukumizu, Akaho(2), Bach, Tarassenko(2), Amari(2), Hughes(2), Zisserman(2), Roberts(2)
5  3.09  Zheng(1), Frey(3), Southey(5), Moran(3), Patrascu(3), Jaakkola(2), Schuurmans(4), Ng(1), Liblit(1), Monteleoni(3), Aiken(1), Ghodsi(5)
6  2.98  Greensmith(2), Baxter(2), Taycher(4), Frean(2), McAuliffe(1), Bartlett(1), Mason(2), Fisher(3), Darrell(3)
7  3.81  Sallans(6), Mayraz(6), Todorov(1), Paccanaro(6), Brown(6), Hinton(5)
8  2.07  Lanckriet(1), Bhattacharyya(1), Sundararajan(3), Keerthi(2), Ghaoui(1), elGhaoui(1), Nilim(2)
To determine how well-suited our model is for modeling real graphs, we consider four real-world graphs:

The coauthorship graph for NIPS publications from 1987-2003 (https://www.kaggle.com/benhamner/nipspapers).

The largest connected component of the coappearance graph generated from IMDB cast lists of comedies from 2000-2002 (www.imdb.com).

The coauthorship graph for arXiv publications in the category GR-QC (General Relativity and Quantum Cosmology) from 2003 (Leskovec et al., 2007).

The full interaction graph from a single month (January 2000) of the ENRON dataset (Klimt and Yang, 2014).
We model each graph using our proposed (fully observed) model (labeled RCC in tables and figures), and learn the hyperparameters using maximum likelihood. We then sample 25 graphs using these hyperparameters, and compare the average statistics of the sampled graphs with those of the original graph in Table 1 and Figure 8. For comparison, we also model the real-world graphs using the sparse graph model of Caron and Fox (2017) (labeled BNPgraph in tables and figures), which can capture graph sparsity, power-law degree distributions, and an unbounded number of vertices.
We see that both models do a good job at capturing sparsity and degree distribution. However, the Caron & Fox model does not capture the locally dense structure. While simulations from our model have similar clustering coefficients to the real graphs, the Caron & Fox model has near-zero clustering coefficients. Our simulations also exhibit larger maximal cliques. While the maximal clique distribution for the IMDB dataset seems to overestimate the number of cliques, we believe this is an artifact of the data. The dataset only includes the four highest billed actors for each movie, artificially deflating clique sizes.
6.2 Partially observed graph setting
The partially observed setting offers the ability to infer a latent graph, and a corresponding set of latent cliques, underlying an observed graph. Intuitively, this latent graph is likely to capture latent community-type structure. To evaluate this empirically, we modeled the NIPS coauthorship dataset described above using a partially observed model. We consider publications within the period 1999-2003, assume a shared clique parameter $\pi$, and infer hyperparameters using Metropolis-Hastings. Figure 9 shows the latent structure found (based on a single sample from the posterior).
Posterior draws for the shared clique parameter concentrated around 0.3, robustly with respect to hyperparameter specification. Thus, roughly 70% of the edges in the inferred graph are not in the original graph. As expected, given the short period of analysis, 445 out of the 516 authors in consideration belonged to only one clique, and only eleven of them to more than three. The average clique size was 10.82. Not surprisingly, the author who appeared in eight cliques was Michael Jordan. The members of these eight cliques are shown in Table 2. It is worth mentioning that case-by-case inspection shows that in some cases, inferred edges were in fact collaborations outside the period of observation.
7 Discussion
We have presented a new class of Bayesian nonparametric priors for graphs, based on random clique selection mechanisms, that are appropriate for many real-world graphs. We have presented some preliminary work in a modeling context; we hope to see this form of random graph-based model explored further in the future.
While we base our model on the stable beta process, alternative subset selection mechanisms could be proposed. For example, a restricted stable beta IBP (Williamson et al., 2013; Utkovski et al., 2018) could give more explicit control over the number of cliques. We leave exploration of alternative mechanisms as future work.
References

Airoldi et al. [2008] Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9(Sep):1981–2014, 2008.
Albert and Barabási [2002] Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47, 2002.
Barber [2008] David Barber. Clique matrices for statistical graph decomposition and parameterising restricted positive definite matrices. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 26–33, 2008.
Bloem-Reddy and Orbanz [2016] Benjamin Bloem-Reddy and Peter Orbanz. Random walk models of network formation and sequential Monte Carlo methods for graphs. arXiv preprint arXiv:1612.06404, 2016.
Borgs et al. [2018] Christian Borgs, Jennifer T Chayes, Henry Cohn, and Nina Holden. Sparse exchangeable graphs and their limits via graphon processes. Journal of Machine Learning Research, 18(210):1–71, 2018.
Broderick et al. [2012] Tamara Broderick, Michael I Jordan, Jim Pitman, et al. Beta processes, stick-breaking and power laws. Bayesian Analysis, 7(2):439–476, 2012.
Cai et al. [2016] Diana Cai, Trevor Campbell, and Tamara Broderick. Edge-exchangeable graphs and sparsity. In Advances in Neural Information Processing Systems, pages 4249–4257, 2016.
Caron and Fox [2017] François Caron and Emily B Fox. Sparse graphs using exchangeable random measures. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(5):1295–1366, 2017.
Crane and Dempsey [2017] Harry Crane and Walter Dempsey. Edge exchangeable models for interaction networks. Journal of the American Statistical Association, 2017.
Dangalchev [2004] Chavdar Dangalchev. Generation models for scale-free networks. Physica A: Statistical Mechanics and its Applications, 338(3-4):659–671, 2004.
Erdös and Rényi [1960] Paul Erdös and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60, 1960.
Gilbert [1959] Edgar N Gilbert. Random graphs. The Annals of Mathematical Statistics, 30(4):1141–1144, 1959.
Hagberg et al. [2008] Aric Hagberg, Pieter Swart, and Daniel S Chult. Exploring network structure, dynamics, and function using NetworkX. Technical report, Los Alamos National Lab. (LANL), Los Alamos, NM (United States), 2008.
Heaukulani and Roy [2015] Creighton Heaukulani and Daniel M Roy. Gibbs-type Indian buffet processes. arXiv preprint arXiv:1512.02543, 2015.
Hjort [1990] Nils Lid Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. The Annals of Statistics, pages 1259–1294, 1990.
Holland et al. [1983] Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
Karrer and Newman [2011] Brian Karrer and Mark EJ Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):016107, 2011.
Kingman [1967] John Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.
Klimt and Yang [2014] Bryan Klimt and Yiming Yang. Introducing the Enron corpus. In CEAS, 2014.
Leskovec et al. [2007] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2, 2007.
Lloyd et al. [2012] James Lloyd, Peter Orbanz, Zoubin Ghahramani, and Daniel M Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In Advances in Neural Information Processing Systems, pages 998–1006, 2012.
Nešetřil and Ossona de Mendez [2012] Jaroslav Nešetřil and Patrice Ossona de Mendez. Sparsity: Graphs, Structures, and Algorithms. Springer, 2012.
Orlin [1977] James Orlin. Contentment in graph theory: Covering graphs with cliques. In Indagationes Mathematicae (Proceedings), volume 80, pages 406–424. Elsevier, 1977.
Teh and Görür [2009] Yee W Teh and Dilan Görür. Indian buffet processes with power-law behavior. In Advances in Neural Information Processing Systems, pages 1838–1846, 2009.
Thibaux and Jordan [2007] Romain Thibaux and Michael I Jordan. Hierarchical beta processes and the Indian buffet process. In Artificial Intelligence and Statistics, pages 564–571, 2007.
Utkovski et al. [2018] Zoran Utkovski, Melanie F Pradier, Viktor Stojkoski, Fernando Perez-Cruz, and Ljupco Kocarev. Economic complexity unfolded: Interpretable model for the productive structure of economies. PloS one, 13(8):e0200822, 2018.
Veitch and Roy [2015] Victor Veitch and Daniel M Roy. The class of random graphs arising from exchangeable random measures. arXiv preprint arXiv:1512.03099, 2015.
Watts and Strogatz [1998] Duncan J Watts and Steven H Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440, 1998.
Williamson [2016] Sinead A Williamson. Nonparametric network models for link prediction. The Journal of Machine Learning Research, 17(1):7102–7121, 2016.
Williamson et al. [2013] Sinead A Williamson, Steve N MacEachern, and Eric P Xing. Restricting exchangeable nonparametric distributions. In Advances in Neural Information Processing Systems, pages 2598–2606, 2013.