
Random clique covers for graphs with local density and global sparsity

by   Sinead A. Williamson, et al.

Large real-world graphs tend to be sparse, but they often contain densely connected subgraphs and exhibit high clustering coefficients. While recent random graph models can capture this sparsity, they ignore the local density. We show that models based on random edge clique covers can capture both global sparsity and local density, and are an appropriate modeling tool for many real-world graphs.


1 Introduction

Random graph models provide statistical tools for network analysis and can be used as prior distributions in a Bayesian framework. Such models aim to capture various properties of real-world networks, such as power-law degree distributions (Albert and Barabási, 2002; Dangalchev, 2004; Bloem-Reddy and Orbanz, 2016), small-world properties (Watts and Strogatz, 1998), or latent community structure (Holland et al., 1983; Airoldi et al., 2008; Karrer and Newman, 2011).

One statistic of interest is the density of a graph, defined as the number of edges over the number of possible edges for a binary, undirected graph. This can be extended to distributions over graphs: we think of a random graph as being dense if the number of edges grows quadratically with the number of vertices, and sparse if it grows sub-quadratically with the number of vertices (Nešetřil and Ossona de Mendez, 2012). Many commonly-used random graphs, such as Erdös-Rényi graphs (Gilbert, 1959; Erdös and Rényi, 1960) or stochastic blockmodels and their variants (Holland et al., 1983; Airoldi et al., 2008; Karrer and Newman, 2011), concentrate on dense graphs (see Lloyd et al., 2012, for a discussion). This behavior is commonly seen in small, closed communities, where every vertex has the option to interact with all other vertices and a fully connected graph is at least conceivable. However, it is not generally seen in larger networks, such as international online social networks, where a given vertex will not have the opportunity to interact with more than a small subset of the other vertices.

Recently, a number of models have been proposed that can generate sparse graphs. These models share certain similarities in construction: a measure is placed on the space of potential edges, and the observed edges are iid given this measure. For example, Caron and Fox (2017) construct networks as a Poisson process using a discrete, countably infinite base measure on the space of pairs of potential vertices. Cai et al. (2016), Williamson (2016) and Crane and Dempsey (2017) use discrete, countably infinite probability measures on either the space of potential vertices or the space of pairs of potential vertices, and sample sequences of edges from this distribution. Appropriate random measure choices can yield sparsity and power-law degree distributions.

While models such as these offer the ability to model sparse graphs, they do not capture certain important network behaviors. While large-scale graphs are usually sparse, locally they are often dense: if two vertices $u$ and $v$ are both connected to vertex $w$, a connection between vertices $u$ and $v$ is more likely than a connection between $u$ and a randomly selected vertex. Such graphs will typically exhibit a high average local clustering coefficient, indicating that most vertices’ neighborhoods are close to being cliques. This type of behavior is a key component of “small world” graphs (Watts and Strogatz, 1998).

In this paper we explore distributions over graphs that, while locally dense, exhibit global sparsity. We achieve this by explicitly modeling graphs as collections of cliques, or fully connected subsets of vertices. We present a framework for random edge clique covers, with cliques selected using a nonparametric feature selection model (in this paper, the stable beta Indian buffet process (Teh and Görür, 2009), but other choices are possible). The resulting graphs exhibit local density, with clique sizes controlled primarily by a single parameter. On a global scale, the graphs are sparse, with a flexible degree of sparsity that is primarily controlled by a separate parameter. We observe power-law distributions over both the number of cliques a vertex belongs to, and its degree.

When used directly as a distribution over graphs, we show that this model’s statistical properties mimic the statistics of real-world graphs (Section 6.1). When used in a hierarchical modeling context, the cliques can be thought of as latent communities (Section 6.2).

2 Random clique covers

A graph $G = (V, E)$ is an ordered pair of vertices $V$ and edges $E$ (which are pairs of vertices). $G$ can alternatively be described in terms of its edge clique cover (Orlin, 1977). An edge clique cover (or intersection graph) is a set of cliques, i.e. fully connected subgraphs, such that two vertices share an edge iff they have at least one clique in common.

This formulation suggests a method for generating random graphs by placing a distribution over cliques. Concretely, let a random clique be a random, finite subset of a (possibly uncountable) set of potential vertices. Given a distribution over sequences of such subsets, we can generate a sequence $C_1, \dots, C_K$ of cliques, which can be translated into a graph by adding undirected edges between all vertices with at least one clique in common, so that

$$E = \{(i, j) : i \neq j,\ \{i, j\} \subseteq C_k \text{ for some } k\}.$$

If we number the vertices in our graph, and represent our cliques in terms of a binary matrix $Z$ where $z_{ki} = 1$ iff clique $k$ includes vertex $i$, then we can represent the graph’s adjacency matrix as

$$A = \min(Z^\top Z, 1),$$

where the min is taken as an element-wise operation.
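The mapping from a clique matrix to an adjacency matrix can be sketched in a few lines of pure Python. This is an illustrative sketch, not the authors' code: the function name `adjacency_from_cliques` and the toy matrix are hypothetical, rows are assumed to index cliques and columns vertices, and the diagonal is zeroed on the assumption that self-loops are not wanted.

```python
def adjacency_from_cliques(Z):
    """Element-wise min(Z^T Z, 1), with the diagonal set to zero.

    Z is a list of rows; Z[k][i] == 1 iff clique k includes vertex i.
    """
    n = len(Z[0]) if Z else 0
    A = [[0] * n for _ in range(n)]
    for row in Z:
        members = [i for i, z in enumerate(row) if z]
        # Every pair of distinct vertices sharing this clique gets an edge.
        for a in members:
            for b in members:
                if a != b:  # assumption: no self-loops
                    A[a][b] = 1
    return A


# Toy example: two overlapping cliques on five potential vertices.
Z = [
    [1, 1, 1, 0, 0],  # clique 0 contains vertices 0, 1, 2
    [0, 0, 1, 1, 0],  # clique 1 contains vertices 2, 3
]
A = adjacency_from_cliques(Z)
```

Vertex 4 belongs to no clique, so its row and column stay empty; vertex 2 bridges the two cliques.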

2.1 Cliques generated using exchangeable random measures

In order to specify our distribution over graphs, we must specify both a distribution over the number of cliques, and a distribution over the vertices appearing in those cliques. For the former, we choose to let , although other choices are possible. Reasonable desiderata for the random clique selection mechanism might be that the total number of vertices (and edges) is unbounded; that the size of each clique is random; and that cliques overlap with finite probability (avoiding the trivial solution where our graph is made up of a series of disconnected subgraphs).

One choice that meets these criteria is the stable beta process (Teh and Görür, 2009). The stable beta process is a completely random measure (Kingman, 1967) whose atom sizes lie in $[0,1]$, parametrized by scalar parameters $\alpha > 0$, $\sigma \in [0, 1)$ and $c > -\sigma$, and a smooth, sigma-finite measure $H$ on the space of potential vertices. If $\sigma = 0$, this reduces to the (homogeneous) beta process (Hjort, 1990; Thibaux and Jordan, 2007). If $c = 1 - \sigma$, the stable beta process corresponds to a stable process with all atoms of size larger than one removed.

We can use the stable beta process to construct a distribution over cliques, which we can represent in terms of a binary matrix. If $B = \sum_i \pi_i \delta_{\theta_i}$ is a draw from the stable beta process, where each location $\theta_i$ corresponds to a vertex, we can sample an exchangeable sequence of cliques by including the $i$th vertex in the $k$th clique with probability $\pi_i$. We can represent the resulting clique allocation as a binary matrix $Z$ with infinitely many columns, where $z_{ki} = 1$ iff vertex $i$ is in clique $k$.

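If we marginalize out $B$, we obtain a predictive distribution for $Z$ that is known as the stable-beta Indian buffet process (SB-IBP, Teh and Görür, 2009). If $m_i$ is the number of times vertex $i$ has been previously selected, then the $k$th clique will include that feature (i.e. $z_{ki} = 1$) with probability

$$P(z_{ki} = 1 \mid m_i) = \frac{m_i - \sigma}{k - 1 + c},$$

where $\sigma$ and $c$ are the stability and concentration parameters of the stable beta process. In addition to features that have previously appeared, the $k$th clique will also select $J_k$ new vertices, where

$$J_k \sim \mbox{Poisson}\left(\alpha \frac{\Gamma(1+c)\,\Gamma(k-1+c+\sigma)}{\Gamma(k+c)\,\Gamma(c+\sigma)}\right).$$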
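The predictive scheme above can be simulated directly. The following is a minimal sketch rather than the authors' implementation. It assumes the standard SB-IBP predictive rule of Teh and Görür (2009): an existing vertex selected $m$ times joins clique $k$ with probability $(m - \sigma)/(k - 1 + c)$, and the number of new vertices is Poisson with rate $\alpha\,\Gamma(1+c)\Gamma(k-1+c+\sigma) / (\Gamma(k+c)\Gamma(c+\sigma))$. The function names and the hand-rolled Poisson sampler are hypothetical helpers chosen to keep the sketch dependency-free.

```python
import math
import random


def poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def new_vertex_rate(k, alpha, c, sigma):
    """Poisson rate for the number of new vertices in the k-th clique (k >= 1)."""
    log_rate = (math.lgamma(1 + c) + math.lgamma(k - 1 + c + sigma)
                - math.lgamma(k + c) - math.lgamma(c + sigma))
    return alpha * math.exp(log_rate)


def sample_sb_ibp(K, alpha, c, sigma, seed=0):
    """Return K cliques, each a list of integer vertex labels."""
    rng = random.Random(seed)
    counts = []   # counts[i] = number of cliques containing vertex i so far
    cliques = []
    for k in range(1, K + 1):
        # Previously seen vertices rejoin with probability (m - sigma) / (k - 1 + c).
        members = [i for i, m in enumerate(counts)
                   if rng.random() < (m - sigma) / (k - 1 + c)]
        # New vertices are appended with fresh labels.
        n_new = poisson(new_vertex_rate(k, alpha, c, sigma), rng)
        members.extend(range(len(counts), len(counts) + n_new))
        counts.extend([0] * n_new)
        for i in members:
            counts[i] += 1
        cliques.append(members)
    return cliques


cliques = sample_sb_ibp(K=50, alpha=3.0, c=1.0, sigma=0.5, seed=1)
```

Working with membership lists rather than a dense matrix avoids fixing the number of columns in advance, which suits a model in which the number of vertices is unbounded.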

The SB-IBP exhibits a number of power-law behaviors (Teh and Görür, 2009; Broderick et al., 2012; Heaukulani and Roy, 2015) that directly translate into desirable graph properties. The total number of non-zero columns of $Z$ (in our case, observed vertices) in $K$ rows (cliques) grows, as $K \to \infty$, as $O(K^{\sigma})$.

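In addition to having power-law behavior in the number of observed vertices, the number of non-zero entries per column (in our case, the number of cliques each vertex appears in) follows a Zipf’s law. Let $N_{K,j}$ be the number of features that appear exactly $j$ times in $K$ rows, and let $N_K$ be the total number of features. Then, following Broderick et al. (2012), for $\sigma \in (0, 1)$ and as $K \to \infty$,

$$N_{K,j} \sim \frac{\sigma\,\Gamma(j - \sigma)}{j!\,\Gamma(1 - \sigma)}\, N_K.$$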


As we will see in Section 3, when applied to edge clique covers, the SB-IBP yields many interesting graph-specific properties.

2.2 A model for partially observed cliques

In a modeling context, we may wish to consider the cliques as latent communities (Holland et al., 1983; Airoldi et al., 2008; Karrer and Newman, 2011). For example in a social network, a clique might represent a shared interest or hobby. In this context, we would not necessarily expect all vertices in the community to be connected. Instead, we can think of the community as being a noisy instantiation of a fully connected clique.

To model this in the current context, start with the distribution over random edge clique covers described in Section 2.1. If we associate each latent clique $k$ with a probability $\rho_k$, we can form a graph by including an edge between vertices $i$ and $j$ with probability

$$P(A_{ij} = 1 \mid Z) = 1 - \prod_k (1 - \rho_k)^{z_{ki} z_{kj}}.$$

If we use a single probability $\rho$ across all cliques, this corresponds to including edges according to a noisy-OR likelihood: if two vertices have $m$ latent cliques in common, they share an edge with probability $1 - (1 - \rho)^m$.

One way of thinking of this model is as a superposition of locally-defined Erdös-Rényi graphs. A $G(n, p)$ graph builds a graph with $n$ vertices, by including each of the $\binom{n}{2}$ potential edges with probability $p$. In the partially observed clique model, each row of the SB-IBP selects a clique of vertices, and builds a subgraph on those vertices according to a $G(n, p)$ model. Within the clique, we have a dense subgraph almost surely.
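The noisy-OR view and the superposition view describe the same distribution, and a short sketch makes this concrete. It assumes a single edge probability shared by all cliques, written `rho` here (a name chosen for illustration); the helper names and the toy cliques are likewise hypothetical.

```python
import random
from itertools import combinations


def edge_probability(cliques, i, j, rho):
    """Noisy-OR probability of an edge between i and j."""
    shared = sum(1 for members in cliques if i in members and j in members)
    return 1.0 - (1.0 - rho) ** shared


def sample_partial_graph(cliques, rho, seed=0):
    """Superpose one Erdos-Renyi graph per latent clique."""
    rng = random.Random(seed)
    edges = set()
    for members in cliques:
        # Each within-clique pair is kept independently with probability rho.
        for u, v in combinations(sorted(members), 2):
            if rng.random() < rho:
                edges.add((u, v))
    return edges


latent = [{0, 1, 2}, {1, 2, 3}]
p_shared_twice = edge_probability(latent, 1, 2, rho=0.3)   # two cliques in common
p_never = edge_probability(latent, 0, 3, rho=0.3)          # no clique in common
observed = sample_partial_graph(latent, rho=0.3, seed=0)
```

A pair with no latent clique in common can never be connected, which is what lets this likelihood preserve the sparsity of the underlying clique cover.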

3 Graph properties

In this section, we explore the statistical properties of graphs constructed using an SB-IBP distribution over cliques. In particular, we show that we can obtain sparse graphs that exhibit densely connected subgraphs. We focus on the fully observed setting described in Section 2.1, noting that the partially observed extension described in Section 2.2 will inherit similar properties.

3.1 Sparsity

Let $|E|$ be the number of edges in a graph, and $|V|$ be the number of vertices. We say our graph is dense if $|E|$ grows quadratically with $|V|$, and sparse if it grows sub-quadratically. To explore the sparsity of the graph, we will look at how $|E|$ and $|V|$ behave as the number of cliques $K$ grows.

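Let $J_k$ be the number of vertices first introduced in the $k$th clique, and $R_k$ be the number of previously introduced vertices included in the $k$th clique. We know (from the predictive distribution of the SB-IBP) that $J_k \sim \mbox{Poisson}\left(\alpha \frac{\Gamma(1+c)\Gamma(k-1+c+\sigma)}{\Gamma(k+c)\Gamma(c+\sigma)}\right)$ and, marginally, $J_k + R_k \sim \mbox{Poisson}(\alpha)$. So, the expected number of vertices after $K$ cliques (Broderick et al., 2012; Heaukulani and Roy, 2015) is

$$\mathbb{E}[|V|] = \sum_{k=1}^{K} \alpha \frac{\Gamma(1+c)\,\Gamma(k-1+c+\sigma)}{\Gamma(k+c)\,\Gamma(c+\sigma)}.$$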
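The expected number of vertices can be evaluated numerically, which gives a quick check on its growth rate. The sketch below assumes the standard SB-IBP rate for new vertices in the $k$th clique, $\alpha\,\Gamma(1+c)\Gamma(k-1+c+\sigma) / (\Gamma(k+c)\Gamma(c+\sigma))$, and sums it over cliques; the function names are hypothetical. Log-gamma is used to avoid overflow for large $k$.

```python
import math


def expected_new_vertices(k, alpha, c, sigma):
    """Expected number of vertices first introduced by clique k (k >= 1)."""
    log_rate = (math.lgamma(1 + c) + math.lgamma(k - 1 + c + sigma)
                - math.lgamma(k + c) - math.lgamma(c + sigma))
    return alpha * math.exp(log_rate)


def expected_vertices(K, alpha, c, sigma):
    """Expected number of distinct vertices after K cliques."""
    return sum(expected_new_vertices(k, alpha, c, sigma) for k in range(1, K + 1))


def growth_exponent(K, alpha, c, sigma):
    """Empirical exponent from doubling K; approaches sigma for large K when sigma > 0."""
    ratio = expected_vertices(2 * K, alpha, c, sigma) / expected_vertices(K, alpha, c, sigma)
    return math.log(ratio) / math.log(2)


small_case = expected_vertices(4, alpha=2.0, c=1.0, sigma=0.0)
exponent = growth_exponent(10_000, alpha=2.0, c=1.0, sigma=0.5)
```

With $\sigma = 0$ and $c = 1$ the rate reduces to $\alpha / k$, the familiar harmonic growth of the one-parameter Indian buffet process, which makes a convenient sanity check.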

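Conditioned on the probabilities $\pi_i$ and $\pi_j$ assigned by the stable beta process to vertices $i$ and $j$, the probability of an edge between $i$ and $j$ given $K$ cliques is $1 - (1 - \pi_i \pi_j)^K$. By linearity of expectation, the total expected number of edges, given $\{\pi_i\}$, is therefore $\sum_{i<j} \left(1 - (1 - \pi_i \pi_j)^K\right)$. Therefore, we can obtain the expected total number of edges in the graph as

$$\mathbb{E}[|E|] = \mathbb{E}\left[\sum_{i<j} \left(1 - (1 - \pi_i \pi_j)^K\right)\right],$$

where the outer expectation is taken over the stable beta process.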


In a related context, Cai et al. (2016) show, via a Poissonization approach, that this grows as $O(\min(K^{(1+\sigma)/2}, K^{3\sigma/2}))$ (for results for alternative choices of completely random measure, see Cai et al., 2016). Therefore, the expected number of edges grows as $O(|V|^{\min((1+\sigma)/(2\sigma),\, 3/2)})$, which is always sub-quadratic in the number of vertices. As we approach the limit where $\sigma \to 1$, we obtain the trivially sparse network where each clique contains a $\mbox{Poisson}(\alpha)$ number of vertices, and there are no edges connecting cliques.

We can validate this limiting behavior using simulations from the network. Figure 1 shows how $|E|$ and $|V|$ co-vary in random graphs, for different values of $\sigma$ and $c$. The scatter plots show $(|V|, |E|)$ pairs for ten network simulations, evaluated after each of 100 cliques is added. For high values of $\sigma$ we have a near-linear relationship between $|E|$ and $|V|$, indicating an extremely sparse graph. As $\sigma$ decreases the exponent increases and the level of sparsity decreases (although the graph never becomes dense). The concentration parameter $c$ controls the variance.

Figure 1: Number of edges vs number of vertices for a graph simulated with fixed $\alpha$ and varying values for $\sigma$ and $c$. Top: linear scale; bottom: log scale.
Figure 2: Network densities for graphs with varying values of , as the number of cliques increases.
Figure 3: Density for increasing values of , for , , and varying values of

3.2 Local density and clique distributions

Figure 4: Number of cliques each vertex belongs to, on a linear (left) and log (right) scale, for .
Figure 5: Average largest maximal clique per vertex, for a binary graph with , , and varying values of .

The sparsity of a graph is related to the proportion of the total number of cliques that a given vertex belongs to. Due to overlap, the random cliques represented by the clique membership matrix $Z$ do not represent all cliques in the resulting graph. Therefore, we distinguish between generating cliques and maximal cliques. By generating cliques, we mean the cliques represented in $Z$. By maximal cliques, we refer to the set of all cliques in the resulting graph that cannot be enlarged by including another vertex. While some cliques may be both generating and maximal, the two will not generally coincide. Since generating cliques are explicitly represented in our model, we can derive their statistical properties directly. The maximal cliques are only indirectly related to the generating model, so we explore their properties empirically.
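A small example shows how generating and maximal cliques can differ. The sketch below builds a graph from a set of generating cliques and enumerates its maximal cliques with a basic Bron-Kerbosch recursion (no pivoting). It is written in pure Python to stay self-contained, and the function names and the toy cover are hypothetical; a graph library would normally be used for larger graphs.

```python
from itertools import combinations


def graph_from_cliques(cliques):
    """Adjacency sets of the graph whose edge clique cover is `cliques`."""
    adj = {}
    for members in cliques:
        for v in members:
            adj.setdefault(v, set())
        for u, v in combinations(members, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            found.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return found


# Three generating 2-cliques form a triangle, plus one pendant edge.
generating = [frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2}), frozenset({2, 3})]
maximal = maximal_cliques(graph_from_cliques(generating))
both = set(generating) & set(maximal)
```

Here the triangle on vertices 0, 1 and 2 is maximal but not generating, the three edges that form it are generating but not maximal, and the pendant edge between 2 and 3 is both.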

We first consider the distribution over the number of generating cliques a vertex belongs to, conditioned on the fact that it appears in the graph and therefore belongs to at least one clique. As $\sigma \to 1$, the probability of a vertex appearing in more than one clique vanishes, and we end up with a graph with unconnected cliques. For $\sigma \in (0, 1)$, the proportion of vertices appearing in $j$ cliques decays as $O(j^{-1-\sigma})$ for large $j$ and $K$ (Teh and Görür, 2009; Broderick et al., 2012), indicating that as $\sigma$ increases, the expected proportion of cliques a vertex belongs to decreases and we see increasingly heavy-tailed distributions over the number of cliques a vertex belongs to. We can see this empirically in Figure 4, which shows box plots of the number of cliques that each vertex belongs to.

While the number of generating cliques each vertex belongs to controls the global sparsity, the size and degree of overlap of those generating cliques controls the local density. The size of the generating cliques is directly controlled by $\alpha$ and is marginally $\mbox{Poisson}(\alpha)$. The size of the largest maximal clique to which a vertex belongs is a function of both the size of the underlying generating cliques, and the extent to which these cliques overlap (which is controlled by $\sigma$). As $\sigma \to 1$, the generating cliques and the maximal cliques coincide; Figure 5 shows how, as $\sigma$ decreases, the average largest maximal clique for each vertex increases.

3.3 Degree distribution

The expected size of each generating clique is $\alpha$, and the number of cliques an edge belongs to follows a power law distribution. This leads to a power law distribution over the degree, with the expected degree decreasing with $\sigma$ and increasing with $\alpha$. In Figure 6, we can see empirically that the expected degree decreases with $\sigma$, approaching $\alpha$ as $\sigma \to 1$.
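The limiting value of the average degree can be checked with a short calculation. This is a sketch under the assumption that, in the limit $\sigma \to 1$, the graph consists of disjoint cliques with independent sizes $n_k \sim \mathrm{Poisson}(\alpha)$, so that every vertex in a clique of size $n_k$ has degree $n_k - 1$. Averaging over vertices then gives, as the number of cliques grows,

```latex
\begin{align*}
\text{average degree}
  = \frac{\sum_k n_k (n_k - 1)}{\sum_k n_k}
  \;\longrightarrow\;
  \frac{\mathbb{E}[n(n-1)]}{\mathbb{E}[n]}
  = \frac{\alpha^2}{\alpha}
  = \alpha,
  \qquad n \sim \mathrm{Poisson}(\alpha).
\end{align*}
```

The second factorial moment of a Poisson variable is $\alpha^2$, which is what makes the ratio collapse to $\alpha$; larger cliques contribute more vertices, so the average is size-biased rather than equal to $\alpha - 1$.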

Figure 6: Average degree versus $\sigma$, for three values of $\alpha$. Error bars show one standard deviation across ten repetitions. Dotted lines represent corresponding values of $\alpha$.


The top row of Figure 7 shows the degree distribution (in the form of a boxplot) for different values of and and different numbers of cliques . We see increasingly heavy tails as increases. Since the number of vertices also increases with , the maximum degree is higher for larger . The bottom row of Figure 7 shows the degree divided by the total number of vertices, i.e. the proportion of the other vertices to which a given vertex is connected. Here, it is much easier to compare different values of , and we see that the average proportion decreases and the distribution grows heavier tailed as increases, for all values of and .

Figure 7: Vertex degrees for different values of , and , for . Left: Raw degrees; Right: degree/number of vertices.

3.4 Density of the intersection graph of generating cliques

The clique graph of a graph is the intersection graph of its maximal cliques. As discussed above, our construction does not explicitly generate maximal cliques, so instead we will consider the generating-clique graph, i.e. the intersection graph of the generating cliques specified in $Z$. Perhaps surprisingly given the sparsity of the overall graph, this intersection graph is dense.

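To show this, consider the expected number of vertices by which two cliques overlap. This is given by $\mathbb{E}\left[\sum_i \pi_i^2\right]$. Campbell’s theorem tells us that $\mathbb{E}\left[\sum_{x \in \Pi} f(x)\right] = \int_S f(x)\,\nu(dx)$, where $\Pi$ is a Poisson process on $S$ with rate measure $\nu$ (in our case, the Lévy measure of the stable beta process, $\nu(d\pi) = \alpha \frac{\Gamma(1+c)}{\Gamma(1-\sigma)\Gamma(c+\sigma)} \pi^{-\sigma-1} (1-\pi)^{c+\sigma-1}\, d\pi$) and $f$ is a measurable function. Therefore the expected overlap between two cliques is

$$\int_0^1 \pi^2\, \nu(d\pi) = \alpha\, \frac{1 - \sigma}{1 + c}.$$

This indicates that two cliques overlap with positive probability, meaning that the resulting intersection graph is dense and that there is a path between any two vertices with positive probability, even if they do not belong to the same clique.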
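The expected overlap can be worked out explicitly. The derivation below is a sketch that assumes the Lévy measure of the stable beta process in the form given by Teh and Görür (2009), with mass $\alpha$, concentration $c$ and stability $\sigma$, and a normalized base measure. The integral is then a Beta function:

```latex
\begin{align*}
\mathbb{E}\Big[\sum_i \pi_i^2\Big]
  &= \int_0^1 \pi^2\,\nu(d\pi)
   = \alpha\,\frac{\Gamma(1+c)}{\Gamma(1-\sigma)\,\Gamma(c+\sigma)}
     \int_0^1 \pi^{(2-\sigma)-1}(1-\pi)^{(c+\sigma)-1}\,d\pi \\
  &= \alpha\,\frac{\Gamma(1+c)}{\Gamma(1-\sigma)\,\Gamma(c+\sigma)}
     \cdot \frac{\Gamma(2-\sigma)\,\Gamma(c+\sigma)}{\Gamma(2+c)}
   = \alpha\,\frac{1-\sigma}{1+c}.
\end{align*}
```

This agrees with the predictive rule: the first clique contains $\alpha$ vertices in expectation, and the second clique includes each of them with probability $(1 - \sigma)/(1 + c)$. The overlap is strictly positive for $\sigma < 1$ and vanishes as $\sigma \to 1$, matching the disconnected-cliques limit.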

4 Related work

The idea of a random edge clique cover was explored by Barber (2008), under the name Clique Matrices. As in this paper, a binary matrix $Z$ is used to represent cliques within the network; however, the number of vertices is fixed and the entries are i.i.d. Bernoulli random variables. Such a network is a special case of the model proposed here, but it lacks the sparsity properties and unbounded number of vertices that are discussed in Section 3. Barber (2008) also proposes a noisily-observed Clique Matrix model, where two vertices are connected with a probability given by a sigmoid function. Unlike the noisy-OR formulation for partially observed cliques proposed in Section 2.2, this model precludes sparse graphs, since it allows edges between vertices with no cliques in common. A variational inference procedure is proposed.

In recent years, a number of Bayesian nonparametric models for sparse graphs have been developed. Caron and Fox (2017) propose sampling multigraphs according to a Poisson process on the space of potential edges. The base measure of this Poisson process is the product measure $W \times W$ of a generalized gamma process $W$. The nonparametric nature of the generalized gamma process means the total number of edges is unbounded, and its power-law behaviors yield sparse networks. This work has been generalized by Veitch and Roy (2015) and Borgs et al. (2018) to include both sparse and dense graphs.

A related class of models are edge-exchangeable networks. Most models in this class build multigraphs by repeatedly sampling a single edge from a probability distribution on the space of pairs of potential vertices that can be decomposed into either the product measure of a single nonparametric probability measure $P$ on the space of potential vertices (Cai et al., 2016; Crane and Dempsey, 2017) or the product measure of two hierarchically coupled probability measures (Williamson, 2016). In the former setting, if $P$ is a normalized generalized gamma process, the resulting multigraph corresponds to the conditional distribution of the Caron and Fox model, given the total number of edges.

Like the models proposed in this paper, these edge exchangeable models can be interpreted as generating a random edge clique cover. Multigraphs are constructed by repeatedly sampling cliques consisting of one or two vertices. By contrast, the models proposed in this paper will generate cliques of size $\mbox{Poisson}(\alpha)$. While this does not mean that all maximal cliques will have size 2, the smaller building-block cliques in the edge exchangeable models will tend to give graphs with smaller cliques than those described in this paper.

One variant of the edge exchangeable graphs that is particularly relevant to our work is the multiple edges per step edge-exchangeable network proposed by Cai et al. (2016). Here, rather than specify a probability distribution on the space of potential vertices, the authors specify an arbitrary random measure with atom sizes $w_i$ on that space, and use this to parametrize a distribution over edges. At each step a random number of edges are added, with edge $(i, j)$ selected with probability $w_i w_j$. If, as is given as an example in Cai et al. (2016), the $w_i$ are the atom sizes of a stable beta process, then the expected total number of edges and vertices, and the expected number of edges between two vertices conditioned on their associated $w_i$, coincide with the model described in this paper, due to linearity of expectation. The difference arises because the multiple-edge exchangeable model generates edges independently at each step. In our model, within each clique, the edges are dependent, giving rise to dense subgraphs. Despite having similar expected sparsity, the multi-step edge-exchangeable model does not have the same local density properties as our model.

Further, by representing edge probabilities using a product of stable beta processes, this edge exchangeable model lacks the conjugacy that is present when we represent clique inclusion probabilities using a single stable beta process. As a result, posterior inference in the multiple-edge stable beta model poses a significant challenge, and has not yet been addressed in the literature.

5 Posterior inference

We infer the clique matrix $Z$ underlying a random graph using a reversible jump MCMC algorithm that proposes either splitting or merging cliques. In the partially observed setting, we augment this split/merge procedure with Gibbs steps that help learn the fine-grained structure of the clique cover; we found that adding Gibbs steps had little benefit in the fully observed setting. We can either optimize the hyperparameters, or place a prior on them and infer their posterior distribution using Metropolis-Hastings proposals.


In the fully observed graph setting, the likelihood is a step function that is one iff $\min(Z^\top Z, 1) = A$, so a sampler can only make moves that do not change the sparsity pattern of $Z^\top Z$. Because of this, the clique matrix must be initialized to values compatible with $A$. One way of doing this is to initialize $Z$ to the set of 2-cliques corresponding to the edges of the graph. An alternative approach is to use an edge clique cover algorithm; however, this may be infeasible for large graphs since finding a minimal cover is NP-hard. For the partially observed graph, it suffices to choose an initialization so that $\min(Z^\top Z, 1)$ covers all edges in the graph.
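The 2-clique initialization and the compatibility condition can be sketched as follows. This is illustrative only: the helper names are hypothetical, cliques are stored as sets of vertex labels rather than as rows of a binary matrix, and compatibility is taken to mean that the edges implied by the cover equal the observed edge set exactly.

```python
from itertools import combinations


def normalise(edges):
    """Represent undirected edges as sorted tuples."""
    return {tuple(sorted(e)) for e in edges}


def init_two_cliques(edges):
    """One generating clique per observed edge."""
    return [frozenset(e) for e in sorted(normalise(edges))]


def edges_covered(cliques):
    """All vertex pairs that share at least one clique."""
    return {pair for members in cliques
            for pair in combinations(sorted(members), 2)}


def is_compatible(cliques, edges):
    """True iff the cover reproduces exactly the observed edge set."""
    return edges_covered(cliques) == normalise(edges)


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
start = init_two_cliques(edges)
coarser = [frozenset({0, 1, 2}), frozenset({2, 3})]
too_coarse = [frozenset({0, 1, 2, 3})]
```

The 2-clique cover is always compatible but uses as many cliques as there are edges. A coarser cover such as `coarser` is also compatible, whereas `too_coarse` would add the unobserved edges (0, 3) and (1, 3) and so has zero likelihood in the fully observed setting.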

Split/merge sampling for $Z$

To explore the space of clique covers, we propose split/merge moves that either split a large clique into two potentially overlapping cliques, or merge two cliques into a single clique. We select an edge $(i, j)$ of the graph uniformly at random, and then for each vertex associated with that edge, we select a clique uniformly from the set of cliques it belongs to; call these $k_i$ and $k_j$. If $k_i = k_j$, we propose replacing the clique with two new cliques. For each non-zero entry in the original clique, we consider the possible settings of the corresponding entries in the two new cliques. We exclude any settings that are incompatible with the network structure, and select uniformly between the other options.

If $k_i \neq k_j$, we propose replacing the cliques $k_i$ and $k_j$ with their union. We note that the resulting proposal could have a likelihood of zero, due to introducing edges that do not appear in the network.
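The merge move and its zero-likelihood case can be illustrated with a small sketch. It covers only the proposal and the feasibility check in the fully observed setting, not the reversible jump acceptance probability; the function names are hypothetical and cliques are again stored as sets of vertex labels.

```python
from itertools import combinations


def merge_cliques(cliques, a, b):
    """Replace cliques a and b by their union, keeping all other cliques."""
    merged = cliques[a] | cliques[b]
    kept = [c for idx, c in enumerate(cliques) if idx != a and idx != b]
    return kept + [merged]


def merge_is_feasible(cliques, a, b, edges):
    """A merge has non-zero likelihood only if the union is a clique of the graph."""
    edge_set = {frozenset(e) for e in edges}
    merged = cliques[a] | cliques[b]
    return all(frozenset(pair) in edge_set for pair in combinations(merged, 2))


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
cliques = [frozenset(e) for e in edges]  # 2-clique initialization

ok = merge_is_feasible(cliques, 0, 1, edges)    # {0,1} with {1,2}
bad = merge_is_feasible(cliques, 0, 3, edges)   # {0,1} with {2,3}
proposal = merge_cliques(cliques, 0, 1)
```

Merging the cliques on (0, 1) and (1, 2) is feasible because the edge (0, 2) is observed; merging (0, 1) with (2, 3) would introduce the unobserved edge (0, 3), so that proposal would always be rejected.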

We accept or reject the split or merge by calculating the appropriate reversible jump MCMC acceptance probability.

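Gibbs sampling step for $Z$

For $(k, i)$ such that $m_i^{-k} > 0$, where $m_i^{-k}$ is the number of cliques other than $k$ that contain vertex $i$, we can augment the split/merge proposals with Gibbs steps for $z_{ki}$ that sample from the conditional distribution

$$P(z_{ki} = 1 \mid Z_{-ki}, A) \propto \frac{m_i^{-k} - \sigma}{K - 1 + c}\, P(A \mid Z),$$

where $P(A \mid Z)$ is the likelihood of the graph given the proposed clique matrix.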

We can then sample the number of all-zero cliques using a Metropolis-Hastings proposal, proposing a new number of all-zero cliques from an appropriate distribution and calculating the corresponding acceptance probability. We found adding Gibbs sampling steps did not improve mixing in the fully observed setting (where the set of compatible clique matrices is more constrained), but was important in the partially observed setting.

6 Experimental evaluation

We explore properties of our distribution over graphs in both the fully observed and the partially observed setting.

6.1 Fully observed graph setting

We begin by ascertaining how well the fully observed model can capture statistical properties of real-world graphs. We consider the following graph statistics: ratio of edges to vertices; density (defined as $|E| / \binom{|V|}{2}$); degree distribution; average local clustering coefficient (Watts and Strogatz, 1998); and the distribution over the largest maximal clique a vertex belongs to. We use the NetworkX Python package (Hagberg et al., 2008) to calculate clustering coefficients and maximal cliques.
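For concreteness, the scalar statistics listed above can be computed as in the following sketch. It is written in pure Python so that it is self-contained and is not the code used for the experiments, which relied on NetworkX. The function name is hypothetical, density is taken as edges over possible edges, and vertices with fewer than two neighbours are assumed to contribute a clustering coefficient of zero.

```python
from itertools import combinations


def graph_statistics(adj):
    """adj maps each vertex to the set of its neighbours (undirected, no self-loops)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    local = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            local.append(0.0)  # assumption: degree < 2 contributes 0
            continue
        # Count connected pairs among the neighbours of v.
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        local.append(2.0 * closed / (k * (k - 1)))
    return {
        "edges_per_vertex": m / n,
        "density": 2.0 * m / (n * (n - 1)),
        "mean_degree": 2.0 * m / n,
        "avg_clustering": sum(local) / n,
    }


# Triangle on vertices 0, 1, 2 with a pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
stats = graph_statistics(adj)
```

In this toy graph the two triangle-only vertices have clustering coefficient 1, vertex 2 has 1/3 and the pendant vertex has 0, so the average reflects both the dense triangle and the sparse attachment.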

                      Truth   RCC   BNPGraph
edges/vertices        1.74
1000 × density        1.29
mean degree           4.16
mean max. clique      3.50
clustering coeff.     0.59
(a) NIPS collaborations.

                      Truth   RCC   BNPGraph
edges/vertices        2.29
1000 × density        2.0
mean degree           4.57
mean max. clique      4.28
clustering coeff.     0.81
(b) IMDB comedies.

                      Truth   RCC   BNPGraph
edges/vertices        2.77
1000 × density        1.06
mean degree           5.53
mean max. clique      4.82
clustering coeff.     0.53
(c) ArXiv GR-QC.

                      Truth   RCC   BNPGraph
edges/vertices        3.66
1000 × density        0.63
mean degree           7.33
mean max. clique      3.17
clustering coeff.     0.22
(d) Enron emails.
Table 1: Empirical evaluation of statistical properties of random graphs generated from the parameters learned from real-world graphs. The parameters are taken from posterior MCMC draws.
(a) NIPS collaborations (1987-2003).
(b) IMDB comedies cast (2000-2002).
(c) ArXiv GR-QC (up to 2003).
(d) ENRON (1 month of data)
Figure 8: Empirical degree distributions and maximal clique distributions
Clique Mean dist. Authors
1 2.02 Wainwright(1), Ihler(3), Steck(3), Freeman(2), Jaakkola(2), Tappen(3), Weiss(1), Adelson(3), Willsky(2), Sudderth(2)
2 3.57 Moore(8), Pelleg(9), Liu(9), Neill(9), Gray A.(9), Gray A. G.(7)
3 1.76 Russell(1), Griffiths(1), Tenenbaum(1), Ng(1), Nguyen(1), Kim(1), Blei(1), Xing(1), Sanjana(2), Sastry(1), Karp(1)
4 4.73 Pickup (2), Fukumizu, Akaho (2), Bach, Tarassenko (2), Amari (2), Hughes (2), Zisserman (2), Roberts (2)
5 3.09 Zheng(1), Frey(3), Southey(5), Moran(3), Patrascu(3), Jaakkola(2), Schuurmans(4), Ng(1), Liblit(1), Monteleoni(3), Aiken(1), Ghodsi(5)
6 2.98 Greensmith(2), Baxter(2), Taycher(4), Frean(2), McAuliffe(1), Bartlett(1), Mason(2), Fisher(3), Darrell(3)
7 3.81 Sallans(6), Mayraz(6), Todorov(1), Paccanaro(6), Brown(6), Hinton(5)
8 2.07 Lanckriet(1), Bhattacharyya(1), Sundararajan(3), Keerthi(2), Ghaoui(1), el-Ghaoui(1), Nilim(2)
Table 2: Meta-analysis of the eight cliques of Michael Jordan. Shortest-path distance to Michael Jordan is shown in parentheses. Mean dist. is the mean path length between clique members.

To determine how well-suited our model is for modeling real graphs, we consider four real-world graphs:

  • The co-authorship graph for NIPS publications from 1987-2003.

  • The largest connected component of the co-appearance graph generated from IMDB cast lists of comedies from 2000-2002.

  • The co-authorship graph for arXiv publications in the category GR-QC (General Relativity and Quantum Cosmology) from 2003 (Leskovec et al., 2007).

  • The full interaction graph from a single month (January 2000) of the ENRON dataset (Klimt and Yang, 2014).

We model each graph using our proposed (fully observed) model (labeled RCC in tables and figures), and learn the hyperparameters using maximum likelihood. We then sample 25 graphs using these hyperparameters, and compare the average statistics of the sampled graphs with those of the original graph in Table 1 and Figure 8. For comparison, we also model the real-world graphs using the sparse graph model of Caron and Fox (2017) (labeled BNPGraph in tables and figures), which can capture graph sparsity, a power-law degree distribution, and an unbounded number of vertices.

We see that both models do a good job at capturing sparsity and degree distribution. However, the Caron & Fox model does not capture the locally dense structure. While simulations from our model have similar clustering coefficients to the real graphs, the Caron & Fox model has near-zero clustering coefficients. Our simulations also exhibit larger maximal cliques. While the maximal clique distribution for the IMDB dataset seems to overestimate the number of cliques, we believe this is an artifact of the data. The dataset only includes the four highest billed actors for each movie, artificially deflating clique sizes.

6.2 Partially observed graph setting

The partially observed setting offers the ability to infer a latent graph (and corresponding set of latent cliques) underlying an observed graph. Intuitively, this latent graph is likely to capture latent community-type structure. To evaluate this empirically, we modeled the NIPS co-authorship dataset described above using a partially observed model. We consider publications within the period 1999-2003, assume a shared clique parameter $\rho$, and infer hyperparameters using Metropolis-Hastings. Figure 9 shows the latent structure found (based on a single sample from the posterior).

Posterior draws for the shared clique probability $\rho$ concentrated around 0.3, robustly with respect to hyperparameter specification. Thus, roughly 70% of the edges in the inferred graph are not in the original graph. As expected, given the short period of analysis, 445 out of the 516 authors under consideration belonged to only one clique, and only eleven of them to more than three. The average clique size was 10.82. Not surprisingly, the author who appeared in eight cliques was Michael Jordan. The members of these eight cliques are shown in Table 2. It is worth mentioning that case-by-case inspection shows that in some cases, inferred edges were in fact collaborations outside the period of observation.

(a) Original
(b) Inferred with partial model
Figure 9: Partially observed graph inference; new edges are shown in red.

7 Discussion

We have presented a new class of Bayesian nonparametric priors for graphs, based on random clique selection mechanisms, that is appropriate for many real-world graphs. We have presented some preliminary work in a modeling context; we hope to see this form of random graph-based model explored further in future.

While we base our model on the stable beta process, alternative subset selection mechanisms could be proposed. For example, a restricted stable beta IBP (Williamson et al., 2013; Utkovski et al., 2018) could give more explicit control over the number of cliques. We leave exploration of alternative mechanisms as future work.


References

  • Airoldi et al. [2008] Edoardo M Airoldi, David M Blei, Stephen E Fienberg, and Eric P Xing. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9(Sep):1981–2014, 2008.
  • Albert and Barabási [2002] Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47, 2002.
  • Barber [2008] David Barber. Clique matrices for statistical graph decomposition and parameterising restricted positive definite matrices. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 26–33, 2008.
  • Bloem-Reddy and Orbanz [2016] Benjamin Bloem-Reddy and Peter Orbanz. Random walk models of network formation and sequential Monte Carlo methods for graphs. arXiv preprint arXiv:1612.06404, 2016.
  • Borgs et al. [2018] Christian Borgs, Jennifer T Chayes, Henry Cohn, and Nina Holden. Sparse exchangeable graphs and their limits via graphon processes. Journal of Machine Learning Research, 18(210):1–71, 2018.
  • Broderick et al. [2012] Tamara Broderick, Michael I Jordan, and Jim Pitman. Beta processes, stick-breaking and power laws. Bayesian Analysis, 7(2):439–476, 2012.
  • Cai et al. [2016] Diana Cai, Trevor Campbell, and Tamara Broderick. Edge-exchangeable graphs and sparsity. In Advances in Neural Information Processing Systems, pages 4249–4257, 2016.
  • Caron and Fox [2017] François Caron and Emily B Fox. Sparse graphs using exchangeable random measures. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(5):1295–1366, 2017.
  • Crane and Dempsey [2017] Harry Crane and Walter Dempsey. Edge exchangeable models for interaction networks. Journal of the American Statistical Association, 2017.
  • Dangalchev [2004] Chavdar Dangalchev. Generation models for scale-free networks. Physica A: Statistical Mechanics and its Applications, 338(3-4):659–671, 2004.
  • Erdös and Rényi [1960] Paul Erdös and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5(1):17–60, 1960.
  • Gilbert [1959] Edgar N Gilbert. Random graphs. The Annals of Mathematical Statistics, 30(4):1141–1144, 1959.
  • Hagberg et al. [2008] Aric Hagberg, Pieter Swart, and Daniel A Schult. Exploring network structure, dynamics, and function using NetworkX. Technical report, Los Alamos National Lab. (LANL), Los Alamos, NM (United States), 2008.
  • Heaukulani and Roy [2015] Creighton Heaukulani and Daniel M Roy. Gibbs-type Indian buffet processes. arXiv preprint arXiv:1512.02543, 2015.
  • Hjort [1990] Nils Lid Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. The Annals of Statistics, pages 1259–1294, 1990.
  • Holland et al. [1983] Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, 1983.
  • Karrer and Newman [2011] Brian Karrer and Mark EJ Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1):016107, 2011.
  • Kingman [1967] John Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.
  • Klimt and Yang [2014] Bryan Klimt and Yiming Yang. Introducing the Enron corpus. In CEAS, 2014.
  • Leskovec et al. [2007] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2, 2007.
  • Lloyd et al. [2012] James Lloyd, Peter Orbanz, Zoubin Ghahramani, and Daniel M Roy. Random function priors for exchangeable arrays with applications to graphs and relational data. In Advances in Neural Information Processing Systems, pages 998–1006, 2012.
  • Nešetřil and Ossona de Mendez [2012] Jaroslav Nešetřil and Patrice Ossona de Mendez. Sparsity: Graphs, Structures, and Algorithms. Springer, 2012.
  • Orlin [1977] James Orlin. Contentment in graph theory: Covering graphs with cliques. In Indagationes Mathematicae (Proceedings), volume 80, pages 406–424. Elsevier, 1977.
  • Teh and Görür [2009] Yee W Teh and Dilan Görür. Indian buffet processes with power-law behavior. In Advances in Neural Information Processing Systems, pages 1838–1846, 2009.
  • Thibaux and Jordan [2007] Romain Thibaux and Michael I Jordan. Hierarchical beta processes and the Indian buffet process. In Artificial Intelligence and Statistics, pages 564–571, 2007.
  • Utkovski et al. [2018] Zoran Utkovski, Melanie F Pradier, Viktor Stojkoski, Fernando Perez-Cruz, and Ljupco Kocarev. Economic complexity unfolded: Interpretable model for the productive structure of economies. PLoS ONE, 13(8):e0200822, 2018.
  • Veitch and Roy [2015] Victor Veitch and Daniel M Roy. The class of random graphs arising from exchangeable random measures. arXiv preprint arXiv:1512.03099, 2015.
  • Watts and Strogatz [1998] Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440, 1998.
  • Williamson [2016] Sinead A Williamson. Nonparametric network models for link prediction. Journal of Machine Learning Research, 17(1):7102–7121, 2016.
  • Williamson et al. [2013] Sinead A Williamson, Steve N MacEachern, and Eric P Xing. Restricting exchangeable nonparametric distributions. In Advances in Neural Information Processing Systems, pages 2598–2606, 2013.