On the Power of Edge Independent Graph Models

Why do many modern neural-network-based graph generative models fail to reproduce typical real-world network characteristics, such as high triangle density? In this work we study the limitations of edge independent random graph models, in which each edge is added to the graph independently with some probability. Such models include both the classic Erdös-Rényi and stochastic block models, as well as modern generative models such as NetGAN, variational graph autoencoders, and CELL. We prove that subject to a bounded overlap condition, which ensures that the model does not simply memorize a single graph, edge independent models are inherently limited in their ability to generate graphs with high triangle and other subgraph densities. Notably, such high densities are known to appear in real-world social networks and other graphs. We complement our negative results with a simple generative model that balances overlap and accuracy, performing comparably to more complex models in reconstructing many graph statistics.




1 Introduction

Our work centers on edge independent graph models, in which each edge $(i,j)$ is added to the graph independently with some probability $P_{ij}$. Formally,

Definition 1 (Edge Independent Graph Model).

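For any symmetric matrix $P \in [0,1]^{n\times n}$, let $\mathcal{G}(P)$ be the distribution over undirected unweighted graphs where $G \sim \mathcal{G}(P)$ contains edge $(i,j)$ independently, with probability $P_{ij}$. I.e., $\Pr[(i,j) \in E(G)] = P_{ij}$.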
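A minimal sketch of sampling from $\mathcal{G}(P)$ in Definition 1, written with NumPy (which the authors' code also uses). The function name `sample_edge_independent` is hypothetical, and we assume that diagonal entries of $P$ are self-loop probabilities; this is an illustration, not the authors' implementation.

```python
import numpy as np


def sample_edge_independent(P, rng=None):
    """Draw one graph G ~ G(P) and return its symmetric 0/1 adjacency matrix.

    Each unordered pair {i, j} with i <= j is included independently with
    probability P[i, j]; diagonal entries are treated as self-loop probabilities.
    """
    P = np.asarray(P, dtype=float)
    if P.ndim != 2 or P.shape[0] != P.shape[1] or not np.allclose(P, P.T):
        raise ValueError("P must be a square symmetric matrix")
    rng = np.random.default_rng(rng)
    # One coin flip per entry, but only the upper triangle (with the diagonal)
    # is kept, so each unordered pair gets exactly one independent flip.
    upper = np.triu(rng.random(P.shape) < P)
    # Mirror the upper triangle to obtain an undirected (symmetric) graph.
    return (upper | upper.T).astype(np.int8)


# Example: an Erdos-Renyi graph on 200 nodes with edge probability 0.05.
A = sample_edge_independent(np.full((200, 200), 0.05), rng=0)
```

Flipping one coin per unordered pair, rather than per ordered pair, is what keeps the output symmetric while every edge remains independent of the others.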

Edge independent models encompass many classic random graph models. This includes the Erdös-Rényi model, where $P_{ij} = p$ for all $i, j$, for some fixed $p \in [0,1]$ Erdös and Rényi (1960). It also includes the stochastic block model, where $P_{ij} = p$ if nodes $i$ and $j$ are in the same community and $P_{ij} = q$ if they are in different communities, for some fixed $p, q \in [0,1]$ with $p > q$ Snijders and Nowicki (1997). Other examples include the Chung-Lu configuration model Chung and Lu (2002) and stochastic Kronecker graphs Leskovec et al. (2010).

Recently, significant attention has focused on graph generative models, which seek to learn a distribution over graphs that share similar properties with a given training graph or set of graphs. Many algorithms parameterize this distribution as an edge independent model or a closely related distribution. For example, NetGAN and the closely related CELL model both produce a matrix $P$ and then sample edges independently without replacement, with probabilities proportional to its entries, ensuring that at least one edge is sampled adjacent to each node Bojchevski et al. (2018); Rendsburg et al. (2020). Variational Graph Autoencoders (VGAE), GraphVAE, Graphite, and MolGAN are also all based on edge independent models Kipf and Welling (2016); Simonovsky and Komodakis (2018); De Cao and Kipf (2018); Grover et al. (2019).

Given their popularity in both classical and modern graph generative models, it is natural to ask:

How well suited are edge independent models to modeling real-world networks? Are they able to capture features such as power-law degree distributions, small-world properties, and high clustering coefficients (triangle densities)?

1.1 Impossibility Results for Edge Independent Models

In this work we focus on the ability of edge independent models to generate graphs with high triangle, or other small subgraph, densities. High triangle density (equivalently, a high clustering coefficient) is a well-known hallmark of real-world networks Watts and Strogatz (1998); Sala et al. (2010); Durak et al. (2012) and has been the focus of recent work exploring the power and limitations of edge independent graph models Seshadhri et al. (2020); Chanpuriya et al. (2020).

It is clear that edge independent models can generate triangle dense graphs. In particular, $P$ in Def. 1 can be set to the binary adjacency matrix of any undirected graph, and $\mathcal{G}(P)$ will generate that graph with probability $1$, no matter how triangle dense it is. However, this would not be a particularly interesting generative model: ideally, $\mathcal{G}(P)$ should generate a wide range of graphs. To capture this intuitive notion, we define the overlap of an edge independent model, which is closely related to the overlap criterion used to stop training in graph generative models Bojchevski et al. (2018); Rendsburg et al. (2020).

Definition 2 (Expected Overlap).

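For symmetric $P \in [0,1]^{n\times n}$, let $V(P) = \sum_{i,j} P_{ij}$ and
$$\mathrm{Ov}(P) = \frac{\mathbb{E}_{G_1, G_2 \sim \mathcal{G}(P)}\big[|E(G_1) \cap E(G_2)|\big]}{\mathbb{E}_{G \sim \mathcal{G}(P)}\big[|E(G)|\big]},$$
where $G_1$ and $G_2$ are drawn independently.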

That is, for any $P$, $\mathrm{Ov}(P)$ is the ratio of the expected number of edges shared by two graphs drawn independently from $\mathcal{G}(P)$ to the expected number of edges in a graph drawn from $\mathcal{G}(P)$. At one extreme, when $P$ is a binary adjacency matrix, $\mathrm{Ov}(P) = 1$, and our generative model has simply memorized a single graph. At the other, if $P_{ij} = p$ for all $i, j$ (i.e., $\mathcal{G}(P)$ is Erdös-Rényi), then $\mathrm{Ov}(P) = p$. This is the minimum possible overlap among all $P$ with the same expected number of edges.
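The overlap of Definition 2 can be computed in closed form, assuming edges are unordered pairs $\{i,j\}$ with $i \le j$ (diagonal pairs being self loops): by independence, the numerator is $\sum_{i\le j}P_{ij}^2$ and the denominator is $\sum_{i\le j}P_{ij}$. The sketch below (hypothetical helper names) computes this and compares it with a Monte Carlo estimate for the two extremes described above.

```python
import numpy as np


def expected_overlap(P):
    """Ov(P) from Definition 2, computed in closed form.

    Edges are unordered pairs {i, j} with i <= j. By independence of G1 and G2,
    E|E(G1) & E(G2)| = sum_{i<=j} P_ij^2 and E|E(G)| = sum_{i<=j} P_ij.
    """
    P = np.asarray(P, dtype=float)
    p = P[np.triu_indices(P.shape[0])]
    return (p ** 2).sum() / p.sum()


def monte_carlo_overlap(P, trials=200, rng=None):
    """Estimate Ov(P) by drawing `trials` pairs of independent graphs."""
    rng = np.random.default_rng(rng)
    P = np.asarray(P, dtype=float)
    p = P[np.triu_indices(P.shape[0])]
    shared = edges = 0
    for _ in range(trials):
        e1 = rng.random(p.size) < p  # edge indicators of G1
        e2 = rng.random(p.size) < p  # edge indicators of G2, drawn independently
        shared += np.count_nonzero(e1 & e2)
        edges += np.count_nonzero(e1)
    return shared / edges


P_er = np.full((40, 40), 0.2)                           # Erdos-Renyi: Ov(P) = p
A_fixed = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])   # binary matrix: Ov(P) = 1
```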

Our main result is that for any edge independent model with bounded overlap, $G \sim \mathcal{G}(P)$ cannot have too many triangles in expectation. In particular:

Theorem 1 (Main Result – Expected Triangles).

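For a graph $G$, let $\Delta(G)$ denote the number of triangles in $G$. Consider symmetric $P \in [0,1]^{n\times n}$. Then
$$\mathbb{E}_{G\sim\mathcal{G}(P)}[\Delta(G)] \le \frac{\sqrt{2}}{3}\cdot\big(\mathrm{Ov}(P)\cdot V(P)\big)^{3/2}.$$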

As an example, consider the setting where the distribution generates sparse graphs, with $V(P) = O(n)$. Theorem 1 shows that whenever $\mathrm{Ov}(P) = o(n^{-1/3})$, $\mathbb{E}[\Delta(G)] = o(n)$, i.e., the graph is very triangle sparse, with the number of triangles sublinear in the number of nodes. This verifies that an Erdös-Rényi graph cannot simultaneously achieve a linear number of edges (i.e., $V(P) = \Theta(n)$) and a super-linear number of triangles (i.e., $\mathbb{E}[\Delta(G)] = \omega(n)$) under our proposed lens of viewing generative models.

We extend Theorem 1 to give similar bounds for the density of squares and other $k$-cycles (Thm. 4), as well as for the global clustering coefficient (Thm. 6). In all cases we show that our bounds are tight: e.g., in the triangle case, there is indeed an edge independent model with $\mathbb{E}[\Delta(G)] = \Omega\big((\mathrm{Ov}(P)\cdot V(P))^{3/2}\big)$, matching the upper bound in Theorem 1.

1.2 Empirical Findings

Our theoretical results help explain why, despite performing well on a variety of other metrics, edge independent graph generative models have been reported to generate graphs with many fewer triangles and squares on average than the real-world graphs they are trained on. Rendsburg et al. (2020) test a suite of these models, including their own CELL model and the related NetGAN model Bojchevski et al. (2018). When trained on the Cora-ML graph, which has 2,802 triangles and 14,268 squares, none of these models generates graphs with more than 1,461 triangles and 6,880 squares on average. Similar gaps are observed for a number of other graphs. Rendsburg et al. also report that the triangle count increases as their notion of overlap (closely related to Def. 2) increases. Theorem 1 demonstrates that this underestimation of the triangle count, and its connection to overlap, are inherent to all edge independent models, no matter how refined the method used to learn the underlying probability matrix $P$.

While our theoretical results bound the performance of any edge independent model, there may still be variation in how specific models trade off overlap and realistic graph generation. To better understand this trade-off, we introduce two simple models with easily tunable overlap as baselines. One is based on reproducing the degree sequence of the original graph; the other, which is even simpler, is based on reproducing the volume. In both models, $P$ is a weighted average of the input graph adjacency matrix and a probability matrix of minimal complexity which matches either the input degrees or the volume. In the latter case, to match just the volume, we simply use an Erdös-Rényi graph. In the former case, to match the degree sequence, we introduce our own model, the odds product model; this model is similar to the Chung-Lu configuration model Chung and Lu (2002), but, unlike Chung-Lu, is able to match degree sequences of real-world graphs with high maximum degree. We find that these simple baselines are often competitive with more complex models like CELL in terms of matching key graph statistics, like triangle count and clustering coefficient, at similar levels of overlap.

1.3 Related Work

Existing impossibility results. Our work is inspired by that of Seshadhri et al. (2020), which also proves limitations on the ability of edge independent models to represent triangle dense graphs. They show that if $P = \max(0, \min(1, XX^\top))$, where $X \in \mathbb{R}^{n\times k}$ has few columns ($k \ll n$) and the max and min are applied entrywise, then $\mathcal{G}(P)$ cannot have many triangles adjacent to low-degree nodes in expectation. This setting arises commonly when $P$ is generated using low-dimensional node embeddings, represented by the rows of $X$. Chanpuriya et al. (2020) show that in a slightly more general model, where $P = \max(0, \min(1, XY^\top))$ with $X, Y \in \mathbb{R}^{n\times k}$, this lower bound no longer holds: $X$ and $Y$ can be chosen so that $P$ is the binary adjacency matrix of any graph with maximum degree $O(k)$, no matter how triangle dense that graph is. Thus, even such low-rank edge independent models can represent triangle dense graphs, by memorizing a single one. In the appendix, we prove a similar result when $P$ is generated from the CELL model of Rendsburg et al. (2020), which simplifies NetGAN Bojchevski et al. (2018).

Our results show that this trade-off between the ability to capture triangle density and memorization is inherent – even without any low-rank constraint, edge independent models with low overlap simply cannot represent graphs with high triangle or other small subgraph density.

It is well understood that specific edge independent models, e.g., Erdös-Rényi graphs, the Chung-Lu model, and stochastic Kronecker graphs, do not capture many properties of real-world networks, including high triangle density Watts and Strogatz (1998); Pinar et al. (2012). Our results can be viewed as a generalization of these observations to all edge independent models with low overlap. Despite the limitations of classic models, edge independent models are still very prevalent in today’s literature on graph generative models. Our more general results make clear the limitations of this approach.

Non-independent models. While edge independent models are very prevalent in the literature, many important models do not fit into this framework. Classic models include the Barabási–Albert and other preferential attachment models Barabási and Albert (1999), Watts–Strogatz small-world graphs Watts and Strogatz (1998), and random geometric graphs Dall and Christensen (2002). Many of these models were introduced directly in response to shortcomings of classic edge independent models, including their inability to produce high triangle densities.

More recent graph generative models include GraphRNN You et al. (2018) and a number of other works Li et al. (2018); Liao et al. (2019). Our impossibility results do not apply to such models, and in fact suggest that they may be preferable to edge independent models if a distribution over graphs with high triangle density is desired. A very interesting direction for future work would be to prove limitations on broad classes of non-independent models, and perhaps to understand exactly what type of correlation amongst edges is needed to generate graphs with both low overlap¹ and hallmark features of real-world networks.

¹ We note that for non-edge-independent models, the measure of overlap defined earlier should be adapted to take into account the order (permutation) of the vertices in the final graph. In particular, the overlap in this case should be its maximum value over all permutations of the vertex set.

2 Impossibility Results for Edge Independent Models

We now prove our main results on the limitations of edge independent models with bounded overlap. We start with a simple lemma that will be central in all our proofs.

Lemma 2.

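For any symmetric $P \in [0,1]^{n\times n}$,
$$\frac{\|P\|_F^2}{2\,V(P)} \;\le\; \mathrm{Ov}(P) \;\le\; \frac{2\,\|P\|_F^2}{V(P)}.$$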


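Let $\mathbb{1}_{ij}(G)$ be the indicator random variable that edge $(i,j)$ appears in the graph $G$, so that $\mathbb{E}[\mathbb{1}_{ij}(G)] = P_{ij}$. Counting each unordered pair $\{i,j\}$ with $i \le j$ once (diagonal pairs being self loops), linearity of expectation and the independence of $G_1$ and $G_2$ give
$$\mathrm{Ov}(P) = \frac{\sum_{i\le j}\mathbb{E}\big[\mathbb{1}_{ij}(G_1)\,\mathbb{1}_{ij}(G_2)\big]}{\sum_{i\le j}\mathbb{E}\big[\mathbb{1}_{ij}(G)\big]} = \frac{\sum_{i\le j}P_{ij}^2}{\sum_{i\le j}P_{ij}}.$$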

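The bounds follow since $P$ is symmetric: $\frac{1}{2}\|P\|_F^2 \le \sum_{i\le j}P_{ij}^2 \le \|P\|_F^2$ and $\frac{1}{2}V(P) \le \sum_{i\le j}P_{ij} \le V(P)$. Note that if $P$ is $0$ on the diagonal, i.e., there is no probability of self loops, then $\sum_{i\le j}P_{ij}^2 = \frac{1}{2}\|P\|_F^2$ and $\sum_{i\le j}P_{ij} = \frac{1}{2}V(P)$, so $\mathrm{Ov}(P) = \|P\|_F^2/V(P)$ exactly. ∎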
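As a sanity check on Lemma 2, the following sketch tests the inequalities $\|P\|_F^2/(2V(P)) \le \mathrm{Ov}(P) \le 2\|P\|_F^2/V(P)$ on random symmetric matrices, together with the identity $\mathrm{Ov}(P) = \|P\|_F^2/V(P)$ when the diagonal of $P$ is zero. These are the forms that follow from the overlap formula $\sum_{i\le j}P_{ij}^2 / \sum_{i\le j}P_{ij}$; the random test matrices and the helper names are our own assumptions, and a numerical check is of course not a proof.

```python
import numpy as np


def overlap_and_ratio(P):
    """Return (Ov(P), ||P||_F^2 / V(P)) for a symmetric P with entries in [0, 1]."""
    P = np.asarray(P, dtype=float)
    p = P[np.triu_indices(P.shape[0])]       # one entry per unordered pair i <= j
    ov = (p ** 2).sum() / p.sum()
    ratio = (P ** 2).sum() / P.sum()         # ||P||_F^2 / V(P)
    return ov, ratio


def check_lemma2(trials=1000, seed=0):
    """Count random symmetric P that violate either statement (expected: 0)."""
    rng = np.random.default_rng(seed)
    violations = 0
    for _ in range(trials):
        n = int(rng.integers(2, 12))
        M = rng.random((n, n))
        P = np.triu(M) + np.triu(M, 1).T     # symmetric, entries in [0, 1)
        ov, ratio = overlap_and_ratio(P)
        if not (ratio / 2 <= ov <= 2 * ratio):
            violations += 1
        np.fill_diagonal(P, 0.0)             # no self loops
        ov0, ratio0 = overlap_and_ratio(P)
        if not np.isclose(ov0, ratio0):      # expect Ov(P) = ||P||_F^2 / V(P)
            violations += 1
    return violations
```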

2.1 Triangles

Lemma 2 connects $\mathrm{Ov}(P)$ to $\|P\|_F^2$, and in turn to the eigenvalue spectrum of $P$, since $\|P\|_F^2 = \sum_{i=1}^n \lambda_i^2$, where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $P$. The expected number of triangles in $G \sim \mathcal{G}(P)$ can be written in terms of this spectrum as well, allowing us to relate overlap to this expected triangle count, and prove our main theorem (Theorem 1), restated below.

Theorem 1.

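For a graph $G$, let $\Delta(G)$ denote the number of triangles in $G$. Consider symmetric $P \in [0,1]^{n\times n}$. Then
$$\mathbb{E}_{G\sim\mathcal{G}(P)}[\Delta(G)] \le \frac{\sqrt{2}}{3}\cdot\big(\mathrm{Ov}(P)\cdot V(P)\big)^{3/2}.$$

By linearity of expectation,
$$\mathbb{E}[\Delta(G)] = \frac{1}{6}\sum_{\substack{i,j,k\\ \text{distinct}}} P_{ij}P_{jk}P_{ki} \le \frac{1}{6}\operatorname{tr}(P^3) = \frac{1}{6}\sum_{i=1}^n \lambda_i^3, \qquad (1)$$
where the inequality holds because the terms of $\operatorname{tr}(P^3)$ with repeated indices are nonnegative.

Letting $\lambda_{\max}$ denote the largest magnitude eigenvalue of $P$, we can in turn bound
$$\sum_{i=1}^n \lambda_i^3 \le |\lambda_{\max}|\cdot\sum_{i=1}^n \lambda_i^2 = |\lambda_{\max}|\cdot\|P\|_F^2.$$

Since $|\lambda_{\max}| \le \|P\|_F$, this gives via Lemma 2 (which implies $\|P\|_F^2 \le 2\,\mathrm{Ov}(P)\cdot V(P)$)
$$\sum_{i=1}^n \lambda_i^3 \le \|P\|_F^3 \le \big(2\,\mathrm{Ov}(P)\cdot V(P)\big)^{3/2}.$$

Combining this bound with (1), and noting that $\frac{1}{6}\cdot 2^{3/2} = \frac{\sqrt{2}}{3}$, completes the theorem. ∎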
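The quantities in Theorem 1 are easy to evaluate numerically. The sketch below computes $\mathbb{E}[\Delta(G)]$ exactly as $\operatorname{tr}(P_0^3)/6$, where $P_0$ is $P$ with its diagonal zeroed, and compares it with $\frac{\sqrt{2}}{3}(\mathrm{Ov}(P)\cdot V(P))^{3/2}$. That constant is our reading of the proof (the factor $\frac{1}{6}$ in (1) times $2^{3/2}$ from $\|P\|_F^2 \le 2\,\mathrm{Ov}(P)\cdot V(P)$), so treat it as an assumption; the helper names are hypothetical.

```python
import numpy as np


def expected_triangles(P):
    """E[Delta(G)] for G ~ G(P), i.e. the sum over triples i<j<k of P_ij P_jk P_ki.

    Self loops never lie on a triangle, so the diagonal is zeroed; then
    trace(P0^3) sums only over ordered triples of distinct nodes, and each
    triangle appears 6 times.
    """
    P0 = np.array(P, dtype=float)            # copy, so the caller's P is unchanged
    np.fill_diagonal(P0, 0.0)
    return np.trace(P0 @ P0 @ P0) / 6.0


def theorem1_bound(P):
    """(sqrt(2) / 3) * (Ov(P) * V(P))^(3/2), with Ov computed over pairs i <= j."""
    P = np.asarray(P, dtype=float)
    p = P[np.triu_indices(P.shape[0])]
    ov = (p ** 2).sum() / p.sum()
    return np.sqrt(2) / 3 * (ov * P.sum()) ** 1.5


rng = np.random.default_rng(0)
M = rng.random((8, 8))
P_rand = np.triu(M) + np.triu(M, 1).T        # random symmetric P in [0, 1]
P_er = np.full((30, 30), 0.1)                # Erdos-Renyi, the tight case
```

For the Erdös-Rényi matrix above, the bound exceeds the exact expectation by only a constant factor (about 3), which is the tightness discussed next.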

The bound of Theorem 1 is tight up to constants for any possible value of $\mathrm{Ov}(P)$. The tight example is when $\mathcal{G}(P)$ is simply an Erdös-Rényi graph.

Theorem 3 (Tightness of Expected Triangle Bound).

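For any $p \in (0,1]$ and $n \ge 3$, there exists a symmetric $P \in [0,1]^{n\times n}$ with $\mathrm{Ov}(P) = p$ and $\mathbb{E}_{G\sim\mathcal{G}(P)}[\Delta(G)] \ge \frac{1}{27}\cdot\big(\mathrm{Ov}(P)\cdot V(P)\big)^{3/2}$.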


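Let $P_{ij} = p$ for all $i, j$. We have $V(P) = pn^2$ and $\mathrm{Ov}(P) = \frac{\sum_{i\le j}p^2}{\sum_{i\le j}p} = p$. Thus, $\big(\mathrm{Ov}(P)\cdot V(P)\big)^{3/2} = p^3n^3$. Further, by linearity of expectation,
$$\mathbb{E}[\Delta(G)] = \binom{n}{3}p^3 = \frac{(n-1)(n-2)}{6n^2}\cdot\big(\mathrm{Ov}(P)\cdot V(P)\big)^{3/2} \ge \frac{1}{27}\cdot\big(\mathrm{Ov}(P)\cdot V(P)\big)^{3/2},$$
since $\frac{(n-1)(n-2)}{n^2}$ is increasing in $n$ and equals $\frac{2}{9}$ at $n = 3$. ∎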

We note that another example for which Theorem 1 is tight is when $\mathcal{G}(P)$ is the union of a fixed clique on a subset of the nodes and an Erdös-Rényi graph on the remaining nodes.

2.2 Squares and Other $k$-cycles

We can extend Thm. 1 to bound the expected number of $k$-cycles in $G \sim \mathcal{G}(P)$ in terms of $\mathrm{Ov}(P)$.

Theorem 4 (Bound on Expected $k$-cycles).

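For a graph $G$, let $C_k(G)$ denote the number of $k$-cycles in $G$. Consider symmetric $P \in [0,1]^{n\times n}$. Then, for any $k \ge 3$,
$$\mathbb{E}_{G\sim\mathcal{G}(P)}[C_k(G)] \le \frac{2^{k/2-1}}{k}\cdot\big(\mathrm{Ov}(P)\cdot V(P)\big)^{k/2}.$$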


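For notational simplicity, we focus on $k = 4$; the proof directly extends to general $k$. $C_4(G)$ is the number of non-backtracking 4-cycles in $G$ (i.e., squares), which can be written as
$$C_4(G) = \frac{1}{8}\sum_{\substack{a,b,c,d\\ \text{distinct}}} \mathbb{1}_{ab}(G)\,\mathbb{1}_{bc}(G)\,\mathbb{1}_{cd}(G)\,\mathbb{1}_{da}(G).$$

The factor $\frac{1}{8}$ accounts for the fact that in the sum, each square is counted $8$ times: once for each of its $4$ potential starting vertices and each of the $2$ directions in which it may be traversed. For general $k$-cycles this factor would be $\frac{1}{2k}$. We then can bound
$$\mathbb{E}[C_4(G)] = \frac{1}{8}\sum_{\substack{a,b,c,d\\ \text{distinct}}} P_{ab}P_{bc}P_{cd}P_{da} \le \frac{1}{8}\operatorname{tr}(P^4) = \frac{1}{8}\sum_{i=1}^n \lambda_i^4 \le \frac{1}{8}\|P\|_F^4.$$

For general $k$-cycles this bound would be $\frac{1}{2k}\|P\|_F^k$, using $\sum_i \lambda_i^k \le \sum_i |\lambda_i|^k \le |\lambda_{\max}|^{k-2}\sum_i \lambda_i^2 \le \|P\|_F^k$. This in turn gives
$$\mathbb{E}[C_4(G)] \le \frac{1}{8}\big(2\,\mathrm{Ov}(P)\cdot V(P)\big)^2 = \frac{1}{2}\big(\mathrm{Ov}(P)\cdot V(P)\big)^2,$$
where the last bound follows from Lemma 2. This completes the theorem. ∎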
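The expected number of squares can also be computed exactly, which makes the $k = 4$ case of Theorem 4 easy to check. In the sketch below, $\operatorname{tr}(P_0^4)$ (with the diagonal zeroed) counts closed 4-walks, and walks that revisit a node ($a = c$ or $b = d$) are removed by inclusion-exclusion before dividing by $8$. The bound $\frac{2^{k/2-1}}{k}(\mathrm{Ov}(P)\cdot V(P))^{k/2}$ is our reconstruction of Theorem 4 from its proof, and the helper names are hypothetical.

```python
import numpy as np


def expected_four_cycles(P):
    """E[C_4(G)] for G ~ G(P): (1/8) * sum over distinct a, b, c, d of P_ab P_bc P_cd P_da.

    With the diagonal zeroed, trace(P0^4) sums over closed 4-walks; we remove the
    walks that revisit a node (a == c or b == d) by inclusion-exclusion.
    """
    P0 = np.array(P, dtype=float)
    np.fill_diagonal(P0, 0.0)
    Q = P0 * P0                               # Q_ab = P_ab^2
    r = Q.sum(axis=1)                         # r_a = sum_b P_ab^2
    closed_walks = np.trace(np.linalg.matrix_power(P0, 4))
    # Walks with a == c contribute sum_a r_a^2, likewise b == d;
    # walks with both a == c and b == d contribute sum_{a,b} P_ab^4.
    revisiting = 2.0 * (r ** 2).sum() - (Q ** 2).sum()
    return (closed_walks - revisiting) / 8.0


def theorem4_bound(P, k=4):
    """(2^(k/2 - 1) / k) * (Ov(P) * V(P))^(k/2)."""
    P = np.asarray(P, dtype=float)
    p = P[np.triu_indices(P.shape[0])]
    ov = (p ** 2).sum() / p.sum()
    return 2 ** (k / 2 - 1) / k * (ov * P.sum()) ** (k / 2)


K4 = np.ones((4, 4)) - np.eye(4)             # complete graph K_4: exactly 3 squares
rng = np.random.default_rng(1)
M = rng.random((6, 6))
P_rand = np.triu(M) + np.triu(M, 1).T        # random symmetric P in [0, 1]
```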

It is not hard to see that Theorem 4 is also tight up to a constant depending on $k$, for any overlap $p \in (0,1]$, again via an Erdös-Rényi graph with connection probability $p$.

Theorem 5 (Tightness of Expected $k$-cycle Bound).

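For any $k \ge 3$, $p \in (0,1]$, and $n \ge 2k$, there exists a symmetric $P \in [0,1]^{n\times n}$ with $\mathrm{Ov}(P) = p$ and $\mathbb{E}_{G\sim\mathcal{G}(P)}[C_k(G)] \ge c_k\cdot\big(\mathrm{Ov}(P)\cdot V(P)\big)^{k/2}$, where $c_k > 0$ depends only on $k$.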

2.3 Clustering Coefficient

Theorem 1 shows that the expected number of triangles generated by an edge independent model is bounded in terms of the model’s overlap. Intuitively, we thus expect that graphs generated by the edge independent model will have low global clustering coefficient, which is the fraction of wedges in the graph that are closed into triangles Watts and Strogatz (1998).

Definition 3 (Global Clustering Coefficient).

For a graph $G$ with $\Delta(G)$ triangles, no self-loops, and node degrees $d_1, \ldots, d_n$, the global clustering coefficient is given by
$$C(G) = \frac{3\,\Delta(G)}{\sum_{i=1}^n \binom{d_i}{2}}.$$
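Definition 3 translates directly into matrix operations: $\operatorname{tr}(A^3) = 6\,\Delta(G)$ and $\sum_i d_i(d_i - 1) = 2\sum_i \binom{d_i}{2}$, so $C(G) = \operatorname{tr}(A^3)/\sum_i d_i(d_i - 1)$. A minimal sketch follows; returning $0$ for a graph with no wedges is our own convention, not part of the definition, and the function name is hypothetical.

```python
import numpy as np


def global_clustering(A):
    """C(G) = 3 * Delta(G) / sum_i binom(d_i, 2) for a simple undirected graph."""
    A = np.asarray(A, dtype=float)
    if np.any(np.diag(A) != 0):
        raise ValueError("Definition 3 assumes no self loops")
    d = A.sum(axis=1)
    six_triangles = np.trace(A @ A @ A)   # each triangle is a closed 3-walk 6 times
    two_wedges = (d * (d - 1)).sum()      # equals 2 * sum_i binom(d_i, 2)
    # Convention (our assumption): a graph with no wedges gets coefficient 0.
    return six_triangles / two_wedges if two_wedges > 0 else 0.0


# A triangle {0, 1, 2} plus a pendant node 3 attached to node 0.
A_demo = np.array([[0, 1, 1, 1],
                   [1, 0, 1, 0],
                   [1, 1, 0, 0],
                   [1, 0, 0, 0]])
```

In the example, one triangle and degrees $(3, 2, 2, 1)$ give $C(G) = 3 \cdot 1 / (3 + 1 + 1 + 0) = 0.6$.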

We extend Theorem 1 to give a bound on $\mathbb{E}[C(G)]$ in terms of $\mathrm{Ov}(P)$. The proof is related, but more complex due to the sum $\sum_i \binom{d_i}{2}$ in the denominator of $C(G)$.

Theorem 6 (Bound on Expected Clustering Coefficient).

Consider symmetric with zeros on the diagonal and with .


By Theorem 1 we have . We will show that with high probability, , which will give the theorem. Note that . Thus, by a Bernstein bound, for large enough since .

We can bound . Thus, with probability ,

where in the last step we use that and so . Combined with our bound on , and the fact that always, we have

Thus, to have a constant clustering coefficient for a graph with edges in expectation, we need . Note that the requirement of is very mild – it means that the expected average degree is at least .

As with our triangle bound, Theorem 6 is tight when is just an Erdös-Rényi distribution.

Theorem 7 (Tightness of Expected Clustering Coefficient Bound).

For any , there exists with zeros on the diagonal, and .


Let for all . We have and . Additionally, , and, if is large enough with respect to , with very high probability, . This gives:

3 Baseline Edge Independent Models

We now shift from proving theoretical limitations of edge independent models to empirically evaluating the trade-off between overlap and performance for a number of particular models. Given an input adjacency matrix $A$, these generative models produce a probability matrix $P$, samples from which should match various graph statistics of $A$, such as the triangle count, clustering coefficient, and assortativity. At the same time, $P$ should ideally have low overlap so that the model does not just memorize the original graph. We propose two simple generative models as baselines to more complicated existing models; in both, the level of overlap is easily tuned. Our first baseline, the odds product model, is based on just matching the degree sequence of $A$; simpler still, the second baseline computes $P$ as a linear function of $A$, just matching its volume.

Odds product model.

In this model, each node $i$ is assigned a logit $\ell_i \in \mathbb{R}$, and the probability of adding an edge between nodes $i$ and $j$ is $P_{ij} = \sigma(\ell_i + \ell_j)$, where $\sigma$ is the logistic function. We fit the model by finding a vector of logits $\ell \in \mathbb{R}^n$, with one logit for each node, such that the reconstructed network has the same expected degrees (i.e., row and column sums) as the original graph. We note that this model can be seen as a special case of the MaxEnt De Bie (2011) and inner-product Ma et al. (2020); Hoff (2003, 2005) models. In the context of directed graphs, $\ell_i$ has been called the expansiveness or popularity of node $i$ Goldenberg et al. (2010).

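For adjacency matrix $A \in \{0,1\}^{n\times n}$, we denote its degree sequence by $d = A\mathbf{1}$, where $\mathbf{1}$ is the all-ones vector of length $n$. Similarly, the degree sequence of the model is $\hat{d} = P\mathbf{1}$, where $P = \sigma(\ell\mathbf{1}^\top + \mathbf{1}\ell^\top)$. We pose fitting the model as a root-finding problem: we seek $\ell \in \mathbb{R}^n$ such that the degree errors are zero, that is, $\hat{d} - d = \mathbf{0}$. We use the multivariate Newton-Raphson method to solve this root-finding problem. To apply Newton-Raphson, we need the Jacobian matrix $J$ of derivatives of the degree errors with respect to the entries of $\ell$. Since $d$ does not vary with $\ell$, these derivatives are exactly $J_{ij} = \partial\hat{d}_i/\partial\ell_j$. Letting $\delta_{ij}$ be $1$ if $i = j$ and $0$ otherwise (i.e., the Kronecker delta),
$$J_{ij} = \frac{\partial \hat{d}_i}{\partial \ell_j} = \delta_{ij}\sum_{k=1}^n \sigma'(\ell_i + \ell_k) + \sigma'(\ell_i + \ell_j), \qquad \text{where } \sigma'(x) = \sigma(x)\big(1 - \sigma(x)\big).$$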
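The Jacobian described above can be written in matrix form as $J = \mathrm{diag}(S\mathbf{1}) + S$ with $S = P \odot (1 - P)$, since $\sigma'(x) = \sigma(x)(1 - \sigma(x))$. The sketch below (hypothetical function names, with SciPy's `expit` as $\sigma$) computes it and compares it against central finite differences of $\hat{d}(\ell) = P\mathbf{1}$. We assume the diagonal entries $\sigma(2\ell_i)$ count toward $\hat{d}$, so the $k = i$ term contributes $2\sigma'(2\ell_i)$ to $J_{ii}$.

```python
import numpy as np
from scipy.special import expit  # the logistic function sigma


def odds_product_matrix(logits):
    """P_ij = sigma(l_i + l_j) for all i, j (the diagonal is included)."""
    l = np.asarray(logits, dtype=float)
    return expit(l[:, None] + l[None, :])


def degree_jacobian(logits):
    """J_ij = d dhat_i / d l_j = delta_ij * sum_k S_ik + S_ij, with S = P * (1 - P)."""
    P = odds_product_matrix(logits)
    S = P * (1.0 - P)                     # sigma'(x) = sigma(x) * (1 - sigma(x))
    return np.diag(S.sum(axis=1)) + S


def finite_difference_jacobian(logits, h=1e-6):
    """Central differences of dhat(l) = P(l) 1, used only to check the formula."""
    l = np.asarray(logits, dtype=float)
    n = l.size
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (odds_product_matrix(l + e).sum(axis=1)
                   - odds_product_matrix(l - e).sum(axis=1)) / (2 * h)
    return J


l_demo = np.array([-2.0, -1.0, 0.5, 1.5, -0.3])   # arbitrary example logits
```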

In Algorithm 1, we provide pseudocode for computing the Jacobian matrix and for implementing the Newton-Raphson method to compute $\ell$. We do not have a proof that Algorithm 1 always converges and produces a $P$ which exactly reproduces the input degree sequence. However, the algorithm converged on all test cases, and proving that it always converges would be an interesting future direction.

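input graphical degree sequence $d \in \mathbb{Z}_{\ge 0}^n$, error threshold $\epsilon > 0$
output symmetric matrix $P \in (0,1)^{n\times n}$ with row/column sums approximately $d$

1: $\ell \leftarrow \mathbf{0}$ ▷ $\ell$ is the vector of logits, initialized to all zeros
2: $P \leftarrow \sigma(\ell\mathbf{1}^\top + \mathbf{1}\ell^\top)$ ▷ $\sigma$ is the logistic function applied entrywise, and $\mathbf{1}$ is the all-ones column vector of length $n$
3: $\hat{d} \leftarrow P\mathbf{1}$ ▷ degree sequence of $P$
4: while $\|\hat{d} - d\| > \epsilon$ do
5:     $S \leftarrow P \odot (1 - P)$ ▷ $\odot$ is an entrywise product
6:     $J \leftarrow \mathrm{diag}(S\mathbf{1}) + S$ ▷ diag is the diagonal matrix with the input vector along its diagonal
7:     $\ell \leftarrow \ell - J^{-1}(\hat{d} - d)$ ▷ rather than inverting $J$, we solve this linear system
8:     $P \leftarrow \sigma(\ell\mathbf{1}^\top + \mathbf{1}\ell^\top)$
9:     $\hat{d} \leftarrow P\mathbf{1}$
10: end while
11: return $P$
Algorithm 1 Fitting the odds product model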
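Algorithm 1 can be transcribed into a short NumPy/SciPy routine. This is a sketch under our own assumptions, not the authors' code (which is linked in Section 4.4): we use the Euclidean norm in the stopping test, add an iteration cap `max_iter` that the pseudocode does not have, and include the diagonal of $P$ in the degree sequence. The function name and the example degree sequence are hypothetical.

```python
import numpy as np
from scipy.special import expit  # logistic function sigma


def fit_odds_product(d, tol=1e-8, max_iter=100):
    """Newton-Raphson for logits l with sigma(l 1^T + 1 l^T) 1 = d.

    max_iter is a safeguard we added; the pseudocode simply loops until the
    error drops below the threshold. Returns the fitted P and the logits l.
    """
    d = np.asarray(d, dtype=float)
    l = np.zeros_like(d)                            # logits, initialized to all zeros
    P = expit(l[:, None] + l[None, :])              # sigma applied entrywise
    dhat = P.sum(axis=1)                            # degree sequence of P
    for _ in range(max_iter):
        if np.linalg.norm(dhat - d) <= tol:
            break
        S = P * (1.0 - P)                           # entrywise product
        J = np.diag(S.sum(axis=1)) + S              # Jacobian of dhat w.r.t. l
        l = l - np.linalg.solve(J, dhat - d)        # solve J x = dhat - d, no explicit inverse
        P = expit(l[:, None] + l[None, :])
        dhat = P.sum(axis=1)
    return P, l


# Degrees of a 20-node graph: a circulant graph with offsets 1, 2, 3 (degree 6)
# plus extra edges from node 0 to nodes 10..14 (hypothetical example data).
d_demo = np.array([11] + [6] * 9 + [7] * 5 + [6] * 5, dtype=float)
P_fit, l_fit = fit_odds_product(d_demo)
```

Solving the linear system instead of forming $J^{-1}$ follows the comment in the pseudocode. Because $\hat{d} - d$ is the gradient of a convex function of $\ell$, $J$ is positive definite here, and Newton steps from the all-zeros start behave well on moderate degree sequences like this one; as noted above, convergence is not proven in general.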

Our odds product model can be viewed as a variant of the Chung-Lu configuration model Chung and Lu (2002), which is also based on degree sequence matching. However, our model comes without a certain restriction on the maximum degree: in Chung-Lu, it is assumed that the degrees of all nodes are bounded above by the square root of the volume of the graph, that is, $d_i \le \sqrt{\sum_j d_j}$ for all nodes $i$. Given this restriction, each node $i$ is assigned a weight $w_i = d_i/\sqrt{\sum_j d_j}$, and the probability of adding edge $(i,j)$ is $P_{ij} = w_iw_j$. Since the weights are all in $[0,1]$, they can be interpreted as probabilities, and the probability of adding an edge between two nodes is the product of the two nodes’ probabilities.

Our odds product model works similarly, but instead of a probability, each node has an associated odds, that is, a value in $(0,\infty)$, and the odds of adding an edge between two nodes is the product of the two nodes’ odds. There is a one-to-one-to-one relationship between probability $p \in (0,1)$, odds $p/(1-p) \in (0,\infty)$, and logit $\log\frac{p}{1-p} \in \mathbb{R}$. We outlined above how our model is based on adding logits associated with each node; since the odds is the exponentiation of the logit, the model can equally be viewed as multiplying odds associated with nodes.
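To make the comparison with Chung-Lu concrete, the sketch below builds the Chung-Lu matrix $P_{ij} = d_id_j/\sum_k d_k$ for a degree sequence that violates the condition $d_i \le \sqrt{\sum_j d_j}$, and checks the identity $\frac{\sigma(\ell_i+\ell_j)}{1-\sigma(\ell_i+\ell_j)} = e^{\ell_i}e^{\ell_j}$ behind the odds product model. The example graph ($K_{2,5}$), the logits, and the helper names are our own choices.

```python
import numpy as np
from scipy.special import expit  # logistic function sigma


def chung_lu_matrix(d):
    """Chung-Lu: w_i = d_i / sqrt(vol), P_ij = w_i * w_j (valid only if max d_i <= sqrt(vol))."""
    d = np.asarray(d, dtype=float)
    w = d / np.sqrt(d.sum())
    return np.outer(w, w)


def odds(p):
    """Odds p / (1 - p) of a probability p in (0, 1)."""
    return p / (1.0 - p)


# Complete bipartite graph K_{2,5}: two hubs of degree 5 > sqrt(vol) = sqrt(20).
d_k25 = np.array([5, 5, 2, 2, 2, 2, 2], dtype=float)
P_cl = chung_lu_matrix(d_k25)

# Odds product: the odds of edge (i, j) equal exp(l_i) * exp(l_j) for any logits.
l = np.array([-1.0, 0.0, 2.0, 3.5])
P_op = expit(l[:, None] + l[None, :])
```

Chung-Lu still reproduces the degrees in expectation here, but its hub-to-hub entry is $25/20 = 1.25$, which is not a probability; the odds product model cannot produce such entries because $\sigma$ always maps into $(0,1)$.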

Varying overlap in the odds product model. We propose a simple method to control the trade-off between overlap and accuracy in matching the input graph statistics in the odds product model. Given the original adjacency matrix $A$ and the $P$ generated by the odds product model to match the degree sequence of $A$, we use a convex combination of $A$ and $P$. That is, we use $\tilde{P} = \alpha A + (1-\alpha)P$, where $\alpha \in [0,1]$. As $\alpha$ increases to $1$, $\tilde{P}$ approaches a model which returns the original graph with high certainty; hence high values of $\alpha$ produce a $\tilde{P}$ with high overlap which closely matches graph statistics, while low values produce a $\tilde{P}$ with lower overlap which may diverge from $A$ in some statistics. Note that since $\tilde{P}$ is a convex combination of matrices with the expected degree sequence of $A$, $\tilde{P}$ also has the same expected degree sequence regardless of the value of $\alpha$.
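The convex combination is a one-liner; the sketch below also checks that it preserves expected degrees. As a stand-in for a fitted odds product matrix we use a $4$-regular graph: for a $d$-regular graph all fitted logits are equal, so the odds product $P$ is the constant matrix $d/n$ (diagonal included). The graph and the helper names are hypothetical.

```python
import numpy as np


def convex_combination(A, P, alpha):
    """alpha * A + (1 - alpha) * P for alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * np.asarray(A, dtype=float) + (1.0 - alpha) * np.asarray(P, dtype=float)


def overlap(P):
    """Ov(P), counting unordered pairs i <= j (diagonal = self loops)."""
    p = P[np.triu_indices(P.shape[0])]
    return (p ** 2).sum() / p.sum()


# 4-regular circulant graph on 10 nodes (offsets +-1, +-2).
n = 10
A = np.zeros((n, n))
for i in range(n):
    for off in (1, 2):
        A[i, (i + off) % n] = A[(i + off) % n, i] = 1.0
# For a regular graph all fitted logits coincide, so the odds product P is uniform d / n.
P = np.full((n, n), 4 / n)

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    P_alpha = convex_combination(A, P, alpha)
    print(alpha, P_alpha.sum(axis=1)[:3], round(overlap(P_alpha), 3))
```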

Linear model. As an even simpler baseline, we also propose and evaluate the following model: we produce an Erdös-Rényi model $P_{\mathrm{ER}}$ with the same expected volume as the original graph $A$, then return a convex combination of $A$ and $P_{\mathrm{ER}}$. In particular, each entry of $P_{\mathrm{ER}}$ is $V(A)/n^2$, and, as with the odds product model, $\tilde{P} = \alpha A + (1-\alpha)P_{\mathrm{ER}}$, where $\alpha \in [0,1]$. This model can alternatively be seen as producing $\tilde{P}$ by lowering each entry of $A$ which is $1$ to some probability $p_1$, and raising each entry of $A$ which is $0$ to a probability $p_0$, with $p_0 \le p_1$, such that the volume is conserved.
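A sketch of the linear model, assuming every entry of $P_{\mathrm{ER}}$, including the diagonal, equals $V(A)/n^2$ so that its volume matches $V(A)$. It returns the mixed matrix together with $p_0$ and $p_1$; the star graph and the function name are hypothetical.

```python
import numpy as np


def linear_model(A, alpha):
    """alpha * A + (1 - alpha) * P_ER, where every entry of P_ER is V(A) / n^2.

    Returns the matrix and the two probabilities it contains: p1 on the
    1-entries of A and p0 on the 0-entries (including the diagonal).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    q = A.sum() / n ** 2                    # Erdos-Renyi entry with the same volume as A
    P = alpha * A + (1.0 - alpha) * q
    p1 = alpha + (1.0 - alpha) * q
    p0 = (1.0 - alpha) * q
    return P, p0, p1


# A star on 5 nodes (hub 0) as a small, non-regular input graph.
A_star = np.zeros((5, 5))
A_star[0, 1:] = A_star[1:, 0] = 1.0
P_lin, p0, p1 = linear_model(A_star, alpha=0.6)
```

Unlike the odds product combination, this baseline only matches the volume: in the star example, the hub's expected degree drops from $4$ toward the average as $\alpha$ decreases.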

4 Experimental Results

We now present our evaluations of different edge independent graph generative models in terms of the tradeoff achieved between overlap and performance in generating graphs with similar key statistics to an input network. These experiments highlight the strengths and limitations of each model, as well as the overall limitations of this class, as established by our theoretical bounds.

4.1 Methods

We compare our proposed models from Section 3 with a number of existing models described below

  1. CELL Rendsburg et al. (2020) (Cross-Entropy Low-rank Logits) An alternative to the popular NetGAN method Bojchevski et al. (2018) which strips the proposed architecture of deep learning components and achieves comparable performance in significantly less time, via a low-rank approximation approach. To control overlap, we follow the approach of the original paper, halting training once the generated graph exceeds a specified overlap threshold with the input graph. We set the rank parameter to a value that allows us to get up to 75% overlap (typical values are 16 and 32).

  2. TSVD (Truncated Singular Value Decomposition) A classic spectral method which computes a rank-$k$ approximation of the adjacency matrix using truncated SVD. As in Seshadhri et al. (2020), the resulting matrix is clipped to $[0,1]$ to yield $P$. Overlap is controlled by varying $k$.

  3. CCOP (Convex Combination Odds Product) The odds product model as in Sec. 3, with overlap controlled by taking a convex combination of $P$ and the input adjacency matrix $A$.

  4. HDOP (Highest Degree Odds Product) The odds product model, but with overlap controlled by fixing the edges adjacent to a certain number of the highest degree nodes. See the Appendix for results on other variants, e.g., where some number of dense subgraphs are fixed.

  5. Linear The convex combination between the input adjacency matrix and an Erdös-Rényi graph, as described in Sec. 3, with overlap controlled by varying the parameter.

CCOP, HDOP, and Linear all produce edge probability matrices with the same expected volume, $V(A)$, as the original adjacency matrix $A$. For TSVD, letting $L$ be the low-rank approximation of the adjacency matrix, we learn a scalar shift parameter $s$ using Newton’s method such that $P = \min(1, \max(0, L + s))$, with the min and max applied entrywise, has volume $V(A)$. We then generate new networks from the edge independent distribution $\mathcal{G}(P)$ (Def. 1). For CELL, we follow the authors’ approach of generating edges without replacement: edge $(i,j)$ is added with probability proportional to $P_{ij}$.
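A sketch of the TSVD baseline with the volume-matching shift. Two substitutions are our own: we find the scalar shift $s$ with SciPy's Brent root finder (`scipy.optimize.brentq`) rather than Newton's method, because $s \mapsto \sum_{i,j}\min(1, \max(0, L_{ij} + s))$ is piecewise linear and nondecreasing, so bracketing a root is simple and robust; and we symmetrize the rank-$k$ approximation to remove floating-point asymmetry. The example graph and the function name are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq


def tsvd_probabilities(A, k):
    """Rank-k truncated SVD of A, shifted by a scalar s and clipped to [0, 1]
    so that the volume of the result matches V(A)."""
    A = np.asarray(A, dtype=float)
    U, sv, Vt = np.linalg.svd(A)
    L = (U[:, :k] * sv[:k]) @ Vt[:k]        # best rank-k approximation of A
    L = (L + L.T) / 2                        # enforce exact symmetry
    target = A.sum()

    def volume_gap(s):
        return np.clip(L + s, 0.0, 1.0).sum() - target

    # The gap is continuous and nondecreasing in s: at lo every entry clips to 0,
    # at hi every entry clips to 1, so a root lies in between (assuming 0 < V(A) < n^2).
    lo, hi = -L.max() - 1.0, 1.0 - L.min() + 1.0
    s = brentq(volume_gap, lo, hi)
    return np.clip(L + s, 0.0, 1.0)


# Two triangles joined by an edge, plus a short tail (hypothetical example graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6), (6, 7)]
A_demo = np.zeros((8, 8))
for i, j in edges:
    A_demo[i, j] = A_demo[j, i] = 1.0
P_tsvd = tsvd_probabilities(A_demo, k=3)
```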

We sample 5 networks from each distribution and report the average for every statistic.

4.2 Datasets and network statistics

For evaluation, we use the following seven popular datasets with varied structure, from triangle-rich social networks to planar road networks:

  1. PolBlogs: A collection of political blogs and the links between them.

  2. Citeseer: A collection of papers from six scientific categories and the citations among them.

  3. Cora: A collection of scientific publications and the citations among them.

  4. Road-Minnesota: A road network from the state of Minnesota. Each intersection is a node.

  5. Web-Edu: A web-graph drawn from educational institutions.

  6. PPI: A subgraph of the PPI network for Homo Sapiens. Vertices represent proteins and edges represent interactions.

  7. Facebook: A union of ego networks of Facebook users.

See Table 1 for statistics about the networks. We treat all networks as binary, in that we set all non-zero weights to $1$, and undirected, in that if edge $(i,j)$ appears in the network, we also include edge $(j,i)$. Also, we keep only the largest connected component of each network.

Dataset                                   Nodes     Edges    Triangles
PolBlogs, Adamic and Glance (2005)        1,222    33,428      101,043
Citeseer, Sen et al. (2008)               2,110     7,336        1,083
Cora, Sen et al. (2008)                   2,485    10,138        1,558
Road-Minnesota, Rossi and Ahmed (2015)    2,640     6,604           53
Web-Edu, Gleich et al. (2004)             3,031    12,948       10,058
PPI, Stark et al. (2010)                  3,852    75,682       91,461
Facebook, Leskovec and Mcauley (2012)     4,039   176,468    1,612,010
Table 1: Dataset summaries
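The preprocessing described before Table 1 can be sketched with SciPy's sparse graph tools. This sketch rests on an assumption: besides binarizing, symmetrizing, and keeping the largest connected component, it also drops self loops, which the text does not mention but Definition 3 assumes away. The function name and the toy input are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components


def preprocess(W):
    """Binarize and symmetrize a (possibly weighted, directed) graph, then keep
    only its largest connected component."""
    W = sp.csr_matrix(W)
    A = (W != 0).astype(np.int8)                       # every non-zero weight becomes 1
    A = ((A + A.T) > 0).astype(np.int8)                # include (j, i) whenever (i, j) is present
    A = (sp.triu(A, k=1) + sp.tril(A, k=-1)).tocsr()   # drop self loops (our assumption)
    _, labels = connected_components(A, directed=False)
    largest = np.argmax(np.bincount(labels))           # label of the biggest component
    keep = np.flatnonzero(labels == largest)
    return A[keep][:, keep]


# Weighted, directed toy input: components {0, 1, 2}, {3, 4}, {5}; node 2 has a self loop.
W_demo = np.zeros((6, 6))
W_demo[0, 1], W_demo[1, 2], W_demo[2, 2], W_demo[3, 4] = 2.5, 1.0, 4.0, 7.0
A_lcc = preprocess(W_demo)
```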

We evaluate performance in matching the following key network statistics:

  1. Pearson correlation of the degree sequences of the input and the generated network.

  2. Maximum degree over all nodes.

  3. Exponent of a power-law distribution fit to the degree sequence.

  4. Assortativity, a measure that captures the preference of nodes to attach to others with similar degree (ranging from -1 to 1).

  5. Pearson correlation of the triangle sequence (number of triangles a node participates in).

  6. Total triangle count (analyzed theoretically in Thm. 1).

  7. Global clustering coefficient (defined in Def. 3 and analyzed theoretically in Thm. 6).

  8. Characteristic path length (average path length between any two nodes).
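Most of these statistics reduce to a few lines of NumPy/SciPy for a dense adjacency matrix. The sketch below is our own, not the authors' evaluation code; the authors use the powerlaw package for the exponent, which we omit. Degree assortativity is computed as the Pearson correlation of the degrees at the two ends of each edge, and the characteristic path length averages hop distances over connected pairs of distinct nodes, a convention we assume for disconnected samples. Function names are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path


def graph_statistics(A):
    """Several statistics from the list above for a dense, symmetric 0/1 matrix
    without self loops. The power-law exponent is omitted here."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    tri_seq = np.diag(A @ A @ A) / 2.0       # triangles each node participates in
    triangles = tri_seq.sum() / 3.0
    wedges = (d * (d - 1)).sum() / 2.0
    clustering = 3.0 * triangles / wedges if wedges > 0 else 0.0
    # Degree assortativity: Pearson correlation of the degrees at the two ends of
    # every edge, with each undirected edge listed in both directions.
    rows, cols = np.nonzero(A)
    assortativity = np.corrcoef(d[rows], d[cols])[0, 1]
    # Characteristic path length: mean hop distance over connected, distinct pairs.
    D = shortest_path(A, directed=False, unweighted=True)
    mask = np.isfinite(D) & ~np.eye(A.shape[0], dtype=bool)
    return {
        "degrees": d,
        "max_degree": d.max(),
        "assortativity": assortativity,
        "triangle_sequence": tri_seq,
        "triangles": triangles,
        "clustering": clustering,
        "char_path_length": D[mask].mean(),
    }


def sequence_correlations(A_input, A_generated):
    """Items 1 and 5: Pearson correlations of the degree and triangle sequences."""
    s, t = graph_statistics(A_input), graph_statistics(A_generated)
    return (np.corrcoef(s["degrees"], t["degrees"])[0, 1],
            np.corrcoef(s["triangle_sequence"], t["triangle_sequence"])[0, 1])


path4 = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)   # path 0-1-2-3
tri_pendant = np.array([[0, 1, 1, 1], [1, 0, 1, 0],
                        [1, 1, 0, 0], [1, 0, 0, 0]], dtype=float)
```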

4.3 Results

The theoretical results from Section 2 highlight a key weakness of edge independent generative models: they cannot generate many triangles (or other higher-order locally dense areas) without having high overlap, and thus without failing to generate a diversity of graphs. We observe that these theoretical findings hold in practice: generally speaking, all models tested tend to significantly underestimate the triangle count and global clustering coefficient, and to inaccurately match the triangle degree sequence, when overlap is low. See Figures 1 through 7 for results on the tested networks. As overlap increases, performance in reconstructing these metrics improves, as expected.

All methods are able to capture certain network characteristics accurately, even at low overlap. Even for a relatively small overlap (less than 0.2), the CCOP and HDOP methods accurately capture the degree sequences of the true networks (as they are designed to do). These methods, especially HDOP which fixes edges from high degree nodes, often outperform more sophisticated methods like CELL in terms of triangle density and triangle degree sequence correlation. On the other hand, CELL seems to do a somewhat better job capturing global features, like the characteristic path length. TSVD provides a fair compromise – it performs better than CELL in terms of degree sequence and triangle counts, but worse in terms of characteristic path length. In general, it is the method that gives the best results when the overlap is extremely small, appearing to be less sensitive to the variation in overlap.

Broadly speaking, all methods do reasonably well in matching the power-law degree distribution of the networks, even when they do not match the actual degree sequence closely. With the exception of Web-Edu, they tend to underestimate the characteristic path length. This is perhaps not surprising given the independent random edge connections; however, it would be interesting to understand this more theoretically.

Figure 1: Metrics for PolBlogs.
Figure 2: Metrics for citeseer.
Figure 3: Metrics for cora.
Figure 4: Metrics for road-minnesota.
Figure 5: Metrics for web-edu.
Figure 6: Metrics for PPI
Figure 7: Metrics for Facebook.

4.4 Code for Reproducing Results

Code is available at https://github.com/konsotirop/edge_independent_models. Our implementation of the methods we introduce is written in Python and uses the NumPy Harris et al. (2020) and SciPy Virtanen et al. (2020) packages. Additionally, to calculate the various graph metrics, we use the following packages: powerlaw Alstott et al. (2014) and MACE (MAximal Clique Enumerator) Takeaki (2012).

5 Conclusion

Our theoretical results prove limitations on the ability of any edge independent graph generative model to produce networks that match the high triangle densities of real-world graphs while still generating a diverse set of networks, with low model overlap. These results match empirical findings that popular edge independent models indeed systematically underestimate triangle density, clustering coefficient, and related measures. Despite the popularity of edge independent models, many non-independent models, such as graph RNNs You et al. (2018), have been proposed. An interesting future direction would be to study the representative power and limitations of such models, giving general theoretical results that provide a foundation for the study of graph generative models.


  • [1] L. A. Adamic and N. Glance (2005) The political blogosphere and the 2004 us election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43. Cited by: Table 1.
  • [2] J. Alstott, E. Bullmore, and D. Plenz (2014) Powerlaw: a python package for analysis of heavy-tailed distributions. PloS one 9 (1), pp. e85777. Cited by: §4.4.
  • [3] A. Barabási and R. Albert (1999) Emergence of scaling in random networks. Science 286 (5439), pp. 509–512. Cited by: §1.3.
  • [4] A. Bojchevski, O. Shchur, D. Zügner, and S. Günnemann (2018) NetGAN: Generating graphs via random walks. 2018. Cited by: Appendix A, §1.1, §1.2, §1.3, §1, item 1.
  • [5] S. Chanpuriya, C. Musco, K. Sotiropoulos, and C. E. Tsourakakis (2020) Node embeddings and exact low-rank representations of complex networks. 2020. Cited by: Appendix A, §1.1, §1.3.
  • [6] F. Chung and L. Lu (2002) The average distances in random graphs with given expected degrees. Proceedings of the National Academy of Sciences 99 (25), pp. 15879–15882. Cited by: §1.2, §1, §3.
  • [7] J. Dall and M. Christensen (2002) Random geometric graphs. Physical Review E 66 (1), pp. 016121. Cited by: §1.3.
  • [8] T. De Bie (2011) Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery 23 (3), pp. 407–446. Cited by: §3.
  • [9] N. De Cao and T. Kipf (2018) MolGAN: An implicit generative model for small molecular graphs. ICML Deep Generative Models Workshop. Cited by: §1.
  • [10] N. Durak, A. Pinar, T. G. Kolda, and C. Seshadhri (2012) Degree relations of triangles in real-world networks and graph models. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Cited by: §1.1.
  • [11] P. Erdös and A. Rényi (1960) On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5 (1), pp. 17–60. Cited by: §1.
  • [12] D. Gleich, L. Zhukov, and P. Berkhin (2004) Fast parallel pagerank: a linear system approach. Yahoo! Research Technical Report 13, pp. 22. Cited by: Table 1.
  • [13] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi (2010) A survey of statistical network models. Cited by: §3.
  • [14] A. Grover, A. Zweig, and S. Ermon (2019) Graphite: iterative generative modeling of graphs. In 2019, Cited by: §1.
  • [15] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, and T. E. Oliphant (2020-09) Array programming with NumPy. Nature 585 (7825), pp. 357–362. External Links: Document, Link Cited by: §4.4.
  • [16] P. D. Hoff (2003) Random effects models for network data. In Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers, Cited by: §3.
  • [17] P. D. Hoff (2005) Bilinear mixed-effects models for dyadic data. Journal of the American Statistical Association 100 (469), pp. 286–295. Cited by: §3.
  • [18] T. N. Kipf and M. Welling (2016) Variational graph auto-encoders. NeurIPS Bayesian Deep Learning Workshop. Cited by: §1.
  • [19] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani (2010) Kronecker graphs: an approach to modeling networks. Journal of Machine Learning Research 11 (2). Cited by: §1.
  • [20] J. Leskovec and J. J. McAuley (2012) Learning to discover social circles in ego networks. In 2012, pp. 539–547. Cited by: Table 1.
  • [21] Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia (2018) Learning deep generative models of graphs. arXiv:1803.03324. Cited by: §1.3.
  • [22] R. Liao, Y. Li, Y. Song, S. Wang, C. Nash, W. L. Hamilton, D. Duvenaud, R. Urtasun, and R. S. Zemel (2019) Efficient graph generation with graph recurrent attention networks. arXiv:1910.00760. Cited by: §1.3.
  • [23] Z. Ma, Z. Ma, and H. Yuan (2020) Universal latent space model fitting for large networks with edge covariates. J. Mach. Learn. Res. 21, pp. 4–1. Cited by: §3.
  • [24] A. Pinar, C. Seshadhri, and T. G. Kolda (2012) The similarity between stochastic Kronecker and Chung-Lu graph models. In Proceedings of the 2012 SIAM International Conference on Data Mining, Cited by: §1.3.
  • [25] L. Rendsburg, H. Heidrich, and U. von Luxburg (2020) NetGAN without GAN: from random walks to low-rank approximations. In 2020, Cited by: Appendix A, Appendix A, §1.1, §1.2, §1.3, §1, item 1.
  • [26] R. A. Rossi and N. K. Ahmed (2015) The network data repository with interactive graph analytics and visualization. In AAAI, External Links: Link Cited by: Table 1.
  • [27] A. Sala, L. Cao, C. Wilson, R. Zablit, H. Zheng, and B. Y. Zhao (2010) Measurement-calibrated graph models for social network experiments. In 2010, Cited by: §1.1.
  • [28] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad (2008) Collective classification in network data. AI Magazine 29 (3), pp. 93–93. Cited by: Table 1.
  • [29] C. Seshadhri, A. Sharma, A. Stolman, and A. Goel (2020) The impossibility of low-rank representations for triangle-rich complex networks. Proceedings of the National Academy of Sciences 117 (11), pp. 5631–5637. Cited by: §1.1, §1.3, item 2.
  • [30] M. Simonovsky and N. Komodakis (2018) GraphVAE: towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pp. 412–422. Cited by: §1.
  • [31] T. A. Snijders and K. Nowicki (1997) Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification 14 (1), pp. 75–100. Cited by: §1.
  • [32] C. Stark, B. Breitkreutz, A. Chatr-Aryamontri, L. Boucher, R. Oughtred, M. S. Livstone, J. Nixon, K. Van Auken, X. Wang, X. Shi, et al. (2010) The BioGRID interaction database: 2011 update. Nucleic Acids Research 39, pp. D698–D704. Cited by: Table 1.
  • [33] T. Uno (2012) Implementation issues of clique enumeration algorithm. Special issue: Theoretical computer science and discrete mathematics, Progress in Informatics 9, pp. 25–30. Cited by: §4.4.
  • [34] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors (2020) SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods 17, pp. 261–272. External Links: Document Cited by: §4.4.
  • [35] D. J. Watts and S. H. Strogatz (1998) Collective dynamics of ‘small-world’ networks. Nature 393 (6684), pp. 440–442. Cited by: §1.1, §1.3, §1.3, §2.3.
  • [36] J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec (2018) GraphRNN: generating realistic graphs with deep auto-regressive models. In 2018, pp. 5708–5717. Cited by: §1.3, §5.

Appendix A Exact Embeddings in the CELL Model

Recently, Rendsburg et al. [25] proposed the CELL graph generator, a major simplification of the NetGAN algorithm of [4] that gives comparable performance and much faster runtimes, and helps clarify the key components of the generator. CELL uses a simple low-rank factorization model. Here we prove that, when its rank parameter $k$ is at least $2c+1$, the CELL model can ‘memorize’ any graph with maximum degree at most $c$. This allows the model to trivially produce distributions with very high expected triangle densities. However, as our main results show, this inherently requires memorization and high overlap.

Our result can be viewed as an extension of the results of [5], which considers a different edge independent model; the proof techniques are very similar. Interestingly, our result seems to indicate that the good generalization of CELL in link prediction tests may be largely due to the model not being optimized to the point of memorizing the input.

The CELL Model. We first describe the CELL model introduced in [25].

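  1. Given a graph adjacency matrix $A \in \{0,1\}^{n \times n}$, let

     $$W^* = \arg\min_{W \in \mathbb{R}^{n \times n},\ \operatorname{rank}(W) \le k} \; -\sum_{i,j \in [n]} A_{ij} \log\big(\sigma_r(W)_{ij}\big), \qquad (2)$$

    where $\sigma_r$ applies a softmax rowwise to $W$, ensuring that each row of $\sigma_r(W)$ sums to $1$.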

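  2. Let $P = \sigma_r(W^*)$ and let $\pi \in \mathbb{R}^n$ be the eigenvector satisfying

     $$\pi^\top P = \pi^\top, \qquad \sum_{i} \pi_i = \sum_{i,j} A_{ij}.$$

  3. Let $\Pi = \operatorname{diag}(\pi)$ and $\tilde P = \min\big(1, \tfrac{1}{2}(\Pi P + P^\top \Pi)\big)$, with the minimum taken entrywise.

  4. Generate $G \sim \tilde P$, i.e., include each edge $(i,j)$ independently with probability $\tilde P_{ij}$.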
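Steps 2 to 4 can be sketched in NumPy as follows, starting from a given score matrix $W$ (the rank-constrained optimization of step 1 is not implemented). This is a minimal sketch under stated assumptions, not CELL's implementation: it assumes step 3 sets $\tilde P = \min(1, \tfrac12(\Pi P + P^\top \Pi))$ entrywise, it omits self-loops when sampling, and all function names are hypothetical.

```python
import numpy as np

def row_softmax(W):
    """sigma_r: apply a softmax to each row of W."""
    Z = W - W.max(axis=1, keepdims=True)  # subtract the row maximum for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def stationary_vector(P, total):
    """Left eigenvector pi with pi^T P = pi^T, rescaled so its entries sum to `total`."""
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))  # eigenvalue 1 of a row-stochastic matrix
    pi = np.real(vecs[:, k])
    return pi * (total / pi.sum())     # also fixes the arbitrary sign of the eigenvector

def cell_edge_probabilities(W, total):
    """Steps 2-3 (assumed form): P = sigma_r(W), P~ = min(1, (Pi P + P^T Pi) / 2)."""
    P = row_softmax(W)
    pi = stationary_vector(P, total)
    S = pi[:, None] * P                # Pi @ P without forming diag(pi)
    return np.minimum(1.0, (S + S.T) / 2)

def sample_edge_independent(P_tilde, rng):
    """Step 4: include each pair i < j independently with probability P~_ij (no self-loops)."""
    n = P_tilde.shape[0]
    upper = np.triu(rng.random((n, n)) < P_tilde, k=1).astype(int)
    return upper + upper.T

# Usage: a score matrix that strongly favors the true neighbors of a small graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
P_tilde = cell_edge_probabilities(50.0 * (2 * A - 1), total=A.sum())
G = sample_edge_independent(P_tilde, np.random.default_rng(0))
```

Because this score matrix puts almost all softmax mass on true neighbors, $\tilde P$ is essentially $A$ and the sample reproduces the input graph, which previews the memorization behavior analyzed in Theorems 8 and 9.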

Note that the last step described above is slightly different from the approach taken in CELL. Rather than use an edge independent model as in Def. 1, they form $G$ by sampling edges without replacement, with probability proportional to the entries of $\tilde P$. They also ensure that at least one edge is sampled adjacent to every node. However, this distinction is minor.

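Unconstrained Optimum. We first show that, if the rank constraint in (2) is removed, then the optimal $W^*$ has $\sigma_r(W^*) = D^{-1}A$, where $D$ is the diagonal degree matrix (since softmax outputs are strictly positive, this value is approached in the limit rather than attained exactly). At this minimum, we can check that $\pi_i = d_i$, the degree of the $i$-th node, and thus $\operatorname{diag}(\pi) P = A$ and $\tilde P = A$. That is, the model simply outputs the input graph with probability $1$.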
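The identity used here, that the degree vector $d$ is a left eigenvector of $P = D^{-1}A$ with eigenvalue $1$ and that $\operatorname{diag}(d)P = A$, can be checked numerically on a small graph. The 4-node graph below is an arbitrary illustrative choice.

```python
import numpy as np

# Small undirected test graph: a triangle on nodes 0, 1, 2 plus a pendant edge 2-3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)        # degree vector; its entries sum to sum_ij A_ij
P = A / d[:, None]       # P = D^{-1} A, the claimed unconstrained optimum
# (d^T D^{-1} A)_j = sum_i A_ij = d_j, so d^T P should equal d^T.
left = d @ P
S = d[:, None] * P       # diag(pi) P with pi = d; should recover A
```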

Theorem 8 (CELL Optimum).

The unconstrained CELL objective function (2) is minimized when $\sigma_r(W^*) = D^{-1}A$. At this minimum, the edge independent model is simply $\tilde P = A$. That is, the model just returns the input graph with probability $1$.


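Proof. It suffices to consider the $i$-th row of $\sigma_r(W)$ for each $i$, since the objective function of (2) decomposes rowwise. Let $a, p \in \mathbb{R}^n$ be the $i$-th rows of $A$ and $\sigma_r(W)$, respectively, and let $d_i = \sum_j a_j$. Note that $p$ is a probability vector, with $p_j \ge 0$ for all $j$ and $\sum_j p_j = 1$.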

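We seek to minimize $-\sum_j a_j \log p_j$. We need to show that this objective is minimized when $p = a/d_i$, i.e., when $p$ places mass $1/d_i$ at each nonzero entry of $a$. Since $a/d_i$ is the $i$-th row of $D^{-1}A$, applying this argument to all $i$ gives that $D^{-1}A$ is the overall minimizer. Assume for the sake of contradiction that there is some other minimizer $p \neq a/d_i$. Since $\sum_j p_j = 1$, we must have $p_j < 1/d_i$ for some $j$ where $a_j = 1$ (otherwise the $d_i$ nonzero positions of $a$ would carry all of the mass, exactly $1/d_i$ each). In turn, there must be some $k$ with either (1) $a_k = 0$ and $p_k > 0$ or (2) $a_k = 1$ and $p_k > 1/d_i$. In case (1), clearly moving mass from $p_k$ to $p_j$ will decrease the objective function. In case (2), due to the strict concavity of the log function, moving a small amount of mass from $p_k$ to $p_j$ will also decrease the objective function. Thus, $p$ cannot be a minimizer, completing the proof. ∎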
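As a numerical sanity check of the rowwise argument (not a substitute for the proof), the sketch below compares the row objective at $p = a/d_i$ with its value at many random points of the probability simplex. The example row, the random seed, and the sample count are arbitrary choices.

```python
import numpy as np

def row_objective(a, p):
    """One row's contribution -sum_j a_j log p_j to objective (2)."""
    mask = a > 0                      # zero entries of a contribute nothing
    return -np.sum(a[mask] * np.log(p[mask]))

a = np.array([1, 0, 1, 1, 0, 0], dtype=float)  # a row of A with d_i = 3
p_star = a / a.sum()                           # mass 1/d_i on each nonzero entry
rng = np.random.default_rng(0)
# Uniformly random probability vectors (Dirichlet with all parameters equal to 1).
random_values = [row_objective(a, rng.dirichlet(np.ones(a.size))) for _ in range(10_000)]
```

At $p = a/d_i$ the row objective equals $d_i \log d_i$ (here $3\log 3$), and no random probability vector does better.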

Rank-Constrained Optimum. We next show that the unconstrained optimum $\sigma_r(W^*) = D^{-1}A$, which leads to CELL memorizing the input graph (Thm. 8), can be approached arbitrarily closely even with the rank constraint of (2), as long as $k \ge 2c + 1$, where $c$ is the maximum degree of the input graph.

Theorem 9 (CELL Exact Factorization).

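If $A$ is an adjacency matrix with maximum degree $c$, there is a matrix $W$ of rank at most $2c+1$ with

$$\big|\sigma_r(W)_{ij} - (D^{-1}A)_{ij}\big| \le \delta \quad \text{for all } i, j \in [n],$$

where $\delta > 0$ is arbitrary. Note that the rank of $W$ does not depend on $\delta$, and so we can drive $\delta \to 0$ and find a rank-$(2c+1)$ matrix $W$ which is arbitrarily close to minimizing (2) and thus produces $\tilde P$ which is arbitrarily close to $A$.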


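Proof. Let $V \in \mathbb{R}^{n \times (2c+1)}$ be the Vandermonde matrix with $V_{t,q} = t^{q-1}$. For any $x \in \mathbb{R}^{2c+1}$, $(Vx)_t = \sum_{q=1}^{2c+1} x_q t^{q-1}$. That is, $Vx$ is a degree-$2c$ polynomial with coefficient vector $x$, evaluated at the integers $t = 1, \ldots, n$.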
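The Vandermonde identity can be checked directly with NumPy. Note that NumPy indexes powers from $0$, so column `q` of `V` below holds $t^q$ rather than $t^{q-1}$; the sizes and coefficients are arbitrary.

```python
import numpy as np

n, c = 7, 2                        # n nodes, maximum degree c (illustrative values)
t = np.arange(1, n + 1)            # the integers 1..n index the columns of A
V = np.vander(t, N=2 * c + 1, increasing=True).astype(float)  # V[t-1, q] = t**q
x = np.array([3.0, -1.0, 0.5, 2.0, -0.25])  # arbitrary coefficients, degree 2c = 4
values = V @ x                     # (V x)_t for t = 1..n
expected = np.polynomial.polynomial.polyval(t, x)  # sum_q x[q] * t**q
```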

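Let $a_i$ be the $i$-th row of $A$. Note that $a_i$ has $d_i \le c$ nonzeros, whose positions we denote by $\ell_1, \ldots, \ell_{d_i}$. To prove the theorem, for each row $i$, we will construct a polynomial $p_i$ of degree at most $2c$ which has the same positive value at each $\ell_j$ and is negative at all other integers $t \in [n]$. Then, we will let $X \in \mathbb{R}^{(2c+1) \times n}$ be the matrix whose columns $x_1, \ldots, x_n$ are the coefficient vectors of $p_1, \ldots, p_n$, and set $W = X^\top V^\top$. Note that $\operatorname{rank}(W) \le 2c+1$, and $W_{ij} = p_i(j)$ is equal to a fixed positive value whenever $A_{ij} = 1$ and is negative whenever $A_{ij} = 0$. If we scale $W$ by a very large number (which does not affect its rank), we will have $\sigma_r(W)$ arbitrarily close to $D^{-1}A$, since the rowwise softmax will place equal probability on each positive entry in row $i$ of $W$ and probability arbitrarily close to $0$ on each negative entry. So, in the limit, the $i$-th row of $\sigma_r(W)$ has $d_i$ entries, each equal to $1/d_i$, at the nonzero positions of $a_i$; that is, it equals the $i$-th row of $D^{-1}A$.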
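The effect of scaling before the rowwise softmax can be illustrated on a single hypothetical row of $W$ that has the same positive value at three "neighbor" positions and negative values elsewhere; the entries and scale factors are arbitrary.

```python
import numpy as np

def softmax(v):
    z = np.exp(v - v.max())       # shift by the maximum for numerical stability
    return z / z.sum()

# Hypothetical row of W: value 1.0 at the neighbor positions 0, 2, 4; negative elsewhere.
w = np.array([1.0, -0.3, 1.0, -2.0, 1.0, -0.01])
target = np.array([1, 0, 1, 0, 1, 0]) / 3          # the corresponding row of D^{-1} A
errors = [np.abs(softmax(M * w) - target).max() for M in (1, 10, 100, 1000)]
```

Scaling by $M$ does not change the rank of $W$, and as $M$ grows the row converges to mass $1/3$ on each neighbor and $0$ elsewhere.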

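It remains to exhibit the polynomial $p_i$ needed to construct $X$. We start by constructing a polynomial of degree $2d_i \le 2c$ that is positive at each nonzero position of $a_i$ and negative at all other indices. Later we will modify this polynomial to have the same positive value at each nonzero position of $a_i$. For each $j$, let $\alpha_j$ and $\beta_j$ be any values with $\ell_j - 1 < \alpha_j < \ell_j$ and $\ell_j < \beta_j < \ell_j + 1$. Consider the monic polynomial with roots at each $\alpha_j$ and $\beta_j$; this polynomial has $2d_i$ roots and so degree $2d_i \le 2c$. It flips sign only at the $\alpha_j$ and $\beta_j$, and in fact has the same sign at every $\ell_j$: each pair $(\alpha_k, \beta_k)$ lies entirely on one side of any integer $t \notin \{\ell_1, \ldots, \ell_{d_i}\}$, whereas exactly one pair straddles each $\ell_j$, so the sign at every $\ell_j$ is opposite to the sign at all other integers. By negating the coefficients if necessary, we can ensure that this sign is positive, while the polynomial is negative at all other indices, giving the result.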

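The polynomial above can be written as $p(x) = \prod_{j=1}^{d_i} (x - \alpha_j)(x - \beta_j)$. Choose $\alpha_j = \ell_j - \epsilon$ and $\beta_j = \ell_j + w_j \epsilon$, where $\epsilon > 0$ is arbitrarily small and $w_j > 0$ is a weight chosen specifically for $\ell_j$, which we set later. We have, for any $t \in \{1, \ldots, d_i\}$,

$$p(\ell_t) = (\epsilon)(-w_t \epsilon) \prod_{j \neq t} (\ell_t - \ell_j + \epsilon)(\ell_t - \ell_j - w_j \epsilon) = -w_t \epsilon^2 \prod_{j \neq t} (\ell_t - \ell_j + \epsilon)(\ell_t - \ell_j - w_j \epsilon).$$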

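Thus, if we set $w_t = \prod_{j \neq t} (\ell_t - \ell_j)^{-2}$, in the limit as $\epsilon \to 0$ we will have $p(\ell_t)/\epsilon^2 \to -1$. If we negate and scale the polynomial appropriately, taking $-p(x)/\epsilon^2$ (which does not change its degree), we obtain a value arbitrarily close to one at each nonzero index $\ell_t$ and a negative value at each zero index. This gives the theorem. ∎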
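The whole construction can be sketched numerically: build $-p_i(x)/\epsilon^2$ for each row from its roots $\ell_j - \epsilon$ and $\ell_j + w_j\epsilon$, stack the coefficient vectors into $X$, form $W = X^\top V^\top$, and compare the scaled rowwise softmax with $D^{-1}A$. This is a minimal sketch assuming a graph with no isolated nodes; the helper names, the choice $\epsilon = 10^{-3}$, the scale factor $20$, and the 8-cycle test graph are all illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def row_polynomial(ell, c, eps):
    """Coefficients (lowest degree first, zero-padded to length 2c+1) of -p(x)/eps**2,
    where p has roots ell_j - eps and ell_j + w_j*eps, w_j = prod_{k != j} (ell_j - ell_k)**-2."""
    ell = np.asarray(ell, dtype=float)
    w = np.array([np.prod([(lj - lk) ** -2.0 for lk in ell if lk != lj]) for lj in ell])
    roots = np.concatenate([ell - eps, ell + w * eps])
    coeffs = -npoly.polyfromroots(roots) / eps**2     # negate and rescale the monic p
    return np.pad(coeffs, (0, 2 * c + 1 - coeffs.size))

def exact_factors(A, eps=1e-3):
    """Return X (shape (2c+1, n)) and V (shape (n, 2c+1)) so that W = X.T @ V.T."""
    n = A.shape[0]
    c = int(A.sum(axis=1).max())                      # maximum degree
    t = np.arange(1, n + 1, dtype=float)              # column positions 1..n
    V = np.vander(t, N=2 * c + 1, increasing=True)
    X = np.column_stack([row_polynomial(np.flatnonzero(A[i]) + 1, c, eps) for i in range(n)])
    return X, V

def row_softmax(W):
    Z = np.exp(W - W.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

# Usage: the 8-cycle (maximum degree c = 2), so W has rank at most 5 < n = 8.
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
X, V = exact_factors(A)
W = X.T @ V.T                                 # W[i, j] = p_i(j + 1)
target = A / A.sum(axis=1, keepdims=True)     # D^{-1} A
err = np.abs(row_softmax(20.0 * W) - target).max()
```

With $\epsilon$ fixed, raising the scale factor further suppresses the negative entries but also amplifies the $O(\epsilon)$ mismatch between the positive entries of a row, so in this sketch the scale and $\epsilon$ have to be tuned together; this mirrors the double limit in the proof.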