1 Introduction
Our work centers on edge independent graph models, in which each edge $(i,j)$ is added to the graph independently with some probability $P_{ij}$. Formally,
Definition 1 (Edge Independent Graph Model).
For any symmetric matrix $P \in [0,1]^{n \times n}$, let $\mathcal{G}(P)$ be the distribution over undirected, unweighted graphs on $n$ nodes where $G \sim \mathcal{G}(P)$ contains each edge $(i,j)$ independently, with probability $P_{ij}$. I.e., $\Pr[(i,j) \in G] = P_{ij}$.
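As a concrete illustration, sampling from such a model is a one-coin-per-edge procedure. The following minimal sketch (our own code, using NumPy; the function name is ours) draws the strict upper triangle independently and symmetrizes:

```python
import numpy as np

def sample_graph(P, seed=None):
    """Draw one graph G ~ G(P): include each edge (i, j), i < j,
    independently with probability P[i, j]."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    coins = rng.random((n, n))
    upper = np.triu((coins < P).astype(int), k=1)  # strict upper triangle only
    return upper + upper.T                          # symmetric 0/1 adjacency, no self-loops

P = np.full((4, 4), 0.5)   # a small uniform (Erdős–Rényi) example with p = 0.5
np.fill_diagonal(P, 0.0)
A = sample_graph(P, seed=0)
```

Setting $P$ to a binary adjacency matrix makes the sampler deterministic, which is exactly the memorization extreme discussed below.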
Edge independent models encompass many classic random graph models. This includes the Erdős–Rényi model, where $P_{ij} = p$ for all $i,j$, for some fixed $p \in [0,1]$ Erdős and Rényi (1960). It also includes the stochastic block model, where $P_{ij} = p$ if two nodes are in the same community and $P_{ij} = q$ if two nodes are in different communities, for some fixed $p > q$ Snijders and Nowicki (1997). Other examples include the Chung–Lu configuration model Chung and Lu (2002) and stochastic Kronecker graphs Leskovec et al. (2010).
Recently, significant attention has focused on graph generative models, which seek to learn a distribution over graphs that share similar properties with a given training graph, or set of graphs. Many algorithms parameterize this distribution as an edge independent model or a closely related distribution. E.g., NetGAN and the closely related CELL model both produce a probability matrix $P$ and then sample edges independently without replacement with probabilities proportional to its entries, ensuring that at least one edge is sampled adjacent to each node Bojchevski et al. (2018); Rendsburg et al. (2020). Variational Graph Autoencoders (VGAE), GraphVAE, Graphite, and MolGAN are also all based on edge independent models Kipf and Welling (2016); Simonovsky and Komodakis (2018); De Cao and Kipf (2018); Grover et al. (2019).
Given their popularity in both classical and modern graph generative models, it is natural to ask:
How suited are edge independent models to modeling real-world networks? Are they able to capture features such as power-law degree distributions, small-world properties, and high clustering coefficients (triangle densities)?
1.1 Impossibility Results for Edge Independent Models
In this work we focus on the ability of edge independent models to generate graphs with high triangle, or other small subgraph, densities. High triangle density (equivalently, a high clustering coefficient) is a well-known hallmark of real-world networks Watts and Strogatz (1998); Sala et al. (2010); Durak et al. (2012) and has been the focus of recent work exploring the power and limitations of edge independent graph models Seshadhri et al. (2020); Chanpuriya et al. (2020).
It is clear that edge independent models can generate triangle-dense graphs. In particular, $P$ in Def. 1 can be set to the binary adjacency matrix of any undirected graph, and $\mathcal{G}(P)$ will generate that graph with probability $1$, no matter how triangle dense it is. However, this would not be a particularly interesting generative model – ideally $\mathcal{G}(P)$ should generate a wide range of graphs. To capture this intuitive notion, we define the overlap of an edge independent model, which is closely related to the overlap stopping criterion used when training graph generative models Bojchevski et al. (2018); Rendsburg et al. (2020).
Definition 2 (Expected Overlap).
For symmetric $P \in [0,1]^{n \times n}$, let $G_1, G_2 \sim \mathcal{G}(P)$ be drawn independently, with edge sets $E_1, E_2$, and define the expected overlap as
$\mathrm{Ov}(P) = \frac{\mathbb{E}[|E_1 \cap E_2|]}{\mathbb{E}[|E_1|]}.$
That is, for any $P$, $\mathrm{Ov}(P)$ is the ratio of the expected number of edges shared by two graphs drawn independently from $\mathcal{G}(P)$ to the expected number of edges in a graph drawn from $\mathcal{G}(P)$. In one extreme, when $P$ is a binary adjacency matrix, $\mathrm{Ov}(P) = 1$, and our generative model has simply memorized a single graph. In the other, if $P_{ij} = p$ for all $i,j$ (i.e., $\mathcal{G}(P)$ is Erdős–Rényi), $\mathrm{Ov}(P) = p$. This is the minimum possible overlap among all $P$ with the same expected number of edges.
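The overlap of a given model is a two-line computation: an edge $(i,j)$ survives in both independent samples with probability $P_{ij}^2$. The sketch below (our own code) checks the two extremes just discussed:

```python
import numpy as np

def overlap(P):
    """Ov(P) = E[|E1 ∩ E2|] / E[|E1|] for the edge independent model G(P)."""
    p = P[np.triu_indices(P.shape[0], k=1)]  # each undirected edge once
    return (p ** 2).sum() / p.sum()

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
print(overlap(A))                      # binary adjacency matrix -> 1.0
print(overlap(np.full((5, 5), 0.3)))   # uniform (Erdős–Rényi) P -> 0.3
```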
Our main result is that for any edge independent model $\mathcal{G}(P)$ with bounded overlap, graphs drawn from $\mathcal{G}(P)$ cannot have too many triangles in expectation. In particular:
Theorem 1 (Main Result – Expected Triangles).
For a graph $G$, let $T(G)$ denote the number of triangles in $G$. Consider symmetric $P \in [0,1]^{n \times n}$. Then
$\mathbb{E}_{G \sim \mathcal{G}(P)}[T(G)] = O\left(\left(\mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|]\right)^{3/2}\right).$
As an example, consider the setting where the distribution generates sparse graphs, with $\mathbb{E}[|E_1|] = O(n)$. Theorem 1 shows that whenever $\mathrm{Ov}(P) = o(n^{-1/3})$, $\mathbb{E}[T(G)] = o(n)$ – i.e., the graph is very triangle sparse, with the number of triangles sublinear in the number of nodes. This verifies that an Erdős–Rényi graph cannot simultaneously achieve a linear number of edges (i.e., $\mathbb{E}[|E_1|] = \Theta(n)$) and a superlinear number of triangles (i.e., $\mathbb{E}[T(G)] = \omega(n)$) under our proposed lens of viewing generative models.
We extend Theorem 1 to give similar bounds for the density of squares and other cycles (Thm. 4), as well as for the global clustering coefficient (Thm. 6). In all cases we show that our bounds are tight – e.g., in the triangle case, there is indeed an edge independent model with $\mathbb{E}[T(G)] = \Theta\left(\left(\mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|]\right)^{3/2}\right)$, matching the upper bound in Theorem 1.
1.2 Empirical Findings
Our theoretical results help explain why, despite performing well on a variety of other metrics, edge independent graph generative models have been reported to generate graphs with many fewer triangles and squares on average than the real-world graphs that they are trained on. Rendsburg et al. (2020) test a suite of these models, including their own CELL model and the related NetGAN model Bojchevski et al. (2018). Of all these models, when trained on the Cora-ML graph with 2,802 triangles and 14,268 squares, none is able to generate graphs with more than 1,461 triangles and 6,880 squares on average. Similar gaps are observed for a number of other graphs. Rendsburg et al. also report that the triangle count increases as their notion of overlap (closely related to Def. 2) increases. Theorem 1 demonstrates that this underestimation of triangle count, and its connection to overlap, is inherent to all edge independent models, no matter how refined the method used to learn the underlying probability matrix $P$.
While our theoretical results bound the performance of any edge independent model, there may still be variation in how specific models trade off overlap and realistic graph generation. To better understand this tradeoff, we introduce two simple models with easily tunable overlap as baselines. One is based on reproducing the degree sequence of the original graph; the other, which is even simpler, is based on reproducing the volume. In both models, $P$ is a weighted average of the input graph adjacency matrix and a probability matrix of minimal complexity which matches either the input degrees or the volume. In the latter case, to match just the volume, we simply use an Erdős–Rényi graph. In the former case, to match the degree sequence, we introduce our own model, the odds product model; this model is similar to the Chung–Lu configuration model Chung and Lu (2002), but, unlike Chung–Lu, is able to match degree sequences of real-world graphs with high maximum degree. We find that these simple baselines are often competitive with more complex models like CELL in terms of matching key graph statistics, like triangle count and clustering coefficient, at similar levels of overlap.
1.3 Related Work
Existing impossibility results. Our work is inspired by that of Seshadhri et al. (2020), which also proves limitations on the ability of edge independent models to represent triangle-dense graphs. They show that if $P = \max(0, \min(1, VV^\top))$ where $V \in \mathbb{R}^{n \times d}$ for $d \ll n$ and the max and min are applied entrywise, then $\mathcal{G}(P)$ cannot have many triangles adjacent to low-degree nodes in expectation. This setting arises commonly when $P$ is generated using low-dimensional node embeddings – represented by the rows of $V$. Chanpuriya et al. (2020) show that in a slightly more general model, where the entrywise clipping is replaced by a logistic function, this lower bound no longer holds – the factorization can be chosen so that $P$ is the binary adjacency matrix of any graph with bounded maximum degree – no matter how triangle dense that graph is. Thus, even such low-rank edge independent models can represent triangle-dense graphs – by memorizing a single one. In the appendix, we prove a similar result when $P$ is generated from the CELL model of Rendsburg et al. (2020), which simplifies NetGAN Bojchevski et al. (2018).
Our results show that this tradeoff between the ability to capture triangle density and memorization is inherent – even without any low-rank constraint, edge independent models with low overlap simply cannot represent graphs with high triangle or other small-subgraph density.
It is well understood that specific edge independent models, e.g., Erdős–Rényi graphs, the Chung–Lu model, and stochastic Kronecker graphs, do not capture many properties of real-world networks, including high triangle density Watts and Strogatz (1998); Pinar et al. (2012). Our results can be viewed as a generalization of these observations to all edge independent models with low overlap. Despite the limitations of classic models, edge independent models are still very prevalent in today's literature on graph generative models. Our more general results make clear the limitations of this approach.
Non-independent models. While edge independent models are very prevalent in the literature, many important models do not fit into this framework. Classic models include the Barabási–Albert and other preferential attachment models Barabási and Albert (1999), Watts–Strogatz small-world graphs Watts and Strogatz (1998), and random geometric graphs Dall and Christensen (2002). Many of these models were introduced directly in response to shortcomings of classic edge independent models, including their inability to produce high triangle densities.
More recent graph generative models include GraphRNN You et al. (2018) and a number of other works Li et al. (2018); Liao et al. (2019). Our impossibility results do not apply to such models, and in fact suggest that perhaps they may be preferable to edge independent models, if a distribution over graphs with high triangle density is desired. A very interesting direction for future work would be to prove limitations on broad classes of non-independent models, and perhaps to understand exactly what type of correlation amongst edges is needed to generate graphs with both low overlap and hallmark features of real-world networks. (We note that for non-edge-independent models, the measure of overlap as defined earlier should be adapted to take into account the order (permutation) of the vertices in the final graph. In particular, the overlap in this case should be the maximum of its value over all permutations of the vertex set.)
2 Impossibility Results for Edge Independent Models
We now prove our main results on the limitations of edge independent models with bounded overlap. We start with a simple lemma that will be central in all our proofs.
Lemma 2.
For any symmetric $P \in [0,1]^{n \times n}$, with $G_1, G_2 \sim \mathcal{G}(P)$ drawn independently,
$\mathbb{E}[|E_1 \cap E_2|] \geq \frac{1}{2}\|P\|_F^2, \quad \text{and so} \quad \|P\|_F^2 \leq 2 \cdot \mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|].$
Proof.
Let $X_{ij}$ be the indicator random variable that edge $(i,j)$ appears in the graph $G \sim \mathcal{G}(P)$, so that $\mathbb{E}[X_{ij}] = P_{ij}$. By linearity of expectation and the independence of $G_1$ and $G_2$ we have
$\mathbb{E}[|E_1 \cap E_2|] = \sum_{i \leq j} P_{ij}^2 = \frac{1}{2}\left(\|P\|_F^2 + \sum_i P_{ii}^2\right) \geq \frac{1}{2}\|P\|_F^2.$
The bound follows since $P$ is symmetric. Note that the lower bound is an equality if $P$ is $0$ on the diagonal – i.e., there is no probability of self loops. ∎
2.1 Triangles
Lemma 2 connects $\mathrm{Ov}(P)$ to $\|P\|_F^2$, and in turn to the eigenvalue spectrum of $P$, since $\|P\|_F^2 = \sum_{i=1}^n \lambda_i^2$, where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $P$. The expected number of triangles in $G \sim \mathcal{G}(P)$ can be written in terms of this spectrum as well, allowing us to relate overlap to this expected triangle count and prove our main theorem (Theorem 1), restated below.

Theorem 1.
For a graph $G$, let $T(G)$ denote the number of triangles in $G$. Consider symmetric $P \in [0,1]^{n \times n}$. Then
$\mathbb{E}_{G \sim \mathcal{G}(P)}[T(G)] = O\left(\left(\mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|]\right)^{3/2}\right).$
Proof.
By linearity of expectation, and since all terms are nonnegative,
$\mathbb{E}[T(G)] = \sum_{i < j < k} P_{ij} P_{jk} P_{ki} \leq \frac{1}{6} \mathrm{tr}(P^3) = \frac{1}{6} \sum_{i=1}^n \lambda_i^3 \leq \frac{1}{6} \left(\sum_{i=1}^n \lambda_i^2\right)^{3/2} = \frac{1}{6} \|P\|_F^3 \leq \frac{1}{6} \left(2 \cdot \mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|]\right)^{3/2},$
where the last bound follows from Lemma 2. ∎
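The spectral bound underlying Theorem 1, $\mathbb{E}[T(G)] \leq \frac{1}{6}\|P\|_F^3$, can be sanity-checked by brute force on a small random probability matrix (our own sketch, using NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
M = rng.random((n, n))
P = np.triu(M, k=1)
P = P + P.T                     # random symmetric P in [0,1] with zero diagonal

# Expected triangle count, by linearity over node triples i < j < k.
exp_tri = sum(P[i, j] * P[j, k] * P[i, k]
              for i in range(n)
              for j in range(i + 1, n)
              for k in range(j + 1, n))

# ||P||_F^2 equals the sum of squared eigenvalues of P.
eigs = np.linalg.eigvalsh(P)
frob_sq = (eigs ** 2).sum()
assert exp_tri <= frob_sq ** 1.5 / 6 + 1e-9
```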
The bound of Theorem 1 is tight up to constants, for any possible value of the overlap. The tight example is when $\mathcal{G}(P)$ is simply an Erdős–Rényi distribution.
Theorem 3 (Tightness of Expected Triangle Bound).
For any $\delta \in (0,1]$, there exists a symmetric $P \in [0,1]^{n \times n}$ with $\mathrm{Ov}(P) = \Theta(\delta)$ and $\mathbb{E}[T(G)] = \Theta\left(\left(\mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|]\right)^{3/2}\right)$.
Proof.
Let $P_{ij} = \delta$ for all $i,j$. We have $\mathbb{E}[|E_1|] = \Theta(\delta n^2)$ and $\mathbb{E}[|E_1 \cap E_2|] = \Theta(\delta^2 n^2)$. Thus, $\mathrm{Ov}(P) = \Theta(\delta)$. Further, by linearity of expectation,
$\mathbb{E}[T(G)] = \binom{n}{3} \delta^3 = \Theta(\delta^3 n^3) = \Theta\left(\left(\mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|]\right)^{3/2}\right).$
∎
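The linearity-of-expectation computation for the Erdős–Rényi case can be replicated numerically (our own sketch): each of the $\binom{n}{3}$ node triples closes into a triangle with probability $\delta^3$.

```python
import numpy as np
from math import comb

n, delta = 10, 0.25
P = np.full((n, n), delta)
np.fill_diagonal(P, 0.0)

# Triple sum over i < j < k; every term equals delta^3 here.
exp_tri = sum(P[i, j] * P[j, k] * P[i, k]
              for i in range(n)
              for j in range(i + 1, n)
              for k in range(j + 1, n))
print(abs(exp_tri - comb(n, 3) * delta ** 3) < 1e-9)  # True
```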
We note that another example where Theorem 1 is tight is when $\mathcal{G}(P)$ is a union of a fixed clique on an appropriately chosen number of nodes and an Erdős–Rényi graph with a suitable connection probability on the rest of the nodes.
2.2 Squares and Other Cycles
We can extend Thm. 1 to bound the expected number of $k$-cycles in $G \sim \mathcal{G}(P)$ in terms of $\mathrm{Ov}(P)$.
Theorem 4 (Bound on Expected $k$-Cycles).
For a graph $G$, let $C_k(G)$ denote the number of cycles of length $k$ in $G$. Consider symmetric $P \in [0,1]^{n \times n}$. Then
$\mathbb{E}_{G \sim \mathcal{G}(P)}[C_k(G)] = O_k\left(\left(\mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|]\right)^{k/2}\right).$
Proof.
For notational simplicity, we focus on $k = 4$. The proof directly extends to general $k$. $C_4(G)$ is the number of non-backtracking 4-cycles in $G$ (i.e., squares), which can be written as
$C_4(G) = \frac{1}{8} \sum_{\text{distinct } i,j,k,l} X_{ij} X_{jk} X_{kl} X_{li}.$
The factor $\frac{1}{8}$ accounts for the fact that, in the sum, each square is counted $8$ times – once for each potential starting vertex and once for each direction it may be traversed. For general $k$-cycles this factor would be $\frac{1}{2k}$. We then can bound
$\mathbb{E}[C_4(G)] \leq \frac{1}{8} \mathrm{tr}(P^4) = \frac{1}{8} \sum_{i=1}^n \lambda_i^4 \leq \frac{1}{8} \left(\sum_{i=1}^n \lambda_i^2\right)^2 = \frac{1}{8} \|P\|_F^4.$
For general $k$-cycles this bound would be $\frac{1}{2k} \|P\|_F^k$. This in turn gives
$\mathbb{E}[C_4(G)] \leq \frac{1}{8} \left(2 \cdot \mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|]\right)^2,$
where the last bound follows from Lemma 2. This completes the theorem. ∎
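The counting argument for $k = 4$ can be checked by brute force on a small matrix (our own sketch; `itertools.permutations` enumerates each square exactly $8$ times, matching the factor in the proof):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
n = 6
M = rng.random((n, n))
P = np.triu(M, k=1)
P = P + P.T                               # random symmetric P, zero diagonal

# Expected number of squares: each 4-cycle appears 8 times among the
# ordered 4-tuples (4 starting vertices x 2 traversal directions).
exp_sq = sum(P[i, j] * P[j, k] * P[k, l] * P[l, i]
             for i, j, k, l in permutations(range(n), 4)) / 8

frob_sq = (P ** 2).sum()                  # ||P||_F^2
assert exp_sq <= frob_sq ** 2 / 8 + 1e-9  # E[C4] <= tr(P^4)/8 <= ||P||_F^4/8
```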
It is not hard to see that Theorem 4 is also tight up to a constant depending on $k$, for any overlap value, again for an Erdős–Rényi graph with a suitable connection probability.
Theorem 5 (Tightness of Expected Cycle Bound).
For any $\delta \in (0,1]$ and any $k \geq 3$, there exists a symmetric $P \in [0,1]^{n \times n}$ with $\mathrm{Ov}(P) = \Theta(\delta)$ and $\mathbb{E}[C_k(G)] = \Theta_k\left(\left(\mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|]\right)^{k/2}\right)$.
2.3 Clustering Coefficient
Theorem 1 shows that the expected number of triangles generated by an edge independent model is bounded in terms of the model’s overlap. Intuitively, we thus expect that graphs generated by the edge independent model will have low global clustering coefficient, which is the fraction of wedges in the graph that are closed into triangles Watts and Strogatz (1998).
Definition 3 (Global Clustering Coefficient).
For a graph $G$ with $T(G)$ triangles, no self-loops, and node degrees $d_1, \ldots, d_n$, the global clustering coefficient is given by
$c(G) = \frac{6 \cdot T(G)}{\sum_{i=1}^n d_i(d_i - 1)}.$
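This definition translates directly into code (a sketch of ours, using NumPy; $\mathrm{tr}(A^3)$ counts each triangle six times, and $\sum_i d_i(d_i-1)$ is twice the number of wedges):

```python
import numpy as np

def global_clustering(A):
    """Global clustering coefficient 6*T(G) / sum_i d_i*(d_i - 1)."""
    tri = np.trace(A @ A @ A) / 6        # tr(A^3) counts each triangle 6 times
    d = A.sum(axis=1)
    wedges2 = (d * (d - 1)).sum()        # twice the number of wedges
    return 6 * tri / wedges2 if wedges2 > 0 else 0.0

# A triangle on {0,1,2} with a pendant node 3 attached to node 2:
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(global_clustering(A))  # 1 triangle, 5 wedges -> 0.6
```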
We extend Theorem 1 to give a bound on $\mathbb{E}[c(G)]$ in terms of $\mathrm{Ov}(P)$. The proof is related, but more complex due to the $\sum_i d_i(d_i - 1)$ term in the denominator of $c(G)$.
Theorem 6 (Bound on Expected Clustering Coefficient).
Consider symmetric $P \in [0,1]^{n \times n}$ with zeros on the diagonal and with $\mathbb{E}[|E_1|] = \Omega(n \log n)$. Then
$\mathbb{E}_{G \sim \mathcal{G}(P)}[c(G)] = O\left(\frac{n \cdot \mathrm{Ov}(P)^{3/2}}{\mathbb{E}[|E_1|]^{1/2}}\right).$
Proof.
By Theorem 1 we have $\mathbb{E}[T(G)] = O\left(\left(\mathrm{Ov}(P) \cdot \mathbb{E}[|E_1|]\right)^{3/2}\right)$. We will show that with high probability, $\sum_i d_i(d_i - 1) = \Omega\left(\mathbb{E}[|E_1|]^2 / n\right)$, which will give the theorem. Note that $\mathbb{E}[d_i] = \sum_j P_{ij}$. Thus, by a Bernstein bound, each degree concentrates around its expectation for large enough $n$, since $\mathbb{E}[|E_1|] = \Omega(n \log n)$.
We can bound $\sum_i \mathbb{E}[d_i]^2 \geq \frac{\left(\sum_i \mathbb{E}[d_i]\right)^2}{n} = \frac{4 \cdot \mathbb{E}[|E_1|]^2}{n}$ by Cauchy–Schwarz. Thus, with probability $1 - o(1)$,
$c(G) = O\left(\frac{T(G) \cdot n}{\mathbb{E}[|E_1|]^2}\right),$
where in the last step we use the degree concentration and so $\sum_i d_i(d_i - 1) = \Omega\left(\mathbb{E}[|E_1|]^2 / n\right)$. Combined with our bound on $\mathbb{E}[T(G)]$, and the fact that $c(G) \leq 1$ always, we have
$\mathbb{E}[c(G)] = O\left(\frac{n \cdot \mathrm{Ov}(P)^{3/2}}{\mathbb{E}[|E_1|]^{1/2}}\right).$
∎
Thus, to have a constant clustering coefficient for a graph with $\Theta(n)$ edges in expectation, we need $\mathrm{Ov}(P) = \Omega(n^{-1/3})$. Note that the requirement that $\mathbb{E}[|E_1|] = \Omega(n \log n)$ is very mild – it means that the expected average degree is at least logarithmic in $n$.
As with our triangle bound, Theorem 6 is tight when $\mathcal{G}(P)$ is just an Erdős–Rényi distribution.
Theorem 7 (Tightness of Expected Clustering Coefficient Bound).
For any $\delta \in (0,1]$, there exists a symmetric $P \in [0,1]^{n \times n}$ with zeros on the diagonal, $\mathrm{Ov}(P) = \Theta(\delta)$, and $\mathbb{E}[c(G)] = \Theta\left(\frac{n \cdot \mathrm{Ov}(P)^{3/2}}{\mathbb{E}[|E_1|]^{1/2}}\right)$.
Proof.
Let $P_{ij} = \delta$ for all $i \neq j$. We have $\mathbb{E}[|E_1|] = \Theta(\delta n^2)$ and $\mathrm{Ov}(P) = \Theta(\delta)$. Additionally, $\mathbb{E}[T(G)] = \Theta(\delta^3 n^3)$, and, if $n$ is large enough with respect to $\delta$, with very high probability, $\sum_i d_i(d_i - 1) = \Theta(\delta^2 n^3)$. This gives:
$\mathbb{E}[c(G)] = \Theta\left(\frac{\delta^3 n^3}{\delta^2 n^3}\right) = \Theta(\delta) = \Theta\left(\frac{n \cdot \mathrm{Ov}(P)^{3/2}}{\mathbb{E}[|E_1|]^{1/2}}\right).$
∎
3 Baseline Edge Independent Models
We now shift from proving theoretical limitations of edge independent models to empirically evaluating the tradeoff between overlap and performance for a number of particular models. Given an input adjacency matrix $A$, these generative models produce a probability matrix $P$, samples from which should match various graph statistics of $A$, such as the triangle count, clustering coefficient, and assortativity. At the same time, $P$ should ideally have low overlap, so that the model does not just memorize the original graph. We propose two simple generative models as baselines to more complicated existing models – in both, the level of overlap is easily tuned. Our first baseline, the odds product model, is based on matching the degree sequence of $A$; simpler still, the second baseline computes $P$ as a linear function of $A$, matching just its volume.
Odds product model.
In this model, each node $i$ is assigned a logit $\alpha_i$, and the probability of adding an edge between nodes $i$ and $j$ is $P_{ij} = \sigma(\alpha_i + \alpha_j)$, where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic function. We fit the model by finding a vector $\alpha$ of logits, with one logit for each node, such that the reconstructed network has the same expected degrees (i.e., row and column sums) as the original graph. We note that this model can be seen as a special case of the MaxEnt De Bie (2011) and inner-product Ma et al. (2020); Hoff (2003, 2005) models. In the context of directed graphs, $\alpha_i$ has been called the expansiveness or popularity of node $i$ Goldenberg et al. (2010).

For adjacency matrix $A$, we denote its degree sequence by $d(A) = A \mathbf{1}$, where $\mathbf{1}$ is the all-ones vector of length $n$. Similarly, the degree sequence of the model is $d(P)$. We pose fitting the model as a root-finding problem: we seek $\alpha$ such that the degree errors are zero, that is, $d(P) - d(A) = \mathbf{0}$. We use the multivariate Newton–Raphson method to solve this root-finding problem. To apply Newton–Raphson, we need the Jacobian matrix of derivatives of the degree errors with respect to the entries of $\alpha$. Since $d(A)$ does not vary with $\alpha$, these derivatives are exactly $\partial d(P)_i / \partial \alpha_k$. Letting $\delta_{ik}$ be $1$ if $i = k$ and $0$ otherwise (i.e., the Kronecker delta),
$\frac{\partial d(P)_i}{\partial \alpha_k} = P_{ik}(1 - P_{ik}) + \delta_{ik} \sum_j P_{ij}(1 - P_{ij}).$
In Algorithm 1, we provide pseudocode for computing the Jacobian matrix and for implementing the Newton–Raphson method to compute $\alpha$. We do not have a proof that Algorithm 1 always converges and produces a $P$ which exactly reproduces the input degree sequence. However, the algorithm converged on all test cases, and proving that it always converges would be an interesting future direction.
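Since Algorithm 1 is not reproduced here, the following is our own minimal sketch of the Newton–Raphson fit described above (function names and the undamped update are our choices; the paper's Algorithm 1 may differ in such details as damping and stopping criteria):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_odds_product(A, iters=100, tol=1e-10):
    """Find logits alpha with P_ij = sigmoid(alpha_i + alpha_j) matching
    the degree sequence of A (no self-loops)."""
    n = A.shape[0]
    d_target = A.sum(axis=1)
    alpha = np.zeros(n)
    for _ in range(iters):
        P = sigmoid(alpha[:, None] + alpha[None, :])
        np.fill_diagonal(P, 0.0)
        err = P.sum(axis=1) - d_target          # degree errors d(P) - d(A)
        if np.abs(err).max() < tol:
            break
        Q = P * (1.0 - P)                       # entrywise derivative of sigmoid
        J = Q + np.diag(Q.sum(axis=1))          # Jacobian from the formula above
        alpha -= np.linalg.solve(J, err)        # Newton-Raphson step
    P = sigmoid(alpha[:, None] + alpha[None, :])
    np.fill_diagonal(P, 0.0)
    return alpha, P

# A small test graph with degree sequence (3, 2, 2, 2, 1):
A = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
alpha, P = fit_odds_product(A)
```

Note that the Jacobian $J = Q + \mathrm{diag}(Q\mathbf{1})$ is positive definite whenever all $P_{ij} \in (0,1)$, so the Newton step is well defined.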
Our odds product model can be viewed as a variant of the Chung–Lu configuration model Chung and Lu (2002), which is also based on degree sequence matching, but our model comes without a certain restriction on the maximum degree: in Chung–Lu, it is assumed that the degrees of all nodes are bounded above by the square root of the volume of the graph, that is, $d_i \leq \sqrt{\mathrm{vol}(A)}$ for all nodes $i$. Given this restriction, each node $i$ is assigned a weight $w_i = d_i / \sqrt{\mathrm{vol}(A)}$, and the probability of adding edge $(i,j)$ is $w_i w_j$. Since the weights are all in $[0,1]$, they can be interpreted as probabilities, and the probability of adding an edge between two nodes is the product of the two nodes' probabilities.
Our odds product model works similarly, but instead of a probability, each node has an associated odds, that is, a value in $[0, \infty)$, and the odds of adding an edge between two nodes is the product of the two nodes' odds. There is a one-to-one relationship between a probability $p$, the odds $p/(1-p)$, and the logit $\log(p/(1-p))$. We outlined above how our model is based on adding logits associated with each node; since the odds is the exponentiation of the logit, the model can equally be viewed as multiplying odds associated with nodes.
Varying overlap in the odds product model. We propose a simple method to control the tradeoff between overlap and accuracy in matching the input graph statistics in the odds product model. Given the original adjacency matrix $A$ and the matrix $P$ generated by the odds product model to match the degree sequence of $A$, we use a convex combination of $A$ and $P$. That is, we use $P' = \lambda A + (1 - \lambda) P$, where $\lambda \in [0, 1]$. As $\lambda$ increases to $1$, $\mathcal{G}(P')$ approaches a model which returns the original graph with certainty; hence high $\lambda$ produces $P'$ with high overlap whose samples closely match graph statistics, while low $\lambda$ produces $P'$ with lower overlap whose samples may diverge from $A$ in some statistics. Note that since $P'$ is a convex combination of matrices with the expected degree sequence of $A$, $P'$ also has the same expected degree sequence regardless of the value of $\lambda$.
Linear model. As an even simpler baseline, we also propose and evaluate the following model: we produce an Erdős–Rényi probability matrix $P_{ER}$ with the same expected volume as the original graph $A$, then return a convex combination of $A$ and $P_{ER}$. In particular, each entry of $P_{ER}$ is the constant probability matching the expected volume of $A$, and, as with the odds product model, we use $P' = \lambda A + (1 - \lambda) P_{ER}$, where $\lambda \in [0, 1]$. This model can alternatively be seen as producing a $P'$ by lowering each entry of $A$ which is $1$ to some probability $p_1 < 1$, and raising each entry of $A$ which is $0$ to some probability $p_0 > 0$, with $p_0$ and $p_1$ chosen such that the volume is conserved.
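A minimal sketch of this linear baseline (our own code; the normalization $\mathrm{vol}(A)/(n(n-1))$ for the uniform entry is our reading of "same expected volume", excluding self-loops):

```python
import numpy as np

def linear_model(A, lam):
    """Convex combination of A and a volume-matched Erdős–Rényi matrix.
    lam = 1 reproduces A exactly; lam = 0 is pure Erdős–Rényi."""
    n = A.shape[0]
    p = A.sum() / (n * (n - 1))       # uniform probability matching vol(A)
    P_er = np.full((n, n), p)
    np.fill_diagonal(P_er, 0.0)
    return lam * A + (1.0 - lam) * P_er

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
P = linear_model(A, lam=0.5)
print(abs(P.sum() - A.sum()) < 1e-12)  # volume is conserved: True
```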
4 Experimental Results
We now present our evaluations of different edge independent graph generative models in terms of the tradeoff achieved between overlap and performance in generating graphs with similar key statistics to an input network. These experiments highlight the strengths and limitations of each model, as well as the overall limitations of this class, as established by our theoretical bounds.
4.1 Methods
We compare our proposed models from Section 3 with a number of existing models, described below:

CELL Rendsburg et al. (2020) (Cross-Entropy Low-rank Logits): An alternative to the popular NetGAN method Bojchevski et al. (2018) which strips the proposed architecture of deep learning components and achieves comparable performance in significantly less time, via a low-rank approximation approach. To control overlap, we follow the approach of the original paper, halting training once the generated graph exceeds a specified overlap threshold with the input graph. We set the rank parameter to a value that allows us to reach up to 75% overlap (typical values are 16 and 32).

TSVD (Truncated Singular Value Decomposition): A classic spectral method which computes a rank-$k$ approximation of the adjacency matrix using truncated SVD. As in Seshadhri et al. (2020), the resulting matrix is clipped to $[0,1]$ to yield $P$. Overlap is controlled by varying $k$.
CCOP (Convex Combination Odds Product): The odds product model of Sec. 3, with overlap controlled by taking a convex combination of $P$ and the input adjacency matrix $A$.

HDOP (Highest Degree Odds Product): The odds product model, but with overlap controlled by fixing the edges adjacent to a certain number of the highest-degree nodes. See the Appendix for results on other variants, e.g., where some number of dense subgraphs are fixed.

Linear: The convex combination between the input adjacency matrix and an Erdős–Rényi probability matrix, as described in Sec. 3, with overlap controlled by varying the convex combination parameter.
CCOP, HDOP, and Linear all produce edge probability matrices with the same volume in expectation as the original adjacency matrix. For TSVD, letting $\hat{A}$ be the low-rank approximation of the adjacency matrix, we learn a scalar shift parameter using Newton's method such that the shifted and clipped matrix has the same volume as $A$. We then generate new networks from the edge independent distribution (Def. 1). For CELL, we follow the authors' approach of generating edges without replacement – an edge $(i,j)$ is added with probability proportional to $P_{ij}$.
We sample 5 networks from each distribution and report the average for every statistic.
4.2 Datasets and network statistics
For evaluation, we use the following seven popular datasets with varied structure, from trianglerich social networks to planar road networks:

PolBlogs: A collection of political blogs and the links between them.

Citeseer: A collection of papers from six scientific categories and the citations among them.

Cora: A collection of scientific publications and the citations among them.

Road-Minnesota: A road network from the state of Minnesota. Each intersection is a node.

WebEdu: A webgraph drawn from educational institutions.

PPI: A subgraph of the protein-protein interaction (PPI) network for Homo sapiens. Vertices represent proteins and edges represent interactions.

Facebook: A union of ego networks of Facebook users.
See Table 1 for statistics about the networks. We treat all networks as binary, in that we set all nonzero weights to $1$, and undirected, in that if edge $(i,j)$ appears in the network, we also include edge $(j,i)$. Also, we keep only the largest connected component of each network.
Table 1: Statistics of the networks used in our experiments.

Dataset | Nodes | Edges | Triangles
PolBlogs Adamic and Glance (2005) | 1,222 | 33,428 | 101,043
Citeseer Sen et al. (2008) | 2,110 | 7,336 | 1,083
Cora Sen et al. (2008) | 2,485 | 10,138 | 1,558
Road-Minnesota Rossi and Ahmed (2015) | 2,640 | 6,604 | 53
WebEdu Gleich et al. (2004) | 3,031 | 12,948 | 10,058
PPI Stark et al. (2010) | 3,852 | 75,682 | 91,461
Facebook Leskovec and Mcauley (2012) | 4,039 | 176,468 | 1,612,010
We evaluate performance in matching the following key network statistics:

Pearson correlation of the degree sequences of the input and the generated network.

Maximum degree over all nodes.

Exponent of a powerlaw distribution fit to the degree sequence.

Assortativity, a measure that captures the preference of nodes to attach to others with similar degree (ranging from −1 to 1).

Pearson correlation of the triangle sequence (number of triangles a node participates in).

Total triangle count (analyzed theoretically in Thm. 1).

Characteristic path length (average path length between any two nodes).
4.3 Results
The theoretical results from Section 2 highlight a key weakness of edge independent generative models: they cannot generate many triangles (or other higherorder locally dense areas), without having high overlap and thus not generating a diversity of graphs. We observe that these theoretical findings hold in practice – generally speaking, all models tested tend to significantly underestimate triangle count and global clustering coefficient, as well as inaccurately match the triangle degree sequence, when overlap is low. See Figures 1, 2, 3, 4, 5, 6, and 7 for results on the tested networks. As overlap increases, performance in reconstructing these metrics does as well, as expected.
All methods are able to capture certain network characteristics accurately, even at low overlap. Even for a relatively small overlap (less than 0.2), the CCOP and HDOP methods accurately capture the degree sequences of the true networks (as they are designed to do). These methods, especially HDOP which fixes edges from high degree nodes, often outperform more sophisticated methods like CELL in terms of triangle density and triangle degree sequence correlation. On the other hand, CELL seems to do a somewhat better job capturing global features, like the characteristic path length. TSVD provides a fair compromise – it performs better than CELL in terms of degree sequence and triangle counts, but worse in terms of characteristic path length. In general, it is the method that gives the best results when the overlap is extremely small, appearing to be less sensitive to the variation in overlap.
Broadly speaking, all methods do reasonably well in matching the power-law degree distribution of the networks, even when they do not match the actual degree sequence closely. With the exception of WebEdu, they tend to underestimate the characteristic path length. This is perhaps not surprising given the independent random edge connections; however, it would be interesting to understand this more theoretically.
4.4 Code for Reproducing Results
Code is available at https://github.com/konsotirop/edge_independent_models. Our implementation of the methods we introduce is written in Python and uses the NumPy Harris et al. (2020) and SciPy Virtanen et al. (2020) packages. Additionally, to calculate the various graph metrics, we use the following packages: powerlaw Alstott et al. (2014) and MACE (MAximal Clique Enumerator) Takeaki (2012).
5 Conclusion
Our theoretical results prove limitations on the ability of any edge independent graph generative model to produce networks that match the high triangle densities of realworld graphs, while still generating a diverse set of networks, with low model overlap. These results match empirical findings that popular edge independent models indeed systematically underestimate triangle density, clustering coefficient, and related measures. Despite the popularity of edge independent models, many nonindependent models, such as graph RNNs You et al. (2018) have been proposed. An interesting future direction would be to study the representative power and limitations of such models, giving general theoretical results that provide a foundation for the study of graph generative models.
References
 [1] Adamic and Glance (2005) The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43.
 [2] Alstott et al. (2014) powerlaw: a Python package for analysis of heavy-tailed distributions. PLoS ONE 9 (1), pp. e85777.
 [3] Barabási and Albert (1999) Emergence of scaling in random networks. Science 286 (5439), pp. 509–512.
 [4] Bojchevski et al. (2018) NetGAN: generating graphs via random walks. In International Conference on Machine Learning (ICML).
 [5] Chanpuriya et al. (2020) Node embeddings and exact low-rank representations of complex networks. In Advances in Neural Information Processing Systems (NeurIPS).
 [6] Chung and Lu (2002) The average distances in random graphs with given expected degrees. Proceedings of the National Academy of Sciences 99 (25), pp. 15879–15882.
 [7] Dall and Christensen (2002) Random geometric graphs. Physical Review E 66 (1), pp. 016121.
 [8] De Bie (2011) Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery 23 (3), pp. 407–446.
 [9] De Cao and Kipf (2018) MolGAN: an implicit generative model for small molecular graphs. ICML Deep Generative Models Workshop.
 [10] Durak et al. (2012) Degree relations of triangles in real-world networks and graph models. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management.
 [11] Erdős and Rényi (1960) On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5 (1), pp. 17–60.
 [12] Gleich et al. (2004) Fast parallel PageRank: a linear system approach. Yahoo! Research Technical Report 13, pp. 22.
 [13] Goldenberg et al. (2010) A survey of statistical network models. Foundations and Trends in Machine Learning.
 [14] Grover et al. (2019) Graphite: iterative generative modeling of graphs. In International Conference on Machine Learning (ICML).
 [15] Harris et al. (2020) Array programming with NumPy. Nature 585 (7825), pp. 357–362.
 [16] Hoff (2003) Random effects models for network data. In Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers.
 [17] Hoff (2005) Bilinear mixed-effects models for dyadic data. Journal of the American Statistical Association 100 (469), pp. 286–295.
 [18] Kipf and Welling (2016) Variational graph autoencoders. NeurIPS Bayesian Deep Learning Workshop.
 [19] Leskovec et al. (2010) Kronecker graphs: an approach to modeling networks. Journal of Machine Learning Research 11 (2).
 [20] Leskovec and Mcauley (2012) Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems (NeurIPS), pp. 539–547.
 [21] Li et al. (2018) Learning deep generative models of graphs. arXiv:1803.03324.
 [22] Liao et al. (2019) Efficient graph generation with graph recurrent attention networks. arXiv:1910.00760.
 [23] Ma et al. (2020) Universal latent space model fitting for large networks with edge covariates. Journal of Machine Learning Research 21.
 [24] Pinar et al. (2012) The similarity between stochastic Kronecker and Chung–Lu graph models. In Proceedings of the 2012 SIAM International Conference on Data Mining.
 [25] Rendsburg et al. (2020) NetGAN without GAN: from random walks to low-rank approximations. In International Conference on Machine Learning (ICML).
 [26] Rossi and Ahmed (2015) The network data repository with interactive graph analytics and visualization. In AAAI.
 [27] Sala et al. (2010) Measurement-calibrated graph models for social network experiments. In Proceedings of the 19th International Conference on World Wide Web (WWW).
 [28] Sen et al. (2008) Collective classification in network data. AI Magazine 29 (3), pp. 93–106.
 [29] Seshadhri et al. (2020) The impossibility of low-rank representations for triangle-rich complex networks. Proceedings of the National Academy of Sciences 117 (11), pp. 5631–5637.
 [30] Simonovsky and Komodakis (2018) GraphVAE: towards generation of small graphs using variational autoencoders. In International Conference on Artificial Neural Networks, pp. 412–422.
 [31] Snijders and Nowicki (1997) Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification 14 (1), pp. 75–100.
 [32] Stark et al. (2010) The BioGRID interaction database: 2011 update. Nucleic Acids Research 39, pp. D698–D704.
 [33] Takeaki (2012) Implementation issues of clique enumeration algorithm. Progress in Informatics 9, pp. 25–30.
 [34] Virtanen et al. (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, pp. 261–272.
 [35] Watts and Strogatz (1998) Collective dynamics of 'small-world' networks. Nature 393 (6684), pp. 440–442.
 [36] You et al. (2018) GraphRNN: generating realistic graphs with deep autoregressive models. In International Conference on Machine Learning (ICML), pp. 5708–5717.
Appendix A Exact Embeddings in the CELL Model
Recently, Rendsburg et al. [25] proposed the CELL graph generator: a major simplification of the NetGAN algorithm of [4], which gives comparable performance, much faster runtimes, and helps clarify the key components of the generator. CELL uses a simple low-rank factorization model. Here we prove that, when its rank parameter is sufficiently large, the CELL model can 'memorize' any graph with bounded maximum degree. This allows the model to trivially produce distributions with very high expected triangle densities. However, as our main results show, this inherently requires memorization and high overlap.
Our result can be viewed as an extension of the results of [5], which consider a different edge independent model; the proof techniques are very similar. Interestingly, our result seems to indicate that the good generalization of CELL in link prediction tests may mostly be due to the fact that this model is not fully optimized, to the point of memorizing the input.
The CELL Model. We first describe the CELL model introduced in [25].

1. Given a graph adjacency matrix $A \in \{0,1\}^{n \times n}$, let
$$W^* = \operatorname*{arg\,min}_{W \in \mathbb{R}^{n \times n}:\ \operatorname{rank}(W) \le k}\ -\sum_{i,j:\, A_{ij}=1} \log \sigma(W)_{ij}, \qquad (2)$$
where $\sigma$ applies a softmax row-wise to $W$ – ensuring that each row of $\sigma(W)$ sums to $1$.

2. Let $\hat{P} = \frac{1}{2}\left(D\,\sigma(W^*) + \sigma(W^*)^\top D\right)$, where $D$ is the diagonal degree matrix.

3. Generate $G \sim \mathcal{G}(\hat{P})$.
Note that the last step described above is slightly different from the approach taken in CELL. Rather than using an edge-independent model as in Def. 1, they form $G$ by sampling $m$ edges without replacement, with probability proportional to the entries of the learned edge score matrix. They also ensure that at least one edge is sampled adjacent to every node. However, this distinction is minor.
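As an illustration, the pipeline above can be sketched in a few lines of NumPy. This is a minimal sketch, not the reference CELL implementation: it skips the rank-constrained optimization entirely and instead plugs in logits whose row-wise softmax is exactly $D^{-1}A$ (the unconstrained optimum; see Thm. 8 below), then applies the symmetrization $\hat{P} = \frac{1}{2}(D\,\sigma(W^*) + \sigma(W^*)^\top D)$ and the edge-independent sampling of Def. 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def row_softmax(W):
    """Row-wise softmax: each row of the result sums to 1."""
    E = np.exp(W - W.max(axis=1, keepdims=True))  # shift for stability
    return E / E.sum(axis=1, keepdims=True)

# Toy input: adjacency matrix of a 4-cycle.
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
n = len(A)
D = np.diag(A.sum(axis=1))  # diagonal degree matrix

# Stand-in for the learned rank-k logits W*: logit 0 on edges and a very
# negative value off the edges, so that the row-wise softmax is exactly
# the random-walk matrix D^{-1} A.
W_star = np.where(A > 0, 0.0, -1e6)
T = row_softmax(W_star)
assert np.allclose(T, np.linalg.inv(D) @ A)

# Symmetrize to an edge-probability matrix and sample G ~ G(P_hat)
# edge-independently, as in Def. 1.
P_hat = (D @ T + T.T @ D) / 2
U = rng.random((n, n)) < P_hat      # one independent coin per pair
G = np.triu(U, 1)
G = G | G.T                         # undirected: symmetric, no self-loops
```

At this (unconstrained) optimum, $\hat{P} = A$, so the sampled $G$ reproduces the input graph exactly, matching Theorem 8.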
Unconstrained Optimum. We first show that, if the rank constraint in (2) is removed, then the optimal $W$ has $\sigma(W) = D^{-1}A$, where $D$ is the diagonal degree matrix. At this minimum, we can check that $\sigma(W)_{ij} = A_{ij}/d_i$, where $d_i$ is the degree of the $i$th node, and thus $\hat{P} = \frac{1}{2}\left(D\,\sigma(W) + \sigma(W)^\top D\right) = \frac{1}{2}(A + A^\top) = A$ and $G \sim \mathcal{G}(A)$. That is, the model simply outputs the input graph with probability $1$.
Theorem 8 (CELL Optimum).
The unconstrained CELL objective function (2) is minimized when $\sigma(W) = D^{-1}A$. At this minimum, the edge independent model is simply $\mathcal{G}(A)$. That is, the model just returns the input graph with probability $1$.
Proof.
It suffices to consider the $i$th row of $\sigma(W)$ for each $i \in [n]$, since the objective function of (2) breaks down row-wise. Let $q_i$ and $a_i$ be the $i$th rows of $\sigma(W)$ and $A$, respectively. Note that $q_i$ is a probability vector, with $q_i(j) \ge 0$ for all $j$ and $\sum_{j=1}^n q_i(j) = 1$.
We seek to minimize $-\sum_{j:\, a_i(j)=1} \log q_i(j)$. We need to show that this objective is minimized when $q_i = a_i/d_i$ – i.e., when $q_i$ places $1/d_i$ mass at each nonzero entry of $a_i$. Since $a_i/d_i$ is the $i$th row of $D^{-1}A$, applying this argument to all $i$ gives that $D^{-1}A$ is the overall minimizer. Assume for the sake of contradiction that there is some other minimizer $q \ne a_i/d_i$. Then we must have $q(j) \ne 1/d_i$ for some $j$ with $a_i(j) = 1$. In turn, there must be some pair of indices $j, j'$ with either (1) $q(j) > 0$ and $a_i(j) = 0$, or (2) $q(j) > q(j')$ and $a_i(j) = a_i(j') = 1$. In case (1), clearly moving the mass at $q(j)$ onto the nonzero entries of $a_i$ will decrease the objective function. In case (2), due to the concavity of the log function, moving mass from $q(j)$ to $q(j')$ to equalize them will also decrease the objective function. Thus, $q$ cannot be a minimizer, completing the proof. ∎
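The row-wise claim above is easy to sanity-check numerically. The following sketch (the helper name `row_obj` is ours) fixes one 0/1 row $a$ with degree $d = 3$ and verifies that the uniform vector $a/d$ attains a lower objective value than randomly drawn probability vectors:

```python
import numpy as np

def row_obj(q, a):
    """Row-wise objective from (2): -sum of log q(j) over nonzeros of a."""
    return -np.log(q[a > 0]).sum()

a = np.array([1, 0, 1, 1, 0], dtype=float)  # a row with degree d = 3
q_opt = a / a.sum()                          # mass 1/d on each nonzero

rng = np.random.default_rng(1)
for _ in range(2000):
    q = rng.dirichlet(np.ones(a.size))       # random probability vector
    assert row_obj(q, a) >= row_obj(q_opt, a) - 1e-9
```

Here the minimum value is $-d \log(1/d) = 3\log 3$, attained only at the uniform-on-support vector, in line with the concavity argument in the proof.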
Rank-Constrained Optimum. We next show that the unconstrained optimum $\sigma(W) = D^{-1}A$, which leads to CELL memorizing the input graph (Thm. 8), can be achieved even under the rank constraint of (2), as long as $k \ge 2d_{\max}+1$, where $d_{\max}$ is the maximum degree of the input graph.
Theorem 9 (CELL Exact Factorization).
If $A \in \{0,1\}^{n \times n}$ is an adjacency matrix with maximum degree $d_{\max}$, there is a rank-$(2d_{\max}+1)$ matrix $\tilde{W}$ with
$$\lim_{c \to \infty} \sigma(c\tilde{W}) = D^{-1}A,$$
where $c$ is a scalar. Note that the rank of $c\tilde{W}$ does not depend on $c$, and so we can drive $c \to \infty$ and find a rank-$(2d_{\max}+1)$ matrix which is arbitrarily close to minimizing (2) and thus produces $\sigma(c\tilde{W})$ which is arbitrarily close to $D^{-1}A$.
Proof.
Let $V \in \mathbb{R}^{n \times k}$ be the Vandermonde matrix with $V_{ij} = i^{j-1}$. For any $w \in \mathbb{R}^k$, $(Vw)_i = \sum_{j=1}^{k} w_j \, i^{j-1}$. That is: $Vw$ is a degree-$(k-1)$ polynomial evaluated at the integers $1, \ldots, n$.
Let $a_i$ be the $i$th row of $A$. Note that $a_i$ has at most $d_{\max}$ nonzeros, whose positions we denote by $j_1 < \ldots < j_t$, with $t \le d_{\max}$. To prove the theorem, for each row $i$, we will construct a polynomial $p_i$ of degree at most $2d_{\max}$ which has the same positive value at each $j_s$ and is negative at all other integers in $[n]$. Then, letting $w_i \in \mathbb{R}^k$ with $k = 2d_{\max}+1$ be the coefficient vector of $p_i$, we will let $\tilde{W}$ be the matrix with $i$th row $(Vw_i)^\top$. Note that $\operatorname{rank}(\tilde{W}) \le k$, and $\tilde{W}_{ij}$ is equal to a fixed positive value whenever $A_{ij}$ is one and negative whenever it is zero. If we scale $\tilde{W}$ by a very large number $c$ (which does not affect its rank), we will have $\sigma(c\tilde{W})$ arbitrarily close to $D^{-1}A$, since the row-wise softmax will place equal probability on each positive entry in row $i$ of $c\tilde{W}$ and probability arbitrarily close to $0$ on each negative entry. So the $i$th row will have its mass exactly at the nonzero entries of $a_i$: $d_i$ entries, each equal to $1/d_i$.
It remains to exhibit the polynomials $p_i$ needed to construct $\tilde{W}$. We start by constructing a polynomial of degree $2t$ that is positive on each nonzero position of $a_i$ and negative at all other indices. Later we will modify this polynomial to have the same positive value at each nonzero position of $a_i$. Let $l_1, \ldots, l_t$ and $r_1, \ldots, r_t$ be any values with $j_s - 1 < l_s < j_s$ and $j_s < r_s < j_s + 1$ for all $s \in [t]$. Consider the polynomial with roots at each $l_s$ and $r_s$ – this polynomial has $2t$ roots and so degree $2t \le 2d_{\max}$. It flips sign exactly at each $l_s$ and $r_s$, and so has the same sign at all of $j_1, \ldots, j_t$ (either positive or negative). Simply negating the coefficients if needed, we can ensure that this sign is positive, while the polynomial is negative at all other integer indices, giving the result.
The polynomial above can be written as $p(x) = \pm \prod_{s=1}^{t} (x - l_s)(x - r_s)$. Choose $l_s = j_s - \epsilon$ and $r_s = j_s + \epsilon w_s$, where $\epsilon > 0$ is arbitrarily small and $w_s > 0$ is a weight chosen specifically for $j_s$, which we'll set later. We have, for any $s \in [t]$,
$$\prod_{s'=1}^{t} (j_s - l_{s'})(j_s - r_{s'}) = -\epsilon^2 w_s \cdot \prod_{s' \ne s} (j_s - j_{s'} + \epsilon)(j_s - j_{s'} - \epsilon w_{s'}).$$
Thus, if we set $w_s = \prod_{s' \ne s} (j_s - j_{s'})^{-2}$, then in the limit as $\epsilon \to 0$ we will have $\prod_{s'=1}^{t} (j_s - l_{s'})(j_s - r_{s'}) = -\epsilon^2(1 + o(1))$ for every $s$. If we negate the polynomial and scale it appropriately by $1/\epsilon^2$ (which doesn't change its degree), we will have $p(j_s)$ arbitrarily close to one for each nonzero index $j_s$, and $p$ negative at each zero index. This gives the theorem. ∎
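The construction in this proof can be checked numerically. The sketch below (the function name `row_logits` is ours) implements the scaled polynomial $-\frac{1}{\epsilon^2}\prod_s (x - (j_s - \epsilon))(x - (j_s + \epsilon w_s))$ with the weights $w_s$ set as above, assembles $\tilde{W}$ for an 8-cycle (so $d_{\max} = 2$), and confirms that the row-wise softmax is close to $D^{-1}A$ while $\operatorname{rank}(\tilde{W}) \le 2d_{\max}+1 = 5$:

```python
import numpy as np

def row_logits(nz, n, eps=1e-3):
    """One row of W-tilde: a degree-2t polynomial with roots at j_s - eps
    and j_s + eps*w_s for each nonzero (1-indexed) position j_s, scaled by
    -1/eps^2 so that its value at each j_s tends to 1 as eps -> 0."""
    nz = np.asarray(nz, dtype=float)
    w = np.array([1.0 / np.prod((j - nz[nz != j]) ** 2) for j in nz])
    x = np.arange(1, n + 1, dtype=float)
    p = -np.ones(n) / eps**2
    for j, ws in zip(nz, w):
        p *= (x - (j - eps)) * (x - (j + eps * ws))
    return p

# Toy graph: the cycle on 8 nodes, so d_max = 2 and 2*d_max + 1 = 5.
n = 8
A = np.zeros((n, n))
i = np.arange(n)
A[i, (i + 1) % n] = 1
A[(i + 1) % n, i] = 1

W_tilde = np.vstack([row_logits(np.flatnonzero(row) + 1, n) for row in A])

# Row-wise softmax of W_tilde is (nearly) the random-walk matrix D^{-1} A:
# the positive entries (~1) get equal mass, the hugely negative ones get ~0.
S = np.exp(W_tilde - W_tilde.max(axis=1, keepdims=True))
S /= S.sum(axis=1, keepdims=True)
assert np.allclose(S, A / A.sum(axis=1, keepdims=True), atol=1e-2)

# Each row is a degree-4 polynomial evaluated at 1..n, i.e. W_tilde = C V^T
# for a Vandermonde V with 5 columns, so its rank is at most 5. (The large
# tol absorbs floating-point noise: entries of W_tilde reach ~1e8.)
assert np.linalg.matrix_rank(W_tilde, tol=1.0) <= 5
```

Note that with a small fixed $\epsilon$ the negative logits are already of order $-1/\epsilon^2$, so the softmax concentrates on the support without any further scaling by $c$.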