Stochastic Kronecker graphs were introduced by Leskovec et al. as a method for simulating very large random graphs. Random synthetic graphs are used to test graph algorithms and to understand observed properties of graphs. By using simulated graphs, instead of real measured ones, it is possible to test algorithms on graphs larger or denser than presently observed ones. Simulated graphs also allow one to judge which features of a real graph are likely to hold in other similar graphs and which are idiosyncratic to the given data.
Stochastic Kronecker graphs are able to serve these purposes through a model that has only three or four parameters. Parameter estimation poses unique challenges for these graphs. The main problem is that for a graph with n nodes, the likelihood has contributions from all n! permutations of the nodes. In practice, many thousands or millions of randomly sampled permutations are used to estimate the likelihood. Even then it takes more than n^2 work to evaluate the likelihood contribution from any one of the permutations.
In this paper we present a method of moments strategy for parameter estimation. While moment methods can be inefficient compared to maximum likelihood, statistical efficiency is of reduced importance for enormous samples and in settings where the dominant error is lack of fit. The method equates expected to observed counts for edges, triangles, hairpins (2-stars or wedges) and tripins (3-stars). The Kronecker model gives quite tractable formulas for these moments.
The outline of this paper is as follows. Section 2 defines Kronecker graphs and introduces some notation. Section 3 derives the expected feature counts. Section 4 describes how to solve method of moment equations for the parameters of the Kronecker graph model. Section 5 presents some examples of fitting Kronecker models to real world graphs. We compare several moment based ways to estimate Kronecker graph parameters and find the most reliable results come from a criterion that sums squared relative errors between observed and expected features. We find that the fitted Kronecker models usually underestimate the number of triangles compared to the real graphs. While our parameter estimates underestimate triangle counts and some other features, we find that they provide much closer matches than some previously published parameters fit by KronFit. Section 6 fits parameters to graphs that were randomly generated from the Kronecker model. We find that the estimated parameters closely track their generating values, with some small bias when a parameter is at the extreme range of valid values. Section 7 has our conclusions.
The data for our examples is online at
along with the code used to estimate Kronecker parameters.
2 The Kronecker model
Given a node set N = {0, 1, ..., n-1} of cardinality n, and a matrix P = (P_uv) with entries in [0,1], a random graph R is one where the edge (u,v) exists with probability P_uv, and all edges exist or don't independently. The graph R includes loops (u,u) and may possibly include both (u,v) and (v,u). We snip these out by defining the random graph G with edge (u,v) only when (u,v) is an edge of R and u < v, using any non-random ordering of N. Both R and G are in fact probability weighted ensembles of graphs, but for simplicity we describe them as single random graphs. We assume that P is a symmetric matrix and so the ordering of nodes does not affect the distribution.
The description of R allows up to n(n+1)/2 parameters that affect the outcome. Much more parsimonious descriptions can be made by taking P to be the Kronecker product of two or more smaller matrices. Recall that the Kronecker product of an m x n matrix A = (a_ij) and a p x q matrix B is the mp x nq block matrix

    A (x) B = [ a_11 B  ...  a_1n B ]
              [  ...    ...   ...   ]
              [ a_m1 B  ...  a_mn B ].
An extremely parsimonious stochastic Kronecker graph takes P to be the r-fold Kronecker product of

    Theta = [ a  b ]
            [ b  c ]

for a, b, c in [0,1]. That is,

    P = Theta (x) Theta (x) ... (x) Theta  (r factors),

a matrix of size n x n with n = 2^r.
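As a quick numerical illustration of the r-fold Kronecker power, here is a sketch in numpy with purely hypothetical parameter values a = 0.9, b = 0.6, c = 0.2:

```python
import numpy as np

# Illustrative parameter values only; the paper estimates these from data.
a, b, c = 0.9, 0.6, 0.2
theta = np.array([[a, b], [b, c]])

# r-fold Kronecker power: P has shape (2**r, 2**r).
r = 3
P = theta.copy()
for _ in range(r - 1):
    P = np.kron(P, theta)

print(P.shape)   # (8, 8)
print(P[0, 0])   # a**r = 0.729
```

The full matrix P has 4^r entries, so forming it explicitly like this is only feasible for small r; the simulations described later avoid ever materializing P.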
If the power r is known, then only three numbers need to be specified, and with them we can then simulate other graphs that are like the original. Perhaps surprisingly, stochastic Kronecker graphs imitate many, but of course not all, of the important features seen in large real world graphs. See for example .
We would like to pick parameters to match the properties seen in a real and large graph. Parameter matrices Theta = [a b; b c] and Theta' = [c b; b a] give rise to the same graph distribution. To force identifiability, we may assume that a >= c.
3 Moment formulas
The Kronecker structure in P makes certain aspects of G very tractable. For example, the number of edges E in G can be shown to have expectation

    mu_E = ( (a+2b+c)^r - (a+c)^r ) / 2.    (1)
Simply counting the edges in G gives us valuable information on the parameter vector (a, b, c). Because E is a sum of independent Bernoulli random variables we find that Var(E) <= mu_E, and so the relative uncertainty will be small in a graph with a large number of expected edges.
This section derives equation (1) and similar formulas for the expected number of features of various types. The expected feature counts require sums over various sets of nodes. Section 3.1 records some summation formulas that simplify that task. Then Section 3.2 turns expected feature counts into sums and Section 3.3 shows how those sums simplify for stochastic Kronecker matrices.
3.1 Summation formulas
Let u, v, w, and x denote indices ranging over a finite index set N. A plain summation sign Σ represents sums over all combinations of levels of all the subscripting indices used. The symbol Σ≠ includes all levels of all indices, except for any combinations where two or more of those indices take the same value. In several places we find that sums are easier to do over all levels of all indices, while the desired sums are over unique levels. Here we record some formulas to translate the second type into the first.
It is elementary that

    Σ≠_{u,v} f(u,v) = Σ f(u,v) - Σ f(u,u),    (2)

and that with three indices

    Σ≠_{u,v,w} f(u,v,w) = Σ f(u,v,w) - Σ f(u,u,w) - Σ f(u,v,u) - Σ f(u,v,v) + 2 Σ f(u,u,u).    (3)
When there are four indices, we get

    Σ≠_{u,v,w,x} f(u,v,w,x)
      = Σ f(u,v,w,x)
        - [ Σ f(u,u,w,x) + Σ f(u,v,u,x) + Σ f(u,v,w,u) + Σ f(u,v,v,x) + Σ f(u,v,w,v) + Σ f(u,v,w,w) ]
        + 2 [ Σ f(u,u,u,x) + Σ f(u,u,w,u) + Σ f(u,v,u,u) + Σ f(u,v,v,v) ]
        + [ Σ f(u,u,w,w) + Σ f(u,v,u,v) + Σ f(u,v,v,u) ]
        - 6 Σ f(u,u,u,u).    (4)
In some of our formulas below, the first index is singled out but the others are exchangeable. By this we mean that f(u,v,w) = f(u,w,v) when there are three indices, while invariance of f(u,v,w,x) under all permutations of its last three arguments is the version for four indices.
When indices after the first are exchangeable, then equation (3) simplifies to

    Σ≠_{u,v,w} f(u,v,w) = Σ f(u,v,w) - 2 Σ f(u,u,w) - Σ f(u,v,v) + 2 Σ f(u,u,u),    (5)
and equation (4) simplifies to

    Σ≠_{u,v,w,x} f(u,v,w,x) = Σ f(u,v,w,x) - 3 Σ f(u,u,w,x) - 3 Σ f(u,v,v,x)
        + 6 Σ f(u,u,u,x) + 2 Σ f(u,v,v,v) + 3 Σ f(u,u,w,w) - 6 Σ f(u,u,u,u).    (6)
When all indices are exchangeable, so that f is invariant under all permutations of its arguments, then equation (5) simplifies to

    Σ≠_{u,v,w} f(u,v,w) = Σ f(u,v,w) - 3 Σ f(u,u,w) + 2 Σ f(u,u,u).    (7)
3.2 Expected feature counts for independent edges
The graph features we describe are shown in Figure 1. In addition to edges, there are hairpins (2-stars) where two edges share a common node, tripins (3-stars) where three edges share a node, and triangles. The Kronecker model has independent edges. Here we find the expected feature counts for any random graph where edge (u,v) appears with probability P_uv and edges are independent.
Recall that R is a random graph in which edge (u,v) appears with probability P_uv (independently). Let it have incidence matrix X, with X_uv = 1 when (u,v) is an edge of R and X_uv = 0 otherwise. There may be loops X_uu = 1, and for u ≠ v, X_uv and X_vu are independently generated. The graph G is formed by deleting loops from R and symmetrizing the incidence matrix via

    Y_uv = Y_vu = X_{min(u,v), max(u,v)} for u ≠ v, with Y_uu = 0.
The number of edges in G is E = (1/2) Σ≠_{u,v} Y_uv. The expected number of edges satisfies

    mu_E = (1/2) ( Σ_{u,v} P_uv - Σ_u P_uu ),

by equation (2).
The number of hairpins in G is H = (1/2) Σ≠_{u,v,w} Y_uv Y_uw. Dividing by two adjusts the sum for counting each hairpin twice. The expected value of H satisfies

    mu_H = (1/2) ( Σ_{u,v,w} P_uv P_uw - 2 Σ_{u,w} P_uu P_uw - Σ_{u,v} P_uv^2 + 2 Σ_u P_uu^2 ),

by letting f(u,v,w) = P_uv P_uw, for which f(u,v,w) = f(u,w,v), and applying equation (5).
The number of triangles in G is T = (1/6) Σ≠_{u,v,w} Y_uv Y_vw Y_wu, because the sum counts each triangle 3! = 6 times. The expected value of each term is P_uv P_vw P_wu, which is symmetric in its three arguments, and so we may apply equation (7) to get

    mu_T = (1/6) ( Σ_{u,v,w} P_uv P_vw P_wu - 3 Σ_{u,w} P_uu P_uw^2 + 2 Σ_u P_uu^3 ).
The number of tripins in G is S = (1/6) Σ≠_{u,v,w,x} Y_uv Y_uw Y_ux. The final three indices in f(u,v,w,x) = P_uv P_uw P_ux are exchangeable, and so equation (6) applies. Thus

    mu_S = (1/6) ( Σ P_uv P_uw P_ux - 3 Σ P_uu P_uw P_ux - 3 Σ P_uv^2 P_ux
                   + 6 Σ P_uu^2 P_ux + 2 Σ P_uv^3 + 3 Σ P_uu P_uw^2 - 6 Σ P_uu^3 ).
3.3 Simplifying the sums
The sums in the expected counts simplify, because of the properties of the Kronecker graph. Let the node set be N = {0, 1, ..., 2^r - 1}. For u in N write u_t in {0, 1} for the t'th binary digit of u, so that u = Σ_{t=1}^r u_t 2^{t-1}. Similarly let v, w, and x be described in terms of their bits v_t, w_t, and x_t for t = 1, ..., r.
The matrix entry P_uv may be written

    P_uv = Π_{t=1}^r θ_{u_t v_t},

where θ_00 = a, θ_01 = θ_10 = b, and θ_11 = c.
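This entrywise product over bit pairs can be checked against an explicit Kronecker power. The following sketch (with hypothetical parameter values) compares the two for every entry:

```python
import numpy as np

a, b, c = 0.9, 0.6, 0.2   # hypothetical parameter values
theta = np.array([[a, b], [b, c]])
r = 3

def edge_prob(u, v, theta, r):
    """P_uv as a product over the binary digits of u and v."""
    p = 1.0
    for t in range(r):
        p *= theta[(u >> t) & 1, (v >> t) & 1]
    return p

# Agrees with the explicit Kronecker power for every entry.
P = theta
for _ in range(r - 1):
    P = np.kron(P, theta)
ok = all(abs(edge_prob(u, v, theta, r) - P[u, v]) < 1e-12
         for u in range(2 ** r) for v in range(2 ** r))
print(ok)  # True
```

Because the product over bits is order-invariant, it does not matter whether the bits are taken most- or least-significant first.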
For r > 1, we simplify the expression by induction using a smaller version of the problem defined via P' = Theta^{(x)(r-1)}, the (r-1)-fold Kronecker power. Specifically,

    Σ_{u,v} P_uv = ( Σ_{i,j} θ_ij ) ( Σ_{u',v'} P'_{u'v'} ) = (a+2b+c)^r,

where the indices i and j are summed over their full range {0,1}, and the indices u' and v' are summed over the node set {0, 1, ..., 2^{r-1} - 1} of the smaller problem.
All of the sums of products of elements of P listed in the previous section, with summation over all levels of each index, also reduce this way to r'th powers of their value for the case r = 1.
For r = 1 we need to sum products of elements of Θ over one, two, three, or four indices, each ranging over {0, 1}. These cases correspond to the rows of Table 1. For instance Σ_{u,v} P_uv requires Σ_{i,j} θ_ij, and so we know it to be the r'th power of a + 2b + c.
In the rest of this section, we record the other sums we need. First, the sums over one index variable take the form

    Σ_i θ_ii^s = a^s + c^s,

where cases s = 1, 2, 3 are used in our expected feature counts. The sums over two index variables are

    Σ_{i,j} θ_ii^s θ_ij^t = a^s (a^t + b^t) + c^s (b^t + c^t).

The cases we need are for (s, t) in {(0,1), (0,2), (0,3), (1,1), (1,2), (2,1)}.
Four sums over three indices are used. They are:

    Σ_{i,j,k} θ_ij θ_ik = (a+b)^2 + (b+c)^2,
    Σ_{i,j,k} θ_ij θ_jk θ_ki = a^3 + 3b^2(a+c) + c^3,
    Σ_{i,j,k} θ_ii θ_ij θ_ik = a(a+b)^2 + c(b+c)^2,  and
    Σ_{i,j,k} θ_ij^2 θ_ik = (a^2+b^2)(a+b) + (b^2+c^2)(b+c).
Finally, one sum over four indices is used:

    Σ_{i,j,k,l} θ_ij θ_ik θ_il = (a+b)^3 + (b+c)^3.
3.4 Expected feature counts
Now we can specialize the results of Section 3.2 to the Kronecker graph setting. Gathering together the previous developments, we find

    mu_E = (1/2) [ (a+2b+c)^r - (a+c)^r ],
    mu_H = (1/2) [ ((a+b)^2 + (b+c)^2)^r - 2 (a^2+ab+bc+c^2)^r - (a^2+2b^2+c^2)^r + 2 (a^2+c^2)^r ],
    mu_T = (1/6) [ (a^3 + 3b^2(a+c) + c^3)^r - 3 (a^3 + b^2(a+c) + c^3)^r + 2 (a^3+c^3)^r ],  and
    mu_S = (1/6) [ ((a+b)^3 + (b+c)^3)^r - 3 (a(a+b)^2 + c(b+c)^2)^r
                   - 3 ((a^2+b^2)(a+b) + (b^2+c^2)(b+c))^r + 6 (a^2(a+b) + c^2(b+c))^r
                   + 3 (a(a^2+b^2) + c(b^2+c^2))^r + 2 (a^3+2b^3+c^3)^r - 6 (a^3+c^3)^r ].
In each formula, the terms from sums over fewer indices come after the ones from more indices. The later terms adjust for loops and double edges and other degenerate quantities. For large r, we expect that the first term should be most important. In particular, if b > 0 then in all cases the first quantity raised to the power r is the largest one. For example, the first term in mu_E is ((a+2b+c)/(a+c))^r times as large as the second one, which subtracts out loops.
The first term will dominate for large r unless b is very small. The relative magnitude of the second term in mu_E is

    ((a+c)/(a+2b+c))^r = (1 + b/d)^{-r},

where d = (a+c)/2 is the average of the diagonal elements of Θ. If (a+c)^2 <= a+2b+c, then dropping the second term in mu_E makes a smaller difference than the sampling uncertainty in E. This holds when the off diagonal element of Θ is not too small compared to the average of the diagonal elements: b >= d(2d-1).
Some special cases of the formulas are of interest. For example, if b = 0 then there are no edges in R apart from loops. As a result G has n isolated nodes. We find from the above that mu_E = 0 when b = 0.
If instead a = c = 0, then each node u with coordinates (u_1, ..., u_r) has a dual node ū which has coordinates (1-u_1, ..., 1-u_r). The only possible edges in G are between nodes and their duals. There are 2^{r-1} such node pairs, each with probability b^r of having an edge out to its dual. The formula above gives mu_E = 2^{r-1} b^r in this case, as it should. We also get mu_T = 0 when a = c = 0.
If a = b = c = 1, then R has every possible edge and loop with probability 1. As a result G is the complete graph on n = 2^r nodes. Then it has n(n-1)/2 edges, n(n-1)(n-2)/2 hairpins, n(n-1)(n-2)/6 triangles, and it has n(n-1)(n-2)(n-3)/6 tripins.
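The closed forms above are straightforward to implement. Here is a sketch encoding the four expected counts exactly as displayed, with the complete-graph case a = b = c = 1 replayed as a sanity check:

```python
def expected_features(a, b, c, r):
    """Expected edges, hairpins, triangles, tripins for Theta = [[a,b],[b,c]]
    raised to the r-fold Kronecker power, per the Section 3.4 formulas."""
    E = ((a + 2*b + c)**r - (a + c)**r) / 2
    H = (((a + b)**2 + (b + c)**2)**r
         - 2*(a**2 + a*b + b*c + c**2)**r
         - (a**2 + 2*b**2 + c**2)**r
         + 2*(a**2 + c**2)**r) / 2
    T = ((a**3 + 3*b**2*(a + c) + c**3)**r
         - 3*(a**3 + b**2*(a + c) + c**3)**r
         + 2*(a**3 + c**3)**r) / 6
    S = (((a + b)**3 + (b + c)**3)**r
         - 3*(a*(a + b)**2 + c*(b + c)**2)**r
         - 3*((a**2 + b**2)*(a + b) + (b**2 + c**2)*(b + c))**r
         + 6*(a**2*(a + b) + c**2*(b + c))**r
         + 3*(a*(a**2 + b**2) + c*(b**2 + c**2))**r
         + 2*(a**3 + 2*b**3 + c**3)**r
         - 6*(a**3 + c**3)**r) / 6
    return E, H, T, S

# Sanity check: a = b = c = 1 gives the complete graph on n = 2**r nodes.
n = 2**4
E, H, T, S = expected_features(1.0, 1.0, 1.0, 4)
print(E, H, T, S)  # 120.0 1680.0 560.0 7280.0
```

The printed values equal n(n-1)/2, n(n-1)(n-2)/2, n(n-1)(n-2)/6, and n(n-1)(n-2)(n-3)/6 for n = 16, matching the special case above; setting b = 0 returns zero expected edges, matching the other special case.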
4 Solving for a, b, and c
There are four equations in Section 3.4. To estimate a, b, and c will require at least three of them. Because they are high order polynomials, it is possible that there are multiple solutions or even none at all. The latter circumstance would provide some evidence of lack of fit of the stochastic Kronecker model to a given graph. Regardless, each of the equations involves the count of a feature in the graph.
4.1 Counting features in a graph
Three of the features we use are easily obtainable from the degrees of the nodes. Let d_u be the degree of node u in graph G. Then

    E = (1/2) Σ_u d_u,    H = Σ_u d_u (d_u - 1)/2,    and    S = Σ_u d_u (d_u - 1)(d_u - 2)/6

give the number of edges, hairpins (or wedges), and tripins in terms of the degrees d_u.
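The degree-based counts, together with a triangle count, can be computed as in the sketch below. The O(n^3) triangle loop is for illustration only; large graphs need a proper triangle-counting routine.

```python
from itertools import combinations
from math import comb

def feature_counts(adj):
    """Count edges, hairpins (wedges), tripins, and triangles of a simple
    undirected graph given as a dict node -> set of neighbors."""
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    edges = sum(deg.values()) // 2
    hairpins = sum(comb(d, 2) for d in deg.values())
    tripins = sum(comb(d, 3) for d in deg.values())
    # Brute-force triangle count over all node triples (illustration only).
    triangles = sum(1 for u, v, w in combinations(adj, 3)
                    if v in adj[u] and w in adj[u] and w in adj[v])
    return edges, hairpins, tripins, triangles

# K4 has 6 edges, 12 wedges, 4 tripins, and 4 triangles.
K4 = {u: {v for v in range(4) if v != u} for u in range(4)}
print(feature_counts(K4))  # (6, 12, 4, 4)
```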
4.2 Objective functions
A pragmatic way to choose estimates (â, b̂, ĉ) is to solve

    min_{a,b,c} Σ_{F in {E,H,T,S}} ( F - mu_F(a,b,c) )^2 / mu_F(a,b,c),    (11)

in which the squared differences between observed and expected feature counts are scaled by an approximate variance. A sharper expression would account for correlations among the features used. That should increase statistical efficiency, but in large problems lack of fit to the Kronecker model is likely to be more important than inefficiency of estimates within it.
Many real world networks may not have good fits in terms of these three Kronecker parameters. This is the case for most of the forthcoming experiments. The following more general objective can be more robust to these deviances:

    min_{a,b,c} Σ_{F in {E,H,T,S}} d( F, mu_F(a,b,c) ) / ν(F).    (12)

Here d is either of the two distance functions

    d_2(x, y) = (x - y)^2    or    d_1(x, y) = |x - y|,

and ν is one of the normalizations

    ν = mu_F,    ν = mu_F^2,    ν = F,    or    ν = F^2.

Using d_2 and ν = mu_F makes it equal to the previous objective (11).
In principle, either of the two distance functions can be combined with any of the four normalizations. We do not think it is reasonable to expect a quadratic denominator to be a suitable match for the absolute error. Therefore our investigations exclude the combination of d_1 with either mu_F^2 or F^2. Note that d_2 with ν = F^2 gives a sum of squared relative errors.
Because there are only three parameters, the criterion (12) can simply be evaluated over a grid inside [0,1]^3. To be sure of taking a point within ε of the minimizer takes work proportional to ε^{-3}. An alternative is to employ a general nonlinear minimization procedure. The remainder of this section looks at a method to reduce that effort.
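A minimal version of the grid search might look like the following sketch, which for brevity scores only the leading terms of mu_E, mu_H, and mu_T under a sum of squared relative errors:

```python
import numpy as np

def lead_expected(a, b, c, r):
    """Leading terms of the expected edge, hairpin, and triangle counts."""
    E = (a + 2*b + c)**r / 2
    H = ((a + b)**2 + (b + c)**2)**r / 2
    T = (a**3 + 3*b**2*(a + c) + c**3)**r / 6
    return np.array([E, H, T])

def grid_fit(observed, r, m=40):
    """Grid search over (a, b, c) in [0,1]^3 with a >= c, minimizing the
    sum of squared relative errors against the observed feature counts."""
    obs = np.asarray(observed, dtype=float)
    grid = np.linspace(0.0, 1.0, m + 1)
    best, best_val = None, np.inf
    for a in grid:
        for b in grid:
            for c in grid[grid <= a]:   # identifiability: a >= c
                val = np.sum(((lead_expected(a, b, c, r) - obs) / obs)**2)
                if val < best_val:
                    best, best_val = (a, b, c), val
    return best, best_val

# Self-consistency check: feed in a parameter triple's own expected counts.
truth = (0.9, 0.6, 0.2)
obs = lead_expected(*truth, r=10)
est, val = grid_fit(obs, r=10, m=20)
print(est)  # close to (0.9, 0.6, 0.2), which lies on the m = 20 grid
```

A finer grid, or a nonlinear optimizer started at the grid minimizer, refines the answer; the full objective would use the complete formulas of Section 3.4 rather than their leading terms.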
4.3 Matching leading terms
In a synthetic graph, r is known. When fitting to a real world graph, a pragmatic choice is r = ceil(log2(n)), where n is the number of observed nodes. The interpretation is that the random graph may have had isolated nodes that were then dropped when forming G, but we suppose that fewer than half of the nodes in R have been dropped.
If we consider just the lead terms, then we could get estimates â, b̂, and ĉ by solving three of the equations:

    2E = (a+2b+c)^r,    2H = ((a+b)^2 + (b+c)^2)^r,
    6T = (a^3 + 3b^2(a+c) + c^3)^r,    and    6S = ((a+b)^3 + (b+c)^3)^r.    (13)
Writing x = a + b and y = b + c, the equations for E and H together can be solved to get

    x = [ (2E)^{1/r} + ( 2(2H)^{1/r} - (2E)^{2/r} )^{1/2} ] / 2    and
    y = [ (2E)^{1/r} - ( 2(2H)^{1/r} - (2E)^{2/r} )^{1/2} ] / 2,    (14)

where we have assumed that a >= c, so that x >= y. The transformed tripin count 6S = (x^3 + y^3)^r is determined by x and y, and so it is redundant given E and H, if we are just using lead terms. We must either count triangles, or use higher order terms.
Equation (14) may fail to have a meaningful solution. At a minimum we require the two solutions to be real and nonnegative. These translate into

    0 <= 2(2H)^{1/r} - (2E)^{2/r} <= (2E)^{2/r},

which is equivalent to

    (2H)^{1/r} <= (2E)^{2/r} <= 2 (2H)^{1/r},    (15)

that in terms of node degrees is

    Σ_u d_u (d_u - 1) <= ( Σ_u d_u )^2 <= n Σ_u d_u (d_u - 1).
The left hand inequality in (15) holds for any graph, but the right hand one need not. It holds when the variance of the node degrees is at least as large as their mean. If the variance of the node degrees is smaller than their mean, then equation (14) does not have real valued solutions. The degree distribution of a stochastic Kronecker graph has heavy tails. Therefore in applications where that model is suitable, equation (14) will give a reasonable solution.
When the d_u have a variance larger than their mean, then we can do a univariate grid search for b̂, using equation (14) to get the sums â + b̂ and b̂ + ĉ, and hence â and ĉ. The choice of b̂ can then be made as the minimizer of the criterion (12).
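The leading-term solve of equation (14) is a few lines of code. The sketch below recovers x = a + b and y = b + c from lead-term edge and hairpin counts, returning None when the discriminant is negative (degree variance below the mean):

```python
def lead_term_xy(E, H, r):
    """Solve the leading-term equations 2E = (x + y)**r and
    2H = (x**2 + y**2)**r for x = a + b and y = b + c, with x >= y.
    Returns None when no real solution exists."""
    s = (2*E)**(1/r)        # x + y
    q = (2*H)**(1/r)        # x**2 + y**2
    disc = 2*q - s*s        # equals (x - y)**2
    if disc < 0:
        return None
    root = disc**0.5
    return (s + root)/2, (s - root)/2

# Round trip: start from x, y, form the lead-term counts, and recover them.
x, y = 1.5, 0.8             # hypothetical values of a + b and b + c
r = 10
E = (x + y)**r / 2
H = (x**2 + y**2)**r / 2
sol = lead_term_xy(E, H, r)
print(sol)  # approximately (1.5, 0.8)
```

Note that x and y may exceed 1 even though a, b, and c each lie in [0,1].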
5 Fitting real world graphs

In this section, we experiment with different techniques for fitting the parameters of the Kronecker model. These experiments involve 8 real world networks whose statistical properties are listed in the rows of the forthcoming tables labeled “Source.”
The networks ca-GrQc, ca-HepTh, ca-HepPh are co-authorship networks from arXiv. The nodes of the network represent authors, and there is an edge between two nodes when the authors jointly wrote a paper. Likewise, the hollywood-2009 network is a collaboration graph between actors and actresses in IMDB [1, 2]. Nodes are actresses or actors, and edges are collaborations on a movie, as evidenced by jointly appearing on the cast. These networks are naturally undirected and all edges are unweighted.
Both as20000102 and as-Skitter are technological infrastructure networks. Each node represents a router on the internet and edges represent a physical or virtual connection between the routers. Again, these networks are undirected and unweighted.
The wikipedia-20051105 graph is a symmetrized link graph of the articles on Wikipedia generated from a data download on November 5th, 2005. The underlying network is directed, but in these experiments, we have converted it into an undirected network by dropping the direction of the edges.
All of the previously described networks have distinctly skewed degree distributions. That is, there are a few authors, actors, routers, or articles with a large number of links, despite the overall network having a small average degree. The final network we study is usroads, a highway-level network from the National Highway Planning Network (http://www.fhwa.dot.gov/planning/nhpn/), which does not have a highly skewed distribution. We include it as an example of a nearly planar network. It is also naturally undirected.
In two of the experiments, we generate synthetic Kronecker networks. The algorithm to realize these networks is an explicit coin-flipping procedure instead of the more common ball-dropping method. For each cell in the upper triangular portion of the matrix, we first determine the probability of a non-zero value in that cell (via its logarithm, a sum over r terms), then generate a random coin flip with that probability of heads, and record an edge when the coin comes up heads. This procedure is scalable because the full matrix of probabilities is never formed. It is also easily parallelizable. Our implementation uses pthreads to exploit multi-core parallelism. It takes somewhat more work than the ball-dropping procedure, scaling as n^2 instead of proportionally to the number of balls dropped, which is typically a small multiple of the number of vertices. Each ball generates about one edge. Coin-flipping preserves the exact Kronecker distribution whereas ball-dropping is an approximation.
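A serial sketch of this coin-flipping procedure follows (without the pthreads parallelism or log-space probability accumulation; the function name and interface are illustrative):

```python
import random

def kronecker_coinflip(a, b, c, r, seed=0):
    """Sample a stochastic Kronecker graph by flipping one coin per cell of
    the upper triangle of P = [[a,b],[b,c]]^{(x)r}; P is never formed."""
    theta = ((a, b), (b, c))
    rng = random.Random(seed)
    n = 2 ** r
    edges = []
    for u in range(n):
        for v in range(u + 1, n):      # strict upper triangle: no loops
            p = 1.0
            for t in range(r):         # P_uv as a product over bit pairs
                p *= theta[(u >> t) & 1][(v >> t) & 1]
            if rng.random() < p:
                edges.append((u, v))
    return edges

# With a = b = c = 1 every coin lands heads: the complete graph on 2**3 nodes.
print(len(kronecker_coinflip(1.0, 1.0, 1.0, 3)))  # 28
```

The double loop makes the n^2 scaling explicit; each (u, v) cell costs an additional factor of r for the bitwise product.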
The experiments with these networks investigate (i) the difference in results from the various choices of distance d and normalization ν in the objective (12); (ii) the fitted parameters to the 8 real world networks; and (iii) the difference in fitted parameters when only using three of the four graph features.
5.1 Objective functions
The first study regards the choice of objective function. Of the eight possible combinations of distance and normalization, we considered two to be unreasonable a priori. Here we investigate the other six pairs.
Table 2 shows the different parameters (â, b̂, ĉ) chosen by each objective function, as well as the expected feature counts for those parameters, for three graphs: a single realization of a Kronecker graph with known parameters, the collaboration network ca-GrQc, and the infrastructure network as20000102. The rows labeled “Source” contain the actual feature counts in each network. The optimization algorithm to pick (â, b̂, ĉ) uses the best objective value from three procedures. First, it tries 50 random starting points for the fmincon function in Matlab R2010b, an active set algorithm. Then, it performs a grid search procedure with 100 equally spaced points in each dimension. Finally, it tries the leading term matching algorithm from Section 4.3, and considers those parameters.
The results in the table show that the choice of objective function does not make a difference when the graph fits the Kronecker model. However, it can make a large difference when the graph does not exactly fit, as in the ca-GrQc and as20000102 networks. Two of the objectives produced distinctly different fits for these two networks, compared to the other objectives. These two fits seem to be primarily matching the number of triangles, almost to the exclusion of the other features. The other odd fit for the ca-GrQc graph comes from a third objective. This fit appears to be matching the tripin count and ignoring other features, something that also seems to be true for the as20000102 graph. Among the remaining fits for ca-GrQc, there is little difference among the fitted parameters and estimated features. The results are a bit different for as20000102. Two of the fits are almost identical and show a good match to the tripin count, but a poor match to the remaining features. Two others are similar, and deciding which is better seems like a matter of preference. These observations held up under further experimentation, which we omit here in the interest of space.
Based on these results, a relative error criterion, either squared or absolute, appears to be a robust choice when the model does not fit exactly. Due to the continuity of the squared error, the rest of our fits in this manuscript use the sum of squared relative errors, that is, d_2 with normalization F^2.
Table 2. Columns: Graph | Kron. Parameters | Graph / Expected Features | Obj.
5.2 Parameters for real-world networks
For the 8 networks previously described, we use the objective function (12) with d_2 and ν = F^2 to fit the parameters (â, b̂, ĉ). The results, along with the expected feature counts for the fitted parameters, are presented in Table 3. We show the minimizer for the three different strategies to optimize the objective described in the previous section: a direct minimization procedure, the grid search procedure, and the leading term matching approach (Section 4.3). For each approach, the table also shows the time required for that algorithm and the value of the objective function at the minimizer.
Leskovec et al. provide fitted parameters from their KronFit algorithm for the networks ca-GrQc, ca-HepTh, ca-HepPh, and as20000102. We include them in Table 3 for comparison. In all cases but one, the expected feature count using KronFit is farther from the observed feature count than the expectation under our moment based fits. Sometimes it is much farther. There was one exception: for the graph as20000102, KronFit gave a better estimate of the number of edges than our moment method gave.
KronFit typically underestimates the feature counts. The effect is severe for triangles. Kronecker random graphs commonly have many fewer triangles than the real world graphs to which they are fit. Our moment based estimators find parameters leading to many more triangles than the KronFit parameters do.
In fairness, we point out that our method is designed to match expected to observed feature counts, while KronFit fits by maximum likelihood. Therefore the evaluation criterion is closer to the fitting criterion for us. But maximum likelihood ordinarily beats or matches the method of moments in large samples from parametric models; its mismatched criterion is more than compensated for by superior statistical efficiency. The explanation here may involve maximum likelihood being less robust to lack of fit of the Kronecker model, or it may be that KronFit is not finding the MLE.
The results in Table 3 show small differences in the fits between the direct and grid algorithms, although the direct algorithm is much faster. The leading term matching algorithm, when it succeeds, generates similar Kronecker parameters, although with a distinctly worse objective value. The results from the KronFit algorithm differ and likely match the graph in another aspect.
Lead term matching is tens of times faster than direct search, and faster still compared to grid search. But even the grid search takes under a minute in our examples, so the speed savings from the lead term approach is of little benefit here. For the large graphs, the time to compute the network features dominates the time to fit the parameters, showing that this approach scales to large networks.
Overall, the results indicate that the Kronecker models tend not to be a good fit to the data. The model appears to have a considerable difference in at least one of the graph features. Usually, it is the number of triangles, which differs by up to two orders of magnitude for many of the collaboration networks.
Table 3. Columns: Graph | Kron. Parameters | Graph / Expected Features | Time.