On Large-Scale Graph Generation with Validation of Diverse Triangle Statistics at Edges and Vertices

03/24/2018
by Geoffrey Sanders, et al.

Researchers developing implementations of distributed graph analytic algorithms require graph generators that yield graphs sharing the challenging characteristics of real-world graphs (small-world, scale-free, heavy-tailed degree distribution) and that come with efficiently calculable ground-truth solutions for the desired output. Reproducibility for current generators used in benchmarking is somewhat lacking in this respect due to their randomness: the output of a desired graph analytic can only be compared to expected values, not to exact ground truth. Nonstochastic Kronecker product graphs meet these design criteria for several graph analytics. Here we show that many flavors of triangle participation can be cheaply calculated while generating a Kronecker product graph. Given two medium-sized scale-free graphs with adjacency matrices A and B, their Kronecker product graph has adjacency matrix C = A ⊗ B. Such graphs are highly compressible: |E| edges are represented in O(|E|^{1/2}) memory and can be built in a distributed setting from small data structures, making them easy to share in compressed form. Many interesting graph calculations have worst-case complexity bounds of O(|E|^p), and these are often reduced to O(|E|^{p/2}) for Kronecker product graphs when a Kronecker formula can be derived that yields the sought calculation on C in terms of related calculations on A and B. We focus on deriving formulas for triangle participation at vertices, t_C, a vector storing the number of triangles that each vertex is involved in, and triangle participation at edges, Δ_C, a sparse matrix storing the number of triangles at each edge.
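As a small illustration of what such a Kronecker formula looks like, the sketch below checks two identities on a toy example. It assumes both factors are simple undirected graphs with no self-loops, so that t = diag(A^3)/2 and Δ = A ∘ A^2 (elementwise product). Under that assumption, standard Kronecker identities give t_C = 2 (t_A ⊗ t_B) and Δ_C = Δ_A ⊗ Δ_B. The factor graphs, helper names, and dense list-of-lists representation are illustrative choices, not taken from the paper, and the full text may treat more general cases (for example, factors with self-loops).

```python
def kron(A, B):
    """Kronecker product of two dense square matrices (lists of lists)."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(X, Y):
    """Dense square matrix product."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def vertex_triangles(A):
    """t[i] = diag(A^3)[i] / 2; valid for simple graphs without self-loops."""
    A3 = matmul(matmul(A, A), A)
    return [A3[i][i] // 2 for i in range(len(A))]

def edge_triangles(A):
    """Delta = A * A^2 elementwise: number of triangles through each edge."""
    A2 = matmul(A, A)
    n = len(A)
    return [[A[i][j] * A2[i][j] for j in range(n)] for i in range(n)]

# Hypothetical small factors: a triangle with one pendant vertex, and K4.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
B = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]

t_A, t_B = vertex_triangles(A), vertex_triangles(B)

# Ground truth from the small factors only (no work on C itself).
t_C_formula = [2 * a * b for a in t_A for b in t_B]
delta_C_formula = kron(edge_triangles(A), edge_triangles(B))

# Direct computation on the full product graph, for validation.
C = kron(A, B)
t_C_direct = vertex_triangles(C)
delta_C_direct = edge_triangles(C)

print(t_C_formula == t_C_direct)          # vertex counts agree
print(delta_C_formula == delta_C_direct)  # edge counts agree
```

The formula side touches only the 4-vertex factors, while the direct side cubes the 16-vertex product matrix; this gap is what makes the factor-based ground truth cheap at scale.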
