I Introduction
Complex networks are prevalent in every aspect of our lives: from technological networks to biological systems like the human brain. These networks are composed of billions of entities that give rise to emergent properties and structures. Analyzing these structures aids us in gaining new insights about our surroundings. In order to find these patterns, massive amounts of data have to be acquired and processed. Designing and evaluating algorithms to handle these datasets is a crucial task on the road to understanding the underlying systems. However, real-world datasets are often scarce, or they do not scale to the sizes required by algorithm developers and researchers.
To alleviate this issue, network generators provide synthetic instances based on random network models. These models are able to accurately describe a wide variety of different real-world scenarios: from ad-hoc wireless networks to protein-protein interactions [1, 2]. A substantial amount of work has been contributed to understanding the properties and behavior of these models. In theory, network generators allow us to build instances of arbitrary size with controllable parameters. This makes them an indispensable tool for the systematic evaluation of algorithms on a massive scale. For example, one of the largest benchmarks for supercomputers, the Graph 500 [3], uses the R-MAT graph generator [4] to build instances of up to $2^{42}$ vertices and $2^{46}$ edges.
Even though generators like R-MAT are able to provide promising results, the generated instances are limited to a specific family of graphs [4]. Many other important network models still fall short when it comes to offering a scalable network generator, which in turn prevents them from being a viable replacement for R-MAT. These shortcomings can often be attributed to the apparently sequential nature of the underlying model or to prohibitive hardware requirements.
Our Contribution
In this work we introduce a set of novel network generators that focus on scalability. We achieve this by using a communication-free paradigm [5], i.e. our generators require no communication between processing entities (PEs).
Each PE is assigned a disjoint set of local vertices. It then is responsible for generating all incident edges for this set of vertices. This is a common setting in distributed computation [6].
The models that we target are the classic Erdős-Rényi models $G(n,m)$ and $G(n,p)$ [7, 8] and different spatial network models including random geometric graphs (RGGs) [9], random hyperbolic graphs (RHGs) [10] and random Delaunay graphs (RDGs). For each generator, we provide bounds for their parallel (and sequential) running times. A key component of our algorithms is the clever use of pseudorandomization and divide-and-conquer strategies. These components enable us to perform efficient recomputations in a distributed setting without the need for communication.
To highlight the practical impact of our generators, we also present an extensive experimental evaluation. First, we show that our generators rival the current state-of-the-art in terms of sequential and/or parallel running time. Second, we are able to show that our generators have near optimal scaling behavior in terms of weak scaling (and strong scaling). Finally, our experiments show that we are able to produce instances of up to $2^{43}$ vertices and $2^{47}$ edges in less than 22 minutes. These instances are in the same order of magnitude as those generated by R-MAT for the Graph 500 benchmark [3]. Hence, our generators enable the underlying network models to be used in massively distributed settings.
II Preliminaries
We define a graph (network) as a pair $G = (V, E)$. The set $V = \{0, \ldots, n-1\}$ denotes the vertices of $G$. For a directed graph, $E \subseteq V \times V$ is the set of edges; for an undirected graph, $E \subseteq \{\{u, v\} \mid u, v \in V\}$. Two vertices that are endpoints of an edge are called adjacent. Edges in directed graphs are ordered tuples $(u, v)$. An edge $(v, v)$ is called a self-loop. If not mentioned otherwise, we only consider graphs that contain no self-loops.
The set of neighbors for any vertex $v$ is defined as $N(v) = \{u \mid \{u, v\} \in E\}$. For an undirected graph, we define the degree of a vertex $v$ as $d(v) = |N(v)|$. In the directed case, we have to distinguish between the indegree and outdegree of a vertex.
We denote that a random variable $X$ follows a probability distribution $\mathcal{D}$ with parameters $p_1, \ldots, p_k$ as $X \sim \mathcal{D}(p_1, \ldots, p_k)$. The probability mass function of a random variable $X$ is denoted as $\Pr[X = x]$.
II-A Network Models
II-A1 Erdős-Rényi Graphs
The Erdős-Rényi (ER) model was the first model for generating random graphs and supports both directed and undirected graphs. For both cases, we are interested in graph instances without parallel edges. We now briefly introduce the two closely related variants of the model.
The first version, proposed by Gilbert [8], is denoted as the $G(n,p)$ model. In this model we start from an empty graph with $n$ vertices and add each of the possible edges independently with probability $0 < p < 1$.
The second version, proposed by Erdős and Rényi [7], is denoted as the $G(n,m)$ model. In the $G(n,m)$ model, we choose a graph uniformly at random from the set of all possible graphs which have $n$ vertices and $m$ edges.
For the sake of brevity, we only discuss the generation of graphs in the $G(n,m)$ model. However, all of our algorithms can easily be transferred to the $G(n,p)$ model.
II-A2 Random Geometric Graphs
Random geometric graphs (RGGs) are undirected spatial networks where we place $n$ vertices uniformly at random in a $d$-dimensional unit cube $[0,1)^d$. Two vertices are connected by an edge iff their $d$-dimensional Euclidean distance is within a given threshold radius $r$. Thus, the RGG model can be fully described using the two parameters $n$ and $r$. Note that the expected degree of any vertex that does not lie on the border, i.e. whose neighborhood sphere is completely contained within the unit cube, is $n V_d r^d$, where $V_d$ denotes the volume of the $d$-dimensional unit ball [12]. In our work we focus on two and three dimensional random geometric graphs, as these are very common in real-world scenarios [13].
II-A3 Random Hyperbolic Graphs
Random hyperbolic graphs (RHGs) are undirected spatial networks generated in the hyperbolic plane with negative curvature. Analogous to RGGs, RHGs are parameterized by the number of vertices $n$ and a hyperbolic radius $R = 2\ln n + C$, where the constant $C$ controls the average degree of the graph [10]. Additionally, this model is given a power-law exponent $\gamma$. To generate a RHG graph, vertices are placed on a disk of radius $R$ in the hyperbolic plane.
Each vertex has an angular coordinate $\theta$ and a radial coordinate $r$. The angular coordinate is sampled uniformly at random from the interval $[0, 2\pi)$. The radial coordinate is chosen using the probability density function
$$f(r) = \alpha \frac{\sinh(\alpha r)}{\cosh(\alpha R) - 1}.$$
The parameter $\alpha$ controls the growth of the random graph and determines the vertex density. Krioukov et al. [10, 14] show that for $\alpha \geq 1/2$ the degree distribution follows a power-law distribution with exponent $\gamma = 2\alpha + 1$.
Two vertices $u, v$ are connected iff their hyperbolic distance
$$d(u, v) = \cosh^{-1}\big(\cosh r_u \cosh r_v - \sinh r_u \sinh r_v \cos(\theta_u - \theta_v)\big)$$
is less than $R$. Therefore, the neighborhood of a vertex consists of all the vertices that are within the hyperbolic circle of radius $R$ around it.
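For illustration, vertex sampling and the adjacency test can be sketched as follows. This is our own sketch, not the paper's implementation; the struct and function names and the use of `std::mt19937` are our choices. The radial coordinate is drawn by inverting the CDF of the density $f(r) = \alpha\sinh(\alpha r)/(\cosh(\alpha R) - 1)$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Illustration only: sample an RHG vertex and test adjacency
// via the hyperbolic distance.
struct HypPoint { double theta; double r; };

// Draw (theta, r): theta uniform in [0, 2*pi), r by inverting the CDF of
// f(r) = alpha * sinh(alpha * r) / (cosh(alpha * R) - 1) on [0, R].
HypPoint sample_vertex(double alpha, double R, std::mt19937& gen) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double two_pi = 2.0 * std::acos(-1.0);
    double theta = two_pi * unif(gen);
    double r = std::acosh(1.0 + unif(gen) * (std::cosh(alpha * R) - 1.0)) / alpha;
    return {theta, r};
}

// cosh d(u,v) = cosh r_u cosh r_v - sinh r_u sinh r_v cos(theta_u - theta_v)
double hyp_dist(const HypPoint& u, const HypPoint& v) {
    double x = std::cosh(u.r) * std::cosh(v.r)
             - std::sinh(u.r) * std::sinh(v.r) * std::cos(u.theta - v.theta);
    return std::acosh(std::max(1.0, x)); // clamp against rounding below 1
}

bool adjacent(const HypPoint& u, const HypPoint& v, double R) {
    return hyp_dist(u, v) < R;
}
```

Note that the distance of a vertex to the disk center is simply its radial coordinate, which gives a quick sanity check for the sampler.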
II-A4 Random Delaunay Graphs
A $d$-simplex is a generalization of a triangle ($d = 2$) to $d$-dimensional space. A $d$-simplex is a $d$-dimensional polytope, i.e. the convex hull of $d+1$ points. The convex hull of a subset of size $m+1$ of these points is called an $m$-face of the simplex. Specifically, the $0$-faces are the vertices of the simplex and the $(d-1)$-faces are its facets. Given a $d$-dimensional point set $S$, a triangulation $T(S)$ is a subdivision of the convex hull of $S$ into $d$-simplices, such that the set of the vertices of $T(S)$ coincides with $S$ and any two simplices of $T(S)$ intersect in a common face or not at all. The union of all simplices in $T(S)$ is the convex hull of the point set $S$. A Delaunay triangulation $DT(S)$ is a triangulation of $S$ such that no point of $S$ is inside the circumhypersphere of any simplex in $DT(S)$. If the points of $S$ are in general position, i.e. no $d+2$ points lie on a common hypersphere, $DT(S)$ is unique.
In our work we are concerned with two and three dimensional random Delaunay graphs, with points sampled uniformly at random in the $d$-dimensional unit cube $[0,1)^d$.
II-B Sampling Algorithms
Most of our generators require sampling $n$ elements (with/without replacement) from a (finite) universe $N$. Sequentially, both of these problems can be solved in expected time $O(n)$ [15]. These bounds still hold true even if a sorted sample has to be generated [15, 16]. However, most of these algorithms are hard to apply in a distributed setting since they are inherently sequential. Recently, Sanders et al. [17] proposed a set of simple divide-and-conquer algorithms that allow sampling $n$ elements on $P$ PEs in time $O(n/P + \log P)$ with high probability (w.h.p.), i.e. with probability at least $1 - P^{-c}$ for an arbitrary constant $c$. Their algorithm follows the observation that by splitting the current universe into two equal sized subsets, the number of samples in each subset follows a hypergeometric distribution. Based on this observation, they develop a divide-and-conquer algorithm to determine the number of samples for each PE. Hypergeometric random deviates are synchronized without the need for communication by making use of pseudorandomization via hash functions. Therefore, PEs that follow the same recursion subtrees generate the same random deviates, while deviates in different subtrees are independent of each other.
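The sample counting can be sketched as follows. This is a simplified illustration: the splitmix64-style hash and the naive urn-based hypergeometric sampler are stand-ins for the actual hash function and for constant-time deviate generation; all names are our own.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Stand-in hash (splitmix64 finalizer) to derive a seed per recursion subtree.
static std::uint64_t subtree_seed(std::uint64_t node_id) {
    std::uint64_t z = node_id + 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Naive hypergeometric deviate: number of "left" elements among n draws
// without replacement from N elements, N_left of which belong to the left half.
static std::uint64_t hypergeometric(std::uint64_t n, std::uint64_t N_left,
                                    std::uint64_t N, std::mt19937_64& gen) {
    std::uint64_t hits = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        std::uniform_int_distribution<std::uint64_t> pick(1, N);
        if (pick(gen) <= N_left) { ++hits; --N_left; }
        --N;
    }
    return hits;
}

// Number of samples landing in chunk `target` out of `chunks` equal parts,
// when sampling n elements from a universe of size N (N divisible by chunks).
// Every PE evaluating the same subtree derives the same seed and therefore
// the same deviate, so no communication is needed.
std::uint64_t samples_in_chunk(std::uint64_t n, std::uint64_t N,
                               std::uint64_t first, std::uint64_t chunks,
                               std::uint64_t target, std::uint64_t node_id) {
    if (chunks == 1) return n;
    std::uint64_t left_chunks = chunks / 2;
    std::uint64_t N_left = N / chunks * left_chunks;
    std::mt19937_64 gen(subtree_seed(node_id));
    std::uint64_t n_left = hypergeometric(n, N_left, N, gen);
    if (target < first + left_chunks)
        return samples_in_chunk(n_left, N_left, first, left_chunks,
                                target, 2 * node_id);
    return samples_in_chunk(n - n_left, N - N_left, first + left_chunks,
                            chunks - left_chunks, target, 2 * node_id + 1);
}
```

Because the deviates depend only on the subtree seed, recomputing a chunk count on another PE yields the identical value, and the per-chunk counts always sum to the global sample size.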
To be more specific, for each subtree of the recursion a unique seed value is computed (independent of the rank of the PE). Afterwards, a hash value for this seed is computed and used to initialize a pseudorandom number generator (PRNG). In turn, this PRNG is used to compute the random deviates.
We also require the sampling of random numbers that follow a particular probability distribution, e.g. binomial or hypergeometric distributions. For this purpose, we use the acceptance-rejection method [18, 19]. Thus, we are able to generate binomial and hypergeometric random deviates in expected constant time [20, 21].
III Related Work
We now cover important related work for each of the network models used in our work. Additionally, we highlight recent advances for other network models that are relevant for the design of our algorithms.
III-A ER Model
Batagelj and Brandes [22] present optimal sequential algorithms for the $G(n,p)$ as well as the $G(n,m)$ model. Their $G(n,p)$ generator makes use of an adaptation of a linear time sampling algorithm (Algorithm D) by Vitter [15]. In particular, the algorithm samples skip distances between edges of the resulting graph. Thus, they are able to generate a $G(n,p)$ graph in optimal time $O(n + m)$. Their $G(n,m)$ generator is based on a virtual Fisher-Yates shuffle [23] and also has an optimal running time of $O(n + m)$.
Nobari et al. [24] proposed a data parallel generator for both the directed and undirected $G(n,p)$ model. Their generators are designed for graphics processing units (GPUs). Like the generators of Batagelj and Brandes [22], their algorithm is based on sampling skip distances but uses precomputations and prefix sums to adapt it for a data parallel setting.
III-B RGG Model
Generating random geometric graphs with $n$ vertices and radius $r$ can be done naïvely in time $O(n^2)$. This bound can be improved if the vertices are known to be generated uniformly at random [25]. To this end, a partitioning of the unit square into squares with side length $r$ is created. To find the neighbors of each vertex, we consider each cell and its neighbors. The resulting generator has an expected running time of $O(n + m)$.
Holtgrewe et al. [25, 26] proposed a distributed memory parallelization of this algorithm for the two dimensional case. Using sorting and vertex exchanges between PEs, they distribute vertices such that edges can be generated locally. The expected time for the local computation of their generator is $O(\frac{n}{P}\log\frac{n}{P} + \frac{m}{P})$, due to sorting. The expected time needed for communication is dominated by the exchange of vertices near chunk borders between neighboring PEs.
We are not aware of efficient distributed implementations of RGG generators for dimensions greater than two.
III-C RHG Model
Von Looz et al. [27, 28] propose two different algorithms for generating random hyperbolic graphs. Their first algorithm relates the hyperbolic space to Euclidean geometry using the Poincaré disk model to perform neighborhood queries on a polar quadtree. The resulting generator has a running time of $O((n^{3/2} + m)\log n)$.
In their second approach, von Looz et al. [28] propose a generator with an observed running time of $O(n\log n + m)$. Their algorithm uses a partitioning of the hyperbolic plane into concentric ring-shaped slabs where vertices are stored in sorted order. Neighborhood queries are computed using angular boundaries for each slab to bound the number of vertex comparisons.
Finally, independent of this work, Penschuck [29] proposed a memory efficient streaming generator that can be adapted to a distributed setting. Similar to von Looz et al. [28], they partition the hyperbolic plane into concentric slabs. Vertices in a slab are maintained in sorted order and edges are generated in a request centric approach. Their algorithm has a memory footprint that is sublinear in $n$ and a running time of $O(n\log\log n + m)$ with high probability. Additionally, they present a shared memory parallelization of their algorithm that can be adapted to a distributed setting with a constant communication overhead.
III-D RDG Model
As the Delaunay triangulation (DT) of a point set is uniquely defined, generating random Delaunay graphs can be separated into generating a random point set and computing its DT. A plethora of algorithms for computing the DT of a given point set in two and three dimensions exist. Funke and Sanders [30] review recent work on parallel DT algorithms and propose a competitive DT algorithm suitable for large clusters. The generation of a random point set is identical to the one in the RGG model.
III-E Miscellaneous
III-E1 Barabási and Albert Model
Recently, Sanders and Schulz [5] proposed an algorithm for generating massive scale-free networks using the preferential attachment model by Barabási and Albert [31]. Their work is based on the optimal sequential algorithm by Batagelj and Brandes [22]. To adapt the algorithm to a distributed setting, Sanders and Schulz use pseudorandomness to reproduce random behavior when generating edges. This allows them to compute edges independently of one another and makes their algorithm embarrassingly parallel.
III-E2 Recursive Matrix Model
The recursive matrix model (R-MAT) by Chakrabarti et al. [4] is a special case of the stochastic Kronecker graph model [32]. This model is well known for its usage in the popular Graph 500 benchmark [3]. Generating a graph with $n$ vertices and $m$ edges is done by sampling each of the $m$ edges independently. For this purpose, the adjacency matrix is recursively subdivided into four partitions. Each partition is assigned an edge probability ($a$, $b$, $c$ and $d$, with $a + b + c + d = 1$). Recursion continues until a $1 \times 1$ partition is encountered, in which case the corresponding edge is added to the graph. The time complexity of the R-MAT generator therefore is $O(m \log n)$, since the recursion of depth $\log n$ has to be repeated for each of the $m$ edges.
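For illustration, R-MAT edge sampling can be sketched as follows. This is a minimal sketch with the usual quadrant probabilities $a$, $b$, $c$ and $d = 1 - a - b - c$; the function name and parameterization are our own choices, not the benchmark's reference code.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <utility>

// Sketch of R-MAT edge sampling [4]: recursively pick one of four quadrants
// of the 2^log_n x 2^log_n adjacency matrix with probabilities (a, b, c, d)
// until a 1x1 partition remains. Each quadrant choice fixes one bit of the
// row and column index.
std::pair<std::uint64_t, std::uint64_t>
rmat_edge(std::uint64_t log_n, double a, double b, double c,
          std::mt19937& gen) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::uint64_t row = 0, col = 0;
    for (std::uint64_t bit = log_n; bit-- > 0; ) {
        double u = unif(gen);
        if (u < a) {                    // top-left quadrant
        } else if (u < a + b) {         // top-right quadrant
            col |= (std::uint64_t{1} << bit);
        } else if (u < a + b + c) {     // bottom-left quadrant
            row |= (std::uint64_t{1} << bit);
        } else {                        // bottom-right, probability d
            row |= (std::uint64_t{1} << bit);
            col |= (std::uint64_t{1} << bit);
        }
    }
    return {row, col};
}
```

Each call performs exactly $\log n$ quadrant decisions, which directly reflects the $O(m \log n)$ bound when repeated for all $m$ edges.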
IV ER Generator
We now introduce our distributed graph generators, starting with the Erdős-Rényi generators for both the directed and undirected case.
IV-A Directed Graphs
Generating a (directed) graph in the $G(n,m)$ model is the same as sampling a graph from the set of all possible graphs with $n$ vertices and $m$ edges. To do so, we can sample $m$ edges uniformly at random from the set of all $n(n-1)$ possible edges. Since we are not interested in graphs with parallel edges, sampling has to be done without replacement. We do so by using an adaptation of the distributed sampling algorithm by Sanders et al. [17].
Our generator starts by dividing the set of possible edges into $P$ chunks, one for each PE. Each chunk represents a set of rows of the adjacency matrix of our graph. We then assign each chunk to its corresponding PE using its id $i \in [0, P)$. Afterwards, PE $i$ is responsible for generating the sample (set of edges) for its chunk. Note that we can easily adapt this algorithm to an arbitrary number of consecutive chunks per PE.
To compute the correct sample size (number of edges) for each chunk, we use the same divide-and-conquer technique used by the distributed sampling algorithm [17] (see Section II-B). The resulting samples are then converted to directed edges using simple offset computations.
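One way to realize such an offset computation is sketched below. This is our own illustration of a self-loop-free bijection between sample indices and directed edges; the exact mapping used in the implementation may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Sketch: map a sample index s in [0, n(n-1)) to a directed edge (u, v)
// with u != v. Each row of the adjacency matrix holds n-1 valid targets;
// the diagonal (self-loop) entry is skipped by shifting the column index.
std::pair<std::uint64_t, std::uint64_t>
index_to_edge(std::uint64_t s, std::uint64_t n) {
    std::uint64_t u = s / (n - 1);          // source row
    std::uint64_t c = s % (n - 1);          // position within the row
    std::uint64_t v = c + (c >= u ? 1 : 0); // skip the diagonal entry
    return {u, v};
}
```

Since the mapping is a bijection, sampling indices without replacement yields a directed graph without parallel edges or self-loops.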
Lemma IV.1.
If the maximum universe size per PE is $O(N/P)$, then the distributed sampling algorithm runs in time $O(n/P + \log P)$ with high probability.
Proof.
See Sanders et al. [17]. ∎
Lemma IV.2.
The directed $G(n,m)$ generator runs in expected time $O(\frac{m}{P} + \log P)$.
Proof.
Our algorithm is an adaptation of the distributed sampling algorithm that evenly divides the set of vertices, and therefore the set of potential edges, between PEs. Thus, the universe per PE has size $O(\frac{n(n-1)}{P})$ and the running time directly follows from Lemma IV.1. ∎
IV-B Undirected Graphs
In the undirected case, we have to ensure that an edge $\{i, j\}$ is sampled by both PEs, the one that is assigned vertex $i$ and the one that is assigned vertex $j$. Since each PE is assigned a different chunk, they follow different paths in the recursion tree. Hence, due to the independence of the random variables generated in each recursion tree, it is highly unlikely that they both sample the edge independently. To solve this issue, we introduce a different partitioning of the graph's adjacency matrix into chunks.
Our generator begins by dividing each dimension of the adjacency matrix into $P$ sections of size roughly $n/P$. A chunk is then defined as a set of edges that correspond to a submatrix of the adjacency matrix. Due to the symmetry of the adjacency matrix, we are able to restrict the sampling to the lower triangular adjacency matrix. Thus, we have a total of $\frac{P(P+1)}{2}$ chunks that can be arranged into a triangular chunk matrix. Afterwards, each PE is assigned a row and column of this matrix based on its id $i$ as seen in Fig. 1. By generating rectangular chunks instead of whole rows or columns, we can make sure that both PE $i$ and PE $j$ redundantly generate chunk $(i, j)$ using the same set of random values. In turn, they both sample the same set of edges independently from each other.
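The chunk-to-PE assignment can be sketched as follows. This is our own illustration of one plausible indexing of the triangular chunk matrix; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sketch: with the adjacency matrix split into P x P sections and only the
// lower triangle kept, PE i is responsible for row i and column i of the
// chunk matrix. Chunk (i, j) with i > j is therefore generated redundantly
// by exactly the two PEs i and j; the diagonal chunk (i, i) only by PE i.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
chunks_of_pe(std::uint32_t i, std::uint32_t P) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> chunks;
    for (std::uint32_t j = 0; j <= i; ++j)
        chunks.push_back({i, j});       // row i of the lower triangle
    for (std::uint32_t j = i + 1; j < P; ++j)
        chunks.push_back({j, i});       // column i below the diagonal
    return chunks;
}
```

Each PE ends up with exactly $P$ chunks, and every off-diagonal chunk appears in exactly two PEs' lists, which is what makes redundant, communication-free generation of undirected edges possible.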
We now explain how to adapt the divide-and-conquer algorithm by Sanders et al. [17] to our chunk matrix. To generate the required partitioning of the adjacency matrix, we start by dividing the chunk matrix into equal sized quadrants. This is done by splitting the rows (and columns) into two equal sized sections $[0, l)$ and $[l, P)$. We choose $l = \lceil P/2 \rceil$ as our splitting value.
We then compute the number of edges within each of the resulting quadrants. Since we are only concerned with the lower triangular adjacency matrix, there are two different types of quadrants: triangular and rectangular. The second and fourth quadrants are triangular matrices with $l$ and $P - l$ rows (and columns), respectively. We then generate a set of three hypergeometric random deviates to determine the number of samples (edges) in each quadrant. As for the distributed sampling algorithm [17], we make use of pseudorandomization via hash functions that are seeded based on the current recursion subtree to synchronize deviates between PEs.
Each PE then uses its id to decide which quadrants to handle recursively. Note that at each level of the recursion, a PE only has to handle two of the four quadrants. We use a sequential sampling algorithm once a single chunk is remaining. Offset computations are performed to map samples to edges based on the type of the chunk (rectangular or triangular). The resulting recursion tree has at most $\lceil \log P \rceil$ levels and size $O(P)$.
Lemma IV.3.
The undirected $G(n,m)$ generator runs in expected time $O(\frac{m}{P} + P)$.
Proof.
See the full version of this paper.
∎
V RGG Generator
Generating a $d$-dimensional random geometric graph can be done naïvely in $O(n^2)$ time. We reduce this bound by introducing a spatial grid data structure similar to the one used by Holtgrewe et al. [25]. The result is a partition of the unit cube into equal sized cells with side length of at least $r$. Therefore, the number of grid cells can be bounded by $O(1/r^d)$. In the following, we assume $d = 3$ (for $d = 2$ see [33]). However, our generator can easily be adapted for higher dimensions.
Since vertices are uniformly distributed within the unit cube, we can model the vertex distribution as a balls-into-bins problem. The expected number of vertices in each cell therefore is $O(n r^3)$.
Lemma V.1.
The expected number of vertex comparisons for generating a random geometric graph using a three dimensional grid data structure with side length $r$ is $O(n + m)$.
Proof.
See Appendix.
∎
Parallelization
We now discuss how to parallelize our approach in a communication-free way. To the best of our knowledge, the resulting generator is the first efficient distributed implementation of a RGG generator for $d > 2$.
Our generator again uses the notion of chunks. A chunk in the RGG case represents a rectangular section of the unit cube. We therefore partition the unit cube into $P$ disjoint chunks and assign one of them to each PE. There is one caveat with this approach, in that only cube numbers of $P$ are possible. To alleviate this issue, we generate more than $P$ chunks and distribute them evenly between PEs. To be more specific, we are able to generate $8^{\lceil \log_8 P \rceil}$ chunks and then distribute them to the PEs in a locality-aware way by using a Z-order curve [34].
Each PE is then responsible for generating the vertices in its chunk(s) as well as all their incident edges. Again, we use a divide-and-conquer approach similar to the previous generators.
For this purpose the unit cube is evenly partitioned into eight equal-sized subcubes. In turn, the probability for a vertex to be assigned to an individual subcube is the ratio of the volume of the subcube to the volume of the whole cube. Thus, we generate a constant number of binomial random deviates to compute the number of vertices within each of the subcubes. The binomial distribution is parameterized using the number of remaining vertices and the aforementioned subcube probability $1/8$. As for the ER generators, deviates are generated by exploiting pseudorandomization via hash functions seeded on the current recursion subtree. Therefore, deviates are synchronized between PEs and we require no communication for generating local vertices. Note that the resulting recursion tree has at most $O(\log P)$ levels and size $O(P)$. Once a PE is left with a single chunk, we compute additional binomial random deviates to get the number of vertices in each cell of side length $r$.

Since we want each PE to generate all incident edges for its local vertices, we have to make sure that the cells of neighboring chunks that are within the radius $r$ of local vertices are also generated. Because each cell has a side length of at least $r$, this means we have to generate all cells directly adjacent to the chunk(s) of a PE. Due to the communication-free design of our algorithm, the generation of these cells is done through recomputations using the same divide-and-conquer algorithm as for the local cells. We therefore repeat the vertex generation process for the neighboring cells. An example of the subgraph that a single PE generates for the two dimensional case is given in Fig. 2.
Afterwards, we can simply iterate over all local cells and generate the corresponding edges by vertex comparisons with each neighboring cell. To avoid duplicate edges, we only perform vertex comparisons for local neighboring cells with a higher id.
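The grid-based edge generation can be sketched as follows. This is our own 2D illustration for brevity (the generator described here works in 3D); all names are hypothetical and the bucketing is sequential rather than distributed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

struct Pt { double x, y; };

// Sketch: bucket points of the unit square into a grid whose cell side
// length is at least r, then compare each cell only against itself and
// neighboring cells with higher row-major id to avoid duplicate edges.
std::vector<std::pair<std::size_t, std::size_t>>
rgg_edges(const std::vector<Pt>& pts, double r) {
    const std::size_t k = std::max<std::size_t>(1, std::size_t(1.0 / r));
    const double side = 1.0 / k;                  // side length >= r
    std::vector<std::vector<std::size_t>> cells(k * k);
    auto cell_of = [&](const Pt& p) {
        std::size_t cx = std::min(k - 1, std::size_t(p.x / side));
        std::size_t cy = std::min(k - 1, std::size_t(p.y / side));
        return cy * k + cx;
    };
    for (std::size_t i = 0; i < pts.size(); ++i)
        cells[cell_of(pts[i])].push_back(i);

    std::vector<std::pair<std::size_t, std::size_t>> edges;
    const double r2 = r * r;
    for (std::size_t c = 0; c < cells.size(); ++c) {
        std::size_t cx = c % k, cy = c / k;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (cx + dx >= k || cy + dy >= k) continue; // unsigned wrap
                std::size_t nc = (cy + dy) * k + (cx + dx);
                if (nc < c) continue;                       // higher id only
                for (std::size_t a : cells[c])
                    for (std::size_t b : cells[nc]) {
                        if (nc == c && b <= a) continue;    // same cell once
                        double ex = pts[a].x - pts[b].x;
                        double ey = pts[a].y - pts[b].y;
                        if (ex * ex + ey * ey <= r2)
                            edges.push_back({std::min(a, b), std::max(a, b)});
                    }
            }
    }
    return edges;
}
```

Restricting comparisons to cells with higher id mirrors the duplicate-avoidance rule described above; each unordered cell pair is examined exactly once.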
Lemma V.2.
The three dimensional RGG generator has an expected running time of $O(\frac{n + m}{P} + \log P)$.
Proof.
See Appendix.
∎
VI RDG Generator
The point generation phase for Delaunay graphs follows the same principles as for RGGs, differing only in the definition of the cell side length, which is set to the mean distance to the $k$th-nearest neighbor for $n$ uniformly distributed vertices in the unit $d$-hypercube [35], for a suitably chosen constant $k$.
To produce the DT of the generated point set, our algorithm proceeds as follows. Each PE computes the DT of its assigned chunk with expected $O(n/P)$ vertices (as for the RGG generator, a PE can be assigned more than one chunk). To ensure correct incident edges for a chunk $c$, vertices from a halo of neighboring cells are added to the DT. Initially, the cell radius of the halo is set to one. We iteratively increase the radius of the halo, thus adding vertices from additional "layers" of cells to the DT, until there exists no simplex with at least one vertex from chunk $c$ whose circumhypersphere is not fully contained within the halo. Exploiting pseudorandomness to ensure all PEs generate the same vertices for the same cell, we can thus certify that no simplex of chunk $c$ violates the Delaunay criterion for any point of another chunk [30, 36]. With our choice of cell side length, a halo radius of one suffices in expectation for the DT to converge.
Lemma VI.1.
The RDG generator has to compute the DT for an expected number of $O(n/P)$ vertices, resulting in an expected running time of $O(\frac{n}{P} + \log P)$ for uniformly distributed vertices.
Proof.
See Appendix. ∎
VII RHG Generator
As for the RGG generator, we can naïvely create a random hyperbolic graph in time $O(n^2)$ by comparing all pairs of vertices. We improve this bound by introducing a partitioning of the hyperbolic plane [37, 27, 28]. For this purpose, we split the hyperbolic plane into concentric annuli of constant width. Since $R = 2\ln n + C$, this results in $O(\log n)$ annuli [10]. Each annulus $i$ is defined by two radii $r_i$ and $r_{i+1}$, where the innermost radius is $0$ and the outermost radius is $R$.
Since each PE has to determine the number of vertices in each annulus, we compute a multinomial random deviate with $O(\log n)$ outcomes. This can be done by computing a set of dependent binomial random deviates via pseudorandomization. The probability that an individual vertex is assigned to annulus $i$ is given by
$$p_i = \int_{r_i}^{r_{i+1}} f(r)\,\mathrm{d}r = \frac{\cosh(\alpha r_{i+1}) - \cosh(\alpha r_i)}{\cosh(\alpha R) - 1}.$$
Because vertices are distributed multinomially, the expected value for the number of vertices in annulus $i$ is $n p_i$. We then continue to further partition each annulus in the angular direction into $P$ chunks using a divide-and-conquer approach that uses binomial random deviates, as for the other generators. Note that the resulting recursion tree within a single annulus has a size of at most $O(P)$ and a height of $O(\log P)$.
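The computation of the multinomial deviate via dependent binomials can be sketched as follows. This is our own simplified illustration: the hash-based seeding is omitted, `std::binomial_distribution` stands in for constant-time deviate generation, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Sketch: distribute n vertices over the annuli given by `radii` (from the
// innermost radius 0 to the outermost radius R) as a chain of conditional
// binomials. Annulus i with radii r_i, r_{i+1} has probability
// p_i = (cosh(a r_{i+1}) - cosh(a r_i)) / (cosh(a R) - 1), which is the
// integral of the radial density f(r) = a sinh(a r) / (cosh(a R) - 1).
std::vector<std::uint64_t>
vertices_per_annulus(std::uint64_t n, double alpha,
                     const std::vector<double>& radii, std::mt19937& gen) {
    const double R = radii.back();
    const double norm = std::cosh(alpha * R) - 1.0;
    std::vector<std::uint64_t> counts;
    double remaining_p = 1.0;
    std::uint64_t remaining_n = n;
    for (std::size_t i = 0; i + 1 < radii.size(); ++i) {
        double p = (std::cosh(alpha * radii[i + 1])
                  - std::cosh(alpha * radii[i])) / norm;
        double cond;
        if (i + 2 == radii.size())
            cond = 1.0;  // last annulus takes all remaining vertices
        else
            cond = remaining_p > 0.0 ? std::min(1.0, p / remaining_p) : 0.0;
        std::binomial_distribution<std::uint64_t> bin(remaining_n, cond);
        std::uint64_t c = bin(gen);
        counts.push_back(c);
        remaining_n -= c;
        remaining_p -= p;
    }
    return counts;
}
```

By construction, the per-annulus counts always sum to $n$, and each count has expected value $n p_i$ as stated above.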
Finally, we perform a third partitioning of chunks into a set of equal-sized cells in the angular direction. The number of cells per chunk is chosen in such a way that each cell contains an expected constant number of vertices. Fig. 3 shows the resulting partitioning of vertices in the hyperbolic plane into cells and annuli.
Lemma VII.1.
Generating the grid data structure for $P$ PEs takes expected time $O(\frac{n}{P} + P\log n)$ and assigns each PE an expected number of $O(\frac{n}{P})$ vertices.
Proof.
See Appendix.
∎
Neighborhood Queries
We now describe how we use our grid data structure to efficiently reduce the number of vertex comparisons. For this purpose, we begin by iterating over the cells in increasing order from the innermost annulus outwards and perform a neighborhood query for each vertex.
The query begins by determining how far the angular coordinate of a potential neighbor $v$ is allowed to deviate from the angular coordinate of our query vertex $q$. If we assume that $v$ lies in annulus $i$ with radial boundaries $r_i$ and $r_{i+1}$, we can use the distance inequality
$$\cos(\theta_q - \theta_v) \geq \frac{\cosh(r_q)\cosh(r_i) - \cosh(R)}{\sinh(r_q)\sinh(r_i)}$$
to determine this deviation.
to determine this deviation. We then gather the set of cells that lie within the resulting boundary coordinates. To do so, we start from the cell that intersects the angular coordinate of our query vertex and then continue outward in both angular directions until we encounter a cell that lies outside the boundary. For each cell that we encounter, we perform distance comparisons to our query vertex and add edges accordingly. To avoid the costly evaluation of trigonometric functions for each comparison we maintain a set of precomputed values as proposed by Penschuck [29]. Note that in order to avoid duplicate edges in the sequential case, we can limit neighborhood queries to annuli that have an equal or larger radial boundary than our starting annulus.
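In code, such an angular bound can be sketched as follows. This is our own illustration derived from the condition $d(u,v) \leq R$ on the hyperbolic distance; the clamping covers the cases where the entire annulus is within range or entirely out of range, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sketch: maximum angular deviation of a neighbor at radius rv from a query
// vertex at radius ru, derived from cosh d(u,v) <= cosh R, i.e.
// cos(dtheta) >= (cosh ru cosh rv - cosh R) / (sinh ru sinh rv).
// A bound <= -1 means the whole annulus is in range (deviation pi);
// a bound >= 1 means no angle is in range (deviation 0).
double max_angular_deviation(double ru, double rv, double R) {
    double bound = (std::cosh(ru) * std::cosh(rv) - std::cosh(R))
                 / (std::sinh(ru) * std::sinh(rv));
    return std::acos(std::clamp(bound, -1.0, 1.0));
}
```

For example, two vertices whose radii sum to less than $R$ are neighbors at any angle, so the deviation bound evaluates to $\pi$.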
Lemma VII.2.
The expected time complexity of the sequential RHG generator for $n$ vertices with radius $R = 2\ln n + C$ and a power-law exponent $\gamma \geq 2$ is $O(n\log n + m)$.
Proof.
See Appendix. ∎
To adapt our queries for a distributed setting, we need to recompute all non-local vertices that lie within the hyperbolic circle (of radius $R$) of any of our local vertices. To find these vertices, we perform an additional inward neighborhood query.
Queries in the distributed setting work similarly to the sequential case, with the addition that any non-local chunks that we encounter during the search are recomputed. These newly generated vertices are then assigned their respective cells and stored for future searches.
One issue with this approach is that the innermost annuli, which contain only a constant number of vertices with high probability as $n$ grows to infinity, are divided into $P$ chunks. However, since the innermost annuli are almost always contained within the search radius for any given vertex, we have to iterate over $O(P)$ chunks for each of these levels. This severely impacts the running time of each individual search for a large number of PEs. To alleviate this, the innermost annuli are computed redundantly on each PE and stored directly in a single chunk.
Lemma VII.3.
The expected time complexity of the parallel RHG generator for $n$ vertices with radius $R = 2\ln n + C$, average degree $\bar{d}$ and a power-law exponent $\gamma \geq 2$ is $O(\frac{n + m}{P} + P\log n)$.
Proof.
See Appendix. ∎
VIII Experimental Evaluation
We now present the experimental evaluation of our graph generators. For each algorithm, we perform a running time comparison and analyze its scaling behavior.
VIII-A Implementation
An implementation of our graph generators (KaGen) in C++ is available at https://github.com/sebalamm/KaGen. We use Spooky Hash (http://www.burtleburtle.net/bob/hash/spooky.html) as a hash function for pseudorandomization. Hash values are used to initialize a Mersenne Twister [38] for generating uniform random deviates. Non-uniform random deviates are generated using the stocc library (http://www.agner.org/random/). If the size of our inputs (e.g. the number of vertices) exceeds 64 bits, we use the multiple-precision floating-point library MPFR (http://www.mpfr.org) and a reimplementation of the stocc library. All algorithms and libraries are compiled with g++ version 5.4.1 using the optimization flags -Ofast and -march=native. In the distributed setting, we use Intel MPI version 1.4 compiled with g++ version 4.9.3.
VIII-B Experimental Setup
We use two different machines to conduct our experiments. Sequential comparisons are performed on a single core of a dual-socket Intel Xeon E5-2670 v3 system with 128 GB of DDR4-2133 memory, running Ubuntu 14.04 with kernel version 3.13.0-91-generic. If not mentioned otherwise, all results are averages of ten iterations with different seeds.
For scaling experiments and parallel comparisons we use the Phase 1 thin nodes of the SuperMUC supercomputer. The SuperMUC thin nodes consist of 18 islands and a total of 9216 nodes. Each computation node has two Sandy Bridge-EP Xeon E5-2680 8C processors, as well as 32 GB of main memory. Each node runs the SUSE Linux Enterprise Server (SLES) operating system. We use the maximum number of 16 cores per node for our scaling experiments. The maximum size of our generated instances is bound by the memory per core (2 GB). To generate even larger instances, one could use a full streaming approach, which will be discussed in Section IX.
We analyze the scaling behavior of our algorithms in terms of weak and strong scaling. Weak scaling measures how the running time varies with the number of PEs for a fixed problem size per PE. Analogously, strong scaling measures the running time for a fixed problem size over all PEs. However, since the results for the strong scaling experiments do not provide additional insights, we omit them from our evaluation. Again, results are averaged over ten iterations with different seeds.
VIII-C Erdős-Rényi Generator
There are various implementations of efficient sequential Erdős-Rényi generators available (e.g. [22]). However, there is little to no work on distributed memory generators. Thus, we perform a sequential comparison of our generator against the implementation found in the Boost library (http://www.boost.org/doc/libs/1_62_0/libs/graph/doc/erdos_renyi_generator.html). Their generator uses a sampling procedure similar to Algorithm D [15] and serves as an example of an efficient linear time generator.
For our comparison, we vary the number of vertices $n$ and the number of edges $m$ over several orders of magnitude. Fig. 4 shows the time per edge for both generators for the two largest sets of vertices.
First, we note that both implementations have a constant time per edge with increasing $m$. However, the Boost implementation also has an increasing time per edge for growing numbers of vertices $n$. In contrast, the running time of our generator is independent of $n$. This is no surprise, since our generator uses a simple edge list and does not maintain a full graph data structure.
For the directed model, our generator is roughly times faster than Boost for the largest value of . In the undirected case, our generator is roughly times faster, with a correspondingly lower time per edge. All in all, the results are consistent with the optimal theoretical running times of for both algorithms.
Next, we discuss the weak scaling behavior of our Erdős-Rényi generators. For this purpose, each PE is assigned an equal number of vertices and edges to sample. For our experiments, we set and let the number of edges range from to . Results are presented in Fig. 5.
We can see that our directed generator has an almost perfect weak scaling behavior. Only for the smaller input sizes and more than PEs, the logarithmic term of our running time becomes noticeable. The minor irregularities that we observe for the largest number of PEs are due to performance differences for nodes in the supercomputer. Nonetheless, our results are consistent with our asymptotic running time .
If we look at the weak scaling behavior of our undirected generator, we can see that for small numbers of PEs the running time starts to increase and then remains nearly constant. This is due to the fact that as the number of PEs/chunks increases, the number of redundantly generated edges also increases up to twice the number of sequentially sampled edges. For smaller values of and large , we also see a linear increase in running time. We attribute this to the linear time needed to locate the correct chunks for each PE.
VIII-D RGG Generator
There are various implementations of the naïve generator available (e.g. [39]). However, a more efficient and distributed algorithm is presented by Holtgrewe et al. [25].
Since their algorithm and our own generator are nearly identical in the sequential case, we are mainly interested in their parallel running time for a growing number of PEs. Therefore, we measure the total running time and vary the input size per PE from to . It should be noted that Holtgrewe et al. only support two dimensional random geometric graphs and thus the three dimensional generator is excluded. To maintain an equal amount of work per PE (weak scaling), the radius is set to . This choice also ensures that the resulting graph is almost always connected [40]. Fig. 6 shows the running time of both competitors for a growing number of PEs. Additionally, Fig. 7 shows the full scaling experiments for our two and three dimensional generators.
Due to the recomputations performed by our generator, Holtgrewe et al. quickly become faster by up to a factor of two as the number of PEs increases. To be more specific, the number of neighbors that we have to generate redundantly increases from zero for one PE up to eight neighbors for more than four PEs. This increase in running time can be bounded by computing the additional number of vertices created through redundant computations and multiplying it by the average degree . The resulting bound yields roughly twice the running time needed for the sequential computation, which is consistent with the experimental results. A very similar analysis can be done for our three dimensional generator. Again, minor irregularities are due to performance differences for individual nodes in the supercomputer.
Once we reach PEs, the communication required by Holtgrewe et al. rapidly becomes noticeable and KaGen is significantly faster. Overall, the results are in line with our asymptotic running time of .
VIII-E RDG Generator
Our implementation uses the CGAL library to compute the DT of the vertices of a chunk and its halo [41]. CGAL provides a state-of-the-art implementation, which most other available DT generators also use as a backend. We therefore omit a sequential comparison study.
The experimental setup for the RDG is equivalent to the RGG weak scaling experiments, varying the input size per PE from to for the 2D RDG and – due to memory constraints – from to for the three dimensional one. Moreover, for 3D RDG and PEs, only the smallest input size could be computed within the memory limit per core of SuperMUC. Our experiments show an almost constant time per edge – depicted in Fig. 8 – well in agreement with our expected asymptotic running time of . Similarly to the RGG, the initial increase in runtime can be attributed to the redundant vertex generation of neighboring cells. As the expected halo cell size is one, no significant further increase in runtime can be observed for more than PEs. We present additional experiments that validate the size of the halo in the Appendix.
VIII-F RHG Generator
Lastly, we compare our RHG generator to the algorithm by von Looz et al. [28], available in the NetworKit library (https://networkit.iti.kit.edu/api/generators.html), and the algorithm by Penschuck [29]. Both algorithms are capable of multi-threaded execution but are limited to a single thread to ensure a fair comparison. We test all generators for to vertices and average degrees varying from to . These are average degrees found in various real-world networks [27, 42, 43]. Fig. 9 shows the time per edge for increasing values of the power-law exponent and different average degrees.
First, we can see that our generator is inferior to our competitors for small average degrees. We attribute this to the additional overhead needed to initialize our bucket data structure. However, as the average degree increases, these costs are quickly amortized by our linear time queries.
Overall, the algorithm by Penschuck [29] is the fastest algorithm for all average degrees tested. This is to be expected, since their algorithm places a strong emphasis on cache and memory efficiency.
Considering the asymptotic behavior of the generators, NetworKit has an increase in running time for growing number of vertices and larger average degrees. This behavior is in line with the experimentally observed running time of [28]. The generator by Penschuck and our own generator both have a roughly constant time per edge independent of .
Lastly, we present the results of our scaling experiments for the RHG generator. For the weak scaling experiments each PE is again assigned an equal number of vertices. The number of vertices per PE ranges from to . The power-law exponent is fixed to . Fig. 10 shows the weak scaling results of the RHG generator with an average degree of .
Looking at the weak scaling behavior, we see that there is a relatively steep increase in running time for a growing number of PEs. We can attribute this behavior to the redundant computations that are introduced through parallelization. Additionally, high degree vertices, which are hard to distribute efficiently if we want a partitioned output graph, severely impede the scaling behavior. Since the maximum degree is with high probability [14], these vertices dominate the running time of our algorithm.
IX Conclusion
We presented scalable graph generators for a set of commonly used network models. Our work includes the classic Erdős-Rényi model, in both the and variants, random geometric graphs, random Delaunay graphs and random hyperbolic graphs.
Our algorithms make use of a combination of divide-and-conquer schemes and grid data structures to narrow down their local sampling space. We redundantly compute parts of the network that are within the neighborhood of local vertices. These computations are made possible through the use of pseudorandomization via hash functions. The resulting algorithms are embarrassingly parallel and communication-free.
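The pseudorandomization idea can be sketched in a few lines: seeding a PRNG with a hash of the chunk identifier lets any PE deterministically (re)generate the vertices of any chunk, including its neighbors, without communication. The function and parameter names below are our own, and the hash and PRNG choices are illustrative, not those of the actual library:

```python
import random


def chunk_vertices(chunk_id, vertices_per_chunk, global_seed=0):
    """Deterministically (re)generate the 2D points of one chunk.

    The PRNG is seeded from a hash of (global_seed, chunk_id), so every PE
    that needs this chunk's points - e.g. a neighbor performing redundant
    computation - reproduces the exact same point set without communication.
    """
    rng = random.Random(hash((global_seed, chunk_id)))
    return [(rng.random(), rng.random()) for _ in range(vertices_per_chunk)]
```

A PE responsible for chunk 3 can thus call `chunk_vertices(2, k)` and `chunk_vertices(4, k)` to rebuild its neighbors' points locally instead of receiving them over the network.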
Our extensive experimental evaluation indicates that our generators are competitive with state-of-the-art algorithms while also providing near-optimal scaling behavior. In turn, we are able to generate instances with up to vertices and edges in less than 22 minutes on 32768 cores (largest instance generated with the directed generator). Therefore, our generators enable new network models to be used for research on a massive scale. To help researchers use our generators, we provide all algorithms in a widely usable open-source library.
Future Work
As mentioned in Section VIII-B, we would like to extend our generators to use a full streaming approach. This would drastically reduce the memory needed for the auxiliary data structures, especially for the spatial network generators. However, this would lead to larger constant factors in the running time since recomputations have to occur more frequently.
Additionally, our generators allow us to perform an extensive study on new graph models for high-performance computing benchmarks. In turn, these benchmarks could target a wider variety of real-world models and scenarios. A more detailed theoretical analysis using tighter bounds, especially for the parallel running times of our generators, would be beneficial for this purpose.
Acknowledgment
The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (www.lrz.de).
References
- [1] S. Muthukrishnan and G. Pandurangan, “Thresholding random geometric graph properties motivated by ad hoc sensor networks,” JCSS, vol. 76, no. 7, pp. 686 – 696, 2010.
- [2] A. Clauset, C. R. Shalizi, and M. E. Newman, “Power-law distributions in empirical data,” SIAM review, vol. 51, no. 4, pp. 661–703, 2009.
- [3] “Graph 500 benchmark,” http://www.graph500.org, accessed: 2016-10-05.
- [4] D. Chakrabarti and C. Faloutsos, “Graph mining: Laws, generators, and algorithms,” ACM Comput. Surv., vol. 38, no. 1, 2006.
- [5] P. Sanders and C. Schulz, “Scalable generation of scale-free graphs,” Inf. Process. Lett., vol. 116, no. 7, pp. 489–491, 2016.
- [6] A. Lumsdaine, D. Gregor, B. Hendrickson, and J. Berry, “Challenges in parallel graph processing,” Parallel Process. Lett., vol. 17, no. 01, pp. 5–20, 2007.
- [7] P. Erdős and A. Rényi, “On Random Graphs I.” Publicationes Mathematicae (Debrecen), vol. 6, pp. 290–297, 1959.
- [8] E. N. Gilbert, “Random graphs,” Ann. Math. Statist., vol. 30, no. 4, pp. 1141–1144, Dec 1959.
- [9] X. Jia, “Wireless networks and random geometric graphs,” in IEEE I-SPAN, 2004, pp. 575–580.
- [10] D. V. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá, “Hyperbolic geometry of complex networks,” CoRR, no. abs/1006.5169, 2010.
- [11] R. C. Murphy, K. B. Wheeler, B. W. Barrett, and J. A. Ang, “Introducing the graph 500,” Cray User’s Group, 2010.
- [12] M. Penrose, Random geometric graphs. Oxford University Press, 2003, no. 5.
- [13] N. Pržulj, D. G. Corneil, and I. Jurisica, “Modeling interactome: scale-free or geometric?” Bioinformatics, vol. 20, no. 18, pp. 3508–3515, 2004.
- [14] L. Gugelmann, K. Panagiotou, and U. Peter, “Random hyperbolic graphs: Degree sequence and clustering,” in ICALP, ser. LNCS, vol. 7392, 2012, pp. 573–585.
- [15] J. S. Vitter, “An efficient algorithm for sequential random sampling,” ACM Trans. Math. Softw., vol. 13, no. 1, pp. 58–67, 1987.
- [16] J. L. Bentley and J. B. Saxe, “Generating sorted lists of random numbers,” ACM TOMS, vol. 6, no. 3, pp. 359–364, 1980.
- [17] P. Sanders, S. Lamm, L. Hübschle-Schneider, E. Schrade, and C. Dachsbacher, “Efficient random sampling-parallel, vectorized, cache-efficient, and online,” arXiv, no. 1610.05141, 2016.
- [18] C. Robert and G. Casella, Monte Carlo statistical methods. Springer Science & Business Media, 2013.
- [19] J. Von Neumann, “13. Various techniques used in connection with random digits,” 1951.
- [20] E. Stadlober, “Ratio of uniforms as a convenient method for sampling from classical discrete distributions,” in WSC. ACM, 1989, pp. 484–489.
- [21] E. Stadlober and H. Zechner, “The Patchwork rejection technique for sampling from unimodal distributions,” ACM Trans. Model. Comput. Simul., vol. 9, no. 1, pp. 59–80, 1999.
- [22] V. Batagelj and U. Brandes, “Efficient generation of large random networks,” Phys. Rev. E, vol. 71, p. 036113, Mar 2005.
- [23] R. A. S. Fisher and F. Yates, Statistical tables for biological, agricultural, and medical research. Oliver and Boyd, 1963.
- [24] S. Nobari, X. Lu, P. Karras, and S. Bressan, “Fast random graph generation,” in EDBT. ACM, 2011, pp. 331–342.
- [25] M. Holtgrewe, “A scalable coarsening phase for a multi-level graph partitioning algorithm,” 2009.
- [26] M. Holtgrewe, P. Sanders, and C. Schulz, “Engineering a scalable high quality graph partitioner,” in IPDPS. IEEE, 2010, pp. 1–12.
- [27] M. von Looz, H. Meyerhenke, and R. Prutkin, “Generating random hyperbolic graphs in subquadratic time,” in ISAAC, ser. LNCS, vol. 9472, 2015, pp. 467–478.
- [28] M. von Looz, M. S. Özdayi, S. Laue, and H. Meyerhenke, “Generating massive complex networks with hyperbolic geometry faster in practice,” pp. 1–6, 2016.
- [29] M. Penschuck, “Generating practical random hyperbolic graphs in near-linear time with sub-linear memory,” in SEA. Springer, 2017.
- [30] D. Funke and P. Sanders, “Parallel d-D Delaunay Triangulations in Shared and Distributed Memory,” in ALENEX. SIAM, 2017, p. 207–217.
- [31] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999.
- [32] J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos, “Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication,” in PKDD. Berlin, Heidelberg: Springer, 2005, pp. 133–145.
- [33] S. Lamm, “Communication efficient algorithms for generating massive networks,” 2017.
- [34] G. M. Morton, A computer oriented geodetic data base and a new technique in file sequencing. IBM, 1966.
- [35] P. Bhattacharyya and B. K. Chakrabarti, “The mean distance to the th neighbour in a uniform distribution of random points,” European Journal of Physics, vol. 29, no. 3, p. 639, 2008.
- [36] S. Lo, “Parallel Delaunay triangulation in three dimensions,” Comput.Methods in Appl.Mech.Eng., vol. 237-240, pp. 88–106, 2012.
- [37] K. Bringmann, R. Keusch, and J. Lengler, “Geometric inhomogeneous random graphs,” CoRR, no. abs/1511.00576, 2015.
- [38] M. Matsumoto and T. Nishimura, “Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator,” ACM Trans. Model. Comput. Simul., vol. 8, no. 1, pp. 3–30, 1998.
- [39] A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using NetworkX,” in SciPy, 2008, pp. 11–15.
- [40] M. J. Appel and R. P. Russo, “The connectivity of a graph on uniform points on [0, 1]^d,” Statistics & Probability Letters, vol. 60, no. 4, pp. 351–357, 2002.
- [41] S. Hert and M. Seel, “dD convex hulls and delaunay triangulations,” in CGAL User and Reference Manual, 4.7 ed. CGAL, 2015.
- [42] K. Choromanski, M. Matuszak, and J. Miekisz, “Scale-free graph with preferential attachment and evolving internal vertex structure,” Journal of Statistical Physics, vol. 151, no. 6, pp. 1175–1183, 2013.
- [43] J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási, “Structure and tie strengths in mobile communication networks,” Proc. Nat. Acad. of Sciences (USA), vol. 104, no. 7332, 2007.
- [44] P. Sanders, S. Lamm, L. Hübschle-Schneider, E. Schrade, and C. Dachsbacher, “Efficient random sampling - Parallel, vectorized, cache-efficient, and online,” CoRR, no. abs/1610.05141, 2016.
- [45] A. Maus, “Delaunay triangulation and the convex hull of points in expected linear time,” BIT Numerical Mathematics, vol. 24, no. 2, pp. 151–163, Jun 1984.
Appendix A Proofs
A-A Erdős-Rényi Model
Lemma A.1.
The undirected generator runs in expected time .
Proof.
Each PE has to generate a total of chunks consisting of a single triangular submatrix and rectangular submatrices. Additionally, each edge has to be generated twice (except when ), once by the PE that is assigned vertex , and once by the PE that is assigned vertex . Thus, we have to sample a total of edges. At every level of our recursion, we need to split the quadrants and in turn compute three hypergeometric random deviates. Therefore, the work at every level takes expected constant time. Since there are at most levels until each PE reaches its chunks, the expected total time spent on the recursion is . ∎
A-B Random Geometric Graph Model
Lemma A.2.
The expected number of vertex comparisons for generating a random geometric graph using a three dimensional grid data structure with side length is .
Proof.
By construction, each cell of our grid has a side length of at least and a constant number of neighbors. As stated previously, each cell contains vertices in expectation. Therefore, the number of comparisons to perform between neighboring cells is . As mentioned in Section V, there are cells in total. We therefore get vertex comparisons in expectation. The expected average degree of a vertex outside the border in the RGG model is , as presented in Section II-A2. Vertices on the border have fewer edges than the ones in the center and therefore can be omitted from our analysis. We then have to generate edges in total. Thus, the expected total number of vertex comparisons is .
In the case where our graph is not connected, i.e. , the expected number of vertices per cell is constant. Analogously to the previous analysis, we are able to show that the running time for sampling the edges is bounded by . ∎
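The counting argument above can be made concrete with a small sketch: a grid-based random geometric graph generator that buckets points into cells of side r and compares only points within the same or neighboring cells. This is a simplified single-process 2D illustration with names of our own choosing, not the paper's three dimensional generator:

```python
import math
import random


def rgg_edges(points, r):
    """Edges of a 2D random geometric graph via a uniform grid.

    Points are bucketed into cells of side length r, so any two points
    within distance r lie in the same or in adjacent cells; only those
    pairs are compared, which keeps the expected work near-linear for
    uniformly distributed points.
    """
    grid = {}
    for i, (x, y) in enumerate(points):
        grid.setdefault((int(x / r), int(y / r)), []).append(i)
    edges = set()
    for (cx, cy), ids in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), []):
                    for i in ids:
                        if i < j and math.dist(points[i], points[j]) <= r:
                            edges.add((i, j))
    return edges
```

On uniformly random points the grid version produces exactly the same edge set as the brute-force quadratic check, while only touching a constant number of cells per point.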
Lemma A.3.
The three dimensional RGG generator has an expected running time of .
Proof.
We first show that our divide-and-conquer algorithm assigns each PE vertices in time with high probability. As stated above, the recursion tree has at most levels and size . At each level of the recursion we generate three binomial random deviates, which takes expected constant time. Since the number of vertices distributed to each chunk follows a uniform distribution, we have a balls-into-bins situation. Thus, we have an expected number of vertices per PE. By using Chernoff bounds we can show that the number of vertices is w.h.p. if and w.h.p. if . Generating these vertices and placing them into the correct grid cell can be done in linear time.
Generating the vertices for the neighbors of the local chunk can also be done in , since each chunk only has a constant number of neighbors. Thus, generating and partitioning all vertices necessary for each individual PE has an expected time complexity of .
Following the proof of Lemma A.2, we are able to show that the expected number of edges that each PE has to generate is with high probability. Therefore, the generator has an expected running time of . ∎
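The recursive binomial splitting used in this proof can be sketched as follows: each recursion node draws one binomial deviate to split its vertex count between its two halves, with the seed derived from a hash of the interval so that every PE reproduces identical splits without communication. The naive Bernoulli-sum binomial sampler below stands in for the expected-constant-time sampler assumed in the proof; all names are illustrative:

```python
import random


def split_vertices(n, lo, hi, seed=0):
    """Distribute n vertices among chunks [lo, hi) via recursive splits.

    Each recursion node seeds its PRNG from a hash of (seed, lo, hi), so
    any PE replaying the path to its own chunk derives the same counts as
    every other PE. The binomial deviate is sampled naively here as a sum
    of Bernoulli trials; a real generator would use an O(1) expected-time
    binomial sampler instead.
    """
    if hi - lo == 1:
        return {lo: n}
    mid = (lo + hi) // 2
    rng = random.Random(hash((seed, lo, hi)))
    p = (mid - lo) / (hi - lo)          # probability a vertex falls left
    left = sum(rng.random() < p for _ in range(n))
    out = split_vertices(left, lo, mid, seed)
    out.update(split_vertices(n - left, mid, hi, seed))
    return out
```

A PE that only needs chunk i would descend just one root-to-leaf path of this recursion (logarithmically many deviates) rather than materializing the whole dictionary.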
A-C Random Delaunay Graph Model
Lemma A.4.
The RDG generator has to compute the DT for an expected number of vertices, resulting in an expected running time of for uniformly distributed vertices.
Proof.
Following Lemma A.3 each PE can generate the vertices of its chunk and a constant number of neighboring chunks in time.
The mean distance of the th-nearest-neighbor for vertices in the unit -hypercube is bounded by [35]
Given this definition of the cell side length , an expected number of vertices fall into each cell, totaling
vertices for a halo of cell radius and fixed dimension . In expectation, a halo of cell radius contains the th-nearest-neighbor of a vertex on the convex hull of the chunk. Thus, for uniformly distributed points, an expected constant suffices to determine the Delaunay triangle of such a vertex . Therefore, only a constant number of neighboring cells needs to be generated by a PE to determine the DT of the vertices of its chunk.
The DT of a uniformly distributed point set can be computed in expected linear time [45], yielding the Lemma. ∎
Fig. 11 validates our expected constant bound on experimentally.
A-D Random Hyperbolic Graph Model
Lemma A.5.
Generating the grid data structure for PEs takes expected time and assigns each PE an expected number of vertices.
Proof.
Chunks are chosen in such a way that they assign each PE an equal angular interval of the hyperbolic plane . The number of vertices per chunk in an annulus is generated through a set of binomial random variates. This results in a uniform distribution of the vertices in the interval with respect to their angular coordinate. Thus, each PE is assigned vertices in expectation.
The time spent during the chunk creation per annulus is with high probability, since the size of the recursion tree is at most and we only spend expected constant time per level generating the binomial random deviates. We have to repeat this recursion for each of the annuli. Thus, the total time spent on the recursion over all annuli is .
As stated above, the chunk for each PE contains vertices. The running time bound then follows by combining the time spent on vertex creation as well as recursion. ∎
Lemma A.6.
The expected time complexity of the sequential RHG generator for vertices with radius and a power-law exponent is .
Proof.
Generating our bucket data structure can be done in expected time . We begin by showing that the set of candidates gathered by our querying approach is at worst a constant overestimation of the set of neighbors. For this purpose, we use the approximations derived by Gugelmann et al. [14].
The probability of a point contributing to annulus is thus given by
Since we set , the ratio between two successive annuli is
Using this result, the expected number of vertices in annulus is . Next, we use the distance inequality and to derive a bound on the expected number of points we examine in each annulus. We can approximate the angular deviation for annulus by using . Dividing this bound by thus gives us the probability of a vertex in annulus contributing to our set of candidates. The total number of candidates can then be calculated by a summation over all annuli
Therefore, the time needed for all queries is .
Next, we show that iterating over the annuli for all vertices takes expected time . Recall that for each vertex we process all annuli in an outward fashion, starting from the current annulus. For a vertex in annulus , we thus have to process the other annuli. The expected number of annuli that we have to process for vertices in annulus in turn is . Again, summation over all annuli results in an expected number of annuli
Overall, we need time to build and iterate over our bucket data structure. Additionally, the expected time needed for all queries is . Therefore, our sequential generator has an expected running time of . ∎
Lemma A.7.
The expected time complexity of the parallel RHG generator for vertices with radius , average degree and a power-law exponent is .
Proof.
Building our bucket data structures takes time as shown in Lemma A.5. Sampling local vertices and edges has an expected asymptotic running time of as for the sequential approach.
We now give a bound on the fraction of incident edges and thus vertices that have to be recomputed locally. We do so by first analyzing the number of vertices that have to be recomputed during the inward search. For this purpose, we limit the angular boundaries for our vertex queries to be only a constant fraction of the local angular interval. This can be done by using the distance inequality and setting the maximal angular deviation equal to . Solving this equation for and determining the corresponding expected degree gives us a threshold on the radius of points that need to be recomputed [29]. Thus, the number of vertices that have to be computed redundantly is binomially distributed around the mean
For the outward search, we get a bound on the recomputed vertices by examining the maximum degree . Gugelmann et al. [14] showed that with high probability . ∎