Design, Generation, and Validation of Extreme Scale Power-Law Graphs

03/04/2018
by   Jeremy Kepner, et al.

Massive power-law graphs drive many fields: metagenomics, brain mapping, Internet-of-things, cybersecurity, and sparse machine learning. The development of novel algorithms and systems to process these data requires the design, generation, and validation of enormous graphs with exactly known properties. Such graphs accelerate the proper testing of new algorithms and systems and are a prerequisite for success on real applications. Many random graph generators currently exist that require realizing a graph in order to know its exact properties: number of vertices, number of edges, degree distribution, and number of triangles. Designing graphs using these random graph generators is a time-consuming trial-and-error process. This paper presents a novel approach that uses Kronecker products to allow the exact computation of graph properties prior to graph generation. In addition, when a real graph is desired, it can be generated quickly in memory on a parallel computer with no interprocessor communication. To test this approach, graphs with 10^12 edges are generated on a 40,000+ core supercomputer in 1 second and exactly agree with those predicted by the theory. In addition, to demonstrate the extensibility of this approach, decetta-scale graphs with up to 10^30 edges are simulated in a few minutes on a laptop.

I Introduction

This material is based in part upon work supported by the NSF under grant number DMS-1312831. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Power-law (or heavy-tail) [1, 2] graphs are found throughout a wide range of applications [3, 4]. In such graphs, there are a small number of vertices with a large number of edges and a large number of vertices with a small number of edges. Specific domains where such graphs are important include genomics [5, 6, 7, 8, 9, 10], brain mapping [11], computer networks [12, 13, 14, 15], social media [16, 17], cybersecurity [18, 19], and sparse machine learning [20, 21, 22, 23, 24].

Many graph processing systems are currently under development. These systems are exploring innovations in algorithms [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35], software architecture [36, 37, 38, 39, 40, 41, 42, 43, 44], software standards [45, 46, 47, 48, 49], and parallel computing hardware [50, 51, 52, 53, 54, 55, 56]. The development of novel algorithms and systems to process these data requires the design, generation, and validation of enormous graphs with known properties. Such graphs accelerate the proper testing of new algorithms and systems and are a prerequisite for success on real applications.

Many random graph generators currently exist that require creating a graph in order to know its exact properties, such as the number of vertices, number of edges, degree distribution, and number of triangles. Perhaps the most well-known and scalable power-law graph generator is used in the Graph500.org [57, 58, 59] and GraphChallenge.org [60, 61, 62] benchmarks. This generator, often referred to as R-MAT, is based on randomly sampling recursive Kronecker graphs. Other highly scalable graph generators are based on randomly specified degree distributions [63, 64, 65]. Designing graphs using these random graph generators is an iterative process whereby the graph designer selects the parameters of the graph generator, randomly creates the graph with those parameters, and then measures the desired properties. Such a process places certain natural limits on the ability of the graph designer to explore enormous graphs and know prior to graph generation the exact properties of the graph.

This paper presents a complementary approach using Kronecker products that allows the exact computation of graph properties prior to graph generation. In addition, when a real graph is desired, it can be generated quickly in memory on a parallel computer with no interprocessor communication. The paper begins with a review of the relevant properties of Kronecker products. Next, the types of constituent matrices that are suited for generating power-law graphs are described. Various mathematical properties of power-law Kronecker graphs are then derived. Subsequently, a parallel algorithm for rapidly generating large graphs is provided. A variety of performance results and specific examples of various graphs generated using this approach are presented. Finally, the conclusions and a discussion of further research are given.

II Kronecker Products

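The Kronecker product of two square matrices is defined as follows [66]

$$\mathbf{C} = \mathbf{A} \bigotimes \mathbf{B}$$

where $\bigotimes$ denotes the Kronecker product and $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are matrices of scalar values drawn from a set $S$

$$\mathbf{A} : S^{N_A \times N_A} \qquad \mathbf{B} : S^{N_B \times N_B} \qquad \mathbf{C} : S^{N_A N_B \times N_A N_B}$$

More explicitly, the Kronecker product can be written as

$$\mathbf{C}\big((i_A - 1) N_B + i_B,\ (j_A - 1) N_B + j_B\big) = \mathbf{A}(i_A, j_A) \otimes \mathbf{B}(i_B, j_B)$$

The element-wise multiply operation $\otimes$ can be a variety of functions so long as the resulting operation obeys the standard rules of element-wise multiplication, such as $0$ being the multiplicative annihilator for any value of $s \in S$

$$s \otimes 0 = 0 \otimes s = 0$$

Furthermore, if element-wise multiplication $\otimes$ and addition $\oplus$ obey the conditions of a semiring [67, 68, 69], then the Kronecker product has many of the same desirable properties, such as associativity

$$(\mathbf{A} \bigotimes \mathbf{B}) \bigotimes \mathbf{C} = \mathbf{A} \bigotimes (\mathbf{B} \bigotimes \mathbf{C})$$

and element-wise distributivity over addition

$$\mathbf{A} \bigotimes (\mathbf{B} \oplus \mathbf{C}) = (\mathbf{A} \bigotimes \mathbf{B}) \oplus (\mathbf{A} \bigotimes \mathbf{C})$$

Finally, one unique feature of the Kronecker product is its relation to the matrix product. Specifically, the matrix product of two Kronecker products is equal to the Kronecker product of two matrix products

$$(\mathbf{A} \bigotimes \mathbf{B})(\mathbf{C} \bigotimes \mathbf{D}) = (\mathbf{A}\mathbf{C}) \bigotimes (\mathbf{B}\mathbf{D})$$

where matrix multiply

$$\mathbf{C} = \mathbf{A}\mathbf{B} = \mathbf{A} \,{\oplus}.{\otimes}\, \mathbf{B}$$

is given by

$$\mathbf{C}(i, j) = \bigoplus_{k} \mathbf{A}(i, k) \otimes \mathbf{B}(k, j)$$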
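The properties listed in this section can be checked numerically on small matrices. The following is a minimal sketch in pure Python, assuming ordinary integer addition and multiplication as the semiring; the helper names `kron`, `matmul`, and `add` are illustrative and not taken from the paper.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    rows_b, cols_b = len(B), len(B[0])
    C = [[0] * (len(A[0]) * cols_b) for _ in range(len(A) * rows_b)]
    for ia, row_a in enumerate(A):
        for ja, a in enumerate(row_a):
            for ib, row_b in enumerate(B):
                for jb, b in enumerate(row_b):
                    # 0-based index map: (ia, ib) -> ia * rows_b + ib
                    C[ia * rows_b + ib][ja * cols_b + jb] = a * b
    return C


def matmul(A, B):
    """Ordinary matrix product with + and * as the semiring operations."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


A = [[0, 1], [1, 1]]
B = [[1, 2], [0, 3]]
C = [[2, 0], [1, 1]]
D = [[1, 1], [4, 0]]

associative = kron(kron(A, B), C) == kron(A, kron(B, C))
distributive = kron(A, add(B, C)) == add(kron(A, B), kron(A, C))
mixed_product = matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D))
print(associative, distributive, mixed_product)
```

The dense list-of-rows layout is chosen only for readability; any sparse representation would obey the same identities.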

III Generating Power-Law Graphs

Generating graphs is a common operation in a wide range of graph algorithms. Graph generation is used in the testing of graph algorithms, in creating graph templates to match against, and for comparing real graph data with models. Given a graph adjacency matrix $\mathbf{A}$, if

$$\mathbf{A}(i, j) = 1$$

then there exists an edge going from vertex $i$ to vertex $j$ [70, 71]. Likewise, if

$$\mathbf{A}(i, j) = 0$$

then there is no edge from $i$ to $j$. The Kronecker product of two graph adjacency matrices is a convenient, well-defined matrix operation that can be used for generating a wide range of graphs from a few parameters [57, 58]. The relation of the Kronecker product to graphs is easily illustrated in the context of bipartite graphs. Bipartite graphs have two sets of vertices, and every vertex has an edge to the other set of vertices but no edges within its own set of vertices. The Kronecker product of such graphs was first looked at by Weichsel [72], who observed that the Kronecker product of two bipartite graphs resulted in a new graph consisting of two bipartite sub-graphs (see Figure 1).

Fig. 1: Kronecker product of the adjacency matrices of two bipartite graphs $\mathbf{B}$ and $\mathbf{C}$ results in a graph with two bipartite sub-graphs. The figure's notation indicates that the adjacency matrix has been permuted so that the two bipartite sub-graphs are more apparent.

The essence of a power-law graph is that it has a degree distribution vector $n(d)$ with non-zero entries that follow the relation

$$n(d) \propto \frac{1}{d^{\alpha}}$$

where $d$ is the number of edges incident on a vertex of the graph (its degree), $n(d)$ is the number of vertices with a specific degree $d$, and $\alpha$ is the magnitude of the (negative) slope of the power law when it is plotted using logarithmic axes [65]. If the graph is represented as an adjacency matrix, then the degree of a vertex is the number of non-zero (nnz) entries in the corresponding row (or, for an undirected graph, equivalently the corresponding column) of the matrix.

A star graph is a bipartite graph where one set has only one vertex. A star graph is always a power-law graph. If a star graph has $m$ vertices, then the number of points in the star is given by

$$\hat{m} = m - 1$$

with a corresponding degree distribution of

$$n(1) = \hat{m} \qquad n(\hat{m}) = 1$$

which agrees with the power-law relation given by

$$n(d) = \frac{\hat{m}}{d}$$

where $\hat{m}$ is the degree of the vertex with the most edges.

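The Kronecker product of two star graphs can, under certain conditions, produce another power-law graph. In Figure 1, the graph of the Kronecker product of two star graphs with $\hat{m}_B$ and $\hat{m}_C$ points (where $\hat{m}_B \neq \hat{m}_C$ and both exceed 1) has a degree distribution of

$$n(1) = \hat{m}_B \hat{m}_C \qquad n(\hat{m}_C) = \hat{m}_B \qquad n(\hat{m}_B) = \hat{m}_C \qquad n(\hat{m}_B \hat{m}_C) = 1$$

which are all points on the curve

$$n(d) = \frac{\hat{m}_B \hat{m}_C}{d}$$

The Kronecker product of star graphs can be used to build up extremely large power-law graphs. The degree distributions will follow the power-law relation as long as all of the products of the corresponding $\hat{m}_k$ are unique.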
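The power-law behavior of star-graph products can be confirmed by forming a small product explicitly. The sketch below assumes two stars with 5 and 3 points; these sizes are illustrative choices, not values taken from the text, and `star`, `kron`, and `degree_distribution` are hypothetical helper names.

```python
from collections import Counter


def star(m_hat):
    """Adjacency matrix of a star: vertex 0 is the center, 1..m_hat are points."""
    n = m_hat + 1
    A = [[0] * n for _ in range(n)]
    for point in range(1, n):
        A[0][point] = A[point][0] = 1
    return A


def kron(A, B):
    """Kronecker product of two square matrices stored as lists of rows."""
    nb = len(B)
    return [[A[i // nb][j // nb] * B[i % nb][j % nb]
             for j in range(len(A) * nb)] for i in range(len(A) * nb)]


def degree_distribution(A):
    """Map each degree d to n(d), counting non-zero entries per row."""
    return Counter(sum(1 for x in row if x) for row in A)


n_d = degree_distribution(kron(star(5), star(3)))
print(sorted(n_d.items()))  # [(1, 15), (3, 5), (5, 3), (15, 1)]

# Every point lies on the curve n(d) = 15 / d.
on_curve = all(d * count == 15 for d, count in n_d.items())
print(on_curve)  # True
```

Choosing sizes whose products collide (for example 2 and 4 together with 8) would merge degree bins and move points off the curve, which is why the uniqueness condition matters.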

It is worth noting that real-world graphs often have approximate power-law distributions when plotted simply, as in this case, or when plotted with logarithmic degree binning, but rarely both. It is possible to use Kronecker products to produce power-law graphs under logarithmic degree binning by placing additional constraints on the values of $\hat{m}_k$.

IV Properties of Kronecker Graphs

The most powerful feature of Kronecker graphs is that many of their properties can be computed from their constituent matrices without ever having to form the full matrix. It is thus possible to design and analyze extremely large graphs quickly and only actually form the full graph when it is needed.

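Let the adjacency matrix $\mathbf{A}$ of a graph be constructed by the following Kronecker product

$$\mathbf{A} = \mathbf{A}_1 \bigotimes \mathbf{A}_2 \bigotimes \cdots \bigotimes \mathbf{A}_K = \bigotimes_{k=1}^{K} \mathbf{A}_k$$

where the $\mathbf{A}_k$ are each adjacency matrices of the smaller constituent graphs. The number of vertices in the graph is equal to the number of rows $N_V$ in $\mathbf{A}$ (or columns, since the $\mathbf{A}_k$ are square), which can be computed from

$$N_V(\mathbf{A}) = \prod_{k=1}^{K} N_V(\mathbf{A}_k)$$

Likewise, the number of edges in the graph is equal to the number of non-zero entries in $\mathbf{A}$ and is given by

$$\mathrm{nnz}(\mathbf{A}) = \prod_{k=1}^{K} \mathrm{nnz}(\mathbf{A}_k)$$

The degree distribution can be computed from the Kronecker product of the degree distributions, in which every combination of constituent degrees $d_1, \ldots, d_K$ contributes $\prod_k n_{\mathbf{A}_k}(d_k)$ vertices of degree $\prod_k d_k$

$$n_{\mathbf{A}}(d) = \sum_{d_1 d_2 \cdots d_K = d} \ \prod_{k=1}^{K} n_{\mathbf{A}_k}(d_k)$$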
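These product rules mean the properties of a very large graph can be obtained from a handful of integers. A minimal sketch for products of plain star graphs follows, assuming each star is described only by its number of points; the function names are hypothetical, and the edge count follows the text's convention of counting non-zero matrix entries.

```python
from collections import Counter
from math import prod


def star_distribution(m_hat):
    """Degree distribution of a star with m_hat points."""
    dist = Counter()
    dist[1] += m_hat   # each point has degree 1
    dist[m_hat] += 1   # the center has degree m_hat
    return dist


def kron_degree_distribution(dists):
    """Combine distributions: degrees multiply, vertex counts multiply."""
    total = Counter({1: 1})
    for dist in dists:
        combined = Counter()
        for d1, n1 in total.items():
            for d2, n2 in dist.items():
                combined[d1 * d2] += n1 * n2
        total = combined
    return total


def kron_star_properties(m_hats):
    """Vertices, non-zeros, and degree distribution without forming the graph."""
    n_vertices = prod(m + 1 for m in m_hats)
    n_edges = prod(2 * m for m in m_hats)
    dist = kron_degree_distribution(star_distribution(m) for m in m_hats)
    return n_vertices, n_edges, dist


n_vertices, n_edges, dist = kron_star_properties([5, 3])
print(n_vertices, n_edges, sorted(dist.items()))
# 24 60 [(1, 15), (3, 5), (5, 3), (15, 1)]
```

Because Python integers have arbitrary precision, the same functions return exact counts for products far too large to store.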

IV-A Triangles

The number of vertices, number of edges, and degree distribution are good examples of the core properties of Kronecker products. A more sophisticated example is computing the number of triangles in a graph [73, 74, 75, 76]. Triangles are an important feature of a graph, and counting triangles is a basic property of many graph analysis systems. The total number of triangles in a graph can be computed from the following formula

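$$N_{\mathrm{tri}} = \frac{1}{6}\, \mathbf{1}^{\mathsf{T}} (\mathbf{A}\mathbf{A} \otimes \mathbf{A})\, \mathbf{1}$$

where $\mathbf{1}$ is a column vector of all 1’s and $\otimes$ is the element-wise product. The formula applies to an undirected graph without self-loops. The same properties of Kronecker products apply to counting triangles, and the number of triangles can be computed from the component matrices $\mathbf{A}_k$ via

$$N_{\mathrm{tri}} = \frac{1}{6} \prod_{k=1}^{K} \mathbf{1}^{\mathsf{T}} (\mathbf{A}_k \mathbf{A}_k \otimes \mathbf{A}_k)\, \mathbf{1}$$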
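The factorization of the triangle count can be illustrated on a product of two small complete graphs, which contain triangles and have no self-loops. This is a sketch under those assumptions; the helper names are hypothetical and the brute-force counter is included only as an independent check.

```python
def matmul(A, B):
    """Ordinary matrix product on lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    """Kronecker product of two square matrices stored as lists of rows."""
    nb = len(B)
    return [[A[i // nb][j // nb] * B[i % nb][j % nb]
             for j in range(len(A) * nb)] for i in range(len(A) * nb)]


def triangle_sum(A):
    """Sum of all entries of (A A) multiplied element-wise by A."""
    AA = matmul(A, A)
    n = len(A)
    return sum(AA[i][j] * A[i][j] for i in range(n) for j in range(n))


def count_triangles_brute_force(A):
    """Count vertex triples that are mutually adjacent."""
    n = len(A)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               for k in range(j + 1, n) if A[i][j] and A[j][k] and A[i][k])


def complete(n):
    """Adjacency matrix of the complete graph on n vertices."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


K3, K4 = complete(3), complete(4)
from_factors = triangle_sum(K3) * triangle_sum(K4) // 6
from_product = triangle_sum(kron(K3, K4)) // 6
print(from_factors, from_product, count_triangles_brute_force(kron(K3, K4)))
# 24 24 24
```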

Fig. 2: (top) Kronecker product of two star graphs with self-loops on the central vertex. The resulting graph has 15 triangles. (bottom) Kronecker product of two star graphs with self-loops on a leaf node. The resulting graph has 1 triangle.

IV-B Case 1: Many Triangles

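Bipartite graphs have no triangles, so the Kronecker product of star graphs will produce a large graph with zero triangles, which can be a useful test case. Fortunately, it is possible to simply modify the $\mathbf{A}_k$ to create a graph with a rich triangle structure. Specifically, if a self-loop is put on the central vertex of the star, the resulting graph will have a large number of triangles. If the central vertex in the star is denoted by vertex 1, then a self-loop can be created in every constituent graph by setting

$$\mathbf{A}_k(1, 1) = 1$$

Removal of the self-loop in the final graph is accomplished by setting a single value back to zero

$$\mathbf{A}(1, 1) = 0$$

The number of vertices is unmodified by the inclusion of the self-loops. The number of edges is computed from the $\mathbf{A}_k$ as before, followed by subtracting 1 from the total to account for the removal of the self-loop

$$\mathrm{nnz}(\mathbf{A}) = \prod_{k=1}^{K} \mathrm{nnz}(\mathbf{A}_k) - 1$$

Likewise, the degree distribution is computed from the $\mathbf{A}_k$ as before with the following adjustments, where $d_1$ is the degree of vertex 1 before its self-loop is removed (the product over $k$ of the number of non-zero entries in row 1 of $\mathbf{A}_k$)

$$n(d_1) \leftarrow n(d_1) - 1 \qquad n(d_1 - 1) \leftarrow n(d_1 - 1) + 1$$

The triangle count is computed from the $\mathbf{A}_k$ as before with the following correction, which removes the closed walks of length three that pass through the self-loop

$$N_{\mathrm{tri}} = \frac{1}{6} \left[ \prod_{k=1}^{K} \mathbf{1}^{\mathsf{T}} (\mathbf{A}_k \mathbf{A}_k \otimes \mathbf{A}_k)\, \mathbf{1} - 3 d_1 + 2 \right]$$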

Figure 2 (top) shows an example of a graph with 15 triangles produced using this method.
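The corrections for Case 1 can be checked against an explicitly formed graph. The sketch below assumes two stars with 5 and 3 points, sizes chosen here because they reproduce the 15 triangles quoted for Figure 2 (top); the closed forms `2m + 1` and `3m + 1` are the non-zero count and the triangle sum of a single star with a central self-loop, derived for this example rather than quoted from the text.

```python
from math import prod


def star_center_loop(m_hat):
    """Star with m_hat points and a self-loop on the center (vertex 0)."""
    n = m_hat + 1
    A = [[0] * n for _ in range(n)]
    A[0][0] = 1
    for point in range(1, n):
        A[0][point] = A[point][0] = 1
    return A


def kron(A, B):
    """Kronecker product of two square matrices stored as lists of rows."""
    nb = len(B)
    return [[A[i // nb][j // nb] * B[i % nb][j % nb]
             for j in range(len(A) * nb)] for i in range(len(A) * nb)]


def predicted(m_hats):
    """Edges and triangles from the constituent star sizes alone."""
    n_edges = prod(2 * m + 1 for m in m_hats) - 1
    d1 = prod(m + 1 for m in m_hats)           # degree of vertex 1 with its loop
    walks = prod(3 * m + 1 for m in m_hats)    # product of per-star triangle sums
    return n_edges, (walks - 3 * d1 + 2) // 6


def measured(m_hats):
    """Form the graph, remove the single self-loop, and count directly."""
    A = [[1]]
    for m in m_hats:
        A = kron(A, star_center_loop(m))
    A[0][0] = 0
    n = len(A)
    n_edges = sum(1 for row in A for x in row if x)
    n_tri = sum(1 for i in range(n) for j in range(i + 1, n)
                for k in range(j + 1, n) if A[i][j] and A[j][k] and A[i][k])
    return n_edges, n_tri


print(predicted([5, 3]), measured([5, 3]))  # (76, 15) (76, 15)
```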

IV-C Case 2: Some Triangles

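A more modest number of triangles can be generated if one self-loop is put on one of the point vertices of each star, for example by setting

$$\mathbf{A}_k(m_k, m_k) = 1$$

where $m_k$ is the number of vertices in star $k$, so that the self-loop sits on the last vertex. Removal of the self-loop in the final graph is accomplished by setting a single value back to zero

$$\mathbf{A}(N_V, N_V) = 0$$

The number of vertices is unmodified by the inclusion of the self-loops. The number of edges is computed from the $\mathbf{A}_k$ as before, followed by subtracting 1 from the total to account for the removal of the self-loop

$$\mathrm{nnz}(\mathbf{A}) = \prod_{k=1}^{K} \mathrm{nnz}(\mathbf{A}_k) - 1$$

Likewise, the degree distribution is computed from the $\mathbf{A}_k$ as before with the following adjustments, where $d_{N_V} = 2^K$ is the degree of the last vertex before its self-loop is removed

$$n(d_{N_V}) \leftarrow n(d_{N_V}) - 1 \qquad n(d_{N_V} - 1) \leftarrow n(d_{N_V} - 1) + 1$$

The triangle count is computed from the $\mathbf{A}_k$ as before with the following correction

$$N_{\mathrm{tri}} = \frac{1}{6} \left[ \prod_{k=1}^{K} \mathbf{1}^{\mathsf{T}} (\mathbf{A}_k \mathbf{A}_k \otimes \mathbf{A}_k)\, \mathbf{1} - 3 d_{N_V} + 2 \right]$$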

Figure 2 (bottom) shows an example of a graph with 1 triangle produced using this method.
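For Case 2 the triangle count depends only on the number of stars, because a star with a self-loop on one point has a triangle sum of 4 regardless of its size. The sketch below works under that derived assumption and compares it with an explicitly formed graph; the star sizes 5 and 3 are illustrative, and the value of 15 stars for the largest example is inferred here from the triangle count reported in the results, not stated in the text.

```python
from math import prod


def star_leaf_loop(m_hat):
    """Star with m_hat points and a self-loop on the last point."""
    n = m_hat + 1
    A = [[0] * n for _ in range(n)]
    for point in range(1, n):
        A[0][point] = A[point][0] = 1
    A[n - 1][n - 1] = 1
    return A


def kron(A, B):
    """Kronecker product of two square matrices stored as lists of rows."""
    nb = len(B)
    return [[A[i // nb][j // nb] * B[i % nb][j % nb]
             for j in range(len(A) * nb)] for i in range(len(A) * nb)]


def predicted_triangles(n_stars):
    """Triangles after removing the one self-loop: (4^K - 3 * 2^K + 2) / 6."""
    return (4 ** n_stars - 3 * 2 ** n_stars + 2) // 6


def predicted_edges(m_hats):
    """Non-zero entries after removing the one self-loop."""
    return prod(2 * m + 1 for m in m_hats) - 1


def measured(m_hats):
    """Form the graph, remove the self-loop on the last vertex, count directly."""
    A = [[1]]
    for m in m_hats:
        A = kron(A, star_leaf_loop(m))
    A[-1][-1] = 0
    n = len(A)
    n_edges = sum(1 for row in A for x in row if x)
    n_tri = sum(1 for i in range(n) for j in range(i + 1, n)
                for k in range(j + 1, n) if A[i][j] and A[j][k] and A[i][k])
    return n_edges, n_tri


print(predicted_edges([5, 3]), predicted_triangles(2), measured([5, 3]))
# 76 1 (76, 1)
print(predicted_triangles(15))  # 178940587
```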

IV-D Incidence Matrix

An incidence, or edge, matrix uses the rows to represent every edge in the graph, and the columns represent every vertex. There are a number of conventions for denoting an edge in an incidence matrix. One such convention is to use two incidence matrices

$$\mathbf{E}_{\mathrm{out}}(e, i) = 1 \qquad \mathbf{E}_{\mathrm{in}}(e, j) = 1$$

to indicate that edge $e$ is a connection from $i$ to $j$. Incidence matrices are useful because they can easily represent multi-graphs and hyper-graphs. These complex graphs are difficult to capture with an adjacency matrix. One of the most common uses of matrix multiplication is to construct an adjacency matrix from an incidence matrix representation of a graph. For a graph with out-vertex incidence matrix $\mathbf{E}_{\mathrm{out}}$ and in-vertex incidence matrix $\mathbf{E}_{\mathrm{in}}$, the corresponding adjacency matrix is [77, 69]

$$\mathbf{A} = \mathbf{E}_{\mathrm{out}}^{\mathsf{T}} \mathbf{E}_{\mathrm{in}}$$

Kronecker products can also be used to construct incidence matrices that satisfy the above adjacency matrix equation. Specifically, let $\mathbf{E}_{k,\mathrm{out}}$ and $\mathbf{E}_{k,\mathrm{in}}$ be incidence matrices corresponding to $\mathbf{A}_k$. The incidence matrices can then be constructed by

$$\mathbf{E}_{\mathrm{out}} = \bigotimes_{k=1}^{K} \mathbf{E}_{k,\mathrm{out}}$$

and

$$\mathbf{E}_{\mathrm{in}} = \bigotimes_{k=1}^{K} \mathbf{E}_{k,\mathrm{in}}$$

It is worth noting that the order of edges in the incidence matrices is not uniquely determined. Different realizations of an incidence matrix are only equivalent when comparing their resulting adjacency matrices.
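The incidence-matrix construction can be verified on two small stars. In this sketch the edge ordering (row-major over the non-zero entries) and the helper names are assumptions made for illustration; as noted above, any other edge ordering would give a different incidence matrix but the same adjacency matrix.

```python
def kron(A, B):
    """Kronecker product of two (possibly rectangular) matrices."""
    rb, cb = len(B), len(B[0])
    return [[A[i // rb][j // cb] * B[i % rb][j % cb]
             for j in range(len(A[0]) * cb)] for i in range(len(A) * rb)]


def matmul(A, B):
    """Ordinary matrix product on lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def star(m_hat):
    """Adjacency matrix of a star with m_hat points."""
    n = m_hat + 1
    A = [[0] * n for _ in range(n)]
    for point in range(1, n):
        A[0][point] = A[point][0] = 1
    return A


def incidence(A):
    """Out- and in-vertex incidence matrices, one row per non-zero of A."""
    n = len(A)
    edges = [(i, j) for i in range(n) for j in range(n) if A[i][j]]
    E_out = [[1 if v == i else 0 for v in range(n)] for i, _ in edges]
    E_in = [[1 if v == j else 0 for v in range(n)] for _, j in edges]
    return E_out, E_in


B, C = star(2), star(3)
B_out, B_in = incidence(B)
C_out, C_in = incidence(C)

# Incidence matrices of the product graph, built only from the factors.
E_out, E_in = kron(B_out, C_out), kron(B_in, C_in)
same_adjacency = matmul(transpose(E_out), E_in) == kron(B, C)
print(len(E_out), same_adjacency)  # 24 True
```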

V Parallel Generation

Kronecker products allow the properties of a graph to be determined in advance, thus avoiding the iterative approach of other methods. Once the desired graph properties have been determined, Kronecker products also allow large graphs to be generated quickly on a parallel processor. The overall approach is to split the constituent matrices $\mathbf{A}_k$ into two groups whose Kronecker products form two matrices $\mathbf{B}$ and $\mathbf{C}$

$$\mathbf{A} = \mathbf{B} \bigotimes \mathbf{C}$$

The matrices $\mathbf{B}$ and $\mathbf{C}$ are designed so that both can fit in the memory of any one processor. Let the parallel computer have $N_P$ processors, and let each processor be given an identifier $p$ [78, 79]. Each processor reads in $\mathbf{B}$ and $\mathbf{C}$ and extracts the triples of the non-zero elements of $\mathbf{B}$ into three vectors $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{v}$, each of length $\mathrm{nnz}(\mathbf{B})$. Each processor then selects its own subset of the triples, $\mathbf{i}_p$, $\mathbf{j}_p$, and $\mathbf{v}_p$. If the underlying sparse storage of the matrices is compressed sparse columns (CSC), then the minimum value of $\mathbf{j}_p$ is subtracted from $\mathbf{j}_p$ and a new matrix $\mathbf{B}_p$ is formed from these triples. Each processor can then form the submatrix $\mathbf{A}_p$ of the overall matrix via the Kronecker product

$$\mathbf{A}_p = \mathbf{B}_p \bigotimes \mathbf{C}$$

The resulting matrices will have the same number of non-zero entries on each processor. In addition, the resulting graph is free of many of the problematic vertices and edges, such as empty vertices and self-loops, that are found in randomly generated graphs. These problematic vertices and edges often require randomly generated graphs to be reindexed before their properties can be computed.
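The communication-free structure of this scheme can be sketched by simulating the processors one after another. The code below is an illustration under several assumptions: the function names are hypothetical, each processor takes a contiguous block of the column-ordered triples of the first factor, and global indices are kept (rather than subtracting the minimum column index) so that the pieces can be compared against the full product.

```python
def star(m_hat):
    """Adjacency matrix of a star with m_hat points."""
    n = m_hat + 1
    A = [[0] * n for _ in range(n)]
    for point in range(1, n):
        A[0][point] = A[point][0] = 1
    return A


def triples(A):
    """Non-zero (row, column, value) triples in column-major order."""
    return [(i, j, A[i][j]) for j in range(len(A[0]))
            for i in range(len(A)) if A[i][j]]


def generate_block(p, n_procs, B, C):
    """Triples of the part of the product owned by processor p.

    Uses only p, n_procs, B, and C, so no data from other processors is needed.
    """
    b_triples = triples(B)
    lo = p * len(b_triples) // n_procs
    hi = (p + 1) * len(b_triples) // n_procs
    n_rows_c, n_cols_c = len(C), len(C[0])
    return [(ib * n_rows_c + ic, jb * n_cols_c + jc, vb * vc)
            for ib, jb, vb in b_triples[lo:hi]
            for ic, jc, vc in triples(C)]


def kron(A, B):
    """Dense Kronecker product, used here only as a reference."""
    nb = len(B)
    return [[A[i // nb][j // nb] * B[i % nb][j % nb]
             for j in range(len(A) * nb)] for i in range(len(A) * nb)]


B, C, n_procs = star(5), star(3), 5
blocks = [generate_block(p, n_procs, B, C) for p in range(n_procs)]

sizes = [len(block) for block in blocks]
combined = sorted(t for block in blocks for t in block)
print(sizes, combined == sorted(triples(kron(B, C))))
# [12, 12, 12, 12, 12] True
```

The per-processor counts are equal in this sketch because the number of non-zeros in the first factor (10) is divisible by the number of simulated processors (5).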

VI Results

This section presents a variety of scalability results to demonstrate the properties of the proposed Kronecker graph generation method. Figure 3 shows the rate of graph edge generation as a function of the number of processing cores used in the parallel graph generation technique described in the previous section. In this example, $\mathbf{B}$ is a 530,400 vertex graph with 13,824,000 edges constructed from the Kronecker product of star graphs. Likewise, $\mathbf{C}$ is a 21,074 vertex graph with 82,944 edges constructed from the Kronecker product of star graphs. The Kronecker product of $\mathbf{B}$ and $\mathbf{C}$ produces a graph with 11,177,649,600 vertices, 1,146,617,856,000 edges, and zero triangles. This graph construction was run in parallel on a supercomputer consisting of 648 compute nodes, each with at least 64 Xeon processing cores, for a total of 41,472 processing cores. Using the entire system, the trillion edge graph was generated in 1 second.
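The vertex and edge counts quoted for this example can be reproduced from a short list of star sizes. The sizes used below are an assumption: they are one choice that yields exactly the quoted totals, and they are not taken from the text.

```python
from math import prod

# Assumed numbers of points per star (hypothetical; chosen to match the totals).
B_STARS = [3, 4, 5, 9, 16, 25]
C_STARS = [81, 256]


def n_vertices(m_hats):
    """Each star with m_hat points has m_hat + 1 vertices."""
    return prod(m + 1 for m in m_hats)


def n_edges(m_hats):
    """Each star with m_hat points has 2 * m_hat non-zero entries."""
    return prod(2 * m for m in m_hats)


print(n_vertices(B_STARS), n_edges(B_STARS))  # 530400 13824000
print(n_vertices(C_STARS), n_edges(C_STARS))  # 21074 82944
print(n_vertices(B_STARS + C_STARS))          # 11177649600
print(n_edges(B_STARS + C_STARS))             # 1146617856000
```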

Fig. 3: Edge generation rate vs. number of processor cores. Performance scales linearly with processor cores and achieves a peak rate of over 1 trillion edges generated per second on over 40,000 processor cores.

Computing the degree distribution of the generated graph can be used to verify that a generated graph agrees with the theory. Figure 4 shows the measured and predicted degree distribution of a graph produced using the parallel graph generation technique. In this example, $\mathbf{B}$ is a 530,400 vertex graph with 22,160,060 edges constructed from the Kronecker product of star graphs with self-loops on the central vertices of the stars. Likewise, $\mathbf{C}$ is a 21,074 vertex graph with 83,618 edges constructed from the Kronecker product of star graphs with self-loops on the central vertices of the stars. The Kronecker product of $\mathbf{B}$ and $\mathbf{C}$ produces a graph with 11,177,649,600 vertices, 1,853,002,140,758 edges, and 6,777,007,252,427 triangles. This calculation confirms that the predicted and measured graph are in exact agreement.
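The edge and triangle totals for the self-loop example can likewise be reproduced with exact integer arithmetic. The star sizes below are an assumption (one choice that matches the quoted totals), and the per-star expressions `2m + 1` and `3m + 1` are derived for a star with a central self-loop rather than quoted from the text. The per-factor edge counts in the text are one less than the loop-including non-zero counts computed here, which this sketch takes to mean they were reported after removing each factor's own self-loop.

```python
from math import prod

# Assumed numbers of points per star (hypothetical; chosen to match the totals).
B_STARS = [3, 4, 5, 9, 16, 25]
C_STARS = [81, 256]


def nnz_with_loops(m_hats):
    """Non-zero entries of a product of stars with central self-loops."""
    return prod(2 * m + 1 for m in m_hats)


def edges_after_loop_removal(m_hats):
    """Final edge count once the single remaining self-loop is removed."""
    return nnz_with_loops(m_hats) - 1


def triangles_after_loop_removal(m_hats):
    """Triangle count with the correction for the removed self-loop."""
    walks = prod(3 * m + 1 for m in m_hats)
    center_degree = prod(m + 1 for m in m_hats)
    return (walks - 3 * center_degree + 2) // 6


all_stars = B_STARS + C_STARS
print(nnz_with_loops(B_STARS), nnz_with_loops(C_STARS))  # 22160061 83619
print(edges_after_loop_removal(all_stars))               # 1853002140758
print(triangles_after_loop_removal(all_stars))           # 6777007252427
```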

Fig. 4: Trillion-edge ($10^{12}$) power-law Kronecker graph showing the exact agreement between the predicted and measured degree distribution. The resulting graph has exactly 11,177,649,600 vertices, 1,853,002,140,758 edges, and 6,777,007,252,427 triangles.

Kronecker products can allow the rapid design of very large graphs suitable for the world’s largest computers. Figures 5 and 6 show the degree distribution for two graphs with over $10^{15}$ edges. Both graphs are generated from star graphs and have 6,997,208,649,600 vertices. The graph in Figure 5 has 1,433,272,320,000,000 edges and zero triangles, and its degree distribution exactly follows the power-law degree formula. The graph in Figure 6 is generated with self-loops on the central vertices, producing 2,318,105,678,089,508 edges and 12,720,651,636,552,426 triangles, with a degree distribution that follows the power-law degree formula with small deviations above and below the line.

Fig. 5: Quadrillion-edge ($10^{15}$) power-law Kronecker graph predicted degree distribution. The resulting graph has exactly 6,997,208,649,600 vertices, 1,433,272,320,000,000 edges, and zero triangles.
Fig. 6: Quadrillion-edge ($10^{15}$) power-law Kronecker graph predicted degree distribution. The resulting graph has exactly 6,997,208,649,600 vertices, 2,318,105,678,089,508 edges, and 12,720,651,636,552,426 triangles.

Kronecker products can also enable the exact analysis of graphs that are far beyond the scale of any current or planned computing system. Figure 7 shows the degree distribution of a graph with over $10^{30}$ edges. The graph was generated from star graphs with a self-loop on one point vertex of each star. The resulting graph has exactly 144,111,718,793,178,936,483,840,000 vertices, 2,705,963,586,782,877,716,483,871,216,764 edges, and 178,940,587 triangles. Most of the points follow the power-law degree line, but there are many points that deviate from this. This degree distribution was computed on a standard laptop computer in a few minutes.

Fig. 7: Predicted degree distribution of a decetta-edge ($10^{30}$) power-law Kronecker graph. The resulting graph is predicted to have exactly 144,111,718,793,178,936,483,840,000 vertices, 2,705,963,586,782,877,716,483,871,216,764 edges, and 178,940,587 triangles.

VII Conclusion

Emerging data in metagenomics, brain mapping, Internet-of-things, cybersecurity, and sparse machine learning produce massive power-law graphs and are driving the development of novel algorithms and systems to process these data. The scale and distribution of these data make validation of graph processing systems a significant challenge. The ability to create enormous graphs with exactly known properties can significantly accelerate the design, generation, and validation of new graph processing systems. Many current graph generators produce random graphs whose exact properties, such as number of vertices, number of edges, degree distribution, and number of triangles, can only be computed after the graph has been generated. Thus, designing graphs using these random graph generators is a time-consuming trial-and-error process.

Kronecker products of the adjacency matrices of star graphs are a powerful way to create large power-law graphs. The properties of Kronecker products allow many properties of a larger graph to be computed by simply combining the corresponding properties of the constituent matrices. The ability to compute the properties of large graphs using only small graphs allows the graph designer to find these properties prior to creating the actual graph. Furthermore, real graphs can be created using Kronecker products on a parallel computer with no interprocessor communication. The resulting graphs will have the same number of edges on each processor. In addition, the graph avoids many of the difficulties, such as empty vertices and self-loops, that are found in other graph generators that rely on random sampling. These problematic vertices and edges often require randomly generated graphs to be reindexed before their properties can be computed.

To test this approach, graphs with $10^{12}$ edges are generated on a 40,000+ core supercomputer in 1 second and exactly agree with those predicted by the theory. In addition, in order to demonstrate the extensibility of this approach, decetta-scale graphs with up to $10^{30}$ edges are simulated in a few minutes on a laptop. These results indicate that the proposed method can be a powerful tool for enabling the design, generation, and validation of new graph processing systems.

This paper has presented formulas for a number of properties of Kronecker graphs. There are many additional properties that could be computed in future research, such as eigenvectors, isoperimetric ratios, betweenness centrality, and triangle enumeration. The parallel Kronecker graph generator is ideally suited to the GraphBLAS.org software standard, and the creation of a high performance version using this standard is a future goal. Finally, the ability to reason about graphs that are beyond any current or planned computer opens up new possibilities for the theoretical study of phenomena on these large graphs.

Acknowledgments

The authors wish to acknowledge the following individuals for their contributions and support: Alan Edelman, Charles Leiserson, Steve Pritchard, Michael Wright, Bob Bond, Dave Martinez, Sterling Foster, Paul Burkhardt, and Victor Roytburd.

References

  • [1] V. Pareto, Manuale di Economia Politica.   Societa Editrice, 1906, vol. 13.
  • [2] G. K. Zipf, The Psycho-Biology of Language.   Houghton-Mifflin, 1935.
  • [3] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999.
  • [4] D. F. Gleich, “PageRank beyond the web,” SIAM Review, vol. 57, no. 3, pp. 321–363, 2015.
  • [5] J. L. Morrison, R. Breitling, D. J. Higham, and D. R. Gilbert, “GeneRank: using search engine technology for the analysis of microarray experiments,” BMC Bioinformatics, vol. 6, no. 1, p. 233, 2005.
  • [6] B. L. Mooney, L. R. Corrales, and A. E. Clark, “MoleculaRnetworks: An integrated graph theoretic and data mining tool to explore solvent organization in molecular simulation,” Journal of Computational Chemistry, vol. 33, no. 8, pp. 853–860, 2012.
  • [7] D. Polychronopoulos, D. Sellis, and Y. Almirantis, “Conserved noncoding elements follow power-law-like distributions in several genomes as a result of genome dynamics,” PloS one, vol. 9, no. 5, p. e95437, 2014.
  • [8] S. Dodson, D. O. Ricke, and J. Kepner, “Genetic sequence matching using D4M big data approaches,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2014.
  • [9] S. Dodson, D. O. Ricke, J. Kepner, N. Chiu, and A. Shcherbina, “Rapid sequence identification of potential pathogens using techniques from sparse linear algebra,” in Symposium on Technologies for Homeland Security.   IEEE, 2015.
  • [10] N. Gouda, Y. Shiwa, M. Akashi, H. Yoshikawa, K. Kasahara, and M. Furusawa, “Distribution of human single-nucleotide polymorphisms is approximated by the power law and represents a fractal structure,” Genes to Cells, vol. 21, no. 5, pp. 396–407, 2016.
  • [11] A. Fornito, “Graph theoretic analysis of human brain networks,” fMRI Techniques and Protocols, pp. 283–314, 2016.
  • [12] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Computer Networks and ISDN Systems, vol. 30, no. 1, pp. 107–117, 1998.
  • [13] M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships of the internet topology,” in ACM SIGCOMM Computer Communication Review, vol. 29.4.   ACM, 1999, pp. 251–262.
  • [14] G. Yan, G. Tsekenis, B. Barzel, J.-J. Slotine, Y.-Y. Liu, and A.-L. Barabási, “Spectrum of controlling and observing complex networks,” Nature Physics, vol. 11, no. 9, pp. 779–786, 2015.
  • [15] R. Fontugne, P. Abry, K. Fukuda, D. Veitch, K. Cho, P. Borgnat, and H. Wendt, “Scaling in internet traffic: a 14 year and 3 day longitudinal study, with multiscale analyses and random projections,” IEEE/ACM Transactions on Networking, 2017.
  • [16] M. Zuckerberg, “Facebook and computer science,” Harvard University CS50 guest lecture, Dec. 7 2005.
  • [17] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in Proceedings of the 19th International Conference on World Wide Web.   ACM, 2010, pp. 591–600.
  • [18] S. Shao, X. Huang, H. E. Stanley, and S. Havlin, “Percolation of localized attack on complex networks,” New Journal of Physics, vol. 17, no. 2, p. 023049, 2015.
  • [19] S. Yu, G. Gu, A. Barnawi, S. Guo, and I. Stojmenovic, “Malware propagation in large-scale networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 1, pp. 170–179, 2015.
  • [20] H. Lee, C. Ekanadham, and A. Y. Ng, “Sparse deep belief net model for visual area v2,” in Advances in neural information processing systems, 2008, pp. 873–880.
  • [21] M. Ranzato, Y.-l. Boureau, and Y. L. Cun, “Sparse feature learning for deep belief networks,” in Advances in neural information processing systems, 2008, pp. 1185–1192.
  • [22] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks.” in Aistats, vol. 15, no. 106, 2011, p. 275.
  • [23] D. Yu, F. Seide, G. Li, and L. Deng, “Exploiting sparseness in deep neural networks for large vocabulary speech recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on.   IEEE, 2012, pp. 4409–4412.
  • [24] J. Kepner, M. Kumar, J. Moreira, P. Pattnaik, M. Serrano, and H. Tufo, “Enabling massive deep neural networks with the GraphBLAS,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [25] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms.   Cambridge: MIT Press, 2009.
  • [26] B. A. Miller, N. Arcolano, M. S. Beard, J. Kepner, M. C. Schmidt, N. T. Bliss, and P. J. Wolfe, “A scalable signal processing architecture for massive graph analysis,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on.   IEEE, 2012, pp. 5329–5332.
  • [27] A. Buluç, G. Ballard, J. Demmel, J. Gilbert, L. Grigori, B. Lipshitz, A. Lugowski, O. Schwartz, E. Solomonik, and S. Toledo, “Communication-avoiding linear-algebraic primitives for graph analytics,” in International Parallel and Distributed Processing Symposium Workshops (IPDPSW).   IEEE, 2014.
  • [28] C. Voegele, Y.-S. Lu, S. Pai, and K. Pingali, “Parallel triangle counting and k-truss identification using graph-centric methods,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [29] S. Smith, X. Liu, N. K. Ahmed, A. S. Tom, F. Petrini, and G. Karypis, “Truss decomposition on shared-memory parallel systems,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [30] Y. Hu, P. Kumar, G. Swope, and H. H. Huang, “Trix: Triangle counting at extreme scale,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [31] T. La Fond, G. Sanders, C. Klymko et al., “An ensemble framework for detecting community changes in dynamic networks,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [32] D. Zhuzhunashvili and A. Knyazev, “Preconditioned spectral clustering for stochastic block partition streaming graph challenge (preliminary version at arxiv.),” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [33] T. M. Low, V. N. Rao, M. Lee, D. Popovici, F. Franchetti, and S. McMillan, “First look: Linear algebra-based triangle counting without matrix multiplication,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [34] A. J. Uppal, G. Swope, and H. H. Huang, “Scalable stochastic block partition,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [35] S. Mowlaei, “Triangle counting via vectorized set intersection,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [36] A. Buluç and J. R. Gilbert, “The combinatorial BLAS: Design, implementation, and applications,” The International Journal of High Performance Computing Applications, vol. 25, no. 4, pp. 496–509, 2011.
  • [37] J. Kepner, W. Arcand, W. Bergeron, N. Bliss, R. Bond, C. Byun, G. Condon, K. Gregson, M. Hubbell, J. Kurz, A. McCabe, P. Michaleas, A. Prout, A. Reuther, A. Rosa, and C. Yee, “Dynamic Distributed Dimensional Data Model (D4M) database and computation system,” in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2012, pp. 5349–5352.
  • [38] R. Pearce, “Triangle counting for scale-free graphs at scale in distributed memory,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [39] M. Halappanavar, H. Lu, A. Kalyanaraman, and A. Tumeo, “Scalable static and dynamic community detection using grappolo,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [40] A. S. Tom, N. Sundaram, N. K. Ahmed, S. Smith, S. Eyerman, M. Kodiyath, I. Hur, F. Petrini, and G. Karypis, “Exploring optimizations on shared-memory platforms for parallel triangle counting algorithms,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [41] O. Green, J. Fox, E. Kim, F. Busato, N. Bombieri, K. Lakhotia, S. Zhou, S. Singapura, H. Zeng, R. Kannan et al., “Quickly finding a truss in a haystack,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [42] H. Kabir and K. Madduri, “Parallel k-truss decomposition on multicore systems,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [43] S. Zhou, K. Lakhotia, S. G. Singapura, H. Zeng, R. Kannan, V. K. Prasanna, J. Fox, E. Kim, O. Green, and D. A. Bader, “Design and implementation of parallel pagerank on multicore platforms,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [44] D. Hutchison, “Distributed triangle counting in the graphulo matrix math library,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [45] T. Mattson, D. Bader, J. Berry, A. Buluç, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, J. Kepner, C. Leiserson, A. Lumsdaine, D. Padua, S. Poole, S. Reinhardt, M. Stonebraker, S. Wallach, and A. Yoo, “Standards for graph algorithm primitives,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2013.
  • [46] J. Kepner, D. Bader, A. Buluç, J. Gilbert, T. Mattson, and H. Meyerhenke, “Graphs, matrices, and the GraphBLAS: Seven good reasons,” Procedia Computer Science, vol. 51, pp. 2453–2462, 2015.
  • [47] J. Kepner, P. Aaltonen, D. Bader, A. Buluç, F. Franchetti, J. Gilbert, D. Hutchison, M. Kumar, A. Lumsdaine, H. Meyerhenke et al., “Mathematical foundations of the GraphBLAS,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2016.
  • [48] A. Buluç, T. Mattson, S. McMillan, J. Moreira, and C. Yang, “Design of the GraphBLAS API for C,” in Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017 IEEE International.   IEEE, 2017, pp. 643–652.
  • [49] T. Davis, “SuiteSparse:GraphBLAS,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [50] W. S. Song, V. Gleyzer, A. Lomakin, and J. Kepner, “Novel graph processor architecture, prototype system, and results,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2016.
  • [51] M. Bisson and M. Fatica, “Static graph challenge on GPU,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [52] K. Date, K. Feng, R. Nagi, J. Xiong, N. S. Kim, and W.-M. Hwu, “Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture: Static graph challenge: Subgraph isomorphism,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [53] E. P. DeBenedictis, J. Cook, S. Srikanth, and T. M. Conte, “Superstrider associative array architecture,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [54] S. Manne, B. Chin, and S. K. Reinhardt, “If you build it, will they come?” IEEE Micro, vol. 37, no. 6, pp. 6–12, 2017.
  • [55] P. M. Kogge, “Graph analytics: Complexity, scalability, and architectures,” in Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017 IEEE International.   IEEE, 2017, pp. 1039–1047.
  • [56] R. Gioiosa, A. Tumeo, J. Yin, T. Warfel, D. Haglin, and S. Betelu, “Exploring DataVortex systems for irregular applications,” in Parallel and Distributed Processing Symposium (IPDPS), 2017 IEEE International.   IEEE, 2017, pp. 409–418.
  • [57] D. Chakrabarti, Y. Zhan, and C. Faloutsos, “R-MAT: A recursive model for graph mining,” in Proceedings of the 2004 SIAM International Conference on Data Mining.   SIAM, 2004, pp. 442–446.
  • [58] J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos, “Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication,” in European Conference on Principles of Data Mining and Knowledge Discovery.   Springer, 2005, pp. 133–145.
  • [59] D. Bader, K. Madduri, J. Gilbert, V. Shah, J. Kepner, T. Meuse, and A. Krishnamurthy, “Designing scalable synthetic compact applications for benchmarking high productivity computing systems,” Cyberinfrastructure Technology Watch, vol. 2, pp. 1–10, 2006.
  • [60] P. Dreher, C. Byun, C. Hill, V. Gadepally, B. Kuszmaul, and J. Kepner, “PageRank pipeline benchmark: Proposal for a holistic system benchmark for big-data platforms,” in Parallel and Distributed Processing Symposium Workshops, 2016 IEEE International.   IEEE, 2016, pp. 929–937.
  • [61] E. Kao, V. Gadepally, M. Hurley, M. Jones, J. Kepner, S. Mohindra, P. Monticciolo, A. Reuther, S. Samsi, W. Song, D. Staheli, and S. Smith, “Streaming Graph Challenge - Stochastic Block Partition,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [62] S. Samsi, V. Gadepally, M. Hurley, M. Jones, E. Kao, S. Mohindra, P. Monticciolo, A. Reuther, S. Smith, W. Song, D. Staheli, and J. Kepner, “Static graph challenge: Subgraph isomorphism,” in High Performance Extreme Computing Conference (HPEC).   IEEE, 2017.
  • [63] C. Seshadhri, T. G. Kolda, and A. Pinar, “Community structure and scale-free collections of Erdős-Rényi graphs,” Physical Review E, vol. 85, no. 5, p. 056109, 2012.
  • [64] J. Kepner, “Perfect power law graphs: Generation, sampling, construction and fitting,” in SIAM Annual Meeting, 2012.
  • [65] V. Gadepally and J. Kepner, “Using a power law distribution to describe big data,” in High Performance Extreme Computing Conference (HPEC), 2015 IEEE.   IEEE, 2015, pp. 1–5.
  • [66] C. F. Van Loan, “The ubiquitous Kronecker product,” Journal of Computational and Applied Mathematics, vol. 123, no. 1, pp. 85–100, 2000.
  • [67] M. Gondran and M. Minoux, “Dioïds and semirings: Links to fuzzy sets and other applications,” Fuzzy Sets and Systems, vol. 158, no. 12, pp. 1273–1294, 2007.
  • [68] J. S. Golan, Semirings and their Applications.   Springer Science & Business Media, 2013.
  • [69] J. Kepner and H. Jananthan, Mathematics of Big Data.   MIT Press, 2018.
  • [70] D. König, “Graphen und matrizen (graphs and matrices),” Mat. Fiz. Lapok, vol. 38, no. 1931, pp. 116–119, 1931.
  • [71] J. Kepner and J. Gilbert, Graph algorithms in the language of linear algebra.   SIAM, 2011.
  • [72] P. M. Weichsel, “The Kronecker product of graphs,” Proceedings of the American Mathematical Society, vol. 13, no. 1, pp. 47–52, 1962.
  • [73] J. Cohen, “Graph twiddling in a MapReduce world,” Computing in Science & Engineering, vol. 11, no. 4, pp. 29–41, Jul. 2009.
  • [74] A. Pavan, K. Tangwongsan, S. Tirthapura, and K.-L. Wu, “Counting and sampling triangles from a graph stream,” Proc. VLDB Endow., vol. 6, no. 14, pp. 1870–1881, Sep. 2013.
  • [75] A. Azad, A. Buluç, and J. Gilbert, “Parallel triangle counting and enumeration using matrix algebra,” in Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, ser. IPDPSW ’15.   Washington, DC, USA: IEEE Computer Society, 2015, pp. 804–811.
  • [76] P. Burkhardt, “Graphing trillions of triangles,” Information Visualization, vol. 0, no. 0, p. 1473871616666393, 2016.
  • [77] H. Jananthan, K. Dibert, and J. Kepner, “Constructing adjacency arrays from incidence arrays,” in IPDPS GABB Workshop, 2017 IEEE.   IEEE, 2017.
  • [78] N. Travinin Bliss and J. Kepner, “pMATLAB Parallel MATLAB Library,” The International Journal of High Performance Computing Applications, vol. 21, no. 3, pp. 336–359, 2007.
  • [79] J. Kepner, Parallel MATLAB for Multicore and Multinode Computers.   SIAM, 2009.