I Introduction

Footnote: This material is based in part upon work supported by the NSF under grant number DMS-1312831, by DOE ASCR under contract number DE-AC02-05CH11231, and by the DoD under contract number FA8721-05-C-0003. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, the Department of Energy, or the Department of Defense.
Graphs are among the most important abstract data structures in computer science, and the algorithms that operate on them are critical to applications in bioinformatics [Georganas et al 2014], computer networks, and social media [Ediger et al 2010, Ediger et al 2011, Riedy et al 2012, Riedy & Bader 2013]. Graphs have been shown to be powerful tools for modeling complex problems because of their simplicity and generality [Staudt et al 2016, Bergamini & Meyerhenke 2016]. For this reason, the field of graph algorithms has become one of the pillars of theoretical computer science, informing research in such diverse areas as combinatorial optimization, complexity theory, and topology. Graph algorithms have been adapted and implemented by the military, commercial industry, and researchers in academia, and have become essential in controlling the power grid, telephone systems, and, of course, computer networks.
Parallel graph algorithms are notoriously difficult to implement and optimize [Ediger et al 2012, Ediger & Bader 2013, McLaughlin & Bader 2014a, McLaughlin & Bader 2014b, McLaughlin et al 2014, Staudt & Meyerhenke 2016]. The irregular data access patterns and inherently high communication-to-computation ratios found in graph algorithms mean that even the best algorithms will have parallel efficiencies that decrease as the number of processors is increased [Buluç & Gilbert 2012, Azad et al 2015]. Recent work on communication-avoiding algorithms, and their applications to graph computations [Ballard et al 2013, Solomonik et al 2013], might defer but not completely eliminate the parallel scalability bottleneck. Consequently, novel hardware architectures will also be required [Song et al 2010, Song et al 2013]. A common graph processing interface provides a useful tool for optimizing both software and hardware to provide high performance graph applications.
The duality between the canonical representation of graphs as abstract collections of vertices and edges and a matrix representation has been a part of graph theory since its inception [Konig 1931, Konig 1936]. Matrix algebra has been recognized as a useful tool in graph theory for nearly as long (see [Harary 1969] and references therein, in particular [Sabadusi 1960, Weischel 1962, McAndrew 1963, Teh & Yap 1964, McAndrew 1965, Harary & Tauth 1964, Brualdi 1967]). The modern description of the duality between graph algorithms and matrix mathematics (or sparse linear algebra) has been extensively covered in the literature and is summarized in the cited text [Kepner & Gilbert 2011]. This text has further spawned the GraphBLAS math library standard (GraphBLAS.org) [Mattson et al 2013], which has been developed in a series of proceedings [Mattson 2014a, Mattson 2014b, Mattson 2015, Buluç 2015, Mattson 2016] and implementations [Buluç & Gilbert 2011, Kepner et al 2012, Ekanadham et al 2016, Hutchison et al 2015, Anderson et al 2016, Zhang et al 2016]. This paper describes the mathematical properties that have been developed since [Kepner & Gilbert 2011] to support the GraphBLAS.
The foundational mathematical concepts for matrix-based graph analysis are the adjacency matrix and incidence matrix representations of graphs. From these concepts, a more formal definition of a matrix can be constructed. How such a matrix can be manipulated depends on the types of values the matrix holds and the operations allowed on those values. Furthermore, the mathematical properties of the matrix values determine the mathematical properties of the whole matrix. This paper describes the key mathematical concepts of the GraphBLAS and presents preliminary results showing that the overhead of prototype GraphBLAS implementations is minimal compared to their underlying matrix libraries.
II Adjacency Matrix
Given an adjacency matrix $\mathbf{A}$, if
$$\mathbf{A}(v_1, v_2) = 1$$
then there exists an edge going from vertex $v_1$ to vertex $v_2$ (see Figure 1). Likewise, if
$$\mathbf{A}(v_1, v_2) = 0$$
then there is no edge from $v_1$ to $v_2$. Adjacency matrices can have direction, which means that $\mathbf{A}(v_1, v_2)$ may not be the same as $\mathbf{A}(v_2, v_1)$. Adjacency matrices can also have edge weights. If
$$\mathbf{A}(v_1, v_2) = a_{12}, \quad a_{12} \neq 0$$
then the edge going from $v_1$ to $v_2$ is said to have weight $a_{12}$. Adjacency matrices provide a simple way to represent the connections between vertices in a graph. Adjacency matrices are often square, and both the out-vertices (rows) and the in-vertices (columns) are the same set of vertices. Adjacency matrices can be rectangular, in which case the out-vertices (rows) and the in-vertices (columns) are different sets of vertices. Such graphs are often called bipartite graphs. In summary, adjacency matrices can represent a wide range of graphs, which include any graph with any set of the following properties: directed, weighted, and/or bipartite.
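As a minimal sketch of the definitions above, the following Python stores a small directed, weighted adjacency matrix as a dense list of lists. The vertex numbering (0-based rather than the 1-based indices used in the text), the edge weights, and the helper name `has_edge` are illustrative assumptions and are not part of the GraphBLAS.

```python
# Illustrative 4-vertex directed, weighted graph (all values are made up).
# A[v1][v2] holds the weight of the edge from v1 to v2; 0 means no edge.
n = 4
A = [[0] * n for _ in range(n)]
A[0][1] = 1   # edge 0 -> 1 with weight 1
A[1][2] = 5   # edge 1 -> 2 with weight 5
A[2][0] = 2   # edge 2 -> 0 with weight 2


def has_edge(A, v1, v2):
    """Return True when there is an edge from v1 to v2."""
    return A[v1][v2] != 0


# Direction matters: 0 -> 1 exists but 1 -> 0 does not.
print(has_edge(A, 0, 1), has_edge(A, 1, 0))
```

A rectangular (bipartite) adjacency matrix would use the same layout with a different number of rows and columns.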
III Incidence Matrix
An incidence, or edge, matrix $\mathbf{E}$ uses the rows to represent every edge in the graph and the columns to represent every vertex. There are a number of conventions for denoting an edge in an incidence matrix. One such convention is to use two incidence matrices
$$\mathbf{E}_{\rm out}(k, v_1) = 1 \quad \text{and} \quad \mathbf{E}_{\rm in}(k, v_2) = 1$$
to indicate that edge $k$ is a connection from $v_1$ to $v_2$ (see Figure 2). Incidence matrices are useful because they can easily represent multi-graphs, hyper-graphs, and multipartite graphs. These complex graphs are difficult to capture with an adjacency matrix. A multi-graph has multiple edges between the same vertices. If there is another edge, $k'$, from $v_1$ to $v_2$, this relationship can be captured in an incidence matrix by setting
$$\mathbf{E}_{\rm out}(k', v_1) = 1 \quad \text{and} \quad \mathbf{E}_{\rm in}(k', v_2) = 1$$
(see Figure 3). [Note: Another convention is to use +1 and -1, in which case the resulting matrix multiplication is the graph Laplacian.] In a hyper-graph, one edge can connect more than two vertices. For example, denoting that edge $k$ has a connection from $v_1$ to $v_2$ and $v_3$ can be accomplished by also setting
$$\mathbf{E}_{\rm in}(k, v_3) = 1$$
(see Figure 3). Furthermore, $v_1$, $v_2$, and $v_3$ can be drawn from different classes of vertices. Incidence matrices can be used to represent multipartite graphs by defining an additional incidence array and setting the corresponding entries in that array.
Thus, an incidence matrix can be used to represent a graph with any set of the following graph properties: directed, weighted, multipartite, multi-edge, and/or hyper-edge.
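The two-matrix convention described above can be sketched in a few lines of Python. The names `E_out` and `E_in` for the out-vertex and in-vertex incidence matrices, the 0-based numbering, the helper `edge_endpoints`, and the example edges are assumptions chosen for illustration.

```python
# Hypothetical graph with 3 vertices and 3 edges (labels are illustrative).
num_edges, num_vertices = 3, 3
E_out = [[0] * num_vertices for _ in range(num_edges)]
E_in = [[0] * num_vertices for _ in range(num_edges)]

# Edge 0: from vertex 0 to vertex 1.
E_out[0][0] = 1
E_in[0][1] = 1
# Edge 1: a second edge from vertex 0 to vertex 1 (a multi-edge).
E_out[1][0] = 1
E_in[1][1] = 1
# Edge 2: a hyper-edge from vertex 0 to both vertex 1 and vertex 2.
E_out[2][0] = 1
E_in[2][1] = 1
E_in[2][2] = 1


def edge_endpoints(k):
    """Return (out-vertices, in-vertices) touched by edge k."""
    outs = [v for v in range(num_vertices) if E_out[k][v]]
    ins = [v for v in range(num_vertices) if E_in[k][v]]
    return outs, ins


print(edge_endpoints(2))
```

Because every edge has its own row, the repeated edge and the hyper-edge are both stored without any special handling, which is not possible with a single adjacency matrix entry.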
IV Matrix Values
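A typical matrix has $m$ rows and $n$ columns of real numbers. Such a matrix can be denoted as
$$\mathbf{A} : \mathbb{R}^{m \times n}$$
The row and column indices of the matrix are
$$i \in I = \{1, \ldots, m\} \quad \text{and} \quad j \in J = \{1, \ldots, n\}$$
so that any particular value can be denoted as $\mathbf{A}(i,j)$. The row and column indices of matrices are natural numbers $I, J : \mathbb{N}$. [Note: a specific implementation of these matrices might use IEEE 64-bit double-precision floating point numbers to represent real numbers, 64-bit unsigned integers to represent row and column indices, and the compressed sparse row (CSR) format or the compressed sparse column (CSC) format to store the nonzero values inside the sparse matrix.]
A matrix of complex numbers is denoted $\mathbf{A} : \mathbb{C}^{m \times n}$.
A matrix of integers is denoted $\mathbf{A} : \mathbb{Z}^{m \times n}$.
A matrix of natural numbers is denoted $\mathbf{A} : \mathbb{N}^{m \times n}$.
Using the above concepts, a matrix is defined as the following two-dimensional (2D) mapping
$$\mathbf{A} : I \times J \rightarrow \mathbb{S}$$
where the indices $I, J : \mathbb{Z}$ are finite sets of integers with $m$ and $n$ elements, respectively, and
$$\mathbb{S} \in \{\mathbb{R}, \mathbb{C}, \mathbb{Z}, \mathbb{N}, \ldots\}$$
is a set of scalars. Without loss of generality, matrices can be denoted
$$\mathbf{A} : \mathbb{S}^{m \times n}$$
A vector is a matrix in which either $m = 1$ or $n = 1$. A column vector is denoted $\mathbf{v} : \mathbb{S}^{m \times 1}$ or simply $\mathbf{v} : \mathbb{S}^{m}$. A row vector can be denoted $\mathbf{v} : \mathbb{S}^{1 \times n}$ or simply $\mathbf{v} : \mathbb{S}^{n}$. A scalar is a single element of a set, $s \in \mathbb{S}$, and has no matrix dimensions.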
V Scalar Operations
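Matrix operations are built on top of scalar operations that can be used for combining and scaling graph edge weights. The primary scalar operations are standard arithmetic addition, such as
$$1 + 1 = 2$$
and arithmetic multiplication, such as
$$2 \times 2 = 4$$
These scalar operations of addition and multiplication can be defined to be a wide variety of functions. To prevent confusion with standard arithmetic addition and arithmetic multiplication, $\oplus$ will be used to denote scalar addition and $\otimes$ will be used to denote scalar multiplication. In this notation, standard arithmetic addition and arithmetic multiplication of real numbers
$$a, b, c \in \mathbb{R}$$
where
$$\oplus \equiv + \quad \text{and} \quad \otimes \equiv \times$$
results in
$$c = a \oplus b \;\Rightarrow\; c = a + b$$
and
$$c = a \otimes b \;\Rightarrow\; c = a \times b$$
Generalizing $\oplus$ and $\otimes$ to a variety of operations enables a wide range of algorithms on scalars of all different types (not just real or complex numbers).
Certain $\oplus$ and $\otimes$ combinations over certain sets of scalars are particularly useful because they preserve essential mathematical properties, such as additive commutativity
$$a \oplus b = b \oplus a$$
multiplicative commutativity
$$a \otimes b = b \otimes a$$
additive associativity
$$(a \oplus b) \oplus c = a \oplus (b \oplus c)$$
multiplicative associativity
$$(a \otimes b) \otimes c = a \otimes (b \otimes c)$$
and the distributivity of multiplication over addition
$$a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$$
The properties of commutativity, associativity, and distributivity are extremely useful properties for building graph applications because they allow the builder to swap operations without changing the result. Example combinations of $\oplus$ and $\otimes$ that preserve scalar commutativity, associativity, and distributivity include (but are not limited to) standard arithmetic
$$\oplus \equiv +, \quad \otimes \equiv \times, \quad a, b, c \in \mathbb{R}$$
finite (Galois) fields such as GF(2)
$$\oplus \equiv {\rm xor}, \quad \otimes \equiv {\rm and}, \quad a, b, c \in \{0, 1\}$$
and power set algebras
$$\oplus \equiv \cup, \quad \otimes \equiv \cap, \quad a, b, c \subset \mathbb{Z}$$
Other functions that do not preserve the above properties can also be defined for $\oplus$ and $\otimes$. For example, it is often useful for $\oplus$ or $\otimes$ to pull in other data, such as vertex indices of a graph.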
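One way to make these properties concrete is to check them by brute force on a small set of scalars. The sketch below is illustrative only: the helper name `check_properties` and the sample value sets are assumptions, and passing a check on a finite sample demonstrates a property on that sample rather than proving it in general.

```python
from itertools import product
import operator


def check_properties(add, mul, values):
    """Brute-force check of commutativity, associativity, distributivity."""
    for a, b, c in product(values, repeat=3):
        if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
            return False
        if add(add(a, b), c) != add(a, add(b, c)):
            return False
        if mul(mul(a, b), c) != mul(a, mul(b, c)):
            return False
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):
            return False
    return True


# GF(2): addition is xor and multiplication is and, over {0, 1}.
gf2_ok = check_properties(operator.xor, operator.and_, [0, 1])

# Power set algebra: addition is union, multiplication is intersection.
subsets = [frozenset(s) for s in ([], [1], [2], [1, 2])]
powerset_ok = check_properties(operator.or_, operator.and_, subsets)

# Counterexample: using subtraction as the "addition" breaks the properties.
sub_ok = check_properties(operator.sub, operator.mul, [0, 1, 2])

print(gf2_ok, powerset_ok, sub_ok)
```

The last case illustrates the closing remark of this section: an operator pair can still be defined when the properties fail, but operations can then no longer be reordered freely.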
VI Matrix Properties
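Associativity, distributivity, and commutativity are very powerful properties that enable the construction of composable graph algorithms (i.e., operations can be reordered with the knowledge that the answers will remain unchanged). Composability makes it easy to build a wide range of graph algorithms with just a few functions. Given matrices
$$\mathbf{A}, \mathbf{B}, \mathbf{C} \in \mathbb{S}^{m \times n}$$
let their elements be specified by
$$a = \mathbf{A}(i,j), \quad b = \mathbf{B}(i,j), \quad c = \mathbf{C}(i,j)$$
Commutativity, associativity, and distributivity of scalar operations translate into similar properties on matrix operations in the following manner.
Additive commutativity allows graphs to be swapped and combined via matrix element-wise addition (see Figure 4) without changing the result
$$\mathbf{A} \oplus \mathbf{B} = \mathbf{B} \oplus \mathbf{A}$$
where matrix element-wise addition is given by
$$\mathbf{C}(i,j) = \mathbf{A}(i,j) \oplus \mathbf{B}(i,j)$$
Multiplicative commutativity allows graphs to be swapped, intersected, and scaled via matrix element-wise multiplication (see Figure 5) without changing the result
$$\mathbf{A} \otimes \mathbf{B} = \mathbf{B} \otimes \mathbf{A}$$
where matrix element-wise (Hadamard) multiplication is given by
$$\mathbf{C}(i,j) = \mathbf{A}(i,j) \otimes \mathbf{B}(i,j)$$
Additive associativity allows graphs to be combined via matrix element-wise addition in any grouping without changing the result
$$(\mathbf{A} \oplus \mathbf{B}) \oplus \mathbf{C} = \mathbf{A} \oplus (\mathbf{B} \oplus \mathbf{C})$$
Multiplicative associativity allows graphs to be intersected and scaled via matrix element-wise multiplication in any grouping without changing the result
$$(\mathbf{A} \otimes \mathbf{B}) \otimes \mathbf{C} = \mathbf{A} \otimes (\mathbf{B} \otimes \mathbf{C})$$
Element-wise distributivity allows graphs to be intersected and/or scaled and then combined or vice versa without changing the result
$$\mathbf{A} \otimes (\mathbf{B} \oplus \mathbf{C}) = (\mathbf{A} \otimes \mathbf{B}) \oplus (\mathbf{A} \otimes \mathbf{C})$$
Matrix multiply distributivity allows graphs to be transformed via matrix multiply and then combined or vice versa without changing the result
$$\mathbf{A}(\mathbf{B} \oplus \mathbf{C}) = (\mathbf{A}\mathbf{B}) \oplus (\mathbf{A}\mathbf{C})$$
where matrix multiply
$$\mathbf{C} = \mathbf{A}\mathbf{B} = \mathbf{A} \,{\oplus}.{\otimes}\, \mathbf{B}$$
is given by
$$\mathbf{C}(i,j) = \bigoplus_{k=1}^{l} \mathbf{A}(i,k) \otimes \mathbf{B}(k,j)$$
for matrices with dimensions
$$\mathbf{A} : \mathbb{S}^{m \times l}, \quad \mathbf{B} : \mathbb{S}^{l \times n}, \quad \mathbf{C} : \mathbb{S}^{m \times n}$$
Matrix multiply associativity is another implication of scalar distributivity and allows graphs to be transformed via matrix multiplication in various orderings without changing the result
$$(\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C})$$
Matrix multiply commutativity can be achieved when combined with the transpose operation
$$(\mathbf{A}\mathbf{B})^{\sf T} = \mathbf{B}^{\sf T}\mathbf{A}^{\sf T}$$
where the transpose of a matrix is given by
$$\mathbf{A}^{\sf T}(j,i) = \mathbf{A}(i,j)$$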
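The matrix-level properties can be spot-checked numerically with a generic multiply that takes the scalar addition and multiplication as arguments. The following is a sketch under stated assumptions: dense list-of-lists storage, the max-plus operator pair, small integer matrices chosen arbitrarily, and hypothetical helper names `ewise`, `mxm`, and `transpose`.

```python
from functools import reduce


def ewise(op, A, B):
    """Element-wise combination C(i,j) = A(i,j) op B(i,j)."""
    return [[op(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mxm(A, B, add, mul):
    """C(i,j) = add-reduction over k of mul(A(i,k), B(k,j))."""
    return [[reduce(add, (mul(A[i][k], B[k][j]) for k in range(len(B))))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    """Swap rows and columns of a dense matrix."""
    return [list(col) for col in zip(*A)]


# Max-plus operator pair: scalar addition is max, multiplication is +.
add = max
mul = lambda a, b: a + b
A = [[1, 4], [0, 2]]
B = [[3, 1], [5, 0]]
C = [[2, 2], [1, 7]]

# Matrix multiply distributes over element-wise addition.
dist_lhs = mxm(A, ewise(add, B, C), add, mul)
dist_rhs = ewise(add, mxm(A, B, add, mul), mxm(A, C, add, mul))

# Matrix multiply is associative.
assoc_lhs = mxm(mxm(A, B, add, mul), C, add, mul)
assoc_rhs = mxm(A, mxm(B, C, add, mul), add, mul)

# Transposing a product reverses the order of the factors.
t_lhs = transpose(mxm(A, B, add, mul))
t_rhs = mxm(transpose(B), transpose(A), add, mul)

print(dist_lhs == dist_rhs, assoc_lhs == assoc_rhs, t_lhs == t_rhs)
```

Swapping `add` and `mul` for another pair that satisfies the scalar properties leaves the three checks unchanged, which is the composability the section describes.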
VII 0-Element: No Graph Edge
Sparse matrices play an important role in graphs. Many implementations of sparse matrices reduce storage by not storing the 0-valued elements in the matrix. In adjacency matrices, the 0 element is equivalent to no edge from the vertex that is represented by the row to the vertex that is represented by the column. In incidence matrices, the 0 element is equivalent to the edge represented by the row not including the vertex that is represented by the column. In most cases, the 0 element is standard arithmetic 0, but in other cases it can be a different value. Nonstandard 0 values can be helpful when combined with different $\oplus$ and $\otimes$ operations. For example, in different contexts 0 might be $+\infty$, $-\infty$, or $\emptyset$ (empty set). For any value of 0, if the 0 element has certain properties with respect to scalar $\oplus$ and $\otimes$, then the sparsity of matrix operations can be managed efficiently. These properties are the additive identity
$$a \oplus 0 = a$$
and the multiplicative annihilator
$$a \otimes 0 = 0$$
Example combinations of $\oplus$ and $\otimes$ that exhibit the additive identity and multiplicative annihilator include
standard arithmetic ($+.\times$) on real numbers $\mathbb{R}$, with $0 \equiv 0$
max-plus algebra ($\max.+$) on real numbers with a defined minimal element $\mathbb{R} \cup \{-\infty\}$, with $0 \equiv -\infty$
min-plus algebra ($\min.+$) using real numbers with a defined maximal element $\mathbb{R} \cup \{+\infty\}$, with $0 \equiv +\infty$
max-min algebra ($\max.\min$) using non-negative real numbers $\mathbb{R}_{\geq 0}$, with $0 \equiv 0$
min-max algebra ($\min.\max$) using non-positive real numbers $\mathbb{R}_{\leq 0}$, with $0 \equiv 0$
max-min algebra ($\max.\min$) using non-positive real numbers with a minimal element $\mathbb{R}_{\leq 0} \cup \{-\infty\}$, with $0 \equiv -\infty$
min-max algebra ($\min.\max$) using non-negative real numbers with a maximal element $\mathbb{R}_{\geq 0} \cup \{+\infty\}$, with $0 \equiv +\infty$
Galois field (${\rm xor}.{\rm and}$) over a set of two numbers $\{0, 1\}$, with $0 \equiv 0$
power set ($\cup.\cap$) on any subset of integers $\mathbb{Z}$, with $0 \equiv \emptyset$
The above examples are a small selection of the operators and sets that are useful for building graph algorithms. Many more are possible. The ability to change the scalar values and operators while preserving the overall behavior of the graph operations is one of the principal benefits of using matrices for graph algorithms.
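The role of the 0 element in sparse storage can be illustrated with a small sketch. It assumes a dictionary-of-keys layout in which only non-0 entries are stored, and uses the min-plus pair with the 0 element taken to be +infinity; the function name `sparse_mxm` and the example graph are hypothetical.

```python
import math


def sparse_mxm(A, B, add, mul, zero):
    """Multiply sparse matrices stored as {(row, col): value} dicts.

    Entries equal to `zero` are never stored. Because `zero` annihilates
    under `mul` and is the identity for `add`, missing entries can be
    skipped without changing the result.
    """
    B_rows = {}
    for (k, j), b in B.items():
        B_rows.setdefault(k, []).append((j, b))
    C = {}
    for (i, k), a in A.items():
        for j, b in B_rows.get(k, []):
            term = mul(a, b)
            C[(i, j)] = add(C[(i, j)], term) if (i, j) in C else term
    return {key: val for key, val in C.items() if val != zero}


# Min-plus algebra: add is min, mul is +, and the 0 element is +infinity.
# Illustrative weighted graph: 0 -> 1 (3), 1 -> 2 (4), 0 -> 2 (10).
A = {(0, 1): 3, (1, 2): 4, (0, 2): 10}
two_hop = sparse_mxm(A, A, min, lambda a, b: a + b, math.inf)
print(two_hop)
```

With these operators the product holds the cheapest two-edge path between each pair of vertices, and vertex pairs with no such path simply stay absent from the result.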
VIII Matrix Graph Operations
The main benefit of a matrix approach to graphs is the ability to perform a wide range of graph operations on diverse types of graphs with a small number of matrix operations. These core matrix operations and some example graph operations they support are as follows
building a sparse matrix from row, column, and value triples, which corresponds to constructing a graph from a set of out-vertices, in-vertices, and edge weights
extracting the row, column, and value tuples corresponding to the nonzero elements in a sparse matrix, which corresponds to extracting graph edges from the matrix representation of a graph
transposing the rows and the columns of a sparse matrix, which is equivalent to swapping the out-vertices and the in-vertices of a graph
using matrix multiplication to perform single-source breadth-first search, multisource breadth-first search, and weighted breadth-first search on a graph
extracting a sub-matrix from a larger matrix, which is equivalent to selecting a sub-graph from a larger graph
assigning a matrix to a set of indices in a larger matrix, which inserts a sub-graph into a graph
using element-wise addition of matrices and element-wise multiplication of matrices to perform graph union and intersection along with edge weight scaling and combining
The above collection of functions has been shown to be useful for implementing a wide range of graph algorithms. These functions strike a balance between providing enough functions to be useful to application builders while being few enough that they can be implemented effectively.
VIII-A Building a Matrix: Edge List to Graph
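Graph data can often be represented as triples of vectors $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{v}$ corresponding to the nonzero elements in the sparse matrix. Constructing an $m \times n$ sparse matrix from vector triples can be denoted
$$\mathbf{C} = \mathbb{S}^{m \times n}(\mathbf{i}, \mathbf{j}, \mathbf{v}, \oplus)$$
where $\mathbf{i} : I^{L}$, $\mathbf{j} : J^{L}$, and $\mathbf{v} : \mathbb{S}^{L}$ are all $L$ element vectors. The optional $\oplus$ operation defines how multiple entries with the same row and column are handled.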
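A minimal sketch of this construction is shown below. It assumes a dictionary-of-keys sparse layout and 0-based indices; the function name `build_matrix`, the parameter name `dup`, and the choice of arithmetic addition as the default for combining duplicates are illustrative assumptions rather than behavior specified by the text.

```python
import operator


def build_matrix(i, j, v, dup=operator.add):
    """Build a sparse matrix {(row, col): value} from triples.

    `dup` plays the role of the optional operation that combines
    entries sharing the same (row, col); the default here is +.
    """
    if not (len(i) == len(j) == len(v)):
        raise ValueError("i, j, and v must have the same length")
    C = {}
    for r, c, val in zip(i, j, v):
        C[(r, c)] = dup(C[(r, c)], val) if (r, c) in C else val
    return C


# Illustrative edge list in which the edge (0, 1) appears twice.
i = [0, 0, 1, 0]
j = [1, 2, 2, 1]
v = [1, 5, 2, 3]
summed = build_matrix(i, j, v)            # duplicates are added
largest = build_matrix(i, j, v, dup=max)  # duplicates keep the maximum
print(summed, largest)
```

Changing `dup` is what lets the same edge list produce, for example, an edge count, a total weight, or a maximum weight per vertex pair.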
VIII-B Extracting Tuples: Graph to Vertex List
Extracting the nonzero tuples from a sparse matrix can be denoted mathematically as
$$(\mathbf{i}, \mathbf{j}, \mathbf{v}) = \mathbf{A}$$
VIII-C Transpose: Swap Out-Vertices and In-Vertices
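Swapping the rows and columns of a sparse matrix is a common tool for reversing the direction of the edges in a graph (see Figure 6). The transpose is denoted as
$$\mathbf{C} = \mathbf{A}^{\sf T}$$
or more explicitly
$$\mathbf{C}(j,i) = \mathbf{A}(i,j)$$
where $\mathbf{A} : \mathbb{S}^{m \times n}$ and $\mathbf{C} : \mathbb{S}^{n \times m}$.
Transpose also can be implemented using triples as follows
$$(\mathbf{i}, \mathbf{j}, \mathbf{v}) = \mathbf{A}$$
$$\mathbf{C} = \mathbb{S}^{n \times m}(\mathbf{j}, \mathbf{i}, \mathbf{v})$$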
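The triple-based transpose can be sketched by extracting the stored entries and rebuilding the matrix with the row and column vectors swapped. The dictionary-of-keys layout and the helper names `extract_tuples`, `build_matrix`, and `transpose` are assumptions made for this illustration.

```python
def extract_tuples(A):
    """Return (i, j, v) lists for the stored entries of A, in sorted order."""
    keys = sorted(A)
    return ([r for r, _ in keys], [c for _, c in keys], [A[k] for k in keys])


def build_matrix(i, j, v):
    """Rebuild a {(row, col): value} matrix from triples (no duplicates)."""
    return {(r, c): val for r, c, val in zip(i, j, v)}


def transpose(A):
    """Transpose by extracting triples and swapping rows with columns."""
    i, j, v = extract_tuples(A)
    return build_matrix(j, i, v)


# Illustrative graph: 0 -> 1 (3), 1 -> 2 (4), 0 -> 2 (10).
A = {(0, 1): 3, (1, 2): 4, (0, 2): 10}
At = transpose(A)
print(extract_tuples(A))
print(At)
```

Applying `transpose` twice returns the original matrix, which matches the interpretation of swapping out-vertices and in-vertices and then swapping them back.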
VIII-D Matrix Multiplication: Breadth-First Search and Adjacency Matrix Construction
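Matrix multiplication is the most important matrix operation and can be used to implement a wide range of graph algorithms. Examples include finding the nearest neighbors of a vertex (see Figure 7) and constructing an adjacency matrix from an incidence matrix (see Figure 8). In its most common form, matrix multiplication using standard arithmetic addition and multiplication is given by
$$\mathbf{C} = \mathbf{A}\mathbf{B}$$
or more explicitly
$$\mathbf{C}(i,j) = \sum_{k=1}^{l} \mathbf{A}(i,k)\,\mathbf{B}(k,j)$$
where $\mathbf{A} : \mathbb{R}^{m \times l}$, $\mathbf{B} : \mathbb{R}^{l \times n}$, and $\mathbf{C} : \mathbb{R}^{m \times n}$.
Matrix multiplication has many important variants that include non-arithmetic addition and multiplication
$$\mathbf{C} = \mathbf{A} \,{\oplus}.{\otimes}\, \mathbf{B}$$
where $\mathbf{A} : \mathbb{S}^{m \times l}$, $\mathbf{B} : \mathbb{S}^{l \times n}$, and $\mathbf{C} : \mathbb{S}^{m \times n}$,
and the notation ${\oplus}.{\otimes}$ makes explicit that $\oplus$ and $\otimes$ can be other functions.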
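As an illustration of how matrix multiplication expresses a graph traversal, the sketch below runs a breadth-first search by repeatedly multiplying a frontier vector by the adjacency matrix, using logical or as the scalar addition and logical and as the scalar multiplication. The dense storage, the function name `bfs_levels`, the use of -1 for unreached vertices, and the example graph are all assumptions.

```python
def bfs_levels(A, source):
    """Breadth-first search levels using vector-matrix products.

    A is a dense 0/1 adjacency matrix whose rows are out-vertices. Each
    step computes the vertices reached from the current frontier over
    the (or, and) operator pair, then drops vertices already visited.
    """
    n = len(A)
    level = [-1] * n
    frontier = [v == source for v in range(n)]
    depth = 0
    while any(frontier):
        for v in range(n):
            if frontier[v]:
                level[v] = depth
        # reached(j) = OR over k of (frontier(k) AND A(k, j))
        reached = [any(frontier[k] and A[k][j] for k in range(n))
                   for j in range(n)]
        frontier = [reached[j] and level[j] == -1 for j in range(n)]
        depth += 1
    return level


# Illustrative graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3; vertex 4 is isolated.
A = [[0, 1, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]
levels = bfs_levels(A, 0)
print(levels)
```

A multisource search follows the same pattern with a frontier matrix holding one row per source in place of the single frontier vector.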
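One of the most common uses of matrix multiplication is to construct an adjacency matrix from an incidence matrix representation of a graph. For a graph with out-vertex incidence matrix $\mathbf{E}_{\rm out}$ and in-vertex incidence matrix $\mathbf{E}_{\rm in}$, the corresponding adjacency matrix can be computed by
$$\mathbf{A} = \mathbf{E}_{\rm out}^{\sf T} \mathbf{E}_{\rm in}$$
where the individual values in $\mathbf{A}$ can be computed via
$$\mathbf{A}(i,j) = \bigoplus_{k} \mathbf{E}_{\rm out}^{\sf T}(i,k) \otimes \mathbf{E}_{\rm in}(k,j)$$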
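The construction above can be sketched with standard arithmetic on dense matrices. The names `E_out` and `E_in` for the out-vertex and in-vertex incidence matrices, the function name `adjacency_from_incidence`, and the three example edges are illustrative assumptions.

```python
def adjacency_from_incidence(E_out, E_in):
    """A(i, j) = sum over edges k of E_out(k, i) * E_in(k, j)."""
    num_vertices = len(E_out[0])
    A = [[0] * num_vertices for _ in range(num_vertices)]
    for out_row, in_row in zip(E_out, E_in):
        for i, o in enumerate(out_row):
            if o == 0:
                continue  # edge k does not leave vertex i
            for j, t in enumerate(in_row):
                A[i][j] += o * t
    return A


# Illustrative edges: 0 -> 1, a second 0 -> 1 (multi-edge), and 1 -> 2.
E_out = [[1, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
E_in = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 1]]
A = adjacency_from_incidence(E_out, E_in)
print(A)
```

With standard arithmetic the two parallel edges collapse into a single adjacency entry whose value is the edge count; a different operator pair would combine them differently.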
VIII-E Extract: Selecting Sub-graphs
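Selecting sub-graphs is a very common graph operation (see Figure 9). This operation is performed by selecting out-vertices (rows) and in-vertices (columns) from a matrix
$$\mathbf{C} = \mathbf{A}(\mathbf{i}, \mathbf{j})$$
or more explicitly
$$\mathbf{C}(i,j) = \mathbf{A}(\mathbf{i}(i), \mathbf{j}(j))$$
where $i \in \{1, \ldots, m_C\}$, $j \in \{1, \ldots, n_C\}$, $\mathbf{i} : I^{m_C}$, and $\mathbf{j} : J^{n_C}$ select specific sets of rows and columns in a specific order. The resulting matrix $\mathbf{C} : \mathbb{S}^{m_C \times n_C}$ can be larger or smaller than the input matrix $\mathbf{A}$. This operation can also be used to replicate and/or permute rows and columns in a matrix.
Extraction can also be implemented with matrix multiplication as
$$\mathbf{C} = \mathbf{S}(\mathbf{i}) \, \mathbf{A} \, \mathbf{S}^{\sf T}(\mathbf{j})$$
where $\mathbf{S}(\mathbf{i})$ and $\mathbf{S}(\mathbf{j})$ are selection matrices given by
$$\mathbf{S}(\mathbf{i}) = \mathbb{S}^{m_C \times m}(\{1, \ldots, m_C\}, \mathbf{i}, 1), \quad \mathbf{S}(\mathbf{j}) = \mathbb{S}^{n_C \times n}(\{1, \ldots, n_C\}, \mathbf{j}, 1)$$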
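The equivalence between index-based extraction and multiplication by selection matrices can be checked on a small example. This sketch assumes dense storage, standard arithmetic, and 0-based indices; the helper names `extract`, `selection_matrix`, `matmul`, and `transpose` are hypothetical.

```python
def extract(A, rows, cols):
    """C(r, c) = A(rows[r], cols[c]); indices may repeat or permute."""
    return [[A[i][j] for j in cols] for i in rows]


def selection_matrix(idx, n):
    """len(idx) x n matrix with a single 1 in column idx[r] of row r."""
    return [[1 if c == target else 0 for c in range(n)] for target in idx]


def matmul(A, B):
    """Standard arithmetic matrix product of dense matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Swap rows and columns of a dense matrix."""
    return [list(col) for col in zip(*A)]


A = [[0, 1, 2],
     [3, 0, 4],
     [5, 6, 0]]
rows, cols = [2, 0], [1, 1, 0]   # permute rows and replicate a column
direct = extract(A, rows, cols)
S_i = selection_matrix(rows, len(A))
S_j = selection_matrix(cols, len(A[0]))
via_mxm = matmul(matmul(S_i, A), transpose(S_j))
print(direct, via_mxm)
```

The example also shows that the result can have more columns than a plain sub-matrix would, since an index may be listed more than once.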
VIII-F Assign: Modifying Sub-Graphs
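Modifying sub-graphs is a very common graph operation. This operation is performed by selecting out-vertices (rows) and in-vertices (columns) from a matrix and assigning new values to them from another sparse matrix $\mathbf{A}$,
$$\mathbf{C}(\mathbf{i}, \mathbf{j}) = \mathbf{A}$$
or more explicitly
$$\mathbf{C}(\mathbf{i}(i), \mathbf{j}(j)) = \mathbf{A}(i,j)$$
where $i \in \{1, \ldots, m_A\}$, $j \in \{1, \ldots, n_A\}$, $\mathbf{i} : I^{m_A}$, and $\mathbf{j} : J^{n_A}$ select specific sets of rows and columns.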
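A minimal sketch of assignment on dense matrices follows. Returning a modified copy rather than updating in place is a design choice made here for clarity, and the function name `assign`, the 0-based indices, and the example values are assumptions.

```python
def assign(C, rows, cols, A):
    """Return a copy of C with C(rows[r], cols[c]) = A(r, c)."""
    if len(A) != len(rows) or any(len(a_row) != len(cols) for a_row in A):
        raise ValueError("A must have len(rows) rows and len(cols) columns")
    out = [list(row) for row in C]
    for r, i in enumerate(rows):
        for c, j in enumerate(cols):
            out[i][j] = A[r][c]
    return out


# Insert a 2 x 2 sub-graph into rows {0, 2} and columns {1, 2} of C.
C = [[0] * 3 for _ in range(3)]
A = [[7, 8],
     [9, 0]]
updated = assign(C, [0, 2], [1, 2], A)
print(updated)
```

Entries of `C` outside the selected rows and columns are left untouched, so the rest of the graph is preserved.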
VIII-G Element-Wise Addition and Element-Wise Multiplication: Combining Graphs, Intersecting Graphs, and Scaling Graphs
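Combining graphs along with adding their edge weights can be accomplished by adding together their sparse matrix representations
$$\mathbf{C} = \mathbf{A} \oplus \mathbf{B}$$
where $\mathbf{A}, \mathbf{B}, \mathbf{C} : \mathbb{S}^{m \times n}$, or more explicitly
$$\mathbf{C}(i,j) = \mathbf{A}(i,j) \oplus \mathbf{B}(i,j)$$
where $i \in \{1, \ldots, m\}$, and $j \in \{1, \ldots, n\}$.
Intersecting graphs along with scaling their edge weights can be accomplished by element-wise multiplication of their sparse matrix representations
$$\mathbf{C} = \mathbf{A} \otimes \mathbf{B}$$
where $\mathbf{A}, \mathbf{B}, \mathbf{C} : \mathbb{S}^{m \times n}$, or more explicitly
$$\mathbf{C}(i,j) = \mathbf{A}(i,j) \otimes \mathbf{B}(i,j)$$
where $i \in \{1, \ldots, m\}$, and $j \in \{1, \ldots, n\}$.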
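On sparse storage, element-wise addition behaves like a union of edge sets and element-wise multiplication like an intersection. The sketch below assumes a dictionary-of-keys layout in which the 0 element is never stored; the function names `ewise_add` and `ewise_mult` and the standard arithmetic defaults are illustrative choices.

```python
import operator


def ewise_add(A, B, add=operator.add):
    """Union of stored entries; overlapping entries are combined with add."""
    C = dict(A)
    for key, b in B.items():
        C[key] = add(C[key], b) if key in C else b
    return C


def ewise_mult(A, B, mul=operator.mul):
    """Intersection of stored entries, combined with mul."""
    return {key: mul(a, B[key]) for key, a in A.items() if key in B}


# Two illustrative graphs that share only the edge (0, 1).
A = {(0, 1): 2, (1, 2): 3}
B = {(0, 1): 5, (2, 0): 7}
union = ewise_add(A, B)
intersection = ewise_mult(A, B)
print(union, intersection)
```

Skipping entries that appear in only one operand is valid for multiplication because the 0 element annihilates, and copying them unchanged is valid for addition because the 0 element is the additive identity.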
A standard such as the GraphBLAS can only be effective if it does not impose unnecessary overhead on the computations it performs. One test of the overhead is to compare the GraphBLAS implementation to other standard sparse matrix libraries. Figure 10 shows the performance of one prototype GraphBLAS implementation compared to a state-of-the-art GPU graph library (Gunrock) [Wang et al 2016].
The datasets used are random undirected Kronecker graphs with edge factor 32 and scale factor ranging from 16 to 21. Each experiment conducts a BFS starting from a high-degree node in the graph. The BFS is launched from node 0 on every graph except the scale 19 graph, where it is launched from node 1. The runtime is an average of 10 runs to reduce variance. The GraphBLAS performance of sparse matrix-sparse vector multiplication is similar to Gunrock BFS performance, which indicates that the GraphBLAS is not introducing a high overhead.
We ran all experiments in this paper on a Linux workstation with 3.50 GHz Intel 4-core E5-2637 v2 Xeon CPUs, 256 GB of main memory, and an NVIDIA K40c GPU with 12 GB on-board memory. The GPU programs were compiled with NVIDIA’s nvcc compiler (version 7.5.17) using the -O3 optimization level. The C code was compiled using gcc 4.8.5. All results ignore transfer time (from disk-to-memory and CPU-to-GPU). The Gunrock code was executed using the command-line configuration --undirected --traversal-mode=1 --iteration-num=10.
Figure 11 shows the overhead of a second prototype GraphBLAS implementation, the GraphBLAS Template Library (GBTL) [Zhang et al 2016]. We measured the GraphBLAS API overhead of GBTL on a machine with an Intel i5-4670k processor and a GTX660 CUDA-capable graphics card. The overhead results reflect the percentage difference in runtime between the CUDA backend of GBTL invoked through the GraphBLAS API and direct calls to the underlying implementation. We obtained the numbers by averaging the overhead of 16 runs on Erdős-Rényi random graphs generated using the same dimension and sparsity. The code was compiled using the -O2 optimization level on version 7.5.18 of the CUDA toolkit with gcc 4.9.3. The results indicate that the overhead of the GraphBLAS is small compared to the underlying math being performed.
Matrices are a powerful tool for representing and manipulating graphs. Adjacency matrices represent directed-weighted-graphs with each row and column in the matrix representing a vertex and the values representing the weights of the edges. Incidence matrices represent directed-weighted-multi-hyper-graphs with each row representing an edge and each column representing a vertex. Perhaps the most important aspects of matrix-based graphs are the mathematical properties of commutativity, associativity, and distributivity. These properties allow a very small number of matrix operations to be used to construct a large number of graphs. These properties of the matrix are determined by the element-wise properties of addition and multiplication on the values in the matrix. The GraphBLAS allows these matrix properties to be readily applied to graphs in a low-overhead manner.
The authors would like to thank Hedayat Alghassi, Michael Anderson, Ariful Azad, Muthu Baskaran, Paul Burkhardt, Steven Dalton, Tim Davis, Joe Eaton, Alan Edelman, Sterling Foster, Vijay Gadepally, Joseph Gonzalez, Torsten Hoefler, Erik Holk, Thejaka Kanewala, Tze Meng Low, Dave Martinez, John Matty, Asit Mishra, Samantha Misurda, Mostofa Patwary, Fabrizio Petrini, Albert Reuther, Jason Riedy, Victor Roytburd, Nadathur Satish, Narayanan Sundaram, Richard Veras, Michael Wolf, Albert-Jan Yzelman, Peter Zhang, and Xia Zhu.
- [Anderson et al 2016] M. Anderson, N. Sundaram, N. Satish, M. Patwary, T. L. Willke, & P. Dubey, GraphPad: Optimized Graph Primitives for Parallel and Distributed Platforms, Proceedings of the IPDPS, 2016.
- [Azad et al 2015] A. Azad, G. Ballard, A. Buluç, J. Demmel, L. Grigori, O. Schwartz, S. Toledo, & S. Williams, Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication, Technical Report 1510.00844.arXiv
- [Ballard et al 2013] G. Ballard, A. Buluç, J. Demmel, L. Grigori, B. Lipshitz, O. Schwartz, & S. Toledo, Communication optimal parallel multiplication of sparse random matrices, In Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures (pp. 222-231), 2013
- [Bergamini & Meyerhenke 2016] E. Bergamini & H. Meyerhenke, Approximating Betweenness Centrality in Fully-dynamic Networks. Accepted by Internet Mathematics. Taylor and Francis Group. To appear.
- [Buluç & Gilbert 2011] A. Buluç & J. Gilbert, The Combinatorial BLAS: Design, implementation, and applications. International Journal of High Performance Computing Applications (IJHPCA), 2011
- [Buluç & Gilbert 2012] A. Buluç & J. Gilbert, Parallel sparse matrix-matrix multiplication and indexing: Implementation and experiments, SIAM Journal on Scientific Computing 34.4 (2012): C170-C191
- [Buluç 2015] A. Buluç, GraphBLAS Special Session, IEEE HPEC 2015, Waltham, MA
- [Brualdi 1967] R.A. Brualdi, Kronecker products of fully indecomposable matrices and of ultrastrong digraphs, Journal of Combinatorial Theory, 2:135-139, 1967
- [Chakrabarti 2004] D. Chakrabarti, Y. Zhan, and C. Faloutsos, R-MAT: A recursive model for graph mining. SIAM Data Mining, 2004.
- [Ediger et al 2010] D. Ediger, K. Jiang, J. Riedy, and D.A. Bader, Massive Streaming Data Analytics: A Case Study with Clustering Coefficients, 4th Workshop on Multithreaded Architectures and Applications (MTAAP), Atlanta, GA, April 23, 2010
- [Ediger et al 2011] D. Ediger, J. Riedy, H. Meyerhenke, and D.A. Bader, Tracking Structure of Streaming Social Networks, 5th Workshop on Multithreaded Architectures and Applications (MTAAP), Anchorage, AK, May 20, 2011
- [Ediger et al 2012] D. Ediger, R. McColl, J. Riedy, and D.A. Bader, STINGER: High Performance Data Structure for Streaming Graphs, The IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, September 20-22, 2012
- [Ediger & Bader 2013] D. Ediger and D.A. Bader, Investigating Graph Algorithms in the BSP Model on the Cray XMT, 7th Workshop on Multithreaded Architectures and Applications (MTAAP), Boston, MA, May 24, 2013
- [Ekanadham et al 2016] K. Ekanadham, B. Horn, J. Jann, M. Kumar, J. Moreira, P. Pattnaik, M. Serrano, G. Tanase, H. Yu, Graph programming interface (GPI): a linear algebra programming model for large scale graph computations, Proceedings of the ACM International Conference on Computing Frontiers (CF’16), 72-81, 2016.
- [Georganas et al 2014] E. Georganas, A. Buluç, J. Chapman, L. Oliker, D. Rokhsar and K. Yelick, Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14), 2014
- [Harary & Tauth 1964] F. Harary & C.A. Tauth, Connectedness of products of two directed graphs, SIAM Journal on Applied Mathematics, 14:250-254, 1966
- [Harary 1969] F. Harary, Graph Theory, Reading:Addison-Wesley, 1969
- [Hutchison et al 2015] D. Hutchison, J. Kepner, V. Gadepally, & A. Fuchs, Graphulo implementation of server-side sparse matrix multiply in the Accumulo database, IEEE High Performance Extreme Computing (HPEC) Conference, Waltham, MA, September 2015.
- [Kepner & Gilbert 2011] J. Kepner & J. Gilbert (editors), Graph Algorithms in the Language of Linear Algebra, SIAM Press, Philadelphia, 2011
- [Kepner et al 2012] J. Kepner, W. Arcand, W. Bergeron, N. Bliss, R. Bond, C. Byun, G. Condon, K. Gregson, M. Hubbell, J. Kurz, A. McCabe, P. Michaleas, A. Prout, A. Reuther, A. Rosa & C. Yee, Dynamic Distributed Dimensional Data Model (D4M) Database and Computation System, ICASSP (International Conference on Acoustics, Speech, and Signal Processing), 2012, Kyoto, Japan
- [Konig 1931] D. Konig, Graphen und Matrizen (Graphs and Matrices), Matematikai Lapok, 38:116-119, 1931.
- [Konig 1936] D. Konig, Theorie der endlichen und unendlichen Graphen (Theory of finite and infinite graphs), Leipzig:Akademie Verlag M.B.H., 1936; see Richard McCourt (Birkhauser 1990) for an English translation of this classic work
- [Leskovec 2005] J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. European Conference on Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2005), Porto, Portugal, 2005
- [Mattson et al 2013] T. Mattson, D. Bader, J. Berry, A. Buluç, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, J. Kepner, C. Leiserson, A. Lumsdaine, D. Padua, S. Poole, S. Reinhardt, M. Stonebraker, S. Wallach, & A. Yoo, Standards for Graph Algorithms Primitives, IEEE HPEC 2013, Waltham, MA
- [Mattson 2014a] T. Mattson, Workshop on Graph Algorithms Building Blocks, IPDPS 2014, Phoenix, AZ
- [Mattson 2014b] T. Mattson, GraphBLAS Special Session, IEEE HPEC 2014, Waltham, MA
- [Mattson 2015] T. Mattson, Workshop on Graph Algorithms Building Blocks, IPDPS 2015, Hyderabad, India
- [Mattson 2016] T. Mattson, Workshop on Graph Algorithms Building Blocks, IPDPS 2016, Chicago, IL
- [McAndrew 1963] M.H. McAndrew, On the product of directed graphs, Proceedings of the American Mathematical Society, 14:600-606, 1963
- [McAndrew 1965] M.H. McAndrew, On the polynomial of a directed graph, Proceedings of the American Mathematical Society, 16:303-309, 1965
- [McLaughlin & Bader 2014a] A. McLaughlin and D.A. Bader, Revisiting Edge and Node Parallelism for Dynamic GPU Graph Analytics, 8th Workshop on Multithreaded Architectures and Applications (MTAAP), Phoenix, AZ, May 23, 2014
- [McLaughlin & Bader 2014b] A. McLaughlin and D.A. Bader, Scalable and High Performance Betweenness Centrality on the GPU, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14), New Orleans, LA, November 16-21, 2014
- [McLaughlin et al 2014] A. McLaughlin, J. Riedy, and D.A. Bader, Optimizing Energy Consumption and Parallel Performance for Betweenness Centrality using GPUs, The 18th Annual IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, September 9-11, 2014
- [Meyerhenke et al 2015] H. Meyerhenke, P. Sanders, C. Schulz, Parallel Graph Partitioning for Complex Networks, 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2015)
- [Riedy & Bader 2013] J. Riedy & D.A. Bader, Multithreaded Community Monitoring for Massive Streaming Graph Data, 7th Workshop on Multithreaded Architectures and Applications (MTAAP), Boston, MA, May 24, 2013
- [Riedy et al 2012] J. Riedy, H. Meyerhenke, and D.A. Bader, Scalable Multi-threaded Community Detection in Social Networks, 6th Workshop on Multithreaded Architectures and Applications (MTAAP), Shanghai, China, May 25, 2012
- [Sabadusi 1960] G. Sabadusi, Graph multiplication, Mathematische Zeitschrift, 72:446-457, 1960
- [Solomonik et al 2013] E. Solomonik, A. Buluç, & J. Demmel, Minimizing communication in all-pairs shortest paths. In IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 548-559, 2013
- [Song et al 2010] W.S. Song, J. Kepner, H.T. Nguyen, J.I. Kramer, V. Gleyzer, J.R. Mann, A.H. Horst, L.L. Retherford, R.A. Bond, N.T. Bliss, E.I. Robinson, S. Mohindra, and J. Mullen, 3-D Graph Processor, Workshop on High Performance Embedded Computing, September 2010
- [Song et al 2013] W.S. Song, J. Kepner, V. Gleyzer, H.T. Nguyen, and J.I. Kramer, Novel Graph Processor Architecture, MIT Lincoln Laboratory Journal, vol. 20, no. 1, pp. 92-104, 2013
- [Staudt & Meyerhenke 2016] C.L. Staudt & H. Meyerhenke, Engineering Parallel Algorithms for Community Detection in Massive Networks, IEEE Transactions on Parallel and Distributed Systems vol. 27, no. 1, pp. 171-184, 2016.
- [Staudt et al 2016] C.L. Staudt, A. Sazonovs, H. Meyerhenke, NetworKit: A Tool Suite for Large-scale Network Analysis, Network Science, Cambridge University Press
- [Teh & Yap 1964] H.H. Teh & H.D. Yap, Some construction problems of homogeneous graphs, Bulletin of the Mathematical Society of Nanying University, 164-196, 1964
- [Van Loan 2000] C.F. Van Loan. The ubiquitous Kronecker product. Journal of Computational and Applied Mathematics, 123(1-2):85–100, 2000
- [Wang et al 2016] Y. Wang, A. Davidson, Y. Pan, Y. Wu, A. Riffel & J.D. Owens, Gunrock: A high-performance graph processing library on the GPU, 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, March 2016
- [Weischel 1962] P.M. Weischel. The Kronecker product of graphs, Proceedings of the American Mathematical Society, 13(1):47–52, 1962
- [Zhang et al 2016] P. Zhang, M. Zalewski, A. Lumsdaine, S. Misurda, & S. McMillan, GBTL-CUDA: Graph Algorithms and Primitives for GPUs, GABB workshop at IPDPS 2016