I Introduction
Large-scale networks are widely used to represent object relationships in real-world applications. As many applications involve large-scale networks with complex structures, it is generally hard to explore and analyze the key properties directly from these networks. Hence, network summarization techniques are commonly used to facilitate the process.
The general goal of network summarization is to approximate a large network with a coarsened network that has substantially fewer nodes and links but preserves certain network properties. Group-based network summarization usually begins by grouping the nodes into non-overlapping subsets such that nodes within the same subset share some specific properties [1]. One branch partitions the graph nodes into clusters such that nodes within the same cluster are strongly connected to each other, while nodes in different clusters are weakly connected [2, 3], as shown in Figure 1. However, such schemes only generate coarsened networks with strong community structures and fail to preserve other important structural properties of the original network. Another branch partitions the graph nodes by similar connectivity patterns [4, 5]. The main purpose is to group duplicate or similar nodes, in terms of their connectivity patterns to other nodes, so as to reduce the information loss during the coarsening process. However, it is challenging to quantify the connectivity patterns for each node and to design objective functions that yield such connectivity-based network partitions.
In this paper, we tackle the network summarization problem with tools from spectral graph theory. The field of spectral graph theory mainly studies graph spectra, which are the eigenvalues of the adjacency matrix or its Laplacian forms, in relation to specific graph structures. Networks with similar spectral properties are generally regarded to share similar structural patterns [7], especially for large-scale networks. The main contributions of this paper are summarized as follows:

We present a new framework for addressing the network summarization problem based on spectral graph theory.

We propose a novel notion of spectral distance between the original and coarsened networks and rigorously establish several results about the spectral distance.

We provide an efficient algorithm to generate the coarsened networks and show that, compared to other network summarization algorithms in the literature, these coarsened networks better preserve the spectral properties of the original networks across a wide range of networks.
The rest of this paper is organized as follows. Section 2 covers the related work for the network summarization problem. Section 3 introduces the basic concepts and notations used in this paper as well as the formal definition of network summarization problem. Our notion of spectral distance and its important properties are introduced in Section 4. Section 5 covers the spectral partitioning algorithm as well as the brief description of the parallel implementation. Experimental results on largescale networks are given in Section 6 and Section 7 contains the conclusion.
II Related Work
In this section, we review previous works on the network summarization problem as well as other branches that are closely related to the problem.
II-A Network summarization
Previous works on the network summarization problem mainly focused on the design of the objective function that characterizes the distance between the original network and the coarsened network. A comprehensive survey is provided in [6]. Note that the problem is also termed the "graph compression" problem in some previous work [5]. The definition of the distance function plays the most important role in the network summarization problem. Hannu et al. proposed algorithms to tackle both the simple and generalized graph summarization problems [5]. For the simple problem, the distance function is defined as the average of the edgewise differences between the original network and the "reconstructed network". For the generalized problem, the distance is further generalized to the differences in overall connectivity. Saket et al. applied the Minimum Description Length (MDL) principle to develop solutions for the summarization of unweighted graphs [4]. Manish et al. proposed an efficient algorithm to preserve the key characteristics of diffusion processes, measured by the difference between the first eigenvalues of the corresponding adjacency matrices [8].
II-B Network sampling
The goal of the network sampling problem is to sample a subnetwork that represents the whole network while preserving certain properties [9, 10]. Although the goals of network sampling and network summarization are similar, both aiming to generate a smaller network that preserves certain properties of the original network, the methodology of network sampling is different from group-based network summarization techniques. Network sampling mainly focuses on strategies for selecting the nodes and edges that best represent the original network.
II-C Graph partitioning
The graph partitioning problem has been extensively studied in many fields such as community detection, VLSI design, and scientific computing. The basic graph partitioning problem is to decompose a network into small components that satisfy specific properties. One common variant, known as the graph clustering problem, is to decompose a network into several components such that nodes within the same cluster are strongly connected to each other, while nodes in different clusters are weakly connected [2, 3]. As many graph partitioning problems are NP-hard by nature, algorithms have been proposed to obtain approximate solutions for specific tasks [11]. Software packages, such as METIS [12] and GRACLUS [13], have been developed and are widely used for many graph partitioning tasks.
II-D Graph sparsification
The graph sparsification problem is to approximate a graph with a sparse graph while preserving certain important properties of the original graph [14, 15, 16]. The sparse graph has the same number of nodes as the original graph but far fewer edges. The distance between the original and sparse networks is measured by the differences in their Laplacian spectra. The sparsification process allows various matrix algorithms, whose computational complexity depends on the number of edges, to run much faster.
II-E Applications
Network summarization techniques are used in a wide range of applications:

Social networks: In social network analysis, network summarization has been used to delineate population groups with common interests based on the structural equivalence of individuals [17]. Structural equivalence has been considered under the block model with the aim of identifying groups with different connectivity patterns.

Biological networks: Large-scale biological networks, such as protein networks and brain networks, have been partitioned into biologically functional modules, which can aid in the detection and identification of subjects with certain diseases [18, 19]. For protein networks, graph summarization (GS) techniques are used to cluster protein interaction networks into biologically relevant modules, where a biological module is defined as a set of proteins that have similar sets of interaction patterns [18]. For brain networks, neurons that have similar connectivity patterns to other neurons are regarded as forming the basic functional units of the human brain [19, 20].
Entity resolution: The entity resolution problem aims to find all the mentions that represent the same real-world entity in a large knowledge graph [21, 22]. The problem can be formulated as a multi-type graph summarization problem that clusters graph nodes referring to the same entity into one supernode and creates weighted links among the supernodes that summarize the inter-cluster links of the original graph.
III Network Summarization Problem
III-A Preliminaries
We start by defining some basic concepts and introducing some notations. In this paper, networks and graphs are used interchangeably, and the networks are typically weighted and undirected.
Definition 1.
A weighted graph is represented as $G = (V, W)$, where $V = \{v_1, \dots, v_n\}$ is the set of nodes and $W$ is the weight matrix, whose entry $W_{ij}$ denotes the edge weight between $v_i$ and $v_j$. $W_{ij} = 0$ indicates that there is no edge between $v_i$ and $v_j$. For unweighted graphs, $W_{ij}$ is either $1$ (edge) or $0$ (no edge). In this work, we only consider undirected graphs, i.e., $W$ is a real symmetric matrix.
The network partitioning can be broadly defined as follows,
Definition 2.
The network partitioning of $G = (V, W)$ partitions the graph nodes $V$ into non-overlapping subsets $\{S_1, \dots, S_m\}$, where $\bigcup_{k=1}^{m} S_k = V$ and $S_k \cap S_l = \emptyset$ for all $k \neq l$. In general, we would like the partitions to possess some properties, as will be explained later. The correspondence between a node and its corresponding partition is expressed through a mapping function $\pi: V \to \{1, \dots, m\}$ such that $\pi(i) = k$ for $v_i \in S_k$.
In this paper, we use a matrix $H \in \{0, 1\}^{n \times m}$ consisting of the set of indicator vectors $\{h_1, \dots, h_m\}$ to represent the network partitions, where each vector $h_k$ reflects the membership relationship for cluster $S_k$ as follows:
(1)  $h_k(i) = 1$ if $v_i \in S_k$, and $h_k(i) = 0$ otherwise.
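As a concrete sketch of this construction (using numpy; the helper name and the array-based encoding of the membership labels are our own, not from the paper), the indicator matrix can be assembled as:

```python
import numpy as np

def indicator_matrix(labels, m):
    """Build the n-by-m partition indicator matrix H from cluster labels.

    labels[i] = k means node v_i belongs to partition S_k (0-indexed).
    Column k of H is the indicator vector h_k of partition S_k.
    """
    n = len(labels)
    H = np.zeros((n, m))
    H[np.arange(n), labels] = 1.0  # h_k(i) = 1 iff v_i in S_k
    return H

# Example: 5 nodes grouped into 2 partitions.
H = indicator_matrix([0, 0, 1, 1, 1], 2)
```

Each row of $H$ contains exactly one nonzero entry, since the partitions are non-overlapping and cover all nodes.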
III-B Problem formulation
The general network summarization problem is to approximate a large network $G = (V, W)$ with a smaller network $G_c = (V_c, W_c)$ such that the coarsened network preserves certain properties of the original network as much as possible. The network summarization process is based upon a fixed network partition $\{S_1, \dots, S_m\}$ such that the nodes of the coarsened network correspond to the partitions and the edge weights are computed as
(2)  $W_c(k, l) = \sum_{v_i \in S_k} \sum_{v_j \in S_l} W_{ij}.$
The coarsened network is formed by merging the nodes in the same partition into a "supernode", and the "superedge" weight is determined by the accumulated connectivity weights of all edges between pairs of nodes within the corresponding supernodes.
In a more compact matrix formulation, the weights of the coarsened network can be computed as
(3)  $W_c = H^\top W H.$
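The coarsening step of Equation (3) is a single matrix product. A minimal numpy sketch (the example graph, two triangles joined by one edge, is ours):

```python
import numpy as np

def coarsen(W, H):
    """Coarsened weight matrix W_c = H^T W H (Equation 3).

    Entry (k, l) accumulates all edge weights between partitions S_k and S_l.
    """
    return H.T @ W @ H

# Two triangles joined by a single edge, grouped into two supernodes.
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
H = np.zeros((6, 2)); H[:3, 0] = 1; H[3:, 1] = 1
Wc = coarsen(W, H)   # [[6, 1], [1, 6]] -- intra-partition weights counted in both directions
```

Note that the diagonal entries accumulate each within-partition edge twice (once in each direction), consistent with summing $W_{ij}$ over all ordered pairs in Equation (2).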
There is no single criterion for constructing the coarsened network from the original network. Figure 2 shows that even a simple original network can result in two significantly different coarsened networks. Hence the distance function measuring the differences between the original and coarsened networks has to be flexible enough to accurately capture the distance between the original network and coarsened networks with different structures.
The next few sections will focus on the design of the distance function to measure the difference between the original and coarsened networks as well as the algorithm to obtain the coarsened network.
IV Spectral distance
This section covers the detailed description of the notion of spectral distance. The definition of the spectral distance will be given first, followed by the detailed justification.
IV-A Definition
The spectral distance between two networks $G = (V, W)$ and $G_c = (V_c, W_c)$, with $|V| = n$ and $|V_c| = m$ respectively, is defined based on the difference between the spectra of the normalized graph Laplacians of the two networks. The original network is assumed to contain no isolated nodes, that is, $d_i = \sum_j W_{ij} > 0$ holds for any $v_i \in V$.
In spectral graph theory, the graph Laplacian of network $G$ is defined as
(4)  $L = D - W,$
where $D$ is the diagonal degree matrix with entries $D_{ii} = d_i = \sum_j W_{ij}$. The graph Laplacian satisfies the following property:
(5)  $x^\top L x = \frac{1}{2} \sum_{i,j} W_{ij} (x_i - x_j)^2$
for any $x \in \mathbb{R}^n$.
The normalized Laplacian matrix is defined as
(6)  $\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}.$
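A small sketch of computing the normalized Laplacian of Equation (6) and its spectrum with numpy (dense, for illustration only; the helper name is ours):

```python
import numpy as np

def normalized_laplacian(W):
    """Normalized Laplacian  D^{-1/2} (D - W) D^{-1/2}  (Equation 6).

    Assumes no isolated nodes, so every degree d_i > 0.
    """
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt

# Spectrum of a path on 3 nodes: eigenvalues lie in [0, 2], smallest is 0.
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
evals = np.sort(np.linalg.eigvalsh(normalized_laplacian(W)))
```

For the 3-node path the spectrum is exactly $\{0, 1, 2\}$, illustrating both the range $[0, 2]$ and the eigenvalue $0$ of a connected network.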
The spectra of the normalized graph Laplacian are simply the eigenvalues of the normalized Laplacian matrix $\mathcal{L}$. We denote the spectrum of the original network as $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ with the corresponding eigenvectors $u_1, \dots, u_n$, and the spectrum of the coarsened network as $\mu_1 \le \mu_2 \le \dots \le \mu_m$ with the corresponding eigenvectors $\tilde{u}_1, \dots, \tilde{u}_m$.

Definition 3.
The spectral distance with parameter $k$ is defined as
(7)  $SD_k(G, G_c) = \sum_{i=1}^{k} |\mu_i - \lambda_i| + \sum_{i=k+1}^{m} |\mu_i - \lambda_{n-m+i}|.$
The spectral distance compares the eigenvalues of the coarsened network with the head and tail eigenvalues of the original network. The parameter $k$ provides the flexibility to control which part of the spectrum is being matched.
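A numpy sketch of the distance as reconstructed in Equation (7) above (the function name and the toy eigenvalue arrays are ours):

```python
import numpy as np

def spectral_distance(lam, mu, k):
    """Spectral distance SD_k between two spectra (cf. Equation 7).

    lam: sorted normalized-Laplacian eigenvalues of the original network (n).
    mu:  sorted eigenvalues of the coarsened network (m <= n).
    The first k coarsened eigenvalues are matched against the head of lam,
    the remaining m - k against its tail.
    """
    n, m = len(lam), len(mu)
    head = np.abs(mu[:k] - lam[:k]).sum()
    tail = np.abs(mu[k:] - lam[n - m + k:]).sum()
    return head + tail

lam = np.array([0.0, 0.1, 0.9, 1.1, 1.9, 2.0])
mu = np.array([0.0, 0.1, 2.0])
d = spectral_distance(lam, mu, k=2)  # head matches [0, 0.1], tail matches [2.0]
```

For this toy spectrum, $k = 2$ gives distance $0$ since both head and tail match exactly, while $k = 0$ forces all three coarsened eigenvalues to match the tail and yields a larger distance.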
Figure 3 shows three examples of an original network and the "true" coarsened network, together with the corresponding normalized Laplacian spectra. The "true" network partitions for summarizing the networks are formed by grouping duplicate nodes with exactly the same connectivity patterns. These examples illustrate the flexibility of our notion of spectral distance in capturing the distance between different types of networks.
Other notions of spectral distance are mainly concerned with networks of the same size and are usually defined as the pairwise differences between corresponding eigenvalues [23, 24, 25]. To our knowledge, we are the first to propose a notion of spectral distance for comparing networks of significantly different sizes, which may be of independent interest in other fields.
IV-B Relationship between the Normalized Laplacian and Network Structure
In spectral graph theory, the spectra of the weight (often adjacency) matrix $W$, the graph Laplacian $L$, and the normalized graph Laplacian $\mathcal{L}$ have been studied extensively and have been shown to capture network properties from different perspectives. In this paper, we choose to use the spectrum of the normalized Laplacian for the following reasons.

The spectrum of the normalized graph Laplacian ranges between $0$ and $2$ for all networks.

The spectrum of the normalized Laplacian is invariant under scaling and isomorphism.

Previous studies empirically show that the cospectrality ratio, i.e. the proportion of the cospectral graphs over all possible graphs, is the lowest among the possible graph representations [7].

The normalized Laplacian matrix has been shown to outperform other graph matrices in identifying community structures and modularity of networks [26].
The relationships between the normalized Laplacian spectra and the network structures are summarized as follows,

The first few eigenvalues reflect the community structure. The multiplicity of eigenvalue $0$ is equal to the number of connected components in the network. Small eigenvalues indicate a strong community structure, which can be quantified by Cheeger's inequality relating $\lambda_2$ to the conductance of the graph:
(8)  $\frac{\lambda_2}{2} \le \Phi(G) \le \sqrt{2 \lambda_2},$
where the conductance of the graph reflects the community structure and is defined as
(9)  $\Phi(G) = \min_{S \subset V} \frac{|E(S, \bar{S})|}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}.$
Here $S$ is a node set, $E(S, \bar{S})$ is the edge set containing the edges connecting nodes in $S$ and $\bar{S} = V \setminus S$, and $\mathrm{vol}(S)$ is the sum of the degrees of all nodes in $S$.
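The conductance of a given cut can be evaluated directly from the weight matrix. A numpy sketch (the bridged-triangles example and helper name are ours):

```python
import numpy as np

def conductance(W, S):
    """Conductance of a node set S (cf. Equation 9): total weight of edges
    crossing the cut over the smaller of the two volumes."""
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[list(S)] = True
    cut = W[mask][:, ~mask].sum()      # weight of edges between S and V \ S
    vol_S = W[mask].sum()              # sum of degrees of nodes in S
    vol_rest = W[~mask].sum()
    return cut / min(vol_S, vol_rest)

# Two triangles joined by a bridge: the natural cut has conductance 1/7.
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
phi = conductance(W, [0, 1, 2])        # cut weight 1, volume 7 on each side
```

A small conductance such as $1/7$ signals a strong community structure, consistent with a small $\lambda_2$ through Cheeger's inequality.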
IV-C Properties of the spectral distance
In this section, we present some important properties of the proposed spectral distance, which will be shown to be useful in the network summarization problem.
We start by stating an interlacing theorem, making use of the weak covering results in [30].
Theorem 1.
Let $G$ and $G_c$ be two networks whose edge weights satisfy $W_c = H^\top W H$, where $H$ has the structure in Equation (1) (we also say that $G$ is a weak cover of $G_c$). Then the eigenvalues of the normalized Laplacian matrices satisfy the following inequality:
(10)  $\lambda_i \le \mu_i \le \lambda_{n-m+i}, \quad i = 1, \dots, m.$
Proof.
For $y \in \mathbb{R}^m$, consider the lifted vector $x = D^{1/2} H D_c^{-1/2} y \in \mathbb{R}^n$. Since $H^\top D H = D_c$, the lifting preserves norms, $x^\top x = y^\top y$, and quadratic forms, $x^\top \mathcal{L} x = y^\top \mathcal{L}_c y$. Let $U_i$ be the $i$-dimensional subspace obtained by lifting the span of $\tilde{u}_1, \dots, \tilde{u}_i$; then by the Courant–Fischer theorem,
(11)  $\lambda_i = \min_{\dim(U) = i} \max_{0 \ne x \in U} \frac{x^\top \mathcal{L} x}{x^\top x} \le \max_{0 \ne x \in U_i} \frac{x^\top \mathcal{L} x}{x^\top x} = \mu_i.$
For the other direction, let $U'_i$ be the $(m - i + 1)$-dimensional subspace obtained by lifting the span of $\tilde{u}_i, \dots, \tilde{u}_m$; then
(12)  $\lambda_{n-m+i} = \max_{\dim(U) = m-i+1} \min_{0 \ne x \in U} \frac{x^\top \mathcal{L} x}{x^\top x} \ge \min_{0 \ne x \in U'_i} \frac{x^\top \mathcal{L} x}{x^\top x} = \mu_i.$
∎
The interlacing theorem (Theorem 1) is an important result that tightly bounds the eigenvalues of the coarsened network by the eigenvalues of the original network.
The spectral distance then has the following property.
Proposition 1.
$SD_k(G, G_c) \ge 0$ for any $k$.
Proof.
Each term, either $\mu_i - \lambda_i$ or $\lambda_{n-m+i} - \mu_i$, in the spectral distance is greater than or equal to $0$ according to the interlacing theorem. Hence $SD_k(G, G_c) \ge 0$. ∎
We next give a special case where the spectral distance between the original and coarsened networks is $0$ for some parameter $k$. We first define the recoverability of the original network from the coarsened network with respect to the network partitions with mapping function $\pi$.
Definition 4.
The original network $G$ can be recovered from the coarsened network $G_c$ with respect to the network partitions if there exist positive constants $\{\alpha_i\}_{i=1}^{n}$ such that the edge weights of the original network satisfy
(13)  $W_{ij} = \alpha_i \alpha_j \, W_c(\pi(i), \pi(j)).$
The following theorem gives a special case where the spectral distance is exactly $0$, as stated below.
Theorem 2.
If the original network $G$ can be recovered from the coarsened network $G_c$ with respect to the network partitions, then there exists $k$ such that $SD_k(G, G_c) = 0$.
The following theorem on motif doubling states a result on the multiplicity of the eigenvalue $1$ in the spectrum of the normalized Laplacian, which is useful for proving Theorem 2 [31, 28].
Definition 5.
A motif $M$ is a connected subgraph of $G$ that contains all edges of $G$ between the nodes of $M$. In general, motifs are subgraphs that occur frequently in a network.
Theorem 3.
Let $M$ be a motif of a graph $G$ with vertices $v_1, \dots, v_p$. Suppose the network $G'$ is obtained from $G$ by adding a copy of the motif, consisting of the nodes $\tilde{v}_1, \dots, \tilde{v}_p$ and the corresponding connections among them, and connecting each $\tilde{v}_i$ with all $v_j$ that are neighbors of $v_i$. Then $G'$ possesses an eigenvalue $1$ of the normalized Laplacian with a corresponding eigenvector that is nonzero only at $v_1, \dots, v_p$ and $\tilde{v}_1, \dots, \tilde{v}_p$.
A special case is vertex doubling, where the motif is simply a single node.
Corollary 1.
Let the network $G'$ be obtained by adding a vertex $\tilde{v}$ to the network $G$, where the complete connectivity profile of $\tilde{v}$ is proportional to that of a vertex $v$ in $G$. The normalized Laplacian of $G'$ has exactly the same eigenvalues as that of $G$, except that it contains an additional eigenvalue $1$.
We now return to the proof of Theorem 2.
Proof.
As the normalized Laplacian spectrum is insensitive to scaling, we may take the constants in Definition 4 to be equal without loss of generality. The original network $G$ can then be regarded as obtained from the coarsened network $G_c$ by repeatedly adding copies of each node. Hence the normalized Laplacian of $G$ has the same eigenvalues as that of $G_c$, except that it contains additional multiplicities of the eigenvalue $1$. ∎
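The duplication argument in this proof can be checked numerically. In the sketch below (numpy; helper names ours), every node of a triangle is duplicated once, producing the complete tripartite network $K_{2,2,2}$, whose normalized Laplacian spectrum equals that of the triangle plus extra eigenvalues $1$:

```python
import numpy as np

def normalized_laplacian(W):
    d = W.sum(axis=1)
    Dis = np.diag(1.0 / np.sqrt(d))
    return Dis @ (np.diag(d) - W) @ Dis

def blow_up(Wc, copies):
    """Duplicate node k of the coarsened network copies[k] times;
    each copy inherits the connectivity pattern of its supernode."""
    labels = np.repeat(np.arange(len(copies)), copies)
    # connect copies of different supernodes with the supernode weight
    return Wc[np.ix_(labels, labels)] * (labels[:, None] != labels[None, :])

Wc = np.ones((3, 3)) - np.eye(3)          # triangle K3
W = blow_up(Wc, [2, 2, 2])                # complete tripartite K_{2,2,2}

mu = np.sort(np.linalg.eigvalsh(normalized_laplacian(Wc)))   # [0, 1.5, 1.5]
lam = np.sort(np.linalg.eigvalsh(normalized_laplacian(W)))   # [0, 1, 1, 1, 1.5, 1.5]
```

The spectrum of the blown-up network contains the full spectrum $\{0, 1.5, 1.5\}$ of the coarsened triangle, with the three added nodes contributing only eigenvalues $1$, so $SD_k = 0$ for a suitable $k$.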
Other distance measures, such as the simple and generalized distances in [5], are defined through the difference between the original network and a recovered network $\hat{G}$, whose edge weights are defined as
(15)  $\hat{W}_{ij} = \frac{W_c(\pi(i), \pi(j))}{|S_{\pi(i)}| \, |S_{\pi(j)}|}.$
It is easy to verify that the recovered network has exactly the same eigenvalues as the coarsened network except for multiplicities of $1$. When the original network is recoverable from the coarsened network, our spectral distance is thus consistent with the other notions of distance. Figure 4 summarizes this section by showing the relations between the spectra of the original network, the coarsened network, and the recovered network.
IV-D Justification with the stochastic block model
Besides the recoverable case mentioned previously, we use the stochastic block model to justify the notion of the spectral distance. The stochastic block model is a generative model for random graphs with explicit block structures [32], parameterized by $(\pi, B)$, where $\pi$ indicates the block membership and $B \in [0, 1]^{m \times m}$ gives the edge probabilities between blocks. The stochastic block model has the property that nodes within the same block have the same edge probabilities to all other nodes of the network. The edge probability between nodes in blocks $k$ and $l$ is determined by $B_{kl}$, and the edge probability matrix is
(16)  $P = H B H^\top, \quad \text{i.e.,} \quad P_{ij} = B_{\pi(i)\pi(j)}.$
The random network is generated from the stochastic block model parameterized by $(\pi, B)$, with each edge weight sampled from a Bernoulli distribution as
(17)  $W_{ij} \sim \mathrm{Bernoulli}(P_{ij}).$
The stochastic block model has been extensively studied in machine learning, statistics, and information theory [33, 34, 35]. The model is also ideal for studying the network summarization problem: nodes within the same block share similar connectivity patterns to all other nodes, and the "true" network partitions and the specific structure of the coarsened network are known a priori. We will show that the spectral distance accurately captures the network summarization of random networks generated from the stochastic block model. In the following example, we use a simplified model with parameters $(p, q)$, where $p$ is the edge probability between nodes within the same block and $q$ is the edge probability between nodes in different blocks. The mixed case is when one half of the blocks is parameterized by one such pair and the other half by another, with the two halves disconnected.
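A minimal numpy sketch of sampling such a network (function name ours; the two-block parameters are toy values):

```python
import numpy as np

def sample_sbm(block_sizes, B, rng):
    """Sample an undirected SBM network: W_ij ~ Bernoulli(B[pi(i), pi(j)])."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = B[np.ix_(labels, labels)]              # edge-probability matrix (Eq. 16)
    upper = np.triu(rng.random(P.shape) < P, k=1)
    W = (upper | upper.T).astype(float)        # symmetrize, no self-loops
    np.fill_diagonal(W, 0.0)
    return W, labels

rng = np.random.default_rng(0)
B = np.array([[0.5, 0.05], [0.05, 0.5]])       # p within blocks, q between blocks
W, labels = sample_sbm([50, 50], B, rng)
```

Only the upper triangle is sampled and then mirrored, so the result is a symmetric unweighted network consistent with Equation (17).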
Figure 5 shows example results for the stochastic block model. The original network is generated with equally sized blocks, and each experiment is repeated 50 times. The coarsened network is constructed using the "true" network partitions defined by the block model. All three examples show that the spectrum of the coarsened network matches either the head or the tail (or both) eigenvalues of the original network, with a small spectral distance for the respective values of $k$.
V Algorithm
V-A Spectral partitioning algorithm
The spectral partitioning algorithm uses the rows of the eigenvectors corresponding to the head and tail eigenvalues as feature vectors to obtain the coarsened networks, as described in Algorithm 1.
The spectral partitioning algorithm iterates over all possible values of the parameter $k$ to find the coarsened network with the smallest spectral distance. In each iteration, the algorithm tries to match the eigenvalues of the coarsened network to the smallest and largest eigenvalues of the original network. The k-means clustering algorithm is applied to the composite eigenvectors to derive an approximate solution for the network partitions. There are other heuristic ways of choosing $k$, as follows:
Extreme cases: $k = 0$ or $k = m$.

Choose $k$ such that the chosen eigenvalues are farthest away from the eigenvalue $1$.
When $k$ is large, the coarsened network exhibits community structure. In the special case $k = m$, the spectral distance algorithm reduces exactly to the vanilla spectral clustering algorithm that minimizes the normalized cut [11]. When $k$ is small, the coarsened network tends to preserve the higher end of the spectrum of the original network.
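To make the clustering step concrete, the following simplified sketch (numpy only; the farthest-point seeding and plain Lloyd's k-means stand in for the k-means solver of Algorithm 1, and all names are ours) clusters nodes by the rows of the selected eigenvectors:

```python
import numpy as np

def spectral_partition(W, m, k, iters=30):
    """Simplified spectral partitioning step: use the rows of the k head and
    (m - k) tail eigenvectors of the normalized Laplacian as node features
    and cluster them with plain Lloyd's k-means."""
    n = len(W)
    d = W.sum(axis=1)
    Dis = np.diag(1.0 / np.sqrt(d))
    L = Dis @ (np.diag(d) - W) @ Dis
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    feats = np.hstack([vecs[:, :k], vecs[:, n - (m - k):]])
    centers = feats[[0]]                       # deterministic farthest-point seeding
    for _ in range(m - 1):
        d2 = ((feats[:, None, :] - centers[None]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, feats[d2.argmax()]])
    for _ in range(iters):                     # Lloyd iterations
        labels = ((feats[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for c in range(m):
            if (labels == c).any():
                centers[c] = feats[labels == c].mean(0)
    return labels

# Two triangles joined by a weak bridge: k = m = 2 recovers the two communities.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels = spectral_partition(W, m=2, k=2)
```

With $k = m$ the features are exactly the leading eigenvectors, so this reduces to vanilla normalized spectral clustering, as noted above.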
V-B Complexity analysis
The computational complexity of the spectral partitioning algorithm is determined by the eigensolver and the k-means clustering step. With the Implicitly Restarted Lanczos Method (IRLM) [36], the worst-case complexity of computing the $m$ extreme eigenpairs of a sparse network with $|E|$ edges is $O(m|E| + nm^2)$. The computational complexity of the k-means clustering step is $O(nm^2 t)$, where $t$ is the number of iterations, usually bounded by a constant. Hence the worst-case overall complexity of the spectral partitioning algorithm is $O(m|E| + nm^2 t)$.
V-C GPU-based Implementation
We provide a fast GPU-based implementation of the spectral partitioning algorithm based on the parallel eigensolvers and k-means algorithm in [37, 38]. To accelerate the parallel eigensolver, we utilize the reverse communication interfaces of the ARPACK software and the cuSPARSE routines in the CUDA library to split the computation of the eigensolver between CPU and GPU. Specifically, the CPU computes the complex core procedures of IRLM while the GPU performs the expensive large-scale matrix operations.
The acceleration of the k-means algorithm is achieved by transforming all the computation into sparse matrix operations in a nontrivial way. We then leverage the GPU's parallel computational ability to perform the matrix operations, which achieves up to a 300x speedup compared to serial implementations in commodity software such as Matlab and Python packages [37].
VI Experiments
VI-A Datasets
We evaluate our network summarization scheme on a set of large-scale datasets from real-world applications as well as a synthetic network generated by the stochastic block model. Most of the real-world datasets are collected from the Stanford Large Network Dataset Collection (SNAP) [39]. Table I contains the basic statistics of the networks.
Dataset        Nodes    Edges     Mean degree
Brain network  142541   3992290   28.00
Facebook       4039     88234     21.85
ASTRO          18772    198110    10.55
DBLP           317080   1049866   3.31
Syn200         20000    773388    38.66
Amazon         334863   925872    2.76
Email          36692    183831    5.01
Brain network: This dataset contains a structural brain network that captures the fiber connectivity information in the human brain. Nodes in the brain network represent 3D brain voxels, and the edge weights reflect the connectivity strengths between them. The network summarization scheme is applied to obtain the network partitions as well as the coarsened network, which depict the global cortical organization and the underlying connections within the human brain [40].

Facebook: This dataset is a network collected from a Facebook application where each node represents an anonymous user and edges exist between users that share similar political interests.

ASTRO: This dataset is a collaboration network that covers the scientific co-authorship between authors who submitted to the Astro Physics category.

DBLP: This dataset consists of a comprehensive coauthorship network in computer science bibliography. Nodes represent the authors and edges exist between authors if they coauthored at least one publication [39].

SBM: This dataset (Syn200 in Table I) is randomly generated from the stochastic block model as described in Section IV-D. The synthetic sparse network is generated such that two nodes are connected with probability $p$ if they are within the same cluster and probability $q$ if they are in different clusters. The number of blocks is 200, each of which contains 100 nodes.

Amazon: This dataset reflects the copurchased relationship among products purchased at Amazon. Nodes represent products and edges exist between products that are commonly purchased together.

Email: This dataset is an email communication network covering all the email communication within a collection of around half a million emails. Nodes are email addresses and edges represent the communications between them.
All experiments are conducted on a machine with an 8-core Intel Xeon CPU (2.40 GHz), 64GB main memory, and 2 Tesla K80 GPUs with 10GB memory.
VI-B Baseline algorithms
We compare the spectral partitioning algorithm against the following baseline algorithms.
VI-B1 Randomized algorithm
Each network node is randomly assigned a label from $1$ to $m$. The network partitions are formed by grouping network nodes with the same label, and each group contains at least one node. As the randomized algorithm does not optimize any objective function, we expect it to perform the worst on all of the datasets.
Time complexity: The random partitioning process takes $O(n)$ and the generation of the coarsened network takes $O(|E|)$. The overall complexity is $O(n + |E|)$.
VI-B2 CoarseNet
CoarseNet is the algorithm proposed in [8] to obtain a coarsened network that preserves the key characteristics of diffusion processes on the network.
Time complexity: The worst-case time complexity reported in [8] is near-linear in the number of edges, with an additional dependence on the maximum degree over all nodes.
VI-B3 Greedy algorithm
Previous works, such as [5, 4], proposed greedy strategies to obtain the coarsened network. The naïve algorithm iteratively merges the two nodes whose merge yields the optimal distance between successive networks at each step. A two-hop optimization is used to reduce the computational cost by constraining the search space to candidate node pairs within two-hop distance [5].
In the experiments, we use the greedy algorithm described in [5], with the distance function defined as the average of the edgewise differences between the original network and the recovered network:
(18)  $d(G, \hat{G}) = \frac{2}{n(n-1)} \sum_{i < j} |W_{ij} - \hat{W}_{ij}|.$
The algorithm is described in Algorithm 2, where $G_{uv}$ denotes the coarsened network obtained by merging nodes $u$ and $v$ in the network $G$.
Time complexity: The worst case occurs when all nodes are within two-hop distance of each other, in which case the number of candidate pairs is quadratic in the number of nodes.
VI-C Performance evaluation
We evaluate these algorithms in terms of the spectral distance as well as the running time. Figure 6 shows the spectral distance in relation to the size of the coarsened networks generated by different algorithms, and Figure 7 shows the corresponding running times.
VI-C1 Spectral distance
For all datasets, the spectral partitioning algorithm generates the coarsened networks that achieve the best performance in terms of the spectral distance. For some datasets, CoarseNet performs almost as well as the spectral partitioning algorithm, mainly because the characteristic property preserved by CoarseNet is a spectral component of the spectral distance. The greedy algorithm performs poorly on almost all datasets, as it only considers edgewise differences and fails to capture the differences in global connectivity. The performance of the randomized algorithm largely depends on the intrinsic spectral properties of the original network. According to the semicircle law, the randomized algorithm tends to generate coarsened networks with eigenvalues centered around 1. When there are many extreme Laplacian eigenvalues, such as high multiplicities of 0's and 2's, the randomized algorithm performs poorly.
VI-C2 Runtime analysis
As CoarseNet and the greedy algorithm adopt a bottom-up strategy, generating coarsened networks of small sizes costs more time than generating networks of large sizes. Since the sizes of the coarsened networks are small, both CoarseNet and the greedy algorithm have almost constant time costs. As claimed in the original paper [8], CoarseNet has near-linear time complexity, so its time cost is small. The greedy algorithm, however, has much worse computational complexity, and hence its time costs are the highest for all networks. The time costs of the spectral partitioning algorithm increase significantly with the size of the coarsened network due to the complexity of the eigensolver. The spectral partitioning algorithm is therefore better suited to generating coarsened networks of small sizes.
VI-D Visualization result
Figure 8 provides a small example illustrating the summarization process. The network on the left is randomly sampled from the Email dataset and contains 50 nodes. The network contains a special star-like structure, in which one node connects to many other nodes, as well as a few other small connected components. The two adjacency matrices on the right show two possible coarsened networks. The coarsened network on the top shows that the original network contains a strongly connected component, consisting of the nodes of the "star" structure, together with a few small connected components. The coarsened network on the bottom shows that there are two partitions that are strongly connected to each other but weakly connected within each partition, obtained by separating the "star" into two bipartite partitions.
Both coarsened networks have very small spectral distance to the original network, and each captures specific properties of the original network. Moreover, both networks are obtained from the spectral partitioning algorithm, with two different values of the parameter $k$.
VII Conclusion
In this paper, we have introduced a new framework for solving the network summarization problem based on spectral graph theory. Our framework uses a novel notion of distance that quantifies the differences between original and coarsened networks of significantly different scales. In particular, we compare the eigenvalues of the normalized Laplacian of the coarsened network with the head and tail eigenvalues of the normalized Laplacian of the original network. We have rigorously justified this new notion of spectral distance and showed that it is accurate for tackling the network summarization problem, supported by experimental evidence with the stochastic block model. We are the first to introduce a parameter in the distance function that provides the flexibility to generate coarsened networks with significantly different structural properties, capturing different aspects of the original network. We have also proposed a fast implementation of the spectral partitioning algorithm to generate the coarsened networks and showed that our algorithm outperforms previous algorithms in terms of both spectral distance and running time.
References
 [1] A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz, “Recent advances in graph partitioning,” Preprint, 2013.
 [2] S. E. Schaeffer, “Graph clustering,” Computer science review, vol. 1, no. 1, pp. 27–64, 2007.
 [3] M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical review E, vol. 69, no. 2, p. 026113, 2004.
 [4] S. Navlakha, R. Rastogi, and N. Shrivastava, “Graph summarization with bounded error,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008, pp. 419–432.
 [5] H. Toivonen, F. Zhou, A. Hartikainen, and A. Hinkka, “Compression of weighted graphs,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011, pp. 965–973.
 [6] Y. Liu, A. Dighe, T. Safavi, and D. Koutra, “Graph summarization: A survey,” arXiv preprint arXiv:1612.04883, 2016.
 [7] E. R. Van Dam and W. H. Haemers, “Which graphs are determined by their spectrum?” Linear Algebra and its applications, vol. 373, pp. 241–272, 2003.
 [8] M. Purohit, B. A. Prakash, C. Kang, Y. Zhang, and V. Subrahmanian, “Fast influence-based coarsening for large networks,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014, pp. 1296–1305.
 [9] S.-D. Lin, M.-Y. Yeh, and C.-T. Li, “Sampling and summarization for social networks.”
 [10] N. K. Ahmed, J. Neville, and R. Kompella, “Network sampling: From static to streaming graphs,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 8, no. 2, p. 7, 2014.
 [11] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and computing, vol. 17, no. 4, pp. 395–416, 2007.
 [12] G. Karypis and V. Kumar, “Multilevelkway partitioning scheme for irregular graphs,” Journal of Parallel and Distributed computing, vol. 48, no. 1, pp. 96–129, 1998.
 [13] I. S. Dhillon, Y. Guan, and B. Kulis, “Weighted graph cuts without eigenvectors a multilevel approach,” IEEE transactions on pattern analysis and machine intelligence, vol. 29, no. 11, 2007.
 [14] D. A. Spielman and S.-H. Teng, “Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems,” in Proceedings of the thirty-sixth annual ACM symposium on Theory of computing. ACM, 2004, pp. 81–90.
 [15] D. A. Spielman and N. Srivastava, “Graph sparsification by effective resistances,” SIAM Journal on Computing, vol. 40, no. 6, pp. 1913–1926, 2011.
 [16] J. Batson, D. A. Spielman, N. Srivastava, and S.H. Teng, “Spectral sparsification of graphs: theory and algorithms,” Communications of the ACM, vol. 56, no. 8, pp. 87–94, 2013.
 [17] F. Lorrain and H. C. White, “Structural equivalence of individuals in social networks,” The Journal of mathematical sociology, vol. 1, no. 1, pp. 49–80, 1971.
 [18] S. Navlakha, M. C. Schatz, and C. Kingsford, “Revealing biological modules via graph summarization,” Journal of Computational Biology, vol. 16, no. 2, pp. 253–264, 2009.
 [19] Y. Jin, J. F. JaJa, R. Chen, and E. H. Herskovits, “A data-driven approach to extract connectivity structures from diffusion tensor imaging data,” in 2015 IEEE International Conference on Big Data. IEEE, 2015.
 [20] D. Moreno-Dominguez, A. Anwander, J. Haueisen, and T. R. Knösche, “Whole-brain cortical parcellation: A hierarchical method based on dMRI tractography,” Ph.D. dissertation, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, 2014.
 [21] L. Getoor and A. Machanavajjhala, “Entity resolution: theory, practice & open challenges,” Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 2018–2019, 2012.
 [22] L. Zhu, M. Ghasemi-Gol, P. Szekely, A. Galstyan, and C. A. Knoblock, “Unsupervised entity resolution on multi-type graphs,” in International Semantic Web Conference. Springer, 2016, pp. 649–667.
 [23] R. C. Wilson and P. Zhu, “A study of graph spectra for comparing graphs and trees,” Pattern Recognition, vol. 41, no. 9, pp. 2833–2841, 2008.
 [24] I. Jovanović and Z. Stanić, “Spectral distances of graphs,” Linear Algebra and its Applications, vol. 436, no. 5, pp. 1425–1435, 2012.
 [25] J. Gu, B. Hua, and S. Liu, “Spectral distances on graphs,” Discrete Applied Mathematics, vol. 190, pp. 56–74, 2015.
 [26] H.W. Shen and X.Q. Cheng, “Spectral methods for the detection of network community structure: a comparative analysis,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2010, no. 10, p. P10020, 2010.
 [27] S. de Lange, M. de Reus, and M. Van Den Heuvel, “The Laplacian spectrum of neural networks,” Frontiers in Computational Neuroscience, vol. 7, p. 189, 2014.
 [28] R. Mehatari and A. Banerjee, “Effect on normalized graph Laplacian spectrum by motif attachment and duplication,” Applied Mathematics and Computation, vol. 261, pp. 382–387, 2015.
 [29] F. Bauer and J. Jost, “Bipartite and neighborhood graphs and the spectrum of the normalized graph Laplacian,” arXiv preprint arXiv:0910.3118, 2009.
 [30] S. K. Butler, Eigenvalues and structures of graphs. ProQuest, 2008.
 [31] A. Banerjee, “The spectrum of the graph laplacian as a tool for analyzing structure and evolution of networks,” Ph.D. dissertation, 2008.
 [32] B. Karrer and M. E. Newman, “Stochastic blockmodels and community structure in networks,” Physical Review E, vol. 83, no. 1, p. 016107, 2011.
 [33] K. Rohe, S. Chatterjee, and B. Yu, “Spectral clustering and the high-dimensional stochastic blockmodel,” The Annals of Statistics, pp. 1878–1915, 2011.
 [34] Y. Wan and M. Meila, “A class of network models recoverable by spectral clustering,” in Advances in Neural Information Processing Systems, 2015, pp. 3285–3293.
 [35] E. Abbe, “Community detection and stochastic block models: recent developments,” arXiv preprint arXiv:1703.10146, 2017.
 [36] R. B. Lehoucq and D. C. Sorensen, “Deflation techniques for an implicitly restarted arnoldi iteration,” SIAM Journal on Matrix Analysis and Applications, vol. 17, no. 4, pp. 789–821, 1996.
 [37] Y. Jin and J. F. JaJa, “A high performance implementation of spectral clustering on CPU-GPU platforms,” in Parallel and Distributed Processing Symposium Workshops, 2016 IEEE International. IEEE, 2016, pp. 825–834.
 [38] Y. Jin, “A fast implementation of spectral clustering on GPU-CPU platform,” https://github.com/yujumd/fastsc/.
 [39] J. Leskovec and A. Krevl, “SNAP Datasets: Stanford large network dataset collection,” http://snap.stanford.edu/data.
 [40] Y. Jin, J. F. JaJa, R. Chen, and E. H. Herskovits, “A data-driven approach to extract connectivity structures from diffusion tensor imaging data,” in Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2015, pp. 944–951.