# Network Summarization with Preserved Spectral Properties

Large-scale networks are widely used to represent object relationships in many real-world applications. Such networks present significant computational challenges for processing, analysis, and information extraction. Network summarization techniques are commonly used to reduce the computational load while attempting to maintain the basic structural properties of the original network. Previous works have primarily focused on some type of network partitioning strategy with application-dependent regularizations, most often resulting in strongly connected clusters. In this paper, we introduce a novel perspective on the network summarization problem based on concepts from spectral graph theory. We propose a new distance measure to characterize the spectral differences between the original and coarsened networks. We rigorously justify the spectral distance with an interlacing theorem as well as results from the stochastic block model. We provide an efficient algorithm to generate coarsened networks that maximally preserve the spectral properties of the original network. Our proposed network summarization framework allows the flexibility to generate a set of coarsened networks with significantly different structures, each preserving a different aspect of the original network, which distinguishes our work from others. We conduct extensive experimental tests on a variety of large-scale networks, both from real-world applications and from a random graph model. We show that our proposed algorithms consistently outperform previous network summarization algorithms in terms of spectral measurements and running time.


## I Introduction

Large-scale networks are widely used to represent object relationships in real-world applications. As many applications involve large-scale networks with complex structures, it is generally hard to explore and analyze their key properties directly. Hence, network summarization techniques are commonly used to facilitate the process.

The general goal of network summarization is to approximate a large network with a coarsened network that has substantially fewer nodes and links but preserves certain network properties. Group-based network summarization usually begins by grouping the nodes into non-overlapping subsets such that nodes within the same subset share some specific properties [1]. One branch partitions the graph nodes into clusters such that nodes within the same cluster are strongly connected to each other, while nodes in different clusters are weakly connected [2, 3], as shown in Figure 1. However, such schemes only generate coarsened networks with strong community structures and fail to preserve other important structural properties of the original network. Another branch partitions the graph nodes by similar connectivity patterns [4, 5]. The main purpose is to group duplicate or similar nodes, in terms of their connectivity patterns to other nodes, so as to reduce the information lost during the coarsening process. However, quantifying the connectivity pattern of each node and designing objective functions to obtain such connectivity-based network partitions is challenging.

In this paper, we tackle the network summarization problem with tools from spectral graph theory. Spectral graph theory mainly studies graph spectra, i.e., the eigenvalues of the adjacency matrix or its Laplacian forms, in relation to specific graph structures. Networks with similar spectral properties are generally regarded as sharing similar structural patterns [7], especially for large-scale networks.

The main contributions of this paper are summarized as follows,

• We present a new framework for addressing the network summarization problem based on the spectral graph theory.

• We propose a novel notion of spectral distance between the original and coarsened networks and rigorously establish several results about the spectral distance.

• We provide an efficient algorithm to generate the coarsened networks and show that these coarsened networks preserve the spectral properties of the original networks for a large range of networks compared to other network summarization algorithms published in the literature.

The rest of this paper is organized as follows. Section II covers related work on the network summarization problem. Section III introduces the basic concepts and notation used in this paper as well as the formal definition of the network summarization problem. Our notion of spectral distance and its important properties are introduced in Section IV. Section V covers the spectral partitioning algorithm and briefly describes its parallel implementation. Experimental results on large-scale networks are given in Section VI, and Section VII concludes.

## II Related Work

In this section, we review previous works on the network summarization problem as well as other closely related lines of work.

### II-A Network summarization

Previous works on the network summarization problem mainly focused on the design of an objective function to characterize the distance between the original and coarsened networks. A comprehensive survey is provided in [6]. Note that the problem is also termed the "graph compression" problem in some previous work [5]. The definition of the distance function plays the most important role in the network summarization problem. Hannu et al. proposed algorithms to tackle both the simple and generalized graph summarization problems [5]. For the simple problem, the distance function is defined as the average of the edge-wise differences between the original network and the "reconstructed network". For the generalized problem, the distance is further generalized to differences in overall connectivity. Saket et al. applied the Minimum Description Length (MDL) principle to develop solutions for the summarization of unweighted graphs [4]. Manish et al. proposed an efficient algorithm that preserves the key characteristics of diffusion processes, measured by the difference between the first eigenvalues of the corresponding adjacency matrices [8].

### II-B Network sampling

The goal of the network sampling problem is to sample a sub-network that represents the whole network while preserving certain properties [9, 10]. Although the goals of network sampling and network summarization are similar, in that both aim to generate a smaller network that preserves certain properties of the original, the methodology of network sampling differs from that of group-based network summarization. Network sampling mainly focuses on strategies for selecting the nodes and edges that best represent the original network.

### II-C Graph partitioning

The graph partitioning problem has been extensively studied in many fields such as community detection, VLSI design, and scientific computing. The basic graph partitioning problem is to decompose a network into small components that satisfy specific properties. One common variant, known as the graph clustering problem, is to decompose a network into components such that nodes within the same cluster are strongly connected to each other, while nodes in different clusters are weakly connected [2, 3]. As many graph partitioning problems are NP-hard by nature, algorithms have been proposed to obtain approximate solutions for specific tasks [11]. Software packages, such as METIS [12] and GRACLUS [13], have been developed and are widely used for many graph partitioning tasks.

### II-D Graph sparsification

The graph sparsification problem is to approximate a graph with a sparse graph while preserving certain important properties of the original graph [14, 15, 16]. The sparse graph has the same number of nodes as the original graph but far fewer edges. The distance between the original and sparsified networks is measured by the differences in their Laplacian spectra. Sparsification speeds up various matrix algorithms whose computational complexity depends on the number of edges.

### II-E Applications

There is a wide range of applications where network summarization techniques are used:

• Social network: In social network analysis, network summarization has been used to delineate population groups with common interests based on the structural equivalence of individuals [17]. Structural equivalence has been considered under the block model with the aim of identifying groups with different connectivity patterns.

• Biological network: Large-scale biological networks, such as protein networks and brain networks, have been partitioned into biologically functional modules, which can aid in the detection and identification of subjects with certain diseases [18, 19]. For protein networks, graph summarization (GS) techniques are used to cluster protein interaction networks into biologically relevant modules, where a biological module is defined as a set of proteins with similar sets of interaction patterns [18]. For brain networks, neurons with similar connectivity patterns to other neurons are regarded as forming the basic functional units in the human brain [19, 20].

• Entity resolution: The entity resolution problem aims to find all mentions that represent the same real-world entity in a large knowledge graph [21, 22]. The problem can be formulated as a multi-type graph summarization problem that clusters graph nodes referring to the same entity into one supernode and creates weighted links among the supernodes that summarize the inter-cluster links in the original graph.

## III Network Summarization Problem

### III-A Preliminaries

We start by defining some basic concepts and introducing notation. In this paper, networks and graphs are used interchangeably, and the networks are typically weighted and undirected.

###### Definition 1.

A weighted graph is represented as $G = (V, W)$, where $V$ is the set of $n$ nodes and $W$ is the weight matrix in which $W(i,j)$ denotes the edge weight between nodes $i$ and $j$. $W(i,j) = 0$ indicates that there is no edge between $i$ and $j$. For unweighted graphs, $W(i,j)$ is either $1$ (edge) or $0$ (no edge). In this work, we only consider undirected graphs, i.e., $W$ is a real symmetric matrix.

Network partitioning can be broadly defined as follows.

###### Definition 2.

The network partitioning of $G$ is to partition the graph nodes into $k$ non-overlapping subsets $P_1, \dots, P_k$ where $\bigcup_{i=1}^{k} P_i = V$ and $P_i \cap P_j = \emptyset$ for all $i \neq j$. In general, we would like the partitions to possess certain properties, as will be explained later. The correspondence between a node and its partition is expressed through a mapping function $\Pi : V \to \{1, \dots, k\}$ such that $\Pi(v) = i$ for $v \in P_i$.

In this paper, we use a matrix $Q_s = [q_1, \dots, q_k]$ consisting of the set of indicator vectors $q_i$ to represent the network partitions, where each vector reflects the membership relationship for cluster $i$ as follows:

$$ q_i(j) = \begin{cases} 1, & \text{if node } j \text{ belongs to partition } P_i \\ 0, & \text{otherwise} \end{cases} \tag{1} $$

### III-B Problem formulation

The general network summarization problem is to approximate a large network $G = (V, W)$ with a smaller network $G_s = (V_s, W_s)$ such that the coarsened network preserves, as much as possible, certain properties of the original network. The network summarization process is based on a fixed network partition such that the nodes of the coarsened network correspond to the partitions and the edge weights are computed as:

$$ W_s(i,j) = \sum_{u \in P_i,\, v \in P_j} W(u,v) \tag{2} $$

The coarsened network is formed by merging the nodes in the same partition into a "supernode", and the "superedge" weight is the cumulative weight of all edges between pairs of nodes within the corresponding supernodes.

In matrix form, the weights of the coarsened network can be computed as:

$$ W_s = Q_s^T W Q_s \tag{3} $$
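As a concrete illustration, the indicator matrix of Equation 1 and the coarsening step of Equation 3 can be sketched in a few lines of numpy. The `coarsen` helper and the toy path graph below are our own illustration, not code from the paper, and assume dense arrays for simplicity:

```python
import numpy as np

def coarsen(W, labels, k):
    """Coarsen a weighted graph via W_s = Q_s^T W Q_s (Equation 3), where
    Q_s is the n-by-k partition indicator matrix of Equation 1."""
    n = W.shape[0]
    Q = np.zeros((n, k))
    Q[np.arange(n), labels] = 1.0   # Q[j, i] = 1 iff node j is in partition P_i
    return Q.T @ W @ Q

# Toy example: a 4-node path graph coarsened into 2 supernodes.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
labels = np.array([0, 0, 1, 1])    # nodes {0, 1} -> P_0, nodes {2, 3} -> P_1
Ws = coarsen(W, labels, 2)
print(Ws)
```

Note that, following Equation 2's sum over ordered pairs, the diagonal of `Ws` counts each intra-partition edge twice.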

There is no single criterion for constructing the coarsened network from the original network. Figure 2 shows that even a simple original network can yield two significantly different coarsened networks. Hence the distance function measuring the differences between the original and coarsened networks must be flexible enough to accurately capture the distance to coarsened networks with different structures.

The next few sections will focus on the design of the distance function to measure the difference between the original and coarsened networks as well as the algorithm to obtain the coarsened network.

## IV Spectral distance

This section gives a detailed description of the notion of spectral distance: the definition first, followed by its justification.

### IV-A Definition

The spectral distance between two networks $G$ and $G_s$, with $n$ and $k$ nodes respectively, is based on the difference between the spectra of the normalized graph Laplacians of the two networks. The original network is assumed to contain no isolated nodes, that is, $\sum_j W(i,j) > 0$ holds for any $i \in V$.

In spectral graph theory, the graph Laplacian of a network is defined as

$$ L = D - W \tag{4} $$

where $D$ is the diagonal degree matrix with $D(i,i) = \sum_j W(i,j)$. The graph Laplacian satisfies the following property:

$$ x^T L x = \frac{1}{2} \sum_{i,j} W(i,j) (x_i - x_j)^2 \tag{5} $$

for any $x \in \mathbb{R}^n$.

The normalized Laplacian matrix is defined as

$$ \mathcal{L} = I - D^{-1/2} W D^{-1/2} \tag{6} $$
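A minimal numpy sketch of Equation 6, written by us for illustration, constructs the normalized Laplacian and checks its spectrum on a triangle graph (whose eigenvalues $\{0, 1.5, 1.5\}$ are standard and lie in the guaranteed range $[0, 2]$):

```python
import numpy as np

def normalized_laplacian(W):
    """Normalized Laplacian of Equation 6: L = I - D^{-1/2} W D^{-1/2}.
    Assumes no isolated nodes, so every degree is positive."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    return np.eye(W.shape[0]) - s[:, None] * W * s[None, :]

# Spectrum of a triangle graph: {0, 1.5, 1.5}, inside the range [0, 2].
W = np.ones((3, 3)) - np.eye(3)
eigvals = np.sort(np.linalg.eigvalsh(normalized_laplacian(W)))
print(eigvals)
```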

The spectrum of the normalized graph Laplacian is simply the set of eigenvalues of $\mathcal{L}$. We denote the spectrum of the original network as $\lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_n$ with corresponding eigenvectors $v_1, \dots, v_n$, and the spectrum of the coarsened network as $\lambda_1^s \leq \dots \leq \lambda_k^s$ with corresponding eigenvectors $v_1^s, \dots, v_k^s$.

###### Definition 3.

The spectral distance is defined as

$$ SD(G, G_s, l) = \frac{1}{k} \left( \sum_{i=1}^{l} (\lambda_i^s - \lambda_i) + \sum_{i=l+1}^{k} (\lambda_{i+n-k} - \lambda_i^s) \right) \tag{7} $$

The spectral distance compares the eigenvalues of the coarsened network with the head and tail eigenvalues of the original network. The parameter $l$ provides the flexibility to control which part of the spectrum is matched.
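Equation 7 can be sketched directly in numpy; the function below and the toy spectra are our own illustration, assuming both eigenvalue arrays are sorted in ascending order:

```python
import numpy as np

def spectral_distance(lam, lam_s, l):
    """Spectral distance SD(G, G_s, l) of Equation 7.
    lam:   ascending normalized-Laplacian eigenvalues of G   (length n)
    lam_s: ascending normalized-Laplacian eigenvalues of G_s (length k)
    The first l coarsened eigenvalues are matched against the head of the
    original spectrum, the remaining k - l against its tail."""
    n, k = len(lam), len(lam_s)
    head = np.sum(lam_s[:l] - lam[:l])
    tail = np.sum(lam[n - k + l:] - lam_s[l:])
    return float(head + tail) / k

# Toy spectra: match one head eigenvalue and two tail eigenvalues.
lam = np.array([0.0, 0.5, 1.0, 1.5, 2.0])    # original, n = 5
lam_s = np.array([0.0, 1.0, 2.0])            # coarsened, k = 3
print(spectral_distance(lam, lam_s, l=1))
```

When the two networks are identical ($k = n$), the distance is zero for every choice of $l$, as expected.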

Figure 3 shows three examples of an original network and the "true" coarsened network, along with the corresponding normalized Laplacian spectra. The "true" network partitions for summarizing these networks are formed by grouping duplicate nodes with exactly the same connectivity patterns. These examples illustrate the flexibility of our notion of spectral distance in capturing the distance between different types of networks.

Other notions of spectral distance are mainly concerned with networks of the same size and are usually defined as pairwise differences between corresponding eigenvalues [23, 24, 25]. To the best of our knowledge, we are the first to propose a notion of spectral distance that compares networks of significantly different sizes, which may be of independent interest in other fields.

### IV-B Relationship between the Normalized Laplacian and Network Structure

In spectral graph theory, the spectra of the weight (often adjacency) matrix $W$, the graph Laplacian $L$, and the normalized graph Laplacian $\mathcal{L}$ have been studied extensively and shown to capture network properties from different perspectives. In this paper, we choose the spectrum of the normalized Laplacian for the following reasons.

• The spectrum of the normalized graph Laplacian lies between $0$ and $2$ for all networks.

• The spectrum of the normalized Laplacian is invariant under scaling and isomorphism.

• Previous studies empirically show that its cospectrality ratio, i.e., the proportion of cospectral graphs among all possible graphs, is the lowest among the possible graph representations [7].

• The normalized Laplacian matrix has been shown to outperform other graph matrices in identifying community structures and modularity of networks [26].

The relationships between the normalized Laplacian spectra and the network structures are summarized as follows,

• The first few eigenvalues reflect community structure. The multiplicity of eigenvalue $0$ equals the number of connected components in the network. A small second eigenvalue $\lambda_2$ indicates a strong community structure, which can be quantified by Cheeger's inequality in terms of the conductance of the graph:

$$ \phi_G \leq \sqrt{2 \lambda_2} \tag{8} $$

where the conductance of the graph is defined to reflect the community structure as follows:

$$ \phi_G = \min_{S \subset V} \frac{|\partial(S)|}{\min(d(S),\, d(V \setminus S))} \tag{9} $$

Here $V$ is the node set and $\partial(S)$ is the set of edges connecting nodes in $S$ to nodes in $V \setminus S$; $d(S)$ is the sum of degrees over all nodes in $S$.
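Conductance can be computed by brute force on tiny graphs, which makes Cheeger's inequality easy to check numerically. The sketch below is our own illustration (exponential in the number of nodes, so only for toy examples); the barbell graph of two triangles joined by a bridge is a standard two-community example:

```python
import numpy as np
from itertools import combinations

def conductance(W):
    """Brute-force conductance phi_G of Equation 9: minimum over nonempty
    proper subsets S of |cut(S, V-S)| / min(d(S), d(V-S)).
    Exponential in n -- suitable only for tiny illustrative graphs."""
    n = W.shape[0]
    deg = W.sum(axis=1)
    best = np.inf
    for size in range(1, n):
        for S in combinations(range(n), size):
            T = tuple(v for v in range(n) if v not in S)
            cut = W[np.ix_(S, T)].sum()
            best = min(best, cut / min(deg[list(S)].sum(), deg[list(T)].sum()))
    return best

# Two triangles joined by a single bridge edge: a clear two-community graph.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[a, b] = W[b, a] = 1.0

phi = conductance(W)                 # cutting the bridge gives 1/7
d = W.sum(axis=1)
N = np.eye(6) - (d ** -0.5)[:, None] * W * (d ** -0.5)[None, :]
lam2 = np.sort(np.linalg.eigvalsh(N))[1]
print(phi, np.sqrt(2 * lam2))        # Cheeger: phi <= sqrt(2 * lam2)
```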

• The multiplicity of eigenvalue $1$ is related to motif addition and duplication [28]. This property is later used to justify our notion of spectral distance in Section IV-C.

• The largest eigenvalue reflects the level of bipartiteness of the most bipartite connected component. A connected network has eigenvalue $2$ if and only if it is bipartite. The largest eigenvalue is also closely related to the number of odd cyclic motifs in the network [29].

### IV-C Properties of the spectral distance

In this section, we present some important properties of the proposed spectral distance that are useful in the network summarization problem.

We start by stating the interlacing theorem making use of the weak covering results in [30].

###### Theorem 1.

Let $G$ and $G_s$ be two networks whose edge weights satisfy $W_s = Q_s^T W Q_s$, where $Q_s$ has the structure in Equation 1 (we also say that $G$ is a weak cover of $G_s$). Then the eigenvalues of the normalized Laplacian matrices satisfy the following inequality:

$$ \lambda_i \leq \lambda_i^s \leq \lambda_{i+n-k} \tag{10} $$
###### Proof.

Let $S_i$ be the $i$-dimensional subspace spanned by the first $i$ eigenvectors of $\mathcal{L}_s$. Then

$$ \lambda_i^s = \max_{x \in S_i} \frac{x^T \mathcal{L}_s x}{x^T x} = \max_{x \in S_i} \frac{x^T Q^T \mathcal{L} Q x}{x^T Q^T Q x} \;\; (\text{weak covering property}) \geq \min_{\dim(S_i') = i} \; \max_{y \in S_i'} \frac{y^T \mathcal{L} y}{y^T y} \;\; (\text{let } y = Qx) = \lambda_i \;\; (\text{min-max theorem}) \tag{11} $$

On the other hand, let $\hat{S}_i$ be the $(k - i + 1)$-dimensional subspace spanned by the last $k - i + 1$ eigenvectors of $\mathcal{L}_s$. Then

$$ \lambda_i^s = \min_{x \in \hat{S}_i} \frac{x^T \mathcal{L}_s x}{x^T x} = \min_{x \in \hat{S}_i} \frac{x^T Q^T \mathcal{L} Q x}{x^T Q^T Q x} \;\; (\text{weak covering property}) \leq \max_{\dim(\hat{S}_i') = k - i + 1} \; \min_{y \in \hat{S}_i'} \frac{y^T \mathcal{L} y}{y^T y} \;\; (\text{let } y = Qx) = \lambda_{i+n-k} \;\; (\text{min-max theorem}) \tag{12} $$

∎

The interlacing theorem (Equation 10) is an important result that tightly bounds the eigenvalues of the coarsened network by the eigenvalues of the original network.

The spectral distance then has the following property:

###### Proposition 1.

$SD(G, G_s, l) \geq 0$ for any $0 \leq l \leq k$.

###### Proof.

Each term, either $(\lambda_i^s - \lambda_i)$ or $(\lambda_{i+n-k} - \lambda_i^s)$, in the definition of the spectral distance is greater than or equal to $0$ by the interlacing theorem. Hence $SD(G, G_s, l) \geq 0$. ∎

We give a special case where the spectral distance between the original and coarsened networks is $0$ for some parameter $l$. We first define the recoverability of the original network from the coarsened network with respect to the network partitions with membership mapping $\Pi$.

###### Definition 4.

The original network can be recovered from the coarsened network with respect to the network partitions if there exists $\gamma > 0$ such that the edge weights of the original network satisfy:

$$ W(i,j) = \gamma\, W_s(\Pi(i), \Pi(j)) \tag{13} $$

The following theorem gives a special case where the spectral distance is $0$, as stated below.

###### Theorem 2.

If the original network can be recovered from the coarsened network with respect to the network partitions, then there exists $l$ such that $SD(G, G_s, l) = 0$.

The following theorem on motif doubling characterizes the multiplicity of the eigenvalue $1$ in the spectrum of the normalized Laplacian and is useful for proving Theorem 2 [31, 28].

###### Definition 5.

A motif $\Sigma$ is a connected subgraph of $G$ that contains all edges of $G$ between the nodes of $\Sigma$. In general, motifs are subgraphs that occur frequently in a network.

###### Theorem 3.

Let $\Sigma$ be a motif of a graph $G$ with vertices $v_1, \dots, v_m$. If the network $G'$ is obtained from $G$ by adding a copy of the motif consisting of nodes $v_1', \dots, v_m'$ and the corresponding connections between them, and connecting each $v_i'$ with all neighbors of $v_i$, then $G'$ possesses an eigenvalue $1$ of the normalized Laplacian with a corresponding eigenvector that is nonzero only at $v_1, \dots, v_m$ and $v_1', \dots, v_m'$.

A special case is vertex doubling, where the motif is a single node.

###### Corollary 1.

Let the network $G'$ be obtained by adding a vertex $u'$ to the network $G$, where the complete connectivity profile of $u'$ is proportional to that of a vertex $u$ in $G$. The normalized Laplacian of $G'$ has exactly the same eigenvalues as that of $G$ except that it contains an additional eigenvalue $1$.
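The single-node case of Theorem 3's guarantee, that doubling a vertex introduces eigenvalue $1$ into the normalized Laplacian spectrum, is easy to verify numerically. The sketch below is our own illustration: it duplicates one vertex of a triangle (wiring the copy to the original's neighbors with the same weights) and checks only that eigenvalue $1$ appears:

```python
import numpy as np

def normalized_laplacian(W):
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    return np.eye(len(W)) - s[:, None] * W * s[None, :]

def double_vertex(W, u):
    """Vertex doubling: append a new node wired to the neighbors of u with
    the same edge weights (the single-node motif copy of Theorem 3)."""
    row = W[u].copy()
    n = W.shape[0]
    W2 = np.zeros((n + 1, n + 1))
    W2[:n, :n] = W
    W2[n, :n] = row
    W2[:n, n] = row
    return W2

# Doubling a vertex of a triangle: the resulting 4-node graph acquires
# eigenvalue 1 in its normalized Laplacian spectrum.
W = np.ones((3, 3)) - np.eye(3)
eig = np.linalg.eigvalsh(normalized_laplacian(double_vertex(W, 0)))
print(np.round(eig, 4))
```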

We now return to the proof of Theorem 2.

###### Proof.

As the normalized Laplacian spectrum is insensitive to scaling, we take $\gamma = 1$ without loss of generality. The original network can then be regarded as obtained from the coarsened network $G_s$ by repeatedly adding copies of each node. Hence the normalized Laplacian of $G$ has the same eigenvalues as that of $G_s$ except that it contains additional multiplicities of the eigenvalue $1$.

Let $l$ be the maximum index with $\lambda_l < 1$, that is,

$$ l = \max \{ i \mid \lambda_i < 1 \} \tag{14} $$

Then the spectral distance $SD(G, G_s, l) = 0$, as illustrated in Figure 4. ∎

Other distance measures, such as the simple and generalized distances in [5], are defined via the difference between the original network and a recovered network whose edge weights are:

$$ W_r(i,j) = W_s(\Pi(i), \Pi(j)) \tag{15} $$

It is easy to verify that the recovered network has exactly the same eigenvalues as the coarsened network except for additional multiplicities of the eigenvalue $1$. When the original network is recoverable from the coarsened network, our spectral distance is therefore consistent with these other distance measures. Figure 4 summarizes the relations among the spectra of the original, coarsened, and recovered networks.

### IV-D Justification with stochastic block model

Besides the recoverable case above, we use the stochastic block model to justify the notion of spectral distance. The stochastic block model is a generative model for random graphs with explicit block structures [32], parameterized by $(Z, B)$, where $Z$ indicates the block membership and $B$ contains the edge probabilities between blocks. Nodes within the same block have the same edge probability to all other nodes of the network. The edge probability between nodes in blocks $i$ and $j$ is determined by $B(i,j)$, and the edge probability matrix is

$$ P = Z B Z^T \tag{16} $$

The random network is generated from the stochastic block model parameterized by $(Z, B)$ with the edge weights sampled from the Bernoulli distribution:

$$ W_{ij} = \begin{cases} \text{Bernoulli}(P_{ij}), & \text{if } i \leq j, \\ W_{ji}, & \text{if } i > j. \end{cases} \tag{17} $$
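Equations 16 and 17 can be sketched in a few lines of numpy. The sampler below is our own illustration (self-loops are omitted for simplicity, a slight deviation from Equation 17's $i \leq j$), with an assumed two-block configuration:

```python
import numpy as np

def sample_sbm(Z, B, rng):
    """Sample a symmetric 0/1 adjacency matrix from the SBM of Equations
    16-17: P = Z B Z^T, W_ij ~ Bernoulli(P_ij) for i < j, mirrored below.
    Self-loops are omitted for simplicity."""
    P = Z @ B @ Z.T
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1).astype(float)
    return upper + upper.T

rng = np.random.default_rng(0)
# Two blocks of 50 nodes each; within-block probability 0.5, between 0.05.
Z = np.kron(np.eye(2), np.ones((50, 1)))
B = np.array([[0.5, 0.05],
              [0.05, 0.5]])
W = sample_sbm(Z, B, rng)
print(W.shape, int(W.sum()) // 2)   # number of nodes, number of edges
```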

The stochastic block model has been extensively studied in machine learning, statistics, and information theory [33, 34, 35]. It is also an ideal model for studying the network summarization problem, as nodes within the same block share similar connectivity patterns to all other nodes, and both the "true" network partitions and the structure of the coarsened network are known a priori.

We will show that the spectral distance is accurate for the network summarization of random networks generated from the stochastic block model. In the following example, we use a simplified model with parameters $(p, q)$, where $p$ is the edge probability between nodes within the same block and $q$ is the edge probability between nodes in different blocks. In the mixed case, one half of the blocks is parameterized by one $(p, q)$ pair and the other half by another, and the two halves are disconnected.

Figure 5 shows example results for the stochastic block model. The original network consists of equally sized blocks, and each experiment is repeated 50 times. The coarsened network is constructed from the "true" network partitions defined by the block model. All three examples show that the spectrum of the coarsened network matches either the head or tail (or both) eigenvalues of the original network, yielding a small spectral distance for appropriate values of $l$.

## V Algorithm

### V-A Spectral partitioning algorithm

The spectral partitioning algorithm uses the rows of the eigenvectors corresponding to the head and tail eigenvalues as feature vectors to obtain the coarsened networks, as described in Algorithm 1.

The spectral partitioning algorithm iterates over all possible values of the parameter $l$ to find the coarsened network with the smallest spectral distance. In each iteration, the algorithm tries to match the eigenvalues of the coarsened network to the $l$ smallest and $k - l$ largest eigenvalues of the original network. The k-means clustering algorithm is applied to the composite eigenvectors to derive an approximate solution for the network partitions.

There are other heuristic ways of choosing $l$:

• Extreme cases: $l = 0$ or $l = k$.

• Choose $l$ such that the chosen eigenvalues are farthest away from eigenvalue $1$.

When $l$ is large, the coarsened network exhibits community structure. In the special case $l = k$, the algorithm reduces to the vanilla spectral clustering algorithm that minimizes the normalized cut [11]. When $l$ is small, the coarsened network tends to preserve the higher end of the spectrum of the original network.
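The inner step of the algorithm, clustering nodes on composite eigenvector features for one value of $l$, can be sketched in pure numpy. This is our own simplified illustration, with a tiny Lloyd's k-means standing in for a library implementation, applied to a barbell graph of two triangles:

```python
import numpy as np

def normalized_laplacian(W):
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    return np.eye(len(W)) - s[:, None] * W * s[None, :]

def kmeans(X, k, iters=50):
    """Tiny Lloyd's k-means with deterministic farthest-point init,
    standing in for a library implementation."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min(((X[:, None] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[int(np.argmax(d2))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def spectral_partition(W, k, l):
    """One inner step of the sketch: cluster nodes on the eigenvectors of
    the l smallest and k - l largest normalized-Laplacian eigenvalues."""
    lam, V = np.linalg.eigh(normalized_laplacian(W))  # ascending order
    n = len(lam)
    feats = np.hstack([V[:, :l], V[:, n - (k - l):]])
    return kmeans(feats, k)

# Two triangles joined by a bridge; k = 2, l = 2 recovers the communities.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[a, b] = W[b, a] = 1.0
labels = spectral_partition(W, k=2, l=2)
print(labels)
```

The outer loop of the full algorithm would repeat this for each $l$, coarsen with the resulting partition, and keep the partition with the smallest spectral distance.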

### V-B Complexity analysis

The computational complexity of the spectral partitioning algorithm is determined by the eigensolver as well as the k-means clustering algorithm. The eigensolver uses the Implicitly Restarted Lanczos Method (IRLM) [36] and dominates the worst-case cost; the k-means clustering step runs for a number of iterations that is usually bounded by a constant.

### V-C GPU-based Implementation

We provide a fast GPU-based implementation of the spectral partitioning algorithm based on the parallel eigensolvers and k-means algorithm in [37, 38]. To accelerate the parallel eigensolver, we use the reverse communication interface of the ARPACK software together with cuSPARSE from the CUDA library to split the eigensolver computation between CPU and GPU: the CPU executes the core control procedures of IRLM while the GPU performs the expensive large-scale matrix operations.

The k-means algorithm is accelerated by transforming all of its computation into sparse matrix operations in a non-trivial way. We then leverage the GPU's parallel computational ability to perform the matrix operations, which achieves up to 300x speedup compared to serial implementations in commodity software such as Matlab and Python packages [37].

## VI Experiments

### VI-A Datasets

We evaluate our network summarization scheme on a set of large-scale datasets from both real-world applications and synthetic networks generated by the stochastic block model. Most of the real-world datasets are collected from the Stanford Large Network Dataset Collection (SNAP) [39]. Table I contains the basic statistics of the networks.

• Brain network: The dataset contains a structural brain network that captures fiber connectivity in the human brain. Nodes represent 3D brain voxels and edge weights reflect the connectivity strengths between them. The network summarization scheme is applied to obtain the network partitions as well as the coarsened network depicting the global cortical organization and underlying connections within the human brain [40].

• Facebook: This dataset is a network collected from a Facebook application where each node represents an anonymous user and edges exist between users that share similar political interests.

• ASTRO: This dataset is a collaboration network that covers the scientific co-authorship between authors who submitted to the Astro Physics category.

• DBLP: This dataset consists of a comprehensive co-authorship network in computer science bibliography. Nodes represent the authors and edges exist between authors if they coauthored at least one publication [39].

• SBM: The dataset is randomly generated from the stochastic block model described in Section IV-D. The synthetic sparse network is generated such that two nodes are connected with probability $p$ if they are within the same cluster and with probability $q$ if they are in different clusters. The number of blocks is 200, each of which contains 100 nodes.

• Amazon: This dataset reflects the co-purchased relationship among products purchased at Amazon. Nodes represent products and edges exist between products that are commonly purchased together.

• Email: The dataset is an email communication network covering all email communication within a collection of around half a million emails. Nodes are email addresses and edges represent the communications between them.

All experiments are conducted on a machine with an 8-core Intel Xeon CPU (2.40 GHz), 64GB of main memory, and 2 Tesla K80 GPUs with 10GB of memory.

### VI-B Baseline algorithms

We compare the spectral partitioning algorithm against the following baseline algorithms.

#### VI-B1 Randomized algorithm

Each network node is randomly assigned a label from $1$ to $k$. The network partitions are formed by grouping nodes with the same label, with each group containing at least one node. As the randomized algorithm optimizes no objective function, we expect it to perform the worst on all datasets.

Time complexity: The random partitioning takes $O(n)$ and the generation of the coarsened network takes $O(m)$, where $m$ is the number of edges, so the overall complexity is $O(n + m)$.

#### VI-B2 CoarseNet

CoarseNet is the algorithm proposed in [8] to obtain a coarsened network that preserves the key characteristics of diffusion processes on the network.

Time complexity: The worst-case time complexity reported in [8] is near-linear in the number of edges, with an additional dependence on the maximum node degree.

#### VI-B3 Greedy algorithm

Previous works, such as [5, 4], proposed greedy strategies to obtain the coarsened network. The naïve algorithm iteratively merges the pair of nodes that optimally reduces the distance between successive networks at each step. Two-hop optimization reduces the computational cost by constraining the search space to candidate node pairs within two-hop distance [5].

In the experiments, we use the greedy algorithm described in [5] with the distance function defined as:

$$ dist(G, G_s) = \sqrt{ \sum_{\{a,b\} \in V \times V} \big( W(a,b) - W_s(\Pi(a), \Pi(b)) \big)^2 } \tag{18} $$

The algorithm is described in Algorithm 2, where $G_{uv}$ denotes the coarsened network obtained by merging nodes $u$ and $v$ in the network $G$.
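Equation 18 can be evaluated by lifting the coarsened weights back to the original node set via Equation 15. The helper below is our own sketch of this distance (not the authors' implementation), using dense arrays:

```python
import numpy as np

def reconstruction_distance(W, labels, k):
    """dist(G, G_s) of Equation 18: the Frobenius-style distance between W
    and the network reconstructed from the coarsened weights via Equation
    15, W_r(a, b) = W_s(Pi(a), Pi(b))."""
    n = W.shape[0]
    Q = np.zeros((n, k))
    Q[np.arange(n), labels] = 1.0
    Ws = Q.T @ W @ Q                 # coarsened weights (Equation 3)
    Wr = Ws[labels][:, labels]       # lift back to the original node set
    return float(np.sqrt(((W - Wr) ** 2).sum()))

# With the identity partition (every node its own supernode), the
# reconstruction is exact and the distance is zero.
W = np.array([[0, 2, 0],
              [2, 0, 1],
              [0, 1, 0]], dtype=float)
print(reconstruction_distance(W, np.arange(3), 3))   # 0.0
```

A greedy step would evaluate this distance for each candidate merge and pick the pair minimizing it.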

Time complexity: The worst case occurs when all nodes are within two-hop distance of each other, in which case the candidate search space is quadratic in the number of nodes.

### VI-C Performance evaluation

We evaluate the algorithms in terms of both spectral distance and running time. Figure 6 shows the spectral distance as a function of the size of the coarsened networks generated by the different algorithms, and Figure 7 shows the corresponding running times.

#### VI-C1 Spectral distance

For all datasets, the spectral partitioning algorithm generates the coarsened networks with the best performance in terms of spectral distance. For some datasets, CoarseNet performs almost as well as the spectral partitioning algorithm, mainly because the characteristic property preserved by CoarseNet is one spectral component of the spectral distance. The greedy algorithm performs poorly on almost all datasets, as it only considers edge-wise differences and fails to capture differences in global connectivity. The performance of the randomized algorithm largely depends on the intrinsic spectral properties of the original network. By the semicircle law, the randomized algorithm tends to generate coarsened networks with eigenvalues centered around $1$; when the original spectrum contains many extreme Laplacian eigenvalues, such as high multiplicities of $0$ and $2$, the randomized algorithm performs poorly.

#### Vi-C2 Runtime analysis

Because CoarseNet and the greedy algorithm adopt a bottom-up strategy, generating coarsened networks of small size costs more time than generating networks of large size. Since the sizes of the coarsened networks considered here are small, both CoarseNet and the greedy algorithm exhibit almost constant time costs. As claimed in the original paper [8], CoarseNet has near-linear time complexity, so its time cost is small. The greedy algorithm has much worse computational complexity, and hence its time costs are the highest on all networks. The time cost of the spectral partitioning algorithm increases significantly with the size of the coarsened network due to the complexity of the eigensolvers, so spectral partitioning is better suited to generating coarsened networks of small size.

### Vi-D Visualization result

Figure 8 provides a small example to illustrate the summarization process. The network on the left is randomly sampled from the Email dataset and has 50 nodes. It contains a star-like structure, in which one node connects to many others, together with a few other small connected components. The two adjacency matrices on the right show two possible coarsened networks. The coarsened network on the top shows that the original network contains a strongly connected component consisting of the nodes in the "star" structure, plus a few small connected components. The coarsened network on the bottom shows two partitions with strong connections between them but weak connections within each partition, obtained by separating the "star" into the two sides of a bipartite structure.

Both coarsened networks have very small spectral distance to the original network, while each captures different specific properties of the original network. Moreover, both networks are obtained from the spectral partitioning algorithm, with different settings of the distance parameter.
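How a drastically smaller network can stay spectrally close to the original can be sketched with a hypothetical head/tail comparison in the spirit of the paper's spectral distance. The function below, including its `alpha` head/tail split parameter, is an illustrative assumption, not the paper's exact definition:

```python
import numpy as np

def spectral_distance(W, Ws, alpha=0.5):
    """Sketch of a head/tail spectral distance: compare the k eigenvalues of
    the coarsened network's normalized Laplacian against the alpha*k smallest
    (head) and (1-alpha)*k largest (tail) eigenvalues of the original.
    `alpha` and the exact weighting are illustrative assumptions."""
    def spectrum(A):
        d = A.sum(axis=1)
        s = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
        L = np.eye(A.shape[0]) - s[:, None] * A * s[None, :]
        return np.sort(np.linalg.eigvalsh(L))
    mu = spectrum(Ws)                    # k eigenvalues of the coarsened graph
    lam = spectrum(W)                    # n eigenvalues of the original
    k = len(mu)
    k_head = int(round(alpha * k))
    target = np.concatenate([lam[:k_head], lam[len(lam) - (k - k_head):]])
    return np.linalg.norm(mu - target)

# A star on 8 nodes coarsened to two supernodes: the hub versus the merged
# leaves, joined by the total inter-partition weight of 7.
n = 8
W = np.zeros((n, n))
W[0, 1:] = W[1:, 0] = 1.0
Ws = np.array([[0.0, 7.0], [7.0, 0.0]])
d = spectral_distance(W, Ws)
```

The star's normalized Laplacian spectrum is {0, 1, ..., 1, 2}, and the two-supernode coarsening has spectrum {0, 2}, so the head and tail eigenvalues match and the distance is essentially zero, mirroring how the bipartite coarsening in Figure 8 stays spectrally close to the original.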

## Vii Conclusion

In this paper, we have introduced a new framework for the network summarization problem based on spectral graph theory. The framework uses a novel notion of distance that quantifies the differences between the original and coarsened networks even when they have significantly different scales. In particular, we compare the eigenvalues of the normalized Laplacian of the coarsened network with the head and tail eigenvalues of the normalized Laplacian of the original network. We have rigorously justified this new notion of distance and shown that it is well suited to the network summarization problem, supported by experimental evidence on the stochastic block model. We are the first to introduce a parameter in the distance function that provides the flexibility to generate coarsened networks with significantly different structural properties, capturing different aspects of the original network. We have proposed a fast implementation of the spectral partitioning algorithm to generate the coarsened networks and shown that our algorithms outperform previous algorithms in terms of both spectral distance and running time.

## References

• [1] A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz, “Recent advances in graph partitioning,” Preprint, 2013.
• [2] S. E. Schaeffer, “Graph clustering,” Computer science review, vol. 1, no. 1, pp. 27–64, 2007.
• [3] M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical review E, vol. 69, no. 2, p. 026113, 2004.
• [4] S. Navlakha, R. Rastogi, and N. Shrivastava, “Graph summarization with bounded error,” in Proceedings of the 2008 ACM SIGMOD international conference on Management of data.   ACM, 2008, pp. 419–432.
• [5] H. Toivonen, F. Zhou, A. Hartikainen, and A. Hinkka, “Compression of weighted graphs,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2011, pp. 965–973.
• [6] Y. Liu, A. Dighe, T. Safavi, and D. Koutra, “Graph summarization: A survey,” arXiv preprint arXiv:1612.04883, 2016.
• [7] E. R. Van Dam and W. H. Haemers, “Which graphs are determined by their spectrum?” Linear Algebra and its applications, vol. 373, pp. 241–272, 2003.
• [8] M. Purohit, B. A. Prakash, C. Kang, Y. Zhang, and V. Subrahmanian, “Fast influence-based coarsening for large networks,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2014, pp. 1296–1305.
• [9] S.-D. Lin, M.-Y. Yeh, and C.-T. Li, “Sampling and summarization for social networks.”
• [10] N. K. Ahmed, J. Neville, and R. Kompella, “Network sampling: From static to streaming graphs,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 8, no. 2, p. 7, 2014.
• [11] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and computing, vol. 17, no. 4, pp. 395–416, 2007.
• [12] G. Karypis and V. Kumar, “Multilevel k-way partitioning scheme for irregular graphs,” Journal of Parallel and Distributed computing, vol. 48, no. 1, pp. 96–129, 1998.
• [13] I. S. Dhillon, Y. Guan, and B. Kulis, “Weighted graph cuts without eigenvectors a multilevel approach,” IEEE transactions on pattern analysis and machine intelligence, vol. 29, no. 11, 2007.
• [14] D. A. Spielman and S.-H. Teng, “Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems,” in Proceedings of the thirty-sixth annual ACM symposium on Theory of computing.   ACM, 2004, pp. 81–90.
• [15] D. A. Spielman and N. Srivastava, “Graph sparsification by effective resistances,” SIAM Journal on Computing, vol. 40, no. 6, pp. 1913–1926, 2011.
• [16] J. Batson, D. A. Spielman, N. Srivastava, and S.-H. Teng, “Spectral sparsification of graphs: theory and algorithms,” Communications of the ACM, vol. 56, no. 8, pp. 87–94, 2013.
• [17] F. Lorrain and H. C. White, “Structural equivalence of individuals in social networks,” The Journal of mathematical sociology, vol. 1, no. 1, pp. 49–80, 1971.
• [18] S. Navlakha, M. C. Schatz, and C. Kingsford, “Revealing biological modules via graph summarization,” Journal of Computational Biology, vol. 16, no. 2, pp. 253–264, 2009.
• [19] Y. Jin, J. F. JaJa, R. Chen, and E. H. Herskovits, “A data-driven approach to extract connectivity structures from diffusion tensor imaging data,” in 2015 IEEE International Conference on Big Data.   IEEE, 2015.
• [20] D. Moreno-Dominguez, A. Anwander, J. Haueisen, and T. R. Knösche, “Whole-brain cortical parcellation: A hierarchical method based on dmri tractography,” Ph.D. dissertation, Max Planck Institute for Human Cognitive and Brain Sciences Leipzig, 2014.
• [21] L. Getoor and A. Machanavajjhala, “Entity resolution: theory, practice & open challenges,” Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 2018–2019, 2012.
• [22] L. Zhu, M. Ghasemi-Gol, P. Szekely, A. Galstyan, and C. A. Knoblock, “Unsupervised entity resolution on multi-type graphs,” in International Semantic Web Conference.   Springer, 2016, pp. 649–667.
• [23] R. C. Wilson and P. Zhu, “A study of graph spectra for comparing graphs and trees,” Pattern Recognition, vol. 41, no. 9, pp. 2833–2841, 2008.
• [24] I. Jovanović and Z. Stanić, “Spectral distances of graphs,” Linear Algebra and its Applications, vol. 436, no. 5, pp. 1425–1435, 2012.
• [25] J. Gu, B. Hua, and S. Liu, “Spectral distances on graphs,” Discrete Applied Mathematics, vol. 190, pp. 56–74, 2015.
• [26] H.-W. Shen and X.-Q. Cheng, “Spectral methods for the detection of network community structure: a comparative analysis,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2010, no. 10, p. P10020, 2010.
• [27] S. de Lange, M. de Reus, and M. Van Den Heuvel, “The laplacian spectrum of neural networks,” Frontiers in computational neuroscience, vol. 7, p. 189, 2014.
• [28] R. Mehatari and A. Banerjee, “Effect on normalized graph laplacian spectrum by motif attachment and duplication,” Applied Mathematics and Computation, vol. 261, pp. 382–387, 2015.
• [29] F. Bauer and J. Jost, “Bipartite and neighborhood graphs and the spectrum of the normalized graph laplacian,” arXiv preprint arXiv:0910.3118, 2009.
• [30] S. K. Butler, Eigenvalues and structures of graphs.   ProQuest, 2008.
• [31] A. Banerjee, “The spectrum of the graph laplacian as a tool for analyzing structure and evolution of networks,” Ph.D. dissertation, 2008.
• [32] B. Karrer and M. E. Newman, “Stochastic blockmodels and community structure in networks,” Physical Review E, vol. 83, no. 1, p. 016107, 2011.
• [33] K. Rohe, S. Chatterjee, and B. Yu, “Spectral clustering and the high-dimensional stochastic blockmodel,” The Annals of Statistics, pp. 1878–1915, 2011.
• [34] Y. Wan and M. Meila, “A class of network models recoverable by spectral clustering,” in Advances in Neural Information Processing Systems, 2015, pp. 3285–3293.
• [35] E. Abbe, “Community detection and stochastic block models: recent developments,” arXiv preprint arXiv:1703.10146, 2017.
• [36] R. B. Lehoucq and D. C. Sorensen, “Deflation techniques for an implicitly restarted arnoldi iteration,” SIAM Journal on Matrix Analysis and Applications, vol. 17, no. 4, pp. 789–821, 1996.
• [37] Y. Jin and J. F. Jaja, “A high performance implementation of spectral clustering on cpu-gpu platforms,” in Parallel and Distributed Processing Symposium Workshops, 2016 IEEE International.   IEEE, 2016, pp. 825–834.
• [38] Y. Jin, “A fast implementation of spectral clustering on gpu-cpu platform,” https://github.com/yuj-umd/fastsc/.
• [39] J. Leskovec and A. Krevl, “SNAP Datasets: Stanford large network dataset collection,” http://snap.stanford.edu/data.
• [40] Y. Jin, J. F. JaJa, R. Chen, and E. H. Herskovits, “A data-driven approach to extract connectivity structures from diffusion tensor imaging data,” in Big Data (Big Data), 2015 IEEE International Conference on.   IEEE, 2015, pp. 944–951.