# Nearly-Linear Time Spectral Graph Reduction for Scalable Graph Partitioning and Data Visualization

This paper proposes a scalable algorithmic framework for spectral reduction of large undirected graphs. The proposed method allows computing much smaller graphs while preserving the key spectral (structural) properties of the original graph. Our framework is built upon the following two key components: a spectrum-preserving node aggregation (reduction) scheme, as well as a spectral graph sparsification framework with iterative edge weight scaling. We show that the resulting spectrally-reduced graphs can robustly preserve the first few nontrivial eigenvalues and eigenvectors of the original graph Laplacian. In addition, the spectral graph reduction method has been leveraged to develop much faster algorithms for multilevel spectral graph partitioning as well as t-distributed Stochastic Neighbor Embedding (t-SNE) of large data sets. We conducted extensive experiments using a variety of large graphs and data sets, and obtained very promising results. For instance, we are able to reduce the "coPapersCiteseer" graph with 0.43 million nodes and 16 million edges to a much smaller graph with only 13K (32X fewer) nodes and 17K (950X fewer) edges in about 16 seconds; the spectrally-reduced graphs also allow us to achieve up to 1100X speedup for spectral graph partitioning and up to 60X speedup for t-SNE visualization of large data sets.


## 1. Introduction

Recent research shows that by leveraging the key spectral properties of eigenvalues and eigenvectors of graph Laplacians, more efficient algorithms can be developed for tackling many graph-related computing tasks (teng2016scalable, ). For example, spectral methods can potentially lead to much faster algorithms for solving sparse matrices (spielman2014sdd, ; zhiqiang:iccad17, ), numerical optimization (christiano2011flow, ), data mining (peng2015partitioning, ), graph analytics (koren2003spectral, ; defferrard2016convolutional, ), as well as very-large-scale integration (VLSI) computer-aided design (CAD) (zhuo:dac16, ; zhiqiang:dac17, ; zhiqiang:iccad17, ; zhuo:dac18, ). To this end, spectral sparsification of graphs has been extensively studied in the past decade (spielman2011graph, ; batson2012twice, ; spielman2011spectral, ; Lee:2017, ) to allow computing almost-linear-sized (i.e., the number of edges is similar to the number of vertices) subgraphs or sparsifiers that can robustly preserve the spectrum, such as the eigenvalues and eigenvectors of the original Laplacian. The sparsified graphs retain the same set of vertices but much fewer edges; such ultra-sparse graph proxies have been leveraged for developing a series of nearly-linear-time numerical and graph algorithms (spielman2011graph, ; Fung:2011stoc, ; christiano2011flow, ; spielman2014sdd, ).

In this paper, we introduce a scalable algorithmic framework for spectral reduction of graphs, which dramatically reduces the size (both the number of nodes and edges) of undirected graphs while preserving the key spectral (structural) properties of the original graph. The spectrally-reduced graphs will immediately lead to the development of much faster numerical and graph-related algorithms. For example, spectrally-reduced social (data) networks may allow for more efficient modeling, mining, and analysis of large social (data) networks; spectrally-reduced neural networks may allow for more scalable model training and processing in emerging machine learning tasks; spectrally-reduced circuit networks may lead to more efficient simulation, optimization, and verification of large integrated circuit (IC) systems; etc.

Our approach consists of two key phases: 1) a scalable spectrum-preserving node aggregation (reduction) phase, and 2) a spectral graph sparsification phase with iterative subgraph scaling. To achieve truly scalable (nearly-linear time) performance for spectral graph reduction, we leverage the recent similarity-aware spectral graph sparsification method (zhuo:dac18, ), the graph-theoretic algebraic multigrid (AMG) Laplacian solver (livne2012lean, ; zhiqiang:iccad17, ), and a novel constrained stochastic gradient descent (SGD) optimization approach. The major contributions of this work are summarized as follows:

(1) To well preserve the key spectral properties of the original graph in the reduced graph, a nearly-linear time spectrum-preserving node aggregation (reduction) scheme is proposed for robustly constructing reduced graphs that have far fewer nodes.

(2) A scalable framework for spectral graph sparsification and iterative subgraph scaling is introduced for assuring sparsity of the reduced graphs by leveraging a novel constrained SGD optimization approach.

(3) We introduce a simple yet effective procedure for refining solutions, such as the Laplacian eigenvectors, computed with spectrally-reduced graphs, which immediately allows using much smaller graphs in many numerical and graph algorithms while achieving superior solution quality.

(4) In addition, multilevel frameworks that allow us to leverage spectrally-reduced graphs for much faster spectral graph partitioning as well as t-distributed Stochastic Neighbor Embedding (t-SNE) of large data sets are proposed.

(5) We have obtained very promising experimental results for a variety of graph problems: the spectrally-reduced graphs allow us to achieve up to 1100X speedup for spectral graph partitioning and up to 60X speedup for t-SNE visualization of large data sets.

The rest of this paper is organized as follows: Section 2 provides a brief introduction to spectral graph sparsification; Section 3 presents the proposed spectral graph reduction approach and its complexity analysis; Section 4 introduces applications of spectral graph reduction methods to scalable graph partitioning and data visualization. Extensive experimental results are presented in Section 5, followed by the conclusion of this work in Section 6.

## 2. Preliminaries

### 2.1. Laplacian Matrices of graphs

Consider an undirected graph $G=(V,E_G,w_G)$ with $V$ denoting the set of vertices, $E_G$ denoting the set of undirected edges, and $w_G$ denoting the associated edge weights. We define $D$ to be a diagonal matrix with $D(i,i)$ equal to the (weighted) degree of node $i$, and $A_G$ and $L_G$ to be the adjacency and Laplacian matrices of the undirected graph $G$ as follows, respectively:

 (1) $$A_G(i,j)=\begin{cases}w_G(i,j) & \text{if } (i,j)\in E_G\\ 0 & \text{otherwise}.\end{cases}$$

Graph Laplacians can be constructed by using $L_G = D - A_G$ and will satisfy the following conditions: 1. Each column and row sum will be equal to zero; 2. All off-diagonal elements are non-positive; 3. The graph Laplacian is a symmetric diagonally dominant (SDD) matrix.
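As a concrete illustration, the construction $L_G = D - A_G$ and the three conditions above can be verified with a short NumPy sketch (the dense representation, function name, and edge-list format here are ours, chosen purely for illustration):

```python
import numpy as np

def graph_laplacian(n, weighted_edges):
    """Build adjacency and Laplacian matrices of an undirected graph.
    `weighted_edges` is an iterable of (i, j, w) triples."""
    A = np.zeros((n, n))
    for i, j, w in weighted_edges:
        A[i, j] = A[j, i] = w            # symmetric adjacency
    D = np.diag(A.sum(axis=1))           # weighted node degrees
    return A, D - A                      # L = D - A

# A 4-node path graph with unit edge weights.
A, L = graph_laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])
```

All three conditions hold for this `L`: every row and column sums to zero, all off-diagonal entries are non-positive, and the matrix is symmetric diagonally dominant.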

### 2.2. Spectral Sparsification of Graphs

To further push the limit of spectral methods for handling big (data) graphs, many research problems for dramatically simplifying large graphs leveraging spectral graph theory have been extensively studied by mathematics and theoretical computer science (TCS) researchers in the past decade (spielman2011graph, ; Kolla:2010stoc, ; batson2012twice, ; spielman2011spectral, ; kolev2015note, ; peng2015partitioning, ; cohen2017almost, ; Lee:2017, ). Recent spectral graph sparsification research allows constructing nearly-linear-sized subgraphs that can well preserve the spectral (structural) properties of the original graph, such as the first few eigenvalues and eigenvectors of the graph Laplacian. The related results can potentially lead to the development of a variety of nearly-linear time numerical and graph algorithms for solving large sparse matrices, graph-based semi-supervised learning (SSL), computing the stationary distributions of Markov chains and personalized PageRank vectors, spectral graph partitioning and data clustering, max-flow of undirected graphs, etc. (Spielman:usg, ; spielman2011graph, ; spielman2010algorithms, ; Kolla:2010stoc, ; miller:2010focs, ; Fung:2011stoc, ; spielman2011spectral, ; christiano2011flow, ; spielman2014sdd, ; cohen2017almost, ).

Spectral graph sparsification aims to find a spectrally-similar subgraph (sparsifier) $P=(V,E_P,w_P)$ that has the same set of vertices as the original graph $G=(V,E_G,w_G)$, but much fewer edges. There are two types of sparsification methods: cut sparsification methods preserve the cuts of the original graph through random sampling of edges (benczur1996approximating, ), whereas spectral sparsification methods preserve the graph spectral (structural) properties, such as distances between vertices, cuts in the graph, as well as the stationary distributions of Markov chains (cohen2017almost, ; spielman2011spectral, ). Therefore, spectral graph sparsification is a much stronger notion than cut sparsification. We say $G$ and its subgraph $P$ are $\sigma$-spectrally similar if the following condition holds for all real vectors $x\in\mathbb{R}^{|V|}$:

 (2) $$\frac{x^\top L_P x}{\sigma}\le x^\top L_G x\le\sigma\,x^\top L_P x,$$

where $L_P$ and $L_G$ denote the Laplacian matrices of graph $P$ and $G$, respectively. Define the relative condition number as $\kappa(L_G,L_P)=\lambda_{\max}/\lambda_{\min}$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest nonzero eigenvalues of

 (3) $$L_G u=\lambda L_P u,$$

where $u$ is the generalized eigenvector of $(L_G,L_P)$. It can be further shown that $\kappa(L_G,L_P)\le\sigma^2$, which indicates that a smaller relative condition number or $\sigma^2$ corresponds to a higher spectral similarity. A recent nearly-linear time spectral sparsification algorithm leverages spectral perturbation analysis to construct nearly-linear-sized spectrally-similar subgraphs (zhuo:dac16, ; zhuo:dac18, ), which leads to the development of much faster SDD matrix solvers (zhiqiang:iccad17, ) and spectral graph partitioning algorithms (zhuo:dac18, ).
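The relative condition number can be made concrete with a small dense sketch: the generalized eigenvalues of $(L_G,L_P)$ are computed here via the pseudoinverse $L_P^{+}$, which is only practical for tiny examples (real implementations rely on iterative solvers and never form $L_P^{+}$ explicitly; function and variable names are ours):

```python
import numpy as np

def relative_condition_number(LG, LP):
    """kappa(L_G, L_P) = lambda_max / lambda_min over the nonzero
    generalized eigenvalues of L_G u = lambda L_P u."""
    vals = np.real(np.linalg.eigvals(np.linalg.pinv(LP) @ LG))
    nz = vals[vals > 1e-8]               # drop the trivial zero mode
    return nz.max() / nz.min()

# 4-node cycle G vs. one of its spanning paths P.
LG = np.array([[ 2., -1.,  0., -1.],
               [-1.,  2., -1.,  0.],
               [ 0., -1.,  2., -1.],
               [-1.,  0., -1.,  2.]])
LP = np.array([[ 1., -1.,  0.,  0.],
               [-1.,  2., -1.,  0.],
               [ 0., -1.,  2., -1.],
               [ 0.,  0., -1.,  1.]])
```

For identical graphs the ratio is 1; for this cycle/path pair the nontrivial generalized eigenvalues are 1 (twice) and $1+R_{\mathrm{eff}}(0,3)=4$, where $R_{\mathrm{eff}}$ is the effective resistance across the removed edge, so $\kappa=4$.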

## 3. Spectral Reduction of Graphs

### 3.1. Overview

In the following, assume that $G=(V,E_G,w_G)$ is a weighted, undirected, and connected graph, $P$ is the spectrally sparsified graph of $G$, $R$ is the reduced graph of $G$ without sparsification, and $S$ is the sparsified reduced graph of $G$. The Laplacian matrices of the corresponding graphs are shown in Table 1, which also includes the fine-to-coarse (G-to-R) graph mapping matrix denoted by $H_G^R$ as well as the coarse-to-fine (R-to-G) graph mapping matrix denoted by $H_R^G$.

This work introduces a spectral graph reduction framework (as shown in Figure 1) that allows computing a much smaller yet spectrally-similar graph $S$ such that the following condition holds for all real vectors $x\in\mathbb{R}^{|V|}$:

 (4) $$\frac{x_R^\top L_S x_R}{\sigma}\le x^\top L_G x\le\sigma\,x_R^\top L_S x_R,\qquad x_R=H_G^R\,x.$$

An overview of the proposed method for spectral reduction of large graphs is described as follows. Our approach for spectral reduction of undirected graphs includes the following two phases: Phase (A) will determine the fine-to-coarse graph mapping operator $H_G^R$ using a spectral node proximity measurement computed based on algebraic distance (chen2011algebraic, ), and reduce the original graph into a much smaller graph using this operator; Phase (B) will extract spectrally-similar sparsifiers of the original (reduced) graph and scale up edge weights in the sparsified graphs to better match the key spectral (structural) properties, such as the eigenvalues/eigenvectors of graph Laplacians. Since the spectral node proximity metric based on algebraic distance cannot be directly applied to dense graphs (chen2011algebraic, ), our approach will first examine the average node degree of the original graph: if the original graph is relatively sparse, Phases (A) to (B) will be performed in sequence as shown in Figure 1; otherwise, if the original graph is too dense, Phase (B) for spectral graph sparsification and edge scaling will be performed first, followed by Phase (A).

### 3.2. Phase (A): Spectrum-Preserving Node Reduction

#### 3.2.1. Spectral node affinity metric.

To generate the reduced graph based on the original graph, a spectrum-preserving node aggregation scheme is applied based on the spectral node affinity $a_{p,q}$ defined as follows for neighboring nodes $p$ and $q$ (livne2012lean, ; chen2011algebraic, ):

 (5) $$a_{p,q}=\frac{\|(X_p,X_q)\|^2}{(X_p,X_p)\,(X_q,X_q)},\qquad (X_p,X_q)=\sum_{k=1}^{K}\left(x_p^{(k)}\cdot x_q^{(k)}\right),$$

where $X=\left(x^{(1)},\dots,x^{(K)}\right)$ includes $K$ test vectors computed by applying a few Gauss-Seidel (GS) relaxations for solving the linear system of equations $L_G x^{(k)}=0$ for $k=1,\dots,K$, starting with random vectors that are orthogonal to the all-one vector $\mathbf{1}$, or equivalently satisfying $\mathbf{1}^\top x^{(k)}=0$. Let $\tilde{x}$ denote the approximation of the true solution $x$ after applying several GS relaxations. Due to the smoothing property of GS relaxation, the latest error $e=x-\tilde{x}$ will only contain the smooth (low-frequency) modes of the initial error, while the oscillatory (high-frequency) modes of the initial error will be effectively eliminated (briggs2000multigrid, ). Based on the smoothed vectors in $X$, it is possible to embed each node into a $K$-dimensional space such that node $p$ and node $q$ are considered spectrally close to each other if their low-dimensional embedding vectors $X_p$ and $X_q$ are highly correlated. Spectrally-similar nodes $p$ and $q$ can then be aggregated together for node reduction purposes.
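A minimal sketch of this affinity computation (Eq. (5)) using dense NumPy and in-place Gauss-Seidel sweeps follows; the helper name and parameter defaults are illustrative, and a production implementation would use sparse data structures:

```python
import numpy as np

def node_affinities(A, K=4, sweeps=3, seed=0):
    """Spectral node affinities a_{p,q} from K Gauss-Seidel-smoothed
    test vectors for L x = 0 (Eq. (5)); returns an n-by-n matrix."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    d = A.sum(axis=1)                    # weighted degrees
    X = np.empty((n, K))
    for k in range(K):
        x = rng.standard_normal(n)
        x -= x.mean()                    # orthogonal to the all-one vector
        for _ in range(sweeps):          # in-place sweeps = Gauss-Seidel
            for i in range(n):
                x[i] = A[i] @ x / d[i]   # satisfy row i of L x = 0
        X[:, k] = x
    G = X @ X.T                          # pairwise inner products (X_p, X_q)
    return G**2 / np.outer(np.diag(G), np.diag(G))

# 6-node path graph.
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1.0
aff = node_affinities(A)
```

The returned matrix is symmetric with unit diagonal and entries in $[0,1]$; larger off-diagonal values mark spectrally-close node pairs that are candidates for aggregation.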

#### 3.2.2. Spectral similarity between nodes.

It has been shown that the node affinity metric $a_{p,q}$ can usually effectively reflect the distance or strength of connection between nodes $p$ and $q$ in a graph (livne2012lean, ): a larger $a_{p,q}$ value indicates a stronger spectral similarity (correlation) between nodes $p$ and $q$. The above affinity metric can be better understood by looking at the algebraic distance between nodes $p$ and $q$, which can be used to represent the geometric distance in grid-structured graphs: for example, in a path graph obtained by discretizing a 1-D Poisson equation with uniformly-spaced grid points, the algebraic distance between neighboring nodes is proportional to the geometric distance determined by the discretization size. Consequently, the nodes with large affinity or small algebraic distance should be aggregated together to form the nodes in the reduced graph. Once node aggregation schemes are determined, the graph mapping operators ($H_G^R$ and $H_R^G$) can be obtained and leveraged for constructing spectrally-reduced graphs. For example, the reduced Laplacian can be computed by $L_R=H_G^R\,L_G\,(H_G^R)^\top$, which uniquely defines the reduced graph.
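Given a node-aggregation map, the reduced Laplacian follows from one triple product; a tiny dense sketch (the cluster-array encoding of the fine-to-coarse operator is our own convention):

```python
import numpy as np

def reduce_laplacian(LG, clusters):
    """Galerkin-style reduction L_R = H L_G H^T for a given aggregation.
    `clusters[i]` is the coarse node that fine node i aggregates into."""
    n = LG.shape[0]
    nc = max(clusters) + 1
    H = np.zeros((nc, n))
    for i, c in enumerate(clusters):
        H[c, i] = 1.0                    # fine-to-coarse mapping operator
    return H @ LG @ H.T

# 4-node path, aggregating {0,1} -> 0 and {2,3} -> 1.
LG = np.array([[ 1., -1.,  0.,  0.],
               [-1.,  2., -1.,  0.],
               [ 0., -1.,  2., -1.],
               [ 0.,  0., -1.,  1.]])
LR = reduce_laplacian(LG, [0, 0, 1, 1])  # -> [[1, -1], [-1, 1]]
```

The result is itself a valid graph Laplacian: the two coarse nodes are joined by a single edge whose weight (here 1) is the total weight crossing between the two clusters.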

We emphasize that the node aggregation (reduction) scheme based on the above spectral node affinity calculations will have a linear complexity of $O(|E_G|)$ and thus allows preserving the spectral (global or structural) properties of the original graph in the reduced graph in a highly efficient and effective way: the smooth components in the first few Laplacian eigenvectors can be well preserved after node aggregation, which is key to preserving the first few (bottom) eigenvalues and eigenvectors of the original graph Laplacian in the reduced graphs (pmlr-v80-loukas18a, ).

#### 3.2.3. Limitations when dealing with dense graphs.

The above node reduction scheme based on the algebraic distance metric may not be reliable when applied to dense graph problems. Since each node in a dense graph will typically connect to many other nodes, running a few GS relaxations will make many nodes seem close to each other, which can lead to rather poor node aggregation results. For example, an extreme case is to directly apply the above node aggregation scheme to a complete graph, where each node connects to all the other nodes: since applying GS relaxations will immediately assign the same value to all nodes, no meaningful clusters of nodes can be identified. As shown in our experiment results, it is not possible to use the above node affinity metric for aggregating nodes for the “appu” graph (davis2011matrix, ), which has a very high average node degree.

To this end, we propose to perform a spectral sparsification and scaling procedure (Phase (B)) before applying the node aggregation (reduction) phase. Such a scheme will allow extracting ultra-sparse yet spectrally-similar subgraphs and subsequently aggregate nodes into clusters using the above node affinity metric. As a result, the spectral graph reduction flow proposed in this work can be reliably applied to handle both sparse and dense graphs, as shown in Figure 1.

### 3.3. Phase (B): Spectral Graph Sparsification and Scaling

#### 3.3.1. Coping with high graph densities.

The proposed spectral node aggregation scheme in Section 3.2 will enable us to reliably construct smaller graphs that have far fewer nodes. However, the aggregated nodes may potentially result in much denser graphs (with significantly higher node degrees), which may incur even greater computational and memory cost for graph operations. For example, emerging multi-way spectral graph partitioning (clustering) algorithms (lee2014multiway, ; peng2015partitioning, ) require computing multiple Laplacian eigenvectors, which can still be very costly for dense graphs, since the efficiency of today's eigen-solvers or eigendecomposition methods strongly depends on matrix sparsity (xue2009numerical, ; lehoucq1997arpack, ; saad2003iterative, ).

To address the challenging issues caused by relatively dense graphs, we propose the following highly effective yet scalable algorithms in Phase (B): nearly-linear time spectral graph sparsification and subgraph scaling schemes for handling dense graphs. Note that when Phase (B) is applied to a sparse input graph, the same procedures can be applied to the reduced graph (with potentially higher density) after the node aggregation scheme or the fine-to-coarse graph mapping operator is determined.

#### 3.3.2. Spectral approximation with spanning tree subgraphs.

Denote the total stretch of the spanning-tree subgraph $P$ with respect to the original graph $G$ by $\mathrm{st}_P(G)$. Spielman (spielman2009note, ) showed that $G$ has at most $k$ generalized eigenvalues greater than $\mathrm{st}_P(G)/k$. It has been shown that every graph has a low-stretch spanning tree (LSST) with bounded total stretch (spielman2009note, ), which leads to:

 (6) $$\kappa(L_G,L_P)\le \mathrm{Tr}\!\left(L_P^{+}L_G\right)=\mathrm{st}_P(G)\le O(m\log n\,\log\log n),$$

where $m=|E_G|$, $n=|V|$, and $\mathrm{Tr}(L_P^{+}L_G)$ is the trace of $L_P^{+}L_G$. Such a result motivates constructing ultra-sparse yet spectrally-similar subgraphs by recovering only a small portion of important off-tree edges to the spanning tree. For example, a recent spectral perturbation framework (zhuo:dac16, ; zhuo:dac18, ) allows constructing $\sigma$-similar spectral sparsifiers with a small number of off-tree edges in nearly-linear time.

#### 3.3.3. Towards better approximation with off-tree edges.

To dramatically reduce the spectral distortion (the total stretch) between the original graph and the spanning tree, a spectral off-tree edge embedding scheme and an edge filtering method with approximate generalized eigenvectors have been proposed in (zhuo:dac16, ; zhuo:dac18, ), based on the following spectral perturbation analysis:

 (7) $$L_G\left(u_i+\delta u_i\right)=\left(\lambda_i+\delta\lambda_i\right)\left(L_P+\delta L_P\right)\left(u_i+\delta u_i\right),$$

where a perturbation $\delta L_P$ is applied to $L_P$, resulting in perturbations of the generalized eigenvalues $\delta\lambda_i$ and eigenvectors $\delta u_i$ for $i=1,\dots,n$, respectively. The first-order perturbation analysis (zhuo:dac16, ) leads to:

 (8) $$-\frac{\delta\lambda_i}{\lambda_i}=u_i^\top\,\delta L_P\,u_i,$$

which indicates that the reduction of $\lambda_i$ is proportional to the Laplacian quadratic form of $\delta L_P$ with the generalized eigenvector $u_i$. Therefore, if the dominant eigenvector $u_n$ is applied, the largest generalized eigenvalue $\lambda_n$ can be significantly reduced by properly choosing $\delta L_P$ to include the set of off-tree edges and their weights. Once the largest generalized eigenvalue becomes sufficiently small, the spectral distortion between subgraph $P$ and original graph $G$ will be greatly reduced.

An alternative view of such a spectral embedding scheme is to consider the following Courant-Fischer theorem for generalized eigenvalue problems:

 (9) $$\lambda_n=\max_{\substack{|x|\neq 0\\ x^\top\mathbf{1}=0}}\frac{x^\top L_G x}{x^\top L_P x}\;\approx\;\max_{\substack{|x|\neq 0\\ x(p)\in\{0,1\}}}\frac{x^\top L_G x}{x^\top L_P x}\;=\;\max_{Q}\frac{|\partial_G(Q)|}{|\partial_P(Q)|},$$

where $\mathbf{1}$ is the all-one vector, the node set $Q$ is defined as

 (10) $$Q\;\overset{\mathrm{def}}{=}\;\{p\in V:\; x(p)=1\},$$

and the boundary of $Q$ in $G$ is defined as

 (11) $$\partial_G(Q)\;\overset{\mathrm{def}}{=}\;\{(p,q)\in E_G:\; p\in Q,\; q\notin Q\},$$

 (12) $$\partial_P(Q)\;\overset{\mathrm{def}}{=}\;\{(p,q)\in E_P:\; p\in Q,\; q\notin Q\}.$$

According to (9), $\lambda_n$ will reflect the largest mismatch of the boundary (cut) size between $G$ and $P$, since finding the dominant generalized eigenvector $u_n$ is similar to finding the node set $Q$ such that the mismatch of boundary (cut) size $|\partial_G(Q)|/|\partial_P(Q)|$ between the original graph and subgraph is maximized. Once $Q$ or $\partial_G(Q)$ is identified by spectral graph embedding using dominant generalized eigenvectors, the edges in $\partial_G(Q)$ can be selected and recovered to $P$ to dramatically reduce the maximum mismatch or $\lambda_n$.

Denote $e_p$ to be the elementary unit vector with only the $p$-th element being $1$ and all others being $0$, and denote $e_{p,q}=e_p-e_q$. Then, by including all the off-tree edges, the generalized eigenvalue perturbation can be expressed as follows:

 (13) $$-\frac{\delta\lambda_i}{\lambda_i}=u_i^\top\,\delta L_{P,\max}\,u_i=\sum_{(p,q)\in E_G\setminus E_P} w_G(p,q)\left(e_{p,q}^\top u_i\right)^2,$$

where $\delta L_{P,\max}=L_G-L_P$, and $w_G(p,q)$ denotes the weight of edge $(p,q)$ in the original graph. The spectral criticality $c_{p,q}$ of each off-tree edge $(p,q)$ is defined as:

 (14) $$c_{p,q}=w_G(p,q)\left(e_{p,q}^\top u_n\right)^2.$$

If we consider the undirected graph $G$ to be a resistor network, and $u_n$ to be the voltage vector of that resistor network, $c_{p,q}$ can be regarded as the edge Joule heat (power dissipation). Consequently, the most spectrally critical off-tree edges from $E_G\setminus E_P$ can be identified and recovered into the LSST for spectral graph topology sparsification by (14), which allows improving the spectral approximation of the subgraph by dramatically reducing $\lambda_n$. In practice, approximate generalized eigenvectors computed through a small number of generalized power iterations will suffice for low-dimensional spectral off-tree edge embedding, which can be realized as follows:

(1) Compute an approximate dominant generalized eigenvector $h_t$ by applying $t$-step generalized power iterations on an initial random vector $h_0$:

 (15) $$h_t=\left(L_P^{+}L_G\right)^{t}h_0.$$

(2) Compute the quadratic form for off-tree edges with $h_t$:

 (16) $$\tilde{c}_{p,q}=w_G(p,q)\left(e_{p,q}^\top h_t\right)^2,$$

where $\tilde{c}_{p,q}$ denotes the approximate spectral criticality of each off-tree edge $(p,q)$. It should be noted that using the vectors computed by (15) enables embedding each node into a low-dimensional generalized eigenspace, which can facilitate edge filtering from $E_G\setminus E_P$ to avoid recovering similar edges into $P$. A small number of generalized power iterations already leads to consistently good results for a large variety of graph problems. To achieve more effective edge filtering for similarity-aware spectral graph sparsification, an incremental graph densification procedure (zhuo:dac18, ) will be adopted in this work. During each graph densification iteration, a small portion of “filtered” off-tree edges will be added to the latest spectral sparsifier, while the spectral similarity is estimated to determine if more off-tree edges are needed.
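The two steps above can be sketched as follows, with the generalized power iteration $h_t=(L_P^{+}L_G)^t h_0$ realized through a dense pseudoinverse for readability (the paper's flow would instead apply a fast AMG Laplacian solver; names and defaults are illustrative):

```python
import numpy as np

def offtree_edge_criticality(LG, LP, offtree_edges, t=3, seed=0):
    """Approximate spectral criticality c_{p,q} = w(p,q)(e_{p,q}^T h_t)^2
    for each off-tree edge, with h_t from t generalized power iterations."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(LG.shape[0])
    h -= h.mean()                        # orthogonal to the all-one vector
    LPinv = np.linalg.pinv(LP)           # stand-in for an AMG solve of L_P
    for _ in range(t):
        h = LPinv @ (LG @ h)
        h /= np.linalg.norm(h)
    return {(p, q): w * (h[p] - h[q])**2 for p, q, w in offtree_edges}

# 4-node cycle vs. its spanning path; (0, 3) is the single off-tree edge.
LG = np.array([[ 2., -1.,  0., -1.],
               [-1.,  2., -1.,  0.],
               [ 0., -1.,  2., -1.],
               [-1.,  0., -1.,  2.]])
LP = np.array([[ 1., -1.,  0.,  0.],
               [-1.,  2., -1.,  0.],
               [ 0., -1.,  2., -1.],
               [ 0.,  0., -1.,  1.]])
crit = offtree_edge_criticality(LG, LP, [(0, 3, 1.0)])
```

With several off-tree edges, the top-ranked entries of this dictionary would be recovered into the sparsifier first.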

#### 3.3.4. Subgraph scaling via constrained optimization.

To aggressively limit the number of edges in the subgraph $P$ while still achieving a high-quality approximation of the original graph $G$, we propose an efficient spectral scaling scheme for scaling up edge weights in the subgraph to further reduce the largest mismatch or $\lambda_n$. The dominant generalized eigenvalue perturbation can be expressed in terms of edge weight perturbations as follows:

 (17) $$-\frac{\delta\lambda_n}{\lambda_n}=u_n^\top\,\delta L_P\,u_n=\sum_{(p,q)\in E_P}\delta w_P(p,q)\left(e_{p,q}^\top u_n\right)^2,$$

which directly gives the sensitivity of $\lambda_n$ with respect to each edge weight $w_P(p,q)$ in graph $P$:

 (18) $$\frac{\delta\lambda_n}{\delta w_P(p,q)}=-\lambda_n\left(e_{p,q}^\top u_n\right)^2\approx-\lambda_n\left(e_{p,q}^\top h_t\right)^2.$$

The (approximate) sensitivity expressed in (18) can be leveraged for finding a proper edge weight scaling factor for each edge in $P$ such that $\lambda_n$ will be dramatically reduced. Since scaling up edge weights in $P$ will result in a monotonic decrease of both $\lambda_n$ and $\lambda_1$, it is likely that $\lambda_1$ will decrease at a faster rate than $\lambda_n$, which leads to a degraded spectral similarity between $P$ and $G$. To avoid such a degradation in spectral approximation quality, we propose the following methods for estimating the extreme generalized eigenvalues $\lambda_n$ and $\lambda_1$, which allow us to more properly scale up edge weights in $P$.

The largest eigenvalues of $L_P^{+}L_G$ are well separated from each other (spielman2009note, ), which allows computing a rather accurate largest eigenvalue ($\lambda_n$) by performing only a small number of generalized power iterations with an initial random vector. Since the generalized power iterations converge at a geometric rate governed by the separation ratio $\lambda_{n-1}/\lambda_n$, only a few (five to ten) iterations will be sufficient to compute a good estimation of $\lambda_n$ when the largest eigenvalues are well separated. To gain scalable runtime performance, we will leverage recent graph-theoretic algebraic multigrid (AMG) algorithms for solving the sparsified Laplacian matrix $L_P$ (livne2012lean, ; zhiqiang:iccad17, ).

Since the smallest eigenvalues of $L_P^{+}L_G$ are crowded together, using (shifted) inverse power iterations may not be efficient due to the slow convergence caused by the relatively poor separation of the smallest eigenvalues. To more efficiently estimate the smallest generalized eigenvalue, we leverage the Courant-Fischer theorem for approximately computing the smallest generalized eigenvalue:

 (19) $$\lambda_1=\lambda_{\min}=\min_{\substack{|x|\neq 0\\ x^\top\mathbf{1}=0}}\frac{x^\top L_G x}{x^\top L_P x},$$

which indicates that the key to locating the smallest generalized eigenvalue is to find a vector $x$ that minimizes the ratio between the quadratic forms of the original and sparsified Laplacians. In our method, we restrict every element of $x$ to take only a value of $0$ or $1$ for each node in both $G$ and $P$ when minimizing the following ratio, which will lead to a good estimation for $\lambda_1$:

 (20) $$\lambda_1\approx\min_{x(p)\in\{0,1\}}\frac{x^\top L_G x}{x^\top L_P x}.$$

To this end, we initialize all nodes with the same value of $0$ and select only a single node $p$ to be assigned a value of $1$, which leads to:

 (21) $$\lambda_1\approx\min_{p\in V}\frac{d_G(p)}{d_P(p)},$$

where $d_G$ and $d_P$ are the diagonal vectors of $L_G$ and $L_P$ satisfying $d_G(p)=L_G(p,p)$ and $d_P(p)=L_P(p,p)$. (21) indicates that $\lambda_1$ can be well approximated in linear time by finding the node with the minimum weighted degree ratio of $G$ and $P$.
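Eq. (21) amounts to one pass over the Laplacian diagonals; a minimal sketch (function name and example are ours):

```python
import numpy as np

def estimate_lambda_min(LG, LP):
    """Eq. (21): lambda_1 ~ min_p d_G(p)/d_P(p), a linear-time estimate
    using only the weighted node degrees (Laplacian diagonals)."""
    return float(np.min(np.diag(LG) / np.diag(LP)))

# 4-node cycle vs. its spanning path: degree ratios are [2, 1, 1, 2].
LG = np.array([[ 2., -1.,  0., -1.],
               [-1.,  2., -1.,  0.],
               [ 0., -1.,  2., -1.],
               [-1.,  0., -1.,  2.]])
LP = np.array([[ 1., -1.,  0.,  0.],
               [-1.,  2., -1.,  0.],
               [ 0., -1.,  2., -1.],
               [ 0.,  0., -1.,  1.]])
```

For this pair the estimate is 1.0, which happens to coincide with the true smallest nonzero generalized eigenvalue; in general, restricting $x$ to a single-node indicator yields an upper bound on $\lambda_1$.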

Based on the above scalable methods for estimating the extreme eigenvalues $\lambda_1$ and $\lambda_n$ of $L_P^{+}L_G$, as well as the weight sensitivity in (18), the following constrained nonlinear optimization framework for scaling up edge weights in $P$ is proposed:

 (22) $$\begin{aligned}\text{minimize: }&\;\lambda_n(w_P)\\ \text{s.t.: (a) }&\;L_G u_i=\lambda_i L_P u_i,\quad i=1,\dots,n;\\ \text{(b) }&\;\lambda_{\max}=\lambda_n\ge\lambda_{n-1}\ge\cdots\ge\lambda_1=\lambda_{\min};\\ \text{(c) }&\;\lambda_1^{(f)}\ge\lambda_1^{(0)}/\bar{\Delta}_{\lambda_1}.\end{aligned}$$

In the above formulation, $\lambda_1^{(0)}$ and $\lambda_1^{(f)}$ represent the smallest nonzero generalized eigenvalue before and after subgraph scaling, respectively, while $\bar{\Delta}_{\lambda_1}$ represents the upper bound of the reduction factor in $\lambda_1$ after edge scaling. (22) aims to minimize $\lambda_n$ by scaling up subgraph edge weights while limiting the decrease in $\lambda_1$.

A constrained SGD algorithm with momentum (sutskever2013importance, ) is proposed for iteratively scaling up edge weights, as shown in Algorithm 1. The algorithm inputs include: the graph Laplacians $L_G$ and $L_P$, the vectors $d_G$ and $d_P$ storing the diagonal elements of the Laplacians, the initial largest and smallest generalized eigenvalues $\lambda_n$ and $\lambda_1$, the upper bound reduction factor $\bar{\Delta}_{\lambda_1}$ for $\lambda_1$, the coefficient for combining the previous and the latest updates during each SGD iteration with momentum, the maximum step size for each update, as well as the SGD convergence control parameters. Lines 1-2 initialize the parameters for the subsequent SGD iterations. Line 3 monitors the convergence condition of the SGD iterations. Lines 6-7 compute the weight update in SGD using the latest sensitivity and the previous update (momentum). Lines 8-17 check the impact of the weight update on $\lambda_1$: if $\lambda_1$ decreases significantly, an upper bound for the weight update is applied; otherwise the weight update computed in the previous steps is applied directly.
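A compact sketch of the momentum-SGD weight scaling follows. The power-iteration eigensolver, the multiplicative update, the fixed iteration count, and the $\lambda_1$ guard via the degree-ratio estimate of Eq. (21) are all simplifications of Algorithm 1; every name and default here is illustrative rather than the paper's exact choice:

```python
import numpy as np

def build_lap(n, edges, w):
    """Assemble a graph Laplacian from an edge list and weight vector."""
    L = np.zeros((n, n))
    for (p, q), wpq in zip(edges, w):
        L[p, p] += wpq; L[q, q] += wpq
        L[p, q] -= wpq; L[q, p] -= wpq
    return L

def sgd_edge_scaling(LG, edges, w0, iters=30, eta=0.5, beta=0.5, drop=2.0):
    """Momentum SGD on subgraph edge weights: push lambda_n down via the
    sensitivity of Eq. (18) while rejecting updates that would shrink the
    lambda_1 estimate below lambda_1^(0)/drop (the Eq. (22c) guard)."""
    n = LG.shape[0]
    w = np.array(w0, float)
    lam1_0 = np.min(np.diag(LG) / np.diag(build_lap(n, edges, w)))
    mom = np.zeros(len(edges))
    for _ in range(iters):
        LP = build_lap(n, edges, w)
        # dominant generalized eigenvector via a few power iterations
        u = np.random.default_rng(0).standard_normal(n); u -= u.mean()
        LPinv = np.linalg.pinv(LP)       # stand-in for an AMG solve
        for _ in range(8):
            u = LPinv @ (LG @ u); u /= np.linalg.norm(u)
        lam_n = (u @ LG @ u) / (u @ LP @ u)
        # sensitivity (18): d lambda_n / d w(p,q) ~ -lambda_n (u_p - u_q)^2
        grad = np.array([lam_n * (u[p] - u[q])**2 for p, q in edges])
        mom = beta * mom + eta * grad
        w_new = w + mom * w              # multiplicative scale-up
        lam1 = np.min(np.diag(LG) / np.diag(build_lap(n, edges, w_new)))
        if lam1 >= lam1_0 / drop:        # keep lambda_1 within its bound
            w = w_new
    return w

# Example: scale the 3-edge spanning path of the 4-node cycle.
LG = np.array([[ 2., -1.,  0., -1.],
               [-1.,  2., -1.,  0.],
               [ 0., -1.,  2., -1.],
               [-1.,  0., -1.,  2.]])
edges = [(0, 1), (1, 2), (2, 3)]
w_scaled = sgd_edge_scaling(LG, edges, [1.0, 1.0, 1.0], iters=10)
```

Because the weights only grow, $\lambda_n$ can only decrease, while the guard keeps the degree-ratio estimate of $\lambda_1$ above $\lambda_1^{(0)}/\bar{\Delta}_{\lambda_1}$.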

### 3.4. Algorithm Complexity of The Proposed Spectral Graph Reduction Approach

The complete algorithm flow of the proposed spectral graph reduction approach is shown in Algorithm 2. The spectrum-preserving node reduction procedure of Phase (A) as well as the spectral graph sparsification and SGD-based edge scaling of Phase (B) can each be performed in nearly-linear time for both sparse and dense graphs. Therefore, the worst-case complexity of the proposed spectral graph reduction method is nearly-linear.

### 3.5. Solution Refinement by Graph Filters

#### 3.5.1. Graph Signal Processing and Spectral Sparsification/Reduction

To efficiently analyze signals on undirected graphs, graph signal processing techniques have been extensively studied in recent years (shuman2013emerging, ). There are analogies between traditional signal processing or classical Fourier analysis and graph signal processing (shuman2013emerging, ): (1) the signals at different time points in classical Fourier analysis correspond to the signals at different nodes in an undirected graph; (2) the more slowly oscillating functions in the time domain correspond to the graph Laplacian eigenvectors associated with lower eigenvalues, or the more slowly varying (smoother) components across the graph. The spectrally sparsified/reduced graphs can be regarded as “low-pass” filtered graphs, which retain as few edges/nodes as possible while preserving the slowly-varying or “low-frequency” signals on the original graphs. Consequently, spectrally sparsified/reduced graphs will be able to preserve the eigenvectors associated with low eigenvalues more accurately than those associated with high eigenvalues.

#### 3.5.2. Solution Error due to Spectral Sparsification

Denote the ascending eigenvalues and the corresponding unit-length, mutually-orthogonal eigenvectors of $L_G$ by $\zeta_1\le\cdots\le\zeta_n$ and $\omega_1,\dots,\omega_n$, respectively. Similarly, denote the eigenvalues and eigenvectors of $L_P$ by $\tilde{\zeta}_1\le\cdots\le\tilde{\zeta}_n$ and $\tilde{\omega}_1,\dots,\tilde{\omega}_n$, respectively. It should be noted that both $\omega_1$ and $\tilde{\omega}_1$ are the normalized all-one vector $\mathbf{1}/\sqrt{n}$. Then the following spectral decompositions of $L_G$ and $L_P$ will hold:

 (23) $$L_G=\sum_{i=1}^{n}\zeta_i\,\omega_i\omega_i^\top,\qquad L_P=\sum_{i=1}^{n}\tilde{\zeta}_i\,\tilde{\omega}_i\tilde{\omega}_i^\top.$$

We assume that the first $k$ smallest eigenvalues and their eigenvectors of $L_G$ have been well preserved in $L_P$, while the remaining higher eigenvalues and eigenvectors are not. Consequently, the following approximate spectral decomposition of $L_P$ will hold:

 (24) $$L_P\approx\sum_{i=1}^{k}\zeta_i\,\omega_i\omega_i^\top+\sum_{i=k+1}^{n}\tilde{\zeta}_i\,\tilde{\omega}_i\tilde{\omega}_i^\top.$$

In the following, we show that using spectrally-sparsified graphs for solving sparse matrix problems will only result in solution errors that can be expressed with high eigenvectors, while the error analysis for spectrally-reduced graphs is quite similar and thus omitted in this work. Consider the following SDD matrix solution problem:

 (25) $$\left(L_G+\delta I\right)x=b_{\perp},$$

where $b_{\perp}$ is a random right-hand-side (RHS) vector orthogonal to the all-one vector $\mathbf{1}$, $\delta$ is a very small positive real value added to the graph Laplacian for modeling boundary conditions, and $I$ is an identity matrix that can be written as follows:

 (26) $$I=\sum_{i=1}^{n}\omega_i\omega_i^\top\approx\sum_{i=1}^{k}\omega_i\omega_i^\top+\sum_{i=k+1}^{n}\tilde{\omega}_i\tilde{\omega}_i^\top,$$

so we can rewrite $L_G+\delta I$ as follows:

 (27) $$L_G+\delta I=\sum_{i=1}^{n}\left(\zeta_i+\delta\right)\omega_i\omega_i^\top.$$

Consequently, the solution $x$ can be written as:

 (28) $$x=\sum_{i=1}^{n}\frac{\omega_i\omega_i^\top b_{\perp}}{\zeta_i+\delta}.$$

Let $\tilde{x}$ denote the approximate solution obtained with $L_P$; then:

 (29) $$\tilde{x}\approx\sum_{i=k+1}^{n}\frac{\tilde{\omega}_i\tilde{\omega}_i^\top b_{\perp}}{\tilde{\zeta}_i+\delta}+\sum_{i=1}^{k}\frac{\omega_i\omega_i^\top b_{\perp}}{\zeta_i+\delta},$$

which allows us to express the error vector $e$ as follows:

 (30) $$e=x-\tilde{x}\approx\sum_{i=k+1}^{n}\left(\frac{\omega_i\omega_i^\top b_{\perp}}{\zeta_i+\delta}-\frac{\tilde{\omega}_i\tilde{\omega}_i^\top b_{\perp}}{\tilde{\zeta}_i+\delta}\right).$$

(30) indicates that when using the sparsified graph Laplacian for solving the SDD matrix, the solution error can be expressed as a linear combination of high eigenvectors corresponding to large Laplacian eigenvalues. Therefore, the error due to the sparsified graph Laplacian will be a combination of high frequency signals on graphs, which thus can be efficiently filtered out using “low-pass” graph signal filters (shuman2013emerging, ).

#### 3.5.3. Solution Refinement by Smoothing

Motivated by recent graph signal processing research (shuman2013emerging, ), we introduce a simple yet effective procedure for improving solutions computed using spectrally sparsified/reduced graphs, which enables leveraging ultra-sparse subgraphs in many numerical and graph algorithms without sacrificing solution quality. To this end, weighted Jacobi or Gauss-Seidel methods can be applied for filtering out such high-frequency error signals on graphs; these methods have been widely adopted in modern iterative solvers for large sparse matrices (saad2003iterative, ), such as the smoothing (relaxation) functions in multigrid algorithms (livne2012lean, ). This work adopts a weighted Jacobi iteration scheme for filtering eigenvectors on the graph; the detailed filtering procedure is described in Algorithm 3. The algorithm inputs include the original Laplacian matrix $L_G = D - A$, decomposed into its diagonal matrix $D$ and adjacency matrix $A$, the approximate solution vectors obtained using the sparsified Laplacian $L_P$, as well as the weight and iteration number for signal filtering.
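As a concrete illustration, the weighted-Jacobi smoothing step can be sketched as below; the function name, the damping weight of 2/3, and the iteration count are illustrative assumptions, not the exact settings of Algorithm 3:

```python
import numpy as np

def jacobi_smooth(L, x, b, weight=2/3, iters=10):
    """Weighted Jacobi smoothing for L x = b, with L = D - A.

    Each sweep damps the high-frequency (large-eigenvalue) error
    components, which is exactly where the sparsification error of
    Eq. (30) is concentrated.
    """
    D = L.diagonal()                 # degrees (plus any delta shift)
    A = np.diag(D) - L               # adjacency part of the Laplacian
    for _ in range(iters):
        x = (1 - weight) * x + weight * (A @ x + b) / D
    return x
```

Applied to an approximate solution computed on the sparsified graph, a handful of such sweeps removes most of the high-frequency error while leaving the well-preserved low-frequency components essentially untouched.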

## 4. Spectral Reduction for Multilevel Graph Partitioning and Data Visualization

In this section, we introduce multilevel frameworks that leverage spectrally-reduced graphs for much faster spectral graph partitioning as well as t-distributed Stochastic Neighbor Embedding (t-SNE) of large data sets.

### 4.1. Multilevel Laplacian Eigensolver for Scalable Spectral Graph Partitioning

The k-way spectral graph partitioning (clustering) algorithm is described in Algorithm 4 (shi2000normalized, ; von2007tutorial, ), where the Laplacian eigensolver is usually the computational bottleneck when dealing with large graphs. To this end, we propose a multilevel Laplacian eigensolver that solves eigenvalue problems more efficiently by leveraging spectrally-reduced graphs. Since only the first few nontrivial eigenvectors of the original graph Laplacian are needed for spectral partitioning (clustering) tasks, the spectrally-reduced graphs enable us to solve the eigenvalue problem much faster without loss of solution quality.
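For reference, the baseline k-way spectral partitioning pipeline of Algorithm 4 can be sketched as follows; the function name and the small dense solvers are illustrative stand-ins (a large graph would use a sparse eigensolver on the reduced Laplacian instead):

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse import csgraph
from scipy.cluster.vq import kmeans2

def spectral_partition(adj, k):
    """k-way spectral partitioning in the style of Algorithm 4:
    embed each node with the first k Laplacian eigenvectors,
    then cluster the embedded nodes with k-means."""
    L = csgraph.laplacian(np.asarray(adj, dtype=float))
    # first k eigenpairs (use a sparse eigensolver for large graphs)
    _, vecs = eigh(L, subset_by_index=[0, k - 1])
    _, labels = kmeans2(vecs, k, minit='++', seed=0)
    return labels
```

The proposed method replaces the `eigh` call, which dominates the runtime on large graphs, with the multilevel eigensolver operating on the spectrally-reduced graph.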

The algorithm flow of the proposed multilevel eigensolver is shown in Figure 2. Instead of directly computing the first few eigenvectors of the generalized eigenvalue problem on the original graph, we first reduce the original graph into a much smaller graph such that the eigenvectors of the reduced graph Laplacian can be calculated efficiently. Next, we map the eigenvectors of the reduced graph Laplacian onto a finer level using the graph mapping operators (shown in Table 1) determined during the node aggregation procedure (Phase A). To further improve the approximation quality of these eigenvectors, we apply an eigenvector refinement (smoothing) procedure similar to Algorithm 3. The eigenvector mapping and smoothing procedures are applied recursively until the finest-level graph is reached. Finally, all eigenvectors of the finest-level graph are orthonormalized through the Gram-Schmidt process.

The proposed eigenvector smoothing process is based on the following equation for $i = 1, \dots, k$:

$$\left(L_G^{\upsilon}-\lambda_i^{\Upsilon} B_G^{\upsilon}\right)u_i^{\upsilon}=0, \tag{31}$$

where $L_G^{\upsilon}$ is the graph Laplacian on level $\upsilon$ after graph reduction, with $\upsilon=1$ representing the finest level; $\Upsilon$ denotes the coarsest (bottom) level; $B_G^{\upsilon}=I$ is used for the ratio cut and $B_G^{\upsilon}=D_G^{\upsilon}$ for the normalized cut (see Section A in the Appendix for more details); and $\lambda_i^{\Upsilon}$ is the eigenvalue of the following generalized eigenproblem:

$$L_G^{\Upsilon}u_i^{\Upsilon}=\lambda_i^{\Upsilon}B_G^{\Upsilon}u_i^{\Upsilon}. \tag{32}$$

The detailed multilevel Laplacian eigensolver is given in Algorithm 5. Its inputs include the Laplacian matrix $L_G^{\upsilon}$ of each hierarchical level $\upsilon$, where $1\le\upsilon\le\Upsilon$; the mapping operators between adjacent levels; and the number of eigenvectors $k$. Finally, spectral partitioning or clustering can be performed by applying k-means clustering to the eigenvectors computed by Algorithm 5.
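The coarse-solve, map-up, smooth, and orthonormalize loop can be sketched as follows. This is a simplified illustration with $B = I$ (ratio cut), dense matrices, weighted-Jacobi smoothing, and illustrative parameter choices, not a reproduction of Algorithm 5:

```python
import numpy as np

def multilevel_eigensolver(Ls, Ms, k, weight=2/3, smooth_iters=10):
    """Sketch of a multilevel Laplacian eigensolver (B = I, ratio cut).

    Ls[0] is the finest-level Laplacian, Ls[-1] the coarsest;
    Ms[v] maps level-(v+1) vectors onto the finer level v.
    The coarsest eigenproblem is solved directly, then the eigenvectors
    are mapped upward and smoothed at each level.
    """
    _, U = np.linalg.eigh(Ls[-1])
    U = U[:, :k]                       # first k coarse eigenvectors
    for v in range(len(Ls) - 2, -1, -1):
        U = Ms[v] @ U                  # map onto the finer level v
        D = Ls[v].diagonal()
        A = np.diag(D) - Ls[v]         # adjacency part of the Laplacian
        for _ in range(smooth_iters):  # damp high-frequency components
            U = (1 - weight) * U + weight * (A @ U) / D[:, None]
    Q, _ = np.linalg.qr(U)             # Gram-Schmidt orthonormalization
    return Q
```

Replacing the direct `eigh` on the finest level with a direct solve only on the small coarsest Laplacian is what yields the reported speedups.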

### 4.2. Multilevel t-SNE Algorithm for Scalable Data Visualization

Visualization of high-dimensional data is a fundamental problem in data analysis and has been used in many applications, such as the medical sciences, physics, and economics. In recent years, t-Distributed Stochastic Neighbor Embedding (t-SNE) has become the most effective visualization tool due to its capability of performing dimensionality reduction in such a way that similar data points in the high-dimensional space are embedded onto nearby locations in a low-dimensional space of two or three dimensions with high probability. However, t-SNE may suffer from very high computational cost when visualizing large real-world data sets due to its superlinear computational complexity in the number of data points $N$ (maaten2008visualizing, ; van2014accelerating, ).

Recent research shows that there is a clear connection between spectral graph partitioning (data clustering) and t-SNE (linderman2017clustering, ): the low-dimensional embedding obtained with t-SNE is closely related to the first few eigenvectors of the corresponding graph Laplacian that encodes the manifold of the original high-dimensional data points. This motivates us to leverage spectrally-reduced graphs for computing similar t-SNE embedding results via a multilevel t-SNE algorithm, as described in Algorithm 6 and shown in Figure 3.

The main idea of our multilevel t-SNE algorithm is to aggregate data points that are closely related to each other on the manifold into much smaller sets, such that visualizing the reduced data set using t-SNE will be much faster while producing similar embedding results. To this end, we start by constructing a nearest-neighbor (NN) graph, such as a k-NN graph, for the original high-dimensional data points; a spectrally-reduced (NN) graph is then computed using the proposed spectral reduction algorithm. Note that for k-NN graphs, the graph sparsification and scaling procedure (Phase B) is performed before the spectral node aggregation step (Phase A).
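The aggregate-then-embed step at the heart of Algorithm 6 can be sketched as follows; the t-SNE call is abstracted behind a caller-supplied `embed_2d` function, and the cluster assignment is assumed to come from the spectral node-aggregation phase (both names are illustrative):

```python
import numpy as np

def reduce_and_embed(X, assign, embed_2d):
    """X: (N, d) data points; assign: (N,) aggregate index per point,
    produced by the spectral node-aggregation phase (Phase A).
    Embeds only the aggregate centroids, then maps every original
    point to its aggregate's 2-D location."""
    n_agg = assign.max() + 1
    # centroid of each aggregate in the high-dimensional space
    centroids = np.zeros((n_agg, X.shape[1]))
    counts = np.bincount(assign, minlength=n_agg)
    np.add.at(centroids, assign, X)
    centroids /= counts[:, None]
    Y_small = embed_2d(centroids)      # e.g. t-SNE on the reduced set
    return Y_small[assign]             # broadcast back to all N points
```

Since t-SNE's cost grows superlinearly with the number of embedded points, running it on the aggregates instead of all $N$ points is where the reported t-SNE speedup comes from.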

## 5. Experimental Results

In this section, extensive experiments are conducted to evaluate the proposed spectral graph reduction and spectral partitioning methods with various types of graphs from the DIMACS10 graph collection (bader2014benchmarking, ; bader2012graph, ). The graphs come from different applications, such as finite-element analysis problems ("fe_tooth", "fe_rotor") (davis2011matrix, ), numerical simulation graphs ("wing_nodal"), clustering graphs ("uk"), and social network graphs ("coAuthorsDBLP" and "coPapersCiteseer") (davis2011matrix, ). All experiments were conducted on a single CPU core of a computing platform running 64-bit RHEL 6.0 with a 2.67GHz 12-core CPU and 48GB of DRAM memory.

Figure 4 shows the spectral drawings (koren2003spectral, ) of the "fe_ocean" graph and its reduced graph computed by the proposed spectral graph reduction algorithm. We observe that the spectral drawings of the two graphs are highly similar to each other, indicating that the spectral properties (Laplacian eigenvectors) are very well preserved in the reduced graph. Figure 5 shows the first few normalized eigenvalues of the original and reduced graph Laplacians, showing clearly that the smallest eigenvalues of the original and reduced Laplacians match very well even for very large reduction ratios.

Table 2 shows spectral graph reduction results on different kinds of graphs using the proposed method, including the spectral graph reduction time. Compared to the other test cases, which correspond to sparse graphs, a few of the test graphs have much higher densities and have thus been processed as dense graphs. We further emphasize that directly applying the prior algebraic-distance-based node aggregation scheme (chen2011algebraic, ) would not produce acceptable results: for example, the node aggregation algorithm failed to generate a reduced graph for one of the dense test cases due to its very high graph density. On the other hand, dense graphs pose no issue if the Phase (B) spectral graph sparsification and scaling procedure is applied before the node aggregation phase.

Figure 6 shows the total spectral graph reduction time for various graphs as the problem size grows, where $|E|$ ($|V|$) denotes the number of edges (nodes) of the original graphs. As observed, the total spectral reduction runtime increases almost linearly with the problem size, indicating the highly scalable, nearly-linear time performance of the proposed method.