Many of the most interesting structures and phenomena of our world are naturally described as graphs (eg, brains, social networks, the internet, etc). [Footnote: The authors agree with the sentiment of the footnote on page 15 in , viz, omitting superfluous full stops to obtain a more efficient (yet similarly lossless) compression of, eg: videlicet, exempli gratia, etc.] Indeed, graph data are becoming increasingly relevant to the field of machine learning [2, 3, 4]. These graphs are frequently massive, easily surpassing our working memory, and often the computer's relevant cache. It is therefore essential to obtain smaller approximate graphs to allow for more efficient computation and storage. Moreover, graph reduction aids in visualization and can provide structural insights. [Footnote: For example, a common theme in biological systems is the presence of complicated pathways that produce a relatively simple result (eg, protein activation pathways). Semantic understanding comes from the reduction of these subsystems to their resulting behavior (eg, one protein activates a chain that eventually inhibits another).]
Graphs are defined by a set of nodes and a set of edges between them, and are often represented as an adjacency matrix $\underline{\underline{A}}$ [Footnote: A quick notation note: underlines denote bundling of multiple scalars into a single object, where the number of underlines denotes the rank of the tensor (ie, a single underline for vectors, double for matrices). Subscripts have nothing to do with Einstein summation notation; they are contextually-defined adornments.] with size (the number of nodes) and density (the number of edges). Reducing either of these quantities is advantageous; graph “coarsening” focuses on the former, aggregating nodes while respecting the overall structure, and graph “sparsification” on the latter, preferentially retaining the important edges.
A variety of algorithms have been proposed to attain these goals, and a frequently recurring theme is to consider the graph Laplacian $\underline{\underline{L}} \equiv \underline{\underline{D}} - \underline{\underline{A}}$, where $\underline{\underline{D}}$ is the diagonal matrix of node degrees. Indeed, it appears in a wide range of applications (enough to motivate the coining of the term “Laplacian paradigm”). For example: its spectral properties can be leveraged for community detection; it can be used to efficiently solve min cut/max flow problems; and for undirected, positively weighted graphs (the focus of our paper), it induces a natural quadratic form $\underline{x}^{\!\top}\underline{\underline{L}}\,\underline{x} = \sum_{(i,j) \in E} w_{ij}(x_i - x_j)^2$ (which can be used, eg, to smoothly interpolate functions over the nodes).
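To make these objects concrete, here is a minimal numerical sketch (ours, not from the paper; the small graph and its weights are illustrative) of the Laplacian $\underline{\underline{L}} = \underline{\underline{D}} - \underline{\underline{A}}$ and its quadratic form:

```python
import numpy as np

# Illustrative undirected, positively weighted graph (edge list and weights
# are arbitrary choices for demonstration).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
weights = [1.0, 2.0, 0.5, 1.5]
n = 4

A = np.zeros((n, n))
for (i, j), w in zip(edges, weights):
    A[i, j] = A[j, i] = w            # symmetric: undirected graph
D = np.diag(A.sum(axis=1))           # diagonal matrix of weighted node degrees
L = D - A                            # graph Laplacian

x = np.array([1.0, -2.0, 0.5, 3.0])
quad = x @ L @ x                     # the quadratic form x^T L x
quad_by_edges = sum(w * (x[i] - x[j]) ** 2 for (i, j), w in zip(edges, weights))
assert np.isclose(quad, quad_by_edges)   # equals the weighted sum over edges
assert np.allclose(L @ np.ones(n), 0)    # the all-ones vector spans the kernel
```

The two assertions check the properties used throughout: the quadratic form is a weighted sum of squared differences across edges (hence non-negative), and the all-ones vector lies in the kernel.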
Work on spectral graph sparsification focuses on the preservation of the Laplacian quadratic form $\underline{x}^{\!\top}\underline{\underline{L}}\,\underline{x}$, a popular measure of spectral similarity suggested by Spielman and collaborators. A key result is that any graph may be sparsified by subsampling its edges, with optimal probability proportional to their effective resistance (a measure of edge importance defined via the Laplacian pseudoinverse). Work on graph coarsening, however, has not reached a similar consensus. While it has been recently proposed to measure similarity between the original and coarsened graph using an analogous restricted quadratic form, the objective functions used for coarsening do not explicitly optimize this quantity. For example, Jin & Jaja use the lowest eigenvectors of the Laplacian (those corresponding to global structure) as feature vectors to perform $k$-means clustering of the nodes, while Purohit et al aim to minimize the change to the largest eigenvalue of the adjacency matrix.
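As a concrete sketch (ours, not the paper's code) of the effective resistance mentioned above, $\Omega_e = \underline{b}_e^{\!\top}\underline{\underline{L}}^{+}\underline{b}_e$, where $\underline{b}_e$ is the signed incidence vector of edge $e$; the example graph is a unit-weight triangle plus a pendant edge:

```python
import numpy as np

# Illustrative graph: triangle (0, 1, 2) plus bridge (2, 3), unit weights.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n = 4

L = np.zeros((n, n))
for i, j in edges:                 # assemble the Laplacian (unit weights)
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1
Lp = np.linalg.pinv(L)             # Laplacian pseudoinverse

def eff_resistance(i, j):
    b = np.zeros(n); b[i], b[j] = 1.0, -1.0   # signed incidence vector
    return b @ Lp @ b

# Each triangle edge has Omega = 2/3; the bridge (2, 3) has Omega = 1, the
# maximum possible value of w * Omega, so resistance-proportional sampling
# never deletes it.
assert np.isclose(eff_resistance(2, 3), 1.0)
```

This illustrates why effective resistance is a sensible importance measure: bridges (whose removal disconnects the graph) attain the maximal value $w_e \Omega_e = 1$.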
While recent work has combined coarsening and sparsification, the two operations were performed by separate algorithmic primitives, essentially analyzing the composition of the above algorithms. In this work, we capture both of these operations using a single objective function that naturally promotes the preservation of large-scale structure. Our primary contributions include:
An identification of infinite edge weight with edge contraction, highlighting the finite limit of the Laplacian pseudoinverse (Section 2);
A measure of edge importance of independent interest, which could be used to analyze connections in real-world networks (Section 3);
A more sensitive measure of spectral similarity of graphs (Section 5); and
A demonstration that our framework preferentially preserves the large-scale structure (Section 5).
2 Why the Laplacian pseudoinverse?
Many computations over graphs involve solving $\underline{\underline{L}}\,\underline{x} = \underline{b}$ for $\underline{x}$. Thus, the (algebraically) relevant object is arguably the Laplacian pseudoinverse $\underline{\underline{L}}^{+}$, and in fact it has been used to derive useful measures of graph similarity. Moreover, its largest eigenvalues are associated with global structure; thus, preserving the action of $\underline{\underline{L}}^{+}$ will preferentially maintain the overall shape of the graph. We now describe why $\underline{\underline{L}}^{+}$ is a natural operator to consider for both graph coarsening and sparsification.
Attention is often restricted to undirected, positively weighted graphs. These graphs have many convenient properties, eg, their Laplacians are positive semi-definite ($\underline{x}^{\!\top}\underline{\underline{L}}\,\underline{x} \ge 0$) and have a well-understood kernel and cokernel (both spanned by the all-ones vector, as $\underline{\underline{L}}\,\underline{1} = \underline{0}$). The edge weights are defined as a mapping $w: E \to \mathbb{R}_{>0}$. When the weights represent connection strength, it is generally understood that the limit $w_e \to 0$ is equivalent to removing edge $e$. However, the closure of the positive reals has a reciprocal limit, namely $w_e \to +\infty$.
This limit is rarely considered, as it can lead to divergences in many classical notions of graph similarity. A relevant example is the standard notion of spectral approximation, defined as preserving the Laplacian quadratic form to within a factor of $1 \pm \epsilon$ for all vectors $\underline{x}$. Clearly, this limit yields a graph that does not approximate the original for any $\epsilon$: any $\underline{x}$ with different values for the two nodes joined by edge $e$ now yields an infinite quadratic form. This suggests considering only vectors that have the same value for these two nodes, essentially contracting them into a single “supernode”.
Algebraically, this interpretation is reflected in $\underline{\underline{L}}^{+}$, which remains finite in this limit: the pair of rows (and columns) corresponding to the contracted nodes simply become identical.
Physically, consider the behavior of the heat equation $\partial_t \underline{x} = -\underline{\underline{L}}\,\underline{x}$: as $w_e \to +\infty$, the values on the two nodes immediately equilibrate between themselves, and remain tethered for the rest of the evolution. [Footnote: In the spirit of another common analogy (edge weights as conductances of a network of resistors), breaking a resistor is equivalent to deleting that edge, while contraction amounts to completely soldering over it.]
Geometrically, the reciprocal limits $w_e \to 0$ and $w_e \to +\infty$ have dual interpretations: consider a planar graph and its planar dual; edge deletion in one graph corresponds to contraction in the other, and vice versa. This naturally extends to nonplanar graphs via their graphical matroid and its dual.
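The finite limit of $\underline{\underline{L}}^{+}$ can be checked numerically. The sketch below (ours; the 4-cycle example is arbitrary) sends one edge weight toward infinity and observes that, while $\underline{\underline{L}}$ diverges, $\underline{\underline{L}}^{+}$ stays bounded and the rows of the two joined nodes converge:

```python
import numpy as np

# Illustrative check: as w(0,1) grows, L^+ stays finite and rows 0 and 1
# of L^+ become identical, ie nodes 0 and 1 are effectively contracted.
def laplacian(edges, weights, n):
    L = np.zeros((n, n))
    for (i, j), w in zip(edges, weights):
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

edges, n = [(0, 1), (1, 2), (2, 3), (3, 0)], 4   # a 4-cycle
for w01 in [1e2, 1e4, 1e6]:
    Lp = np.linalg.pinv(laplacian(edges, [w01, 1.0, 1.0, 1.0], n))
    gap = np.abs(Lp[0] - Lp[1]).max()   # discrepancy between the two rows

assert gap < 1e-4                # rows have converged at w01 = 1e6
assert np.abs(Lp).max() < 10     # L^+ remains bounded even as L diverges
```

The limiting $\underline{\underline{L}}^{+}$ coincides (up to the duplicated row and column) with the pseudoinverse of the Laplacian of the contracted graph, here a triangle on the supernode $\{0,1\}$ and nodes 2, 3.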
3 Our graph reduction framework
We provide a framework to construct probabilistic algorithms that generate a reduced graph from an initial graph, motivated by the following desirable properties:
reduce the number of edges and nodes,
preserve $\underline{\underline{L}}^{+}$ in expectation,
minimize the change in $\underline{\underline{L}}^{+}$.
In this section, we state these goals more formally, starting with a single iteration in its simplest form (ie, applied to a single, predetermined edge $e$).
3.1 Preserving the Laplacian pseudoinverse
Consider perturbing the weight of a single edge $e$ by $\delta w_e$. The change in the Laplacian is simply
$$\Delta\underline{\underline{L}} \equiv \underline{\underline{L}}' - \underline{\underline{L}} = \delta w_e\, \underline{b}_e \underline{b}_e^{\!\top}$$
where $\underline{\underline{L}}$ and $\underline{\underline{L}}'$ are the original and perturbed Laplacians, respectively, and $\underline{b}_e$ is the signed incidence (column) vector associated to edge $e = (i, j)$, with entries $+1$ in row $i$, $-1$ in row $j$, and $0$ elsewhere (the choice of orientation is arbitrary).
The change in $\underline{\underline{L}}^{+}$ is given by the Woodbury matrix identity [Footnote: This expression is only officially applicable when the initial and final matrices are full-rank; additional care must be taken when they are not. However, as $\underline{\underline{L}}$ and $\underline{\underline{L}}'$ share the same kernel, and $\underline{b}_e$ is perpendicular to it, the original formula remains valid (so long as the graph remains connected).]:
$$\Delta\underline{\underline{L}}^{+} \equiv \underline{\underline{L}}'^{+} - \underline{\underline{L}}^{+} = -\,\frac{\delta w_e}{1 + \delta w_e\, \underline{b}_e^{\!\top}\underline{\underline{L}}^{+}\underline{b}_e}\; \underline{\underline{L}}^{+}\underline{b}_e \underline{b}_e^{\!\top}\underline{\underline{L}}^{+}$$
Notice that this change can be expressed as a matrix that depends only on the choice of edge $e$, multiplied by a scalar term that depends (nonlinearly) on the perturbation to that edge:
$$\Delta\underline{\underline{L}}^{+} = -\left(\underline{\underline{L}}^{+}\underline{b}_e \underline{b}_e^{\!\top}\underline{\underline{L}}^{+}\right) \frac{\delta w_e}{1 + \delta w_e \Omega_e}, \qquad \Omega_e \equiv \underline{b}_e^{\!\top}\underline{\underline{L}}^{+}\underline{b}_e$$
where $\Omega_e$ is the effective resistance of edge $e$. Hence, if the probabilistic reweight of this edge is chosen such that $\big\langle \frac{\delta w_e}{1 + \delta w_e \Omega_e} \big\rangle = 0$, then we have $\langle \Delta\underline{\underline{L}}^{+} \rangle = \underline{\underline{0}}$, as desired. Importantly, this scalar term remains finite in the following relevant limits: deletion ($\delta w_e \to -w_e$), where it becomes $\frac{-w_e}{1 - w_e \Omega_e}$; and contraction ($\delta w_e \to +\infty$), where it becomes $\frac{1}{\Omega_e}$.
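The rank-one update of the pseudoinverse can be verified directly. The sketch below (ours; the test graph is illustrative, and stays connected in all three cases) compares the closed-form $\Delta\underline{\underline{L}}^{+}$ against a direct recomputation for a reweight, a deletion, and a near-contraction:

```python
import numpy as np

# Numerical check of:  Delta L^+ = -(dw / (1 + dw * Omega)) (L^+ b)(L^+ b)^T
def laplacian(edges, weights, n):
    L = np.zeros((n, n))
    for (i, j), w in zip(edges, weights):
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

edges, weights, n = [(0, 1), (1, 2), (2, 0), (2, 3)], [1.0] * 4, 4
L = laplacian(edges, weights, n)
Lp = np.linalg.pinv(L)

b = np.zeros(n); b[0], b[1] = 1.0, -1.0   # incidence vector of edge (0, 1)
Omega = b @ Lp @ b                         # its effective resistance (= 2/3)

for dw in [0.5, -1.0, 1e6]:               # reweight, deletion, near-contraction
    predicted = -(dw / (1 + dw * Omega)) * np.outer(Lp @ b, Lp @ b)
    direct = np.linalg.pinv(L + dw * np.outer(b, b)) - Lp
    assert np.allclose(predicted, direct, atol=1e-6)
```

Note that at $\delta w = -w_e = -1$ the scalar evaluates to $\frac{-1}{1 - 2/3} = -3$ (the deletion limit), and at large $\delta w$ it approaches $1/\Omega_e = 1.5$ (the contraction limit), matching the limits stated above.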
3.2 Minimizing the error
Minimizing the magnitude of the change requires a choice of matrix norm, which we take to be the sum of the squares of the entries (ie, the square of the Frobenius norm). Our motivation is twofold. First is the algebraically convenient fact that the Frobenius norm of a rank-one matrix has a simple form, viz $\|\underline{x}\,\underline{y}^{\!\top}\|_F = \|\underline{x}\|\,\|\underline{y}\|$ (so the norm of $\underline{\underline{L}}^{+}\underline{b}_e \underline{b}_e^{\!\top}\underline{\underline{L}}^{+}$ is simply $\|\underline{\underline{L}}^{+}\underline{b}_e\|^2$).
Second, the square of this norm behaves as a variance; to the extent that the $\Delta\underline{\underline{L}}^{+}$ associated to different edges can be approximated as (entrywise) uncorrelated, one can decompose multiple perturbations as follows:
$$\Big\langle \big\|\textstyle\sum_e \Delta\underline{\underline{L}}^{+}_e\big\|_F^2 \Big\rangle \approx \sum_e \Big\langle \big\|\Delta\underline{\underline{L}}^{+}_e\big\|_F^2 \Big\rangle$$
which greatly simplifies the analysis when multiple reductions are considered (see Section 4).
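The rank-one identity is easy to check numerically; a quick sketch (ours, with arbitrary random vectors):

```python
import numpy as np

# Check the identity  ||x y^T||_F = ||x|| * ||y||  for a rank-one matrix.
# Applied to Delta L^+, the relevant matrix is (L^+ b_e)(L^+ b_e)^T,
# whose Frobenius norm is therefore just ||L^+ b_e||^2.
rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
lhs = np.linalg.norm(np.outer(x, y), "fro")
rhs = np.linalg.norm(x) * np.linalg.norm(y)
assert np.isclose(lhs, rhs)
```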
3.3 Reducing edges and nodes
Depending on the application, one may desire to reduce the number of nodes (ie, coarsen), the number of edges (ie, sparsify), or both.
Consider the number of relevant items reduced during a particular iteration. When those items are nodes, a contraction reduces this number by one, while a deletion leaves it unchanged. When those items are edges, a deletion reduces it by one. However, a contraction can reduce it by more than one: if the contracted edge formed a triangle in the original graph, then the other two edges will become parallel in the reduced graph. With respect to the Laplacian, these parallel edges are equivalent to a single edge, with weight given by their sum. Thus, for a contraction, the number of edges reduced is $1 + t$, where $t$ is the number of triangles in which the contracted edge participates.
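The triangle count $t$ is just the number of common neighbors of the contracted edge's endpoints; a small sketch (ours, with an illustrative graph):

```python
# Triangles through an edge = common neighbors of its endpoints, so a
# contraction removes 1 + t edges.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
adj = {}
for i, j in edges:
    adj.setdefault(i, set()).add(j)
    adj.setdefault(j, set()).add(i)

t = len(adj[0] & adj[1])   # triangles through the contracted edge (0, 1)
assert t == 1
# Contracting (0, 1) removes 1 + t = 2 edges: the edge itself, plus the pair
# (0, 2) and (1, 2), which become parallel and merge (weights adding) in the
# reduced graph.
edges_removed = 1 + t
```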
3.4 A cost function for spectral graph reduction
Motivated by the previous discussions, we seek to minimize the following quantity:
$$\big\langle \|\Delta\underline{\underline{L}}^{+}\|_F^2 \big\rangle \;-\; \beta\, \big\langle \text{number of items reduced} \big\rangle$$
where $\beta$ is a parameter that controls the tradeoff between items reduced and error incurred in the $\underline{\underline{L}}^{+}$.
Let $p_d$ ($p_c$) be the probability of deleting (contracting) edge $e$. If the probabilistic change to this edge results in neither deletion nor contraction, it will be reweighted (with probability $p_r = 1 - p_d - p_c$). Hence, the constraint (9) requires that these reweights satisfy the unbiasedness condition $\big\langle \frac{\delta w_e}{1 + \delta w_e \Omega_e} \big\rangle = 0$.
Clearly, the space of allowed solutions lies within the simplex $p_d \ge 0$, $p_c \ge 0$, $p_d + p_c \le 1$. The additional constraint further restricts the attainable values of $p_d$ and $p_c$. Hence, we substitute (12) into (11), and minimize it over this domain (given the relevant parameters of edge $e$: its weight, its effective resistance, the norm $\|\underline{\underline{L}}^{+}\underline{b}_e\|$, and $\beta$).
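A worked sketch (ours; the weight, effective resistance, and action probabilities are illustrative) of satisfying the unbiasedness constraint for one edge, using the scalar $f(\delta w) = \frac{\delta w}{1 + \delta w\, \Omega}$ that multiplies the edge's fixed matrix in $\Delta\underline{\underline{L}}^{+}$:

```python
# Choose the reweight value so that the expectation of f vanishes,
# which makes <Delta L^+> = 0.
w, Omega = 1.0, 2.0 / 3.0             # illustrative weight and eff. resistance
f_del = -w / (1 - w * Omega)          # deletion limit:    dw -> -w
f_con = 1.0 / Omega                   # contraction limit: dw -> +inf
p_del, p_con = 0.2, 0.3               # illustrative action probabilities
p_rew = 1.0 - p_del - p_con

# Solve p_del*f_del + p_con*f_con + p_rew*f_rew = 0 for the reweight's f value,
# then invert f(dw) = dw / (1 + dw*Omega), ie dw = f / (1 - f*Omega).
f_rew = -(p_del * f_del + p_con * f_con) / p_rew
dw_rew = f_rew / (1 - f_rew * Omega)

expected_f = p_del * f_del + p_con * f_con + p_rew * f_rew
assert abs(expected_f) < 1e-12        # unbiased: <Delta L^+> = 0
assert w + dw_rew > 0                 # reweighted edge keeps a positive weight
```

Note how the feasible region is constrained: eg, a valid solution must keep the reweighted weight positive and the denominators nonzero, which bounds the admissible $(p_d, p_c)$ pairs.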
For a given edge, there are three regimes for the solution, depending on the value of $\beta$ (see Table 1 and Table 2). For $\beta$ below a lower threshold, the edge should not be perturbed. For $\beta$ above an upper threshold, the optimal probabilistic action leads to either deletion or contraction. In the intermediate regime, the edge is either reweighted or subjected to whichever of the two reductions (deletion or contraction) is favored for that edge.
3.5 Node weighted Laplacian
Often, when nodes are merged, one represents the connectivity by a matrix of smaller size. To properly compare its spectral properties with those of the original, one must keep track of the number of original nodes that make up these “supernodes” and assign them proportional weights. The appropriate Laplacian is then constructed from $\underline{\underline{W}}_{\!n}$ and $\underline{\underline{W}}_{\!e}$, the diagonal matrices of node weights [Footnote: We remark that the use of the random walk matrix is essentially using node degree as a surrogate for node weights.] (commonly referred to as the “mass matrix”) and edge weights, respectively, and $\underline{\underline{B}}$, the signed incidence matrix.
Moreover, when updating the $\underline{\underline{L}}^{+}$, one must be careful to choose the appropriate pseudoinverse for this Laplacian, which is defined in terms of the vector of node weights.
4 Proposed algorithms for graph reduction
Using this framework, we now describe our graph reduction algorithms.
Similar to many graph coarsening algorithms [22, 23], our scheme obtains the reduced graph by acting on the initial graph (as opposed to building it up by adding edges to the empty graph, as in most sparsification algorithms [24, 25]). We first present a simple algorithm that reduces the graph in a single step. We then outline a more general multi-step scheme.
4.1 A single-step algorithm
Algorithm 1 describes the procedure for a single-step reduction. It assumes an appropriate choice of: the fraction of edges to be considered for reduction; and $\beta$, the parameter controlling the error.
Care must be taken, as multiple deletions or contractions may result in undesirable behavior. For example, while any edge that is itself a cut-set will never be deleted, a collection of edges that together make a cut-set might all have finite deletion probability. Hence, if multiple edges are simultaneously deleted, the graph could become disconnected. Conversely, this algorithm could underestimate the change in the pseudoinverse associated with simultaneous contractions. For example, consider two important nodes both connected to the same unimportant node: contracting the unimportant node into either of the other two would be fine, but performing both contractions would merge the two important nodes.
We now present a more general multi-step scheme, and a conservative limit that eliminates these issues.
4.2 A multi-step scheme
Algorithm 2 describes our general multi-step scheme. Its inputs are: the original graph; the fraction of edges to be sampled each iteration; the minimum expected decrease in target items per perturbed edge; the fraction of sampled edges to be acted upon; and StopCriterion, a user-defined function.
With these inputs, we select $\beta$ implicitly. Let $\beta_e^{\min}$ be the minimum $\beta$ for which edge $e$ would be perturbed. Each iteration, we compute the $\beta_e^{\min}$ for all sampled edges, and choose a $\beta$ such that the desired fraction of them satisfy $\beta_e^{\min} \le \beta$. We then apply the corresponding probabilistic actions to this subset of sampled edges.
We note that these input fractions and thresholds could vary as a function of the iteration number, and an appropriate choice could lead to an improved tradeoff between accuracy and running time.
However, we set the action fraction such that only the sampled edge with the lowest $\beta_e^{\min}$ is acted upon (ie, a single reduction per iteration). While likely too conservative, this choice avoids the problems associated with simultaneous reductions. Additionally, we choose to compute $\underline{\underline{L}}^{+}$ at the onset and update it using the Woodbury matrix identity.
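The bookkeeping in this scheme can be sketched as follows (ours, not the paper's code; the graph and perturbation values are illustrative): compute $\underline{\underline{L}}^{+}$ once, then maintain it across successive single-edge perturbations with the rank-one (Woodbury) update instead of recomputing a pseudoinverse each iteration:

```python
import numpy as np

def laplacian(edges, weights, n):
    L = np.zeros((n, n))
    for (i, j), w in zip(edges, weights):
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

def incidence(i, j, n):
    b = np.zeros(n); b[i], b[j] = 1.0, -1.0
    return b

def woodbury_update(Lp, b, dw):
    # rank-one update of the pseudoinverse for a single edge perturbation dw;
    # valid as long as the graph remains connected
    Lpb = Lp @ b
    return Lp - (dw / (1 + dw * (b @ Lpb))) * np.outer(Lpb, Lpb)

edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]
n = 4
Lp = np.linalg.pinv(laplacian(edges, weights, n))   # computed once, at onset

# Two successive perturbations: delete edge (0, 1), then reweight (2, 3).
for (i, j), dw in [((0, 1), -1.0), ((2, 3), 0.25)]:
    Lp = woodbury_update(Lp, incidence(i, j, n), dw)

# The maintained L^+ matches a from-scratch recomputation on the final graph.
weights_final = [0.0, 1.0, 1.0, 1.25, 1.0]
Lp_direct = np.linalg.pinv(laplacian(edges, weights_final, n))
assert np.allclose(Lp, Lp_direct)
```

Each update costs $O(n^2)$ rather than the $O(n^3)$ of a fresh pseudoinverse, which is what makes maintaining $\underline{\underline{L}}^{+}$ across iterations practical.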
5 Empirical results
In this section, we validate our framework and compare it with existing algorithms. We start by considering two natural limits of our general framework, namely graph sparsification (removing regimes involving edge contraction), and graph coarsening (where the goal is to reduce the number of nodes).
5.1 Comparison with spectral graph sparsification
Figure 1 compares our algorithm (targeting reduction of edges and not considering contraction) with the spectral sparsification algorithm from . We consider the stochastic block model, as there is a clear separation of the eigenvectors associated with the global structure (ie, the communities) from the bulk of the spectrum. Note that these algorithms have different objectives (preserving the $\underline{\underline{L}}^{+}$ and the $\underline{\underline{L}}$, respectively), and both accomplish their desired goal.