A Unifying Framework for Spectrum-Preserving Graph Sparsification and Coarsening

02/26/2019 · by Gecia Bravo Hermsdorff, et al. · Princeton University

How might one "reduce" a graph? That is, generate a smaller graph that preserves the global structure at the expense of discarding local details? There has been extensive work on both graph sparsification (removing edges) and graph coarsening (merging nodes, often by edge contraction); however, these operations are currently treated separately. Interestingly, for a planar graph, edge deletion corresponds to edge contraction in its planar dual (and more generally, for a graphical matroid and its dual). Moreover, with respect to the dynamics induced by the graph Laplacian (e.g., diffusion), deletion and contraction are physical manifestations of two reciprocal limits: edge weights of 0 and ∞, respectively. In this work, we provide a unifying framework that captures both of these operations, allowing one to simultaneously coarsen and sparsify a graph, while preserving its large-scale structure. Using synthetic models and real-world datasets, we validate our algorithm and compare it with existing methods for graph coarsening and sparsification. While modern spectral schemes focus on the Laplacian (indeed, an important operator), our framework preserves its inverse, allowing one to quantitatively compare the effect of edge deletion with the (now finite) effect of edge contraction.


1 Introduction

Many of the most interesting structures and phenomena of our world are naturally described as graphs (eg,¹ brains, social networks, the internet, etc). Indeed, graph data are becoming increasingly relevant to the field of machine learning [2, 3, 4]. These graphs are frequently massive, easily surpassing our working memory, and often the computer's relevant cache [5]. It is therefore essential to obtain smaller approximate graphs to allow for more efficient computation and storage. Moreover, graph reduction aids in visualization and can provide structural insights.²

¹The authors agree with the sentiment of the footnote on page 15 in [1], viz, omitting superfluous full stops to obtain a more efficient (yet similarly lossless) compression of, eg: videlicet, exempli gratia, etc.

²For example, a common theme in biological systems is the presence of complicated pathways that produce a relatively simple result (eg, protein activation pathways). Semantic understanding comes from the reduction of these subsystems to their resulting behavior (eg, $A$ activates a chain that eventually inhibits $B$).

Graphs are defined by a set of nodes and a set of edges between them, and are often represented as an adjacency matrix³ $\underline{\underline{A}}$, whose size is set by the number of nodes and whose density is set by the number of edges. Reducing either of these quantities is advantageous; graph "coarsening" focuses on the former, aggregating nodes while respecting the overall structure, and graph "sparsification" on the latter, preferentially retaining the important edges.

³A quick notation note: underlines denote bundling of multiple scalars into a single object, where the number of underlines denotes the rank of the tensor (ie, a single underline for vectors, double for matrices). Subscripts have nothing to do with Einstein summation notation; they are contextually-defined adornments.

A variety of algorithms have been proposed to attain these goals, and a frequently recurring theme is to consider the graph Laplacian $\underline{\underline{L}} = \underline{\underline{D}} - \underline{\underline{A}}$, where $\underline{\underline{D}}$ is the diagonal matrix of node degrees. Indeed, it appears in a wide range of applications (enough to motivate the coining of the term "Laplacian paradigm" [6]). For example: its spectral properties can be leveraged for community detection [7]; it can be used to efficiently solve min cut/max flow problems [8]; and for undirected, positively weighted graphs (the focus of our paper), it induces a natural quadratic form $\underline{x}^\top \underline{\underline{L}}\,\underline{x}$ (which can be used, eg, to smoothly interpolate functions over the nodes [9]).

Work on spectral graph sparsification focuses on the preservation of the Laplacian quadratic form $\underline{x}^\top \underline{\underline{L}}\,\underline{x}$, a popular measure of spectral similarity suggested by Spielman and collaborators [10]. A key result is that any graph may be sparsified by subsampling its edges, with optimal probability proportional to their effective resistance [11] (a measure of edge importance defined via the Laplacian pseudoinverse). Work on graph coarsening, however, has not reached a similar consensus. While it has recently been proposed to measure similarity between the original and coarsened graph using an analogous restricted quadratic form [12], the objective functions used for coarsening do not explicitly optimize this quantity. For example, Jin & Jaja [13] use the lowest eigenvectors of the Laplacian (those corresponding to global structure) as feature vectors to perform $k$-means clustering of the nodes, while Purohit et al [14] aim to minimize the change to the largest eigenvalue of the adjacency matrix.

While recent work has combined coarsening and sparsification [15], the two operations were performed by separate algorithmic primitives, essentially analyzing the composition of the above algorithms. In this work, we capture both operations with a single objective function that naturally promotes the preservation of large-scale structure. Our primary contributions include:

  1. An identification of infinite edge weight with edge contraction, highlighting the finite limit of the Laplacian pseudoinverse $\underline{\underline{L}}^+$ (Section 2);

  2. A graph reduction framework that unifies graph coarsening and sparsification by explicitly preserving $\underline{\underline{L}}^+$ (Sections 3 and 4);

  3. A measure of edge importance of independent interest, which could be used to analyze connections in real-world networks (Section 3);

  4. A more sensitive measure of spectral similarity of graphs (Section 5); and

  5. A demonstration that our framework preferentially preserves the large-scale structure (Section 5).

2 Why the Laplacian pseudoinverse?

Many computations over graphs involve solving $\underline{\underline{L}}\,\underline{x} = \underline{b}$ [6]. Thus, the (algebraically) relevant object is arguably the Laplacian pseudoinverse $\underline{\underline{L}}^+$, and in fact it has been used to derive useful measures of graph similarity [16]. Moreover, its largest eigenvalues are associated to global structure; thus, preserving the action of $\underline{\underline{L}}^+$ will preferentially maintain the overall shape of the graph. We now describe why $\underline{\underline{L}}^+$ is a natural operator to consider for both graph coarsening and sparsification.

Attention is often restricted to undirected, positively weighted graphs [17]. These graphs have many convenient properties, eg, their Laplacians are positive semi-definite ($\underline{\underline{L}} \succeq 0$) and have a well-understood kernel and cokernel (both spanned by the constant vector $\underline{1}$). The edge weights are defined as a mapping $w: E \to \mathbb{R}_{>0}$. When the weights represent connection strength, it is generally understood that $w_e \to 0$ is equivalent to removing edge $e$. However, the closure of the positive reals has a reciprocal limit, namely $w_e \to +\infty$.

This limit is rarely considered, as it can lead to divergences in many classical notions of graph similarity. A relevant example is the standard notion of spectral approximation, defined as preserving the Laplacian quadratic form $\underline{x}^\top \underline{\underline{L}}\,\underline{x}$ to within a factor of $1 \pm \epsilon$ for all vectors $\underline{x}$. Clearly, the $w_e \to \infty$ limit yields a graph that does not approximate the original for any $\epsilon$: any $\underline{x}$ with different values for the two nodes joined by edge $e$ now yields an infinite quadratic form. This suggests considering only vectors that have the same value for these two nodes, essentially contracting them into a single "supernode".

Algebraically, this interpretation is reflected in $\underline{\underline{L}}^+$, which remains finite in this limit: the pair of rows (and columns) corresponding to the contracted nodes become identical.

Physically, consider the behavior of the heat equation $\partial_t \underline{x} = -\underline{\underline{L}}\,\underline{x}$: as $w_e \to \infty$, the values on the two nodes joined by edge $e$ immediately equilibrate between themselves, and remain tethered for the rest of the evolution.⁴

⁴In the spirit of another common analogy (edge weights as conductances of a network of resistors), breaking a resistor is equivalent to deleting that edge, while contraction amounts to completely soldering over it.
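To make this limit concrete, the following minimal numerical sketch (ours, not from the paper; it assumes only numpy) builds the Laplacian of a small path graph and checks that its pseudoinverse remains finite as one edge weight grows, with the corresponding rows converging to each other:

```python
# Sketch: L^+ stays finite as an edge weight w -> infinity, and the rows
# (and columns) of the two incident nodes become identical (contraction).
import numpy as np

def laplacian(n, weighted_edges):
    """Graph Laplacian from a list of (u, v, weight) tuples."""
    L = np.zeros((n, n))
    for u, v, w in weighted_edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L

# Path graph 0-1-2-3 with unit weights; send the weight of edge (1, 2) -> infinity.
for w in [1.0, 1e2, 1e4, 1e6]:
    Lp = np.linalg.pinv(laplacian(4, [(0, 1, 1.0), (1, 2, w), (2, 3, 1.0)]))
    print(f"w = {w:9.1e}   ||row_1 - row_2|| = {np.linalg.norm(Lp[1] - Lp[2]):.2e}")
```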

Geometrically, the reciprocal limits $w_e \to 0$ and $w_e \to \infty$ have dual interpretations: consider a planar graph and its planar dual; edge deletion in one graph corresponds to contraction in the other, and vice versa. This naturally extends to nonplanar graphs via their graphical matroid and its dual [18].

3 Our graph reduction framework

We provide a framework to construct probabilistic algorithms that generate a reduced graph $G'$ from an initial graph $G$, motivated by the following desirable properties:

  1. reduce the number of edges and nodes,

  2. preserve $\underline{\underline{L}}^+$ in expectation,

  3. minimize the change in $\underline{\underline{L}}^+$.

In this section, we state these goals more formally, starting with a single iteration in its simplest form (ie, applied to a single, predetermined edge $e$).

3.1 Preserving the Laplacian pseudoinverse

Consider perturbing the weight of a single edge $e$ by $\delta w_e$. The change in the Laplacian is simply

$$\Delta\underline{\underline{L}} \equiv \underline{\underline{L}}' - \underline{\underline{L}} = \delta w_e\, \underline{b}_e \underline{b}_e^\top,$$

where $\underline{\underline{L}}$ and $\underline{\underline{L}}'$ are the original and perturbed Laplacians, respectively, and $\underline{b}_e$ is the signed incidence (column) vector associated to edge $e$, with entries

$$(\underline{b}_e)_n = \begin{cases} +1 & n = \text{head}(e) \\ -1 & n = \text{tail}(e) \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$

The change in $\underline{\underline{L}}^+$ is given by the Woodbury matrix identity⁵ [20]:

$$\Delta\underline{\underline{L}}^+ \equiv \underline{\underline{L}}'^+ - \underline{\underline{L}}^+ = -\frac{\delta w_e}{1 + \delta w_e\, \underline{b}_e^\top \underline{\underline{L}}^+ \underline{b}_e}\; \underline{\underline{L}}^+ \underline{b}_e \underline{b}_e^\top \underline{\underline{L}}^+.$$

Notice that this change can be expressed as a matrix $\underline{\underline{M}}_e$ that depends only on the choice of edge $e$, multiplied by a scalar term $f_e$ that depends (nonlinearly) on the perturbation to that edge:

$$\Delta\underline{\underline{L}}^+ = -f_e(\delta w_e)\, \underline{\underline{M}}_e,$$

where

$$\underline{\underline{M}}_e \equiv \frac{(\underline{\underline{L}}^+\underline{b}_e)(\underline{\underline{L}}^+\underline{b}_e)^\top}{\Omega_e} \qquad (2)$$

$$f_e(\delta w_e) \equiv \frac{\delta w_e\, \Omega_e}{1 + \delta w_e\, \Omega_e} \qquad (3)$$

$$\Omega_e \equiv \underline{b}_e^\top \underline{\underline{L}}^+ \underline{b}_e \quad \text{(the effective resistance of edge } e\text{)} \qquad (4)$$

⁵This expression is only officially applicable when the initial and final matrices are full-rank; additional care must be taken when they are not. However, as $\underline{\underline{L}}$ and $\underline{\underline{L}}'$ share the same kernel, and $\underline{b}_e$ is perpendicular to it, the original formula remains valid [19] (so long as the graph remains connected).

Hence, if the probabilistic reweight of this edge is chosen such that $\langle f_e \rangle = 0$, then we have $\langle \Delta\underline{\underline{L}}^+ \rangle = \underline{\underline{0}}$, as desired. Importantly, $f_e$ remains finite in the following relevant limits:

$$\lim_{\delta w_e \to -w_e} f_e = \frac{-w_e \Omega_e}{1 - w_e \Omega_e} \;\;\text{(deletion)}, \qquad \lim_{\delta w_e \to +\infty} f_e = 1 \;\;\text{(contraction)}. \qquad (5)$$
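The following short sketch (ours; symbols follow the reconstructed equations (1)-(5) above, and it assumes numpy) verifies the rank-one update numerically on a small random graph, and evaluates the two limits of $f_e$:

```python
# Check: Delta L^+ = -f(dw) M_e, with f finite in the deletion and contraction limits.
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Complete graph with random positive weights (stays connected under perturbation).
L = np.zeros((n, n))
weights = {(i, j): rng.uniform(0.5, 2.0) for i in range(n) for j in range(i + 1, n)}
for (i, j), w in weights.items():
    L[i, i] += w; L[j, j] += w; L[i, j] -= w; L[j, i] -= w
Lp = np.linalg.pinv(L)

b = np.zeros(n); b[0], b[1] = +1.0, -1.0   # signed incidence vector of e = (0, 1), eq (1)
Omega = b @ Lp @ b                          # effective resistance, eq (4)
M = np.outer(Lp @ b, Lp @ b) / Omega        # edge matrix, eq (2)

dw = 0.3
f = dw * Omega / (1.0 + dw * Omega)         # scalar factor, eq (3)
direct = np.linalg.pinv(L + dw * np.outer(b, b)) - Lp
assert np.allclose(direct, -f * M)          # matches the Woodbury update

w = weights[(0, 1)]
print("deletion limit:    f =", -w * Omega / (1.0 - w * Omega))  # eq (5)
print("contraction limit: f = 1")
```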

3.2 Minimizing the error

Minimizing the magnitude of the change $\Delta\underline{\underline{L}}^+$ requires a choice of matrix norm, which we take to be the sum of the squares of the entries (ie, the square of the Frobenius norm). Our motivation is twofold. First is the algebraically convenient fact that the Frobenius norm of a rank-one matrix has a simple form, viz

$$\|\underline{\underline{M}}_e\|_F^2 = \left( \frac{\underline{b}_e^\top (\underline{\underline{L}}^+)^2\, \underline{b}_e}{\Omega_e} \right)^{\!2}. \qquad (6)$$

Second, the square of this norm behaves as a variance; to the extent that the $\Delta\underline{\underline{L}}^+_e$ associated to different edges can be approximated as (entrywise) uncorrelated, one can decompose multiple perturbations as follows:

$$\Big\langle \big\| \textstyle\sum_e \Delta\underline{\underline{L}}^+_e \big\|_F^2 \Big\rangle \approx \sum_e \langle f_e^2 \rangle\, \|\underline{\underline{M}}_e\|_F^2, \qquad (7)$$

which greatly simplifies the analysis when multiple reductions are considered (see Section 4).
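As a quick sanity check of equation (6) (ours, under the same numpy setup and reconstructed notation as above):

```python
# The squared Frobenius norm of the rank-one M_e reduces to a scalar expression.
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = np.triu(rng.uniform(0.5, 2.0, (n, n)), 1); A = A + A.T   # random weighted graph
Lp = np.linalg.pinv(np.diag(A.sum(axis=1)) - A)

b = np.zeros(n); b[0], b[1] = +1.0, -1.0
Omega = b @ Lp @ b
M = np.outer(Lp @ b, Lp @ b) / Omega

assert np.isclose(np.sum(M**2), (b @ Lp @ Lp @ b / Omega) ** 2)   # eq (6)
```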

3.3 Reducing edges and nodes

Depending on the application, one may desire to reduce the number of nodes (ie, coarsen), the number of edges (ie, sparsify), or both.

Let $r$ be the number of relevant items reduced during a particular iteration. When those items are nodes, then $r = 1$ for a contraction, and $r = 0$ for a deletion. When those items are edges, then $r = 1$ for a deletion. However, for a contraction, $r > 1$ is possible: if the contracted edge formed a triangle in the original graph, then the other two edges will become parallel in the reduced graph. With respect to the Laplacian, this is equivalent to a single edge, with weight given by the sum of these parallel edges. Thus, for a contraction, $r = 1 + t_e$, where $t_e$ is the number of triangles in which the contracted edge participates.
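A minimal sketch of this bookkeeping (ours; a plain adjacency-set representation is assumed for illustration):

```python
# Contracting edge (u, v) reduces the edge count by r = 1 + t_e, where t_e is
# the number of triangles through the edge, ie common neighbors of u and v.
neighbors = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}

def edges_reduced_by_contraction(neighbors, u, v):
    t_e = len(neighbors[u] & neighbors[v])   # triangles containing edge (u, v)
    return 1 + t_e

# Edge (1, 2) lies in two triangles (with nodes 0 and 3), so r = 3.
print(edges_reduced_by_contraction(neighbors, 1, 2))   # -> 3
```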

3.4 A cost function for spectral graph reduction

Motivated by the previous discussions, we seek to minimize the following quantity:

$$\langle \text{cost} \rangle \equiv \big\langle \|\Delta\underline{\underline{L}}^+\|_F^2 \big\rangle - \beta\, \langle r \rangle \qquad (8)$$

subject to

$$\langle \Delta\underline{\underline{L}}^+ \rangle = \underline{\underline{0}}, \qquad (9)$$

where $\beta$ is a parameter that controls the tradeoff between items reduced and error incurred in the $\underline{\underline{L}}^+$.

Let $p_\mathrm{d}$ ($p_\mathrm{c}$) be the probability of deleting (contracting) edge $e$. If the probabilistic change to this edge results in neither deletion nor contraction, it will be reweighted by some $\delta w_\mathrm{r}$ (with probability $p_\mathrm{r} = 1 - p_\mathrm{d} - p_\mathrm{c}$). Hence, the constraint (9) requires that these reweights satisfy

$$-p_\mathrm{d}\, \frac{w_e \Omega_e}{1 - w_e \Omega_e} + p_\mathrm{c} + p_\mathrm{r}\, f_e(\delta w_\mathrm{r}) = 0, \qquad (10)$$

where we have used the limits in (5). Likewise, the cost function (8) for acting on edge $e$ becomes:

$$\text{cost}_e = \Big( p_\mathrm{d}\, f_e(-w_e)^2 + p_\mathrm{c} + p_\mathrm{r}\, f_e(\delta w_\mathrm{r})^2 \Big)\, \|\underline{\underline{M}}_e\|_F^2 - \beta\, \big( p_\mathrm{d}\, r_\mathrm{d} + p_\mathrm{c}\, r_\mathrm{c} \big), \qquad (11)$$

where $r_\mathrm{d}$ and $r_\mathrm{c}$ are the number of items reduced by a deletion and a contraction, respectively (see Section 3.3).

For fixed $p_\mathrm{d}$ and $p_\mathrm{c}$, $\langle f_e(\delta w_\mathrm{r}) \rangle$ is fixed by equation (10), and the inequality $\langle f_e^2 \rangle \ge \langle f_e \rangle^2$ becomes an equality under minimization of (11); ie, the optimal reweight is deterministic. Thus, if an edge is to be reweighted, it will be changed by the unique $\delta w_\mathrm{r}$ satisfying

$$f_e(\delta w_\mathrm{r}) = \frac{1}{p_\mathrm{r}} \left( p_\mathrm{d}\, \frac{w_e \Omega_e}{1 - w_e \Omega_e} - p_\mathrm{c} \right). \qquad (12)$$

Clearly, the space of allowed solutions lies within the simplex $p_\mathrm{d} \ge 0$, $p_\mathrm{c} \ge 0$, $p_\mathrm{d} + p_\mathrm{c} \le 1$. The additional constraint that $f_e(\delta w_\mathrm{r})$ be attainable by a finite reweight (ie, that it lie between the two limits in (5)) further implies that $p_\mathrm{d} \le 1 - w_e\Omega_e$ and $p_\mathrm{c} \le w_e\Omega_e$. Hence, we substitute (12) into (11), and minimize it over this domain (given the parameters $w_e$, $\Omega_e$, $\|\underline{\underline{M}}_e\|_F^2$, and $\beta$).
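The following sketch (ours, following the reconstructed equations (10) and (12); the feasibility bounds are as derived above) computes the deterministic reweight implied by a choice of $p_\mathrm{d}$ and $p_\mathrm{c}$:

```python
# Solve the zero-mean constraint for the reweight dw_r.
def reweight(w, Omega, p_d, p_c):
    """Given deletion/contraction probabilities, return the implied dw_r."""
    p_r = 1.0 - p_d - p_c
    # Required value of f so that p_d*f(-w) + p_c*1 + p_r*f(dw_r) = 0, eqs (10)/(12).
    f_r = (p_d * w * Omega / (1.0 - w * Omega) - p_c) / p_r
    # Invert f(dw) = dw*Omega / (1 + dw*Omega), eq (3).
    return f_r / (Omega * (1.0 - f_r))

# Example: w*Omega = 0.4, so feasibility requires p_c <= 0.4 and p_d <= 0.6.
print(reweight(w=1.0, Omega=0.4, p_d=0.2, p_c=0.1))   # -> 0.125
```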

For a given edge $e$, there are three regimes for the solution, depending on the value of $\beta$ (see Table 1 and Table 2). For sufficiently small $\beta$, the edge should not be perturbed. For sufficiently large $\beta$, the optimal probabilistic action leads to either deletion or contraction. In the intermediate regime, if $w_e\Omega_e < \tfrac{1}{2}$, the edge is either deleted or reweighted, and if $w_e\Omega_e > \tfrac{1}{2}$, the edge is either contracted or reweighted.

Table 1: Actions that minimize the expected cost. For a given edge $e$, there are three regimes for the solution, depending on the value of $\beta$. For the case when the target items to reduce are the nodes, set $t_e = 0$.

Table 2: Values of $\beta$ dividing the three regimes. Note that when the target items to reduce are the edges, the number of triangles $t_e$ enters into the expressions, and when they are the nodes, there is no deletion in the intermediate regime. However, regardless of the target items, both deletion and contraction may have finite probability (eg, if $w_e\Omega_e = \tfrac{1}{2}$).

3.5 Node-weighted Laplacian

Often, when nodes are merged, one represents the connectivity by a matrix of smaller size. To properly compare its spectral properties with those of the original, one must keep track of the number of original nodes that make up these "supernodes" and assign them proportional weights. The appropriate Laplacian is then $\underline{\underline{L}} = \underline{\underline{N}}^{-1} \underline{\underline{B}}^\top \underline{\underline{W}}\, \underline{\underline{B}}$, where $\underline{\underline{N}}$ and $\underline{\underline{W}}$ are the diagonal matrices of node weights⁶ (commonly referred to as the "mass matrix" [21]) and edge weights, respectively, and $\underline{\underline{B}}$ is the signed incidence matrix.

⁶We remark that the use of the random walk matrix is essentially using node degree as a surrogate for node weights.

Moreover, when updating the $\underline{\underline{L}}^+$, one must be careful to choose the appropriate pseudoinverse for this Laplacian, which is given by

$$\underline{\underline{L}}^+ = \left( \underline{\underline{L}} + \frac{\underline{1}\,\underline{n}^\top}{\underline{n}^\top \underline{1}} \right)^{-1} - \;\frac{\underline{1}\,\underline{n}^\top}{\underline{n}^\top \underline{1}}, \qquad (13)$$

$$\underline{\underline{L}}\,\underline{\underline{L}}^+ = \underline{\underline{L}}^+ \underline{\underline{L}} = \underline{\underline{I}} - \frac{\underline{1}\,\underline{n}^\top}{\underline{n}^\top \underline{1}}, \qquad (14)$$

where $\underline{n}$ is the vector of node weights.
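A numerical check (ours) of this reconstruction of equations (13) and (14), using the projector $\underline{1}\,\underline{n}^\top/(\underline{n}^\top\underline{1})$ onto the kernel:

```python
# Verify: L^+ = (L + P)^{-1} - P and L L^+ = L^+ L = I - P for L = N^{-1} B^T W B.
import numpy as np

rng = np.random.default_rng(2)
k = 5
A = np.triu(rng.uniform(0.5, 2.0, (k, k)), 1); A = A + A.T
S = np.diag(A.sum(axis=1)) - A            # ordinary (symmetric) Laplacian B^T W B

n = rng.uniform(1.0, 3.0, k)              # node weights
L = np.diag(1.0 / n) @ S                  # node-weighted Laplacian
P = np.outer(np.ones(k), n) / n.sum()     # projector 1 n^T / (n^T 1)

Lp = np.linalg.inv(L + P) - P             # eq (13)
assert np.allclose(L @ Lp, np.eye(k) - P) # eq (14)
assert np.allclose(Lp @ L, np.eye(k) - P)
```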

4 Proposed algorithms for graph reduction

Using this framework, we now describe our graph reduction algorithms.

Similar to many graph coarsening algorithms [22, 23], our scheme obtains the reduced graph by acting on the initial graph (as opposed to building it up by adding edges to the empty graph, as in most sparsification algorithms [24, 25]). We first present a simple algorithm that reduces the graph in a single step. We then outline a more general multi-step scheme.

4.1 A single-step algorithm

Algorithm 1 describes the procedure for a single-step reduction. It assumes an appropriate choice of: $q$, the fraction of edges to be considered for reduction; and $\beta$, the parameter controlling the error.

1:  Inputs: graph $G$, sample fraction $q$, parameter $\beta$
2:  Initialize $\underline{\underline{L}}^+$
3:  Sample $q|E|$ edges uniformly without replacement
4:  for (edge $e$) in sampled edges do
5:     Compute $\Omega_e$, $\|\underline{\underline{M}}_e\|_F^2$ (see equations (4) and (6))
6:     Calculate $p_\mathrm{d}$, $p_\mathrm{c}$, $\delta w_\mathrm{r}$ (see Tables 1 and 2)
7:     Probabilistically choose reweight, delete, or contract
8:  end for
9:  Perform reweights and deletions to $G$
10:  Perform contractions to $G$
11:  return reduced graph
Algorithm 1 ReduceGraph
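For concreteness, a minimal sketch (ours, not the paper's implementation) of the contraction step in line 10, operating directly on a dense Laplacian; parallel edges created by the contraction are automatically summed into a single weight:

```python
# Contract edge (u, v): sum the rows/columns of v into u, then drop v.
import numpy as np

def contract(L, u, v):
    """Laplacian of the graph with nodes u and v merged."""
    L = L.copy()
    L[u, :] += L[v, :]
    L[:, u] += L[:, v]
    keep = [i for i in range(L.shape[0]) if i != v]
    return L[np.ix_(keep, keep)]

# Unit-weight triangle 0-1-2: contracting (1, 2) leaves one doubled edge to node 0.
L = np.array([[ 2., -1., -1.],
              [-1.,  2., -1.],
              [-1., -1.,  2.]])
print(contract(L, 1, 2))   # [[ 2. -2.] [-2.  2.]]
```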

Care must be taken, as multiple deletions or contractions may result in undesirable behavior. For example, while any edge that is itself a cut-set will never be deleted, a collection of edges that together form a cut-set might each have finite deletion probability. Hence, if multiple edges are simultaneously deleted, the graph could become disconnected. Conversely, this algorithm could underestimate the change in the pseudoinverse associated with simultaneous contractions. For example, consider two important nodes both connected to the same unimportant node: contracting the unimportant node into either of the other two would be fine, but performing both contractions would merge the two important nodes.

We now present a more general multi-step scheme, and a conservative limit that eliminates these issues.

4.2 A multi-step scheme

Algorithm 2 describes our general multi-step scheme. Its inputs are: $G$, the original graph; $q$, the fraction of edges to be sampled each iteration; $r_{\min}$, the minimum expected decrease in target items per perturbed edge; $\phi$, the fraction of sampled edges to be acted upon; and StopCriterion, a user-defined function.

With these inputs, we select $\beta$ implicitly. Let $\beta_e$ be the minimum $\beta$ such that $\langle r_e \rangle \ge r_{\min}$ for edge $e$. Each iteration, we compute the $\beta_e$ for all sampled edges, and choose $\beta$ such that a fraction $\phi$ of them have $\beta_e \le \beta$. We then apply the corresponding probabilistic actions to this subset of sampled edges.

1:  Inputs: graph $G$, sample fraction $q$, minimum $\langle r \rangle$ per edge perturbed $r_{\min}$, fraction of sampled edges to perturb $\phi$, some StopCriterion
2:  Initialize $\underline{\underline{L}}^+$, stop $\leftarrow$ False
3:  while not (stop) do
4:     Sample $q|E|$ edges uniformly without replacement
5:     for (edge $e$) in sampled edges do
6:        Compute $\Omega_e$, $\|\underline{\underline{M}}_e\|_F^2$ (see equations (4) and (6))
7:        Evaluate $\beta_e$ (see Tables 1 and 2)
8:     end for
9:     Choose $\beta$ according to $\phi$
10:     Probabilistically choose reweight, delete, or contract for each edge with $\beta_e \le \beta$
11:     Perform reweights and deletions to $G$
12:     Perform contractions to $G$
13:     Update $\underline{\underline{L}}^+$
14:     stop $\leftarrow$ StopCriterion($G$)
15:  end while
16:  return reduced graph
Algorithm 2 ReduceGraphMulti

We note that $q$, $r_{\min}$, and $\phi$ could vary as a function of the iteration number, and an appropriate choice could lead to an improved tradeoff between accuracy and running time.

However, we set $\phi$ such that only the sampled edge with the lowest $\beta_e$ is acted upon (ie, a single perturbation per iteration). While likely too conservative, this choice avoids the problems associated with simultaneous reductions. Additionally, we choose to compute $\underline{\underline{L}}^+$ at the onset and update it using the Woodbury matrix identity.
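A sketch of this maintenance step (ours; one rank-one Woodbury refresh per accepted reweight, assuming numpy):

```python
# O(n^2) update of L^+ after L <- L + dw * b b^T, instead of a fresh pseudoinverse.
import numpy as np

def woodbury_update(Lp, b, dw):
    v = Lp @ b
    return Lp - (dw / (1.0 + dw * (b @ v))) * np.outer(v, v)

# Usage check on a small random Laplacian.
rng = np.random.default_rng(3)
k = 5
A = np.triu(rng.uniform(0.5, 2.0, (k, k)), 1); A = A + A.T
L = np.diag(A.sum(axis=1)) - A
Lp = np.linalg.pinv(L)

b = np.zeros(k); b[0], b[2] = +1.0, -1.0
assert np.allclose(woodbury_update(Lp, b, 0.5),
                   np.linalg.pinv(L + 0.5 * np.outer(b, b)))
```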

5 Empirical results

In this section, we validate our framework and compare it with existing algorithms. We start by considering two natural limits of our general framework, namely graph sparsification (removing regimes involving edge contraction), and graph coarsening (where the goal is to reduce the number of nodes).

5.1 Comparison with spectral graph sparsification

Figure 1 compares our algorithm (targeting reduction of edges and not considering contraction) with the spectral sparsification algorithm from [11]. We consider the stochastic block model, as there is a clear separation between the eigenvectors associated to the global structure (ie, the communities) and the bulk of the spectrum. Note that these algorithms have different objectives (preserving $\underline{\underline{L}}^+$ and $\underline{\underline{L}}$, respectively), and both accomplish their desired goal.

Figure 1: Validation of our graph reduction algorithm and comparison with spectral sparsification. We apply our algorithm without contraction (in red) and the spectral sparsification algorithm by Spielman et al [11] (in blue) to a single instance of the symmetric stochastic block model ($n$ nodes, $k$ communities, and intra- and inter-community connection probabilities $p$ and $q$, respectively) several times. We combine the nontrivial eigenvectors associated with the communities ("global structure"), centering the shading at their collective mean, with the range showing the change from their individual initial values. We combine the remaining eigenvectors in a similar way ("local details"). Upper left shows the quadratic form using the Laplacian pseudoinverse, and upper right the standard Laplacian quadratic form. Note that the upward bias of the reciprocal metric is expected for both algorithms (as $\langle 1/X \rangle \ge 1/\langle X \rangle$ for any positive random variable $X$). A more discriminating measure is shown in the bottom plots, and demonstrates that our algorithm preferentially preserves the global structure (bottom left).

5.2 Comparison with spectral graph coarsening

Figure 2 compares our algorithm (targeting reduction of nodes) with a proposed method for spectral graph coarsening [15, 13], which performs $k$-means clustering on the nodes, using as features the lowest (nonzero) eigenvectors of the Laplacian. Their reduced Laplacian is given by