Spectral Subspace Sparsification

10/07/2018 ∙ by Huan Li, et al. ∙ UC Berkeley ∙ Fudan University

We introduce a new approach to spectral sparsification that approximates the quadratic form of the pseudoinverse of a graph Laplacian restricted to a subspace. We show that sparsifiers with a near-linear number of edges in the dimension of the subspace exist. Our setting generalizes that of Schur complement sparsifiers. Our approach produces sparsifiers by sampling a uniformly random spanning tree of the input graph and using that tree to guide an edge elimination procedure that contracts, deletes, and reweights edges. In the context of Schur complement sparsifiers, our approach has two benefits over prior work. First, it produces a sparsifier in almost-linear time with no runtime dependence on the desired error. We directly exploit this to compute approximate effective resistances for a small set of vertex pairs faster than prior work (Durfee-Kyng-Peebles-Rao-Sachdeva '17). Second, it yields sparsifiers that are reweighted minors of the input graph. As a result, we give a near-optimal answer to a variant of the Steiner point removal problem. A key ingredient of our algorithm is a subroutine of independent interest: a near-linear time algorithm that, given a chosen set of vertices, builds a data structure from which we can query a multiplicative approximation to the decrease in the effective resistance between two vertices after identifying all vertices in the chosen set to a single vertex, with inverse-polynomial additive error, in near-constant time.


1 Introduction

Graph sparsification has had a number of applications throughout algorithms and theoretical computer science. In this work, we loosen the requirements of spectral sparsification and show that this loosening enables us to obtain sparsifiers with fewer edges. Specifically, instead of requiring that the Laplacian pseudoinverse quadratic form is approximated for every vector, we just require that the sparsifier approximates the Laplacian pseudoinverse quadratic form on a subspace:

Definition 1.1 (Spectral subspace sparsifiers).

Consider a weighted graph $G$, a vector space $S \subseteq \mathbb{R}^{V(G)}$ that is orthogonal to $\mathbf{1}_{V(G)}$, and $\epsilon \in (0,1)$. For a minor $H$ of $G$ with contraction map $\phi : V(G) \to V(H)$, let $P_\phi \in \mathbb{R}^{V(H) \times V(G)}$ be the matrix with $P_\phi e_v = e_{\phi(v)}$ for all $v \in V(G)$. A reweighted minor $H$ of $G$ is called an $\epsilon$-spectral subspace sparsifier for $(G, S)$ if for all vectors $x \in S$,

$(1 - \epsilon)\, x^T L_G^+ x \le (P_\phi x)^T L_H^+ (P_\phi x) \le (1 + \epsilon)\, x^T L_G^+ x,$

where $L_G^+$ and $L_H^+$ denote the pseudoinverses of the Laplacians of $G$ and $H$.
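To make the definition concrete, here is a minimal numerical sketch (our own illustration, not from the paper) that evaluates both quadratic forms for a single minor: $G$ is a unit-weight 4-cycle, $H$ is the minor obtained by contracting one edge, and $x$ is a single demand vector; the printed ratio is the multiplicative error that an $\epsilon$-spectral subspace sparsifier would be required to keep below $\epsilon$.

import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

# G: a unit-weight 4-cycle; H: the minor of G contracting edge {3, 0},
# with contraction map phi = {0: 0, 1: 1, 2: 2, 3: 0}.
LG = laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)])
LH = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)])
P = np.zeros((3, 4))                       # P e_v = e_phi(v)
for v, pv in enumerate([0, 1, 2, 0]):
    P[pv, v] = 1.0
x = np.array([1.0, 0.0, -1.0, 0.0])        # demand vector, orthogonal to 1
qG = x @ np.linalg.pinv(LG) @ x            # x^T L_G^+ x
qH = (P @ x) @ np.linalg.pinv(LH) @ (P @ x)
print(qG, qH, abs(qH / qG - 1.0))          # 1.0, 0.666..., 0.333...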

[KMST10] also considers a specific form of subspace sparsification related to controlling the smallest eigenvalues of a spectral sparsifier for $G$. When $S$ is the $(|T| - 1)$-dimensional subspace of vectors that are supported on a set of vertices $T \subseteq V(G)$ and orthogonal to $\mathbf{1}$, an $\epsilon$-spectral subspace sparsifier is a sparsifier for the Schur complement of $G$ restricted to the set of vertices $T$. Schur complement sparsifiers are implicitly constructed in [KS16] and [KLP16] by an approximate Gaussian elimination procedure and have been used throughout spectral graph theory. For example, they are used in algorithms for random spanning tree generation [DKP17, DPPR17], approximate maximum flow [MP13], and effective resistance computation [GHP18, GHP17, DKP17].

Unlike the existing construction of Schur complement sparsifiers [DKP17], our algorithm (a) produces a sparsifier with vertices outside of $T$ and (b) produces a sparsifier that is a minor of the input graph. While (a) is a disadvantage of our approach, it is not a problem in applications, in which the number of edges in the sparsifier is the most relevant feature for performance, as illustrated by our almost-optimal algorithm for $(1+\epsilon)$-approximate effective resistance computation. (b) is an additional benefit of our construction and connects to the well-studied class of Steiner point removal problems [CGH16, EGK14].

In the Approximate Terminal Distance Preservation problem [CGH16], one is given a graph $G$ and a set of $k$ terminal vertices $T$. One is asked to find a reweighted minor $H$ of $G$ with size $\mathrm{poly}(k)$ for which

$d_G(u, v) \le d_H(u, v) \le \alpha \cdot d_G(u, v)$

for all $u, v \in T$ and some small distortion $\alpha \ge 1$. The fact that $H$ is a minor of $G$ is particularly useful in the context of planar graphs. One can equivalently phrase this problem as the problem of finding a minor in which the $\ell_1$-norm of the $\ell_1$-minimizing flow between any two terminals is within an $\alpha$-factor of the norm of the $\ell_1$-minimizing flow in $G$. The analogous problem for $\ell_\infty$ norms is the problem of constructing a flow sparsifier (for more general demands as well). Despite much work on flow sparsifiers [Moi09, LM10, CLLM10, MM10, EGK14, Chu12, AGK14, RST14], it is still not known whether constant-quality flow sparsifiers of size $\mathrm{poly}(k)$ exist, even when the sparsifier is not required to be a minor of the original graph.

1.1 Our Results

Our main result is the following:

Theorem 1.2.

Consider a weighted graph $G$, a $d$-dimensional vector space $S \subseteq \mathbb{R}^{V(G)}$ orthogonal to $\mathbf{1}$, and $\epsilon \in (0,1)$. Then an $\epsilon$-spectral subspace sparsifier for $(G, S)$ with $O(d \log d / \epsilon^2)$ edges exists.

When $S$ is the maximal subspace of vectors supported on a set of vertices $T$ that are orthogonal to $\mathbf{1}$, $\epsilon$-spectral subspace sparsifiers satisfy the same approximation guarantee as Schur complement sparsifiers. The approximation guarantee of a spectral subspace sparsifier $H$ of $G$ is equivalent to saying that for any demand vector $x \in S$, the energy of the $\ell_2$-minimizing flow for $x$ in $H$ is within a $(1+\epsilon)$-factor of the energy of the $\ell_2$-minimizing flow for $x$ in $G$. This yields a near-optimal (up to a logarithmic factor) answer to the $(1+\epsilon)$-approximate Steiner vertex removal problem for the $\ell_2$ norm. The $\ell_2$ version is substantially different from the $\ell_1$ problem, in which there do not exist $o(k^2)$-size minors that 2-approximate all terminal distances [CGH16].

Unlike Schur complement sparsifiers, $\epsilon$-spectral subspace sparsifiers may contain "Steiner nodes," i.e., vertices outside of $T$. This is generally not relevant in applications, as we illustrate in Section 6. Allowing Steiner nodes lets us obtain sparsifiers with fewer edges, which in turn allows faster constructions. Specifically, we show the following result:

Theorem 1.3.

Consider a weighted graph $G$, a set of vertices $T \subseteq V(G)$, and $\epsilon \in (0,1)$. Let $\mathcal{T}(m, \delta)$ denote the time it takes to generate a random spanning tree from a distribution with total variation distance at most $\delta$ from the uniform distribution. Then an $\epsilon$-spectral subspace sparsifier for $(G, T)$ with $O(|T| \log |T| / \epsilon^2)$ edges can be constructed in $m^{1+o(1)} + \tilde{O}(\mathcal{T}(m, \delta))$ time for $\delta = 1/\mathrm{poly}(n)$.

This sparsifier has as many edges as the Schur complement sparsifier given in [DKP17] but improves on their construction time. An important ingredient in the above construction is an algorithm for multiplicatively approximating changes in effective resistances due to certain modifications of $G$; we invoke it with a constant error parameter in this paper:

Lemma 1.4.

Consider a weighted graph $G$, a set of vertices $T \subseteq V(G)$, and $\epsilon \in (0,1)$. There is a near-linear-time algorithm that outputs numbers $\delta_e$ for all $e \in E(G)$ with the guarantee that each $\delta_e$ is a $(1 \pm \epsilon)$-multiplicative approximation, up to inverse-polynomial additive error, of the decrease in the effective resistance between the endpoints of $e$ when all vertices in $T$ are identified to a single vertex.

Finally, we replace the use of Theorem 6.1 in [DKP17] with our Theorem 1.3 in their improvement to Johnson-Lindenstrauss to obtain a faster algorithm:

Corollary 1.5.

Consider a weighted graph $G$, a set $P$ of pairs of vertices, and an $\epsilon \in (0,1)$. There is an algorithm, running in time almost-linear in $m$ and near-linear in $|P|$, that outputs $(1+\epsilon)$-multiplicative approximations to the effective resistances $\mathrm{Reff}_G(u, v)$ for all pairs $(u, v) \in P$.

This directly improves upon the runtime of the corresponding algorithm in [DKP17].

1.2 Technical Overview

To construct Schur complement sparsifiers, [DKP17] eliminates vertices one-by-one and sparsifies the cliques resulting from those eliminations. This approach is fundamentally limited in that each clique sparsification takes time superlinear in the degree of the eliminated vertex in general. Furthermore, in the star graph with $n - 1$ vertices connected to a single vertex $v$, a Schur complement sparsifier without Steiner vertices for the set $T = V(G) \setminus \{v\}$ must contain $\Omega(|T| / \epsilon^2)$ edges. As a result, it seems difficult to obtain Schur complement sparsifiers in time less than $|T| / \epsilon^2$ using vertex elimination.

Instead, we eliminate edges from a graph by contracting or deleting them. Edge elimination has the attractive feature that, unlike vertex elimination, it always reduces the number of edges. Start by letting $G_0 = G$. To eliminate an edge $e$ from the current graph $G_i$, sample $X_e \sim \mathrm{Bernoulli}(p_e)$ for some probability $p_e$ depending on $e$, contract $e$ if $X_e = 1$, and delete $e$ if $X_e = 0$.

To analyze the sparsifier produced by this procedure, we set up a matrix-valued martingale and reduce the problem to bounding the maximum and minimum eigenvalues of a random matrix with expectation equal to the identity matrix. The right value of $p_e$ for preserving this matrix in expectation turns out to be the probability that a uniformly random spanning tree of $G_i$ contains the edge $e$. To bound the variance of the martingale, one can use the Sherman-Morrison rank one update formula to bound the change in $L_{G_i}^+$ due to contracting or deleting the edge $e$. When doing this, one sees that the maximum change in eigenvalue is at most a constant times

$\frac{\max_{x \in S} \, w_e (b_e^T L_{G_i}^+ x)^2 / (x^T L_{G_i}^+ x)}{\min(\mathrm{lev}_{G_i}(e),\, 1 - \mathrm{lev}_{G_i}(e))},$

where $\mathrm{lev}_{G_i}(e)$ is the probability that $e$ is in a uniformly random spanning tree of $G_i$. This quantity is naturally viewed as the quotient of two quantities:

  (a) The maximum, over demand vectors in $S$, of the fractional energy contribution of $e$ to the demand vector's electrical flow.

  (b) The minimum of the probabilities that $e$ is in, or is not in, a uniformly random spanning tree of $G_i$.

We now make the edge elimination algorithm more specific to bound these two quantities. Quantity (a) is small on average over all edges of $G_i$ (see Proposition 3.9), so choosing the lowest-energy edge yields a good bound on the maximum change. To get a good enough bound on the stepwise martingale variance, it suffices to sample an edge uniformly at random from the half of edges with lowest energy. Quantity (b) is often not bounded away from 0, but can be made so by modifying the sampling procedure. Instead of contracting or deleting the edge $e$ directly, start by splitting it into two parallel edges with double the resistance or two series edges with half the resistance, depending on whether $\mathrm{lev}_{G_i}(e) \ge 1/2$ or not. Then, pick one of the halves $f$, contract it with probability $\mathrm{lev}(f)$, and delete it otherwise. This produces a graph in which the edge $e$ is either contracted, deleted, or reweighted, as the sketch below illustrates. This procedure suffices for proving our main existence result (Theorem 1.2). This technique is similar to the technique used to prove Lemma 1.4 of [Sch17].
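The following sketch makes the split-then-sample step concrete. It is our own dense-matrix illustration of the procedure just described (names like eliminate_step and contract are ours, and the $O(n^3)$ pseudoinverse stands in for the paper's fast machinery): depending on the leverage score, the edge is split in parallel or in series, one half is contracted with probability equal to its leverage score, and the net effect on the original edge is contraction, deletion, or reweighting.

import numpy as np

def laplacian(edges, n):
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    return L

def contract(edges, n, i):
    # identify the endpoints of edge i, relabel vertices, drop self-loops
    u, v, _ = edges[i]
    u, v = min(u, v), max(u, v)
    out = []
    for j, (a, b, w) in enumerate(edges):
        if j == i:
            continue
        a = u if a == v else (a - 1 if a > v else a)
        b = u if b == v else (b - 1 if b > v else b)
        if a != b:
            out.append((a, b, w))
    return out, n - 1

def eliminate_step(edges, n, i, rng):
    # One split-and-sample step on edge i = (u, v, w).
    u, v, w = edges[i]
    b = np.zeros(n); b[u], b[v] = 1.0, -1.0
    lev = w * b @ np.linalg.pinv(laplacian(edges, n)) @ b
    if lev >= 0.5:
        # two parallel halves of weight w/2; each half has leverage lev/2
        if rng.random() < lev / 2.0:
            return contract(edges, n, i)        # e is contracted
        edges[i] = (u, v, w / 2.0)              # one half deleted: e reweighted
    else:
        # two series halves of weight 2w; each half has leverage (1 + lev)/2
        if rng.random() < (1.0 + lev) / 2.0:
            edges[i] = (u, v, 2.0 * w)          # one half contracted: e reweighted
        else:
            edges.pop(i)                        # e is deleted
    return edges, n

rng = np.random.default_rng(0)
edges, n = eliminate_step([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)], 4, 0, rng)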

While the above algorithm takes polynomial time, it does not take almost-linear time. We can accelerate it by batching edge eliminations together using what we call steady oracles. The contraction/deletion/reweight decisions for all edges in a batch can be made by sampling just one approximately uniform random spanning tree, which takes $m^{1+o(1)}$ time. The main remaining difficulty is finding a large set of edges for which quantity (a) does not change much over the course of many edge contractions/deletions. To show that such a set exists, we exploit electrical flow localization [SRS17]. To find this set, we use matrix sketching and a new primitive for approximating the change in leverage score due to the identification of some set of vertices (Lemma 1.4), which may be of independent interest. The primitive works by writing the change as a Euclidean norm, reducing the dimension with the Johnson-Lindenstrauss lemma, and then computing the resulting embedding with fast Laplacian solvers in near-linear time.
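The sketching idea mirrors the Spielman-Srivastava identity $\mathrm{Reff}_G(u, v) = \|W^{1/2} B L_G^+ (e_u - e_v)\|_2^2$. The sketch below is our illustration of that pipeline, not the paper's data structure: the constant 24 and the dense pinv solves are assumptions, and in the real algorithm of Lemma 1.4 fast Laplacian solvers take the place of pinv.

import numpy as np

def reff_sketch(n, edges, eps, rng):
    # Build B, W, L; project the columns of W^{1/2} B L^+ with a random
    # sign matrix Q; answer resistance queries from the k x n sketch Z.
    m = len(edges)
    B = np.zeros((m, n)); W = np.zeros(m)
    for i, (u, v, w) in enumerate(edges):
        B[i, u], B[i, v], W[i] = 1.0, -1.0, w
    L = B.T @ (W[:, None] * B)
    k = int(np.ceil(24 * np.log(n) / eps ** 2))
    Q = rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(k)
    Z = (Q @ (np.sqrt(W)[:, None] * B)) @ np.linalg.pinv(L)   # k x n
    def reff(u, v):
        return float(np.sum((Z[:, u] - Z[:, v]) ** 2))
    return reff

rng = np.random.default_rng(0)
reff = reff_sketch(3, [(0, 1, 2.0), (1, 2, 3.0), (2, 0, 1.0)], 0.5, rng)
print(reff(0, 1))   # approximately 4/11, the true effective resistance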

We conclude by briefly discussing why localization is relevant for showing that quantity (a) does not change much over the course of many iterations. By Sherman-Morrison, the square root of the energy contribution of an edge $f$ to a demand's electrical flow after deleting an edge $e$ is the old square-root energy on $f$ plus some multiple of the square-root energy on the deleted edge $e$, where the multiplier is proportional to $|b_f^T L_G^+ b_e| / \sqrt{r_e r_f}$. By [SRS17], the average value of this multiplier over all pairs of edges $e$ and $f$ is $\tilde{O}(1/m)$, which means that the algorithm can perform a large batch of edge deletions/contractions without seeing the maximum energy on edges change by more than a factor of 2.

Acknowledgements. We thank Richard Peng, Jason Li, and Gramoz Goranci for helpful discussions.

2 Preliminaries

2.1 Graphs and Laplacians

For a graph $G$ and a subset of vertices $S \subseteq V(G)$, let $G/S$ denote the graph obtained by identifying $S$ to a single vertex $s$. Specifically, for any edge of $G$, replace each endpoint that lies in $S$ with $s$, and do not change any endpoint not in $S$. Then, remove all self-loops to obtain $G/S$.
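A minimal sketch of the identification operation (names are ours): the Laplacian of $G/S$ is $P L_G P^T$ for the 0/1 contraction matrix $P$, since $P b_e = 0$ exactly for the edges that become self-loops.

import numpy as np

def identify(L, S):
    # Laplacian of G/S for nonempty S: vertices in S merge into vertex 0
    # of the result; self-loops vanish automatically.
    n = L.shape[0]
    phi, nxt = {}, 1
    for x in range(n):
        if x in S:
            phi[x] = 0
        else:
            phi[x] = nxt
            nxt += 1
    P = np.zeros((nxt, n))
    for x in range(n):
        P[phi[x], x] = 1.0
    return P @ L @ P.T

# Path 0-1-2 with unit weights: identifying S = {0, 2} leaves two parallel
# unit edges, so the resistance between the merged vertex and 1 is 1/2.
Lpath = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
Lc = identify(Lpath, {0, 2})
b = np.array([1.0, -1.0])
print(b @ np.linalg.pinv(Lc) @ b)   # 0.5

By Rayleigh monotonicity, identification can only decrease effective resistances, which is why Lemma 1.4 speaks of the decrease.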

Let $G = (V, E, w)$ be a weighted undirected graph with $n$ vertices, $m$ edges, and positive edge weights $w_e$. The Laplacian $L_G$ of $G$ is an $n \times n$ matrix given by

$(L_G)_{uv} = \begin{cases} -w_{uv} & \text{if } u \neq v, \\ \sum_{z \neq u} w_{uz} & \text{if } u = v. \end{cases}$

We define edge resistances by $r_e = 1/w_e$ for all $e \in E$.

If we orient every edge $e \in E$ arbitrarily, we can define the signed edge-vertex incidence matrix $B \in \mathbb{R}^{m \times n}$ by

$B_{e,u} = \begin{cases} 1 & \text{if } u \text{ is } e\text{'s head}, \\ -1 & \text{if } u \text{ is } e\text{'s tail}, \\ 0 & \text{otherwise}. \end{cases}$

Then we can write $L_G$ as $L_G = B^T W B$, where $W$ is the $m \times m$ diagonal matrix with $W_{ee} = w_e$.
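A small sketch of these definitions on a concrete weighted triangle (our example); it builds $B$ and $W$ and checks that $B^T W B$ is symmetric and annihilates $\mathbf{1}$.

import numpy as np

edges = [(0, 1, 2.0), (1, 2, 3.0), (2, 0, 1.0)]   # a weighted triangle
n, m = 3, len(edges)
B = np.zeros((m, n))                  # signed edge-vertex incidence matrix
for i, (u, v, _) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0      # arbitrary orientation: u is the head
W = np.diag([w for _, _, w in edges])
L = B.T @ W @ B                       # the Laplacian L_G = B^T W B
r = [1.0 / w for _, _, w in edges]    # resistances r_e = 1 / w_e
assert np.allclose(L, L.T)
assert np.allclose(L @ np.ones(n), 0)   # rows sum to zero
print(L)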

For vertex sets $S, T \subseteq V$, $L_{S,T}$ denotes the submatrix of $L$ with row indices in $S$ and column indices in $T$.

$L_G$ is always positive semidefinite, and it has exactly one zero eigenvalue if $G$ is connected. For a connected graph $G$, let $0 = \lambda_1 < \lambda_2 \le \cdots \le \lambda_n$ be the eigenvalues of $L_G$, and let $u_1, u_2, \ldots, u_n$ be a corresponding set of orthonormal eigenvectors. Then we can diagonalize $L_G$ and write

$L_G = \sum_{i=2}^{n} \lambda_i u_i u_i^T.$

The pseudoinverse of $L_G$ is then given by

$L_G^+ = \sum_{i=2}^{n} \frac{1}{\lambda_i} u_i u_i^T.$

In the rest of the paper, we will write $\lambda_{\min}(\cdot)$ to denote the smallest nonzero eigenvalue and $\lambda_{\max}(\cdot)$ to denote the largest eigenvalue. We will also write $\sigma_{\max}(\cdot)$ to denote the largest singular value, which is given by $\sigma_{\max}(A) = \sqrt{\lambda_{\max}(A^T A)}$ for any matrix $A$.

We will also need to use Schur complements, which are defined as follows:

Definition 2.1 (Schur Complements).

The Schur complement of a graph $G$ onto a subset of vertices $S \subseteq V(G)$, denoted by $\mathrm{Sc}(G, S)$ or $\mathrm{Sc}(L_G, S)$, is defined as

$\mathrm{Sc}(G, S) = L_{S,S} - L_{S,C} L_{C,C}^{-1} L_{C,S},$

where $C = V(G) \setminus S$.

The fact below relates Schur complements to the pseudoinverse of the graph Laplacian:

Fact 2.2 (see, e.g., Fact 5.4 in [DKP17]).

For any graph $G$ and $S \subseteq V(G)$,

$(\mathrm{Sc}(G, S))^+ = \left(I - \frac{1}{|S|} J\right) (L_G^+)_{S,S} \left(I - \frac{1}{|S|} J\right),$

where $I$ denotes the $|S| \times |S|$ identity matrix, and $J$ denotes the $|S| \times |S|$ matrix whose entries are all $1$.
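The sketch below computes a Schur complement for the weighted triangle from the previous example and numerically checks Fact 2.2 (our illustration of the stated formula).

import numpy as np

L = np.array([[3.0, -2.0, -1.0],      # Laplacian of the triangle above
              [-2.0, 5.0, -3.0],
              [-1.0, -3.0, 4.0]])
S, C = [0, 2], [1]
Sc = L[np.ix_(S, S)] - L[np.ix_(S, C)] @ np.linalg.inv(L[np.ix_(C, C)]) @ L[np.ix_(C, S)]
k = len(S)
Pi = np.eye(k) - np.ones((k, k)) / k   # I - J/|S|
lhs = np.linalg.pinv(Sc)
rhs = Pi @ np.linalg.pinv(L)[np.ix_(S, S)] @ Pi
print(np.allclose(lhs, rhs))           # True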

2.2 Leverage scores and rank one updates

For a graph $G$ and an edge $e = (u, v) \in E(G)$, let $b_e \in \mathbb{R}^{V(G)}$ denote the signed indicator vector of the edge $e$; that is, the vector with $-1$ on one endpoint, $1$ on the other, and $0$ everywhere else. Define the leverage score of $e$ to be the quantity

$\mathrm{lev}_G(e) = w_e b_e^T L_G^+ b_e.$

The following two results are consequences of the Sherman-Morrison rank 1 update formula, $(A + ab^T)^{-1} = A^{-1} - \frac{A^{-1} a b^T A^{-1}}{1 + b^T A^{-1} a}$, which holds for any invertible matrix $A$ and vectors $a, b$ with $1 + b^T A^{-1} a \neq 0$:

Proposition 2.3.

For a graph $G$ and an edge $e \in E(G)$ with $\mathrm{lev}_G(e) < 1$, let $G \setminus e$ denote the graph with $e$ deleted. Then

$L_{G \setminus e}^+ = L_G^+ + \frac{w_e L_G^+ b_e b_e^T L_G^+}{1 - \mathrm{lev}_G(e)}.$

Proposition 2.4.

For a graph $G$ and an edge $e \in E(G)$, let $G / e$ denote the graph with $e$ contracted and let $P$ be the associated contraction matrix. Then for all vectors $x$ orthogonal to $\mathbf{1}$,

$(P x)^T L_{G/e}^+ (P x) = x^T L_G^+ x - \frac{w_e (b_e^T L_G^+ x)^2}{\mathrm{lev}_G(e)}.$
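A quick numerical check of Proposition 2.3 on the same triangle (our illustration): deleting an edge is the rank-one downdate $L - w_e b_e b_e^T$, and the graph stays connected here, so the pseudoinverse update applies.

import numpy as np

L = np.array([[3.0, -2.0, -1.0],      # the weighted triangle again
              [-2.0, 5.0, -3.0],
              [-1.0, -3.0, 4.0]])
b = np.array([1.0, -1.0, 0.0])        # b_e for e = {0, 1}
w = 2.0
Lp = np.linalg.pinv(L)
lev = w * b @ Lp @ b                   # leverage score of e (8/11 < 1)
L_del = L - w * np.outer(b, b)         # Laplacian of G \ e (still connected)
upd = Lp + w * (Lp @ np.outer(b, b) @ Lp) / (1.0 - lev)   # Proposition 2.3
print(np.allclose(np.linalg.pinv(L_del), upd))            # True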

2.3 Random spanning trees

We use the following result on uniform random spanning tree generation:

Theorem 2.5 (Theorem 1.2 of [Sch17]).

Given a weighted graph $G$ with $m$ edges, a random spanning tree of $G$ can be sampled from a distribution with total variation distance at most $\delta$ from the uniform distribution in time $m^{1+o(1)} \log(1/\delta)$.

Let $U(G)$ denote the uniform distribution over spanning trees of $G$ (for weighted graphs, a tree is drawn with probability proportional to the product of its edge weights). We also use the following classic result:

Theorem 2.6 ([Kir47]).

For any edge $e \in E(G)$, $\Pr_{T \sim U(G)}[e \in T] = \mathrm{lev}_G(e)$.

For an edge $e$, we will also consider the random graph obtained by contracting $e$ with probability $\mathrm{lev}_G(e)$ and deleting $e$ otherwise.
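For intuition, the brute-force sketch below (ours; weighted "uniform" means probability proportional to the product of tree edge weights, per the matrix-tree theorem) verifies Theorem 2.6 on a small graph by enumerating its spanning trees.

import numpy as np
from itertools import combinations

edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 2.0), (0, 2, 1.0)]
n = 4

def laplacian(es):
    L = np.zeros((n, n))
    for u, v, w in es:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    return L

def weight(T):
    return np.prod([w for _, _, w in T])

# a set of n-1 edges is a spanning tree iff its grounded Laplacian is nonsingular
trees = [T for T in combinations(edges, n - 1)
         if abs(np.linalg.det(laplacian(T)[1:, 1:])) > 1e-9]
total = sum(weight(T) for T in trees)
e = edges[4]                                   # the chord {0, 2}
pr = sum(weight(T) for T in trees if e in T) / total
b = np.zeros(n); b[0], b[2] = 1.0, -1.0
lev = e[2] * b @ np.linalg.pinv(laplacian(edges)) @ b
print(pr, lev)                                 # equal, per Theorem 2.6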

2.4 Some useful bounds and tools

We now describe some useful bounds/tools we will need in our algorithms. In all of the following bounds, we write $w_{\max}$ and $w_{\min}$ for the maximum and minimum edge weights of the graph under consideration.

The following lemma bounds the range of eigenvalues for Laplacians and SDDM matrices:

Lemma 2.7.

For any connected Laplacian $L$ on $n$ vertices with edge weights in $[w_{\min}, w_{\max}]$ and any nonempty $C \subsetneq V$:

(1) $\lambda_{\max}(L) \le 2 n w_{\max}$;
(2) $\lambda_{\min}(L) \ge w_{\min} / n^2$;
(3) the eigenvalues of the SDDM matrix $L_{C,C}$ also lie in $[w_{\min} / n^2, \, 2 n w_{\max}]$.
Proof.

Deferred to Appendix A. ∎

The lemma below gives upper bounds on the largest eigenvalues/singular values of some useful matrices:

Lemma 2.8.

The following upper bounds on the largest singular values/eigenvalues hold:

(4)
(5)
(6)

where .

Proof.

Deferred to Appendix B. ∎

We will need to invoke Fast Laplacian Solvers to apply the inverse of a Laplacian or an SDDM matrix. The following lemma characterizes the performance of Fast Laplacian Solvers:

Lemma 2.9 (Fast Laplacian Solver [ST14, CKM14]).

There is an algorithm which takes a matrix $M$ with $m$ nonzero entries that is either a Laplacian or an SDDM matrix, a vector $b$, and an error parameter $\epsilon > 0$, and returns a vector $x$ such that

$\|x - M^+ b\|_M \le \epsilon \|M^+ b\|_M$

holds with high probability, where $\|y\|_M = \sqrt{y^T M y}$ and $M^+$ denotes the pseudoinverse of $M$ when $M$ is a Laplacian (and the inverse when $M$ is SDDM). The algorithm runs in time $\tilde{O}(m \log(1/\epsilon))$.
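Where the text invokes a fast Laplacian solver, any black box with the guarantee of Lemma 2.9 suffices. The sketch below substitutes plain conjugate gradient, a stand-in we chose to keep the example dependency-free; it does not achieve the $\tilde{O}(m \log(1/\epsilon))$ bound.

import numpy as np

def cg_solve(L, b, tol=1e-10, max_iter=1000):
    # Approximate L^+ b for a Laplacian L and b orthogonal to the all-ones
    # vector; starting from x0 = 0 keeps iterates in the 1-orthogonal space.
    x = np.zeros_like(b)
    r = b - L @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Lp = L @ p
        alpha = rs / (p @ Lp)
        x = x + alpha * p
        r = r - alpha * Lp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x - x.mean()   # project back onto the 1-orthogonal solution space

L = np.array([[3.0, -2.0, -1.0], [-2.0, 5.0, -3.0], [-1.0, -3.0, 4.0]])
b = np.array([1.0, -1.0, 0.0])
print(np.allclose(cg_solve(L, b), np.linalg.pinv(L) @ b))   # True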

The following lemmas show how to bound the errors of Fast Laplacian Solvers in terms of $\ell_2$ norms; they follow directly from the bounds on Laplacian eigenvalues in Lemma 2.7:

Lemma 2.10.

For any Laplacian $L$, vectors $x, b$ both orthogonal to $\mathbf{1}$, and real number $\epsilon > 0$ satisfying

$\|x - L^+ b\|_L \le \epsilon \|L^+ b\|_L,$

the following statement holds:

$\|x - L^+ b\|_2 \le \epsilon \sqrt{\frac{\lambda_{\max}(L)}{\lambda_{\min}(L)}} \, \|L^+ b\|_2.$

Proof.

Deferred to Appendix C. ∎

Lemma 2.11.

For any Laplacian $L$, set $C \subsetneq V$, vectors $x, b$, and real number $\epsilon > 0$ satisfying

$\|x - M^{-1} b\|_M \le \epsilon \|M^{-1} b\|_M,$

where $M = L_{C,C}$, the following statement holds:

$\|x - M^{-1} b\|_2 \le \epsilon \sqrt{\frac{\lambda_{\max}(M)}{\lambda_{\min}(M)}} \, \|M^{-1} b\|_2.$

Proof.

Deferred to Appendix C. ∎

When computing the changes in effective resistances due to the identification of a given vertex set (i.e., merging the vertices in that set and deleting any self loops formed), we will need the Johnson-Lindenstrauss lemma to reduce dimensions:

Lemma 2.12 (Johnson-Lindenstrauss Lemma [JL84, Ach01]).

Let $v_1, v_2, \ldots, v_n \in \mathbb{R}^d$ be fixed vectors and $\epsilon \in (0, 1)$ be a real number. Let $k \ge C \log n / \epsilon^2$ be a positive integer for a large enough absolute constant $C$, and let $Q \in \mathbb{R}^{k \times d}$ be a random matrix with independent uniformly distributed $\pm 1/\sqrt{k}$ entries. With high probability, the following statement holds for any $1 \le i, j \le n$:

$(1 - \epsilon) \|v_i - v_j\|_2^2 \le \|Q v_i - Q v_j\|_2^2 \le (1 + \epsilon) \|v_i - v_j\|_2^2.$
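An empirical check of Lemma 2.12 with a random sign projection (our illustration; the constant 24 is an assumption standing in for the lemma's unspecified constant $C$):

import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 50, 1000, 0.5
V = rng.normal(size=(n, d))
k = int(np.ceil(24 * np.log(n) / eps ** 2))
Q = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)
ratios = []
for i in range(n):
    for j in range(i + 1, n):
        num = np.sum((Q @ (V[i] - V[j])) ** 2)
        den = np.sum((V[i] - V[j]) ** 2)
        ratios.append(num / den)
print(min(ratios), max(ratios))   # should lie within [1 - eps, 1 + eps] w.h.p.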

3 Existence of sparsifiers

In this section, we reduce the construction of spectral subspace sparsifiers to an oracle that outputs edges that have low energy with respect to every demand vector in the chosen subspace $S$. We proceed by splitting edges and conditioning on their presence in a uniformly random spanning tree one-by-one until the desired number of edges remains. This construction is a high-dimensional generalization of the construction given in Section 10.1 of [Sch17]. We use the following matrix concentration inequality:

Theorem 3.1 (Matrix Freedman Inequality applied to symmetric matrices [Tro11]).

Consider a matrix martingale $(Y_k)_{k \ge 0}$ whose values are symmetric $d \times d$ matrices, and let $(X_k)_{k \ge 1}$ be the difference sequence $X_k = Y_k - Y_{k-1}$. Assume that the difference sequence is uniformly bounded in the sense that

$\lambda_{\max}(X_k) \le R$

almost surely for all $k \ge 1$. Define the predictable quadratic variation process of the martingale:

$W_k = \sum_{j=1}^{k} \mathbb{E}\left[X_j^2 \mid (Y_i)_{i=0}^{j-1}\right].$

Then, for all $t \ge 0$ and $\sigma^2 > 0$,

$\Pr\left[\exists k \ge 0 : \lambda_{\max}(Y_k) \ge t \text{ and } \lambda_{\max}(W_k) \le \sigma^2\right] \le d \cdot \exp\left(\frac{-t^2 / 2}{\sigma^2 + R t / 3}\right).$

Now, we give an algorithm that proves Theorem 1.2. The algorithm simply splits and conditions on the edge that minimizes the martingale difference, repeatedly, until there are too few edges left. For efficiency purposes, SubspaceSparsifier receives martingale-difference-minimizing edges from a steady oracle with the additional guarantee that differences remain small after many edge updates. This oracle is similar to the stable oracles given in Section 10 of [Sch17].

Definition 3.2 (Steady oracles).

A steady oracle is a function that takes in a graph $G$ and a subspace $S$ that satisfy the following condition:

  • (Leverage scores) For all $e \in E(G)$, $\mathrm{lev}_G(e) \in [1/4, 3/4]$.

and outputs a set of edges $W \subseteq E(G)$. Let $G_0 = G$ and, for each $i \ge 0$, obtain $G_{i+1}$ by picking a uniformly random edge $e_i \in W \cap E(G_i)$, arbitrarily letting $G_{i+1} = G_i \setminus e_i$ or $G_{i+1} = G_i / e_i$. $W$ satisfies the following guarantees with high probability for all $i$:

  • (Size of $W$)

  • (Leverage score stability)

  • (Martingale change stability)

We now state the main result of this section:

Lemma 3.3.

Consider a weighted graph $G$, a $d$-dimensional vector space $S$, and $\epsilon \in (0,1)$. There is an algorithm that, given access to a steady oracle, computes an $\epsilon$-spectral subspace sparsifier for $(G, S)$ with $O(d \log d / \epsilon^2)$ edges, in time near-linear in $\mathcal{T}(m, \delta) + \mathcal{T}_{\mathrm{oracle}}$ per while-loop iteration, where $\mathcal{T}(m, \delta)$ is the time required to generate a spanning tree of $G$ from a distribution with total variation distance $\delta$ from uniform and $\mathcal{T}_{\mathrm{oracle}}$ is the runtime of the oracle.

The algorithm will use two simple subroutines that modify the graph by splitting edges. Split replaces each edge with approximate leverage score less than 1/2 with a two-edge path and each edge with approximate leverage score greater than 1/2 with two parallel edges. Unsplit reverses this split for all pairs that remain in the graph. We prove the following two results about these subroutines in the appendix (a code sketch of Split follows the two propositions):

Proposition 3.4.

There is a linear-time algorithm that, given a graph $G$, produces a graph $G'$ with $O(|E(G)|)$ edges and a set $\Pi$ of pairs of edges of $G'$ with the following additional guarantees:

  • (Electrical equivalence) For all vectors $x$ that are supported on $V(G)$, $x^T L_G^+ x = x^T L_{G'}^+ x$.

  • (Bounded leverage scores) For all $e' \in E(G')$, $\mathrm{lev}_{G'}(e') \in [1/4, 3/4]$.

  • ($\Pi$ description) Every edge of $G'$ is in exactly one pair in $\Pi$. Furthermore, there is a bijection between pairs $(e_1, e_2) \in \Pi$ and edges $e = \{u, v\} \in E(G)$ for which either (a) $e_1$ and $e_2$ have the same endpoint pair as $e$, or (b) $e_1 = \{u, z\}$ and $e_2 = \{z, v\}$ for some degree 2 vertex $z$.

Proposition 3.5.

There is a linear-time algorithm that, given a graph and a set of pairs of edges in , produces a minor with and the following additional guarantees:

  • (Electrical equivalence) For all that are supported on , .

  • (Edges of ) There is a surjective map from non-self-loop,non-leaf edges of such that for any pair , . Furthermore, for each , either (a) , (b) , with and having the same endpoints as or (c) , with and , and for a degree 2 vertex .
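Here is a sketch of Split under the guarantees of Proposition 3.4 (our dense implementation using exact leverage scores, whereas the paper's linear-time version only needs approximate ones): parallel halves for high-leverage edges, series halves for low-leverage ones, returning the pair list $\Pi$ as index pairs.

import numpy as np

def split(edges, n):
    # Returns (new edge list, new vertex count, pair list Pi).
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
    Lp = np.linalg.pinv(L)
    out, pairs = [], []
    for u, v, w in edges:
        b = np.zeros(L.shape[0]); b[u], b[v] = 1.0, -1.0
        lev = w * b @ Lp @ b
        if lev >= 0.5:
            # parallel halves of weight w/2: each has leverage lev/2 <= 1/2
            out += [(u, v, w / 2.0), (u, v, w / 2.0)]
        else:
            # series halves of weight 2w through a new vertex:
            # each has leverage (1 + lev)/2 >= 1/2
            out += [(u, n, 2.0 * w), (n, v, 2.0 * w)]
            n += 1
        pairs.append((len(out) - 2, len(out) - 1))
    return out, n, pairs

edges2, n2, Pi = split([(0, 1, 2.0), (1, 2, 3.0), (2, 0, 1.0)], 3)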

Input: A weighted graph $G$, a vector space $S$, $\epsilon \in (0,1)$, and (implicitly) a steady oracle
Output: An $\epsilon$-spectral subspace sparsifier for $(G, S)$
1 while the number of edges of $G$ exceeds the target sparsifier size do
2       $W \leftarrow$ the set of edges returned by the steady oracle on the current graph; $Z \leftarrow$ a uniformly random subset of $W$ of the prescribed size; $T \leftarrow$ a spanning tree of the current graph drawn from a distribution with TV distance $\delta$ from uniform; $G \leftarrow$ the graph with all edges of $Z$ in $T$ contracted and all edges of $Z$ not in $T$ deleted
3 end while
return $G$
Algorithm 1 (SubspaceSparsifier)

We analyze the approximation guarantees of SubspaceSparsifier by setting up two families of matrix-valued martingales. In all of the proof besides the final "Proof of Lemma 3.3," we sample spanning trees from the uniform distribution rather than from a distribution with total variation distance $\delta$ from uniform. We bound the error incurred from doing this in the final "Proof of Lemma 3.3."

We start by defining the first family, which consists of just one martingale. Let $\tau$ denote the number of while-loop iterations and let $G_i$ be the graph between iterations $i$ and $i+1$ of the while loop of SubspaceSparsifier. Since $S$ is orthogonal to $\mathbf{1}$, $S$ lies in the image of $L_{G_0}$, which means that $S$ has a basis $x_1, \ldots, x_d$ for which $x_j^T L_{G_0}^+ x_k = 0$ for all $j \neq k$ and $x_j^T L_{G_0}^+ x_j = 1$ for all $j$. Let $X$ be the matrix with $j$th column $x_j$ and let $Y_i = X^T L_{G_i}^+ X$ (with $X$ mapped through the relevant contraction maps), so that $Y_0 = I$. Since the $x_j$s form a basis of $S$, for every $x \in S$ there is a vector $c$ for which $x = Xc$. Furthermore, $x^T L_{G_i}^+ x = c^T Y_i c$ for any such $x$. In particular,

$\max_{x \in S} \left| \frac{x^T L_{G_i}^+ x}{x^T L_{G_0}^+ x} - 1 \right| = \|Y_i - I\|_2,$

so it suffices to show that $\|Y_\tau - I\|_2 \le \epsilon$, where $\tau$ is the number of while loop iterations.

In order to bound the change between $Y_i$ and $Y_{i+1}$, we introduce a second family of martingales consisting of one martingale for each while-loop iteration. During iteration $i$, $G_{i+1}$ is generated from $G_i$ by sampling a sequence of edges without replacement and conditioning on each edge's presence in the random spanning tree one at a time; the inner martingale tracks the normalized quadratic form after each of these single-edge updates, starting at $Y_i$ and ending at $Y_{i+1}$. Bounding the stepwise differences of the inner martingales therefore bounds the differences of the outer martingale.

Next, we write an equivalent formulation for the steady oracle “Martingale change stability” guarantee that is easier to analyze:

Proposition 3.6.
Proof.

Notice that