1 Introduction
Graph sparsification has had a number of applications throughout algorithms and theoretical computer science. In this work, we loosen the requirements of spectral sparsification and show that this loosening enables us to obtain sparsifiers with fewer edges. Specifically, instead of requiring that the Laplacian pseudoinverse quadratic form is approximated for every vector, we just require that the sparsifier approximates the Laplacian pseudoinverse quadratic form on a subspace:
Definition 1.1 (Spectral subspace sparsifiers).
Consider a weighted graph , a vector space that is orthogonal to , and . For a minor of with contraction map , let be a matrix with for all . A reweighted minor of is called an spectral subspace sparsifier if for all vectors ,
where .
[KMST10] also considers a specific form of subspace sparsification related to controlling the smallest eigenvalues of a spectral sparsifier for . When is the dimension subspace of that is orthogonal to , a spectral subspace sparsifier is a sparsifier for the Schur complement of restricted to the set of vertices . Schur complement sparsifiers are implicitly constructed in [KS16] and [KLP16] by an approximate Gaussian elimination procedure and have been used throughout spectral graph theory. For example, they are used in algorithms for random spanning tree generation [DKP17, DPPR17], approximate maximum flow [MP13], and effective resistance computation [GHP18, GHP17, DKP17]. Unlike the existing construction of Schur complement sparsifiers [DKP17], our algorithm (a) produces a sparsifier with vertices outside of and (b) produces a sparsifier that is a minor of the input graph. While (a) is a disadvantage of our approach, it is not a problem in applications, in which the number of edges in the sparsifier is the most relevant feature for performance, as illustrated by our almost-optimal algorithm for approximate effective resistance computation. (b) is an additional benefit of our construction and connects to the well-studied class of Steiner point removal problems [CGH16, EGK14].
In the Approximate Terminal Distance Preservation problem [CGH16], one is given a graph and a set of vertices . One is asked to find a reweighted minor of with size for which
for all and some small distortion . The fact that is a minor of is particularly useful in the context of planar graphs. One can equivalently phrase this problem as a problem of finding a minor in which the norm of the minimizing flow between any two vertices is within an factor of the norm of the minimizing flow in . The analogous problem for norms is the problem of constructing a flow sparsifier (with non demands as well). Despite much work on flow sparsifiers [Moi09, LM10, CLLM10, MM10, EGK14, Chu12, AGK14, RST14], it is still not known whether flow sparsifiers with size exist, even when the sparsifier is not a minor of the original graph.
1.1 Our Results
Our main result is the following:
Theorem 1.2.
Consider a weighted graph , a dimensional vector space , and . Then an spectral subspace sparsifier for with edges exists.
When is the maximal subspace of orthogonal to for some set of vertices , spectral subspace sparsifiers satisfy the same approximation guarantee as Schur complement sparsifiers. The approximation guarantee of a spectral subspace sparsifier of is equivalent to saying that for any demand vector , the energy of the minimizing flow for in is within a factor of the energy of the minimizing flow for in . This yields a near-optimal (up to a factor) answer to the approximate Steiner vertex removal problem for the norm. The version is substantially different from the problem, in which there do not exist size minors that 2-approximate all terminal distances [CGH16].
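The minimum-energy characterization of the quadratic form can be checked numerically. The sketch below is an illustration of ours (the triangle graph and demand vector are toy choices, not from the paper): the electrical flow routes the demand, its energy equals the Laplacian-pseudoinverse quadratic form, and any other feasible flow costs at least as much.

```python
import numpy as np

# Unit-weight triangle; route one unit of flow from vertex 0 to vertex 2.
B = np.array([[1., -1., 0.],    # edge (0, 1)
              [0., 1., -1.],    # edge (1, 2)
              [1., 0., -1.]])   # edge (0, 2)
w = np.ones(3)
L = B.T @ np.diag(w) @ B
Lp = np.linalg.pinv(L)

d = np.array([1., 0., -1.])     # demand vector, orthogonal to the all-ones
f = np.diag(w) @ B @ Lp @ d     # electrical flow routing demand d
energy = np.sum(f ** 2 / w)     # its energy equals the quadratic form d^T L^+ d

# Another feasible flow (the whole unit along the path 0 -> 1 -> 2) costs more.
g = np.array([1., 1., 0.])
```

Here the electrical flow splits the unit of demand 2/3 along the direct edge and 1/3 along the two-edge path, for energy 2/3, while the path flow has energy 2.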
Unlike Schur complement sparsifiers, spectral subspace sparsifiers may contain “Steiner nodes,” i.e., vertices outside of . This is generally not relevant in applications, as we illustrate in Section 6. Allowing Steiner nodes lets us obtain sparsifiers with fewer edges, which in turn yields faster constructions. Specifically, we show the following result:
Theorem 1.3.
Consider a weighted graph , a set of vertices , and . Let denote the time it takes to generate a random spanning tree from a distribution with total variation distance at most from the uniform distribution. Then a spectral subspace sparsifier for with edges can be constructed in time. This sparsifier has as many edges as the Schur complement sparsifier given in [DKP17] but improves on their runtime. An important ingredient in the above construction is an algorithm for multiplicatively approximating changes in effective resistances due to certain modifications of . This algorithm is called with in this paper:
Lemma 1.4.
Consider a weighted graph , a set of vertices , and . There is an time algorithm that outputs numbers for all with the guarantee that
Finally, we replace the use of Theorem 6.1 in [DKP17] with our Theorem 1.3 in their improvement to Johnson-Lindenstrauss to obtain a faster algorithm:
Corollary 1.5.
Consider a weighted graph , a set of pairs of vertices , and an . There is an time algorithm that outputs multiplicative approximations to the quantities
for all pairs .
This directly improves upon the algorithm in [DKP17], which takes time.
1.2 Technical Overview
To construct Schur complement sparsifiers, [DKP17] eliminates vertices one-by-one and sparsifies the cliques resulting from those eliminations. This approach is fundamentally limited in that each clique sparsification takes time in general. Furthermore, in the star graph with vertices connected to a single vertex , a approximate Schur complement sparsifier without Steiner vertices for the set must contain edges. As a result, it seems difficult to obtain Schur complement sparsifiers in less than time using vertex elimination.
Instead, we eliminate edges from a graph by contracting or deleting them. Edge elimination has the attractive feature that, unlike vertex elimination, it always reduces the number of edges. Start by letting . To eliminate an edge from the current graph , sample for some probability depending on , contract if , and delete if . To analyze the sparsifier produced by this procedure, we set up a matrix-valued martingale and reduce the problem to bounding the maximum and minimum eigenvalues of a random matrix with expectation equal to the identity matrix. The right value of for preserving this matrix in expectation turns out to be the probability that a uniformly random spanning tree of contains the edge. To bound the variance of the martingale, one can use the Sherman-Morrison rank one update formula to bound the change in due to contracting or deleting the edge . When doing this, one sees that the maximum change in eigenvalue is at most a constant times , where is the probability that is in a uniformly random spanning tree of . This quantity is naturally viewed as the quotient of two quantities:

(a) The maximum fractional energy contribution of to any demand vector in ’s electrical flow.

(b) The minimum of the probabilities that is in or is not in a uniformly random spanning tree of .
We now make the edge elimination algorithm more specific to bound these two quantities. Quantity (a) is small on average over all edges in (see Proposition 3.9), so choosing the lowest-energy edge yields a good bound on the maximum change. To get a good enough bound on the stepwise martingale variance, it suffices to sample an edge uniformly at random from the half of edges with lowest energy. Quantity (b) is often not bounded away from 0, but can be made so by modifying the sampling procedure. Instead of contracting or deleting the edge , start by splitting it into two parallel edges with double the resistance or two series edges with half the resistance, depending on whether or not . Then, pick one of the halves , contract it with probability , and delete it otherwise. This produces a graph in which the edge is either contracted, deleted, or reweighted. This procedure suffices for proving our main existence result (Theorem 1.2). This technique is similar to the technique used to prove Lemma 1.4 of [Sch17].
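The split-then-sample step can be sketched as follows. This is an illustrative simplification of ours: it only reports the net effect on the original edge given its leverage score, rather than performing the actual graph surgery, using the standard facts that a parallel half-weight copy has leverage score half that of the original edge while a series half-resistance edge has leverage score equal to the average of the original's leverage score and 1 — both bounded away from 0 and 1.

```python
import random

def split_and_sample(lev, rng):
    """Net effect of the split-then-toss step on an edge with leverage score lev.

    Illustrative sketch only: reports whether the original edge ends up
    contracted, deleted, or reweighted, without modifying any graph.
    """
    if lev >= 0.5:
        # Two parallel copies with half the weight (double the resistance);
        # each copy has leverage lev/2, which lies in [1/4, 1/2].
        if rng.random() < lev / 2.0:
            return 'contract'    # contracting either parallel copy contracts e
        return 'reweight'        # deleting one copy doubles e's resistance
    else:
        # Two series halves with half the resistance; each half has leverage
        # (1 + lev)/2, which lies in [1/2, 3/4].
        if rng.random() < (1.0 + lev) / 2.0:
            return 'reweight'    # contracting a half halves e's resistance
        return 'delete'          # deleting a half breaks the path, deleting e

rng = random.Random(0)
outcomes_hi = [split_and_sample(0.9, rng) for _ in range(1000)]
outcomes_lo = [split_and_sample(0.2, rng) for _ in range(1000)]
```

Note that in both branches the coin's bias stays in [1/4, 3/4], which is exactly the property used to keep quantity (b) bounded away from 0.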
While the above algorithm does take polynomial time, it does not take almost-linear time. We can accelerate it by batching edge eliminations together using what we call steady oracles. The contraction/deletion/reweight decisions for edges in during each batch can be made by sampling just one approximately uniformly random spanning tree, which takes time. The main remaining difficulty is finding a large set of edges for which quantity (a) does not change much over the course of many edge contractions/deletions. To show that such a set exists, we exploit electrical flow localization [SRS17]. To find this set, we use matrix sketching and a new primitive for approximating the change in leverage score due to the identification of some set of vertices (Lemma 1.4), which may be of independent interest. This primitive works by writing the change as a Euclidean norm, reducing the dimension via the Johnson-Lindenstrauss lemma, and then computing the embedding with Fast Laplacian Solvers in near-linear time.
We conclude by briefly discussing why localization is relevant for showing that quantity (a) does not change much over the course of many iterations. The square root of the energy contribution of an edge to ’s electrical flow after deleting an edge is by Sherman-Morrison. In particular, the new energy on is at most the old energy plus some multiple of the energy on the deleted edge . By [SRS17], the average value of this multiplier over all edges and is , which means that the algorithm can do edge deletions/contractions without seeing the maximum energy on edges change by more than a factor of 2.
Acknowledgements. We thank Richard Peng, Jason Li, and Gramoz Goranci for helpful discussions.
Contents
 1 Introduction
 2 Preliminaries
 3 Existence of sparsifiers
 4 Fast oracle
 5 Efficient approximation of differences
 6 Better effective resistance approximation
 A Bounds on eigenvalues of Laplacians and SDDM matrices
 B Bounds on 2norms of some useful matrices
 C Bounds on errors of LaplSolve using norms
 D Split subroutines
2 Preliminaries
2.1 Graphs and Laplacians
For a graph and a subset of vertices , let denote the graph obtained by identifying to a single vertex . Specifically, for any edge in , replace each endpoint with and do not change any endpoint not in . Then, remove all selfloops to obtain .
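As an illustration of this identification operation, here is a minimal sketch on an edge-list representation; the representation and the helper name `identify` are our own, not from the paper.

```python
def identify(edges, weights, X, x_new):
    """Identify the vertex set X to the single vertex x_new, dropping self-loops.

    edges: list of (u, v) pairs; weights: parallel list of positive weights.
    Illustrative helper, not from the paper.
    """
    new_edges, new_weights = [], []
    for (u, v), w in zip(edges, weights):
        u2 = x_new if u in X else u
        v2 = x_new if v in X else v
        if u2 != v2:                     # remove self-loops created by the merge
            new_edges.append((u2, v2))
            new_weights.append(w)
    return new_edges, new_weights

# Identify {0, 1} in a weighted triangle on {0, 1, 2}: the edge between 0 and 1
# becomes a self-loop and is dropped; the two edges incident to 2 survive.
edges, weights = identify([(0, 1), (1, 2), (0, 2)], [1.0, 2.0, 3.0], {0, 1}, 0)
```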
Let be a weighted undirected graph with vertices, edges, and edge weights . The Laplacian of is an matrix given by:
We define edge resistances by for all .
If we orient every edge arbitrarily, we can define the signed edgevertex incidence matrix by
Then we can write as , where is a diagonal matrix with .
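For concreteness, a small numerical check (our own toy example, not from the paper) that the incidence-matrix factorization agrees with the usual degree-minus-adjacency form of the Laplacian:

```python
import numpy as np

# Triangle graph on 3 vertices with edge weights w_01 = 1, w_12 = 2, w_02 = 3.
# Signed incidence matrix B: one row per (arbitrarily oriented) edge.
B = np.array([[1., -1., 0.],
              [0., 1., -1.],
              [1., 0., -1.]])
W = np.diag([1.0, 2.0, 3.0])

L = B.T @ W @ B   # Laplacian as the product of incidence and weight matrices

# Direct construction: diagonal of weighted degrees minus weighted adjacency.
A = np.array([[0., 1., 3.],
              [1., 0., 2.],
              [3., 2., 0.]])
L_direct = np.diag(A.sum(axis=1)) - A
```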
For vertex sets , denotes the submatrix of with row indices in and column indices in .
is always positive semidefinite, and only has one zero eigenvalue if is connected. For a connected graph , let be the eigenvalues of . Let be the corresponding set of orthonormal eigenvectors. Then, we can diagonalize and write . The pseudoinverse of is then given by
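Concretely, the pseudoinverse can be formed from the eigendecomposition by inverting only the nonzero eigenvalues. A short sketch (our own illustration; the matrix is the weighted-triangle Laplacian from earlier):

```python
import numpy as np

# Laplacian of a weighted triangle (connected, so exactly one zero eigenvalue).
L = np.array([[ 4., -1., -3.],
              [-1.,  3., -2.],
              [-3., -2.,  5.]])

lam, U = np.linalg.eigh(L)           # eigenvalues ascending; lam[0] is ~0
# Invert only the nonzero eigenvalues to form the pseudoinverse.
nonzero = lam > 1e-9
inv = np.where(nonzero, 1.0 / np.where(nonzero, lam, 1.0), 0.0)
L_pinv = U @ np.diag(inv) @ U.T
```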
In the rest of the paper, we will write to denote the smallest eigenvalue and to denote the largest eigenvalue. We will also write to denote the largest singular value, which is given by for any matrix .
We will also need to use Schur complements which are defined as follows:
Definition 2.1 (Schur Complements).
The Schur complement of a graph onto a subset of vertices , denoted by or , is defined as
where .
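A minimal numerical sketch of Definition 2.1 (the path graph is our own toy example): eliminating the interior vertex of a unit-weight path 0-1-2 onto its endpoints yields the Laplacian of a single edge of weight 1/2, i.e., two unit resistors in series.

```python
import numpy as np

# Unit-weight path 0 - 1 - 2; take the Schur complement onto S = {0, 2}.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
S = [0, 2]          # kept vertices
C = [1]             # eliminated vertices
L_SS = L[np.ix_(S, S)]
L_SC = L[np.ix_(S, C)]
L_CC = L[np.ix_(C, C)]
schur = L_SS - L_SC @ np.linalg.inv(L_CC) @ L_SC.T
```

The result is again a graph Laplacian (rows sum to zero), which is why Schur complements can be sparsified as graphs.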
The fact below relates Schur complements to the inverse of graph Laplacian:
Fact 2.2 (see, e.g., Fact 5.4 in [DKP17]).
For any graph and ,
where denotes the identity matrix, and denotes the matrix whose entries are all .
2.2 Leverage scores and rank one updates
For a graph and an edge , let denote the signed indicator vector of the edge ; that is, the vector with −1 on one endpoint, 1 on the other, and 0 everywhere else. Define the leverage score of to be the quantity
Let be two vectors with . Then the following results hold by the ShermanMorrison rank 1 update formula:
Proposition 2.3.
For a graph and an edge , let denote the graph with deleted. Then
Proposition 2.4.
For a graph and an edge , let denote the graph with contracted. Then
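These rank-one update formulas are easy to sanity-check numerically. The following sketch (a toy example of ours) verifies the deletion case on a unit-weight triangle, where the deleted edge has leverage score 2/3:

```python
import numpy as np

# Unit-weight triangle; delete the edge e = {0, 1} (leverage score 2/3 < 1).
L = np.array([[ 2., -1., -1.],
              [-1.,  2., -1.],
              [-1., -1.,  2.]])
b_e = np.array([1., -1., 0.])        # signed indicator vector of e
w_e = 1.0

Lp = np.linalg.pinv(L)
lev = w_e * b_e @ Lp @ b_e           # leverage score of e

# Sherman-Morrison rank-one update for deleting e (the graph stays connected).
Lp_del = Lp + w_e * np.outer(Lp @ b_e, Lp @ b_e) / (1.0 - lev)

L_del = L - w_e * np.outer(b_e, b_e)  # Laplacian with e deleted (a path)
```

Both matrices have the same kernel (the all-ones vector), which is what makes the Sherman-Morrison identity valid for pseudoinverses here.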
2.3 Random spanning trees
We use the following result on uniform random spanning tree generation:
Theorem 2.5 (Theorem 1.2 of [Sch17]).
Given a weighted graph with edges, a random spanning tree of can be sampled from a distribution with total variation distance at most from the uniform distribution in time .
Let denote the uniform distribution over spanning trees of . We also use the following classic result:
Theorem 2.6 ([Kir47]).
For any edge , .
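Theorem 2.6 identifies the leverage score of an edge with its spanning-tree marginal. On a triangle this can be verified by brute force (our own illustration): every pair of edges is a spanning tree, with probability proportional to the product of its edge weights.

```python
import numpy as np
from itertools import combinations

# Weighted triangle; compare leverage scores to spanning-tree marginals.
edges = [(0, 1), (1, 2), (0, 2)]
weights = [1.0, 2.0, 3.0]
n = 3
L = np.zeros((n, n))
for (u, v), w in zip(edges, weights):
    L[u, u] += w; L[v, v] += w; L[u, v] -= w; L[v, u] -= w
Lp = np.linalg.pinv(L)

lev = []
for (u, v), w in zip(edges, weights):
    b = np.zeros(n); b[u] = 1.0; b[v] = -1.0
    lev.append(w * b @ Lp @ b)           # leverage score of the edge

# Brute force: every 2-subset of the 3 edges is a spanning tree, sampled with
# probability proportional to the product of its edge weights.
trees = list(combinations(range(3), 2))
tree_wts = [weights[i] * weights[j] for i, j in trees]
total = sum(tree_wts)
pr = [sum(tw for t, tw in zip(trees, tree_wts) if e in t) / total
      for e in range(3)]
```

The leverage scores also sum to the number of tree edges, here n − 1 = 2.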
For an edge , let denote a random graph obtained by contracting with probability and deleting otherwise.
2.4 Some useful bounds and tools
We now describe some useful bounds/tools we will need in our algorithms. In all the following bounds, we define the quantities and as follows:
The following lemma bounds the range of eigenvalues for Laplacians and SDDM matrices:
Lemma 2.7.
For any Laplacian and ,
(1)  
(2)  
(3) 
Proof.
Deferred to Appendix A. ∎
The lemma below gives upper bounds on the largest eigenvalues/singular values for some useful matrices:
Lemma 2.8.
The following upper bounds on the largest singular values/eigenvalues hold:
(4)  
(5)  
(6) 
where .
Proof.
Deferred to Appendix B. ∎
We will need to invoke Fast Laplacian Solvers to apply the inverse of a Laplacian or an SDDM matrix. The following lemma characterizes the performance of Fast Laplacian Solvers:
Lemma 2.9 (Fast Laplacian Solver [ST14, CKM14]).
There is an algorithm which takes a matrix that is either a Laplacian or an SDDM matrix with nonzero entries, a vector , and an error parameter , and returns a vector such that
holds with high probability, where , , and denotes the pseudoinverse of when is a Laplacian. The algorithm runs in time .
The following lemmas show how to bound the errors of Fast Laplacian Solvers in terms of norms, which follows directly from the bounds on Laplacian eigenvalues in Lemma 2.7:
Lemma 2.10.
For any Laplacian , vectors both orthogonal to , and real number satisfying
the following statement holds:
Proof.
Deferred to Appendix C. ∎
Lemma 2.11.
For any Laplacian , , vectors , and real number satisfying
where , the following statement holds:
Proof.
Deferred to Appendix C. ∎
When computing the changes in effective resistances due to the identification of a given vertex set (i.e., merging vertices in that set and deleting any self-loops formed), we will need to use the Johnson-Lindenstrauss lemma to reduce dimensions:
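The dimension-reduction idea can be sketched as follows. This is an illustration of ours, not the paper's algorithm: effective resistances are squared Euclidean norms of columns of a fixed matrix, so a single Gaussian projection preserves all of them at once; we use a dense pseudoinverse where the paper would use a Laplacian solver, and the sketch dimension is an arbitrary toy choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Complete graph on 8 vertices with random positive weights.
n = 8
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
w = rng.uniform(0.5, 2.0, size=len(pairs))
B = np.zeros((len(pairs), n))
for idx, (u, v) in enumerate(pairs):
    B[idx, u], B[idx, v] = 1.0, -1.0
L = B.T @ np.diag(w) @ B
Lp = np.linalg.pinv(L)

# Each effective resistance is a squared Euclidean norm,
#   R(u, v) = || W^{1/2} B L^+ b_uv ||^2,
# so one random projection Q preserves all of them up to small relative error.
k = 400                                   # sketch dimension, O(log n / eps^2)
Q = rng.standard_normal((k, len(pairs))) / np.sqrt(k)
Z = Q @ (np.sqrt(w)[:, None] * B) @ Lp    # k x n; computed via solvers in practice

exact = np.array([B[i] @ Lp @ B[i] for i in range(len(pairs))])
approx = np.array([np.sum((Z @ B[i]) ** 2) for i in range(len(pairs))])
```

After the sketch `Z` is built, each of the effective resistance queries costs only a k-dimensional norm computation.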
3 Existence of sparsifiers
In this section, we reduce the construction of spectral subspace sparsifiers to an oracle that outputs edges that have low energy with respect to every demand vector in the chosen subspace . We prove the existence result by splitting edges and conditioning on their presence in a uniformly random spanning tree one-by-one until edges are left. This construction is a high-dimensional generalization of the construction given in Section 10.1 of [Sch17]. We use the following matrix concentration inequality:
Theorem 3.1 (Matrix Freedman Inequality applied to symmetric matrices [Tro11]).
Consider a matrix martingale whose values are symmetric matrices with dimension , and let be the difference sequence . Assume that the difference sequence is uniformly bounded in the sense that
almost surely for . Define the predictable quadratic variation process of the martingale:
Then, for all and ,
Now, we give an algorithm that proves Theorem 1.2. The algorithm simply splits and conditions on the edge that minimizes the martingale difference repeatedly until there are too few edges left. For efficiency purposes, it receives martingale-difference-minimizing edges from a steady oracle with the additional guarantee that differences remain small after many edge updates. This oracle is similar to the stable oracles given in Section 10 of [Sch17].
Definition 3.2 (Steady oracles).
A steady oracle is a function that takes in a graph and a subspace that satisfy the following condition:

(Leverage scores) For all , .
and outputs a set . Let and for each , obtain by picking a uniformly random edge , arbitrarily letting or , and letting . satisfies the following guarantees with high probability for all :

(Size of )

(Leverage score stability)

(Martingale change stability)
We now state the main result of this section:
Lemma 3.3.
Consider a weighted graph , a dimensional vector space , and . There is an algorithm that, given access to a steadyoracle , computes a spectral subspace sparsifier for with
edges in time
where is the time required to generate a spanning tree of from a distribution with total variation distance from uniform and is the runtime of the oracle.
The algorithm will use two simple subroutines that modify the graph by splitting edges. Split replaces each edge with approximate leverage score less than 1/2 with a two-edge path and each edge with approximate leverage score greater than 1/2 with two parallel edges. Unsplit reverses this split for all pairs that remain in the graph. We prove the following two results about these subroutines in the appendix:
Proposition 3.4.
There is a lineartime algorithm that, given a graph , produces a graph with and a set of pairs of edges with the following additional guarantees:

(Electrical equivalence) For all that are supported on , .

(Bounded leverage scores) For all ,

( description) Every edge in is in exactly one pair in . Furthermore, there is a bijection between pairs and edges for which either (a) and have the same endpoint pair or (b) , , and for some degree 2 vertex .
Proposition 3.5.
There is a lineartime algorithm that, given a graph and a set of pairs of edges in , produces a minor with and the following additional guarantees:

(Electrical equivalence) For all that are supported on , .

(Edges of ) There is a surjective map from non-self-loop, non-leaf edges of such that for any pair , . Furthermore, for each , either (a) , (b) , with and having the same endpoints as or (c) , with and , and for a degree 2 vertex .
We analyze the approximation guarantees of by setting up two families of matrix-valued martingales. In all of the proofs besides the final “Proof of Lemma 3.3,” we sample from the uniform distribution rather than from a distribution with total variation distance from uniform. We bound the error incurred from doing this in the final “Proof of Lemma 3.3.”
We start by defining the first family, which just consists of one martingale. Let and let be the graph between iterations and of the while loop of SubspaceSparsifier. Let . Since is orthogonal to , , which means that has a basis for which for all and for all . Let be the matrix with th column and let . Let . Since the s form a basis of , there is a vector for which for any . Furthermore, for any . In particular,
so it suffices to show that for all , where is the number of while loop iterations.
In order to bound the change between and , we introduce a second family of martingales consisting of one martingale for each while loop iteration. Let during the th iteration of the while loop in SubspaceSparsifier. Generate in during iteration of the while loop by sampling a sequence of edges without replacement from . Let for all . For a vector , let be the vector with for and for . For and , let . Let be the matrix with th column . Let . For any , , and , . In particular,
Next, we write an equivalent formulation for the steady oracle “Martingale change stability” guarantee that is easier to analyze:
Proposition 3.6.
Proof.
Notice that
as desired. ∎
Now, we analyze the inner family of matrices . Let