# Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting

We derive a new proof showing that the incremental resparsification algorithm proposed by Kelner and Levin (2013) produces a spectral sparsifier with high probability. We rigorously take into account the dependencies across subsequent resparsifications using martingale inequalities, fixing a flaw in the original analysis.

## 1 Introduction

Kelner and Levin (2013) introduced a simple single-pass approach to generate a spectral sparsifier of a graph in the semi-streaming setting, where edges are received one at a time. Only an intermediate, approximate sparsifier is stored, and every time a new edge arrives it is added to it. Whenever the sparsifier grows too large, a resparsification step reduces its size without compromising its spectral guarantees.

Although the algorithm is intuitive and simple to implement, the original proof presented in their paper is incomplete, as originally pointed out in Cohen et al. (2016). In particular, Kelner and Levin (2013) rely on a concentration inequality for independent random variables, while in the sparsification algorithm the probability of keeping an edge e in the sparsifier at step s does depend on whether other edges have been included in the sparsifier at previous iterations. This structure introduces subtle statistical dependencies across iterations of the algorithm, and a more careful analysis is necessary.

In addition to pointing out the problems with the original proof of Kelner and Levin (2013), Cohen et al. (2016) introduce a new algorithm to construct a sparsifier in a semi-streaming setting. Differently from the original algorithm, interactions between iterations are avoided because the algorithm of Cohen et al. (2016) never drops an edge once it has been introduced in the sparsifier. Another alternative algorithm, this time capable of dropping included edges, is presented in Pachocki (2016), together with a rigorous proof that takes into account all the dependencies between edges and between iterations. While the final result of Pachocki (2016) guarantees that a valid spectral sparsifier is generated at each iteration, the proposed algorithm is still different from the one originally introduced by Kelner and Levin (2013).

In this note, we derive an alternative proof for the original Kelner and Levin (2013) algorithm, using arguments similar to those of Pachocki (2016). In particular, we formalize and analyze the edge selection process as a martingale, obtaining strong concentration guarantees while rigorously taking into account the dependencies across the iterations of the algorithm.

## 2 Background

### 2.1 Notation

We use lowercase letters a for scalars, bold lowercase letters **a** for vectors, and bold uppercase letters **A** for matrices. We write A ⪯ B for the Löwner ordering of matrices, and A ⪰ 0 when A is positive semi-definite (PSD).

We denote with G = (V, E) an undirected weighted graph with n vertices and m edges. Associated with each edge e = (i, j) there is a weight a_{i,j} (shortened a_e) measuring the “distance” between vertex i and vertex j.¹ Throughout the rest of the paper, we assume that the weights are bounded; in particular, we assume that the largest weight is smaller than 1, that the smallest weight is strictly greater than 0, and that their ratio is polynomially bounded in n, which is always true for unweighted graphs. Given two graphs G and G′ over the same set of nodes V, we denote by G + G′ the graph obtained by summing the weights of the edges of G and G′.

¹The graph can be either constructed from raw data (e.g., building a k-nn graph with an exponential kernel) or it can be provided directly as input (e.g., in social networks).

Given the weighted adjacency matrix A_G and the degree matrix D_G, the Laplacian of G is the PSD matrix defined as L_G = D_G − A_G. Furthermore, we assume that G is connected, and thus L_G has only one eigenvalue equal to 0 and Ker(L_G) = span(1). Let L_G⁺ be the pseudoinverse of L_G, and L_G^{+/2} = (L_G⁺)^{1/2}. For any node i, we denote with χ_i the indicator vector, so that b_e = χ_i − χ_j is the “edge” vector of edge e = (i, j). If we denote with B_G the signed edge-vertex incidence matrix, then the Laplacian matrix can be written as L_G = B_G^T W_G B_G, where W_G is the diagonal matrix with [W_G]_{e,e} = a_e.

We indicate with P the matrix of the orthogonal projection on the (n−1)-dimensional space orthogonal to the all-one vector 1. Since the Laplacian of any connected graph has a null space equal to span(1), P is invariant w.r.t. the specific graph on n vertices used to define it. Alternatively, the projection matrix can be obtained as P = L_G^{+/2} L_G L_G^{+/2}. Finally, let v_e = √(a_e) L_G^{+/2} b_e; then we have P = Σ_{e∈E} v_e v_e^T.

### 2.2 Spectral Sparsification in the Semi-Streaming Setting

A graph H is a spectral sparsifier of G if the whole spectrum of the original graph is well approximated using only a small portion of its edges. More formally,

###### Definition 1.

A (1 ± ε)-spectral sparsifier of G is a graph H such that for all x ∈ ℝⁿ

 (1-\varepsilon)\, x^\top L_G x \;\le\; x^\top L_H x \;\le\; (1+\varepsilon)\, x^\top L_G x.

Spectral sparsifiers preserve most of the spectral information of the original graph in a very sparse subgraph. Because of this, they are easy to store in memory and are used to provide fast approximations to many quantities that are expensive to compute on the original large graph.
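
On small graphs, Definition 1 can be checked directly. The sketch below is our own illustration (not part of the algorithms discussed here): it tests the condition through the eigenvalues of L_G^{+/2} L_H L_G^{+/2} restricted to the range of L_G, which must all lie in [1−ε, 1+ε].

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian of an undirected weighted graph; edges = [(u, v, weight)]."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

def is_spectral_sparsifier(L_G, L_H, eps):
    """Check (1-eps) x'L_G x <= x'L_H x <= (1+eps) x'L_G x for all x,
    i.e., that the spectrum of L_G^{+/2} L_H L_G^{+/2} on range(L_G)
    lies in [1-eps, 1+eps]."""
    lam, U = np.linalg.eigh(L_G)
    keep = lam > 1e-9                    # drop the all-ones null space
    R = U[:, keep] / np.sqrt(lam[keep])  # R R^T equals pinv(L_G)
    ev = np.linalg.eigvalsh(R.T @ L_H @ R)
    return ev.min() >= 1 - eps and ev.max() <= 1 + eps
```

For instance, a triangle graph is trivially a sparsifier of itself for any ε, while scaling all its weights by 1.5 violates the condition for ε = 0.1.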

After showing that every graph admits a sparsifier with O(n log(n)/ε²) edges, Spielman and Srivastava (2011) proposed a sampling algorithm to easily construct one using the effective resistances of the edges of G.

###### Definition 2.

The effective resistance of an edge e in graph G is defined as r_e = b_e^T L_G⁺ b_e. The total weighted sum of effective resistances is the same for all connected graphs over n nodes, and is equal to Σ_{e∈E} a_e r_e = n − 1.

Intuitively, the effective resistance encodes the importance of an edge in preserving the minimum distance between two nodes. If an edge is the only connection between two parts of the graph, its r_e is large. On the other hand, if there are multiple parallel paths across many edges connecting two nodes, the effective resistance of an edge between the two nodes will be small, similarly to actual resistances in parallel in an electrical network. An important consequence of this definition is that adding edges to a graph can only reduce the effective resistance of other edges, because it can only introduce new alternative (parallel) paths in the graph. To prove this formally, consider a graph G and a new set of edges E′. Then we have L_G ⪯ L_{G+E′} and therefore b_e^T L_{G+E′}⁺ b_e ≤ b_e^T L_G⁺ b_e.
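
Both the parallel-path intuition and this monotonicity are easy to check numerically. A minimal sketch (unit weights, exact pseudoinverses; purely illustrative):

```python
import numpy as np

def effective_resistance(L, u, v):
    """r_{uv} = b^T L^+ b, with b the signed edge vector chi_u - chi_v."""
    b = np.zeros(L.shape[0])
    b[u], b[v] = 1.0, -1.0
    return float(b @ np.linalg.pinv(L) @ b)

# Path 0-1-2 with unit weights: the only route from 0 to 2 has resistance 2.
L_path = np.array([[ 1., -1.,  0.],
                   [-1.,  2., -1.],
                   [ 0., -1.,  1.]])
r_before = effective_resistance(L_path, 0, 2)   # = 2

# Adding edge (0, 2) creates a parallel path: 2 and 1 in parallel give 2/3.
L_tri = L_path + np.array([[ 1., 0., -1.],
                           [ 0., 0.,  0.],
                           [-1., 0.,  1.]])
r_after = effective_resistance(L_tri, 0, 2)     # = 2/3 < 2
```

On the resulting triangle one can also verify the claim of Definition 2 that the weighted resistances sum to n − 1 = 2.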

Spielman and Srivastava (2011) proved that sampling the edges of G with replacement, using a distribution proportional to their effective resistances, produces a spectral sparsifier of size O(n log(n)/ε²) with high probability. The main issue of this approach is that we want to compute a sparsifier precisely to avoid storing the whole Laplacian, yet we need to store and (pseudo-)invert the Laplacian to compute the exact effective resistances used to construct the sparsifier. Spielman and Srivastava (2011) showed that this issue can be resolved by computing sufficiently accurate approximations of the effective resistances.

###### Definition 3.

An approximate effective resistance r̃_e is called α-accurate, for α ≥ 1, if it satisfies

 \frac{1}{\alpha}\, r_e \;\le\; \tilde r_e \;\le\; \alpha\, r_e.

In particular, Spielman and Srivastava (2011, Corollary 6) showed that batch sampling edges proportionally to their α-accurate approximate effective resistances is enough to guarantee that the resulting graph is a (1 ± ε)-sparsifier. Building on this result, Kelner and Levin (2013) propose a sequential algorithm (summarized in Alg. 1) that can emulate the batch sampling of Spielman and Srivastava (2011) in a semi-streaming setting and incrementally construct a sparsifier, without having to fully store and invert the input Laplacian.

In a semi-streaming setting the graph G is split into blocks Γ_1, …, Γ_m, with G_s = Γ_1 + ⋯ + Γ_s, where each block is a subset of edges such that G_m = G. Associated with each partial graph G_s, we can define its respective effective resistances r_{s,e} and sampling probabilities p_{s,e}. Starting (s = 0) from an empty sparsifier H_0, the algorithm alternates between two phases. In the first phase, the algorithm reads edges from the stream to build a new block Γ_{s+1}, and it combines it with the previous sparsifier H_s to construct H_s + Γ_{s+1}. This phase sees an increase in memory usage, because the algorithm needs to store the newly arrived edges in addition to the sparsifier. In the second phase, the graph H_s + Γ_{s+1} is used together with a fast SDD solver to compute α-accurate estimates r̃_{s+1,e} of the effective resistances of all the edges in it. These approximate effective resistances are used to compute approximate probabilities p̃_{s+1,e}, and according to these approximate probabilities each of the edges in H_s and Γ_{s+1} is either added to the new sparsifier H_{s+1} or discarded forever, freeing up memory. Choosing the size of the blocks to be close to the size of the sparsifiers allows the algorithm to run efficiently in a small fixed space and produce a valid sparsifier at the end of each iteration.² For details on the time complexity analysis and on how to obtain α-accurate effective resistances using a valid sparsifier and a fast SDD solver, we refer to the original paper of Kelner and Levin (2013).

²Throughout this note, we consider that the decomposition of G into blocks is such that all intermediate graphs G_s are fully connected. Whenever this is not the case, the algorithm should be adjusted to run separately on all the components of the graph.
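
The two-phase loop can be sketched as follows. This is a simplified illustration, not Kelner and Levin's actual procedure: it uses exact effective resistances via a dense pseudoinverse instead of SDD-solver estimates, a single keep probability `min(1, C * a_e * r_e)` with a free oversampling constant `C` in place of the N-copies scheme, and no failure handling.

```python
import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

def resparsify(n, edges, C, rng):
    """Keep each edge with probability p_e = min(1, C * a_e * r_e) and
    reweight it by 1/p_e, so the Laplacian is preserved in expectation."""
    Lp = np.linalg.pinv(laplacian(n, edges))
    kept = []
    for u, v, w in edges:
        b = np.zeros(n); b[u], b[v] = 1.0, -1.0
        p = min(1.0, C * w * (b @ Lp @ b))   # effective-resistance sampling
        if rng.random() <= p:
            kept.append((u, v, w / p))
    return kept

def stream_sparsify(n, stream, block_size, C, seed=0):
    """Alternate the two phases: buffer a block from the stream, merge it
    with the current sparsifier H, then resparsify to free memory."""
    rng = np.random.default_rng(seed)
    H, block = [], []
    for edge in stream:
        block.append(edge)
        if len(block) == block_size:
            H, block = resparsify(n, H + block, C, rng), []
    return resparsify(n, H + block, C, rng) if block else H
```

Reweighting a kept edge by 1/p_e keeps the sparsifier's Laplacian unbiased, which is the same principle that makes the original sampling scheme work.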

The main result of Kelner and Levin (2013) is the following theorem.

###### Theorem 1.

Let H_s be the sparsifier returned by Algorithm 1 after s resparsifications (after streaming the first s blocks). If the number of sampled copies N is large enough, of order n log(n)/ε² up to the accuracy factor α², then with probability 1 − δ all sparsifiers from the beginning of the algorithm to its end (s = 1, …, m) are valid (1 ± ε)-spectral sparsifiers of their corresponding partial graphs G_s, and the number of edges in each of the sparsifiers is O(N).

At the core of the original proof of this theorem, Kelner and Levin rely on a concentration inequality for independent random variables from Vershynin (2009). Unfortunately, it is not possible to directly use this result, since the probability that an edge is included in H_{s+1} does indeed depend on all the edges that were included in H_s, through the computation of the approximate probabilities p̃_{s+1,e}. The sparsifier H_s is in turn generated from H_{s−1}, and so on. As a result, the probability that an edge is present in the final graph is strictly dependent on the other edges. In the following, we rigorously take into account the interactions across iterations and provide a new proof for Theorem 1 which confirms its original statement, thus proving the correctness of Kelner and Levin (2013)’s algorithm.

## 3 Proof

Step 1 (the theoretical algorithm). In order to simplify the analysis, we introduce an equivalent formulation of Alg. 1. In Alg. 3 we consider the case where the blocks contain only a single edge and the algorithm performs a resparsification for every edge over the course of the whole stream (loop at line 3). This is a wasteful approach, and more practical methods (such as Alg. 1) choose to resparsify only when H_s + Γ_{s+1} grows larger than the allotted memory budget, in order to save time without increasing the asymptotic space complexity of the algorithm. Nonetheless, the single-edge setting can be used to emulate any larger choice of block size, and therefore we only need to prove our result in this setting for it to hold in any other case.

In Alg. 3, we denote by ẑ_{s,e,j} the Bernoulli random variable that indicates whether copy j of edge e is present in the sparsifier H_s at step s. While an edge can be present in H_s only if e ≤ s, we initialize ẑ_{0,e,j} = 1 for all e and j for notational convenience. The way these variables are generated is equivalent to lines 7-14 in Alg. 2, so that at iteration s each copy of an edge already in the sparsifier is kept with probability p̃_{s,e}/p̃_{s−1,e}, while a copy of the new edge is added with probability p̃_{s,s}. For any edge e > s (i.e., not processed yet in Alg. 2) we initialize p̃_{s,e} = 1, and thus the sampling in line 10 of Alg. 3 always returns ẑ_{s,e,j} = 1. Since edges are added with weights a_e/(N p̃_{s,e}), after processing s edges, the Laplacian of the sparsifier can be written as

 L_{H_s} \;=\; \sum_{e=1}^{s}\sum_{j=1}^{N} \frac{a_e}{N\,\tilde p_{s,e}}\, \hat z_{s,e,j}\, b_e b_e^\top.

Step 2 (filtration). A convenient way to treat the indicator variables ẑ_{s,e,j} is to define them recursively as

 \hat z_{s,e,j} \;\overset{\text{def}}{=}\; \mathbb{I}\Bigl\{u_{s,e,j} \le \frac{\tilde p_{s,e}}{\tilde p_{s-1,e}}\Bigr\}\, \hat z_{s-1,e,j},

where u_{s,e,j} is a uniform random variable used to compute the coin flip, and p̃_{s,e} are the approximate probabilities computed at step s according to the definition in Algorithm 2, using the SDD solver, the sparsifier H_{s−1} and the new edge. This formulation allows us to define the stochastic properties of the variables ẑ_{s,e,j} in a convenient way. We first arrange the indices s, e, and j into a linear index {s,e,j} = ((s−1)·m + (e−1))·N + j, taking values in the range [1, m²N]. Following the structure of Alg. 3, the linearization wraps first when j hits its limit N, and then when e and finally s do the same, such that for any s, e, and j we have

 \{s,e,j\}+1=\{s,e,j+1\}, \qquad \{s,e,N\}+1=\{s,e+1,1\}, \qquad \{s,m,N\}+1=\{s+1,1,1\}.
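
Assuming the explicit linearization {s,e,j} = ((s−1)·m + (e−1))·N + j (our reading of the wrap rules above), the three identities can be verified mechanically:

```python
def lin(s, e, j, m, N):
    """Linear index of (s, e, j): j wraps fastest, then e, then s
    (1-based indices, with s and e in [1, m], j in [1, N])."""
    return ((s - 1) * m + (e - 1)) * N + j

# Exhaustively check the three wrap identities on a small grid.
m, N = 5, 3
for s in range(1, m):
    for e in range(1, m):
        for j in range(1, N):
            assert lin(s, e, j, m, N) + 1 == lin(s, e, j + 1, m, N)
        assert lin(s, e, N, m, N) + 1 == lin(s, e + 1, 1, m, N)
    assert lin(s, m, N, m, N) + 1 == lin(s + 1, 1, 1, m, N)
```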

It is easy to see that the checkpoints {s,m,N} correspond to a full iteration (Alg. 3, line 5) of the algorithm. Let F_{\{s,e,j\}} be the filtration containing all the realizations of the uniform random variables up to the step {s,e,j}, that is F_{\{s,e,j\}} = σ({u_{s′,e′,j′} : {s′,e′,j′} ≤ {s,e,j}}). Again, we notice that F_{\{s−1,m,N\}} defines the state of the algorithm after completing iteration s − 1. Since p̃_{s−1,e} and p̃_{s,e} are computed at the beginning of iteration s using the sparsifier H_{s−1}, they are fully determined by F_{\{s−1,m,N\}}. Furthermore, since F_{\{s−1,m,N\}} also defines the values of all the indicator variables ẑ_{s′,e,j} up to s′ = s − 1 for any e and j, we have that all the Bernoulli variables at iteration s are conditionally independent given F_{\{s−1,m,N\}}. In other words, for any e and j, and any {s,e′,j′} such that {s−1,m,N} ≤ {s,e′,j′} < {s,e,j}, the following random variables are equal in distribution

 \hat z_{s,e,j}\,\bigl|\,\mathcal F_{\{s,e',j'\}} \;=\; \hat z_{s,e,j}\,\bigl|\,\mathcal F_{\{s-1,m,N\}} \;\sim\; \mathcal B\Bigl(\frac{\tilde p_{s,e}}{\tilde p_{s-1,e}}\Bigr) \qquad (1)

and for any e, e′, j, and j′ such that (e, j) ≠ (e′, j′),

 \hat z_{s,e,j}\,\bigl|\,\mathcal F_{\{s-1,m,N\}} \;\perp\; \hat z_{s,e',j'}\,\bigl|\,\mathcal F_{\{s-1,m,N\}}. \qquad (2)

Step 3 (the projection error). While our objective is to show that H_s is a (1 ± ε)-sparsifier, following Kelner and Levin (2013) we study the related objective of defining an approximate projection matrix ˜P that is close in operator norm to the true projection matrix P. In fact, the two objectives are strictly related, as shown in the following proposition.

###### Proposition 1 (Kelner and Levin (2013)).

Given a graph G, let {e_1, …, e_N} be a subset of the edges of G, and let ˜P = Σ_{i=1}^N w_i v_{e_i} v_{e_i}^T be an approximate projection matrix with weights w_i. If the weights are such that

 \Bigl\|P - \sum_{i=1}^{N} w_i\, v_{e_i} v_{e_i}^\top\Bigr\|_2 \;=\; \bigl\|P - \tilde P\bigr\|_2 \;\le\; \varepsilon, \qquad (3)

then the graph H obtained by adding the edges e_i with weights w_i a_{e_i} is a (1 ± ε)-sparsifier of G.

Using the notation of Alg. 3, the approximate projection matrix is defined as

 \tilde P \;\overset{\text{def}}{=}\; \frac{1}{N} \sum_{j=1}^{N}\sum_{e=1}^{m} \frac{\hat z_{m,e,j}}{\tilde p_{m,e}}\, v_e v_e^\top,

and thus the previous proposition suggests that, to prove that the graph returned by Algorithm 1 after m steps is a sparsifier, it is sufficient to show that ‖P − ˜P‖₂ ≤ ε.

In order to study this quantity, we need to analyze how the projection error evolves over iterations. To this end, we introduce the term Ŷ_{\{s,e,j\}}, which denotes the projection error at the end of step {s,e,j} of Algorithm 3:

 \hat Y_{\{s,e,j\}} \;\overset{\text{def}}{=}\; \underbrace{\frac{1}{N}\sum_{k=1}^{e-1}\sum_{l=1}^{N}\Bigl(1-\frac{\hat z_{s,k,l}}{\tilde p_{s,k}}\Bigr)v_k v_k^\top}_{\text{edges already processed at step } s} \;+\; \underbrace{\frac{1}{N}\Biggl(\sum_{l=1}^{j}\Bigl(1-\frac{\hat z_{s,e,l}}{\tilde p_{s,e}}\Bigr)+\sum_{l=j+1}^{N}\Bigl(1-\frac{\hat z_{s-1,e,l}}{\tilde p_{s-1,e}}\Bigr)\Biggr)v_e v_e^\top}_{\text{copy } j \text{ of edge } e \text{ being processed at step } s} \;+\; \underbrace{\frac{1}{N}\sum_{k=e+1}^{m}\sum_{l=1}^{N}\Bigl(1-\frac{\hat z_{s-1,k,l}}{\tilde p_{s-1,k}}\Bigr)v_k v_k^\top}_{\text{edges still not processed at step } s}

Notice that setting ẑ_{s,e,j} = 1 and p̃_{s,e} = 1 for any e > s implies that the edges that have not been processed yet do not contribute to the projection error. Finally, notice that at the end of the algorithm we have Ŷ_{\{m,m,N\}} = P − ˜P, which quantifies the error of the output of Algorithm 1.

We are now ready to restate Theorem 1 in a more convenient way as

 P(∃s∈{1,…,m}:m∑e=1∥ˆY{s,m,N}∥≥εAs∪s∑e=1N∑j=1ˆzs,e,j≥3NBs)≤δ,

where the first event A_s refers to the case when the intermediate graph H_s fails to be a valid sparsifier, and the second event B_s refers to the case when the memory requirement is not met (i.e., too many edges are kept in the sparsifier H_s).

To prove the statement, we decompose the probability of failure as follows.

 \begin{aligned}
 \mathbb P\Bigl(\exists\, s\in\{1,\dots,m\}: \bigl\|\hat Y_{\{s,m,N\}}\bigr\|\ge\varepsilon \,\cup\, \sum_{e=1}^{s}\sum_{j=1}^{N}\hat z_{s,e,j}\ge 3N\Bigr) &= \mathbb P\Bigl(\bigcup_{s=1}^{m} A_s\cup B_s\Bigr) = \mathbb P\Bigl(\Bigl\{\bigcup_{s=1}^{m} A_s\Bigr\}\cup\Bigl\{\bigcup_{s=1}^{m} B_s\Bigr\}\Bigr)\\
 &= \mathbb P\Bigl(\bigcup_{s=1}^{m} A_s\Bigr)+\mathbb P\Bigl(\bigcup_{s=1}^{m} B_s\Bigr)-\mathbb P\Bigl(\Bigl\{\bigcup_{s=1}^{m} A_s\Bigr\}\cap\Bigl\{\bigcup_{s=1}^{m} B_s\Bigr\}\Bigr)\\
 &= \mathbb P\Bigl(\bigcup_{s=1}^{m} A_s\Bigr)+\mathbb P\Bigl(\Bigl\{\bigcup_{s=1}^{m} B_s\Bigr\}\cap\Bigl\{\bigcup_{s=1}^{m} A_s\Bigr\}^{c}\Bigr)\\
 &= \mathbb P\Bigl(\bigcup_{s=1}^{m} A_s\Bigr)+\mathbb P\Bigl(\Bigl\{\bigcup_{s=1}^{m} B_s\Bigr\}\cap\Bigl\{\bigcap_{s=1}^{m} A_s^{c}\Bigr\}\Bigr)\\
 &= \mathbb P\Bigl(\bigcup_{s=1}^{m} A_s\Bigr)+\mathbb P\Bigl(\bigcup_{s=1}^{m}\Bigl\{B_s\cap\Bigl\{\bigcap_{s'=1}^{m} A_{s'}^{c}\Bigr\}\Bigr\}\Bigr).
 \end{aligned}

Taking the last formulation and replacing the definitions of A_s and B_s, we get

 \begin{aligned}
 &\mathbb P\Bigl(\exists\, s\in\{1,\dots,m\}: \bigl\|\hat Y_{\{s,m,N\}}\bigr\|\ge\varepsilon \,\cup\, \sum_{e=1}^{s}\sum_{j=1}^{N}\hat z_{s,e,j}\ge 3N\Bigr)\\
 &\quad= \mathbb P\Bigl(\bigcup_{s=1}^{m}\Bigl\{\bigl\|\hat Y_{\{s,m,N\}}\bigr\|\ge\varepsilon\Bigr\}\Bigr) + \mathbb P\Bigl(\bigcup_{s=1}^{m}\Bigl\{\sum_{e=1}^{s}\sum_{j=1}^{N}\hat z_{s,e,j}\ge 3N \,\cap\, \bigcap_{s'=1}^{m}\Bigl\{\bigl\|\hat Y_{\{s',m,N\}}\bigr\|\le\varepsilon\Bigr\}\Bigr\}\Bigr)\\
 &\quad\le \sum_{s=1}^{m}\mathbb P\Bigl(\bigl\|\hat Y_{\{s,m,N\}}\bigr\|\ge\varepsilon\Bigr) + \sum_{s=1}^{m}\mathbb P\Bigl(\sum_{e=1}^{s}\sum_{j=1}^{N}\hat z_{s,e,j}\ge 3N \,\cap\, \Bigl\{\forall\, s'\in\{1,\dots,m\}: \bigl\|\hat Y_{\{s',m,N\}}\bigr\|\le\varepsilon\Bigr\}\Bigr)\\
 &\quad\le \sum_{s=1}^{m}\mathbb P\Bigl(\bigl\|\hat Y_{\{s,m,N\}}\bigr\|\ge\varepsilon\Bigr) + \sum_{s=1}^{m}\mathbb P\Bigl(\sum_{e=1}^{s}\sum_{j=1}^{N}\hat z_{s,e,j}\ge 3N \,\cap\, \Bigl\{\forall\, s'\in\{1,\dots,s\}: \bigl\|\hat Y_{\{s',m,N\}}\bigr\|\le\varepsilon\Bigr\}\Bigr) \qquad (4)
 \end{aligned}

Step 4 (putting everything together). In the following sections, we prove the two main lemmas of this note, where we bound the probability of returning a non-valid sparsifier and the probability of exceeding the memory budget 3N. In particular, we derive the following two results.

###### Lemma 1.
 \mathbb P\bigl(\bigl\|\hat Y_{\{t,m,N\}}\bigr\|\ge\varepsilon\bigr) \;\le\; \frac{\delta}{2m}
###### Lemma 2.
 \mathbb P\biggl(\sum_{e=1}^{t}\sum_{j=1}^{N}\hat z_{t,e,j}\ge 3N \;\cap\; \Bigl\{\forall\, s\in\{1,\dots,t\}: \bigl\|\hat Y_{\{s,m,N\}}\bigr\|\le\varepsilon\Bigr\}\biggr) \;\le\; \frac{\delta}{2m}

Combining the two lemmas into Eq. 4, we prove Thm. 1 for an algorithm that resparsifies every time a new edge arrives (single-edge blocks). Extending the proof to the case when multiple edges are stored in Γ_{s+1} before a new resparsification happens is straightforward. In the proofs of Lemma 1 and 2, the fact that an edge is unseen (not streamed yet) is represented by deterministically setting its p̃_{s,e} to 1, while the estimates for seen edges are computed based on the graph. To represent the arrival of an edge at time s, we simply start updating its p̃_{s,e}. To take into account resparsifications of large blocks instead of resparsifications of single-edge blocks, it is sufficient to start updating multiple p̃_{s,e} at the same step. The rest of the analysis remains unchanged.

## 4 Proof of Lemma 1 (bounding Ŷ_{\{s,m,N\}})

Step 1 (freezing process). We first restate a proposition on the accuracy of the effective resistance estimates.

###### Proposition 2.

At iteration s, the approximate effective resistance r̃_{s,e} of an edge e in H_{s−1} + Γ_s is computed using H_{s−1} + Γ_s and the SDD solver. If H_{s−1} is a valid (1 ± ε)-sparsifier of G_{s−1}, then r̃_{s,e} is α-accurate.

Given α-accurate effective resistances, the approximate probabilities are defined as

 \tilde p_{s,e} \;=\; \min\Bigl\{\frac{a_e\, \tilde r_{s,e}}{\alpha\,(n-1)},\; \tilde p_{s-1,e}\Bigr\}.
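Taking the minimum with the previous estimate makes the sequence p̃_{s,e} non-increasing in s, which is what makes each ratio p̃_{s,e}/p̃_{s−1,e} in the recursive definition of ẑ_{s,e,j} a valid probability. A minimal sketch, with hypothetical numbers:

```python
def update_prob(prev_p, a_e, r_tilde, alpha, n):
    """p~_{s,e} = min(a_e * r~_{s,e} / (alpha * (n - 1)), p~_{s-1,e})."""
    return min(a_e * r_tilde / (alpha * (n - 1)), prev_p)

# Unseen edges start at p = 1; new estimates can only lower the probability,
# even if the resistance estimate fluctuates upward between steps.
p = 1.0
for r_tilde in [1.2, 0.9, 1.5]:        # hypothetical resistance estimates
    new_p = update_prob(p, a_e=0.5, r_tilde=r_tilde, alpha=1.1, n=10)
    assert new_p <= p                   # never increases across steps
    p = new_p
```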

As pointed out in Proposition 2, the main issue is that whenever H_{s−1} is not a valid sparsifier of G_{s−1}, the approximate probabilities returned by the fast SDD solver are not guaranteed to be α-accurate approximations of the true probabilities p_{s,e}. While the overall algorithm may fail to generate a valid sparsifier at some intermediate iteration and yet return a valid sparsifier at the end, we consider an alternative (more pessimistic) process which is “frozen” as soon as it constructs an invalid sparsifier. Consider an alternative process based on the following definition of approximate probabilities

 \bar p_{s,e} \;\overset{\text{def}}{=}\; \begin{cases} \tilde p_{s,e} & \text{if } \bigl\|Y_{\{s-1,m,N\}}\bigr\| \le \varepsilon,\\[2pt] \bar p_{s-1,e} & \text{otherwise,} \end{cases}

where, by Proposition 1, the condition ‖Y_{\{s−1,m,N\}}‖ ≤ ε is equivalent to requiring that H_{s−1} is a valid sparsifier. This new formulation represents a variant of our algorithm that can detect whether the previous iteration failed to construct a graph that is guaranteed to be a sparsifier. When this failure happens, the whole process is frozen and continues until the end without updating anything. We then redefine the indicator variables, depending on p̄_{s,e}, as

 z_{s,e,j} \;=\; \mathbb{I}\Bigl\{u_{s,e,j}\le \frac{\bar p_{s,e}}{\bar p_{s-1,e}}\Bigr\}\, z_{s-1,e,j},

and then the projection error process based on them becomes

 Y_{\{s,e,j\}} \;=\; \frac{1}{N}\sum_{k=1}^{e-1}\sum_{l=1}^{N}\Bigl(1-\frac{z_{s,k,l}}{\bar p_{s,k}}\Bigr)v_k v_k^\top \;+\; \frac{1}{N}\Biggl(\sum_{l=1}^{j}\Bigl(1-\frac{z_{s,e,l}}{\bar p_{s,e}}\Bigr)+\sum_{l=j+1}^{N}\Bigl(1-\frac{z_{s-1,e,l}}{\bar p_{s-1,e}}\Bigr)\Biggr)v_e v_e^\top \;+\; \frac{1}{N}\sum_{k=e+1}^{m}\sum_{l=1}^{N}\Bigl(1-\frac{z_{s-1,k,l}}{\bar p_{s-1,k}}\Bigr)v_k v_k^\top.

We can see that whenever ‖Y_{\{s,m,N\}}‖ > ε at step s, for all successive steps we have p̄_{s′,e} = p̄_{s,e}; in other words, we never drop or add a new edge and never change the weights, since p̄_{s,e} stays constant. Consequently, if any of the intermediate elements of the sequence violates the condition ‖Y_{\{s,m,N\}}‖ ≤ ε, the last element Y_{\{m,m,N\}} will violate it too. For the rest, the sequence behaves exactly like Ŷ_{\{s,e,j\}}. Therefore,

 \mathbb P\bigl(\bigl\|\hat Y_{\{t,m,N\}}\bigr\|\ge\varepsilon\bigr) \;\le\; \mathbb P\bigl(\bigl\|Y_{\{t,m,N\}}\bigr\|\ge\varepsilon\bigr).
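
The recursive coupling behind z_{s,e,j} telescopes: a copy survives all steps up to s with probability (p̄_{1,e}/p̄_{0,e})·(p̄_{2,e}/p̄_{1,e})⋯(p̄_{s,e}/p̄_{s−1,e}) = p̄_{s,e}, so the marginals are exactly the intended inclusion probabilities. A quick Monte Carlo sketch, using a hypothetical non-increasing probability sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
p_bar = [1.0, 0.8, 0.5, 0.3]           # hypothetical non-increasing sequence
trials = 200_000
z = np.ones(trials, dtype=bool)         # z_0 = 1 for every simulated copy
for s in range(1, len(p_bar)):
    u = rng.random(trials)              # fresh uniforms u_{s,e,j}
    z &= u <= p_bar[s] / p_bar[s - 1]   # survive with prob p_s / p_{s-1}
# The survival ratios telescope, so P(z = 1) should be close to p_bar[-1].
print(z.mean())
```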

Step 2 (martingale process). We now proceed by studying the process Y_{\{s,e,j\}} and showing that it is a bounded martingale. The associated difference process is defined as X_{\{s,e,j\}} = Y_{\{s,e,j\}} − Y_{\{s,e,j\}−1}, that is

 X_{\{s,e,j\}} \;=\; \frac{1}{N}\Bigl(\frac{z_{s-1,e,j}}{\bar p_{s-1,e}} - \frac{z_{s,e,j}}{\bar p_{s,e}}\Bigr)\, v_e v_e^\top.

In order to show that Y_{\{s,e,j\}} is a martingale, it is sufficient to verify the following (equivalent) conditions

 \mathbb E\bigl[X_{\{s,e,j\}} \,\bigl|\, \mathcal F_{\{s,e,j\}-1}\bigr] = 0 \quad\Longleftrightarrow\quad \mathbb E\bigl[Y_{\{s,e,j\}} \,\bigl|\, \mathcal F_{\{s,e,j\}-1}\bigr] = Y_{\{s,e,j\}-1}.

We begin by inspecting the conditional random variable X_{\{s,e,j\}} | F_{\{s,e,j\}−1}. Given the definition of X_{\{s,e,j\}}, the conditioning on F_{\{s,e,j\}−1} determines the values of z_{s−1,e,j} and of the approximate probabilities p̄_{s−1,e} and p̄_{s,e}. In fact, remember that these quantities are fully determined by the realizations of the uniform random variables contained in F_{\{s,e,j\}−1}. As a result, the only stochastic quantity in X_{\{s,e,j\}} is the variable u_{s,e,j}. Specifically, if z_{s−1,e,j} = 0, then we have z_{s,e,j} = 0 and X_{\{s,e,j\}} = 0 (the process is stopped), and the martingale requirement is trivially satisfied. On the other hand, if z_{s−1,e,j} = 1 we have

 \begin{aligned}
 \mathbb E_{u_{s,e,j}}\Bigl[\frac{1}{N}\Bigl(\frac{z_{s-1,e,j}}{\bar p_{s-1,e}}-\frac{z_{s,e,j}}{\bar p_{s,e}}\Bigr)v_e v_e^\top \,\Bigl|\, \mathcal F_{\{s,e,j\}-1}\Bigr]
 &= \frac{1}{N}\Bigl(\frac{z_{s-1,e,j}}{\bar p_{s-1,e}}-\frac{z_{s-1,e,j}}{\bar p_{s,e}}\,\mathbb E\Bigl[\mathbb{I}\Bigl\{u_{s,e,j}\le\frac{\bar p_{s,e}}{\bar p_{s-1,e}}\Bigr\}\,\Bigl|\,\mathcal F_{\{s,e,j\}-1}\Bigr]\Bigr)v_e v_e^\top\\
 &= \frac{1}{N}\Bigl(\frac{z_{s-1,e,j}}{\bar p_{s-1,e}}-\frac{z_{s-1,e,j}}{\bar p_{s,e}}\cdot\frac{\bar p_{s,e}}{\bar p_{s-1,e}}\Bigr)v_e v_e^\top \;=\; 0,
 \end{aligned}

where we use the recursive definition of z_{s,e,j} and the fact that u_{s,e,j} is a uniform random variable on [0, 1]. This proves that Y_{\{s,e,j\}} is indeed a martingale. We now compute an upper bound R on the norm of the values of the difference process as

 \bigl\|X_{\{s,e,j\}}\bigr\| \;\le\; \frac{1}{N}\,\frac{a_e r_{m,e}}{\bar p_{s,e}} \;\le\; \frac{1}{N}\,\frac{\alpha^2 a_e r_{m,e}}{p_{s,e}} \;\le\; \frac{1}{N}\,\frac{\alpha^2 a_e r_{m,e}}{p_{m,e}} \;=\; \frac{1}{N}\,\frac{\alpha^2 (n-1)\, a_e r_{m,e}}{a_e r_{m,e}} \;=\; \frac{\alpha^2 (n-1)}{N} \;\overset{\text{def}}{=}\; R,

where we use the fact that if, at step s, the process has not been frozen, then the approximate