Benczúr and Karger introduced the notion of a cut sparsifier: a weighted graph $H$ is a $(1 \pm \epsilon)$ cut sparsifier of a graph $G$ if, for every cut $(S, V \setminus S)$ of the set of vertices, the weighted number of cut edges in $H$ is the same as the number of cut edges in $G$, up to multiplicative error $\epsilon$, that is,
$$ (1-\epsilon) \cdot e_G(S) \;\le\; e_H(S) \;\le\; (1+\epsilon) \cdot e_G(S) \qquad \forall S \subseteq V, $$
where $e_H(S)$ denotes the weighted number of edges in $H$ leaving the set $S$. A stronger notion, introduced by Spielman and Teng, is that of a spectral sparsifier: according to this notion, a weighted graph $H$ is a $(1 \pm \epsilon)$ spectral sparsifier of a graph $G$ if
$$ (1-\epsilon) \cdot x^T L_G x \;\le\; x^T L_H x \;\le\; (1+\epsilon) \cdot x^T L_G x \qquad \forall x \in \mathbb{R}^V, \tag{2} $$
where $L_G$ and $L_H$ denote the Laplacian matrices of $G$ and $H$. Cut sparsification is the special case of (2) in which $x$ is restricted to be the 0/1 indicator vector of a set $S$. A more compact way to express (2) is as the PSD inequality
$$ (1-\epsilon) \cdot L_G \;\preceq\; L_H \;\preceq\; (1+\epsilon) \cdot L_G. $$
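As a quick numeric sanity check (a toy example, not from the paper), the Laplacian quadratic form evaluated at the indicator vector of a set indeed counts the edges crossing the corresponding cut:

```python
import numpy as np

# For the Laplacian L = D - A of an unweighted graph, the quadratic form
# 1_S^T L 1_S equals the number of edges leaving S.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # a small 4-vertex graph
n = 4
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] += 1
    A[v, u] += 1
L = np.diag(A.sum(axis=1)) - A

S = [0, 1]
x = np.zeros(n)
x[S] = 1.0
quad = x @ L @ x                                   # quadratic form at 1_S
cut = sum(1 for u, v in edges if (u in S) != (v in S))
assert quad == cut                                 # both count crossing edges
print(quad, cut)
```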
Batson, Spielman and Srivastava show that, for every graph, a $(1\pm\epsilon)$ spectral sparsifier (and hence also a $(1\pm\epsilon)$ cut sparsifier) can be constructed in polynomial time with $O(n/\epsilon^2)$ weighted edges, which is best possible up to the constant in the big-Oh. Sparsifiers have several applications to speeding up graph algorithms.
For some graphs $G$, for example the "barbell" graph (which consists of two disjoint cliques joined by a single edge), it is necessary for a non-trivial sparsifier of $G$ to have edges of different weights. This has motivated the question of whether there are weaker, but still interesting, notions of sparsification that can be achieved, for all graphs, using sparsifiers that are "unweighted" in the sense that all edges have the same weight.
Is a non-trivial notion of unweighted sparsification possible for all graphs?
Results on unweighted sparsification have focused on bounding the multiplicative error in such cases, allowing it to be superconstant [2, 1]. For graphs such as the barbell example, however, one necessarily gets very poor bounds. But is there an alternative notion for which one can get arbitrarily good approximations on all graphs using a linear number of edges?
If one restricts this question from all graphs to selected classes of graphs, then a number of interesting results are known, and some major open questions arise.
If $G$ is a $d$-regular graph such that every edge has effective resistance $O(1/d)$, the Marcus-Spielman-Srivastava proof of the Kadison-Singer conjecture (henceforth, we will refer to this result as the MSS Theorem) implies that $G$ can be partitioned into almost-regular unweighted spectral sparsifiers with error $\epsilon$ and average degree $O(1/\epsilon^2)$. An interesting class of such graphs is the class of edge-transitive graphs, such as the hypercube.
Another interesting class of graphs all of whose edges have effective resistance $O(1/d)$ is the class of $d$-regular expanders of constant normalized expansion. Before the MSS Theorem, Frieze and Molloy proved that such graphs can be partitioned into unweighted almost-regular graphs of small average degree whose normalized edge expansion is close to that of the original graph. They also show how to construct such a partition in randomized polynomial time under an additional small-set expansion assumption on $G$. Becchetti et al. present a randomized linear time algorithm that, given a dense regular expander $G$ of degree $\Omega(n)$, finds a constant-degree edge-induced expander in $G$. While both works find sparse expanders inside dense expanders, the work of Frieze and Molloy does not produce constant-degree graphs and the work of Becchetti et al. only applies to very dense graphs. Furthermore, neither work guarantees that one ends up with a sparse graph that is a good sparsifier of the original one.
Is there a polynomial time construction of the unweighted spectral sparsifiers of expanders whose existence follows from the Marcus-Spielman-Srivastava theorem?
Notions of cut sparsifiers and spectral sparsifiers have been defined for hypergraphs, generalizing the analogous definitions for graphs. In a hypergraph $H = (V, E)$, a hyperedge $e$ is cut by a partition $(S, V \setminus S)$ of the vertices if $e$ intersects both $S$ and $V \setminus S$. As for graphs, we can define $e_H(S)$ to be the (weighted, if applicable) number of hyperedges in $H$ that are cut by $(S, V \setminus S)$. As before, a weighted subset of hyperedges defines a hypergraph cut sparsifier with error $\epsilon$ if it preserves $e_H(S)$, for every $S$, up to a multiplicative factor $(1 \pm \epsilon)$.
Kogan and Krauthgamer show how to construct such a (weighted) sparsifier in randomized polynomial time using $O(n(r + \log n)/\epsilon^2)$ hyperedges, where $r$ is the maximum size of the hyperedges, which is also called the rank of the hypergraph.
In order to define a notion of spectral sparsification, we associate to a hypergraph $H = (V, E)$ the following analog of the Laplacian quadratic form, namely a function $q_H : \mathbb{R}^V \to \mathbb{R}$ such that
$$ q_H(x) \;=\; \sum_{e \in E} w_e \cdot \max_{u, v \in e} (x_u - x_v)^2, $$
where $w_e$ is the weight (if applicable) of hyperedge $e$. Note that with this definition we have that if $x = \mathbf{1}_S$ for some subset $S$ of vertices then $q_H(x) = e_H(S)$. Following Soma and Yoshida, we say that a weighted hypergraph $\tilde H$ is a spectral sparsifier with error $\epsilon$ of $H$ if for every $x \in \mathbb{R}^V$ we have
$$ (1-\epsilon) \cdot q_H(x) \;\le\; q_{\tilde H}(x) \;\le\; (1+\epsilon) \cdot q_H(x). $$
Soma and Yoshida provide a randomized polynomial time construction of such sparsifiers, using $O(n^3 \log n / \epsilon^2)$ hyperedges.
Is it possible, for every hypergraph, to construct a weighted spectral sparsifier with a number of hyperedges that is nearly linear in $n$?
As in the case of graphs, it is also natural to raise the following question.
Is a non-trivial notion of unweighted sparsification possible for all hypergraphs?
We provide a positive answer to all the above questions.
1.1 Our Results
1.1.1 Sparsification with additive error
Oveis-Gharan suggested the following weakened definition of sparsification: if $G = (V, E)$ is $d$-regular, we say that an unweighted graph $H = (V, F)$, with $F \subseteq E$, is an additive cut sparsifier of $G$ with error $\epsilon$ if for every $S \subseteq V$ we have
$$ \bigl| c \cdot e_H(S) - e_G(S) \bigr| \;\le\; \frac{\epsilon}{2} \cdot d \cdot \min\{ |S|, |V \setminus S| \}, $$
where $c = |E| / |F|$. Note that this (up to a constant factor change in the error parameter $\epsilon$) is equivalent to the standard notion if $G$ has constant normalized edge expansion, because $e_G(S)$ and $d \cdot \min\{|S|, |V \setminus S|\}$ will then be within a constant factor of each other. On non-expanding graphs, however, this definition allows higher relative error on sparse cuts and a tighter control on expanding cuts. (The factor of 2 has no particular meaning and it is just there for consistency with the definition that we give next for non-regular graphs.)
For non-regular graphs $G$, we say that $H = (V, F)$ is an additive cut sparsifier of $G$ with error $\epsilon$ if for every $S \subseteq V$ we have
$$ \bigl| c \cdot e_H(S) - e_G(S) \bigr| \;\le\; \frac{\epsilon}{4} \bigl( \bar d \, |S| + \mathrm{vol}_G(S) \bigr), $$
where $c = |E|/|F|$, $\bar d$ is the average degree of $G$, and $\mathrm{vol}_G(S)$ is the volume of $S$, that is, the sum of the degrees of the vertices in $S$. It can be shown that both terms are necessary if one wants a definition of unweighted sparsification that is applicable to all graphs.
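The following toy computation illustrates the notion (the normalization and constants follow our statement above, which is a reconstruction, and the graph and the choice of subset are arbitrary): we keep a uniformly random unweighted subset of the edges of a small "barbell-like" graph and measure the worst additive cut error relative to the term $\bar d |S| + \mathrm{vol}(S)$.

```python
import itertools
import random

# Two triangles joined by one edge: a tiny graph with a sparse cut.
random.seed(0)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
deg = [0] * n
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
dbar = 2 * len(edges) / n          # average degree

F = random.sample(edges, 4)        # keep 4 of the 7 edges, all weight 1
c = len(edges) / len(F)            # uniform rescaling factor c = |E|/|F|

def cut(es, S):
    return sum(1 for u, v in es if (u in S) != (v in S))

worst = 0.0
for k in range(1, n):
    for T in itertools.combinations(range(n), k):
        S = set(T)
        vol = sum(deg[v] for v in S)
        err = abs(c * cut(F, S) - cut(edges, S)) / (dbar * len(S) + vol)
        worst = max(worst, err)
print("worst normalized additive cut error:", round(worst, 3))
```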
This notion has a natural spectral analog, which we state directly in the more general non-regular form.
Note, again, that if $G$ is a regular expander then this definition is equivalent to the standard definition of spectral sparsifier.
In a hypergraph, the degree of a vertex is the number of hyperedges it belongs to, and the volume of a set of vertices is the sum of the degrees of the vertices that belong to it. With these definitions in mind, the notion of additive graph sparsifier immediately generalizes to hypergraphs.
1.1.2 New Graph Sparsification Constructions
Our first result is a deterministic polynomial time construction which achieves a weak form of unweighted additive sparsification.
Theorem 1.1 (Deterministic Polynomial Time Construction).
Given a graph $G = (V, E)$ and a parameter $\epsilon > 0$, in deterministic polynomial time we can find a subset $F \subseteq E$ of size $O(\epsilon^{-2}\, n \log n)$ such that, if we let $L_G$ be the Laplacian of $G$, $L_H$ be the Laplacian of the graph $H = (V, F)$, $\bar d$ be the average degree of $G$, $D$ be the diagonal matrix of the degrees of $G$, and $c = |E|/|F|$, we have
$$ -\,\epsilon \, (\bar d \, I + D) \;\preceq\; c\, L_H - L_G \;\preceq\; \epsilon \, (\bar d \, I + D). \tag{4} $$
Note, in particular, that we get that for every set $S$ of vertices and every vertex $v$ we have
$$ \bigl| c \cdot e_H(S) - e_G(S) \bigr| \;\le\; \epsilon \bigl( \bar d\,|S| + \mathrm{vol}_G(S) \bigr) \qquad \text{and} \qquad \bigl| c \cdot \deg_H(v) - \deg_G(v) \bigr| \;\le\; \epsilon \bigl( \bar d + \deg_G(v) \bigr). $$
The first inequality follows by computing the quadratic forms of (4) with the indicator vector $\mathbf{1}_S$ of $S$, and noting that $\mathbf{1}_S^T L_G \mathbf{1}_S = e_G(S)$, that $\mathbf{1}_S^T L_H \mathbf{1}_S = e_H(S)$, that
$$ \mathbf{1}_S^T M \, \mathbf{1}_S \;=\; \sum_{v \in S} M_{vv} $$
for every diagonal matrix $M$, and that $\sum_{v \in S} D_{vv} = \mathrm{vol}_G(S)$. The second inequality follows by computing the quadratic forms of (4) with the indicator vector $\mathbf{1}_{\{v\}}$ of $\{v\}$, and noting that $\mathbf{1}_{\{v\}}^T L_G \mathbf{1}_{\{v\}} = \deg_G(v)$, $\mathbf{1}_{\{v\}}^T L_H \mathbf{1}_{\{v\}} = \deg_H(v)$, and $\mathbf{1}_{\{v\}}^T (\bar d\, I + D) \mathbf{1}_{\{v\}} = \bar d + \deg_G(v)$.
Our proof is based on the online convex optimization techniques of Allen-Zhu, Liao and Orecchia [1]. The construction of [1] involves weights for two reasons: one reason is a change of basis that maps $L_G$ to the identity, a step that is not necessary in our setting and that could also be avoided in their setting if $G$ is a graph all of whose edges have bounded effective resistance. The second reason is more technical, and it is to avoid blowing up the "width" of the online game that they define. The second issue comes up when one wants to prove the lower bound in (4), but is not a problem for the upper bound.
To sidestep this problem, we set the goals of proving the bounds
$$ c\, L_H \;\preceq\; L_G + \epsilon\,(\bar d\, I + D) \qquad \text{and} \qquad c\, Q_H \;\preceq\; Q_G + \epsilon\,(\bar d\, I + D), $$
where $Q_G$ denotes the signless Laplacian of a graph $G$, defined as $Q_G = D_G + A_G$, with $A_G$ the adjacency matrix of $G$ and $D_G$ the diagonal matrix of its degrees. Note that the above PSD inequalities are equivalent to (4).
The reasons why, when our goal is the PSD inequalities above, we are able to control the width without scaling (and without weighing the edges) are quite technical, and we defer further discussion to Section 3.
Our next result is a probabilistic construction of sparsifiers with additive error matching the Oveis-Gharan definition.
Theorem 1.2 (Probabilistic Polynomial Time Construction).
Given an $n$-vertex graph $G = (V, E)$ and a parameter $\epsilon > 0$, in probabilistic polynomial time we can find a subset $F \subseteq E$ of size $\tilde O(n/\epsilon^2)$ (where $\tilde O$ hides lower-order factors) such that, if we let $L_G$ be the Laplacian of $G$, $L_H$ be the Laplacian of the graph $H = (V, F)$, $\bar d$ be the average degree of $G$, $D$ be the diagonal matrix of the degrees of $G$, and $c = |E|/|F|$, we have
$$ -\,\epsilon \, (\bar d\, I + D) \;\preceq\; c\, L_H - L_G \;\preceq\; \epsilon \, (\bar d\, I + D). $$
When we apply the above result to a $d$-regular expander $G$, we obtain a graph $H$ whose average (and maximum) degree is $\tilde O(1/\epsilon^2)$ and which is itself a good expander. More precisely, if $G$ has normalized edge expansion $\phi$ and $H$ is as above, then the normalized edge expansion of $H$ is at least about $\phi - O(\epsilon)$. Recall that Frieze and Molloy can find an $H$ as above, but with a larger degree bound. Furthermore, if $G$ is a $d$-regular expander of normalized edge expansion $\phi$, we have (with some abuse of notation in (7), because the lower bound in (7) only holds on the space orthogonal to $\mathbf{1}$) that
$$ \frac{\phi^2}{2}\, d \cdot I \;\preceq\; L_G \;\preceq\; 2\, d \cdot I, \tag{7} $$
and so the unweighted additive sparsifier of $G$ given by the above theorem is also a spectral sparsifier in the standard multiplicative sense, with error $O(\epsilon/\phi^2)$. This answers Questions 1 and 2 of the previous section.
We briefly discuss the techniques in the proof. Following Frieze and Molloy and Bilu and Linial, we apply the Lovász Local Lemma (LLL) to construct an additive cut sparsifier. One difficulty with this approach is that one has to verify that the sparsifier approximates each of the exponentially many cuts. Indeed, if one defines a "bad" event for each one of these cuts, there are too many dependent events to successfully apply LLL. A key insight in  is that it is sufficient to verify those cuts $(S, V \setminus S)$ where $S$ induces a connected subgraph. This makes a big difference in graphs of maximal degree $\Delta$: for a vertex $v$, there are $2^{n-1}$ subsets of vertices containing $v$, whereas one can prove that there are at most $(e\Delta)^s$ subsets of size $s$ that contain $v$ and induce a connected subgraph. This allows one to manage the exponentially many events and get almost optimal results with LLL. Indeed, we obtain a close to optimal average degree $\tilde O(1/\epsilon^2)$. This improves upon the average degree bound in . We achieve this by an iterative procedure that intuitively halves the number of edges, instead of sparsifying the graph "in one go."
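The counting claim above is easy to verify numerically on a small bounded-degree graph (a toy check, not part of the proof): we enumerate the size-$s$ subsets containing a fixed vertex, keep the connected ones, and compare with the stated $(e\Delta)^s$ bound.

```python
import itertools
import math

# A 3-regular graph on 10 vertices: a cycle plus "diameter" chords.
n = 10
edges = {(i, (i + 1) % n) for i in range(n)} | {(i, (i + 5) % n) for i in range(5)}
adj = {v: set() for v in range(n)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
Delta = max(len(adj[v]) for v in range(n))   # maximum degree (= 3 here)

def connected(S):
    S = set(S)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & S:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == S

v, s = 0, 4
count = sum(1 for S in itertools.combinations(range(n), s)
            if v in S and connected(S))
bound = (math.e * Delta) ** s                # the (e*Delta)^s bound
print(count, "connected size-4 subsets containing v, vs bound", round(bound))
```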
A second challenge is to give an efficient probabilistic polynomial time algorithm for finding the sparsifier. To apply the constructive version of LLL in the presence of exponentially many bad events, one needs to find a subset of bad events of polynomial size such that the probability that any other bad event is true is negligible. We show that this can be achieved by selecting the subset of events corresponding to cuts $(S, V \setminus S)$ such that $S$ induces a connected graph and $|S| = O(\log n)$. This gives us an efficient probabilistic algorithm for finding a cut sparsifier, which we also generalize to hypergraphs (as we state in the next section). For graphs, we then adapt the techniques of Bilu and Linial to go from a cut sparsifier to a spectral one. To do so we need to consider more bad events in the application of LLL than needed by Bilu and Linial, who worked with "signings" of the adjacency matrix. Specifically, in addition to the events that they considered, we need to also bound the degrees of the vertices.
1.1.3 New Hypergraph Sparsification Constructions
Theorem 1.3 (Hypergraph cut sparsification with additive error).
Given an $n$-vertex hypergraph $H = (V, E)$ of rank $r$ and a parameter $\epsilon > 0$, in probabilistic polynomial time we can find a subset $F \subseteq E$ of size $O\!\left( \frac{n}{\epsilon^2 r} \right)$ such that, if we let $\bar d$ be the average degree of $H$, and $c = |E|/|F|$, the following holds with high probability:
$$ \bigl| c \cdot e_F(S) - e_H(S) \bigr| \;\le\; \epsilon \, \bigl( \bar d \, |S| + \mathrm{vol}(S) \bigr) \qquad \forall S \subseteq V. $$
The proof follows the same approach as the first part of our proof of Theorem 1.2, and in fact we present directly the proof for hypergraphs, leaving the result for graphs as a corollary. It might seem strange that the number of hyperedges in our sparsifier is, for fixed $\epsilon$, decreasing in the rank $r$, since, intuitively, the sparsification problem should only become harder when $r$ grows. The reason is that, even in a regular hypergraph, $\mathrm{vol}(S)$ overestimates the number of hyperedges incident on $S$ by up to a factor of $r$, and so, in order to have a non-trivial guarantee, one has to set $\epsilon < 1/r$.
Theorem 1.4 (Hypergraph sparsification with multiplicative error).
There is a randomized polynomial time algorithm that, given a hypergraph of rank $r$, finds a weighted spectral sparsifier with multiplicative error $\epsilon$ having $O(\epsilon^{-2}\, n\, r^3 \log n)$ hyperedges.
The above result should be compared with the $O(n^3 \log n/\epsilon^2)$ hyperedges of the construction of Soma and Yoshida. Our approach is to provide a "hypergraph analog" of the spectral graph sparsifier construction of Spielman and Srivastava. Given $H$, we construct an associated graph $G$ (in which each hyperedge of $H$ is replaced by a clique in $G$), we compute the effective resistances of the edges of $G$, and we use them to associate a notion of "effective resistance" to the hyperedges of $H$. Then we sample from the set of hyperedges of $H$ by letting the sampling probability of each hyperedge be proportional to its "effective resistance", and we weigh the sampled hyperedges so that the expected weight of each hyperedge in the sample is the same as its weight in $H$. At this point, to bound the error, Spielman and Srivastava complete the proof by applying a matrix concentration bound for the spectral norm of sums of random matrices. For hypergraphs, we would like to have a similar concentration bound on the error, given by
$$ \sup_{x \,:\, \|x\| = 1} \ \Bigl|\ \sum_{e \in E} (X_e - w_e) \cdot \max_{u, v \in e} (x_u - x_v)^2 \ \Bigr|, \tag{9} $$
where $X_e$ is a random variable that is 0 if the hyperedge $e$ is not selected and is its weight in the sparsifier if it is selected, with things set up so that $X_e - w_e$ has expectation zero. (Actually, this would only lead to a sparsifier with additive error: to achieve multiplicative error we have to study an expression such as the one above but after a change of basis defined in terms of the associated graph. For simplicity we will ignore this point in this overview.)
However, unlike in the graph case, the expression in (9) does not correspond to the spectral norm, or any other standard linear-algebraic norm, due to the $\max$ term, and the key difficulty in all previous approaches to the problem was to get suitable upper bounds on this quantity. Our main idea is to view the quantity inside the supremum as a random process indexed by the set of all unit vectors $x$, and to directly argue about its supremum over all such $x$ using the technique of generic chaining. In particular, we relate the metric given by the sub-gaussian norm of the increments of the process to another, suitably defined, Gaussian random process on the associated graph of $H$, which is much easier to analyze. This allows us to relate the bound on the supremum to a related expression on the graph $G$, for which we can use known matrix concentration bounds.
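The sampling scheme described above can be sketched numerically as follows. One detail is left open by this overview, namely how the edge resistances of the associated graph are combined into a hyperedge "resistance"; summing over the clique edges of each hyperedge is a hypothetical choice made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny hypergraph; the last hyperedge has size 2.
hyperedges = [(0, 1, 2), (2, 3, 4), (4, 5, 0), (1, 3, 5), (0, 3)]
n = 6

# Associated graph G: replace each hyperedge by a clique, accumulate Laplacian.
L = np.zeros((n, n))
cliques = []
for e in hyperedges:
    pairs = [(u, v) for i, u in enumerate(e) for v in e[i + 1:]]
    cliques.append(pairs)
    for u, v in pairs:
        L[u, u] += 1; L[v, v] += 1; L[u, v] -= 1; L[v, u] -= 1

Lp = np.linalg.pinv(L)          # pseudoinverse of the Laplacian of G

def reff(u, v):                 # effective resistance between u and v in G
    b = np.zeros(n)
    b[u], b[v] = 1.0, -1.0
    return b @ Lp @ b

# Hyperedge "resistance" (hypothetical definition, see the caveat above).
R = np.array([sum(reff(u, v) for u, v in pairs) for pairs in cliques])
p = np.minimum(1.0, 4 * R / R.sum())   # ~4 hyperedges kept in expectation

# Keep hyperedge e with probability p_e and weight 1/p_e, so that the
# expected weight of each hyperedge in the sample equals 1.
sample = [(e, 1.0 / pe) for e, pe in zip(hyperedges, p) if rng.random() < pe]
print("kept", len(sample), "of", len(hyperedges), "hyperedges")
```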
2.1 Linear Algebra Preliminaries
In this paper all matrices will have real-valued entries.
A matrix $M$ is Positive Semidefinite (abbreviated PSD and written $M \succeq 0$) if it is symmetric and all its eigenvalues are non-negative. Equivalently, $M$ is PSD if and only if
$$ x^T M x \;\ge\; 0 \qquad \forall x, $$
that is, the quadratic form of $M$ is always non-negative. The trace of a matrix is the sum of its diagonal entries. For a symmetric matrix, the trace is equal to the sum of its eigenvalues, counted with multiplicities. A density matrix is a PSD matrix of trace one. The operator norm of a matrix $M$ is
$$ \| M \| \;=\; \max_{x \neq 0} \frac{\| M x \|}{\| x \|}. $$
If $M$ is symmetric, then the above is the largest absolute value of the eigenvalues of $M$, and we also refer to it as the spectral norm or spectral radius of the matrix.
If $A$ and $B$ are matrices of the same size, then their Frobenius inner product is defined as
$$ A \bullet B \;=\; \sum_{i, j} A_{ij} B_{ij} \;=\; \mathrm{tr}(A^T B), $$
and we will also sometimes denote it as $\langle A, B \rangle$. Note that if $A$ is a symmetric matrix we have
$$ \langle A, B \rangle \;=\; \mathrm{tr}(A B). $$
If $M$ is a symmetric matrix with spectral decomposition
$$ M \;=\; \sum_i \lambda_i \, v_i v_i^T, $$
then the "absolute value" of $M$ is the PSD matrix
$$ |M| \;=\; \sum_i |\lambda_i| \, v_i v_i^T. $$
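A quick numeric check of this definition (a standalone illustration): $|M|$ is PSD and satisfies $-|M| \preceq M \preceq |M|$ in the PSD order.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
M = (B + B.T) / 2                       # a random symmetric matrix

lam, V = np.linalg.eigh(M)              # spectral decomposition M = V diag(lam) V^T
absM = V @ np.diag(np.abs(lam)) @ V.T   # |M| = V diag(|lam|) V^T

# |M|, |M| - M and |M| + M must all be PSD (up to roundoff).
for P in (absM, absM - M, absM + M):
    assert np.linalg.eigvalsh(P).min() > -1e-9
print("ok")
```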
2.2 Reduction to bounded-degree case
Consider the following construction: given a graph $G = (V, E)$ of average degree $\bar d$, construct a new graph $G' = (V', E')$ such that
To each node $v \in V$ there corresponds a cloud of $\max\{1, \lceil \deg(v)/\bar d \rceil\}$ nodes in $G'$.
To each edge $(u, v) \in E$ there corresponds an edge in $G'$ between the cloud of $u$ and the cloud of $v$.
Each vertex in $G'$ has degree at most $\lceil \bar d \rceil$.
A construction satisfying the above properties can be realized by replacing the vertices $v$ of $G$, in sequence, by a cloud as required, and then replacing in $E$ the edges incident to $v$ with edges incident to the vertices in the cloud of $v$, in a balanced way.
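A minimal sketch of this reduction follows. The exact cloud sizes and the balancing rule are our reconstruction (round-robin over each cloud), chosen so that the stated degree bound holds:

```python
import math
from collections import defaultdict

def reduce_degree(n, edges):
    """Replace each vertex by a cloud so every new vertex has degree <= ceil(dbar)."""
    m = len(edges)
    dbar = 2 * m / n                       # average degree of the input graph
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # cloud of v has ceil(deg(v)/dbar) nodes, labeled (v, 0), (v, 1), ...
    cloud_size = {v: max(1, math.ceil(deg[v] / dbar)) for v in range(n)}
    used = defaultdict(int)                # edges already attached at v
    new_edges = []
    for u, v in edges:                     # spread edges round-robin over clouds
        cu = (u, used[u] % cloud_size[u])
        cv = (v, used[v] % cloud_size[v])
        used[u] += 1
        used[v] += 1
        new_edges.append((cu, cv))
    return new_edges, dbar

# Star graph: one vertex of degree 5, average degree 10/6.
edges = [(0, i) for i in range(1, 6)]
new_edges, dbar = reduce_degree(6, edges)
deg2 = defaultdict(int)
for a, b in new_edges:
    deg2[a] += 1
    deg2[b] += 1
assert max(deg2.values()) <= math.ceil(dbar)
print("max degree after reduction:", max(deg2.values()), "<=", math.ceil(dbar))
```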
Now suppose that $F'$ is a subset of the edges of $G'$ and that $F$ is the set of edges of $G$ corresponding to the edges of $F'$. Let $H'$ be the graph $(V', F')$ and let $H = (V, F)$. Let $x \in \mathbb{R}^V$ be any vector, and define $x' \in \mathbb{R}^{V'}$ to be the vector such that $x'_w = x_v$ if $w$ is in the cloud of $v$. Then we observe that the quadratic forms of the Laplacians are preserved, that is, $x'^T L_{G'} x' = x^T L_G x$ and $x'^T L_{H'} x' = x^T L_H x$, and that the error term for $G'$ is dominated by the error term for $G$. The last statement is the only non-trivial one, and it can be verified by expanding the left-hand side and using the bounds on the cloud sizes and on the degrees of $G'$.
This means that we can start from an arbitrary graph $G$, construct $G'$ as above, find an unweighted sparsifier $H'$ of $G'$, and then obtain a set $F$ of edges of $G$ such that $H = (V, F)$ is an unweighted sparsifier for $G$, with the property that any bound on the quality of the sparsification that depends on the maximum degree of $G'$ becomes a bound in terms of $\lceil \bar d \rceil$ (and we can drop the ceiling at the cost of a constant factor in the error).
If $H = (V, E)$ is a hypergraph we can similarly construct a hypergraph $H' = (V', E')$ such that
To each node $v \in V$ there corresponds a cloud of $\max\{1, \lceil \deg(v)/\bar d \rceil\}$ nodes in $H'$.
To each hyperedge $e \in E$ there corresponds a hyperedge in $H'$ containing one vertex from the cloud of each vertex of $e$.
Each vertex in $H'$ has degree at most $\lceil \bar d \rceil$.
Similarly to the graph case, for every set $S \subseteq V$ we can define a set $S' \subseteq V'$ (the union of the clouds of the vertices in $S$) and for every set $F' \subseteq E'$ of hyperedges we can define a set $F \subseteq E$ of hyperedges of the same cardinality such that the cut sizes are preserved, that is, such that $e_{F'}(S') = e_F(S)$.
We also note that, in both constructions, the maximum degree and the average degree of the new graph (or hypergraph) are within a constant factor of each other.
3 Deterministic Construction
In this section we use the online convex optimization approach of Allen-Zhu, Liao and Orecchia  to construct a weak form of unweighted additive spectral sparsifiers, and we prove Theorem 1.1. Given the reduction described in Section 2.2, it is enough to prove the following theorem.
Theorem 3.1.
There is a deterministic polynomial time algorithm that, given a graph $G = (V, E)$ of maximum degree $d$ and a parameter $\epsilon > 0$, outputs a multiset $F$ of $O(\epsilon^{-2}\, n \log n)$ edges such that the graph $H = (V, F)$ satisfies
$$ c\, L_H \;\preceq\; L_G + O(\epsilon)\, d \cdot I \qquad \text{and} \qquad c\, Q_H \;\preceq\; Q_G + O(\epsilon)\, d \cdot I, $$
where $c = |E|/|F|$, $L$ denotes the Laplacian, and $Q$ denotes the signless Laplacian.
We are interested in the following online optimization setting: at each time $t = 1, \ldots, T$, an algorithm comes up with a solution $X_t$, which is an $n \times n$ density matrix, and an adversary comes up with a cost matrix $M_t$, which is an $n \times n$ matrix, and the algorithm receives a payoff $\langle X_t, M_t \rangle$. The algorithm comes up with $X_t$ based on knowledge of $M_1, \ldots, M_{t-1}$ and of $X_1, \ldots, X_{t-1}$, while the adversary comes up with $M_t$ based on $X_1, \ldots, X_t$ and on $M_1, \ldots, M_{t-1}$. The goal of the algorithm is to maximize the payoff. After running this game for $T$ steps, one defines the regret of the algorithm as
$$ R_T \;=\; \max_{X \ \text{density matrix}} \sum_{t=1}^{T} \langle X, M_t \rangle \;-\; \sum_{t=1}^{T} \langle X_t, M_t \rangle. $$
Theorem 3.2 (Allen-Zhu, Liao, Orecchia ).
There is a deterministic polynomial time algorithm that, given a parameter $\eta > 0$, after running for $T$ steps against an adversary that provides cost matrices restricted as described below, achieves a regret bound
$$ R_T \;\le\; \eta \sum_{t=1}^{T} \langle X_t, |M_t| \rangle \;+\; \frac{O(\log n)}{\eta}. $$
Furthermore, if $M_t$ is block-diagonal, then $X_t$ is also block-diagonal with the same block structure. The restrictions on the adversary are that at each step $t$ the cost matrix $M_t$ is positive semidefinite or negative semidefinite and satisfies $\|M_t\| \le 1$.
The theorem above is a special case of Theorem 3.3 in [1]. The "Furthermore" part is not stated explicitly in [1, Theorem 3.3] but can be verified by inspecting the proof. Note that what we call the payoff corresponds to the negative of the cost in the treatment of [1], which is why their cost minimization problem becomes a maximization problem here, and their condition on the cost matrices becomes the condition $\|M_t\| \le 1$ that we have in the above theorem.
To gain some intuition about the way we will use the above theorem, note that the definition of regret implies that we have
$$ \lambda_{\max}\Bigl( \sum_{t=1}^{T} M_t \Bigr) \;\le\; R_T \;+\; \sum_{t=1}^{T} \langle X_t, M_t \rangle, $$
where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of the matrix. Now suppose that we play the role of the adversary against the algorithm of Theorem 3.2, and that, at time $t$, we reply to the solution $X_t$ of the algorithm with a cost matrix of the form $M_t = L_{u_t, v_t} - \frac{1}{|E|} L_G$, where $L_{u,v}$ denotes the Laplacian of the graph containing only the single edge $(u, v)$, and $(u_t, v_t) \in E$ is an edge chosen so that
$$ \Bigl\langle X_t, \ L_{u_t, v_t} - \frac{1}{|E|} L_G \Bigr\rangle \;\le\; 0. $$
We know that such an edge must exist, because the average of the left-hand side above is zero if we compute it for a uniformly chosen random edge of $G$. After playing this game for $T$ steps we have
$$ \lambda_{\max}\Bigl( \sum_{t=1}^{T} M_t \Bigr) \;\le\; R_T, $$
and, calling $F$ the multiset $\{(u_1, v_1), \ldots, (u_T, v_T)\}$, calling $H = (V, F)$ the multigraph of such edges and $c = |E|/T$, and noting that $\sum_t M_t = L_H - \frac{T}{|E|} L_G$, we have
$$ c\, L_H \;\preceq\; L_G \;+\; \frac{|E|}{T}\, R_T \cdot I, $$
which, provided that we can ensure that $\frac{|E|}{T} R_T$ is small, is one side of the type of bounds that we are trying to prove.
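To make the mechanics of this adversary strategy concrete, here is a small numerical sketch. It plays the edge-selection rule described above against matrix multiplicative weights, a simpler density-matrix learner used here as a stand-in for the algorithm of Theorem 3.2; the learning rate `eta`, the horizon `T`, and the random graph are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.8]
m = len(edges)

def lap(pairs):
    L = np.zeros((n, n))
    for u, v in pairs:
        L[u, u] += 1; L[v, v] += 1; L[u, v] -= 1; L[v, u] -= 1
    return L

LG = lap(edges)
eta, T = 0.1, 200
S = np.zeros((n, n))          # running sum of the cost matrices played so far
F = []                        # multiset of selected edges
for _ in range(T):
    lam, V = np.linalg.eigh(eta * S)
    lam -= lam.max()                           # numerical stability
    X = V @ np.diag(np.exp(lam)) @ V.T
    X /= np.trace(X)                           # density matrix prop. to exp(eta*S)
    # adversary: <X, L_e - L_G/m> averages to 0 over a uniform random edge,
    # so its minimum over the edges is <= 0
    e = min(edges, key=lambda e: np.sum(X * (lap([e]) - LG / m)))
    F.append(e)
    S += lap([e]) - LG / m

c = m / T
slack = np.linalg.eigvalsh(c * lap(F) - LG).max()   # lambda_max(c*L_H - L_G)
print("lambda_max(c*L_H - L_G) =", round(float(slack), 3))
```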
In order to get a two-sided bound, one would like to use the idea that, for a symmetric matrix $A$,
$$ \| A \| \;=\; \lambda_{\max} \begin{pmatrix} A & 0 \\ 0 & -A \end{pmatrix}, $$
and play the above game using, at step $t$, a cost matrix of the form
$$ M_t \;=\; \begin{pmatrix} L_{u_t, v_t} - \frac{1}{|E|} L_G & 0 \\ 0 & \frac{1}{|E|} L_G - L_{u_t, v_t} \end{pmatrix}, $$
where the edge $(u_t, v_t)$ is chosen so that
$$ \langle X_t, M_t \rangle \;\le\; 0. $$
Then, if we define $F$, $H$ and $c$ as above, we would reach the conclusion
$$ \bigl\| c\, L_H - L_G \bigr\| \;\le\; \frac{|E|}{T}\, R_T, $$
and what remains to be done is to see for what value of $T$ we get a sufficiently small regret bound.
Unfortunately this approach runs into a series of difficulties.
First of all, our cost matrix is neither positive semidefinite nor negative semidefinite.
We could make it positive semidefinite by shifting, that is, by adding a multiple of the identity. This is not a problem for the block $L_{u_t, v_t} - \frac{1}{|E|} L_G$, whose smallest eigenvalue is at most $\frac{1}{|E|} \|L_G\|$ in magnitude, but it is a serious problem for the block $\frac{1}{|E|} L_G - L_{u_t, v_t}$, whose smallest eigenvalue is of the order of $-\|L_{u_t, v_t}\|$: the shift needed to make this block PSD would be so big that the terms in the regret bound would be too large to obtain any non-trivial result.
Another approach, which is closer to what happens in [1], is to observe that the analysis of Theorem 3.2 applies also to block-diagonal matrices in which each block is either positive semidefinite or negative semidefinite. This way, we can shift the two blocks in different directions and get the cost matrix in a form to which Theorem 3.2 applies, but then we would still be unable to get any non-trivial bound, because the term $\langle X_t, |M_t| \rangle$ could be of constant order, while the analysis requires that term to be of the order of $d/|E|$ to get the result we are aiming for. To see why, note that if $M$ is a block-diagonal matrix with a positive semidefinite block and a negative semidefinite block, then $|M|$ is just the same matrix except that the negative semidefinite block appears negated. Recall that we wanted to select an edge so that $\langle X_t, M_t \rangle$ is small: what will happen is that the PSD block gives a positive contribution, the NSD block gives a negative contribution, and $\langle X_t, |M_t| \rangle$ is the sum of the absolute values of these contributions, which can both be of constant order.
We could work around this problem by scaling the matrix in a certain way, but this would make the analysis work only for a weighted sparsifier. This difficulty is the reason why [1] construct a weighted sparsifier even if the effective resistances of all the edges of $G$ are small, a situation in which an unweighted sparsifier is known to exist because of the Marcus-Spielman-Srivastava theorem.
We work around these difficulties by reasoning about the signless Laplacian. If $G$ is a graph with diagonal degree matrix $D$ and adjacency matrix $A$, then the signless Laplacian of $G$ is defined as the matrix $D + A$. We denote by $Q_G$ the signless Laplacian of a graph $G$, and by $Q_{u,v}$ the signless Laplacian of the graph containing only the single edge $(u, v)$. Equation (10) below shows that, in this case, the terms $\langle X_t, |M_t| \rangle$ in the regret bound can be bounded by $O(d/|E|)$, and are never of constant order.
Recall that, like the Laplacian, the signless Laplacian is a PSD matrix whose largest eigenvalue is at most twice the maximum degree.
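These facts about the signless Laplacian are easy to check numerically (a standalone sanity check, not part of the argument): $Q = D + A$ is PSD, its largest eigenvalue is at most twice the maximum degree, and its quadratic form is $x^T Q x = \sum_{(u,v) \in E} (x_u + x_v)^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 7
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                                     # adjacency of a random simple graph
D = np.diag(A.sum(axis=1))
Q = D + A                                       # signless Laplacian

eig = np.linalg.eigvalsh(Q)
assert eig.min() > -1e-9                        # PSD
assert eig.max() <= 2 * A.sum(axis=1).max() + 1e-9   # lambda_max <= 2 * max degree

x = rng.standard_normal(n)
quad = sum(A[u, v] * (x[u] + x[v]) ** 2
           for u in range(n) for v in range(u + 1, n))
assert np.isclose(x @ Q @ x, quad)              # x^T Q x = sum (x_u + x_v)^2
print("signless Laplacian checks passed")
```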
We now play the game with the cost matrix
$$ M_t \;=\; \begin{pmatrix} L_{u_t, v_t} - \frac{1}{|E|} L_G & 0 \\ 0 & Q_{u_t, v_t} - \frac{1}{|E|} Q_G \end{pmatrix}, $$
where the edge $(u_t, v_t)$ is chosen so that
$$ \langle X_t, M_t \rangle \;\le\; 0. $$
Since the payoff $\langle X_t, M_t \rangle$ is non-positive at every step, we get that, after $T$ steps, if we define $F$ to be the multiset of selected edges, $H = (V, F)$, and $c = |E|/T$, then we have
$$ c\, L_H \;\preceq\; L_G + \frac{|E|}{T}\, R_T \cdot I \qquad \text{and} \qquad c\, Q_H \;\preceq\; Q_G + \frac{|E|}{T}\, R_T \cdot I, $$
and so it remains to show that we can make $\frac{|E|}{T} R_T \le \epsilon\, d$ by choosing $T = O(\epsilon^{-2}\, n \log n)$.
Let us analyze the quantities that come up in the statement of Theorem 3.2.
Since $X_t$ is PSD, we have
$$ \bigl| \langle X_t, M_t \rangle \bigr| \;\le\; \langle X_t, |M_t| \rangle. $$
The non-trivial part of the analysis is the following bound.
Claim. At every time step $t$ we have
$$ \langle X_t, |M_t| \rangle \;=\; O\!\left( \frac{d}{|E|} \right). \tag{10} $$
Recall from Theorem 3.2 that the matrices $X_t$ will have the same block structure as the cost matrices $M_t$. We can therefore write the matrix $X_t$ as
$$ X_t \;=\; \begin{pmatrix} X^{(1)}_t & 0 \\ 0 & X^{(2)}_t \end{pmatrix}. $$
Using the triangle inequality and the fact that all the eigenvalues of $L_{u_t, v_t}$, of $Q_{u_t, v_t}$, of $\frac{1}{|E|} L_G$ and of $\frac{1}{|E|} Q_G$ are at most $2$, we can bound $\langle X_t, |M_t| \rangle$ in terms of $\langle X^{(1)}_t, L_{u_t, v_t} \rangle$ and $\langle X^{(2)}_t, Q_{u_t, v_t} \rangle$, up to an additive term of order $d/|E|$.
Also recall that we chose $(u_t, v_t)$ so that we would have
$$ \langle X_t, M_t \rangle \;\le\; 0, $$
which is the same as
$$ \langle X^{(1)}_t, L_{u_t, v_t} \rangle + \langle X^{(2)}_t, Q_{u_t, v_t} \rangle \;\le\; \frac{1}{|E|} \Bigl( \langle X^{(1)}_t, L_G \rangle + \langle X^{(2)}_t, Q_G \rangle \Bigr) \;\le\; \frac{2d}{|E|}. $$
Now let us write
$$ X^{(1)}_t \;=\; \sum_i \mu_i \, w_i w_i^T, $$
where $\mu_i \ge 0$ are the eigenvalues of $X^{(1)}_t$ and $w_1, w_2, \ldots$ are an orthonormal basis of eigenvectors of $X^{(1)}_t$, and let us also write
$$ w_i \;=\; \bigl( w_i(1), \ldots, w_i(n) \bigr), $$
where $w_i(u)$ is the entry at vertex $u$ of the vector $w_i$, which has length $n$. Then
$$ \langle X^{(1)}_t, L_{u_t, v_t} \rangle \;=\; \sum_i \mu_i \bigl( w_i(u_t) - w_i(v_t) \bigr)^2. $$
Finally, by Cauchy-Schwarz, we can bound the contribution of the first block to $\langle X_t, |M_t| \rangle$ by $O(d/|E|)$.
In a completely analogous way we can prove the same bound for the contribution of the second block, which completes the proof of the claim.
To conclude the proof, take $\eta$ such that
$$ \eta \cdot \sum_{t=1}^{T} \langle X_t, |M_t| \rangle \;\le\; \frac{\epsilon \, d \, T}{2\,|E|}, $$
which, by the above claim, means that it can be done by choosing $\eta = \Theta(\epsilon)$. Then, using (10) and the bound of Theorem 3.2, we have the regret bound
$$ R_T \;\le\; O(\epsilon) \cdot T \cdot \frac{d}{|E|} \;+\; \frac{O(\log n)}{\epsilon}. $$
When $T = \Theta(\epsilon^{-2}\, n \log n)$, the above upper bound gives $\frac{|E|}{T} R_T = O(\epsilon\, d)$, which means that we have constructed a graph $H$ with $O(\epsilon^{-2}\, n \log n)$ edges such that
$$ c\, L_H \;\preceq\; L_G + O(\epsilon)\, d \cdot I \qquad \text{and} \qquad c\, Q_H \;\preceq\; Q_G + O(\epsilon)\, d \cdot I, $$
where the second inequality is equivalent to
$$ L_G - c\, L_H \;\preceq\; 2\,(D_G - c\, D_H) + O(\epsilon)\, d \cdot I, $$
proving Theorem 3.1.
4 Probabilistic construction of additive sparsifiers
In this section, we give probabilistic algorithms for constructing additive spectral sparsifiers of hypergraphs. Specifically, we prove the following theorem which, by the reduction in Section 2.2, implies Theorem 1.3. That we can choose the normalization constant $c$ to equal $|E|/|F|$ in Theorem 1.3 is because, in the reduction, the following theorem is used for a graph whose maximal degree approximately equals the average degree.
Given an $n$-vertex hypergraph $H = (V, E)$ of rank $r$ and of maximal degree $d$, together with a parameter $\epsilon > 0$, in probabilistic polynomial time we can find a subset $F \subseteq E$ of size $O\!\left( \frac{n}{\epsilon^2 r} \right)$ such that, if we let $c = |E|/|F|$ be a normalization constant, the following holds with high probability:
$$ \bigl| c \cdot e_F(S) - e_H(S) \bigr| \;\le\; \epsilon \cdot d \cdot |S| \qquad \forall S \subseteq V. $$
Our arguments are inspired by those used by Frieze and Molloy and subsequently by Bilu and Linial. They use the Lovász Local Lemma (LLL) with an exponential number of bad events, and may at first seem non-constructive. However, fairly recent results give efficient probabilistic algorithms even in these applications of LLL. Theorem 3.3 in  will be especially helpful for us. To state it we need to introduce the following notation. We let $\mathcal{P}$ be a finite collection of mutually independent random variables and let $\mathcal{B}$ be a collection of events, each determined by some subset of $\mathcal{P}$. For any event $B$ that is determined by a subset of $\mathcal{P}$, we denote the smallest such subset by $\mathrm{vbl}(B)$. Further, for two events $B$ and $B'$ we write $B \sim B'$ if $\mathrm{vbl}(B) \cap \mathrm{vbl}(B') \neq \emptyset$. In other words, $B \sim B'$ if $B$ and $B'$ are neighbors in the standard dependency graph considered in LLL. Finally, we say that a subset $\mathcal{B}_{\mathrm{core}} \subseteq \mathcal{B}$ is an efficiently verifiable core subset if there is a polynomial time algorithm for finding a true event in $\mathcal{B}_{\mathrm{core}}$, if any. We can now state a (slightly) simplified version of Theorem 3.3 in  as follows:
Let $\mathcal{B}_{\mathrm{core}} \subseteq \mathcal{B}$ be an efficiently verifiable core subset of $\mathcal{B}$. If there is an $\epsilon \in (0, 1)$ and an assignment of reals $x : \mathcal{B} \to (0, 1)$ such that
$$ \Pr[B] \;\le\; (1 - \epsilon)\, x(B) \prod_{B' \sim B} \bigl( 1 - x(B') \bigr) \qquad \forall B \in \mathcal{B}, $$
and the total probability of the non-core events $\mathcal{B} \setminus \mathcal{B}_{\mathrm{core}}$ is negligible, then there exists a randomized polynomial time algorithm that outputs an assignment of the variables in $\mathcal{P}$ in which all events in $\mathcal{B}$ are false, with high probability.
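The algorithmic core behind such constructive LLL results is the resampling idea of Moser and Tardos: while some (efficiently verifiable) bad event is true, resample exactly the variables it depends on. The sketch below runs this loop on a toy instance (2-coloring the edges of a sparse graph so that no vertex sees only one color); the instance and events are illustrative, not the ones used in our proofs.

```python
import random

random.seed(4)
# Toy instance: a 6-regular graph on 30 vertices; variables are edge colors,
# and the bad event B_v says that all edges at vertex v have the same color.
n = 30
edges = [(i, (i + j) % n) for i in range(n) for j in (1, 2, 3)]
color = {e: random.randint(0, 1) for e in edges}   # the independent variables

def bad_vertex():
    """Return a vertex whose bad event is true, or None (the 'verifier')."""
    for v in range(n):
        colors_at_v = {color[e] for e in edges if v in e}
        if len(colors_at_v) == 1:
            return v
    return None

rounds = 0
while (v := bad_vertex()) is not None and rounds < 10_000:
    for e in edges:                 # resample exactly vbl(B_v)
        if v in e:
            color[e] = random.randint(0, 1)
    rounds += 1

assert bad_vertex() is None         # all bad events are now false
print("all bad events false after", rounds, "resampling rounds")
```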
The following lemma says that we can roughly halve the degrees of the vertices without incurring too much loss in the cut structure. Applying this lemma iteratively then yields a sparsifier. We use the following notation: for an edge set $F$ and disjoint vertex subsets $S$ and $T$, we let $F(S, T)$ denote the set of edges with one endpoint in $S$ and one in $T$; for brevity, we also write $F(S)$ for $F(S, V \setminus S)$. Also recall that $\mathrm{vol}(S)$ denotes the sum of the degrees of the vertices in $S$ and that $\bar d$ denotes the average degree.
There exists a probabilistic polynomial-time algorithm that, given an -vertex hypergraph of maximal degree