Discrepancy theory is a subfield of combinatorics with several applications to theoretical computer science. In the classical setting one is given a family of sets $\mathcal{S} = \{S_1,\dots,S_m\}$ with $S_i \subseteq [n]$, and the goal is to find a coloring $x \in \{-1,1\}^n$ so that the maximum imbalance $\max_{i \in [m]} |\sum_{j \in S_i} x_j|$ is minimized. This minimum value is called the discrepancy of the family, denoted by $\mathrm{disc}(\mathcal{S})$. A seminal result of Spencer [Spe85] says that for any set family one has $\mathrm{disc}(\mathcal{S}) \leq O(\sqrt{n \log(2m/n)})$, assuming that $n \leq m$. It is instructive to observe that for $m = n$, Spencer’s result gives the bound of $O(\sqrt{n})$, while a uniform random coloring will in general have a discrepancy of $\Theta(\sqrt{n \log n})$. Moreover, one can show that for some set systems, only an exponentially small fraction of all colorings have a discrepancy of $O(\sqrt{n})$. This demonstrates that Spencer’s result in fact guarantees the existence of a rather rare object.
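The following Python sketch makes the definition of $\mathrm{disc}(\mathcal{S})$ concrete. For a small, hypothetical set system it computes the discrepancy by exhaustive search over all $2^n$ colorings and compares it with the imbalance of one uniformly random coloring. Exhaustive search is exponential in $n$, so this is only an illustration. The function names `imbalance` and `discrepancy` are our own.

```python
import itertools
import random


def imbalance(sets, x):
    """Maximum imbalance max_i |sum_{j in S_i} x_j| of the coloring x."""
    return max(abs(sum(x[j] for j in s)) for s in sets)


def discrepancy(sets, n):
    """disc(S) by exhaustive search over all 2^n colorings (tiny n only)."""
    return min(imbalance(sets, x) for x in itertools.product((-1, 1), repeat=n))


# Hypothetical set system on the ground set {0, ..., 5}.
n = 6
sets = [{0, 1, 2}, {2, 3, 4}, {0, 4, 5}, {1, 3, 5}, set(range(n)), {0, 3}]

best = discrepancy(sets, n)
rng = random.Random(0)
x_rand = [rng.choice((-1, 1)) for _ in range(n)]
print("disc(S) =", best, "| random coloring:", imbalance(sets, x_rand))
```

Since this toy system contains sets of odd size, no coloring can have imbalance $0$ on them.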
The cleanest approach to prove Spencer’s result is due to Giannopoulos [Gia97], which we sketch for $m = n$. Consider the set $K := \{x \in \mathbb{R}^n : |\sum_{j \in S_i} x_j| \leq C\sqrt{n} \;\forall i \in [m]\}$, a symmetric convex body which represents the set of good-enough fractional colorings. Here $\{x \in \mathbb{R}^n : |\sum_{j \in S_i} x_j| \leq C\sqrt{n}\}$ is the strip of colorings that are good for set $S_i$. The Lemma of Sidak-Khatri [Kha67, Šid67] allows us to lower bound the Gaussian measure of $K$ as $\gamma_n(K) \geq e^{-cn}$ for some constant $c$, using that each strip has a constant width. This rather weak bound on the measure is sufficient to use a pigeonhole principle argument and conclude that $K$ must contain a partial coloring $x \in \{-1,0,1\}^n$ with $|\mathrm{supp}(x)| \geq \Omega(n)$. Then one can color the elements in $\mathrm{supp}(x)$ accordingly and repeat the argument for the remaining uncolored elements. The overall bound follows from the fact that the discrepancy of the partial colorings decreases geometrically as the number of elements in the set system decreases.
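The Sidak-Khatri step can be checked numerically on a toy instance. The sketch below rests on our own assumptions: a random set system with $m = n = 8$, strips of half-width $C\sqrt{n}$ with $C = 1$, and Monte Carlo sampling. It compares the product lower bound $\prod_i \gamma_n(\text{strip}_i)$ with an empirical estimate of $\gamma_n(K)$. It uses $\gamma_n(\{x : |\langle a, x\rangle| \leq w\}) = \mathrm{erf}(w/(\sqrt{2}\|a\|_2))$, which holds because $\langle a, g\rangle \sim N(0, \|a\|_2^2)$.

```python
import math

import numpy as np


def strip_measure(a, w):
    """gamma_n of the strip {x : |<a, x>| <= w}, using <a, g> ~ N(0, ||a||_2^2)."""
    return math.erf(w / (np.linalg.norm(a) * math.sqrt(2)))


def body_measure_mc(A, w, samples=200_000, seed=0):
    """Monte Carlo estimate of gamma_n(K) for K = {x : |<a_i, x>| <= w for all rows a_i}."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((samples, A.shape[1]))
    return float(np.mean(np.all(np.abs(g @ A.T) <= w, axis=1)))


# Hypothetical instance: incidence vectors of m = n = 8 random sets, C = 1.
n = 8
rng = np.random.default_rng(1)
A = (rng.random((n, n)) < 0.5).astype(float)
A[A.sum(axis=1) == 0, 0] = 1.0          # make sure no set is empty
w = 1.0 * math.sqrt(n)                  # half-width C * sqrt(n) of every strip

sidak_bound = math.prod(strip_measure(a, w) for a in A)
estimate = body_measure_mc(A, w)
print(f"Sidak-Khatri bound {sidak_bound:.4f} vs. Monte Carlo {estimate:.4f}")
```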
While the pigeonhole-principle-based argument above is non-constructive in nature, Bansal [Ban10] designed a polynomial time algorithm for finding the coloring guaranteed by Spencer’s Theorem. Here, [Ban10] exploits that it suffices to obtain a good enough fractional partial coloring $x \in [-1,1]^n$ with a constant fraction of entries in $\{-1,1\}$ to make the argument work. Later, Lovett and Meka [LM12] found a Brownian motion-type algorithm that, despite being a lot simpler, works for more general polyhedral settings. Finally, the random projection algorithm of Rothvoss [Rot14] works for arbitrary symmetric convex bodies that satisfy the measure lower bound. Another remarkable result is due to Bansal, Dadush, Garg and Lovett [BDGL18]. For any symmetric convex body $K \subseteq \mathbb{R}^m$ with $\gamma_m(K) \geq \frac{1}{2}$, and any vectors $v_1,\dots,v_n \in \mathbb{R}^m$ of length $\|v_i\|_2 \leq c$ for a small enough constant $c > 0$, one can find signs $x \in \{-1,1\}^n$ in randomized polynomial time so that $\sum_{i=1}^n x_iv_i \in K$. This was known before by a non-constructive convex geometric argument due to Banaszczyk [Ban98].
There are two possible strengthenings of Spencer’s Theorem that are both open at the time of this writing. First, suppose that the set system is sparse in the sense that every element is in at most $t$ sets. It is known that $\mathrm{disc}(\mathcal{S}) \leq 2t-1$ [BF81] as well as $\mathrm{disc}(\mathcal{S}) \leq O(\sqrt{t \log n})$ [Ban98, BDGL18], while the Beck-Fiala Conjecture suggests that $O(\sqrt{t})$ is the right bound. For the second generalization, which is the one we follow in this paper, it is helpful to define $A_j \in \mathbb{R}^{m \times m}$ as the diagonal matrix with entry $(A_j)_{ii} = 1$ if $j \in S_i$ and $(A_j)_{ii} = 0$ otherwise. If $\|\cdot\|_{op}$ denotes the maximum singular value of a matrix, then for $m = n$ Spencer’s result can be interpreted as the existence of a coloring $x \in \{-1,1\}^n$ so that $\|\sum_{j=1}^n x_jA_j\|_{op} \leq O(\sqrt{n})$. A conjecture raised by Meka (see the blog post https://windowsontheory.org/2014/02/07/discrepancy-and-beating-the-union-bound/) asks whether, for $m = n$, this bound is also possible for arbitrary symmetric matrices $A_1,\dots,A_n \in \mathbb{R}^{n \times n}$ that satisfy $\|A_j\|_{op} \leq 1$. One can prove using matrix concentration inequalities that a random coloring will lead to $\|\sum_{j=1}^n x_jA_j\|_{op} \leq O(\sqrt{n \log n})$, and the same bound can also be achieved deterministically using a matrix multiplicative weight update argument [Zou12]. An excellent overview of matrix concentration can be found in the monograph of Tropp [Tro15].
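The matrix reformulation can be checked directly. Take the diagonal matrices $A_j \in \mathbb{R}^{m \times m}$ with $(A_j)_{ii} = 1$ if $j \in S_i$ and $0$ otherwise. Then the signed sum $\sum_j x_jA_j$ is diagonal with entries $\sum_{j \in S_i} x_j$, so its operator norm is exactly the maximum imbalance. The minimal Python sketch below uses a hypothetical set system, and the helper names are our own.

```python
import numpy as np


def element_matrices(sets, n):
    """A_j = diag(1[j in S_1], ..., 1[j in S_m]) for every element j in {0, ..., n-1}."""
    return [np.diag([1.0 if j in s else 0.0 for s in sets]) for j in range(n)]


def op_norm(M):
    """Largest singular value ||M||_op."""
    return np.linalg.norm(M, 2)


# Hypothetical set system and a coloring x in {-1, 1}^n.
n = 6
sets = [{0, 1, 2}, {2, 3, 4}, {0, 4, 5}, {1, 3, 5}, set(range(n)), {0, 3}]
x = [1, 1, -1, -1, 1, -1]

signed_sum = sum(xj * Aj for xj, Aj in zip(x, element_matrices(sets, n)))
max_imbalance = max(abs(sum(x[j] for j in s)) for s in sets)
print(op_norm(signed_sum), max_imbalance)
```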
To understand the difficulty of proving Meka’s conjecture, assume $\|A_j\|_{op} \leq 1$ for all $j \in [n]$ and revisit the approach of Giannopoulos for Spencer’s Theorem. We can again define a set $K := \{x \in \mathbb{R}^n : \|\sum_{j=1}^n x_jA_j\|_{op} \leq C\sqrt{n}\}$ of good enough fractional colorings. Since $\|\cdot\|_{op}$ is a norm, $K$ will indeed be symmetric and convex. It would hence suffice to prove that $\gamma_n(K) \geq e^{-cn}$ for some constant $c$. However, it is open whether this inequality holds. The issue is that $K$ is non-polyhedral, and applying Sidak-Khatri’s bound over infinitely many vectors is way too inefficient. (One can instead use an $\varepsilon$-net, but the resulting bound is still too weak.) While matrix concentration inequalities are fantastic at proving that likely events are indeed likely, they seem to be unable to prove that unlikely events are not too unlikely. With a scaling argument, they can still be used to prove that $\gamma_n(K) \geq e^{-cn\log\log n}$ for some constant $c$, assuming $n \geq 3$, though better bounds seem out of reach.
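Such a scaling argument can be spelled out as follows. This is a sketch, and the assumption that matrix concentration yields $\gamma_n(tK) \geq \frac{1}{2}$ for some $t = C'\sqrt{\log n}$ with a constant $C' \geq 1$ is ours. Write $\varphi(x) = (2\pi)^{-n/2}e^{-\|x\|_2^2/2}$ for the Gaussian density, substitute $y = tx$, and use $\varphi(tx) \leq \varphi(x)$ for $t \geq 1$:

```latex
\gamma_n(tK) = \int_{tK} \varphi(y)\,dy = t^n \int_{K} \varphi(tx)\,dx \le t^n \int_K \varphi(x)\,dx = t^n\,\gamma_n(K),
\qquad\text{hence}\qquad
\gamma_n(K) \ge t^{-n}\,\gamma_n(tK) \ge \tfrac{1}{2}\,e^{-n\ln(C'\sqrt{\log n})} = e^{-O(n\log\log n)}.
```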
In terms of discrepancy in spectral settings, a different line of techniques has been arguably more successful. A beautiful and influential paper by Batson, Spielman and Srivastava [BSS09] proves that for any undirected graph on $n$ nodes one can take a weighted subgraph with just a linear number of edges that approximates every cut within a constant factor. Translated into linear algebra terms, [BSS09] show the following. Let $v_1,\dots,v_m \in \mathbb{R}^n$ be any vectors in isotropic position, i.e. $\sum_{i=1}^m v_iv_i^T = I_n$. Then for any $0 < \varepsilon < 1$ one can find weights $s \in \mathbb{R}^m_{\geq 0}$ with $|\mathrm{supp}(s)| \leq O(n/\varepsilon^2)$ so that $(1-\varepsilon)I_n \preceq \sum_{i=1}^m s_iv_iv_i^T \preceq (1+\varepsilon)I_n$. In a more recent celebrated paper, Marcus, Spielman and Srivastava [MSS15] resolved the Kadison-Singer Conjecture, a problem that has appeared independently in different forms in many areas of mathematics. In a simple-to-state version, their result says the following. For any vectors $v_1,\dots,v_m \in \mathbb{R}^n$ with $\sum_{i=1}^m v_iv_i^T = I_n$ and $\|v_i\|_2^2 \leq \delta$ for all $i \in [m]$, there are signs $x \in \{-1,1\}^m$ so that $\|\sum_{i=1}^m x_iv_iv_i^T\|_{op} \leq O(\sqrt{\delta})$. At a very high level, both methods of [BSS09] and [MSS15] control a carefully chosen potential function, though we note there is still no known polynomial time algorithm for the latter.
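Isotropic position is easy to produce numerically. For vectors spanning $\mathbb{R}^n$, replacing each $v_i$ by $S^{-1/2}v_i$ with $S = \sum_i v_iv_i^T$ yields $\sum_i v_iv_i^T = I_n$. The sketch below does this and then evaluates $\|\sum_i x_iv_iv_i^T\|_{op}$ for uniformly random signs. The random Gaussian input vectors and random signs are our own illustrative choices. By matrix concentration, random signs are only guaranteed to achieve $O(\sqrt{\delta \log n})$, whereas [MSS15] shows that $O(\sqrt{\delta})$ is attainable.

```python
import numpy as np


def to_isotropic(V):
    """Map the rows v_i of V (m x n, rank n) to S^{-1/2} v_i, where S = sum_i v_i v_i^T."""
    S = V.T @ V
    evals, evecs = np.linalg.eigh(S)
    S_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return V @ S_inv_sqrt        # S^{-1/2} is symmetric, so row i equals (S^{-1/2} v_i)^T


rng = np.random.default_rng(0)
m, n = 400, 20
U = to_isotropic(rng.standard_normal((m, n)))
assert np.allclose(U.T @ U, np.eye(n))            # sum_i v_i v_i^T = I_n

delta = float(np.max(np.sum(U**2, axis=1)))       # max_i ||v_i||_2^2
x = rng.choice([-1.0, 1.0], size=m)                # uniformly random signs
dev = np.linalg.norm(U.T @ (x[:, None] * U), 2)    # ||sum_i x_i v_i v_i^T||_op
print(f"delta = {delta:.3f}, random-sign deviation = {dev:.3f}")
```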
The goal of this paper will be to connect the classical discrepancy theory and the spectral discrepancy theory of [BSS09, MSS15] and develop arguments that prove largeness of non-polyhedral bodies. We remark that we made no attempt at optimizing constants but rather prefer to keep the exposition simple.
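For a (not necessarily symmetric) matrix $A \in \mathbb{R}^{n \times m}$ the operator norm can be formally defined as $\|A\|_{op} := \max_{\|x\|_2 = 1} \|Ax\|_2$. For a symmetric matrix $A$ with eigendecomposition $A = \sum_{i=1}^n \lambda_iu_iu_i^T$, we write $|A| := \sum_{i=1}^n |\lambda_i|u_iu_i^T$ for the matrix where all eigenvalues have been replaced by their absolute values. In this notation, $\|A\|_{op} = \lambda_{\max}(|A|)$ is the maximum singular value. We abbreviate and . Given symmetric matrices $A, B \in \mathbb{R}^{n \times n}$, we write $A \preceq B$ if $x^TAx \leq x^TBx$ for all $x \in \mathbb{R}^n$.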
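The notation $|A|$, $\|A\|_{op}$ and $\preceq$ translates directly into eigenvalue computations. The small NumPy sketch below uses helper names of our own. It also checks the facts $\|A\|_{op} = \lambda_{\max}(|A|)$ and $-|A| \preceq A \preceq |A|$ for a symmetric example.

```python
import numpy as np


def mat_abs(A):
    """|A| for symmetric A: keep the eigenvectors, replace each eigenvalue by its absolute value."""
    evals, evecs = np.linalg.eigh(A)
    return evecs @ np.diag(np.abs(evals)) @ evecs.T


def loewner_leq(A, B, tol=1e-10):
    """A <= B in the Loewner order iff B - A is PSD, i.e. lambda_min(B - A) >= 0 (up to tol)."""
    return bool(np.linalg.eigvalsh(B - A).min() >= -tol)


A = np.array([[1.0, 2.0], [2.0, -3.0]])
# For symmetric A the largest singular value equals lambda_max(|A|).
assert np.isclose(np.linalg.norm(A, 2), np.linalg.eigvalsh(mat_abs(A)).max())
# |A| dominates A from both sides: -|A| <= A <= |A|.
assert loewner_leq(-mat_abs(A), A) and loewner_leq(A, mat_abs(A))
```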
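A convex body $K \subseteq \mathbb{R}^n$ is a closed convex set with nonempty interior. We denote by $d(x,K) := \min\{\|x-y\|_2 : y \in K\}$ the distance from $x$ to $K$. Let $K^{\delta} := \{x \in \mathbb{R}^n : d(x,K) \leq \delta\}$ be the set of points that have distance at most $\delta$ to $K$ (in particular, $K^0 = K$). The Minkowski sum of sets $A$ and $B$ is defined as $A + B := \{a + b : a \in A, b \in B\}$. A halfspace is a set of the form $H = \{x \in \mathbb{R}^n : \langle a, x\rangle \leq \beta\}$ for some $a \in \mathbb{R}^n$ and $\beta \in \mathbb{R}$. The Gaussian measure of $K$ is defined as $\gamma_n(K) := \Pr_{x \sim N(0, I_n)}[x \in K]$. Here $N(0, I_n)$ is the distribution of a standard Gaussian in $\mathbb{R}^n$.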
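For a concrete convex body these definitions are easy to evaluate. In our example, the cube $K = [-1,1]^n$, the closest point of $K$ to $x$ is obtained by clipping each coordinate. Hence $d(x,K) = \|x - \mathrm{clip}(x)\|_2$, and $\gamma_n(K) = \mathrm{erf}(1/\sqrt{2})^n$ by independence of the coordinates. The sketch below checks the latter by Monte Carlo and also estimates the measure of the set of points within distance $\frac{1}{2}$ of $K$.

```python
import math

import numpy as np


def dist_to_cube(x):
    """d(x, K) for K = [-1, 1]^n: the closest point of K is the coordinate-wise clip of x."""
    return np.linalg.norm(x - np.clip(x, -1.0, 1.0), axis=-1)


n, samples = 5, 200_000
rng = np.random.default_rng(0)
g = rng.standard_normal((samples, n))         # rows are samples of N(0, I_n)
d = dist_to_cube(g)

gamma_K = float(np.mean(d == 0.0))            # Monte Carlo estimate of gamma_n(K)
exact = math.erf(1 / math.sqrt(2)) ** n       # prod_i Pr[|g_i| <= 1]
gamma_near = float(np.mean(d <= 0.5))         # points within distance 1/2 of K
print(gamma_K, exact, gamma_near)
```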
1.1 Our contribution
A possible way to approach the setting of Batson, Spielman, Srivastava [BSS09] from a classical discrepancy perspective is to take vectors in isotropic position and consider the body . If we could prove that , then the algorithm of [Rot14] would be able to find a partial coloring. While we still do not know whether the inequality holds, we can prove that a weaker condition that suffices for the algorithm of [Rot14] is satisfied:
Let be symmetric matrices with and select so that . Then for any , the set
satisfies . That is, .
A quantity that is often used in the convex geometry literature is the mean width of a body , which is defined as . The above result implies the following:
A body as defined in Theorem 1 has mean width .
A rather immediate consequence of this insight is that the following sampling algorithm will work with very high probability:
In fact we will prove:
With probability at least a run of the Spectral Sparsification Algorithm satisfies all of the following properties: (a) the algorithm runs in polynomial time; (b) the while loop is iterated at most times; (c) at the end one has and .
In this section, we discuss several tools from probability and linear algebra that we will be using in the proofs.
We need two concentration inequalities. For the first one, see [vH14].
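If $F : \mathbb{R}^n \to \mathbb{R}$ is $L$-Lipschitz, then for $x \sim N(0, I_n)$ and any $t \geq 0$ one has $\Pr[|F(x) - \mathbb{E}[F(x)]| \geq t] \leq 2e^{-t^2/(2L^2)}$.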
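As an illustration of the standard form $\Pr[|F(x) - \mathbb{E}[F(x)]| \geq t] \leq 2e^{-t^2/(2L^2)}$ of this inequality, take the $1$-Lipschitz function $F(x) = \|x\|_2$. The Monte Carlo sketch below compares empirical tails with the bound. The dimension and sample size are arbitrary choices, and the empirical mean stands in for $\mathbb{E}[F(x)]$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 50, 100_000
# F(x) = ||x||_2 is 1-Lipschitz, so the bound reads 2 * exp(-t^2 / 2).
values = np.linalg.norm(rng.standard_normal((samples, n)), axis=1)
mean = values.mean()                          # empirical stand-in for E[F(x)]

results = []
for t in (0.5, 1.0, 2.0):
    empirical = float(np.mean(np.abs(values - mean) >= t))
    bound = 2 * float(np.exp(-t**2 / 2))
    results.append((t, empirical, bound))
    print(f"t = {t}: empirical tail {empirical:.4f}, bound {bound:.4f}")
```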
For the proof of the following Corollary, see Appendix A.
For we have
We also need Azuma’s inequality for martingales with bounded increments, see [AS16].
Theorem 6 (Azuma’s Inequality).
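Let $0 = X_0, X_1, \dots, X_T$ be a martingale with $|X_t - X_{t-1}| \leq c_t$ for all $t \in [T]$. Then for any $\lambda \geq 0$ we have $\Pr[|X_T| \geq \lambda] \leq 2\exp\big(-\frac{\lambda^2}{2\sum_{t=1}^T c_t^2}\big)$.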
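Consider a simple random walk $X_t = \sum_{s \leq t} \epsilon_s$ with independent uniform signs $\epsilon_s \in \{-1,1\}$. It is a martingale with increments bounded by $1$, so Azuma’s inequality gives $\Pr[|X_T| \geq \lambda] \leq 2e^{-\lambda^2/(2T)}$. The Monte Carlo sketch below compares this bound with empirical tails, using parameters of our own choosing.

```python
import numpy as np

rng = np.random.default_rng(0)
T, runs = 400, 10_000
# X_0 = 0 and X_t = X_{t-1} + eps_t with independent uniform signs: |X_t - X_{t-1}| <= 1.
X_T = rng.choice([-1, 1], size=(runs, T)).sum(axis=1)

results = []
for lam in (20, 40, 60):
    empirical = float(np.mean(np.abs(X_T) >= lam))
    azuma = 2 * float(np.exp(-lam**2 / (2 * T)))   # c_t = 1 for all t
    results.append((lam, empirical, azuma))
    print(f"lambda = {lam}: empirical {empirical:.4f}, Azuma bound {azuma:.4f}")
```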
In order to increase the measure from to we use the following key theorem, see [LT11].
Theorem 7 (Gaussian Isoperimetric Inequality).
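Let $K \subseteq \mathbb{R}^n$ be a measurable set and $H$ be a halfspace such that $\gamma_n(K) = \gamma_n(H)$. Then $\gamma_n(K + tB_2^n) \geq \gamma_n(H + tB_2^n)$ for all $t \geq 0$, where $B_2^n$ denotes the Euclidean unit ball.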
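To see the isoperimetric inequality at work, take $K$ to be a centered Euclidean ball of radius $r = \sqrt{n}$ (our example). Let $H = \{x : x_1 \leq \beta\}$ be the halfspace with $\Phi(\beta) = \gamma_n(K)$, where $\Phi$ is the standard normal CDF. Enlarging by $t$ gives the ball of radius $r + t$ and the halfspace $\{x : x_1 \leq \beta + t\}$ of measure $\Phi(\beta + t)$. The Monte Carlo sketch below uses `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$ and compares the two.

```python
import math
from statistics import NormalDist

import numpy as np

n, samples = 10, 200_000
rng = np.random.default_rng(0)
norms = np.linalg.norm(rng.standard_normal((samples, n)), axis=1)

r = math.sqrt(n)
gamma_K = float(np.mean(norms <= r))       # K = centered ball of radius r
beta = NormalDist().inv_cdf(gamma_K)       # H = {x : x_1 <= beta} with gamma_n(H) = gamma_n(K)

results = []
for t in (0.5, 1.0, 2.0):
    lhs = float(np.mean(norms <= r + t))   # gamma_n(K + tB): ball of radius r + t
    rhs = NormalDist().cdf(beta + t)       # gamma_n(H + tB) = Phi(beta + t)
    results.append((t, lhs, rhs))
    print(f"t = {t}: ball {lhs:.4f} >= halfspace {rhs:.4f}")
```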
The following simple result is useful for dealing with dilations, see [Tko15].
Let be a measurable set and be a closed Euclidean ball such that . Then for all .
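For (not necessarily symmetric) matrices $A, B \in \mathbb{R}^{n \times n}$ we define the Frobenius inner product $\langle A, B\rangle_F := \sum_{i,j} A_{ij}B_{ij} = \mathrm{tr}[A^TB]$ and the corresponding Frobenius norm $\|A\|_F := \sqrt{\langle A, A\rangle_F}$. Generalizing earlier notation, for a PSD matrix $\Sigma \in \mathbb{R}^{n \times n}$, we define $N(0, \Sigma)$ as the distribution of a centered Gaussian with covariance matrix $\Sigma$. There is a canonical way to generate such a distribution. Let $\Sigma = \sum_{i=1}^k u_iu_i^T$ be a factorization of that matrix for some vectors $u_1,\dots,u_k \in \mathbb{R}^n$, and draw a standard Gaussian $g \sim N(0, I_k)$, so that $\sum_{i=1}^k g_iu_i \sim N(0, \Sigma)$. In particular we will be interested in drawing a standard Gaussian restricted to a subspace $U \subseteq \mathbb{R}^n$. The distribution of such a Gaussian is exactly $N(0, \Sigma)$ where $\Sigma = \sum_{i=1}^k u_iu_i^T$ and $u_1,\dots,u_k$ is an orthonormal basis of $U$. The following properties are well known: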
Let be a subspace and let be the distribution of a standard Gaussian restricted to that subspace. Then for one has always; (ii) ; (iii) for all ; (iv) for all ; (v) for any matrices one has .
The only property that is non-standard is (v). But note that we can use to justify that for each entry of the matrices one has ; the claim then follows by linearity of expectation and summing over all entries .
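The canonical construction of a standard Gaussian restricted to a subspace $U$ can be sketched in NumPy. Store an orthonormal basis $u_1,\dots,u_k$ of $U$ as the columns of a matrix and set $y = \sum_{i=1}^k g_iu_i$ for $g \sim N(0, I_k)$. The covariance $\Sigma = \sum_i u_iu_i^T$ is then the orthogonal projection onto $U$. The random subspace and the sample size below are our own choices, and `restricted_gaussian` is a hypothetical helper name.

```python
import numpy as np


def restricted_gaussian(basis, rng, size):
    """Draw y = sum_i g_i u_i with g ~ N(0, I_k); the columns of basis are orthonormal u_1..u_k."""
    g = rng.standard_normal((size, basis.shape[1]))
    return g @ basis.T                                   # one sample y in R^n per row


rng = np.random.default_rng(0)
n, k = 6, 3
basis, _ = np.linalg.qr(rng.standard_normal((n, k)))    # orthonormal basis of a random U
Sigma = basis @ basis.T                                  # covariance = projection onto U

Y = restricted_gaussian(basis, rng, 200_000)
emp_cov = Y.T @ Y / len(Y)                               # empirical E[y y^T]
print(np.round(emp_cov - Sigma, 3))
```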
For the analysis, we need an estimate on the trace of the product of symmetric matrices:
Let be symmetric matrices with . Then
The proof can be found in Appendix A. We also need a Taylor approximation for the trace of the inverse of a matrix:
Let be symmetric matrices with and . Then
for some .
We abbreviate . As , the matrix is non-singular and by direct computation one can verify that its inverse is given by . Using this formula twice at , we obtain
Taking the trace on both sides gives
Since , we have , hence we can bound the absolute value of the last term as
Finally, note that
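Expansions of this kind rest on the resolvent identity $(A - B)^{-1} = A^{-1} + A^{-1}B(A - B)^{-1}$, valid whenever $A$ and $A - B$ are invertible. Applying it twice gives $(A - B)^{-1} = A^{-1} + A^{-1}BA^{-1} + A^{-1}BA^{-1}B(A - B)^{-1}$. We assume this is the form used in the proof of Lemma 11. The NumPy sketch below only verifies both identities on one random, well-conditioned instance of our choosing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric, eigenvalues >= n
E = rng.standard_normal((n, n))
B = 0.1 * (E + E.T)                  # small symmetric perturbation

A_inv = np.linalg.inv(A)
R = np.linalg.inv(A - B)             # (A - B)^{-1}

# Resolvent identity: (A - B)^{-1} = A^{-1} + A^{-1} B (A - B)^{-1}
assert np.allclose(R, A_inv + A_inv @ B @ R)
# Applied twice: second-order expansion with an exact remainder term.
assert np.allclose(R, A_inv + A_inv @ B @ A_inv + A_inv @ B @ A_inv @ B @ R)

remainder = np.trace(R) - np.trace(A_inv) - np.trace(A_inv @ B @ A_inv)
print(f"trace remainder: {remainder:.2e}")
```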
3 Main technical result
We now show our main result, Theorem 1. Fix symmetric matrices with and set so that . Let $K$ be the body as defined in Theorem 1 and fix a parameter . Ideally, the goal would be to prove that a random Gaussian from $N(0, I_m)$ is on average close to $K$. Instead, we prove that there is a random variable that is close to a Gaussian and ends up in $K$ with high probability. The strategy is to generate such a near-Gaussian random variable by performing a Brownian motion that adds up independent Gaussians with a tiny step size . The key ingredient is that in each iteration we walk inside a subspace of dimension at least , meaning that we draw with . This can be understood as blocking the movement in dimensions that are “dangerous”. Then the expected Euclidean distance of the outcome to an unrestricted Gaussian is at most . It remains to argue that the subspace can be chosen so that at the end of the Brownian motion, the walk ends up in $K$. For this sake we define a potential function
Observe that for one has , so the goal is to keep the potential function bounded. We show that, for a particular choice of parameters (later we will choose and ), an update of in expectation does not increase the value of the potential function — assuming that the current value of the potential function is small enough and is taken from the aforementioned subspace. There is the technical issue that the potential function goes up to as the minimal eigenvalue of approaches . We solve this problem by defining another distribution that draws , but if , then is replaced with . Recall that by Corollary 5 one has for any . A second problem is that keeping the potential function low in expectation is not sufficient — if the potential function ever crosses a certain threshold, the analysis stops working. However, a single step in the Brownian motion can be analyzed as follows:
Fix and . Let and suppose , as well as . Define as the unique value for which
Then there is a covariance matrix with and so that while always .
To simplify notation, we abbreviate matrices
Next, we define an index set
Here are the “dangerous” indices in the sense that updating in these coordinates might disproportionately change the potential function. Note that by Markov’s inequality, we have . Consider the subspace
so that for . Further, for .
We choose so that is the standard Gaussian restricted to .
We begin by showing a rather crude upper bound on for .
Claim I. For every with , one has , and .
Proof of Claim I. Note that in order for the potential functions and to be identical, we know that the difference matrix
must have one eigenvalue at least 0 and one eigenvalue at most . There would be no positive eigenvalues if , and similarly no negative eigenvalues if . Hence we conclude . This bound is good enough to show that
Since , it follows .
Now we can apply the matrix Taylor approximation from Lemma 11 and use that for every with , there exists some such that the difference in the potential function is
Observe that in the last equation we have conveniently used that, due to the linear constraints defining , we have for all .
Now we can show that the quantity is a lot smaller than what we have proven so far; in fact, its maximum length is independent of the step size :
Claim II. For every with one has .
Proof of Claim II. We rearrange for and obtain
using the estimates and .
Next, we justify that up to lower order terms.
Claim III. For any with one has
Proof of Claim III. Since , the difference on the left-hand side equals
Here we use , as well as . In particular we have also made use of the linear constraint in the choice of the subspace .
Now we prove the core of this theorem: in expectation for a Gaussian from the subspace , the quadratic term is bounded by a term that we can offset in the potential function by the length increase of .
Claim IV. One has .
Proof of Claim IV. The argument for this claim needs some care, as we have in general since we draw from a subspace . We abbreviate (note that these matrices will in general not be symmetric). Then