Linear Size Sparsifier and the Geometry of the Operator Norm Ball

07/03/2019
by Victor Reis and Thomas Rothvoss
University of Washington

The Matrix Spencer Conjecture asks whether given n symmetric matrices in R^(n×n) with eigenvalues in [-1,1] one can always find signs so that their signed sum has singular values bounded by O(√n). The standard approach in discrepancy requires proving that the convex body of all good fractional signings is large enough. However, this question has remained wide open due to the lack of tools to certify measure lower bounds for rather small non-polyhedral convex sets. A seminal result by Batson, Spielman and Srivastava from 2008 shows that any undirected graph admits a linear size spectral sparsifier. Again, one can define a convex body of all good fractional signings. We can indeed prove that this body is close to most of the Gaussian measure. This implies that a discrepancy algorithm by the second author can be used to sample a linear size sparsifier. In contrast to previous methods, we require only a logarithmic number of sampling phases.


1 Introduction

Discrepancy theory is a subfield of combinatorics with several applications to theoretical computer science. In the classical setting one is given a family of sets S_1, ..., S_m ⊆ {1, ..., n} and the goal is to find a coloring x ∈ {-1,1}^n so that the maximum imbalance max_i |Σ_(j ∈ S_i) x_j| is minimized. This minimum value is called the discrepancy of the family, denoted by disc(S_1, ..., S_m). A seminal result of Spencer [Spe85] says that for any set family one has disc(S_1, ..., S_m) ≤ O(√(n log(2m/n))), assuming that m ≥ n. It is instructive to observe that for m = n, Spencer’s result gives the bound of O(√n), while a uniform random coloring will have a discrepancy of Θ(√(n log n)). Moreover, one can show that for some set systems, only an exponentially small fraction of all colorings will indeed have a discrepancy of O(√n). This demonstrates that in fact, Spencer’s result provides the existence of a rather rare object.

The cleanest approach to prove Spencer’s result is due to Giannopoulos [Gia97], which we sketch for m = n: Consider the set K = {x ∈ R^n : |Σ_(j ∈ S_i) x_j| ≤ Δ for all i}, a symmetric convex body which denotes the set of good-enough fractional colorings. Here K_i = {x ∈ R^n : |Σ_(j ∈ S_i) x_j| ≤ Δ} is the strip of colorings that are good for set S_i. The Lemma of Sidak-Khatri [Kha67, Šid67] allows us to lower bound the Gaussian measure of K as γ_n(K) ≥ Π_i γ_n(K_i) ≥ e^(-cn) for some constant c > 0, using that each strip has a constant width once Δ = Θ(√n). This rather weak bound on the measure is sufficient to use a pigeonhole principle argument and conclude that K must contain a partial coloring x ∈ [-1,1]^n in which a constant fraction of the coordinates lie in {-1,1}. Then one can color the elements j with x_j ∈ {-1,1} accordingly and repeat the argument for the remaining uncolored elements. The overall bound of O(√n) follows from the fact that the discrepancy of the partial colorings decreases geometrically as the number of elements in the set system decreases.
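In symbols, the body and the measure bound behind this argument read as follows (a schematic rendering with the standard choice Δ = Θ(√n); constants are not optimized):

```latex
\[
  K \;=\; \Big\{ x \in \mathbb{R}^n \;:\; \Big|\sum_{j \in S_i} x_j\Big| \le \Delta \ \text{ for all } i \Big\}
    \;=\; \bigcap_{i=1}^{n} K_i ,
  \qquad
  \gamma_n(K) \;\ge\; \prod_{i=1}^{n} \gamma_n(K_i) \;\ge\; e^{-cn},
\]
where the product lower bound is Sidak--Khatri and the last inequality uses that each strip
$K_i$ has width $2\Delta / \|\mathbf{1}_{S_i}\|_2 = \Omega(1)$ once $\Delta = \Theta(\sqrt{n})$.
```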

While the pigeonhole principle based argument above is non-constructive in nature, Bansal [Ban10] designed a polynomial time algorithm for finding the coloring guaranteed by Spencer’s Theorem. Here, [Ban10] exploits that it suffices to obtain a good enough fractional partial coloring with a constant fraction of entries in {-1,1} to make the argument work. Later, Lovett and Meka [LM12] found a Brownian motion-type algorithm that, despite being a lot simpler, works for more general polyhedral settings. Finally, the random projection algorithm of Rothvoss [Rot14] works for arbitrary symmetric convex bodies K that satisfy the measure lower bound γ_n(K) ≥ e^(-cn). Another remarkable result is due to Bansal, Dadush, Garg and Lovett [BDGL18]: for any symmetric convex body K ⊆ R^n with γ_n(K) ≥ 1/2

and any vectors v_1, ..., v_m ∈ R^n

of length ‖v_i‖_2 ≤ 1, one can find signs x ∈ {-1,1}^m in randomized polynomial time so that Σ_i x_i v_i ∈ O(1)·K. This was known before by a non-constructive convex geometric argument due to Banaszczyk [Ban98].

There are two possible strengthenings of Spencer’s Theorem that are both open at the time of this writing: suppose that the set system is sparse in the sense that every element is in at most t sets. It is known that disc(S_1, ..., S_m) ≤ 2t - 1 [BF81] as well as disc(S_1, ..., S_m) ≤ O(√(t log n)) [Ban98, BDGL18], while the Beck-Fiala Conjecture suggests that O(√t) is the right bound. For the second generalization, the one that we are following in this paper, it is helpful to define A_j as the diagonal matrix with entry (A_j)_(ii) = 1 if j ∈ S_i and 0 otherwise. If ‖·‖_op denotes the maximum singular value of a matrix, then Spencer’s result can be interpreted as the existence of a coloring x ∈ {-1,1}^n so that ‖Σ_j x_j A_j‖_op ≤ O(√n). A conjecture raised by Meka (see the blog post https://windowsontheory.org/2014/02/07/discrepancy-and-beating-the-union-bound/) is whether for m = n, this bound is also possible for arbitrary symmetric matrices A_1, ..., A_n that satisfy ‖A_i‖_op ≤ 1. One can prove using matrix concentration inequalities that a random coloring will lead to ‖Σ_i x_i A_i‖_op ≤ O(√(n log n)), and the same bound can also be achieved deterministically using a matrix multiplicative weight update argument [Zou12]. An excellent overview of matrix concentration can be found in the monograph of Tropp [Tro15].
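As a quick numerical illustration of the random-coloring bound (not from the paper; a minimal sketch assuming random symmetric matrices normalized to operator norm 1 and a uniformly random signing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # dimension and number of matrices (the m = n regime)

# Random symmetric matrices, each rescaled to have operator norm exactly 1.
mats = []
for _ in range(n):
    G = rng.standard_normal((n, n))
    A = (G + G.T) / 2
    A /= np.linalg.norm(A, 2)   # spectral norm normalization
    mats.append(A)

# A uniformly random coloring x in {-1, +1}^n and its signed sum.
x = rng.choice([-1.0, 1.0], size=n)
signed_sum = sum(xi * Ai for xi, Ai in zip(x, mats))

print("||sum_i x_i A_i||_op =", round(np.linalg.norm(signed_sum, 2), 1))
print("sqrt(n log n)        =", round(np.sqrt(n * np.log(n)), 1))
print("sqrt(n)              =", round(np.sqrt(n), 1))
```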

To understand the difficulty of proving Meka’s conjecture, assume m = n and revisit the approach of Giannopoulos for Spencer’s Theorem. We can again define a set

K = {x ∈ R^n : ‖Σ_j x_j A_j‖_op ≤ Δ}

of good enough fractional colorings. Since ‖·‖_op is a norm, K will indeed be symmetric and convex. It would hence suffice to prove that γ_n(K) ≥ e^(-cn) for some constant c > 0. However, it is open whether this inequality holds. The issue is that K is non-polyhedral and applying Sidak-Khatri’s bound over infinitely many vectors (one can use an ε-net of 2^(O(n)) many vectors but the bound is still too weak) is way too inefficient. While matrix concentration inequalities are fantastic at proving that likely events are indeed likely, they seem to be unable to prove that unlikely events are not too unlikely. With a scaling argument, they can still be used to prove a measure lower bound of the form γ_n(K) ≥ e^(-cn·polylog(n)) for some constant c > 0, assuming Δ = Θ(√n), though better bounds seem out of reach.

In terms of discrepancy in spectral settings, a different line of techniques has been arguably more successful. A beautiful and influential paper by Batson, Spielman and Srivastava [BSS09] proves that for any undirected graph on n nodes one can take a weighted subgraph with just a linear number of edges that approximates every cut within a constant factor. Translated into linear algebra terms, [BSS09] show that given any vectors v_1, ..., v_m ∈ R^n that are in isotropic position, i.e. Σ_i v_i v_i^T = I_n, one can find weights s_1, ..., s_m ≥ 0 with |{i : s_i > 0}| ≤ O(n) so that (1/C)·I_n ⪯ Σ_i s_i v_i v_i^T ⪯ C·I_n for some constant C. In a more recent celebrated paper, Marcus, Spielman and Srivastava [MSS15] resolved the Kadison-Singer Conjecture, a problem that has appeared independently in different forms in many areas of mathematics. In a simple-to-state version, their result says that for any vectors v_1, ..., v_m with Σ_i v_i v_i^T = I_n and ‖v_i‖_2^2 ≤ δ for all i, there are signs x ∈ {-1,1}^m so that ‖Σ_i x_i v_i v_i^T‖_op ≤ O(√δ). On a very high level view, both methods of [BSS09] and [MSS15] control a carefully chosen potential function, though we note there is still no known polynomial time algorithm for the latter.

The goal of this paper will be to connect the classical discrepancy theory and the spectral discrepancy theory of [BSS09, MSS15] and develop arguments that prove largeness of non-polyhedral bodies. We remark that we made no attempt at optimizing constants but rather prefer to keep the exposition simple.

Notation.

For a (not necessarily symmetric) matrix M the operator norm can be formally defined as ‖M‖_op = max{‖Mx‖_2 : ‖x‖_2 = 1}. For a symmetric matrix A with eigendecomposition A = Σ_i λ_i u_i u_i^T, we write |A| = Σ_i |λ_i| u_i u_i^T as the matrix where all eigenvalues have been replaced by their absolute values. In this notation, ‖A‖_op = λ_max(|A|) is the maximum singular value. We abbreviate [n] = {1, ..., n} and [m] = {1, ..., m}. Given symmetric matrices A, B, we write A ⪯ B if x^T A x ≤ x^T B x for all x ∈ R^n.

A convex body K ⊆ R^n is a closed convex set with nonempty interior. We denote d(x, K) = min{‖x - y‖_2 : y ∈ K} as the distance from x to K. Let K + ρ·B_2^n be the set of points that have distance at most ρ to K, where B_2^n = {x ∈ R^n : ‖x‖_2 ≤ 1} is the unit Euclidean ball (in particular, K + ρ·B_2^n = {x ∈ R^n : d(x, K) ≤ ρ}). The Minkowski sum of sets A and B is defined as A + B = {a + b : a ∈ A, b ∈ B}. A halfspace is a set of the form H = {x ∈ R^n : ⟨a, x⟩ ≤ λ} for some a ∈ R^n and λ ∈ R. The Gaussian measure of a measurable set K ⊆ R^n is defined as γ_n(K) = Pr_(x ∼ N(0, I_n))[x ∈ K]. Here N(0, I_n) is the distribution of a standard Gaussian in R^n.
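Not from the paper, but to make the notation concrete, here is a small numerical sketch that computes ‖A‖_op and |A| from an eigendecomposition and estimates a Gaussian measure by Monte Carlo; the body K = {x : ‖x‖_∞ ≤ 2} is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# A random symmetric matrix A and its eigendecomposition A = U diag(lam) U^T.
G = rng.standard_normal((n, n))
A = (G + G.T) / 2
lam, U = np.linalg.eigh(A)

op_norm = np.max(np.abs(lam))            # ||A||_op = largest singular value
abs_A = U @ np.diag(np.abs(lam)) @ U.T   # |A|: eigenvalues replaced by absolute values

print(op_norm, np.linalg.norm(A, 2))     # the two agree up to rounding

# Monte Carlo estimate of the Gaussian measure gamma_n(K) of the example body
# K = {x : ||x||_inf <= 2}, i.e. the probability that a standard Gaussian lands in K.
samples = rng.standard_normal((100_000, n))
gamma_K = np.mean(np.max(np.abs(samples), axis=1) <= 2.0)
print(gamma_K)
```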

1.1 Our contribution

A possible way to approach the setting of Batson, Spielman, Srivastava [BSS09] from a classical discrepancy perspective is to take vectors v_1, ..., v_m ∈ R^n in isotropic position and consider the body K = {x ∈ R^m : ‖Σ_i x_i v_i v_i^T‖_op ≤ Δ}. If we could prove that γ_m(K) ≥ e^(-cm) for some constant c > 0, then the algorithm of [Rot14] would be able to find a partial coloring. While we still do not know whether the inequality holds, we can prove that a weaker condition that suffices for the algorithm of [Rot14] is satisfied:

Theorem 1.

Let be symmetric matrices with and select so that . Then for any , the set

satisfies . That is, .

A quantity that is often used in the convex geometry literature is the mean width of a body , which is defined as . The above result implies the following:

Theorem 2.

A body as defined in Theorem 1 has mean width .

A rather immediate consequence of this insight is that the following sampling algorithm will work with very high probability:

Spectral Sparsification Algorithm

Input: PSD matrices with and
Output: with and

Set for
WHILE DO
    Let with .
    Draw a Gaussian .
    Compute .
    If then replace by .
    Update .
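The parameters and update rules in the algorithm above did not survive text extraction. Purely to illustrate the overall loop structure, namely repeatedly halving the support through a partial-coloring step and reweighting the survivors, here is a hedged Python sketch; toy_partial_coloring is a stand-in for the [Rot14] random projection step, and the instance, thresholds and reweighting are illustrative assumptions rather than the paper’s procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

def toy_partial_coloring(k):
    """Stand-in for the partial coloring step.

    The actual algorithm draws a Gaussian and projects it onto the body of
    good fractional signings (the [Rot14] step); here we simply return a
    random balanced +/-1 vector so that the loop structure can be executed.
    """
    x = np.ones(k)
    x[rng.permutation(k)[: k // 2]] = -1.0
    return x

# Toy instance: m rank-one PSD matrices v_i v_i^T in isotropic position.
n, m = 20, 640
V = rng.standard_normal((m, n))
C = V.T @ V
V = V @ np.linalg.inv(np.linalg.cholesky(C)).T   # now sum_i v_i v_i^T = I_n
mats = [np.outer(v, v) for v in V]

s = np.ones(m)              # current weights
target_support = 8 * n      # stop once the support is this small

phases = 0
while np.count_nonzero(s) > target_support:
    active = np.flatnonzero(s)
    x = toy_partial_coloring(len(active))
    # Coordinates signed -1 are dropped, coordinates signed +1 are kept with
    # doubled weight, so sum_i s_i A_i is preserved in expectation per phase.
    s[active[x < 0]] = 0.0
    s[active[x > 0]] *= 2.0
    phases += 1

approx = sum(si * Ai for si, Ai in zip(s, mats) if si > 0)
err = np.linalg.norm(approx - np.eye(n), 2)
print(f"phases = {phases}, support = {np.count_nonzero(s)}, ||sum s_i A_i - I||_op = {err:.2f}")
```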

In fact we will prove:

Theorem 3.

With probability at least a run of the Spectral Sparsification Algorithm satisfies all of the following properties: (a) the algorithm runs in polynomial time; (b) the while loop is iterated at most times; (c) at the end one has and .

2 Preliminaries

In this section, we discuss several tools from probability and linear algebra that we will be using in the proofs.

Concentration.

We need two concentration inequalities. For the first one, see [vH14].

Theorem 4.

If f : R^n → R is L-Lipschitz, then for g ∼ N(0, I_n) one has Pr[|f(g) - E[f(g)]| ≥ λ] ≤ 2·exp(-λ²/(2L²)) for all λ ≥ 0.

For the proof of the following Corollary, see Appendix A.

Corollary 5.

For we have

We also need Azuma’s inequality for martingales with bounded increments, see [AS16].

Theorem 6 (Azuma’s Inequality).

Let X_0, X_1, ..., X_T be a martingale with |X_t - X_(t-1)| ≤ c_t for all t ∈ [T]. Then for any λ > 0 we have Pr[|X_T - X_0| ≥ λ] ≤ 2·exp(-λ² / (2·Σ_(t=1)^T c_t²)).

Gaussians.

In order to increase the measure from to we use the following key theorem, see [LT11].

Theorem 7 (Gaussian Isoperimetric Inequality).

Let A ⊆ R^n be a measurable set and H ⊆ R^n be a halfspace such that γ_n(A) ≥ γ_n(H). Then γ_n(A + ρ·B_2^n) ≥ γ_n(H + ρ·B_2^n) for all ρ ≥ 0.

The following simple result is useful for dealing with dilations, see [Tko15].

Theorem 8.

Let A ⊆ R^n be a measurable set and B ⊆ R^n be a closed Euclidean ball such that γ_n(A) ≥ γ_n(B). Then γ_n(t·A) ≥ γ_n(t·B) for all t ≥ 1.

For (not necessarily symmetric) matrices A, B we define the Frobenius inner product ⟨A, B⟩ = Σ_(i,j) A_(ij) B_(ij) and the corresponding Frobenius norm ‖A‖_F = √(⟨A, A⟩). Generalizing earlier notation, for a PSD matrix Σ ∈ R^(n×n), we define N(0, Σ) as the distribution of a centered Gaussian with covariance matrix Σ. Note that there is a canonical way to generate such a distribution: let Σ = u_1 u_1^T + ... + u_n u_n^T be the factorization of that matrix for some vectors u_1, ..., u_n ∈ R^n. Then draw a standard Gaussian y ∼ N(0, I_n) and set x = y_1 u_1 + ... + y_n u_n, so that x ∼ N(0, Σ). In particular we will be interested in drawing a standard Gaussian restricted to a subspace U ⊆ R^n. The distribution of such a Gaussian is exactly N(0, Π) where Π = u_1 u_1^T + ... + u_k u_k^T and u_1, ..., u_k is an orthonormal basis of U. The following properties are well known:
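A minimal sketch (assuming the subspace is specified by an arbitrary spanning set) of how such a restricted Gaussian can be generated:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10

# A subspace U of R^n given as the span of a few (here: 4) random vectors.
spanning = rng.standard_normal((n, 4))
Q, _ = np.linalg.qr(spanning)   # columns of Q form an orthonormal basis of U

# Standard Gaussian restricted to U: draw coefficients for the basis vectors.
# Its covariance matrix is Pi = Q Q^T, the orthogonal projection onto U.
y = rng.standard_normal(Q.shape[1])
x = Q @ y

# Sanity checks: x lies in U (projecting changes nothing), and the empirical
# covariance of many samples approaches Pi.
Pi = Q @ Q.T
assert np.allclose(Pi @ x, x)
samples = Q @ rng.standard_normal((Q.shape[1], 50_000))
emp_cov = samples @ samples.T / samples.shape[1]
print(np.abs(emp_cov - Pi).max())   # small
```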

Lemma 9.

Let be a subspace and let be the distribution of a standard Gaussian restricted to that subspace. Then for one has always; (ii) ; (iii) for all ; (iv) for all ; (v) for any matrices one has .

The only property that is non-standard is (v). But note that we can use to justify that for each entry of the matrices one has ; the claim then follows by linearity of expectation and summing over all entries .

Linear Algebra.

For the analysis, we need an estimate on the trace of the product of symmetric matrices:

Lemma 10.

Let be symmetric matrices with . Then

The proof can be found in Appendix A. We also need a Taylor approximation for the trace of the inverse of a matrix:

Lemma 11.

Let be symmetric matrices with and . Then

for some .

Proof.

We abbreviate . As , the matrix is non-singular and by direct computation one can verify that its inverse is given by . Using this formula twice at , we obtain

Taking the trace on both sides gives

Since , we have , hence we can bound the absolute value of the last term as

Finally, note that
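For reference, the expansion used in the proof sketch above is the standard resolvent identity, stated here in generic form for matrices M and H with M and M + H non-singular (the lemma’s specific matrices did not survive text extraction):

```latex
\[
  (M+H)^{-1} \;=\; M^{-1} - M^{-1} H (M+H)^{-1},
\]
and, applying the identity once more inside the last term,
\[
  (M+H)^{-1} \;=\; M^{-1} - M^{-1} H M^{-1} + M^{-1} H (M+H)^{-1} H M^{-1},
\]
so that after taking traces the remaining error term is quadratic in $H$.
```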

3 Main technical result

We now show our main result, Theorem 1. Fix symmetric matrices with and set so that . Let be the body as defined in Theorem 1 and fix a parameter . Ideally, the goal would be to prove that a random Gaussian from is on average close to

. Instead, we prove that there is a random variable

that is close to a Gaussian and ends up in with high probability. The strategy is to generate such a near-Gaussian random variable by performing a Brownian motion that adds up independent Gaussians with a tiny step size . The key ingredient is that in each iteration we walk inside a subspace of dimension at least , meaning that we draw with . This can be understood as blocking the movement in dimensions that are “dangerous”. Then the expected Euclidean distance of the outcome to an unrestricted Gaussian is at most . It remains to argue that the subspace can be chosen so that at the end of the Brownian motion, ends up in . For this sake we define a potential function

Observe that for one has , so the goal is to keep the potential function bounded. We show that, for a particular choice of parameters (later we will choose and ), an update of in expectation does not increase the value of the potential function, assuming that the current value of the potential function is small enough and is taken from the aforementioned subspace. There is the technical issue that the potential function goes up to infinity as the minimal eigenvalue of approaches . We solve this problem by defining another distribution that draws , but if , then is replaced with . Recall that by Corollary 5 one has for any . A second problem is that keeping the potential function low in expectation is not sufficient: if the potential function ever crosses a certain threshold, the analysis stops working. However, a single step in the Brownian motion can be analyzed as follows:
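Only as an illustration of this strategy, and not the paper’s actual procedure, the following Python sketch adds up many tiny Gaussian steps restricted to a subspace; choose_subspace is a hypothetical stub standing in for the potential-function-based choice described above.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 100                  # ambient dimension of the walk
alpha = 1e-2             # tiny step size
T = int(1 / alpha**2)    # after T steps the sum is roughly a standard Gaussian

def choose_subspace(y):
    """Hypothetical stub: return an orthonormal basis of the subspace to walk in.

    The paper chooses a subspace of dimension at least a constant fraction of m
    that avoids the 'dangerous' directions identified by the potential function;
    here we simply block the coordinates of largest absolute value as a placeholder.
    """
    blocked = np.argsort(-np.abs(y))[: m // 4]
    keep = np.setdiff1d(np.arange(m), blocked)
    return np.eye(m)[:, keep]

y = np.zeros(m)
for _ in range(T):
    Q = choose_subspace(y)
    g = Q @ rng.standard_normal(Q.shape[1])   # Gaussian restricted to the subspace
    y = y + alpha * g

# After T = 1/alpha^2 steps the walk has roughly unit variance per coordinate,
# up to the directions that were blocked along the way.
print(np.round(np.std(y), 2))
```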

Lemma 12.

Fix and . Let and suppose , as well as . Define as the unique value for which

Then there is a covariance matrix with and so that while always .

Proof.

To simplify notation, we abbreviate matrices

Next, we define an index set

Here are the “dangerous” indices in the sense that updating in these coordinates might disproportionally change the potential function. Note that by Markov’s inequality, we have . Consider the subspace

so that for . Further, for . We choose so that is the standard Gaussian restricted to . We begin by showing a rather crude upper bound on for .
Claim I. For every with , one has , and .
Proof of Claim I. Note that in order for the potential functions and to be identical, we know that the difference matrix

must have one eigenvalue at least 0 and one eigenvalue at most . There would be no positive eigenvalues if , and similarly no negative eigenvalues if . Hence we conclude . This bound is good enough to show that

Since , it follows .

Now we can apply the matrix Taylor approximation from Lemma 11 and use that for every with , there exists some such that the difference in the potential function is

Observe that in the last equation we have conveniently used that due to the linear constraints defining , we have for all . Now we can show that the quantity is a lot smaller than we have proven so far — in fact its maximum length is independent of the step size :
Claim II. For every with one has .
Proof of Claim II. We rearrange for and obtain

using the estimates and .

Next, we justify that up to lower order terms.

Claim III. For any with one has

Proof of Claim III. Since , the difference in the left side equals

Here we use , as well as . In particular we have also made use of the linear constraint in the choice of the subspace .

Now we prove the central core of this theorem: in expectation for a Gaussian from the subspace , the quadratic term is bounded by a term that we can offset in the potential function by the length increase of .
Claim IV. One has .
Proof of Claim IV. The argument for this claim needs some care, as we have in general since we draw from a subspace . We abbreviate (note that these matrices will in general not be symmetric). Then