Discrepancy Minimization via a Self-Balancing Walk

06/24/2020 ∙ by Ryan Alweiss, et al. ∙ MIT, Princeton University, Stanford University

We study discrepancy minimization for vectors in ℝ^n under various settings. The main result is the analysis of a new simple random process in multiple dimensions through a comparison argument. As corollaries, we obtain bounds which are tight up to logarithmic factors for several problems in online vector balancing posed by Bansal, Jiang, Singla, and Sinha (STOC 2020), as well as linear time algorithms for logarithmic bounds for the Komlós conjecture.

1. Introduction

We start by discussing the vector balancing problem – given vectors v_1, …, v_n ∈ ℝ^m, pick signs ε_i ∈ {±1} so that the discrepancy ‖ε_1v_1 + ⋯ + ε_nv_n‖_∞ is as small as possible. This problem encompasses many known problems in discrepancy theory, including the Komlós conjecture and minimizing set discrepancy. Concretely, the Komlós conjecture asks for the best bound K such that for any matrix A ∈ ℝ^{m×n} whose columns have ℓ2-norm at most 1, there is a signing x ∈ {±1}^n with ‖Ax‖_∞ ≤ K. The Komlós conjecture states that one may take K = O(1). The best known bound is K = O(√(log n)) due to Banaszczyk [2, 3]. Similarly, the problem of minimizing set discrepancy considers the case where A ∈ {0,1}^{m×n}. Here, Spencer’s famous “six standard deviations suffice” [25] shows that for m ≥ n there exists a signing x so that ‖Ax‖_∞ = O(√(n log(2m/n))). In particular, if m = O(n), then discrepancy O(√n) is achievable, and this is tight up to the constant factor.

While the original proofs of Banaszczyk and Spencer [2, 3, 25] were nonconstructive, there has been significant interest in finding algorithmic versions. The first major results in this direction, which gave polynomial time algorithms for Spencer’s result [25], were achieved by Bansal [4] and Lovett-Meka [18]. Since then, there have been several other constructive discrepancy minimization algorithms [23, 14, 5, 7, 6, 12], including those matching Banaszczyk’s bound for the Komlós conjecture [2, 3] due to Bansal, Dadush, Garg [5] and Bansal, Dadush, Garg, Lovett [6]. However, these algorithms do not currently seem to extend to an online setting.

1.1. Online algorithms.

The problem of online discrepancy minimization, proposed by Spencer [24], is to assign signs ε_t ∈ {±1} to vectors v_1, v_2, … which arrive one at a time, while trying to maintain a low norm of all the partial sums w_t = ε_1v_1 + ⋯ + ε_tv_t. Against adaptive adversaries, the best possible bound is on the order of √T, as the adversary may choose the next vector to be orthogonal to the current partial sum w_t. Furthermore, a random signing achieves a bound of O(√T) up to logarithmic factors.
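To spell out the calculation behind this barrier (a standard observation, included here for concreteness): if the adversary always plays a unit vector v_{t+1} orthogonal to the current sum w_t, then whichever sign is chosen, ‖w_{t+1}‖_2² = ‖w_t ± v_{t+1}‖_2² = ‖w_t‖_2² + ‖v_{t+1}‖_2² = ‖w_t‖_2² + 1. Hence ‖w_T‖_2 = √T regardless of the algorithm, and consequently ‖w_T‖_∞ ≥ √(T/n).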

However, it was open whether one could get any improvement over the trivial bound in the oblivious version of this vector balancing problem. Here, an adversary fixes the vectors ahead of time, and the player may use randomness. No deterministic algorithm can do better than the √T-type bounds above, as this case is essentially equivalent to that of an adaptive adversary. Our main result provides a probabilistic algorithm that achieves an ℓ∞ bound of O(log(nT/δ)) with probability at least 1 − δ for any δ > 0. For vectors in [−1,1]^n, our result, assuming no further structure on the v_t, gives a bound of O(√n · log(nT/δ)).

The best previous results in this direction instead required that the vectors were sampled in an i.i.d. manner from a distribution p which is known beforehand. In this direction there are a number of previous works, each achieving different guarantees. In the most restrictive setting, where the vectors were sampled uniformly from {−1,1}^n, Bansal and Spencer [10] showed a guarantee of . In the more general setting where the vectors were sampled from a distribution p supported on , Aru, Narayanan, Scott, and Venkatesan [1] achieved a bound of (where the implicit dependence on n is super-exponential) and Bansal, Jiang, Singla, and Sinha [9] achieved a guarantee of .

1.2. Algorithm motivation and description.

We now motivate and describe the self-balancing walk which is at the heart of the paper. Let the input vectors be v_1, …, v_T ∈ ℝ^n, where ‖v_i‖_2 ≤ 1 for all i, and let a parameter c > 0 be chosen later. The algorithm maintains the current partial sum w_i = ε_1v_1 + ⋯ + ε_iv_i, and will decide the sign of v_{i+1} probabilistically depending on w_i and v_{i+1}. First, it is natural that the algorithm should pick ε_{i+1} = ±1 with probability 1/2 each if w_i and v_{i+1} are orthogonal vectors. Additionally, the more correlated w_i and v_{i+1} are, the higher the probability that the algorithm picks ε_{i+1} = −sign(⟨w_i, v_{i+1}⟩).

A natural choice is for the algorithm to pick ε_{i+1} = −sign(⟨w_i, v_{i+1}⟩) with probability 1/2 + |⟨w_i, v_{i+1}⟩|/(2c), where c is a constant upper bound on |⟨w_i, v_{i+1}⟩| with high probability. This way, if w_i and v_{i+1} are orthogonal, the algorithm picks ±1 equiprobably. Additionally, the bias is stronger further from the origin, and it is stronger if one of the signings reduces the norm by a larger amount. The fact that the probability is linear in ⟨w_i, v_{i+1}⟩ is important for our analysis, hence why we must choose the parameter c in the algorithm.

w_0 ← 0
for i = 1, …, T do
      if |⟨w_{i−1}, v_i⟩| > c then
            Fail. Algorithm terminates with failure.
      ε_i ← −1 with probability 1/2 + ⟨w_{i−1}, v_i⟩/(2c), and ε_i ← +1 with probability 1/2 − ⟨w_{i−1}, v_i⟩/(2c)
      w_i ← w_{i−1} + ε_i v_i
Algorithm 1 (Balance) – takes a sequence of input vectors v_1, …, v_T and assigns them signs ε_i online so as to maintain low discrepancy with probability 1 − δ.
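For concreteness, the following is a minimal Python sketch of the walk as described above. It is our own illustration rather than the authors' code: the constant 30 in the choice of c and the handling of the failure event are assumptions on our part; Theorem 1.1 only dictates the scale c = Θ(log(nT/δ)).

```python
import numpy as np

def balance(vectors, n, delta=0.01, rng=None):
    """Self-balancing walk (sketch): assigns signs to `vectors` online.

    `vectors` is a list of numpy arrays of length n with ||v||_2 <= 1.
    Returns the list of signs and the final partial sum w.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = len(vectors)
    c = 30 * np.log(max(n * T / delta, 2.0))  # assumed constant; only the scale matters
    w = np.zeros(n)
    signs = []
    for v in vectors:
        corr = float(w @ v)
        if abs(corr) > c:
            raise RuntimeError("failure event; occurs with probability at most delta")
        # Sign is uniform when w is orthogonal to v, and biased toward reducing
        # |<w, v>| otherwise; crucially, the bias is *linear* in <w, v>.
        eps = -1 if rng.random() < 0.5 + corr / (2 * c) else 1
        w += eps * v
        signs.append(eps)
    return signs, w
```

As a quick experiment, one can feed T random unit vectors in ℝ^n to balance and compare ‖w_T‖_∞ with the √(T/n)-scale value that a uniformly random signing typically produces.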

Our main result is that Algorithm 1 maintains low discrepancy with high probability.

Theorem 1.1.

For any vectors v_1, …, v_T ∈ ℝ^n with ‖v_i‖_2 ≤ 1 for all i, Algorithm 1 (Balance) maintains ‖w_t‖_∞ = O(log(nT/δ)) for all t ∈ [T] with probability at least 1 − δ.

We also note that this theorem is sharp up to logarithmic factors in n and T due to a lower bound given in [9]. It seems possible to the authors that a variant of Algorithm 1 can maintain an ℓ∞ bound of O(√(log nT)) instead of O(log nT). This is an interesting open problem.

1.3. Consequences of Theorem 1.1

Algorithm 1 works against oblivious adversaries. Therefore, Theorem 1.1 implies tight bounds up to logarithmic factors in n and T for all of Questions 1-5 in [9]. We state Questions 4 and 5, which are about oblivious adversaries, as these generalize the stochastic and prophet models discussed in the other questions raised in [9].

  • [9, §8, Question 4] Is there an online algorithm which maintains discrepancy on any sequence of vectors in chosen by an oblivious adversary?

  • [9, §8, Question 5] Is there an online algorithm which maintains discrepancy on any sequence of -sparse vectors in chosen by an oblivious adversary?

In fact, Theorem 1.1 directly implies a nearly tight bound of O(√n · log(nT)) for Question 4 (a vector in [−1,1]^n has ℓ2-norm at most √n), and O(√s · log(nT)) for Question 5 (an s-sparse vector in [−1,1]^n has ℓ2-norm at most √s).

We also get improved bounds to the online geometric discrepancy problems of online interval discrepancy and online Tusnády’s problem by using a simplified version of the reduction to vector balancing given in [9]. This is discussed in Section 3.

Finally, we also obtain linear time algorithms for logarithmic bounds for the Komlós conjecture. In what follows, nnz(A) denotes the number of nonzero entries in the matrix A.

Theorem 1.2.

Given a matrix A ∈ ℝ^{m×n} with columns of ℓ2-norm at most 1, we can find, with high probability and in time O(nnz(A) + m + n), a vector x ∈ {±1}^n such that ‖Ax‖_∞ = O(log(mn)).

This requires a minor modification of Algorithm 1 which we sketch at the end of Section 2. Previous constructive discrepancy minimization algorithms [4, 18, 5, 14, 6] involved expensive linear algebra or solving linear or semidefinite programs, although [5, 6] achieve stronger, Banaszczyk-type bounds. This result therefore can be seen as a stepping stone towards giving input-sparsity time algorithms for discrepancy problems, a direction mentioned by Dadush [11].
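To illustrate where the nnz(A) running time comes from, here is a sketch (ours, with the same assumed constant in c as in the earlier sketch) of the column-by-column variant on a scipy CSC matrix: each step only touches the nonzeros of one column.

```python
import numpy as np
from scipy.sparse import csc_matrix

def sparse_balance(A: csc_matrix, delta: float = 0.01, rng=None):
    """Sign the columns of a sparse matrix A whose columns have ||.||_2 <= 1.

    Work per column is proportional to its number of nonzeros, so the total
    running time is O(nnz(A) + m + n).
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    c = 30 * np.log(max(m * n / delta, 2.0))  # assumed scale, as in the sketch above
    w = np.zeros(m)
    x = np.zeros(n)
    for j in range(n):
        start, end = A.indptr[j], A.indptr[j + 1]
        rows, vals = A.indices[start:end], A.data[start:end]
        corr = float(w[rows] @ vals)          # touches only column j's nonzeros
        if abs(corr) > c:
            raise RuntimeError("failure event; low probability for Komlos-type inputs")
        x[j] = -1 if rng.random() < 0.5 + corr / (2 * c) else 1
        w[rows] += x[j] * vals                # again O(nnz of column j)
    return x, w
```

Recomputing ⟨w, a_j⟩ against a dense column at every step would instead cost Θ(mn) overall, which is the accounting the sparse access pattern avoids.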

1.4. Previous approaches for algorithmic discrepancy minimization.

Here we describe previous approaches to algorithmic discrepancy minimization and the difficulties in extending previous methods to the online setting against oblivious adversaries. Previous approaches [4, 18, 14, 5, 6] either solve linear or semidefinite programs or perform random walks on the sign vector x, all of which require knowing all input vectors at the beginning.

The results of [10] and [9] work for the restricted online setting where all vectors come from a fixed distribution p, and work by choosing the sign to minimize a potential function of the current point, such as an exponential potential. This approach has significant difficulties in the setting of oblivious adversaries, as algorithms minimizing potentials are deterministic, and the lower bound mentioned above holds against any deterministic algorithm facing an oblivious adversary.

1.5. Overview of analysis of Algorithm 1

A natural approach to analyzing Algorithm 1 would be to show that some potential function or exponential moment is increasing slowly in expectation, as is done in several analyses of (sub)martingales. However, this mode of analysis encounters significant difficulties due to the fact that the partial sum w_t and the next vector v_{t+1} might be orthogonal. This prevents us from arguing that some potential function is pointwise a (sub)martingale.

We instead maintain a distributional guarantee: the distribution of w_t over executions of Algorithm 1 is less “spread out” than an associated normal distribution. This allows us to transfer tail bounds on normal distributions to the distribution of w_t.

1.6. Preliminaries and conventions.

For a vector v, we let v(i) denote the i-th coordinate of v. For positive semidefinite matrices A and B, we write A ⪯ B if B − A is positive semidefinite. For a positive semidefinite matrix Σ, we write N(0, Σ) for the normal distribution with mean 0 and covariance Σ. For a subset S, we write 1_S for the indicator function of the set S.

1.7. Concurrent and Independent Work

In independent and concurrent work, Bansal, Jiang, Meka, Singla, and Sinha [8] (building on the techniques of [9]) achieve similar guarantees to the present work (with worse poly-logarithmic factors) for the online Komlós problem restricted to the setting where vectors are sampled randomly from a fixed distribution p inside the unit sphere. However, [8] uses potential-based techniques as in [9], and thus their results do not extend to minimizing discrepancy in the (more general) oblivious adversary model which is the primary focus of this paper. They also consider two extensions of this problem. In the first, they consider the more general problem of splitting a set of incoming vectors (drawn from a stochastic distribution) into families such that the discrepancy between any two families is small. In the second, they prove a more general result showing that one can balance vectors chosen from a known distribution p inside the unit sphere against an arbitrary norm induced by a symmetric convex body with Gaussian measure at least 1/2. We believe our methods extend to these settings and plan on addressing this in future work.

2. Analysis

2.1. Properties of Spreading

We now define the key notion for the analysis – the notion of one random variable being a spread of another.

Definition 2.1.

We say that a random variable Y on ℝ^n is a spread of a random variable X on ℝ^n if there exists a coupling of X and Y such that E[Y | X] = X.

The univariate notion of the definition above appears in the mathematical economics literature under the name “mean-preserving spread” [22] and is closely related to “second-order stochastic dominance” [17, 16, 22]. As defined, the name spread may seem unintuitive, but consider the coupling between X and Y such that E[Y | X] = X. Then the random variable Y conditional on X has mean X. Thus, if Y is a spread of X, one can obtain Y by first sampling X, and then adding a mean-zero random variable whose distribution may depend on the specific value of X sampled. Furthermore, since whether Y is a spread of X only depends on the respective distributions of X and Y, we will often refer to distributions as spreads of one another. Equivalently, X and Y can be coupled so that (X, Y) forms a two-step martingale; the former perspective, however, is far more useful here.
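As a small illustrative example (ours, not from the text): let X be uniform on {−1, +1}, and let Y take the values −2, 0, +2 with probabilities 1/4, 1/2, 1/4. Coupling them by setting Y = 2X or Y = 0, each with probability 1/2, gives E[Y | X] = (1/2)(2X) + (1/2)(0) = X, so Y is a spread of X. Consistent with Lemma 2.2 below, Y has the same mean as X but larger variance (2 versus 1).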

Lemma 2.2.

Let the distribution of Y be a spread of that of X. For any convex function φ, we have that E[φ(X)] ≤ E[φ(Y)].

Proof.

Couple X and Y in the manner which demonstrates that Y is a spread of X. Then E[φ(Y)] = E[E[φ(Y) | X]] ≥ E[φ(E[Y | X])] = E[φ(X)], where we have used Jensen’s inequality and that E[Y | X] = X. ∎

Spreading is transitive and preserved by linear transformations.

Lemma 2.3 (Spreading is transitive).

If Y is a spread of X and Z is a spread of Y, then Z is a spread of X.

Lemma 2.4 (Linear transformations maintain spreading).

If Y is a spread of X, then for any linear transformation L on ℝ^n we have that LY is a spread of LX.

The following is a slightly more abstract property of spreading that we need for Theorem 1.1.

Lemma 2.5.

Consider random variables , , , and . Suppose that is a spread of and is a fixed random variable such that is a spread of the conditional distribution of given for any value . Then , where and are sampled independently, is a spread of .

Remark.

It is implicit in the above definition that the relevant random variables live on the same probability space.

Proof.

The proof produces the desired coupling between and as follows.

  • Sample and using the coupling between and which demonstrates that is a spread of .

  • Then sample from the conditional distribution of given so that and are conditionally independent given .

  • Finally sample from the coupling of and (given the value of ) so that and are conditionally independent given .

We claim that the marginal distribution of is as if and were sampled independently. This follows by noting that and are conditionally independent given in this coupling and that the distribution of conditional on is independent of the value of by hypothesis. Finally we prove that is a spread of . Note that we have

where each equality follows by construction. Therefore, the result follows. ∎

Finally, any bounded random variable with mean zero can be spread by an appropriate normal distribution.

Lemma 2.6 (Spreading real variables by Gaussians).

Let X be a real-valued random variable with E[X] = 0 and |X| ≤ 1 almost surely. Then N(0, π/2) is a spread of X.

Proof.

We first prove that the Bernoulli distribution with equal weight on ±1 is a spread of any such variable X. To see this, define B conditional on X to be +1 with probability (1 + X)/2 and −1 with probability (1 − X)/2. Then B is clearly supported on {−1, +1} only and E[B | X] = (1 + X)/2 − (1 − X)/2 = X, while E[X] = 0 ensures B places equal weight on ±1; so the Bernoulli distribution with equal weight on ±1 is a spread of X. To see that N(0, π/2) is a spread of the Bernoulli distribution with equal weight on ±1, let G ∼ N(0, π/2) and let B be distributed as +1 if G ≥ 0 and −1 otherwise. The result follows by noting that E[G | B] = B · E[|G|] = B, and B is distributed as a uniform element of {−1, +1}. Combining the two steps via Lemma 2.3 completes the proof. ∎
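As a quick numerical sanity check of Lemmas 2.2 and 2.6 (under our reading that the spreading Gaussian has variance π/2), the following compares E[φ(X)] and E[φ(G)] for a few convex test functions φ and an arbitrary bounded mean-zero X:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**6

# An arbitrary bounded, mean-zero variable X with |X| <= 1 (here: uniform on [-1, 1]).
X = rng.uniform(-1.0, 1.0, N)
# The spreading Gaussian of Lemma 2.6: N(0, pi/2).
G = rng.normal(0.0, np.sqrt(np.pi / 2), N)

# Lemma 2.2 predicts E[phi(X)] <= E[phi(G)] for every convex phi.
for name, phi in [("abs", np.abs), ("square", np.square), ("exp(t/2)", lambda t: np.exp(t / 2))]:
    print(f"{name}: {phi(X).mean():.3f} <= {phi(G).mean():.3f}")
```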

2.2. Proof of Theorem 1.1

We now formalize the notion of the distribution at time t induced by Algorithm 1. For each time t, this is defined to be the distribution of w_t, except with all mass where the algorithm failed (the failure check in Algorithm 1) moved to 0.

Definition 2.7 (Distribution induced by Balance).

We define the distribution induced by Balance recursively.

  • At t = 0, all mass of the distribution is at 0.

  • Move all mass at points w with |⟨w, v_{t+1}⟩| > c to 0 – this mass will stay at 0 for the remainder of the process. We refer to such w as being corrupted.

  • For the remaining mass, evolve the distribution according to the sign choice and update step of Algorithm 1.

We let D_t denote the distribution of the vector after t time steps of the above procedure.

One key observation is that at each stage we have that D_t is symmetric about the origin. We will ultimately compare D_t to N(0, M_t) for an appropriate set of covariance matrices M_t.

Definition 2.8.

Fix c and the vectors v_1, …, v_T. Define M_0, and for t ≥ 1 define M_t inductively as

We now note that these covariance matrices are pointwise upper bounded independent of time.

Lemma 2.9.

Let , , . If then for any vector with

satisfies

Proof.

It is direct that . Note that

as . ∎

Applying Lemma 2.9 inductively gives us the following immediate corollary.

Corollary 2.10.

For all we have that .

The following lemma is the key step in our analysis, where we show that the distributions D_t are spread by normal distributions with the covariance matrices M_t defined in Definition 2.8.

Lemma 2.11.

N(0, M_t) is a spread of D_t for all times t ≥ 0.

Proof.

For simplicity, we write throughout the proof. We can compute that if the algorithm does not fail (i.e., the failure check in Algorithm 1 does not trigger) then

Define to be except when became corrupted – in this case set to . Let be the distribution of . As is symmetric, is a spread of . Define random variables

where is defined in Algorithm 1. By definition, is distributed as

By induction and Lemma 2.4 we have that

is a spread of , which is a spread of by Lemma 2.3.

Note that by definition, this variable is mean zero and supported on [−1, 1] if w is not corrupted. By Lemma 2.6, we have that is a spread of for each . Thus by Lemma 2.5, we have that

is a spread of , as desired. ∎

We can get tail bounds on N(0, M_t) because the covariance matrices M_t are always bounded (Corollary 2.10).

Lemma 2.12.

For any vector with we have that

Proof.

Note that is distributed as and by Corollary 2.10. This implies that is a spread of . The result then follows by noting that

where we have used Lemma 2.2 on the convex function

We are now ready to complete the proof of Theorem 1.1 by combining the fact that D_t is spread by N(0, M_t) (Lemma 2.11) with the tail bounds in Lemma 2.12.

Proof of Theorem 1.1.

It suffices to bound the total amount of additional corrupted mass in D_{t+1} compared to D_t. To bound this, note for any vector v with ‖v‖_2 ≤ 1 that

(2.1)
(2.2)

by the choice of c. Here, we have used that N(0, M_t) is a spread of D_t through Lemma 2.11, Lemma 2.2 applied to a convex function, and Lemma 2.12. Now, at each step we union bound over the 2n choices ±e_j, where e_j denotes the unit basis vector for coordinate j. ∎

Proof of Theorem 1.2.

Modify Algorithm 1 to use the appropriate choice of c for this setting, and note that we no longer need to maintain the discrepancy bound at every step, only at the end. As in (2.1) and (2.2) we have that

Therefore, a union bound shows that the algorithm does not fail over all n steps with high probability. Again, as in (2.1) and (2.2), for any basis vector e_j we have that

Now union bounding over all coordinates j gives the bound. ∎

3. Applications

We can obtain improved bounds for several geometric discrepancy problems given in [9]. Additionally, given Theorem 1.1 our algorithms are simpler, and do not require the Haar basis / wavelets used in [9].

3.1. Interval discrepancy.

The d-dimensional interval discrepancy problem is to assign signs ε_t ∈ {±1} to points x_1, …, x_T ∈ [0,1]^d so as to minimize the discrepancy |Σ_{t ≤ T : x_t(j) ∈ I} ε_t| over all intervals I ⊆ [0,1] and coordinates j ∈ [d]. Applying Theorem 1.1 gives bounds for the d-dimensional interval discrepancy problem matching the known lower bounds up to polylogarithmic factors shown in [9, Theorem 1.2]. Additionally, our result works when the points are sampled from an arbitrary known distribution p on [0,1]^d, instead of only in the case where the distribution is uniform.

Theorem 3.1.

There is an online algorithm which for vectors chosen from a known distribution p on maintains for all intervals and with probability

Sketch.

For simplicity we consider the case when the distribution p is absolutely continuous with respect to the Lebesgue measure; this assumption can be removed with some care. Define the quantiles of p in each coordinate and sort them; by increasing T by constants we can assume that T = 2^k for some integer k. Now, consider the set 𝒟 of dyadic intervals built on consecutive quantiles. Note that each point is in O(d log T) such intervals in total, over all dimensions we wish to consider. Therefore, the proof of Theorem 1.1 shows that the distribution of the discrepancy over the dyadic intervals is subgaussian with the appropriate parameter. Now, every interval can be written as a union of O(log T) intervals in 𝒟, plus small error terms on the ends. Therefore, the corresponding test vector has ℓ2-norm bounded by O(√(log T)). Because there are O(dT²) total intervals, all have the claimed discrepancy bound with high probability, as desired. ∎

3.2. Online Tusnády’s problem.

Tusnády’s problem is the following – given points in [0,1]^d, assign signs to minimize the maximum discrepancy over axis-parallel boxes. The best known upper bound is due to Nikolov [21] and the best known lower bound is due to Matoušek and Nikolov [19]. One can ask an analogous online version, which is to minimize the maximum, over boxes B and times t, of the signed discrepancy of the points seen so far inside B.

We can apply Theorem 1.1 to the online Tusnády problem in the case where the points are sampled from a known distribution p on [0,1]^d. Our bounds are more general and improve over the bounds of [9, 13], which only worked in the case of product distributions.

Theorem 3.2.

There is an online algorithm which for vectors chosen from a known distribution p on maintains for all boxes and with probability

Sketch.

As before, for simplicity, we only consider the case where p is absolutely continuous with respect to the Lebesgue measure. Compute the quantiles as done in the proof of Theorem 3.1. Now, build the dyadic decomposition as in Theorem 3.1 one dimension at a time recursively to build dyadic boxes. Each point is in O(log^d T) such dyadic boxes. Thus, the proof of Theorem 1.1 shows that the distribution of discrepancy over the dyadic boxes is subgaussian. Now, every box can be written as the union of O(log^d T) dyadic boxes, plus small errors. Therefore, the corresponding test vector has norm at most O(log^{d/2} T). As there are at most T^{O(d)} boxes to consider, we achieve the claimed discrepancy bound (polylogarithmic in T for fixed d) with high probability. The rounding error can be handled with a Chernoff bound. ∎
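Under the same toy setup as the sketch in Section 3.1, the dyadic-box feature is simply the tensor product of the per-coordinate dyadic features (again our own illustration, reusing the hypothetical dyadic_feature helper from that sketch):

```python
import numpy as np

def dyadic_box_feature(point, levels):
    """0/1 indicator of the dyadic boxes containing `point`, obtained as the
    tensor product of per-coordinate dyadic features; normalized so ||v||_2 <= 1."""
    v = np.ones(1)
    for x in point:
        # dyadic_feature is the per-coordinate helper from the Section 3.1 sketch.
        v = np.outer(v, dyadic_feature(x, levels)).ravel()
    return v / np.sqrt(float(levels) ** len(point))
```

Each coordinate contributes exactly `levels` ones, so the tensor product has levels^d ones, matching the O(log^d T) count used in the sketch above.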

We end by noting that our bounds for Theorems 3.1 and 3.2 hold in the offline setting as well. Indeed, knowing the input points allows us to form the dyadic partitions just as described in the proofs. For the interval discrepancy problem in the offline setting, an upper bound from [26] is known, as well as lower bounds from [20, 15]. Therefore, our bound for the interval discrepancy problem is within a single logarithmic-type factor of the best known offline bound, and within a few such factors of the known lower bound. Additionally, our bound is within a few logarithmic-type factors of the best known offline bound for Tusnády’s problem, and of the known lower bound.

Acknowledgements

The authors thank Noga Alon, Vishesh Jain, Mark Sellke, Aaron Sidford, and Yufei Zhao for helpful comments regarding the manuscript. We thank Nikhil Bansal for observations which led to improvements of several logarithmic factors in Theorem 3.1 and Theorem 3.2. R.A. is supported by an NSF Graduate Research Fellowship. Y.L. was supported by the Department of Defense (DoD) through the National Defense Science and Engineering Graduate Fellowship (NDSEG) Program.

References

  • [1] Juhan Aru, Bhargav Narayanan, Alex Scott, and Ramarathnam Venkatesan, Balancing sums of random vectors, Discrete Anal. (2018), Paper No. 4, 17.
  • [2] Wojciech Banaszczyk, Balancing vectors and Gaussian measures of n-dimensional convex bodies, Random Structures & Algorithms 12 (1998), 351–360.
  • [3] Wojciech Banaszczyk, On series of signed vectors and their rearrangements, Random Structures & Algorithms 40 (2012), 301–316.
  • [4] Nikhil Bansal, Constructive algorithms for discrepancy minimization, 51st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, IEEE Computer Society, 2010, pp. 3–10.
  • [5] Nikhil Bansal, Daniel Dadush, and Shashwat Garg, An algorithm for Komlós conjecture matching Banaszczyk’s bound, IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA (Irit Dinur, ed.), IEEE Computer Society, 2016, pp. 788–799.
  • [6] Nikhil Bansal, Daniel Dadush, Shashwat Garg, and Shachar Lovett, The Gram-Schmidt walk: a cure for the Banaszczyk blues, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018 (Ilias Diakonikolas, David Kempe, and Monika Henzinger, eds.), ACM, 2018, pp. 587–597.

  • [7] Nikhil Bansal and Shashwat Garg, Algorithmic discrepancy beyond partial coloring, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017, pp. 914–926.
  • [8] Nikhil Bansal, Haotian Jiang, Raghu Meka, Sahil Singla, and Makrand Sinha, Forthcoming work.
  • [9] Nikhil Bansal, Haotian Jiang, Sahil Singla, and Makrand Sinha, Online vector balancing and geometric discrepancy, STOC 2020 (2020).
  • [10] Nikhil Bansal and Joel H Spencer, On-line balancing of random inputs, arXiv preprint arXiv:1903.06898 (2019).
  • [11] Daniel Dadush, https://homepages.cwi.nl/~dadush/workshop/discrepancy-ip/open-problems.html.
  • [12] Daniel Dadush, Aleksandar Nikolov, Kunal Talwar, and Nicole Tomczak-Jaegermann, Balancing vectors in any norm, 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), IEEE, 2018, pp. 1–10.
  • [13] Raaz Dwivedi, Ohad N Feldheim, Ori Gurel-Gurevich, and Aaditya Ramdas, The power of online thinning in reducing discrepancy, Probability Theory and Related Fields 174 (2019), 103–131.
  • [14] Ronen Eldan and Mohit Singh, Efficient algorithms for discrepancy minimization in convex sets, Random Struct. Algorithms 53 (2018), 289–307.
  • [15] Cole Franks, A simplified disproof of Beck’s three permutations conjecture and an application to root-mean-squared discrepancy, arXiv preprint arXiv:1811.01102 (2018).
  • [16] Josef Hadar and William R Russell, Rules for ordering uncertain prospects, The American Economic Review 59 (1969), 25–34.
  • [17] Giora Hanoch and Haim Levy, The efficiency analysis of choices involving risk, The Review of Economic Studies 36 (1969), 335–346.
  • [18] Shachar Lovett and Raghu Meka, Constructive discrepancy minimization by walking on the edges, SIAM J. Comput. 44 (2015), 1573–1582.
  • [19] Jiří Matoušek and Aleksandar Nikolov, Combinatorial discrepancy for boxes via the γ_2 norm, 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands (Lars Arge and János Pach, eds.), LIPIcs, vol. 34, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015, pp. 1–15.
  • [20] Alantha Newman, Ofer Neiman, and Aleksandar Nikolov, Beck’s three permutations conjecture: A counterexample and some consequences, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, IEEE, 2012, pp. 253–262.
  • [21] Aleksandar Nikolov, Tighter bounds for the discrepancy of boxes and polytopes, Mathematika 63 (2017), 1091–1113.
  • [22] Michael Rothschild and Joseph E Stiglitz, Increasing risk: I. A definition, Journal of Economic theory 2 (1970), 225–243.
  • [23] Thomas Rothvoss, Constructive discrepancy minimization for convex sets, SIAM Journal on Computing 46 (2017), 224–234.
  • [24] Joel Spencer, Balancing games, J. Comb. Theory, Ser. B 23 (1977), 68–74.
  • [25] Joel Spencer, Six standard deviations suffice, Trans. Amer. Math. Soc. 289 (1985), 679–706.
  • [26] Joel H Spencer, Aravind Srinivasan, and Prasad Tetali, The discrepancy of permutation families, Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (New Orleans, LA, 1997), ACM, New York, 1997, pp. 692–701.