    # On-Line Balancing of Random Inputs

We consider an online vector balancing game where vectors v_t, chosen uniformly at random in {-1,+1}^n, arrive over time and a sign x_t ∈{-1,+1} must be picked immediately upon the arrival of v_t. The goal is to minimize the L^∞ norm of the signed sum ∑_t x_t v_t. We give an online strategy for picking the signs x_t that has value O(n^1/2) with high probability. Up to constants, this is the best possible even when the vectors are given in advance.


## 1 Introduction

A random set of vectors is sent to our hero, Carole. The vectors v_1, …, v_n are each uniform among the vectors with coordinates ±1, and they are mutually independent. Carole's mission is to balance the vectors into two nearly equal groups. To that end she assigns to each vector v_i a sign x_i ∈ {-1,+1}. Critically, the signs have to be determined on-line: Carole has seen only the vectors v_1, …, v_i when she determines the sign x_i. Set

 P = x_1 v_1 + … + x_n v_n  (1.1)

Carole's goal is to keep all of the coordinates of P small in absolute value. We set val = |P|_∞, the L^∞ norm of P. We consider VAL, the value of this (solitaire) game, which Carole tries to minimize.

As our main result, we give a simple algorithm for Carole (with somewhat less simple analysis!) such that val ≤ K√n with high probability. Here K is an absolute constant which we do not attempt to optimize.

To give a feeling, imagine Carole simply selected the signs x_t uniformly and independently, not looking at the v_t. Then each coordinate of P would have the distribution of a sum of n independent uniform ±1 variables, roughly √n · N with N standard normal. For, say, K = 10, the great preponderance of the coordinates would lie in [-K√n, +K√n]. However, there would be a small but positive proportion of outliers, coordinates not lying in that interval. Indeed, the largest coordinate, with high probability, would be (1 + o(1))√(2n ln n). Carole's task, from this vantage point, is to avoid outliers.
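This random-play baseline is easy to see in simulation. The sketch below (illustrative parameters, not from the paper) plays the game with uniformly random signs and reports the largest coordinate alongside √(2n ln n):

```python
import math
import random

def random_play_max_coordinate(n, seed=0):
    """Play the n-round game with signs x_t chosen uniformly at random,
    ignoring the arriving vectors, and return the final L^infinity norm of P."""
    rng = random.Random(seed)
    position = [0] * n  # the coordinates of P
    for _ in range(n):
        v = [rng.choice((-1, 1)) for _ in range(n)]
        x = rng.choice((-1, 1))  # Carole ignores v entirely
        for j in range(n):
            position[j] += x * v[j]
    return max(abs(d) for d in position)

# the largest coordinate concentrates near sqrt(2 n ln n),
# noticeably above the sqrt(n) scale that the main theorem achieves
n = 400
print(random_play_max_coordinate(n), math.sqrt(2 * n * math.log(n)))
```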

We more generally define the value of the T-round game, where there are T vectors in {-1,+1}^n with T arbitrary. In particular, think of T as very large. We modify our algorithm for T = n to give an algorithm for which val = O(√n) with high probability. This is considered in Section 3.3.

### 1.1 Four Discrepancies

Paul, our villain, sends v_1, …, v_n to Carole. Carole balances with signs x_1, …, x_n. The value of this now two-player game is |P|_∞ with P as above. There are four variants. Paul can be an adversary (trying to make |P|_∞ large) or can play randomly (as above). Carole can play on-line (as above) or off-line, waiting to see all of v_1, …, v_n before deciding on the signs x_1, …, x_n. All of the variants are interesting.

Paul adversarial, Carole offline. Here VAL = Θ(√n). This was first shown by the senior author, and the first algorithmic strategy (for Carole) was given by the junior author.

Paul random, Carole offline. Here VAL is far smaller than in the adversarial case. In recent work, a precise limiting value for VAL (attained with high probability) was conjectured, with strong partial results.

Paul adversarial, Carole online. Here VAL = Θ(√(n ln n)). These results may be found in the senior author's monograph. Up to constants, Carole can do no better than playing randomly. It was this result that made our current result a bit surprising.

Paul random, Carole online. Here VAL = Θ(√n), the object of our current work.

The T-round setting is also very interesting. If Paul picks vectors adversarially, and Carole plays online, then no better bound is possible than exponential in n. Basically, all Carole can do is alternate signs when one of the 2^n possible vectors is repeated.

### 1.2 Alternate Formulations

We return to our focus, the random online case. We find it useful to consider the problem in a variety of guises.

Consider an n-round (solitaire) game with a position vector P. Initially P = 0. On each round a random v ∈ {-1,+1}^n is given. Carole must then reset either P ← P + v or P ← P − v. The value of the game is |P|_∞ with P the position vector after the n rounds have been completed.

Chip game. Consider n chips on Z, all initially at 0. Each round each chip selects a random direction. Carole then either moves all of the chips in their selected direction or moves all of the chips in the opposite of their selected direction. After n rounds the value is the longest distance from the origin to a chip. (Here chip j at position d represents that the j-th coordinate of P is d.)

Folded chip game. Consider n chips on the nonnegative integers, initially all at 0. The rules are as above except that a chip at position 0 automatically is moved to 1. (Here the chip position is the absolute value of its position in the previous formulation.)
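The equivalence between the vector view and the folded chip view can be checked mechanically. In the hypothetical simulation below (any sign strategy would do; here Carole plays randomly), the folded positions are driven by derived chip directions and always equal the absolute coordinates:

```python
import random

def run_both_views(n, rounds, seed=0):
    """Simulate the vector game and, in parallel, the folded chip game driven
    by the corresponding chip directions; verify they agree at every round."""
    rng = random.Random(seed)
    d = [0] * n  # coordinates of P (vector view)
    q = [0] * n  # chip positions (folded view); q[j] should always equal |d[j]|
    for _ in range(rounds):
        v = [rng.choice((-1, 1)) for _ in range(n)]
        # a chip's selected direction is its coordinate's step as seen from |d_j|
        u = [vj if dj >= 0 else -vj for dj, vj in zip(d, v)]
        x = rng.choice((-1, 1))  # Carole's sign (random for this demo)
        d = [dj + x * vj for dj, vj in zip(d, v)]
        # folded move: a chip at 0 goes to 1 no matter what; otherwise q -> q + x*u
        q = [1 if qj == 0 else qj + x * uj for qj, uj in zip(q, u)]
        assert q == [abs(dj) for dj in d]
    return q

run_both_views(20, 500)
print("folded view matches |d_j| at every round")
```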

### 1.3 Erdős

Historically, discrepancy was examined for families of sets. Let (Ω, 𝒜) be a set system with Ω = {1, …, n} and 𝒜 a collection of subsets of Ω. For a two-coloring χ : Ω → {-1,+1}, the discrepancy of a set A ∈ 𝒜 is defined as χ(A) = ∑_{i∈A} χ(i), and measures the imbalance from an even split of A. The discrepancy of the system is defined as

 disc(𝒜) = min_χ max_{A∈𝒜} |χ(A)|  (1.2)

That is, it is the minimum over all possible two-colorings χ of the maximum imbalance of the sets in 𝒜. Erdős famously asked for the maximal possible disc(𝒜) over all such set systems with n sets on n points. It was in this formulation that the senior author first showed that disc(𝒜) = O(√n).

Consider the incidence matrix M for the set system. That is, set M_{ji} = 1 if i ∈ A_j, otherwise M_{ji} = 0. Let v_1, …, v_n denote the column vectors of M. The coloring χ corresponds to the choice of signs x_i = χ(i). Then |x_1 v_1 + … + x_n v_n|_∞ measures the maximal imbalance of the coloring. The set-system problem is then essentially the adversarial, off-line Paul/Carole game. The distinction is only that the coordinates of the v_i are 0, 1 instead of ±1.
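For intuition, the definition can be checked by brute force on a toy system (the sets below are made up for illustration):

```python
from itertools import product

def disc(sets, n):
    """disc(A) = min over two-colorings chi of max_{A in A} |chi(A)|,
    computed by exhaustive search over all 2^n colorings."""
    best = None
    for chi in product((-1, 1), repeat=n):
        worst = max(abs(sum(chi[i] for i in A)) for A in sets)
        if best is None or worst < best:
            best = worst
    return best

# a toy system on 4 points: alternating colors balance every set exactly
sets = [{0, 1}, {1, 2}, {2, 3}, {0, 1, 2, 3}]
print(disc(sets, 4))          # 0: every set can be split evenly
print(disc(sets + [{0}], 4))  # 1: a singleton set can never be balanced
```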

## 2 Carole’s Algorithm

The time will be indexed t = 1, …, n. Initially P = 0. In round t a random v_t arrives and Carole resets P ← P ± v_t. Let P(t) denote the vector P after the t-th round. Let d_j(t) denote the j-th coordinate of P(t).

The algorithm depends on variables c and p. We shall want |d_j(t)| < √(cn) for all j and t with high probability. c will be a large constant as specified later. p will be a positive integer central to the algorithm. Specific values could be fixed at the outset; however, we use the variables c and p in the analysis until the end to understand the various dependencies among the parameters.

Define the gap for set j as

 g_j(t) := cn − d_j(t)²  (2.3)

The algorithm will, with high probability, keep all so that the gaps are positive. Let

 Φ_j(t) = c^p n^{p−1} g_j(t)^{−p}  (2.4)

and define the potential function

 Φ(t) = ∑_j Φ_j(t) = c^p n^{p−1} ∑_{j=1}^n g_j(t)^{−p}  (2.5)

As g_j(t) ≤ cn for all j and t, Φ(t) ≥ 1. Note that the potential blows up as any discrepancy |d_j(t)| approaches √(cn). The factor c^p n^{p−1} provides a convenient normalization: when all d_j(t) = 0, Φ(t) = 1.

The algorithm is simple. On the t-th round, seeing v_t, Carole selects the sign x_t ∈ {-1,+1} that minimizes the increase in the potential, Φ(t) − Φ(t−1).
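A direct implementation of this rule is sketched below. The constants c and p here are illustrative choices for a quick experiment, not the ones used in the analysis:

```python
import math
import random

def carole_greedy(n, c=16.0, p=4, seed=0):
    """One play of the n-round game where Carole picks the sign x_t minimizing
    the potential Phi = c^p n^(p-1) sum_j (c*n - d_j^2)^(-p)."""
    rng = random.Random(seed)
    d = [0] * n

    def potential(disc):
        # infinite penalty if a discrepancy ever reached the barrier sqrt(c*n)
        if any(c * n - dj * dj <= 0 for dj in disc):
            return math.inf
        return (c ** p) * (n ** (p - 1)) * sum((c * n - dj * dj) ** (-p) for dj in disc)

    for _ in range(n):
        v = [rng.choice((-1, 1)) for _ in range(n)]
        plus = [dj + vj for dj, vj in zip(d, v)]
        minus = [dj - vj for dj, vj in zip(d, v)]
        d = plus if potential(plus) <= potential(minus) else minus
    return max(abs(dj) for dj in d)

print(carole_greedy(200))  # stays well below the barrier sqrt(16 * 200) ~ 56.6
```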

### 2.1 Rough Analysis

Let's imagine all the d_j(t) as positive and near the boundary √(cn). Since cn − d² = (√(cn) − d)(√(cn) + d) ≈ 2√(cn)(√(cn) − d) for d near √(cn), the gap basically acts like

 g*_j(t) = 2√(cn)[√(cn) − d_j(t)]  (2.6)

Let Φ*_j, Φ* be the potential values using this cleaner gap function. Suppose all g*_j(t) = x. Then Φ*_j(t) = c^p n^{p−1} x^{−p} and Φ*(t) = c^p n^p x^{−p}. Set f(x) = x^{−p} and consider the change (x large) when x is incremented or decremented by one. From Taylor series we approximate

 (f(x±1) − f(x))/f(x) ∼ ∓p x^{−1} + (p(p+1)/2) x^{−2}  (2.7)
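For f(x) = x^{−p}, (2.7) is just the second-order Taylor expansion written out:

```latex
f(x \pm 1) = x^{-p} \mp p\,x^{-p-1} + \frac{p(p+1)}{2}\,x^{-p-2} \mp \cdots,
\qquad\text{so}\qquad
\frac{f(x \pm 1) - f(x)}{f(x)} = \mp p\,x^{-1} + \frac{p(p+1)}{2}\,x^{-2} + O(x^{-3}).
```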

ignoring the higher order terms. Consider the change in Φ* when a random vector is added. We break it into a linear part L and a quadratic part Q. We compare their sizes using (2.7). The quadratic term is always positive, of order p² x^{−2} for each term, adding to a total of order n p² x^{−2} (all relative to f(x)). The linear term is ∓p x^{−1} for each term. As the vector (critically!) is random, the signs are random and so the terms add to a distribution roughly p x^{−1} √n · N, with N standard normal. Carole's sign selection, effectively, replaces L with −|L|, so the change in Φ* is proportional to −|L| + Q. With probability at least 1/2, say, |N| ≥ 1/2. Then, fixing x, |L| will be on the order of p√n x^{−1} while Q will be on the order of p²n x^{−2}. When x is large, say x ≫ p√n, the linear term |L| will be much bigger than the positive quadratic term Q.

Now let's keep the total potential fixed but suppose that some of the gaps were smaller by a factor λ > 1 and the other gaps had zero effect on the total potential. Say, giving a good parametrization, that g*_j = x/λ for nλ^{−p} values of j. (As the potential takes the gap to the power −p, the total potential will remain the same.) Again we break the change in Φ* into L and Q. We think of x as fixed and consider the effect of λ. The quadratic terms are now larger for each term, an extra factor of λ^{p+2}. But the number of terms is nλ^{−p}, so the new value of Q has been multiplied by λ². The linear terms are larger for each term by an extra factor of λ^{p+1}. Now, however, we sum only nλ^{−p} random signs, giving a factor √n λ^{−p/2}. Compared to the base case, the quadratic term has been multiplied by λ² while the linear term has been multiplied by λ^{1+p/2}. As p gets bigger the domination of |L| over Q becomes stronger. This gives us "extra room" and the argument works even if only a proportion of the potential function came from these coordinates j.

In the actual analysis the total potential is in a prescribed moderate range. However, we cannot assume that all of the potential comes from coordinates with the same gaps. We split the coordinates into classes, those in the same class having roughly the same gap value. We find some class that carries so much of the total potential that |L| will dominate Q. Making all this precise is the object of Lemma 2.2 below.

### 2.2 Analysis

We will show the following result.

###### Theorem 2.1.

The strategy above achieves value val = O(√n) with high probability.

The potential starts initially at Φ(0) = 1. Let H be a sufficiently large constant. We consider the situation when the potential lies between H/2 and H. (The value H could be any sufficiently large constant.) We will show that if Φ(t−1) ≤ H, then at any step the potential can increase by at most O(n^{−1+2/p}). More importantly, whenever Φ(t−1) ≥ H/2, the sign for the element at time t can be chosen so that there is a strong negative drift that more than offsets the increase. More formally, we can decompose the rise in potential into a linear part L and a quadratic part Q, satisfying the following properties.

###### Lemma 2.2.

Consider time t ≤ n. The increase in potential Φ(t) − Φ(t−1) is a random variable (depending on the randomness in column v_t) that can be written as L x_t + Q, where

1. Q = O(n^{−1+2/p}) with probability 1, whenever Φ(t−1) ≤ H.

2. |L| ≫ Q with probability at least a fixed constant, whenever H/2 ≤ Φ(t−1) ≤ H.

Lemma 2.2 directly implies Theorem 2.1 as follows. Whenever H/2 ≤ Φ(t−1) ≤ H, the potential has a strong negative drift, as Carole chooses x_t so that L x_t = −|L|. As the interval [H/2, H] has constant size, and the positive increment is O(n^{−1+2/p}), standard probabilistic tail bounds give that whenever Φ reaches H/2 it will have an exponentially small probability of reaching H. As there are only n rounds of play, the result follows.

We now compute some relevant quantities. Let x_t denote the color of element t, and let y(j,t) be the indicator random variable for the event that element t lies in the set S_j. Then,

 Δd_j(t) := d_j(t) − d_j(t−1) = x_t y(j,t)  (2.8)

and note that |Δd_j(t)| ≤ 1.

Let us condition on the event that Φ(t−1) ≤ H. Then Φ_j(t−1) ≤ H for each j, and hence g_j(t−1) ≥ cn(Hn)^{−1/p}, which implies that |d_j(t−1)| < (cn)^{1/2}.

We now upper bound the increase in potential, as follows.

Let f(x) = (cn − x²)^{−p} with |x| < (cn)^{1/2}. Then f′(x) = 2px(cn − x²)^{−p−1}, and

 f″(x) = 2p(cn − x²)^{−p−1} + 4p(p+1)x²(cn − x²)^{−p−2}
   = (2p(cn − x²) + 4p(p+1)x²)(cn − x²)^{−p−2}
   ≤ 4p(p+1)cn(cn − x²)^{−p−2}  (as x² < cn)  (2.9)

As f(x+η) − f(x) = f′(x)η + ½f″(ξ)η² for some ξ between x and x+η, and as |η| ≤ 1 changes the gap only negligibly, the bound (2.9) applies at ξ as well. Using the expression for f′ and the upper bound on f″ in (2.9), we have that for η ∈ {−1, 0, +1},

 f(x+η) − f(x) ≤ 2px(cn − x²)^{−p−1} η + 4p(p+1)cn(cn − x²)^{−p−2}  (2.10)

Setting x = d_j(t−1) and η = Δd_j(t) = x_t y(j,t) gives

 Φ_j(t) − Φ_j(t−1) ≤ L_j(t) x_t + Q_j(t)  (2.11)

where

 L_j := c^p n^{p−1} · 2p d_j(t−1) y(j,t) / g_j(t−1)^{p+1}  and  Q_j := c^p n^{p−1} · 4p(p+1)cn / g_j(t−1)^{p+2}  (2.12)

As we will only be interested in the time t, henceforth we drop it from the notation. Let d_j := d_j(t−1), g_j := g_j(t−1) and y_j := y(j,t). Then,

 L = ∑_j c^p n^{p−1} · 2p d_j y_j / g_j^{p+1}  and  Q = ∑_j c^p n^{p−1} · 4p(p+1)cn / g_j^{p+2}  (2.13)

#### Notation.

Let β := 1 + 1/p. For k ≥ 0 we say that a set S_j lies in class k if g_j ∈ (cnβ^{−(k+1)}, cnβ^{−k}], or equivalently Φ_j ∈ [β^{kp}/n, β^{(k+1)p}/n). Let n_k denote the number of sets of class k. As g_j ≥ cnβ^{−(k+1)} for a set of class k, Q by (2.13) can be upper bounded as

 Q ≤ (4p(p+1)/(cn²)) ∑_{k≥0} β^{(k+1)(p+2)} n_k.  (2.14)

We also have the following useful bounds.

###### Lemma 2.3.

If Φ ≤ H, then

1. For each class k, n_k ≤ Hnβ^{−kp}.

2. Q = O(n^{−1+2/p}).

###### Proof.

As Φ = ∑_j Φ_j and Φ_j ≥ β^{kp}/n for a set of class k, we have that

 Φ ≥ c^p n^{p−1} ∑_{k≥0} n_k β^{kp} (cn)^{−p} = ∑_{k≥0} β^{kp} n_k / n.  (2.15)

As Φ ≤ H, each class contributes at most H, which gives n_k ≤ Hnβ^{−kp}.

We now bound Q. Let k_max be the maximum class index k for which n_k ≥ 1. As n_k ≤ Hnβ^{−kp}, we have β^{p·k_max} ≤ Hn.

Plugging the bound for n_k into (2.14) gives

 Q ≤ ∑_{k=0}^{k_max} (4p(p+1)H/(cn)) β^{p+2+2k} = O(β^{2k_max}/n) = O(n^{−1+2/p}),  (2.16)

where we use that β^{2k_max} = (β^{p·k_max})^{2/p} ≤ (Hn)^{2/p}, treating H, c and p as constants. ∎

We now focus on lower bounding |L| when Φ ≥ H/2. Recall that L = ∑_j c^p n^{p−1} · 2p d_j y_j / g_j^{p+1}, and hence L is a weighted sum of the random variables y_j. We will call w_j := 2p c^p n^{p−1} d_j / g_j^{p+1} the weight of y_j.

We will use the following fact.

###### Lemma 2.4.

Let a_1, …, a_m all have absolute value at least a. Consider the signed sums ∑_{i=1}^m ε_i a_i for ε ∈ {−1,+1}^m. The number of sums that lie in any interval of length 2a is maximized when all the a_i = a and the interval is centered at the origin, and hence is at most O(2^m/√m). In particular, taking an interval of length λa√m for a small constant λ, the sums lie in it only a small fraction of the time.

We use this as follows, to show that the probability that |L| is small is itself small. Consider the indices j where the weight w_j lies in a (suitably chosen) weight class, and fix the signs outside that class. Then, for any values of the signs outside that class, the probability that the signs in the class put the total sum in a short interval is bounded by the fraction in the lemma above.

We now do the computations.

###### Claim 2.5.

For a set of class k ≥ 1, the weight is at least p^{1/2} β^{k(p+1)} (cn³)^{−1/2}.

###### Proof.

This follows as w_j = 2p c^p n^{p−1} d_j / g_j^{p+1}, and for any class k ≥ 1, g_j ≤ cnβ^{−k} and d_j = (cn − g_j)^{1/2} ≥ (cn(1 − β^{−1}))^{1/2} ≥ (cn/(2p))^{1/2}. ∎

By Lemma 2.4 and Claim 2.5, to show that |L| ≫ Q with a constant probability, it would suffice to show that there is some class k* ≥ 1 such that

 p^{1/2} β^{k*(p+1)} (cn³)^{−1/2} n_{k*}^{1/2} ≫ Q  (2.17)

Note that only classes k ≥ 1 are considered in Claim 2.5, while Q also has terms from class 0, so we need a final technical lemma to show that this contribution from class 0 can be ignored.

###### Lemma 2.6.

If Φ ≥ H/2, the contribution of the class 0 sets to Q is at most Q/2.

###### Proof.

As Q_j ≤ 4p(p+1)β^{p+2}/(cn²) for a class 0 set, and there are at most n such sets, the contribution of class 0 to Q is at most 4p(p+1)β^{p+2}/(cn). So to prove the claim, it suffices to show that Q ≥ 8p(p+1)β^{p+2}/(cn).

As g_j ≥ cnβ^{−(k+1)} for a set of class k, we have

 H/2 ≤ Φ ≤ c^p n^{p−1} ∑_{k≥0} n_k (β^{k+1}/(cn))^p = (1/n) ∑_{k≥0} β^{(k+1)p} n_k,  (2.18)

which gives ∑_{k≥0} β^{kp} n_k ≥ Hn/(2β^p). Using this together with β^{k(p+2)} ≥ β^{kp} for class k sets and the expression for Q in (2.13), we get

 Q ≥ ∑_{k≥0} (4p(p+1)/(cn²)) n_k β^{k(p+2)} ≥ 2p(p+1)β^{−p}H/(cn) ≥ 8p(p+1)β^{p+2}/(cn),  (2.19)

where the last inequality uses our choice of H ≥ 4β^{2p+2}. ∎

By (2.14) and the lemma above, to prove (2.17) it suffices to show the following.

###### Lemma 2.7.

There is a k* ≥ 1 such that

 β^{(p+1)k*} n_{k*}^{1/2} ≫ O(p^{3/2}) ∑_{k≥1} β^{(k+1)(p+2)} n_k (cn)^{−1/2}  (2.20)
###### Proof.

Let ℓ_k := β^{kp} n_k/(Hn), and note that by Lemma 2.3, ℓ_k ≤ 1 for all k. Writing (2.20) in terms of the ℓ_k, we need to show that there is some k* ≥ 1 satisfying

 (ℓ_{k*} β^{k*(p+2)})^{1/2} ≫ O((Hp³/c)^{1/2}) ∑_{k≥1} ℓ_k β^{2k+p+2}  (2.21)

Let v := max_{k≥1} ℓ_k β^{3k}, and let k̂ be an index attaining this maximum. Then ℓ_k β^{2k} ≤ vβ^{−k} for all k ≥ 1, and hence the sum on the right hand side of (2.21) is at most

 ∑_{k≥1} β^{p+2} v β^{−k} ≤ β^{p+2} v/(β − 1) = O(pv),

as β = 1 + 1/p gives β^{p+2} = O(1) and (β − 1)^{−1} = p. Next, as ℓ_{k̂} β^{k̂(p+2)} = v β^{k̂(p−1)}, the left hand side of (2.21) with k* = k̂ is at least (vβ^{k̂(p−1)})^{1/2} ≥ v, where the inequality follows as v ≤ β^{3k̂}, so that β^{k̂(p−1)} ≥ v^{(p−1)/3} ≥ v for p ≥ 4 (and for v ≤ 1 the left hand side is already at least v^{1/2} ≥ v). So by (2.21), choosing c ≫ Hp⁵ finishes the proof. ∎

## 3 Arbitrary time horizon

We now consider the T-round setting, where T can be arbitrarily large compared to n. In particular, a uniformly chosen vector v_t ∈ {−1,+1}^n arrives at time t, and Carole then selects a sign x_t ∈ {−1,+1}. As previously, P(t) = x_1v_1 + … + x_tv_t, and the value after T rounds is |P(T)|_∞.

We will assume that T is fixed in advance by Paul (and is not known to Carole). In particular, if T can be chosen adaptively by Paul depending on Carole's play, then the problem is not very interesting and the exponential in n lower bound for adversarial input vectors still holds. This is because even if the input vectors are random, after a sufficiently long time, some worst case adversarial sequence (against any online strategy) will eventually arrive.

Our main result is a strategy for Carole, described in Section 3.3, that achieves val = O(√n) with high probability. Before proving this result, we describe two strategies that achieve a weaker (but still independent of T) bound of O(√n log n). These are very natural and interesting on their own, with simple analyses, and are discussed in Sections 3.1 and 3.2.

### 3.1 Strategy 1

The first strategy is based on a potential function approach as before, but with an exponential penalty function. This has the drawback of losing an extra logarithmic factor, but has the advantage that the potential has a negative drift whenever it exceeds a certain threshold (without requiring an upper bound on the potential as in Lemma 2.2). This allows us to bound the discrepancy for an arbitrary time horizon, as whenever the potential exceeds the threshold the negative drift will bring it back quickly.

#### Strategy.

Consider a time step t ≤ T. Let d_i(t) be the discrepancy of the i-th coordinate at the end of time t. Consider the potential

 Φ(t) = ∑_{i=1}^n cosh(λ d_i(t)),

where λ := 1/(C√n) and C is a large constant greater than 4.

As before, when presented with the vector v_t, Carole chooses the sign x_t that minimizes the increase in potential, Φ(t) − Φ(t−1).
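A sketch of this strategy is below, with an illustrative choice of the constant C (nothing here is tuned to the analysis):

```python
import math
import random

def carole_cosh(n, T, C=4.0, seed=0):
    """Greedy strategy with exponential potential Phi(t) = sum_i cosh(lam*d_i(t)),
    lam = 1/(C*sqrt(n)).  Returns the largest |d_i| seen over the whole run."""
    rng = random.Random(seed)
    lam = 1.0 / (C * math.sqrt(n))
    d = [0] * n
    worst = 0
    for _ in range(T):
        v = [rng.choice((-1, 1)) for _ in range(n)]
        # choose the sign that minimizes the new potential
        pot_plus = sum(math.cosh(lam * (di + vi)) for di, vi in zip(d, v))
        pot_minus = sum(math.cosh(lam * (di - vi)) for di, vi in zip(d, v))
        x = 1 if pot_plus <= pot_minus else -1
        d = [di + x * vi for di, vi in zip(d, v)]
        worst = max(worst, max(abs(di) for di in d))
    return worst

print(carole_cosh(n=100, T=2000))
```

Even for T much larger than n, the running maximum stays on the same modest scale, illustrating the T-independent bound.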

#### Analysis.

Let v(i) denote the i-th coordinate of v_t. As we will only consider the time t, let us denote d_i := d_i(t−1), Φ := Φ(t−1), and ΔΦ := Φ(t) − Φ(t−1).

By the Taylor expansion, and as cosh′ = sinh and sinh′ = cosh, the increase in potential can be written as

 ΔΦ = ∑_i (λ sinh(λd_i) x_t v(i) + (λ²/2!) cosh(λd_i)(x_t v(i))² + (λ³/3!) sinh(λd_i)(x_t v(i))³ + …)  (3.22)
  ≤ ∑_i λ sinh(λd_i) x_t v(i) + ∑_i λ² cosh(λd_i)(x_t v(i))²

where the second step follows as |sinh(z)| ≤ cosh(z) for all z and, as λ ≤ 1, the higher order terms sum to at most the stated second order term.

Let L := ∑_i λ sinh(λd_i) v(i), so that L x_t is the linear term, and let Q be the second term in (3.22) (note that (x_t v(i))² = 1). Conveniently, Q is exactly λ²Φ.

As the algorithm chooses x_t to have L x_t = −|L|, it suffices to show the following key lemma.

###### Lemma 3.1.

If Φ ≥ 2n, then |L| ≥ 2Q with probability at least 1/12.

Before proving the lemma we need the following anti-concentration estimate.

###### Lemma 3.2.

Let Y = ∑_i a_i Y_i, with the Y_i independent and uniform in {−1,+1} and the a_i real. Then

 Pr[|Y| ≥ (1/√2)(∑_i a_i²)^{1/2}] ≥ 1/12.
###### Proof.

The Paley–Zygmund theorem states that for any random variable X ≥ 0 with finite variance and any 0 ≤ s ≤ 1,

 Pr[X > sE[X]] ≥ (1 − s)² E[X]² / E[X²].  (3.23)

We apply this to X = Y², and note E[Y²] = ∑_i a_i² and

 E[Y⁴] = ∑_i a_i⁴ E[Y_i⁴] + 3 ∑_{i≠j} a_i² a_j² E[Y_i²] E[Y_j²] ≤ 3(∑_i a_i²)² = 3(E[Y²])².  (3.24)

Setting s = 1/2 in (3.23), and using the upper bound on E[Y⁴] in (3.24), gives

 Pr[|Y| ≥ (1/√2)(E[Y²])^{1/2}] = Pr[Y² ≥ (1/2)E[Y²]] ≥ (1/4)(E[Y²])²/E[Y⁴] ≥ 1/12,

which implies the claimed result. ∎
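The bound in Lemma 3.2 can be confirmed by exhaustive enumeration for small weight vectors; empirically the fraction is typically much larger than 1/12:

```python
from itertools import product

def anticoncentration_fraction(a):
    """Exact fraction of sign patterns eps in {-1,+1}^m with
    |sum_i eps_i a_i| >= sqrt((1/2) * sum_i a_i^2), the event of Lemma 3.2."""
    m = len(a)
    threshold_sq = 0.5 * sum(x * x for x in a)
    hits = 0
    for eps in product((-1, 1), repeat=m):
        y = sum(e * x for e, x in zip(eps, a))
        if y * y >= threshold_sq:
            hits += 1
    return hits / 2 ** m

print(anticoncentration_fraction([1, 1, 1]))        # 0.25
print(anticoncentration_fraction([1, 1, 1, 1]))     # 0.625
print(anticoncentration_fraction([3.0, 1.0, 0.5, 0.25, 2.0]))
```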

###### Proof (Lemma 3.1).

By Lemma 3.2, with probability at least 1/12,

 |L| ≥ (λ/√2)(∑_i sinh²(λd_i))^{1/2} = (λ/√2)(∑_i (cosh²(λd_i) − 1))^{1/2} = (λ/√2)((∑_i cosh²(λd_i)) − n)^{1/2}.  (3.25)

As cosh(z) ≥ 1 for all z, Φ ≥ n always, and by Cauchy–Schwarz, ∑_i cosh²(λd_i) ≥ (∑_i cosh(λd_i))²/n = Φ²/n. So for Φ ≥ 2n we have ∑_i cosh²(λd_i) ≥ 4n, and we get

 (∑_i cosh²(λd_i)) − n ≥ (1/2) ∑_i cosh²(λd_i) ≥ Φ²/(2n).  (3.26)

Together, (3.25) and (3.26) give that

 Pr[|L| ≥ (λ/(2√n)) Φ] ≥ 1/12.  (3.27)

Using Q = λ²Φ and plugging in λ = 1/(C√n) gives that |L| ≥ (C/2)Q ≥ 2Q for C ≥ 4. ∎

As ΔΦ ≤ L x_t + Q, we have (i) ΔΦ ≤ Q = λ²Φ always (as Carole's choice makes L x_t ≤ 0), and, setting H := 2n, by Lemma 3.1 we have that (ii) if Φ ≥ H, then ΔΦ ≤ −|L| + Q ≤ −Q = −λ²Φ with probability at least 1/12.

So Φ performs a random walk over time with positive increments bounded by λ²Φ, and if Φ ≥ H, it has an expected negative drift of Ω(λ²Φ). By standard arguments, this implies that at any time t, Φ(t) = O(n) with high probability. As Φ(t) ≥ cosh(λ|P(t)|_∞) ≥ (1/2)exp(λ|P(t)|_∞) and λ = 1/(C√n), this gives that |P(t)|_∞ = O(√n log n) with high probability.

### 3.2 Strategy 2

Our second strategy is even simpler, and we call it the majority rule. For convenience, it is useful to think of the folded chip view of the game, as described in Section 1.2. In particular, there are n chips, originally all at 0, the position of the i-th chip being the absolute value of d_i. From 0, a chip must go to 1. Each chip not at 0 picks a random direction, and Carole then either moves all of the chips in their selected direction or all in their opposite directions. So from a position q ≥ 1, a chip can go to q − 1 or q + 1.

#### Majority rule strategy.

Consider the directions of the chips not at position zero. If there is a strict majority, Carole chooses the sign that makes the majority of the chips not at zero move towards zero. Otherwise, in case of a tie, Carole picks the sign randomly.
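The rule is straightforward to implement in the folded chip view (a sketch with arbitrary parameters):

```python
import random

def majority_rule_play(n, rounds, seed=0):
    """Folded chip game under the majority rule.  Each nonzero chip draws a
    random direction; Carole applies the global sign that moves the majority
    of the nonzero chips towards zero (random on ties).  Returns the largest
    position any chip ever attains."""
    rng = random.Random(seed)
    pos = [0] * n
    worst = 0
    for _ in range(rounds):
        dirs = [rng.choice((-1, 1)) for _ in range(n)]
        # with global sign x, a chip at q > 0 with direction u moves to q + x*u,
        # so x = -1 sends the chips with u = +1 towards zero, and vice versa
        balance = sum(u for q, u in zip(pos, dirs) if q > 0)
        if balance > 0:
            x = -1
        elif balance < 0:
            x = 1
        else:
            x = rng.choice((-1, 1))
        pos = [1 if q == 0 else abs(q + x * u) for q, u in zip(pos, dirs)]
        worst = max(worst, max(pos))
    return worst

print(majority_rule_play(100, 2000))
```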

#### Analysis.

We will show the following.

###### Theorem 3.3.

The majority rule strategy achieves val = O(√n log n) with high probability. More precisely, the probability that any given chip has position at least q at a given time t is at most exp(−Ω(q/√n)).

###### Proof.

Consider some time t, and a chip i that is at a non-zero position at the end of round t − 1. We claim that chip i basically does a random walk with a drift towards zero.

Look at the other non-zero chips (other than i), and suppose there are m of them. We consider two cases depending on whether m is even or odd.

1. m is even. Consider the random directions of the m chips other than i, as given by v_t. If these directions are evenly split, which occurs with probability Θ(1/√m), then the majority direction is determined by chip i, and so chip i goes towards the origin.

Else, if the directions are not split evenly, then at least m/2 + 1 of these chips have one direction (and at most m/2 − 1 the other). So chip i has no effect on the outcome of the majority rule, and as its direction is random and independent of the other directions, chip i moves randomly.

2. m is odd. If strictly more than (m+1)/2 of the chips have one direction, then the direction of chip i does not affect the majority outcome. So, as above, the chip moves randomly.

Else, exactly (m+1)/2 chips have one direction (say +) and (m−1)/2 have −. As the directions are random, this happens with probability Θ(1/√m). Conditioned on this event, with probability 1/2 the direction of chip i is also +, in which case there is a strict majority for +, and chip i goes towards the origin. Else, chip i picks the direction − (with probability 1/2), resulting in an overall tie, in which case Carole (and hence chip i) moves randomly.

So in either case, each chip does a random walk on the nonnegative integers with a reflection at 0 and with a drift of Ω(1/√n) towards the origin. That is, from 0 it goes to 1, and from q ≥ 1 it goes to q − 1 with probability at least 1/2 + Ω(1/√n), and else to q + 1. So the stationary distribution at positions for this chip is dominated by the stationary distribution for an (imaginary) chip that goes to q − 1 with probability