# Relative Error Streaming Quantiles

Approximating ranks, quantiles, and distributions over streaming data is a central task in data analysis and monitoring. Given a stream of n items from a data universe U (equipped with a total order), the task is to compute a sketch (data structure) of size poly(log(n), 1/ε). Given the sketch and a query item y ∈ U, one should be able to approximate its rank in the stream, i.e., the number of stream elements smaller than y. Most works to date focused on additive εn error approximation, culminating in the KLL sketch that achieved optimal asymptotic behavior. This paper investigates multiplicative (1 ± ε)-error approximations to the rank. The motivation stems from practical demand to understand the tails of distributions, and hence for sketches to be more accurate near extreme values. The most space-efficient algorithms that can be derived from prior work store either O(log(ε²n)/ε²) or O(log³(εn)/ε) universe items. This paper presents a sketch of size O(log^1.5(εn)/ε) (ignoring poly(log log n, log(1/ε)) factors) that achieves a 1 ± ε multiplicative error guarantee, without prior knowledge of the stream length or dependence on the size of the data universe. This is within an O(√log(εn)) factor of optimal.


## 1 Introduction

Understanding the distribution of data is a fundamental task in data monitoring and analysis. The problem of streaming quantile approximation captures this task in the context of massive or distributed datasets.

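The problem is as follows. Let $\sigma = (x_1, \ldots, x_n)$ be a stream of items, all drawn from a data universe $\mathcal{U}$ equipped with a total order. For any $y \in \mathcal{U}$, let $R(y; \sigma) = |\{i \in \{1, \ldots, n\} : x_i \le y\}|$ be the rank of $y$ in the stream. When $\sigma$ is clear from the context, we write $R(y)$. The objective is to process the stream while storing a small number of items, and then use those to approximate $R(y)$ for any $y \in \mathcal{U}$. A guarantee for an approximation $\hat{R}(y)$ is said to be additive if $|\hat{R}(y) - R(y)| \le \varepsilon n$, and multiplicative or relative if $|\hat{R}(y) - R(y)| \le \varepsilon R(y)$.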
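The two guarantees can be made concrete with a few lines of Python. This is a minimal illustration only: the helper names `rank`, `additive_ok`, and `relative_ok` are hypothetical, and the stream values are made up.

```python
def rank(stream, y):
    """R(y; sigma): the number of stream items x with x <= y."""
    return sum(1 for x in stream if x <= y)


def additive_ok(estimate, true_rank, n, eps):
    """Additive guarantee: |R_hat(y) - R(y)| <= eps * n."""
    return abs(estimate - true_rank) <= eps * n


def relative_ok(estimate, true_rank, eps):
    """Multiplicative (relative) guarantee: |R_hat(y) - R(y)| <= eps * R(y)."""
    return abs(estimate - true_rank) <= eps * true_rank


stream = [5, 3, 9, 1, 7, 3, 8, 2, 6, 4]
r = rank(stream, 3)  # the items <= 3 are 3, 1, 3, 2, so r == 4
# An estimate of 5 is off by one: acceptable additively (eps * n = 1.0),
# but not relatively (eps * R(y) = 0.4).
print(r, additive_ok(5, r, len(stream), 0.1), relative_ok(5, r, 0.1))  # prints: 4 True False
```

The same absolute mistake is harmless under the additive notion and disqualifying under the relative one, which is why low-rank items are the hard case.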

A long line of work has focused on achieving additive error guarantees [17, 9, 2, 8, 3, 13, 18, 12]. However, additive error is not appropriate for many applications. Indeed, often the primary purpose of computing quantiles is to understand the tails of the data distribution. When $R(y) \ll n$, a multiplicative guarantee is much more accurate and thus harder to obtain. As pointed out by Cormode et al., a solution to this problem would also yield high accuracy when $n - R(y) \ll n$, by running the same algorithm with the reversed total ordering on the universe (simply negating the comparator).

A quintessential application that demands relative error is monitoring network latencies. In practice, one often tracks several high percentiles of the response time. This is because latencies are heavily long-tailed. For example, Masson et al. report that for web response times, the 98.5th percentile can be as small as 2 seconds while the 99.5th percentile can be as large as 20 seconds. These unusually long response times affect network dynamics and are problematic for users. Hence, highly accurate rank approximations are required for items $y$ whose rank is very large ($n - R(y) \ll n$); this is precisely the requirement captured by the multiplicative error guarantee (applied to the reversed ordering).

Achieving multiplicative guarantees is known to be strictly harder than achieving additive ones. A uniform sample of stream items already gives a sketch for additive error (albeit a large one), and there are additive error algorithms that store just $O(1/\varepsilon)$ items for constant failure probability. For multiplicative error, no sampling of items suffices, and any algorithm achieving multiplicative error must store $\Omega(\varepsilon^{-1}\log(\varepsilon n))$ items (see, for example, [4, Theorem 2]).²

² Even non-comparison-based algorithms must produce sketches whose size in bits is subject to a similar lower bound. This assertion appears not to have been explicitly stated in the literature; we prove it in Appendix A.

The best-known algorithms achieving multiplicative error guarantees are as follows. Zhang et al. give a randomized algorithm storing $O(\varepsilon^{-2}\log(\varepsilon^2 n))$ universe items. This is essentially a $1/\varepsilon$ factor away from the aforementioned lower bound. There is also an algorithm of Cormode et al. that stores $O(\varepsilon^{-1}\log(\varepsilon n)\log|\mathcal{U}|)$ items. However, this algorithm requires prior knowledge of the data universe $\mathcal{U}$ (since it builds a binary tree over $\mathcal{U}$), and is inapplicable when $|\mathcal{U}|$ is huge or even unbounded (e.g., if the data can take arbitrary real values). Finally, Zhang and Wang give a deterministic algorithm requiring $O(\varepsilon^{-1}\log^3(\varepsilon n))$ space. Very recent work of Cormode and Veselý proves an $\Omega(\varepsilon^{-1}\log^2(\varepsilon n))$ lower bound for deterministic comparison-based algorithms, which is within a $\log(\varepsilon n)$ factor of Zhang and Wang's upper bound.

In this work, we give a randomized algorithm that maintains the optimal linear dependence on $1/\varepsilon$ achieved by Zhang and Wang, with a significantly improved dependence on the stream length. Our bound is strictly better than what any deterministic comparison-based algorithm can achieve, and within an $\tilde{O}(\sqrt{\log(\varepsilon n)})$ factor of the known lower bound for randomized algorithms achieving multiplicative error.³

³ Here, the $\tilde{O}$ notation hides factors polynomial in $\log\log n$, $\log(1/\varepsilon)$, and $\log(1/\delta)$.

###### Theorem 1 (Single-Quantile Approximation).

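For $0 < \delta \le 0.5$ and $0 < \varepsilon \le 1$, there is a randomized, one-pass streaming algorithm that computes a sketch consisting of

$$O\left(\frac{1}{\varepsilon}\cdot\log^{1.5}(\varepsilon n)\cdot\sqrt{\log(1/\delta)}\cdot\sqrt{\log\left(\log(\varepsilon n)/\delta\right)}\right)$$

universe items, and from which an estimate $\hat{R}(y)$ of $R(y)$ can be derived for every $y \in \mathcal{U}$. For any fixed $y \in \mathcal{U}$, with probability at least $1 - \delta$, the returned estimate satisfies the multiplicative error guarantee $|\hat{R}(y) - R(y)| \le \varepsilon R(y)$.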

We remark that we prove Theorem 1 assuming that an upper bound on the stream length $n$ is known in advance. The space usage of the algorithm grows polynomially with the logarithm of this upper bound, so if this upper bound is at most $n^c$ for some constant $c$, then the space usage of the algorithm will remain as stated in Theorem 1, with only the hidden constant factor changing. In Section 5, we explain how to mitigate this assumption at the cost of an additional factor in space usage. Our mitigation technique does require knowing an upper bound on a slowly growing function of $n$, in order to appropriately set the failure probability of the resulting algorithm. However, for all practical values of $n$ and $\varepsilon$, this quantity is small, and hence can be treated as a constant.

As a straightforward corollary of Theorem 1, we obtain a space-efficient algorithm whose estimates are simultaneously accurate for all $y \in \mathcal{U}$ with high probability.

###### Corollary 2 (All-Quantiles Approximation).

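The error bound from Theorem 1 can be made to hold for all $y \in \mathcal{U}$ simultaneously with probability $1 - \delta$, while storing the number of stream items given by Theorem 1 with $\varepsilon$ replaced by $\varepsilon/3$ and $\delta$ replaced by the failure probability chosen in the proof below.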

###### Proof.

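The sketch in this paper maintains a weighted coreset for its rank estimates. It is therefore monotone in the sense that for $y_1 \le y_2$, it yields rank estimates $\hat{R}(y_1) \le \hat{R}(y_2)$. It follows that, if $y_1 \le y_2$ satisfy $R(y_2) \le (1+\varepsilon/3)R(y_1)$ and both $y_1$ and $y_2$ suffer at most a multiplicative error of $\varepsilon/3$, then all the values $y$ with $y_1 \le y \le y_2$ suffer a multiplicative error of at most $\varepsilon$. Indeed: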

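$$R(y) \le R(y_2) \le R(y_1)(1+\varepsilon/3) \le \hat{R}(y_1)(1+\varepsilon/3)^2 \le \hat{R}(y)(1+\varepsilon/3)^2 \le \hat{R}(y)/(1-\varepsilon), \quad \text{and}$$

$$R(y) \ge R(y_1) \ge R(y_2)(1-\varepsilon/3) \ge \hat{R}(y_2)(1-\varepsilon/3)^2 \ge \hat{R}(y)(1-\varepsilon/3)^2 \ge \hat{R}(y)/(1+\varepsilon).$$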

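The space cost claimed in the Corollary is achieved by applying Theorem 1 with error parameter $\varepsilon/3$ and with failure probability set to $\delta/m$, where $m = O(\varepsilon^{-1}\log n)$ is the number of stream items $y_1 \le \cdots \le y_m$ used below. These items run from the smallest to the largest stream item, and their consecutive ranks differ by a factor of roughly at most $1 + \varepsilon/3$. By a union bound, with probability at least $1 - \delta$, the resulting sketch satisfies the $(\varepsilon/3)$-multiplicative error guarantee on all of $y_1, \ldots, y_m$. In this event, the previous paragraph implies that the $\varepsilon$-multiplicative guarantee holds for all $y \in \mathcal{U}$. ∎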
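The last step of each chain above, $(1+\varepsilon/3)^2 \le 1/(1-\varepsilon)$ and $(1-\varepsilon/3)^2 \ge 1/(1+\varepsilon)$, can be checked numerically. Expanding gives $(1-\varepsilon/3)^2(1+\varepsilon) - 1 = \varepsilon(3 - 5\varepsilon + \varepsilon^2)/9$, so the second step holds for $\varepsilon \le (5-\sqrt{13})/2 \approx 0.697$, while the first holds for every $\varepsilon \in (0,1)$. The sketch below also builds one possible set of pivot ranks, $\lceil (1+\varepsilon/3)^i \rceil$; this geometric grid is an illustrative assumption, and the helper names are hypothetical.

```python
import math


def closing_steps_hold(eps):
    """Check the last inequality of each chain in the proof of Corollary 2."""
    upper = (1 + eps / 3) ** 2 <= 1 / (1 - eps)
    lower = (1 - eps / 3) ** 2 >= 1 / (1 + eps)
    return upper and lower


def pivot_ranks(n, eps):
    """Distinct values ceil((1 + eps/3) ** i) that do not exceed n (hypothetical grid)."""
    ratio, ranks, i = 1 + eps / 3, [], 0
    while True:
        r = math.ceil(ratio ** i)
        if r > n:
            return ranks
        if not ranks or r != ranks[-1]:
            ranks.append(r)
        i += 1


threshold = (5 - math.sqrt(13)) / 2  # about 0.697
print(all(closing_steps_hold(e / 1000) for e in range(1, int(threshold * 1000) + 1)))  # True
print(closing_steps_hold(0.9))  # False: the lower chain breaks for large eps
grid = pivot_ranks(10**6, 0.1)
print(len(grid), grid[:5])
```

Consecutive grid values $a < b$ satisfy $b \le \lceil (1+\varepsilon/3)\,a \rceil$, which is the "ranks differ by a factor of roughly at most $1+\varepsilon/3$" property used in the union bound.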

#### Remark.

The issue of mergeability (the ability to merge sketches of different streams to get an accurate sketch for the concatenation of the streams) is significant both in theory and in practice. We are confident that our algorithm is fully mergeable via direct application of techniques of Karnin et al., who gave mergeable additive error quantile sketches and whose techniques we build upon. We leave formally verifying issues regarding mergeability to future work.

### 1.1 Prior Work

Some prior works on streaming quantiles consider queries to be ranks $r \in \{1, \ldots, n\}$, and the algorithm must identify a $y \in \mathcal{U}$ such that $R(y)$ is close to $r$. In comparison, we consider queries to be universe items $y \in \mathcal{U}$, and the algorithm must yield an accurate estimate for $R(y)$. Unless specified otherwise, algorithms described in this section directly solve both formulations. The algorithm that we present operates in the comparison model, in which the only operation permitted on stream items is order comparison.

Below we recap prior work. Algorithms are randomized unless stated otherwise. For simplicity, randomized algorithms are assumed to have constant failure probability. All reported space costs refer to the number of universe items stored.

Manku, Rajagopalan and Lindsay [13, 14] built on the work of Munro and Paterson and gave a deterministic solution that stores at most $O(\varepsilon^{-1}\log^2(\varepsilon n))$ items, assuming knowledge of $n$. Greenwald and Khanna created an intricate deterministic algorithm that stores $O(\varepsilon^{-1}\log(\varepsilon n))$ items. This is the best known deterministic algorithm for this problem, with a matching lower bound for comparison-based algorithms. Agarwal, Cormode, Huang, Phillips, Wei, and Yi provided a mergeable sketch of size $O(\varepsilon^{-1}\log^{1.5}(1/\varepsilon))$. This paper contains many ideas and observations that were used in later work. Felber and Ostrovsky managed to reduce the space complexity to $O(\varepsilon^{-1}\log(1/\varepsilon))$ items by combining sampling with the Greenwald-Khanna sketches in non-trivial ways. Finally, Karnin, Lang, and Liberty resolved the problem and provided an $O(1/\varepsilon)$ solution, together with a matching lower bound for comparison-based randomized algorithms.

#### Multiplicative Error and Biased Quantiles.

A large number of works sought to provide more accurate quantile estimates for low or high ranks. Only a handful offer solutions to the relative error quantiles problem (also sometimes called the biased quantiles problem) considered in this work. Specifically, Gupta and Zane gave a solution that stores a number of items polynomial in $1/\varepsilon$ and $\log n$, and used it to approximately count the number of inversions in a list; their algorithm requires prior knowledge of the stream length $n$. As previously mentioned, Zhang et al. gave a solution storing $O(\varepsilon^{-2}\log(\varepsilon^2 n))$ universe items. Cormode et al. gave a deterministic solution storing $O(\varepsilon^{-1}\log(\varepsilon n)\log|\mathcal{U}|)$ items, which requires prior knowledge of the data universe $\mathcal{U}$. Their algorithm is inspired by the work of Shrivastava et al. in the additive error setting. It is also only one-way mergeable (see [1, Section 3]). Zhang and Wang gave a deterministic algorithm storing $O(\varepsilon^{-1}\log^3(\varepsilon n))$ items. Cormode and Veselý very recently showed that, amongst deterministic comparison-based algorithms, this is within a $\log(\varepsilon n)$ factor of optimal, i.e., a space lower bound of $\Omega(\varepsilon^{-1}\log^2(\varepsilon n))$ items applies to any deterministic comparison-based algorithm.

Other works that do not solve the relative error quantiles problem are as follows. Manku, Rajagopalan, and Lindsay give an algorithm that, for a specified number $\phi \in [0, 1]$, can return an item $y$ with $R(y)/n \in [(1-\varepsilon)\phi, (1+\varepsilon)\phi]$ (their algorithm requires prior knowledge of $n$). Cormode et al. gave a deterministic algorithm that is meant to achieve error properties “in between” additive and relative error guarantees. That is, their algorithm aims to provide multiplicative guarantees only above some minimum rank; for items of rank below this threshold, their solution only provides additive guarantees. Their algorithm does not solve the relative error quantiles problem: it has been observed that for adversarial item ordering, this algorithm requires linear space to achieve relative error for all ranks. Dunning and Ertl describe a heuristic algorithm called t-digest that is intended to achieve relative error, but provide no formal accuracy analysis.

Most recently, Masson, Rim, and Lee introduced a new notion of error for quantile sketches (they refer to their notion as “relative error”, but it is very different from the notion considered in this work). They require that for a query percentile $\phi \in [0, 1]$, if $y$ denotes the item in the data stream satisfying $R(y) = \phi n$, then the algorithm should return an item $\hat{y}$ such that $|\hat{y} - y| \le \varepsilon|y|$. This definition only makes sense for data universes with a notion of magnitude and distance (e.g., numerical data), and the definition is not invariant to natural data transformations, such as incrementing every data item by a large constant. In contrast, the standard notion of relative error considered in this work does not refer to the data items themselves, only to their ranks.
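The lack of shift invariance shows up on a toy example. The sketch below (hypothetical helper names, made-up data) takes a poor answer for the 1st percentile of the values $1, \ldots, 1000$, which fails the value-based guarantee $|\hat{y} - y| \le \varepsilon|y|$. After adding $10^6$ to every item, the same answer passes, even though its rank is just as wrong.

```python
def rank(data, y):
    """Number of items x in data with x <= y."""
    return sum(1 for x in data if x <= y)


def value_relative_ok(y_hat, y, eps):
    """Value-based guarantee in the style of Masson et al.: |y_hat - y| <= eps * |y|."""
    return abs(y_hat - y) <= eps * abs(y)


eps = 0.01
data = list(range(1, 1001))
y, y_hat = 10, 500  # true 1st percentile vs. a poor answer
print(value_relative_ok(y_hat, y, eps))  # False

shift = 10**6
shifted = [x + shift for x in data]
print(value_relative_ok(y_hat + shift, y + shift, eps))  # True
print(rank(shifted, y + shift), rank(shifted, y_hat + shift))  # 10 500
```

A rank-based relative guarantee would reject the shifted answer as well, since rank 500 is far from rank 10 regardless of the values.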

### 1.2 Paper Outline

Sections 2-4 describe and analyze the algorithm assuming that (an upper bound on) the stream length $n$ is known in advance. Section 5 explains how to modify the algorithm to work even when $n$ is not known in advance. Section 6 describes open directions.

## 2 Description of the Algorithm

### 2.1 The Relative-Compactor Object

The crux of our algorithm is a building block that we call the relative-compactor. Roughly speaking, this object processes a stream of $n$ items and outputs a stream of at most $n/2$ items (each “up-weighted” by a factor of 2), meant to “approximate” the input stream. It does so by maintaining a buffer of limited capacity.

Our complete sketch (described in Section 2.2 below) is composed of a sequence of relative-compactors, where the input of the $(h+1)$'th relative-compactor is the output of the $h$'th. With (approximately) $\log_2(n/B)$ such relative-compactors, $n$ being the length of the input stream and $B$ the buffer size of each relative-compactor, the output of the last relative-compactor is of size $O(B)$, and hence can be stored in memory.

#### Compaction Operations.

The basic subroutine used by our relative-compactor is a compaction operation. The input to a compaction operation is a list $X$ of $2m$ items $x_1 \le x_2 \le \cdots \le x_{2m}$, and the output is a sequence $Z$ of $m$ items. This output is chosen to be one of the following two sequences, uniformly at random: either $Z = \{x_{2i-1}\}_{i=1}^{m}$ or $Z = \{x_{2i}\}_{i=1}^{m}$. That is, either the odd- or the even-indexed items in the sorted order.
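A compaction operation takes only a few lines. The following Python sketch is illustrative; the function name `compact` and the use of `random.Random` as the source of the coin flip are our own choices, not part of the paper.

```python
import random


def compact(items, rng=random):
    """One compaction operation.

    Sorts an even-length list x_1 <= ... <= x_{2m} and returns either the
    odd-indexed items (x_1, x_3, ...) or the even-indexed items (x_2, x_4, ...),
    each with probability 1/2. Every returned item stands for two input items.
    """
    if len(items) % 2:
        raise ValueError("a compaction operation needs an even number of items")
    xs = sorted(items)
    offset = rng.randrange(2)  # 0 -> x_1, x_3, ...; 1 -> x_2, x_4, ...
    return xs[offset::2]


print(compact([4, 1, 3, 2], random.Random(0)))  # either [1, 3] or [2, 4]
```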

Consider an item $y \in \mathcal{U}$. The following is a trivial observation regarding the error of the rank estimate of $y$ with respect to the input $X$ of a compaction operation when using $Z$. We wish to view the output $Z$ of a compaction operation (with all items up-weighted by a factor of 2) as an approximation to the input $X$; for any $y$, its weighted rank in $Z$ should be close to its rank in $X$. Observation 3 below states that this approximation incurs zero error on items that have an even rank in $X$. Moreover, for items $y$ that have an odd rank in $X$, the error for $y$ introduced by the compaction operation is $+1$ or $-1$ with equal probability.

###### Observation 3.

A universe item $y$ is said to be even (odd) w.r.t. a compaction operation if $R(y; X)$ is even (odd), where $X$ is the input sequence to the operation. If $y$ is even w.r.t. the compaction, then $R(y; X) - 2R(y; Z) = 0$. Otherwise $R(y; X) - 2R(y; Z)$ is a random variable taking a value from $\{-1, 1\}$ uniformly at random.

The observation that items of even rank (and in particular items of rank zero) suffer no error from a compaction operation plays an especially important role in the error analysis of our full sketch.
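Observation 3 can be checked exhaustively on small inputs. The hypothetical helper below enumerates both possible outputs of a compaction and compares $R(y; X) - 2R(y; Z)$ with the values the observation predicts. It is a sanity check on examples, not a proof.

```python
def rank(seq, y):
    """Number of items x in seq with x <= y."""
    return sum(1 for x in seq if x <= y)


def observation_3_holds(items, queries):
    """Check Observation 3 for every query y and both possible compaction outputs."""
    xs = sorted(items)
    if len(xs) % 2:
        raise ValueError("compaction input must have even length")
    outputs = (xs[0::2], xs[1::2])  # odd-indexed or even-indexed items
    for y in queries:
        r = rank(xs, y)
        errors = {r - 2 * rank(z, y) for z in outputs}
        expected = {0} if r % 2 == 0 else {-1, 1}
        if errors != expected:
            return False
    return True


print(observation_3_holds([2, 3, 5, 7, 11, 13, 17, 19], range(0, 21)))  # True
print(observation_3_holds([1, 1, 1, 2], range(0, 3)))  # True, also with ties
```

Rank zero is even, so items below everything that gets compacted never pick up error, matching the remark above.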

#### Full Description of the Relative-Compactor Object.

Figure 1: Illustration of the execution of a relative-compactor when inserting a new item $x_t$ into a buffer that is full at time $t$. See Lines 6-10 of Algorithm 1.

The complete description of the relative-compactor object is given in Algorithm 1. The high-level idea is as follows. The relative-compactor maintains a buffer of size $B$, which Algorithm 1 sets as a function of an integer parameter $k$ controlling the error, the upper bound $n$ on the stream length, and the failure probability $\delta$ (so that $1 - \delta$ is the success probability). The incoming items are stored in the buffer until it is full. At this point, we perform a compaction operation, as described above. The input to the compaction operation is not all items in the buffer, but rather the largest $B - S$ items in the buffer. The parameter $S$ is chosen at random via an exponential distribution (Line 6 of Algorithm 1), subject to the constraint that $S \ge B/2$. That is, the number of compacted items is at most $B/2$. These items are then removed from the buffer, and the output of the compaction operation is sent to the output stream of the buffer. This intuitively lets low-ranked items stay in the buffer longer than high-ranked ones. Indeed, by design the lowest-ranked half of items in the buffer are never removed. We show later that this facilitates the multiplicative error guarantee.
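The following is a minimal Python sketch of a relative-compactor, written only from the prose description above, since Algorithm 1 itself is not reproduced here. Several details are therefore assumptions: the class name `RelativeCompactor`, a compaction being triggered as soon as the buffer holds $B$ items, and the number $m = B - S$ of compacted items being an even number in $[2, B/2]$ drawn with probability proportional to $\exp(-m/k)$.

```python
import math
import random


class RelativeCompactor:
    """Minimal relative-compactor sketch (hypothetical parameterization).

    Assumed, not taken from Algorithm 1: a compaction is triggered when the
    buffer holds B items (B even, B >= 4), and the number m = B - S of compacted
    items is an even number in [2, B/2] drawn with probability proportional to
    exp(-m / k). The smallest B - m >= B/2 buffered items are never compacted.
    """

    def __init__(self, B, k, rng=None):
        if B < 4 or B % 2:
            raise ValueError("B must be an even integer >= 4")
        self.B, self.k = B, k
        self.rng = rng or random.Random()
        self.buffer = []
        self.num_compactions = 0
        self._sizes = list(range(2, B // 2 + 1, 2))
        self._weights = [math.exp(-m / k) for m in self._sizes]

    def insert(self, x):
        """Insert x and return the items (weight doubled) emitted by a compaction, if any."""
        self.buffer.append(x)
        if len(self.buffer) < self.B:
            return []
        self.buffer.sort()
        m = self.rng.choices(self._sizes, weights=self._weights)[0]
        compacted, self.buffer = self.buffer[-m:], self.buffer[:-m]
        self.num_compactions += 1
        return compacted[self.rng.randrange(2)::2]  # odd- or even-indexed items


comp = RelativeCompactor(B=16, k=4, rng=random.Random(1))
out = []
for x in random.Random(2).sample(range(1, 1001), 1000):
    out.extend(comp.insert(x))
print(len(comp.buffer), len(out), comp.num_compactions)
```

Because at most $B/2$ items are compacted at a time, every item whose rank in the input is at most $B/2$ stays in the buffer forever; in the example above, no value $\le 8$ is ever emitted.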

### 2.2 The Full Sketch

Following prior work [13, 1, 12], the full sketch uses a sequence of relative-compactors. At the very start of the stream, it consists of a single relative-compactor and opens a new one once items are first fed to the output stream of the first relative-compactor (i.e., after the first compaction operation, which occurs on the first stream update during which the buffer is full). In general, when there are relative-compactors at levels $0, \ldots, h$, the first time the level-$h$ buffer performs a compaction operation (feeding items into its output stream for the first time), we open a new relative-compactor at level $h+1$ and feed it these items. Algorithm 2 describes the logic of this sketch. To answer rank queries, we use the items in the buffers of the different relative-compactors as a weighted coreset. That is, the union of these items is a weighted set $\mathcal{C}$ of items, where the weight of items in relative-compactor $h$ is $2^h$ ($h$ starts from 0), and the approximate rank of $y$, denoted $\hat{R}(y)$, is the sum of weights of items in $\mathcal{C}$ smaller than or equal to $y$.

The construction of layered exponentially weighted compactors and the subsequent rank estimation is identical to that explained in prior works [13, 1, 12], i.e., our essential departure from prior work is in the definition of the compaction operation, not in how compactors are strung together to form a complete sketch.
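A hypothetical Python sketch of the full structure follows. It chains relative-compactors exactly as described (a new level opens on the first output of the level below, and items at level $h$ carry weight $2^h$), but the compaction rule inside each level is the same assumed one as in a standalone relative-compactor sketch (an even number $m \in [2, B/2]$ of largest items, with probability proportional to $\exp(-m/k)$), and the class names are our own.

```python
import math
import random


class RelativeCompactor:
    """Hypothetical relative-compactor: compacts an even number m in [2, B/2] of the
    largest buffered items, with Pr[m] proportional to exp(-m / k)."""

    def __init__(self, B, k, rng):
        self.B, self.rng, self.buffer = B, rng, []
        self.sizes = list(range(2, B // 2 + 1, 2))
        self.weights = [math.exp(-m / k) for m in self.sizes]

    def insert(self, x):
        self.buffer.append(x)
        if len(self.buffer) < self.B:
            return []
        self.buffer.sort()
        m = self.rng.choices(self.sizes, weights=self.weights)[0]
        compacted, self.buffer = self.buffer[-m:], self.buffer[:-m]
        return compacted[self.rng.randrange(2)::2]


class RelativeErrorSketch:
    """Chain of relative-compactors; items stored at level h carry weight 2**h."""

    def __init__(self, B, k, seed=None):
        self.B, self.k = B, k
        self.rng = random.Random(seed)
        self.levels = [RelativeCompactor(B, k, self.rng)]

    def update(self, x):
        pending, h = [x], 0
        while pending:
            if h == len(self.levels):  # first output of level h-1: open level h
                self.levels.append(RelativeCompactor(self.B, self.k, self.rng))
            emitted = []
            for item in pending:
                emitted.extend(self.levels[h].insert(item))
            pending, h = emitted, h + 1

    def rank(self, y):
        """Estimated rank: total weight of stored items that are <= y."""
        return sum((1 << h) * sum(1 for x in level.buffer if x <= y)
                   for h, level in enumerate(self.levels))


sketch = RelativeErrorSketch(B=32, k=4, seed=7)
data = list(range(1, 5001))
random.Random(3).shuffle(data)
for x in data:
    sketch.update(x)
print(len(sketch.levels), sketch.rank(16), sketch.rank(5000))
```

Each compaction replaces $m$ items of weight $2^h$ by $m/2$ items of weight $2^{h+1}$, so the total weight always equals the stream length, and ranks up to $B/2$ are answered exactly.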

### 2.3 Informal Outline of the Analysis

Recall that in the full sketch we maintain a sequence of relative-compactors, indexed by $h = 0, 1, 2, \ldots$. The relative-compactor of level $h$ feeds its output as the input to relative-compactor $h+1$ (see Algorithm 2). The items processed by relative-compactor $h$ each represent $2^h$ items in the original stream.

To analyze the error of the full sketch, we focus on the error of an arbitrary item $y \in \mathcal{U}$. For clarity in this informal overview, we consider the failure probability $\delta$ to be constant. Recall that in our algorithm, all buffers have size $B$; we ultimately will set $k = \Theta(B/\log(\varepsilon n))$, in which case $B = \Theta(k\log(\varepsilon n))$. By design, no relative-compactor ever compacts the lowest-ranked $B/2$ items that it stores.

Let $R(y)$ be the rank of item $y$ in the input stream, and $\mathrm{Err}(y)$ the error of the estimated rank for $y$. Our analysis of $\mathrm{Err}(y)$ relies on just two properties.

1. The level-$h$ compactor performs at most roughly $R(y)/(k2^h)$ compactions that might affect the error of $y$.

Roughly speaking, this holds by the following reasoning. First, we show that as we move up one level at a time, $y$'s rank with respect to the input stream fed to that level falls by about half (this is formally established in Lemma 12). This is the source of the $2^h$ factor in the denominator. Second, we show that each compaction operation that affects $y$ also kicks out $\Omega(k)$ items smaller than $y$ from the buffer in expectation (Lemma 6 and Theorem 8). This is the source of the $k$ factor in the denominator.

2. Let $H_y$ be the smallest positive integer such that $2^{H_y} \ge 8R(y)/B$ (8 could be any large-enough constant so that the analysis below will hold). Then no compactions occurring at levels above $H_y$ affect $y$, because $y$'s rank relative to the input stream of any such buffer is less than $B/2$.

Again, this holds because as we move up one level at a time, $y$'s rank with respect to each level falls by about half (see Lemma 12).

Together, this means that the variance of the estimate for $R(y)$ is at most:

$$\sum_{\ell=1}^{H_y}\frac{R(y)}{k2^{\ell}}\cdot 2^{2\ell} = \sum_{\ell=1}^{H_y}\frac{R(y)}{k}\cdot 2^{\ell}. \tag{1}$$

In the LHS above, $R(y)/(k2^{\ell})$ bounds the number of relevant compaction operations at layer $\ell$ (this exploits Property 1 above), and $2^{2\ell}$ is the variance contributed by each relevant compaction operation at layer $\ell$ (because items processed by relative-compactor $\ell$ each represent $2^{\ell}$ items in the original stream).

The RHS of Equation (1) is dominated by the term for $\ell = H_y$, and the term for that value of $\ell$ is at most⁴

⁴ In the derivations within Equation (2), there are a couple of important subtleties. The first is that when we replace $2^{H_y}$ with $\Theta(R(y)/B)$, that substitution is only valid if $R(y) = \Omega(B)$. But we may assume this inequality holds wlog. This is because we can assume wlog that $R(y) \ge B/2$, as otherwise the algorithm will make 0 error on $y$ by virtue of storing the lowest-ranked $B/2$ items deterministically. The second subtlety is that the algorithm is only well-defined under a constraint relating $k$ and $B$, so when we replace $k$ with $\Theta(B/\log(\varepsilon n))$, that is only a valid substitution if this constraint holds, which based on the final setting of $k$ requires an additional assumption on $\varepsilon$ and $n$.

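$$\frac{R(y)}{k}\cdot 2^{H_y} \le \Theta\left(\frac{R(y)}{k}\cdot\frac{R(y)}{B}\right) = \Theta\left(\frac{R(y)^2}{kB}\right) = \Theta\left(\frac{R(y)^2\log(\varepsilon n)}{B^2}\right). \tag{2}$$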

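The first inequality in Equation (2) exploits Property 2 above, while the last equality exploits the fact that $k = \Theta(B/\log(\varepsilon n))$. We obtain the desired accuracy guarantees so long as this variance is at most $\varepsilon^2 R(y)^2$, as this will imply that the standard deviation is at most $\varepsilon R(y)$. This hoped-for variance bound holds so long as $B = \Omega(\varepsilon^{-1}\sqrt{\log(\varepsilon n)})$, or equivalently $k = \Omega(\varepsilon^{-1}/\sqrt{\log(\varepsilon n)})$.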
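The arithmetic behind Equations (1) and (2) can be checked directly. The sketch below (hypothetical helper names) computes both sides of Equation (1) and confirms that, for $R(y) \ge B/2$, the sum stays below $32R(y)^2/(kB)$. The constant 32 is our own bookkeeping of the $\Theta(\cdot)$ in Equation (2): minimality of $H_y$ gives $2^{H_y} < 16R(y)/B$, and $\sum_{\ell=1}^{H}2^{\ell} < 2^{H+1}$.

```python
def smallest_H(R, B):
    """Smallest positive integer H with 2**H >= 8 * R / B (the H_y of Property 2)."""
    H = 1
    while 2 ** H < 8 * R / B:
        H += 1
    return H


def equation_1_sides(R, k, B):
    """Left- and right-hand sides of Equation (1)."""
    H = smallest_H(R, B)
    lhs = sum((R / (k * 2 ** l)) * 2 ** (2 * l) for l in range(1, H + 1))
    rhs = sum((R / k) * 2 ** l for l in range(1, H + 1))
    return lhs, rhs


for R, k, B in [(1000, 8, 64), (10**6, 16, 256), (40, 4, 64)]:
    lhs, rhs = equation_1_sides(R, k, B)
    print(R, k, B, lhs, rhs, rhs < 32 * R * R / (k * B))
```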

### 2.4 Roadmap for the Formal Analysis

Section 3 establishes the necessary properties of a single relative-compactor (Algorithm 1), namely that, roughly speaking, each compaction operation that affects a designated item $y$ also kicks out $\Omega(k)$ items smaller than $y$ from the buffer. Section 4 then analyzes the full sketch (Algorithm 2), completing the proof of our main theorem.

## 3 Analysis of the Relative-Compactor

To analyze our algorithm, we keep track of the error associated with an arbitrary fixed item $y \in \mathcal{U}$. Throughout this section, we restrict our attention to any single relative-compactor (Algorithm 1) maintained by our sketching algorithm (Algorithm 2), and we use “time $t$” to refer to the $t$'th insertion operation to this particular relative-compactor.

We analyze the error introduced by the relative-compactor for an item $y$. Specifically, at time $t$, let $X_t$ be the input stream to the relative-compactor, $Z_t$ be the output stream, and $\mathcal{B}_t$ be the items in the buffer. The error for the relative-compactor at time $t$ with respect to item $y$ is defined as

$$\mathrm{Err}_t(y) = R(y; X_t) - 2R(y; Z_t) - R(y; \mathcal{B}_t). \tag{3}$$

Conceptually, $\mathrm{Err}_t(y)$ tracks the difference between $y$'s rank in the input stream $X_t$ at time $t$ versus its rank as estimated by the combination of the output stream and the remaining items in the buffer at time $t$ (output items are up-weighted by a factor of 2 while items remaining in the buffer are not). The overall error of the relative-compactor is $\mathrm{Err}_n(y)$, where $n$ is the length of its input stream. To bound $\mathrm{Err}_n(y)$, we keep track of the error associated with $y$ over time, and define the increment (or decrement) of it as

$$\Delta_t(y) = \mathrm{Err}_t(y) - \mathrm{Err}_{t-1}(y),$$

where $\mathrm{Err}_0(y) = 0$.
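The bookkeeping in Equation (3) can be simulated directly from the definitions. The sketch below assumes a compaction rule that is not spelled out in this text (a compaction when the buffer reaches $B$ items, with an even number $m \in [2, B/2]$ of largest items compacted, $\Pr[m] \propto \exp(-m/k)$); the function name `error_increments` is hypothetical. It recomputes $\mathrm{Err}_t(y)$ from scratch at every step, which is slow but mirrors the definition, and the recorded increments $\Delta_t(y)$ always lie in $\{-1, 0, 1\}$.

```python
import math
import random


def rank(seq, y):
    """Number of items x in seq with x <= y."""
    return sum(1 for x in seq if x <= y)


def error_increments(stream, y, B=16, k=4, seed=0):
    """Simulate one relative-compactor (assumed rule) and return Delta_t(y) for each t."""
    rng = random.Random(seed)
    sizes = list(range(2, B // 2 + 1, 2))  # even numbers of compacted items
    weights = [math.exp(-m / k) for m in sizes]
    X, Z, buf = [], [], []  # input stream, output stream, buffer
    prev, deltas = 0, []  # Err_0(y) = 0
    for x in stream:
        X.append(x)
        buf.append(x)
        if len(buf) >= B:  # compaction step
            buf.sort()
            m = rng.choices(sizes, weights=weights)[0]
            compacted, buf = buf[-m:], buf[:-m]
            Z.extend(compacted[rng.randrange(2)::2])
        err = rank(X, y) - 2 * rank(Z, y) - rank(buf, y)  # Equation (3)
        deltas.append(err - prev)
        prev = err
    return deltas


data = random.Random(1).sample(range(1, 401), 400)
d = error_increments(data, y=200)
print(set(d) <= {-1, 0, 1}, sum(1 for v in d if v != 0))
```

Non-compaction steps add the same item to $X_t$ and $\mathcal{B}_t$, so they never change the error; only compactions that involve items $\le y$ at odd rank do.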

First, let us consider what happens in a time step where a compaction operation occurs (Lines 6-10 of Algorithm 1). Consider the buffer maintained by the relative-compactor at the start of the compaction operation, and recall that $B$ denotes the buffer's capacity. Recall from Observation 3 that if $y$ is even with respect to the compaction, then $y$ suffers no error, meaning that $\Delta_t(y) = 0$. Otherwise, $\Delta_t(y)$ is uniform in $\{-1, 1\}$. At a given time $t$, let $s_t^y$ be the number of items $x$ in this buffer such that $x \le y$.

###### Observation 4.

Let $S_t$ be the variable drawn at time $t$ (see Line 6 of Algorithm 1). If $S_t \ge s_t^y$, then $\Delta_t(y) = 0$.

###### Proof.

Since $S_t \ge s_t^y$, all items $x \le y$ in the buffer are among the $S_t$ smallest items, which are not compacted. Hence $y$ has rank zero, and in particular even rank, with respect to the input to the compaction operation. The claim immediately follows from Observation 3. ∎

The following lemma will be useful for analyzing how many elements smaller than $y$ are evicted from the relative-compactor in any particular compaction operation.

###### Lemma 5.

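Let $Z$ denote the random variable taking values in $\{0, 1, \ldots, k\}$, with $\Pr[Z = m] \propto \exp(-m/k)$. Assuming $k$ is sufficiently large, $\mathbb{E}[Z] > k/4$.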

###### Proof.

Clearly:

$$\mathbb{E}[Z] \ge \frac{\sum_{m=1}^{k} m\cdot\exp(-m/k)}{\sum_{m=0}^{\infty}\exp(-m/k)} \overset{(i)}{\ge} \frac{3k^2\cdot(1-2/k)}{4ek} > k/4.$$

In deriving Inequality (i), we use the inequalities

$$\left(\sum_{m=0}^{\infty}\exp(-1/k)^m\right)^{-1} = 1 - \exp(-1/k) \ge 1/k - 2/k^2,$$

and

$$\sum_{m=1}^{k} m\exp(-m/k) \ge \sum_{m=k/2}^{k} m\exp(-1) = 3k^2/(4e).$$
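As a numerical sanity check of Lemma 5 (not a substitute for its proof), the sketch below computes $\mathbb{E}[Z]$ exactly for $k = 1, \ldots, 200$. In this range, $\mathbb{E}[Z]/k$ stays above $1/4$ and approaches the continuous limit $(1 - 2/e)/(1 - 1/e) \approx 0.418$. The helper name is hypothetical.

```python
import math


def expected_Z(k):
    """E[Z] for Z on {0, 1, ..., k} with Pr[Z = m] proportional to exp(-m/k)."""
    weights = [math.exp(-m / k) for m in range(k + 1)]
    return sum(m * w for m, w in enumerate(weights)) / sum(weights)


ratios = [expected_Z(k) / k for k in range(1, 201)]
limit = (1 - 2 / math.e) / (1 - 1 / math.e)  # continuous limit, about 0.418
print(min(ratios) > 0.25, ratios[-1], limit)
```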

Recall that $k$ denotes the parameter in Algorithm 1 controlling the size of the buffer of each relative-compactor. The following lemma states that, with high probability, there can be at most $O(R(y)/k)$ times $t$ with $S_t < s_t^y$ and $s_t^y$ reasonably large. At a high level, the proof applies the following reasoning. We show that if $S_t < s_t^y$ and $s_t^y$ is sufficiently large, then Lemma 5 implies that $\Omega(k)$ items smaller than or equal to $y$ were processed by the compaction operation (at least in expectation). Since there are only $R(y)$ elements of the input that are smaller than or equal to $y$, this intuitively limits the number of such compaction operations to $O(R(y)/k)$.

###### Lemma 6.

With probability at least $1 - \delta$, there are at most $8R(y)/k$ distinct time points $t$ in which both $S_t < s_t^y$ and $s_t^y \ge B/2 + 4k\ln(1/\delta)$.

###### Proof.

Denote by $T$ the set of time points in which both $S_t < s_t^y$ and $s_t^y \ge B/2 + 4k\ln(1/\delta)$. For each such $t$, consider the random variable $z'_t = s_t^y - S_t$. Recall that this quantity is the number of items smaller than or equal to $y$ that are processed by the compaction operation at time $t$. We can draw two conclusions about $z'_t$ with certainty. First, since $S_t < s_t^y$, we know that

$$z'_t \ge 0. \tag{4}$$

Second,

$$\sum_{t \in T} z'_t \le R(y). \tag{5}$$

This is because each compaction operation ejects a distinct set of stream updates (each smaller than or equal to $y$) from the relative-compactor, and at most $R(y)$ such stream updates ever appear in the input stream for this relative-compactor.

Let $Z$ denote the random variable taking values in $\{0, 1, \ldots, k\}$, with $\Pr[Z = i] \propto \exp(-i/k)$. Clearly, for integers $0 \le i \le s_t^y - B/2$,

$$\Pr\left[S_t = s_t^y - i \mid S_t \le s_t^y\right] \propto \exp(-i/k),$$

and this holds even when further conditioning on the variables $z'_j$ for all $j \le t-1$.

Hence, since $s_t^y \ge B/2 + 4k\ln(1/\delta)$, $z'_t$ stochastically dominates $Z$. That is, for every value $i$, and for every possible sequence of values $Z'_1, \ldots, Z'_{t-1}$ that the random variables $z'_1, \ldots, z'_{t-1}$ may take,

$$\Pr\left[z'_t \le i \mid S_t \le s_t^y \text{ and } z'_j = Z'_j \text{ for all } j \le t-1\right] \le \Pr[Z \le i]. \tag{6}$$

In other words, the CDF of $z'_t$ lower bounds the CDF of $Z$.

Let $T_0 = 8R(y)/k$, and let us now consider the event in which $|T| > T_0$. If this event occurs, let $T'$ denote the first $T_0$ timesteps in $T$. Equation (5) implies that

$$\Pr[|T| > T_0] \le \Pr\left[\sum_{t \in T'} z'_t \le R(y)\right]. \tag{7}$$

Since each $z'_t$ stochastically dominates $Z$ as per Equation (6), we have

$$\Pr\left[\sum_{t \in T'} z'_t \le R(y)\right] \le \Pr\left[\sum_{i=1}^{T_0} Z_i \le R(y)\right], \tag{8}$$

with the different $Z_i$'s being i.i.d. according to the exponential distribution of $Z$, as defined above.

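By Lemma 5, each variable $Z_i$ has the property that $\mathbb{E}[Z_i] \ge k/4$, and $0 \le Z_i \le k$. We will assume w.l.o.g. that $\mathbb{E}[Z_i] = k/4$; this clearly maximizes the probability that we wish to upper-bound. Thus $\mathbb{E}\left[\sum_{i=1}^{T_0} Z_i\right] = T_0 k/4 = 2R(y)$. The variance of such a random variable is maximized by having $Z_i$ only have support $\{0, k\}$, with $Z_i$ taking value $k$ with probability $1/4$ and taking value $0$ with probability $3/4$. This implies that $\mathrm{Var}[Z_i] \le 3k^2/16$. We apply Bernstein's inequality: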

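$$\Pr\left[\sum_{i=1}^{T_0} Z_i \le R(y)\right] \le \exp\left(-\frac{R(y)^2}{2\left(\frac{3}{16}\cdot T_0 k^2 + \frac{1}{3}\cdot\frac{3}{4}k\cdot R(y)\right)}\right) \le \exp(-T_0/28).$$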

Combining the above and Inequalities (7) and (8), we conclude that

$$\Pr[|T| > T_0] \le \exp(-T_0/28) \le \exp(-R(y)/(4k)).$$

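If $R(y) < B/2 + 4k\ln(1/\delta)$, we can never have $s_t^y \ge B/2 + 4k\ln(1/\delta)$, and hence $T = \emptyset$ in this case. Otherwise, $R(y) \ge 4k\ln(1/\delta)$, and we get

$$\Pr[|T| > T_0] \le \exp(-R(y)/(4k)) \le \delta$$

as required. ∎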
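The constants in the Bernstein step can be verified with exact rational arithmetic. The sketch below (hypothetical helper name) plugs in $T_0 = 8R(y)/k$, the variance bound $3k^2/16$ of a $\{0, k\}$-valued variable with mean $k/4$, and the deviation bound $3k/4$, and checks that the exponent equals $T_0/28$ exactly and is at least $R(y)/(4k)$.

```python
from fractions import Fraction


def bernstein_exponent(R, k):
    """Return (R^2 / (2 (sigma^2 + M R / 3)), T0) with T0 = 8R/k,
    sigma^2 = T0 * 3k^2/16 and M = 3k/4, all in exact rational arithmetic."""
    T0 = Fraction(8 * R, k)
    sigma2 = T0 * Fraction(3, 16) * k * k
    M = Fraction(3, 4) * k
    return Fraction(R * R) / (2 * (sigma2 + M * R / 3)), T0


# Variance of a {0, k}-valued variable equal to k with probability 1/4.
k_demo = 12
var_demo = Fraction(1, 4) * k_demo**2 - (Fraction(1, 4) * k_demo) ** 2
print(var_demo == Fraction(3, 16) * k_demo**2)  # True

for R, k in [(10, 2), (100, 8), (12345, 50)]:
    exponent, T0 = bernstein_exponent(R, k)
    print(exponent == T0 / 28, T0 / 28 >= Fraction(R, 4 * k))  # True True
```

With these values the denominator is $\tfrac{7}{2}R(y)k$, so the exponent is $2R(y)/(7k) = T_0/28$, which exceeds $R(y)/(4k)$ because $2/7 > 1/4$.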

The above lemma resulted in the required bound for time steps where $s_t^y$ is large. When $s_t^y$ is small, we take advantage of the size of the buffer.

###### Lemma 7.

Assuming ,

$$\Pr\left[\exists t \in [n],\ S_t \le B/2 + 4k\ln(1/\delta)\right] \le 2\delta.$$

###### Proof.

We have that:

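$$\begin{aligned}\Pr\left[S_t \le (B/2) + 4k\ln(1/\delta)\right] &\le \frac{\sum_{i=0}^{\lfloor 4k\ln(1/\delta)\rfloor}\exp(-(B/2-i)/k)}{\sum_{i=0}^{B/2}\exp(-(B/2-i)/k)} = \frac{\sum_{i=0}^{4k\ln(1/\delta)}\exp(i/k)}{\sum_{i=0}^{B/2}\exp(i/k)}\\ &\le \frac{4k\ln(1/\delta)\,(1/\delta)^4}{(B/4)\cdot\exp\left(8\ln\left((n/k)\ln(1/\delta)/\delta\right)\right)} \le \frac{2\delta^4}{(n/k)^8} \le \frac{\delta k}{8n}.\end{aligned}$$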

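The proof of Lemma 6, applied to an item $y$ that is at least as large as every item in the input stream (so that $R(y) = n$), shows that with probability at least $1 - \delta$, the total number of compactions performed by the relative-compactor while processing an input stream of length $n$ is at most $8n/k$. Taking a union bound over these compactions, it follows that

$$\Pr\left[\exists t \in [n],\ S_t \le (B/2) + 4k\ln(1/\delta)\right] \le \delta + \frac{8n}{k}\cdot\frac{\delta k}{8n} = 2\delta.$$

∎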
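For concrete numbers, the probability bounded in Lemma 7 can be evaluated exactly. The sketch below assumes, as in the first step of the proof, that $\Pr[S_t = B/2 + i] \propto \exp(-(B/2 - i)/k)$ for $i = 0, \ldots, B/2$; the parameter values are arbitrary and the helper names are hypothetical. It also checks the elementary bound $(c+1)\exp(-(B/2 - c)/k)$ with $c = \lfloor 4k\ln(1/\delta)\rfloor$, obtained by bounding the numerator by $c+1$ times its largest term and the denominator below by its largest term, which is 1.

```python
import math


def tail_probability(B, k, c):
    """Pr[S <= B/2 + c] when Pr[S = B/2 + i] is proportional to exp(-(B/2 - i)/k)."""
    half = B // 2
    weights = [math.exp(-(half - i) / k) for i in range(half + 1)]  # all <= 1
    return sum(weights[: c + 1]) / sum(weights)


def elementary_bound(B, k, c):
    """(c + 1) * exp(-(B/2 - c)/k): numerator <= (c+1) * largest term, denominator >= 1."""
    return (c + 1) * math.exp(-(B / 2 - c) / k)


k, delta = 5, 0.01
c = math.floor(4 * k * math.log(1 / delta))  # 92
for B in (200, 400, 800):
    p = tail_probability(B, k, c)
    print(B, p, elementary_bound(B, k, c))
```

The probability decays roughly like $\exp(-B/(2k))$ once $B/2$ exceeds $4k\ln(1/\delta)$, which is what lets a logarithmic buffer size absorb a union bound over all $O(n/k)$ compactions.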

Our main analytic result regarding relative-compactors is an immediate corollary of Lemmas 6 and 7:

###### Theorem 8.

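Under the assumptions of Lemmas 6 and 7, for any fixed item $y \in \mathcal{U}$, and any fixed relative-compactor fed an input stream of length $n$, with probability at least $1 - 3\delta$ over the randomness of the choice of $S_t$, the following holds:

$$\sum_{t=1}^{n}|\Delta_t(y)| \le 8R(y)/k.$$

In particular, $|\mathrm{Err}_n(y)| \le 8R(y)/k$.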

## 4 Analysis of the Full Sketch

Let $R_h(y)$ be the rank of $y$ in the input stream to layer $h$. Abusing notation, we denote by $\mathrm{Err}_h(y)$ the error for item $y$ at the end of the stream when comparing the input stream to the compactor of level $h$ and its output stream and buffer. That is, letting $\mathcal{B}_h$ be the items in the buffer of relative-compactor $h$ after Algorithm 2 has processed the input stream,

$$\mathrm{Err}_h(y) = R_h(y) - 2R_{h+1}(y) - R(y; \mathcal{B}_h). \tag{9}$$

###### Observation 9.

Let $H$ denote the number of relative-compactors ever created by the full algorithm (Algorithm 2). Then $H \le \lceil\log_2(n/B)\rceil + 1$.

###### Proof.

The stream output by level $h$ is at most half the length of its input. It follows that the relative-compactor of level $h$ will not observe more than $n/2^h$ items, meaning the entire input to the level-$\lceil\log_2(n/B)\rceil$ relative-compactor will be stored in its memory buffer. This means that a relative-compactor at level $\lceil\log_2(n/B)\rceil + 1$ will never be constructed. ∎

In what follows, we assume that for all relative-compactors, the randomness of choosing $S_t$ is fixed in a way that provides the guarantees of Lemma 7 and Theorem 8 for all relative-compactors opened by the full algorithm. By a union bound, this occurs with probability at least $1 - 3\delta H$. That is, the remainder of the analysis will condition on the following two events, $\mathcal{E}_1$ and $\mathcal{E}_2$, occurring:

• $\mathcal{E}_1$ is the event that every time a compaction operation occurs in any relative-compactor, the number $S_t$ of items that are not compacted (see Line 6 of Algorithm 1) exceeds $B/2 + 4k\ln(1/\delta)$.

• $\mathcal{E}_2$ is the event that the error guarantee of Theorem 8, i.e., that $\sum_t |\Delta^h_t(y)| \le 8R_h(y)/k$, holds for the relative-compactor on each level $h$. Here, $\Delta^h_t(y)$ is the error variable at time $t$ on level $h$.

We now provide bounds on the rank of $y$ on each level, starting with a simple one that will be useful for bounding the maximum level $h$ with $R_h(y) > 0$.

###### Observation 10.

For any $h \ge 1$, it holds that $R_h(y) = 0$ whenever $R_{h-1}(y) \le B/2$.

###### Proof.

Since the minimum $B/2$ items in the input stream to relative-compactor $h-1$ are stored in the buffer and never given to the output stream of relative-compactor $h-1$, it follows immediately that $R_h(y) = 0$ whenever $R_{h-1}(y) \le B/2$. ∎

More crucially, we prove that $R_h(y)$ roughly halves with every level, up to a certain crucial level $H(y)$. This is easy to see in expectation, and an application of the following Chernoff bound shows that it is true with high probability up to a certain level. To define this level, let $H(y)$ be the minimal $h$ for which $2^{2-h}R(y) \le B/2$.

###### Fact 11 (Multiplicative Chernoff bound).

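Let $X_1, \ldots, X_m$ be independent binary random variables and let $X = \sum_{i=1}^{m} X_i$. Then, for any $\mu \ge \mathbb{E}[X]$ and any $\gamma \in (0, 1]$, it holds that

$$\Pr[X > (1+\gamma)\mu] \le \exp(-\gamma^2\mu/3).$$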
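Fact 11 is a standard bound, and it is easy to compare against exact binomial tails. The sketch below (hypothetical helper names, arbitrary parameters) computes $\Pr[X > (1+\gamma)\mu]$ exactly for $X \sim \mathrm{Binomial}(n, p)$ with $\mu = np$ and checks it against $\exp(-\gamma^2\mu/3)$; it covers only the special case of identically distributed variables.

```python
import math


def binomial_upper_tail(n, p, t):
    """Exact Pr[X > t] for X ~ Binomial(n, p)."""
    start = math.floor(t) + 1
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(start, n + 1))


def chernoff_bound(mu, gamma):
    """Fact 11: Pr[X > (1 + gamma) mu] <= exp(-gamma^2 mu / 3) for gamma in (0, 1]."""
    return math.exp(-gamma * gamma * mu / 3)


for n, p in [(50, 0.5), (200, 0.1), (1000, 0.02)]:
    mu = n * p
    for gamma in (0.25, 0.5, 1.0):
        exact = binomial_upper_tail(n, p, (1 + gamma) * mu)
        print(n, p, gamma, exact, chernoff_bound(mu, gamma))
```

In the proof of Lemma 12 the bound is used with $\gamma = 1$, i.e., $\Pr[X > 2\mu] \le \exp(-\mu/3)$.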

###### Lemma 12.

Assume that $B \ge 24\log(1/\delta)$ and $\delta \le 1/2$. With probability at least $1 - \delta$, for any $h < H(y)$ it holds that $R_h(y) \le 2^{1-h}R(y)$.

###### Proof.

We show by induction on $h$ that $\mathbb{E}[R_h(y)] \le 2^{-h}R(y)$ and that $R_h(y) \le 2^{1-h}R(y)$ with sufficiently high probability.

The base case $h = 0$ is implied by $R_0(y) = R(y)$, so consider $h > 0$. Observe that any compaction operation at level $h-1$ that involves $m$ items smaller than $y$ inserts $m/2$ such items to the input stream at level $h$ in expectation (no matter whether $m$ is odd or even). Thus, $\mathbb{E}[R_h(y)] \le \mathbb{E}[R_{h-1}(y)]/2 \le 2^{-h}R(y)$, where the second inequality holds by the induction hypothesis.

Next, we apply the Multiplicative Chernoff bound. Note that the random variable $R_h(y)$ equals a fixed amount (if we fix random bits used on levels below $h-1$) plus a sum of binary random variables. Namely, each compaction on level $h-1$ involving $m$ items smaller than $y$ promotes $\lfloor m/2\rfloor$ such items to level $h$, and if $m$ is odd, then with probability $1/2$ one additional such item. Since the expectation of the sum of these binary variables (equal to their number) is at most $2^{-h}R(y)$, we use Fact 11 with $\mu = 2^{-h}R(y)$ and $\gamma = 1$ to show

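$$\begin{aligned}\Pr\left[R_h(y) > 2\cdot 2^{-h}R(y)\right] &\le \exp\left(-\tfrac{1}{3}\cdot 2^{-h}R(y)\right) = \exp\left(-\tfrac{1}{24}\cdot 2^{3-H(y)}\cdot 2^{H(y)-h}\cdot R(y)\right)\\ &\le \exp\left(-\tfrac{1}{24}\cdot 2^{H(y)-h}\cdot B\right) \le \exp\left(-2^{H(y)-h}\cdot\log(1/\delta)\right) = \delta^{2^{H(y)-h}} \le \delta\cdot 2^{-H(y)+h},\end{aligned}$$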

where the second inequality is by the definition of $H(y)$, the third inequality is by $B \ge 24\log(1/\delta)$, and the last inequality uses $\delta \le 1/2$ and $h < H(y)$. This concludes the induction proof.

Finally, to show the lemma, we take the union bound over $h < H(y)$ to get that with probability at least $1 - \sum_{h < H(y)}\delta\cdot 2^{-H(y)+h} \ge 1 - \delta$, we have $R_h(y) \le 2^{1-h}R(y)$ for all such levels $h$. ∎
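The last step of the chain, $\delta^{2^{j}} \le \delta\cdot 2^{-j}$ for $j = H(y) - h \ge 1$, and the final union bound are elementary to check. The sketch below (hypothetical helper names) confirms both for a few values of $\delta \le 1/2$, and shows that the per-level step can fail for large $\delta$.

```python
def last_step_holds(delta, j):
    """delta ** (2 ** j) <= delta * 2 ** (-j), the last step of the chain, with j = H(y) - h."""
    return delta ** (2 ** j) <= delta * 2.0 ** (-j)


def union_bound(delta, H):
    """Sum of the per-level failure bounds delta * 2 ** (h - H) over levels h < H."""
    return sum(delta * 2.0 ** (h - H) for h in range(H))


print(all(last_step_holds(d, j) for d in (0.5, 0.1, 1e-3) for j in range(1, 30)))  # True
print(last_step_holds(0.9, 1))  # False: the step needs a small enough delta
print(union_bound(0.1, 10) <= 0.1)  # True
```

The union bound is a geometric series equal to $\delta(1 - 2^{-H(y)}) < \delta$, which is why the per-level bounds were chosen to shrink by a factor of 2 per level.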

Note that the assumption in Lemma 12 that $B \ge 24\log(1/\delta)$ is always satisfied by our definition of $B$ (Line 2 of Algorithm 1).

###### Lemma 13.

Conditioned on the bounds in Lemma 12, it holds that $R_{H(y)}(y) = 0$.

###### Proof.

According to Lemma 12 and the definition of $H(y)$ as the minimal $h$ for which $2^{2-h}R(y) \le B/2$,

$$R_{H(y)-1}(y) \le 2^{2-H(y)}R(y) \le \tfrac{1}{2}B.$$

Invoking Observation 10, we conclude that $R_{H(y)}(y) = 0$. ∎

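We are now ready to bound the overall error of the sketch, i.e., $R(y) - \hat{R}(y)$, where $\hat{R}(y)$ is the estimated rank of $y$. It is easy to see (the terms $2^{h+1}R_{h+1}(y)$ in Equation (9) telescope) that this error for item $y$, denoted by $\mathrm{Err}(y)$, can be written as

$$\mathrm{Err}(y) = \sum_{h=0}^{H} 2^h\,\mathrm{Err}_h(y),$$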

where $H$ is the index of the top relative-compactor, which never produces any output. To bound this error we refine the guarantee of Theorem 8. Notice that for any particular relative-compactor, the bound $\sum_t |\Delta_t(y)|$ referred to in Theorem 8 applied to a layer $h$ is a potentially crude upper bound on $|\mathrm{Err}_h(y)|$: each term $\Delta^h_t(y)$ is positive or negative with equal probability, so the terms are likely to involve a large amount of cancellation.
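The identity $\sum_h 2^h\mathrm{Err}_h(y) = R(y) - \hat{R}(y)$ can be checked on a simulated run, since the output stream of level $h$ is the input stream of level $h+1$ and the top level emits nothing. For brevity, the sketch below uses a simplified compaction rule (always compact the largest $B/2$ items of a full buffer); this is an assumption that differs from the randomized choice of $S_t$, but the telescoping identity does not depend on it. All names are hypothetical.

```python
import random


def rank(seq, y):
    """Number of items x in seq with x <= y."""
    return sum(1 for x in seq if x <= y)


def run_levels(stream, B, seed=0):
    """Chain of simplified compactors; returns each level's input stream and final buffer."""
    if B % 4:
        raise ValueError("B must be a multiple of 4 so that B/2 items can be paired")
    rng = random.Random(seed)
    inputs, buffers = [], []

    def feed(h, x):
        if h == len(inputs):  # open level h on its first input
            inputs.append([])
            buffers.append([])
        inputs[h].append(x)
        buffers[h].append(x)
        if len(buffers[h]) == B:  # compact the largest B/2 items
            buffers[h].sort()
            compacted = buffers[h][B // 2:]
            buffers[h] = buffers[h][:B // 2]
            for z in compacted[rng.randrange(2)::2]:
                feed(h + 1, z)

    for x in stream:
        feed(0, x)
    return inputs, buffers


def level_errors(inputs, buffers, y):
    """Err_h(y) = R_h(y) - 2 R_{h+1}(y) - R(y; B_h), as in Equation (9)."""
    top = len(inputs) - 1
    return [rank(inputs[h], y) - 2 * (rank(inputs[h + 1], y) if h < top else 0)
            - rank(buffers[h], y) for h in range(len(inputs))]


data = list(range(1, 3001))
random.Random(5).shuffle(data)
inputs, buffers = run_levels(data, B=20, seed=1)
for y in (1, 50, 777, 3000):
    err = sum((1 << h) * e for h, e in enumerate(level_errors(inputs, buffers, y)))
    estimate = sum((1 << h) * rank(b, y) for h, b in enumerate(buffers))
    print(y, err, rank(data, y) - estimate)
```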

###### Observation 14.

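Conditioned on events $\mathcal{E}_1$ and $\mathcal{E}_2$ occurring, for any relative-compactor $h$, $\mathrm{Err}_h(y)$ is a sum of at most $8R_h(y)/k$ random variables, i.i.d. uniform in $\{-1, 1\}$. In particular, $\mathrm{Err}_h(y)$ is a zero-mean sub-gaussian random variable with $\mathrm{Var}[\mathrm{Err}_h(y)] \le 8R_h(y)/k$.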

According to Observation 14, $\mathrm{Err}(y)$ is a sum of zero-mean sub-gaussian random variables, and as such is itself a zero-mean sub-gaussian random variable. We make use of this fact below in order to provide a high probability bound on the error by bounding its variance.
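Under the model of Observation 14, and additionally treating the levels as independent (an assumption of this illustration), the variance of $\sum_h 2^h\mathrm{Err}_h(y)$ is $\sum_h 4^h m_h$, where $m_h$ is the number of $\pm 1$ terms at level $h$. The Monte Carlo sketch below (hypothetical names, arbitrary counts) illustrates this; it samples each level's sum of signs by counting the set bits of `random.Random.getrandbits`.

```python
import random


def sample_weighted_sum(counts, rng):
    """One draw of sum_h 2**h * (sum of counts[h] independent uniform +/-1 signs)."""
    total = 0
    for h, m in enumerate(counts):
        ones = bin(rng.getrandbits(m)).count("1") if m else 0  # number of +1 signs
        total += (1 << h) * (2 * ones - m)
    return total


def predicted_variance(counts):
    """Var = sum_h 4**h * counts[h] for independent +/-1 terms."""
    return sum(4**h * m for h, m in enumerate(counts))


counts = [40, 20, 10]
rng = random.Random(42)
samples = [sample_weighted_sum(counts, rng) for _ in range(40000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(predicted_variance(counts), round(var, 1), round(mean, 3))
```

The weights $4^h$ explain why the higher levels, despite having fewer relevant compactions, dominate the variance in Equation (1).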

We are now ready to prove Theorem 1. Theorem 15 below provides a more detailed version of its statement, including specifying the constants hidden by the $O(\cdot)$ notation. Theorem 1 is a direct consequence of Theorem 15 with $k$ set appropriately as a function of $\varepsilon$, $n$, and $\delta$.

###### Theorem 15.

Let be the parameters fed into Algorithm 2 and assume that . Let . For any fixed item we have that

 Pr[|Err(y)|≥εR(y)]

The overall memory used by the algorithm is

###### Proof.

We assume that for all relative-compactors, the randomness of choosing $S_t$ is fixed in a way that provides the guarantees of Lemma 7 and Theorem 8. By a union bound over the levels, whose number is bounded by Observation 9, this occurs with probability at least $1 - 3\delta H$.

Let us first bound the variance of . If , then Observation 3 implies that , and hence that . Recall that denotes the minimal such that , which for in particular implies , or equivalently . Then Lemma 13 implies that for any , we have .

For $h < H(y)$, by Observation 14 we have $\mathrm{Var}[\mathrm{Err}_h(y)] \le 8R_h(y)/k$. Hence:

 Var[Err(y)] =H(y)−1∑h=022hVar[Errh(y)] ≤