# Tracking the ℓ_2 Norm with Constant Update Time

The ℓ_2 tracking problem is the task of obtaining a streaming algorithm that, given access to a stream of items a_1,a_2,a_3,... from a universe [n], outputs at each time t an estimate of the ℓ_2 norm of the frequency vector f^(t)∈R^n (where f^(t)_i is the number of occurrences of item i in the stream up to time t). The previous work [Braverman-Chestnut-Ivkin-Nelson-Wang-Woodruff, PODS 2017] gave a streaming algorithm with (the optimal) space using O(ϵ^-2 log(1/δ)) words and O(ϵ^-2 log(1/δ)) update time to obtain an ϵ-accurate estimate with probability at least 1-δ. We give the first algorithm that achieves update time of O(log(1/δ)), which is independent of the accuracy parameter ϵ, together with the optimal space using O(ϵ^-2 log(1/δ)) words. Our algorithm is obtained using the Count Sketch of [Charikar-Chen-Farach-Colton, ICALP 2002].

## 1 Introduction

The streaming model considers the following setting. One is given a list $a_1,\dots,a_m\in[n]$ as input, where we think of $n$ as extremely large. The algorithm is only allowed to read the input once in a stream, and the goal is to answer some predetermined queries using space logarithmic in $n$. For each $i\in[n]$ and time $t\in[m]$, define $f^{(t)}_i$ as the frequency of $i$ at time $t$. Many classical streaming problems are concerned with approximating statistics of $f^{(t)}$, such as the distinct elements problem (i.e., $\|f^{(m)}\|_0$). One of the most well-studied problems is one-shot $\ell_2$ estimation, where the goal is to estimate $\|f^{(m)}\|_2^2$ within $(1\pm\epsilon)$ multiplicative error; this was achieved by the seminal AMS sketch of Alon et al. [AMS96].

We consider a streaming algorithm $\mathcal{A}$ that maintains some logarithmic space and outputs an estimate $\sigma_t$ at the $t$-th step of the computation. We say $\mathcal{A}$ achieves $(\epsilon,\delta)$-tracking if for every input stream

$$\Pr\left[\exists t\in[m]:\ \left|\sigma_t-\|f^{(t)}\|_2^2\right|>\epsilon\Delta_t\right]\le\delta,$$

where the normalization factor $\Delta_t$ differs between strong tracking and weak tracking. For $(\epsilon,\delta)$-strong tracking, $\Delta_t=\|f^{(t)}\|_2^2$ is the squared $\ell_2$ norm of the frequency vector up to time $t$, while for $(\epsilon,\delta)$-weak tracking, $\Delta_t=\|f^{(m)}\|_2^2$ is the squared $\ell_2$ norm of the overall frequency vector. Note that strong tracking implies weak tracking, and weak tracking implies one-shot approximation. In this work, we focus on tracking via linear sketching: we specify a distribution $\mathcal{D}$ over matrices $\Pi\in\mathbb{R}^{k\times n}$ and maintain the sketch vector $\Pi f^{(t)}$ at time $t$. The estimate is then defined as $\sigma_t=\|\Pi f^{(t)}\|_2^2$. The space complexity of $\mathcal{A}$ is the number of machine words required by $\Pi$ (following convention, we assume that a machine word has at least $\log n$ bits). The update time complexity of $\mathcal{A}$ is the time needed to update $\Pi f^{(t)}$, measured in number of arithmetic operations.

Both weak tracking and strong tracking have been studied in different contexts [HTY14, BCIW16, BCI17], and the focus of this paper is on update time complexity. Specifically, we are interested in the dependence of the update time on the approximation factor $\epsilon$. The state-of-the-art result prior to our work is by Braverman et al. [BCI17], showing that AMS provides weak tracking with $O(\epsilon^{-2}\log(1/\delta))$ update time and $O(\epsilon^{-2}\log(1/\delta))$ words of space.

Apart from tracking, several sketching algorithms for one-shot $\ell_2$ approximation have faster update time. Dasgupta et al. [DKS10] and Kane and Nelson [KN14] showed that sparse JL achieves update time with only a roughly $\epsilon^{-1}$ (rather than $\epsilon^{-2}$) dependence on $\epsilon$ for one-shot approximation. Charikar, Chen, and Farach-Colton [CCFC02] designed the CountSketch algorithm and showed that it achieves update time independent of $\epsilon$ for one-shot $\ell_2$ approximation.

#### Update time

Unlike space complexity, update time complexity in the streaming model has received less study, though it is of great importance in applications. For example, the packet passing problem [KSZC03] requires $\ell_2$ estimation in the streaming model at very high packet arrival rates (each packet has 40 bytes, i.e., 320 bits). Thorup and Zhang [TZ12] improved the update time from 182 nanoseconds to 50 nanoseconds, making the algorithm more practical.

While some streaming problems have algorithms with constant update time (e.g., distinct elements [KNW10b] and $\ell_2$ estimation [TZ12]), other important problems do not ($\ell_p$ estimation for $0<p<2$ [KNPW11a], heavy hitters [CCFC02, CM05], and tracking [BCI17]). Larsen et al. [LNN15] systematically studied update time complexity and showed lower bounds for heavy hitters, point query, entropy estimation, and moment estimation in the non-adaptive turnstile streaming model. In particular, they showed that small-space algorithms for $\ell_p$ estimation of vectors over $[n]$ with failure probability $\delta$ must have an update time that grows with $1/\delta$. Note that their lower bound does not depend on $\epsilon$.

#### Space lower bounds

For one-shot estimation of the $\ell_2$ norm, Kane et al. [KNW10a] gave a lower bound on the number of bits of space required by any streaming algorithm, and this space lower bound is tight due to the AMS sketch. However, it only applies in the constant failure probability regime.

In the regime of sub-constant failure probability $\delta$, known tight lower bounds on Distributional JL [KMN11, JW13] imply that $\Omega(\epsilon^{-2}\log(1/\delta))$ rows are necessary for the special case of linear sketching algorithms, since an $(\epsilon,\delta)$-weak tracking linear sketch defines a distribution over matrices that satisfies the Distributional JL guarantee with distortion $\epsilon$ and failure probability $\delta$. For linear sketches, this lower bound on the number of rows is equivalent to a lower bound on the words of space.

For the regime of faster update time, Kane and Nelson [KN14] showed that CountSketch-type constructions (with the optimal $O(\epsilon^{-2}\log(1/\delta))$ rows) require super-constant sparsity, i.e., number of nonzero entries per column, to achieve distortion $\epsilon$ and failure probability $\delta$. However, this does not preclude a sketch with a suboptimal dependence on $\delta$ in the number of rows from having constant sparsity, for example a sketch with $O(\epsilon^{-2}/\delta)$ rows and constant sparsity; indeed, this is what CountSketch achieves. Note that in our setting, we can potentially boost a constant failure probability to an arbitrarily small failure probability $\delta$ by taking the median of $O(\log(1/\delta))$ estimators (this is not immediate for weak tracking). Thus, we may be able to bypass the lower bounds for linear sketches.

To summarize the situation: for constant failure probability, it is only known that linear sketches require dimension $\Omega(\epsilon^{-2})$, and it is not known whether super-constant sparsity is required for tracking with this optimal dimension. In particular, it was not known how to achieve, say, $(\epsilon,\delta)$-weak tracking of $\ell_2$ for constant $\delta$ with $O(\epsilon^{-2})$ words of space and constant update time.

#### Our contributions

In this paper, we show that there is a streaming algorithm with $O(\log(1/\delta))$ update time and space using $O(\epsilon^{-2}\log(1/\delta))$ words that achieves $(\epsilon,\delta)$-weak tracking.

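###### Theorem 1.1 (informal).

For any $\epsilon>0$, $\delta\in(0,1)$, and $n,m\in\mathbb{N}$, and for any insertion-only stream over $[n]$ with frequencies $f^{(1)},\dots,f^{(m)}$, there exists a streaming algorithm providing $(\epsilon,\delta)$-weak tracking with space using $O(\epsilon^{-2}\log(1/\delta))$ words and $O(\log(1/\delta))$ update time.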

Further, by applying a standard union bound argument (Lemma 4.1), the same algorithm can achieve strong tracking as well.

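###### Corollary 1.2.

For any $\epsilon>0$, $\delta\in(0,1)$, and $n,m\in\mathbb{N}$, and for any insertion-only stream over $[n]$ with frequencies $f^{(1)},\dots,f^{(m)}$, there exists a streaming algorithm providing $(\epsilon,\delta)$-strong tracking with $O(\epsilon^{-2}\log(\log m/\delta))$ words and $O(\log(\log m/\delta))$ update time.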

The algorithm in the main theorem is obtained by running $O(\log(1/\delta))$ copies of CountSketch and taking the median.

The main techniques used in the proof are the chaining argument and the Hanson-Wright inequality, which are also used in [BCI17] to show the tracking properties of AMS. However, directly applying these tools to CountSketch would not give the desired bounds because of the sparse structure of its sketching matrix. To overcome this issue, we exploit the structure of the CountSketch sketching matrix. We compare our techniques with those of [BCI17] after presenting the proof of Theorem 1.1 (see Remark 3.11).

The rest of the paper is organized as follows. Some preliminaries are provided in Section 2. In Section 3, we prove our main theorem, showing that CountSketch with $O(\epsilon^{-2})$ rows achieves $(\epsilon,\delta)$-weak tracking with constant update time. For strong tracking, we discuss upper and lower bounds in Section 4. In Section 5, we discuss future directions and open problems.

## 2 Preliminaries

In the following, $n$ denotes the size of the universe, $k$ denotes the number of rows of the sketching matrix, $t$ denotes the time, and $m$ denotes the final time. We let $[n]=\{1,\dots,n\}$ and use $\tilde O(\cdot)$ and $\tilde\Omega(\cdot)$ to denote the usual $O(\cdot)$ and $\Omega(\cdot)$ with some extra poly-logarithmic factors.

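The input of the streaming algorithm is a list $a_1,\dots,a_m\in[n]$. For each $i\in[n]$ and time $t\in[m]$, define $f^{(t)}_i=|\{s\le t: a_s=i\}|$ as the frequency of $i$ at time $t$. The one-shot $\ell_2$ approximation problem is to produce an estimate for $\|f^{(m)}\|_2^2$ with $(1\pm\epsilon)$ multiplicative error and success probability at least $1-\delta$, for $\epsilon>0$ and $\delta\in(0,1)$.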

### 2.1 ℓ2 tracking

Here, we give the formal definition of tracking for sketching algorithm.

###### Definition 2.0 (ℓ2 tracking).

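For any $\epsilon>0$, $\delta\in(0,1)$, and $m\in\mathbb{N}$, let $f^{(1)},\dots,f^{(m)}\in\mathbb{R}^n$ be the frequencies of an insertion-only stream over $[n]$, and let $\tilde f^{(1)},\dots,\tilde f^{(m)}$ be their (randomized) approximations produced by a sketching algorithm. We say the algorithm provides $(\epsilon,\delta)$-strong tracking if

$$\Pr\left[\exists t\in[m],\ \left|\|\tilde f^{(t)}\|_2^2-\|f^{(t)}\|_2^2\right|>\epsilon\|f^{(t)}\|_2^2\right]\le\delta.$$

We say the algorithm provides $(\epsilon,\delta)$-weak tracking if

$$\Pr\left[\exists t\in[m],\ \left|\|\tilde f^{(t)}\|_2^2-\|f^{(t)}\|_2^2\right|>\epsilon\|f^{(m)}\|_2^2\right]\le\delta.$$

Note that the difference between the two tracking guarantees is that in strong tracking we bound the deviation of the estimate from the true squared norm by $\epsilon\|f^{(t)}\|_2^2$, while in weak tracking we bound this deviation by $\epsilon\|f^{(m)}\|_2^2$.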
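As a concrete illustration of the two conditions, the following minimal Python sketch checks a single run of a tracker. The function name `tracking_ok` is hypothetical, not from the paper; the definitions above ask that this check fail with probability at most $\delta$ over the algorithm's randomness.

```python
import numpy as np


def tracking_ok(estimates, true_sq_norms, eps):
    """Check one run of a tracker against both error conditions.

    estimates[t]     -- the algorithm's estimate of ||f^(t)||_2^2
    true_sq_norms[t] -- the exact value ||f^(t)||_2^2
    Returns (strong_ok, weak_ok).
    """
    est = np.asarray(estimates, dtype=float)
    true = np.asarray(true_sq_norms, dtype=float)
    err = np.abs(est - true)
    # Strong tracking: the error at time t is compared with the current norm.
    strong_ok = bool(np.all(err <= eps * true))
    # Weak tracking: every error is compared with the final norm ||f^(m)||_2^2.
    weak_ok = bool(np.all(err <= eps * true[-1]))
    return strong_ok, weak_ok


# An early error of 0.5 violates strong tracking (0.5 > 0.2 * 1)
# but not weak tracking (0.5 <= 0.2 * 4).
print(tracking_ok([1.5, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0], eps=0.2))  # prints (False, True)
```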

### 2.2 AMS sketch and CountSketch

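Alon et al. [AMS96] proposed the seminal AMS sketch for $\ell_2$ approximation in the streaming model. In the AMS sketch, consider $\Pi\in\mathbb{R}^{k\times n}$ where $\Pi_{j,i}=\sigma_{j,i}/\sqrt{k}$ and $\sigma_{j,i}$ is an i.i.d. Rademacher random variable for each $j\in[k]$ and $i\in[n]$. When $k=O(\epsilon^{-2}\log(1/\delta))$, the AMS sketch approximates the squared $\ell_2$ norm within $(1\pm\epsilon)$ multiplicative error with probability at least $1-\delta$. Note that the update time of the AMS sketch is $O(k)$ since the matrix is dense.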

Charikar, Chen, and Farach-Colton [CCFC02] proposed the following CountSketch algorithm for $\ell_2$ approximation in the streaming model. Here, consider $\Pi\in\mathbb{R}^{k\times n}$, and denote the $i$-th column of $\Pi$ by $\Pi_i$ for each $i\in[n]$. The column $\Pi_i$ is defined as follows. First, pick $j\in[k]$ uniformly at random and set $\Pi_{j,i}$ to be an independent Rademacher random variable. Next, set the other entries of $\Pi_i$ to 0. Note that, unlike the AMS sketch, the normalization term in CountSketch is 1 since there is exactly one nonzero entry in each column. [CCFC02] showed that CountSketch provides one-shot $\ell_2$ approximation with $O(\epsilon^{-2}\delta^{-1})$ rows.

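###### Lemma 2.1 ([CCFC02]).

Let $\epsilon,\delta\in(0,1)$ and $n\in\mathbb{N}$. Picking $k=\Omega(\epsilon^{-2}\delta^{-1})$, we have for any $x\in\mathbb{R}^n$,

$$\Pr_\Pi\left[\left|\|\Pi x\|_2^2-\|x\|_2^2\right|>\epsilon\|x\|_2^2\right]\le\delta.$$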
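The two constructions can be sketched in Python as follows. This is an illustrative sketch, not the paper's implementation: it assumes fully random hashes drawn with NumPy, and the class names are hypothetical. It shows why a CountSketch update touches a single counter while the dense AMS sketch touches all $k$ coordinates.

```python
import numpy as np


class CountSketchL2:
    """CountSketch with k rows and fully random hashes (for illustration).

    Column i of Pi has a single nonzero entry Pi[h[i], i] = s[i].
    """

    def __init__(self, n, k, rng):
        self.h = rng.integers(0, k, size=n)      # row chosen for each item
        self.s = rng.choice([-1, 1], size=n)     # Rademacher sign for each item
        self.y = np.zeros(k, dtype=np.int64)     # sketch vector y = Pi f

    def update(self, i):
        # Insertion of item i touches one counter: O(1) work.
        self.y[self.h[i]] += self.s[i]

    def estimate(self):
        return float(self.y @ self.y)            # ||Pi f||_2^2


class AMSL2:
    """Dense AMS sketch: Pi[j, i] = sigma[j, i] / sqrt(k)."""

    def __init__(self, n, k, rng):
        self.pi = rng.choice([-1.0, 1.0], size=(k, n)) / np.sqrt(k)
        self.y = np.zeros(k)

    def update(self, i):
        # Insertion of item i adds a dense column: O(k) work.
        self.y += self.pi[:, i]

    def estimate(self):
        return float(self.y @ self.y)
```

For example, one can feed both sketches the same random stream and compare each `estimate()` with $\|f\|_2^2$ computed via `np.bincount`; by Lemma 2.1, the CountSketch error is at most $\epsilon\|f\|_2^2$ with probability at least $1-\delta$ once $k=\Omega(\epsilon^{-2}\delta^{-1})$.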

#### Implementing CountSketch in logarithmic space

Previously, we defined CountSketch using fully independent randomness, which requires space linear in $n$. However, the proof of Theorem 3.1 only needs 8-wise independence, so the randomness can be stored in $O(1)$ words. It is well known that CountSketch with $k$ rows can be implemented with an 8-wise independent hash family using $O(k)$ words. We describe the whole implementation in Appendix A for completeness.
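A standard way to obtain $d$-wise independence (here $d=8$) is a random polynomial of degree $d-1$ over a prime field. The sketch below assumes that construction with the Mersenne prime $P=2^{61}-1$; all names are hypothetical, and the paper's Appendix A may differ in details. Mapping hash values to a row by reduction mod $k$ and to a sign by the low bit introduces a bias of order $k/P$, which this sketch ignores.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; assumed larger than the universe size n


class PolyHash:
    """Random polynomial of degree d-1 over F_P: a d-wise independent family on [P]."""

    def __init__(self, d, rng):
        self.coeffs = [rng.randrange(P) for _ in range(d)]

    def __call__(self, x):
        acc = 0
        for c in self.coeffs:          # Horner's rule: O(d) arithmetic operations
            acc = (acc * x + c) % P
        return acc


class HashedCountSketch:
    """CountSketch whose row choice and sign come from two independent d-wise hashes."""

    def __init__(self, k, d=8, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.row_hash = PolyHash(d, rng)    # d coefficients: O(1) words
        self.sign_hash = PolyHash(d, rng)
        self.y = [0] * k                    # k counters: O(k) words

    def update(self, i):
        j = self.row_hash(i) % self.k                  # row for item i (approx. uniform)
        s = 1 if self.sign_hash(i) & 1 else -1         # sign for item i (approx. unbiased)
        self.y[j] += s

    def estimate(self):
        return sum(v * v for v in self.y)
```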

### 2.3 ϵ-net for insertion-only stream

In our analysis, we use the following result on the existence of a small $\epsilon$-net for insertion-only streams.

###### Definition 2.1 (ϵ-net).

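Let $T\subseteq\mathbb{R}^n$ be a set of vectors. For any $\epsilon>0$, we say $N\subseteq\mathbb{R}^n$ is an $\epsilon$-net for $T$ with respect to the $\ell_2$ norm if for any $x\in T$ there exists $y\in N$ such that $\|x-y\|_2\le\epsilon$.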

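###### Lemma 2.2 ([BCIW16]).

Let $f^{(1)},\dots,f^{(m)}$ be an insertion-only stream. For any $\epsilon>0$, there exists an $\epsilon\|f^{(m)}\|_2$-net of size $O(\epsilon^{-2})$ for $\{f^{(1)},\dots,f^{(m)}\}$ with respect to the $\ell_2$ norm. Moreover, the elements in the net are all from $\{f^{(1)},\dots,f^{(m)}\}$.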

###### Proof Sketch.

The idea is to use a greedy algorithm: scan the stream from the beginning and add $f^{(t)}$ to the net if no element already in the net is $\epsilon\|f^{(m)}\|_2$-close to $f^{(t)}$. ∎
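The greedy construction can be sketched as follows (a minimal illustration; function names are hypothetical). The size bound comes from a short argument: for $s<t$ in an insertion-only stream, each coordinate satisfies $(a-b)^2\le a^2-b^2$ when $a\ge b\ge0$, so $\|f^{(t)}-f^{(s)}\|_2^2\le\|f^{(t)}\|_2^2-\|f^{(s)}\|_2^2$. Each newly added element therefore raises the squared norm by more than $\epsilon^2\|f^{(m)}\|_2^2$ over the previously added one, which gives fewer than $1+\epsilon^{-2}$ elements.

```python
import numpy as np


def greedy_net(prefixes, eps):
    """Greedy eps*||f^(m)||_2-net for the prefix vectors f^(1), ..., f^(m).

    Returns the time indices of the kept vectors; the net elements are
    themselves stream prefixes.
    """
    vecs = [np.asarray(v, dtype=float) for v in prefixes]
    radius = eps * np.linalg.norm(vecs[-1])
    net = []
    for t, v in enumerate(vecs):
        # Keep f^(t) only if no vector already in the net is within the radius.
        if all(np.linalg.norm(v - vecs[s]) > radius for s in net):
            net.append(t)
    return net


def prefix_vectors(stream, n):
    """Frequency vectors f^(1), ..., f^(m) of an insertion-only stream over [n]."""
    f = np.zeros(n)
    out = []
    for i in stream:
        f[i] += 1
        out.append(f.copy())
    return out
```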

### 2.4 Concentration inequalities

Our analysis crucially relies on the following Hanson-Wright inequality [HW71].

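###### Lemma 2.3 (Hanson-Wright inequality [HW71]).

For any symmetric $B\in\mathbb{R}^{n\times n}$, $\sigma\in\{-1,1\}^n$ an independent Rademacher vector, and integer $p\ge1$, we have

$$\left\|\sigma^\top B\sigma-\mathbb{E}_\sigma[\sigma^\top B\sigma]\right\|_p\le O\left(\sqrt{p}\,\|B\|_F+p\,\|B\|\right)=O\left(p\,\|B\|_F\right),$$

where $\|X\|_p$ is defined as $\left(\mathbb{E}|X|^p\right)^{1/p}$ and $\|B\|$ denotes the operator norm (so $\|B\|\le\|B\|_F$).

Note that the only randomness in $\sigma^\top B\sigma$ is the Rademacher vector $\sigma$.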
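For intuition, the $p=2$ case can be checked exactly when $B$ is symmetric with zero diagonal, which is the form used in Section 3: then $\mathbb{E}[\sigma^\top B\sigma]=\operatorname{tr}(B)=0$ and $\mathbb{E}[(\sigma^\top B\sigma)^2]=2\|B\|_F^2$, since only pairs with $\{i,i'\}=\{j,j'\}$ survive the expectation. The following Python sketch (a hypothetical helper) verifies this by enumerating all $2^n$ sign vectors for small $n$.

```python
import itertools

import numpy as np


def second_moment_exact(B):
    """E[(sigma^T B sigma)^2] for a uniform sign vector sigma, by full enumeration."""
    n = B.shape[0]
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        s = np.array(signs)
        total += float(s @ B @ s) ** 2
    return total / 2 ** n
```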

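## 3 CountSketch with O(ϵ^-2) rows provides ℓ2 weak tracking

In this section we show that CountSketch with $O(\epsilon^{-2})$ rows provides $\ell_2$ weak tracking.

###### Theorem 3.1 (CountSketch with O(ϵ^-2) rows provides ℓ2 weak tracking).

For any $\epsilon>0$, $\delta\in(0,1)$, and $n,m\in\mathbb{N}$, pick $k=C_\delta\,\epsilon^{-2}$ for a sufficiently large $C_\delta$ depending only on $\delta$. For any insertion-only stream over $[n]$ with frequencies $f^{(1)},\dots,f^{(m)}$, the CountSketch algorithm with $k$ rows provides $(\epsilon,\delta)$-weak tracking.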

###### Remark.

Note that for linear sketches, the dependence of the number of rows on $\epsilon$ in Theorem 3.1 is tight. This is implied by known lower bounds on Distributional JL [KMN11, JW13], which imply lower bounds on one-shot $\ell_2$ approximation.

###### Remark.

Recall that the number of rows in linear sketches is proportional to the number of words needed in the algorithm.

Using the standard median trick, we can run $O(\log(1/\delta))$ copies of CountSketch with constant failure probability in parallel and output the median. With this, Theorem 3.1 immediately gives the following corollary with a better dependence on $\delta$.

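###### Corollary 3.2.

For any $\epsilon>0$, $\delta\in(0,1)$, and $n,m\in\mathbb{N}$, and for any insertion-only stream over $[n]$ with frequencies $f^{(1)},\dots,f^{(m)}$, there exists a streaming algorithm providing $(\epsilon,\delta)$-weak tracking with $O(\epsilon^{-2}\log(1/\delta))$ rows and update time $O(\log(1/\delta))$.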
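The median construction can be sketched as below. This is a minimal illustration assuming fully random hashes and hypothetical names; following the text, the number of copies would be $O(\log(1/\delta))$. Each copy maintains $\|y\|_2^2$ incrementally, using $(y_j+s)^2-y_j^2=2sy_j+1$ for $s=\pm1$, so an estimate is available after every update and the per-update work depends on the number of copies but not on $k$.

```python
import numpy as np


class MedianCountSketch:
    """Median of several independent CountSketch copies with k rows each."""

    def __init__(self, n, k, copies, seed=0):
        rng = np.random.default_rng(seed)
        self.h = rng.integers(0, k, size=(copies, n))    # row per copy and item
        self.s = rng.choice([-1, 1], size=(copies, n))   # sign per copy and item
        self.y = np.zeros((copies, k), dtype=np.int64)   # one sketch vector per copy
        self.sq = np.zeros(copies, dtype=np.int64)       # ||y||_2^2 per copy
        self.copy_ids = np.arange(copies)

    def update(self, i):
        rows = self.h[:, i]
        signs = self.s[:, i]
        old = self.y[self.copy_ids, rows]
        # (y_j + s)^2 - y_j^2 = 2 s y_j + 1, since s = +-1.
        self.sq += 2 * signs * old + 1
        self.y[self.copy_ids, rows] = old + signs

    def estimate(self):
        return float(np.median(self.sq))


def track(sketch, stream):
    """Feed the stream and record the estimate after each item."""
    estimates = []
    for i in stream:
        sketch.update(i)
        estimates.append(sketch.estimate())
    return estimates
```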

The proof of Theorem 3.1 uses a Dudley-like chaining technique similar to other tracking proofs [BCI17]. However, direct application of this technique does not suffice, and we have to utilize the structure of the CountSketch sketching matrix (see Remark 3.11 for a comparison of our proof techniques). We prove Theorem 3.1 in Section 3.1.

### 3.1 Proof of Theorem 3.1

In this subsection, we give a formal proof of our main theorem. Let us start with some notation for CountSketch. Recall that for any $i\in[n]$, the $i$-th column of $\Pi$ is defined by (i) picking $j\in[k]$ uniformly and setting $\Pi_{j,i}$ to be a Rademacher random variable and (ii) setting the other entries of the $i$-th column to 0. Denote $\Pi_{j,i}=\sigma_i\eta_{j,i}$, where $\sigma_i$ is a Rademacher random variable and $\eta_{j,i}$ is the indicator for choosing the $j$-th row in the $i$-th column. Note that there is exactly one nonzero entry in each column and that the choice of row is uniform. The approximation error of $\Pi$ for a vector $x\in\mathbb{R}^n$ is denoted by $\gamma(x)=\left|\|\Pi x\|_2^2-\|x\|_2^2\right|$. To show weak tracking, it suffices to upper bound the supremum of $\gamma(f^{(t)})$ over $t\in[m]$:

$$\mathbb{E}_\Pi\sup_{t\in[m]}\gamma(f^{(t)})=\mathbb{E}_\Pi\sup_{t\in[m]}\left|\|\Pi f^{(t)}\|_2^2-\|f^{(t)}\|_2^2\right|.\tag{3.3}$$

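The first observation is that one can rewrite the error as follows. (Note that the matrix $\tilde B_{\eta,x}$ we use differs from the matrix used in the previous analysis of [BCI17]; this difference is crucial, since the matrix of [BCI17] does not work for CountSketch.)

$$\gamma(x)=\left|x^\top\Pi^\top\Pi x-x^\top x\right|=\left|\sigma^\top B_{\eta,x}\sigma-x^\top x\right|=\left|\sigma^\top\tilde B_{\eta,x}\sigma\right|,$$

where $\sigma=(\sigma_1,\dots,\sigma_n)$ is an independent Rademacher random vector, $B_{\eta,x}$ has entries $(B_{\eta,x})_{i,i'}=x_ix_{i'}\langle\eta_i,\eta_{i'}\rangle$ with $\eta_i=(\eta_{1,i},\dots,\eta_{k,i})$ (its diagonal is exactly $x_i^2$, which cancels $x^\top x$), and for any $i,i'\in[n]$,

$$(\tilde B_{\eta,x})_{i,i'}=\begin{cases}x_ix_{i'}, & i\ne i'\text{ and }\exists j\in[k],\ \eta_{j,i}=\eta_{j,i'}=1,\\0, & \text{else.}\end{cases}$$

Note that the diagonal entries of $\tilde B_{\eta,x}$ are all zero:

$$\tilde B_{\eta,x}=\begin{pmatrix}0 & x_1x_2\langle\eta_1,\eta_2\rangle & \cdots & x_1x_n\langle\eta_1,\eta_n\rangle\\ x_2x_1\langle\eta_2,\eta_1\rangle & 0 & \cdots & x_2x_n\langle\eta_2,\eta_n\rangle\\ \vdots & \vdots & \ddots & \vdots\\ x_nx_1\langle\eta_n,\eta_1\rangle & x_nx_2\langle\eta_n,\eta_2\rangle & \cdots & 0\end{pmatrix}.$$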

For convenience, for any matrix $A\in\mathbb{R}^{n\times n}$, we overload notation by denoting $\gamma(A)=|\sigma^\top A\sigma|$; that is, $\gamma(x)=\gamma(\tilde B_{\eta,x})$. One benefit of writing the weak tracking error as the above quadratic form is that the Hanson-Wright inequality (Lemma 2.3) is now applicable.
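The identity $\gamma(x)=|\sigma^\top\tilde B_{\eta,x}\sigma|$ can be checked numerically. In the minimal Python sketch below (a hypothetical helper), `h[i]` encodes the row chosen in column $i$, so $\langle\eta_i,\eta_{i'}\rangle=1$ exactly when `h[i] == h[i']`, and `s` plays the role of $\sigma$.

```python
import numpy as np


def error_two_ways(x, h, s, k):
    """Return (||Pi x||^2 - ||x||^2, sigma^T B~ sigma, B~) for a CountSketch Pi.

    h[i] is the row chosen for column i (eta_{h[i], i} = 1) and s[i] is sigma_i.
    """
    n = len(x)
    pi = np.zeros((k, n))
    pi[h, np.arange(n)] = s
    direct = float(np.sum((pi @ x) ** 2) - np.sum(x ** 2))
    # <eta_i, eta_i'> = 1 iff columns i and i' chose the same row; drop the diagonal.
    collide = (h[:, None] == h[None, :]) & ~np.eye(n, dtype=bool)
    b_tilde = np.outer(x, x) * collide
    quadratic = float(s @ b_tilde @ s)
    return direct, quadratic, b_tilde
```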

The lemma below shows that the expectation of the weak tracking error is upper bounded by the Frobenius norm of $\tilde B_{\eta,f^{(m)}}$.

###### Lemma 3.4.

Let $f^{(1)},\dots,f^{(m)}$ be the frequencies of an insertion-only stream. We have

$$\mathbb{E}\left[\sup_{t\in[m]}\gamma(f^{(t)})\ \middle|\ \eta\right]=O\left(\|\tilde B_{\eta,f^{(m)}}\|_F\right).$$

The proof of Lemma 3.4 uses the Dudley-like chaining argument. For smoothness of presentation, we postpone the details to Section 3.2. Next, the following lemma shows that for any vector $x$, with high probability, $\|\tilde B_{\eta,x}\|_F=O(\|x\|_2^2/\sqrt{k})$.

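###### Lemma 3.5.

For any $x\in\mathbb{R}^n$ and $\delta\in(0,1)$,

$$\Pr\left[\|\tilde B_{\eta,x}\|_F>\frac{\sqrt{2}\,\|x\|_2^2}{\sqrt{\delta\cdot k}}\right]\le\frac{\delta}{2}.$$

Lemma 3.5 has a similar flavor to Lemma 2.1. The proof can be found in Section 3.2. Finally, Theorem 3.1 is an immediate corollary of Lemma 3.4 and Lemma 3.5; we provide a proof for completeness.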

###### Proof of Theorem 3.1.

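Recall that to prove Theorem 3.1, it suffices to show that with probability at least $1-\delta$ over $\Pi$, $\sup_{t\in[m]}\gamma(f^{(t)})\le\epsilon\|f^{(m)}\|_2^2$. From Lemma 3.4, for a fixed $\eta$, we have

$$\Pr\left[\sup_{t\in[m]}\gamma(f^{(t)})>C_1\|\tilde B_{\eta,f^{(m)}}\|_F\right]\le\delta/2$$

for some $C_1$ depending only on $\delta$ (by Markov's inequality). Next, from Lemma 3.5, we have $\|\tilde B_{\eta,f^{(m)}}\|_F\le C_2\|f^{(m)}\|_2^2/\sqrt{k}$ with probability at least $1-\delta/2$ over the randomness in $\eta$, for some $C_2$ depending only on $\delta$. Picking $k=(C_1C_2/\epsilon)^2$, a union bound gives $\sup_{t\in[m]}\gamma(f^{(t)})\le\epsilon\|f^{(m)}\|_2^2$ with probability at least $1-\delta$, which completes the proof. ∎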

### 3.2 Proof of Lemma 3.4 and Lemma 3.5

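In this subsection, we provide the proofs of Lemma 3.4 and Lemma 3.5. Let us start with Lemma 3.4, which shows that the tracking error can be upper bounded by the Frobenius norm of $\tilde B_{\eta,f^{(m)}}$.

###### Proof of Lemma 3.4.

Recall that we define $\tilde B_{\eta,x}$ such that $\gamma(x)=|\sigma^\top\tilde B_{\eta,x}\sigma|$, where $\sigma$ is an 8-wise independent Rademacher random vector. An important trick here is that we think of $\eta$ as fixed (we do this by conditioning on $\eta$) in the following.

The starting point of the chaining argument is to construct a sequence of $\epsilon$-nets with exponentially decreasing error for $\{\tilde B_{\eta,f^{(t)}}\}_{t\in[m]}$. Note that here $\tilde B_{\eta,f^{(t)}}$ are matrices, but one can view them as vectors and apply Lemma 2.2, where the $\ell_2$ norm for a vector becomes the Frobenius norm for a matrix (the entries of $\tilde B_{\eta,f^{(t)}}$ are nonnegative and nondecreasing in $t$, so the sequence behaves like an insertion-only stream). Namely, for any non-negative integer $\ell$, let $N_\ell$ be the $\epsilon_\ell$-net for $\{\tilde B_{\eta,f^{(t)}}\}_{t\in[m]}$ under the Frobenius norm, where $\epsilon_\ell=2^{-\ell}\|\tilde B_{\eta,f^{(m)}}\|_F$. Note that here we fix $\eta$ first and then construct the nets. Thus, for each $t\in[m]$, one can rewrite $\tilde B_{\eta,f^{(t)}}$ as a chain as follows.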

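$$\tilde B_{\eta,f^{(t)}}=B^{(t)}_{\eta,0}+\sum_{\ell=1}^{\infty}\left(B^{(t)}_{\eta,\ell}-B^{(t)}_{\eta,\ell-1}\right),\tag{3.6}$$

where $B^{(t)}_{\eta,\ell}\in N_\ell$ satisfies $\|B^{(t)}_{\eta,\ell}-\tilde B_{\eta,f^{(t)}}\|_F\le\epsilon_\ell$, so the partial sums converge to $\tilde B_{\eta,f^{(t)}}$. Moreover, since $\gamma(A)=|\sigma^\top A\sigma|$ is subadditive in $A$, Equation (3.6) gives

$$\mathbb{E}\sup_{t\in[m]}\gamma(f^{(t)})\le\mathbb{E}\sup_{t\in[m]}\gamma\left(B^{(t)}_{\eta,0}\right)+\sum_{\ell=1}^{\infty}\mathbb{E}\sup_{t\in[m]}\gamma\left(B^{(t)}_{\eta,\ell}-B^{(t)}_{\eta,\ell-1}\right).\tag{3.7}$$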

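To bound the first term of Equation (3.7), observe that $B^{(t)}_{\eta,0}=\mathbf{0}$, where $\mathbf{0}$ is the all-zero matrix (since $\|\tilde B_{\eta,f^{(t)}}\|_F\le\|\tilde B_{\eta,f^{(m)}}\|_F=\epsilon_0$, we may take $N_0=\{\mathbf{0}\}$). Namely, the first term of Equation (3.7) is zero. For the second term of Equation (3.7), we apply the chaining argument as follows. For any positive integer $\ell$, denote $\mathcal{A}_\ell=\{B^{(t)}_{\eta,\ell}-B^{(t)}_{\eta,\ell-1}:t\in[m]\}$. Note that from the construction of the $\epsilon$-net in Lemma 2.2, we have $\|A\|_F\le\epsilon_\ell+\epsilon_{\ell-1}=3\cdot2^{-\ell}\|\tilde B_{\eta,f^{(m)}}\|_F$ for every $A\in\mathcal{A}_\ell$ by the triangle inequality. Then

$$\mathbb{E}\left[\sup_{t\in[m]}\gamma\left(B^{(t)}_{\eta,\ell}-B^{(t)}_{\eta,\ell-1}\right)\right]=\int_0^\infty\Pr\left[\sup_{A\in\mathcal{A}_\ell}\gamma(A)>u\right]du\le u^*_\ell+\int_{u^*_\ell}^\infty\Pr\left[\sup_{A\in\mathcal{A}_\ell}\gamma(A)>u\right]du,\tag{3.8}$$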

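where $u^*_\ell$ will be chosen later. For any $A\in\mathcal{A}_\ell$ and integer $p\ge2$, by Markov's inequality and the Hanson-Wright inequality, we have

$$\Pr[\gamma(A)>u]\le\frac{\mathbb{E}[\gamma(A)^p]}{u^p}=\frac{\|\sigma^\top A\sigma\|_p^p}{u^p}\le\frac{\left(C\sqrt{p}\,\|A\|_F+C\,p\,\|A\|\right)^p}{u^p}\le\frac{R^p}{u^p}$$

for some constant $C$. Note that the randomness here is only in $\sigma$ (recall that $\eta$ is fixed), so we can apply the Hanson-Wright inequality, using $\mathbb{E}_\sigma[\sigma^\top A\sigma]=\operatorname{tr}(A)=0$ since $A$ has zero diagonal. Let $R=C'p\cdot2^{-\ell}\|\tilde B_{\eta,f^{(m)}}\|_F$ for some constant $C'$. The last inequality holds because $\|A\|\le\|A\|_F$ and by the choice of the $\epsilon$-net. Now, choosing $u^*_\ell=2S_\ell R$, where $S_\ell$ will be decided later, Equation (3.8) becomes

$$\begin{aligned}\mathbb{E}\left[\sup_{t\in[m]}\gamma\left(B^{(t)}_{\eta,\ell}-B^{(t)}_{\eta,\ell-1}\right)\right]&\le u^*_\ell+\int_{u^*_\ell}^\infty|\mathcal{A}_\ell|\cdot\frac{R^p}{u^p}\,du\\&\le 2S_\ell R+|\mathcal{A}_\ell|\cdot\frac{R^p}{(2S_\ell R)^{p-1}}\\&\le 2S_\ell C'p\cdot\|\tilde B_{\eta,f^{(m)}}\|_F\cdot2^{-\ell}+|\mathcal{A}_\ell|\cdot\frac{C'p\cdot\|\tilde B_{\eta,f^{(m)}}\|_F}{S_\ell^{p-1}},\end{aligned}\tag{3.9}$$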

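where the second term in the first line of Equation (3.9) is due to a union bound over $\mathcal{A}_\ell$. Now, Equation (3.7) becomes

$$\begin{aligned}\mathbb{E}\sup_{t\in[m]}\gamma(f^{(t)})&\le\sum_{\ell=1}^\infty\left(2S_\ell C'p\cdot\|\tilde B_{\eta,f^{(m)}}\|_F\cdot2^{-\ell}+|\mathcal{A}_\ell|\cdot\frac{C'p\cdot\|\tilde B_{\eta,f^{(m)}}\|_F}{S_\ell^{p-1}}\right)\\&\le\|\tilde B_{\eta,f^{(m)}}\|_F\cdot\left(\sum_{\ell=1}^\infty\frac{2C'pS_\ell}{2^{\ell}}+\frac{2^{2\ell}C'p}{S_\ell^{p-1}}\right).\end{aligned}\tag{3.10}$$

Choosing $p=8$ and, for instance, $S_\ell=2^{\ell/2}$, the summation in Equation (3.10) is upper bounded by a constant. We conclude that

$$\mathbb{E}\sup_{t\in[m]}\gamma(f^{(t)})=O\left(\|\tilde B_{\eta,f^{(m)}}\|_F\right).$$

Note that this also shows that 8-wise independence suffices, since only the 8-th moment of $\sigma^\top A\sigma$ is used, and thus the sketching matrix can be stored efficiently (see Appendix A for more details). ∎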
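As a sanity check on the final step, the bracket in Equation (3.10) can be evaluated numerically. The sketch below assumes $C'=1$, $p=8$, and $S_\ell=2^{\ell/2}$ (one admissible choice, not necessarily the paper's) and compares a long partial sum with the closed form of the two geometric series. With this $S_\ell$, the second series has ratio $2^{2-(p-1)/2}$ and converges exactly when $p>5$, so $p=8$ leaves room to spare and matches the 8-wise independence requirement.

```python
def chaining_constant(p=8, levels=200):
    """Partial sum of the bracket in (3.10) with C' = 1 and S_l = 2^(l/2)."""
    total = 0.0
    for l in range(1, levels + 1):
        s_l = 2.0 ** (l / 2)
        total += 2 * p * s_l * 2.0 ** (-l) + 2.0 ** (2 * l) * p / s_l ** (p - 1)
    return total


def closed_form(p=8):
    """Infinite sum of the same two geometric series (valid only for p > 5)."""
    a = 2.0 ** -0.5                    # ratio of the first series
    b = 2.0 ** (2 - (p - 1) / 2)       # ratio of the second series; b < 1 iff p > 5
    return 2 * p * a / (1 - a) + p * b / (1 - b)
```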

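Next, we prove Lemma 3.5, which upper bounds the expectation of $\|\tilde B_{\eta,x}\|_F^2$ for any $x\in\mathbb{R}^n$.

###### Proof of Lemma 3.5.

We first show that $\mathbb{E}\|\tilde B_{\eta,x}\|_F^2\le\|x\|_2^4/k$; the lemma then follows from Markov's inequality applied to $\|\tilde B_{\eta,x}\|_F^2$.

Let $\mathbb{1}_{ii'}$ be the indicator for whether there exists $j\in[k]$ such that $\eta_{j,i}=\eta_{j,i'}=1$. Note that for $i\ne i'$, $\mathbb{E}[\mathbb{1}_{ii'}]=1/k$, and the only randomness here is in $\eta$. Then

$$\mathbb{E}\|\tilde B_{\eta,x}\|_F^2=\mathbb{E}\sum_{i,i'\in[n]}(\tilde B_{\eta,x})_{i,i'}^2=\mathbb{E}\sum_{(i,i')\in[n]^2,\,i\ne i'}x_i^2x_{i'}^2\mathbb{1}_{ii'}=\frac{1}{k}\sum_{(i,i')\in[n]^2,\,i\ne i'}x_i^2x_{i'}^2\le\frac{\|x\|_2^4}{k},$$

where the last inequality holds because $\sum_{i\ne i'}x_i^2x_{i'}^2\le\left(\sum_ix_i^2\right)^2=\|x\|_2^4$. Note that 8-wise independence is sufficient in the above argument. ∎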
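For small $n$ and $k$, the expectation in this proof can be computed exactly by enumerating all $k^n$ row assignments. The Python sketch below (a hypothetical helper) does this with exact rational arithmetic, assuming fully independent uniform row choices.

```python
import itertools
from fractions import Fraction


def expected_frob_sq(x, k):
    """Exact E_eta ||B~_{eta,x}||_F^2 over all k^n row assignments (small n only)."""
    n = len(x)
    total = Fraction(0)
    for rows in itertools.product(range(k), repeat=n):
        for i in range(n):
            for j in range(n):
                # Entry (i, j) of B~ is x_i x_j when i != j share a row, else 0.
                if i != j and rows[i] == rows[j]:
                    total += Fraction(x[i] ** 2 * x[j] ** 2)
    return total / k ** n
```

For $x=(1,2,3,1)$ and $k=3$ this gives $\frac{1}{3}\left(\left(\sum_ix_i^2\right)^2-\sum_ix_i^4\right)=\frac{225-99}{3}=42$, below the bound $\|x\|_2^4/k=75$.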

###### Remark 3.11.

Here, we briefly compare our techniques with those in [BCI17]. There are two key observations about the structure of the CountSketch sketching matrix. First, we observe that the Frobenius norm of $B_{\eta,x}$ is dominated by its diagonal, and thus removing the diagonal gives a more accurate analysis of the contribution from the off-diagonal terms. However, removing the diagonal of $B_{\eta,x}$ destroys its Gram (positive semidefinite) structure, and thus the standard $\epsilon$-net argument (e.g., in [BCI17]) does not work. To overcome this, we observe that one can directly construct an $\epsilon$-net for the matrix $\tilde B_{\eta,x}$ obtained by removing the diagonal from $B_{\eta,x}$. Combining these two observations with a standard chaining argument, we show that CountSketch provides weak tracking.

## 4 Strong tracking of AMS sketch and CountSketch

In this section, we discuss the strong tracking of the AMS sketch and CountSketch. We start with a standard reduction from weak tracking to strong tracking via a union bound, which gives an $O(\log m)$ blow-up in the dependence on $\delta$. Next, we show that this is essentially tight for both the AMS sketch and CountSketch up to a logarithmic factor.

###### Lemma 4.1 (folklore).

For any , , and . If a linear sketch provides weak tracking for length inputs having value from , then it also provides strong tracking where .

###### Proof.

See Section B.1 for details. ∎

From Lemma 4.1, we immediately have the following corollaries.

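###### Corollary 4.2.

For any $\epsilon$ and $\delta$, the AMS sketch with $O(\epsilon^{-2}\log(\log m/\delta))$ rows provides $(\epsilon,\delta)$-strong tracking.

###### Corollary 4.3.

For any $\epsilon$ and $\delta$, CountSketch with $O(\epsilon^{-2}\log m/\delta)$ rows provides $(\epsilon,\delta)$-strong tracking.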

###### Remark.

After applying the median trick to CountSketch, the dependence of the number of rows on $\delta$ becomes $\log(\log m/\delta)$, and thus $O(\epsilon^{-2}\log(\log m/\delta))$ rows suffice to achieve $(\epsilon,\delta)$-strong tracking.

It turns out that the above two upper bounds for strong tracking are essentially tight for these two algorithms.

###### Theorem 4.4.

There exists constants such that for any and , there exists such that if and , then fully independent AMS sketch with rows does not provide -strong tracking.

That is, the AMS sketch requires $\Omega(\epsilon^{-2}\log\log m)$ rows to achieve $(\epsilon,\delta)$-strong tracking. Interestingly, the hard instance for the AMS sketch to achieve strong tracking is simply the stream consisting of all distinct elements. See Section B.2 for details.

###### Theorem 4.5.

There exists a constant such that for any , and , there exists such that if and , then CountSketch with rows does not provide -strong tracking.

That is, CountSketch requires rows to achieve -strong tracking. The hard instance for CountSketch is more complicated than that of AMS sketch. See Section B.3 for details.

## 5 Conclusion

In this work, we showed that CountSketch provides weak tracking with update time that has no dependence on the error parameter $\epsilon$. We also gave almost tight strong tracking lower bounds for the AMS sketch and CountSketch.

An immediate open problem after this work is $\ell_p$ tracking with faster update time for $0<p<2$. The $\ell_p$ estimation problem was solved by Indyk [Ind06] via the $p$-stable sketch, which was proven to provide weak tracking by Błasiok et al. [BDN17]. However, like the AMS sketch, the $p$-stable sketch is dense and has update time $O(\epsilon^{-2})$. Nevertheless, Kane et al. [KNPW11b] gave a space-optimal algorithm for $\ell_p$ estimation with update time $O(\log^2(1/\epsilon)\log\log(1/\epsilon))$. It would be interesting to see whether their algorithm also provides weak tracking.

### Acknowledgement

The authors wish to thank Jelani Nelson for invaluable advice throughout the course of this project. We also thank Mitali Bafna and Jarosław Błasiok for useful discussion and thank Boaz Barak for many helpful comments on an earlier draft of this article.

## References

• [AMS96] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20–29. ACM, 1996.
• [BCI17] Vladimir Braverman, Stephen R Chestnut, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P Woodruff. BPTree: An ℓ2 heavy hitters algorithm using constant memory. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 361–376. ACM, 2017.
• [BCIW16] Vladimir Braverman, Stephen R Chestnut, Nikita Ivkin, and David P Woodruff. Beating countsketch for heavy hitters in insertion streams. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 740–753. ACM, 2016.
• [BDN17] Jaroslaw Blasiok, Jian Ding, and Jelani Nelson. Continuous monitoring of l_p norms in data streams. In LIPIcs-Leibniz International Proceedings in Informatics, volume 81. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
• [Ber41] Andrew C Berry. The accuracy of the gaussian approximation to the sum of independent variates. Transactions of the american mathematical society, 49(1):122–136, 1941.
• [CCFC02] Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming, pages 693–703. Springer, 2002.
• [CM05] Graham Cormode and Shan Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58–75, 2005.
• [DKS10] Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós. A sparse Johnson-Lindenstrauss transform. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 341–350. ACM, 2010.
• [Ess42] Carl-Gustaf Esseen. On the Liapounoff limit of error in the theory of probability. Almqvist & Wiksell Stockholm, 1942.
• [HTY14] Zengfeng Huang, Wai Ming Tai, and Ke Yi. Tracking the frequency moments at all times. arXiv preprint arXiv:1412.1763, 2014.
• [HW71] David Lee Hanson and Farroll Tim Wright. A bound on tail probabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics, 42(3):1079–1083, 1971.
• [IL06] Tadeusz Inglot and Teresa Ledwina. Asymptotic optimality of new adaptive test in regression model. In Annales de l’Institut Henri Poincare (B) Probability and Statistics, volume 42, pages 579–590. Elsevier, 2006.
• [Ind06] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM (JACM), 53(3):307–323, 2006.
• [JW13] T. S. Jayram and David P. Woodruff. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. ACM Trans. Algorithms, 9(3):26:1–26:17, June 2013.
• [KMN11] Daniel Kane, Raghu Meka, and Jelani Nelson. Almost optimal explicit Johnson-Lindenstrauss families. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 628–639. Springer, 2011.
• [KN14] Daniel M Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. Journal of the ACM (JACM), 61(1):4, 2014.
• [KNPW11a] Daniel M Kane, Jelani Nelson, Ely Porat, and David P Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 745–754. ACM, 2011.
• [KNPW11b] Daniel M Kane, Jelani Nelson, Ely Porat, and David P Woodruff. Fast moment estimation in data streams in optimal space. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 745–754. ACM, 2011.
• [KNW10a] Daniel M Kane, Jelani Nelson, and David P Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1161–1178. Society for Industrial and Applied Mathematics, 2010.
• [KNW10b] Daniel M Kane, Jelani Nelson, and David P Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 41–52. ACM, 2010.
• [KSZC03] Balachander Krishnamurthy, Subhabrata Sen, Yin Zhang, and Yan Chen. Sketch-based change detection: methods, evaluation, and applications. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, pages 234–247. ACM, 2003.
• [LNN15] Kasper Green Larsen, Jelani Nelson, and Huy L Nguyên. Time lower bounds for nonadaptive turnstile streaming algorithms. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 803–812. ACM, 2015.
• [TZ12] Mikkel Thorup and Yin Zhang. Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM Journal on Computing, 41(2):293–331, 2012.

## Appendix A Implementation of CountSketch

Here, we present the implementation of CountSketch for completeness. Note that the construction is standard and not new.

Note that both the row hash $h:[n]\to[k]$ and the sign hash $\sigma:[n]\to\{-1,1\}$ can be stored in $O(1)$ words and evaluated in $O(1)$ arithmetic operations. The sketch vector can be stored in $O(k)$ words. For the convenience of analysis, we define the sketching matrix of CountSketch by $\Pi_{j,i}=\sigma(i)\cdot\mathbb{1}[h(i)=j]$ for all $i\in[n]$ and $j\in[k]$.

## Appendix B Proofs for strong tracking

### B.1 From weak tracking to strong tracking

After applying a union bound over all time points $t\in[m]$, a streaming algorithm that provides $(\epsilon,\delta)$-approximation also provides $(\epsilon,\delta')$-strong tracking, where $\delta'=m\delta$. However, the blow-up in $\delta$ is then a factor of $m$, which is undesirable. The following lemma shows that with a more delicate union bound argument, the reduction from weak tracking to strong tracking only has an $O(\log m)$ blow-up in $\delta$. Note that the lemma is folklore; we provide a proof for completeness.

###### Proof.

Let be the frequency of an insertion-only stream and let be its (randomized) approximations produced by the linear sketch. Let and for each . Note that for each and , . Define the event

$$E_i:=\left\{\left|\|\tilde f^{(t_i)}\|_2^2-\|f^{(t_i)}\|_2^2\right|>\epsilon\|f^{(t_i)}\|_2^2\right\}.$$

Observe that for each , would imply . Namely, implies strong tracking.

By the -weak tracking property of the streaming algorithm, for each , we have and thus . We conclude that the streaming algorithm provides -strong tracking. ∎

### B.2 Strong tracking lower bound for AMS sketch

The hard instance is simply the stream of all distinct elements, i.e., $a_t=t$ for all $t\in[m]$.

###### Proof of Theorem 4.4.

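Consider the stream of all distinct elements as the hard instance, i.e., $a_t=t$ for all $t\in[m]$. Thus, $f^{(t)}=\sum_{s\le t}e_s$, where $e_s$ is the $s$-th standard basis vector, and $\|f^{(t)}\|_2^2=t$ for all $t\in[m]$.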

Define a sequence of time as follows. and where . Pick and properly such that . Some quick facts about the choice of parameters here: (i) . (ii) .

To show AMS sketch does not provide -strong tracking for and , it suffices to show that with probability at least there exists such that .

For the convenience of the analysis, for any and , let which is the sum of independent Radmacher random variables divided by