Streaming algorithms are an integral part of the modern toolbox for large-scale data analysis. A streaming algorithm observes a stream of data updates that arrive one by one, and is required to compute some global function of the data using a small amount of space (memory) and with an efficient running time.
Most of the literature on streaming algorithms implicitly assumes that the stream updates do not depend on previous outputs of the algorithm or on the randomness produced by the algorithm. This assumption may not be realistic in many situations: for example, when the data is chosen by a malicious adversary in response to previous outputs, or when data characteristics change based on previous outcomes in some complicated or unpredictable way. As a result, the last couple of years have seen substantial progress in the systematic investigation of adversarially robust streaming algorithms [BY20, BJW+20, HKM+20, WZ20, ABD+21, KMN+21, BHM+21, ACS+21], which preserve their correctness guarantees even for adaptively chosen data and are thus especially suitable for these interactive settings.
There is already a wide range of problems and settings for which the best known adversarially robust streaming algorithms are almost as efficient as their classical, non-robust counterparts. The flip number [BJW+20] of a streaming problem, an algorithmic stability parameter which counts how many times the output value may change by a multiplicative factor of as the stream progresses, plays a central role in many of these results [HKM+20, WZ20, KMN+21, ACS+21]. When the flip number is small, the generic methods developed in these works can turn a classical streaming algorithm into an adversarially robust one with only a small overhead (linear in or better) in the space complexity. This is especially useful in the insertion only streaming model, where elements are only added to the stream, but may not be deleted from it. Many important streaming problems, such as -estimation, distinct elements, entropy estimation, and various others, all have flip number of for insertion-only streams of length . Under the standard assumption that , where is the size of the universe of all possible data elements, and building on additional known results from the streaming literature, one can then obtain adversarially robust insertion-only -approximation algorithms with space complexity .
The situation in the turnstile streaming model, which allows both insertions and deletions, is more complicated. The most popular technique for turnstile streams in the classical regime, linear sketching, is provably not adversarially robust [HW13]. Furthermore, the flip number can be very large, potentially even . The best known robustification methods in this regime [HKM+20, ACS+21], based on differential privacy, have a multiplicative dependence in the flip number (for constant ). Therefore, they induce a space overhead of compared to the best non-robust algorithms.
A separation result of Kaplan, Mansour, Nissim and Stemmer [KMN+21] shows that indeed the -type dependence in the flip number is tight for some streaming problems; specifically, they show this for a variant of the Adaptive Data Analysis problem in the context of bounded-space computation. We note, however, that the lower bound of [KMN+21] does not apply to many core problems in the streaming literature, for which no separation between the classical oblivious and adversarially robust settings is known. In particular, this is the case for -estimation, in which the goal is to approximate , the -th moment of a frequency vector . This gives rise to the following question, widely regarded as one of the central open questions on adversarially robust streaming. (Footnote 2: To the best of our knowledge, the first explicit appearance of this question in the literature is in Jayaram’s Ph.D. thesis [JAY21, page 26]. See also a talk by Stemmer [STE21] at 54:45 and the third question on the list of open questions from the STOC 2021 Workshop on Robust Streaming, Sketching, and Sampling [ROB21a].)
What is the adversarially robust space complexity of -estimation in the turnstile streaming model?
In this work we show that a combination of existing building blocks from the literature (with slight modifications and simplifications) can yield a substantially improved space complexity for the above problem. Our results hold when deletions are allowed, as long as each update increases or decreases the frequency of a single element by (or more generally, by a bounded integer amount). We also allow frequencies of elements to become negative in the process, which is known as the general turnstile streaming model.
2 Overview of Our Contribution
2.1 Our results
We give an -estimation algorithm that breaks the (or ) space barrier. We now state a simplified version of the main result, focusing just on the dependence on the stream length and domain size , whenever it is polynomial. For the full statement of our results, see Theorem 20.
Theorem 1 (Simplified main result).
For any fixed and , there is an adversarially robust -estimation streaming algorithm that computes a -approximation, using:
space if ,
where the notation suppresses factors that are polynomial in , , and . The algorithm gives correct estimates throughout the entire stream with probability .
We note that since the flip number for the moment estimation problem is (see Section 3.3), the dependency of the space complexity of our approach in is for and for . This improves polynomially upon the currently best known bound, obtained using the aforementioned differential privacy based robustness frameworks [HKM+20, ACS+21]. Together with the separation of [KMN+21], our result (see also [JAY21]) suggests that a paradigm shift may be required in order to achieve improved space complexity for turnstile streams: rather than developing very widely applicable robustness frameworks that suffer from the -type lower bound due to their wide applicability, it may make sense to look for methods that are perhaps somewhat less generic, and exploit other properties of the problem, beyond just the flip number.
2.2 Our techniques
Our result relies on a straightforward combination of known techniques from the literature. The bottleneck of the previous best result for moment estimation for general turnstile streaming is the direct reliance on the flip number , which for general streams can be of order . As mentioned above, methods that take only the flip number into account (and do not use any other characteristics of the problem at hand) cannot get space complexity much smaller than (that is, for norm estimation). Thus, we exploit a specific characteristic of -estimation: the actual number of significant changes to the -th moment can only be large if the moment remains small. This can only be the case if the underlying vector is sparse, i.e., has relatively few non-zero coordinates. We therefore divide the current state of the frequency vector into two regimes: sparse and dense, using a threshold . If the vector has at most non-zero coordinates, it is considered sparse. If it has more than non-zero coordinates, it is considered dense. For densities in between, the state of the vector may be temporarily classified as either dense or sparse.
In the sparse regime, we take the simplest possible approach: we store the input explicitly in a sparse representation, which requires only space. In this form, it is easy to maintain the current moment, since we know the current frequency vector exactly. In the dense regime, we apply the technique from the paper of Hassidim, Kaplan, Mansour, Matias, and Stemmer [HKM+20], which uses differential privacy to protect not the input data, but rather the internal randomness of the estimators it uses. At a high level, their framework consists of invoking a set of estimators, which upon each query provide an updated estimate. Given the stream of updates, they use differentially private methods to detect whenever the current estimate is no longer relevant, at which point they query the set of estimators to get an updated estimate. In general, their technique increases the space requirement by a factor of the square root of the flip number, compared to that of oblivious streaming algorithms. In particular, applying their method to the moment estimation problem requires invoking instances of oblivious -estimation algorithms. We improve on the above by taking advantage of the fact that the estimated value of the -th moment cannot change too rapidly for dense vectors. For instance, for (the distinct element count) or , if the vector has at least non-zero coordinates, at least insertions or deletions are required to change it by a constant factor. Similarly, for , at least insertions or deletions are needed. Hence, the flip number for the dense regime is much lower, and we can make significantly fewer queries to a set of oblivious -estimation algorithms. This in turn implies that we can significantly reduce the number of required estimators.
The missing component in our description so far is how the transition between the regimes happens. If we are transitioning from the sparse regime to the dense one, we have all the information needed about the current state of the input vector that we are tracking. If we are transitioning from the dense regime to the sparse regime, we use off-the-shelf sparse recovery techniques (also known as compressed sensing) to recover the frequency vector exactly. To know when to do this, we run in parallel an adversarially robust streaming algorithm for distinct element counting, which we also know has to be queried only every steps.
We present the pseudocode of our approach as Algorithm 1. We maintain adversarially robust estimators, and , that are queried significantly less frequently than times throughout the entire execution of the algorithm. We query them at regular intervals, knowing that their values cannot change too rapidly, when the vector is dense. We note that we do not use them when the vector is sparse, as in that regime their readouts may be inaccurate.
We present the pseudocode for an adversarially robust algorithm that has to answer only a limited number of queries in Algorithm 2. This algorithm is a simplified and adjusted version of the algorithm that appeared in the work of Hassidim et al. [HKM+20]. In the introduction of their paper, they note that constructing a bounded-query variant of their algorithm is possible, but do not give any details beyond that. As this variant is crucial for our purposes, we present this construction in full detail for completeness.
3.1 Basic notation and terms
For any , we write to denote , the set of smallest positive integers.
Definition 2 (The -moment).
The -moment of a vector is , for any . We interpret the -moment as , i.e., the number of non-zero coordinates of , by assuming in this context that and for any .
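As a small illustration of Definition 2, the following Python sketch computes the moment of an explicitly stored integer vector. It assumes the standard form of the definition (the sum of the p-th powers of the absolute values of the coordinates); the function name `frequency_moment` is hypothetical and chosen only for this example.

```python
def frequency_moment(f, p):
    """Return sum_i |f_i|^p for an integer vector f (assumed standard definition).

    For p == 0 this counts the non-zero coordinates, which matches the
    convention 0^0 = 0 and x^0 = 1 for x != 0 used in Definition 2.
    """
    if p < 0:
        raise ValueError("p must be non-negative")
    if p == 0:
        return sum(1 for x in f if x != 0)
    return sum(abs(x) ** p for x in f)


if __name__ == "__main__":
    f = [3, 0, -2, 1]
    print(frequency_moment(f, 0))  # number of non-zero coordinates
    print(frequency_moment(f, 1))  # sum of absolute values
    print(frequency_moment(f, 2))  # sum of squares
```

Handling the case p = 0 separately avoids relying on how the language evaluates `0 ** 0`, which in Python is 1 and would therefore contradict the convention of the definition.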
Definition 3 (Vector density).
Let and . We say that a vector is -dense if at least of its coordinates are non-zero, and -sparse if at most of them are non-zero.
Definition 4 (-approximation).
For any and , we say that is a -approximation to if
3.2 Streaming algorithms
A streaming algorithm receives, one by one, a stream of updates , , …, that modify the data, and is typically required to compute or approximate some function of the data over the stream of updates. In this paper, we fully focus on the setting in which the input is a frequency vector for some integer , known to the algorithm in advance. Initially, at the beginning of the stream, this vector is the all-zero vector, i.e., . The stream consists of updates of the form in which and . The interpretation of each update is that is added to the -th coordinate of , i.e., each update increases (“insertion”) or decreases (“deletion”) a select coordinate by 1. (Footnote 3: As mentioned, the update values are always in the model we consider here. In the most general setting for turnstile streaming, may be unbounded; however, for our arguments to hold, it is important that are bounded in some way (e.g., satisfy for some constant ). This assumption is not necessary for -estimation (i.e., counting the number of distinct elements), for which updates of arbitrary magnitude are allowed as long as they can be handled by a non-robust streaming algorithm on which we build.)
For any fixed stream of updates, the streaming algorithm is required to output or a good approximation to (e.g., a -approximation if is a non-negative real number) after seeing the stream with probability , for some parameter . In this context, we refer to as the success probability. We sometimes refer to streaming algorithms in this model, in which the stream is independent of the actions of the streaming algorithm, as oblivious or non-robust to distinguish them from adversarially robust streaming algorithms, which we design in this paper.
We assume that the streaming algorithm does not know the exact length of the stream in advance, and only knows an upper bound on it. Because of that, we assume that the algorithm can be asked to output its approximation of at any time throughout the stream. This is true for a large majority of streaming algorithms, and in particular, to the best of our knowledge, applies to all general moment streaming algorithms. For this type of streaming algorithm, if its success probability is , this means that it can output a desired approximation with probability at least at any fixed prefix of the stream.
Streaming related notation and assumptions.
Throughout the paper, we consistently write to denote the dimension of the vector on which the streaming algorithm operates. We use to denote an upper bound on the length of the input and we assume that . We assume that machine words are large enough to represent and , i.e., the number of bits in them is at least and we express the complexity of algorithms in words.
3.3 Adversarially robust streaming algorithms
In this paper, we design streaming algorithms in the adversarially robust streaming model of Ben-Eliezer et al. [BJW+20], which we now introduce in the context of computing a -approximation to values of a function .
Definition 5 (Adversarially robust streaming).
Fix a function and let . The robust streaming model is defined as a game between two players, Adversary and Algorithm, where , , and the stream length are known to both players. In each round :
First, Adversary picks for and and sends them to Algorithm. The choice of and may depend on all previous updates sent by Adversary as well as all previous outputs of Algorithm.
Algorithm outputs , which is required to be a -approximation to , where is the vector aggregating all updates so far (that is, for all ). Algorithm sends to Adversary.
Algorithm’s goal is to return correct outputs at all times. That is, is required to be a -approximation to for all . Conversely, Adversary’s goal is to force Algorithm to produce an output that is not a -approximation to for some .
We say that a streaming algorithm is adversarially robust and has success probability for some if it can provide all correct outputs as Algorithm with probability at least for any Adversary.
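The round structure of Definition 5 can be sketched as a simple loop. The code below is only an illustration under stated assumptions: `ExactDistinctCounter` is a hypothetical, trivially robust Algorithm that stores the vector exactly (and hence uses linear space), and `adaptive_adversary` is a hypothetical strategy whose next update depends on the previous output. Neither appears in the paper.

```python
class ExactDistinctCounter:
    """Hypothetical 'Algorithm': stores f exactly, so it cannot be fooled."""

    def __init__(self, n):
        self.f = [0] * n

    def process(self, i, delta):
        # Apply the update and output the exact number of non-zero coordinates.
        self.f[i] += delta
        return sum(1 for x in self.f if x != 0)


def adaptive_adversary(updates, outputs, n=4):
    """Hypothetical 'Adversary': picks the next update from the history."""
    if not outputs or outputs[-1] < 2:
        return (len(updates) % n, +1)   # insert some element
    return (updates[-1][0], -1)         # undo the most recent update


def play_game(algorithm, adversary, m):
    """Run m rounds: Adversary sends an update, Algorithm replies with an output."""
    updates, outputs = [], []
    for _ in range(m):
        i, delta = adversary(updates, outputs)
        updates.append((i, delta))
        outputs.append(algorithm.process(i, delta))
    return updates, outputs


if __name__ == "__main__":
    updates, outputs = play_game(ExactDistinctCounter(4), adaptive_adversary, 6)
    print(updates)
    print(outputs)
```

The point of the sketch is the information flow: the adversary sees every earlier output before choosing the next update, which is exactly what an oblivious analysis does not account for.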
We now introduce the notion of an -query adversarially robust streaming algorithm, which has to provide no more than outputs.
Definition 6 (-query robust streaming algorithm).
Let . A -query adversarially robust streaming algorithm is defined similarly to a standard adversarially robust streaming algorithm, with the following modification: Adversary may perform at most queries for outputs from Algorithm, and only receives the output in these time steps where queries are made. Adversary may pick these time steps adaptively as a function of all previous interactions. We say that a -query adversarially robust streaming algorithm has a probability of success , for some , if for any Adversary that makes at most queries, with probability at least , it correctly answers all of them.
Note on the tracking property.
Oblivious streaming algorithms are not required to have the “tracking” property, i.e., they have to provide a good approximation at the end of the stream (or at any fixed point), but their definition does not require any type of consistent behavior throughout the stream. This is required, however, for adversarially robust streaming algorithms. We build our adversarially robust streaming algorithms from oblivious streaming algorithms that do not have a tracking property.
3.4 Frequency moment estimation
The main focus of this paper is designing streaming algorithms for the problem of -estimation, i.e., estimation of the -moment, in the turnstile streaming model. To build our adversarially robust algorithms, we use classical, non-robust, turnstile streaming algorithms for -estimation.
Theorem 7 (Previously known -estimation results).
3.5 Flip number
The flip number, defined in [BJW+20], plays a prominent role in many of the previous results on adversarially robust streaming. For completeness, we next provide its definition suited for our context.
Definition 8 (Flip number).
Fix a function and . Let , …, be a sequence of updates to some vector whose initial value is , and let be the value of the vector after updates have been received. The flip number of with respect to the above sequence is the size of the largest subsequence for which is not a -approximation of for any . The flip number of is the maximum of over all possible choices of the sequence , …, .
It is easy to see that the flip number of -estimation is for any : indeed, consider the following pair of insertion-deletion updates , repeated times. In such a stream, the value of the moment alternates times between and .
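The alternating stream described above can be simulated directly. The following Python sketch is a toy check rather than part of the paper's argument: it applies a repeated insertion-deletion pair on a single coordinate and records the moment after every update. The helper name `moment_trace` is hypothetical.

```python
def moment_trace(updates, n, p):
    """Apply unit updates to an all-zero vector of dimension n and
    return the value of sum_i |f_i|^p after each update."""
    f = [0] * n
    trace = []
    for i, delta in updates:
        f[i] += delta
        # Zero coordinates are skipped, so p == 0 counts non-zero coordinates.
        trace.append(sum(abs(x) ** p for x in f if x != 0))
    return trace


if __name__ == "__main__":
    repetitions = 5
    stream = [(0, +1), (0, -1)] * repetitions
    for p in (0, 1, 2):
        print(p, moment_trace(stream, n=3, p=p))
```

For every p, the recorded value switches between 1 and 0 after each update, so no fixed multiplicative approximation of one value is valid for the next one.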
3.6 Sparse recovery
In our algorithm, we use sparse recovery to reconstruct the current frequency vector when it becomes sparse, which is possible even if it was arbitrarily dense in the meantime. For an introduction to the topic of sparse recovery, see the survey of Gilbert and Indyk [GI10]. Here we use the following streaming subroutine introduced by Gilbert, Strauss, Tropp, and Vershynin [GST+07].
Theorem 9 (Sparse recovery [GST+07]).
There is a streaming algorithm that takes a parameter , operates on a vector , and has the following properties. It uses words of space and handles each coordinate update in time. Whenever the input vector is -sparse, the algorithm can reconstruct it exactly in time.
With probability , taken over the initial selection of randomness, the algorithm can correctly recover all -sparse vectors in all parts of the process (even when they are constructed in an adaptive manner).
3.7 Differential privacy
Differential privacy [DMN+06] is by now a standard formal notion of privacy for individual data items in large datasets. The formal definition is as follows.
Definition 10 (Differential Privacy).
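Let $\mathcal{A}$ be a randomized algorithm operating on databases. $\mathcal{A}$ is $(\varepsilon, \delta)$-differentially private (in short $(\varepsilon, \delta)$-DP) if for any two databases $S$ and $S'$ that differ on one row, and any event $T$, it holds that

$$\Pr[\mathcal{A}(S) \in T] \le e^{\varepsilon} \cdot \Pr[\mathcal{A}(S') \in T] + \delta.$$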
In the framework of Hassidim et al. [HKM+20], which we use here, DP is used in a somewhat non-standard way to protect the internal randomness of instances of a static algorithm.
4 Bounded Query Adversarially Robust Streaming
In this section, we present a -query adversarially robust streaming algorithm for approximating a function . In the case that , and for problems where the flip number is , such as turnstile -estimation, it obtains significant gains in the space complexity compared to the general algorithm introduced by Hassidim et al. [HKM+20]. The space overhead of the -query robust algorithm over an oblivious streaming algorithm is roughly , independently of how much the function changes in the meantime. Informally, this is because the flip number of the output observed by the adversary decreases from a (worst case) factor to a one. The algorithm is a simplified and adjusted version of the algorithm of Hassidim et al. [HKM+20]. Their algorithm builds on two important primitives: a DP procedure for detecting when a set of functions exceeds a certain threshold, and a DP procedure for computing the median of a set of values. Their algorithm works by invoking the threshold detection procedure after each update, in order to detect whether the estimate of the computed function should be re-evaluated. If this is the case, then the median procedure is used to compute a private updated estimation. Compared to their algorithm, we do not need the first primitive, i.e., the differentially private threshold detection. We only recompute a private median when the algorithm is replying to a query from the adversary. (Footnote 4: We note that the private thresholds procedure is crucial for Hassidim et al. [HKM+20] to improve their space overhead from roughly , which strongly depends on the stream length, to roughly , which depends only on the flip number. In the problems we consider here, this is not essential as anyway.)
Lemma 11 (-query adversarially robust algorithm).
Let and . Let be an oblivious streaming algorithm for computing a -approximation to a function that uses space and is correct with probability when queried once. Additionally, let be the range of possible correct values of on the stream.
There is a -query adversarially robust streaming algorithm, Algorithm 2, that uses space to provide a -approximation to with probability , where
4.1 Tools from differential privacy
In order to prove Lemma 11, we use the following set of tools from the differential privacy literature. First, the following theorem allows for composing multiple applications of a DP mechanism.
Theorem 12 ([DRV10]).
Let and let . An algorithm that allows for adaptive interactions with an -DP mechanism is -DP for .
At the heart of the algorithm is a DP mechanism for computing a median of a set of items. While sublinear-space algorithms for DP median estimation are known to exist [ABC21], for our purposes it suffices to use a simple approach with near-linear space complexity.
Theorem 13 ([HKM+20, Theorem 2.6]).
For every , there exists an -DP algorithm for databases of size that outputs an element such that with probability at least , there are at least elements in that are bigger or equal to and at least elements in that are smaller or equal to , where . The algorithm uses space.
We note that the original statement of the theorem did not mention the space complexity, but such space can be obtained using standard approaches in the DP literature, e.g., by applying the exponential mechanism with the Gumbel trick [ABC21, MT07].
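To make the remark about the exponential mechanism and the Gumbel trick concrete, here is a minimal Python sketch of one textbook way to select an approximate median privately. It is not the construction from [HKM+20] and does not claim the parameters of Theorem 13. The function name `private_median`, the explicit finite candidate set `domain`, and the utility function are assumptions made for this example, and the quadratic-time scoring is kept only for readability.

```python
import math
import random


def private_median(values, domain, epsilon, rng=random):
    """Pick an approximate median of `values` from the candidate set `domain`.

    Utility of a candidate x: min(#values <= x, #values >= x). Replacing one
    row of `values` changes this utility by at most 1 (sensitivity 1).
    Adding independent standard Gumbel noise to epsilon * utility / 2 and
    taking the argmax samples x with probability proportional to
    exp(epsilon * utility / 2), i.e., the exponential mechanism.
    """
    best, best_key = None, -math.inf
    for x in domain:
        below = sum(1 for v in values if v <= x)
        above = sum(1 for v in values if v >= x)
        utility = min(below, above)
        u = max(rng.random(), 1e-300)        # guard against log(0)
        gumbel = -math.log(-math.log(u))     # standard Gumbel sample
        key = epsilon * utility / 2.0 + gumbel
        if key > best_key:
            best, best_key = x, key
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = list(range(40, 61))          # 21 rounded estimates
    print(private_median(estimates, range(0, 101), epsilon=1.0, rng=rng))
```

With a small privacy parameter the output is only loosely concentrated around the true median; the guarantee in Theorem 13 quantifies this trade-off in terms of the database size and the size of the candidate set.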
Finally, we use a known generalization theorem that shows that a differentially private mechanism cannot heavily bias its computation on a sufficiently large set of random samples.
4.2 Proof of Lemma 11
Recall that Algorithm 2 runs multiple copies , …, of an oblivious streaming algorithm , where each uses an independent random string , selected from the same distribution as the randomness of . Let be the collection of random strings used by the copies of . We can view as a database, in which each is a row, and Algorithm 2 as a mechanism that operates on it. We now show that Algorithm 2 does not reveal much about the collection of random strings it uses.
Algorithm 2 is -DP with respect to , the collection of random strings used by copies of , where and are as defined in the algorithm.
The only way in which the algorithm reveals anything about the strings is by outputting the private median of current estimates of all algorithms. Note that the set of possible values of estimates is of size at most . Let be as defined in Line 2. It follows from Theorem 13 that each application of the median algorithm is -DP with respect to and errs with probability at most when the constant hidden by the asymptotic notation in the definition of in Line 2 is large enough. Applying Theorem 12, we conclude that the entire algorithm is -DP with respect to , where
Proof of Lemma 11.
Recall that by Lemma 15, Algorithm 2 is -DP with respect to the collection of random strings that copies of use, where is defined in Line 2 and is defined in Line 2. For any random string that an instance of may use and for any , let equal if outputs a -approximation to the function being computed on the prefix of the stream, when asked the -th query and using as its randomness. Let be the randomness that , the -th instance of , uses. Since the adversary can be seen as an -DP mechanism, and this includes all generated queries and updates to the stream, by Theorem 14 and the union bound, we get that
for all , with probability at least as long as . We now show that this condition holds for a sufficiently large constant hidden by the asymptotic notation in the definition of in Line 2. First, observe that is defined to be a positive constant, and hence is a constant as well and can easily be bounded by a sufficiently large constant hidden in the definition of . It remains to bound . To this end, observe that the definition of also has two multiplicative terms. The first one is
and the second one is
Since , and, therefore, , their product multiplied by a sufficiently large constant—which again can be hidden in the asymptotic notation in the definition of —is greater than . This finishes the proof that can easily be achieved by properly adjusting constants.
Since is correct with probability at least on any fixed data stream, this implies that for each , at least predicates are . In other words, with probability at least , for each query from the adversary, at least of the collected estimates of the current value of are its -approximations. Note that the rounding step can only increase the approximation error by a factor of at most , which means that at least of estimates are -approximation of the current value of because .
Now note that the private median algorithm returns an estimate that is greater than or equal to at least of estimates and also smaller than or equal to at least of the same estimates with probability at least . As long as this algorithm outputs such an estimate, and as long as the fraction of bad estimates is at most , this means that the algorithm outputs a -approximation to the value of at the query point. By the union bound this occurs for all queries with probability at least .
The space complexity of the algorithm is dominated by the space to store the instances of . Note that there are of them. ∎
5 Bounded Change
As discussed in the introduction, our frequency moment estimation algorithm gains from the fact that when the vector is -dense for some value of , the value of cannot change too rapidly. We formally state and prove this below.
Let , , , and . If is -dense and , then is a -approximation to .
Consider a function defined as . We claim that for any , . This is easy to verify for , because and . Since is differentiable in with the absolute value of the derivative bounded by , the claim holds in that range as well, i.e., for other . This implies that for any , .
Since is -dense, for at least of its coordinates , , and hence . We therefore have . Analogously, . ∎
We use the following two well-known facts, which are easy to verify via basic calculus.
For and , .
For , .
Let , , , and . If is -dense and , then is a -approximation to .
Let . We partition the set of indices, , into two sets, and , based on how compares to . We have and .
For , we have
For , we have
This implies that, due to the convexity of the function for ,
The same bound holds for , i.e.,
Combining these bounds, we get a bound on the sum of differences in coordinates in
where the last inequality follows from the fact that is -dense, and therefore, .
Overall, combining our knowledge for both and ,
This immediately implies our main claim. ∎
6 Proof of the Main Result
We start by restating our main result. This version has more details than the simplified version, which was presented as Theorem 1 in the introduction.
Theorem 20 (Adversarially robust moment estimation algorithm, full version of Theorem 1).
|value of||space||detailed space complexity|
Implementation notes for Algorithm 1.
We start with a few implementation details. First, to ensure low memory usage, one has to maintain a sparse representation of , i.e., store only the non-zero coordinates in an easily searchable data structure such as a balanced binary search tree. This is possible, because as soon as becomes -dense, we stop maintaining it explicitly. Hence this part of the algorithm uses only words of space.
We also avoid discussing numerical issues, and assume that for any integer , we can compute a good approximation to in time, and also that summing such sufficiently good approximations still yields a sufficiently good approximation. In order to efficiently update , while avoiding accumulating numerical errors (due to a sequence of additions and subtractions), one can create a balanced binary tree in which we sum approximations for for all non-zero coordinates . Updating one of them then requires only updating the sums on the path to the root. This path is of length , and hence this requires updating at most intermediate sums, each being a result of adding two values.
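The sparse-regime bookkeeping described in these notes can be sketched as follows. This is a simplified illustration under explicit assumptions, not the paper's implementation: it uses a hash map instead of a balanced binary search tree, it assumes an integer moment order so that Python integer arithmetic is exact (sidestepping the numerical issues that the summation tree addresses), and the names `SparseMomentTracker` and `threshold` are hypothetical.

```python
class SparseMomentTracker:
    """Maintain only the non-zero coordinates of f and the running moment."""

    def __init__(self, p, threshold):
        self.p = p                  # assumed to be a non-negative integer
        self.threshold = threshold  # hypothetical sparsity threshold
        self.f = {}                 # coordinate -> non-zero frequency
        self.moment = 0             # current value of sum_i |f_i|^p

    def _term(self, x):
        # Contribution of a single coordinate; zero coordinates contribute 0,
        # which also covers the p == 0 convention.
        return 0 if x == 0 else abs(x) ** self.p

    def update(self, i, delta):
        """Apply one update; return True while the vector is still sparse."""
        old = self.f.get(i, 0)
        new = old + delta
        self.moment += self._term(new) - self._term(old)
        if new == 0:
            self.f.pop(i, None)     # keep the representation sparse
        else:
            self.f[i] = new
        return len(self.f) <= self.threshold


if __name__ == "__main__":
    tracker = SparseMomentTracker(p=2, threshold=3)
    for upd in [(5, +1), (5, +1), (7, -1), (5, -1), (7, +1)]:
        still_sparse = tracker.update(*upd)
    print(tracker.f, tracker.moment, still_sparse)
```

For non-integer moment orders, the incremental update of `moment` would accumulate floating-point error over a long stream, which is the motivation for recomputing partial sums along a tree path as described above.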
Proof of our main result.
We are now ready to move on to the proof of our main result, which collects all the tools that we have developed throughout the paper.
Proof of Theorem 20.
We first prove our algorithm’s correctness conditioning on three assumptions, and then we prove that these assumptions hold with high probability. Finally, we analyze the space complexity of the algorithm. Throughout the proof denotes the value of after the update. Our assumptions are:
All invocations of , , and are successful. That is, the following events occur: correctly recovers (provided that is -sparse), returns a -approximation to the -moment of whenever it is queried, and returns a -approximation to the number of non-zero coordinates in whenever it is queried.
If is -sparse, then .
If , then is -sparse.
Correctness under the assumptions. By Assumption 3, is only invoked when is -sparse. By Assumption 1, each such invocation correctly recovers . Hence, at the first time step of every time interval such that , the algorithm has a sparse representation of (as discussed in Section 6) and this continues for the duration of the sparse interval. Therefore, for the duration of an interval where , correctly approximates , and therefore all outputs of the algorithm are -approximations to .
Consider now a time interval where . By the above discussion, at the first time step such that , it holds that (since during sparse intervals the algorithm exactly knows ). We claim that at all time steps where , is a -approximation of . Fix a maximal time interval such that