
On the Robustness of CountSketch to Adaptive Inputs

by Edith Cohen, et al.

CountSketch is a popular dimensionality reduction technique that maps vectors to a lower dimension using randomized linear measurements. The sketch supports recovering $\ell_2$-heavy hitters of a vector (entries with $v[i]^2 \geq \frac{1}{k}\|v\|_2^2$). We study the robustness of the sketch in adaptive settings where input vectors may depend on the output from prior inputs. Adaptive settings arise in processes with feedback or with adversarial attacks. We show that the classic estimator is not robust, and can be attacked with a number of queries of the order of the sketch size. We propose a robust estimator (for a slightly modified sketch) that allows for a quadratic number of queries in the sketch size, which is an improvement factor of $\sqrt{k}$ (for $k$ heavy hitters) over prior work.



1 Introduction

Algorithms are often analyzed and used under the assumption that their internal randomness is independent of their inputs. This assumption, however, is not always reasonable. For example, consider a large system where there is a feedback loop between inputs and outputs or an explicit adversarial attack aimed at constructing inputs on which the system fails. In such a case, it is no longer true that the inputs are generated independently of the algorithm’s randomness (as the inputs depend on previous outputs which depend on the internal randomness) and hence the algorithms might fail to provide utility.

This motivated a growing interest in understanding the performance of algorithms when the inputs are chosen adaptively, possibly as a function of their previous outputs, and in designing robust algorithms which provide utility guarantees even when their inputs are chosen adaptively. Works in this vein span multiple areas, including machine learning [43, 22, 3, 39], adaptive data analysis [20, 28, 34, 17], dynamic graph algorithms [41, 21, 25, 44], and sketching and streaming algorithms [10, 27, 46, 4, 9]. However, the resulting (robust) algorithms tend to be significantly less efficient than their classical counterparts. Furthermore, while the analyses of classical algorithms do not seem to carry over to the robust setting, in many cases we do not have any example showing that they are not robust (i.e., for all we know it might only be the analysis that breaks).¹ This naturally raises the question of quantifying more precisely the robustness of algorithms.

¹ It is worth mentioning that [10] showed an attack on a simplified version of the classical AMS sketch (with a weaker estimator). However, their attack does not apply with the classic estimator. [26] constructed an attack on linear sketches, but the size of the attack is far from the respective upper bounds.

Driven by this question, in this work we set out to explore the robustness properties of the classical CountSketch algorithm [13], related to feature hashing [37] in the machine learning literature. CountSketch is a popular dimensionality reduction technique that maps vectors to a lower dimension using randomized linear measurements. The method is textbook material and has found many applications in machine learning and data analysis [45, 40, 14, 15, 1, 42, 2, 16]. The sketch is often a component of large ML models or data analytics systems and its robustness may impact the overall robustness of the system.

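Operationally, CountSketch is parametrized by $(d, \ell, b)$, where $d$ is the dimension of input vectors, $\ell \cdot b$ is the size of the sketch, and $b$ is a parameter controlling the accuracy of the sketch (referred to as the “width” of the sketch). It is applied by initializing $\ell$ pairs of random hash functions $(h_j, s_j)_{j \in [\ell]}$, where $h_j : [d] \to [b]$ and $s_j : [d] \to \{-1, 1\}$. We think of the pairs $(j, a) \in [\ell] \times [b]$ as defining “buckets” ($b$ buckets for every pair of hash functions). To sketch a vector $v \in \mathbb{R}^d$: for every $i \in [d]$ and for every $j \in [\ell]$, add $s_j(i) \cdot v[i]$ to the bucket indexed by $(j, h_j(i))$. The resulting collection of $\ell \cdot b$ summations (the values of the buckets) is the sketch, which we denote as $\mathrm{Sketch}_\rho(v)$, where $\rho$ is the internal randomness. That is, $\mathrm{Sketch}_\rho(v) = (C_{j,a})_{j \in [\ell],\, a \in [b]}$, where

$$C_{j,a} = \sum_{i \in [d] :\, h_j(i) = a} s_j(i) \cdot v[i].$$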

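In applications, the desired task (e.g., recovering the set of heavy hitter entries of $v$ and their approximate values, reconstructing an approximation of the input vector, or approximating the inner product of two vectors) is obtained by applying an estimator $M$ to the sketch $\mathrm{Sketch}_\rho(v)$. Note that $M$ does not access the original vector. The most commonly used estimator in the context of CountSketch is the median estimator, with which the $i$th entry of the original vector is estimated as

$$\hat{v}[i] = \mathrm{median}_{j \in [\ell]} \; s_j(i) \cdot C_{j, h_j(i)}.$$

To intuit this estimator, observe that for every $j \in [\ell]$ we have that

$$s_j(i) \cdot C_{j, h_j(i)} = v[i] + \sum_{i' \neq i :\, h_j(i') = h_j(i)} s_j(i)\, s_j(i') \cdot v[i']$$

is an unbiased estimator for $v[i]$, since every cross term has zero mean when the signs $s_j$ are pairwise independent (and independent of $h_j$). Hence, intuitively, the median of these values is a good estimate.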
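The construction and the median estimator described above can be illustrated with a short Python sketch. It assumes fully random hash tables in place of the (pairwise-independent) hash families, stores the sketch as an $\ell \times b$ list of lists, and uses the hypothetical helper names `make_countsketch`, `sketch`, and `median_estimate`. It is a minimal illustration, not the authors' implementation.

```python
import random
import statistics

def make_countsketch(d, ell, b, seed=0):
    """Sample ell pairs (h_j, s_j): h_j maps each key to one of b buckets, s_j to a sign.
    Fully random tables stand in for pairwise-independent hash families."""
    rng = random.Random(seed)
    h = [[rng.randrange(b) for _ in range(d)] for _ in range(ell)]
    s = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(ell)]
    return h, s

def sketch(v, h, s, b):
    """Return the ell x b bucket values C[j][a] = sum over {i : h_j(i) = a} of s_j(i) * v[i]."""
    C = [[0.0] * b for _ in range(len(h))]
    for j in range(len(h)):
        for i, x in enumerate(v):
            C[j][h[j][i]] += s[j][i] * x
    return C

def median_estimate(C, h, s, i):
    """Median over rows j of the per-row unbiased estimates s_j(i) * C[j][h_j(i)]."""
    return statistics.median(s[j][i] * C[j][h[j][i]] for j in range(len(h)))

# Usage: one dominant key (index 7) plus a flat tail of 100 keys with value 1.
h, s = make_countsketch(d=1000, ell=9, b=50, seed=1)
v = [0.0] * 1000
v[7] = 40.0
for i in range(100, 200):
    v[i] = 1.0
C = sketch(v, h, s, b=50)
print(median_estimate(C, h, s, 7))
```

Because the sketch is linear, sketching `u + w` gives the bucket-wise sum of the sketches of `u` and `w`. This linearity is what lets the same initialization be reused across many inputs.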

Our focus will be on the task of recovering the $\ell_2$-heavy hitters. An $\ell_2$-heavy hitter with parameter $k$ of a vector $v \in \mathbb{R}^d$ is defined to be an entry $i \in [d]$ such that $v[i]^2 > \frac{1}{k}\|\mathrm{tail}_k(v)\|_2^2$, where $\mathrm{tail}_k(v)$, the $k$-tail of $v$, is a vector obtained from $v$ by replacing its $k$ largest entries in magnitude with $0$. The heavy-hitters problem is to return a set of $O(k)$ keys that includes all heavy hitters. The output is correct when all heavy hitters are reported. In the non-adaptive setting, this problem can be solved using CountSketch with $b = O(k)$ and $\ell = O(\log d)$, by returning the $O(k)$ keys with the largest estimated magnitudes (via the median estimator).
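As a reference point, the heavy-hitter definition can be checked directly on an uncompressed vector. The following Python sketch uses hypothetical helper names and assumes the strict inequality, so keys with $v[i] = 0$ are never reported, even when the tail is zero.

```python
def tail_norm_sq(v, k):
    """||tail_k(v)||_2^2: squared l2 norm of v after zeroing its k largest-magnitude entries."""
    squares = sorted((x * x for x in v), reverse=True)
    return sum(squares[k:])

def heavy_hitters(v, k):
    """Exact set of l2-heavy hitters with parameter k: keys i with v[i]^2 > ||tail_k(v)||^2 / k."""
    threshold = tail_norm_sq(v, k) / k
    return {i for i, x in enumerate(v) if x * x > threshold}

print(heavy_hitters([10.0, -8.0, 1.0, 1.0, 1.0, 1.0], k=2))  # {0, 1}
```

In the example, the $2$-tail consists of the four entries of magnitude 1, so the threshold is $4/2 = 2$. Only the two large entries exceed it. A sketch-based solution must return a set of $O(k)$ keys that contains this exact set.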

As we mentioned, in this work we are interested in the adaptive setting, where the same initialization (specified by $\rho$) is used to sketch different inputs or to maintain a sketch as the input is updated. In its most basic form, the setting is modeled using the following game between a sketching algorithm Sketch with an estimator $M$ (not necessarily CountSketch and the median estimator) and an Analyst. At the beginning, we sample the initialization randomness of Sketch, denoted as $\rho$. Then, the game proceeds in rounds, where in round $t = 1, 2, \dots, r$:

  • The Analyst chooses a vector $v^{(t)} \in \mathbb{R}^d$, which can depend, in particular, on all previous outputs of $M$.

  • $M(\mathrm{Sketch}_\rho(v^{(t)}))$ outputs a set $K_t$ of $O(k)$ keys, which is given to the Analyst.

We say that $(\mathrm{Sketch}, M)$ is robust for $r$ rounds if for any Analyst, with high probability, for every $t \in [r]$ it holds that $K_t$ contains all the heavy hitters of $v^{(t)}$. The focus is on designing robust sketches of size as small as possible (as a function of $r$ and $k$). We remark that it is trivial to design robust sketches with size linear in $r$, by simply duplicating a classical sketch $r$ times and using every copy of the classical sketch to answer one query. Therefore, if a sketch is only robust to a number of rounds that scales linearly with its size, then we simply say that this sketch is non-robust.

1.1 Our Contributions

We first briefly state our main contributions. We elaborate on these afterwards.

  1. We show that CountSketch together with the standard median estimator are non-robust. We achieve this by designing an attack that poses queries to CountSketch such that, w.h.p., the answer given to the last query is wrong. This constitutes the first result showing that a popular sketching algorithm is non-robust. We complement our analysis with empirical evaluation, showing that our attack is practical.

  2. We introduce (see Section 3) a novel estimator (instead of the median estimator), which we refer to as the sign-alignment estimator, that reports keys as heavy hitters based on the signs of their corresponding buckets. This new estimator is natural and, as we show, has performance comparable to the median estimator even in the non-adaptive setting. We believe that this new estimator can be of independent interest.

  3. We design a noisy version of our sign-alignment estimator (utilizing techniques from differential privacy), and design a variant of CountSketch, which we call BucketCountSketch, or BCountSketch for short. We show that BCountSketch together with our noisy estimator are robust for $r$ rounds using sketch size roughly $k\sqrt{r}$. This improves over the previous state-of-the-art robust algorithm for the heavy hitters problem of [27], which has size roughly $k^{1.5}\sqrt{r}$.

    We show that, in a sense, the additional ingredients from differential privacy in the robust estimator are necessary: BCountSketch and CountSketch with a basic version of the sign-alignment estimator are non-robust. Moreover, our analysis of the robust noisy version is tight in that it can be attacked using rounds.

  4. Extensions: We show that our algorithms allow for robustly reporting estimates for the weight of the identified heavy-hitters. We also refine our robust algorithms so that robustness is guaranteed for longer input sequences in some situations. In particular, this is beneficial in a streaming setting, where the input vector changes gradually and we can only account for changes in output.

1.1.1 Our attack on CountSketch with the median estimator

We next provide a simplified description of our attack on CountSketch with the median estimator. We emphasize that our attack is much more general, applicable to a broader family of sketches and estimators (see Appendix 9). Recall that in every iteration, when given a vector $v$, the estimator outputs a set of keys. In our attack, we pose a sequence of vectors $v^{(1)}, \dots, v^{(r)}$, where all of these vectors contain keys 1 and 2 as “largish heavy hitters” (of equal value), as well as a fixed set of $k-1$ “super heavy hitters”, say keys $3, \dots, k+1$. In addition, each $v^{(t)}$ contains a disjoint set of keys with random values, which we refer to as a random tail. So each of these vectors contains $k-1$ “super heavy hitters”, two “largish heavy hitters”, and random noise. We feed each of these vectors to CountSketch. As CountSketch reports exactly $k$ keys, in every iteration we expect all of the “super heavy” elements to be reported, together with one of the “largish heavy hitters” (as a function of the random tail). Let us denote by $T \subseteq [r]$ the subset of all iterations during which key 1 was not reported. (We refer to a tail used in an iteration $t \in T$ as a collected tail.) The fact that key 2 was reported over key 1 during these iterations means that our random tails introduced some bias to the weight estimates of key 2 over the weight estimates of key 1. We show that, conditioned on a tail being collected, in expectation, the random tail introduces negative bias to the weight estimate of key 1 and positive bias to the weight estimate of key 2. At the end of this process, we construct a final query vector $z$ containing key 1 as a “super heavy hitter” and other keys as “largish heavy hitters”, together with the sum of all the collected tails. We show that, w.h.p., the biases we generate “add up”, such that key 1 will not get reported as heavy when querying $z$, despite being the dominant coordinate in $z$ by a large margin.
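The simplified attack can be simulated in a few dozen lines. The Python sketch below is an illustrative toy, not the construction analyzed in the paper. Keys are 0-indexed, so the text's keys 1 and 2 are indices 0 and 1, and the super heavy keys $3, \dots, k+1$ are indices $2, \dots, k$. CountSketch uses fully random hash tables, and each reported set consists of the top-$k$ median estimates among keys in the support of the query. All magnitudes (`largish`, `super_`, tail size `m`, number of rounds) are hypothetical choices. Whether key 0 is actually dropped on the final query depends on these parameters; the code only shows the mechanics of collecting tails and combining them into the final query.

```python
import random
import statistics

def make_countsketch(d, ell, b, rng):
    """ell rows of (bucket hash, sign hash), fully random tables for simplicity."""
    h = [[rng.randrange(b) for _ in range(d)] for _ in range(ell)]
    s = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(ell)]
    return h, s

def report_top(v, h, s, b, k):
    """Sketch the sparse vector v (dict key -> value), median-estimate each key in its
    support, and return the k keys with the largest estimated magnitudes."""
    ell = len(h)
    C = [[0.0] * b for _ in range(ell)]
    for i, x in v.items():
        for j in range(ell):
            C[j][h[j][i]] += s[j][i] * x
    est = {i: statistics.median(s[j][i] * C[j][h[j][i]] for j in range(ell)) for i in v}
    return set(sorted(est, key=lambda i: abs(est[i]), reverse=True)[:k])

def simplified_attack(k=5, ell=15, b=20, m=40, rounds=200, largish=6.0, super_=60.0, seed=0):
    rng = random.Random(seed)
    d = k + 1 + rounds * m  # keys 0, 1: largish; 2..k: super heavy; then disjoint tails
    h, s = make_countsketch(d, ell, b, rng)
    base = {0: largish, 1: largish}
    base.update({i: super_ for i in range(2, k + 1)})
    collected = []
    for t in range(rounds):
        start = k + 1 + t * m
        tail = {i: rng.gauss(0.0, 1.0) for i in range(start, start + m)}
        if 0 not in report_top({**base, **tail}, h, s, b, k):
            collected.append(tail)  # key 0 lost to key 1 in this round: keep the tail
    # Final query z: key 0 super heavy, keys 1..k largish, plus all collected tails.
    z = {0: super_}
    z.update({i: largish for i in range(1, k + 1)})
    for tail in collected:
        z.update(tail)  # tails use disjoint keys, so summing them is a dict union
    return z, collected, report_top(z, h, s, b, k)

z, collected, reported = simplified_attack()
print(len(collected), "tails collected; key 0 reported on final query:", 0 in reported)
```

Only the final query `z` depends on earlier outputs. This mirrors the observation in Remark 1.2 that the attack uses adaptivity very lightly.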

Remark 1.1.

An important feature of our attack is that it works even when CountSketch is only used to report a set of heavy keys, without their estimated values. The attack can be somewhat simplified in case estimated values are reported.

Remark 1.2.

A natural question to ask regards the implications of our attack on the robustness of routine applications of CountSketch. We argue that elements of the attack can occur in routine settings. The attack is "black-box": it only uses information from the output. It collects and combines inputs with the same heavy key, which is a natural, simple feedback process. In practice this can correspond to examples with the same label or to related traffic patterns that might load the same network component. Our attack combines components of the input that contribute to a "misclassification." Moreover, adaptivity of the input is used only very lightly: the final query on which the sketch fails is the only one that depends on prior inputs. This suggests that an adversarial input can be constructed after simply monitoring the "normal operation" of the algorithm on unintentional inputs. On the other hand, our attack uses "borderline" inputs in order to construct an adversarial input. This suggests that the sketch would be more robust in settings that somehow restrict all inputs (in a deterministic way) to be far from the decision boundary, with all keys being either very heavy or very far from being heavy.

We complement our theoretical analysis of this attack with empirical results, showing that the attack is feasible with a small number of queries. Figure 1 reports simulation results of the attack on the median estimator. The left plot shows the bias-to-noise ratio (the median value of the tail contributions scaled by its standard deviation) as a function of the number of attack rounds for the two special keys (which accumulate positive and negative bias) and another key (which remains unbiased). The left plot visualizes the square-root relation of the bias-to-noise ratio with the number of rounds. A sketch provides good estimates for a key when its weight is larger than the "noise" on its buckets that is induced by the "tail" of the vector; in this case, the key is a heavy hitter. The attack is thus effective when the bias exceeds the noise. The right plot shows, as a function of the size (number of rows $\ell$) of the sketch, the number of rounds needed to obtain a vector with a specified bias-to-noise ratio (BNR).²

² We performed additional simulations (not shown) where we swept from to , keeping and keeping . We observed (as expected) the same dependence of . Note that the dependence should hold as long as the parameters are in a regime where most buckets of each of the heaviest keys have no collisions with other heavy keys. Also note that we used a large value of , but the same dependence holds for smaller values of .

Figure 1: Left: Bias-to-noise ratio for number of rounds, average of 10 simulations with different initializations (shaded region between the minimum and maximum). Right: Attack rounds versus sketch size to obtain bias-to-noise ratio , averaged over and simulations respectively (shaded region between minimum and maximum).

1.1.2 Our new robust sketch using differential privacy

Differential privacy [18] is a mathematical definition for privacy that aims to enable statistical analyses of datasets while providing strong guarantees that individual level information does not leak. Specifically, an algorithm that analyzes data is said to be differentially private if it is insensitive to the addition/deletion of any single individual record in the data. Intuitively, this guarantees that whatever is learned about an individual record could also be learned without this record. Over the last few years, differential privacy has proven to be an important algorithmic notion, even when data privacy is not of concern. Particularly of interest to us is a recent line of work, starting from [17], showing that differential privacy can be used as a tool for avoiding overfitting, which occurs when algorithms begin to “memorize” data rather than “learning” to generalize from a trend.

Recall that the difficulty in our adaptive setting arises from potential dependencies between the inputs of the algorithm and its internal randomness. Our construction builds on a technique, introduced by [27], for using differential privacy to protect not the input data, but rather the internal randomness of the algorithm. As [27] showed, leveraging the “anti-overfitting” guarantees of differential privacy overcomes the difficulties that arise in the adaptive setting. Following [27], this technique was also used by [4, 9, 23, 7] for designing robust algorithms in various settings. At a high level, all of these constructions operate as follows. Let $\mathcal{A}$ be a randomized algorithm that solves some task of interest in the non-adaptive setting (in our case, $\mathcal{A}$ could be CountSketch combined with the median estimator). To design a robust version of $\mathcal{A}$, do the following.

GenericTemplate

  1. Instantiate $m$ copies of the non-adaptive algorithm $\mathcal{A}$ (for some parameter $m$).

  2. For $r$ steps:

      (a) Feed the current input to all of the copies of $\mathcal{A}$ to obtain intermediate outputs $y_1, \dots, y_m$.

      (b) Return a differentially private aggregation of $y_1, \dots, y_m$ as the current output.
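A toy rendering of the GenericTemplate in Python follows. The aggregation in step 2(b) is a choice assumed here for illustration: a Laplace-noised majority vote over reported keys. It is not the aggregation used in [27]. The noise scale is also not calibrated to the $O(k)$ keys a single copy can influence, so the code does not by itself certify any privacy level. The names `laplace`, `private_majority_keys`, and `generic_template` are hypothetical.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) noise as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_majority_keys(outputs, eps, rng):
    """Step 2(b) stand-in: count how many copies report each key, add Laplace(1/eps)
    noise to each count, and release the keys whose noisy count exceeds m/2."""
    m = len(outputs)
    votes = {}
    for keys in outputs:
        for i in keys:
            votes[i] = votes.get(i, 0) + 1
    return {i for i, cnt in votes.items() if cnt + laplace(rng, 1.0 / eps) > m / 2}

def generic_template(copies, inputs, eps, seed=0):
    """Steps 1-2: `copies` are m independently initialized non-robust algorithms
    (callables mapping an input to a set of keys); one aggregated output per input."""
    rng = random.Random(seed)
    return [private_majority_keys([A(x) for A in copies], eps, rng) for x in inputs]

# Usage: 101 copies; one of them also reports a spurious key 9.
copies = [lambda x: {1, 2} for _ in range(100)] + [lambda x: {1, 9}]
print(generic_template(copies, ["q1", "q2"], eps=1.0))
```

The aggregation treats each copy as a black box and only sees its output. This is the "black-box" character discussed next, where the privacy unit is an entire copy of the algorithm.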

That is, all the current applications of differential privacy for constructing robust algorithms operate in a black-box fashion, by aggregating the outcomes of non-robust algorithms. For the $\ell_2$-heavy hitters problem, as we next explain, we would need to set $m \approx \sqrt{rk}$, which introduces a blowup of $\sqrt{rk}$ on top of the size of the non-adaptive algorithm. Applying this construction with CountSketch, which has size $O(k \log d)$, results in a robust algorithm for $r$ rounds with size $\tilde{O}(k^{1.5}\sqrt{r})$.

The reason for setting $m \approx \sqrt{rk}$ comes from known composition theorems for differential privacy. Informally, these theorems show that we can release about $m^2$ aggregated outputs in a differentially private manner given a dataset containing $m$ elements. In our context, we have $r$ rounds, during each of which we need to release $k$ heavy elements. This amounts to a total of $rk$ aggregations, for which we need to have $m \approx \sqrt{rk}$ intermediate outputs.
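The counting argument can be made concrete with a back-of-the-envelope calculation. The snippet below assumes the informal rule stated above (about $m^2$ private aggregations from $m$ intermediate outputs). It ignores all constants and logarithmic factors, including the $\log d$ in the size of CountSketch. The function name is hypothetical.

```python
import math

def generic_template_cost(r, k):
    """Copies m ~ sqrt(r * k) needed for r * k private aggregations, and the total size
    m * k when each copy is a CountSketch of size ~k (constants and log d ignored)."""
    m = math.isqrt(r * k)
    return m, m * k

print(generic_template_cost(r=10_000, k=100))  # (1000, 100000)
```

With $r = 10^4$ rounds and $k = 100$, this gives $m = 1000$ copies and a total size on the order of $k^{1.5}\sqrt{r} = 10^5$.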

We design better algorithms by breaking free from the black-box approach outlined above. We still use differential privacy to protect the internal randomness of our algorithms, but we do so in a “white-box” manner, by integrating it into the sketch itself. As a warmup, consider the following noisy variant of the median estimator (to be applied on top of CountSketch). Given the sketch of a vector $v$, denoted as $\mathrm{Sketch}_\rho(v)$, and a coordinate $i$, instead of returning the actual median of $\{s_j(i) \cdot C_{j,h_j(i)}\}_{j \in [\ell]}$, return a differentially private estimate for it.

As before, in order to release $rk$ estimates throughout the execution, we would need to set $\ell \approx \sqrt{rk}$, which results in a sketch of size $b \cdot \ell \approx k^{1.5}\sqrt{r}$. So, in terms of sketch size, we did not gain much with this idea compared to the GenericTemplate. Still, there is a conceptual improvement here. With the GenericTemplate we argued about differential privacy w.r.t. the intermediate outputs $y_1, \dots, y_m$, each of which results from a different instantiation of CountSketch. This effectively means that in the GenericTemplate we needed to tune our privacy parameters so that the aggregation in Step 2b “hides” an entire copy of CountSketch, which includes $\ell$ pairs of hash functions. With our warmup idea, on the other hand, when privately estimating the median of $\{s_j(i) \cdot C_{j,h_j(i)}\}_{j \in [\ell]}$, we only need to hide every single one of these $\ell$ elements, which amounts to hiding only a single hash function pair $(h_j, s_j)$. (We refer to the type of object we need to “hide” with differential privacy as our privacy unit; so in the GenericTemplate the privacy unit was a complete copy of CountSketch, and now the privacy unit is reduced to a single hash function pair.) This is good because, generally speaking, protecting less with differential privacy is easier, and can potentially be done more efficiently.

Indeed, we obtain our positive results by “lowering the privacy bar” even further. Informally, we show that it is possible to work with the individual buckets as our privacy unit, rather than the individual hash functions, while still being able to leverage the generalization properties of differential privacy to argue utility in the adaptive setting. Intuitively, but inaccurately, by doing so we will have $b \cdot \ell$ elements to aggregate with differential privacy (the number of buckets), rather than only $\ell$ elements (the number of hash functions), which would allow us to answer a larger number of adaptive queries via composition arguments.

There are two major challenges with this approach, which we need to address.

First challenge.

Recall that every key participates in $\ell$ buckets (one for every hash function). So, even if we work with the individual buckets as the privacy unit, when estimating the weight of key $i$ we still have only $\ell$ elements to aggregate, not $b \cdot \ell$ elements as we described it above. It is therefore not clear how to gain from working with the buckets as the privacy unit. We tackle this by conducting a more fine-tuned analysis, based on the following idea. Suppose that the current input vector is $v$ and let $K$ denote the set of keys identified to be heavy. While we indeed estimate the weight of every $i \in K$ by aggregating only $\ell$ buckets, what we show is that, on average, the estimates for different keys $i \in K$ aggregate different buckets. This means that (on average) every bucket participates in very few estimations per query. Overall, every bucket participates in $O(r)$ aggregations throughout the execution, rather than $O(rk)$ as before. Using composition arguments, we now need to aggregate only about $\sqrt{r}$ elements (buckets) to produce our estimates, rather than about $\sqrt{rk}$ as before. So it suffices to set $\ell \approx \sqrt{r}$, i.e., it suffices to use a sketch of size $b \cdot \ell \approx k\sqrt{r}$. The analysis of this idea is delicate, and we are actually not aware of a variant of the median estimator that would make this idea go through. To overcome this issue, we propose a novel estimator, which we refer to as the sign-alignment estimator, that reports keys as heavy hitters based on the signs of their corresponding buckets. This estimator has several appealing properties that, we believe, make it of independent interest.

Second challenge.

The standard generalization properties of differential privacy, which we leverage to avoid the difficulties that arise from adaptivity, only hold for product distributions.³ This is fine when working with the individual hash functions as the privacy unit, because the different hash functions are sampled independently. However, this is no longer true when working with the individual buckets as the privacy unit, as clearly, buckets pertaining to the same hash function are dependent. To overcome this difficulty, we propose a variant of CountSketch, which we call BCountSketch, that has the property that all of its buckets are independent. This variant retains the marginal distribution of the buckets in CountSketch, but removes dependencies.

³ While there are works that studied the generalization properties of (variants of) differential privacy under non-product distributions [5, 32], these works are not applicable in our setting.

1.1.3 Feasibility of alternatives

Deterministic algorithms are inherently robust, and therefore one approach to achieving robustness is to redesign the algorithm to be deterministic or to have deterministic components [41, 24]. We note that the related $\ell_1$-heavy hitters problem on data streams has known deterministic sketches [36, 38], and therefore admits a sketch that is fully robust in adaptive settings. For $\ell_2$-heavy hitters, however, all known designs are based on randomized linear measurements, and there are known lower bounds on the size of any deterministic sketch, even one that only supports positive updates [29]. In particular, this means that for $\ell_2$-heavy hitters we cannot hope for better robustness via a deterministic sketch.

2 Preliminaries

For vectors $u, v \in \mathbb{R}^d$ we use the notation $v[i]$ for the value of the $i$th entry of the vector (which we also refer to as the $i$th key), $\langle u, v \rangle$ for the inner product, and $\|v\|_2$ for the $\ell_2$ norm.

Definition 2.1.

(heavy hitter) Given a vector $v \in \mathbb{R}^d$, an entry $i \in [d]$ is an $\ell_2$-$k$-heavy hitter if $v[i]^2 > \frac{1}{k}\|\mathrm{tail}_k(v)\|_2^2$.

Definition 2.2.

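(Heavy hitters problem, with and without values) A set of entries $K \subseteq [d]$ is a correct solution for the heavy hitters problem if $|K| = O(k)$ and $K$ includes all the heavy hitters. The solution is $\epsilon$-correct for the problem with values if it includes approximate values $\hat{v}[i]$ for all $i \in K$ so that $|\hat{v}[i] - v[i]| \leq \frac{\epsilon}{\sqrt{k}}\|\mathrm{tail}_k(v)\|_2$.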

2.1 CountSketch and BCountSketch

CountSketch and our proposed variant BCountSketch are specified by the parameters $(d, b, n)$, where $d$ is the dimension of input vectors, $b$ is its width, and $n$ is the size of the sketch (number of linear measurements of the input vector).

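The internal randomness $\rho$ of the sketch specifies a set of $n$ measurement vectors $(\mu^{(j)})_{j \in [n]}$, where $\mu^{(j)} \in \{-1, 0, 1\}^d$ for $j \in [n]$. The sketch of a vector $v \in \mathbb{R}^d$ is the set of $n$ linear measurements (which we also refer to as buckets)

$$c_j := \langle \mu^{(j)}, v \rangle \qquad (j \in [n]).$$

CountSketch: The internal randomness specifies a set of random hash functions $h_j : [d] \to [b]$ and $s_j : [d] \to \{-1, 1\}$ ($j \in [\ell]$) with the marginals $\Pr[h_j(i) = a] = 1/b$ for all $a \in [b]$ and $\Pr[s_j(i) = 1] = \Pr[s_j(i) = -1] = 1/2$, where $\ell := n/b$, so that the sketch has $n = \ell \cdot b$ measurements. The measurement vectors are organized as $\ell$ sets of $b$ measurements each ($j \in [\ell]$, $a \in [b]$):

$$\mu^{(j,a)}[i] := s_j(i) \cdot \mathbf{1}[h_j(i) = a] \qquad (i \in [d]).$$

Interestingly, limited (pairwise) independence of the hash functions $h_j$ and $s_j$ suffices for the utility guarantees (stated below). Note that with CountSketch the measurement vectors within each set are dependent.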

BCountSketch: The measurement vectors $\mu^{(j)}$ ($j \in [n]$) are drawn i.i.d. from a distribution $\mathcal{B}$. The distribution $\mathcal{B}$ is the same as that of the measurement vectors of CountSketch except that dependencies are removed. The i.i.d. property will facilitate our analysis of robust estimation.

Each $\mu^{(j)}$ ($j \in [n]$) is specified by two objects: a selection hash function $p_j : [d] \to \{0, 1\}$ and a sign hash function $s_j : [d] \to \{-1, 1\}$ with the following marginals:

  • $p_j$ s.t. $\Pr[p_j(i) = 1] = 1/b$.

  • $s_j$ s.t. $\Pr[s_j(i) = 1] = 1/2$.

The measurement vector entries are $\mu^{(j)}[i] := p_j(i) \cdot s_j(i)$ ($i \in [d]$). Our upper bound only requires limited independence (3-wise for $p_j$ and 5-wise for $s_j$, respectively). Our lower bounds hold in the stronger model of full independence.
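A minimal Python sketch of BCountSketch sampling follows. It draws every bucket independently using fully independent randomness, which is stronger than the limited independence the upper bound needs. Each $\mu^{(j)}$ is represented as a dense list, and the helper names are hypothetical.

```python
import random

def make_bcountsketch(d, n, b, seed=0):
    """Sample n i.i.d. measurement vectors mu^(j) in {-1, 0, 1}^d. Key i participates in
    bucket j with probability 1/b (selection p_j(i) = 1) and then gets a random sign s_j(i)."""
    rng = random.Random(seed)
    mu = []
    for _ in range(n):
        row = [0] * d
        for i in range(d):
            if rng.random() < 1.0 / b:
                row[i] = rng.choice((-1, 1))
        mu.append(row)
    return mu

def measure(mu, v):
    """Buckets c_j = <mu^(j), v> for j in [n]."""
    return [sum(m_i * x for m_i, x in zip(row, v)) for row in mu]

def participation(mu, i):
    """B_i: indices of the buckets that key i participates in."""
    return [j for j, row in enumerate(mu) if row[i] != 0]

mu = make_bcountsketch(d=200, n=400, b=10, seed=3)
# Average |B_i| over all keys; its expectation is n / b = 40.
print(sum(len(participation(mu, i)) for i in range(200)) / 200)
```

Unlike CountSketch, a key here may participate in any number of buckets, and two keys can share several buckets independently. The number of participations is fixed only in expectation.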

2.2 The median estimator

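We say that a key $i$ participates in bucket $j$ when $\mu^{(j)}[i] \neq 0$. We denote by $B_i := \{j : \mu^{(j)}[i] \neq 0\}$ the set of buckets that $i$ participates in. Note that with CountSketch we have $|B_i| = \ell$, since $i$ participates in exactly one bucket in each set of $b$ buckets. With BCountSketch we have $\mathbb{E}[|B_i|] = n/b$, since $i$ participates in each bucket with probability $1/b$. Also note that with both methods, the measurement vectors $\mu^{(j)}$ for $j \in B_i$ are i.i.d.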

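Note that for all $j \in B_i$ it holds that $\mathbb{E}[\mu^{(j)}[i] \cdot c_j] = v[i]$. For each key $i$ we get a multiset of $|B_i|$ unbiased independent weak estimates of the value $v[i]$ (one for each $j \in B_i$):

$$\{\mu^{(j)}[i] \cdot c_j\}_{j \in B_i}.$$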

We use these estimates to determine whether $i$ should be reported as a heavy hitter and, if so, its reported estimated value. The classic CountSketch estimator [13] uses the median of these values: $\hat{v}[i] := \mathrm{median}\{\mu^{(j)}[i] \cdot c_j\}_{j \in B_i}$.
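The median estimator over $B_i$ can be sketched on top of BCountSketch as follows. This Python sketch makes the same assumptions as a fully independent BCountSketch, and its helper names are hypothetical. For compactness, each measurement vector is stored sparsely as a dict from participating keys to signs.

```python
import random
import statistics

def make_bcountsketch_sparse(d, n, b, rng):
    """Each bucket j is a dict {key: sign} over the keys i with p_j(i) = 1."""
    return [{i: rng.choice((-1, 1)) for i in range(d) if rng.random() < 1.0 / b}
            for _ in range(n)]

def measure(mu, v):
    """Buckets c_j = <mu^(j), v>, summing only over participating keys."""
    return [sum(sign * v[i] for i, sign in row.items()) for row in mu]

def median_estimate(mu, c, i):
    """Median of the weak estimates mu^(j)[i] * c_j over j in B_i (0.0 if B_i is empty)."""
    vals = [row[i] * c[j] for j, row in enumerate(mu) if i in row]
    return statistics.median(vals) if vals else 0.0

rng = random.Random(7)
d, n, b = 50, 300, 10
mu = make_bcountsketch_sparse(d, n, b, rng)
v = [rng.choice((-1.0, 0.0, 1.0)) for _ in range(d)]
v[0] = 100.0  # one dominant key among small entries
c = measure(mu, v)
print(median_estimate(mu, c, 0))
```

Each weak estimate equals $v[0]$ plus the signed contributions of the other keys sharing that bucket. With small entries elsewhere, the median stays close to 100.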

2.3 Utility of CountSketch and BCountSketch

For suitable settings of the sketch parameters, the median estimator guarantees an error bound on $|\hat{v}[i] - v[i]|$ in terms of $\|\mathrm{tail}_k(v)\|_2$, where $\mathrm{tail}_k(v)$, the $k$-tail of $v$, is a vector obtained from $v$ by replacing its $k$ largest entries in magnitude with $0$. The analysis extends to BCountSketch, which has the same distribution of the independent bucket estimates except that their number is $n/b$ in expectation rather than exactly. The median estimator is unbiased whereas other quantiles of the bucket estimates may not be. Importantly for our robust weight estimation, however, the stated error bound holds for any quantile of the bucket estimates within a constant-width range around the median, where the constant can be tuned through the constant factors in the setting of the sketch parameters [35].

The following guarantee is obtained using a union bound over keys [13] (for and sketch parameters and ):


where is an approximation of that is computed from . For the heavy hitters problem, we return the set of keys with largest estimates, along with their estimated values.

When we have different non-adaptive inputs , a simple union bound argument with (1) provides that with sketch parameters and :


That is, the number of inputs for which we can guarantee utility with high probability grows exponentially with the size of the sketch. As mentioned in the introduction, we shall see that the median estimator is not robust in adaptive settings, where we can only guarantee utility for a number of inputs that grows linearly with the sketch size, matching a trivial upper bound.

3 Sign-Alignment Estimators

We propose sign-alignment estimators (with CountSketch and BCountSketch) that determine whether a key is reported as a potential heavy hitter based on the number of buckets for which the signs of align.

For , , and , we define the predicates

We show that for a key that participates in the bucket , if is heavy then the sign of is very likely to agree with the sign of the bucket estimate but when there are many keys that are heavier than ( lies in the "tail") then such agreement is less likely.

For and we accordingly define the probabilities that these predicates are satisfied by , conditioned on participating, as

The intuition is that when , we expect and therefore and thus . When we expect and and hence .

Lemma 3.1.

There are constants and and such that for all , for all

  • If then (and therefore )

  • If then .

Corollary 3.2.

Consider sketches with width and define


Then the set contains all heavy hitter keys of and .

It follows from Corollary 3.2 that a set that includes all keys and only keys is a correct solution of the heavy hitters problem. Our sign-alignment estimators are specified by two components. The first component is obtaining estimates of given a query with , a key , and . We shall see (Section 3.3) that simple averaging suffices for the oblivious setting but more nuanced methods (Section 4) are needed for robustness. The second component is the estimator that uses these estimates to compute an output set . We present two methods, threshold (Section 3.1) for arbitrary queries and stable (Section 3.2) for continuous reporting.

Remark 3.3.

Sign-alignment estimators have the desirable property that only keys that "dominate" most of their buckets can have high alignment and thus get reported. This is because with probability , the magnitude of contribution of keys to a bucket is smaller than . In particular, vectors with no heavy keys (empty ) will have no reported keys. This can be a distinct advantage over estimators that simply report keys with the highest estimates.

3.1 The Threshold Estimator

A threshold sign-alignment estimator outputs the set of keys:


where .
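The threshold rule can be sketched as follows. The exact predicates and threshold values are defined by the equations above. This Python sketch therefore substitutes an assumed alignment score: for each key, the larger of the fraction of its buckets whose weak estimate $\mu^{(j)}[i]\,c_j$ is positive and the fraction whose weak estimate is negative. The function name and the threshold `tau` are hypothetical.

```python
def threshold_report(fracs, tau):
    """fracs maps key -> (fraction of its buckets with positive weak estimate,
    fraction with negative weak estimate); report keys whose larger fraction >= tau."""
    return {i for i, (pos, neg) in fracs.items() if max(pos, neg) >= tau}

print(threshold_report({1: (0.95, 0.05), 2: (0.55, 0.45), 3: (0.1, 0.9)}, tau=0.8))  # {1, 3}
```

Taking the larger of the two fractions lets a key with a negative value (key 3) be reported just like a key with a positive value. A key with no consistent sign (key 2) is not reported.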

Lemma 3.4.

(correctness of threshold estimators) If for each query vector , , and the estimates satisfy


then the output is correct.


3.2 The Stable Estimator

This estimator is designed for a continuous reporting version of the heavy hitters problem and is beneficial when the input sequence is of related vectors (as in streaming). In this case we report K continuously and modify it as needed due to input changes. In these applications we desire stability of K, in the sense of avoiding thrashing, where a borderline key exits and re-enters K following minor updates. We shall see that stability can significantly improve robustness guarantees, as we only need to account for changes in the reported set instead of the total size of each reported set.

Our stable estimator uses two threshold values:


A key enters the reported set when . A key exits the reported set when .
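The two-threshold (hysteresis) rule can be sketched as a small stateful object. In this Python sketch, the class name, the scores (an assumed per-key alignment estimate, as in the threshold example), and the thresholds `tau_lo < tau_hi` are all hypothetical stand-ins for the quantities in the equations above.

```python
class StableReporter:
    """Continuous reporting with two thresholds: a key enters the reported set K when
    its score reaches tau_hi and exits only when its score drops below tau_lo."""

    def __init__(self, tau_lo, tau_hi):
        if not tau_lo < tau_hi:
            raise ValueError("need tau_lo < tau_hi")
        self.tau_lo, self.tau_hi = tau_lo, tau_hi
        self.K = set()

    def update(self, scores):
        """scores maps key -> current score; keys absent from scores keep their status."""
        for i, a in scores.items():
            if i not in self.K and a >= self.tau_hi:
                self.K.add(i)
            elif i in self.K and a < self.tau_lo:
                self.K.remove(i)
        return set(self.K)

rep = StableReporter(tau_lo=0.5, tau_hi=0.8)
for scores in ({1: 0.9}, {1: 0.6}, {1: 0.4}, {1: 0.7}):
    print(sorted(rep.update(scores)))
```

A score of 0.6 keeps an already-reported key in $K$, and a score of 0.7 does not bring back a key that has exited. Small fluctuations between the two thresholds therefore cause no thrashing.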

Lemma 3.5.

(correctness of stable estimators) If for all queries , keys , and , our estimates satisfy


then the output of the stable estimator is correct (K includes all keys and only keys). Moreover, the reporting status of a key can change only when changes by at least .


Proof. Similar to that of Lemma 3.4. ∎

3.3 Basic estimates

The basic estimates are simple averages over buckets:

Lemma 3.6.

In an oblivious setting (when the buckets are an independent sample from that does not depend on ) we have that for any constant and , using ,


Proof. From the multiplicative Chernoff bound, we obtain that

We obtain the claim by applying a union bound over the queries. ∎

It follows that in the oblivious setting the sign-alignment estimators are correct with . This matches the utility guarantees provided with the median estimator (Section 2.3). We shall see however that as is the case for the median estimator, our sign-alignment estimators with the basic estimates are non-robust in adaptive settings. The robust estimators we introduce in Section 4 use estimates that are more nuanced.
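The basic (oblivious) estimates can be illustrated by averaging a per-bucket sign indicator over $B_i$. In this Python sketch, the indicator is whether the weak estimate $\mu^{(j)}[i]\,c_j$ is positive (respectively, negative). This is an assumed stand-in for the predicates defined above, and the sparse bucket representation and names are hypothetical. The example illustrates Remark 3.3: a key that dominates its buckets is aligned in essentially all of them, while a key whose magnitude matches many other keys is aligned far less consistently.

```python
import random

def make_bcountsketch_sparse(d, n, b, rng):
    """Each bucket j is a dict {key: sign} over the keys that participate in it."""
    return [{i: rng.choice((-1, 1)) for i in range(d) if rng.random() < 1.0 / b}
            for _ in range(n)]

def basic_sign_fractions(mu, c, i):
    """(pos, neg): fractions of buckets j in B_i with mu^(j)[i] * c_j > 0, resp. < 0."""
    vals = [row[i] * c[j] for j, row in enumerate(mu) if i in row]
    if not vals:
        return 0.0, 0.0
    return (sum(x > 0 for x in vals) / len(vals), sum(x < 0 for x in vals) / len(vals))

rng = random.Random(11)
d, n, b = 200, 400, 10
mu = make_bcountsketch_sparse(d, n, b, rng)
v = [rng.choice((-1.0, 1.0)) for _ in range(d)]
v[0] = 100.0  # one dominant key; every other key has magnitude 1
c = [sum(sign * v[i] for i, sign in row.items()) for row in mu]
print(basic_sign_fractions(mu, c, 0), basic_sign_fractions(mu, c, 5))
```

The averages are simple empirical means over independent buckets, so in the oblivious setting they concentrate as in Lemma 3.6. The robust estimators replace them with more nuanced estimates.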

4 Robust Estimators

We provide two sign-alignment estimators for BCountSketch that are robust against adaptive adversaries. The first is a robust version of the threshold estimator of Section 3.1, described in Section 4.1 (correctness proof provided in Section 5). The second is a robust version of the stable estimator of Section 3.2, described in Appendix 8.

In the introduction we stated the robustness guarantees in terms of the size of the query sequence, that is, a sketch with parameters provides guarantees for all query sequences where . The number of inputs is a coarse parameter: it uses up the same "robustness budget" for every query, even when very few keys are actually reported or when the reported sets of consecutive inputs differ little or not at all. We introduce a refined property of a sequence, its -number (), which accounts for smaller output sizes with the threshold estimator and only for changes in the output with the stable estimators. We then establish robustness guarantees in terms of . Under some conditions, this allows much longer sequences to be handled robustly.

The -number we use to analyze our robust threshold estimator (Algorithm LABEL:algo:robust-count-sketch) accounts only for potential reporting of each key, namely, the number of times it occurs in . This saves “robustness budget” on inputs with a small number of heavy keys or with no heavy keys.

Definition 4.1.

(-number of an input sequence) For an input sequence and a key , define


to be the number of vectors for which is in . For an input sequence , define


The -number we use to analyze our stable robust estimator (Algorithm LABEL:algo:streaming-robust) accounts only for changes in the output between consecutive inputs. This is particularly beneficial for streaming applications, where updates to the input are incremental and hence consecutive outputs tend to be similar. For this purpose we redefine to bound the number of times that the key may enter or exit the reported set when a stable estimator is used (note that the redefined value is at most twice (11) but can be much smaller). We then redefine accordingly as in (12). Note that the redefined values always satisfy but it is possible to have , allowing for robustness on longer streams with the same budget. The approach of accounting for changes in the output in the context of robust streaming was first proposed in [10] and we extend their use of the term flip number.
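The difference between the two accounting schemes can be sketched in Python. The helpers below are hypothetical. They take a sequence of per-step key sets, standing in for the sets used in Definition 4.1, which are not reproduced here. The first helper counts how many steps each key appears in. The second counts how many times each key enters or exits between consecutive steps, starting from an empty set. Every exit is preceded by an entry, and every entry happens at a step where the key is present, so the second count is at most twice the first.

```python
from collections import Counter


def occurrence_counts(key_sets):
    """Per key: number of steps at which the key is in the set."""
    counts = Counter()
    for s in key_sets:
        counts.update(s)
    return counts


def change_counts(key_sets):
    """Per key: number of entries plus exits between consecutive steps."""
    counts = Counter()
    prev = frozenset()
    for s in key_sets:
        cur = frozenset(s)
        counts.update(cur ^ prev)  # symmetric difference = keys that entered or exited
        prev = cur
    return counts


# Streaming-like sequence: key 1 stays in for all four steps.
steps = [{1, 2}, {1, 2}, {1, 2}, {1, 3}]
occ = occurrence_counts(steps)  # key 1 -> 4, key 2 -> 3, key 3 -> 1
chg = change_counts(steps)      # key 1 -> 1, key 2 -> 2, key 3 -> 1
assert all(chg[i] <= 2 * occ[i] for i in chg)
```

Key 1 is present at every step. Under per-occurrence accounting it therefore uses four units, but under change-based accounting it uses only one.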

Definition 4.2.

(flip number of a key) Consider an input sequence . We say that a key is high at step if and is low at step if . The flip number of a key is defined as the number of transitions from low to high or vice versa (skipping steps where it is neither).
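A direct transcription of the flip-number definition can serve as a hypothetical Python helper. It takes a per-step scalar for one key, for example a normalized weight. That choice is an assumption, since the exact high/low conditions are not reproduced above. It also takes two thresholds, classifies each step as high, low, or neither, and counts low/high transitions while skipping the steps that are neither.

```python
def flip_number(values, low_thr, high_thr):
    """Count low<->high transitions of one key, ignoring steps that are neither."""
    if not low_thr < high_thr:
        raise ValueError("need low_thr < high_thr")
    flips, last = 0, None
    for x in values:
        if x >= high_thr:
            state = "high"
        elif x <= low_thr:
            state = "low"
        else:
            continue  # neither high nor low: skipped
        if last is not None and state != last:
            flips += 1
        last = state
    return flips


weights = [0.0, 0.3, 0.1, 0.05, 0.25, 0.15, 0.0]
print(flip_number(weights, low_thr=0.05, high_thr=0.2))  # 4
```

A key that wanders between the two thresholds without ever crossing to the other side accumulates no flips. For example, the sequence `[0.3, 0.1, 0.15, 0.12, 0.3]` yields 0.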

Remark 4.3.

Consider the stable estimator when for all . The number of times key enters or exits the reported set is at most .

Our robust estimators provide the following guarantees:

Theorem 4.1.

Our robust threshold (Algorithm LABEL:algo:robust-count-sketch) and stable (Algorithm LABEL:algo:streaming-robust) estimators, with an appropriate setting of the constants, provide the following guarantees (each for its respective definition of ): Let be appropriate constants. Let . Consider an execution of our robust estimator with adaptive inputs , access limit , and i.i.d. initialization of and


Then if and , with probability all outputs are correct.

Restated, we obtain that a sketch with parameters , with our robust estimators, provides robustness to adaptive inputs with .⁴

⁴ Importantly, to obtain this robustness guarantee we do not need to explicitly track for our input sequence in order to stop once a limit is reached. The design of our algorithm allows us to determine when the guarantees "fail": our algorithm associates an "access count" with each sketch bucket and inactivates buckets that reach an "access limit." The accuracy guarantee fails when too many buckets of the same key become inactive, which is something we can track.

4.1 The Robust Threshold Estimator

Our robust threshold estimator is provided as Algorithm LABEL:algo:robust-count-sketch (the constants are chosen sufficiently large). The algorithm initializes a ThresholdMonitor structure [31] (see Algorithm LABEL:algo:threshold) over the dataset of the measurement vectors (buckets). A ThresholdMonitor takes as input a predicate defined over and a threshold value, and tests whether a noisy count of the predicate over exceeds the threshold. Its key property is that the privacy budget is charged only on queries where the noisy count exceeds the threshold, and only the buckets on which the predicate evaluates to true are charged. The access limit specifies how many times a bucket can be charged before it is inactivated. The values constitute upper bounds on the number of times the buckets of key get charged during the execution. More details on and the correctness proof of our algorithm are provided in Section 5.

For each query vector, the estimator loops over all keys and tests whether the count of the predicates over active buckets, with noise added, exceeds a threshold. If so, the key is reported and the access count of each bucket that contributed to the count is incremented; otherwise, the key is not reported. In Appendix 7 we show how to make this more efficient by using a non-robust heavy-hitters sketch to skip testing keys that are highly unlikely to be reported.
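The per-query loop can be sketched in Python. This is a simplified stand-in, not the ThresholdMonitor of [31], and it performs no privacy accounting. It adds fresh Laplace noise to each count and compares the result with a fixed threshold. When the test passes, it charges only the active buckets that satisfy the predicate. A bucket is inactivated once its charge count reaches the access limit. The bucket distribution, the sign-alignment predicate s·μ[i]·⟨μ, v⟩ > 0, the noise scale, and the threshold are all illustrative assumptions.

```python
import random


def make_buckets(n, r, p, rng):
    """Hypothetical BCountSketch-like buckets: key i joins w.p. p with a random sign."""
    return [{i: rng.choice((-1, 1)) for i in range(n) if rng.random() < p}
            for _ in range(r)]


class ThresholdMonitorSketch:
    """Simplified noisy-threshold tester with per-bucket access counts (no privacy accounting)."""

    def __init__(self, buckets, access_limit, noise_scale, rng):
        self.buckets = buckets
        self.access_limit = access_limit
        self.noise_scale = noise_scale
        self.rng = rng
        self.charges = [0] * len(buckets)

    def _laplace(self):
        # The difference of two i.i.d. exponentials is Laplace(0, noise_scale).
        rate = 1.0 / self.noise_scale
        return self.rng.expovariate(rate) - self.rng.expovariate(rate)

    def test(self, predicate, threshold):
        # Count only active buckets (charge count below the access limit).
        hits = [j for j in range(len(self.buckets))
                if self.charges[j] < self.access_limit and predicate(j)]
        if len(hits) + self._laplace() >= threshold:
            for j in hits:  # charge only the buckets that contributed to the count
                self.charges[j] += 1
            return True
        return False


def robust_threshold_report(monitor, v, threshold):
    """Report key i if the test passes for at least one of its two sign queries."""
    dots = [sum(sign * v[k] for k, sign in mu.items()) for mu in monitor.buckets]
    reported = set()
    for i in range(len(v)):
        for s in (1, -1):
            def aligned(j, i=i, s=s):
                mu = monitor.buckets[j]
                return i in mu and s * mu[i] * dots[j] > 0
            if monitor.test(aligned, threshold):
                reported.add(i)
    return reported


rng = random.Random(0)
v = [5.0] + [1.0] * 48 + [0.0]  # one heavy key, 48 light keys, one zero-valued key
monitor = ThresholdMonitorSketch(make_buckets(len(v), 4000, 0.05, rng),
                                 access_limit=10, noise_scale=1.0, rng=rng)
K = robust_threshold_report(monitor, v, threshold=120)
print(0 in K, 49 in K)
```

The margins are wide. The heavy key aligns in nearly all of its roughly 200 buckets, while the zero-valued key aligns in well under half of them, so key 0 is reported and key 49 is not. Both sign queries are issued for every key, matching the two queries per key used in the analysis.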

The robust estimator as presented only reports a set of keys K. In Appendix 6 we describe how weight estimates can be reported as well for , by only doubling the "robustness budget" (number of accesses to buckets of ).



5 Proofs for the robust threshold estimator

In this section we provide a proof of correctness of our robust threshold estimator. We use and the constants as in Lemma 3.1. Let and . We establish the following.

Theorem 5.1.

Let be appropriate constants. Let . Consider an execution of Algorithm LABEL:algo:robust-count-sketch with adaptive inputs and i.i.d. initialization of . Set


Then if , with probability all outputs are correct.

5.1 Tools from Differential Privacy

We first introduce the necessary tools from differential privacy.

Theorem 5.2.

(Generalization property of DP [17, 6, 19]) Let be an -differentially private algorithm that operates on a database of size and outputs a predicate . Let be a distribution over , let be a database containing i.i.d. elements from , and let . Then for any we have that

The Fine-Grained Sparse Vector Technique

We will employ the (fine-grained) sparse vector technique of [31], described in Algorithm LABEL:algo:threshold (ThresholdMonitor). Algorithm ThresholdMonitor has the following utility and privacy guarantees:

Theorem 5.3 (Utility guarantee [31]).

Consider an execution of Algorithm ThresholdMonitor on a database and on a sequence of adaptively chosen queries. Let denote the database as it is before answering the -th query. With probability at least , it holds that for all

  • If the output is then .

  • If the output is then .

Theorem 5.4 (Privacy guarantee [31]).

Algorithm ThresholdMonitor is -differentially private.

5.2 Proof overview of Theorem 5.1

Algorithm LABEL:algo:robust-count-sketch issues to count queries for each input vector . These queries correspond to predicates of the form (sign , key and query ).

For each predicate , the respective count of the is computed over buckets that are active at the time of query:

The ThresholdMonitor adds noise to to obtain . The ThresholdMonitor outputs for the query if and only if . A key is reported if and only if the output is for at least one of its two queries. Equivalently, the inclusion of each key in the output set is determined as in (5) using the respective approximate values .

It follows from Lemma 3.4 that if all our noisy counts over buckets are within of their expectation over then the output is correct. We will show that this happens with probability .

We will bound the "error" of by separately bounding the contributions of different sources of error. We show that with probability the additive error is at most for all these estimates.

  • One source of error is due to not counting inactive buckets (those that reached the access limit by ). We introduce the notion of "useful" buckets (see Section 5.3), where usefulness is a deterministic property of the input sequence and has the property that all useful buckets remain active. We show that in expectation, for each key , a fraction of the buckets that a key participates in are useful. Hence in expectation the contribution to the error is bounded by .

  • Another source of error is due to the noise added by . We use the parameter settings and Theorem 5.3 to bound the maximum error over all queries by with probability .

  • We establish correctness under the following assumption (see Section 5.4). We treat the query vectors as fixed and assume the buckets satisfy the following: We formulate a set of predicates over that depend on the query vectors so that all have expectation and the expected value of all these predicates is approximated by the sample to within an additive error of . The set includes all predicates () and also includes the usefulness predicates over buckets each key participates in. With this assumption, the total error due to inactive buckets is bounded by (combining their expectation and the error). Using the assumption, the error due to estimation of is at most . Recall also that the error due to noise is . Combining, we get an error of when the assumption holds.

  • We remove the assumption by relating (see Section 5.5) the count of our predicates over buckets to its respective expectation over using the generalization property of DP (Theorem 5.2). This property establishes that, even though our query vectors, and hence our predicates, depend on the sampled buckets, they are generated in a way that preserves the privacy of the buckets, so their average over the sampled buckets still approximates their expectation well. This holds with probability for all predicates.

5.3 Useful buckets

Definition 5.1.

(useful buckets) We say that a bucket with measurement vector is useful with respect to and access limit if the total count, over vectors in , of keys that participate in the bucket is at most :

The predicate useful depends on the set and applies to all , whereas being active applies only to the buckets , which may become inactive over time. We can relate useful and active buckets as follows:
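The bookkeeping behind useful buckets can be sketched in Python. The helpers are hypothetical and make two simplifying assumptions. First, the per-step key sets, which we call S_t here, are given explicitly as Python sets. Second, every bucket in which a reported key participates is charged once per step, which can only overcount relative to charging just the buckets whose predicate holds. Each charge comes from a reported key of S_t that participates in the bucket, so a bucket's charge count never exceeds its participation total. For a useful bucket, that total is at most the access limit.

```python
def participation_totals(buckets, key_sets):
    """Per bucket: sum over steps t of |{i in S_t : key i participates in the bucket}|."""
    return [sum(sum(1 for i in S if i in mu) for S in key_sets) for mu in buckets]


def useful_buckets(buckets, key_sets, access_limit):
    """A bucket is useful if its participation total is at most the access limit."""
    return [tot <= access_limit for tot in participation_totals(buckets, key_sets)]


def worst_case_charges(buckets, key_sets, reported_sets):
    """Charge every bucket of every reported key once per step (upper bound on charging)."""
    charges = [0] * len(buckets)
    for S, R in zip(key_sets, reported_sets):
        if not R <= S:
            raise ValueError("only keys from S_t may be reported")
        for i in R:
            for j, mu in enumerate(buckets):
                if i in mu:
                    charges[j] += 1
    return charges


buckets = [{0: 1, 1: -1}, {1: 1}, {0: -1, 2: 1}]
key_sets = [{0, 1}, {0, 1}, {2}]
reported = [{0}, {0, 1}, {2}]
print(useful_buckets(buckets, key_sets, access_limit=3))  # [False, True, True]
print(worst_case_charges(buckets, key_sets, reported))    # [3, 1, 3]
```

The first bucket is not useful, because its participation total is 4. The other two have totals 2 and 3 and receive at most that many charges, so they stay within the access limit of 3.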

Remark 5.2.

Consider an execution of Algorithm LABEL:algo:robust-count-sketch with input sequence where only reports keys from for each query . Then all useful buckets