Tricking the Hashing Trick: A Tight Lower Bound on the Robustness of CountSketch to Adaptive Inputs

07/03/2022
by Edith Cohen, et al.

CountSketch and Feature Hashing (the "hashing trick") are popular randomized dimensionality reduction methods that support recovery of ℓ_2-heavy hitters (keys i where v_i^2 > ϵ‖v‖_2^2) and approximate inner products. When the inputs are not adaptive (that is, they do not depend on prior outputs), classic estimators applied to a sketch of size O(ℓ/ϵ) are accurate for a number of queries that is exponential in ℓ. When inputs are adaptive, however, an adversarial input can be constructed after O(ℓ) queries against the classic estimator, and the best known robust estimator supports only Õ(ℓ^2) queries. In this work we show that this quadratic dependence is in a sense inherent: we design an attack that, after O(ℓ^2) queries, produces an adversarial input vector whose sketch is highly biased. Our attack uses "natural" non-adaptive inputs (only the final adversarial input is chosen adaptively) and applies universally to any correct estimator, including one that is unknown to the attacker. In doing so, we expose an inherent vulnerability of this fundamental method.
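For readers unfamiliar with the data structure, the following is a minimal illustrative sketch of CountSketch with the median-of-rows estimator commonly used with it. It is not the paper's code and does not implement the attack. The class name `CountSketch`, its parameters (`n`, `rows`, `buckets`, `seed`), and the use of explicit random tables in place of hash functions are assumptions made for illustration. Here `rows` plays the role of ℓ, and `buckets` would be chosen proportional to 1/ϵ.

```python
import random
from statistics import median


class CountSketch:
    """Illustrative CountSketch over vectors of a fixed dimension n."""

    def __init__(self, n, rows, buckets, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.buckets = buckets
        # For each row, every key gets a random bucket and a random sign.
        # (Explicit tables stand in for hash functions in this sketch.)
        self.bucket = [[rng.randrange(buckets) for _ in range(n)]
                       for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)]
                     for _ in range(rows)]

    def sketch(self, v):
        """Map a length-n vector to a rows x buckets table (a linear map)."""
        table = [[0.0] * self.buckets for _ in range(self.rows)]
        for i, x in enumerate(v):
            if x:
                for r in range(self.rows):
                    table[r][self.bucket[r][i]] += self.sign[r][i] * x
        return table

    def estimate(self, table, i):
        """Median over rows of the signed bucket value that key i maps to."""
        return median(self.sign[r][i] * table[r][self.bucket[r][i]]
                      for r in range(self.rows))


# Usage: a vector with a single nonzero entry is recovered exactly,
# because no other key contributes to the buckets of key 3.
cs = CountSketch(n=50, rows=5, buckets=8, seed=1)
v = [0.0] * 50
v[3] = 5.0
print(cs.estimate(cs.sketch(v), 3))
```

Because the sketch is a fixed linear map determined by the random tables, queries that depend on earlier outputs can gradually reveal information about those tables. That is the adaptive setting the abstract refers to.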

