Frequency Estimation with One-Sided Error

11/06/2021
by Piotr Indyk, et al.

Frequency estimation is one of the most fundamental problems in streaming algorithms. Given a stream S of elements from some universe U = {1, …, n}, the goal is to compute, in a single pass, a short sketch of S so that for any element i ∈ U, one can estimate the number x_i of times i occurs in S based on the sketch alone. Two state-of-the-art solutions to this problem are the Count-Min and Count-Sketch algorithms. The frequency estimator x̃ produced by Count-Min, using O(1/ε · log n) dimensions, guarantees that ‖x̃ − x‖_∞ ≤ ε‖x‖_1 with high probability, and x̃ ≥ x holds deterministically. Count-Min also works under the assumption that x ≥ 0. On the other hand, Count-Sketch, using O(1/ε² · log n) dimensions, guarantees that ‖x̃ − x‖_∞ ≤ ε‖x‖_2 with high probability. A natural question is whether it is possible to design a best-of-both-worlds sketching method, with error guarantees depending on the ℓ_2 norm and space comparable to Count-Sketch, that also (like Count-Min) has the no-underestimation property. Our main set of results shows that the answer to this question is negative. We show this in two incomparable computational models: linear sketching and streaming algorithms. We also study the complementary problem, where the sketch is required not to overestimate, i.e., x̃ ≤ x should always hold.
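
To make the contrast between the two guarantees concrete, here is a minimal Python sketch of both estimators. It is an illustration under stated assumptions, not the construction analysed in the paper: the class names, the `width` and `depth` parameters, the modular hash family, and the synthetic stream are all choices made for this example only.

```python
import random
from collections import Counter
from statistics import median

# Assumption: a prime larger than the universe size, used for the hash family.
P = 2_147_483_647  # 2^31 - 1


class CountMin:
    """Illustrative Count-Min: `depth` rows of `width` counters each."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One hash h(i) = ((a*i + b) mod P) mod width per row.
        self.hashes = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        a, b = self.hashes[row]
        return ((a * item + b) % P) % self.width

    def update(self, item, count=1):
        for r in range(len(self.table)):
            self.table[r][self._bucket(r, item)] += count

    def estimate(self, item):
        # Each counter holds x_i plus colliding mass, so the minimum is >= x_i
        # whenever all updates are non-negative.
        return min(self.table[r][self._bucket(r, item)] for r in range(len(self.table)))


class CountSketch(CountMin):
    """Illustrative Count-Sketch: same layout, plus a random sign per row."""

    def __init__(self, width, depth, seed=0):
        super().__init__(width, depth, seed)
        rng = random.Random(seed + 1)
        self.sign_hashes = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(depth)]

    def _sign(self, row, item):
        a, b = self.sign_hashes[row]
        return 1 if ((a * item + b) % P) % 2 == 0 else -1

    def update(self, item, count=1):
        for r in range(len(self.table)):
            self.table[r][self._bucket(r, item)] += self._sign(r, item) * count

    def estimate(self, item):
        # The median of signed counters can land above or below x_i.
        return median(
            self._sign(r, item) * self.table[r][self._bucket(r, item)]
            for r in range(len(self.table))
        )


# Synthetic skewed stream over the universe {1, ..., 1000} (an assumption).
rng = random.Random(1)
stream = [min(1000, int(rng.paretovariate(1.2))) for _ in range(5000)]
truth = Counter(stream)

cm = CountMin(width=64, depth=5)
cs = CountSketch(width=64, depth=5)
for item in stream:
    cm.update(item)
    cs.update(item)

# Largest signed error of each estimator over the whole universe.
cm_errors = [cm.estimate(i) - truth[i] for i in range(1, 1001)]
cs_errors = [cs.estimate(i) - truth[i] for i in range(1, 1001)]
print("Count-Min    min/max error:", min(cm_errors), max(cm_errors))
print("Count-Sketch min/max error:", min(cs_errors), max(cs_errors))
```

With non-negative updates, every Count-Min counter that an element maps to holds its true count plus the counts of colliding elements, so the minimum can never fall below x_i. Count-Sketch cancels collisions with random signs, which is what yields the ℓ_2 bound but also allows estimates on either side of x_i.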


research
02/10/2023

Count-min sketch with variable number of hash functions: an experimental study

Conservative Count-Min, an improved version of Count-Min sketch [Cormode...
research
07/17/2018

Tracking the ℓ_2 Norm with Constant Update Time

The ℓ_2 tracking problem is the task of obtaining a streaming algorithm ...
research
11/09/2018

Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions

The Count-Min sketch is an important and well-studied data summarization...
research
03/02/2019

One Table to Count Them All: Parallel Frequency Estimation on Single-Board Computers

Sketches are probabilistic data structures that can provide approximate ...
research
02/24/2021

SALSA: Self-Adjusting Lean Streaming Analytics

Counters are the fundamental building block of many data sketching schem...
research
10/28/2019

Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products

In the last decade, it has been shown that many hard AI tasks, especiall...
research
10/23/2017

HyperMinHash: MinHash in LogLog space

In this extended abstract, we describe and analyse a streaming probabili...
