SpaceSaving^±: An Optimal Algorithm for Frequency Estimation and Frequent items in the Bounded Deletion Model

12/07/2021
by Fuheng Zhao, et al.

In this paper, we propose the first deterministic algorithms to solve the frequency estimation and frequent items problems in the bounded deletion model. We establish the space lower bound for solving the deterministic frequent items problem in the bounded deletion model, and propose the Lazy SpaceSaving^± and SpaceSaving^± algorithms, whose space usage matches this bound. We then develop an efficient implementation of the SpaceSaving^± algorithm that minimizes the latency of update operations using novel data structures. Our experimental evaluation shows that SpaceSaving^± produces accurate frequency estimates and achieves very high recall and precision across different data distributions while using minimal space. Our analysis and experiments demonstrate that SpaceSaving^± provides more accurate estimates than state-of-the-art protocols using the same space, for applications with up to 93% of items deleted. Moreover, motivated by prior work, we propose Dyadic SpaceSaving^±, the first deterministic quantile approximation sketch in the bounded deletion model.
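To make the idea of a counter-based summary that tolerates deletions more concrete, here is a minimal Python sketch. It is not taken from the paper: it assumes that insertions follow the classic SpaceSaving rule (evict the minimum counter and inherit its count) and that a "lazy" deletion decrements the counter of a monitored item and ignores unmonitored ones. The class name `LazySpaceSavingSketch`, the `capacity` parameter, and the method names are hypothetical, and the paper's efficient data structures are not reproduced here.

```python
from typing import Dict, Hashable


class LazySpaceSavingSketch:
    """Illustrative counter summary with insertions and lazy deletions.

    Assumption: this mirrors the general SpaceSaving idea only; it is not
    the authors' implementation and uses a plain dict with a linear scan.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity              # hypothetical parameter: number of counters
        self.counts: Dict[Hashable, int] = {}

    def insert(self, item: Hashable) -> None:
        if item in self.counts:
            # Monitored item: increment its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free counter available: start monitoring the item.
            self.counts[item] = 1
        else:
            # Classic SpaceSaving step: replace the minimum-count item
            # and let the newcomer inherit that count plus one.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def delete(self, item: Hashable) -> None:
        # Assumed lazy rule: only monitored items are decremented;
        # deletions of unmonitored items are ignored.
        if item in self.counts:
            self.counts[item] -= 1

    def estimate(self, item: Hashable) -> int:
        # Unmonitored items are reported with an estimate of zero.
        return self.counts.get(item, 0)
```

With `capacity=2`, inserting `a, a, b, c` leaves `a` and `c` monitored, because `c` evicts `b` and inherits its count. A later deletion of `a` lowers its estimate, while deleting an unmonitored item changes nothing.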

research
05/09/2019

Tight Lower Bound for Comparison-Based Quantile Summaries

Quantiles, such as the median or percentiles, provide concise and useful...
research
10/31/2022

Local Differentially Private Frequency Estimation based on Learned Sketches

Sketches are widely used for frequency estimation of data with a large d...
research
06/18/2018

Mining frequent items in unstructured P2P networks

Large scale decentralized systems, such as P2P, sensor or IoT device net...
research
11/20/2019

Streaming Frequent Items with Timestamps and Detecting Large Neighborhoods in Graph Streams

Detecting frequent items is a fundamental problem in data streaming rese...
research
09/12/2017

Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation

We introduce and study a new data sketch for processing massive datasets...
research
01/06/2022

SQUAD: Combining Sketching and Sampling Is Better than Either for Per-item Quantile Estimation

Stream monitoring is fundamental in many data stream applications, such ...
research
11/01/2017

Fast Dynamic Arrays

We present a highly optimized implementation of tiered vectors, a data s...
