SALSA: Self-Adjusting Lean Streaming Analytics

02/24/2021
by   Ran Ben Basat, et al.
0

Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures. This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open-source [1].

READ FULL TEXT

page 4

page 7

research
08/14/2019

(Learned) Frequency Estimation Algorithms under Zipfian Distribution

The frequencies of the elements in a data stream are an important statis...
research
11/06/2021

Frequency Estimation with One-Sided Error

Frequency estimation is one of the most fundamental problems in streamin...
research
04/27/2018

Buffered Count-Min Sketch on SSD: Theory and Experiments

Frequency estimation data structures such as the count-min sketch (CMS) ...
research
11/09/2018

Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions

The Count-Min sketch is an important and well-studied data summarization...
research
03/29/2022

Analysis of Count-Min sketch under conservative update

Count-Min sketch is a hash-based data structure to represent a dynamical...
research
03/28/2022

A Formal Analysis of the Count-Min Sketch with Conservative Updates

Count-Min Sketch with Conservative Updates (CMS-CU) is a popular algorit...
research
07/04/2022

Learning state machines via efficient hashing of future traces

State machines are popular models to model and visualize discrete system...

Please sign up or login with your details

Forgot password? Click here to reset