Composable Sketches for Functions of Frequencies: Beyond the Worst Case

04/09/2020
by Edith Cohen, et al.

Recently there has been increased interest in using machine learning techniques to improve classical algorithms. In this paper we study when it is possible to construct compact, composable sketches for weighted sampling and statistics estimation according to functions of data frequencies. Such structures are now central components of large-scale data analytics and machine learning pipelines. However, many common functions, such as thresholds and p-th frequency moments with p > 2, are known to require polynomial-size sketches in the worst case. We explore performance beyond the worst case under two different types of assumptions. The first is having access to noisy advice on item frequencies. This continues the line of work of Hsu et al. (ICLR 2019), who assume predictions are provided by a machine learning model. The second is providing guaranteed performance on a restricted class of input frequency distributions that are better aligned with what is observed in practice. This extends the work on heavy hitters under Zipfian distributions in a seminal paper of Charikar et al. (ESA 2002). Surprisingly, we show analytically and empirically that "in practice" small polylogarithmic-size sketches provide accurate estimates for "hard" functions.
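
To make "composable sketch" and "statistics according to functions of frequencies" concrete, here is a minimal Python sketch. It assumes the classic CountSketch (a linear sketch in the style of the Charikar et al. heavy-hitters work cited above), not the paper's own construction. Two shard sketches are merged by adding their counters, and the merged sketch estimates a statistic of the form sum over keys x of f(w_x), here with f(w) = w^3. The class name, the width/depth/seed parameters, the blake2b-based hashing, and the Zipf-like test data are illustrative assumptions. For brevity, the identities of the ten heaviest keys are also assumed known.

```python
import hashlib
import random
from statistics import median


class CountSketch:
    """Minimal CountSketch: `depth` rows of `width` signed counters.

    Sketches built with the same (width, depth, seed) share hash functions,
    so their tables can simply be added: the structure is linear, which is
    what makes it composable across data shards.
    """

    def __init__(self, width, depth, seed=0):
        self.width, self.depth, self.seed = width, depth, seed
        rng = random.Random(seed)
        # One random salt per row; a row hashes (salt, key) to a bucket and a +/-1 sign.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, key):
        data = f"{self.salts[row]}:{key}".encode()
        h = int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
        return h % self.width, (1 if (h >> 63) & 1 else -1)

    def update(self, key, weight=1):
        # Add the signed weight to one counter in every row.
        for r in range(self.depth):
            b, s = self._bucket_and_sign(r, key)
            self.table[r][b] += s * weight

    def estimate(self, key):
        # Sign-corrected counter from each row; report the median of these votes.
        votes = []
        for r in range(self.depth):
            b, s = self._bucket_and_sign(r, key)
            votes.append(s * self.table[r][b])
        return median(votes)  # odd depth: the median is one of the row votes

    def merge(self, other):
        # Composability: the sketch of a union of shards is the counter-wise sum.
        if (self.width, self.depth, self.seed) != (other.width, other.depth, other.seed):
            raise ValueError("sketches must share width, depth and seed")
        merged = CountSketch(self.width, self.depth, self.seed)
        merged.table = [[a + b for a, b in zip(ra, rb)]
                        for ra, rb in zip(self.table, other.table)]
        return merged


# Hypothetical Zipf-like data: key i occurs 1000 // i times, split into two shards.
stream = [i for i in range(1, 201) for _ in range(1000 // i)]
shard_a, shard_b = stream[::2], stream[1::2]

sk_a, sk_b = CountSketch(256, 5), CountSketch(256, 5)
for key in shard_a:
    sk_a.update(key)
for key in shard_b:
    sk_b.update(key)
merged = sk_a.merge(sk_b)

# Statistic sum_x f(w_x) with f(w) = w**3 (a p = 3 moment), restricted to the
# ten heaviest keys: sketch-based estimate vs. exact value.
top = range(1, 11)
approx_f3 = sum(merged.estimate(k) ** 3 for k in top)
exact_f3 = sum((1000 // k) ** 3 for k in top)
print(merged.estimate(1), approx_f3, exact_f3)
```

Every update is a signed addition, so sketching the shards separately and then merging gives exactly the table obtained from the whole stream. Estimation error comes only from hash collisions. On a skewed input like this one, the heavy keys stand well above that collision noise.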

research
06/16/2020

Algorithms with Predictions

We introduce algorithms that use predictions from machine learning appli...
research
12/02/2021

Worst-case Optimal Binary Join Algorithms under General ℓ_p Constraints

Worst-case optimal join algorithms have so far been studied in two broad...
research
06/15/2021

Locally Differentially Private Frequency Estimation

We present two new local differentially private algorithms for frequency...
research
07/04/2019

Sampling Sketches for Concave Sublinear Functions of Frequencies

We consider massive distributed datasets that consist of elements modele...
research
01/28/2023

Do Orcas Have Semantic Language? Machine Learning to Predict Orca Behaviors Using Partially Labeled Vocalization Data

Orcinus orca (killer whales) exhibit complex calls. They last about a se...
research
12/05/2018

Calibrate: Frequency Estimation and Heavy Hitter Identification with Local Differential Privacy via Incorporating Prior Knowledge

Estimating frequencies of certain items among a population is a basic st...
research
03/21/2023

Non-Asymptotic Pointwise and Worst-Case Bounds for Classical Spectrum Estimators

Spectrum estimation is a fundamental methodology in the analysis of time...