Random measure priors in Bayesian frequency recovery from sketches

03/27/2023
by Mario Beraha, et al.

Given a lossy-compressed representation, or sketch, of data with values in a set of symbols, the frequency recovery problem concerns estimating the empirical frequency of a new data point. Recent studies have applied Bayesian nonparametrics (BNP) to develop learning-augmented versions of the popular count-min sketch (CMS) recovery algorithm. In this paper, we present a novel BNP approach to frequency recovery, which is not built from the CMS but still relies on a sketch obtained by random hashing. Assuming the data to be modeled as random samples from an unknown discrete distribution endowed with a Poisson-Kingman (PK) prior, we provide the posterior distribution of the empirical frequency of a symbol, given the sketch. Estimates are then obtained as mean functionals. We apply our result to the Dirichlet process (DP) and Pitman-Yor process (PYP) priors, and in particular: i) we characterize the DP prior as the sole PK prior featuring a property of sufficiency with respect to the sketch, leading to a simple posterior distribution; ii) we identify a large sample regime under which the PYP prior leads to a simple approximation of the posterior distribution. Then, we extend our BNP approach to a "traits" formulation of the frequency recovery problem, not yet studied in the CMS literature, in which each data point may belong to more than one symbol (trait) and exhibits a nonnegative integer level of association with each trait. In particular, by modeling data as random samples from a generalized Indian buffet process, we provide the posterior distribution of the empirical frequency level of a trait, given the sketch. This result is then applied under the assumption of a Poisson and a Bernoulli distribution for the levels of association, leading to a simple posterior distribution and a simple approximation of the posterior distribution, respectively.
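To make the setting concrete, here is a minimal sketch of what "a sketch obtained by random hashing" can look like: symbols are hashed into a fixed number of buckets and only the bucket counts are kept. The function names, the SHA-256-based hash, and the toy stream are assumptions for illustration only; this is not the paper's estimator, and the BNP posterior described above is not reproduced here. The code only shows the information a recovery method is given, namely the count of the bucket that a queried symbol falls into, which can never be smaller than that symbol's true empirical frequency.

```python
import hashlib
from collections import Counter


def bucket(symbol: str, num_buckets: int, seed: int = 0) -> int:
    """Map a symbol to a bucket index with a seeded, deterministic hash.

    SHA-256 is an illustrative stand-in for a random hash function.
    """
    digest = hashlib.sha256(f"{seed}:{symbol}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def build_sketch(stream, num_buckets: int, seed: int = 0):
    """Compress a stream of symbols into a vector of bucket counts."""
    counts = [0] * num_buckets
    for symbol in stream:
        counts[bucket(symbol, num_buckets, seed)] += 1
    return counts


def bucket_count(sketch, symbol: str, seed: int = 0) -> int:
    """Return the count of the bucket the queried symbol hashes into.

    Collisions can only add to a bucket, so this value is an upper bound
    on the symbol's empirical frequency in the stream.
    """
    return sketch[bucket(symbol, len(sketch), seed)]


# Toy data (hypothetical): four symbols with different frequencies.
stream = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
sketch = build_sketch(stream, num_buckets=4)
true_freq = Counter(stream)

for symbol in sorted(true_freq):
    print(symbol, true_freq[symbol], bucket_count(sketch, symbol))
```

A Bayesian recovery method of the kind the abstract describes would take the bucket count returned by `bucket_count` as its observation and return a posterior distribution over the true frequency, rather than the raw upper bound.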


