Compressed Counting

02/17/2008
by Ping Li, et al.

Counting is among the most fundamental operations in computing. For example, counting the pth frequency moment has been a very active area of research in theoretical computer science, databases, and data mining. When p = 1, the task (i.e., counting the sum) can be accomplished using a simple counter. Compressed Counting (CC) is proposed for efficiently computing the pth frequency moment of a data stream signal A_t, where 0 < p <= 2. CC is applicable if the streaming data follow the Turnstile model, with the restriction that at the time t for the evaluation, A_t[i] >= 0, which includes the strict Turnstile model as a special case. For natural data streams encountered in practice, this restriction is minor. The underlying technique for CC is what we call skewed stable random projections, which captures the intuition that, when p = 1, a simple counter suffices, and when p = 1 ± Δ with small Δ, the sample complexity of a counter system should be low (continuously as a function of Δ). We show that at small Δ the sample complexity (number of projections) is k = O(1/ϵ) instead of O(1/ϵ^2). Compressed Counting can serve as a basic building block for other tasks in statistics and computing, for example, estimating entropies of data streams and parameter estimation using the method of moments and maximum likelihood. Finally, another contribution is an algorithm for approximating the logarithmic norm, ∑_{i=1}^D log A_t[i], and the logarithmic distance. The logarithmic distance is useful in machine learning practice with heavy-tailed data.
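To make the quantities in the abstract concrete, the following minimal sketch computes the pth frequency moment exactly (without any compression) for a small Turnstile stream and checks that, for p = 1, a single running counter gives the same answer. It is not the CC algorithm and does not use skewed stable random projections; the function names and the toy update list are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def apply_updates(updates):
    """Apply Turnstile updates (i, delta) and return the final signal A_t."""
    signal = defaultdict(float)
    for index, delta in updates:
        signal[index] += delta  # A_t[i] = A_{t-1}[i] + delta; delta may be negative
    return signal


def frequency_moment(signal, p):
    """Exact pth frequency moment sum_i A_t[i]^p, assuming A_t[i] >= 0."""
    if any(value < 0 for value in signal.values()):
        raise ValueError("restriction violated: A_t[i] must be >= 0 at evaluation")
    return sum(value ** p for value in signal.values() if value > 0)


def first_moment_counter(updates):
    """For p = 1 the moment is the sum of all increments: one counter suffices."""
    total = 0.0
    for _, delta in updates:
        total += delta
    return total


# Hypothetical toy stream: (coordinate i, increment delta).
updates = [(0, 3), (1, 5), (0, -1), (2, 4), (1, -2)]
signal = apply_updates(updates)  # final signal: A_t[0]=2, A_t[1]=3, A_t[2]=4

print(frequency_moment(signal, 1), first_moment_counter(updates))
print(frequency_moment(signal, 2))
```

The exact computation stores every coordinate of A_t, which is what a projection-based method such as CC is meant to avoid; the counter for p = 1 needs only constant space, which is the intuition the abstract appeals to for p close to 1.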
