Improving Compressed Counting

05/09/2012
by Ping Li, et al.

Compressed Counting (CC) [22] was recently proposed for estimating the αth frequency moments of data streams, where 0 < α ≤ 2. CC can be used for estimating Shannon entropy, which can be approximated by certain functions of the αth frequency moments as α → 1. Monitoring Shannon entropy for anomaly detection (e.g., DDoS attacks) in large networks is an important task. This paper presents a new algorithm for improving CC. The improvement is most substantial as α approaches 1 from below (α → 1−). For example, when α = 0.99, the new algorithm reduces the estimation variance roughly 100-fold. This new algorithm would make CC considerably more practical for estimating Shannon entropy. Furthermore, the new algorithm is statistically optimal when α = 0.5.
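To illustrate how Shannon entropy can be recovered from frequency moments as α → 1, here is a minimal sketch. The abstract does not name the "certain functions" it refers to; the Rényi and Tsallis entropies used below are standard choices and are an assumption here. The sketch computes the moments exactly from a small, made-up count vector, so it shows only the approximation step and not CC or the improved estimator. All function names are hypothetical.

```python
import math

def frequency_moment(counts, alpha):
    """Exact alpha-th frequency moment: sum of counts[i] ** alpha.
    (CC would estimate this from a stream; here it is computed directly.)"""
    return sum(c ** alpha for c in counts if c > 0)

def shannon_entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def tsallis_entropy(counts, alpha):
    """Tsallis entropy written in terms of frequency moments (assumed choice)."""
    f1 = frequency_moment(counts, 1.0)
    fa = frequency_moment(counts, alpha)
    return (1.0 - fa / f1 ** alpha) / (alpha - 1.0)

def renyi_entropy(counts, alpha):
    """Renyi entropy written in terms of frequency moments (assumed choice)."""
    f1 = frequency_moment(counts, 1.0)
    fa = frequency_moment(counts, alpha)
    return math.log(fa / f1 ** alpha) / (1.0 - alpha)

if __name__ == "__main__":
    counts = [5, 3, 1, 1]  # illustrative item counts, not data from the paper
    print("Shannon:", shannon_entropy(counts))
    for alpha in (0.9, 0.99, 0.999):
        print(alpha, tsallis_entropy(counts, alpha), renyi_entropy(counts, alpha))
```

Both quantities divide by a factor of order |1 − α|, so any error in the estimated moment is amplified as α approaches 1. That is one way to see why a lower-variance estimator near α = 1 matters for entropy estimation.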


