DeepAI

# Information Theoretic Limits of Cardinality Estimation: Fisher Meets Shannon

In this paper we study the intrinsic tradeoff between the space complexity of the sketch and its estimation error in the Random Oracle model. We define a new measure of efficiency for cardinality estimators called the Fisher-Shannon (𝖥𝗂𝗌𝗁) number ℋ/ℐ. It captures the tension between the limiting Shannon entropy (ℋ) of the sketch and its normalized Fisher information (ℐ), which characterizes (asymptotically) the variance of a statistically efficient estimator. We prove that many variants of the 𝖯𝖢𝖲𝖠 sketch of Flajolet and Martin have 𝖥𝗂𝗌𝗁 number H_0/I_0, where H_0,I_0 are two precisely-defined constants, and that all base-q generalizations of (Hyper)LogLog are strictly worse than H_0/I_0, but tend to H_0/I_0 in the limit as q→∞. All other known sketches have even worse 𝖥𝗂𝗌𝗁-numbers. We introduce a new sketch called 𝖥𝗂𝗌𝗁𝗆𝗈𝗇𝗀𝖾𝗋 that is based on a smoothed, compressed version of 𝖯𝖢𝖲𝖠 with a different estimation function. 𝖥𝗂𝗌𝗁𝗆𝗈𝗇𝗀𝖾𝗋 has 𝖥𝗂𝗌𝗁 number H_0/I_0 ≈ 1.98. It stores O(log^2log U) + (H_0/I_0)b ≈ 1.98b bits, and estimates cardinalities of multisets of [U] with a standard error of (1+o(1))/√(b). 𝖥𝗂𝗌𝗁𝗆𝗈𝗇𝗀𝖾𝗋's space-error tradeoff improves on state-of-the-art sketches like HyperLogLog, or even compressed representations of it. 𝖥𝗂𝗌𝗁𝗆𝗈𝗇𝗀𝖾𝗋 can be used in a distributed environment (where substreams are sketched separately and composed later). We conjecture that the 𝖥𝗂𝗌𝗁-number H_0/I_0 is a universal lower bound for any such composable sketch.

• 17 publications
• 3 publications
08/20/2020

### Simple and Efficient Cardinality Estimation in Data Streams

We study sketching schemes for the cardinality estimation problem in dat...
05/23/2022

### HyperLogLogLog: Cardinality Estimation With One Log More

We present HyperLogLogLog, a practical compression of the HyperLogLog sk...
08/22/2022

### Simpler and Better Cardinality Estimators for HyperLogLog and PCSA

Cardinality Estimation (aka Distinct Elements) is a classic problem in s...
07/10/2018

### Understanding VAEs in Fisher-Shannon Plane

In information theory, Fisher information and Shannon information (entro...
01/31/2019

### The Relation Between Bayesian Fisher Information and Shannon Information for Detecting a Change in a Parameter

We derive a connection between performance of estimators the performance...
10/23/2017

### HyperMinHash: Jaccard index sketching in LogLog space

In this extended abstract, we describe and analyse a streaming probabili...
05/09/2012

### Improving Compressed Counting

Compressed Counting (CC) [22] was recently proposed for estimating the a...