UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting

08/31/2023
by Otmar Ertl, et al.

Since its invention, HyperLogLog has become the standard algorithm for approximate distinct counting. Due to its space efficiency and suitability for distributed systems, it is widely used and also implemented in numerous databases. This work presents UltraLogLog, which shares the same practical properties as HyperLogLog. It is commutative, idempotent, mergeable, and has a fast guaranteed constant-time insert operation. At the same time, it requires 28% less space to encode the same amount of distinct count information, which can be extracted using the maximum likelihood method. Alternatively, a simpler and faster estimator is proposed, which still achieves a space reduction of 24%, but at an estimation speed comparable to that of HyperLogLog. In a non-distributed setting where martingale estimation can be used, UltraLogLog is able to reduce space by 17%. Moreover, its smaller entropy and its 8-bit registers lead to better compaction when using standard compression algorithms. All this is verified by experimental results that are in perfect agreement with the theoretical analysis, which also outlines potential for even more space-efficient data structures. A production-ready Java implementation of UltraLogLog has been released as part of the open-source Hash4j library.
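The properties listed above (commutative, idempotent, mergeable, constant-time insert) can be illustrated with a minimal HyperLogLog-style counter. The Java sketch below is a hypothetical illustration, not UltraLogLog and not the Hash4j API: the class name `SimpleHll`, the SplitMix64 finalizer used as a stand-in hash, and the precision range `p` in [7, 16] are all assumptions. UltraLogLog's own register encoding and estimators are described in the full paper. Merging by register-wise maximum is what makes insertion order irrelevant, makes re-inserting an element a no-op, and lets partial sketches built on different machines be combined.

```java
import java.util.Arrays;

// Hypothetical HyperLogLog-style sketch (assumption: NOT UltraLogLog, NOT Hash4j).
// Shows constant-time insert, commutativity, idempotency and mergeability.
final class SimpleHll {
  private final int p;             // number of index bits; m = 2^p registers
  private final byte[] registers;  // one byte per register, each holding a max rank

  SimpleHll(int p) {
    if (p < 7 || p > 16) throw new IllegalArgumentException("p must be in [7, 16]");
    this.p = p;
    this.registers = new byte[1 << p];
  }

  // SplitMix64 finalizer used as a stand-in 64-bit hash (assumption).
  static long hash(long x) {
    long z = x + 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  // Constant-time insert: one hash, one register read, at most one write.
  SimpleHll add(long element) {
    long h = hash(element);
    int index = (int) (h >>> (64 - p));  // top p bits select the register
    long rest = h << p;                  // remaining 64 - p hash bits
    int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
    if (rank > registers[index]) registers[index] = (byte) rank;
    return this;
  }

  // Merge is a register-wise max: commutative, associative and idempotent.
  SimpleHll merge(SimpleHll other) {
    if (other.p != p) throw new IllegalArgumentException("precision mismatch");
    for (int i = 0; i < registers.length; i++) {
      registers[i] = (byte) Math.max(registers[i], other.registers[i]);
    }
    return this;
  }

  // Classic HyperLogLog raw estimate with linear counting for small cardinalities.
  double estimate() {
    int m = registers.length;
    double sum = 0;
    int zeros = 0;
    for (byte r : registers) {
      sum += Math.scalb(1.0, -r);  // adds 2^(-r)
      if (r == 0) zeros++;
    }
    double alpha = 0.7213 / (1 + 1.079 / m);  // bias constant, valid for m >= 128
    double raw = alpha * m * (double) m / sum;
    if (raw <= 2.5 * m && zeros > 0) return m * Math.log((double) m / zeros);
    return raw;
  }

  // Equal register arrays mean the sketches are indistinguishable.
  boolean sameState(SimpleHll other) {
    return p == other.p && Arrays.equals(registers, other.registers);
  }
}
```

For example, building one sketch from elements 0 to 599 and another from 400 to 999, then calling `merge`, yields exactly the same registers as a single sketch fed 0 to 999 in any order, with or without duplicates. This register-wise maximum is the property that makes such sketches suitable for distributed systems.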
