Conservation of the t-digest Scale Invariant

03/24/2019
by Ted Dunning, et al.

A t-digest is a compact data structure that allows estimates of quantiles with increased accuracy near q = 0 or q = 1. This is done by clustering samples from ℝ subject to the constraint that the number of points associated with any particular centroid is limited so that the so-called k-size of the centroid is always < 1. The k-size is defined using a scale function that maps quantile q to index k. Since the centroids are real numbers, they can be ordered, and thus the quantile range of a centroid can be mapped into an interval in k whose size is the k-size of that centroid. The accuracy of quantile estimates made using a t-digest depends on this constraint remaining invariant even as new data is added or t-digests are merged. This paper provides proofs of this invariance for four practically important scale functions.
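
To make the k-size constraint concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not name its four scale functions, so the sketch assumes one form commonly cited in the t-digest literature, k(q) = (δ / 2π) · arcsin(2q − 1), where δ is a compression parameter. The function names `k1`, `k_sizes`, and `satisfies_invariant` are hypothetical, and the list of centroid weights is assumed to be given in sorted centroid order.

```python
import math


def k1(q, delta):
    """Assumed scale function: maps quantile q in [0, 1] to index k."""
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def k_sizes(weights, delta):
    """Return the k-size of each centroid.

    weights[i] is the number of points in the i-th centroid, with
    centroids listed in increasing order of their means.
    """
    total = sum(weights)
    sizes = []
    cumulative = 0
    for w in weights:
        q_left = cumulative / total    # quantile where this centroid starts
        cumulative += w
        q_right = cumulative / total   # quantile where this centroid ends
        sizes.append(k1(q_right, delta) - k1(q_left, delta))
    return sizes


def satisfies_invariant(weights, delta):
    """Check the constraint stated in the abstract: every k-size is < 1."""
    return all(size < 1 for size in k_sizes(weights, delta))


# Small centroids at the tails, larger ones in the middle.
print(satisfies_invariant([1, 2, 4, 4, 4, 4, 2, 1], delta=10))  # True
# One centroid covering almost the whole quantile range breaks the constraint.
print(satisfies_invariant([1, 20, 1], delta=10))                # False
```

Because the assumed scale function is steepest near q = 0 and q = 1, the same k-size budget permits only small centroids in the tails, which is where the increased accuracy comes from.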
