Conservation of the t-digest Scale Invariant
A t-digest is a compact data structure that allows estimates of quantiles with increased accuracy near q = 0 or q = 1. This is done by clustering samples from ℝ subject to the constraint that the number of points associated with any particular centroid is limited so that the so-called k-size of the centroid is always < 1. The k-size is defined using a scale function that maps a quantile q to an index k. Since the centroids are real numbers, they can be ordered, and thus the quantile range of a centroid can be mapped to an interval in k whose length is the k-size of that centroid. The accuracy of quantile estimates made using a t-digest depends on this constraint continuing to hold as new data is added or t-digests are merged. This paper provides proofs of this invariance for four practically important scale functions.
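To make the k-size constraint concrete, here is a minimal sketch that checks it for a list of centroid weights already sorted by centroid mean. It assumes the arcsine scale function k_1(q) = δ/(2π) · asin(2q − 1), which is commonly used in the t-digest literature; the abstract does not say which four scale functions the paper treats, so this choice, the compression parameter `delta`, and the helper names `k1`, `k_sizes`, and `satisfies_invariant` are all illustrative assumptions, not the paper's code.

```python
import math
from typing import List


def k1(q: float, delta: float) -> float:
    """Assumed arcsine scale function: k_1(q) = delta / (2*pi) * asin(2q - 1)."""
    q = min(max(q, 0.0), 1.0)  # guard against tiny floating-point overshoot
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def k_sizes(weights: List[float], delta: float) -> List[float]:
    """k-size of each centroid, given weights ordered by centroid mean.

    Each centroid covers the quantile interval [q_left, q_right], where the
    bounds come from the cumulative weight to its left; its k-size is the
    length of the image of that interval under the scale function.
    """
    n = sum(weights)
    sizes = []
    left = 0.0
    for w in weights:
        q_left = left / n
        q_right = (left + w) / n
        sizes.append(k1(q_right, delta) - k1(q_left, delta))
        left += w
    return sizes


def satisfies_invariant(weights: List[float], delta: float) -> bool:
    """True if every centroid has k-size < 1 (the constraint in the abstract)."""
    return all(s < 1.0 for s in k_sizes(weights, delta))


if __name__ == "__main__":
    # Small centroids in the tails, larger ones near the median.
    good = [1, 4, 10, 15, 20, 20, 15, 10, 4, 1]
    print(satisfies_invariant(good, 10.0))
    # Two huge centroids violate the constraint.
    print(satisfies_invariant([50, 50], 10.0))
```

Because k_1 is steep near q = 0 and q = 1, the same weight yields a larger k-size in the tails than near the median, which is what forces tail centroids to stay small and gives the improved tail accuracy. The k-sizes also telescope: they always sum to k_1(1) − k_1(0) = δ/2, so at least ⌈δ/2⌉ centroids are needed for the constraint to hold.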