Asymmetric scale functions for t-digests

05/19/2020
by   Joseph Ross, et al.

The t-digest is a data structure that can be queried for approximate quantiles, with greater accuracy near the minimum and maximum of the distribution. We develop a t-digest variant with accuracy asymmetric about the median, thereby making possible alternative tradeoffs between computational resources and accuracy which may be of particular interest for distributions with significant skew. After establishing some theoretical properties of scale functions for t-digests, we show that a tangent line construction on the familiar scale functions preserves the crucial properties that allow t-digests to operate online and be mergeable. We conclude with an empirical study demonstrating the asymmetric variant preserves accuracy on one side of the distribution with a much smaller memory footprint.


1 Introduction

Recently the t-digest (Dunning and Ertl, 2019) has gained prominence as an efficient data structure for online estimation of quantiles of large data streams. The digest consists of a collection of weighted centroids on the real line, with the weight representing cluster size (the number of observations near the corresponding centroid). In comparison to other methods, the t-digest is notable for its ability to have variable accuracy in different regions of quantile space. The accuracy is controlled by a scale function, which governs the permissible compression (expressed as a bound on cluster size, see (Dunning, 2019b)) as a function of the quantile $q$. Using a linear scale function turns the t-digest into a dynamic version of a histogram with equal-sized bins, but using logarithmic or (inverse) trigonometric functions allows the digest to achieve greater accuracy near the tails (i.e., near $q = 0$ or $q = 1$) and comparatively less accuracy near the median ($q = 1/2$).
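To make the role of the scale function concrete, the following Python sketch compares a linear scale function with an arcsine one. The formulas for `k0` and `k1` follow the conventions of (Dunning and Ertl, 2019) as we understand them. The helper `max_cluster_fraction` is a hypothetical name introduced here. It relies on the first-order approximation that a cluster near quantile $q$ may span roughly $1/k'(q)$ of the data.

```python
import math

def k0(q, delta):
    """Linear scale function: uniform cluster sizes."""
    return delta / 2.0 * q

def k1(q, delta):
    """Arcsine scale function: small clusters near q = 0 and q = 1."""
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)

def max_cluster_fraction(k, q, delta, h=1e-6):
    """Approximate largest fraction of the data one cluster near q may hold.

    A cluster must satisfy k(q_right) - k(q_left) <= 1, so to first order
    its width in quantile space is about 1 / k'(q). The derivative is
    estimated with a central difference (an illustrative shortcut).
    """
    slope = (k(q + h, delta) - k(q - h, delta)) / (2.0 * h)
    return 1.0 / slope

delta = 100.0  # compression parameter, chosen here only for illustration
for q in (0.001, 0.01, 0.5, 0.99, 0.999):
    print(q, max_cluster_fraction(k0, q, delta), max_cluster_fraction(k1, q, delta))
```

With the linear function the permitted fraction is the same at every quantile, while the arcsine function permits much smaller clusters near both tails and somewhat larger ones near the median.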

The capacity of the t-digest to operate online imposes a requirement on the scale function, namely that a collection of centroids compatible with a given scale function remains so when new samples are inserted. Since forming the ordered union of two digests may be described as a sequence of insertions from the viewpoint of either digest, meeting this requirement also implies that t-digests can be merged to form a new one that inherits the accuracy bounds of its constituents, and thus large datasets may be t-digested in parallel (Dunning and Ertl, 2019, §2.5). For the well-known scale functions, preservation of the constraint under insertion is proved in (Dunning, 2019a).

Motivation. All of the scale functions the author is aware of are symmetric about $q = 1/2$, and thus expend similar computational resources on those parts of the distribution near $q = 0$ and those near $q = 1$. In practice there are scenarios in which one tail of the distribution is considerably more interesting than the other. For example, in application performance monitoring, the latency of individual operations or execution paths is often distributed with significant positive skew, as the overwhelming majority of executions complete quickly and uneventfully, while a relatively small number of outlying executions exhibit greater variation. In practical terms, the difference between a 97th percentile and a 99th percentile operation execution is greater than the difference between a 3rd percentile and a 1st percentile execution, and so accuracy near $q = 1$ is “worth more” than accuracy near $q = 0$. In (Ross, 2019b, esp. §3.2) we have described a tail-based sampling method for distributed traces that requires a compact device for approximating quantiles and ranks, and in this setting we would like to make fine-grained distinctions near $q = 1$, whereas very little of our budget will be devoted to keeping execution traces near $q = 0$ in any case.

A related context is monitoring service level objectives in distributed computing environments (Beyer et al., 2016, Ch. 4), (Sloss et al., 2017): it is common to treat upper quantiles of request latency as service level indicators (Beyer et al., 2016, Ch. 4), which may be implemented as a client querying a t-digest for a particular quantile value near $q = 1$ (but not near $q = 0$). While it may not be possible to enhance the resolution of a t-digest exactly in a neighborhood of a specified quantile (since insertions may shift a region of data for which only a coarse summary is available into a region in which greater accuracy is required), an asymmetric scale function allows one to strike a better balance between computational resources and accuracy (e.g., save computational resources without compromising the accuracy of the required estimate, or increase accuracy for the required estimate by using an asymmetric scale function with a larger parameter). Especially for high-volume endpoints over longer time windows, the asymmetric t-digests we propose here are a natural family of data structures on which to base approximate calculations.

Contributions. In this paper, we prove (Subsection 3.1) that a simple modification of the common scale functions continues to enjoy the preservation of the constraint under insertion property. The construction uses a piecewise definition in which we keep the scale function for $q \geq p$ and use the best linear approximation of the scale function at a chosen split point $p$ (i.e., the function whose graph is the tangent line to the graph of the scale function at $p$) for $q < p$. Our approach is motivated by some brief theory (Section 2), from which we conclude that decent scale functions must be differentiable, and from which we deduce an explicit criterion for verifying the decency of a candidate scale function. As a consequence we analyze the case of polynomial scale functions (Subsection 3.2). We conclude with some empirical results in Section 4.

Previous work. For some background on other methods for computing quantiles in an online fashion, we refer the reader to (Dunning and Ertl, 2019, §1.1), in particular the Q-digest of (Shrivastava et al., 2004), and the works of (Munro and Paterson, 1980), (Chen et al., 2000), and (Greenwald et al., 2001). The moment-based quantile sketch has recently emerged as another compact data structure for quantile estimation (Gan et al., 2018).

2 Generalities

2.1 Definitions

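An ordered set of clusters on a set of points in $\mathbb{R}$ is called a t-digest with respect to a scale function $k$ if every cluster has unit weight or satisfies $k(q_{\text{right}}) - k(q_{\text{left}}) \leq 1$, where $[q_{\text{left}}, q_{\text{right}}]$ is the quantile range spanned by the cluster (Dunning and Ertl, 2019, §2.1). The quantity $k(q_{\text{right}}) - k(q_{\text{left}})$ is called the $k$-size of the cluster. We will always require $k$ to be non-decreasing and piecewise differentiable.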
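The definition can be checked mechanically. The sketch below assumes clusters are given only by their weights in left-to-right order, so the quantile range of a cluster is determined by cumulative weight. The function names `k_sizes` and `is_t_digest` are hypothetical, and the arcsine scale function with $\delta = 10$ is just one convenient choice.

```python
import math

def k1(q, delta):
    """Arcsine scale function in the convention of Dunning and Ertl (2019)."""
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)

def k_sizes(weights, k):
    """Return the k-size k(q_right) - k(q_left) of each cluster, in order.

    weights[i] is the number of samples in the i-th cluster; quantiles are
    cumulative weight divided by total weight.
    """
    total = sum(weights)
    sizes = []
    seen = 0
    for w in weights:
        q_left = seen / total
        seen += w
        q_right = seen / total
        sizes.append(k(q_right) - k(q_left))
    return sizes

def is_t_digest(weights, k):
    """True if every cluster has unit weight or k-size at most 1."""
    return all(w == 1 or s <= 1.0 for w, s in zip(weights, k_sizes(weights, k)))

def scale(q):
    return k1(q, 10.0)  # delta = 10 is an arbitrary illustrative value

print(is_t_digest([1, 1, 2, 3, 4, 4, 4, 3, 2, 1, 1], scale))
print(is_t_digest([13, 13], scale))
```

The second set of weights fails because each of its two clusters covers half of the quantile range, which is far more than one unit of the scale function.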

We will be interested in the operation of inserting a collection of samples $Y$ into a given set $X$; denote the result by $X \cup Y$. The notation does not specify where $Y$ was inserted. We say a scale function $k$ accepts insertions (or is insertion-accepting) if given any t-digest on $X$ with respect to $k$, every cluster continues to have $k$-size less than or equal to $1$ when its quantile range is calculated in $X \cup Y$.

As the condition $k(q_{\text{right}}) - k(q_{\text{left}}) \leq 1$ indeed implies a scale for $k$, it is natural to restrict our attention to insertion-accepting scale functions $k$ with the property that $ck$ is again insertion-accepting for any $c > 0$. We call such insertion-accepting scale functions decent.

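If the insertion is to the left of a cluster spanning $[q_1, q_2]$ in the original set $X$, the cluster spans $[\alpha + (1 - \alpha) q_1, \alpha + (1 - \alpha) q_2]$ in the enlarged set $X \cup Y$, where $\alpha$ is the proportion represented by the inserted samples $Y$ in $X \cup Y$ (i.e., $\alpha = |Y| / |X \cup Y|$). When the insertion is to the right, the cluster spans $[(1 - \alpha) q_1, (1 - \alpha) q_2]$ in $X \cup Y$.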

2.2 Characterizations

Lemma 2.2.1.

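The scale function $k$ is decent if and only if for all $0 \leq q_1 < q_2 \leq 1$ and all $\alpha \in [0, 1)$, we have $k(q_2') - k(q_1') \leq k(q_2) - k(q_1)$ for $q_i' = \alpha + (1 - \alpha) q_i$ and for $q_i' = (1 - \alpha) q_i$ (the images of $q_i$ under an insertion to the left and to the right, respectively).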

Proof.

Clearly the condition implies accepts insertions. Since the condition is preserved under scaling by , the condition implies accepts insertions, i.e., is decent.

If for some , we can find such that . An insertion into a set of clusters realizing the transformation would then violate the insertion-accepting condition for , and so decency implies the condition. ∎

Rearranging the inequality of the preceding lemma gives the following characterization of decent scale functions.

Corollary 2.2.2.

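The scale function $k$ is decent if and only if for all $\alpha \in [0, 1)$, the functions $q \mapsto k(\alpha + (1 - \alpha) q) - k(q)$ and $q \mapsto k((1 - \alpha) q) - k(q)$ are non-increasing on $[0, 1]$.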
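The characterization suggests a quick numerical screen for candidate scale functions. The sketch below is an illustration rather than a proof. It assumes that inserting a fraction $\alpha$ of the enlarged data set to the left of a point sends its quantile $q$ to $\alpha + (1 - \alpha) q$, and to the right sends it to $(1 - \alpha) q$. The name `looks_decent` and the two quadratic candidates are our own choices and do not come from the results above.

```python
import math

def left_insert(q, alpha):
    """Quantile of an old point after a fraction alpha is inserted to its left."""
    return alpha + (1.0 - alpha) * q

def right_insert(q, alpha):
    """Quantile of an old point after a fraction alpha is inserted to its right."""
    return (1.0 - alpha) * q

def looks_decent(k, grid=50, tol=1e-12):
    """Search a grid for a pair q1 < q2 whose k-size grows under insertion.

    Returns False as soon as a counterexample is found. Returning True is
    only numerical evidence on the chosen grid, not a proof of decency.
    """
    qs = [i / grid for i in range(grid + 1)]
    alphas = [j / grid for j in range(grid)]  # alpha ranges over [0, 1)
    for alpha in alphas:
        for move in (left_insert, right_insert):
            for i, q1 in enumerate(qs):
                for q2 in qs[i + 1:]:
                    before = k(q2) - k(q1)
                    after = k(move(q2, alpha)) - k(move(q1, alpha))
                    if after > before + tol:
                        return False
    return True

def k1(q, delta=100.0):
    """Arcsine scale function; the argument is clamped against rounding."""
    return delta / (2.0 * math.pi) * math.asin(max(-1.0, min(1.0, 2.0 * q - 1.0)))

print("linear      ", looks_decent(lambda q: q))
print("arcsine     ", looks_decent(k1))
print("q^2         ", looks_decent(lambda q: q * q))
print("q^2 + 2q    ", looks_decent(lambda q: q * q + 2.0 * q))
```

On this grid the pure quadratic $q^2$ is rejected, since an insertion to the left pushes a cluster near $q = 0$ into a region where the function is steeper, while adding a sufficiently large linear term repairs the defect.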

2.3 Properties

Lemma 2.3.1.

Decent scale functions form a convex cone: if $k$ and $\tilde{k}$ are decent, then so is $a k + b \tilde{k}$ for any $a, b \geq 0$.

Proof.

Use the characterization of Corollary 2.2.2 or that of Lemma 2.2.1. ∎

Lemma 2.3.2.

A decent scale function is continuous.

Proof.

Let be a point where continuity fails, and let denote the left and right hand limits of at . By piecewise continuity, we can find a pair of points such that . For an insertion pushing , but not , across the point of discontinuity, we have and so . Combining the inequalities produces a violation of the condition of Lemma 2.2.1. ∎

Proposition 2.3.3.

A decent scale function is differentiable.

Proof.

Let be a point where differentiability fails, suppose and both exist, and suppose . In this case we shift a centroid to the right; if the inequality were reversed, we would shift it to the left.

Let be a sequence approaching from below, and define and . Note that , and that . Before insertion the picture is

and after insertion it is

By choosing large enough, we can guarantee that , and therefore , violating the insertion-accepting property by Lemma 2.2.1. ∎

Remark 2.3.4.

By (Dunning, 2019a), the following are examples of decent scale functions:
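For reference, the familiar scale functions $k_0, \dots, k_3$ are usually written as follows, with compression parameter $\delta$ and sample count $n$. This display is a sketch in the conventions of (Dunning and Ertl, 2019) as we recall them, and the normalizing constants in particular should be checked against that reference.

```latex
\begin{align*}
k_0(q) &= \frac{\delta}{2}\, q, \\
k_1(q) &= \frac{\delta}{2\pi}\, \sin^{-1}(2q - 1), \\
k_2(q) &= \frac{\delta}{4 \log(n/\delta) + 24}\, \log\frac{q}{1 - q}, \\
k_3(q) &= \frac{\delta}{4 \log(n/\delta) + 21} \cdot
  \begin{cases}
    \log(2q) & \text{if } q \leq 1/2, \\
    -\log(2(1 - q)) & \text{if } q > 1/2.
  \end{cases}
\end{align*}
```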

Remark 2.3.5.

The unnormalized forms of (i.e., without the term) are also decent. Our conditions in quantile space for the unnormalized forms imply decency in the finite data case for the normalized forms since the function is non-decreasing.

3 Computations

3.1 Piecewise defined functions

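Gluing. Suppose $f$ and $g$ are decent scale functions, and $p \in (0, 1)$. Let $k$ denote the function which is $f$ on $[0, p]$ and $g$ on $[p, 1]$. For $k$ to be decent, Proposition 2.3.3 implies that $f'$ and $g'$ must agree at $p$, so one natural approach to gluing is to take a decent scale function $g$ (e.g., from the list in Remark 2.3.4), choose a point $p$, and let $f$ be the best linear approximation to $g$ at $p$, i.e., use the function:

$$k(q) = \begin{cases} g(p) + g'(p)\,(q - p) & \text{if } q < p, \\ g(q) & \text{if } q \geq p. \end{cases}$$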

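To show the glued function $k$ (equal to the tangent line $f$ on $[0, p]$ and to $g$ on $[p, 1]$) is decent, by Corollary 2.2.2 it suffices to show that, for every $\alpha \in [0, 1)$, the functions $q \mapsto k(\alpha + (1 - \alpha) q) - k(q)$ and $q \mapsto k((1 - \alpha) q) - k(q)$ are non-increasing on $[0, 1]$. Note that if $q$ and its image $\alpha + (1 - \alpha) q$ are both greater than or equal to $p$, the decency of $g$ implies the necessary non-increasing property for the first function (and similarly via $f$ if both are less than or equal to $p$), and similarly for the second function with $q$ and $(1 - \alpha) q$. Therefore it suffices to show the non-increasing property for insertions moving $q$ from one side of $p$ to the other, i.e., for $q$ such that

  • (left to right) $q < p$ and $\alpha + (1 - \alpha) q \geq p$, or

  • (right to left) $q > p$ and $(1 - \alpha) q \leq p$.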

For the functions we consider, the point $p = 1/2$ is of particular interest since it minimizes the derivative of the scale function, hence the cluster size on the linear piece ($q < 1/2$) is as large as possible.
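The gluing construction described above can be sketched in a few lines of Python. The helper name `glue_with_tangent` is hypothetical, the derivative is supplied by the caller rather than computed symbolically, and the arcsine function with $\delta = 100$ serves only as an example input.

```python
import math

def glue_with_tangent(g, dg, p):
    """Return k with k(q) = g(p) + dg(p) * (q - p) for q < p, and g(q) otherwise.

    g is a scale function, dg its derivative, and p the split point.
    """
    g_at_p = g(p)
    slope = dg(p)

    def k(q):
        if q < p:
            return g_at_p + slope * (q - p)  # tangent line on the left piece
        return g(q)                          # original function on the right piece

    return k

delta = 100.0  # illustrative compression parameter

def k1(q):
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)

def dk1(q):
    # Derivative of k1; it blows up at q = 0 and q = 1, so use it only inside (0, 1).
    return delta / (2.0 * math.pi) * 2.0 / math.sqrt(1.0 - (2.0 * q - 1.0) ** 2)

glued = glue_with_tangent(k1, dk1, 0.5)
for q in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(q, k1(q), glued(q))
```

The glued function agrees with the original at and above the split point and is less steep below it, so its total range is smaller. This is the source of the reduced centroid count.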

Notation and strategy. For ease of exposition, we establish common notation for the next three propositions (all concerning the gluing construction). For the case of shifting from left to right, we need to show is non-increasing on the interval defined by and . For the case of shifting from right to left, we need to show is non-increasing on the interval defined by and . We accomplish this by verifying and on the relevant domains. The decency results hold for positive scalar multiples of our scale functions as well (decency is a property of the determined ray), but we leave this implicit for notational simplicity.

Remark 3.1.1.

The construction can be modified in the obvious way to reverse the emphasis on the tails, i.e., using a non-linear scale function on $[0, p]$ and the linear function describing its tangent line at $p$ for $q > p$, but we do not explicitly state this variant in our results. The variant with higher accuracy near $q = 0$ is reminiscent of a high dynamic range histogram, though the t-digest error is still bounded in terms of the quantile rather than the value of the observation itself.

Proposition 3.1.2.

For any , the scale function

is decent.

Proof.

We have

and therefore:

from which it follows that is equivalent to

Since and (as ), the left hand side is greater than or equal to . Since , the desired inequality follows.

We calculate:

from which it follows that is equivalent to

Since and , the left hand side is greater than or equal to , which is greater than the right hand side since . ∎

Proposition 3.1.3.

For any , the scale function

is decent.

Proof.

We calculate

and so

Therefore if and only if . Since and , the desired inequality follows.

For the other case,

and so when . We have and and so the result follows. ∎

Proposition 3.1.4.

For any , the scale function

is decent.

For any , the scale function

is decent.

Proof.

First we deal with the case the split point is greater than . We have

and so

Since , we have and hence as desired.

For insertions on the other side, we have

and so

Then is equivalent to . Since , we have and hence as needed.

In the case the split point is less than , we have

and therefore

(Note the limit of at exists.) For the first branch, implies , and , so for these . For the other branch, implies , so . Therefore and so for these as well.

Next we have

and so

(Note the limit of at exists.) For the first branch, implies and so for these . For the other branch, is equivalent to . Since , we have . Multiplying this inequality by and using gives the inequality we need. ∎

3.2 Polynomials

Proposition 3.2.1.

For any , the scale function is decent.

Proof.

We need to show, for any , that and are non-increasing on the domain . Since

we calculate:

Now implies and so . Since , is decreasing, so is negative on , so is decreasing on this domain, as desired.

As for , we have

and so

Now and is decreasing, so too is negative on , so is decreasing on this domain, as desired. ∎

More generally we have the following.

Proposition 3.2.2.

For any , there exists such that for , the scale function is decent.

Proof.

We find conditions on guaranteeing that and are non-increasing on the domain , for any . We have:

Now , where is a polynomial divisible by and with no -dependence. Hence is divisible by (say ) and we can write

Now is a polynomial in , in particular has a maximum on . Choosing larger than implies as desired, so we can choose to be anything larger than (which depends only on ).

The analysis of is somewhat simpler. We calculate:

Now and , so the first term is non-positive, and , so as desired. ∎

Combining various polynomials using the convex cone property (Lemma 2.3.1), we can generate lots of decent scale functions. The utility of this construction is somewhat unclear, as the linear term dominates more with larger . The decency of certain polynomials also opens the possibility of extending the gluing construction from linear approximations to higher degree Taylor polynomials.

4 Empirical results

This section summarizes the results of 100 runs of constructing a t-digest on one million samples from a uniform distribution, for different scale functions. The main goal is to understand empirically the effect of the scale functions discussed in Section 3, especially the gluing construction applied to the familiar scale functions. In all cases we use the same compression parameter and perform a compression (so the digest is “fully merged”) before calculating quantiles. We follow the conventions of (Dunning and Ertl, 2019) (see also Remark 2.3.4). For the piecewise defined functions, we glue at the point $p = 1/2$. For $q \geq 1/2$ we use the size bounds of (Dunning, 2019b) and for $q < 1/2$ we bound by the reciprocal of the slope of the line; for $k_2$ and $k_3$ these differ by higher order terms in the normalizing/compression factors.

The error is the absolute value of the difference between the cumulative distribution function evaluated at the estimate of quantile $q$ and $q$ itself, and appears in the leftmost panel. The normalized error divides this quantity by and appears in the center panel. For the error plots, the whiskers range from the 5th to 95th percentile of the 100 runs, the boxes cover the interquartile range, the orange line is the median, and the horizontal axis is the following transformation of quantile space:

Note the horizontal axes have the same interpretation in each figure, but the vertical axes vary. The rightmost panel is a histogram of centroid counts over the 100 runs. Implementations of the asymmetric scale functions and code for generating the data and plots are available at (Ross, 2019a) (a fork of (Dunning, 2018)).

4.1 AVL tree results

This subsection compares the different scale functions for t-digests using the AVLTree variant in the Java implementation (Dunning, 2018).

Figure 1: Errors and centroid counts for the scale function (first row; a baseline) and for the quadratic polynomial scale function (second row). For this coefficient choice, the resulting function maps to , as does . Both use AVLTree.
Figure 2: Errors and centroid counts for the usual (first row) and glued (second row) variants of the scale function $k_1$. Both use AVLTree.
Figure 3: Errors and centroid counts for the usual (first row) and glued (second row) variants of the normalized scale function $k_2$. Both use AVLTree.
Figure 4: Errors and centroid counts for the usual (first row) and glued (second row) variants of the normalized scale function $k_3$. Both use AVLTree.

Discussion. In all cases the glued variant of $k_i$ has the error profile of $k_i$ for $q \geq 1/2$ and that of $k_0$ (linear function, uniform cluster sizes) for $q < 1/2$, as expected. The reduction in number of centroids is more dramatic for $k_2$ and $k_3$ than it is for $k_1$ due to the normalizing term appearing in the linear halves of $k_2$ and $k_3$. This reduction describes, to first order, the memory savings of the asymmetric (glued) variant over the usual symmetric one. We have not investigated quantitatively the computational advantage, but roughly speaking, half (when gluing at $p = 1/2$) of the transcendental scale function evaluations are replaced by evaluation of a simple linear function.
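The reduction in centroid count can be reproduced qualitatively with a simplified offline construction. The sketch below is neither the AVLTree nor the MergingDigest algorithm. It greedily packs sorted, equally weighted samples into clusters subject to the $k$-size bound. The function name `greedy_cluster_weights` and the values $\delta = 100$ and $n = 100000$ are assumptions made for illustration, and the arcsine scale function is written in the convention of (Dunning and Ertl, 2019) as we understand it.

```python
import math

def greedy_cluster_weights(n, k):
    """Greedily partition n sorted unit-weight samples into clusters.

    Samples are added to the current cluster while its k-size
    k(q_right) - k(q_left) stays at most 1; a single-sample cluster is
    always allowed. Returns the list of cluster weights, left to right.
    """
    weights = []
    start = 0            # number of samples already placed in closed clusters
    k_left = k(0.0)
    for end in range(1, n + 1):   # candidate cluster holds samples start..end-1
        size = k(end / n) - k_left
        if size > 1.0 and end - start > 1:
            # Close the cluster just before the sample that overflowed it.
            weights.append(end - 1 - start)
            start = end - 1
            k_left = k(start / n)
    weights.append(n - start)
    return weights

delta = 100.0

def k1(q):
    """Symmetric arcsine scale function."""
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)

def glued_k1(q):
    """Tangent line of k1 at 1/2 for q < 1/2 (slope delta / pi), k1 itself otherwise."""
    if q < 0.5:
        return delta / math.pi * (q - 0.5)
    return k1(q)

n = 100_000
symmetric = greedy_cluster_weights(n, k1)
asymmetric = greedy_cluster_weights(n, glued_k1)
print("symmetric clusters: ", len(symmetric))
print("asymmetric clusters:", len(asymmetric))
```

Because the glued function has a smaller total range ($\delta/4 + \delta/(2\pi)$ rather than $\delta/2$), the greedy construction needs fewer clusters, all of the savings coming from the lower half of quantile space.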

4.2 Merging digest results

This subsection compares the different scale functions for t-digests using the MergingDigest variant in the Java implementation (Dunning, 2018). We have made two minor changes to the main implementation in (Ross, 2019a). First, we set “useAlternatingSort” to false, so that we do not alternate between upward and downward merge passes. Alternating seems to interact poorly with asymmetric scale functions; when set to true, the digests using asymmetric scale functions have too few centroids. Second, we have added more padding to the underlying arrays; the amount of padding required seems to depend on the number of samples processed.

Figure 5: Errors and centroid counts for the scale function (first row; a baseline) and for the quadratic polynomial scale function (second row). For this coefficient choice, the resulting function maps to , as does . Both use MergingDigest. For , the unusual errors at and (several times the error observed with the AVLTree implementation) seem to be related to the compression parameter (perhaps via inaccuracy near the boundary between clusters); these “bumps” move to and with . The asymmetric improves the error at at the expense of introducing more centroids.
Figure 6: Errors and centroid counts for the usual (first row) and glued (second row) variants of the scale function . Both use MergingDigest. The glued variant of has the error profile of for and that of for , including the unusual error at . The unexpected asymmetry of the errors for disappears when setting “useAlternatingSort” to true.
Figure 7: Errors and centroid counts for the usual (first row) and glued (second row) variants of the normalized scale function . Both use MergingDigest. The glued variant of has the error profile of for and that of for , except some of the unusual error for at seems to have shifted to (perhaps due to more effective compression for ).
Figure 8: Errors and centroid counts for the usual (first row) and glued (second row) variants of the normalized scale function . Both use MergingDigest. The glued variant of has the error profile of for and that of for , except some of the unusual error for at seems to have shifted to (perhaps due to more effective compression for ).

Acknowledgments. It is a pleasure to thank engineering and management at SignalFx for their encouragement and support during the preparation of this paper, and to thank Matthew Pound for many interesting discussions on this topic.

References

  • Beyer et al. (2016) Beyer B, Jones C, Petoff J, Murphy NR (2016). Site Reliability Engineering: How Google Runs Production Systems. O’Reilly Media, Inc.
  • Chen et al. (2000) Chen F, Lambert D, Pinheiro JC (2000). “Incremental quantile estimation for massive tracking.” In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 516–522. ACM.
  • Dunning (2018) Dunning T (2018). “The t-digest Library.” https://github.com/tdunning/t-digest/. [Online; accessed 8-August-2019].
  • Dunning (2019a) Dunning T (2019a). “Conservation of the t-digest Scale Invariant.” arXiv preprint arXiv:1903.09919.
  • Dunning (2019b) Dunning T (2019b). “The Size of a t-Digest.” arXiv preprint arXiv:1903.09921.
  • Dunning and Ertl (2019) Dunning T, Ertl O (2019). “Computing extremely accurate quantiles using t-digests.” arXiv preprint arXiv:1902.04023.
  • Gan et al. (2018) Gan E, Ding J, Tai KS, Sharan V, Bailis P (2018). “Moment-based quantile sketches for efficient high cardinality aggregation queries.” Proceedings of the VLDB Endowment, 11(11), 1647–1660.
  • Greenwald et al. (2001) Greenwald M, Khanna S, et al. (2001). “Space-efficient online computation of quantile summaries.” ACM SIGMOD Record, 30(2), 58–66.
  • Munro and Paterson (1980) Munro JI, Paterson MS (1980). “Selection and sorting with limited storage.” Theoretical computer science, 12(3), 315–323.
  • Ross (2019a) Ross J (2019a). “SignalFx fork of Ted Dunning’s t-digest Library.” https://github.com/signalfx/t-digest/tree/asymmetric/docs/asymmetric. [Online; accessed 13-September-2019].
  • Ross (2019b) Ross J (2019b). “A Weighted Sampling Scheme for Distributed Traces.” Submitted.
  • Shrivastava et al. (2004) Shrivastava N, Buragohain C, Agrawal D, Suri S (2004). “Medians and beyond: new aggregation techniques for sensor networks.” In Proceedings of the 2nd international conference on Embedded networked sensor systems, pp. 239–249. ACM.
  • Sloss et al. (2017) Sloss BT, Dahlin M, Rau V, Beyer B (2017). “The Calculus of Service Availability.” Queue, 15(2), 40.