Recursive PGFs for BSTs and DSTs

02/07/2020 ∙ by Steven Finch, et al. ∙ Harvard University

We review fundamentals underlying binary search trees and digital search trees, with (atypical) emphasis on recursive formulas for associated probability generating functions. Other topics include higher moments of BST search costs and combinatorics for a certain finite-key analog of DSTs.


1 Binary Search Trees

Consider the R program:

where is a random permutation on and is initially .  To model successful searches, let be a random odd integer satisfying .  To model unsuccessful searches, let be a random even integer satisfying .  This scenario is exactly as described in [4].  It is assumed, of course, that and are drawn independently with uniform sampling.  We begin with even , because this case is simpler, followed by odd .
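The setup just described can be illustrated with a minimal Python sketch. It rests on assumptions: the keys are taken to be the odd integers 1, 3, ..., 2n-1, unsuccessful queries are the even integers 0, 2, ..., 2n, and the search cost is the number of key comparisons. The names `build_bst`, `search_cost` and `simulate` are hypothetical and are not taken from the R program.

```python
import random

def build_bst(keys):
    """Insert keys in the given order; return (root, children) where
    children[k] = [left child, right child] of key k."""
    root = None
    children = {}
    for k in keys:
        children[k] = [None, None]
        if root is None:
            root = k
            continue
        node = root
        while True:
            side = 0 if k < node else 1
            nxt = children[node][side]
            if nxt is None:
                children[node][side] = k
                break
            node = nxt
    return root, children

def search_cost(root, children, x):
    """Number of key comparisons made while searching for x (assumed cost measure)."""
    cost, node = 0, root
    while node is not None:
        cost += 1
        if x == node:
            break
        node = children[node][0 if x < node else 1]
    return cost

def simulate(n, trials, successful, rng=random):
    """Monte Carlo estimate of the mean search cost for a tree on n odd keys."""
    total = 0
    for _ in range(trials):
        keys = list(range(1, 2 * n, 2))          # assumed key set 1, 3, ..., 2n-1
        rng.shuffle(keys)                        # uniform random insertion order
        root, children = build_bst(keys)
        if successful:
            x = rng.randrange(1, 2 * n, 2)       # random odd query (a key)
        else:
            x = rng.randrange(0, 2 * n + 1, 2)   # random even query (never a key)
        total += search_cost(root, children, x)
    return total / trials
```

For example, `simulate(10, 100000, False)` gives an empirical mean that can be compared against the exact first moment obtained from the recursions in the subsections that follow.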

1.1 Unsuccessful Search

The probability generating function for , given , obeys a recursion [5]

Note that always.  Differentiating with respect to :

we have first moment

that is,

where and .  Clearly and .  Differentiating again:

we have second factorial moment

that is,

where and .  Clearly and .  Finally, we have variance

which is when and when .  From (more typical) harmonic number-based exact expressions, it can be proved that [2, 6, 7]

as .
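As a hedged illustration of how such a recursion yields moments, the sketch below assumes the classical product form g_n(z) = ((n - 1 + 2z)/(n + 1)) g_{n-1}(z) with g_0(z) = 1, where g_n is the PGF of the number of comparisons in an unsuccessful search of a tree built from n keys. The symbol g_n, the indexing and the cost convention are assumptions and may differ from those of [5]. Exact rational arithmetic is used so that results can be compared with harmonic-number expressions.

```python
from fractions import Fraction

def unsuccessful_pgf(n):
    """Coefficient list [c0, c1, ...] of g_n(z), assuming
    g_n(z) = (n - 1 + 2z)/(n + 1) * g_{n-1}(z) and g_0(z) = 1."""
    g = [Fraction(1)]
    for k in range(1, n + 1):
        new = [Fraction(0)] * (len(g) + 1)
        for i, c in enumerate(g):
            new[i] += c * Fraction(k - 1, k + 1)   # constant part of the factor
            new[i + 1] += c * Fraction(2, k + 1)   # z part of the factor
        g = new
    return g

def mean_and_variance(coeffs):
    """Mean g'(1) and variance g''(1) + g'(1) - g'(1)^2 of a PGF."""
    m1 = sum(i * c for i, c in enumerate(coeffs))
    f2 = sum(i * (i - 1) * c for i, c in enumerate(coeffs))
    return m1, f2 + m1 - m1 * m1

def harmonic(n, r=1):
    """Generalized harmonic number: sum of 1/k^r for k = 1..n."""
    return sum(Fraction(1, k ** r) for k in range(1, n + 1))
```

Under this assumed form each factor is the PGF of an independent Bernoulli variable, so the mean equals 2(H_{n+1} - 1) and the variance equals 2 H_{n+1} - 4 H_{n+1}^{(2)} + 2, where H_n and H_n^{(2)} denote harmonic numbers of orders one and two; `mean_and_variance(unsuccessful_pgf(n))` reproduces both exactly.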

1.2 Successful Search

The probability generating function for , given , obeys a recursion

Note that always.  Differentiating with respect to :

we have first moment

that is,

where and .  Clearly and .  Differentiating again:

we have second factorial moment

that is,

where and .  Clearly and .  Finally, we have variance which is when and when .

It can be proved that [2, 5, 6, 8]

as .
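A successful-search PGF can be sketched in the same spirit. The code assumes the classical identity s_n(z) = (z/n) (g_0(z) + g_1(z) + ... + g_{n-1}(z)), which expresses that finding a key costs one comparison more than the unsuccessful search performed when that key was inserted. The symbols s_n and g_k, and the convention that cost counts comparisons, are assumptions rather than the notation of the recursion above.

```python
from fractions import Fraction

def successful_pgf(n):
    """Coefficient list of s_n(z) = (z/n) * sum_{k=0}^{n-1} g_k(z), where
    g_{k+1}(z) = (k + 2z)/(k + 2) * g_k(z) and g_0(z) = 1 (assumed forms)."""
    g = [Fraction(1)]                       # g_0(z) = 1
    total = [Fraction(0)] * (n + 1)
    for k in range(n):
        for i, c in enumerate(g):           # accumulate z * g_k(z)
            total[i + 1] += c
        new = [Fraction(0)] * (len(g) + 1)  # advance g_k to g_{k+1}
        for i, c in enumerate(g):
            new[i] += c * Fraction(k, k + 2)
            new[i + 1] += c * Fraction(2, k + 2)
        g = new
    return [c / n for c in total]

def pgf_mean(coeffs):
    """First moment of a PGF given by its coefficient list."""
    return sum(i * c for i, c in enumerate(coeffs))
```

With these conventions `successful_pgf(3)` is z/3 + 4z^2/9 + 2z^3/9, and the mean agrees with the familiar expression 2(1 + 1/n) H_n - 3.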

1.3 Total Path Length

The total (internal) path length is the sum of taken over all odd integers from to .  It is not surprising that calculations are more involved here than before. The probability generating function for , given , obeys a recursion [6]

Note that always.  Differentiating with respect to :

we have first moment

that is,

where and .  Clearly , , and .  Differentiating again:

we have second factorial moment

that is,

where and .  Clearly , , and .  Finally, we have variance which is when and when .

It can be proved that [2, 5, 9]

as .
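For total path length, a minimal sketch follows, assuming the standard quicksort-type recursion f_n(z) = (z^{n-1}/n) (f_0 f_{n-1} + f_1 f_{n-2} + ... + f_{n-1} f_0) with f_0(z) = 1 and with the root placed at depth 0. If each search instead costs one comparison per node visited, the total is larger by n and the factor z^{n-1} becomes z^n. The symbol f_n and the depth convention are assumptions and may not match [6].

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def path_length_pgfs(n_max):
    """f[n] = coefficient list of the PGF of internal path length of a random
    BST on n keys (root at depth 0), for n = 0..n_max."""
    f = [[Fraction(1)]]                                # f_0(z) = 1
    for n in range(1, n_max + 1):
        acc = [Fraction(0)] * (n * (n - 1) // 2 + 1)   # degree is at most n(n-1)/2
        for k in range(1, n + 1):                      # k = rank of the root key
            for i, c in enumerate(poly_mul(f[k - 1], f[n - k])):
                acc[i + n - 1] += c / n                # each non-root key sits one level deeper
        f.append(acc)
    return f

def mean_and_variance(coeffs):
    """Mean and variance of a PGF given by its coefficient list."""
    m1 = sum(i * c for i, c in enumerate(coeffs))
    m2 = sum(i * i * c for i, c in enumerate(coeffs))
    return m1, m2 - m1 * m1
```

Under these assumptions f_3(z) = z^2/3 + 2z^3/3, with mean 8/3 and variance 2/9, and the means agree with 2(n + 1) H_n - 4n.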

1.4 Higher Moments

A third moment expression appears in [10] for successful search; analogous work for unsuccessful search remains undone.  We focus on total (internal) path length for BSTs.  The cumulants , , … , of were exhaustively studied by Hennequin [11, 12]; these asymptotically satisfy

as , where

Hoffman & Kuba [13] obtained a complicated recurrence for an associated sequence of rationals [14, 15]:

using what they called tiered binomial coefficients.  While they utilized notation , we adopt .  It suffices to say that and a rich theory about for awaits discovery.  We give Mathematica code for generating :

and code for generating , given , , … , :

This final line employs a well-known expression for cumulants in terms of partial (or incomplete) Bell polynomials of central moments.
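The passage from moments to cumulants can also be sketched in Python. Instead of Bell polynomials of central moments, this sketch uses the equivalent and well-known recursion on raw moments, kappa_n = m_n - sum over k = 1..n-1 of C(n-1, k-1) kappa_k m_{n-k}; it is a substitute technique chosen for brevity, not a transcription of the Mathematica code. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def cumulants_from_raw_moments(m):
    """Given raw moments m[0] = E[X], m[1] = E[X^2], ..., return the cumulants
    kappa_1, ..., kappa_len(m) using
    kappa_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) * kappa_k * m_{n-k}."""
    kappa = []
    for n in range(1, len(m) + 1):
        s = sum(comb(n - 1, k - 1) * kappa[k - 1] * m[n - k - 1]
                for k in range(1, n))
        kappa.append(m[n - 1] - s)
    return kappa
```

As a check on a distribution with known cumulants, the raw moments 1, 2, 6, 24 of a unit-rate exponential variable give cumulants 1, 1, 2, 6. Passing exact `Fraction` values keeps the output rational, which is convenient when the input moments come from a PGF with rational coefficients.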

2 Digital Search Trees

Consider the R program:

where is a random binary matrix with distinct rows, is initially and is initially . It is usually assumed [2, 16] that , from which the row-distinctness requirement follows almost surely (imagining the rows as binary expansions of independent Uniform numbers).  If instead , as exploratively specified in [17], then the matrix would need to be generated carefully to avoid duplicate keys. To model successful searches, let be a random row of .  To model unsuccessful searches, let be a random binary -vector that is not a row of .
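The digital search tree mechanics can be sketched as follows. This is an illustrative Python stand-in for the R program, under assumptions: keys are tuples of bits, a node at depth d branches on bit d + 1 of the key, and cost is the number of key comparisons. The dictionary representation and the names `dst_insert` and `dst_search_cost` are hypothetical.

```python
def dst_insert(tree, key):
    """Insert a bit-tuple key. The tree maps a path (tuple of branching bits
    followed from the root) to the key stored at that node."""
    path = ()
    while path in tree:
        if tree[path] == key:
            raise ValueError("duplicate key")
        path += (key[len(path)],)      # branch on the next unused bit
    tree[path] = key

def dst_search_cost(tree, x):
    """Number of key comparisons made while searching for bit-tuple x."""
    path, cost = (), 0
    while path in tree:
        cost += 1
        if tree[path] == x:
            break                      # successful search
        path += (x[len(path)],)        # otherwise follow the next bit of x
    return cost
```

A node reached by a path of full key length necessarily stores the key spelled out by that path, so neither loop reads past the end of a key. For instance, inserting (0,1), (1,0), (0,0), (1,1) in that order places (1,1) at depth 2, and searching for it costs three comparisons.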

2.1 Unsuccessful Search

The probability generating function for , given , is

for and

for .  A closed-form expression exists [2] for when , but a corresponding simple recursive formula does not evidently materialize.  Section 2.4 contains verification of these polynomial expressions.

2.2 Successful Search

The probability generating function for , given , is

for and

for .  A closed-form expression exists [1, 2] for when , but a corresponding simple recursive formula again does not materialize.  Means and variances for and those for unsurprisingly become closer as increases.

2.3 Total Path Length

The total (internal) path length is the sum of taken over all rows of .  It is not surprising that calculations are more involved here than before.  Assume that . The probability generating function for , given , obeys a recursion [18]

Note that always.  Differentiating with respect to :

we have first moment

that is,

where and .  Clearly , , and .  Differentiating again:

we have second factorial moment

that is,

where and .  Clearly , , and .  Finally, we have variance which is when and when .
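For the infinite-key case, a hedged sketch of the computation follows. It assumes the recursion F_{n+1}(z) = z^n 2^{-n} sum over k = 0..n of C(n,k) F_k(z) F_{n-k}(z) with F_0(z) = 1 and with the root at depth 0, which is the form commonly associated with [18]. The symbol F_n and the depth convention are assumptions and may differ from the formula above by a shift.

```python
from fractions import Fraction
from math import comb

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def dst_path_length_pgfs(n_max):
    """F[n] = coefficient list of the PGF of internal path length of a random
    DST on n infinite keys (root at depth 0), for n = 0..n_max."""
    F = [[Fraction(1)]]                                  # F_0(z) = 1 (empty tree)
    for n in range(n_max):                               # build F_{n+1} from F_0..F_n
        size = n + 1
        acc = [Fraction(0)] * (size * (size - 1) // 2 + 1)
        for k in range(n + 1):                           # k keys fall in the left subtree
            w = Fraction(comb(n, k), 2 ** n)             # binomial split of n non-root keys
            for i, c in enumerate(poly_mul(F[k], F[n - k])):
                acc[i + n] += w * c                      # z^n: each non-root key one level deeper
        F.append(acc)
    return F

def mean_and_variance(coeffs):
    """Mean and variance of a PGF given by its coefficient list."""
    m1 = sum(i * c for i, c in enumerate(coeffs))
    m2 = sum(i * i * c for i, c in enumerate(coeffs))
    return m1, m2 - m1 * m1
```

Under these assumptions F_3(z) = z^2/2 + z^3/2, reflecting that the two non-root keys share a leading bit with probability 1/2; the mean is 5/2 and the variance is 1/4.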

Define constants

Let denote the partial product of and

It can be proved that [18, 19]

as , where

This expression for is, needless to say, a stunning result.

Assuming instead that , all we currently possess are PGFs for small :

A deeper understanding of finite-key DSTs would be welcome.

2.4 Some Combinatorics

We focus on unsuccessful searches, for both infinite keys () and finite keys ().  Let us examine the coefficients of and for simplicity.  The digital search trees appearing in Figure 1 for proceed from matrices

respectively.  When , the indicated keys are merely abbreviations (two leading bits in an infinite sequence); hence the keys are automatically distinct; thus

where is the count of binary matrices.  When , however, key-distinctness must be manually enforced.  We obtain the condition

which is equivalent to and gives possibilities; also the condition

which is equivalent to and gives possibilities; therefore

where is the count of permutations of objects, taken at a time.

Figure 1: Two linear cases and two triangular cases for .

For and , using Figures 2 and 3, we have

but when instead, we have

Figure 2: Four linear cases for ; note that two are reflections of the others.
Figure 3: Two triangular cases for ; note that one is a reflection of the other.

For and , using Figures 4, 5 and 6, we have

but when instead, we have

The emergence of bi-triangular cases at complicates our study for .  A similar argument for coefficients of , … , , as well as for successful searches, is possible.

Figure 4: Eight linear cases for (these four cases plus their reflections).
Figure 5: Four triangular cases for (these two cases plus their reflections).
Figure 6: Four bi-triangular cases for (these two cases plus their reflections).
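Counts of this kind for finite keys can be checked by brute force for small parameters. The sketch below enumerates every ordered selection of n distinct binary keys of length m, builds the tree, and tallies the cost of each query that is not a key. Counting cost as the number of key comparisons is an assumption, and `finite_key_unsuccessful_pgf` is a hypothetical name; the enumeration grows quickly and is meant only for very small n and m with n < 2^m.

```python
from fractions import Fraction
from itertools import permutations, product

def finite_key_unsuccessful_pgf(n, m):
    """Exact distribution {cost: probability} of unsuccessful search cost in a
    DST built from n distinct m-bit keys, uniform over key orderings and queries."""
    universe = list(product((0, 1), repeat=m))
    counts = {}
    total = 0
    for keys in permutations(universe, n):     # ordered selections of distinct keys
        tree = {}
        for key in keys:                       # standard DST insertion
            path = ()
            while path in tree:
                path += (key[len(path)],)
            tree[path] = key
        for x in universe:
            if x in keys:
                continue                       # only vectors that are not keys
            path, cost = (), 0
            while path in tree:
                cost += 1
                path += (x[len(path)],)
            counts[cost] = counts.get(cost, 0) + 1
            total += 1
    return {c: Fraction(v, total) for c, v in sorted(counts.items())}
```

With n = 2 and m = 2 there are 12 ordered key pairs and 2 admissible queries for each, and the resulting distribution places probability 2/3 on cost 1 and 1/3 on cost 2, in contrast with the even split that holds when keys are infinite.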

Third and fourth moment expressions appear in [20] for unsuccessful search on infinite keys.  The covariance between two random distinct successful search costs within the same tree is apparently as , where [18]

Verifying this interesting result via simulation remains open.  What can be said about the cost covariance for two distinct unsuccessful searches?  What can be said about the cost covariance given a successful search and an unsuccessful search?

3 Acknowledgements

I am grateful to Markus Kuba and Sumit Kumar Jha for helpful discussions, and to David Penman for providing [12] (which at one time was available at http://algo.inria.fr/).

References

  • [1] G. Louchard, Exact and asymptotic distributions in digital and binary search trees, RAIRO Inform. Théor. Appl. 21 (1987) 479–495; MR0928772.
  • [2] H. M. Mahmoud, Evolution of Random Search Trees, Wiley, 1992, pp. 71–91, 260–285; MR1140708.
  • [3] S. R. Finch, Resolving conflicts and electing leaders, arXiv:1912.06545.
  • [4] S. R. Finch, Binary search tree constants, Mathematical Constants, Cambridge Univ. Press, 2003, pp. 349–354; MR2003519.
  • [5] R. Sedgewick and P. Flajolet, Introduction to the Analysis of Algorithms, Addison-Wesley, 1996, pp. 142, 162–163, 246–250.
  • [6] D. E. Knuth, The Art of Computer Programming, v. 3, Sorting and Searching, 2 ed., Addison-Wesley, 1998, pp. 430–431, 455, 709; MR3077154.
  • [7] W. C. Lynch, More combinatorial properties of certain trees, Computer J., v. 7 (1965) n. 4, 299–302; MR0172492.
  • [8] G. D. Knott, Variance of calculation, unpublished note (1973).
  • [9] P. F. Windley, Trees, forests and rearranging, Computer J., v. 3 (1960) n. 2, 84–88.
  • [10] H. M. Mahmoud and R. Neininger, Distribution of distances in random binary search trees, Annals Appl. Probab. 13 (2003) 253–276; MR1951999.
  • [11] P. Hennequin, Combinatorial analysis of quicksort algorithm, RAIRO Inform. Théor. Appl. 23 (1989) 317–333; MR1020477.
  • [12] P. Hennequin, Analyse en moyenne d’algorithmes, tri rapide et arbres de recherche, Ph.D. thesis, École Polytechnique Palaiseau, 1991; http://www.mit.edu/~sfinch/Hennequin-thesis.pdf.
  • [13] M. E. Hoffman and M. Kuba, Logarithmic integrals, zeta values, and tiered binomial coefficients, arXiv:1906.08347.
  • [14] M. Cramer, A note concerning the limit distribution of the quicksort algorithm, RAIRO Inform. Théor. Appl. 30 (1996) 195–207; MR1415828.
  • [15] S. B. Ekhad and D. Zeilberger, A detailed analysis of quicksort running time, arXiv:1903.03708; data output at http://sites.math.rutgers.edu/~zeilberg/tokhniot/oQuickSortAnalysis3.txt.
  • [16] D. E. Knuth, The Art of Computer Programming, v. 3, Sorting and Searching, 2 ed., Addison-Wesley, 1998, pp. 500–505, 509, 726; MR3077154.
  • [17] S. R. Finch, Digital search tree constants, Mathematical Constants, Cambridge Univ. Press, 2003, pp. 354–361; MR2003519.
  • [18] P. Kirschenhofer, H. Prodinger and W. Szpankowski, Digital search trees again revisited: the internal path length perspective, SIAM J. Comput. 23 (1994) 598–616; MR1274646 (95i:68034).
  • [19] H.-K. Hwang, M. Fuchs and V. Zacharovas, Asymptotic variance of random symmetric digital search trees, Discrete Math. Theor. Comput. Sci. 12 (2010) 103–165; MR2676668 (2012b:05232).
  • [20] G. Louchard and H. Prodinger, Approximate counting with counters: a probabilistic analysis, J. Algebra Combin. Discrete Struct. Appl. 2 (2015) 191–209; MR3400765.

Steven Finch
MIT Sloan School of Management
Cambridge, MA, USA
steven_finch@harvard.edu