1 Binary Search Trees
Consider the R program:
where is a random permutation on and is initially . To model successful searches, let
be a random odd integer satisfying
. To model unsuccessful searches, let be a random even integer satisfying . This scenario is exactly as described in [4]. It is assumed, of course, that and are drawn independently with uniform sampling. We begin with even , because this case is simpler, followed by odd .
1.1 Unsuccessful Search
The probability generating function for , given , obeys a recursion [5]
Note that always. Differentiating with respect to :
we have first moment
that is,
where and . Clearly and . Differentiating again:
we have second factorial moment
that is,
where and . Clearly and
. Finally, we have variance
which is when and when . From (more typical) harmonic number-based exact expressions, it can be proved that [2, 6, 7]
as .
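As a cross-check on the moments of this subsection, the following Python sketch (not the R program of Section 1) builds the exact distribution of the unsuccessful search cost from a product form of the PGF, $\prod_{k=1}^{n} (k-1+2z)/(k+1)$, then obtains the mean and variance by differentiating at $z=1$. Several things here are assumptions: that cost means the number of key comparisons, that the tree holds $n$ keys, and that this product form (a swapped-in representation, not necessarily the recursion quoted from [5]) describes the same random variable. All function names are illustrative.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def unsuccessful_pgf(n):
    """Coefficients of prod_{k=1}^{n} (k - 1 + 2z)/(k + 1) (assumed product form)."""
    g = [Fraction(1)]
    for k in range(1, n + 1):
        g = poly_mul(g, [Fraction(k - 1, k + 1), Fraction(2, k + 1)])
    return g

def mean_and_variance(pgf):
    """Mean g'(1) and variance g''(1) + g'(1) - g'(1)^2 of a PGF."""
    m1 = sum(j * p for j, p in enumerate(pgf))            # first moment
    m2 = sum(j * (j - 1) * p for j, p in enumerate(pgf))  # second factorial moment
    return m1, m2 + m1 - m1 * m1

def harmonic(n, r=1):
    """Generalized harmonic number H_n^{(r)} as an exact fraction."""
    return sum(Fraction(1, k ** r) for k in range(1, n + 1))

n = 3
mean, var = mean_and_variance(unsuccessful_pgf(n))
print(mean, var)  # 13/6 17/36
```

Under these assumptions the exact values agree with the harmonic-number expressions $2(H_{n+1}-1)$ for the mean and $2H_{n+1}-4H_{n+1}^{(2)}+2$ for the variance, which gives an independent check for small $n$.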
1.2 Successful Search
The probability generating function for , given , obeys a recursion
Note that always. Differentiating with respect to :
we have first moment
that is,
where and . Clearly and . Differentiating again:
we have second factorial moment
that is,
where and . Clearly and . Finally, we have variance which is when and when .
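For small $n$, the successful search moments can also be checked by brute force. The Python sketch below enumerates every insertion order, builds the corresponding BST and tallies the cost of finding each stored key. It assumes the key set is $1, 3, \ldots, 2n-1$ (the text only says the keys are odd) and that cost counts the nodes visited, the root included; the helper names are hypothetical and the code is not the paper's R program.

```python
from fractions import Fraction
from itertools import permutations

def search_costs(insertion_order):
    """Build a BST by successive insertion; return {key: nodes visited to find it}."""
    left, right, cost = {}, {}, {}
    root = insertion_order[0]
    cost[root] = 1
    for key in insertion_order[1:]:
        node = root
        while True:
            child = left if key < node else right
            if node not in child:          # free slot: attach the new key here
                child[node] = key
                cost[key] = cost[node] + 1
                break
            node = child[node]
    return cost

def successful_search_distribution(n):
    """Exact cost distribution over all n! insertion orders and all n keys."""
    keys = range(1, 2 * n, 2)              # assumed key set: 1, 3, ..., 2n-1
    counts, total = {}, 0
    for order in permutations(keys):
        for c in search_costs(order).values():
            counts[c] = counts.get(c, 0) + 1
            total += 1
    return {c: Fraction(w, total) for c, w in sorted(counts.items())}

def harmonic(n, r=1):
    """Generalized harmonic number H_n^{(r)} as an exact fraction."""
    return sum(Fraction(1, k ** r) for k in range(1, n + 1))

n = 3
dist = successful_search_distribution(n)
mean = sum(c * p for c, p in dist.items())
var = sum(c * c * p for c, p in dist.items()) - mean ** 2
print(mean, var)  # 17/9 44/81
```

With this cost convention the enumeration matches the harmonic-number forms $2(1+1/n)H_n-3$ for the mean and $(2+10/n)H_n-4(1+1/n)(H_n^2/n+H_n^{(2)})+4$ for the variance. The running time grows like $n \cdot n!$, so this is only practical for $n$ up to about 8.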
1.3 Total Path Length
The total (internal) path length is the sum of taken over all odd integers from to . It is not surprising that calculations are more involved here than before. The probability generating function for , given , obeys a recursion [6]
Note that always. Differentiating with respect to :
we have first moment
that is,
where and . Clearly , , and . Differentiating again:
we have second factorial moment
that is,
where and . Clearly , , and . Finally, we have variance which is when and when .
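The path length moments can be reproduced from a PGF recursion of quicksort type, $F_n(z) = \frac{z^{n-1}}{n}\sum_{k=0}^{n-1} F_k(z)F_{n-1-k}(z)$ with $F_0(z)=1$. The Python sketch below iterates this recursion with exact rational coefficients. It assumes that path length is the sum of node depths measured in edges; if the search cost counts nodes instead, the total differs by the additive constant $n$ and the PGF by a factor $z^n$, so this may not be literally the recursion quoted from [6]. Function names are illustrative.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def bst_path_length_pgfs(n_max):
    """Return [F_0, ..., F_{n_max}] for the internal path length of a random BST."""
    F = [[Fraction(1)]]                                  # F_0(z) = 1 (empty tree)
    for n in range(1, n_max + 1):
        total = [Fraction(0)]
        for k in range(n):                               # k keys left of the root
            term = poly_mul(F[k], F[n - 1 - k])
            total += [Fraction(0)] * (len(term) - len(total))
            for i, c in enumerate(term):
                total[i] += c
        # multiply by z^(n-1): each of the n-1 non-root keys gains one edge
        F.append([Fraction(0)] * (n - 1) + [c / n for c in total])
    return F

def mean_and_variance(pgf):
    """Mean g'(1) and variance g''(1) + g'(1) - g'(1)^2 of a PGF."""
    m1 = sum(j * p for j, p in enumerate(pgf))
    m2 = sum(j * (j - 1) * p for j, p in enumerate(pgf))
    return m1, m2 + m1 - m1 * m1

def harmonic(n, r=1):
    """Generalized harmonic number H_n^{(r)} as an exact fraction."""
    return sum(Fraction(1, k ** r) for k in range(1, n + 1))

F = bst_path_length_pgfs(10)
print(mean_and_variance(F[3]))  # mean 8/3, variance 2/9
```

Under the edge-counting convention the output agrees with $2(n+1)H_n-4n$ for the mean and $7n^2-4(n+1)^2H_n^{(2)}-2(n+1)H_n+13n$ for the variance.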
1.4 Higher Moments
A third moment expression appears in [10] for successful search; analogous work for unsuccessful search remains undone. We focus on total (internal) path length for BSTs. The cumulants , , … , of were exhaustively studied by Hennequin [11, 12]; these asymptotically satisfy
as , where
Hoffman & Kuba [13] obtained a complicated recurrence for an associated sequence of rationals [14, 15]:
using what they called tiered binomial coefficients. While they utilized notation , we adopt . It suffices to say that and a rich theory about for awaits discovery. We give Mathematica code for generating :
and code for generating , given , , … , :
This final line employs a well-known expression for cumulants in terms of partial (or incomplete) Bell polynomials of central moments.
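For readers without Mathematica, the conversion from moments to cumulants can be sketched in Python. The version below does not use Bell polynomials of central moments; it swaps in the equivalent recursion on raw moments, $\kappa_n = m_n - \sum_{k=1}^{n-1}\binom{n-1}{k-1}\kappa_k\, m_{n-k}$. The function name and the input convention (a list starting at the first raw moment) are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def cumulants_from_raw_moments(moments):
    """Given [m_1, ..., m_r] with m_i = E[X^i], return [kappa_1, ..., kappa_r]."""
    kappa = []
    for n in range(1, len(moments) + 1):
        s = moments[n - 1]
        for k in range(1, n):
            # subtract C(n-1, k-1) * kappa_k * m_{n-k}
            s -= comb(n - 1, k - 1) * kappa[k - 1] * moments[n - k - 1]
        kappa.append(s)
    return kappa

# Sanity check: a Poisson(1) variable has raw moments 1, 2, 5, 15, 52 (Bell numbers)
# and every cumulant equal to 1.
print(cumulants_from_raw_moments([1, 2, 5, 15, 52]))  # [1, 1, 1, 1, 1]

# A Bernoulli(1/3) variable has all raw moments equal to 1/3.
p = Fraction(1, 3)
print(cumulants_from_raw_moments([p, p, p]))
```

Feeding exact rational moments (for instance, those read off a path length PGF) keeps the cumulants exact as well.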
2 Digital Search Trees
Consider the R program:
where is a random binary matrix with distinct rows, is initially and is initially . It is usually assumed [2, 16] that , from which the row-distinctness requirement follows almost surely (imagining the rows as binary expansions of independent Uniform numbers). If instead , as exploratively specified in [17], then the matrix would need to be generated carefully to avoid duplicate keys. To model successful searches, let be a random row of . To model unsuccessful searches, let be a random binary
-vector that is not a row of
.
2.1 Unsuccessful Search
The probability generating function for , given , is
for and
for . A closed-form expression exists [2] for when , but a corresponding simple recursive formula does not evidently materialize. Section 2.4 contains verification of these polynomial expressions.
2.2 Successful Search
2.3 Total Path Length
The total (internal) path length is the sum of taken over all rows of . It is not surprising that calculations are more involved here than before. Assume that . The probability generating function for , given , obeys a recursion [18]
Note that always. Differentiating with respect to :
we have first moment
that is,
where and . Clearly , , and . Differentiating again:
we have second factorial moment
that is,
where and . Clearly , , and . Finally, we have variance which is when and when .
Define constants
Let denote the partial product of and
It can be proved that [18, 19]
as , where
This expression for is, needless to say, a stunning result.
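Exact finite-$n$ values to set against these asymptotics can be generated from a PGF recursion for infinite keys, $F_{n+1}(z) = z^{n}\,2^{-n}\sum_{k=0}^{n}\binom{n}{k}F_k(z)F_{n-k}(z)$ with $F_0(z)=1$: the first key occupies the root and the remaining $n$ keys split binomially on their leading bit. The Python sketch below iterates it with rational arithmetic. It assumes independent fair bits and measures path length in edges (sum of node depths), which may differ by the additive constant $n$ from the convention behind the recursion quoted from [18]. Names are illustrative.

```python
from fractions import Fraction
from math import comb

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def dst_path_length_pgfs(n_max):
    """Return [F_0, ..., F_{n_max}] for a DST built from infinite random keys."""
    F = [[Fraction(1)]]                                  # F_0(z) = 1 (empty tree)
    for n in range(n_max):                               # build F_{n+1}
        total = [Fraction(0)]
        for k in range(n + 1):                           # k keys go to the left subtree
            term = poly_mul(F[k], F[n - k])
            total += [Fraction(0)] * (len(term) - len(total))
            for i, c in enumerate(term):
                total[i] += comb(n, k) * c
        # z^n shifts by one edge per non-root key; 2^(-n) weights the binomial split
        F.append([Fraction(0)] * n + [c / 2 ** n for c in total])
    return F

def mean_and_variance(pgf):
    """Mean g'(1) and variance g''(1) + g'(1) - g'(1)^2 of a PGF."""
    m1 = sum(j * p for j, p in enumerate(pgf))
    m2 = sum(j * (j - 1) * p for j, p in enumerate(pgf))
    return m1, m2 + m1 - m1 * m1

F = dst_path_length_pgfs(8)
print(F[4])                     # coefficients of (6z^4 + z^5 + z^6)/8
print(mean_and_variance(F[4]))
```

For example, $F_3(z) = (z^2+z^3)/2$: with three keys, the third lands at depth 1 or 2 according to whether its leading bit differs from that of the second.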
Assuming instead that , all we currently possess are PGFs for small :
A deeper understanding of finite-key DSTs would be welcome.
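PGFs of this kind for small parameters can be produced by exhaustive enumeration. The Python sketch below lists every ordered choice of $n$ distinct keys of a fixed bit length, inserts them into a DST and tallies the total path length. The bit length is called `m` here, a name of our own; branching on the leftmost bit first and measuring path length in edges are likewise assumptions, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def dst_path_length(keys):
    """Insert distinct bit-tuples into a DST; return the sum of node depths.

    A node is identified by the bit prefix leading to it (the root is the
    empty prefix). Keys must be distinct, otherwise the loop cannot terminate.
    """
    occupied = set()
    total = 0
    for key in keys:
        d = 0
        while key[:d] in occupied:     # descend along the key's bits
            d += 1
        occupied.add(key[:d])          # first free node on that path
        total += d
    return total

def finite_key_distribution(n, m):
    """Exact path length distribution over all ordered n-subsets of m-bit keys."""
    all_keys = list(product((0, 1), repeat=m))
    counts, total = {}, 0
    for rows in permutations(all_keys, n):
        length = dst_path_length(rows)
        counts[length] = counts.get(length, 0) + 1
        total += 1
    return {length: Fraction(c, total) for length, c in sorted(counts.items())}

print(finite_key_distribution(3, 2))  # path length 2 w.p. 2/3, 3 w.p. 1/3
print(finite_key_distribution(3, 3))
```

With three keys of two bits each, the probability of path length 3 is $1/3$ rather than the $1/2$ found for infinite keys, which illustrates how the distinctness constraint alters the distribution. The enumeration visits $2^m(2^m-1)\cdots(2^m-n+1)$ matrices, so it is feasible only for small $n$ and `m`.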
2.4 Some Combinatorics
We focus on unsuccessful searches, for both infinite keys () and finite keys (). Let us examine the coefficients of and for simplicity. The digital search trees appearing in Figure 1 for proceed from matrices
respectively. When , the indicated keys are merely abbreviations (two leading bits in an infinite sequence); hence the keys are automatically distinct; thus
where is the count of binary matrices. When , however, key-distinctness must be manually enforced. We obtain the condition
which is equivalent to and gives possibilities; also the condition
which is equivalent to and gives possibilities; therefore
where is the count of permutations of objects, taken at a time.

For and , using Figures 2 and 3, we have
but when instead, we have


For and , using Figures 4, 5 and 6, we have
but when instead, we have
The emergence of bi-triangular cases at complicates our study for . A similar argument for coefficients of , … , , as well as for successful searches, is possible.



Third and fourth moment expressions appear in [20] for unsuccessful search on infinite keys. The covariance between two random distinct successful search costs within the same tree is apparently as , where [18]
Verifying this interesting result via simulation remains open. What can be said about the cost covariance for two distinct unsuccessful searches? What can be said about the cost covariance given a successful search and an unsuccessful search?
3 Acknowledgements
I am grateful to Markus Kuba and Sumit Kumar Jha for helpful discussions, and to David Penman for providing [12] (which at one time was available at http://algo.inria.fr/).
References
- [1] G. Louchard, Exact and asymptotic distributions in digital and binary search trees, RAIRO Inform. Théor. Appl. 21 (1987) 479–495; MR0928772.
- [2] H. M. Mahmoud, Evolution of Random Search Trees, Wiley, 1992, pp. 71–91, 260–285; MR1140708.
- [3] S. R. Finch, Resolving conflicts and electing leaders, arXiv:1912.06545.
- [4] S. R. Finch, Binary search tree constants, Mathematical Constants, Cambridge Univ. Press, 2003, pp. 349–354; MR2003519.
- [5] R. Sedgewick and P. Flajolet, Introduction to the Analysis of Algorithms, Addison-Wesley, 1996, pp. 142, 162–163, 246–250.
- [6] D. E. Knuth, The Art of Computer Programming, v. 3, Sorting and Searching, 2 ed., Addison-Wesley, 1998, pp. 430–431, 455, 709; MR3077154.
- [7] W. C. Lynch, More combinatorial properties of certain trees, Computer J., v. 7 (1965) n. 4, 299–302; MR0172492.
- [8] G. D. Knott, Variance of calculation, unpublished note (1973).
- [9] P. F. Windley, Trees, forests and rearranging, Computer J., v. 3 (1960) n. 2, 84–88.
- [10] H. M. Mahmoud and R. Neininger, Distribution of distances in random binary search trees, Annals Appl. Probab. 13 (2003) 253–276; MR1951999.
- [11] P. Hennequin, Combinatorial analysis of quicksort algorithm, RAIRO Inform. Théor. Appl. 23 (1989) 317–333; MR1020477.
- [12] P. Hennequin, Analyse en moyenne d’algorithmes, tri rapide et arbres de recherche, Ph.D. thesis, École Polytechnique Palaiseau, 1991; http://www.mit.edu/~sfinch/Hennequin-thesis.pdf.
- [13] M. E. Hoffman and M. Kuba, Logarithmic integrals, zeta values, and tiered binomial coefficients, arXiv:1906.08347.
- [14] M. Cramer, A note concerning the limit distribution of the quicksort algorithm, RAIRO Inform. Théor. Appl. 30 (1996) 195–207; MR1415828.
- [15] S. B. Ekhad and D. Zeilberger, A detailed analysis of quicksort running time, arXiv:1903.03708; data output at http://sites.math.rutgers.edu/~zeilberg/tokhniot/oQuickSortAnalysis3.txt.
- [16] D. E. Knuth, The Art of Computer Programming, v. 3, Sorting and Searching, 2 ed., Addison-Wesley, 1998, pp. 500–505, 509, 726; MR3077154.
- [17] S. R. Finch, Digital search tree constants, Mathematical Constants, Cambridge Univ. Press, 2003, pp. 354–361; MR2003519.
- [18] P. Kirschenhofer, H. Prodinger and W. Szpankowski, Digital search trees again revisited: the internal path length perspective, SIAM J. Comput. 23 (1994) 598–616; MR1274646 (95i:68034).
- [19] H.-K. Hwang, M. Fuchs and V. Zacharovas, Asymptotic variance of random symmetric digital search trees, Discrete Math. Theor. Comput. Sci. 12 (2010) 103–165; MR2676668 (2012b:05232).
- [20] G. Louchard and H. Prodinger, Approximate counting with $m$ counters: a probabilistic analysis, J. Algebra Combin. Discrete Struct. Appl. 2 (2015) 191–209; MR3400765.
Steven Finch MIT Sloan School of Management Cambridge, MA, USA steven_finch@harvard.edu