1 Introduction
A subtree of a rooted tree that consists of a node and all its descendants is called a fringe subtree. Fringe subtrees are a natural object of study in the context of random trees, and there are numerous results for various random tree models, see e.g. [3, 9, 11, 13].
Fringe subtrees are of particular interest in computer science: One of the most important and widely used lossless compression methods for rooted trees is to represent a tree as a directed acyclic graph, which is obtained by merging nodes that are roots of identical fringe subtrees. This compressed representation of the tree is often referred to simply as the minimal DAG, and its size (number of nodes) is the number of distinct fringe subtrees occurring in the tree. Compression by minimal DAGs has found numerous applications in various areas of computer science, for example in compiler construction [2, Chapter 6.1 and 8.5], unification [25], symbolic model checking (binary decision diagrams) [7], information theory [21, 28] and XML compression and querying [8, 20].
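To make the notion of a minimal DAG concrete, the following is a minimal sketch of how its size can be computed by merging identical fringe subtrees bottom-up. The tuple encoding of trees and the function name `minimal_dag_size` are assumptions made for this illustration only.

```python
# Assumed encoding (illustration only): () is a leaf,
# (left, right) is an inner node with an ordered pair of children.

def minimal_dag_size(tree):
    """Return the number of distinct ordered fringe subtrees of `tree`,
    i.e. the number of nodes of its minimal DAG."""
    ids = {}  # key of a fringe subtree -> id of its DAG node

    def visit(t):
        # A leaf has key (); an inner node is keyed by the ids of its children,
        # so two nodes receive the same id iff their fringe subtrees are identical.
        key = () if t == () else (visit(t[0]), visit(t[1]))
        if key not in ids:
            ids[key] = len(ids)
        return ids[key]

    visit(tree)
    return len(ids)


leaf = ()
cherry = (leaf, leaf)
print(minimal_dag_size((cherry, cherry)))                  # 3: leaf, cherry, root
print(minimal_dag_size(((leaf, cherry), (cherry, leaf))))  # 5
```

The sketch uses plain recursion, so it is only suitable for trees of moderate height.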
In this work, we investigate the number of fringe subtrees in random binary trees, i.e. random trees such that each node has either exactly two or no children. So far, this problem has mainly been studied with respect to ordered fringe subtrees in random ordered binary trees: A uniformly random ordered binary tree of size $n$ (with $n$ leaves) is a random tree whose probability distribution is the uniform probability distribution on the set of ordered binary trees of size $n$. In [19], Flajolet, Sipala and Steyaert proved that the expected number of distinct ordered fringe subtrees in a uniformly random ordered binary tree of size $n$ is asymptotically equal to $c \cdot n/\sqrt{\ln n}$, where $c$ is the constant $2\sqrt{\ln 4/\pi}$. This result of Flajolet et al. was extended to unranked labelled trees in [6] (for a different constant $c$). Moreover, an alternative proof of the result of Flajolet et al. was presented in [26] in the context of simply generated families of trees.
Another important type of random trees are so-called random binary search trees: A random binary search tree of size $n$ is a binary search tree built by inserting the keys $1, \ldots, n-1$ according to a uniformly chosen random permutation on $\{1, \ldots, n-1\}$. Random binary search trees naturally arise in theoretical computer science, see e.g. [12]. In [17], Flajolet, Gourdon and Martinez proved that the expected number of distinct ordered fringe subtrees in a random binary search tree of size $n$ is $O(n/\log n)$. This result was improved in [10] by Devroye, who showed that the asymptotics $\Theta(n/\log n)$ holds. Moreover, the result of Devroye was generalized from random binary search trees to a broader class of random ordered binary trees in [27], where the problem of estimating the expected number of distinct ordered fringe subtrees in random binary trees was considered in the context of so-called leaf-centric binary tree sources, which were introduced in [23, 28] as a general framework for modeling probability distributions on the set of ordered binary trees of size $n$.
In this work, we focus on estimating the number of nonisomorphic fringe subtrees in random ordered binary trees, where we call two binary trees nonisomorphic if they are distinct as unordered binary trees. This question arises quite naturally for example in the context of XML compression: Here, one distinguishes between so-called document-centric XML, for which the corresponding XML document trees are ordered, and data-centric XML, for which the corresponding XML document trees are unordered. Understanding the interplay between ordered and unordered structures has thus received considerable attention in the context of XML (see, for example, [1, 5, 29]). In particular, in [24], it was investigated whether tree compression can benefit from unorderedness. For this reason, so-called unordered minimal DAGs were considered. An unordered minimal DAG of a binary tree is a directed acyclic graph obtained by merging nodes that are roots of isomorphic fringe subtrees, i.e. of fringe subtrees which are identical as unordered trees. From such an unordered minimal DAG, an unordered representation of the original tree can be uniquely retrieved. The size of this compressed representation is the number of nonisomorphic fringe subtrees occurring in the tree. So far, only some worst-case estimates comparing the size of a minimal DAG to the size of its corresponding unordered minimal DAG are known: Among other things, it was shown in [24] that the size of an unordered minimal DAG of a binary tree can be exponentially smaller than the size of the corresponding (ordered) minimal DAG.
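The size of an unordered minimal DAG can be computed in the same bottom-up fashion as that of an ordered one, provided that the two children of a node are treated as an unordered pair. The following sketch illustrates this; the tuple encoding of trees and the name `unordered_dag_size` are assumptions made for this illustration only.

```python
# Assumed encoding (illustration only): () is a leaf,
# (left, right) is an inner node.

def unordered_dag_size(tree):
    """Return the number of nonisomorphic fringe subtrees of `tree`,
    i.e. the number of nodes of its unordered minimal DAG."""
    ids = {}  # key of an isomorphism class -> id of its DAG node

    def visit(t):
        if t == ():
            key = ()
        else:
            # Sorting the two child ids makes the key independent of the
            # order of the children, so isomorphic subtrees share one id.
            key = tuple(sorted((visit(t[0]), visit(t[1]))))
        if key not in ids:
            ids[key] = len(ids)
        return ids[key]

    visit(tree)
    return len(ids)


leaf = ()
cherry = (leaf, leaf)
# The two children of the root are mirror images of each other:
# they are distinct as ordered trees but isomorphic as unordered trees.
print(unordered_dag_size(((leaf, cherry), (cherry, leaf))))  # 4
```

For the example tree, the ordered minimal DAG has five nodes, whereas the unordered one has only four.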
However, no average-case estimates comparing the size of the minimal DAG of a binary tree to the size of the corresponding unordered minimal DAG are known so far. In particular, in [24] it is stated as an open problem to estimate the expected number of nonisomorphic fringe subtrees in a uniformly random ordered binary tree of size $n$, and it is conjectured that this number asymptotically grows as $\Theta(n/\sqrt{\ln n})$.
In this work, as one of our main theorems, we settle this open conjecture by proving upper and lower bounds of order $n/\sqrt{\ln n}$ for the number of nonisomorphic fringe subtrees which hold both in expectation and with high probability (i.e., with probability tending to $1$ as $n \to \infty$). Our approach can also be used to obtain an analogous result for random binary search trees, though the order of magnitude changes to $n/\ln n$. Again, we have upper and lower bounds in expectation and with high probability. Our two main theorems read as follows.
Theorem 1
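Let $F_n$ be the total number of nonisomorphic fringe subtrees in a uniformly random ordered binary tree with $n$ leaves. For two constants $c_1$ and $c_2$, the following holds:

(i) $c_1 \frac{n}{\sqrt{\ln n}}(1+o(1)) \le \mathbb{E}(F_n) \le c_2 \frac{n}{\sqrt{\ln n}}(1+o(1))$,

(ii) $c_1 \frac{n}{\sqrt{\ln n}}(1+o(1)) \le F_n \le c_2 \frac{n}{\sqrt{\ln n}}(1+o(1))$ with high probability.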
Theorem 2
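Let $G_n$ be the total number of nonisomorphic fringe subtrees in a random binary search tree with $n$ leaves. For two constants $c_3$ and $c_4$, the following holds:

(i) $c_3 \frac{n}{\ln n}(1+o(1)) \le \mathbb{E}(G_n) \le c_4 \frac{n}{\ln n}(1+o(1))$,

(ii) $c_3 \frac{n}{\ln n}(1+o(1)) \le G_n \le c_4 \frac{n}{\ln n}(1+o(1))$ with high probability.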
To prove the above Theorems 1 and 2, we refine techniques from [26]. Our proof technique also applies to the problem of estimating the number of distinct ordered fringe subtrees in uniformly random binary trees or in random binary search trees. In this case, upper and lower bounds for the expected value have already been proven by other authors. Our new contribution is to show that they also hold with high probability.
Theorem 3
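Let $H_n$ denote the total number of distinct fringe subtrees in a uniformly random ordered binary tree with $n$ leaves. Then, for the constant $c = 2\sqrt{\ln 4/\pi}$, the following holds:

(i) $\mathbb{E}(H_n) = c \frac{n}{\sqrt{\ln n}}(1+o(1))$,

(ii) $H_n = c \frac{n}{\sqrt{\ln n}}(1+o(1))$ with high probability.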
Here, the first part (i) was already shown in [19] and [26], part (ii) is new. Similarly, we are able to strengthen the results of [10] and [27]:
Theorem 4
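Let $J_n$ be the total number of distinct fringe subtrees in a random binary search tree with $n$ leaves. For two constants $c_5$ and $c_6$, the following holds:

(i) $c_5 \frac{n}{\ln n}(1+o(1)) \le \mathbb{E}(J_n) \le c_6 \frac{n}{\ln n}(1+o(1))$,

(ii) $c_5 \frac{n}{\ln n}(1+o(1)) \le J_n \le c_6 \frac{n}{\ln n}(1+o(1))$ with high probability.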
2 Preliminaries
Let $\mathcal{T}$ denote the set of ordered binary trees, i.e. of ordered rooted trees such that each node has either exactly two or no children. We define the size $|t|$ of a binary tree $t$ as the number of leaves of $t$, and by $\mathcal{T}_k$ we denote the set of binary trees of size $k$ for every integer $k \ge 1$. It is well known that $|\mathcal{T}_k| = C_{k-1}$, where $C_k$ denotes the $k$th Catalan number [18]: We have
(1) $C_k = \frac{1}{k+1}\binom{2k}{k} = \frac{4^k}{\sqrt{\pi}\, k^{3/2}}\,(1 + O(1/k))$,
where the asymptotic growth of the Catalan numbers follows from Stirling’s Formula [18]. Analogously, let $\mathcal{U}$ denote the set of unordered binary trees, i.e. of unordered rooted trees such that each node has either exactly two or no children. The size $|u|$ of an unordered tree $u$ is again the number of leaves of $u$, and by $\mathcal{U}_k$ we denote the set of unordered binary trees of size $k$. We have $|\mathcal{U}_k| = W_k$, where $W_k$ denotes the $k$th Wedderburn-Etherington number. Their asymptotic growth is
(2) $W_k \sim A \cdot k^{-3/2} \cdot b^k$
for certain positive constants $A, b$ [4, 16]. In particular, we have $b \approx 2.4832535362$.
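For small sizes, both counting sequences are easy to tabulate. The following sketch computes the number of ordered binary trees with $n$ leaves via the closed formula for the Catalan numbers, and the Wedderburn-Etherington numbers via their standard recurrence (split at the root into two subtrees of sizes $i \le n-i$, counting unordered pairs when both sizes are equal). The function names are assumptions made for this illustration only.

```python
from math import comb


def catalan(k):
    """k-th Catalan number C_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def ordered_count(n):
    """Number of ordered binary trees with n >= 1 leaves, i.e. C_{n-1}."""
    return catalan(n - 1)


def wedderburn_etherington(n_max):
    """Return a list w with w[n] = number of unordered binary trees
    with n leaves for 1 <= n <= n_max (w[0] is an unused placeholder)."""
    w = [0, 1]
    for n in range(2, n_max + 1):
        # Root subtrees of different sizes i < n - i.
        total = sum(w[i] * w[n - i] for i in range(1, (n + 1) // 2))
        if n % 2 == 0:
            # Both root subtrees have size n/2: count unordered pairs.
            half = w[n // 2]
            total += half * (half + 1) // 2
        w.append(total)
    return w


print([ordered_count(n) for n in range(1, 9)])  # [1, 1, 2, 5, 14, 42, 132, 429]
print(wedderburn_etherington(8)[1:])            # [1, 1, 1, 2, 3, 6, 11, 23]
```

The much slower growth of the second sequence reflects the fact that the growth constant of the Wedderburn-Etherington numbers is smaller than $4$.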
A fringe subtree of a binary tree is a subtree consisting of a node and all its descendants. For a binary tree $t$ and a given node $v \in t$, let $t(v)$ denote the fringe subtree of $t$ rooted at $v$. Two fringe subtrees are called distinct if they are distinct as ordered binary trees.
Every ordered binary tree $t$ of size $k$ can be considered as an unordered binary tree of size $k$ by simply forgetting the ordering on $t$’s nodes. If two ordered binary trees $t_1, t_2$ correspond to the same unordered tree $u$, we call them isomorphic: Thus, we obtain a partition of the set of ordered binary trees of size $k$ into isomorphism classes. If two binary trees $t_1, t_2$ belong to the same isomorphism class, we can obtain $t_1$ from $t_2$ and vice versa by reordering the children of some of $t_1$’s (respectively, $t_2$’s) inner nodes. An inner node $v$ of an ordered or unordered binary tree $t$ is called a symmetrical node if the fringe subtrees rooted at $v$’s children are isomorphic. Let $\mathrm{sym}(t)$ denote the number of symmetrical nodes of $t$. The cardinality of the automorphism group of $t$ is given by $|\mathrm{Aut}(t)| = 2^{\mathrm{sym}(t)}$. Thus, by the orbit-stabilizer theorem, there are $2^{k-1-\mathrm{sym}(t)}$ many ordered binary trees in the isomorphism class of an ordered binary tree $t$ of size $k$, and likewise $2^{k-1-\mathrm{sym}(u)}$ many ordered representations of an unordered binary tree $u$ of size $k$.
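The relation between symmetrical nodes and the size of an isomorphism class can be checked by brute force for small trees. The sketch below counts the symmetrical nodes of an ordered binary tree with $k$ leaves and compares $2^{k-1-s}$, where $s$ is the number of symmetrical nodes, with the number of ordered trees in the same isomorphism class obtained by exhaustive enumeration. The tuple encoding and all function names are assumptions made for this illustration only.

```python
# Assumed encoding (illustration only): () is a leaf,
# (left, right) is an inner node.

def canonical(t):
    """Canonical representative of t as an unordered tree:
    the children of every inner node are put into sorted order."""
    if t == ():
        return ()
    return tuple(sorted((canonical(t[0]), canonical(t[1]))))


def symmetrical_nodes(t):
    """Number of inner nodes whose two child subtrees are isomorphic."""
    if t == ():
        return 0
    here = 1 if canonical(t[0]) == canonical(t[1]) else 0
    return here + symmetrical_nodes(t[0]) + symmetrical_nodes(t[1])


def leaves(t):
    return 1 if t == () else leaves(t[0]) + leaves(t[1])


def ordered_trees(n):
    """All ordered binary trees with n leaves (exponential; small n only)."""
    if n == 1:
        return [()]
    return [(l, r)
            for i in range(1, n)
            for l in ordered_trees(i)
            for r in ordered_trees(n - i)]


def isomorphism_class_size(t):
    """Number of ordered trees isomorphic to t, by exhaustive enumeration."""
    target = canonical(t)
    return sum(1 for s in ordered_trees(leaves(t)) if canonical(s) == target)


caterpillar = ((((), ()), ()), ())
print(symmetrical_nodes(caterpillar), isomorphism_class_size(caterpillar))  # 1 4
```

For the caterpillar with four leaves, only the lowest inner node is symmetrical, so the formula predicts $2^{4-1-1} = 4$ ordered trees in its isomorphism class.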
We consider two types of probability distributions on the set $\mathcal{T}_n$ of ordered binary trees of size $n$:

The uniform probability distribution on $\mathcal{T}_n$, that is, every binary tree of size $n$ is assigned the same probability $1/C_{n-1}$, where $C_{n-1}$ is the $(n-1)$th Catalan number. A random variable taking values in $\mathcal{T}_n$ according to the uniform probability distribution is called a uniformly random (ordered) binary tree of size $n$.
Before we start with proving our main results, we need two preliminary lemmas on the number of fringe subtrees in uniformly random ordered binary trees and in random binary search trees:
Lemma 1
Let be positive real numbers with . For every positive integer with , let be a set of ordered binary trees with leaves. We denote the cardinality of by . Let denote the (random) number of fringe subtrees with leaves in a uniformly random ordered binary tree with leaves that belong to . Moreover, let denote the (random) number of arbitrary fringe subtrees with more than leaves in a uniformly random ordered binary tree with leaves. We have

for all with , the constant being independent of ,

for all with , again with an constant that is independent of ,

and

with high probability, the following statements hold simultaneously:

for all with ,

.

We emphasize (since it will be important later) that the inequality in part (4), item (i), does not only hold with high probability for each individual , but that it is satisfied with high probability for all in the given range simultaneously.
Proof
(1) Recall first that the number of ordered binary trees with leaves is the Catalan number . We observe that every occurrence of a fringe subtree in in a tree with leaves can be obtained by choosing an ordered tree with leaves, picking one of the leaves and replacing it by a tree in . Thus the total number of occurrences is
Consequently, the average number is
by Stirling’s formula (the constant being independent of in the indicated range).
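The counting argument in part (1) can be checked by brute force for small sizes. Under the notation assumed here (a set of $a$ ordered binary trees with $k$ leaves, host trees with $n$ leaves, and $C_m$ the $m$th Catalan number), the construction described above suggests that the total number of occurrences is $a \cdot (n-k+1) \cdot C_{n-k}$, so that the average is $a (n-k+1) C_{n-k} / C_{n-1}$, which Stirling’s formula turns into $a \cdot 4^{1-k} \cdot n \cdot (1 + O(k/n))$. The following sketch compares the count with an exhaustive enumeration; the tuple encoding and all names are assumptions made for this illustration only.

```python
from math import comb

# Assumed encoding (illustration only): () is a leaf,
# (left, right) is an inner node.


def catalan(m):
    return comb(2 * m, m) // (m + 1)


def ordered_trees(n):
    """All ordered binary trees with n leaves (exponential; small n only)."""
    if n == 1:
        return [()]
    return [(l, r)
            for i in range(1, n)
            for l in ordered_trees(i)
            for r in ordered_trees(n - i)]


def fringe_subtrees(t):
    """Yield the fringe subtree rooted at every node of t."""
    yield t
    if t != ():
        yield from fringe_subtrees(t[0])
        yield from fringe_subtrees(t[1])


def total_occurrences(n, targets):
    """Count, over all ordered trees with n leaves, the fringe subtrees
    that belong to the given collection `targets`."""
    targets = set(targets)
    return sum(1
               for t in ordered_trees(n)
               for s in fringe_subtrees(t)
               if s in targets)


def predicted_occurrences(n, k, num_targets):
    """Choose a host tree with n-k+1 leaves, one of its leaves, and a target."""
    return num_targets * (n - k + 1) * catalan(n - k)


cherry = ((), ())
print(total_occurrences(4, [cherry]), predicted_occurrences(4, 2, 1))  # 6 6
```

Dividing either count by `catalan(n - 1)` gives the expected number of occurrences in a uniformly random ordered binary tree with $n$ leaves.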
(2) The variance is determined in a similar fashion: we first count the total number of pairs of fringe subtrees in that appear in the same ordered tree with leaves. Each such pair can be obtained as follows: take an ordered tree with leaves, pick two leaves, and replace them by fringe subtrees in . The total number is thus
giving us
again by Stirling’s formula. The second moment and the variance are now derived from this formula in a straightforward fashion: We find
and thus, as ,
(3) To obtain the estimate for , we observe that the average total number of fringe subtrees with leaves is
where the estimate follows from Stirling’s formula again for . Summing over all , we get
(4) For the second part, we apply Chebyshev’s inequality to obtain concentration of :
Hence, by the union bound, the probability that the stated inequality fails for any in the given range is only , proving that the first statement holds with high probability. Finally, Markov’s inequality implies that
showing that the second inequality holds with high probability as well.
For the number of fringe subtrees in random binary search trees, a very similar lemma holds:
Lemma 2
Let be positive real numbers with and let and denote positive integers. Moreover, for every , let be a set of ordered binary trees with leaves and let denote the probability that a random binary search tree is contained in , that is, , where the sum is taken over all binary trees in . Let denote the (random) number of fringe subtrees with leaves in a random binary search tree with leaves that belong to . Moreover, let denote the (random) number of arbitrary fringe subtrees with more than leaves in a random binary search tree with leaves. We have

for ,

for all with , where the constant is independent of ,

and

with high probability, the following statements hold simultaneously:

for all with ,

.

Proof
(1) In order to estimate , we define as the (random) number of arbitrary fringe subtrees with leaves in a random binary search tree with leaves. That is, for . Applying the law of total expectation, we find
As conditioned on for some integer is binomially distributed with parameters and , we find and hence
With
(see for example [14]), the statement follows.
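The expected number of arbitrary fringe subtrees of a given size in a random binary search tree can be computed exactly from the recursive structure of the model. The sketch below assumes the standard description of a random binary search tree with $n \ge 2$ leaves: the left root subtree has $i$ leaves with $i$ uniform on $\{1, \ldots, n-1\}$, and both root subtrees are again independent random binary search trees. Under this assumption, the recursion reproduces the closed form $2n/(k(k+1))$ for fringe subtrees with $k < n$ leaves, which is presumably the formula cited from [14]. The function name is an assumption made for this illustration only.

```python
from fractions import Fraction


def expected_fringe_counts(n_max, k):
    """Return a list e with e[n] = expected number of fringe subtrees with
    exactly k leaves in a random binary search tree with n leaves."""
    e = [Fraction(0)] * (n_max + 1)
    prefix = Fraction(0)  # running sum e[1] + ... + e[n-1]
    for n in range(1, n_max + 1):
        if n == k:
            # Only the whole tree has k leaves.
            e[n] = Fraction(1)
        elif n > k:
            # Average over the uniform split (i, n - i) at the root:
            # (1/(n-1)) * sum_i (e[i] + e[n-i]) = (2/(n-1)) * sum_i e[i].
            e[n] = 2 * prefix / (n - 1)
        prefix += e[n]
    return e


e = expected_fringe_counts(10, 3)
print(e[10], Fraction(2 * 10, 3 * 4))  # 5/3 5/3
```

Exact rational arithmetic is used so that the comparison with the closed form is not affected by rounding.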
(2) In order to estimate , we apply the law of total variance:
Again as conditioned on for some integer is binomially distributed with parameters and , we find and . Thus, we have
With and
(see for example [14]), this yields
(3) In order to estimate , first observe that
With for and , this yields
(4) For the second part of the statement, we apply Chebyshev’s inequality to obtain:
Hence, by the union bound, the probability that the stated inequality fails for any in the given range is , proving that the given statement holds with high probability. Furthermore, with Markov’s inequality, we find
Thus, the second inequality holds with high probability as well.
3 Fringe Subtrees in Uniformly Random Binary Trees
3.1 Ordered Fringe Subtrees
We provide the proof of Theorem 3 first, since it is simplest and provides us with a template for the other proofs. Basically, it is a refinement of the proof for the corresponding special case of Theorem 3.1 in [26]. In the following sections, we refine the argument further to prove Theorems 1, 2 and 4.
Proof (Proof of Theorem 3)
We prove the statement in two steps: In the first step, we show that the upper bound holds for both in expectation and with high probability. In the second step, we prove the corresponding lower bound.
The upper bound: Let . The number of distinct fringe subtrees in a uniformly random ordered binary tree with leaves equals (i) the number of such distinct fringe subtrees of size at most plus (ii) the number of such distinct fringe subtrees of size greater than . We upper-bound (i) by the number of all ordered binary trees of size at most (irrespective of their occurrence as fringe subtrees), which is
This upper bound holds deterministically. Furthermore, we upper-bound (ii) by the total number of fringe subtrees of size greater than occurring in the tree: We apply Lemma 1 with and and let denote the set , such that , to obtain:
in expectation and with high probability as well, as the estimate from Lemma 1 (part (4)) holds with high probability simultaneously for all in the given range. As we have
we can combine the two bounds to obtain the upper bound on stated in Theorem 3, both in expectation and with high probability.
The lower bound: Again, let and . From the first part of the proof, we find that the main contribution to the total number of fringe subtrees in a uniformly random binary tree of size comes from fringe subtrees of sizes with . Hence, in order to lower-bound the number of distinct fringe subtrees in a uniformly random binary tree with leaves, we only count distinct fringe subtrees of sizes with and show that we did not overcount too much in the first part of the proof by upper-bounding this number by the total number of fringe subtrees of sizes . To this end, let denote the number of pairs of identical fringe subtrees of size in a uniformly random ordered binary tree of size . Each such pair can be obtained as follows: Take an ordered tree with leaves, pick two leaves, and replace them by the same ordered binary tree of size . The total number of such pairs of identical fringe subtrees of size is thus
By dividing by , i.e. the total number of binary trees of size , we thus obtain the expected value:
Thus, we find
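If a binary tree of size $k$ occurs $m$ times as a fringe subtree in a uniformly random binary tree of size $n$, it contributes $\binom{m}{2}$ to the random variable counting such pairs. Since $m - \binom{m}{2} \le 1$ for all nonnegative integers $m$, we find that the total number of fringe subtrees with $k$ leaves minus the number of such pairs is a lower bound on the number of distinct fringe subtrees with $k$ leaves. Hence, we have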
The second sum is in expectation and thus with high probability as well by the Markov inequality. As the first sum is both in expectation and with high probability by our estimate from the first part of the proof, the statement of Theorem 3 follows.
As the main idea of the proof is to split the number of distinct fringe subtrees into the number of distinct fringe subtrees of size at most plus the number of distinct fringe subtrees of size greater than for some suitably chosen integer , this type of argument is called a cut-point argument and the integer is called the cut-point (see [17]). This basic technique is applied in several previous papers to similar problems (see for instance [10], [17], [26], [27]). Moreover, we remark that the statement of Theorem 3 can be easily generalized to simply generated families of trees.
3.2 Unordered Fringe Subtrees
In this subsection, we prove Theorem 1. For this, we refine the cut-point argument we applied in the proof of Theorem 3: In particular, for the lower bound on , we need a result due to Bóna and Flajolet [4] on the number of automorphisms of a uniformly random ordered binary tree. It is stated for random phylogenetic trees in [4], but the two probabilistic models are equivalent.
Theorem 5 ([4], Theorem 2)
Consider a uniformly random ordered binary tree with leaves, and let be the cardinality of its automorphism group. The logarithm of this random variable satisfies a central limit theorem: For certain positive constants and , we have
for every real number . The numerical value of the constant is .
With Theorem 5, we are able to upper-bound the probability that two fringe subtrees of the same size are isomorphic in our proof of Theorem 1:
Proof (Proof of Theorem 1)
We prove the statement in two steps: First, we show that the upper bound on stated in Theorem 1 holds both in expectation and with high probability, then we prove the respective lower bound.
The upper bound: The proof for the upper bound in Theorem 1 exactly matches the first part of the proof of Theorem 3, except that we choose a different cut-point: Let , where is the constant in the asymptotic formula (2) for the Wedderburn-Etherington numbers. We then find
both in expectation and with high probability, where the estimates for and follow again from Lemma 1. We have .
The lower bound: As a consequence of Theorem 5, the probability that the cardinality of the automorphism group of a uniformly random binary tree of size satisfies tends to as . We define as the set of ordered trees with leaves that do not satisfy this inequality, so that . Our lower bound is based on counting only fringe subtrees in for suitable . The reason for this choice is that we have an upper bound on the number of ordered binary trees in the same isomorphism class for every tree in . Recall that the number of possible ordered representations of an unordered binary tree with leaves is given by by the orbit-stabilizer theorem. Hence, the number of ordered binary trees in the same isomorphism class as a tree is bounded above by .
Now set for some positive constant , and consider only fringe subtrees that belong to , where . By Lemma 1, the number of such fringe subtrees in a random ordered binary tree with leaves is
both in expectation and with high probability. Since , the number of fringe subtrees that belong to in a random ordered binary tree of size becomes . We show that most of these trees are the only representatives of their isomorphism classes as fringe subtrees. To this end, we consider all fringe subtrees in for some that satisfies . Let the sizes of the isomorphism classes of trees in be , so that . By definition of , we have for every . Let us condition on the event that their number is equal to for some . Each of these fringe subtrees follows a uniform distribution among the elements of , so the probability of being in an isomorphism class with elements is . Moreover, the fringe subtrees are also all independent. Let be the number of pairs of isomorphic trees among the fringe subtrees with leaves. We have
Since this holds for all , the law of total expectation yields
Since , we find that
Thus
As in the previous proof, we see that is a lower bound on the number of nonisomorphic fringe subtrees with leaves. This gives us
The second sum is negligible since it is in expectation and thus also with high probability by the Markov inequality. For the first sum, a calculation similar to that for the upper bound shows that it is
both in expectation and with high probability. Since is arbitrary, we can choose any constant smaller than for .
4 Fringe Subtrees in Random Binary Search Trees
In this section, we prove our results presented in Theorem 2 and Theorem 4 on the number of distinct, respectively, nonisomorphic fringe subtrees in a random binary search tree. In order to show the respective lower bounds of Theorem 2 and Theorem 4, we need two theorems similar to Theorem 5: The first one shows that the logarithm of the random variable , where denotes a random binary search tree of size , satisfies a central limit theorem and is needed to estimate the probability that two fringe subtrees in a random binary search tree are identical. The second one transfers the statement of Theorem 5 from uniformly random binary trees to random binary search trees and is needed in order to estimate the probability that two fringe subtrees in a random binary search tree are isomorphic. The first of these two central limit theorems is shown in [15]:
Theorem 6 ([15], Theorem 4.1)
Consider a random binary search tree with leaves, and let . The logarithm of this random variable satisfies a central limit theorem: For certain positive constants and , we have
for every real number . The numerical value of the constant is
The second of these two central limit theorems follows from a general theorem devised by Holmgren and Janson [22]: Let denote a function mapping an ordered binary tree to a real number. Moreover, given such a mapping , define by
The theorem by Holmgren and Janson states:
Theorem 7 ([22], Theorem 1.14)
Let be a random binary search tree of size . If
then for certain constants and , we have
Moreover, if , then
for every real number . In particular, we have
Note that in [22], the equivalent binary search model is considered that allows binary trees to have unary nodes, so that the index of summation has to be shifted in the sum defining . Moreover, note that if we set for and otherwise, we have
by definition of in (3), and thus Theorem 6 follows as a special case of Theorem 7. This special case is also considered in Example 8.13 of [22].
As our main application of Theorem 7, we transfer the statement of Theorem 5 from uniformly random binary trees to random binary search trees, that is, we show that if the random number denotes the size of the automorphism group of a random binary search tree with leaves, then the logarithm of this random variable satisfies a central limit theorem as well. For this, we define the function in Theorem 7 by
We thus have
that is, evaluates to the number of symmetrical nodes in . Recall that equals the size of the automorphism group of . It is not difficult to check that satisfies the conditions of Theorem 7: As for every , we have and thus as well, so that the assumptions of Theorem 7 are satisfied. In order to determine the corresponding value , we start with estimating the expectation