
Entropy rates for Horton self-similar trees

In this paper we examine planted binary plane trees. First, we provide an exact formula for the number of planted binary trees with given Horton-Strahler orders. Then, using the notion of entropy, we examine the structural complexity of random planted binary trees with N vertices. Finally, we quantify the complexity of the tree's structural properties as the tree grows in size, by evaluating the entropy rate for planted binary plane trees with N vertices and for planted binary plane trees that satisfy the Horton law with Horton exponent R.

05/26/2020


I Introduction

Tree-like structures are among the most widely observed natural patterns, occurring in applied fields of study as diverse as river and drainage networks, botanical trees and leaves, blood systems, crystals, and lightning. In addition, many processes such as branching processes, percolation, nearest-neighbor clustering, binary search trees in computer science, the spread of a disease, the spread of news on social platforms, or the propagation of gene traits can be represented as trees; see [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] and references therein.

An important measure of the branching complexity of tree graphs was proposed in hydrology by Horton [7] and Strahler [18, 8]. The Horton-Strahler ordering scheme assigns an order to each tree branch in accordance with its hierarchical importance. This measure has found practical application in many different areas, ranging from hydrology and biology to computer science and neuroscience [16, 12, 17, 10, 11, 14, 15, 9]. In particular, Horton self-similarity is an important property that describes the geometric decrease of the Horton-Strahler numbers. In recent years, questions related to Horton self-similarity have been addressed in a variety of scientific publications [19, 20, 21, 11, 17, 22, 23, 16, 6, 24].

In this work, we consider a space of uniformly distributed planar planted binary trees and determine the number of trees with given structural features, such as the number of vertices in a tree, the Horton-Strahler order of a tree, and the Horton-Strahler numbers. We also use entropy to study the structural complexity of uniformly distributed planted binary plane trees with given Horton-Strahler numbers. Furthermore, we find a closed-form formula for the entropy rate, which describes the growth of the entropy as the number of vertices of the tree grows to infinity. In particular, we consider a special class of binary trees that satisfy the Horton law with a given Horton exponent $R$ and find a closed-form formula for its entropy rate. Note that the uniform distribution on the space of planted binary plane trees with $N$ vertices is different from the uniform distribution over the space of planted binary non-plane trees, induced by the critical binary Galton-Watson process, conditioned on having $N$ vertices.

The paper is organized as follows. In Section II, we provide the main definitions and notation used throughout this paper. The formula for the number of planted binary trees with specific Horton-Strahler numbers is given in Section III. The main results are provided in Section IV, where, using the notions of entropy and entropy rate, we quantify the structural complexity of Horton self-similar trees and growing tree models. All proofs are provided in Appendix VI.

II Preliminaries

II-A Planted binary plane trees

In this paper, a tree is defined as an acyclic connected graph. A tree with one vertex labeled as the root is called a rooted tree. The presence of the root in a tree provides a natural child-parent relation between neighboring vertices. More precisely, the parent of a vertex is the vertex connected to it on the path down, towards the root, and a child of a vertex is a vertex connected to it on the path up, away from the root. Note that a vertex can have more than one child, while every vertex except the root has a unique parent and a unique parental edge that connects the vertex to its parent. A leaf is a vertex with no children. The degree of a vertex is the number of edges incident to it.

A tree is called a binary tree if each vertex has at most two children. A full binary tree is a tree in which every vertex has either zero or two children. A perfect binary tree is a binary tree in which all interior vertices have two children and all leaves have the same depth. In Figure 1 (a) and (b) we depict full and perfect binary trees, respectively. A plane tree is a rooted tree with a specified ordering for the children of each vertex. This ordering is equivalent to an embedding of the tree in the plane and provides natural left and right orientations for the children. A planted binary plane tree is a rooted tree such that its root has degree one and every other vertex is either a leaf or an internal vertex of degree three (see Section 7.2 in [25]). We call the unique edge that connects the root vertex with its only child the stem. Assuming the tree grows from the root vertex upwards, the root vertex is located at the bottom of the stem. Every planted binary plane tree with $n$ leaves has $2n-1$ edges and an even number of vertices $N = 2n$, such that $n-1$ of them are internal. For examples, see Figure 1 (c) and (d).

In this paper, we consider the space of finite unlabeled rooted binary plane trees with no edge lengths. We denote this space by $\mathcal{T}$. Let $\mathcal{T}_N$ be the space of all planted binary plane trees with $n$ leaves and $N = 2n$ vertices. The number of possible configurations of a planted binary plane tree with $N$ vertices is given by the $(n-1)$th Catalan number [25] as follows

$$|\mathcal{T}_N| = C_{n-1} = \frac{1}{n}\binom{2n-2}{n-1} = \frac{(2n-2)!}{n!\,(n-1)!},$$

where $n = N/2$ is the number of leaves.

II-B Horton-Strahler ordering

The Horton-Strahler ordering of the vertices and branches in a binary tree is performed, from the leaves to the root node, by hierarchical counting [16, 11, 7, 8, 13, 26] as follows:

• each leaf is assigned order $1$;

• an internal vertex with children of orders $i$ and $j$ is assigned the order $\max(i, j) + \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta;

• the parental edge of a vertex has the same order as the vertex;

• a branch of order $k$ is a sequence of neighboring vertices of order $k$ together with their corresponding parental edges.

The order of a non-empty tree is defined as the maximal order of its vertices. The Horton-Strahler numbers of a tree are defined as the set of numbers $N_k$, $k = 1, \dots, K$, where $N_k$ is the number of branches of order $k$ and $K$ is the order of the tree. Note that

1. in order to have a branch of order $k+1$ we need to have at least two branches of order $k$, i.e., $N_k \geq 2N_{k+1}$, $k = 1, \dots, K-1$;

2. a planted binary plane tree of order $K$ has only one branch of order $K$, i.e., $N_K = 1$.

In this work, we consider only admissible sequences, defined as the sequences that satisfy the two conditions described above. We call an admissible sequence a set of Horton-Strahler numbers. To illustrate the Horton-Strahler ordering of a planted binary plane tree, consider the example in Figure 2.
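The ordering rules above can be sketched in a few lines of Python. This is only an illustration under an assumed encoding that does not come from the paper: a leaf is the empty tuple `()`, an internal vertex is a pair `(left, right)`, and the stem is implicit (it carries the order of the top vertex). The function name `horton_strahler` is likewise hypothetical.

```python
from collections import Counter


def horton_strahler(tree):
    """Return (order, branch_counts) for a binary plane tree.

    Assumed encoding (illustrative only): a leaf is (), an internal
    vertex is a pair (left_subtree, right_subtree).
    branch_counts[k] is the number of branches of order k, i.e. N_k.
    """
    if tree == ():
        # A leaf has order 1 and starts a new branch of order 1.
        return 1, Counter({1: 1})
    left_order, left_counts = horton_strahler(tree[0])
    right_order, right_counts = horton_strahler(tree[1])
    counts = left_counts + right_counts
    if left_order == right_order:
        # max(i, j) + delta_ij with i == j: a new branch of higher order starts here.
        order = left_order + 1
        counts[order] += 1
    else:
        # The higher-order branch simply continues through this vertex.
        order = max(left_order, right_order)
    return order, counts


perfect = (((), ()), ((), ()))  # perfect binary tree with 4 leaves
order, counts = horton_strahler(perfect)
print(order, [counts[k] for k in range(1, order + 1)])  # 3 [4, 2, 1]
```

For the perfect tree with four leaves the result is an admissible sequence: each $N_k$ is at least $2N_{k+1}$ and the last entry equals one.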

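Let $\mathcal{T}_{N_1 \cdots N_K}$ be the space of all planted binary plane trees with Horton-Strahler numbers $N_1, \dots, N_K$. Note that $\mathcal{T}_{N_1 \cdots N_K} \subset \mathcal{T}_N$, where $N = 2N_1$. In the next section, we present our first statement, which gives $|\mathcal{T}_{N_1 \cdots N_K}|$, the number of possible planted binary plane trees with Horton-Strahler numbers $N_1, \dots, N_K$.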

III Number of planted binary plane trees with given Horton-Strahler numbers

Lemma 1.

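The number of planted binary plane trees of order $K$ with a particular set of Horton-Strahler numbers $N_1, \dots, N_K$ and with $N$ vertices is given by the following formula

$$|\mathcal{T}_{N_1 \cdots N_K}| = 2^{\,N_1 - 1 - \sum_{i=1}^{K-1} N_{i+1}} \prod_{i=1}^{K-1} \binom{N_i - 2}{2N_{i+1} - 2}, \tag{1}$$

where $N = 2N_1$ and $N_K = 1$.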

The above lemma has been known since its publication by Shreve in 1966 [27]. A detailed proof of Lemma 1 is provided in Section VI-B. Note that, although stated for planted binary plane trees, the results of this lemma can be applied to the trees without a stem and to the trees with a [16, 6].

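III-1 Example

Suppose we want to find the number of planted binary plane trees of order $K = 3$ with the Horton-Strahler numbers $N_1 = 7$, $N_2 = 3$, $N_3 = 1$, and $N = 14$ vertices. Using formula (1), we find that the number of such trees is

$$|\mathcal{T}_{7,3,1}| = 2^{\,N_1 - 1 - \sum_{i=1}^{K-1} N_{i+1}} \prod_{i=1}^{K-1} \binom{N_i - 2}{2N_{i+1} - 2} = 2^{\,7-1-3-1} \binom{7-2}{6-2} \binom{3-2}{2-2} = 2^2\, \frac{5!}{4!\,1!} = 4 \times 5 = 20.$$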
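The arithmetic in this example can be reproduced with a short Python sketch of formula (1). The function name `count_trees` and the convention of returning 0 for a non-admissible sequence are assumptions made for illustration, not part of the paper.

```python
from math import comb


def count_trees(hs_numbers):
    """Evaluate formula (1) for Horton-Strahler numbers (N_1, ..., N_K).

    Returns 0 for sequences that are not admissible (an illustrative
    convention, not taken from the paper).
    """
    ns = list(hs_numbers)
    admissible = (
        len(ns) > 0
        and ns[-1] == 1                                   # N_K = 1
        and all(a >= 2 * b for a, b in zip(ns, ns[1:]))   # N_k >= 2 N_{k+1}
    )
    if not admissible:
        return 0
    exponent = ns[0] - 1 - sum(ns[1:])
    product = 1
    for n_i, n_next in zip(ns, ns[1:]):
        product *= comb(n_i - 2, 2 * n_next - 2)
    return 2 ** exponent * product


print(count_trees([7, 3, 1]))  # 20
```

As a sanity check, summing `count_trees` over the three admissible sequences with $N_1 = 7$, namely $(7,1)$, $(7,2,1)$ and $(7,3,1)$, gives $32 + 80 + 20 = 132$, which is the Catalan number $C_6$.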

In Figure 3, we depict all planted binary plane trees of order with vertices. In Table I we present the number of trees for different sets of Horton-Strahler numbers.

IV Entropy and entropy rates

In this section, we use Shannon entropy to quantify the structural complexity of a tree. We propose an entropy-based measure, namely the entropy rate of a tree, to examine how the structural complexity of a tree changes as the tree is allowed to grow in size. We find entropy rates for two types of trees: the planted binary plane trees with $N$ vertices and the planted binary plane trees with $N$ vertices that satisfy the Horton law with Horton exponent $R$.

IV-A Entropy and entropy rate for space $\mathcal{T}_N$

Recall that entropy is a measure of the average uncertainty in a random variable [28, 29]. For a discrete random variable $X$ with possible values $x_1, \dots, x_n$ and probability mass function $P(X)$, the entropy of $X$ is defined by

$$H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i) = -E[\log_2 P(X)],$$

where the quantity $0 \log_2 0$ is taken to be $0$. The entropy is measured in bits. We can think of $-\log_2 P(x_i)$ as the uncertainty of the outcome $x_i$ (or the “surprise” of observing $x_i$). Thus, the entropy is the average “surprise”. If the possible values of $X$ are uniformly distributed, i.e., $P(x_i) = 1/n$, then there is maximal uncertainty about the outcome, that is, maximum “surprise”. In this case, the entropy achieves its maximal value $H(X) = \log_2 n$. For example, the entropy of a fair coin toss is 1 bit. On the other hand, the occurrence of a certain event ($P(x_i) = 1$, i.e., no “surprise”) has minimal uncertainty, which corresponds to the minimal value of entropy $H(X) = 0$.

From the information-theoretic point of view, the entropy of a random variable can be thought of as the average number of bits required to describe the random variable [28]. Consider a random variable $X$ that has a uniform distribution over $8$ outcomes, e.g., an eight-sided die. The entropy of $X$ is $\log_2 8 = 3$ bits. A $3$-bit string takes on $8$ different values and is sufficient to describe the outcomes of $X$. Note that all outcomes of $X$ have representations of the same $3$-bit length. Consider now a random variable with a nonuniform distribution. Assume can take possible values with corresponding probabilities . The entropy of is bits. Using the Huffman coding technique we can encode outcome as strings , respectively. Since we use a shorter description for the more probable outcome and longer descriptions for the less probable outcomes , the average description length is equal to the value of the entropy and is exactly bits.
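The entropy values quoted above (1 bit for a fair coin, 3 bits for an eight-sided die, 0 bits for a certain event) can be checked with a minimal Python sketch. The function name `entropy` is hypothetical, and the last distribution, $(1/2, 1/4, 1/8, 1/8)$, is an assumed example of a nonuniform dyadic distribution chosen for illustration; it is not claimed to be the one used in the paper.

```python
from math import log2


def entropy(probabilities):
    """Shannon entropy in bits; terms with p = 0 contribute 0 by convention."""
    return sum(-p * log2(p) for p in probabilities if p > 0)


print(entropy([0.5, 0.5]))                 # 1.0  (fair coin)
print(entropy([1 / 8] * 8))                # 3.0  (eight-sided die)
print(entropy([1.0]))                      # 0.0  (certain event)
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 (assumed dyadic example)
```

For the assumed dyadic distribution, a prefix code with codeword lengths 1, 2, 3, 3 has average length $0.5 \cdot 1 + 0.25 \cdot 2 + 0.125 \cdot 3 + 0.125 \cdot 3 = 1.75$ bits, matching the entropy exactly.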

Consider now the space $\mathcal{T}_N$, and let $P$ be a probability measure over $\mathcal{T}_N$. We define the entropy of a random planted binary plane tree $T_N \in \mathcal{T}_N$ as follows

$$H(T_N) = -E[\log_2 P(T_N)],$$

and consider it to be the measure of the structural complexity of a tree. Informally, the larger the entropy, the more complex the tree’s dendritic structure. The entropy of a tree gives the average number of bits needed to encode it. In order to analyze the entropy’s growth rate as $N \to \infty$, we define the entropy rate to be the limit of the normalized entropies $H(T_N)/N$,

$$H_\infty = \lim_{N \to \infty} \frac{H(T_N)}{N},$$

provided that the limit exists. The entropy rate quantifies the per-vertex entropy. In other words, for large $N$ the entropy rate gives the average number of bits per vertex required to encode the tree. In fact, for large $N$ there exists an arithmetic coding scheme that encodes a tree with $N$ vertices using about $N H_\infty$ bits [30]. Arithmetic coding can get arbitrarily close to the entropy because it does not encode each vertex separately, but assigns one codeword to the entire tree. The tree can be recreated from this codeword.

The next lemma provides a formula for the entropy of a random planted binary plane tree with $N$ vertices.

Lemma 2.

For a given space $\mathcal{T}_N$, equipped with a uniform distribution, the entropy of a random tree $T_N \in \mathcal{T}_N$ is given by

$$H(T_N) = N - \log_2 N - 1 + O(\log_2 N).$$

The proof of this result is given in Section VI-C.

Corollary 1.

For a given space $\mathcal{T}_N$, equipped with a uniform distribution, the entropy rate is

$$H_\infty = \lim_{N \to \infty} \frac{H(T_N)}{N} = 1.$$

The proof of Corollary 1 is given in Section VI-D. Lemma 2 and Corollary 1 demonstrate that for large enough $N$, we need about $N$ bits per tree, or about one bit per vertex, to encode any tree $T_N \in \mathcal{T}_N$. While presented in a different context, Corollary 1 reaffirms the entropy rate of the maximum entropy model in [30].
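The convergence stated in Corollary 1 can be observed numerically. The sketch below evaluates $H(T_N) = \log_2 C_{n-1}$ exactly for the uniform distribution (as derived in Section VI-C) and prints the normalized entropy $H(T_N)/N$ for growing $N$. The function name `tree_entropy` is an assumption made for illustration.

```python
from math import comb, log2


def tree_entropy(num_vertices):
    """H(T_N) = log2(C_{n-1}) in bits for the uniform distribution, with N = 2n."""
    if num_vertices < 2 or num_vertices % 2 != 0:
        raise ValueError("N must be a positive even integer")
    n = num_vertices // 2
    catalan = comb(2 * n - 2, n - 1) // n  # (n-1)th Catalan number, exact integer
    return log2(catalan)                   # math.log2 accepts arbitrarily large ints


for N in (10, 100, 1000, 10000):
    print(N, tree_entropy(N) / N)
```

The printed ratios increase towards 1 from below; the gap is of order $(\log_2 N)/N$, in line with the $O(\log_2 N)$ correction in Lemma 2.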

IV-B Entropy and entropy rate for space $\mathcal{T}_{K,R}$

In Section II-B we introduced a space of planted binary plane trees with an arbitrary (but admissible) set of Horton-Strahler numbers $N_1, \dots, N_K$. Quite often, however, observed tree-like structures display a geometric decrease of the numbers $N_k$ of elements of Horton-Strahler order $k$. This property is known as Horton self-similarity, also referred to as the Horton law. Formally, the Horton law states the existence of the limit , where the quantity $R$ is called the Horton exponent. There are multiple models with a broad range of Horton exponents that appear in different scientific areas and have practical importance in a variety of applications [16, 13, 11, 31, 19]. For example, a perfect binary tree satisfies the Horton law with $R = 2$, while the critical binary Galton-Watson tree [16, 11, 25, 32] satisfies the Horton law with $R = 4$. Real river networks have Horton exponent in a range [33, 7], e.g., for Amazon river and for Mississippi river . In fact, for many natural tree-like structures . It was confirmed in hydrology [27, 34, 11, 35, 36, 37], biology, and other areas [13].

Next, we introduce a space of planted binary plane trees that satisfy the Horton law with Horton exponent $R$ and examine its entropy rate.

Let $\mathcal{T}_{K,R}$ be a space of planted binary trees with $N$ vertices and Horton-Strahler numbers $N_1, \dots, N_K$ that are defined in a special form as follows

$$N_k \in \left(R^{K-k} - \alpha^{K-k},\; R^{K-k} + \alpha^{K-k}\right),$$

where $\alpha$ is a constant such that $\alpha < R$. In other words, $N_k \approx R^{K-k}$ with an error dominated by the power of an exponent $\alpha$ smaller than $R$. It is easy to see that this model satisfies the Horton law with Horton exponent $R$.
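To get a feeling for this model, one can take the idealized sequence $N_k = R^{K-k}$ with integer $R \geq 2$ (the center of each interval above, an assumption made here purely for illustration) and evaluate $\log_2 |\mathcal{T}_{N_1 \cdots N_K}| / N$ with formula (1) for increasing $K$. The function names `log2_count` and `finite_rate` are hypothetical.

```python
from math import comb, log2


def log2_count(hs_numbers):
    """log2 of formula (1) for an admissible sequence (N_1, ..., N_K)."""
    ns = list(hs_numbers)
    exponent = ns[0] - 1 - sum(ns[1:])
    log_product = sum(log2(comb(a - 2, 2 * b - 2)) for a, b in zip(ns, ns[1:]))
    return exponent + log_product


def finite_rate(R, K):
    """log2|T| / N for the idealized sequence N_k = R**(K - k), integer R >= 2."""
    ns = [R ** (K - k) for k in range(1, K + 1)]
    num_vertices = 2 * ns[0]  # N = 2 * N_1
    return log2_count(ns) / num_vertices


for R in (2, 3, 4):
    print(R, [round(finite_rate(R, K), 4) for K in (4, 6, 8)])
```

For $R = 2$ the sequence describes the perfect binary tree, formula (1) gives exactly one tree, and the ratio is zero for every $K$. For $R = 4$ the ratio approaches one as $K$ grows. These finite-size values can be compared with the entropy rate obtained in this section.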

Theorem 1.

For a given space $\mathcal{T}_{K,R}$, equipped with a uniform distribution, the entropy rate is given by

$$H_\infty(R) = 1 - \frac{1 - H(2/R)}{2 - 2/R},$$

where $H(2/R)$ is the binary entropy of $2/R$.

The proof of Theorem 1 is given in Section VI-E. In Figure 4 we depict the entropy rate for a range of values of $R$. The entropy rate is zero for $R = 2$ because the dendritic structure of a perfect planted binary plane tree is predetermined for any $K$. Note that, when $R$ is allowed to grow, the entropy rate converges to $1/2$. More precisely,

$$\lim_{R \to \infty} H_\infty(R) = \lim_{R \to \infty} \left(1 - \frac{1 - H(2/R)}{2 - 2/R}\right) = \frac{1}{2}.$$
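The closed-form expression of Theorem 1 is easy to evaluate directly. The sketch below is a minimal illustration; the function names `binary_entropy` and `entropy_rate` are assumptions, and the restriction $R \geq 2$ reflects the admissibility condition $N_k \geq 2N_{k+1}$.

```python
from math import log2


def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def entropy_rate(R):
    """Entropy rate of Theorem 1 as a function of the Horton exponent R >= 2."""
    if R < 2:
        raise ValueError("Horton exponent must satisfy R >= 2")
    p = 2 / R
    return 1 - (1 - binary_entropy(p)) / (2 - p)


for R in (2, 3, 4, 5, 10, 100, 1e6):
    print(R, entropy_rate(R))
```

The values start at 0 for $R = 2$, reach 1 at $R = 4$, where $H(2/R) = H(1/2) = 1$, and then decrease towards $1/2$ as $R$ grows.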

Thus, for large $R$ and $N$ one would need about $N/2$ bits to encode the entire tree. It would be interesting to explain why trees with large enough $R$ require fewer bits to encode them than the trees with $R = 4$.

For $R = 4$ the entropy rate attains its maximal value $1$. Recall that the critical binary Galton-Watson model has parameter $R = 4$. It was noticed that the Horton exponents for real rivers are different from the theoretical parameter $R = 4$ [11]. For example, for Amazon river . Consequently, entropy rate for Amazon river is . A natural question to ask would be: what physical phenomenon causes the nonoptimality of the entropy rate of rivers? A possible explanation of this phenomenon is given in [38]: although river deltas adjust their configurations to maximize the entropy, this maximization happens within local feasibility constraints, and thus the global maximum is not achieved.

V Conclusion

This work was motivated by the growing interest in statistical and complexity characteristics of tree-like structures. We considered several spaces of planted binary plane trees and examined the structural complexity of those trees. Specifically, we calculated the number of planted binary plane trees with particular Horton-Strahler orders. We defined and evaluated the entropy for a space of planted binary plane trees with $N$ vertices. We introduced the entropy rate measure in order to explain the long-term behavior of a growing tree model and found a closed-form formula for the entropy rate for a space of planted binary plane trees with $N$ vertices. Moreover, we found the entropy rate for a space of planted binary plane trees that satisfy the Horton law with Horton exponent $R$. The author is currently working on extending these results to different forms of tree self-similarity.

VI Appendix

VI-A Auxiliary Lemma

Lemma 3.

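For any integers $n$ and $k$ such that $0 \leq k \leq n$ the following asymptotic approximation is true

$$\log_2 \binom{n}{k} = n H\!\left(\frac{k}{n}\right) + O(\log_2 n), \tag{2}$$

where $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ is the binary entropy of $p$.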

Proof.

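Using Stirling’s approximation we obtain the required approximation as follows

$$\begin{aligned}
\log_2 \binom{n}{k} &= \log_2\!\left(\frac{n!}{k!\,(n-k)!}\right) && (3)\\
&= n \log_2 n - (\log_2 e)\, n - k \log_2 k + (\log_2 e)\, k - (n-k) \log_2 (n-k) && (4)\\
&\quad + (\log_2 e)(n-k) + O(\log_2 n) - O(\log_2 k) - O(\log_2 (n-k)) && (5)\\
&= n \log_2 n - k \log_2 k - (n-k) \log_2 (n-k) + k \log_2 n - k \log_2 n + O(\log_2 n) && (6)\\
&= n \left(-\frac{k}{n} \log_2\!\left(\frac{k}{n}\right) - \left(1 - \frac{k}{n}\right) \log_2\!\left(1 - \frac{k}{n}\right)\right) + O(\log_2 n) && (7)\\
&= n H\!\left(\frac{k}{n}\right) + O(\log_2 n). && (8)
\end{aligned}$$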
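The approximation (2) can be checked numerically by comparing $\log_2 \binom{n}{k}$, computed exactly from the integer binomial coefficient, with $n H(k/n)$. The sketch below is illustrative only; the function names and the choice $k = \lfloor n/3 \rfloor$ are assumptions.

```python
from math import comb, log2


def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def approximation_gap(n, k):
    """n * H(k/n) - log2 C(n, k); by (2) this difference is O(log2 n)."""
    return n * binary_entropy(k / n) - log2(comb(n, k))


for n in (10, 100, 1000, 10000):
    k = n // 3
    print(n, approximation_gap(n, k), log2(n))
```

The gap grows only logarithmically with $n$ and stays between $0$ and $\log_2(n+1)$, which is the standard two-sided bound $2^{nH(k/n)}/(n+1) \leq \binom{n}{k} \leq 2^{nH(k/n)}$.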

VI-B Proof of Lemma 1

Proof.

We prove this lemma by providing a method to construct and count trees with fixed Horton-Strahler numbers. We start by introducing a few helpful definitions.

For a given tree we define the main frame (also known as a skeleton in related publications) to be the minimal subtree of the same order with the same root. Each branch of order $k+1$ is obtained by merging two necessary frames of order $k$, $k = 1, \dots, K-1$. All other frames are extra frames. Thus, given a set of Horton-Strahler numbers such that $N_k \geq 2N_{k+1}$, $N_K = 1$, the number of necessary frames of order $k$ is $2N_{k+1}$ and the number of extra frames of order $k$ is $M_k = N_k - 2N_{k+1}$.

To illustrate the notion of necessary and extra frames, consider a planted binary plane tree of order depicted in Figure 5. Note that each extra frame of order is attached to the branch of higher order because all other ways to attach extra frames will result in either non-binary trees or in binary trees with incorrect Horton-Strahler numbers and incorrect number of vertices. For example, no extra frame can be attached to the root node of the main frame, since the root node should be of degree . Also extra frame can not be attached to the leaf vertices or to the internal vertices of the main frame; otherwise it will result in a non-binary tree. Moreover, attaching an extra frame of order to the branch of lower order will result in a tree with incorrect number of branches of order : instead of the tree will have branches of order . Finally, attaching an extra frame of order to the branch of the same order results in a tree that is redundant to the tree constructed by attaching an extra frame of order to the branch of higher order. See Figure 6 for an example. Therefore, to find the total number of planted binary plane trees of order with vertices and given Horton-Strahler numbers , we should start with a main frame of order and then count all possible ways we can attach all extra frames to the branches of higher orders, starting with the extra frames of order , followed by the extra frames of order , and so on. Extra frames of order will be attached at the end.

We start with the main frame of order $K$. Denote by $T_{K-1 \to K}$ the number of trees we obtain by attaching $M_{K-1}$ extra frames of order $K-1$ to the one branch of order $K$ of the main frame. In general, the number of ways to place $n$ identical objects into $k$ different positions is given by the formula

$$\binom{n+k-1}{k-1} = \frac{(n+k-1)!}{(k-1)!\,n!}. \tag{9}$$

We also need to take into account that each extra frame can be attached to the middle point of a branch either from the left or from the right. Therefore, $T_{K-1 \to K}$ can be calculated as follows

$$T_{K-1 \to K} = 2^{M_{K-1}} \binom{N_K + M_{K-1} - 1}{N_K - 1} = 2^{\,N_{K-1} - 2N_K} \binom{N_{K-1} - 2}{N_K - 1}. \tag{10}$$

Note that when we attach $M_{K-1}$ extra frames of order $K-1$ to the one branch of order $K$, we break the branch of order $K$ into $M_{K-1} + 1$ edges. The Horton-Strahler number does not change: there is still one branch of order $K$, i.e., $N_K = 1$, but it consists of $M_{K-1} + 1$ edges of order $K$. For example in Figure 6 (b), attachment of an extra frame of order to the branch of order , broke the branch of order into two edges.

Next, denote by $T_{K-2 \to K-1,K}$ the number of different trees we obtain by attaching $M_{K-2}$ extra frames of order $K-2$ to branches of higher orders $K-1$ and $K$. There are $N_K + M_{K-1}$ edges of order $K$ and $N_{K-1}$ branches of order $K-1$. Thus, there are $N_K + M_{K-1} + N_{K-1} = 2N_{K-1} - 1$ edges of orders $K-1$ and $K$ to which we can attach $M_{K-2}$ extra frames of order $K-2$. By using formula (9) and considering that each extra frame can be attached either from the left or from the right, we obtain $T_{K-2 \to K-1,K}$ as follows

$$T_{K-2 \to K-1,K} = 2^{\,N_{K-2} - 2N_{K-1}} \binom{N_{K-2} - 2}{2N_{K-1} - 2}. \tag{11}$$

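Consider now an intermediate step. Let $T_{i \to i+1,\cdots,K}$ be the number of trees that we obtain by attaching $M_i$ extra frames of order $i$ to the branches of orders $i+1, \dots, K$. Note that there are now

$$\begin{aligned}
k &= N_K + M_{K-1} + N_{K-1} + M_{K-2} + N_{K-2} + \cdots + M_{i+1} + N_{i+1} && (12)\\
&= N_K + N_{K-1} - 2N_K + N_{K-1} + N_{K-2} - 2N_{K-1} + N_{K-2} + \cdots + N_{i+1} - 2N_{i+2} + N_{i+1}\\
&= 2N_{i+1} - N_K = 2N_{i+1} - 1
\end{aligned}$$

edges of orders $i+1, \dots, K$, to which we can attach $M_i$ extra frames of order $i$. Thus, using formula (9) and considering that each extra frame can be attached either from the left or from the right, we obtain $T_{i \to i+1,\cdots,K}$ as follows

$$T_{i \to i+1,\cdots,K} = 2^{\,N_i - 2N_{i+1}} \binom{N_i - 2}{2N_{i+1} - 2}. \tag{13}$$

Equation (13) provides a general formula for the terms $T_{i \to i+1,\cdots,K}$, $i = 1, \dots, K-1$.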

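Note now that for every possible attachment of extra frames of order $i+1$ there are $T_{i \to i+1,\cdots,K}$ possible attachments of extra frames of order $i$. Thus, using the multiplication principle of combinatorics, we obtain the total number of planted binary plane trees of order $K$ with particular Horton-Strahler numbers $N_1, \dots, N_K$ and $N$ vertices as follows

$$\begin{aligned}
|\mathcal{T}_{N_1 \cdots N_K}| &= \prod_{i=K-1}^{1} T_{i \to i+1,\cdots,K} = \prod_{i=K-1}^{1} 2^{\,N_i - 2N_{i+1}} \binom{N_i - 2}{2N_{i+1} - 2}\\
&= 2^{\sum_{i=1}^{K-1} (N_i - 2N_{i+1})} \prod_{i=1}^{K-1} \binom{N_i - 2}{2N_{i+1} - 2} = 2^{\,N_1 - 1 - \sum_{i=1}^{K-1} N_{i+1}} \prod_{i=1}^{K-1} \binom{N_i - 2}{2N_{i+1} - 2}.
\end{aligned}$$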
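The counting argument can be cross-checked by brute force for small trees. The sketch below multiplies the per-order factors of equation (13), written in the stars-and-bars form of formula (9), and compares the result with a direct enumeration of binary plane trees grouped by their Horton-Strahler numbers. All function names (`count_by_attachment`, `merge`, `tally`) are hypothetical, and the enumeration works on trees without the stem, which does not change the counts.

```python
from collections import Counter
from functools import lru_cache
from math import comb


def count_by_attachment(ns):
    """Multiply the factors T_{i -> i+1..K} of equation (13) for (N_1, ..., N_K)."""
    total = 1
    for n_i, n_next in zip(ns, ns[1:]):
        extra = n_i - 2 * n_next   # M_i extra frames of order i
        slots = 2 * n_next - 1     # edges of orders i+1..K, equation (12)
        # Formula (9): `extra` identical frames into `slots` edges,
        # times a left/right choice for each extra frame.
        total *= 2 ** extra * comb(extra + slots - 1, slots - 1)
    return total


def merge(left, right):
    """Horton-Strahler numbers of a tree whose two subtrees have numbers left and right."""
    size = max(len(left), len(right))
    padded_left = left + (0,) * (size - len(left))
    padded_right = right + (0,) * (size - len(right))
    merged = tuple(a + b for a, b in zip(padded_left, padded_right))
    if len(left) == len(right):
        merged += (1,)             # equal orders start one branch of the next order
    return merged


@lru_cache(maxsize=None)
def tally(n):
    """Map each Horton-Strahler tuple to the number of binary plane trees with n leaves."""
    if n == 1:
        return Counter({(1,): 1})
    result = Counter()
    for k in range(1, n):
        for left, c_left in tally(k).items():
            for right, c_right in tally(n - k).items():
                result[merge(left, right)] += c_left * c_right
    return result


for n in range(1, 10):
    assert all(count_by_attachment(ns) == c for ns, c in tally(n).items())
print(tally(7)[(7, 3, 1)])  # 20
```

For every number of leaves up to nine, the enumeration agrees with the product formula, and the counts for a fixed number of leaves add up to the corresponding Catalan number.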

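VI-C Proof of Lemma 2

Proof.

First note that the probability of a random tree $T_N \in \mathcal{T}_N$ is $P(T_N) = 1/C_{n-1}$ since we assume the uniform distribution of trees in $\mathcal{T}_N$. Thus, the entropy is given as follows

$$H(T_N) = -E[\log_2 P(T_N)] = -\sum_{i=1}^{C_{n-1}} \frac{1}{C_{n-1}} \log_2 \frac{1}{C_{n-1}} = \log_2 C_{n-1}. \tag{14}$$

Note that the term $C_{n-1}$ can be rewritten in the following way

$$C_{n-1} = \frac{1}{n} \binom{2n-2}{n-1} = \frac{2}{N} \binom{N-2}{\frac{N}{2}-1}, \tag{15}$$

where we used the fact that $N = 2n$. Using the results of Lemma 3, given in Section VI-A, we obtain the entropy of $T_N$ as follows

$$\begin{aligned}
H(T_N) &= \log_2\!\left[\frac{2}{N} \binom{N-2}{\frac{N}{2}-1}\right] = 1 - \log_2 N + 2\left(\frac{N}{2} - 1\right) H\!\left(\frac{1}{2}\right) + O(\log_2 N) && (16)\\
&= 1 - \log_2 N + N - 2 + O(\log_2 N) = N - \log_2 N - 1 + O(\log_2 N).
\end{aligned}$$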

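VI-D Proof of Corollary 1

Proof.

Dividing $H(T_N)$ by $N$ and taking the limit as $N \to \infty$ we obtain the entropy rate as follows

$$H_\infty(T_N) = \lim_{N \to \infty} \frac{H(T_N)}{N} = \lim_{N \to \infty} \frac{N - \log_2 N - 1 + O(\log_2 N)}{N} = 1. \tag{17}$$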

VI-E Proof of Theorem 1

We begin with two auxiliary lemmas that will be used in the proof of Theorem 1.

Lemma 4.

Given parameters and , the following properties hold in space

1. , ,

2. ,

3. ,

4. .

Proof.
1. Using the fact that the Horton-Strahler numbers satisfy and , we conclude that .

2. Let then . Since , then .

3. .

4. .

5. Since for any planted binary plane tree , then .

Lemma 5.

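Let $\mathcal{T}_{N_1,N_2,\cdots,N_K,R}$ be a space of planted binary trees with a particular set of Horton-Strahler numbers $N_1, \dots, N_K$, such that $N_k = R^{K-k} \pm \alpha^{K-k}$, $k = 1, \dots, K$. Then

$$\lim_{N \to \infty} \frac{\log_2 |\mathcal{T}_{N_1,N_2,\cdots,N_K,R}|}{N} = 1 - \frac{1 - H(2/R)}{2 - 2/R}.$$

Proof.

We begin the proof by using the result of Lemma 1, which gives the number of planted binary trees with given Horton-Strahler numbers $N_1, \dots, N_K$. Thus,

$$\begin{aligned}
\lim_{N \to \infty} \frac{\log_2 |\mathcal{T}_{N_1,N_2,\cdots,N_K,R}|}{N} &= \lim_{N \to \infty} \frac{1}{N} \log_2\!\left(2^{\,N_1 - 1 - \sum_{i=1}^{K-1} N_{i+1}} \prod_{i=1}^{K-1} \binom{N_i - 2}{2N_{i+1} - 2}\right)\\
&= \lim_{N \to \infty} \frac{1}{N} \left[N_1 - 1 - \sum_{i=1}^{K-1} N_{i+1}\right] + \lim_{N \to \infty} \frac{1}{N} \left[\sum_{i=1}^{K-1} \log_2 \binom{N_i - 2}{2N_{i+1} - 2}\right].
\end{aligned}$$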

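Note that the term $\binom{N_i - 2}{2N_{i+1} - 2}$ can be rewritten in the following way

$$\binom{N_i - 2}{2N_{i+1} - 2} = \frac{(N_i - 2)!}{(2N_{i+1} - 2)!\,(N_i - 2N_{i+1})!} = \binom{N_i}{2N_{i+1}} \frac{2N_{i+1}\,(2N_{i+1} - 1)}{N_i\,(N_i - 1)}.$$

Therefore,

$$\begin{aligned}
\lim_{N \to \infty} \frac{\log_2 |\mathcal{T}_{N_1,N_2,\cdots,N_K,R}|}{N} &= \lim_{N \to \infty} \frac{1}{N} \left[N_1 - 1 - \sum_{i=1}^{K-1} N_{i+1}\right] + \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{K-1} \log_2 \binom{N_i}{2N_{i+1}} && (18)\\
&\quad + \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{K-1} \log_2 \frac{2N_{i+1}}{N_i} + \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{K-1} \log_2 \frac{2N_{i+1} - 1}{N_i - 1}.
\end{aligned}$$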

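We consider each of the four limits in (18) separately. Starting with the first limit, we notice that the term $N_1 - 1 - \sum_{i=1}^{K-1} N_{i+1}$ can be rewritten in the following way

$$\begin{aligned}
N_1 - 1 - \sum_{i=1}^{K-1} N_{i+1} &= 2N_1 - 1 - \sum_{i=1}^{K} N_i = N - 1 - \sum_{i=1}^{K} \left(R^{K-i} \pm \alpha^{K-i}\right) && (19)\\
&= N - 1 - R^K \sum_{i=1}^{K} R^{-i} - (\pm 1)\, \alpha^K \sum_{i=1}^{K} \alpha^{-i}\\
&= N - 1 - \frac{R^K - 1}{R - 1} - (\pm 1)\, \frac{\alpha^K - 1}{\alpha - 1}.
\end{aligned}$$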

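Thus, dividing equation (19) by $N$ and taking the limit as $N \to \infty$ we find the value of the first limit in (18)

$$\begin{aligned}
\lim_{N \to \infty} \frac{1}{N} \left[N_1 - 1 - \sum_{i=1}^{K-1} N_{i+1}\right] &= \lim_{N \to \infty} \frac{1}{N} \left[N - 1 - \frac{R^K - 1}{R - 1} - (\pm 1)\, \frac{\alpha^K - 1}{\alpha - 1}\right] && (20)\\
&= 1 - \lim_{N \to \infty} \frac{1}{N}\, \frac{R^K - 1}{R - 1} = 1 - \frac{R/2}{R - 1},
\end{aligned}$$

where the last equality is obtained using result 5) of Lemma 4.

Consider now the second limit in equation (18). Using the result of Lemma 3, provided in Section VI-A, we can rewrite the second term as follows

$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{K-1} \log_2 \binom{N_i}{2N_{i+1}} = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{K-1} \left[N_i\, H\!\left(\frac{2N_{i+1}}{N_i}\right) + O(\log_2 N_i)\right].$$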

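To examine the term $\sum_{i=1}^{K-1} N_i\, H\!\left(\frac{2N_{i+1}}{N_i}\right)$, we break it into two sums

$$\sum_{i=1}^{K-1} N_i\, H\!\left(\frac{2N_{i+1}}{N_i}\right) = \sum_{i=1}^{K'-1} N_i\, H\!\left(\frac{2N_{i+1}}{N_i}\right) + \sum_{i=K'}^{K-1} N_i\, H\!\left(\frac{2N_{i+1}}{N_i}\right), \tag{21}$$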

where .
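Consider the first sum in equation (21). Using the fact that $N_k \in \left(R^{K-k} - \alpha^{K-k},\, R^{K-k} + \alpha^{K-k}\right)$, we obtain the following upper bound on the term $\frac{2N_{i+1}}{N_i}$

$$\frac{2N_{i+1}}{N_i} \leq 2\, \frac{R^{K-(i+1)} + \alpha^{K-(i+1)}}{R^{K-i} - \alpha^{K-i}} = \frac{2}{R} \times \frac{1 + \left(\frac{\alpha}{R}\right)^{K-(i+1)}}{1 - \left(\frac{\alpha}{R}\right)^{K-i}} \leq \frac{2}{R} \times \frac{1 + \left(\frac{\alpha}{R}\right)^{K-K'}}{1 - \left(\frac{\alpha}{R}\right)^{K-1}}.$$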

Thus,

 2Ni+1Ni≤2R⎛⎝1+O⎛⎝(αR