Bayer & McCreight (1970, 1972) invented B-trees, which are balanced tree data structures appropriate for the organization and maintenance of large ordered indices, especially on disks. Since each node of a B-tree must allocate room for a predetermined maximum number of keys, B-trees are memory-inefficient. By linking the keys of a B-tree node by left arcs, Bayer (1971) introduced a binary tree representation of B-trees that avoided this storage overhead. Bayer (1972) introduced symmetric binary trees, which were binary tree representations of 2-3-4 trees and allowed the keys within a B-tree to be linked by either left arcs or right arcs. Symmetric binary trees were later named red-black (RB) trees, when Guibas & Sedgewick (1978) proposed a dichromatic framework for balanced trees. Since then, many improvements to RB trees have been proposed. Some authors (Andersson et al., 1990; Roura, 2013) tried to decrease the maximum height of RB trees, which is bounded by $2\log_2(n+1)$ in the worst case. Others tried to uncouple updating from rebalancing, allowing a greater degree of concurrency and postponed processing (Boyar & Larsen, 1994; Park & Park, 2001; Larsen, 2002; Besa & Eterovic, 2013; Howard & Walpole, 2014).
While extremely useful in applications, RB trees have always been criticized for being baffling and inappropriate for pedagogical purposes. To simplify RB trees, Andersson (1993) proposed right-leaning red-black trees, in which only right children could be red. In another attempt to simplify RB trees, Okasaki (1999) proposed an algorithm for insertion into RB trees using functional programming in Haskell. By temporarily introducing a third "double-black" color, Germane & Might (2014) proposed a functional delete algorithm for RB trees. Attempting to simplify RB trees for pedagogical purposes, Sedgewick (2008) proposed left-leaning red-black (LLRB) trees.
Although the insert algorithm of LLRB trees is simple, its delete algorithm is even more incomprehensible than that of classical RB trees. In fact, the real problem with classical RB trees is the delete algorithm, whose rationale is unclear (Germane & Might, 2014; Sen et al., 2016).
In this paper, we initially consider 2-3 RB trees, in which the two children of a node cannot both be red, and propose an insertion algorithm and an intuitive parity-seeking delete algorithm that is highly suitable for educational purposes. We then show that, with a simple amendment, the proposed parity-seeking delete algorithm can be used in ordinary 2-3-4 RB trees, yielding the first pedagogically sound algorithm for RB trees. Besides, our experiments on 2-3 and 2-3-4 RB trees show that the proposed parity-seeking delete algorithm is extremely efficient.
The rest of the paper proceeds as follows. In Section 2, we review the classical algorithm of RB trees as explained by Cormen et al. (2009). In Section 3, we review the more recent LLRB trees (Sedgewick, 2008) and show that, despite the claims of the author, the deletion algorithm is extremely inefficient and unintuitive. In Section 4, we consider 2-3 RB trees and propose an insertion algorithm along with a novel parity-seeking delete algorithm that is much simpler than the delete algorithm of classical RB trees. In Section 5, we adapt the parity-seeking delete algorithm to classical RB trees. In Section 6, we experimentally evaluate the performance of the standard RB trees, as described by Cormen et al. (2009), LLRB trees, and the proposed 2-3 and 2-3-4 RB trees. We conclude the paper in Section 7.
2 Red-Black (RB) Trees
RB trees can be defined both for general binary trees that preserve the inorder iteration of elements (Sahni, 1998, 2005) and more specifically for binary search trees (Cormen et al., 2009). In this paper, for simplicity, we define RB trees as binary search trees. The generalization of the proposed method to general binary trees is straightforward.
Definition 1 (RB Trees).
An RB tree is a binary search tree with one additional attribute in each node: its color, which can be either red or black. RB trees have the following properties:
1. The root node is black.
2. If a node is red, then its parent is black.
3. The number of visited black nodes from the root to all external nodes is the same. (Footnote: We assume that the null pointers of the leaf and degree-1 nodes are replaced by pointers to some imaginary nodes called external nodes. In fact, since we use the nil trick (Cormen et al., 2009), in our implementation all external nodes are represented by the nil node.)
Sometimes it is useful to refer to the color of a link. The color of the link between a child node and its parent is the color of the child node.
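As a minimal sketch of how the three properties of Definition 1 can be checked mechanically, the following Python code walks a tree once. The names `Node`, `black_height`, and `is_rb_tree` are illustrative assumptions and are not taken from the paper's implementation; `None` stands in for the external (nil) nodes, and the binary-search-tree ordering of the keys is assumed to hold already.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def black_height(node: Optional[Node], parent_is_red: bool = False) -> int:
    """Return the number of black nodes on every path from `node` down to an
    external node, raising ValueError if property 2 or 3 is violated."""
    if node is None:  # external node (nil); not counted
        return 0
    if node.color == RED and parent_is_red:
        raise ValueError("property 2 violated: red node with a red parent")
    left = black_height(node.left, node.color == RED)
    right = black_height(node.right, node.color == RED)
    if left != right:
        raise ValueError("property 3 violated: unequal black counts")
    return left + (1 if node.color == BLACK else 0)


def is_rb_tree(root: Optional[Node]) -> bool:
    """Check properties 1-3 of Definition 1 (key order is assumed)."""
    if root is not None and root.color != BLACK:  # property 1
        return False
    try:
        black_height(root)
    except ValueError:
        return False
    return True
```

Whether the external nodes are counted or not does not affect the comparison in property 3, since every path ends in exactly one of them.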
2.1 Relation between RB trees and B-trees of order 4 (2-3-4 trees)
Considering an RB tree, if we draw the red links horizontally and the black links vertically, then a representation is obtained in which, due to the 3rd property in Definition 1, all leaves are drawn at the same level. Furthermore, if we place the horizontally connected nodes in one compound node, then the 2-3-4 tree equivalent of that RB tree is obtained. Figure 4 shows an RB tree along with its other equivalent representations. In the illustrations of this paper, we depict black nodes and links by solid lines, red nodes and links by solid double lines, and those with either red or black colors by dotted lines.
2.2 Basic operations in RB Trees
After inserting/deleting a node into/from an RB tree, the properties of definition 1 may be violated. While modifying the tree in order to comply with definition 1, it is important that the order of the nodes in the inorder traversal of the tree does not change, so that the resulting tree would remain a valid binary search tree. In this section, we introduce the basic operations that preserve the properties of binary search trees. These operations are left rotation and right rotation, which are shown in Figure 5. Furthermore, changing the color of nodes is another operation that preserves the properties of binary search trees. To understand the color of nodes after rotation, it is easier to assume that the links are rotated and infer the color of nodes from the color of their links to their parents.
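The two rotations can be sketched as follows. This is a simplified illustration rather than the paper's implementation: it omits colors and parent pointers, each function returns the new root of the rotated subtree, and the names `Node`, `rotate_left`, `rotate_right`, and `inorder` are assumptions made for the example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_left(h):
    """Lift h's right child above h; returns the new subtree root."""
    x = h.right
    h.right = x.left   # x's left subtree lies between h and x in key order
    x.left = h
    return x


def rotate_right(h):
    """Mirror image of rotate_left: lift h's left child above h."""
    x = h.left
    h.left = x.right
    x.right = h
    return x


def inorder(node):
    """Keys of the subtree in inorder (sorted order for a search tree)."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# A small search tree: rotating it changes the shape but not the inorder.
tree = Node(1, Node(0), Node(3, Node(2), Node(4)))
rotated = rotate_left(tree)
```

In this sketch the caller is responsible for re-attaching the returned root to the parent of the rotated node.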
2.3 Insertion algorithm of RB trees
The insert algorithm of RB trees works in two steps. Initially, the new data is inserted according to the rules of binary search trees in a new red node. Then, if any property of definition 1 is violated, the tree is fixed with appropriate fix-up operations. The 3rd property of definition 1 could not be violated as the newly inserted node is colored red. If the insertion is applied to an empty tree, then the 1st property of definition 1 is violated, which is simply fixed by changing the color of the root node to black. The only potential problem is the violation of the 2nd property of definition 1, i.e. the occurrence of two consecutive red nodes. Assuming that a child node and its parent are both red, and that the parent node is a left child, the tree is fixed using the following rules:
If the sibling of the parent node is red, then the parent node and its sibling are colored black, the grandparent node is colored red, and the fix-up continues from the grandparent node.
If the sibling of the parent node is black, and the current node is a right child, then a left rotation is performed on its parent node (Figure (c)). The situation becomes ready for applying the next rule.
If the sibling of the parent node is black, and the current node is a left child, then a right rotation is performed on the grandparent node (Figure (d)).
The rules for the case that the parent node is a right child are obtained by exchanging "left" and "right" in the above statements.
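The fix-up loop can be sketched as below. This is an illustrative reading of the classical algorithm in the style of Cormen et al. (2009), not the authors' code: it uses `None` instead of a nil node, stores the two children in a list `kid` so that one code path covers both symmetric cases, and also includes the recoloring step for a red sibling of the parent. All names (`Node`, `Tree`, `rotate`, `insert`) are assumptions made for the example.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, parent=None):
        self.key, self.color, self.parent = key, RED, parent
        self.kid = [None, None]  # kid[0] is the left child, kid[1] the right


class Tree:
    def __init__(self):
        self.root = None


def rotate(tree, x, d):
    """Rotate around x: d = 0 is a left rotation, d = 1 a right rotation."""
    y = x.kid[1 - d]
    x.kid[1 - d] = y.kid[d]
    if y.kid[d] is not None:
        y.kid[d].parent = x
    y.parent = x.parent
    if x.parent is None:
        tree.root = y
    else:
        side = 0 if x.parent.kid[0] is x else 1
        x.parent.kid[side] = y
    y.kid[d] = x
    x.parent = y


def insert(tree, key):
    # Step 1: ordinary binary-search-tree insertion of a new red node.
    parent, cur = None, tree.root
    while cur is not None:
        parent, cur = cur, cur.kid[0 if key < cur.key else 1]
    z = Node(key, parent)
    if parent is None:
        tree.root = z
    else:
        parent.kid[0 if key < parent.key else 1] = z
    # Step 2: repair two consecutive red nodes (property 2).
    while z.parent is not None and z.parent.color == RED:
        p, g = z.parent, z.parent.parent  # g exists: a red node is never the root
        d = 0 if g.kid[0] is p else 1     # side of the parent under the grandparent
        uncle = g.kid[1 - d]
        if uncle is not None and uncle.color == RED:
            p.color = uncle.color = BLACK  # recolor and continue from the grandparent
            g.color = RED
            z = g
        else:
            if z is p.kid[1 - d]:          # inner child: rotate the parent first
                rotate(tree, p, d)
                z, p = p, z
            p.color, g.color = BLACK, RED  # outer child: rotate the grandparent
            rotate(tree, g, 1 - d)
    tree.root.color = BLACK                # property 1
```

With `d = 0` the two rotations are the left rotation on the parent and the right rotation on the grandparent described above; with `d = 1` they are the mirrored versions.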
2.4 Deletion algorithm of RB trees
The delete operation may happen at the root node, an internal node, or a leaf node. Firstly, if the to-be-deleted node is of degree 2, its value is replaced by the greatest value in the left subtree or the smallest value in the right subtree, transferring the deletion to a degree-1 node or a leaf node. Then, the actual deletion is performed according to the following rules:
Deleting a degree-1 node: Since degree-1 nodes do not possess a child on one side, the existence of a black node further down their subtree is precluded. Also, since a node and its child cannot both be red, it is only possible for a degree-1 node to be a black node with a single red child. In this case, the value of the red child node is copied to the degree-1 node, and the red child node is deleted.
Deleting a red leaf node: In this case, the node is simply removed and the resulting tree is a legitimate RB tree.
Deleting a black leaf node: After deleting a black leaf node, the number of black nodes from the root node to the leaves of the left and right subtrees of its parent would be different, and the 3rd property of definition 1 would be violated. In this case, until at least one of the rules of Figure 16 is applicable, the fix-up operations are continued.
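The two easy steps above, transferring the deletion away from a degree-2 node and deleting a degree-1 node, can be sketched as follows. This is a hedged illustration with assumed names (`Node`, `transfer_deletion`, `delete_degree_one`); it uses the greatest value of the left subtree and does not cover the black-leaf case, which needs the fix-up rules.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def transfer_deletion(node: Node) -> Node:
    """If `node` has two children, overwrite its key with the greatest key of
    its left subtree and return the node that must actually be removed."""
    if node.left is not None and node.right is not None:
        victim = node.left
        while victim.right is not None:
            victim = victim.right
        node.key = victim.key
        return victim
    return node


def delete_degree_one(node: Node) -> None:
    """Delete a degree-1 node: in a valid RB tree it is a black node with a
    single red leaf child, so the child's key is copied up and the child removed."""
    child = node.left if node.left is not None else node.right
    assert node.color == BLACK and child.color == RED
    node.key = child.key
    node.left = node.right = None
```

For example, deleting the root key 5 from the tree with black nodes 5, 3, 8 and a red node 2 under 3 first transfers the deletion to node 3, which is then handled by `delete_degree_one`.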
The main problem with the rules of Figure 16 is not their number, but their unclear rationale. For example, the rule of Figure 16(e) states that if the root of the deficient subtree is black, its sibling is black, and the right child of the sibling is red, then make the right child of the sibling black, and perform a left rotation on the sibling. From an educational point of view, the problem with this rule is that one has no idea what the rationale behind it is.
3 Left-Leaning Red-Black (LLRB) Trees
For pedagogical purposes, Sedgewick (2008) proposed LLRB trees to lessen the complexity of classical red-black trees. An LLRB tree is a red-black tree in which all red nodes are left children of their parents. LLRB trees have a one-to-one correspondence with 2-3 trees. Figure 19 shows an example of this one-to-one correspondence. Sedgewick (2008) proposed a neat insertion algorithm and taught it in his MOOC algorithms course on Coursera (Wayne & Sedgewick, 2012). However, as we will show, the deletion algorithm of LLRB is neither efficient nor suitable for educational purposes.
3.1 Insertion algorithm of LLRB tree
As in classical RB trees, the insert algorithm of LLRB trees starts by inserting a new red leaf node into a binary search tree. In addition to the possibility of having double red links, which is a violation of the 2nd property of RB trees in Definition 1, the inserted node could be a right child, violating the sole new constraint of LLRB trees. Sedgewick (2008) proposed the three operations of left rotation, right rotation, and color flip to transform the resulting tree into a correct LLRB tree (Figure 23). Note that, in contrast to classical RB trees, where there were 3 other symmetric cases, LLRB trees do not permit red right children, so the three cases shown in Figure 23 are the only ones. One of the important weaknesses of the insert algorithm of LLRB is that these rules must be applied all the way up to the root node, even though it is often possible to infer that the tree has been fixed up long before reaching the root. The reason for this inefficiency is that the insert algorithm is implemented recursively and there is no way to empty the call stack other than throwing an exception. In fact, our attempt to modify the code of LLRB to terminate the fix-up operation by throwing an exception led to a severe slowdown of the algorithm.
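The recursive structure described above can be sketched in Python as follows. This follows the widely circulated form of LLRB insertion for the 2-3 case; it is not a transcription of Sedgewick's Java code, and the names (`Node`, `is_red`, `flip_colors`, `insert`, `put`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node:
    key: int
    color: str = RED
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_red(n):
    return n is not None and n.color == RED


def rotate_left(h):
    x = h.right
    h.right, x.left = x.left, h
    x.color, h.color = h.color, RED  # the rotated link stays red
    return x


def rotate_right(h):
    x = h.left
    h.left, x.right = x.right, h
    x.color, h.color = h.color, RED
    return x


def flip_colors(h):
    h.color = RED
    h.left.color = h.right.color = BLACK


def insert(h, key):
    if h is None:
        return Node(key)  # new nodes are red
    if key < h.key:
        h.left = insert(h.left, key)
    elif key > h.key:
        h.right = insert(h.right, key)
    # The three fix-up checks run at every node on the way back up.
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h


def put(root, key):
    root = insert(root, key)
    root.color = BLACK
    return root
```

The three `if` statements after the recursive call are evaluated at every level of the recursion, which is the behavior criticized above: the unwinding cannot stop early even when the subtree below is already a valid LLRB tree.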
3.2 Deletion algorithm of LLRB tree
Sedgewick (2008) proposed a recursive top-down algorithm for deletion in LLRB trees. To delete a node, the algorithm starts from the root node and moves left or right towards the to-be-deleted node. The algorithm prepares the scene to apply the actual deletion to a red node and, therefore, as it descends the tree, it ensures that either the current node or its left child is red. If this is not the case, the algorithm enforces the property by two methods named "moveRedLeft" and "moveRedRight". As the deletion algorithm descends the tree, it modifies the tree extensively. This is undesirable, since the query node may not exist at all, or may already be red and therefore simple to delete. Figure 24 shows an example of a tree in which the deletion operation is as simple as deleting the node with the given key, while the delete algorithm of LLRB engages in extensive modifications to the tree.
4 The considered framework: 2-3 RB Trees
We define a 2-3 RB tree as a red-black tree in which no node has two red children. Note that, like (Bayer, 1972) and in contrast to (Bayer, 1971; Andersson, 1993; Sedgewick, 2008), 2-3 RB trees treat the left and right children symmetrically. While LLRB trees are in one-to-one correspondence with 2-3 trees, there might be multiple equivalent 2-3 RB trees for a given 2-3 tree. Figure 28 illustrates a 2-3 tree and two of its equivalent 2-3 RB trees.
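To illustrate why several 2-3 RB trees can correspond to the same 2-3 tree, the sketch below collapses a 2-3 RB tree into nested `(keys, children)` tuples by merging each red child into its black parent. The function name `to_23` and the tuple representation are assumptions made for this example, and the input is assumed to be a valid 2-3 RB tree.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_23(node):
    """Collapse the 2-3 RB subtree rooted at a black node into (keys, children).
    External nodes are represented by None."""
    if node is None:
        return None
    keys, kids = [], []

    def absorb(child):
        # A red child belongs to the same 2-3 node as its black parent.
        if child is not None and child.color == RED:
            kids.append(to_23(child.left))
            keys.append(child.key)
            kids.append(to_23(child.right))
        else:
            kids.append(to_23(child))

    absorb(node.left)
    keys.append(node.key)
    absorb(node.right)
    return (tuple(keys), tuple(kids))


# The same 3-node {1, 2}, once with a red left child and once with a red right child.
left_leaning = Node(2, BLACK, left=Node(1, RED))
right_leaning = Node(1, BLACK, right=Node(2, RED))
```

Both `left_leaning` and `right_leaning` collapse to the same 2-3 node, whereas an LLRB tree only admits the first form.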
4.1 Proposed Insertion algorithm for 2-3 RB trees
To insert a value in a 2-3 RB tree, we initially insert it with the color red in the position determined by the rules of binary search trees. Then, if necessary, we perform fix-up operations until we obtain a legitimate 2-3 RB tree. There are two reasons why the resulting tree, after the initial insertion, might not be a legitimate 2-3 RB tree: (I) the parent of the just-inserted node is red, or (II) its sibling is red. Let us call the node of the tree that has one of these problems the current node. Our proposed rules for case I, in which the current node and its parent are both red, are shown in Figures 32(a) and 32(b). In case II, in which the just-inserted node and its sibling are red, we propose a color-flip operation as shown in Figure 32(c). We terminate the fix-up operations as soon as the color of the current node becomes black. We make the root the child of a dummy node with the color black, to ensure that a black node is eventually visited and the procedure terminates. Finally, we reset the color of the root to black.
Proposition. The fix-up operations of the insert algorithm of 2-3 RB trees terminate.
Proof. As is clear from Figure 32, at each step, the marked node becomes one level closer to the root node. Therefore, the maximum possible number of fix-up operations is the height of the tree.
4.2 The proposed parity-seeking delete algorithm for 2-3 RB trees
In this section, we describe our proposed parity-seeking delete algorithm in the context of 2-3 RB trees. First, according to the deletion rules of binary search trees, the initial delete operation is transferred to a leaf or a degree-1 node. Now, if the degree of the to-be-deleted node is one, then, from property 3 of definition 1, it follows that its whole subtree is a single red child. Therefore, to delete a degree-1 node, it suffices to delete its red child and put its value in its parent. Now, consider the case of deleting a leaf node. If the leaf node is red, then it can be simply deleted and the resulting tree is a valid 2-3 RB tree (Figure 35). The hard case is deleting a black leaf node. First, let us define deficient subtrees.
Definition 2 (Deficient subtree).
A subtree rooted at a node $x$ is deficient if (1) neglecting the color of $x$, it is a 2-3 RB tree, and (2) the number of visited black nodes from $x$ to the leaves is one less than that of $x$'s sibling.
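Condition (2) of Definition 2 can be illustrated with a small check. This is only a sketch under stated assumptions: `None` plays the role of the nil node, both subtrees are assumed to be internally consistent so that following left links is enough to count black nodes, condition (1) is not verified, and the names `black_count` and `is_deficient` are ours.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def black_count(node: Optional[Node]) -> int:
    """Black nodes on a path from `node` down to an external node.
    Assumes all such paths agree, so only the leftmost path is followed."""
    count = 0
    while node is not None:
        if node.color == BLACK:
            count += 1
        node = node.left
    return count


def is_deficient(x: Optional[Node], sibling: Optional[Node]) -> bool:
    """Condition (2) of Definition 2: x is one black node short of its sibling."""
    return black_count(x) + 1 == black_count(sibling)


# Deleting a black leaf leaves nil (None) in its place, which is deficient
# with respect to a sibling that is a black leaf.
sibling = Node(3, BLACK)
after_delete = is_deficient(None, sibling)
```

Recoloring a red deficient root to black raises its black count by one and removes the deficiency, which is the observation behind the simplest case of the delete algorithm.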
Assume that we want to delete a black leaf node. After deleting it, we replace it with nil and set the parent of nil to the parent of the deleted node. Therefore, initially, nil is the root of the deficient subtree. Inductively, assume that $x$ is the root of the deficient subtree, and $y$ is its sibling. Our parity-seeking delete algorithm works as follows: it either fixes the deficiency of $x$ or makes its sibling $y$ deficient as well, elevating the deficiency to the parent node. There are three possibilities:
Case I: $x$ is red.
Case II: $x$ and $y$ are both black.
Case III: $x$ is black and $y$ is red.
Case I is simply handled by changing the color of $x$ to black, which resolves the deficiency of $x$. In the following subsections, we explain our algorithm for the other two cases.
4.2.1 Case II: both the root of the deficient subtree, and its sibling are black
Assume that both the root of the deficient subtree, i.e. $x$, and its sibling, i.e. $y$, are black. We attempt to move the deficiency one level higher by turning $y$ red. If one of $y$'s children is red, a vertical double-red link situation arises. Our handling of the cases in which one of $y$'s children is red is shown in Figures 41(c) and 41(d). Please note that at the moment we are fixing the subtree rooted at the common parent of $x$ and $y$, and a potential vertical double-red link between and its parent will be resolved when deficiency reaches ’s grandparent. If none of $y$'s children are red, the deficiency is transferred to the parent of $x$ and $y$ (Figure 41(b)). Please note that there is no special handling for the case that the whole tree becomes deficient, as it is automatically handled by cases I and II.
4.2.2 Case III: the root of the deficient subtree is black and its sibling is red
In this case $x$ is black and $y$ is red. Therefore, the children of $y$ are black. In this case, we can neither fix the deficiency of $x$, as $x$ is black, nor make the sibling deficient, as $y$ is red. We perform a rotation on the common parent of $x$ and $y$ so that the new sibling of $x$ becomes one of the children of $y$. Since the new sibling of $x$ is black, the algorithm returns to case II. Figure 41(e) illustrates this situation. In contrast to the insert algorithm, in which the considered node was steadily moving up the tree, in the delete algorithm the deficient subtree can move both up and down the tree. In the following proposition, we prove that, despite this, the delete algorithm of 2-3 RB trees terminates.
Proposition. The proposed parity-seeking algorithm for deletion in 2-3 RB trees terminates and generates a legitimate 2-3 RB tree.
Proof. We need to prove that, in all three cases of the delete algorithm, the problem of deficiency is resolved. We have:
In case I, where $x$ is red, the deficiency problem is completely resolved by making $x$ black (Figure 41(a)). In this case the algorithm clearly terminates.
In case II, where $x$ and $y$ are both black, either one of $y$'s children is red and the deficiency is resolved immediately (Figures 41(c) and 41(d)), or $y$ is turned red and the deficiency is transferred to the parent of $x$ and $y$, one level closer to the root (Figure 41(b)).
In case III, where $x$ is black and $y$ is red, the algorithm moves to case II after one rotation. Considering Figure 41(e), if at least one of $C$'s children is red, the deficiency problem is resolved immediately, as shown in Figures 41(c) and 41(d). On the other hand, if both children of $C$ are black, then, after applying the rules of case II, $C$ becomes red and the deficiency problem transfers to the red parent of $x$ and $C$. The deficiency of this red node is then immediately resolved by changing its color to black by case I.
5 A Parity-Seeking delete algorithm for classical RB trees
After preparing this manuscript, we noticed the high similarity between the proposed parity-seeking delete algorithm of 2-3 RB trees and the delete algorithm of classical RB trees. Rules (a), (b), and (c) in Figure 16 for deletion in RB trees are identical to rules (a), (e), and (b) in Figure 41 for deletion in 2-3 RB trees. The series of operations performed in rules (c) and (d) in Figure 41 for 2-3 RB trees has the same effect as rules (c) and (d) in Figure 16 for RB trees. The only difference is that, in 2-3 RB trees, the case where $y$ has two red children is impossible, while this situation is subsumed in case (e) of Figure 16 for classical RB trees. By substituting rule (d) in Figure 41 with the new rule shown in Figure 42, we obtain an intuitive parity-seeking delete algorithm for RB trees. It must be mentioned that, in our implementation, we follow all intermediate steps shown in Figures 41 and 42. To distinguish it from classical RB trees, we call a red-black tree with the new parity-seeking delete algorithm a 2-3-4 RB tree.
6 Experiments
In this section, we experimentally compare our proposed 2-3 and 2-3-4 RB trees with classical RB trees and LLRB trees in inserting and removing random sequences of numbers. For LLRB trees, we started from the Java implementation of Sedgewick (2008) and translated it to C++ for a fair comparison. We were forced to modify the code slightly and handle some null references, since even the original Java implementation crashed in our extensive tests. We implemented RB trees based on (Cormen et al., 2009) with a nil node, trying to make the code similar to the elegantly concise implementation of LLRB. Then, we implemented our 2-3 and 2-3-4 RB trees with as few modifications as possible to the implementation of RB trees. Our goal in having a common basis for the implementation of RB, 2-3 RB, and 2-3-4 RB trees was to ensure that any difference in performance is solely due to algorithmic issues and that all codes have been optimized to the same level. For a fair comparison, we added the nil node to the implementation of LLRB, which helped in removing some conditional statements. All experiments were performed on a UX310UQ notebook PC with an Intel(R) Core(TM) i7-6500U CPU @ 2.5GHz and 12 GB of memory, running a 64-bit Windows 10 operating system.
We report both the average number of rotations and the average execution time. Table 1 shows the number of rotations for each algorithm, normalized by and multiplied by for better readability. As can be seen, the average number of rotations in LLRB is almost twice that of RB and 2-3-4 RB in the insert algorithm and almost 20 times in the delete algorithm, showing the extreme inefficiency of LLRB. Comparing RB and 2-3 RB, we observe that the number of rotations in the insert algorithm of 2-3 RB trees is almost times of that of RB trees. The numbers of rotations in the delete algorithms of RB and 2-3 RB trees are almost equal. In fact, the numbers of rotations of the delete algorithms of RB and 2-3 RB trees are identical, and the observed difference is solely due to the different initial trees obtained by the different insertion algorithms. As expected, the numbers of rotations of RB and 2-3-4 RB trees are identical.
Table 2 reports the running time of RB, LLRB, 2-3 RB, and 2-3-4 RB trees, normalized by . As can be seen, the running times of RB, 2-3 RB, and 2-3-4 RB trees are almost equal, while the running time of LLRB trees is almost twice as long. This shows that the number of rotations is not an appropriate unit for measuring the running time of red-black trees, as it does not reflect the actual running time. Although our motivation for introducing the parity-seeking delete algorithm was pedagogical, we observe that the resulting algorithm is also very efficient.
Table 1: Rotations during insertion and rotations during deletion. Columns: n, #rep, then RB, LLRB, 2-3 RB, and 2-3-4 RB for insertion, followed by the same four for deletion.
Table 2: Normalized average insertion time and normalized average deletion time. Columns: n, #rep, then RB, LLRB, 2-3 RB, and 2-3-4 RB for insertion, followed by the same four for deletion.
7 Conclusion
In this paper, we introduced the parity-seeking delete algorithm for 2-3 and classical RB trees. Our goal was to introduce a pedagogically sound and easily understandable algorithm for deletion in red-black trees. The proposed parity-seeking delete algorithm is very natural and easily understandable. Specifically, the rationale behind the parity-seeking delete algorithm is to balance the deficient subtree and its sibling by either fixing the deficient subtree or making the sibling deficient as well, elevating the deficiency one level higher. In our experiments, we found that the performance of 2-3 RB trees is very close to that of classical RB trees in both the insert and delete operations. Besides, we also introduced a parity-seeking delete algorithm for classical RB trees whose performance is almost identical to that of the classical delete algorithm of RB trees. The goal of devising a simple yet efficient algorithm for the delete operation in red-black trees is finally achieved.
The parity-seeking delete algorithm came to the mind of Kamaledin Ghiasi-Shirazi when he taught LLRB trees in his data structures course. He invited his former students, Taraneh Ghandi, Ali Taghizadeh, and Ali Rahimi-Baigi, to participate in the preparation of this paper. All authors validated the idea in common sessions, and Ali Taghizadeh, Ali Rahimi-Baigi, and Taraneh Ghandi implemented 2-3 RB trees along with the competing methods of RB and LLRB trees. Ali Taghizadeh and Ali Rahimi-Baigi carefully studied RB and LLRB trees and explained them to the other members of the team. The paper was initially written on the blackboard of a classroom in Persian, with all authors participating and discussing. The paper was then translated to English by Taraneh Ghandi and Kamaledin Ghiasi-Shirazi. All graphics were produced by Taraneh Ghandi. Considering the extreme importance of the topic, Kamaledin Ghiasi-Shirazi re-implemented RB, 2-3 RB, and 2-3-4 RB trees in a unified framework for a fair comparison. Kamaledin Ghiasi-Shirazi revised the manuscript and prepared the final version. All authors carefully read and commented on the final manuscript.
- Andersson (1993) Andersson, A. (1993). Balanced search trees made simple. In Workshop on Algorithms and Data Structures (pp. 60–71). Springer.
- Andersson et al. (1990) Andersson, A., Icking, C., Klein, R., & Ottmann, T. (1990). Binary search trees of almost optimal height. Acta Informatica, 28, 165–178.
- Bayer (1971) Bayer, R. (1971). Binary B-trees for virtual memory. In Proceedings of the 1971 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control (pp. 219–235).
- Bayer (1972) Bayer, R. (1972). Symmetric binary B-trees: Data structure and maintenance algorithms. Acta Informatica, 1, 290–306.
- Bayer & McCreight (1970) Bayer, R., & McCreight, E. (1970). Organization and maintenance of large ordered indices. In Proceedings of the 1970 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control (pp. 107–141).
- Bayer & McCreight (1972) Bayer, R., & McCreight, E. (1972). Organization and maintenance of large ordered indexes. Acta Informatica, 1, 173–189.
- Besa & Eterovic (2013) Besa, J., & Eterovic, Y. (2013). A concurrent red–black tree. Journal of Parallel and Distributed Computing, 73, 434–449.
- Boyar & Larsen (1994) Boyar, J., & Larsen, K. S. (1994). Efficient rebalancing of chromatic search trees. Journal of Computer and System Sciences, 49, 667–682.
- Cormen et al. (2009) Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to algorithms. MIT Press.
- Germane & Might (2014) Germane, K., & Might, M. (2014). Deletion: The curse of the red-black tree. Journal of Functional Programming, 24, 423–433.
- Guibas & Sedgewick (1978) Guibas, L. J., & Sedgewick, R. (1978). A dichromatic framework for balanced trees. In 19th Annual Symposium on Foundations of Computer Science (sfcs 1978) (pp. 8–21). IEEE.
- Howard & Walpole (2014) Howard, P. W., & Walpole, J. (2014). Relativistic red-black trees. Concurrency and Computation: Practice and Experience, 26, 2684–2712.
- Larsen (2002) Larsen, K. S. (2002). Relaxed red-black trees with group updates. Acta Informatica, 38, 565–586.
- Okasaki (1999) Okasaki, C. (1999). Red-black trees in a functional setting. Journal of Functional Programming, 9, 471–477.
- Park & Park (2001) Park, H., & Park, K. (2001). Parallel algorithms for red–black trees. Theoretical Computer Science, 262, 415–435.
- Roura (2013) Roura, S. (2013). Fibonacci BSTs: A new balancing method for binary search trees. Theoretical Computer Science, 482, 48–59.
- Sahni (1998) Sahni, S. (1998). Data structures, algorithms, and applications in C++.
- Sahni (2005) Sahni, S. (2005). Data structures, algorithms, and applications in Java. Universities Press.
- Sedgewick (2008) Sedgewick, R. (2008). Left-leaning red-black trees. In Dagstuhl Workshop on Data Structures (p. 17). URL: http://www.cs.princeton.edu/~rs/talks/LLRB/LLRB.pdf.
- Sen et al. (2016) Sen, S., Tarjan, R. E., & Kim, D. H. K. (2016). Deletion without rebalancing in binary search trees. ACM Transactions on Algorithms (TALG), 12, 1–31.
- Wayne & Sedgewick (2012) Wayne, K., & Sedgewick, R. (2012). Algorithms, part I. URL: https://www.coursera.org/learn/algorithms-part1.