The $k^2$-tree [BLN13] is a compact data structure conceived to represent the adjacency matrix of Web graphs, but its functionality was later extended to represent other kinds of $d$-ary relations, such as ternary relations [AGdBBN16], point grids [BdBKNS16], raster data [dBABNP13], RDF stores [AGBFMPN14], temporal graphs [CdBFPS18], and graph databases [AGFLP18].
The $k^2$-tree compactly represents an extension of a variant of the Quadtree data structure [Mor66], more precisely of the MX-Quadtree [Sam06, Section 1.4.2.1]. The MX-Quadtree splits the $n \times n$ grid into four submatrices of $n/2 \times n/2$ cells. The root indicates which of the submatrices are nonempty of points, and a child of the root recursively represents each nonempty submatrix. In the $k^2$-tree, the matrix is instead split into $k^2$ submatrices of $n/k \times n/k$ cells. In $d$ dimensions, the structure becomes a $k^d$-tree, where the grid is divided into $k^d$ submatrices of $n/k \times \cdots \times n/k$ cells. The height of the tree is then $\log_k n$.
Instead of using pointers to represent the tree topology, the $k^2$-tree uses a long bitvector $B$, where each node stores only $k^2$ bits indicating which of its submatrices are nonempty, and all the node bitvectors are concatenated level-wise into $B$. Bitvector $B$ supports navigation towards children and parents in $O(1)$ time [BLN13] by means of rank/select operations [Cla96, Mun96]. Query operations like retrieving all the neighbors or the reverse neighbors of a node (when representing graphs) or retrieving all the points in a range (when representing grids) then translate into traversals on the $k^2$-tree [BLN13].
In various applications, however, one would like the relations to be dynamic; that is, elements (graph edges, grid points) can be inserted into and deleted from the relation. Each such update requires flipping bits or inserting/deleting chunks of $k^2$ bits at each of the $\log_k n$ levels in $B$. Such operations can be supported using a dynamic bitvector representation [BCPdBN17]. There exists, however, an $\Omega(\log N / \log\log N)$ lower bound to support updates and rank/select operations on a bitvector of length $N$ [FS89], and this slowdown factor multiplies every single operation carried out on the bitvector, both for traversals and for updates.
In this paper we take a different view of the -tree representation. We regard the -ary tree as a trie on the Morton codes [Mor66] of the elements stored in the grid. The Morton code (in two dimensions, but the extension is immediate) is the concatenation of the identifiers of the consecutive subgrids chosen by a point until it is inserted at the last level. We then handle a trie of strings of length over an alphabet of size . While such a view yields no advantage in the static case, it provides more efficient implementations in the dynamic scenario. For example, a succinct dynamic trie [ADR16] on the Morton codes requires space similar to our bitvector representation, but it is much faster in supporting the operations: time, and constant for practical values of and .
In this paper we implement this idea and show that it is not only theoretically appealing but also competitive in practice with the preceding dynamic-bitvector-based representation [BCPdBN17]. Along the way, we define a new depth-first deployment for tries that, unlike the level-wise one [BLN13], cannot be traversed in constant time per edge. Yet, we show that it turns out to be convenient in a dynamic scenario because we have to scan only small parts of the representation.
2 The -tree and its representation as a trie
Let us focus on the case $d=2$ and $k=2$ for simplicity; $d=2$ encompasses all the applications where we represent graphs, and the small value of $k$ is the most practical in many cases. Given $p$ points in an $n \times n$ matrix $M$, the $k^2$-tree is a $k^2$-ary (i.e., 4-ary) tree where each node represents a submatrix. Assume $n$ is a power of $k$ (i.e., of 2) for simplicity. The root then represents the whole matrix $M$. Given a node representing a submatrix $M[r_1..r_2][c_1..c_2]$, its children represent the submatrices $M[r_1..r_m][c_1..c_m]$ (top-left), $M[r_1..r_m][c_m+1..c_2]$ (top-right), $M[r_m+1..r_2][c_1..c_m]$ (bottom-left), and $M[r_m+1..r_2][c_m+1..c_2]$ (bottom-right), in that order, where $r_m = \lfloor (r_1+r_2)/2 \rfloor$ and $c_m = \lfloor (c_1+c_2)/2 \rfloor$. Each of the submatrices of a node may be empty of points, in which case the node does not have the corresponding child. The node stores $k^2 = 4$ bits indicating with a 1 that the corresponding submatrix is nonempty, or with a 0 that it is empty. The $k^2$-tree is of height $\log_k n = \log_2 n$. See Figure 1.
[Figure 1: an example $k^2$-tree, with each node labeled by its bit.]
A simplified description of the compact -tree representation [BLN13] consists of a bitvector where the tree is traversed levelwise, left to right, and the bits of all the nodes are concatenated. Then, if the tree has nodes, the bitvector is of length , . Note that the nodes of depth correspond to cells, and therefore it is sufficient to store their bits; their children are not represented. Given points, the number of nodes of the -tree is [Nav16, Sec. 9.2].
Each -tree node is identified by the position of the first of the bits that describes its empty/nonempty children. To move from a node to its -th child, the formula is simply , where counts the number of 1s in and can be computed in time using space on top of [Cla96]. For example, we determine in time whether a certain point exists in the grid. Other operations require traversal of selected subtrees [BLN13].
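The level-wise navigation described above can be illustrated with a minimal Python sketch for $k=2$. It rests on several assumptions that are ours and not taken from [BLN13]: positions are 0-based, the function names `build_levelwise`, `prefix_ranks`, and `contains` are hypothetical, and rank is answered from a plain prefix-sum array rather than from the sublinear-space structure of [Cla96].

```python
from collections import deque


def symbol(row, col, shift):
    """Submatrix chosen at one level: first bit from the row, second from the column."""
    return (((row >> shift) & 1) << 1) | ((col >> shift) & 1)


def build_levelwise(points, levels):
    """Concatenate the 4-bit node descriptions level by level, left to right."""
    bits = []
    queue = deque([(sorted(set(points)), levels - 1)])
    while queue:
        pts, shift = queue.popleft()
        groups = [[], [], [], []]
        for r, c in pts:
            groups[symbol(r, c, shift)].append((r, c))
        for group in groups:
            bits.append(1 if group else 0)
            if group and shift > 0:  # children of the last stored level are not represented
                queue.append((group, shift - 1))
    return bits


def prefix_ranks(bits):
    """ranks[i] = number of 1s in bits[0..i]; a stand-in for a succinct rank structure."""
    ranks, total = [], 0
    for b in bits:
        total += b
        ranks.append(total)
    return ranks


def contains(bits, ranks, levels, row, col):
    """Follow one root-to-leaf path; a node is the position of its first bit."""
    x = 0
    for shift in range(levels - 1, -1, -1):
        s = symbol(row, col, shift)
        if bits[x + s] == 0:
            return False
        if shift > 0:
            x = 4 * ranks[x + s]  # the j-th 1 in level order owns the j-th non-root node
    return True


bits = build_levelwise([(0, 0), (0, 1), (3, 3)], levels=2)
ranks = prefix_ranks(bits)
print(bits, contains(bits, ranks, 2, 3, 3), contains(bits, ranks, 2, 2, 2))
```

With 0-based positions, the child by symbol $i$ of the node starting at position $x$ starts at four times the number of 1s in the prefix ending at position $x+i$; other indexing conventions shift this formula by a constant.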
A dynamic -tree [BCPdBN17] is obtained by representing as a dynamic bitvector. Now operation takes time [NS14], which is optimal [FS89]. This slows down the structure with respect to the static variant. For example, determining whether a point exists takes time . To insert a point , we must create its path up to the leaves, converting the first in the path to a and thereafter inserting groups of bits, one per level up to level . This takes time as well. Deleting a point is analogous.
Consider a point $(r,c)$, which induces a root-to-leaf path in the $k^2$-tree. If we number the submatrices described in the beginning of this section as 0,1,2,3, then we can identify $(r,c)$ with a sequence of $\log_2 n$ symbols over the alphabet $[0..3]$ that indicate the submatrix chosen by $(r,c)$ at each level. In particular, note that if we write the symbols in binary, $0=00$, $1=01$, $2=10$, and $3=11$, then the row $r$ is obtained by concatenating the first bits of the $\log_2 n$ levels, from highest to lowest bit, and the column $c$ is obtained by concatenating the second bits of the levels. The Morton code of $(r,c)$ is then obtained by interlacing the bits of the binary representations of $r$ and $c$.
As a consequence, we can regard the $k^2$-tree as the trie of the Morton codes of all the points, that is, a trie storing one string per point, each of length $\log_k n$, over an alphabet of size $k^2$. The extension to general values of $k$ and $d$ is immediate.
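As an illustration of the correspondence between points and trie strings for $k=2$ and $d=2$, the following Python sketch interlaces the bits of a row and a column into base-4 symbols and inverts the mapping. The function names and the parameter `levels` (standing for $\log_2 n$) are hypothetical names introduced for this example.

```python
def morton_symbols(row, col, levels):
    """Base-4 symbols of the Morton code of (row, col), top level first."""
    symbols = []
    for shift in range(levels - 1, -1, -1):
        row_bit = (row >> shift) & 1
        col_bit = (col >> shift) & 1
        symbols.append((row_bit << 1) | col_bit)  # first bit: row, second bit: column
    return symbols


def morton_code(row, col, levels):
    """The same code packed into a single integer, two bits per level."""
    code = 0
    for s in morton_symbols(row, col, levels):
        code = (code << 2) | s
    return code


def decode(symbols):
    """Recover (row, col) by concatenating the first and second bits of each symbol."""
    row = col = 0
    for s in symbols:
        row = (row << 1) | (s >> 1)
        col = (col << 1) | (s & 1)
    return row, col


print(morton_symbols(5, 3, 3))  # [2, 1, 3]
print(morton_code(5, 3, 3))     # 39
print(decode([2, 1, 3]))        # (5, 3)
```

Each symbol is exactly the label of the trie edge followed at that level, so inserting the symbol sequences of all the points into a 4-ary trie yields the same shape as the tree described above.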
A recent dynamic representation [ADR16] of tries of nodes over alphabet requires bits. If is polylogarithmic in , it simulates each step of a trie traversal in time, and the insertion and deletion of each trie node in amortized time. Used on our Morton codes, with alphabet size , the tries use bits, exactly as the representation using the bitvector . Instead, they support queries like whether a given point exists in time , and inserting or deleting a point in amortized time , way faster than on the dynamic bitvector .
The general case.
With larger values of and , requires bits, and it may become sparse. By using sparse bitvector representations [OS07], the space becomes bits [Nav16, Sec. 9.2], but the time of operation rank becomes , and this time penalty factor multiplies all the other operations. A dynamic representation of the compressed bitvector [NS14] uses the same space and requires time for each operation. The space usage of the trie [ADR16] on a general alphabet of size is of the same order, bits, but the operations are supported in less time, (amortized for updates). The insertion or deletion of a point, which affects tree edges, then requires amortized time. We state this simple result as a theorem.
Theorem 2.1. A dynamic tree can represent points on a -size grid within bits, while supporting the traversal, insertion, or deletion of each tree edge in time (amortized for updates). If , then the times are (also amortized for updates).
3 Implementation of the dynamic trie
We now define a practical implementation of succinct dynamic tries, for the particular case of -trees with . The whole trie is divided into blocks, each being a connected component of the trie. A block can have child blocks, so we can say that the trie is represented as a tree of blocks. Let us define values , such that , for , for a given parameter , and [AN11]. At any given time, a block of size is able to store at most nodes. If new nodes are added to such that the number of nodes exceeds , then is grown to have size , for , such that the new nodes can be stored. By defining the block sizes as we do, we ensure that the fill ratio of each block is at least ; for example, if , then every node is at least full, which means that the space wasted is at most .
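The block-growth policy can be sketched in Python under a loudly stated assumption: we take the allowed capacities to grow geometrically, each one being the previous one divided by a fill parameter `alpha` (rounded up). This growth rule, the parameter names `n_min`, `n_max`, and `alpha`, and both function names are hypothetical choices made for this illustration; the sketch only shows how such a sequence bounds the wasted space.

```python
import math


def block_sizes(n_min, n_max, alpha):
    """Allowed block capacities, assuming each one is the previous divided by alpha."""
    sizes = [n_min]
    while sizes[-1] < n_max:
        grown = math.ceil(sizes[-1] / alpha)
        sizes.append(min(n_max, max(sizes[-1] + 1, grown)))
    return sizes


def capacity_for(sizes, nodes):
    """Smallest allowed capacity holding `nodes`, or None when the block must be split."""
    for size in sizes:
        if size >= nodes:
            return size
    return None


sizes = block_sizes(4, 64, 0.5)
print(sizes)                    # [4, 8, 16, 32, 64]
print(capacity_for(sizes, 9))   # 16
print(capacity_for(sizes, 65))  # None
```

Under this assumption, a block is enlarged to a given capacity only when it holds more nodes than the previous capacity, so every enlarged block is more than an `alpha` fraction full.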
Each block stores the following components:
: the tree topology of the connected component represented by the block. Every node in the trie is either an internal node, a leaf node, or a frontier node in some . The latter are seen as leaves in , but they correspond to trie nodes whose subtree is stored in a descendant block. We mark such nodes in and store a pointer to the corresponding child block, see next.
: a sorted array storing the preorder numbers of the frontier nodes.
: an array with the pointers to children blocks, in the same order of .
: the depth (in the trie) of the root of .
Unlike the classical -tree representation [BLN13, BCPdBN17], which deploys the nodes levelwise, we represent the tree topology in depth-first order. This order is compatible with our block layout and speeds up the insertion and deletion of points, since the bits of all the edges to insert or remove are contiguous.
In , each node is encoded using 4 bits, indicating which of its children are present. For instance, ‘0110’ encodes a node that has two children, labeled by symbols 1 and 2. Therefore, the total number of bits used to encode the trees is exactly the same as in the classical representations [BLN13, BCPdBN17].
We store using a simple array able to hold nodes. A node is identified by its index within this array. Figure 2 shows an example top block for the -tree of Figure 1 and our array-based depth-first representation. Depth-first numbers are shown along each node; these are also their indexes in the array storing . In the example, nodes with depth-first number 2 and 3 are frontier nodes; they are underlined in the array representation.
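A minimal Python sketch of the depth-first array for a single block follows. It assumes $k=2$, ignores frontier nodes and the division into blocks, and stores each node as an integer in $[0..15]$ whose leftmost bit corresponds to symbol 0; the function name `build_dfs` is hypothetical.

```python
def build_dfs(points, levels):
    """Return the 4-bit node masks of the trie of Morton codes, in preorder."""
    nodes = []

    def visit(pts, shift):
        groups = [[], [], [], []]
        for r, c in pts:
            sym = (((r >> shift) & 1) << 1) | ((c >> shift) & 1)
            groups[sym].append((r, c))
        mask = 0
        for sym in range(4):
            if groups[sym]:
                mask |= 8 >> sym  # leftmost bit marks the child labeled 0
        nodes.append(mask)        # a node is written before its descendants
        if shift > 0:             # the cells below the last stored level are not represented
            for group in groups:
                if group:
                    visit(group, shift - 1)

    visit(sorted(set(points)), levels - 1)
    return nodes


nodes = build_dfs([(0, 0), (0, 1), (3, 3)], levels=2)
print([format(m, "04b") for m in nodes])  # ['1001', '1100', '0001']
```

The array holds the same 4-bit node descriptions as the level-wise layout, only in a different order, which is why both use the same number of bits for the topology.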
Apart from , each block then requires words to store and its corresponding entries in the arrays and in its parent block . This implies a maximum overhead of bits per node, assuming pointers of bits as in the transdichotomous RAM model of computation. Thus we have to choose for this overhead to be .
The depth-first order we use, however, corresponds more to the dfuds representation [BDMRRR05], whereas the classical levelwise deployment is analogous to a louds representation [Jac89]. An important difference is that, whereas the fixed-arity variant of louds is easy to traverse in constant time per edge, the dfuds representation requires more space [BDMRRR05, Nav16]: apart from the 4 bits, each node with children uses bits to mark its number of children.
As a consequence, our actual storage format cannot be traversed in constant time per edge. Rather, we will traverse the blocks sequentially and carry out all the edge traversals or updates on the block in a single left-to-right pass. This is not only cache-friendly, but convenient because we do not need to store nor recompute any sublinear-space data structure to speed up traversals [Cla96].
A complication related to our format is that, when traversing the tree, we must maintain the current trie depth in order to identify the leaves (these are always at depth ). Besides, as we traverse the block we must be aware of which are the frontier nodes, so as to skip them in the current block or switch to another block, depending on whether or not we want to enter into them.
This is the main operation needed for traversing the tree. Let yield the child of node by symbol (if it exists). Assume node belongs to block . For computing , we first check whether node is in the frontier of or not. To support this checking efficiently, we keep a finger on array , such that is the smallest value for which is greater or equal than the preorder of the current node in the traversal. Since we traverse in preorder, and is sorted, increasing as we traverse is enough to keep up to date. When the preorder of the current node exceeds , we increase . If , then node is in the frontier, hence we go down to block , start from the root node (which is itself stored in the child block), and set . Otherwise, is not a frontier node, and we stay in .
Determining whether the -th child of a node exists requires a simple bit inspection. If it does, we must determine how many children of (and their subtrees) must be skipped to get to . We store a precomputed table that, for every 4-bit pattern and each , indicates how many subtrees must be skipped to get the desired child. For instance, if is ‘1011’ and , this table tells that one child of must be skipped to get to the node labeled 2.
In our sequential traversal of , corresponding to a depth-first traversal of , we keep a stack (initially empty) with the number of children not yet traversed of the ancestors of the current node. We start looking for the desired child by moving to position , corresponding to the first child of in preorder. At this point, we push the number of children of this node into . The traversal is carried out by increasing an index on the array that stores . The key for the traversal is to know where in the tree one is at each step. As said before, we keep track of the current depth , to know when we arrive to frontier nodes. When traversing, we update as follows. Every time we move to the next node (in preorder), we increase only if (1) is not the maximum depth (minus 1, recall that the last level is not represented), (2) the current node is not a frontier node, or (3) the current node is the last child of its parent. We use to check the latter condition. Every time we reach a new node, we push in its number of children if the node is not of maximum depth (minus 1), and it is not a frontier node. Otherwise, we instead decrease the value at the top of the stack, since in both cases the subtree of the corresponding node has been completely traversed. When the top value becomes 0, it means that a whole subtree has been traversed. In such a case we pop , decrease the current depth , and decrease the new value at the top (if this also becomes 0, we keep repeating the process, decreasing and the top value).
Once the stack becomes empty again, we have traversed the subtree of the first child. We repeat the same process from the current node, skipping as many children of as needed.
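The scan described above can be sketched in Python as follows. This is a simplified illustration, not the actual implementation: it works on one block stored as a list of 4-bit masks, takes the frontier as a set of array indexes instead of a sorted array with a finger, counts bits directly instead of using a precomputed table, and assumes every non-leaf node has at least one child. The names `skip_subtree`, `child`, `contains`, and `last_depth` are hypothetical.

```python
def skip_subtree(nodes, pos, depth, last_depth, frontier=frozenset()):
    """Index just past the subtree rooted at nodes[pos], which lies at `depth`."""
    stack = []  # children not yet fully traversed, one entry per open ancestor
    while True:
        if depth < last_depth and pos not in frontier:
            stack.append(bin(nodes[pos]).count("1"))
            depth += 1  # the next node in preorder is the first child
        else:
            # a leaf of this block: close every ancestor that has no children left
            while stack:
                stack[-1] -= 1
                if stack[-1] > 0:
                    break
                stack.pop()
                depth -= 1
        pos += 1
        if not stack:
            return pos


def child(nodes, v, depth, sym, last_depth, frontier=frozenset()):
    """Index of the child of node v by symbol `sym`, or None if it does not exist."""
    mask = nodes[v]
    if not (mask >> (3 - sym)) & 1:
        return None
    to_skip = bin(mask >> (4 - sym)).count("1")  # children with smaller symbols
    pos = v + 1
    for _ in range(to_skip):
        pos = skip_subtree(nodes, pos, depth + 1, last_depth, frontier)
    return pos


def contains(nodes, levels, row, col):
    """Point query on a single block without frontier nodes."""
    last_depth = levels - 1
    v = 0
    for depth in range(levels):
        shift = levels - 1 - depth
        sym = (((row >> shift) & 1) << 1) | ((col >> shift) & 1)
        if depth == last_depth:
            return bool((nodes[v] >> (3 - sym)) & 1)
        v = child(nodes, v, depth, sym, last_depth)
        if v is None:
            return False
    return False


block = [0b1001, 0b1000, 0b1000, 0b0001, 0b0001]  # points (0,0) and (7,7) on an 8x8 grid
print(child(block, 0, 0, 3, 2), contains(block, 3, 7, 7), contains(block, 3, 0, 7))
```

Because the scan only moves forward, a query that descends several levels inside one block touches a contiguous run of the array.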
To insert a point , we use the corresponding Morton code , for strings and to navigate the trie, until we cannot descend anymore. Assume that we have been able to get down to a node (stored in block ) that represents string , and at this node we have failed to descend using the first symbol of . Then, we must insert string in the subtree of node . If the block has enough space for the new nodes, we simply find the insertion point from (skipping subtrees as explained above), make room for the new nodes, and write them sequentially using a precomputed table that translates a given symbol of to the 4-bit pattern corresponding to the unary node for that symbol. We also store a precomputed table that, given the encoding of and the first symbol of string , yields the new encoding for .
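The case in which the block has room for the new nodes can be sketched in Python as below. The sketch assumes a single block held in a Python list (so "making room" is a list insertion), no frontier nodes, and an empty tree represented by a root with mask 0; it computes the unary-node patterns as `8 >> symbol` rather than reading them from precomputed tables. All names are hypothetical.

```python
def skip_subtree(nodes, pos, depth, last_depth):
    """Index just past the subtree rooted at nodes[pos], which lies at `depth`."""
    stack = []
    while True:
        if depth < last_depth:
            stack.append(bin(nodes[pos]).count("1"))
            depth += 1
        else:
            while stack:
                stack[-1] -= 1
                if stack[-1] > 0:
                    break
                stack.pop()
                depth -= 1
        pos += 1
        if not stack:
            return pos


def insert(nodes, levels, row, col):
    """Insert the point in place; return False if it was already present."""
    last_depth = levels - 1
    symbols = [(((row >> s) & 1) << 1) | ((col >> s) & 1)
               for s in range(levels - 1, -1, -1)]
    v = 0
    for depth, sym in enumerate(symbols):
        present = (nodes[v] >> (3 - sym)) & 1
        pos = v + 1  # where the child by `sym` is, or where it must be inserted
        if depth < last_depth:
            for _ in range(bin(nodes[v] >> (4 - sym)).count("1")):
                pos = skip_subtree(nodes, pos, depth + 1, last_depth)
        if not present:
            nodes[v] |= 8 >> sym                                 # new encoding of the node
            nodes[pos:pos] = [8 >> s for s in symbols[depth + 1:]]  # contiguous unary path
            return True
        v = pos
    return False


block = [0]  # empty tree: a root without children
for point in [(0, 0), (3, 3), (0, 1)]:
    insert(block, 2, *point)
print([format(m, "04b") for m in block])  # ['1001', '1100', '0001']
```

All the new nodes land in one contiguous run of the array, which is the property the depth-first layout is meant to provide.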
If, on the other hand, the array used to store has no room for the new nodes, we proceed as follows. If the array is currently able to store up to nodes, we reallocate it to make it of size , for the smallest such that holds. If, otherwise, , or , we must first split to make room.
To minimize space usage, the splitting process should traverse to choose the node such that splitting at generates two trees whose size difference is minimum. We combine this criterion, however, with another one that optimizes traversal time. As explained, an advantage of our method is that we can traverse several edges in a single left-to-right scan of the block. Such scan, however, ends when we have to follow a pointer to another block. We try, therefore, to have those pointers as early as possible in the block so as to minimize the scan effort spent to reach them. Our splitting criterion, then, tries first to separate the leftmost node in the block whose subtree size is 25%–75% of the total block size.
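The choice of the split node can be sketched in Python as follows, under simplifying assumptions: the block is a list of 4-bit masks in preorder whose root has depth 0, it has no frontier nodes, and subtree sizes are counted in nodes. The fallback to the most balanced split when no node falls in the 25%–75% range is our reading of the space criterion, and the names `subtree_sizes` and `choose_split` are hypothetical.

```python
def subtree_sizes(nodes, last_depth):
    """Number of nodes in the subtree of every node, computed in one left-to-right pass."""
    sizes = [1] * len(nodes)
    open_nodes = []  # [index, children still unfinished] for each open ancestor
    for pos, mask in enumerate(nodes):
        if len(open_nodes) < last_depth:  # the depth equals the number of open ancestors
            open_nodes.append([pos, bin(mask).count("1")])
            continue
        finished = pos  # node of the last stored level: close completed ancestors
        while open_nodes:
            top = open_nodes[-1]
            sizes[top[0]] += sizes[finished]
            top[1] -= 1
            if top[1] > 0:
                break
            finished = open_nodes.pop()[0]
    return sizes


def choose_split(nodes, last_depth, low=0.25, high=0.75):
    """Leftmost non-root node whose subtree holds 25%-75% of the block, else the most balanced."""
    sizes = subtree_sizes(nodes, last_depth)
    total = len(nodes)
    for pos in range(1, total):
        if low * total <= sizes[pos] <= high * total:
            return pos
    return min(range(1, total), key=lambda p: abs(total - 2 * sizes[p]), default=None)


block = [0b1011, 0b1000, 0b1000, 0b1000, 0b1000, 0b1111, 1, 1, 1, 1]
print(subtree_sizes(block, 2))  # [10, 2, 1, 2, 1, 5, 1, 1, 1, 1]
print(choose_split(block, 2))   # 5
```

Scanning from the left favors nodes that appear early in the block, so the pointer to the new child block is reached after a short scan.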
After choosing node , we carry out the split by generating two blocks, adding the corresponding pointer to the new child block, and adding as a frontier node (storing its preorder in and its pointer in ).
Increasing the size of deeper blocks.
A way to reduce the cost of traversing the blocks sequentially is to define a small maximum block size . The cost is that this increases the space usage, because more blocks will be needed (thus increasing the number of pointers, and hence the space, of the data structure). We have the fortunate situation, however, that the most frequently traversed blocks are closer to the root, and these are relatively few. To exploit this fact, we define different maximum block sizes according to the depth of the corresponding block, with smaller maximum block sizes for smaller depths. We define parameters such that blocks whose root has depth at most have maximum block size , blocks whose root has depth at most have maximum block size , and the remaining blocks have maximum size , for . In this way, we aim to reduce the traversal cost, while using little space at deeper blocks. Pushing this idea to the extreme, we may set , equivalent to allowing that the top part of the tree be represented with explicit pointers.
Theorem 2.1 builds on a highly theoretical result [ADR16], thus our engineered structure obtains higher time complexities. In our implementation, each operation costs time, which we set close to to obtain the same space redundancies of dynamic bitvectors. In turn, the implementation of dynamic bitvectors [BCPdBN17] takes time per basic operation (edge traversal or update). An advantage of our implementation is that, during the -time traversal of a single block, we may process several -tree edges, but this is not guaranteed. As a result, we can expect that our implementation be about as fast as the dynamic bitvectors or significantly faster, depending on the tree topology. Our experiments in the next section confirm these expectations.
4.1 Experimental setup
We experimentally evaluate our proposal by comparing it with the dynamic $k^2$-tree implementation based on dynamic bitvectors [BCPdBN17]. Other dynamic trie implementations exist [AS10, BBV10, K17] that are designed for storing general string dictionaries, and could store the points using their Morton codes. However, these techniques usually do not compress and require space comparable to the original collection of strings; moreover, even if they are more efficient when searching for a single element, they lack the ability to answer more complex queries, such as row/column queries, through a single traversal of the tree, as required in $k^2$-tree representations.
We use four different datasets in our experiments. Their basic information is described in Table 1. The graphs indochina and uk are Web graphs (http://law.di.unimi.it/datasets.php), known to be very sparse and compressible. The datasets triples-med and triples-dense are selected predicates of the DBPedia 3.5.1 (https://wiki.dbpedia.org/services-resources/datasets/data-set-35/data-set-351), transformed through vertical partitioning as in previous work [AGBFMPN14]; they are also sparse matrices but much less regular, and more difficult to compress.
For our structure we use and the following configuration parameters: (i.e., we use explicit pointers in the first few levels of the trie), , and use varying , from 256 to 1024. We show the tradeoff using values of 8 and 12, and values of from 10 to 16 depending on .
For the approach based on dynamic bitvectors (dyn-bitmap), we show results of the practical implementation with the default setup (block size 512 and in the first 3 levels of decomposition) and, when relevant, another configuration with smaller block size 128 and in the first 5 levels.
We run our experiments in a machine with 4 Intel email@example.comGHz cores and 8GB RAM, running Ubuntu 16.04.6. Our code is implemented in C++ and compiled with g++ 5.5.0 using the -O9 optimization flag.
In order to test the compression and performance of our techniques, we start by building the representations from the original datasets. To do this, we shuffle the points in the dataset into a random order, and insert them in the structures one by one. Then, we measure the average insertion time during construction of the complete dataset, as well as the space used by the structure after construction.
Figure 3 displays insertion times during construction and final space for all the datasets and tested configurations. The results show that in Web graphs (indochina and uk) our representations can be created significantly faster than the dynamic bitvectors while requiring negligible additional space, for example 20–25% faster using 3% more space. Moreover, our representations provide a wide space-time tradeoff that the technique based on dynamic bitvectors does not match (in Web graphs we only show results for the default configuration of dyn-bitmap, because the configuration with smaller blocks is both larger and slower). The configuration to achieve this tradeoff is also quite intuitive: larger blocks in the lower levels lead to slower but more compact structures, and smaller blocks lead to faster but less compact ones.
In the RDF datasets (triples-med and triples-dense), our structures are even more competitive, using far less space and time than the dynamic bitvectors. In triples-med, our structures are 2.5 times faster when using similar space, or use 25% less space for the same speed. In triples-dense we are about 5 times faster when using the same space, and still 3 times faster than dynamic bitvectors when using 20% less space. Notice that the main difference between RDF and Web graph datasets is the regularity and clusterization of the points in the matrix, which is much higher in Web graphs than in RDF datasets. This also explains the worse space results achieved in these datasets compared to Web graphs. A similar difference in regularity exists between triples-med and triples-dense, where the latter is much more difficult to compress.
Next, we measure the average query times to retrieve a point. To do this, we again select the points of each collection in random order, limiting our selection to 100 million points in the larger datasets, and measure the average query time to search for each of them. Figure 4 displays the query times for these cell retrieval queries. Results are analogous to those of insertion times. In Web graphs, our tries obtain even better performance compared to dynamic bitvectors. In RDF datasets the times are slightly closer but our tries still outperform dynamic bitvectors in space and time: In triples-med tries are 70% faster when using the same space, or 20% smaller when taking the same time. In triples-dense tries are 4 times faster when using the same space, and 3 times faster when using 20% less space.
We also perform tests querying for 100 million randomly selected cells. In practice, most of these cells will not belong to the collection, and they will probably be relatively far from existing points, hence allowing the structures to stop the traversal in the upper levels of the tree. These kinds of queries are much faster, and their times are almost identical for all the trie configurations tested in each dataset. In Web graphs, the dynamic bitvectors obtain better query times for these queries (0.4–0.6 µs/query in indochina and uk, while our tries take around 0.6–0.7 and 0.75–0.95 µs/query, respectively). In RDF datasets, our tries are still significantly faster (around 0.55–0.6 µs/query in both datasets, whereas dynamic bitvectors take 1.1–1.2 µs/query in triples-med and 1.5–1.9 µs/query in triples-dense). This points to the depth of the tree search as a relevant factor in query complexity: our tries seem to have more stable query times, and are faster in queries that involve traversal of the full tree depth. In Web graphs, where points are usually clustered, non-existing points are detected in upper levels of the tree, and query times are usually better. In the RDF datasets, where points are more randomly distributed, the depth of the search is expected to be higher on average even if the dataset is still very sparse.
Regarding the -tree as a trie on the Morton codes of the points it represents yields a new view that differs from the classical one based on bitvectors [BLN13]. We have shown that this makes an important difference in the dynamic scenario, because dynamic tries can break lower bounds on maintaining dynamic bitvectors. Apart from the theoretical result, we have implemented a dynamic trie specialized in representing -trees, where the trie is cut into a tree of blocks, each block representing a connected component of the trie. The dynamic trie uses a depth-first search deployment of the trie, unlike the classical level-wise deployment. While this format cannot be traversed in constant time per trie edge, it is convenient for a dynamic trie representation because it is consistent with the tree of blocks, update operations require local changes, a single left-to-right block scan processes several downward edge traversals, and such scan is cache-friendly and does not require rebuilding any speed-up data structure.
Our experimental results show that our representation significantly outperforms the one based on dynamic bitvectors [BCPdBN17] on some datasets, in space, time, or both, depending on the nature of the dataset.
In the final version we will include experiments on other operations like extracting all the neighbors of a node. A future goal is to explore applications of our dynamic -tree representation, in particular for graph databases [AGFLP18].