Access-Adaptive Priority Search Tree

09/04/2020 ∙ by Haley Massa, et al. ∙ 0

In this paper we show that the priority search tree of McCreight, which was originally developed to satisfy a class of spatial search queries on 2-dimensional points, can be adapted to the problem of dynamically maintaining a set of keys so that the query complexity adapts to the distribution of queried keys. Presently, the best-known example of such a data structure is the splay tree, which dynamically reconfigures itself during each query so that frequently accessed keys move to the top of the tree and thus can be retrieved with fewer comparisons than keys that are lower in the tree. However, while the splay tree is conjectured to offer optimal adaptive amortized query complexity, it may require O(n) time for individual queries. We show that an access-adaptive priority search tree (AAPST) can provide competitive adaptive query performance while ensuring O(log n) worst-case query performance, thus potentially making it more suitable for certain interactive (e.g., online and real-time) applications for which the response time must be bounded.

I Introduction

Many applications demand the efficient satisfaction of key-retrieval queries from a dynamically-maintained search structure (database) of keys. In many of these applications certain keys are queried much more frequently than other keys, and this nonuniform sampling from the set of keys can potentially be exploited by a distribution-sensitive search structure to surpass the comparison-based theoretical lower bound on the expected number of comparisons per query required in the uniform case.

The splay tree, developed by Daniel Sleator and Robert Tarjan [4], is a self-adjusting [1] binary search tree that adapts its structure to the access distribution of the dataset. Splay trees differ from standard balanced BSTs by performing rotations that migrate frequently-accessed keys to the top of the tree so that the search paths to those keys will be shorter when accessed during future queries. A novelty of the splay tree is that it does not necessarily enforce balance at all times during a given sequence of updates and/or queries, but it does guarantee that the complexity of any sufficiently long sequence of m operations on n keys is O(m log n). The value of the splay tree as an access-sensitive search structure is that it can offer sequence time complexity approaching O(m) if the access distribution of keys is highly nonuniform. By contrast, a standard balanced BST (e.g., AVL, red-black, etc. [3]) provides no access-distribution sensitivity and thus can be expected to require O(m log n) time to perform a sequence of m operations. A natural question is whether it is possible to combine the access-sensitive properties of the splay tree with the efficient worst-case properties of a balanced BST.

In this paper we show that the priority search tree [2], which was published in the same year as the splay tree (1985), can be applied to achieve adaptive query performance that is competitive with the splay tree while providing worst-case optimal O(log n) update and query complexity. This access-adaptive priority search tree (AAPST) is described in the following section. We then provide practical comparisons of the AAPST and splay tree in the form of simulation results with varying degrees of nonuniformity in the sampling of query keys.

II Adaptive Priority Search Tree

The priority search tree (PST) is a data structure introduced by Edward McCreight in 1985 with the objective of storing a set of n points in the plane in a way that allows for O(log n) update complexity, i.e., insertion or deletion of a point, and O(log n + k) complexity for semi-infinite 2-dimensional range queries, where k is the number of returned objects. This complexity is achieved by maintaining the points simultaneously in BST order on the x coordinates and heap order on the y coordinates within the same binary tree structure. The data structure allows for five main operations on a dataset D of ordered pairs (x, y) to be performed efficiently:

  1. Insert an ordered pair (x, y) into D.

  2. Delete an ordered pair (x, y) from D.

  3. Given integers x0, x1, and y1, among all the pairs (x, y) in D such that x0 ≤ x ≤ x1 and y ≤ y1, find the pair whose x is minimal.

  4. Given integers x0 and x1, among all the pairs (x, y) in D such that x0 ≤ x ≤ x1, find the pair whose y is minimal.

  5. Given integers x0, x1, and y1, enumerate all pairs (x, y) in D such that x0 ≤ x ≤ x1 and y ≤ y1.

The priority search tree was the first data structure to support 2-dimensional spatial search queries within the same complexity as the 1-dimensional range queries offered by balanced binary search trees (BSTs) while also supporting update operations. Specifically, operations 1-4 have O(log n) time complexity and operation 5 has O(log n + k) complexity, where k is the number of pairs enumerated.

Each node in a priority search tree contains exactly one ordered pair (x, y). A maximum PST is constructed so that the y-value of every child node is less than or equal to that of its parent, whereas in a minimum PST the y-value of a child node is greater than or equal to its parent’s y-value. In both cases, the x-value of every node in a right subtree is strictly less than that of every node in the left subtree. Furthermore, the cardinality of a node’s right subtree is equal to or one less than that of the node’s left subtree, ensuring balance.
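The structure just described can be illustrated with a minimal sketch of a static minimum PST together with operation 5. This is not McCreight's dynamic algorithm: it builds the tree once from a sorted list, and the names `PSTNode`, `build`, and `enumerate_rectangle` are hypothetical. The sketch also assumes distinct x-values and places smaller x-values in the left subtree; the mirror-image orientation works equally well.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pair = Tuple[int, int]  # (x, y)


@dataclass
class PSTNode:
    pair: Pair                   # the pair stored at this node
    pivot: int                   # x <= pivot lives in the left subtree
    left: Optional["PSTNode"]
    right: Optional["PSTNode"]


def build(pairs: List[Pair]) -> Optional[PSTNode]:
    """Build a minimum PST from pairs sorted by x (distinct x assumed)."""
    if not pairs:
        return None
    # Heap order: the pair with the smallest y becomes the root.
    i = min(range(len(pairs)), key=lambda j: pairs[j][1])
    top, rest = pairs[i], pairs[:i] + pairs[i + 1:]
    # BST order and balance: split the remaining pairs at the median x,
    # giving the left subtree the extra pair when the count is odd.
    mid = (len(rest) + 1) // 2
    pivot = rest[mid - 1][0] if mid > 0 else top[0]
    return PSTNode(top, pivot, build(rest[:mid]), build(rest[mid:]))


def enumerate_rectangle(node: Optional[PSTNode], x0: int, x1: int, y1: int) -> List[Pair]:
    """Operation 5: all pairs with x0 <= x <= x1 and y <= y1."""
    # Heap order lets us prune: if this y is too large, so is every y below it.
    if node is None or node.pair[1] > y1:
        return []
    x, _ = node.pair
    found = [node.pair] if x0 <= x <= x1 else []
    if x0 <= node.pivot:         # the query range overlaps the left subtree
        found += enumerate_rectangle(node.left, x0, x1, y1)
    if x1 > node.pivot:          # the query range overlaps the right subtree
        found += enumerate_rectangle(node.right, x0, x1, y1)
    return found


pairs = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 9), (6, 2), (7, 7)]
root = build(sorted(pairs))
result = sorted(enumerate_rectangle(root, 2, 6, 3))
```

Here `result` holds the pairs whose x lies in [2, 6] and whose y is at most 3. The pruning on y is what keeps the query cost proportional to the tree height plus the number of reported pairs.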

While the priority search tree was originally created to store two-dimensional coordinates, we introduce here an alternative use of the data structure. With some slight alterations to its construction and operations, the priority search tree can be used as a distribution-sensitive search structure. We will call this specialized structure an access-adaptive priority search tree (AAPST). The AAPST stores each key of a dataset as the x-value of a node and its respective access frequency as the y-value. In other words, the skeleton of the tree maintains a BST ordering of the keys while the access frequencies associated with the keys are maintained in heap order. By slightly altering the search algorithms of a regular priority search tree, any search key can be found (or determined to be absent) in an AAPST in O(log n) time. The principal change to the standard PST is the incrementing of the priorities (access frequencies) associated with the keys. Specifically, when a key is accessed by either an update or a query, its access count is incremented and the key may then need to move to a higher position in the heap. More specifically, a modified access/query algorithm can be defined as follows:

  1. Find the query key and increment its associated priority.

  2. If the incremented priority does not exceed the priority of the key-priority pair in its parent node then return.

  3. Else delete the pair and reinsert using the standard PST update algorithms.

Step 1 takes time proportional to the pair’s depth in the tree, and this will be the complexity of the operation in all cases in which the updated priority does not affect the heap order; otherwise the complexity of the operation will be dominated by the O(log n) complexities of the standard PST update algorithms. This establishes the O(log n) worst-case complexity of the new adaptive query algorithm.
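The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the key set is static, and the tree uses a fixed balanced skeleton of pivot keys in which some slots stay empty, so it departs from the one-pair-per-node layout of the standard PST. The names `Slot`, `AAPST`, `access`, `_push_down`, and `_pull_up` are hypothetical.

```python
from typing import List, Optional


class Slot:
    """One skeleton node; holds at most one (key, count) pair."""

    def __init__(self, pivot: Optional[int]):
        self.pivot = pivot                 # keys < pivot route left; None marks a leaf
        self.key: Optional[int] = None     # stored key (x-value), None if empty
        self.count = 0                     # access frequency (y-value)
        self.left: Optional["Slot"] = None
        self.right: Optional["Slot"] = None


class AAPST:
    def __init__(self, keys: List[int]):
        self.keys = sorted(set(keys))
        self.root = self._skeleton(0, len(self.keys))
        for k in self.keys:
            self._push_down(k, 0)

    def _skeleton(self, lo: int, hi: int) -> Slot:
        # Balanced skeleton over sorted key positions lo..hi-1 (height O(log n)).
        if hi - lo <= 1:
            return Slot(None)
        mid = (lo + hi) // 2
        node = Slot(self.keys[mid])
        node.left, node.right = self._skeleton(lo, mid), self._skeleton(mid, hi)
        return node

    @staticmethod
    def _child(node: Slot, key: int) -> Optional[Slot]:
        if node.pivot is None:
            return None
        return node.left if key < node.pivot else node.right

    def _push_down(self, key: int, count: int) -> None:
        # Insert: a higher-priority pair displaces the stored pair, which then
        # continues downward along the path determined by its own key.
        node = self.root
        while node.key is not None:
            if count > node.count:
                node.key, key = key, node.key
                node.count, count = count, node.count
            node = self._child(node, key)
        node.key, node.count = key, count

    def _pull_up(self, node: Slot) -> None:
        # Delete: repeatedly promote the higher-priority child pair into the hole.
        while True:
            kids = [c for c in (node.left, node.right) if c and c.key is not None]
            if not kids:
                node.key, node.count = None, 0
                return
            best = max(kids, key=lambda c: c.count)
            node.key, node.count = best.key, best.count
            node = best

    def access(self, key: int) -> Optional[int]:
        """Return the depth at which key was found, or None if absent."""
        node, parent, depth = self.root, None, 0
        while node is not None and node.key is not None and node.key != key:
            parent, node, depth = node, self._child(node, key), depth + 1
        if node is None or node.key is None:
            return None
        node.count += 1                                         # step 1
        if parent is not None and node.count > parent.count:    # step 2
            count = node.count
            self._pull_up(node)                                 # step 3: delete...
            self._push_down(key, count)                         # ...and reinsert
        return depth


tree = AAPST(list(range(16)))
tree.access(11)   # 11 now has the highest count and moves to the root
```

Because the skeleton never changes shape, the search, the pull-up, and the push-down each touch at most one root-to-leaf path, which is where the logarithmic worst case comes from in this sketch.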

III Comparative Performance Results

In this section we examine the relative performance characteristics of a conventional balanced binary search tree (BST), a splay tree, and the AAPST. In the case of uniformly sampled query keys, the splay tree and AAPST incur extra overhead compared to the BST. In the case of the splay tree, this overhead takes the form of extra comparisons performed as the tree is restructured. In the case of the AAPST, the overhead takes the form of an extra key comparison per node visited: one comparison to the key stored at each node according to heap order, and another comparison to the pivot key stored at each node for use in traversing the tree according to BST order. Therefore, the goal of our tests is to examine how the relative number of key comparisons used during the search of each data structure is affected by the distribution of queried keys. We expect the BST to be superior in the uniform case, while the splay tree and AAPST should perform better as the distribution becomes increasingly nonuniform.

We define a mixing parameter for our key-access testing distributions. One extreme of the parameter represents a uniform random distribution of key accesses. The other extreme represents an exponentially-distributed sequence of accesses, with the most frequently accessed key receiving the largest share of the accesses, the next key a geometrically smaller share, etc., such that a small fraction of the keys comprises most of the accesses. Intermediate values represent a weighted mixture of accesses from the two distributions.
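A query generator of this kind might be sketched as below. The function name `sample_key`, the parameter name `mix`, the convention that `mix` is the fraction of exponentially sampled queries, and the halving ratio between successive ranks are all assumptions made for illustration; they are not taken from the experiments.

```python
import random
from collections import Counter
from typing import List


def sample_key(keys: List[int], mix: float, rng: random.Random) -> int:
    """Draw one query key: exponential with probability mix, else uniform."""
    if rng.random() < mix:
        # Assumed exponential component: rank 0 gets about 1/2 of the draws,
        # rank 1 about 1/4, and so on (truncated at the last key).
        rank = 0
        while rank < len(keys) - 1 and rng.random() < 0.5:
            rank += 1
        return keys[rank]
    return rng.choice(keys)  # uniform component


rng = random.Random(0)
keys = list(range(100))
skewed = Counter(sample_key(keys, 1.0, rng) for _ in range(10_000))
uniform = Counter(sample_key(keys, 0.0, rng) for _ in range(10_000))
```

With `mix` set to 1.0 the counts concentrate on the first few keys, while with `mix` set to 0.0 they spread roughly evenly over all 100 keys.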

As can be seen in Figure 1, the splay tree and AAPST perform comparably but are outperformed by the BST because of its lower overhead. In other words, the overhead of adaptivity incurred by the splay tree and AAPST does not yield dividends in the case of keys that are queried uniformly at random. Figure 2 shows that the relative performance advantage of the BST decreases when a portion of the key accesses follow exponentially distributed access frequencies. Figure 3 shows that when there is an equal mix of keys sampled from the uniform and exponential distributions the three search structures perform comparably. Figures 4 and 5 show that the distribution-sensitivity properties of the splay tree and AAPST provide a significant performance advantage over the BST as the access frequencies tend toward an exponential distribution.

Fig. 1: This figure shows the average number of key comparisons for query keys drawn from a uniform distribution for datasets of increasing size n. The expected number of key comparisons performed by the BST is approximately log n, while the splay tree and AAPST perform roughly twice as many comparisons per query.
Fig. 2: This figure shows the average number of key comparisons when a minority of the query keys are sampled with exponential frequency and the remainder are sampled uniformly.
Fig. 3: This figure shows the average number of key comparisons when half of the query keys are sampled with exponential frequency and the remaining half are sampled uniformly.
Fig. 4: This figure shows the average number of key comparisons when a majority of the query keys are sampled according to an exponential distribution.
Fig. 5: This figure shows the average number of key comparisons when all query keys are sampled according to an exponential distribution, i.e., the frequency of access of different keys decreases exponentially.

IV Discussion

The principal contribution of this paper is the demonstration that the classical priority search tree of McCreight can be reinterpreted so that instead of storing 2-dimensional points it is adapted for the access-sensitive storage and retrieval of 1-dimensional keys. Our simulation results show that the new access-adaptive priority search tree (AAPST) offers access-sensitive performance comparable to that of the splay tree while bounding the complexity of each operation, a property which is needed for interactive applications that must impose strict constraints on the worst-case response time of each operation. Future work will examine finer-grained performance characteristics of the AAPST and their relevance to practical applications.

References

  • [1] B. Allen and I. Munro, “Self-organizing search trees,” Journal of the ACM, 25 (4): 526–535, 1978.
  • [2] E. McCreight, “Priority search trees,” SIAM Journal on Computing, 14 (2): 257–276, 1985.
  • [3] R. Sedgewick, “Balanced Trees,” Algorithms, Addison-Wesley, 1983.
  • [4] D. D. Sleator and R. E. Tarjan, “Self-adjusting binary search trees,” Journal of the ACM, 32 (3): 652–686, 1985.