In the k-nearest neighbors (k-NN) problem we are given a set of point sites in some domain, and we wish to preprocess these points such that, given a query point and an integer k, we can efficiently find the k sites ‘closest’ to the query point. This static problem has been studied in many different settings [Andoni08, Chan00, Chan16, lee1982kthorder_vd, Liu20]. We study the dynamic version of the k-nearest neighbors problem, in which the set of sites may be subject to updates, i.e. insertions and deletions. We are particularly interested in two settings: (i) a setting in which the domain containing the sites contains (polygonal) obstacles, and in which we measure the distance between two points by their geodesic distance, that is, the length of the shortest obstacle-avoiding path, and (ii) a setting in which only insertions into the set of sites are allowed (i.e. no deletions).
In many applications involving distances and shortest paths, the entities involved cannot travel towards their destination in a straight line. For example, a person walking through the city center may want to find the closest restaurants that currently have seats available. However, he or she cannot walk through walls, and hence we need to explicitly incorporate such obstacles into the problem. This introduces additional complications as a single shortest path in a polygon with vertices may have complexity , and thus it requires time to compute such a path. We wish to limit the dependency on in the space, query, and update times of our data structure as much as possible. In particular, we want to avoid having to spend time and space when we insert or delete a site. In terms of the above example: we wish to avoid having to spend time every time some seats open up, causing a restaurant to become available.
The second setting is motivated by classification problems. In k-nearest neighbor classifiers the sites all have a label, and the label of a query point is predicted based on the labels of the sites nearest to it [cover1967nearest]. When this predicted label turns out to be sufficiently accurate, it is customary to extend the data set by adding the query point to the set of sites. Hence, this naturally leads to the question whether there is an insertion-only data structure that can efficiently answer k-nearest neighbor queries.
The static problem.
If the set of sites is static, and k is known a priori, one option is to build the (geodesic) k-th order Voronoi diagram of [Liu13]. This yields very fast query times, where is the complexity of the domain ; however, it is costly in space, as even in a simple polygon the diagram has size . Moreover, k needs to be known a priori. In the scenario where the domain is the Euclidean plane, much more space-efficient solutions have been developed. There is an optimal linear space data structure achieving query time after deterministic preprocessing time [Afshani09, Chan16]. Very recently, Liu showed how to achieve the same query time for general constant-complexity distance functions for arbitrary sites in , using space [Liu20] and roughly expected preprocessing time (the exact bound depends on the algebraic degree of the functions). In case is a simple polygon with vertices, the problem has not explicitly been studied. The only known solution using less space than just storing the k-th order Voronoi diagram is the fully dynamic 1-NN structure of Agarwal et al. [Staals18]. It uses space, and answers queries in time (by deleting and reinserting the k closest sites to answer a query).
Issues when inserting sites.
Since nearest neighbor searching is decomposable, we can apply the logarithmic method [Overmars83] to turn a static -NN searching data structure into an insertion-only data structure. For example in the Euclidean plane this yields a linear space data structure with insertion time. However, since this partitions the set of sites into subsets, and we do not know how many of the -nearest sites appear in each subset, we may have to consider up to sites from each of the subsets, thus yielding an term in the query time. In Section 3 we will present a general technique that allows us to avoid this additional factor.
Fully dynamic data structures.
In case we wish to support both insertions and deletions the problem becomes more complicated, and the existing solutions much more involved. When we again consider the plane, and we wish to report only one nearest neighbor (i.e. -NN searching), several efficient fully dynamic data structures exist [Chan10, Chan19, Kaplan17]. Actually, all these data structures are variants of the same data structure by Chan [Chan10]. For the Euclidean distance, the current best result using linear space achieves worst-case query time, insertion time, and deletion time [Chan19]. These results are deterministic, and the update times are amortized. The variant by Kaplan et al. [Kaplan17] achieves similar results for general distance functions: space, worst-case query time, and expected amortized update time. Using recent results on shallow cuttings by Liu [Liu20] the space can be reduced to . These data structures can also be used to answer -NN queries, but when used in this way essentially suffer from the same problem as in the insertion-only case. That is, we get a query time of time [Chan10].
For the Euclidean case, Chan argues that the above data structure for 1-NN searching can be extended to obtain query time, while still retaining polylogarithmic updates [Chan12-kNN]. Chan’s data structure essentially maintains a collection of k-NN data structures built on subsets of the sites. A careful analysis shows that some of these structures can be rebuilt during updates, and that the cost of these updates is not too large. Queries are then answered by performing k-NN queries on several disjoint subsets of sites that together are guaranteed to contain the k nearest sites. However, perhaps because the details of the underlying nearest neighbor searching data structure are already fairly involved, one aspect of the query algorithm is missing: how to determine the value with which each subset should be queried. While it seems that this issue can be fixed using randomization [Chan21Fix] (the main idea is that the data structure as is can be used to efficiently report all sites within a fixed distance from the query point, i.e. to report all planes below a query point; combining this with an earlier random sampling idea [Chan00], one can then also answer k-NN queries), our general k-NN query technique (Section 3) allows us to recover deterministic, worst-case query time.
Very recently, Liu [Liu20] stated that one can obtain query time while supporting expected amortized updates also for general distance functions by using the data structure of Kaplan et al. [Kaplan17]. However, it is unclear why that would be the case, as all details are missing. Using the Kaplan et al. data structure as is yields an term in the query time as with Chan’s earlier version [Chan10]. If the idea is to also apply the ideas from Chan’s later paper [Chan12-kNN] the same issue of choosing the ’s appears. Similarly, extending the geodesic -NN data structure [Staals18] to -NN queries yields query time.
Organization and Results.
We develop dynamic data structures for k-NN queries whose query times are of the form , where is some function of . In particular, we wish to avoid an term in the query time. To this end, we present a general query technique that given disjoint subsets of sites , each stored in a static data structure that supports k-NN queries in time, can report the k nearest neighbors among in time. Our technique, presented in Section 3, is completely combinatorial, and is applicable to any type of sites. In Section 4, we then use this technique to obtain a k-NN data structure that supports queries in time and insertions in time, where is the time required to build the static data structure. This result again applies to any type of sites. In the specific case of the Euclidean plane, we obtain a linear space data structure with query time and insertion time. At a slight increase of insertion time, we can also match the query time of Chan’s [Chan12-kNN] fully dynamic data structure. For general, constant complexity, algebraic distance functions we obtain the same query and insertion times (albeit the insertion time holds in expectation). In the case where the sites are points inside a simple polygon with vertices, we use our technique to obtain the first static k-NN data structure that uses near-linear space, supports efficient (i.e. without the term) queries, and can be constructed efficiently. We now do get an term in the query time, as computing the distance between a pair of points already takes time. Our data structure uses space, can be constructed in time, and supports time queries. In turn, this then also leads to a data structure supporting efficient, time, insertions.
In Section 5 we argue that our general query algorithm is the final piece of the puzzle for the fully dynamic case. For the Euclidean plane, this allows us to recover the deterministic, worst-case query time claimed before [Chan12-kNN, Liu20]. Insertions take amortized time, whereas deletions take time. We obtain the same query time in case of constant degree algebraic distance functions. Updates now take amortized expected time (see Section 5.5 for the exact bounds).
For the geodesic case there is one final hurdle to overcome. Chan’s algorithm uses a partition-tree based “slow” dynamic k-NN query data structure of linear size as one of its subroutines (see Section 5.1 for details). Liu uses a similar trick (after appropriately linearizing the distance functions into for some constant ) in his static k-NN data structure [Liu20]. Unfortunately, this idea is not applicable in the geodesic setting, as it is unknown if an appropriate (shallow) simplicial partition exists, and the geodesic distance function cannot be linearized into a constant-dimensional space (the dimension would need to depend on ). Instead, we design a simple, shallow-cutting based, alternative “slow” dynamic k-NN structure that does extend to the geodesic setting. This way, we end up with an efficient (i.e. expected updates, queries) fully dynamic k-NN data structure.
We can easily transform a -nearest neighbors problem in to a -lowest functions problem in by considering (the graphs of) the distance functions of the sites . We discuss these problems interchangeably, furthermore we identify a function with its graph.
2.1 Shallow cuttings
Let be a set of bivariate functions. We consider the arrangement of in . The level of a point is defined as the number of functions in that pass strictly below . The at most -level is then the set of points in that have level at most .
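As a small illustration of these definitions, the following sketch computes the level of a point with respect to the distance functions of a few sites. It assumes Euclidean distance in the plane; the helper names `distance_function`, `level`, and `in_at_most_k_level` are ours and serve only as an example.

```python
import math
from typing import Callable, Sequence, Tuple

Bivariate = Callable[[float, float], float]


def distance_function(site: Tuple[float, float]) -> Bivariate:
    """Graph of the (Euclidean, assumed) distance to a site, as a bivariate function."""
    sx, sy = site
    return lambda x, y: math.hypot(x - sx, y - sy)


def level(functions: Sequence[Bivariate], x: float, y: float, z: float) -> int:
    """Number of functions whose graph passes strictly below the point (x, y, z)."""
    return sum(1 for f in functions if f(x, y) < z)


def in_at_most_k_level(functions: Sequence[Bivariate],
                       x: float, y: float, z: float, k: int) -> bool:
    """True if the point (x, y, z) has level at most k."""
    return level(functions, x, y, z) <= k


sites = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
functions = [distance_function(s) for s in sites]
# Above the origin the three graphs have heights 0, 5 and 10.
print(level(functions, 0.0, 0.0, 6.0))
```

Under this view, the k nearest sites of a query point are exactly the k functions that are lowest on the vertical line through that point.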
A -shallow cutting of is a set of disjoint cells covering , such that each cell intersects at most functions [m-rph-92]. When is clear from the context we may write rather than . We are interested only in the case where the cells are (pseudo-)prisms: constant-complexity regions that are bounded from above by a function, from the sides by vertical (with respect to the -direction) planes, and unbounded from below. For example, if is a set of planes, we can define the top of each prism to be a triangle. This allows us to find the prism containing a query point by a point location query in the downward projection of the cutting. See Figure 2. The subset intersecting a prism is the conflict list of . When, for every subset , the lower envelope has linear complexity (for example, in the case of planes), a shallow cutting of size (the number of cells) can be computed efficiently [Liu20]. In general, let be the time to construct a -shallow cutting of size on functions, and be the time to locate the prism containing a query point. We assume these functions are non-decreasing in and non-increasing in , and that , for some function .
2.2 A dynamic nearest neighbor data structure
We briefly discuss the main ideas of the dynamic nearest neighbor data structure by Chan [Chan10, Chan19] that was later improved by Kaplan et al. [Kaplan17], as this also forms a key component in our fully dynamic -NN data structures. For a more detailed description we refer to the original papers. For ease of exposition, we describe the data structure when is a set of linear functions (planes). To make sure the analysis is correct for our definition of (the current number of points in ), we rebuild the data structure from scratch whenever has doubled or halved. The cost of this is subsumed in the cost of the other operations [Chan10].
The data structure consists of “towers” , for some fixed . Each tower consists of a hierarchy of shallow cuttings that is built on a subset of planes . For we have , and a sequence of shallow cuttings, for a fixed constant . For we have a -shallow cutting of a subset of the planes , where . We set and construct these cuttings from to . After computing , we find the set of “bad” planes that intersect more than prisms in all cuttings computed so far. We prune these planes by setting and removing all planes in from the conflict lists of the prisms in . Note that these bad planes are removed only from the conflict lists of the current cutting, and can still occur in conflict lists of higher level cuttings. In the final cutting, each conflict list has a constant size of . We denote by the set of all bad planes generated during this process. By we denote the set of planes that have not been pruned during the process, so . We then set and recursively build on the functions in . This partitions into sets . When insertions and deletions take place, planes can move from a set to some , but the property that these sets form a partition of will be preserved. Kaplan et al. prove the following lemma on the size of after the preprocessing phase:
[Lemma 7.1 of [Kaplan17]] For any there exists a sufficiently large (but constant) choice of , such that after building .
When , we get towers, for some fixed , as desired. According to Kaplan et al. [Kaplan17], this is achieved by choosing , for some constant . Thus a plane occurs times in a tower. Here, we first consider only the case where .
To build a single tower, naively we would need to compute shallow cuttings, each of which takes time. By using information of previously computed cuttings, Chan [Chan19] recently achieved an overall construction time of . The preprocessing time of the entire data structure thus adheres to the recurrence . This solves to .
To insert a plane into , we insert it into . When we insert a function into we assign it to , and thus recursively insert it in . When reaches we rebuild the towers . The first tower, , is built on the planes , and the following towers are again built recursively on the new sets . Only after insertions can such a rebuild occur. The insertion time is thus given by the recurrence , where is the time to build the data structure on a set of planes. Using , this results in an amortized insertion time of .
Deletions are not performed explicitly on the conflict lists. Instead, for each prism we keep track of the number of planes in that have been deleted so far, denoted by . When deleting a plane , we increase for all prisms with , and remove from the set that includes . When too many planes in a conflict list have been deleted, we purge the prism. In particular, we purge a prism in a -shallow cutting when . When a prism in is purged, we mark it as such, and we reinsert all planes . These planes are effectively moved from to some other . Note that we only reinsert planes that have not been deleted so far. This scheme ensures a prism is only purged after at least deletions, and this causes at most reinsertions. Thus, each increment of causes amortized reinsertions. This gives an amortized deletion time of .
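The lazy deletion scheme can be sketched as follows. The class `Prism`, the function `delete`, and in particular the numeric purge `threshold` are hypothetical stand-ins chosen for illustration; the sketch only reports which live planes must be reinserted and leaves the actual reinsertion into another tower to the caller.

```python
class Prism:
    """A prism with its conflict list and a counter of deleted planes."""

    def __init__(self, conflict_list, threshold):
        self.conflict_list = set(conflict_list)
        self.threshold = threshold  # hypothetical purge threshold
        self.deleted = 0            # number of deleted planes in the conflict list
        self.purged = False


def delete(plane, prisms, live):
    """Delete `plane` lazily and return the live planes that must be reinserted."""
    live.discard(plane)
    to_reinsert = set()
    for prism in prisms:
        if prism.purged or plane not in prism.conflict_list:
            continue
        # The plane stays in the conflict list; only the counter changes.
        prism.deleted += 1
        if prism.deleted > prism.threshold:
            # Too many deletions: purge the prism and reinsert its live planes.
            prism.purged = True
            to_reinsert |= prism.conflict_list & live
    return to_reinsert


live = {"a", "b", "c", "d", "e"}
p1 = Prism({"a", "b", "c", "d"}, threshold=1)
p2 = Prism({"a", "e"}, threshold=5)
first = delete("a", [p1, p2], live)   # no prism is purged yet
second = delete("b", [p1, p2], live)  # p1 exceeds its threshold and is purged
```

Only planes that are still live are returned, which mirrors the remark that deleted planes are never reinserted.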
Nearest neighbor queries.
When answering a nearest neighbor query for a query point , we simply find the prism containing in the lowest () cutting of each by a point location query in time. We then go through each conflict list (of constant size) to find the plane that is lowest at . If the plane we find for is not in , we ignore the result. When a prism has been purged, we simply skip it. Finally, we return the plane that is lowest among the planes that are found. As we perform point location queries, the query time is .
-nearest neighbors queries.
Answering -nearest neighbors queries using this dynamic 1-NN data structure is straightforward. For each tower we consider the prism containing of the shallow cutting at level , for some large enough constant . The size of the conflict list of each of these prisms is , thus we can find the -lowest live planes in each conflict list in time. Chan [Chan10] proves that it is indeed sufficient to consider only planes in these conflict lists. This query algorithm has a running time of .
Liu [Liu20] recently claimed the data structure, in particular the version of Kaplan et al. [Kaplan17], supports -NN queries in . However, we see an issue with this approach. When a plane is pruned during the preprocessing, or when a prism is purged, the plane is only removed from the conflict lists of the current shallow cutting. It can thus still occur in other shallow cuttings in the hierarchy. This means that we can encounter the same plane multiple times when querying each tower for the -lowest planes. See Figure 2 for an illustration. As there are towers, this yields an term in the query time, which matches the bound given by Chan [Chan10].
General distance functions.
Kaplan et al. [Kaplan17] showed how to adapt Chan’s data structure to support more general shallow cutting algorithms. The main difference between their data structure and the one from Chan is that planes are only pruned when they appear in conflict lists. Kaplan et al. essentially prove the following lemma.
[Kaplan et al. [Kaplan17]] Given an algorithm that constructs a -shallow cutting of size on functions in time, such that the prism containing a query point can be located in time, we can construct a data structure of size that dynamically maintains a set of at most functions . Reporting the lowest function at a query point takes time, inserting a function in takes amortized time, and deleting a function from takes amortized time.
From now on we consider the general variant where the data structure consists of towers. The following lemma describes the properties of the -NN data structure we need to construct our general fully dynamic -nearest neighbors data structure.
Let be any fixed value and be the size of -shallow cutting. There is a dynamic nearest neighbor data structure that has the following properties.
The data structure consists of towers.
A function occurs times in a conflict list in a single tower.
The insertion time is , where is the preprocessing time.
A deletion causes amortized reinsertions.
To find the -NN of a query point it is sufficient to consider the prisms containing of the shallow cuttings at level , for some large enough constant .
3 Querying multiple k-NN data structures simultaneously
In this section we introduce a method to find the -nearest neighbors of a query point spread over (disjoint) -NN data structures storing a set of sites simultaneously. Suppose the query time of such a -NN data structure is , for a non-decreasing function . Naively, querying each data structure for the closest sites would take time. Our method allows us to find the -NN over all these data structures in time instead, thus reducing the term to .
3.1 Query algorithm
We use the heap selection algorithm of Frederickson [Frederickson93] to answer -NN queries efficiently. This algorithm finds the smallest elements of a binary min-heap of size in time by forming groups of elements, called clans, in the original heap. Representatives of these clans are then added to another heap, and smaller clans are created from larger clans and organised in heaps recursively. For our purposes, we need to consider (only) how clans are formed in the original heap, because we do not construct the entire heap we query before starting the algorithm. Instead, the heap is expanded during the query when necessary. See Figure 3 for an example. Note that any (non-root) element of the heap will only be included in a clan by the Frederickson algorithm after its parent has been included in a clan.
The heap , on which we call the heap selection algorithm, contains all sites exactly once, with the distance as key for each site. Let be the partition of into disjoint sets, where is the set of sites stored in the -th -NN data structure. For each set of sites , , we define a heap containing all sites in . We then “connect” these heaps by building a dummy heap of size that has the roots of all as leaves. We set the keys of the elements of to . Let be the complete data structure (heap) that we obtain this way, see Figure 4. It follows that we can now compute the sites closest to by finding the smallest elements in the resulting heap and reporting only the non-dummy sites.
What remains is how to (incrementally) build the heaps while running the heap selection algorithm. Each such heap consists of a hierarchy of subheaps , such that every element of appears in exactly one . Moreover, since the sets are pairwise disjoint, this holds for any , i.e. appears in exactly one . The level 1 heaps, , consist of the sites in closest to , which we find by querying the static data structure of . The subheap at level is built only after the last element of is included in a clan, i.e. is considered by the heap selection algorithm. We then add a pointer from to the root of , such that the root of becomes a child of , as in Figure 3.
To construct a subheap at level , we query the static data structure of using . The new subheap is built using all sites returned by the query that have not been encountered earlier. It follows that all elements of are larger than any of the elements in . Thus, the heap property is preserved.
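The lazy expansion described above can be illustrated with the following simplified sketch. It is not the algorithm analysed in this section: it replaces the heap selection algorithm of Frederickson by a standard binary heap (Python's `heapq`), which costs an extra logarithmic factor, and it assumes that the query size is doubled from one level to the next, starting from a hypothetical initial value `k1`. The class `BruteForceKNN` is a stand-in for an arbitrary static k-NN structure that returns its answer sorted by distance.

```python
import heapq
import math


class BruteForceKNN:
    """Stand-in static structure: returns the k nearest sites, sorted by distance."""

    def __init__(self, sites):
        self.sites = list(sites)

    def query(self, q, k):
        return sorted(self.sites, key=lambda s: math.dist(s, q))[:k]


def knn_over_structures(structures, q, k, k1=1):
    """Report the k sites nearest to q among several disjoint structures."""
    batches = [[] for _ in structures]  # unseen sites of the current level, per structure
    seen = [0] * len(structures)        # number of sites retrieved so far, per structure
    size = [0] * len(structures)        # query size used for the current level
    frontier = []                       # entries (distance, structure index, position)

    def push_next(i, pos):
        if pos == len(batches[i]):
            # Current level exhausted: query again with a doubled size (assumption)
            # and keep only the sites that were not returned before.
            size[i] = k1 if size[i] == 0 else 2 * size[i]
            result = structures[i].query(q, size[i])
            batches[i] = result[seen[i]:]
            seen[i] = len(result)
            pos = 0
        if pos < len(batches[i]):
            heapq.heappush(frontier, (math.dist(batches[i][pos], q), i, pos))

    for i in range(len(structures)):
        push_next(i, 0)

    answer = []
    while frontier and len(answer) < k:
        _, i, pos = heapq.heappop(frontier)
        answer.append(batches[i][pos])
        push_next(i, pos + 1)
    return answer


groups = [
    [(0, 0), (5, 5), (9, 1)],
    [(1, 0), (2, 2)],
    [(7, 7), (0, 3), (4, 0), (6, 1)],
]
structures = [BruteForceKNN(g) for g in groups]
nearest = knn_over_structures(structures, (0, 0), 5)
```

As in the construction above, a structure is queried for its next level only once every site of its current level has been reported, so a structure that contributes few sites to the answer is also queried with small values only.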
3.2 Analysis of the query time
As stated before, finding the -smallest non-dummy elements of takes time [Frederickson93]. In this section, we analyse the time used to construct .
First, the level 0 and level 1 heaps are built. Building takes only time. To build the level 1 heaps, we query each of the substructures using . In total these queries take time. Retrieving the next elements to build for requires a single query and thus takes time. To bound the time used to build all heaps at level greater than 1, we first prove the following lemma.
The size of a subheap , , at level is exactly .
To create , we query the static data structure of to find the sites closest to . Of these sites, only the ones that have not been included in any of the lower level subheaps are included in . The sites previously encountered are exactly the sites returned in the previous query. It follows that . ∎
Building takes time. To pay for this, we charge to each element of . Because we choose , Lemma 3.2 implies that , and that . Note that the heap , , is only built when all elements of have been included in a clan. Thus, we only charge elements of the heaps of which all elements have been included in a clan (shown blue in Figure 4). In total, elements (not in ) are included in a clan, so the total size of these subheaps is . From this, and the fact that all subheaps are disjoint, it follows that we charge to only sites. We then have:
Let be disjoint sets of point sites of sizes , each stored in a data structure that supports -NN queries in time. There is a -NN data structure on that supports queries in time. The data structure uses space, where is the space required by the -NN structure on .
Throughout this section, we used the standard assumption that for any two points their distance can be computed in constant time. When evaluating takes time, our technique achieves a query time of by setting and charging to each site of to pay for building .
4 An insertion-only data structure
We describe a method that transforms a static -NN data structure with query time into an insertion-only -NN data structure with query time . Insertions take time, where is the preprocessing time of the static data structure, and is its space usage. We assume , , and are non-decreasing.
To support insertions, we use the logarithmic method [Overmars83]. We partition the sites into groups with for . To insert a site , a new group containing only is created. When there are two groups of size , these are removed and a new group of size is created. For each group we store the sites in the static -NN data structure. This results in an amortized insertion time of . This bound can also be made worst-case [Overmars83]. The main remaining issue is then how to support queries in time, thus avoiding an term in the query time. Applying Lemma 3.2 directly solves this problem, and we thus obtain the following result.
Let be a set of point sites, and let be a static -NN data structure of size , that can be built in time, and answer queries in time. There is an insertion-only -NN data structure on of size that supports queries in time. Inserting a new site in takes time.
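The logarithmic method itself can be sketched as follows. The class names `StaticKNN` and `InsertionOnlyKNN` are ours, the static structure is a brute-force stand-in, and the query shown here is the naive one that asks every group for k sites; it therefore illustrates only the insertion scheme, not the improved query bound of the theorem.

```python
import heapq
import math


class StaticKNN:
    """Brute-force stand-in for a static k-NN data structure."""

    def __init__(self, sites):
        self.sites = list(sites)

    def query(self, q, k):
        return heapq.nsmallest(k, self.sites, key=lambda s: math.dist(s, q))


class InsertionOnlyKNN:
    """Logarithmic method: groups[i] is None or a static structure on 2**i sites."""

    def __init__(self):
        self.groups = []

    def insert(self, site):
        carry = [site]
        i = 0
        # Merge full groups, exactly like a carry in a binary counter.
        while i < len(self.groups) and self.groups[i] is not None:
            carry.extend(self.groups[i].sites)
            self.groups[i] = None
            i += 1
        if i == len(self.groups):
            self.groups.append(None)
        self.groups[i] = StaticKNN(carry)  # rebuild a static structure on 2**i sites

    def group_sizes(self):
        return [len(g.sites) for g in self.groups if g is not None]

    def query(self, q, k):
        # Naive query: k candidates from every group, then select the best k.
        candidates = []
        for g in self.groups:
            if g is not None:
                candidates.extend(g.query(q, k))
        return heapq.nsmallest(k, candidates, key=lambda s: math.dist(s, q))


ds = InsertionOnlyKNN()
for x in range(5):
    ds.insert((x, 0))
sizes_after_five = ds.group_sizes()
ds.insert((5, 0))
sizes_after_six = ds.group_sizes()
nearest = ds.query((0, 0), 3)
```

The group sizes follow the binary representation of the number of inserted sites, which is why there are only logarithmically many groups at any time.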
4.1 Points in the plane
In the Euclidean metric, -nearest neighbors queries in the plane can be answered in time, using space and preprocessing time [Afshani09, Chan16]. Hence:
There is an insertion-only data structure of size that stores a set of sites in , allows for k-NN queries in time, and insertions in time.
If we increase the size of each group in the logarithmic method to , with and , we get only groups instead of . This reduces the query time to , matching the fully dynamic data structure. However, this also increases the insertion time to . For general constant-complexity distance functions, we achieve the same query time using Liu’s data structure [Liu20], using space and expected insertion time.
4.2 Points in a simple polygon
In the geodesic -nearest neighbors problem, is a set of sites inside a simple polygon with vertices. For any two points and the distance is defined as the length of the shortest path between and fully contained within . The input polygon can be preprocessed in time so that the geodesic distance between any two points can be computed in time [Guibas89].
To apply Theorem 4, we need a static data structure for geodesic -NN queries. We can build such a data structure by combining the approach of Chan [Chan00] and Agarwal et al. [Staals18]. The data structure consists of a hierarchy of lower envelopes of random samples . For each sample, we store a (topological) vertical decomposition of the downward projection of the lower envelope (appropriately preprocessed for efficient point location queries using the method of Oh and Ahn [Oh20]) and the conflict lists of the corresponding (pseudo-)prisms. From a Clarkson and Shor style sampling argument, it follows that the total expected size of all conflict lists of one sample is . Thus, the space of the resulting data structure is in expectation. We can then find a prism in one of the vertical decompositions that contains the query point and whose conflict list has size in time [Chan00]. This allows us to answer -NN queries in the same time. The crux in this approach is in how to compute the conflict lists. We can naively compute these in time by explicitly constructing the geodesic distance function for each site, and intersecting it with each of the pseudo-prisms. It is unclear how to improve on this bound.
A static data structure.
To circumvent this issue, we recursively partition the polygon into two subpolygons and of roughly the same size by a diagonal [Staals18]. This results in a decomposition of the polygon of levels. We denote by and the sites in and , respectively.
Let be a set of sites in a simple polygon with vertices. In time we can build a data structure of size , excluding the size of the polygon, that can answer -NN queries with respect to in time.
In this proof, we describe a data structure that can find the -nearest neighbors among sites in for a query point in in time. We store the points of both and at each of the levels of the decomposition in this data structure. To answer a query, we consider one set of sites ( or ) for each level of the decomposition (see Figure 5). Using our technique from Section 3, we can thus find the -NN spread over these data structures in time.
By partitioning the polygon, the Voronoi diagram of sites in (resp. ) restricted to is a Hamiltonian abstract Voronoi diagram, as intersects all Voronoi regions exactly once for all [Klein94]. Agarwal et al. use this fact to construct an efficient data structure (Theorem 22 of [Staals18]) for our problem. This is essentially the data structure that was described in the previous paragraph. However, because we consider a Hamiltonian abstract Voronoi diagram here, we can efficiently compute the conflict lists by only considering the functions intersecting the corners of each prism (see [Staals18] for details). Thus, we can build the data structure in time. The data structure requires space, excluding the size of the polygon, and it allows us to find the -nearest neighbors among for a point in expected time [Staals18].
Improving the query time.
The query time of the static data structure is determined by two factors: a point location query in the topological vertical decomposition of a geodesic Voronoi diagram, this takes time, and distance queries that take time. We can improve the point location time to by incorporating the idea of Oh and Ahn [Oh20] to approximate a geodesic Voronoi diagram by a polygonal subdivision. Given the exact location of the degree-1 and degree-3 vertices, they approximate each common boundary of two Voronoi regions by connecting the two endpoints using at most three line segments. This allows them to find the (not-approximated) Voronoi region that contains a query point in time, using preprocessing time. In our case, we want to find the (pseudo-)trapezoid of the (topological) vertical decomposition of a geodesic Voronoi diagram. Thus, we need to slightly adapt their approach to not only find the Voronoi region of the query point , but also the exact trapezoid containing .
Instead of approximating the bisector connecting two Voronoi vertices, we approximate each part of the bisector between two vertices of the vertical decomposition separately, see Figure 7. To approximate a bisector of sites and between vertices , Oh and Ahn first find two points such that the geodesic convex hull of is contained in the Voronoi regions of and , and the boundary of this convex hull consists of at most four maximal concave polygonal curves. They then approximate the bisector by three line segments contained in the convex hull.
We now consider the vertical decomposition of our Hamiltonian abstract Voronoi diagram. Note that all bisectors in are -monotone [Staals18]. W.l.o.g., let be the vertex furthest from . Oh and Ahn choose as the junction of and . When choosing the ’s like this, it could be that either of the ’s is not contained in the trapezoid we are approximating. In this case, we instead choose as the intersection between and the left line segment defining the trapezoid, see Figure 7. As is contained in the Voronoi region of , this intersection point indeed exists. Note that this also ensures the convex hull is still bounded by at most four maximal concave curves. Thus, we can use the algorithm of Oh and Ahn [Oh20] to approximate the bisector within the convex hull.
To find the trapezoid containing a query point , we use the query algorithm of Oh and Ahn [Oh20]. The query is performed as follows. First, we find the approximated trapezoid containing by shooting a ray upwards from , and determining what segment in the approximated Voronoi diagram (or the polygon boundary) is hit. Let be the site whose Voronoi region we find. Suppose this is not the real trapezoid containing , then lies in a region bounded by part of a bisector of and some other site , and the approximation of that bisector, see Figure 7. To find this approximated bisector (and thus the trapezoid in the Voronoi region of containing ), we shoot a ray from in direction opposite to the edge op incident to . This does not intersect the real bisector, and thus intersects the approximated bisector of and . Finally, we compare the distance from to the two sites to find the trapezoid of the vertical decomposition containing . ∎
An insertion-only data structure.
Let be a simple polygon with vertices. There is an insertion-only data structure of size that stores a set of point sites in , allows for geodesic -NN queries in expected time, and inserting a site in time.
5 A fully dynamic data structure
In this section, we consider -NN queries while supporting both insertions and deletions, building on the results of Chan [Chan12-kNN]. We first fill in the part missing from Chan [Chan12-kNN]’s query algorithm. We then discuss a simple deletion-only -NN structure. This allows us to adapt Chan’s -NN data structure to more general distance functions like the geodesic distance.
5.1 A dynamic k-NN data structure for planes
Chan [Chan12-kNN] describes how to adjust his original 1-NN data structure to efficiently perform -NN queries. We denote this (adjusted) data structure by . There are two main changes in this data structure: the conflict lists are stored in -NN data structures, and the number of towers is reduced by using . Only the live planes of the conflict list of each prism are stored in a data structure that uses linear space, can perform -NN (or -lowest planes) queries in time, and deletions in time. A different data structure is used to store small and large conflict lists. After building a tower , each data structure of a prism is built on . As the data structures use linear space, the space usage of the entire data structure is .
Insertions are performed as in the original data structure, but the deletion of a plane requires more effort than in the original. In addition to increasing for each prism containing , is explicitly removed from the data structures. Note that for which is the only tower whose data structures contain . When a prism in tower is purged, we also delete its planes from the other data structures in to retain this property. This gives an amortized expected update time of [Chan12-kNN]. The improvement of Kaplan et al. [Kaplan17] reduces this to . In Section 5.3 we discuss this update time in more detail w.r.t. our adaptation of the data structure. It follows from Lemma 2.2 and the above modifications that: [Chan [Chan12-kNN]] Let be a query point. In time, we can find prisms , such that: (i) all prisms contain , (ii) the conflict list of each prism has size , (iii) the conflict lists are pairwise disjoint and stored in a data structure, and (iv) the sites in closest to appear in the union of the conflict lists of those prisms.
So, to answer -NN queries we can use a -NN query on each data structure of the prisms , where is the number of sites from the -nearest neighbors of that appear in the conflict list of . This takes time. However, it is unclear how to compute those values. Fortunately, we can use Lemma 3.2 to find the -nearest neighbors over all of the substructures in time. Plugging in the appropriate query time (see Chan [Chan12-kNN] and Section 5.3), this achieves a total query time of as claimed.
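To illustrate why the per-prism counts need not be known in advance, the following is a minimal sketch, not the algorithm of Lemma 3.2. It assumes each substructure can list its sites in order of increasing distance to the query (here simulated by sorting, a stand-in for the actual -NN structures), and it swaps in a plain heap-based merge. The names `k_nearest_over_substructures`, `conflict_lists`, and `dist` are hypothetical.

```python
import heapq
from itertools import islice


def k_nearest_over_substructures(conflict_lists, dist, k):
    """Return the k sites closest to the query over pairwise disjoint lists.

    conflict_lists: iterable of disjoint collections of sites (assumption).
    dist: function mapping a site to its distance from the query point.
    """
    # Each substructure is modelled as a stream sorted by distance; a real
    # structure would answer this through its own k-NN query instead.
    streams = [sorted((dist(site), site) for site in lst)
               for lst in conflict_lists]
    # Merge the sorted streams lazily and stop after k sites; the number of
    # sites taken from each list is discovered on the fly.
    return [site for _, site in islice(heapq.merge(*streams), k)]
```

For example, with one-dimensional sites `[[1, 9, 4], [7, 2], [3, 8]]`, a query at 0, and `k = 4`, the merge takes two sites from the first list and one from each of the others.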
5.2 A simple deletion-only data structure
Let be a set of planes, and let be a parameter. We develop a data structure that supports reporting the lowest planes above a query point in time, and deletions in time. Our entire data structure consists of just -shallow-cuttings of the planes, for values , for . Hence, this uses space in total. We can compute the shallow cuttings along with their conflict lists in time [Chan16]. Note that when , it can be that for some . In this case, we simply do not build any of the cuttings that have . For our application, we are mostly interested in the deletion time of the data structure, and less in the query time. By picking to be somewhat small, we can make deletions efficient at the cost of making the query time fairly terrible.
If we delete a plane, we remove it from all conflict lists in all cuttings. Since cutting has size , each plane occurs at most times. Hence, the total time to go through all of these prisms is time. When more than half of the planes from any conflict list are removed, we rebuild the entire data structure. Because the smallest conflict list contains at least planes, at least deletions take place before a global rebuild. We charge the cost of rebuilding to these planes, so we charge to each deletion. Deletions thus take amortized time.
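The deletion rule above can be sketched as a toy model. This is only an illustration of the bookkeeping under stated assumptions: `build_lists` is a hypothetical callback standing in for the construction of all shallow cuttings and their conflict lists, and the class scans every list on deletion, whereas an actual implementation would follow pointers from a plane to the prisms containing it.

```python
class DeletionOnlyLists:
    """Toy model: conflict lists with a global rebuild once a list is half empty."""

    def __init__(self, items, build_lists):
        # build_lists (hypothetical) maps the current items to a list of
        # conflict lists, one per prism over all cuttings.
        self.build_lists = build_lists
        self.items = set(items)
        self.rebuilds = 0
        self._build()

    def _build(self):
        self.lists = [set(lst) for lst in self.build_lists(self.items)]
        # Remember the size at build time to detect "more than half removed".
        self.initial_sizes = [len(lst) for lst in self.lists]

    def delete(self, item):
        self.items.discard(item)
        needs_rebuild = False
        for lst, size in zip(self.lists, self.initial_sizes):
            if item in lst:
                lst.remove(item)
                # Rebuild once more than half of some list has been removed.
                if 2 * (size - len(lst)) > size:
                    needs_rebuild = True
        if needs_rebuild:
            self._build()
            self.rebuilds += 1
```

Because a rebuild is triggered only after more than half of some conflict list has been deleted, the rebuild cost can be charged to those deletions, which is the amortization argument used in the text.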
We report the -lowest planes at a query point as follows. We consider the cutting for which , so at level , for some large enough constant . When , there is no such cutting, so we query the lowest level cutting instead. We find the prism containing by a point location query. As the largest cutting has size , this takes time. We then simply report the lowest planes at by going through the entire conflict list. This results in a query time of .
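A minimal sketch of this query follows, under several assumptions that are not from the text: planes are triples `(a, b, c)` representing z = ax + by + c, the cuttings are given as a list of `(level_parameter, cutting)` pairs sorted by level parameter, and `locate` is a hypothetical point-location callback that returns the conflict list of the prism containing the query point.

```python
import bisect
import heapq


def height(plane, x, y):
    """Height of the plane z = a*x + b*y + c at the point (x, y)."""
    a, b, c = plane
    return a * x + b * y + c


def k_lowest(cuttings, locate, q, k):
    """Report the k lowest planes at q by scanning one conflict list.

    cuttings: list of (level_parameter, cutting), sorted by level_parameter.
    locate: hypothetical callback (cutting, q) -> conflict list of the prism
            containing q.
    """
    params = [p for p, _ in cuttings]
    # Smallest level whose parameter is at least k; if k is below every
    # level this picks the lowest cutting, and if k is above every level
    # it falls back to the highest one.
    i = min(bisect.bisect_left(params, k), len(cuttings) - 1)
    conflict_list = locate(cuttings[i][1], q)
    x, y = q
    # Go through the entire conflict list and keep the k lowest planes.
    return heapq.nsmallest(k, conflict_list,
                           key=lambda pl: height(pl, x, y))
```

The scan over the conflict list dominates the cost once the prism is found, which is why a small lowest level keeps deletions cheap but makes queries with small values of the parameter comparatively slow.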
Reducing space usage.
When is large w.r.t. , that is , we can use a similar approach to Chan [Chan10, Chan12-kNN] to achieve linear space usage. Instead of storing the conflict lists explicitly, we only store the prisms of the shallow cuttings. Additionally, we store the planes in an auxiliary halfspace range reporting data structure [Agarwal95] with preprocessing time, query time, deletion time, and linear space. This results in a linear space data structure. To delete a plane, we simply delete it in the auxiliary halfspace range reporting data structure in time. After deletions we rebuild the entire data structure. Thus the amortized deletion time remains .
When performing a query, we first locate the prism as before. We then query the halfspace range reporting data structure with the intersection point of the vertical line through and the roof of . We find the lowest planes by going through the returned planes. This results in a query time of . Because , we have , thus the query time is .
For any fixed , we can construct a deletion-only data structure of size , or when , in time that stores a set of planes, allows for -lowest planes queries in time and deletions in time.
General data structure.
The bootstrapping data structure can be applied to any type of functions for which we have an algorithm to compute (vertical) -shallow cuttings. The data structure uses space. Note that the “lowest” cutting we use is an -shallow cutting. It follows that constructing all shallow cuttings takes time. To delete a function, we remove all occurrences of the function from the conflict lists in time, and we charge to the deletion to pay for the global rebuild. To answer a query, we simply find the prism containing in one cutting, so the query time is . This results in the following general lemma.
For any fixed , we can construct a deletion-only data structure of size in time that stores a set of functions, allows for -lowest functions queries in time and deletions in time.
5.3 A general dynamic k-NN data structure
To generalize the dynamic -NN data structure from Section 5.1 to other types of distance functions, we replace the data structures by the data structure of Section 5.2. In our approach, we use the same data structure for both small and large conflict lists. Queries and updates are performed as before (see Sections 2.2 and 5.1). This results in a dynamic -lowest functions data structure that can be used for any type of functions for which we can construct -shallow cuttings. As the data structure only uses -shallow cuttings, our approach is also somewhat simpler than Chan’s, albeit at a slight increase in space usage. This increase can be avoided by using the space-saving idea discussed in Section 5.2.
In this section we analyze the space usage and running times of the data structure for planes, as this is somewhat easier to follow. In Section 5.4 we analyze these for the general data structure.
Our bootstrapping data structure has query time and deletion time . Because we query the shallow cutting at level , the size of each conflict list we query is . By again using our scheme to find the -nearest neighbors over the substructures simultaneously, the query time becomes:
If we set (and just like Chan) we get
Thus using our data structure does not affect the query time.
In the following we analyse the update time of the data structure in more detail. The update time given by Chan is:
Note that this update time is based on the old approach of Chan, where a plane can occur times in a tower. To give a more detailed analysis, we first study the insertion time and then the deletion time of .
Lemma 2.2 states that insertion time is given by , where is the preprocessing time of . Our preprocessing time increases w.r.t. the original data structure, since after building the hierarchy of shallow cuttings for a tower, we additionally need to build the structures on each of the conflict lists. As before, building the shallow cuttings takes time [Chan19]. Next, we analyse the time to build all data structures . Note that the cutting at level in the hierarchy consists of prisms, and the size of each conflict list in the cutting is . Let be the constant bounding the size of the conflict lists. Using that , we find the following running time:
The preprocessing time thus adheres to the recurrence relation