On the complexity of range searching among curves

07/15/2017 ∙ by Peyman Afshani, et al.

Modern tracking technology has made the collection of large numbers of densely sampled trajectories of moving objects widely available. We consider a fundamental problem encountered when analysing such data: Given n polygonal curves S in R^d, preprocess S into a data structure that answers queries with a query curve q and radius ρ for the curves of S that have distance at most ρ to q. We initiate a comprehensive analysis of the space/query-time trade-off for this data structuring problem. Our lower bounds imply that any data structure in the pointer machine model that achieves Q(n) + O(k) query time, where k is the output size, has to use roughly Ω((n/Q(n))^2) space in the worst case, even if queries are mere points (for the discrete Fréchet distance) or line segments (for the continuous Fréchet distance). More importantly, we show that more complex queries and input curves lead to additional logarithmic factors in the lower bound. Roughly speaking, the number of logarithmic factors added is linear in the number of edges added to the query and input curve complexity. This means that the space/query time trade-off worsens by an exponential factor of input and query complexity. This behaviour addresses an open question in the range searching literature: whether it is possible to avoid the additional logarithmic factors in the space and query time of a multilevel partition tree. We answer this question negatively. On the positive side, we show we can build data structures for the Fréchet distance by using semialgebraic range searching. Our solution for the discrete Fréchet distance is in line with the lower bound, as the number of levels in the data structure is O(t), where t denotes the maximal number of vertices of a curve. For the continuous Fréchet distance, the number of levels increases to O(t^2).



1 Introduction

Recent technological advances have made it possible to collect trajectories of moving objects, indoors and outdoors, on a large scale using various technologies, such as GPS [16], WLAN, Bluetooth, RFID [18] or video analysis [12]. In this paper we study time-space trade-offs for data structures that store trajectory data and support similarity retrieval. In particular we focus on the case where the similarity or distance between two curves is measured by the Fréchet distance. This distance measure is widely studied in computational geometry and gives high-quality results for trajectory data. We focus on the case where the query should return all input curves in a specified range, given by a query curve q and a radius ρ. The range is defined as the set of curves that have Fréchet distance at most ρ to q, i.e., the metric ball of radius ρ centered at q. Our study is timely as it coincides with the 6th GIS-focussed algorithm competition hosted by ACM SIGSPATIAL (the 6th ACM SIGSPATIAL GISCUP 2017, see also http://sigspatial2017.sigspatial.org/giscup2017/), drawing attention to this very problem from the practical domain.

At the same time, our results address a broader question concerning multilevel partition trees, a very important classical tool from the range searching literature. See the following survey for more background [4], but briefly, in range searching the goal is to store a set of points such that the points in a query region can be found efficiently. One of the most prominent problems is when the queries are simplices in R^d, a problem known as simplex range searching. Interestingly, the known solutions for simplex range searching can be easily repackaged into multilevel data structures that can even solve more difficult problems, such as simplex-simplex searching: store a set of simplices such that the simplices that are entirely contained in a query simplex can be found efficiently. For some illustrative examples on the versatility and power of multilevel data structures see [7].

The concept of multilevel partition tree based data structures is broad and mathematically not well-defined. Roughly speaking, in a multilevel data structure, first a base data structure is built that defines some notion of first generation canonical sets. Then, on the first generation canonical sets, a secondary set of data structures is built, which in turn defines second generation canonical sets. Continuing this “nested” structure for t levels yields a multilevel data structure with t levels. This flexibility allows more complex problems (such as the simplex-simplex searching problem mentioned above) to be solved with very little effort, degrading space or query time only by small factors. It seems intuitively obvious that each additional level should blow up the space or the query time of the data structure, and in fact all known data structures suffer a factor exponential in t (often a factor of log^t n). It has been conjectured that this should be the case, but not even a polynomial blow-up was proven before (see [1, 7]).

Exponential vs polynomial dependency.

To better understand the situation, let us momentarily focus on planar data structures with linear or near-linear space. For the main problem of simplex range reporting, there exist data structures with O(n) space and O(√n + k) query time, where k is the output size. This query time is conjectured to be optimal and there exist lower bounds that almost match it up to small factors ([1], [8]). Thus, the base problem of simplex range reporting is well-understood. However, beyond this, things are less clear. In particular, we would like to know what happens if the query regions or the input are more complex objects. Assume the input is a set of points but the query is a tuple of t hyperplanes that define a polytope. To report the set of points inside the query polytope, we can triangulate the query polytope into simplices and then ask a simplex range searching query for each resulting simplex. This does not alter the space consumption at all, and it blows up the query time only by a factor equal to the number of simplices in the triangulation; for constant dimension, this factor is a fixed polynomial of the query complexity t. This example shows that such “obviously more complex” queries can actually be handled very efficiently. Now consider what happens if both queries and input are complex objects. Consider a problem in which the input is a set of n tuples where each tuple is composed of t points in R^d, and the query is a tuple of t simplices. The goal is to report all the input tuples whose j-th point lies inside the j-th query simplex for every j. In this case, seemingly, the best thing to do is to build a multilevel data structure composed of t levels. Such a data structure will consume space and have query time that each pick up a logarithmic factor per level, using the best known results in the literature on multilevel data structures [7]. The crucial difference here is that both space and query time degrade exponentially in t, as opposed to the polynomial dependency in the previous case. The main open question here is whether this exponential factor is required.

The picture becomes more interesting once one looks at the history of multilevel data structures and once one realizes that there are many ways to build them. In Matoušek’s [17] original paper, one would sacrifice a logarithmic factor for space and a logarithmic factor for the query time per level, but this comes at a larger preprocessing time. If one wishes to reduce the preprocessing time, then the loss increases to an unspecified number of logarithmic factors per level. Chan [7] offers the currently best known way to build multilevel data structures, at only one logarithmic factor loss for space and query time per level (in fact, in some cases, we can do even better). This brings us to the main lower bound question regarding multilevel data structures.

Question 1.

Is it possible to avoid the additional logarithmic factors for every level in the space and query time of a multilevel data structure?

We at least partially settle this open question by showing that the space/query time trade-off must blow up by at least roughly a logarithmic factor for every additional level. To do that, we show that a particular problem that can be solved using multilevel data structures is hard.

We remark that the above question is ambiguous since we did not provide a mathematically precise definition of a “multilevel data structure”. Such a definition would have to capture the versatility of the multilevel approach to data structuring. For instance, multilevel partition trees can have different fan-outs at different levels, they can selectively use duality restricted to individual levels, or they can use different auxiliary data structures mixed in with them. A crucial and arguably most useful property of the multilevel structures is that different levels can handle completely independent subproblems. For lack of a precise definition that is commonly agreed upon, and perhaps in the hope of proving an even stronger statement, we take a different approach: We prove a lower bound for a concrete relevant multilevel data structuring problem (Problem 3 below).

The problem only involves independent points and simplices (the basic components of a simplex range reporting problem) and thus any multilevel data structure must be able to solve the problem. This means that a lower bound for this problem gives a lower bound for the general class of multilevel data structures. From this point of view, our lower bound is in fact stronger: it shows that the multilevel stabbing problem is strictly more difficult than the ordinary simplex range searching problem, even if we are not restricted to use “multilevel data structures.”

Not only that, but we also show that the lower bound generalizes to geometric search structures based on the Fréchet distance: preprocess a set of n polygonal curves, each of complexity at most t, such that given a query polygonal curve, we can find all input curves within some distance ρ of the query (Problems 1 and 2 below). This lower bound is not obvious, and it also provides additional motivation to study multilevel data structures. The fact that we can extend our lower bound to such a practically relevant problem emphasizes the relevance of our lower bounds.

2 Definitions and Problem Statement

A polygonal chain P is a sequence of vertices p_1, …, p_t. The discrete Fréchet distance of two chains P = p_1, …, p_t and Q = q_1, …, q_s is defined using the concept of traversals. A traversal T is a sequence of pairs of indices (i_1, j_1), …, (i_m, j_m) such that (i_1, j_1) = (1, 1), (i_m, j_m) = (t, s), and one of the following holds for each pair (i_{k+1}, j_{k+1}) with 1 ≤ k < m: (i) i_{k+1} = i_k + 1 and j_{k+1} = j_k, or (ii) i_{k+1} = i_k and j_{k+1} = j_k + 1, or (iii) i_{k+1} = i_k + 1 and j_{k+1} = j_k + 1. The discrete Fréchet distance is defined as

d_dF(P, Q) = min_T max_{(i,j) ∈ T} ‖p_i − q_j‖ (1)

Finding the traversal that minimizes the Fréchet distance is called the alignment problem.
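Although the text gives no code, the definition above translates directly into a standard dynamic program. The following sketch (our own illustration, with hypothetical names) computes the discrete Fréchet distance of Equation (1) for chains given as lists of coordinate tuples:

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between polygonal chains P and Q,
    given as non-empty lists of coordinate tuples."""
    t, s = len(P), len(Q)
    # D[i][j] = discrete Fréchet distance between P[:i+1] and Q[:j+1]
    D = [[0.0] * s for _ in range(t)]
    for i in range(t):
        for j in range(s):
            d = dist(P[i], Q[j])
            if i == 0 and j == 0:
                D[i][j] = d
            elif i == 0:
                D[i][j] = max(D[i][j - 1], d)
            elif j == 0:
                D[i][j] = max(D[i - 1][j], d)
            else:
                # the three predecessors correspond to the traversal
                # moves (i), (ii) and (iii) in the definition above
                D[i][j] = max(min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]), d)
    return D[-1][-1]
```

The minimum over the three predecessor entries selects the best traversal move, and the outer maximum with d enforces Equation (1).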

The continuous Fréchet distance is defined for continuous curves. For a polygonal chain P = p_1, …, p_t, we obtain a polygonal curve by linearly interpolating between p_i and p_{i+1}, i.e., adding the edge p_i p_{i+1} in between each pair of consecutive vertices. The curve has a uniform parametrization that allows us to view it as a parametrized curve P : [0, 1] → R^d. The Fréchet distance between two such parametrized curves P and Q is defined as

d_F(P, Q) = inf_σ max_{x ∈ [0,1]} ‖P(x) − Q(σ(x))‖ (2)

where σ ranges over all continuous and monotone bijections σ : [0, 1] → [0, 1] with σ(0) = 0 and σ(1) = 1.

In this paper we consider the following two problems based on the Fréchet distance.

Problem 1 (Discrete Fréchet Queries).

Let S be an input set of n polygonal chains in R^d, where each polygonal chain has size at most t. Given a parameter ρ > 0, we would like to store S in a data structure such that given a query polygonal chain Q of length at most t, we can find all the chains in S that are within discrete Fréchet distance ρ of Q, see Equation (1).

Problem 2 (Continuous Fréchet Queries).

Let S be an input set of n polygonal curves in R^d, where each polygonal curve consists of at most t vertices. Given a parameter ρ > 0, we would like to store S in a data structure such that given a query polygonal curve Q consisting of at most t vertices, we can find all the curves in S that are within continuous Fréchet distance ρ of Q, see Equation (2).

For both problems, the output size is the number of input curves that match the query requirements.

Since we will be working with tuples of points and geometric ranges, we introduce the following notations to simplify the description of our results.

A t-point in R^{td} is a tuple of t points (p_1, …, p_t), where each p_i is a point in R^d. The concepts of t-hyperplanes, t-halfspaces, etc. are defined similarly. A slab s is the region between two parallel hyperplanes. The thickness of s is the distance between the hyperplanes. A t-slab in R^{td} is a tuple of t slabs (s_1, …, s_t), where each s_i is a slab. The thickness of a t-slab is defined as the product of the thicknesses of its slabs. A t-point is inside a t-slab if p_i is inside s_i for every i. We will adopt the convention that the i-th point in a t-point p is denoted by p_i. The same applies to the other definitions.

We will also show a lower bound for the following concrete problem.

Problem 3 (Multilevel Stabbing Problem).

Let S be an input set containing n t-slabs. We would like to store S in a data structure such that given a query t-point p, we can find all the t-slabs in S that contain p.
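As a sanity check of the problem statement, here is a naive reference solution (our own sketch, not the data structure of the paper). Each two-dimensional slab is represented by a normal vector and two offsets; the names are illustrative:

```python
def in_slab(point, slab):
    """slab = (normal, lo, hi): the region between two parallel lines
    {x in R^2 : lo <= <normal, x> <= hi}.  For a unit normal, the
    thickness of the slab is hi - lo."""
    (nx, ny), lo, hi = slab
    return lo <= nx * point[0] + ny * point[1] <= hi

def stab_query(t_slabs, query):
    """Report the indices of the t-slabs (lists of t slabs) whose j-th
    slab contains the j-th point of the query t-point, for every j."""
    return [i for i, ts in enumerate(t_slabs)
            if all(in_slab(p, s) for p, s in zip(query, ts))]
```

A real data structure must of course answer such queries in sublinear time; this brute-force loop only pins down the query semantics.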

The pointer machine model.

The model of computation that we use for our lower bound is the pointer machine model. This model is very suitable for proving lower bounds for range reporting problems. Consider an abstract data structure problem where the input is a set S of n elements and where a query q (implicitly) specifies a subset S_q ⊆ S that needs to be output by the data structure. In the pointer machine model, the storage of the data structure is represented using a directed graph G with constant out-degree, where each vertex in G corresponds to one memory cell. Each memory cell can store one element of S. The constant out-degree requirement means that each memory cell can point to at most a constant number of other memory cells. The elements of S are assumed to be atomic, meaning that to answer a query q, for each element e ∈ S_q, the data structure must visit at least one vertex (i.e., cell) that stores e. To visit that subset of cells, the data structure starts from a special vertex of G (called the root) and follows pointers: the data structure can visit a memory cell u only if it has already visited a cell v such that v points to u. There is no other restriction on the data structure, i.e., it can have unlimited information and computational power. The size of the graph G lower bounds the storage usage of the data structure, and the number of nodes visited while answering a query lower bounds the query time of the data structure. Thus, when proving lower bounds in the pointer machine model, it is sufficient to show that if the data structure operates on a small graph G, then during the query it has to visit a lot of cells (or vice versa).

3 Related Work on Fréchet Queries

Few data structures are known that support Fréchet queries of some type. We review the space and query time obtained by these data structures. In the following, let n denote the number of curves in the data structure and let t denote the maximum number of vertices of each curve. The data structures can be distinguished by the type of queries answered: (i) nearest neighbor queries [15, 11], and (ii) range counting queries [10, 13].

Before we discuss these data structures, we would like to point out that under certain complexity-theoretic assumptions both (i) and (ii) above become much harder for long curves. More specifically, there is a known reduction from the orthogonal vectors problem which implies that, unless the orthogonal vectors hypothesis fails, there exists no data structure for range searching or nearest neighbor searching under the (discrete or continuous) Fréchet distance that can be built in polynomial time and achieves query time in O(n^{1−ε}) for any ε > 0 (see also the discussion in [11]).

A data structure by Indyk supports approximate nearest-neighbor searching under the discrete Fréchet distance [15]. The approximation factor grows with the complexity of the curves, and the space depends on the size of the domain on which the curves are defined. The data structure precomputes all answers to queries with short curves, leading to a very high space consumption. A recent result by Driemel and Silvestri [11] shows that it is possible to construct locality-sensitive hashing schemes for the discrete Fréchet distance. One of the main consequences is an approximate near-neighbor data structure with near-linear space and a query time that depends only mildly on n.

As for the continuous Fréchet distance, de Berg, Gudmundsson and Cook study the problem of preprocessing a single polygonal curve into a data structure to support range counting queries among the subcurves of this curve [10]. The data structure uses a multilevel partition tree to store compressed subcurves. This representation incurs a constant approximation factor in the query radius. The space and query time follow a trade-off governed by a parameter chosen at preprocessing time. However, the data structure does not support more complex query curves than line segments.

The motivation to study the subcurves of a single curve originated from the application of analyzing single trajectories of individual team sports players during the course of an entire game. A different application, namely the map matching of trajectories onto road maps [5], led Gudmundsson and Smid to study slightly more general input: consider the geometric graph that represents a road map and a range query among the set of paths in the graph. Gudmundsson and Smid study the case where the input belongs to a certain class of geometric trees [13]. Based on the result of de Berg, Gudmundsson and Cook, they describe a data structure which supports approximate range emptiness queries and can report a witness path if the range is non-empty. Furthermore, the queries can be more complex than mere line segments. The data structure has near-linear size and preprocessing time and answers queries with polygonal curves of t vertices efficiently.

It should be noted that the latter two data structures [10, 13] make strict assumptions on the length of the edges of the query curves with respect to the query radius, which seems to simplify the problem. While it is widely believed, based on complexity-theoretic assumptions, that there is no O(t^{2−ε})-time algorithm for any ε > 0 that can decide if the discrete or continuous Fréchet distance between two curves of t vertices is at most a given value ρ (see Bringmann [6]), this problem drastically simplifies if ρ is smaller than half of the maximal length of an edge of the two curves. In particular, a simple linear scan can solve the decision problem in O(t) time. Our results do not make any assumptions on the length of the edges of the curves or the distribution of the edges.

4 Our Results

We show the first upper and lower bounds for exact range searching under the discrete and continuous Fréchet distance. Our lower bounds are in fact obtained for the multilevel stabbing problem; they prove that the space S(n) required for answering the multilevel stabbing queries in Q(n) + O(k) time must obey, roughly,

S(n) = Ω( (n/Q(n))^2 · log^{Ω(t)} n ). (3)

Here k is the size of the output and t is the number of levels. Based on what we have discussed, not only does this prove the first separation between simplex range reporting data structures and multilevel data structures, but it also shows that the space must increase exponentially in t, as long as the query time Q(n) is sufficiently small.

For the Fréchet distance queries, a set of n polygonal curves in R^d is given as input, where each input curve consists of at most t vertices. A query with a curve q of at most t vertices and query radius ρ returns the set of input curves that have Fréchet distance at most ρ to q.

  1. Assume there exists a data structure that achieves Q(n) + O(k) query time and uses S(n) space in the pointer model. We show that S(n) must obey the same lower bound as in Eq. 3, where the number of levels is proportional to the complexity of the curves.

In addition, we show how to build multilevel partition trees for the discrete and the continuous Fréchet distance using semi-algebraic range searching:

  1. For the discrete Fréchet distance we describe a data structure based on a multilevel partition tree with O(t) levels; its space and query time are in line with the lower bound, in the sense that each level adds only a small overhead.

  2. For the continuous Fréchet distance we describe a data structure based on a multilevel partition tree with O(t^2) levels. For this second data structure, the query radius ρ has to be known at preprocessing time.

5 Outlines of the Technical Proofs

5.1 Outline of the lower bounds

We first prove lower bounds for the reporting variant of the multilevel stabbing problem in the pointer machine model. As discussed above, this gives a lower bound for multilevel data structures. Next, we build sets of input curves and query curves that show the same lower bounds can be realized under the Fréchet distance. Before we sketch the lower bound construction, we say a few words about the lower bound framework we use.

5.1.1 The framework of the proofs

Our reporting lower bound uses a volume argument by Afshani [1]. This argument can be used to show lower bounds for stabbing reporting queries, i.e., the input is a set of ranges and a query with a point returns all ranges that contain this point. Imagine we want to answer any query in Q(n) + O(k) time, where k is the size of the output. In order to set up the volume argument we need to define a set of queries that has volume one and a set of input ranges, such that (i) each query point is covered by sufficiently many ranges (by at least Q(n) ranges), and (ii) the volume of the intersection of any sufficiently large subset of ranges is sufficiently small, say at most v. Then, the framework shows that the space is asymptotically lower bounded by a quantity inversely proportional to v. The intuition for why the framework works is the following: the intersection of some subset of ranges is the locus of (query) points that must output that particular subset of ranges. Thus, if the intersection of every such subset of ranges is small, then our set of queries contains many different queries that each output a different subset of input ranges. Thus, precomputing (and implicitly storing) partial answers must increase the space according to these volumes.

5.1.2 Multilevel Stabbing Problem

We start with the unit cube in R^{2t}, where t denotes the number of levels. In particular, we view the cube as the Cartesian product of t unit squares: U_1 × ⋯ × U_t. The input is a set of t-slabs in this cube. The query is a t-point with one point in each unit square. The main part of the proof is an intricate construction of the t-slabs.

The main result here is Lemma 1. We will not repeat the exact technical claim and instead we will focus on the general ideas and the intuition behind them. The first step is to build different sets of t-slabs, each of roughly the same size, such that the slabs in each set tile the cube, i.e., any t-point is covered by exactly one t-slab. This will directly satisfy condition (i) of the framework described in Section 5.1.1. The difficult part is to find a good construction that guarantees that every sufficiently large subset of t-slabs has a small intersection.

To build each set, we build a set of two-dimensional slabs in each unit square U_i such that they together tile U_i. Then, the set is taken to be the set of t-slabs that one obtains by creating the Cartesian product of all the slabs created in the unit squares. See Figure 5 for an illustration. In order to obtain small intersection volume we would like to adjust the thickness of the two-dimensional slabs. While adjusting the thickness of the slabs in each universe, we make sure that we create roughly the same number of t-slabs in every set: this boils down to making sure that the product of the thicknesses of the two-dimensional slabs is a fixed value. We have t degrees of freedom to pick the orientations of the slabs and thus we can represent the set of angles that define the orientations of the slabs in each set by a point in R^t; we call these points “parametric points”. Thus, every set has one parametric point, and in our construction there are as many parametric points as there are sets in total.

The parametric points need to be placed very carefully. In particular, our construction places the parametric points such that the volume of the smallest axis-aligned rectangle created by any two parametric points is maximized (see Lemma 3). Intuitively, this means that if the points are “well-spread” so that no small axis-aligned box can contain two points, then the volume of the intersection of the slabs is also going to be large.

Regarding the thicknesses of the slabs, we have t − 1 degrees of freedom, since the product of the thicknesses is set to be a fixed value. However, we need to place more restrictions on the thicknesses. We make sure that the different thicknesses are sufficiently different. In particular, we set each width to be of the form r^{−j} for some fixed parameter r and some integer j. This means that for each slab we allow roughly a logarithmic number of different possible thicknesses. Thus, the degrees of freedom in choosing the thicknesses are translated to freedom in choosing integers in some narrow range (between 0 and roughly the number of allowed thicknesses). Note that this freedom is only present for the first t − 1 two-dimensional slabs; the thickness of the last slab is determined by these values and the value of the fixed product. This further implies that the sum of the integers that we choose should also be within the same narrow range. Nonetheless, unlike the case of angles, our choices in picking these integers are represented combinatorially as a single value and we treat it like a color. In other words, we define a set of all the available colors and then associate each set of t-slabs with a color; the color determines the thicknesses of the slabs in that set.

Thus, after placing the parametric points, we need to color each point with a color. This coloring needs to be done carefully as well. The placement and the coloring of the points are done using one lemma (Lemma 3).

However, more effort is required to make the construction work. We need to impose some favorable combinatorial structure on the set of colors that we create by removing some of the colors. This is done by sampling a small number of colors.

Finally, we bound the volume of the intersection of any sufficiently large subset of the t-slabs that we created. Any two slabs in the same set are disjoint, and thus for a non-zero intersection the slabs must come from different sets. The straightforward argument gives a bound that ultimately yields the same lower bound as simplex range reporting. So we perform a non-obvious analysis. We look at two possible cases. Either (i) two of the parametric points have the same color, in which case we use the properties of our coloring (see Lemma 3) to ensure that such points are “well-spread”; in particular, given many colors, we can make sure that the parametric points of each color are proportionally “better spread”, meaning that the volume of the smallest axis-aligned rectangle that contains two points of the same color is larger, by a factor proportional to the number of colors, than the volume of the smallest rectangle that contains two points of different colors. Ultimately, this buys us a factor proportional to the number of colors in our lower bound. Observe that this value grows exponentially (up to some maximum value). The other case is when (ii) all the parametric points have distinct colors. By using the favorable combinatorial property that we had imposed earlier on the set of colors, we find 3 colors among the many distinct colors and an index i such that these colors have three distinct values at coordinate i. This in turn implies that the slabs in the i-th universe have 3 distinct thicknesses. However, thicknesses differ by at least a fixed multiplicative factor, and thus further analysis buys us a factor depending on that ratio. By combining the two cases, we show that we can improve our lower bound by one of these two factors. We set our parameters such that the two factors are roughly equal and we obtain the lower bound.

5.1.3 Constructions for the Fréchet Distance

In order to apply the above construction to Fréchet range searching, we dualize the Fréchet query ranges to some extent. Our dualization differs significantly between the two variants of the problem. For the discrete Fréchet distance we observe that the set of points that lie within Fréchet distance ρ to a line segment is contained in the intersection of the two circles of radius ρ centered at the two endpoints. We call the intersection of two circles a lens. Thus, we create a set of lenses as input instead of a set of slabs, and we let the vertices of the query curve act as stabbing queries. Refer to Figure 1 for an illustration of this straightforward approach. We observe that inside the unit square, lenses can be made to look almost like slabs, that is, for any slab, we can create a lens that is fully contained in the slab such that the area of the symmetric difference between the slab and the lens is arbitrarily small. As a result, after a little more work, we can show that the construction for the multilevel stabbing problem directly gives a lower bound for the discrete Fréchet queries problem.

In contrast, our construction for the continuous Fréchet distance dualizes the lines supporting the edges of the query curve, creating a separate “universe” for every odd edge (in lieu of a universe for every vertex). Here, our construction is such that the locus of query curves in the dual space that lie within Fréchet distance ρ to a specific input curve forms a set of slabs, one in each universe. To this end we let the input curve follow a zig-zag shape. We use one zig-zag curve per universe. Refer to Figure 7 for an example of a zig-zag used in the construction. Our analysis uses the basic fact that the set of lines intersecting a vertical interval in the primal space corresponds to the set of points enclosed in a slab in the dual space. We combine this fact with a well-known connection between the Fréchet distance and ordered line-stabbing initially observed by Guibas et al. [14]. This observation says that the line supporting the query edge needs to stab the disks of radius ρ centered at the vertices of the input curve in their correct order. For our zig-zag curves this has the effect that the line needs to intersect the vertical interval formed by the two intersection points of the circles of radius ρ centered at the two corners of the zig-zag. We can control the width, orientation and position of the resulting slab in the dual space by varying the length and the position of this vertical interval. Using these proof elements, we can show that the lower bound of the multilevel stabbing problem, which is analyzed in the beginning, carries over to the continuous Fréchet distance as well.
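The line-interval duality used above can be made concrete with a small sketch (using one common duality convention, not necessarily the one in the paper): identify the non-vertical line y = a·x + b with the dual point (a, b); then the lines stabbing the vertical segment {x0} × [y0, y1] are exactly the dual points in the slab y0 ≤ x0·a + b ≤ y1:

```python
def line_stabs_interval(a, b, x0, y0, y1):
    """Primal test: does the line y = a*x + b cross the vertical
    segment {x0} x [y0, y1]?"""
    return y0 <= a * x0 + b <= y1

def dual_point_in_slab(a, b, x0, y0, y1):
    """Equivalent dual test: map the line to the point (a, b); the
    stabbing lines form the slab y0 <= x0*a + b <= y1 in the dual
    plane, whose orientation and width are controlled by x0, y0, y1."""
    return y0 <= x0 * a + b <= y1
```

The two predicates are literally the same inequality, which is the point: varying the segment's position and length moves and widens the dual slab.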

Figure 1: Illustration of the lower bound construction for the discrete Fréchet distance showing universes U_1, U_2 and U_3. For every i, a query curve has its i-th vertex inside U_i. The intersection of the two disks centered at the vertices of the i-th odd edge of an input curve forms a “near-slab” and needs to contain the i-th vertex of a query curve q if the input curve is contained in the query range centered at q.

5.2 Outline of the data structures

To obtain our upper bounds, we perform an extensive analysis of the definition of the Fréchet distance that allows us to restate the alignment problem using a sequence of semialgebraic range queries. One of the challenges here is to design a set of filters that do not create duplicates in the output across the different range queries that need to be performed. We first focus on the discrete Fréchet distance, where the analysis is significantly cleaner and simpler. The dynamic programming algorithm which is commonly used to compute the discrete Fréchet distance uses a Boolean matrix, the so-called free-space matrix, to decide which alignments between the curves are feasible. The entry (i, j) of this matrix indicates whether the Euclidean distance between the i-th vertex of one curve and the j-th vertex of the other curve is at most ρ. The two curves have Fréchet distance at most ρ if and only if there exists a traversal that only uses the 1-entries in the free-space matrix. The set of possible truth assignments of this matrix induces a partition on the input curves with respect to their free-space matrix with the query curve. Furthermore, each set in this partition is either completely contained in the query range or completely disjoint from it. We show how to construct a multilevel data structure that allows us to query independently for each of those sets which are contained in the query range.
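The traversal condition on the free-space matrix amounts to a reachability computation. The following sketch (our own code, not the paper's data structure) decides whether a Boolean matrix admits a traversal from the first to the last entry using only 1-entries:

```python
def feasible_traversal(M):
    """Given a Boolean free-space matrix M (M[i][j] == 1 iff vertex i of
    one curve and vertex j of the other are within distance rho), decide
    whether a monotone traversal from the top-left to the bottom-right
    entry exists that only uses 1-entries."""
    t, s = len(M), len(M[0])
    reach = [[False] * s for _ in range(t)]
    for i in range(t):
        for j in range(s):
            if not M[i][j]:
                continue  # a 0-entry can never be part of a traversal
            if i == 0 and j == 0:
                reach[i][j] = True
            else:
                # reachable via one of the three traversal moves
                reach[i][j] = ((i > 0 and reach[i - 1][j])
                               or (j > 0 and reach[i][j - 1])
                               or (i > 0 and j > 0 and reach[i - 1][j - 1]))
    return reach[-1][-1]
```

In the data structure described above, this check is applied per candidate truth assignment of the matrix; curves sharing a feasible matrix with the query are reported together.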

Our query processing works in three phases. First, we compute all feasible free-space matrices based on the arrangement of balls centered at the vertices of the query. Next, we refine this arrangement to obtain cells of constant complexity that can be described by the zero sets of polynomial functions. In the third phase we query the data structure with each free-space matrix separately, using semialgebraic range searching at each level of the data structure to filter the input curves that have their i-th vertex inside a specific cell of the refined arrangement. To see how this works, consider the set of i-th vertices of the input curves that lie in a fixed cell of the arrangement of balls centered at the vertices of the query curve. The corresponding input curves share the same truth assignment in the i-th column of the free-space matrix with the query curve. Refer to Figure 2 for an example.

Figure 2: Example of a query matrix for the discrete Fréchet distance with a feasible traversal (right). The truth assignment in a fixed column corresponds to a cell in the arrangement of balls centered at the vertices of the query curve (left). The figure also shows three input curves that share this free-space matrix with the query curve and would thus be reported.

We now build a standard multilevel partition tree on the polygonal chains. In the i-th level we store the i-th vertices of the input curves. Our query algorithm processes the free-space matrix column by column, where we use the convention that a column index refers to a vertex of an input curve and a row index refers to a vertex of the query curve. This makes the storage layout of the data structure independent of the number of vertices of the query curve.

For the continuous Fréchet distance the approach is similar, at least at a high level. The main difference is that the Boolean matrix that guides the queries is more complicated, since we operate on the continuous free-space diagram instead of the discrete free-space matrix. We first define high-level predicates that carry enough information to decide the Fréchet distance. Each predicate involves a constant number of edges and vertices from the input and query curves, e.g., testing the feasibility of a monotone path for a combination of a row and two vertical lines in the free-space diagram. Next, we show how to represent these predicates using more basic operations, e.g., above-below relationships between points and lines that can be dualized. Finally, the query algorithm tests groups of these predicates for each feasible truth assignment separately. Here, too, we manage to keep the layout of the data structure independent of the complexity of the query curve.

There are two main challenges in dealing with the continuous case. One is to obtain the more complicated discrete matrix that captures all possible free-space diagrams of the fixed query curve with any arbitrary possible input curve. The second challenge is to make sure we can express all our predicates in the framework of semialgebraic range searching in two dimensions. Our solution is non-obvious since the Fréchet distance is not defined as a closed-form algebraic expression. This second challenge is the main issue that prevents us from directly generalizing our data structure to higher dimensional queries.

5.3 Organization

We prove the lower bounds in Section 6. We first show the lower bound for the multilevel stabbing problem. The construction is given in Section 6.2. We discuss the range reporting lower bound in Section 6.3. In Section 6.4 we show how to implement the construction for the two variants of Fréchet queries. We describe our data structures in Section 7. In Section 7.1 we describe the machinery that we use to build our data structures. In Section 7.2 we develop a data structure for discrete Fréchet queries. In Section 7.3 we extend these ideas and develop a data structure for continuous Fréchet queries. We conclude with some open problems in Section 8.

6 Lower Bounds

As discussed, we prove lower bounds for a concrete problem, the multilevel stabbing problem. To do so, we need to construct a “difficult” input instance of t-slabs with certain desirable properties. This construction is at the heart of our lower bounds and is what we undertake in this section.

6.1 Definitions

Our lower bound for the multilevel stabbing problem is based on an intricate construction that we outline in this subsection. Define the space U = U_1 × ⋯ × U_t, where each U_i is the unit cube in the plane. U now represents the set of all possible queries: a query t-point is represented by a point of U whose i-th component corresponds to a point in U_i, for every i. Observe that these t component points are completely independent. Similarly, an input t-slab is represented by picking t independent slabs, one slab in U_i for each i.

Consider a (measurable) subset X that lies in a k-dimensional flat, for some k. We denote the k-dimensional Lebesgue measure of X by vol_k(X). For a set P of points, we denote the smallest axis-aligned box that contains them all by Box(P). Finally, for two t-slabs S and S′, we say S′ is a translation of S if, for every index i, the slabs S_i and S′_i are parallel and have the same thickness.

6.2 The 2D Construction

Lemma 1.

Consider parameters and under constraints to be specified shortly. We can build a set of t-slabs such that , for every . Furthermore, (i) for any t-slabs and any t-slabs such that is a translation of , we have .

The constraints are that , , and that is defined as when and when .

As we shall see later, combined with the existing framework, the above lemma yields our desired lower bounds with only a little more work. Thus, the main challenge is proving the lemma. The main idea is the following: to define each t-slab, we have the freedom to pick t different angles, one for each universe U_i. We also have the freedom to alter the thickness of the slabs we construct in each U_i. Thus, we have t “degrees of freedom” to pick the angles and t “degrees of freedom” to pick the slab thicknesses. The former are represented as points (which we call “parametric points”) and the latter are represented combinatorially as “colors”. To make the construction work, we do not allow all possible combinations of colors; instead, we prune the colors using a combinatorial technique. We ultimately isolate a sub-problem that is closely connected to orthogonal range searching. This is very satisfying, since it was suspected that there could be connections between orthogonal range searching and multilevel non-orthogonal range searching (for example, Chan [7] compares non-orthogonal multilevel data structures to multi-dimensional range trees, which can be viewed as multiple levels of one-dimensional data structures). As a result, we manage to incorporate some techniques from orthogonal range searching lower bounds in our construction (see Theorem 2). However, combining the colors (the “orthogonal component”) and the set of parametric points (the non-orthogonal component) requires a careful analysis.

6.2.1 Parameters Defining the -slabs.

To construct each slab in , we use the following parameters: assume we would like to construct a slab . We use real-valued parameters and integral parameters . We call these parameters the defining parameters of . The angle parameter is the angle the slab makes with the x-axis, and the thickness of is defined as , for . However, since we would like to end up with a t-slab such that , we define and . Note that is not necessarily an integer. We have:

Observation 1.
Definition 1.

Consider a t-slab . Let and the color be its defining parameters. We call the point the parametric point of the t-slab, denote it by , and call the tuple its color.

We have to make very careful choices when picking the defining parameters of . We discuss how to pick the colors in Section 6.2.3.

Figure 3: Each t-slab is defined as the Cartesian product of t 2-dimensional slabs. The thickness of each 2-dimensional slab is .

We now establish some basic facts about this construction.

Observation 2.

Consider t-slabs . We have .

We will also use the following elementary geometry observation regarding the area of the intersection of two slabs.

Observation 3.

Consider two 2-dimensional slabs of thickness w_1 and w_2, respectively, and let α be the angle between them. Then the area of their intersection is w_1 w_2 / sin(α). (See Figure 4.)

Figure 4: The distances between the parallel lines forming the two slabs are w_1 and w_2, respectively. If the angle between the slabs is α, then the area of the shaded region is w_1 w_2 / sin(α).
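This elementary fact can be verified computationally: the intersection of two slabs is a parallelogram, whose area the shoelace formula recovers exactly. The concrete slab equations below are our own choice of coordinates for illustration.

```python
from math import sin, cos, isclose, pi

def shoelace(pts):
    """Unsigned polygon area via the shoelace formula."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i+1) % n][1] - pts[(i+1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def slab_intersection_area(w1, w2, alpha):
    """Area of the parallelogram formed by intersecting
    slab A: 0 <= y <= w1  with
    slab B: 0 <= -x*sin(alpha) + y*cos(alpha) <= w2  (angle alpha between them)."""
    sa, ca = sin(alpha), cos(alpha)
    # The four corners, obtained by intersecting the boundary lines pairwise.
    verts = [(0.0, 0.0),
             (-w2 / sa, 0.0),
             ((w1 * ca - w2) / sa, w1),
             (w1 * ca / sa, w1)]
    return shoelace(verts)

# Matches the closed form w1*w2/sin(alpha) of Observation 3.
for w1, w2, alpha in [(0.3, 0.5, pi / 4), (1.0, 0.2, pi / 3), (0.7, 0.7, pi / 2)]:
    assert isclose(slab_intersection_area(w1, w2, alpha), w1 * w2 / sin(alpha))
```

For perpendicular slabs (α = π/2) this degenerates to the area w_1·w_2 of a rectangle, as expected.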
Lemma 2.

Let and be two t-slabs and let and be their parametric points, respectively. Then, regardless of the colors of these two t-slabs, is asymptotically bounded by

Proof.

By Observations 3 and 2, is asymptotically upper bounded by

By Observation 1, the numerator equals since the thickness of both and is . The lemma then follows from observing that is exactly . ∎

6.2.2 Coloring and the Parametric Points

In this subsection, we discuss how to pick the parametric points of the slabs. Essentially, we place a set of points using the upcoming constructions. We extend the following construction, which is used in lower bounds for orthogonal problems.

Theorem 1.

[2, 3, 9] For any parameter , we can place a set of points inside the unit cube in such that for any two points , we have , where the constant in the asymptotic notation depends on .

To choose the parametric points, we use the method in [2]. But since in our case the dimension is no longer a constant, we need to provide a tight analysis and determine its precise dependency on the dimension (by using the prime number theorem).

Theorem 2.

For any parameter , we can place a set of points inside the cube in such that for any two points we have .

Proof.

We use the same construction as in [2]: we pick the first primes and place points on the integer points in . The coordinates of the -th point are , for , where denotes the “reversed” (or inverted) representation of in base , using digits. (That is, we write in the given base with the most significant digit to the left and then read the digits from right to left to obtain the reversed value.) In [2], it is only proved that the volume of the axis-aligned box that contains any two of the points is , with a “constant” that depends on the dimension. Here, we need to make this dependence explicit.
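A minimal sketch of this digit-reversal construction, with helper names of our choosing: each point carries its index as first coordinate, and the reversed base-p representations of the index as the remaining coordinates. The specific parameters (16 points, primes 2 and 3) are illustrative only.

```python
def rev_digits(i, base, ndigits):
    """Reverse the base-`base` representation of i, padded to ndigits digits.
    E.g. rev_digits(3, 2, 3): 3 = 011 in binary, reversed -> 110 = 6."""
    r = 0
    for _ in range(ndigits):
        i, d = divmod(i, base)
        r = r * base + d
    return r

def halton_like_points(n, primes):
    """i-th point: (i, rev_{p_1}(i), rev_{p_2}(i), ...) on the integer grid.
    Coordinate j uses enough base-p_j digits to represent 0..n-1."""
    pts = []
    for i in range(n):
        coords = [i]
        for p in primes:
            nd = 1
            while p ** nd < n:
                nd += 1
            coords.append(rev_digits(i, p, nd))
        pts.append(tuple(coords))
    return pts

pts = halton_like_points(16, [2, 3])   # 16 points in 3 dimensions
assert len(set(pts)) == 16             # the points are pairwise distinct

# Minimum volume of an axis-aligned box spanned by two of the points; the
# construction keeps this large, and Theorem 2 quantifies exactly how large.
min_vol = min(
    (abs(p[0] - q[0]) + 1) * (abs(p[1] - q[1]) + 1) * (abs(p[2] - q[2]) + 1)
    for a, p in enumerate(pts) for q in pts[:a])
assert min_vol >= 8  # each digit reversal is injective, so every side is >= 2
```

Since each reversal map is a bijection on its digit range, any two points differ in every coordinate, which already forces every side of the spanning box to have length at least one.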

Consider two points such that . Let . Observe that if divides but does not, then the representation of in base contains exactly leading zeros (and consequently, the reversed representation contains exactly zeros at its most significant digits). This implies that for any natural number such that , and agree on exactly of their most significant digits, which yields the bound . Let be the smallest box that contains and , let be the side lengths of , and let be its volume. Thus, . Let be the integer such that . Based on the above observation, must contain leading zeros, or in other words, divides . Let . Since the primes are relatively prime, it follows that also divides . However, observe that

Let . We claim , since otherwise we reach a contradiction. Assume to the contrary that . Then we get

(4)

which is a contradiction since must divide .

It remains to estimate . By the prime number theorem, we know that . Thus, using Stirling’s approximation, we have

Using the above theorem, we prove the following result.

Lemma 3.

Consider the unit cube in . Let and be two integral parameters such that . We can place a set of points in and assign each an integer color from 0 to such that the following hold: (i) for any two points and we have and (ii) for any two points and that have the same color we have .

Proof.

Consider the unit cube in . We use Theorem 2 (after rescaling the cube to the unit cube) to place a set of points in .

We now define the set . Consider a point and let be the value of its last coordinate. We project the point onto its first coordinates to get a point and color it with color .

We show that the projected points satisfy the two claims of the lemma. Claim (i) is immediate: by Theorem 2, for any two points that were obtained from two points of the original set, we have . Simply observe that

Thus, it remains to prove claim (ii). Consider two points with the same color that correspond to two points of the original set. Since the two points have the same color, the difference between the values of their -th coordinates is at most . This fact, combined with Theorem 2, implies
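The projection-and-coloring step of this proof can be sketched as follows. The rule of bucketing the last coordinate into equal intervals is our assumption of one natural way to realize the coloring; the function name is hypothetical.

```python
def project_and_color(points, m):
    """Project (d+1)-dimensional points (coordinates in [0,1]) to their first
    d coordinates, coloring each point by which of m equal buckets its last
    coordinate falls into. Same color => last coordinates differ by < 1/m."""
    out = []
    for p in points:
        color = min(int(p[-1] * m), m - 1)  # bucket index of the last coordinate
        out.append((p[:-1], color))
    return out

points = [(0.1, 0.2, 0.05), (0.8, 0.3, 0.07), (0.5, 0.9, 0.95)]
colored = project_and_color(points, 10)
assert colored[0][1] == colored[1][1] == 0  # last coords 0.05, 0.07: same bucket
assert colored[2][1] == 9                   # last coord 0.95: top bucket
```

Two points sharing a color thus had nearly equal last coordinates before projection, which is exactly the fact the proof combines with Theorem 2.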

6.2.3 Choosing the Colors.

In the previous subsection, we discussed constructions that will help us place the parametric points. Here, we will pick the set of colors that are used to color them. First, we establish an invariant.

Invariant (I).

Let . We maintain the invariant that and , for each . This invariant ensures that our construction is well-defined; in particular, that for each slab , the values , for all , lie in the valid range . As a result, any tuple of integers satisfying this invariant yields well-defined values for the thickness of the slabs used in our construction.

We first need to estimate the total number of different colors that satisfy this invariant. Let be the set of all colors satisfying Invariant (I). In other words, is the set of all tuples where each , , is a non-negative integer and, furthermore, , for , where .

Observation 4.

.

Proof.

If we force , for , then we will have and thus , for . Clearly, the number of tuples is at least as claimed. ∎

Pruning the colors.

Fix an integral parameter . We call a subset of colors an -subset. We say an -subset is bad if, looking at each dimension of the colors in it, we see at most two distinct values per dimension, and good if it is not bad. Alternatively, an -subset is good if we can find three colors in it and an index at which the three colors take three distinct values. Let be the largest subset of that contains no bad -subsets (in other words, every -subset of is good).
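The distinction between good and bad subsets can be stated as a short predicate. This sketch assumes the reading that a subset is bad when every coordinate takes at most two distinct values across its colors; the function name is ours.

```python
def is_good(colors):
    """A set of color tuples is *good* if some coordinate takes at least three
    distinct values across the set (equivalently, there are three colors and an
    index at which all three values differ); otherwise it is *bad*."""
    if not colors:
        return False  # the empty set is vacuously bad under this reading
    dims = len(colors[0])
    return any(len({c[j] for c in colors}) >= 3 for j in range(dims))

bad  = [(0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1)]  # every coordinate: 2 values
good = [(0, 1, 0), (1, 1, 0), (2, 0, 1)]             # coordinate 0 takes 0, 1, 2
print(is_good(bad), is_good(good))  # False True
```

Note that a bad subset is highly constrained: with at most two choices per coordinate, its size is bounded by an exponential in the number of dimensions, which is the fact exploited by Lemma 4.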

Lemma 4.

If , then contains no bad -subset and thus .

Proof.

Consider a bad -subset . By definition, we can have at most two distinct values at each coordinate of the tuples in it. Therefore, the number of tuples in such a subset cannot exceed . In turn, there are no bad -subsets if . ∎

Lemma 5.

If , then .

Proof.

We claim that there exists a subset that contains the claimed number of colors without containing any bad -subset; this clearly proves the lemma.

We prove the claim using random sampling: we take a random sample of small enough size and then remove the bad subsets.

Let be a parameter to be determined later. Let be a subset of where each color is sampled independently with probability . Clearly, we have . From each bad -subset, we remove one color. The set of remaining colors will be the claimed set . By construction, it will not contain any bad -subsets, but the main point is to show that it actually retains a significant fraction of the colors.

Let be the total number of bad -subsets. We first estimate this quantity. By the definition of a bad -subset, at every dimension we see at most two distinct values among its tuples. Thus, we first count the number of ways to choose these distinct values at a particular dimension. After choosing the distinct values, every dimension of every tuple has only two possible choices. Thus we have,

Now, consider a bad -subset. Observe that it survives in with probability , and thus the expected number of colors that we remove is at most . If we choose the parameter such that , then in expectation we remove only one color, and thus the expected number of colors left after the pruning step is at least . Thus, we need to pick a value such that

Picking as above implies that the number of colors left in is at least

6.2.4 The Final Construction.

We use Lemma 4 or Lemma 5 (depending on the value of ) to pick the set of colors. Then we apply Lemma 3, where is set to , is set to , and is set to . Thus, Lemma 3 yields a point set . The coordinates of the -th point define the parametric point of the -th t-slab, and its color defines the thicknesses of the two-dimensional slabs that create it. Thus, the point set completely defines the set of t-slabs that we aimed to build.

The last challenge is to bound the volume of the intersection of these slabs. We will do this in the remainder of this subsection.

Lemma 6.

Consider t-slabs where the defining parameters of are