Tracking technologies like GPS gather huge and growing collections of trajectory data, for instance for cars, mobile devices, and animals. The analysis of these collections poses many interesting problems, which have been the subject of much attention recently [1]. One of these problems is the identification of regions in which an entity has spent a large amount of time. Such regions are usually called stay points, popular places, or hotspots in the literature.
We consider polygonal trajectories, in which the trajectory is obtained by linearly interpolating the location of the moving entity, recorded at specific points in time (the assumption of polygonal trajectories is very common in the literature; see for instance [2, 3, 4]). Gudmundsson et al. [5] define several problems about trajectory hotspots and present an algorithm to solve the following one: defining a hotspot as an axis-aligned square of fixed side length, the goal is to find a placement of such a square that maximizes the time the entity spends inside it. (There are other models and assumptions about hotspots, e.g. the assumption of pre-defined potential regions [6], counting only the number of visits or the number of visits from different entities [2], or considering only the sampled locations [7].) To solve this problem, they first show that the function that maps the location of the square to the duration the trajectory spends inside it is piecewise linear, and that its breakpoints happen when a side of the square lies on a vertex, or a corner of the square lies on an edge of the trajectory. Based on this observation, they subdivide the plane into faces and test each face for the square with the maximum duration.
We limit ourselves to trajectories whose edges are parallel to the axes of the coordinate system (which we call orthogonal trajectories). One possible application of this problem is finding regions of a (possibly multi-layer, 3-dimensional) VLSI chip with high wire density, taking the current of the wires into account, to identify potential chip hot spots. A $c$-approximate hotspot of a trajectory is a square in which the entity spends no less than $c$ times the time it spends in the optimal hotspot. For a trajectory with $n$ edges, we present an approximation algorithm for this problem with an approximation ratio of $1/2$ and the time complexity $O(n \log^3 n)$. This algorithm combines kinetic tournaments [8] with segment trees to maintain the maximum among sums of a set of piecewise linear functions. We also present a simpler $O(n \log n)$ time algorithm for finding $1/4$-approximate hotspots.
2 Preliminaries and Basic Results
A trajectory specifies the location of a moving entity through time. Therefore, it can be described as a function that maps each point in a specific time interval to a location in the plane. Similar to Gudmundsson et al. [5], we assume that trajectories are continuous and piecewise linear. The locations of the entity recorded at different points in time are called the vertices of the trajectory. We assume that the entity moves in a straight line and with constant speed from one vertex to the next (this simplifying assumption is very common in the literature, but there are other models for the movement of the entity between vertices [9]); we call the sub-trajectory connecting two contiguous vertices an edge of the trajectory.
In this paper, we relax the requirement that a trajectory is continuous: we assume that a trajectory $T$ is a set of $n$ edges $\{e_1, e_2, \ldots, e_n\}$. With each edge $e$ we associate a weight $w(e)$, which denotes the duration of the sub-trajectory between its end points (the difference between the times recorded for its end points). In orthogonal trajectories, all edges are parallel either to the $x$- or to the $y$-axis. In horizontal (similarly vertical) trajectories, all edges are parallel to the $x$-axis ($y$-axis).
For any axis-parallel square $S$ with some fixed side length, we define the weight of $S$ as the total duration in which the entity has spent inside it, and denote it by $w_T(S)$, or $w(S)$ if there is no confusion. A hotspot is an axis-parallel square with side length $\ell$ and with the maximum possible weight. We denote the weight of a hotspot of trajectory $T$ by $\omega(T)$. All squares considered in the rest of this paper have the same fixed side length $\ell$.
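As a concrete reading of the definition of $w_T(S)$, the Python sketch below computes the weight of a closed axis-parallel square for an orthogonal trajectory. It assumes constant speed along each edge, so an edge contributes its duration times the fraction of its length that lies inside the square. The tuple layout `(x1, y1, x2, y2, duration)` and the name `square_weight` are illustrative choices, not taken from the paper, and edges are assumed to have positive length.

```python
from typing import List, Tuple

# Hypothetical edge layout: (x1, y1, x2, y2, duration); each edge is axis-parallel
# with positive length, and the entity moves along it at constant speed.
Edge = Tuple[float, float, float, float, float]


def square_weight(edges: List[Edge], left: float, bottom: float, side: float) -> float:
    """w_T(S) for the closed square S = [left, left+side] x [bottom, bottom+side]."""
    right, top = left + side, bottom + side
    total = 0.0
    for x1, y1, x2, y2, duration in edges:
        if y1 == y2:  # horizontal edge: its height must be within the square's vertical extent
            if not bottom <= y1 <= top:
                continue
            lo, hi, a, b = min(x1, x2), max(x1, x2), left, right
        else:  # vertical edge: its x-coordinate must be within the square's horizontal extent
            if not left <= x1 <= right:
                continue
            lo, hi, a, b = min(y1, y2), max(y1, y2), bottom, top
        inside = max(0.0, min(hi, b) - max(lo, a))  # length of the edge inside S
        total += duration * inside / (hi - lo)      # constant speed along the edge
    return total
```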
Lemma 2.1. Let $H$ and $V$ be a partition of an orthogonal trajectory $T$, in which $H$ contains the horizontal and $V$ contains the vertical edges of $T$. Let $\omega'$ be the maximum of $\omega(H)$ and $\omega(V)$. Then, $\omega'$ is at least $\omega(T)/2$.
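Proof. Let $S$ be a hotspot of $T$. Every edge of $T$ is either in $H$ or in $V$, and thus $w_T(S)$ equals $w_H(S) + w_V(S)$. Therefore, either $w_H(S) \ge w_T(S)/2$ or $w_V(S) \ge w_T(S)/2$. Since $\omega(H) \ge w_H(S)$ and $\omega(V) \ge w_V(S)$, we have $\omega' \ge w_T(S)/2 = \omega(T)/2$, as required. ∎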
Let $S$ be an axis-parallel square and $T$ be a horizontal trajectory. The entering rate of an edge $e$ of $T$ with respect to $S$ is the rate at which the contribution of $e$ to the weight of $S$ increases if the right side of $S$ is moved to the right. Similarly, the leaving rate of $e$ with respect to $S$ is the rate at which the contribution of $e$ to the weight of $S$ decreases if the left side of $S$ is moved to the right. We denote the former as $\mathrm{in}_S(e)$ and the latter as $\mathrm{out}_S(e)$. It is not difficult to see that $\mathrm{in}_S(e)$ (and similarly $\mathrm{out}_S(e)$) is either zero or the ratio of the duration of $e$ to its length, which we denote as $r(e) = w(e)/|e|$. In Figure 1, only two edges have a non-zero entering rate and only two edges have a non-zero leaving rate; the entering and leaving rates of all other edges are zero.
The entering rate of a horizontal trajectory $T$ with respect to a square $S$, denoted as $\mathrm{in}_S(T)$, is defined as the sum of the entering rates of all edges of $T$. Similarly, the leaving rate of $T$ with respect to $S$ is the sum of the leaving rates of all edges of $T$; it is denoted as $\mathrm{out}_S(T)$.
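A small Python sketch of these rates for a horizontal trajectory, computed as right-derivatives. The edge layout `(x_left, x_right, y, duration)` with `x_left < x_right`, the closed-square convention, and the function name are assumptions made for illustration: an edge is counted only if its height lies in the vertical extent of $S$, and it contributes $r(e)$ to a rate when the corresponding vertical side of $S$ is at or to the right of the edge's left end point and strictly to the left of its right end point.

```python
from typing import List, Tuple

# Hypothetical horizontal edge layout: (x_left, x_right, y, duration), x_left < x_right.
HEdge = Tuple[float, float, float, float]


def entering_leaving_rates(edges: List[HEdge], left: float, bottom: float,
                           side: float) -> Tuple[float, float]:
    """(in_S(T), out_S(T)) for the closed square [left, left+side] x [bottom, bottom+side]."""
    right, top = left + side, bottom + side
    enter = leave = 0.0
    for a, b, y, duration in edges:
        if not bottom <= y <= top:
            continue                  # the edge is outside the square's vertical extent
        rate = duration / (b - a)     # r(e): duration per unit length
        if a <= right < b:            # moving the right side rightwards gains length of e
            enter += rate
        if a <= left < b:             # moving the left side rightwards loses length of e
            leave += rate
    return enter, leave
```

When the entering rate is larger than the leaving rate, shifting the square to the right increases its weight, which is the observation used in the proof of Lemma 2.2.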
Lemma 2.2. Let $T$ be a horizontal trajectory. There exists a square with side length $\ell$ whose weight equals $\omega(T)$ and one of whose vertical sides contains a vertex of $T$.
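Proof. Let $S$ be a square with weight $\omega(T)$ and suppose none of its vertical sides contains a vertex of $T$. Clearly, $\mathrm{in}_S(T)$ cannot be greater than $\mathrm{out}_S(T)$; otherwise, the weight of $S$ increases by moving it to the right, which is impossible since $S$ is a hotspot. Similarly, $\mathrm{out}_S(T)$ cannot be greater than $\mathrm{in}_S(T)$, since otherwise moving $S$ to the left would increase its weight. Therefore, $\mathrm{in}_S(T) = \mathrm{out}_S(T)$. These rates change only when a vertical side of $S$ meets a vertex of $T$; hence, by moving $S$ to the right until one of its vertical sides meets a vertex of $T$, its weight does not change. ∎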
Theorem 2.3. Let $T$ be a horizontal trajectory and let $W$ be the maximum weight of a square with side length $\ell$, one of whose corners coincides with one of the vertices of $T$. Then, $W \ge \omega(T)/2$.
Proof. Let $S$ be a square with weight $\omega(T)$, one of whose vertical sides contains a vertex $v$ of $T$ (such a square exists by Lemma 2.2). Suppose $v$ is on the left side of $S$ (the argument for the right side is similar). Let $S_1$ and $S_2$ be the squares with side length $\ell$ whose lower left and upper left corners are on $v$, respectively. Given that the union of $S_1$ and $S_2$ covers $S$, $w(S_1) + w(S_2)$ is at least $w(S)$, and therefore $\max(w(S_1), w(S_2))$ is at least $w(S)/2$. Since $W \ge \max(w(S_1), w(S_2))$, we have $W \ge \omega(T)/2$. ∎
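Lemma 2.2 and Theorem 2.3 can be checked numerically with a brute-force Python sketch, assuming horizontal edges given as hypothetical tuples `(x_left, x_right, y, duration)`. Combining Lemma 2.2 with the observation that a square can be moved up until its lower side meets an edge without losing weight, `exact_hotspot_weight` only tries left sides at $x_v$ or $x_v - \ell$ and lower sides at edge heights, while `best_corner_weight` tries every square with a corner on a vertex. The cubic running time makes this suitable only for small tests; the function names are illustrative.

```python
import random
from typing import List, Tuple

HEdge = Tuple[float, float, float, float]  # hypothetical: (x_left, x_right, y, duration)


def weight(edges: List[HEdge], left: float, bottom: float, side: float) -> float:
    total = 0.0
    for a, b, y, dur in edges:
        if bottom <= y <= bottom + side:
            total += dur * max(0.0, min(b, left + side) - max(a, left)) / (b - a)
    return total


def exact_hotspot_weight(edges: List[HEdge], side: float) -> float:
    # Left sides put a vertical side on a vertex (Lemma 2.2); lower sides lie on an
    # edge height (moving a square up until that happens does not lose weight).
    xs = {x for a, b, _, _ in edges for x in (a, b)}
    lefts = xs | {x - side for x in xs}
    bottoms = {y for _, _, y, _ in edges}
    return max(weight(edges, l, btm, side) for l in lefts for btm in bottoms)


def best_corner_weight(edges: List[HEdge], side: float) -> float:
    # W in Theorem 2.3: squares with one of their four corners on a vertex.
    return max(weight(edges, l, btm, side)
               for a, b, y, _ in edges for vx in (a, b)
               for l in (vx, vx - side) for btm in (y, y - side))


if __name__ == "__main__":
    random.seed(1)
    for _ in range(200):
        edges = []
        for _ in range(6):
            a = random.uniform(0, 10)
            edges.append((a, a + random.uniform(0.5, 3), random.uniform(0, 5),
                           random.uniform(0.1, 2)))
        opt, corner = exact_hotspot_weight(edges, 2.0), best_corner_weight(edges, 2.0)
        assert corner >= opt / 2 - 1e-9   # Theorem 2.3
        assert opt >= corner - 1e-9       # the candidate set really contains a hotspot
```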
In Lemma 2.4, we show how to find a maximum weight square with a left corner on a trajectory vertex, for horizontal trajectories.
Lemma 2.4. Among all axis-parallel squares with side length $\ell$ and with a left corner on a vertex of a horizontal trajectory $T$, a square with the maximum weight can be found in $O(n \log n)$ time.
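Proof. Let $E = \langle e_1, e_2, \ldots, e_n \rangle$ be the sequence of trajectory edges, ordered by their $y$-coordinates. We perform a sweep line algorithm with a vertical sweep line at horizontal position $x$; the squares whose left side lies on the sweep line are called sweep squares. During the algorithm, for each edge $e_i$ we maintain a linear function $f_i$ of $x$, which shows the contribution of $e_i$ to the weight of intersecting sweep squares. Initially, $f_i$ is the zero function for every edge $e_i$.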
The sweep line algorithm processes four types of events for each edge: when its left or right end point meets the left or right side of the sweep squares. At each of these events for edge $e_i$, the slope and the vertical intercept of $f_i$ are updated to reflect the current contribution of the edge to the weight of intersecting sweep squares. At each event at which the sweep line passes through an end point of $e_i$, we also compute the weights of the sweep squares whose lower or upper side has the same height as $e_i$, and record the square with the maximum weight. Let $y_i$ be the height of $e_i$. Using binary search on $E$, we can find the first and the last edge of $E$ that intersect each of these sweep squares (the edges with heights in $[y_i, y_i + \ell]$ and in $[y_i - \ell, y_i]$, respectively). Then, we can find the sums of the slopes and of the intercepts of the functions of these edges, which give the linear function that shows the weight of the sweep square based on the position of the sweep line. To do so, we can store the slopes and the vertical intercepts of the edges in two separate data structures that support finding the sum of any contiguous subsequence of the values, in the order of $E$. This can be done using, for instance, the Fenwick tree data structure [10], which supports computing a prefix sum of a sequence of numbers and updating any of them in $O(\log n)$ time.
Therefore, when the algorithm finishes after processing $O(n)$ events, each in $O(\log n)$ time, we can report the maximum-weight square one of whose left corners is on a trajectory vertex. ∎
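A minimal Python sketch of the sweep in the proof of Lemma 2.4, under stated assumptions: edges are hypothetical tuples `(x_left, x_right, y, duration)`, and each $f_i$ is kept as a (slope, intercept) pair that, at every breakpoint $x \in \{a - \ell, b - \ell, a, b\}$ of edge $[a, b]$, is replaced by the linear piece valid to the right of $x$. Two Fenwick trees indexed in the order of $E$ provide the range sums of slopes and intercepts. Because each contribution is continuous in $x$, evaluating the summed function at the event position gives the exact weight even when several events coincide. The sketch returns only the maximum weight; recording the square itself is a direct extension.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Set, Tuple

HEdge = Tuple[float, float, float, float]  # hypothetical: (x_left, x_right, y, duration)


class Fenwick:
    """Fenwick (binary indexed) tree: point update and prefix sum in O(log n)."""

    def __init__(self, n: int) -> None:
        self.tree = [0.0] * (n + 1)

    def add(self, i: int, delta: float) -> None:  # 0-based position i
        i += 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i: int) -> float:  # sum of positions 0 .. i-1
        s = 0.0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s


def best_left_corner_weight(edges: List[HEdge], side: float) -> float:
    """Max weight of a square with its lower-left or upper-left corner on a vertex."""
    edges = sorted(edges, key=lambda e: e[2])  # the sequence E, ordered by y
    ys = [e[2] for e in edges]
    slopes, icepts = Fenwick(len(edges)), Fenwick(len(edges))
    current = [(0.0, 0.0)] * len(edges)        # (slope, intercept) of each f_i

    def piece(i: int, x: float) -> Tuple[float, float]:
        # Linear piece of r * |[t, t+side] intersect [a, b]| valid just right of t = x.
        a, b, _, dur = edges[i]
        r = dur / (b - a)
        h = min(x + side, b) - max(x, a)                          # signed overlap
        dh = (1.0 if x + side < b else 0.0) - (1.0 if x >= a else 0.0)
        m = r * (dh if h > 0 else (max(0.0, dh) if h == 0 else 0.0))
        return m, r * max(0.0, h) - m * x

    events: Dict[float, Set[int]] = {}
    vertex_heights: Dict[float, List[float]] = {}
    for i, (a, b, y, _) in enumerate(edges):
        for x in (a - side, b - side, a, b):
            events.setdefault(x, set()).add(i)
        for x in (a, b):
            vertex_heights.setdefault(x, []).append(y)

    best = 0.0
    for x in sorted(events):
        for i in events[x]:                    # replace f_i in both Fenwick trees
            m, c = piece(i, x)
            slopes.add(i, m - current[i][0])
            icepts.add(i, c - current[i][1])
            current[i] = (m, c)
        for y in vertex_heights.get(x, []):    # the sweep line passes through a vertex
            for lo_y, hi_y in ((y, y + side), (y - side, y)):
                lo, hi = bisect_left(ys, lo_y), bisect_right(ys, hi_y)
                m = slopes.prefix(hi) - slopes.prefix(lo)
                c = icepts.prefix(hi) - icepts.prefix(lo)
                best = max(best, m * x + c)
    return best
```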
Theorem 2.5. There is an $O(n \log n)$ time approximation algorithm for finding hotspots (axis-parallel squares with side length $\ell$) of orthogonal trajectories, such that the weight of the square found by the algorithm is at least $1/4$ of the optimal value ($\omega(T)/4$).
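Proof. Let $T$ be an orthogonal trajectory. $T$ can be partitioned into sets $V$ and $H$ containing the vertical and horizontal edges of $T$, respectively. Lemma 2.4 shows how to find a square $S_H$ with the maximum possible weight among the squares one of whose corners is on a vertex of $H$ (the algorithm is performed twice, the second time after rotating the plane 180 degrees, to also find the maximum-weight squares with one of their right corners on a vertex of $H$). The same algorithm can obtain such a square $S_V$ for $V$, after rotating the plane 90 degrees. By Theorem 2.3, $w_H(S_H) \ge \omega(H)/2$ and $w_V(S_V) \ge \omega(V)/2$. Also, by Lemma 2.1, $\max(\omega(H), \omega(V)) \ge \omega(T)/2$. Since $w_T(S) \ge w_H(S)$ and $w_T(S) \ge w_V(S)$ for every square $S$, the heavier of $S_H$ and $S_V$ has weight at least $\omega(T)/4$ in $T$, as required. Each of these steps takes $O(n \log n)$ time. ∎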
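The reduction in the proof of Theorem 2.5 can be sketched in Python as follows. To keep the example short and self-contained, it uses a brute-force search over all squares with a corner on a vertex in place of the $O(n \log n)$ sweep of Lemma 2.4, and it reflects $V$ in the line $y = x$ instead of rotating the plane by 90 degrees (both maps take axis-parallel squares to axis-parallel squares and vertical edges to horizontal ones). The edge layout and all function names are hypothetical, and edges are assumed to have positive length.

```python
from typing import Iterator, List, Tuple

Edge = Tuple[float, float, float, float, float]  # hypothetical: (x1, y1, x2, y2, duration)
Square = Tuple[float, float]                      # (left, bottom)


def weight(edges: List[Edge], left: float, bottom: float, side: float) -> float:
    total = 0.0
    for x1, y1, x2, y2, dur in edges:
        if y1 == y2 and bottom <= y1 <= bottom + side:    # horizontal edge in range
            lo, hi, a = min(x1, x2), max(x1, x2), left
        elif x1 == x2 and left <= x1 <= left + side:      # vertical edge in range
            lo, hi, a = min(y1, y2), max(y1, y2), bottom
        else:
            continue
        total += dur * max(0.0, min(hi, a + side) - max(lo, a)) / (hi - lo)
    return total


def corner_squares(edges: List[Edge], side: float) -> Iterator[Square]:
    # Every square with one of its four corners on a vertex of `edges`.
    for x1, y1, x2, y2, _ in edges:
        for vx, vy in ((x1, y1), (x2, y2)):
            for left in (vx, vx - side):
                for bottom in (vy, vy - side):
                    yield left, bottom


def best_corner_square(edges: List[Edge], side: float) -> Square:
    # Brute-force stand-in for Lemma 2.4 applied to both left and right corners.
    return max(corner_squares(edges, side), key=lambda s: weight(edges, s[0], s[1], side))


def quarter_approx_hotspot(edges: List[Edge], side: float) -> Square:
    horizontal = [e for e in edges if e[1] == e[3]]
    vertical = [e for e in edges if e[1] != e[3]]
    candidates: List[Square] = []
    if horizontal:
        candidates.append(best_corner_square(horizontal, side))   # S_H
    if vertical:
        # Reflect in the line y = x so that V becomes a horizontal trajectory.
        mirrored = [(y1, x1, y2, x2, d) for x1, y1, x2, y2, d in vertical]
        left, bottom = best_corner_square(mirrored, side)
        candidates.append((bottom, left))                          # S_V, reflected back
    # The heavier of S_H and S_V, weighed in the whole trajectory T.
    return max(candidates, key=lambda s: weight(edges, s[0], s[1], side))
```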
3 A $1/2$-Approximation Algorithm
In a plane sweep algorithm to find a hotspot of a horizontal trajectory, as we move the vertical sweep line to the right, we can maintain the weight of every square whose left side is on the line (referred to as sweep squares hereafter). The contribution of a trajectory edge to the weight of any such square can be described as a piecewise linear function of the horizontal position $x$ of the sweep line. When only the right side of a square intersects a trajectory edge, the contribution of the edge to the weight of the square increases at the entering rate of the edge. Similarly, when only the left side of a square intersects an edge, the contribution of the edge to the weight of the square decreases at the leaving rate of the edge. Therefore, the weight of each square is also piecewise linear, as it is the sum of a set of piecewise linear functions of $x$, and to find a square with the maximum weight, we can update the weights of the squares as we sweep. Although there are infinitely many squares with a left corner on the sweep line, for finding a square with the maximum weight we need to keep track of at most $n$ of them: exactly those whose lower side has the same height as a trajectory edge (we can move any square upwards until its lower side meets a trajectory edge, without decreasing its weight).
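The piecewise linear contribution described above can be written down directly. In the Python sketch below (function names hypothetical), an edge $[a, b]$ whose height lies within the square's vertical extent contributes $r(e) \cdot |[x, x + \ell] \cap [a, b]|$ to the sweep square whose left side is at $x$. The breakpoints are the four positions at which a side of the square meets an end point of the edge; between consecutive breakpoints the function is linear, with slope $r(e)$ while only the right side crosses the edge, slope $-r(e)$ while only the left side crosses it, and slope zero otherwise.

```python
from typing import List


def contribution(a: float, b: float, duration: float, side: float, x: float) -> float:
    """Contribution of a horizontal edge [a, b] to the sweep square whose left side is
    at x, assuming the edge's height lies within the square's vertical extent."""
    overlap = max(0.0, min(x + side, b) - max(x, a))
    return duration * overlap / (b - a)


def breakpoints(a: float, b: float, side: float) -> List[float]:
    """Sweep positions at which a side of the sweep square meets an end point of [a, b]."""
    return sorted({a - side, b - side, a, b})


# Example: an edge on [0, 4] with duration 8 (r(e) = 2) and side length 2.
print([contribution(0, 4, 8.0, 2, x) for x in (-3, -1, 1, 3, 5)])
```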
We define a plane $P$, whose horizontal axis represents the position of the sweep line and whose vertical axis represents the weight of sweep squares, as follows. For each of these squares, we add a moving point to $P$ whose height at $x$ shows the weight of the corresponding sweep square when the sweep line is at $x$. Figure 2 shows a square and the trajectory of its corresponding point in $P$. Obviously, for any value of $x$, the highest point in $P$ shows the square with the maximum weight when the sweep line is at $x$. Thus, to find a hotspot of the trajectory, we can compute the upper envelope of the trajectories of these moving points in $P$.
We start with the assumption that the vertical distance between any pair of horizontal edges is more than the side length of the hotspot. This assumption implies that every square intersects at most one trajectory edge. To compute the upper envelope in this case, we can use kinetic tournament trees [8], which work as follows. The points (and their corresponding weight functions) are stored in the leaves of a balanced binary tree in an arbitrary order. At each internal node, we store the maximum function among those of its children (the winner) and the amount of time this maximum is guaranteed to stay the same (the failure time of the winner's certificate). The failure times are stored in a priority queue and are processed in order of time. When the next certificate fails, the winner and the failure time of the corresponding node and its ancestors are updated. In $P$ we also have update events, in which the linear function of a leaf changes (when a trajectory edge crosses the right side of sweep squares or when the sweep line moves to the right of an edge). The function assigned to each point can be updated as in dynamic and kinetic tournament trees [11].
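A minimal Python sketch of a kinetic tournament tree for this simplified setting, assuming static linear functions (the update events of dynamic and kinetic tournament trees are omitted). Each internal node keeps its winner leaf and the time at which the losing function overtakes it; a `heapq`-based priority queue with lazy deletion of stale entries plays the role of the event queue, and ties are broken toward the larger slope, which leads immediately to the right of the tie. The class and method names are illustrative.

```python
import heapq
import math
from typing import List, Tuple

Line = Tuple[float, float]  # (slope, intercept): a sweep square's weight as a function of x


class KineticTournament:
    """Kinetic tournament tree over static linear functions (no update events)."""

    def __init__(self, lines: List[Line], x0: float) -> None:
        self.lines, self.x = lines, x0
        self.size = 1
        while self.size < len(lines):
            self.size *= 2
        self.win = [-1] * (2 * self.size)          # winner leaf per node, -1 = none
        self.cert = [math.inf] * (2 * self.size)   # certificate failure time per node
        self.queue: List[Tuple[float, int]] = []   # priority queue of (time, node)
        for i in range(len(lines)):
            self.win[self.size + i] = i
        for v in range(self.size - 1, 0, -1):
            self._recompute(v)

    def _key(self, i: int) -> Tuple[float, float]:
        slope, intercept = self.lines[i]
        return slope * self.x + intercept, slope   # value first, slope breaks ties

    def _recompute(self, v: int) -> None:
        a, b = self.win[2 * v], self.win[2 * v + 1]
        if a < 0 or b < 0:                          # at most one child has a winner
            self.win[v], self.cert[v] = max(a, b), math.inf
            return
        w, l = (a, b) if self._key(a) >= self._key(b) else (b, a)
        (mw, cw), (ml, cl) = self.lines[w], self.lines[l]
        t = (cw - cl) / (ml - mw) if ml > mw else math.inf  # when the loser overtakes
        if t <= self.x:                             # rounding: crossing already reached
            w, t = l, math.inf
        self.win[v], self.cert[v] = w, t
        if t < math.inf:
            heapq.heappush(self.queue, (t, v))

    def advance(self, x_end: float) -> List[Tuple[float, int]]:
        """Process certificate failures up to x_end; return (x, new root winner) changes."""
        changes = []
        while self.queue and self.queue[0][0] <= x_end:
            t, v = heapq.heappop(self.queue)
            if t != self.cert[v]:
                continue                            # stale queue entry
            self.x, old = t, self.win[1]
            while v >= 1:                           # repair v and all of its ancestors
                self._recompute(v)
                v //= 2
            if self.win[1] != old:
                changes.append((t, self.win[1]))
        self.x = x_end
        return changes
```

Calling `advance(x)` reports the positions at which the root winner changes, i.e. the breakpoints of the upper envelope in $P$ up to position $x$.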
If the vertical distance between trajectory edges can be smaller than the side length of the hotspot, a sweep square may intersect more than one trajectory edge, in which case the height of its corresponding point is the sum of a set of piecewise linear functions. At each edge event, $O(n)$ of these functions may need to be updated, so the sweep line algorithm may perform a quadratic number of updates in the worst case. To handle these update events efficiently, we use a segment tree as the underlying data structure for the kinetic tournament tree. The details, correctness, and complexity of this algorithm are presented in the rest of this section.
3.1 The Algorithm
We use a segment tree $\mathcal{T}$ to compute the weight of every sweep square during the sweep line algorithm. $\mathcal{T}$ is initialized with $n$ segments: if $y_i$ is the $y$-coordinate of a horizontal trajectory edge $e_i$, the segment $[y_i - \ell, y_i]$ is added to the segment tree, where $\ell$ is the side length of the squares. Every sweep square whose lower side has a $y$-coordinate within this interval intersects $e_i$ at some point during the sweep. A stabbing query for a value $h$ on $\mathcal{T}$ therefore reports every edge whose height lies within the vertical extent of the sweep square whose lower side is at height $h$. An example is demonstrated in Figure 3.

Figure 3: For each edge, a segment with length $\ell$ is inserted into the segment tree (the root of the segment tree is the left-most node). The text above each tree node represents the set of segment labels at that node. A sweep square whose lower side is within the interval specified by the segment corresponding to an edge intersects that edge at some point during the sweep line algorithm.
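The segment construction can be sketched in Python as below, assuming (as in the text) that the leaves are the sorted segment end points, so stabbing queries are issued only at end-point coordinates. Nodes use heap numbering (root 1, children $2v$ and $2v+1$); `SegmentTree`, `stab`, and `build_for_edges` are hypothetical names. A stabbing query walks one root-to-leaf path and collects $L(v)$ at every node on it, which reports each containing segment exactly once.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


class SegmentTree:
    """Segment tree whose leaves are the sorted segment end points; node v keeps L(v)."""

    def __init__(self, segments: Dict[int, Tuple[float, float]]) -> None:
        self.coords = sorted({p for seg in segments.values() for p in seg})
        self.labels: Dict[int, List[int]] = {}            # node id -> L(v)
        last = len(self.coords) - 1
        for label, (lo, hi) in segments.items():
            s, e = bisect_left(self.coords, lo), bisect_left(self.coords, hi)
            self._insert(1, 0, last, s, e, label)

    def _insert(self, v: int, lo: int, hi: int, s: int, e: int, label: int) -> None:
        if e < lo or hi < s:                               # disjoint from the node's leaves
            return
        if s <= lo and hi <= e:                            # node fully covered: store here
            self.labels.setdefault(v, []).append(label)
            return
        mid = (lo + hi) // 2
        self._insert(2 * v, lo, mid, s, e, label)
        self._insert(2 * v + 1, mid + 1, hi, s, e, label)

    def stab(self, y: float) -> List[int]:
        """Labels of all segments containing y; y must be one of the end points."""
        k = bisect_left(self.coords, y)
        v, lo, hi, found = 1, 0, len(self.coords) - 1, []
        while True:
            found.extend(self.labels.get(v, []))           # L(v) of each node on the path
            if lo == hi:
                return found
            mid = (lo + hi) // 2
            v, lo, hi = (2 * v, lo, mid) if k <= mid else (2 * v + 1, mid + 1, hi)


def build_for_edges(heights: List[float], side: float) -> SegmentTree:
    """One segment [y_i - side, y_i] per horizontal edge e_i at height y_i."""
    return SegmentTree({i: (y - side, y) for i, y in enumerate(heights)})
```

For edges at heights 0, 1, and 5 with $\ell = 2$, `build_for_edges([0.0, 1.0, 5.0], 2.0).stab(0.0)` reports the first two edges, which are exactly the edges whose heights lie in $[0, 2]$.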
For each node $v$, we denote the set of labels of the segments stored at $v$ as $L(v)$ and the union of the intervals of the leaves of the subtree rooted at $v$ as $I(v)$. The leaves of the segment tree represent the start or end points of the segments; $y(u)$ denotes the $y$-coordinate of the point corresponding to leaf $u$. With slight abuse of notation, we sometimes use $u$ to refer to the sweep square whose lower side has height $y(u)$, and by the weight of $u$, we mean the weight of that square.
During the sweep line algorithm, a linear function $f_i$ of $x$ is assigned to each segment $i$, which shows the contribution of the corresponding trajectory edge $e_i$ to the weight of intersecting sweep squares. For each node $v$ of $\mathcal{T}$, we store two linear functions. The sum function $\sigma_v$ is the sum of the functions assigned to the segments in $L(v)$. The winner function $\mu_v$ is equal to $\sigma_v$ for leaves and, for other nodes, is the sum of $\sigma_v$ and the maximum function $\mu_c$ (at the current sweeping position $x$) among the children $c$ of $v$. We also store the winner leaf $\mathrm{win}(v)$ for each node $v$. For a leaf $v$, $\mathrm{win}(v)$ equals $v$ itself. For other nodes, $\mathrm{win}(v)$ equals $\mathrm{win}(c)$, where $c$ is the child of $v$ with the maximum value of $\mu_c(x)$ (therefore, $\mu_v = \sigma_v + \mu_c$).
Sweeping starts to the left of all trajectory edges, where the functions of all segments are zero. We use a priority queue $Q$ to store sweeping events. There are two types of events: the failure of the winning certificate of a node of $\mathcal{T}$ (failure events) and a trajectory edge entering or leaving a sweep square (edge events).
At a failure event for node $v$, the winner, the winner function, and the certificate failure time of $v$ and its ancestors are updated, as in regular kinetic tournament trees. At an edge event, let $i$ be the corresponding segment. The function $\sigma_v$ of every node $v$ such that $i$ appears in $L(v)$ should be updated to reflect the new function $f_i$. After updating $\sigma_v$, $\mu_v$ also needs to be updated. Since the updated function may change the winners and the failure times of the ancestors of $v$, they should also be updated as in failure events.
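Lemma 3.1. For a leaf $u$, let $\langle v_1, v_2, \ldots, v_k \rangle$ be the path from the root of $\mathcal{T}$ to $u$, in which $v_1$ is the root and $v_k = u$. Then, the weight of the square at $u$ is $\sum_{j=1}^{k} \sigma_{v_j}(x)$ during the sweep line algorithm.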
Proof. Answering a stabbing query, i.e. a query to find the segments that contain a given value of $y$, requires a traversal from the root of the tree to a leaf and reporting every segment stored in the nodes of the traversal (every containing segment is reported exactly once). Therefore, a query for the value $y(u)$ reports every segment in $L(v_j)$ for $1 \le j \le k$. To compute the weight of $u$, we need to sum up the contributions of these segments. Since $\sigma_{v_j}$ is the sum of the functions of the segments in $L(v_j)$ and the label of every reported segment is stored in exactly one node of the path, $\sum_{j=1}^{k} \sigma_{v_j}$ is the total contribution of the segments to the weight of the square. ∎
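Lemma 3.2. Let $u$ be a leaf, $v$ be one of its ancestors in $\mathcal{T}$, and $\langle v_1, v_2, \ldots, v_k \rangle$ be the path from $v$ to $u$, where $v_1 = v$ and $v_k = u$. If $\mathrm{win}(v)$ is $u$, then $\mu_v = \sum_{j=1}^{k} \sigma_{v_j}$.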
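Proof. We use induction on $k$, the number of nodes of the path from $v$ to $u$. When $k = 1$, $v$ is a leaf and $\mu_v = \sigma_v$. For $k > 1$, let $c_1$ and $c_2$ be the two children of $v$. $\mathrm{win}(v)$ is either $\mathrm{win}(c_1)$ or $\mathrm{win}(c_2)$, depending on which child has the larger winner function. Without loss of generality, suppose $c_1$ is that child, so that $\mathrm{win}(c_1) = u$ and $\mu_v = \sigma_v + \mu_{c_1}$ (since $u$ lies in the subtree of $c_1$, this implies that $c_1 = v_2$). Based on the induction hypothesis, $\mu_{v_2} = \sum_{j=2}^{k} \sigma_{v_j}$. Since $\mu_v = \sigma_{v_1} + \mu_{v_2}$, the statement follows. ∎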
Lemma 3.3. At any stage during the algorithm, a square at one of the leaves of $\mathcal{T}$ has the maximum weight among all sweep squares.
Proof. We can move a square with the maximum weight upwards until its lower side meets a trajectory edge without decreasing its weight. Since for every trajectory edge $e_i$ a segment $[y_i - \ell, y_i]$ is inserted into $\mathcal{T}$, there is a leaf $u$ of $\mathcal{T}$ such that $y(u)$ equals the height of the lower side of the moved square. ∎
Lemma 3.4. During the sweep line algorithm, $\mathcal{T}$ stores a sweep square with the maximum weight at its root $r$, as $\mathrm{win}(r)$.
Proof. During the algorithm and for every subtree rooted at a node $v$, we show that $\mathrm{win}(v)$ is the square with the maximum weight among the squares the height of whose lower side is in the interval $I(v)$. Lemma 3.3 implies that we need to consider only the leaves of the subtree rooted at $v$. Therefore, we instead show that a leaf with the maximum weight appears as the winner of $v$.
We use induction on the height (the distance from the leaves) of the nodes to show that this property holds for every node. For leaves, the statement is trivially true. Let $v$ be a node with children $c_1$ and $c_2$ (the case with one child is also trivial and omitted). We denote by $\mathcal{T}_v$ the subtree rooted at node $v$. Every leaf of $\mathcal{T}_v$ is a leaf of either $\mathcal{T}_{c_1}$ or $\mathcal{T}_{c_2}$. By the induction hypothesis, a leaf with the maximum weight in $\mathcal{T}_{c_1}$ and in $\mathcal{T}_{c_2}$ appears as $\mathrm{win}(c_1)$ and $\mathrm{win}(c_2)$, respectively. Therefore, the leaf with the maximum weight in $\mathcal{T}_v$, $u^*$, is either $\mathrm{win}(c_1)$ or $\mathrm{win}(c_2)$.
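Let $u_1 = \mathrm{win}(c_1)$ and $u_2 = \mathrm{win}(c_2)$, and let $\langle p_1, \ldots, p_a \rangle$ and $\langle q_1, \ldots, q_b \rangle$ be the paths from the root of $\mathcal{T}$ to $u_1$ and $u_2$, respectively. Both paths include $v$; let $v = p_t = q_t$. Since the paths diverge at $v$, $p_j = q_j$ for every integer $j$ such that $1 \le j \le t$, while $p_{t+1} = c_1$ and $q_{t+1} = c_2$. Based on Lemma 3.1, the weight of $u_1$ is $\sum_{j=1}^{t} \sigma_{p_j} + \sum_{j=t+1}^{a} \sigma_{p_j}$ and the weight of $u_2$ is $\sum_{j=1}^{t} \sigma_{p_j} + \sum_{j=t+1}^{b} \sigma_{q_j}$. Also, based on Lemma 3.2, $\mu_{c_1} = \sum_{j=t+1}^{a} \sigma_{p_j}$ and $\mu_{c_2} = \sum_{j=t+1}^{b} \sigma_{q_j}$. Therefore, $u^*$ is $u_1$ if $\mu_{c_1}(x) \ge \mu_{c_2}(x)$, and $u_2$ otherwise. This implies that $u^*$ is the same as $\mathrm{win}(v)$, since $\mathrm{win}(v)$ is chosen based on the values of $\mu_{c_1}(x)$ and $\mu_{c_2}(x)$. This completes the proof. ∎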
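Lemmas 3.1 and 3.4 can be checked on a single snapshot of the sweep with the Python sketch below. For a fixed sweep position it takes the current values $f_i(x)$ as given, computes $\sigma_v(x)$, $\mu_v(x)$, and $\mathrm{win}(v)$ bottom-up, and compares the result at the root with the path sums of Lemma 3.1. It recomputes everything from scratch, so it only illustrates the invariants and is not the kinetic maintenance itself; the function names and the heap numbering of nodes are assumptions.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def build_labels(coords: List[float],
                 segments: Dict[int, Tuple[float, float]]) -> Dict[int, List[int]]:
    """Store each segment label at its canonical nodes (heap numbering, root = 1)."""
    labels: Dict[int, List[int]] = {}

    def insert(v: int, lo: int, hi: int, s: int, e: int, label: int) -> None:
        if e < lo or hi < s:
            return
        if s <= lo and hi <= e:
            labels.setdefault(v, []).append(label)
            return
        mid = (lo + hi) // 2
        insert(2 * v, lo, mid, s, e, label)
        insert(2 * v + 1, mid + 1, hi, s, e, label)

    for label, (a, b) in segments.items():
        insert(1, 0, len(coords) - 1, bisect_left(coords, a), bisect_left(coords, b), label)
    return labels


def winner_snapshot(coords: List[float], labels: Dict[int, List[int]],
                    value: Dict[int, float]) -> Tuple[float, int]:
    """(mu_root(x), leaf index of win(root)), where value[i] = f_i(x) at the current x."""
    def visit(v: int, lo: int, hi: int) -> Tuple[float, int]:
        sigma = sum(value[i] for i in labels.get(v, []))  # sigma_v(x)
        if lo == hi:
            return sigma, lo                               # leaf: mu = sigma, win = itself
        mid = (lo + hi) // 2
        left, right = visit(2 * v, lo, mid), visit(2 * v + 1, mid + 1, hi)
        mu_c, leaf = left if left[0] >= right[0] else right
        return sigma + mu_c, leaf
    return visit(1, 0, len(coords) - 1)


def path_sum(coords: List[float], labels: Dict[int, List[int]],
             value: Dict[int, float], k: int) -> float:
    """Lemma 3.1: weight of leaf k = sum of sigma over its root-to-leaf path."""
    v, lo, hi, total = 1, 0, len(coords) - 1, 0.0
    while True:
        total += sum(value[i] for i in labels.get(v, []))
        if lo == hi:
            return total
        mid = (lo + hi) // 2
        v, lo, hi = (2 * v, lo, mid) if k <= mid else (2 * v + 1, mid + 1, hi)
```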
The main challenge in the analysis of the sweeping algorithm is limiting the number of failure events. In a dynamic and kinetic tournament tree for movement functions of degree at most $s$, using a balanced binary tree and implementing each update as a deletion followed by an insertion, the number of events is $O\left(\frac{m}{n} \lambda_{s+2}(n) \log n\right)$, where $m$ is the number of updates and $\lambda_{s+2}(n)$ is the maximum length of Davenport-Schinzel sequences of order $s+2$ on $n$ symbols [11]. For our problem, this yields a poor bound: each edge event may update the weights of $O(n)$ leaves and thus $m = O(n^2)$, which, since $s = 1$ and $\lambda_3(n) = \Theta(n \alpha(n))$, implies that the total number of failure events is $O(n^2 \alpha(n) \log n)$, in which $\alpha(n)$ denotes the inverse Ackermann function. In Theorem 3.5 we present a tighter bound.
Theorem 3.5. The time complexity of the plane sweep algorithm for finding a hotspot of a horizontal trajectory is $O(n \log^3 n)$.
Proof. Instead of limiting the number of failure events, we find an upper bound for the total number of winner changes at the nodes of $\mathcal{T}$ (note that every failure event changes the winner of at least one node, and possibly of several nodes).
Let $\mathrm{win}(v) = u$ at some point in the algorithm, in which $u$ is a leaf. Since the weight functions are linear, when $\mathrm{win}(v)$ changes to a leaf $u'$, where $u' \ne u$, $u$ can never become the winner at $v$ again, unless an edge event updates the weight of $u$ or $u'$. Therefore, without the edge events, the number of times a leaf can become a winner at its ancestors is $O(\log n)$, since it has $O(\log n)$ ancestors. This implies a total of $O(n \log n)$ winner changes. It remains to limit the number of winner changes that can result from edge events.
Suppose an edge event for edge $e_i$ updates the function $f_i$ assigned to segment $i$. Let $U$ be the set of all nodes $v$ of $\mathcal{T}$ such that $i \in L(v)$. For every node in $U$, the sum and winner functions are updated. This change does not cause any winner change in the subtree rooted at $v$, because the relative weights of its leaves do not change. However, the new winner function of $v$ may cause future winner changes at the ancestors of $v$. In segment trees, one can show that the label of each segment appears in $O(\log n)$ nodes (for details, see [12]), and thus $|U| = O(\log n)$. Since each node has $O(\log n)$ ancestors, the number of winner changes caused by each edge event is $O(\log^2 n)$ and, since there are $O(n)$ edge events, the total number of winner changes induced by the edge events is $O(n \log^2 n)$. Since for each winner change at node $v$ the failure time of $v$ is updated in $Q$, the cost of performing each winner change is $O(\log n)$. Thus, the time complexity of the algorithm is $O(n \log^3 n)$. ∎
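Theorem 3.6. There exists an approximation algorithm with the approximation factor $1/2$ and time complexity $O(n \log^3 n)$ for finding hotspots of orthogonal trajectories. (Apply the algorithm of Theorem 3.5 to $H$ and, after rotating the plane 90 degrees, to $V$, and return the heavier of the two squares; by Lemma 2.1, its weight is at least $\omega(T)/2$.)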
4 Concluding Remarks
Any algorithm for finding hotspots in two dimensions can be extended to find axis-parallel cube hotspots of fixed side length for orthogonal trajectories in $\mathbb{R}^3$. We extend the definitions and notation presented in Section 2 to $\mathbb{R}^3$. The weight of a cube $C$ with side length $\ell$ with respect to a trajectory $T$ in $\mathbb{R}^3$ is the total duration in which the entity spends inside it; we denote it as $w_T(C)$, as before. A hotspot of a trajectory $T$ in $\mathbb{R}^3$ is an axis-parallel cube (i.e. a cube whose faces are parallel to the planes defined by pairs of the axes of the coordinate system) of fixed side length $\ell$ with the maximum weight, $\omega(T)$.
Let $e$ be an edge parallel to the $x$-axis and let $C$ be an axis-parallel cube. Exactly two faces of $C$ are parallel to the $yz$-plane, $F_1$ and $F_2$, with $F_1$ appearing first (in the positive direction of the $x$-axis). The entering rate of $e$ with respect to $C$, denoted as $\mathrm{in}_C(e)$, is the rate at which the contribution of $e$ to the weight of $C$ increases if $F_2$ is moved to the right (in the positive direction of the $x$-axis). Similarly, the leaving rate of $e$ with respect to $C$, denoted as $\mathrm{out}_C(e)$, is the rate at which the contribution of $e$ to the weight of $C$ decreases if $F_1$ is moved to the right. As in the 2-dimensional case, $\mathrm{in}_C(e)$ and $\mathrm{out}_C(e)$ are either zero or the ratio of the duration of $e$ to its length, which we denote as $r(e)$. We define $\mathrm{in}_C(T)$ (similarly $\mathrm{out}_C(T)$) for an orthogonal trajectory $T$ as the sum of the entering (leaving) rates of all edges of $T$ that are parallel to the $x$-axis. The following lemma extends Lemma 2.2 to three dimensions.
Lemma 4.1. Let $T$ be a trajectory in $\mathbb{R}^3$ with axis-parallel edges. For any axis-parallel cube $C$, there is a cube $C'$ with at least the same weight, such that a vertex of $T$ is on one of the two planes formed by extending the $yz$-parallel faces of $C'$.
Therefore, to find a hotspot of $T$, it suffices to search among the cubes with a vertex of $T$ on one of the planes containing their $yz$-parallel faces. This observation suggests Theorem 4.2.
Theorem 4.2. Suppose algorithm $A$ can find a $c$-approximate hotspot of any trajectory in $\mathbb{R}^2$ containing $n$ axis-aligned edges with the time complexity $f(n)$. For a trajectory $T$ in $\mathbb{R}^3$ with $n$ edges, all of which are axis-aligned, there exists an algorithm with the time complexity $O(n \cdot f(n))$ and approximation factor $c$ to find an axis-aligned cube hotspot of $T$.
Proof. For each vertex $v$ of $T$, let $x_v$ be the $x$-coordinate of $v$. Project all edges that are (possibly partially) between the planes $x = x_v$ and $x = x_v + \ell$ onto the $yz$-plane to obtain an orthogonal 2-dimensional trajectory $T_v$. Edges parallel to the $x$-axis are projected to edges with length zero, whose weights denote the duration of their portion between $x = x_v$ and $x = x_v + \ell$. Perform algorithm $A$ on $T_v$ to obtain a square $S_v$. Let $C_v$ be the cube $[x_v, x_v + \ell] \times S_v$, i.e. the cube with one face on $S_v$ placed in the plane $x = x_v$. It is not difficult to see that $w_T(C_v)$ is equal to $w_{T_v}(S_v)$. Record $C_v$ if it has the maximum weight so far. Repeat the preceding steps after reversing the direction of the $x$-axis, to find cubes whose other $yz$-parallel face contains a vertex of $T$. Return the cube with the maximum weight. Lemma 4.1 implies that this cube is a $c$-approximate hotspot of $T$. ∎
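The projection step in the proof of Theorem 4.2 can be sketched in Python as follows, assuming that the slab is taken along the $x$-axis and that 3-dimensional edges are given as hypothetical tuples `(start, end, duration)` describing axis-parallel segments of positive length. Edges parallel to the $x$-axis become zero-length edges carrying the duration of their part inside the slab $x_v \le x \le x_v + \ell$; other edges inside the slab keep their full duration. Any algorithm $A$ applied to the output must treat zero-length edges as stationary points; `project_slab` is an illustrative name.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Edge3 = Tuple[Point3, Point3, float]               # hypothetical: (start, end, duration)
Edge2 = Tuple[float, float, float, float, float]   # (y1, z1, y2, z2, duration)


def project_slab(edges: List[Edge3], x0: float, side: float) -> List[Edge2]:
    """Project the part of an axis-parallel 3-D trajectory lying in the slab
    x0 <= x <= x0 + side onto the yz-plane."""
    x1 = x0 + side
    out: List[Edge2] = []
    for (ax, ay, az), (bx, by, bz), dur in edges:
        if ax != bx:                      # edge parallel to the x-axis
            lo, hi = min(ax, bx), max(ax, bx)
            inside = max(0.0, min(hi, x1) - max(lo, x0))
            if inside > 0:                # zero-length edge weighted by time in the slab
                out.append((ay, az, ay, az, dur * inside / (hi - lo)))
        elif x0 <= ax <= x1:              # parallel to the y- or z-axis and in the slab
            out.append((ay, az, by, bz, dur))
    return out
```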
Corollary 4.3. A $1/2$-approximate cube hotspot of a three-dimensional orthogonal trajectory can be found in $O(n^2 \log^3 n)$ time.
The author wishes to thank Marc van Kreveld for his valuable comments on an earlier version of this paper.
[1] Y. Zheng. Trajectory data mining - an overview. ACM Transactions on Intelligent Systems and Technology, 6(3):29:1–29:41, 2015.
[2] M. Benkert, B. Djordjevic, J. Gudmundsson, and T. Wolle. Finding popular places. International Journal of Computational Geometry and Applications, 20(1):19–42, 2010.
[3] M. Buchin, A. Driemel, M. J. van Kreveld, and V. Sacristán. Segmenting trajectories - a framework and algorithms using spatiotemporal criteria. Journal of Spatial Information Science, 3(1):33–63, 2011.
[4] B. Aronov, A. Driemel, M. J. van Kreveld, M. Löffler, and F. Staals. Segmentation of trajectories on nonmonotone criteria. ACM Transactions on Algorithms, 12(2):26:1–26:28, 2016.
[5] J. Gudmundsson, M. J. van Kreveld, and F. Staals. Algorithms for hotspot computation on trajectory data. In SIGSPATIAL/GIS, pages 134–143, 2013.
[6] L. O. Alvares, V. Bogorny, B. Kuijpers, J. A. F. de Macêdo, B. Moelans, and A. A. Vaisman. A model for enriching trajectories with semantic geographical information. In ACM International Symposium on Geographic Information Systems, page 22. ACM, 2007.
[7] S. Tiwari and S. Kaushik. Mining popular places in a geo-spatial region based on GPS data using semantic information. In Workshop on Databases in Networked Information Systems, pages 262–276. Springer, 2013.
[8] J. Basch, L. J. Guibas, and J. Hershberger. Data structures for mobile data. Journal of Algorithms, 31(1):1–28, 1999.
[9] H. J. Miller. Modelling accessibility using space-time prism concepts within geographical information systems. International Journal of Geographical Information Science, 5(3):287–301, 1991.
[10] P. M. Fenwick. A new data structure for cumulative probability tables - an improved frequency-to-symbol algorithm. Software, Practice and Experience, 26(4):489–490, 1996.
[11] P. K. Agarwal, H. Kaplan, and M. Sharir. Kinetic and dynamic data structures for closest pair and all nearest neighbors. ACM Transactions on Algorithms, 5(1):4:1–4:37, 2008.
[12] M. de Berg, O. Cheong, M. J. van Kreveld, and M. H. Overmars. Computational geometry - algorithms and applications. Springer, third edition, 2008.