Dynamic Products of Ranks

07/16/2020
by David Eppstein, et al.
University of California, Irvine

We describe a data structure that can maintain a dynamic set of points given by their Cartesian coordinates, and maintain the point whose product of ranks within the two coordinate orderings is minimum or maximum, in time O(√(n log n)) per update.


1 Introduction

The rank of an element in a collection of elements is its position in a list of all elements, sorted by some associated numerical value. If elements have a multidimensional vector of values associated with them, then each of these values gives rise to a different rank, and we may wish to aggregate these multiple ranks into a single combined score. One common method of aggregating ranks is to use the geometric mean or, equivalently, the product of ranks as the combined score. This method is used in applications ranging from finding differentially regulated genes in DNA microarray data [2], choosing winners in multi-discipline sports events [7], and measuring the scholarly output of economists [10] to image recognition [8] and spam filtering in web search engines [6].
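As a concrete illustration, the following sketch (with made-up scores, and assuming all values in each coordinate are distinct so that ranks are well defined) computes the product of ranks for each item:

```python
# Illustration with made-up scores; assumes distinct values in each
# coordinate, so ranks are well defined.
def rank_products(points):
    """Product of each point's ranks in the two coordinate orderings,
    with ranks counted from 1."""
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    return [(xs.index(x) + 1) * (ys.index(y) + 1) for x, y in points]

points = [(3.2, 10.0), (1.5, 40.0), (2.7, 20.0)]
print(rank_products(points))                # [3, 3, 4]
```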

In many of these applications, it is natural for the elements in the collection and their associated numerical values to change dynamically, and when they do the whole system of ranks for other elements may change. For instance, inserting one new element, with a low numerical value, will increase the ranks of all elements with larger values. This raises the question: how can we update the elements and their numerical values, and maintain information about the product of ranks?

We can model this as a geometry problem, in which the elements in the collection are modeled as points in the Cartesian plane, with the x- and y-coordinates of these points representing their associated numerical values. In this model, we would like to maintain a dynamic set of pairs of real numbers, subject to point insertion and point deletion, and as we do so, maintain dynamically the point whose product of ranks in the two coordinate orderings is minimum or maximum.
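For intuition, here is a minimal sketch of the trivial dynamic solution, which recomputes both rankings from scratch after every update; the class name and representation are our illustrative choices, not the paper's. The point of the data structure developed in this paper is to beat its O(n log n) per-update cost.

```python
# Trivial baseline: recompute both rankings after every update.
# Takes O(n log n) time per update; assumes distinct coordinates.
class NaiveRankProducts:
    def __init__(self):
        self.points = []

    def insert(self, p):
        self.points.append(p)

    def delete(self, p):
        self.points.remove(p)

    def extremes(self):
        """Points with the minimum and maximum product of ranks."""
        xrank = {x: i + 1 for i, x in
                 enumerate(sorted(p[0] for p in self.points))}
        yrank = {y: i + 1 for i, y in
                 enumerate(sorted(p[1] for p in self.points))}
        score = lambda p: xrank[p[0]] * yrank[p[1]]
        return min(self.points, key=score), max(self.points, key=score)
```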

In this work we solve the dynamic product of ranks problem, in the special case when there are two rankings being combined, in time O(√(n log n)) per update.

There are three main ideas to our method:

  • We partition the points into rigid subsets: sets of points whose ranks all change in lockstep with each operation (that is, without changing the difference between the ranks of any two elements in the set). Our partition will have the property that each update will rebuild rigid subsets of total size O(√(n log n)) and search for the point with minimum or maximum product of ranks within O(√(n/log n)) of these subsets.

  • We provide two solutions to the dynamic product of ranks problem within each rigid subset. One solution applies a lifting transformation (to the pairs of ranks of the points, not their given coordinates) to turn it into a problem of querying a (static) three-dimensional convex hull. Dually, the other solution uses analogues of the classical Voronoi diagram and farthest-point Voronoi diagram, minimization and maximization diagrams with convex-polygon cells.

  • We provide linear-time constructions for the lifted convex hull in the minimization version of the problem, and for the maximization diagram in the maximization version of the problem, adapted from two different algorithms for linear-time construction of Voronoi diagrams of points in convex position.

Our method can be generalized to larger numbers of rankings, but with a quadratic blowup in the dimension of the lifting transformation that (together with the high complexity of higher-dimensional extreme point queries) leads to a running time per update that is only slightly smaller than that of the trivial naive algorithm, which updates all rankings and recomputes all products in linear time per update. For this reason, we restrict our attention to maintaining information about the product of two rankings.

2 Rigid subsets

2.1 Lifted hull

We say that a subset S of elements in our product of ranks problem is rigid, through a sequence of updates, if none of the updates performs an insertion or deletion of an element of S, or of another element whose position in either of the two rankings lies between two elements of S. Equivalently, the difference in ranks of any two elements of S remains invariant throughout the given sequence of updates.

Lemma

Let S be any subset of elements in the product of ranks problem, of size s. Then in time O(s log s) we can build a data structure for S such that, throughout any sequence of updates for which S is rigid, we can compute the elements of S with the minimum or maximum product of ranks in time O(log s) per update.

Let (x_i, y_i) be the ranks of the elements of S prior to the sequence of updates for which S is rigid. We construct the three-dimensional convex hull of the lifted points (x_i, y_i, x_i y_i), and a Dobkin–Kirkpatrick hierarchy allowing us to perform linear optimization queries (finding the extreme point on the resulting hull of a given linear function) in time O(log s) per query [5]. The hull takes O(s log s) time to construct and its Dobkin–Kirkpatrick hierarchy takes an additional O(s) time. For each element, let z_i = x_i y_i denote its third coordinate in this lifted point set.

After a sequence of updates that have changed the ranks by subtracting the same offset a from each x-rank and the same offset b from each y-rank within S, the updated products of ranks are

(x_i - a)(y_i - b) = z_i - b x_i - a y_i + ab,

a linear function of the three coordinates of the lifted points, so the elements with the minimum and maximum product of ranks can be found by a linear optimization query.

This method is closely analogous to the classical lifting transformation of two-dimensional closest-point problems to three-dimensional extreme-point problems [3], which in its most commonly used form maps pairs (x, y) to triples (x, y, x^2 + y^2); however, we use a different quadratic function for the third coordinate. Note that we will only query this structure for pairs (a, b) with a ≤ x_i and b ≤ y_i, because the differences x_i - a and y_i - b represent ranks and are therefore non-negative.
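The identity above can be checked directly. In the sketch below (an illustration, not the paper's implementation), a brute-force scan over the lifted triples stands in for the Dobkin–Kirkpatrick hierarchy query:

```python
# With z = x*y, (x - a)(y - b) = z - b*x - a*y + a*b, which is linear
# in the lifted coordinates (x, y, z).  Brute force replaces the
# O(log s) hierarchy query of the actual data structure.
def lift(ranks):
    return [(x, y, x * y) for x, y in ranks]

def min_product(lifted, a, b):
    # a*b is constant over the set, so it can be dropped when minimizing
    return min(lifted, key=lambda t: t[2] - b * t[0] - a * t[1])

ranks = [(3, 7), (5, 2), (8, 6)]
x, y, z = min_product(lift(ranks), a=1, b=1)
assert (x - 1) * (y - 1) == min((u - 1) * (v - 1) for u, v in ranks)
```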

2.2 Linear time construction

To construct the lifted hull more quickly, it is helpful to reduce the set of points to a subset whose projection to the plane is in convex position.

Figure 1: The minimizer of (x - a)(y - b) must be a convex hull vertex, because the region below the line ℓ, the tangent to the hyperbola through p, must be disjoint from S (subsection 2.2). Analogously, the maximizer of (x - a)(y - b) must be a maximal point of S, because the region above and to the right of it (yellow) must be disjoint from S (subsection 2.4).
Lemma

Let S be a set of points, and let (a, b) be a pair of numbers with a less than or equal to all x-coordinates in S and b less than or equal to all y-coordinates in S. Let p be the point in S minimizing (x_p - a)(y_p - b). Then p lies on the convex hull of S.

The locus of points (x, y) with (x - a)(y - b) = (x_p - a)(y_p - b) is a hyperbola, asymptotic to the lines x = a and y = b, with p on its positive branch. Let ℓ be the line tangent to this hyperbola at p; see Figure 1. Then the halfplane below ℓ must be disjoint from S, for any point between ℓ and the other branch of the hyperbola would have a smaller value of (x - a)(y - b), and by the assumptions on a and b there are no points of S on the other side of the other branch of the hyperbola.

Aggarwal et al. [1] showed that, for 3d points whose two-dimensional projection is in convex position, the 3d convex hull can be constructed in linear time. In the next lemma we apply this result to the lifted hull of subsection 2.1.

Lemma

Let S be any subset of elements in the product of ranks problem, of size s, for which the sorted order by x-coordinate is known. Then in time O(s) we can build a data structure for S such that, throughout any sequence of updates for which S is rigid, we can compute the elements of S with the minimum product of ranks in time O(log s) per update.

We use Graham scan to compute the 2d convex hull from the sorted order of points in linear time, and the algorithm of Aggarwal et al. [1] to compute the 3d convex hull from the 2d convex hull in linear time. The Dobkin–Kirkpatrick hierarchy construction time is also linear.

2.3 Maximization diagram

Instead of lifting the points to the convex hull of the three-dimensional points (x_i, y_i, x_i y_i), an alternative representation for each rigid subset would be to represent it by the minimization diagram or maximization diagram of the functions f_i(a, b) = (x_i - a)(y_i - b). Then, the minimum or maximum product of ranks for the rigid subset with rank offsets a and b could be obtained by performing a point location query in this diagram, rather than by performing an extreme-point query on a three-dimensional polyhedron.

Because the quadratic term ab in the definition of the function f_i does not depend on the point i, and is equal for all points, it does not affect the minimization or maximization: we obtain the same minimization or maximization diagrams for the linear functions g_i(a, b) = x_i y_i - b x_i - a y_i. As the minimization or maximization diagram of linear functions, these diagrams have convex polygon cells, separated by bisector lines, the lines consisting of the points at which two of these functions are equal.

Figure 2: The bisector between two sites in the minimization diagram is the line through the other two corners of their bounding box.
Lemma

The bisector of any two given points (sites) (x_i, y_i) and (x_j, y_j) in the minimization or maximization diagram described above is a line that passes through the other two corners, (x_i, y_j) and (x_j, y_i), of the bounding box of the two points (Figure 2).

When the bounding box is a square, this follows by symmetry: a reflection through the line described in the lemma maps the two given points to each other, swapping the two Cartesian coordinates, so for any point (a, b) on the line described in the lemma, the coordinate differences between (a, b) and the two given points are equal but reversed. That is, x_i - a = y_j - b and y_i - b = x_j - a. Since the quantity being minimized is the product of these coordinate differences, it is equal for the two given points along this line.

For any other two points, not both on the same vertical or horizontal line, we may apply a linear transformation to one of the coordinates that makes the bounding box a square; this transformation affects both of the functions f_i and f_j in the same way, so the bisector of the transformed points (the diagonal of the square) is the transformation of the bisector, which must therefore be the diagonal of the original bounding box. The remaining case, that the points are on a horizontal or vertical line, follows by continuity.

Figure 3: The maximization diagram for a given set of maximal points. Although this diagram is well-defined over the whole plane, we will only query it within the bottom-left quadrant, below and to the left of all the given points.

Figure 3 depicts an example of the maximization diagram described above.

2.4 Expected linear time construction

These diagrams can be constructed in O(s log s) time, either by interpreting them as a lower or upper envelope of three-dimensional planes (the graphs of the functions they minimize or maximize) or by using algorithms for abstract Voronoi diagrams with bisectors determined as in subsection 2.3 [9]. However, as we now show, they can be constructed in expected linear time.

Our construction begins with the following analogue of subsection 2.2. We observe that, in constructing the maximization diagram for a collection of points, we need only include the points that are maximal (meaning, for a point p, that there is no other point q with x_q ≥ x_p and y_q ≥ y_p), for those are the only ones that can produce the maximum of the function values f_i(a, b) at any query point (a, b).

Lemma

Let S be a set of points, and let (a, b) be a pair of numbers with a less than or equal to all x-coordinates in S and b less than or equal to all y-coordinates in S. Let p be the point in S maximizing (x_p - a)(y_p - b). Then p is one of the maximal points of S, meaning that there is no other point q in S with x_q ≥ x_p and y_q ≥ y_p.

Any such point q would have a larger value of (x_q - a)(y_q - b).

The quarter-plane of points with larger x- and y-coordinates than p, and its relation to the hyperbola of points with query values equal to that of p, is shown in Figure 1.

To construct the maximization diagram in expected linear time we adapt an algorithm by Paul Chew for Voronoi diagrams of convex polygons [4].

Lemma

Let S be a set of points, all of which are maximal in S, indexed in sorted order by their x-coordinates, and let p_i and p_{i+1} be consecutive points in this ordering. Then in the maximization diagram for S, the cells for these two points share an edge.

Within the bounding rectangle of p_i and p_{i+1}, the point p_i has a larger query value than all points of S with smaller index, and the point p_{i+1} has a larger query value than all points of S with larger index, so the maximization diagram within the rectangle consists only of points in the cells for these two points. By subsection 2.3 the cells meet within the rectangle along the bisector of these two points, which is the diagonal of the rectangle.

The shared edge is not in a part of the diagram that we will query in our data structure for products of ranks, but its location is unimportant for the use we will make of it in the following lemma.

Lemma

Let S be any subset of elements in the product of ranks problem, of size s, for which the sorted order by x-coordinate is known. Then in randomized expected time O(s) we can build a data structure for S such that, throughout any sequence of updates for which S is rigid, we can compute the elements of S with the maximum product of ranks in time O(log s) per update.

The maximal points in S can be found in linear time from the sorted order by x-coordinates, using a stack algorithm closely related to Graham scan.
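A minimal sketch of such a stack-based scan (our illustration, assuming distinct coordinates):

```python
# Given points sorted by x-coordinate, keep only the maximal points:
# those with no other point having both larger x and larger y.
def maximal_points(points_sorted_by_x):
    stack = []
    for p in points_sorted_by_x:
        # p has the largest x seen so far, so it dominates any earlier
        # point whose y-coordinate is at most p's
        while stack and stack[-1][1] <= p[1]:
            stack.pop()
        stack.append(p)
    return stack  # in increasing-x (hence decreasing-y) order

print(maximal_points([(1, 5), (2, 3), (3, 4), (4, 1)]))
# [(1, 5), (3, 4), (4, 1)]
```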

We construct the maximization diagram by a randomized incremental algorithm in which we randomly permute the points and add them to the diagram one at a time in that random order. By the analysis of Chew [4], this can be done in expected constant time per point as long as we know the identity of a neighboring cell in the diagram of the points added so far. We can form a random permutation with this additional information about neighboring cells by starting with a doubly linked list of all of the points, in x-coordinate order, deleting randomly chosen points from the linked list until none are left, and then reversing the order of the deletions. By subsection 2.4, the neighbors of a point in the linked list at the time of its deletion will form neighboring cells in the maximization diagram at the time of its insertion.
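The following sketch (our own helper, with assumed names) illustrates the deletion-then-reversal trick for producing a random insertion order in which every point except the first comes with a neighbor that has already been inserted:

```python
import random

# Delete random points from a doubly linked list in x-order, recording
# a surviving list neighbor for each deletion; reversing the deletions
# gives an insertion order in which that neighbor is already present.
def insertion_order_with_neighbors(points_sorted_by_x):
    n = len(points_sorted_by_x)
    prev = list(range(-1, n - 1))        # linked list over indices
    nxt = list(range(1, n + 1))
    remaining = list(range(n))
    random.shuffle(remaining)
    order = []
    for i in remaining[:-1]:             # delete all but one point
        neighbor = prev[i] if prev[i] >= 0 else nxt[i]
        order.append((i, neighbor))      # neighbor at deletion time
        if prev[i] >= 0:
            nxt[prev[i]] = nxt[i]
        if nxt[i] < n:
            prev[nxt[i]] = prev[i]
    order.append((remaining[-1], None))  # last survivor seeds the diagram
    order.reverse()                      # reversed deletions = insertions
    return order                         # (point index, inserted neighbor)
```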

Because it is the maximization diagram of a set of linear functions, we can interpret this diagram as a three-dimensional intersection of halfspaces, and construct a Dobkin–Kirkpatrick hierarchy from it in linear time, suitable for performing point location queries in logarithmic time. (Alternatively, the history DAG of a vertical decomposition of the randomized incremental maximization diagram construction can be used as a point location data structure with logarithmic expected time per query.)

3 Partitioned data structure

3.1 One-dimensional partition

To partition our given elements into rigid subsets, we first consider a one-dimensional partition method, which we will apply separately to the two rankings of the elements.

Lemma

Let f be any positive concave function of a single argument. Then for any sequence S of ordered values undergoing insertions and deletions, we can maintain a partition of S into an ordered sequence of contiguous subsets, with O(f(n)) elements in each subset, changing O(1) subsets per update, using a data structure with time O(log n) per update, where n denotes the current size of S.

We use binary search trees to keep track of the sequence of elements and the sequence of subsets. As keys for the binary search tree of subsets, we use the values of their first elements. In this way we can find the subset containing the updated element, after any update, and determine the new size of this subset. We also keep track of the sizes of each subset and maintain priority queues for the largest subsets and for the smallest consecutive pairs of subsets.

We maintain as invariants the requirements that the sizes of all subsets in the partition are at most 2f(n), and that no two consecutive subsets both have size less than f(n). We say that our structure is growing if, for the most recent update having a different value of ⌈f(n)⌉, that value was smaller than the current value, and shrinking otherwise. If the structure is growing, we require only that all subsets have size at most 2f(n), and if it is shrinking, we require only that no two consecutive subsets both have size less than f(n).

On each update, if the structure is growing, we select an arbitrary pair of consecutive subsets that both have size less than f(n) (if such a pair exists) and merge them into a single subset. If the structure is shrinking, we select an arbitrary subset of size greater than 2f(n) (if such a subset exists) and split it into two subsets of size as close to equal as possible. We claim that this is sufficient to maintain our invariants. Clearly, it does so at the updates for which ⌈f(n)⌉ does not change, so we need only consider the steps at which it does change.

In the case that ⌈f(n)⌉ changes in such a way that the structure was growing before the update and is shrinking after the update, the invariants are automatically maintained, because the ranges of sizes of subsets and consecutive pairs of subsets that are allowed remain unchanged. The same is true when the structure was shrinking before the update and growing after the update.

When ⌈f(n)⌉ increases twice in a row (so that it was growing both before and after the second increase), let n_0 be the value of n at the first increase. Then at that time, there must be at most n_0/f(n_0) consecutive pairs of small subsets, and (by concavity of f) at least n_0/f(n_0) steps between the two increases. It only takes n_0/f(n_0) steps, one merge per step, to eliminate all of the consecutive pairs of small subsets. So by the time that the second increase happens, all of the consecutive pairs of small subsets will have been eliminated, maintaining the invariant.

Similarly, when ⌈f(n)⌉ decreases twice in a row (so that it was shrinking both before and after the second decrease), let n_0 be the value of n at the first decrease. Then at that time, there must be at most n_0/f(n_0) large subsets, and (by concavity of f) at least n_0/f(n_0) steps between the two decreases. It only takes n_0/f(n_0) steps, one split per step, to eliminate all of the large subsets. So by the time that the second decrease happens, all of the large subsets will have been eliminated, maintaining the invariant.
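The following simplified sketch shows the repair rule in code. The thresholds f(n) and 2f(n), the representation of blocks as a plain list of sizes, and the function names are our illustrative choices, not the paper's; the paper maintains priority queues to locate the offending subsets in O(log n) time, whereas the linear scans here are for clarity only.

```python
import math

# blocks: sizes of the contiguous subsets, in order; f: the concave
# parameter function.  One merge or split per update keeps subset
# sizes O(f(n)) without too many consecutive small subsets.
def repair(blocks, n, growing, f=lambda m: math.sqrt(m * math.log(m + 2))):
    fn = f(n)
    if growing:
        for i in range(len(blocks) - 1):
            if blocks[i] < fn and blocks[i + 1] < fn:
                # merge one consecutive pair of small subsets
                blocks[i:i + 2] = [blocks[i] + blocks[i + 1]]
                break
    else:
        for i, size in enumerate(blocks):
            if size > 2 * fn:
                # split one oversized subset into near-equal halves
                blocks[i:i + 1] = [size // 2, size - size // 2]
                break
    return blocks
```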

3.2 Two-dimensional partition

We now use our one-dimensional rank partition to partition the given elements into subsets, most of which remain rigid in each update. If the ranks of an element are (x, y), we will maintain one rank partition on the ranks x, and a second rank partition on the ranks y, each with the parameter f(n) = √(n log n). Then each subset of our two-dimensional partition will consist of elements that are grouped together both in the partition on the x-ranks and in the partition on the y-ranks.

Lemma

The partition into subsets described above has the following properties:

  • There are O(n/log n) subsets.

  • Each update to the data causes O(√(n/log n)) of the subsets, with total size O(√(n log n)), to be non-rigid.

  • Each update to the data causes O(√(n/log n)) of the subsets, with total size O(√(n log n)), to be replaced by new subsets due to the change in the underlying one-dimensional partitions.

It follows from subsection 3.1 and our choice of the function f(n) = √(n log n) that each one-dimensional partition has O(√(n/log n)) subsets, of size O(√(n log n)), and that each update causes O(1) changes to the one-dimensional partition. Because each subset in the two-dimensional partition is determined by a pair of subsets in the two one-dimensional partitions, there are O(n/log n) subsets in the two-dimensional partition.

In any update, only one subset of each one-dimensional partition contains non-rigid subsets of the two-dimensional partition. Therefore, the total number of non-rigid subsets is at most twice the number of two-dimensional subsets that can be contained in a single one-dimensional subset, O(√(n/log n)), and the total size of the non-rigid subsets is at most twice the size of a one-dimensional subset, O(√(n log n)). The analysis of the number of subsets that are replaced with new subsets and their total size is similar: each change to a one-dimensional subset causes changes to O(√(n/log n)) two-dimensional subsets having a total of O(√(n log n)) elements, so the bounds on replaced subsets follow from the fact that each update causes O(1) changes to the one-dimensional partitions.
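As a sketch of one possible representation (our own illustration, with assumed helper names), the two-dimensional cells can be stored in a dictionary keyed by the pair of one-dimensional block indices:

```python
from collections import defaultdict

# x_block_of / y_block_of: map each element to the index of its block
# in the corresponding one-dimensional rank partition.
def two_dimensional_cells(elements, x_block_of, y_block_of):
    cells = defaultdict(list)
    for e in elements:
        cells[(x_block_of[e], y_block_of[e])].append(e)
    return cells  # (x-block, y-block) pair -> one rigid subset
```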

4 Which subsets to query?

We introduced subsection 2.2 and subsection 2.4 to aid in the efficient construction of rigid subsets, but they can also be used to reduce the number of rigid subsets that we must query after any update. As these two lemmas show, the point with the smallest product of ranks must be minimal in the coordinate ordering of the points, and the point with the largest product of ranks must be maximal. The two-dimensional partition of subsection 3.2 partitions the points in a grid pattern, and we need only query the rigid subsets for cells in this grid that can contain minimal or maximal points.

Figure 4: A grid partition of a point set, and a path (yellow shading) through the cells of the grid, such that the cells of the path contain all minimal points of the set (shown in red).
Lemma

Let a given set of points be partitioned by axis-parallel lines into a grid of cells, represented in such a way that in constant time we can find the neighboring cell in any direction from any given cell and find the lowest nonempty cell in any column of the grid. Then in time O(k), where k is the total number of rows and columns of the grid, we can identify a subset of O(k) of the grid cells that contain all of the minimal points in the set.

As we describe below, we select cells in the grid along a path from top left to bottom right, such that every unselected cell below the path is also below the lowest nonempty cell in its column, and every unselected cell above the path has a nonempty selected cell below and to the left of it. In this way, every minimal point of the given point set belongs to a selected cell, for there can be no points below and to the left of the path, and all points above and to the right are not minimal. Figure 4 shows an example.

To find this path of grid cells, we begin at the top left cell of the grid. Then we repeatedly step to a neighboring cell, according to the following rules:

  • If the current cell is the bottom right cell of the grid, we terminate the path.

  • If the current cell is not the lowest nonempty cell in its column, or if it belongs to the rightmost column, we step to the next cell down.

  • Otherwise, we step to the next cell to the right.

The path must extend across all columns, for it can only stop in the rightmost column. If a cell is below the path, it must also be below the lowest nonempty cell in its column, or we would have stepped downward to it when the path crossed its column; therefore, all cells below the path are empty. If a cell is above the path, then the path must have stepped below it in some column to the left of it, which can only happen when the lowest nonempty cell in that column is below and to the left of the given cell. Therefore, all cells above the path have a nonempty cell below and to the left of them.
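A sketch of this path construction follows (our illustration; for simplicity it precomputes the lowest nonempty cell of each column by scanning the whole grid, whereas the lemma assumes constant-time access to this information):

```python
# nonempty[i][j]: does the cell in row i, column j contain any point?
# Rows are numbered downward from 0 at the top, columns from the left.
def staircase_cells(nonempty):
    rows, cols = len(nonempty), len(nonempty[0])
    lowest = [max((i for i in range(rows) if nonempty[i][j]), default=-1)
              for j in range(cols)]        # lowest nonempty row per column
    path, i, j = [], 0, 0                  # start at the top-left cell
    while True:
        path.append((i, j))
        if (i, j) == (rows - 1, cols - 1): # bottom-right cell: terminate
            return path
        if i < lowest[j] or j == cols - 1: # above a nonempty cell in this
            i += 1                         # column, or rightmost: go down
        else:
            j += 1                         # otherwise go right
```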

A similar method, with the ability to find the highest nonempty cell in each column, can find a path of grid cells containing all maximal points.

5 Overall data structure

Our overall data structure consists of:

  • Two binary search trees on the two coordinate values of the elements, augmented to allow the rank of any element at any step of the update sequence to be looked up in logarithmic time per query.

  • Two one-dimensional partitions of the elements, one for each of the two rankings of the elements, according to subsection 3.1, with the parameter choice specified for subsection 3.2.

  • The two-dimensional partition of the elements into rigid subsets defined from these one-dimensional partitions, according to subsection 3.2.

  • A graph describing the relation between neighboring cells in this two-dimensional partition, and the lowest or highest nonempty cell in each column of cells, suitable for use in section 4.

  • A sorted list of points in each partition set, sorted by their x-coordinates.

  • A data structure for maintaining the extreme points for the product of ranks of each subset, through updates for which it is rigid, according to subsection 2.1.

The data structure described above can maintain the minimum or maximum product of ranks in time O(√(n log n)) per update, deterministically for the minimum, or in the same time bound in expectation for the maximum.

By subsection 3.2, each update causes changes to O(√(n/log n)) subsets of total size O(√(n log n)); by subsection 2.2 and subsection 2.4, reconstructing the extreme-point data structures for these subsets takes the stated time per update. After each update, we may use section 4 to find a subset of O(√(n/log n)) of the subsets to query, use the binary search trees to determine the offsets in rank for each of these selected subsets, and then query the extreme point within each subset in time O(log n) by subsection 2.1. The total time for these queries is again the stated time per update. Maintaining the binary search trees and one-dimensional partitions takes an amount of time that is negligible with respect to this total time bound.

References

  • [1] Alok Aggarwal, Leonidas J. Guibas, James Saxe, and Peter W. Shor. A linear-time algorithm for computing the Voronoi diagram of a convex polygon. Discrete & Computational Geometry, 4(6):591–604, 1989. doi:10.1007/BF02187749.
  • [2] Rainer Breitling, Patrick Armengaud, Anna Amtmann, and Pawel Herzyk. Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Letters, 573(1-3):83–92, 2004. doi:10.1016/j.febslet.2004.07.055.
  • [3] Kevin Q. Brown. Voronoi diagrams from convex hulls. Information Processing Letters, 9(5):223–228, December 1979. doi:10.1016/0020-0190(79)90074-7.
  • [4] L. Paul Chew. Building Voronoi diagrams for convex polygons in linear expected time. Technical Report PCS-TR90-147, Dartmouth College Department of Mathematics and Computer Science, 1990. URL: https://www.cs.dartmouth.edu/~trdata/reports/TR90-147.pdf.
  • [5] David P. Dobkin and David G. Kirkpatrick. A linear algorithm for determining the separation of convex polyhedra. Journal of Algorithms, 6(3):381–392, 1985. doi:10.1016/0196-6774(85)90007-0.
  • [6] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th International Conference on World Wide Web. ACM, 2001. doi:10.1145/371920.372165.
  • [7] International Federation of Sport Climbing. Rules 2019, March 2019. URL: https://www.ifsc-climbing.org/images/World_Competitions/IFSC-Rules_2019_v192_PUBLIC.pdf.
  • [8] Chao Li and A. Barreto. An integrated 3D face-expression recognition approach. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing. IEEE, 2006. doi:10.1109/icassp.2006.1660858.
  • [9] K. Mehlhorn, St. Meiser, and C. Ó Dúnlaing. On the construction of abstract Voronoi diagrams. Discrete & Computational Geometry, 6(3):211–224, 1991. doi:10.1007/BF02574686.
  • [10] Christian Zimmermann. Academic rankings with RePEc. Econometrics, 1(3):249–280, December 2013. doi:10.3390/econometrics1030249.