The set-maxima problem was first introduced by Graham et al. (1980), in the context of finding lower bounds for shortest path problems. It was shown at the time that the decision tree bound is weak. The general problem remains important and unsolved.
We define the problem. Let be a set of elements with an underlying total order. Let be a collection of distinct subsets of . Let . The set-maxima problem asks to determine the maxima of all the sets in the collection. Specifically, we are interested in determining the number of comparisons necessary and sufficient to solve the problem. In this model we assume that determining set memberships, computing unions and intersections of the sets, and similar operations are free. We shall use the term comparison complexity to indicate that we are only counting the number of comparisons between elements, and the term total complexity to indicate the overall running time. (To be clear, each will actually be a subset of , integer indices into , so an implementation would not involve comparisons in set operations.)
The best known lower bound for the problem under the comparison tree model is no better than the trivial bound of . This was proved in Graham et al. (1980) using the -uniqueness property. The best upper bound for the problem is a combination of several trivial upper bounds and is summarized as . The term comes from the following simple observation: if we sort the set , then without any further comparisons we can determine the maximum of each set by simply scanning the sorted list while doing membership queries. The second term is the result of the following procedure: add each element to the bucket (creating one if it does not exist) that represents the intersection of the sets the element belongs to. We have to create at most buckets, since they are mutually disjoint. Determine the maximum of each bucket; doing so takes at most comparisons. Next, for each of the sets, determine the collection of at most buckets with which the set has a non-empty intersection, and compute the maximum over these buckets. The second algorithm is only considered when .
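The two trivial procedures above can be sketched as follows. This is a minimal illustration, not code from the paper: the names `ComparisonCounter`, `maxima_by_sorting`, `maxima_by_buckets`, `keys` and `sets` are assumptions introduced here, each set is given as a Python set of integer indices into `keys`, and the keys are assumed to be distinct. Only key comparisons are counted; membership tests are treated as free, as in the model above.

```python
from functools import cmp_to_key


class ComparisonCounter:
    """Counts key comparisons; set-membership tests are not counted."""

    def __init__(self):
        self.count = 0

    def greater(self, a, b):
        self.count += 1
        return a > b


def maxima_by_sorting(keys, sets, counter):
    """Sort all elements once, then scan from the largest key downwards."""
    def compare(i, j):
        return 1 if counter.greater(keys[i], keys[j]) else -1

    order = sorted(range(len(keys)), key=cmp_to_key(compare), reverse=True)
    result = [None] * len(sets)
    for i in order:  # no key comparisons happen in this scan
        for j, s in enumerate(sets):
            if result[j] is None and i in s:
                result[j] = keys[i]
    return result


def maxima_by_buckets(keys, sets, counter):
    """Bucket elements by the exact collection of sets that contain them."""
    bucket_max = {}  # membership signature -> index of the bucket's maximum
    for i in range(len(keys)):
        signature = frozenset(j for j, s in enumerate(sets) if i in s)
        if not signature:
            continue  # the element belongs to no set
        if signature not in bucket_max:
            bucket_max[signature] = i
        elif counter.greater(keys[i], keys[bucket_max[signature]]):
            bucket_max[signature] = i
    result = []
    for j in range(len(sets)):
        best = None
        for signature, i in bucket_max.items():
            if j not in signature:
                continue
            if best is None or counter.greater(keys[i], keys[best]):
                best = i
        result.append(None if best is None else keys[best])
    return result
```

For example, with `keys = [5, 3, 9, 1, 7, 4]` and `sets = [{0, 1, 3}, {1, 2, 4}, {3, 5}, {0, 2, 5}]`, both functions return `[5, 9, 4, 9]`; they differ only in how many comparisons the counter records.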
2 Previous and Related Work
Komlós (1984) proposed an algorithm for a special case of the set-maxima problem motivated by Graham et al. For the minimum spanning tree verification problem, is the set of weighted edges in a tree and the collection consists of subsets of edges that join two non-adjacent vertices in the tree. Komlós's algorithm roots the tree arbitrarily and splits each path into a pair of paths toward the root. The algorithm makes comparisons.
Bar-Noy et al. (1992) gave the first general algorithm. Their “rank-sequence algorithm” determines a rank sequence according to the application domain. Specifically, a rank sequence is an ordered sequence of ranks . The corresponding partition of is computed. Each is reduced to just those elements in one block of the partition. When the elements are points and the sets are hyperplanes that form a projective space, the set maxima can be computed with a linear number of comparisons, for a suitable rank sequence. However, the rank-sequence algorithm is no better than the trivial algorithm above in the worst case. It was shown by Desper (1994) that for some collections of subsets there is no good rank sequence for which the number of comparisons made by the algorithm is linear.
Liberatore (1998) showed that this can be generalized using weighted matroids. One of the canonical examples of matroids is the graphic matroid. Generalized to binary matroids (since graphic matroids are also regular) this has been termed by Liberatore as the fundamental path maxima problem over such matroids. A cographic matroid is a dual of a graphic matroid. For a cographic matroid the problem can be solved in (Tarjan (1982)) comparisons. Liberatore generalized these results to a restricted class of matroids that can be constructed via direct-sums and 2-sums and gave a -comparison algorithm (Liberatore (1998)).
Goddard et al. (1993) proposed an algorithm that chooses a rank sequence randomly. They show that the expected number of comparisons in their algorithm is which is optimal according to the comparison tree complexity. The randomized algorithm can also solve the more general problem of computing the largest elements for each subset .
3 Our General Algorithm
There has only been one algorithm for the general set-maxima problem (Bar-Noy et al. (1992)), and our algorithm is incomparable to it. Ours is based on the structure of the subset lattice. One drawback is that our algorithm is oblivious. That is, the set of comparisons determined by the intersection lattice depends only on the set system and not on the results of prior comparisons. However, for the geometric case below we can still derive a non-trivial bound with only the knowledge of the subset structure.
We define a structure that is a sparse sub-lattice of the normal subset lattice. Each node of our lattice is identified/labeled with a subset , and it will be interpreted as an index set for a collection of subsets drawn from . Recall that the normal full subset lattice has layers and the top layer represents the empty set; we will discard this layer from . The layer contains sets whose cardinality is , and the bottom layer corresponds to . We will define next. A node will exist only when is not empty. We treat the first layer in a special manner: since the index sets for this layer are just singletons, we require that they remain in regardless. These are all the nodes of .
Each node has a . An element if and only if and for all , if , . The set of all s forms a disjoint partition of , so the number of non-empty s is bounded by . The number of nodes in is (including the nodes in the first layer). We can construct this lattice by iterating over the elements and determining, for each one, the intersection it belongs to. The time it takes to do this will depend on how the input is represented but cannot be more than . Additionally, this does not involve any key comparisons between the elements.
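The construction described above can be sketched as follows, under one reading of the definition: an element is assigned to the node whose index set is exactly the collection of sets containing that element. The function name `reduced_lattice_nodes` and the input convention (each set is a Python set of integer element indices) are assumptions made for this illustration only. No key comparisons are performed.

```python
def reduced_lattice_nodes(num_elements, sets):
    """Return (nodes, blocks) for the reduced lattice of a set system.

    blocks maps an index set (a frozenset of set indices) to the elements
    that lie in exactly those sets. nodes contains every index set with a
    non-empty block, plus all singletons of the first layer.
    """
    blocks = {}
    for x in range(num_elements):
        signature = frozenset(j for j, s in enumerate(sets) if x in s)
        if signature:  # elements that belong to no set are ignored
            blocks.setdefault(signature, []).append(x)
    # First-layer singletons are kept even when their block is empty.
    nodes = set(blocks) | {frozenset([j]) for j in range(len(sets))}
    return nodes, blocks
```

Because the blocks are pairwise disjoint, the number of nodes returned is at most the number of elements plus the number of sets.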
Further we define the parent relations in the reduced lattice ; if a path from node down to exists in the original lattice (so ) and now only nodes and from that path exist then node is the parent of . The cover for each is defined as the set of all the parents of in . A good-cover contains parents of such that the following holds:
Let be a good-cover of of minimum cardinality. Let be the set of all nodes in .
We can solve set-maxima with comparisons.
The value we need to compute for each is . Note that if contains more than just the maximum of the elements in , only the maximum will ever be needed by the computation. Hence we assume a preprocessing step in which each is reduced to its maximum element; this can be done in an overall total of comparisons. The key observation is that can be regarded as a directed acyclic graph with nodes on the first layer, with the edges directed from a parent to a child. The value computed for each is the maximum of the keys in the s for the nodes reachable from on the first layer. Given the s we define to be the subgraph of in which the set of parents of each node is just ; clearly reachability is unaffected by the change. Using standard graph algorithms we can process the graph bottom-up, where each node updates its to be the maximum of its key value and the s of all its children. The loop invariant is that each processed node knows the maximum of all of its reachable values. Clearly the element in is involved in comparisons, which proves the theorem. ∎
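The bottom-up propagation in the proof can be sketched as below. This is a simplified illustration and not the algorithm as analyzed: instead of a minimum good-cover, it links every node to all of its minimal strict supersets among the existing nodes, which preserves reachability but may use more comparisons than the bound in the theorem. The function name `set_maxima_via_lattice` and the input convention are assumptions introduced here.

```python
def set_maxima_via_lattice(keys, sets):
    """Return (maxima, comparisons) using bottom-up propagation in the DAG."""
    comparisons = 0
    # Preprocessing: reduce each block to its maximum key.
    block_max = {}
    for x, key in enumerate(keys):
        signature = frozenset(j for j, s in enumerate(sets) if x in s)
        if not signature:
            continue
        if signature not in block_max:
            block_max[signature] = key
        else:
            comparisons += 1
            if key > block_max[signature]:
                block_max[signature] = key
    nodes = set(block_max) | {frozenset([j]) for j in range(len(sets))}
    # Children of a node: its minimal strict supersets among existing nodes.
    children = {}
    for node in nodes:
        supersets = [other for other in nodes if node < other]
        children[node] = [c for c in supersets
                          if not any(d < c for d in supersets)]
    # Larger index sets lie lower in the lattice, so process them first.
    value = {}
    for node in sorted(nodes, key=len, reverse=True):
        current = block_max.get(node)  # None for an empty first-layer block
        for child in children[node]:
            if current is None:
                current = value[child]
            else:
                comparisons += 1
                if value[child] > current:
                    current = value[child]
        value[node] = current
    maxima = [value[frozenset([j])] for j in range(len(sets))]
    return maxima, comparisons
```

With `keys = [5, 3, 9, 1, 7, 4]` and `sets = [{0, 1, 3}, {1, 2, 4}, {3, 5}, {0, 2, 5}]`, the call returns the maxima `[5, 9, 4, 9]`. The comparison pattern is fixed by the set system alone, which reflects the oblivious nature of the approach.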
4 Convex Set-System
Work with rectangular queries (see next section) motivated us to look at the geometric setting. We also wanted to explore non-trivial set systems in which the algorithm in the previous section uses only a linear number of comparisons. (The earlier formulation by Bar-Noy et al. (1992) for the projective geometry case is based only on the -design structure of the set-system and does not use any geometric arguments.)
Elements of are identified with points on the Euclidean plane. We regard each element of as a key value associated with a point; point has key value . The sets of are now constrained to be convex polygons. The points that are in are all the points in a given convex polygon ; each point is either in the interior or on the perimeter of , so we can assume that each is the convex hull of . Our problem has a parameter ; we will assume that each has at most sides. Note that this does not restrict the cardinality of .
We show that we can determine all the maxima for the convex regions using only comparisons. The algorithmic framework is the same as in the previous section. Again we will use a lattice, but with convex polygons associated with each node. The nodes in the first layer (in the reduced lattice) will be associated with each . The other nodes correspond to various non-empty intersections of collections of s. Let . All polygons below will be convex. Let be the set of points from in not found in a where . As above, a node is in the reduced lattice if it is either on the first layer or contains at least one point from . We define a cover-set for the convex regions analogously. If is a good-cover for polygon then: 1) for all , 2) the region is empty (has no points) for all and , 3) and 4) is minimum among all such collections.
The rest of the algorithm is analogous to the non-geometric case. The number of nodes in the reduced lattice is linear. Again we reduce each to at most one point. It is still true that the running time depends on the sum of the sizes of the cover-sets, for the same reasons. The only thing that remains is to bound the quantity .
Assume . Let be a polygonal chain of successive edges of which are not (part of) some edge of . In Figure 1 we see two such polygonal chains: an upper chain and a lower chain . Let be the set of all such chains formed by the intersection of and . Note that and are not part of any chain. If we treat a chain as a set consisting of edges, then we can define the set operators on a pair of such chains.
If are in the cover-set of then for any chain and . If is in the cover-set of then since .
Let there be some and such that . Then from Figure 2 we see that there is a convex polygonal region such that . This contradicts our earlier observation that a pair of sets in a cover-set cannot intersect beyond the region that they are covers of. ∎
For the next lemma we need an additional combinatorial observation.
If is a set of elements and is a collection of subsets from each of which has size , then every set in contains some element from any subset of size of .
If the set system consists of polygons of at most sides then every polygon in the lattice has a cover-set of size at most .
Let be any (-gonal) region formed by the intersection of polygons, . Let these polygons be . For each polygon , at least of the sides of will be part of some polygonal chain (i.e., in ) of . This can be easily seen: since has at most sides and is convex, only edges of can also be part of the edges of . Let be the collection of all edges of that are in some chain of , and let be the collection of all these sets. We can use Observation 2.4 to claim that there is a set of at most edges of such that every polygon has at least one edge in one of its chains from the set . For each edge of which is in let be the index set corresponding to the collection of polygons which contain the edge in some chain. Let . Clearly . If for each pair of edges of if then we are done. Otherwise we have for some pair . Then we simply replace the two regions and with , which can only reduce the size of our cover-set. Since we see that the collection forms a cover-set of and has size at most .
On a set-system realized by convex polygons (as above) with at most sides, we can solve the set-maxima problem, with the elements representing points in the plane, with comparisons, where is the number of points.
Proof immediately follows from Theorem 3 and Lemma 7. ∎
Parameterization of the results in terms of is important. Without any restriction on the polygons, it is possible to represent any arbitrary set system in this geometric setting. This follows easily. Take all the points of to be on a circle. Then the points of any subset are the corners of a convex polygon. Restricting allows the geometry to play a role.
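The circle construction can be checked numerically on a small instance. The sketch below is only an illustration with assumed helper names (`circle_points`, `cross`, `is_strictly_convex`, `is_outside`): it places points on the unit circle and tests that a chosen subset, taken in counter-clockwise order, forms a strictly convex polygon that excludes every other point.

```python
import math


def circle_points(n):
    """n points evenly spaced on the unit circle, in counter-clockwise order."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]


def cross(o, a, b):
    """Cross product of (a - o) and (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def is_strictly_convex(polygon):
    """True if every consecutive triple of a counter-clockwise vertex list
    on a circle makes a strict left turn."""
    m = len(polygon)
    return all(cross(polygon[i], polygon[(i + 1) % m], polygon[(i + 2) % m]) > 0
               for i in range(m))


def is_outside(polygon, p):
    """True if p lies strictly to the right of some edge of a
    counter-clockwise convex polygon, that is, outside it."""
    m = len(polygon)
    return any(cross(polygon[i], polygon[(i + 1) % m], p) < 0
               for i in range(m))
```

For instance, with `pts = circle_points(8)` and the subset of indices `(0, 3, 5, 6)`, the polygon `[pts[i] for i in (0, 3, 5, 6)]` is strictly convex and each of the remaining four points is outside it, so the polygon realizes exactly that subset.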
- Bar-Noy et al. (1992) Bar-Noy, A., Motwani, R., Naor, J., 1992. A linear time approach to the set maxima problem. SIAM Journal on Discrete Mathematics 5 (1), 1–9.
- Desper (1994) Desper, R., 1994. The set-maxima problem: an overview. Master’s thesis, Rutgers University.
- Goddard et al. (1993) Goddard, W., Kenyon, C., King, V., Schulman, L. J., 1993. Optimal randomized algorithms for local sorting and set-maxima. SIAM Journal on Computing 22 (2), 272–283.
- Graham et al. (1980) Graham, R. L., Yao, A. C., Yao, F. F., 1980. Information bounds are weak in the shortest distance problem. Journal of the ACM (JACM) 27 (3), 428–444.
- Komlós (1984) Komlós, J. N., 1984. Linear verification for spanning trees. In: Foundations of Computer Science, 1984. 25th Annual Symposium on. IEEE, pp. 201–206.
- Liberatore (1998) Liberatore, V., 1998. Matroid decomposition methods for the set maxima problem. In: SODA. pp. 400–409.
- Tarjan (1982) Tarjan, R. E., 1982. Sensitivity analysis of minimum spanning trees and shortest path trees. Information Processing Letters 14 (1), 30–33.