Faster Approximation Algorithms for Geometric Set Cover

We improve the running times of O(1)-approximation algorithms for the set cover problem in geometric settings, specifically, covering points by disks in the plane, or covering points by halfspaces in three dimensions. In the unweighted case, Agarwal and Pan [SoCG 2014] gave a randomized O(n log^4 n)-time, O(1)-approximation algorithm, by using variants of the multiplicative weight update (MWU) method combined with geometric data structures. We simplify the data structure requirement in one of their methods and obtain a deterministic O(n log^3 n log log n)-time algorithm. With further new ideas, we obtain a still faster randomized O(n log n (log log n)^{O(1)})-time algorithm. For the weighted problem, we also give a randomized O(n log^4 n log log n)-time, O(1)-approximation algorithm, by simple modifications to the MWU method and the quasi-uniform sampling technique.

1 Introduction

Unweighted geometric set cover.

In this paper we study one of the most fundamental classes of geometric optimization problems: geometric set cover. Given a set X of points and a set S of geometric objects, find the smallest subset of objects from S to cover all points in X. In the dual set system, the problem corresponds to geometric hitting set (finding the smallest number of points from X that hit all objects in S).

This class of problems has been extensively investigated in the computational geometry literature. Since they are NP-hard in most scenarios, attention is turned towards approximation algorithms. Different types of objects give rise to different results. Typically, approximation algorithms fall into the following categories:

  1. Simple heuristics, e.g., greedy algorithms.

  2. Approaches based on solving the linear programming (LP) relaxation (i.e., fractional set cover) and rounding the LP solution.

  3. Polynomial-time approximation schemes (PTASs), e.g., via local search, shifted grids/quadtrees (sometimes with dynamic programming), or separator-based divide-and-conquer.

Generally, greedy algorithms achieve only logarithmic approximation factors (there are some easy cases where they give O(1) approximation factors, e.g., hitting set for fat objects such as disks/balls in the “continuous” setting [EfratKNS00]). The LP-based approaches give better approximation factors in many cases, e.g., O(1) approximation for set cover and hitting set for disks in 2D and halfspaces in 3D, set cover for objects in 2D with linear “union complexity”, and hitting set for pseudodisks in 2D [bronnimann1995almost, clarkson2007improved, AronovES10, Varadarajan09, PyrgaR08]. Subsequently, local-search PTASs have been found by Mustafa and Ray [mustafa2009ptas] in some cases, including set cover and hitting set for disks in 2D and halfspaces in 3D (earlier, PTASs were known for hitting set only in the continuous setting for unit disks/balls [HochbaumM85], and for arbitrary disks/balls and fat objects [Chan03]).

Historically, the focus has been on obtaining good approximation factors. Here, we are interested in obtaining approximation algorithms with good—ideally, near linear—running time. Concerning the efficiency of known approximation algorithms:

  1. Certain simple heuristics can lead to fast O(1)-approximation algorithms in some easy cases (e.g., continuous hitting set for unit disks or unit balls by using grids), but generally, even simple greedy algorithms may be difficult to implement in near-linear time (as they may require nontrivial dynamic geometric data structures).

  2. LP-based approaches initially may not seem to be the most efficient, because of the need to solve an LP. However, a general-purpose LP solver can be avoided. The set-cover LP can alternatively be solved (approximately) by the multiplicative weight update (MWU) method. In the computational geometry literature, the technique has been called iterative reweighting, and its use in geometric set cover was explored by Brönnimann and Goodrich [bronnimann1995almost] (one specific application appeared in an earlier work by Clarkson [clarkson1993algorithms]), although the technique was known even earlier outside of geometry. On the other hand, the LP-rounding part corresponds to the well-known geometric problem of constructing ε-nets, for which efficient algorithms are known [chan2016optimal, Mustafa19].

  3. PTAS approaches generally have large polynomial running time, even when specialized to specific approximation factors. For example, see [BusGMR17] for efforts in improving the degree of the polynomial.

In this paper we design faster approximation algorithms for geometric set cover via the LP/MWU-based approaches. There has been a series of works on speeding up MWU methods for covering or packing LPs (e.g., see [chekuri2018randomized, KoufogiannakisY14, Young01]). In geometric settings, we would like more efficient algorithms (as generating the entire LP explicitly would already require quadratic time), by somehow exploiting geometric data structures. The main previous work was by Agarwal and Pan [agarwal2014near] from SoCG 2014, who showed how to compute an O(1)-approximation for set cover for 2D disks or 3D halfspaces in O(n log^4 n) randomized time.

Agarwal and Pan actually proposed two MWU-based algorithms: The first is a simple variant of the standard MWU algorithm of Brönnimann and Goodrich, which proceeds in logarithmically many rounds. The second views the problem as a 2-player zero-sum game, works quite differently (with weight updates to both points and objects), and uses randomization; the analysis is more complicated. Because the first algorithm requires stronger data structures—notably, for approximate weighted range counting with dynamic changes to the weights—Agarwal and Pan chose to implement their second algorithm instead, to get their result for 3D halfspaces.

New results.

In this paper we give:

  • a deterministic near-linear O(1)-approximation algorithm for set cover for 3D halfspaces. Its running time is O(n log^3 n log log n), which besides eliminating randomization is also a little faster than Agarwal and Pan’s;

  • a still faster randomized near-linear O(1)-approximation algorithm for set cover for 3D halfspaces. Its running time is O(n log n (log log n)^{O(1)}), which is essentially optimal, ignoring minor factors. (Just deciding whether a solution exists requires Ω(n log n) time in the algebraic decision-tree model, even for 1D intervals.)

Although generally shaving logarithmic factors may not be the most important endeavor, the problem is fundamental enough that we feel it worthwhile to find the most efficient algorithm possible.

Our approach interestingly is to go back to Agarwal and Pan’s first MWU algorithm. We show that with one simple modification, the data structure requirement can actually be relaxed: namely, for the approximate counting structure, there is no need for weights, and the only update operation is insertion. By standard techniques, insertion-only data structures reduce to static data structures. This simple idea immediately yields our deterministic result. (Before, Bus et al. [bus2018practical] also aimed to find variants of Agarwal and Pan’s first algorithm with simpler data structures, but they did not achieve improved theoretical time bounds.) Our best randomized result requires a more sophisticated combination of several additional ideas. In particular, we incorporate random sampling in the MWU algorithm, and extensively use shallow cuttings, in both primal and dual space.

We have stated our results for set cover for 3D halfspaces. This case is arguably the most central. It is equivalent to hitting set for 3D halfspaces, by duality, and also includes set cover and hitting set for 2D disks as special cases, by the standard lifting transformation. The case of 3D dominance ranges is another special case, by a known transformation [chan2011orthogonal, PachT11] (although for the dominance case, word-RAM techniques can speed up the algorithms further). The ideas here are likely useful also in the other cases considered in Agarwal and Pan’s paper (e.g., hitting set for rectangles, set cover for fat triangles, etc.), but in the interest of keeping the paper focused, we will not discuss these implications.

Weighted geometric set cover.

Finally, we consider the weighted version of set cover: assuming that each object is given a weight, we now want a subset of the objects of S with the minimum total weight that covers all points in X. The weighted problem has also received considerable attention: Varadarajan [varadarajan2010weighted] and Chan et al. [chan2012weighted] used the LP-based approach to obtain O(1)-approximation algorithms for weighted set cover for 3D halfspaces (or for objects in 2D with linear union complexity); the difficult part is in constructing ε-nets with small weights, which they solved by the quasi-uniform sampling technique. Later, Mustafa, Raman, and Ray [MustafaRR15] discovered a quasi-PTAS for 3D halfspaces by using geometric separators; the running time is very high (quasi-polynomial in n).

Very recently, Chekuri, Har-Peled, and Quanrud [fasterlp] described new randomized MWU methods which can efficiently solve the LP corresponding to various generalizations of geometric set cover, by using appropriate geometric data structures. In particular, for weighted set cover for 3D halfspaces, they obtained a randomized near-linear-time algorithm to solve the LP, but with an unspecified number of logarithmic factors. They did not address the LP-rounding part, i.e., construction of an ε-net of small weight—a direct implementation of the quasi-uniform sampling technique would not lead to a near-linear time bound.

We observe that a simple direct modification of the standard MWU algorithm of Brönnimann and Goodrich, or Agarwal and Pan’s first algorithm, can also solve the LP for weighted geometric set cover, with arguably simpler data structures than Chekuri et al.’s. Secondly, we observe that an ε-net of small weight can be constructed in near-linear time, by using quasi-uniform sampling more carefully. This leads to a randomized O(n log^4 n log log n)-time, O(1)-approximation algorithm for weighted set cover for 3D halfspaces (and thus for 2D disks).

2 Preliminaries

Let X be a set of points and S be a set of objects. For a point p, its depth in S refers to the number of objects in S containing p. A point p is said to be ε-light in S if it has depth less than ε|S| in S; otherwise it is ε-heavy. A subset of objects T ⊆ S is an ε-net of S if T covers all points that are ε-heavy in S.

It is known that there exists an ε-net of size O(1/ε) for any set of halfspaces in 3D or disks in 2D [matouvsek1990net] (or more generally for objects in the plane with linear union complexity [clarkson2007improved]).

2.1 The Basic MWU Algorithm

We first review the standard multiplicative weight update (MWU) algorithm for geometric set cover, as described by Brönnimann and Goodrich [bronnimann1995almost] (which generalizes an earlier algorithm by Clarkson [clarkson1993algorithms], and is also well known outside of computational geometry). In our algorithm description, we prefer to use the term “multiplicity” instead of “weight”, to avoid confusion with the weighted set cover problem later.

Let X be the set of input points and S be the set of input objects, with n = |X| + |S|. Let OPT denote the size of the minimum set cover. We assume that a value t with OPT ≤ t ≤ 2·OPT is known; this assumption will be removed later by a binary search for t. In the following pseudocode, we work with a multiset Ŝ; in measuring size or counting depth, we include multiplicities (e.g., |Ŝ| is the sum of the multiplicities of all its elements).

1:Guess a value t with OPT ≤ t ≤ 2·OPT and set ε = 1/(2t).
2:Define a multiset Ŝ where each object in S initially has multiplicity 1.
3:while we can find a point p ∈ X which is ε-light in Ŝ do
4:     for each object s ∈ Ŝ containing p do      (call lines 4–5 a multiplicity-doubling step)
5:         Double its multiplicity.
6:Return an ε-net of the multiset Ŝ.

Since at the end all points in X are ε-heavy in Ŝ, the returned subset is a valid set cover of X. For halfspaces in 3D or disks in 2D, its size is O(1/ε) = O(t) = O(OPT).

A standard analysis shows that the algorithm always terminates after O(t log(n/t)) multiplicity-doubling steps. We include a quick proof: each multiplicity-doubling step increases |Ŝ| by a factor of at most 1 + ε, due to the ε-lightness of p. Thus, after k doubling steps, |Ŝ| ≤ n(1 + ε)^k ≤ n e^{εk} = n e^{k/(2t)}. On the other hand, consider a set cover O of size at most t. In each multiplicity-doubling step, at least one of the objects in O has its multiplicity doubled. So, after k multiplicity-doubling steps, the total multiplicity in O is at least t · 2^{k/t}. We conclude that t · 2^{k/t} ≤ n e^{k/(2t)}, implying that k = O(t log(n/t)).
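To make the loop concrete, here is a minimal brute-force Python sketch of the basic MWU algorithm. The helpers covers(obj, p) and build_eps_net(mult, eps) are assumed to be supplied by the caller (in the geometric setting they would be replaced by the data structures discussed later), and we assume t ≥ OPT so that the loop terminates; this is a sketch, not an efficient implementation.

```python
def basic_mwu_cover(points, objects, covers, t, build_eps_net):
    """Brute-force sketch of the basic MWU loop (no geometric data structures).

    `covers(obj, p)` tests containment and `build_eps_net(mult, eps)` builds an
    eps-net of the multiset described by the multiplicity list `mult`; both are
    assumed to be supplied by the caller.  We assume t >= OPT, so the loop ends.
    """
    eps = 1.0 / (2 * t)
    mult = [1] * len(objects)                 # multiplicities of the multiset

    def depth(p):                             # depth of p, with multiplicities
        return sum(m for obj, m in zip(objects, mult) if covers(obj, p))

    while True:
        total = sum(mult)
        light = next((p for p in points if depth(p) < eps * total), None)
        if light is None:                     # every point is eps-heavy
            break
        for i, obj in enumerate(objects):     # multiplicity-doubling step
            if covers(obj, light):
                mult[i] *= 2
    return build_eps_net(mult, eps)
```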

2.2 Agarwal and Pan’s (First) MWU Algorithm

Next, we review Agarwal and Pan’s first variant of the MWU algorithm [agarwal2014near]. One issue in implementing the original algorithm lies in the test in line 3: searching for one light point by scanning all points in X from scratch every time seems inefficient. In Agarwal and Pan’s refined approach, we proceed in a small number of rounds, where in each round, we examine the points in X in a fixed order and test for lightness in that order.

1:Guess a value t with OPT ≤ t ≤ 2·OPT and set ε = 1/(2t).
2:Define a multiset Ŝ where each object in S initially has multiplicity 1.
3:loop      (call this the start of a new round)
4:     for each point p ∈ X in any fixed order do
5:         while p is ε-light in Ŝ do
6:              for each object s ∈ Ŝ containing p do      (call lines 6–7 a multiplicity-doubling step)
7:                  Double its multiplicity.
8:              if the number of multiplicity-doubling steps in this round exceeds t then
9:                  Go to line 3 and start a new round.
10:     Terminate and return an ε-net of the multiset Ŝ.

To justify correctness, observe that since each round performs at most t multiplicity-doubling steps, |Ŝ| increases by a factor of at most (1 + ε)^t ≤ e^{1/2} ≤ 2 during the round. Thus, a point that is checked to be ε-heavy in Ŝ at any moment during the round will remain Ω(ε)-heavy in Ŝ at the end of the round.

Since all but the last round perform t multiplicity-doubling steps each, and we have already shown that the total number of such steps is O(t log(n/t)), the number of rounds is O(log(n/t)).
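The round structure can be sketched as follows, again with brute-force depth computations and the same assumed helpers covers and build_eps_net; the per-round budget of t doubling steps follows the description above.

```python
def mwu_cover_in_rounds(points, objects, covers, t, build_eps_net):
    """Sketch of Agarwal and Pan's round-based variant: scan the points in a
    fixed order, and start a new round (rescanning from the beginning) once the
    current round has performed more than t multiplicity-doubling steps.
    Brute-force depths; `covers` and `build_eps_net` are assumed helpers."""
    eps = 1.0 / (2 * t)
    mult = [1] * len(objects)

    def depth(p):
        return sum(m for obj, m in zip(objects, mult) if covers(obj, p))

    while True:                                   # start of a new round
        steps, restart = 0, False
        for p in points:                          # fixed order
            while not restart and depth(p) < eps * sum(mult):   # p is eps-light
                for i, obj in enumerate(objects):
                    if covers(obj, p):
                        mult[i] *= 2              # multiplicity-doubling step
                steps += 1
                restart = steps > t               # budget exceeded: new round
            if restart:
                break
        if not restart:                           # full scan with no restart
            return build_eps_net(mult, eps)
```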

3 “New” MWU Algorithm

Agarwal and Pan’s algorithm still requires an efficient data structure to test whether a given point is light, and the data structure needs to support dynamic changes to the multiplicities. We propose a new variant that requires simpler data structures.

Our new algorithm is almost identical to Agarwal and Pan’s, but with just one very simple change! Namely, after line 3, at the beginning of each round, we add the following line, to readjust all multiplicities:

3.5:    for each object , reset its multiplicity .

To analyze the new algorithm, consider modifying the multiplicity instead to . The algorithm behaves identically (since the multiplicities are identical except for a common rescaling factor), but is more convenient to analyze. In this version, multiplicities are nondecreasing over time (though they may be non-integers). After the modified line 3.5, the new is at most . If the algorithm makes multiplicity-doubling steps, then it performs line 3.5 at most times and we now have . This is still sufficient to imply that , and so the number of rounds remains .

Now, let’s go back to line 3.5 as written. The advantage of this multiplicity readjustment step is that it decreases |Ŝ| to O(n). At the end of the round, |Ŝ| increases by a factor of at most (1 + ε)^t = O(1) and so remains O(n). Thus, in line 7, instead of doubling the multiplicity of an object, we can just repeatedly increment the multiplicity (i.e., insert one copy of an object) to reach the desired value. The total number of increments per round is O(n).

Note that in testing for ε-lightness in line 5, a constant-factor approximation of the depth is sufficient, with appropriate adjustments of constants in the algorithm. Also, although the algorithm as described may test the same point for lightness several times in a round, this can be easily avoided: we just keep track of the increase in the depth of the current point p; the new depth of p can be 2-approximated by the maximum of the old depth and the increase.
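As a small illustration of this relaxation, a multiplicity-doubling step can be realized by repeated single-copy insertions into any insertion-only counting structure; counter.insert is an assumed interface.

```python
def double_by_increments(obj, current_mult, counter):
    """Realize one multiplicity-doubling step as `current_mult` single-copy
    insertions into an insertion-only counting structure (`counter.insert` is
    an assumed interface); this is the only update the structure must support."""
    for _ in range(current_mult):
        counter.insert(obj)
    return 2 * current_mult          # the object's new multiplicity
```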

To summarize, an efficient implementation of each round of the new algorithm requires solving the following geometric data structure problems (Report for line 6, and Approx-Count-Decision for line 5):

Problem Report:

Design a data structure to store a static set S of at most n objects so that given a query point p, we can report all objects in S containing the query point p. Here, the output size of a query is guaranteed to be at most O(k) where k := n/t (since ε-lightness of p implies that its depth is at most ε|Ŝ| = O(n/t), even including multiplicities).

Problem Approx-Count-Decision:

Design a data structure to store a multiset Ŝ of size O(n) so that given a query point p, we can either declare that the number of objects in Ŝ containing p is less than a fixed threshold value k_0, or that the number is more than c·k_0, for some constant c < 1. Here, the threshold is again k_0 = Θ(n/t) (since ε|Ŝ| = Θ(n/t)). The data structure should support the following type of updates: insert one copy of an object to Ŝ. (Deletions are not required.) Each point in X is queried once.
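In code, the two data-structure problems amount to the following minimal interfaces (a sketch; the class and method names are ours, not the paper's):

```python
from abc import ABC, abstractmethod

class ReportStructure(ABC):
    """Problem Report (interface sketch): static containment reporting."""
    @abstractmethod
    def report(self, p):
        """Return all stored objects containing the point p."""

class ApproxCountDecisionStructure(ABC):
    """Problem Approx-Count-Decision (interface sketch): insertion-only
    approximate comparison of a point's depth against a fixed threshold."""
    @abstractmethod
    def insert(self, obj):
        """Insert one copy of an object (deletions are never needed)."""
    @abstractmethod
    def below_threshold(self, p):
        """Return True if p's depth is (approximately) below the threshold."""
```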

To bound the cost of the algorithm:

  • Let T_rep(n) denote the total time for the queries in Problem Report in one round.

  • Let T_cnt(n) denote the total time for the queries and insertions in Problem Approx-Count-Decision in one round. (Note that the initialization of Ŝ at the beginning of the round can be done by insertions.)

  • Let T_net(n) denote the time for computing an ε-net of size O(1/ε) for a given multiset of size O(n).

The total running time over all rounds is

O((T_rep(n) + T_cnt(n)) · log n + T_net(n)).   (1)

4 Implementations

In this section, we describe specific implementations of our MWU algorithm when the objects are halfspaces in 3D (which include disks in 2D as a special case by the standard lifting transformation). We first consider deterministic algorithms.

4.1 Deterministic Version

Shallow cuttings.

We begin by reviewing an important tool that we will use several times later. For a set of n planes in R^3, a k-shallow ε-cutting is a collection of interior-disjoint polyhedral cells, such that each cell intersects at most εn planes, and the union of the cells covers all points of level at most k (the level of a point refers to the number of planes below it). The list of all planes intersecting a cell Δ is called the conflict list of Δ. Matoušek [matousek1992reporting] proved the existence of a k-shallow O(k/n)-cutting with O(n/k) cells. Chan and Tsakalidis [chan2016optimal] gave an O(n log n)-time deterministic algorithm to construct such a cutting, along with all its conflict lists (an earlier randomized algorithm was given by Ramos [ramos1999range]). For a suitable choice of constants, the cells may be made “downward”, i.e., unbounded from below.

Constructing ε-nets.

The best known deterministic algorithm for constructing O(1/ε)-size ε-nets for 3D halfspaces is by Chan and Tsakalidis [chan2016optimal] and runs in O(n log n) time.

The result follows directly from their shallow cutting algorithm (using a simple argument of Matoušek [matousek1992reporting]): Without loss of generality, assume that all halfspaces are upper halfspaces, so depth corresponds to level with respect to the bounding planes (we can compute a net for lower halfspaces separately and take the union, with readjustment of ε by a factor of 2). We construct an (εn)-shallow (ε/2)-cutting with O(1/ε) cells, and for each cell, add a plane completely below the cell (if it exists) to the net. To see correctness, for a point p with level at least εn, consider the cell containing p; at least εn/2 planes are completely below the cell (since at most εn/2 of the planes below p can cross the cell), and so the net contains at least one plane below p.
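A sketch of this net construction, assuming the shallow cutting itself has been computed by an external routine; the predicate below(plane, cell) and the brute-force scan over all planes per cell are simplifications for illustration, replacing the conflict-list machinery of the real construction.

```python
def eps_net_from_shallow_cutting(cells, planes, below):
    """Sketch: given the cells of a suitable shallow cutting, pick for each
    cell one plane lying entirely below it (if any); the union of the picked
    planes is the net.  `below(plane, cell)` is an assumed predicate."""
    net = []
    for cell in cells:
        witness = next((h for h in planes if below(h, cell)), None)
        if witness is not None:
            net.append(witness)
    return net
```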

Solving Problem Report.

This problem corresponds to 3D halfspace range reporting in dual space, and by known data structures [Chan00, afshani2009optimal, chan2016optimal], each query can be answered in time O(log n) plus the output size, assuming an initial preprocessing of O(n log n) time (which is done only once).

This result also follows directly from shallow cuttings (since space is not our concern, the solution is much simplified): Without loss of generality, assume that all halfspaces are upper halfspaces. We construct an O(n/t)-shallow cutting with downward cells. Given a query point p, we find the cell containing p, which can be done in O(log n) time by planar point location; we then do a linear search over its conflict list, which has size O(n/t).

Note that the point location operations can actually be done during preprocessing in O(n log n) time since X is known in advance. This lowers the time bound for the queries to O(n/t) each.

Solving Problem Approx-Count-Decision.

This problem corresponds to the decision version of 3D halfspace approximate range counting in dual space, and several deterministic and randomized data structures have already been given in the static case [AfshaniC09, afshani2010general], achieving O(log n) query time and O(n log n) preprocessing time.

This result also follows directly from shallow cuttings: Without loss of generality, assume that all halfspaces are upper halfspaces. We construct an (n/2^j)-shallow cutting with downward cells for every j = 1, …, ⌈log n⌉. Chan and Tsakalidis’s algorithm can actually construct all such cuttings in O(n log n) total time. With these cuttings, we can compute an O(1)-approximation to the depth/level of a query point q by simply finding the largest j such that q is contained in a cell of the (n/2^j)-shallow cutting (the level of q would then be at most O(n/2^j) and at least n/2^{j+1}). In Chan and Tsakalidis’s construction, each cell in one cutting intersects O(1) cells in the next cutting, and so we can locate the cells containing q in O(1) time per j, for a total of O(log n) time.

To solve Problem Approx-Count-Decision, we still need to support insertion. Although the approximate decision problem is not decomposable, the above solution solves the approximate counting problem, which is decomposable, so we can apply the standard logarithmic method [bentley1980decomposable] to transform the static data structure into a semi-dynamic, insertion-only data structure. The transformation causes a logarithmic factor increase, yielding in our case query time and insertion time. Thus, the total time for queries and insertions is .
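For concreteness, here is a generic sketch of the logarithmic method (Bentley–Saxe) used in this reduction: static structures over 1, 2, 4, … elements are maintained and merged like a binary counter, and a query sums the answers of the O(log n) static substructures. StaticCounter is an assumed static approximate-counting structure supplied by the caller.

```python
class InsertionOnlyCounter:
    """Logarithmic method (Bentley-Saxe) sketch: turn an assumed static
    counting structure `StaticCounter(items)` with a `count(q)` method into an
    insertion-only structure."""
    def __init__(self, static_counter_cls):
        self.cls = static_counter_cls
        self.levels = []        # levels[i]: structure over exactly 2**i items, or None
        self.buffers = []       # the raw items stored at each level (for rebuilds)

    def insert(self, item):
        carry = [item]
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(None)
                self.buffers.append([])
            if self.levels[i] is None:          # free slot: rebuild here
                self.buffers[i] = carry
                self.levels[i] = self.cls(carry)
                return
            carry = carry + self.buffers[i]     # occupied: merge and carry over
            self.levels[i] = None
            self.buffers[i] = []
            i += 1

    def count(self, q):
        # approximate counting is decomposable, so summing the per-structure
        # answers preserves the approximation guarantee
        return sum(s.count(q) for s in self.levels if s is not None)
```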

Conclusion.

By (1), the complete algorithm has running time O(n log^2 n log log n).

One final issue remains: we have assumed that a value t with OPT ≤ t ≤ 2·OPT is given. In general, for a given t, either the algorithm produces a solution of size O(t), or (if it fails to complete within the stated number of rounds) the algorithm may conclude that OPT > t. We can thus find an O(1)-approximation to OPT by a binary search for t among the possible O(log n) powers of 2. The final time bound is O(n log^3 n log log n).

Given points and halfspaces in R^3, of total size n, we can find a subset of halfspaces covering all points, of size within an O(1) factor of the minimum, in deterministic O(n log^3 n log log n) time.
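The outer search for t can be sketched as follows. Here run_with_guess(t) is an assumed callback that runs the MWU algorithm with the guess t and reports failure if it does not finish within its round budget (which certifies OPT > t); success is monotone in t.

```python
def approximate_opt_cover(n, run_with_guess):
    """Sketch of the outer binary search for t over powers of 2.
    `run_with_guess(t)` is assumed to return a cover of size O(t), or None
    when the run fails to finish within its round budget."""
    powers = [1 << i for i in range(n.bit_length() + 1)]
    lo, hi = 0, len(powers) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        cover = run_with_guess(powers[mid])
        if cover is not None:
            best = cover
            hi = mid - 1               # a smaller guess might still succeed
        else:
            lo = mid + 1               # guess too small: OPT exceeds it
    return best
```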

4.2 Randomized Version 1

We now describe a better solution to Problem Approx-Count-Decision, by using randomization and the fact that all query points (namely, the points of X) are given in advance.

Reducing the number of insertions in Problem Approx-Count-Decision.

In solving Problem Approx-Count-Decision, one simple way to speed up insertions is to work with a random sample R of Ŝ. When we insert an object to Ŝ, we independently decide to insert it to the sample R with some fixed probability p, or ignore it with probability 1 − p, where the constant factor in the choice of p is sufficiently large. (Different copies of an object are treated as different objects here.) It suffices to solve the problem for the sample R with the new threshold around p·k_0.

To justify correctness, consider a fixed query point q. Let o_1, o_2, … be the sequence of objects in Ŝ that contain q, in the order in which they are inserted (extend the sequence arbitrarily to make its length greater than k_0). Let X_i = 1 if object o_i is chosen to be in the sample R, or 0 otherwise. Note that the o_i’s may not be independent (since the object we insert could depend on random choices made before); however, the X_i’s are independent. By the Chernoff bound, each prefix sum of the X_i’s is within a constant factor of its expectation with high probability. Thus, with high probability, at any time, if the number of objects in Ŝ containing q is more than k_0, then the number of objects in R containing q is more than the rescaled threshold; if the former number is less than a constant fraction of k_0, then the latter number is less than the rescaled threshold. Since there are at most n possible query points, all queries are correct with high probability.

By this strategy, the number of insertions is reduced by roughly a factor of p, with high probability.
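A sketch of this sampling reduction as a thin wrapper around any insertion-only counting structure; the probability p and the slack constant in the threshold comparison are parameters to be set as in the analysis above.

```python
import random

class SampledCounter:
    """Sketch of the sampling reduction: forward each inserted copy to an
    underlying insertion-only counter only with probability p, and compare the
    sampled count against a rescaled threshold.  `base` (with insert/count)
    and the exact slack constant are assumptions."""
    def __init__(self, base, p):
        self.base = base
        self.p = p

    def insert(self, obj):
        if random.random() < self.p:        # keep each copy independently w.p. p
            self.base.insert(obj)

    def above_threshold(self, q, threshold):
        # by a Chernoff bound the sampled count concentrates around
        # p * (true count), so the rescaled comparison is correct w.h.p.
        return self.base.count(q) >= self.p * threshold
```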

Preprocessing step.

Next we use a known preprocessing step to ensure that each object contains at most O(n/t) points, in the case of 3D halfspaces. This subproblem was addressed in Agarwal and Pan’s paper [agarwal2014near] (where it was called “(P5)”—curiously, they used it to implement their second algorithm but not their first MWU-based algorithm). We state a better running time:

In O(n log n) time, we can find a subset S_0 ⊆ S of O(t) halfspaces, such that after removing all points in X covered by S_0, each halfspace of S contains at most O(n/t) of the remaining points.

Proof.

We may assume that all halfspaces are upper halfspaces. We work in dual space, where S is now a set of points and X is a set of planes. The goal is to find a subset S_0 ⊆ S of points such that after removing all planes of X that are below some point of S_0, each point of S has depth/level at most O(n/t).

We proceed in rounds. In the i-th round, assume that all points of S have level at most k_i, where k_1 = n and k_{i+1} = k_i/2. Compute a k_i-shallow cutting with O(n/k_i) cells, each crossing at most k_i/2 planes. In each cell, add an arbitrary point of S (if one exists) to the set S_0. In total, O(n/k_i) points are added. Remove from X all planes that are below these added points.

Consider a point s of S. Let Δ be the cell containing s, and let s' be the point in Δ that was added to S_0. Any plane that is below s but not removed (and thus above s') must intersect Δ, so there can be at most k_i/2 = k_{i+1} such planes. Thus, after the round, the level of s is at most k_{i+1}. We terminate when k_i reaches O(n/t). The total size of S_0 is O(Σ_i n/k_i) = O(t).

Naively computing each shallow cutting from scratch by Chan and Tsakalidis’s algorithm would require O(n log n) time per round, i.e., O(n log n log t) in total. But Chan and Tsakalidis’s approach can compute multiple shallow cuttings more quickly: given a k-shallow cutting along with its conflict lists, we can compute the next (k/2)-shallow cutting along with its conflict lists in O(n) time. However, in our application, before computing the next cutting, we also remove some of the input planes. Fortunately, this type of scenario has been examined in a recent paper by Chan [chan2019dynamic], who shows that the approach still works, provided that the next cutting is relaxed to cover only points covered by the previous cutting (see Lemma 8 in his paper); this is sufficient in our application. In our application, we also need to locate the cell containing each point of S. This can still be done in O(n) time per round given the locations in the previous cutting. Thus, the total time is O(n log n). ∎

At the end, we add S_0 back to the solution, which still has O(t) = O(OPT) total size.

Solving Problem Approx-Count-Decision.

We now propose a very simple approach to solve Problem Approx-Count-Decision: just explicitly maintain the depth of every point in X. Each query then trivially takes O(1) time. When inserting an object, we find all points of X contained in the object and increment their depths.

Due to the above preprocessing step, the number of points contained in the object is O(n/t). For the case of 3D halfspaces, we can find these points by halfspace range reporting; as explained before for Problem Report, this can be done in O(log n + n/t) time by using shallow cuttings, after an initial preprocessing in O(n log n) time. Thus, each insertion takes O(log n + n/t) time. Since the number of insertions has been reduced by sampling, the total time for Problem Approx-Count-Decision improves accordingly.
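A sketch of this "maintain all depths explicitly" structure; points_in(obj) stands for the assumed halfspace range reporting routine that returns the indices of the points of X contained in the object (at most O(n/t) of them after the preprocessing step).

```python
class ExplicitDepthCounter:
    """Sketch of Randomized Version 1's counting structure: keep the depth of
    every point of X explicitly.  `points_in(obj)` is an assumed reporting
    routine returning the indices of the points contained in obj."""
    def __init__(self, num_points, points_in):
        self.depth = [0] * num_points     # current depth of each point of X
        self.points_in = points_in

    def insert(self, obj):                # insert one copy of an object
        for i in self.points_in(obj):
            self.depth[i] += 1            # one reporting query plus O(n/t) increments

    def query(self, point_index):         # O(1) time per depth query
        return self.depth[point_index]
```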

Conclusion.

By (1), the complete randomized algorithm has running time (even including the -time preprocessing step). Including the binary search for , the time bound is .

4.3 Randomized Version 2

Finally, we combine the ideas from both the deterministic and randomized implementations, to get our fastest randomized algorithm for 3D halfspaces.

Solving Problem Approx-Count-Decision.

We may assume that all halfspaces are upper halfspaces. We work in dual space, where Ŝ is now a multiset of points and X is a set of planes. In a query, we want to approximately count the number of points of Ŝ that are above a query plane of X. By the sampling reduction from Section 4.2, we may assume that the number of insertions to Ŝ has been reduced; by the preprocessing step from Section 4.2, we may assume that all points in Ŝ have level at most O(n/t).

Compute a -shallow -cutting with downward cells, along with its conflict lists. For each point , locate the cell containing . All this can be done during a (one-time) preprocessing in time.

For each cell , we maintain in a semi-dynamic data structure for 3D approximate halfspace range counting. As described in Section 4.1, we get query and insertion time, where .

In an insertion of a point to , we look up the cell containing and insert the point to the approximate counting structure in .

In a query for a plane , we look up the cells whose conflict lists contain , answer approximate counting queries in these cells, and sum the answers.

We bound the total time for all insertions and queries. For each cell , the number of insertions in its approximate counting structure is and the number of queries is (since each plane is queried once). The total time is

Since there are terms and , we have “on average”; applying Jensen’s inequality to the first term, we can bound the sum by . Thus, .

Conclusion.

By (1), the complete randomized algorithm has running time . If , the first term dominates. On the other hand, if , our earlier randomized algorithm has running time . In any case, the time bound is at most . Including the binary search for , the time bound is .

Given points and halfspaces in R^3, of total size n, we can find a subset of halfspaces covering all points, of size within an O(1) factor of the minimum, in O(n log n (log log n)^{O(1)}) time by a randomized Monte Carlo algorithm with error probability 1/n^c for any constant c.

Remark.

The number of log log n factors is improvable with still more effort, but we feel it is of minor significance.

5 Weighted Set Cover

In this final section, we consider the weighted set cover problem. We define ε-lightness and ε-nets as before, ignoring the weights. It is known that there exists an ε-net of S with total weight O(W/(ε|S|)), where W is the total weight of S, for any set S of 3D halfspaces or 2D disks (or objects in 2D with linear union complexity) [chan2012weighted]. Here, the weight of a set refers to the sum of the weights of the objects in it.

5.1 MWU Algorithm in the Weighted Case

Let X be the set of input points and S be the set of weighted input objects, where object i has weight w_i, with n = |X| + |S|. Let OPT be the weight of the minimum-weight set cover. We assume that a value t with OPT ≤ t ≤ 2·OPT is given; this assumption can be removed by a binary search for t.

We may delete objects with weights more than t. We may automatically include all objects with weights at most t/n in the solution, and delete them and all points covered by them, since the total weight of the solution increases by only t = O(OPT). Thus, all remaining objects have weights in [t/n, t]. By rescaling, we may now assume that all objects have weights in [1, n] and that t = n.

In the following, for a multiset Ŝ where object i has multiplicity m_i, the weight of the multiset is defined as w(Ŝ) = Σ_i m_i w_i.

We describe a simple variant of the basic MWU algorithm to solve the weighted set cover problem. (A more general, randomized MWU algorithm for geometric set cover was given recently by Chekuri, Har-Peled, and Quanrud [fasterlp], but our algorithm is simpler to describe and analyze.) The key innovation is to replace doubling with multiplication by a factor of 2^{1/w_i}, where w_i is the weight of the concerned object i. (Note that multiplicities may now be non-integers.)

1:Guess a value t with OPT ≤ t ≤ 2·OPT.
2:Define a multiset Ŝ where each object in S initially has multiplicity 1.
3:repeat
4:     Find a point p ∈ X which is ε-light in Ŝ.
5:     for each object i containing p do      (call lines 5–6 a “multiplicity-increasing step”)
6:         Multiply its multiplicity m_i by 2^{1/w_i}.
7:until all points are ε-heavy in Ŝ.
8:Return an ε-net of the multiset Ŝ.

Since at the end all points are ε-heavy in Ŝ, the returned subset is a valid set cover of X. For halfspaces in 3D or disks in 2D, its weight is O(w(Ŝ)/(ε|Ŝ|)) by the weighted ε-net bound stated at the beginning of this section.
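Before turning to the analysis, here is a brute-force sketch of this weighted variant. The update factor 2**(1/w_i), the choice eps = 1/(2t), and the helpers covers and build_eps_net are assumptions for illustration (the paper's exact parameter settings may differ); the intent is only to show how heavier objects are promoted more slowly.

```python
def weighted_mwu_cover(points, objects, weights, covers, t, build_eps_net):
    """Hedged brute-force sketch of the weighted MWU variant; parameter
    choices and helpers are assumptions, not the paper's exact settings."""
    eps = 1.0 / (2 * t)
    mult = [1.0] * len(objects)               # multiplicities may be non-integers

    def depth(p):                             # unweighted depth, with multiplicities
        return sum(m for obj, m in zip(objects, mult) if covers(obj, p))

    while True:
        total = sum(mult)
        light = next((p for p in points if depth(p) < eps * total), None)
        if light is None:                     # all points are eps-heavy
            break
        for i, obj in enumerate(objects):     # multiplicity-increasing step
            if covers(obj, light):
                mult[i] *= 2.0 ** (1.0 / weights[i])
    return build_eps_net(mult, eps)
```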

We now prove that the algorithm terminates in multiplicity-increasing steps.

In each multiplicity-increasing step, increases by

i.e., increases by a factor of at most . Initially, . Thus, after multiplicity-increasing steps, .

On the other hand, consider the optimal set cover . Suppose that object has its multiplicity increased times. In each multiplicity-increasing step, at least one object in has its multiplicity increased. So, after multiplicity-increasing steps, and . In particular, for some . Therefore, (since ). We conclude that , implying that .

Similar to Agarwal and Pan’s first MWU algorithm, we can also divide the multiplicity-increasing steps into rounds, with each round performing up to multiplicity-increasing steps. Within each round, the total weight increases by at most . Also if increases by a constant factor, we immediately start a new round: because and may be doubled at most times, this case can happen at most times. This ensures that if a point is checked to be -heavy at any moment during a round, it will remain -heavy at the end of the round. There are only rounds.

Additional ideas are needed to speed up implementation (in particular, our modified MWU algorithm with multiplicity-readjustment steps does not work as well now). First, we work with an approximation to the multiplicity of each object . By rounding, we may assume all weights are powers of 2. In the original algorithm, , where is the number of points that are contained in object , and be the multiset consisting of all points that have undergone multiplicity-increasing steps so far. Note that since the total multiplicity is , we have . Let be a random sample of where each point is included independently with probability (if , we can just set ). Let be the number of points that are contained in object . By the Chernoff bound, since , we have with high probability. By letting , it follows that and are within a factor of of each other, with high probability, at all times, for all . Thus, our earlier analysis still holds when working with instead of . Since , we have with high probability. So, the total number of increments to all and updates to all is . In lines 5–6, we flip a biased coin to decide whether should be placed in the sample (with probability ) for each , and if so, we use halfspace range reporting in the dual to find all objects of weight containing , and increment and update . Over all executions of lines 5–6 and all indices , the cost of these halfspace range reporting queries is plus the output size. As the total output size for the queries is , the total cost is .

We also need to redesign a data structure for lightness testing subject to multiplicity updates: For each , we maintain a subset containing all objects with multiplicity at least , in a data structure to support approximate depth (without multiplicity). The depth of a point in can be -approximated by . Each subset undergoes insertion only, and the logarithmic method can be applied to each . Since , there are values of . This slows down lightness testing by a logarithmic factor, and so in the case of 3D halfspaces, the overall time bound is , excluding the -net construction time.

5.2 Speeding up Quasi-Uniform Sampling

Finally, we show how to efficiently construct an ε-net of the desired weight for 3D halfspaces. We will take advantage of the fact that we need ε-nets only in the discrete setting, with respect to a given set X of points. Without loss of generality, assume that all halfspaces are upper halfspaces.

We begin by sketching (one interpretation of) the quasi-uniform sampling algorithm of Varadarajan [varadarajan2010weighted] and Chan et al. [chan2012weighted]:

1:
2:repeat
3:     Remove all points with depth in less than .
4:     Move each point downward so that its depth in is .
5:     Pick a random sample of size for some appropriate choice of .
6:     Let .
7:     repeat
8:         Find an object containing the fewest number of non-equivalent points in .
9:         if object contains a bad point then add object to the output.          
10:         Remove object from .
11:     until  is empty.
12:     Set and .
13:until  is below a constant.
14:Add to the output.

In line 4, we use the property that the objects are upper halfspaces. In line 8, two points and of are considered equivalent iff the subset of objects from containing is the same as the corresponding subset for . In line 9, a point is said to be bad iff its depth in is equal to and its depth in is less than .

With appropriate choices of parameters, in the case of 3D halfspaces, Chan et al. [chan2012weighted] showed that the output is an -net, with the property that each object of is in the output with probability (these events are not independent, so the output is only a “quasi-uniform” sample). This property immediately implies that the output has expected weight . We will not redescribe the proof here, as our interest lies in the running time.

Consider one iteration of the outer repeat loop. For the very first iteration, lines 3–4 can be done by answering halfspace range reporting queries in the dual (reporting up to objects containing each query point ), which takes total time by using shallow cuttings (as described in Section 4.1). As a result, we also obtain a list of the objects containing each point; these lists have total size . In each subsequent iteration, lines 3–4 take only time by scanning through these lists and selecting the lowest bounding planes per list.

For each point , we maintain its depth in and its depth in . Whenever an object is removed from , we examine all points in the object, and if necessary, decrement these depth values; this takes total time (since there are object-point containment pairs). Then line 9 can be done by scanning through all points in the object; again, this takes total time.

Line 8 requires more care, as we need to keep track of equivalence classes of points. One way is to use hashing or fingerprinting [MotRag]: for example, map each point to for a random and a fixed prime , where is a sufficiently large constant. Then two points are equivalent iff they are hashed to the same value, with high probability. When we remove an object from , we examine all points contained in the object, recompute the hash values of these points (which takes time each, given a table containing ), and whenever we find two equivalent points with the same hash values, we remove one of them. This takes total time (since there are object-point containment pairs). To implement line 8, for each object, we maintain a count of the number of points it contains. Whenever we remove a point, we decrement the counts of objects containing it; again, this takes total time. The minimum count can be maintained in time per operation without a heap, since the only update operations are decrements (for example, we can place objects in buckets indexed by their counts, and move an object from one bucket to another whenever we decrement).
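Two small sketches of the bookkeeping described above: polynomial-style fingerprints for maintaining equivalence classes under object removals, and count buckets for maintaining a minimum under decrements. The prime q, the random base r, and the index-based interfaces are assumptions for illustration.

```python
import random

class EquivalenceFingerprints:
    """Fingerprinting sketch: a point's equivalence class is the set of
    remaining objects containing it, summarized as sum of r**(object index)
    mod q; equal fingerprints mean equivalent points w.h.p."""
    def __init__(self, containing_objs, q=(1 << 61) - 1):
        # containing_objs[i] = list of indices of objects containing point i
        self.q = q
        self.r = random.randrange(2, q)
        max_idx = max((j for objs in containing_objs for j in objs), default=0)
        self.pow_r = [pow(self.r, j, q) for j in range(max_idx + 1)]
        self.fp = [sum(self.pow_r[j] for j in objs) % q for objs in containing_objs]

    def remove_object(self, obj_idx, point_indices):
        # object obj_idx leaves the current set; O(1) update per affected point
        for i in point_indices:
            self.fp[i] = (self.fp[i] - self.pow_r[obj_idx]) % self.q

    def equivalent(self, i, j):
        return self.fp[i] == self.fp[j]

class DecrementMinCounts:
    """Bucket sketch: counts only decrease, so buckets indexed by count
    support decrement, removal, and minimum queries without a heap."""
    def __init__(self, counts):
        self.count = list(counts)
        self.buckets = {}
        for obj, c in enumerate(counts):
            self.buckets.setdefault(c, set()).add(obj)
        self.min_count = min(counts) if counts else 0

    def decrement(self, obj):
        c = self.count[obj]
        self.buckets[c].discard(obj)
        self.count[obj] = c - 1
        self.buckets.setdefault(c - 1, set()).add(obj)
        self.min_count = min(self.min_count, c - 1)

    def remove(self, obj):
        self.buckets[self.count[obj]].discard(obj)

    def argmin(self):
        while not self.buckets.get(self.min_count):
            self.min_count += 1               # scan upward past emptied buckets
        return next(iter(self.buckets[self.min_count]))
```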

To summarize, the first iteration of the outer repeat loop takes time, and each subsequent iteration takes time. Since is halved in each iteration, the total time over all iterations is where .

In our application, we need to compute an -net of a multiset . Since the initial halfspace range reporting subproblem can be solved on the set without multiplicities, the running time is still but with .

For , the time bound is , which is still too large. To reduce the running time, we use one additional simple idea: take a random sample of size for a sufficiently large constant . Then . For a fixed point of depth in , the depth of in is with high probability, by the Chernoff bound. We then compute a -net of , which gives us an -net of , with expected weight . The net for is easier to compute, since is reduced to . The final running time for the -net construction is .

We can verify that the net’s weight bound holds (and that all points of are covered), and if not, repeat the algorithm for expected number of trials.

We conclude:

Given points and weighted halfspaces in R^3, of total size n, we can find a subset of halfspaces covering all points, of total weight within an O(1) factor of the minimum, in O(n log^4 n log log n) expected time by a randomized Las Vegas algorithm.

Remark.

A remaining open problem is to find efficient deterministic algorithms for the weighted problem. Chan et al. [chan2012weighted] noted that the quasi-uniform sampling technique can be derandomized via the method of conditional probabilities, but the running time is high.

References