1 Introduction
1.1 Background
Let be a ground set of elements and be a collection of subsets of . A set cover is a subcollection of such that its union is . The set cover problem is to find a minimum cardinality set cover of . It is a classic NP-hard problem that is also hard to approximate to within a factor of $(1-\epsilon)\ln n$ in polynomial time for any $\epsilon > 0$ [11, 9], where $n$ denotes the number of elements in the ground set.
In the online setting [2, 7, 8], the members of are identified a priori, but the elements of the ground set that need to be covered, along with their respective set membership, are revealed sequentially. More precisely, the online set cover problem consists of a game between an algorithm and an oblivious adversary, that is, one which knows the algorithm but not the realization of any random choices (if the algorithm is deterministic, an oblivious adversary is equivalent to an adaptive adversary, which makes requests adaptively in response to the algorithm [4]). The adversary produces, in advance, a sequence of elements of , which it reveals to the algorithm one at a time. Upon the arrival of an element, the algorithm must either conclude that the element is already in the set cover, or irrevocably extend the set cover with a member of containing the element.
Alon et al. [2] gave a deterministic competitive algorithm for the online set cover problem and a nearly matching lower bound for any online algorithm. Buchbinder and Naor [7, 8] later proposed a general scheme for the design and analysis of online algorithms, namely the primal-dual method (a scheme that first appeared in the context of approximation algorithms [15]), and used it to obtain new algorithms for the online set cover problem. Their algorithms generally consist of two phases: i) a deterministic primal-dual subroutine for the fractional online set cover problem, which is optimal up to constant terms, and ii) a randomized rounding procedure whose expected cost is times the cost of the fractional solution, ultimately producing randomized competitive algorithms. The rounding procedure can be derandomized, producing deterministic competitive algorithms.
1.2 Contributions
Herein we introduce the batched set cover problem, which is a generalization of the online set cover problem. However, as in [7, 8], our focus is on its fractional counterpart; this corresponds to phase i) of the primal-dual scheme. We immediately recover the integral case through the rounding procedures in phase ii), which we leave untouched. In essence, the batched set cover problem differs from the online set cover problem in that the adversary produces a sequence of batches of elements of . Thus, the online set cover problem is a special case of the batched set cover problem in which each element revealed by the adversary is its own batch.
Note that the problem we consider is distinct from the capacitated online set cover problem with set requests treated by Bhawalkar et al. [5]. They argue that the uncapacitated problem is not meaningful because the elements in a batch can be thought of as arriving sequentially, whereas we argue that this is not always the case. Our main technical contribution is a tight lower bound on the competitive ratio of any fractional batched algorithm given a parametrized restriction on the adversary. Specifically, if we consider adversaries that are required to produce batches of Vapnik-Chervonenkis (VC) dimension [12] at least , for some , any fractional batched algorithm is competitive. For , this bound is more generous (to the algorithm) than the bound of the online setting [7, 8], which we recover when .
In addition, we propose a minor modification to an online algorithm found in [8] to obtain a dedicated batched algorithm. The main idea is the simultaneous update of the dual variables that correspond to unsatisfied primal constraints, which is reminiscent of a primal-dual algorithm in [14] for the generalized Steiner tree problem. Unfortunately, we are unable to analyze this algorithm in a way that exhibits the effects of the more generous (parametrized) bound. Instead, we provide computational results that suggest that while the greedy strategy proposed in [5] is harmless for the worst-case instance given , it compromises on the competitive ratio obtained on the worst-case instance given . Our experiments suggest that the proposed algorithm improves on the competitive ratio obtained by the greedy strategy by some constant factor.
The significance of this problem stems from the fact that, in some real-world applications, decisions are made after collecting a batch of data. Moreover, in many of these applications, batches of data are rarely produced by an absolute worst-case adversary. The intent of our restricted adversarial model is to mimic the worst-case instances that may effectively arise in the real world. For example, high-capacity ridesharing systems rely on the batch assignment of trip requests to vehicles [3], and some related problems such as that of optimal congregation points for passenger pickups and drop-offs can be modeled as a batched set cover problem. Intuitively, sequencing a batch of travel requests defeats the purpose of preparing the batch in the first place. Moreover, the batches that arise in this setting tend to have a VC-dimension greater than or equal to two, as the application revolves around exploiting the overlaps between distinct requests (see Section 2). Our results formalize this intuition. As listed in [5], further examples of applications of the batched set cover problem may be found in distributed computing, facility planning, and subscription markets.
To the best of our knowledge, we are the first to use the VC-dimension in the context of online (batched) covering problems. The VC-dimension has been used successfully in the context of approximation algorithms for (offline) set cover problems [6, 10], as well as in the context of improved running time bounds for unconstrained [1] and constrained [13] shortest path algorithms. In both of these settings, the algorithms exploit the low VC-dimension of the set systems on which they operate. Perhaps surprisingly, algorithms for the batched set cover problem may instead exploit the high VC-dimension of the set systems on which they operate, which we model as a restriction on the adversary. Intuitively, the reason is that the adversary is forced to reveal complex, intertwined batches. A dedicated algorithm attempts to exploit the richness of the information revealed, while a greedy algorithm is myopic to the interactions between the set memberships of the elements in a batch.
1.3 Organization
In Section 2 we formally introduce our problems and definitions. In Section 3 we consider bounds for the online fractional set cover problem. We present a known lower bound of on the competitive ratio of any online algorithm. While the tightness of this lower bound (up to constants) follows immediately from the existence of competitive fractional algorithms [7, 8], we present an inductive proof that shows the tightness of the lower bound without the explicit need of a competitive algorithm. This technique is used in Section 4 to show the tightness of a lower bound on the competitive ratio of any batched algorithm, given our restricted adversary parametrized by . The reason for doing this is our argument that batching may offer a constant factor improvement in the competitive ratio. Hence, tightness up to constants is not informative enough for our purposes. In Section 5 we formalize the greedy strategy suggested in [5] and present our minor modification to an online algorithm found in [8]. We also present the results of our computational experiments.
2 Preliminaries
Let be set to if is brought to the set cover and to otherwise. Now, consider LP 1, which describes the linear programming relaxation of the offline set cover problem. We refer to LP 1 as the primal covering problem. Here, refers to the cost of bringing some set to the set cover, and the objective is to minimize the total cost incurred. In the unweighted case, for all . Constraints ensure that every element in the ground set is covered. Note that the set membership information of each element is encoded in its respective constraint.
(LP 1)
The primal covering problem has an associated dual packing problem, described in LP 2. We refer to this primal-dual formulation throughout this work. We refer to the collection of sets in that individually contain by .
(LP 2) 
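For reference, a covering-packing pair of this kind is commonly written as follows. The symbols are assumptions made for this sketch and may differ from the notation intended above: $X$ is the ground set, $\mathcal{S}$ the collection of sets, $x_S$ and $c_S$ the variable and cost of set $S$, $y_j$ the dual variable of element $j$, and $\mathcal{S}(j)$ the sets containing $j$.

```latex
% Assumed notation: X ground set, \mathcal{S} collection, c_S costs,
% \mathcal{S}(j) = \{ S \in \mathcal{S} : j \in S \}.
\begin{align*}
\text{(primal covering)} \quad \min \ & \sum_{S \in \mathcal{S}} c_S \, x_S \\
\text{s.t.} \ & \sum_{S \in \mathcal{S}(j)} x_S \ge 1 && \forall j \in X, \\
& x_S \ge 0 && \forall S \in \mathcal{S}. \\[1ex]
\text{(dual packing)} \quad \max \ & \sum_{j \in X} y_j \\
\text{s.t.} \ & \sum_{j \in S} y_j \le c_S && \forall S \in \mathcal{S}, \\
& y_j \ge 0 && \forall j \in X.
\end{align*}
```

In this form each primal constraint corresponds to one element and each dual variable to one element, which is why revealing an element online amounts to revealing one primal constraint and one dual variable.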
In the fractional online setting [7, 8], the objective function of LP 1 is known a priori, but constraints are revealed one by one. This corresponds to the algorithm identifying the costs of the sets in a priori, but the adversary revealing a sequence of elements of , along with their respective set membership, in an online fashion. Equivalently, the right hand side of the constraints of LP 2 are known a priori, but the variables involved in them and in the objective function are revealed one by one.
Now, consider the following batched version of the set cover problem, which is also a game between an algorithm and an oblivious adversary. In the batched set cover problem, is identified a priori, but the adversary produces a sequence of batches of elements of , which it reveals one batch at a time. For instance, , where denotes the size of the th batch. When a batch arrives, all of its elements, along with their respective set membership information, are revealed simultaneously. The fractional batched setting is analogous to the fractional online setting, except constraints appear in tandem. Equivalently, the variables involved in the objective function and constraints are revealed in tandem. Note that the online setting is trivially recovered when each batch is a singleton. We refer to the union of sets in that individually cover the elements in , namely , by .
We define an instance of the online set cover problem as a collection together with the adversarial sequence. We introduce the following performance measures. The batched setting for both of these measures is analogous.
Definition 1
An online algorithm is said to be competitive if for every instance of the problem it outputs a solution of cost at most , where is the cost of the optimal offline solution.
Definition 2
An online adversary is said to be advantaged if it produces an instance such that every online algorithm outputs a solution of cost at least , where is the cost of the optimal offline solution.
Our analysis in Section 4 relies on imposing a minimum on the VC-dimension of any batch produced by the adversary. The VC-dimension was first proposed by Vapnik and Chervonenkis [12], and it is a widely used measure of complexity in computational learning theory. We work with the following definitions.
Definition 3 (Set System)
A set system is a ground set together with a collection of subsets of .
Definition 4 (Shattering)
A subset is said to be shattered by if , where is the power set of .
Definition 5 (VC-dimension)
The VC-dimension of a set system is the cardinality of the largest subset to be shattered by . We denote it by .
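Definitions 4 and 5 can be checked by brute force on small set systems. The following is a minimal sketch, not part of the analysis; the function names `is_shattered` and `vc_dimension` are hypothetical, and the search is exponential in the size of the ground set, so it is only meant for toy examples.

```python
from itertools import combinations


def is_shattered(subset, collection):
    """Return True if every subset of `subset` arises as its intersection
    with some member of `collection` (Definition 4)."""
    subset = frozenset(subset)
    traces = {subset & frozenset(member) for member in collection}
    # Each trace is a subset of `subset`, so the traces form the whole
    # power set exactly when there are 2^|subset| distinct ones.
    return len(traces) == 2 ** len(subset)


def vc_dimension(ground, collection):
    """Size of the largest shattered subset of `ground` (Definition 5).

    Returns 0 when no nonempty subset is shattered.
    """
    ground = list(ground)
    best = 0
    for size in range(1, len(ground) + 1):
        if any(is_shattered(c, collection) for c in combinations(ground, size)):
            best = size
        else:
            # Subsets of a shattered set are shattered, so if no subset of
            # this size is shattered, no larger one can be.
            break
    return best


full = [set(), {1}, {2}, {1, 2}]
disjoint = [{1}, {2}]
single = [{1, 2}]
```

For instance, `vc_dimension({1, 2}, full)` evaluates to 2, whereas two disjoint singletons only shatter one element and a single set shatters none.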
In particular, upon the arrival of a batch we obtain a set system , where is known a priori. Moreover, note that i) restricting the adversary to produce batches with VC-dimension is only meaningful when ; otherwise the adversary is unable to produce any batches, and ii) by definition, any batch satisfying also satisfies . This is illustrated in Figure 1, which showcases how a batch satisfying can be constructed with and . Observe that in each of the cases, is shattered since each of its subsets is the intersection of with some . Of course, given , there may be instances for which , or for which the adversary produces batches satisfying , or both. We consider these cases in our analysis in Section 4.
In a ridesharing context, we may interpret each as a possible congregation point (e.g., an intersection in a street network), whereas each constraint corresponds to the set of compatible congregation points (e.g., within walking distance) for each trip origin or destination. Note that the only reasons why we would have are if i) , or ii) but all the travel requests form either a collection of proper subsets or a collection of completely disjoint subsets of the possible congregation points. Given sufficiently high and heterogeneous demand, batches with are unlikely to arise; the batches that arise look more like those in Figure 1 (c) and (d).
3 Fractional Online Set Cover
Lemma 1 (Variation of Buchbinder and Naor [8])
There exists an instance of the unweighted fractional online set cover problem such that any online algorithm is competitive on this instance.
Proof
Consider the following instance , which is particular to the sequence produced by an adversary in response to an arbitrary online algorithm . Let . Upon the arrival of , must satisfy . Thus, it must let for at least one . Refer to such as and to its corresponding variable as . Now, let . Upon the arrival of , must satisfy . Thus, it must let for at least one . Again, refer to such as and to its corresponding variable as . In general, an adversary may continue revealing elements satisfying
forcing the algorithm to let for at least one . Refer to such as and to its corresponding variable as . After steps, the total cost incurred by the algorithm, namely , is at least
Meanwhile, the total cost incurred by an optimal offline solution is , which corresponds to simply letting . Thus, .
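The adversary strategy in this proof can be simulated on small instances. The sketch below rests on stated assumptions: the sets are indexed `0..m-1`, each revealed element belongs to exactly the sets not yet discarded, the adversary discards the set with the largest fractional value, and the algorithm is an illustrative one that raises all variables in the new constraint equally (the lower bound itself holds for any algorithm). The harmonic sum used for comparison is the standard form of this argument and is our assumption about the bound.

```python
from fractions import Fraction


def uniform_fractional_step(x, active):
    """Illustrative algorithm: raise every variable in the new constraint
    by the same amount until the constraint sums to one."""
    deficit = 1 - sum(x[s] for s in active)
    if deficit > 0:
        for s in active:
            x[s] += deficit / len(active)


def play_adversary(m):
    """Total fractional cost paid against the discarding adversary."""
    x = {s: Fraction(0) for s in range(m)}
    active = set(range(m))
    while active:
        # The revealed element is contained in exactly the active sets.
        uniform_fractional_step(x, active)
        # The adversary discards the set carrying the largest value, so
        # that value never helps with later constraints.
        heaviest = max(active, key=lambda s: x[s])
        active.remove(heaviest)
    return sum(x.values())


def harmonic(m):
    """The m-th harmonic number 1 + 1/2 + ... + 1/m."""
    return sum(Fraction(1, k) for k in range(1, m + 1))
```

Against this adversary the uniform algorithm pays exactly `harmonic(m)`, while the set discarded last contains every revealed element, so an offline solution of cost one exists.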
Note that depends on only in the sense that the particular adversarial sequence produced is a response to the particular algorithm; the lower bound on the competitive factor, on the other hand, is independent of the algorithm. Thus, we may parametrize the instance in Lemma 1 by to obtain the instance class . In other words, refers to the instances that produce a lower bound of on the competitive factor of any online algorithm as a result of the adversary following the strategy in the proof of Lemma 1. If we vary , we obtain the family of instance classes .
The tightness of this lower bound (up to constants) is an immediate result of the existence of competitive algorithms for the fractional online set cover problem [7, 8]. In Lemma 3 we present a different approach to show that this lower bound is tight, this time without relying on a particular algorithm. We use Lemma 2, whose proof is in the Appendix.
Lemma 2
for any integers .
Lemma 3
There exists an online algorithm for the unweighted fractional online set cover problem such that any adversary is advantaged. In particular, this bound is matched by any adversary that follows the strategy in the family of instance classes , described in the proof of Lemma 1.
Proof
By Lemma 1, there exists an adversary that is advantaged, namely one that follows the strategy of instance class . We need to show that this is indeed the best an adversary can be guaranteed to achieve. We prove this by strong induction on and by focusing on an arbitrary adversary . We will make use of the existence of a randomized algorithm that, in principle, produces specific outcomes with nonzero probability.
Base Case (): When reveals any element , it must be the case that . Then, must let , achieving a competitive factor of , as in
Inductive Step: Assume, by way of strong induction, that the statement is true for . We need to show that the statement is true for . By Lemma 1, there exists an adversary that is advantaged, namely one that follows the strategy of instance class . Now, consider the case in which deviates from the strategy of on some arbitrary step . Let be the th element according to the strategy of . If reveals an element such that is empty, the cost of the optimal solution increases by 1, which by Lemma 2 irrevocably decreases the advantage of . Therefore, suppose that reveals an element such that is nonempty. Then, with nonzero probability disregards all , if any, making such deviation futile. Thus, safely assume that instead reveals an element such that . Let , , and note that . Let be the first step after step such that . As before, with nonzero probability disregards all , if any, so safely assume that reveals an element such that . Then, by the inductive hypothesis, given that element satisfied , the best can do is to recreate on the remainder of the steps, starting with . In particular, the best can do is to reveal an element such that and . A symmetric argument can be made about concurrently recreating on the remainder of the steps, which is disjoint from after the th step and hence increases the offline solution by one. However, by Lemma 2, this achieves a strictly lower competitive advantage for .
4 Fractional Batched Set Cover
4.1 General Case
Lemma 4
There exists an instance of the unweighted fractional batched set cover problem such that any batched algorithm is competitive on this instance.
Lemma 5
There exists a batched algorithm for the unweighted fractional batched set cover problem such that any adversary is advantaged.
Lemma 4 follows from the fact that the fractional online set cover problem is a special case of the fractional batched set cover problem, together with Lemma 1. In Section 4.2 we consider the case in which a minimum VC-dimension is imposed on every batch produced by the adversary. In the proof of Lemma 7, we mention why an adversary never benefits from producing batches with a VC-dimension larger than the minimum required. This, together with Lemma 3, yields Lemma 5.
4.2 Restricted Adversary
Our intent now is to characterize instance classes that distinguish the fractional batched set cover problem from the fractional online set cover problem. In particular, we restrict the adversary in that it is forced to produce batches satisfying , for some . Given , we assume that and , as described in Section 2.
Lemma 6
There exists an instance of the unweighted fractional batched set cover problem, with an adversary satisfying for any batch , such that any batched algorithm is competitive on this instance.
Proof
This proof is analogous to that of Lemma 1. Consider the following instance , which is particular to the sequence produced by an adversary satisfying for any batch , in response to an arbitrary batched algorithm . In the following, our ordering of is arbitrary.
Let such that is as in the diagrams in Figure 1 on sets with , in addition to each being also contained in each . This last property cannot decrease the VCdimension, so is a valid batch. Then, because of the constraint corresponding to , must let for at least one . For clarity and without loss of generality, assume such is .
Then, let such that is as in the diagrams in Figure 1 on sets with , in addition to each being also contained in each . Then, because of the constraint corresponding to , must let for at least one . For clarity and without loss of generality, assume such is .
In general, an adversary may continue revealing such that is as in the diagrams in Figure 1 on sets , with , in addition to each being also contained in each . Then, because of the constraint corresponding to , must let for at least one . For clarity and without loss of generality, assume such is .
After the possible steps, the total cost incurred by the algorithm, namely , is at least
Meanwhile, the total cost incurred by an optimal offline solution is , which corresponds to simply letting . Thus, . In simple terms, the adversary may capitalize on while assuming a sunk cost on .
As in Section 3, we parametrize the instances in Lemma 6 by . Thus, given , we obtain the family of instance classes . We recover the general case when . Analogous to Lemma 3, Lemma 7 shows the lower bound given is tight. Its proof can be found in the Appendix.
Lemma 7
There exists a batched algorithm for the unweighted fractional batched set cover problem such that any adversary satisfying for any batch is advantaged. In particular, this bound is matched by any adversary that follows the strategy in the family of instance classes , described in the proof of Lemma 6.
5 Batched Algorithms
5.1 Analysis
Note that since any batch could be artificially decomposed into a sequence of elements that are ‘revealed’ one at a time, any competitive algorithm for online set cover would produce a competitive feasible solution; we refer to this as the ‘trivial’ approach. More precisely, the trivial approach consists of two phases: i) some (possibly randomized) subroutine that executes a mapping , where is the th element of the artificial sequence, followed by ii) any competitive algorithm for the online set cover problem.
An example of such an approach is the competitive primal-dual subroutine in Algorithm 1, which we refer to as , followed by any rounding technique (i.e., the second phase of the primal-dual method). is a minor modification of the competitive Algorithm 2 in Section 4.2 of [8] for the batched setting. Note that . Its correctness follows immediately from Theorem 4.2 of [8]. For conciseness, we only mention that the proof relies on showing three claims together with weak duality: i) the algorithm produces a primal feasible solution to LP 1, ii) the algorithm produces a dual feasible solution to LP 2, and iii) the objective value of LP 1 is bounded above by times the objective value of LP 2. Clearly, the complete algorithm is competitive.
Nevertheless, unless for all , the trivial approach may fail to leverage the rich information that is possibly implicit in the fact that all elements of a given batch are revealed simultaneously. We refer to an algorithm that attempts to leverage any such information as a ‘dedicated’ algorithm. We obtain such an algorithm if we replace with Algorithm 2, which we refer to as .
Note the difference in how the dual variables are updated: sequentially in and simultaneously in . This is reminiscent of the approach of increasing multiple variables at once in a primal-dual algorithm by [14] for the generalized Steiner tree problem. As expected, is also competitive (to be precise, both algorithms are competitive); it is also a minor modification of the competitive Algorithm 2 in Section 4.2 of [8], and its correctness also follows immediately from Theorem 4.2 of [8]. Unfortunately, we are unable to improve its analysis in a way that reflects the intuition that batching should improve the competitive factor obtained under certain conditions. For example, we expect batching to help in the families of instances , shown in Section 4 to produce a more generous lower bound on the competitive factor of any algorithm due to the VC-dimension requirement on the batches produced by the adversary. We leave presenting such an analysis as an open problem. As an alternative, in Section 5.2 we present the results of computational experiments that compare the performance (i.e., competitive factor) of and on instances of .
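The contrast between sequential and simultaneous updates can be sketched in a few lines. This is a hypothetical discretization for the unweighted case, not a transcription of Algorithm 1 or Algorithm 2: the update rule `x += step * (x + 1/d)` is our assumed discretization of a multiplicative primal-dual update in the style of [8], each element is represented by the nonempty frozenset of sets that contain it, and the names `sequential`, `simultaneous`, and `step` are ours.

```python
from collections import defaultdict


def is_covered(x, elem):
    """An element is covered once its sets carry total value at least one."""
    return sum(x[s] for s in elem) >= 1.0


def sequential(batches, step=0.01):
    """Trivial approach: treat each batch as a sequence of elements."""
    x = defaultdict(float)
    for batch in batches:
        for elem in batch:
            while not is_covered(x, elem):
                for s in elem:
                    x[s] += step * (x[s] + 1.0 / len(elem))
    return dict(x)


def simultaneous(batches, step=0.01):
    """Dedicated approach: in every round, all unsatisfied constraints of
    the current batch contribute to the update at once."""
    x = defaultdict(float)
    for batch in batches:
        while True:
            unsatisfied = [e for e in batch if not is_covered(x, e)]
            if not unsatisfied:
                break
            delta = defaultdict(float)
            for elem in unsatisfied:
                for s in elem:
                    # Increments are computed from the values at the
                    # start of the round and applied together below.
                    delta[s] += step * (x[s] + 1.0 / len(elem))
            for s, inc in delta.items():
                x[s] += inc
    return dict(x)


# One batch of two overlapping elements over sets "A".."D".
batch = [frozenset("ABC"), frozenset("BCD")]
seq_x = sequential([batch])
sim_x = simultaneous([batch])
```

Both routines return a feasible fractional cover. In the simultaneous version, sets shared by several unsatisfied elements (here "B" and "C") grow faster within a round, which is the mechanism by which a dedicated algorithm can use the overlap structure of a batch. When every batch is a singleton the two routines perform the same updates.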
5.2 Computational Experiments
Figure 2 (a) and (b) present the results of computational experiments that compare the worst-case performance (i.e., competitive factor) of and obtained on instances of , respectively, for various values of and . We discretize both algorithms with a step size of . We justify the use of instances of by the fact that it is not some arbitrary family of instance classes; Lemma 6 and Lemma 7 imply it produces tight bounds. Also, note that because of the symmetric nature of the batches that arise in , the particular order produced by the mapping in is irrelevant for evaluating the worst-case performance of the trivial approach so long as is the last element ‘revealed’. This notion does not translate to , since the dual variable updates occur simultaneously. Also, recall that for instances in the optimal offline solution is one.
Note that for , both algorithms match the theoretical lower bound. This is expected, as in both of these cases, per the description of , all the batches are singletons. On the other hand, for , neither of the algorithms matches said lower bound. However, it can be observed that is closer than to the theoretical lower bound on the competitive ratio; this is not surprising as the algorithm has only been shown to be optimal up to constants. The improvement is more significant as increases, though it is only by a constant factor; both curves remain logarithmic. These results provide empirical evidence of the benefits of batching when the VC-dimension is high.
Acknowledgements
The first author was funded through the Center for Transportation, Environment, and Community Health (CTECH) as well as a Federal Highway Administration (FHWA) Dwight David Eisenhower Transportation Fellowship.
References
 [1] Abraham, I., Delling, D., Fiat, A., Goldberg, A.V., Werneck, R.F.: VC-dimension and shortest path algorithms. In: International Colloquium on Automata, Languages, and Programming. pp. 690–699. Springer (2011)
 [2] Alon, N., Awerbuch, B., Azar, Y.: The online set cover problem. In: Proceedings of the 35th annual ACM Symposium on the Theory of Computation. pp. 100–105 (2003)
 [3] Alonso-Mora, J., Samaranayake, S., Wallar, A., Frazzoli, E., Rus, D.: On-demand high-capacity ride-sharing via dynamic trip-vehicle assignment. Proceedings of the National Academy of Sciences 114(3), 462–467 (2017). https://doi.org/10.1073/pnas.1611675114
 [4] Ben-David, S., Borodin, A., Karp, R., Tardos, G., Wigderson, A.: On the power of randomization in online algorithms. Algorithmica 11(1), 2–14 (1994)
 [5] Bhawalkar, K., Gollapudi, S., Panigrahi, D.: Online set cover with set requests. In: LIPIcs-Leibniz International Proceedings in Informatics. vol. 28. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2014). https://doi.org/10.4230/LIPIcs.APPROXRANDOM.2014.64
 [6] Brönnimann, H., Goodrich, M.T.: Almost optimal set covers in finite VC-dimension. Discrete & Computational Geometry 14(4), 463–479 (1995)
 [7] Buchbinder, N., Naor, J.: Online primal-dual algorithms for covering and packing problems. In: Proceedings of the 13th Annual European Symposium on Algorithms. pp. 689–701 (2005)
 [8] Buchbinder, N., Naor, J.: The design of competitive online algorithms via a primal-dual approach. Foundations and Trends in Theoretical Computer Science 3(2-3), 93–263 (2009). https://doi.org/10.1561/0400000024
 [9] Dinur, I., Steurer, D.: Analytical approach to parallel repetition. In: Proceedings of the Forty-sixth Annual ACM Symposium on Theory of Computing. pp. 624–633. STOC ’14, ACM (2014). https://doi.org/10.1145/2591796.2591884
 [10] Even, G., Rawitz, D., Shahar, S.M.: Hitting sets when the VC-dimension is small. Information Processing Letters 95(2), 358–362 (2005)
 [11] Feige, U.: A threshold of ln n for approximating set cover. Journal of the ACM 45(4), 634–652 (1998). https://doi.org/10.1145/285055.285059
 [12] Vapnik, V.N., Chervonenkis, A.Y.: On the uniform convergence of relative frequencies of events to their probabilities. In: Measures of complexity, pp. 11–30. Springer (2015)
 [13] Vera, A., Banerjee, S., Samaranayake, S.: Computing constrained shortest-paths at scale (2017)
 [14] Williamson, D.P., Goemans, M.X., Mihail, M., Vazirani, V.V.: A primal-dual approximation algorithm for generalized Steiner network problems. Combinatorica 15(3), 435–454 (1995). https://doi.org/10.1007/BF01299747
 [15] Williamson, D.P., Shmoys, D.B.: The Design of Approximation Algorithms. Cambridge University Press, New York, NY, USA, 1st edn. (2011)
Appendix
5.3 Proof of Lemma 2.
Proof
Assume, without loss of generality, that . Note that
Likewise, note that
Clearly,
so the proof is complete.
5.4 Proof of Lemma 7.
Proof
This proof is analogous to that of Lemma 3. By Lemma 6, there exists an adversary satisfying for any batch that is advantaged, namely one that follows the strategy of instance class . We need to show that this is indeed the best an adversary can be guaranteed to achieve. We prove this by strong induction on and by focusing on an arbitrary batched adversary . We will make use of the existence of a randomized algorithm that, in principle, produces specific outcomes with nonzero probability.
Base Case (). If , the statement is vacuously true because the adversary cannot produce any batches.
Base Case (). When reveals any batch , it must be the case that is as in the diagrams in Figure 1 on sets . Then, with nonzero probability, disregards any . In such case, lets , achieving a competitive factor of . This is in agreement with the description of .
Inductive Step. Assume, by way of strong induction, that the statement is true for . We need to show that the statement is true for . By Lemma 6, there exists an adversary that is advantaged, namely one that follows the strategy of instance class . Now, consider the case in which deviates from the strategy of on some arbitrary step . Let , where , and let be the th batch according to the strategy of . Further, denote , with , as well as , with . If , then the cost of the optimal offline solution increases by 1, which by Lemma 2 irrevocably decreases the advantage of . Therefore, suppose that reveals a batch such that . Moreover, with nonzero probability disregards all , so safely assume this is the case for the rest of the proof. Now, with nonzero probability disregards all , if any, making such deviation futile. Thus, safely assume that , implying that . Let be the first step after step such that . As before, with nonzero probability disregards all , so safely assume that this is the case for the rest of the proof. Then, with nonzero probability also disregards any , if any, so safely assume that . Then, by the inductive hypothesis, given that batch satisfied , the best can do is to recreate on the remainder of the steps, starting with . In particular, this requires and . A symmetric argument can be made about concurrently recreating on the remainder of the steps, which is disjoint from after the th step and hence increases the offline solution by one. However, by Lemma 2, this achieves a strictly lower competitive advantage for .
Note that cannot be guaranteed to benefit from producing batches satisfying because with nonzero probability disregards any , making such deviation futile with nonzero probability. In fact, a larger VC-dimension would involve more sets, possibly making (and hence the attainable competitive advantage) smaller.