The FAST Algorithm for Submodular Maximization

07/14/2019 · Adam Breuer et al. · Harvard University

In this paper we describe a new algorithm called Fast Adaptive Sequencing Technique (FAST) for maximizing a monotone submodular function under a cardinality constraint k whose approximation ratio is arbitrarily close to 1 - 1/e, is O(log(n) log^2(log k)) adaptive, and uses a total of O(n log log(k)) queries. Recent algorithms have comparable guarantees in terms of asymptotic worst-case analysis, but their actual numbers of rounds and queries depend on very large constants and on polynomials in precision and confidence parameters, making them impractical for large data sets. Our main contribution is a design that is extremely efficient both in terms of its non-asymptotic worst-case query complexity and number of rounds, and in terms of its practical runtime. Through experiments on large data sets, we show that this algorithm outperforms every algorithm for submodular maximization we are aware of, including hyper-optimized parallel versions of state-of-the-art serial algorithms: FAST is orders of magnitude faster than the state of the art.


1 Introduction

In this paper we describe a fast parallel algorithm for submodular maximization. Informally, a function is submodular if it exhibits a natural diminishing returns property. For the canonical problem of maximizing a monotone submodular function under a cardinality constraint, it is well known that the greedy algorithm, which iteratively adds the element whose marginal contribution to the solution is largest, obtains a 1 - 1/e approximation guarantee [NWF78], which is optimal for polynomial-time algorithms [nemhauser1978best]. The greedy algorithm and other submodular maximization techniques are heavily used in machine learning and data mining since many fundamental objectives such as entropy, mutual information, graph cuts, diversity, and set cover are all submodular.
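To make the greedy procedure concrete, here is a minimal sketch on a toy max-cover instance (the graph and names are illustrative, not the paper's benchmarks):

```python
def coverage(neighbors, S):
    """Max-cover objective: number of nodes covered by S and its neighbors."""
    covered = set(S)
    for v in S:
        covered |= neighbors[v]
    return len(covered)

def greedy(f, ground_set, k):
    """Iteratively add the element with the largest marginal contribution."""
    S = []
    for _ in range(k):
        best = max((e for e in ground_set if e not in S),
                   key=lambda e: f(S + [e]) - f(S))
        S.append(best)
    return S

# Toy star graph: node 0 neighbors everyone, so greedy picks it first,
# then the isolated node 4 (the only element with positive marginal gain).
neighbors = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}, 4: set()}
f = lambda S: coverage(neighbors, S)
S = greedy(f, list(neighbors), 2)   # -> [0, 4], covering all 5 nodes
```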

In recent years there has been a great deal of progress on fast algorithms for submodular maximization designed to accelerate computation on large data sets. The first line of work considers serial algorithms where queries can be evaluated on a single processor [LKGFVG07, BV14, mirzasoleiman2015lazier, mirzasoleiman2016fast, EN19-1, EN19-2]. For serial algorithms, the state of the art for maximization under a cardinality constraint is the lazier-than-lazy-greedy (LTLG) algorithm, which returns a solution whose value is in expectation arbitrarily close to 1 - 1/e of the optimum and does so in a linear number of queries [mirzasoleiman2015lazier]. This algorithm is a stochastic greedy algorithm coupled with lazy updates; it not only performs well in terms of the quality of the solution it returns but is also very fast in practice.
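The core of LTLG is the stochastic greedy rule from [mirzasoleiman2015lazier]: at each of the k steps, only a random sample of roughly (n/k)·ln(1/ε) elements is scanned for the best marginal gain. A minimal sketch (the objective and data in the usage example are illustrative assumptions):

```python
import math
import random

def stochastic_greedy(f, ground_set, k, eps=0.1, seed=0):
    """Greedy over a random sample of size (n/k) * ln(1/eps) per step."""
    rng = random.Random(seed)
    n = len(ground_set)
    s = min(n, int(math.ceil((n / k) * math.log(1 / eps))))
    S, remaining = [], set(ground_set)
    for _ in range(k):
        sample = rng.sample(sorted(remaining), min(s, len(remaining)))
        best = max(sample, key=lambda e: f(S + [e]) - f(S))
        S.append(best)
        remaining.discard(best)
    return S

# Illustrative modular objective (weights are hypothetical, not paper data).
weights = {i: float(i) for i in range(10)}
f = lambda S: sum(weights[e] for e in set(S))
S = stochastic_greedy(f, list(range(10)), 3)
```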

Accelerating computation beyond linear runtime requires parallelization. The parallel runtime of blackbox optimization is measured by adaptivity: the number of sequential rounds an algorithm requires when polynomially-many queries can be executed in parallel in every round. For maximizing a submodular function defined over a ground set of n elements under a cardinality constraint k, the adaptivity of the naive greedy algorithm is k, which in the worst case is Ω(n). Until recently, no algorithm was known to have better parallel runtime than that of naive greedy.
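The adaptivity measure can be made concrete with a small harness of our own (not from the paper) that counts each batch of simultaneously issued queries as one round; naive greedy then uses exactly k rounds:

```python
class CountingOracle:
    """Counts adaptive rounds: one batch of parallel queries = one round."""
    def __init__(self, f):
        self.f = f
        self.rounds = 0
        self.queries = 0

    def query_batch(self, sets):
        self.rounds += 1
        self.queries += len(sets)
        return [self.f(S) for S in sets]

# Naive greedy issues one batch per added element -> k adaptive rounds.
f = lambda S: len(set(S))   # trivial illustrative objective
oracle = CountingOracle(f)
ground, k, S = list(range(6)), 3, []
for _ in range(k):
    cands = [e for e in ground if e not in S]
    vals = oracle.query_batch([S + [e] for e in cands])
    S.append(cands[max(range(len(cands)), key=vals.__getitem__)])
```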

A very recent line of work initiated by Balkanski and Singer [BS18a] develops techniques for designing constant factor approximation algorithms for submodular maximization whose parallel runtime is logarithmic [BS18b, BRS19, EN19, FMZ19, FMZ18, BBS18, KMZLK19, chekuri2018submodular, BRS19b, chekuri2018matroid, ene2019, chen2019, FMZ19]. In particular, [BS18a] describe a technique called adaptive sampling that obtains, in O(log n) rounds, an approximation arbitrarily close to 1/3 for maximizing a monotone submodular function under a cardinality constraint. This technique can be used to produce solutions arbitrarily close to the optimal 1 - 1/e in O(log n) rounds [BRS19, EN19].

1.1 From theory to practice

The focus of the work on adaptive complexity described above has largely been on conceptual and theoretical contributions: achieving strong approximation guarantees under various constraints with runtimes that are exponentially faster under worst-case theoretical analysis. From a practitioner's perspective, however, even the state-of-the-art algorithms in this genre are infeasible for large data sets. Their logarithmic parallel runtime carries extremely large constants and polynomial dependencies on precision and confidence parameters that are hidden in the asymptotic analysis. In terms of sample complexity alone, obtaining a constant-factor approximation with reasonable confidence requires evaluating an astronomically large number of sample sets in every round [BRS19, FMZ19, chekuri2018submodular]. Even if one heuristically uses a single sample in every round, other sources of inefficiency, which we discuss throughout the paper, prevent these algorithms from being applied even on moderate-sized data sets. The question is then whether the plethora of breakthrough techniques in this line of work on exponentially faster algorithms for submodular maximization can lead to algorithms that are fast in practice on large problem instances.

1.2 Our contribution

In this paper we design a new algorithm called Fast Adaptive Sequencing Technique (Fast) whose approximation ratio is arbitrarily close to 1 - 1/e, which has O(log(n) log^2(log k)) adaptivity, and which uses a total of O(n log log(k)) queries for maximizing a monotone submodular function under a cardinality constraint k. The main contribution is not the algorithm's asymptotic guarantees but a design that is extremely efficient both in terms of its non-asymptotic worst-case query complexity and number of rounds, and in terms of its practical runtime. In terms of actual query complexity and practical runtime, this algorithm outperforms every algorithm for submodular maximization we are aware of, including hyper-optimized versions of LTLG. To be concrete, the table below gives a brief experimental comparison for a max-cover objective on a Watts-Strogatz random graph against optimized implementations of algorithms with the same adaptivity and approximation guarantee (experiment details in Section 4).

                                                     rounds    queries   time (sec)
  Amortized-Filtering [BRS19]                           540      35471         0.58
  Exhaustive-Maximization [FMZ19]                     12806    4845205        55.14
  Randomized-Parallel-Greedy [chekuri2018submodular]     66      81648         1.36
  Fast                                                   18       2497        0.051

Fast achieves its speedup through careful design that results in frugal worst-case query complexity, together with several heuristics used for practical speedups. From a purely analytical perspective, Fast improves the dependency on ε in the linear term of the query complexity relative to [BRS19, EN19] and [FMZ19]. We provide the first non-asymptotic bounds on the query and adaptive complexity of an algorithm with sublinear adaptivity, showing dependency on small constants. In Appendix A, we compare the query and adaptive complexity achieved by Fast to previous work. Our algorithm uses adaptive sequencing [BRS19b] together with multiple optimizations that improve the query complexity and runtime.

1.3 Paper organization

We introduce the main ideas and decisions behind the design of Fast in Section 2. We describe the algorithm and analyze its guarantees in Section 3. We discuss experiments in Section 4.

2 Fast Overview

Before describing the algorithm, we give an overview of the major ideas and discuss how they circumvent the bottlenecks for practical implementation of existing logarithmic adaptivity algorithms.

Adaptive sequencing vs. adaptive sampling.

The large majority of low-adaptivity algorithms use adaptive sampling [BS18a, EN19, FMZ19, FMZ18, BBS18, BRS19, KMZLK19], a technique introduced in [BS18a]. These algorithms sample a large number of sets of elements at every iteration to estimate (1) the expected contribution of a random set R to the current solution S and (2) the expected contribution of each element to S. These estimates, which rely on concentration arguments, are then used to either add a random set to S or discard elements with low expected contribution to S. In contrast, the adaptive sequencing technique, which was recently introduced in [BRS19b], generates at every iteration a single random sequence of the elements not yet discarded. A prefix of the sequence is then added to the solution S, where the prefix ends at the largest position i* such that a large fraction of the surviving elements have high contribution to the solution including the prefix. Elements with low contribution to the new solution are then discarded from further consideration.
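A single iteration of this scheme can be sketched as follows (a simplification with our own parameter names; see [BRS19b] for the precise prefix rule):

```python
import random

def one_iteration(f, S, X, t, eps, rng):
    """One adaptive-sequencing step: draw a random sequence of survivors X,
    take the largest prefix after which at least a (1 - eps) fraction of X
    still has marginal contribution >= t, then discard low-value survivors."""
    seq = list(X)
    rng.shuffle(seq)                      # a single random sequence
    i_star = 0
    for i in range(len(seq) + 1):
        prefix = S + seq[:i]
        good = [a for a in X if f(prefix + [a]) - f(prefix) >= t]
        if len(good) >= (1 - eps) * len(X):
            i_star = i                    # this prefix still looks valuable
        else:
            break
    S = S + seq[:i_star]
    # discard survivors whose contribution to the new solution is low
    X = [a for a in X if f(S + [a]) - f(S) >= t]
    return S, X

# Illustrative modular objective (weights are hypothetical).
w = {1: 2, 2: 2, 3: 0}
f = lambda S: sum(w[e] for e in set(S))
S2, X2 = one_iteration(f, [], [1, 2, 3], t=1, eps=0.5, rng=random.Random(0))
```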

The first choice we made was to use an adaptive sequencing technique rather than adaptive sampling.

  • Dependence on large polynomials in 1/ε. Adaptive sampling algorithms crucially rely on sampling, and as a result their query complexity has a high polynomial dependency on 1/ε [BRS19, EN19]. In contrast, adaptive sequencing generates a single random sequence at every iteration, so the 1/ε-dependence of the term that is linear in n is small;

  • Dependence on large constants. The asymptotic query complexity of previous algorithms depends on very large constants [BRS19, EN19], making them impractical. As we tried to optimize constants for adaptive sampling, we found that, due to the sampling and the requirement to maintain strong theoretical guarantees, the constants cascade and grow through multiple parts of the analysis. In principle, adaptive sequencing does not rely on sampling, which dramatically reduces the dependency on constants.

Negotiating adaptive complexity with query complexity.

The vanilla version of our algorithm, whose description and analysis are in Appendix B, has logarithmically many adaptive rounds and obtains an approximation arbitrarily close to 1 - 1/e, without additional dependence on constants or lower-order terms, but at the cost of a higher query complexity. In our actual algorithm, we trade a small factor in adaptive complexity for a substantial improvement in query complexity. We do this in the following manner:

  • Search for OPT estimates. All algorithms with logarithmic adaptivity require a good estimate of OPT, which can be obtained by running instances of the algorithm with different guesses of OPT in parallel, so that one guess is guaranteed to be a good approximation to OPT ([FMZ19] does some preprocessing to estimate OPT, but only within a very large constant factor). We accelerate this search by binary searching over the guesses of OPT. A main difficulty for this binary search is that the approximation guarantee of each solution needs to hold with high probability instead of in expectation. Even though the marginal contributions obtained from each element added to the solution only hold in expectation for adaptive sequencing, we obtain high probability guarantees for the global solution by generalizing the robust guarantees of [HS17]. In the practical speedups below, we discuss how we often need only a single iteration of this binary search in practice;

  • Search for position i*. To find the position i*, which is the largest position such that a large fraction of the not-yet-discarded elements have high contribution to the solution including the prefix, the vanilla adaptive sequencing technique queries the contribution of all surviving elements at each of the k positions, which is the source of its high query complexity. Instead, similarly to guessing OPT, we binary search over a set of geometrically increasing candidate positions, which reduces the number of positions evaluated from linear to logarithmic in k. Then, at any step of the binary search over a position i, instead of evaluating the contributions of all surviving elements, we only evaluate a small sample of them. In the practical speedups below, we discuss how we can often skip this binary search for i* in practice.
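The two searches above can be sketched as follows (a simplification with hypothetical helper names): generate the geometric grid of candidate positions, then binary search for the largest position satisfying a monotone "still contributes" predicate:

```python
import math

def geometric_positions(k, eps=0.25):
    """Positions 1, (1+eps), (1+eps)^2, ... capped at k, deduplicated."""
    pos, i = [], 0
    while True:
        p = min(k, int(math.ceil((1 + eps) ** i)))
        if not pos or p > pos[-1]:
            pos.append(p)
        if p == k:
            return pos
        i += 1

def largest_good_position(positions, is_good):
    """Binary search the largest position satisfying a monotone predicate
    (True on a prefix of the positions, then False afterwards)."""
    lo, hi, best = 0, len(positions) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_good(positions[mid]):
            best, lo = positions[mid], mid + 1
        else:
            hi = mid - 1
    return best
```

For instance, with k = 16 and eps = 1 the grid is [1, 2, 4, 8, 16], and only O(log k) positions are ever evaluated instead of all k.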

Practical speedups.

We include several ideas which result in considerable speedups in practice without sacrificing approximation, adaptivity, or query complexity guarantees:

  • Preprocessing the sequence. At the outset of each iteration of the algorithm, before searching for a prefix to add to the solution S, we first use a preprocessing step that adds high-value elements from the sequence to S. Specifically, we add to the solution all sequence elements that have high contribution to S. After adding these high-value elements, we discard surviving elements that have low contribution to the new solution. In the case where this step discards a large fraction of the surviving elements, we can also skip this iteration's binary search for i* and continue to the next iteration without adding a prefix to S;

  • Number of elements added per iteration. An adaptive sampling algorithm adds at most as many elements to the current solution per iteration as the size of the sets it samples. In contrast, adaptive sequencing and the preprocessing step described above often allow our algorithm to add a very large number of elements to the current solution at each iteration in practice;

  • Single iteration of the binary search for OPT. Even with binary search, running multiple instances of the algorithm with different guesses of OPT is undesirable. We describe a technique which often needs only a single guess of OPT. This guess is the sum of the k highest-valued singletons, which is an upper bound on OPT by submodularity. If the solution obtained with that guess has sufficiently high value relative to the guess, then, since the guess upper-bounds OPT, the solution is guaranteed to achieve the desired approximation and the algorithm does not need to continue the binary search. Note that with a single guess of OPT, the robust guarantees for the binary search are not needed, which improves the sample complexity;
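The single-guess heuristic can be sketched directly; all singleton values can be computed in one parallel round, and their top-k sum upper-bounds OPT by submodularity (the function name is ours):

```python
import heapq

def singleton_upper_bound(f, ground_set, k):
    """Sum of the k highest singleton values: an upper bound on OPT,
    since by submodularity f(O) <= sum of f({a}) over the k elements of O."""
    singletons = {e: f([e]) for e in ground_set}   # one parallel round
    return sum(heapq.nlargest(k, singletons.values()))

# Illustrative modular objective: the bound is exactly OPT here.
f = lambda S: sum(set(S))
bound = singleton_upper_bound(f, [1, 2, 3, 4, 5], 2)   # 5 + 4 = 9
```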

  • Lazy updates. There are many situations where lazy evaluations of marginal contributions can be performed [minoux1978accelerated, mirzasoleiman2015lazier]. Since we never discard elements from the solution S, the contributions of elements to S are non-increasing across iterations by submodularity. Elements whose contribution to the current solution falls below the current threshold at some iteration can therefore be ignored until the threshold is lowered. Lazy updates also accelerate the binary search over positions.
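The classic lazy pattern behind these updates [minoux1978accelerated] can be sketched with a max-heap of stale upper bounds (a generic lazy-greedy sketch, not the paper's exact bookkeeping):

```python
import heapq

def lazy_greedy(f, ground_set, k):
    """Greedy with lazy evaluations: cached marginal values are valid upper
    bounds (submodularity), so an element is re-evaluated only when its
    stale bound is the best remaining one."""
    heap = [(-f([e]), e) for e in ground_set]   # initial bounds: singletons
    heapq.heapify(heap)
    S, fS = [], 0.0
    while len(S) < k and heap:
        neg_bound, e = heapq.heappop(heap)
        gain = f(S + [e]) - fS                  # refresh the stale bound
        if not heap or gain >= -heap[0][0]:
            S.append(e)                         # still the best: take it
            fS += gain
        else:
            heapq.heappush(heap, (-gain, e))    # reinsert with fresh bound
    return S

# Illustrative modular objective (weights are hypothetical).
w = {0: 3.0, 1: 2.0, 2: 1.0}
f = lambda S: sum(w[e] for e in S)
S = lazy_greedy(f, [0, 1, 2], 2)   # -> [0, 1]
```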

3 The Algorithm

We describe the Fast-Full algorithm (Algorithm 1). The main part of the algorithm is the Fast subroutine (Algorithm 2), which is instantiated with different guesses of OPT. These guesses increase geometrically by a (1 + ε) factor, so the set of guesses contains a value that is a (1 - ε)-approximation to OPT. The algorithm binary searches over the guesses for the largest guess that yields a solution whose value certifies the desired approximation.
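The grid of guesses can be sketched as follows, using the facts that the largest singleton value lower-bounds OPT and, by submodularity and monotonicity, k times it upper-bounds OPT (function and parameter names are ours):

```python
def opt_guesses(max_singleton, k, eps=0.2):
    """Geometric grid of guesses for OPT: from the largest singleton value
    (a lower bound on OPT) up to k * max_singleton (an upper bound)."""
    v, guesses = max_singleton, []
    while v < k * max_singleton:
        guesses.append(v)
        v *= (1 + eps)
    guesses.append(k * max_singleton)
    return guesses

# With eps = 1 the grid doubles: some guess is within a factor 2 of OPT.
grid = opt_guesses(1.0, 4, eps=1.0)   # -> [1.0, 2.0, 4.0]
```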

  input: function f, cardinality constraint k, parameter ε
  V ← geometrically increasing guesses for OPT
  v* ← Binary-Search over V for the largest v ∈ V whose solution S_v ← Fast(f, k, v, ε) has sufficiently high value f(S_v) relative to v
  return S_{v*}
Algorithm 1 Fast-Full: the full algorithm

Fast generates at every iteration a uniformly random sequence a_1, …, a_{|X|} of the set X of elements not yet discarded. After the preprocessing step, which adds to the solution S elements guaranteed to have high contribution, the algorithm identifies a position i* in this sequence which determines the prefix a_1, …, a_{i*} that is added to the current solution S. Position i* is defined as the largest position such that a large fraction of the elements in X have high contribution to S ∪ {a_1, …, a_{i*}}. To find i*, we binary search over geometrically increasing positions. At each position i considered, we only evaluate the contributions of the elements of R, where R is a small uniformly random subset of X, instead of all elements of X.

  input: function f, cardinality constraint k, guess v for OPT, parameter ε
  S ← ∅
  while |S| < k and the number of iterations is below the adaptivity bound do
       X ← set of surviving elements; set the threshold t from v, f(S), and k
       while X ≠ ∅ and |S| < k do
            a_1, …, a_{|X|} ← uniformly random sequence of the elements of X
            add to S the sequence elements with high contribution to S (preprocessing)
            discard from X the elements with low contribution to S
            if a large fraction of X was discarded then continue to the next iteration
            i* ← Binary-Search over geometrically increasing positions i for the largest i such that a large fraction of a random sample R ⊆ X has high contribution to S ∪ {a_1, …, a_i}
            S ← S ∪ {a_1, …, a_{i*}}
  return S
Algorithm 2 Fast: the Fast Adaptive Sequencing Technique algorithm

3.1 Analysis

We show that Fast obtains an approximation arbitrarily close to 1 - 1/e with high probability, and we bound its adaptive complexity and query complexity.

[Theorem] Under mild assumptions relating k, ε, and the confidence parameter δ, Fast has at most O(log(n) log^2(log k)) adaptive rounds, O(n log log(k)) queries, and achieves a 1 - 1/e - ε approximation with probability 1 - δ; the exact dependence on ε and δ is made explicit in Appendix C.

We defer the analysis to Appendix C. The main part of it is the approximation guarantee, which consists of two cases depending on the condition which breaks the outer loop. Lemma 3 shows that when the outer loop runs its full number of iterations, the set of elements added to S at every iteration of the outer loop contributes a significant fraction of the remaining value OPT - f(S). Lemma 5 shows that, for the case where |S| = k, the expected contribution of each element added to S is arbitrarily close to (OPT - f(S))/k. For each solution, we need the approximation guarantee to hold with high probability instead of in expectation to be able to binary search over guesses for OPT, which we obtain in Lemma 7 by generalizing the robust guarantees of [HS17] in Lemma 6. The main observation for the adaptive complexity (Lemma C.1) is that, by definition of i*, at least an ε fraction of the surviving elements are discarded at every iteration with high probability. (To obtain the adaptivity bound together with the approximation guarantee, the algorithm declares failure after too many rounds and accounts for this failure probability in δ.) For the query complexity (Lemma C.1), we note that the number of function evaluations per iteration is small.

4 Experiments

Our goal in this section is to show that in practice, Fast finds solutions whose value meets or exceeds the alternatives in less parallel runtime than both state-of-the-art low-adaptivity algorithms and Lazier-than-Lazy-Greedy. To accomplish this, we build optimized parallel MPI implementations of Fast, other low-adaptivity algorithms, and Lazier-than-Lazy-Greedy, which is widely regarded as the fastest algorithm for submodular maximization in practice. We then use Intel Skylake-SP processors on AWS to compare the algorithms' runtimes over a variety of objectives defined on real and synthetic datasets. We measure runtime using a rigorous measure of parallel time (see Appendix D.7). Appendices D.1, D.3, D.8, and D.5 contain detailed descriptions of the benchmarks, objectives, implementations, hardware, and experimental setup on AWS.

We conduct two sets of experiments. The first set compares Fast to previous low-adaptivity algorithms. Since these algorithms all have practically intractable sample complexity, we grossly reduce their sample complexity so that each processor performs a single function evaluation per iteration. This reduction, which we discuss in detail below, gives these algorithms a large runtime advantage over Fast, which computes its full theoretical sample complexity in these experiments. This is practically feasible for Fast because Fast samples elements, not sets of elements like the other low-adaptivity algorithms. Despite the large advantage this setup gives to the other low-adaptivity algorithms, Fast is consistently one to three orders of magnitude faster.

The second set of experiments compares Fast to Parallel-Lazier-than-Lazy-Greedy (Parallel-LTLG). We scale up the objectives to be defined over large synthetic and real data sets with a wide range of values of k. We find that Fast is consistently faster than Parallel-LTLG and that its runtime advantage increases in k. These relative runtimes are a loose lower bound on Fast's performance advantage, as Fast can reap additional speedups by adding processors, whereas Parallel-LTLG performs only a bounded number of function evaluations per iteration, so adding processors beyond that number does not help. In Section 4.2, we show that on many objectives Fast is faster even with only a single processor.

4.1 Experiment set 1: Fast vs. low-adaptivity algorithms

Our first set of experiments compares Fast to state-of-the-art low-adaptivity algorithms. To accomplish this, we built optimized parallel MPI versions of each of the following algorithms: Randomized-Parallel-Greedy [chekuri2018submodular], Exhaustive-Maximization [FMZ19], and Amortized-Filtering [BRS19]. For any given ε, all these algorithms achieve an approximation arbitrarily close to 1 - 1/e in a logarithmic number of rounds. For calibration, we also ran (1) Parallel-Greedy, a parallel version of the standard Greedy algorithm, as a heuristic upper bound for the objective value, as well as (2) Random, an algorithm that simply selects k elements uniformly at random.

A fair comparison of the algorithms' parallel runtimes and solution values would run each algorithm with parameters that yield the same guarantees, for example the same approximation with the same probability. However, this is infeasible since the other low-adaptivity algorithms all require a practically intractable number of queries per round to achieve any reasonable guarantees; even with modest parameters, every round of Amortized-Filtering would require an enormous number of samples.

Dealing with practically intractable query complexity of benchmarks.

To run the other low-adaptivity algorithms despite their huge sample complexity, we made two major modifications:

  1. Accelerating subroutines. We optimize each of the three other low-adaptivity benchmarks by implementing parallel binary search to replace brute-force search, along with several other modifications that reduce unnecessary queries (for a full description of these fast implementations, see Appendix D.9). These optimizations reduce the benchmarks' runtimes by an order of magnitude in practice, and the optimized implementations are publicly available in our code base. Despite this, it remains practically infeasible to compute these algorithms' required number of samples even on small problems;

    Figure 1: Experiment Set 1.a: Fast vs. low-adaptivity algorithms on graphs (time axis log-scaled).
  2. Using a single query per processor. Since our interest is in comparing runtime and not quality of approximation, we dramatically lowered the number of queries the three benchmark algorithms require to achieve their guarantees. Specifically, we set the parameters for both Fast and the three low-adaptivity benchmarks such that all algorithms guarantee the same approximation with the same probability (see Appendix D.2). However, for the low-adaptivity benchmarks, we reduce the theoretical sample complexity in each round to exactly one sample per processor. This reduction allows the benchmarks to have each processor perform a single function evaluation per round instead of many, which 'unfairly' accelerates their runtimes at the expense of their approximations. We do not perform this reduction for Fast: instead, we require Fast to compute the full count of samples needed for its guarantees. This is feasible since Fast samples elements and not sets.

Data sets.

Even with these modifications, tractability restricted us to small data sets:

  • Experiments 1.a: synthetic data sets. To compare the algorithms' runtimes under a range of conditions, we solve max cover on synthetic graphs generated via four different well-studied graph models: Stochastic Block Model (SBM); Erdős-Rényi (ER); Watts-Strogatz (WS); and Barabási-Albert (BA). See Appendix D.3.1 for additional details;

  • Experiments 1.b: real data sets. To compare the algorithms' runtimes on real data, we optimize Sensor Placement on California roadway traffic data; Movie Recommendation on MovieLens data; Revenue Maximization on YouTube Network data; and Influence Maximization on Facebook Network data. See Appendix D.3.3 for additional details.

Figure 2: Experiment Set 1.b: Fast vs. low-adaptivity algorithms on real data (time axis log-scaled).
Results of experiment set 1.

Figures 1 and 2 plot all algorithms' solution values and parallel runtimes on synthetic and real data. In terms of solution values, across all experiments, the values obtained by Fast are nearly indistinguishable from the values obtained by Greedy, the heuristic upper bound. From this comparison, it is clear that Fast does not compromise on the values of its solutions. In terms of runtime, Fast is consistently orders of magnitude faster than Exhaustive-Maximization and substantially faster than Randomized-Parallel-Greedy and Amortized-Filtering across the objectives and values of k (the time axes of Figures 1 and 2 are log-scaled). We emphasize that Fast's faster runtimes were obtained despite the fact that the three other low-adaptivity algorithms used only a single sample per processor at each of their iterations.

4.2 Experiment set 2: Fast vs. Parallel-Lazier-than-Lazy-Greedy

Our second set of experiments compares Fast to a parallel version of Lazier-than-Lazy-Greedy (LTLG) [mirzasoleiman2015lazier], which is widely regarded as the fastest algorithm for submodular maximization in practice. Specifically, we build an optimized, scalable, truly parallel MPI implementation of LTLG, which we refer to as Parallel-LTLG (see Appendix D.10).

This allows us to scale up to large random graphs, large real data sets, and a wide range of values of k. For these large experiments, running the parallel Greedy algorithm is impractical. LTLG has an approximation guarantee arbitrarily close to 1 - 1/e in expectation, so we likewise set both algorithms' parameters to guarantee the same approximation in expectation (see Appendix D.2).

Figure 3: Experiment Set 2.a: Fast (blue)  vs. Parallel-LTLG (red) on graphs.
Figure 4: Experiment Set 2.b: Fast (blue)  vs. Parallel-LTLG (red) on real data.
Results of experiment set 2.

Figures 3 and 4 plot solution values and runtimes for large experiments on synthetic and real data. In terms of solution values, while the two algorithms achieved similar solution values across all 8 experiments, Fast obtained slightly higher solution values than Parallel-LTLG on the majority of objectives and values of k we tried.

In terms of runtime, Fast was consistently faster than Parallel-LTLG on each of the objectives and all values of k we tried. More importantly, the runtime disparities between Fast and Parallel-LTLG increase with larger k, so larger problems exhibit even greater runtime advantages for Fast.

Furthermore, we emphasize that because the per-iteration sample complexity of Parallel-LTLG is smaller than the number of available processors for many experiments, it cannot achieve better runtimes by using more processors, whereas Fast can leverage additional processors to achieve further speedups. Therefore, Fast's relative runtimes are a loose lower bound for what can be obtained on larger-scale hardware and problems. Figure 5 plots Fast's parallel speedups versus the number of processors we use.

Finally, we note that even when running the algorithms on a single processor, Fast is faster than LTLG for reasonable values of k on most of the objectives, due to the fact that Fast often uses fewer queries (see Appendix D.11). For example, Figure 5 plots single-processor runtimes for the YouTube experiment.

Figure 5: Single processor runtimes for Fast and Parallel-LTLG, and parallel speedups vs. number of processors for Fast for the YouTube experiment.

Acknowledgments

This research was supported by NSF grant 1144152, NSF grant 1647325, a Google PhD Fellowship, NSF grant CAREER CCF 1452961, NSF CCF 1816874, BSF grant 2014389, NSF USICCS proposal 1540428, a Google Research award, and a Facebook research award.

References

Appendix

Appendix A Comparison of Query and Adaptive Complexity with Previous Work

We compare the query and adaptive complexity achieved by Fast to previous work. As mentioned in the introduction, Fast improves the dependency on ε in the linear term of the query complexity relative to [BRS19, EN19] and [FMZ19]. We provide the first non-asymptotic bounds on the query and adaptive complexity of an algorithm with sublinear adaptivity, showing dependency on small constants. In Table 1, we compare our (asymptotic) bounds to the query and adaptive complexity achieved in previous work.

Table 1 compares the query complexity and adaptivity of Amortized-Filtering [BRS19], Exhaustive-Maximization [FMZ19], Randomized-Parallel-Greedy [chekuri2018submodular], and Fast, in order to obtain the same approximation guarantee with the same probability.

Appendix B Vanilla Adaptive-Sequencing

We begin by describing a simplified version of the algorithm, Adaptive-Sequencing. This algorithm has low adaptivity and query complexity without additional dependence on large constants, and it obtains an approximation arbitrarily close to 1 - 1/e in expectation. Importantly, it assumes the value OPT of the optimal solution is known. The full algorithm is an optimized version which does not assume OPT is known and improves the query complexity.

b.1 Description of Adaptive-Sequencing

Adaptive-Sequencing, formally described below as Algorithm 3, generates at every iteration a random sequence of elements that is used both to add elements to the current solution and to discard elements from further consideration. More precisely, each element a_i in the sequence is a uniformly random element from the set of surviving elements X, which initially contains all elements. The algorithm identifies a position i* in this sequence which determines the elements a_1, …, a_{i*} that are added to the current solution S, as well as the elements with low contribution to S that are discarded from X. This position i* is defined to be the smallest position such that at least an ε fraction of the elements in X have low contribution to S ∪ {a_1, …, a_{i*}}. By the minimality of i*, we simultaneously obtain that (1) the elements added to S, which are the elements before position i*, are likely to contribute high value to the solution and (2) at least an ε fraction of the surviving elements have low contribution to the new solution and are discarded.

The algorithm iterates until there are no surviving elements left in X. It then lowers the threshold separating high and low contributions and reinitializes the surviving elements to be all elements not in the solution. The algorithm lowers the threshold a logarithmic number of times and then returns the solution obtained.

  input: function f, cardinality constraint k, parameter ε, value of optimal solution OPT
  S ← ∅
  for logarithmically many threshold values t, decreasing geometrically from an initial value determined by OPT and k, do
       X ← set of elements not in S
       while X ≠ ∅ and |S| < k do
            a_1, …, a_{|X|} ← uniformly random sequence of the elements of X
            X_i ← {a ∈ X : the marginal contribution of a to S ∪ {a_1, …, a_i} is at least t}
            i* ← smallest position i such that |X_i| ≤ (1 - ε)|X|
            S ← S ∪ {a_1, …, a_{i*}}
            X ← X_{i*}
  return S
Algorithm 3 Adaptive-Sequencing
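A compact executable sketch of this routine, with the fractional prefix rule simplified to a per-element threshold check and an illustrative threshold schedule (parameter choices are ours, not the paper's exact ones):

```python
import math
import random

def adaptive_sequencing(f, V, k, opt, eps=0.25, seed=0):
    """Simplified sketch: lower a threshold t geometrically; at each level,
    repeatedly shuffle the survivors, greedily take the longest prefix whose
    elements each clear t, and discard survivors below t."""
    rng = random.Random(seed)
    S, t = [], opt / k                                  # initial threshold
    rounds = int(math.ceil(math.log(1 / eps) / eps)) + 1
    for _ in range(rounds):                             # threshold schedule
        X = [e for e in V if e not in S]
        while X and len(S) < k:
            seq = list(X)
            rng.shuffle(seq)                            # one random sequence
            i = 0
            while (i < len(seq) and len(S) + i < k
                   and f(S + seq[:i + 1]) - f(S + seq[:i]) >= t):
                i += 1
            S = S + seq[:i]                             # add the prefix
            # discard survivors whose marginal contribution falls below t
            X = [e for e in X if e not in S and f(S + [e]) - f(S) >= t]
        if len(S) == k:
            break
        t *= (1 - eps)                                  # lower the threshold
    return S

# Illustrative modular objective (weights are hypothetical).
w = {e: float(e) for e in range(1, 6)}
f = lambda S: sum(w[e] for e in set(S))
S = adaptive_sequencing(f, list(w), 2, opt=9.0)   # -> [5, 4]
```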

b.2 Analysis of Adaptive-Sequencing

b.2.1 The adaptive complexity and query complexity

The main observation used to bound the number of iterations of the algorithm is that, by definition of i*, at least an ε fraction of the surviving elements in X are discarded at every iteration. Since the queries at every iteration of the inner loop can be evaluated in parallel, the adaptive complexity is the total number of iterations of this inner loop. For the query complexity, we note that the number of function evaluations per iteration is proportional to the number of surviving elements times the number of positions considered.

[Lemma] The adaptive complexity of Adaptive-Sequencing is at most O(ε^{-1} log n) rounds per threshold value. Its query complexity is bounded by the number of inner-loop iterations times the per-iteration function evaluations computed below.

Proof.

We first analyze the adaptive complexity and then the query complexity.

The adaptive complexity.

The algorithm consists of an outer loop and an inner loop. We first argue that, at any iteration of the outer loop, there are at most O(ε^{-1} log n) iterations of the inner loop. By definition of i*, we have |X_{i*}| ≤ (1 - ε)|X|. Thus at least an ε fraction of the elements in X are discarded at every iteration. The inner loop terminates when X = ∅, which occurs at the latest at the iteration j where (1 - ε)^j n < 1. This implies that there are at most log_{1/(1-ε)} n = O(ε^{-1} log n) iterations of the inner loop. The function evaluations inside an iteration of the inner loop are non-adaptive and can be performed in parallel in one round; these are the only function evaluations performed by the algorithm. (The value of f(S) needed to compute marginal contributions can be obtained from the evaluations of the previous iteration.) Since the outer loop runs a logarithmic number of times, the stated number of rounds of parallel function evaluations follows.

The query complexity.

In the inner loop, the algorithm evaluates the marginal contribution of each element a ∈ X to S ∪ {a_1, …, a_i} for each position i considered, so the number of function evaluations per iteration is proportional to |X| times the number of positions. As in the adaptivity argument, at any iteration of the outer loop there are at most O(ε^{-1} log n) iterations of the inner loop, and the size of X decreases geometrically across them. Summing over all iterations yields the claimed query complexity. ∎

B.2.2 The approximation guarantee

There are two cases depending on the condition which breaks the outer-loop. The main lemma for the case where there are ε⁻¹ iterations of the outer-loop is that at every iteration, the elements added to S contribute an ε fraction of the remaining value OPT − f(S).

Lemma. Let S_j be the current solution at the start of iteration j of the outer-loop of Adaptive-Sequencing. For any j ≤ ε⁻¹, if |S_{j+1}| < k, then f(S_{j+1}) − f(S_j) ≥ ε(OPT − f(S_j)).

Proof.

Since |S_{j+1}| < k, we have X = ∅ at the end of iteration j. This implies that every element a ∈ N is discarded from X by the algorithm at some iteration of the inner-loop where f_{S′∪{a_1,…,a_{i*}}}(a) < t for some S′ ⊆ S_{j+1}. By submodularity, for any element a in the optimal solution O, we get f_{S_{j+1}}(a) < t = (1 − ε)(OPT − f(S_j))/k. Next, by monotonicity and submodularity, OPT − f(S_{j+1}) ≤ f_{S_{j+1}}(O) ≤ Σ_{a∈O} f_{S_{j+1}}(a). Combining the two previous inequalities, we obtain OPT − f(S_{j+1}) < (1 − ε)(OPT − f(S_j)).

By rearranging the terms, we get the desired result. ∎
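Iterating this lemma shows why ε⁻¹ outer iterations suffice for a 1 − 1/e guarantee. The sketch below (a direct simulation of the recurrence, with an illustrative ε) closes an ε fraction of the remaining gap OPT − f(S) each round:

```python
import math

# Iterate the recurrence OPT - f(S_{j+1}) <= (1 - eps)(OPT - f(S_j))
# for eps^-1 rounds, starting from f(S_1) = 0.
eps, OPT = 0.05, 1.0
remaining = OPT
for _ in range(round(1 / eps)):
    remaining *= (1 - eps)   # each outer iteration closes an eps fraction

# (1 - eps)^(1/eps) <= 1/e, so the final solution is a 1 - 1/e approximation.
assert remaining <= OPT / math.e
```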

The main lemma for the case where |S| = k is that the expected contribution of each element added to the current solution is arbitrarily close to a 1/k fraction of the remaining value OPT − f(S).

Lemma. At any iteration of the inner-loop of Adaptive-Sequencing, for all i ≤ i*, we have E[f_{S∪{a_1,…,a_{i−1}}}(a_i)] ≥ (1 − ε)²(OPT − f(S))/k.

Proof.

Since a_i is a uniformly random element from X, and since |X_{i−1}| > (1 − ε)|X| for i ≤ i* by the minimality of i*, we have E[f_{S∪{a_1,…,a_{i−1}}}(a_i)] ≥ Pr[a_i ∈ X_{i−1}] · t ≥ (1 − ε) · (1 − ε)(OPT − f(S))/k = (1 − ε)²(OPT − f(S))/k. ∎
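The expectation step can be illustrated with a small Monte Carlo model. The marginal values below are hypothetical and chosen as the worst case for the bound: a (1 − ε) fraction of X clears the threshold t exactly, and the rest (by monotonicity, nonnegative) contribute nothing.

```python
import random

# Monte Carlo check: if at least a (1 - eps) fraction of X clears the
# threshold t, a uniformly random element has expected marginal >= (1-eps)*t.
random.seed(0)
eps, t, size = 0.2, 1.0, 1000
marginals = [t] * int((1 - eps) * size) + [0.0] * int(eps * size)
est = sum(random.choice(marginals) for _ in range(200000)) / 200000
assert est >= (1 - eps) * t - 0.01   # E[f_S(a_i)] >= (1 - eps) * t
```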

By standard greedy analysis, the two lemmas above imply that the algorithm obtains a 1 − 1/e − ε approximation in each case. We emphasize the low constants and mild dependencies on ε in this result compared to previous results in the adaptive complexity model.

Theorem 1.

Adaptive-Sequencing is an algorithm with at most ε⁻² log n adaptive rounds and ε⁻² nk queries that achieves a 1 − 1/e − ε approximation in expectation.

Proof.

We first consider the case where there are ε⁻¹ iterations of the outer-loop. Let S_j be the solution at the start of iteration j of Adaptive-Sequencing. By the first lemma above, the algorithm increases the value of the solution by at least ε(OPT − f(S_j)) at iteration j. Thus, OPT − f(S_{j+1}) ≤ (1 − ε)(OPT − f(S_j)).

Next, we show by induction on j that OPT − f(S_{j+1}) ≤ (1 − ε)^j · OPT.

Observe that OPT − f(S_{j+1}) ≤ (1 − ε)(OPT − f(S_j)) ≤ (1 − ε) · (1 − ε)^{j−1} OPT = (1 − ε)^j OPT.

Since (1 − ε)^{ε⁻¹} ≤ e⁻¹, after ε⁻¹ iterations we get f(S) ≥ (1 − 1/e) OPT ≥ (1 − 1/e − ε) OPT.

Similarly, for the case where the solution S returned is such that |S| = k, by the second lemma above and by induction we get that E[f(S)] ≥ (1 − (1 − (1 − ε)²/k)^k) OPT ≥ (1 − e^{−(1−ε)²}) OPT ≥ (1 − 1/e − ε) OPT. ∎
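The final chain of inequalities can be verified numerically. The snippet below checks it over a small grid of illustrative parameter values (the specific ε and k are assumptions for the check):

```python
import math

# Check 1 - (1 - (1-eps)^2 / k)^k >= 1 - e^{-(1-eps)^2} >= 1 - 1/e - eps.
for eps in (0.01, 0.05, 0.1):
    for k in (10, 100, 1000):
        lhs = 1 - (1 - (1 - eps) ** 2 / k) ** k
        # (1 - x/k)^k <= e^{-x}, hence the first inequality:
        assert lhs >= 1 - math.exp(-(1 - eps) ** 2) - 1e-12
        # e^{-(1-eps)^2} <= 1/e + eps for these eps, hence the second:
        assert 1 - math.exp(-(1 - eps) ** 2) >= 1 - 1 / math.e - eps
```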

Appendix C Analysis of the Main Algorithm

We define the threshold t = (1 − ε)(v − f(S))/k, where v is the current guess for OPT, and we let R be a uniformly random sample of m elements from X used to estimate the sizes |X_i|.

C.1 Adaptive Complexity and Query Complexity

The adaptivity of the main algorithm is slightly worse than for Adaptive-Sequencing due to the binary searches over v and i. To obtain the adaptive complexity bound with probability 1 − δ, if at any iteration of the outer while-loop there are more than ε⁻¹ log n iterations of the inner-loop, we declare failure. In Lemma 2, we show this happens with low probability.

Lemma. The adaptive complexity of Fast is at most ε⁻² log n · log²(ε⁻¹ log k).

Proof.

The algorithm consists of four nested loops: a binary search over the guesses v of OPT, an outer while-loop, an inner while-loop, and a binary search over i. For the binary searches, we have |V| ≤ ε⁻¹ log k guesses of OPT and at most ε⁻¹ log k candidate values of i. Thus, there are at most log(ε⁻¹ log k) iterations for each binary search.

Due to the termination conditions of the while-loops, there are at most ε⁻¹ iterations of the outer while-loop and ε⁻¹ log n iterations of the inner while-loop. The function evaluations inside an iteration of the last nested loop are non-adaptive and can be performed in parallel in one round. Thus the adaptive complexity of Fast is at most log(ε⁻¹ log k) · ε⁻¹ · ε⁻¹ log n · log(ε⁻¹ log k) = ε⁻² log n · log²(ε⁻¹ log k).
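The product of the four loop depths can be tabulated directly. The sketch below uses the depths from the counting above (the function name and concrete parameters are illustrative assumptions):

```python
import math

def fast_adaptive_rounds(n: int, k: int, eps: float) -> int:
    """Upper bound on FAST's adaptive rounds as the product of the four
    nested loop depths: binary search over v, outer while-loop,
    inner while-loop, and binary search over i."""
    bsearch = math.ceil(math.log2(math.log(k) / eps))  # per binary search
    outer = math.ceil(1 / eps)
    inner = math.ceil(math.log(n) / eps)
    return bsearch * outer * inner * bsearch

# Grows like eps^-2 * log(n) * log^2(eps^-1 log k): far fewer rounds than
# the n * k sequential steps of the greedy algorithm.
assert fast_adaptive_rounds(10**6, 10**3, 0.1) < 10**6 * 10**3
```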

Thanks to the binary search over i and the subsampling of R from X, the query complexity is improved from ε⁻² nk to nearly linear in n.

Lemma. The query complexity of Fast is at most O(ε⁻² (n + k log n + m log n · log(ε⁻¹ log k)) · log(ε⁻¹ log k)).

Proof.

There are n queries, one per singleton value f(a) for a ∈ N, needed to compute the set V of guesses of OPT. At each iteration of the binary search over i, there are m queries needed to evaluate f_{S∪{a_1,…,a_i}}(a) for all a ∈ R. There are at most ε⁻² log n · log(ε⁻¹ log k) instances of the binary search over i, each with at most log(ε⁻¹ log k) iterations. The total number of queries for this binary search is at most m ε⁻² log n · log²(ε⁻¹ log k).

At each iteration of the inner-while loop, there are at most queries to update and at most queries to add elements to . There are at most instances of the inner while-loop each with at most iterations. The total number of queries for updating and is

By combining the queries needed to compute V, to run the binary searches over i, and to update X and S, we get the desired bound on the query complexity. ∎

C.2 The Approximation

C.2.1 Finding i*

Similarly as for Adaptive-Sequencing, we denote X_i = {a ∈ X : f_{S∪{a_1,…,a_i}}(a) ≥ t}, and we let i* be the index selected by the binary search, which uses the sample R to estimate the sizes |X_i|.

Lemma 1.

Assume that m ≥ 24ε⁻² log(2ℓ/δ), where ℓ is the total number of iterations of the inner while-loop. Then, with probability 1 − δ, for all iterations of the inner while-loop, we have that |X_{i*}| ≤ (1 − ε)|X| and |X_{i*−1}| ≥ (1 − 2ε)|X|.

Proof.

By the definition of i* and of the sample R, we have that |R ∩ X_{i*}| < (1 − 1.5ε)m and |R ∩ X_{i*−1}| ≥ (1 − 1.5ε)m. We show by contrapositive that, with probability 1 − δ, if |R ∩ X_i| < (1 − 1.5ε)m then |X_i| ≤ (1 − ε)|X|, and that if |R ∩ X_i| ≥ (1 − 1.5ε)m then |X_i| ≥ (1 − 2ε)|X|.

Note that for all i and all a ∈ R, we have Pr[a ∈ X_i] = |X_i|/|X|, so E[|R ∩ X_i|] = m|X_i|/|X|.

First, assume that |X_i| > (1 − ε)|X|. Then by the Chernoff bound, with μ = E[|R ∩ X_i|] > (1 − ε)m, we have Pr[|R ∩ X_i| < (1 − 1.5ε)m] ≤ e^{−ε²m/24}.

Next, assume that |X_i| < (1 − 2ε)|X|. By the Chernoff bound with μ = E[|R ∩ X_i|] < (1 − 2ε)m, we have Pr[|R ∩ X_i| ≥ (1 − 1.5ε)m] ≤ e^{−ε²m/24}.

Thus, with m ≥ 24ε⁻² log(2ℓ/δ) and by contrapositive, we have that |X_{i*}| ≤ (1 − ε)|X| and |X_{i*−1}| ≥ (1 − 2ε)|X|, each with probability at least 1 − δ/(2ℓ). By a union bound, these both hold with probability 1 − δ for all ℓ iterations of the inner while-loop. ∎
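The concentration step can be illustrated by simulation. The model below is an assumption-laden sketch (sample size m with an illustrative log factor, and true surviving fractions placed exactly at the boundary cases of the proof), checking that a sample of size O(ε⁻² log(1/δ)) separates the two cases:

```python
import random

# Estimate the surviving fraction |X_i|/|X| from a sample of size m and
# verify the two contrapositive guarantees across repeated trials.
random.seed(1)
eps = 0.1
m = int(24 / eps**2) * 10   # 24 * eps^-2 with log(2l/delta) ~ 10 (assumed)
trials, failures = 200, 0
for frac in (1 - eps, 1 - 2 * eps):      # boundary cases from the proof
    for _ in range(trials):
        hits = sum(random.random() < frac for _ in range(m))
        if frac >= 1 - eps and hits < (1 - 1.5 * eps) * m:
            failures += 1                # large |X_i| misread as small
        if frac <= 1 - 2 * eps and hits >= (1 - 1.5 * eps) * m:
            failures += 1                # small |X_i| misread as large
assert failures == 0
```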

Corollary 1.

With probability 1 − δ, for all iterations of the inner while-loop, we have

  • |X_{i*}| ≤ (1 − ε)|X|, so that f_{S∪{a_1,…,a_{i*}}}(a) < t for all a ∈ X ∖ X_{i*}, and

  • |X_{i−1}| ≥ (1 − 2ε)|X| for all i ≤ i*.

Proof.

Consider an iteration of the inner while-loop. We first note that, by submodularity, the marginal contribution f_{S∪{a_1,…,a_i}}(a) is monotonically decreasing as i increases, so the sizes |X_i| are monotonically decreasing as well, and we can perform a binary search over i to find i*. By Lemma 1, with probability 1 − δ we have that |X_{i*}| ≤ (1 − ε)|X| and |X_{i*−1}| ≥ (1 − 2ε)|X|. We conclude the proof by noting that, again by this monotonicity, |X_{i−1}| ≥ |X_{i*−1}| ≥ (1 − 2ε)|X| for all i ≤ i*. ∎
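The monotonicity of |X_i| is exactly what makes a binary search for i* valid. The sketch below (a simplified model with hypothetical precomputed counts; FAST additionally restricts the search to a geometric grid of positions to reduce the number of rounds) finds the smallest prefix index whose surviving count drops below (1 − ε)|X|:

```python
def find_i_star(surviving_counts, eps):
    """surviving_counts[i] = |X_i|, assumed monotonically decreasing;
    returns the smallest i with |X_i| <= (1 - eps) * |X_0|."""
    x0 = surviving_counts[0]
    lo, hi = 0, len(surviving_counts) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if surviving_counts[mid] <= (1 - eps) * x0:
            hi = mid          # i* is at mid or earlier
        else:
            lo = mid + 1      # i* is strictly after mid
    return lo

counts = [100, 96, 92, 80, 75, 60, 41, 30, 12, 5]
assert find_i_star(counts, 0.1) == 3   # first index with count <= 90
```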

Lemma 2.

With probability 1 − δ, at every iteration of the outer while-loop, we have X = ∅ after at most ε⁻¹ log n iterations of the inner while-loop.

Proof.

By Lemma 1, with probability 1 − δ, at every iteration of the inner while-loop, at least an ε fraction of the elements in X are discarded. We assume this is the case. After ε⁻¹ log n iterations of discarding an ε fraction of the elements in X, we have |X| ≤ (1 − ε)^{ε⁻¹ log n} · n < 1, so X = ∅. ∎

C.2.2 If the number of iterations of the outer while-loop is ε⁻¹

The analysis differs depending on whether the number of iterations or the size of the solution caused the algorithm to terminate. We first analyze the case where Fast returned S such that |S| < k because the number of iterations reached ε⁻¹. The main lemma for this case is that at every iteration of Fast, if |S| < k, the set added to the current solution contributes at least an ε fraction of the remaining value v − f(S).

Lemma 3.

Assume that v ≤ OPT and let S_j be the set S at the start of iteration j of the outer while-loop of Fast. With probability 1 − δ, for all j ≤ ε⁻¹ and all guesses v ≤ OPT, we have that if |S_{j+1}| < k, then f(S_{j+1}) − f(S_j) ≥ ε(v − f(S_j)).

Proof.

By Lemma 2, with probability 1 − δ, at every iteration of the outer while-loop, we have X = ∅ after at most ε⁻¹ log n iterations of the inner while-loop. We assume this holds for the remainder of this proof.

Since |S_{j+1}| < k, we have X = ∅ at the end of iteration j of the outer while-loop. This implies that every element a ∈ N is discarded from X by the algorithm at some iteration of the inner while-loop where f_{S′∪{a_1,…,a_{i*}}}(a) < t

for some S′ ⊆ S_{j+1}. By submodularity, for any element a ∈ O, we get f_{S_{j+1}}(a) < t = (1 − ε)(v − f(S_j))/k.

Next, since v ≤ OPT, by monotonicity, v − f(S_{j+1}) ≤ OPT − f(S_{j+1}) ≤ f_{S_{j+1}}(O), and by submodularity, f_{S_{j+1}}(O) ≤ Σ_{a∈O} f_{S_{j+1}}(a).

Combining the previous inequalities, we obtain v − f(S_{j+1}) < (1 − ε)(v − f(S_j)).

By rearranging the terms, we get the desired result. ∎

By standard greedy analysis, similarly as in the proof of Theorem 1, we obtain that v − f(S) ≤ (1 − ε)^{ε⁻¹} v ≤ v/e after ε⁻¹ iterations.

Lemma 4.

With probability 1 − δ, for all guesses v ≤ OPT, after ε⁻¹ iterations of the outer while-loop of Fast, f(S) ≥ (1 − 1/e) v.

C.2.3 If |S| = k

Next, we analyze the case where the outer-loop terminated because |S| = k. We show that each element added to S has, with high probability, a marginal contribution of at least the threshold t.

Lemma 5.

With probability 1 − δ, at every iteration of the inner while-loop, we have that independently for each i ≤ i*, with probability at least 1 − 2ε, f_{S∪{a_1,…,a_{i−1}}}(a_i) ≥ t.

Proof.

By Corollary 1, we have that with probability 1 − δ, for all iterations of the inner while-loop, |X_{i−1}| ≥ (1 − 2ε)|X| for all i ≤ i*. We assume this is the case and consider an iteration of the inner while-loop. Since each a_i is a uniformly random element from X, we have that independently for each i ≤ i*, Pr[f_{S∪{a_1,…,a_{i−1}}}(a_i) ≥ t] = |X_{i−1}|/|X| ≥ 1 − 2ε. ∎

C.2.4 Guessing OPT

Lemma 6 (Extends [HS17]).

Consider a set S = {a_1, …, a_k} and let S_i = {a_1, …, a_i}. Assume that, independently for each i ∈ [k], we have that with probability at least 1 − 2ε, f_{S_{i−1}}(a_i) ≥ (1 − ε)(v − f(S_{i−1}))/k;

then, for any δ such that δ ≥ e^{−ε²(1−2ε)k/2}, we have f(S) ≥ (1 − 1/e − O(ε)) v

with probability at least 1 − δ.

Proof.

The analysis is similar as in [HS17]. Assume that the number ℓ of indices i ∈ [k] satisfying f_{S_{i−1}}(a_i) ≥ (1 − ε)(v − f(S_{i−1}))/k is at least (1 − ε)(1 − 2ε)k. We first argue that v − f(S) ≤ (1 − (1 − ε)/k)^ℓ · v. By induction, we have that v − f(S_i) ≤ (1 − (1 − ε)/k)(v − f(S_{i−1})) for each such index i, and v − f(S_i) ≤ v − f(S_{i−1}) otherwise.

Since ℓ ≥ (1 − ε)(1 − 2ε)k, we obtain v − f(S) ≤ (1 − (1 − ε)/k)^{(1−ε)(1−2ε)k} v ≤ e^{−(1−ε)²(1−2ε)} v ≤ (1/e + O(ε)) v.

Let Y_i be the indicator variable of the event f_{S_{i−1}}(a_i) ≥ (1 − ε)(v − f(S_{i−1}))/k, so that ℓ = Σ_{i∈[k]} Y_i and E[ℓ] ≥ (1 − 2ε)k. By the Chernoff bound, Pr[ℓ < (1 − ε)(1 − 2ε)k] ≤ e^{−ε²(1−2ε)k/2} ≤ δ.

Thus, with probability at least 1 − δ, we get f(S) ≥ (1 − 1/e − O(ε)) v. ∎
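The lemma's setting can be simulated end to end. The sketch below is a simplified model under stated assumptions: each of k steps independently closes a (1 − ε)/k fraction of the remaining gap v − f(S) with probability 1 − 2ε, and the O(ε) slack in the target is instantiated as 4ε for the check.

```python
import math
import random

# Simulate k independent steps; a "good" step (probability 1 - 2*eps)
# shrinks the gap v - f(S) by a factor (1 - (1 - eps)/k).
random.seed(2)
eps, k, v, trials = 0.1, 500, 1.0, 200
target = (1 - 1 / math.e - 4 * eps) * v   # hedged 1 - 1/e - O(eps) target
for _ in range(trials):
    gap = v
    for _ in range(k):
        if random.random() < 1 - 2 * eps:
            gap *= 1 - (1 - eps) / k
    assert v - gap >= target              # f(S) >= (1 - 1/e - O(eps)) * v
```

With k = 500, the Chernoff bound makes a shortfall below (1 − ε)(1 − 2ε)k good steps astronomically unlikely, which is why every trial clears the target.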