Non-monotone Submodular Maximization with Nearly Optimal Adaptivity Complexity

08/19/2018 ∙ by Matthew Fahrbach, et al.

As a generalization of many classic problems in combinatorial optimization, submodular optimization has found a wide range of applications in machine learning (e.g., in feature engineering and active learning). For many large-scale optimization problems, we are often concerned with the adaptivity complexity of an algorithm, which quantifies the number of sequential rounds where polynomially-many independent function evaluations can be executed in parallel. While low adaptivity is ideal, it is not sufficient for a distributed algorithm to be efficient, since in many practical applications of submodular optimization the number of function evaluations becomes prohibitively expensive. Motivated by such applications, we study the adaptivity and query complexity of non-monotone submodular optimization. We provide the first constant-factor approximation algorithm for maximizing a non-monotone submodular function subject to a cardinality constraint $k$ that has nearly optimal adaptivity complexity $O(\log n)$. Furthermore, our algorithm makes only $O(\log k)$ calls per element to the function evaluation oracle in expectation.




1 Introduction

Submodular set functions are a powerful tool for modeling real-world problems because they naturally exhibit the property of diminishing returns. Several well-known examples of submodular functions include graph cuts, entropy-based clustering, coverage functions, and mutual information. As a result, submodular functions have been increasingly used in applications of machine learning such as data summarization (Simon et al., 2007; Sipos et al., 2012; Tschiatschek et al., 2014), feature selection (Das & Kempe, 2008; Khanna et al., 2017), and recommendation systems (El-Arini & Guestrin, 2011). While some of these applications involve maximizing monotone submodular functions, the more general problem of non-monotone submodular maximization has also been used extensively (Feige et al., 2011; Buchbinder et al., 2014; Mirzasoleiman et al., 2016; Balkanski et al., 2018; Norouzi-Fard et al., 2018). Some specific applications of non-monotone submodular maximization include image summarization and movie recommendation (Mirzasoleiman et al., 2016), and revenue maximization in viral marketing (Hartline et al., 2008). Two compelling uses of non-monotone submodular maximization algorithms are:

  • Optimizing objectives that are a monotone submodular function minus a linear cost function that penalizes the addition of more elements to the set (e.g., the coverage and diversity trade-off). This appears in facility location problems where opening centers is expensive and in exemplar-based clustering (Dueck & Frey, 2007).

  • Expressing learning problems such as feature selection using weakly submodular functions (Das & Kempe, 2008; Khanna et al., 2017; Elenberg et al., 2018; Qian & Singer, 2019). One possible source of non-monotonicity in this context is overfitting to training data by selecting too many representative features (e.g., Section 1.6 and Corollary 3.19 in (Mohri et al., 2018)). Although most of these learning problems have not yet been rigorously modeled as non-monotone submodular functions, there has been a recent surge of interest and a substantial amount of momentum in this direction.

The literature on submodular optimization typically assumes access to an oracle that evaluates the submodular function. In practice, however, oracle queries may take a long time to process. For example, the log-determinant of submatrices of a positive semi-definite matrix is a submodular function that is notoriously expensive to compute (Kazemi et al., 2018). Therefore, our goal when designing distributed algorithms is to minimize the number of rounds where the algorithm communicates with the oracle. This motivates the notion of the adaptivity complexity of submodular optimization, first investigated in (Balkanski & Singer, 2018). In this model of computation, the algorithm can ask polynomially-many independent oracle queries all together in each round.

In a wide range of machine learning optimization problems, the objective functions can only be estimated through oracle access to the function. In many instances, these oracle evaluations are a new time-consuming optimization problem that we treat as a black box (e.g., hyperparameter optimization). Since our goal is to optimize the objective function using as few rounds of interaction with the oracle as possible, insights and algorithms developed in this adaptivity complexity framework can have a deep impact on distributed computing for machine learning applications in practice. Further motivation for the importance of this computational model is given in (Balkanski & Singer, 2018).

While the number of adaptive rounds is an important quantity to minimize, the computational complexity of evaluating oracle queries also motivates the design of algorithms that are efficient in terms of the total number of oracle queries. An algorithm typically needs to make at least a constant number of queries per element in the ground set to achieve a constant-factor approximation. In this paper, we study the adaptivity complexity and the total number of oracle queries that are needed to guarantee a constant-factor approximation when maximizing a non-monotone submodular function.

Results and Techniques. Our main result is a distributed algorithm for maximizing a non-monotone submodular function subject to a cardinality constraint $k$ that achieves a constant-factor approximation in expectation using $O(\log n)$ adaptive rounds and $O(n \log k)$ expected function evaluation queries. To the best of our knowledge, this is the first constant-factor approximation algorithm with nearly optimal adaptivity for the general problem of maximizing non-monotone submodular functions. The adaptivity complexity of our algorithm is optimal up to a $\Theta(\log \log n)$ factor by the lower bound in (Balkanski & Singer, 2018).

The building blocks of our algorithm are the Threshold-Sampling subroutine in (Fahrbach et al., 2019), which returns a subset of high-valued elements in adaptive rounds, and the unconstrained submodular maximization algorithm in (Chen et al., 2018) that gives a -approximation in adaptive rounds. We modify Threshold-Sampling to terminate early if its pool of candidate elements becomes too small, which ensures that each element is not chosen with at least constant probability. This property has been shown to be useful for obtaining constant-factor approximations for non-monotone submodular function maximization (Buchbinder et al., 2014). Next, we run unconstrained maximization on the remaining set of high-valued candidates if its size is at most , downsample accordingly, and output the better of the two solutions. Our analysis shows how to optimize the constant parameters to balance between these two behaviors. Last, since Threshold-Sampling requires an input close to , we find an interval containing OPT, try logarithmically-many input thresholds in parallel, and return the solution with maximum value. We note that improving the bounds for OPT via low-adaptivity preprocessing can reduce the total query complexity as shown in (Fahrbach et al., 2019).

Algorithm Approximation Adaptivity Queries
(Buchbinder et al., 2016)
(Balkanski et al., 2018)
(Chekuri & Quanrud, 2018)
(Ene et al., 2018b)
This paper

Table 1: Independent and concurrent works for low-adaptivity non-monotone submodular maximization subject to a cardinality constraint.

Related Works. Submodular maximization has garnered a significant amount of attention in the distributed and streaming literature because of its role in large-scale data mining (Lattanzi et al., 2011; Mirzasoleiman et al., 2013; Badanidiyuru et al., 2014; Kumar et al., 2015; Mirrokni & Zadimoghaddam, 2015; Barbosa et al., 2015, 2016; Fahrbach et al., 2018; Liu & Vondrak, 2018). However, in many distributed models (e.g., the Massively Parallel Computation model), round complexity often captures a different notion than adaptivity complexity. For example, a constant-factor approximation is achievable in two rounds of computation (Mirrokni & Zadimoghaddam, 2015), but it is impossible to compute a constant-factor approximation in $o(\log n / \log \log n)$ adaptive rounds (Balkanski & Singer, 2018). Since adaptivity measures the communication complexity with a function evaluation oracle, a round in most distributed models can have arbitrarily high adaptivity.

The first set of related works with low adaptivity focuses on maximizing monotone submodular functions subject to a cardinality constraint $k$. In (Balkanski & Singer, 2018), the authors show that a $(1/3 - \varepsilon)$-approximation is achievable in $O(\log n)$ rounds. In terms of parallel running time, this is exponentially faster than the celebrated greedy algorithm, which gives a $(1 - 1/e)$-approximation in $k$ rounds (Nemhauser et al., 1978). Subsequently, (Balkanski et al., 2019; Ene & Nguyen, 2019; Fahrbach et al., 2019) independently designed $(1 - 1/e - \varepsilon)$-approximation algorithms with $O(\log n)$ adaptivity. These works also show that only $O(n)$ oracle queries are needed in expectation. Recent works have also investigated the adaptivity of the multilinear relaxation of monotone submodular functions subject to packing constraints (Chekuri & Quanrud, 2019) and the submodular cover problem (Agarwal et al., 2019).

While the general problem of maximizing a (not necessarily monotone) submodular function has been studied extensively (Lee et al., 2010; Feige et al., 2011; Gharan & Vondrák, 2011; Buchbinder et al., 2014), noticeably less progress has been made. For example, the best achievable approximation for the centralized maximization problem is unknown but in the range $[0.385, 0.491]$ (Buchbinder & Feldman, 2016; Gharan & Vondrák, 2011). However, some progress has been made for the adaptive complexity of this problem, all of which has been done independently and concurrently with an earlier version of this paper. Recently, (Balkanski et al., 2018) designed a parallel algorithm for non-monotone submodular maximization subject to a cardinality constraint that gives a $(1/(2e) - \varepsilon)$-approximation in $O(\log^2 n)$ adaptive rounds. Their algorithm estimates the expected marginal gain of random subsets, and therefore the number of function evaluations it needs to achieve provable guarantees is . We acknowledge that the query complexity can likely be improved via normalization or estimating an indicator random variable instead. The works of (Chekuri & Quanrud, 2018; Ene et al., 2018b) give constant-factor approximation algorithms with adaptivity for maximizing non-monotone submodular functions subject to matroid constraints. Their approaches use multilinear extensions and thus require function evaluations to simulate an oracle for with high enough accuracy. There have also been significant advancements in low-adaptivity algorithms for the problem of unconstrained submodular maximization (Chen et al., 2018; Ene et al., 2018a).

2 Preliminaries

For any set function $f : 2^N \to \mathbb{R}$ and subsets $S, T \subseteq N$, let $\Delta(T, S) = f(S \cup T) - f(S)$ denote the marginal gain of $f$ at $T$ with respect to $S$. We refer to $N$ as the ground set and let $|N| = n$. A set function $f$ is submodular if for all $S \subseteq T \subseteq N$ and any $x \in N \setminus T$ we have $\Delta(x, S) \ge \Delta(x, T)$, where the marginal gain notation is overloaded for singletons. A set function is monotone if for all subsets $S \subseteq T \subseteq N$ we have $f(S) \le f(T)$. In this paper, we investigate distributed algorithms for maximizing submodular functions subject to a cardinality constraint, including those that are non-monotone. Let $S^*$ be a solution set to the maximization problem $\max_{S \subseteq N} f(S)$ subject to the cardinality constraint $|S| \le k$, and let $\mathcal{U}(A, t)$ denote the uniform distribution over all subsets of $A$ of size $t$.
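The definitions above can be checked by brute force on a small instance. The following is a minimal sketch, assuming a hypothetical 4-node graph (the names `EDGES`, `GROUND`, `cut_value`, and the helper functions are illustrative and not from the paper). It uses the graph cut function, a standard example of a submodular function that is not monotone.

```python
from itertools import combinations

# Hypothetical toy instance: an undirected graph on 4 nodes.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
GROUND = frozenset(range(4))

def cut_value(S):
    """f(S): number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum((u in S) != (v in S) for u, v in EDGES)

def marginal_gain(f, T, S):
    """Delta(T, S) = f(S union T) - f(S)."""
    S, T = frozenset(S), frozenset(T)
    return f(S | T) - f(S)

def all_subsets(ground):
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Check Delta(x, S) >= Delta(x, T) for every S subset of T and x outside T."""
    ground = frozenset(ground)
    sets = list(all_subsets(ground))
    return all(
        marginal_gain(f, {x}, S) >= marginal_gain(f, {x}, T)
        for S in sets for T in sets if S <= T
        for x in ground - T
    )

def is_monotone(f, ground):
    """Check f(S) <= f(T) for every S subset of T."""
    sets = list(all_subsets(ground))
    return all(f(S) <= f(T) for S in sets for T in sets if S <= T)
```

On this instance the cut function passes the submodularity check but fails the monotonicity check, since adding the last node to a set can only remove cut edges.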

Our algorithms take as input an evaluation oracle for $f$, which for every query $S \subseteq N$ returns $f(S)$ in $O(1)$ time. Given an evaluation oracle, we define the adaptivity of an algorithm to be the minimum number of rounds needed such that in each round the algorithm makes $O(\mathrm{poly}(n))$ independent queries to the evaluation oracle. Queries in a given round may depend on the answers of queries from previous rounds but not the current round. We measure the parallel running time of an algorithm by its adaptivity.

One of the inspirations for our algorithm is the following lemma, which is remarkably useful for achieving a constant-factor approximation for general submodular functions.

Lemma 2.1.

(Buchbinder et al., 2014) Let $f : 2^N \to \mathbb{R}_{\ge 0}$ be submodular. Denote by $A(p)$ a random subset of $A$ where each element appears with probability at most $p$ (not necessarily independently). Then $\mathbb{E}[f(A(p))] \ge (1 - p) f(\emptyset)$.

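In our case, if $S$ is the output of the algorithm and the probability of any element appearing in $S$ is bounded away from $1$, we can analyze the submodular function $g$ defined by $g(S) = f(S \cup S^*)$ to lower bound $\mathbb{E}[f(S \cup S^*)]$ in terms of OPT since $g(\emptyset) = f(S^*) = \mathrm{OPT}$.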
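As a sanity check of this argument, the inequality $\mathbb{E}[g(A(p))] \ge (1 - p)\, g(\emptyset)$ can be verified exactly on a toy instance. The sketch below assumes a hypothetical 4-node cut function and a hypothetical reference set `S_STAR`; it covers only the special case where elements are included independently with probability exactly $p$, whereas the lemma allows arbitrary dependence.

```python
from itertools import combinations

# Hypothetical toy instance (not from the paper).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
GROUND = [0, 1, 2, 3]
S_STAR = {0}  # stand-in for an optimal solution S*

def cut_value(S):
    """Nonnegative submodular f: number of edges crossing the cut (S, rest)."""
    S = set(S)
    return sum((u in S) != (v in S) for u, v in EDGES)

def g(S):
    """g(S) = f(S union S*), which is again nonnegative and submodular."""
    return cut_value(set(S) | S_STAR)

def expected_value(func, ground, p):
    """Exact E[func(R)] when each element joins R independently w.p. p."""
    n = len(ground)
    total = 0.0
    for r in range(n + 1):
        for combo in combinations(ground, r):
            total += (p ** r) * ((1 - p) ** (n - r)) * func(combo)
    return total

def lemma_holds(p):
    # Small tolerance guards against floating-point rounding.
    return expected_value(g, GROUND, p) >= (1 - p) * g(()) - 1e-12
```

For example, with $p = 1/2$ the expected value of $g$ on this instance is 2.5, compared with the bound $(1 - p)\, g(\emptyset) = 1.5$.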

2.1 The Threshold-Sampling Algorithm

We start with a high-level description of the Threshold-Sampling algorithm in (Fahrbach et al., 2019), which after a slight modification is the main subroutine of our non-monotone maximization algorithm. For an input threshold $\tau$, Threshold-Sampling iteratively builds a solution set $S$ over adaptive rounds and maintains a set of unchosen candidate elements $A$. Initially, the solution set is empty and all elements are candidates (i.e., $S = \emptyset$ and $A = N$). In each round, the algorithm starts by discarding elements in $A$ whose marginal gain to the current solution $S$ is less than the threshold $\tau$. Then the algorithm efficiently finds the largest cardinality $t^*$ such that for a subset $T \subseteq A$ of size $t^*$ drawn uniformly at random we have $\mathbb{E}[\Delta(T, S) / |T|] \ge (1 - \varepsilon)\tau$. At the end of a round, the algorithm samples such a set $T$ and updates the current solution to be $S \cup T$.

The random choice of in Threshold-Sampling has two beneficial effects. First, it ensures that in expectation the average contribution of each element in the returned set is at least . Second, it implies that an expected -fraction of candidates are filtered out of in each round. Therefore, the number of elements that the algorithm considers in each round decreases geometrically in expectation. It follows that rounds suffice to guarantee that when the algorithm terminates, we either have or the marginal gains of all the elements are below the threshold.
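The round structure described above (filter, choose a block size, sample a block) can be illustrated with a simplified sketch. This is not the procedure of (Fahrbach et al., 2019): it estimates the average marginal gain of a random block directly from a fixed number of samples, and the function name, the parameters `samples` and `seed`, and the acceptance rule of $(1 - \varepsilon)\tau$ are assumptions made for illustration.

```python
import random

def threshold_sampling_sketch(f, ground, k, tau, eps=0.1, samples=30, seed=0):
    """Simplified Threshold-Sampling-style loop (illustrative only)."""
    rng = random.Random(seed)
    S, A = set(), set(ground)
    while len(S) < k:
        base = f(S)
        # Filter: keep candidates whose marginal gain still meets the threshold.
        A = {x for x in A - S if f(S | {x}) - base >= tau}
        if not A:
            break
        pool = sorted(A)
        budget = min(len(pool), k - len(S))
        # Grow the block size t and remember the largest size whose estimated
        # average marginal gain per element is at least (1 - eps) * tau.
        best_t, t = 1, 1
        while t <= budget:
            total = 0.0
            for _ in range(samples):
                T = set(rng.sample(pool, t))
                total += (f(S | T) - base) / t
            if total / samples < (1 - eps) * tau:
                break
            best_t = t
            t = max(t + 1, int(t * (1 + eps)))
        # Commit a uniformly random block of the chosen size.
        S |= set(rng.sample(pool, best_t))
    return S
```

In this sketch the queries used to test one block size do not depend on each other, so they could be batched within a round. The sequential search over sizes is kept only for readability.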

Before presenting Threshold-Sampling, we define the distribution from which Threshold-Sampling samples when estimating the maximum cardinality. Sampling from this Bernoulli distribution can be simulated with two calls to the evaluation oracle.

Definition 2.2.

Conditioned on the current state of the algorithm, consider the process where the set $T \sim \mathcal{U}(A, t - 1)$ and then the element $x \sim A \setminus T$ are drawn uniformly at random. Let $\mathcal{D}_t$ denote the probability distribution over the indicator random variable $I_t = \mathbb{1}[\Delta(x, S \cup T) \ge \tau]$.

We can view $\mathbb{E}[I_t]$ as the probability that the $t$-th marginal is at least the threshold $\tau$ if the candidates in $A$ are inserted into $S$ according to a random permutation.
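One draw from this Bernoulli distribution can be simulated as follows. This is a minimal sketch, assuming the process is: draw $T$ uniformly from the subsets of $A$ of size $t - 1$, draw $x$ uniformly from the remaining candidates, and report whether the marginal gain of $x$ with respect to $S \cup T$ meets the threshold. The function name and signature are hypothetical.

```python
import random

def sample_indicator(f, S, A, t, tau, rng):
    """Return 1 if Delta(x, S union T) >= tau for random T and x, else 0.

    Requires 1 <= t <= len(A). Uses exactly two calls to the oracle f.
    """
    candidates = sorted(A)
    T = set(rng.sample(candidates, t - 1))           # T uniform over size-(t-1) subsets of A
    x = rng.choice([a for a in candidates if a not in T])  # x uniform over A minus T
    base = set(S) | T
    value_without_x = f(base)          # oracle call 1
    value_with_x = f(base | {x})       # oracle call 2
    return 1 if value_with_x - value_without_x >= tau else 0
```

Averaging many independent draws gives an estimate of the probability that the $t$-th marginal meets the threshold, and all of those draws can be issued in the same adaptive round because none depends on another's answer.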

Input: oracle for , constraint , threshold , error , failure probability

1:  Set smaller error
2:  Set ,
3:  Set smaller failure probability
4:  Initialize ,
5:  for  rounds do
6:     Filter
7:     if  then
8:        break
9:     for  to  do
10:        Set
11:        Set
12:        Sample
13:        Set
14:        if  then
15:           break
16:     Sample
17:     Update
18:     if  then
19:        break
20:  return
Algorithm 1 Threshold-Sampling
Lemma 2.3.

(Fahrbach et al., 2019) The algorithm Threshold-Sampling outputs with in adaptive rounds such that the following properties hold with probability at least :

  1. There are oracle queries in expectation.

  2. The expected marginal .

  3. If , then for all .

2.2 Unconstrained Submodular Maximization

The second subroutine in our non-monotone maximization algorithm is a constant-approximation algorithm for unconstrained submodular maximization that runs in a constant number of rounds depending on . While the focus of this paper is submodular maximization subject to a cardinality constraint, we show how calling Unconstrained-Max on a new ground set of size can be used with (Buchbinder et al., 2014) to achieve a constant-approximation for the constrained maximization problem.

Lemma 2.4.

(Feige et al., 2011) For any nonnegative submodular function $f$, denote the solution to the unconstrained maximization problem by $\mathrm{OPT} = \max_{S \subseteq N} f(S)$. If $R$ is a uniformly random subset of $N$, then $\mathbb{E}[f(R)] \ge \mathrm{OPT} / 4$.

The guarantees for the Unconstrained-Max algorithm in Lemma 2.5 are standard consequences of Lemma 2.4.

Input: oracle for , ground subset , error , failure probability

1:  Set iteration bound
2:  for  to in parallel do
3:     Let be a uniformly random subset of
4:  Set
5:  return
Algorithm 2 Unconstrained-Max
Lemma 2.5.

For any nonnegative submodular function  and subset , Unconstrained-Max outputs a set in one adaptive round using oracle queries such that with probability at least we have , where .
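The idea behind Unconstrained-Max (draw several uniformly random subsets in parallel and keep the best) can be sketched as follows. The iteration bound is left as the parameter `num_samples` because its exact value is not recoverable here; the function name and the `seed` parameter are likewise assumptions.

```python
import random

def unconstrained_max_sketch(f, A, num_samples, seed=0):
    """Return the best of num_samples uniformly random subsets of A."""
    rng = random.Random(seed)
    pool = sorted(A)
    # Every subset is drawn before any oracle call is made, so the queries are
    # independent of one another and fit in a single adaptive round.
    samples = [
        frozenset(x for x in pool if rng.random() < 0.5)
        for _ in range(num_samples)
    ]
    return max(samples, key=f)
```

Each element is included independently with probability 1/2, which is the same as drawing a uniformly random subset of `A`. Taking the maximum over repeated draws is what turns the expectation bound for a single random subset into a high-probability guarantee.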

An essentially optimal algorithm for unconstrained submodular maximization was recently given in (Chen et al., 2018), which allows us to slightly improve the approximation factor of our non-monotone maximization algorithm.

Theorem 2.6.

(Chen et al., 2018) There is an algorithm that achieves a -approximation for unconstrained submodular maximization using adaptive rounds and evaluation oracle queries.

3 Non-monotone Submodular Maximization

In this section we show how to combine Threshold-Sampling and Unconstrained-Max to achieve the first constant-factor approximation algorithm for non-monotone submodular maximization subject to a cardinality constraint  that uses adaptive rounds. Moreover, this algorithm makes expected oracle queries. While the approximation factor is only , we demonstrate that Threshold-Sampling can readily be extended to non-monotone settings without increasing its adaptivity.

We start by describing Adaptive-Nonmonotone-Max and the analysis of its approximation factor at a high level. One inspiration for this algorithm is Lemma 2.1, which allows us to lower bound the expected value of the returned set by OPT as long as every element has at most a constant probability less than 1 of being in the output. With this property in mind, Adaptive-Nonmonotone-Max starts by trying different thresholds in parallel, one of which is sufficiently close to . For each threshold, it runs Threshold-Sampling modified to break if the number of candidates in falls below . For all values of , this guarantees that each element appears in  with probability at most . In the event that Threshold-Sampling breaks because , it then runs unconstrained submodular maximization on and downsamples the solution so that it has cardinality at most . In the end, the algorithm returns the set with maximum value over all thresholds. Our analysis shows how we optimize the constants and to balance the expected trade-offs between the two events and thus give the best approximation factor. We present the algorithm and its guarantees below.
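One way to realize "trying different thresholds in parallel" is a geometric grid. The sketch below assumes the standard bounds $\Delta^* \le \mathrm{OPT} \le k \Delta^*$ for a nonnegative submodular function, where $\Delta^*$ is the largest singleton value, so that $\mathrm{OPT}/k$ lies in $[\Delta^*/k, \Delta^*]$. The function name and the exact grid are illustrative and may differ from the constants used by Adaptive-Nonmonotone-Max.

```python
import math

def threshold_grid(delta_star, k, eps):
    """Geometric grid of thresholds covering [delta_star / k, delta_star].

    delta_star is the largest singleton value max_x f({x}) (assumed > 0).
    Consecutive thresholds differ by a factor of (1 + eps).
    """
    if delta_star <= 0 or k < 1 or not 0 < eps < 1:
        raise ValueError("need delta_star > 0, k >= 1, and 0 < eps < 1")
    count = math.ceil(math.log(k) / math.log(1 + eps)) + 1
    return [delta_star / k * (1 + eps) ** i for i in range(count)]
```

The grid has $O(\log(k)/\varepsilon)$ entries, and every value in $[\Delta^*/k, \Delta^*]$ is within a $(1 + \varepsilon)$ factor of some entry. Because the runs for different thresholds do not interact, they add to the query count but not to the number of adaptive rounds.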

Input: evaluation oracle for , constraint , error , failure probability

1:  Set smaller error
2:  Set ,
3:  Set smaller failure probability
4:  Set optimized constants
5:  Initialize
6:  for  to in parallel do
7:     Set
8:     Set modified to break on Line 7 if
9:     Initialize ,
10:     if  then
11:        Set
12:        if  then
13:           Sample
14:           Update
15:        else
16:           Update
17:     Permute the elements of uniformly at random
18:     Set highest-valued prefix of the permutation
19:     Update
20:  return
Algorithm 3 Adaptive-Nonmonotone-Max
Theorem 3.1.

For any nonnegative submodular function , Adaptive-Nonmonotone-Max outputs a set with in adaptive rounds such that with probability at least it makes queries in expectation and .

Since the quality of our approximation relies on the approximation factor of a low-adaptivity algorithm for unconstrained submodular maximization, we can use Theorem 2.6 instead of Unconstrained-Max to improve our approximation without a loss in adaptivity or query complexity.

Theorem 3.2.

There is an algorithm for nonnegative submodular maximization subject to a cardinality constraint that achieves a -approximation in expectation using adaptive rounds and expected queries to the evaluation oracle.

3.1 Prerequisite Notation and Lemmas

We start by defining notation that is useful for analyzing Threshold-Sampling as the subroutine progresses. Let be the sequences of randomly generated sets used to build the output set . Similarly, let the corresponding sequences of partial solutions be and candidate sets be . To analyze the approximation factor of Adaptive-Nonmonotone-Max, we consider a threshold sufficiently close to and then analyze the resulting sets , , , and . Lastly, we use ALG as an alias for the final output set .

Next, we present several simple lemmas that are helpful for analyzing the approximation factor. The following lemma is an equation in the proof of Lemma 2.3, and we use this lemma to show that the elements in any partial solution have an average marginal gain exceeding the input threshold.

Lemma 3.3.

(Fahrbach et al., 2019) At each step of Threshold-Sampling, we have

Corollary 3.4.

At each step of Threshold-Sampling we have .

The following lemmas allow us to show that (1) every element has at least a constant probability of not appearing in the output set, and (2) the quality of a solution of size greater than $k$ degrades at worst by its downsampling rate. The first property is motivated by Lemma 2.1 and allows us to achieve a lower bound in terms of OPT in Lemma 3.9. The second property is useful for analyzing Line 13 of the Adaptive-Nonmonotone-Max algorithm.

Lemma 3.5.

For any element , .


Let  be an indicator random variable for the event . It follows that

Lemma 3.6.

For any subset and , if then .

We defer the proofs of Corollary 3.4 and Lemma 3.6 to the supplementary manuscript.
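The downsampling property can be checked exhaustively on a small instance. The sketch below assumes the property takes the form $\mathbb{E}[f(T)] \ge (k / |S|)\, f(S)$ for $T$ drawn uniformly from the size-$k$ subsets of $S$, which holds for nonnegative submodular functions. The toy cut function and helper names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy instance (not from the paper).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

def cut_value(S):
    """Nonnegative submodular f: number of edges crossing the cut (S, rest)."""
    S = set(S)
    return sum((u in S) != (v in S) for u, v in EDGES)

def expected_downsampled_value(f, S, k):
    """Exact E[f(T)] for T drawn uniformly from the size-k subsets of S."""
    subsets = list(combinations(sorted(S), k))
    return sum(f(T) for T in subsets) / len(subsets)

def check_downsampling(f, ground):
    """Verify E[f(T)] >= (k / |S|) * f(S) for every nonempty S and 0 <= k <= |S|."""
    items = sorted(ground)
    for r in range(1, len(items) + 1):
        for S in combinations(items, r):
            for k in range(len(S) + 1):
                bound = k / len(S) * f(S)
                if expected_downsampled_value(f, S, k) < bound - 1e-12:
                    return False
    return True
```

For instance, with $S = \{0, 1\}$ and $k = 1$ on this graph, the expected value after downsampling is 2.5 while the bound is $f(S)/2 = 1.5$.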

3.2 Analysis of the Approximation Factor

The main idea behind our analysis is to capture two different behaviors of Adaptive-Nonmonotone-Max and balance the worst of the two outcomes by optimizing constants.

Definition 3.7.

Let denote the event that the subroutine Threshold-Sampling breaks because . Similarly, let denote the complementary event.

The following two key lemmas lower bound the expected solution in terms of OPT and . The goal is to average these inequalities so that the probability terms disappear, giving us a lower bound only in terms of OPT.

Lemma 3.8.

For any such that , we have .


Observing that , it follows from Corollary 3.4 and the law of total expectation that

The result follows from the fact . ∎

The core of the analysis is devoted to proving the following lower bound and intricately uses the conditional expectation of nonnegative random variables.

Lemma 3.9.

Let denote the approximation factor for an unconstrained submodular maximization algorithm. For any threshold such that , we have


For any pair of subsets returned by Threshold-Sampling, we can partition the optimal into and . Let be the output of a call to Unconstrained-Max. By Lemma 2.5, we have . Submodularity and the definition of also imply that . Let . By subadditivity and the previous inequalities, it follows that


Using Section 3.2 and the assumption on , we have


Our next goal is to upper bound as a function of so that we have a bound that is independent of . Specifically, we prove in the supplementary material that for all sets , . This is a consequence of submodularity. Therefore, we have by subadditivity since is nonnegative.

Next, define a new submodular function such that , and consider a random set returned by Threshold-Sampling. Each element appears in with probability at most by Lemma 3.5. Applying Lemma 2.1 to gives us . It follows that


Now we are prepared to give the lower bound for in terms of . From our earlier analysis, if the algorithm calls Unconstrained-Max, we can use the inequality Equation 2 to lower bound . Since , the claim follows from Section 3.2, the law of total expectation, and the nonnegativity of and . Last, it is possible that the unconstrained solution exceeds the cardinality constraint, but by construction . Therefore, it follows from Lemma 3.6 that , which gives us the desired lower bound for . ∎

Equipped with these two complementary lower bounds, we can now prove our main results.

Proof of Theorem 3.1.

First assume that all subroutines behave as desired with probability at least by our choice of and a union bound. Since Adaptive-Nonmonotone-Max necessarily tries a such that , the analysis that follows considers this particular threshold.

We start with the proof of the approximation factor. Suppose for a constant that we later optimize. This leads to a -approximation for OPT. Otherwise, , so it follows from Lemma 3.9 that


Taking a weighted average of Lemma 3.8 and Equation 4 gives us

To bound the approximation factor, we solve the optimization problem

subject to the constraint , which effectively balances the two complementary probabilities.

Now we optimize the constants in the algorithm. The equality constraint implies that . Next, we set the two expressions in the maximin problem to be equal since one is increasing in and the other is decreasing, which implies that

Using the expressions above for and , it follows that


Lemma 2.5 implies that . Setting , , it follows that , which gives us an approximation factor of by Equation 5.

The proof of the adaptivity and query complexities follows from Lemma 2.3 and Lemma 2.5 since all thresholds are run in parallel. This completes the analysis for the Adaptive-Nonmonotone-Max algorithm. ∎

Proof of Theorem 3.2.

The proof is analogous to the proof of Theorem 3.1 except that Theorem 2.6 implies . Setting , we have and an approximation factor of by Equation 5. Running the same non-monotone maximization algorithm with failure probability and error proves the claim. ∎

4 Experiments

(a) Image Summarization
(b) Image Summarization
(c) Image Summarization
(d) Movie Recommendation
(e) Movie Recommendation
(f) Revenue Maximization
(g) Revenue Maximization
(h) Revenue Maximization
Figure 1: Performance of Adaptive-Nonmonotone-Max compared to several benchmarks for image summarization on the CIFAR-10 dataset, movie recommendation on the MovieLens 20M dataset, and revenue maximization on the top 5,000 communities of YouTube.

In this section, we evaluate Adaptive-Nonmonotone-Max on three real-world applications introduced in (Mirzasoleiman et al., 2016). We compare our algorithm with several benchmarks for non-monotone submodular maximization and demonstrate that it consistently finds competitive solutions using significantly fewer rounds and queries. Our experiments build on those in (Balkanski et al., 2018), which plot function values at each round as the algorithms progress. Additionally, we include plots of for different constraints and plots of the cumulative number of queries an algorithm has used after each round. For algorithms that rely on a -approximation of OPT, we run all guesses in parallel and record statistics for the approximation that maximizes the objective function. We defer the implementation details to the supplementary manuscript.

Next, we briefly describe the benchmark algorithms. The Greedy algorithm builds a solution by choosing an element with the maximum positive marginal gain in each round. This requires adaptive rounds and oracle queries, and it does not guarantee a constant approximation. The Random algorithm randomly permutes the ground set and returns the highest-valued prefix of elements. It uses a constant number of rounds, makes  queries, and also fails to give a constant approximation. The Random-Lazy-Greedy-Improved algorithm (Buchbinder et al., 2016) lazily builds a solution by randomly selecting one of the elements with highest marginal gain in each round. This gives a -approximation in adaptive rounds using queries. The Fantom algorithm (Mirzasoleiman et al., 2016) is similar to Greedy and robust to intersecting matroid and knapsack constraints. For a cardinality constraint, it gives a -approximation using adaptive rounds and queries. The Blits algorithm (Balkanski et al., 2018) constructs a solution by randomly choosing blocks of high-valued elements, giving a -approximation in rounds. While Blits is exponentially faster than the previous algorithms, it requires oracle queries.
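The Random benchmark is simple enough to sketch directly. The version below assumes that prefixes of length at most $k$ are considered, which matches the cardinality constraint; the function name and `seed` parameter are hypothetical.

```python
import random

def best_random_prefix(f, ground, k, seed=0):
    """Shuffle the ground set and return the highest-valued prefix of length <= k."""
    rng = random.Random(seed)
    order = sorted(ground)
    rng.shuffle(order)
    # All prefixes are fixed once the permutation is drawn, so their oracle
    # queries are independent and can be evaluated in one adaptive round.
    prefixes = [frozenset(order[:i]) for i in range(min(k, len(order)) + 1)]
    return max(prefixes, key=f)
```

Including the empty prefix guarantees that the returned value is never below $f(\emptyset)$, which matters for non-monotone objectives where adding elements can decrease the value.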

Image Summarization. The goal of image summarization is to find a small, representative subset from a large collection of images that accurately describes the entire dataset. The quality of a summary is typically modeled by two contrasting requirements: coverage and diversity. Coverage measures the overall representation of the dataset, and diversity encourages succinctness by penalizing summaries that contain similar images. For a collection of images , the objective function we use for image summarization is

where $s_{i,j}$ is the similarity between image $i$ and image $j$. The trade-off between coverage and diversity naturally gives rise to non-monotone submodular functions. We perform our image summarization experiment on the CIFAR-10 test set (Krizhevsky & Hinton, 2009), which contains 10,000 $32 \times 32$ color images. The image similarity $s_{i,j}$ is measured by the cosine similarity of the 3,072-dimensional pixel vectors for images $i$ and $j$. Following (Balkanski et al., 2018), we randomly select images to be our subsampled ground set since this experiment is throttled by the number and cost of oracle queries.
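For concreteness, the pairwise similarities can be computed as below. This is a minimal pure-Python sketch in which each image is assumed to be given as a flat list of pixel values (3,072 entries for a 32 x 32 RGB image); the function names are hypothetical, and a real experiment would more likely use a vectorized numerical library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Symmetric matrix of pairwise cosine similarities."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sim[i][j] = sim[j][i] = cosine_similarity(vectors[i], vectors[j])
    return sim
```

Since pixel values are nonnegative, every similarity computed this way lies in $[0, 1]$.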

We set in Figure 1(a) and track the progress of the algorithms in each round. Figure 1(b) compares the solution quality for different constraints and demonstrates that Adaptive-Nonmonotone-Max and Blits find substantially better solutions than Random. We run repeated trials for each stochastic algorithm and plot the mean and standard deviation of the solutions. We note that Fantom performs noticeably worse than the others because it stops choosing elements when their (possibly positive) marginal gain falls below a fixed threshold. We give a picture-in-picture plot of the query complexities in Figure 1(c) to highlight the difference in overall cost of the estimators for Adaptive-Nonmonotone-Max and Blits.

Movie Recommendation. Personalized movie recommendation systems aim to provide short, comprehensive lists of high-quality movies for a user based on the ratings of similar users. In this experiment, we randomly sample 500 movies from the MovieLens 20M dataset (Harper & Konstan, 2016), which contains 20 million ratings for 26,744 movies by 138,493 users. We use the method of (Mazumder et al., 2010) to predict the rating vector for each movie via low-rank matrix completion, and we define the similarity of two movies as the inner product of the rating vectors for movies $i$ and $j$. Following (Mirzasoleiman et al., 2016), we use the objective function

with . Note that if we have the cut function.

We remark that this experiment is similar to solving max-cut on an Erdős-Rényi graph. In Figure 1(d) we set , and in Figure 1(e) we consider . The Greedy algorithm performs moderately better than Random as the constraint approaches , and all other algorithms except Fantom are sandwiched between these benchmarks. The query complexities are similar to Figure 1(c), so we exclude this plot to keep Figure 1 compact.

Revenue Maximization. In this application, our goal is to choose a subset of users in a social network to advertise a product in order to maximize its revenue. We consider the top 5,000 communities of the YouTube network (Leskovec & Krevl, 2014) and subsample the graph by restricting to 25 randomly chosen communities (Balkanski et al., 2018). The resulting network has 1,329 nodes and 3,936 edges. We assign edge weights according to the continuous uniform distribution , and we measure influence using the non-monotone function

In Figure 1(f), we set and observe that Adaptive-Nonmonotone-Max significantly outperforms Fantom and Random. Figure 1(g) shows a stratification of the algorithms for , and Figure 1(h) is similar to the image summarization experiment. We note that the inner plot in Figure 1(h) shows that for the optimal threshold of Adaptive-Nonmonotone-Max, the number of candidates instantly falls below and the algorithm outputs a random prefix of high-valued elements in the next round.

5 Conclusions

We give the first algorithm for maximizing a non-monotone submodular function subject to a cardinality constraint that achieves a constant-factor approximation with nearly optimal adaptivity complexity. The query complexity of this algorithm is also nearly optimal and considerably less than in previous works. While the approximation guarantee is only , our empirical study shows that for several real-world applications Adaptive-Nonmonotone-Max finds solutions that are competitive with the benchmarks for non-monotone submodular maximization and requires significantly fewer rounds and oracle queries.


Acknowledgements

We thank the anonymous reviewers for their valuable feedback. Matthew Fahrbach was supported in part by an NSF Graduate Research Fellowship under grant DGE-1650044. Part of this work was done while he was a summer intern at Google Research, Zürich.

References


  • Agarwal et al. (2019) Agarwal, A., Assadi, S., and Khanna, S. Stochastic submodular cover with limited adaptivity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 323–342. SIAM, 2019.
  • Badanidiyuru et al. (2014) Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., and Krause, A. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 671–680. ACM, 2014.
  • Balkanski & Singer (2018) Balkanski, E. and Singer, Y. The adaptive complexity of maximizing a submodular function. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1138–1151. ACM, 2018.
  • Balkanski et al. (2018) Balkanski, E., Breuer, A., and Singer, Y. Non-monotone submodular maximization in exponentially fewer iterations. In Advances in Neural Information Processing Systems, pp. 2359–2370, 2018.
  • Balkanski et al. (2019) Balkanski, E., Rubinstein, A., and Singer, Y. An exponential speedup in parallel running time for submodular maximization without loss in approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 283–302. SIAM, 2019.
  • Barbosa et al. (2015) Barbosa, R., Ene, A., Nguyen, H., and Ward, J. The power of randomization: Distributed submodular maximization on massive datasets. In International Conference on Machine Learning, pp. 1236–1244, 2015.
  • Barbosa et al. (2016) Barbosa, R. d. P., Ene, A., Nguyen, H. L., and Ward, J. A new framework for distributed submodular maximization. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 645–654. IEEE, 2016.
  • Buchbinder & Feldman (2016) Buchbinder, N. and Feldman, M. Constrained submodular maximization via a non-symmetric technique. arXiv preprint arXiv:1611.03253, 2016.
  • Buchbinder et al. (2014) Buchbinder, N., Feldman, M., Naor, J. S., and Schwartz, R. Submodular maximization with cardinality constraints. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1433–1452. SIAM, 2014.
  • Buchbinder et al. (2016) Buchbinder, N., Feldman, M., and Schwartz, R. Comparing apples and oranges: Query trade-off in submodular maximization. Mathematics of Operations Research, 42(2):308–329, 2016.
  • Chekuri & Quanrud (2018) Chekuri, C. and Quanrud, K. Parallelizing greedy for submodular set function maximization in matroids and beyond. arXiv preprint arXiv:1811.12568, 2018.
  • Chekuri & Quanrud (2019) Chekuri, C. and Quanrud, K. Submodular function maximization in parallel via the multilinear relaxation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 303–322. SIAM, 2019.
  • Chen et al. (2018) Chen, L., Feldman, M., and Karbasi, A. Unconstrained submodular maximization with constant adaptive complexity. arXiv preprint arXiv:1811.06603, 2018.
  • Das & Kempe (2008) Das, A. and Kempe, D. Algorithms for subset selection in linear regression. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pp. 45–54. ACM, 2008.
  • Dueck & Frey (2007) Dueck, D. and Frey, B. J. Non-metric affinity propagation for unsupervised image categorization. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pp. 1–8. IEEE, 2007.
  • El-Arini & Guestrin (2011) El-Arini, K. and Guestrin, C. Beyond keyword search: Discovering relevant scientific literature. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 439–447. ACM, 2011.
  • Elenberg et al. (2018) Elenberg, E. R., Khanna, R., Dimakis, A. G., Negahban, S., et al. Restricted strong convexity implies weak submodularity. The Annals of Statistics, 46(6B):3539–3568, 2018.
  • Ene & Nguyen (2019) Ene, A. and Nguyen, H. L. Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 274–282. SIAM, 2019.
  • Ene et al. (2018a) Ene, A., Nguyen, H. L., and Vladu, A. A parallel double greedy algorithm for submodular maximization. arXiv preprint arXiv:1812.01591, 2018a.
  • Ene et al. (2018b) Ene, A., Nguyen, H. L., and Vladu, A. Submodular maximization with matroid and packing constraints in parallel. arXiv preprint arXiv:1808.09987, 2018b.
  • Fahrbach et al. (2018) Fahrbach, M., Miller, G. L., Peng, R., Sawlani, S., Wang, J., and Xu, S. C. Graph sketching against adaptive adversaries applied to the minimum degree algorithm. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pp. 101–112. IEEE, 2018.
  • Fahrbach et al. (2019) Fahrbach, M., Mirrokni, V., and Zadimoghaddam, M. Submodular maximization with nearly optimal approximation, adaptivity and query complexity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 255–273. SIAM, 2019.
  • Feige et al. (2011) Feige, U., Mirrokni, V. S., and Vondrak, J. Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133–1153, 2011.
  • Gharan & Vondrák (2011) Gharan, S. O. and Vondrák, J. Submodular maximization by simulated annealing. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1098–1116. SIAM, 2011.
  • Harper & Konstan (2016) Harper, F. M. and Konstan, J. A. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TIIS), 5(4):19, 2016.
  • Hartline et al. (2008) Hartline, J., Mirrokni, V., and Sundararajan, M. Optimal marketing strategies over social networks. In Proceedings of the 17th international conference on World Wide Web, pp. 189–198. ACM, 2008.
  • Kazemi et al. (2018) Kazemi, E., Zadimoghaddam, M., and Karbasi, A. Scalable deletion-robust submodular maximization: Data summarization with privacy and fairness constraints. In International Conference on Machine Learning, pp. 2549–2558, 2018.
  • Khanna et al. (2017) Khanna, R., Elenberg, E. R., Dimakis, A. G., Negahban, S., and Ghosh, J. Scalable greedy feature selection via weak submodularity. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, pp. 1560–1568, 2017.
  • Krizhevsky & Hinton (2009) Krizhevsky, A. and Hinton, G. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
  • Kumar et al. (2015) Kumar, R., Moseley, B., Vassilvitskii, S., and Vattani, A. Fast greedy algorithms in mapreduce and streaming. ACM Transactions on Parallel Computing (TOPC), 2(3):14, 2015.
  • Lattanzi et al. (2011) Lattanzi, S., Moseley, B., Suri, S., and Vassilvitskii, S. Filtering: A method for solving graph problems in mapreduce. In Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 85–94. ACM, 2011.
  • Lee et al. (2010) Lee, J., Mirrokni, V. S., Nagarajan, V., and Sviridenko, M. Maximizing nonmonotone submodular functions under matroid or knapsack constraints. SIAM Journal on Discrete Mathematics, 23(4):2053–2078, 2010.
  • Leskovec & Krevl (2014) Leskovec, J. and Krevl, A. SNAP Datasets: Stanford large network dataset collection, June 2014.
  • Liu & Vondrak (2018) Liu, P. and Vondrak, J. Submodular optimization in the mapreduce model. arXiv preprint arXiv:1810.01489, 2018.
  • Mazumder et al. (2010) Mazumder, R., Hastie, T., and Tibshirani, R. Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11:2287–2322, 2010.
  • Mirrokni & Zadimoghaddam (2015) Mirrokni, V. and Zadimoghaddam, M. Randomized composable core-sets for distributed submodular maximization. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pp. 153–162. ACM, 2015.
  • Mirzasoleiman et al. (2013) Mirzasoleiman, B., Karbasi, A., Sarkar, R., and Krause, A. Distributed submodular maximization: Identifying representative elements in massive data. In Advances in Neural Information Processing Systems, pp. 2049–2057, 2013.
  • Mirzasoleiman et al. (2016) Mirzasoleiman, B., Badanidiyuru, A., and Karbasi, A. Fast constrained submodular maximization: Personalized data summarization. In International Conference on Machine Learning, pp. 1358–1367, 2016.
  • Mohri et al. (2018) Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of Machine Learning. MIT Press, 2018.
  • Nemhauser et al. (1978) Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978.
  • Norouzi-Fard et al. (2018) Norouzi-Fard, A., Tarnawski, J., Mitrović, S., Zandieh, A., Mousavifar, A., and Svensson, O. Beyond 1/2-approximation for submodular maximization on massive data streams. arXiv preprint arXiv:1808.01842, 2018.
  • Qian & Singer (2019) Qian, S. and Singer, Y. Fast parallel algorithms for feature selection. arXiv preprint arXiv:1903.02656, 2019.
  • Simon et al. (2007) Simon, I., Snavely, N., and Seitz, S. M. Scene summarization for online image collections. In 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8. IEEE, 2007.
  • Sipos et al. (2012) Sipos, R., Swaminathan, A., Shivaswamy, P., and Joachims, T. Temporal corpus summarization using submodular word coverage. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 754–763. ACM, 2012.
  • Tschiatschek et al. (2014) Tschiatschek, S., Iyer, R. K., Wei, H., and Bilmes, J. A. Learning mixtures of submodular functions for image collection summarization. In Advances in Neural Information Processing Systems, pp. 1413–1421, 2014.

Appendix A Missing Analysis from Section 2

See 2.5


First assume that . We start by bounding the individual failure probability

By Lemma 2.4 we have . Using an analog of Markov’s inequality to upper bound , it follows that

Therefore, we must have . Since the subsets are chosen independently, our choice of gives us a total failure probability of

This completes the proof that with probability at least we have . To prove the adaptivity complexity, notice that all subsets can be generated and evaluated at once in parallel, hence the need for only one adaptive round. For the query complexity, we use the inequality , which holds for all . ∎
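The amplification argument above can be sketched in code under explicit assumptions. Suppose a uniformly random subset $R$ satisfies $\mathbb{E}[f(R)] \ge \mathrm{OPT}/4$ and $0 \le f(R) \le \mathrm{OPT}$. Applying Markov's inequality to $\mathrm{OPT} - f(R)$ then bounds the probability that a single trial falls below $(1/4 - \varepsilon)\,\mathrm{OPT}$ by $1/(1 + 4\varepsilon/3)$, so $t \ge \log(1/\delta) / \log(1 + 4\varepsilon/3)$ independent trials suffice for failure probability at most $\delta$. These constants are one way to instantiate the argument and may differ from the ones in the proof; the function names and the 4-cycle cut example are hypothetical.

```python
import math
import random


def num_trials(eps, delta):
    """Smallest t with (1 / (1 + 4*eps/3))**t <= delta, under the assumptions in the lead-in."""
    return math.ceil(math.log(1.0 / delta) / math.log(1.0 + 4.0 * eps / 3.0))


def best_of_random_subsets(f, ground, eps, delta, rng):
    """Draw independent uniformly random subsets and keep the best one.

    All subsets could be generated and evaluated in parallel, i.e. in a single
    adaptive round; the loop here is sequential only for simplicity.
    """
    best, best_value = None, -math.inf
    for _ in range(num_trials(eps, delta)):
        candidate = frozenset(x for x in ground if rng.random() < 0.5)
        value = f(candidate)
        if value > best_value:
            best, best_value = candidate, value
    return best, best_value


# Toy non-monotone submodular function: the cut function of a 4-cycle.
CYCLE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]


def cycle_cut(subset):
    return sum(1 for u, v in CYCLE_EDGES if (u in subset) != (v in subset))


if __name__ == "__main__":
    print("trials:", num_trials(0.1, 0.05))
    print(best_of_random_subsets(cycle_cut, range(4), 0.1, 0.05, random.Random(1)))
```

Since $\log(1 + x) \ge x/2$ for $x \in [0, 1]$, the trial count in this sketch is $O(\log(1/\delta)/\varepsilon)$.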

Appendix B Missing Analysis from Section 3

See 3.4


We prove the claim by induction. Since is nonnegative, the base case is clearly true. Assuming the claim as the induction hypothesis, it follows from Lemma 3.3 that

See 3.6


Fix an ordering on the elements in . Expanding the expected value and using submodularity, it follows that

which completes the proof. ∎

Lemma B.1.

For any set and optimal solution , if , then


It is equivalent to show that

For any sets $A$ and $B$, we have $f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$ by the definition of submodularity. It follows that

Therefore, it suffices to instead show that


Let and write

Next, fix an ordering on the elements in . Summing the consecutive marginal gains of the elements in the set according to this order gives


We claim that each marginal contribution in Equation 7 is nonnegative. Assume for contradiction this is not the case. Let be the first element violating this property, and let be the previous element according to the ordering. By submodularity,

which implies , a contradiction. Therefore, the inequality in Equation 6 is true, as desired. ∎
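The telescoping step used in this proof, where the gain of adding a set is written as a sum of consecutive marginal gains along a fixed ordering, can be checked numerically. The sketch below uses a small coverage function as a stand-in submodular function; the instance `ITEMS` and all function names are illustrative assumptions. It only demonstrates the telescoping identity and diminishing returns, not the nonnegativity claim, which relies on the hypotheses of the lemma.

```python
def coverage(sets_by_element):
    """Return f(S) = number of items covered by the elements of S (a submodular function)."""
    def f(subset):
        covered = set()
        for x in subset:
            covered |= sets_by_element[x]
        return len(covered)
    return f


def marginal(f, x, base):
    """Marginal gain of adding element x to the set base."""
    return f(base | {x}) - f(base)


def telescoped_gain(f, base, ordered_elements):
    """Sum the consecutive marginal gains of ordered_elements, starting from base."""
    current, total = set(base), 0
    for x in ordered_elements:
        total += marginal(f, x, current)
        current.add(x)
    return total


# Hypothetical instance: each element covers a few items.
ITEMS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {1, 5}}
F = coverage(ITEMS)

if __name__ == "__main__":
    direct = F({"a", "b", "c", "d"}) - F({"a"})
    print(direct, telescoped_gain(F, {"a"}, ["b", "c", "d"]), telescoped_gain(F, {"a"}, ["d", "c", "b"]))
```

The sum is the same for every ordering because the intermediate terms cancel, while the individual marginal gains depend on the order.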

Appendix C Implementation Details from Section 4

We set for all of the algorithms except Random-Lazy-Greedy-Improved, which we run with . Since some of the algorithms require a guess of OPT, we adjust  accordingly and fairly. We remark that all algorithms give reasonably similar results for any . We set the number of queries to be for the estimators in Adaptive-Nonmonotone-Max and Blits, although for the theoretical guarantees these should be and , respectively. For context, the experiments in (Balkanski et al., 2018) set the number of samples per estimate to be 30. Last, we set the number of outer rounds for Blits to be , which also matches (Balkanski et al., 2018) since the number needed for provable guarantees is , which is too large for these datasets.