# Weakly Submodular Function Maximization Using Local Submodularity Ratio

Weak submodularity is a natural relaxation of the diminishing return property, which is equivalent to submodularity. Weak submodularity has been used to show that many (monotone) functions that arise in practice can be efficiently maximized with provable guarantees. In this work we introduce two natural generalizations of weak submodularity for non-monotone functions. We show that an efficient randomized greedy algorithm has provable approximation guarantees for maximizing these functions subject to a cardinality constraint. We then provide a more refined analysis that takes into account that the weak submodularity parameter may change (sometimes improving) throughout the execution of the algorithm. This leads to improved approximation guarantees in some settings. We provide applications of our results for monotone and non-monotone maximization problems.


## 1 Introduction

Submodularity is a property of set functions equivalent to the notion of diminishing returns. More formally, we say that a set function f defined over a ground set V is submodular if for any two sets A ⊆ B ⊆ V and an element e ∉ B, the corresponding marginal gains satisfy f(A ∪ {e}) − f(A) ≥ f(B ∪ {e}) − f(B). Submodularity has found a wide range of connections and applications across different areas of computer science in recent years.
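
As a quick illustration (our own toy example, not from the paper), the following snippet exhaustively checks the diminishing returns property on a small coverage function, a standard submodular objective:

```python
from itertools import combinations

# Toy coverage function f(S) = |union of the sets indexed by S|,
# a classic submodular objective. We verify diminishing returns:
# f_A(e) >= f_B(e) for all A ⊆ B and e ∉ B.
universe_sets = {
    1: {"a", "b"},
    2: {"b", "c"},
    3: {"c", "d", "e"},
    4: {"a", "e"},
}

def f(S):
    covered = set()
    for i in S:
        covered |= universe_sets[i]
    return len(covered)

def marginal(S, e):
    # marginal gain f_S(e) = f(S ∪ {e}) - f(S)
    return f(S | {e}) - f(S)

def subsets(s):
    s = sorted(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

ground = set(universe_sets)
ok = all(
    marginal(A, e) >= marginal(B, e)
    for B in subsets(ground)
    for A in subsets(B)
    for e in ground - B
)
print(ok)  # True: coverage functions satisfy diminishing returns
```

Coverage functions are submodular, so the exhaustive check succeeds on every pair A ⊆ B.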

However, many functions that arise in practice do not satisfy the diminishing returns property, but rather a weaker version of it. This has motivated several lines of work exploring different ways to relax the submodularity property. One such relaxation that has received a lot of attention from the machine learning community is the notion of weak submodularity (we postpone the formal definition to Section 1.1), originally introduced by Das and Kempe [10]. They provided applications to the feature selection and the dictionary selection problems, and showed that the standard greedy algorithm achieves a (1 − e^{−γ})-approximation for the monotone maximization problem subject to a cardinality constraint. Here the parameter γ is called the submodularity ratio, and it measures how “close” the function is to being submodular. Weak submodularity has found applications in areas such as linear and nonlinear sparse regression [13, 23], high-dimensional subset selection [14], interpretability of black-box neural network classifiers [13], video summarization, splice site detection, and black-box interpretation of images [9].

In subsequent work, Das and Kempe [11] left as an open question whether some of these theoretical guarantees can be extended to non-monotone objectives. As their definition of weak submodularity is targeted at monotone functions, they raised the question of whether there is a more general definition that retains some of the positive results of their work, while also yielding analogous results for non-monotone objectives.

One main goal of this work is to answer that question. We believe this is interesting for both theoretical and practical purposes, given that non-monotone submodular objectives have found a wide range of applications in computer science. Some of these include document summarization [26, 27], MAP inference for determinantal point processes [19], personalized data summarization [28], nonparametric learning [32], image summarization [30], and removing redundant elements from DNA sequencing [25]. Hence, it seems natural to study how the approximation guarantees for non-monotone submodular maximization degrade in terms of the submodularity ratio.

In this work we introduce a natural generalization of weak submodularity to the non-monotone setting. We then show that a fast and simple randomized greedy algorithm retains some of the good theoretical guarantees available for (non-monotone) submodular objectives. In addition, for monotone weakly submodular functions, this algorithm retains the (1 − e^{−γ})-approximation guarantee given in [10].

A second main contribution of our work is to provide a more refined analysis that takes into account that the submodularity ratio parameter may change (sometimes improving) throughout the execution of the algorithm. We provide several applications where this more refined bound leads to improved approximation guarantees, for both monotone and non-monotone maximization problems.

The rest of this section is organized as follows. In Section 1.1 we extend weak submodularity to the non-monotone setting. In Section 1.2 we discuss the notion of local submodularity ratio. We discuss several examples and applications in Section 1.3. Our main contributions are presented in Section 1.5. Additional related work regarding weak submodularity and non-monotone submodular maximization is discussed in Section 1.4.

### 1.1 Weak submodularity and non-monotonicity

Throughout this paper we use f_A(B) to denote the marginal gain of adding the set B to the set A, that is, f_A(B) = f(A ∪ B) − f(A); for a single element e we write f_A(e) instead of f_A({e}). A non-negative monotone set function f is γ-weakly submodular for some parameter γ ∈ (0, 1], if for any pair of disjoint sets A, B it satisfies ∑_{e∈B} f_A(e) ≥ γ · f_A(B). We note that this is the definition used in [2, 9, 13], which is slightly adapted from the original definition given in [10, 11]. The parameter γ is called the submodularity ratio.

When f is monotone, it is clear that for any value of γ the above class contains monotone submodular functions. However, for non-monotone objectives the marginal gains can be negative, and in this case we have γ · f_A(B) ≥ f_A(B) whenever f_A(B) ≤ 0, leading to a stronger condition than diminishing returns. This motivates us to introduce the following two classes of non-monotone non-submodular functions.

[pseudo and weak submodularity]

Given a scalar γ ∈ (0, 1], we say that a set function f is:

1. γ-pseudo submodular if ∑_{e∈B} f_A(e) ≥ γ · f_A(B) for any pair of disjoint sets A, B.

2. γ-weakly submodular if ∑_{e∈B} f_A(e) ≥ min{γ · f_A(B), (1/γ) · f_A(B)} for any A, B disjoint.

We first note that for monotone functions, the above two definitions are equivalent to the notion of γ-weak submodularity from previous works [2, 9, 13]. This follows immediately from the fact that monotone functions satisfy f_A(B) ≥ 0 for all disjoint A, B, and hence min{γ · f_A(B), (1/γ) · f_A(B)} = γ · f_A(B).

For any value γ ∈ (0, 1] the above definition of γ-weak submodularity leads to a weaker notion of diminishing returns (i.e., it contains non-monotone submodular functions). Indeed, if f_A(B) ≥ 0 we have f_A(B) ≥ γ · f_A(B), while if f_A(B) < 0 we have f_A(B) ≥ (1/γ) · f_A(B). On the other hand, while the class of γ-pseudo submodular functions does not properly contain non-monotone submodular functions, it does contain functions that are not necessarily submodular. We show this in Figure 1.

### 1.2 Local submodularity ratio

The submodularity ratio γ is in general a very pessimistic bound for most applications. This is due to the fact that γ is defined as a global bound, in the sense that it must hold for any pair of disjoint sets A, B, or at least for any pair of sets that is relevant to the execution of the algorithm, e.g., sets of cardinality at most k. We next discuss a natural way to “refine” this bound.

Given a function f and any pair of disjoint sets A, B, in this work we denote by γ_{A,B} any non-negative scalar satisfying ∑_{e∈B} f_A(e) ≥ γ_{A,B} · f_A(B). When it is clear from the context we usually simplify the notation to γ instead of γ_{A,B}. One of our contributions is showing how using these local bounds can be beneficial in some settings. In particular, we discuss several natural classes of functions for which (i) one can compute explicit bounds for the value γ_{A,B} (see Section 1.3), and (ii) using the local bounds (instead of a global γ) leads to significantly better theoretical guarantees (we discuss this in more detail in Section 1.5). We believe this is interesting for both theoretical and practical applications.
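
To make the local bound concrete, the following sketch brute-forces the best possible value of γ_{A,B} for a fixed disjoint pair, i.e., the largest γ with ∑_{e∈B} f_A(e) ≥ γ · f_A(B) when f_A(B) > 0. The function f(S) = |S|² is our own toy supermodular example, chosen because its local ratio is strictly below 1 and improves as A grows:

```python
def local_ratio(f, A, B):
    # Largest gamma with  sum_{e in B} f_A(e) >= gamma * f_A(B),
    # assuming the joint marginal gain f_A(B) is positive.
    gain_B = f(A | B) - f(A)
    assert gain_B > 0, "the ratio only constrains pairs with f_A(B) > 0"
    singles = sum(f(A | {e}) - f(A) for e in B)
    return singles / gain_B

# Toy supermodular function f(S) = |S|^2: marginals grow with |S|, so the
# local ratio is below 1 but improves as the base set A gets larger.
f = lambda S: len(S) ** 2

print(local_ratio(f, {1, 2}, {3, 4, 5}))        # (2a+1)b / (2ab+b^2) = 5/7
print(local_ratio(f, {1, 2, 3, 4}, {5, 6, 7}))  # 9/11
```

The second call shows the “local improvement” phenomenon: the same function admits a better ratio once the base set is larger, which is exactly what the refined analysis exploits.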

### 1.3 Examples and applications

In this section we present several classes of functions for which the parameter γ_{A,B} can be bounded explicitly, and discuss applications arising from these results. Due to space limitations we postpone the proofs to Appendix A.

Our first example is the so-called metric diversity function (also known as remote clique). Here we are given a metric d (i.e., a distance that satisfies the triangle inequality) over a finite ground set V, where d(u, v) measures the dissimilarity between two elements u and v. One then defines a set function f(S) = ∑_{{u,v}⊆S} d(u, v) that measures the diversity inside the set S. The problem of finding a diverse subset has been studied in the operations research community [22, 29, 3], and has found applications in other areas [1, 12].

[] Given a metric d, consider the function f(S) = ∑_{{u,v}⊆S} d(u, v), which is monotone and supermodular. Then, we have an explicit lower bound on γ_{A,B} for any two disjoint sets A, B, depending only on a = |A| and b = |B|.
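
A minimal sketch of the metric diversity objective, using points on the real line with d(u, v) = |u − v| (illustrative data, not from the paper). The last two lines show the supermodular behavior: the marginal gain of an element grows with the size of the base set:

```python
from itertools import combinations

# Points on the real line; d(u, v) = |u - v| satisfies the triangle
# inequality, so this is a valid metric. Values are illustrative.
points = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 7.0}

def d(u, v):
    return abs(points[u] - points[v])

def diversity(S):
    # f(S) = sum of pairwise distances inside S ("remote clique" objective)
    return sum(d(u, v) for u, v in combinations(sorted(S), 2))

print(diversity({"a", "c", "d"}))  # 3 + 7 + 4 = 14.0

# Increasing returns: adding "b" helps more when the base set is larger.
print(diversity({"a", "b"}) - diversity({"a"}))                      # 1.0
print(diversity({"a", "b", "c", "d"}) - diversity({"a", "c", "d"}))  # 1 + 2 + 6 = 9.0
```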

The works of [5, 6] introduced the notion of proportionally submodular functions (they called them weakly submodular at first, and changed the name in subsequent work). A set function f is proportionally submodular if |T| · f(S) + |S| · f(T) ≥ |S ∩ T| · f(S ∪ T) + |S ∪ T| · f(S ∩ T) for every S, T ⊆ V. In the monotone setting, this class properly contains monotone submodular functions. In addition, this class also contains some non-submodular objectives such as the (supermodular) metric diversity function discussed in Example 1.3. Since these functions are closed under addition, the sum of a monotone submodular function and a metric diversity function is proportionally submodular. Our next result bounds the parameter γ_{A,B} for this class, in both the monotone and non-monotone settings.

A non-negative proportionally submodular function admits an explicit lower bound on γ_{A,B} for any two disjoint sets A, B, depending only on a = |A| and b = |B|.

The above result leads to interesting applications. First, it allows us to improve over the current best approximation for maximizing a monotone proportionally submodular function subject to a cardinality constraint. In addition, combining this with other results from this work, we can also get improved approximations for the product of a monotone submodular function f and a monotone proportionally submodular function g. We discuss this in more detail in Section 3.

[] Let f, g be two non-negative monotone set functions with parameters γ^f_{A,B} and γ^g_{A,B} respectively. Then the product function h = f · g is also non-negative and monotone, with parameter

 γ_{A,B} ≥ (f(A)/f(A ∪ B)) · γ^g_{A,B}   if γ^f_{A,B} ≥ γ^g_{A,B},
 γ_{A,B} ≥ (g(A)/g(A ∪ B)) · γ^f_{A,B}   if γ^g_{A,B} ≥ γ^f_{A,B},

for any two disjoint sets A, B. In particular, if f and g have global parameters γ^f and γ^g respectively, then the product function has a corresponding global parameter.

Using that submodular functions satisfy , we can combine the above result with Examples 1.3 and 1.3 to get the following.

Let f, g be two monotone functions, and let h = f · g be the product function with parameter γ_{A,B}. Then we have the following.

1. If and are submodular then .

2. If is submodular and is the metric diversity function from Example 1.3, then , where and .

3. If is submodular and is proportionally submodular then , where and .

By taking a non-monotone submodular function, and either multiplying or dividing it by the cardinality function, we obtain a new function that is no longer submodular. The next example bounds the parameter for these functions.

[] Let g be a submodular function. Then for any two disjoint sets A, B with a = |A| and b = |B| we have the following.

1. The function satisfies .

2. The function has .

We next discuss the behavior of the parameter γ_{A,B} under summation, and how this result allows us to generalize some of the bounds previously discussed in this section.

[] Let be two set functions with parameters and respectively. We have the following.

1. If and are both monotone, then is also monotone with parameter . In particular, if holds for all pairs of disjoint sets and , then has parameter .

2. If is monotone and is non-monotone, and holds for all pairs of disjoint sets and , then has parameter .

By combining the above proposition with Examples 1.3, 1.3, and 1.3 we get the following.

[] Let be a non-negative monotone submodular function. Then:

• The sum where is a metric diversity function satisfies .

• The sum where is non-monotone submodular satisfies .

• The sum where is non-monotone proportionally submodular satisfies , where and .

We can also combine the above result with Example 1.3 to get that the product function satisfies , whenever and are monotone submodular and is a metric diversity function. This generalizes the bound from Example 1.3 (b).

We note that the sum of a monotone submodular function and a metric diversity function has been previously studied [4]. We discuss this in more detail in Section 3.

### 1.4 Additional related work

The notion of weak submodularity was introduced by Das and Kempe [10], who showed that the standard greedy algorithm achieves a (1 − e^{−γ})-approximation for the monotone maximization problem subject to a cardinality constraint. They provided applications to the feature selection problem for linear regression and to the dictionary selection problem. Khanna et al. [23] showed that faster (such as distributed and stochastic) versions of the greedy algorithm also retain provable theoretical guarantees for monotone weakly submodular maximization under a cardinality constraint. They discussed applications for the sparse linear regression problem and the support selection problem. Elenberg et al. [13] considered the above problem in the random order streaming setting, and provided applications to nonlinear sparse regression and interpretability of black-box neural network classifiers. Connections between weak submodularity and restricted strong convexity were shown by Elenberg et al. [14], and used for high-dimensional subset selection problems. The work of Chen et al. [9] goes beyond the cardinality constraint, and considers the monotone maximization problem subject to a matroid constraint. They provided an approximation ratio for this problem, and discussed applications to video summarization, splice site detection, and black-box interpretation of images. Gatmiry and Gomez [17] showed that the standard deterministic greedy algorithm also enjoys provable guarantees for the above problem, though worse than the ones provided by [9]. They provide applications to tree-structured Gaussian graphical model estimation. The recent work of Harshaw et al. [21] considers the problem of maximizing f − c, where f is non-negative monotone γ-weakly submodular and c is a non-negative modular function. Using the special structure of this type of objective, they circumvented the potential roadblocks of the objective being negative or non-monotone, and provided an approximation guarantee matching the (1 − e^{−γ}) ratio from Das and Kempe for the monotone γ-weakly submodular maximization problem. In addition, they showed that this approximation ratio is tight in the value oracle model.

Non-monotone submodular maximization subject to a cardinality constraint has been studied extensively. The first constant factor approximation for this problem was given by Lee et al. [24]. Since then a long series of works [8, 15, 16, 18, 20, 31] has improved the approximation factor, leading to the current best approximation ratio due to Buchbinder and Feldman [7]. Some of the latter works, however, use an approach that involves a continuous relaxation of the objective function followed by rounding of the fractional solution. While this approach has been extremely successful for proving strong theoretical guarantees, the resulting running times usually become impractical in real-world scenarios with large amounts of data. In our work we use a randomized greedy algorithm proposed in [8], where it is shown that this algorithm produces a 1/e-approximation (in expectation). On the inapproximability side, Gharan and Vondrák [18] showed that it is impossible to achieve a 0.491-approximation for this problem in the value oracle model.

### 1.5 Our contributions

One main contribution of this work is showing that an easy-to-implement and fast randomized greedy algorithm (i.e., Algorithm 1) has provable theoretical guarantees for the problem of maximizing f subject to a cardinality constraint when the function f is non-monotone weakly submodular (as defined in Section 1.1). This is encapsulated in the following result. To the best of our knowledge, this is the first time that weakly submodular functions are considered in the non-monotone setting.

There exists an efficient randomized greedy algorithm which has an approximation ratio (in expectation) of at least for the problem of maximizing a non-negative non-monotone γ-weakly submodular function subject to a cardinality constraint. For non-negative non-monotone γ-pseudo submodular functions, the approximation ratio is at least .

We remark that when γ approaches 1, our bounds recover the approximation factor given in [8] for the analysis of the same algorithm over submodular functions (i.e., the case γ = 1).

A key ingredient for analyzing non-monotone objectives is to bound the term E[f(S_i ∪ OPT)] from below with respect to f(OPT). For submodular functions the work of [8] (see their Lemma 2.2 and Observation 1) bounds this term by using the diminishing returns property, i.e., f_A(e) ≥ f_B(e) whenever A ⊆ B and e ∉ B. However, it is not clear how one could imitate such an argument in the case of non-submodular functions. In particular, it is not obvious whether, from the definition of weak submodularity, one could find a parameter satisfying some approximate version of diminishing returns. We circumvent this issue by analyzing the quantity E[f(S_i ∪ OPT)] directly with respect to the execution of the algorithm (see Lemma 2.2).

Another important piece of our work is to provide a more refined analysis that allows for the submodularity ratio to change throughout the execution of the algorithm. This is particularly useful since many classes of functions will usually satisfy this (see for instance Section 1.3). Our most general result (Theorem 2.2) assumes some local bounds throughout the algorithm, and provides approximation guarantees based on these bounds. Its statement is somewhat less clean to express since it depends on the notation used in Algorithm 1 (which we introduce in Section 2.1), so we defer its full presentation and discussion to Section 2.2. We next present some of its consequences, which lead to some of our main applications.

Assume we run the randomized greedy algorithm described in Algorithm 1 on a function f with parameters γ_{A,B} for any pair of disjoint sets A, B. Moreover, assume there are values γ_0, …, γ_{k−1} so that γ_{A,B} ≥ γ_i holds whenever A is a possible solution of the algorithm after iteration i. Then the algorithm produces (in expectation):

• An approximation factor of at least if is monotone.

• An approximation factor of at least if is non-monotone.

We remark that for monotone γ-weakly submodular objectives the above result retains the (1 − e^{−γ})-approximation given in [10]. This can be achieved by setting γ_i = γ for all i.

Combining the above theorem with the results from Section 1.3 leads to interesting applications. We now highlight some of them, and defer a more detailed discussion to Section 3. For non-monotone objectives the result from Theorem 1.5 allows us to obtain, for instance, provable guarantees for the functions discussed in Example 1.3.

Theorem 1.5 also leads to interesting results for monotone objectives. Applying it to Example 1.3 we get a -approximation for maximizing monotone proportionally submodular functions subject to a cardinality constraint. This improves over the current best -approximation from [5, 6]. Another set of applications is obtained via Example 1.3, which allows us to get several constant factor approximations for the product of set functions. For instance, for the product where are monotone submodular and is a metric diversity function, our results lead to a -approximation. For the product where is monotone submodular and is monotone proportionally submodular, we get a -approximation. We are not aware of previous work for these problems.

## 2 Analysis of the algorithm

In this section we present the main theoretical contribution of this work, which is to analyze the performance of a randomized greedy algorithm on non-monotone functions. Due to space limitations, we defer the analysis for monotone functions to Appendix B. We next describe the randomized greedy algorithm that we use in this work.

### 2.1 Randomized greedy algorithm

In this section, we explain the randomized greedy algorithm introduced in the work of [8], where they study the problem of maximizing a non-monotone submodular function subject to a cardinality constraint. We note that this algorithm has also been used in [9] for the problem of maximizing a monotone weakly submodular function subject to a matroid constraint.

Given a set function f over a ground set V, we first add a set of k dummy elements to the ground set. That is, for any set S and any dummy element e the function satisfies f_S(e) = 0. Then, in each iteration i we take a set M_i of k elements that maximizes the sum of the marginal gains ∑_{e∈M_i} f_{S_{i−1}}(e), where in case of ties we always give preference to elements from the original ground set V. Finally, we choose one of the k elements of M_i uniformly at random, and add it to the current solution. We summarize this procedure in Algorithm 1.

The algorithm is quite efficient, as it makes O(nk) queries to the value oracle; this is the same number of queries that the standard deterministic greedy algorithm makes. Moreover, adding dummy elements to the original ground set guarantees the following.
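
The procedure above can be sketched as follows. This is a best-effort reading of Algorithm 1 (candidate set M_i of the k largest marginal gains, ties broken in favor of original elements, and a uniformly random pick), not the paper's reference implementation; the function name and the dummy-element encoding are our own:

```python
import random

def randomized_greedy(f, ground, k, seed=0):
    # Sketch of the RandomizedGreedy algorithm of [8] as described above.
    # f is queried only on subsets of the original ground set: dummy
    # elements are fresh tokens stripped before every oracle call, which
    # makes their marginal gain exactly zero.
    rng = random.Random(seed)
    dummies = {f"_dummy{j}" for j in range(k)}
    fbar = lambda S: f(frozenset(S) - dummies)
    S = set()
    for _ in range(k):
        gains = {e: fbar(S | {e}) - fbar(S)
                 for e in (set(ground) | dummies) - S}
        # M_i: the k elements with largest marginal gain, preferring
        # elements of the original ground set on ties.
        M = sorted(gains, key=lambda e: (gains[e], e not in dummies),
                   reverse=True)[:k]
        S.add(rng.choice(M))  # one of the k candidates, uniformly at random
    return S - dummies        # dummy picks are dropped from the output

# Tiny usage example with a modular (hence submodular) objective.
weights = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.5}
solution = randomized_greedy(lambda S: sum(weights[e] for e in S), set(weights), 2)
print(len(solution))  # 2: dummy gains (all zero) never beat the positive real gains
```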

###### Observation .

At any iteration of the RandomizedGreedy algorithm the following is satisfied:

1. .

2. , and hence .

3. .

###### Proof.

The first two statements are immediate from the fact that we add dummy elements. To see the last statement, let M̄_i denote a set of size k containing OPT and potentially some dummy elements (so that |M̄_i| = k). Then, by definition of M_i we have

 ∑_{e∈M_i} f_{S_{i−1}}(e) ≥ ∑_{e∈M̄_i} f_{S_{i−1}}(e) = ∑_{e∈OPT} f_{S_{i−1}}(e). ∎

### 2.2 Analysis for non-monotone functions

In this section we analyze the performance of the RandomizedGreedy algorithm on non-monotone functions. As mentioned in Section 1.5, a key ingredient for analyzing the non-monotone case is to bound the term E[f(S_i ∪ OPT)] from below with respect to f(OPT). For monotone objectives this is trivial, since by monotonicity we always have f(S_i ∪ OPT) ≥ f(OPT). The techniques used in [8] for analyzing RandomizedGreedy with respect to submodular functions make use of the diminishing returns property (see their Lemma 2.2 and Observation 1). However, it is not clear how to extend those techniques to non-monotone weakly submodular functions, since it is not obvious whether they satisfy some type of approximate diminishing returns property. Our next result circumvents this issue by analyzing the quantity E[f(S_i ∪ OPT)] directly with respect to the execution of the algorithm.

Let f be a non-negative set function. Assume there are numbers γ̄_1, …, γ̄_k such that

 ∑_{u∈M_i} f_{S_{i−1}∪OPT}(u) ≥ γ̄_i · f_{S_{i−1}∪OPT}(M_i)

is satisfied for any choice of S_{i−1} and M_i throughout the execution of the RandomizedGreedy algorithm. Then at any iteration i the algorithm satisfies E[f(S_i ∪ OPT)] ≥ ∏_{j=1}^{i} [1 − γ̄_j/k] · f(OPT).

###### Proof.

Fix i and an event corresponding to a possible path of the algorithm up to iteration i − 1. Then (conditioned on this event) we have

 E[f(S_i ∪ OPT)] = f(S_{i−1} ∪ OPT) + E[f_{S_{i−1}∪OPT}(u_i)] = f(S_{i−1} ∪ OPT) + (1/k) ∑_{u∈M_i} f_{S_{i−1}∪OPT}(u)
 ≥ f(S_{i−1} ∪ OPT) + (γ̄_i/k) · f_{S_{i−1}∪OPT}(M_i)
 = f(S_{i−1} ∪ OPT) + (γ̄_i/k) · [f(S_{i−1} ∪ OPT ∪ M_i) − f(S_{i−1} ∪ OPT)]
 ≥ f(S_{i−1} ∪ OPT) − (γ̄_i/k) · f(S_{i−1} ∪ OPT) = [1 − γ̄_i/k] · f(S_{i−1} ∪ OPT),

where the first inequality follows from the assumption and the second inequality follows from non-negativity.

By unconditioning on this event, and taking an expectation over all such possible events, we get:

 E[f(S_i ∪ OPT)] ≥ [1 − γ̄_i/k] · E[f(S_{i−1} ∪ OPT)] ≥ [1 − γ̄_i/k] · [1 − γ̄_{i−1}/k] · E[f(S_{i−2} ∪ OPT)]
 ≥ ⋯ ≥ ∏_{j=1}^{i} [1 − γ̄_j/k] · E[f(S_0 ∪ OPT)] = ∏_{j=1}^{i} [1 − γ̄_j/k] · f(OPT). ∎

For submodular functions the above result becomes E[f(S_i ∪ OPT)] ≥ (1 − 1/k)^i · f(OPT), since we can take γ̄_j = 1 for all j. We remark that this matches the bound provided in [8] for submodular functions (see their Observation 1).

We now state and prove our main result.

Let f be a set function. Assume there are non-negative values γ̄_1, …, γ̄_k and γ_0, …, γ_{k−1} such that

 ∑_{u∈M_i} f_{S_{i−1}∪OPT}(u) ≥ γ̄_i · f_{S_{i−1}∪OPT}(M_i)

and

 ∑_{e∈OPT} f_{S_{i−1}}(e) ≥ γ_{i−1} · f_{S_{i−1}}(OPT)

are satisfied for any choice of S_{i−1} and M_i throughout the execution of the RandomizedGreedy algorithm. Then at any iteration i the algorithm satisfies

 E[f(S_i)] ≥ (∏_{j=1}^{i−1} min{1 − γ̄_j/k, 1 − γ_j/k}) · (∑_{j=0}^{i−1} γ_j/k) · f(OPT).
###### Proof.

We first show that at any iteration i the algorithm satisfies

 E[f(S_i)] ≥ [1 − γ_{i−1}/k] · E[f(S_{i−1})] + (γ_{i−1}/k) · ∏_{j=1}^{i−1} [1 − γ̄_j/k] · f(OPT). (1)

We do this as follows. Fix i and an event corresponding to a possible realization of the algorithm up to iteration i − 1. Then (conditioned on this event) we have

 E[f_{S_{i−1}}(e_i)] = (1/k) ∑_{e∈M_i} f_{S_{i−1}}(e) ≥ (1/k) ∑_{e∈OPT} f_{S_{i−1}}(e) ≥ (γ_{i−1}/k) · f_{S_{i−1}}(OPT)
 = (γ_{i−1}/k) · [f(S_{i−1} ∪ OPT) − f(S_{i−1})],

where the first inequality follows from Observation 2.1, and the second inequality from the theorem’s assumption.

We now unfix the realization and take expectations over all such possible realizations of the algorithm.

 E[f(S_i)] = E[f(S_{i−1}) + f_{S_{i−1}}(e_i)] = E[f(S_{i−1})] + E[f_{S_{i−1}}(e_i)]
 ≥ E[f(S_{i−1})] + (γ_{i−1}/k) · E[f(S_{i−1} ∪ OPT) − f(S_{i−1})]
 = [1 − γ_{i−1}/k] · E[f(S_{i−1})] + (γ_{i−1}/k) · E[f(S_{i−1} ∪ OPT)],

where applying Lemma 2.2 to the term E[f(S_{i−1} ∪ OPT)] (which we can do due to the lemma's assumptions) yields Equation (1).

We are now ready to prove the statement of the theorem using induction on the value of i. The base case i = 1 claims that E[f(S_1)] ≥ (γ_0/k) · f(OPT). This follows from

 E[f(S_1)] = (1/k) ∑_{e∈M_1} f_{S_0}(e) ≥ (1/k) ∑_{e∈OPT} f_{S_0}(e) ≥ (1/k) · γ_0 · f_{S_0}(OPT) = (γ_0/k) · f(OPT),

where the first inequality follows from Observation 2.1, the second inequality from the theorem's assumptions, and the last equality is because S_0 = ∅.

Now let i be arbitrary, and assume that the claim is true for all values smaller than i; we show it is also true for i. Using Equation (1) and the induction hypothesis we get

 E[f(S_i)] ≥ [1 − γ_{i−1}/k] · E[f(S_{i−1})] + (γ_{i−1}/k) · ∏_{j=1}^{i−1} [1 − γ̄_j/k] · f(OPT)
 ≥ (∏_{j=1}^{i−1} min{1 − γ̄_j/k, 1 − γ_j/k}) · ((∑_{j=0}^{i−2} γ_j/k) + γ_{i−1}/k) · f(OPT)
 = (∏_{j=1}^{i−1} min{1 − γ̄_j/k, 1 − γ_j/k}) · (∑_{j=0}^{i−1} γ_j/k) · f(OPT). ∎

The above result leads to several interesting consequences. For instance, for non-monotone -weakly submodular functions we have for all . Hence we immediately get an approximation of for this class of functions. In a similar fashion, for -pseudo submodular functions we can take for all , leading to an approximation factor of for this class. This now proves Theorem 1.5.

Moreover, if the function has a parameter that satisfies the condition above (such as in Example 1.3), we immediately obtain the corresponding bound, leading to the approximation factor claimed in Theorem 1.5.

Theorem 2.2 becomes particularly useful to prove tighter guarantees for some of the examples discussed in Section 1.3, which have a parameter that changes throughout the algorithm. We discuss this and applications for monotone objectives in the next section.

## 3 Applications

We now present some applications from our results. We discuss the monotone case first.

For monotone functions it is clear that the RandomizedGreedy algorithm always selects elements from the original ground set (i.e., it never chooses dummy elements). In particular, the current solution at iteration i always has i elements from V, while OPT is a set containing at most k elements. We can use this, together with the results from Section 1.3, to compute a lower bound for a parameter γ_i that satisfies the required assumptions. For instance, one can take

 γ_i = min_{|A| = i, |B| ≤ k, A ∩ B = ∅} γ_{A,B}.

We then immediately get a provable approximation ratio of at least via Theorem 1.5 (or Theorem B).
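
Under this convention, a brute-force computation of γ_i might look as follows (our sketch; it enumerates all pairs with |A| = i and |B| ≤ k, so it is exponential and intended only for probing small instances):

```python
from itertools import combinations

def gamma_i(f, ground, i, k):
    # Worst local ratio over disjoint pairs with |A| = i and |B| <= k.
    # Only pairs with positive joint gain f_A(B) constrain the ratio;
    # the 1e-12 threshold guards against floating-point noise.
    best = float("inf")
    for A in map(set, combinations(sorted(ground), i)):
        rest = sorted(set(ground) - A)
        for b in range(1, k + 1):
            for B in map(set, combinations(rest, b)):
                gain = f(A | B) - f(A)
                if gain > 1e-12:
                    singles = sum(f(A | {e}) - f(A) for e in B)
                    best = min(best, singles / gain)
    return best

# Sanity check on the supermodular toy f(S) = |S|^2: the worst pair has
# |B| = k, giving ratio (2i+1)/(2i+k).
f = lambda S: len(S) ** 2
print(gamma_i(f, range(6), 2, 3))  # (2*2+1)/(2*2+3) = 5/7
```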

For monotone proportionally submodular functions, Example 1.3 gives a bound on γ_{A,B} in terms of a = |A| and b = |B|, and hence a corresponding bound on γ_i. By plugging this into Theorem 1.5 we get an expression that does not seem to have a closed form, but that numerically converges from above to a constant. This improves over the approximation factor given in [6] for the same problem (they express approximation factors as numbers greater than 1).

There is an efficient -approximation for the problem of maximizing a non-negative monotone proportionally submodular function subject to a cardinality constraint.

Our next application is for the product of monotone set functions. First, let us consider the case where is submodular and is either submodular, metric diversity, or proportionally submodular. Example 1.3 provides explicit bounds for the parameter of these product functions. We have where the latter term denotes the parameter of the function . Hence, we need to lower bound the term . We can do this as follows. One can show that for submodular functions, if there is a set satisfying then for any set and any set of size at most (see Claim C in the Appendix). We can then take as the initial set and run the RandomizedGreedy algorithm during iterations (to get a set of size ), with a guarantee that the parameter of the product function satisfies . This leads to approximation guarantees of , where denotes the parameter of the function .

For submodular functions, we can run the standard greedy algorithm on during iterations to find a set of size satisfying . Combining this with the fact that submodular functions have , the sum of submodular and metric diversity has , and proportionally submodular functions have for , one can obtain the following approximation guarantees.

Let f and g be non-negative and monotone. If f is submodular, then:

• there is a constant-factor approximation (in expectation) for f · g when g is submodular.

• there is a constant-factor approximation (in expectation) for f · g when g is the sum of a metric diversity function and a submodular function.

• there is a constant-factor approximation (in expectation) for f · g when g is proportionally submodular.

We are not aware of previous work for the product of set functions that we can compare our results to. However, when the functions are monotone, a natural baseline can be obtained by taking the set where is obtained by running the greedy algorithm for , and similarly is obtained by running the greedy algorithm for . Then if and , we get that . In the case of the above functions we get the following guarantees for after running the greedy algorithm for iterations: for a submodular function we get via the standard greedy algorithm analysis, for the sum of submodular and metric diversity we get via the analysis from [4], and for proportionally submodular we get via the analysis using Example 1.3 and Theorem 1.5 (which improves over the previous analysis given in [6]). This leads to the following baselines (though there is room for optimizing the sizes of and ): a 0.155-approximation for the product of two submodular functions, a -approximation for the product of a submodular function and the sum of submodular and metric diversity, and a -approximation for the product of a submodular function and a proportionally submodular function.
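
The baseline above can be sketched as follows (our illustration, with the standard greedy algorithm as the subroutine and both budgets fixed to k/2 as in the text):

```python
def greedy(f, ground, k):
    # Standard greedy: repeatedly add the element of largest marginal gain.
    S = set()
    for _ in range(k):
        e = max(set(ground) - S, key=lambda x: f(S | {x}) - f(S))
        S.add(e)
    return S

def product_baseline(f, g, ground, k):
    # Run greedy separately for f and for g with budget k // 2 each and
    # return the union. For monotone f and g,
    #   h(A ∪ B) = f(A ∪ B) * g(A ∪ B) >= f(A) * g(B),
    # so per-function guarantees multiply into a guarantee for h = f * g.
    A = greedy(f, ground, k // 2)
    B = greedy(g, ground, k // 2)
    return A | B
```

For modular toy objectives this just picks the top-weight element of each function; the point of the sketch is only the union-of-two-greedy-runs structure, not the tuning of the two budgets.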

We note that our choice of cardinality for the initial set of the algorithm, and for the sets and used in the baselines, may not be optimal. For the sake of consistency and to keep the argument as clean as possible, we used the same cardinality for all of them.

By using a similar argument to the one from Theorem 3 one can also get constant factor approximations in the case where is a metric diversity function. This follows since if satisfies , then for any set and any set of size at most (see Claim C in the Appendix). The fact that this bound is worse than for submodular functions is expected, since is supermodular and hence whenever and .

We now discuss the non-monotone case. While for monotone functions the algorithm always chooses elements from the original ground set (i.e., it never picks dummy elements), this may not be the case for non-monotone objectives. That is, for non-monotone objectives we have . Hence, we cannot just directly plug the bounds for from Section 1.3, since these depend on the number of elements from that the current solution has. Our next result gives a guarantee with respect to the number of elements from that the algorithm picks.
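To make the role of the dummy elements concrete, here is an illustrative sketch of a randomized greedy of the kind analyzed here (in the spirit of Buchbinder et al. [8]); the names and implementation details are ours, not the paper's exact pseudocode. Picking a dummy element amounts to skipping an iteration, which is what protects non-monotone objectives.

```python
import random

def randomized_greedy(f, ground, k, seed=None):
    """Sketch of RandomizedGreedy under a cardinality constraint k.
    The ground set is padded with k dummy elements whose marginal
    value is always zero; in each of the k iterations, one of the k
    remaining candidates with the largest marginal gains is chosen
    uniformly at random."""
    rng = random.Random(seed)
    dummies = {("dummy", i) for i in range(k)}
    candidates = set(ground) | dummies
    S = set()
    for _ in range(k):
        real = S - dummies  # current solution, dummy elements removed

        def gain(e):
            if e in dummies:
                return 0.0  # dummies never change the objective value
            return f(real | {e}) - f(real)

        # the k remaining candidates with the largest marginal gains
        top = sorted(candidates, key=gain, reverse=True)[:k]
        e = rng.choice(top)
        candidates.discard(e)
        S.add(e)
    return S - dummies  # report only elements of the original ground set
```

For a monotone objective every real element has non-negative marginal gain, so dummies are never preferred; for non-monotone objectives some of the k choices may be dummies, which is exactly the situation the theorem below accounts for.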

Let be a set function with parameters satisfying the assumptions of Theorem 2.2. In addition, assume that the values are non-decreasing, i.e., . Then, if the RandomizedGreedy algorithm picks elements from the original ground set (i.e., not dummy elements), its output satisfies

\[
\mathbb{E}[f(S_k)] \;\ge\; \frac{1}{ke}\Big[\gamma_0 + (k-m)\gamma_1 + \sum_{i=2}^{m}\gamma_{i-1}\Big]\cdot f(\mathrm{OPT}).
\]
###### Proof.

We can always assume that in the first iteration the algorithm picks an element from the original ground set. This is because and for all by non-negativity of . Hence there is always a choice of elements from the original ground set for the candidate set .

Moreover, since the values are non-decreasing, the worst case occurs when the algorithm picks the remaining non-dummy elements in the last iterations. In this case, we have a bound of (where is potentially zero) during the first iteration, a bound of during iterations , and a bound of during iterations . Note that these contributions account for exactly one γ term per iteration, since 1 + (k−m) + (m−1) = k. This leads to the worst-case approximation guarantee of

\[
\mathbb{E}[f(S_k)] \;\ge\; \frac{1}{ke}\Big[\gamma_0 + (k-m)\gamma_1 + \sum_{i=2}^{m}\gamma_{i-1}\Big]\cdot f(\mathrm{OPT}). \qquad\qed
\]

This corollary can be used to obtain bounds for some of the examples discussed in Section 1.3 that satisfy and have non-decreasing values, such as those from Example 1.3. We discuss this next, stating the approximation guarantees for the case in which the algorithm selects at least elements from .

Let be a (non-monotone) set function, and assume the RandomizedGreedy algorithm picks at least elements from . Then its output satisfies the following guarantees:

1. If where is monotone submodular and is non-monotone proportionally submodular, then .

2. If where is monotone submodular and is non-monotone submodular, then .

## References

• [1] Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. Diversifying search results. In Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM), pages 5–14, 2009.
• [2] Andrew An Bian, Joachim M Buhmann, Andreas Krause, and Sebastian Tschiatschek. Guarantees for greedy maximization of non-submodular functions with applications. In Proceedings of the 34th International Conference on Machine Learning (ICML), pages 498–507, 2017.
• [3] Benjamin Birnbaum and Kenneth J Goldman. An improved analysis for a greedy remote-clique algorithm using factor-revealing LPs. Algorithmica, 55(1):42–59, 2009.
• [4] Allan Borodin, Aadhar Jain, Hyun Chul Lee, and Yuli Ye. Max-sum diversification, monotone submodular functions, and dynamic updates. ACM Transactions on Algorithms (TALG), 13(3):1–25, 2017.
• [5] Allan Borodin, Dai Le, and Yuli Ye. Weakly submodular functions. CoRR, abs/1401.6697, 2014.
• [6] Allan Borodin, Dai Le, and Yuli Ye. Proportionally submodular functions. http://www.cs.toronto.edu/~bor/Papers/proportional-talg-submit.pdf, 2015.
• [7] Niv Buchbinder and Moran Feldman. Constrained submodular maximization via a nonsymmetric technique. Mathematics of Operations Research, 44(3):988–1005, 2019.
• [8] Niv Buchbinder, Moran Feldman, Joseph Seffi Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In Proceedings of the 25th annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1433–1452, 2014.
• [9] Lin Chen, Moran Feldman, and Amin Karbasi. Weakly submodular maximization beyond cardinality constraints: Does randomization help greedy? In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 803–812, 2018.
• [10] Abhimanyu Das and David Kempe. Submodular meets spectral: greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on Machine Learning (ICML), pages 1057–1064, 2011.
• [11] Abhimanyu Das and David Kempe. Approximate submodularity and its applications: subset selection, sparse approximation and dictionary selection. The Journal of Machine Learning Research, 19(1):74–107, 2018.
• [12] Marina Drosou and Evaggelia Pitoura. Search result diversification. ACM SIGMOD Record, 39(1):41–47, 2010.
• [13] Ethan Elenberg, Alexandros G Dimakis, Moran Feldman, and Amin Karbasi. Streaming weak submodularity: Interpreting neural networks on the fly. In Advances in Neural Information Processing Systems (NIPS), pages 4044–4054, 2017.
• [14] Ethan R Elenberg, Rajiv Khanna, Alexandros G Dimakis, Sahand Negahban, et al. Restricted strong convexity implies weak submodularity. The Annals of Statistics, 46(6B):3539–3568, 2018.
• [15] Alina Ene and Huy L Nguyen. Constrained submodular maximization: Beyond 1/e. In Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 248–257, 2016.
• [16] Moran Feldman, Joseph Naor, and Roy Schwartz. A unified continuous greedy algorithm for submodular maximization. In Proceedings of the IEEE 52nd Annual Symposium on Foundations of Computer Science (FOCS), pages 570–579, 2011.
• [17] Khashayar Gatmiry and Manuel Gomez-Rodriguez. Non-submodular function maximization subject to a matroid constraint, with applications. arXiv preprint arXiv:1811.07863, 2018.
• [18] Shayan Oveis Gharan and Jan Vondrák. Submodular maximization by simulated annealing. In Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1098–1116, 2011.
• [19] Jennifer Gillenwater, Alex Kulesza, and Ben Taskar. Near-optimal map inference for determinantal point processes. In Advances in Neural Information Processing Systems (NIPS), pages 2735–2743, 2012.
• [20] Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-monotone submodular maximization: offline and secretary algorithms. In Proceedings of the 6th International Conference on Internet and Network Economics (WINE), pages 246–257, 2010.
• [21] Chris Harshaw, Moran Feldman, Justin Ward, and Amin Karbasi. Submodular maximization beyond non-negativity: Guarantees, fast algorithms, and applications. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 2634–2643, 2019.
• [22] Refael Hassin, Shlomi Rubinstein, and Arie Tamir. Approximation algorithms for maximum dispersion. Operations Research Letters, 21(3):133–137, 1997.
• [23] Rajiv Khanna, Ethan Elenberg, Alex Dimakis, Sahand Negahban, and Joydeep Ghosh. Scalable greedy feature selection via weak submodularity. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1560–1568, 2017.
• [24] Jon Lee, Vahab S Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Maximizing nonmonotone submodular functions under matroid or knapsack constraints. SIAM Journal on Discrete Mathematics, 23(4):2053–2078, 2010.
• [25] Maxwell W Libbrecht, Jeffrey A Bilmes, and William Stafford Noble. Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization. In Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB), pages 566–566, 2018.
• [26] Hui Lin and Jeff Bilmes. Multi-document summarization via budgeted maximization of submodular functions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pages 912–920, 2010.
• [27] Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), pages 510–520, 2011.
• [28] Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi. Fast constrained submodular maximization: Personalized data summarization. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pages 1358–1367, 2016.
• [29] Sekharipuram S Ravi, Daniel J Rosenkrantz, and Giri Kumar Tayi. Heuristic and special case algorithms for dispersion problems. Operations Research, 42(2):299–310, 1994.
• [30] Sebastian Tschiatschek, Rishabh K Iyer, Haochen Wei, and Jeff A Bilmes. Learning mixtures of submodular functions for image collection summarization. In Advances in Neural Information Processing Systems (NIPS), pages 1413–1421, 2014.
• [31] Jan Vondrák. Symmetry and approximability of submodular maximization problems. SIAM Journal on Computing, 42(1):265–304, 2013.
• [32] Zoubin Ghahramani. Scaling the Indian buffet process via submodular maximization. In Proceedings of the 30th International Conference on Machine Learning (ICML), pages 1013–1021, 2013.

## Appendix A Proofs for Section 1.3

In this section we present the proofs for the results discussed in Section 1.3. Since the argument for proportionally submodular functions (i.e., Example 1.3) is much longer and more involved than the rest, we discuss it in a separate subsection.

### a.1 Proof of Example 1.3 from Section 1.3

Recall that a set function is proportionally submodular [5, 6] if, for every pair of sets S and T,

\[
|S|\, f(T) + |T|\, f(S) \;\ge\; |S\cap T|\, f(S\cup T) + |S\cup T|\, f(S\cap T)
\]
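Since the definition is a finite family of inequalities, it can be verified by brute force on small ground sets. The following Python sketch does exactly that; the function and variable names are ours, and the exhaustive check is of course only feasible for a handful of elements.

```python
from itertools import combinations

def subsets(ground):
    """All subsets of the ground set, as frozensets."""
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def is_proportionally_submodular(f, ground, tol=1e-9):
    """Check |S| f(T) + |T| f(S) >= |S∩T| f(S∪T) + |S∪T| f(S∩T)
    for every pair of subsets S, T of the ground set."""
    for S in subsets(ground):
        for T in subsets(ground):
            lhs = len(S) * f(T) + len(T) * f(S)
            rhs = len(S & T) * f(S | T) + len(S | T) * f(S & T)
            if lhs < rhs - tol:
                return False
    return True
```

For example, the cardinality function f(S) = |S| satisfies the inequality (since |S| + |T| = |S∩T| + |S∪T| and |S∩T| ≤ min(|S|, |T|), we have |S||T| ≥ |S∩T||S∪T|), whereas the cut function of the path 0–1–2 violates it at S = {0,1}, T = {1,2}: the left side is 2·1 + 2·1 = 4, while the right side is 1·0 + 3·2 = 6. This is consistent with non-monotone submodular functions not being proportionally submodular in general.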