Many well known objectives in combinatorial optimization exhibit two common properties: the marginal value of any given element is non-negative, and it decreases as more and more elements are selected. The notions of submodularity and monotonicity¹ nicely capture these properties, resulting in the appearance of constrained monotone submodular maximization in a wide and diverse array of modern applications in machine learning and optimization, including feature selection ([20, 35]), network monitoring (), news article recommendation (), sensor placement and information gathering ([31, 17, 21, 22]), viral marketing and influence maximization ([19, 18, 26]) and crowd teaching ().

¹ A set function $f: 2^N \to \mathbb{R}$ on the ground set $N$ is called submodular when $f(A + a) - f(A) \le f(B + a) - f(B)$ for all $B \subseteq A \subseteq N$ and $a \in N \setminus A$. The function is monotone if $f(B) \le f(A)$ for all $B \subseteq A$. W.l.o.g., assume $f(\emptyset) = 0$. Combined with monotonicity this implies non-negativity.
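The two defining properties in the footnote can be checked exhaustively on a tiny ground set. The Python sketch below does this for a hypothetical coverage function (each element covers a few "topics" and $f(A)$ counts the distinct topics covered, a standard example of a monotone submodular function). The instance `COVERS` and all helper names are ours for illustration, and the check enumerates all pairs of subsets, so it is only meant for intuition.

```python
from itertools import combinations

# Hypothetical toy instance: element -> set of "topics" it covers.
COVERS = {0: {"x", "y"}, 1: {"y", "z"}, 2: {"z"}, 3: {"w", "x"}}
GROUND = frozenset(COVERS)


def coverage(A):
    """f(A) = number of distinct topics covered by the elements of A (f(empty) = 0)."""
    return len(set().union(*(COVERS[e] for e in A)))


def subsets(ground):
    """All subsets of the ground set, as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_monotone(f, ground):
    """Check f(B) <= f(A) for every pair B subset of A."""
    all_sets = subsets(ground)
    return all(f(B) <= f(A) for A in all_sets for B in all_sets if B <= A)


def is_submodular(f, ground, tol=1e-12):
    """Check f(A + a) - f(A) <= f(B + a) - f(B) for all B subset of A and a not in A."""
    all_sets = subsets(ground)
    for A in all_sets:
        for B in all_sets:
            if B <= A:
                for a in ground - A:
                    if f(A | {a}) - f(A) > f(B | {a}) - f(B) + tol:
                        return False
    return True


print(is_monotone(coverage, GROUND), is_submodular(coverage, GROUND))  # True True
```

By contrast, $f(A) = |A|^2$ is monotone but fails the second check, since its marginal values grow with $A$.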
Here we are interested in scenarios where multiple objectives, all monotone submodular, need to be simultaneously maximized subject to a cardinality constraint. This problem has an established line of work in both machine learning  and the theory community . As an example application, in robust experimental design one often seeks to maximize a function $f(A, \theta)$, which is monotone submodular in $A$ for every value of $\theta$. The function is very sensitive to the choice of $\theta$, but the parameter is unknown a priori and estimated from data. Therefore, one possible approach to finding a robust solution is to maximize the function $\min_{\theta \in \Theta} f(A, \theta)$, where $\Theta$ is a set that captures the uncertainty in $\theta$. If $\Theta$ is assumed to be a finite set of $m$ discrete values $\{\theta_1, \dots, \theta_m\}$  we have an instance of multi-objective monotone submodular maximization. More generally, we consider the following problem,

$$P_1 := \max_{A \subseteq N,\ |A| \le k} \ \min_{i \in \{1, \dots, m\}} f_i(A),$$

where $f_i$ is monotone submodular for every $i$. The problem also has an alternative formulation due to , which we discuss later. Broadly speaking, there are two ways in which this framework has been applied:
When there are several natural criteria that need to be simultaneously optimized, such as in network monitoring, sensor placement and information gathering [31, 25, 22, 23]. For example, in the problem of intrusion detection , one usually wants to maximize the likelihood of detection while also minimizing the time until intrusion is detected and the population affected by intrusion. The first objective is often monotone submodular and the latter objectives are monotonically decreasing supermodular functions [25, 22]; minimizing such a function $g$ is equivalent to maximizing $g(\emptyset) - g(A)$, which is monotone submodular. Therefore, the problem is often formulated as an instance of cardinality constrained maximization with a small number of submodular objectives.
When looking for solutions robust to the uncertainty in objective: such as in feature selection [23, 15], variable selection and experimental design , robust influence maximization . In these cases, there is often inherently just a single submodular objective which is highly prone to uncertainty either due to dependence on a parameter that is estimated from data, or due to multiple possible scenarios that each give rise to a different objective. Therefore, one often seeks to optimize over the worst case realization of the uncertain objective, resulting in an instance of multi-objective submodular maximization.
In some applications the number of objectives is given by the problem structure and can even exceed the cardinality parameter $k$. However, in applications such as robust influence maximization, variable selection and experimental design, the number of objectives is a design choice that trades off optimality with robustness.
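For intuition about $P_1$ in the robust-design reading above, the following Python sketch solves a tiny instance by brute force. The two objectives are hypothetical "scenarios" $\theta_1, \theta_2$ that weight the same covered topics differently; names such as `SCENARIO_WEIGHTS` and `brute_force_max_min` are ours, and enumeration is exponential in $k$, so this is only a reference point, not a practical method.

```python
from itertools import combinations

# Hypothetical instance: two scenarios weight the same topics differently, giving
# objectives f_i(A) = total weight (under scenario i) of the topics covered by A.
COVERS = {0: {"a"}, 1: {"b"}, 2: {"a", "b"}, 3: {"c"}}
SCENARIO_WEIGHTS = [{"a": 1.0, "b": 0.0, "c": 0.5}, {"a": 0.0, "b": 1.0, "c": 0.5}]


def make_objective(weights):
    """Weighted coverage objective for one scenario (monotone submodular)."""
    def f(A):
        covered = set().union(*(COVERS[e] for e in A))
        return sum(weights[t] for t in covered)
    return f


def brute_force_max_min(fs, ground, k):
    """Solve max_{|A| <= k} min_i f_i(A) by enumeration (tiny instances only).
    Since every f_i is monotone, it suffices to enumerate sets of size min(k, |N|)."""
    best_val, best_set = float("-inf"), None
    for combo in combinations(sorted(ground), min(k, len(ground))):
        A = frozenset(combo)
        val = min(f(A) for f in fs)
        if val > best_val:
            best_val, best_set = val, A
    return best_val, best_set


fs = [make_objective(w) for w in SCENARIO_WEIGHTS]
print(brute_force_max_min(fs, COVERS.keys(), 2))  # (1.5, frozenset({2, 3}))
```

With $k = 1$ the max-min solution is element 2, which hedges across both scenarios, whereas elements 0 and 1, which are each optimal for one scenario, have worst-case value 0.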
1.1 Related Work
The problem of maximizing a monotone submodular function subject to a cardinality constraint,

$$P_0 := \max_{A \subseteq N,\ |A| \le k} f(A),$$

goes back to the work of [29, 28], where they showed that the greedy algorithm gives a guarantee of $1 - 1/e$ and that this is best possible in the value-oracle model. Later,  showed that this is also the best possible approximation unless P=NP. While this settled the hardness and approximability of the problem, finding faster approximations remained an open line of inquiry. Notably,  found a faster algorithm for $P_0$ that improved the quadratic query complexity of the classical greedy algorithm to nearly linear complexity by trading off on the approximation guarantee. This was later improved by .
For the more general problem $\max_{A \in \mathcal{I}} f(A)$, where $\mathcal{I}$ is the collection of independent sets of a matroid, in a breakthrough [5, 36] achieved a $(1 - 1/e)$ approximation by (approximately) maximizing the multilinear extension of submodular functions, followed by suitable rounding. Based on this framework, tremendous progress was made over the last decade for a variety of different settings ([5, 36, 12, 37, 38, 8]).
In the multi-objective setting,  amalgamated various applications and formally introduced the following problem,

$$\max_{A \subseteq N,\ |A| \le k} \ \min_{i \in \{1, \dots, m\}} f_i(A),$$

where $f_i$ is monotone submodular for every $i$. They call this the Robust Submodular Observation Selection (RSOS) problem and show that in general the problem is inapproximable (no non-trivial approximation is possible) unless P = NP. Consequently, they proceeded to give a bi-criterion approximation algorithm, called SATURATE, which achieves the optimal answer by violating the cardinality constraint. Note that their inapproximability result only holds when $m = \Omega(k)$. Another bi-criterion approximation was given more recently in .
On the other hand,  showed a randomized $(1 - 1/e) - \epsilon$ approximation for constant $m$ in the more general case of a matroid constraint, as an application of a new technique for rounding over a matroid polytope, called swap rounding. The runtime scales as $n^{m/\epsilon^3}$.² Note that  consider a different but equivalent formulation of the problem that stems from the influential paper on multi-objective optimization . The alternative formulation, which we review in Section 2, is the reason we call this a multi-objective maximization problem (same as ). For the special case of a cardinality constraint (which will be our focus here),  recently showed that the greedy algorithm can be generalized to achieve a deterministic  approximation for the special case of bi-objective maximization. Their runtime scales as  and . To the best of our knowledge, when  no constant factor approximation algorithms or inapproximability results were known prior to this work.

² The  term could potentially be improved to  by leveraging subsequent work [3, 13].
1.2 Our Contributions
Our focus here is on the regime $m = o(k)$. This setting is essential to understanding the approximability of the problem for super-constant $m$ and includes several of the applications we referred to earlier. For instance, in network monitoring and sensor placement, the number of objectives is usually a small constant [23, 25]. For robust influence maximization, the number of objectives depends on the underlying uncertainty but is often small . And in settings like variable selection and experimental design , the number of objectives considered is a design choice. We show three algorithmic results with asymptotic approximation guarantees for $m = o(k)$.
1. Asymptotically optimal approximation algorithm: We give an approximation algorithm whose guarantee, for $m = o(k/\log^3 k)$ and a suitable choice of its parameter $\epsilon$, tends to $1 - 1/e$ as $k \to \infty$. The algorithm is randomized and outputs such an approximation w.h.p. Observe that this implies a steep transition around $m = \Omega(k)$, due to the inapproximability result (to within any non-trivial factor) for $m = \Omega(k)$.
We obtain this via extending the matroid based algorithm of , which relies on the continuous greedy approach, resulting in a runtime of . Note that there is no dependence in the runtime, unlike the result from . The key idea behind the result is quite simple, and relies on exploiting the fact that we are dealing with a cardinality constraint, far more structured than matroids.
2. Fast and practical approximation algorithm: In practice, $n$ can range from tens of thousands to millions ([31, 25]), which makes the above runtime intractable. To this end, we develop a fast $\tilde{O}(n/\delta^3)$ time approximation algorithm. Under the same asymptotic conditions as above, the guarantee simplifies to $(1 - 1/e)^2 - \delta$. We achieve this via the Multiplicative-Weight-Updates (MWU) framework, which replaces the bottleneck continuous greedy process. This is what costs us the additional factor of $(1 - 1/e)$ in the guarantee, but it allows us to leverage the runtime improvements for $P_0$ achieved in [3, 27].
MWU has proven to be a vital tool in the past few decades ([16, 4, 14, 39, 40, 33, 1]). Linear functions and constraints have been the primary setting of interest in these works, but recent applications have shown its usefulness when considering non-linear and in particular submodular objectives ([2, 7]). Unlike these recent applications, we instead apply the MWU framework in the vein of the Plotkin-Shmoys-Tardos scheme for linear programming (), essentially showing that the non-linearity only costs us another factor of $(1 - 1/e)$ in the guarantee and yields a nearly linear time algorithm. Independently and prior to our work,  applied the MWU framework in a similar manner and gave a new bi-criterion approximation. We further discuss how our result differs from theirs in Section 3.2.
3. Finding a deterministic approximation for small $m$: While the above results are all randomized, we also show a simple greedy based deterministic $(1 - 1/e) - \epsilon$ approximation with runtime $kn^{m/\epsilon^4}$. This follows by establishing an upper bound on the increase in optimal solution value as a function of the cardinality $k$, which also resolves a weaker version of a conjecture posed in .
Outline: We start with definitions and preliminaries in Section 2, where we also review relevant parts of the algorithm in  that are essential for understanding the results here. In Section 3, we state and prove the main results. Since the guarantees we present are asymptotic and technically converge to the constant factors indicated only as $k$ becomes large, in Section 4 we test the performance of a heuristic, closely inspired by our MWU based algorithm, on Kronecker graphs  of various sizes and find improved performance over previous heuristics even for small $k$ and large $m$.
2.1 Definitions & review
We work with a ground set $N$ of $n$ elements and recall that we use $P_0$ to denote the single objective (classical) problem. [29, 28] showed that the natural greedy algorithm for $P_0$ achieves a guarantee of $1 - 1/e$ and that this is best possible. The algorithm can be summarized as follows:
Starting with $A = \emptyset$, at each step add to the current set an element which adds the maximum marginal value, until $k$ elements are chosen.
Formally, given a set $A$, the marginal increase in value of a function $f$ due to the inclusion of a set $X$ is $f(X \mid A) = f(A \cup X) - f(A)$.
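A minimal Python sketch of the greedy algorithm just described, written in terms of the marginal $f(X \mid A)$. The coverage instance, the tie-breaking rule (smallest element first) and the brute-force `optimum` helper used to check the $1 - 1/e$ bound are our own illustrative choices.

```python
import math
from itertools import combinations

# Hypothetical coverage instance used only for illustration.
COVERS = {0: {1, 2, 3}, 1: {1, 2}, 2: {4}, 3: {3, 4, 5}}


def f(A):
    return len(set().union(*(COVERS[e] for e in A)))


def marginal(f, X, A):
    """f(X | A) = f(A ∪ X) - f(A)."""
    return f(A | X) - f(A)


def greedy(f, ground, k):
    """Classical greedy for P0: repeatedly add the element with the largest marginal value.
    max() keeps the first maximizer of the sorted scan, so ties go to the smallest element."""
    A = frozenset()
    for _ in range(min(k, len(ground))):
        candidates = [e for e in sorted(ground) if e not in A]
        best = max(candidates, key=lambda c: marginal(f, frozenset({c}), A))
        A = A | {best}
    return A


def optimum(f, ground, k):
    """Exact optimum of P0 by enumeration (tiny instances only)."""
    return max(f(frozenset(c)) for c in combinations(sorted(ground), min(k, len(ground))))


A = greedy(f, COVERS.keys(), 2)
print(sorted(A), f(A), optimum(f, COVERS.keys(), 2))  # [0, 3] 5 5
```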
Let for . Note that . Further, for ,
This function appears naturally in our analysis and will be useful for expressing approximation guarantees.
We use the notation $\mathbf{1}_A$ for the support vector of a set $A$ (1 along dimension $i$ if $i \in A$ and 0 otherwise). We also use the shorthand $|x|$ to denote the $\ell_1$ norm of a vector $x$. Given $f: 2^N \to \mathbb{R}$, recall that its multilinear extension over $x \in [0,1]^n$ is defined as

$$F(x) = \sum_{S \subseteq N} f(S) \prod_{i \in S} x_i \prod_{j \notin S} (1 - x_j).$$

The function $F$ can also be interpreted as the expectation of the function value over sets obtained by including each element $i \in N$ independently with probability $x_i$. $F$ acts as a natural replacement for the original function $f$ in the continuous greedy algorithm (). Like the greedy algorithm, the continuous version always moves in a feasible direction that best increases the value of the function $F$. While evaluating the exact value of this function and its gradient is hard in general, for the purpose of using this function in optimization algorithms, approximations obtained using a sampling based oracle suffice ([3, 8, 5]). Given two vectors $x, y \in [0,1]^n$, let $x \vee y$ denote the component-wise maximum. Then we define marginals for $F$ as

$$F(x \mid y) = F(x \vee y) - F(y).$$
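The definitions of $F$, its sampling interpretation, and the marginal $F(x \mid y)$ translate directly into code. The Python sketch below computes $F$ exactly by enumerating all $2^n$ subsets (feasible only for tiny $n$) and also gives the sampling estimate that replaces exact evaluation in practice; points are represented as dictionaries from elements to probabilities, and the coverage instance is hypothetical.

```python
import itertools
import random

# Hypothetical instance: elements 0 and 1 both cover topic "a"; element 2 covers "b".
COVERS = {0: {"a"}, 1: {"a"}, 2: {"b"}}


def f(S):
    return len(set().union(*(COVERS[e] for e in S)))


def multilinear_exact(f, x):
    """F(x) = sum_S f(S) prod_{i in S} x_i prod_{j not in S} (1 - x_j); x maps element -> prob.
    Enumerates all 2^n subsets, so only usable for tiny n."""
    elems = list(x)
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(elems)):
        prob = 1.0
        for e, b in zip(elems, bits):
            prob *= x[e] if b else 1.0 - x[e]
        total += prob * f(frozenset(e for e, b in zip(elems, bits) if b))
    return total


def multilinear_mc(f, x, samples=20000, seed=0):
    """Sampling estimate of F(x): include each element e independently with prob x[e]."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        acc += f(frozenset(e for e, p in x.items() if rng.random() < p))
    return acc / samples


def join(x, y):
    """Component-wise maximum x ∨ y (missing coordinates are treated as 0)."""
    return {e: max(x.get(e, 0.0), y.get(e, 0.0)) for e in set(x) | set(y)}


def multilinear_marginal(f, x, y):
    """F(x | y) = F(x ∨ y) - F(y), computed exactly."""
    return multilinear_exact(f, join(x, y)) - multilinear_exact(f, y)


x = {0: 0.5, 1: 0.5, 2: 0.5}
print(multilinear_exact(f, x))  # 1.25
```

With all three coordinates equal to 0.5, topic "a" is covered with probability 0.75 and topic "b" with probability 0.5, so $F(x) = 1.25$.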
Now, we briefly discuss another formulation of the multi-objective maximization problem, call it $P_2$, introduced in . In $P_2$ we are given a target value $t_i$ (a positive real) with each function $f_i$, and the goal is to find a set $A^*$ of size at most $k$ such that $f_i(A^*) \ge t_i$ for all $i$, or certify that no such $A^*$ exists. More realistically, one aims to efficiently find a set $A$ of size $k$ such that $f_i(A) \ge \alpha t_i$ for all $i$ and some factor $\alpha$, or certify that there is no set $A^*$ of size $k$ such that $f_i(A^*) \ge t_i$ for all $i$. Observe that w.l.o.g. we can assume $t_i = 1$ for all $i$ (since we can consider the functions $f_i/t_i$ instead), and therefore $P_2$ is equivalent to the decision version of $P_1$: given $t$, find a set $A^*$ of size at most $k$ such that $\min_i f_i(A^*) \ge t$, or give a certificate of infeasibility.
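The normalization $t_i = 1$, together with truncation at 1 (used for the w.l.o.g. bound $f_i \le 1$), can be sketched in a few lines of Python. The helper names and the two toy objectives are hypothetical; the point is only that $f_i(A) \ge t_i$ for all $i$ holds exactly when the normalized, truncated objectives all reach 1, while every normalized value stays in $[0, 1]$.

```python
from itertools import combinations


def normalize(fs, targets):
    """Replace each f_i by g_i(A) = min(f_i(A) / t_i, 1): targets become 1 and values lie in [0, 1].
    Scaling by 1/t_i > 0 and truncating at a constant both preserve monotonicity and submodularity."""
    def make(f, t):
        return lambda A: min(f(A) / t, 1.0)
    return [make(f, t) for f, t in zip(fs, targets)]


def meets_all_targets(gs, A):
    """Decision version with normalized targets: is min_i g_i(A) >= 1?"""
    return min(g(A) for g in gs) >= 1.0


# Hypothetical toy objectives on the ground set {0, 1, 2} with targets t = (3, 1).
GROUND = [0, 1, 2]
fs = [lambda A: 2.0 * len(A), lambda A: float(len(A & {1, 2}))]
targets = [3.0, 1.0]
gs = normalize(fs, targets)
ALL_SETS = [frozenset(c) for r in range(len(GROUND) + 1) for c in combinations(GROUND, r)]

print([sorted(A) for A in ALL_SETS if meets_all_targets(gs, A)])
```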
When considering formulation $P_2$, since we can always consider the modified submodular objectives $\min\{f_i(\cdot), 1\}$, we w.l.o.g. assume that $f_i(A) \le 1$ for every set $A$ and every function $f_i$. Finally, for both formulations we use $A^*$ to denote an optimal/feasible set (optimal for $P_1$, and feasible for $P_2$) and $\mathrm{OPT}$ to denote the optimal solution value for formulation $P_1$. We now give an overview of the algorithm from  which is based on . To simplify the description we focus on the cardinality constraint, even though the algorithm is designed more generally for matroid constraints. We refer to it as Algorithm 1; it has three stages. Recall that the runtime of this algorithm scales as $n^{m/\epsilon^3}$.
Stage 1: Intuitively, this is a pre-processing stage with the purpose of picking a small initial set consisting of elements with "large" marginal values, i.e., marginal value at least $\epsilon^3$ for some function $f_i$. This is necessary for technical reasons due to the rounding procedure in Stage 3.
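Given a set $A$ of size $k$, fix a function $f_i$ and index the elements of $A$ as $a_1, \dots, a_k$ in the order in which the greedy algorithm (applied to $f_i$) would pick them. There are at most $1/\epsilon^3$ elements such that $f_i(a_j \mid \{a_1, \dots, a_{j-1}\}) \ge \epsilon^3$, since otherwise by monotonicity $f_i(A) > 1$ (violating our w.l.o.g. assumption that $f_i(A) \le 1$). In fact, due to decreasing marginal values we have $f_i(a_j \mid \{a_1, \dots, a_{j-1}\}) < \epsilon^3$ for every $j > 1/\epsilon^3$.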
Therefore, we focus on sets of size $m/\epsilon^3$ (at most $1/\epsilon^3$ elements for each function) to find an initial set $S_1$ such that the remaining elements have marginal value less than $\epsilon^3$ for $f_i$, for every $i$. In particular, one can try all possible initial sets of this size (i.e., run the subsequent stages with different starting sets), leading to the $n^{m/\epsilon^3}$ term in the runtime. Stages 2 and 3 have runtime polynomial in $n$ (in fact Stage  has runtime independent of ). Hence, Stage 1 is really the bottleneck. For the more general case of a matroid constraint, it is not at all obvious whether one can do better than brute force enumeration over all possible starting sets and still retain the approximation guarantee. However, we will show that for cardinality constraints one can easily avoid enumeration.
Stage 2: Given a starting set $S_1$ from Stage 1, this stage works with the ground set $N \setminus S_1$ and runs the continuous greedy algorithm. Suppose a feasible set exists for the problem; then for the right starting set $S_1$, this stage outputs a fractional point $x$ with $|x| = k - |S_1|$ such that $F_i(x \mid \mathbf{1}_{S_1}) \ge (1 - 1/e)(1 - f_i(S_1))$ for every $i$. However, this is computationally expensive and takes time . We formally summarize this stage in the following lemma and refer the interested reader to  for further details (which will not be necessary for subsequent discussion).
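( Lemma 7.3) Given monotone submodular functions $f_1, \dots, f_m$, values $t_1, \dots, t_m$, and cardinality $k$, the continuous greedy algorithm finds a point $x \in [0,1]^n$ with $|x| \le k$ such that $F_i(x) \ge (1 - 1/e)\, t_i$ for all $i$, or outputs a certificate of infeasibility.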
Stage 3: For the right starting set (if one exists), Stage 2 successfully outputs a point . Stage 3 now follows a random process that converts into a set of size such that, and as long as . The rounding procedure is called swap rounding and we include a specialized version of the formal lemma below.
( Theorem 1.4, Theorem 7.2) Given monotone submodular functions with the maximum value of singletons in for every ; a fractional point with and . Swap Rounding yields a set with cardinality , such that,
Remark: For any , the above can be converted to a result w.h.p. by standard repetition. Also this is a simplified version of the matroid based result in .
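For the cardinality constraint, the fractional point produced in Stage 2 can be written as a convex combination of $k$-element sets (bases of the uniform matroid), and swap rounding merges these bases pairwise. The Python sketch below follows the merge rule as we understand it from the swap rounding literature, specialized to this case where every exchange is valid. It illustrates only that each element ends up in the output with probability equal to its fractional value, not the concentration bound stated above; names and the instance are hypothetical.

```python
import random


def swap_round(bases, weights, rng):
    """Swap rounding for a cardinality constraint (a uniform matroid).

    `bases` are sets of equal size k and `weights` are positive convex-combination
    coefficients (summing to 1), so x = sum_j weights[j] * 1_{bases[j]}. The bases are
    merged one at a time; in a uniform matroid every exchange i <-> j is valid."""
    merged, beta = set(bases[0]), weights[0]
    for base, w in zip(bases[1:], weights[1:]):
        other = set(base)
        while merged != other:
            i = min(merged - other)
            j = min(other - merged)
            if rng.random() < beta / (beta + w):
                other.remove(j)   # keep i: move `other` toward `merged`
                other.add(i)
            else:
                merged.remove(i)  # keep j: move `merged` toward `other`
                merged.add(j)
        beta += w
    return frozenset(merged)


bases = [{0, 1}, {1, 2}, {2, 3}]
weights = [0.5, 0.25, 0.25]  # fractional point x = (0.5, 0.75, 0.5, 0.25)
rng = random.Random(1)
trials = 20000
counts = {e: 0 for e in range(4)}
for _ in range(trials):
    for e in swap_round(bases, weights, rng):
        counts[e] += 1
print({e: round(c / trials, 2) for e, c in counts.items()})
```

Each merge step keeps the weighted indicator $\beta\,\mathbf{1}[e \in \text{merged}] + w\,\mathbf{1}[e \in \text{other}]$ unchanged in expectation, which is why the empirical frequencies track $x$.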
2.2 Some simple heuristics
Before we present the main results, let us take a step back and examine some variants of the standard greedy algorithm. To design a greedy heuristic for multiple functions, what should the objective for greedy selection be?
One possibility is to split the selection of elements into $m$ equal parts. In part $i$, pick $k/m$ elements greedily w.r.t. function $f_i$. It is not difficult to see that this is a (tight)  approximation. Second, recall that a convex combination of monotone submodular functions is also monotone and submodular. Therefore, one could run the greedy algorithm on a fixed convex combination of the functions. It can be shown that this does not lead to an approximation better than . This is indeed the idea behind the bi-criterion approximation in . Third, one could select elements greedily w.r.t. the objective function $\min_i f_i(\cdot)$. A naïve implementation of this algorithm can have arbitrarily bad performance even for $m = 2$ (previously observed in ). We show later in Section 3.3 that if one greedily picks sets of size  instead of singletons at each step, for large enough  one can get arbitrarily close to $1 - 1/e$.
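The failure of the third heuristic is easy to reproduce. The Python sketch below is our own two-objective ($m = 2$), $k = 2$ construction, not the one from the cited work: a decoy element gives a tiny value to both functions, so the naive greedy on $\min_i f_i$ grabs it first and then cannot raise the minimum. The optimum $\{a, b\}$ has value 1, while the greedy output has value `EPS`, which can be made arbitrarily small.

```python
EPS = 0.01  # hypothetical value that the decoy element "d" gives to both objectives


def f1(A):
    """Modular objective: element 'a' is worth 1, the decoy 'd' is worth EPS."""
    return (1.0 if "a" in A else 0.0) + (EPS if "d" in A else 0.0)


def f2(A):
    """Modular objective: element 'b' is worth 1, the decoy 'd' is worth EPS."""
    return (1.0 if "b" in A else 0.0) + (EPS if "d" in A else 0.0)


def min_greedy(fs, ground, k):
    """Naive heuristic: k times, add the element maximizing min_i f_i(A + e) (ties -> smallest)."""
    A = frozenset()
    for _ in range(k):
        best = max(sorted(ground - A), key=lambda c: min(f(A | {c}) for f in fs))
        A = A | {best}
    return A


GROUND = frozenset({"a", "b", "d"})
A = min_greedy([f1, f2], GROUND, 2)
print(sorted(A), min(f1(A), f2(A)))           # ['a', 'd'] 0.01
print(min(f1({"a", "b"}), f2({"a", "b"})))    # 1.0
```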
3 Main Results
3.1 Asymptotic $(1 - 1/e)$ approximation for $m = o(k/\log^3 k)$
We replace the enumeration in Stage 1 with a single starting set, obtained by scanning once over the ground set. The main idea is simply that for the cardinality constraint case, any starting set that fulfills the Stage 3 requirement of small marginals will be acceptable (not true for general matroids).
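New Stage 1: Start with $S_1 = \emptyset$ and pass over all elements once in arbitrary order. For each element $e$, add it to $S_1$ if $f_i(e \mid S_1) \ge \epsilon^3$ for some $i$. Note that we add at most $m/\epsilon^3$ elements (at most $1/\epsilon^3$ for each function, since $f_i \le 1$). When the subroutine terminates, for every remaining element $e \in N \setminus S_1$ we have $f_i(e \mid S_1) < \epsilon^3$ for all $i$ (as required by Lemma 2): the marginal of $e$ was below the threshold when $e$ was scanned, and by submodularity it can only decrease as $S_1$ grows. Let $k_1 = k - |S_1|$ and note $k_1 \ge k - m/\epsilon^3$.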
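The single pass of New Stage 1 can be sketched as follows. We take the marginal threshold to be $\epsilon^3$ (our reading of the stripped threshold, consistent with the bound of $1/\epsilon^3$ elements per function) and test it on hypothetical capped modular objectives $\min\{1, \sum_{e \in A} w_e\}$, which are monotone submodular with values in $[0, 1]$. The two properties the argument above guarantees are $|S_1| \le m/\epsilon^3$ and that every remaining element has marginal value below the threshold for every function.

```python
import random


def stage1_scan(fs, ground, eps):
    """New Stage 1 (sketch): one pass over the ground set; add e to S1 whenever
    f_i(e | S1) >= eps**3 for some i. Assumes every f_i is monotone submodular in [0, 1]."""
    threshold = eps ** 3
    S1 = frozenset()
    for e in ground:
        if any(f(S1 | {e}) - f(S1) >= threshold for f in fs):
            S1 = S1 | {e}
    return S1


def capped_modular(weights):
    """min(1, sum of weights): a monotone submodular function with values in [0, 1]."""
    return lambda A: min(1.0, sum(weights[e] for e in A))


rng = random.Random(1)
n, m, eps = 40, 3, 0.5
fs = [capped_modular([rng.uniform(0.0, 0.3) for _ in range(n)]) for _ in range(m)]
S1 = stage1_scan(fs, range(n), eps)
print(len(S1), "<=", m / eps ** 3)
```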
Stage 2 remains the same as Algorithm 1 and outputs a fractional point with . While enumeration over all starting sets allowed us to find a starting set such that for every ; with the new Stage 1 we will need to further exploit properties of the multilinear extension to show a similar lower bound on the marginal value of .
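Given a point $x \in [0,1]^n$ with $|x| = k$ and the multilinear extension $F$ of a monotone submodular function, for every $k' \le k$,

$$F\Big(\frac{k'}{k}\, x\Big) \ge \frac{k'}{k}\, F(x).$$

Proof Note that the statement is true for concave $F$, since $F(0) = f(\emptyset) = 0$. The proof now follows directly from the concavity of multilinear extensions in positive directions (Section 2.1 of ). ∎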
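A quick numerical sanity check of this scaling property (not a proof): on a hypothetical coverage instance, shrinking a point $x$ by a factor $k'/k \in [0, 1]$ never loses more than a proportional amount of multilinear value, i.e. $F(\tfrac{k'}{k}x) \ge \tfrac{k'}{k}F(x)$ under our reading of the statement. $F$ is computed exactly by enumeration, so $n$ is kept tiny.

```python
import itertools
import math
import random


def multilinear_exact(f, x):
    """Exact multilinear extension by enumerating all 2^n subsets (tiny n only)."""
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(x)):
        prob = math.prod(xi if b else 1.0 - xi for xi, b in zip(x, bits))
        total += prob * f(frozenset(i for i, b in enumerate(bits) if b))
    return total


rng = random.Random(7)
covers = [set(rng.sample(range(8), 3)) for _ in range(6)]  # hypothetical coverage instance


def f(A):
    return len(set().union(*(covers[i] for i in A)))


x = [rng.random() for _ in range(6)]
for frac in (0.25, 0.5, 0.75):  # frac plays the role of k'/k
    scaled = [frac * xi for xi in x]
    print(round(multilinear_exact(f, scaled), 3), ">=", round(frac * multilinear_exact(f, x), 3))
```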
for every .
Stage 3 rounds to of size , and final output is . The following theorem now completes the analysis.
For  we have,  with constant probability. For $m = o(k/\log^3 k)$ the factor is asymptotically $1 - 1/e$.
Proof From Lemma 4 and applying Lemma 2 we have, . Therefore, . To refine the guarantee, we choose , where the is due to Lemma 2 and the term is to balance and . The resulting guarantee becomes , where the function as , so long as .
Note that the runtime is now independent of . The first stage makes $O(mn)$ oracle queries; the second stage runs the continuous greedy algorithm on all functions simultaneously and makes  queries to each function oracle, contributing  to the runtime. Stage 2 results in a fractional solution that can be written as a convex combination of  sets of cardinality  each (bases) (ref. Appendix A in ). For a cardinality constraint, swap rounding can merge two bases in time ; hence, the last stage takes time . ∎
3.2 Fast, asymptotic $(1 - 1/e)^2 - \delta$ approximation for $m = o(k/\log^3 k)$
While the previous algorithm achieves the best possible asymptotic guarantee, it is too slow to use in practice. The main underlying issue was our use of the continuous greedy algorithm in Stage 2, which has runtime , but the flexibility offered by continuous greedy was key to maximizing the multilinear extensions of all functions at once. To improve the runtime, we avoid continuous greedy and find an alternative in Multiplicative-Weight-Updates (MWU) instead. MWU allows us to combine multiple submodular objectives into a single submodular objective and utilize fast algorithms for $P_0$ at every step.
The algorithm consists of 3 stages as before. Stage 1 remains the same as the New Stage 1 introduced in the previous section. Let $S_1$ be the output of this stage as before. Stage 2 is replaced with a fast MWU based subroutine that runs for $T$ rounds, where $T$ is determined by an accuracy parameter $\delta > 0$, and solves an instance of $P_0$ during each round. Here $\delta$ is an artifact of MWU and manifests as a subtractive term in the approximation guarantee. The currently fastest algorithm for $P_0$, in , has runtime $O(n \log(1/\delta))$ and an expected guarantee of $1 - 1/e - \delta$. However, the slightly slower, but still nearly linear time $O(\frac{n}{\delta}\log\frac{n}{\delta})$ thresholding algorithm in  has (the usual) deterministic guarantee of $1 - 1/e - \delta$. Both of these are known to perform well in practice, and using either would lead to a runtime of $\tilde{O}(n/\delta^3)$, which is a vast improvement over the previous algorithm.
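As an illustration of the kind of nearly linear time subroutine for $P_0$ referred to here, the Python sketch below implements a decreasing-threshold greedy in the style of the thresholding algorithm of Badanidiyuru and Vondrák; it is our simplified rendering, not code from that paper. Instead of scanning for the best element at every step, it sweeps geometrically decreasing thresholds and adds any element whose marginal value clears the current one. The random coverage instances are hypothetical, and the brute-force optimum is used only to check the $1 - 1/e - \delta$ guarantee on tiny inputs.

```python
import math
import random
from itertools import combinations


def threshold_greedy(f, ground, k, delta):
    """Decreasing-threshold greedy for P0 (sketch): sweep thresholds w = d, d(1 - delta), ...
    down to (delta / n) d, adding any element whose marginal value is at least w."""
    ground = list(ground)
    n = len(ground)
    d = max(f(frozenset({e})) for e in ground)  # largest singleton value
    A = frozenset()
    if d <= 0:
        return A
    w = d
    while w >= (delta / n) * d and len(A) < k:
        for e in ground:
            if len(A) == k:
                break
            if e not in A and f(A | {e}) - f(A) >= w:
                A = A | {e}
        w *= 1.0 - delta
    return A


rng = random.Random(3)
covers = [set(rng.sample(range(15), rng.randint(1, 5))) for _ in range(10)]


def f(A):
    return len(set().union(*(covers[e] for e in A)))


k, delta = 3, 0.1
A = threshold_greedy(f, range(10), k, delta)
opt = max(f(frozenset(c)) for c in combinations(range(10), k))
print(f(A), opt, f(A) >= (1 - 1 / math.e - delta) * opt)
```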
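Now, fix some algorithm $\mathcal{A}$ for $P_0$ with guarantee $\beta$, and let $\mathcal{A}(g, k')$ denote the set it outputs given a monotone submodular function $g$ and cardinality constraint $k'$ as input. Note that $\beta$ can be as large as $1 - 1/e$, and we have $k_1 = k - |S_1|$ as before, where $S_1$ is the output of Stage 1. Then the new Stage 2 is,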
The point obtained above is rounded to a set in Stage 3 (which remains unchanged). The final output is . Note that by abuse of notation we used the sets to also denote the respective support vectors. We continue to use and interchangeably in the below.
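The Stage 2 listing itself did not survive extraction, so the following Python sketch is our own rendering of the MWU schema described in this section, not a reproduction of Algorithm 2. Assumptions (ours): the objectives are already normalized to values in $[0, 1]$ with target 1, weights start uniform, each round calls a pluggable $P_0$ routine (plain greedy here) on $g = \sum_i \lambda_i f_i$, and weights are updated as $\lambda_i \leftarrow \lambda_i (1 - \tfrac{\delta}{2} f_i(X^t))$ and renormalized. The number of rounds $T = \lceil 4 \ln m / \delta^2 \rceil$ is the choice for which the standard MWU regret bound gives $\frac{1}{T}\sum_t f_i(X^t) \ge \beta\,\mathrm{OPT} - \delta$ for every $i$; the paper's exact constants may differ.

```python
import math


def greedy(g, ground, k):
    """Plain greedy used as the P0 subroutine (any beta-approximate routine could be swapped in)."""
    A = frozenset()
    for _ in range(min(k, len(ground))):
        A = A | {max((e for e in sorted(ground) if e not in A), key=lambda c: g(A | {c}))}
    return A


def mwu_stage2(fs, ground, k, delta, solver=greedy):
    """MWU sketch. Assumes each f_i is monotone submodular with values in [0, 1] and 0 < delta <= 1.
    Returns x = (1/T) sum_t 1_{X^t} (as a dict) and the list of sets X^t."""
    m = len(fs)
    eta = delta / 2
    T = max(1, math.ceil(4 * math.log(m) / delta ** 2))  # from the standard MWU regret bound
    lam = [1.0 / m] * m                                   # uniform initial weights
    sets = []
    for _ in range(T):
        weights = tuple(lam)
        g = lambda A: sum(l * f(A) for l, f in zip(weights, fs))  # combined objective
        X = solver(g, ground, k)
        sets.append(X)
        # Functions that X already serves well lose weight; after renormalizing,
        # poorly served functions gain weight for the next round.
        lam = [l * (1.0 - eta * f(X)) for l, f in zip(lam, fs)]
        total = sum(lam)
        lam = [l / total for l in lam]
    x = {e: sum(e in X for X in sets) / T for e in ground}
    return x, sets


def capped(elements, unit):
    """Hypothetical objective min(1, unit * |A ∩ elements|)."""
    return lambda A: min(1.0, unit * len(A & elements))


fs = [capped({0, 1, 2}, 0.6), capped({3, 4, 5}, 0.35)]
x, sets = mwu_stage2(fs, range(6), k=4, delta=0.2)
averages = [sum(f(X) for X in sets) / len(sets) for f in fs]
print([round(a, 3) for a in averages], round(sum(x.values()), 6))
```

On this toy instance the best worst-case value is 0.7 (two elements from each group), and the per-function averages over rounds stay above $(1 - 1/e)\cdot 0.7 - \delta$; the fractional point $x$ has $|x| = k$ and would then be passed to swap rounding.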
This application of MWU is unlike [2, 7], where broadly speaking the MWU framework is applied in a novel way to determine how an individual element is picked (or how a direction for movement is chosen in the case of continuous greedy). In contrast, we use standard algorithms for $P_0$ and pick an entire set before changing weights. Also,  uses MWU along with the continuous greedy framework to tackle harder settings, but in our setting using the continuous greedy framework eliminates the need for MWU altogether; in fact, we use MWU as a replacement for continuous greedy. Subsequent to our work we discovered a similar application of MWU in . Their application differs from Algorithm 2 only in minor details, but unlike our result they give a bi-criterion approximation where the output is a set of cardinality up to  such that .
Now, consider the following intuitive schema. We would like to find a set $X$ of size $k$ such that $f_i(X) \ge \beta$ for every $i$ (recall that targets are normalized to 1). While this seems hard, consider the combination $g(X) = \sum_i \lambda_i f_i(X)$, which is also monotone submodular for non-negative $\lambda_i$. Given that a feasible set exists, we can easily find a set $X$ such that $g(X) \ge \beta \sum_i \lambda_i$, since this is a single objective problem and we have fast approximations for $P_0$. However, for a fixed set of scalar weights $\lambda_i$, solving the resulting instance of $P_0$ need not give a set that has sufficient value for every individual function $f_i$. This is where MWU comes into the picture. We start with uniform weights for the functions and solve an instance of $P_0$ to get a set $X^1$. Then we change the weights to de-emphasize the functions for which $f_i(X^1)$ was closer to the target value and to stress the functions for which $f_i(X^1)$ was small, and repeat with the new weights. After running $T$ such rounds, we have a collection of sets $X^t$ for $t \in \{1, \dots, T\}$. Using tricks from standard MWU analysis () along with submodularity and monotonicity, we show that $\frac{1}{T}\sum_t f_i(X^t) \ge \beta - \delta$ for every $i$. Thus far, this resembles how MWU has been used in the literature for linear objectives, for instance in the Plotkin-Shmoys-Tardos framework for solving LPs. However, a new issue now arises due to the non-linearity of the functions $f_i$. As an example, suppose that by some coincidence $x = \frac{1}{T}\sum_t \mathbf{1}_{X^t}$ turns out to be a binary vector, so we easily obtain a set from $x$. We want to lower bound the value $F_i(x)$ of that set, and while we have a good lower bound on $\frac{1}{T}\sum_t f_i(X^t)$, it is unclear how the two quantities are related. More generally, we would like to show that $F_i(x) \ge \frac{1}{T}\sum_t f_i(X^t)$, which would then give us a $\beta - \delta$ approximation using Lemma 2. Indeed, we show that $F_i(x) \ge (1 - 1/e)\frac{1}{T}\sum_t f_i(X^t)$, resulting in a $(1 - 1/e)(\beta - \delta)$ approximation. Now, we state and prove lemmas that formalize the above intuition.
Proof Consider the optimal set and note that . Now the function , being a convex combination of monotone submodular functions, is also monotone submodular. We would like to show that there exists a set of size such that . Then the claim follows from the fact that is an approximation for monotone submodular maximization with cardinality constraint.
To see the existence of such a set , greedily index the elements of using . Suppose that the resulting order is , where is such that for every . Then the truncated set has the desired property, and we are done. ∎
Proof Suppose we have,
Then assuming , the RHS above simplifies to,
And we have for every ,
After rounds, . Further, for every ,
Using and for , and with and (for a positive approximation guarantee), we have,
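Given a monotone submodular function $f$, its multilinear extension $F$, sets $X^t$ for $t \in \{1, \dots, T\}$, and the point $x = \frac{1}{T}\sum_t \mathbf{1}_{X^t}$, we have

$$F(x) \ge \Big(1 - \frac{1}{e}\Big) \frac{1}{T} \sum_{t=1}^{T} f(X^t).$$

Proof Consider the concave closure of a submodular function $f$,

$$f^+(x) = \max\Big\{ \sum_{S \subseteq N} \alpha_S f(S) \ :\ \sum_{S} \alpha_S \mathbf{1}_S = x,\ \sum_S \alpha_S = 1,\ \alpha_S \ge 0 \Big\}.$$

Clearly, $f^+(x) \ge \frac{1}{T}\sum_t f(X^t)$. So it suffices to show $F(x) \ge (1 - 1/e) f^+(x)$, which in fact follows from Lemmas 4 and 5 in .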
Alternatively, we now give a novel and direct proof for the statement. We abuse notation and use and interchangeably. Let and w.l.o.g., assume that sets are indexed such that for every . Further, let and .
Recall that can be viewed as the expected function value of the set obtained by independently sampling element with probability . Instead, consider the alternative random process where starting with , one samples each element in set independently with probability . The random process runs in steps and the probability of an element being chosen at the end of the process is exactly , independent of all other elements. Let , it follows that the expected value of the set sampled using this process is given by . Observe that for every , and therefore, . Now in step , suppose the newly sampled subset of adds marginal value . From submodularity we have, and in general, .
To see that , consider a LP where the objective is to minimize subject to ; and with . Here is a parameter and everything else is a variable. Observe that the extreme points are characterized by such that, and for all and . For all such points, it is not difficult to see that the objective is at least . Therefore, we have , as desired.
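The inequality of this lemma, $F(x) \ge (1 - 1/e)\frac{1}{T}\sum_t f(X^t)$ for $x = \frac{1}{T}\sum_t \mathbf{1}_{X^t}$, is easy to test numerically. The Python sketch below draws hypothetical random coverage functions and random sets $X^1, \dots, X^T$, computes $F(x)$ exactly by enumeration, and compares both sides; it is a sanity check, not a substitute for the proof.

```python
import itertools
import math
import random


def multilinear_exact(f, x):
    """Exact multilinear extension by enumerating all 2^n subsets (tiny n only)."""
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(x)):
        prob = math.prod(xi if b else 1.0 - xi for xi, b in zip(x, bits))
        total += prob * f(frozenset(i for i, b in enumerate(bits) if b))
    return total


def lemma_sides(seed, n=7, T=4, size=3):
    """Random coverage f and random sets X^1..X^T; returns (F(x), (1 - 1/e) * avg_t f(X^t))."""
    rng = random.Random(seed)
    covers = [set(rng.sample(range(10), 3)) for _ in range(n)]
    f = lambda A: len(set().union(*(covers[i] for i in A)))
    sets = [frozenset(rng.sample(range(n), size)) for _ in range(T)]
    x = [sum(i in X for X in sets) / T for i in range(n)]
    return multilinear_exact(f, x), (1 - 1 / math.e) * sum(f(X) for X in sets) / T


for seed in range(5):
    lhs, rhs = lemma_sides(seed)
    print(f"F(x) = {lhs:.3f} >= {rhs:.3f}")
```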
For , the algorithm makes  queries, and with constant probability outputs a feasible  approximate set. Asymptotically, it is $(1 - 1/e)^2 - \delta$ approximate for $m = o(k/\log^3 k)$.
Proof Combining Lemmas 7 & 8 we have, . The asymptotic result follows just as in Theorem 5. For runtime, note that Stage 1 takes time $O(mn)$. Stage 2 runs an instance of $P_0$, $T$ times, leading to an upper bound of , if we use the thresholding algorithm in  (at the cost of a multiplicative factor of  in the approximation guarantee). Finally, swap rounding proceeds in  rounds and each round takes  time, leading to total runtime  for Stage 3. Combining all three we get a runtime of . ∎
3.3 Variation in optimal solution value and derandomization
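Consider the problem $P_0$ with cardinality constraint $k$. Given an optimal solution $A_k$ with value $\mathrm{OPT}_k$ for the problem, it is not difficult to see that for arbitrary $k' \le k$, there is a subset $A_{k'} \subseteq A_k$ of size $k'$ such that $f(A_{k'}) \ge \frac{k'}{k}\mathrm{OPT}_k$. For instance, indexing the elements in $A_k$ using the greedy algorithm and choosing the set given by the first $k'$ elements gives such a set, since the marginal values along the greedy order are non-increasing. This implies $\mathrm{OPT}_{k'} \ge \frac{k'}{k}\mathrm{OPT}_k$, and the bound is easily seen to be tight.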
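The truncation argument is easy to check numerically. The Python sketch below orders the elements of a set greedily (each step takes the largest marginal with respect to the prefix) and compares the value of each prefix of length $k'$ with $\frac{k'}{k}$ times the value of the whole set. The random coverage instance is hypothetical, and the whole ground set stands in for an optimal set, since the argument applies to any set.

```python
import random


def greedy_order(f, A):
    """Order the elements of A greedily: each next element maximizes the marginal value
    with respect to the elements already ordered."""
    order, prefix, remaining = [], frozenset(), set(A)
    while remaining:
        nxt = max(sorted(remaining), key=lambda c: f(prefix | {c}) - f(prefix))
        order.append(nxt)
        prefix = prefix | {nxt}
        remaining.remove(nxt)
    return order


rng = random.Random(11)
covers = [set(rng.sample(range(30), rng.randint(1, 6))) for _ in range(12)]  # hypothetical instance


def f(A):
    return len(set().union(*(covers[e] for e in A)))


A = frozenset(range(12))  # stands in for an optimal set A_k with k = 12
order = greedy_order(f, A)
for kp in (3, 6, 9):
    print(kp, f(frozenset(order[:kp])), ">=", kp / len(A) * f(A))
```

For a modular function such as $f(A) = |A|$ every prefix of length $k'$ has value exactly $k'$, so the bound is attained with equality.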
This raises a natural question: can we generalize this bound on the variation of the optimal solution value with varying $k$ to multi-objective maximization? A priori, this is not obvious even for modular functions. In particular, note that indexing elements in the order they are picked by the greedy algorithm does not suffice, since there are many functions and we need to balance values amongst all of them. We show below that one can indeed derive such a bound.
Given that there exists a set such that and . For every , there exists of size , such that,
Proof We restrict our ground set of elements to and let be a subset of size at most such that (recall, we discussed the existence of such a set in Section 2.1, Stage 1). The rest of the proof is similar to the proof of Lemma 4. Consider the point . Clearly, , and from Corollary 3, we have . Finally, using swap rounding Lemma 1, there exists a set of size , such that .
Conjecture in : Note that this resolves a slightly weaker version of the conjecture in  for constant $m$. The original conjecture states that for constant $m$ and every , there exists a set of size , such that . Asymptotically, both  and  tend to . This implies that for large enough , we can choose sets of size  (-tuples) at each step to get a deterministic (asymptotically) $(1 - 1/e) - \epsilon$ approximation with runtime $kn^{m/\epsilon^4}$ for the multi-objective maximization problem when $m$ is constant (all previously known approximation algorithms, as well as the ones presented earlier, are randomized).