 # Submodular Function Maximization in Parallel via the Multilinear Relaxation

Balkanski and Singer recently initiated the study of adaptivity (or parallelism) for constrained submodular function maximization, and studied the setting of a cardinality constraint. Very recent improvements for this problem by Balkanski, Rubinstein, and Singer and by Ene and Nguyen resulted in a near-optimal (1-1/e-ϵ)-approximation in O(log n/ϵ^2) rounds of adaptivity. Partly motivated by the goal of extending these results to more general constraints, we describe parallel algorithms for approximately maximizing the multilinear relaxation of a monotone submodular function subject to packing constraints. Formally our problem is to maximize F(x) over x ∈ [0,1]^n subject to Ax ≤ 1 where F is the multilinear relaxation of a monotone submodular function. Our algorithm achieves a near-optimal (1-1/e-ϵ)-approximation in O(log^2 m log n/ϵ^4) rounds where n is the cardinality of the ground set and m is the number of packing constraints. For many constraints of interest, the resulting fractional solution can be rounded via known randomized rounding schemes that are oblivious to the specific submodular function. We thus derive randomized algorithms with poly-logarithmic adaptivity for a number of constraints including partition and laminar matroids, matchings, knapsack constraints, and their intersections.



## 1 Introduction

A real-valued set function $f : 2^N \to \mathbb{R}$ over a finite ground set $N$ is submodular iff

$$f(A) + f(B) \ge f(A \cup B) + f(A \cap B) \quad \text{for all } A, B \subseteq N.$$
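As a small illustration of the definition, the following sketch checks the submodularity inequality by brute force on a tiny ground set. The helper names (`coverage`, `powerset`, `is_submodular`) and the example set system are assumptions made here for illustration; they do not come from the paper, and the check is exponential in the size of the ground set.

```python
from itertools import combinations

def coverage(sets):
    """Return f(S) = size of the union of sets[i] over i in S (a coverage function)."""
    def f(S):
        covered = set()
        for i in S:
            covered |= sets[i]
        return len(covered)
    return f

def powerset(n):
    """All subsets of {0, ..., n-1} as frozensets."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def is_submodular(f, n):
    """Brute-force check of f(A) + f(B) >= f(A | B) + f(A & B) over all pairs A, B."""
    subsets = powerset(n)
    return all(f(A) + f(B) >= f(A | B) + f(A & B) for A in subsets for B in subsets)

# Hypothetical example instance: four sets over the universe {1, ..., 5}.
example_sets = [{1, 2}, {2, 3}, {3, 4, 5}, {1, 5}]
f_cov = coverage(example_sets)
cov_ok = is_submodular(f_cov, len(example_sets))          # coverage functions are submodular
square_ok = is_submodular(lambda S: len(S) ** 2, 3)       # |S|^2 violates the inequality
```

The second call uses $|S|^2$, which fails the inequality already for two disjoint singletons.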

Submodular set functions play a significant role in classical combinatorial optimization. More recently, due to theoretical developments and a plethora of applications ranging from algorithmic game theory to machine learning and information retrieval & analysis, their study has seen a resurgence of interest. In this paper we are interested in constrained submodular function maximization. Given a non-negative submodular set function $f$ over a finite ground set $N$, the goal is to find $\max_{S \in \mathcal{I}} f(S)$ where $\mathcal{I} \subseteq 2^N$ is a down-closed family of sets that captures some packing constraint of interest. The canonical problem here is the cardinality constrained problem $\max_{|S| \le k} f(S)$. Among many other applications, this problem captures NP-Hard problems including the Maximum $k$-Cover problem, which cannot be approximated to better than a $(1 - 1/e + \epsilon)$-factor for any $\epsilon > 0$ unless $P = NP$. The cardinality constrained problem has been well-studied since the 70’s, with an optimal $(1 - 1/e)$-approximation established via a simple greedy algorithm when $f$ is monotone. There has been extensive theoretical work in the last decade on approximation algorithms for submodular function maximization. Several new algorithmic ideas were developed to obtain improved approximation ratios for various constraints, and to handle non-monotone functions. One of these new ingredients is the multilinear relaxation approach that brought powerful continuous optimization techniques to submodular function maximization. We refer the reader to a recent survey for some pointers to the new developments on greedy and continuous methods, and to  on local search methods.

Recent applications of submodular function maximization to large data sets, and technological trends, have motivated new directions of research. These include the study of faster algorithms in the sequential model of computation [2, 34, 19, 35, 27], algorithms in the distributed setting [33, 29, 32, 8, 9, 30], and algorithms in the streaming setting [3, 14, 18]. Barbosa et al. developed a general technique to obtain a constant round algorithm in the MapReduce model of computation that gets arbitrarily close to the approximation achievable in the sequential setting. The MapReduce model captures the distributed nature of data but allows for a polynomial amount of sequential work on each machine. In some very recent work Balkanski and Singer suggested the study of adaptivity requirements for submodular function maximization, which is closer in spirit to traditional parallel computation models such as the PRAM. To a first order approximation the question is the following. Assuming that the submodular function can be evaluated efficiently in parallel, how fast can constrained submodular function maximization be done in parallel? To avoid low-level considerations of the precise model of parallel computation, one can focus on the number of adaptive rounds needed to solve the constrained optimization problem; this corresponds to the depth in parallel computation. The formal definition of the notion of adaptivity from  is the following. An algorithm with oracle access to a submodular function $f$ is $r$-adaptive for an integer $r$ if for $i \in \{1, \dots, r\}$, every query to $f$ in round $i$ depends only on the answers to queries in rounds $1$ to $i-1$ (and is independent of all other queries in rounds $i$ and greater). We believe that the definition is intuitive and use other terms such as depth, rounds and iterations depending on the context.
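To make the notion of adaptive rounds concrete, here is a minimal sketch that counts rounds for the classical sequential greedy algorithm under a cardinality constraint. The function names and the example instance are hypothetical. Within one round all marginal-value queries depend only on the set chosen in earlier rounds, so they could be issued in parallel; greedy therefore uses about $k$ adaptive rounds, which is what low-adaptivity algorithms aim to reduce.

```python
def coverage(sets):
    """Coverage function: f(S) = size of the union of sets[i] for i in S."""
    def f(S):
        covered = set()
        for i in S:
            covered |= sets[i]
        return len(covered)
    return f

def greedy_with_round_count(f, n, k):
    """Sequential greedy for max f(S) s.t. |S| <= k, counting adaptive rounds."""
    chosen = set()
    rounds = 0
    for _ in range(min(k, n)):
        # One adaptive round: every query below depends only on `chosen`,
        # which was fixed by the answers of earlier rounds.
        base = f(chosen)
        gains = {j: f(chosen | {j}) - base for j in range(n) if j not in chosen}
        rounds += 1
        best = max(gains, key=gains.get)
        chosen.add(best)
    return chosen, rounds

# Hypothetical example instance.
example_sets = [{1, 2}, {2, 3}, {3, 4, 5}, {1, 5}]
f_cov = coverage(example_sets)
chosen, rounds = greedy_with_round_count(f_cov, len(example_sets), 2)
```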

Balkanski and Singer considered the basic cardinality constrained problem and showed that in the value oracle model (where one assumes black box access to $f$), one needs  rounds of adaptivity for a constant factor approximation. They also developed a randomized algorithm with an approximation ratio of . In recent work, Balkanski et al. and Ene and Nguyen described randomized algorithms that achieved a near-optimal approximation ratio of $(1 - 1/e - \epsilon)$ with $O(\log n / \epsilon^2)$ adaptivity. The algorithm of Ene and Nguyen uses  function calls, while the algorithm of Balkanski et al. uses  function calls. (We use $\tilde{O}$ notation to suppress poly-logarithmic factors.)

We refer the reader to  for extensive justification for the study of adaptivity of submodular function maximization. We believe that the close connections to parallel algorithms are already a theoretically compelling motivation. For instance, specific problems such as Set Cover and Maximum $k$-Cover have been well-studied in the PRAM model (see  and references therein). Our goals here are twofold. First, can we obtain parallel algorithms for other and more general classes of constraints than the cardinality constraint? Second, is there a unified framework that cleanly isolates the techniques and ideas that lead to parallelization for submodular maximization problems?

#### Our Contribution:

We address our goals by considering the following general problem. Given a monotone submodular function maximize subject to a set of explicitly given packing constraints in the form ; here , and is a non-negative matrix. Packing constraints in this form capture many constraints of interest including cardinality, partition and laminar matroids, matchings in graphs and hypergraphs, independent sets in graphs, multiple knapsack constraints, and their intersections to name a few. To solve these in a unified fashion we consider the problem of solving in parallel the following multilinear relaxation:

$$\text{maximize } F(x) \quad \text{s.t. } Ax \le \mathbf{1} \text{ and } x \in [0,1]^N. \tag{Pack-ML}$$

Here $F$ is the multilinear extension of $f$, a continuous extension of $f$ defined formally in Section 2. We mention that solving a packing LP of the form

$$\text{maximize } \langle c, x \rangle \quad \text{s.t. } Ax \le \mathbf{1} \text{ and } x \in [0,1]^n \tag{Pack-LP}$$

with $c \ge 0$ is a special case of our problem.

The multilinear relaxation is used primarily for the sake of discrete optimization. For this reason we make the following convenient assumption: for every element $j$ of the ground set $N$, the singleton element $\{j\}$ satisfies the packing constraints, that is, $A e_j \le \mathbf{1}$. Any element which does not satisfy the assumption can be removed from consideration. We make this assumption for the rest of the paper as it helps the exposition and avoids uninteresting technicalities.

Our main result is the following theorem.

Theorem. There is a parallel/adaptive algorithm that solves the multilinear relaxation of a monotone submodular function subject to $m$ packing constraints with the following properties. For a given parameter $\epsilon > 0$:

• It outputs a $(1 - 1/e - \epsilon)$-approximation to the multilinear relaxation.

• It runs in $O(\log^2 m \log n / \epsilon^4)$ adaptive rounds.

• The algorithm is deterministic if given value oracle access to $F$ and its gradient $\nabla F$. The total number of oracle queries to $F$ and $\nabla F$ is .

• If only given access to a value oracle for $f$, the algorithm is randomized and outputs a $(1 - 1/e - \epsilon)$-approximate feasible solution with high probability, and deterministically finishes in the prescribed number of rounds. The total number of oracle accesses to $f$ is .

Our algorithm solves the continuous relaxation and outputs a fractional solution $x$. To obtain an integer solution we need to round $x$. Several powerful and general rounding strategies have been developed over the years including pipage rounding, swap rounding, and contention resolution schemes [13, 7, 17, 16, 28, 26, 12]. These establish constant factor integrality gaps for the multilinear relaxation for many constraints of interest. In particular, for cardinality constraints and more generally matroid constraints there is no loss from rounding the multilinear relaxation. Thus solving the multilinear relaxation as in our main theorem already gives an estimate of the value of the integer optimum solution. One interesting aspect of several of these rounding algorithms is the following: with randomization, they can be made oblivious to the objective function (especially for monotone submodular functions). Thus one can convert the fractional solution into an integer solution without any additional rounds of adaptivity. Of course, in a fine-grained parallel model of computation such as the PRAM, it is important to consider the parallel complexity of the rounding algorithms. This will depend on the constraint family. We mention that the case of partition matroids is relatively straightforward and one can derive a randomized parallel algorithm with an approximation ratio of $(1 - 1/e - \epsilon)$ with poly-logarithmic depth. In rounding we briefly discuss some rounding schemes that can be easily parallelized.

For the case of cardinality constraint we are able to derive a more oracle-efficient algorithm with similar parameters as the ones in [6, 21]. The efficient version is presented as a discretization of the continuous algorithm, and we believe it provides a different perspective from previous work. (Balkanski and Singer [4, Section D] describe very briefly a connection between their -approximation algorithm and the multilinear relaxation but not many details are provided.) The algorithm can be extended to a single knapsack constraint while maintaining a depth of .

Our parallel algorithm for the multilinear relaxation relies only on “monotone concavity” of the multilinear extension (as defined in Section 2). Thus our parallel algorithm also applies to yield a $(1 - 1/e - \epsilon)$-approximation for maximizing any monotone concave function subject to packing constraints. Even for non-decreasing concave functions, which can be optimized almost exactly in the sequential setting, it is not clear that they can be solved efficiently and near optimally in the parallel setting in the oracle model with black box access to the function and its gradient.

A number of recent papers have addressed adaptive and parallel algorithms for submodular function maximization. Our work was inspired by [4, 6, 21] which addressed the cardinality constraint. Other independent papers optimized the adaptivity and query complexity , and obtained constant factor approximation for nonnegative nonmonotone functions under a cardinality constraint [5, 24]. Partly inspired by our work, Ene et al.  obtained improved results for approximating the multilinear relaxation with packing constraints. First, they obtain a -approximation for the monotone case in rounds of adaptivity. Second, they are able to handle nonnegative functions and obtain a -approximation.

### 1.1 Technical overview

We build upon several ingredients that have been developed in the past. These include the continuous greedy algorithm for approximating the multilinear relaxation [38, 13] and its adaptation to the multiplicative weight update method for packing constraints . The parallelization is inspired by fast parallel approximation schemes for positive LPs pioneered by Luby and Nisan  and subsequently developed by Young . Here we briefly sketch the high-level ideas which are in some sense not that complex.

We will first consider the setting of a single constraint ($m = 1$), which corresponds to a knapsack constraint of the form $\langle a, x \rangle \le 1$. For linear objective functions $\langle c, x \rangle$, we know that the optimal solution is obtained by greedily sorting the coordinates in decreasing order of $c_j / a_j$ and choosing each coordinate in turn to its fullest extent of the upper bound until the budget of one unit is exhausted (the last coordinate may be fractionally chosen). One way to parallelize the greedy algorithm (taking a continuous viewpoint) while losing only a $(1 - \epsilon)$-factor is the following. We bucket the ratios $c_j / a_j$ into a logarithmic number of classes by some appropriate discretization. Starting with the highest ratio class, instead of choosing only one coordinate, we choose all coordinates in the same bucket and increase them simultaneously in parallel until the budget is met or all coordinates reach their upper bound. If the budget remains we move on to the next bucket. It is not hard to see that this leads to a parallel algorithm with poly-logarithmic depth; the approximation loss is essentially due to bucketing.
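The bucketing idea for a linear objective under one knapsack constraint can be sketched as follows. This is an illustrative sketch, not the paper's procedure: the function name `bucketed_fractional_knapsack` is hypothetical, all costs `a[j]` are assumed strictly positive, and the sketch does not discard very small ratios, so it does not by itself enforce a logarithmic bound on the number of buckets.

```python
def bucketed_fractional_knapsack(c, a, eps):
    """Approximately maximize sum(c[j] * x[j]) subject to
    sum(a[j] * x[j]) <= 1 and 0 <= x[j] <= 1, assuming every a[j] > 0.

    Coordinates whose ratio c[j] / a[j] is within a (1 - eps) factor of the
    current best ratio form one bucket and are raised together by a common step.
    """
    n = len(c)
    x = [0.0] * n
    budget = 1.0
    remaining = [j for j in range(n) if c[j] > 0]
    rounds = 0
    while remaining and budget > 1e-12:
        top = max(c[j] / a[j] for j in remaining)
        bucket = [j for j in remaining if c[j] / a[j] >= (1 - eps) * top]
        weight = sum(a[j] for j in bucket)
        # Raise the whole bucket uniformly until it saturates or the budget is met.
        delta = min(1.0, budget / weight)
        for j in bucket:
            x[j] = delta
        budget -= delta * weight
        remaining = [j for j in remaining if c[j] / a[j] < (1 - eps) * top]
        rounds += 1
    return x, rounds

x1, r1 = bucketed_fractional_knapsack([6.0, 5.8, 3.0, 1.0], [0.5, 0.5, 0.5, 0.5], 0.1)
x2, r2 = bucketed_fractional_knapsack([1.0, 0.95], [1.0, 1.0], 0.1)
```

In the second call both coordinates land in the same bucket and each is raised to 0.5, giving value 0.975 against an optimum of 1.0; the loss comes only from treating ratios within a $(1-\epsilon)$ factor as equal.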

Consider now the nonlinear case, (Pack-ML) under a knapsack constraint. In the sequential setting, the continuous greedy algorithm [38, 13] is essentially the following greedy algorithm presented as a continuous process over time. At any time $t$, if $x$ is the current solution, we increase only $x_j$ for the best “bang-for-buck” coordinate $j$, the one maximizing $\partial_j F(x) / a_j$; here $\partial_j F(x)$ is the $j$th coordinate of the gradient of $F$ at $x$. In the special case of the cardinality constraint, this is the coordinate with the largest partial derivative. Multilinearity of $F$ implies that we should increase the same coordinate until it reaches its upper bound. A natural strategy to parallelize this greedy approach is to bucket the ratios of the coordinates (by some appropriate discretization) and simultaneously increase all coordinates in the best ratio bucket. This won’t quite work because $F$ is non-linear and the gradient values decrease as $x$ increases. (This tension is also central to the recent works [4, 6, 21]. We believe that it is easier to understand it in the continuous setting where one can treat the matter deterministically.) Here is a simple but key idea. Let $\lambda$ be the current highest ratio and let us call any coordinate in the highest bucket a good coordinate. Suppose we increase all good coordinates by some $\delta$ until the average ratio of the good coordinates falls, after the increase, to . During the step we have a good rate of progress, but the step size may be very small. However, one can argue that after the step, the number of good coordinates for the current gradient level falls by an fraction. Hence we cannot make too many such steps before this bucket empties, and have made “dual” progress in terms of decreasing the -norm of the gradient. This simple scheme suffices to recover a polylogarithmic depth algorithm for the knapsack constraint. With some additional tricks we can convert the algorithm into a randomized discrete algorithm that recovers the parameters of [6, 21] for the cardinality constraint. We note that viewing the problem from a continuous point of view allows for a clean and deterministic algorithm (assuming value oracles for $F$ and its gradient $\nabla F$).

The more technical aspect of our work is when $m > 1$; that is, when there are several constraints. Here we rely on a Lagrangean relaxation approach based on the multiplicative weight update (MWU) method for positive LPs, which has a long history in theoretical computer science. The MWU approach maintains non-negative weights on the constraints and solves a sequence of Lagrangean relaxations of the original problem while updating the weights. Each relaxed problem is obtained by collapsing the $m$ constraints into a single constraint obtained by taking a weighted linear combination of the original constraints. Note that this single constraint is basically a knapsack constraint. However, the weights are updated after each step and hence the knapsack constraint evolves dynamically. Nevertheless, the basic idea of updating many variables with the same effective ratio that we outlined for the single knapsack constraint can be generalized. One critical feature is that the weights increase monotonically. In the sequential setting,  developed a framework for (Pack-ML) that allowed a clean combination of two aspects: (a) an analysis of the continuous greedy algorithm for proving a -approximation for the multilinear relaxation and (b) the analysis of the step size and weight updates in MWU which allows one to argue that the final solution (approximately) satisfies the constraints. We borrow the essence of this framework, but in order to parallelize the algorithm we need both the dual gradient-decreasing viewpoint discussed above and another idea from previous work on parallel algorithms for positive LPs [31, 41]. Recall that in the setting of a single knapsack constraint, when we update multiple variables, there are two bottlenecks for the step size: the total budget and the change in gradient. In the MWU setting, the step size is further controlled by weight update considerations. Accordingly, the step size update rule is constrained such that if we are increasing along the coordinate with a current value of , then the updated value is at most . This limit is conservative enough to ensure the weights do not grow too fast, but can only limit the step size a small number of times before the geometrically increasing coordinates exceed a certain upper bound.
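The collapsing step of the Lagrangean relaxation can be sketched in a few lines. The sketch only illustrates how a weighted combination of the rows of $A$ yields a single knapsack-style constraint; the helper names are hypothetical, and the weight update rule and step size control of the actual algorithm are not reproduced here.

```python
def collapse_constraints(A, w):
    """Collapse the m constraints A x <= 1 into the single constraint
    <coeffs, x> <= budget, where coeffs = w^T A and budget = sum(w).

    A is a list of m rows (each of length n) and w is a list of m
    non-negative weights. Any x with A x <= 1 satisfies the collapsed
    constraint, since it is a non-negative combination of the originals.
    """
    m, n = len(A), len(A[0])
    coeffs = [sum(w[i] * A[i][j] for i in range(m)) for j in range(n)]
    budget = sum(w)
    return coeffs, budget

def is_feasible(A, x, tol=1e-12):
    """Check A x <= 1 row by row."""
    return all(sum(aij * xj for aij, xj in zip(row, x)) <= 1 + tol for row in A)

# Hypothetical example: two constraints on two variables.
A = [[1.0, 0.0], [0.5, 0.5]]
w = [2.0, 1.0]
coeffs, budget = collapse_constraints(A, w)
x = [0.5, 1.0]
collapsed_lhs = sum(cj * xj for cj, xj in zip(coeffs, x))
```

The converse does not hold: a point can satisfy the collapsed constraint while violating an individual row, which is why the weights must track how tight each row has become.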

#### Organization:

The rest of the paper is organized as follows. Section 2 describes relevant background on submodular functions and the multilinear extension. In Section 3, we first describe and analyze an algorithm for the multilinear relaxation when we have a single cardinality constraint. This gives an algorithm with depth  assuming oracle access to the multilinear extension $F$ and its derivative $\nabla F$, which in turn can be implemented via (many more) oracle calls to $f$ without increasing the adaptivity. We describe and analyze our algorithm for general packing constraints in mwu. In rpg, we analyze a randomized discretization of the continuous algorithm for cardinality constraints with a better oracle complexity w/r/t $f$. In knapsack, we describe and analyze -adaptive algorithms for maximizing a monotone submodular function subject to a single knapsack constraint.

Note that Section 3 is largely included to develop some intuition ahead of the more complicated constraints in mwu, but none of the formal observations in Section 3 are invoked explicitly in mwu. Moreover, the bounds obtained in Section 3 for the cardinality constraint are already known [6, 21]. The reader primarily interested in the main result regarding general packing constraints may prefer to skip ahead to mwu.

## 2 Submodular set functions and the Multilinear relaxation

In this section we provide some relevant background and notation that we use in the rest of the paper. Let $f : 2^N \to \mathbb{R}$ assign real values to subsets of $N$. $f$ is nonnegative if $f(S) \ge 0$ for all $S \subseteq N$. $f$ is monotone if $S \subseteq T$ implies $f(S) \le f(T)$. $f$ is normalized if $f(\emptyset) = 0$.

We have already seen one definition of submodularity in the introduction. Another useful (and equivalent) definition is via marginal values. For a real-valued set function $f$, the marginal value of a set $T$ with respect to a set $S$ is defined as $f(S \cup T) - f(S)$, which we abbreviate by $f_S(T)$. If $T = \{i\}$ is a singleton we write $f_S(i)$ instead of $f_S(\{i\})$. We also use the notation $S + i$ and $S - i$ as shorthand for $S \cup \{i\}$ and $S \setminus \{i\}$. A set function $f$ is submodular iff it satisfies the following property modeling decreasing marginal returns:

$$f_S(i) \ge f_T(i) \quad \text{for all } S \subseteq T \subseteq N \text{ and } i \in N \setminus T.$$

The following seemingly restricted form of this property also suffices: $f_S(i) \ge f_{S+j}(i)$ for all $S \subseteq N$ and distinct $i, j \in N \setminus S$; we will see a continuous analogue of this latter property subsequently. In this paper we restrict attention to normalized, nonnegative and monotone submodular set functions.

### 2.1 Multilinear extension and relaxation

In this section, we outline basic properties of a continuous extension of submodular functions to the fractional values in $[0,1]^N$ called the multilinear extension.

For two vectors $x, y \in \mathbb{R}^N$, let $x \vee y$ be the coordinatewise maximum of $x$ and $y$, let $x \wedge y$ denote the coordinatewise minimum, and let $x \setminus y = x - (x \wedge y)$. We identify an element $i$ with the coordinate vector $e_i$, and a set of elements $S$ with the sum of coordinate vectors, $\sum_{i \in S} e_i$. In particular, for a vector $x \in [0,1]^N$ and a set of coordinates $S$, $x \wedge S$ is the vector obtained from $x$ by setting all coefficients not indexed by $S$ to 0, and $x \setminus S$ is the vector obtained from $x$ by setting all coordinates indexed by $S$ to 0.

Given a set function $f$, the multilinear extension of $f$, denoted $F$, extends $f$ to the product space $[0,1]^N$ by interpreting each point $x \in [0,1]^N$ as an independent sample $S \subseteq N$ with sampling probabilities given by $x$, and taking the expectation of $f(S)$. Equivalently,

$$F(x) = \sum_{S \subseteq N} \Bigl( \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i) \Bigr) f(S).$$
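For very small ground sets the multilinear extension can be evaluated exactly by enumerating all subsets and weighting $f(S)$ by the probability of sampling $S$. The sketch below does this directly; the function name `multilinear_extension` and the example coverage instance are hypothetical, and the running time is exponential in $n$, so it is only meant for checking other code on toy inputs.

```python
from itertools import combinations

def multilinear_extension(f, x):
    """Exact value of F(x) = sum over S of Pr[S] * f(S), where each element i
    is included in S independently with probability x[i]."""
    n = len(x)
    total = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            members = set(S)
            prob = 1.0
            for i in range(n):
                prob *= x[i] if i in members else 1.0 - x[i]
            total += prob * f(members)
    return total

def cov(S, sets=({1, 2}, {2, 3})):
    """Hypothetical coverage function on two sets over the universe {1, 2, 3}."""
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)

modular_value = multilinear_extension(len, [0.5, 0.25])   # f(S) = |S| gives F(x) = sum(x)
coverage_value = multilinear_extension(cov, [0.5, 0.5])
integral_value = multilinear_extension(cov, [1.0, 0.0])   # F agrees with f on 0/1 points
```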

We extend $F$ to the cone $\mathbb{R}^N_{\ge 0}$ by truncation: $F(x) = F(x \wedge \mathbf{1})$, where $x \wedge \mathbf{1}$ takes the coordinatewise minimum of $x$ and the all-ones vector $\mathbf{1}$. We also write  which generalizes the definition of marginal values to the continuous setting. We let $\nabla F(x)$ denote the gradient of $F$ at $x$ and $\nabla^2 F(x)$ denote the Hessian of $F$ at $x$. $\partial_i F(x)$ denotes the partial derivative of $F$ with respect to $x_i$, and $\partial_i \partial_j F(x)$ denotes the second order partial derivative with respect to $x_i$ and $x_j$. The following lemma captures several submodularity properties of $F$ that it inherits from $f$. The properties are paraphrased from [38, 13] and can be derived from the algebraic formula for $F$ and submodularity of $f$.

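Lemma. Let $F$ be the multilinear extension of a set function $f$, and let $x \in [0,1]^N$.

1. (Multilinearity) For any $i \in N$, $F(x) = x_i\, F(x \vee i) + (1 - x_i)\, F(x \setminus i)$. In particular, $F$ is linear in $x_i$.

2. (Monotonicity) For any $i \in N$, $\partial_i F(x) = F(x \vee i) - F(x \setminus i)$. In particular, if $f$ is monotone, then $\nabla F$ is nonnegative, and $F$ is monotone (that is, $F(x) \le F(y)$ if $x \le y$).

3. For any $i \ne j \in N$, for $y = x \setminus \{i, j\}$, we have

$$\partial_i \partial_j F(x) = F(y \vee \{i,j\}) - F(y \vee i) - F(y \vee j) + F(y).$$

If $f$ is submodular, then

4. (Monotone concavity) For any $d \ge 0$, the function $t \mapsto F(x + t d)$ is concave in $t$ (whenever $F(x + t d)$ is defined).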

#### Multilinear relaxation:

The multilinear extension $F$ of a submodular function $f$ has many uses, but a primary motivation is to extend the relax-and-round framework of approximation algorithms for linear functions to submodular function maximization. Given a discrete optimization problem of the form $\max_{S \in \mathcal{I}} f(S)$ we relax it to the continuous optimization problem $\max_{x \in P_{\mathcal{I}}} F(x)$ where $P_{\mathcal{I}}$ is a polyhedral or convex relaxation for the feasible solutions of the constraint family $\mathcal{I}$. The problem $\max_{x \in P_{\mathcal{I}}} F(x)$ is referred to as the multilinear relaxation. It is useful to assume that linear optimization over $P_{\mathcal{I}}$ is feasible in polynomial time, in which case it is referred to as solvable. The multilinear relaxation is not exactly solvable even for the simple cardinality constraint polytope $\{x \in [0,1]^N : \sum_i x_i \le k\}$. The continuous greedy algorithm gives an optimal $(1 - 1/e)$ approximation for solvable polytopes when $f$ is monotone. Our focus in this paper is the restricted setting of explicit packing constraints.

#### Preprocessing:

Recall that we made an assumption that for all , . With this assumption in place we can do some useful preprocessing of the given instance. First, we can get lower and upperbounds on , the optimum solution value for the relaxation. We have and . Since we are aiming for a -approximation we can assume that for all , ; any element which does not satisfy this assumption can be discarded and the total loss is at most . Further, we can see, via sub-additivity of and that . We can also assume that or for all ; if we can round it down to . Let be the modified matrix. If then we have that . Therefore . From monotone concavity we also see that . Thus, solving with respect to does not lose more than a multiplicative factor when compared to solving with .

#### Evaluating $F$ and $\nabla F$:

The formula for $F$ gives a natural random sampling algorithm to evaluate $F(x)$ in expectation. Often we need to evaluate $F(x)$ and $\nabla F(x)$ to high accuracy. This issue has been addressed in prior work via standard Chernoff type concentration inequalities when $f$ is non-negative.

 dmlf-sample Suppose . Then with parallel evaluations of one can find an estimate of such that . Similarly, if , then with parallel evaluations of , one can find an estimate of such that .

Choosing and we can estimate and to within a multiplicative error, and an additive error of and respectively. Via the preprocessing that we already discussed, we can assume that and . For any such that we can set to obtain a -relative approximation. Similarly if we can obtain a -relative approximation by setting .
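The sampling approach can be sketched as below. The estimators average independent samples, and every sample is an independent oracle query that could be issued in the same adaptive round. The function names, the sample counts, and the seed are illustrative assumptions; the number of samples required for a given accuracy is governed by the concentration bounds discussed above and is not derived here. The gradient estimator uses the standard identity that the $i$th partial derivative equals the expected marginal value of $i$ with respect to a random set sampled from $x$.

```python
import random

def sample_set(x, rng):
    """Include each element i independently with probability x[i]."""
    return {i for i, p in enumerate(x) if rng.random() < p}

def estimate_multilinear(f, x, num_samples, rng):
    """Monte Carlo estimate of F(x) = E[f(S)] with S sampled from x."""
    return sum(f(sample_set(x, rng)) for _ in range(num_samples)) / num_samples

def estimate_partial(f, x, i, num_samples, rng):
    """Monte Carlo estimate of dF/dx_i = E[f(S + i) - f(S - i)] with S sampled from x."""
    total = 0.0
    for _ in range(num_samples):
        S = sample_set(x, rng)
        total += f(S | {i}) - f(S - {i})
    return total / num_samples

rng = random.Random(0)  # fixed seed only to make the illustration reproducible
est_value = estimate_multilinear(len, [0.5, 0.5, 0.5, 0.5], 20000, rng)  # true value is 2
est_partial = estimate_partial(len, [0.3, 0.7, 0.5], 1, 100, rng)        # every sample gives 1
```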

In some cases an explicit and simple formula for $F$ exists from which one can evaluate it deterministically and efficiently. A prominent example is the coverage function of a set system. Let $f$ be defined via a set system on $n$ sets $A_1, \dots, A_n$ over a finite universe $U$ as follows. For $S \subseteq \{1, \dots, n\}$ we let $f(S) = \bigl|\bigcup_{i \in S} A_i\bigr|$, the total number of elements covered by the sets in $S$. It is then easy to see that

$$F(x) = \sum_{e \in U} \Bigl( 1 - \prod_{i : e \in A_i} (1 - x_i) \Bigr).$$

Thus, given an explicit representation of the set system, $F$ and $\nabla F$ can be evaluated efficiently and deterministically. (We ignore the numerical issues involved in the computation. One can approximate the quantities of interest with a small additive and multiplicative error via standard tricks.)
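The closed form for coverage functions translates directly into code. In this sketch, `coverage_multilinear` and `coverage_gradient` are hypothetical names; the gradient formula used is the one obtained by differentiating the closed form term by term, namely that the $i$th partial derivative sums, over elements $e \in A_i$, the probability that no other set containing $e$ is chosen.

```python
def coverage_multilinear(sets, universe, x):
    """F(x) = sum over e in U of (1 - prod over sets containing e of (1 - x_i))."""
    total = 0.0
    for e in universe:
        miss = 1.0  # probability that e is not covered
        for i, A in enumerate(sets):
            if e in A:
                miss *= 1.0 - x[i]
        total += 1.0 - miss
    return total

def coverage_gradient(sets, x):
    """dF/dx_i = sum over e in A_i of prod over j != i with e in A_j of (1 - x_j)."""
    grad = [0.0] * len(sets)
    for i, A in enumerate(sets):
        for e in A:
            miss_others = 1.0
            for j, B in enumerate(sets):
                if j != i and e in B:
                    miss_others *= 1.0 - x[j]
            grad[i] += miss_others
    return grad

# Hypothetical example: two sets over the universe {1, 2, 3}.
sets = [{1, 2}, {2, 3}]
universe = {1, 2, 3}
value = coverage_multilinear(sets, universe, [0.5, 0.5])
grad = coverage_gradient(sets, [0.5, 0.5])
```

Because $F$ is linear in each coordinate, the gradient entry for set 0 can be cross-checked as the difference of `coverage_multilinear` at $x_0 = 1$ and $x_0 = 0$.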

Throughout the paper we assume that $\epsilon > 0$ is sufficiently small. We also assume that , since otherwise sequential algorithms already achieve -adaptivity.

## 3 Parallel maximization with a cardinality constraint


We first consider the canonical setting of maximizing the multilinear extension $F$ of a submodular function $f$ subject to a cardinality constraint specified by an integer $k$. The mathematical formulation is below.

$$\text{maximize } F(x) \text{ over } x \in \mathbb{R}^N_{\ge 0} \quad \text{s.t. } \langle \mathbf{1}, x \rangle \le k.$$

This problem was already considered and solved to satisfaction by Balkanski et al. and Ene and Nguyen. The approach given here is different (and simple enough), and is based on the continuous-greedy algorithm of Călinescu et al., specialized to the cardinality constraint polytope. Establishing this connection lays the foundation for general constraints in mwu. That said, there is no formal dependence between mwu and this section. As the bounds presented in this section have been obtained in previous work [6, 21], the reader primarily interested in new results may want to skip ahead to mwu.

We propose the algorithm parallel-greedy, given in parallel-greedy. It is a straightforward parallelization of the original continuous-greedy algorithm due to Călinescu et al., specialized to the cardinality polytope. continuous-greedy is an iterative and monotonic algorithm that, in each iteration, computes the gradient $\nabla F(x)$ and finds the point $y$ in the constraint polytope that maximizes $\langle \nabla F(x), y \rangle$. In the case of the cardinality polytope, $y$ is $k e_j$ for the coordinate $j$ with the largest gradient. continuous-greedy then adds $\delta y$ to $x$ for a fixed and conservative step size $\delta$. The new algorithm parallel-greedy makes two changes to this algorithm. First, rather than increase $x$ along the single best coordinate, we identify all “good” coordinates with gradient values nearly as large as the best coordinate, and increase along all of these coordinates uniformly. Second, rather than increase $x$ along these coordinates by a fixed increment, we choose $\delta$ dynamically. In particular, we greedily choose $\delta$ as large as possible such that, after updating $x$ and thereby decreasing the gradient coordinatewise, the set of good coordinates is still nearly as good on average.

The dynamic choice of $\delta$ accounts for the fact that increasing multiple coordinates simultaneously can affect their gradients. The importance of greedily choosing the step size is to geometrically decrease the number of good coordinates. It is shown below (in pg-depth) that, when the many good coordinates are no longer nearly-good on average, a substantial fraction of these coordinates are no longer good. When there are no nearly-good coordinates remaining, the threshold for “good” decreases. The threshold can decrease only so much before we can conclude that the current solution cannot be improved substantially and obtains the desired approximation ratio. Thus parallel-greedy takes a primal-dual approach equally concerned with maximizing the objective as driving down the gradient.
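The description above can be turned into a small sketch for coverage functions, where $F$ and its gradient have closed forms. This is an illustration under several assumptions that are not taken from the paper: the name `parallel_greedy_sketch`, the initial threshold `k * max_single` (a crude upper bound on the optimum), the stopping rule `lam <= eps * max_single`, the explicit exclusion of saturated coordinates, and the geometric search over step sizes, which is written as a sequential loop although all candidate steps could be evaluated in one round. The thresholds $(1-\epsilon)\lambda/k$ for good coordinates and $(1-\epsilon)^2 \lambda \delta |S| / k$ for the step follow the analysis in this section.

```python
def coverage_F(sets, universe, x):
    """Closed-form multilinear extension of a coverage function."""
    total = 0.0
    for e in universe:
        miss = 1.0
        for i, A in enumerate(sets):
            if e in A:
                miss *= 1.0 - x[i]
        total += 1.0 - miss
    return total

def coverage_grad(sets, x):
    """Closed-form gradient of the multilinear extension of a coverage function."""
    grad = []
    for i, A in enumerate(sets):
        g = 0.0
        for e in A:
            miss = 1.0
            for j, B in enumerate(sets):
                if j != i and e in B:
                    miss *= 1.0 - x[j]
            g += miss
        grad.append(g)
    return grad

def parallel_greedy_sketch(sets, k, eps, tol=1e-9):
    n = len(sets)
    universe = set().union(*sets)
    x = [0.0] * n
    max_single = max(len(A) for A in sets)
    lam = float(k * max_single)  # assumed upper bound on the optimum
    rounds = 0

    def good(x, lam):
        # Good coordinates: unsaturated and with gradient at least (1 - eps) * lam / k.
        grad = coverage_grad(sets, x)
        return [j for j in range(n)
                if x[j] < 1.0 - tol and grad[j] >= (1 - eps) * lam / k]

    while sum(x) < k - tol and lam > eps * max_single:
        S = good(x, lam)
        while S and sum(x) < k - tol:
            # Largest step allowed by the budget and by the upper bounds x_j <= 1.
            delta = min((k - sum(x)) / len(S), min(1.0 - x[j] for j in S))
            base = coverage_F(sets, universe, x)
            while True:
                y = list(x)
                for j in S:
                    y[j] = min(1.0, x[j] + delta)
                gain = coverage_F(sets, universe, y) - base
                # Accept the step if the good coordinates stay nearly good on average.
                if gain >= (1 - eps) ** 2 * lam * delta * len(S) / k or delta < tol:
                    break
                delta *= 1 - eps  # otherwise try the next smaller candidate step
            x = y
            rounds += 1
            S = good(x, lam)
        lam *= 1 - eps  # no good coordinates remain: lower the threshold
    return x, rounds

sets_a = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
x_a, rounds_a = parallel_greedy_sketch(sets_a, 2, 0.1)
sets_b = [{1, 2}, {2, 3}, {3, 4}, {4, 1}]
x_b, rounds_b = parallel_greedy_sketch(sets_b, 2, 0.1)
```

In the second instance all four coordinates start out equally good, so the sketch raises them together and the step size is limited by the drop in the gradient rather than by the budget.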

We first assume oracle access to values $F(x)$ and gradients $\nabla F(x)$. The algorithm and analysis immediately extend to approximate oracles that return relative approximations to these quantities. Such oracles do exist (and are readily parallelizable) for many real submodular functions of interest. Given oracle access to $f$, one can implement sufficiently accurate oracles to $F$ and $\nabla F$ without increasing the depth but with many more oracle calls to $f$. In Section 3.3, we present a randomized discretization of parallel-greedy that improves the oracle complexity w/r/t $f$. Note that the algorithms in [6, 21] call $f$ directly and do not assume oracle access to $F$ or $\nabla F$.

### 3.1 Approximation ratio

We first analyze the approximation ratio of the output solution $x$. The main observation is that $\lambda$ is an upper bound on the gap $\mathrm{OPT} - F(x)$.

Lemma (pg-threshold). At any point, we have $\lambda \ge \mathrm{OPT} - F(x)$.

###### Proof.

The claim holds initially. Whenever $x$ is increased, $\mathrm{OPT} - F(x)$ decreases since $F$ is monotone, and hence the claim continues to hold. Whenever $\lambda$ is about to be decreased in pg-threshold, we have $S$ empty (or the algorithm terminates since $\langle \mathbf{1}, x \rangle = k$) with respect to the current value of $\lambda$. Thus, if $z$ is an optimal solution then we have

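$$\mathrm{OPT} - F(x) \overset{(1)}{\le} F(x \vee z) - F(x) \overset{(2)}{\le} \langle \nabla F(x), z \vee x - x \rangle \overset{(3)}{\le} \langle \nabla F(x), z \rangle \overset{(4)}{\le} (1 - \epsilon) \frac{\lambda}{k} \langle z, \mathbf{1} \rangle \overset{(5)}{\le} (1 - \epsilon) \lambda$$

by (1) monotonicity of $F$, (2) monotone concavity of $F$, (3) monotonicity of $F$ (implying $\nabla F(x) \ge 0$) and $z \vee x - x \le z$, (4) emptiness of $S$ w/r/t $\lambda$, and (5) the fact that $\langle z, \mathbf{1} \rangle \le k$.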

The connection between $\lambda$ and $\mathrm{OPT} - F(x)$ allows us to reinterpret pg-apx-condition as saying that we are closing the objective gap at a good rate in proportion to the increase in the (fractional) cardinality of $x$. This is the basic invariant in standard analyses of the greedy algorithm that implies that greedy achieves a (near) $(1 - 1/e)$-approximation, as follows. The output $x$ satisfies

###### Proof.

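Let $t = \langle \mathbf{1}, x \rangle$ be the total sum of the coordinates. From the preceding lemma and the choice of $\delta$ in the algorithm, we have $\lambda \ge \mathrm{OPT} - F(x)$ in pg-apx-condition, hence

$$\frac{d F(x)}{dt} \ge \frac{(1 - \epsilon)^2}{k} \bigl( \mathrm{OPT} - F(x) \bigr),$$

hence

$$F(x) \ge \Bigl( 1 - \exp\bigl( -(1 - \epsilon)^2 t / k \bigr) \Bigr) \mathrm{OPT}.$$

In particular, if $t = k$ at the end of the algorithm, we have

$$F(x) \ge (1 - O(\epsilon)) \bigl( 1 - e^{-1} \bigr) \mathrm{OPT}.$$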

If , then . In either case, the output satisfies
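The integration step in the proof above can be spelled out as follows. This is a worked sketch under the assumptions that $F(x) \ge 0$ at the start (which holds for a normalized, nonnegative $f$) and that $0 < \epsilon \le 1/2$; the constant 4 is one convenient choice and is not claimed to match the paper's constant.

```latex
% Let G(t) = OPT - F(x(t)) and c = (1-\epsilon)^2 / k.
\begin{align*}
\frac{dG}{dt} \le -c\,G(t)
  &\implies \frac{d}{dt}\bigl(e^{ct}\,G(t)\bigr)
     = e^{ct}\Bigl(\frac{dG}{dt} + c\,G(t)\Bigr) \le 0 \\
  &\implies G(t) \le e^{-ct}\,G(0) \le e^{-ct}\,\mathrm{OPT} \\
  &\implies F(x(t)) \ge \bigl(1 - e^{-(1-\epsilon)^2 t/k}\bigr)\,\mathrm{OPT}.
\end{align*}
% At t = k, using (1-\epsilon)^2 \ge 1 - 2\epsilon and e^{2\epsilon} \le 1 + 4\epsilon for \epsilon \le 1/2:
\begin{align*}
e^{-(1-\epsilon)^2} \le e^{-1+2\epsilon} \le e^{-1}(1 + 4\epsilon)
  \implies
  1 - e^{-(1-\epsilon)^2} \ge \Bigl(1 - \frac{4\epsilon}{e-1}\Bigr)\bigl(1 - e^{-1}\bigr).
\end{align*}
```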

### 3.2 Iteration count

We now analyze the iteration count of parallel-greedy. The key observation lies in line pg-apx-condition. If $\delta$ is determined by line pg-apx-condition, then the margin of taking $S$ uniformly has dropped significantly. In this case, as the next lemma shows, a significant fraction of the coordinates in $S$ must have had their marginal returns decrease enough to force them to drop out of $S$. The iteration can then be charged to the geometric decrease in $|S|$.

Lemma (pg-depth). If then the step pg-update-S decreases $|S|$ by at least a $(1 - \epsilon)$-multiplicative factor. This implies that, for fixed $\lambda$, the loop at pg-inc-loop iterates at most times, and at most times total. That is, each step in iterates at most times.

###### Proof.

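Let $x'$ and $S'$ denote the values of $x$ and $S$ before updating, and let $x''$ and $S''$ denote the values of $x$ and $S$ after. We want to show that $|S''| \le (1 - \epsilon) |S'|$. We have

$$(1 - \epsilon)^2 \frac{\lambda \delta |S'|}{k} \overset{(1)}{=} F(x'') - F(x') \ge \langle \nabla F(x''), \delta S' \rangle \overset{(2)}{\ge} \langle \nabla F(x''), \delta S'' \rangle \overset{(3)}{\ge} (1 - \epsilon) \frac{\lambda \delta |S''|}{k}$$

by (1) choice of $\delta$, (2) monotonicity, and (3) definition of $S''$; the unlabeled inequality follows from monotone concavity of $F$ along the direction $S'$. Dividing both sides by $(1 - \epsilon) \lambda \delta / k$, we have $|S''| \le (1 - \epsilon) |S'|$.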

One implementation detail is finding $\delta$ in the inner loop. We can assume that (since below , the gradient does not change substantially). It is easy to see that a (say) -multiplicative approximation of the exact value of $\delta$ suffices. (A more detailed discussion of approximating $\delta$ in the more subtle setting of generic packing constraints is given later in pmwu-work.) Hence we can try all powers of between and 1 to find a sufficiently good approximation of $\delta$. A second implementation detail regards the initial value of $\lambda$ for upper bounding $\mathrm{OPT}$. Standard tricks allow us to obtain a constant factor without increasing the depth; see the related discussion w/r/t general packing constraints in pmwu-work.

### 3.3 Oracle complexity w/r/t f

cardinality-oracle

The preceding algorithm and analysis were presented under the assumption that gradients of the multilinear extension were easy to compute (at least, approximately). This assumption holds for many applications of interest. In this section, we consider a model where we only have oracle access to the underlying set function .

We first note that and can still be approximated (to sufficient accuracy) by taking the average of for many random samples . To obtain -accuracy with high probability for either or a single coordinate of , one requires about samples, each of which may be computed in parallel (see dmlf-sample). Thus parallel-greedy still has depth in this model. However, the total number of queries to increases to , because computing an entire gradient to assemble in line pg-good-coordinates requires queries to .
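The sampling estimators described above can be sketched in a few lines. The sketch uses the standard identities F(x) = E[f(R)] and ∂F/∂x_j = E[f(R ∪ {j}) − f(R \ {j})], where R contains each element i independently with probability x_i. The function names and the use of Python's `random.Random` are our choices for illustration; the number of samples needed for a given accuracy is left to the caller.

```python
import random
from typing import Callable, FrozenSet, Sequence

SetFn = Callable[[FrozenSet[int]], float]


def sample_set(x: Sequence[float], rng: random.Random) -> FrozenSet[int]:
    """Draw R by including each index j independently with probability x[j]."""
    return frozenset(j for j, p in enumerate(x) if rng.random() < p)


def estimate_F(f: SetFn, x: Sequence[float], num_samples: int,
               rng: random.Random) -> float:
    """Monte Carlo estimate of the multilinear extension F(x) = E[f(R)]."""
    return sum(f(sample_set(x, rng)) for _ in range(num_samples)) / num_samples


def estimate_partial(f: SetFn, x: Sequence[float], j: int, num_samples: int,
                     rng: random.Random) -> float:
    """Monte Carlo estimate of dF/dx_j = E[f(R + j) - f(R - j)]."""
    total = 0.0
    for _ in range(num_samples):
        R = sample_set(x, rng)
        total += f(R | {j}) - f(R - {j})
    return total / num_samples
```

Every sample is an independent oracle query, so all samples for all coordinates can be issued in a single adaptive round. Estimating a full gradient this way costs n times as many queries as estimating one coordinate, which is the source of the increased query count noted above.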

To reduce the oracle complexity w/r/t , we propose the alternative algorithm randomized-parallel-greedy in randomized-parallel-greedy, which is guided by the previous parallel-greedy algorithm, but maintains a discrete set rather than a fractional solution . The primary difference is in steps rpg-sample and rpg-inc, where rather than add the fractional solution to our solution, we first sample a set (where each coordinate in is drawn independently with probability ), and then we add to the running solution. The primary benefit to this rounding step is that computing the gradient is replaced by computing the margins , which requires only a constant number of oracle calls per element.
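The sample-then-add step can be illustrated as follows. This is only a sketch of the rounding idea described above, not the full randomized-parallel-greedy algorithm: the helper names are hypothetical, and `coverage` is a toy monotone submodular function included so the block runs on its own.

```python
import random
from typing import Callable, Dict, FrozenSet, Iterable

SetFn = Callable[[FrozenSet[int]], float]

# Toy coverage instance (an assumption for illustration only).
COVER = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}


def coverage(S: FrozenSet[int]) -> float:
    """Number of ground items covered by the sets indexed by S."""
    return float(len(set().union(*(COVER[j] for j in S))))


def sample_increment(S: Iterable[int], delta: float,
                     rng: random.Random) -> FrozenSet[int]:
    """Keep each element of S independently with probability delta."""
    return frozenset(j for j in sorted(S) if rng.random() < delta)


def add_sampled_set(Q: FrozenSet[int], S: Iterable[int], delta: float,
                    rng: random.Random) -> FrozenSet[int]:
    """Replace the fractional update by adding a random subset of S to Q."""
    return Q | sample_increment(S, delta, rng)


def all_marginals(f: SetFn, Q: FrozenSet[int],
                  candidates: Iterable[int]) -> Dict[int, float]:
    """Margins f(Q + j) - f(Q): one shared call for f(Q), one call per j."""
    base = f(Q)
    return {j: f(Q | {j}) - base for j in candidates}
```

Because the running solution is a discrete set, each margin is an exact difference of two oracle values rather than an expectation that has to be estimated by sampling.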

We defer the analysis of randomized-parallel-greedy to rpg. At a high level, one can see that the key points to the analysis of parallel-greedy now hold in expectation. Further techniques from randomized analysis adapt the essential invariants from parallel-greedy to the additional randomization to obtain the following bounds.

rpg Let be given, let be a normalized, monotone submodular function in the oracle model, and let . Then with high probability, randomized-parallel-greedy computes a multiplicative approximation to the maximum value set of cardinality with expected adaptivity and expected oracle calls to .

## 4 Parallel maximization with packing constraints

mwu We now consider the general problem of maximizing the multilinear relaxation subject to explicit packing constraints of the form below.

 $$\text{maximize } F(x) \text{ over } x \in [0,1]^{N} \text{ s.t. } Ax \le \mathbf{1}.$$

We refer the reader to some preprocessing steps outlined in background. In pmwu, we give a parallel algorithm that combines the many-coordinate update and greedy step size of parallel-greedy with multiplicative weight update techniques that navigate the packing constraints. The high-level MWU framework follows the one from .

We briefly explain the algorithm. The framework from  has a notion of time, maintained in the variable , that goes from to . The algorithm maintains non-negative weights for each constraint that reflect how tight each constraint is. In the sequential setting, the algorithm in  combines continuous-greedy and MWU as follows. In each iteration, given the current vector , it finds a solution to the following linear optimization problem with a single non-trivial constraint obtained via a weighted linear combination of the packing constraints:

 $$\max\ \langle \nabla F(x), y \rangle \ \text{ s.t. } \ \langle A^{T} w, y \rangle \le \langle w, \mathbf{1} \rangle \ \text{ and } \ y \ge \mathbf{0}.$$

The optimum solution to this relaxation is a single coordinate solution where maximizes the ratio . The algorithm then updates by adding for some appropriately small step size and then updates the weights. The weights are maintained according to the MWU rule and (approximately) satisfy the invariant that for .
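To make the single-coordinate structure concrete, the sketch below solves a linear program with one packing constraint by picking the coordinate with the best gain-to-cost ratio. It assumes the ratio in question is the gradient coordinate divided by the weighted column sum (Aᵀw)_j, which is the standard optimum for such an LP. The function name and the dense list-of-rows representation of A are ours, and coordinates with zero cost are skipped for simplicity.

```python
from typing import List, Sequence, Tuple


def best_ratio_coordinate(
    grad: Sequence[float], A: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[int, List[float]]:
    """Solve max <grad, y> s.t. <A^T w, y> <= <w, 1>, y >= 0.

    Returns the chosen coordinate j and the optimal y, which puts the whole
    budget <w, 1> on the coordinate maximizing grad[j] / (A^T w)[j].
    """
    n = len(grad)
    # cost[j] = (A^T w)_j, the weighted load of one unit of coordinate j.
    cost = [sum(w[i] * A[i][j] for i in range(len(A))) for j in range(n)]
    budget = sum(w)
    candidates = [j for j in range(n) if cost[j] > 0]
    best = max(candidates, key=lambda j: grad[j] / cost[j])
    y = [0.0] * n
    y[best] = budget / cost[best]
    return best, y


if __name__ == "__main__":
    j, y = best_ratio_coordinate([3.0, 4.0], [[1.0, 2.0], [1.0, 0.0]], [1.0, 1.0])
    print(j, y)
```

The parallel algorithm relaxes this by updating every coordinate whose ratio is close to the best one, rather than only the single maximizer.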

The parallel version differs from the sequential version as follows. When solving the Lagrangean relaxation it considers all good coordinates (the set ) whose ratios are close to and simultaneously updates them. The step size has to be adjusted to account for this, and the adjusted step size is a primary difference from the algorithm in . The sequential algorithm takes a greedy step for the sake of obtaining width independence. In the parallel setting, two different considerations come into play. First, the simultaneous update to many coordinates means that the step size needs to be small enough such that the gradient does not change too much, but that it does change sufficiently so that we can use an averaging argument to limit the number of iterations. Second, if the gradient is not the bottleneck, then the bottleneck comes from limiting the change in to ensure the weights do not grow too rapidly. In this case, we ensure that each coordinate increases by at least multiplicative factor, which can only happen a limited number of times due to the starting value of .

We organize the formal analysis into four parts. The first part, pmwu-packing, concerns the packing constraints, and shows that the output satisfies . The second part, pmwu-apx, concerns the approximation ratio, and shows that the output has an approximation factor of . The third part, pmwu-iterations, analyzes the number of iterations and shows that each step in pmwu is executed at most times. The last part, pmwu-work, addresses the total number of oracle calls. The lemmas in these parts together prove main-intro.

We first note the monotonicity of the various variables at play. Over the course of the algorithm, is increasing, is increasing, is increasing, is increasing, is increasing, is decreasing, and is decreasing. Within the loop at pmwu-inc-loop, is decreasing.

### 4.1 Feasibility of the packing constraints

pmwu-packing We first show that the algorithm satisfies the packing constraints to within a -factor. The first fact shows that the weights grow at a controlled rate as increases. The basic proof idea, which appears in Young , combines the fact that we increase (some coordinates) of by a small geometric factor, and the fact that is recursively near-feasible. This implies that the increase in load of any constraint is by at most a small additive factor, hence the weights (which exponentiate the loads) increase by at most a small geometric factor.

pmwu-weight-step At the beginning of each iteration of step pmwu-inc, if , then

 $$w\left(x + \delta\gamma (x \wedge S)\right) \le (1+\epsilon)\, w(x).$$
###### Proof.

For each constraint , we have

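 $$\begin{aligned}
 w_i\left(x + \delta\gamma (x \wedge S)\right) &= w_i\left(\delta\gamma (x \wedge S)\right) w_i(x) \overset{(1)}{\le} w_i\left(\frac{\epsilon^2}{4 \log m} (x \wedge S)\right) w_i(x) \\
 &= e^{\frac{\epsilon}{4} (A(x \wedge S))_i}\, w_i(x) \overset{(2)}{\le} e^{\epsilon/2}\, w_i(x) \overset{(3)}{\le} (1+\epsilon)\, w_i(x)
 \end{aligned}$$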

by choice of per pmwu-weight-condition, , and upper bounding the Taylor expansion of . ∎
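The chain of inequalities can be checked numerically on a small instance. The sketch assumes weights of the form w_i(x) = exp((log m/ε)(Ax)_i), which is how we read the update formula recalled in the next proof, and it takes the largest step allowed by the bound δγ ≤ ε²/(4 log m). All function names are ours, and the instance needs m ≥ 2 so that log m > 0.

```python
import math
from typing import List, Sequence, Set


def loads(A: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Row loads (Ax)_i."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def weights(A: Sequence[Sequence[float]], x: Sequence[float],
            eps: float) -> List[float]:
    """Assumed MWU weights w_i(x) = exp((log m / eps) * (Ax)_i)."""
    scale = math.log(len(A)) / eps
    return [math.exp(scale * load) for load in loads(A, x)]


def restricted(x: Sequence[float], S: Set[int]) -> List[float]:
    """The vector x ^ S: x on coordinates in S, zero elsewhere."""
    return [xj if j in S else 0.0 for j, xj in enumerate(x)]


def weight_ratios_after_step(A: Sequence[Sequence[float]], x: Sequence[float],
                             S: Set[int], eps: float) -> List[float]:
    """Ratios w_i(x + step * (x ^ S)) / w_i(x) for step = eps^2 / (4 log m)."""
    step = eps ** 2 / (4.0 * math.log(len(A)))
    x_new = [xj + step * sj for xj, sj in zip(x, restricted(x, S))]
    old, new = weights(A, x, eps), weights(A, x_new, eps)
    return [n / o for n, o in zip(new, old)]
```

Whenever every load (Ax)_i is at most 2, each ratio equals exp((ε/4)(A(x ∧ S))_i), which is at most e^{ε/2} and hence at most 1 + ε for ε in (0, 1].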

pmwu-weight-growth At the beginning of each iteration of step pmwu-inc, if , then

 $$\langle w\left(x + \delta\gamma (x \wedge S)\right), \mathbf{1} \rangle \le \left(1 + \delta (1+\epsilon) \frac{\log m}{\epsilon}\right) \langle w, \mathbf{1} \rangle.$$
###### Proof.

The following is a standard proof from the MWU framework, where the important invariant is preserved by choice of w/r/t pmwu-weight-condition. Let . Define where we recall that

 $$w_i\left(x + \tau\gamma (x \wedge S)\right) = w_i(x) \exp\left(\tau\gamma \frac{\log m}{\epsilon} \left(A (x \wedge S)\right)_i\right).$$

We have

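 $$\begin{aligned}
 \langle w(x'), \mathbf{1} \rangle - \langle w(x), \mathbf{1} \rangle &= \langle \omega(\delta), \mathbf{1} \rangle - \langle \omega(0), \mathbf{1} \rangle = \int_0^{\delta} \frac{d}{d\tau} \langle \omega(\tau), \mathbf{1} \rangle \, d\tau \\
 &= \frac{\gamma \log m}{\epsilon} \int_0^{\delta} \langle A^{T} \omega(\tau), x \wedge S \rangle \, d\tau \overset{(1)}{\le} (1+\epsilon) \frac{\gamma \log m}{\epsilon} \int_0^{\delta} \langle A^{T} w, x \wedge S \rangle \, d\tau \\
 &\overset{(2)}{\le} (1+\epsilon) \frac{\log m}{\epsilon} \int_0^{\delta} \langle w, \mathbf{1} \rangle \, d\tau = \delta (1+\epsilon) \frac{\log m}{\epsilon} \langle w, \mathbf{1} \rangle
 \end{aligned}$$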

by monotonicity of and pmwu-weight-step and choice of . ∎

pmwu-packing The output of the algorithm satisfies .

###### Proof.

We prove a slightly stronger claim; namely, that at each time , one has .

Consider pmwu-weight-growth. So long as , by interpolating (the upper bound on) $\langle w, \mathbf{1} \rangle$ as a continuous function of $t$, we have

 $$\frac{d}{dt} \langle w, \mathbf{1} \rangle \le (1+\epsilon) \frac{\log m}{\epsilon} \langle w, \mathbf{1} \rangle. \tag{1}$$

Initially, when , we have by choice of .

Solving the differential inequality with initial value , we have

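 $$\langle w, \mathbf{1} \rangle \le m^2 \exp\left((1+\epsilon) \frac{\log m}{\epsilon}\, t\right) = \exp\left(\frac{\log m}{\epsilon} \left((1+\epsilon) t + 2\epsilon\right)\right).$$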

for all as long as . In particular, since for each , we have

 $$Ax \le \left((1+\epsilon) t + 2\epsilon\right) \mathbf{1} \le (1 + 3\epsilon) \mathbf{1} \le 2 \cdot \mathbf{1}.$$

By induction on , we have for all . ∎
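The passage from discrete multiplicative steps to the exponential bound, and from the total weight back to the loads, can be checked with a short sketch. It relies only on the inequality 1 + z ≤ e^z and on the assumed weight form w_i = exp((log m/ε)(Ax)_i). The function names and the sample parameters in the usage line are ours.

```python
import math
from typing import Sequence


def growth_rate(m: int, eps: float) -> float:
    """The rate c = (1 + eps) * log(m) / eps from the differential inequality."""
    return (1.0 + eps) * math.log(m) / eps


def compounded_growth(w0: float, steps: Sequence[float], c: float) -> float:
    """Total weight after steps that each multiply it by at most (1 + delta * c)."""
    total = w0
    for delta in steps:
        total *= 1.0 + delta * c
    return total


def exponential_bound(w0: float, steps: Sequence[float], c: float) -> float:
    """The continuous-time upper bound w0 * exp(c * t) with t = sum of steps."""
    return w0 * math.exp(c * sum(steps))


def load_bound(total_weight: float, m: int, eps: float) -> float:
    """Upper bound on every (Ax)_i implied by w_i <= total_weight."""
    return eps * math.log(total_weight) / math.log(m)


if __name__ == "__main__":
    m, eps = 10, 0.1
    c = growth_rate(m, eps)
    # Starting from total weight m^2 and running to time t = 1.
    print(load_bound(m ** 2 * math.exp(c * 1.0), m, eps))
```

Starting from total weight m² and running to time t, the implied load bound is (1 + ε)t + 2ε, matching the displayed bound on Ax.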

### 4.2 Approximation ratio

pmwu-apx

We now analyze the approximation ratio of the output solution . The main observation, similar to pg-threshold for parallel-greedy, is that is an upper bound on the gap .

At all times, .

###### Proof.

The claim holds initially with and . Whenever is increased and is unchanged, increases due to monotonicity of , hence the claim continues to hold. Whenever is about to be decreased in pmwu-threshold, we have empty with respect to the current value of . Thus, letting be an optimal solution, we have

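 $$\begin{aligned}
 \mathrm{OPT} - F(x) &\overset{(1)}{\le} F_x(z) \overset{(2)}{\le} \langle \nabla F(x), z \rangle \overset{(3)}{\le} (1-\epsilon)^3 \lambda \frac{\langle A^{T} w, z \rangle}{W} + \epsilon (1-\epsilon) \lambda \\
 &\overset{(4)}{\le} (1-\epsilon)^3 \lambda \frac{\langle w, \mathbf{1} \rangle}{W} + \epsilon (1-\epsilon) \lambda \overset{(5)}{\le} (1-\epsilon)^2 \lambda + \epsilon (1-\epsilon) \lambda \le (1-\epsilon) \lambda
 \end{aligned}$$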

by monotonicity of , nonnegative concavity, , , and . Thus, after replacing with , we still have . ∎

pmwu-apx The output of the algorithm satisfies

###### Proof.

From the preceding lemma and line pmwu-gradient-condition of the algorithm, we have the following. Suppose changes to with step size . We have

 $$F(x') - F(x) \ge (1-\epsilon)^4 \delta \lambda \ge (1-\epsilon)^4 \delta \left(\mathrm{OPT} - F(x)\right),$$

and increases by . Therefore, , as a function of , increases at a rate such that In particular, since the algorithm terminates with either or , the output satisfies

### 4.3 Iteration count

pmwu-iterations

In this section, we analyze the total number of iterations in parallel-mwu. parallel-mwu consists of two nested loops: an outer loop pmwu-loop, where and are adjusted, and an inner loop pmwu-inc-loop, which increases uniformly along “good” coordinates w/r/t fixed values of and . We first analyze the number of iterations of the outer loop.

In each iteration of the outer loop pmwu-loop except for the last, either decreases by a -multiplicative factor, or increases by a -multiplicative factor. This implies that, since ranges from to , and ranges from to , there are at most iterations of the outer loop.

###### Proof.

The inner loop pmwu-inc-loop terminates when either , , or . If , then decreases in line pmwu-threshold. If , then since was the value of at the beginning of the iteration, we have that increased by a -multiplicative factor. If , then this is the last iteration of pmwu-loop. ∎

We now analyze the number of iterations of the inner loop pmwu-inc-loop for each fixed iteration of the outer loop pmwu-loop. Each iteration of the inner loop, except for possibly the last, fixes based on either pmwu-gradient-condition or pmwu-weight-condition. We first bound the number of times can be chosen based on pmwu-weight-condition. The key idea to the analysis (due to ) is that one can only geometrically increase the coordinates in a small number of times before violating the upper bounds on the coordinates implied by .

pmwu-weight-step-count In each iteration of the outer loop pmwu-loop, is determined by pmwu-weight-condition at most times.

###### Proof.

If is determined by pmwu-weight-condition more than times, consider the coordinate that survives in throughout all of these many iterations. Such a coordinate exists because the set is decreasing throughout the inner loop. The initial value of sets for some , and by pmwu-packing, cannot exceed . Each iteration where is determined by pmwu-weight-condition increases (hence ) by a -multiplicative factor. Applying this multiplicative increase more than times would violate the upper bound on . ∎
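The counting argument in this proof reduces to asking how many times a value can be multiplied by a fixed factor before it exceeds an upper bound. A minimal sketch follows; the starting value, upper bound, and growth factor in the usage lines are hypothetical placeholders, since the exact quantities are not recoverable from the text above.

```python
import math


def max_geometric_increases(x0: float, upper: float, factor: float) -> int:
    """Largest r such that x0 * factor**r <= upper, found by direct iteration.

    Assumes x0 > 0 and factor > 1.
    """
    count = 0
    value = x0
    while value * factor <= upper:
        value *= factor
        count += 1
    return count


def log_bound(x0: float, upper: float, factor: float) -> float:
    """Closed-form bound log(upper / x0) / log(factor) on the same count."""
    return math.log(upper / x0) / math.log(factor)


if __name__ == "__main__":
    # Hypothetical numbers: a coordinate starting at 1/1024 that may never
    # exceed 2 can double at most 11 times.
    print(max_geometric_increases(1.0 / 1024, 2.0, 2.0))
    print(log_bound(0.001, 1.0, 1.1))
```

For a growth factor of the form 1 + c with small c, the bound behaves like log(upper/x0)/c, which is where the logarithmic dependence in the lemma comes from.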

We now analyze the number of times can be chosen based on pmwu-gradient-condition. The following lemma is analogous to pg-depth, but the analysis is more subtle because of (a) the general complexity added by the weights and (b) the fact that the underlying potential function is not monotone.

pmwu-gradient-step For a fixed iteration of the outer loop pmwu-loop, is determined by pmwu-gradient-condition at most times.

###### Proof.

The overall proof is based on the potential function , which is always in the range for nonempty . An important distinction from the potential function of pg-depth is that is not monotone: and are both decreasing, but is increasing. On one hand, the total growth by is bounded above by an -multiplicative factor coordinatewise by our initial choice of , as discussed in pmwu-weight-step-count. On the other hand, we claim that whenever is determined by pmwu-gradient-condition, decreases by a -multiplicative factor. We prove the claim below, but suppose for the moment that this claim holds. Then we have a -multiplicative range for with non-empty , and that the total increase of is bounded above (via the bound on the growth of ) by a -multiplicative factor. It follows that is determined by pmwu-gradient-condition at most times until falls below the lower bound , and is empty.

Now we prove the claim. Let , , and denote the values of , and after the update in step pmwu-update. We want to show that for some constant . We have

 δγ\rip\dmlfx′x′∧S′ \tago≤\parof1+\eps2logmδγ\rip\dmlfx′x∧S′ \tago≤\parof1+\eps2logmδγ\rip\dmlfx′x∧