# Optimal Algorithms for Submodular Maximization with Distributed Constraints

We consider a class of discrete optimization problems that aim to maximize a submodular objective function subject to a distributed partition matroid constraint. More precisely, we consider a networked scenario in which multiple agents choose actions from local strategy sets with the goal of maximizing a submodular objective function defined over the set of all possible actions. Given this distributed setting, we develop Constraint-Distributed Continuous Greedy (CDCG), a message passing algorithm that converges to the tight (1-1/e) approximation factor of the optimum global solution using only local computation and communication. It is known that a sequential greedy algorithm can only achieve a 1/2 multiplicative approximation of the optimal solution for this class of problems in the distributed setting. Our framework relies on lifting the discrete problem to a continuous domain and developing a consensus algorithm that achieves the tight (1-1/e) approximation guarantee of the global discrete solution once a proper rounding scheme is applied. We also offer empirical results from a multi-agent area coverage problem to show that the proposed method significantly outperforms the state-of-the-art sequential greedy method.

## I Introduction

Recently, the need has arisen to design algorithms that distribute decision making among a collection of agents or computing devices. This need has been motivated by problems from statistics, machine learning and robotics. These problems include:

• (Density estimation) What is the best way to estimate a non-parametric density function from a distributed dataset? [1]

• (Data summarization) How should we summarize very large datasets in a distributed manner to facilitate Gaussian process regression? [2]

• (Information acquisition) How should a team of mobile robots move in order to acquire information about an environmental process or reduce uncertainty in a mapping task? [3]

Research toward solving the problems posed in these applications has resulted in a large body of work on topics such as sensing and coverage [4]-[5], [6], and learning and statistics [7]-[8]. Inherent to all of these applications is an underlying optimization problem that can be expressed as

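$$
\begin{aligned}
\text{maximize} \quad & f(S) && \text{(1a)}\\
\text{subject to} \quad & S \subseteq \mathcal{Y},\ S \in \mathcal{I} && \text{(1b)}
\end{aligned}
$$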

where $f: 2^{\mathcal{Y}} \to \mathbb{R}_{\ge 0}$ is a submodular function (i.e., it has a diminishing-returns property), $\mathcal{Y}$ is a finite ground set of all decision variables, and $\mathcal{I} \subseteq 2^{\mathcal{Y}}$ is a family of allowable subsets of $\mathcal{Y}$. In words, the goal of (1) is to pick a set $S$ from the family of allowable subsets $\mathcal{I}$ that maximizes the submodular set function $f$. A wide class of relevant objective functions, such as mutual information and weighted coverage, are submodular; this has motivated a growing body of work on submodular optimization problems [9]-[10].

Intuitively, it is useful to think of the problem in (1) as a distributed $n$-player game. In this game, each player or agent $i$ has a distinct local strategy set $\mathcal{Y}_i$ of actions. The goal of the game is for each agent to choose at most one action from its own strategy set to maximize a problem-specific notion of reward. The problem is therefore distributed in the sense that agents can only form a control policy with the actions from their local, distinct strategy sets. To maximize reward, agents are allowed to communicate with their direct neighbors in a bidirectional communication graph. For example, we might think of these agents as robots that collectively aim to solve a coverage problem in an unknown environment by communicating their sensing actions to their nearest neighbors. Throughout this work, we will refer to this multi-agent game example to elucidate our results.

In this paper, our aim is to study problem (1) in a distributed setting, which we will formally introduce in Section III; this setting differs considerably from the centralized setting, which has been studied thoroughly in past work (see [11]). Notably, the distributed setting poses a more challenging problem because agents can only communicate locally with respect to a communication graph. Therefore, designing an efficient communication scheme among agents is a concomitant requirement in the distributed setting, whereas the centralized setting has no such requirement.

### I-A Related work

The optimization problem in (1) has previously been studied in settings that differ significantly from the setting studied in this paper. In particular, [11] addresses this problem in a centralized setting and shows that a centralized algorithm can obtain the tight $(1-1/e)$ approximation of the optimal solution. In this sense, [11] is perhaps the closest to this paper, in that both manuscripts introduce algorithms that obtain the tight $(1-1/e)$ guarantee for problem (1) in a particular setting. However, the setting of [11] is inherently centralized, whereas our setting is distributed.

Another similar line of work concerns the so-called “master-worker” model [12]-[13]. In this framework, agents solve a distributed optimization problem such as (1) by exchanging local information with a centralized master node. This setting also differs from the one studied in this work, since our results assume an entirely distributed setting with no centralized node.

Fundamentally, the optimization problem posed in (1) is NP-hard. However, (1) can be solved approximately by greedy algorithms with provable guarantees [14]-[15]. In the distributed context, the sequential greedy algorithm (SGA) has been rigorously studied in [16]. This work poses (1) as a communication problem among agents distributed in a directed acyclic graph (DAG) working to optimize a global objective function. The authors of [16] offer upper and lower bounds on the performance of SGA based on the clique number of the underlying DAG. Building on this, [17] analyzes the communication redundancy in such an approach and proposes a distributed planning technique that randomly partitions the agents in the DAG. On the other hand, [18] extends the work of [16] to a sequential setting in which agents have limited access to the prior decisions of other agents. Extensions of SGA such as the distributed SGA (DSGA) have also been proposed. In particular, [19] poses (1) as a multi-robot exploration problem and uses DSGA to quantify the suboptimality incurred by redundant sensing information.

Others have proposed novel algorithms with the goal of avoiding the communication overhead incurred by deploying SGA for a large number of agents. Instead of explicitly solving (1), many of these algorithms seek to solve a continuous relaxation of this problem [20], [21]. This continualization of the problem in (1) was originally introduced in [11]. In particular, [9] proposes several gradient ascent-style algorithms for solving a problem akin to (1) in which each agent has access to a local objective function. Similarly, novel algorithms have been developed for solving problems such as unconstrained submodular maximization [22] and submodular maximization with matroid constraints [11], [23] by first lifting these problems to the continuous domain.

Another notable direction in solving problem (1) has been to define an auxiliary or surrogate function in place of the original submodular objective. For instance, [24] introduces a distributed algorithm for maximizing a submodular auxiliary function subject to matroid constraints that obtains the optimal approximation. This approach of defining surrogate functions in place of the submodular objective differs significantly from our approach.

### I-B Contributions

In this paper, we formulate the general case of maximizing a submodular function subject to a distributed partition matroid constraint in Problem 1. We then formulate the continuous relaxation of this problem via the multilinear extension in Problem 2. Both problems are formally defined in Section III. In particular, we study the special case of this optimization problem in which each agent can compute the global objective function and its gradient, but only has access to a local, distinct set of actions. Under these constraints, we develop Constraint-Distributed Continuous Greedy (CDCG), a novel algorithm for solving the continuous relaxation of the distributed submodular optimization problem that achieves the tight $(1-1/e)$ approximation of the optimal solution, which is known to be the best possible approximation unless $\mathrm{P} = \mathrm{NP}$. We analyze the proposed algorithm and prove that it achieves the tight $(1-1/e)$ approximation and that its error term vanishes at a rate of $O(1/T)$, where $T$ is the number of iterations.

Previous work on the distributed version of this problem can approximate the optimal solution to within a multiplicative factor of $1/2$ via SGA or DSGA [16], [17], [11]. Algorithms for slightly different settings, such as the setting of [9] in which each node has access to a local objective function and the global objective is the average of these local functions, can also achieve the $(1-1/e)$ approximation. Similarly, [11] shows that it is possible to achieve the optimal $(1-1/e)$ approximation in the centralized setting. However, to the best of our knowledge, the CDCG algorithm presented in this paper is the first algorithm that is guaranteed to achieve the $(1-1/e)$ approximation of the optimal solution in this distributed setting. The proofs of all lemmas and theorems in this work will be included in the arXiv version of this paper.

## II Preliminaries

We review the notation used throughout this paper in Section II-A and state definitions in Section II-B that are necessary for the problem formulations in Section III.

### II-A Notation

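Throughout this paper, lowercase bold-face letters (e.g. $\mathbf{x}$) will denote vectors, while uppercase bold-face letters (e.g. $\mathbf{W}$) will denote matrices. The $i$th component of a vector $\mathbf{x}$ will be denoted $x_i$; the element in the $i$th row and $j$th column of a matrix $\mathbf{W}$ will be denoted by $w_{ij}$. The inner product between two vectors $\mathbf{x}$ and $\mathbf{y}$ will be denoted by $\langle \mathbf{x}, \mathbf{y} \rangle$, and the Euclidean norm of a vector $\mathbf{x}$ will be denoted by $\|\mathbf{x}\|$. Given two vectors $\mathbf{x}$ and $\mathbf{y}$, we define $\mathbf{x} \vee \mathbf{y}$ as the (vector-valued) component-wise maximum between $\mathbf{x}$ and $\mathbf{y}$; similarly, $\mathbf{x} \wedge \mathbf{y}$ will denote the component-wise minimum between $\mathbf{x}$ and $\mathbf{y}$. We will use the notation $\mathbf{0}_n$ to denote an $n$-dimensional vector in which each component is zero; similarly, $\mathbf{1}_n$ will denote an $n$-dimensional vector in which each component is one. Calligraphic fonts will denote sets (e.g. $\mathcal{A}$). Given a set $\mathcal{A}$, $|\mathcal{A}|$ will denote the cardinality of $\mathcal{A}$, while $2^{\mathcal{A}}$ will denote the power set of $\mathcal{A}$. $\mathbf{1}_{\mathcal{A}}$ will represent the indicator function for the set $\mathcal{A}$. That is, $\mathbf{1}_{\mathcal{A}}$ is the function that takes value one if its argument is an element of $\mathcal{A}$ and takes value zero otherwise.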

### II-B Background

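Let $\mathcal{Y}$ be a finite set and let $f: 2^{\mathcal{Y}} \to \mathbb{R}_{\ge 0}$ be a set function mapping subsets of $\mathcal{Y}$ to the nonnegative real line. In this setting, $\mathcal{Y}$ is commonly referred to as the ground set. The function $f$ is called submodular if, for every $A, B \subseteq \mathcal{Y}$,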

 f(A∩B)+f(A∪B)≤f(A)+f(B).

In essence, submodularity amounts to a diminishing-returns property: the incremental value of adding a single element to the argument of $f$ is no less than that of adding the same element to a superset of the argument. To illustrate this, we slightly overload our notation by defining

 f(x|A):=f(A∪{x})−f(A)

as the marginal reward of $x$ given $A$. This gives rise to an equivalent definition of submodularity. In particular, $f$ is said to be submodular if, for every $A \subseteq B \subseteq \mathcal{Y}$ and $x \in \mathcal{Y} \setminus B$,

 f(x|B)≤f(x|A).

Throughout this paper, we will consider submodular functions that are also monotone, meaning that $f(A) \le f(B)$ for every $A \subseteq B \subseteq \mathcal{Y}$, and normalized, meaning that $f(\emptyset) = 0$.
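To make the diminishing-returns property concrete, the following Python sketch brute-forces the definitions above on a tiny coverage function. The instance `COVERS` and the helper names are hypothetical and chosen only for illustration; the exhaustive check is exponential in $|\mathcal{Y}|$ and is meant only for toy ground sets.

```python
from itertools import combinations

# Hypothetical toy instance (not from the paper): each action covers some points.
COVERS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}


def f(S):
    """Coverage objective: number of distinct points covered by the actions in S."""
    covered = set()
    for x in S:
        covered |= COVERS[x]
    return len(covered)


def marginal(x, A):
    """Marginal reward f(x | A) = f(A ∪ {x}) - f(A)."""
    A = set(A)
    return f(A | {x}) - f(A)


def subsets(ground):
    """All subsets of `ground` as Python sets (2^|ground| of them)."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)


def check_monotone_submodular(ground):
    """Brute-force check of f(∅) = 0, monotonicity, and diminishing returns."""
    if f(set()) != 0:
        return False
    for B in subsets(ground):
        for A in subsets(B):  # every A ⊆ B
            if f(A) > f(B):  # monotonicity violated
                return False
            for x in set(ground) - B:
                if marginal(x, B) > marginal(x, A):  # diminishing returns violated
                    return False
    return True


print(marginal("b", set()), marginal("b", {"a"}), marginal("b", {"a", "c"}))  # 2 1 0
print(check_monotone_submodular(COVERS))  # True
```

The marginal value of action `b` shrinks from 2 to 1 to 0 as the conditioning set grows, which is exactly the inequality $f(x \mid B) \le f(x \mid A)$ for $A \subseteq B$.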

In practice, one often encounters a constraint on the allowable subsets of the ground set when maximizing a submodular objective function. Concretely, if $\mathcal{I} \subseteq 2^{\mathcal{Y}}$ is a nonempty family of allowable subsets of the ground set $\mathcal{Y}$, then the tuple $(\mathcal{Y}, \mathcal{I})$ is a matroid if the following criteria are satisfied:

1. (Heredity) For any $A \subseteq B \subseteq \mathcal{Y}$, if $B \in \mathcal{I}$, then $A \in \mathcal{I}$.

2. (Augmentation) For any $A, B \in \mathcal{I}$, if $|A| < |B|$, then there exists an $x \in B \setminus A$ such that $A \cup \{x\} \in \mathcal{I}$.

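Furthermore, if $\mathcal{Y}$ is partitioned into disjoint sets $\mathcal{Y}_1, \dots, \mathcal{Y}_n$, then $(\mathcal{Y}, \mathcal{I})$ is a partition matroid if there exist positive integers $\alpha_1, \dots, \alpha_n$ such that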

 I≡{A:A⊆Y,|A∩Yi|≤αi∀i=1,…,n}.

Partition matroids are particularly useful when defining the constraints of a distributed optimization problem because they can describe a setting in which the ground set $\mathcal{Y}$ of all possible actions is the disjoint union of local action spaces $\mathcal{Y}_1, \dots, \mathcal{Y}_n$, one per agent.
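As a small illustration of the partition matroid definition, the following Python sketch tests whether a set satisfies $|S \cap \mathcal{Y}_i| \le \alpha_i$ for every block. The dictionaries `block_of` and `capacity` and the action names are hypothetical; setting every $\alpha_i = 1$ gives the one-action-per-agent constraint used in Problem 1.

```python
from collections import Counter


def is_independent(S, block_of, capacity):
    """Partition-matroid membership: |S ∩ Y_i| <= alpha_i for every block i.

    block_of maps each ground-set element to the index i of its block Y_i;
    capacity maps each block index i to alpha_i. Both names are illustrative.
    """
    counts = Counter(block_of[x] for x in S)
    return all(counts[i] <= capacity[i] for i in counts)


# Two agents with disjoint local strategy sets; each may choose at most one action.
block_of = {"up_1": 1, "down_1": 1, "up_2": 2, "left_2": 2}
capacity = {1: 1, 2: 1}

print(is_independent({"up_1", "left_2"}, block_of, capacity))  # True
print(is_independent({"up_1", "down_1"}, block_of, capacity))  # False
print(is_independent(set(), block_of, capacity))               # True
```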

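The notion of submodularity can be extended to the continuous domain [25]. Consider a set $\mathcal{X} = \prod_{i=1}^{d} \mathcal{X}_i$, where each $\mathcal{X}_i$ is a compact subset of $\mathbb{R}$ for $i = 1, \dots, d$. We call a continuous function $F: \mathcal{X} \to \mathbb{R}$ submodular if, for all $\mathbf{x}, \mathbf{y} \in \mathcal{X}$,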

 F(x∨y)+F(x∧y)≤F(x)+F(y).

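As in the discrete case, we say that a continuous function $F$ is monotone if, for all $\mathbf{x}, \mathbf{y} \in \mathcal{X}$, $\mathbf{x} \le \mathbf{y}$ implies that $F(\mathbf{x}) \le F(\mathbf{y})$. Furthermore, if $F$ is differentiable, we say that $F$ is DR-submodular, where DR stands for “diminishing returns,” if its gradient is antitone. That is, $F$ is DR-submodular if, for all $\mathbf{x}, \mathbf{y} \in \mathcal{X}$, $\mathbf{x} \le \mathbf{y}$ implies that $\nabla F(\mathbf{x}) \ge \nabla F(\mathbf{y})$.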

## III Problem Statement

Given the aforementioned applications which emphasize the utility of maximizing submodular functions subject to distributed partition matroid constraints, we formulate the main problem of this paper:

###### Problem 1 (Submodular Maximization Subject to a Distributed Partition Matroid Constraint).

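Consider a collection of $n$ agents that form the set $\mathcal{V} = \{1, \dots, n\}$. Let $f: 2^{\mathcal{Y}} \to \mathbb{R}_{\ge 0}$ be a normalized and monotone submodular function and let $\mathcal{Y}_1, \dots, \mathcal{Y}_n$ be a pairwise disjoint partition of a finite ground set $\mathcal{Y}$, wherein each agent $i \in \mathcal{V}$ can only choose actions from its local strategy set $\mathcal{Y}_i$. Consider the partition matroid $(\mathcal{Y}, \mathcal{I})$, where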

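$$\mathcal{I} = \left\{ S \subseteq \mathcal{Y} : |S \cap \mathcal{Y}_i| \le 1 \ \ \forall i = 1, \dots, n \right\} \tag{2}$$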

The problem of submodular maximization subject to a distributed partition matroid constraint is to select a set $S$ from the family of allowable subsets $\mathcal{I}$ that maximizes $f$. Formally:

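$$
\begin{aligned}
\text{maximize} \quad & f(S) && \text{(3a)}\\
\text{subject to} \quad & S \in \mathcal{I} && \text{(3b)}
\end{aligned}
$$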

In effect, the distributed partition matroid constraint in Problem 1 enforces that each agent $i$ can choose at most one action, and only from its own local strategy set $\mathcal{Y}_i$. Therefore, this problem is distributed in the sense that agents can only determine the actions taken by other agents by directly communicating with one another.

### III-A Sequential greedy algorithm

It is well known that the sequential greedy algorithm (SGA), in which each agent chooses an action sequentially based on

$$y_i = \operatorname*{arg\,max}_{y \in \mathcal{Y}_i} f(y \mid S_{i-1}) \tag{4}$$

where $S_0 = \emptyset$ and $S_i = S_{i-1} \cup \{y_i\}$, approximates the optimal solution to within a multiplicative factor of $1/2$ [16]. The drawbacks of this algorithm are twofold. First, as we will show, it is possible to achieve the tighter $(1-1/e)$ approximation of the optimal solution. Second, as its name suggests, SGA is sequential in nature and therefore scales poorly with the number of agents: each agent must wait for all of the previous agents to compute their contributions to the solution set $S_n$.
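The sequential rule (4) can be sketched in a few lines of Python. The coverage instance below is hypothetical and is chosen so that SGA is strictly suboptimal: agent 1 greedily takes the action that agent 2 then duplicates. Here SGA reaches 2/3 of the optimum, which is consistent with (but not as bad as) the worst-case factor of $1/2$. The brute-force optimum enumerates exactly one action per agent, which loses nothing because $f$ is monotone.

```python
from itertools import product

# Hypothetical instance: each action covers the points listed in COVERS.
COVERS = {"a": {1, 2}, "b": {3}, "c": {1, 2}}
STRATEGY_SETS = [["a", "b"], ["c"]]  # Y_1 and Y_2: disjoint local strategy sets


def f(S):
    """Coverage objective: number of distinct points covered by the actions in S."""
    covered = set()
    for y in S:
        covered |= COVERS[y]
    return len(covered)


def sequential_greedy(strategy_sets):
    """SGA: y_i = argmax_{y in Y_i} f(y | S_{i-1}) and S_i = S_{i-1} ∪ {y_i}, in agent order."""
    S = []
    for Y_i in strategy_sets:  # agent i must wait for agents 1, ..., i-1
        y_i = max(Y_i, key=lambda y: f(S + [y]) - f(S))
        S.append(y_i)
    return S


def brute_force(strategy_sets):
    """Exhaustive optimum over one action per agent (sufficient because f is monotone)."""
    return list(max(product(*strategy_sets), key=f))


greedy = sequential_greedy(STRATEGY_SETS)
best = brute_force(STRATEGY_SETS)
print(greedy, f(greedy))  # ['a', 'c'] 2
print(best, f(best))      # ['b', 'c'] 3
```

The loop over `strategy_sets` makes the sequential dependency explicit: agent $i$ cannot evaluate its marginal gains until $S_{i-1}$ is known.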

### III-B Continuous Extension of Problem 1

Sequential algorithms such as SGA are only guaranteed to achieve a $1/2$ approximation of the optimal solution. To achieve the best possible $(1-1/e)$ approximation, we extend Problem 1 to the continuous domain via the so-called multilinear extension of the submodular objective function [14]. Thus, the method we use in this work to achieve the tight approximation relies on the continualization of Problem 1. Importantly, it has been shown that Problem 1 and the optimization problem obtained by lifting Problem 1 to the continuous domain via this multilinear extension have the same optimal value [11]. Furthermore, by applying proper rounding techniques, such as those described in Section 5.1 of [9] and in [11] and [26], to a solution of the continuous relaxation of Problem 1, one can obtain the tight $(1-1/e)$ approximation for Problem 1.

Therefore, our approach in this paper will be to lift Problem 1 to the continuous domain. We formulate this problem in the following way:

###### Problem 2 (Continuous Relaxation of Problem 1).

Consider the conditions of Problem 1. Define the DR-submodular multilinear extension $F: [0,1]^{|\mathcal{Y}|} \to \mathbb{R}_{\ge 0}$ of the objective function $f$ in Problem 1 by

$$F(\mathbf{y}) := \sum_{S \subseteq \mathcal{Y}} f(S) \prod_{i \in S} y_i \prod_{j \notin S} (1 - y_j) \tag{5}$$

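and let $\mathcal{P}$ be the matroid polytope

$$\mathcal{P} := \operatorname{conv}\{\mathbf{1}_S : S \in \mathcal{I}\},$$

where $\mathcal{I}$ is the family of sets defined in (2). The continuous relaxation of Problem 1 is formally defined by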

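$$
\begin{aligned}
\text{maximize} \quad & F(\mathbf{y}) && \text{(6a)}\\
\text{subject to} \quad & \mathbf{y} \in \mathcal{P} && \text{(6b)}
\end{aligned}
$$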

Problem 2 is distributed in the sense that each agent $i$ is associated with its own distinct continuous strategy space $\mathcal{P}_i$. Formally, the set $\mathcal{P}_i$ is defined as

 Pi:=conv{1S:S⊆Ii} (7)

where . In this way, . These sets play similar roles in Problem 2 as the sets do in Problem 1.

Note that Problem 2 is nonconvex, so classical convex optimization solvers and algorithms do not apply directly. Further, we assume that each agent can compute the multilinear extension $F$ of the submodular objective function in (3a), as well as its gradient $\nabla F$.
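For intuition, the following Python sketch evaluates the multilinear extension (5) exactly by enumerating all subsets, computes its gradient, and compares against a sampling estimate of $F(\mathbf{y}) = \mathbb{E}[f(R)]$, where $R$ contains each element $j$ independently with probability $y_j$. The coverage instance is hypothetical. Exact enumeration costs $O(2^{|\mathcal{Y}|})$ evaluations of $f$, so it is only usable for toy ground sets. The gradient uses the fact that $F$ is affine in each coordinate, so $\partial F / \partial y_j = F(\mathbf{y}; y_j = 1) - F(\mathbf{y}; y_j = 0)$.

```python
import itertools
import random

# Hypothetical ground set {0, 1, 2}; action j covers the points in COVERS[j].
COVERS = [{1, 2}, {2, 3}, {3}]


def f(S):
    """Coverage objective on a collection of action indices."""
    covered = set()
    for j in S:
        covered |= COVERS[j]
    return len(covered)


def multilinear(y):
    """Exact F(y) from (5): sum over all 2^|Y| subsets, so only for tiny ground sets."""
    total = 0.0
    for mask in itertools.product([0, 1], repeat=len(y)):
        prob = 1.0
        for y_j, included in zip(y, mask):
            prob *= y_j if included else 1.0 - y_j
        total += prob * f([j for j, included in enumerate(mask) if included])
    return total


def gradient(y):
    """F is affine in each coordinate, so dF/dy_j = F(y; y_j = 1) - F(y; y_j = 0)."""
    grad = []
    for j in range(len(y)):
        up, down = list(y), list(y)
        up[j], down[j] = 1.0, 0.0
        grad.append(multilinear(up) - multilinear(down))
    return grad


def sampled(y, num_samples=20000, seed=0):
    """Monte Carlo estimate of F(y) = E[f(R)], where R contains j independently w.p. y_j."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_samples):
        total += f([j for j, y_j in enumerate(y) if rng.random() < y_j])
    return total / num_samples


y = [0.5, 0.5, 0.0]
print(multilinear(y))  # 1.75
print(gradient(y))     # [1.5, 1.5, 0.5]
```

One can also use `gradient` to spot-check DR-submodularity: for componentwise $\mathbf{x} \le \mathbf{y}$, every entry of `gradient(x)` should be at least the corresponding entry of `gradient(y)`.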

## IV Constraint-Distributed Continuous Greedy

In this section, we present Constraint-Distributed Continuous Greedy (CDCG), a decentralized algorithm for solving Problem 2. The pseudo-code of CDCG is described in Algorithm 1. At a high level, this algorithm involves updating each agent’s local decision variable based on the aggregated belief of a small group of other agents about the best control policy. In essence, inter-agent communication within small groups of agents facilitates local decision making.

For clarity, we introduce a simple framework for the inter-agent communication structure. In CDCG, agents share their decision variables with a small subset of local agents in $\mathcal{V}$. To encode the notion of locality, suppose that each agent is a node in a bidirectional communication graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, in which $\mathcal{E}$ denotes the set of edges. Given this structure, we assume that each agent can only communicate its decision variable to its direct neighbors in $\mathcal{G}$. Let us denote the neighbor set of agent $i$ by $\mathcal{N}_i$. Then the set of edges can be written $\mathcal{E} = \{(i, j) : i \in \mathcal{V},\ j \in \mathcal{N}_i\}$. We adopt this notation for the remainder of this paper.

### IV-A Intuition for CDCG algorithm

The goal of CDCG at a given node $i$ is to learn the local decision variable $\mathbf{y}_i^T$. CDCG is run at each node to assemble the collection $\{\mathbf{y}_i^T\}_{i=1}^{n}$, where $T$ is a given positive integer; this collection represents an approximate solution to Problem 2 and guarantees that each agent contributes at most one element to the solution. Then, by applying proper rounding techniques to each element of the collection, such as those discussed in [9], [11], and [26], we obtain a solution to Problem 1. In the following sections, we show that this solution achieves the tight $(1-1/e)$ approximation of the optimal solution.

In the analysis of CDCG, we add the superscript $t$ to the vectors $\mathbf{y}_i$ and $\mathbf{v}_i$ defined in Algorithm 1. This superscript denotes the iteration number, so that $\mathbf{y}_i^t$ and $\mathbf{v}_i^t$ represent the values of the local variables $\mathbf{y}_i$ and $\mathbf{v}_i$ at iteration $t$, respectively.

### IV-B Description of steps for CDCG

From the perspective of node $i$, CDCG takes two arguments: nonnegative weights $w_{ij}$ for each $j \in \mathcal{N}_i \cup \{i\}$ and a positive integer $T$. The weights correspond to the $i$th row of a doubly stochastic weight matrix $\mathbf{W}$, and $T$ is the number of iterations for which the algorithm will run. The weight matrix $\mathbf{W}$ is a design parameter of the problem and must fulfill a number of technical requirements that are fully described in Appendix A (see also [9]). Before any computation, the local decision variable $\mathbf{y}_i^0$ is initialized to the zero vector.

Computation proceeds in rounds. In each round $t$, the first step is to compute $\nabla F(\mathbf{y}_i^{t-1})$, the gradient of the multilinear extension evaluated at the local decision variable from the previous iteration.

In line 3 of Algorithm 1, we calculate the ascent direction $\mathbf{v}_i^t$ at iteration $t$ in the following way:

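$$\mathbf{v}_i^t = \operatorname*{arg\,max}_{\mathbf{x} \in \mathcal{P}_i \cap \mathcal{C}_i} \left\langle \nabla F(\mathbf{y}_i^{t-1}), \mathbf{x} \right\rangle.$$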

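Intuitively, one can think of $\mathbf{v}_i^t$ as the vector from the set $\mathcal{P}_i \cap \mathcal{C}_i$ that is most aligned with $\nabla F(\mathbf{y}_i^{t-1})$. To define the set $\mathcal{C}_i$, first define $\mathcal{J}_i$ as the set of indices of the elements in $\mathcal{Y}$ that correspond to elements in $\mathcal{Y}_i$. Then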

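$$\mathcal{C}_i := \left\{\mathbf{x} \in \mathbb{R}_+^{|\mathcal{Y}|} : x_j = 0 \ \ \forall j \notin \mathcal{J}_i\right\}. \tag{8}$$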

Using this notation, we can equivalently define $\mathcal{P}_i$ by

$$\mathcal{P}_i = \Big\{\mathbf{x} \in \mathbb{R}_+^{|\mathcal{Y}|} : \sum_{j \in \mathcal{J}_i} x_j \le 1\Big\}.$$
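Because $\mathcal{P}_i \cap \mathcal{C}_i$ is the convex hull of the origin and the unit vectors $\mathbf{e}_j$, $j \in \mathcal{J}_i$, the linear maximization that defines $\mathbf{v}_i^t$ has a closed-form solution. The Python sketch below illustrates this; the function name and the list-based vector representation are assumptions made for illustration.

```python
def local_ascent_direction(grad, J_i):
    """v_i = argmax over P_i ∩ C_i of <grad, x>.

    P_i ∩ C_i = {x >= 0 : x_j = 0 for j not in J_i, sum_{j in J_i} x_j <= 1} has the
    zero vector and the unit vectors e_j (j in J_i) as its vertices, so a linear
    objective is maximized at one of them: put all mass on the best coordinate in
    J_i if its gradient entry is positive, otherwise return the zero vector.
    """
    v = [0.0] * len(grad)
    best = max(J_i, key=lambda j: grad[j])
    if grad[best] > 0:
        v[best] = 1.0
    return v


# Coordinate 3 has the largest gradient entry but does not belong to agent i.
print(local_ascent_direction([0.3, 1.2, 0.7, 2.0], J_i=[0, 1, 2]))  # [0.0, 1.0, 0.0, 0.0]
print(local_ascent_direction([-0.1, -0.2, 5.0], J_i=[0, 1]))        # [0.0, 0.0, 0.0]
```

For the multilinear extension of a monotone $f$ the gradient is nonnegative, so in practice each agent simply moves toward its locally most valuable action.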

Next, in line 4 of Algorithm 1, the local variable $\mathbf{y}_i^t$ is updated; in particular, we set

$$\mathbf{y}_i^t = \sum_{j \in \mathcal{N}_i \cup \{i\}} w_{ij}\, \mathbf{y}_j^{t-1} + \frac{n}{T}\, \mathbf{v}_i^t.$$

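In this way, each agent first aggregates its neighbors' beliefs about the optimal decision and then moves from this aggregated point along its local ascent direction. Because $\mathbf{W}$ is doubly stochastic, the network average $\bar{\mathbf{y}}^t = \frac{1}{n}\sum_{i=1}^{n} \mathbf{y}_i^t$ evolves as $\bar{\mathbf{y}}^t = \bar{\mathbf{y}}^{t-1} + \frac{1}{T}\sum_{i=1}^{n} \mathbf{v}_i^t$; since the directions $\mathbf{v}_i^t$ are supported on disjoint blocks, $\sum_{i} \mathbf{v}_i^t \in \mathcal{P}$, and hence $\bar{\mathbf{y}}^T \in \mathcal{P}$.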
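The following Python sketch simulates the CDCG update synchronously on a hypothetical three-action instance with two agents, using exact (exponential-time) multilinear gradients and a fully connected graph with uniform weights $1/n$. It illustrates the update rule as written above and is not the authors' implementation; the instance, $T = 50$, and the helper names are assumptions. The network average of the local iterates stays in $\mathcal{P}$, while each local iterate is only close to it.

```python
import itertools

# Hypothetical instance: agent 0 owns actions {0, 1}, agent 1 owns action {2}.
COVERS = [{1, 2}, {3}, {1, 2}]
BLOCKS = [[0, 1], [2]]  # index sets J_i


def f(S):
    covered = set()
    for j in S:
        covered |= COVERS[j]
    return len(covered)


def multilinear(y):
    """Exact multilinear extension (5); exponential in len(y), fine for a toy example."""
    total = 0.0
    for mask in itertools.product([0, 1], repeat=len(y)):
        prob = 1.0
        for y_j, included in zip(y, mask):
            prob *= y_j if included else 1.0 - y_j
        total += prob * f([j for j, included in enumerate(mask) if included])
    return total


def gradient(y):
    grad = []
    for j in range(len(y)):
        up, down = list(y), list(y)
        up[j], down[j] = 1.0, 0.0
        grad.append(multilinear(up) - multilinear(down))
    return grad


def ascent_direction(grad, J_i):
    """Linear maximization over P_i ∩ C_i (a unit vector e_j with j in J_i, or zero)."""
    v = [0.0] * len(grad)
    best = max(J_i, key=lambda j: grad[j])
    if grad[best] > 0:
        v[best] = 1.0
    return v


def cdcg(W, blocks, dim, T):
    """Synchronous simulation of T CDCG rounds at all n agents.

    y_i^t = sum_j W[i][j] * y_j^{t-1} + (n / T) * v_i^t, where W[i][j] = 0 for
    non-neighbors, so the sum effectively runs over N_i ∪ {i}.
    """
    n = len(blocks)
    y = [[0.0] * dim for _ in range(n)]  # y_i^0 = 0
    for _ in range(T):
        # Each agent uses only its own previous iterate to pick a direction.
        v = [ascent_direction(gradient(y[i]), blocks[i]) for i in range(n)]
        y = [
            [sum(W[i][j] * y[j][k] for j in range(n)) + (n / T) * v[i][k] for k in range(dim)]
            for i in range(n)
        ]
    return y


W = [[0.5, 0.5], [0.5, 0.5]]  # fully connected graph with uniform weights 1/n
local = cdcg(W, BLOCKS, dim=3, T=50)
average = [sum(col) / len(local) for col in zip(*local)]
print([round(value, 3) for value in map(multilinear, local)], round(multilinear(average), 3))
```

On this instance SGA (with agent 0 going first) obtains a value of 2 while the optimum is 3; the printed multilinear values can be compared with these numbers, keeping in mind that they are fractional values before any rounding.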

After $T$ rounds of computation, we obtain a local decision variable $\mathbf{y}_i^T$ at each node $i$. By applying proper rounding techniques, we obtain a decision variable $y_i \in \mathcal{Y}_i$ for each agent $i$. Rounding in a decentralized manner is discussed in Section 5.1 of [9]. The rounding techniques of [9] build on “pipage rounding” [11] and “swap rounding” [26], which are both centralized rounding techniques. The collection of these decision variables forms the set $S = \{y_1, \dots, y_n\}$, which represents our solution to Problem 1.
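For the partition matroid in (2), where each agent picks at most one action, one simple decentralized scheme is for every agent to sample a single action from its own block, choosing $j \in \mathcal{J}_i$ with probability $y_j$ and no action otherwise. By submodularity this does not lose value in expectation relative to $F(\mathbf{y})$: conditioned on the other agents' choices $S$, agent $i$'s expected gain is $\sum_{j \in \mathcal{J}_i} y_j f(j \mid S)$, which upper-bounds the expected gain from including each of its actions independently. The Python sketch below is offered only as an illustration under these assumptions; it is not necessarily the rounding scheme of [9], [11], or [26], and the instance is hypothetical.

```python
import random

# Hypothetical instance: agent 0 owns actions {0, 1}, agent 1 owns action {2};
# action j covers the points in COVERS[j].
COVERS = [{1, 2}, {3}, {1, 2}]
BLOCKS = [[0, 1], [2]]


def f(S):
    covered = set()
    for j in S:
        covered |= COVERS[j]
    return len(covered)


def round_locally(y, J_i, rng):
    """Return j in J_i with probability y[j], or None with probability 1 - sum_j y[j].

    Assumes y[j] >= 0 on J_i and sum_{j in J_i} y[j] <= 1, i.e., y restricted to
    agent i's block lies in P_i ∩ C_i.
    """
    u = rng.random()
    cumulative = 0.0
    for j in J_i:
        cumulative += y[j]
        if u < cumulative:
            return j
    return None


def round_all(y, blocks, rng):
    """Each agent rounds its own block independently; no communication is needed."""
    picks = (round_locally(y, J_i, rng) for J_i in blocks)
    return [j for j in picks if j is not None]


rng = random.Random(0)
y = [0.5, 0.5, 1.0]  # a point of P for this instance, with F(y) = 2.5
samples = [round_all(y, BLOCKS, rng) for _ in range(10000)]
print(sum(f(S) for S in samples) / len(samples))  # close to 2.5
```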

## V Convergence Analysis

The main result of this paper shows that, in the distributed setting of Problem 2, CDCG achieves the tight $(1-1/e)$ multiplicative approximation of the optimal solution. The following theorem summarizes this result.

###### Theorem 1.

Consider the CDCG algorithm described in Algorithm 1. Let $\mathbf{y}^*$ denote the global maximizer of the optimization problem defined in Problem 2, and assume that a positive integer $T$ and a doubly stochastic weight matrix $\mathbf{W}$ are given. Then, provided that the assumptions outlined in Appendix A hold, for all nodes $i$, the local variables $\mathbf{y}_i^T$ obtained after $T$ iterations satisfy

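$$F(\mathbf{y}_i^T) \ge \left(1 - \frac{1}{e}\right) F(\mathbf{y}^*) - \frac{LD^2}{2T} - \frac{LD^2\left(n^2 + n^{3/2}\right) + n^{3/2} D G}{T(1-\beta)}. \tag{9}$$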

Succinctly, Theorem 1 means that the sequence of local iterates generated by CDCG achieves the optimal $(1-1/e)$ approximation ratio and that the error term vanishes at a rate of $O(1/T)$. That is,

$$F(\mathbf{y}_i^T) \ge \left(1 - \frac{1}{e}\right) F(\mathbf{y}^*) - O\!\left(\frac{1}{T(1-\beta)}\right),$$

which implies that each agent reaches an objective value of at least $(1-1/e)F(\mathbf{y}^*) - \epsilon$ after $T = O\big(1/((1-\beta)\epsilon)\big)$ rounds of communication. Previous work can only guarantee an objective value of $\frac{1}{2}$ of the optimum [16]. The proof of this result will be provided in the arXiv version of this paper.
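Reading (9) as $F(\mathbf{y}_i^T) \ge (1-1/e)F(\mathbf{y}^*) - C/T$ with $C = \frac{LD^2}{2} + \frac{LD^2(n^2+n^{3/2}) + n^{3/2}DG}{1-\beta}$, the following Python sketch computes the additive loss and the number of rounds $T$ needed to bring it below a target $\epsilon$. The constants are hypothetical; the point is only to show how the required $T$ grows with $1/(1-\beta)$.

```python
import math


def error_term(T, L, D, G, n, beta):
    """Additive loss in (9): L D^2/(2T) + (L D^2 (n^2 + n^1.5) + n^1.5 D G) / (T (1 - beta))."""
    consensus = L * D**2 * (n**2 + n**1.5) + n**1.5 * D * G
    return L * D**2 / (2 * T) + consensus / (T * (1 - beta))


def rounds_needed(eps, L, D, G, n, beta):
    """Smallest positive integer T with error_term(T) <= eps (the loss has the form C / T)."""
    C = error_term(1, L, D, G, n, beta)
    T = max(1, math.ceil(C / eps))
    # Small corrections in case floating-point rounding moved the boundary.
    while T > 1 and error_term(T - 1, L, D, G, n, beta) <= eps:
        T -= 1
    while error_term(T, L, D, G, n, beta) > eps:
        T += 1
    return T


params = dict(L=1.0, D=1.0, G=1.0, n=3)  # hypothetical constants
for beta in (0.0, 0.5, 0.9):
    print(beta, rounds_needed(0.1, beta=beta, **params))
```

Because every term of the loss scales as $1/T$, halving $\epsilon$ doubles the required number of rounds, and a poorly connected graph (with $\beta$ close to one) inflates the consensus term.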

## VI Simulation Results

To evaluate the proposed algorithm, we consider a multi-agent area coverage problem. In this setting, each agent is constrained to move in a two-dimensional grid. We assume that each agent has a finite sensing radius, so that it can observe the grid points that lie within a square whose side length is determined by this radius. The objective is for the agents to collectively maximize the cardinality of the union of their sets of observed grid points. In other words, given an initial configuration, the problem is to choose an action for each agent that maximizes the overall coverage of the grid.

Consider an initial configuration of agents whose states evolve under a dynamic constraint driven by a control input chosen from a discrete set. Elements of this set represent the admissible actions of each agent on the two-dimensional grid.
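A minimal sketch of the coverage objective in this experiment is given below in Python. The grid size, the square's half-width, the clamping at the grid boundary, and the five-move action set are all assumptions made for illustration, since these values are not specified here. Each pair (agent $i$, action $u$) plays the role of an element of $\mathcal{Y}_i$, and the objective counts the distinct grid points observed after every agent applies its action.

```python
from itertools import product

# Assumed parameters (not given in the text): a 10 x 10 grid, a square of half-width 1
# around each agent, and the action set {stay, right, left, up, down}.
GRID = 10
HALF_WIDTH = 1
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def observed(position):
    """Grid points inside the axis-aligned square of half-width HALF_WIDTH at `position`."""
    px, py = position
    return {
        (x, y)
        for x in range(max(0, px - HALF_WIDTH), min(GRID, px + HALF_WIDTH + 1))
        for y in range(max(0, py - HALF_WIDTH), min(GRID, py + HALF_WIDTH + 1))
    }


def move(position, action):
    """Apply one action, clamping the new position to the grid (an assumption)."""
    (px, py), (ux, uy) = position, action
    return (min(max(px + ux, 0), GRID - 1), min(max(py + uy, 0), GRID - 1))


def coverage(positions, actions):
    """f(S): number of distinct grid points observed after each agent moves once."""
    covered = set()
    for position, action in zip(positions, actions):
        covered |= observed(move(position, action))
    return len(covered)


start = [(5, 5), (5, 5)]  # both agents start at the same cell
print(coverage(start, [(0, 0), (0, 0)]))   # 9: both observe the same 3 x 3 square
print(coverage(start, [(1, 0), (-1, 0)]))  # 15: the two squares share one column

# Joint brute force over one action per agent (feasible only for a few agents).
best = max(product(ACTIONS, repeat=len(start)), key=lambda acts: coverage(start, acts))
print(best, coverage(start, best))  # ((1, 0), (-1, 0)) 15
```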

In our simulation, we compared the performance of SGA against CDCG on the coverage task posed above for a variable number of agents. For simplicity, we assumed that the underlying communication graph used in CDCG was fully connected and that each entry of the weight matrix was $1/n$. A random initialization for each agent’s position and the coverages achieved by CDCG and SGA are shown in Figures 0(a), 0(b), and 0(c), respectively. We compared the performance of these algorithms across ten random initializations of starting locations for the agents; the mean performance of each algorithm and the respective standard deviations are shown in Figure 0(d). In each trial, we ran both algorithms 50 times, each of which produced a control input for each agent. For each initialization, we ran CDCG for iterations.

We also compared the coverages achieved by CDCG and SGA for a setting in which each agent’s starting position is the center of the grid. The results of this experiment are shown in Figure 0(e). We ran both algorithms a total of 15 times; we ran CDCG for iterations. Interestingly, SGA converges to a local maximum in this problem, whereas CDCG achieves the optimal value.

## VII Conclusion

In this work, we described an approach for achieving the optimal $(1-1/e)$ approximation for a class of submodular optimization problems subject to a distributed partition matroid constraint. The algorithm we proposed outperforms the sequential greedy algorithm in two senses:

1. CDCG achieves the tight $(1-1/e)$ approximation of the optimal solution, whereas SGA can only guarantee a $1/2$ approximation.

2. CDCG imposes a limited communication structure on this problem, which allows for significant gains via parallelization. SGA is sequential in nature and therefore is not parallelizable.

We showed empirically via an area coverage simulation with multiple agents that CDCG outperforms the sequential greedy algorithm.

## References

• [1] Y. Hu, H. Chen, J.G. Lou, and J. Li, “Distributed density estimation using non-parametric statistics,” 27th International Conference on Distributed Computing Systems, 2007.
• [2] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause, “Distributed submodular maximization,” Journal of Machine Learning Research, vol. 17, no. 238, pp. 1-44, 2016.
• [3] B. Schlotfeldt, D. Thakur, N. Atanasov, V. Kumar, and G. J. Pappas, “Anytime planning for decentralized multirobot active information gathering,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1025-1032, 2018.
• [4] M. Zhong and C. G. Cassandras, “Distributed coverage control and data collection with mobile sensor networks,” IEEE Transactions on Automatic Control, vol. 56, no. 10, pp. 2445-2455, 2011.
• [5] A. Singh, A. Krause, C. Guestrin, and W. J. Kaiser, “Efficient informative sensing using multiple robots,” Journal of Artificial Intelligence Research, vol. 34, pp. 707-755, 2009.
• [6] K. Wei, Y. Liu, K. Kirchhoff, and J. Bilmes, “Using document summarization techniques for speech data subset selection,” Proceedings of NAACL-HLT, pp. 721-726, 2013.
• [7] D. Golovin and A. Krause, “Adaptive submodularity: theory and applications in active learning and stochastic optimization,” Journal of Artificial Intelligence Research, vol. 42, pp. 427-486, 2011.
• [8] J. Djolonga, S. Tschiatschek, and A. Krause, “Variational Inference in Mixed Probabilistic Submodular Models,” Advances in Neural Information Processing Systems 29, 2016.
• [9] A. Mokhtari, H. Hassani, and A. Karbasi, “Decentralized submodular maximization: bridging discrete and continuous settings,” arXiv preprint arXiv:1802.03825v1, 2018.
• [10] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause. “Distributed submodular maximization: Identifying representative elements in massive data,” Advances in Neural Information Processing Systems, 2013.
• [11] G. Calinescu, C. Chekuri, M. Pál, and J. Vondrák, “Maximizing a monotone submodular function subject to a matroid constraint,” SIAM Journal on Computing, vol. 40, no. 6, pp. 1740-1766, 2011.
• [12] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause. “Distributed submodular maximization: Identifying representative elements in massive data.” Advances in Neural Information Processing Systems, 2013.
• [13] R. Barbosa, A. Ene, H. Nguyen, and J. Ward. “The power of randomization: Distributed submodular maximization on massive datasets.” International Conference on Machine Learning, pp. 1236-1244, 2015.
• [14] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of approximations for maximizing submodular set functions-I.” Mathematical Programming, vol. 15, no. 1., pp. 265-294, 1978.
• [15] G. L. Nemhauser and L. A. Wolsey, “Best algorithms for approximating the maximum of a submodular set function.” Mathematics of Operations Research, vol. 3, no. 3, pp. 177-188, 1978.
• [16] B. Gharesifard and S. L. Smith, “Distributed submodular maximization with limited information.” IEEE Transactions on Control of Network Systems, vol. 5, no. 4, pp. 1635-1645, 2017.
• [17] M. Corah and N. Michael, “Distributed submodular maximization on partition matroids for planning on large sensor networks.” IEEE Conference on Decision and Control (CDC), pp. 6792-6799, 2018.
• [18] D. Grimsman, M.S. Ali, J.P. Hespanha, and J.R. Marden, “The Impact of Information in Greedy Submodular Maximization,” IEEE Transactions on Control of Network Systems, 2017.
• [19] M. Corah and N. Michael, “Efficient online multi-robot exploration via distributed sequential greedy assignment,” Robotics: Science and Systems, 2017.
• [20] H. Hassani, M. Soltanolkotabi, and A. Karbasi, “Gradient methods for submodular maximization,” Advances in Neural Information Processing Systems, pp. 5841–5851. 2017.
• [21] A. Mokhtari, H. Hassani, and A. Karbasi, “Stochastic conditional gradient methods: From convex minimization to submodular maximization,” arXiv preprint arXiv:1804.09554, 2018.
• [22] N. Buchbinder, M. Feldman, J. S. Naor, and R. Schwartz. “A tight linear time (1/2)-approximation for unconstrained submodular maximization,” SIAM Journal on Computing, pp. 1384–1402, 2015.
• [23] N. Buchbinder, M. Feldman, J. S. Naor, and R. Schwartz, “Submodular maximization with cardinality constraints,” Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pp. 1433-1452. Society for Industrial and Applied Mathematics, 2014.
• [24] A. Clark, B. Alomair, L. Bushnell, and R. Poovendran, “Scalable and distributed submodular maximization with matroid constraints,” 2015 13th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), pp. 435-442, 2015.
• [25] L. A. Wolsey, “An analysis of the greedy algorithm for the submodular set covering problem,” Combinatorica, 1982.
• [26] C. Chekuri, J. Vondrak, and R. Zenklusen. “Submodular function maximization via the multilinear relaxation and contention resolution schemes,” SIAM Journal on Computing, pp. 1831-1879, 2014.

## Appendix A Assumptions for Theorem 1

Consider the continuous relaxation of Problem 1 described in Section III-B. We assume that the Euclidean distances between elements of the convex set $\mathcal{P}$ are uniformly bounded, i.e., that

 ||x−y||≤D∀x,y∈P. (10)

This holds trivially for the multilinear extension $F$, since $\mathcal{P}$ is contained in the unit cube, so one may take $D = \sqrt{|\mathcal{Y}|}$. Furthermore, we assume that the gradient of the multilinear extension $F$ of the objective function in Problem 1 is $L$-Lipschitz continuous, i.e., that

 ||∇F(x)−∇F(y)||≤L||x−y||∀x,y∈P (11)

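so that $\|\nabla F(\mathbf{x}) - \nabla F(\mathbf{y})\| \le LD$ for all $\mathbf{x}, \mathbf{y} \in \mathcal{P}$ by (10). Again, this is not a limiting assumption: $F$ is a polynomial, so its second derivatives are bounded on the compact set $\mathcal{P}$, which implies that $\nabla F$ is Lipschitz. Also, we assume that the norm of the gradient of $F$ is bounded over $\mathcal{P}$, i.e., that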

 ||∇F(x)||≤G∀x∈P, (12)

which again follows from the compactness of $\mathcal{P}$ and the continuity of $\nabla F$. It is then easy to show that (12) and the multivariable mean value theorem imply that $F$ is $G$-Lipschitz continuous over $\mathcal{P}$. (Note that in this case, since $F$ is the multilinear extension of $f$, assumptions (10), (11), and (12) all hold. Moreover, the constants $L$ and $G$ depend on the maximum singleton value of $f$. For further justification, see [20]-[21].) Finally, it is worth mentioning that for the multilinear extension $F$ of any monotone and submodular function $f$, it holds that and

 (13)

For justification, see [11].
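A small numerical check makes the two properties of the multilinear extension concrete: a nonnegative gradient, and $\langle\nabla F(x),y\rangle \ge F(y)-F(x)$. The sketch below uses a hypothetical coverage function on four actions. This is a toy instance, not the area coverage problem from the experiments. The sketch computes $F$ and $\nabla F$ exactly by enumerating all subsets, then spot-checks both properties at random points of the unit cube. Exact enumeration costs $O(2^m)$ and only suits tiny ground sets; larger instances would normally estimate $F$ by sampling.

```python
import itertools
import numpy as np

# Hypothetical toy instance (not from the paper): 4 actions, each covering a set of cells.
COVERS = [{0, 1}, {1, 2}, {2, 3}, {0, 3, 4}]

def f(S):
    """Coverage objective (monotone, submodular): number of distinct cells covered by S."""
    covered = set()
    for a in S:
        covered |= COVERS[a]
    return len(covered)

def multilinear(x):
    """Exact multilinear extension F(x) = E[f(R)], where R contains action j independently w.p. x_j."""
    m = len(x)
    total = 0.0
    for bits in itertools.product([0, 1], repeat=m):
        prob = np.prod([x[j] if bits[j] else 1.0 - x[j] for j in range(m)])
        total += prob * f([j for j in range(m) if bits[j]])
    return total

def grad_multilinear(x):
    """Exact gradient: F is affine in each coordinate, so dF/dx_j = F(x; x_j=1) - F(x; x_j=0)."""
    g = np.zeros(len(x))
    for j in range(len(x)):
        hi, lo = x.copy(), x.copy()
        hi[j], lo[j] = 1.0, 0.0
        g[j] = multilinear(hi) - multilinear(lo)
    return g

# Spot-check grad F(x) >= 0 and <grad F(x), y> >= F(y) - F(x) at random points of the cube.
rng = np.random.default_rng(0)
for _ in range(200):
    x, y = rng.random(4), rng.random(4)
    g = grad_multilinear(x)
    assert np.all(g >= -1e-12)
    assert g @ y >= multilinear(y) - multilinear(x) - 1e-9
```

At $x=\mathbf{0}$ the gradient reduces to the singleton values $f(\{j\})$. This is one way to see why the smoothness constants are tied to the largest singleton value.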

Now consider the communication framework described in Section IV and the weight matrix $W \in \mathbb{R}^{n\times n}$ with entries $w_{ij}$. This matrix is a design parameter chosen to match the criteria and setting of a given application. We assume that the weights used in CDCG are nonnegative, so that $w_{ij} \ge 0$; furthermore, if $j \notin \mathcal{N}_i \cup \{i\}$, then $w_{ij} = 0$. We also assume that $W$ is doubly stochastic and symmetric, and that the second largest magnitude $\beta$ of its eigenvalues satisfies $\beta < 1$. The assumptions made about $W$ are similar to those described in [9].
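The assumptions on $W$ can be satisfied in several ways. One common construction is the Metropolis-Hastings rule, used here purely for illustration since the paper does not prescribe a particular choice. It produces a symmetric, nonnegative, doubly stochastic matrix supported on the communication graph. The sketch also computes $\beta$, the second largest eigenvalue magnitude of $W$, which enters the consensus bounds below. The function names are hypothetical.

```python
import numpy as np

def metropolis_weights(adj):
    """Metropolis-Hastings weights for an undirected graph given as a 0/1 adjacency matrix.

    w_ij = 1 / (1 + max(deg_i, deg_j)) for neighbors, w_ij = 0 for non-neighbors, and
    w_ii = 1 - sum_{j != i} w_ij. The result is symmetric, nonnegative and doubly stochastic.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()  # diagonal is still 0 here, so this is 1 - off-diagonal sum
    return W

def second_largest_magnitude(W):
    """beta: second largest eigenvalue magnitude of a symmetric matrix W."""
    mags = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return mags[1]

# Example: a 4-agent ring, where every node has two neighbors.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
W = metropolis_weights(ring)
beta = second_largest_magnitude(W)
```

For the ring, every nonzero weight equals $1/3$ and $\beta = 1/3$. Graphs that are better connected typically give smaller $\beta$ and therefore tighter consensus bounds.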

Lastly, note that past work [9] has studied the case in which the objective function is distributed across agents. Our setting, in contrast, is one in which the problem is distributed in the constraints rather than in the objective. Therefore, we assume that each agent has access to an oracle for evaluating the submodular objective function $f$.

## Appendix B Preliminary Lemmas

In this appendix, we offer proofs of lemmas that support the proof of Theorem 1.

(The proofs of Lemmas 1 and 2 are similar to those that originally appeared in [9].)

In general, the goal of Lemma 1 is to show that the local decision variable $y_i^t$ of each agent stays close to the mean $\bar{y}^t$. Then, in Lemma 2, we show that consecutive means are close: for a sufficiently large number of iterations $T$, the distance between $\bar{y}^t$ and $\bar{y}^{t-1}$ becomes arbitrarily small. Together, Lemmas 1 and 2 establish that, for a sufficiently large number of iterations, the nodes reach consensus on the decision. Lemmas 4 and 5 are technical results used in the proof of Theorem 1.

###### Lemma 1.

For any iteration $t \in \{0, 1, \dots, T\}$, the Euclidean distance between the local variable $y_i^t$ at node $i$ and the mean $\bar{y}^t = \frac{1}{n}\sum_{j=1}^n y_j^t$ of the local variables can be bounded by

$$\|y_i^t - \bar{y}^t\| \le \frac{n^{3/2} D}{T(1-\beta)},$$

where $\beta$ is the magnitude of the eigenvalue of $W$ that has the second largest magnitude among all eigenvalues of $W$.

###### Proof.

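Define $\mathbf{y}^t_{\mathrm{con}} = [y_1^t; \dots; y_n^t]$ and $\mathbf{v}^t_{\mathrm{con}} = [v_1^t; \dots; v_n^t]$ as the concatenations of the local variables and the update directions in CDCG. The update rule in Step 2 of Algorithm 1 leads to the expression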

$$\mathbf{y}^t_{\mathrm{con}} = \frac{n}{T}\sum_{s=0}^{t-1} (W \otimes I)^{t-1-s}\, \mathbf{v}^s_{\mathrm{con}}. \tag{14}$$

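Next, if we premultiply both sides of (14) by the matrix $\frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top \otimes I$, which is the Kronecker product of the matrices $\frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top$ and $I$, we obtain

$$\Big(\frac{\mathbf{1}_n\mathbf{1}_n^\top}{n} \otimes I\Big)\mathbf{y}^t_{\mathrm{con}} = \frac{n}{T}\sum_{s=0}^{t-1} \Big(\frac{\mathbf{1}_n\mathbf{1}_n^\top}{n} \otimes I\Big)(W \otimes I)^{t-1-s}\, \mathbf{v}^s_{\mathrm{con}}. \tag{15}$$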

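The left-hand side of (15) can be simplified to

$$\Big(\frac{\mathbf{1}_n\mathbf{1}_n^\top}{n} \otimes I\Big)\mathbf{y}^t_{\mathrm{con}} = \bar{\mathbf{y}}^t_{\mathrm{con}}, \tag{16}$$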

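where $\bar{\mathbf{y}}^t_{\mathrm{con}} = [\bar{y}^t; \dots; \bar{y}^t]$ stacks $n$ copies of $\bar{y}^t$. We combine (16) with the equality $\big(\frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top \otimes I\big)(W \otimes I)^{t-1-s} = \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top \otimes I$, which holds because $\mathbf{1}_n^\top W = \mathbf{1}_n^\top$. This lets us write (15) as

$$\bar{\mathbf{y}}^t_{\mathrm{con}} = \frac{n}{T}\sum_{s=0}^{t-1} \Big(\frac{\mathbf{1}_n\mathbf{1}_n^\top}{n} \otimes I\Big)\mathbf{v}^s_{\mathrm{con}}. \tag{17}$$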

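Using the expressions in (14) and (17), we can derive an upper bound on the difference $\|\mathbf{y}^t_{\mathrm{con}} - \bar{\mathbf{y}}^t_{\mathrm{con}}\|$:

$$\begin{aligned}
\|\mathbf{y}^t_{\mathrm{con}} - \bar{\mathbf{y}}^t_{\mathrm{con}}\|
&= \frac{n}{T}\Big\|\sum_{s=0}^{t-1}\Big[\Big(W^{t-1-s} - \frac{\mathbf{1}_n\mathbf{1}_n^\top}{n}\Big) \otimes I\Big]\mathbf{v}^s_{\mathrm{con}}\Big\| \\
&\le \frac{n}{T}\sum_{s=0}^{t-1}\Big\|W^{t-1-s} - \frac{\mathbf{1}_n\mathbf{1}_n^\top}{n}\Big\|\,\|\mathbf{v}^s_{\mathrm{con}}\| \\
&\le \frac{nD}{T}\sum_{s=0}^{t-1}\Big\|W^{t-1-s} - \frac{\mathbf{1}_n\mathbf{1}_n^\top}{n}\Big\|,
\end{aligned} \tag{18}$$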

where the first inequality follows from the triangle inequality, the Cauchy-Schwarz inequality, and the fact that the norm of a matrix does not change when we take its Kronecker product with the identity matrix. The second inequality holds because $\|\mathbf{v}^s_{\mathrm{con}}\| \le D$. Note that the matrices $W^s$ and $W$ have the same eigenvectors for all $s \ge 0$. Therefore, the largest eigenvalue of $W^s$ is 1 with eigenvector $\mathbf{1}_n$, and the second largest magnitude of its eigenvalues is $\beta^s$, where $\beta$ is the second largest magnitude of the eigenvalues of $W$. Also, because $\mathbf{1}_n$ is an eigenvector of $W^s$ and $W^s$ is symmetric, all of the other eigenvectors of $W^s$ are orthogonal to $\mathbf{1}_n$. Hence we can bound the norm by $\|W^{t-1-s} - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^\top\| \le \beta^{t-1-s}$. Applying this bound to the right-hand side of (18) yields

$$\|\mathbf{y}^t_{\mathrm{con}} - \bar{\mathbf{y}}^t_{\mathrm{con}}\| \le \frac{nD}{T}\sum_{s=0}^{t-1}\beta^{t-1-s} \le \frac{nD}{T(1-\beta)}. \tag{19}$$

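Since $\|\mathbf{y}^t_{\mathrm{con}} - \bar{\mathbf{y}}^t_{\mathrm{con}}\|^2 = \sum_{i=1}^n \|y_i^t - \bar{y}^t\|^2$, we find that, for every node $j$,

$$\|y_j^t - \bar{y}^t\| \le \sum_{i=1}^n \|y_i^t - \bar{y}^t\| \le \sqrt{n}\Big(\sum_{i=1}^n \|y_i^t - \bar{y}^t\|^2\Big)^{1/2} \le \frac{n^{3/2} D}{T(1-\beta)}, \tag{20}$$

where the second inequality is the Cauchy-Schwarz inequality and the last inequality follows from (19). ∎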
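The bound of Lemma 1 can be sanity-checked by simulating the consensus recursion behind (14). The sketch below makes several assumptions for illustration. It uses a hypothetical 4-agent ring with $w_{ij} = 1/3$, and agents own disjoint blocks of $k$ coordinates with rank one per block, so $D = \sqrt{2n}$. The directions $v_i^t$ are random vertices of agent $i$'s block rather than the output of Step 3 of Algorithm 1. The update $y_i^t = \sum_j w_{ij} y_j^{t-1} + \frac{n}{T} v_i^t$ is our reading of the $n/T$ factor in (14). The function `simulate_consensus` is hypothetical.

```python
import numpy as np

def simulate_consensus(W, T, k, seed=0):
    """Run y_i^t = sum_j w_ij y_j^{t-1} + (n/T) v_i^t from y^0 = 0 for T steps.

    Hypothetical setup: agent i owns coordinates [i*k, (i+1)*k) and v_i^t is a random
    one-hot vector on that block. Returns max over t and i of ||y_i^t - ybar^t||.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    Y = np.zeros((n, n * k))            # row i holds agent i's local variable
    worst = 0.0
    for _ in range(T):
        V = np.zeros((n, n * k))
        for i in range(n):
            V[i, i * k + rng.integers(k)] = 1.0
        Y = W @ Y + (n / T) * V         # row i of W @ Y is sum_j w_ij y_j
        ybar = Y.mean(axis=0)
        worst = max(worst, np.linalg.norm(Y - ybar, axis=1).max())
    return worst

n, k, T = 4, 3, 50
# Hypothetical 4-agent ring: weight 1/3 on each neighbor and on the node itself.
W = (np.eye(n) + np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)) / 3.0
beta = np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]   # second largest eigenvalue magnitude
D = np.sqrt(2 * n)                                   # diameter of the rank-one-per-block polytope
bound = n ** 1.5 * D / (T * (1 - beta))              # right-hand side of Lemma 1
print(simulate_consensus(W, T, k), "<=", bound)
```

The observed deviation is usually far below the bound. This is expected, because the proof bounds every per-node deviation by the sum of all $n$ deviations.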

###### Lemma 2.

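For any iteration $t$ with $1 \le t \le T$, the Euclidean distance between the means $\bar{y}^t$ and $\bar{y}^{t-1}$ of the local variables $y_i^t$ and $y_i^{t-1}$, respectively, at consecutive iterations $t$ and $t-1$ can be bounded by

$$\|\bar{y}^t - \bar{y}^{t-1}\| \le \frac{D}{T}. \tag{21}$$

###### Proof.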

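Averaging both sides of the update rule for $y_i^t$ in Algorithm 1 across all agents yields the following expression for $\bar{y}^t$:

$$\bar{y}^t = \frac{1}{n}\sum_{i=1}^n \sum_{j \in \mathcal{N}_i \cup \{i\}} w_{ij}\, y_j^{t-1} + \frac{1}{T}\sum_{i=1}^n v_i^t. \tag{22}$$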

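Since $w_{ij} = 0$ if $j \notin \mathcal{N}_i \cup \{i\}$, we can rewrite the right-hand side of (22) as follows:

$$\begin{aligned}
\bar{y}^t &= \frac{1}{n}\sum_{i=1}^n \sum_{j=1}^n w_{ij}\, y_j^{t-1} + \frac{1}{T}\sum_{i=1}^n v_i^t \\
&= \frac{1}{n}\sum_{j=1}^n y_j^{t-1} \sum_{i=1}^n w_{ij} + \frac{1}{T}\sum_{i=1}^n v_i^t \\
&= \frac{1}{n}\sum_{j=1}^n y_j^{t-1} + \frac{1}{T}\sum_{i=1}^n v_i^t,
\end{aligned} \tag{23}$$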

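where the last equality in (23) follows since $\sum_{i=1}^n w_{ij} = 1$. Rearranging (23) and using $\bar{y}^{t-1} = \frac{1}{n}\sum_{j=1}^n y_j^{t-1}$, it follows that

$$\|\bar{y}^t - \bar{y}^{t-1}\| = \frac{1}{T}\Big\|\sum_{i=1}^n v_i^t\Big\| \le \frac{D}{T}.$$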

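The inequality holds because $\sum_{i=1}^n v_i^t$ and $\mathbf{0}$ both lie in the polytope $\mathcal{P}$, and by (10) the distance between any two points of $\mathcal{P}$ is at most $D$, so $\|\sum_{i=1}^n v_i^t\| \le D$. The expression in (21) follows. ∎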
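The two facts used in this proof can be checked on one update step. First, mixing with a doubly stochastic $W$ leaves the mean unchanged. Second, the mean then moves by at most $D/T$. The sketch reuses the hypothetical setup from the Lemma 1 check: a 4-agent ring with weights $1/3$, rank-one blocks of $k$ coordinates, and random one-hot directions. It also assumes the update $y_i^t = \sum_j w_{ij} y_j^{t-1} + \frac{n}{T} v_i^t$.

```python
import numpy as np

n, k, T = 4, 3, 20
rng = np.random.default_rng(1)
# Hypothetical ring weights: symmetric and doubly stochastic, 1/3 on each ring edge and self-loop.
W = (np.eye(n) + np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)) / 3.0
D = np.sqrt(2 * n)                       # diameter of the rank-one-per-block polytope

Y = rng.random((n, n * k))               # arbitrary local variables y_i^{t-1} (row i = agent i)
V = np.zeros((n, n * k))
for i in range(n):
    V[i, i * k + rng.integers(k)] = 1.0  # v_i^t: one action from agent i's block

Y_next = W @ Y + (n / T) * V             # one update of every local variable
step = Y_next.mean(axis=0) - Y.mean(axis=0)

# (23): the mixing step preserves the mean because the columns of W sum to one.
assert np.allclose(step, V.sum(axis=0) / T)
# (21): consecutive means differ by at most D / T.
assert np.linalg.norm(step) <= D / T
```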

###### Corollary 3.

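Let $\bar{y}^T = \frac{1}{n}\sum_{i=1}^n y_i^T$ be the mean of the local variables after $T$ iterations. Then the vector $\bar{y}^T$ is in the constraint set $\mathcal{P}$.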

###### Proof.

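In Lemma 1 we proved that $\|y_i^T - \bar{y}^T\| \le \frac{n^{3/2}D}{T(1-\beta)}$, so each $y_i^T$ approaches $\bar{y}^T$ as $T$ grows. We show that $\bar{y}^T \in \mathcal{P}$ by proving, by induction on $t$, the stronger claim that $\frac{T}{t}\bar{y}^t \in \mathcal{P}$ for $t = 1, \dots, T$. Because we assign $y_i^0 = \mathbf{0}$ for all $i$, (23) gives $T\bar{y}^1 = \sum_{i=1}^n v_i^1 \in \mathcal{P}$. Now, as the inductive hypothesis, assume that $\frac{T}{t-1}\bar{y}^{t-1}$ is in $\mathcal{P}$. By (23), we can write $$\frac{T}{t}\bar{y}^t = \frac{t-1}{t}\Big(\frac{T}{t-1}\bar{y}^{t-1}\Big) + \frac{1}{t}\sum_{i=1}^n v_i^t.$$ The inductive hypothesis and the fact that $\sum_{i=1}^n v_i^t \in \mathcal{P}$ then show that $\frac{T}{t}\bar{y}^t$ is a convex combination of elements of the convex set $\mathcal{P}$, and therefore lies in $\mathcal{P}$. Taking $t = T$ gives $\bar{y}^T \in \mathcal{P}$, and so each $y_i^T$ approaches a point in $\mathcal{P}$. ∎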
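The convexity argument can be illustrated numerically. The sketch assumes the same hypothetical setup as the earlier checks: a 4-agent ring with weights $1/3$, and each agent owning a block of $k$ coordinates that is nonnegative and sums to at most one. This is a stand-in for $\mathcal{P}$, not necessarily the exact polytope of Section III-B. At every iteration the sketch checks that $\frac{T}{t}\bar{y}^t$ stays in the polytope, which gives $\bar{y}^T \in \mathcal{P}$ at $t = T$. The helper `in_partition_polytope` is hypothetical.

```python
import numpy as np

def in_partition_polytope(x, n, k, tol=1e-9):
    """Membership test for a rank-one-per-block partition matroid polytope (hypothetical
    stand-in for P): each of the n blocks of k coordinates is nonnegative and sums to <= 1."""
    blocks = x.reshape(n, k)
    return bool(np.all(blocks >= -tol) and np.all(blocks.sum(axis=1) <= 1 + tol))

n, k, T = 4, 3, 30
rng = np.random.default_rng(2)
W = (np.eye(n) + np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)) / 3.0
Y = np.zeros((n, n * k))                        # y_i^0 = 0
for t in range(1, T + 1):
    V = np.zeros((n, n * k))
    for i in range(n):
        V[i, i * k + rng.integers(k)] = 1.0     # v_i^t: a vertex of agent i's block
    Y = W @ Y + (n / T) * V
    ybar = Y.mean(axis=0)
    # (T / t) * ybar^t is an average of the points sum_i v_i^s, s <= t, so it stays in P.
    assert in_partition_polytope((T / t) * ybar, n, k)

print(in_partition_polytope(ybar, n, k))        # membership of ybar^T
```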

###### Lemma 4.

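Let $F$ be the multilinear extension of a monotone submodular function $f$ defined over a discrete ground set. Then

$$\max_{v \in \mathcal{P}_i \cap \mathcal{C}_i} \langle \nabla F(y_i), v \rangle = \max_{x \in \mathcal{P}_i} \langle [\nabla F(y_i)]_{\mathcal{C}_i}, x \rangle, \tag{24}$$

where $[\nabla F(y_i)]_{\mathcal{C}_i}$ denotes the projection of $\nabla F(y_i)$ onto the set $\mathcal{C}_i$.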

###### Proof.

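Consider the definitions of $\mathcal{P}_i$ and $\mathcal{C}_i$ in (7) and (8), respectively. Maximizing $\langle \nabla F(y_i), v \rangle$ over $v \in \mathcal{P}_i \cap \mathcal{C}_i$ results in the same value as maximizing the inner product of $x$ with the projection $[\nabla F(y_i)]_{\mathcal{C}_i}$ of $\nabla F(y_i)$ onto $\mathcal{C}_i$ over $x \in \mathcal{P}_i$. ∎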
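Lemma 4 can be made concrete on a small instance. Definitions (7) and (8) are not reproduced in this appendix, so the sketch uses stand-ins, which are assumptions. It models $\mathcal{P}_i$ by the full rank-one-per-block partition matroid polytope and $\mathcal{C}_i$ by the vectors supported on agent $i$'s coordinates. The left side of (24) is computed with a local linear maximization that puts all mass on the largest positive gradient entry in the block. The right side is computed by enumerating every vertex of the polytope, which suffices because a linear function is maximized at a vertex. The helpers `project` and `local_lmo` are hypothetical.

```python
import itertools
import numpy as np

n, k = 3, 3                              # hypothetical: 3 agents, 3 actions each
m = n * k
rng = np.random.default_rng(3)
g = rng.normal(size=m)                   # stand-in for the gradient of F at y_i
i = 1
block = np.arange(i * k, (i + 1) * k)    # coordinates owned by agent i (stand-in for C_i)

def project(vec, block):
    """[vec]_{C_i}: keep agent i's coordinates and zero out all others."""
    out = np.zeros_like(vec)
    out[block] = vec[block]
    return out

def local_lmo(grad, block):
    """Maximize <grad, v> over v >= 0 supported on `block` with sum(v) <= 1.

    The optimum is a vertex: all mass on the block's largest positive gradient entry,
    or v = 0 if no entry in the block is positive.
    """
    v = np.zeros_like(grad)
    j = block[np.argmax(grad[block])]
    if grad[j] > 0:
        v[j] = 1.0
    return v

# Left side of (24): linear maximization restricted to agent i's block.
lhs = g @ local_lmo(g, block)

# Right side of (24): maximize <[g]_{C_i}, x> over every vertex x of the full polytope.
rhs = -np.inf
for choice in itertools.product(range(k + 1), repeat=n):  # choice == k: block left empty
    x = np.zeros(m)
    for b, c in enumerate(choice):
        if c < k:
            x[b * k + c] = 1.0
    rhs = max(rhs, project(g, block) @ x)

assert np.isclose(lhs, rhs)
```

Under these stand-ins, each agent's direction in Step 3 only needs the gradient entries for its own actions. Coordinates outside its block do not affect the maximizer.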

###### Lemma 5.

Let $F$ be the multilinear extension of a monotone submodular function $f$ defined over a discrete ground set. Then

$$\Big\|\nabla F(\bar{y}^t) - \sum_{i=1}^n [\nabla F(y_i^t)]_{\mathcal{C}_i}\Big\| \le \frac{n^{3/2} D L}{T(1-\beta)}. \tag{25}$$

###### Proof.

Observe that

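$$\begin{aligned}
\Big\|\nabla F(\bar{y}^t) - \sum_{i=1}^n [\nabla F(y_i^t)]_{\mathcal{C}_i}\Big\|
&\le \Big\|\sum_{i=1}^n \Big([\nabla F(\bar{y}^t)]_{\mathcal{C}_i} - [\nabla F(y_i^t)]_{\mathcal{C}_i}\Big)\Big\| && (26)\\
&\le \sum_{i=1}^n \|\nabla F(\bar{y}^t) - \nabla F(y_i^t)\| && (27)\\
&\le \frac{n^{3/2} D L}{T(1-\beta)} && (28)
\end{aligned}$$

Here (26) follows from the definition of the sets $\mathcal{C}_i$, which partition the coordinates so that $\nabla F(\bar{y}^t) = \sum_{i=1}^n [\nabla F(\bar{y}^t)]_{\mathcal{C}_i}$. Inequality (27) follows from the triangle inequality and the fact that projecting onto $\mathcal{C}_i$ does not increase the norm. Finally, (28) follows from the assumption that $\nabla F$ is $L$-Lipschitz continuous, together with the bound $\sum_{i=1}^n \|y_i^t - \bar{y}^t\| \le \frac{n^{3/2}D}{T(1-\beta)}$ established in (20) in the proof of Lemma 1. ∎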
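The first two steps of the chain in this proof rest on two properties of the block projections. When the blocks partition the coordinates, the projections sum back to the original vector, and no projection increases the Euclidean norm. The sketch checks both properties for a random vector. It assumes that agents own disjoint, contiguous blocks of $k$ coordinates, a hypothetical layout chosen for illustration.

```python
import numpy as np

n, k = 4, 3                                     # hypothetical: 4 agents, 3 coordinates each
rng = np.random.default_rng(4)
g = rng.normal(size=n * k)                      # stand-in for a gradient vector
blocks = [np.arange(i * k, (i + 1) * k) for i in range(n)]

def project(vec, block):
    """[vec]_{C_i}: keep agent i's coordinates and zero out all others."""
    out = np.zeros_like(vec)
    out[block] = vec[block]
    return out

# When the blocks partition the coordinates, the projections sum back to the vector.
assert np.allclose(sum(project(g, b) for b in blocks), g)
# Each projection has norm at most that of the original vector.
assert all(np.linalg.norm(project(g, b)) <= np.linalg.norm(g) for b in blocks)
```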

## Appendix C Proof of Theorem 1

This Appendix establishes the main result of this paper.

###### Proof.

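Due to the assumption that $\nabla F$ is $L$-Lipschitz,

$$\begin{aligned}
F(\bar{y}^{t+1}) - F(\bar{y}^t) &\ge \langle \nabla F(\bar{y}^t), \bar{y}^{t+1} - \bar{y}^t \rangle - \frac{L}{2}\|\bar{y}^{t+1} - \bar{y}^t\|^2 \\
&\ge \langle \nabla F(\bar{y}^t), \bar{y}^{t+1} - \bar{y}^t \rangle - \frac{LD^2}{2T^2},
\end{aligned} \tag{29}$$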

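where the last inequality in (29) follows from Lemma 2. The inner-product term on the right-hand side of (29) can be written as follows:

$$\begin{aligned}
\langle \nabla F(\bar{y}^t), \bar{y}^{t+1} - \bar{y}^t \rangle &= \Big\langle \nabla F(\bar{y}^t), \frac{1}{T}\sum_{i=1}^n v_i^{t+1} \Big\rangle \\
&= \frac{1}{T}\sum_{i=1}^n \Big[\langle \nabla F(\bar{y}^t) - \nabla F(y_i^t), v_i^{t+1} \rangle + \langle \nabla F(y_i^t), v_i^{t+1} \rangle\Big].
\end{aligned} \tag{30}$$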

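Here (30) follows from (23), the linearity of inner products, and adding and subtracting $\nabla F(y_i^t)$. Our immediate goal is to bound (30) from below. To do so, note that by the Cauchy-Schwarz inequality,

$$\begin{aligned}
\langle \nabla F(\bar{y}^t) - \nabla F(y_i^t), v_i^{t+1} \rangle &\ge -\|\nabla F(\bar{y}^t) - \nabla F(y_i^t)\|\,\|v_i^{t+1}\| \\
&\ge -LD\,\|\bar{y}^t - y_i^t\| && (31)\\
&\ge -\frac{n^{3/2} L D^2}{T(1-\beta)} && (32)
\end{aligned}$$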

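where (31) follows from the assumption that $\nabla F$ is $L$-Lipschitz continuous together with $\|v_i^{t+1}\| \le D$, and (32) follows from Lemma 1. Next, $v_i^{t+1}$ is defined in Step 3 of Algorithm 1 as the maximizer of the inner product between $\nabla F(y_i^t)$ and vectors in $\mathcal{P}_i \cap \mathcal{C}_i$, so Lemma 4 gives

$$\langle \nabla F(y_i^t), v_i^{t+1} \rangle \ge \langle [\nabla F(y_i^t)]_{\mathcal{C}_i}, y^* \rangle. \tag{33}$$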

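By Lemma 5, if we let $\epsilon = \frac{n^{3/2} D L}{T(1-\beta)}$, we can conclude that, entrywise (each entry of a vector is bounded in magnitude by its Euclidean norm),

$$-\epsilon\mathbf{1} + \nabla F(\bar{y}^t) \le \sum_{i=1}^n [\nabla F(y_i^t)]_{\mathcal{C}_i} \le \nabla F(\bar{y}^t) + \epsilon\mathbf{1}. \tag{34}$$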

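By construction, $y^* \ge \mathbf{0}$ entrywise, since $y^* \in \mathcal{P}$ and $\mathcal{P}$ lies in the unit cube. Then we can infer from (34) that

$$\Big\langle \sum_{i=1}^n [\nabla F(y_i^t)]_{\mathcal{C}_i}, y^* \Big\rangle \ge \langle -\epsilon\mathbf{1}, y^* \rangle + \langle \nabla F(\bar{y}^t), y^* \rangle. \tag{35}$$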

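Our goal is to bound (35) from below. To do this, note that $\|y^*\| \le D$ by (10), since both $y^*$ and $\mathbf{0}$ lie in $\mathcal{P}$. We apply the Cauchy-Schwarz inequality to $\langle \epsilon\mathbf{1}, y^* \rangle$, use this bound, and substitute the value of $\epsilon$. This gives

$$\begin{aligned}
\Big\langle \sum_{i=1}^n [\nabla F(y_i^t)]_{\mathcal{C}_i}, y^* \Big\rangle &\ge \langle \nabla F(\bar{y}^t), y^* \rangle - \frac{n^2 D^2 L}{T(1-\beta)} \\
&\ge F(y^*) - F(\bar{y}^t) - \frac{n^2 D^2 L}{T(1-\beta)},
\end{aligned} \tag{36}$$

where the second inequality follows from (13).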

Altogether, we have shown via (32), (33), and (36) that (30) can be bounded by

 ⟨∇F(¯yt),yt+1i−yti⟩ ≥1T[F(y∗)−F(¯yt)−LD2(n2+