Equilibrium Computation and Robust Optimization in Zero Sum Games with Submodular Structure

10/03/2017 · Bryan Wilder et al. · University of Southern California

We define a class of zero-sum games with combinatorial structure, where the best response problem of one player is to maximize a submodular function. For example, this class includes security games played on networks, as well as the problem of robustly optimizing a submodular function over the worst case from a set of scenarios. The challenge in computing equilibria is that both players' strategy spaces can be exponentially large. Accordingly, previous algorithms have worst-case exponential runtime and indeed fail to scale up on practical instances. We provide a pseudopolynomial-time algorithm which obtains a guaranteed (1 - 1/e)^2-approximate mixed strategy for the maximizing player. Our algorithm only requires access to a weakened version of a best response oracle for the minimizing player which runs in polynomial time. Experimental results for network security games and a robust budget allocation problem confirm that our algorithm delivers near-optimal solutions and scales to much larger instances than was previously possible.

Introduction

Submodular functions are ubiquitous due to widespread applications ranging from machine learning to viral marketing to mechanism design. Intuitively, submodularity captures diminishing returns (formalized later). In this paper, we use techniques rooted in submodular optimization to solve previously intractable zero-sum games. We then show how to instantiate our algorithm for two specific games, including the robust optimization of a submodular objective.

As an example, consider the network security game introduced by Tsai et al. (2010). A defender can place checkpoints on edges of a graph. An attacker aims to travel from a source node to any one of several targets without being intercepted. Each player has an exponential number of strategies, since the defender may choose any set of $k$ edges and the attacker may choose any path. Hence, previous approaches to computing the optimal defender strategy were either heuristics with no approximation guarantee, or else provided guarantees but ran in worst-case exponential time (Jain et al., 2011; Iwashita et al., 2016).

However, this game has useful structure. The defender's best response to any attacker mixed strategy is to select the edges which are most likely to intersect the attacker's chosen path. Computing this set is a submodular optimization problem (Jain et al., 2013). We give a general algorithm for computing approximate minimax equilibria in zero-sum games where the maximizing player's best response problem is to maximize a monotone submodular function. Our algorithm obtains a $(1 - 1/e)^2$-approximation (modulo an additive loss of $\epsilon$) to the maximizing player's minimax strategy. This algorithm runs in pseudopolynomial time even when both action spaces are exponentially large, given access to a weakened form of a best response oracle for the adversary. Pseudopolynomial means that the runtime bound depends polynomially on the largest value of any single item (which we expect to be a constant for most cases of interest). Our algorithm approximately solves a non-convex, non-smooth continuous extension of the problem and then rounds the solution back to a pure strategy in a randomized fashion. To our knowledge, no subexponential algorithm was previously known for this problem with exponentially large strategy spaces. Our framework has a wide range of applications, corresponding to the ubiquitous presence of submodular functions in artificial intelligence and algorithm design (see Krause and Golovin (2014) for a survey).

One prominent class of applications is robust submodular optimization. A decision maker is faced with a set $\mathcal{F}$ of submodular objectives. They do not know which objective is the true one, and so would like to find a decision $S$ maximizing the worst case $\min_{f \in \mathcal{F}} f(S)$. Robust submodular optimization has many applications because uncertainty is so often present in decision-making. We start by studying the randomized version of this problem, where the decision maker may select a distribution over actions such that the worst-case expected performance is maximized (Krause et al., 2011; Chen et al., 2017; Wilder et al., 2017). This is equivalent to computing the minimax equilibrium for a game where one player has a submodular best response. Our techniques for solving such games also yield an algorithm for the deterministic robust optimization problem, where the decision maker must commit to a single action. Specifically, we obtain bicriteria approximation guarantees analogous to previous work (Krause et al., 2008) under significantly more general conditions.

We make three contributions. First, we define the class of submodular best response (SBR) games, which includes the above examples. Second, we introduce the EQUATOR algorithm to compute approximate equilibrium strategies for the maximizing player. Third, we give example applications of our framework to problems with no previously known approximation algorithms. We start out by showing that network security games (Tsai et al., 2010) can be approximately solved using EQUATOR. We then introduce and solve the robust version of a classical submodular optimization problem: robust maximization of a coverage function (which includes well-known applications such as budget allocation and sensor placement). Finally, we experimentally validate our approach for network security games and robust budget allocation. We find that EQUATOR produces near-optimal solutions and easily scales to instances that are too large for previous algorithms to handle.

Problem description

Formulation: Let $X$ be a set of items with $|X| = n$. A function $f: 2^X \to \mathbb{R}$ is submodular if for any $A \subseteq B \subseteq X$ and $j \in X \setminus B$, $f(A \cup \{j\}) - f(A) \geq f(B \cup \{j\}) - f(B)$. We restrict our attention to functions that are monotone, i.e., $f(A) \leq f(B)$ for all $A \subseteq B$. Without loss of generality, we assume that $f(\emptyset) = 0$ and hence $f(A) \geq 0$ for all $A$. Let $\mathcal{F}$ be a finite set of submodular functions on the ground set $X$. $\mathcal{F}$ may be exponentially large. Let $\Delta(A)$ denote the set of probability distributions over the elements of any set $A$. Oftentimes, we will work with independent distributions over $X$, which can be fully specified by a vector $x \in [0,1]^n$. $x_j$ gives the marginal probability that item $j$ is chosen. Denote by $\mathbf{x}$ the independent distribution with marginals $x$. Let $\mathcal{S} \subseteq 2^X$ be a collection of subsets of $X$. For instance, we could have $\mathcal{S} = \{S \subseteq X : |S| \leq k\}$. We would like to find a minimax equilibrium of the game where the maximizing player's pure strategies are the subsets in $\mathcal{S}$, and the minimizing player's pure strategies are the functions in $\mathcal{F}$. The payoff to the strategies $S$ and $f$ is $f(S)$. We call a game in this form a submodular best response (SBR) game. For the maximizing player, computing the minimax equilibrium is equivalent to solving

$$\max_{p \in \Delta(\mathcal{S})} \ \min_{f \in \mathcal{F}} \ \mathbb{E}_{S \sim p}[f(S)] \qquad (1)$$

where $S \sim p$ denotes that $S$ is distributed according to $p$.
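To ground the definition, here is a minimal sketch of a tiny SBR game (the items, coverage sets, and weight scenarios are illustrative, not from the paper). With only a handful of pure strategies, Problem (1) can be solved exactly as a linear program; this is precisely the computation that becomes intractable when $\mathcal{S}$ and $\mathcal{F}$ are exponentially large.

```python
# A toy SBR game: two coverage-style submodular objectives, size-2 pure strategies.
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

X = ["a", "b", "c", "d"]
k = 2
strategies = list(combinations(X, k))

covers = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {1, 4}}   # illustrative
weight_scenarios = [{1: 3, 2: 1, 3: 1, 4: 0}, {1: 0, 2: 1, 3: 1, 4: 3}]

def coverage(S, w):
    covered = set().union(*(covers[j] for j in S)) if S else set()
    return sum(w[u] for u in covered)

F = [lambda S, w=w: coverage(S, w) for w in weight_scenarios]   # monotone submodular

# Solve: max_{p, v} v  s.t.  sum_S p_S f(S) >= v for every f,  sum_S p_S = 1,  p >= 0.
n_S, n_F = len(strategies), len(F)
payoff = np.array([[f(S) for S in strategies] for f in F])      # n_F x n_S
c = np.zeros(n_S + 1); c[-1] = -1.0                             # minimize -v
A_ub = np.hstack([-payoff, np.ones((n_F, 1))])                  # v - sum_S p_S f(S) <= 0
b_ub = np.zeros(n_F)
A_eq = np.hstack([np.ones((1, n_S)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, None)] * n_S + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("game value:", -res.fun)
print({S: round(p, 3) for S, p in zip(strategies, res.x[:-1]) if p > 1e-6})
```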

Example: network security games. To make the setting more concrete, we now introduce one of our example domains, the network security game of Tsai et al. (2010). There is a graph $G = (V, E)$. There is a source vertex $s$ (which may be a supersource connected to multiple real sources) and a set of targets $T \subseteq V$. An attacker wishes to traverse the network starting from the source and attack a target. Each target $t$ has a value $\tau(t)$. The attacker picks a path from $s$ to some $t \in T$. The defender attempts to catch the attacker by protecting edges of the network. The defender may select any $k$ edges, and the attacker is caught if any of these edges lies on the chosen path. We use the normalized utilities defined by Jain et al. (2013), which give the defender utility $\tau(t)$ if an attack on $t$ is intercepted and 0 if the attack succeeds. Thus, each path $P$ from $s$ to some $t \in T$ induces an objective function for the defender: for any set of edges $S$, $f_P(S) = \tau(t)$ if $S \cap P \neq \emptyset$, and $f_P(S) = 0$ otherwise. $f_P$ is easily seen to be submodular (Jain et al., 2013). Hence, we have an SBR game with $X = E$ and $\mathcal{F} = \{f_P : P \text{ is a path from } s \text{ to some } t \in T\}$.

Allowable pure strategy sets: Our running example is when the pure strategies of the maximizing player are all size-$k$ subsets: $\mathcal{S} = \{S \subseteq X : |S| \leq k\}$. In general, our algorithm works when $\mathcal{S}$ is any matroid; this example is called the uniform matroid. We refer to (Korte et al., 2012) for more details on matroids. Here, we just note that matroids are a class of well-behaved constraint structures which are of great interest in combinatorial optimization. A useful fact is that any linear objective can be exactly optimized over a matroid by the greedy algorithm. For instance, consider the above uniform matroid. If each element $j$ has a weight $w_j$, the highest-weighted set of size $k$ is obtained simply by taking the $k$ items with highest individual weights, as sketched below. Let $r$ be the size of the largest pure strategy. E.g., in network security games $r = k$ is the number of defender resources. In general, $r$ is the rank of the matroid.
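A minimal sketch of this greedy fact for the uniform matroid (the weights are hypothetical):

```python
# Linear optimization over the k-uniform matroid: the best size-k set under item
# weights is simply the top-k items.
def top_k(weights, k):
    """Return the k highest-weight items (ties broken arbitrarily)."""
    return set(sorted(weights, key=weights.get, reverse=True)[:k])

print(top_k({"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}, 2))  # {'a', 'd'}
```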

We now introduce some notation for the continuous extension of the problem. Let $\mathbf{1}_S$ be the indicator vector of the set $S$ (i.e., an $n$-dimensional vector with 1 in the entries of elements that are in $S$ and 0 elsewhere). Let $\mathcal{P}$ be the convex hull of $\{\mathbf{1}_S : S \in \mathcal{S}\}$. Note that $\mathcal{P}$ is a polytope.

Best response oracles: A best response oracle for one player is a subroutine which computes the pure strategy with highest expected utility against a mixed strategy for the other player. We assume that an oracle is available for the minimizing player. However, we require only a weaker oracle, which we call a best response to independent distributions (BRI) oracle. A BRI oracle is only required to compute a best response to mixed strategies which are independent distributions, represented as the marginal probability that each item in $X$ appears. Given a vector $x \in [0,1]^n$, where $x_j$ is the probability that element $j$ is chosen, a BRI oracle computes $\arg\min_{f \in \mathcal{F}} \mathbb{E}_{S \sim \mathbf{x}}[f(S)]$. We use $S \sim \mathbf{x}$ to denote that $S$ is drawn from the independent distribution with marginals $x$. As we will see later, sometimes a BRI oracle is readily available even when the full best response is NP-hard.

Robust optimization setting: One prominent application of SBR games is robust submodular optimization. Robust optimization models decision making under uncertainty by specifying that the objective is not known exactly. Instead, it lies within an uncertainty set $\mathcal{U}$ which represents the possibilities that are consistent with our prior information. Our aim is to perform well in the worst case over all objectives in $\mathcal{U}$. We can view this as a zero-sum game, where the decision maker chooses a distribution over actions and nature adversarially chooses the true objective from $\mathcal{U}$. A great deal of recent work has been devoted to the setting of randomized actions, both because randomization can improve worst-case expected utility (Delage et al., 2016), and because the randomized version often has much better computational properties (Krause et al., 2011; Orlin et al., 2016). Randomized decisions also naturally fit a problem setting where the decision maker will take several actions and wants to maximize their total reward. Any single action might perform badly in the worst case; drawing the actions from a distribution allows the decision maker to hedge their bets and perform better overall.

Previous work

We discuss related work in two areas. The first is solving zero-sum games with exponentially large strategy sets. Efficient algorithms are known only for limited special cases. One approach is to represent the strategies in a lower-dimensional space (the space of marginals). We elaborate more below since our algorithm uses this approach. For now, we just note that previous work (Ahmadinejad et al., 2016; Xu, 2016; Chan et al., 2016) requires that the payoffs be linear in the lower-dimensional space. Linearity is a very restrictive assumption; ours is the first algorithm which extends the marginal-based approach to general submodular functions. This requires entirely different techniques.

In practice, large zero sum games are often solved via the double oracle algorithm (McMahan et al., 2003; Bosansky et al., 2014; Bosanskỳ et al., 2015; Halvorson et al., 2009). Double oracle starts with each player restricted to only a small number of arbitrarily chosen pure strategies and repeatedly adds a new strategy for each player until an equilibrium is reached. The new strategies are chosen to be each player’s best response to the other’s current mixed strategy. This technique is appealing when equilibria have sparse support, and so only a few iterations are needed. However, it is easy to give examples where every pure strategy lies in the support of the equilibrium, so double oracle will require exponential runtime. Our algorithm runs in guaranteed polynomial time.

Second, we give more background on robust submodular optimization. Krause et al. (2008) introduced the problem of maximizing the minimum of submodular functions, which corresponds to Problem 1 with the maximizing player restricted to pure strategies. They show that the problem is inapproximable unless P = NP. They then relax the problem by allowing the algorithm to exceed the budget constraint (a bicriteria guarantee). Our primary focus is on the randomized setting, where the algorithm respects the budget constraint but chooses a distribution over actions instead of a pure strategy. This randomized variant was studied by Wilder et al. (2017) for the special case of influence maximization. Krause et al. (2011) and Chen et al. (2017) studied general submodular functions using very similar techniques: both iterate dynamics where the adversary plays a no-regret learning algorithm and the decision maker plays a greedy best response. This algorithm maintains a variable for every function in $\mathcal{F}$ and so is only computationally tractable when $\mathcal{F}$ is small. By contrast, we deal with the setting where $\mathcal{F}$ is exponentially large. However, we lose an extra factor of $1 - 1/e$ in the approximation ratio.

We also extend our algorithm to obtain bicriteria guarantees for the deterministic robust submodular optimization problem (where we select a single feasible set). Our guarantees apply under significantly more general conditions than those of Krause et al. (2008) but have a weaker approximation guarantee; details can be found in the discussion after Theorem 4.

Preliminaries

We now introduce techniques our algorithm builds on.

Multilinear extension: We can view a set function $f$ as being defined on the vertices of the hypercube $\{0,1\}^n$. Each vertex is the indicator vector of a set. A useful paradigm for submodular optimization is to extend $f$ to a continuous function $F$ over $[0,1]^n$ which agrees with $f$ at the vertices. The multilinear extension is defined as

$$F(x) = \sum_{S \subseteq X} f(S) \prod_{j \in S} x_j \prod_{j \notin S} (1 - x_j).$$

Equivalently, $F(x) = \mathbb{E}_{S \sim \mathbf{x}}[f(S)]$. That is, $F(x)$ is the expected value of $f$ on sets drawn from the independent distribution with marginals $x$. $F$ can be evaluated using random sampling (Calinescu et al., 2011) or in closed form for special cases (Iyer et al., 2014). Note that for any set $S$ and its indicator vector $\mathbf{1}_S$, $F(\mathbf{1}_S) = f(S)$. One crucial property of $F$ is up-concavity (Calinescu et al., 2011). That is, $F$ is concave along any direction $d \geq 0$ (where $\geq$ denotes element-wise comparison). Formally, a function $F$ is up-concave if for any $x$ and any $d \geq 0$, $F(x + t d)$ is concave as a function of $t$.
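Because $F$ is an expectation, it can be estimated by Monte Carlo sampling. A minimal sketch (illustrative, not the paper's implementation):

```python
# Estimate the multilinear extension F(x) = E_{S~x}[f(S)] by sampling each item
# independently with its marginal probability.
import random

def multilinear_estimate(f, x, num_samples=1000, rng=random):
    """x maps each item to its marginal probability; f takes a set of items."""
    total = 0.0
    for _ in range(num_samples):
        S = {j for j, p in x.items() if rng.random() < p}
        total += f(S)
    return total / num_samples
```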

Correlation gap: A useful property of submodular functions is that little is lost by optimizing only over independent distributions. Agrawal et al. (2010) introduced the concept of the correlation gap, which is the maximum ratio between the expectation of a function over a (potentially correlated) distribution and its expectation over the independent distribution with the same marginals. Let $D(x)$ be the set of distributions with marginals $x$. The correlation gap of a function $f$ is defined as

$$\kappa(f) = \max_{x \in [0,1]^n} \ \max_{D \in D(x)} \ \frac{\mathbb{E}_{S \sim D}[f(S)]}{\mathbb{E}_{S \sim \mathbf{x}}[f(S)]}.$$

For any submodular function, $\kappa(f) \leq \frac{e}{e - 1}$. This says that, up to a loss of a factor $1 - 1/e$, we can restrict ourselves to independent distributions when solving Problem 1.

Swap rounding: Swap rounding is an algorithm developed by Chekuri et al. (2010) to round a fractional point in a matroid polytope to an integral point. We will use swap rounding to convert the fractional point obtained from the continuous optimization problem to a distribution over pure strategies. Swap rounding takes as input a representation of a point $x \in \mathcal{P}$ as a convex combination of pure strategies, $x = \sum_i \beta_i \mathbf{1}_{S_i}$. It then merges these sets together in a randomized fashion until only one remains. For any submodular function $f$ and its multilinear extension $F$, the resulting random set $S^*$ satisfies $\mathbb{E}[f(S^*)] \geq F(x)$. I.e., swap rounding only increases the value of any submodular function in expectation. A simplified sketch for the uniform matroid follows.
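For intuition, here is a simplified sketch of swap rounding specialized to the $k$-uniform matroid, where each exchange step is a single element swap (general matroids require a matroid exchange oracle; this is an illustration, not the paper's implementation):

```python
# Randomized swap rounding for the k-uniform matroid: repeatedly merge pairs of
# size-k sets, swapping differing elements with probability proportional to weight.
import random

def swap_round_uniform(sets_and_weights, rng=random):
    """sets_and_weights: list of (set of size k, nonnegative weight) summing to 1."""
    sets_and_weights = [(set(S), b) for S, b in sets_and_weights if b > 0]
    S1, b1 = sets_and_weights[0]
    for S2, b2 in sets_and_weights[1:]:
        S2 = set(S2)
        while S1 != S2:
            i = next(iter(S1 - S2))
            j = next(iter(S2 - S1))
            if rng.random() < b1 / (b1 + b2):
                S2.remove(j); S2.add(i)    # move S2 toward S1
            else:
                S1.remove(i); S1.add(j)    # move S1 toward S2
        b1 += b2                            # the two sets are now equal; merge weights
    return S1
```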

Algorithm for SBR games

In this section, we introduce the EQUATOR (EQUilibrium via stochAsTic frank-wOlfe and Rounding) algorithm for computing approximate equilibrium strategies for the maximizing player in SBR games. Since the pure strategy sets can be exponentially large, it is unclear what it even means to compute an equilibrium: representing a mixed strategy may require exponential space. Our solution to this dilemma is to show how to efficiently sample pure strategies from an approximate equilibrium mixed strategy. This suffices for the maximizing player to implement their strategy. Alternatively, we can build an approximate mixed strategy with sparse support by drawing a polynomial number of samples and outputting the uniform distribution over the samples. In order to generate these samples, EQUATOR first solves a continuous optimization problem, which we now describe.

The marginal space: A common meta-strategy for solving games with exponentially large strategy sets is to work in the lower-dimensional space of marginals. I.e., we keep track of only the marginal probability that each element of the ground set is chosen. To illustrate this, let $p$ be a distribution over the pure strategies $\mathcal{S}$, and let $x^p \in [0,1]^n$ denote a vector giving the marginal probability of selecting each element of $X$ in a set drawn according to $p$. Note that $x^p$ is $n$-dimensional while $p$ could have dimension up to $2^n$. Previous work has used marginals for linear objectives. A linear function with weights $w$ satisfies $\mathbb{E}_{S \sim p}\left[\sum_{j \in S} w_j\right] = \sum_j w_j x^p_j$, so keeping track of only the marginal probabilities is sufficient for exact optimization. However, submodular functions do not in general satisfy this property: the utilities will depend on the full distribution $p$, not just the marginals $x^p$. We will treat a given marginal vector $x$ as representing an independent distribution where each $j$ is present with probability $x_j$ (i.e., $x$ compactly represents the full distribution $\mathbf{x}$). The expected value of $\mathbf{x}$ under any submodular function is exactly given by its multilinear extension, which is a continuous function.

Continuous extension: Let $G(x) = \min_{f \in \mathcal{F}} F_f(x)$ be the pointwise minimum of the multilinear extensions $F_f$ of the functions in $\mathcal{F}$. Note that for any marginal vector $x$, $G(x)$ is exactly the objective value of $\mathbf{x}$ for Problem 1. Hence, optimizing $G$ over all $x \in \mathcal{P}$ is equivalent to solving Problem 1 restricted to independent distributions. Via the correlation gap, this restriction only loses a factor $1 - 1/e$: if the optimal full distribution is $p^*$, then the independent distribution with the same marginals as $p^*$ retains at least $1 - 1/e$ of $p^*$'s value under any submodular function. Previous algorithms (Calinescu et al., 2011; Bian et al., 2017) for optimizing up-concave functions like $G$ do not apply because $G$ is nonsmooth (see below). We introduce a novel Stochastic Frank-Wolfe algorithm which smooths the objective with random noise. Its runtime does not depend directly on $|\mathcal{F}|$ at all; it only uses BRI calls.

Rounding: Once we have solved the continuous problem, we need a way of mapping the resulting marginal vector $x$ to a distribution over the pure strategies $\mathcal{S}$. Notice that if we simply sample items independently according to $x$, we might end up with an invalid set. For instance, in the uniform matroid which requires $|S| \leq k$, an independent draw could result in more than $k$ items even if $\sum_j x_j \leq k$. Hence, we sample pure strategies by running the swap rounding algorithm on $x$. In order to implement the maximizing player's equilibrium strategy, it suffices to simply draw a sample whenever a decision is required. If a full description of the mixed strategy is desired, we show that it is sufficient to draw a polynomial number of independent samples via swap rounding and return the uniform distribution over the sampled pure strategies.

To sum up, our strategy is as follows. First, solve the continuous optimization problem to obtain a marginal vector $x$. Second, draw sampled pure strategies by running randomized swap rounding on $x$.

Solving the continuous problem

The linchpin of our algorithmic strategy is solving the optimization problem $\max_{x \in \mathcal{P}} G(x)$. In this section, we provide the ingredients to do so.

Properties of $G$: We set the stage with four important properties of $G$ (proofs are given in the supplement). First, while $G$ is not in general concave, it is up-concave:

Lemma 1.

If $F_1, \dots, F_m$ are up-concave functions, then $G(x) = \min_i F_i(x)$ is up-concave as well.

The proof is similar to the proof that the minimum of concave functions is concave. Up-concavity of $G$ is the crucial property that enables efficient optimization.

1: $x^0 \leftarrow \delta \mathbf{1}$
2: //Stochastic Frank-Wolfe algorithm
3: for $t = 1, \dots, T$ do
4:     for $i = 1, \dots, c$ do
5:         Draw $z_i$ uniformly from the ball of radius $u$
6:         $f_i \leftarrow$ BRI($x^{t-1} + z_i$)
7:         $\tilde{\nabla}_i \leftarrow$ stochastic estimate of $\nabla F_{f_i}(x^{t-1} + z_i)$
8:     end for
9:     $\tilde{\nabla}^t \leftarrow \frac{1}{c}\sum_{i=1}^{c} \tilde{\nabla}_i$
10:     $v^t \leftarrow \arg\max_{v \in \mathcal{P}} \langle v, \tilde{\nabla}^t \rangle$
11:     $x^t \leftarrow x^{t-1} + \frac{1}{T} v^t$
12: end for
13: $x \leftarrow x^T - \delta \mathbf{1}$
14: //Sample from equilibrium mixed strategy
15: Return samples of SwapRound($x$)
Algorithm 1 EQUATOR

Second, $G$ is Lipschitz. Specifically, let $m_{\max} = \max_{f \in \mathcal{F}, j \in X} f(\{j\})$ be the maximum value of any single item. It can be shown that $0 \leq \frac{\partial F_f}{\partial x_j} \leq m_{\max}$ since (intuitively) the gradient of $F_f$ is given by the marginal gains of items under $f$. From this we derive

Lemma 2.

$G$ is $m_{\max}\sqrt{n}$-Lipschitz in the $\ell_2$ norm.

Third, $G$ is not smooth. For instance, it is not even differentiable at points where the minimizing function is not unique. This complicates the problem of optimizing $G$ and renders earlier algorithms inapplicable.

Fourth, at any point $x$ where the minimizing function $f^* = \arg\min_{f \in \mathcal{F}} F_f(x)$ is unique, $\nabla G(x) = \nabla F_{f^*}(x)$. Hence, we can compute $\nabla G$ by calling the BRI to find $f^*$, and then computing $\nabla F_{f^*}(x)$. In general, $\nabla F_f$ can be computed by random sampling (Calinescu et al., 2011), and closed forms are known for particular cases (Iyer et al., 2014).
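A minimal sketch of such a sampling-based gradient estimator, using the identity $\frac{\partial F_f}{\partial x_j} = \mathbb{E}_{S \sim \mathbf{x}}[f(S \cup \{j\}) - f(S \setminus \{j\})]$ (illustrative code, not the paper's implementation):

```python
# Unbiased estimate of the gradient of the multilinear extension of f at marginals x.
import random

def grad_estimate(f, x, num_samples=100, rng=random):
    grad = {j: 0.0 for j in x}
    for _ in range(num_samples):
        S = {j for j, p in x.items() if rng.random() < p}
        for j in x:
            grad[j] += f(S | {j}) - f(S - {j})
    return {j: g / num_samples for j, g in grad.items()}
```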

Randomized smoothing: We will solve the continuous problem $\max_{x \in \mathcal{P}} G(x)$. Known strategies for optimizing up-concave functions (Bian et al., 2017) rely crucially on the objective being smooth. Specifically, its gradient must be Lipschitz continuous. Unfortunately, $G$ is not even differentiable everywhere. Even between two points $x$ and $y$ where $G$ is differentiable, $\nabla G(x)$ and $\nabla G(y)$ can be arbitrarily far apart if the minimizing function changes between $x$ and $y$. No previous work addresses nonsmooth optimization of an up-concave function.

To resolve this issue, we use a carefully calibrated amount of random noise to smooth the objective. Let $\mu$ be the uniform distribution over the ball of radius $u$ centered at the origin. We define the smoothed objective $G_\mu(x) = \mathbb{E}_{z \sim \mu}[G(x + z)]$, which averages over the region around $x$. This (and similar) techniques have been studied in the context of convex optimization (Duchi et al., 2012). We show that $G_\mu$ is a good smooth approximator of $G$.

Lemma 3.

$G_\mu$ has the following properties:

  • $G_\mu$ is up-concave.

  • $|G_\mu(x) - G(x)| \leq u\, m_{\max}\sqrt{n}$ for all $x$.

  • $G_\mu$ is differentiable, with $\nabla G_\mu(x) = \mathbb{E}_{z \sim \mu}[\nabla G(x + z)]$.

  • $\nabla G_\mu$ is Lipschitz continuous in the $\ell_2$ norm (with constant inversely proportional to the smoothing radius $u$).

Hence, we can use $G_\mu$ as a better-behaved proxy for $G$ since it is both smooth and close to $G$ everywhere in the domain. The main challenge is that $G_\mu$ and its gradients are not available in closed form. Accordingly, we randomly sample values of the perturbation $z$ and average over the value of $G$ (or its gradient) at these sampled points.

Stochastic Frank-Wolfe algorithm (SFW)

We propose the SFW algorithm (Algorithm 1) to optimize $G_\mu$. SFW generates a series of feasible points $x^1, \dots, x^T$, where $T$ is the number of iterations. Each point is generated from the last via two steps. First, SFW estimates the gradient of $G_\mu$ at $x^{t-1}$. Second, it takes a step towards the point in $\mathcal{P}$ which is furthest in the direction of the gradient. To carry out these steps, SFW requires three oracles. First, a linear optimization oracle $M$ which, given an objective $w$, returns $\arg\max_{v \in \mathcal{P}} \langle w, v \rangle$. In the context of our problem, $M$ outputs the indicator vector of the set in $\mathcal{S}$ which maximizes the linear objective $\sum_{j \in S} w_j$. This set can be efficiently found via the greedy algorithm. The other two oracles concern gradient evaluation. One is the BRI oracle discussed earlier. The other is a stochastic first-order oracle which, for any function $f \in \mathcal{F}$ and point $x$, returns an unbiased estimate of $\nabla F_f(x)$.

The algorithm starts at $x^0 = \delta \mathbf{1}$. At each iteration $t$, it averages over $c$ calls to the first-order oracle to compute a stochastic approximation $\tilde{\nabla}^t$ to $\nabla G_\mu(x^{t-1})$ (Lines 4-9). For each call, it draws a random perturbation $z$ and uses the BRI to find the minimizing $f$ at $x^{t-1} + z$. It then queries the first-order oracle for an estimate of $\nabla F_f(x^{t-1} + z)$. Lastly, it takes a step in the direction of $v^t = \arg\max_{v \in \mathcal{P}} \langle v, \tilde{\nabla}^t \rangle$ by setting $x^t = x^{t-1} + \frac{1}{T} v^t$ (Lines 10-11). Since the iterate at each step is a combination of vertices of $\mathcal{P}$, the output is guaranteed to be feasible. The intuition for why the algorithm succeeds is that it only moves along nonnegative directions (since $v^t$ is always nonnegative). This is in contrast to gradient-based algorithms for concave optimization, which move in the (possibly negative) direction $\nabla G_\mu(x^{t-1})$. As an up-concave function, $G_\mu$ is concave along all nonnegative directions. By moving only in such directions we inherit enough of the nice properties of concave optimization to obtain a $(1 - 1/e)$-approximation.

A small technical detail is that adding random noise could result in negative values, for which the multilinear extension is not defined. To circumvent this, we start the algorithm at $x^0 = \delta \mathbf{1}$ (i.e., with small positive values in every entry) and then return $x^T - \delta \mathbf{1}$ (Line 13).
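Putting the pieces together, here is a simplified sketch of the SFW loop for the $k$-uniform matroid. It assumes a `bri(x)` oracle returning the minimizing objective and the `grad_estimate` sampler sketched earlier; the per-coordinate perturbation and the clipping step are crude stand-ins for the ball perturbation and the $\delta$-shift of Algorithm 1, not a faithful implementation of the paper's parameter choices.

```python
import random

def sfw_uniform(items, k, bri, grad_estimate, T=50, c=10, u=0.05, delta=1e-3, rng=random):
    x = {j: delta for j in items}                        # start at delta * 1 (Line 1)
    for _ in range(T):
        avg_grad = {j: 0.0 for j in items}
        for _ in range(c):                               # Lines 4-8
            z = {j: rng.uniform(-u, u) for j in items}   # stand-in for a ball perturbation
            y = {j: min(1.0, max(0.0, x[j] + z[j])) for j in items}
            f = bri(y)                                   # adversary best response (Line 6)
            g = grad_estimate(f, y)                      # stochastic gradient of F_f (Line 7)
            for j in items:
                avg_grad[j] += g[j] / c
        v = set(sorted(items, key=lambda j: avg_grad[j], reverse=True)[:k])  # Line 10
        for j in items:                                  # x <- x + (1/T) * 1_v (Line 11)
            x[j] += (1.0 / T) if j in v else 0.0
    return {j: max(0.0, x[j] - delta) for j in items}    # Line 13
```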

Theoretical bounds

Let $c_M$ be the runtime of the linear optimization oracle and $c_{\mathrm{grad}}$ be the runtime of the first-order oracle. We prove the following guarantee for SFW:

Theorem 1.

For any $\epsilon, \delta > 0$, there are parameter settings such that SFW finds a solution $x$ satisfying $G(x) \geq (1 - 1/e)\max_{x^* \in \mathcal{P}} G(x^*) - \epsilon$ with probability at least $1 - \delta$. Its runtime is $\tilde{O}\!\left(\mathrm{poly}\!\left(n, r, m_{\max}, \frac{1}{\epsilon}\right)(c_M + c_{\mathrm{grad}})\right)$, where the $\tilde{O}$ notation hides logarithmic terms.

We remark that $c_M$ is small since linear optimization over $\mathcal{P}$ can be carried out by a greedy algorithm. For instance, the runtime is $O(n \log n)$ for the uniform matroid, which covers many applications. $c_{\mathrm{grad}}$ is typically dominated by the runtime of the BRI since it is known how to efficiently compute the gradient of a submodular function (Calinescu et al., 2011; Iyer et al., 2014).

Based on this result, we show the following guarantee on a single randomly sampled set that EQUATOR returns after applying swap rounding to the marginal vector $x$.

Theorem 2.

With the parameters of Theorem 1, EQUATOR outputs a set $S \in \mathcal{S}$ such that $\min_{f \in \mathcal{F}} \mathbb{E}[f(S)] \geq (1 - 1/e)^2\,\mathrm{OPT} - \epsilon$ with probability at least $1 - \delta$, where OPT is the optimal value of Problem 1. Its time complexity is the same as SFW.

Proof.

Suppose that $p^*$ is the distribution achieving the optimal value OPT for Problem 1. Let $x^*$ be the optimizer for the problem $\max_{x \in \mathcal{P}} G(x)$. That is, $x^*$ can be interpreted as the marginals of the independent distribution which maximizes $G$. With slight abuse of notation, let $x^{p^*}$ be the marginals of $p^*$, so that $\mathbf{x}^{p^*}$ is the independent distribution with the same marginals as $p^*$. By applying the correlation gap to each $f \in \mathcal{F}$ and taking the minimum, we have

$$G(x^{p^*}) = \min_{f \in \mathcal{F}} F_f(x^{p^*}) \geq (1 - 1/e)\min_{f \in \mathcal{F}} \mathbb{E}_{S \sim p^*}[f(S)] = (1 - 1/e)\,\mathrm{OPT}.$$

By definition of $x^*$, $G(x^*) \geq G(x^{p^*})$. Hence, $G(x^*) \geq (1 - 1/e)\,\mathrm{OPT}$. Via Theorem 1, the marginal vector $x$ that our algorithm finds satisfies $G(x) \geq (1 - 1/e) G(x^*) - \epsilon \geq (1 - 1/e)^2\,\mathrm{OPT} - \epsilon$. Lastly, Chekuri et al. (2010) show that swap rounding outputs an independent set $S$ satisfying $\mathbb{E}[f(S)] \geq F_f(x)$ for any submodular $f$, which completes the proof. ∎

This guarantee is sufficient if we just want to implement the maximizing player’s strategy by sampling an action. We also prove that if a full description of the maximizing player’s mixed strategy is desired, drawing a small number of independent samples via swap rounding suffices:

1: Run EQUATOR to obtain $x$.
2: for $i = 1, \dots, N$ do
3:     Run SwapRound($x$) $\ell$ times, yielding sets $S_1, \dots, S_\ell$.
4:     $A_i \leftarrow S_1 \cup \dots \cup S_\ell$
5: end for
6: return $\arg\max_{A_i} \min_{f \in \mathcal{F}} f(A_i)$
Algorithm 2 Efficient bicriteria approximation
Theorem 3.

Draw $\ell$ samples using independent runs of randomized swap rounding, where $\ell$ is polynomial in the problem parameters and $\log\frac{1}{\delta}$. The uniform distribution on these samples is an approximate equilibrium strategy for the maximizing player, with worst-case expected value at least $(1 - 1/e)^2\,\mathrm{OPT} - \epsilon$, with probability at least $1 - \delta$. The runtime is that of EQUATOR plus the $\ell$ runs of swap rounding.

This also gives a simple way of obtaining a single feasible set (pure strategy) which has a bicriteria guarantee for the robust optimization problem. As pointed out by Chen et al. (2017), since the $f \in \mathcal{F}$ are all monotone, taking the union of the sets output by swap rounding gives a single set with at least as much value. Algorithm 2 implements this procedure. It first solves the fractional problem by running EQUATOR. Then, it carries out a series of independent iterations. Each iteration draws $\ell$ sets via swap rounding and stores their union $A_i$. It then returns the best of the $A_i$. Via our concentration bound for the distribution produced in each iteration (Theorem 3), each iteration succeeds in producing a “good” set with constant probability. Algorithm 2 runs $O\!\left(\log\frac{1}{\delta}\right)$ iterations so that at least one succeeds with probability at least $1 - \delta$.

Theorem 4.

Algorithm 2 returns a single set $A$ which is the union of at most $\ell$ elements of $\mathcal{S}$ and satisfies $\min_{f \in \mathcal{F}} f(A) \geq (1 - 1/e)^2\,\mathrm{OPT} - \epsilon$ with probability at least $1 - \delta$.

The strongest existing bicriteria guarantee is for the SATURATE algorithm of Krause et al. (2008), which outputs a set matching the optimal value while exceeding the budget $k$ by a logarithmic factor. Our set size maintains logarithmic dependence on $\frac{1}{\delta}$, but also contains dependence on the other problem parameters through $\ell$. Moreover, it is only a $(1 - 1/e)^2$-approximation to the optimal solution quality. However, our result is much more general than that of Krause et al. and handles situations that SATURATE cannot. First, our result applies when $\mathcal{F}$ is accessible only through an oracle, where SATURATE relies on explicitly enumerating the functions. Second, our result applies when $\mathcal{S}$ is any matroid, where SATURATE applies only to cardinality-constrained problems. To our knowledge, this is the first computationally efficient bicriteria algorithm under either condition.

Improving the approximation ratio

In this section, we examine the conditions under which it is possible to improve EQUATOR's $(1 - 1/e)^2$-approximation to $(1 - 1/e)$. The earlier analysis lost a factor $(1 - 1/e)$ in two places: the use of the correlation gap to bound the loss introduced by only tracking marginals, and the use of SFW to solve the continuous relaxation. While the second factor is difficult to improve, we can eliminate the loss from the correlation gap when a stronger best response oracle for the adversary is available. Specifically, we define a best response to mixture of independent distributions (BRMI) oracle to be an algorithm which, given a list of marginal vectors $x^1, \dots, x^L$, outputs $\arg\min_{f \in \mathcal{F}} \frac{1}{L}\sum_{i=1}^{L} \mathbb{E}_{S \sim \mathbf{x}^i}[f(S)]$.

1: Set the number of copies $L$ (as in Lemma 4).
2: Use SFW to solve the problem $\max_{(x^1, \dots, x^L) \in \mathcal{P}^L} \min_{f \in \mathcal{F}} \frac{1}{L}\sum_{i=1}^{L} F_f(x^i)$, obtaining $(x^1, \dots, x^L)$.
3: Set the number of samples $\ell$ per copy.
4: for $i = 1, \dots, L$ do
5:     Draw $\ell$ sets $S_{i,1}, \dots, S_{i,\ell}$ independently as SwapRound($x^i$).
6: end for
7: Return the uniform distribution on $\{S_{i,j}\}$.
Algorithm 3 EQUATOR with improved approximation guarantee

We will be interested in BRMI oracles which take time polynomial in $n$ and $L$. As the name implies, a BRMI oracle can compute adversary best responses to any distribution which is explicitly represented as a mixture of independent distributions with given marginals. By contrast, a BRI is restricted to a single independent distribution. A BRMI is a considerably more powerful oracle because, with sufficiently large $L$, any distribution can be arbitrarily well-approximated by a mixture of independent distributions (a statement which is formalized below). Hence, the algorithm we propose maintains $L$ copies $x^1, \dots, x^L$ of the decision variables for a value of $L$ which will be set later. We aim to maximize $\min_{f \in \mathcal{F}} \frac{1}{L}\sum_{i=1}^{L} \mathbb{E}_{S \sim \mathbf{x}^i}[f(S)]$, which we recognize as being equivalent to the problem

$$\max_{(x^1, \dots, x^L) \in \mathcal{P}^L} \ \min_{f \in \mathcal{F}} \ \frac{1}{L}\sum_{i=1}^{L} F_f(x^i). \qquad (2)$$

It is easy to check that the objective of Problem 2 is an up-concave function which inherits all of the smoothness properties of the $F_f$. Hence, we can use SFW to obtain a $(1 - 1/e)$-approximate solution to Problem 2 provided that we have a BRMI oracle with which to compute gradients. After solving Problem 2, we can use swap rounding to produce feasible sets with guaranteed approximation ratio. For a single set, we first select an $x^i$ uniformly at random and then run swap rounding on it. To output a full distribution, as in Theorem 3, we draw $\ell$ samples from each of the $x^i$ and then output the uniform distribution over the combined set of samples. The extra logarithmic dependence on $L$ ensures that we can take a final union bound over the $L$ batches of swap rounding. The entire procedure is summarized in Algorithm 3. We let $B$ be an upper bound on the value of $f(S)$ for any feasible set: $B \geq \max_{f \in \mathcal{F}, S \in \mathcal{S}} f(S)$. Note that $B \leq r\, m_{\max}$ always holds via submodularity, but tighter bounds might apply for particular functions.

We have the following approximation guarantee for Algorithm 3. We note that the idea of optimizing over a mixture of independent distributions has been used in (Dughmi and Xu, 2017), but we prove Lemma 4 (establishing that a good mixture exists) for completeness.

Theorem 5.

Given access to a BRMI oracle for any SBR game instance, Algorithm 3 returns a distribution $p$ over pure strategies which satisfies $\min_{f \in \mathcal{F}} \mathbb{E}_{S \sim p}[f(S)] \geq (1 - 1/e)\,\mathrm{OPT} - \epsilon$ with probability at least $1 - \delta$.

Proof.

We first establish that there exists a near-optimal distribution over elements of $\mathcal{S}$ with support size at most $L$:

Lemma 4.

Take any collection of functions $\mathcal{F}$ with $0 \leq f(S) \leq B$ for all $f \in \mathcal{F}$ and $S \in \mathcal{S}$, and a distribution $p \in \Delta(\mathcal{S})$. There exists a distribution $\tilde{p}$ supported on at most $L = O\!\left(\frac{B^2 \log|\mathcal{F}|}{\epsilon^2}\right)$ elements of $\mathcal{S}$ which satisfies $\mathbb{E}_{S \sim \tilde{p}}[f(S)] \geq \mathbb{E}_{S \sim p}[f(S)] - \epsilon$ for all $f \in \mathcal{F}$.

Proof.

We will use the probabilistic method. Suppose that we draw $L$ samples independently from $p$ and let $\tilde{p}$ be the uniform distribution on the samples. Fix an arbitrary function $f \in \mathcal{F}$. Via Hoeffding's inequality, we have that

$$\Pr\left[\left|\mathbb{E}_{S \sim \tilde{p}}[f(S)] - \mathbb{E}_{S \sim p}[f(S)]\right| > \epsilon\right] \leq 2\exp\left(-\frac{2 L \epsilon^2}{B^2}\right)$$

and this holds simultaneously for all scenarios with probability at least $1 - 2|\mathcal{F}|\exp\left(-\frac{2 L \epsilon^2}{B^2}\right)$ via union bound. That is, for $L$ as in the lemma statement, we have a random sampling procedure which outputs a distribution satisfying $\mathbb{E}_{S \sim \tilde{p}}[f(S)] \geq \mathbb{E}_{S \sim p}[f(S)] - \epsilon$ for all $f \in \mathcal{F}$ with positive probability. Via the probabilistic method we are guaranteed that such a distribution (i.e., one which is a uniform distribution on at most $L$ elements of $\mathcal{S}$) exists. ∎

Now, note that Algorithm 3 maximizes over the set $\mathcal{P}^L$, which includes the marginals of the distribution $\tilde{p}$ guaranteed by Lemma 4. Via the guarantee for SFW (Theorem 1), SFW returns $(x^1, \dots, x^L)$ with objective value at least $(1 - 1/e)(\mathrm{OPT} - \epsilon) - \epsilon$ (we ignore for convenience the issue of adjusting all of the $\epsilon$ values by a constant factor). Now we just need to establish that the rounding procedure succeeds. A simple variation on the proof of Theorem 3 suffices: we claim that the samples drawn from each $x^i$ preserve its value to within $\epsilon$ with probability at least $1 - \frac{\delta}{L+1}$ via our choice of $\ell$. Taking a union bound over all $i$ and the event that SFW succeeds completes the proof. ∎

Applications

We now give several examples of domains that our algorithm can be applied to. In each of these cases, we obtain the first guaranteed polynomial time constant-factor approximation algorithm for the problem. The key part of both applications is developing a BRI (the first order oracle is easily obtained in closed form via straightforward calculus).

Network security games: Earlier, we formulated network security games in the SBR framework. All we need to solve it using EQUATOR is a BRI oracle. The full attacker best response problem is known to be NP-hard (Jain et al., 2011). However, it turns out the best response to an independent distribution is easily computed. Index the set of paths and let $P_i$ be the $i$th path, ending at a target $t_i$ with value $\tau(t_i)$. Let $\mathcal{P}_t$ be the set of all paths from the (super)source to $t$. Let $f_i$ be the corresponding submodular objective. Given a defender marginal vector $x$, the attacker best response problem is to find $\arg\min_i \mathbb{E}_{S \sim \mathbf{x}}[f_i(S)]$. We can rewrite this as

$$\min_{t \in T} \ \min_{P \in \mathcal{P}_t} \ \tau(t)\left(1 - \prod_{e \in P}(1 - x_e)\right).$$

We can now solve a separate problem for each target and then take the one with lowest value. For each $t$, we solve a shortest path problem. We aim to find a path which maximizes the product of the weights $1 - x_e$ over its edges. Taking logarithms, this is equivalent to finding the path which minimizes $\sum_{e \in P} -\log(1 - x_e)$. This is a shortest path problem in which each edge has nonnegative weight $-\log(1 - x_e)$, and so can be solved via Dijkstra's algorithm. With the attacker BRI in hand, applying EQUATOR yields the first subexponential-time algorithm for network security games.
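A minimal sketch of this attacker BRI (illustrative data structures; the adjacency-list format, the `target_values` dict, and the handling of unreachable targets are assumptions, not the paper's code):

```python
# Attacker best response to an independent defender distribution x over edges.
# Edge e is covered with probability x[e]; the attacker picks the target t and
# path P minimizing tau(t) * (1 - prod_{e in P} (1 - x[e])).
import heapq, math

def attacker_bri(adj, source, target_values, x):
    """adj[u] = list of (v, edge_id); x[edge_id] = coverage probability."""
    dist = {source: 0.0}                          # shortest path under weights -log(1 - x_e)
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue
        for v, e in adj.get(u, []):
            w = -math.log(max(1e-12, 1.0 - x[e]))   # nonnegative edge weight
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    # interception probability for target t: 1 - exp(-dist[t]); unreachable targets are avoided
    return min(target_values,
               key=lambda t: target_values[t] * (1.0 - math.exp(-dist.get(t, math.inf))))
```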

Robust coverage and budget allocation: Many widespread applications of submodular functions concern coverage functions. A coverage function takes the following form. There is a set of items $U$, and each $u \in U$ has a weight $w_u$. The algorithm can choose from a ground set $X$ of actions. Each action $j$ covers a set $A_j \subseteq U$. The value of any set of actions $S$ is the total value of the items that those actions cover: $f(S) = \sum_{u \in \bigcup_{j \in S} A_j} w_u$. We can also consider probabilistic extensions where action $j$ covers each item $u \in A_j$ independently with probability $p_{ju}$. This framework includes budget allocation, sensor placement, facility location, and many other common submodular optimization problems. Here we consider a robust coverage problem where the weights $w$ are unknown. For concreteness, we focus on the budget allocation problem, but all of our logic applies to general coverage functions.

Budget allocation models an advertiser's choice of how to divide a finite budget $k$ between a set of advertising channels. Each channel is a vertex on the left hand side of a bipartite graph. The right hand side consists of customers. Each customer $v$ has a value $w_v$ which is the advertiser's expected profit from reaching $v$. The advertiser allocates their budget in integer amounts among the channels. Let $y(s)$ denote the amount of budget allocated to channel $s$. The advertiser solves the problem

$$\max_{y: \sum_s y(s) \leq k} \ \sum_v w_v\left(1 - \prod_s \left(1 - p_{sv}\right)^{y(s)}\right)$$

where $p_{sv}$ is the probability that one unit of advertising on channel $s$ will reach customer $v$. This is a probabilistic coverage problem where the action set contains $k$ copies of each channel and the feasible decisions are all size-$k$ subsets; choosing $i$ copies of channel $s$ corresponds to setting $y(s) = i$. (We use this formulation for simplicity, but it is possible to use fewer copies of each node (Ene and Nguyen, 2016).) Budget allocation has been the subject of a great deal of recent research (Alon et al., 2012; Soma et al., 2014; Miyauchi et al., 2015). A sketch of the objective appears below.
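A minimal sketch of this objective under hypothetical inputs (the names `y`, `p`, and `w` are illustrative):

```python
# Expected profit of an integer allocation y:
#   sum_v w[v] * (1 - prod_s (1 - p[s][v]) ** y[s]).
def expected_profit(y, p, w):
    """y: channel -> budget units; p: channel -> {customer: reach prob}; w: customer -> profit."""
    profit = 0.0
    for v, w_v in w.items():
        miss = 1.0
        for s, units in y.items():
            miss *= (1.0 - p[s].get(v, 0.0)) ** units
        profit += w_v * (1.0 - miss)
    return profit
```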

In the robust optimization problem, the profits $w_v$ are not exactly known. Instead, they belong to a polyhedral uncertainty set $\mathcal{U}$. This is very realistic: while an advertiser may be able to estimate the profit for each customer from past data, they are unlikely to know the true value for any particular campaign. We remark that Staib and Jegelka (2017) also considered a robust budget allocation problem, but their problem has uncertainty on the probabilities $p_{sv}$, not the profits $w$. Further, they consider a continuous problem without the complication of rounding to discrete solutions.

As an example uncertainty set, consider the D-norm uncertainty set, which is common in robust optimization (Bertsimas et al., 2004; Staib and Jegelka, 2017). The uncertainty set is defined around a point estimate $\hat{w}$ as

$$\mathcal{U} = \left\{w : w_v = (1 - \theta_v)\hat{w}_v, \ \theta_v \in [0, 1], \ \sum_v \theta_v \leq \gamma\right\}.$$

This can be thought of as allowing an adversary to scale down each entry of $\hat{w}$ with a total budget of $\gamma$. In our case, $\hat{w}$ is the advertiser's best estimate from past data, and they would like to perform well for all scenarios within $\mathcal{U}$. $\gamma$ defines the advertiser's tolerance for risk. The problem we want to solve is $\max_{p \in \Delta(\mathcal{S})} \min_{w \in \mathcal{U}} \mathbb{E}_{S \sim p}[f_w(S)]$, which we recognize as an instance of Problem 1. For any fixed distribution $p$, we have by linearity of expectation

$$\mathbb{E}_{S \sim p}[f_w(S)] = \sum_v w_v\, \mathbb{E}_{S \sim p}\left[\mathbf{1}\{v \text{ is reached by } S\}\right].$$

Note that the inner expectation (which is the total probability that each $v$ is reached) is constant with respect to $w$. Hence, the adversary's best response problem of computing

$$\min_{w \in \mathcal{U}} \ \sum_v w_v\, \mathbb{E}_{S \sim p}\left[\mathbf{1}\{v \text{ is reached by } S\}\right]$$

is a linear program and can be easily solved. The coefficients of this LP (the inner expectation in the above sum) can easily be computed exactly for any independent distribution. Further, since any LP has an optimal solution among the vertices of its feasible polytope, we can without loss of generality restrict the adversary's pure strategies to a finite (though exponentially large) number.
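Because of the simple structure of the D-norm set, this particular LP can even be solved greedily: spend the scaling budget $\gamma$ on the customers with the largest contribution $\hat{w}_v \cdot \Pr[v \text{ is reached}]$. A minimal sketch (illustrative, not the paper's implementation):

```python
# Adversary best response over the D-norm set: minimize sum_v w_v * reach_prob[v]
# with w_v = (1 - theta_v) * w_hat[v], theta in [0,1], sum theta <= gamma.
def adversary_best_response(w_hat, reach_prob, gamma):
    """reach_prob[v] = Pr[v is covered] under the decision maker's distribution."""
    order = sorted(w_hat, key=lambda v: w_hat[v] * reach_prob[v], reverse=True)
    theta = {v: 0.0 for v in w_hat}
    budget = gamma
    for v in order:
        theta[v] = min(1.0, budget)     # scale down the largest contributors first
        budget -= theta[v]
        if budget <= 0:
            break
    return {v: (1.0 - theta[v]) * w_hat[v] for v in w_hat}
```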

Lastly, we remark that it is also possible to obtain a BRMI for this problem. For any distribution over decisions, we can find a best response via linear programming provided that the coefficients (the probability that each $v$ is reached) can be computed. This is easy when the distribution is given explicitly as a mixture of independent distributions, since we just average over the corresponding term for each individual $x^i$. Hence, we can use Algorithm 3 to obtain a $(1 - 1/e)$-approximation. Nevertheless, we use the original EQUATOR algorithm in our experiments and find that it performs near-optimally despite its theoretically weaker approximation ratio.

Figure 1: Experimental results for network security games.

Experiments

We now show experimental results from applying EQUATOR to these two domains.

Network security games: We first study the network security game defined above. We compare EQUATOR to the SNARES algorithm (Jain et al., 2013), which is the current state-of-the-art algorithm with guaranteed solution quality. SNARES uses a double oracle approach to find a globally optimal solution. However, it incorporates several domain-specific heuristics which substantially improve its runtime over a standard implementation of double oracle. We note that Iwashita et al. (2016) proposed a newer double-oracle-style algorithm which first preprocesses the graph to remove unnecessary edges. We do not compare to this approach because the preprocessing step can be applied equally well to either EQUATOR or double oracle. We use random geometric graphs, which are commonly used to assess algorithms for this domain due to their similarity to real-world road networks (Jain et al., 2013; Iwashita et al., 2016). As in Jain et al. (2013), we use the same graph density, with the value of each target drawn uniformly at random. We set the number of defender resources to be one percent of the number of edges. Each data point averages over 30 random instances. EQUATOR was run with the same parameter settings for all instances.

Figure 1 shows the results. Figures 1(a) and 1(b) vary the network size with three randomly chosen source and target nodes. Figure 1(a) plots utility (i.e., how much loss is averted by the defender's allocation) as a function of the number of nodes. Error bars show one standard deviation. We see that EQUATOR obtains utility within 6% of SNARES, which computes a global optimum. Figure 1(b) shows runtime (on a logarithmic scale) as a function of the number of nodes. SNARES was terminated after 10 hours for graphs with 250 nodes, while EQUATOR easily scales to 1000 nodes. Next, Figures 1(c) and 1(d) show results as the number of sources and targets grows. As expected, utility decreases with more sources/targets since the number of resources is constant and it becomes harder to defend the network. EQUATOR obtains utility within 4% of SNARES. However, SNARES was terminated after 10 hours for just 5 source/targets, while EQUATOR runs in under 25 seconds with 20 source/targets.

Robust budget allocation: We compare three algorithms for robust budget allocation. First, EQUATOR. Second, double oracle. We use the greedy algorithm for the defender's best response (which is a $(1 - 1/e)$-approximation) since the exact best response is intractable. For the adversary's best response, we use the linear program discussed in the section on robust coverage. Third, we compare to “greedy”, which greedily optimizes the advertiser's return under the point estimate $\hat{w}$. Greedy was implemented with lazy evaluation (Minoux, 1978), which greatly improves its runtime at no cost to solution value. We generated random bipartite graphs in which each potential edge is present with a fixed probability and, for each edge $(s, v)$, $p_{sv}$ is drawn uniformly at random. $\hat{w}$ was randomly generated with each coordinate drawn uniformly at random. Our uncertainty set is the D-norm set around $\hat{w}$ with a value of $\gamma$ representing a substantial degree of uncertainty. The budget $k$ was kept small relative to the number of channels, since the problem is hardest in this regime. EQUATOR was run with the same parameter settings as before.

Figure 2 shows the results. Each point averages over 30 random problem instances (error bars would be hidden under the markers). Figure 2(a) plots the profit obtained by each algorithm when the true $w$ is chosen as the worst case in $\mathcal{U}$, with increasing problem size on the x-axis. Figure 2(b) plots the average runtime for each problem size. We see that double oracle produces highly robust solutions. However, for even moderately sized instances, its execution was halted after 10 hours. Greedy is highly scalable, but produces solutions that are approximately 40% less robust than double oracle's. EQUATOR produces solution quality within 7% of double oracle and runs in less than 30 seconds even on the largest instances.

Figure 2: Experimental results for budget allocation.

Next, we show results on a real-world dataset from Yahoo webscope (Yahoo, 2007). The dataset logs bids placed by advertisers on a set of phrases. We create a budget allocation problem where the phrases are advertising channels and the accounts are targets. Other parameters are the same as before. We obtain instances of varying size by randomly sampling a subset of the phrases. Figures 2(c-d) show results (averaging over 30 random instances). In Figure 2(c), we see that both double oracle and EQUATOR find highly robust solutions, with EQUATOR's solution value within 8% of that of double oracle. By contrast, greedy obtains no profit in the worst case on the larger instances, validating the importance of robust solutions on real problems. In Figure 2(d), we observe that double oracle was terminated after 10 hours on the larger instances, while EQUATOR scales to the full range of sizes in under 40 seconds. Hence, EQUATOR is empirically successful at finding highly robust solutions in an efficient manner, complementing its theoretical guarantees.

Discussion and conclusion

This paper introduces the class of submodular best response games, capturing the zero sum interaction between two players when one has a submodular best response problem. Examples include network security games and robust submodular optimization problems. We study the case where the set of possible objective functions is very large (exponential in the problem size), arising from an underlying combinatorial structure. Our main result is a pseudopolynomial time algorithm to compute an approximate minimax equilibrium strategy for the maximizing player when the set of submodular objectives admits a certain form of best response oracle. We instantiate this framework for two example domains, and show experimentally that our algorithm scales to much larger instances than previous approaches.

One interesting direction for future work is to extend this framework to new application domains. Submodular structure is present in many problems, e.g., sensor placement in water networks (Krause et al., 2008) or cyber-security monitoring (Haghtalab et al., 2015). Both seem natural domains for future work, but designing appropriate best response oracles may be algorithmically challenging. Another open direction is to extend our framework to cases where only approximate best responses are available for the adversary. This would enable applications even in settings where an exact BRI is computationally intractable.

Acknowledgments: This research was supported by an NSF Graduate Fellowship. We thank Shaddin Dughmi for helpful conversations.

References

  • Agrawal et al. [2010] Shipra Agrawal, Yichuan Ding, Amin Saberi, and Yinyu Ye. Correlation robust stochastic optimization. In SODA, 2010.
  • Ahmadinejad et al. [2016] AmirMahdi Ahmadinejad, Sina Dehghani, MohammadTaghi Hajiaghayi, Brendan Lucier, Hamid Mahini, and Saeed Seddighin. From duels to battlefields: Computing equilibria of blotto and other games. In AAAI, 2016.
  • Alon et al. [2012] Noga Alon, Iftah Gamzu, and Moshe Tennenholtz. Optimizing budget allocation among channels and influencers. In WWW, pages 381–388. ACM, 2012.
  • Bertsimas et al. [2004] Dimitris Bertsimas, Dessislava Pachamanova, and Melvyn Sim. Robust linear optimization under general norms. Operations Research Letters, 32(6):510–516, 2004.
  • Bian et al. [2017] Andrew An Bian, Baharan Mirzasoleiman, Joachim M. Buhmann, and Andreas Krause. Guaranteed non-convex optimization: Submodular maximization over continuous domains. In AISTATS, 2017.
  • Bosansky et al. [2014] Branislav Bosansky, Christopher Kiekintveld, Viliam Lisy, and Michal Pechoucek. An exact double-oracle algorithm for zero-sum extensive-form games with imperfect information. Journal of Artificial Intelligence Research, 51:829–866, 2014.
  • Bosanskỳ et al. [2015] Branislav Bosanskỳ, Albert Xin Jiang, Milind Tambe, and Christopher Kiekintveld. Combining compact representation and incremental generation in large games with sequential strategies. In AAAI, 2015.
  • Calinescu et al. [2011] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.
  • Chan et al. [2016] Hau Chan, Albert Xin Jiang, Kevin Leyton-Brown, and Ruta Mehta. Multilinear games. In WINE, 2016.
  • Chekuri et al. [2010] Chandra Chekuri, Jan Vondrak, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In FOCS, 2010.
  • Chen et al. [2017] Robert Chen, Brendan Lucier, Yaron Singer, and Vasilis Syrgkanis. Robust optimization for non-convex objectives. In NIPS, 2017.
  • Delage et al. [2016] Erick Delage, Daniel Kuhn, and Wolfram Wiesemann. “dice”-sion making under uncertainty: When can a random decision reduce risk? Technical Report EPFL-ARTICLE-220662, 2016.
  • Duchi et al. [2012] John C Duchi, Peter L Bartlett, and Martin J Wainwright. Randomized smoothing for stochastic optimization. SIAM Journal on Optimization, 22(2):674–701, 2012.
  • Dughmi and Xu [2017] Shaddin Dughmi and Haifeng Xu. Algorithmic persuasion with no externalities. In Proceedings of the 2017 ACM Conference on Economics and Computation, pages 351–368. ACM, 2017.
  • Ene and Nguyen [2016] Alina Ene and Huy L Nguyen. A reduction for optimizing lattice submodular functions with diminishing returns. arXiv preprint arXiv:1606.08362, 2016.
  • Haghtalab et al. [2015] Nika Haghtalab, Aron Laszka, Ariel D Procaccia, Yevgeniy Vorobeychik, and Xenofon Koutsoukos. Monitoring stealthy diffusion. In Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM), pages 151–160. IEEE Computer Society, 2015.
  • Halvorson et al. [2009] Erik Halvorson, Vincent Conitzer, and Ronald Parr. Multi-step multi-sensor hider-seeker games. In IJCAI, 2009.
  • Iwashita et al. [2016] Hiroaki Iwashita, Kotaro Ohori, Hirokazu Anai, and Atsushi Iwasaki. Simplifying urban network security games with cut-based graph contraction. In AAMAS, 2016.
  • Iyer et al. [2014] Rishabh K. Iyer, Stefanie Jegelka, and Jeff A. Bilmes. Monotone closure of relaxed constraints in submodular optimization: Connections between minimization and maximization. In UAI, 2014. URL https://dslpitt.org/uai/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=2471&proceeding_id=30.
  • Jain et al. [2011] Manish Jain, Dmytro Korzhyk, Ondřej Vaněk, Vincent Conitzer, Michal Pěchouček, and Milind Tambe. A double oracle algorithm for zero-sum security games on graphs. In AAMAS, 2011.
  • Jain et al. [2013] Manish Jain, Vincent Conitzer, and Milind Tambe. Security scheduling for real-world networks. In AAMAS, 2013.
  • Korte et al. [2012] Bernhard Korte and Jens Vygen. Combinatorial Optimization: Theory and Algorithms. Springer, 2012.
  • Krause and Golovin [2014] Andreas Krause and Daniel Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press, 2014.
  • Krause et al. [2008] Andreas Krause, H Brendan McMahan, Carlos Guestrin, and Anupam Gupta. Robust submodular observation selection. Journal of Machine Learning Research, 9(Dec):2761–2801, 2008.
  • Krause et al. [2011] Andreas Krause, Alex Roper, and Daniel Golovin. Randomized sensing in adversarial environments. In IJCAI, 2011.
  • McMahan et al. [2003] H. Brendan McMahan, Geoffrey J. Gordon, and Avrim Blum. Planning in the presence of cost functions controlled by an adversary. In ICML, 2003.
  • Minoux [1978] Michel Minoux. Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques, pages 234–243, 1978.
  • Miyauchi et al. [2015] Atsushi Miyauchi, Yuni Iwamasa, Takuro Fukunaga, and Naonori Kakimura. Threshold influence model for allocating advertising budgets. In ICML, pages 1395–1404, 2015.
  • Orlin et al. [2016] James B. Orlin, Andreas S. Schulz, and Rajan Udwani. Robust monotone submodular function maximization. In IPCO, 2016.
  • Soma et al. [2014] Tasuku Soma, Naonori Kakimura, Kazuhiro Inaba, and Ken-ichi Kawarabayashi. Optimal budget allocation: Theoretical guarantee and efficient algorithm. In ICML, pages 351–359, 2014.
  • Staib and Jegelka [2017] Matthew Staib and Stefanie Jegelka. Robust budget allocation via continuous submodular functions. In ICML, 2017.
  • Tsai et al. [2010] Jason Tsai, Zhengyu Yin, Jun-young Kwak, David Kempe, Christopher Kiekintveld, and Milind Tambe. Urban security: Game-theoretic resource allocation in networked physical domains. In National Conference on Artificial Intelligence (AAAI), 2010.
  • Wilder et al. [2017] Bryan Wilder, Amulya Yadav, Nicole Immorlica, Eric Rice, and Milind Tambe. Uncharted but not uninfluenced: Influence maximization with an uncertain network. In AAMAS, 2017.
  • Xu [2016] Haifeng Xu. The mysteries of security games: Equilibrium computation becomes combinatorial algorithm design. In EC, 2016.
  • Yahoo [2007] Yahoo. Yahoo! webscope dataset ydata-ysm-advertiser-bids-v1 0. http://research.yahoo.com/Academic_Relations, 2007.

Appendix A Appendix: Omitted proofs

We start out by proving some lemmas from the main text.

Proof of Lemma 1.

Let $G(x) = \min_i F_i(x)$. We would like to show that for any $x$ and any $d \geq 0$, $G(x + t d)$ is concave as a function of $t$. Fix any $x$, any $d \geq 0$, any $t_1, t_2 \geq 0$, and any $\lambda \in [0, 1]$. We have

$$\begin{aligned} G\big(x + (\lambda t_1 + (1 - \lambda) t_2) d\big) &= \min_i F_i\big(x + (\lambda t_1 + (1 - \lambda) t_2) d\big) \\ &\geq \min_i \left[\lambda F_i(x + t_1 d) + (1 - \lambda) F_i(x + t_2 d)\right] \\ &\geq \lambda \min_i F_i(x + t_1 d) + (1 - \lambda) \min_i F_i(x + t_2 d) \\ &= \lambda G(x + t_1 d) + (1 - \lambda) G(x + t_2 d) \end{aligned}$$

where the first inequality follows because each $F_i$ is individually up-concave. ∎

Proof of Lemma 2.

$G$ is differentiable at a point $x$ precisely when there is a unique $f^*$ such that $F_{f^*}(x) = \min_{f \in \mathcal{F}} F_f(x)$. Here, we have $\nabla G(x) = \nabla F_{f^*}(x)$. Note that $\frac{\partial F_{f^*}}{\partial x_j} = \mathbb{E}_{S \sim \mathbf{x}}\left[f^*(S \cup \{j\}) - f^*(S \setminus \{j\})\right]$. By submodularity, we conclude that $\frac{\partial F_{f^*}}{\partial x_j} \leq f^*(\{j\}) \leq m_{\max}$. Further, $\frac{\partial F_{f^*}}{\partial x_j} \geq 0$ always holds by monotonicity. Thus, $\|\nabla G(x)\|_2 \leq m_{\max}\sqrt{n}$ wherever the gradient exists, which gives the claimed Lipschitz constant. ∎

Let $\mu$ be the uniform probability distribution over the ball of radius $u$. Define the smoothed function $G_\mu(x) = \mathbb{E}_{z \sim \mu}[G(x + z)]$. We will show the following properties of $G_\mu$:

Proof of Lemma 3.

For the first property, we start out by fixing the draw of $z$ from $\mu$. Following the logic of Lemma 1, we have that $G(x + z + t d)$ is concave in $t$ for any $x$, any $d \geq 0$, and the fixed $z$. Since these inequalities hold for any fixed $z$, they also hold in expectation over a random $z$, so we conclude that $G_\mu$ is up-concave.

For the second property: since $\|\nabla G\|_2 \leq m_{\max}\sqrt{n}$, $G$ is $m_{\max}\sqrt{n}$-Lipschitz with respect to the $\ell_2$ norm. Thus, we have

$$G_\mu(x) - G(x) = \mathbb{E}_{z \sim \mu}\left[G(x + z) - G(x)\right] \leq m_{\max}\sqrt{n}\ \mathbb{E}_{z \sim \mu}\|z\|_2 \leq u\, m_{\max}\sqrt{n}$$

and analogously, $G(x) - G_\mu(x) \leq u\, m_{\max}\sqrt{n}$.

The third property follows from the fact that $G$ is differentiable almost everywhere. To see this, note that $G$ is differentiable wherever there is a unique minimizing $f^*$, in which case $\nabla G = \nabla F_{f^*}$. Suppose that there is not a unique minimizer at some point $x$. There are two cases. First, if there is an open ball around $x$ such that the minimizing functions at $x$ coincide at every point in the ball, then their gradients also coincide in the ball. Thus, $G$ is still differentiable at $x$. Second, if no such open ball exists, then the set of points at which $G$ is not differentiable has measure zero. Hence, taking a random perturbation of the input avoids such points with probability 1.

For the proof of the fourth property, we follow the argument of Duchi et al. (2012). We first claim that

(3)

We prove this claim as follows. Without loss of generality, we take $u = 1$ for this step of the proof (via a linear change of variables). Let $g$ be a function that is defined as $\nabla G$ wherever $G$ is differentiable. At the (measure 0) set of points where $G$ is not differentiable, we define $g$ to be equal to $\nabla F_f$ for an arbitrary minimizing $f$. With probability 1, we have