Online Risk-Averse Submodular Maximization

by Tasuku Soma, et al.
The University of Tokyo

We present a polynomial-time online algorithm for maximizing the conditional value at risk (CVaR) of a monotone stochastic submodular function. Given T i.i.d. samples from an underlying distribution arriving online, our algorithm produces a sequence of solutions that converges to a (1-1/e)-approximate solution with a convergence rate of O(T^{-1/4}) for monotone continuous DR-submodular functions. Compared with previous offline algorithms, which require Ω(T) space, our online algorithm only requires O(√T) space. We extend our online algorithm to portfolio optimization for monotone submodular set functions under a matroid constraint. Experiments conducted on real-world datasets demonstrate that our algorithm can rapidly achieve CVaRs that are comparable to those obtained by existing offline algorithms.




1 Introduction

Submodular function maximization is a ubiquitous problem naturally arising in broad areas such as machine learning, social network analysis, economics, combinatorial optimization, and decision-making (Krause2014survey; Buchbinder2018). Submodular maximization in these applications is stochastic in nature, i.e., the input data could be a sequence of samples drawn from some underlying distribution, or there could be some uncertainty in the environment. In this paper, we consider nonnegative monotone continuous DR-submodular (Bian2017) functions $F(\cdot; z)$ parameterized by a random variable $z$ drawn from some distribution $\mathcal{D}$ (see Section 2 for the definition of continuous DR-submodularity). The simplest approach for such stochastic submodular objectives is to maximize the expectation $\mathbb{E}_{z \sim \mathcal{D}}[F(x; z)]$, which has been extensively studied (Karimi2017a; Hassani2017a; Mokhtari2018; Karbasi2019).

However, in real-world decision-making tasks in finance, robotics, and medicine, we sometimes must be risk-averse: We want to minimize the risk of suffering a considerably small gain rather than simply maximizing the expected gain (Mansini2007; Yau2011; Tamar2015). In medical applications, for example, we must avoid catastrophic events such as patient fatalities. In finance and robotics, all progress ceases when poor decisions cause bankruptcies or irreversible damage to robots.

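Conditional value at risk (CVaR) is a popular objective for such risk-averse domains (Rockafellar2000; Krokhmal2002). Formally, given a parameter $\alpha \in (0, 1]$, the CVaR of a feasible solution $x$ is defined as

$$\mathrm{CVaR}_{\alpha,\mathcal{D}}(x) = \mathbb{E}_{z \sim \mathcal{D}}\bigl[F(x; z) \mid F(x; z) \le \mathrm{VaR}_{\alpha,\mathcal{D}}(x)\bigr],$$

where $\mathrm{VaR}_{\alpha,\mathcal{D}}(x)$ is the $\alpha$-quantile of the random variable $F(x; z)$, i.e.,

$$\mathrm{VaR}_{\alpha,\mathcal{D}}(x) = \inf\bigl\{\tau \in \mathbb{R} : \Pr_{z \sim \mathcal{D}}\bigl(F(x; z) \le \tau\bigr) \ge \alpha\bigr\}.$$

Note that when $\alpha = 1$, the CVaR is simply the expected value of $F(x; z)$. Since $\alpha$ is typically set to a small value in practice, we assume that $\alpha$ is a fixed constant throughout this paper. When $\mathcal{D}$ is clear from the context, we omit it from the notations. CVaR can also be characterized by a variational formula:

$$\mathrm{CVaR}_{\alpha,\mathcal{D}}(x) = \max_{\tau \in \mathbb{R}} \Bigl\{ \tau - \frac{1}{\alpha}\, \mathbb{E}_{z \sim \mathcal{D}}\bigl[\tau - F(x; z)\bigr]_+ \Bigr\},$$

where $[t]_+ = \max\{t, 0\}$ (Rockafellar2000).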
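As an illustration of the two characterizations of CVaR, the following minimal sketch computes the empirical CVaR of a finite sample of gains in two ways: by averaging the worst $\alpha$-fraction of the values, and by maximizing the variational objective over $\tau$. The function names, the sample values, and the restriction to the case where $\alpha$ times the sample size is an integer are assumptions made for this illustration and do not come from the paper.

```python
import math

def cvar_tail_average(values, alpha):
    """Average of the worst (smallest) alpha-fraction of the sample.

    Assumes alpha * len(values) is a positive integer, so the tail is
    well defined without splitting probability mass at the quantile.
    """
    k = round(alpha * len(values))
    assert k >= 1 and math.isclose(k, alpha * len(values))
    worst = sorted(values)[:k]
    return sum(worst) / k

def cvar_variational(values, alpha):
    """max over tau of  tau - (1/alpha) * mean(max(tau - v, 0)).

    The objective is concave and piecewise linear in tau with breakpoints
    at the sample values, so a maximizer lies at one of those values.
    """
    n = len(values)

    def objective(tau):
        return tau - sum(max(tau - v, 0.0) for v in values) / (alpha * n)

    return max(objective(tau) for tau in values)

# Hypothetical gains observed in ten scenarios.
gains = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 1.0]
tail = cvar_tail_average(gains, alpha=0.2)
var_form = cvar_variational(gains, alpha=0.2)
```

On this sample both computations average the two smallest gains, which is the sense in which CVaR focuses on the worst-case tail rather than the mean.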

Maehara2015 initiated the study of maximizing the CVaR of stochastic submodular set functions. It was shown that the CVaR of a stochastic submodular set function is not necessarily submodular, and that it is impossible to compute a single set that attains any multiplicative approximation to the optimal CVaR. Ohsaka2017 introduced a relaxed problem of finding a portfolio over sets rather than a single set, and devised the first CVaR maximization algorithm with an approximation guarantee for the influence maximization problem (Kempe2003), a prominent example of discrete submodular maximization. Wilder2018 further considered this approach and devised an algorithm called RASCAL for maximizing the CVaR of continuous DR-submodular functions subject to a down-closed convex set.

The algorithms mentioned above are offline (batch) methods, i.e., a set of samples drawn from the underlying distribution $\mathcal{D}$ is given as the input. However, because the number of samples needed to accurately represent $\mathcal{D}$ can be exceedingly large, it is often inefficient or even impossible to store all the samples in memory. Further, when a new sample is observed, these offline algorithms must be rerun from scratch.

In the machine learning community, online methods, which can efficiently handle large volumes of data, have been extensively studied (Hazan2016OCO). Online methods read data in a streaming fashion, and update the current solution using a few data elements stored in memory. Further, online methods can update a solution at a low computational cost.

1.1 Our contributions

In this work, we propose online algorithms for maximizing the CVaR of stochastic submodular objectives in continuous and discrete settings.

Continuous setting

Let $z_1, \dots, z_T$ be i.i.d. samples drawn from $\mathcal{D}$ arriving sequentially and $K$ be a down-closed convex set. Our main result is a polynomial-time online algorithm, StochasticRASCAL, that finds $x \in K$ such that

$$\mathbb{E}\bigl[\mathrm{CVaR}_{\alpha,\mathcal{D}}(x)\bigr] \ge \Bigl(1 - \frac{1}{e}\Bigr)\, \mathrm{CVaR}_{\alpha,\mathcal{D}}(x^*) - O(T^{-1/4})$$

for any $x^* \in K$, where the expectation is taken over $z_1, \dots, z_T$ and the randomness of the algorithm. StochasticRASCAL only stores $O(\sqrt{T})$ data in memory, in contrast to RASCAL, which stores all $T$ data points. Note that the approximation ratio of $1 - 1/e$ is optimal for any algorithm that performs polynomially many function value queries even if $\alpha = 1$ (Vondrak2013). We also conduct several experiments on real-world datasets to show the practical efficiency of our algorithm. We demonstrate that our algorithm rapidly achieves CVaR comparable to that obtained by known offline methods.

Discrete setting

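As an application of the above algorithm, we devise an online algorithm to create a portfolio that maximizes the CVaR of a discrete submodular function subject to a matroid constraint. Let $f(\cdot; z)$ be a monotone stochastic submodular set function on a ground set $V$ and $\mathcal{I}$ be a matroid on $V$. The goal is to find a portfolio $\mathcal{P}$ of feasible sets in $\mathcal{I}$ that maximizes

$$\mathrm{CVaR}_{\alpha,\mathcal{D}}(\mathcal{P}) = \max_{\tau \in \mathbb{R}} \Bigl\{ \tau - \frac{1}{\alpha}\, \mathbb{E}_{z \sim \mathcal{D}}\bigl[\tau - \mathbb{E}_{X \sim \mathcal{P}} f(X; z)\bigr]_+ \Bigr\}$$

given i.i.d. samples from $\mathcal{D}$. We show that this problem can be reduced to online CVaR maximization of continuous DR-submodular functions on a matroid polytope, and devise a polynomial-time online approximation algorithm that creates a portfolio $\mathcal{P}$ whose expected CVaR is, up to an additive error that vanishes as $T$ grows, at least $(1 - 1/e)$ times the CVaR of any portfolio $\mathcal{P}^*$ over feasible sets. Note that this algorithm is the first online algorithm that converges to a $(1 - 1/e)$-approximation portfolio in the discrete setting, which generalizes the known offline algorithm (Wilder2018) to the i.i.d. setting.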

1.2 Our techniques

To analyze our online algorithms, we introduce a novel adversarial online learning problem, which we call adversarial online submodular CVaR learning. This online learning problem is described as follows. For $t = 1, \dots, T$, the learner chooses $x_t \in K$ and $\tau_t$, possibly in a randomized manner. After $x_t$ and $\tau_t$ are chosen, the adversary reveals a monotone continuous DR-submodular function $F_t$ to the learner. The goal of the learner is to minimize the approximate regret

$$\mathrm{regret}_{1-1/e}(T) = \Bigl(1 - \frac{1}{e}\Bigr) \sum_{t=1}^{T} H_t(x^*, \tau^*) - \sum_{t=1}^{T} H_t(x_t, \tau_t)$$

for arbitrary $x^* \in K$ and $\tau^*$, where the function $H_t$ is given by

$$H_t(x, \tau) = \tau - \frac{1}{\alpha}\bigl[\tau - F_t(x)\bigr]_+ .$$

We devise an efficient algorithm that achieves sublinear approximate regret in expectation. Further, we show that, given an online algorithm with a sublinear approximate regret, we can construct an online algorithm that achieves a $(1 - 1/e)$-approximation to CVaR maximization, whose convergence rate is determined by the regret bound. Combining these results, we obtain an online $(1 - 1/e)$-approximation algorithm for CVaR maximization with a convergence rate of $O(T^{-1/4})$.

We remark that adversarial online submodular CVaR learning may be of interest in its own right: although the objective function $H_t$ is neither monotone nor continuous DR-submodular in general, we can design an online algorithm with a sublinear $(1 - 1/e)$-regret by exploiting the underlying structure of $H_t$. To the best of our knowledge, no online algorithm for non-monotone and non-DR-submodular maximization previously existed in the literature.

1.3 Related work

Several studies have focused on CVaR optimization in the adversarial online and i.i.d. settings. Tamar2015 studied CVaR optimization over i.i.d. samples and analyzed stochastic gradient descent under the strong assumption that CVaR is continuously differentiable. Recently, the concept of the CVaR regret for convex loss functions was introduced, together with online algorithms for minimizing the CVaR regret under bandit feedback.

Online and stochastic submodular maximization have been extensively studied (Streeter2008; Streeter2009; Golovin2014; Karimi2017a; Hassani2017a; Mokhtari2018; Chen2018; Roughgarden2018; Soma2019; Karbasi2019; Zhang2019). These studies optimize either the approximate regret or the expectation and do not consider CVaR.

Another line of related work is robust submodular maximization (Krause2008b; Chen2017a; Anari2019). In robust submodular maximization, we maximize the minimum of $N$ submodular functions, i.e., $\min_{i = 1, \dots, N} f_i$. Robust submodular maximization is the limit of CVaR maximization, where $\mathcal{D}$ is the uniform distribution over $N$ values and $\alpha \to 0$. Recently, Staib2019b proposed distributionally robust submodular optimization, which maximizes the worst-case expectation over an uncertainty set of distributions. It is known that CVaR can be formulated in the distributionally robust framework (Shapiro2014). However, the algorithms proposed by Staib2019b require that the uncertainty set is a subset of the $N$-dimensional probability simplex; moreover, their time complexity depends on $N$. Our algorithms work even if the support of $\mathcal{D}$ is infinite.

1.4 Organization of this paper

This paper is organized as follows. Section 2 introduces the background of submodular optimization. Sections 3 and 4 describe our algorithms for the continuous and discrete settings, respectively. Section 5 presents experimental results using real-world datasets. The omitted analysis and the details of the adversarial setting can be found in the Appendix.

2 Preliminaries

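Throughout the paper, $V$ denotes the ground set and $n$ denotes the size of the ground set. For a set function $f : 2^V \to \mathbb{R}$, the multilinear extension $F : [0, 1]^V \to \mathbb{R}$ is defined as $F(x) = \sum_{S \subseteq V} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$. For a matroid on $V$, the base polytope is the convex hull of bases of the matroid. It is well-known that the linear optimization on a base polytope can be solved by the greedy algorithm (Fujishige2005).

We denote the Euclidean norm and inner product by $\|\cdot\|$ and $\langle \cdot, \cdot \rangle$, respectively. The $\ell_p$ norm ($1 \le p \le \infty$) is denoted by $\|\cdot\|_p$. The Euclidean projection of $x$ onto a set $K$ is denoted by $\mathrm{proj}_K(x)$. A convex set $K \subseteq \mathbb{R}^n_{\ge 0}$ is said to be down-closed if $y \in K$ and $0 \le x \le y$ imply $x \in K$. A function $F$ is said to be $L$-Lipschitz (continuous) for $L > 0$ if $|F(x) - F(y)| \le L \|x - y\|$ for all $x, y$. We say that $F$ is $\beta$-smooth for $\beta > 0$ if $F$ is continuously differentiable and $\|\nabla F(x) - \nabla F(y)\| \le \beta \|x - y\|$ for all $x, y$. A smooth function $F$ is said to be continuous DR-submodular (Bian2017) if $\nabla F(x) \ge \nabla F(y)$ for all $x \le y$. The multilinear extension of a submodular function is known to be DR-submodular (Calinescu2011). The continuous DR-submodularity implies up-concavity: for a continuous DR-submodular function $F$, a point $x$, and a direction $d \ge 0$, the univariate function $t \mapsto F(x + td)$ is concave.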
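To make the multilinear extension concrete, here is a minimal sketch that evaluates it by exact enumeration for a tiny coverage function and compares the marginal gain of raising one coordinate at a low point and at a high point. The coverage instance and all function names are hypothetical and chosen only for illustration; exact enumeration is exponential in the ground-set size and is not how the extension would be evaluated at scale.

```python
from itertools import combinations

def multilinear_extension(f, ground, x):
    """F(x) = sum_S f(S) * prod_{i in S} x_i * prod_{i not in S} (1 - x_i).

    Exact enumeration over all subsets; only meant for tiny ground sets.
    """
    total = 0.0
    for r in range(len(ground) + 1):
        for subset in combinations(ground, r):
            chosen = set(subset)
            prob = 1.0
            for i in ground:
                prob *= x[i] if i in chosen else 1.0 - x[i]
            total += prob * f(chosen)
    return total

# Hypothetical coverage function: element i covers the items in COVERS[i].
COVERS = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}

def coverage(S):
    covered = set()
    for i in S:
        covered |= COVERS[i]
    return float(len(covered))

ground = [0, 1, 2]

# At an indicator vector the extension agrees with the set function.
at_vertex = multilinear_extension(coverage, ground, {0: 1.0, 1: 0.0, 2: 1.0})

def marginal_gain(x, i, step):
    """Increase in F when coordinate i of x is raised by step."""
    bumped = dict(x)
    bumped[i] = x[i] + step
    return (multilinear_extension(coverage, ground, bumped)
            - multilinear_extension(coverage, ground, x))

gain_low = marginal_gain({0: 0.0, 1: 0.0, 2: 0.0}, i=1, step=0.5)
gain_high = marginal_gain({0: 1.0, 1: 0.0, 2: 1.0}, i=1, step=0.5)
```

The gain of raising the same coordinate is no larger at the coordinate-wise larger point, which is the diminishing-returns behavior that the DR-submodularity definition formalizes.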

The uniform distribution of a set is denoted by

. The standard normal distribution is denoted by


3 CVaR Maximization of Continuous DR-submodular Functions

We present our online algorithm for CVaR maximization via i.i.d. samples. Let $F_t$ be the monotone continuous DR-submodular function corresponding to the $t$-th sample $z_t$, i.e.,

$$F_t(x) = F(x; z_t)$$

for $t = 1, \dots, T$. Similarly, define an auxiliary function $H_t$ with respect to $F_t$ by

$$H_t(x, \tau) = \tau - \frac{1}{\alpha}\bigl[\tau - F_t(x)\bigr]_+$$

for $t = 1, \dots, T$. Let $K$ be a down-closed convex set. Formally, we make the following very mild assumptions on $F_t$ and $K$.

Assumption 1.


  • For all , is -Lipschitz and -smooth, and .

  • The diameter of is bounded by .

  • We are given a linear optimization oracle over .

When the underlying norm is the -norm with , we write to emphasize it. For example, if is the multilinear extension of a submodular set function and is the base polytope of a rank- matroid, we have , , .

Our algorithm borrows some ideas from an algorithm called RASCAL (Wilder2018). First, we define a smoothed auxiliary function as

where is a smoothing parameter specified later. This smoothing guarantees that is differentiable for all and has Lipschitz continuous gradients.

Lemma 2 (Lemma 6 of Wilder2018).


  1. for all and .

  2. If is -Lipschitz and -smooth, and , then is -Lipschitz.

Lemma 3 (Wilder2018).

The function is monotone and up-concave.

3.1 StochasticRASCAL

We now formally describe our algorithm, StochasticRASCAL. We note that RASCAL runs the Frank-Wolfe algorithm on the smoothed auxiliary objective averaged over all samples and maximized over $\tau$. Owing to the up-concavity and smoothness properties of this function, one can obtain a $(1 - 1/e)$-approximation. However, in our online setting, we cannot evaluate this function because the samples are revealed online, and hence we cannot simply run RASCAL.

To overcome the issue above, first we split the samples into mini-batches of length , which we specify later. The key idea is to use the following objective function

for each -th mini-batch (). We can see that is monotone, up-concave, and can be evaluated only using samples in the -th mini-batch. Then, we run a perturbed version of Frank-Wolfe algorithm (Golovin2014; Bian2017) on . More formally, we first initialize and for each , we perform the update

where is the step size, is a solution to a perturbed linear optimization problem on :


is a perturbation vector. This perturbation trick aims to stabilize the algorithm so that we can maximize the true objective

using only mini-batch objectives.

In each iteration of continuous greedy, we need the gradients of the smoothed mini-batch objective with respect to $x$, which in turn requires us to compute the optimal $\tau$ at the current point. These gradients and the optimal $\tau$ can be computed by the SmoothGrad and SmoothTau subroutines, respectively, which were proposed in Wilder2018; see Algorithms 2 and 3.

Let us write for . The final output of StochasticRASCAL is for a random index chosen uniformly at random from . The pseudocode of StochasticRASCAL is presented in Algorithm 1.

0:  learning rates , step size , perturbation distribution , smoothing parameter , and mini-batch size
1:  Initialize arbitrary.
2:  for  do
3:     Observe samples and store them in mini-batch .
4:     /* continuous greedy */
5:     Let .
6:     for  do
8:        .
9:        Find a vertex of that maximizes for , where . // FPL
11:     end for
12:     Let .
13:  end for
14:  return   for chosen from uniformly at random.
Algorithm 1 StochasticRASCAL
0:  , , , and mini-batch
1:   for .
2:  return  
Algorithm 2 SmoothGrad
0:  , , and mini-batch
2:  Sort in ascending order, obtaining .
6:  return  the solution of the linear equation
Algorithm 3 SmoothTau
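
The inner loop of Algorithm 1 combines a continuous greedy (Frank-Wolfe) update with a perturbed linear maximization step. The following minimal sketch shows only that combination on a toy problem. It is a simplification under stated assumptions and not the authors' implementation: it omits $\tau$, the smoothing, and the accumulation of gradients across mini-batches; the objective $F(x) = 1 - \prod_i (1 - w_i x_i)$, the feasible set $\{x \in [0,1]^n : \sum_i x_i \le k\}$, the uniform perturbation, and all names are hypothetical.

```python
import math
import random

def grad_toy(x, w):
    """Gradient of the toy objective F(x) = 1 - prod_i (1 - w_i * x_i)."""
    n = len(x)
    grad = []
    for i in range(n):
        prod = 1.0
        for j in range(n):
            if j != i:
                prod *= 1.0 - w[j] * x[j]
        grad.append(w[i] * prod)
    return grad

def linear_max_vertex(score, k):
    """Vertex of {x in [0,1]^n : sum(x) <= k} maximizing <score, x>."""
    order = sorted(range(len(score)), key=lambda i: score[i], reverse=True)
    vertex = [0.0] * len(score)
    for i in order[:k]:
        if score[i] > 0:
            vertex[i] = 1.0
    return vertex

def perturbed_continuous_greedy(w, k, steps, noise_scale, rng):
    """Start at 0 and move by 1/steps toward a perturbed best vertex each step."""
    x = [0.0] * len(w)
    for _ in range(steps):
        g = grad_toy(x, w)
        # Random perturbation added to the gradient before linear maximization.
        score = [gi + noise_scale * rng.uniform(0.0, 1.0) for gi in g]
        v = linear_max_vertex(score, k)
        x = [xi + vi / steps for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
weights = [0.9, 0.5, 0.4, 0.1]
x = perturbed_continuous_greedy(weights, k=2, steps=50, noise_scale=0.05, rng=rng)
value = 1.0 - math.prod(1.0 - wi * xi for wi, xi in zip(weights, x))
```

Because the output is an average of vertices of the feasible set, it stays feasible without any projection step, which is the usual reason for preferring Frank-Wolfe style updates when only a linear optimization oracle is available.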

3.2 Convergence rate via regret bounds

Let us consider the convergence rate of StochasticRASCAL. The main challenge of the analysis is how to set the parameters used in the algorithm, i.e., learning rates , step size , perturbation distribution , smoothing parameter , and mini-batch size , to achieve the desired convergence rate.

To this end, using tools from online convex optimization, we prove an approximate regret bound for a variant of StochasticRASCAL for adversarial online submodular CVaR learning (see Introduction for the definition).

Theorem 4 (informal).

There exists an efficient online algorithm for adversarial online CVaR learning with

for an arbitrary and , where the big-O notation hides a factor polynomial in , and .

We then show that the above regret bound can be used to derive a convergence rate for StochasticRASCAL. The technical details of the adversarial setting and the proof of the following theorem are deferred to the Appendix.

Theorem 5.

Under Assumption 1, StochasticRASCAL outputs such that for any ,

where we set , , , , and , and . Further, if is an integral polytope contained in , then

for , , , , and .

To achieve for a desired error , StochasticRASCAL requires samples and space, whereas RASCAL (Wilder2018) requires samples and space. Our algorithm runs in a smaller space when the parameters are of moderate size. For example, if and , the space complexity of StochasticRASCAL is better than that of RASCAL.

4 CVaR Maximization of Discrete Submodular Functions

We now present our online algorithm for a monotone submodular set function and a matroid constraint. Let be a monotone submodular function corresponding to the -th sample and be its multilinear extension for .

The basic idea is to run StochasticRASCAL on the multilinear extensions and the matroid polytope . However, we must address several technical obstacles. First, we must compare the output portfolio with the optimal portfolio; the error bound in the previous sections compared it with the optimal solution. To this end, we make multiple copies of variables so that we can approximate an optimal portfolio by a uniform distribution over a multiset of feasible solutions. More precisely, we define a continuous DR-submodular function by

for some sufficiently large . Then, we feed to StochasticRASCAL. Suppose that we obtain at Line 12 for each mini-batch . Abusing the notation, let us denote when the -th sample is in the -th mini-batch.

Next, we need to convert to feasible sets without significantly deteriorating the values of the multilinear extensions. To this end, we independently apply randomized swap rounding (Chekuri2010) times to each to obtain feasible sets . Note that randomized swap rounding is oblivious rounding and independent from . We can show that is close to by using a concentration inequality. Finally, after rounds, we return the uniform portfolio over all . The pseudocode is given in Algorithm 4. Carefully choosing and , we obtain the following theorem.

1:  Run StochasticRASCAL for and the matroid polytope and let be the temporary solution at Line 12 in StochasticRASCAL for .
2:   for , , and .
3:  return  Uniform portfolio over all .
Algorithm 4 Online algorithm for maximizing a monotone submodular set function subject to a matroid constraint.
Theorem 6.

Algorithm 4 achieves

for arbitrary portfolio , where we set and and the expectation is taken over and the randomness of the algorithm.

5 Experiments

Figure 1: CVaR and the number of samples. Panels: (a) NetScience, (b) EuroRoad, (c) BWSN.
Figure 2: CVaR and budget. Panels: (a) NetScience, (b) EuroRoad, (c) BWSN.

In this section, we show our experimental results. In all the experiments, the parameter of CVaR was set to . The experiments were conducted on a Linux server with Intel Xeon Gold 6242 (2.8GHz) and 384GB of main memory.

Problem Description.

We conducted experiments on the sensor resource allocation problem, in which the goal is to rapidly detect a contagion spreading through a network using a limited budget (Bian2017; leskovec2007cost; soma2015generalization).

Here, we follow the configuration of the experiments conducted in Wilder2018. Let be a graph on vertices. A contagion starts at a random vertex and spreads over time according to some specific stochastic process. Let be the time at which the contagion reaches , and let . If for some vertex , that is, the contagion does not reach , we reassign , as described in Wilder2018.

The decision maker has a budget (e.g., energy) to spend on sensing resources. Let represent the amount of energy allocated to the sensor at a vertex . When contagion reaches at time , the sensor detects it with probability , where is the probability that detects the contagion per unit of energy. The objective on vectors and is the expected amount of detection time that is saved by the sensor placements:

where the vertices are ordered so that . It is known that the function is DR-submodular (Bian2017).
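
As a rough illustration of this kind of objective, the sketch below computes the expected detection time saved in a single scenario. It is written under assumptions that are not confirmed by the text above: sensors detect independently, a sensor with energy $x_v$ detects with probability $1 - (1 - p)^{x_v}$, the first sensor to detect (in order of arrival time) determines the detection time, and the saving is measured against a cap `t_inf`. All names are hypothetical.

```python
def expected_time_saved(x, arrival, p, t_inf):
    """Expected detection time saved by allocation x in one scenario.

    x[v]       : energy allocated to the sensor at vertex v
    arrival[v] : time at which the contagion reaches v (at most t_inf)
    p          : detection probability per unit of energy
    t_inf      : cap on arrival times (assumed reference point for savings)
    """
    order = sorted(range(len(x)), key=lambda v: arrival[v])
    saved = 0.0
    none_detected_so_far = 1.0
    for v in order:
        detect = 1.0 - (1.0 - p) ** x[v]
        # Vertex v is the first detector only if all earlier sensors failed.
        saved += none_detected_so_far * detect * (t_inf - arrival[v])
        none_detected_so_far *= 1.0 - detect
    return saved

arrival = [2.0, 5.0, 10.0]
base = expected_time_saved([0, 0, 0], arrival, p=0.5, t_inf=10.0)
one = expected_time_saved([1, 0, 0], arrival, p=0.5, t_inf=10.0)
other = expected_time_saved([0, 1, 0], arrival, p=0.5, t_inf=10.0)
both = expected_time_saved([1, 1, 0], arrival, p=0.5, t_inf=10.0)
```

In this toy scenario, adding energy at the second vertex helps less once the first vertex already has a sensor, which matches the diminishing-returns property stated above.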


We considered two sensing models and generated three datasets. In all of them, the source vertex is chosen uniformly at random.

The first model is the continuous time independent cascade model (CTIC). In this model, each edge $(u, v)$ has propagation time $\rho_{uv}$ drawn from an exponential distribution with a given mean. The contagion starts at the source vertex $s$, i.e., $t_s = 0$, and we iteratively set $t_v = \min_{u \in N(v)} (t_u + \rho_{uv})$, where $N(v)$ is the set of neighbors of $v$. Note that $t_v$ is the first time that the contagion reaches $v$ from its neighbor. We generated datasets using two real-world networks: NetScience, a collaboration network of 1,461 network scientists, and EuroRoad, a network of 1,174 European cities and the roads between them. For both networks, we set and , and we generated 1,000 scenarios.
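
The iterative rule for arrival times is a shortest-path computation with random edge lengths, so one scenario can be generated with Dijkstra's algorithm. The sketch below is a minimal version under assumptions not stated in the text: the graph is treated as undirected, one delay is sampled per edge, and unreached vertices are left at infinity for the caller to cap. The function name and the toy graph are hypothetical.

```python
import heapq
import random

def ctic_arrival_times(n, edges, source, mean_delay, rng):
    """Arrival times for one scenario with exponential edge delays."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        # expovariate takes the rate, i.e. 1 / mean.
        d = rng.expovariate(1.0 / mean_delay)
        adj[u].append((v, d))
        adj[v].append((u, d))
    arrival = [float("inf")] * n
    arrival[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival[u]:
            continue  # stale heap entry
        for v, d in adj[u]:
            if t + d < arrival[v]:
                arrival[v] = t + d
                heapq.heappush(heap, (t + d, v))
    return arrival

rng = random.Random(1)
# Path 0 - 1 - 2 plus an isolated vertex 3.
times = ctic_arrival_times(4, [(0, 1), (1, 2)], source=0, mean_delay=2.0, rng=rng)
```

Repeating this with a fresh random source and fresh delays yields a set of scenarios of the kind described above.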

The second model, known as the Battle of Water Sensor Networks (BWSN) (ostfeld2008battle), involves contamination detection in a water network. BWSN simulates the spread of contamination through a 126-vertex water network consisting of junctions, tanks, pumps, and the links between them, and the values are provided by a simulator. We set and generated 1,000 scenarios.


We compared our method against two offline algorithms, RASCAL (Wilder2018) and the Frank-Wolfe (FW) algorithm (Bian2017). We note that the latter algorithm is designed to maximize the expectation of a DR-submodular function instead of its CVaR. We ran those offline methods on the generated 1,000 scenarios for each dataset. As our method is an online algorithm, we ran it on 20,000 samples in an online manner, where each sample was drawn uniformly at random from the set of generated scenarios.


Figure 1 shows how the CVaR changes as the number of samples increases. For each dataset, as long as the batch size is not excessively small, the CVaR attained by our method approaches that attained by RASCAL. The FW algorithm showed significantly lower performance because it is not designed to maximize CVaR.

Figure 2 shows how the CVaR changes as the budget increases. For our method, we plotted the CVaR after processing samples. We can again confirm that the CVaR attained by our method is close to that attained by RASCAL.

6 Conclusion

We devised StochasticRASCAL for maximizing the CVaR of a monotone stochastic submodular function. We showed that StochasticRASCAL finds a $(1 - 1/e)$-approximate solution with a convergence rate of $O(T^{-1/4})$ for monotone continuous DR-submodular functions. We extended it to portfolio optimization for monotone submodular set functions under a matroid constraint. Experiments using CTIC and BWSN datasets demonstrated that our algorithm can rapidly achieve CVaRs that are comparable to those of RASCAL.


T.S. is supported by JST, ERATO, Grant Number JPMJER1903, Japan. Y.Y. is supported in part by JSPS KAKENHI Grant Number 18H05291 and 20H05965.


Appendix A Adversarial Setting

In this section, we present an online algorithm for CVaR maximization in an adversarial environment.

A.1 Preliminaries on Online Convex Optimization

We use the framework called online convex optimization (OCO) extensively, and will briefly explain it below. For details, the reader is referred to a monograph (Hazan2016OCO). In OCO, the learner is given a compact convex set $K$. For each round $t = 1, \dots, T$, the learner must select $x_t \in K$ and then the adversary reveals a concave reward function $f_t$ to the learner. (Although OCO is usually formulated as convex minimization, we state OCO in the form of concave maximization for later use. Note that we can convert minimization to maximization by negating the objective function.) The goal of the learner is to minimize the (1-)regret

$$\mathrm{regret}_1(T) = \max_{x^* \in K} \sum_{t=1}^{T} f_t(x^*) - \sum_{t=1}^{T} f_t(x_t).$$

An important subclass of OCO is online linear optimization (OLO), in which the objective functions are linear.

We use the following OCO algorithms. Let be a learning rate.

Online Gradient Descent (OGD)

Follow the Perturbed Leader (FPL)

where is a perturbation term drawn from some distribution .
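
For readers unfamiliar with these two methods, the following minimal sketch shows one common textbook form of each in the concave-maximization convention used here: OGD as a projected gradient ascent step on an interval, and FPL as a choice of the vertex that maximizes the inner product with the cumulative gradient plus a random perturbation. The placement of the learning rate, the interval domain, the finite vertex set, and all names are assumptions for illustration and may differ from the exact updates used in the paper.

```python
def ogd_step(x, grad, eta, lo=0.0, hi=1.0):
    """Projected gradient ascent step on the interval [lo, hi]."""
    return min(hi, max(lo, x + eta * grad))

def fpl_choice(vertices, cumulative_grad, perturbation):
    """Vertex maximizing <cumulative_grad + perturbation, v>."""
    def score(v):
        return sum(vi * (gi + ri)
                   for vi, gi, ri in zip(v, cumulative_grad, perturbation))
    return max(vertices, key=score)

# OGD on the concave rewards f_t(x) = -(x - 0.7)^2, whose gradient is -2 (x - 0.7).
x = 0.0
for _ in range(200):
    x = ogd_step(x, grad=-2.0 * (x - 0.7), eta=0.1)

# FPL over the vertices of the 3-dimensional simplex.
simplex_vertices = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
leader = fpl_choice(simplex_vertices, [1.0, 3.0, 2.0], [0.1, 0.0, 0.2])
perturbed = fpl_choice(simplex_vertices, [1.0, 3.0, 2.0], [5.0, 0.0, 0.0])
```

A small perturbation leaves the current leader unchanged, while a large one can switch the choice; the randomness is what keeps consecutive choices stable in expectation.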

Lemma 7 (Hazan2016OCO).

If and the diameter of is , then the OGD sequence satisfies

Lemma 8 (Hazan2016OCO; Cohen and Hazan [2015]).

If , , and the diameter of is , then the FPL sequence with satisfies

where the expectation is taken over the randomness in the algorithm. Further, if and is an integral polytope contained in , then

for .

A.2 Online Algorithm for Adversarial Online CVaR Learning

We now present our online algorithm, OnlineRASCAL, for adversarial online submodular CVaR learning. Let be a monotone continuous DR-submodular function with for . Let be a down-closed convex set. OnlineRASCAL maintains . Let us consider the -th mini-batch and denote a variable in this mini-batch by . Within the -th mini-batch, we play the same and use OGD to learn . At the end of the -th mini-batch, we update using online continuous greedy where each inner iteration performs FPL. The pseudocode can be found in Algorithm 5.

0:  learning rates , step size , FPL distribution , and mini-batch size
1:  Initialize .
2:  for  do
3:     /* learn in mini-batch */
4:     Initialize .
5:     for  do
6:        Play and observe .
7:        .
8:        . // OGD
9:     end for
10:     Let .
11:     /* continuous greedy */
12:     Let .
13:     for  do
14:        Compute via SmoothGrad and SmoothTau.
15:        Take a vertex of that maximizes for , where . // FPL
17:     end for
18:     Let .
19:  end for
Algorithm 5 OnlineRASCAL
Theorem 9.

Under Assumption 1, OnlineRASCAL achieves

for arbitrary and , where is the 1-regret of the -th FPL, is the 1-regret of OGD, , and is the diameter of .

Before diving into the formal proof, we outline the proof. Recall that the 1-regret of the -th FPL algorithm is

for each . Then, using the up-concavity of , we can prove

for each . On the other hand, for each , we have

where is the smoothness parameter of . By combining these two inequalities, we can show

via the standard analysis of continuous greedy.

The next step is to relate with the regret in terms of . We have

by the definition of the 1-regret of OGD. Combining these two bounds, we can prove Theorem 9.

Proof of Theorem 9.

First, we obtain

( and )
(definition of 1-regret)

On the other hand, we have