1 Introduction
Submodular function maximization
is a ubiquitous problem naturally arising in broad areas such as machine learning, social network analysis, economics, combinatorial optimization, and decision-making
(Krause2014survey; Buchbinder2018). Submodular maximization in these applications is stochastic in nature, i.e., the input data could be a sequence of samples drawn from some underlying distribution, or there could be some uncertainty in the environment. In this paper, we consider nonnegative monotone continuous DR-submodular (Bian2017) functions $F(\cdot, \theta)$ (see Section 2 for the definition of continuous DR-submodularity) parameterized by a random variable $\theta$
drawn from some distribution $\mathcal{D}$. The simplest approach for such stochastic submodular objectives is to maximize the expectation $\mathbb{E}_{\theta \sim \mathcal{D}}[F(x, \theta)]$, which has been extensively studied (Karimi2017a; Hassani2017a; Mokhtari2018; Karbasi2019). However, in real-world decision-making tasks in finance, robotics, and medicine, we sometimes must be risk-averse: we want to minimize the risk of suffering a considerably small gain rather than simply maximizing the expected gain (Mansini2007; Yau2011; Tamar2015). In medical applications, for example, we must avoid catastrophic events such as patient fatalities. In finance and robotics, all progress ceases when poor decisions cause bankruptcies or irreversible damage to robots.
Conditional value at risk (CVaR) is a popular objective for such risk-averse domains (Rockafellar2000; Krokhmal2002). Formally, given a parameter $\alpha \in (0, 1]$, the CVaR of a feasible solution $x$ is defined as
$$\mathrm{CVaR}_\alpha(x) = \mathbb{E}_{\theta \sim \mathcal{D}}\left[F(x, \theta) \mid F(x, \theta) \le \mathrm{VaR}_\alpha(x)\right],$$
where $\mathrm{VaR}_\alpha(x)$ is the
$\alpha$-quantile of the random variable
$F(x, \theta)$, i.e., $\mathrm{VaR}_\alpha(x) = \inf\{\tau \in \mathbb{R} : \Pr_{\theta \sim \mathcal{D}}[F(x, \theta) \le \tau] \ge \alpha\}$. Note that when $\alpha = 1$, the CVaR is simply the expected value of $F(x, \theta)$. Since $\alpha$ is typically set to a small constant in practice, we assume that $\alpha$ is a fixed constant throughout this paper. When $\alpha$ is clear from the context, we omit it from the notation. CVaR can also be characterized by a variational formula:
$$\mathrm{CVaR}_\alpha(x) = \max_{\tau \in \mathbb{R}} \left\{ \tau - \frac{1}{\alpha} \mathbb{E}_{\theta \sim \mathcal{D}}\left[\max\{\tau - F(x, \theta), 0\}\right] \right\},$$
where the maximum is attained at $\tau = \mathrm{VaR}_\alpha(x)$ (Rockafellar2000).
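The variational formula can be checked numerically on empirical samples. The sketch below (the name `cvar_empirical` is ours, not from the paper) exploits the fact that the maximizing $\tau$ is the empirical $\alpha$-quantile:

```python
import numpy as np

def cvar_empirical(values, alpha):
    """Empirical CVaR_alpha of sampled gains `values` (larger is better).

    Uses the variational form CVaR = max_tau { tau - E[(tau - f)^+] / alpha },
    whose maximizer is the empirical alpha-quantile (= VaR_alpha).
    """
    values = np.asarray(values, dtype=float)
    tau = np.quantile(values, alpha)  # empirical alpha-quantile
    return tau - np.mean(np.maximum(tau - values, 0.0)) / alpha

# Sanity checks: alpha = 1 recovers the mean; small alpha focuses on the
# worst outcomes.
gains = np.array([1.0, 2.0, 3.0, 4.0])
print(cvar_empirical(gains, 1.0))   # mean = 2.5
print(cvar_empirical(gains, 0.25))  # average of the worst 25% = 1.0
```

For $\alpha = 1$ the hinge term averages the full deficit below the maximum, recovering the plain expectation, which matches the remark above.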
Maehara2015 initiated the study of maximizing the CVaR of stochastic submodular set functions. It was shown that the CVaR of a stochastic submodular set function is not necessarily submodular, and that it is impossible to compute a single set that attains any multiplicative approximation to the optimal CVaR. Ohsaka2017 introduced a relaxed problem of finding a portfolio over sets rather than a single set, and devised the first CVaR maximization algorithm with an approximation guarantee for the influence maximization problem (Kempe2003), a prominent example of discrete submodular maximization. Wilder2018 further pursued this approach and devised an algorithm called RASCAL for maximizing the CVaR of continuous DR-submodular functions subject to a down-closed convex set.
The algorithms mentioned above are offline (batch) methods, i.e., a set of samples drawn from the underlying distribution is given as the input. However, because the number of samples needed to accurately represent $\mathcal{D}$ can be exceedingly large, it is often inefficient or even impossible to store all the samples in memory. Further, when a new sample is observed, these offline algorithms must be rerun from scratch.
In the machine learning community, online methods, which can efficiently handle large volumes of data, have been extensively studied (Hazan2016OCO). Online methods read data in a streaming fashion, and update the current solution using a few data elements stored in memory. Further, online methods can update a solution at a low computational cost.
1.1 Our contributions
In this work, we propose online algorithms for maximizing the CVaR of stochastic submodular objectives in continuous and discrete settings.
Continuous setting
Let $\theta_1, \theta_2, \dots$ be i.i.d. samples drawn from $\mathcal{D}$ arriving sequentially and $\mathcal{C}$ be a down-closed convex set. Our main result is a polynomial-time online algorithm, StochasticRASCAL, whose iterates attain, in expectation, a $(1 - 1/e)$-approximation to $\mathrm{CVaR}_\alpha(x^*)$ for any $x^* \in \mathcal{C}$, up to an additive error vanishing in the number of samples; here the expectation is taken over the samples and the randomness of the algorithm. StochasticRASCAL stores only a single minibatch of data in memory, drawing a contrast to RASCAL, which stores all data points. Note that the approximation ratio of $1 - 1/e$ is optimal for any algorithm that performs polynomially many function value queries, even in the non-risk-averse case of $\alpha = 1$ (Vondrak2013). We also conduct several experiments on real-world datasets to show the practical efficiency of our algorithm. We demonstrate that our algorithm rapidly achieves CVaR comparable to that obtained by known offline methods.
Discrete setting
As an application of the above algorithm, we devise an online algorithm to create a portfolio that maximizes the CVaR of a discrete submodular function subject to a matroid constraint. Let $f$ be a monotone stochastic submodular set function on a ground set $V$ and $\mathcal{M}$ be a matroid. The goal is to find a portfolio (i.e., a distribution) over feasible sets in $\mathcal{M}$ that maximizes the CVaR of the portfolio value,
given i.i.d. samples from $\mathcal{D}$. We show that this problem can be reduced to online CVaR maximization of continuous DR-submodular functions over a matroid polytope, and devise a polynomial-time online approximation algorithm that creates a portfolio whose CVaR is, in expectation, a $(1 - 1/e)$-approximation to that of any portfolio over feasible sets, up to a vanishing additive error. Note that this is the first online algorithm that converges to a $(1 - 1/e)$-approximate portfolio in the discrete setting, generalizing the known offline algorithm (Wilder2018) to the i.i.d. setting.
1.2 Our techniques
To analyze our online algorithms, we introduce a novel adversarial online learning problem, which we call adversarial online submodular CVaR learning. This online learning problem is described as follows. For each round $t$, the learner chooses $x^{(t)} \in \mathcal{C}$ and $\tau^{(t)} \in \mathbb{R}$, possibly in a randomized manner. After $x^{(t)}$ and $\tau^{(t)}$ are chosen, the adversary reveals a monotone continuous DR-submodular function $F_t$ to the learner. The goal of the learner is to minimize the $(1 - 1/e)$-approximate regret
for arbitrary comparators $x$ and $\tau$, where the auxiliary function $H_t$ is given by
$$H_t(x, \tau) = \tau - \frac{1}{\alpha} \max\{\tau - F_t(x), 0\}.$$
We devise an efficient algorithm that achieves a sublinear $(1 - 1/e)$-approximate regret in expectation. Further, we show that, given an online algorithm with a sublinear approximate regret, we can construct an online algorithm that achieves a $(1 - 1/e)$-approximation to CVaR maximization whose convergence rate is proportional to the average regret. Combining these results, we obtain an online $(1 - 1/e)$-approximation algorithm for CVaR maximization with a sublinear convergence rate.
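The online-to-batch conversion behind this argument can be illustrated on a toy one-dimensional problem. The sketch below is our own illustrative example, not the paper's algorithm: it runs projected gradient ascent on a fixed concave reward over $[0, 1]$ and shows that the averaged iterate's suboptimality, which is bounded by regret divided by the number of rounds, shrinks as the horizon grows:

```python
import numpy as np

def ogd_suboptimality(T, eta=None):
    """Online-to-batch conversion on f(x) = -(x - 0.3)^2 over [0, 1]:
    the average iterate's suboptimality is bounded by (regret / T),
    which shrinks with T when eta ~ 1/sqrt(T)."""
    eta = eta or 1.0 / np.sqrt(T)
    x, total = 0.0, 0.0
    for _ in range(T):
        total += x                                            # record iterate
        x = min(1.0, max(0.0, x + eta * (-2.0 * (x - 0.3))))  # ascent + project
    x_avg = total / T
    return (0.3 - x_avg) ** 2  # f(x_opt) - f(x_avg)

print(ogd_suboptimality(100), ogd_suboptimality(10000))  # shrinks with T
```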
We remark that adversarial online submodular CVaR learning may be of interest in its own right: although the objective function $H_t$ is neither monotone nor continuous DR-submodular in general, we can design an online algorithm with a sublinear $(1 - 1/e)$-regret by exploiting the underlying structure of $H_t$. To the best of our knowledge, no online algorithm for nonmonotone and non-DR-submodular maximization exists in the literature.
1.3 Related work
Several studies have focused on CVaR optimization in adversarial online settings and i.i.d. settings. Tamar2015
studied CVaR optimization over i.i.d. samples and analyzed stochastic gradient descent under the strong assumption that the CVaR is continuously differentiable. Recently,
Cardoso19a introduced the concept of the CVaR regret for convex loss functions and provided online algorithms for minimizing the CVaR regret under bandit feedback.
Online and stochastic submodular maximization have been extensively studied (Streeter2008; Streeter2009; Golovin2014; Karimi2017a; Hassani2017a; Mokhtari2018; Chen2018; Roughgarden2018; Soma2019; Karbasi2019; Zhang2019). These studies optimize either the approximate regret or the expectation and do not consider CVaR.
Another line of related work is robust submodular maximization (Krause2008b; Chen2017a; Anari2019). In robust submodular maximization, we maximize the minimum of a collection of submodular functions, i.e., $\max_x \min_i f_i(x)$. Robust submodular maximization is the limit of CVaR maximization, where $\mathcal{D}$
is the uniform distribution over the given functions and $\alpha \to 0$. Recently, Staib2019b proposed distributionally robust submodular optimization, which maximizes $\min_{D \in \mathcal{U}} \mathbb{E}_{\theta \sim D}[F(x, \theta)]$ for an uncertainty set $\mathcal{U}$ of distributions. It is known that CVaR can be formulated in the distributionally robust framework (Shapiro2014). However, the algorithms proposed by Staib2019b require that $\mathcal{U}$ is a subset of a finite-dimensional probability simplex; moreover, their time complexity depends on
its dimension. Our algorithms work even if the support of $\mathcal{D}$ is infinite.
1.4 Organization of this paper
This paper is organized as follows. Section 2 introduces the background of submodular optimization. Sections 3 and 4 describe our algorithms for the continuous and discrete settings, respectively. Section 5 presents experimental results on real-world datasets. The omitted analyses and the details of the adversarial setting can be found in the Appendix.
2 Preliminaries
Throughout the paper, $V$ denotes the ground set and $n = |V|$ denotes its size. For a set function $f : 2^V \to \mathbb{R}$, the multilinear extension $F : [0, 1]^V \to \mathbb{R}$ is defined as $F(x) = \sum_{S \subseteq V} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$. For a matroid on $V$, the base polytope is the convex hull of the indicator vectors of the bases of the matroid. It is well-known that linear optimization over a base polytope can be solved by the greedy algorithm (Fujishige2005).
We denote the Euclidean norm and inner product by $\|\cdot\|$ and $\langle \cdot, \cdot \rangle$, respectively. The $\ell_p$-norm ($p \ge 1$) is denoted by $\|\cdot\|_p$. The Euclidean projection of $x$ onto a set $\mathcal{C}$ is denoted by $\Pi_{\mathcal{C}}(x)$. A convex set $\mathcal{C}$ is said to be down-closed if $y \in \mathcal{C}$ and $\mathbf{0} \le x \le y$ imply $x \in \mathcal{C}$. A function $F$ is said to be $L$-Lipschitz (continuous) if $|F(x) - F(y)| \le L \|x - y\|$ for all $x, y$. We say that $F$ is $\beta$-smooth if $F$ is continuously differentiable and $\|\nabla F(x) - \nabla F(y)\| \le \beta \|x - y\|$. A smooth function $F$ is said to be continuous DR-submodular (Bian2017) if $\nabla F(x) \ge \nabla F(y)$ for all $x \le y$. The multilinear extension of a submodular function is known to be DR-submodular (Calinescu2011). Continuous DR-submodularity implies up-concavity: for a continuous DR-submodular function $F$, a point $x$, and a direction $v \ge \mathbf{0}$, the univariate function $t \mapsto F(x + t v)$ is concave.
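The antitone-gradient characterization of continuous DR-submodularity can be verified numerically on the multilinear extension of a small coverage function. The following brute-force sketch (for illustration only; all names are ours) evaluates the multilinear extension exactly and checks $\nabla F(x) \ge \nabla F(y)$ for $x \le y$:

```python
import itertools
import numpy as np

# Coverage function f(S) = |union of elements covered by S| (submodular).
cover = {0: {"a"}, 1: {"a", "b"}, 2: {"c"}}

def f(S):
    return len(set().union(*(cover[i] for i in S))) if S else 0

def multilinear(x):
    """F(x) = E[f(R(x))], where R(x) contains each i independently w.p. x_i."""
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(x)):
        p = np.prod([x[i] if b else 1 - x[i] for i, b in enumerate(bits)])
        total += p * f([i for i, b in enumerate(bits) if b])
    return total

def grad(x, eps=1e-5):
    """Central finite-difference gradient of the multilinear extension."""
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x)); e[i] = eps
        g[i] = (multilinear(x + e) - multilinear(x - e)) / (2 * eps)
    return g

# Continuous DR-submodularity: x <= y coordinatewise implies
# grad F(x) >= grad F(y) coordinatewise (gradients are antitone).
x, y = np.array([0.2, 0.3, 0.1]), np.array([0.6, 0.8, 0.5])
assert np.all(grad(x) >= grad(y) - 1e-6)
```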
3 CVaR Maximization of Continuous DRsubmodular Functions
We present our online algorithm for CVaR maximization via i.i.d. samples. Let $F_t := F(\cdot, \theta_t)$ be the monotone continuous DR-submodular function corresponding to the $t$-th sample $\theta_t$, i.e., $F_t(x) = F(x, \theta_t)$
for $x \in [0, 1]^n$. Similarly, define an auxiliary function $H_t$ with respect to $F_t$ by $H_t(x, \tau) = \tau - \frac{1}{\alpha} \max\{\tau - F_t(x), 0\}$
for $x \in [0, 1]^n$ and $\tau \in \mathbb{R}$. Let $\mathcal{C} \subseteq [0, 1]^n$ be a down-closed convex set. Formally, we make the following very mild assumptions on the $F_t$ and $\mathcal{C}$.
Assumption 1.
(i) For all $t$, the function $F_t$ is $L$-Lipschitz and $\beta$-smooth, and $F_t(\mathbf{0}) = 0$.
(ii) The diameter of $\mathcal{C}$ is bounded by $D$.
(iii) We are given a linear optimization oracle over $\mathcal{C}$.
When the underlying norm is the $\ell_p$-norm ($p \ge 1$), we subscript the constants by $p$ to emphasize it. For example, if $F_t$ is the multilinear extension of a submodular set function and $\mathcal{C}$ is the base polytope of a rank-$r$ matroid, these parameters are bounded by polynomials in the maximum function value and $r$.
Our algorithm borrows some ideas from an algorithm called RASCAL (Wilder2018). First, we define a smoothed auxiliary function $\tilde{H}_t$ as
$$\tilde{H}_t(x, \tau) = \tau - \frac{1}{\alpha} h_\mu(\tau - F_t(x)),$$
where $h_\mu$ is a smooth surrogate for $z \mapsto \max\{z, 0\}$ and $\mu > 0$ is a smoothing parameter specified later. This smoothing guarantees that $\tilde{H}_t$ is differentiable for all $x$ and $\tau$ and has Lipschitz continuous gradients.
Lemma 2 (Lemma 6 of Wilder2018).
(i) $\tilde{H}_t(x, \tau)$ approximates $H_t(x, \tau)$ within an additive error of $O(\mu)$ for all $x$ and $\tau$.
(ii) If $F_t$ is Lipschitz and smooth, then $\nabla \tilde{H}_t$ is Lipschitz, with a constant depending on $\mu$.
Lemma 3 (Wilder2018).
The function $x \mapsto \max_{\tau} \tilde{H}_t(x, \tau)$ is monotone and up-concave.
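We do not reproduce the exact smoothing of Wilder2018 here, but any smooth surrogate for $z \mapsto \max\{z, 0\}$ with a uniform error bound serves the same purpose. For instance, the softplus function $h_\mu(z) = \mu \log(1 + e^{z/\mu})$ (our illustrative choice, which may differ from the paper's) approximates the hinge within $\mu \log 2$:

```python
import numpy as np

MU = 0.05  # smoothing parameter (the paper's choice may differ)

def hinge(z):
    return np.maximum(z, 0.0)

def softplus(z, mu=MU):
    """Smooth surrogate for max(z, 0): mu * log(1 + exp(z / mu))."""
    return mu * np.logaddexp(0.0, z / mu)  # numerically stable form

# The uniform approximation error is at most mu * log(2), attained at z = 0.
zs = np.linspace(-2, 2, 1001)
err = np.max(np.abs(softplus(zs) - hinge(zs)))
assert err <= MU * np.log(2) + 1e-12
```

Shrinking $\mu$ tightens the approximation at the cost of a larger gradient Lipschitz constant, which is the trade-off Lemma 2 captures.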
3.1 StochasticRASCAL
We now formally describe our algorithm, StochasticRASCAL. We note that RASCAL runs the Frank–Wolfe algorithm on the smoothed CVaR objective averaged over all samples. Owing to the up-concavity and smoothness of this function, one can obtain a $(1 - 1/e)$-approximation. However, in our online setting, we cannot evaluate this function because the samples are revealed online, and hence we cannot simply run RASCAL.
To overcome the issue above, we first split the samples into minibatches of length $b$, which we specify later. The key idea is to use the averaged objective function
$$\tilde{H}^{(i)}(x, \tau) = \frac{1}{b} \sum_{t \in B_i} \tilde{H}_t(x, \tau)$$
for the $i$-th minibatch $B_i$. We can see that $\max_\tau \tilde{H}^{(i)}(\cdot, \tau)$ is monotone and up-concave, and that it can be evaluated using only the samples in the $i$-th minibatch. Then, we run a perturbed version of the Frank–Wolfe algorithm (Golovin2014; Bian2017) on it. More formally, we initialize $x \leftarrow \mathbf{0}$ and, in each Frank–Wolfe step, perform the update $x \leftarrow x + \eta v$, where $\eta$ is the step size and $v$ is a solution to a perturbed linear optimization problem over $\mathcal{C}$, namely $v \in \arg\max_{u \in \mathcal{C}} \langle g + p, u \rangle$ for the current gradient estimate $g$. Here, $p$
is a perturbation vector. This perturbation trick stabilizes the algorithm so that we can maximize the true objective
using only minibatch objectives. In each iteration of continuous greedy, we need the gradients $\nabla_x \tilde{H}^{(i)}$, which in turn require us to compute the maximizing $\tau$. These gradients and the optimal $\tau$ can be computed by the SmoothGrad and SmoothTau subroutines, respectively, which were proposed in Wilder2018; see Algorithms 2 and 3.
Let us write $x^{(i)}$ for the solution obtained after processing the $i$-th minibatch. The final output of StochasticRASCAL is $x^{(R)}$ for an index $R$ chosen uniformly at random from the minibatch indices. The pseudocode of StochasticRASCAL is presented in Algorithm 1.
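The overall shape of the per-minibatch update can be sketched as follows. This is a simplified stand-in rather than StochasticRASCAL itself: we use a separable concave surrogate objective, a budget constraint with a greedy linear oracle, and Gaussian perturbations, all of which are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def lp_oracle(c, budget):
    """argmax <c, v> over {v in [0,1]^n : sum(v) <= budget} (greedy)."""
    v = np.zeros_like(c)
    top = np.argsort(-c)[:budget]
    v[top[c[top] > 0]] = 1.0
    return v

def grad_F(x, w):
    # Gradient of F(x) = sum_i w_i * log(1 + x_i), a simple monotone
    # DR-submodular stand-in for the smoothed minibatch objective.
    return w / (1.0 + x)

def perturbed_frank_wolfe(w, budget, K=50, noise=0.01):
    """Frank-Wolfe with perturbed linear optimization: in each of K steps,
    move by eta = 1/K toward the perturbed-LP maximizer."""
    x = np.zeros(len(w))
    for _ in range(K):
        g = grad_F(x, w) + noise * rng.standard_normal(len(w))  # perturbation
        x += lp_oracle(g, budget) / K
    return x

w = np.array([3.0, 2.0, 1.0, 0.5])
x = perturbed_frank_wolfe(w, budget=2)
print(x)  # mass concentrates on the high-weight coordinates
```

Because each step adds $1/K$ times a feasible vertex, the final point is an average of feasible points and hence stays in the (convex) feasible set.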
3.2 Convergence rate via regret bounds
Let us consider the convergence rate of StochasticRASCAL. The main challenge of the analysis is how to set the parameters used in the algorithm (the learning rates, step size, perturbation distribution, smoothing parameter, and minibatch size) to achieve the desired convergence rate.
To this end, using tools from online convex optimization, we prove an approximate regret bound for a variant of StochasticRASCAL for adversarial online submodular CVaR learning (see Introduction for the definition).
Theorem 4 (informal).
There exists an efficient online algorithm for adversarial online submodular CVaR learning whose $(1 - 1/e)$-approximate regret is sublinear in the number of rounds
for arbitrary comparators $x$ and $\tau$, where the big-O notation hides a factor polynomial in the Lipschitz constant, the smoothness parameter, and the diameter in Assumption 1.
We then show that the above regret bound can be used to derive a convergence rate for StochasticRASCAL. The technical details of the adversarial setting and the proof of the following theorem are deferred to the Appendix.
Theorem 5.
Under Assumption 1, StochasticRASCAL outputs a solution whose expected CVaR is at least $(1 - 1/e) \, \mathrm{CVaR}_\alpha(x^*)$ minus an additive error vanishing in the number of samples, for any $x^* \in \mathcal{C}$, under a suitable setting of the learning rates, step size, perturbation distribution, smoothing parameter, and minibatch size. Further, if $\mathcal{C}$ is an integral polytope contained in $[0, 1]^n$, an improved convergence rate holds under a different parameter setting.
To achieve a desired additive error $\epsilon$, StochasticRASCAL requires polynomially many samples while storing only a single minibatch at a time, whereas RASCAL (Wilder2018) must store all of its samples. Hence, our algorithm runs in smaller space when the problem parameters are of moderate size.
4 CVaR Maximization of Discrete Submodular Functions
We now present our online algorithm for a monotone submodular set function and a matroid constraint. Let $f_t := f(\cdot, \theta_t)$ be the monotone submodular function corresponding to the $t$-th sample $\theta_t$ and $F_t$ be its multilinear extension.
The basic idea is to run StochasticRASCAL on the multilinear extensions and the matroid polytope. However, we must address several technical obstacles. First, we must compare the output portfolio with the optimal portfolio; the error bound in the previous sections compared it with a single optimal solution. To this end, we make $k$ copies of the variables so that we can approximate an optimal portfolio by a uniform distribution over a multiset of $k$ feasible solutions. More precisely, we define a continuous DR-submodular function over the $k$ copies by averaging,
$$G(x_1, \dots, x_k, \theta) = \frac{1}{k} \sum_{j=1}^{k} F(x_j, \theta),$$
for some sufficiently large $k$. Then, we feed these functions to StochasticRASCAL. Suppose that we obtain fractional solutions at Line 12 for each minibatch. Abusing notation, we index the samples by the minibatch containing them.
Next, we need to convert the fractional solutions to feasible sets without significantly deteriorating the values of the multilinear extensions. To this end, we independently apply randomized swap rounding (Chekuri2010) several times to each fractional solution to obtain feasible sets. Note that randomized swap rounding is oblivious and independent of the samples. We can show that the value of the rounded sets is close to that of the multilinear extension by using a concentration inequality. Finally, after all rounds, we return the uniform portfolio over all rounded sets. The pseudocode is given in Algorithm 4. Carefully choosing the number of copies and rounding repetitions, we obtain the following theorem.
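Randomized swap rounding is somewhat involved, but the property it shares with simpler schemes is easy to check numerically: any rounding that preserves the marginals $x$ matches the multilinear extension in expectation. The sketch below uses independent rounding (which, unlike swap rounding, does not guarantee matroid feasibility) purely to illustrate this; the coverage function and all names are our own toy choices:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Coverage function on ground set {0, 1, 2} (submodular).
cover = {0: {"a"}, 1: {"a", "b"}, 2: {"c"}}

def f(S):
    return len(set().union(*(cover[i] for i in S))) if S else 0

def multilinear(x):
    """Exact multilinear extension by enumerating all subsets."""
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(x)):
        p = np.prod([x[i] if b else 1 - x[i] for i, b in enumerate(bits)])
        total += p * f([i for i, b in enumerate(bits) if b])
    return total

# Independent rounding: include each element i with probability x_i.
x = np.array([0.5, 0.7, 0.2])
samples = [f(list(np.flatnonzero(rng.random(3) < x))) for _ in range(20000)]
print(np.mean(samples), multilinear(x))  # the two values agree closely
```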
Theorem 6.
Algorithm 4 produces a portfolio whose expected CVaR is a $(1 - 1/e)$-approximation to that of an arbitrary portfolio, up to a vanishing additive error, under a suitable choice of the number of copies and rounding repetitions; the expectation is taken over the samples and the randomness of the algorithm.
5 Experiments
In this section, we show our experimental results. In all the experiments, the CVaR parameter $\alpha$ was set to a fixed value. The experiments were conducted on a Linux server with an Intel Xeon Gold 6242 (2.8 GHz) CPU and 384 GB of main memory.
Problem Description.
We conducted experiments on the sensor resource allocation problem, in which the goal is to rapidly detect a contagion spreading through a network using a limited budget (Bian2017; leskovec2007cost; soma2015generalization).
Here, we follow the configuration of the experiments conducted in Wilder2018. Let $G = (V, E)$ be a graph. A contagion starts at a random vertex and spreads over time according to a specific stochastic process. Let $t_v$ be the time at which the contagion reaches vertex $v$. If the contagion never reaches some vertex $v$, we reassign $t_v$ to a fixed time horizon $T$, as described in Wilder2018.
The decision maker has a budget (e.g., energy) to spend on sensing resources. Let $x_v$ represent the amount of energy allocated to the sensor at a vertex $v$. When the contagion reaches $v$ at time $t_v$, the sensor detects it with a probability that grows with the allocated energy $x_v$ at a per-unit-energy detection rate. The objective is the expected amount of detection time saved by the sensor placement, i.e., the expected value of $T - t_v$ for the first vertex $v$ whose sensor detects the contagion,
where the vertices are processed in increasing order of $t_v$. It is known that this objective is DR-submodular (Bian2017).
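A hedged sketch of this objective for a single scenario follows; the detection probability $1 - e^{-s_v x_v}$ is one common parameterization and may differ from the exact model used in the experiments, and all names are our own:

```python
import numpy as np

def expected_saved_time(x, t, s, T):
    """Expected detection time saved in one scenario.

    x: energy allocated to each vertex's sensor
    t: contagion arrival time at each vertex (reassigned to T if unreached)
    s: per-unit-energy detection rate of each sensor
    T: time horizon (worst case: the contagion is never detected)

    We assume a sensor at v fires with probability 1 - exp(-s_v * x_v) once
    the contagion arrives (an assumed parameterization). The first sensor
    to fire saves T - t_v units of detection time.
    """
    order = np.argsort(t)              # process vertices in arrival order
    q = 1.0 - np.exp(-s * x)           # per-vertex detection probabilities
    saved, p_undetected = 0.0, 1.0
    for v in order:
        saved += p_undetected * q[v] * (T - t[v])
        p_undetected *= 1.0 - q[v]     # no sensor has fired yet
    return saved

# Toy scenario: three vertices reached at times 1, 2, 4 within horizon T = 10.
x = np.array([1.0, 0.5, 0.0])
t = np.array([1.0, 2.0, 4.0])
s = np.array([2.0, 2.0, 2.0])
print(expected_saved_time(x, t, s, T=10.0))
```

Allocating more energy never decreases the objective, consistent with monotonicity of the true objective.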
Datasets.
We consider two sensing models and generate three datasets. In all of them, the source vertex is chosen uniformly at random.
The first model is the continuous-time independent cascade model (CTIC). In this model, each edge has a propagation time
drawn from an exponential distribution. The contagion starts at the source vertex $s$, i.e., $t_s = 0$, and we iteratively set $t_v = \min_{u \in N(v)} (t_u + d_{uv})$, where $N(v)$ is the set of neighbors of $v$ and $d_{uv}$ is the propagation time of edge $(u, v)$; that is, $t_v$ is the first time that the contagion reaches $v$ from a neighbor. We generated datasets using two real-world networks (http://konect.cc): NetScience, a collaboration network of 1,461 network scientists, and EuroRoad, a network of 1,174 European cities and the roads between them. For both networks, we generated 1,000 scenarios.

The second model, known as the Battle of the Water Sensor Networks (BWSN) (ostfeld2008battle), involves contamination detection in a water network. BWSN simulates the spread of contamination through a 126-vertex water network consisting of junctions, tanks, pumps, and the links between them, and the arrival times $t_v$ are provided by a simulator. We generated 1,000 scenarios.
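One CTIC scenario can be simulated by drawing a propagation time for each edge and computing the arrival times with Dijkstra's algorithm, as in the following sketch (the graph and parameters are toy choices of ours):

```python
import heapq
import random

def ctic_arrival_times(adj, source, mean_delay=1.0, rng=None):
    """Simulate one CTIC scenario: draw an exponential propagation time per
    (undirected) edge, then compute each vertex's arrival time along the
    quickest path from the source via Dijkstra's algorithm."""
    rng = rng or random.Random(0)
    delay = {}
    for u in adj:
        for v in adj[u]:
            e = (min(u, v), max(u, v))
            if e not in delay:
                delay[e] = rng.expovariate(1.0 / mean_delay)
    t = {v: float("inf") for v in adj}
    t[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > t[u]:
            continue  # stale queue entry
        for v in adj[u]:
            e = (min(u, v), max(u, v))
            if d + delay[e] < t[v]:
                t[v] = d + delay[e]
                heapq.heappush(heap, (t[v], v))
    return t

# Toy graph: a path 0 - 1 - 2 plus a chord 0 - 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
times = ctic_arrival_times(adj, source=0)
print(times)  # t[0] = 0; the other arrival times vary per scenario
```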
Methods.
We compared our method against two offline algorithms, RASCAL (Wilder2018) and the Frank–Wolfe (FW) algorithm (Bian2017). We note that the latter is designed to maximize the expectation of a DR-submodular function rather than its CVaR. We ran these offline methods on the 1,000 generated scenarios for each dataset. As our method is an online algorithm, we ran it on 20,000 samples in an online manner, where each sample was drawn uniformly from the set of generated scenarios.
Results.
Figure 1 shows how the CVaR changes as the number of processed samples increases. For each dataset, as long as the batch size is not excessively small, the CVaR attained by our method approaches that attained by RASCAL. The FW algorithm shows significantly lower performance because it is not designed to maximize CVaR.
Figure 2 shows how the CVaR changes as the budget increases. For our method, we plotted the CVaR after processing all samples. We can again confirm that the CVaR attained by our method is close to that attained by RASCAL.
6 Conclusion
We devised StochasticRASCAL for maximizing the CVaR of a monotone stochastic submodular function. We showed that StochasticRASCAL finds a $(1 - 1/e)$-approximate solution at a sublinear convergence rate for monotone continuous DR-submodular functions. We extended it to portfolio optimization for monotone submodular set functions under a matroid constraint. Experiments on the CTIC and BWSN datasets demonstrated that our algorithm rapidly achieves CVaR values comparable to those of RASCAL.
Acknowledgments
T.S. is supported by JST, ERATO, Grant Number JPMJER1903, Japan. Y.Y. is supported in part by JSPS KAKENHI Grant Number 18H05291 and 20H05965.
References
Appendix A Adversarial Setting
In this section, we present an online algorithm for CVaR maximization in an adversarial environment.
a.1 Preliminaries on Online Convex Optimization
We make extensive use of the framework of online convex optimization (OCO), which we briefly explain below; for details, the reader is referred to the monograph of Hazan2016OCO. In OCO, the learner is given a compact convex set $\mathcal{K}$. In each round $t$, the learner selects $x_t \in \mathcal{K}$, and then the adversary reveals a concave reward function $f_t$ to the learner. (Although OCO is usually formulated as convex minimization, we state it in the form of concave maximization for later use; minimization can be converted to maximization by negating the objective function.) The goal of the learner is to minimize the 1-regret
An important subclass of OCO is online linear optimization (OLO), in which the objective functions are linear.
We use the following OCO algorithms. Let $\eta > 0$ be a learning rate.
Online Gradient Descent (OGD): $x_{t+1} = \Pi_{\mathcal{K}}\left(x_t + \eta \nabla f_t(x_t)\right)$.
Follow the Perturbed Leader (FPL): $x_{t+1} \in \arg\max_{x \in \mathcal{K}} \left\langle \sum_{s \le t} \nabla f_s(x_s) + p, x \right\rangle$,
where $p$ is a perturbation term drawn from some distribution.
Lemma 7 (Hazan2016OCO).
If the gradients are bounded and the diameter of $\mathcal{K}$ is bounded, then the OGD sequence with a suitable learning rate attains a 1-regret of $O(\sqrt{T})$.
Lemma 8 (Hazan2016OCO; Cohen and Hazan [2015]).
Under analogous boundedness conditions, the FPL sequence with a suitable perturbation distribution attains an expected 1-regret of $O(\sqrt{T})$,
where the expectation is taken over the randomness in the algorithm. Further, if $\mathcal{K}$ is an integral polytope contained in $[0, 1]^n$, an improved bound holds
for a suitable learning rate.
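The two update rules can be sketched as follows on a box-shaped feasible set; the learning rate, perturbation scale, and toy linear rewards are illustrative choices only, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(2)

def project_box(x, lo=0.0, hi=1.0):
    return np.clip(x, lo, hi)

def ogd(grads, eta, x0):
    """Online gradient ascent: x_{t+1} = Proj(x_t + eta * grad_t(x_t))."""
    xs, x = [x0], x0
    for g in grads:
        x = project_box(x + eta * g(x))
        xs.append(x)
    return xs

def fpl_step(grad_sum, lp_oracle, perturbation):
    """Follow the Perturbed Leader: optimize the cumulative linear reward
    plus a random perturbation over the feasible set."""
    return lp_oracle(grad_sum + perturbation)

# Toy run on the box [0,1]^2 with linear rewards <c_t, x>.
cs = [np.array([1.0, -1.0]), np.array([1.0, 0.5])]
lp = lambda c: (c > 0).astype(float)  # exact argmax over the box [0,1]^2
xs = ogd([lambda x, c=c: c for c in cs], eta=0.5, x0=np.zeros(2))
x_fpl = fpl_step(sum(cs), lp, 0.1 * rng.standard_normal(2))
print(xs[-1], x_fpl)
```

For linear rewards the gradient is the reward vector itself, so OGD and FPL only differ in whether stability comes from a small step size or from random perturbation.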
a.2 Online Algorithm for Adversarial Online CVaR Learning
We now present our online algorithm, OnlineRASCAL, for adversarial online submodular CVaR learning. Let $F_t$ be the monotone continuous DR-submodular function revealed in round $t$, and let $\mathcal{C}$ be a down-closed convex set. OnlineRASCAL maintains the current solution $x$ and threshold $\tau$. Within the $i$-th minibatch, we play the same $x$ and use OGD to learn $\tau$. At the end of the $i$-th minibatch, we update $x$ using online continuous greedy, where each inner iteration performs FPL. The pseudocode can be found in Algorithm 5.
Theorem 9.
Under Assumption 1, OnlineRASCAL achieves a $(1 - 1/e)$-approximate regret bound for arbitrary comparators $x$ and $\tau$, expressed in terms of the 1-regrets of the FPL subroutines, the 1-regret of OGD, and the diameter of $\mathcal{C}$.
Before diving into the formal proof, we outline it. Recall that the 1-regret of the $k$-th FPL algorithm is
for each . Then, using the upconcavity of , we can prove
for each . On the other hand, for each , we have
where $\beta$ is the smoothness parameter of the objective. By combining these two inequalities, we can show
via the standard analysis of continuous greedy.
The next step is to relate the OGD iterates to the regret in terms of $\tau$. We have
by the definition of the 1regret of OGD. Combining these two bounds, we can prove Theorem 9.
Proof of Theorem 9.
First, we obtain
(monotonicity)  
(upconcavity)  
( and )  
(definition of 1regret) 
On the other hand, we have
(upconcavity)  