Online learning has received significant attention due to the growing amounts of information collected about individuals, and has been studied in the context of a wide variety of optimization problems, including portfolio optimization [7, 18, 15], shortest paths [12], convex optimization [13, 4], and game theoretic optimization. When these machine learning tools are applied to sensitive data from individuals, privacy concerns become increasingly important. In applications such as clinical trials, online ad placement, personalized pricing, and recommender systems, online learning algorithms deal with personal (and possibly highly sensitive) data.
In this paper, we develop the first algorithms for differentially private online submodular optimization. A function mapping from discrete collections of elements to real values is submodular if it exhibits the following diminishing returns property: for all sets such that and for all elements ,
Submodular functions have several applications in machine learning (see  for a survey) and are used extensively in economics because their diminishing returns property captures preferences for substitutable goods and satiation from multiple copies of the same good [25, 2].
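As a concrete sketch of the diminishing-returns property, the following brute-force check (illustrative names; exponential time, so only for tiny ground sets) verifies it for a coverage function, a classic submodular example:

```python
from itertools import combinations

def coverage(sets_by_element, S):
    """Value of a coverage function: number of items covered by the sets in S."""
    covered = set()
    for i in S:
        covered |= sets_by_element[i]
    return len(covered)

def is_submodular(f, ground):
    """Brute-force check of diminishing returns:
    f(A + e) - f(A) >= f(B + e) - f(B) whenever A is a subset of B, e not in B."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for e in ground - B:
                gain_A = f(A | {e}) - f(A)
                gain_B = f(B | {e}) - f(B)
                if gain_A < gain_B - 1e-12:
                    return False
    return True

# A toy coverage instance: each ground element covers some items.
cover = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}
f = lambda S: coverage(cover, S)
print(is_submodular(f, frozenset(cover)))  # True: coverage functions are submodular
```

By the same check, a strictly supermodular function such as `lambda S: len(S) ** 2` fails the diminishing-returns inequality.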
In the Online Submodular Minimization problem, a sequence of submodular functions arrives in an online fashion. At every timestep , a decision maker chooses a set before observing the function . The decision maker then incurs cost . The decision maker’s goal is to minimize her total regret, which is defined as,
That is, her regret is the difference between her total cost across all rounds and the cost of the best fixed set in hindsight after seeing all the functions. We say that an algorithm for the Online Submodular Minimization problem is no-regret if the regret (or expected regret, for randomized algorithms) is sublinear in .
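In standard notation (the display equation is not reproduced above, so the symbols here are illustrative), the regret after $T$ rounds is

```latex
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(S_t) \;-\; \min_{S \subseteq [n]} \sum_{t=1}^{T} f_t(S),
```

and the no-regret condition is $\mathbb{E}[\mathrm{Regret}_T] = o(T)$.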
We consider two different settings based on the type of informational feedback the decision maker receives in each round. In the full information setting, the decision maker observes the entire function after making her choice of . In the bandit setting, the decision maker only observes her cost and does not receive any additional information about the function . The bandit setting is a more challenging environment because the decision maker has severely restricted information when making decisions, but also captures the reality of many real-world online learning problems where counterfactual outcomes cannot be measured.
We formally incorporate the task of preserving privacy by using the framework of differential privacy. Differential privacy was first defined by  for algorithms operating on large static databases, and required that if a single entry in the database changed, then the algorithm would produce approximately the same output. In this work, we view our database as the sequence of submodular functions , and the algorithm’s output is the sequence of sets . We require that if a single function were changed to a different , then the entire sequence of chosen sets would be approximately the same. We formalize this in Definition 1 below.
Let and be sequences of functions. We say and are neighboring sequences if for all but at most one .
Definition 1 (Differential privacy ).
An algorithm is -differentially private if for all neighboring sequences and every subset of the output space ,
If , we say that is -differentially private.
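Written out in the standard form (with $\mathcal{A}$ denoting the algorithm and $F, F'$ neighboring sequences; symbols illustrative, since the paper's notation is elided above), the guarantee of Definition 1 is

```latex
\Pr\!\left[\mathcal{A}(F) \in \mathcal{O}\right] \;\le\; e^{\varepsilon}\,\Pr\!\left[\mathcal{A}(F') \in \mathcal{O}\right] + \delta ,
```

for every event $\mathcal{O}$ in the output space; setting $\delta = 0$ recovers pure $\varepsilon$-differential privacy.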
The main goal of this paper is to design differentially private no-regret algorithms for the Online Submodular Minimization problem. There are many applications of online learning problems using sensitive data that could benefit from formal privacy guarantees, such as clinical drug trials, online ad placement, and personalized pricing. For concreteness, we provide the following motivating example for the study of private online submodular optimization.
As a concrete motivating example, we consider the following online ad placement problem. Online retailers such as Amazon, Walmart, and Target design their websites so that at checkout they can offer other products which complement the item the customer is buying. Due to item complementarities, each user has a supermodular utility function , defined over the possible subsets of products the retailer can offer. For the user that arrives at time , the retailer must choose a set of products to display that maximizes without knowing the user’s utility function. The retailer receives bandit feedback since they can only observe , and not the entire function . The retailer seeks to minimize regret: . Since supermodular maximization is mathematically equivalent to submodular minimization, the retailer must solve an online submodular minimization problem with bandit feedback. Existing product recommendation systems have been shown to leak information about users , motivating the need for formal privacy guarantees in this setting. The retailer should therefore perform this optimization in a differentially private manner to ensure that no individual’s information is leaked to other users.
1.1 Our Results and Techniques
In this paper we develop the first algorithms for online submodular minimization that preserve differential privacy under full information feedback and bandit feedback.
We start with the full information setting, where the algorithm can observe the entire function after making its decision at each time . We give an algorithm in this setting that is both differentially private and satisfies no regret.
Theorem 1 (Informal).
In the full information setting of Online Submodular Minimization, there is an -differentially private algorithm that achieves regret:
This algorithm works by first relaxing each input submodular function to a convex function using the Lovasz extension (defined formally in Section 2.2). Our algorithm then simulates an algorithm for differentially private online convex optimization (due to Smith and Thakurta ) run on the sequence of Lovasz extensions. The differential privacy guarantee is inherited from the private online convex optimization algorithm. To prove the regret bound, we show that the relaxation and optimization on convex functions does not increase the regret guarantee by too much. Our algorithm loses only a factor of relative to the regret of  for private online convex optimization.
We next consider the bandit setting, which is significantly more challenging and requires new techniques. The private online convex optimization algorithm of Smith and Thakurta  requires use of the subgradient of the Lovasz extension. However in the bandit setting, the algorithm does not receive enough information to compute the exact Lovasz extension or its subgradients. Instead, we construct an unbiased estimate of the subgradient using the one-point estimation method of . We then apply the algorithm of  to the unbiased estimate of the gradient of the Lovasz extension. This yields a differentially private no-regret algorithm for online submodular minimization in the bandit setting.
Theorem 2 (Informal).
In the bandit setting of Online Submodular Minimization, there is an -differentially private algorithm that achieves regret:
The regret guarantees of our algorithms are worse than the best non-private algorithms by only a factor of and .
1.2 Related Work
Our results rely heavily on tools from  and .  provides a differentially private algorithm for online convex optimization that achieves a regret rate in the full information setting, which is worse than the non-private setting by only a polylogarithmic factor. Under bandit feedback, they give a modification of their full information algorithm that achieves cumulative regret . One of the key components in our algorithms is a modification of these tools for online convex optimization, applied once we have relaxed the submodular functions to their convex Lovasz extensions.  provide algorithms for non-private online submodular minimization in both the full information and bandit feedback settings. They design subgradient descent-type algorithms that achieve regret of and in the full information and bandit settings respectively. Our algorithms make use of their one-point gradient estimation technique for the bandit setting. We remark that, to the best of our knowledge, there is no known way to modify subgradient descent-type algorithms to achieve differential privacy in the online convex bandit problem while damaging the regret bounds by only polylogarithmic factors.
Although our algorithms use these tools, the composition of these previous results is not straightforward. The bound on the variance of the one-point gradient estimator for the Lovasz extension is not the same as that of the estimator used for online convex optimization with bandit feedback, which requires special care in the analysis. If one were to blindly compose the results of  and , it would yield regret in the bandit setting, instead of the regret rate that we achieve.
Other relevant work includes , where the authors design differentially private algorithms for online convex optimization. However, these algorithms only achieve optimal regret rates in some special cases. In , the authors provide differentially private algorithms for the special case of online linear optimization with bandit feedback, and obtain regret which is (almost) optimal. The problem of private submodular maximization has been studied by  and . However, our work cannot be compared to theirs since the problems of minimizing and maximizing a submodular function are very different. Additionally, these works only consider the offline problem with full information feedback. Finally,  studies non-private online submodular maximization, only under full information feedback.
In this section we present background on convex functions, submodular functions, and differential privacy that will be useful for our results in later sections.
2.1 Convexity and Lipschitz Continuity
For a set we define its diameter . A set is a convex set if for any and any , . For a function , a subgradient of at a point , denoted , is a vector such that for all . The subdifferential of at , denoted , is the set of all subgradients of at .
Definition 2 (Strongly convex function).
Let be a convex set. A function is -strongly convex for if, for all . If , we say that is convex. (This is equivalent to the more commonly used definition that is convex if for any and for any , .)
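In standard notation (the paper's inline formulas are elided above, so the symbols below are illustrative), $\alpha$-strong convexity of $f$ over a convex set $\mathcal{C}$ is the requirement

```latex
f(y) \;\ge\; f(x) + \langle g,\, y - x \rangle + \frac{\alpha}{2}\,\|y - x\|_2^2
\qquad \text{for all } x, y \in \mathcal{C} \text{ and } g \in \partial f(x).
```

Setting $\alpha = 0$ recovers ordinary convexity, matching the remark in Definition 2.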
Note that every strongly convex function is also convex. For convex , the subdifferential at every point always exists and is a closed convex set.
Definition 3 (Lipschitz function).
A function is -Lipschitz continuous with respect to a norm if for every .
Lemma 1 gives an equivalence between Lipschitzness of a convex function and properties of that function’s subgradients.
Lemma 1 ().
Let be a convex function. Then is -Lipschitz over with respect to norm if and only if for all and for all we have that , where denotes the dual norm of .
Throughout the paper, we will say that a function is -Lipschitz to indicate that is -Lipschitz with respect to the norm , unless otherwise stated. We also note that the norm is self-dual: .
2.2 Submodular Functions
Submodular functions share many properties with both convex and concave functions. They can be thought of as convex functions when one is trying to minimize them; however, they also exhibit a diminishing marginal returns property, as some concave functions do (i.e., ).
Definition 4 (Submodular function).
A function is submodular if for all sets such that and for all elements ,
The connection between convex and submodular functions is formalized through the Lovasz extension (Definition 6), which extends a submodular function over to its corresponding convex function over . The Lovasz extension works by first describing each point in as a convex combination of points in , which can be interpreted as subsets of . It then defines as the convex combination of evaluated on the sets associated with . We first define the necessary notation.
Definition 5 (Maximal chain ).
A chain of subsets of is a collection of sets such that . A chain is maximal if . For a maximal chain, , , and there is a unique associated permutation such that for all . For this permutation, we can write for all .
Define . For any set , let denote the characteristic vector of , defined as if and otherwise. For any , there is a unique chain such that can be expressed as a convex combination of the characteristic vectors of the . That is, such that and . Note that if (i.e., the chain is not maximal), the chain can be extended to a maximal chain by setting for all ’s corresponding to the subsets of that were not present in the original chain. The chain and the weights can be found in time (see, e.g., Chap. 3 of Bach ).
We are now ready to define the Lovasz extension of submodular function .
Definition 6 (Lovasz extension).
Let be submodular. The Lovasz extension of is defined as follows. For each , let be the chain associated with , and let be the corresponding weights in the convex combination . Then,
Equivalently, the Lovasz extension can also be defined by sampling uniformly at random from the unit interval and considering level set . Then for each .
We now provide some useful properties of the Lovasz extension.
The Lovasz extension of submodular function is convex. Additionally, for any , let be any maximal chain associated with and let be the corresponding permutation. Then a subgradient of at is given by: for all .
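As a concrete sketch of Definition 6 and the subgradient property above, the following snippet evaluates the Lovasz extension at a point and returns a subgradient by sorting the coordinates in decreasing order and accumulating marginal gains along the induced maximal chain (function and variable names are illustrative):

```python
import numpy as np

def lovasz_and_subgradient(F, x):
    """Evaluate the Lovasz extension of submodular F at x in [0,1]^n and
    return a subgradient. Coordinates are visited in decreasing order of x;
    the subgradient entry for each element is its marginal gain along the
    resulting maximal chain."""
    n = len(x)
    order = np.argsort(-np.asarray(x, dtype=float))  # the permutation pi
    g = np.zeros(n)
    S = set()
    prev = F(frozenset(S))
    value = F(frozenset())  # constant term (zero when F(empty set) = 0)
    for i in order:
        S.add(int(i))
        cur = F(frozenset(S))
        g[i] = cur - prev       # marginal gain F(S_j) - F(S_{j-1})
        value += x[i] * g[i]    # f_hat(x) = sum_j x_{pi(j)} * marginal gain
        prev = cur
    return value, g

# Demo: for the modular function F(S) = |S|, the extension is the coordinate sum.
val, grad = lovasz_and_subgradient(lambda S: len(S), [0.5, 0.2, 0.9])
print(val)  # approximately 1.6
```

At a characteristic vector of a set, the extension agrees with the set function itself, as the chain decomposition in Definition 6 requires.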
Lemma 3 ().
All subgradients of the Lovasz extension of a submodular function are bounded by .
2.3 Tools from Differential Privacy
Recall the definition of differential privacy from Section 1.
The following theorem says that differential privacy is robust to post-processing: computations performed on the output of a differentially private algorithm are still differentially private.
Theorem 3 (Post-processing ).
Let be -differentially private, and let be an arbitrary randomized function. Then is -differentially private.
In the remainder of this section, we review two differentially private algorithms that are needed for our results. Section 2.3.1 contains a Tree-based Aggregation Protocol (TBAP), which computes online differentially private partial sums of a stream of bits. Section 2.3.2 contains Private Follow the Approximate Leader, which is a differentially private algorithm for online convex optimization, and uses TBAP as a subroutine.
2.3.1 Tree-Based Aggregation Protocol (TBAP)
The Tree-Based Aggregation Protocol is a tool for maintaining differentially private partial sums of vectors arriving in an online sequence. At each time , TBAP outputs a noisy sum of the input vectors up to time . This algorithm was first introduced by Chan et al.  and Dwork et al. , and adapted in its current form by Smith and Thakurta .
The algorithm, presented formally in Appendix A, works by maintaining a complete binary tree, where the -dimensional input vectors are stored in the leaf nodes, and internal nodes in the tree store a noisy sum of all leaves in their sub-tree. At each time , TBAP receives input and updates the value of the -th leaf node to be . The algorithm also updates the value of each internal node affected by this change to be the updated sum plus noise drawn according to a high-dimensional analog of Laplace noise. The algorithm then outputs a noisy partial sum of the nodes in the tree that approximately sum to .
The sum at each internal node is -differentially private, and by construction each affects only nodes of the tree. By the composition property of differential privacy , the entire tree is -differentially private (Theorem 4).
TBAP is -differentially private for any and any sequence of vectors that each have norm at most .
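The mechanics of TBAP can be sketched as follows. This is a simplified scalar version with Laplace noise and illustrative names (the paper's protocol stores d-dimensional vectors and uses a high-dimensional analog of Laplace noise); it shows the key idea that each prefix sum is assembled from O(log T) noisy dyadic-block sums, with the privacy budget split across the levels of the tree:

```python
import numpy as np

class TreeBasedAggregation:
    """Sketch of tree-based aggregation for private partial sums: maintain
    noisy sums over dyadic blocks so each released prefix sum combines only
    O(log T) noisy nodes."""

    def __init__(self, T, epsilon, rng=None):
        self.levels = int(np.ceil(np.log2(T))) + 1
        # Each input touches at most `levels` nodes, so split the budget.
        self.noise_scale = self.levels / epsilon
        self.rng = rng or np.random.default_rng()
        self.node_sum = {}    # (level, index) -> exact sum over that dyadic block
        self.node_noise = {}  # (level, index) -> Laplace noise, drawn once per node
        self.t = 0

    def update_and_release(self, v):
        """Insert v (assumed |v| <= 1) and return a noisy sum of v_1, ..., v_t."""
        self.t += 1
        pos = self.t - 1
        for lvl in range(self.levels):
            key = (lvl, pos >> lvl)  # dyadic block at this level containing pos
            self.node_sum[key] = self.node_sum.get(key, 0.0) + v
            if key not in self.node_noise:
                self.node_noise[key] = self.rng.laplace(scale=self.noise_scale)
        # Decompose [0, t) into dyadic blocks following the binary digits of t.
        total, rem, lvl = 0.0, self.t, 0
        while rem > 0:
            if rem & 1:
                key = (lvl, rem - 1)
                total += self.node_sum[key] + self.node_noise[key]
            rem >>= 1
            lvl += 1
        return total

# With a very large epsilon the noise is negligible, so the releases track
# the true prefix sums 1, 2, 3, ... closely.
tbap = TreeBasedAggregation(T=8, epsilon=1e9, rng=np.random.default_rng(0))
sums = [tbap.update_and_release(1.0) for _ in range(5)]
```

Drawing each node's noise once, rather than fresh at every release, is what keeps the per-release error at O(log T) noise terms.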
2.3.2 Private Follow The Approximate Leader (PFTAL)
Private Follow The Approximate Leader (PFTAL) is an algorithm due to Smith and Thakurta  that takes in a sequence of strongly convex functions and outputs a sequence of points that minimizes regret. It is a variant of the Follow The Regularized Leader algorithm of , with the difference that instead of using exact sums of subgradients in the update step, the algorithm uses TBAP to provide private and accurate estimates of the sums of the subgradients. This algorithm inherits the differential privacy guarantee of TBAP via post-processing (Theorem 3). PFTAL enjoys low regret due to the no-regret guarantees of Follow the Regularized Leader, and from bounds on the noise added in TBAP. The full algorithm is stated in Appendix A.
Theorem 5 ().
PFTAL is -differentially private, and if are -strongly convex and -Lipschitz, then the expected regret of PFTAL satisfies:
3 Full Information Setting
In this section we present Submodular Private Follow The Approximate Leader (SubmodPFTAL), an algorithm for Online Submodular Minimization that is differentially private and achieves near-optimal regret. In the full information setting, the result follows easily from PFTAL applied to a modified version of the Lovasz extensions of the input submodular functions.
The main difference between using a Follow The Approximate Leader-type algorithm and the subgradient descent-type algorithm of  is the following. When using SubmodPFTAL to make the decision at time , we use all the subgradients we have observed at times . In contrast, if we used the algorithm of , we would only be using the subgradient obtained at . This difference is crucial when trying to incorporate privacy into the setting.
Ideally, we would like to run PFTAL on the Lovasz extensions themselves, so that we can apply the regret guarantee of Theorem 5. However, PFTAL requires strongly convex input functions, but the Lovasz extension is only guaranteed to be convex. To overcome this barrier, we regularize the Lovasz extensions to ensure strong convexity. Define the -regularized Lovasz extension as,
The algorithm SubmodPFTAL then runs PFTAL on .
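Since the display equation (Equation (1)) is not reproduced above, we note that a standard form of such a regularizer, consistent with the strong convexity and Lipschitz bounds used later, is the quadratic one (notation illustrative):

```latex
\hat{f}^{\lambda}_t(x) \;=\; \hat{f}_t(x) \;+\; \frac{\lambda}{2}\,\|x\|_2^2 ,
```

where $\hat{f}_t$ is the Lovasz extension of $f_t$. Adding the quadratic term makes $\hat{f}^{\lambda}_t$ $\lambda$-strongly convex (the Lovasz extension itself is only convex), while changing its value by at most $\lambda n / 2$ on $[0,1]^n$.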
Theorem 6 (Privacy guarantee).
SubmodPFTAL is -differentially private for any sequence of functions with bounded range and for any .
By Theorem 4 we know that the output of TBAP, , is -differentially private. By Theorem 3 we get that the sequence is -differentially private since the procedure is simply post-processing of the ’s. Computing the output is further post-processing of the sequence , and Theorem 3 again yields the result. ∎
Theorem 7 (Regret guarantee).
SubmodPFTAL run with and for any sequence of submodular functions for any guarantees,
where the expectation is taken over the randomness of TBAP and the sampling procedure to choose .
To prove the theorem, we first draw a comparison between SubmodPFTAL and PFTAL so that we can call upon Theorem 5. Notice that SubmodPFTAL is PFTAL run on the sequence of functions defined in Equation (1), with two extra steps used to convert elements from to subsets of . Using the regret guarantee of PFTAL (Theorem 5) on the regularized Lovasz extension, we get,
We now transform this regret guarantee into one for the Lovasz extension. First, notice that for any , , and therefore . Second, we now show that . Indeed, let . Then,
Putting these two observations together with Equation (2), we get,
Plugging , and yields,
We are ready to conclude the proof.
4 Bandit Setting
In this section we present Submodular Private Follow The Approximate Leader with Bandit Feedback (BanditSubmodPFTAL). This algorithm is differentially private and achieves a no regret guarantee for Online Submodular Minimization with bandit feedback.
The bandit setting makes the problem much more challenging because we do not have access to the whole function nor to its subgradients. Instead we only observe the function evaluated at a single point, for our chosen set . This means that we can no longer compute subgradients of the Lovasz extension and run PFTAL on the regularized as in the full information setting.
The key to obtaining sublinear regret is to balance exploration and exploitation. In this setting, exploitation is achieved by sampling exactly from the distribution defined (through the Lovasz extension) by the iterate of BanditSubmodPFTAL. However, if we sample according to the distribution over sets , we do not learn anything about the function’s subgradients, so it is unclear what to do in future steps. To fix this, we should sample from some distribution that is close to , but that still allows us to explore (i.e., obtain an unbiased estimate of the Lovasz extension at ). We use the sampling procedure of Hazan and Kale  to achieve this.
With these modifications, BanditSubmodPFTAL now works similarly to SubmodPFTAL for the full information setting. The algorithm works by computing an unbiased estimator of the gradient of the Lovasz extension , updating a private iterate using TBAP on the regularized estimator, and outputting a random set that depends on . We now present the full algorithm of BanditSubmodPFTAL in Algorithm 2.
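The exploitation step, sampling a set whose expected cost equals the Lovasz extension value at the current iterate, can be sketched with the level-set rule from Definition 6 (the exploration modification of Hazan and Kale is omitted here; names are illustrative):

```python
import numpy as np

def sample_level_set(x, rng=None):
    """Round a fractional point x in [0,1]^n to a random set S satisfying
    E[F(S)] = f_hat(x): draw a threshold tau ~ U[0,1] and keep the
    coordinates strictly above it."""
    rng = rng or np.random.default_rng()
    tau = rng.random()
    return {i for i, xi in enumerate(x) if xi > tau}

# At an integral point the rule is deterministic: coordinates equal to 1
# exceed any tau in [0, 1), so the output is always the indicated set.
S = sample_level_set([1.0, 0.0, 1.0])
```

At a fractional point, the output is always one of the sets in the chain associated with the point, which is what makes the expected cost equal the Lovasz extension value.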
The analysis of BanditSubmodPFTAL relies on the following key properties of the estimate .222Our Lemmas 4 and 5 were asserted without proof in . Due to minor errors in the construction of in , these claims are easily seen to be false under their construction. Here, we build the correct estimator and prove its correctness. Proofs are deferred to the Appendix.
The random vector computed in BanditSubmodPFTAL is an unbiased estimate of a subgradient of the Lovasz extension of submodular , evaluated at point . That is,
The random vector computed in BanditSubmodPFTAL satisfies the following bound on its expected -norm,
where the expectation is taken over the algorithm’s internal randomness up to time .
The exploration-exploitation dilemma can be better understood through the parameter . This parameter trades off the variance of the estimate against the quality of the approximation of the Lovasz extension to the true submodular function . When is large, the variance of is reduced, as can be seen in the statement of Lemma 5. When is small, the performance of is close to that of (see Lemma 6 in Section 4.1). In the statement of our main result (Theorem 9), we optimally tune to balance exploration and exploitation and minimize the overall regret of BanditSubmodPFTAL.
Our two main results of this section show that BanditSubmodPFTAL is differentially private and achieves low regret.
Theorem 8 (Privacy guarantee).
BanditSubmodPFTAL is -differentially private for any sequence of functions with bounded range and for any .
By Theorem 4 we know that the output of TBAP, , is -differentially private. Notice that BanditSubmodPFTAL is running PFTAL on regularized functions thus by the same reasoning as in Theorem 6, the sequence is -differentially private since the procedure is simply post-processing of the ’s. Since is post-processing on the sequence , applying Theorem 3 again completes the proof. ∎
Theorem 9 (Regret guarantee).
BanditSubmodPFTAL run with , , and for any sequence of submodular functions for any guarantees,
4.1 Regret Analysis of BanditSubmodPFTAL
There are several sources of potential sub-optimality in the output of BanditSubmodPFTAL that must be bounded. Firstly, the algorithm optimizes using continuous iterates instead of discrete (Lemma 6). Secondly, it uses the -regularized Lovasz extension instead of the true Lovasz extension to compute iterates (Lemma 7). The algorithm incurs additional loss from the noise added in TBAP to preserve privacy (Lemma 9). Due to the bandit feedback, we cannot compute an exact subgradient of the regularized Lovasz extension, and must instead use a (random) unbiased estimator (Lemma 10).
The following lemmas bound the regret from these sources of error, and are used in the proof of Theorem 9 presented at the end of the section. All omitted proofs are presented in the appendix.
We start with a lemma from Hazan and Kale , showing that the additional loss from choosing a subset of the ground set instead of the point in is not too large.
Lemma 6 ().
For any submodular function , let and be the corresponding iterates and sets as defined in BanditSubmodPFTAL, then .
As in Section 3, the regret guarantees of PFTAL require input functions that are strongly convex, but the Lovasz extension of submodular is only convex. We again regularize the Lovasz extension to ensure that it is strongly convex. Recall the regularized Lovasz extension, as defined in Equation 1:
Recall also that is -strongly convex, satisfies , and is -Lipschitz continuous. Since is an unbiased estimate of the subgradient of the Lovasz extension at point (i.e., by Lemma 4), then .
We now show that the additional regret from using the regularized Lovasz extension instead of the Lovasz extension is not too large. The following lemma was stated without proof in ; we provide a proof in the appendix for completeness.
Lemma 7 ().
Let be any sequence of submodular functions, let be their Lovasz extensions, let be their regularized Lovasz extensions, let be any sequence of elements in . It holds that
It will be useful in our analysis to define , which is a quadratic lower bound on the regularized Lovasz extension since the regularized Lovasz extension is -strongly convex:
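In standard notation (the paper's symbols are elided above, so these are illustrative), $\lambda$-strong convexity of the regularized extension $\hat{f}^{\lambda}_t$ gives the quadratic lower bound

```latex
h_t(x) \;=\; \hat{f}^{\lambda}_t(x_t) + \big\langle \nabla_t,\, x - x_t \big\rangle
+ \frac{\lambda}{2}\,\| x - x_t \|_2^2 \;\le\; \hat{f}^{\lambda}_t(x),
```

with equality at $x = x_t$, where $\nabla_t$ denotes a subgradient of $\hat{f}^{\lambda}_t$ at the iterate $x_t$.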
Note that is -Lipschitz continuous. Indeed .
Our next lemma shows that analyzing this lower bound instead of the regularized Lovasz extension does not harm regret by too much.
Let be any sequence of submodular functions, let be their regularized Lovasz extensions, let be any sequence of elements in . It holds that
For our analysis, we introduce random functions that satisfy for all . Define as follows:
The function is -Lipschitz continuous because .
If we were in a non-private setting, we would define the update step to in BanditSubmodPFTAL as,
where the second equality holds since the first two terms that define do not contain . However, since we desire a differentially private algorithm, we will instead use the private partial sum from TBAP to approximate . Thus the private update is,