As computational resources are increasing rapidly, modern websites and online systems are able to process large amounts of information gathered from their customers in real time. While typically these websites intend to learn and improve their systems in real-time using the available data, this also represents a severe threat to the privacy of customers.
For example, consider a generic scenario for a web search engine like Bing. Sponsored advertisements (ads) served with search results form a major source of revenue for Bing, so Bing needs to serve ads that are relevant to the user and the query. As each user is different and can have a different definition of “relevance”, many websites try to learn user behavior from past searches as well as other available demographic information. This learning problem has two key features: a) the advertisements are generated online in response to a query, b) feedback on the quality of an ad for a user cannot be obtained until the ad is served. Hence, the problem is an online learning game in which the search engine tries to guess (from history and other available information) whether a user would like an ad and receives the cost/reward only after making that online decision; after receiving the feedback, the search engine can update its model. This problem can be cast as a standard online learning problem, and several existing algorithms can be used to solve it reasonably well.
However, processing critical user information in real time also poses severe threats to a user’s privacy. For example, suppose that Bing, in response to certain past queries (say, about a disease), promotes a particular ad that otherwise would not appear at the top, and the user clicks that ad. Then the corresponding advertiser may be able to guess the user’s past queries, thus compromising privacy. Hence, it is critical for the search engine to use an algorithm that not only provides a correct guess about the relevance of an ad to a user, but also guarantees privacy to the user. Other examples where privacy preserving online learning is critical include online portfolio management and online linear prediction.
In this paper, we address privacy concerns for online learning scenarios similar to the ones mentioned above. Specifically, we provide a generic framework for privacy preserving online learning. We use differential privacy  as the formal privacy notion, and use online convex programming (OCP)  as the formal online learning model.
Differential privacy is a popular privacy notion with several interesting theoretical properties. Recently, there has been a lot of progress in differential privacy. However, most of the results assume that all of the data is available beforehand and an algorithm processes this data to extract interesting information without compromising privacy. In contrast, in the online setting that we consider in this paper, data arrives online, one data entry per time step (e.g. user queries and clicks), and the algorithm has to provide an output (e.g. relevant ads) at each step. Hence, the number of outputs produced is roughly the same as the size of the entire dataset. Now, to guarantee differential privacy one has to analyze the privacy of the complete sequence of outputs produced, thereby making privacy preservation a significantly harder problem in this setting. In a related work,  also considered the problem of differentially private online learning. Using the online experts model as the underlying online learning model,  provided an accurate differentially private algorithm to handle counting type problems. However, the setting and the class of problems handled by  is restrictive and it is not clear how their techniques can be extended to handle typical online learning scenarios, such as the one mentioned above. See Section 1.1 for a more detailed comparison to .
Online convex programming (OCP), which we use as our underlying online learning model, is an important and powerful online learning model with several theoretical and practical applications. OCP requires that the algorithm select an output at each step from a fixed convex set, for which the algorithm incurs a cost according to a convex function (that may be different at each step). The cost function is revealed only after the point is selected. The goal is to minimize the regret, i.e., the total “added” loss incurred in comparison to the optimal offline solution, which is the solution obtained after seeing all the cost functions. OCP encompasses various online learning paradigms and has several applications such as portfolio management . Now, assuming that each cost function is bounded over the fixed convex set, the regret incurred by any OCP algorithm can be trivially bounded by $O(T)$, where $T$ is the total number of time-steps for which the algorithm is executed. However, several interesting algorithms have recently been developed that obtain regret that is sub-linear in $T$. That is, as $T \to \infty$, the average cost incurred per step approaches that of the optimal offline solution. In this paper, we use regret as a “goodness” or “utility” property of an algorithm and require that a reasonable OCP algorithm should at least have sub-linear regret.
To recall, we consider the problem of differentially private OCP, where we want to provide differential privacy guarantees along with a sub-linear regret bound. To this end, we provide a general framework to convert any online learning algorithm into a differentially private algorithm with sub-linear regret, provided that the algorithm satisfies two criteria: a) linearly decreasing sensitivity (see Definition 3), b) sub-linear regret. We then analyze two popular OCP algorithms, namely Implicit Gradient Descent (IGD)  and Generalized Infinitesimal Gradient Ascent (GIGA) , to guarantee differential privacy as well as sub-linear regret for a fairly general class of strongly convex functions with Lipschitz continuous gradients. In fact, we show that IGD can be used with our framework even for non-differentiable functions. We then show that if the cost functions are quadratic (e.g. online linear regression), then we can use another OCP algorithm called Follow The Leader (FTL) [20, 22] along with a generalization of a technique by  to guarantee an improved regret bound while preserving privacy.
Furthermore, our differentially private online learning framework can be used to obtain privacy preserving algorithms for a large class of offline learning problems  as well. In particular, we show that our private OCP framework can be used to obtain good generalization error bounds for various offline learning problems using techniques from  (see Section 4.2). Our differentially private offline learning framework can handle a larger class of learning problems with better error bounds than the existing state-of-the-art methods .
1.1 Related Work
As more and more of the world’s information is being digitized, privacy has become a critical issue. To this end, several ad-hoc privacy notions have been proposed; however, most of them now stand broken. De-anonymization of the Netflix challenge dataset by  and of the publicly released AOL search logs  are two examples that were instrumental in discarding these ad-hoc privacy notions. Even relatively sophisticated notions such as $k$-anonymity  and $\ell$-diversity  have been breached by attacks . Hence, in pursuit of a theoretically sound notion of privacy,  proposed differential privacy, a cryptography-inspired definition of privacy. This notion has now been accepted as the standard privacy notion, and in this work we adhere to this notion for our privacy guarantees.
Over the years, the privacy community has developed differentially private algorithms for several interesting problems [6, 7, 8]. In particular, there exist many results concerning privacy for learning problems [2, 3, 35, 29, 33]. Among these,  is of particular interest as they consider a large class of learning problems that can be written as (offline) convex programs. Interestingly, our techniques can be used to handle the offline setting of  as well and, in fact, our method can handle a larger class of learning problems with better error bounds (see Section 4.2).
As mentioned earlier, most of the existing work in differentially private learning has been in the offline setting where the complete dataset is provided upfront. One notable exception is the work of , where authors formally defined the notion of differentially private learning when the data arrives online. Specifically,  defined two notions of differential privacy, namely user level privacy and event level privacy. Roughly speaking, user level privacy guarantees are at the granularity of each user whose data is present in the dataset. In contrast, event level privacy provides guarantees at the granularity of individual records in the dataset. It has been shown in  that it is impossible to obtain any non-trivial result with respect to user level privacy. In our current work we use the notion of event level privacy.  also looked at a particular online learning setting called the experts setting, where their algorithm achieves a regret bound of for counting problems while guaranteeing event level differential privacy. However, their approach is restricted to experts advice setting, and cannot handle typical online learning problems that arise in practice. In contrast, we consider a significantly more practical and powerful class of online learning problems, namely, online convex programming, and also provide a method for handling a large class of offline learning problems.
In a related line of work, there have been a few results that use online learning techniques to obtain differentially private algorithms [18, 14]. In particular,  used the experts framework to obtain a differentially private algorithm for answering adaptive counting queries on a dataset. However, we stress that although these methods use online learning techniques, they are designed to handle only the offline setting, where the dataset is fixed and known in advance.
Recall that in the online setting, whenever a new data entry is added to the dataset, a query has to be answered, i.e., the total number of queries to be answered is of the order of the size of the dataset. In a line of work started by  and subsequently explored in detail by [12, 25], it was shown that if one answers subset sum queries on a dataset with noise in each query smaller than , then using those answers alone one can reconstruct a large fraction of the dataset. That is, when the number of queries is almost the same as the size of the dataset, a reasonably “large” amount of noise needs to be added to preserve privacy. Subsequently, there has been a lot of work on lower bounds (specific to differential privacy) on the amount of noise needed to guarantee privacy while answering a given number of queries (see [19, 25, 4]). We note that our generic online learning framework (see Section 3.1) also adds noise of the order of , at each step, thus respecting the established lower bounds. In contrast, our algorithm for quadratic loss functions (see Section 3.5) avoids this barrier by exploiting the special structure of the queries that need to be answered.
1.2 Our Contributions
Following are the main contributions of this paper:
We formalize the problem of privacy preserving online learning using differential privacy as the privacy notion and Online Convex Programming (OCP) as the underlying online learning model. We provide a generic differentially private framework for OCP in Section 3 and provide privacy and utility (regret) guarantees.
For a special class of OCP where cost functions are quadratic functions only, we show that we can improve the regret bound to by exploiting techniques from . This special class includes a very important online learning problem, namely, online linear regression.
2.1 Online Convex Programming
Online convex programming (OCP) is one of the most popular and powerful paradigms in the online learning setting. OCP can be thought of as a game between a player and an adversary. At each step $t$, the player selects a point $\mathbf{x}_t$ from a convex set $\mathcal{C}$. Then, the adversary selects a convex cost function $f_t$ and the player has to pay a cost of $f_t(\mathbf{x}_t)$. Hence, an OCP algorithm $\mathcal{A}$ maps a function sequence $F = \langle f_1, \dots, f_T \rangle$ to a sequence of points $X = \langle \mathbf{x}_1, \dots, \mathbf{x}_T \rangle$, i.e., $\mathcal{A}(F) = X$. Now, the goal of the player (or the algorithm) is to minimize the total cost incurred over a fixed number (say $T$) of iterations. However, as the adversary selects the function $f_t$ after observing the player’s move $\mathbf{x}_t$, it can make the total cost incurred by the player arbitrarily large. Hence, a more realistic goal for the player is to minimize regret, i.e., the total cost incurred when compared to the optimal offline solution $\mathbf{x}^*$ selected in hindsight, i.e., when all the functions have already been provided. Formally,
Definition 1 (Regret).
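Let $\mathcal{A}$ be an online convex programming algorithm. Also, let $\mathcal{A}$ select a point $\mathbf{x}_t \in \mathcal{C}$ at the $t$-th iteration and let $f_t : \mathcal{C} \to \mathbb{R}$ be the convex cost function served at the $t$-th iteration. Then, the regret $\mathcal{R}_{\mathcal{A}}(T)$ of $\mathcal{A}$ over $T$ iterations is given by:
$$\mathcal{R}_{\mathcal{A}}(T) = \sum_{t=1}^{T} f_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{C}} \sum_{t=1}^{T} f_t(\mathbf{x}).$$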
Assuming each $f_t$ to be a bounded function over $\mathcal{C}$, any trivial algorithm that selects a random point will have $O(T)$ regret. However, several results [27, 36] show that if each $f_t$ is a bounded Lipschitz function over $\mathcal{C}$, $O(\sqrt{T})$ regret can be achieved. Furthermore, if each $f_t$ is a “strongly” convex function, $O(\log T)$ regret can be achieved [27, 22].
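To make the regret definition concrete, the following minimal sketch runs a projected online gradient step on one-dimensional strongly convex losses and compares the measured regret against the logarithmic bound known for such step sizes. The loss family $f_t(x) = \frac{1}{2}(x - z_t)^2$, the set $\mathcal{C} = [-1, 1]$, the step size $1/(\alpha t)$ and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
import math
import random

def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval C = [lo, hi]."""
    return min(hi, max(lo, x))

def ogd_regret(targets, alpha=1.0):
    """Regret of projected online gradient descent on f_t(x) = 0.5 * (x - z_t)^2."""
    x = 0.0                      # x_1, chosen independently of the data
    total = 0.0
    for t, z in enumerate(targets, start=1):
        total += 0.5 * (x - z) ** 2          # pay f_t(x_t) before seeing f_t
        grad = x - z                         # gradient of f_t at x_t
        x = project(x - grad / (alpha * t))  # step size 1 / (alpha * t)
    # Offline optimum: the projected mean minimizes the sum of these quadratics.
    best = project(sum(targets) / len(targets))
    offline = sum(0.5 * (best - z) ** 2 for z in targets)
    return total - offline

rng = random.Random(0)
T = 1000
targets = [rng.uniform(-1.0, 1.0) for _ in range(T)]
regret = ogd_regret(targets)
# Gradients are bounded by G = 2 on C and alpha = 1, so G^2 / (2 * alpha) = 2.
bound = 2.0 * (1.0 + math.log(T))
print(regret, bound)
```

The point of the sketch is only that the cumulative gap to the best fixed point in hindsight grows far more slowly than $T$.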
2.2 Differential Privacy
We now formally define the notion of differential privacy in the context of our problem.
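Definition 2 ($(\epsilon, \delta)$-differential privacy).
Let $F = \langle f_1, \dots, f_T \rangle$ be a sequence of convex functions. Let $\mathcal{A}(F) = X$, where $X = \langle \mathbf{x}_1, \dots, \mathbf{x}_T \rangle \in \mathcal{C}^T$, be the outputs of an OCP algorithm $\mathcal{A}$ when applied to $F$. Then, a randomized OCP algorithm $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private if, given any two function sequences $F$ and $F'$ that differ in at most one function entry, for all $\mathcal{S} \subseteq \mathcal{C}^T$ the following holds:
$$\Pr[\mathcal{A}(F) \in \mathcal{S}] \le e^{\epsilon} \Pr[\mathcal{A}(F') \in \mathcal{S}] + \delta.$$
Intuitively, the above definition means that changing an $f_t \in F$ to some other function will not modify the output sequence $X$ by a large amount. If we consider each $f_t$ to be some information associated with an individual, then the above definition states that the presence or absence of that individual’s entry in the dataset will not affect the output by too much. Hence, the output of the algorithm will not reveal any extra information about the individual. The privacy parameters $(\epsilon, \delta)$ decide the extent to which an individual’s entry affects the output; lower values of $\epsilon$ and $\delta$ mean a higher level of privacy. Typically, $\delta$ should be exponentially small in the problem parameters, i.e., in our case .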
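$F = \langle f_1, \dots, f_T \rangle$ denotes the function sequence given to an OCP algorithm $\mathcal{A}$, and $\mathcal{A}(F) = X$ s.t. $X = \langle \mathbf{x}_1, \dots, \mathbf{x}_T \rangle \in \mathcal{C}^T$ represents the output sequence when $\mathcal{A}$ is applied to $F$. We denote the subsequence of functions till the $t$-th step as $F_t = \langle f_1, \dots, f_t \rangle$. $d$ denotes the dimensionality of the ambient space of the convex set $\mathcal{C}$. Vectors are denoted by bold-face symbols, and matrices are represented by capital letters. $\mathbf{x}^T \mathbf{y}$ denotes the inner product of $\mathbf{x}$ and $\mathbf{y}$. $\|M\|_2$ denotes the spectral norm of matrix $M$; recall that for symmetric matrices $M$, $\|M\|_2$ is the largest eigenvalue of $M$ in absolute value.
Typically, $\alpha$ is the minimum strong convexity parameter of any $f_t \in F$. Similarly, $L$ and $L_G$ are the largest Lipschitz constant and the largest Lipschitz constant of the gradient of any $f_t \in F$. Recall that a function $f : \mathcal{C} \to \mathbb{R}$ is $\alpha$-strongly convex if for all $\gamma \in (0, 1)$ and for all $\mathbf{x}, \mathbf{y} \in \mathcal{C}$ the following holds: $f(\gamma \mathbf{x} + (1 - \gamma) \mathbf{y}) \le \gamma f(\mathbf{x}) + (1 - \gamma) f(\mathbf{y}) - \frac{\alpha}{2} \gamma (1 - \gamma) \|\mathbf{x} - \mathbf{y}\|_2^2$. Also recall that a function $f$ is $L$-Lipschitz if for all $\mathbf{x}, \mathbf{y} \in \mathcal{C}$ the following holds: $|f(\mathbf{x}) - f(\mathbf{y})| \le L \|\mathbf{x} - \mathbf{y}\|_2$. Function $f$ has an $L_G$-Lipschitz continuous gradient if $\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\|_2 \le L_G \|\mathbf{x} - \mathbf{y}\|_2$, for all $\mathbf{x}, \mathbf{y} \in \mathcal{C}$. The non-private and private versions of an OCP algorithm output $\mathbf{x}_{t+1}$ and $\hat{\mathbf{x}}_{t+1}$ respectively, at time step $t$. $\mathbf{x}^*$ denotes the optimal offline solution, that is $\mathbf{x}^* = \arg\min_{\mathbf{x} \in \mathcal{C}} \sum_{t=1}^{T} f_t(\mathbf{x})$. $\mathcal{R}_{\mathcal{A}}(T)$ denotes the regret of an OCP algorithm $\mathcal{A}$ when applied for $T$ steps.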
3 Differentially Private Online Convex Programming
In Section 2.1, we defined the online convex programming (OCP) problem and presented a notion of utility (called regret) for OCP algorithms. Recall that a reasonable OCP algorithm should have sub-linear regret, i.e., the regret should be sub-linear in the number of time steps $T$.
In this section, we present a generic differentially private framework for solving OCP problems (see Algorithm 1). We further provide formal privacy and utility guarantees for this framework (see Theorems 1 and 2). We then use our private OCP framework to convert two existing OCP algorithms, namely, Implicit Gradient Descent (IGD) and Generalized Infinitesimal Gradient Ascent (GIGA), into differentially private algorithms using a “generic” transformation. For both the algorithms mentioned above, we guarantee $(\epsilon, \delta)$-differential privacy with sub-linear regret.
Recall that a differentially private OCP algorithm $\mathcal{A}$ should not produce a significantly different output for a function sequence $F'$ (with high probability) when compared to $F$, where $F$ and $F'$ differ in exactly one function. Hence, to show differential privacy for an OCP algorithm, we first need to show that it is not very “sensitive” to previous cost functions. To this end, below we formally define the sensitivity of an OCP algorithm $\mathcal{A}$.
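Definition 3 ($L_2$-sensitivity).
Let $F, F'$ be two function sequences differing in at most one entry, i.e., at most one function can be different. Then, the sensitivity of an algorithm $\mathcal{A}$ is the difference in the $t$-th output $\mathbf{x}_{t+1} = \mathcal{A}(F)_t$ of the algorithm, i.e.,
$$\mathcal{S}(\mathcal{A}, t) = \sup_{F, F'} \|\mathcal{A}(F)_t - \mathcal{A}(F')_t\|_2.$$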
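The sensitivity notion can be probed empirically by running the same deterministic update on two neighboring function sequences and recording how far apart the iterates are at each step. The sketch below does this for a projected gradient step on the illustrative losses $f_t(x) = \frac{1}{2}(x - z_t)^2$ over $[-1, 1]$; the loss family, the step size $1/t$ and the helper names are assumptions made for the example, not the paper's algorithms.

```python
import random

def ogd_iterates(targets):
    """Return [x_2, ..., x_{T+1}] for projected gradient steps with step size 1/t."""
    x, out = 0.0, []
    for t, z in enumerate(targets, start=1):
        x = min(1.0, max(-1.0, x - (x - z) / t))
        out.append(x)            # out[t - 1] is the t-th output x_{t+1}
    return out

rng = random.Random(1)
T = 200
F = [rng.uniform(-1.0, 1.0) for _ in range(T)]
F_prime = list(F)
# Neighboring sequence: replace only the 10th function with a far-away one.
F_prime[9] = 1.0 if F[9] < 0 else -1.0

gaps = [abs(a - b) for a, b in zip(ogd_iterates(F), ogd_iterates(F_prime))]
# Scaled gaps t * |x_{t+1} - x'_{t+1}| stay bounded, i.e. the gap decays like 1/t.
scaled = [t * g for t, g in enumerate(gaps, start=1)]
print(max(scaled), gaps[9], gaps[-1])
```

For this particular update each iterate is a running average of the $z_t$, so a single changed entry moves the $t$-th output by at most $2/t$, which is the kind of linearly decaying sensitivity the framework asks for.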
As mentioned earlier, another natural requirement for an OCP algorithm is that it should have a provably low regret bound. There exists a variety of methods in literature which satisfy this requirement up to different degrees depending on the class of the functions .
Under the above two assumptions on the OCP algorithm , we provide a general framework for adapting the given OCP algorithm () into a differentially private algorithm. Formally, the given OCP algorithm should satisfy the following two conditions:
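$L_2$-sensitivity: The $L_2$-sensitivity of the algorithm $\mathcal{A}$ should decay linearly with time, i.e.,
$$\mathcal{S}(\mathcal{A}, t) \le \frac{\lambda_{\mathcal{A}}}{t}, \tag{1}$$
where $\mathcal{S}(\mathcal{A}, t)$ denotes the sensitivity of the $t$-th output (see Definition 3) and $\lambda_{\mathcal{A}}$ is a constant depending only on $\mathcal{A}$ and on the strong convexity and Lipschitz constants of the functions in $F$.
Regret bound $\mathcal{R}_{\mathcal{A}}(T)$: The regret of $\mathcal{A}$ is assumed to be bounded, typically by a sub-linear function of $T$, i.e.,
$$\mathcal{R}_{\mathcal{A}}(T) = o(T). \tag{2}$$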
Given that $\mathcal{A}$ satisfies both (1) and (2), we convert it into a private algorithm by perturbing $\mathbf{x}_{t+1}$ (the output of $\mathcal{A}$ at the $t$-th step) by a small amount of noise, whose magnitude depends on the sensitivity parameter $\lambda_{\mathcal{A}}$ of $\mathcal{A}$. Let $\tilde{\mathbf{x}}_{t+1}$ be the perturbed output, which might lie outside the convex set $\mathcal{C}$. As our online learning game requires each output to lie in $\mathcal{C}$, we project $\tilde{\mathbf{x}}_{t+1}$ back onto $\mathcal{C}$ and output the projection $\hat{\mathbf{x}}_{t+1}$. Note that our Private OCP (POCP) algorithm also stores the “uncorrupted” iterate $\mathbf{x}_{t+1}$, as it is used in the next step. See Algorithm 1 for pseudo-code of our method.
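The perturb-then-project pattern described above can be sketched as follows. This is only an illustration of the structure: the Gaussian noise scale `beta / t` is a hypothetical placeholder (the calibration used in Algorithm 1 is not reproduced here), the convex set is assumed to be a Euclidean ball, and `gradient_step` is an assumed stand-in for the non-private update.

```python
import math
import random

def project_ball(v, radius=1.0):
    """Euclidean projection of v onto the ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= radius:
        return list(v)
    return [c * radius / norm for c in v]

def gradient_step(x, z, t):
    """Assumed non-private update for f_t(x) = 0.5 * ||x - z||^2, step size 1/t."""
    return project_ball([xi - (xi - zi) / t for xi, zi in zip(x, z)])

def pocp(update, losses, x1, beta, rng):
    """Perturb each non-private iterate, project it, and release the projection."""
    x = list(x1)                       # uncorrupted iterate, kept for the next step
    outputs = [project_ball(x)]        # x_1 does not depend on the data
    for t, loss in enumerate(losses, start=1):
        x = update(x, loss, t)         # non-private x_{t+1}
        sigma = beta / t               # hypothetical noise scale, decaying like 1/t
        noisy = [c + rng.gauss(0.0, sigma) for c in x]
        outputs.append(project_ball(noisy))   # released output
    return outputs

rng = random.Random(0)
d, T = 3, 50
targets = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(T)]
outputs = pocp(gradient_step, targets, [0.0] * d, beta=0.5, rng=rng)
print(outputs[-1])
```

Note that the noise never feeds back into the update: the next step always starts from the stored non-private iterate, so the perturbations do not accumulate across rounds.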
Now, using the above two assumptions along with concentration bounds for Gaussian noise vectors, we obtain both privacy and regret bound for our Private OCP algorithm. See Section 3.1 and 3.2 for a detailed analysis of our privacy guarantee and the regret bound.
In Sections 3.3 and 3.4, we use our abstract private OCP framework to convert IGD and GIGA algorithms into private OCP methods. For both the algorithms, privacy and regret guarantees follow easily from the guarantees of our OCP framework once the corresponding sensitivity bounds are established.
3.1 Privacy Analysis for POCP
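Under the assumption (1), changing one function in the cost function sequence $F$ can lead to a change of at most $\lambda_{\mathcal{A}}/t$ in the $t$-th output of $\mathcal{A}$, where $\lambda_{\mathcal{A}}$ is the sensitivity constant from (1). Hence, intuitively, adding noise of the same order should make the $t$-th step output of Algorithm 1 differentially private. We make the claim precise in the following lemma.
As the output $\hat{\mathbf{x}}_{t+1}$ is just a projection of $\tilde{\mathbf{x}}_{t+1}$, i.e., a function of $\tilde{\mathbf{x}}_{t+1}$ that is independent of the input functions $F$, differential privacy for $\tilde{\mathbf{x}}_{t+1}$ implies the same for $\hat{\mathbf{x}}_{t+1}$.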
Now by the definition of differential privacy (see Definition 2), is -differential private, if for any measurable set :
where is the output of the noise addition step (see Algorithm 1, Step 7) of our POCP algorithm, when applied to function sequence . Similarly, is the output of the noise addition to which is obtained by applying update step to , where differs from in exactly one function entry.
Now, and . Let . Then, we have . Now, using assumption (1) for the OCP algorithm and Mill’s inequality,
where . Setting R.H.S. , we have .
Now, we define a “good set” :
We now bound :
Now, for :
where and is as given in the Lemma statement. The second last inequality follows from the definition of and the sensitivity assumption (1).
Hence, proved. ∎
Now, the above lemma shows -differential privacy for each step of Algorithm 1. Hence, using a simple composition argument (see ) should guarantee -differential privacy for all the steps. So to get overall privacy, we will need . That is, a noise of the order needs to be added at each step, which intuitively means that the noise added is larger than the effect of incoming function and hence can lead to an arbitrarily bad regret.
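The cost of naive composition can be illustrated numerically. The sketch below splits an overall budget $(\epsilon, \delta)$ evenly over $T$ steps and calibrates per-step Gaussian noise with the classical Gaussian mechanism formula $\sigma = \Delta \sqrt{2 \ln(1.25/\delta')}/\epsilon'$, assuming a sensitivity of $\lambda/t$ at step $t$. Both the even split and the use of this textbook calibration are assumptions for illustration; they are not the calibration of Lemma 1.

```python
import math

def gaussian_sigma(sensitivity, eps, delta):
    """Classical Gaussian mechanism noise scale (valid for eps < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

def naive_noise_schedule(lam, eps, delta, T):
    """Per-step noise when the budget is split evenly across T steps."""
    eps_step, delta_step = eps / T, delta / T
    return [gaussian_sigma(lam / t, eps_step, delta_step) for t in range(1, T + 1)]

lam, eps, delta, T = 1.0, 0.5, 1e-6, 1000
sigmas = naive_noise_schedule(lam, eps, delta, T)
# Even at the last step, where the sensitivity is smallest, the noise does not vanish.
print(min(sigmas), lam / eps)
```

Because the per-step budget shrinks like $1/T$ while the sensitivity shrinks only like $1/t$, the noise scale never drops below a constant of order $\lambda/\epsilon$, which is why a sharper composition argument is needed.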
To avoid this problem, we need to exploit the interdependence between the iterates (and outputs) of our algorithm so as to obtain a better bound than the one obtained by using the union bound. For this purpose, we use the following lemma by  that bounds the relative entropy of two random variables in terms of the $L_\infty$ norm of their probability density ratio, and also a proof technique developed by [18, 17] for the problem of releasing differentially private datasets.
Lemma 2 ().
Suppose two random variables and satisfy,
Then . is the support set of a random variable .
We now state a technical lemma which will be useful for our differential privacy proof.
Using the fact that is -differential private:
Lemma now follows using the above observation with Lemma 2. ∎
Now we state the privacy guarantee for Algorithm 1 over all iterations.
Theorem 1 (POCP Privacy).
Now, the probability that the noise vectors are all from the “good” set in all the rounds is at least .
We now condition the remaining proof on the event that the noise vector in each round is such that .
Let . Using Lemma 3,
Let . Since each is sampled independently and the randomness in is only due to , ’s are independent. We have , where . By Azuma-Hoeffding’s inequality,
Setting , we get . Hence, with probability at least , -differential privacy holds conditioned on , i.e,
Also, recall that with probability at least , the noise vector in each round itself was such that . Hence, with probability at least , -differential privacy holds. -differential privacy now follows using a standard argument similar to (5). ∎
3.2 Utility (Regret) Analysis for POCP
In this section, we provide a generic regret bound analysis for our POCP algorithm (see Algorithm 1). The regret bound of POCP depends on the regret $\mathcal{R}_{\mathcal{A}}(T)$ of the non-private OCP algorithm $\mathcal{A}$. For typical OCP algorithms like IGD, GIGA and FTL, $\mathcal{R}_{\mathcal{A}}(T) = O(\log T)$, assuming each cost function is strongly convex.
Theorem 2 (POCP Regret).
Let $L$ be the maximum Lipschitz constant of any function $f_t$ in the sequence $F$, $\mathcal{R}_{\mathcal{A}}(T)$ the regret of the non-private OCP algorithm $\mathcal{A}$ over $T$ time steps, and $\lambda_{\mathcal{A}}$ the sensitivity parameter of $\mathcal{A}$ (see (1)). Then the expected regret of our POCP algorithm (Algorithm 1) satisfies:
where is the dimensionality of the output space, and is the diameter of the convex set . In other words, the regret bound is .
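Let $\hat{\mathbf{x}}_1, \dots, \hat{\mathbf{x}}_T$ be the output of the POCP algorithm and let $\mathbf{x}_1, \dots, \mathbf{x}_T$ be the corresponding non-private iterates. By the Lipschitz continuity of the cost functions $f_t$ we have,
$$\sum_{t=1}^{T} f_t(\hat{\mathbf{x}}_t) - \min_{\mathbf{x} \in \mathcal{C}} \sum_{t=1}^{T} f_t(\mathbf{x}) \le \sum_{t=1}^{T} f_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{C}} \sum_{t=1}^{T} f_t(\mathbf{x}) + L \sum_{t=1}^{T} \|\hat{\mathbf{x}}_t - \mathbf{x}_t\|_2 \le \mathcal{R}_{\mathcal{A}}(T) + L \sum_{t=1}^{T} \|\hat{\mathbf{x}}_t - \mathbf{x}_t\|_2.$$
Since at any time $t$, $\hat{\mathbf{x}}_{t+1}$ is the projection of $\tilde{\mathbf{x}}_{t+1}$ on the convex set $\mathcal{C}$ and $\mathbf{x}_{t+1} \in \mathcal{C}$, we have
$$\|\hat{\mathbf{x}}_{t+1} - \mathbf{x}_{t+1}\|_2 \le \|\tilde{\mathbf{x}}_{t+1} - \mathbf{x}_{t+1}\|_2 = \|\mathbf{b}_t\|_2,$$
where $\mathbf{b}_t$ is the noise vector added in the $t$-th iteration of the POCP algorithm. Therefore,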
Therefore, follows Chi-distribution with parameters and .
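As a numerical sanity check on the role of the Chi distribution here, the sketch below estimates the expected Euclidean norm of a $d$-dimensional Gaussian noise vector with independent $N(0, \sigma^2)$ coordinates and compares it with the exact Chi mean $\sigma \sqrt{2}\, \Gamma((d+1)/2) / \Gamma(d/2)$ and the simpler upper bound $\sigma \sqrt{d}$. The values of `d` and `sigma` are arbitrary illustrative choices, not the parameters of Algorithm 1.

```python
import math
import random

def mean_noise_norm(d, sigma, n_samples, rng):
    """Monte Carlo estimate of E||b||_2 for b ~ N(0, sigma^2 I_d)."""
    total = 0.0
    for _ in range(n_samples):
        total += math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(d)))
    return total / n_samples

def exact_mean_norm(d, sigma):
    """Mean of sigma times a Chi-distributed variable with d degrees of freedom."""
    return sigma * math.sqrt(2.0) * math.gamma((d + 1) / 2.0) / math.gamma(d / 2.0)

d, sigma = 5, 0.3
empirical = mean_noise_norm(d, sigma, 20000, random.Random(0))
exact = exact_mean_norm(d, sigma)
upper = sigma * math.sqrt(d)     # follows from Jensen's inequality
print(empirical, exact, upper)
```

The bound $\sigma \sqrt{d}$ is what makes the dimensionality appear in the regret of the private algorithm: each round contributes at most $L$ times the expected noise norm.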
Using Chebyshev’s inequality, we can also obtain a high probability bound on the regret.
Let $L$ be the maximum Lipschitz constant of any function $f_t$ in the sequence $F$, $\mathcal{R}_{\mathcal{A}}(T)$ the regret of the non-private OCP algorithm $\mathcal{A}$ over $T$ time steps, and $\lambda_{\mathcal{A}}$ the sensitivity parameter of $\mathcal{A}$ (see (1)). Then with probability at least , the regret of our Private OCP algorithm (Algorithm 1) satisfies:
where is the dimensionality of the output space, is the diameter of .
3.3 Implicit Gradient Descent Algorithm
In this section, we consider the Implicit Gradient Descent (IGD) algorithm , a popular online convex programming algorithm, and present a differentially private version of the same using our generic framework (see Algorithm 1). Before deriving its privacy preserving version, we first briefly describe the IGD algorithm .
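At each step $t$, IGD incurs loss $f_t(\mathbf{x}_t)$. Now, given $f_t$, IGD finds the $t$-th step output $\mathbf{x}_{t+1}$ so that it is not “far” away from the current solution $\mathbf{x}_t$ but at the same time tries to minimize the cost $f_t(\mathbf{x}_{t+1})$. Formally,
$$\mathbf{x}_{t+1} = \arg\min_{\mathbf{x} \in \mathcal{C}} \frac{1}{2} \|\mathbf{x} - \mathbf{x}_t\|_2^2 + \eta_t f_t(\mathbf{x}),$$
where $\eta_t$ is the step size and the squared Euclidean distance is used as the notion of distance from the current iterate.  describe a much larger class of distance functions that can be used, but for simplicity of exposition we consider the Euclidean distance only. Assuming each $f_t$ is a strongly convex function, a simple modification of the proof by  shows $O(\log T)$ regret for IGD, i.e., $\mathcal{R}_{\text{IGD}}(T) = O(\log T)$.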
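For intuition, the implicit update has a closed form in simple cases. The sketch below assumes one-dimensional losses $f_t(x) = \frac{1}{2}(x - z_t)^2$ over $\mathcal{C} = [-1, 1]$ and a step size $\eta_t = 1/(\alpha t)$; these choices and the function names are illustrative assumptions. For this loss the objective $\frac{1}{2}(x - x_t)^2 + \frac{\eta_t}{2}(x - z_t)^2$ is a single quadratic centered at $(x_t + \eta_t z_t)/(1 + \eta_t)$, so the constrained minimizer is the projection of that center onto $\mathcal{C}$. A grid search is used to cross-check the closed form.

```python
def igd_step(x_t, z, eta, lo=-1.0, hi=1.0):
    """Closed-form implicit step for f(x) = 0.5 * (x - z)^2 on the interval [lo, hi]."""
    center = (x_t + eta * z) / (1.0 + eta)   # unconstrained minimizer
    return min(hi, max(lo, center))          # projection onto the interval

def igd_objective(x, x_t, z, eta):
    """Proximity term plus scaled loss, as in the implicit update."""
    return 0.5 * (x - x_t) ** 2 + eta * 0.5 * (x - z) ** 2

x_t, z, alpha, t = 0.8, 0.2, 1.0, 3
eta = 1.0 / (alpha * t)                      # assumed step size schedule
x_next = igd_step(x_t, z, eta)

# Brute-force check on a fine grid over C = [-1, 1].
grid = [-1.0 + 2.0 * i / 2000 for i in range(2001)]
best_on_grid = min(grid, key=lambda x: igd_objective(x, x_t, z, eta))
print(x_next, best_on_grid)
```

Unlike an explicit gradient step, the implicit step evaluates the loss at the new point, which is why it remains well defined even when $f_t$ is not differentiable.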
Recall that our generic private OCP framework can be used to convert any OCP algorithm as long as it satisfies the low-sensitivity and low-regret assumptions (see (1), (2)). Now, similar to POCP, our Private IGD (PIGD) algorithm also adds appropriately calibrated noise at each update step to obtain differentially private outputs $\hat{\mathbf{x}}_{t+1}$. See Algorithm 2 for pseudo-code of our algorithm.
As stated above, $\mathcal{R}_{\text{IGD}}(T) = O(\log T)$ if each $f_t$ is strongly convex. We now bound the sensitivity of IGD at each step in the following lemma. The proof makes use of a simple and novel induction-based technique.
Lemma 4 (IGD Sensitivity).
$L_2$-sensitivity (see Definition 3) of the IGD algorithm is  for the $t$-th iterate, where $L$ is the maximum Lipschitz constant of any function $f_t$.
We prove the above lemma using mathematical induction.
Base Case ($t = 1$): As $\mathbf{x}_1$ is selected randomly, its value does not depend on the underlying dataset.
Induction Step : As is strongly convex, the strong convexity coefficient of the function is . Now using strong convexity and the fact that at optima , we get:
Now, we consider two cases: