1 Introduction
Stochastic convex optimization (SCO) is a central problem in machine learning and statistics, where for a sample space
, parameter space , and a collection of convex losses , one wishes to solve(1) 
using an observed dataset . While, as formulated, the problem is by now fairly well understood [12, 38, 29, 10, 37], it is becoming clear that, because of considerations beyond pure statistical accuracy—memory or communication costs [45, 26, 13], fairness [23, 28], personalization or distributed learning [35]—problem (1) is simply insufficient to address modern learning problems. To that end, researchers have revisited SCO under the additional constraint that the solution preserves the privacy of the provided sample [22, 21, 1, 16, 19]. A waypoint is Bassily et al. [7], who provide a private method with optimal convergence rates for the related empirical risk minimization problem, with recent papers focusing on SCO and providing (worst-case) optimal rates in various settings: smooth convex functions [8, 25], non-smooth functions [9], non-Euclidean geometry [5, 4] and under more stringent privacy constraints [34].
Yet these works ground their analyses in worst-case scenarios and provide guarantees for the hardest instance of the class of problems they consider. Correspondingly, they argue that their algorithms are optimal in a minimax sense: for any algorithm, there exists a hard instance on which the error achieved by the algorithm matches the upper bound. While valuable, these results are pessimistic—the exhibited hard instances are typically pathological—and fail to reflect achievable performance.
In this work, we consider the problem of adaptivity when solving (1) under privacy constraints. Importantly, we wish to provide private algorithms that adapt to the hardness of the objective . A loss function  may belong to multiple problem classes, each exhibiting different achievable rates, so a natural desideratum is to attain the error rate of the easiest subclass. As a simple vignette, if one receives an arbitrary Lipschitz convex loss function , the worst-case guarantee of any DP algorithm is . However, if one learns that  exhibits some growth property—say it is strongly convex—the guarantee improves to the faster rate  with the appropriate algorithm. It is thus important to provide algorithms that achieve the rates of the “easiest” class to which the function belongs [32, 46, 18]. To that end, consider the nested classes of functions  for  such that, if  then there exists  such that for all ,
For example, strong convexity implies growth with parameter . This growth assumption closely relates to uniform convexity [32] and the Polyak-Kurdyka-Łojasiewicz inequality [11], and we make these connections precise in Section 2. Intuitively, smaller  makes the function much easier to optimize: the error grows quickly as one moves away from the optimal point. Objectives with growth are widespread in machine learning applications: among others, the regularized hinge loss exhibits sharp growth (i.e. ), while  or constrained norm regression —i.e.  and —has growth  for any integer greater than  [43]. In this work, we provide private algorithms that adapt to the actual growth of the function at hand.
We begin our analysis by examining Asi and Duchi’s inverse sensitivity mechanism [2] on ERM as a motivation. While not a practical algorithm, it achieves instance-optimal rates for any one-dimensional function under mild assumptions, quantifying the best bound one could hope to achieve with an adaptive algorithm and showing (in principle) that adaptive private algorithms can exist. We first show that for any function with growth, the inverse sensitivity mechanism achieves privacy cost ; importantly, without knowledge of the function class  that  belongs to. This grounds and motivates our work in three ways: (i) it validates our choice of subclasses, as the privacy rate is effectively controlled by the value of ; (ii) it exhibits the rate we wish to achieve with efficient algorithms on ; and (iii) it showcases that for easier functions, privacy costs shrink significantly—to illustrate, for  the privacy rate becomes .
We continue our treatment of problem (1) under growth in Section 4 and develop practical algorithms that achieve the rates of the inverse sensitivity mechanism. Moreover, for approximate differential privacy, our algorithms improve the rates, achieving roughly . Our algorithms hinge on a reduction to SCO: we show that by solving a sequence of increasingly constrained SCO problems, one achieves the right rate whenever the function exhibits growth at the optimum. Importantly, our algorithm only requires a lower bound (where is the actual growth of ).
We provide optimality guarantees for our algorithms in Section 5 and show that both the inverse sensitivity mechanism and the efficient algorithms of Section 4 are simultaneously minimax optimal over all classes  whenever  and  for DP algorithms. Finally, we prove that in arbitrary dimension, for both pure and approximate-DP constraints, our algorithms are also simultaneously optimal for all classes with .
Along the way, we provide results that may be of independent interest to the community. First, we develop optimal algorithms for SCO under pure differential privacy constraints, which, to the best of our knowledge, do not exist in the literature. Second, our algorithms and analysis provide high-probability bounds on the loss, whereas existing results only provide (weaker) bounds on the expected loss. Finally, we complete the results of Ramdas and Singh [40] on (non-private) optimization lower bounds for functions with growth by providing information-theoretic lower bounds (in contrast to oracle-based lower bounds that rely on observing only gradient information) and capturing the optimal dependence on all problem parameters (namely  and ).
1.1 Related work
Convex optimization is one of the best-studied problems in private data analysis [16, 19, 41, 7]. The first papers in this line of work mainly study minimizing the empirical loss, and establish that the (minimax) optimal privacy rates are  for pure DP and  for DP [16, 7]. More recently, several works instead consider the harder problem of privately minimizing the population loss [8, 25]. These papers introduce new algorithmic techniques to obtain the worst-case optimal rate of  for DP. They also show how to improve this rate to the faster  in the case of strongly convex functions. Our work subsumes both of these results as they correspond to  and  respectively. To the best of our knowledge, no prior work in private optimization investigates the rates under general growth assumptions or adaptivity to such conditions.
In contrast, the optimization community has extensively studied growth assumptions [40, 32, 15] and shows that on these problems, carefully crafted algorithms improve upon the standard rate  for convex functions to the faster . [32] derives worst-case optimal (in the first-order oracle model) gradient algorithms in the uniformly convex case (i.e. ) and provides techniques to adapt to the growth , while [40], drawing connections between growth conditions and active learning, provides upper and lower bounds in the first-order stochastic oracle model. We complete the results of the latter and provide information-theoretic lower bounds that have optimal dependence on  and —their lower bound only holds for  inversely proportional to , when . Closest to our work is [15], which studies instance-optimality via local minimax complexity [14]. For one-dimensional functions, they develop a bisection-based instance-optimal algorithm and show that on individual functions of the form , the local minimax rate is .
2 Preliminaries
We first provide notation that we use throughout this paper, define useful assumptions and present key definitions in convex analysis and differential privacy.
Notation.
typically denotes the sample size and the dimension. Throughout this work, refers to the optimization variable, to the constraint set and to elements ( when random) of the sample space . We usually denote by the (convex) loss function and for a dataset , we define the empirical and population losses
We omit the dependence on  as it is often clear from context. We reserve  for the privacy parameters of Definition 2. We always take gradients with respect to the optimization variable . In the case that  is not differentiable at , we override notation and define , where  is the subdifferential of  at . We use  for a (potentially random) mechanism and  as a shorthand for . For ,  is the standard norm,  is the corresponding -dimensional ball of radius  and  is the dual of , i.e.  such that . Finally, we define the Hamming distance between datasets , where  is the set of permutations over sets of size .
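For reference, written in generic notation (the paper's own symbols may differ), the empirical and population losses take the usual form

\[
  \widehat{F}(x; S) \;=\; \frac{1}{n}\sum_{i=1}^{n} f(x; s_i),
  \qquad
  F(x) \;=\; \mathbb{E}_{s \sim P}\bigl[f(x; s)\bigr],
\]

for a dataset S = (s_1, …, s_n) drawn i.i.d. from the distribution P.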
Assumptions.
We first state standard assumptions for solving (1). We assume that is a closed, convex domain such that . Furthermore, we assume that for any , is convex and Lipschitz with respect to . Central to our work, we define the following growth assumption.
[growth] Let . For a loss and distribution , we say that has growth for and , if the population function satisfies
In the case where is the empirical distribution on a finite dataset , we refer to growth of as growth of the empirical function .
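For reference, a standard way of writing such a growth condition (in generic notation; the paper's Assumption 2 may parameterize the constants and norm differently) is

\[
  F(x) \;-\; \min_{y \in \mathcal{X}} F(y)
  \;\ge\; \frac{\lambda}{\kappa}\,\operatorname{dist}(x, X^\star)^{\kappa}
  \qquad \text{for all } x \in \mathcal{X},
\]

where X⋆ denotes the set of minimizers of F: κ = 2 recovers the growth implied by strong convexity, while smaller κ corresponds to sharper growth around the optimum.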
Uniform convexity and inequality.
Assumption 2 is closely related to two fundamental notions in convex analysis: uniform convexity and the inequality. Following [39], we say that is uniformly convex with and if
This immediately implies that (i) sums (and expectations) preserve uniform convexity and (ii) if  is uniformly convex with  and , then it has growth. This will be useful when constructing hard instances, as it will suffice to consider uniformly convex functions, which are generally more convenient to manipulate. Finally, we point out that, in the general case that , the literature refers to Assumption 2 as the  inequality [11] with, in their notation, . Theorem 5(ii) in [11] states that, under mild conditions, Assumption 2 implies the following inequality between the error and the gradient norm for all 
(2) 
This is a key result in our analysis of the inverse sensitivity mechanism of Section 3.
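For intuition, a bound of this shape follows from convexity together with κ-growth: for any subgradient g of F at x we have F(x) − F⋆ ≤ ⟨g, x − x⋆⟩ ≤ ∥g∥ dist(x, X⋆), and combining this with the growth inequality above and rearranging gives, for κ > 1 (in generic notation that may differ from the paper's constants),

\[
  F(x) \;-\; \min_{y \in \mathcal{X}} F(y)
  \;\le\; \Bigl(\tfrac{\kappa}{\lambda}\Bigr)^{\frac{1}{\kappa-1}}\,
  \|g\|^{\frac{\kappa}{\kappa-1}}
  \qquad \text{for every } g \in \partial F(x).
\]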
Differential privacy.
We begin by recalling the definition of differential privacy. [[22, 21]] A randomized algorithm is differentially private (DP) if, for all datasets that differ in a single data element and for all events in the output space of , we have
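In the standard form of [22, 21], with δ = 0 corresponding to pure ε-DP, this requirement reads

\[
  \Pr\bigl[M(S) \in O\bigr] \;\le\; e^{\varepsilon}\,\Pr\bigl[M(S') \in O\bigr] \;+\; \delta
  \qquad \text{for all events } O \text{ and all } S, S' \text{ with } d_{\mathrm{ham}}(S, S') \le 1.
\]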
We use the following standard results in differential privacy. [Composition [20, Thm. 3.16]] If  are randomized algorithms, each of which is DP, then their composition is DP.
Next, we consider the Laplace mechanism. We will let  denote a -dimensional vector such that  for . [Laplace mechanism [20, Thm. 3.6]] Let  have sensitivity , that is, . Then the Laplace mechanism with  is DP. Finally, we need the Gaussian mechanism for DP. [Gaussian mechanism [20, Thm. A.1]] Let  have sensitivity , that is, . Then the Gaussian mechanism with  is DP.
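As a minimal illustration of how these two mechanisms are typically instantiated (the query, its sensitivities, and the parameter choices below are hypothetical examples, not the paper's), one can write:

```python
import numpy as np

def laplace_mechanism(value, l1_sensitivity, epsilon, rng=None):
    """Pure eps-DP: add Laplace noise with per-coordinate scale Delta_1 / eps."""
    rng = rng or np.random.default_rng()
    scale = l1_sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale, size=np.shape(value))

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    """(eps, delta)-DP: add Gaussian noise with the standard calibration
    sigma = Delta_2 * sqrt(2 ln(1.25/delta)) / eps (valid for eps <= 1)."""
    rng = rng or np.random.default_rng()
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return value + rng.normal(loc=0.0, scale=sigma, size=np.shape(value))

# Hypothetical usage: privatize the mean of a dataset with entries in [0, 1].
data = np.random.rand(1000, 5)            # n = 1000 points in [0, 1]^5
mean = data.mean(axis=0)                  # L2 sensitivity of the mean is sqrt(d) / n
private_mean = gaussian_mechanism(mean, np.sqrt(5) / 1000, epsilon=1.0, delta=1e-6)
```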
Inverse sensitivity mechanism.
Our goal is to design private optimization algorithms that adapt to the difficulty of the underlying function. As a reference point, we turn to the inverse sensitivity mechanism of [2], as it enjoys general instance-optimality guarantees. For a given function  that we wish to estimate privately, define the inverse sensitivity at  (3)
that is, the inverse sensitivity of a target parameter at instance is the minimal number of samples one needs to change to reach a new instance such that . Having this quantity, the inverse sensitivity mechanism samples an output from the following probability density
(4) 
The inverse sensitivity mechanism preserves DP and enjoys instance-optimality guarantees in general settings [2]. In contrast to (worst-case) minimax optimality guarantees, which measure the performance of the algorithm on the hardest instance, these notions of instance-optimality provide stronger per-instance guarantees.
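Concretely, for a target function f of the data, the inverse sensitivity and the associated sampling density from [2] take the form below (generic notation; equations (3) and (4) instantiate this for the optimization problem at hand):

\[
  \mathrm{len}_{f}(t; S) \;=\; \min_{S'} \bigl\{\, d_{\mathrm{ham}}(S, S') \;:\; f(S') = t \,\bigr\},
  \qquad
  \pi_{S}(t) \;\propto\; \exp\!\Bigl(-\tfrac{\varepsilon}{2}\,\mathrm{len}_{f}(t; S)\Bigr).
\]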
3 Adaptive rates through inverse sensitivity for DP
To understand the achievable rates when privately optimizing functions with growth, we begin our theoretical investigation by examining the inverse sensitivity mechanism in our setting. We show that, for instances that exhibit growth of the empirical function, the inverse sensitivity mechanism privately solves ERM with excess loss roughly .
In our setting, we use a gradient-based approximation of the inverse sensitivity mechanism to simplify the analysis while attaining similar rates. Following [3] with our function of interest , we can lower bound the inverse sensitivity under natural assumptions. We define a smoothed version of this quantity, which is more suitable to continuous domains, and define the smooth gradient-based inverse sensitivity mechanism
(5) 
Note that while exactly sampling from the unnormalized density is computationally intractable, analyzing its performance is an important step towards understanding the optimal rates for the family of functions with growth that we study in this work. The following theorem demonstrates the adaptivity of the inverse sensitivity mechanism to the growth of the underlying instance. We defer the proof to Appendix A. [] Let , be convex, Lipschitz for all . Let and assume is in the interior of . Assume that has growth (Assumption 2) with . For , the smooth inverse sensitivity mechanism (5) is DP, and with probability at least the output has
Moreover, setting , we have
The rates of the inverse sensitivity mechanism in Section 3 provide two main insights into the landscape of the problem under growth conditions. First, these conditions allow us to improve the worst-case rate to  for pure DP and therefore suggest that a better rate is possible for approximate DP. Moreover, the general instance-optimality guarantees of this mechanism [2] hint that these are the optimal rates for our class of functions. In the sections to come, we confirm these predictions by developing efficient algorithms that achieve these rates (for pure and approximate privacy) and proving matching lower bounds that demonstrate the optimality of these algorithms.
4 Efficient algorithms with optimal rates
While the previous section demonstrates that there exist algorithms that improve the rates for functions with growth, we pointed out that  is computationally intractable in the general case. In this section, we develop efficient algorithms—that is, algorithms implementable with gradient-based methods—that achieve the same convergence rates. Our algorithms build on the recent localization techniques that Feldman et al. [25] used to obtain optimal rates for DP-SCO with general convex functions. In Section 4.1, we use these techniques to develop private algorithms that achieve the optimal rates for (pure) DP-SCO with high probability, in contrast to existing results which bound the expected excess loss. These results are of independent interest.
In Section 4.2, we translate these results into convergence guarantees for privately optimizing convex functions with growth by solving a sequence of increasingly constrained SCO problems—the high-probability guarantees of Section 4.1 being crucial to our convergence analysis of these algorithms.
4.1 High-probability guarantees for convex DP-SCO
We first describe our algorithm (Algorithm 1), then analyze its performance under pure-DP (Proposition 1) and approximate-DP constraints (Proposition 1). Our analysis builds on novel, tight high-probability generalization bounds for uniformly stable algorithms [24]. We defer the proofs to Appendix B.
[] Let , and be convex, Lipschitz for all . Setting
then for , Algorithm 1 is DP and has with probability
Similarly, by using a different choice for the parameters and noise distribution, we have the following guarantees for approximate DP. [] Let , and be convex, Lipschitz for all . Setting
then for , Algorithm 1 is DP and has with probability
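Since Algorithm 1 is not reproduced here, the following is only a rough sketch of the phased-localization idea it builds on (Feldman et al. [25]): solve a regularized ERM on a fresh chunk of data centered at the previous iterate, privatize the solution with noise calibrated to its sensitivity, and increase the regularization across phases. The subroutine `solve_regularized_erm`, the schedules, and the constants are hypothetical placeholders rather than the paper's exact choices.

```python
import numpy as np

def localized_dp_sco(data, x0, radius, lipschitz, epsilon, solve_regularized_erm,
                     num_phases=5, rng=None):
    """Sketch of phased localization for pure eps-DP SCO (illustrative constants).

    Each phase solves a regularized ERM on a fresh chunk of data centered at the
    previous iterate, then adds Laplace noise scaled to the minimizer's sensitivity.
    """
    rng = rng or np.random.default_rng()
    chunks = np.array_split(np.asarray(data), num_phases)
    x = np.asarray(x0, dtype=float)
    d = x.shape[0]
    for i, chunk in enumerate(chunks):
        n_i = max(len(chunk), 1)
        reg = (4.0 ** i) * lipschitz / (radius * n_i)     # growing regularization
        x_hat = solve_regularized_erm(chunk, center=x, reg=reg)
        # The minimizer of a reg-strongly-convex ERM moves by at most
        # O(lipschitz / (reg * n_i)) when one sample changes (its L2 sensitivity);
        # the L1 sensitivity is at most sqrt(d) times larger.
        l2_sensitivity = 2.0 * lipschitz / (reg * n_i)
        scale = np.sqrt(d) * l2_sensitivity / epsilon     # Laplace scale for eps-DP
        x = x_hat + rng.laplace(scale=scale, size=d)
        # Each sample is used in exactly one phase, so the overall guarantee is eps-DP.
    return x
```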
4.2 Algorithms for DP-SCO with growth
Building on the algorithms of the previous section, we design algorithms that recover the rates of the inverse sensitivity mechanism for functions with growth, importantly without knowledge of the value of . Inspired by epoch-based algorithms from the optimization literature [31, 29], our algorithm iteratively applies the private procedures of the previous section. Crucially, the growth assumption allows us to reduce the diameter of the domain after each run, hence improving the overall excess loss with a careful choice of hyperparameters. We provide full details in Algorithm 2. The following theorem summarizes our main upper bound for DP-SCO with growth in the pure privacy model, recovering the rates of the inverse sensitivity mechanism of Section 3. We defer the proof to Section B.3. [] Let ,  and  be convex, Lipschitz for all . Assume that  has growth (Assumption 2) with . Setting , Algorithm 2 is DP and has with probability 
where hides logarithmic factors depending on and .
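A minimal sketch of the epoch-based reduction that Algorithm 2 is described as performing, assuming black-box access to a high-probability DP-SCO solver such as the one sketched above; the function names, the halving schedule, and the way epochs share the privacy budget are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def dp_sco_with_growth(data, x0, radius0, epsilon, dp_sco_solver, num_epochs=8):
    """Epoch-based wrapper: repeatedly solve DP-SCO over a shrinking ball.

    Under kappa-growth, a small excess loss forces the epoch's output to lie
    close to the true minimizer, so the feasible radius can be shrunk after
    each epoch without excluding the optimum; later epochs then work with a
    much smaller diameter, which is what drives the improved rates.
    """
    chunks = np.array_split(np.asarray(data), num_epochs)
    x = np.asarray(x0, dtype=float)
    radius = radius0
    for chunk in chunks:
        # Each sample appears in exactly one epoch, so the overall privacy cost
        # is epsilon (no composition across epochs is needed).
        x = dp_sco_solver(chunk, center=x, radius=radius, epsilon=epsilon)
        radius = radius / 2.0     # illustrative shrinking schedule
    return x
```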
Sketch of the proof.
The main challenge of the proof is showing that the iterate achieves good risk without knowledge of . Let us denote by  the error guarantee of Proposition 1 (or Proposition 1 for approximate DP). At each stage , as long as  belongs to , the excess loss is of order  and thus decreases exponentially fast with . The challenge is that, without knowledge of , we do not know the index  (roughly ) after which  for , and the guarantees become meaningless with respect to the original problem. However, in the stages after , as the constraint set becomes very small, we upper bound the variations in function values and show that the suboptimality cannot increase (overall) by more than , thus achieving the optimal rate of stage .
∎
Moreover, we can improve the dependence on the dimension for approximate DP, resulting in the following bounds.
[] Let , and be convex, Lipschitz for all . Assume that has growth (Assumption 2) with . Setting and , Algorithm 2 is DP and has with probability
where hides logarithmic factors depending on and .
5 Lower bounds
In this section, we develop (minimax) lower bounds for the problem of SCO with growth under privacy constraints. Note that taking  provides a lower bound on the unconstrained minimax risk. For a sample space  and collection of distributions  over , we define the function class  as the set of convex functions from  that are Lipschitz and have growth (Assumption 2). We define the constrained minimax risk [6]
(6) 
where  is the collection of DP mechanisms from  to . When clear from context, we omit the dependency of the function class on  and simply write . We also forgo the dependence on  when referring to pure-DP constraints, i.e. . We now proceed to prove tight lower bounds for DP in Section 5.1 and DP in Section 5.2.
5.1 Lower bounds for pure DP
Although in Section 4 we show that the same algorithm achieves the optimal upper bounds for all values of , the landscape of the problem is more subtle for the lower bounds and we need to delineate two different cases to obtain tight lower bounds. We begin with , which corresponds to uniform convexity and enjoys properties that make the problem easier (e.g., closure under summation or addition of linear terms). The second case, , corresponds to sharper growth and requires a different hard instance to satisfy the growth condition.
growth with .
We begin by developing lower bounds under pure DP for
[Lower bound for DP, ] Let , , , and . Let be the set of distributions on . Assume that
The following lower bound holds
(7) 
First of all, note that  is not an overly restrictive assumption. Indeed, for an arbitrary uniformly convex and Lipschitz function, it always holds that . This is thus equivalent to assuming . Note that when , the standard lower bound  holds. We present the proof in Section C.1.1 and preview the main ideas here.
Sketch of the proof.
Our lower bound hinges on the collection of functions  for  to be chosen later. These functions are uniformly convex for any  [39, Lemma 4], and in turn so is the population function . We proceed as follows: we first prove an information-theoretic (non-private) lower bound (Section C.1.1 in Appendix C.1.1), which provides the statistical term in (7). With the same family of functions, we exhibit a collection of datasets and prove by contradiction that if an estimator were to optimize below a certain error, it would violate DP—this yields a lower bound on ERM for our function class (Theorem C.1.1 in Appendix C.1.1). We conclude by proving a reduction from SCO to ERM in Section C.1.1. ∎
growth with .
As the construction of the hard instance is more intricate for , we provide a one-dimensional lower bound and leave the high-dimensional case for future work. In this case we directly obtain the result with a private version of Le Cam’s method [44, 42, 6], albeit with a different family of functions.
The issue with the construction of the previous section is that the function does not exhibit sharp growth for . Indeed, the added linear function shifts the minimum away from , where the function is differentiable, and as a result it locally behaves as a quadratic and only achieves growth . To establish the lower bound, we consider a different sample function that has growth exactly  on one side and  on the other side. This yields the following result.
[Lower bound for DP, ] Let , , , , , and . There exists a collection of distributions such that, whenever , it holds that
(8) 
5.2 Lower bounds under approximate privacy constraints
We conclude our treatment by providing lower bounds under approximate privacy constraints, demonstrating the optimality of the risk bound of Theorem 2. We prove the result via a reduction: we show that if one solves ERM with growth with error , then one solves arbitrary convex ERM with error . Given that a lower bound of  holds for ERM, a lower bound of  holds for ERM with growth. However, for this reduction to hold, we require that . Furthermore, we consider  to be roughly a constant—in the case that  is too large, standard lower bounds on general convex functions apply.
[Private lower bound for DP] Let such that , . Let and . Assume that , then for any mechanism , there exists and such that
Section 5.2 implies that the same lower bound (up to logarithmic factors) applies to SCO via the reduction of [8, Appendix C]. Before proving the theorem, let us state (and prove in Section C.2) the following reduction: if an DP algorithm achieves excess error (roughly) on ERM for any function with growth, there exists an DP algorithm that achieves error for any convex function. We construct the latter by iteratively solving ERM problems with geometrically increasing regularization towards the previous iterate to ensure the objective has growth.
[Solving ERM with growth implies solving any convex ERM] Let . Assume there exists an mechanism such that for any Lipschitz loss on and dataset such that exhibits growth, the mechanism achieves excess loss
Then, we can construct an DP mechanism such that for any Lipschitz loss , the mechanism achieves excess loss
where is the smallest integer such that .
With this proposition, the proof of the theorem follows directly, as Bassily et al. [7] prove a lower bound for ERM under DP.
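A sketch of the construction behind this reduction, assuming a black-box mechanism `growth_erm_solver` for ERM over losses with growth; the quadratic proximal term, the geometric schedule, and the even privacy-budget split are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def convex_erm_via_growth_solver(data, x0, epsilon, growth_erm_solver,
                                 num_rounds=10, reg0=1.0):
    """Reduce generic convex ERM to ERM with growth.

    Each round adds a quadratic proximal term centered at the previous iterate,
    which makes the regularized objective strongly convex and hence gives it
    (kappa = 2) growth, then calls the growth-ERM solver on it. Following the
    description in the text, the regularization weight increases geometrically
    across rounds; the privacy budget is split evenly via basic composition.
    """
    x = np.asarray(x0, dtype=float)
    reg = reg0
    eps_round = epsilon / num_rounds
    for _ in range(num_rounds):
        # Regularized loss: loss(y; s) + (reg / 2) * ||y - x||^2, solved privately.
        x = growth_erm_solver(data, center=x, reg=reg, epsilon=eps_round)
        reg = 2.0 * reg       # geometrically increasing regularization
    return x
```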
Discussion
In this work, we develop private algorithms that adapt to the growth of the function at hand, achieving the convergence rate corresponding to the “easiest” subclass the function belongs to. However, the picture is not yet complete. First, there are still gaps in our theoretical understanding, the most interesting one being . On these functions, appropriate optimization algorithms achieve linear convergence [43], raising the question: can we achieve exponentially small privacy cost in our setting? Finally, while our optimality guarantees are more fine-grained than the usual minimax results over convex functions, they are still contingent on some predetermined choice of subclasses. Studying more general notions of adaptivity is an important future direction in private optimization.
Acknowledgments
The authors would like to thank Karan Chadha and Gary Cheng for comments on an early version of the draft.
References
 Abadi et al. [2016] M. Abadi, A. Chu, I. Goodfellow, B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep learning with differential privacy. In 23rd ACM Conference on Computer and Communications Security (ACM CCS), pages 308–318, 2016.
 Asi and Duchi [2020a] H. Asi and J. Duchi. Near instance-optimality in differential privacy. arXiv:2005.10630 [cs.CR], 2020a.
 Asi and Duchi [2020b] H. Asi and J. C. Duchi. Instanceoptimality in differential privacy via approximate inverse sensitivity mechanisms. In Advances in Neural Information Processing Systems 33, 2020b.
 Asi et al. [2021a] H. Asi, J. Duchi, A. Fallah, O. Javidbakht, and K. Talwar. Private adaptive gradient methods for convex optimization. arXiv:2106.13756 [cs.LG], 2021a.
 Asi et al. [2021b] H. Asi, V. Feldman, T. Koren, and K. Talwar. Private stochastic convex optimization: Optimal rates in geometry. arXiv:2103.01516 [cs.LG], 2021b.
 Barber and Duchi [2014] R. F. Barber and J. C. Duchi. Privacy and statistical risk: Formalisms and minimax bounds. arXiv:1412.4451 [math.ST], 2014.
 Bassily et al. [2014] R. Bassily, A. Smith, and A. Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In 55th Annual Symposium on Foundations of Computer Science, pages 464–473, 2014.
 Bassily et al. [2019] R. Bassily, V. Feldman, K. Talwar, and A. Thakurta. Private stochastic convex optimization with optimal rates. In Advances in Neural Information Processing Systems 32, 2019.

 Bassily et al. [2020] R. Bassily, V. Feldman, C. Guzmán, and K. Talwar. Stability of stochastic gradient descent on nonsmooth convex losses. In Advances in Neural Information Processing Systems 33, 2020.
 Beck and Teboulle [2003] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31:167–175, 2003.
 Bolte et al. [2017] J. Bolte, T. P. Nguyen, J. Peypouquet, and B. Suter. From error bounds to the complexity of firstorder descent methods for convex functions. Mathematical Programming, 165:471–507, 2017.
 Bottou et al. [2018] L. Bottou, F. Curtis, and J. Nocedal. Optimization methods for largescale learning. SIAM Review, 60(2):223–311, 2018.
 Braverman et al. [2016] M. Braverman, A. Garg, T. Ma, H. L. Nguyen, and D. P. Woodruff. Communication lower bounds for statistical estimation problems via a distributed data processing inequality. In Proceedings of the Forty-Eighth Annual ACM Symposium on the Theory of Computing, 2016. URL https://arxiv.org/abs/1506.07216.
 Cai and Low [2015] T. Cai and M. Low. A framework for estimating convex functions. Statistica Sinica, 25:423–456, 2015.
 Chatterjee et al. [2016] S. Chatterjee, J. Duchi, J. Lafferty, and Y. Zhu. Local minimax complexity of stochastic convex optimization. In Advances in Neural Information Processing Systems 29, 2016.
 Chaudhuri et al. [2011] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.
 Duchi [2019] J. C. Duchi. Information theory and statistics. Lecture Notes for Statistics 311/EE 377, Stanford University, 2019. URL http://web.stanford.edu/class/stats311/lecturenotes.pdf. Accessed May 2019.
 Duchi and Ruan [2021] J. C. Duchi and F. Ruan. Asymptotic optimality in stochastic optimization. Annals of Statistics, 49(1):21–48, 2021.
 Duchi et al. [2013] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy and statistical minimax rates. In 54th Annual Symposium on Foundations of Computer Science, pages 429–438, 2013.
 Dwork and Roth [2014] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3 & 4):211–407, 2014.
 Dwork et al. [2006a] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology (EUROCRYPT 2006), 2006a.
 Dwork et al. [2006b] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Theory of Cryptography Conference, pages 265–284, 2006b.
 Dwork et al. [2012] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science (ITCS), pages 214–226, 2012.

 Feldman and Vondrak [2019] V. Feldman and J. Vondrak. High probability generalization bounds for uniformly stable algorithms with nearly optimal rate. In Proceedings of the Thirty Second Annual Conference on Computational Learning Theory, pages 1270–1279, 2019.
 Feldman et al. [2020] V. Feldman, T. Koren, and K. Talwar. Private stochastic convex optimization: Optimal rates in linear time. In Proceedings of the Fifty-Second Annual ACM Symposium on the Theory of Computing, 2020.
 Garg et al. [2014] A. Garg, T. Ma, and H. L. Nguyen. On communication cost of distributed statistical estimation and dimensionality. In Advances in Neural Information Processing Systems 27, 2014.
 Hardt and Talwar [2010] M. Hardt and K. Talwar. On the geometry of differential privacy. In Proceedings of the Forty-Second Annual ACM Symposium on the Theory of Computing, pages 705–714, 2010. URL http://arxiv.org/abs/0907.3754.
 Hashimoto et al. [2018] T. Hashimoto, M. Srivastava, H. Namkoong, and P. Liang. Fairness without demographics in repeated loss minimization. In Proceedings of the 35th International Conference on Machine Learning, 2018.
 Hazan and Kale [2011] E. Hazan and S. Kale. An optimal algorithm for stochastic strongly convex optimization. In Proceedings of the Twenty Fourth Annual Conference on Computational Learning Theory, 2011. URL http://arxiv.org/abs/1006.2425.
 Jin et al. [2019] C. Jin, P. Netrapalli, R. Ge, S. M. Kakade, and M. I. Jordan. A short note on concentration inequalities for random vectors with subgaussian norm. arXiv:1902.03736 [math.PR], 2019.
 Juditsky and Nesterov [2010] A. Juditsky and Y. Nesterov. Primal-dual subgradient methods for minimizing uniformly convex functions. URL http://hal.archivesouvertes.fr/docs/00/50/89/33/PDF/Stronghal.pdf, 2010.
 Juditsky and Nesterov [2014] A. Juditsky and Y. Nesterov. Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization. Stochastic Systems, 4(1):44–80, 2014.
 Levy and Duchi [2019] D. Levy and J. C. Duchi. Necessary and sufficient geometries for gradient methods. In Advances in Neural Information Processing Systems 32, 2019.
 Levy et al. [2021] D. Levy, Z. Sun, K. Amin, S. Kale, A. Kulesza, M. Mohri, and A. T. Suresh. Learning with userlevel privacy. arXiv:2102.11845 [cs.LG], 2021. URL https://arxiv.org/abs/2102.11845.

 McMahan et al. [2017] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017.
 Mitzenmacher and Upfal [2005] M. Mitzenmacher and E. Upfal. Probability and computing: Randomized algorithms and probabilistic analysis. Cambridge University Press, 2005.
 Nemirovski and Yudin [1983] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.
 Nemirovski et al. [2009] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.
 Nesterov [2008] Y. Nesterov. Accelerating the cubic regularization of newton’s method on convex problems. Mathematical Programming, 112(1):159–181, 2008.
 Ramdas and Singh [2013] A. Ramdas and A. Singh. Optimal rates for stochastic convex optimization under tsybakov noise condition. In Proceedings of the 30th International Conference on Machine Learning, pages 365–373, 2013.

 Smith and Thakurta [2013] A. Smith and A. Thakurta. Differentially private feature selection via stability arguments, and the robustness of the Lasso. In Proceedings of the Twenty Sixth Annual Conference on Computational Learning Theory, pages 819–850, 2013. URL http://proceedings.mlr.press/v30/Guha13.html.
 Wainwright [2019] M. J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press, 2019.
 Xu et al. [2017] Y. Xu, Q. Lin, and T. Yang. Stochastic convex optimization: Faster local growth implies faster global convergence. In Proceedings of the 34th International Conference on Machine Learning, pages 3821–3830, 2017.
 Yu [1997] B. Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423–435. SpringerVerlag, 1997.
 Zhang et al. [2013] Y. Zhang, J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Informationtheoretic lower bounds for distributed estimation with communication constraints. In Advances in Neural Information Processing Systems 26, 2013.
 Zhu et al. [2016] Y. Zhu, S. Chatterjee, J. Duchi, and J. Lafferty. Local minimax complexity of stochastic convex optimization. In Advances in Neural Information Processing Systems 29, 2016.
Potential negative societal impact
The aim of our work is theoretical in essence and as such, we do not expect direct negative societal impact. As DP becomes more established as a norm, we believe this research is relevant for practitioners in both industry and government. Indeed, an important obstacle to applying DP is the loss of performance compared to non-private models; our theoretical results suggest that better adaptive algorithms would significantly narrow this performance gap. We wish to point out two potential negative consequences of growing research in privacy. First, a simple but effective method to guarantee privacy is to either delete existing user data or limit data collection in the first place. Paradoxically, the more confident institutions are in DP algorithms, the less likely they are to turn to these simpler—and most effective—solutions. Second, using DP algorithms should not preclude one from (1) carefully choosing  and  to provide meaningful guarantees for the specific application at hand and (2) developing exhaustive and meticulous evaluation methods for the privacy of deployed models.
Appendix A Proofs for Section 3
A.1 Proof of Section 3
See 3
Let us first prove privacy. The sensitivity of  is  as  is Lipschitz; therefore, following the privacy proof of the smooth inverse sensitivity mechanism [2, Prop. 3.2], we get that (5) is DP.
Let us now prove the claim about utility. Denote and with to be chosen presently. We argue that it is enough to show that . Indeed then with probability at least we have , which implies there is such that and , hence using the inequality (2)
It remains to prove that . Let and . Note that for any as is in the interior of which implies . Hence the definition of the smooth inverse sensitivity mechanism (5) implies
where the last inequality follows by choosing .
Appendix B Proofs for Section 4
We need the following result on the generalization properties of uniformly stable algorithms [24]. [24, Cor. 4.2] Assume . Let  where  and  is Lipschitz and strongly convex for all . Let  be the empirical minimizer. For , with probability at least 
B.1 Proof of Algorithm 1
See 1
We begin by proving the privacy claim. We show that each iterate is DP; post-processing then implies the claim, as each sample is used in exactly one iterate. To this end, let  and note that the minimizer  has sensitivity  [25], hence the sensitivity is at most . Standard properties of the Laplace mechanism [20] now imply that  is DP, which gives the privacy claim.
Now we proceed to prove utility, which follows arguments similar to the localization-based proof in [25]. Letting , we have: