Private Stochastic Optimization in the Presence of Outliers: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses

by   Andrew Lowy, et al.

We study differentially private (DP) stochastic optimization (SO) with data containing outliers and loss functions that are not Lipschitz continuous. To date, the vast majority of work on DP SO assumes that the loss is Lipschitz (i.e. stochastic gradients are uniformly bounded), and their error bounds scale with the Lipschitz parameter of the loss. While this assumption is convenient, it is often unrealistic: in many practical problems where privacy is required, data may contain outliers or be unbounded, causing some stochastic gradients to have large norm. In such cases, the Lipschitz parameter may be prohibitively large, leading to vacuous excess risk bounds. Thus, building on a recent line of work [WXDX20, KLZ22], we make the weaker assumption that stochastic gradients have bounded k-th moments for some k ≥ 2. Compared with works on DP Lipschitz SO, our excess risk scales with the k-th moment bound instead of the Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). Moreover, in contrast to the prior works [WXDX20, KLZ22], our bounds do not require the loss function to be differentiable/smooth. We also devise an accelerated algorithm that runs in linear time and yields improved (compared to prior works) and nearly optimal excess risk for smooth losses. Additionally, our work is the first to address non-convex non-Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some classes of neural nets, among other practical models. Our Proximal-PL algorithm has nearly optimal excess risk that almost matches the strongly convex lower bound. Lastly, we provide shuffle DP variations of our algorithms, which do not require a trusted curator (e.g. for distributed learning).


page 1

page 2

page 3

page 4


Private Non-Convex Federated Learning Without a Trusted Server

We study differentially private (DP) federated learning (FL) with non-co...

Differentially Private Stochastic Optimization: New Results in Convex and Non-Convex Settings

We study differentially private stochastic optimization in convex and no...

Statistical Learning with Conditional Value at Risk

We propose a risk-averse statistical learning framework wherein the perf...

Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data

We study stochastic convex optimization with heavy-tailed data under the...

Private Convex Optimization in General Norms

We propose a new framework for differentially private optimization of co...

Stochastic Bias-Reduced Gradient Methods

We develop a new primitive for stochastic optimization: a low-bias, low-...

Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD

We provide sharp path-dependent generalization and excess error guarante...

Please sign up or login with your details

Forgot password? Click here to reset