Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD

We provide sharp path-dependent generalization and excess risk guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex). At the heart of our analysis is a new technique for bounding the generalization error of deterministic symmetric algorithms, which implies that average output stability and a bounded expected gradient of the loss at termination are sufficient for generalization. This key result shows that small generalization error occurs at stationary points, and allows us to bypass the Lipschitz or sub-Gaussian assumptions on the loss that are prevalent in prior work. For nonconvex, Polyak-Łojasiewicz (PL), convex, and strongly convex losses, we show the explicit dependence of the generalization error on the accumulated path-dependent optimization error, the terminal optimization error, the number of samples, and the number of iterations. For nonconvex smooth losses, we prove that full-batch GD efficiently generalizes close to any stationary point at termination, under a proper choice of decreasing step size. Further, if the loss is nonconvex but the objective is PL, we derive quadratically vanishing bounds on the generalization error and the corresponding excess risk, for a suitably large constant step size. For (resp. strongly) convex smooth losses, we prove that full-batch GD also generalizes with large constant step sizes, and achieves (resp. quadratically) small excess risk while training fast. In all cases, our full-batch GD generalization error and excess risk bounds are strictly tighter than existing bounds for (stochastic) GD when the loss is smooth (but possibly non-Lipschitz).
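
For reference, the algorithm the bounds concern is the deterministic full-batch GD iterate w_{t+1} = w_t - eta_t * grad F_S(w_t), run with either a constant step size (convex / PL settings) or a decreasing schedule (nonconvex setting). The sketch below is illustrative only; the helper names (full_batch_gd, grad_fn) and the toy least-squares loss are assumptions introduced here, not part of the paper.

```python
import numpy as np

def full_batch_gd(grad_fn, w0, n_iters, step_size):
    """Minimal sketch of full-batch gradient descent.

    grad_fn(w) is assumed to return the gradient of the empirical
    (full-batch) loss at parameters w; step_size is either a constant
    (convex / PL settings) or a callable t -> eta_t giving a decreasing
    schedule (nonconvex setting).
    """
    w = np.asarray(w0, dtype=float)
    for t in range(n_iters):
        eta = step_size(t) if callable(step_size) else step_size
        w = w - eta * grad_fn(w)  # one deterministic full-batch GD step
    return w

# Toy example: smooth least-squares empirical loss on synthetic data.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 5)), rng.normal(size=50)
grad = lambda w: X.T @ (X @ w - y) / len(y)  # full-batch gradient

w_const = full_batch_gd(grad, np.zeros(5), 200, 0.1)                      # constant step size
w_decay = full_batch_gd(grad, np.zeros(5), 200, lambda t: 0.5 / (t + 1))  # decreasing step size
```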
