Generalization Bounds for Gradient Methods via Discrete and Continuous Prior

05/27/2022
by   Jian Li, et al.

Proving algorithm-dependent generalization error bounds for gradient-type optimization methods has recently attracted significant attention in learning theory. However, most existing trajectory-based analyses require either restrictive assumptions on the learning rate (e.g., a rapidly decreasing learning rate) or continuous injected noise (such as the Gaussian noise in Langevin dynamics). In this paper, we introduce a new discrete data-dependent prior into the PAC-Bayesian framework and prove a high-probability generalization bound of order O((1/n)·∑_{t=1}^T (γ_t/ε_t)^2 ‖𝐠_t‖^2) for Floored GD (i.e., a version of gradient descent with precision level ε_t), where n is the number of training samples, γ_t is the learning rate at step t, and 𝐠_t is roughly the difference between the gradient computed using all samples and that computed using only the prior samples. The quantity ‖𝐠_t‖ is upper bounded by, and typically much smaller than, the gradient norm ‖∇f(W_t)‖. We remark that our bound holds in nonconvex and nonsmooth scenarios. Moreover, our theoretical results yield numerically favorable upper bounds on test errors (e.g., 0.037 on MNIST). Using a similar technique, we also obtain new generalization bounds for certain variants of SGD. Furthermore, we study generalization bounds for gradient Langevin dynamics (GLD). Using the same framework with a carefully constructed continuous prior, we show a new high-probability generalization bound of order O(1/n + (L^2/n^2)·∑_{t=1}^T (γ_t/σ_t)^2) for GLD. The new 1/n^2 rate is due to the concentration of the difference between the gradient on the training samples and that on the prior samples.
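
For concreteness, here is a minimal NumPy sketch of the two update rules the bounds refer to. The flooring rule (rounding each coordinate of the iterate to a grid of spacing ε_t) and all function names are illustrative assumptions based on the abstract's description of "gradient descent with precision level ε_t", not the paper's exact construction.

```python
import numpy as np

def floored_gd_step(w, grad, lr, eps):
    """One step of Floored GD (assumed form): a plain gradient step
    followed by rounding each coordinate to a grid of spacing eps."""
    w_next = w - lr * grad(w)
    return np.floor(w_next / eps) * eps

def gld_step(w, grad, lr, sigma, rng):
    """One step of gradient Langevin dynamics (GLD): a gradient step
    plus isotropic Gaussian noise of scale sigma."""
    return w - lr * grad(w) + sigma * rng.standard_normal(w.shape)

# Toy usage on the quadratic objective f(w) = 0.5 * ||w||^2.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = lambda w: w  # gradient of 0.5 * ||w||^2
    w = np.ones(3)
    for t in range(100):
        w = floored_gd_step(w, grad, lr=0.1, eps=1e-3)
    print("Floored GD iterate:", w)

    w = np.ones(3)
    for t in range(100):
        w = gld_step(w, grad, lr=0.1, sigma=0.01, rng=rng)
    print("GLD iterate:", w)
```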

