Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

02/01/2021
by   Gergely Neu, et al.
0

We study the generalization properties of the popular stochastic gradient descent method for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution) and the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations to the final output. Our key technical tool is combining the information-theoretic generalization bounds previously used for analyzing randomized variants of SGD with a perturbation analysis of the iterates.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/15/2020

Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent

Recently there are a considerable amount of work devoted to the study of...
research
05/02/2023

Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Renyi's Entropy Perspective

Recently, information theoretic analysis has become a popular framework ...
research
11/19/2022

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States

Stochastic differential equations (SDEs) have been shown recently to wel...
research
09/10/2023

Generalization error bounds for iterative learning algorithms with bounded updates

This paper explores the generalization characteristics of iterative lear...
research
02/05/2019

Distribution-Dependent Analysis of Gibbs-ERM Principle

Gibbs-ERM learning is a natural idealized model of learning with stochas...
research
12/27/2022

Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

To date, no "information-theoretic" frameworks for reasoning about gener...
research
03/02/2023

Tight Risk Bounds for Gradient Descent on Separable Data

We study the generalization properties of unregularized gradient methods...

Please sign up or login with your details

Forgot password? Click here to reset