Learning While Dissipating Information: Understanding the Generalization Capability of SGLD

02/05/2021
by   Hao Wang, et al.
Understanding the generalization capability of learning algorithms is at the heart of statistical learning theory. In this paper, we investigate the generalization gap of stochastic gradient Langevin dynamics (SGLD), a widely used optimizer for training deep neural networks (DNNs). We derive an algorithm-dependent generalization bound by analyzing SGLD through an information-theoretic lens. Our analysis reveals an intricate trade-off between learning and information dissipation: SGLD learns from data by updating parameters at each iteration while dissipating information from early training stages. Our bound also involves the variance of gradients which captures a particular kind of "sharpness" of the loss landscape. The main proof techniques in this paper rely on strong data processing inequalities – a fundamental concept in information theory – and Otto-Villani's HWI inequality. Finally, we demonstrate our bound through numerical experiments, showing that it can predict the behavior of the true generalization gap.
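For concreteness, the optimizer analysed in the paper is the standard SGLD iterate: a stochastic gradient step plus injected Gaussian noise, which is what drives the information dissipation described in the abstract. The following is a minimal NumPy sketch of that update on a toy least-squares problem; the loss, step size, and inverse temperature below are illustrative assumptions, not the paper's experimental setup.

    # Minimal sketch of the SGLD update rule (illustrative toy setup, not the
    # paper's experiments): w <- w - eta * grad + sqrt(2 * eta / beta) * noise.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy dataset and per-example squared loss: l(w, (x, y)) = 0.5 * (w.x - y)^2
    X = rng.normal(size=(256, 10))
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=256)

    def grad(w, idx):
        """Mini-batch gradient of the squared loss over the examples in idx."""
        Xb, yb = X[idx], y[idx]
        return Xb.T @ (Xb @ w - yb) / len(idx)

    def sgld(steps=2000, batch=32, eta=1e-2, beta=1e4):
        """SGLD: gradient step plus Gaussian noise with variance 2*eta/beta."""
        w = np.zeros(10)
        for _ in range(steps):
            idx = rng.choice(len(X), size=batch, replace=False)
            noise = rng.normal(size=w.shape)
            w = w - eta * grad(w, idx) + np.sqrt(2.0 * eta / beta) * noise
        return w

    w_hat = sgld()
    print("training loss:", 0.5 * np.mean((X @ w_hat - y) ** 2))

The injected noise term is controlled by the inverse temperature beta: larger noise dissipates more information about the training data (and about early iterates), which is the trade-off the paper's bound quantifies.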


Related research:

04/25/2023 - Learning Trajectories are Generalization Indicators
The aim of this paper is to investigate the connection between learning ...

06/18/2021 - A Probabilistic Representation of DNNs: Bridging Mutual Information and Generalization
Recently, Mutual Information (MI) has attracted attention in bounding th...

09/29/2021 - Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis
We derive a novel information-theoretic analysis of the generalization p...

09/10/2023 - Generalization error bounds for iterative learning algorithms with bounded updates
This paper explores the generalization characteristics of iterative lear...

06/28/2023 - On information captured by neural networks: connections with memorization and generalization
Despite the popularity and success of deep learning, there is limited un...

04/18/2018 - Understanding Convolutional Neural Network Training with Information Theory
Using information theoretic concepts to understand and explore the inner...

12/07/2021 - A generalization gap estimation for overparameterized models via the Langevin functional variance
This paper discusses the estimation of the generalization gap, the diffe...
