On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications

10/07/2021
by Ziqiao Wang, et al.

This paper follows up on recent work by Neu (2021) and presents new and tighter information-theoretic upper bounds on the generalization error of machine learning models, such as neural networks, trained with SGD. We apply these bounds to analyze the generalization behaviour of linear and two-layer ReLU networks. Experimental studies based on these bounds provide insights into the SGD training of neural networks. They also point to a new and simple regularization scheme which we show performs comparably to the current state of the art.
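The abstract does not spell out the form of the bounds or the regularization scheme. Purely as an illustrative sketch, the snippet below tracks a Neu (2021)-style trajectory quantity during SGD training: the learning-rate-weighted sum of squared gradient norms. The exact bound form, its constants, and the variance terms appearing in the paper's results are omitted here and should be treated as assumptions for illustration, not the paper's theorem.

```python
# Hedged sketch only: accumulate a trajectory quantity of the form
# sum_t eta_t^2 * ||g_t||^2, which trajectory-based information-theoretic
# bounds for SGD (in the spirit of Neu, 2021) scale with, up to constants
# and additional variance terms not shown here.
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(512, 20), torch.randint(0, 2, (512,))
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
lr = 0.05
opt = torch.optim.SGD(model.parameters(), lr=lr)
loss_fn = nn.CrossEntropyLoss()

traj_term = 0.0  # accumulates eta_t^2 * ||g_t||^2 along the SGD trajectory

for step in range(200):
    idx = torch.randint(0, 512, (64,))
    loss = loss_fn(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    grad_sq = sum((p.grad ** 2).sum().item()
                  for p in model.parameters() if p.grad is not None)
    traj_term += (lr ** 2) * grad_sq
    opt.step()

# Crude bound-style proxy (square root of the accumulated trajectory term);
# the paper's actual bounds involve further variance and constant factors.
print(f"trajectory term: {traj_term:.4f}, proxy: {traj_term ** 0.5:.4f}")
```

A quantity of this kind can be logged alongside the train/test gap to see how a trajectory-based proxy correlates with observed generalization; any use of it as a regularizer (as the abstract's scheme suggests) would require additional machinery not shown here.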
