Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Rényi's Entropy Perspective

05/02/2023
by Yuxin Dong, et al.

Recently, information-theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It enables direct analysis of stochastic gradient descent (SGD) and stochastic gradient Langevin dynamics (SGLD) learning algorithms without strong assumptions such as Lipschitz continuity or convexity. However, the current generalization error bounds within this framework are still far from optimal, and substantially improving them is challenging because high-dimensional information quantities are intractable to compute. To address this issue, we first propose a novel information-theoretic measure, kernelized Rényi's entropy, built on an operator representation in Hilbert space. It inherits the properties of Shannon's entropy and can be effectively estimated via simple random sampling, while remaining independent of the input dimension. We then establish generalization error bounds for SGD/SGLD under kernelized Rényi's entropy, in which the mutual information quantities can be computed directly, enabling evaluation of the tightness of each intermediate step. We show that our information-theoretic bounds depend on the statistics of the stochastic gradients evaluated along the optimization trajectory, and are rigorously tighter than the current state-of-the-art (SOTA) results. Our theoretical findings are also supported by large-scale empirical studies.
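As a rough illustration of the sampling-based, dimension-independent estimator the abstract describes, the sketch below computes a Gram-matrix Rényi entropy from n samples. It assumes the matrix-based formulation of Sánchez Giraldo et al. (a unit-trace normalized RBF Gram matrix, with entropy taken over its eigenvalue spectrum); the paper's exact operator-theoretic definition may differ, and the helper name `kernelized_renyi_entropy`, the kernel choice, the bandwidth `sigma`, and the order `alpha` are illustrative assumptions, not the authors' settings.

```python
# A minimal sketch of a sampling-based Renyi entropy estimator.
# Assumptions (not stated in the abstract): the Gram-matrix eigenvalue
# formulation of matrix-based Renyi entropy (Sanchez Giraldo et al.)
# with an RBF kernel; the paper's exact operator definition may differ.
import numpy as np

def kernelized_renyi_entropy(X, alpha=2.0, sigma=1.0):
    """Order-alpha entropy estimate from samples X of shape (n, d).

    Forms the unit-trace Gram matrix A = K / tr(K) and returns
    S_alpha(A) = log2(sum_i lambda_i(A)^alpha) / (1 - alpha).
    """
    # Pairwise squared distances, then the RBF Gram matrix K.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    A = K / np.trace(K)                 # normalize so eigenvalues sum to 1
    eigvals = np.linalg.eigvalsh(A)     # A is symmetric PSD
    eigvals = eigvals[eigvals > 1e-12]  # discard numerical zeros
    return np.log2(np.sum(eigvals ** alpha)) / (1.0 - alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 10))      # n = 256 samples in d = 10 dims
    print(kernelized_renyi_entropy(X))  # cost depends on n, not on d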


Related research

02/01/2021 · Information-Theoretic Generalization Bounds for Stochastic Gradient Descent
We study the generalization properties of the popular stochastic gradien...

02/09/2023 · Information Theoretic Lower Bounds for Information Theoretic Upper Bounds
We examine the relationship between the mutual information between the o...

07/04/2023 · Generalization Guarantees via Algorithm-dependent Rademacher Complexity
Algorithm- and data-dependent generalization bounds are required to expl...

11/06/2019 · Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
In this work, we improve upon the stepwise analysis of noisy iterative l...

09/10/2023 · Generalization error bounds for iterative learning algorithms with bounded updates
This paper explores the generalization characteristics of iterative lear...

04/04/2023 · VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
Since the introduction of deep learning, a wide scope of representation ...

05/12/2023 · A Logarithmic Decomposition for Information
The Shannon entropy of a random variable X has much behaviour analogous ...
