Data-dependent Generalization Bounds via Variable-Size Compressibility

03/09/2023
by Milad Sefidgaran, et al.

In this paper, we establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework that we introduce here. In this framework, the generalization error of an algorithm is linked to a variable-size 'compression rate' of its input data. This yields bounds that depend on the empirical measure of the given input data at hand, rather than on its unknown distribution. The new generalization bounds that we establish are tail bounds, tail bounds on the expectation, and in-expectation bounds. Moreover, we show that our framework also allows one to derive general bounds on any function of the input data and output hypothesis random variables. In particular, these general bounds are shown to subsume, and possibly improve over, several existing PAC-Bayes and data-dependent intrinsic dimension-based bounds, which are recovered as special cases, thus unveiling the unifying character of our approach. For instance, a new data-dependent intrinsic dimension-based bound is established, which connects the generalization error to the optimization trajectory and reveals various interesting connections with the rate-distortion dimension of a process, the Rényi information dimension of a process, and the metric mean dimension.
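As a rough, illustrative template of the quantities involved (the notation below is assumed for exposition and is not taken verbatim from the paper): for a training sample $s = (z_1, \ldots, z_n)$ drawn from an unknown distribution $\mu$ and a hypothesis $w$ output by the algorithm, the generalization error is

$$\mathrm{gen}(s, w) \;=\; \mathbb{E}_{Z \sim \mu}\big[\ell(Z, w)\big] \;-\; \frac{1}{n} \sum_{i=1}^{n} \ell(z_i, w),$$

and a variable-size compressibility tail bound would, schematically, take a form such as

$$\mathrm{gen}(s, w) \;\lesssim\; \sqrt{\frac{R_\delta(s) + \log(1/\delta)}{n}} \qquad \text{with probability at least } 1 - \delta,$$

where $R_\delta(s)$ is a 'compression rate' that is allowed to vary with the realized sample $s$ (hence "variable-size"), so that the bound is computable from the empirical measure of $s$ rather than from the unknown $\mu$.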


Related research

03/04/2022 | Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms
Understanding generalization in modern machine learning settings has bee...

04/09/2019 | Hypothesis Set Stability and Generalization
We present an extensive study of generalization for data-dependent hypot...

12/02/2018 | Metric mean dimension and analog compression
Wu and Verdú developed a theory of almost lossless analog compression, w...

06/21/2022 | Supermodular f-divergences and bounds on lossy compression and generalization error with mutual f-information
In this paper, we introduce supermodular f-divergences and provide three...

02/10/2021 | Learning under Distribution Mismatch and Model Misspecification
We study learning algorithms when there is a mismatch between the distri...

06/18/2019 | New Uniform Bounds for Almost Lossless Analog Compression
Wu and Verdú developed a theory of almost lossless analog compression, w...

06/27/2022 | Robustness Implies Generalization via Data-Dependent Generalization Bounds
This paper proves that robustness implies generalization via data-depend...
