Understanding Generalization in the Interpolation Regime using the Rate Function

06/19/2023
by Andrés R. Masegosa, et al.

In this paper, we present a novel characterization of the smoothness of a model based on basic principles of Large Deviation Theory. In contrast to prior work, where the smoothness of a model is normally characterized by a single real value (e.g., the norm of the weights), we show that smoothness can be described by a simple real-valued function. Based on this concept of smoothness, we propose a unifying theoretical explanation of why some interpolators generalize remarkably well and why a wide range of modern learning techniques (i.e., stochastic gradient descent, ℓ_2-norm regularization, data augmentation, invariant architectures, and overparameterization) are able to find them. The emergent conclusion is that all these methods provide complementary procedures that bias the optimizer toward smoother interpolators, which, according to this theoretical analysis, are the ones with better generalization error.
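
The abstract does not spell out the function it has in mind, so the following is only a minimal sketch of the standard Cramér construction from Large Deviation Theory, assuming the rate function is the Legendre transform of the log moment generating function of the centered loss. The helper names `log_mgf` and `rate_function`, the toy loss values, and the grid of λ values are all hypothetical and are not taken from the paper.

```python
import math

def log_mgf(losses, lam):
    """Empirical log-MGF of the centered loss: ln mean(exp(lam * (mean_loss - loss_i)))."""
    mean_loss = sum(losses) / len(losses)
    exponents = [lam * (mean_loss - loss) for loss in losses]
    # Log-sum-exp trick for numerical stability.
    m = max(exponents)
    return m + math.log(sum(math.exp(e - m) for e in exponents) / len(exponents))

def rate_function(losses, a, lam_grid):
    """Legendre transform on a grid: max over lam of lam * a - log_mgf(lam)."""
    return max(lam * a - log_mgf(losses, lam) for lam in lam_grid)

# Toy per-sample losses (assumed for illustration): half correct, half wrong.
toy_losses = [0.0, 0.0, 1.0, 1.0]
# Grid of non-negative lam values from 0 to 10 in steps of 0.01.
lam_grid = [i * 0.01 for i in range(0, 1001)]

rate_at_zero = rate_function(toy_losses, 0.0, lam_grid)
rate_at_quarter = rate_function(toy_losses, 0.25, lam_grid)
print(rate_at_zero, rate_at_quarter)
```

Under this construction and i.i.d. sampling, the Chernoff bound gives P(L − L̂ ≥ a) ≤ exp(−n·I(a)) for a training set of size n, so a model whose rate function grows faster in a is less likely to show a large gap between expected and empirical loss. The grid search is a crude stand-in for the exact supremum and is chosen here only for readability.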


research
06/07/2022

Generalization Error Bounds for Deep Neural Networks Trained by SGD

Generalization error bounds for deep neural networks trained by stochast...
research
12/10/2018

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization

Batch Normalization (BN) has become a cornerstone of deep learning acros...
research
10/12/2021

On Convergence of Training Loss Without Reaching Stationary Points

It is a well-known fact that nonconvex optimization is computationally i...
research
05/17/2018

Minimax regularization

Classical approach to regularization is to design norms enhancing smooth...
research
09/05/2018

Deep Bilevel Learning

We present a novel regularization approach to train neural networks that...
research
07/05/2022

Predicting Out-of-Domain Generalization with Local Manifold Smoothness

Understanding how machine learning models generalize to new environments...
research
02/11/2021

Higher Order Generalization Error for First Order Discretization of Langevin Diffusion

We propose a novel approach to analyze generalization error for discreti...
