On the Lipschitz Constant of Deep Networks and Double Descent

01/28/2023
by   Matteo Gamba, et al.

Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, but fall short of investigating the mechanisms controlling such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends strongly correlating with the test error. Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors, namely loss landscape curvature and distance of parameters from initialization, which respectively control optimization dynamics around a critical point and bound model function complexity, even beyond the training data. Our study presents novel insights on implicit regularization via overparameterization, and on the effective model complexity of networks trained in practice.
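The paper does not specify its measurement protocol in the abstract, but a common way to estimate the empirical Lipschitz constant is as the largest input-gradient norm over sampled points. Below is a minimal, purely illustrative sketch using a toy two-layer ReLU network in NumPy; the network, its weights, and the sampling scheme are assumptions for demonstration, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network: f(x) = W2 @ relu(W1 @ x)
W1 = rng.standard_normal((16, 4))
W2 = rng.standard_normal((1, 16))

def forward(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

def input_gradient(x):
    # By the chain rule: df/dx = W2 @ diag(1[W1 @ x > 0]) @ W1
    mask = (W1 @ x > 0).astype(float)
    return (W2 * mask) @ W1  # shape (1, 4)

# Empirical Lipschitz estimate: max input-gradient norm over sample points
xs = rng.standard_normal((1000, 4))
lip = max(np.linalg.norm(input_gradient(x)) for x in xs)
```

Note that this lower-bounds the true Lipschitz constant (it only probes finitely many points), while the product of layer spectral norms upper-bounds it; the empirical quantity studied in such work sits between the two.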


Related research:

- 09/21/2022, Deep Double Descent via Smooth Interpolation: "Overparameterized deep networks are known to be able to perfectly fit th..."
- 01/07/2019, Generalization in Deep Networks: The Role of Distance from Initialization: "Why does training deep neural networks using stochastic gradient descent..."
- 06/08/2021, What training reveals about neural network complexity: "This work explores the hypothesis that the complexity of the function a ..."
- 10/13/2021, On the Double Descent of Random Features Models Trained with SGD: "We study generalization properties of random features (RF) regression in..."
- 03/29/2023, Lipschitzness Effect of a Loss Function on Generalization Performance of Deep Neural Networks Trained by Adam and AdamW Optimizers: "The generalization performance of deep neural networks with regard to th..."
- 08/26/2021, Disentangling ODE parameters from dynamics in VAEs: "Deep networks have become increasingly of interest in dynamical system p..."
- 02/21/2018, Coresets For Monotonic Functions with Applications to Deep Learning: "Coreset (or core-set) in this paper is a small weighted subset Q of the ..."
