A jamming transition from under- to over-parametrization affects loss landscape and generalization

10/22/2018
by Stefano Spigler, et al.

We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to hamper minimization. Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parametrized network, we observe that the generalization error displays three phases: (i) initial decay, (ii) increase until the transition point --- where it displays a cusp --- and (iii) power law decay toward a constant for the rest of the over-parametrized regime. Thereby we identify the region where the classical phenomenon of over-fitting takes place, and the region where the model keeps improving, in line with previous empirical observations for modern neural networks. The theoretical results presented here appeared elsewhere for a physics audience. The results on generalization are new.
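A minimal sketch of the kind of experiment the abstract describes: sweep the width of a fully-connected network trained with the hinge loss on a fixed dataset, and record whether all training constraints y_i f(x_i) >= 1 are satisfied (the over-parametrized, "fittable" side of the transition) together with the test error. The toy linearly separable dataset, the standard hinge loss max(0, 1 - y f(x)), the widths, and the optimizer settings below are illustrative assumptions, not the authors' setup (the paper may, for instance, use a quadratic hinge).

```python
# Hypothetical sketch (not the authors' code): locate the under-/over-parametrized
# transition by sweeping the hidden-layer width of a fully-connected network
# trained with the hinge loss, checking whether all P constraints are satisfied,
# and measuring the test error. Dataset and hyperparameters are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
P, d, P_test = 256, 20, 1024                      # train size, input dim, test size
teacher = torch.randn(d)                          # toy linearly separable labels
def make_data(n):
    x = torch.randn(n, d)
    y = torch.sign(x @ teacher)
    return x, y
x_tr, y_tr = make_data(P)
x_te, y_te = make_data(P_test)

def hinge(out, y):                                # standard hinge loss max(0, 1 - y f(x))
    return torch.clamp(1.0 - y * out, min=0.0).mean()

for h in [2, 4, 8, 16, 32, 64, 128]:              # sweep width, i.e. number of parameters N
    net = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, 1))
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    loss = hinge(net(x_tr).squeeze(-1), y_tr)
    for step in range(5000):                      # full-batch gradient descent
        if loss.item() == 0.0:                    # all constraints satisfied: data is fitted
            break
        opt.zero_grad()
        loss.backward()
        opt.step()
        loss = hinge(net(x_tr).squeeze(-1), y_tr)
    with torch.no_grad():
        test_err = (torch.sign(net(x_te).squeeze(-1)) != y_te).float().mean()
    n_params = sum(p.numel() for p in net.parameters())
    print(f"width={h:4d}  params={n_params:6d}  fitted={loss.item() == 0.0}  test_err={test_err:.3f}")
```

Plotting the recorded test error against the number of parameters would, according to the abstract, show the three phases: an initial decay, an increase up to a cusp at the transition, and a power-law decay toward a constant in the over-parametrized regime.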


