A jamming transition from under- to over-parametrization affects loss landscape and generalization

by Stefano Spigler, et al.

We argue that in fully-connected networks a phase transition separates the over-parametrized regime, where the training data can be fitted, from the under-parametrized regime, where they cannot. Under some general conditions, we show that this transition is sharp for the hinge loss. Throughout the over-parametrized regime, poor minima of the loss are not encountered during training, since the number of constraints to satisfy is too small to hamper minimization. Our findings support a link between this transition and the generalization properties of the network: as we increase the number of parameters of a given model, starting from an under-parametrized network, the generalization error displays three phases: (i) an initial decay, (ii) an increase up to the transition point, where it displays a cusp, and (iii) a power-law decay toward a constant over the rest of the over-parametrized regime. We thereby identify the region where the classical phenomenon of over-fitting takes place and the region where the model keeps improving, in line with previous empirical observations for modern neural networks. The theoretical results presented here appeared elsewhere for a physics audience. The results on generalization are new.
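To make the notion of "constraints" concrete, here is a minimal sketch that treats each training sample as one constraint under the hinge loss. It is not the authors' code. It assumes binary labels of +1 or -1 and a unit margin, and the function name `hinge_constraints` is hypothetical. A sample counts as a satisfied constraint when its label times the network output reaches the margin; only unsatisfied constraints contribute to the loss.

```python
def hinge_constraints(outputs, labels, margin=1.0):
    """Return (mean hinge loss, number of unsatisfied constraints).

    outputs: real-valued network outputs f(x_i)
    labels:  labels y_i, assumed to be +1 or -1
    margin:  assumed unit margin; sample i is satisfied when y_i * f(x_i) >= margin
    """
    if len(outputs) != len(labels):
        raise ValueError("outputs and labels must have the same length")
    if not outputs:
        raise ValueError("need at least one sample")
    # Gap of each sample to the margin; a positive gap means the constraint is violated.
    gaps = [margin - y * f for f, y in zip(outputs, labels)]
    unsatisfied = [g for g in gaps if g > 0]
    # Satisfied constraints contribute zero, so the loss sums over violated ones only.
    loss = sum(unsatisfied) / len(outputs)
    return loss, len(unsatisfied)


# Toy data (illustrative values, not taken from the paper).
outputs = [2.0, 0.5, -1.5, -0.2]
labels = [1, 1, -1, 1]
loss, n_unsatisfied = hinge_constraints(outputs, labels)
print(loss, n_unsatisfied)
```

In this picture, fitting the training set means driving the number of unsatisfied constraints to zero, at which point the loss vanishes. The abstract's claim is that this is achievable throughout the over-parametrized regime.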




