The jamming transition as a paradigm to understand the loss landscape of deep neural networks

09/25/2018
by Mario Geiger et al.

Deep learning has been immensely successful at a variety of tasks, ranging from classification to artificial intelligence. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased, remains a challenge. Here we predict, and test empirically, an unexpected analogy between this landscape and the energy landscape of repulsive ellipses. We argue that in fully-connected deep networks a phase transition delimits the over- and under-parametrized regimes in which fitting can or cannot be achieved. In the vicinity of this transition, properties of the curvature of the minima of the loss (the spectrum of the Hessian) are critical and can be computed. This transition shares direct similarities with the jamming transition, by which particles form a disordered solid as their density is increased. Our analysis gives a simple explanation as to why poor minima of the loss are not encountered in the over-parametrized regime, and puts forward the surprising result that the ability of fully-connected networks to fit random and realistic data is independent of their depth. We also study a quantity Δ that characterizes how well (Δ<0) or badly (Δ>0) a datum is learned. At the critical point, Δ is power-law distributed, P_+(Δ) ∼ Δ^θ for Δ>0 and P_-(Δ) ∼ (-Δ)^(-γ) for Δ<0, with θ ≈ 0.3 and γ ≈ 0.2. This observation suggests that near the transition the loss landscape has a hierarchical structure and that the learning dynamics is prone to avalanches, with abrupt changes in the set of patterns that are learned.
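The abstract does not spell out how Δ or the loss are defined, so the following is only a minimal sketch under stated assumptions: a margin-style Δ = 1 − y f(x), a quadratic hinge loss, and random data and labels, all chosen here for illustration rather than taken from the paper. It shows how one could train a small fully-connected network, count the unsatisfied data (Δ>0), and collect the values of Δ whose distribution the abstract describes. Network sizes and hyperparameters are illustrative.

```python
# Minimal sketch (assumptions: margin-style Delta = 1 - y*f(x), quadratic hinge loss,
# random Gaussian inputs with random binary labels; conventions chosen for illustration).
import torch
import torch.nn as nn

torch.manual_seed(0)

P, d, h = 512, 32, 64                          # number of training data, input dim, hidden width
x = torch.randn(P, d)
y = torch.randint(0, 2, (P,)).float() * 2 - 1  # random labels in {-1, +1}

net = nn.Sequential(
    nn.Linear(d, h), nn.ReLU(),
    nn.Linear(h, h), nn.ReLU(),
    nn.Linear(h, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5000):
    f = net(x).squeeze(-1)                     # network output f(x), shape (P,)
    delta = 1.0 - y * f                        # Delta > 0: datum not yet learned; Delta < 0: learned
    loss = 0.5 * torch.clamp(delta, min=0).pow(2).mean()  # only unsatisfied data contribute
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    delta = 1.0 - y * net(x).squeeze(-1)
n_unsat = int((delta > 0).sum())
print(f"final loss = {loss.item():.3e}, unsatisfied data (Delta > 0): {n_unsat} / {P}")
# Near the transition the abstract reports power-law tails of the Delta distribution,
# P_+(Delta) ~ Delta^0.3 for Delta > 0 and P_-(Delta) ~ (-Delta)^-0.2 for Delta < 0,
# which one could probe with log-log histograms of the positive and negative parts of delta.
```

In the over-parametrized regime this sketch should end with few or no unsatisfied data; shrinking the hidden width until fitting fails gives a crude way to locate the transition between the two regimes described above.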


