Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training

12/30/2020
by   Mario Geiger, et al.

Deep learning algorithms are responsible for a technological revolution in a variety of tasks, including image recognition and Go playing. Yet why they work is not understood. Ultimately, they manage to classify data lying in high dimension, a feat generically impossible due to the geometry of high-dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental challenge. Other puzzles include that (i) learning corresponds to minimizing a loss in high dimension, which is in general not convex, so that training could well get stuck in bad minima; and (ii) the predictive power of deep learning increases with the number of fitting parameters, even in a regime where the data are perfectly fitted. In this manuscript, we review recent results elucidating (i) and (ii) and the perspective they offer on the (still unexplained) curse of dimensionality paradox. We base our theoretical discussion on the (h,α) plane, where h is the network width and α the scale of the output of the network at initialization, and provide new systematic measures of performance in that plane for MNIST and CIFAR-10. We argue that different learning regimes can be organized into a phase diagram. A line of critical points sharply delimits an under-parametrized phase from an over-parametrized one. In over-parametrized nets, learning can operate in two regimes separated by a smooth crossover. At large initialization scale, it corresponds to a kernel method, whereas at small initialization scale features can be learnt, together with invariants in the data. We review the properties of these different phases, of the transition separating them, and some open questions. Our treatment emphasizes analogies with physical systems, scaling arguments and the development of numerical observables to test these results empirically.
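To make the role of α concrete, below is a minimal PyTorch sketch (not the authors' code) of the centered, α-scaled output parameterization used in the lazy-versus-feature-learning literature: the prediction is α(f_w(x) − f_{w0}(x)), and the learning rate is rescaled by 1/α² so the output dynamics evolve on a comparable timescale for all α. The width, α value, data and optimizer settings are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): the alpha-scaled,
# centered output F_alpha(x) = alpha * (f_w(x) - f_{w0}(x)), used to
# interpolate between the lazy (large alpha) and feature-learning
# (small alpha) regimes. Width h, alpha, data and optimizer are assumptions.
import copy
import torch
import torch.nn as nn

h, alpha = 256, 100.0          # network width and output scale (assumed values)
d, n = 10, 64                  # input dimension and number of training points

model = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, 1))
model0 = copy.deepcopy(model)  # frozen copy of the network at initialization
for p in model0.parameters():
    p.requires_grad_(False)

x = torch.randn(n, d)
y = torch.sign(torch.randn(n, 1))          # random +/-1 labels, for illustration only

w_init = torch.cat([p.detach().flatten() for p in model.parameters()])
opt = torch.optim.SGD(model.parameters(), lr=0.1 / alpha**2)  # lr rescaled with alpha

for step in range(1000):
    opt.zero_grad()
    out = alpha * (model(x) - model0(x))   # centered, alpha-scaled output
    loss = 0.5 * ((out - y) ** 2).mean()   # quadratic loss on the scaled output
    loss.backward()
    opt.step()

# Relative weight displacement from initialization: it shrinks as alpha grows
# (lazy/kernel regime) and stays of order one for small alpha (feature learning).
w_final = torch.cat([p.detach().flatten() for p in model.parameters()])
print(f"relative weight displacement: {(w_final - w_init).norm() / w_init.norm():.3e}")
```

Rerunning the sketch with small α (say 0.01) versus large α shows the displacement observable that is commonly used to locate the crossover between the two over-parametrized regimes.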



Related research

06/19/2019
Disentangling feature and lazy learning in deep neural networks: an empirical study
Two distinct limits for deep learning as the net width h→∞ have been pro...

06/26/2023
Scaling and Resizing Symmetry in Feedforward Networks
Weight initialization in deep neural networks has a strong impact on t...

09/25/2018
The jamming transition as a paradigm to understand the loss landscape of deep neural networks
Deep learning has been immensely successful at a variety of tasks, rangi...

05/24/2022
Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width
Substantial work indicates that the dynamics of neural networks (NNs) is...

09/19/2023
On the different regimes of Stochastic Gradient Descent
Modern deep networks are trained with stochastic gradient descent (SGD) ...

10/22/2018
A jamming transition from under- to over-parametrization affects loss landscape and generalization
We argue that in fully-connected networks a phase transition delimits th...

06/07/2017
Are Saddles Good Enough for Deep Learning?
Recent years have seen a growing interest in understanding deep neural n...
