# How many degrees of freedom do we need to train deep networks: a loss landscape perspective

A variety of recent works, spanning pruning, lottery tickets, and training within random subspaces, have shown that deep neural networks can be trained using far fewer degrees of freedom than the total number of parameters. We explain this phenomenon by first examining the success probability of hitting a training loss sub-level set when training within a random subspace of a given training dimensionality. We find a sharp phase transition in the success probability from 0 to 1 as the training dimension surpasses a threshold. This threshold training dimension increases as the desired final loss decreases, but decreases as the initial loss decreases. We then theoretically explain the origin of this phase transition, and its dependence on initialization and final desired loss, in terms of precise properties of the high dimensional geometry of the loss landscape. In particular, we show via Gordon's escape theorem, that the training dimension plus the Gaussian width of the desired loss sub-level set, projected onto a unit sphere surrounding the initialization, must exceed the total number of parameters for the success probability to be large. In several architectures and datasets, we measure the threshold training dimension as a function of initialization and demonstrate that it is a small fraction of the total number of parameters, thereby implying, by our theory, that successful training with so few dimensions is possible precisely because the Gaussian width of low loss sub-level sets is very large. Moreover, this threshold training dimension provides a strong null model for assessing the efficacy of more sophisticated ways to reduce training degrees of freedom, including lottery tickets as well as a more optimal method we introduce: lottery subspaces.


## 1 Introduction

How many parameters are needed to train a neural network to a specified accuracy? Recent work on two fronts indicates that the answer for a given architecture and dataset pair is often much smaller than the total number of parameters used in modern large-scale neural networks. The first is the successful identification of lottery tickets, or sparse trainable subnetworks, through iterative training and pruning cycles frankle2018Lottery. Such methods utilize information from training to identify lower-dimensional parameter spaces which can optimize to a similar accuracy as the full model. The second is the observation that constrained training within a random, low-dimensional affine subspace is often successful at reaching a high desired train and test accuracy on a variety of tasks, provided that the training dimension of the subspace is above an empirically-observed threshold training dimension li2018intrinsicdimension. These results, however, leave open the question of why low-dimensional training is so successful and whether we can give a theoretical explanation for the existence of a threshold training dimension for a given model and task.

In this work, we provide such an explanation in terms of the high-dimensional geometry of the loss landscape, the initialization, and the desired loss. In particular, we leverage a powerful tool from high-dimensional probability theory, namely Gordon’s escape theorem, to show that this threshold training dimension is equal to the dimension of the full parameter space minus the Gaussian width of the desired loss sublevel set projected onto the unit sphere around initialization. This theory can then be applied in several ways to enhance our understanding of the loss landscape of neural networks. For a quadratic well or second-order approximation around a local minimum, we derive this threshold training dimension analytically in terms of the Hessian spectrum and the distance of the initialization from the minimum. For general models, this relationship can be used in reverse to measure important high dimensional properties of loss landscape geometry. For example, by performing a tomographic exploration of the loss landscape, i.e. training within random subspaces of varying training dimension, we uncover a phase transition in the success probability of hitting a given loss sub-level set. The threshold-training dimension is then the phase boundary in this transition, and our theory explains the dependence of the phase boundary on the desired loss sub-level set and the initialization, in terms of the Gaussian width of the loss sub-level set projected onto a sphere surrounding the initialization.

Motivated by lottery tickets, we furthermore consider training not only within random subspaces, but also within optimized subspaces constructed using information from training in the full space. Lottery tickets can be viewed as constructing an optimized, axis-aligned subspace of parameters, i.e. one where each subspace dimension corresponds to a single parameter. What would constitute an analogous optimized choice for general subspaces? We propose two new methods: burn-in subspaces, which optimize the offset of the subspace by taking a few steps along a training trajectory, and lottery subspaces, determined by the span of gradients along a full training trajectory (Fig. 1). Burn-in subspaces in particular can be viewed as lowering the threshold training dimension by moving closer to the desired loss sublevel set. For all three methods, we empirically explore the threshold training dimension across a range of datasets and architectures.

Related Work: An important motivation of our work is the observation that training within a random, low-dimensional affine subspace starting from a random initialization can suffice to reach high training and test accuracies on a variety of tasks, provided the training dimension exceeds a threshold that was called the intrinsic dimension li2018intrinsicdimension (and which we call the threshold training dimension). However, li2018intrinsicdimension provided no theoretical explanation for this threshold and did not explore the dependence of this threshold on the quality of the initialization. Our primary goal is to provide a theoretical explanation for the existence of this threshold in terms of the geometry of the loss landscape and the quality of the initialization. Indeed, understanding the geometry of high-dimensional error landscapes has been a subject of intense interest in deep learning (see e.g. Dauphin2014-lk; goodfellow2014qualitatively; fort2019largescale; ghorbani2019investigation; sagun2016eigenvalues; sagun2017empirical; yao2018hessianbased; fort2019goldilocks; papyan2020traces; gur2018gradient; fort2019emergent; papyan2019measurements; fort2020deep, or see Bahri2020-mi for a review). But to our knowledge, the Gaussian width of sub-level sets projected onto a sphere surrounding initialization, a key quantity that determines the threshold training dimension, has not been extensively explored in deep learning.

Another motivation for our work is assessing the efficacy of more sophisticated network pruning methods. Here we can use training within random affine subspaces with different initializations, for which we can obtain theoretical insights, as a null baseline. In this work we focus on lottery tickets, or pruned networks obtained at initialization using information about the magnitude of weights at the end of training frankle2018Lottery; frankle2019stabilizing. Further work revealed the advantages obtained by pruning networks not at initialization frankle2018Lottery; Lee2018-vt; Wang2020-jt; Tanaka2020-no but slightly later in training frankle2020pruning, highlighting the importance of early stages of training jastrzebski2020breakeven; lewkowycz2020large. We will find empirically, as well as explain theoretically, that even when training within random subspaces, one can obtain higher accuracies for a given training dimension if one starts from a slightly pre-trained, or burned-in, initialization as opposed to a random initialization. Thus the null baseline for achievable accuracy is more stringent when one prunes later in training versus at initialization.

## 2 An empirically observed phase transition in the success of training

We begin with the empirical observation of a phase transition in the probability of hitting a loss sub-level set when training within a random subspace of a given training dimension, starting from some initialization. Before presenting this phase transition, we first define loss sublevel sets and two different methods for training within a random subspace that differ only in the quality of the initialization. In the next section we develop theory for the nature of this phase transition.

#### Loss sublevel sets.

Let $f(\mathbf{x}; \mathbf{w})$ be a neural network with weights $\mathbf{w} \in \mathbb{R}^D$ and inputs $\mathbf{x}$. For a given training set $\{(\mathbf{x}_n, \mathbf{y}_n)\}_{n=1}^N$, the empirical loss landscape is given by $\mathcal{L}(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^N \ell\big(f(\mathbf{x}_n; \mathbf{w}), \mathbf{y}_n\big)$, where $\ell$ is a per-example loss. Though our theory is general, we focus on classification for our experiments, where $\mathbf{y}_n$ is a one-hot encoding of class labels, $f(\mathbf{x}_n; \mathbf{w})$ is a vector of class probabilities, and $\ell$ is the cross-entropy loss. In general, the loss sublevel set $S(\epsilon)$ at a desired value of loss $\epsilon$ is the set of all points for which the loss is less than or equal to $\epsilon$:

$$S(\epsilon) := \{\mathbf{w} \in \mathbb{R}^D : \mathcal{L}(\mathbf{w}) \le \epsilon\}. \tag{2.1}$$

#### Random affine subspace.

Consider a $d$-dimensional random affine hyperplane contained in $D$-dimensional weight space, parameterized by $\boldsymbol{\theta} \in \mathbb{R}^d$:

$$\mathbf{w}(\boldsymbol{\theta}) = \mathbf{A}\boldsymbol{\theta} + \mathbf{w}_0.$$

Here $\mathbf{A} \in \mathbb{R}^{D \times d}$ is a random Gaussian matrix with columns normalized to $1$ and $\mathbf{w}_0 \in \mathbb{R}^D$ is a random Gaussian vector. To train within this subspace, we initialize $\boldsymbol{\theta} = \mathbf{0}$, which corresponds to randomly initializing the network at $\mathbf{w}_0$, and we minimize $\mathcal{L}\big(\mathbf{w}(\boldsymbol{\theta})\big)$ with respect to $\boldsymbol{\theta}$.

#### Burn-in affine subspace.

Alternatively, we can initialize the network with parameters $\mathbf{w}_0$ and train the network in the full space for some number of iterations $t$, arriving at the parameters $\mathbf{w}_t$. We can then construct the random burn-in subspace

$$\mathbf{w}(\boldsymbol{\theta}) = \mathbf{A}\boldsymbol{\theta} + \mathbf{w}_t, \tag{2.2}$$

with $\mathbf{A}$ chosen randomly as before, and then subsequently train within this subspace by minimizing $\mathcal{L}\big(\mathbf{w}(\boldsymbol{\theta})\big)$ with respect to $\boldsymbol{\theta}$. The random affine subspace is identical to the burn-in affine subspace but with $t = 0$. Exploring the properties of training within burn-in as opposed to random affine subspaces enables us to explore the impact of the quality of the initialization, after burning in some information from the training data, on the success of subsequent restricted training.
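As a concrete illustration, the subspace construction shared by both methods can be sketched in a few lines of NumPy (a minimal sketch with names of our choosing; the paper's actual experiments were implemented in JAX):

```python
import numpy as np

def make_subspace(D, d, w_init, rng):
    """Sample a random D x d Gaussian matrix with unit-norm columns.

    Training is then restricted to w(theta) = A @ theta + w_init,
    starting from theta = 0 (i.e. from w_init itself).
    """
    A = rng.standard_normal((D, d))
    A /= np.linalg.norm(A, axis=0, keepdims=True)  # normalize each column to 1
    return lambda theta: A @ theta + w_init

# Random affine subspace: w_init is a fresh random initialization (t = 0).
# Burn-in affine subspace: w_init = w_t, the weights after t full-space steps.
rng = np.random.default_rng(0)
D, d = 1000, 10
w0 = rng.standard_normal(D)
w_of_theta = make_subspace(D, d, w0, rng)
assert np.allclose(w_of_theta(np.zeros(d)), w0)  # theta = 0 recovers w_init
```

The only difference between the two methods is the offset: a fresh random initialization for random affine subspaces versus the partially trained weights w_t for burn-in subspaces.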

#### Success probability in hitting a sub-level set.

In either training method, achieving $\mathcal{L}\big(\mathbf{w}(\boldsymbol{\theta})\big) \le \epsilon$ implies that the intersection between our random or burn-in affine subspace and the loss sub-level set $S(\epsilon')$ is non-empty for all $\epsilon' \ge \epsilon$. As both the subspace and the initialization $\mathbf{w}_0$ leading to $\mathbf{w}_t$ are random, we are interested in the success probability that a burn-in (or random when $t = 0$) subspace of training dimension $d$ actually intersects a loss sub-level set $S(\epsilon)$:

$$P_s(d, \epsilon, t) \equiv \mathbb{P}\big[\, S(\epsilon) \cap \{\mathbf{w}_t + \mathrm{span}(\mathbf{A})\} \neq \varnothing \,\big]. \tag{2.3}$$

Here, $\mathrm{span}(\mathbf{A})$ denotes the column space of $\mathbf{A}$. Note in practice we cannot guarantee that we obtain the minimal loss in the subspace, so we use the best value achieved by Adam kingma2014adam as an approximation. Thus the probability of achieving a given loss sublevel set via training constitutes an approximate lower bound on the probability in eq. 2.3 that the subspace actually intersects the loss sublevel set.

#### Threshold training dimension as a phase transition boundary.

We will find that for any fixed $t$, the success probability $P_s(d, \epsilon, t)$ in the $\epsilon$ by $d$ plane undergoes a sharp phase transition. In particular, for a desired (not too low) loss $\epsilon$, it transitions sharply from $0$ to $1$ as the training dimension $d$ increases. To capture this transition we define:

###### Definition 2.1.

[Threshold training dimension] The threshold training dimension $d^*(\epsilon, t, \delta)$ is the minimal value of $d$ such that $P_s(d, \epsilon, t) \ge 1 - \delta$ for some small $\delta > 0$.

For any chosen criterion $\delta$ (and fixed $t$), we will see that the curve $d^*(\epsilon, t, \delta)$ forms a phase boundary in the $\epsilon$ by $d$ plane separating two phases of high and low success probability.

This definition also gives an operational procedure to approximately measure the threshold training dimension: run either the random or burn-in affine subspace method repeatedly over a range of training dimensions $d$, and record the lowest loss value $\epsilon$ found in the $(\epsilon, d)$ plane when optimizing via Adam. We can then construct the empirical probability across runs of hitting a given sublevel set $S(\epsilon)$, and the threshold training dimension is the lowest value of $d$ for which this probability crosses $1 - \delta$ (for our chosen small criterion $\delta$).
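This operational procedure reduces to a simple scan over empirical success probabilities; a minimal sketch (the helper name and toy success counts below are ours, not from the paper):

```python
import numpy as np

def threshold_training_dimension(dims, successes, runs, delta=0.1):
    """Smallest d whose empirical success probability is >= 1 - delta.

    dims: sorted candidate training dimensions
    successes[i]: number of runs at dims[i] that hit the sublevel set S(eps)
    runs: number of runs per dimension
    Returns None if no dimension crosses the threshold.
    """
    for d, s in zip(dims, successes):
        if s / runs >= 1.0 - delta:
            return d
    return None

# Toy success counts over 10 runs, sharpening from 0 to 1 as d grows:
dims = [1, 2, 4, 8, 16, 32]
successes = [0, 0, 1, 4, 9, 10]
assert threshold_training_dimension(dims, successes, runs=10) == 16
```

In practice one such scan is performed per sublevel set S(ε), tracing out the full phase boundary d*(ε, t, δ).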

### 2.1 An empirical demonstration of a training phase transition

In this section, we carry out this operational procedure, comparing random and burn-in affine subspaces across a range of datasets and architectures. We examined three architectures: 1) Conv-2, a simple 2-layer CNN with 16 and 32 channels, ReLU activations, and maxpool after each convolution, followed by a fully connected layer; 2) Conv-3, a 3-layer CNN with 32, 64, and 64 channels but an otherwise identical setup to Conv-2; and 3) ResNet20v1, as described in he2016deep, with on-the-fly batch normalization ioffe2015batch. We perform experiments on 5 datasets: MNIST lecun2010mnist, Fashion MNIST xiao2017fashion, CIFAR-10 and CIFAR-100 krizhevsky2014cifar, and SVHN. Baselines and experiments were run for the same number of epochs for each model and dataset combination; further details on architectures, hyperparameters, and training procedures are provided in the appendix. The code for the experiments was implemented in JAX jax2018github.

Figure 2 shows results on the training loss for 4 datasets for both random and burn-in affine subspaces with a Conv-2. We obtain similar results for the two other architectures (see Appendix). Figure 2 exhibits several broad and important trends. First, for each training method within a random subspace, there is indeed a sharp phase transition in the success probability in the $\epsilon$ (or equivalently accuracy) by $d$ plane, from $0$ (white regions) to $1$ (black regions). Second, the threshold training dimension $d^*(\epsilon, t, \delta)$ does indeed track the tight phase boundary separating these two regimes. Third, broadly for each method, to achieve a lower loss, or equivalently higher accuracy, the threshold training dimension is higher; thus one needs more training dimensions to achieve better performance. Fourth, when comparing the threshold training dimension across all methods on the same dataset (final column of Figure 2), we see that at high accuracy (low loss $\epsilon$), increasing the amount of burn-in lowers the threshold training dimension. To see this, pick a high accuracy for each dataset, and follow the horizontal line of constant accuracy from left to right to find the threshold training dimension for each method. The method with the lowest threshold training dimension is burn-in with the largest burn-in time $t$; methods with progressively less burn-in have higher threshold training dimensions, with random affine subspaces ($t = 0$) having the highest. Thus the main trend is: for some range of desired accuracies, burning more information into the initialization by training on the training data reduces the number of subsequent training dimensions required to achieve the desired accuracy.

Figure 3 shows the threshold training dimension for each accuracy level for all three models on MNIST, Fashion MNIST and CIFAR-10, not only for training accuracy, but also for test accuracy. The broad trends discussed above hold robustly for both train and test accuracy for all 3 models.

## 3 A theory of the phase transition in training success

Here we aim to develop a mathematical theory that accounts for the major trends observed empirically above, namely: (1) there exists a phase transition in the success probability $P_s(d, \epsilon, t)$, yielding a phase boundary given by a threshold training dimension $d^*(\epsilon, t, \delta)$; (2) at fixed $t$ and $\delta$, this threshold increases as the desired loss $\epsilon$ decreases (or desired accuracy increases), indicating that more dimensions are required to perform better; (3) at fixed $\epsilon$ and $\delta$, this threshold decreases as the burn-in time $t$ increases, indicating that fewer training dimensions are required to achieve a given performance starting from a better burned-in initialization. Our theory will build upon several aspects of high-dimensional geometry which we first review. In particular, we discuss, in turn, the notion of the Gaussian width of a set, then Gordon's escape theorem, and then introduce a notion of local angular dimension of a set about a point. Our final result, stated informally, will be that the threshold training dimension plus the local angular dimension of a desired loss sub-level set about the initialization must equal the total number of parameters $D$. As we will see, this succinct statement conceptually explains the major trends observed empirically. First we start with the definition of Gaussian width:

###### Definition 3.1 (Gaussian Width).

The Gaussian width of a subset $S \subset \mathbb{R}^D$ is given by (see Figure 4):

$$w(S) = \frac{1}{2}\, \mathbb{E} \sup_{\mathbf{x}, \mathbf{y} \in S} \langle \mathbf{g}, \mathbf{x} - \mathbf{y} \rangle, \qquad \mathbf{g} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{D \times D}).$$

As a simple example, let $S$ be a solid ball of radius $r$ and dimension $d \ll D$ embedded in $\mathbb{R}^D$. Then its Gaussian width for large $D$ is well approximated by $w(S) = r\sqrt{d}$.
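This approximation is easy to check by Monte Carlo: for a ball spanning the first d coordinates, the supremum in the width definition is attained at antipodal points of the ball along the projection of g onto those coordinates, so the width reduces to r times the expected norm of a d-dimensional standard Gaussian, which concentrates near sqrt(d). A sketch (the function name is ours):

```python
import numpy as np

def gaussian_width_ball(r, d, D, n_samples=20000, seed=0):
    """Monte Carlo estimate of the Gaussian width of a radius-r d-ball in R^D.

    For fixed g, sup over x, y in the ball of <g, x - y> is 2 * r * ||g_d||,
    where g_d is g restricted to the ball's d coordinates; the width is half
    the expectation of that quantity, i.e. r * E||g_d||.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, D))
    g_d = g[:, :d]  # embed the ball in the first d coordinates
    return np.mean(r * np.linalg.norm(g_d, axis=1))

r, d, D = 2.0, 100, 1000
est = gaussian_width_ball(r, d, D)
assert abs(est - r * np.sqrt(d)) / (r * np.sqrt(d)) < 0.05  # w(S) ≈ r√d
```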

#### Gordon’s escape theorem.

The Gaussian width of a set $S$, at least when $S$ is contained in a unit sphere around the origin, in turn characterizes the probability that a random subspace intersects that set, through Gordon's escape theorem gordon1988milman:

###### Theorem 3.1.

[Escape Theorem] Let $S$ be a closed subset of the unit sphere in $\mathbb{R}^D$. If $k > w(S)^2$, then a $(D-k)$-dimensional subspace $Y$ drawn uniformly from the Grassmannian satisfies gordon1988milman:

$$\mathbb{P}(Y \cap S = \varnothing) \ge 1 - 2.5\, \exp\!\left[ -\big(k/\sqrt{k+1} - w(S)\big)^2 / 18 \right].$$

A clear explanation of the proof can be found in mixon_2014. The bound says that when $k \gg w(S)^2$, the probability of no intersection quickly approaches $1$. Matching lower bounds, which state that the intersection occurs with high probability when $k \le w(S)^2$, have been proven for spherically convex sets amelunxen2014living. Thus this threshold is sharp, up to the subtlety that one is only guaranteed to hit the spherical convex hull of the set (defined on the sphere) with high probability.

When expressed in terms of the subspace dimension $d = D - k$, rather than its co-dimension $k$, these results indicate that a $d$-dimensional subspace will intersect a closed subset $S$ of the unit sphere around the origin with high probability if and only if $d + w(S)^2 \ge D$, with a sharp transition at the threshold $d^* = D - w(S)^2$. This is a generalization of the result that two random subspaces in $\mathbb{R}^D$ of dimension $d$ and $d'$ intersect with high probability if and only if $d + d' \ge D$. Thus we can think of $w(S)^2$ as playing a role analogous to dimension for sets on the centered unit sphere.
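The dimension-counting statement for random subspaces can be checked directly: generically, dim(U ∩ V) = max(0, d + d' − D), which follows from dim(U ∩ V) = d + d' − dim(U + V), with dim(U + V) computable as the rank of the stacked basis matrices. A small NumPy sketch (the helper name is ours):

```python
import numpy as np

def intersection_dim(d1, d2, D, seed=0):
    """Generic dimension of the intersection of random d1- and d2-dim
    subspaces of R^D.

    Uses dim(U ∩ V) = d1 + d2 - dim(U + V), where dim(U + V) is the rank
    of the concatenated basis matrix [U V].
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((D, d1))  # columns span a random d1-dim subspace
    V = rng.standard_normal((D, d2))  # columns span a random d2-dim subspace
    return d1 + d2 - np.linalg.matrix_rank(np.hstack([U, V]))

D = 50
assert intersection_dim(20, 20, D) == 0   # 20 + 20 < 50: generically miss
assert intersection_dim(30, 30, D) == 10  # 30 + 30 - 50 = 10
```

The escape theorem replaces one of the subspace dimensions with the squared Gaussian width w(S)² of a general set on the sphere.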

### 3.1 Intersections of random subspaces with general subsets

To explain the training phase transition, we must now adapt Gordon's escape theorem to a general loss sublevel set $S(\epsilon)$ in $\mathbb{R}^D$, and we must take into account that the initialization $\mathbf{w}_t$ is not at the origin in weight space. To do so, we first define the projection of a set $S$ onto a unit sphere centered at $\mathbf{w}_t$:

$$\mathrm{proj}_{\mathbf{w}_t}(S) \equiv \big\{ (\mathbf{x} - \mathbf{w}_t)/\|\mathbf{x} - \mathbf{w}_t\|_2 : \mathbf{x} \in S \big\}. \tag{3.1}$$

Then we note that any affine subspace of the form in eq. 2.2 centered at $\mathbf{w}_t$ intersects $S$ if and only if it intersects $\mathrm{proj}_{\mathbf{w}_t}(S)$. Thus we can apply Gordon's escape theorem to $\mathrm{proj}_{\mathbf{w}_t}(S)$ to compute the probability of the training subspace in eq. 2.2 intersecting a sublevel set $S$. Since the squared Gaussian width of a set in a unit sphere plays a role analogous to dimension, we define:

###### Definition 3.2 (Local angular dimension).

The local angular dimension of a general set $S \subset \mathbb{R}^D$ about a point $\mathbf{w}_t \in \mathbb{R}^D$ is defined as

$$d_{\mathrm{local}}(S, \mathbf{w}_t) \equiv w^2\big(\mathrm{proj}_{\mathbf{w}_t}(S)\big). \tag{3.2}$$

An escape theorem for general sets and affine subspaces now depends on the initialization $\mathbf{w}_t$ as well, and follows from the above considerations combined with Gordon's original escape theorem:

###### Theorem 3.2.

[Main Theorem] Let $S$ be a closed subset of $\mathbb{R}^D$. If $k > w\big(\mathrm{proj}_{\mathbf{w}_0}(S)\big)^2$, then a $(D-k)$-dimensional affine subspace $Y$ drawn uniformly from the Grassmannian and centered at $\mathbf{w}_0$ satisfies:

$$\mathbb{P}(Y \cap S = \varnothing) \ge 1 - 2.5\, \exp\!\left[ -\big(k/\sqrt{k+1} - w(\mathrm{proj}_{\mathbf{w}_0}(S))\big)^2 / 18 \right].$$

To summarise this result in the context of our application: given an arbitrary loss sub-level set $S(\epsilon)$, a training subspace of training dimension $d$ starting from an initialization $\mathbf{w}_t$ will hit the (convex hull of the) loss sublevel set with high probability when $d + d_{\mathrm{local}}(S(\epsilon), \mathbf{w}_t) > D$, and will miss it (i.e. have empty intersection) with high probability when $d + d_{\mathrm{local}}(S(\epsilon), \mathbf{w}_t) < D$. This analysis thus establishes the existence of a phase transition in the success probability $P_s(d, \epsilon, t)$ in eq. 2.3, and moreover establishes the threshold training dimension of definition 2.1 for small values of $\delta$:

$$d^*(S(\epsilon), \mathbf{w}_t) = D - d_{\mathrm{local}}(S(\epsilon), \mathbf{w}_t). \tag{3.3}$$

Our theory provides several important insights into the nature of the threshold training dimension. First, small threshold training dimensions can only arise if the local angular dimension of the loss sublevel set $S(\epsilon)$ about the initialization is close to the ambient dimension $D$. Second, as $\epsilon$ increases, $S(\epsilon)$ becomes larger, with a larger $d_{\mathrm{local}}(S(\epsilon), \mathbf{w}_t)$, and consequently a smaller threshold training dimension. Similarly, if $\mathbf{w}_t$ is closer to $S(\epsilon)$, then $d_{\mathrm{local}}(S(\epsilon), \mathbf{w}_t)$ will be larger, and the threshold training dimension will also be lower (see fig. 4). This observation accounts for the observed decrease in threshold training dimension with increased burn-in time $t$. Presumably, burning information into the initialization for a longer time brings the initialization closer to the sublevel set $S(\epsilon)$, making it easier to hit with a random subspace of lower dimension. This effect is akin to staring out into the night sky in a single random direction and asking with what probability we will see the moon: this probability increases the closer we are to the moon.

To illustrate our theory, we work out the paradigmatic example of a quadratic loss function $\mathcal{L}(\mathbf{w}) = \frac{1}{2}\mathbf{w}^\top \mathbf{H} \mathbf{w}$, where $\mathbf{H} \in \mathbb{R}^{D \times D}$ is a symmetric, positive definite Hessian matrix. A sublevel set $S(\epsilon)$ of the quadratic well is an ellipsoidal body with principal axes along the eigenvectors $\hat{\mathbf{e}}_i$ of $\mathbf{H}$. The radius $r_i$ along principal axis $\hat{\mathbf{e}}_i$ obeys $\frac{1}{2}\lambda_i r_i^2 = \epsilon$, where $\lambda_i$ is the corresponding eigenvalue. Thus $r_i = \sqrt{2\epsilon/\lambda_i}$, and so a large (small) Hessian eigenvalue leads to a narrow (wide) radius along the corresponding principal axis of the ellipsoid. The overall squared Gaussian width of the sublevel set obeys $w^2(S(\epsilon)) \asymp \sum_i r_i^2 = \sum_i 2\epsilon/\lambda_i$, where $\asymp$ denotes bounded above and below by this expression times positive constants vershynin2018high.

We next consider training within a random subspace of dimension $d$ starting from some initialization $\mathbf{w}_0$. To compute the probability that the subspace hits the sublevel set $S(\epsilon)$, as illustrated in Fig. 4, we must project this ellipsoidal sublevel set onto the surface of the unit sphere centered at $\mathbf{w}_0$. The Gaussian width of this projection will depend on the distance $R \equiv \|\mathbf{w}_0\|$ from the initialization to the global minimum at $\mathbf{w} = \mathbf{0}$ (i.e. it should increase with decreasing $R$). We can develop a crude approximation to this width as follows. Assuming $R \gg r_i$, the direction $\hat{\mathbf{e}}_i$ will be approximately orthogonal to $\mathbf{w}_0$, so that $\langle \hat{\mathbf{e}}_i, \mathbf{w}_0 \rangle \approx 0$. The distance between the tip of the ellipsoid at radius $r_i$ along principal axis $\hat{\mathbf{e}}_i$ and the initialization $\mathbf{w}_0$ is therefore $\sqrt{R^2 + r_i^2}$. The ellipsoid's radius $r_i$ then gets scaled down to approximately $r_i / \sqrt{R^2 + r_i^2}$ when projected onto the surface of the unit sphere. A subtlety in this derivation is that the point actually projected onto the sphere is the point where a line through the center of the sphere lies tangent to the ellipsoid, rather than the point of fullest extent. As a result, $r_i / \sqrt{R^2 + r_i^2}$ provides a lower bound to the projected extent on the circle. This is formalized in the appendix, along with an explanation of why this bound becomes looser with decreasing $R$. Taken together, a lower bound on the local angular dimension of $S(\epsilon)$ about $\mathbf{w}_0$ is:

$$d_{\mathrm{local}}(\epsilon, R) = w^2\big(\mathrm{proj}_{\mathbf{w}_0}(S(\epsilon))\big) \gtrsim \sum_i \frac{r_i^2}{R^2 + r_i^2}, \tag{3.4}$$

where again $r_i = \sqrt{2\epsilon/\lambda_i}$. In Fig. 5, we plot the corresponding upper bound on the threshold training dimension, i.e. $d^* = D - d_{\mathrm{local}}(\epsilon, R)$, alongside simulated results for two different Hessian spectra.
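Given a Hessian spectrum, this upper bound on the threshold training dimension is straightforward to evaluate numerically; a sketch (the function name is ours) illustrating the two main trends, namely that the bound rises as the desired loss ε falls and as the distance R to the minimum grows:

```python
import numpy as np

def threshold_dim_upper_bound(eigs, eps, R):
    """Upper bound d* <= D - sum_i r_i^2 / (R^2 + r_i^2) for a quadratic well.

    eigs: positive Hessian eigenvalues lambda_i; the ellipsoid radii obey
          r_i = sqrt(2 * eps / lambda_i)
    R:    distance from the initialization to the global minimum
    """
    r2 = 2.0 * eps / np.asarray(eigs)       # squared ellipsoid radii r_i^2
    d_local_lb = np.sum(r2 / (R**2 + r2))   # lower bound on d_local (eq. 3.4)
    return len(eigs) - d_local_lb

# Toy flat spectrum: lowering the desired loss eps, or starting farther away
# (larger R), shrinks d_local and hence raises the threshold dimension.
eigs = np.ones(1000)
assert threshold_dim_upper_bound(eigs, eps=0.5, R=1.0) < threshold_dim_upper_bound(eigs, eps=0.1, R=1.0)
assert threshold_dim_upper_bound(eigs, eps=0.5, R=1.0) < threshold_dim_upper_bound(eigs, eps=0.5, R=10.0)
```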

## 4 Comparison to lottery tickets and lottery subspaces

Training within random subspaces is not meant to be a practical method; rather, it is a scientific tool to explore the properties of the loss landscape. However, random affine subspaces can serve as a null model for other more sophisticated methods, like lottery tickets, which prune parameters at initialization according to their magnitude at the end of training. Indeed, any such pruning method should ideally outperform random affine subspaces in terms of higher reachable accuracies for a given training dimension. Burn-in affine subspaces use training information in a minimal way, specifically to set the offset of the subspace; such a method can thus serve as a null model for any method that prunes networks after initialization. For scientific purposes, it is also interesting to consider optimized, rather than random, training subspaces that reach a given accuracy with far fewer dimensions, to serve as potential performance targets for future methods. To this end we introduce one optimized subspace, namely the lottery subspace, and we compare random affine subspaces, burn-in affine subspaces, lottery tickets, and lottery subspaces to explore how high an accuracy we can achieve with a given low training dimension.

#### Lottery subspaces.

Analogous to lottery tickets, we first train the network in the full space starting from an initialization $\mathbf{w}_0$. We then form the matrix $\mathbf{U}_d \in \mathbb{R}^{D \times d}$ whose columns are the top $d$ principal components of the entire training trajectory (see Appendix for details). We then train again from the same initialization within the lottery subspace

$$\mathbf{w}(\boldsymbol{\theta}) = \mathbf{U}_d \boldsymbol{\theta} + \mathbf{w}_0. \tag{4.1}$$

Since the subspace is optimized to match the top $d$ dimensions of the training trajectory, we expect lottery subspaces to achieve much higher accuracies for a given training dimension than random, or potentially even burn-in, affine subspaces. This expectation is indeed borne out in Fig. 3 (purple lines above all other lines). Intriguingly, very few lottery subspace training dimensions (the precise number depending on the dataset and architecture) are required to attain full accuracy, and adding subdominant principal components of the training trajectory can actually decrease performance (the purple curves are non-monotonic in training dimension $d$). In this sense, training within different subspaces sets expectations for what accuracies might be reasonable for practical network pruning methods to achieve as a function of training dimension, with random affine subspaces (blue curves in Fig. 3) constituting a null lower bound and lottery subspaces constituting a strong potential upper bound (purple curves in Fig. 3).
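The lottery subspace construction amounts to an SVD of the directions traveled during full-space training; a minimal NumPy sketch, assuming whole weight iterates are stored (the paper's exact trajectory bookkeeping is described in its Appendix, and the names here are ours):

```python
import numpy as np

def lottery_subspace(trajectory, w0, d):
    """Top-d principal directions of a full-space training trajectory.

    trajectory: (T, D) array of weight iterates from a full training run
    Returns U_d (D x d, columns ordered by descending singular value) and
    the map theta -> U_d @ theta + w0 for retraining within the subspace.
    """
    deltas = trajectory - w0                  # directions traveled from init
    # Rows of Vt are the principal directions of the trajectory in R^D.
    _, _, Vt = np.linalg.svd(deltas, full_matrices=False)
    U_d = Vt[:d].T                            # (D, d)
    return U_d, lambda theta: U_d @ theta + w0

rng = np.random.default_rng(0)
D, T, d = 200, 50, 5
w0 = rng.standard_normal(D)
traj = w0 + rng.standard_normal((T, D))       # stand-in for real iterates
U_d, w_of_theta = lottery_subspace(traj, w0, d)
assert U_d.shape == (D, d)
assert np.allclose(U_d.T @ U_d, np.eye(d))    # columns are orthonormal
assert np.allclose(w_of_theta(np.zeros(d)), w0)
```

Increasing d appends directions in order of decreasing singular value, which is why the subspaces at different d are nested.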

#### Comparison of various subspace training methods to lottery tickets.

Figure 6 presents empirical results comparing random affine subspaces, burn-in affine subspaces, lottery subspaces, and lottery tickets plotted against model compression ratio (defined as the number of parameters in the full model divided by the number of parameters, or training dimension, in the restricted model). The lottery tickets were constructed by training for 2 epochs, performing magnitude pruning of weights and biases, rewinding to initialization, and then training for the same number of epochs as the other methods. Note that lottery tickets are created by pruning the full model (increasing compression ratio), in contrast to all other methods, which are built up from a single dimension (decreasing compression ratio).

We observe that lottery subspaces significantly outperform random subspaces and lottery tickets at low training dimensions (high compression ratios). The experiments show a drop in accuracy as the dimension increases further, but recall that dimensions were added in order of decreasing singular value. The lottery subspace therefore still contains the best optima achieved at lower dimension; the drop is simply a failure of the optimization algorithm to reach those optima in the higher-dimensional space. We explore the spectra of these subspaces in more detail in the Appendix.

The comparison to lottery tickets at low compression ratios is limited by the fact that projecting to higher-dimensional subspaces is computationally expensive, which capped the highest training dimension we could use. In the regions where the experiments overlap, the lottery tickets do not outperform random affine subspaces, indicating that they gain no advantage from the training information they utilize. A notable exception is Conv-2 on CIFAR-10, on which lottery tickets do outperform random affine subspaces. Finally, we note that lottery tickets do not perform well at high compression ratios due to the phenomenon of layer collapse, where an entire layer gets pruned.

## 5 Conclusion

In this paper, we gained fundamental theoretical insight into when and why training within a random subspace of small training dimension $d$ can achieve a given low loss $\epsilon$: this can occur only when the local angular dimension of the loss sublevel set $S(\epsilon)$ about the initialization is high, i.e. close to the ambient dimension $D$. We furthermore proposed two ways to optimize the selection of a subspace using information from training: burn-in affine subspaces, which move the initialization closer to the sublevel set, and lottery subspaces, constructed from the principal components of a full training run. Our theory also explains geometrically why longer burn-in lowers the number of degrees of freedom required to train. Overall, these theoretical insights and methods provide a much-needed and powerful high-dimensional geometric framework with which to think about and assess the efficacy of a wide range of network pruning methods at or beyond initialization, including lottery tickets.

## Acknowledgements

B.W.L. was supported by the Department of Energy Computational Science Graduate Fellowship program (DE-FG02-97ER25308). S.G. thanks the Simons Foundation, NTT Research and an NSF Career award for funding while at Stanford.

## Appendix A Experiment supplement

The core code for the experiments is available on Github. The three top-level scripts are burn_in_subspace.py, lottery_subspace.py, and lottery_ticket.py. Random affine experiments were run by setting the parameter init_iters to 0 in the burn-in subspace code. The primary automatic differentiation framework used for the experiments was JAX jax2018github. The code was developed and tested using JAX v0.1.74, JAXlib v0.1.52, and Flax v0.2.0 and run on an internal cluster using NVIDIA TITAN Xp GPUs.

Figures 7 and 8 show the corresponding empirical probability plots for the two other models considered in this paper: Conv-3 and ResNet20. These plots are constructed in the same manner as fig. 2, except that a larger value of $\delta$ was used since fewer runs were conducted ($\delta$ was always chosen such that all but one of the runs had to successfully hit a training accuracy super-level set). The data in these plots comes from the same runs as figs. 3 and 6.

### a.1 Spectra of lottery subspaces

In our experiments, we formed lottery subspaces by storing the directions traveled during a full training trajectory and then finding the singular value decomposition of this matrix. As we increased the subspace dimension, directions were added in order of descending singular values.
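As a concrete illustration, this construction can be sketched in a few lines of NumPy (a simplified stand-in for the paper's JAX implementation; the function name and storage format here are ours):

```python
import numpy as np

def lottery_subspace_basis(trajectory_directions, subspace_dim):
    """Orthonormal basis spanning the top principal directions of a training run.

    trajectory_directions: (num_steps, num_params) array whose rows are the
    parameter-space directions traveled during full training (hypothetical format).
    """
    # SVD of the stacked directions; the right singular vectors are the
    # principal components of the trajectory.
    _, s, vt = np.linalg.svd(trajectory_directions, full_matrices=False)
    # Directions are added in order of descending singular value.
    return vt[:subspace_dim], s[:subspace_dim]
```

The returned singular values are what is plotted in the spectra below.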

Figure 9 and the left panel of fig. 10 show the associated spectra for the results presented in figs. 6 and 3. The spectra are aligned with the train and test accuracy plots such that the value directly below a point on the curve corresponds to the singular value of the last dimension added to the subspace. There were 10 runs for Conv-2, 5 for Conv-3, and 3 for ResNet20. Only the first 5 out of 10 runs are displayed for the experiments with Conv-2. No significant deviations were observed in the remaining runs.

From these plots, we observe that the spectra for a given dataset are generally consistent across architectures. In addition, the decrease in accuracy after a certain dimension (particularly for CIFAR-10) corresponds to the singular values of the added dimensions falling off towards 0.

The right panel of fig. 10 shows a tangential observation: lottery subspaces for CIFAR-10 display a sharp transition in accuracy at a dimension equal to the number of classes. This provides additional evidence for the conjecture explored by gur2018gradient, fort2019emergent, and papyan2020traces that the sharpest directions of the Hessian and the most prominent logit gradients are each associated with a class. Very little learning happens in these directions, but during optimization the parameters bounce up and down along them, so that these directions are prominent in the SVD of the gradients. This predicts exactly the behavior observed.

### a.2 Accuracy of Burn-in Initialization

Figure 11 shows a subset of the random affine and burn-in affine subspace experiments with a value plotted at dimension 0 to indicate the accuracy of the random or burn-in initialization. This gives context for which sublevel set the burn-in methods start from, enabling us to evaluate whether they are indeed reducing the threshold training dimension of sublevel sets with higher accuracy. In most cases, as we increase the dimension, the burn-in experiments increase in accuracy above their initialization and at a faster pace than the random affine subspaces. A notable exception is Conv-3 on MNIST, for which the burn-in methods appear to provide no advantage.

### a.3 Hyperparameters

Optimization restricted to an affine subspace was done using Adam kingma2014adam. We explored two values for the learning rate; one worked substantially better for this restricted optimization and was used in all experiments. The baseline runs used the better result of the two learning rates. ResNet20v1 was run with on-the-fly batch normalization ioffe2015batch, meaning we simply use the mean and variance of the current batch rather than maintaining a running average.

Table 1 shows the number of epochs used for each dataset and architecture combination across all experiments. Three epochs were used by default, and the count was increased if the full model was not close to convergence.
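To make the restricted-optimization setup concrete, the following is a minimal NumPy sketch (our own toy example, not the paper's JAX code): plain gradient descent on a quadratic loss, restricted to a random $d$-dimensional affine subspace through the initialization. Only the $d$ subspace coordinates $\theta$ are trained; gradients are pulled back through the fixed basis by the chain rule.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 50, 10                      # ambient and training-subspace dimensions

# Toy quadratic loss L(x) = 0.5 * x^T H x with a random PSD Hessian.
A = rng.normal(size=(D, D))
H = A @ A.T / D

x0 = rng.normal(size=D)                          # random initialization
M = np.linalg.qr(rng.normal(size=(D, d)))[0]     # orthonormal basis, D x d

theta = np.zeros(d)                # optimize only the d subspace coordinates
lr = 0.1
for _ in range(500):
    x = x0 + M @ theta
    grad_x = H @ x                 # dL/dx
    theta -= lr * (M.T @ grad_x)   # chain rule: dL/dtheta = M^T dL/dx

x = x0 + M @ theta
loss0 = 0.5 * x0 @ H @ x0          # loss at initialization
loss = 0.5 * x @ H @ x             # loss after restricted training
```

In the actual experiments, Adam replaces the plain gradient step, but the projection of gradients into the subspace is the same.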

## Appendix B Theory supplement

In this section, we provide additional details for our study of the threshold training dimension of the sublevel sets of quadratic wells. We also derive the threshold training dimension of affine subspaces to provide further intuition.

### b.1 Proof: Gaussian width of sublevel sets of the quadratic well

In our derivation of eq. 3.4, we employ the result that the Gaussian width squared of quadratic-well sublevel sets is bounded above and below by $2\epsilon\operatorname{Tr}(\mathbf{H}^{-1})$ times positive constants. This follows from well-established bounds on the Gaussian width of an ellipsoid, which we now prove.

In our proof, we will use an equivalent expression for the Gaussian width of a set $S$:

$$w(S) := \frac{1}{2}\,\mathbb{E}\sup_{\mathbf{x},\mathbf{y}\in S}\langle \mathbf{g}, \mathbf{x}-\mathbf{y}\rangle = \mathbb{E}\sup_{\mathbf{x}\in S}\langle \mathbf{g}, \mathbf{x}\rangle, \qquad \mathbf{g}\sim\mathcal{N}(\mathbf{0}, \mathbf{I}_{D\times D}).$$
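This definition can be sanity-checked numerically on a set where the supremum has a closed form. For the unit Euclidean ball, $\sup_{\|\mathbf{x}\|\le 1}\langle \mathbf{g},\mathbf{x}\rangle = \|\mathbf{g}\|$, so $w \approx \mathbb{E}\|\mathbf{g}\| \approx \sqrt{D}$ (a quick Monte Carlo sketch in NumPy, our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 100
g = rng.normal(size=(20000, D))          # 20000 Gaussian samples in R^D
# For the unit ball, sup over ||x|| <= 1 of <g, x> equals ||g||.
w_ball = np.linalg.norm(g, axis=1).mean()
# E||g|| is very close to sqrt(D) for large D.
```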
###### Lemma B.1 (Gaussian width of ellipsoid).

Let $E$ be an ellipsoid in $\mathbb{R}^D$ defined by a vector $\mathbf{r}$ with strictly positive entries as:

$$E := \Big\{\mathbf{x}\in\mathbb{R}^D \;\Big|\; \sum_{j=1}^D \frac{x_j^2}{r_j^2} \le 1\Big\}$$

Then $w(E)^2$, the Gaussian width squared of the ellipsoid, satisfies the following bounds:

$$\frac{2}{\pi}\sum_{j=1}^D r_j^2 \;\le\; w(E)^2 \;\le\; \sum_{j=1}^D r_j^2$$
###### Proof.

Let $\mathbf{g}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$. Then we upper-bound $w(E)$ by the following steps:

$$\begin{aligned}
w(E) &= \mathbb{E}_{\mathbf{g}}\Big[\sup_{\mathbf{x}\in E}\sum_{i=1}^D g_i x_i\Big] \\
&= \mathbb{E}_{\mathbf{g}}\Big[\sup_{\mathbf{x}\in E}\sum_{i=1}^D \frac{x_i}{r_i}\, g_i r_i\Big] && \tfrac{r_i}{r_i}=1 \\
&\le \mathbb{E}_{\mathbf{g}}\Bigg[\sup_{\mathbf{x}\in E}\Big(\sum_{i=1}^D \frac{x_i^2}{r_i^2}\Big)^{1/2}\Big(\sum_{i=1}^D g_i^2 r_i^2\Big)^{1/2}\Bigg] && \text{Cauchy--Schwarz inequality} \\
&\le \mathbb{E}_{\mathbf{g}}\Bigg[\Big(\sum_{i=1}^D g_i^2 r_i^2\Big)^{1/2}\Bigg] && \text{definition of } E \\
&\le \sqrt{\mathbb{E}_{\mathbf{g}}\Big[\sum_{i=1}^D g_i^2 r_i^2\Big]} && \text{Jensen's inequality} \\
&= \Big(\sum_{i=1}^D r_i^2\Big)^{1/2} && \mathbb{E}[g_i^2]=1
\end{aligned}$$

giving the upper bound in the lemma. For the lower bound, we will begin with a general lower bound for Gaussian widths using two facts. The first is that if $\epsilon_i$ are i.i.d. Rademacher random variables and $g_i$ are i.i.d. standard Gaussians, then $\epsilon_i |g_i| \sim \mathcal{N}(0,1)$. Second, we have:

$$\mathbb{E}[|g_i|] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |y|\, e^{-y^2/2}\, dy = \frac{2}{\sqrt{2\pi}}\int_0^{\infty} y\, e^{-y^2/2}\, dy = \sqrt{\frac{2}{\pi}}$$
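Both facts are easy to verify by simulation (a NumPy sketch, our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=1_000_000)                  # i.i.d. standard Gaussians
eps = rng.choice([-1.0, 1.0], size=g.shape)     # i.i.d. Rademacher signs

mean_abs = np.abs(g).mean()    # approaches E|g| = sqrt(2/pi) ~ 0.7979
flipped = eps * np.abs(g)      # eps_i * |g_i| is again standard normal
```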

Then for the Gaussian width of a general set $S$:

$$\begin{aligned}
w(S) &= \mathbb{E}\Big[\sup_{\mathbf{x}\in S}\sum_{i=1}^D g_i x_i\Big] \\
&= \mathbb{E}_{\boldsymbol{\epsilon}}\Big[\mathbb{E}_{\mathbf{g}}\Big[\sup_{\mathbf{x}\in S}\sum_{i=1}^D \epsilon_i |g_i|\, x_i \,\Big|\, \epsilon_{1:D}\Big]\Big] && \text{using } \epsilon_i|g_i|\sim\mathcal{N}(0,1) \\
&\ge \mathbb{E}_{\boldsymbol{\epsilon}}\Big[\sup_{\mathbf{x}\in S}\sum_{i=1}^D \epsilon_i\, \mathbb{E}[|g_i|]\, x_i\Big] && \text{Jensen's inequality} \\
&= \sqrt{\frac{2}{\pi}}\,\mathbb{E}\Big[\sup_{\mathbf{x}\in S}\sum_{i=1}^D \epsilon_i x_i\Big]
\end{aligned}$$

All that remains for our lower bound is to show that $\mathbb{E}\big[\sup_{\mathbf{x}\in E}\sum_{i=1}^D \epsilon_i x_i\big] = \big(\sum_{i=1}^D r_i^2\big)^{1/2}$ for the ellipsoid $E$. We begin by showing that the right-hand side is an upper bound:

$$\begin{aligned}
\mathbb{E}\Big[\sup_{\mathbf{x}\in E}\sum_{i=1}^D \epsilon_i x_i\Big] &= \sup_{\mathbf{x}\in E}\sum_{i=1}^D |x_i| && E \text{ is symmetric} \\
&= \sup_{\mathbf{x}\in E}\sum_{i=1}^D \Big|\frac{x_i}{r_i}\, r_i\Big| && \tfrac{r_i}{r_i}=1 \\
&\le \sup_{\mathbf{x}\in E}\Big(\sum_{i=1}^D \frac{x_i^2}{r_i^2}\Big)^{1/2}\Big(\sum_{i=1}^D r_i^2\Big)^{1/2} && \text{Cauchy--Schwarz inequality} \\
&= \Big(\sum_{i=1}^D r_i^2\Big)^{1/2} && \text{definition of } E
\end{aligned}$$

In the first line, we mean that $E$ is symmetric about the origin, so that we can take $\epsilon_i x_i = |x_i|$ for all $i$ without loss of generality. Finally, consider $\mathbf{x}$ such that $x_i = r_i^2/\big(\sum_{j=1}^D r_j^2\big)^{1/2}$. For this choice we have $\sum_{i=1}^D x_i^2/r_i^2 = 1$, so $\mathbf{x}\in E$, and:

$$\sum_{i=1}^D |x_i| = \sum_{i=1}^D \frac{r_i^2}{\big(\sum_{j=1}^D r_j^2\big)^{1/2}} = \Big(\sum_{i=1}^D r_i^2\Big)^{1/2}$$

showing that equality is attained in the bound. Putting these steps together yields the overall desired lower bound:

$$w(E) \ge \sqrt{\frac{2}{\pi}}\cdot\mathbb{E}\Big[\sup_{\mathbf{x}\in E}\sum_{i=1}^D \epsilon_i x_i\Big] = \sqrt{\frac{2}{\pi}}\cdot\Big(\sum_{i=1}^D r_i^2\Big)^{1/2}$$
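The two bounds can also be verified numerically: for an ellipsoid, the supremum of $\langle\mathbf{g},\mathbf{x}\rangle$ has the closed form $\big(\sum_i g_i^2 r_i^2\big)^{1/2}$, so the width is easy to estimate by Monte Carlo (a NumPy sketch, our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 50
r = rng.uniform(0.5, 2.0, size=D)     # ellipsoid radii r_i

g = rng.normal(size=(20000, D))
# sup over the ellipsoid of <g, x> has the closed form ||(g_i * r_i)_i||.
w = np.linalg.norm(g * r, axis=1).mean()

lower = np.sqrt(2 / np.pi) * np.sqrt((r**2).sum())   # lemma's lower bound on w(E)
upper = np.sqrt((r**2).sum())                        # lemma's upper bound on w(E)
```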

With this bound in hand, we can immediately obtain the following corollary for a quadratic well defined by Hessian $\mathbf{H}$ with eigenvalues $\lambda_j$. The Gaussian width is invariant under translation, so we can shift the well to the origin. Then note that the sublevel set $S(\epsilon) = \{\mathbf{x} \mid \tfrac{1}{2}\mathbf{x}^\top\mathbf{H}\mathbf{x} \le \epsilon\}$ is an ellipsoid with $r_j^2 = 2\epsilon/\lambda_j$, and thus $\sum_{j=1}^D r_j^2 = 2\epsilon\operatorname{Tr}(\mathbf{H}^{-1})$.

###### Corollary B.1 (Gaussian width of quadratic sublevel sets).

Consider a quadratic well defined by Hessian $\mathbf{H}$. Then the Gaussian width squared of the associated sublevel sets obeys the following bound:

$$\frac{2}{\pi}\cdot 2\epsilon\operatorname{Tr}(\mathbf{H}^{-1}) \;\le\; w^2(S(\epsilon)) \;\le\; 2\epsilon\operatorname{Tr}(\mathbf{H}^{-1})$$
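The identity behind the corollary, $\sum_j r_j^2 = 2\epsilon\operatorname{Tr}(\mathbf{H}^{-1})$, can be checked directly for a random positive-definite Hessian (a NumPy sketch, our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
D, eps = 30, 0.1
A = rng.normal(size=(D, D))
H = A @ A.T + np.eye(D)            # positive-definite Hessian
lam = np.linalg.eigvalsh(H)        # eigenvalues lambda_j > 0

# Sublevel set {x : 0.5 x^T H x <= eps} is an ellipsoid with r_j^2 = 2 eps / lambda_j.
r2 = 2 * eps / lam
sum_r2 = r2.sum()
trace_form = 2 * eps * np.trace(np.linalg.inv(H))
```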

### b.2 Details on threshold training dimension upper bound

In section 3.2, we consider the projection of ellipsoidal sublevel sets onto the surface of a unit sphere centered at the initialization. The Gaussian width of this projection will depend on the distance $R$ from the initialization to the global minimum (i.e. it should increase with decreasing $R$). We used a crude approximation to this width as follows. Assuming $R$ is large relative to the radii $r_i$, the principal axes of the ellipsoid will be approximately orthogonal to the direction from the initialization to the minimum, so the distance between the tip of the ellipsoid at radius $r_i$ along principal axis $i$ and the initialization is approximately $\sqrt{R^2 + r_i^2}$. The ellipse's radius then gets scaled down to approximately $r_i/\sqrt{R^2 + r_i^2}$ when projected onto the surface of the unit sphere.

We now explain why this projected size is always a lower bound by illustrating the setup in two dimensions in fig. 12. As shown, the linear extent of the projection always results from a line that is tangent to the ellipse. For an ellipse and a line in a two-dimensional space (we set the origin at the center of the unit circle), a line tangent to the ellipse must satisfy a tangency condition, which determines the linear extent of the projection on the unit circle. In the regime where the distance dominates the radii, this is exactly Eq. 3.4. The tangency will always make the linear projections larger, and therefore Eq. 3.4 will be a lower bound on the projected Gaussian width. Furthermore, this bound becomes looser with decreasing distance. We then obtain a corresponding upper bound on the threshold training dimension.

### b.3 Threshold training dimension of affine subspaces

In Section 3.2, we considered the threshold training dimension of the sublevel sets of a quadratic well and showed that it depends on the distance from the initialization to the set, formalized in eq. 3.4. As a point of contrast, we include a derivation of the threshold training dimension of a random affine subspace in ambient dimension $D$ and demonstrate that this dimension does not depend on the distance to the subspace. Intuitively, this is because any dimension in the subspace is of infinite or zero extent, unlike the quadratic sublevel sets, whose dimensions have finite extent.

Let us consider a $D$-dimensional space in which we have a randomly chosen $d$-dimensional affine subspace $A$ defined by a vector offset $\mathbf{x}_0$ and a set of orthonormal basis vectors that we encapsulate into a matrix $\mathbf{M}$. Let us consider another random $n$-dimensional affine subspace $B$. Our task is to find a point $\mathbf{x}^* \in A$ that has the minimum distance to the subspace $B$, i.e.:

$$\mathbf{x}^* = \operatorname*{argmin}_{\mathbf{x}\in A}\,\Big\|\mathbf{x} - \operatorname*{argmin}_{\mathbf{x}'\in B}\|\mathbf{x}-\mathbf{x}'\|_2\Big\|_2$$

In words, we are looking for a point in the $d$-dimensional subspace $A$ that is as close as possible to its closest point in the $n$-dimensional subspace $B$. Furthermore, points within the subspace $A$ can be parametrized by a $d$-dimensional vector $\boldsymbol{\theta}$ as $\mathbf{x}(\boldsymbol{\theta}) = \boldsymbol{\theta}\mathbf{M} + \mathbf{x}_0$; for all choices of $\boldsymbol{\theta}$, the associated vector $\mathbf{x}(\boldsymbol{\theta})$ is in the subspace $A$.

Without loss of generality, let us consider the case where the basis vectors of the subspace $B$ are aligned with $n$ dimensions of the coordinate system (we can rotate our coordinate system such that this is true). Call the remaining $s = D - n$ axes the short directions of the subspace $B$. The distance from a point $\mathbf{x}$ to the subspace $B$ now depends only on its coordinates $x_1,\dots,x_s$ along these short directions. Under our assumption of the alignment of subspace $B$ we then have:

$$l^2(\mathbf{x}, B) := \min_{\mathbf{x}'\in B}\|\mathbf{x}-\mathbf{x}'\|_2^2 = \sum_{i=1}^s x_i^2$$

The only coordinates influencing the distance are the first $s$ values, and thus let us consider a restriction of the original space to only those coordinates without loss of generality. Now $\mathbf{x}_0$ and the rows of $\mathbf{M}$ have $s$ components each, and the distance between a point within the subspace $A$ parameterized by the vector $\boldsymbol{\theta}$ and the subspace $B$ is given by $l^2(\mathbf{x}(\boldsymbol{\theta}), B) = \|\boldsymbol{\theta}\mathbf{M} + \mathbf{x}_0\|_2^2$.

The distance attains its minimum for

$$\partial_{\boldsymbol{\theta}}\, l^2(\mathbf{x}(\boldsymbol{\theta}), B) = 2\,(\boldsymbol{\theta}\mathbf{M} + \mathbf{x}_0)\,\mathbf{M}^\top = \mathbf{0}$$

yielding the optimality condition $\boldsymbol{\theta}^*\mathbf{M}\mathbf{M}^\top = -\mathbf{x}_0\mathbf{M}^\top$. There are 3 cases based on the relationship between $d$ and $s$.

1. The overdetermined case, $d > s$. Here the optimal $\boldsymbol{\theta}^*$ belongs to a $(d-s)$-dimensional family of solutions that attain distance $0$ to the plane $B$. In this case the affine subspaces $A$ and $B$ intersect and share a $(d-s)$-dimensional intersection.

2. The unique solution case, $d = s$. Here the solution is a unique $\boldsymbol{\theta}^* = -\mathbf{x}_0\mathbf{M}^{-1}$. After plugging this back into the distance equation, we obtain:

$$l^2(\mathbf{x}(\boldsymbol{\theta}^*), B) = \|-\mathbf{x}_0\mathbf{M}^{-1}\mathbf{M} + \mathbf{x}_0\|^2 = \|-\mathbf{x}_0 + \mathbf{x}_0\|^2 = 0.$$

The matrix $\mathbf{M}$ is square in this case and cancels with its inverse $\mathbf{M}^{-1}$.

3. The underdetermined case, $d < s$. Here there is generically no intersection between the subspaces. The inverse of $\mathbf{M}$ is replaced by the Moore–Penrose pseudoinverse $\mathbf{M}^+$, and the closest distance is:

$$l^2(\mathbf{x}(\boldsymbol{\theta}^*), B) = \|-\mathbf{x}_0\mathbf{M}^+\mathbf{M} + \mathbf{x}_0\|^2$$

Before our restriction from $D$ to $s$ dimensions, the matrix $\mathbf{M}$ consisted of $d$ mutually orthogonal $D$-dimensional vectors of unit length each. We will consider these vectors to be component-wise random, each component with variance $1/D$ to satisfy this condition on average. After restricting our space to $s$ dimensions, $\mathbf{M}$'s vectors are reduced to $s$ components each, keeping their variance $1/D$. They are still mutually orthogonal in expectation, but their lengths are reduced to $\sqrt{s/D}$. The transpose of the pseudoinverse $\mathbf{M}^+$ consists of vectors in the same directions, with their lengths scaled up to $\sqrt{D/s}$. That means that, in expectation, $\mathbf{M}^+\mathbf{M}$ is a diagonal matrix with $d$ diagonal components set to $1$ and the remaining $s-d$ set to $0$, while the identity matrix contains ones on all $s$ diagonal entries. The projection $-\mathbf{x}_0\mathbf{M}^+\mathbf{M} + \mathbf{x}_0$ therefore retains, in expectation, a fraction $(s-d)/s$ of the squared norm of the restricted $\mathbf{x}_0$, whose own expected squared norm is proportional to $s/D$. The expected distance between the $d$-dimensional subspace $A$ and the $n$-dimensional subspace $B$ is therefore:

$$\mathbb{E}[d(A,B)] \;\propto\; \begin{cases} \sqrt{\dfrac{D-n-d}{D}} & n+d < D \\[6pt] 0 & n+d \ge D \end{cases}$$
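The intersection condition can be confirmed by direct simulation: the minimum distance between two random affine subspaces is a least-squares residual, which is generically positive when $n + d < D$ and zero when $n + d \ge D$ (a NumPy sketch; the function name is ours):

```python
import numpy as np

def subspace_distance(D, d, n, rng):
    """Minimum distance between random d- and n-dimensional affine subspaces of R^D."""
    U = np.linalg.qr(rng.normal(size=(D, d)))[0]   # orthonormal basis of A
    V = np.linalg.qr(rng.normal(size=(D, n)))[0]   # orthonormal basis of B
    xa, xb = rng.normal(size=D), rng.normal(size=D)
    # min over (alpha, beta) of ||xa + U alpha - xb - V beta|| is a
    # least-squares residual for the stacked system [U, -V].
    C = np.hstack([U, -V])
    coeffs = np.linalg.lstsq(C, xb - xa, rcond=None)[0]
    return np.linalg.norm(C @ coeffs - (xb - xa))

rng = np.random.default_rng(0)
d_far = subspace_distance(100, 20, 30, rng)   # n + d = 50 < 100: no intersection
d_hit = subspace_distance(100, 60, 50, rng)   # n + d = 110 >= 100: generic intersection
```

In the first call $n + d < D$ and the distance is positive; in the second $n + d \ge D$ and it vanishes up to numerical precision.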

To summarize, for a space of dimension $D$, two affine subspaces generically intersect provided that their dimensions $n$ and $d$ add up to at least the ambient (full) dimension of the space. The exact condition for intersection is $n + d \ge D$, and the threshold training dimension for hitting an $n$-dimensional affine subspace is therefore $D - n$. This result provides two main points of contrast to the quadratic well:

• Even extended directions are not infinite for the quadratic well. While in the case of the affine subspaces even a slight non-coplanarity of the target affine subspace and the random training subspace will eventually lead to an intersection, this is not the case for the sublevel sets of the quadratic well. Even its small eigenvalues, i.e. shallow directions, still have a finite extent for all finite $\epsilon$.

• Distance independence of the threshold training dimension. Because the quadratic well's dimensions have finite extent, the distance independence of the threshold training dimension for affine subspaces does not carry over to the case of quadratic wells. In the main text, this dependence on distance is calculated by projecting the set onto the unit sphere around the initialization, enabling us to apply Gordon's escape theorem.