
How many degrees of freedom do we need to train deep networks: a loss landscape perspective

by Brett W. Larsen et al.
Stanford University

A variety of recent works, spanning pruning, lottery tickets, and training within random subspaces, have shown that deep neural networks can be trained using far fewer degrees of freedom than the total number of parameters. We explain this phenomenon by first examining the success probability of hitting a training loss sub-level set when training within a random subspace of a given training dimensionality. We find a sharp phase transition in the success probability from 0 to 1 as the training dimension surpasses a threshold. This threshold training dimension increases as the desired final loss decreases, but decreases as the initial loss decreases. We then theoretically explain the origin of this phase transition, and its dependence on initialization and final desired loss, in terms of precise properties of the high dimensional geometry of the loss landscape. In particular, we show, via Gordon's escape theorem, that the training dimension plus the Gaussian width of the desired loss sub-level set, projected onto a unit sphere surrounding the initialization, must exceed the total number of parameters for the success probability to be large. In several architectures and datasets, we measure the threshold training dimension as a function of initialization and demonstrate that it is a small fraction of the total number of parameters, thereby implying, by our theory, that successful training with so few dimensions is possible precisely because the Gaussian width of low loss sub-level sets is very large. Moreover, this threshold training dimension provides a strong null model for assessing the efficacy of more sophisticated ways to reduce training degrees of freedom, including lottery tickets as well as a more optimal method we introduce: lottery subspaces.
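The core experimental setup described above, training within a random subspace, restricts optimization to parameters of the form theta = theta0 + P z, where P is a random D x d projection matrix and only the d-dimensional vector z is trained. The following is a minimal, hedged sketch of that idea on a toy quadratic loss (this is illustrative code, not the authors' implementation; the loss, names, and scaling choices here are assumptions). With a quadratic loss the best achievable value within the subspace can be found in closed form by least squares, making it easy to see how the reachable loss depends on the training dimension d.

```python
import numpy as np

# Toy sketch of subspace training: theta = theta0 + P @ z, with P a random
# D x d basis, optimizing only z. We use a simple quadratic "loss"
# L(theta) = 0.5 * ||A theta - b||^2 so the subspace minimum has a closed form.
# All names and choices here are illustrative assumptions.

rng = np.random.default_rng(0)
D = 100                                      # total number of parameters
A = rng.standard_normal((D, D)) / np.sqrt(D)
b = rng.standard_normal(D)
theta0 = np.zeros(D)                         # initialization

def best_loss_in_subspace(d):
    """Minimum of L over theta = theta0 + P @ z, solved by least squares."""
    P = rng.standard_normal((D, d)) / np.sqrt(d)  # random subspace basis
    M = A @ P                                     # loss restricted to subspace
    r = b - A @ theta0                            # residual at initialization
    z, *_ = np.linalg.lstsq(M, r, rcond=None)
    return 0.5 * np.linalg.norm(M @ z - r) ** 2

for d in (5, 25, 50, 100):
    print(f"d = {d:3d}  best subspace loss = {best_loss_in_subspace(d):.4f}")
```

For this convex toy problem the achievable loss shrinks smoothly as d grows, reaching (numerically) zero at d = D; the paper's point is that for real network loss landscapes the success probability of hitting a fixed sub-level set instead jumps sharply from 0 to 1 at a threshold d far below D, which the Gaussian-width argument explains.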



