Edge of chaos as a guiding principle for modern neural network training

07/20/2021
by Lin Zhang, et al.

The success of deep neural networks in real-world problems has prompted many attempts to explain their training dynamics and generalization performance, but more guiding principles for training neural networks are still needed. Motivated by the edge-of-chaos principle behind the optimal performance of neural networks, we study the role of various hyperparameters in modern training algorithms in terms of the order-chaos phase diagram. In particular, we study a fully analytical feedforward neural network trained on the widely adopted Fashion-MNIST dataset, and examine the dynamics associated with the hyperparameters of back-propagation during training. We find that for the basic algorithm of stochastic gradient descent with momentum, in the range around commonly used hyperparameter values, clear scaling relations with respect to training time are present in the ordered phase of the phase diagram, and the model's optimal generalization power at the edge of chaos is similar across different combinations of training parameters. In the chaotic phase, the same scaling no longer holds. This scaling allows us to choose training parameters that achieve faster training without sacrificing performance. In addition, we find that the commonly used regularization method of weight decay effectively pushes the model towards the ordered phase, yielding better performance. Leveraging this fact and the scaling relations in the other hyperparameters, we derive a principled guideline for hyperparameter selection, such that the model achieves optimal performance by saturating at the edge of chaos. Although demonstrated on a simple neural network model and training algorithm, our work improves the understanding of neural network training dynamics and can potentially be extended to guiding principles for more complex model architectures and algorithms.
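
To make the training setup concrete, the sketch below trains a small feedforward network on Fashion-MNIST with stochastic gradient descent, momentum, and weight decay, the hyperparameters discussed in the abstract. It is a minimal illustration under assumed settings: the layer sizes, learning rate, momentum, weight-decay value, and epoch count are not the configuration used in the paper.

```python
# Minimal sketch (assumed hyperparameters, not the authors' exact setup):
# a feedforward network on Fashion-MNIST trained with SGD + momentum + weight decay.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.FashionMNIST(
    root="data", train=True, download=True, transform=transforms.ToTensor()
)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Fully connected feedforward network (illustrative layer sizes).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# SGD with momentum; weight_decay is the L2 regularization that the abstract
# describes as pushing the model towards the ordered phase.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4
)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):  # illustrative number of epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```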
