Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width

05/24/2022
by   Hanxu Zhou, et al.
0

Substantial work indicates that the dynamics of neural networks (NNs) is closely related to their initialization of parameters. Inspired by the phase diagram for two-layer ReLU NNs with infinite width (Luo et al., 2021), we make a step towards drawing a phase diagram for three-layer ReLU NNs with infinite width. First, we derive a normalized gradient flow for three-layer ReLU NNs and obtain two key independent quantities to distinguish different dynamical regimes for common initialization methods. With carefully designed experiments and a large computation cost, for both synthetic datasets and real datasets, we find that the dynamics of each layer also could be divided into a linear regime and a condensed regime, separated by a critical regime. The criteria is the relative change of input weights (the input weight of a hidden neuron consists of the weight from its input layer to the hidden neuron and its bias term) as the width approaches infinity during the training, which tends to 0, +∞ and O(1), respectively. In addition, we also demonstrate that different layers can lie in different dynamical regimes in a training process within a deep NN. In the condensed regime, we also observe the condensation of weights in isolated orientations with low complexity. Through experiments under three-layer condition, our phase diagram suggests a complicated dynamical regimes consisting of three possible regimes, together with their mixture, for deep NNs and provides a guidance for studying deep NNs in different initialization regimes, which reveals the possibility of completely different dynamics emerging within a deep NN for its different layers.

READ FULL TEXT

page 8

page 9

research
07/15/2020

Phase diagram for two-layer ReLU neural networks at infinite-width limit

How neural network behaves during the training over different choices of...
research
03/12/2023

Phase Diagram of Initial Condensation for Two-layer Neural Networks

The phenomenon of distinct behaviors exhibited by neural networks under ...
research
09/15/2022

Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization)

We study the average robustness notion in deep neural networks in (selec...
research
08/04/2022

Feature selection with gradient descent on two-layer networks in low-rotation regimes

This work establishes low test error of gradient flow (GF) and stochasti...
research
12/30/2020

Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training

Deep learning algorithms are responsible for a technological revolution ...
research
06/18/2019

Gradient Dynamics of Shallow Univariate ReLU Networks

We present a theoretical and empirical study of the gradient dynamics of...
research
02/15/2021

On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers

A deep equilibrium model uses implicit layers, which are implicitly defi...

Please sign up or login with your details

Forgot password? Click here to reset