
A Comparative Analysis of the Optimization and Generalization Properties of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics
A fairly comprehensive analysis is presented for the gradient descent dy...

Phase diagram for two-layer ReLU neural networks at infinite-width limit
How neural networks behave during training over different choices of...

Analysis of feature learning in weight-tied autoencoders via the mean field lens
Autoencoders are among the earliest introduced nonlinear models for unsu...

Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks
Can multilayer neural networks, typically constructed as highly comple...

Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections
The behavior of the gradient descent (GD) algorithm is analyzed for a de...

A mean-field theory of lazy training in two-layer neural nets: entropic regularization and controlled McKean-Vlasov dynamics
We consider the problem of universal approximation of functions by two-l...

Plateau Phenomenon in Gradient Descent Training of ReLU Networks: Explanation, Quantification and Avoidance
The ability of neural networks to provide `best in class' approximation ...
The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models
A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes when the target function can be accurately approximated by a relatively small number of neurons. It is found that for Xavier-like initialization, there are two distinctive phases in the dynamic behavior of GD in the under-parametrized regime: an early phase in which the GD dynamics follows closely that of the corresponding random feature model and the neurons are effectively quenched, followed by a late phase in which the neurons are divided into two groups: a group of a few "activated" neurons that dominate the dynamics and a group of background (or "quenched") neurons that support the continued activation and deactivation process. This neural-network-like behavior continues into the mildly over-parametrized regime, where it undergoes a transition to random-feature-like behavior. The quenching-activation process seems to provide a clear mechanism for "implicit regularization". This is qualitatively different from the dynamics associated with the "mean-field" scaling, where all neurons participate equally and there do not appear to be qualitative changes when the network parameters are changed.
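The setting described in the abstract can be illustrated with a minimal sketch: a two-layer ReLU network f(x) = Σ_k a_k ReLU(w_k · x) with both layers initialized at Xavier-like 1/√(fan-in) scale, trained by full-batch GD on a target that is exactly realizable by a few neurons. All function names, architecture choices, and parameter values below are illustrative assumptions, not the paper's actual code or experiments.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def train_two_layer(X, y, m=50, lr=0.05, steps=5000, seed=0):
    """Full-batch GD on a two-layer ReLU net f(x) = sum_k a_k * relu(w_k . x).

    Xavier-like initialization: inner weights at scale 1/sqrt(d),
    outer weights at scale 1/sqrt(m). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(m, d))  # inner weights
    a = rng.normal(0.0, 1.0 / np.sqrt(m), size=m)       # outer weights
    for _ in range(steps):
        pre = X @ W.T                 # (n, m) pre-activations
        h = relu(pre)                 # hidden-layer features
        err = h @ a - y               # residual of network output
        # gradients of the mean squared error wrt a and W
        grad_a = h.T @ err / n
        grad_W = ((err[:, None] * (pre > 0) * a).T @ X) / n
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a

# Target realizable by two neurons, as in the paper's "small number
# of neurons" setting.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = relu(X[:, 0]) - 0.5 * relu(X[:, 1])

W, a = train_two_layer(X, y)
mse = np.mean((relu(X @ W.T) @ a - y) ** 2)
```

Fixing the hidden weights `W` at initialization and training only `a` would give the corresponding random feature model, which is the comparison baseline in the abstract's early-phase description.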