
A Comparative Analysis of the Optimization and Generalization Properties of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics
A fairly comprehensive analysis is presented for the gradient descent dy...

Phase diagram for two-layer ReLU neural networks at infinite-width limit
How neural networks behave during training over different choices of...

Analysis of feature learning in weight-tied autoencoders via the mean field lens
Autoencoders are among the earliest introduced nonlinear models for unsu...

Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks
Can multilayer neural networks, typically constructed as highly comple...

Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections
The behavior of the gradient descent (GD) algorithm is analyzed for a de...

A mean-field theory of lazy training in two-layer neural nets: entropic regularization and controlled McKean-Vlasov dynamics
We consider the problem of universal approximation of functions by two-l...

Plateau Phenomenon in Gradient Descent Training of ReLU Networks: Explanation, Quantification and Avoidance
The ability of neural networks to provide `best in class' approximation ...
The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models
A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes when the target function can be accurately approximated by a relatively small number of neurons. It is found that for Xavier-like initialization, there are two distinctive phases in the dynamic behavior of GD in the under-parametrized regime: an early phase in which the GD dynamics follows closely that of the corresponding random feature model and the neurons are effectively quenched, followed by a late phase in which the neurons are divided into two groups: a group of a few "activated" neurons that dominate the dynamics and a group of background (or "quenched") neurons that support the continued activation and deactivation process. This neural-network-like behavior continues into the mildly over-parametrized regime, where it undergoes a transition to random-feature-like behavior. The quenching-activation process seems to provide a clear mechanism for "implicit regularization". This is qualitatively different from the dynamics associated with the "mean-field" scaling, where all neurons participate equally and there do not appear to be qualitative changes when the network parameters are changed.
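The setting described in the abstract can be illustrated with a minimal sketch: a two-layer ReLU network f(x) = Σ_k a_k ReLU(w_k · x) with both layers initialized at Xavier-like 1/√(fan-in) scale, trained by full-batch GD on a target that is exactly realizable by a few neurons. All function names, architecture choices, and parameter values below are illustrative assumptions, not the paper's actual code or experiments.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def train_two_layer(X, y, m=50, lr=0.05, steps=5000, seed=0):
    """Full-batch GD on a two-layer ReLU net f(x) = sum_k a_k * relu(w_k . x).

    Xavier-like initialization: inner weights at scale 1/sqrt(d),
    outer weights at scale 1/sqrt(m). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(m, d))  # inner weights
    a = rng.normal(0.0, 1.0 / np.sqrt(m), size=m)       # outer weights
    for _ in range(steps):
        pre = X @ W.T                 # (n, m) pre-activations
        h = relu(pre)                 # hidden-layer features
        err = h @ a - y               # residual of network output
        # gradients of the mean squared error wrt a and W
        grad_a = h.T @ err / n
        grad_W = ((err[:, None] * (pre > 0) * a).T @ X) / n
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a

# Target realizable by two neurons, as in the paper's "small number
# of neurons" setting.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = relu(X[:, 0]) - 0.5 * relu(X[:, 1])

W, a = train_two_layer(X, y)
mse = np.mean((relu(X @ W.T) @ a - y) ** 2)
```

Fixing the hidden weights `W` at initialization and training only `a` would give the corresponding random feature model, which is the comparison baseline in the abstract's early-phase description.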