Phase diagram for two-layer ReLU neural networks at infinite-width limit

by   Tao Luo, et al.

How neural network behaves during the training over different choices of hyperparameters is an important question in the study of neural networks. However, except for specific examples with particular choices of hyperparameters, e.g., neural tangent kernel (NTK), mean-field model, this question is largely unanswered. In this work, inspired by the phase diagram in statistical mechanics, we draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit for a complete characterization of its dynamical regimes and their dependence on hyperparameters. Through both experimental and theoretical approaches, we identify three regimes in the phase diagram, i.e., linear regime, critical regime and condensed regime, based on the relative change of input weights as the width approaches infinity, which tends to 0, O(1) and +∞, respectively. In the linear regime, NN training dynamics is approximately linear similar to a random feature model with an exponential loss decay. In the condensed regime, we demonstrate through experiments that active neurons are condensed at several discrete orientations. The critical regime serves as the boundary between above two regimes, which exhibits an intermediate nonlinear behavior with the mean-field model as a typical example. Overall, our phase diagram for the two-layer ReLU NN serves as a map for the future studies and is a first step towards a more systematical investigation of the training behavior and the implicit regularization of NNs of different structures.



There are no comments yet.


page 13


Global Convergence of Three-layer Neural Networks in the Mean Field Regime

In the mean field regime, neural networks are appropriately scaled so th...

Dynamically Stable Infinite-Width Limits of Neural Classifiers

Recent research has been focused on two different approaches to studying...

The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

A numerical and phenomenological study of the gradient descent (GD) algo...

Interpolating between boolean and extremely high noisy patterns through Minimal Dense Associative Memories

Recently, Hopfield and Krotov introduced the concept of dense associati...

A priori generalization error for two-layer ReLU neural network through minimum norm solution

We focus on estimating a priori generalization error of two-layer ReLU n...

Embedded Ensembles: Infinite Width Limit and Operating Regimes

A memory efficient approach to ensembling neural networks is to share mo...

On Sparsity in Overparametrised Shallow ReLU Networks

The analysis of neural network training beyond their linearization regim...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.