
A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks

by Zhengdao Chen, et al.

To understand the training dynamics of neural networks (NNs), prior studies have considered the infinite-width mean-field (MF) limit of two-layer NNs, establishing theoretical guarantees of its convergence under gradient flow training as well as its approximation and generalization capabilities. In this work, we study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed. To define the limiting model rigorously, we generalize the MF theory of two-layer NNs by treating the neurons as belonging to functional spaces. Then, by writing the MF training dynamics as a kernel gradient flow with a time-varying kernel that remains positive-definite, we prove that its training loss in L_2 regression decays to zero at a linear rate. Furthermore, we define function spaces that include the solutions obtainable through the MF training dynamics and prove Rademacher complexity bounds for these spaces. Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that exhibit distinct behaviors, both of which involve feature learning.
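
The link between a positive-definite kernel and a linear convergence rate can be illustrated with a standard calculation. The sketch below is not taken from the paper: the symbols $f_t$, $K_t$, $G_t$, $r_t$, and $\lambda$ are introduced here for illustration, and the uniform lower bound $\lambda > 0$ on the smallest eigenvalue of the Gram matrix is an assumption that the paper's actual argument may establish in a different form.

```latex
% Illustrative sketch; notation and the uniform bound \lambda are assumptions.
% Training data (x_i, y_i), i = 1..n; residual r_t = (f_t(x_i) - y_i)_{i=1}^n.
\begin{align*}
  L(t) &= \frac{1}{2n} \sum_{i=1}^{n} \bigl( f_t(x_i) - y_i \bigr)^2
        = \frac{1}{2n} \lVert r_t \rVert^2, \\
  \partial_t f_t(x) &= -\frac{1}{n} \sum_{j=1}^{n} K_t(x, x_j)\,
        \bigl( f_t(x_j) - y_j \bigr)
        \qquad \text{(kernel gradient flow)}, \\
  \frac{dL}{dt} &= \frac{1}{n} \sum_{i=1}^{n} \bigl( f_t(x_i) - y_i \bigr)\,
        \partial_t f_t(x_i)
        = -\frac{1}{n}\, r_t^{\top} G_t\, r_t,
        \qquad (G_t)_{ij} = \frac{1}{n} K_t(x_i, x_j).
\end{align*}
% If \lambda_{\min}(G_t) \ge \lambda > 0 for all t \ge 0, then
\begin{align*}
  \frac{dL}{dt} \le -\frac{\lambda}{n} \lVert r_t \rVert^2 = -2\lambda\, L(t)
  \quad\Longrightarrow\quad
  L(t) \le e^{-2\lambda t}\, L(0).
\end{align*}
```

Under these assumptions the loss decays exponentially in time, which is what a linear rate means for a continuous-time flow. The kernel is allowed to vary with $t$; only the lower bound on its smallest eigenvalue enters the estimate.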



