On Feature Learning in Neural Networks with Global Convergence Guarantees

04/22/2022
by   Zhengdao Chen, et al.

We study the optimization of wide neural networks (NNs) via gradient flow (GF) in setups that allow feature learning while admitting non-asymptotic global convergence guarantees. First, for wide shallow NNs under the mean-field scaling and with a general class of activation functions, we prove that when the input dimension is at least the size of the training set, the training loss converges to zero at a linear rate under GF. Building upon this analysis, we study a model of wide multi-layer NNs whose second-to-last layer is trained via GF, for which we likewise prove linear-rate convergence of the training loss to zero, but regardless of the input dimension. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
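
To make the contrast in the abstract concrete: below is a minimal sketch, not the paper's code, of a shallow network f(x) = scale * Σ_i a_i σ(w_i · x) trained by an Euler discretization of gradient flow on the squared loss, under both the mean-field scaling (scale = 1/m) and the NTK scaling (scale = 1/√m). All dimensions, step sizes, and data below are illustrative assumptions; the relative drift of the first-layer weights from their initialization serves as a crude proxy for feature learning.

```python
# Illustrative sketch only (not the paper's code): shallow network
#   f(x) = scale * sum_i a_i * tanh(w_i . x)
# trained by Euler-discretized gradient flow on the squared loss.
#   scale = 1/m       -> mean-field scaling (feature-learning regime)
#   scale = 1/sqrt(m) -> NTK scaling (lazy regime)
import numpy as np

data_rng = np.random.default_rng(0)
n, d, m = 20, 32, 4096                              # samples, input dim (d >= n), width
X = data_rng.standard_normal((n, d)) / np.sqrt(d)   # roughly unit-norm inputs
y = data_rng.standard_normal(n)                     # arbitrary regression targets

def train(scale, eta=0.1, steps=3000):
    rng = np.random.default_rng(1)                  # identical init for both regimes
    W = rng.standard_normal((m, d))                 # first-layer weights
    a = rng.standard_normal(m)                      # output weights
    W0 = W.copy()
    lr = eta / (scale**2 * m)                       # rescale time so both regimes
                                                    # share the same linearized dynamics
    for _ in range(steps):
        H = np.tanh(X @ W.T)                        # (n, m) hidden activations
        r = scale * H @ a - y                       # residuals f(X) - y
        ga = scale * (H.T @ r)                      # gradient w.r.t. a
        gW = scale * ((r[:, None] * (1.0 - H**2) * a).T @ X)  # gradient w.r.t. W
        a -= lr * ga
        W -= lr * gW
    mse = np.mean((scale * np.tanh(X @ W.T) @ a - y) ** 2)
    drift = np.linalg.norm(W - W0) / np.linalg.norm(W0)       # relative weight movement
    return mse, drift

for name, scale in [("mean-field 1/m", 1.0 / m), ("NTK 1/sqrt(m)", 1.0 / np.sqrt(m))]:
    mse, drift = train(scale)
    print(f"{name:>15}: train MSE {mse:.2e}, relative first-layer drift {drift:.2e}")
```

Because the step size is rescaled by 1/(scale^2 · m), both parameterizations start from the same linearized training dynamics, so any gap in the reported weight drift reflects the scaling itself: under 1/√m the first layer is expected to barely move, while under 1/m it should travel substantially farther, which is the feature-learning behavior the abstract describes.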


Related research

10/28/2022 · A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks
To understand the training dynamics of neural networks (NNs), prior stud...

02/02/2023 · Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
We consider the optimisation of large and shallow neural networks via gr...

05/31/2022 · Feature Learning in L_2-regularized DNNs: Attraction/Repulsion and Sparsity
We study the loss surface of DNNs with L_2 regularization. We show that ...

07/22/2020 · Compressing invariant manifolds in neural nets
We study how neural networks compress uninformative input space in model...

03/23/2017 · Role of zero synapses in unsupervised feature learning
Synapses in real neural circuits can take discrete values, including zer...

07/02/2020 · Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach
Structural equation models (SEMs) are widely used in sciences, ranging f...

06/22/2021 · The Rate of Convergence of Variation-Constrained Deep Neural Networks
Multi-layer feedforward networks have been used to approximate a wide ra...
