
On Feature Learning in Neural Networks with Global Convergence Guarantees

by Zhengdao Chen, et al.

We study the optimization of wide neural networks (NNs) via gradient flow (GF) in setups that allow feature learning while admitting non-asymptotic global convergence guarantees. First, for wide shallow NNs under the mean-field scaling and with a general class of activation functions, we prove that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF. Building upon this analysis, we study a model of wide multi-layer NNs whose second-to-last layer is trained via GF, for which we also prove a linear-rate convergence of the training loss to zero, but regardless of the input dimension. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
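The shallow setting described above can be illustrated with a small numerical sketch: a width-m network under mean-field scaling, f(x) = (1/m) Σᵢ aᵢ tanh(wᵢ·x), trained by full-batch gradient descent as a discretization of gradient flow, on a dataset whose size n does not exceed the input dimension d. This is an assumed, simplified setup for illustration (the choices of activation, width, learning rate, and step count are not from the paper), but it exhibits the qualitative behavior the abstract states: the training loss shrinks toward zero at a roughly geometric rate.

```python
import numpy as np

# Illustrative sketch (not the paper's exact construction): a shallow NN
# under mean-field scaling trained by gradient descent on squared loss.
rng = np.random.default_rng(0)
n, d, m = 5, 10, 512               # n <= d, matching the paper's shallow setting
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))    # first-layer weights (trained)
a = rng.standard_normal(m)         # output weights (trained)

# Under the 1/m output scaling, gradients are O(1/m), so the learning
# rate is scaled up by m (a common time rescaling in mean-field analyses).
lr, steps = 0.2 * m, 5000
losses = []
for _ in range(steps):
    H = np.tanh(X @ W.T)           # (n, m) hidden activations
    pred = (H @ a) / m             # mean-field 1/m output scaling
    r = pred - y                   # residuals
    losses.append(0.5 * np.mean(r ** 2))
    # gradients of the mean squared loss w.r.t. a and W
    ga = (H.T @ r) / (m * n)
    gW = ((np.outer(r, a / m) * (1 - H ** 2)).T @ X) / n
    a -= lr * ga
    W -= lr * gW

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.2e}")
```

With n ≤ d the data points are generically linearly independent, the induced kernel stays positive definite along the trajectory, and the residual contracts at each step, which is the discrete analogue of the linear-rate convergence claimed for gradient flow.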



