Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time

by Arvind Mahankali, et al.

Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it is still an open question whether gradient descent on neural networks, without unnatural modifications, can achieve better sample complexity than kernel methods. This paper provides a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Unlike prior work, our analysis does not require unnatural modifications of the optimization algorithm. We prove that with sample size n = O(d^3.1), where d is the dimension of the inputs, the network converges in polynomially many iterations to a non-trivial error that is not achievable by kernel methods using n ≪ d^4 samples, hence demonstrating a clear separation between unmodified gradient descent and the neural tangent kernel (NTK).
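As a rough illustration of what a projected gradient method on a two-layer network can look like, here is a minimal discrete-time sketch. The abstract does not specify the activation, the constraint set, or the loss, so everything below is an assumption made for illustration: a ReLU activation, a 1/m mean-field output scaling, a squared loss, projection of each neuron onto the unit sphere, and the hypothetical helper names `normalize`, `predict`, and `projected_gd_step`. It is not the paper's algorithm or analysis setting.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(w):
    """Project a weight vector onto the unit sphere (assumed constraint set)."""
    norm = math.sqrt(dot(w, w))
    return [a / norm for a in w]


def predict(W, x):
    """Assumed mean-field network: f(x) = (1/m) * sum_i relu(<w_i, x>)."""
    return sum(max(dot(w, x), 0.0) for w in W) / len(W)


def projected_gd_step(W, X, y, lr):
    """One gradient step on the squared loss (1/2n) * sum (f(x) - y)^2,
    followed by re-projection of every neuron onto the unit sphere."""
    m, n = len(W), len(X)
    residuals = [predict(W, x) - t for x, t in zip(X, y)]
    new_W = []
    for w in W:
        grad = [0.0] * len(w)
        for x, r in zip(X, residuals):
            if dot(w, x) > 0.0:  # ReLU derivative is 1 on the active side
                for k, xk in enumerate(x):
                    grad[k] += r * xk / (m * n)
        stepped = [wk - lr * gk for wk, gk in zip(w, grad)]
        new_W.append(normalize(stepped))
    return new_W


if __name__ == "__main__":
    rng = random.Random(0)
    d, m, n = 3, 4, 5  # toy sizes, chosen only for the demo
    W = [normalize([rng.gauss(0, 1) for _ in range(d)]) for _ in range(m)]
    X = [normalize([rng.gauss(0, 1) for _ in range(d)]) for _ in range(n)]
    y = [x[0] ** 2 for x in X]  # arbitrary toy target, not from the paper
    for _ in range(10):
        W = projected_gd_step(W, X, y, lr=0.5)
    print([round(dot(w, w), 6) for w in W])
```

The projection step is the only departure from plain gradient descent here: after each update, every neuron is rescaled to unit norm, so the iterates stay on the assumed constraint set.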



Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

Recent work has revealed that overparameterized networks trained by grad...

Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

We prove that the gradient descent training of a two-layer neural networ...

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

We study the relative power of learning with gradient descent on differe...

Neural Networks can Learn Representations with Gradient Descent

Significant theoretical work has established that in specific regimes, n...

A Relaxation Argument for Optimization in Neural Networks and Non-Convex Compressed Sensing

It has been observed in practical applications and in theoretical analys...

When Expressivity Meets Trainability: Fewer than n Neurons Can Work

Modern neural networks are often quite wide, causing large memory and co...

Mathematical Perspective of Machine Learning

We take a closer look at some theoretical challenges of Machine Learning...
