Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time

06/28/2023
by Arvind Mahankali, et al.

Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it remains an open question whether gradient descent on neural networks, without unnatural modifications, can achieve better sample complexity than kernel methods. This paper provides a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Unlike prior works, our analysis does not require unnatural modifications of the optimization algorithm. We prove that with sample size n = O(d^3.1), where d is the dimension of the inputs, the network converges in polynomially many iterations to a non-trivial error that is not achievable by kernel methods using n ≪ d^4 samples, demonstrating a clear separation between unmodified gradient descent and the neural tangent kernel (NTK).
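
To make the setting concrete, the sketch below shows the kind of algorithm the abstract describes: discretized projected gradient flow on a two-layer network whose first-layer weights are kept on the unit sphere. The ReLU activation, the quartic single-index target, and all hyperparameters here are illustrative assumptions, not the paper's exact construction.

```python
# A minimal sketch of projected gradient descent on a two-layer network,
# loosely in the spirit of the paper's setting. Everything below (ReLU
# activation, quartic single-index target, all hyperparameters) is an
# illustrative assumption, not the paper's exact construction.
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 20, 512, 4000      # input dimension, network width, sample size
steps = 2000
lr = 0.5 * m                 # mean-field-style step size, scaled with width

# Synthetic data: inputs on the unit sphere, target a degree-4 polynomial
# of the one-dimensional projection z = <x, w*>.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
w_star = np.zeros(d)
w_star[0] = 1.0
z = X @ w_star
y = z**4 - 3.0 * z**2

# Initialization: i.i.d. unit-norm neurons; second-layer signs fixed at +-1.
W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = rng.choice([-1.0, 1.0], size=m)

def forward(W, X):
    # f(x) = (1/m) * sum_i a_i * relu(<w_i, x>)
    return np.maximum(X @ W.T, 0.0) @ a / m

for t in range(steps):
    pre = X @ W.T                      # (n, m) pre-activations
    resid = forward(W, X) - y          # (n,) residuals
    # Gradient of the mean squared error with respect to each neuron w_i.
    grad = ((resid[:, None] * (pre > 0)).T @ X) * (a[:, None] / (m * n))
    W -= lr * grad
    # Projection step: renormalize every neuron back onto the unit sphere.
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    if t % 500 == 0:
        print(f"step {t:4d}  train MSE {np.mean(resid**2):.5f}")
```

The projection step, which returns each neuron to the unit sphere after every update, is the only deviation from vanilla gradient descent suggested by the abstract. A quartic target is a natural guess for the d^3.1 vs. d^4 regime, since kernel methods typically need on the order of d^4 samples to learn degree-4 polynomials.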


Related research

09/26/2019: Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
Recent work has revealed that overparameterized networks trained by grad...

05/21/2020: Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective
We prove that the gradient descent training of a two-layer neural networ...

03/01/2021: Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels
We study the relative power of learning with gradient descent on differe...

06/30/2022: Neural Networks can Learn Representations with Gradient Descent
Significant theoretical work has established that in specific regimes, n...

02/03/2020: A Relaxation Argument for Optimization in Neural Networks and Non-Convex Compressed Sensing
It has been observed in practical applications and in theoretical analys...

10/21/2022: When Expressivity Meets Trainability: Fewer than n Neurons Can Work
Modern neural networks are often quite wide, causing large memory and co...

07/03/2020: Mathematical Perspective of Machine Learning
We take a closer look at some theoretical challenges of Machine Learning...
