Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

by Stephan Wojtowytsch, et al.
Princeton University

We prove that the gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at an order faster than t^(-4/(d-2)) under mean field scaling. Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.
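
To get a feel for how quickly the rate t^(-4/(d-2)) degrades with the dimension d, the short sketch below tabulates the exponent 4/(d-2) for a few values of d. It also reports the factor by which the training time t would have to grow to halve the risk if the risk decayed exactly at this limiting rate. The function names, the chosen dimensions, the restriction to d > 2, and the neglect of constants are illustrative assumptions, not taken from the paper.

```python
def rate_exponent(d: int) -> float:
    """Exponent 4/(d-2) appearing in the rate t^(-4/(d-2)).

    Assumption: the expression is only evaluated for d > 2.
    """
    if d <= 2:
        raise ValueError("this sketch only evaluates the exponent for d > 2")
    return 4.0 / (d - 2)


def time_factor_to_halve(d: int) -> float:
    """Factor c such that (c*t)^(-4/(d-2)) equals half of t^(-4/(d-2)).

    Solving c^(-a) = 1/2 with a = 4/(d-2) gives c = 2^(1/a).
    """
    return 2.0 ** (1.0 / rate_exponent(d))


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to illustrate the trend.
    for d in (3, 6, 10, 34, 102):
        print(f"d={d:4d}  exponent={rate_exponent(d):.3f}  "
              f"time factor to halve={time_factor_to_halve(d):.3g}")
```

Under these assumptions, doubling the training time is enough to halve the risk when d = 6, while for d = 34 the training time would have to grow by a factor of 2^8 = 256.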
