
Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

05/21/2020
by Stephan Wojtowytsch, et al.
Princeton University

We prove that gradient descent training of a two-layer neural network on empirical or population risk may not decrease population risk at a rate faster than t^(-4/(d-2)) under mean field scaling, where t is the training time and d is the data dimension. Thus gradient descent training for fitting reasonably smooth but truly high-dimensional data may be subject to the curse of dimensionality. We present numerical evidence that gradient descent training with general Lipschitz target functions becomes slower and slower as the dimension increases, but converges at approximately the same rate in all dimensions when the target function lies in the natural function space for two-layer ReLU networks.
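
To make the dimension dependence of the exponent concrete, the following minimal sketch evaluates 4/(d-2) for a few dimensions. It assumes, purely for illustration, that the risk decays exactly at the limiting rate t^(-4/(d-2)); the abstract only states that the decay may not be faster than this. The function names `rate_exponent` and `time_factor_for_risk_reduction` are hypothetical helpers, not part of the paper.

```python
def rate_exponent(d: int) -> float:
    """Exponent 4/(d-2) of the limiting rate t^(-4/(d-2)); needs d > 2."""
    if d <= 2:
        raise ValueError("the exponent 4/(d-2) is only defined for d > 2")
    return 4.0 / (d - 2)


def time_factor_for_risk_reduction(d: int, reduction: float = 10.0) -> float:
    """Factor by which training time must grow to shrink the risk by `reduction`.

    Illustrative assumption: risk(t) is proportional to t^(-4/(d-2)).
    Then risk(c * t) / risk(t) = c^(-4/(d-2)), so c = reduction^((d-2)/4).
    """
    return reduction ** (1.0 / rate_exponent(d))


if __name__ == "__main__":
    for d in (3, 6, 10, 50, 102):
        exponent = rate_exponent(d)
        factor = time_factor_for_risk_reduction(d)
        print(f"d={d:>3}  exponent={exponent:.4f}  time factor for 10x lower risk={factor:.3e}")
```

Under this assumption the exponent shrinks toward zero as d grows, so the training time needed for a fixed improvement in risk grows exponentially in d, which is the sense in which the curse of dimensionality appears here.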

