Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks

12/28/2022
by Ilja Kuzborskij, et al.

We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, non-differentiable, bounded functions with additive noise when trained by Gradient Descent (GD). Because, in the presence of noise, neural networks trained to nearly zero training error are inconsistent on this class, we focus on early-stopped GD, which allows us to show consistency and optimal rates. In particular, we study this problem through the Neural Tangent Kernel (NTK) approximation of a GD-trained finite-width neural network. We show that whenever an early stopping rule is guaranteed to give an optimal rate (of excess risk) on the reproducing kernel Hilbert space of the kernel induced by the ReLU activation function, the same rule achieves the minimax-optimal rate for learning the considered class of Lipschitz functions with neural networks. We discuss several practically appealing data-free and data-dependent stopping rules that yield optimal rates.
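To make the setting concrete, here is a minimal NumPy sketch of the kind of procedure the abstract describes: a shallow, wide ReLU network in the NTK-style 1/√m parameterization, trained on noisy samples of a Lipschitz non-differentiable target by full-batch GD, with a simple data-dependent early-stopping rule based on held-out error. Everything below (the target function, hyperparameters, and the holdout stopping criterion) is an illustrative assumption of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression: a Lipschitz, non-differentiable target plus noise.
def target(x):
    return np.abs(x) - 0.5 * np.abs(x - 0.3)

n, n_val, width, lr, max_steps = 200, 100, 2048, 0.5, 5000
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = target(X[:, 0]) + 0.1 * rng.standard_normal(n)
X_val = rng.uniform(-1.0, 1.0, size=(n_val, 1))
y_val = target(X_val[:, 0]) + 0.1 * rng.standard_normal(n_val)

# Shallow ReLU net, NTK parameterization:
#   f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x + b_r),
# with the output signs a_r fixed and only (W, b) trained.
W = rng.standard_normal((width, 1))
b = rng.standard_normal(width)
a = rng.choice([-1.0, 1.0], size=width)

def forward(X, W, b):
    pre = X @ W.T + b                               # (n, width) pre-activations
    return (np.maximum(pre, 0.0) @ a) / np.sqrt(width), pre

best_val, best_W, best_b, patience = np.inf, W.copy(), b.copy(), 0
for step in range(max_steps):
    pred, pre = forward(X, W, b)
    resid = pred - y
    # Gradient of the (half) mean squared loss w.r.t. W and b.
    act = (pre > 0).astype(float) * a / np.sqrt(width)
    grad_pre = resid[:, None] * act / n             # (n, width)
    W -= lr * (grad_pre.T @ X)
    b -= lr * grad_pre.sum(axis=0)

    # Data-dependent early stopping: halt once held-out error stops improving.
    val_err = np.mean((forward(X_val, W, b)[0] - y_val) ** 2)
    if val_err < best_val:
        best_val, best_W, best_b, patience = val_err, W.copy(), b.copy(), 0
    else:
        patience += 1
        if patience > 50:
            break

# (best_W, best_b) is the early-stopped iterate returned as the learned model.
print(f"stopped near step {step}; held-out MSE {best_val:.4f}")
```

The holdout criterion above is just one common data-dependent choice; a data-free rule in the sense of the abstract would instead fix the stopping time in advance from problem parameters, without a validation split.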


Related research

07/12/2021 · Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping
We explore the ability of overparameterized shallow neural networks to l...

07/06/2020 · Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network
Overparametrized neural networks trained by gradient descent (GD) can pr...

07/27/2021 · Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel
We revisit on-average algorithmic stability of Gradient Descent (GD) for...

05/18/2020 · Parsimonious Computing: A Minority Training Regime for Effective Prediction in Large Microarray Expression Data Sets
Rigorous mathematical investigation of learning rates used in back-propa...

06/10/2021 · Early-stopped neural networks are consistent
This work studies the behavior of neural networks trained with the logis...

05/21/2018 · Measuring and regularizing networks in function space
Neural network optimization is often conceptualized as optimizing parame...

05/04/2023 · Statistical Optimality of Deep Wide Neural Networks
In this paper, we consider the generalization ability of deep wide feedf...
