Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks

09/18/2021
by Rahul Parhi, et al.

We study the problem of estimating an unknown function from noisy data using shallow (single-hidden-layer) ReLU neural networks. The estimators we study minimize the sum of squared data-fitting errors plus a regularization term proportional to the squared Euclidean norm of the network weights. This minimization corresponds to the common approach of training a neural network with weight decay. We quantify the performance (mean-squared error) of these neural network estimators when the data-generating function belongs to the space of functions of second-order bounded variation in the Radon domain. This space of functions was recently proposed as the natural function space associated with shallow ReLU neural networks. We derive a minimax lower bound for the estimation problem over this function space and show that the neural network estimators are minimax optimal up to logarithmic factors. We also show that this is a "mixed variation" function space that contains classical multivariate function spaces, including certain Sobolev spaces and certain spectral Barron spaces. Finally, we use these results to quantify a gap between neural networks and linear methods (which include kernel methods). This paper sheds light on the phenomenon that neural networks seem to break the curse of dimensionality.
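
The estimator described in the abstract is the familiar "least squares plus weight decay" training objective. The following is a minimal sketch of that setup, not the authors' implementation: a single-hidden-layer ReLU network fit to noisy samples by minimizing squared error with an l2 penalty on the weights. The network width, learning rate, regularization strength, and the synthetic target function are all illustrative assumptions, not values from the paper.

```python
# Minimal sketch (assumed setup, not the paper's code): train a shallow
# ReLU network with weight decay, i.e. minimize
#   sum_i (f(x_i) - y_i)^2 + lambda * ||weights||^2.
import torch

torch.manual_seed(0)

# Synthetic noisy 1-D data: y = f0(x) + noise for an arbitrary smooth f0
# (the target function here is an assumption for illustration).
n = 200
x = torch.linspace(-1.0, 1.0, n).unsqueeze(1)
y = torch.sin(3.0 * x) + 0.1 * torch.randn(n, 1)

# Shallow (single-hidden-layer) ReLU network; the width is an assumption.
width = 256
model = torch.nn.Sequential(
    torch.nn.Linear(1, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)

# PyTorch's weight_decay implements the l2 (squared-norm) penalty on all
# parameters; the paper's regularizer may differ in detail (e.g. which
# weights are penalized). Using mean-squared error instead of the sum
# only rescales the effective lambda.
lam = 1e-4  # assumed regularization strength
opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=lam)

for step in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

print(f"final mean-squared data-fitting error: {loss.item():.5f}")
```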

Related Research

06/14/2023 · Nonparametric regression using over-parameterized shallow ReLU neural networks
It is shown that over-parameterized neural networks can achieve minimax ...

06/12/2020 · Minimax Estimation of Conditional Moment Models
We develop an approach for estimating models described via conditional m...

05/25/2023 · Vector-Valued Variation Spaces and Width Bounds for DNNs: Insights on Weight Decay Regularization
Deep neural networks (DNNs) trained to minimize a loss term plus the sum...

06/09/2021 · Harmless Overparametrization in Two-layer Neural Networks
Overparametrized neural networks, where the number of active parameters ...

06/28/2021 · Characterization of the Variation Spaces Corresponding to Shallow Neural Networks
We consider the variation space corresponding to a dictionary of functio...

09/02/2022 · Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality
In this note, we study how neural networks with a single hidden layer an...
