Ridgeless Interpolation with Shallow ReLU Networks in 1D is Nearest Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz Functions

09/27/2021
by Boris Hanin, et al.

We prove a precise geometric description of all one layer ReLU networks z(x;θ) with a single linear unit and input/output dimensions equal to one that interpolate a given dataset 𝒟 = {(x_i, f(x_i))} and, among all such interpolants, minimize the ℓ_2-norm of the neuron weights. Such networks can intuitively be thought of as those that minimize the mean-squared error over 𝒟 plus an infinitesimal weight decay penalty. We therefore refer to them as ridgeless ReLU interpolants. Our description proves that, to extrapolate values z(x;θ) for inputs x ∈ (x_i, x_{i+1}) lying between two consecutive datapoints, a ridgeless ReLU interpolant simply compares the signs of the discrete estimates for the curvature of f at x_i and x_{i+1} derived from the dataset 𝒟. If the curvature estimates at x_i and x_{i+1} have different signs, then z(x;θ) must be linear on (x_i, x_{i+1}). If, in contrast, the curvature estimates at x_i and x_{i+1} are both positive (resp. negative), then z(x;θ) is convex (resp. concave) on (x_i, x_{i+1}). Our results show that ridgeless ReLU interpolants achieve the best possible generalization for learning 1D Lipschitz functions, up to universal constants.
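The curvature-sign rule described above can be written down directly. The following is a minimal sketch, not taken from the paper, assuming the discrete curvature estimate at x_i is the second divided difference of the data over (x_{i-1}, x_i, x_{i+1}) and that the sign at the two boundary points defaults to zero; the function names `curvature_signs` and `interval_shapes` are hypothetical.

```python
import numpy as np

def curvature_signs(x, y):
    """Sign of the discrete curvature estimate at each datapoint x_i,
    taken here as the second divided difference of the data over
    (x_{i-1}, x_i, x_{i+1}); the two endpoints are left at 0."""
    signs = np.zeros(len(x))
    for i in range(1, len(x) - 1):
        left_slope = (y[i] - y[i - 1]) / (x[i] - x[i - 1])
        right_slope = (y[i + 1] - y[i]) / (x[i + 1] - x[i])
        signs[i] = np.sign(right_slope - left_slope)
    return signs

def interval_shapes(x, y):
    """For each interval (x_i, x_{i+1}), report the shape the curvature-sign
    rule forces on a ridgeless ReLU interpolant: convex if both estimates
    are positive, concave if both are negative, and linear otherwise."""
    s = curvature_signs(x, y)
    shapes = []
    for i in range(len(x) - 1):
        if s[i] > 0 and s[i + 1] > 0:
            shapes.append("convex")
        elif s[i] < 0 and s[i + 1] < 0:
            shapes.append("concave")
        else:
            shapes.append("linear")
    return shapes

# Example: a zig-zag dataset whose interior curvature signs alternate,
# so no two consecutive signs agree and every interval comes out linear.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.0, 0.5, 1.5, 1.0])
print(interval_shapes(x, y))  # ['linear', 'linear', 'linear', 'linear']
```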
