Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality

09/02/2022
by Stephan Wojtowytsch, et al.

In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, when no labels are given inside the unit ball. With weight decay regularization, and in the infinite-neuron, infinite-data limit, we prove that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow like the dimension d and √d, respectively. We furthermore show that the weight decay regularizer grows exponentially in d if the label 1 is imposed on a ball of radius ε rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.
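To make the setting concrete, here is a sketch of one standard formalization of the problem above; the notation (m, a_i, w_i, b_i) is assumed here rather than taken from the paper, biases are left unregularized, and the normalization may differ from the authors' conventions:

\[
f(x) = \sum_{i=1}^{m} a_i \, \sigma\bigl(\langle w_i, x\rangle + b_i\bigr),
\qquad \sigma(t) = \max(t, 0),
\]
\[
\min_{(a_i, w_i, b_i)_{i=1}^{m}} \ \frac{1}{2} \sum_{i=1}^{m} \bigl(|a_i|^2 + \|w_i\|^2\bigr)
\quad \text{subject to} \quad f(0) = 1 \ \text{ and } \ f(x) = 0 \ \text{ for } \|x\| \ge 1.
\]

Since the ReLU is positively homogeneous, rescaling the outer weight of a neuron down and its inner weight up by the same factor leaves f unchanged, so the optimal value of this weight decay objective coincides with the minimal path norm \(\sum_i |a_i|\,\|w_i\|\). The growth rates in the abstract (linear in d for the point constraint at the origin, exponential in d when the label 1 is imposed on a ball of radius ε) can therefore be read as statements about this path norm in the infinite-neuron limit.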
