Overparameterized ReLU Neural Networks Learn the Simplest Models: Neural Isometry and Exact Recovery

09/30/2022
by Yifei Wang, et al.

The practice of deep learning has shown that neural networks generalize remarkably well even with an extremely large number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We set out to resolve this discrepancy from a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models: when the ratio between the number of samples and the dimension exceeds a numerical threshold, recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models even when the labels are noisy. The phase transition phenomenon is confirmed through numerical experiments.
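For concreteness, the weight-decay training problem referred to above can be written as follows. This is a minimal sketch assuming squared loss; the width m, the regularization strength β, and the notation are choices made for this illustration and need not match the paper's exact formulation:

$$
\min_{\{w_j,\alpha_j\}_{j=1}^{m}} \; \frac{1}{2}\Big\| \sum_{j=1}^{m} \alpha_j\,(X w_j)_+ \; - \; y \Big\|_2^2 \;+\; \frac{\beta}{2} \sum_{j=1}^{m} \big( \|w_j\|_2^2 + \alpha_j^2 \big),
$$

where $X \in \mathbb{R}^{n \times d}$ is the data matrix, $y \in \mathbb{R}^{n}$ are the labels, and $(\cdot)_+ = \max(\cdot, 0)$ acts elementwise. For sufficiently large width, objectives of this form are known to admit equivalent convex reformulations with group-sparsity regularization, which is what links the training problem to sparse recovery and compressed sensing.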
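The phase-transition claim can be probed numerically. The sketch below is an illustration, not the authors' exact experiment: the width, optimizer settings, planted model, and the helper name recovery_trial are assumptions made for this example. It trains an overparameterized two-layer ReLU network with weight decay on labels generated by a single planted ReLU neuron and reports how well the planted model is recovered as the ratio n/d grows.

import numpy as np
import torch
import torch.nn as nn

def recovery_trial(n, d, width=200, weight_decay=1e-3, steps=5000, seed=0):
    """Train an overparameterized two-layer ReLU net on a planted neuron."""
    torch.manual_seed(seed)
    rng = np.random.default_rng(seed)

    # Random Gaussian data and a planted ReLU neuron y = (X w*)_+.
    X = torch.tensor(rng.standard_normal((n, d)), dtype=torch.float32)
    w_star = torch.tensor(rng.standard_normal(d), dtype=torch.float32)
    w_star /= w_star.norm()
    y = torch.relu(X @ w_star)

    # Width far larger than needed to fit the data (overparameterization).
    model = nn.Sequential(nn.Linear(d, width, bias=False),
                          nn.ReLU(),
                          nn.Linear(width, 1, bias=False))
    # Standard weight decay on all parameters.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=weight_decay)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(X).squeeze(-1) - y) ** 2).mean()
        loss.backward()
        opt.step()

    # Relative error against the planted neuron on fresh test data.
    X_test = torch.tensor(rng.standard_normal((2000, d)), dtype=torch.float32)
    y_test = torch.relu(X_test @ w_star)
    with torch.no_grad():
        err = ((model(X_test).squeeze(-1) - y_test) ** 2).mean() / (y_test ** 2).mean()
    return err.item()

if __name__ == "__main__":
    d = 20
    for ratio in (1, 2, 4, 8):  # n/d ratios straddling a possible threshold
        err = recovery_trial(n=ratio * d, d=d)
        print(f"n/d = {ratio}: relative test error = {err:.3e}")

A small test error for large n/d and a large error for small n/d would be consistent with the phase transition described in the abstract; sweeping the ratio more finely and averaging over seeds would trace out the transition curve.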

