Harmless Overparametrization in Two-layer Neural Networks
Overparametrized neural networks, where the number of active parameters is larger than the sample size, prove remarkably effective in modern deep learning practice. From the classical perspective, however, far fewer parameters suffice for optimal estimation and prediction, and overparametrization can be harmful even in the presence of explicit regularization. To reconcile this conflict, we present a generalization theory for overparametrized ReLU networks that incorporates an explicit regularizer based on the scaled variation norm. Interestingly, this regularizer is equivalent to ridge regularization from the angle of gradient-based optimization, yet is similar to the group lasso in terms of controlling model complexity. By exploiting this ridge-lasso duality, we show that overparametrization is generally harmless for two-layer ReLU networks. In particular, the overparametrized estimators are minimax optimal up to a logarithmic factor. By contrast, overparametrized random feature models suffer from the curse of dimensionality and are thus suboptimal.
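The ridge-lasso duality mentioned above can be illustrated numerically. Because the ReLU is positively homogeneous, rescaling each hidden neuron's incoming weights by s_j and its outgoing weight by 1/s_j leaves the network's function unchanged; minimizing the ridge (weight-decay) penalty over such rescalings yields a group-lasso-type penalty, the sum over neurons of |a_j| times the norm of w_j. The following toy sketch (not code from the paper; all names and sizes are illustrative) checks this on random weights:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 3                    # toy hidden width and input dimension
W = rng.normal(size=(m, d))    # inner-layer weights w_j (rows)
a = rng.normal(size=m)         # outer-layer weights a_j

# Ridge (weight-decay) penalty: (1/2) * sum_j (a_j^2 + ||w_j||^2)
ridge = 0.5 * (np.sum(a**2) + np.sum(W**2))

# Group-lasso-type penalty: sum_j |a_j| * ||w_j||_2
w_norms = np.linalg.norm(W, axis=1)
group = np.sum(np.abs(a) * w_norms)

# By AM-GM, ridge >= group for any weights.
# Rescaling (a_j, w_j) -> (a_j / s_j, s_j * w_j) preserves the network
# output (ReLU positive homogeneity); the balancing choice
# s_j = sqrt(|a_j| / ||w_j||) makes the ridge penalty equal the
# group-lasso penalty.
s = np.sqrt(np.abs(a) / w_norms)
ridge_balanced = 0.5 * (np.sum((a / s)**2) + np.sum((s[:, None] * W)**2))

print(ridge >= group)                     # True
print(np.isclose(ridge_balanced, group))  # True
```

Since gradient descent with weight decay effectively optimizes the ridge penalty, while generalization bounds hinge on the group-lasso-type complexity measure, the equality after balancing is what lets the two perspectives be traded off against each other.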