Dimension Independent Generalization Error with Regularized Online Optimization

03/25/2020
by Xi Chen, et al.

A classical canon of statistics holds that large models are prone to overfitting and that model selection procedures are necessary for high-dimensional data. However, many overparameterized models, such as neural networks, which are often trained with simple online methods and regularization, perform very well in practice. The empirical success of overparameterized models, often referred to as benign overfitting, motivates us to take a new look at the statistical generalization theory for online optimization. In particular, we present a general theory on the generalization error of stochastic gradient descent (SGD) for both convex and non-convex loss functions. We further define a "low effective dimension" under which the generalization error either does not depend on the ambient dimension p or depends on p only through a poly-logarithmic factor. We also demonstrate on several widely used statistical models that this low effective dimension arises naturally in overparameterized settings. The statistical applications studied include convex models, such as linear regression and logistic regression, and non-convex models, such as M-estimators and two-layer neural networks.
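The setting described above can be illustrated with a minimal sketch of single-pass SGD with a ridge penalty on an overparameterized linear regression (more parameters p than samples n). This is not the paper's procedure or its definition of low effective dimension. The squared loss, the L2 penalty `lam`, the `lr / sqrt(t)` step size, the 1/j decay of feature scales, and the names `make_data` and `regularized_sgd` are all assumptions chosen for illustration.

```python
import numpy as np


def make_data(n, p, rng, noise=0.1):
    """Synthetic regression data (illustrative assumption, not from the paper).

    Feature j has standard deviation 1/j, so most of the variance sits in a
    few leading coordinates even though the ambient dimension p is large.
    """
    scales = 1.0 / np.arange(1, p + 1)
    X = rng.standard_normal((n, p)) * scales
    w_star = np.zeros(p)
    w_star[:5] = 1.0  # hypothetical ground truth on the leading coordinates
    y = X @ w_star + noise * rng.standard_normal(n)
    return X, y


def regularized_sgd(X, y, lam=1e-3, lr=0.1):
    """One pass of SGD on squared loss plus an L2 penalty (lam / 2) * ||w||^2."""
    n, p = X.shape
    w = np.zeros(p)
    for i in range(n):
        residual = X[i] @ w - y[i]
        grad = residual * X[i] + lam * w  # stochastic gradient for sample i
        w -= lr / np.sqrt(i + 1) * grad   # decaying step size (assumed schedule)
    return w


rng = np.random.default_rng(0)
n, p = 300, 1000  # overparameterized: p > n
X_train, y_train = make_data(n, p, rng)
X_test, y_test = make_data(2000, p, rng)

w_hat = regularized_sgd(X_train, y_train)
test_mse = np.mean((X_test @ w_hat - y_test) ** 2)
baseline_mse = np.mean(y_test ** 2)  # error of the all-zeros predictor
print(f"test MSE: {test_mse:.3f}, zero-predictor MSE: {baseline_mse:.3f}")
```

Comparing the held-out error against the all-zeros predictor gives a rough check on whether one regularized online pass generalizes despite p exceeding n. It says nothing about the rates established in the paper.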

research
02/10/2021

Stability of SGD: Tightness Analysis and Improved Bounds

Stochastic Gradient Descent (SGD) based methods have been widely used fo...
research
01/12/2022

On generalization bounds for deep networks based on loss surface implicit regularization

The classical statistical learning theory says that fitting too many par...
research
09/22/2019

A generalization of regularized dual averaging and its dynamics

Excessive computational cost for learning large data and streaming data ...
research
06/11/2020

Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

We consider a commonly studied supervised classification of a synthetic ...
research
10/10/2017

High-dimensional dynamics of generalization error in neural networks

We perform an average case analysis of the generalization dynamics of la...
research
10/08/2018

A Unified Dynamic Approach to Sparse Model Selection

Sparse model selection is ubiquitous from linear regression to graphical...
research
09/26/2021

Data Summarization via Bilevel Optimization

The increasing availability of massive data sets poses a series of chall...