Deep learning: a statistical viewpoint
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to nonconvex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting. We survey recent theoretical progress that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behavior of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favorable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
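The implicit regularization mentioned in the abstract can be illustrated in the simplest case, overparametrized linear least squares. The sketch below is not taken from the paper: the toy data, step size, and function names are assumptions chosen for illustration. It runs plain gradient descent from a zero initialization on a problem with more parameters than samples and compares the result with the minimum-norm interpolating solution in closed form.

```python
# Minimal sketch (illustrative assumptions throughout): gradient descent on an
# overparametrized least-squares problem, started at zero, approaches the
# minimum-norm solution that fits the training data exactly.

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def gradient_descent(X, y, lr=0.05, steps=5000):
    """Minimize 0.5 * ||X w - y||^2 by gradient descent from w = 0."""
    w = [0.0] * len(X[0])
    Xt = transpose(X)
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(X, w), y)]
        grad = matvec(Xt, residual)  # gradient is X^T (X w - y)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def min_norm_interpolant(X, y):
    """Closed form w = X^T (X X^T)^{-1} y, written out for a 2-row X."""
    g00 = sum(a * a for a in X[0])
    g11 = sum(a * a for a in X[1])
    g01 = sum(a * b for a, b in zip(X[0], X[1]))
    det = g00 * g11 - g01 * g01
    a0 = (g11 * y[0] - g01 * y[1]) / det
    a1 = (g00 * y[1] - g01 * y[0]) / det
    return matvec(transpose(X), [a0, a1])

# Toy data (hypothetical): 2 samples, 4 parameters, so many interpolants exist.
X = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, -1.0]]
y = [1.0, 2.0]

w_gd = gradient_descent(X, y)
w_mn = min_norm_interpolant(X, y)
fit = matvec(X, w_gd)  # predictions on the training points
```

Starting from zero matters here: every gradient step lies in the row space of `X`, so the iterates never pick up a component orthogonal to the data, which is why the limit is the minimum-norm interpolant rather than some other exact fit.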