
Uniform convergence may be unable to explain generalization in deep learning
We cast doubt on the power of uniform convergence-based generalization b...

Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning
Stochastic methods with coordinate-wise adaptive stepsize (such as RMSpr...

Generalization in Deep Networks: The Role of Distance from Initialization
Why does training deep neural networks using stochastic gradient descent...

Towards Understanding Generalization via Decomposing Excess Risk Dynamics
Generalization is one of the critical issues in machine learning. Howeve...

Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models
Recent work showed that there could be a large gap between the classical...

Gradient Descent in RKHS with Importance Labeling
Labeling cost is often expensive and is a fundamental limitation of supe...

Uniform Convergence, Adversarial Spheres and a Simple Remedy
Previous work has cast doubt on the general framework of uniform converg...
Explaining generalization in deep learning: progress and fundamental limits
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized, and fitting the training data to zero error?

In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive data-dependent, uniform-convergence-based generalization bounds with improved dependencies on the parameter count.

Uniform convergence has in fact been the most widely used tool in the deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, any uniform convergence bound will provide only a vacuous generalization bound.

With this realization in mind, in the last part of the thesis, we will change course and introduce an empirical technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.
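The abstract does not spell out the unlabeled-data estimator itself. As a hedged illustration of the general idea, one well-known approach from the broader literature is to measure how often two independently trained models disagree on an unlabeled pool and use that disagreement rate as a label-free proxy for test error. Whether this matches the thesis's exact technique is an assumption; the dataset, model, and all names below are purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Illustrative sketch only: estimate generalization without labels by
# measuring disagreement between two models trained with different seeds.
# This is one technique from the literature, not necessarily the thesis's.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, y_train = X[:200], y[:200]
X_unlabeled = X[200:]  # labels for this pool are deliberately never used

# Train two identical architectures that differ only in random seed.
models = [
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                  random_state=seed).fit(X_train, y_train)
    for seed in (1, 2)
]

# Disagreement rate on unlabeled data serves as the label-free estimate.
preds = [m.predict(X_unlabeled) for m in models]
disagreement = float(np.mean(preds[0] != preds[1]))
print(f"disagreement rate on unlabeled data: {disagreement:.3f}")
```

The estimate requires no test labels at all: only a second training run and a pool of unlabeled inputs drawn from the same distribution.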