Explaining generalization in deep learning: progress and fundamental limits

10/17/2021
by Vaishnavh Nagarajan

This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized, and fitting the training data to zero error? In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive data-dependent, uniform-convergence-based generalization bounds with improved dependence on the parameter count. Uniform convergence has in fact been the most widely used tool in the deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, any uniform convergence bound provides only a vacuous generalization bound. With this realization in mind, in the last part of the thesis we will change course and introduce an empirical technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision. We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions into generalization bounds (such as in the form of unlabeled data) and explore other tools for deriving bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.
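For context, a uniform-convergence-based bound typically takes the generic form below. This is the standard template from statistical learning theory, not the specific data-dependent bound derived in the thesis; the complexity term stands in for whatever capacity measure (norms, Rademacher complexity, etc.) a particular analysis uses.

```latex
% Generic template of a uniform-convergence generalization bound:
% with probability at least 1 - \delta over an i.i.d. sample S of size m,
% simultaneously for every hypothesis h in the class \mathcal{H},
\[
  L_{\mathcal{D}}(h) \;\le\; \widehat{L}_{S}(h)
  \;+\; \mathcal{O}\!\left(\sqrt{\frac{\mathrm{complexity}(\mathcal{H}) + \log(1/\delta)}{m}}\right)
  \qquad \forall\, h \in \mathcal{H},
\]
% where L_D(h) is the population risk and \widehat{L}_S(h) the empirical risk.
% A bound of this form is "vacuous" when the right-hand side exceeds the
% trivial error of 1, which is what the thesis shows can happen for any
% such bound in certain overparameterized settings.
```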
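As an illustration of what an unlabeled-data-based estimate of generalization can look like, here is a minimal sketch of one well-known estimator of this flavor: the disagreement rate between two independently trained models on unlabeled inputs. The perceptron setup, function names, and synthetic task below are illustrative assumptions, not the thesis's actual technique or experimental pipeline.

```python
import numpy as np

def disagreement_rate(predict_a, predict_b, unlabeled_x):
    """Fraction of unlabeled inputs on which two independently trained
    classifiers disagree; used as a label-free proxy for test error."""
    return float(np.mean(predict_a(unlabeled_x) != predict_b(unlabeled_x)))

# Toy usage: two perceptrons trained on independent draws of a synthetic
# linearly separable task (setup and names are illustrative only).
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)

def sample(n):
    x = rng.normal(size=(n, 5))
    y = (x @ w_true > 0).astype(int)
    return x, y

def train_perceptron(x, y, epochs=10):
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            w += (yi - int(xi @ w > 0)) * xi
    return lambda inputs: (inputs @ w > 0).astype(int)

model_a = train_perceptron(*sample(200))
model_b = train_perceptron(*sample(200))
unlabeled_x, _ = sample(1000)
print("disagreement-based error estimate:",
      disagreement_rate(model_a, model_b, unlabeled_x))
```

Note that the estimator never touches ground-truth labels for the evaluation pool; it only requires a second, independently trained copy of the model and a stream of unlabeled inputs.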

