On generalization bounds for deep networks based on loss surface implicit regularization

by Masaaki Imaizumi et al.
The University of Tokyo

Classical statistical learning theory says that fitting too many parameters leads to overfitting and poor performance. The fact that modern deep neural networks generalize well despite their large number of parameters contradicts this finding and constitutes a major unsolved problem in explaining the success of deep learning. The implicit regularization induced by stochastic gradient descent (SGD) is regarded as important, but its specific principle is still unknown. In this work, we study how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that, under reasonable assumptions, the local geometry forces SGD to stay close to a low-dimensional subspace, which induces implicit regularization and yields tighter bounds on the generalization error of deep neural networks. To derive these generalization error bounds, we first introduce a notion of stagnation sets around local minima and impose a local essential convexity property on the population risk. Under these conditions, we derive lower bounds on the probability that SGD remains in these stagnation sets. If stagnation occurs, we obtain a bound on the generalization error of deep neural networks that involves the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values across SGD iterates and on local uniform convergence of the empirical loss functions, using the entropy of suitable neighborhoods around local minima. Our work attempts to better connect non-convex optimization and generalization analysis through uniform convergence.
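The analysis models the stochastic gradient as the true gradient corrupted by Gaussian noise. A minimal sketch of this idealization on a toy quadratic loss (the loss, learning rate, noise scale, and all other values are illustrative assumptions, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sgd(grad, w0, lr=0.1, sigma=0.01, steps=500):
    """SGD whose stochastic gradient is modeled as the true gradient
    plus isotropic Gaussian noise, the idealization studied here."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        g = grad(w) + sigma * rng.standard_normal(w.shape)
        w = w - lr * g
    return w

# Toy strongly convex loss L(w) = 0.5 * ||w||^2, so grad L(w) = w.
# The iterates contract toward the minimum and then fluctuate in a
# small noise-driven neighborhood around it (a "stagnation" region).
w_final = noisy_sgd(lambda w: w, w0=[5.0, -3.0])
```

In this convex toy case the stationary fluctuation scale is of order `lr * sigma`, so the iterate ends up very close to the minimum; the paper's point is that an analogous confinement near local minima can occur for non-convex deep-network losses under its geometric assumptions.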


