Does the Data Induce Capacity Control in Deep Learning?

10/27/2021
by Rubing Yang, et al.

This paper studies how the dataset itself may be the cause of the anomalous generalization performance of deep networks. We show that the data correlation matrix of typical classification datasets has an eigenspectrum where, after a sharp initial drop, a large number of small eigenvalues are distributed uniformly over an exponentially large range. This structure is mirrored in a network trained on this data: we show that the Hessian and the Fisher Information Matrix (FIM) have eigenvalues that are spread uniformly over exponentially large ranges. We call such eigenspectra "sloppy" because sets of weights corresponding to small eigenvalues can be changed by large magnitudes without affecting the loss. Networks trained on atypical, non-sloppy synthetic data do not share these traits. We show how this structure in the data can give rise to non-vacuous PAC-Bayes generalization bounds analytically; we also construct data-distribution-dependent priors that lead to accurate bounds using numerical optimization.
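The spectral diagnosis the abstract describes can be inspected with a few lines of NumPy. The sketch below is illustrative and not from the paper's code: `X` is a Gaussian placeholder standing in for a real dataset (e.g., flattened images from a typical classification benchmark), and the sloppiness check simply measures how many decades the eigenvalues span on a log scale.

```python
import numpy as np

# Illustrative sketch (not the paper's code): examine the eigenspectrum
# of a dataset's correlation matrix on a log scale. X is a Gaussian
# placeholder with shape (n_samples, n_features); substitute flattened
# images from a real classification dataset to reproduce the effect
# described in the abstract.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 784))

X = X - X.mean(axis=0)                 # center each feature
C = (X.T @ X) / X.shape[0]             # data correlation matrix
eigvals = np.linalg.eigvalsh(C)[::-1]  # eigenvalues, descending

# A "sloppy" spectrum shows a sharp initial drop followed by many small
# eigenvalues spread roughly uniformly over an exponentially large range,
# i.e., log-eigenvalues decreasing roughly linearly with index.
log_eigs = np.log10(np.maximum(eigvals, 1e-300))
print("decades spanned:", log_eigs[0] - log_eigs[-1])
print("top 5 eigenvalues:", eigvals[:5])
```

On the Gaussian placeholder the spectrum is not sloppy (it spans only a few decades); per the abstract, running the same check on a typical image classification dataset should instead reveal the sharp initial drop followed by eigenvalues spread over exponentially many decades.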
