Learning Capacity: A Measure of the Effective Dimensionality of a Model

05/27/2023
by Daiwei Chen, et al.

We exploit a formal correspondence between thermodynamics and inference, in which the number of samples plays the role of the inverse temperature, to define a "learning capacity" that measures the effective dimensionality of a model. We show that the learning capacity is a tiny fraction of the number of parameters for many deep networks trained on typical datasets, depends upon the number of samples used for training, and is numerically consistent with notions of capacity obtained from the PAC-Bayesian framework. The test error as a function of the learning capacity does not exhibit double descent. We show that the learning capacity of a model saturates at very small and very large sample sizes; this provides guidelines as to whether one should procure more data or search for new architectures to improve performance. We show how the learning capacity can be used to understand the effective dimensionality even for non-parametric models such as random forests and k-nearest neighbor classifiers.
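As a rough sketch of the correspondence the abstract appeals to (the symbols and the exact expressions below are one plausible reading of that analogy, not necessarily the paper's own definitions): in statistical mechanics, with inverse temperature \beta and partition function Z(\beta), the average energy is

\[ U(\beta) = -\frac{\partial}{\partial \beta} \log Z(\beta), \]

and the heat capacity, which counts the effective degrees of freedom available to absorb energy, is

\[ C = \frac{\partial U}{\partial T} = \beta^{2}\,\frac{\partial^{2}}{\partial \beta^{2}} \log Z(\beta), \qquad T = 1/\beta. \]

Reading the number of samples N as \beta (so that the temperature is 1/N) and \log Z as a log-evidence/free-energy term of the trained model would suggest a "learning capacity" of the analogous form

\[ C(N) = N^{2}\,\frac{\partial^{2}}{\partial N^{2}} \log Z(N), \]

interpreted as the effective number of degrees of freedom the model uses when fit to N samples.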

