Fundamental limits of overparametrized shallow neural networks for supervised learning

07/11/2023
by Francesco Camilli et al.

We carry out an information-theoretic analysis of a two-layer neural network trained on input-output pairs generated by a teacher network with matching architecture, in overparametrized regimes. Our results come in the form of bounds relating (i) the mutual information between training data and network weights, or (ii) the Bayes-optimal generalization error, to the same quantities for a simpler (generalized) linear model for which explicit expressions are rigorously known. Our bounds, which are expressed in terms of the number of training samples, the input dimension and the number of hidden units, thus yield fundamental performance limits for any neural network (and in fact any learning procedure) trained on limited data generated according to our two-layer teacher neural network model. The proof relies on rigorous tools from the theory of spin glasses and is guided by “Gaussian equivalence principles” lying at the core of numerous recent analyses of neural networks. In contrast with the existing literature, which is either non-rigorous or restricted to the case where only the readout weights are learned, our results are information-theoretic (i.e., not specific to any learning algorithm) and, importantly, cover a setting where all the network parameters are trained.
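To make the setup concrete, the following is a minimal sketch (in Python with NumPy, not code from the paper) of the teacher-student data generation described above: a two-layer teacher network produces noisy labels for Gaussian inputs, and any learning procedure only observes the resulting input-output pairs. The tanh activation, the noise level sigma, and the particular overparametrized scaling of the hidden width k are illustrative assumptions; the abstract does not fix them.

import numpy as np

rng = np.random.default_rng(0)

d, k, n = 100, 200, 500   # input dimension, hidden units (overparametrized: k > d), samples
sigma = 0.1               # label noise level (assumed for illustration)

# Teacher weights: first layer W* (k x d) and readout a* (k,)
W_star = rng.standard_normal((k, d)) / np.sqrt(d)
a_star = rng.standard_normal(k) / np.sqrt(k)

# Training data: i.i.d. standard Gaussian inputs with noisy teacher labels
X = rng.standard_normal((n, d))
y = np.tanh(X @ W_star.T) @ a_star + sigma * rng.standard_normal(n)

# A student with matching architecture would be trained on (X, y); the paper's
# bounds constrain the mutual information between (X, y) and the teacher weights,
# and hence the Bayes-optimal generalization error, in terms of n, d and k.
print(X.shape, y.shape)

The point of the sketch is only to fix notation for the quantities the bounds depend on (n samples, input dimension d, k hidden units); the paper's analysis applies to any learner given such data, not to any particular training algorithm.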



Related research

09/18/2022 · Is Stochastic Gradient Descent Near Optimal?
The success of neural networks over the past decade has established them...

03/01/2021 · Computing the Information Content of Trained Neural Networks
How much information does a learning algorithm extract from the training...

07/07/2022 · Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data
This paper analyzes the convergence and generalization of training a one...

02/03/2022 · Non-Vacuous Generalisation Bounds for Shallow Neural Networks
We focus on a specific class of shallow neural networks with a single hi...

06/14/2018 · The committee machine: Computational to statistical gaps in learning a two-layers neural network
Heuristic tools from statistical physics have been used in the past to l...

06/10/2023 · Any-dimensional equivariant neural networks
Traditional supervised learning aims to learn an unknown mapping by fitt...

04/30/2015 · Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
Three important properties of a classification machinery are: (i) the sy...
