Computing the Information Content of Trained Neural Networks

03/01/2021
by Jeremy Bernstein, et al.

How much information does a learning algorithm extract from the training data and store in a neural network's weights? Too much, and the network would overfit to the training data. Too little, and the network would not fit the data at all. Naïvely, the amount of information the network stores should scale in proportion to the number of trainable weights. This raises the question: how can neural networks with vastly more weights than training data still generalise? A simple resolution to this conundrum is that the number of weights is usually a poor proxy for the actual amount of information stored. For instance, typical weight vectors may be highly compressible. This prompts a further question: is it possible to compute the actual amount of information stored? This paper derives both a consistent estimator and a closed-form upper bound on the information content of infinitely wide neural networks. The derivation is based on an identification between neural information content and the negative log probability of a Gaussian orthant. This identification yields bounds that analytically control the generalisation behaviour of the entire solution space of infinitely wide networks. The bounds have a simple dependence on both the network architecture and the training data. Corroborating the findings of Valle-Pérez et al. (2019), who conducted a similar analysis using approximate Gaussian integration techniques, the bounds are found to be both non-vacuous and correlated with the empirical generalisation behaviour at finite width.

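To make the orthant identification concrete, the sketch below estimates by Monte Carlo the Gaussian orthant probability that a function drawn from a zero-mean Gaussian process prior matches all binary training labels, and reports the negative log2 of that probability as an information content in bits. This is a minimal sketch, not the paper's estimator or closed-form bound: the RBF kernel is a placeholder standing in for an infinite-width (NNGP) kernel, and the function name orthant_information and the toy data are hypothetical.

```python
# Minimal sketch: Monte Carlo estimate of the Gaussian orthant probability
# P(sign(f_i) = y_i for all i) for f ~ N(0, K), and the corresponding
# information content -log2 of that probability, in bits.
# The kernel K would come from an infinite-width (NNGP) kernel in the paper's
# setting; an RBF kernel is used here purely as a placeholder.

import numpy as np

def orthant_information(K, y, n_samples=100_000, seed=0):
    """Estimate -log2 P(sign(f) == y) for f ~ N(0, K) by Monte Carlo."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    # Cholesky factor with a small jitter for numerical stability.
    L = np.linalg.cholesky(K + 1e-8 * np.eye(n))
    z = rng.standard_normal((n_samples, n))
    f = z @ L.T                                # samples from N(0, K)
    hits = np.all(np.sign(f) == y, axis=1)     # every label matched?
    p_hat = hits.mean()
    return -np.log2(p_hat) if p_hat > 0 else np.inf

# Toy example: placeholder RBF kernel on random inputs with arbitrary labels.
X = np.random.default_rng(1).standard_normal((10, 3))
y = np.sign(X[:, 0])                           # binary labels in {-1, +1}
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)

print(f"Estimated information content: {orthant_information(K, y):.2f} bits")
```

For larger training sets the orthant probability becomes too small to hit by naive sampling, which is one reason a consistent estimator and a closed-form upper bound, as derived in the paper, are needed in place of this direct Monte Carlo approach.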
Related research

07/11/2023  Fundamental limits of overparametrized shallow neural networks for supervised learning
We carry out an information-theoretical analysis of a two-layer neural n...

02/08/2022  Width is Less Important than Depth in ReLU Neural Networks
We solve an open question from Lu et al. (2017), by showing that any tar...

06/15/2022  Reconstructing Training Data from Trained Neural Networks
Understanding to what extent neural networks memorize training data is a...

11/12/2019  Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks
We explore the problem of selectively forgetting a particular set of dat...

10/08/2021  On the Implicit Biases of Architecture Gradient Descent
Do neural networks generalise because of bias in the functions returned ...

04/21/2021  MLDS: A Dataset for Weight-Space Analysis of Neural Networks
Neural networks are powerful models that solve a variety of complex real...

09/28/2022  Algorithm Unfolding for Block-sparse and MMV Problems with Reduced Training Overhead
In this paper we consider algorithm unfolding for the Multiple Measureme...