A Capacity Scaling Law for Artificial Neural Networks

08/20/2017
by Gerald Friedland, et al.

By assuming an ideal neural network with gating functions handling worst-case data, we derive two critical numbers that predict the behavior of perceptron networks. First, we derive what we call the lossless memory (LM) dimension. The LM dimension is a generalization of the Vapnik-Chervonenkis (VC) dimension that avoids structured data and therefore provides an upper bound for perfectly fitting any training data. Second, we derive what we call the MacKay (MK) dimension. This limit indicates necessary forgetting, that is, the lower limit for most generalization uses of the network. Our derivations are performed by embedding the ideal network into Shannon's communication model, which allows the two points to be interpreted as capacities measured in bits. We validate our upper bounds with repeatable experiments using different network configurations, diverse implementations, varying activation functions, and several learning algorithms. The bottom line is that the two capacity points scale strictly linearly with the number of weights. Among other practical applications, our result allows network implementations with gating functions (e.g., sigmoid or rectified linear units) to be evaluated against our upper limit independently of a concrete task.
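As a rough illustration of the kind of validation experiment the abstract describes, the sketch below probes how many randomly labeled points a tiny sigmoid network can fit perfectly and compares that against its weight count. It is only a minimal sketch under assumed settings: the network size, data distribution, training schedule, and "perfect fit" criterion are illustrative choices, not the paper's protocol or its derived LM/MK formulas.

```python
# Illustrative memorization probe (assumed setup, not the paper's exact experiment):
# train a small sigmoid MLP on random binary labels and report whether it reaches
# 100% training accuracy, for increasing numbers of points.
import numpy as np

rng = np.random.default_rng(0)

def can_memorize(n_points, n_inputs=4, n_hidden=8, epochs=5000, lr=0.5):
    """Return True if the toy network fits n_points random labels losslessly."""
    X = rng.standard_normal((n_points, n_inputs))
    y = rng.integers(0, 2, n_points).astype(float)

    W1 = rng.standard_normal((n_inputs, n_hidden)) * 0.5
    b1 = np.zeros(n_hidden)
    W2 = rng.standard_normal(n_hidden) * 0.5
    b2 = 0.0
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)           # hidden activations
        p = sigmoid(H @ W2 + b2)           # output probabilities
        err = p - y                        # gradient of cross-entropy wrt output logit
        gW2 = H.T @ err / n_points
        gb2 = err.mean()
        dH = np.outer(err, W2) * H * (1 - H)
        gW1 = X.T @ dH / n_points
        gb1 = dH.mean(axis=0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1

    acc = ((p > 0.5) == (y > 0.5)).mean()
    return acc == 1.0

# Total parameter count of the toy network (weights plus biases).
n_weights = 4 * 8 + 8 + 8 + 1
for n in (8, 16, 32, 64, 128):
    print(f"points={n:4d}  weights={n_weights}  perfect_fit={can_memorize(n)}")
```

Sweeping the number of points for networks of different sizes and locating where the perfect-fit rate drops is one simple way to check the claimed linear scaling of capacity with weight count; the thresholds and constants printed here carry no significance beyond the toy setup.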

