Complexity, Statistical Risk, and Metric Entropy of Deep Nets Using Total Path Variation

02/02/2019
by Andrew R. Barron, et al.

For any ReLU network there is a representation in which the sum of the absolute values of the weights into each node is exactly 1, and the input layer variables are multiplied by a value V coinciding with the total variation of the path weights. Implications are given for Gaussian complexity, Rademacher complexity, statistical risk, and metric entropy, all of which are shown to be proportional to V. There is no dependence on the number of nodes per layer, except for the number of inputs d. For estimation with sub-Gaussian noise, the mean square generalization error bounds that can be obtained are of order V √(L + d)/√(n), where L is the number of layers and n is the sample size.
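
The normalized representation described above can be illustrated with a short numerical sketch. The code below is an illustrative assumption, not code from the paper: it takes a bias-free ReLU network, uses the positive homogeneity relu(c·z) = c·relu(z) for c ≥ 0 to push each node's incoming ℓ1 scale into the next layer, and checks that the single leftover scalar coincides with the total path variation V and that the original function is recovered when the input variables are multiplied by V.

```python
import numpy as np

def path_variation(weights):
    """Total path variation V: the sum over all input-output paths of the
    product of the absolute weights along the path, i.e. the sum of the
    entries of |W_L| @ ... @ |W_1|."""
    M = np.abs(weights[0])
    for W in weights[1:]:
        M = np.abs(W) @ M
    return M.sum()

def normalize_paths(weights):
    """Rescale a bias-free ReLU net so the absolute weights into each node
    sum to 1, pushing every per-node scale forward through the layers.
    Returns the rescaled weights and the leftover scalar, which equals the
    path variation when no node has all-zero incoming weights."""
    W = [np.array(Wl, dtype=float) for Wl in weights]
    for l in range(len(W) - 1):
        s = np.abs(W[l]).sum(axis=1)        # incoming l1-norm of each node in layer l
        s = np.where(s == 0.0, 1.0, s)      # leave dead nodes untouched
        W[l] /= s[:, None]                  # each row now has absolute sum 1
        W[l + 1] *= s[None, :]              # compensate in the outgoing weights
    V = np.abs(W[-1]).sum()                 # normalize the output node as well
    W[-1] /= V
    return W, V

def forward(weights, x):
    """f(x) = W_L relu(W_{L-1} ... relu(W_1 x)), no biases."""
    h = np.asarray(x, dtype=float)
    for Wl in weights[:-1]:
        h = np.maximum(Wl @ h, 0.0)
    return weights[-1] @ h

# A random 3-layer example with d = 3 inputs (sizes chosen only for illustration).
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((5, 3)),
      rng.standard_normal((4, 5)),
      rng.standard_normal((1, 4))]
normed, V = normalize_paths(Ws)
x = rng.standard_normal(3)

print(np.isclose(V, path_variation(Ws)))                     # leftover scale equals V
print(np.allclose(forward(Ws, x), forward(normed, V * x)))   # same function with inputs scaled by V
```

Because the network has no biases, multiplying the inputs by V is the same as multiplying the output by V, so either placement of the scale gives the normalized representation in the abstract.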


Related research

10/06/2018 · Total variation distance for discretely observed Lévy processes: a Gaussian approximation of the small jumps
It is common practice to treat small jumps of Lévy processes as Wiener n...

09/10/2018 · Approximation and Estimation for High-Dimensional Deep Learning Networks
It has been experimentally observed in recent years that multi-layer art...

06/08/2021 · Entropy of the Conditional Expectation under Gaussian Noise
This paper considers an additive Gaussian noise channel with arbitrarily...

09/04/2019 · Learning Distributions Generated by One-Layer ReLU Networks
We consider the problem of estimating the parameters of a d-dimensional ...

08/12/2021 · Statistical Learning using Sparse Deep Neural Networks in Empirical Risk Minimization
We consider a sparse deep ReLU network (SDRN) estimator obtained from em...

07/21/2023 · What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Attention layers – which map a sequence of inputs to a sequence of outpu...

02/13/2019 · Variance-Preserving Initialization Schemes Improve Deep Network Training: But Which Variance is Preserved?
Before training a neural net, a classic rule of thumb is to randomly ini...
