Global Capacity Measures for Deep ReLU Networks via Path Sampling

by Ryan Theisen et al.

Classical results on the statistical complexity of linear models have commonly identified the norm of the weight vector w as a fundamental capacity measure. Generalizations of this measure to deep networks have varied, though a frequently studied quantity is the product of the weight norms of the individual layers. In this work, we show that for a large class of networks possessing a positive homogeneity property, similar bounds may be obtained instead in terms of the norm of the product of weights. Our proof technique generalizes a recently proposed sampling argument, which allows us to demonstrate the existence of sparse approximants of positive homogeneous networks. This yields covering number bounds, which can be converted to generalization bounds for multi-class classification that are comparable to, and in certain cases improve upon, existing results in the literature. Finally, we investigate our sampling procedure empirically, obtaining results consistent with our theory.
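The distinction between the two capacity measures can be made concrete for a deep linear network: the spectral norm is submultiplicative, so the "norm of the product of weights" is never larger than the "product of weight norms", and is typically strictly smaller. Below is a minimal numerical sketch of this fact; the layer widths and random Gaussian weights are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-layer linear network with random Gaussian weights.
Ws = [rng.standard_normal((20, 20)) / np.sqrt(20) for _ in range(3)]

# "Product of weight norms": prod_i ||W_i||  (spectral norm per layer).
prod_of_norms = np.prod([np.linalg.norm(W, 2) for W in Ws])

# "Norm of the product of weights": ||W3 W2 W1||.
W_prod = Ws[2] @ Ws[1] @ Ws[0]
norm_of_prod = np.linalg.norm(W_prod, 2)

# Submultiplicativity of the spectral norm guarantees:
assert norm_of_prod <= prod_of_norms + 1e-9
print(norm_of_prod, prod_of_norms)
```

For random matrices the gap between the two quantities grows with depth, which is one intuition for why bounds in terms of the norm of the product can be tighter.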
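To give a flavor of the sampling idea, here is a generic sketch (not the paper's exact procedure) for a one-hidden-layer ReLU network. Positive homogeneity of the ReLU lets scale be moved freely between layers, so each hidden unit can be sampled with probability proportional to |v_j|·||w_j||; importance weighting then yields a sparse, unbiased approximant of the network output. All names and dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, k = 10, 500, 50  # input dim, hidden width, number of sampled units

W = rng.standard_normal((m, d))   # first-layer weights
v = rng.standard_normal(m) / m    # output weights

def relu(z):
    return np.maximum(z, 0.0)

def f(x, W, v):
    return v @ relu(W @ x)

# Sample hidden units with probability proportional to |v_j| * ||w_j||,
# a natural per-unit "path weight" under positive homogeneity.
path_weights = np.abs(v) * np.linalg.norm(W, axis=1)
p = path_weights / path_weights.sum()
idx = rng.choice(m, size=k, p=p)

# Importance-weighted sparse output layer: E[f(x, W, v_hat)] = f(x, W, v).
v_hat = np.zeros(m)
for j in idx:
    v_hat[j] += v[j] / (k * p[j])

x = rng.standard_normal(d)
approx = f(x, W, v_hat)  # uses at most k hidden units
exact = f(x, W, v)
```

The sparse network `v_hat` touches at most k of the m hidden units, which is what makes a covering-number argument over such approximants possible.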

