Generalization Bounds for Neural Networks via Approximate Description Length

10/13/2019
by   Amit Daniely, et al.

We investigate the sample complexity of networks with bounds on the magnitude of their weights. In particular, we consider the class H = {W_t ∘ ρ ∘ … ∘ ρ ∘ W_1 : W_1, …, W_{t-1} ∈ M_{d,d}, W_t ∈ M_{1,d}}, where the spectral norm of each W_i is bounded by O(1), the Frobenius norm of each W_i is bounded by R, and ρ is the sigmoid function e^x/(1+e^x) or the smoothed ReLU function ln(1+e^x). We show that for any depth t, if the inputs are in [-1,1]^d, the sample complexity of H is Õ(dR^2/ϵ^2). This bound is optimal up to log factors and substantially improves over the previous state of the art of Õ(d^2R^2/ϵ^2). We furthermore show that this bound remains valid if, instead of bounding the magnitude of the W_i's, we bound the magnitude of W_i - W_i^0, where the W_i^0 are fixed reference matrices with spectral norm O(1). By taking the W_i^0 to be the weight matrices at the onset of training, we obtain sample complexity bounds that are sub-linear in the number of parameters in many typical parameter regimes. To establish our results, we develop a new technique for analyzing the sample complexity of families H of predictors. We start by defining a new notion of a randomized approximate description of functions f: X → R^d. We then show that if there is a way to approximately describe the functions in a class H using d bits, then d/ϵ^2 examples suffice to guarantee uniform convergence; namely, the empirical loss of every function in the class is ϵ-close to its true loss. Finally, we develop a set of tools for calculating the approximate description length of classes of functions that can be presented as compositions of linear function classes and non-linear functions.
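
For concreteness, here is a minimal NumPy sketch of a member of the class H and of the resulting Õ(dR^2/ϵ^2) sample count. The dimensions d = 100, depth t = 3, accuracy ϵ = 0.1, and the Gaussian 1/√d weight scale are hypothetical illustration choices, not taken from the paper, and the printed sample count ignores the constants and log factors hidden in the Õ notation.

import numpy as np

def smoothed_relu(x):
    # rho(x) = ln(1 + e^x), computed stably via logaddexp
    return np.logaddexp(0.0, x)

def network(weights, x, rho=smoothed_relu):
    # Evaluate W_t o rho o ... o rho o W_1 on an input x in [-1, 1]^d
    h = x
    for W in weights[:-1]:
        h = rho(W @ h)
    return weights[-1] @ h  # final layer W_t in M_{1,d}, no nonlinearity

def norms(W, W0=None):
    # Spectral and Frobenius norms of W - W0 (W0 defaults to zero)
    D = W if W0 is None else W - W0
    return np.linalg.norm(D, 2), np.linalg.norm(D, 'fro')

# Hypothetical instance of the class H
d, t, eps = 100, 3, 0.1
rng = np.random.default_rng(0)
weights = [rng.normal(scale=1.0 / np.sqrt(d), size=(d, d)) for _ in range(t - 1)]
weights.append(rng.normal(scale=1.0 / np.sqrt(d), size=(1, d)))

spec = max(norms(W)[0] for W in weights)  # spectral norms stay O(1) at this scale
R = max(norms(W)[1] for W in weights)     # Frobenius-norm bound R (about sqrt(d) here)
samples = d * R**2 / eps**2               # the O~(d R^2 / eps^2) bound, up to log factors

x = rng.uniform(-1.0, 1.0, size=d)        # inputs lie in [-1, 1]^d
print(f"max spectral norm {spec:.2f}, R ~ {R:.2f}, "
      f"output {network(weights, x)[0]:.3f}, ~{samples:.0f} samples by the bound")

To illustrate the bound relative to reference matrices, one would instead pass the initialization matrices as W0 to norms and take R to be the largest Frobenius norm of W_i - W_i^0, which is typically much smaller than the norm of W_i itself.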


