On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation

11/17/2022
by Amit Daniely, et al.

We investigate the sample complexity of bounded two-layer neural networks under different activation functions. In particular, we consider the class ℋ = {x ↦ ⟨v, σ(Wx + b)⟩ : b ∈ ℝ^T, W ∈ ℝ^{T×d}, v ∈ ℝ^T}, where the spectral norm of W and the norm of v are bounded by O(1), the distance of W from its initialization in Frobenius norm is bounded by R > 0, and σ is a Lipschitz activation function. We prove that if σ is applied element-wise, then the sample complexity of ℋ is independent of the width T, and that this bound is tight. Moreover, we show that the element-wise property of σ is essential for a width-independent bound, in the sense that there exist Lipschitz but non-element-wise activation functions whose sample complexity is provably width-dependent. For the upper bound, we use the recent approach to norm-based bounds called Approximate Description Length (ADL), introduced in arXiv:1910.05697. We further develop new techniques and tools for this approach, which we hope will inspire future work.
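As an illustration of the setup (not taken from the paper), the following minimal NumPy sketch constructs one member of a class of this form with an element-wise Lipschitz activation and evaluates the quantities the bounds are stated in terms of. The width T, input dimension d, the O(1) norm bound, and the radius R are arbitrary values chosen for the example.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's construction):
# one member of H = { x -> <v, sigma(W x + b)> }, W in R^{T x d}, b, v in R^T,
# with an element-wise Lipschitz activation (ReLU) and the norm constraints
# from the abstract made explicit. T, d, and R are arbitrary choices.

rng = np.random.default_rng(0)
d, T = 20, 512        # input dimension and width (width-dependence is the question)
R = 1.0               # allowed Frobenius distance of W from its initialization

W0 = rng.standard_normal((T, d))
W0 /= np.linalg.norm(W0, 2)                 # spectral norm of the initialization is 1
Delta = rng.standard_normal((T, d))
Delta *= R / np.linalg.norm(Delta, "fro")   # perturbation with Frobenius norm exactly R
W = W0 + Delta                              # so ||W - W0||_F = R and ||W||_2 <= 1 + R

b = rng.standard_normal(T) / np.sqrt(T)
v = rng.standard_normal(T)
v /= np.linalg.norm(v)                      # ||v|| = 1

def relu(z):
    """Element-wise (coordinate-wise) 1-Lipschitz activation."""
    return np.maximum(z, 0.0)

def h(x):
    """The network x -> <v, sigma(W x + b)>."""
    return v @ relu(W @ x + b)

# Quantities the sample-complexity bounds are phrased in terms of:
x = rng.standard_normal(d) / np.sqrt(d)
print(f"h(x)         = {h(x): .4f}")
print(f"||W||_2      = {np.linalg.norm(W, 2): .4f}")              # O(1) by construction
print(f"||v||        = {np.linalg.norm(v): .4f}")                 # O(1) by construction
print(f"||W - W0||_F = {np.linalg.norm(W - W0, 'fro'): .4f}")     # equals R
```

A non-element-wise activation in this setting would be any Lipschitz map σ: ℝ^T → ℝ^T that mixes coordinates (rather than applying a fixed scalar function to each coordinate), which is the case the paper shows can force width-dependent sample complexity.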

Related research

10/13/2019 · Generalization Bounds for Neural Networks via Approximate Description Length
We investigate the sample complexity of networks with bounds on the magn...

02/13/2022 · The Sample Complexity of One-Hidden-Layer Neural Networks
We study norm-based uniform convergence bounds for neural networks, aimi...

03/02/2021 · Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks
We consider the problem of finding a two-layer neural network with sigmo...

04/13/2022 · Approximation of Lipschitz Functions using Deep Spline Neural Networks
Lipschitz-constrained neural networks have many applications in machine ...

11/12/2019 · Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks
We study the sample complexity of learning one-hidden-layer convolutiona...

06/01/2023 · Provable Benefit of Mixup for Finding Optimal Decision Boundaries
We investigate how pair-wise data augmentation techniques like Mixup aff...

05/28/2023 · On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences
We consider the class of noisy multi-layered sigmoid recurrent neural ne...