On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation

11/17/2022
by Amit Daniely et al.

We investigate the sample complexity of bounded two-layer neural networks with different activation functions. In particular, we consider the class ℋ = {x ↦ ⟨v, σ(Wx + b)⟩ : b ∈ ℝ^T, W ∈ ℝ^{T×d}, v ∈ ℝ^T}, where the spectral norms of W and v are bounded by O(1), the Frobenius-norm distance of W from its initialization is bounded by R > 0, and σ is a Lipschitz activation function. We prove that if σ is applied element-wise, then the sample complexity of ℋ is width-independent, and that this bound is tight. Moreover, we show that the element-wise property of σ is essential for a width-independent bound: there exist non-element-wise Lipschitz activation functions whose sample complexity is provably width-dependent. For the upper bound, we use the recent Approximate Description Length (ADL) approach to norm-based bounds of arXiv:1910.05697. We further develop new techniques and tools for this approach, which we hope will inspire future work.
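
As a rough illustration of the hypothesis class (not taken from the paper), the sketch below instantiates one member of ℋ with an element-wise 1-Lipschitz activation (ReLU) and checks the stated norm constraints. The width T, input dimension d, radius R, the initialization W0, and the 1/√T scaling are placeholder assumptions, not the paper's construction.

```python
# Minimal sketch: one hypothesis from H = {x -> <v, sigma(Wx + b)>}
# under the norm constraints stated in the abstract. All concrete
# choices (T, d, R, W0, scaling) are illustrative assumptions.
import numpy as np

d, T, R = 10, 512, 1.0                      # input dim, width, Frobenius radius
rng = np.random.default_rng(0)

W0 = rng.normal(size=(T, d)) / np.sqrt(T)   # illustrative init; spectral norm O(1)
delta = rng.normal(size=(T, d))
delta *= R / np.linalg.norm(delta)          # ||delta||_F = R
W = W0 + delta                              # within Frobenius radius R of W0
b = np.zeros(T)
v = rng.normal(size=T) / np.sqrt(T)         # ||v|| = O(1)

def h(x, sigma=lambda z: np.maximum(z, 0.0)):
    """One member of H: x |-> <v, sigma(Wx + b)>, sigma applied element-wise."""
    return v @ sigma(W @ x + b)

# Sanity check of the constraint on the distance from initialization.
assert np.linalg.norm(W - W0) <= R + 1e-9
print(h(rng.normal(size=d)))                # scalar output of the network
```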
