On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences

05/28/2023
by   Alireza Fathollah Pour, et al.

We consider the class of noisy multi-layered sigmoid recurrent neural networks with w (unbounded) weights for classification of sequences of length T, where independent noise distributed according to 𝒩(0, σ²) is added to the output of each neuron in the network. Our main result shows that the sample complexity of PAC learning this class can be bounded by O(w log(T/σ)). For the non-noisy version of the same class (i.e., σ = 0), we prove a lower bound of Ω(wT) on the sample complexity. Our results indicate an exponential gap in the dependence of sample complexity on T for noisy versus non-noisy networks. Moreover, given the mild logarithmic dependence of the upper bound on 1/σ, this gap still holds even for numerically negligible values of σ.
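To make the network class concrete, the following is a minimal sketch (not the paper's implementation) of a single-layer sigmoid RNN classifier in which independent 𝒩(0, σ²) noise is added to every neuron's output at every time step; setting σ = 0 recovers the noiseless network. The weight matrices `W_in`, `W_rec`, `W_out` and the function name are illustrative assumptions.

```python
import numpy as np

def noisy_sigmoid_rnn(x_seq, W_in, W_rec, W_out, sigma, rng=None):
    """Classify a length-T sequence with a sigmoid RNN whose neuron
    outputs are perturbed by independent N(0, sigma^2) noise.
    sigma=0 gives the deterministic (non-noisy) network."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h = np.zeros(W_rec.shape[0])          # hidden state, one entry per neuron
    for x_t in x_seq:                     # iterate over the T time steps
        pre = W_in @ x_t + W_rec @ h      # pre-activation at this step
        h = 1.0 / (1.0 + np.exp(-pre))    # sigmoid activation
        h = h + rng.normal(0.0, sigma, h.shape)  # per-neuron Gaussian noise
    # sign of a final linear readout gives the binary label
    return np.sign(W_out @ h)
```

A multi-layered version would stack such recurrences, adding fresh noise after each layer's sigmoid; the paper's upper bound depends only logarithmically on 1/σ, so even tiny noise levels here already place the network in the easier-to-learn class.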
