Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization)

09/15/2022
by Zhenyu Zhu et al.

We study the average robustness notion of deep neural networks in (selected) wide and narrow, deep and shallow, as well as lazy and non-lazy training settings. We prove that in the under-parameterized setting width has a negative effect on robustness, while in the over-parameterized setting width improves it. The effect of depth depends closely on the initialization and the training mode. In particular, under LeCun initialization, depth helps robustness in the lazy training regime; in contrast, under Neural Tangent Kernel (NTK) and He initialization, depth hurts robustness. Moreover, in the non-lazy training regime, we demonstrate how the width of a two-layer ReLU network benefits robustness. Our theoretical developments improve on the results of Huang et al. [2021] and Wu et al. [2021], and are consistent with Bubeck and Sellke [2021] and Bubeck et al. [2021].
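The initialization schemes contrasted in the abstract can be illustrated numerically. Below is a minimal sketch, not the authors' method or code: it estimates an average-robustness proxy, the mean input-gradient norm E_x ||∇_x f(x)||, for a randomly initialized two-layer ReLU network under the LeCun and He schemes at several widths. The function names, the sphere-distributed input model, and the choice of proxy are illustrative assumptions, not details from the paper.

```python
# Sketch (assumed setup, not the paper's code): average input-gradient norm
# of a two-layer ReLU net f(x) = a^T relu(W x) under LeCun vs. He init.
import numpy as np

rng = np.random.default_rng(0)

def init_two_layer(d, m, scheme="lecun"):
    """Draw Gaussian weights with variance set by the named scheme."""
    if scheme == "lecun":          # Var = 1 / fan_in
        w_std, a_std = 1.0 / np.sqrt(d), 1.0 / np.sqrt(m)
    elif scheme == "he":           # Var = 2 / fan_in
        w_std, a_std = np.sqrt(2.0 / d), np.sqrt(2.0 / m)
    else:
        raise ValueError(scheme)
    W = rng.normal(0.0, w_std, size=(m, d))
    a = rng.normal(0.0, a_std, size=m)
    return W, a

def input_grad_norm(W, a, x):
    """||grad_x f(x)||_2; the gradient of a^T relu(W x) is W^T (a * 1[Wx > 0])."""
    active = (W @ x > 0).astype(float)
    return np.linalg.norm(W.T @ (a * active))

def avg_robustness_proxy(d=100, m=1000, n_inputs=200, scheme="lecun"):
    """Mean gradient norm over inputs drawn uniformly on the unit sphere."""
    W, a = init_two_layer(d, m, scheme)
    xs = rng.normal(size=(n_inputs, d))
    xs /= np.linalg.norm(xs, axis=1, keepdims=True)
    return float(np.mean([input_grad_norm(W, a, x) for x in xs]))

for scheme in ("lecun", "he"):
    for m in (50, 1000, 20000):    # narrow -> wide at fixed depth
        print(f"{scheme:5s}  width={m:6d}  "
              f"avg ||grad_x f|| = {avg_robustness_proxy(m=m, scheme=scheme):.3f}")
```

In this sketch, He initialization's larger weight variance yields a proportionally larger input-gradient norm than LeCun at the same width; how such gradient-norm behavior relates to the paper's formal robustness statements is governed by the theory in the full text.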


Related research

09/13/2019
Finite Depth and Width Corrections to the Neural Tangent Kernel
We prove the precise scaling, at finite depth and width, for the mean an...

03/22/2022
On the (Non-)Robustness of Two-Layer Neural Networks in Different Learning Regimes
Neural networks are known to be highly sensitive to adversarial examples...

05/19/2019
A type of generalization error induced by initialization in deep neural networks
How different initializations and loss functions affect the learning of ...

05/24/2022
Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width
Substantial work indicates that the dynamics of neural networks (NNs) is...

11/02/2021
Subquadratic Overparameterization for Shallow Neural Networks
Overparameterization refers to the important phenomenon where the width ...

01/16/2020
Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks
The selection of initial parameter values for gradient-based optimizatio...

11/03/2016
Demystifying ResNet
The Residual Network (ResNet), proposed in He et al. (2015), utilized sh...
