Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes

06/04/2021
by Elvis Dohmatob, et al.

This work studies the (non-)robustness of two-layer neural networks in various high-dimensional linearized regimes. We establish fundamental trade-offs between memorization and robustness, as measured by the Sobolev seminorm of the model w.r.t. the data distribution, i.e. the square root of the average squared L_2-norm of the gradients of the model w.r.t. its input. More precisely, if n is the number of training examples, d is the input dimension, and k is the number of hidden neurons in a two-layer neural network, we prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is lower-bounded by (i) √(n) in the case of infinite-width random features (RF) or neural tangent kernel (NTK) with d ≳ n; (ii) √(n) in the case of finite-width RF with proportionate scaling of d and k; and (iii) √(n/k) in the case of finite-width NTK with proportionate scaling of d and k. Moreover, all of these lower bounds are tight: they are attained by the min-norm / least-squares interpolator (when n, d, and k are in the appropriate interpolating regime). All our results hold as soon as the data is log-concave isotropic and there is label noise, i.e. the target variable is not a deterministic function of the data / features. We empirically validate our theoretical results with experiments. Incidentally, these experiments also reveal, for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.
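To make the quantity being bounded concrete, the short Python/NumPy sketch below is a minimal toy illustration (not the authors' experimental pipeline): it fits the min-norm least-squares interpolator on a random features model with noisy labels and estimates its empirical Sobolev seminorm, i.e. the square root of the average squared L_2-norm of the model's input gradients over the training set. The ReLU activation, problem sizes, and noise level are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes (assumptions, not the paper's exact regimes).
    n, d, k = 200, 100, 400   # samples, input dimension, hidden neurons
    noise_std = 0.5           # label noise: targets are not deterministic in x

    # Isotropic Gaussian data (a log-concave isotropic distribution) and noisy labels.
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d) / np.sqrt(d) + noise_std * rng.standard_normal(n)

    # Random features model: f(x) = a^T relu(W x / sqrt(d)), first layer W frozen.
    W = rng.standard_normal((k, d))
    Phi = np.maximum(X @ W.T / np.sqrt(d), 0.0)   # (n, k) feature matrix

    # Min-norm / least-squares interpolator of the noisy labels in feature space.
    a = np.linalg.pinv(Phi) @ y

    # Empirical Sobolev seminorm: sqrt( (1/n) * sum_i ||grad_x f(x_i)||^2 ).
    # For ReLU features, grad_x f(x) = sum_j a_j 1{w_j.x/sqrt(d) > 0} w_j / sqrt(d).
    act_grad = (X @ W.T / np.sqrt(d)) > 0.0       # relu'(pre-activations), (n, k)
    grads = (act_grad * a) @ W / np.sqrt(d)       # (n, d) input gradients
    sobolev = np.sqrt(np.mean(np.sum(grads ** 2, axis=1)))

    train_mse = np.mean((Phi @ a - y) ** 2)
    print(f"train MSE = {train_mse:.2e} (memorization of noisy labels)")
    print(f"empirical Sobolev seminorm = {sobolev:.2f}, sqrt(n) = {np.sqrt(n):.2f}")

In this over-parameterized setting (k > n) the interpolator drives the training error to essentially zero while the measured seminorm grows with n, which is the kind of behaviour the lower bounds above describe; this sketch is a sanity check only, not a reproduction of the paper's experiments.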


Related research

03/22/2022 - On the (Non-)Robustness of Two-Layer Neural Networks in Different Learning Regimes
Neural networks are known to be highly sensitive to adversarial examples...

04/15/2020 - A function space analysis of finite neural networks with insights from sampling theory
This work suggests using sampling theory to analyze the function space r...

02/13/2019 - How do infinite width bounded norm networks look in function space?
We consider the question of what functions can be captured by ReLU netwo...

10/07/2021 - Tighter Sparse Approximation Bounds for ReLU Neural Networks
A well-known line of work (Barron, 1993; Breiman, 1993; Klusowski Ba...

03/02/2020 - Better Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems
The expressivity of neural networks as a function of their depth, width ...

10/03/2019 - A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
A key element of understanding the efficacy of overparameterized neural ...

02/13/2023 - Precise Asymptotic Analysis of Deep Random Feature Models
We provide exact asymptotic expressions for the performance of regressio...
