A Universal Law of Robustness via Isoperimetry

05/26/2021
by Sébastien Bubeck et al.

Classically, data interpolation with a parametrized model class is possible as long as the number of parameters is larger than the number of equations to be satisfied. A puzzling phenomenon in deep learning is that models are trained with many more parameters than this classical theory would suggest. We propose a theoretical explanation for this phenomenon. We prove that for a broad class of data distributions and model classes, overparametrization is necessary if one wants to interpolate the data smoothly. Namely, we show that smooth interpolation requires d times more parameters than mere interpolation, where d is the ambient data dimension. We prove this universal law of robustness for any smoothly parametrized function class with polynomial size weights, and any covariate distribution satisfying isoperimetry. In the case of two-layer neural networks and Gaussian covariates, this law was conjectured in prior work by Bubeck, Li, and Nagaraj.
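In quantitative form, the law of robustness above can be paraphrased as follows (a sketch of the paper's main bound, with constants, logarithmic factors, and the exact dependence on the label-noise level omitted): any function f from a smoothly parametrized class with p parameters that fits n noisy training points in dimension d below the noise level must satisfy

    \mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}}.

Mere interpolation is already possible at p on the order of n, but driving this Lipschitz lower bound down to O(1), i.e. interpolating smoothly, forces p on the order of nd, which is exactly the factor-of-d overparametrization stated in the abstract.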


Related research

- Beyond the Universal Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels (02/03/2023). Machine learning models are vulnerable to adversarial perturbations, and...
- Learning Curve Theory (02/08/2021). Recently a number of empirical "universal" scaling law papers have been...
- Sharp Lower Bounds on Interpolation by Deep ReLU Neural Networks at Irregularly Spaced Data (02/02/2023). We study the interpolation, or memorization, power of deep ReLU neural n...
- A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors (07/23/2022). In this work we establish an algorithm and distribution independent non-...
- Plateau in Monotonic Linear Interpolation – A "Biased" View of Loss Landscape for Deep Networks (10/03/2022). Monotonic linear interpolation (MLI) - on the line connecting a random i...
- Learning Deep ReLU Networks Is Fixed-Parameter Tractable (09/28/2020). We consider the problem of learning an unknown ReLU network with respect...
- Is space a word, too? (10/20/2017). For words, rank-frequency distributions have long been heralded for adhe...
