Learning Deep ReLU Networks Is Fixed-Parameter Tractable

09/28/2020
by Sitan Chen, et al.

We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension times some (exponentially large) function of the network's parameters alone. Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant of the overall network (we show that some dependence on the Lipschitz constant is necessary). We also give a bound that is doubly exponential in the size of the network but is independent of spectral norm. These results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. In contrast, prior work for learning networks of depth three or higher requires exponential time in the ambient dimension, even when the above parameters are bounded by a constant. Additionally, all prior work for the depth-two case requires well-conditioned weights and/or positive coefficients to obtain efficient run-times. Our algorithm does not require these assumptions. Our main technical tool is a type of filtered PCA that can be used to iteratively recover an approximate basis for the subspace spanned by the hidden units in the first layer. Our analysis leverages new structural results on lattice polynomials from tropical geometry.
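
To make the subspace-recovery idea concrete, below is a minimal Python (NumPy) sketch. It is not the paper's filtered-PCA algorithm: it assumes a one-hidden-layer ReLU target and forms a Stein-type moment matrix M ≈ E[f(x)(xx^T − I)] for Gaussian x, whose column space lies in the span of the first-layer weight vectors; thresholding the eigenvalues ("filtering") then returns an approximate basis. The function names, sample size, and threshold are illustrative assumptions, and cancellations among hidden units (which the paper's full algorithm is built to handle for deeper networks) can hide directions from this simple estimator.

import numpy as np


def subspace_via_moment_pca(f, dim, n_samples=200_000, threshold=0.1, seed=0):
    """Toy sketch of subspace recovery via PCA on a Stein-type moment matrix.

    This is NOT the paper's filtered-PCA algorithm; it is a simplified,
    one-hidden-layer illustration. It estimates M ~ E[f(x)(x x^T - I)] for
    x ~ N(0, I); for f(x) = sum_i a_i ReLU(w_i . x), the column space of M
    lies in span{w_i}, so keeping eigenvectors whose eigenvalues clear a
    threshold ("filtering") yields an approximate basis -- unless
    cancellations among the hidden units suppress some directions.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, dim))
    y = f(X)                                    # network outputs, shape (n_samples,)
    M = (X * y[:, None]).T @ X / n_samples - y.mean() * np.eye(dim)
    M = (M + M.T) / 2                           # symmetrize away sampling noise
    eigvals, eigvecs = np.linalg.eigh(M)
    keep = np.abs(eigvals) > threshold          # illustrative noise cutoff
    return eigvecs[:, keep]                     # columns: approximate basis


if __name__ == "__main__":
    # Hypothetical one-hidden-layer target f(x) = sum_i a_i * ReLU(w_i . x).
    dim, k = 50, 3
    rng = np.random.default_rng(1)
    W = rng.standard_normal((k, dim))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    a = np.array([1.0, 0.7, 1.3])
    f = lambda X: np.maximum(X @ W.T, 0.0) @ a

    basis = subspace_via_moment_pca(f, dim)
    # The true weight rows should lie (approximately) in the recovered span.
    residual = np.linalg.norm(W - (W @ basis) @ basis.T)
    print("recovered dimension:", basis.shape[1], "residual:", residual)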

