Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

11/04/2019
by   Surbhi Goel, et al.

We consider the problem of computing the best-fitting ReLU with respect to square loss on a training set when the examples are drawn from a spherical Gaussian distribution (the labels can be arbitrary). Let opt < 1 be the population loss of the best-fitting ReLU. We prove:

1. Finding a ReLU with square loss opt + ϵ is as hard as the problem of learning sparse parities with noise, which is widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply, unconditionally, that gradient descent cannot converge to the global minimum in polynomial time.

2. There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error O(opt^(2/3)). The algorithm uses a novel reduction to noisy halfspace learning with respect to 0/1 loss.

Prior work due to Soltanolkotabi [Sol17] showed that gradient descent can find the best-fitting ReLU with respect to Gaussian marginals if the training set is exactly labeled by a ReLU.
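To make the setting concrete, the sketch below sets up the realizable case mentioned above: examples drawn from a spherical Gaussian, labels produced exactly by a ReLU, and plain gradient descent on the empirical square loss. It is an illustration only, not the algorithm or analysis of the paper or of [Sol17]. The function names, dimension, sample size, step size, and iteration count are all assumptions chosen for the example, and the subgradient convention at zero (derivative 1) is a choice made here so that a zero initialization is not a stationary point.

```python
import numpy as np


def relu(z):
    """Elementwise ReLU."""
    return np.maximum(z, 0.0)


def square_loss(w, X, y):
    """Empirical square loss of the ReLU x -> relu(w . x) on (X, y)."""
    return float(np.mean((relu(X @ w) - y) ** 2))


def gradient_descent(X, y, steps=500, lr=0.25):
    """Plain (sub)gradient descent on the empirical square loss.

    Starts from w = 0 and uses derivative 1 for the ReLU at z = 0
    (an assumption of this sketch, not taken from the source).
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        z = X @ w
        residual = relu(z) - y
        active = (z >= 0).astype(float)  # subgradient of relu at z
        grad = 2.0 * (X.T @ (residual * active)) / n
        w = w - lr * grad
    return w


# Hypothetical problem sizes, chosen only for illustration.
rng = np.random.default_rng(0)
n, d = 5000, 10

# Spherical Gaussian marginals: each example is N(0, I_d).
X = rng.standard_normal((n, d))

# Realizable case: labels come exactly from a unit-norm ReLU.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
y = relu(X @ w_star)

initial_loss = square_loss(np.zeros(d), X, y)
w_hat = gradient_descent(X, y)
final_loss = square_loss(w_hat, X, y)

print("loss at w = 0:", initial_loss)
print("loss after gradient descent:", final_loss)
print("distance to w_star:", np.linalg.norm(w_hat - w_star))
```

Replacing `y` with arbitrary labels gives the agnostic setting that the two results above address; this sketch makes no claim about how gradient descent behaves there.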


research
08/04/2022

Agnostic Learning of General ReLU Activation Using Gradient Descent

We provide a convergence analysis of gradient descent for the problem of...
research
05/26/2020

Approximation Schemes for ReLU Regression

We consider the fundamental problem of ReLU regression, where the goal i...
research
07/09/2020

Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK

We consider the dynamic of gradient descent for learning a two-layer neu...
research
06/22/2020

Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent

We prove the first superpolynomial lower bounds for learning one-layer n...
research
02/13/2023

Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals

We study the task of agnostically learning halfspaces under the Gaussian...
research
07/24/2023

Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials

We study the problem of PAC learning a linear combination of k ReLU acti...
research
10/01/2022

A Combinatorial Perspective on the Optimization of Shallow ReLU Networks

The NP-hard problem of optimizing a shallow ReLU network can be characte...
