Tight Hardness Results for Training Depth-2 ReLU Networks

11/27/2020
by Surbhi Goel, et al.

We prove several hardness results for training depth-2 neural networks with the ReLU activation function; these networks are simply weighted sums (that may include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set. We prove that this problem is NP-hard already for a network with a single ReLU. We also prove NP-hardness for outputting a weighted sum of k ReLUs minimizing the squared error (for k>1) even in the realizable setting (i.e., when the labels are consistent with an unknown depth-2 ReLU network). We are also able to obtain lower bounds on the running time in terms of the desired additive error ϵ. To obtain our lower bounds, we use the Gap Exponential Time Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of approximating the well-known Densest k-Subgraph problem in subexponential time (these hypotheses are used separately in proving different lower bounds). For example, we prove that under reasonable hardness assumptions, any proper learning algorithm for finding the best-fitting ReLU must run in time exponential in 1/ϵ^2. Together with previous work on improperly learning a ReLU (Goel et al., COLT'17), this implies the first separation between proper and improper algorithms for learning a ReLU. We also study the problem of properly learning a depth-2 network of ReLUs with bounded weights, giving new (worst-case) upper bounds on the running time needed to learn such networks in both the realizable and agnostic settings. Our upper bounds on the running time essentially match our lower bounds in terms of the dependency on ϵ.
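To make the training objective concrete, below is a minimal NumPy sketch of the hypothesis class and the loss being minimized: a depth-2 network f(x) = sum_i a_i * ReLU(w_i . x), where the output weights a_i may be negative, evaluated under the empirical square loss. This is our own illustration under assumed conventions (names, shapes, and the example values are not from the paper), not the authors' code.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def depth2_relu(X, a, W):
    # f(x) = sum_i a_i * ReLU(w_i . x); the coefficients a_i may be negative.
    # X: (n, d) training inputs, a: (k,) output weights, W: (k, d) hidden weights.
    return relu(X @ W.T) @ a

def square_loss(X, y, a, W):
    # Empirical square loss that the training problem asks to minimize.
    preds = depth2_relu(X, a, W)
    return np.mean((preds - y) ** 2)

# Toy instance: k = 2 ReLUs in d = 3 dimensions on n = 4 points (hypothetical values).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
a = np.array([1.0, -0.5])           # negative coefficients are allowed
W = rng.standard_normal((2, 3))
y = depth2_relu(X, a, W)            # realizable labels: consistent with this network
print(square_loss(X, y, a, W))      # 0.0 at the planted parameters

Finding parameters (a, W) that minimize this loss, even up to additive error ϵ, is the training problem whose hardness the paper establishes, already for a single ReLU (k = 1).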


