Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs

02/26/2017
by Alon Brutzkus, et al.

Deep learning models are often successfully trained using gradient descent, despite the worst-case hardness of the underlying non-convex optimization problem. The key question is then under what conditions one can prove that optimization will succeed. Here we provide a strong result of this kind. We consider a neural net with one hidden layer, a convolutional structure with no overlap, and a ReLU activation function. For this architecture we show that learning is NP-complete in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time. To the best of our knowledge, this is the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations.
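The architecture in the abstract can be sketched concretely. The following is a minimal, hedged NumPy illustration (not the paper's exact construction or analysis): a one-hidden-layer "no-overlap" ConvNet f(x; w) = (1/k) Σᵢ ReLU(w · xᵢ), where the input is split into k disjoint patches, trained by plain gradient descent on a squared loss against a teacher network with Gaussian inputs. The patch count, dimensions, step size, and teacher/student setup are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def net(X_patches, w):
    # X_patches: (n, k, d) -- n inputs, each split into k disjoint
    # patches of size d. The same filter w is applied to every patch
    # (the "convolutional structure with no overlap"), then averaged.
    return relu(X_patches @ w).mean(axis=1)

def grad(X_patches, y, w):
    # Gradient of the mean squared loss 0.5 * E[(f(x; w) - y)^2] in w.
    pre = X_patches @ w                     # (n, k) pre-activations
    err = relu(pre).mean(axis=1) - y        # (n,) residuals
    mask = (pre > 0).astype(float)          # ReLU subgradient
    # df/dw = (1/k) * sum_i 1[w . x_i > 0] * x_i
    dfdw = (mask[..., None] * X_patches).mean(axis=1)   # (n, d)
    return (err[:, None] * dfdw).mean(axis=0)

rng = np.random.default_rng(0)
d, k, n = 5, 4, 20000
w_star = rng.normal(size=d)                 # hidden teacher filter
X = rng.normal(size=(n, k, d))              # Gaussian input patches
y = net(X, w_star)                          # noiseless teacher labels

w = 0.1 * rng.normal(size=d)                # small random init
for _ in range(2000):
    w -= 0.5 * grad(X, y, w)

print(np.linalg.norm(w - w_star))           # distance to the teacher filter
```

With Gaussian patches and a random initialization, the iterates here drive the filter toward the teacher's, matching the flavor of the paper's guarantee; the step size and iteration count were not tuned and are assumptions of this sketch.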


Related research

06/20/2018 — Learning One-hidden-layer ReLU Networks via Gradient Descent
We study the problem of learning one-hidden-layer neural networks with R...

12/03/2017 — Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
We consider the problem of learning a one-hidden-layer neural network wi...

09/18/2017 — When is a Convolutional Filter Easy To Learn?
We analyze the convergence of (stochastic) gradient descent algorithm fo...

06/22/2020 — Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent
We prove the first superpolynomial lower bounds for learning one-layer n...

10/28/2016 — Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods
The optimization problem behind neural networks is highly non-convex. Tr...

10/01/2022 — A Combinatorial Perspective on the Optimization of Shallow ReLU Networks
The NP-hard problem of optimizing a shallow ReLU network can be characte...

11/08/2017 — Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels
In this paper, we consider parameter recovery for non-overlapping convol...
