Learning One-hidden-layer Neural Networks via Provable Gradient Descent with Random Initialization

by Shuhao Xia, et al.

Although deep learning has shown powerful performance in many applications, the mathematical principles behind neural networks remain poorly understood. In this paper, we consider the problem of learning a one-hidden-layer neural network with quadratic activations. We focus on the under-parameterized regime, where the number of hidden units is smaller than the input dimension. We propose to solve the problem via a provable gradient-based method with random initialization. For this non-convex training problem, we show that the gradient descent iterates enter, within a few iterations, a local region that enjoys strong convexity and smoothness, and then provably converge to a globally optimal model at a linear rate with near-optimal sample complexity. We further corroborate our theoretical findings with various experiments.
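The setting above can be illustrated with a minimal NumPy sketch. It assumes a planted model y = sum_j (w_j^T x)^2 = ||W*^T x||^2 with standard Gaussian inputs, a squared loss scaled by 1/(4m), a constant step size, and a small Gaussian random initialization. The ground truth W* is drawn with orthonormal columns only to keep the toy problem well-conditioned. The helper names (`make_data`, `loss_and_grad`, `gradient_descent`, `relative_error`) and every numeric setting (d, k, m, `step`, `iters`, `init_scale`) are hypothetical choices for illustration, not the paper's algorithm details or hyperparameters. Because a quadratic-activation network depends on W only through W W^T, the error is measured on W W^T rather than on W itself.

```python
import numpy as np


def make_data(d, k, m, rng):
    """Planted model y_i = ||W*^T x_i||^2 with Gaussian inputs (assumed setup)."""
    # Orthonormal ground-truth columns: an assumption to keep the example well-conditioned.
    w_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
    x = rng.standard_normal((m, d))
    y = np.sum((x @ w_star) ** 2, axis=1)  # sum_j (w_j^T x)^2, i.e. quadratic activations
    return x, y, w_star


def loss_and_grad(w, x, y):
    """Empirical loss (1/4m) sum_i (||W^T x_i||^2 - y_i)^2 and its gradient w.r.t. W."""
    m = x.shape[0]
    h = x @ w                          # hidden-unit pre-activations, shape (m, k)
    r = np.sum(h ** 2, axis=1) - y     # residuals, shape (m,)
    loss = np.sum(r ** 2) / (4 * m)
    grad = (x.T * r) @ h / m           # (1/m) sum_i r_i x_i x_i^T W, shape (d, k)
    return loss, grad


def gradient_descent(x, y, k, step, iters, init_scale, rng):
    """Plain gradient descent from a small random Gaussian initialization."""
    d = x.shape[1]
    w = init_scale * rng.standard_normal((d, k)) / np.sqrt(d)  # no spectral initialization
    losses = []
    for _ in range(iters):
        loss, grad = loss_and_grad(w, x, y)
        losses.append(loss)
        w = w - step * grad
    return w, losses


def relative_error(w, w_star):
    """W is identifiable only up to a k x k rotation, so compare W W^T with W* W*^T."""
    target = w_star @ w_star.T
    return np.linalg.norm(w @ w.T - target) / np.linalg.norm(target)


rng = np.random.default_rng(0)
d, k, m = 20, 3, 2000  # under-parameterized (k < d); m chosen well above d * k
x, y, w_star = make_data(d, k, m, rng)
w, losses = gradient_descent(x, y, k, step=0.05, iters=1500, init_scale=0.1, rng=rng)
print(f"final loss {losses[-1]:.3e}, relative error {relative_error(w, w_star):.3e}")
```

Plotting `np.log10(losses)` is a simple way to look for the two phases the abstract describes (a short initial phase, then geometric decay); this toy run is a sanity check under the stated assumptions, not a reproduction of the paper's experiments.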


Gradient Descent Provably Optimizes Over-parameterized Neural Networks

One of the mysteries in the success of neural networks is randomly initial...

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

We show that the (stochastic) gradient descent algorithm provides an imp...

AdaLoss: A computationally-efficient and provably convergent adaptive gradient method

We propose a computationally-friendly adaptive learning rate schedule, "...

Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks

The lottery ticket hypothesis (LTH) states that learning on a properly p...

Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression

We study the local geometry of a one-hidden-layer fully-connected neural...

Optimization-Based Separations for Neural Networks

Depth separation results propose a possible theoretical explanation for ...

R-FORCE: Robust Learning for Random Recurrent Neural Networks

Random Recurrent Neural Networks (RRNN) are the simplest recurrent netwo...