Learning One-Hidden-Layer Neural Networks via Provable Gradient Descent with Random Initialization

07/04/2019
by Shuhao Xia et al.

Although deep learning has demonstrated powerful performance in many applications, the mathematical principles behind neural networks remain poorly understood. In this paper, we consider the problem of learning a one-hidden-layer neural network with quadratic activations. We focus on the under-parameterized regime, where the number of hidden units is smaller than the dimension of the inputs. We propose to solve the problem via a provable gradient-based method with random initialization. For this non-convex neural network training problem, we show that the gradient descent iterates enter a local region that enjoys strong convexity and smoothness within a few iterations, and then provably converge to a globally optimal model at a linear rate with near-optimal sample complexity. We further corroborate our theoretical findings with numerical experiments.
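As a rough illustration of the setup described above, the sketch below fits a one-hidden-layer network with quadratic activations, f_W(x) = sum_j (w_j^T x)^2, by plain gradient descent from a small random initialization on synthetic Gaussian data. This is not the authors' code; the teacher weights W_star, the sample size, the initialization scale, and the step size eta are all illustrative assumptions.

import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's code):
# learn f_W(x) = sum_j (w_j^T x)^2 by gradient descent on the squared loss.

d, k, n = 50, 5, 2000                            # input dim, hidden units (k < d), samples
rng = np.random.default_rng(0)

W_star = rng.normal(size=(k, d)) / np.sqrt(d)    # planted teacher network (assumed)
X = rng.normal(size=(n, d))                      # Gaussian inputs
y = ((X @ W_star.T) ** 2).sum(axis=1)            # labels y_i = sum_j (w_j^T x_i)^2

def loss_and_grad(W):
    """Empirical loss 1/(2n) * sum_i (f_W(x_i) - y_i)^2 and its gradient in W."""
    Z = X @ W.T                                  # (n, k) pre-activations w_j^T x_i
    resid = (Z ** 2).sum(axis=1) - y             # prediction residuals
    loss = 0.5 * np.mean(resid ** 2)
    # d/dw_j of (w_j^T x)^2 is 2 (w_j^T x) x, averaged against the residuals
    grad = 2.0 * (Z * resid[:, None]).T @ X / n
    return loss, grad

W = 0.01 * rng.normal(size=(k, d))               # small random initialization (scale assumed)
eta = 0.05                                       # step size (assumed)
for t in range(500):
    loss, grad = loss_and_grad(W)
    W -= eta * grad

print(f"final training loss: {loss:.2e}")

Running the sketch should drive the training loss toward zero; the constants are arbitrary and only meant to mirror the under-parameterized setting with k smaller than d.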

Related research

Gradient Descent Provably Optimizes Over-parameterized Neural Networks (10/04/2018)
One of the mysteries in the success of neural networks is randomly initial...

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations (12/26/2017)
We show that the (stochastic) gradient descent algorithm provides an imp...

Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression (02/18/2018)
We study the local geometry of a one-hidden-layer fully-connected neural...

AdaLoss: A computationally-efficient and provably convergent adaptive gradient method (09/17/2021)
We propose a computationally-friendly adaptive learning rate schedule, "...

On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations (05/17/2023)
Recent research in neural networks and machine learning suggests that us...

Optimization-Based Separations for Neural Networks (12/04/2021)
Depth separation results propose a possible theoretical explanation for ...

Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials (06/08/2022)
A recent goal in the theory of deep learning is to identify how neural n...
