Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks

01/28/2022
by Bartłomiej Polaczyk, et al.

We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks, considering most of the activation functions used in practice, including ReLU. We improve the existing state-of-the-art results in terms of the required hidden layer width. We introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network. First, we establish the global convergence of continuous solutions of the differential inclusion that is a nonsmooth analogue of the gradient flow for the MSE loss. Second, we provide a technical result (which also holds for general approximators) relating solutions of this differential inclusion to the (discrete) stochastic gradient descent sequences, thereby establishing linear convergence towards zero loss for the stochastic gradient descent iterations.
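To make the setting concrete, the following is a minimal illustrative sketch of stochastic gradient descent on the MSE loss for a wide one-hidden-layer ReLU network. It is not the paper's construction: the 1/sqrt(m) output scaling, the fixed ±1 output weights, the Gaussian initialization, the choice relu'(0) = 0 as the subgradient at the kink, and all names and hyperparameters (`train_sgd`, `n`, `d`, `m`, `lr`, `steps`) are assumptions made only for this example.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def predict(W, a, X):
    # f(x) = (1/sqrt(m)) * sum_r a_r * relu(<w_r, x>)  (assumed scaling)
    m = W.shape[0]
    return relu(X @ W.T) @ a / np.sqrt(m)


def mse_loss(W, a, X, y):
    # Full-batch MSE loss, with the conventional factor 1/2.
    return 0.5 * np.mean((predict(W, a, X) - y) ** 2)


def train_sgd(n=8, d=10, m=1024, lr=0.2, steps=2000, seed=0):
    """Run single-sample SGD on the hidden weights; return the loss history."""
    rng = np.random.default_rng(seed)

    # Synthetic data (hypothetical): unit-norm inputs, bounded targets.
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = rng.uniform(-1.0, 1.0, size=n)

    # Random initialization: Gaussian hidden weights, fixed +-1 output weights.
    W = rng.normal(size=(m, d))
    a = rng.choice([-1.0, 1.0], size=m)

    losses = [mse_loss(W, a, X, y)]
    for _ in range(steps):
        i = rng.integers(n)              # sample one training point
        x_i = X[i]
        pre = W @ x_i                    # pre-activations, shape (m,)
        err = relu(pre) @ a / np.sqrt(m) - y[i]
        # One element of the subdifferential of 0.5 * err**2 w.r.t. W,
        # taking relu'(0) = 0 at the nonsmooth point.
        grad = err * (a * (pre > 0))[:, None] * x_i[None, :] / np.sqrt(m)
        W -= lr * grad
        losses.append(mse_loss(W, a, X, y))
    return np.array(losses)
```

Calling `train_sgd()` returns the full-batch loss after every step, so plotting the result on a logarithmic scale is one way to inspect how quickly the loss decays in this toy setup when the width `m` is much larger than the number of samples `n`.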


research
11/21/2018

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

We study the problem of training deep neural networks with Rectified Lin...
research
02/03/2022

Non-Vacuous Generalisation Bounds for Shallow Neural Networks

We focus on a specific class of shallow neural networks with a single hi...
research
07/02/2020

Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach

Structural equation models (SEMs) are widely used in sciences, ranging f...
research
08/30/2019

Partitioned integrators for thermodynamic parameterization of neural networks

Stochastic Gradient Langevin Dynamics, the "unadjusted Langevin algorith...
research
05/23/2019

Parsimonious Deep Learning: A Differential Inclusion Approach with Global Convergence

Over-parameterization is ubiquitous nowadays in training neural networks...
research
04/12/2022

An Algebraically Converging Stochastic Gradient Descent Algorithm for Global Optimization

We propose a new stochastic gradient descent algorithm for finding the g...
research
10/29/2020

Over-parametrized neural networks as under-determined linear systems

We draw connections between simple neural networks and under-determined ...
