Convergence of gradient descent for deep neural networks

03/30/2022
by Sourav Chatterjee, et al.

Optimization by gradient descent has been one of the main drivers of the "deep learning revolution". Yet, despite some recent progress for extremely wide networks, it remains an open problem to understand why gradient descent often converges to global minima when training deep neural networks. This article presents a new criterion for convergence of gradient descent to a global minimum, which is provably more powerful than the best available criteria from the literature, namely, the Łojasiewicz inequality and its generalizations. This criterion is used to show that gradient descent with proper initialization converges to a global minimum when training any feedforward neural network with smooth and strictly increasing activation functions, provided that the input dimension is greater than or equal to the number of data points.
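For context, the Łojasiewicz-style criteria the abstract compares against are usually stated in the Polyak-Łojasiewicz (PL) form below; this is the standard textbook statement and its classical consequence, not the paper's new criterion. A differentiable loss $f$ with global minimum value $f^*$ satisfies the PL inequality with constant $\mu > 0$ if

$$\frac{1}{2}\,\lVert \nabla f(\theta) \rVert^2 \;\ge\; \mu \bigl(f(\theta) - f^*\bigr) \quad \text{for all } \theta,$$

and if $f$ is in addition $L$-smooth, gradient descent with step size $1/L$ converges linearly to the global minimum:

$$f(\theta_t) - f^* \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{t} \bigl(f(\theta_0) - f^*\bigr).$$

The regime in the final sentence (smooth, strictly increasing activation; input dimension at least the number of data points) is easy to reproduce numerically. The sketch below uses tanh as the activation, a scaled Gaussian initialization, and a fixed learning rate; these specifics are illustrative assumptions, not the initialization scheme analyzed in the paper.

```python
# Toy illustration of the regime in the abstract: a feedforward network with a
# smooth, strictly increasing activation (tanh), trained by plain gradient
# descent on n data points of dimension d >= n. The width, learning rate, and
# Gaussian initialization are illustrative assumptions, not the paper's scheme.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 8, 16, 32          # n data points, input dim d >= n, hidden width h
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W1 = rng.standard_normal((h, d)) / np.sqrt(d)   # assumed Gaussian init
b1 = np.zeros(h)
w2 = rng.standard_normal(h) / np.sqrt(h)

lr = 0.1
for step in range(2001):
    Z = X @ W1.T + b1            # pre-activations, shape (n, h)
    A = np.tanh(Z)               # smooth, strictly increasing activation
    pred = A @ w2                # network outputs, shape (n,)
    r = pred - y                 # residuals
    loss = 0.5 * np.mean(r ** 2)

    # Backpropagation for the squared loss.
    g_pred = r / n               # dloss/dpred
    g_w2 = A.T @ g_pred
    g_A = np.outer(g_pred, w2)
    g_Z = g_A * (1.0 - A ** 2)   # tanh'(z) = 1 - tanh(z)^2
    g_W1 = g_Z.T @ X
    g_b1 = g_Z.sum(axis=0)

    W1 -= lr * g_W1
    b1 -= lr * g_b1
    w2 -= lr * g_w2

    if step % 500 == 0:
        print(f"step {step:5d}  loss {loss:.3e}")
```

With d >= n the n training targets are generically interpolable, so the printed loss should decay toward zero under plain gradient descent, consistent with the global-convergence behavior the paper establishes.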


Related research

08/04/2021
Convergence of gradient descent for learning linear neural networks
We study the convergence properties of gradient descent for training dee...

08/05/2019
Gradient Descent Finds Global Minima for Generalizable Deep Neural Networks of Practical Sizes
In this paper, we theoretically prove that gradient descent can find a g...

07/07/2020
Gradient Descent Converges to Ridgelet Spectrum
Deep learning achieves a high generalization performance in practice, de...

05/24/2019
A Polynomial-Based Approach for Architectural Design and Learning with Deep Neural Networks
In this effort we propose a novel approach for reconstructing multivaria...

11/25/2021
Predicting the success of Gradient Descent for a particular Dataset-Architecture-Initialization (DAI)
Despite their massive success, training successful deep neural networks ...

12/12/2018
Gradient Descent Happens in a Tiny Subspace
We show that in a variety of large-scale deep learning scenarios the gra...

05/26/2022
A framework for overparameterized learning
An explanation for the success of deep neural networks is a central ques...
