Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

02/18/2020
by Quynh Nguyen, et al.

A recent line of research has provided convergence guarantees for gradient descent algorithms in the excessive over-parameterization regime, where the widths of all hidden layers are required to be polynomially large in the number of training samples. However, the widths of practical deep networks are often large only in the first layer(s) and then decrease towards the output layer. This raises an interesting open question: do similar results also hold in this empirically relevant setting? Existing theoretical insights suggest that the loss surface of this class of networks is well behaved, but they usually do not provide direct algorithmic guarantees for optimization. In this paper, we close the gap by showing that one wide layer followed by a pyramidal deep network topology suffices for gradient descent to find a global minimum at a geometric rate. Our proof is based on a weak form of the Polyak-Łojasiewicz inequality, which holds for deep pyramidal networks on the manifold of full-rank weight matrices.
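As context for the last claim, the following is a minimal LaTeX sketch of how a Polyak-Łojasiewicz (PL) type inequality yields a geometric convergence rate for gradient descent. The inequality and descent argument shown are the standard textbook versions; the symbols L, θ, L*, μ, ℓ, and η are generic notation introduced here for illustration, and the paper itself establishes only a weak form of the inequality, valid on the manifold of full-rank weight matrices, rather than the global version below.

% Standard PL argument for geometric convergence of gradient descent
% (a generic sketch, not the paper's exact "weak" form).
% L: training loss, \theta: parameters, L^*: global minimum value,
% \mu > 0: PL constant, \ell: smoothness (Lipschitz-gradient) constant,
% \eta: step size. All symbols are assumed notation.
\[
  \tfrac{1}{2}\,\bigl\|\nabla L(\theta)\bigr\|^{2} \;\ge\; \mu\,\bigl(L(\theta) - L^{*}\bigr)
  \qquad \text{(PL inequality)}
\]
% For an \ell-smooth loss and gradient descent steps
% \theta_{t+1} = \theta_t - \eta\,\nabla L(\theta_t) with \eta \le 1/\ell,
% the descent lemma gives
% L(\theta_{t+1}) \le L(\theta_t) - \tfrac{\eta}{2}\,\|\nabla L(\theta_t)\|^{2},
% and combining it with the PL inequality yields the geometric rate
\[
  L(\theta_{t+1}) - L^{*} \;\le\; (1 - \eta\mu)\,\bigl(L(\theta_t) - L^{*}\bigr)
  \quad\Longrightarrow\quad
  L(\theta_t) - L^{*} \;\le\; (1 - \eta\mu)^{t}\,\bigl(L(\theta_0) - L^{*}\bigr).
\]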
