Convergence of gradient descent for learning linear neural networks

08/04/2021
by Gabin Maxime Nguegnang, et al.

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that, under suitable conditions on the step sizes, gradient descent converges to a critical point of the loss function, here the square loss. Furthermore, we demonstrate that, in the case of two layers, gradient descent converges to a global minimum for almost all initializations. In the case of three or more layers, we show that gradient descent converges to a global minimum on the manifold of matrices of some fixed rank, where the rank cannot be determined a priori.
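
As an illustration of the training setup analyzed in the paper, the sketch below runs gradient descent with a constant step size on the square loss of a deep linear network, i.e., the deep matrix factorization loss L(W_1, ..., W_N) = 1/2 ||W_N ... W_1 X - Y||_F^2 with N = 3 factors. This is not the authors' code; the layer widths, data, step size, and initialization scale are illustrative assumptions.

```python
# A minimal NumPy sketch (not the authors' implementation) of gradient descent
# for a deep linear network: minimize
#     L(W_1, ..., W_N) = 0.5 * ||W_N ... W_1 X - Y||_F^2.
# Layer widths, data, step size, and initialization scale are illustrative.
import numpy as np

rng = np.random.default_rng(0)

widths = [5, 8, 8, 3]        # d_0, ..., d_N for N = 3 layers (hypothetical sizes)
n_samples = 20
X = rng.standard_normal((widths[0], n_samples))
Y = rng.standard_normal((widths[-1], n_samples))

# Small random initialization of the factors W_1, ..., W_N.
Ws = [0.1 * rng.standard_normal((widths[j + 1], widths[j]))
      for j in range(len(widths) - 1)]

def end_to_end(Ws):
    """Return the end-to-end product W_N ... W_1 of all factors."""
    P = np.eye(widths[0])
    for W in Ws:
        P = W @ P
    return P

def loss(Ws):
    return 0.5 * np.linalg.norm(end_to_end(Ws) @ X - Y, "fro") ** 2

eta = 1e-3                   # constant step size; must be small enough for convergence
for it in range(5000):
    R = end_to_end(Ws) @ X - Y          # residual of the square loss
    grads = []
    for j in range(len(Ws)):
        # Products of the factors above and below the j-th factor.
        above = np.eye(widths[-1])
        for A in reversed(Ws[j + 1:]):
            above = above @ A
        below = np.eye(widths[0])
        for A in Ws[:j]:
            below = A @ below
        # Gradient w.r.t. this factor:
        # (factors above)^T @ (W_N ... W_1 X - Y) @ X^T @ (factors below)^T
        grads.append(above.T @ R @ X.T @ below.T)
    # Simultaneous gradient descent update of all factors.
    Ws = [W - eta * G for W, G in zip(Ws, grads)]

print("final loss:", loss(Ws))
```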

Related research

Gradient descent aligns the layers of deep linear networks (10/04/2018)
This paper establishes risk convergence and asymptotic weight matrix ali...

Convergence of gradient descent for deep neural networks (03/30/2022)
Optimization by gradient descent has been one of main drivers of the "de...

Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers (10/12/2019)
We study the convergence of gradient flows related to learning deep line...

A Geometric Approach of Gradient Descent Algorithms in Neural Networks (11/08/2018)
In this article we present a geometric framework to analyze convergence ...

Fast Convergence of Natural Gradient Descent for Overparameterized Neural Networks (05/27/2019)
Natural gradient descent has proven effective at mitigating the effects ...

Global Convergence of Gradient Descent for Deep Linear Residual Networks (11/02/2019)
We analyze the global convergence of gradient descent for deep linear re...

Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced (06/04/2018)
We study the implicit regularization imposed by gradient descent for lea...
