Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers

10/12/2019
by   Bubacarr Bah, et al.

We study the convergence of gradient flows related to learning deep linear neural networks from data (i.e., networks whose activation function is the identity map). In this case, composing the network layers amounts to multiplying the weight matrices of all layers together, which yields an overparameterized problem. We show that the gradient flow with respect to these factors can be reinterpreted as a Riemannian gradient flow on the manifold of rank-r matrices endowed with a suitable Riemannian metric, and that the flow always converges to a critical point of the underlying functional. Moreover, in the special case of an autoencoder, the flow converges to a global minimum for almost all initializations.
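As a minimal sketch (not the authors' code), the following discretizes the overparameterized gradient flow with explicit Euler steps for a three-layer linear network, i.e., it runs small-step gradient descent on the factors W1, W2, W3 of the end-to-end matrix W = W3 W2 W1. All dimensions, step sizes, and data here are illustrative assumptions; the point is that the flow on the factors drives the product toward the minimizer of the underlying least-squares functional.

```python
import numpy as np

rng = np.random.default_rng(0)

# Deep linear network with identity activations: the end-to-end map is
# the single product matrix W = W3 @ W2 @ W1 (sizes are illustrative).
d, N, n = 4, 3, 50
X = rng.standard_normal((d, n)) / np.sqrt(n)   # normalized inputs
Y = rng.standard_normal((d, n))                # targets

Ws = [0.3 * rng.standard_normal((d, d)) for _ in range(N)]

def product(factors):
    """Multiply factors in application order: product([W1, W2, W3]) = W3 @ W2 @ W1."""
    W = np.eye(d)
    for Wj in factors:
        W = Wj @ W
    return W

def loss(factors):
    return 0.5 * np.linalg.norm(product(factors) @ X - Y) ** 2

init_loss = loss(Ws)

# Explicit Euler discretization of the gradient flow dWj/dt = -dL/dWj.
eta, steps = 0.02, 20000
for _ in range(steps):
    R = (product(Ws) @ X - Y) @ X.T      # gradient of L w.r.t. the product W
    grads = []
    for j in range(N):
        A = product(Ws[j + 1:])          # factors applied after layer j
        B = product(Ws[:j])              # factors applied before layer j
        grads.append(A.T @ R @ B.T)      # chain rule: dL/dWj = A^T (WX - Y) X^T B^T
    for Wj, G in zip(Ws, grads):         # simultaneous update of all factors
        Wj -= eta * G

final_loss = loss(Ws)

# Global minimum of the unfactored linear least-squares problem, for comparison.
W_star = Y @ X.T @ np.linalg.inv(X @ X.T)
opt_loss = 0.5 * np.linalg.norm(W_star @ X - Y) ** 2

print(f"initial {init_loss:.3f}  final {final_loss:.3f}  optimum {opt_loss:.3f}")
```

Since the factors here are square and generically full rank, the product is unconstrained and the flow's limiting loss matches the plain least-squares optimum; the paper's rank-r Riemannian picture becomes relevant when the inner layers are narrower than d.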


Related research

08/04/2021 · Convergence of gradient descent for learning linear neural networks
We study the convergence properties of gradient descent for training dee...

01/24/2023 · On the convergence of Sobolev gradient flow for the Gross-Pitaevskii eigenvalue problem
We study the convergences of three projected Sobolev gradient flows to t...

06/30/2023 · Quantum State Assignment Flows
This paper introduces assignment flows for density matrices as state spa...

08/15/2016 · A Riemannian Network for SPD Matrix Learning
Symmetric Positive Definite (SPD) matrix learning methods have become po...

01/31/2023 · Dynamic Flows on Curved Space Generated by Labeled Data
The scarcity of labeled data is a long-standing challenge for many machi...

02/02/2023 · Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
We consider the optimisation of large and shallow neural networks via gr...

04/13/2018 · Representing smooth functions as compositions of near-identity functions with implications for deep network optimization
We show that any smooth bi-Lipschitz h can be represented exactly as a c...
