Deep orthogonal linear networks are shallow

11/27/2020
by Pierre Ablin, et al.

We consider the problem of training a deep orthogonal linear network, which consists of a product of orthogonal matrices, with no non-linearity in between. We show that training the weights with Riemannian gradient descent is equivalent to training the whole factorization by gradient descent. This means that overparametrization has no effect and there is no implicit bias in this setting: training such a deep, overparametrized network is exactly equivalent to training a one-layer shallow network.
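To make the setting concrete, here is a minimal NumPy/SciPy sketch of the kind of model the abstract describes: a product of orthogonal factors trained by Riemannian gradient descent on each factor. The quadratic loss, the data X and Y, the step size lr, and the choice of retraction (matrix exponential of a skew-symmetric matrix) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np
from scipy.linalg import expm, qr

rng = np.random.default_rng(0)
n, depth, lr = 5, 4, 0.05

# Deep orthogonal linear network: x -> W_1 W_2 ... W_p x, with each W_k orthogonal.
# Each factor starts as a random orthogonal matrix (Q factor of a Gaussian matrix).
Ws = [qr(rng.standard_normal((n, n)))[0] for _ in range(depth)]

# Hypothetical regression data; the paper's analysis is not tied to this particular loss.
X = rng.standard_normal((n, 100))
Y = rng.standard_normal((n, 100))

def product(factors):
    """Product W_1 W_2 ... W_p of the factors (identity for an empty list)."""
    P = np.eye(n)
    for W in factors:
        P = P @ W
    return P

def loss(factors):
    """Quadratic loss 0.5 * ||W X - Y||^2 evaluated at the full product W."""
    return 0.5 * np.sum((product(factors) @ X - Y) ** 2)

for step in range(200):
    P = product(Ws)
    G_out = (P @ X - Y) @ X.T  # Euclidean gradient of the loss w.r.t. the product P
    new_Ws = []
    for k, W in enumerate(Ws):
        left = product(Ws[:k])        # W_1 ... W_{k-1}
        right = product(Ws[k + 1:])   # W_{k+1} ... W_p
        G = left.T @ G_out @ right.T  # Euclidean gradient w.r.t. the factor W_k
        # Riemannian step on the orthogonal group: project G onto the tangent
        # space at W (skew-symmetric part of G W^T), then retract with the
        # matrix exponential so the factor stays exactly orthogonal.
        A = G @ W.T
        A = 0.5 * (A - A.T)
        new_Ws.append(expm(-lr * A) @ W)
    Ws = new_Ws

print("loss after Riemannian gradient descent:", loss(Ws))
```

According to the abstract, this factor-wise Riemannian training is equivalent to training the product matrix directly, so one way to probe the claim numerically is to compare the trajectory of product(Ws) with gradient descent on a single shallow matrix under the same loss.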


Related research

- 04/14/2022: Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks. We prove linear convergence of gradient descent to a global minimum for ...
- 10/27/2022: On the biological plausibility of orthogonal initialisation for solving gradient instability in deep neural networks. Initialising the synaptic weights of artificial neural networks (ANNs) w...
- 07/08/2022: Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent. As part of the effort to understand implicit bias of gradient descent in...
- 05/05/2011: Rapid Feature Learning with Stacked Linear Denoisers. We investigate unsupervised pre-training of deep architectures as featur...
- 11/17/2016: Generalized BackPropagation, Étude De Cas: Orthogonality. This paper introduces an extension of the backpropagation algorithm that...
- 02/22/2020: On the Inductive Bias of a CNN for Orthogonal Patterns Distributions. Training overparameterized convolutional neural networks with gradient b...
- 02/05/2018: Learning Compact Neural Networks with Regularization. We study the impact of regularization for learning neural networks. Our ...
