Depth Without the Magic: Inductive Bias of Natural Gradient Descent

11/22/2021
by Anna Kerekes, et al.

In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing low-rank matrices without explicit regularization. This implicit regularization has been hypothesised to be a contributing factor to good generalization in deep learning. Natural gradient descent, however, is approximately invariant to reparametrization: it always follows the same trajectory and finds the same optimum. The question naturally arises: if we eliminate the role of parametrization, which solution will be found, and what new properties emerge? We characterize the behaviour of natural gradient flow in deep linear networks for separable classification under logistic loss and for deep matrix factorization. Some of our findings extend to nonlinear neural networks with sufficient but finite over-parametrization. We demonstrate that there exist learning problems where natural gradient descent fails to generalize, while gradient descent with the right architecture performs well.
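To make the invariance claim concrete, here is a minimal sketch (a toy example of our own, not code from the paper). It uses a scalar model with squared loss and a cubic reparametrization w = θ³ as a stand-in for depth: plain gradient descent follows a different trajectory after the reparametrization, while natural gradient descent, preconditioned by the inverse Fisher information F(θ) = (dw/dθ)², tracks the original function-space trajectory up to O(η) discretization error.

```python
# Toy illustration (not from the paper): natural gradient descent is
# approximately invariant to reparametrization; plain gradient descent is not.
#
# Model: scalar predictor f = w with squared loss L(w) = 0.5 * (w - w_star)**2.
# Reparametrization: w = theta**3, a stand-in for "depth".
# For a Gaussian likelihood N(f, 1), the Fisher information in theta is
# F(theta) = (dw/dtheta)**2, so the natural gradient is F(theta)**-1 * dL/dtheta.

w_star = 2.0   # minimizer of the loss
eta = 0.01     # step size; small enough that discrete NGD tracks the flow
steps = 50     # few steps, so mid-trajectory differences stay visible

def dL_dw(w):
    return w - w_star

# 1) Plain GD directly on w.
w = 0.5
for _ in range(steps):
    w -= eta * dL_dw(w)

# 2) Plain GD on theta with w = theta**3: a different trajectory.
theta = 0.5 ** (1 / 3)
for _ in range(steps):
    dw_dtheta = 3 * theta**2
    theta -= eta * dw_dtheta * dL_dw(theta**3)   # chain rule
w_gd_reparam = theta**3

# 3) Natural GD on theta: precondition by the inverse Fisher (dw/dtheta)**2.
theta = 0.5 ** (1 / 3)
for _ in range(steps):
    dw_dtheta = 3 * theta**2
    grad_theta = dw_dtheta * dL_dw(theta**3)
    theta -= eta * grad_theta / dw_dtheta**2     # = eta * (dL/dw) / (dw/dtheta)
w_ngd_reparam = theta**3

print(f"GD on w:             {w:.4f}")
print(f"GD on theta, w=t^3:  {w_gd_reparam:.4f}")   # parametrization changed the path
print(f"NGD on theta, w=t^3: {w_ngd_reparam:.4f}")  # ~matches GD on w
```

Runs 1 and 3 land close together after 50 steps, while run 2 converges along a different, faster path; that gap is the parametrization-dependence that implicit-regularization results rely on, and it is exactly what the Fisher preconditioner removes.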


Related research

09/23/2020 · Implicit Gradient Regularization
Gradient descent can be surprisingly good at optimizing deep neural netw...

02/05/2022 · The Implicit Bias of Gradient Descent on Generalized Gated Linear Networks
Understanding the asymptotic behavior of gradient-descent training of de...

05/08/2012 · The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use
The natural gradient allows for more efficient gradient descent by remov...

06/01/2023 · Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks
Works on implicit regularization have studied gradient trajectories duri...

06/04/2018 · Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
We study the implicit regularization imposed by gradient descent for lea...

10/08/2021 · On the Implicit Biases of Architecture Gradient Descent
Do neural networks generalise because of bias in the functions returned ...

11/17/2019 · Deep Matrix Factorization with Spectral Geometric Regularization
Deep Matrix Factorization (DMF) is an emerging approach to the problem o...
