Gradient Descent Happens in a Tiny Subspace

12/12/2018
by Guy Gur-Ari, et al.

We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training. The subspace is spanned by a few top eigenvectors of the Hessian (equal in number to the number of classes in the dataset), and is mostly preserved over long periods of training. A simple argument then suggests that gradient descent may happen mostly in this subspace. We give an example of this effect in a solvable model of classification, and we comment on possible implications for optimization and learning.
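
As a rough illustration of the kind of measurement involved, the sketch below is not from the paper: it assumes a toy softmax classifier small enough that the full Hessian can be formed directly in PyTorch. It takes a few gradient-descent steps, extracts the top-k Hessian eigenvectors with k equal to the number of classes, and reports the fraction of the squared gradient norm captured by that subspace.

    # Minimal sketch (assumed toy setup, not the authors' code): measure how much
    # of the gradient lies in the span of the top-k Hessian eigenvectors.
    import torch

    torch.manual_seed(0)
    n_classes, n_features, n_samples = 3, 5, 200

    # Synthetic classification data (hypothetical, for illustration only).
    X = torch.randn(n_samples, n_features)
    true_w = torch.randn(n_features, n_classes)
    y = (X @ true_w).argmax(dim=1)

    w = torch.zeros(n_features * n_classes, requires_grad=True)

    def loss_fn(w_flat):
        logits = X @ w_flat.view(n_features, n_classes)
        return torch.nn.functional.cross_entropy(logits, y)

    # A few plain gradient-descent steps, so we are past the very start of training.
    lr = 0.5
    for _ in range(50):
        loss = loss_fn(w)
        (g,) = torch.autograd.grad(loss, w)
        with torch.no_grad():
            w -= lr * g

    # Gradient and full Hessian at the current point (feasible only because the model is tiny).
    grad = torch.autograd.grad(loss_fn(w), w)[0].detach()
    hess = torch.autograd.functional.hessian(loss_fn, w.detach())

    # Top-k Hessian eigenvectors, k = number of classes.
    eigvals, eigvecs = torch.linalg.eigh(hess)   # eigenvalues in ascending order
    top_k = eigvecs[:, -n_classes:]              # columns spanning the top subspace

    # Fraction of the squared gradient norm captured by the top-k eigenspace.
    proj = top_k.T @ grad
    frac = (proj @ proj) / (grad @ grad)
    print(f"fraction of |grad|^2 in top-{n_classes} Hessian subspace: {frac.item():.3f}")

For a realistic network the full Hessian is never formed; top eigenvectors are typically obtained from Hessian-vector products (for example via a Lanczos-style iteration), with the same projection used to quantify the overlap.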


Related research

11/18/2011 · Krylov Subspace Descent for Deep Learning
In this paper, we propose a second order optimization method to learn mo...

01/12/2022 · There is a Singularity in the Loss Landscape
Despite the widespread adoption of neural networks, their training dynam...

03/30/2022 · Convergence of gradient descent for deep neural networks
Optimization by gradient descent has been one of main drivers of the "de...

06/11/2020 · Randomized Fast Subspace Descent Methods
Randomized Fast Subspace Descent (RFASD) Methods are developed and analy...

11/02/2022 · 70 years of Krylov subspace methods: The journey continues
Using computed examples for the Conjugate Gradient method and GMRES, we ...

10/01/2016 · Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation From Undersampled Data
Subspace learning and matrix factorization problems have a great many ap...

02/19/2014 · Subspace Learning with Partial Information
The goal of subspace learning is to find a k-dimensional subspace of R^d...
