Secondary gradient descent in higher codimension

09/14/2018
by Y Cooper, et al.

In this paper, we analyze discrete gradient descent and ϵ-noisy gradient descent on a special but important class of functions. We find that, when used to minimize a function L: R^n → R in this class, discrete gradient descent can exhibit strikingly different behavior from continuous gradient descent. On long time scales, discrete gradient descent and continuous gradient descent tend toward different global minima of L. Discrete gradient descent preferentially finds global minima at which the graph of L is shallowest, while gradient flow shows no such preference.
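The contrast described in the abstract can be illustrated on a toy loss with a valley of global minima whose sharpness varies along the valley. The sketch below is a hypothetical example, not taken from the paper: the loss L(x, y) = h(x)·y^2/2, the sharpness profile h(x) = 2 + cos(x), and the helper functions are all illustrative assumptions.

    import math

    # Toy loss L(x, y) = h(x) * y^2 / 2 with a line of global minima at
    # y = 0 whose sharpness h(x) = 2 + cos(x) varies along the valley:
    # sharpest at x = 0, shallowest at x = pi.
    # (Illustrative example only; not the construction used in the paper.)
    def h(x):
        return 2.0 + math.cos(x)

    def grad(x, y):
        # dL/dx = h'(x) * y^2 / 2,   dL/dy = h(x) * y
        return -math.sin(x) * y * y / 2.0, h(x) * y

    def descend(x, y, step, n_steps):
        # Plain discrete gradient descent with a fixed step size.
        for _ in range(n_steps):
            gx, gy = grad(x, y)
            x, y = x - step * gx, y - step * gy
        return x, y

    x0, y0 = 0.5, 1.0

    # Finite step size: the iterate bounces across the valley in y, and the
    # leftover motion drifts x along the valley toward the shallower region.
    print(descend(x0, y0, step=0.65, n_steps=5000))

    # Tiny step size approximates gradient flow: y decays smoothly and x
    # stops almost immediately, far from the shallowest point.
    print(descend(x0, y0, step=1e-3, n_steps=200_000))

With the larger step size, the y-coordinate oscillates across the valley and the residual motion in x pushes the iterate toward the flattest minima; with the tiny step size, the iterate settles essentially where gradient flow would, with x barely changed.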


