SGD momentum optimizer with step estimation by online parabola model

07/16/2019
by Jarek Duda, et al.

In stochastic gradient descent, especially for neural network training, first order methods currently dominate: they do not model the local distance to the minimum. This information, required for an optimal step size, is provided by second order methods; however, they bring many difficulties, starting with the full Hessian having a number of coefficients quadratic in the dimension. This article proposes a minimal step from the successful first order momentum method toward second order: online parabola modelling in just a single direction, the normalized v̂ from the momentum method. It is done by estimating the linear trend of gradients g⃗ = ∇F(θ⃗) in the v̂ direction: such that g(θ⃗_⊥ + θv̂) ≈ λ(θ − p) for θ = θ⃗·v̂, g = g⃗·v̂, θ⃗_⊥ = θ⃗ − θv̂. Using linear regression, λ and p are MSE estimated by just updating four averages (of g, θ, gθ, θ²) in the considered direction. Exponential moving averages allow for inexpensive online estimation here, weakening the contribution of old gradients. By controlling the sign of the curvature λ, we can repel from saddles, in contrast to the attraction in the standard Newton method. In the remaining directions, not covered by the second order model, we can simultaneously perform e.g. gradient descent.
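As a concrete illustration of the abstract's recipe, below is a minimal sketch in plain numpy, assuming a generic stochastic gradient oracle grad_fn; the function name parabola_sgd, the hyperparameters (lr, beta, gamma, step_cap) and the fallback to plain gradient descent when the estimated curvature is non-positive are illustrative choices, not the paper's exact algorithm.

import numpy as np

def parabola_sgd(grad_fn, theta, lr=0.01, beta=0.9, gamma=0.99,
                 step_cap=1.0, n_steps=1000):
    # Sketch of SGD momentum with an online parabola model in the
    # momentum direction v; hyperparameter names are illustrative.
    m = np.zeros_like(theta)                    # momentum accumulator
    avg_g = avg_th = avg_gth = avg_th2 = 0.0    # four averages along v
    for _ in range(n_steps):
        grad = grad_fn(theta)                   # stochastic gradient at theta
        m = beta * m + (1.0 - beta) * grad
        v = m / (np.linalg.norm(m) + 1e-12)     # normalized momentum direction
        g = grad @ v                            # gradient projected on v
        th = theta @ v                          # position projected on v
        # exponential moving averages of g, th, g*th, th^2
        avg_g   = gamma * avg_g   + (1.0 - gamma) * g
        avg_th  = gamma * avg_th  + (1.0 - gamma) * th
        avg_gth = gamma * avg_gth + (1.0 - gamma) * g * th
        avg_th2 = gamma * avg_th2 + (1.0 - gamma) * th * th
        # linear regression of g(th) ~ lam * (th - p)
        lam = (avg_gth - avg_g * avg_th) / (avg_th2 - avg_th ** 2 + 1e-12)
        if lam > 1e-12:
            p = avg_th - avg_g / lam            # modeled minimum position along v
            step = np.clip(p - th, -step_cap, step_cap)
            theta = theta + step * v            # move toward the parabola minimum
        else:
            theta = theta - lr * g * v          # non-convex along v: descent moves away from p
        theta = theta - lr * (grad - g * v)     # gradient descent in remaining directions
    return theta

For instance, parabola_sgd(grad_fn, np.zeros(10)) would run the sketch on a 10-dimensional problem; clipping the jump toward p by step_cap is one simple way to guard against unreliable early estimates of λ and p.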



Related research

01/31/2019 - Improving SGD convergence by tracing multiple promising directions and estimating distance to minimum
11/08/2022 - Black Box Lie Group Preconditioners for SGD
07/30/2022 - DRSOM: A Dimension Reduced Second-Order Method and Preliminary Analyses
09/26/2021 - Curvature Injected Adaptive Momentum Optimizer for Convolutional Neural Networks
06/17/2020 - A block coordinate descent optimizer for classification problems exploiting convexity
07/17/2019 - Meta-descent for Online, Continual Prediction
