SketchySGD: Reliable Stochastic Optimization via Robust Curvature Estimates

11/16/2022
by Zachary Frangella, et al.

We introduce SketchySGD, a stochastic quasi-Newton method that uses sketching to approximate the curvature of the loss function. Quasi-Newton methods are among the most effective algorithms in traditional optimization, where they converge much faster than first-order methods such as SGD. However, in contemporary deep learning, quasi-Newton methods are considered inferior to first-order methods like SGD and Adam owing to their higher per-iteration complexity and fragility in the face of inexact gradients. SketchySGD circumvents these issues through a novel combination of subsampling, randomized low-rank approximation, and dynamic regularization. In the convex case, we show that SketchySGD with a fixed stepsize converges to a small ball around the optimum at a faster rate than SGD. In the non-convex case, SketchySGD converges linearly under two additional assumptions, interpolation and the Polyak-Łojasiewicz condition, the latter of which holds with high probability for wide neural networks. Numerical experiments on image and tabular data demonstrate the improved reliability and speed of SketchySGD for deep learning, compared to standard optimizers such as SGD and Adam as well as existing quasi-Newton methods.
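To make the abstract's description concrete, here is a minimal NumPy sketch of a SketchySGD-style preconditioned step on a toy least-squares problem. It combines the three ingredients named above: subsampling (minibatch Hessian-vector products), randomized low-rank approximation (a rank-r Nyström approximation of the subsampled Hessian), and regularization (a shift rho added to the approximate Hessian before preconditioning, so the update is w ← w − η (Ĥ + ρI)⁻¹ g). The toy objective, the fixed values of `rank`, `lr`, `batch`, and `update_freq`, and the heuristic choice of `rho` are assumptions for this illustration; the paper sets the regularization and step size adaptively, and this is not the authors' reference implementation.

```python
# Illustrative SketchySGD-style loop (assumptions: toy least-squares objective,
# fixed rank / learning rate / update frequency, heuristic rho; the paper sets
# regularization and step size adaptively, which this sketch does not reproduce).
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: f(w) = (1/2m) ||A w - b||^2, so the Hessian is A^T A / m.
m, n = 2000, 100
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def minibatch_grad(w, idx):
    Ai = A[idx]
    return Ai.T @ (Ai @ w - b[idx]) / len(idx)

def minibatch_hvp(V, idx):
    # Subsampled Hessian-vector products: (A_i^T A_i / |idx|) V.
    Ai = A[idx]
    return Ai.T @ (Ai @ V) / len(idx)

def nystrom(idx, rank):
    # Rank-`rank` randomized Nystrom approximation of the subsampled Hessian,
    # built from `rank` Hessian-vector products with a random test matrix.
    Omega, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    Y = minibatch_hvp(Omega, idx)
    nu = np.finfo(float).eps * np.linalg.norm(Y)   # small stabilizing shift
    Y_nu = Y + nu * Omega
    C = np.linalg.cholesky(Omega.T @ Y_nu)         # PSD since H is PSD here
    B = np.linalg.solve(C, Y_nu.T).T               # B = Y_nu C^{-T}
    U, S, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(S**2 - nu, 0.0)               # approximate eigenvalues
    return U, lam                                  # H_hat = U diag(lam) U^T

def precond_solve(U, lam, rho, g):
    # (H_hat + rho I)^{-1} g for H_hat = U diag(lam) U^T with orthonormal U:
    # handle the component of g in span(U) and its complement separately.
    Utg = U.T @ g
    return U @ (Utg / (lam + rho)) + (g - U @ Utg) / rho

w = np.zeros(n)
rank, lr = 10, 0.5
batch, hess_batch, update_freq = 64, 256, 50

for t in range(500):
    if t % update_freq == 0:
        # Periodically refresh the low-rank curvature estimate on a fresh batch.
        U, lam = nystrom(rng.choice(m, hess_batch, replace=False), rank)
        # Heuristic regularization (an assumption, not the paper's rule): the
        # smallest captured eigenvalue, so directions outside span(U) are
        # scaled comparably to those inside it.
        rho = max(lam[-1], 1e-6)
    idx = rng.choice(m, batch, replace=False)
    w -= lr * precond_solve(U, lam, rho, minibatch_grad(w, idx))
    if t % 100 == 0:
        print(f"iter {t:4d}  loss {np.mean((A @ w - b) ** 2) / 2:.4f}")
```

Because the preconditioner is applied through its rank-r eigendecomposition, each step costs only O(nr) on top of the gradient, which is how a method like this keeps per-iteration complexity close to SGD's.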

Related research:

- A Stochastic Quasi-Newton Method for Large-Scale Optimization (01/27/2014): The question of how to incorporate curvature information in stochastic a...
- Dual Gauss-Newton Directions for Deep Learning (08/17/2023): Inspired by Gauss-Newton-like methods, we study the benefit of leveragin...
- Implementation of Stochastic Quasi-Newton's Method in PyTorch (05/07/2018): In this paper, we implement the Stochastic Damped LBFGS (SdLBFGS) for st...
- adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs (11/04/2015): Recurrent Neural Networks (RNNs) are powerful models that achieve except...
- Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization (09/28/2020): In this paper, we introduce Apollo, a quasi-Newton method for nonconvex ...
- Nonlinear Schwarz preconditioning for Quasi-Newton methods (11/25/2022): We propose the nonlinear restricted additive Schwarz (RAS) preconditioni...
- Escaping Saddle-Points Faster under Interpolation-like Conditions (09/28/2020): In this paper, we show that under over-parametrization several standard ...
