Stochastic Gradient Methods with Preconditioned Updates

06/01/2022
by Abdurakhmon Sadiev, et al.

This work considers non-convex finite-sum minimization. Many algorithms exist for such problems, but existing methods often perform poorly when the problem is badly scaled and/or ill-conditioned, and a primary goal of this work is to introduce methods that alleviate this issue. To that end, we include a preconditioner based on Hutchinson's approach to approximating the diagonal of the Hessian, and couple it with several gradient-based methods to obtain new "scaled" algorithms: Scaled SARAH and Scaled L-SVRG. We present theoretical complexity guarantees under smoothness assumptions, and prove linear convergence when both smoothness and the PL condition are assumed. Because our adaptively scaled methods use approximate partial second-order curvature information, they are better able to mitigate the impact of badly scaled problems, and this improved practical performance is demonstrated in the numerical experiments also presented in this work.
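
The preconditioning idea itself is easy to sketch: estimate the diagonal of the Hessian with Hutchinson's method (diag(H) ~ E[z * (H z)] for Rademacher vectors z), then divide the gradient elementwise by the damped estimate. The minimal Python sketch below demonstrates this on a toy badly scaled quadratic. It is not the paper's Scaled SARAH or Scaled L-SVRG; the function, the probe budget `num_probes`, the damping constant `alpha`, and the step size `lr` are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hutchinson's estimator: diag(H) ~ E[z * (H z)] for Rademacher z.
# Toy quadratic f(x) = 0.5 * x^T A x, so H = A and H @ z is exact.
# num_probes, alpha, and lr are illustrative choices, not from the paper.

rng = np.random.default_rng(0)
d = 5

# Badly scaled symmetric Hessian: diagonal spans four orders of magnitude.
A = np.diag(10.0 ** np.arange(d)) + 0.5 * (np.ones((d, d)) - np.eye(d))

def hvp(z):
    """Hessian-vector product; for this quadratic it is just A @ z."""
    return A @ z

num_probes = 200
D = np.zeros(d)
for _ in range(num_probes):
    z = rng.choice([-1.0, 1.0], size=d)  # Rademacher probe vector
    D += z * hvp(z)                      # unbiased estimate of diag(H)
D /= num_probes

print("true diagonal:", np.diag(A))
print("estimate     :", D)

# One diagonally preconditioned gradient step: dividing the gradient
# elementwise by the damped curvature estimate rescales the badly
# conditioned coordinates.
x = rng.normal(size=d)
grad = A @ x                     # gradient of the quadratic at x
alpha, lr = 1e-8, 0.5
x = x - lr * grad / (np.abs(D) + alpha)
```

In the paper's setting, the same diagonal estimate would precondition the variance-reduced stochastic gradients of SARAH or L-SVRG rather than a full gradient step; the sketch only shows why dividing by estimated curvature counteracts bad scaling.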

Related research

05/28/2019 · Analysis of Gradient Clipping and Adaptive Scaling with a Relaxed Smoothness Condition
We provide a theoretical explanation for the fast convergence of gradien...

09/11/2021 · Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information
We present a novel adaptive optimization algorithm for large-scale machi...

11/24/2020 · Shuffling Gradient-Based Methods with Momentum
We combine two advanced ideas widely used in optimization for machine le...

06/06/2020 · SONIA: A Symmetric Blockwise Truncated Optimization Algorithm
This work presents a new algorithm for empirical risk minimization. The ...

11/04/2015 · adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs
Recurrent Neural Networks (RNNs) are powerful models that achieve except...

06/07/2022 · Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation
Despite their high computation and communication costs, Newton-type meth...

07/01/2016 · A scaled Bregman theorem with applications
Bregman divergences play a central role in the design and analysis of a ...
