KrADagrad: Kronecker Approximation-Domination Gradient Preconditioned Stochastic Optimization

05/30/2023
by Jonathan Mei, et al.

Second-order stochastic optimizers allow the parameter update step size and direction to adapt to the loss curvature, but have traditionally required too much memory and compute for deep learning. Recently, Shampoo [Gupta et al., 2018] introduced a Kronecker-factored preconditioner to reduce these requirements; it has been used for large deep models [Anil et al., 2020] and in production [Anil et al., 2022]. However, it takes inverse matrix roots of ill-conditioned matrices, which requires 64-bit precision and imposes strong hardware constraints. In this paper, we propose a novel factorization, Kronecker Approximation-Domination (KrAD). Using KrAD, we update a matrix that directly approximates the inverse empirical Fisher matrix (like full-matrix AdaGrad), avoiding matrix inversion and hence the need for 64-bit precision. We then propose KrADagrad^⋆, which has computational costs similar to Shampoo's and the same regret bound. Experiments on synthetic ill-conditioned problems show improved performance over Shampoo at 32-bit precision, and on several real datasets we observe comparable or better generalization.
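To make the "update the inverse directly" idea concrete, below is a minimal NumPy sketch (the function name, learning rate, and initialization are hypothetical). It maintains (G_0 + Σ g gᵀ)⁻¹ through Sherman-Morrison rank-one updates instead of accumulating a matrix and then inverting it. This is only an illustration of maintaining an inverse without explicit inversion; it is not KrAD's Kronecker-factored update, nor the inverse-root preconditioner of full-matrix AdaGrad, both of which are defined in the paper.

```python
import numpy as np

def preconditioned_step(theta, grad, G_inv, lr=0.01):
    """One illustrative preconditioned step (hypothetical sketch).

    Instead of accumulating G <- G + g g^T and then inverting an
    ill-conditioned matrix, keep the inverse itself up to date with a
    Sherman-Morrison rank-one update, so no explicit inversion is needed.
    """
    g = grad.reshape(-1)
    Gig = G_inv @ g
    # Sherman-Morrison: (G + g g^T)^{-1} = G^{-1} - (G^{-1} g)(G^{-1} g)^T / (1 + g^T G^{-1} g)
    G_inv = G_inv - np.outer(Gig, Gig) / (1.0 + g @ Gig)
    # Precondition the gradient with the maintained inverse approximation.
    step = (G_inv @ g).reshape(theta.shape)
    return theta - lr * step, G_inv

# Toy usage: start from G_0 = delta * I, so its inverse is I / delta.
theta = np.zeros(4)
G_inv = np.eye(theta.size) / 1e-3
for g in np.random.randn(20, 4):
    theta, G_inv = preconditioned_step(theta, g, G_inv)
```

The point of the sketch is the design choice it mirrors: the stored object is an approximation of the inverse, so each step needs only matrix-vector products and a rank-one correction, avoiding the ill-conditioned root computations that push Shampoo toward 64-bit arithmetic.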
