Eva: A General Vectorized Approximation Framework for Second-order Optimization

08/04/2023
by Lin Zhang, et al.

Second-order optimization algorithms exhibit excellent convergence properties for training deep learning models, but they often incur significant computation and memory overheads. This can result in lower training efficiency than first-order counterparts such as stochastic gradient descent (SGD). In this work, we present a memory- and time-efficient second-order algorithm named Eva with two novel techniques: 1) we construct the second-order information with the Kronecker factorization of small stochastic vectors averaged over a mini-batch of training data to reduce memory consumption, and 2) we derive an efficient update formula that avoids explicitly computing matrix inverses by using the Sherman-Morrison formula. We further extend Eva to a general vectorized approximation framework that improves the compute and memory efficiency of two existing second-order algorithms (FOOF and Shampoo) without affecting their convergence performance. Extensive experimental results on different models and datasets show that Eva reduces the end-to-end training time by up to 2.05x and 2.42x compared to first-order SGD and second-order algorithms (K-FAC and Shampoo), respectively.
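The two ideas in the abstract can be illustrated concretely. Below is a minimal NumPy sketch, not the authors' implementation, of how a layer gradient might be preconditioned with rank-one Kronecker factors built from mini-batch-averaged vectors, where each damped inverse is applied via the Sherman-Morrison formula so that no matrix is ever formed or explicitly inverted. The function names, variable names (a_bar, g_bar, lam), and damping value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sherman_morrison_apply(v, lam, M):
    """Compute (lam * I + v v^T)^{-1} @ M without forming or inverting a matrix.

    Sherman-Morrison: (lam*I + v v^T)^{-1} = (1/lam) * (I - v v^T / (lam + v^T v)).
    """
    coeff = 1.0 / (lam + v @ v)
    return (M - np.outer(v, coeff * (v @ M))) / lam

def rank_one_precondition(dW, a_batch, dz_batch, lam=0.03):
    """Precondition a layer gradient dW (shape [out, in]) with rank-one
    Kronecker factors built from mini-batch-averaged vectors (a sketch of
    the Eva-style vectorized approximation, with assumed conventions).

    a_batch:  activations entering the layer, shape [batch, in]
    dz_batch: back-propagated gradients w.r.t. the layer output, shape [batch, out]
    """
    a_bar = a_batch.mean(axis=0)   # averaged input-side vector
    g_bar = dz_batch.mean(axis=0)  # averaged output-side vector
    # Apply (g_bar g_bar^T + lam I)^{-1} dW (a_bar a_bar^T + lam I)^{-1}
    # using Sherman-Morrison on each side (both factors are symmetric).
    left = sherman_morrison_apply(g_bar, lam, dW)
    return sherman_morrison_apply(a_bar, lam, left.T).T

# Toy usage: a 4-by-3 layer and a batch of 8 samples.
rng = np.random.default_rng(0)
dW = rng.standard_normal((4, 3))
a = rng.standard_normal((8, 3))
dz = rng.standard_normal((8, 4))
print(rank_one_precondition(dW, a, dz).shape)  # (4, 3)
```

Because each factor is a damped rank-one matrix, applying its inverse costs only vector products, which is what lets this style of approximation avoid the quadratic memory and cubic inversion costs of full Kronecker-factored methods such as K-FAC.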


Related research:

05/28/2019 - A Gram-Gauss-Newton Method Learning Overparameterized Deep Neural Networks for Regression Problems
First-order methods such as stochastic gradient descent (SGD) are curren...

02/20/2020 - Second Order Optimization Made Practical
Optimization in machine learning, both theoretical and applied, is prese...

06/15/2021 - Scalable Second Order Optimization for Deep Learning
Optimization in machine learning, both theoretical and applied, is prese...

08/17/2020 - Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible
Machine learning is predicated on the concept of generalization: a model...

04/16/2018 - Block Mean Approximation for Efficient Second Order Optimization
Advanced optimization algorithms such as Newton method and AdaGrad benef...

09/29/2021 - Second-Order Neural ODE Optimizer
We propose a novel second-order optimization framework for training the ...
