KAISA: An Adaptive Second-order Optimizer Framework for Deep Neural Networks

07/04/2021
by J. Gregory Pauloski, et al.

Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models. We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communication, and computation given specific models and hardware to achieve maximized performance and enhanced scalability. We quantify the tradeoffs between memory and communication cost and evaluate KAISA on large models, including ResNet-50, Mask R-CNN, U-Net, and BERT, on up to 128 NVIDIA A100 GPUs. Compared to the original optimizers, KAISA converges 18.1-36.3% faster across applications with the same global batch size. Under a fixed memory budget, KAISA converges 32.5% and 41.6% faster in Mask R-CNN and BERT-Large, respectively. KAISA can balance memory and communication to achieve scaling efficiency equal to or better than the baseline optimizers.
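For context on where K-FAC's extra memory and communication come from: K-FAC approximates each layer's Fisher information block as a Kronecker product of two small factors (one built from the layer's inputs, one from its output gradients) and preconditions the gradient with the factor inverses; storing and exchanging these per-layer factors is what inflates the memory footprint relative to SGD. The snippet below is a minimal, single-layer NumPy sketch of that preconditioning step, not KAISA's implementation; the function name, argument names, and damping value are illustrative assumptions.

```python
import numpy as np

def kfac_precondition(grad_W, acts, grad_out, damping=1e-3):
    """Hypothetical single-layer K-FAC preconditioning sketch (not KAISA's API).

    grad_W   : [out, in]    gradient of the loss w.r.t. the layer weight
    acts     : [batch, in]  layer inputs (activations)
    grad_out : [batch, out] gradients of the loss w.r.t. the layer outputs
    """
    batch = acts.shape[0]
    # Kronecker factors: A summarizes input statistics, G output-gradient statistics.
    A = acts.T @ acts / batch          # [in, in]
    G = grad_out.T @ grad_out / batch  # [out, out]
    # Tikhonov damping keeps the small factor matrices well conditioned.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    # (A ⊗ G)^{-1} vec(grad_W) == vec(G^{-1} @ grad_W @ A^{-1})
    return G_inv @ grad_W @ A_inv
```

In a distributed setting, each worker can either hold and invert these factors for every layer (more memory, less communication) or receive already-preconditioned gradients from designated workers (less memory, more communication); the adaptability described in the abstract amounts to choosing where to operate along that spectrum for a given model and hardware.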

Related research

06/30/2022
Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
The second-order optimization methods, notably the D-KFAC (Distributed K...

06/02/2023
MKOR: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates
This work proposes a Momentum-Enabled Kronecker-Factor-Based Optimizer U...

07/01/2020
Convolutional Neural Network Training with Distributed K-FAC
Training neural networks with many processors can reduce time-to-solutio...

01/21/2019
AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Deep Neural Networks
Typically, an ultra-deep neural network (UDNN) tends to yield high-quality m...

06/04/2020
Scaling Distributed Training with Adaptive Summation
Stochastic gradient descent (SGD) is an inherently sequential training a...

06/15/2021
Scalable Second Order Optimization for Deep Learning
Optimization in machine learning, both theoretical and applied, is prese...

05/16/2022
Optimizing the optimizer for data driven deep neural networks and physics informed neural networks
We investigate the role of the optimizer in determining the quality of t...