Error Feedback Can Accurately Compress Preconditioners

06/09/2023
by   Ionuţ-Vlad Modoranu, et al.

Leveraging second-order information at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC), suffer from massive storage costs when applied even to medium-scale models, as they must store a sliding window of gradients whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via an efficient and simple-to-implement error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. Specifically, our approach compresses the gradient information via sparsification or low-rank compression before it is fed into the preconditioner, and feeds the compression error back into future iterations. Extensive experiments on deep neural networks for vision show that this approach can compress full-matrix preconditioners by up to two orders of magnitude without impact on accuracy, effectively removing the memory overhead of full-matrix preconditioning for implementations of full-matrix Adagrad (GGT) and natural gradient (M-FAC). Our code is available at https://github.com/IST-DASLab/EFCP.
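To make the mechanism concrete, here is a minimal PyTorch sketch of error feedback combined with top-k sparsification, applied to the gradient before it would be handed to a preconditioner. The function names (top_k_compress, compress_with_error_feedback) and the density parameter are illustrative placeholders, not identifiers from the EFCP repository; the sketch only shows the generic compress-and-carry-the-error pattern described in the abstract.

```python
import torch

def top_k_compress(x: torch.Tensor, density: float) -> torch.Tensor:
    """Keep only the largest-magnitude entries of x (a `density` fraction), zeroing the rest."""
    k = max(1, int(density * x.numel()))
    flat = x.flatten()
    idx = torch.topk(flat.abs(), k).indices
    compressed = torch.zeros_like(flat)
    compressed[idx] = flat[idx]
    return compressed.view_as(x)

def compress_with_error_feedback(grad: torch.Tensor,
                                 error_buffer: torch.Tensor,
                                 density: float = 0.01):
    """Compress the gradient with error feedback.

    Returns the sparse gradient that would be fed into the preconditioner,
    plus the updated error buffer carrying what was dropped this step.
    """
    accumulated = grad + error_buffer           # re-inject the error from previous steps
    compressed = top_k_compress(accumulated, density)
    new_error = accumulated - compressed        # the dropped part is fed back next iteration
    return compressed, new_error

# Hypothetical usage inside a training loop:
# error = torch.zeros_like(flat_params)
# sparse_grad, error = compress_with_error_feedback(flat_grad, error, density=0.01)
# preconditioner.update(sparse_grad)
```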


Related research

Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization (07/07/2021)
Efficiently approximating local curvature information of the loss functi...

A Fast Training-Free Compression Framework for Vision Transformers (03/04/2023)
Token pruning has emerged as an effective solution to speed up the infer...

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization (05/31/2019)
We study gradient compression methods to alleviate the communication bot...

Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition (07/23/2021)
We present a novel global compression framework for deep neural networks...

An iterative K-FAC algorithm for Deep Learning (01/01/2021)
Kronecker-factored Approximate Curvature (K-FAC) method is a high effici...

Memory Optimization for Deep Networks (10/27/2020)
Deep learning is slowly, but steadily, hitting a memory bottleneck. Whil...

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting (06/19/2017)
We propose a simple yet effective technique for neural network learning....
