Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction

01/09/2023
by   Bowen Lei, et al.

Despite impressive performance on a wide variety of tasks, deep neural networks require significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these costs; however, the sparsity constraints make the optimization harder, leading to longer training times and instability. In this work, we aim to overcome this problem and achieve space-time co-efficiency. To accelerate and stabilize the convergence of sparse training, we analyze how the gradients change during training and develop an adaptive gradient correction method. Specifically, we approximate the correlation between the current and previous gradients and use it to balance the two, yielding a corrected gradient. Our method can be used with most popular sparse training pipelines under both standard and adversarial setups. Theoretically, we prove that our method accelerates the convergence rate of sparse training. Extensive experiments across multiple datasets, model architectures, and sparsity levels demonstrate that our method outperforms leading sparse training methods by up to 5.0% in accuracy given the same number of training epochs, and reduces the number of training epochs by up to 52.1% to reach the same accuracy.
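The correction step described in the abstract is easy to prototype. The sketch below is illustrative only, not the authors' exact formulation: the correlation between the current and previous gradients is approximated here by a cosine similarity, and the resulting weight balances the two gradients. The function name, the mapping from correlation to mixing weight, and the toy training loop are all assumptions made for this example.

```python
import numpy as np

def corrected_gradient(grad, prev_grad, eps=1e-8):
    """Blend the current and previous gradients using an approximate
    correlation as the balancing weight (illustrative sketch)."""
    # Cosine similarity as a cheap proxy for the gradient correlation.
    corr = np.dot(grad, prev_grad) / (
        np.linalg.norm(grad) * np.linalg.norm(prev_grad) + eps
    )
    # Map correlation from [-1, 1] to a mixing weight in [0, 1]:
    # strongly correlated gradients lean more on the history (smoother step),
    # uncorrelated gradients fall back to the fresh gradient.
    beta = 0.5 * (corr + 1.0)
    return beta * prev_grad + (1.0 - beta) * grad

# Toy usage in a training-style loop; a dense vector stands in for the
# gradient restricted to the active sparse mask.
rng = np.random.default_rng(0)
w = rng.normal(size=10)
prev_g = np.zeros_like(w)
lr = 0.1
for step in range(5):
    g = 2 * w + 0.1 * rng.normal(size=10)   # noisy gradient of ||w||^2
    g_hat = corrected_gradient(g, prev_g) if step > 0 else g
    w -= lr * g_hat
    prev_g = g_hat
```

The design intent this mirrors is that when the noisy current gradient agrees with recent history the update relies more on the smoother accumulated direction, while a low correlation lets the fresh gradient dominate so the optimizer can still react to changes in the sparse mask.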
