Accelerating CNN Training by Sparsifying Activation Gradients

08/01/2019
by   Xucheng Ye, et al.
1

Gradients to activations get involved in most of the calculations during back propagation procedure of Convolution Neural Networks (CNNs) training. However, an important known observation is that the majority of these gradients are close to zero, imposing little impact on weights update. These gradients can be then pruned to achieve high gradient sparsity during CNNs training and reduce the computational cost. In particular, we randomly change a gradient to zero or a threshold value if the gradient is below the threshold which is determined by the statistical distribution of activation gradients. We also theoretically proved that the training convergence of the CNN model can be guaranteed when the above activation gradient sparsification method is applied. We evaluated our method on AlexNet, MobileNet, ResNet-18, 34, 50, 101, 152 with CIFAR-10, 100 and ImageNet datasets. Experimental results show that our method can substantially reduce the computational cost with negligible accuracy loss or even accuracy improvement. Finally, we analyzed the benefits that the sparsity of activation gradients introduced in detail.

READ FULL TEXT

page 1

page 2

page 3

page 7

page 8

page 9

research
05/30/2016

Parametric Exponential Linear Unit for Deep Convolutional Neural Networks

The activation function is an important component in Convolutional Neura...
research
01/07/2020

Sparse Weight Activation Training

Training convolutional neural networks (CNNs) is time-consuming. Prior w...
research
05/27/2021

Efficient and Accurate Gradients for Neural SDEs

Neural SDEs combine many of the best qualities of both RNNs and SDEs, an...
research
09/17/2019

Thanks for Nothing: Predicting Zero-Valued Activations with Lightweight Convolutional Neural Networks

Convolutional neural networks (CNNs) introduce state-of-the-art results ...
research
11/17/2017

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

We propose a simple yet effective technique to simplify the training and...
research
02/27/2019

ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

Residual neural networks can be viewed as the forward Euler discretizati...
research
01/09/2023

Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction

Despite impressive performance on a wide variety of tasks, deep neural n...

Please sign up or login with your details

Forgot password? Click here to reset