Sparse Weight Activation Training

01/07/2020
by Md Aamir Raihan, et al.

Training convolutional neural networks (CNNs) is time-consuming. Prior work has explored how to reduce the computational demands of training by eliminating gradients with relatively small magnitude. We show that eliminating small-magnitude components has limited impact on the direction of high-dimensional vectors. However, in the context of training a CNN, we find that eliminating small-magnitude components of weight and activation vectors allows us to train deeper networks on more complex datasets than eliminating small-magnitude components of gradients does. We propose Sparse Weight Activation Training (SWAT), an algorithm that embodies these observations. SWAT reduces computations by 50% to 80% relative to the Dynamic Sparse Graph algorithm. SWAT also reduces the memory footprint by 23% for activations and 50% for weights.
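The operation the abstract describes, dropping the small-magnitude components of weight and activation tensors during training, amounts to a magnitude-based top-K mask applied in each layer's forward computation. The PyTorch snippet below is a minimal sketch of that idea, not the paper's implementation; the helper names (`topk_mask`, `sparse_conv_forward`) and the sparsity level are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def topk_mask(x: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask keeping only the largest-magnitude entries of x.

    `sparsity` is the fraction of entries to drop (e.g. 0.8 keeps the top 20%).
    """
    k = max(1, int(round(x.numel() * (1.0 - sparsity))))   # entries to keep
    # The k-th largest magnitude is the (numel - k + 1)-th smallest magnitude.
    threshold = x.abs().flatten().kthvalue(x.numel() - k + 1).values
    return (x.abs() >= threshold).to(x.dtype)

def sparse_conv_forward(conv: torch.nn.Conv2d, activations: torch.Tensor,
                        sparsity: float = 0.8) -> torch.Tensor:
    """One convolution layer with magnitude-sparsified weights and activations."""
    w_sparse = conv.weight * topk_mask(conv.weight, sparsity)
    a_sparse = activations * topk_mask(activations, sparsity)
    return F.conv2d(a_sparse, w_sparse, conv.bias,
                    conv.stride, conv.padding, conv.dilation, conv.groups)

# Illustrative usage on a single layer.
conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(8, 3, 32, 32)
y = sparse_conv_forward(conv, x, sparsity=0.8)
```

The sketch only shows the forward pass of one layer; the gradients themselves are left dense, which matches the abstract's contrast between sparsifying weights and activations rather than gradients.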


Related research

08/01/2019 · Accelerating CNN Training by Sparsifying Activation Gradients
  Gradients to activations get involved in most of the calculations during...

03/09/2018 · Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How
  We show that, during inference with Convolutional Neural Networks (CNNs)...

04/29/2021 · ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
  The increasing size of neural network models has been critical for impro...

05/31/2023 · Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training
  Running out of GPU memory has become a main bottleneck for large-scale D...

07/21/2018 · Exploiting Spatial Correlation in Convolutional Neural Networks for Activation Value Prediction
  Convolutional neural networks (CNNs) compute their output using weighted...

09/17/2019 · Thanks for Nothing: Predicting Zero-Valued Activations with Lightweight Convolutional Neural Networks
  Convolutional neural networks (CNNs) introduce state-of-the-art results ...

06/22/2022 · GACT: Activation Compressed Training for General Architectures
  Training large neural network (NN) models requires extensive memory reso...
