SparseTrain: Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors

11/22/2019
by Zhangxiaowen Gong, et al.

Our community has greatly improved the efficiency of deep learning applications, including by exploiting sparsity in inputs. Most of that work, though, targets inference, where weight sparsity is known statically, and/or specialized hardware. We propose a scheme to leverage dynamic sparsity during training. In particular, we exploit the zeros that the ReLU activation function introduces into both feature maps and their gradients. This is challenging because the degree of sparsity is moderate and the locations of zeros change over time. We also rely purely on software. Our scheme identifies zeros in a dense data representation without transforming the data and performs conventional vectorized computation. Variations of the scheme are applicable to all major components of training: forward propagation, backward propagation by inputs, and backward propagation by weights. Our method significantly outperforms a highly-optimized dense direct convolution on several popular deep neural networks. At realistic sparsity, we speed up the training of the non-initial convolutional layers in VGG16, ResNet-34, ResNet-50, and Fixup ResNet-50 by 2.19x, 1.37x, 1.31x, and 1.51x respectively on an Intel Skylake-X CPU.
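
To make the zero-skipping idea concrete, the sketch below illustrates one way a convolution can exploit ReLU-induced zeros while keeping the data in its dense layout. This is a minimal illustration of the general technique under simplifying assumptions (a pointwise, 1x1-style convolution), not the authors' kernel; the function and parameter names are hypothetical.

// Minimal zero-skipping sketch, NOT the paper's implementation.
// Each input activation is broadcast against a row of weights; when ReLU has
// zeroed that activation, the entire vector of multiply-accumulates over the
// output channels is skipped, with no sparse data format involved.
#include <vector>
#include <cstddef>

// input:  [pixels][in_channels]    dense, may contain ReLU zeros
// weight: [in_channels][out_channels]
// output: [pixels][out_channels]   must be pre-sized and zero-initialized
void conv1x1_zero_skipping(const std::vector<float>& input,
                           const std::vector<float>& weight,
                           std::vector<float>& output,
                           std::size_t pixels,
                           std::size_t in_channels,
                           std::size_t out_channels) {
    for (std::size_t p = 0; p < pixels; ++p) {
        float* out_row = &output[p * out_channels];
        for (std::size_t c = 0; c < in_channels; ++c) {
            const float x = input[p * in_channels + c];
            if (x == 0.0f) continue;  // dynamic sparsity: skip work for ReLU zeros
            const float* w_row = &weight[c * out_channels];
            // Dense inner loop over output channels; a compiler or intrinsics
            // would vectorize this with SIMD fused multiply-adds.
            for (std::size_t k = 0; k < out_channels; ++k)
                out_row[k] += x * w_row[k];
        }
    }
}

Because the zero check is per input activation and the skipped work is a whole vector of multiply-accumulates, even moderate sparsity translates into proportional savings without converting the data to a sparse representation.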

research · 01/31/2018 · Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in Convolutional Networks
While CNNs naturally lend themselves to densely sampled data, and sophis...

research · 06/01/2018 · Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training
Exploiting sparsity enables hardware systems to run neural networks fast...

research · 12/24/2018 · Dynamic Runtime Feature Map Pruning
High bandwidth requirements are an obstacle for accelerating the trainin...

research · 07/21/2020 · SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training
Training Convolutional Neural Networks (CNNs) usually requires a large n...

research · 05/03/2017 · Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks
Popular deep learning frameworks require users to fine-tune their memory...

research · 09/16/2021 · Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs
Machine/deep-learning (ML/DL) based techniques are emerging as a driving...

research · 12/18/2017 · Parallel Complexity of Forward and Backward Propagation
We show that the forward and backward propagation can be formulated as a...
