When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity

This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs). GPUs are among the most efficient and widely used accelerators for deep learning (DL) computations, and the Nvidia cuDNN library provides the most effective GPU implementations of DL algorithms. Modern CNN models require megabytes of coefficients and millions of multiply-accumulate (MAC) operations to perform convolution. One of the most common techniques for compressing CNN models is weight pruning, of which there are two main types: structural (removing whole weight channels) and non-structural (removing individual weights). The first enables much easier acceleration, but it is difficult to reach the sparsity and accuracy levels achievable with the second. Non-structural pruning with retraining can produce weight matrices with ∼90% or higher sparsity in some deep CNN models. This work shows when it is worth using a direct sparse operation to speed up the computation of convolution layers. The VGG-16 and CNN-non-static models, as well as 1x1 layers from ResNet models, were used as benchmarks. In addition, we present the impact of using reduced precision on time efficiency.
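As a rough illustration of the two ingredients discussed above (this is a hedged sketch, not the paper's implementation), non-structural magnitude pruning zeroes the smallest individual weights until a target sparsity is reached, and the pruned weight matrix can then feed a direct sparse multiply over an im2col-style layout. The `magnitude_prune` helper and all tensor shapes below are hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix

def magnitude_prune(weights, sparsity=0.9):
    """Non-structural pruning: zero out the smallest-magnitude
    weights individually until the target sparsity is reached."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold      # keep only weights above the cutoff
    return weights * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64, 3, 3))     # a hypothetical conv layer's weights
pruned = magnitude_prune(w, sparsity=0.9)
achieved = 1.0 - np.count_nonzero(pruned) / pruned.size  # ~0.9 sparsity

# Direct sparse convolution viewed as a matrix product: reshape the filters
# to (filters, in_ch*kh*kw) and multiply with im2col patches of the input.
W2d = pruned.reshape(64, -1)
x = rng.standard_normal((W2d.shape[1], 128))  # patches for 128 output positions
dense_out = W2d @ x                           # dense GEMM: computes all the zeros
sparse_out = csr_matrix(W2d) @ x              # sparse multiply: skips zero weights
```

Whether the sparse path is actually faster than a dense GEMM depends on the sparsity level, layer shape, and hardware, which is precisely the trade-off the paper measures.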
