Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction

12/28/2021
by Marcin Pietroń, et al.

This work focuses on pruning convolutional neural networks (CNNs) and improving their efficiency on graphics processing units (GPUs) by using a direct sparse algorithm. The Nvidia deep neural network library (cuDNN) provides the most efficient implementations of deep learning (DL) algorithms for GPUs, which are the most commonly used accelerators for deep learning computations. Weight pruning and quantization are among the most common techniques for improving the efficiency of CNN models. There are two main types of pruning: structural and non-structural. The first enables much easier acceleration on many types of accelerators, but it is difficult to reach sparsity levels and accuracy as high as those obtained with the second type. Non-structural pruning with retraining can produce weight tensors with up to 90% sparsity. In this work, a pruning algorithm is presented which makes it possible to achieve high sparsity levels without an accuracy drop. In the next stage, linear and non-linear quantization is applied for further time and footprint reduction. This paper is an extended version of a previously published paper on effective pruning techniques, and presents real models pruned to high sparsity with reduced precision that can achieve better performance than the cuDNN library.
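The core idea of a direct sparse algorithm is to store the pruned filters once in a compressed format and skip the multiplications by zero that a dense convolution would perform. Below is a minimal CPU-side sketch of that idea using an im2col unrolling and a SciPy CSR matrix; the function name `sparse_conv2d` and the stride-1, valid-padding setup are illustrative assumptions, not the paper's GPU kernel.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_conv2d(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """x: (C, H, W) input; weight: (F, C, KH, KW) pruned (mostly zero) filters.
    Valid padding, stride 1 -- an illustrative setup, not the paper's kernel."""
    C, H, W = x.shape
    F, _, KH, KW = weight.shape
    OH, OW = H - KH + 1, W - KW + 1
    # im2col: each output location becomes one column of the patch matrix.
    cols = np.empty((C * KH * KW, OH * OW), dtype=x.dtype)
    idx = 0
    for c in range(C):
        for i in range(KH):
            for j in range(KW):
                cols[idx] = x[c, i:i + OH, j:j + OW].reshape(-1)
                idx += 1
    # Store the pruned filters once in CSR; the multiply visits only nonzeros.
    w_csr = csr_matrix(weight.reshape(F, -1))
    out = w_csr @ cols                      # shape (F, OH*OW)
    return out.reshape(F, OH, OW)
```

On a GPU the same CSR traversal can be parallelized across filters and output positions, and the work per filter shrinks in proportion to its sparsity, which is why high unstructured sparsity can pay off against a dense library.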
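Non-structural pruning with retraining, as described above, alternates between zeroing low-magnitude weights and fine-tuning the survivors. A minimal PyTorch sketch of that loop follows; the magnitude threshold rule, the `magnitude_prune`/`retrain_step` helpers, and the single-shot schedule are assumptions for illustration, not the paper's exact algorithm.

```python
import torch
import torch.nn as nn

def magnitude_prune(layer: nn.Conv2d, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude weights; return the binary mask."""
    w = layer.weight.data
    k = int(sparsity * w.numel())
    # kthvalue is 1-indexed; with k == 0 nothing is pruned.
    threshold = w.abs().flatten().kthvalue(k).values if k > 0 else w.abs().min() - 1
    mask = (w.abs() > threshold).float()
    layer.weight.data.mul_(mask)
    return mask

def retrain_step(model, loss_fn, opt, x, y, masks):
    """One fine-tuning step that keeps pruned weights at zero."""
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    # Re-apply masks so gradient updates cannot revive pruned weights.
    for layer, mask in masks.items():
        layer.weight.data.mul_(mask)
```

Iterating prune-then-retrain rounds, rather than pruning once, is the usual way such schemes reach high sparsity without the accuracy drop the abstract mentions.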
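After pruning, the abstract applies linear and non-linear quantization for further footprint reduction. The sketch below covers only the linear case: symmetric, per-tensor uniform quantization to a signed integer grid. The helper name and the per-tensor scale are assumptions; non-linear codebooks (e.g., clustered or logarithmic) are not shown.

```python
import numpy as np

def linear_quantize(w: np.ndarray, bits: int = 8):
    """Symmetric uniform quantization: floats -> signed integers plus one scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(w).max() / qmax, 1e-12)  # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8 if bits <= 8 else np.int32), scale

# Recover an approximation of the original weights as q * scale;
# the quantization error shrinks as `bits` grows.
```

Storing a pruned, quantized tensor as nonzero indices, low-bit integer values, and a single float scale is one way the combined sparsity and bit-width reduction translates into a smaller memory footprint.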


Related research

research · 11/12/2020
When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity
This paper is focused on improving the efficiency of the sparse co...

research · 02/03/2022
PRUNIX: Non-Ideality Aware Convolutional Neural Network Pruning for Memristive Accelerators
In this work, PRUNIX, a framework for training and pruning convolutional...

research · 06/22/2020
Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization
Pruning and quantization are proven methods for improving the performanc...

research · 08/26/2020
SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference
In recent years, there has been a flurry of research in deep neural netw...

research · 02/28/2018
Escort: Efficient Sparse Convolutional Neural Networks on GPUs
Deep neural networks have achieved remarkable accuracy in many artificia...

research · 10/05/2020
Joint Pruning Quantization for Extremely Sparse Neural Networks
We investigate pruning and quantization for deep neural networks. Our go...

research · 02/20/2020
Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs
Convolutional Neural Networks (CNN) are becoming a common presence in ma...
