Balanced Sparsity for Efficient DNN Inference on GPU

11/01/2018
by Zhuliang Yao, et al.

In trained deep neural networks, unstructured pruning can remove redundant weights to lower storage cost. However, it requires customized hardware to speed up practical inference. Another line of work accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity, pruning or regularizing consecutive weights for efficient computation, but this often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, balanced sparsity, that achieves high model accuracy together with efficient inference on commodity hardware. Our approach is tailored to the high parallelism of GPUs, showing strong potential for sparsity in widely deployed deep learning services. Experimental results show that balanced sparsity achieves up to 3.1x practical speedup for model inference on GPU while retaining the same high model accuracy as fine-grained sparsity.
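The abstract leaves the pruning mechanism to the full paper. As a minimal sketch of the idea the name suggests, the Python snippet below assumes a block-balanced reading of fine-grained sparsity: each weight-matrix row is split into equal-sized blocks, and the same number of largest-magnitude weights is kept in every block, so parallel GPU workers see identical workloads. The function name balanced_prune and its parameters are illustrative assumptions, not code from the paper.

    import numpy as np

    def balanced_prune(weights, block_size, sparsity):
        """Zero out weights so that every block of `block_size`
        consecutive entries in each row keeps the same number of
        largest-magnitude weights (an assumed reading of balanced
        sparsity, not the authors' reference implementation)."""
        W = weights.copy()
        rows, cols = W.shape
        assert cols % block_size == 0, "row length must divide into blocks"
        keep = int(round(block_size * (1.0 - sparsity)))
        for r in range(rows):
            for start in range(0, cols, block_size):
                block = W[r, start:start + block_size]  # view into W
                # indices of the smallest-magnitude entries in this block
                drop = np.argsort(np.abs(block))[: block_size - keep]
                block[drop] = 0.0
        return W

    # Example: 50% balanced sparsity with blocks of 4
    W = np.random.randn(4, 8).astype(np.float32)
    Wp = balanced_prune(W, block_size=4, sparsity=0.5)
    # every length-4 block in every row now holds exactly 2 non-zeros

Because each block keeps the same non-zero count, sparse matrix-vector work divides evenly across threads, which is the load-balancing property the abstract credits for the practical GPU speedup.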


Related research

09/06/2019 · PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices
Model compression techniques on Deep Neural Network (DNN) have been wide...

12/17/2022 · FSCNN: A Fast Sparse Convolution Neural Network Inference System
Convolution neural networks (CNNs) have achieved remarkable success, but...

10/26/2019 · Cross-Channel Intragroup Sparsity Neural Network
Modern deep neural network models generally build upon heavy over-parame...

05/24/2017 · Exploring the Regularity of Sparse Structure in Convolutional Neural Networks
Sparsity helps reduce the computational complexity of deep neural networ...

06/14/2022 · Learning Best Combination for Efficient N:M Sparsity
By forcing at most N out of M consecutive weights to be non-zero, the re...

05/05/2021 · Sequential Encryption of Sparse Neural Networks Toward Optimum Representation of Irregular Sparsity
Even though fine-grained pruning techniques achieve a high compression r...

05/30/2023 · Dynamic Sparsity Is Channel-Level Sparsity Learner
Sparse training has received an upsurging interest in machine learning d...
