Escort: Efficient Sparse Convolutional Neural Networks on GPUs

02/28/2018
by   Xuhao Chen, et al.
0

Deep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters in the networks, but it brings sparsity in the weight matrix, and therefore makes the computation inefficient on GPUs. Although pruning can remove more than 80 hurts inference performance (speed) when running models on GPUs. Two major problems cause this unsatisfactory performance on GPUs. First, lowering convolution onto matrix multiplication reduces data reuse opportunities and wastes memory bandwidth. Second, the sparsity brought by pruning makes the computation irregular, which leads to inefficiency when running on massively parallel GPUs. To overcome these two limitations, we propose Escort, an efficient sparse convolutional neural networks on GPUs. Instead of using the lowering method, we choose to compute the sparse convolutions directly. We then orchestrate the parallelism and locality for the direct sparse convolution kernel, and apply customized optimization techniques to further improve performance. Evaluation on NVIDIA GPUs show that Escort can improve sparse convolution speed by 2.63x and 3.07x, and inference speed by 1.38x and 1.60x, compared to CUBLAS and CUSPARSE respectively.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/12/2020

When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity

This paper is focused on the improvement the efficiency of the sparse co...
research
08/04/2016

Faster CNNs with Direct Sparse Convolutions and Guided Pruning

Phenomenally successful in practical inference problems, convolutional n...
research
03/09/2022

Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning

Weight pruning in deep neural networks (DNNs) can reduce storage and com...
research
06/30/2020

Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs

Despite foreseeing tremendous speedups over conventional deep neural net...
research
12/24/2019

Pruning Deep Neural Networks Architectures with Evolution Strategy

Currently, Deep Neural Networks (DNNs) are used to solve all kinds of pr...
research
12/28/2021

Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction

This work is focused on the pruning of some convolutional neural network...
research
04/21/2022

TorchSparse: Efficient Point Cloud Inference Engine

Deep learning on point clouds has received increased attention thanks to...

Please sign up or login with your details

Forgot password? Click here to reset