Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

08/29/2020
by Cong Guo, et al.

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain accuracy, sparse models often carry randomly distributed weights, leading to irregular computation. Consequently, sparse models cannot achieve meaningful speedups on commodity hardware (e.g., GPUs) built for dense matrix computation, so prior work usually modifies existing architectures or designs entirely new sparsity-optimized ones to exploit sparsity. We propose an algorithm-software co-designed pruning method that achieves latency speedups on existing dense architectures. Our work builds on the insight that matrix multiplication generally breaks a large matrix into multiple smaller tiles for parallel execution. We propose a tiling-friendly "tile-wise" sparsity pattern that maintains a regular pattern at the tile level for efficient execution while allowing irregular, arbitrary pruning at the global scale to preserve high accuracy. We implement and evaluate the sparsity pattern on GPU tensor cores, achieving a 1.95x speedup over the dense model.
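To make the core idea concrete, here is a minimal NumPy sketch of tile-wise pruning under stated assumptions: the tile sizes, the per-tile column-pruning granularity, and the L1-magnitude criterion are illustrative choices, not the paper's exact algorithm or its tensor-core GEMM implementation.

```python
import numpy as np

def tile_wise_prune(W, tile_rows=32, tile_cols=32, keep_ratio=0.5):
    """Zero out the lowest-magnitude columns inside each tile.

    Each tile keeps a regular (column-pruned) layout that a tiled GEMM
    can execute as a smaller dense tile, while the set of surviving
    columns varies from tile to tile, so the global pattern stays
    irregular (hypothetical sketch, not the paper's implementation).
    """
    W = W.copy()
    rows, cols = W.shape
    for r in range(0, rows, tile_rows):
        for c in range(0, cols, tile_cols):
            tile = W[r:r + tile_rows, c:c + tile_cols]  # view into W
            col_scores = np.abs(tile).sum(axis=0)       # L1 norm per column
            n_keep = max(1, int(round(keep_ratio * tile.shape[1])))
            n_prune = tile.shape[1] - n_keep
            prune_idx = np.argsort(col_scores)[:n_prune]
            tile[:, prune_idx] = 0.0                    # writes through the view
    return W

# Toy usage: prune a 128x128 weight matrix to ~50% tile-wise sparsity.
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128))
W_sparse = tile_wise_prune(W, keep_ratio=0.5)
print("global sparsity:", (W_sparse == 0).mean())
```

Because every tile prunes the same fraction of its columns, the per-tile workload stays balanced across the parallel units executing the tiled GEMM, which is what makes such a pattern friendly to hardware built for dense matrix computation.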

Related research

04/16/2021 - Accelerating Sparse Deep Neural Networks
As neural network model sizes have dramatically increased, so has the in...

03/09/2022 - Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning
Weight pruning in deep neural networks (DNNs) can reduce storage and com...

05/20/2021 - Dual-side Sparse Tensor Core
Leveraging sparsity in deep neural network (DNN) models is promising for...

06/08/2015 - Fast ConvNets Using Group-wise Brain Damage
We revisit the idea of brain damage, i.e. the pruning of the coefficient...

09/19/2023 - Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
With the fast growth of parameter size, it becomes increasingly challeng...

05/09/2023 - Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra
Sparse linear algebra is crucial in many application domains, but challe...

02/09/2022 - Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets
The lottery ticket hypothesis (LTH) has shown that dense models contain ...
