Dual-side Sparse Tensor Core

05/20/2021
by Yang Wang, et al.

Leveraging sparsity in deep neural network (DNN) models is promising for accelerating model inference. Yet existing GPUs can only exploit weight sparsity, not activation sparsity, because activations are dynamic, unpredictable, and hence challenging to exploit. In this work, we propose a novel architecture to efficiently harness dual-side sparsity (i.e., both weight and activation sparsity). We take a systematic approach to understanding the (dis)advantages of previous sparsity-related architectures and propose a novel, unexplored paradigm that combines an outer-product computation primitive with a bitmap-based encoding format. We demonstrate the feasibility of our design with minimal changes to the existing production-scale, inner-product-based Tensor Core. We propose a set of novel ISA extensions and co-design the matrix-matrix multiplication and convolution algorithms, the two dominant computation patterns in today's DNN models, to exploit our new dual-side sparse Tensor Core. Our evaluation shows that our design fully unleashes dual-side DNN sparsity, improving performance by up to one order of magnitude with small hardware overhead.
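The key idea, combining an outer-product computation primitive with a bitmap-based sparse encoding, can be illustrated with a small NumPy sketch. This is not the paper's hardware design; it is a software analogy of why the pairing works: a bitmap cheaply marks nonzero positions, and an outer-product formulation of matrix multiply lets entirely-zero columns of the weight matrix and rows of the activation matrix be skipped, so the work shrinks with sparsity on both sides. All function and variable names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bitmap_encode(m):
    """Encode a dense matrix as (bitmap of nonzero positions,
    packed nonzero values in row-major order)."""
    bitmap = m != 0
    values = m[bitmap]
    return bitmap, values

def bitmap_decode(bitmap, values):
    """Inverse of bitmap_encode: scatter packed values back."""
    m = np.zeros(bitmap.shape, dtype=values.dtype)
    m[bitmap] = values
    return m

def outer_product_spgemm(a, b):
    """Compute C = A @ B as a sum of outer products of A's columns
    with B's rows. Any column of A (or matching row of B) that is
    entirely zero contributes nothing and is skipped, which is how
    dual-side (weight and activation) sparsity turns into saved work."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(k):
        col = a[:, i]
        row = b[i, :]
        if not col.any() or not row.any():
            continue  # zero outer product: skip entirely
        c += np.outer(col, row)
    return c
```

In hardware, the bitmap serves the role the loop's `any()` check plays here: it identifies which column/row pairs are worth fetching and multiplying, while the packed values keep memory traffic proportional to the number of nonzeros rather than the dense matrix size.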


