Accelerating Sparse Deep Neural Networks

04/16/2021
by Asit Mishra, et al.

As neural network model sizes have dramatically increased, so has the interest in techniques that reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity: encouraging zero values in parameters, which can then be discarded from storage or computation. While most research focuses on high levels of sparsity, it remains challenging both to maintain model accuracy universally and to achieve significant speedups over modern matrix-math hardware. To make sparsity adoption practical, the NVIDIA Ampere GPU architecture introduces sparsity support in its matrix-math units, Tensor Cores. We present the design and behavior of Sparse Tensor Cores, which exploit a 2:4 (50%) sparsity pattern that leads to twice the math throughput of dense matrix units. We also describe a simple workflow for training networks that both satisfy the 2:4 sparsity pattern requirements and maintain accuracy, verifying it on a wide range of common tasks and model architectures. This workflow makes it easy to prepare accurate models for efficient deployment on Sparse Tensor Cores.
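
To make the 2:4 pattern concrete, the sketch below prunes a row of weights so that every group of four consecutive values keeps at most two nonzeros, which is where the 50% figure comes from. The abstract does not say how the surviving values are chosen; keeping the two largest magnitudes per group is an assumption made here for illustration, and the function names `prune_2_to_4` and `satisfies_2_to_4` are hypothetical.

```python
def prune_2_to_4(row):
    """Return a copy of `row` with at most 2 nonzeros per group of 4.

    Assumption (not from the abstract): within each group, the two
    values with the largest magnitude are kept and the rest are zeroed.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest-magnitude values in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def satisfies_2_to_4(row):
    """Check that every group of 4 consecutive values has at most 2 nonzeros."""
    return len(row) % 4 == 0 and all(
        sum(1 for v in row[start:start + 4] if v != 0) <= 2
        for start in range(0, len(row), 4)
    )


weights = [0.1, -0.9, 0.5, 0.05, 1.0, 2.0, -3.0, 0.2]
pruned = prune_2_to_4(weights)
print(pruned)                    # [0.0, -0.9, 0.5, 0.0, 0.0, 2.0, -3.0, 0.0]
print(satisfies_2_to_4(pruned))  # True
```

Because the zeros fall in a fixed, per-group layout, hardware can skip them predictably; this sketch only illustrates the pattern itself and is not the workflow or the storage format used by Sparse Tensor Cores.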


research
08/29/2020

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Network pruning can reduce the high computation cost of deep neural netw...
research
03/09/2022

Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning

Weight pruning in deep neural networks (DNNs) can reduce storage and com...
research
08/19/2019

A Computational Model for Tensor Core Units

To respond to the need of efficient training and inference of deep neura...
research
09/19/2023

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

With the fast growth of parameter size, it becomes increasingly challeng...
research
06/06/2022

Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors

Tensor Cores have been an important unit to accelerate Fused Matrix Mult...
research
05/20/2021

Dual-side Sparse Tensor Core

Leveraging sparsity in deep neural network (DNN) models is promising for...
research
02/16/2023

Reducing Computational and Statistical Complexity in Machine Learning Through Cardinality Sparsity

High-dimensional data has become ubiquitous across the sciences but caus...
