S4: a High-sparsity, High-performance AI Accelerator

07/16/2022
by Ian En-Hsu Yen, et al.

Exploiting the sparsity underlying neural networks has become one of the most promising ways to reduce memory footprint, I/O cost, and computation workload during inference, and the degree of sparsity that can be exploited has grown as model sizes have increased with the trend toward pre-training giant models. On the other hand, unlike quantization, which is already a widely supported option, acceleration through high-degree sparsity is not supported on most computing platforms. In this work, we introduce S4, the first commercial hardware platform supporting high-degree sparsity acceleration of up to 32 times. Combined with state-of-the-art sparse pruning techniques, we demonstrate several-fold practical inference speedups on S4 over mainstream inference platforms such as the Nvidia T4. We also show that, in practice, a larger sparse model can achieve both higher accuracy and higher throughput on S4 than a smaller dense model.
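As a concrete illustration of the kind of pruning the abstract alludes to, below is a minimal, hypothetical sketch of one-shot unstructured magnitude pruning in PyTorch. It is not the paper's actual pruning recipe or the S4 software stack; the function name, sparsity target, and toy model are assumptions for illustration. A sparsity of 1 - 1/32 keeps roughly one weight in 32, matching the 32x sparsity degree mentioned above.

```python
import torch
import torch.nn as nn


def magnitude_prune(model: nn.Module, sparsity: float = 1.0 - 1.0 / 32) -> None:
    """Zero out the smallest-magnitude weights of every Linear layer in place."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            weight = module.weight.data
            k = int(weight.numel() * sparsity)  # number of weights to drop
            if k == 0:
                continue
            # The k-th smallest absolute value serves as the pruning threshold.
            threshold = weight.abs().flatten().kthvalue(k).values
            mask = (weight.abs() > threshold).to(weight.dtype)
            weight.mul_(mask)


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
    magnitude_prune(model)
    linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
    nnz = sum((m.weight != 0).sum().item() for m in linears)
    total = sum(m.weight.numel() for m in linears)
    print(f"remaining weight density: {nnz / total:.4f}")  # roughly 1/32
```

In practice, such high sparsity levels are usually reached gradually during training, with retraining or fine-tuning to recover accuracy, rather than in a single one-shot step as in this sketch.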

