VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs

02/17/2023
by Geonhwa Jeong, et al.

Deep Learning (DL) acceleration support in CPUs has recently gained traction, with several companies (Arm, Intel, IBM) announcing products with specialized matrix engines accessible via GEMM instructions. CPUs are pervasive and must handle diverse requirements across DL workloads running on edge, HPC, and cloud platforms. As DL workloads embrace sparsity to reduce the computation and memory footprint of models, CPUs must also support sparsity to avoid under-utilizing the dense matrix engine and using caches and registers inefficiently. This work presents VEGETA, a set of ISA and microarchitecture extensions over dense matrix engines to support flexible structured sparsity for CPUs, enabling programmable support for diverse DL models with varying degrees of sparsity. Compared to the state-of-the-art (SOTA) dense matrix engine in CPUs, a VEGETA engine provides 1.09x, 2.20x, 3.74x, and 3.28x speed-ups when running 4:4 (dense), 2:4, 1:4, and unstructured (95
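The N:M notation used above (4:4, 2:4, 1:4) means that at most N of every M consecutive weights are non-zero. The sketch below illustrates that idea in plain Python. It is not VEGETA's ISA or storage format: the function names (`prune_n_m`, `compress_n_m`, `sparse_dot`), the magnitude-based pruning rule, and the values-plus-indices layout are assumptions chosen for illustration only.

```python
def prune_n_m(row, n, m):
    """Zero all but the n largest-magnitude values in each block of m elements.

    Magnitude-based selection is an assumption; the abstract does not say
    how the sparse models are produced.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    pruned = []
    for start in range(0, len(row), m):
        block = row[start:start + m]
        # Positions of the n largest-magnitude entries within this block.
        keep = sorted(range(m), key=lambda i: abs(block[i]), reverse=True)[:n]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(block))
    return pruned


def compress_n_m(row, n, m):
    """Store each block as exactly n values plus their in-block positions.

    This values-plus-indices layout is a hypothetical format for illustration.
    """
    values, indices = [], []
    for start in range(0, len(row), m):
        block = row[start:start + m]
        nonzero = [i for i, v in enumerate(block) if v != 0]
        if len(nonzero) > n:
            raise ValueError("block has more than n non-zeros")
        # Pad with zero-valued positions so every block stores n entries.
        padding = [i for i in range(m) if i not in nonzero][: n - len(nonzero)]
        kept = sorted(nonzero + padding)
        values.extend(block[i] for i in kept)
        indices.extend(kept)
    return values, indices


def sparse_dot(values, indices, dense, n, m):
    """Dot product that touches only the stored entries of the sparse row."""
    total = 0.0
    for k, (value, index) in enumerate(zip(values, indices)):
        block = k // n  # which block of m this stored entry belongs to
        total += value * dense[block * m + index]
    return total


row = [0.1, -2.0, 0.3, 4.0, 1.0, 0.0, -0.5, 0.2]
dense = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

pruned = prune_n_m(row, n=2, m=4)                # 2:4 sparsity
values, indices = compress_n_m(pruned, n=2, m=4)
result = sparse_dot(values, indices, dense, n=2, m=4)
```

With 2:4 sparsity the compressed row holds half as many values as the dense row, and the dot product performs half as many multiplications. That reduction in work and storage is what a sparsity-aware matrix engine aims to exploit.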

research
05/22/2023

HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

Due to complex interactions among various deep neural network (DNN) opti...
research
07/16/2022

S4: a High-sparsity, High-performance AI Accelerator

Exploiting sparsity underlying neural networks has become one of the mos...
research
09/16/2021

Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs

Machine/deep-learning (ML/DL) based techniques are emerging as a driving...
research
08/12/2022

An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers

The Transformer has been an indispensable staple in deep learning. Howev...
research
03/18/2021

Extending Sparse Tensor Accelerators to Support Multiple Compression Formats

Sparsity, which occurs in both scientific applications and Deep Learning...
research
12/01/2020

A Study of Checkpointing in Large Scale Training of Deep Neural Networks

Deep learning (DL) applications are increasingly being deployed on HPC s...
research
04/21/2022

TorchSparse: Efficient Point Cloud Inference Engine

Deep learning on point clouds has received increased attention thanks to...
