Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network

06/15/2023
by Song Han, et al.

EIE proposed to accelerate pruned and compressed neural networks by exploiting weight sparsity, activation sparsity, and 4-bit weight sharing in neural network accelerators. Since its publication at ISCA'16, it has opened a new design space for accelerating pruned and sparse neural networks and spawned many algorithm-hardware co-designs for model compression and acceleration, in both academia and commercial AI chips. In this retrospective, we review the background of the project, summarize its pros and cons, and discuss new opportunities where pruning, sparsity, and low precision can accelerate emerging deep learning workloads.
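To make the mechanism concrete, below is a minimal Python sketch of the kernel EIE accelerates: a sparse matrix times a sparse activation vector, where zero activations are skipped entirely and each stored weight is a 4-bit index into a shared codebook plus a relative row offset. This is an illustrative software model, not the paper's hardware dataflow; the function name `eie_spmv` and the flat-array encoding are assumptions made for clarity.

```python
import numpy as np

def eie_spmv(act, col_ptr, rel_row_idx, weight_idx, codebook, out_dim):
    """Minimal model of EIE's core kernel: sparse matrix x sparse
    vector with 4-bit weight sharing.

    Weights are stored column-wise (CSC-like): for input column j, the
    entries col_ptr[j]:col_ptr[j+1] describe that column's nonzeros.
    Each entry carries an index into `codebook` (the shared weight
    values, 16 of them for 4-bit sharing) and a row offset relative to
    the previous nonzero in the column (EIE encodes this offset in
    4 bits and zero-pads gaps longer than 15; that detail is omitted
    here for brevity).
    """
    out = np.zeros(out_dim)
    for j, a in enumerate(act):
        if a == 0.0:                  # activation sparsity: skip zeros
            continue
        row = 0
        for k in range(col_ptr[j], col_ptr[j + 1]):
            row += rel_row_idx[k]     # decode the relative row index
            out[row] += codebook[weight_idx[k]] * a  # shared-weight lookup
    return out

# Toy usage: a 4x3 weight matrix with two nonzeros and one zero activation.
codebook = np.array([0.0, -0.5, 0.25, 1.0])  # toy shared values
act = np.array([0.0, 2.0, 1.0])              # column 0's activation is zero
col_ptr = np.array([0, 0, 1, 2])             # column 0 holds no nonzeros
rel_row_idx = np.array([1, 3])               # nonzeros land in rows 1 and 3
weight_idx = np.array([3, 2])                # codebook entries 1.0 and 0.25
print(eie_spmv(act, col_ptr, rel_row_idx, weight_idx, codebook, out_dim=4))
# -> [0.   2.   0.   0.25]
```

In the actual accelerator, these per-column updates are distributed across an array of processing elements, each owning an interleaved subset of output rows, so broadcasting one nonzero activation keeps many multipliers busy in parallel.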

