Tetris: Re-architecting Convolutional Neural Network Computation for Machine Learning Accelerators

11/14/2018
by Hang Lu, et al.

Inference efficiency is the predominant consideration in designing deep learning accelerators. Previous work has mainly focused on skipping zero values to eliminate their substantial ineffectual computation, while zero bits in non-zero values, another major source of ineffectual computation, are often ignored. The reason lies in the difficulty of extracting the essential bits during multiply-and-accumulate (MAC) operations in the processing elements. Based on the fact that zero bits occupy as much as 68.9% of the weight bits in modern deep convolutional neural network models, this paper first proposes a weight kneading technique that simultaneously eliminates the ineffectual computation caused by both zero-value weights and zero bits in non-zero weights. In addition, a split-and-accumulate (SAC) computing pattern that replaces the conventional MAC, together with a corresponding hardware accelerator design called Tetris, is proposed to support weight kneading at the hardware level. Experimental results show that Tetris speeds up inference by up to 1.50x and improves power efficiency by up to 5.33x compared with state-of-the-art baselines.
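To make the two ideas in the abstract concrete, below is a minimal Python sketch, not the paper's implementation: it measures the zero-bit fraction of a set of fixed-point weights, and computes a dot product in a SAC-like shift-and-add fashion that visits only the essential (set) bits of each weight, so zero bits and zero-value weights contribute no work. The unsigned 8-bit quantization and all function names are illustrative assumptions.

```python
# Illustrative sketch, not the Tetris design: weights are assumed to be
# unsigned 8-bit fixed-point values; all names here are hypothetical.
import numpy as np

BITS = 8  # assumed weight bit width

def zero_bit_fraction(weights: np.ndarray) -> float:
    """Fraction of zero bits across all weight bit positions."""
    ones = sum(bin(int(w)).count("1") for w in weights.ravel())
    return 1.0 - ones / (weights.size * BITS)

def sac_dot(weights, activations) -> int:
    """Dot product via shift-and-add over essential bits only.

    Each weight is decomposed into its set bit positions; the paired
    activation is shifted by each position and accumulated, so zero
    bits (and zero-value weights) cost no operations at all.
    """
    acc = 0
    for w, a in zip(weights, activations):
        w, a = int(w), int(a)
        while w:
            lsb = w & -w                        # isolate lowest set bit
            acc += a << (lsb.bit_length() - 1)  # a * 2^bit_position
            w ^= lsb                            # clear that bit
    return acc

rng = np.random.default_rng(0)
w = rng.integers(0, 256, size=64, dtype=np.uint8)
a = rng.integers(0, 128, size=64)

print(f"zero-bit fraction: {zero_bit_fraction(w):.3f}")
# The bit-serial SAC result matches a conventional MAC dot product.
assert sac_dot(w, a) == int(np.dot(w.astype(np.int64), a.astype(np.int64)))
```

In this toy setting the zero-bit fraction is what a random 8-bit distribution gives (about 0.5); the paper's point is that trained CNN weights are far more skewed, leaving roughly two thirds of the bits zero and hence skippable.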


research
09/03/2021

On the Accuracy of Analog Neural Network Inference Accelerators

Specialized accelerators have recently garnered attention as a method to...
research
04/26/2018

Accelerator-Aware Pruning for Convolutional Neural Networks

Convolutional neural networks have shown tremendous performance in compu...
research
02/01/2023

Bit-balance: Model-Hardware Co-design for Accelerating NNs by Exploiting Bit-level Sparsity

Bit-serial architectures can handle Neural Networks (NNs) with different...
research
05/10/2018

Laconic Deep Learning Computing

We motivate a method for transparently identifying ineffectual computati...
research
04/10/2020

A Survey on Impact of Transient Faults on BNN Inference Accelerators

Over past years, the philosophy for designing the artificial intelligenc...
research
08/25/2020

IKW: Inter-Kernel Weights for Power Efficient Edge Computing

Deep Convolutional Neural Networks (CNN) have achieved state-of-the-art ...
research
04/24/2020

Computation on Sparse Neural Networks: an Inspiration for Future Hardware

Neural network models are widely used in solving many challenging proble...
