Bit-balance: Model-Hardware Co-design for Accelerating NNs by Exploiting Bit-level Sparsity

02/01/2023
by   Wenhao Sun, et al.

Bit-serial architectures can handle Neural Networks (NNs) with different weight precisions, achieving higher resource efficiency than bit-parallel architectures. Moreover, the weights contain abundant zero bits owing to the fault tolerance of NNs, so the bit-level sparsity of NNs can be further exploited for performance improvement. However, the irregular proportion of zero bits in each weight causes imbalanced workloads in the Processing Element (PE) array, which degrades performance or induces overhead for sparse processing. This paper therefore proposes a bit-sparsity quantization method that constrains the bit-sparsity ratio of each weight to no more than a certain value, balancing workloads with little accuracy loss. We then co-design a sparse bit-serial architecture, called Bit-balance, that improves overall performance by supporting weight-bit sparsity and adaptive-bitwidth computation. The whole design was implemented in 65 nm technology at 1 GHz and runs at 326, 30, 56, and 218 frames/s for AlexNet, VGG-16, ResNet-50, and GoogleNet, respectively. Compared with the sparse bit-serial accelerator Bitlet, Bit-balance achieves 1.8x-2.7x higher energy efficiency (frames/J) and 2.1x-3.7x higher resource efficiency (frames/mm2).
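To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the paper's exact algorithm: (1) a bit-sparsity quantizer that caps the number of nonzero "essential" bits in each weight, and (2) a bit-serial shift-and-add multiply that iterates only over those nonzero bits, so every weight costs at most a fixed number of cycles. The cap MAX_BITS, the function names, and the truncation-based rounding are illustrative assumptions, not details taken from the paper.

MAX_BITS = 2  # assumed per-weight cap on nonzero bits; the paper tunes such a value

def quantize_bit_sparse(w, width=8, max_bits=MAX_BITS):
    """Keep only the `max_bits` most significant nonzero bits of |w|,
    truncating the rest; the sign is preserved."""
    sign = -1 if w < 0 else 1
    mag = min(abs(w), (1 << width) - 1)
    kept, nbits = 0, 0
    for pos in reversed(range(width)):      # scan from MSB to LSB
        if mag & (1 << pos):
            kept |= 1 << pos
            nbits += 1
            if nbits == max_bits:
                break
    return sign * kept

def bit_serial_mul(w, x):
    """Shift-and-add multiply that touches only the nonzero bits of w,
    mimicking a sparse bit-serial processing element."""
    sign = -1 if w < 0 else 1
    acc, mag = 0, abs(w)
    while mag:
        lsb = mag & -mag                    # isolate the lowest set bit
        acc += x << (lsb.bit_length() - 1)  # add x shifted by that bit's position
        mag ^= lsb                          # clear the consumed bit
    return sign * acc

# After quantization, every weight needs at most MAX_BITS shift-add cycles,
# so PE lanes processing different weights finish in lockstep.
weights = [quantize_bit_sparse(w) for w in (93, -57, 6, 120)]   # -> [80, -48, 6, 96]
acts = [3, 7, 1, 2]
print(sum(bit_serial_mul(w, x) for w, x in zip(weights, acts)))

Bounding the essential-bit count per weight is what turns bit sparsity from a source of PE-array load imbalance into a uniform, predictable cycle budget: the slowest lane finishes after at most MAX_BITS shift-add steps per weight.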


Related research

03/01/2021 · SWIS – Shared Weight bIt Sparsity for Efficient Neural Network Acceleration
Quantization is spearheading the increase in performance and efficiency ...

02/01/2022 · Sense: Model Hardware Co-design for Accelerating Sparse Neural Networks
Sparsity is an intrinsic property of neural network (NN). Many software r...

09/09/2021 · SONIC: A Sparse Neural Network Inference Accelerator with Silicon Photonics for Energy-Efficient Deep Learning
Sparse neural networks can greatly facilitate the deployment of neural n...

12/10/2020 · A MAC-less Neural Inference Processor Supporting Compressed, Variable Precision Weights
This paper introduces two architectures for the inference of convolution...

11/14/2018 · Tetris: Re-architecting Convolutional Neural Network Computation for Machine Learning Accelerators
Inference efficiency is the predominant consideration in designing deep ...

12/10/2017 · A Scalable High-Performance Priority Encoder Using 1D-Array to 2D-Array Conversion
In our prior study of an L-bit priority encoder (PE), a so-called one-di...

05/18/2021 · IMPULSE: A 65nm Digital Compute-in-Memory Macro with Fused Weights and Membrane Potential for Spike-based Sequential Learning Tasks
The inherent dynamics of the neuron membrane potential in Spiking Neural...
