SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network

by Fangxin Liu, et al.

Resistive Random-Access Memory (ReRAM) crossbars are a promising substrate for deep neural network (DNN) accelerators, thanks to their in-memory, in-situ analog computation of Vector-Matrix Multiplication-and-Accumulations (VMMs). However, it is challenging for the crossbar architecture to exploit sparsity in DNNs: the tightly coupled crossbar structure makes exploiting fine-grained sparsity require complex and costly control. As a countermeasure, we develop a novel ReRAM-based DNN accelerator, the Sparse-Multiplication-Engine (SME), built on a hardware/software co-design framework. First, we orchestrate the bit-sparse pattern to increase the density of bit sparsity on top of existing quantization methods. Second, we propose a novel weight mapping mechanism that slices the bits of a weight across crossbars and splices the activation results in the peripheral circuits. This mechanism decouples the tightly coupled crossbar structure and accumulates the sparsity within the crossbars. Finally, a squeeze-out scheme empties the crossbars that, after the previous two steps, are mapped with highly sparse non-zeros. We design the SME architecture and discuss its use with other quantization methods and different ReRAM cell technologies. Compared with prior state-of-the-art designs, SME reduces crossbar usage by up to 8.7x and 2.1x on ResNet-50 and MobileNet-v2, respectively, with less than 0.3% accuracy drop on ImageNet.
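The bit-slicing step described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes 8-bit unsigned weights and 2-bit ReRAM cells (both illustrative parameters), and the function names are hypothetical. Each bit slice corresponds to one crossbar; a slice that turns out to be all-zero is a crossbar that the squeeze-out scheme could empty.

```python
import numpy as np

def bit_slice(weights, total_bits=8, cell_bits=2):
    """Decompose unsigned fixed-point weights into per-cell bit slices.

    Returns an array of shape (num_slices, *weights.shape), where slice k
    holds the k-th group of `cell_bits` bits of each weight (LSB first).
    """
    num_slices = total_bits // cell_bits
    w = weights.astype(np.uint32)
    mask = (1 << cell_bits) - 1
    return np.stack([(w >> (k * cell_bits)) & mask for k in range(num_slices)])

# Toy 4x4 weight tile: all values fit in the low 3 bits, mimicking a
# bit-sparse quantized layer whose high-order bits are entirely zero.
tile = np.array([[3, 0, 1, 2],
                 [0, 7, 4, 0],
                 [5, 0, 0, 1],
                 [0, 2, 6, 0]])
slices = bit_slice(tile, total_bits=8, cell_bits=2)

# An all-zero slice maps to a crossbar that can be "squeezed out".
empty = [k for k in range(slices.shape[0]) if not slices[k].any()]
print(f"{len(empty)} of {slices.shape[0]} slices are all-zero")  # → 2 of 4
```

The original weight is recoverable by summing each slice shifted back to its bit position, which is what the peripheral circuits' splicing of partial results corresponds to in this sketch.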
