MARS: Multi-macro Architecture SRAM CIM-Based Accelerator with Co-designed Compressed Neural Networks

10/24/2020
by Syuan-Hao Sie, et al.

Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overhead and substantial computation cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM) architectures have demonstrated great potential for efficiently computing large-scale matrix-vector multiplications. However, the intensive multiply-and-accumulate (MAC) operations executed at the crossbar array and the limited capacity of CIM macros remain bottlenecks for further improvements in energy efficiency and throughput. To reduce computation cost, network pruning and quantization are two widely studied compression methods for shrinking the model size. However, most model compression algorithms can only be implemented in digital-based CNN accelerators. For implementation in a static random access memory (SRAM) CIM-based accelerator, a model compression algorithm must account for the hardware limitations of CIM macros, such as the number of word lines and bit lines that can be turned on at the same time, as well as how the weights are mapped to the SRAM CIM macro. In this study, a software and hardware co-design approach is proposed to design an SRAM CIM-based CNN accelerator together with an SRAM CIM-aware model compression algorithm. To reduce the high-precision MAC operations required by batch normalization (BN), a quantization algorithm that fuses BN into the weights is proposed. Furthermore, to reduce the number of network parameters, a sparsity algorithm that accounts for the CIM architecture is proposed. Finally, MARS, a CIM-based CNN accelerator that can utilize multiple SRAM CIM macros as processing units and supports sparse neural networks, is proposed.
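The abstract only names the two co-design techniques; their exact formulations are in the full paper. As a rough illustration, a minimal NumPy sketch of the two ideas follows: folding BN parameters into the preceding convolution's weights and bias, and pruning weights at the granularity of a CIM macro's simultaneously activated word lines so that whole groups can be skipped. The function names, the group_rows parameter, and the magnitude-based group ranking here are illustrative assumptions, not the paper's actual algorithm.

    import numpy as np

    def fold_bn_into_conv(W, b, gamma, beta, mu, var, eps=1e-5):
        """Fold a BatchNorm layer into the preceding convolution.

        W: conv weights, shape (C_out, C_in, kH, kW)
        b: conv bias, shape (C_out,)
        gamma, beta, mu, var: per-channel BN parameters, shape (C_out,)

        Returns (W_folded, b_folded) such that
        BN(conv(x, W, b)) == conv(x, W_folded, b_folded).
        """
        scale = gamma / np.sqrt(var + eps)          # per-output-channel scale
        W_folded = W * scale[:, None, None, None]   # scale each output filter
        b_folded = (b - mu) * scale + beta
        return W_folded, b_folded

    def prune_cim_groups(W, group_rows, sparsity):
        """Zero out whole word-line groups of a CIM-mapped weight matrix.

        W: weight matrix already mapped to a macro layout, shape (rows, cols)
        group_rows: number of rows (word lines) activated together per cycle
        sparsity: fraction of groups to prune, in [0, 1]

        Groups are ranked by their L2 norm; the weakest ones are zeroed so the
        accelerator can skip the corresponding MAC cycles entirely.
        """
        rows, cols = W.shape
        n_groups = rows // group_rows
        lead = n_groups * group_rows
        W_pruned = W.copy()
        groups = W_pruned[:lead].reshape(n_groups, group_rows, cols).copy()
        norms = np.linalg.norm(groups.reshape(n_groups, -1), axis=1)
        prune_idx = np.argsort(norms)[:int(sparsity * n_groups)]
        groups[prune_idx] = 0.0
        W_pruned[:lead] = groups.reshape(lead, cols)
        return W_pruned

Folding BN this way removes the per-channel floating-point scale and shift at inference time, which is what makes the weights amenable to quantization and direct mapping onto a CIM macro, while pruning at the word-line-group granularity keeps the sparsity pattern aligned with the rows a macro activates together, so pruned groups translate directly into skipped MAC cycles.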
