MPNA: A Massively-Parallel Neural Array Accelerator with Dataflow Optimization for Convolutional Neural Networks

10/30/2018
by Muhammad Abdullah Hanif, et al.

The state-of-the-art accelerators for Convolutional Neural Networks (CNNs) typically focus on accelerating only the convolutional layers and give little attention to the fully-connected layers. Hence, they lack a synergistic optimization of the hardware architecture and the diverse dataflows needed for a complete CNN design, which offers a higher potential for performance/energy efficiency. Towards this, we propose a novel Massively-Parallel Neural Array (MPNA) accelerator that integrates two heterogeneous systolic arrays and their respective highly-optimized dataflow patterns to jointly accelerate both the convolutional (CONV) and the fully-connected (FC) layers. Besides fully exploiting the available off-chip memory bandwidth, these optimized dataflows enable high reuse of all data types (i.e., weights, input activations, and output activations), thereby allowing our MPNA to achieve high energy savings. We synthesized the MPNA architecture using an ASIC design flow for a 28nm technology, and performed functional and timing validation using multiple real-world complex CNNs. MPNA achieves 149.7 GOPS/W at 280 MHz and consumes 239 mW. Experimental results show that our MPNA architecture provides 1.7x overall performance improvement compared to a state-of-the-art accelerator, and 51…
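The abstract's central idea is that both CONV and FC layers can be served by systolic arrays once each is expressed as a matrix multiplication, with the dataflow deciding which operand stays resident in the array to maximize reuse. Below is a minimal Python sketch of a weight-stationary tiled matmul that mimics this reuse pattern by counting operand fetches; the tile size, function name, and fetch counters are illustrative assumptions and do not reflect the actual MPNA arrays or dataflows.

```python
import numpy as np

def weight_stationary_matmul(A, B, tile=4):
    """Tiled matmul that mimics a weight-stationary systolic dataflow.

    Each weight tile of B is fetched once and held "in the array",
    then reused against every activation tile of A, so weight fetches
    stay at one per element while activations stream through.
    (Illustrative sketch only; not the MPNA design.)
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % tile == K % tile == N % tile == 0
    C = np.zeros((M, N))
    weight_fetches = 0
    activation_fetches = 0
    for k0 in range(0, K, tile):
        for n0 in range(0, N, tile):
            Wt = B[k0:k0 + tile, n0:n0 + tile]   # load each weight tile once
            weight_fetches += Wt.size
            for m0 in range(0, M, tile):         # reuse it across all rows of A
                At = A[m0:m0 + tile, k0:k0 + tile]
                activation_fetches += At.size
                # partial sums accumulate in place, giving output reuse too
                C[m0:m0 + tile, n0:n0 + tile] += At @ Wt
    return C, weight_fetches, activation_fetches

A = np.random.rand(8, 16)
B = np.random.rand(16, 8)
C, wf, af = weight_stationary_matmul(A, B)
assert np.allclose(C, A @ B)
print(f"weight fetches: {wf}, activation fetches: {af}")
```

An FC layer is exactly such a matmul, and a CONV layer can be lowered to one via im2col, which is one common way a single accelerator fabric can cover both layer types. Output activations also see reuse here because each tile of C accumulates partial sums in place rather than being written out after every product.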



Related research

Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks (06/23/2017)
Loom (LM), a hardware inference accelerator for Convolutional Neural Net...

A flexible FPGA accelerator for convolutional neural networks (12/16/2019)
Though CNNs are highly parallel workloads, in the absence of efficient o...

S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural Networks (06/15/2021)
Convolutional neural networks (CNNs) have achieved great success in perf...

Mapping of CNNs on multi-core RRAM-based CIM architectures (09/07/2023)
RRAM-based multi-core systems improve the energy efficiency and performa...

On the Difficulty of Designing Processor Arrays for Deep Neural Networks (06/24/2020)
Systolic arrays are a promising computing concept which is in particular...

Restructuring Batch Normalization to Accelerate CNN Training (07/04/2018)
Because CNN models are compute-intensive, where billions of operations c...

Morph: Flexible Acceleration for 3D CNN-based Video Understanding (10/16/2018)
The past several years have seen both an explosion in the use of Convolu...
