NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

06/05/2017
by Alessandro Aimar, et al.

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphics Processing Units (GPUs) are most often used for training and deploying CNNs, their power consumption becomes a problem for real-time mobile applications. We propose a flexible and efficient CNN accelerator architecture which supports the implementation of SOA CNNs in low-power and low-latency application scenarios. This architecture exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of the available computing resources across a wide range of convolutional kernel sizes and numbers of input and output feature maps. We implemented the proposed architecture on an FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs, ranging from small networks up to the widely known large VGG16 and VGG19 CNNs. We also show that in RTL simulations in a 28 nm process with a clock frequency of 500 MHz, the NullHop core reaches over 450 GOp/s at an efficiency of 368%, achieving a power efficiency of over 3 TOp/s/W in a core area of 5.8 mm².
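As background for the sparsity claim, the sketch below shows one plausible software model of the kind of sparse feature-map encoding the abstract alludes to: a one-bit-per-pixel sparsity map plus a packed list of non-zero activations. This is an illustration only, not the authors' RTL; the function names and the assumed 16-bit activation width are ours.

import numpy as np

def compress_feature_map(fmap: np.ndarray):
    """Encode a feature map as (sparsity_mask, nonzero_values).

    The mask stores one bit per pixel (True = non-zero); only the
    non-zero activations are kept, so memory shrinks with sparsity.
    """
    mask = fmap != 0
    nonzeros = fmap[mask]
    return mask, nonzeros

def decompress_feature_map(mask: np.ndarray, nonzeros: np.ndarray):
    """Reconstruct the dense feature map from the compressed pair."""
    fmap = np.zeros(mask.shape, dtype=nonzeros.dtype)
    fmap[mask] = nonzeros
    return fmap

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # ReLU output: roughly half of the activations are exactly zero.
    fmap = np.maximum(rng.standard_normal((64, 64)).astype(np.float32), 0)

    mask, nonzeros = compress_feature_map(fmap)
    dense_bits = fmap.size * 16                       # assumed 16-bit activations
    packed_bits = mask.size * 1 + nonzeros.size * 16  # 1-bit mask + values
    print(f"sparsity: {1 - nonzeros.size / fmap.size:.2%}")
    print(f"compression ratio: {dense_bits / packed_bits:.2f}x")

    assert np.array_equal(decompress_feature_map(mask, nonzeros), fmap)

Because ReLU layers zero out a large fraction of activations, a mask-plus-values encoding of this kind reduces memory traffic roughly in proportion to the sparsity, and a compute core that iterates only over the non-zero list skips the corresponding multiply-accumulates; this is the effect the throughput and efficiency figures above quantify.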

