NeuroMAX: A High Throughput, Multi-Threaded, Log-Based Accelerator for Convolutional Neural Networks

07/19/2020
by Mahmood Azhar Qureshi, et al.

Convolutional neural networks (CNNs) require high throughput hardware accelerators for real-time applications owing to their huge computational cost. Most traditional CNN accelerators rely on single-core, linear processing elements (PEs) in conjunction with 1D dataflows for accelerating convolution operations. This limits the maximum achievable ratio of peak throughput per PE count to unity. Most past works optimize their dataflows to attain close to 100% hardware utilization. In this work, we design a high throughput, multi-threaded, log-based PE core. The designed core provides a 200% increase in peak throughput per PE count while incurring only a 6% increase in area overhead compared to a single, linear multiplier PE core with the same output bit precision. We also present a 2D weight-broadcast dataflow which exploits the multi-threaded nature of the PE cores to achieve high hardware utilization per layer for various CNNs. The entire architecture, which we refer to as NeuroMAX, is implemented on a Xilinx Zynq 7020 SoC at a 200 MHz processing clock. Detailed analysis of throughput, hardware utilization, area and power breakdown, and latency shows performance improvements over previous FPGA and ASIC designs.
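To illustrate the general idea behind a log-based PE, the sketch below shows how log-domain arithmetic replaces a hardware multiplier with a shift: weights are quantized to signed powers of two, so multiplying an integer activation by a weight reduces to a bit shift plus a sign flip. This is a minimal, hypothetical Python model of the technique in general, not the NeuroMAX core itself; the function names and the nearest-power-of-two rounding rule are assumptions for illustration.

```python
import math

def log2_quantize(w):
    """Quantize a real weight to a (sign, exponent) pair so that
    w is approximated by sign * 2**exponent."""
    if w == 0:
        return 0, 0  # sign 0 encodes a zero weight
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))  # nearest power of two (one rounding choice)
    return sign, exp

def log_multiply(activation, sign, exp):
    """Multiply an integer activation by a log2-quantized weight
    using only a shift and a sign flip (no hardware multiplier)."""
    if exp >= 0:
        return sign * (activation << exp)   # weight >= 1: left shift
    return sign * (activation >> -exp)      # weight < 1: right shift
```

For example, a weight of 4 becomes (sign=1, exp=2), so multiplying the activation 12 is a left shift by 2, giving 48; a weight of -0.5 becomes (sign=-1, exp=-1), so multiplying 8 is a right shift by 1 with a sign flip, giving -4. The shift-based datapath is what lets several such "multiplies" be packed per PE at low area cost, which is the property the multi-threaded core exploits.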


