NeuroMAX: A High Throughput, Multi-Threaded, Log-Based Accelerator for Convolutional Neural Networks

by Mahmood Azhar Qureshi, et al.
Kansas State University

Convolutional neural networks (CNNs) require high-throughput hardware accelerators for real-time applications owing to their huge computational cost. Most traditional CNN accelerators rely on single-core, linear processing elements (PEs) in conjunction with 1D dataflows for accelerating convolution operations. This limits the maximum achievable ratio of peak throughput per PE count to unity. Most of the past works optimize their dataflows to attain close to 100% [...] a high-throughput, multi-threaded, log-based PE core. The designed core provides a 200% [...] 6% [...] with the same output bit precision. We also present a 2D weight broadcast dataflow which exploits the multi-threaded nature of the PE cores to achieve high hardware utilization per layer for various CNNs. The entire architecture, which we refer to as NeuroMAX, is implemented on a Xilinx Zynq 7020 SoC at a 200 MHz processing clock. Detailed analysis of throughput, hardware utilization, area and power breakdown, and latency shows the performance improvement compared to previous FPGA and ASIC designs.


