A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA

03/21/2017
by Kamel Abdelouahab et al.

Deep Neural Networks are becoming the de facto standard models for image understanding and, more generally, for computer vision tasks. Because they involve highly parallelizable computations, CNNs are well suited to current fine-grained programmable logic devices, and multiple CNN accelerators have been successfully implemented on FPGAs. Unfortunately, FPGA resources such as logic elements or DSP units remain limited. This work presents a holistic method that relies on approximate computing and design space exploration to optimize the DSP block utilization of a CNN implementation on an FPGA. The method was tested by implementing a reconfigurable OCR convolutional neural network on an Altera Stratix V device while varying both the data representation and the CNN topology, in order to find the best combination in terms of DSP block utilization and classification accuracy. This exploration generated dataflow architectures of 76 CNN topologies with 5 different fixed-point representations. The most efficient implementation performs 883 classifications/sec at 256 x 256 resolution using 8
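As a rough illustration of what varying the fixed-point representation means, the sketch below rounds a few weights to signed fixed-point values of different widths and reports the worst-case rounding error. It is a minimal sketch, not the authors' tool flow: the function names, the sample weights, and the candidate bit-widths are all assumptions made for illustration, since the abstract does not state which five representations were explored.

```python
def quantize_fixed_point(x, total_bits, frac_bits):
    """Round x to a signed fixed-point value, saturating on overflow.

    total_bits includes the sign bit; frac_bits is the number of
    fractional bits. Both names are hypothetical, chosen for this sketch.
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))       # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1    # most positive integer code
    q = round(x * scale)                   # nearest integer code
    q = max(q_min, min(q_max, q))          # saturate instead of wrapping
    return q / scale


def max_quantization_error(weights, total_bits, frac_bits):
    """Largest absolute difference between a weight and its quantized value."""
    return max(
        abs(w - quantize_fixed_point(w, total_bits, frac_bits))
        for w in weights
    )


# Made-up weights in [-1, 1); real CNN weights would come from training.
weights = [0.73, -0.41, 0.05, -0.98]

# Assumed candidate widths, with one sign bit and the rest fractional.
for bits in (4, 5, 6, 7, 8):
    err = max_quantization_error(weights, bits, bits - 1)
    print(f"{bits}-bit: max error {err:.4f}")
```

Narrower words cost fewer multiplier resources but introduce larger rounding error, which is the trade-off a design space exploration over data representation and topology has to weigh against classification accuracy.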


research
05/04/2017

Hardware Automated Dataflow Deployment of CNNs

Deep Convolutional Neural Networks (CNNs) are the state of the art syste...
research
08/21/2021

Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks

Convolutional Neural Networks (CNNs) are widely used in deep learning ap...
research
12/23/2020

Overview of FPGA deep learning acceleration based on convolutional neural network

In recent years, deep learning has become more and more mature, and as a...
research
05/14/2020

ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network

Image Understanding is becoming a vital feature in ever more application...
research
05/31/2018

A Highly Parallel FPGA Implementation of Sparse Neural Network Training

We demonstrate an FPGA implementation of a parallel and reconfigurable a...
research
06/30/2018

The Challenge of Multi-Operand Adders in CNNs on FPGAs: How not to solve it!

Convolutional Neural Networks (CNNs) are computationally intensive algor...
research
01/12/2017

Scaling Binarized Neural Networks on Reconfigurable Logic

Binarized neural networks (BNNs) are gaining interest in the deep learni...
