Software-Defined FPGA Accelerator Design for Mobile Deep Learning Applications

Recently, the field of deep learning has received great attention by the scientific community and it is used to provide improved solutions to many computer vision problems. Convolutional neural networks (CNNs) have been successfully used to attack problems such as object recognition, object detection, semantic segmentation, and scene understanding. The rapid development of deep learning goes hand by hand with the adaptation of GPUs for accelerating its processes, such as network training and inference. Even though FPGA design exists long before the use of GPUs for accelerating computations and despite the fact that high-level synthesis (HLS) tools are getting more attractive, the adaptation of FPGAs for deep learning research and application development is poor due to the requirement of hardware design related expertise. This work presents a workflow for deep learning mobile application acceleration on small low-cost low-power FPGA devices using HLS tools. This workflow eases the design of an improved version of the SqueezeJet accelerator used for the speedup of mobile-friendly low-parameter ImageNet class CNNs, such as the SqueezeNet v1.1 and the ZynqNet. Additionally, the workflow includes the development of an HLS-driven analytical model which is used for performance estimation of the accelerator. This model can be also used to direct the design process and lead to future design improvements and optimizations.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/06/2018

SqueezeJet: High-level Synthesis Accelerator Design for Deep Convolutional Neural Networks

Deep convolutional neural networks have dominated the pattern recognitio...
research
05/23/2016

DLAU: A Scalable Deep Learning Accelerator Unit on FPGA

As the emerging field of machine learning, deep learning shows excellent...
research
06/18/2020

Caffe Barista: Brewing Caffe with FPGAs in the Training Loop

As the complexity of deep learning (DL) models increases, their compute ...
research
05/07/2017

A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

In recent years deep learning algorithms have shown extremely high perfo...
research
12/14/2015

Origami: A 803 GOp/s/W Convolutional Network Accelerator

An ever increasing number of computer vision and image/video processing ...
research
04/06/2021

Exploration of Hardware Acceleration Methods for an XNOR Traffic Signs Classifier

Deep learning algorithms are a key component of many state-of-the-art vi...
research
02/13/2016

Deep Learning on FPGAs: Past, Present, and Future

The rapid growth of data size and accessibility in recent years has inst...

Please sign up or login with your details

Forgot password? Click here to reset