Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks?

Graphics Processing Units (GPUs) are currently the dominant programmable architecture for Deep Learning (DL) accelerators. The adoption of Field Programmable Gate Arrays (FPGAs) in DL accelerators is, however, gaining momentum. In this paper, we demonstrate that Direct Hardware Mapping (DHM) of a Convolutional Neural Network (CNN) on an embedded FPGA substantially outperforms a GPU implementation in terms of energy efficiency and execution time. However, DHM is highly resource-intensive and cannot fully substitute for the GPU when implementing a state-of-the-art CNN. We thus propose a hybrid FPGA-GPU DL acceleration method and demonstrate that heterogeneous acceleration outperforms GPU acceleration even when communication overheads are included. Experiments are conducted on a heterogeneous multi-platform setup combining an Nvidia(R) Jetson TX2 CPU-GPU board and an Intel(R) Cyclone10GX FPGA board. Three mobile-oriented CNNs are evaluated: SqueezeNet, MobileNetv2, and ShuffleNetv2. We show that heterogeneous FPGA-GPU acceleration outperforms GPU acceleration for the classification inference task on MobileNetv2 (12 energy reduction, 4 reduction, same latency) and ShuffleNetv2 (25 reduction).
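The hybrid scheme rests on a trade-off: layers mapped onto the FPGA with DHM run faster and more efficiently, but each mapped layer consumes dedicated FPGA resources, and handing activations over to the GPU adds a transfer cost. A minimal sketch of that trade-off follows, assuming a simplified additive latency model (per-layer FPGA and GPU latencies, a single FPGA-to-GPU transfer at the split point, and a fixed FPGA resource budget). The function names, the cost model, and all numbers in the usage example are hypothetical illustrations, not the paper's measurements or its partitioning method.

```python
from typing import Sequence, Tuple

# Hypothetical, simplified cost model (not the paper's method):
# layers [0, k) run on the FPGA via DHM, layers [k, n) run on the GPU,
# and the activation produced by layer k-1 crosses the FPGA->GPU link once.


def split_latency(
    k: int,
    fpga_ms: Sequence[float],
    gpu_ms: Sequence[float],
    act_bytes: Sequence[int],
    link_bytes_per_ms: float,
) -> float:
    """Estimated end-to-end latency (ms) for a split after the first k layers."""
    fpga_part = sum(fpga_ms[:k])
    gpu_part = sum(gpu_ms[k:])
    # k == 0 means the whole network runs on the GPU, so nothing is transferred.
    transfer = act_bytes[k - 1] / link_bytes_per_ms if k > 0 else 0.0
    return fpga_part + transfer + gpu_part


def best_split(
    fpga_ms: Sequence[float],
    gpu_ms: Sequence[float],
    act_bytes: Sequence[int],
    link_bytes_per_ms: float,
    fpga_cost: Sequence[float],
    fpga_budget: float,
) -> Tuple[int, float]:
    """Return (k, latency) minimizing latency under the FPGA resource budget."""
    n = len(gpu_ms)
    best_k = 0
    best_t = split_latency(0, fpga_ms, gpu_ms, act_bytes, link_bytes_per_ms)
    used = 0.0
    for k in range(1, n + 1):
        # DHM gives every mapped layer its own hardware, so resource use accumulates.
        used += fpga_cost[k - 1]
        if used > fpga_budget:
            break  # costs are non-negative, so deeper splits cannot fit either
        t = split_latency(k, fpga_ms, gpu_ms, act_bytes, link_bytes_per_ms)
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t


if __name__ == "__main__":
    # Hypothetical per-layer numbers for a 4-layer toy network.
    k, t = best_split(
        fpga_ms=[0.2, 0.3, 0.5, 0.4],
        gpu_ms=[1.0, 0.8, 0.6, 0.5],
        act_bytes=[400_000, 200_000, 100_000, 4_000],
        link_bytes_per_ms=1_000_000,  # about 1 GB/s
        fpga_cost=[30, 30, 30, 30],  # arbitrary resource units
        fpga_budget=70,
    )
    print(f"split after layer {k}: {t:.2f} ms")  # split after layer 2: 1.80 ms
```

With these toy numbers, the GPU-only run costs 2.9 ms, a split after layer 1 costs 2.5 ms, and a split after layer 2 costs 1.8 ms; the resource budget then stops the FPGA partition from growing further. The same additive structure could be reused with per-layer energy figures to compare energy instead of latency.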



Related research

A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks (02/20/2017)
FPGA-based hardware accelerators for convolutional neural networks (CNNs...

A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks (09/17/2019)
Intensive computation is entering data centers with multiple workloads o...

AIM: Accelerating Arbitrary-precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP (09/21/2023)
Arbitrary-precision integer multiplication is the core kernel of many ap...

On the Feasibility of FPGA Acceleration of Molecular Dynamics Simulations (08/08/2018)
Classical molecular dynamics (MD) simulations are important tools in lif...

Tactics to Directly Map CNN graphs on Embedded FPGAs (11/20/2017)
Deep Convolutional Neural Networks (CNNs) are the state-of-the-art in im...

Towards a Uniform Architecture for the Efficient Implementation of 2D and 3D Deconvolutional Neural Networks on FPGAs (03/06/2019)
Three-dimensional deconvolution is widely used in many computer vision a...

Accelerating ODE-Based Neural Networks on Low-Cost FPGAs (12/31/2020)
ODENet is a deep neural network architecture in which a stacking structu...
