FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10

11/18/2019
by   Ke He, et al.
0

Deep learning and Convolutional Neural Network (CNN) have becoming increasingly more popular and important in both academic and industrial areas in recent years cause they are able to provide better accuracy and result in classification, detection and recognition areas, compared to traditional approaches. Currently, there are many popular frameworks in the market for deep learning development, such as Caffe, TensorFlow, Pytorch, and most of frameworks natively support CPU and consider GPU as the mainline accelerator by default. FPGA device, viewed as a potential heterogeneous platform, still cannot provide a comprehensive support for CNN development in popular frameworks, in particular to the training phase. In this paper, we firstly propose the FeCaffe, i.e. FPGA-enabled Caffe, a hierarchical software and hardware design methodology based on the Caffe to enable FPGA to support mainline deep learning development features, e.g. training and inference with Caffe. Furthermore, we provide some benchmarks with FeCaffe by taking some classical CNN networks as examples, and further analysis of kernel execution time in details accordingly. Finally, some optimization directions including FPGA kernel design, system pipeline, network architecture, user case application and heterogeneous platform levels, have been proposed gradually to improve FeCaffe performance and efficiency. The result demonstrates the proposed FeCaffe is capable of supporting almost full features during CNN network training and inference respectively with high degree of design flexibility, expansibility and reusability for deep learning development. Compared to prior studies, our architecture can support more network and training settings, and current configuration can achieve 6.4x and 8.4x average execution time improvement for forward and backward respectively for LeNet.

READ FULL TEXT

page 3

page 7

research
07/04/2019

FusionAccel: A General Re-configurable Deep Learning Inference Accelerator on FPGA for Convolutional Neural Networks

The deep learning accelerator is one of the methods to accelerate deep l...
research
02/02/2021

Transparent FPGA Acceleration with TensorFlow

Today, artificial neural networks are one of the major innovators pushin...
research
09/30/2016

Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have gained significant traction in...
research
04/13/2023

DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference

Dynamic Graph Neural Networks (DGNNs) are becoming increasingly popular ...
research
08/15/2019

Automatic Compiler Based FPGA Accelerator for CNN Training

Training of convolutional neural networks (CNNs)on embedded platforms to...
research
02/20/2019

DNNVM : End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-based CNN Accelerators

The convolutional neural network (CNN) has become a state-of-the-art met...
research
07/20/2020

HPIPE: Heterogeneous Layer-Pipelined and Sparse-Aware CNN Inference for FPGAs

We present both a novel Convolutional Neural Network (CNN) accelerator a...

Please sign up or login with your details

Forgot password? Click here to reset