HPIPE: Heterogeneous Layer-Pipelined and Sparse-Aware CNN Inference for FPGAs

07/20/2020
by Mathew Hall, et al.

We present both a novel Convolutional Neural Network (CNN) accelerator architecture and a network compiler for FPGAs that outperforms all prior work. Instead of having generic processing elements that together process one layer at a time, our network compiler statically partitions available device resources and builds custom-tailored hardware for each layer of a CNN. By building hardware for each layer, we can pack our controllers into fewer lookup tables and use dedicated routing. These efficiencies enable our accelerator to utilize 2x the DSPs and operate at more than 2x the frequency of prior work on sparse CNN acceleration on FPGAs. We evaluate the performance of our architecture on both a sparse ResNet-50 and a dense MobileNet ImageNet classifier on a Stratix 10 2800 FPGA. We find that the sparse ResNet-50 model achieves a throughput of 4550 images/s at a batch size of 1, which is nearly 4x the throughput of NVIDIA's fastest machine-learning-targeted GPU, the V100, and outperforms all prior work on FPGAs.
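The core idea above — statically partitioning device resources so that every layer gets its own custom-tailored hardware — can be illustrated with a small sketch. The function, layer shapes, and DSP budget below are hypothetical and not taken from the paper; they show one plausible balancing rule, assuming the compiler sizes each pipeline stage in proportion to that layer's multiply-accumulate (MAC) workload so no stage bottlenecks the pipeline.

```python
# Hypothetical sketch of static per-layer DSP partitioning for a layer
# pipeline. Layer MAC counts and the DSP budget are illustrative only.

def partition_dsps(layer_macs, total_dsps):
    """Assign DSPs to each layer proportional to its MAC count so that
    all pipeline stages take roughly the same number of cycles."""
    total_macs = sum(layer_macs)
    # Floor of each layer's proportional share, with a minimum of one DSP.
    alloc = [max(1, int(total_dsps * m / total_macs)) for m in layer_macs]
    # Hand leftover DSPs to whichever layer is currently slowest
    # (highest MACs per DSP), since it bounds pipeline throughput.
    leftover = total_dsps - sum(alloc)
    for _ in range(leftover):
        worst = max(range(len(alloc)), key=lambda i: layer_macs[i] / alloc[i])
        alloc[worst] += 1
    return alloc

# Three illustrative layers with different workloads and a 1000-DSP budget.
macs = [90_000_000, 25_000_000, 10_000_000]
print(partition_dsps(macs, 1000))  # allocation sums to the 1000-DSP budget
```

In a real compiler this allocation would also have to respect on-chip memory, routing, and per-layer minimum parallelism constraints; the sketch captures only the throughput-balancing intuition.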

