FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software

by Jin Hee Kim, et al.

A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model, with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) tool synthesizes the threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated in which convolution, pooling and padding are realized in the synthesized accelerator, while the remaining tasks execute on an embedded ARM processor. The accelerator uses reduced-precision arithmetic and a novel approach to zero-weight skipping in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.
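The producer/consumer structure described above can be sketched in ordinary Pthreads C. The example below is a minimal illustration under stated assumptions: a three-stage pipeline (hypothetical `producer`, `doubler` and `consumer` threads) connected by bounded FIFOs built from a mutex and two condition variables. The queue type, its depth (`FIFO_DEPTH`) and all function names are hypothetical; this is not LegUp's own FIFO API, and no claim is made about how LegUp would map this exact code to hardware.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical bounded FIFO connecting two threads; depth chosen arbitrarily. */
#define FIFO_DEPTH 4
#define N_ITEMS 16

typedef struct {
    int buf[FIFO_DEPTH];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} fifo_t;

static void fifo_init(fifo_t *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void fifo_destroy(fifo_t *q) {
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->not_empty);
    pthread_cond_destroy(&q->not_full);
}

/* Blocking push: waits while the queue is full. */
static void fifo_push(fifo_t *q, int v) {
    pthread_mutex_lock(&q->lock);
    while (q->count == FIFO_DEPTH)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->buf[q->tail] = v;
    q->tail = (q->tail + 1) % FIFO_DEPTH;
    q->count++;
    assert(q->count <= FIFO_DEPTH); /* invariant: never overfilled */
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocking pop: waits while the queue is empty. */
static int fifo_pop(fifo_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int v = q->buf[q->head];
    q->head = (q->head + 1) % FIFO_DEPTH;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return v;
}

typedef struct { fifo_t *in, *out; } stage_args_t;
typedef struct { fifo_t *in; long sum; } sink_args_t;

/* Stage 1: emits 0 .. N_ITEMS-1. */
static void *producer(void *arg) {
    fifo_t *out = arg;
    for (int i = 0; i < N_ITEMS; i++) fifo_push(out, i);
    return NULL;
}

/* Stage 2: reads a value, doubles it, forwards it. */
static void *doubler(void *arg) {
    stage_args_t *a = arg;
    for (int i = 0; i < N_ITEMS; i++) fifo_push(a->out, 2 * fifo_pop(a->in));
    return NULL;
}

/* Stage 3: accumulates everything it receives. */
static void *consumer(void *arg) {
    sink_args_t *a = arg;
    a->sum = 0;
    for (int i = 0; i < N_ITEMS; i++) a->sum += fifo_pop(a->in);
    return NULL;
}

/* Runs all three stages concurrently and returns the consumer's sum. */
static long run_pipeline(void) {
    fifo_t q1, q2;
    fifo_init(&q1);
    fifo_init(&q2);
    stage_args_t mid = { &q1, &q2 };
    sink_args_t sink = { &q2, 0 };
    pthread_t t[3];
    pthread_create(&t[0], NULL, producer, &q1);
    pthread_create(&t[1], NULL, doubler, &mid);
    pthread_create(&t[2], NULL, consumer, &sink);
    for (int i = 0; i < 3; i++) pthread_join(t[i], NULL);
    fifo_destroy(&q1);
    fifo_destroy(&q2);
    return sink.sum;
}
```

Calling `run_pipeline()` streams the values 0 to 15 through the doubling stage and returns their sum, 240. Because every stage communicates only through blocking FIFO operations, each stage can run as an independent thread; this is the kind of thread-level parallelism the abstract says LegUp translates into spatial parallelism.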
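The abstract does not say how its zero-weight-skipping scheme works, so the sketch below shows only a generic form of the idea and is not the authors' method. It assumes a 3x3 kernel, 8-bit signed weights and activations with 32-bit accumulation (standing in for "reduced precision"), and hypothetical names throughout (`compress_kernel`, `conv2d_skip_zero`, `conv2d_dense`). Nonzero weights are extracted once, offline, together with their kernel offsets, so the inner loop performs one multiply-accumulate per nonzero weight instead of one per kernel position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define K 3 /* hypothetical kernel size (KxK) */

/* One nonzero weight plus its position inside the kernel. */
typedef struct { uint8_t dy, dx; int8_t w; } nz_weight_t;

/* Offline step: keep only the nonzero weights of a row-major KxK kernel.
   Returns the number of entries written to out (at most K*K). */
static size_t compress_kernel(const int8_t *kernel, nz_weight_t *out) {
    size_t n = 0;
    for (int dy = 0; dy < K; dy++)
        for (int dx = 0; dx < K; dx++)
            if (kernel[dy * K + dx] != 0)
                out[n++] = (nz_weight_t){ (uint8_t)dy, (uint8_t)dx, kernel[dy * K + dx] };
    return n;
}

/* "Valid" 2D convolution (cross-correlation, as used in CNNs) over an h x w
   int8 input, visiting only nonzero weights. Output is (h-K+1) x (w-K+1). */
static void conv2d_skip_zero(const int8_t *in, int h, int w,
                             const nz_weight_t *nz, size_t n_nz, int32_t *out) {
    assert(h >= K && w >= K);
    int oh = h - K + 1, ow = w - K + 1;
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++) {
            int32_t acc = 0;
            for (size_t i = 0; i < n_nz; i++)
                acc += (int32_t)nz[i].w * in[(y + nz[i].dy) * w + (x + nz[i].dx)];
            out[y * ow + x] = acc;
        }
}

/* Dense reference: visits every kernel position, including zeros. */
static void conv2d_dense(const int8_t *in, int h, int w,
                         const int8_t *kernel, int32_t *out) {
    assert(h >= K && w >= K);
    int oh = h - K + 1, ow = w - K + 1;
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++) {
            int32_t acc = 0;
            for (int dy = 0; dy < K; dy++)
                for (int dx = 0; dx < K; dx++)
                    acc += (int32_t)kernel[dy * K + dx] * in[(y + dy) * w + (x + dx)];
            out[y * ow + x] = acc;
        }
}
```

Both functions produce identical outputs; for a kernel with three nonzero weights, the skipping version performs 3 multiply-accumulates per output instead of 9. In hardware, the corresponding saving would be fewer multiply-accumulate operations per output, though how the paper's accelerator realizes this is not described here.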
