SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on FPGA Devices

AI has led to significant advancements in computer vision and image processing tasks, enabling a wide range of real-world applications, from autonomous vehicles to medical imaging. Many of these applications require efficient object detection algorithms, together with real-time, low-latency hardware to perform inference of these algorithms. The YOLO family of models is considered the most efficient for object detection, as it requires only a single pass through the model. Despite this, YOLO models can be too large and computationally demanding for current edge-based platforms. To address this, we present SATAY: a Streaming Architecture Toolflow for Accelerating YOLO. This work tackles the challenges of deploying state-of-the-art object detection models onto FPGA devices for ultra-low latency applications, enabling real-time, edge-based object detection. We employ a streaming architecture design for our YOLO accelerators, implementing the complete model on-chip in a deeply pipelined fashion. These accelerators are generated using an automated toolflow and can target a range of suitable FPGA devices. We introduce novel hardware components to support the operations of YOLO models in a dataflow manner, and off-chip memory buffering to address the limited on-chip memory resources. Our toolflow generates accelerator designs whose performance and energy characteristics are competitive with GPU devices, and which outperform current state-of-the-art FPGA accelerators.
