A Real Time 1280x720 Object Detection Chip With 585MB/s Memory Traffic

05/02/2022
by Kuo-Wei Chang, et al.

Memory bandwidth has become the bottleneck for real-time operation of current deep learning accelerators (DLAs), particularly for high-definition (HD) object detection. Under resource constraints, this paper proposes a low-memory-traffic DLA chip with joint hardware and software optimization. To maximize hardware utilization under limited memory bandwidth, we morph and fuse the object detection model into a group fusion-ready model to reduce intermediate data access. This reduces YOLOv2's feature memory traffic from 2.9 GB/s to 0.15 GB/s. To support group fusion, the hardware, which is based on our previous DLA, employs a unified buffer with write-masking for simple layer-by-layer processing within a fusion group. Compared with our previous DLA with the same number of PEs, the chip, implemented in a TSMC 40nm process, supports 1280x720@30FPS object detection and consumes 7.9X less external DRAM access energy, reduced from 2607 mJ to 327.6 mJ.
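The figures quoted in the abstract can be checked with a few lines of arithmetic, and the idea behind fusion can be illustrated with a toy traffic model. The sketch below is not the authors' method: the function names, the assumption that every unfused layer reads its input from DRAM and writes its output back, and the feature map sizes in the example are all hypothetical and chosen only for illustration.

```python
def reduction_factor(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


def dram_feature_traffic(sizes, group_lengths):
    """Toy model of external feature traffic (ASSUMPTION, not from the paper).

    sizes[0] is the network input size and sizes[i] is the output size of
    layer i. Layers are split into consecutive fusion groups; only the input
    and output of each group touch DRAM, and everything in between is assumed
    to stay on chip.
    """
    if sum(group_lengths) != len(sizes) - 1:
        raise ValueError("group lengths must cover every layer exactly once")
    total = 0
    start = 0
    for length in group_lengths:
        end = start + length
        total += sizes[start] + sizes[end]  # read group input, write group output
        start = end
    return total


# Figures taken from the abstract and title.
feature_traffic_gain = reduction_factor(2.9, 0.15)  # GB/s, about 19.3x
dram_energy_gain = reduction_factor(2607.0, 327.6)  # mJ, about 7.96x
traffic_per_frame_mb = 585 / 30                     # MB per frame at 30 FPS

# Hypothetical feature map sizes (arbitrary units) for a three-layer network.
sizes = [8, 16, 16, 4]
layer_by_layer = dram_feature_traffic(sizes, [1, 1, 1])  # every layer is its own group
fully_fused = dram_feature_traffic(sizes, [3])           # one fusion group
```

In this toy model, fusing all three layers removes every intermediate read and write, which is the effect the abstract attributes to the group fusion-ready model. Real savings depend on on-chip buffer capacity and weight traffic, neither of which is modeled here.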


