Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC

03/28/2018
by   Guanwen Zhong, et al.

Convolutional Neural Networks (CNNs) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for datacenter-scale environments. The recent proliferation of mobile and IoT devices has necessitated real-time, energy-efficient deep neural network inference on embedded-class, resource-constrained platforms. In this context, we present Synergy, an automated, hardware-software co-designed, pipelined, high-throughput CNN inference framework for embedded heterogeneous system-on-chip (SoC) architectures (Xilinx Zynq). Through multi-threading, Synergy leverages all the available on-chip resources, which include the dual-core ARM processor along with the FPGA and the NEON SIMD engines as accelerators. Moreover, Synergy provides a unified abstraction of the heterogeneous accelerators (FPGA and NEON) and adapts to different network configurations at runtime without changing the underlying hardware accelerator architecture, by balancing the workload across accelerators through work-stealing. Synergy achieves a 7.3X speedup, averaged across seven CNN models, over a well-optimized software-only solution. Synergy also demonstrates substantially better throughput and energy efficiency than contemporary CNN implementations on the same SoC architecture.
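The work-stealing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not Synergy's implementation: the `Worker` class, the `run_with_work_stealing` function, and the "fpga"/"neon" labels are hypothetical names chosen for illustration, and Python threads stand in for the accelerator-facing threads. Each worker drains its own queue first and, once idle, takes a job from the tail of another worker's queue.

```python
import threading
from collections import deque


class Worker:
    """Hypothetical stand-in for one accelerator and its job queue."""

    def __init__(self, name, jobs):
        self.name = name
        self.queue = deque(jobs)
        self.lock = threading.Lock()
        self.done = []  # jobs this worker ended up executing

    def take_own(self):
        # The owner takes work from the front of its own queue.
        with self.lock:
            return self.queue.popleft() if self.queue else None

    def give_up_one(self):
        # A thief takes work from the back of the victim's queue.
        with self.lock:
            return self.queue.pop() if self.queue else None


def run_worker(worker, others):
    while True:
        job = worker.take_own()
        if job is None:
            # Own queue is empty: try to steal from the other workers.
            for victim in others:
                job = victim.give_up_one()
                if job is not None:
                    break
        if job is None:
            # No jobs are created at runtime in this sketch, so empty
            # queues everywhere means all work has been claimed.
            return
        worker.done.append(job)  # "execute" the job


def run_with_work_stealing(assignments):
    """assignments maps a worker name to its initial list of jobs."""
    workers = [Worker(name, jobs) for name, jobs in assignments.items()]
    threads = [
        threading.Thread(
            target=run_worker,
            args=(w, [o for o in workers if o is not w]),
        )
        for w in workers
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {w.name: w.done for w in workers}


if __name__ == "__main__":
    # Deliberately unbalanced start: the second worker begins idle.
    result = run_with_work_stealing({"fpga": list(range(20)), "neon": []})
    for name, jobs in result.items():
        print(name, "executed", len(jobs), "jobs")
```

How the jobs end up split between workers depends on thread timing, so the per-worker counts vary from run to run; the only guarantee of the sketch is that every job is executed exactly once.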

research
11/14/2020

Memory-Efficient Dataflow Inference for Deep CNNs on FPGA

Custom dataflow Convolutional Neural Network (CNN) inference accelerator...
research
12/04/2017

NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs

Deep convolutional neural networks (CNNs) obtain outstanding results in ...
research
02/20/2019

DNNVM : End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-based CNN Accelerators

The convolutional neural network (CNN) has become a state-of-the-art met...
research
10/03/2021

Heterogeneous Dual-Core Overlay Processor for Light-Weight CNNs

Light-weight convolutional neural networks (CNNs) have small complexity ...
research
10/21/2019

Automatic Generation of Multi-precision Multi-arithmetic CNN Accelerators for FPGAs

Modern deep Convolutional Neural Networks (CNNs) are computationally dem...
research
07/25/2023

Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation

The unprecedented accuracy of convolutional neural networks (CNNs) acros...
research
02/23/2018

PIRT: A Runtime Framework to Enable Energy-Efficient Real-Time Robotic Applications on Heterogeneous Architectures

Enabling full robotic workloads with diverse behaviors on mobile systems...
