Duet: Creating Harmony between Processors and Embedded FPGAs

01/07/2023
by Ang Li, et al.

The demise of Moore's Law has led to the rise of hardware acceleration. However, the focus on accelerating stable algorithms in their entirety neglects the abundant fine-grained acceleration opportunities available in broader domains and squanders host processors' compute power. This paper presents Duet, a scalable, manycore-FPGA architecture that promotes embedded FPGAs (eFPGAs) to be equal peers with processors through non-intrusive, bi-directionally cache-coherent integration. In contrast to existing CPU-FPGA hybrid systems in which the processors play a supportive role, Duet unleashes the full potential of both the processors and the eFPGAs with two classes of post-fabrication enhancements: (1) fine-grained acceleration, which partitions an application into small tasks and offloads the frequently invoked, compute-intensive ones onto various small accelerators, leveraging the processors to handle dynamic control flow and less accelerable tasks; and (2) hardware augmentation, which employs eFPGA-emulated hardware widgets to improve processor efficiency or mitigate software overheads in certain execution models. An RTL implementation of Duet is developed to evaluate the architecture with high fidelity. Experiments using synthetic benchmarks show that Duet can reduce the processor-accelerator communication latency by up to 82%. The RTL implementation is further evaluated with seven application benchmarks, achieving 1.5-24.9x speedup.

research
06/21/2016

A Soft Processor Overlay with Tightly-coupled FPGA Accelerator

FPGA overlays are commonly implemented as coarse-grained reconfigurable ...
research
05/02/2019

On Linear Learning with Manycore Processors

A new generation of manycore processors is on the rise that offers dozen...
research
06/02/2021

Dagger: Accelerating RPCs in Cloud Microservices Through Tightly-Coupled Reconfigurable NICs

The ongoing shift of cloud services from monolithic designs to microserv...
research
08/27/2015

Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay

Offloading compute intensive nested loops to execute on FPGA accelerator...
research
08/28/2015

GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs

In recent years, architectures combining a reconfigurable fabric and a g...
research
03/06/2020

Stretching the capacity of Hardware Transactional Memory in IBM POWER architectures

The hardware transactional memory (HTM) implementations in commercially ...
research
07/27/2019

Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems

Innovations in Next-Generation Sequencing are enabling generation of DNA...