Profile-Guided Parallel Task Extraction and Execution for Domain Specific Heterogeneous SoC

11/26/2022
by   Liangliang Chang, et al.

In this study, we introduce a methodology that uses dynamic profiling to automatically transform user applications in the radar and communication domain, written in C/C++, into a parallel representation targeted at a heterogeneous SoC. We present our approach for instrumenting the user application binary during compilation with barrier synchronization primitives, which enable the runtime system to schedule and execute independent tasks concurrently over the available compute resources. We demonstrate the capabilities of our integrated compile-time and runtime flow through task-level parallel and functionally correct execution of real-life applications. We validate the integrated system by executing four distinct applications, each carrying a different degree of task-level parallelism, on a Xeon-based homogeneous multi-core processor. We then use the proposed compilation and code transformation methodology to re-target each application for execution on a heterogeneous SoC composed of three ARM cores and one FFT accelerator, emulated on the Xilinx Zynq UltraScale+ platform. We demonstrate our runtime's ability to process the application binary and dispatch independent tasks over the available compute resources of the emulated SoC on the Zynq FPGA using three different scheduling heuristics. Finally, we demonstrate execution of each application individually with task-level parallelism on the Zynq FPGA, as well as execution of workload scenarios composed of multiple instances of the same application and of a mixture of two distinct applications, showing that the system can realize both application-level and task-level parallel execution. Our integrated approach offers a path forward for application developers to take full advantage of the target SoC without having to become hardware and parallel programming experts.
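
To make the idea of barrier-delimited task regions concrete, the following is a minimal C++ sketch, not the authors' compiler output or runtime. It assumes that independent tasks can be modelled as plain callables and that joining all worker threads is an acceptable stand-in for a barrier primitive. The names `Task`, `run_parallel_region`, and `pipeline_example` are hypothetical and do not come from the paper.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical task type: any callable that takes no arguments.
using Task = std::function<void()>;

// Run one group of mutually independent tasks concurrently, then wait for
// all of them. Joining every worker acts as the barrier: nothing after this
// call starts until every task in the region has finished.
void run_parallel_region(const std::vector<Task>& tasks) {
    std::vector<std::thread> workers;
    workers.reserve(tasks.size());
    for (const Task& t : tasks) {
        assert(t && "each task must be callable");
        workers.emplace_back(t);  // one thread per task (illustration only)
    }
    for (std::thread& w : workers) {
        w.join();  // barrier: block until this task is done
    }
}

// Two independent tasks produce values; a dependent task consumes them
// only after the barrier that closes the first region.
int pipeline_example() {
    int a = 0;
    int b = 0;
    run_parallel_region({ [&a] { a = 2; }, [&b] { b = 3; } });

    int sum = 0;
    run_parallel_region({ [&] { sum = a + b; } });
    return sum;
}
```

Calling `pipeline_example()` runs the two producer tasks concurrently and the consumer afterwards. A real runtime of the kind described above would hand each task to a scheduler that picks a core or accelerator, instead of spawning one thread per task.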


