Taskgraph: A Low Contention OpenMP Tasking Framework

12/09/2022
by   Chenle Yu, et al.
0

OpenMP is the de-facto standard for shared memory systems in High-Performance Computing (HPC). It includes a task-based model that offers a high-level of abstraction to effectively exploit highly dynamic structured and unstructured parallelism in an easy and flexible way. Unfortunately, the run-time overheads introduced to manage tasks are (very) high in most common OpenMP frameworks (e.g., GCC, LLVM), which defeats the potential benefits of the tasking model, and makes it suitable for coarse-grained tasks only. This paper presents taskgraph, a framework that uses a task dependency graph (TDG) to represent a region of code implemented with OpenMP tasks in order to reduce the run-time overheads associated with the management of tasks, i.e., contention and parallel orchestration, including task creation and synchronization. The TDG avoids the overheads related to the resolution of task dependencies and greatly reduces those deriving from the accesses to shared resources. Moreover, the taskgraph framework introduces in OpenMP the record-and-replay execution model that accelerates the taskgraph region from its second execution. Overall, the multiple optimizations presented in this paper allow exploiting fine-grained OpenMP tasks to cope with the trend in current applications pointing to leverage massive on-node parallelism, fine-grained and dynamic scheduling paradigms. The framework is implemented on LLVM 15.0. Results show that the taskgraph implementation outperforms the vanilla OpenMP system in terms of performance and scalability, for all structured and unstructured parallelism, and considering coarse and fine grained tasks. Furthermore, the proposed framework considerably reduces the performance gap between the task and the thread models of OpenMP.

READ FULL TEXT

page 3

page 9

page 11

research
04/07/2020

Worksharing Tasks: An Efficient Way to Exploit Irregular and Fine-Grained Loop Parallelism

Shared memory programming models usually provide worksharing and task co...
research
09/27/2016

An Evaluation of Coarse-Grained Locking for Multicore Microkernels

The trade-off between coarse- and fine-grained locking is a well underst...
research
05/17/2021

Advanced Synchronization Techniques for Task-based Runtime Systems

Task-based programming models like OmpSs-2 and OpenMP provide a flexible...
research
05/19/2010

Efficient System-Enforced Deterministic Parallelism

Deterministic execution offers many benefits for debugging, fault tolera...
research
04/12/2020

Accelerating Filesystem Checking and Repair with pFSCK

File system checking and recovery (C/R) tools play a pivotal role in inc...
research
04/23/2018

goSLP: Globally Optimized Superword Level Parallelism Framework

Modern microprocessors are equipped with single instruction multiple dat...
research
09/16/2020

PySchedCL: Leveraging Concurrency in Heterogeneous Data-Parallel Systems

In the past decade, high performance compute capabilities exhibited by h...

Please sign up or login with your details

Forgot password? Click here to reset