Generalizing Hierarchical Parallelism

09/05/2023
by Michael Kruse, et al.

Since the days of OpenMP 1.0, computer hardware has become more complex, typically by specializing compute units for coarse- and fine-grained parallelism in incrementally deeper hierarchies of parallelism. Newer versions of OpenMP reacted by introducing new mechanisms for querying or controlling its individual levels, each time adding another concept such as places, teams, and progress groups. In this paper, we propose going back to the roots of OpenMP in the form of nested parallelism, for a simpler model and more flexible handling of arbitrarily deep hardware hierarchies.

