HPX Smart Executors

11/05/2017
by Zahra Khatami, et al.

The performance of many parallel applications depends on loop-level parallelism. However, manually parallelizing every loop may degrade parallel performance, because some loops do not scale well to a large number of threads. In addition, the overhead of manually tuning loop parameters may prevent an application from reaching its maximum parallel performance. We illustrate how machine learning techniques can be applied to address these challenges. In this research, we develop a framework that automatically captures the static and dynamic information of a loop. Moreover, we propose a novel method that introduces HPX smart executors. These executors determine the execution policy, chunk size, and prefetching distance of an HPX loop to achieve the highest possible performance. They do so by feeding our learning model two kinds of information: static information captured during compilation and dynamic information gathered at runtime. Our evaluation shows that using these smart executors can speed up the HPX execution process by around 12%-35% for the Matrix Multiplication, Stream, and 2D Stencil benchmarks. This comparison is against setting the HPX loops' execution policies and parameters manually or using HPX auto-parallelization techniques.
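The abstract does not spell out the learning model itself. To illustrate the general idea of feeding static and dynamic loop features to a classifier, here is a minimal Python sketch built on two assumptions. The first is a binary logistic regression for the sequential-versus-parallel decision. The second is a multinomial (softmax) logistic regression for picking one of several candidate chunk sizes. The feature set (`LoopFeatures`), the weights (`POLICY_WEIGHTS`, `CHUNK_WEIGHTS`), and the candidate chunk sizes (`CHUNK_DIVISORS`) are all hypothetical and hand-picked rather than learned. Prefetching distance is left out. This is a sketch under stated assumptions, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# HYPOTHETICAL hand-picked weights, for illustration only. A real model would
# learn them from timing measurements of training loops.
POLICY_WEIGHTS = [-6.0, 0.6, 0.4, 0.5]  # bias, log iters, log ops, log threads
CHUNK_DIVISORS = [1000, 100, 10, 2]     # HYPOTHETICAL: chunk = iterations // divisor
CHUNK_WEIGHTS = [                       # one weight row per candidate chunk size
    [-8.0, 0.8, 0.0, 0.0],
    [-5.0, 0.5, 0.0, 0.0],
    [-2.0, 0.3, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
]


@dataclass
class LoopFeatures:
    """HYPOTHETICAL per-loop features, mixing static and dynamic information."""
    iterations: int         # dynamic: loop trip count, known at run time
    ops_per_iteration: int  # static: operations in the loop body, known at compile time
    threads: int            # dynamic: worker threads available

    def vector(self) -> List[float]:
        # The leading 1.0 is the bias term. Counts are log-scaled so that very
        # large loops do not swamp the other features.
        return [1.0, math.log1p(self.iterations),
                math.log1p(self.ops_per_iteration), math.log1p(self.threads)]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def parallel_probability(f: LoopFeatures) -> float:
    # Binary logistic regression: estimated probability that running the loop
    # in parallel beats running it sequentially.
    return 1.0 / (1.0 + math.exp(-dot(POLICY_WEIGHTS, f.vector())))


def chunk_probabilities(f: LoopFeatures) -> List[float]:
    # Multinomial (softmax) logistic regression over the candidate chunk sizes.
    scores = [dot(w, f.vector()) for w in CHUNK_WEIGHTS]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_parameters(f: LoopFeatures) -> Tuple[str, int]:
    # Pick the policy by thresholding the binary model at 0.5, then pick the
    # most probable chunk-size class from the softmax model.
    policy = "par" if parallel_probability(f) >= 0.5 else "seq"
    probs = chunk_probabilities(f)
    best = max(range(len(probs)), key=probs.__getitem__)
    chunk = max(1, f.iterations // CHUNK_DIVISORS[best])
    return policy, chunk


print(choose_parameters(LoopFeatures(10, 2, 8)))         # → ('seq', 5)
print(choose_parameters(LoopFeatures(1_000_000, 50, 8)))  # → ('par', 1000)
```

With these illustrative weights, a tiny loop stays sequential with a coarse chunk, while a large loop goes parallel with a fine chunk. In an HPX setting, the chosen values would then be mapped onto HPX's sequential or parallel execution policy and a chunk-size parameter. Splitting the decision into a binary model and a multiclass model mirrors the split between a yes/no choice (policy) and a pick-one-of-several choice (chunk size).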
