Auto Adaptive Irregular OpenMP Loops

07/15/2020
by Joshua Dennis Booth, et al.

OpenMP is a standard choice for parallelization because parallel-for loops are easy to program in a fork-join manner. Many shared-memory applications are implemented using this model even though it is not ideal for applications with high load imbalance, such as those that make irregular memory accesses. One parameter, chunk size, is made available to users to mitigate the performance loss. However, the best value of this parameter depends on architecture, system load, application, and input, making it difficult to tune. We present an OpenMP scheduler that adaptively tunes chunk size for unbalanced applications that make irregular memory accesses. In particular, this method (iCh) uses work-stealing to address imbalance and adapts chunk size using a force-feedback model that approximates the variance of task length in a chunk. The scheduler has low overhead and allows for active load balancing while applications are running. We demonstrate this using both sparse matrix-vector multiplication (spmv) and Betweenness Centrality (bc), and show that iCh achieves average speedups close to (i.e., within 1.061x for spmv and 1.092x for bc) those of OpenMP loops scheduled with either dynamic or work-stealing methods whose chunk size was tuned offline.
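To make the chunk size parameter concrete, here is a minimal sketch of the kind of baseline loop the abstract compares against: a sparse matrix-vector product over a matrix in compressed sparse row (CSR) form, scheduled with the standard OpenMP `schedule(dynamic, chunk)` clause. This is not the iCh scheduler. The function name `spmv_csr`, the CSR array names, and the `chunk` argument are assumptions chosen for illustration.

```c
/* Illustrative baseline only (not iCh): y = A * x for a CSR matrix.
 * Rows can hold very different numbers of nonzeros, so the work per
 * iteration is irregular. `chunk` is the number of consecutive rows a
 * thread takes each time it asks for work; it must be positive.
 * All names here are hypothetical and not taken from the paper. */
void spmv_csr(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y, int chunk)
{
    /* Without OpenMP enabled at compile time the pragma is ignored
     * and the loop runs serially with the same result. */
    #pragma omp parallel for schedule(dynamic, chunk)
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        /* Irregular access: x is indexed through col_idx. */
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```

A small `chunk` balances load better but increases scheduling overhead, while a large `chunk` does the opposite. The value that works best for such a loop is what would normally have to be tuned offline for each machine and input.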


