Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

03/22/2019
by Ahmed Eleliemy et al.

Computationally-intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular, and a balanced execution of their iterations is critical for achieving high performance. However, several factors may lead to an imbalanced execution, such as problem characteristics as well as algorithmic and systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors and, consequently, to improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are, therefore, called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the message passing interface (MPI) with open multi-processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes a feature for sharing memory regions among MPI processes on the same node. This feature enables the MPI+MPI approach, which simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation presented herein. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach.
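
To illustrate the MPI-3 feature on which the MPI+MPI approach rests, the minimal sketch below (in C, assuming an MPI-3 library; it is not the implementation evaluated in the paper) splits MPI_COMM_WORLD into per-node communicators with MPI_Comm_split_type and allocates a node-local shared window with MPI_Win_allocate_shared. The window holds a hypothetical shared loop-iteration counter of the kind a hierarchical DLS technique could dispatch chunks from within a node.

/* A minimal sketch of the MPI-3 shared-memory feature underlying the
 * MPI+MPI approach; not the authors' implementation.
 * Compile with an MPI-3 library, e.g.: mpicc mpi_shm_sketch.c -o sketch */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Group the ranks of each compute node into their own communicator;
     * MPI_COMM_TYPE_SHARED selects ranks that can share memory. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* The node-local master (rank 0) allocates the shared region; the
     * other ranks contribute zero bytes and query the master's pointer. */
    int *counter = NULL;
    MPI_Win win;
    MPI_Win_allocate_shared((node_rank == 0) ? sizeof(int) : 0,
                            sizeof(int), MPI_INFO_NULL, node_comm,
                            &counter, &win);
    if (node_rank != 0) {
        MPI_Aint size;
        int disp_unit;
        MPI_Win_shared_query(win, 0, &size, &disp_unit, &counter);
    }

    /* Hypothetical use: a node-shared iteration counter that a DLS
     * technique would advance atomically (e.g., via MPI_Fetch_and_op)
     * to hand out loop chunks within the node. */
    MPI_Win_lock_all(0, win);
    if (node_rank == 0)
        *counter = 0;              /* master initializes the counter    */
    MPI_Win_sync(win);             /* make the store visible            */
    MPI_Barrier(node_comm);        /* order the store before the loads  */
    MPI_Win_sync(win);             /* readers refresh their view        */
    printf("node rank %d sees counter = %d\n", node_rank, *counter);
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

In the hybrid MPI+OpenMP model, the analogous node-level coordination would instead use an OpenMP parallel region with atomic updates; the MPI+MPI variant keeps a single programming system across both levels of hardware parallelism.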

