Elastic execution of checkpointed MPI applications

05/15/2020
by Sumeet Gajjar, et al.

MPI applications begin with a fixed number of ranks and, by default, that number remains constant throughout the application's lifetime. The developer can increase the number of ranks by dynamically spawning MPI processes, but doing so manually adds complexity to the MPI application. Making MPI applications malleable <cit.> would give HPC applications the same elasticity as cloud applications. We propose multiple approaches to change the number of ranks of an MPI program without requiring any modification of the user code. We use checkpointing as the tool that makes the rank count mutable: execution is halted, and the MPI program is then resumed with a new state. In this paper, we focus on the scenario of increasing the number of ranks of an MPI program, using ExaMPI as the MPI implementation.
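The abstract does not say how state is redistributed once the program resumes on more ranks. As a rough illustration only, the sketch below assumes the application's state is a 1-D array split into contiguous blocks (one per rank, remainder spread over the lowest ranks) and that the checkpoint is simply the list of per-rank blocks. Both this data layout and the names `block_range` and `restart_with_more_ranks` are hypothetical; they are not taken from the paper or from ExaMPI.

```python
def block_range(rank, nranks, n):
    """Half-open [start, end) range of the n global items owned by `rank`.

    Hypothetical 1-D block decomposition: every rank gets n // nranks items,
    and the first n % nranks ranks get one extra item each.
    """
    base, rem = divmod(n, nranks)
    start = rank * base + min(rank, rem)
    end = start + base + (1 if rank < rem else 0)
    return start, end


def restart_with_more_ranks(checkpoint, new_nranks):
    """Re-partition checkpointed per-rank blocks over a larger rank count.

    `checkpoint` is an assumed format: a list whose i-th entry holds the
    block saved by old rank i. Returns one block per new rank.
    """
    if new_nranks < len(checkpoint):
        # This sketch mirrors the abstract's focus on growing the rank count.
        raise ValueError("sketch only covers increasing the number of ranks")
    # Reassemble the global state in old-rank order, then split it again.
    global_state = [x for block in checkpoint for x in block]
    n = len(global_state)
    return [global_state[slice(*block_range(r, new_nranks, n))]
            for r in range(new_nranks)]
```

For example, a checkpoint written by 2 ranks as `[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]` and restarted on 4 ranks is split into `[[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]`. A real restart would also have to rebuild communicators and any rank-dependent metadata, which this sketch ignores.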
