Open-MPI over MOSIX: paralleled computing in a clustered world

06/29/2019
by Adam Lev-Libfeld, et al.

The recent surge of interest in cloud computing emphasizes the need for an adequate solution to the load-balancing problem in parallel computing: efficiently running several jobs concurrently on a cluster of shared computers (nodes). One approach to this problem is preemptive process migration, the transfer of running processes between nodes. A possible drawback of this approach is the increased communication overhead between heavily communicating processes. This project addresses that drawback by incorporating the process migration capability of MOSIX into Open-MPI and by reducing the resulting communication overhead. Specifically, we developed a module for direct communication (DiCOM) between migrated Open-MPI processes, to overcome the increased TCP/IP communication latency between such processes. The outcome is reduced run time through improved resource allocation.


