DMR API: Improving cluster productivity by turning applications into malleable

05/12/2020
by Sergio Iserte, et al.

Adaptive workloads can change the configuration of their jobs on the fly, in terms of the number of processes. To carry out these job reconfigurations, we have designed a methodology that enables a job to communicate with the resource manager and, through the runtime, to change its number of MPI ranks. The collaboration between the workload manager (aware of the queue of jobs and the resource allocation) and the parallel runtime (able to transparently handle the processes and the program data) is crucial for our throughput-aware malleability methodology. Hence, when a job triggers a reconfiguration, the resource manager checks the cluster status and returns an action: an expansion, if there are spare resources; a shrink, if queued jobs can be initiated; or none, if no change can improve the global productivity. In this paper, we describe the internals of our framework and how it reduces the global workload completion time while making smarter use of the underlying resources. To this end, we present a thorough study of adaptive workload processing, showing the detailed behavior of our framework in representative experiments and the low overhead that our reconfiguration involves.
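
The three possible answers of the resource manager (expand, shrink, or none) can be illustrated with a minimal sketch. This is not the DMR API itself: the names `Action`, `ClusterStatus`, and `decide`, the per-job `min_nodes`/`max_nodes` limits, and the order in which the conditions are checked are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Action(Enum):
    EXPAND = "expand"
    SHRINK = "shrink"
    NONE = "none"


@dataclass
class ClusterStatus:
    """Hypothetical snapshot of the cluster as seen by the resource manager."""
    idle_nodes: int                                       # nodes not allocated to any job
    queued_requests: List[int] = field(default_factory=list)  # node counts asked by pending jobs


def decide(job_nodes: int, min_nodes: int, max_nodes: int,
           status: ClusterStatus) -> Tuple[Action, int]:
    """Return an action and the resulting node count for the calling job.

    Assumed policy (illustrative, not taken from the paper):
      1. Empty queue and idle nodes available -> expand up to max_nodes.
      2. Otherwise, if releasing nodes lets the smallest pending job start -> shrink.
      3. Otherwise -> no change.
    """
    if not status.queued_requests:
        if status.idle_nodes > 0 and job_nodes < max_nodes:
            return Action.EXPAND, min(max_nodes, job_nodes + status.idle_nodes)
        return Action.NONE, job_nodes

    needed = min(status.queued_requests)   # smallest pending request
    releasable = job_nodes - min_nodes     # nodes this job may give up
    if status.idle_nodes < needed <= status.idle_nodes + releasable:
        # Release only as many nodes as the pending job is missing.
        return Action.SHRINK, job_nodes - (needed - status.idle_nodes)

    return Action.NONE, job_nodes
```

For example, under these assumptions a 4-node job that may run on 2 to 8 nodes would be expanded when the queue is empty and nodes are idle, and shrunk when giving up nodes allows a pending job to start.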
